Open Access

ARTICLE

Automatic Diagnosis of COVID-19 Patients from Unstructured Data Based on a Novel Weighting Scheme

Amir Yasseen Mahdi1,2,*, Siti Sophiayati Yuhaniz1
1 Razak Faculty of Technology and Informatics, Universiti Teknologi Malaysia, Kuala Lumpur, 54100, Malaysia
2 Computer Sciences and Mathematics College, University of Thi_Qar, Thi_Qar, 64000, Iraq
* Corresponding Author: Amir Yasseen Mahdi.

Computers, Materials & Continua 2023, 74(1), 1375-1392. https://doi.org/10.32604/cmc.2023.032671

Received 25 May 2022; Accepted 27 June 2022; Issue published 22 September 2022

Abstract

Extracting features from the unstructured clinical data of COVID-19 patients is critical for guiding clinical decision-making and diagnosing this viral disease. Furthermore, an early and accurate diagnosis of COVID-19 can reduce the burden on healthcare systems. In this paper, an improved term weighting technique combined with Part-of-Speech (POS) tagging is proposed to reduce dimensionality for the automatic and effective classification of clinical text related to COVID-19. Term Frequency-Inverse Document Frequency (TF-IDF) is the most commonly used term weighting scheme (TWS). However, TF-IDF has drawbacks that several later variants have tried to address; in particular, it does not assign sufficiently effective weights to the terms in unstructured data for text classification. In this research, we propose a modified term weighting scheme, RTF-C-IEF, and compare it with four feature extraction methods: TF, TF-IDF, TF-IHF, and TF-IEF. The experiments were conducted on two new datasets of COVID-19 patients. The first dataset, with 3053 clinical records, was collected from government hospitals in Iraq, and the second, with 1446 clinical reports, was collected from several different websites. Based on experimental results obtained with several popular classifiers applied to the COVID-19 datasets, we observe that the proposed RTF-C-IEF scheme is a consistent performer, achieving the best scores in most of the experiments. Further, the modified RTF-C-IEF outperformed the original scheme and the other term weighting methods employed in most experiments. Thus, the proper selection of a term weighting scheme improves the performance of the classifier and helps to identify informative terms.
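For readers unfamiliar with the baseline named above, the following is a minimal sketch of conventional TF-IDF weighting, assuming length-normalised term frequency and an unsmoothed natural-log IDF. It is not the proposed RTF-C-IEF scheme, whose formula is not given in this abstract, and the function name `tf_idf` and the toy records are hypothetical illustrations rather than material from the paper or its datasets.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed formulation (one common textbook variant):
    weight = (count / document length) * ln(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy records, not drawn from the paper's datasets.
records = [
    "patient reports fever and dry cough".split(),
    "patient reports headache".split(),
    "fever and loss of smell".split(),
]
weights = tf_idf(records)
```

Under this formulation a term that occurs in every record receives a weight of zero, while a term confined to a single record is weighted most heavily.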

Keywords

COVID-19; clinical text; natural language processing; TWS; machine learning

Cite This Article

A. Y. Mahdi and S. S. Yuhaniz, "Automatic diagnosis of COVID-19 patients from unstructured data based on a novel weighting scheme," Computers, Materials & Continua, vol. 74, no. 1, pp. 1375–1392, 2023.



This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.