The COVID-19 outbreak originated in the Chinese city of Wuhan and eventually affected almost every nation around the globe. After China, Italy became the next epicentre of the virus and witnessed a very high death toll, and soon nations such as the USA were severely hit by the SARS-CoV-2 virus. The World Health Organisation declared COVID-19 a pandemic on 11th March 2020. To combat the pandemic, nations in every corner of the world have instituted various policies such as physical distancing and isolation of the infected population, and have pursued research on potential vaccines against SARS-CoV-2. To identify the impact of the policies implemented by affected countries on the spread of the pandemic, a myriad of AI-based models have been presented to analyse and predict the epidemiological trends of COVID-19. In this work, the authors present a detailed study of different artificial intelligence frameworks applied to predictive analysis of COVID-19 patient records. The forecasting models acquire information from these records to detect the spread of the pandemic, thus providing an opportunity to take immediate action to reduce the spread of the virus. This paper addresses the research issues and corresponding solutions associated with the prediction and detection of infectious diseases like COVID-19. It further focuses on the study of vaccinations to cope with the pandemic. Finally, the research challenges in terms of data availability, reliability, the accuracy of the existing prediction models and other open issues are discussed to outline the future course of this study.
The COVID-19 disease, caused by SARS-CoV-2, was first reported on 31st December 2019 in Wuhan, in the Hubei province of China. Since then, the disease has spread rapidly to all other continents. It is highly infectious and has exhibited a fast-spreading capacity, endangering the lives of a vast number of people across the world. As of 27 July 2020, the total number of COVID-19 infected patients was 16,480,485 and the number of deceased was 654,036 [
Quantitative and mathematical analyses of time series data of COVID-19 patients are required for estimating the number of affected individuals. The spread of infectious diseases among a population is a complex transmission process, and predictive models can be built to examine the diffusion of this extremely infectious disease. Time series data of the COVID-19 outbreak may be predicted through a variety of approaches, including machine learning, statistical and epidemiological models. Prediction estimates the future trend of the epidemic; however, to curb the menace of the virus, infection tracing and the identification of vaccination strategies using AI frameworks are equally important. RT-PCR is usually used for detection of COVID-19. However, the scarcity of test kits and the occurrence of false-negative results call for a supplementary method for detecting SARS-CoV-2 infection. Chest X-rays and CT scans are being used along with the medical condition of an individual to detect COVID-19 infection, with deep learning methods deployed to analyse the images and correlate them with clinical symptoms. Human trials of COVID-19 vaccines have started.
In this paper, the authors review existing work pertaining to artificial intelligence-based predictive analysis of COVID-19, disease prediction and diagnosis. Further, the authors assess the prediction models used in each study and prepare a comparative analysis against other predictive models. The paper is organized as follows. Background research on different predictive models available for infectious diseases is presented in Section 2. Section 3 presents a detailed review of the COVID-19 prediction models. Section 4 presents a detailed analysis of the review. The paper concludes by describing the outstanding issues and the prospects for future research in this area.
For successful management of an epidemic, precise estimation and prediction of a disease outbreak are important. Many researchers have deployed different time series models for forecasting infectious diseases using machine learning algorithms, mathematical models and disease-based models. An understanding of these models is necessary to follow the review presented in this paper.
Machine learning is a part of Artificial Intelligence that can be deployed to learn important information from huge data sets. Its applications span a vast range of fields including engineering and technology, criminal forensics, medicine and statistics [
Numerous statistical techniques are available for time series prediction. Statistical models have been built to identify the dependencies between the input variables and the outputs, using a variety of stochastic models to forecast the possible transmission pattern of this epidemic. Statistical models need prior information about the distribution of data in the given time series. The popular statistical time series prediction models used for forecasting infectious diseases like COVID-19 from a one-dimensional time series dataset are Exponential Smoothing (ES) [
Epidemiological models [
A myriad of statistics-based schemes have traditionally been employed in epidemiological surveys and in the predictive assessment of epidemics and the spread of viral infection. With the growing number of COVID-19 cases, experts and public health officials have been scrambling for a model that enables them to make informed decisions and implement appropriate measures to address the pandemic. However, these schemes lack pivotal data and general robustness, and exhibit lower accuracy for long-term predictions. The efficacy of traditional statistical and epidemiological surveys and models relies on various factors that are often not favourable in a novel situation such as this, where we are still learning and discovering the behaviour of the virus and its spread. Machine Learning (ML) based approaches are a promising avenue for addressing the crisis, with potential as a tool to model and make predictive assessments of the COVID crisis on numerous fronts.
Ardabili et al. [
Owing to the complexities of disease models, ML-based models have been proposed for the outbreak prediction and assessment of epidemiological surveys. The subsequent paragraphs expand on various ML-based approaches that model the pandemic data and make suitable considerations for providing relevant insights. Further
Study | Methodology | Advantages | Limitations |
---|---|---|---|
Alzab et al. [ | Uses a VGG16 based CNN on X-ray data for diagnosis. Further makes use of three forecasting models: ARIMA, LSTM and PA. | The CNN enables a high accuracy and F score. The prediction models showed significant accuracy for the Australia and Jordan case studies. | The dataset on which the CNN is implemented is limited. Further, the forecasting is done for a short period of time. |
Al-qaness et al. [ | Employs ANFIS in conjunction with FPA-NIA, which is enhanced by the SSA. | The SSA enhanced FPA-ANFIS algorithm augments the exploitation ability and mitigates the limitation of the traditional ANFIS approach. | Computational time is large and there is significant scope for improvement on that front. |
Car et al. [ | Makes use of an MLP ANN with 5-fold cross-validation. | The coefficient of determination displayed a large value for each group. The model had high robustness for deceased patients. | The proposed model had moderate robustness for confirmed cases and low robustness for recovered patients. |
Li et al. [ | Considers transmission characteristics and uses Gaussian distribution theory. | The paper was extremely objective and aimed at providing answers to 11 critical considerations. | The scope of the study was narrowed to only the Hubei Province of China. |
Yang et al. [ | Proposes a modified SEIR and also employs an LSTM model. The AI-based model is optimized using the Adam optimizer and trained on SARS epidemic data from 2003. | Using an LSTM RNN, overfitting was prevented even though the dataset was small. The study conducted an impact assessment of control strategies, which was particularly helpful. | The dataset was relatively small and, consequently, the results were not broad in scope and were constrained by both region and time period. |
Piccolomini et al. [ | Models the transmission through a differential SEIRD model for Italian regions. | Performs better than a traditional SEIRD model for the dataset under consideration. The increased flexibility of a variable infection rate gives experts a reference to assess the impact of lockdowns. | This study too is narrow in scope and the dataset is limited, hence the issue of overfitting is present. |
Jiang et al. [ | Uses a PA-AI method to diagnose severe cases using several clinical indicators. | The PA scheme uses a filter-based method, as opposed to a wrapper method, that factors in the variance of individual attributes, something that is prevalent in clinical cases of COVID. | Limited clinical indicators were used, which narrows down the diagnosis. Further, the data set was from only two hospitals and 53 patients. |
CoronaTracker Group [ | Employs statistics and predictive analysis based on the prevalence and incidence of the coronavirus. | Provides an impact assessment of containment measures and of how information pertaining to the epidemic was flowing. | The early research underscores the unknown properties of the virus and is preliminary research in the field. |
Alvarez et al. [ | Proposes a nature-inspired metaheuristic approach to map the spread of coronavirus. Uses deep learning and metaheuristics. | Use of metaheuristics enables the scheme to factor in large search spaces and find sub-optimal solutions. | The reduction of infections is modelled by considering social isolation as the sole primary measure. |
Yudistra [ | Makes use of a heuristically searched LSTM architecture to predict COVID cases over time. | Since RNN based models are used, the temporal model stores the activation of every time stamp in its internal state, making it suitable for time series data such as COVID. | The authors anticipated that long-term prediction with their model would have lower accuracy. |
Chimmula et al. [ | Uses LSTM networks to predict COVID infection transmission for Canadian regions. | Refrains from using a high probability and from neglecting previous data, something that is prevalent in some previous studies in the domain. | The model is trained on the CHA dataset, which reduces its scope and diversity. |
Nature Inspired Algorithms (NIA) and metaheuristic approaches have been used to solve numerous optimization and research problem statements from a myriad of domains. Martínez-Álvarez et al. [
CoronaTracker, a data analytics and prediction model, was proposed by the CoronaTracker Community Research Group composed of professionals and academicians from a plethora of backgrounds and institutions [
Alzab et al. proposed a deep learning-based approach that uses the VGG16 vision model architecture to diagnose COVID-19 and make a predictive assessment of its spread. The authors employ an artificial intelligence paradigm based on a Convolutional Neural Network that is applied to chest X-rays to identify COVID-19 patients [
Car et al. [
The COVID-19 outbreak emanating from the Hubei province also clashed with mass migration associated with the annual spring festival Chunyun in the region, followed by unprecedented measures by Chinese authorities to curb the spread and break the transmission chain. Yang et al. [
In a bid to harness computational intelligence in clinical decision-making Jiang et al. [
Using CT scan of patients, Chen et al. [
The above research survey has highlighted a wide array of work in various domains of combating COVID. With vaccine trials still underway, modelling of this viral disease remains of critical importance in understanding its impact. Analysis based on pure statistics and stochastic approaches fails to consider the intricacies within the data. Computational Intelligence (CI) based schemes such as AI and ML, in addition to “learning” from the general trend, also factor in these entanglements and deliver higher quality models.
The above-surveyed articles attest to the accelerating adoption of CI-based applications in various domains of medicine, and COVID-19 research has likewise gained significant traction through computational intelligence based paradigms. Their clinical applicability as effective techniques, however, remains limited.
As seen above, most of the surveyed research utilized a large amount of COVID patient data, with relevant benchmarking against expert performance. The great majority of these studies take a retrospective approach, employing historical data to train and validate the various AI and ML-based models. It has been argued that the actual ability of these models and systems can only be understood through a prospective approach, since such models often show deteriorated performance when dealing with real-world data that is dissimilar to the data used in algorithm training. With the COVID-19 pandemic, both the regional and the global picture have been extremely transient, and the data from a few months ago is, from a trend perspective, extremely dissimilar to the current data. This can likely be attributed to the fact that we are still learning about this novel infection and that different measures and therapeutics are being adopted on an ad-hoc basis, which in turn affects the data.
A great majority of AI and CI-based studies, including the ones on COVID surveyed above, are published primarily on preprint servers as opposed to in peer-reviewed journals. A peer-reviewed study carries more weight and trust, which consequently leads to smoother adoption within the medical community. Further, even fewer studies to date have evaluated CI-based models in Randomised Controlled Trials (RCT), considered a “gold standard” in the field of medicine. Such trials are rarer still for CI models on COVID-19.
Clinical applicability has long been an outstanding challenge for computational intelligence researchers. This “chasm” of AI becomes evident when efficacy from a clinical standpoint is weighed against accuracy. Despite its prevalence in machine learning-based research, the area under the receiver operating characteristic curve is not the best reflection of clinical applicability, and it is also difficult for clinical experts to interpret.
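The gap between the area under the receiver operating characteristic curve (AUC) and clinically meaningful quantities can be illustrated with a minimal sketch. The labels, scores and thresholds below are hypothetical and are not taken from any surveyed study; the sketch only shows that AUC is a single threshold-free number, whereas the sensitivity and specificity a clinician experiences depend on the operating threshold that is actually chosen.

```python
def auc_roc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


def sensitivity_specificity(labels, scores, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical data: 1 = infected, 0 = not infected, with model risk scores.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]

print("AUC:", auc_roc(labels, scores))
for threshold in (0.35, 0.5, 0.75):
    sens, spec = sensitivity_specificity(labels, scores, threshold)
    print(f"threshold={threshold}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

The same AUC corresponds to quite different trade-offs between missed infections and false alarms as the threshold moves, which is one reason a single AUC figure may say little about clinical usefulness.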
Research studies often encounter several impediments in objectively comparing algorithms and schemes, especially for COVID-19, as these models employ different methods, dissimilar population sets and varying sample distributions. For the comparison to be fair, these studies need to be evaluated on similar data with similar performance metrics; in the absence of this, medical experts will not be able to identify the best-suited model or scheme for their patients. Hence, independent testing is required to produce a comparative assessment that is in congruence with clinical standards.
Algorithms based on electronic health records often neglect the fact that the input data is produced in a non-stationary environment in which the patient population shifts significantly. This is particularly prevalent with COVID-19, where operational and clinical practice is evolving dynamically over time. In addition, a precursor to deployment is a robust regulatory framework, which is necessary for the safety and effectiveness of CI-based systems. This is particularly challenging amidst the “novel” coronavirus pandemic. While the learning of CI-based models is perpetual, periodic system updates are preferred for any clinical application; hence this dynamism needs to be factored into regulatory approvals in light of continual calibration.
Generalizability, a precursor to clinical applicability, is often one front on which AI-based models fall short. A model may contain several blind spots which, if not identified through rigorous testing, impede its translation into actual use and contribute to incorrect decisions. Based on the survey conducted above, these differences in performance can be attributed to dissimilarities in e-health records and sensing equipment, and to the associated technical and administrative incongruities. The algorithmic bias present in the models can have three components: model bias, model variance and outcome noise.
Susceptibility to adversarial attacks is somewhat theoretical within the scope of this paper, but it becomes more pertinent in real-world scenarios. CI-based systems have a myriad of underlying security issues that must be resolved before they can be safely deployed. These include system-related, privacy-related and authentication-related issues. This underscores the importance of a uniform format of data collection and system-wide coherence with regard to COVID-19 patient records.
There has been a myriad of metrics, clinical and non-clinical, used in the above described techniques to enable forecasting of COVID cases. The metrics can be clustered as per
Category | Metrics | |
---|---|---|
COVID case statistics | Daily death count, number of patients, number of cases, number of deceased, number of recovered cases, report time, serial interval, epidemic doubling period | |
Geographic and environmental parameters | Longitude & latitude, temperature, humidity, wind speed | |
Socio-Economic parameters | COVID-19 awareness, economic indices | |
Epidemiological metrics | Incubation period, rate of infection, rate of reproduction, | |
Patient related | Age, gender, underlying condition |
Dataset | Remarks |
---|---|
WHO weather database [ | Environmental factors such as wind speed, humidity and temperature used to map spread of virus |
Johns Hopkins database [ | Epidemiological curve and global data of cases |
European Centre for Disease Prevention and Control (ECDC) database [ | Epidemiological curve and regional data of cases for EU nations |
Centers for Disease Control and Prevention (CDC) database: Cases, data, and surveillance [ | Cases, deaths, hospital capacity, testing and seroprevalence in the United States |
Italian national database [ | Regional perspective on virus spread information in Italy |
GitHub database [ | Impact of social distancing in the US |
UK based EHRs for COVID-19 patients [ | Effect of a previous health condition on COVID patients |
International ATA data - Chinese CDC [ | Impact and mobility assessment owing to COVID-19 |
To understand the scope of the various models and techniques, a comparative assessment of predictive techniques is performed. A time series database of worldwide COVID-19 cases from 22nd January 2020 to 27 July 2020 has been acquired from the ECDC database [
The models are trained and tested with the above-mentioned dataset using holdout and cross-validation techniques. The RMSE (Root Mean Square Error), MAPE (Mean Absolute Percentage Error) and RMSLE (Root Mean Square Logarithmic Error) values obtained are shown in
Model | Description | Validation technique | RMSE | MAPE (%) | RMSLE |
---|---|---|---|---|---|
LSTM | LSTM network has the capability to learn about the dependencies of the outcomes with older values. “RELU” activation function is used in this study. | Holdout | 954.28 | 2.81 | 0.029 |
Linear regression | The polynomial regression model implemented in this work is modelled as 3rd degree polynomial in input term. | Cross-validation | 5345.44 | 13.53 | 1.43 |
SVR | In this study, SVR model of degree 4 is used with a polynomial kernel to forecast. | Cross-validation | 3843.24 | 8.86 | 1.679 |
ARIMA | ARIMA forecasts on the basis of the dependencies on the past values and also accommodates the effect of past forecast errors. | Cross-validation | 9987.44 | 24.04 | 1.71 |
SEIR | SEIR model relies on differential equations (based on the relationship of susceptible, exposed infected and recovered) for forecasting. The initial | Holdout | 12833.48 | 29.66 | 1.79 |
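The SEIR entry in the comparison above relies on differential equations relating the susceptible, exposed, infected and recovered compartments. As a minimal sketch of that idea, the code below integrates the textbook SEIR equations with forward Euler steps. It is not the implementation evaluated in this study: the function name, the population size, the initial conditions and the rates `beta`, `sigma` and `gamma` are all illustrative assumptions.

```python
def simulate_seir(population, exposed0, infected0, beta, sigma, gamma, days, dt=0.1):
    """Integrate the standard SEIR equations with forward Euler steps.

    dS/dt = -beta * S * I / N
    dE/dt =  beta * S * I / N - sigma * E
    dI/dt =  sigma * E - gamma * I
    dR/dt =  gamma * I
    Returns one (S, E, I, R) tuple per day, including day 0.
    """
    s = float(population - exposed0 - infected0)
    e, i, r = float(exposed0), float(infected0), 0.0
    steps_per_day = round(1 / dt)
    history = [(s, e, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / population * dt   # S -> E
            new_infectious = sigma * e * dt                # E -> I
            new_recovered = gamma * i * dt                 # I -> R
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_recovered
            r += new_recovered
        history.append((s, e, i, r))
    return history


# Hypothetical parameters chosen only for illustration.
history = simulate_seir(
    population=1_000_000, exposed0=100, infected0=10,
    beta=0.5, sigma=1 / 5.2, gamma=1 / 10, days=120,
)
peak_day = max(range(len(history)), key=lambda d: history[d][2])
print("Day of peak infections:", peak_day)
```

In practice the rates would be fitted to reported case data rather than fixed by hand, and an adaptive ODE solver would usually replace the fixed-step Euler loop.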
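RMSE is expressed in the same units as the case counts, penalizes large errors more heavily, and serves as an indicator of overall fit. MAPE expresses the difference between the predicted and actual outcomes as a percentage of the actual outcome. RMSLE is computed on the logarithms of the outcomes and therefore reflects the ratio between the actual and predicted outcomes rather than their absolute difference. In this study, RMSE is taken as the primary measure of performance.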
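The three error measures can be sketched as follows, assuming their standard definitions (the paper does not state its formulas, so the use of `log(1 + x)` in RMSLE is an assumption). The example values are hypothetical and only illustrate how the metrics respond to the scale of the series.

```python
import math


def rmse(actual, predicted):
    """Root mean square error, in the same units as the data."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error; requires non-zero actual values."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmsle(actual, predicted):
    """Root mean square logarithmic error, using log(1 + x) to tolerate zeros."""
    squared = [(math.log1p(p) - math.log1p(a)) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(actual))


# Hypothetical daily case counts and forecasts.
actual = [100, 200, 400]
predicted = [110, 190, 420]
print("RMSE :", rmse(actual, predicted))
print("MAPE :", mape(actual, predicted))
print("RMSLE:", rmsle(actual, predicted))

# Doubling the forecast at two different scales: RMSE grows with the scale,
# while RMSLE stays nearly the same because it depends on the ratio.
print(rmse([100], [200]), rmse([1000], [2000]))
print(rmsle([100], [200]), rmsle([1000], [2000]))
```

Because RMSE scales with the magnitude of the counts, it is most meaningful when all models are compared on the same series, as in the comparison above.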
This paper provides a comprehensive review of the research trajectory in light of the COVID-19 pandemic, with a special focus on computational intelligence-based paradigms in forecasting and diagnosis. It serves as a reference point for prospective researchers to orient future research work in this domain. AI and ML have proven to be a promising research avenue with wide-ranging applicability in various fields of medicine and beyond. From dealing with large search spaces and ascertaining optimal solutions to providing data-driven intelligent techniques, CI has proven, and continues to assert itself as, a prospective area that provides state-of-the-art solutions for improving diagnostics and the medical domain. The work surveyed above in light of the coronavirus pandemic attests to the same. The authors chart out various computational intelligence-based methodologies used for forecasting and diagnosis of COVID-19, in particular the learning and data-driven techniques, with a critical focus on machine learning and AI.
Based on the analysis, this paper also identifies key practical impediments to the translation of this research into application. This confluence of computational intelligence and medicine requires a deliberative and holistic approach so that the insights from the models can be employed to combat this and future pandemics. Under performance review, this paper assesses the accuracy of various models, namely LSTM, linear regression, SVR, ARIMA and SEIR, in a comparative assessment based on RMSE. The LSTM-based model showed the lowest RMSE, at 954.28, making it stand out in the comparative assessment, whereas the traditional SEIR model showed the least accuracy, with an RMSE of 12833.48, thereby attesting to the need for ML and AI driven approaches for forecasting COVID-19 cases. Further, LSTM is the only technique used in detection, prediction and therapeutic research of COVID-19.