Computers, Materials & Continua
Deep Learning and Holt-Trend Algorithms for Predicting Covid-19 Pandemic
1Community College of Abqaiq, King Faisal University, Al Hofuf, Saudi Arabia
2Department of Quantitative Methods, School of Business, King Faisal University, Al Hofuf, Saudi Arabia
3Deanship of E-Learning and Distance Education, King Faisal University, Al Hofuf, Saudi Arabia
4Department of Computer Sciences and Information Technology, Albaha University, Al Bahah, Saudi Arabia
5Department of Computer Science and Information, Taibah University, Madinah, Kingdom of Saudi Arabia
*Corresponding Author: Theyazn H. H. Aldhyani. Email: email@example.com
Received: 24 September 2020; Accepted: 22 November 2020
Abstract: The Covid-19 epidemic poses a serious public health threat to the world, because people with little or no pre-existing immunity are more vulnerable to its effects. Developing surveillance systems that predict the course of the Covid-19 pandemic at an early stage could therefore save millions of lives. In this study, a deep learning algorithm and a Holt-Trend model are proposed to predict the spread of the coronavirus. The Long Short-Term Memory (LSTM) and Holt-Trend algorithms were applied to predict the numbers of confirmed cases and deaths. The real-time data were collected from the World Health Organization (WHO). Three countries, namely Saudi Arabia, Spain and Italy, were considered to test the proposed models. The results suggest that the LSTM models perform better in predicting the number of coronavirus cases. The standard performance measures mean squared error (MSE), root mean squared error (RMSE), mean error and correlation were employed to evaluate the results of the proposed models. Using the correlation metric, the empirical results of the LSTM model are 99.94%, 99.94% and 99.91% in predicting the number of confirmed cases in the three countries. In predicting the number of Covid-19 deaths, the results of the LSTM model are 99.86%, 98.876% and 99.16% for Saudi Arabia, Italy and Spain, respectively. Similarly, using the correlation metric, the results of the Holt-Trend model in predicting the number of confirmed cases of Covid-19 are 99.06%, 99.96% and 99.94%, whereas its results in predicting the number of deaths are 99.80%, 99.96% and 99.94% for Saudi Arabia, Italy and Spain, respectively. The empirical results indicate that the presented models perform efficiently in predicting the numbers of confirmed cases and deaths of Covid-19 in these countries. Such findings provide better insight into the future course of the Covid-19 pandemic in general. The results were obtained by applying time series models, which deserve consideration as tools that could help save many lives.
Keywords: Deep learning algorithm; Holt-Trend; Covid-19 prediction; machine learning
The Covid-19 pandemic is currently regarded as a threat to global health. Coronaviruses comprise a large family of viruses that may cause diseases in animals and humans. A number of coronaviruses are known to cause respiratory infections in humans, ranging from the common cold to more severe diseases such as Middle East Respiratory Syndrome (MERS) and Severe Acute Respiratory Syndrome (SARS). The recently discovered disease known as Covid-19 is an infectious disease that has spread throughout the world. The disease was unknown prior to its outbreak in Wuhan, China, in December 2019. Fig. 1 shows how SARS-CoV, SARS-CoV2, and Covid-19 can be transmitted from various animals, such as camels, pigs, cows, rats, and bats, to humans. The most common symptoms of Covid-19 are fever, fatigue, and dry cough. Some patients may experience pain, nasal congestion, coldness, sore throat, or diarrhea. These symptoms are usually mild and develop gradually. Some people become infected with the virus without showing any symptoms or feeling sick. Most people (about 80%) are likely to recover from the disease without the need for any special treatment. Approximately one in six people infected with Covid-19 develops severe symptoms. People can be infected with Covid-19 via physical contact with other infected people. The disease can be transmitted from one person to another via the small droplets that become airborne when an infected person coughs or sneezes. These droplets fall on objects and surfaces surrounding the infected person. People who touch these objects or surfaces and then touch their eyes, nose, or mouth can become infected with Covid-19. People can also develop Covid-19 if they breathe in these airborne droplets. Therefore, it is important to keep a distance of about one meter (3 feet) from sick people.
This health crisis has led to significant economic repercussions due to shocks to supply and demand that differ from those of previous crises. Policies are needed to help economies overcome the pandemic while maintaining the integrity of the network of economic and financial relations between workers and businesses, lenders and borrowers, and suppliers and end-users, so that activity can recover once the outbreak ends. The aim is to prevent a temporary crisis from causing permanent harm to people and companies through job loss and bankruptcy. Deaths caused by the outbreak of Covid-19 have increased at an alarming rate, while the disease continues to spread throughout many large countries. The highest priority should be to maintain people's health and safety as much as possible. Countries can help by increasing spending to fight the virus and improve their health care systems, including spending on personal protective equipment, diagnostic testing, and additional hospital beds. Since a vaccine has not yet been found, countries have taken actions to curb the spread of the virus. The economic impact was considerable in the countries that were most affected by the outbreak. For example, in China, activity in the manufacturing and service sectors fell sharply in February. While the fall in activity in the manufacturing sector is comparable to that at the beginning of the global financial crisis, the decline in the service sector appears to be greater due to the significant impact of social distancing.
Machine learning, deep learning, and traditional statistical models can be used to model and forecast Covid-19. In this study, we developed models that can predict the spread of Covid-19 with a high degree of accuracy, using the LSTM and Holt-Trend models to predict the numbers of confirmed cases and deaths in Saudi Arabia, Italy, and Spain. The time series models, applied to real-time data gathered from the WHO, produced satisfactory predictions. The main contributions of the present research are as follows:
1. To present the advanced time series model, namely, the LSTM deep learning model to predict the spread of Covid-19 in Saudi Arabia, Italy, and Spain.
2. To validate the proposed system and examine the reliability of the LSTM model for predicting the spread of Covid-19.
Researchers have used search queries to monitor health care systems [3–10]. Social media has also been used to predict the spread of diseases. Google is one of the most used search engines for finding information about specific diseases [12,13]. The adjusted total search volume for a geographic region, available from January 2004 onward, carries significant patterns that can help to identify many issues. A number of researchers have used Google Trends to identify epidemics of infectious diseases such as influenza, chickenpox, and gastroenteritis. Google Trends data have also been utilized to predict suicide risk at a population-wide level. A number of researchers have developed models for predicting the spread of infectious diseases, and internet search queries have been utilized for this purpose [17–22]. Internet search data are considered crucial for analyzing and predicting new epidemics, as most people around the world use the internet to find information on current events. According to Towers et al., internet search data can be utilized to develop an epidemic detection and surveillance system. For instance, Huang et al. proposed a generalized additive model to predict outbreaks of hand, foot, and mouth disease from internet search queries; the results showed that the model can detect the disease before it spreads. Big data surveillance tools have the benefit of accessibility and can detect infectious disease trends before health officials can. Furthermore, social media is considered a source of information for predicting and analyzing outbreaks. Tenkanen et al. reported that social media generates big data that are relatively easy to collect and can be used freely, as the data are generated continuously in real time and with rich content. Twitter data can be used to predict mental illness and infectious epidemics [27–29], and they can also be utilized for predictions in a variety of other scientific fields [30–33].
Shin et al. created an LSTM model to predict infectious disease outbreaks using Twitter data. They identified a strong relationship between Twitter data and infectious diseases and concluded that it would be possible to develop surveillance systems that predict infectious disease outbreaks from Twitter data. These studies used deep learning algorithms to predict infectious disease outbreaks [34,35]. Deep learning algorithms can be used to analyze big data and to detect patterns in it, and the reported results are satisfactory [36–38]. According to Xu et al., deep learning algorithms perform better than other models, such as the generalized linear model (GLM), the least absolute shrinkage and selection operator (LASSO) model, and the autoregressive integrated moving average (ARIMA) model. Aldhyani et al. proposed an adaptive network fuzzy inference system (ANFIS) to predict the incidence of chronic diseases using Google Trends data. Aldhyani et al. also presented a soft clustering approach that is combined with machine learning algorithms to classify chronic diseases. The objective of the present research is to build an advanced time-series model that can predict the spread of Covid-19.
2 Materials and Methods
This section presents the proposed method for predicting the spread of Covid-19. Fig. 2 displays the overall framework of the proposed model. This research includes real-time data on three countries (collected from the WHO), which were used to test the model. The min–max normalization algorithm was used to normalize the data. The deep learning and Holt-Trend models were used to predict the number of confirmed cases and deaths.
2.1 Research Data
Saudi Arabia, Italy, and Spain are the three countries we examined. Data for 85 days (between 21 January 2020 and 15 April 2020) were used. Confirmed cases were used to predict the future spread of Covid-19. Tab. 1 summarizes the real dataset, which comprises data collected daily.
2.2 Normalization Method
The min–max method was employed in MATLAB to scale the data. This method transforms the data linearly into the range from 0 to 1:
$$x' = \frac{x - x_{min}}{x_{max} - x_{min}} \left( New_{max_x} - New_{min_x} \right) + New_{min_x}$$
where $x_{min}$ is the minimum of the data and $x_{max}$ is the maximum of the data. $New_{min_x}$ is the new minimum, 0, and $New_{max_x}$ is the new maximum, 1.
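The normalization step can be illustrated with a minimal Python sketch. The paper performed this step in MATLAB; the function name `min_max_normalize` and the sample values are hypothetical and serve only as an illustration.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Rescale a sequence linearly so its minimum maps to new_min and its maximum to new_max."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        # A constant series has no spread; map every point to new_min.
        return [new_min for _ in values]
    scale = (new_max - new_min) / (x_max - x_min)
    return [(x - x_min) * scale + new_min for x in values]


# Hypothetical cumulative case counts, for illustration only (not WHO data).
cases = [10, 20, 40, 70, 110]
normalized = min_max_normalize(cases)
```

Because the mapping is linear, predictions made on the normalized scale can be converted back to case counts by inverting the same formula.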
2.3 Prediction Models
This section presents the models used to predict the numbers of confirmed cases and deaths in Saudi Arabia, Italy, and Spain.
2.3.1 Recurrent Neural Network (RNN)
Recurrent neural networks (RNNs) were introduced in the 1980s [42–44]. An RNN consists of an input layer, a hidden layer, and an output layer. It has a chain-like structure of repeating cells, which store significant information from previous processing steps.
The hidden layer is represented by $h_t$, which maps the input $x_t$ to the output $y_t$. Furthermore, RNNs have a recurrent loop that feeds the past hidden state back into the network, so that the output is a function not only of the new input but also of the past hidden layer. Plain RNNs are prone to exploding and vanishing gradients on long sequences; the LSTM variant addresses this issue with a cell state and gates, which allow information to persist. The cell state transmits information between cells alongside the $y_t$ outputs.
Fig. 3 displays a chunk of a neural network; $x_t$ is the input, and $y_t$ is the output. The loop in the figure allows information to pass from one step of the network to the next. The loops can make RNNs appear opaque; however, an RNN can be viewed as multiple copies of the same network, each of which passes a message to its successor. Cell states within the network function much like conveyor belts; they transmit information through the entire chain. The cells have gates, which contain sigmoid functions; the output gate value and $y_t$ are multiplied. The sigmoid function takes values between 0 and 1; a value of 0 lets no information through, whereas a value of 1 lets all of the information through.
Fig. 4 shows an unrolled loop, in which the RNN is presented as a sequence. Consider the hidden layer $h_t$ at time step $t$. In the standard formulation, the hidden state and the output are computed as
$$h_t = \sigma_h\left(W x_t + U h_{t-1} + b_h\right)$$
$$O_t = \sigma_o\left(V h_t + b_o\right)$$
where $h_t$ is the hidden layer corresponding to $x_t$, $h_{t-1}$ is the previous hidden state of the RNN, $x_t$ is the input data, and $O_t$ is the output value. $W$, $U$, and $V$ are the weight matrices of the neural network, and $b$ denotes its bias vectors. The activation functions $\sigma_h$ and $\sigma_o$ are used to transfer the value from the hidden layer to the output. The structure of the long short-term memory cell is shown in Fig. 2. It contains a forget gate ($f_t$), an input gate ($i_t$), an input modulation gate ($m_t$), an output gate ($O_t$), a memory cell ($c_t$), and a hidden state ($h_t$). In the standard LSTM formulation, these gates are computed as follows:
$$f_t = \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right)$$
$$i_t = \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right)$$
$$m_t = \tanh\left(W_m x_t + U_m h_{t-1} + b_m\right)$$
$$O_t = \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right)$$
where $x_t$ is the training input data, $W$ and $U$ are the weight matrices, and $h_{t-1}$ is the previous hidden layer in the long short-term memory network. $\sigma$ is the logistic sigmoid function, $\tanh$ is the hyperbolic tangent function, and $b$ is the bias vector. We computed the memory cell ($c_t$) and the hidden state ($h_t$) with the equations
$$c_t = f_t \odot c_{t-1} + i_t \odot m_t$$
$$h_t = O_t \odot \tanh(c_t)$$
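The gate computations can be traced in a minimal Python sketch of one forward step, assuming the standard LSTM formulation and, for brevity, a single hidden unit with a scalar input. The function names and the weight values are hypothetical and untrained; the paper's MATLAB implementation and its parameters (Tab. 2) are not reproduced here.

```python
import math


def sigmoid(z):
    """Logistic sigmoid, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, params):
    """One forward step of a single-unit LSTM cell with scalar input."""
    W, U, b = params["W"], params["U"], params["b"]
    f_t = sigmoid(W["f"] * x_t + U["f"] * h_prev + b["f"])    # forget gate
    i_t = sigmoid(W["i"] * x_t + U["i"] * h_prev + b["i"])    # input gate
    m_t = math.tanh(W["m"] * x_t + U["m"] * h_prev + b["m"])  # input modulation gate
    o_t = sigmoid(W["o"] * x_t + U["o"] * h_prev + b["o"])    # output gate
    c_t = f_t * c_prev + i_t * m_t   # memory cell update
    h_t = o_t * math.tanh(c_t)       # hidden state
    return h_t, c_t


# Hypothetical, untrained weights chosen only to make the sketch runnable.
params = {
    "W": {"f": 0.5, "i": 0.5, "m": 0.5, "o": 0.5},
    "U": {"f": 0.1, "i": 0.1, "m": 0.1, "o": 0.1},
    "b": {"f": 0.0, "i": 0.0, "m": 0.0, "o": 0.0},
}

h, c = 0.0, 0.0
for x in [0.0, 0.1, 0.3, 0.6, 1.0]:  # a normalized series in [0, 1]
    h, c = lstm_step(x, h, c, params)
```

The forget gate scales the previous memory cell and the input gate scales the new candidate value, which is how the cell decides what to keep across time steps.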
Fig. 6 shows the steps of the LSTM model used to predict the spread of Covid-19. Tab. 2 lists the main parameters of the LSTM algorithm. These parameters have a significant effect on the quality of the predictions.
2.3.2 Holt-Trend Model
Exponential smoothing models are among the most important prediction approaches and are widely used in industry and commerce. The exponential smoothing method is a generalization of the moving average technique. Simple exponential smoothing is suited to time series without a trend. The idea behind exponential smoothing is to smooth the original time-series data in order to forecast future values. The Holt-Trend Exponential Smoothing (HTES) model is similar to weighted exponential smoothing; however, it uses a trend estimator that changes over time. In its standard form, the model is
$$L_t = \alpha x_t + (1 - \alpha)\left(L_{t-1} + b_{t-1}\right)$$
$$b_t = \beta \left(L_t - L_{t-1}\right) + (1 - \beta)\, b_{t-1}$$
$$F_{t+m} = L_t + m\, b_t$$
where $x_t$ is the observed value at time $t$ and $F_{t+m}$ is the forecast value $m$ steps ahead; $L_t$ indicates the estimate of the level of the series at time $t$, and $b_t$ indicates the estimate of the trend at time $t$. The Holt-Trend exponential smoothing method has two smoothing constants, denoted by $\alpha$ and $\beta$. There are two estimators, $L_{t-1}$ and $b_{t-1}$: $L_{t-1}$ refers to the estimate of the level of the time series constructed at time $t-1$ (this is typically called the level component); $b_{t-1}$ refers to the estimate of the growth rate of the time series constructed at time $t-1$ (this is typically called the trend component). Fig. 5 demonstrates the steps of the Holt-Trend algorithm used to predict the spread of Covid-19; $P_i$ is the prediction output and $f_i$ is the forecasted future value.
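A minimal Python sketch of Holt's linear-trend smoothing follows. It assumes the standard level and trend recursions and a common initialization (level set to the first observation, trend set to the first difference); the paper does not state its initialization, and the function name `holt_trend` is hypothetical.

```python
def holt_trend(series, alpha, beta, horizon):
    """Holt's linear-trend exponential smoothing.

    Returns (fitted, forecasts): one-step-ahead fitted values for series[1:],
    and `horizon` forecasts beyond the last observation.
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    # Assumed initialization: level = first observation, trend = first difference.
    level = series[0]
    trend = series[1] - series[0]
    fitted = []
    for x in series[1:]:
        fitted.append(level + trend)  # forecast made before seeing x
        prev_level = level
        level = alpha * x + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    forecasts = [level + m * trend for m in range(1, horizon + 1)]
    return fitted, forecasts


# A perfectly linear toy series: the forecasts simply continue the line.
fitted, forecasts = holt_trend([1.0, 2.0, 3.0, 4.0, 5.0], alpha=0.5, beta=0.2, horizon=3)
```

On this toy series the level tracks the data and the trend stays at 1, so the three forecasts continue the line at 6, 7, and 8.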
2.4 Model Evaluation Criteria
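To evaluate the performance of the LSTM and Holt-Trend models, the mean squared error (MSE), root-mean-square error (RMSE), mean error, and Pearson's correlation coefficient were applied. These standard metrics quantify the prediction errors made by the LSTM and Holt-Trend models.
$$\mathrm{MSE} = \frac{1}{N} \sum_{t=1}^{N} \left(x_t - \hat{x}_t\right)^2$$
where $x_t$ is the observed response, $\hat{x}_t$ is the estimated response, and $N$ is the total number of observations.
$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{t=1}^{N} \left(x_t - \hat{x}_t\right)^2}$$
where $x_t$ is the observed response, $\hat{x}_t$ is the estimated response, and $N$ is the total number of observations.
$$\mathrm{Mean\ error} = \frac{1}{N} \sum_{t=1}^{N} \left(x_t - \hat{x}_t\right)$$
where $x_t$ is the observed response, $\hat{x}_t$ is the estimated response, and $N$ is the total number of observations.
$$r = \frac{n \sum xy - \sum x \sum y}{\sqrt{\left[n \sum x^2 - \left(\sum x\right)^2\right]\left[n \sum y^2 - \left(\sum y\right)^2\right]}}$$
where $r$ is Pearson's correlation coefficient, $x$ is a value in the first set of data, $y$ is a value in the second set of data, and $n$ is the total number of data points.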
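The four evaluation metrics can be computed as in the following Python sketch, which assumes their standard definitions. The function names and the sample values are hypothetical.

```python
import math


def mse(observed, predicted):
    """Mean squared error."""
    return sum((x - p) ** 2 for x, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root-mean-square error."""
    return math.sqrt(mse(observed, predicted))


def mean_error(observed, predicted):
    """Signed mean error; positive when the model under-predicts on average."""
    return sum(x - p for x, p in zip(observed, predicted)) / len(observed)


def pearson_r(x, y):
    """Pearson's correlation coefficient in its computational (sum-based) form."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


# Toy values: every prediction overshoots the observation by 0.5.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]
```

The toy values show why the correlation should be read together with the error metrics: a constant offset leaves the correlation at 1 while the MSE, RMSE, and mean error are all non-zero.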
Our analyses used WHO data for 85 days (21 January 2020 to 15 April 2020). The min–max method was applied for normalization purposes. Saudi Arabia, Spain, and Italy were used to test and evaluate the proposed model. Two advanced time series models were applied to predict confirmed cases and deaths. Two experiments, which are described below, were conducted in MATLAB 2018 to obtain the prediction results.
3.1 Analysis of the LSTM Model
In this section, we evaluate the performance of the LSTM deep learning approach. The real dataset was divided into 80% for training and 20% for testing. The training data were used to fit the model (self-prediction), whereas the testing data were used to validate the proposed model on unseen values. Evaluation metrics (MSE, RMSE, mean error, and R-values) were employed to examine and evaluate the LSTM model. Tab. 3 summarizes the results. The results of applying the LSTM model to Saudi Arabia were 0.00132, 0.0363, 0.00023, and 0.0370 with respect to MSE, RMSE, mean error, and standard deviation, respectively, in the training data; the corresponding testing results are reported in Tab. 3.
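The 80%/20% division can be sketched in Python as a chronological split, so that the testing part always lies after the training part in time. The windowing helper, which turns a series into input and target pairs, is an assumption added for illustration; the paper does not state how inputs were framed, and both function names are hypothetical.

```python
def chronological_split(series, train_fraction=0.8):
    """Split a time series into leading training and trailing testing parts, without shuffling."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


def make_windows(series, lag):
    """Turn a series into (inputs, target) pairs: `lag` past values predict the next one."""
    return [(series[i:i + lag], series[i + lag]) for i in range(len(series) - lag)]


days = list(range(85))  # placeholder for an 85-day series, not WHO data
train, test = chronological_split(days)
pairs = make_windows(train, lag=3)
```

For an 85-day series this yields 68 training days and 17 testing days; shuffling is avoided because it would let the model see observations from the future.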
As shown in Tab. 3, the results obtained from applying the LSTM model to the Italy training data were 0.0028, 0.01676, 2.0558e−05, and 0.0169 in terms of MSE, RMSE, mean error, and standard deviation, respectively. Note that the LSTM's performance was higher here, with a lower RMSE and standard deviation than for the Saudi Arabia data. To validate and test the proposed model, we used the 20% testing portion; the testing results are reported in Tab. 3. Similarly, the LSTM model was applied to the Spain dataset. In predicting the confirmed cases in Spain, the training results were 0.00020, 0.01426, 3.5212e−05, and 0.01441; the testing results are again reported in Tab. 3. The prediction errors are very low, which indicates that the LSTM model is efficient and effective. The model can therefore support the response to Covid-19 by estimating the number of confirmed cases to be expected in the future.
Figs. 6a–6c provide graphical information about the performance of the LSTM model; they show that the training and testing predictions fit the regression line, which indicates that the proposed model is appropriate for predicting cases of Covid-19. The correlation (R) between the observed values and the output values is very high. The empirical results of the validation process further indicate that the constructed model achieves equally strong performance on the testing data. Tab. 4 presents the prediction results of the LSTM for the number of deaths in Saudi Arabia, Italy, and Spain. To validate the proposed model, we divided the data into training and testing sets, with the testing set used to assess forecasts of future values.
The prediction results obtained from the LSTM when predicting deaths in Saudi Arabia were 0.00605, 0.0778, 0.00023, and 0.082 with respect to MSE, RMSE, mean error, and standard deviation, respectively; the remaining results for Saudi Arabia are listed in Tab. 4. The accuracy of the proposed model was similar when predicting the number of deaths in Italy on the training data (see Tab. 4); however, the results on the testing data were 0.09302, 0.305, 0.2991, and 0.0660 with respect to MSE, RMSE, mean error, and standard deviation, respectively. The results of the LSTM when predicting the number of deaths in Spain were 0.00047, 0.02178, 0.000170, and 0.0222 on the training data and 0.01355, 0.1164, 0.0364, and 0.1125 on the test dataset. Figs. 7a–7c are regression plots showing the correlation between the observed number of deaths and the prediction output. The plots show that the training and testing data fit the regression line, which indicates that the proposed model is appropriate for predicting the number of Covid-19 deaths. The correlation between the observed number of deaths and the output value is very high. The empirical results of the validation process further indicate that the constructed model can achieve equally strong performance even if some parameters, such as the number of units or the number of input variables, are changed. Tab. 4 shows the number of deaths obtained from the LSTM model in the testing phase. The testing phase is used to predict unseen data; the prediction results are close to the observed data.
3.2 Analysis of the Holt-Trend Model
The Holt-Trend model is an exponential smoothing model used to predict data with a trend. It has two smoothing constants, one for the level and one for the trend. In these experiments, we tried different values of the level and trend parameters to obtain more accurate predictions. The candidate level (alpha) parameter values were 0.1, 0.5, and 0.15; the trend (beta) parameter value was 0.20. The MSE metric was used to identify the best parameters, that is, the values for which the prediction errors were the lowest according to the standard evaluation metrics.
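The parameter selection can be sketched in Python as a small grid search that keeps the pair with the lowest one-step-ahead MSE. The sketch assumes the standard Holt recursions with the level initialized to the first observation and the trend to the first difference; the series is hypothetical, while the candidate alpha and beta values are those listed in the text. The function names are hypothetical as well.

```python
def holt_one_step_mse(series, alpha, beta):
    """In-sample MSE of Holt's one-step-ahead forecasts for a given (alpha, beta)."""
    # Assumed initialization: level = first observation, trend = first difference.
    level, trend = series[0], series[1] - series[0]
    sq_err = 0.0
    for x in series[1:]:
        sq_err += (x - (level + trend)) ** 2
        prev_level = level
        level = alpha * x + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return sq_err / (len(series) - 1)


def select_parameters(series, alphas, betas):
    """Return (mse, alpha, beta) for the candidate pair with the lowest in-sample MSE."""
    return min((holt_one_step_mse(series, a, b), a, b) for a in alphas for b in betas)


# Hypothetical normalized series, for illustration only (not WHO data).
series = [0.00, 0.02, 0.05, 0.11, 0.20, 0.33, 0.50, 0.71, 1.00]
best_mse, best_alpha, best_beta = select_parameters(
    series, alphas=[0.1, 0.5, 0.15], betas=[0.20]
)
```

Selecting on in-sample error is the simplest choice; evaluating the candidates on a held-out tail of the series would be a stricter alternative.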
Tab. 5 shows the results of the Holt-Trend model's predictions of the number of confirmed cases in the three countries in terms of MSE, RMSE, and mean error. For Saudi Arabia, the results were 0.007, 0.085, and 0.0377, respectively. For Italy, the values are given in Tab. 5; note that the prediction errors are much lower. For Spain, the results were 5.1711e−04, 0.0227, and 0.0085. The selected level and trend parameter values were the most suitable for predicting the number of confirmed cases of Covid-19. Note that the correlation between the observed number of confirmed cases and the prediction was 99.64% for Saudi Arabia, 99.96% for Italy, and 99.94% for Spain. Overall, the Holt-Trend model is able to predict the numbers of confirmed cases of Covid-19 with high accuracy. We also forecasted future values of the number of confirmed cases in the three countries over a one-month interval, from 15-4-2020 to 15-05-2020, by applying the Holt-Trend model to the observation data collected from the WHO. Figs. 8a–8c demonstrate the performance of the Holt-Trend model, and Figs. 9a–9c illustrate its prediction performance. In both cases the trend is going up, which indicates that the number of confirmed cases will increase.
In this section, the Holt-Trend model is applied to predict the number of deaths in the three countries. The Holt-Trend model depends on two constant values, one for the level and one for the trend; these were selected for the same reasons as above.
Tab. 6 shows the results of the Holt-Trend model for predicting the number of deaths in the three countries in terms of MSE, RMSE, and mean error. For Saudi Arabia, the results were 0.0030, 0.0546, and 0.0248, respectively. For Italy, the values are given in Tab. 6; note that the prediction errors were much lower. For Spain, the results were 9.3129e−04, 0.0304, and 0.0145. The selected level and trend parameter values were the most suitable for predicting the number of Covid-19 deaths. Figs. 10a–10c show scatterplots of the correlation and time-series analyses that were applied to assess the association between the observation data and the prediction output. Cross-correlation results were obtained as product-moment correlations between the observed data and the prediction output. The time dependence between the two variables is termed the lag, which indicates the degree and direction of association between the observed values and the predicted values. The correlation between the observed number of deaths and the prediction was 99.80% for Saudi Arabia, 99.96% for Italy, and 99.93% for Spain. Overall, the Holt-Trend model is able to predict the number of Covid-19 deaths with high accuracy.
In this section, we focused on predicting the future values of the number of deaths in the three countries. We forecasted future values over a one-month interval, from 15-4-2020 to 15-05-2020, by applying the Holt-Trend model to data from the WHO. Figs. 11a–11c illustrate the performance of the Holt-Trend model when forecasting the number of deaths in the future. The trend is going up, which suggests that the number of deaths will increase.
This study applied deep learning and the Holt-Trend model to predict the risk of Covid-19 outbreaks based on real-time data collected from the WHO. The main objective of the proposed models is to predict the number of future cases and deaths, so that they can be used to estimate the future risk of Covid-19 outbreaks. Min–max normalization was applied to rescale the data to a common range. For the LSTM algorithm, the data were divided into 80% training (used for self-prediction) and 20% testing (used for validation and future forecasting). The statistical Holt-Trend model was also applied to predict the number of cases and deaths; to validate it, we forecasted future values over a period of 30 days. The prediction results demonstrated that the LSTM and Holt-Trend models can be effectively employed to predict Covid-19 outbreaks by using real-time data gathered from the WHO. Comparative prediction results for the LSTM and Holt-Trend models were presented. Both models showed effective performance according to the MSE, RMSE, mean error, and correlation measures, and both produced satisfactory predictions of Covid-19 cases. One limitation of this study was the lockdown: we could not meet with medical experts due to the quarantine, so we collected the data from the WHO. In future work, we will use Google search terms to predict Covid-19 cases.
Funding Statement: The authors received no specific funding for this study.
Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.