Open Access
ARTICLE
GA-Stacking: A New Stacking-Based Ensemble Learning Method to Forecast the COVID-19 Outbreak
1 Department of Management Information Systems, College of Business Administration, Al Yamamah University, Riyadh, 11512, Saudi Arabia
2 Faculty of Computers and Information, Minia University, Minia, 61519, Egypt
3 Information Systems Department, College of Computer and Information Sciences, King Saud University, Riyadh, 4545, Saudi Arabia
4 Computer Engineering Department, College of Engineering and Architecture, Al Yamamah University, Riyadh, 11512, Saudi Arabia
* Corresponding Author: Walaa N. Ismail. Email:
Computers, Materials & Continua 2023, 74(2), 3945-3976. https://doi.org/10.32604/cmc.2023.031194
Received 12 April 2022; Accepted 29 June 2022; Issue published 31 October 2022
Abstract
As a result of the increased number of COVID-19 cases, Ensemble Machine Learning (EML) would be an effective tool for combatting this pandemic outbreak. An ensemble of classifiers can improve the performance of single machine learning (ML) classifiers, especially stacking-based ensemble learning. Stacking utilizes heterogeneous-base learners trained in parallel and combines their predictions using a meta-model to determine the final prediction results. However, building an ensemble often causes the model performance to decrease due to the increasing number of learners that are not being properly selected. Therefore, the goal of this paper is to develop and evaluate a generic, data-independent predictive method using stacked-based ensemble learning (GA-Stacking) optimized by a Genetic Algorithm (GA) for outbreak prediction and health decision aided processes. GA-Stacking utilizes five well-known classifiers, including Decision Tree (DT), Random Forest (RF), RIGID regression, Least Absolute Shrinkage and Selection Operator (LASSO), and eXtreme Gradient Boosting (XGBoost), at its first level. It also introduces GA to identify comparisons to forecast the number, combination, and trust of these base classifiers based on the Mean Squared Error (MSE) as a fitness function. At the second level of the stacked ensemble model, a Linear Regression (LR) classifier is used to produce the final prediction. The performance of the model was evaluated using a publicly available dataset from the Center for Systems Science and Engineering, Johns Hopkins University, which consisted of 10,722 data samples. The experimental results indicated that the GA-Stacking model achieved outstanding performance with an overall accuracy of 99.99% for the three selected countries. Furthermore, the proposed model achieved good performance when compared with existing bagging-based approaches. The proposed model can be used to predict the pandemic outbreak correctly and may be applied as a generic data-independent model to predict the epidemic trend for other countries when comparing preventive and control measures.Keywords
Cite This Article
This work is licensed under a Creative Commons Attribution 4.0 International License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.