Open Access
ARTICLE
A Stacked Ensemble Deep Learning Approach for Imbalanced Multi-Class Water Quality Index Prediction
Wen Yee Wong1, Khairunnisa Hasikin1,*, Anis Salwa Mohd Khairuddin2, Sarah Abdul Razak3, Hanee Farzana Hizaddin4, Mohd Istajib Mokhtar5, Muhammad Mokhzaini Azizan6
1 Department of Biomedical Engineering, Faculty of Engineering, University of Malaya, 50603, Kuala Lumpur, Malaysia
2 Department of Electrical Engineering, Faculty of Engineering, University of Malaya, 50603, Kuala Lumpur, Malaysia
3 Institute of Biological Sciences, Faculty of Science, University of Malaya, 50603, Kuala Lumpur, Malaysia
4 Department of Chemical Engineering, Faculty of Engineering, University of Malaya, 50603, Kuala Lumpur, Malaysia
5 Department of Science and Technology Studies, Faculty of Science, University of Malaya, 50603, Kuala Lumpur, Malaysia
6 Department of Electrical and Electronic Engineering, Faculty of Engineering and Built Environment, Universiti Sains Islam Malaysia, Bandar Baru Nilai, 71800, Nilai, Negeri Sembilan, Malaysia
* Corresponding Author: Khairunnisa Hasikin. Email:
Computers, Materials & Continua 2023, 76(2), 1361-1384. https://doi.org/10.32604/cmc.2023.038045
Received 25 November 2022; Accepted 11 April 2023; Issue published 30 August 2023
Abstract
A common difficulty in building prediction models with real-world environmental datasets is the skewed distribution of classes. Samples
of day-to-day classes are far more numerous than samples of rare events such
as polluted classes. Consequently, the limited availability of
minority outcomes lowers the classifier’s overall reliability. This study assesses
the capability of machine learning (ML) algorithms in tackling imbalanced
water quality data based on the metrics of precision, recall, and F1 score. The
aim is to offset accuracy figures that are misleadingly dominated by the majority classes. To this end,
the performance of 10 ML algorithms is compared. The classifiers included are
AdaBoost, Support Vector Machine, Linear Discriminant Analysis, k-Nearest
Neighbors, Naïve Bayes, Decision Trees, Random Forest, Extra Trees, Bagging, and the Multilayer Perceptron. This study also uses the Easy Ensemble
Classifier, Balanced Bagging, and RUSBoost algorithm to evaluate multi-class
imbalanced learning methods. The comparison results revealed that a high-accuracy machine learning model does not always achieve good recall and sensitivity.
This paper’s stacked ensemble deep learning (SE-DL) generalization model
effectively classifies the water quality index (WQI) based on 23 input variables.
The proposed algorithm achieved a remarkable average of 95.69%, 94.96%,
92.92%, and 93.88% for accuracy, precision, recall, and F1 score, respectively.
In addition, the proposed model is compared against two state-of-the-art
classifiers, XGBoost (eXtreme Gradient Boosting) and Light Gradient
Boosting Machine, with balanced accuracy and G-mean added as
performance metrics. In this comparison, XGBoost attained a
higher balanced accuracy and G-mean. However, the SE-DL model has a
better and more balanced performance in the F1 score. The SE-DL model
aligns with the goal of this study, which is to balance accuracy and completeness for each water quality class. The proposed algorithm is also
more efficient, requiring less computational time than the
standard Synthetic Minority Oversampling Technique (SMOTE) approach to
imbalanced datasets.
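The gap between accuracy and the class-sensitive metrics named in the abstract can be illustrated with a minimal sketch. It is not the study's code or data: the class names, sample counts, and function names are hypothetical, and the "classifier" simply predicts the majority class for every sample. Balanced accuracy is taken as the mean of per-class recalls, and G-mean is assumed to follow the common multi-class definition, the geometric mean of per-class recalls.

```python
from math import prod


def per_class_metrics(y_true, y_pred, labels):
    """Return {label: (precision, recall, f1)} computed one-vs-rest."""
    pairs = list(zip(y_true, y_pred))
    metrics = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        # Undefined ratios (zero denominator) are reported as 0.0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[c] = (precision, recall, f1)
    return metrics


def summarize(y_true, y_pred, labels):
    """Aggregate accuracy, macro-averaged scores, balanced accuracy, G-mean."""
    m = per_class_metrics(y_true, y_pred, labels)
    n = len(labels)
    recalls = [m[c][1] for c in labels]
    return {
        "accuracy": sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true),
        "macro_precision": sum(m[c][0] for c in labels) / n,
        "macro_recall": sum(recalls) / n,
        "macro_f1": sum(m[c][2] for c in labels) / n,
        # Balanced accuracy is the unweighted mean of per-class recalls.
        "balanced_accuracy": sum(recalls) / n,
        # Assumed definition: geometric mean of per-class recalls.
        "g_mean": prod(recalls) ** (1 / n),
    }


# Hypothetical, heavily skewed label set (not the study's data).
labels = ["clean", "moderate", "polluted"]
y_true = ["clean"] * 90 + ["moderate"] * 8 + ["polluted"] * 2
# Degenerate model: always predicts the majority class.
y_pred = ["clean"] * 100

scores = summarize(y_true, y_pred, labels)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

On this toy input the accuracy is 0.90 while the balanced accuracy is about 0.33 and the G-mean is 0, because both minority classes are never predicted. This is the kind of discrepancy that motivates reporting precision, recall, and F1 score alongside accuracy.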