Open Access
ARTICLE
Water Quality Index Using Modified Random Forest Technique: Assessing Novel Input Features
Wen Yee Wong1, Ayman Khallel Ibrahim Al-Ani1, Khairunnisa Hasikin1,*, Anis Salwa Mohd Khairuddin2, Sarah Abdul Razak3, Hanee Farzana Hizaddin4, Mohd Istajib Mokhtar5, Muhammad Mokhzaini Azizan6
1 Department of Biomedical Engineering, Faculty of Engineering, Universiti Malaya, Kuala Lumpur, 50603, Malaysia
2 Department of Electrical Engineering, Faculty of Engineering, Universiti Malaya, Kuala Lumpur, 50603, Malaysia
3 Institute of Biological Sciences, Faculty of Science, Universiti Malaya, Kuala Lumpur, 50603, Malaysia
4 Department of Chemical Engineering, Faculty of Engineering, Universiti Malaya, Kuala Lumpur, 50603, Malaysia
5 Department of Science and Technology Studies, Faculty of Science, Universiti Malaya, Kuala Lumpur, 50603, Malaysia
6 Department of Electrical and Electronic Engineering, Faculty of Engineering and Built Environment, Universiti Sains Islam Malaysia, Bandar Baru Nilai, Nilai, Negeri Sembilan, 71800, Malaysia
* Corresponding Author: Khairunnisa Hasikin. Email:
(This article belongs to this Special Issue: Computer Modeling for Smart Cities Applications)
Computer Modeling in Engineering & Sciences 2022, 132(3), 1011-1038. https://doi.org/10.32604/cmes.2022.019244
Received 01 September 2021; Accepted 27 January 2022; Issue published 27 June 2022
Abstract
Water quality analysis is essential to understand the ecological status of aquatic life. Conventional water quality
index (WQI) assessment methods are limited to features such as water acidity or basicity (pH), dissolved oxygen
(DO), biological oxygen demand (BOD), chemical oxygen demand (COD), ammoniacal nitrogen (NH3-N), and
suspended solids (SS). These features are often insufficient to represent the water quality of a heavy metal–polluted
river. Therefore, this paper aims to explore and analyze novel input features in order to formulate an improved
WQI. In this work, prospective insights on the feasibility of alternative water quality input variables as new
discriminant features are discussed. The new discriminant features are a step toward formulating adaptive water
quality parameters according to the land use activities surrounding the river. The results and analysis obtained
from this study demonstrate the feasibility of predicting WQI using new input features. This work analyzes 17
new input features, namely conductivity (COND), salinity (SAL), turbidity (TUR), dissolved solids (DS), nitrate
(NO3), chloride (Cl), phosphate (PO4), arsenic (As), chromium (Cr), zinc (Zn), calcium (Ca), iron (Fe), potassium
(K), magnesium (Mg), sodium (Na), E. coli, and total coliform, in predicting WQI using machine learning
techniques. Five regression algorithms, namely random forest (RF), AdaBoost, support vector regression (SVR), decision
tree regression (DTR), and multilayer perceptron (MLP), are applied for preliminary model selection. The results
show that the RF algorithm exhibits better prediction performance than the other algorithms, with R² of 0.974. Then, this work proposes a
modified RF by incorporating the synthetic minority oversampling technique (SMOTE) into the conventional RF
method. The proposed modified RF method achieves 77.68% accuracy, 74% precision, 69% recall, and a 71% F1-score.
In addition, a sensitivity analysis highlights the importance of the turbidity variable in WQI prediction, and more
generally of certain water quality variables that are not present in the conventional WQI formulation.
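As an illustration of the oversampling idea behind the modified RF, the following is a minimal sketch of SMOTE-style interpolation, not the authors' implementation. The function name `smote`, the parameter `k`, and the toy two-feature data are hypothetical and chosen only for this example. Each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its nearest minority-class neighbours.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic samples by interpolating between a randomly
    chosen minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: two features per sample for an under-represented class.
minority_class = [[1.0, 10.0], [1.2, 11.0], [0.9, 9.5], [1.1, 10.5]]
new_samples = smote(minority_class, n_new=6)
balanced_minority = minority_class + new_samples
```

In a pipeline of the kind the abstract describes, the oversampled training set would then be passed to a conventional random forest. Oversampling would normally be applied to the training split only, so that synthetic samples do not leak into the evaluation data.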
Cite This Article
Wong, W. Y., Khallel, A., Hasikin, K., Salwa, A., Razak, S. A. et al. (2022). Water Quality Index Using Modified Random Forest Technique: Assessing Novel Input Features. CMES-Computer Modeling in Engineering & Sciences, 132(3), 1011–1038.