Open Access
ARTICLE
Machine Learning and Synthetic Minority Oversampling Techniques for Imbalanced Data: Improving Machine Failure Prediction
1 Institute for Big Data Analytics and Artificial Intelligence (IBDAAI), Kompleks Al-Khawarizmi, Universiti Teknologi MARA (UiTM), Shah Alam, 40450, Selangor, Malaysia
2 School of Computing Sciences, College of Computing, Informatics and Media, Universiti Teknologi MARA (UiTM), 40450, Shah Alam, Selangor, Malaysia
3 Mathematical Sciences Studies, College of Computing, Informatics and Media, Universiti Teknologi MARA (UiTM) Kelantan Branch, Machang Campus, Bukit Ilmu, 18500, Machang, Kelantan Darul Naim, Malaysia
4 Centre for Research in Data Science (CeRDaS), Department of Computer and Information Sciences (DCIS), Universiti Teknologi PETRONAS (UTP), Seri Iskandar, 32610, Perak, Malaysia
5 UNITAR International University, Jalan SS6/3, SS6, Petaling Jaya, 47301, Selangor, Malaysia
* Corresponding Author: Yap Bee Wah. Email:
Computers, Materials & Continua 2023, 75(3), 4821-4841. https://doi.org/10.32604/cmc.2023.034470
Received 18 July 2022; Accepted 17 February 2023; Issue published 29 April 2023
Abstract
Prediction of machine failure is challenging because the dataset is often imbalanced, with a low failure rate. The common approach to classification involving imbalanced data is to balance the data using a sampling method such as random undersampling, random oversampling, or the Synthetic Minority Oversampling Technique (SMOTE). This paper compared the classification performance of three popular classifiers (Logistic Regression, Gaussian Naïve Bayes, and Support Vector Machine) in predicting machine failure in the Oil and Gas industry. The original machine failure dataset consists of 20,473 hourly records and is imbalanced, with 19,945 (97%) 'non-failure' and 528 (3%) 'failure' records. The three independent variables used to predict machine failure were a pressure indicator, a flow indicator, and a level indicator. The accuracy of the classifiers on the original dataset is very high, close to 100%, but their sensitivity is close to zero. The performance of the three classifiers was then evaluated on data with different imbalance rates (10% to 50%) generated from the original data using SMOTE, SMOTE-Support Vector Machine (SMOTE-SVM), and SMOTE-Edited Nearest Neighbour (SMOTE-ENN). The classifiers were evaluated based on improvement in sensitivity and F-measure. Results showed that the sensitivity of all classifiers increases as the imbalance rate increases. SVM with a radial basis function (RBF) kernel has the highest sensitivity when the data is balanced (50:50) using SMOTE (Sensitivity_test = 0.5686, F_test = 0.6927) compared to Naïve Bayes (Sensitivity_test = 0.4033, F_test = 0.6218) and Logistic Regression (Sensitivity_test = 0.4194, F_test = 0.621). Overall, the Gaussian Naïve Bayes model consistently improves in sensitivity and F-measure as the imbalance ratio increases, but its sensitivity remains below 50%. The classifiers performed better when the data was balanced using SMOTE-SVM compared to SMOTE and SMOTE-ENN.
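The core idea of SMOTE described above — generating synthetic minority samples by interpolating between a minority point and one of its nearest minority-class neighbours — can be sketched in a few lines of NumPy. This is a minimal illustration only, not the paper's implementation; the sensor data below (pressure, flow, level) and all sample sizes are hypothetical, and practical work would typically use a tested library such as imbalanced-learn.

```python
import numpy as np

def smote(X_min, n_new, k=5, seed=None):
    """Minimal SMOTE sketch: create n_new synthetic minority samples.

    For each synthetic point, pick a random minority sample, choose one
    of its k nearest minority-class neighbours, and linearly interpolate
    between the two.
    """
    rng = np.random.default_rng(seed)
    synth = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        # Euclidean distances from sample i to every minority sample
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbours = np.argsort(d)[1:k + 1]   # k nearest, skipping self
        j = rng.choice(neighbours)
        gap = rng.random()                    # interpolation factor in [0, 1)
        synth.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synth)

# Hypothetical data mimicking the paper's setup: three sensor features
# (pressure, flow, level indicators) with a heavily imbalanced failure label.
rng = np.random.default_rng(0)
X_fail = rng.normal(1.0, 0.3, size=(30, 3))    # 'failure' minority class
X_ok = rng.normal(0.0, 0.3, size=(970, 3))     # 'non-failure' majority class

# Oversample the minority class up to a 50:50 balance,
# the setting that gave the best sensitivity in the study.
X_new = smote(X_fail, n_new=len(X_ok) - len(X_fail), seed=1)
X_fail_balanced = np.vstack([X_fail, X_new])
print(len(X_fail_balanced), len(X_ok))  # both classes now have 970 samples
```

Because every synthetic point lies on a segment between two real minority samples, SMOTE densifies the minority region rather than duplicating records, which is why it raises sensitivity without the exact-copy overfitting risk of random oversampling.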
This work is licensed under a Creative Commons Attribution 4.0 International License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.