Oversampling Method Based on Gaussian Distribution and K-Means Clustering

Masoud Hassan; Adel Eesa; Ahmed Mohammed; Wahab Arabo

doi:10.32604/cmc.2021.018280


Masoud Muhammed Hassan¹, Adel Sabry Eesa¹,*, Ahmed Jameel Mohammed², Wahab Kh. Arabo¹

1 Department of Computer Science, University of Zakho, Duhok, 42001, Kurdistan Region, Iraq
2 Department of Information Technology, Duhok Polytechnic University, Duhok, 42001, Kurdistan Region, Iraq

* Corresponding Author: Adel Sabry Eesa. Email: email

Computers, Materials & Continua 2021, 69(1), 451-469. https://doi.org/10.32604/cmc.2021.018280

Received 01 March 2021; Accepted 03 April 2021; Issue published 04 June 2021

Abstract

Learning from imbalanced data is one of the most challenging problems in binary classification, and it has gained importance in recent years. When the class distribution is imbalanced, classical machine learning algorithms tend to be strongly biased towards the majority class and disregard the minority class. Accuracy may therefore be high even though the model fails to recognize instances of the minority class, leading to many misclassifications. Different methods have been proposed in the literature to handle the imbalance problem, but most are complicated and tend to generate unnecessary noise. In this paper, we propose a simple oversampling method based on the multivariate Gaussian distribution and K-means clustering, called GK-Means. The new method aims to avoid generating noise and to control imbalances between and within classes. Various experiments have been carried out with six classifiers and four oversampling methods. Experimental results on different imbalanced datasets show that the proposed GK-Means outperforms the other oversampling methods and improves classification performance as measured by F1-score and accuracy.
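The abstract names the two ingredients of GK-Means but does not spell out the procedure, so the following is only an illustrative sketch of one plausible reading, not the authors' implementation. It assumes the minority class is first split into clusters with K-means, and that new samples are then drawn from a multivariate Gaussian fitted to each cluster. The function names `kmeans` and `gaussian_kmeans_oversample`, the parameters `k` and `n_new`, the proportional allocation of samples to clusters, and the small covariance regularization term are all assumptions made for this example.

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's K-means; returns a cluster label for each row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign every point to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster is empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def gaussian_kmeans_oversample(X_min, n_new, k=2, seed=0):
    """Draw n_new synthetic minority samples (hypothetical helper).

    Assumption: one multivariate Gaussian is fitted per K-means cluster
    of the minority class, and clusters receive samples in proportion
    to their size.
    """
    rng = np.random.default_rng(seed)
    labels = kmeans(X_min, k, seed=seed)
    # A covariance estimate needs at least two points per cluster.
    cluster_ids = [j for j in range(k) if np.sum(labels == j) >= 2]
    sizes = np.array([np.sum(labels == j) for j in cluster_ids])
    counts = rng.multinomial(n_new, sizes / sizes.sum())
    samples = []
    for j, n_j in zip(cluster_ids, counts):
        members = X_min[labels == j]
        mean = members.mean(axis=0)
        # A small ridge keeps the covariance positive definite (assumption).
        cov = np.cov(members, rowvar=False) + 1e-6 * np.eye(X_min.shape[1])
        samples.append(rng.multivariate_normal(mean, cov, size=n_j))
    return np.vstack(samples)


# Toy minority class made of two separated groups (synthetic data).
rng = np.random.default_rng(42)
X_min = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(5, 0.5, (15, 2))])
X_new = gaussian_kmeans_oversample(X_min, n_new=65, k=2)
print(X_new.shape)  # (65, 2)
```

Sampling per cluster rather than from one Gaussian over the whole minority class keeps synthetic points near existing groups of minority samples, which is one way to read the stated aims of avoiding noise and controlling within-class imbalance.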

Keywords

Class imbalance; oversampling; Gaussian; multivariate distribution; K-means clustering

Cite This Article

APA Style

Hassan, M.M., Eesa, A.S., Mohammed, A.J., Arabo, W.K. (2021). Oversampling Method Based on Gaussian Distribution and K-Means Clustering. Computers, Materials & Continua, 69(1), 451–469. https://doi.org/10.32604/cmc.2021.018280

Vancouver Style

Hassan MM, Eesa AS, Mohammed AJ, Arabo WK. Oversampling Method Based on Gaussian Distribution and K-Means Clustering. Comput Mater Contin. 2021;69(1):451–469. https://doi.org/10.32604/cmc.2021.018280

IEEE Style

M. M. Hassan, A. S. Eesa, A. J. Mohammed, and W. K. Arabo, “Oversampling Method Based on Gaussian Distribution and K-Means Clustering,” Comput. Mater. Contin., vol. 69, no. 1, pp. 451–469, 2021. https://doi.org/10.32604/cmc.2021.018280


Copyright © 2021 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
