An Imbalanced Data Classification Method Based on Hybrid Resampling and Fine Cost Sensitive Support Vector Machine

Bo Zhu; Xiaona Jing; Lan Qiu; Runbo Li

doi:10.32604/cmc.2024.048062

Open Access

ARTICLE


College of Mechanical and Electrical Engineering, Kunming University of Science & Technology, Kunming, 650500, China

Corresponding Author: Bo Zhu.

Computers, Materials & Continua 2024, 79(3), 3977-3999. https://doi.org/10.32604/cmc.2024.048062

Received 26 November 2023; Accepted 22 March 2024; Issue published 20 June 2024

Abstract

When building a classification model, the scenario in which one class has significantly more samples than the other is called data imbalance. Data imbalance biases the trained classification model toward the majority class (usually defined as the negative class), which can reduce accuracy on the minority class (usually defined as the positive class) and thus degrade the overall performance of the model. This article proposes a method called MSHR-FCSSVM for imbalanced data classification, based on a new hybrid resampling approach (MSHR) and a new fine cost-sensitive support vector machine (CS-SVM) classifier (FCSSVM). MSHR measures the separability of each negative sample by its Silhouette value, calculated with the Mahalanobis distance between samples. Based on these values, the so-called pseudo-negative samples are screened out, used to generate new positive samples through linear interpolation (over-sampling step), and finally deleted (under-sampling step). This approach replaces the pseudo-negative samples one by one with newly generated positive samples, clearing up the inter-class overlap at the borderline without changing the overall size of the dataset. FCSSVM is an improved version of the traditional CS-SVM. It simultaneously considers the influence of both the imbalance in sample numbers and the class distribution on classification, and it accurately adjusts the classification borderline by finely tuning the class cost weights with the rime-ice (RIME) algorithm, an efficient optimization algorithm based on the physical phenomenon of rime-ice, using cross-validation accuracy as the fitness function. To verify the effectiveness of the proposed method, a series of experiments is carried out on 20 imbalanced datasets, including both mildly and extremely imbalanced ones.
The experimental results show that MSHR-FCSSVM outperforms the compared methods in most cases, and that both MSHR and FCSSVM play significant roles.
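The MSHR resampling step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The abstract does not specify which covariance matrix the Mahalanobis distance uses, which Silhouette threshold marks a negative sample as pseudo-negative, or which partner the linear interpolation uses. The sketch therefore assumes the pooled covariance of all samples, a threshold of 0, and interpolation between each pseudo-negative and its nearest positive sample. The function names `mahalanobis_matrix`, `silhouette_values`, and `mshr_sketch` are hypothetical.

```python
import numpy as np


def mahalanobis_matrix(X):
    """Pairwise Mahalanobis distances, assuming the pooled covariance of all of X."""
    VI = np.linalg.pinv(np.cov(X, rowvar=False))      # pseudo-inverse tolerates a singular covariance
    diff = X[:, None, :] - X[None, :, :]               # shape (n, n, d)
    sq = np.einsum("ijk,kl,ijl->ij", diff, VI, diff)   # (x_i - x_j)^T VI (x_i - x_j)
    return np.sqrt(np.clip(sq, 0.0, None))              # clip tiny negative round-off


def silhouette_values(D, y):
    """Two-class Silhouette s(i) = (b - a) / max(a, b) computed from a distance matrix D."""
    s = np.zeros(len(y))
    for i in range(len(y)):
        same = y == y[i]
        same[i] = False                                 # exclude the sample itself
        if not same.any():
            continue                                    # singleton class: Silhouette defined as 0
        a = D[i, same].mean()                           # mean distance to its own class
        b = D[i, y != y[i]].mean()                      # mean distance to the other class
        s[i] = (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return s


def mshr_sketch(X, y, threshold=0.0, seed=0):
    """Hypothetical MSHR-style hybrid resampling; y uses 1 = positive, 0 = negative."""
    rng = np.random.default_rng(seed)
    D = mahalanobis_matrix(X)
    s = silhouette_values(D, y)
    neg, pos = np.flatnonzero(y == 0), np.flatnonzero(y == 1)
    pseudo_neg = neg[s[neg] < threshold]                # poorly separated negatives (assumed rule)
    new_pos = []
    for i in pseudo_neg:
        p = pos[np.argmin(D[i, pos])]                   # nearest positive under Mahalanobis distance
        lam = rng.uniform()                             # interpolation factor in [0, 1)
        new_pos.append(X[p] + lam * (X[i] - X[p]))      # over-sampling: point on the segment p -> i
    keep = np.setdiff1d(np.arange(len(y)), pseudo_neg)  # under-sampling: drop the pseudo-negatives
    X_new = np.vstack([X[keep], np.asarray(new_pos).reshape(-1, X.shape[1])])
    y_new = np.concatenate([y[keep], np.ones(len(new_pos), dtype=y.dtype)])
    return X_new, y_new, pseudo_neg


if __name__ == "__main__":
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [5, 5],
                  [5, 5.5], [5.5, 5], [5.5, 5.5]], dtype=float)
    y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1])  # sample 5 is a negative sitting among the positives
    X_new, y_new, pn = mshr_sketch(X, y)
    print("pseudo-negatives:", pn, "| positives:", (y == 1).sum(), "->", (y_new == 1).sum())
```

Because each deleted pseudo-negative is matched by exactly one synthetic positive, the dataset size stays the same, which mirrors the one-by-one replacement described in the abstract.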
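The FCSSVM idea of tuning the class cost weights against a cross-validation fitness can also be sketched, under several loudly stated assumptions. The abstract does not name the SVM solver or kernel, so this sketch uses a linear CS-SVM trained by full-batch subgradient descent on a class-weighted hinge loss. The RIME optimizer is not reproduced: plain random search over log-spaced weights stands in for it. The weights are searched freely rather than derived from the sample-count imbalance and class distribution as FCSSVM does. The names `train_cs_svm`, `cv_accuracy`, and `tune_cost_weights` are hypothetical.

```python
import numpy as np


def train_cs_svm(X, y, c_pos, c_neg, lam=0.01, epochs=200, lr=0.1):
    """Linear cost-sensitive SVM: minimize lam/2 ||w||^2 + mean_i c_i * hinge_i.

    y must be in {-1, +1}; c_i = c_pos for positives and c_neg for negatives.
    Full-batch subgradient descent (a simplification, not the paper's solver).
    """
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    c = np.where(y > 0, c_pos, c_neg)                  # per-sample misclassification cost
    for _ in range(epochs):
        margin = y * (X @ w + b)
        active = margin < 1                            # samples inside or beyond the margin
        coef = c * y * active                          # weighted hinge subgradient terms
        w -= lr * (lam * w - (coef @ X) / n)
        b -= lr * (-coef.sum() / n)
    return w, b


def cv_accuracy(X, y, c_pos, c_neg, k=5, seed=0):
    """Fitness: mean k-fold cross-validation accuracy, as named in the abstract."""
    idx = np.random.default_rng(seed).permutation(len(y))
    accs = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        w, b = train_cs_svm(X[train], y[train], c_pos, c_neg)
        pred = np.where(X[fold] @ w + b >= 0, 1, -1)
        accs.append(np.mean(pred == y[fold]))
    return float(np.mean(accs))


def tune_cost_weights(X, y, n_trials=30, low=0.1, high=10.0, seed=0):
    """Random search over (c_pos, c_neg): a plain stand-in for the RIME optimizer."""
    rng = np.random.default_rng(seed)
    best = (1.0, 1.0)
    best_fit = cv_accuracy(X, y, *best)
    for _ in range(n_trials):
        cand = tuple(np.exp(rng.uniform(np.log(low), np.log(high), size=2)))
        fit = cv_accuracy(X, y, *cand)
        if fit > best_fit:                             # keep the fittest weight pair seen so far
            best, best_fit = cand, fit
    return best, best_fit


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0.0, 1.0, (40, 2)), rng.normal(3.0, 1.0, (10, 2))])
    y = np.array([-1] * 40 + [1] * 10)                 # 4:1 imbalance, positives are the minority
    (c_pos, c_neg), fit = tune_cost_weights(X, y, n_trials=10)
    print(f"c_pos={c_pos:.3f} c_neg={c_neg:.3f} cv_accuracy={fit:.3f}")
```

Replacing random search with RIME would only change how candidate (c_pos, c_neg) pairs are proposed; the cross-validation fitness function stays the same.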

Keywords

Imbalanced data classification; Silhouette value; Mahalanobis distance; RIME algorithm; CS-SVM

Cite This Article

APA Style

Zhu, B., Jing, X., Qiu, L., & Li, R. (2024). An Imbalanced Data Classification Method Based on Hybrid Resampling and Fine Cost Sensitive Support Vector Machine. Computers, Materials & Continua, 79(3), 3977–3999. https://doi.org/10.32604/cmc.2024.048062

Vancouver Style

Zhu B, Jing X, Qiu L, Li R. An Imbalanced Data Classification Method Based on Hybrid Resampling and Fine Cost Sensitive Support Vector Machine. Comput Mater Contin. 2024;79(3):3977–3999. https://doi.org/10.32604/cmc.2024.048062

IEEE Style

B. Zhu, X. Jing, L. Qiu, and R. Li, “An Imbalanced Data Classification Method Based on Hybrid Resampling and Fine Cost Sensitive Support Vector Machine,” Comput. Mater. Contin., vol. 79, no. 3, pp. 3977–3999, 2024. https://doi.org/10.32604/cmc.2024.048062


Copyright © 2024 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
