Open Access
ARTICLE
A Novel Framework for Learning and Classifying the Imbalanced Multi-Label Data
1 Department of Computer Science and Engineering, SRM Institute of Science and Technology, Tiruchirappalli, Tamil Nadu, 603203, India
2 Department of Computer Science and Engineering, Periyar Maniammai Institute of Science & Technology (Deemed to be University), Thanjavur, Tamil Nadu, 613403, India
3 School of Computer Science and Engineering, Chennai, Tamil Nadu, 600048, India
4 Department of Applied Data Science, Noroff University College, Kristiansand, 4612, Norway
5 Artificial Intelligence Research Center (AIRC), College of Engineering and Information Technology, Ajman University, P.O. Box 346, Ajman, United Arab Emirates
6 Department of Electrical and Computer Engineering, Lebanese American University, Byblos, 10150, Lebanon
7 Department of Software, Kongju National University, Cheonan, 31080, Republic of Korea
8 Division of Computer Engineering, Hansung University, Seoul, 02876, Republic of Korea
* Corresponding Author: Jungeun Kim. Email:
Computer Systems Science and Engineering 2024, 48(5), 1367-1385. https://doi.org/10.32604/csse.2023.034373
Received 15 July 2022; Accepted 14 December 2022; Issue published 13 September 2024
Abstract
Multi-label learning generalizes supervised single-label learning by assuming that each sample in a dataset may belong to more than one class simultaneously. The main objective of this work is to create a novel framework for learning and classifying imbalanced multi-label data. The proposed framework has two phases. In phase 1, the imbalanced distribution of the multi-label dataset is addressed through the proposed Borderline MLSMOTE resampling method. In phase 2, an adaptive weighted l2,1-norm regularized (Elastic-net) multi-label logistic regression is used to predict unseen samples. In contrast to conventional MLSMOTE, the proposed Borderline MLSMOTE resampling method focuses on samples with high label concurrence. The minority labels in these samples are called difficult minority labels and are more likely to degrade classification performance. The concurrence measure is treated as the borderline criterion, and the labels associated with these samples are regarded as borderline labels at the decision boundary. The novel adaptive weighted l2,1-norm regularized multi-label logistic regression of phase 2 then handles the balanced data, in which the synthetic samples carry different weights. Experiments on various benchmark datasets show that the proposed method outperforms conventional state-of-the-art multi-label methods in predictive performance.
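To make the phase 2 objective more concrete, the following is a minimal sketch of a sample-weighted multi-label logistic loss with an l2,1-norm penalty. It is not the authors' implementation: the function names, the normalization by the total sample weight, the single regularization strength `lam`, and the way weights are passed in are all assumptions made for illustration, and the adaptive weighting scheme described in the paper is not reproduced here.

```python
import numpy as np


def l21_norm(W):
    """l2,1 norm: the sum of the Euclidean norms of the rows of W."""
    return float(np.sum(np.sqrt(np.sum(W ** 2, axis=1))))


def weighted_multilabel_logistic_loss(X, Y, W, sample_weights, lam):
    """Sample-weighted logistic loss over all labels plus an l2,1 penalty.

    X: (n, d) feature matrix
    Y: (n, q) binary label matrix with entries in {0, 1}
    W: (d, q) coefficient matrix, one column per label
    sample_weights: (n,) non-negative weights (hypothetical: e.g. lower
        weights for synthetic samples than for original ones)
    lam: regularization strength (hypothetical parameter name)
    """
    Z = X @ W
    # Binary cross-entropy written as log(1 + exp(z)) - y * z,
    # using logaddexp for numerical stability.
    per_entry = np.logaddexp(0.0, Z) - Y * Z
    per_sample = per_entry.sum(axis=1)
    data_term = float(np.sum(sample_weights * per_sample) / np.sum(sample_weights))
    return data_term + lam * l21_norm(W)
```

Because the penalty sums row norms, it pushes entire rows of `W` toward zero, which discards a feature for all labels at once rather than label by label.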
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.