Search Results (25)
  • Open Access

    ARTICLE

    Data-Driven Decision-Making for Bank Target Marketing Using Supervised Learning Classifiers on Imbalanced Big Data

    Fahim Nasir1, Abdulghani Ali Ahmed1,*, Mehmet Sabir Kiraz1, Iryna Yevseyeva1, Mubarak Saif2

    CMC-Computers, Materials & Continua, Vol.81, No.1, pp. 1703-1728, 2024, DOI:10.32604/cmc.2024.055192 - 15 October 2024

    Abstract Integrating machine learning and data mining is crucial for processing big data and extracting valuable insights to enhance decision-making. However, imbalanced target variables within big data present technical challenges that hinder the performance of supervised learning classifiers on key evaluation metrics, limiting their overall effectiveness. This study presents a comprehensive review of both common and recently developed Supervised Learning Classifiers (SLCs) and evaluates their performance in data-driven decision-making. The evaluation uses various metrics, with a particular focus on the F1 score (the harmonic mean of precision and recall), on an imbalanced real-world bank target marketing dataset. The findings indicate…

  • Open Access

    ARTICLE

    A Novel Framework for Learning and Classifying the Imbalanced Multi-Label Data

    P. K. A. Chitra1, S. Appavu alias Balamurugan2, S. Geetha3, Seifedine Kadry4,5,6, Jungeun Kim7,*, Keejun Han8

    Computer Systems Science and Engineering, Vol.48, No.5, pp. 1367-1385, 2024, DOI:10.32604/csse.2023.034373 - 13 September 2024

    Abstract Multi-label learning is a generalization of supervised single-label learning based on the assumption that each sample in a dataset may belong to more than one class simultaneously. The main objective of this work is to create a novel framework for learning and classifying imbalanced multi-label data. This work proposes a two-phase framework. In phase 1, the imbalanced distribution of the multi-label dataset is addressed through the proposed Borderline MLSMOTE resampling method. In phase 2, an adaptive weighted ℓ2,1-norm regularized (Elastic-net) multi-label logistic regression is used to predict unseen samples. The proposed…

  • Open Access

    ARTICLE

    Cost-Sensitive Dual-Stream Residual Networks for Imbalanced Classification

    Congcong Ma1,2, Jiaqi Mi1, Wanlin Gao1,2, Sha Tao1,2,*

    CMC-Computers, Materials & Continua, Vol.80, No.3, pp. 4243-4261, 2024, DOI:10.32604/cmc.2024.054506 - 12 September 2024

    Abstract Imbalanced data classification is the task of classifying datasets where there is a significant disparity in the number of samples between different classes. This task is prevalent in practical scenarios such as industrial fault diagnosis, network intrusion detection, cancer detection, etc. In imbalanced classification tasks, the focus is typically on achieving high recognition accuracy for the minority class. However, due to the challenges presented by imbalanced multi-class datasets, such as the scarcity of samples in minority classes and complex inter-class relationships with overlapping boundaries, existing methods often do not perform well in multi-class imbalanced data…

  • Open Access

    ARTICLE

    Learning Vector Quantization-Based Fuzzy Rules Oversampling Method

    Jiqiang Chen, Ranran Han, Dongqing Zhang, Litao Ma*

    CMC-Computers, Materials & Continua, Vol.79, No.3, pp. 5067-5082, 2024, DOI:10.32604/cmc.2024.051494 - 20 June 2024

    Abstract Imbalanced datasets are common in practical applications, and oversampling methods using fuzzy rules have been shown to enhance the classification performance of imbalanced data by taking into account the relationship between data attributes. However, the creation of fuzzy rules typically depends on expert knowledge, which may not fully leverage the label information in training data and may be subjective. To address this issue, a novel fuzzy rule oversampling approach is developed based on the learning vector quantization (LVQ) algorithm. In this method, the label information of the training data is utilized to determine the antecedent…

  • Open Access

    ARTICLE

    An Imbalanced Data Classification Method Based on Hybrid Resampling and Fine Cost Sensitive Support Vector Machine

    Bo Zhu*, Xiaona Jing, Lan Qiu, Runbo Li

    CMC-Computers, Materials & Continua, Vol.79, No.3, pp. 3977-3999, 2024, DOI:10.32604/cmc.2024.048062 - 20 June 2024

    Abstract When building a classification model, the scenario where one class has significantly more samples than the other is called data imbalance. Data imbalance biases the trained classification model toward the majority class (usually defined as the negative class), which may harm the accuracy on the minority class (usually defined as the positive class) and thus lead to poor overall model performance. This article proposes a method called MSHR-FCSSVM for imbalanced data classification, which is based on a new hybrid…

  • Open Access

    ARTICLE

    A Stacked Ensemble Deep Learning Approach for Imbalanced Multi-Class Water Quality Index Prediction

    Wen Yee Wong1, Khairunnisa Hasikin1,*, Anis Salwa Mohd Khairuddin2, Sarah Abdul Razak3, Hanee Farzana Hizaddin4, Mohd Istajib Mokhtar5, Muhammad Mokhzaini Azizan6

    CMC-Computers, Materials & Continua, Vol.76, No.2, pp. 1361-1384, 2023, DOI:10.32604/cmc.2023.038045 - 30 August 2023

    Abstract A common difficulty in building prediction models with real-world environmental datasets is the skewed distribution of classes. There are significantly more samples for day-to-day classes, while rare events such as polluted classes are uncommon. Consequently, the limited availability of minority outcomes lowers the classifier’s overall reliability. This study assesses the capability of machine learning (ML) algorithms in tackling imbalanced water quality data based on the metrics of precision, recall, and F1 score. It aims to correct for accuracy that is misleadingly skewed toward the majority class. Hence, the performance of 10 ML algorithms is compared. The classifiers…

  • Open Access

    ARTICLE

    Machine Learning and Synthetic Minority Oversampling Techniques for Imbalanced Data: Improving Machine Failure Prediction

    Yap Bee Wah1,5,*, Azlan Ismail1,2, Nur Niswah Naslina Azid3, Jafreezal Jaafar4, Izzatdin Abdul Aziz4, Mohd Hilmi Hasan4, Jasni Mohamad Zain1,2

    CMC-Computers, Materials & Continua, Vol.75, No.3, pp. 4821-4841, 2023, DOI:10.32604/cmc.2023.034470 - 29 April 2023

    Abstract Prediction of machine failure is challenging as the dataset is often imbalanced with a low failure rate. The common approach to handling classification involving imbalanced data is to balance the data using a sampling approach such as random undersampling, random oversampling, or Synthetic Minority Oversampling Technique (SMOTE) algorithms. This paper compared the classification performance of three popular classifiers (Logistic Regression, Gaussian Naïve Bayes, and Support Vector Machine) in predicting machine failure in the Oil and Gas industry. The original machine failure dataset consists of 20,473 hourly records and is imbalanced with 19,945 (97%) ‘non-failure’ and…

  • Open Access

    ARTICLE

    Fault Diagnosis of Power Transformer Based on Improved ACGAN Under Imbalanced Data

    Tusongjiang. Kari1, Lin Du1, Aisikaer. Rouzi2, Xiaojing Ma1,*, Zhichao Liu1, Bo Li1

    CMC-Computers, Materials & Continua, Vol.75, No.2, pp. 4573-4592, 2023, DOI:10.32604/cmc.2023.037954 - 31 March 2023

    Abstract The imbalance of dissolved gas analysis (DGA) data will lead to over-fitting, weak generalization and poor recognition performance for fault diagnosis models based on deep learning. To handle this problem, a novel transformer fault diagnosis method based on improved auxiliary classifier generative adversarial network (ACGAN) under imbalanced data is proposed in this paper, which meets both the requirements of balancing DGA data and supplying accurate diagnosis results. The generator combines one-dimensional convolutional neural networks (1D-CNN) and long short-term memories (LSTM), which can deeply extract the features from DGA samples and be greatly beneficial to ACGAN’s…

  • Open Access

    ARTICLE

    Imbalanced Data Classification Using SVM Based on Improved Simulated Annealing Featuring Synthetic Data Generation and Reduction

    Hussein Ibrahim Hussein1, Said Amirul Anwar2,*, Muhammad Imran Ahmad2

    CMC-Computers, Materials & Continua, Vol.75, No.1, pp. 547-564, 2023, DOI:10.32604/cmc.2023.036025 - 06 February 2023

    Abstract Imbalanced data classification is one of the major problems in machine learning. An imbalanced dataset typically has significant differences in the number of data samples between its classes. In most cases, the performance of a machine learning algorithm such as Support Vector Machine (SVM) is affected when dealing with an imbalanced dataset. The classification accuracy is mostly skewed toward the majority class, and poor results are exhibited in the prediction of minority-class samples. In this paper, a hybrid approach combining a data pre-processing technique and an SVM algorithm based on improved Simulated Annealing (SA) is proposed. Firstly,…

  • Open Access

    ARTICLE

    LexDeep: Hybrid Lexicon and Deep Learning Sentiment Analysis Using Twitter for Unemployment-Related Discussions During COVID-19

    Azlinah Mohamed1,3,*, Zuhaira Muhammad Zain2, Hadil Shaiba2,*, Nazik Alturki2, Ghadah Aldehim2, Sapiah Sakri2, Saiful Farik Mat Yatin1, Jasni Mohamad Zain1

    CMC-Computers, Materials & Continua, Vol.75, No.1, pp. 1577-1601, 2023, DOI:10.32604/cmc.2023.034746 - 06 February 2023

    Abstract The COVID-19 pandemic has spread globally, resulting in financial instability in many countries and reductions in the per capita gross domestic product. Sentiment analysis is a cost-effective method for acquiring sentiments based on household income loss, as expressed on social media. However, limited research has been conducted in this domain using the LexDeep approach. This study aimed to explore social trend analytics using LexDeep, which is a hybrid sentiment analysis technique, on Twitter to capture the risk of household income loss during the COVID-19 pandemic. First, tweet data were collected using Twint with relevant keywords…

Displaying results 1-10 of 25.