    Machine Learning and Synthetic Minority Oversampling Techniques for Imbalanced Data: Improving Machine Failure Prediction

    Yap Bee Wah1,5,*, Azlan Ismail1,2, Nur Niswah Naslina Azid3, Jafreezal Jaafar4, Izzatdin Abdul Aziz4, Mohd Hilmi Hasan4, Jasni Mohamad Zain1,2

    CMC-Computers, Materials & Continua, Vol.75, No.3, pp. 4821-4841, 2023, DOI:10.32604/cmc.2023.034470

    Abstract Prediction of machine failure is challenging as the dataset is often imbalanced with a low failure rate. The common approach to handle classification involving imbalanced data is to balance the data using a sampling approach such as random undersampling, random oversampling, or Synthetic Minority Oversampling Technique (SMOTE) algorithms. This paper compared the classification performance of three popular classifiers (Logistic Regression, Gaussian Naïve Bayes, and Support Vector Machine) in predicting machine failure in the Oil and Gas industry. The original machine failure dataset consists of 20,473 hourly records and is imbalanced, with 19,945 (97%) ‘non-failure’ and 528 (3%) ‘failure’ records. The…

    Type 2 Diabetes Risk Prediction Using Deep Convolutional Neural Network Based-Bayesian Optimization

    Alawi Alqushaibi1,2,*, Mohd Hilmi Hasan1,2, Said Jadid Abdulkadir1,2, Amgad Muneer1,2, Mohammed Gamal1,2, Qasem Al-Tashi3, Shakirah Mohd Taib1,2, Hitham Alhussian1,2

    CMC-Computers, Materials & Continua, Vol.75, No.2, pp. 3223-3238, 2023, DOI:10.32604/cmc.2023.035655

    Abstract Diabetes mellitus is a long-term condition characterized by hyperglycemia. It can lead to numerous complications. Given rising morbidity in recent years, the number of diabetic patients worldwide is projected to exceed 642 million by 2040, implying that one in ten people will be diabetic. This startling figure requires immediate attention from industry and academia to promote innovation and growth in diabetes risk prediction to save individuals’ lives. Due to its rapid development, deep learning (DL) has been used to predict numerous diseases. However, DL methods still suffer from limited prediction performance due to the hyperparameters…

    BS-SC Model: A Novel Method for Predicting Child Abuse Using Borderline-SMOTE Enabled Stacking Classifier

    Saravanan Parthasarathy, Arun Raj Lakshminarayanan*

    Computer Systems Science and Engineering, Vol.46, No.2, pp. 1311-1336, 2023, DOI:10.32604/csse.2023.034910

    Abstract For a long time, legal entities have developed and used crime prediction methodologies. The techniques are frequently updated based on crime evaluations and responses from scientific communities. There is a need to develop type-based crime prediction methodologies that can be used to address issues at the subgroup level. Child maltreatment is not adequately addressed because children are voiceless. As a result, the possibility of developing a model for predicting child abuse was investigated in this study. Various exploratory analysis methods were used to examine the city of Chicago’s child abuse events. The data set was balanced using the Borderline-SMOTE technique,…

    Hybrid Grey Wolf and Dipper Throated Optimization in Network Intrusion Detection Systems

    Reem Alkanhel1,*, Doaa Sami Khafaga2, El-Sayed M. El-kenawy3, Abdelaziz A. Abdelhamid4,5, Abdelhameed Ibrahim6, Rashid Amin7, Mostafa Abotaleb8, B. M. El-den6

    CMC-Computers, Materials & Continua, Vol.74, No.2, pp. 2695-2709, 2023, DOI:10.32604/cmc.2023.033153

    Abstract The Internet of Things (IoT) is a modern approach that enables remote connection with a wide variety of devices. Due to the resource constraints and open nature of IoT nodes, the routing protocol for low-power and lossy networks (RPL) may be vulnerable to several routing attacks. A network intrusion detection system (NIDS) is therefore needed to guard against routing attacks on RPL-based IoT networks. The imbalance between the false and valid attacks in the training set degrades the performance of machine learning employed to detect network attacks. Therefore, we propose in this paper a novel approach to balance…

    MCBC-SMOTE: A Majority Clustering Model for Classification of Imbalanced Data

    Jyoti Arora1, Meena Tushir2, Keshav Sharma1, Lalit Mohan1, Aman Singh3,*, Abdullah Alharbi4, Wael Alosaimi4

    CMC-Computers, Materials & Continua, Vol.73, No.3, pp. 4801-4817, 2022, DOI:10.32604/cmc.2022.025960

    Abstract Datasets with an imbalanced class distribution are difficult to handle with standard classification algorithms. In supervised learning, dealing with class imbalance is still considered a challenging research problem. Various machine learning techniques are designed to operate on balanced datasets; therefore, state-of-the-art under-sampling, over-sampling, and hybrid strategies have been proposed to deal with imbalanced datasets, but highly skewed datasets still pose the problem of generalization and noise generation during resampling. To overcome these problems, this paper proposes a majority clustering model for classification of imbalanced datasets known…

    Water Quality Index Using Modified Random Forest Technique: Assessing Novel Input Features

    Wen Yee Wong1, Ayman Khallel Ibrahim Al-Ani1, Khairunnisa Hasikin1,*, Anis Salwa Mohd Khairuddin2, Sarah Abdul Razak3, Hanee Farzana Hizaddin4, Mohd Istajib Mokhtar5, Muhammad Mokhzaini Azizan6

    CMES-Computer Modeling in Engineering & Sciences, Vol.132, No.3, pp. 1011-1038, 2022, DOI:10.32604/cmes.2022.019244

    Abstract Water quality analysis is essential to understand the ecological status of aquatic life. Conventional water quality index (WQI) assessment methods are limited to features such as water acidity or basicity (pH), dissolved oxygen (DO), biological oxygen demand (BOD), chemical oxygen demand (COD), ammoniacal nitrogen (NH3-N), and suspended solids (SS). These features are often insufficient to represent the water quality of a heavy metal–polluted river. Therefore, this paper aims to explore and analyze novel input features in order to formulate an improved WQI. In this work, prospective insights on the feasibility of alternative water quality input variables as new discriminant features…

    An Imbalanced Dataset and Class Overlapping Classification Model for Big Data

    Mini Prince1,*, P. M. Joe Prathap2

    Computer Systems Science and Engineering, Vol.44, No.2, pp. 1009-1024, 2023, DOI:10.32604/csse.2023.024277

    Abstract Most modern technologies, such as social media, smart cities, and the internet of things (IoT), rely on big data. When big data is used in real-world applications, two data challenges arise: class overlap and class imbalance. When dealing with large datasets, most traditional classifiers are stuck in the local optimum problem. As a result, it is necessary to look into new methods for dealing with large data collections. Several solutions have been proposed for overcoming this issue. The rapid growth of the available data threatens to limit the usefulness of many traditional methods. Methods such as oversampling and…

    Hyper-Parameter Optimization of Semi-Supervised GANs Based-Sine Cosine Algorithm for Multimedia Datasets

    Anas Al-Ragehi1, Said Jadid Abdulkadir1,2,*, Amgad Muneer1,2, Safwan Sadeq3, Qasem Al-Tashi4,5

    CMC-Computers, Materials & Continua, Vol.73, No.1, pp. 2169-2186, 2022, DOI:10.32604/cmc.2022.027885

    Abstract Generative Adversarial Networks (GANs) are neural networks that allow models to learn deep representations without requiring a large amount of training data. Semi-Supervised GAN Classifiers are a recent innovation in GANs, where GANs are used to classify generated images into real and fake and multiple classes, similar to a general multi-class classifier. However, GANs have a sophisticated design that can be challenging to train. This is because obtaining the proper set of parameters for all models (generator, discriminator, and classifier) is complex. As a result, training a single GAN model for different datasets may not produce satisfactory results. Therefore, this study…

    SMOTEDNN: A Novel Model for Air Pollution Forecasting and AQI Classification

    Mohd Anul Haq*

    CMC-Computers, Materials & Continua, Vol.71, No.1, pp. 1403-1425, 2022, DOI:10.32604/cmc.2022.021968

    Abstract Industrialization and urbanization are rapidly deteriorating ambient air quality, especially in developing nations. Air pollutants pose a high risk to human health and degrade the environment as well. Earlier studies have used machine learning (ML) and statistical modeling to classify and forecast air pollution. However, these methods suffer from the complexity of air pollution datasets, resulting in a lack of efficient classification and forecasting of air pollution. ML-based models suffer from improper data pre-processing, class imbalance issues, data splitting, and hyperparameter tuning. There is a gap in the existing ML-based studies on air pollution due to improper data…

    Improving Routine Immunization Coverage Through Optimally Designed Predictive Models

    Fareeha Sameen1, Abdul Momin Kazi2, Majida Kazmi1,*, Munir A Abbasi3, Saad Ahmed Qazi1,4, Lampros K Stergioulas3,5

    CMC-Computers, Materials & Continua, Vol.70, No.1, pp. 375-395, 2022, DOI:10.32604/cmc.2022.019167

    Abstract Routine immunization (RI) of children is the most effective and timely public health intervention for decreasing child mortality rates around the globe. Pakistan, a low- and middle-income country (LMIC), has one of the highest child mortality rates in the world, mainly due to vaccine-preventable diseases (VPDs). To improve RI coverage, there is a critical need to identify potential RI defaulters at an early stage, so that appropriate interventions can be targeted towards those identified as being at risk of missing their scheduled vaccinations. In this paper, a machine learning (ML) based predictive model has been proposed to…

