Open Access
ARTICLE
Improving Performance Prediction on Education Data with Noise and Class Imbalance
a Computer Engineering Department, Istanbul Technical University, Istanbul, Turkey;
b Department of Information Technology, University College of Applied Sciences, Gaza, Palestine;
c tazi.io Machine Learning Solutions, Istanbul, Turkey
* Corresponding Author: Akram M. Radwan
Intelligent Automation & Soft Computing 2018, 24(4), 777-783. https://doi.org/10.1080/10798587.2017.1337673
Abstract
This paper applies machine learning techniques to predict students' performance on two real-world educational data-sets. The first data-set is used to predict the response of students with autism while they learn a specific task, whereas the second is used to predict students' failure at a secondary school. Both data-sets suffer from two major problems that can impair the ability of classification models to predict the correct label: class imbalance and class noise. A series of experiments was carried out to improve the quality of the training data and hence the prediction results. We propose two noise filter methods that eliminate noisy majority-class instances located inside the borderline area. Our methods combine the SMOTE over-sampling technique with thresholding to balance the training data and choose the best boundary between classes, and then apply a noise detection approach to identify the noisy instances. We use the two data-sets to assess the efficacy of class-imbalance approaches as well as both proposed methods. Results for different classifiers show that AUC scores improved significantly when the two proposed methods were combined with existing class-imbalance techniques.
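As a rough illustration of the general idea (balance the classes by over-sampling, then remove majority-class instances that sit inside minority territory), the sketch below may help. It is not the authors' method: the abstract does not specify their filters, so a simple nearest-neighbour vote is swapped in as the noise detector, the thresholding step is omitted, and all function names and the toy data are hypothetical.

```python
import math
import random


def nearest(point, pool, k):
    """Return the k points in pool closest to point, excluding point itself."""
    others = [p for p in pool if p is not point]
    return sorted(others, key=lambda p: math.dist(point, p))[:k]


def smote(minority, n_new, k=3, seed=0):
    """SMOTE-style over-sampling: interpolate between a minority point
    and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbour = rng.choice(nearest(base, minority, k))
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


def filter_majority_noise(majority, minority, k=3):
    """Assumed noise detector (neighbour vote, not from the paper): drop a
    majority point when most of its k nearest neighbours are minority."""
    labelled = [(p, 0) for p in majority] + [(p, 1) for p in minority]
    kept = []
    for p in majority:
        others = [(q, y) for q, y in labelled if q is not p]
        others.sort(key=lambda item: math.dist(p, item[0]))
        minority_votes = sum(y for _, y in others[:k])
        if minority_votes <= k / 2:
            kept.append(p)
    return kept


# Toy data: the last majority point lies inside the minority cluster.
majority = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3),
            (0.2, 0.3), (0.1, 0.0), (2.05, 2.05)]
minority = [(2.0, 2.0), (2.1, 2.2), (2.2, 2.1)]

# Step 1: over-sample the minority class until the classes are the same size.
balanced_minority = minority + smote(minority, len(majority) - len(minority))

# Step 2: remove majority instances that look like class noise.
clean_majority = filter_majority_noise(majority, balanced_minority)
```

Only majority-class instances are candidates for removal here, mirroring the abstract's focus on noisy majority instances; minority instances are never discarded, since they are already scarce.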
This work is licensed under a Creative Commons Attribution 4.0 International License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.