Open Access
ARTICLE
Machine Learning Empowered Software Defect Prediction System
1 College of Engineering, Al Ain University, Abu Dhabi, 112612, UAE
2 School of Computer Science, National College of Business Administration & Economics, Lahore, 54000, Pakistan
3 Department of Computer Science, Virtual University of Pakistan, Lahore, 54000, Pakistan
4 Riphah School of Computing & Innovation, Faculty of Computing, Riphah International University, Lahore Campus, Lahore, 54000, Pakistan
5 Pattern Recognition and Machine Learning Lab, Department of Software, Gachon University, Seongnam, 13557, Korea
6 School of Computer Science, Kean University, Union, NJ 07083, USA
7 Department of Computer Science, College of Science and Technology, Wenzhou Kean University, 325060, China
* Corresponding Author: Muhammad Adnan Khan. Email:
Intelligent Automation & Soft Computing 2022, 31(2), 1287-1300. https://doi.org/10.32604/iasc.2022.020362
Received 20 May 2021; Accepted 21 June 2021; Issue published 22 September 2021
Abstract
Production of high-quality software at lower cost has always been a main concern of developers. However, due to exponential increases in size and complexity, developing high-quality software at low cost is extremely difficult. This issue can be addressed by identifying defects at the early stages of the development lifecycle. Because a significant amount of resources is consumed in testing activities, if only those software modules that are identified as defect-prone are shortlisted for testing, the overall cost of development can be reduced while still assuring high quality. Artificial neural networks are among the most widely used machine-learning techniques for predicting defect-prone software modules. In this paper, a cloud-based framework for real-time software-defect prediction is presented. In the proposed framework, an empirical analysis is performed to compare the performance of four training algorithms of the back-propagation technique on software-defect prediction: Bayesian regularization (BR), scaled conjugate gradient (SCG), Broyden–Fletcher–Goldfarb–Shanno (BFGS) quasi-Newton, and Levenberg–Marquardt (LM). The proposed framework also includes a fuzzy layer to identify the best training function based on performance. Publicly available cleaned versions of NASA datasets are used in this study. Various measures are used for performance evaluation, including specificity, precision, recall, F-measure, area under the receiver operating characteristic curve (AUC), accuracy, R2, and mean-square error (MSE). Two graphical user interface tools are developed in MATLAB to implement the proposed framework: the first for comparing training functions and extracting the results, and the second for selecting the best training function using fuzzy logic. The BR training algorithm is selected by the fuzzy layer, as it outperformed the others on most of the performance measures.
The accuracy of the BR training function is also compared with that of other widely used machine-learning techniques, and BR is found to perform the best among all training functions.
Keywords
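As an illustration of the evaluation measures named above, the following sketch (not from the paper, which uses MATLAB tooling) computes specificity, precision, recall, F-measure, accuracy, and the MSE of hard 0/1 predictions from the cells of a binary confusion matrix; the function name and example counts are hypothetical.

```python
# Illustrative only: evaluation measures for binary defect prediction,
# computed from confusion-matrix counts (tp, fp, tn, fn).
def defect_metrics(tp, fp, tn, fn):
    total = tp + fp + tn + fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # also called sensitivity
    specificity = tn / (tn + fp)
    f_measure = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / total
    # For hard 0/1 predictions, mean-square error reduces to the
    # misclassification rate (fp + fn) / total.
    mse = (fp + fn) / total
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f_measure": f_measure,
        "accuracy": accuracy,
        "mse": mse,
    }

# Hypothetical example: 100 modules, 40 true positives, 10 false positives,
# 45 true negatives, 5 false negatives.
m = defect_metrics(tp=40, fp=10, tn=45, fn=5)
print(m["accuracy"])  # 0.85
```

AUC and R2 are omitted here because they require per-module predicted scores rather than confusion-matrix counts alone.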
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.