Open Access
ARTICLE
Outsmarting Android Malware with Cutting-Edge Feature Engineering and Machine Learning Techniques
1 Faculty of Information Technology, Beijing University of Technology, Beijing, 100124, China
2 Artificial Intelligence and Data Analytics (AIDA) Lab, CCIS Prince Sultan University, Riyadh, 11586, Saudi Arabia
3 Faculty of Information Sciences, University of Education, Vehari Campus, Vehari, 61100, Pakistan
4 Department of Mathematical Sciences, College of Science, Princess Nourah bint Abdulrahman University, Riyadh, 84428, Saudi Arabia
* Corresponding Author: Faten S. Alamri. Email:
Computers, Materials & Continua 2024, 79(1), 651-673. https://doi.org/10.32604/cmc.2024.047530
Received 08 November 2023; Accepted 11 February 2024; Issue published 25 April 2024
Abstract
The growing usage of Android smartphones has led to a significant rise in incidents of Android malware and privacy breaches. This escalating security concern necessitates the development of advanced technologies capable of automatically detecting and mitigating malicious activities in Android applications (apps). Such technologies are crucial for safeguarding user data and maintaining the integrity of mobile devices in an increasingly digital world. Current methods for detecting sensitive data leaks in Android apps are hampered by two major limitations: they require substantial computational resources, and they are prone to a high frequency of false positives. In practice, these methods consume considerable processing power and mistakenly flag benign activities as malicious, which leads to inefficiencies and reduced reliability in malware detection. The proposed approach includes a data preprocessing step that removes duplicate samples, manages unbalanced datasets, corrects inconsistencies, and imputes missing values to ensure data accuracy. The Minimax (min-max) method is then used to normalize numerical data, followed by feature vector extraction using the Gain ratio and Chi-squared test to identify and extract the most significant characteristics using an appropriate prediction model. This study focuses on extracting a subset of attributes best suited for the task and recommending a predictive model based on domain expert opinion. The proposed method is evaluated using the Drebin and TUANDROMD datasets, which contain 15,036 and 4,464 samples (benign and malicious), respectively. The empirical results show that the Random Forest (RF) and Support Vector Machine (SVC) classifiers achieved accuracy rates of 98.9% and 98.8%, respectively, in detecting unknown Android malware.
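The normalization and feature-scoring steps named above can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation: the function names, the toy permission flags (`send_sms`, `internet`), and the labels are hypothetical, and the chi-squared score here is the plain Pearson statistic on a feature/class contingency table.

```python
import math
from collections import Counter


def min_max_normalize(values):
    """Scale a numeric column to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def chi_squared(feature, labels):
    """Pearson chi-squared statistic of a categorical feature vs. class labels."""
    n = len(labels)
    f_counts, l_counts = Counter(feature), Counter(labels)
    joint = Counter(zip(feature, labels))
    stat = 0.0
    for f, fc in f_counts.items():
        for l, lc in l_counts.items():
            expected = fc * lc / n  # count expected under independence
            observed = joint.get((f, l), 0)
            stat += (observed - expected) ** 2 / expected
    return stat


def entropy(items):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())


def gain_ratio(feature, labels):
    """Information gain of the feature divided by its split information."""
    n = len(labels)
    conditional = 0.0
    for value, count in Counter(feature).items():
        subset = [l for f, l in zip(feature, labels) if f == value]
        conditional += (count / n) * entropy(subset)
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature)
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy data: 1 = malicious, 0 = benign.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
send_sms = [1, 1, 1, 0, 0, 0, 0, 0]  # mostly set in malicious samples
internet = [1, 0, 1, 0, 1, 0, 1, 0]  # independent of the label

scores = {
    name: (chi_squared(col, labels), gain_ratio(col, labels))
    for name, col in {"send_sms": send_sms, "internet": internet}.items()
}
# Rank features by chi-squared score, highest first.
ranked = sorted(scores, key=lambda name: scores[name][0], reverse=True)
```

On this toy data the feature that tracks the label scores higher on both criteria, while the independent one scores zero, which is the property a filter-style selector relies on when keeping only the top-ranked attributes.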
A sensitivity analysis experiment was also carried out on all three ML-based classifiers based on MAE, MSE, R², and sensitivity parameters, resulting in a flawless performance for both datasets. This approach has substantial potential for real-world applications and can serve as a valuable tool for preventing the spread of Android malware and enhancing mobile device security.
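For reference, the four evaluation measures mentioned above can be computed for binary labels as in the following standard-library Python sketch. The function names and the toy label vectors are hypothetical and are not taken from the paper's experiments.

```python
def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all true labels are identical")
    return 1 - ss_res / ss_tot


def sensitivity(y_true, y_pred, positive=1):
    """True-positive rate (recall on the positive class): TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp + fn == 0:
        raise ValueError("sensitivity is undefined without positive samples")
    return tp / (tp + fn)


# Hypothetical toy labels: 1 = malicious, 0 = benign.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]  # one missed malware, one false alarm

report = {
    "MAE": mae(y_true, y_pred),
    "MSE": mse(y_true, y_pred),
    "R2": r2(y_true, y_pred),
    "sensitivity": sensitivity(y_true, y_pred),
}
```

With 0/1 labels, MAE and MSE both reduce to the misclassification rate, and a classifier that predicts every sample correctly yields MAE = 0, MSE = 0, R² = 1, and sensitivity = 1.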
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.