An Adaptive-Feature Centric XGBoost Ensemble Classifier Model for Improved Malware Detection and Classification

J. Pavithra; S. Selvakumarasamy

doi:10.32604/jcs.2022.031889

Open Access

ARTICLE

J. Pavithra^*, S. Selvakumarasamy

SRM Institute of Science and Technology, Kattankulathur, 603203, India

* Corresponding Author: J. Pavithra. Email: email

Journal of Cyber Security 2022, 4(3), 135-151. https://doi.org/10.32604/jcs.2022.031889

Received 01 September 2022; Accepted 02 October 2022; Issue published 01 February 2023

Abstract

Machine learning (ML) is widely used for malware detection and classification, and many ML approaches have been adapted to the problem, yet they still perform poorly because of weaknesses in feature selection and classification. To address this problem, this paper presents a novel adaptive feature-centric XGBoost ensemble learner classifier, “AFC-XGBoost”. The proposed model is designed to handle varying malware detection data sets obtained from Kaggle. The model tunes the XGBoost classifier in several stages to optimize performance. In the preprocessing stage, the Feature Base Optimizer (FBO) algorithm applies noise removal, normalization and tamper removal to the given data set. FBO normalizes the data points and removes noise according to the feature values and their base information. The feature selection of standard XGBoost is likewise adapted using the Class Based Principal Component Analysis (CBPCA) algorithm, which selects features according to the fitness of each feature for the different classes. The method then generates a regression tree for each selected feature. From the generated trees, it computes the tree-level ensemble similarity (TLES) and the class-level ensemble similarity (CLES). Both measures are used to calculate the class match similarity (CMS), on which the classification of the malware is based. The proposed approach achieves 97% accuracy in malware detection and classification, with a reduced running time of 34 s for 75,000 samples.

Keywords

Malware detection; machine learning; XGBoost; PCA; ensemble learner; CBPCA; CMS; AFC-XGBoost

Cite This Article

APA Style

Pavithra, J., & Selvakumarasamy, S. (2022). An Adaptive-Feature Centric XGBoost Ensemble Classifier Model for Improved Malware Detection and Classification. Journal of Cyber Security, 4(3), 135–151. https://doi.org/10.32604/jcs.2022.031889

Vancouver Style

Pavithra J, Selvakumarasamy S. An Adaptive-Feature Centric XGBoost Ensemble Classifier Model for Improved Malware Detection and Classification. J Cyber Secur. 2022;4(3):135–151. https://doi.org/10.32604/jcs.2022.031889

IEEE Style

J. Pavithra and S. Selvakumarasamy, “An Adaptive-Feature Centric XGBoost Ensemble Classifier Model for Improved Malware Detection and Classification,” J. Cyber Secur., vol. 4, no. 3, pp. 135–151, 2022. https://doi.org/10.32604/jcs.2022.031889

Copyright © 2022 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
