Clustering-Aided Supervised Malware Detection with Specialized Classifiers and Early Consensus

Murat Dener; Sercan Gulburun

doi:10.32604/cmc.2023.036357


ARTICLE


Information Security Engineering, Graduate School of Natural and Applied Sciences, Gazi University, Ankara, 06560, Turkey

* Corresponding Author: Murat Dener. Email: email

Computers, Materials & Continua 2023, 75(1), 1235-1251. https://doi.org/10.32604/cmc.2023.036357

Received 27 September 2022; Accepted 30 November 2022; Issue published 06 February 2023

Abstract

One of the most common types of threats to the digital world is malicious software. It is of great importance to detect and prevent existing and new malware before it damages information assets. Machine learning approaches are used effectively for this purpose. In this study, we present a model in which supervised and unsupervised learning algorithms are used together, with clustering used to enhance the prediction performance of the supervised classifiers. The aim of the proposed model is to make predictions in the shortest possible time with high accuracy and F1 score. In the first stage of the model, the data are clustered with the k-means algorithm. In the second stage, the prediction is made with the classifier combination that has the best prediction performance for the related cluster. To choose the best classifiers for each cluster, triple combinations of ten machine learning algorithms (kernel support vector machine, k-nearest neighbor, naïve Bayes, decision tree, random forest, extreme gradient boosting, categorical boosting, adaptive boosting, extra trees, and gradient boosting) are evaluated. The selected triple classifier combination is positioned in two tiers, and the prediction time of the model is improved by placing the classifier with the highest prediction time in the second tier. Experiments on the Blue Hexagon Open Dataset for Malware Analysis (BODMAS), Elastic Malware Benchmark for Empowering Researchers (EMBER) 2018, and Kaggle malware detection datasets show that clustering before classification improves prediction performance.
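The per-cluster selection step can be sketched as follows. This is a minimal, stdlib-only Python sketch, not the authors' implementation. It assumes binary labels (1 = malware), that each candidate classifier's predictions on a cluster's validation split and its average prediction time have already been measured, and that a triple combines its members by majority vote. The function names, the short classifier keys ("ada", "knn", "nb", "rf"), and the timing values are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 = 2TP / (2TP + FP + FN) for the positive (malware) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def majority_vote(*votes: Sequence[int]) -> List[int]:
    """Per-sample majority over three binary prediction vectors."""
    return [1 if sum(column) >= 2 else 0 for column in zip(*votes)]


def select_tiered_triple(
    y_true: Sequence[int],
    val_preds: Dict[str, List[int]],  # classifier name -> predictions on this cluster's validation split
    pred_time: Dict[str, float],      # classifier name -> measured average prediction time
) -> Tuple[str, str, str]:
    """Pick the triple with the best majority-vote F1, then order it by prediction time."""
    best_combo, best_f1 = None, -1.0
    for combo in combinations(sorted(val_preds), 3):
        score = f1_score(y_true, majority_vote(*(val_preds[name] for name in combo)))
        if score > best_f1:  # ties keep the first combination found
            best_combo, best_f1 = combo, score
    # The two fastest classifiers form tier 1; the slowest one becomes tier 2.
    fast_a, fast_b, slow = sorted(best_combo, key=lambda name: pred_time[name])
    return fast_a, fast_b, slow


if __name__ == "__main__":
    # Hypothetical validation results for one cluster.
    y_true = [1, 1, 0, 0]
    val_preds = {"ada": [0, 0, 1, 1], "knn": [1, 0, 0, 0], "nb": [0, 1, 0, 0], "rf": [1, 1, 0, 1]}
    pred_time = {"ada": 1.0, "knn": 3.0, "nb": 0.2, "rf": 2.0}
    print(select_tiered_triple(y_true, val_preds, pred_time))  # ('nb', 'rf', 'knn')
```

With ten candidate algorithms, the loop visits C(10, 3) = 120 combinations per cluster; the selection runs once per cluster at training time, so its cost does not affect prediction time.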
The model achieves 99.74% accuracy and 99.77% F1 score on the BODMAS dataset, 99.04% accuracy and 98.63% F1 score on the Kaggle malware detection dataset, and 96.77% accuracy and 96.77% F1 score on the EMBER 2018 dataset. In addition, the tiered positioning of classifiers shortened the average prediction time by 76.13% for the BODMAS dataset and 95.95% for the EMBER 2018 dataset. The prediction performance of the proposed method is better than that of the other studies in the literature that use the BODMAS and EMBER 2018 datasets.
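Why the tiered positioning can shorten prediction time is easiest to see in a sketch of the inference path: k-means routing followed by a two-tier early-consensus vote. Assuming binary labels and a majority vote over the triple, once the two first-tier classifiers agree the majority is already fixed, so the slow classifier only needs to run when they disagree, and the result matches a full three-way vote. This is a hedged, stdlib-only sketch rather than the authors' code; the class and function names are hypothetical, the classifiers are plain callables standing in for trained models, the centroids are assumed to come from a k-means fit on the training data, and the cost model assumes the classifiers run sequentially.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]
Classifier = Callable[[Vector], int]  # returns 0 (benign) or 1 (malware)


def nearest_centroid(x: Vector, centroids: List[Vector]) -> int:
    """k-means assignment step: index of the closest centroid (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda k: sum((xi - ci) ** 2 for xi, ci in zip(x, centroids[k])))


@dataclass
class TieredTriple:
    tier1_a: Classifier
    tier1_b: Classifier
    tier2: Classifier     # the slowest classifier of the selected triple
    tier2_calls: int = 0  # how often the slow classifier had to run

    def predict(self, x: Vector) -> int:
        a, b = self.tier1_a(x), self.tier1_b(x)
        if a == b:
            return a  # early consensus: the majority is decided, tier 2 is skipped
        self.tier2_calls += 1
        return self.tier2(x)  # binary labels: the slow vote breaks the 1-1 tie


@dataclass
class ClusteredTieredModel:
    centroids: List[Vector]
    experts: Dict[int, TieredTriple]  # one specialized triple per cluster

    def predict(self, x: Vector) -> int:
        return self.experts[nearest_centroid(x, self.centroids)].predict(x)


def expected_time(t1: float, t2: float, t3: float, p_disagree: float) -> float:
    """Average per-sample time: tier 1 always runs, tier 2 only when tier 1 disagrees."""
    return t1 + t2 + p_disagree * t3


if __name__ == "__main__":
    def benign(x: Vector) -> int:
        return 0

    def malware(x: Vector) -> int:
        return 1

    def by_first_feature(x: Vector) -> int:  # stand-in for a slow, accurate model
        return int(x[0] > 5.0)

    model = ClusteredTieredModel(
        centroids=[[0.0, 0.0], [10.0, 10.0]],
        experts={0: TieredTriple(benign, benign, malware),
                 1: TieredTriple(benign, malware, by_first_feature)},
    )
    print(model.predict([1.0, 1.0]), model.predict([9.0, 9.0]))  # 0 1
    print(expected_time(1.0, 1.0, 10.0, 0.1))  # 3.0
```

Under this cost model, the saving relative to always running all three classifiers is (1 - p_disagree) * t3 / (t1 + t2 + t3), so the benefit grows when the second-tier classifier is much slower than the other two and the first tier usually agrees.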

Keywords

Malware detection; ensemble learning; classification; clustering; specialized classifier; early consensus

Cite This Article

APA Style

Dener, M., Gulburun, S. (2023). Clustering-Aided Supervised Malware Detection with Specialized Classifiers and Early Consensus. Computers, Materials & Continua, 75(1), 1235–1251. https://doi.org/10.32604/cmc.2023.036357

Vancouver Style

Dener M, Gulburun S. Clustering-Aided Supervised Malware Detection with Specialized Classifiers and Early Consensus. Comput Mater Contin. 2023;75(1):1235–1251. https://doi.org/10.32604/cmc.2023.036357

IEEE Style

M. Dener and S. Gulburun, “Clustering-Aided Supervised Malware Detection with Specialized Classifiers and Early Consensus,” Comput. Mater. Contin., vol. 75, no. 1, pp. 1235–1251, 2023. https://doi.org/10.32604/cmc.2023.036357


Copyright © 2023 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
