Open Access

ARTICLE

A Learning-based Static Malware Detection System with Integrated Feature

Zhiguo Chen1,*, Xiaorui Zhang1,2, Sungryul Kim3

1 School of Computer and Software, Nanjing University of Information Science and Technology, Nanjing, 210044, China
2 Jiangsu Engineering Center of Network Monitoring, Engineering Research Center of Digital Forensics, Ministry of Education, Nanjing, 210044, China
3 Department of Internet and Multimedia Engineering, Konkuk University, Seoul, 05029, Korea

* Corresponding Author: Zhiguo Chen.

Intelligent Automation & Soft Computing 2021, 27(3), 891-908. https://doi.org/10.32604/iasc.2021.016933

Abstract

The rapid growth of malware poses a significant threat to the security of computer systems. Analysts now need to examine thousands of malware samples daily, and determining whether a program is benign or malicious has become a challenging task. Making accurate decisions about a program is crucial for anti-malware products, so precise malware detection has become a prominent topic in computer security. Traditional malware detection relies on signature-based strategies, the most widespread method in commercial anti-malware software. This method works well against known malware but cannot detect new malware. To overcome this deficiency of the signature-based approach, we propose a static malware detection system that uses data mining techniques to identify known and unknown malware by comparing the profiles of malware and benign programs, with real-time response and a low false-positive rate. The proposed system includes a sample labeling module, a feature extraction module, a pre-processing module, and a decision module. The sample labeling module uses VirusTotal to correctly label the collected samples. The feature extraction module statically extracts header information, section entropy, APIs, and section opcode n-grams. The pre-processing module is primarily based on the PCA algorithm, which reduces the dimensionality of the features and thus the computational overhead. The decision module uses various machine-learning algorithms, such as K-Nearest Neighbors (KNN), Decision Tree (DT), Gradient Boosting Decision Tree (GBDT), and Extreme Gradient Boosting (XGBoost), to build the detection model that judges whether a program is benign or malware. The experimental results indicate that the proposed system achieves 99.56% detection accuracy and a 99.55% F1-score on the 79 extracted features using the XGBoost algorithm, and that it has potential for real-time, large-scale malware detection tasks.
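Two of the static features named in the abstract, section entropy and opcode n-grams, can be illustrated with a minimal sketch. The abstract does not say how either feature is computed, so this assumes Shannon entropy over a section's raw bytes and plain counts of contiguous opcode n-grams. The function names `shannon_entropy` and `opcode_ngrams` and the sample inputs are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    counts = Counter(data)
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def opcode_ngrams(opcodes, n=2):
    """Count contiguous n-grams in a sequence of opcode mnemonics."""
    return Counter(tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1))


# Hypothetical inputs: a real system would take the bytes from a parsed
# executable section and the mnemonics from a disassembler.
section_bytes = bytes(range(256))  # every byte value exactly once
opcodes = ["push", "mov", "push", "mov", "call"]

print(shannon_entropy(section_bytes))              # 8.0, the maximum for byte data
print(opcode_ngrams(opcodes, n=2).most_common(1))  # [(('push', 'mov'), 2)]
```

Entropy close to 8 bits per byte is commonly associated with packed or encrypted sections, which is one reason it is a popular static feature. N-gram counts like these would normally be mapped to a fixed-length vector before dimensionality reduction and classification.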

Cite This Article

Z. Chen, X. Zhang and S. Kim, "A learning-based static malware detection system with integrated feature," Intelligent Automation & Soft Computing, vol. 27, no. 3, pp. 891–908, 2021.

This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.