Open Access
ARTICLE
Detecting Domain Generation Algorithms with Bi-LSTM
1 School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, 230009, China.
2 Department of Computer Science, University of Texas at Dallas, Campbell Rd, Richardson, Texas, 75080, USA.
* Corresponding Authors: Liang Ding. Email: liangding@hfut.edu.cn;
Yuqi Fan. Email: yuqi.fan@utdallas.edu.
Computers, Materials & Continua 2019, 61(3), 1285-1304. https://doi.org/10.32604/cmc.2019.06160
Abstract
Botnets often use domain generation algorithms (DGAs) to connect to a command and control (C2) server, which lets compromised hosts reach the C2 server by querying many generated domains. Detecting DGA domains is critical both for blocking the C2 server and for identifying the compromised hosts. However, detection is difficult because some DGA domain names look normal. Much of the previous work, based on statistical analysis or machine learning, relies on manually engineered features and contextual information, which leads to long response times and rules out real-time detection. In addition, when a new DGA family appears, the classifier has to be retrained from scratch. This paper presents a deep learning approach for DGA domain detection based on a bidirectional long short-term memory (Bi-LSTM) model. The classifier learns features without manual feature extraction, and the trainable model can effectively deal with new, unknown DGA family members. In addition, the proposed model needs only the domain name, without any additional context information. All domain names are preprocessed into bigrams, and the length of each processed domain name is set to a value longer than most samples. The Bi-LSTM model receives the encoded data and returns labels indicating whether domain names are normal or not. Experiments show that our model outperforms state-of-the-art approaches and is able to detect new DGA families reliably.
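The preprocessing step described in the abstract (bigram encoding followed by a fixed sequence length) can be illustrated with a minimal Python sketch. The abstract does not give the details, so several choices here are assumptions rather than the authors' implementation: the bigrams overlap, index 0 is reserved for padding and index 1 for unseen bigrams, padding is applied on the right, and the helper names `bigrams`, `build_vocab`, and `encode` are hypothetical.

```python
from typing import Dict, List

PAD = 0  # index reserved for padding (assumption, not from the paper)
UNK = 1  # index for bigrams absent from the vocabulary (assumption)


def bigrams(domain: str) -> List[str]:
    """Split a domain name into overlapping character bigrams."""
    name = domain.lower()
    return [name[i:i + 2] for i in range(len(name) - 1)]


def build_vocab(domains: List[str]) -> Dict[str, int]:
    """Assign an integer index to every bigram seen in the given domains."""
    vocab: Dict[str, int] = {}
    for d in domains:
        for bg in bigrams(d):
            if bg not in vocab:
                vocab[bg] = len(vocab) + 2  # 0 and 1 are reserved
    return vocab


def encode(domain: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    """Map a domain to bigram indices, then truncate or right-pad to max_len."""
    ids = [vocab.get(bg, UNK) for bg in bigrams(domain)][:max_len]
    return ids + [PAD] * (max_len - len(ids))


if __name__ == "__main__":
    vocab = build_vocab(["abc.com"])
    # max_len is chosen here to exceed the number of bigrams in the sample
    print(encode("abc.com", vocab, max_len=8))
```

In this sketch `max_len` would be picked so that it exceeds the bigram count of most training samples, matching the abstract's description; the resulting fixed-length integer sequences are the kind of input an embedding layer followed by a bidirectional LSTM would consume.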
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.