Open Access
ARTICLE
Unsupervised Binary Protocol Clustering Based on Maximum Sequential Patterns
1 School of Cyberspace Science, Harbin Institute of Technology, Harbin, 150001, China
2 Science and Technology on Communication Networks Laboratory, Shijiazhuang, 050081, China
3 School of Electrical Engineering and Automation, Harbin Institute of Technology, Harbin, 150001, China
* Corresponding Author: Lin Ye. Email:
(This article belongs to the Special Issue: Blockchain Security)
Computer Modeling in Engineering & Sciences 2022, 130(1), 483-498. https://doi.org/10.32604/cmes.2022.017467
Received 12 May 2021; Accepted 20 July 2021; Issue published 29 November 2021
Abstract
With the rapid development of the Internet, a large number of private protocols have emerged on the network. Some of them are constructed by attackers to evade analysis, which poses a threat to computer network security. Blockchain systems use P2P protocols to implement various functions across the network, and the P2P protocol format used by a blockchain may differ from the standard format specification, so sniffing tools such as Wireshark and Fiddler may fail to recognize it. The ability to distinguish different types of unknown network protocols is therefore vital for network security. In this paper, we propose an unsupervised clustering algorithm for binary protocols based on maximum frequent sequences, which can distinguish various unknown protocols and thereby support the analysis of unknown protocol formats. We first mine the maximum frequent sequences of a protocol message set at byte granularity. Using fuzzy set theory, we then calculate the fuzzy membership of each protocol message to each maximum frequent sequence and construct a fuzzy membership vector for each message. Finally, we adopt K-means++ to split the different types of protocol messages into clusters and evaluate the performance by calculating homogeneity, integrity, and the Fowlkes-Mallows Index (FMI). We also compare the proposed algorithm with clustering algorithms based on Needleman–Wunsch and on fixed-length prefixes. Compared with these traditional clustering methods, our method shows improved clustering performance.
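The membership-vector step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the membership function, so the sketch assumes one: the fraction of a frequent sequence's bytes that appear, in order, in the message (a longest-common-subsequence ratio). The function names, the example patterns, and the example messages are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def lcs_length(a: bytes, b: bytes) -> int:
    """Length of the longest common subsequence of two byte strings."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def membership(message: bytes, pattern: bytes) -> float:
    """Assumed membership degree in [0, 1] (not the paper's definition):
    share of the pattern's bytes found, in order, in the message."""
    if not pattern:
        return 0.0
    return lcs_length(message, pattern) / len(pattern)


def membership_vector(message: bytes, patterns: Sequence[bytes]) -> List[float]:
    """One membership degree per frequent sequence."""
    return [membership(message, p) for p in patterns]


# Hypothetical frequent byte sequences, as if already mined from a message set.
patterns = [b"\x01\x00\x00\x01", b"\x16\x03\x01"]

# Hypothetical messages of two different protocol types.
messages = [
    b"\xab\xcd\x01\x00\x00\x01\x00\x00",
    b"\x16\x03\x01\x02\x00\x01\x00",
]

vectors = [membership_vector(m, patterns) for m in messages]
for message, vector in zip(messages, vectors):
    print(message.hex(), vector)
```

Each message becomes a fixed-length numeric vector regardless of its byte length, which is what allows a distance-based method such as K-means++ to be applied afterwards.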
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.