Unsupervised Binary Protocol Clustering Based on Maximum Sequential Patterns

Jiaxin Shi; Lin Ye; Zhongwei Li; Dongyang Zhan

doi:10.32604/cmes.2022.017467

Open Access

ARTICLE

Jiaxin Shi¹, Lin Ye¹,²,*, Zhongwei Li³, Dongyang Zhan¹

1 School of Cyberspace Science, Harbin Institute of Technology, Harbin, 150001, China
2 Science and Technology on Communication Networks Laboratory, Shijiazhuang, 050081, China
3 School of Electrical Engineering and Automation, Harbin Institute of Technology, Harbin, 150001, China

* Corresponding Author: Lin Ye. Email: email

(This article belongs to the Special Issue: Blockchain Security)

Computer Modeling in Engineering & Sciences 2022, 130(1), 483-498. https://doi.org/10.32604/cmes.2022.017467

Received 12 May 2021; Accepted 20 July 2021; Issue published 29 November 2021

Abstract

With the rapid development of the Internet, a large number of private protocols have emerged on the network. Some of them are constructed by attackers to evade analysis, posing a threat to computer network security. Blockchain systems use P2P protocols to implement various functions across the network, and their P2P message formats may deviate from the standard format specification, so sniffing tools such as Wireshark and Fiddler cannot recognize them. The ability to distinguish different types of unknown network protocols is therefore vital for network security. In this paper, we propose an unsupervised clustering algorithm for binary protocols based on maximum frequent sequences, which distinguishes unknown protocols and thereby supports the analysis of unknown protocol formats. We first mine the maximum frequent sequences of a protocol message set at byte granularity. Using fuzzy set theory, we then calculate the fuzzy membership of each protocol message to each maximum frequent sequence and construct a fuzzy membership vector for each message. Finally, we apply K-means++ to partition the protocol messages into clusters and evaluate performance using homogeneity, integrity, and the Fowlkes–Mallows Index (FMI). We also compare our algorithm with clustering algorithms based on Needleman–Wunsch and on fixed-length prefixes, and it shows improved clustering performance over these traditional methods.
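The pipeline outlined in the abstract (mine frequent byte sequences, build a membership vector per message, cluster with K-means++, score with FMI) can be illustrated with a minimal standard-library Python sketch. It is not the authors' implementation. Several choices are assumptions made only for illustration: frequent sequences are restricted to contiguous byte n-grams, the hypothetical `membership` function uses the longest shared substring divided by the pattern length (the abstract does not give the paper's fuzzy membership formula), and the six toy messages, `min_support`, and `max_len` are invented.

```python
import random
from itertools import combinations
from math import sqrt

def frequent_ngrams(messages, min_support, max_len=4):
    """Contiguous byte n-grams present in at least min_support of the messages."""
    frequent = set()
    for n in range(1, max_len + 1):
        counts = {}
        for msg in messages:
            # Use a set so each message counts a given n-gram at most once.
            for gram in {msg[i:i + n] for i in range(len(msg) - n + 1)}:
                counts[gram] = counts.get(gram, 0) + 1
        frequent |= {g for g, c in counts.items() if c >= min_support * len(messages)}
    return frequent

def maximal(patterns):
    """Keep only patterns that are not contained in another frequent pattern."""
    return sorted(p for p in patterns
                  if not any(p != q and p in q for q in patterns))

def membership(msg, pattern):
    """Assumed membership: longest shared substring length / pattern length."""
    for n in range(len(pattern), 0, -1):
        if any(pattern[i:i + n] in msg for i in range(len(pattern) - n + 1)):
            return n / len(pattern)
    return 0.0

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(v, centers):
    return min(range(len(centers)), key=lambda j: dist2(v, centers[j]))

def kmeans_pp(vectors, k, seed=0, iters=20):
    """K-means with k-means++ seeding (distance-squared weighted sampling)."""
    rng = random.Random(seed)
    centers = [rng.choice(vectors)]
    while len(centers) < k:
        weights = [min(dist2(v, c) for c in centers) for v in vectors]
        if any(weights):
            centers.append(rng.choices(vectors, weights=weights)[0])
        else:  # every point coincides with a center; fall back to uniform
            centers.append(rng.choice(vectors))
    for _ in range(iters):
        labels = [nearest(v, centers) for v in vectors]
        for j in range(k):
            members = [v for v, l in zip(vectors, labels) if l == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return [nearest(v, centers) for v in vectors]

def fmi(true, pred):
    """Fowlkes-Mallows Index computed by counting pairs of messages."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(true)), 2):
        same_t, same_p = true[i] == true[j], pred[i] == pred[j]
        tp += same_t and same_p
        fp += same_p and not same_t
        fn += same_t and not same_p
    return tp / sqrt((tp + fp) * (tp + fn)) if tp else 0.0

# Toy data (invented): two protocol types with distinct 4-byte headers.
messages = [b"\x01\x02\x03\x04\x10", b"\x01\x02\x03\x04\x11", b"\x01\x02\x03\x04\x12",
            b"\xaa\xbb\xcc\xdd\x20", b"\xaa\xbb\xcc\xdd\x21", b"\xaa\xbb\xcc\xdd\x22"]
truth = [0, 0, 0, 1, 1, 1]

patterns = maximal(frequent_ngrams(messages, min_support=0.5))
vectors = [[membership(m, p) for p in patterns] for m in messages]
labels = kmeans_pp(vectors, k=2)
score = fmi(truth, labels)
```

On this toy input the two headers are the only maximal frequent patterns, so each message maps to a two-dimensional membership vector and the two types separate cleanly. Real traffic would need the paper's actual mining and membership definitions, and a library implementation of K-means++ and the evaluation metrics would normally replace the hand-written versions here.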

Keywords

Binary protocol; blockchain; maximum frequent sequence; protocol message clustering; protocol reverse engineering

Cite This Article

APA Style

Shi, J., Ye, L., Li, Z., Zhan, D. (2022). Unsupervised Binary Protocol Clustering Based on Maximum Sequential Patterns. Computer Modeling in Engineering & Sciences, 130(1), 483–498. https://doi.org/10.32604/cmes.2022.017467

Vancouver Style

Shi J, Ye L, Li Z, Zhan D. Unsupervised Binary Protocol Clustering Based on Maximum Sequential Patterns. Comput Model Eng Sci. 2022;130(1):483–498. https://doi.org/10.32604/cmes.2022.017467

IEEE Style

J. Shi, L. Ye, Z. Li, and D. Zhan, “Unsupervised Binary Protocol Clustering Based on Maximum Sequential Patterns,” Comput. Model. Eng. Sci., vol. 130, no. 1, pp. 483–498, 2022. https://doi.org/10.32604/cmes.2022.017467

Copyright © 2022 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
