Open Access
ARTICLE
FirmVulSeeker—BERT and Siamese Network-Based Vulnerability Search for Embedded Device Firmware Images
State Key Laboratory of Mathematical Engineering and Advanced Computing, Wuxi, Jiangsu, 214083, China
* Corresponding Author: Yingchao Yu. Email:
Journal on Internet of Things 2022, 4(1), 1-20. https://doi.org/10.32604/jiot.2022.019469
Received 15 November 2021; Accepted 15 February 2022; Issue published 16 May 2022
Abstract
In recent years, with the development of the natural language processing (NLP) technologies, security analyst began to use NLP directly on assembly codes which were disassembled from binary executables in order to examine binary similarity, achieved great progress. However, we found that the existing frameworks often ignored the complex internal structure of instructions and didn’t fully consider the long-term dependencies of instructions. In this paper, we propose firmVulSeeker—a vulnerability search tool for embedded firmware images, based on BERT and Siamese network. It first builds a BERT MLM task to observe and learn the semantics of different instructions in their context in a very large unlabeled binary corpus. Then, a finetune mode based on Siamese network is constructed to guide training and matching semantically similar functions using the knowledge learned from the first stage. Finally, it will use a function embedding generated from the fine-tuned model to search in the targeted corpus and find the most similar function which will be confirmed whether it’s a real vulnerability manually. We evaluate the accuracy, robustness, scalability and vulnerability search capability of firmVulSeeker. Results show that it can greatly improve the accuracy of matching semantically similar functions, and can successfully find more real vulnerabilities in real-world firmware than other tools.Keywords
Cite This Article
This work is licensed under a Creative Commons Attribution 4.0 International License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.