Open Access
ARTICLE
Deep Learning-Based Program-Wide Binary Code Similarity for Smart Contracts
1 Harbin Engineering University, Harbin, 150000, China
2 University of Sanya, Sanya, 572000, China
3 University of Alberta, Edmonton, T5J4P6, Canada
* Corresponding Author: Jianguo Sun. Email:
Computers, Materials & Continua 2023, 74(1), 1011-1024. https://doi.org/10.32604/cmc.2023.028058
Received 01 February 2022; Accepted 05 May 2022; Issue published 22 September 2022
Abstract
Recently, security issues of smart contracts are arising great attention due to the enormous financial loss caused by vulnerability attacks. There is an increasing need to detect similar codes for hunting vulnerability with the increase of critical security issues in smart contracts. Binary similarity detection that quantitatively measures the given code diffing has been widely adopted to facilitate critical security analysis. However, due to the difference between common programs and smart contract, such as diversity of bytecode generation and highly code homogeneity, directly adopting existing graph matching and machine learning based techniques to smart contracts suffers from low accuracy, poor scalability and the limitation of binary similarity on function level. Therefore, this paper investigates graph neural network to detect smart contract binary code similarity at the program level, where we conduct instruction-level normalization to reduce the noise code for smart contract pre-processing and construct contract control flow graphs to represent smart contracts. In particular, two improved Graph Convolutional Network (GCN) and Message Passing Neural Network (MPNN) models are explored to encode the contract graphs into quantitatively vectors, which can capture the semantic information and the program-wide control flow information with temporal orders. Then we can efficiently accomplish the similarity detection by measuring the distance between two targeted contract embeddings. To evaluate the effectiveness and efficient of our proposed method, extensive experiments are performed on two real-world datasets, i.e., smart contracts from Ethereum and Enterprise Operation System (EOS) blockchain-based platforms. The results show that our proposed approach outperforms three state-of-the-art methods by a large margin, achieving a great improvement up to 6.1% and 17.06% in accuracy.Keywords
Cite This Article
This work is licensed under a Creative Commons Attribution 4.0 International License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.