Open Access
ARTICLE
HCRVD: A Vulnerability Detection System Based on CST-PDG Hierarchical Code Representation Learning
School of Cyberspace Security, Information Engineering University, Zhengzhou, 450001, China
* Corresponding Author: Zheng Shan. Email:
Computers, Materials & Continua 2024, 79(3), 4573-4601. https://doi.org/10.32604/cmc.2024.049310
Received 03 January 2024; Accepted 12 April 2024; Issue published 20 June 2024
Abstract
Prior studies have demonstrated that deep learning-based approaches can enhance the performance of source code vulnerability detection by training neural networks to learn vulnerability patterns in code representations. However, due to limitations in code representation and neural network design, the validity and practicality of the model still need to be improved. Additionally, due to differences in programming languages, most methods lack cross-language detection generality. To address these issues, in this paper, we analyze the shortcomings of previous code representations and neural networks. We propose a novel hierarchical code representation that combines Concrete Syntax Trees (CST) with Program Dependence Graphs (PDG). Furthermore, we introduce a Tree-Graph-Gated-Attention (TGGA) network based on gated recurrent units and attention mechanisms to build a Hierarchical Code Representation learning-based Vulnerability Detection (HCRVD) system. This system enables cross-language vulnerability detection at the function-level. The experiments show that HCRVD surpasses many competitors in vulnerability detection capabilities. It benefits from the hierarchical code representation learning method, and outperforms baseline in cross-language vulnerability detection by 9.772% and 11.819% in the C/C++ and Java datasets, respectively. Moreover, HCRVD has certain ability to detect vulnerabilities in unknown programming languages and is useful in real open-source projects. HCRVD shows good validity, generality and practicality.Keywords
Cite This Article
This work is licensed under a Creative Commons Attribution 4.0 International License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.