
Open Access

ARTICLE

LEGF-DST: LLMs-Enhanced Graph-Fusion Dual-Stream Transformer for Fine-Grained Chinese Malicious SMS Detection

Xin Tong1, Jingya Wang1,*, Ying Yang2, Tian Peng3, Hanming Zhai1, Guangming Ling4
1 School of Information and Cybersecurity, People’s Public Security University of China, Beijing, 100038, China
2 Cyber Investigation Technology Research and Development Center, The Third Research Institute of the Ministry of Public Security, Shanghai, 201204, China
3 Department of Cybersecurity Defense, Beijing Police College, Beijing, 102202, China
4 School of Computer Science, Henan Institute of Engineering, Zhengzhou, 451191, China
* Corresponding Author: Jingya Wang

Computers, Materials & Continua. https://doi.org/10.32604/cmc.2024.059018

Received 26 September 2024; Accepted 20 November 2024; Published online 20 December 2024

Abstract

With the widespread use of SMS (Short Message Service), the proliferation of malicious SMS has become a pressing societal issue. While deep learning-based text classifiers offer promise, they often perform suboptimally on fine-grained detection tasks, primarily due to imbalanced datasets and insufficient model representation capability. To address this challenge, this paper proposes an LLMs-enhanced graph-fusion dual-stream Transformer for fine-grained Chinese malicious SMS detection. In the data processing stage, Large Language Models (LLMs) are employed for data augmentation, mitigating dataset imbalance. In the input stage, both word-level and character-level features are used as model inputs, enriching the features and preventing information loss. In the representation learning stage, a dual-stream Transformer serves as the backbone network, complemented by a graph-based feature fusion mechanism. In the output stage, a supervised classification cross-entropy loss and a supervised contrastive learning loss are combined as multi-task optimization objectives, further strengthening the model's feature representations. Experimental results demonstrate that the proposed method significantly outperforms baselines on a publicly available Chinese malicious SMS dataset.
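To make the multi-task objective at the output stage concrete, below is a minimal PyTorch sketch that combines classification cross-entropy with a supervised contrastive loss over batch embeddings. This is an illustrative sketch, not the authors' implementation: the class name SupConLoss, the temperature value, and the weighting coefficient lambda_scl are assumptions introduced here for exposition.

```python
# Minimal sketch: multi-task objective = cross-entropy + supervised contrastive loss.
# Hypothetical names (SupConLoss, lambda_scl, temperature) are illustrative, not from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SupConLoss(nn.Module):
    """Supervised contrastive loss over a batch of embeddings;
    samples sharing a label are treated as positives for each anchor."""
    def __init__(self, temperature: float = 0.07):
        super().__init__()
        self.temperature = temperature

    def forward(self, features: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # features: (batch, dim); labels: (batch,)
        features = F.normalize(features, dim=1)
        sim = features @ features.T / self.temperature            # pairwise similarities (B, B)
        not_self = ~torch.eye(len(labels), dtype=torch.bool, device=features.device)
        pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & not_self
        # log-softmax over all non-self pairs for each anchor
        sim = sim.masked_fill(~not_self, float('-inf'))
        log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
        # average log-probability of positives per anchor (skip anchors with no positive)
        pos_counts = pos_mask.sum(1)
        valid = pos_counts > 0
        mean_log_prob_pos = (log_prob * pos_mask).sum(1)[valid] / pos_counts[valid]
        return -mean_log_prob_pos.mean()

def multitask_loss(logits: torch.Tensor,
                   embeddings: torch.Tensor,
                   labels: torch.Tensor,
                   lambda_scl: float = 0.5) -> torch.Tensor:
    """Weighted sum of classification cross-entropy and supervised contrastive loss."""
    ce = F.cross_entropy(logits, labels)
    scl = SupConLoss()(embeddings, labels)
    return ce + lambda_scl * scl
```

In this formulation the cross-entropy term drives class separation at the classifier head, while the contrastive term pulls same-class embeddings together and pushes different-class embeddings apart, which is the usual motivation for pairing the two losses on fine-grained, imbalanced classification tasks.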

Keywords

Transformers; malicious SMS; multi-task learning; large language models