Leveraging Transformers for Detection of Arabic Cyberbullying on Social Media: Hybrid Arabic Transformers
Amjad A. Alsuwaylimi1,*, Zaid S. Alenezi2
1 Department of Computer Science, College of Science, Northern Border University, Arar, 91431, Saudi Arabia
2 Information Technology Management, Northern Border University, Arar, 91431, Saudi Arabia
* Corresponding Author: Amjad A. Alsuwaylimi. Email:
Computers, Materials & Continua https://doi.org/10.32604/cmc.2025.061674
Received 30 November 2024; Accepted 20 February 2025; Published online 25 March 2025
Abstract
Cyberbullying is a serious problem in the Arabic-speaking world, affecting children, organizations, and businesses. Various efforts have been made to combat it, including models based on machine learning (ML) and deep learning (DL) that apply natural language processing (NLP) methods, as well as purpose-built datasets. However, most of these efforts have focused predominantly on English, leaving a substantial gap in Arabic cyberbullying detection. Given the complexities of the Arabic language, transfer learning and transformers offer a promising way to improve the detection and classification of abusive content by leveraging large models pretrained on extensive corpora. This study therefore proposes a hybrid model built from transformers trained on extensive Arabic datasets, which is then fine-tuned on a newly curated Arabic cyberbullying dataset collected from social media platforms, specifically Twitter. Two hybrid transformer models are introduced: the first combines the CAmelid Morphologically-aware pre-trained Bidirectional Encoder Representations from Transformers (CAMeLBERT) with the Arabic Generative Pre-trained Transformer 2 (AraGPT2), and the second combines Arabic BERT (AraBERT) with the Cross-lingual Language Model-RoBERTa (XLM-R). Two strategies, namely feature fusion and ensemble voting, are employed to improve model accuracy. Experimental results, measured by precision, recall, F1-score, accuracy, and the Area Under the Receiver Operating Characteristic curve (AUC-ROC), demonstrate that the combined CAMeLBERT and AraGPT2 model using feature fusion outperforms traditional DL models, such as Long Short-Term Memory (LSTM) and Bidirectional LSTM (BiLSTM), as well as standalone Arabic transformer models.
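The two combination strategies named in the abstract can be illustrated with a minimal, hedged sketch: feature fusion concatenates the pooled sentence embeddings of two encoders before a shared classification head, while ensemble voting combines the discrete predictions of independent models. The vectors, weights, and helper names below are illustrative stand-ins, not the paper's actual implementation or dimensions (real CAMeLBERT/AraGPT2 embeddings would be hundreds of dimensions wide).

```python
# Hedged sketch of the two hybrid strategies described in the abstract.
# emb_camelbert / emb_aragpt2 are toy stand-ins for the pooled sentence
# embeddings that the two encoders would actually produce.

def fuse(emb_a, emb_b):
    """Feature fusion: concatenate two pooled embeddings into one vector."""
    return emb_a + emb_b  # list concatenation -> dim = len(a) + len(b)

def linear_score(features, weights, bias=0.0):
    """A single linear classification head over the fused features."""
    return sum(f * w for f, w in zip(features, weights)) + bias

def majority_vote(labels):
    """Ensemble voting: return the label predicted by most models."""
    return max(set(labels), key=labels.count)

emb_camelbert = [0.2, -0.1, 0.4]   # stand-in CAMeLBERT embedding (3-d here)
emb_aragpt2 = [0.3, 0.0]           # stand-in AraGPT2 embedding (2-d here)

fused = fuse(emb_camelbert, emb_aragpt2)            # 5-d fused vector
score = linear_score(fused, [0.1, 0.1, 0.1, 0.1, 0.1])  # toy weights
label_fusion = "bullying" if score > 0 else "not bullying"

# Voting over three illustrative per-model predictions:
label_vote = majority_vote(["bullying", "not bullying", "bullying"])
```

The fused-feature path lets the classification head learn interactions between the two encoders' representations, whereas voting keeps the models fully independent and only merges their final decisions.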
Keywords
Cyberbullying; transformers; pre-trained models; Arabic cyberbullying detection; deep learning