Arabic Dialect Identification in Social Media: A Comparative Study of Deep Learning and Transformer Approaches

Open Access

ARTICLE

Enas Yahya Alqulaity¹, Wael M.S. Yafooz¹,*, Abdullah Alourani², Ayman Jaradat³

1 Department of Computer Science, College of Computer Science and Engineering, Taibah University, Medina, 42353, Saudi Arabia
2 Department of Management Information Systems, College of Business and Economics, Qassim University, Buraydah, 51452, Saudi Arabia
3 Internet of Things Department, Faculty of Science and Information Technology, Jadara University, Irbid, 21110, Jordan

* Corresponding Author: Wael M.S. Yafooz.

Intelligent Automation & Soft Computing 2024, 39(5), 907-928. https://doi.org/10.32604/iasc.2024.055470

Received 28 June 2024; Accepted 15 August 2024; Issue published 31 October 2024

Abstract

Arabic dialect identification is an essential task in Natural Language Processing (NLP) and a critical component of applications such as machine translation, sentiment analysis, and cross-language text generation. Over the last 10 years, the problem of differentiating between Arabic dialects has received growing attention, particularly for social media text. The task is hard because dialects share overlapping vocabulary, online language use is fluid, and closely related dialects are difficult to tell apart. Handling low-resource dialects and adapting to the constantly changing linguistic trends on social media platforms pose additional challenges. Given the growth in social media usage, a robust dialect identification technique is essential to improving communication technology and cross-cultural understanding. This research proposes a hybrid Deep Learning (DL) approach to identifying Arabic dialects on social media. The model combines the Long Short-Term Memory (LSTM) and Bidirectional Long Short-Term Memory (BiLSTM) architectures. A new textual dataset covering three main dialects, i.e., Levantine, Saudi, and Egyptian, is also presented. It contains approximately 11,000 user-generated comments from Twitter, carefully annotated to ensure accurate dialect labels. Several experiments with transformers, DL models, and basic machine learning classifiers are conducted to evaluate the proposed model, using methodologies that include TF-IDF, word embedding, and self-attention mechanisms. According to the experimental results, the proposed model outperforms the other models in terms of accuracy, achieving 96.54%. This study contributes a new dataset and a practical model for Arabic dialect identification, which may prove useful for future work in sociolinguistic studies and NLP.
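
To make the TF-IDF baseline idea mentioned in the abstract concrete, the following is a minimal Python sketch of a dialect classifier built from TF-IDF vectors and a nearest-centroid rule. It is an illustration under stated assumptions, not the authors' implementation: the six training sentences are hand-written toy examples rather than samples from the paper's dataset, the nearest-centroid rule stands in for whichever basic classifiers the study used, and all function names are hypothetical. The proposed LSTM and BiLSTM hybrid is not shown here.

```python
import math
from collections import Counter

# Toy, hand-written examples (hypothetical; NOT taken from the paper's dataset).
TRAIN = [
    ("عايز اروح دلوقتي", "Egyptian"),
    ("ازيك عامل ايه", "Egyptian"),
    ("بدي روح هلق", "Levantine"),
    ("شو كيفك اليوم", "Levantine"),
    ("ابغى اروح الحين", "Saudi"),
    ("وش اخبارك اليوم", "Saudi"),
]


def tokenize(text):
    """Whitespace tokenizer; real social media text would need more cleaning."""
    return text.split()


def fit_idf(docs):
    """Compute a smoothed inverse document frequency for every training token."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}


def normalize(vec):
    """Scale a sparse vector (dict) to unit L2 length; empty if all zero."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def tfidf(doc, idf):
    """Unit-length TF-IDF vector; tokens unseen in training are dropped."""
    tf = Counter(tok for tok in doc if tok in idf)
    return normalize({tok: cnt * idf[tok] for tok, cnt in tf.items()})


def train(examples):
    """Return the IDF table and one unit-length centroid per dialect label."""
    docs = [tokenize(text) for text, _ in examples]
    idf = fit_idf(docs)
    sums = {}
    for doc, (_, label) in zip(docs, examples):
        acc = sums.setdefault(label, Counter())
        for tok, weight in tfidf(doc, idf).items():
            acc[tok] += weight
    return idf, {label: normalize(vec) for label, vec in sums.items()}


def predict(text, idf, centroids):
    """Pick the label whose centroid is most similar; None if nothing matches."""
    vec = tfidf(tokenize(text), idf)
    if not vec:
        return None
    scores = {
        label: sum(w * centroid.get(tok, 0.0) for tok, w in vec.items())
        for label, centroid in centroids.items()
    }
    return max(scores, key=scores.get)


IDF, CENTROIDS = train(TRAIN)
print(predict("ابغى قهوة الحين", IDF, CENTROIDS))
```

Because the toy vocabulary is tiny, a comment is only classified when it shares at least one token with the training examples; otherwise `predict` returns `None`. A realistic baseline would train on the full annotated corpus and use character n-grams or subword features to cope with spelling variation.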

Keywords

Dialectal Arabic; transformers; deep learning; natural language processing systems

Cite This Article

APA Style

Alqulaity, E.Y., Yafooz, W.M., Alourani, A., Jaradat, A. (2024). Arabic Dialect Identification in Social Media: A Comparative Study of Deep Learning and Transformer Approaches. Intelligent Automation & Soft Computing, 39(5), 907–928. https://doi.org/10.32604/iasc.2024.055470

Vancouver Style

Alqulaity EY, Yafooz WM, Alourani A, Jaradat A. Arabic Dialect Identification in Social Media: A Comparative Study of Deep Learning and Transformer Approaches. Intell Automat Soft Comput. 2024;39(5):907–928. https://doi.org/10.32604/iasc.2024.055470

IEEE Style

E. Y. Alqulaity, W. M. Yafooz, A. Alourani, and A. Jaradat, “Arabic Dialect Identification in Social Media: A Comparative Study of Deep Learning and Transformer Approaches,” Intell. Automat. Soft Comput., vol. 39, no. 5, pp. 907–928, 2024. https://doi.org/10.32604/iasc.2024.055470

Copyright © 2024 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
