Comparative Analysis of Machine Learning Algorithms for Email Phishing Detection Using TF-IDF, Word2Vec, and BERT

Arar Tawil; Laiali Almazaydeh; Doaa Qawasmeh; Baraah Qawasmeh; Mohammad Alshinwan; Khaled Elleithy

doi:10.32604/cmc.2024.057279


Arar Al Tawil¹,*, Laiali Almazaydeh², Doaa Qawasmeh³, Baraah Qawasmeh⁴, Mohammad Alshinwan¹,⁵, Khaled Elleithy⁶

1 Faculty of Information Technology, Applied Science Private University, Amman, 11931, Jordan
2 College of Engineering, Abu Dhabi University, Abu Dhabi, Al Ain, P.O. Box 1790, United Arab Emirates
3 Faculty of Artificial Intelligence, Al-Balqa Applied University, Salt, 19117, Jordan
4 Department of Civil and Construction Engineering, Western Michigan University, Kalamazoo, MI 49008, USA
5 MEU Research Unit, Middle East University, Amman, 11831, Jordan
6 Department of Computer Science and Engineering, University of Bridgeport, Bridgeport, CT 06604, USA

* Corresponding Author: Arar Al Tawil.

Computers, Materials & Continua 2024, 81(2), 3395-3412. https://doi.org/10.32604/cmc.2024.057279

Received 14 August 2024; Accepted 15 October 2024; Issue published 18 November 2024

Abstract

Cybercriminals often use fraudulent emails and fictitious email accounts to deceive individuals into disclosing confidential information, a practice known as phishing. This study evaluates how effectively several machine learning algorithms detect phishing emails when paired with three text representation methods: Term Frequency-Inverse Document Frequency (TF-IDF), Word2Vec, and Bidirectional Encoder Representations from Transformers (BERT). Logistic Regression, Decision Tree, Random Forest, and Multilayer Perceptron classifiers are assessed on the extracted features. With TF-IDF, the best-performing classifier was the Multilayer Perceptron (Precision: 0.98, Recall: 0.98, F1-score: 0.98, Accuracy: 0.98). With Word2Vec, the best-performing classifier was again the Multilayer Perceptron (Precision: 0.98, Recall: 0.98, F1-score: 0.98, Accuracy: 0.98). The highest performance was achieved using the BERT model, with Precision, Recall, F1-score, and Accuracy all reaching 0.99. This study highlights how advanced pre-trained models, such as BERT, can significantly enhance the accuracy and reliability of fraud detection systems.
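
As a rough illustration of the first representation named in the abstract, the sketch below computes TF-IDF weights in plain Python. It is not the authors' implementation: the abstract does not state which TF-IDF variant, tokenizer, or preprocessing was used, so the formula (term frequency times log(N / document frequency)), the whitespace tokenizer, the `tfidf` helper, and the three toy emails are all assumptions made for this example.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Assumed weighting (one common variant, not necessarily the paper's):
    weight = (count / tokens_in_doc) * log(n_docs / docs_containing_term)
    """
    # Naive lowercase whitespace tokenization (an assumption).
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical toy corpus, not taken from the study's dataset.
emails = [
    "verify your account now",
    "meeting agenda for monday",
    "your account statement for monday",
]
vectors = tfidf(emails)
```

In the first toy email, "verify" occurs in no other message and so receives a higher weight than "your", which also appears in the third. Sparse weights of this kind are what a classifier such as Logistic Regression or a Multilayer Perceptron would take as input.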

Keywords

Attacks; email phishing; machine learning; security; bidirectional encoder representations from transformers (BERT); text classifier; natural language processing (NLP)

Cite This Article

APA Style

Tawil, A.A., Almazaydeh, L., Qawasmeh, D., Qawasmeh, B., Alshinwan, M. et al. (2024). Comparative Analysis of Machine Learning Algorithms for Email Phishing Detection Using TF-IDF, Word2Vec, and BERT. Computers, Materials & Continua, 81(2), 3395–3412. https://doi.org/10.32604/cmc.2024.057279

Vancouver Style

Tawil AA, Almazaydeh L, Qawasmeh D, Qawasmeh B, Alshinwan M, Elleithy K. Comparative Analysis of Machine Learning Algorithms for Email Phishing Detection Using TF-IDF, Word2Vec, and BERT. Comput Mater Contin. 2024;81(2):3395–3412. https://doi.org/10.32604/cmc.2024.057279

IEEE Style

A. A. Tawil, L. Almazaydeh, D. Qawasmeh, B. Qawasmeh, M. Alshinwan, and K. Elleithy, “Comparative Analysis of Machine Learning Algorithms for Email Phishing Detection Using TF-IDF, Word2Vec, and BERT,” Comput. Mater. Contin., vol. 81, no. 2, pp. 3395–3412, 2024. https://doi.org/10.32604/cmc.2024.057279


Copyright © 2024 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
