ARTICLE

Comparative Analysis of Machine Learning Algorithms for Email Phishing Detection Using TF-IDF, Word2Vec, and BERT

by Arar Al Tawil1,*, Laiali Almazaydeh2, Doaa Qawasmeh3, Baraah Qawasmeh4, Mohammad Alshinwan1,5, Khaled Elleithy6

1 Faculty of Information Technology, Applied Science Private University, Amman, 11931, Jordan
2 College of Engineering, Abu Dhabi University, Abu Dhabi, Al Ain, P.O. Box 1790, United Arab Emirates
3 Faculty of Artificial Intelligence, Al-Balqa Applied University, Salt, 19117, Jordan
4 Department of Civil and Construction Engineering, Western Michigan University, Kalamazoo, MI 49008, USA
5 MEU Research Unit, Middle East University, Amman, 11831, Jordan
6 Department of Computer Science and Engineering, University of Bridgeport, Bridgeport, CT 06604, USA

* Corresponding Author: Arar Al Tawil.

Computers, Materials & Continua 2024, 81(2), 3395-3412. https://doi.org/10.32604/cmc.2024.057279

Abstract

Cybercriminals often use fraudulent emails and fictitious email accounts to deceive individuals into disclosing confidential information, a practice known as phishing. This study compares three text-representation methods, Term Frequency-Inverse Document Frequency (TF-IDF), Word2Vec, and Bidirectional Encoder Representations from Transformers (BERT), to evaluate how effectively machine learning algorithms detect phishing emails. The features extracted by these methods are used to assess the performance of Logistic Regression, Decision Tree, Random Forest, and Multilayer Perceptron classifiers. With TF-IDF features, the best-performing classifier was the Multilayer Perceptron (Precision: 0.98, Recall: 0.98, F1-score: 0.98, Accuracy: 0.98). With Word2Vec features, the best-performing classifier was again the Multilayer Perceptron (Precision: 0.98, Recall: 0.98, F1-score: 0.98, Accuracy: 0.98). The highest performance overall was achieved with the BERT model, for which Precision, Recall, F1-score, and Accuracy all reached 0.99. These results highlight how advanced pre-trained models such as BERT can enhance the accuracy and reliability of fraud detection systems.
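The four metrics reported above can be illustrated with a minimal sketch. It assumes a binary labelling in which 1 marks a phishing email and 0 a legitimate one, and it scores only the positive class; the abstract does not state whether the reported figures are per-class or averaged, so that choice is an assumption. The function name `binary_metrics` and the toy labels are hypothetical and are not taken from the study.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute precision, recall, F1-score, and accuracy for one positive class.

    Assumption: labels are binary and `positive` marks the phishing class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Guard each ratio against an empty denominator.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Toy example (made-up labels, not data from the paper).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In this toy case there are three true positives, one false positive, one false negative, and three true negatives, so all four metrics come out to 0.75. Identical values across metrics, as in the reported results, typically arise when false positives and false negatives are roughly balanced.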




Copyright © 2024 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.