Open Access
ARTICLE
Sentence Level Analysis Model for Phishing Detection Using KNN
Faculty of Computing and Informatics, Mount Kenya University, Thika, Kenya
* Corresponding Authors: Lindah Sawe. Email: ,
Journal of Cyber Security 2024, 6, 25-39. https://doi.org/10.32604/jcs.2023.045859
Received 10 September 2023; Accepted 23 November 2023; Issue published 11 January 2024
Abstract
Phishing emails have surged globally as a cyber threat, especially since the emergence of the COVID-19 pandemic, and this form of attack has caused substantial financial losses for numerous organizations. Although various models have been built to distinguish legitimate emails from phishing attempts, attackers continuously adopt novel strategies to manipulate their targets into falling victim to their schemes. Efforts to create phishing detection models are ongoing, but their current accuracy and speed in identifying phishing emails remain unsatisfactory, and the frequency of phishing emails has recently risen sharply. Consequently, there is a pressing need for more efficient, high-performing phishing detection models to mitigate the adverse impact of such fraudulent messages. In this research, a comprehensive analysis is conducted on both components of an email message: the header and the body. Sentence-level characteristics are extracted and used to construct a new phishing detection model based on K-Nearest Neighbor (KNN), which introduces sentence-level analysis as a novel dimension. Established datasets from Kaggle were used to train and validate the model. The model's effectiveness is evaluated using accuracy, precision, recall, and F1-measure, and it achieves an accuracy of 0.97.
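The abstract does not state which sentence-level features are extracted, how sentences are segmented, or which value of k is used. As an illustration only, the following minimal Python sketch shows the general shape of such a pipeline under stated assumptions: four hypothetical sentence-level features (the fraction of header and body sentences containing urgency cues, action/credential cues, URLs, or ending in "!"), a naive regex sentence splitter, Euclidean distance, and k = 3. The cue lists, helper names (`sentence_features`, `knn_predict`, `evaluate`, `demo`) and toy emails are invented for this example; they are not the authors' implementation and are unrelated to the Kaggle datasets.

```python
import math
import re
from collections import Counter

# Hypothetical cue-word lists for illustration; NOT the feature set used in the paper.
URGENCY_WORDS = {"urgent", "immediately", "suspended", "verify", "expire", "now"}
ACTION_WORDS = {"click", "login", "confirm", "update", "password", "account"}
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)


def split_sentences(text):
    """Naively split text after '.', '!' or '?' followed by whitespace (an assumption)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def words_of(sentence):
    """Lower-cased alphabetic tokens of a sentence."""
    return set(re.findall(r"[a-z]+", sentence.lower()))


def sentence_features(header, body):
    """Map one email (header + body) to four hypothetical sentence-level ratios in [0, 1]."""
    sentences = split_sentences(header) + split_sentences(body)
    n = len(sentences) or 1  # avoid division by zero for empty emails

    def ratio(predicate):
        return sum(1 for s in sentences if predicate(s)) / n

    return [
        ratio(lambda s: bool(words_of(s) & URGENCY_WORDS)),  # urgency cues
        ratio(lambda s: bool(words_of(s) & ACTION_WORDS)),   # credential/action cues
        ratio(lambda s: bool(URL_PATTERN.search(s))),        # embedded links
        ratio(lambda s: s.endswith("!")),                    # exclamatory sentences
    ]


def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest training vectors (Euclidean distance)."""
    neighbours = sorted((math.dist(x, xi), yi) for xi, yi in zip(train_X, train_y))
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


def evaluate(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 with `positive` (phishing) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def demo():
    """Fit on a tiny invented corpus (1 = phishing, 0 = legitimate) and score two emails."""
    train = [
        ("Urgent: account suspended", "Click http://bad.example/login now! Verify your password immediately.", 1),
        ("Verify your account", "Your account will expire. Click www.fake.example to confirm!", 1),
        ("Action required", "Update your password now! Login at http://x.example.", 1),
        ("Team lunch", "Lunch is at noon on Friday. Bring your ideas for the offsite.", 0),
        ("Meeting notes", "Notes from today are attached. See you next week.", 0),
        ("Quarterly report", "The report is ready for review. Thanks for your help.", 0),
    ]
    test = [
        ("Account alert", "Verify your password now! Click http://evil.example/login.", 1),
        ("Weekly sync", "The agenda is in the shared folder. Please add your topics.", 0),
    ]
    train_X = [sentence_features(h, b) for h, b, _ in train]
    train_y = [y for _, _, y in train]
    preds = [knn_predict(train_X, train_y, sentence_features(h, b), k=3) for h, b, _ in test]
    return preds, evaluate([y for _, _, y in test], preds)


if __name__ == "__main__":
    predictions, scores = demo()
    print(predictions, scores)
```

Because every feature in this sketch is a ratio in [0, 1], no further scaling is applied; with unbounded features (for example raw sentence counts or lengths), a KNN model would normally standardize them first, since distance-based voting is sensitive to feature scale.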
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.