Open Access

ARTICLE

Detecting Malicious Uniform Resource Locators Using an Applied Intelligence Framework

Simona-Vasilica Oprea*, Adela Bâra

Department of Economic Informatics and Cybernetics, Bucharest University of Economic Studies, Bucharest, 010572, Romania

* Corresponding Author: Simona-Vasilica Oprea. Email: email

Computers, Materials & Continua 2024, 79(3), 3827-3853. https://doi.org/10.32604/cmc.2024.051598

Abstract

Machine Learning (ML) and Natural Language Processing (NLP) techniques reveal the potential of text analytics. In this paper, we propose an NLP framework that is applied to multiple datasets to detect malicious Uniform Resource Locators (URLs). The proposed framework includes three categories of features, both ML and Deep Learning (DL) algorithms, and a ranking schema. To extract features from text, we apply frequency-based and prediction-based embeddings: the hash vectorizer and Term Frequency-Inverse Document Frequency (TF-IDF) on the frequency side, and Google's word2vec (continuous bag of words and skip-gram) on the prediction side. Further, we apply more state-of-the-art methods, such as GloVe, to create vectorized features. Additionally, feature engineering that is specific to URL structure is deployed to detect scams and other threats. For framework assessment, four ranking indicators are weighted: computational time, and performance measured as accuracy, F1 score and type II error. For the computational time, we propose a new metric, Feature Building Time (FBT), because cutting-edge feature builders (such as doc2vec or GloVe) require more time. By applying the proposed assessment step, the skip-gram algorithm of word2vec surpasses other feature builders in performance. Additionally, eXtreme Gradient Boost (XGB) outperforms other classifiers. With this setup, we attain an accuracy of 99.5% and an F1 score of 0.99.
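To make the idea of URL-structure feature engineering concrete, the following is a minimal sketch in Python using only the standard library. It is not the feature set used in the paper: the function name `url_structure_features` and every feature in the returned dictionary are illustrative assumptions, chosen as examples of lexical signals that could be derived from a URL string.

```python
import re
from urllib.parse import urlsplit


def url_structure_features(url: str) -> dict:
    """Return a few hypothetical lexical features of a URL.

    The feature names are illustrative assumptions, not the paper's features.
    """
    # Prepend a scheme when missing so urlsplit places the host in netloc.
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)
    host = parts.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots_in_host": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "num_hyphens": url.count("-"),
        "has_at_symbol": "@" in url,
        # Rough check for a dotted-quad host (does not validate octet ranges).
        "host_is_ipv4": bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host)),
        "uses_https": parts.scheme == "https",
        "path_depth": len([s for s in parts.path.split("/") if s]),
        "num_query_params": len([p for p in parts.query.split("&") if p]),
    }


if __name__ == "__main__":
    # Example URL is made up for illustration.
    print(url_structure_features("http://192.168.0.1/login/verify?id=1&x=2"))
```

A dictionary of this kind could be concatenated with the vectorized text features before being passed to a classifier; how the paper combines its three feature categories is not stated in the abstract.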

Cite This Article

APA Style
Oprea, S., Bâra, A. (2024). Detecting malicious uniform resource locators using an applied intelligence framework. Computers, Materials & Continua, 79(3), 3827-3853. https://doi.org/10.32604/cmc.2024.051598
Vancouver Style
Oprea S, Bâra A. Detecting malicious uniform resource locators using an applied intelligence framework. Comput Mater Contin. 2024;79(3):3827-3853. https://doi.org/10.32604/cmc.2024.051598
IEEE Style
S. Oprea and A. Bâra, "Detecting Malicious Uniform Resource Locators Using an Applied Intelligence Framework," Comput. Mater. Contin., vol. 79, no. 3, pp. 3827-3853, 2024. https://doi.org/10.32604/cmc.2024.051598



This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.