Detecting Malicious Uniform Resource Locators Using an Applied Intelligence Framework

Simona-Vasilica Oprea; Adela Bâra

doi:10.32604/cmc.2024.051598

Open Access

ARTICLE

Simona-Vasilica Oprea*, Adela Bâra

Department of Economic Informatics and Cybernetics, Bucharest University of Economic Studies, Bucharest, 010572, Romania

* Corresponding Author: Simona-Vasilica Oprea.

Computers, Materials & Continua 2024, 79(3), 3827-3853. https://doi.org/10.32604/cmc.2024.051598

Received 10 March 2024; Accepted 17 May 2024; Issue published 20 June 2024

Abstract

The potential of text analytics is revealed by Machine Learning (ML) and Natural Language Processing (NLP) techniques. In this paper, we propose an NLP framework that is applied to multiple datasets to detect malicious Uniform Resource Locators (URLs). The proposed framework comprises three categories of features, both ML and Deep Learning (DL) algorithms, and a ranking schema. To extract features from text, we apply frequency-based and prediction-based embeddings: the hash vectorizer and Term Frequency-Inverse Document Frequency (TF-IDF) on the frequency side, and Google's word2vec (continuous bag of words and skip-gram) on the prediction side. We also apply more recent methods, such as GloVe, to create vectorized features. Additionally, feature engineering specific to URL structure is deployed to detect scams and other threats. To assess the framework, four ranking indicators are weighted: computational time and three performance measures (accuracy, F1 score and type II error). For computational time, we propose a new metric, Feature Building Time (FBT), because cutting-edge feature builders such as doc2vec or GloVe require more time. Under the proposed assessment step, the skip-gram algorithm of word2vec surpasses the other feature builders in performance, and eXtreme Gradient Boost (XGB) outperforms the other classifiers. With this setup, we attain an accuracy of 99.5% and an F1 score of 0.99.
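To make the idea of URL-structure feature engineering and the Feature Building Time (FBT) concrete, the following is a minimal Python sketch that uses only the standard library. The feature names, the sample URLs and the helper functions `url_features` and `build_features` are illustrative assumptions and are not taken from the paper; FBT is approximated here as the wall-clock time spent building the features.

```python
import re
import time
from urllib.parse import urlsplit

# Hypothetical feature set chosen for illustration only;
# the paper's actual URL-structure features may differ.
IPV4_RE = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def url_features(url: str) -> dict:
    """Return a few lexical and structural features of one URL string."""
    # Add a scheme when it is missing so that urlsplit can find the host.
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = parts.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "num_special": sum(ch in "@-_%?=&" for ch in url),
        "has_ip_host": int(bool(IPV4_RE.match(host))),
        "uses_https": int(parts.scheme == "https"),
        "path_depth": len([seg for seg in parts.path.split("/") if seg]),
    }


def build_features(urls):
    """Build features for all URLs and return (rows, elapsed_seconds).

    The elapsed time is an assumed, simplified stand-in for FBT.
    """
    start = time.perf_counter()
    rows = [url_features(u) for u in urls]
    elapsed = time.perf_counter() - start
    return rows, elapsed


if __name__ == "__main__":
    # Made-up sample URLs (reserved example domain and documentation IP range).
    sample = [
        "https://example.com/docs/index.html",
        "http://192.0.2.10/login.php?id=1",
    ]
    rows, fbt = build_features(sample)
    for url, row in zip(sample, rows):
        print(url, row)
    print(f"Feature building time: {fbt:.6f} s")
```

In a setup like the one the abstract describes, rows of this kind could be passed to a classifier, and the measured time could serve as one of the weighted ranking indicators next to accuracy, F1 score and type II error.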

Keywords

Detecting malicious URL; classifiers; text to feature; deep learning; ranking algorithms; feature building time

Cite This Article

APA Style

Oprea, S., Bâra, A. (2024). Detecting Malicious Uniform Resource Locators Using an Applied Intelligence Framework. Computers, Materials & Continua, 79(3), 3827–3853. https://doi.org/10.32604/cmc.2024.051598

Vancouver Style

Oprea S, Bâra A. Detecting Malicious Uniform Resource Locators Using an Applied Intelligence Framework. Comput Mater Contin. 2024;79(3):3827–3853. https://doi.org/10.32604/cmc.2024.051598

IEEE Style

S. Oprea and A. Bâra, “Detecting Malicious Uniform Resource Locators Using an Applied Intelligence Framework,” Comput. Mater. Contin., vol. 79, no. 3, pp. 3827–3853, 2024. https://doi.org/10.32604/cmc.2024.051598

Copyright © 2024 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
