Simona-Vasilica Oprea*, Adela Bâra
CMC-Computers, Materials & Continua, Vol.79, No.3, pp. 3827-3853, 2024, DOI:10.32604/cmc.2024.051598
- 20 June 2024
Abstract The potential of text analytics is revealed by Machine Learning (ML) and Natural Language Processing (NLP) techniques. In this paper, we propose an NLP framework that is applied to multiple datasets to detect malicious Uniform Resource Locators (URLs). The proposed framework includes three categories of features, both ML and Deep Learning (DL) algorithms, and a ranking schema. To extract features from text, we apply frequency-based embeddings, such as the hash vectorizer and Term Frequency-Inverse Document Frequency (TF-IDF), and prediction-based embeddings, such as Google's word2vec (continuous bag of words, skip-gram). Further, we apply more…
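To make the frequency-based embeddings named in the abstract concrete, the following is a minimal Python sketch of TF-IDF weighting and a hashing-based count vector over URL tokens. It is not the authors' implementation: the tokenizer, the function names (`tokenize`, `tfidf`, `hashed_counts`), the smoothed IDF formula, the use of CRC32 as the hash function, the bucket count, and the example URLs are all assumptions chosen for illustration.

```python
import math
import re
import zlib
from collections import Counter


def tokenize(url):
    """Split a URL into lowercase alphanumeric tokens (assumed tokenizer)."""
    return [t for t in re.split(r"[^A-Za-z0-9]+", url.lower()) if t]


def tfidf(urls):
    """Return one {token: weight} dict per URL.

    Assumption: smoothed IDF, idf = ln((1 + n) / (1 + df)) + 1,
    and term frequency normalised by the number of tokens in the URL.
    """
    docs = [Counter(tokenize(u)) for u in urls]
    n = len(docs)
    # Document frequency: in how many URLs each token appears.
    df = Counter(tok for d in docs for tok in d)
    idf = {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}
    return [
        {tok: (cnt / sum(d.values())) * idf[tok] for tok, cnt in d.items()}
        for d in docs
    ]


def hashed_counts(url, n_buckets=16):
    """Hashing trick: fold token counts into a fixed-length vector.

    CRC32 is an illustrative choice of hash; collisions are possible.
    """
    vec = [0] * n_buckets
    for tok in tokenize(url):
        vec[zlib.crc32(tok.encode("utf-8")) % n_buckets] += 1
    return vec


# Hypothetical example URLs, not taken from the paper's datasets.
urls = [
    "http://example.com/login",
    "http://example.com/about",
    "http://secure-login.example.net/verify/account",
]
weights = tfidf(urls)
```

In this sketch, a token that occurs in every URL (such as `http`) receives the minimum IDF of 1, while a token unique to one URL (such as `about`) is weighted higher. The hashed vector has a fixed length regardless of vocabulary size, at the cost of possible collisions between tokens.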