Pseudo NLP Joint Spam Classification Technique for Big Data Cluster

doi:10.32604/cmc.2022.021421

WooHyun Park¹, Nawab Muhammad Faseeh Qureshi²,*, Dong Ryeol Shin¹

1 Department of Electrical and Computer Engineering, Sungkyunkwan University, Suwon, 16419, Korea
2 Department of Computer Education, Sungkyunkwan University, Seoul, 03063, Korea

* Corresponding Author: Nawab Muhammad Faseeh Qureshi. Email: email

Computers, Materials & Continua 2022, 71(1), 517-535. https://doi.org/10.32604/cmc.2022.021421

Received 02 July 2021; Accepted 12 August 2021; Issue published 03 November 2021

Abstract

Spam mail classification is considered a complex and error-prone task in a distributed computing environment. Various spam mail classification approaches are available, such as the naive Bayesian classifier, logistic regression, support vector machine, decision tree, recursive neural network, and long short-term memory algorithms. However, they do not consider the document when analyzing spam mail content. These approaches use the bag-of-words method, which analyzes a large amount of text data and classifies features with the help of term frequency-inverse document frequency (TF-IDF). Because a document contains many words, these approaches consume a massive amount of resources and become infeasible when classifying multiple associated mail documents together. As a result, spam mail is not fully classified, and these approaches leave loopholes. We therefore propose a term frequency topic-inverse document frequency (TFT-IDF) model that considers the meaning of text data in a larger semantic unit by applying weights based on the document's topic. Moreover, the proposed approach reduces the scarcity problem by using term frequency topic-inverse document frequency within a singular value decomposition model. It also reduces the dimensionality, which ultimately increases the strength of document classification. Experimental evaluations show that the proposed approach classifies spam mail documents with higher accuracy using individual document-independent processing computation. Comparative evaluations show that the proposed approach performs better than the logistic regression model in the distributed computing environment, with higher document word frequencies of 97.05%, 99.17%, and 96.59%.
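For readers unfamiliar with the bag-of-words baseline mentioned in the abstract, the following is a minimal sketch of plain TF-IDF weighting in Python. It is not the authors' TFT-IDF model: the topic-based weighting and the singular value decomposition step are not reproduced here. The function name `tf_idf`, the unsmoothed `log(N / df)` form of the inverse document frequency, and the toy corpus are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: term frequency normalized by document length,
    idf = log(N / df). Other variants (smoothing, sublinear tf) exist.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Toy corpus (hypothetical messages, not data from the paper).
corpus = [
    "win free prize now".split(),
    "free meeting notes attached".split(),
    "project meeting moved to monday".split(),
]
weights = tf_idf(corpus)
```

In this sketch a term that occurs in fewer documents (such as "win") receives a larger weight than one shared across documents (such as "free"). Each document becomes a sparse vector over the whole vocabulary, which illustrates the resource cost the abstract attributes to bag-of-words approaches.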

Keywords

NLP; big data; machine learning; TFT-IDF; spam mail

Cite This Article

APA Style

Park, W., Qureshi, N.M.F., Shin, D.R. (2022). Pseudo NLP Joint Spam Classification Technique for Big Data Cluster. Computers, Materials & Continua, 71(1), 517–535. https://doi.org/10.32604/cmc.2022.021421

Vancouver Style

Park W, Qureshi NMF, Shin DR. Pseudo NLP Joint Spam Classification Technique for Big Data Cluster. Comput Mater Contin. 2022;71(1):517–535. https://doi.org/10.32604/cmc.2022.021421

IEEE Style

W. Park, N. M. F. Qureshi, and D. R. Shin, “Pseudo NLP Joint Spam Classification Technique for Big Data Cluster,” Comput. Mater. Contin., vol. 71, no. 1, pp. 517–535, 2022. https://doi.org/10.32604/cmc.2022.021421

Copyright © 2022 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
