Open Access

ARTICLE

A Multivariate Relevance Frequency Analysis Based Feature Selection for Classification of Short Text Data

Saravanan Arumugam*

Department of Computing, Coimbatore Institute of Technology, Coimbatore, 641014, India

* Corresponding Author: Saravanan Arumugam.

Computer Systems Science and Engineering 2024, 48(4), 989-1008. https://doi.org/10.32604/csse.2024.051770

Abstract

Text mining presents unique challenges in extracting meaningful information from vast volumes of digital documents, and traditional filter feature selection methods often fall short in handling the complexities of short text data. To address this issue, this paper presents a novel feature selection approach for text classification that targets the high dimensionality and reduced accuracy that accompany growing document volumes. The proposed method, Multivariate Relevance Frequency Analysis, offers a tailored solution for diverse text data types. By integrating positive, negative, and dependency relevance computations, it prunes features effectively and enhances classification performance. The proposed model was evaluated extensively against several standard feature selection models on five datasets involving short and long texts, using four standard classifiers. The results indicate that the proposed model achieves the highest macro-F1 scores of 94% for the SMS dataset, 78.1% for the SLS dataset, 89.4% for the AYSC dataset, 71.32% for the Reuters dataset, and 98.63% for the 20Newsgroup dataset. The statistical analysis also indicates that the proposed model performs better on both short texts, such as messages and reviews, and long texts, such as documents, with superior performance on short-text data. The comparative analysis shows that the proposed model outperforms many other standard filter models.
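
The macro-F1 figures reported in the abstract follow, by the usual definition, from averaging per-class F1 scores with equal weight, so that a rare class counts as much as a frequent one. The paper's own evaluation code is not shown here; the following is only a minimal sketch of that standard definition, and the function name `macro_f1` and the toy labels are hypothetical illustrations, not data from the paper.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (standard macro-F1 definition)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears.
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels for a two-class short-text task.
y_true = ["spam", "spam", "ham", "ham", "ham", "ham"]
y_pred = ["spam", "ham", "ham", "ham", "ham", "spam"]
print(macro_f1(y_true, y_pred))
```

Because each class contributes equally, a classifier that ignores a minority class is penalized more heavily under macro-F1 than under plain accuracy, which matters for imbalanced collections of short messages.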

Cite This Article

APA Style
Arumugam, S. (2024). A multivariate relevance frequency analysis based feature selection for classification of short text data. Computer Systems Science and Engineering, 48(4), 989-1008. https://doi.org/10.32604/csse.2024.051770
Vancouver Style
Arumugam S. A multivariate relevance frequency analysis based feature selection for classification of short text data. Comput Syst Sci Eng. 2024;48(4):989-1008. https://doi.org/10.32604/csse.2024.051770
IEEE Style
S. Arumugam, "A Multivariate Relevance Frequency Analysis Based Feature Selection for Classification of Short Text Data," Comput. Syst. Sci. Eng., vol. 48, no. 4, pp. 989-1008, 2024. https://doi.org/10.32604/csse.2024.051770



This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.