Open Access

ARTICLE

Ensemble Filter-Wrapper Text Feature Selection Methods for Text Classification

Oluwaseun Peter Ige1,2, Keng Hoon Gan1,*
1 School of Computer Sciences, Universiti Sains Malaysia, Gelugor, 11800, Malaysia
2 Universal Basic Education Commission, Abuja, 900284, Nigeria
* Corresponding Author: Keng Hoon Gan.
(This article belongs to the Special Issue: Lightweight Methods and Resource-efficient Computing Solutions)

Computer Modeling in Engineering & Sciences https://doi.org/10.32604/cmes.2024.053373

Received 30 April 2024; Accepted 31 July 2024; Published online 11 September 2024

Abstract

Feature selection is a crucial technique in text classification: it improves the efficiency and effectiveness of classifiers and other machine learning techniques by reducing the dataset’s dimensionality. It does so by eliminating irrelevant, redundant, and noisy features to streamline the classification process. Various methods, from single feature selection techniques to ensemble filter-wrapper methods, have been used in the literature. Metaheuristic algorithms have become popular due to their ability to handle optimization complexity and the continuous influx of text documents. Feature selection is inherently multi-objective, balancing feature relevance and classification accuracy against the reduction of redundant features. This research pursues a two-fold objective for feature selection. The first objective is to identify the top-ranked features using an ensemble of three multi-univariate filter methods: Information Gain (Infogain), Chi-Square (Chi2), and Analysis of Variance (ANOVA). This aims to maximize feature relevance while minimizing redundancy. The second objective is to reduce the number of selected features and increase accuracy through a hybrid approach combining Artificial Bee Colony (ABC) and Genetic Algorithms (GA). This hybrid method operates in a wrapper framework to identify the most informative subset of text features. A Support Vector Machine (SVM) was employed as the performance evaluator for the proposed model, which was tested on two high-dimensional multiclass datasets. The experimental results demonstrated that the ensemble filter combined with the ABC+GA hybrid approach is a promising solution for text feature selection, offering superior performance compared to other existing feature selection algorithms.
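
As an illustration of the first objective, the following minimal sketch shows one way an ensemble of three filter rankings could be combined into a single top-k feature list. The abstract does not state how the Infogain, Chi2, and ANOVA rankings are aggregated, so the mean-rank rule used here is an assumption, and the function names (`rank_features`, `ensemble_top_k`) and all scores are hypothetical values chosen only for illustration.

```python
from typing import Dict, List


def rank_features(scores: Dict[str, float]) -> Dict[str, int]:
    """Map each feature to its rank under one filter (1 = highest score).

    Ties are broken alphabetically so the result is deterministic.
    """
    ordered = sorted(scores, key=lambda feat: (-scores[feat], feat))
    return {feat: pos for pos, feat in enumerate(ordered, start=1)}


def ensemble_top_k(score_tables: List[Dict[str, float]], k: int) -> List[str]:
    """Return the k features with the best (lowest) mean rank across filters.

    Assumption: every table scores the same set of features, and the
    ensemble rule is a plain average of per-filter ranks.
    """
    ranks = [rank_features(table) for table in score_tables]
    features = sorted(score_tables[0])
    mean_rank = {f: sum(r[f] for r in ranks) / len(ranks) for f in features}
    return sorted(features, key=lambda f: (mean_rank[f], f))[:k]


# Hypothetical per-term scores from three univariate filters.
infogain = {"goal": 0.40, "match": 0.35, "the": 0.01, "vote": 0.30}
chi2 = {"goal": 12.0, "match": 15.0, "the": 0.2, "vote": 9.0}
anova = {"goal": 8.0, "match": 7.5, "the": 0.1, "vote": 9.5}

top = ensemble_top_k([infogain, chi2, anova], k=2)
print(top)
```

In a filter-wrapper pipeline of the kind described, a shortlist like `top` would then be handed to the wrapper stage (here, the ABC+GA hybrid evaluated with an SVM) for further reduction; that stage is not sketched because the abstract gives no details of its operators.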

Keywords

Metaheuristic algorithms; text classification; multi-univariate filter feature selection; ensemble filter-wrapper techniques