Ensemble Filter-Wrapper Text Feature Selection Methods for Text Classification
1 School of Computer Sciences, Universiti Sains Malaysia, Gelugor, 11800, Malaysia
2 Universal Basic Education Commission, Abuja, 900284, Nigeria
* Corresponding Author: Keng Hoon Gan. Email:
(This article belongs to the Special Issue: Lightweight Methods and Resource-efficient Computing Solutions)
Computer Modeling in Engineering & Sciences 2024, 141(2), 1847-1865. https://doi.org/10.32604/cmes.2024.053373
Received 30 April 2024; Accepted 31 July 2024; Issue published 27 September 2024
Abstract
Feature selection is a crucial technique in text classification: by reducing the dataset’s dimensionality, it improves the efficiency and effectiveness of classifiers and other machine learning techniques. It works by eliminating irrelevant, redundant, and noisy features, which streamlines the classification process. The literature has used a range of methods, from single feature selection techniques to ensemble filter-wrapper methods. Metaheuristic algorithms have become popular because they can handle the complexity of the optimization problem and cope with the continuous influx of text documents. Feature selection is inherently multi-objective: it must increase feature relevance and classification accuracy while reducing the number of redundant features. This research therefore pursues a two-fold objective. The first objective is to identify the top-ranked features using an ensemble of three univariate filter methods, Information Gain (Infogain), Chi-Square (Chi2), and Analysis of Variance (ANOVA), so as to maximize feature relevance while minimizing redundancy. The second objective is to reduce the number of selected features while increasing accuracy, using a hybrid of the Artificial Bee Colony (ABC) algorithm and Genetic Algorithms (GA). This hybrid operates in a wrapper framework to identify the most informative subset of text features. A Support Vector Machine (SVM) served as the performance evaluator for the proposed model, which was tested on two high-dimensional multiclass datasets. The experimental results demonstrate that the ensemble filter combined with the ABC+GA hybrid is a promising solution for text feature selection, outperforming the existing feature selection algorithms it was compared against.
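The first stage described above (scoring terms with three filters and keeping the top-ranked ones) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: each document is a set of tokens, so every feature is a binary "term present" indicator; the three rankings are fused by summing each term's per-filter rank (a Borda-style rule chosen here only for illustration, since the abstract does not say how the ensemble combines scores); and all function names are hypothetical. The ABC+GA wrapper stage and the SVM evaluator are not shown.

```python
import math
from collections import Counter

# Hypothetical helpers. Assumption: each document is a set of tokens, so each
# feature is a binary "term present" indicator (the paper may use counts or TF-IDF).


def entropy(counts):
    """Shannon entropy (bits) of a list of class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, term):
    """IG(t) = H(C) - [P(t) H(C|t) + P(~t) H(C|~t)]."""
    n = len(docs)
    with_t = Counter(y for d, y in zip(docs, labels) if term in d)
    without_t = Counter(y for d, y in zip(docs, labels) if term not in d)
    n_t = sum(with_t.values())
    conditional = 0.0
    if n_t:
        conditional += (n_t / n) * entropy(list(with_t.values()))
    if n - n_t:
        conditional += ((n - n_t) / n) * entropy(list(without_t.values()))
    return entropy(list(Counter(labels).values())) - conditional


def chi_square(docs, labels, term):
    """Pearson chi-square of the 2 x k table (term present/absent vs. class)."""
    n = len(docs)
    classes = sorted(set(labels))
    observed = {(p, c): 0 for p in (True, False) for c in classes}
    for d, y in zip(docs, labels):
        observed[(term in d, y)] += 1
    row = {p: sum(observed[(p, c)] for c in classes) for p in (True, False)}
    col = {c: observed[(True, c)] + observed[(False, c)] for c in classes}
    stat = 0.0
    for (p, c), o in observed.items():
        expected = row[p] * col[c] / n
        if expected > 0:
            stat += (o - expected) ** 2 / expected
    return stat


def anova_f(docs, labels, term):
    """One-way ANOVA F statistic of the 0/1 presence indicator across classes."""
    groups = {}
    for d, y in zip(docs, labels):
        groups.setdefault(y, []).append(1.0 if term in d else 0.0)
    n, k = len(docs), len(groups)
    if k < 2 or n <= k:
        return 0.0
    grand_mean = sum(sum(g) for g in groups.values()) / n
    means = {y: sum(g) / len(g) for y, g in groups.items()}
    ssb = sum(len(g) * (means[y] - grand_mean) ** 2 for y, g in groups.items())
    ssw = sum((v - means[y]) ** 2 for y, g in groups.items() for v in g)
    if ssw == 0:
        # Every class is constant on this feature: zero within-class variance.
        return math.inf if ssb > 0 else 0.0
    return (ssb / (k - 1)) / (ssw / (n - k))


def ensemble_top_k(docs, labels, k, scorers=(information_gain, chi_square, anova_f)):
    """Rank terms under each filter, then keep the k with the lowest summed rank."""
    vocab = sorted(set().union(*docs))
    rank_sum = dict.fromkeys(vocab, 0)
    for scorer in scorers:
        scores = {t: scorer(docs, labels, t) for t in vocab}
        # Higher score = more relevant; ties are broken alphabetically.
        ordered = sorted(vocab, key=lambda t: (-scores[t], t))
        for rank, term in enumerate(ordered, start=1):
            rank_sum[term] += rank
    return sorted(vocab, key=lambda t: (rank_sum[t], t))[:k]


# Toy usage example (made-up data, for illustration only).
docs = [
    {"goal", "match", "team"},
    {"goal", "league", "team"},
    {"vote", "election", "team"},
    {"vote", "party", "poll"},
]
labels = ["sport", "sport", "politics", "politics"]
print(ensemble_top_k(docs, labels, k=2))  # ['goal', 'vote']
```

Summing ranks rather than raw scores sidesteps the fact that IG (in bits), chi-square, and F statistics live on different scales. For real corpora, scikit-learn's `sklearn.feature_selection` module provides `chi2`, `f_classif`, and `mutual_info_classif` for the individual scores; the surviving top-k terms would then be passed to the wrapper stage.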
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.