Open Access
ARTICLE
Filter and Embedded Feature Selection Methods to Meet Big Data Visualization Challenges
1 Mathematics Department, Faculty of Science, Al-Azhar University, Cairo, 11884, Egypt
* Corresponding Author: Luay Thamer Mohammed. Email:
Computers, Materials & Continua 2023, 74(1), 817-839. https://doi.org/10.32604/cmc.2023.032287
Received 13 May 2022; Accepted 22 June 2022; Issue published 22 September 2022
Abstract
This study addresses the challenges of big data visualization by applying data reduction based on feature selection, with the aim of reducing the volume of big data and minimizing model training time (Tt) while maintaining data quality. We use an embedded method, “Select from model” (SFM) with the random forest importance (RFI) algorithm, and compare it with a filter method, “Select percentile” (SP) with the chi-square (Chi2) test, for selecting the most important features. The selected features are then fed into a classification process using the logistic regression (LR) algorithm and the k-nearest neighbor (KNN) algorithm. The classification accuracy (AC) of LR is compared with that of KNN in Python on eight data sets to determine which combination performs best when feature selection is applied. The study concludes that feature selection has a significant impact on the analysis and visualization of the data once redundant data and data that do not affect the target are removed. After several comparisons, the study proposes SFMLR, which uses SFM based on the RFI algorithm for feature selection and the LR algorithm for classification. The proposal proved its efficacy when its results were compared with recent literature.
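The comparison described in the abstract can be sketched with scikit-learn, whose `SelectFromModel` and `SelectPercentile` classes match the SFM and SP methods named above. This is only an illustrative sketch, not the authors' code: the data set (scikit-learn's bundled breast cancer data), the 70/30 split, the 50th percentile, the number of trees, the number of neighbors, and the min-max scaling step (used here so that Chi2 receives non-negative inputs) are all assumptions made for the example.

```python
import time

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel, SelectPercentile, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Assumption: a small bundled data set stands in for the paper's eight data sets.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Feature selectors: embedded (SFM with random forest importances)
# and filter (SP with the chi-square score). Hyperparameters are assumed.
selectors = {
    "SFM": lambda: SelectFromModel(
        RandomForestClassifier(n_estimators=100, random_state=0)
    ),
    "SP": lambda: SelectPercentile(chi2, percentile=50),
}

# Classifiers compared in the abstract.
classifiers = {
    "LR": lambda: LogisticRegression(max_iter=1000),
    "KNN": lambda: KNeighborsClassifier(n_neighbors=5),
}

results = {}
for sel_name, make_selector in selectors.items():
    for clf_name, make_classifier in classifiers.items():
        # Scaling to [0, 1] keeps the training features non-negative for chi2.
        pipe = make_pipeline(MinMaxScaler(), make_selector(), make_classifier())

        start = time.perf_counter()
        pipe.fit(X_train, y_train)
        tt = time.perf_counter() - start       # training time (Tt), in seconds

        ac = pipe.score(X_test, y_test)        # classification accuracy (AC)
        n_kept = int(pipe[1].get_support().sum())  # features kept by the selector
        results[sel_name + clf_name] = (ac, tt, n_kept)

for name, (ac, tt, n_kept) in results.items():
    print(f"{name}: AC={ac:.3f}  Tt={tt:.3f}s  features kept={n_kept}/{X.shape[1]}")
```

Each of the four combinations (SFMLR, SFMKNN, SPLR, SPKNN) is fitted as one pipeline, so the measured Tt includes the feature selection step as well as the classifier. The numbers produced on this stand-in data set say nothing about the results reported in the article.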
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.