Feature Selection for Cluster Analysis in Spectroscopy

Simon Crase; Benjamin Hall; Suresh Thennadil

doi:10.32604/cmc.2022.022414


ARTICLE


Simon Crase^1,2, Benjamin Hall^2, Suresh N. Thennadil^3,*

1 College of Engineering, IT & Environment, Charles Darwin University, Casuarina, NT 0909, Australia
2 Defence Science and Technology Group, Edinburgh, 5111, Australia
3 Energy and Resources Institute, Charles Darwin University, Casuarina, NT 0909, Australia

* Corresponding Author: Suresh N. Thennadil.

Computers, Materials & Continua 2022, 71(2), 2435-2458. https://doi.org/10.32604/cmc.2022.022414

Received 06 August 2021; Accepted 07 September 2021; Issue published 07 December 2021

Abstract

Cluster analysis in spectroscopy presents unique challenges because spectroscopic data are typically high-dimensional with small sample sizes. To improve cluster analysis outcomes, feature selection can be used to remove redundant or irrelevant features and reduce the dimensionality. However, for cluster analysis, this must be done in an unsupervised manner without the benefit of data labels. This paper presents a novel feature selection approach for cluster analysis that uses clusterability metrics to remove the features that contribute least to a dataset's tendency to cluster. Two versions are presented and evaluated: the Hopkins clusterability filter, which uses the Hopkins test for spatial randomness, and the Dip clusterability filter, which uses the Dip test for unimodality. These new techniques, along with a range of existing filter and wrapper feature selection techniques, were evaluated on eleven real-world spectroscopy datasets using internal and external clustering indices. The newly proposed Hopkins clusterability filter performed the best of the six filter techniques evaluated. However, results varied greatly between techniques depending on the specifics of the dataset and the number of features selected, with significant instability observed for most techniques at low numbers of features. The genetic algorithm wrapper technique avoided this instability, performed consistently across all datasets, and gave better results on average than using all the features in the spectra.
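To make the idea of a clusterability filter concrete, the following is a minimal sketch in Python. It is not the authors' implementation: the abstract does not give the algorithm, so the greedy backward elimination, the function names `hopkins` and `hopkins_filter`, the sample size `m`, and the use of unexponentiated nearest-neighbour distances are all assumptions made for illustration. The Hopkins statistic here compares nearest-neighbour distances from uniformly generated points (`u`) with those from sampled real points (`w`); values near 0.5 suggest spatial randomness and values near 1 suggest a tendency to cluster.

```python
import numpy as np


def hopkins(X, m=None, rng=None):
    """Hopkins statistic of X (rows = samples, columns = features).

    Assumed variant: plain distance sums, no exponent on the distances.
    """
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    m = m or max(1, n // 10)  # assumed default: sample 10% of the points

    # w: distance from each sampled real point to its nearest other real point
    idx = rng.choice(n, size=m, replace=False)
    dists = np.linalg.norm(X[idx, None, :] - X[None, :, :], axis=2)
    dists[np.arange(m), idx] = np.inf  # ignore each point's distance to itself
    w = dists.min(axis=1)

    # u: distance from uniform points in the bounding box to the nearest real point
    lo, hi = X.min(axis=0), X.max(axis=0)
    U = rng.uniform(lo, hi, size=(m, d))
    u = np.linalg.norm(U[:, None, :] - X[None, :, :], axis=2).min(axis=1)

    return u.sum() / (u.sum() + w.sum())


def hopkins_filter(X, n_keep, seed=0):
    """Greedy backward elimination (an assumed strategy, n_keep >= 1).

    At each step, drop the feature whose removal leaves the highest
    Hopkins statistic, i.e. the feature contributing least to clusterability.
    Returns the indices of the retained features.
    """
    X = np.asarray(X, dtype=float)
    keep = list(range(X.shape[1]))
    while len(keep) > n_keep:
        scores = [
            hopkins(X[:, [f for f in keep if f != j]], rng=seed)
            for j in keep
        ]
        keep.pop(int(np.argmax(scores)))
    return keep
```

Calling `hopkins_filter(spectra, n_keep=20)` on a hypothetical samples-by-wavenumbers array `spectra` would return the indices of 20 retained wavenumbers. Each elimination step costs one Hopkins evaluation per remaining feature, so for full spectra a cheaper per-feature ranking may be more practical than this greedy loop.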

Keywords

Cluster analysis; spectroscopy; unsupervised learning; feature selection; wavenumber selection

Cite This Article

APA Style

Crase, S., Hall, B., Thennadil, S.N. (2022). Feature Selection for Cluster Analysis in Spectroscopy. Computers, Materials & Continua, 71(2), 2435–2458. https://doi.org/10.32604/cmc.2022.022414

Vancouver Style

Crase S, Hall B, Thennadil SN. Feature Selection for Cluster Analysis in Spectroscopy. Comput Mater Contin. 2022;71(2):2435–2458. https://doi.org/10.32604/cmc.2022.022414

IEEE Style

S. Crase, B. Hall, and S. N. Thennadil, “Feature Selection for Cluster Analysis in Spectroscopy,” Comput. Mater. Contin., vol. 71, no. 2, pp. 2435–2458, 2022. https://doi.org/10.32604/cmc.2022.022414


Copyright © 2022 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
