Open Access
ARTICLE
FSPAM: A Feature Construction Method to Identifying Cell Populations in ScRNA-seq Data
1 Department of Computer Engineering, Dezful Branch, Islamic Azad University, Dezful, Iran.
2 Department of Electrical Engineering, Faculty of Engineering, Shahid Chamran University of Ahvaz, Ahvaz, Iran.
* Corresponding Author: Mohammad Mosleh.
Computer Modeling in Engineering & Sciences 2020, 122(1), 377-397. https://doi.org/10.32604/cmes.2020.08496
Received 30 August 2019; Accepted 12 November 2019; Issue published 01 January 2020
Abstract
The emergence of single-cell RNA-sequencing (scRNA-seq) technology has introduced new information about the structure of cells, diseases, and their associated biological factors. One of the main uses of scRNA-seq is identifying cell populations, which sometimes leads to the detection of rare cell populations. However, the technology is still in its infancy, and its advantages come with computational challenges that are only beginning to be addressed. An important tool in the analysis is dimensionality reduction, which transforms high-dimensional data into a meaningful reduced subspace. The technique allows noise removal, visualization, and compression of high-dimensional data. This paper presents a new dimensionality reduction approach in which an unsupervised multistage process creates a feature set of highly valuable markers that facilitates the isolation of cell populations. Our proposed method, called fusion of the Spearman and Pearson affinity matrices (FSPAM), is based on a graph-based Gaussian kernel. Graph theory can help overcome the challenge of nonlinear relations between cellular markers in scRNA-seq data. Furthermore, with a proper fusion of the Pearson and Spearman correlation coefficient criteria, the method extracts a set of the most important features in a new space. In fact, FSPAM aggregates the various aspects of cell-to-cell similarity derived from the Pearson and Spearman metrics and reveals new aspects of cell-to-cell similarity, which can be used to extract new features. Cell populations were identified with the k-means++ clustering method using the features extracted by FSPAM on different scRNA-seq datasets. The results suggest that the proposed method, regardless of the characteristics that govern each dataset, achieves greater accuracy and better quality than previous methods.
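The general idea in the abstract (Pearson and Spearman cell-to-cell similarities, a Gaussian kernel, a fusion step, and feature extraction) can be illustrated with a minimal sketch. The abstract does not specify the fusion rule, the kernel bandwidth, or how features are extracted, so everything below is an assumption made for illustration: the function names `rank_rows`, `gaussian_affinity`, and `fused_features` are hypothetical, the fusion is a plain average, the distance is `1 - correlation`, and the features are leading eigenvectors of the normalized fused affinity. It should not be read as the authors' FSPAM implementation.

```python
import numpy as np

def rank_rows(x):
    # Ordinal ranks within each row (each cell's expression vector).
    # Ties are broken by position, which is adequate for this sketch.
    return np.argsort(np.argsort(x, axis=1), axis=1).astype(float)

def gaussian_affinity(corr, sigma=0.5):
    # ASSUMPTION: convert correlation to a distance in [0, 2],
    # then apply a Gaussian kernel with bandwidth sigma.
    dist = 1.0 - corr
    return np.exp(-(dist ** 2) / (2.0 * sigma ** 2))

def fused_features(expr, n_features=2, sigma=0.5):
    """expr: cells x genes matrix. Returns a cells x n_features matrix."""
    pearson = np.corrcoef(expr)              # cell-to-cell Pearson
    spearman = np.corrcoef(rank_rows(expr))  # Pearson on ranks = Spearman
    # ASSUMPTION: fuse the two affinity matrices by simple averaging.
    fused = 0.5 * (gaussian_affinity(pearson, sigma)
                   + gaussian_affinity(spearman, sigma))
    # Symmetric normalization D^(-1/2) W D^(-1/2) of the affinity graph.
    degree = fused.sum(axis=1)
    normalized = fused / np.sqrt(np.outer(degree, degree))
    # eigh returns eigenvalues in ascending order; keep the largest ones.
    _, vecs = np.linalg.eigh(normalized)
    return vecs[:, ::-1][:, :n_features]

# Synthetic demo: two groups of 20 cells over 50 genes, each group
# expressing a different half of the genes, plus Gaussian noise.
rng = np.random.default_rng(0)
signal_a = np.r_[np.full(25, 5.0), np.zeros(25)]
signal_b = signal_a[::-1]
expr = np.vstack([signal_a + rng.normal(size=(20, 50)),
                  signal_b + rng.normal(size=(20, 50))])
features = fused_features(expr, n_features=2)
```

The resulting `features` matrix could then be passed to any k-means++ implementation, for example scikit-learn's `KMeans` with `init="k-means++"`, to assign cells to populations.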
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.