Open Access
ARTICLE
A Method for Fast Feature Selection Utilizing Cross-Similarity within the Context of Fuzzy Relations
1 School of Big Data and Artificial Intelligence, Chizhou University, Chizhou, 247000, China
2 Anhui Education Big Data Intelligent Perception and Application Engineering Research Center, Anhui Provincial Joint Construction Key Laboratory of Intelligent Education Equipment and Technology, Chizhou, 247000, China
* Corresponding Author: Qinli Zhang. Email:
(This article belongs to the Special Issue: Advanced Algorithms for Feature Selection in Machine Learning)
Computers, Materials & Continua 2025, 83(1), 1195-1218. https://doi.org/10.32604/cmc.2025.060833
Received 11 November 2024; Accepted 28 January 2025; Issue published 26 March 2025
Abstract
Feature selection methods rooted in rough sets confront two notable limitations: their high computational complexity and sensitivity to noise, rendering them impractical for managing large-scale and noisy datasets. The primary issue stems from these methods’ undue reliance on all samples. To overcome these challenges, we introduce the concept of cross-similarity grounded in a robust fuzzy relation and design a rapid and robust feature selection algorithm. Firstly, we construct a robust fuzzy relation by introducing a truncation parameter. Then, based on this fuzzy relation, we propose the concept of cross-similarity, which emphasizes the sample-to-sample similarity relations that uniquely determine feature importance, rather than considering all such relations equally. After studying the manifestations and properties of cross-similarity across different fuzzy granularities, we propose a forward greedy feature selection algorithm that leverages cross-similarity as the foundation for information measurement. This algorithm significantly reduces the time complexity from O(m²n²) to O(mn²). Experimental findings reveal that the average runtime of five state-of-the-art comparison algorithms is roughly 3.7 times longer than that of our algorithm, while our algorithm achieves an average accuracy that surpasses those of the five comparison algorithms by approximately 3.52%. This underscores the effectiveness of our approach. This paper paves the way for applying feature selection algorithms grounded in fuzzy rough sets to large-scale gene datasets.
Abbreviations
FRS: Fuzzy rough set
FCRS: Fuzzy compatible relation space
FDT: Fuzzy decision table
FRSM: Fuzzy rough set model
CSFS: Cross-similarity based feature selection
With the evolution of information technology, a substantial amount of information is captured in the form of data, and these datasets often contain a large quantity of redundant information. In data mining and model training, such redundant features increase computational complexity and can even result in model distortion, ultimately affecting the model’s predictive capability. Feature selection (also called attribute selection or attribute reduction) is a crucial technique for minimizing redundant features; it aims to identify the smallest subset of features that retains the distinguishability of the data.
Rough set theory [1] enjoys widespread application across diverse fields [2–4], with feature selection emerging as a particularly prominent research area within its scope. However, a limitation of rough set theory is its inability to handle continuous data directly. When dealing with features with continuous values, discretization is necessary, which can lead to the loss of valuable data information. To address these limitations of traditional rough set theory, Dubois et al. [5] introduced a novel concept by integrating fuzzy sets with rough sets, resulting in the emergence of a fuzzy rough set (FRS). This extension of Pawlak’s rough set model to the fuzzy domain represents a significant advancement in the field. Since then, researchers have conducted thorough investigations into fuzzy rough set theory, further refining its theoretical framework, alleviating constraints in practical applications, mitigating the impact of noisy data, and enhancing its overall robustness. Some researchers [6,7] have endeavored to generalize the fuzzy relationships among samples in fuzzy rough sets, broadening their scope and enhancing the model’s adaptability. Additionally, they have gone beyond the traditional definitions of the upper and lower approximation operators, giving the model a more comprehensive and flexible framework. These innovative extensions not only increase the model’s flexibility but also enable it to handle a wider range of practical applications with greater precision and adaptability. Despite the efforts of some researchers [8–11] to design more reasonable relationships and upper-lower approximation operations in order to mitigate the impact of noisy data and enhance the robustness of these fuzzy rough set models, they are often accompanied by increased computational complexity. Some researchers [12–16] have broadened the concept of fuzzy partition to fuzzy covering, thereby easing the constraints on the model’s application and enhancing its adaptability to diverse scenarios. This generalization not only facilitates a more flexible application of the model but also empowers it to tackle a broader spectrum of practical problems with enhanced precision and accuracy.
With the evolution of fuzzy rough set theory, numerous feature selection algorithms have emerged, built upon this theory. These algorithms aim to identify and extract salient features from datasets, aiding in their effective representation and interpretation. Jensen et al. [17] introduced the concepts of the fuzzy positive region and the fuzzy-rough dependency function, subsequently developing a feature selection model based on this dependency. Yang et al. [18] proposed a granular matrix and a granular reduction model, informed by fuzzy β-covering theory. Huang et al. [19,20] introduced a novel discernibility measure and a novel fitting model for attribute reduction, utilizing the fuzzy β-neighborhood to capture the distinguishing power of a fuzzy covering family. Huang et al. [21] devised a robust rough set model that integrates fuzzy rough sets, covering-based rough sets, and multi-granulation rough sets. Furthermore, they successfully employed this model in the realm of feature selection. Wan et al. [22] proposed a novel interactive and complementary feature selection approach, which is based on the fuzzy multi-neighborhood rough set model, specifically designed for use with fuzzy and uncertain data. Hu et al. [23] developed a fuzzy multi-objective feature selection method, incorporating a fuzzy dominance relationship and particle swarm optimization. Deng et al. [24] formulated the dual similarity by leveraging the concept of neighborhood fuzzy entropy, and subsequently utilized it for feature selection in label distribution learning. Dai et al. [25] introduced the concept of multi-fuzzy β-covering approximation spaces and subsequently proposed a forward greedy fuzzy β-covering reduction algorithm, which is based on the monotone conditional entropy in a multi-fuzzy β-covering approximation space.
Current feature selection algorithms predominantly employ forward greedy feature selection, which follows a sequential process consisting of several distinct steps:
1) Determine an evaluation metric.
2) Model initialization: start with an empty set as the selected feature subset and a candidate feature subset consisting of all features.
3) Feature selection: in each iteration, move one feature from the candidate feature subset into the selected feature subset according to the evaluation metric. This process continues until the stopping criterion is satisfied.
When dealing with a dataset containing n features and m samples, the forward greedy feature selection algorithm, in the worst-case scenario, requires computing the evaluation metric n(n−1)/2 times. Moreover, each evaluation of the metric requires the relationships between every pair of samples in the dataset, i.e., m² relationships such as distances or similarities. Consequently, the time complexity of these algorithms typically lies in the order of O(m²n²) or higher. A significant challenge faced by these feature selection algorithms is therefore their high computational complexity, which makes it difficult for them to handle large sample sizes or high-dimensional data in practical settings.
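To make the above procedure concrete, the following is a minimal Python sketch of the generic forward greedy scheme; `evaluate` is a placeholder for whichever uncertainty measure a concrete algorithm adopts (dependency, conditional entropy, and so on), and a larger value is assumed to be better here (CSFS, introduced later, minimizes its measure instead).

```python
# A minimal sketch of the generic forward greedy scheme described above.
from typing import Callable, List, Set

def forward_greedy_selection(n_features: int,
                             evaluate: Callable[[Set[int]], float],
                             tol: float = 1e-12) -> List[int]:
    selected: List[int] = []
    candidates = set(range(n_features))
    best_score = evaluate(set())
    while candidates:
        # evaluate every remaining candidate together with the selected subset
        scores = {f: evaluate(set(selected) | {f}) for f in candidates}
        f_best = max(scores, key=scores.get)
        if scores[f_best] - best_score <= tol:   # stopping criterion: no improvement
            break
        selected.append(f_best)
        candidates.remove(f_best)
        best_score = scores[f_best]
    return selected
```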
Through rigorous research and meticulous analysis, we have identified the following issues with existing uncertainty measures and their corresponding feature selection algorithms:
1) Existing feature selection algorithms must consider the relationships between all samples when calculating uncertainty measures (based on dependency or conditional information entropy).
2) Existing sample relationships may exhibit weak associations due to noisy data, which can result in the selection of redundant features.
3) Existing feature selection algorithms struggle to process large sample sizes or high-dimensional datasets effectively due to their high computational complexity.
Drawing upon the insights obtained, we introduce a novel and efficient feature selection algorithm that is grounded in cross-similarity. This algorithm represents a significant advancement in addressing the above challenges associated with feature selection. The primary contributions of this paper are outlined as follows:
1) We establish a fuzzy binary relation known as the “cross-similarity relation”, which is highly sensitive to samples and relationships that impact the uncertainty of the decision tables while ignoring samples that are already consistent in the decision table.
2) Using the cross-similarity relation as the uncertainty measure, we propose a new feature selection algorithm. Compared to existing feature selection algorithms, our algorithm reduces the time complexity from O(m²n²) to O(mn²).
3) To mitigate the impact of noise-induced weak relationships, we introduce a truncation parameter that effectively reduces the influence of noisy data on the feature selection algorithm.
The paper is organized as follows: Section 2 covers the basics of similar relations and fuzzy rough sets. Section 3 introduces cross-similarity relations and their significance, along with their key properties. Section 4 details our new forward greedy feature selection model using fuzzy cross-similarity for data analysis. Section 5 presents experiments proving our method’s effectiveness, stability, and efficiency. Finally, Section 6 summarizes our findings, limitations of the algorithm, and further research directions.
In this section, we will delve into the fundamental concepts of similar relations and fuzzy rough sets.
In an information table, a binary relation among samples can be constructed based on the feature set. This binary relation is generally either a similarity relation [5,20,25] or a distance relation [7,26].
Definition 1. Let U={x1,x2,…,xm} be a set of samples, A={a1,a2,…,an} be a set of conditional features, and R={r1,r2,…,rn} be a set of similar relations between samples, where ri∈R(i=1,2,⋅⋅⋅,n) is a fuzzy compatible relation derived from the corresponding feature ai∈A, signifying that ri(xj,xk)∈[0,1](xj,xk∈U) and ri fulfills the properties of reflexivity and symmetry. Then, the pair (U,R) is referred to as a fuzzy compatible relation space (FCRS).
Definition 2. Let (U,R) be an FCRS, x,y∈U, G⊆R, and SimG(x,y)=⋀ri∈Gri(x,y). Then SimG is called a multi-attribute fuzzy compatible relation concerning G.
Definition 3. A parameterized multi-attribute fuzzy compatible relation is defined as follows:
$$\operatorname{Sim}^{\lambda}_{G}(x,y)=\begin{cases}0, & \operatorname{Sim}_{G}(x,y)<\lambda\\ \operatorname{Sim}_{G}(x,y), & \operatorname{Sim}_{G}(x,y)\ge\lambda\end{cases}\tag{1}$$
where 0≤λ<1.
The truncation parameter λ can be used to eliminate weak similarity relationships introduced by noisy data in the dataset. SimλG is capable of filtering out weak similarity relationships that exert influence on data uncertainty.
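For illustration, the following is a minimal NumPy sketch of Definitions 2 and 3: the multi-attribute relation is the element-wise minimum of the single-attribute relations, and the truncation parameter λ zeroes out weak similarities as in Eq. (1). The matrix-based representation and the function name are our own choices, not part of the original algorithm description.

```python
# A minimal sketch of Definitions 2-3. `relations` is assumed to be a list of
# m x m single-attribute fuzzy compatible relations r_i (one per feature in G).
import numpy as np

def sim_lambda(relations, lam):
    # Definition 2: Sim_G(x, y) = min over r_i in G of r_i(x, y)
    sim = np.minimum.reduce([np.asarray(r, dtype=float) for r in relations])
    # Eq. (1): entries below the truncation parameter lambda are set to 0,
    # removing weak (possibly noise-induced) similarities
    sim[sim < lam] = 0.0
    return sim
```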
Proposition 4. Let (U,R) be an FCRS. It holds that, for any x,y∈U,
1) if G1⊆G2⊆R, then SimλG1(x,y)≥SimλG2(x,y);
2) if λ1≤λ2, then Simλ1G(x,y)≥Simλ2G(x,y).
Proof. 1) For any ri∈G1, G1⊆G2⟹ri∈G2, so SimG1(x,y)=⋀ri∈G1ri(x,y)≥⋀ri∈G2ri(x,y)=SimG2(x,y); since the truncation in Eq. (1) is monotone, SimλG1(x,y)≥SimλG2(x,y).
2) If SimG(x,y)<λ1≤λ2, then Simλ1G(x,y)=Simλ2G(x,y)=0; if λ1≤SimG(x,y)<λ2, then Simλ1G(x,y)=SimG(x,y)≥Simλ2G(x,y)=0; if λ1≤λ2≤SimG(x,y), then Simλ1G(x,y)=Simλ2G(x,y)=SimG(x,y). In summary, Simλ1G(x,y)≥Simλ2G(x,y). □
Definition 5. Let (U,R) be an FCRS, with D serving as the decision feature. Then the triple (U,R,D) is known as a fuzzy decision table (FDT).
Let U/D={D1,D2,…,Dp} denote the ensemble of equivalent classes that are categorized by D. Di∈U/D(i=1,2,…,p) can also be conceptualized as a fuzzy set, characterized by clear boundaries.
$$D_i(x)=\begin{cases}0, & x\notin D_i\\ 1, & x\in D_i\end{cases}\tag{2}$$
Definition 6. Let (U,R) be an FCRS, ∀A∈ℱ(U), ∀G⊆R, and ∀λ∈[0,1]. Then the lower and upper approximations of A concerning G and λ are respectively as follows:
$$\underline{Apr}^{\lambda}_{G}(A)(x)=\bigwedge_{y\in U}\left\{A(y)\vee\left(1-\operatorname{Sim}^{\lambda}_{G}(x,y)\right)\right\}\tag{3}$$
$$\overline{Apr}^{\lambda}_{G}(A)(x)=\bigvee_{y\in U}\left\{A(y)\wedge \operatorname{Sim}^{\lambda}_{G}(x,y)\right\}\tag{4}$$
Then the quadruple (U,G,A,λ) is called a fuzzy rough set model (FRSM) concerning G and λ.
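As an illustration, a minimal NumPy sketch of Eqs. (3) and (4) is given below; `sim` is assumed to be the m x m matrix of SimλG values and `A` a length-m membership vector of the fuzzy set, both assumptions of this sketch rather than notation from the paper.

```python
# A minimal sketch of the lower and upper approximations of Eqs. (3)-(4).
import numpy as np

def lower_approximation(sim, A):
    # Apr_lower(A)(x) = min over y of max{ A(y), 1 - Sim(x, y) }
    return np.min(np.maximum(A[None, :], 1.0 - sim), axis=1)

def upper_approximation(sim, A):
    # Apr_upper(A)(x) = max over y of min{ A(y), Sim(x, y) }
    return np.max(np.minimum(A[None, :], sim), axis=1)
```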
Proposition 7. Let (U,G,A,λ) be an FRSM. Then the following conclusions hold:
1) if G1⊆G2⊆G, then Apr_λG1(A)⊆Apr_λG2(A);
2) if λ1≤λ2, then Apr_λ1G(A)⊆Apr_λ2G(A).
Proof. 1) Based on Proposition 4, for any x,y∈U, if G1⊆G2⊆G, then SimλG1(x,y)≥SimλG2(x,y)⇒Apr_λG1(A)(x)=⋀y∈U{A(y)⋁(1−SimλG1(x,y))}≤⋀y∈U{A(y)⋁(1−SimλG2(x,y))}=Apr_λG2(A)(x)⇒Apr_λG1(A)⊆Apr_λG2(A).
2) Based on Proposition 4, for any x,y∈U, if λ1≤λ2, then Simλ1G(x,y)≥Simλ2G(x,y)⇒Apr_λ1G(A)(x)=⋀y∈U{A(y)⋁(1−Simλ1G(x,y))}≤⋀y∈U{A(y)⋁(1−Simλ2G(x,y))}=Apr_λ2G(A)(x)⇒Apr_λ1G(A)⊆Apr_λ2G(A). □
Proposition 8. Let (U,G,A,λ) be an FRSM. Then the following conclusions hold:
1) if G1⊆G2⊆G, then ¯AprλG1(A)⊇¯AprλG2(A);
2) if λ1≤λ2, then ¯Aprλ1G(A)⊇¯Aprλ2G(A).
The proof of Proposition 8 follows an approach similar to that of Proposition 7.
Proposition 9. Let (U,G,A,λ) be an FRSM. Then Apr_λG(A)⊆A⊆¯AprλG(A).
Proof. Based on Definitions 1–3 and Definition 6, for any x∈U, SimλG(x,x)=1, Apr_λG(A)(x)=⋀y∈U{A(y)⋁(1−SimλG(x,y))}≤A(x)⋁(1−SimλG(x,x))=A(x) and ¯AprλG(A)(x)=⋁y∈U{A(y)⋀SimλG(x,y)}≥A(x)⋀SimλG(x,x)=A(x), so Apr_λG(A)(x)≤A(x)≤¯AprλG(A)(x), and hence Apr_λG(A)⊆A⊆¯AprλG(A). □
Definition 10. Let (U,R,D) be an FDT, ∀G⊆R and ∀λ∈[0,1). If Apr_λG(Di)(x)=¯AprλG(Di)(x) (∀Di∈U/D), x is referred to as a consistent sample regarding G and λ. If all samples of U demonstrate consistency, the FDT is considered a consistent one.
Proposition 11. Let (U,R,D) be an FDT, ∀A∈ℱ(U), ∀G⊆R, and ∀λ∈[0,1]. Then the following conclusions are true:
1) if, for every y∈U, SimλG(x,y)>0 implies Di(x)=Di(y)(i=1,2,…,p), then the sample x is a consistent sample;
2) if there exists y∈U with SimλG(x,y)>0 and ∃Di∈U/D such that Di(x)≠Di(y), then the sample x is an inconsistent sample.
Proof. 1) For Di(x)=1, Apr_λG(Di)(x)=⋀y∈U{Di(y)∨(1−SimλG(x,y))}={⋀y∈Di{Di(y)⋁(1−SimλG(x,y))}}∧{⋀y∈U−Di{Di(y)⋁(1−SimλG(x,y))}}. For y∈Di⇒Di(y)=1⇒⋀y∈Di{Di(y)⋁(1−SimλG(x,y))}=1; for y∉Di⇒SimλG(x,y)=0⇒⋀y∈U−Di{Di(y)⋁(1−SimλG(x,y))}=1. Hence Apr_λG(Di)(x)=1 holds. ¯AprλG(Di)(x)=⋁y∈U{Di(y)⋀SimλG(x,y)}≥Di(x)⋀SimλG(x,x)=1, and ¯AprλG(Di)(x)∈[0,1], so ∀y∈U,¯AprλG(Di)(x)=1=Apr_λG(Di)(x).
For Di(x)=0, Apr_λG(Di)(x)=⋀y∈U{Di(y)⋁(1−SimλG(x,y))}≤Di(x)⋁(1−SimλG(x,x))=0, and Apr_λG(Di)(x)≥0, so Apr_λG(Di)(x)=0; ¯AprλG(Di)(x)=⋁y∈U{Di(y)⋀SimλG(x,y)}={⋁y∈Di{Di(y)⋀SimλG(x,y)}}∨{⋁y∈U−Di{Di(y)⋀SimλG(x,y)}}. For y∈Di⇒SimλG(x,y)=0⇒⋁y∈Di{Di(y)⋀SimλG(x,y)}=0; and for y∉Di⇒Di(y)=0⇒⋁y∈U−Di{Di(y)⋀SimλG(x,y)}=0. ¯AprλG(Di)(x)=0 holds. Apr_λG(Di)(x)=0=¯AprλG(Di)(x).
Hence, the sample x is a consistent sample.
2) For Di(x)=1, Di(y)=0 and SimλG(x,y)>0, Apr_λG(Di)(x)=⋀y∈U{Di(y)⋁(1−SimλG(x,y))}≤Di(y)⋁(1−SimλG(x,y))<1; ¯AprλG(Di)(x)=⋁y∈U{Di(y)⋀SimλG(x,y)}≥Di(x)⋀SimλG(x,x)=1 and ¯AprλG(Di)(x)∈[0,1], ¯AprλG(Di)(x)=1>Apr_λG(Di)(x).
For Di(x)=0, Di(y)=1 and SimλG(x,y)>0, Apr_λG(Di)(x)=⋀y∈U{Di(y)⋁(1−SimλG(x,y))}≤Di(x)⋁(1−SimλG(x,x))=0; ¯AprλG(Di)(x)=⋁y∈U{Di(y)⋀SimλG(x,y)}≥Di(y)⋀SimλG(x,y)=SimλG(x,y)>0, so ¯AprλG(Di)(x)>Apr_λG(Di)(x).
Hence, the sample x is an inconsistent sample. □
According to Proposition 11, the inconsistency of a sample originates from its non-zero similarity with samples of different classes. Hence, only the similarity relationships between samples belonging to different classes need to be considered. We now introduce the concept of cross-similarity.
Definition 12. Let (U,R,D) be an FDT, G⊆R, x∈U, y∈U, and 0≤λ≤1. The cross-similarity between x and y is defined as follows:
$$CS^{\lambda}_{G,D}(x,y)=\begin{cases}0, & D(x)=D(y)\\ \operatorname{Sim}^{\lambda}_{G}(x,y), & D(x)\ne D(y)\end{cases}\tag{5}$$
One can consider the FDT as a graph.
• The samples in U are visualized as nodes in the graph.
• The parameterized multi-attribute similarity relations, designated as SimλG, are conceptually understood as undirected weighted edges connecting nodes within the graph. If SimλG(x,y)=0, then there is no edge connecting nodes x and y in the graph.
• The equivalence relations derived from D can be visualized as distinct enclosed areas within the graph.
By Definition 12, we narrow our focus to the similarity relations among samples belonging to distinct target partitions. Graphically, the term “cross” refers to the fact that an edge connecting samples from different target partitions crosses the boundary of the enclosed areas.
Definition 13. Let (U,R,D) be an FDT, G⊆R, x∈U, and 0≤λ≤1. The fuzzy set with respect to x is defined as follows:
$$CS^{\lambda}_{G,D}(x)=\frac{CS^{\lambda}_{G,D}(x)(x_{1})}{x_{1}}+\frac{CS^{\lambda}_{G,D}(x)(x_{2})}{x_{2}}+\cdots+\frac{CS^{\lambda}_{G,D}(x)(x_{m})}{x_{m}}\tag{6}$$
The fuzzy set can be conceptualized as the fuzzy information granule of x, with its cardinality represented as follows:
$$|CS^{\lambda}_{G,D}(x)|=\sum_{i\in[1,m]}CS^{\lambda}_{G,D}(x)(x_{i})\tag{7}$$
Let
$$CS^{\lambda}_{G,D}(U)=\begin{bmatrix}CS^{\lambda}_{G,D}(x_{1},x_{1}) & CS^{\lambda}_{G,D}(x_{1},x_{2}) & \cdots & CS^{\lambda}_{G,D}(x_{1},x_{m})\\ 0 & CS^{\lambda}_{G,D}(x_{2},x_{2}) & \cdots & CS^{\lambda}_{G,D}(x_{2},x_{m})\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & CS^{\lambda}_{G,D}(x_{m},x_{m})\end{bmatrix}$$
and
$$|CS^{\lambda}_{G,D}(U)|=\sum_{i\in[1,m],\,j\in[i,m]}CS^{\lambda}_{G,D}(x_{i})(x_{j})\tag{8}$$
|CSλG,D(U)| represents the cross-similarity level of U.
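For concreteness, the following minimal NumPy sketch computes the cross-similarity of Eq. (5) and the cross-similarity level of Eq. (8) from a SimλG matrix and a vector of decision labels; the function names are ours.

```python
# A minimal sketch of Eqs. (5) and (8).
import numpy as np

def cross_similarity(sim, labels):
    labels = np.asarray(labels)
    diff_class = labels[:, None] != labels[None, :]   # D(x) != D(y)
    return np.where(diff_class, sim, 0.0)             # Eq. (5)

def cross_similarity_level(sim, labels):
    cs = cross_similarity(sim, labels)
    # Eq. (8): sum over the upper-triangular part; diagonal entries are 0
    # because a sample shares its own decision class
    return float(np.triu(cs).sum())
```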
Proposition 14. Let (U,R,D) be an FDT, x∈U, y∈U, 0≤λ≤1 and G⊆R. The following conclusions are true:
1) CSλG,D(x,x)=0;
2) CSλG,D(x,y)=CSλG,D(y,x);
3) |CSλG,D(U)|=(1/2)∑i∈[1,m]|CSλG,D(xi)|;
4) 0≤|CSλG,D(x)|≤m−1;
5) 0≤|CSλG,D(U)|≤m(m−1)/2.
Proof. 1) By Definition 12, since D(x)=D(x), it is evident that CSλG,D(x,x)=0;
2) Under Definitions 1–3 and Definition 12, it is evident that this conclusion holds true;
3) According to 1), 2), and Eq. (8), it is evident that this conclusion holds true;
4) Since SimλG(x,y)∈[0,1] and CSλG,D(x,x)=0, this conclusion is true;
5) In accordance with 3) and 4), |CSλG,D(U)|=(1/2)∑i∈[1,m]|CSλG,D(xi)|≥(1/2)×m×0=0 and |CSλG,D(U)|=(1/2)∑i∈[1,m]|CSλG,D(xi)|≤(1/2)×m×(m−1)=m(m−1)/2. □
Theorem 15. Let (U,R,D) be an FDT, x∈U, y∈U, λ∈[0,1] and G⊆R. The following conclusions hold true:
1) if G1⊆G2⊆R, then CSλG1,D(x,y)≥CSλG2,D(x,y);
2) if 0≤λ1≤λ2≤1, then CSλ1G,D(x)(y)≥CSλ2G,D(x)(y).
Proof. 1) In accordance with Definition 12 and Proposition 4, it is evident that this conclusion holds true.
2) According to Definition 12 and Proposition 4, it is straightforward. □
Theorem 16. Let (U,R,D) be an FDT, x∈U, λ∈[0,1] and G⊆R. The following conclusions hold true:
1) if G1⊆G2⊆R, then |CSλG1,D(x)|≥|CSλG2,D(x)|;
2) if 0≤λ1≤λ2≤1, then |CSλ1G,D(x)|≥|CSλ2G,D(x)|.
Proof. According to Theorem 15 and Eq. (7), it is straightforward to demonstrate the validity of these conclusions. □
Theorem 17. Let (U,R,D) be an FDT, x∈U, λ∈[0,1] and G⊆R. The following conclusions are true:
1) if G1⊆G2⊆R, then |CSλG1,D(U)|≥|CSλG2,D(U)|;
2) if 0≤λ1≤λ2≤1, then |CSλ1G,D(U)|≥|CSλ2G,D(U)|.
Proof. According to Theorem 16, Proposition 14, and Eq. (8), it is straightforward to demonstrate the validity of these conclusions. □
Example 1. Let (U,R,D) be an FDT, where U={x1,x2,x3,x4,x5,x6,x7} and U/D={D1={x2,x3},D2={x1,x7},D3={x4,x5,x6}}. SimλG is shown in Table 1, where G⊆R and 0≤λ<1.
Target partition D1 can be represented as a fuzzy set D1=0x1+1x2+1x3+0x4+0x5+0x6+0x7. The lower and upper approximations of D1 under the FCRS (U,G) and parameter λ are:
Apr_λG(D1)=0x1+0.5x2+0.3x3+0x4+0x5+0x6+0x7, ¯AprλG(D1)=0.5x1+1x2+1x3+0.6x4+0x5+0x6+0.7x7.
Obviously, Apr_λG(D1)⊂¯AprλG(D1), so there is uncertainty in the FCRS (U,G) with respect to target partition D1. For a decision table, if the lower and upper approximations of some decision partition are unequal, the decision table is said to contain uncertainty.
By Definition 13, the fuzzy set associated with each sample point is computed as follows:
CSλG,D(x1)=0x1+0.5x2+0x3+0x4+0x5+0x6+0x7,|CSλG,D(x1)|=0.5;
CSλG,D(x2)=0.5x1+0x2+0x3+0x4+0x5+0x6+0x7,|CSλG,D(x2)|=0.5;
CSλG,D(x3)=0x1+0x2+0x3+0.6x4+0x5+0x6+0.7x7,|CSλG,D(x3)|=1.3;
CSλG,D(x4)=0x1+0x2+0.6x3+0x4+0x5+0x6+0x7,|CSλG,D(x4)|=0.6;
CSλG,D(x5)=0x1+0x2+0x3+0x4+0x5+0x6+0x7,|CSλG,D(x5)|=0;
CSλG,D(x6)=0x1+0x2+0x3+0x4+0x5+0x6+0.4x7,|CSλG,D(x6)|=0.4;
CSλG,D(x7)=0x1+0x2+0.7x3+0x4+0x5+0.4x6+0x7,|CSλG,D(x7)|=1.1.
Therefore, based on Definition 12, the FDT (U,R,D) under G and λ can be represented in Fig. 1. Each sample is represented as a node, a similarity relationship greater than 0 corresponds to an edge, and a target partition is depicted as an area enclosed by a dashed line in the graph. Cross-similarity relationships within similar relationships are highlighted in bold red.
Figure 1: Graphical representation of the FDT
The cross-similarity matrix CSλG,D(U) of the FDT is shown as follows:
$$CS^{\lambda}_{G,D}(U)=\begin{bmatrix}0 & 0.5 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0.6 & 0 & 0 & 0.7\\ 0 & 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 & 0.4\\ 0 & 0 & 0 & 0 & 0 & 0 & 0\end{bmatrix}$$
Therefore, the cross-similarity level of U is:
|CSλG,D(U)|=1/2∑x∈U|CSλG,D(x)|=2.2
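As a quick numerical check of Example 1 and Proposition 14 3), the per-sample cardinalities listed above reproduce the level 2.2:

```python
# Reproducing |CS(U)| of Example 1 from the listed cardinalities |CS(x_i)|.
cardinalities = [0.5, 0.5, 1.3, 0.6, 0.0, 0.4, 1.1]
print(0.5 * sum(cardinalities))   # 2.2, matching Proposition 14 3)
```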
Theorem 18. Let (U,R,D) be an FDT, λ∈[0,1] and G⊆R. If |CSλG,D(U)|=0, then the FDT is consistent.
Proof. |CSλG,D(U)|=0⟺∀x,y∈U, CSλG,D(x,y)=0⟺(SimλG(x,y)>0⇒D(x)=D(y)). Therefore, the FDT is consistent according to Proposition 11 1). □
Theorem 19. Let (U,R,D) be an FDT, λ∈[0,1] and G⊆R. If |CSλG,D(U)|>0, the FDT is inconsistent.
Proof. |CSλG,D(U)|>0⇒∃x,y∈U, CSλG,D(x,y)>0⇒∃Dl∈U/D, Dl(x)≠Dl(y) and SimλG(x,y)>0. Therefore, the FDT is inconsistent according to Proposition 11 2). □
4 Feature Selection Based on Cross-Similarity
As |CSλG,D(U)| decreases, the similarity between samples belonging to different classes diminishes. In other words, a low |CSλG,D(U)| suggests that G exhibits strong discriminatory capabilities. Consequently, |CSλG,D(U)| serves as an effective tool for measuring the uncertainty associated with the FDT. The framework diagram of feature selection based on cross-similarity is shown in Fig. 2.
Figure 2: A framework diagram of feature selection
Definition 20. Let (U,R,D) be an FDT, λ∈[0,1] and red⊆R. Then red is referred to as a reduction of R if
1) |CSλred,D(U)|=|CSλR,D(U)|;
2) |CSλred−{r},D(U)|>|CSλred,D(U)|,∀r∈red.
Definition 21. Let (U,R,D) be an FDT, λ∈[0,1], red⊆R and r∈R−red. The significance of r with respect to red is defined as follows:
$$DC^{\lambda}(r,red,D)=|CS^{\lambda}_{red,D}(U)|-|CS^{\lambda}_{red\cup\{r\},D}(U)|\tag{9}$$
The larger DCλ(r,red,D) is, the smaller |CSλred⋃{r},D(U)| becomes, making r more significant.
Algorithm 1, a cross-similarity based feature selection (CSFS) algorithm, is a forward greedy feature selection algorithm.
As more features are chosen, the number of relationships to be considered when calculating cross-similarity progressively diminishes: only the relationships between samples located in different categories are considered, so the number of samples involved gradually decreases. It can be assumed that the number of samples decreases in geometric progression with a common ratio of k (0<k<1). Then m+km+k^2m+⋅⋅⋅+k^t m=[(1−k^{t+1})/(1−k)]m=cm. Therefore, the time complexity of Algorithm 1 is improved to O(mn²). In contrast, the time complexity of feature selection algorithms based on the positive region, dependency, or conditional entropy is O(m²n²).
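Since the pseudocode of Algorithm 1 is not reproduced here, the following Python sketch gives our reading of the CSFS procedure based on Definitions 20–21 and the discussion above; it reuses the sim_lambda, cross_similarity and cross_similarity_level helpers sketched earlier, and the variable names (relations, active, and so on) are ours, not the authors'.

```python
# A hedged sketch of CSFS (Algorithm 1): forward greedy selection that, in each
# round, adds the feature minimizing the cross-similarity level, and shrinks the
# active sample set U* to samples that still carry non-zero cross-similarity.
import numpy as np

def csfs(relations, labels, lam):
    """relations: dict mapping feature -> m x m fuzzy relation; labels: length-m array."""
    labels = np.asarray(labels)
    m = len(labels)
    selected, candidates = [], set(relations)
    active = np.arange(m)                                   # U*
    # with red empty, Sim is taken as 1 everywhere (see Example 2)
    current_level = cross_similarity_level(np.ones((m, m)), labels)
    while candidates and active.size > 0:
        best_f, best_level = None, None
        for f in candidates:
            sim = sim_lambda([relations[g] for g in selected + [f]], lam)
            level = cross_similarity_level(sim[np.ix_(active, active)], labels[active])
            if best_level is None or level < best_level:
                best_f, best_level = f, level
        if current_level - best_level <= 0:                 # significance DC (Eq. 9) not positive
            break
        selected.append(best_f)
        candidates.remove(best_f)
        current_level = best_level
        # drop samples that are already consistent (zero cross-similarity with everyone)
        cs = cross_similarity(sim_lambda([relations[g] for g in selected], lam), labels)
        active = np.where(cs.sum(axis=1) > 0)[0]
    return selected
```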
Example 2. A decision information table is shown in Table 2, where U={x1,x2,x3,x4,x5,x6,x7}, A={a1,a2,a3,a4} and D={p,n}. Let U∗=U and R∗=R.
In this paper, the value of the single-attribute relation rai is determined using the following equation:
$$r_{a_i}(x,y)=\max\left\{\min\left\{\frac{a_i(x)-a_i(y)+\sigma_{a_i}}{\sigma_{a_i}},\ \frac{a_i(y)-a_i(x)+\sigma_{a_i}}{\sigma_{a_i}}\right\},\,0\right\}\tag{10}$$
where σai is the standard deviation of the attribute ai. By calculation, σa1=0.32. Let λ=0.4. Then ra1, Simλ{ra1} and CSλ{ra1} are shown in Table 3.
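For reference, the single-attribute relation of Eq. (10) can be computed as in the following sketch; the attribute values shown are hypothetical, since Table 2 is not reproduced here.

```python
# A minimal sketch of Eq. (10); sigma is the standard deviation of attribute a_i.
import numpy as np

def attribute_relation(values):
    values = np.asarray(values, dtype=float)
    sigma = values.std()                        # sigma_{a_i} (ddof not specified in the text)
    diff = values[:, None] - values[None, :]    # a_i(x) - a_i(y)
    r = np.minimum((diff + sigma) / sigma, (-diff + sigma) / sigma)
    return np.maximum(r, 0.0)                   # Eq. (10)

r_a1 = attribute_relation([0.1, 0.7, 0.4, 0.9, 0.2, 0.6, 0.5])   # hypothetical a_1 values
```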
The samples correspond to the nodes, similarity relations correspond to edges, bold red edges highlight the cross-similarity relations, and an area enclosed by a dashed line represents the target partition. The cross-similarity relation derived from a1 is shown in Fig. 3.
Figure 3: Cross-similarity graph representation
Similarly, CSλ{ra2}, CSλ{ra3} and CSλ{ra4} can be determined, as shown in Table 4.
If U∗=∅, let Simλred(x,y)=0; therefore, |CSλred,D(U)|=0. If red=∅, let Simλred(x,y)=1; then |CSλred,D(U)|=12.
According to Tables 3 and 4, we obtain |CSλred⋃{ra1}(U)|=2.38, |CSλred⋃{ra2}(U)|=2.29, |CSλred⋃{ra3}(U)|=3.02 and |CSλred⋃{ra4}(U)|=2.14. The minimum value is |CSλred⋃{ra4}(U)| and DCλ(ra4,red,D)=12−2.14=9.86>0. Thus, red={ra4}, R∗={ra1,ra2,ra3} and U∗={x1,x3,x4,x5,x6,x7}. Subsequently, we obtain |CSλred⋃{ra1}(U)|=0.69, |CSλred⋃{ra2}(U)|=0.43 and |CSλred⋃{ra3}(U)|=0.57. The minimum value is |CSλred⋃{ra2}(U)| and DCλ(ra2,red,D)=2.14−0.43=1.71>0. Thus, red={ra2,ra4}, R∗={ra1,ra3} and U∗={x1,x6}. Proceeding further, we have |CSλred⋃{ra1}(U)|=0 and |CSλred⋃{ra3}(U)|=0.43. The minimum value is |CSλred⋃{ra1}(U)| and DCλ(ra1,red,D)=0.43−0=0.43>0. Thus, red={ra1,ra2,ra4}, R∗={ra3} and U∗=∅. As DCλ(ra3,red,D)=0−0=0, the feature selection process is terminated. As a result, the final feature selection outcome is red={ra1,ra2,ra4}. Given m=7, k=6/7 and c≈3, the descending sequence of sample sizes satisfies 7+6+2+0≤m+km+k^2m+k^3m=cm.
To establish the validity of CSFS, we conduct rigorous experimental comparisons between CSFS and five other state-of-the-art feature selection algorithms.
The following provides brief introductions to the five comparison algorithms, highlighting their key features and characteristics:
• FRFS [15] (2024): It is a classic feature selection algorithm based on the dependency between the decision attribute and the conditional attributes subset.
• FBC [18] (2020): It proposes a forward attribute selection algorithm based on a conditional discernibility measure, which can reflect the change in distinguishing ability caused by different attribute subsets.
• NFEM [22] (2023): It provides a feature selection algorithm for label distribution learning using dual-similarity-based neighborhood fuzzy entropy.
• MFBC [23] (2021): It is a feature selection algorithm based on the conditional information entropy under multi-fuzzy β-covering approximation space.
• HANDI [24] (2022): It proposes a feature selection algorithm based on a neighborhood discrimination index, which can characterize the distinguishing information of a neighborhood relation.
We comprehensively evaluate the six algorithms along the following three dimensions, emphasizing their individual strengths and limitations.
• The size of the selected feature subset.
• The time cost of the feature selection process.
• The performance of the selected feature subset in classification.
To conduct a rigorous and comprehensive evaluation of the efficacy of these six algorithms, we have chosen KNN (k = 5), SVM, and DT (CART) as the primary classification algorithms for assessing their classification accuracy. All other parameters of the three classifiers are set to default values. All testing procedures are conducted using ten-fold cross-validation for rigorous evaluation.
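A minimal sketch of this evaluation protocol with scikit-learn is shown below; X, y and selected (the indices of the chosen features) are assumed to come from the feature selection step, and default classifier parameters are used as stated above.

```python
# Ten-fold cross-validated accuracy of KNN (k = 5), SVM and CART on a feature subset.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

def evaluate_subset(X, y, selected):
    classifiers = {
        "KNN": KNeighborsClassifier(n_neighbors=5),
        "SVM": SVC(),                      # default parameters
        "DT": DecisionTreeClassifier(),    # CART
    }
    X_sub = np.asarray(X)[:, selected]
    return {name: cross_val_score(clf, X_sub, y, cv=10).mean()
            for name, clf in classifiers.items()}
```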
5.1.3 Experimental Environment
All experiments are implemented in Python 3.8.16 and scikit-learn 1.3.2 and run on a computer with an Intel(R) Xeon(R) Gold 5318Y CPU at 2.1 GHz, 128 GB RAM, and Ubuntu 18.04.1 LTS as the operating system.
In our experiments, we utilized 10 publicly available datasets as benchmarks to assess the performance of various algorithms. The first six datasets, being of low dimension, were procured from the UCI Machine Learning Repository [27], whereas the remaining four high-dimensional datasets were retrieved from the KRBD Repository [28]. For a comprehensive overview of these datasets, please refer to Table 5, which provides detailed descriptions of each dataset’s characteristics. If there are missing values in a dataset, they are filled with the average value of the same feature over samples belonging to the same class.
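The class-conditional mean imputation described above can be sketched as follows; missing values are assumed to be encoded as NaN, which is our choice rather than a detail stated in the paper.

```python
# Replace each missing entry with the mean of that feature over samples of the same class.
import numpy as np

def impute_by_class_mean(X, y):
    X = np.asarray(X, dtype=float).copy()
    y = np.asarray(y)
    for label in np.unique(y):
        rows = (y == label)
        class_block = X[rows]
        col_means = np.nanmean(class_block, axis=0)       # per-feature class means
        nan_rows, nan_cols = np.where(np.isnan(class_block))
        class_block[nan_rows, nan_cols] = col_means[nan_cols]
        X[rows] = class_block
    return X
```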
Among the six algorithms, FRFS stands alone as the sole algorithm that does not involve any parameters. To ensure the fairness and integrity of comparative experiments, we employ a meticulous grid search approach to identify the optimal parameter values for all algorithms that involve parameters. The step size of the CSFS algorithm has been set to 0.01, while the step sizes of the other algorithms have been configured to 0.1. As per the specifications provided in the respective references for each algorithm, the search scope of the parameters is summarized in Table 6. The other algorithms use a step size of 0.1 rather than 0.01 because, with a step size of 0.01, their experiments could not be completed within an acceptable time. Initially, we outline the time taken by each algorithm in its search for optimal parameters. Subsequently, we present the optimal parameter values that were chosen based on their classification accuracy.
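The parameter search for CSFS can be sketched as a simple scan over λ with step 0.01, keeping the value with the best cross-validated accuracy; the csfs and evaluate_subset functions are the hedged sketches given earlier, relations is assumed to be keyed by column index, the search range [0, 1) is an assumption, and the paper tunes the parameter per classifier, whereas the sketch averages the three accuracies for brevity.

```python
# A minimal sketch of the grid search over the truncation parameter lambda (step 0.01).
import numpy as np

def tune_lambda(relations, X, y, grid=np.arange(0.0, 1.0, 0.01)):
    best_lam, best_acc = None, -1.0
    for lam in grid:
        subset = csfs(relations, y, lam)     # feature selection with this lambda value
        if not subset:                       # skip values that select no features
            continue
        acc = np.mean(list(evaluate_subset(X, y, subset).values()))
        if acc > best_acc:
            best_lam, best_acc = lam, acc
    return best_lam
```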
Table 7 enumerates the total time consumed by each algorithm on each dataset when running all possible parameter values under the parameter configuration outlined in Table 6. Notably, CSFS stands out in terms of efficiency, consistently achieving the lowest average time for parameter search across all datasets.
5.2.1 The Size of the Selected Feature Subset
Table 8 presents the count of features chosen by the six feature selection algorithms across ten datasets. FRFS and NFEM were terminated early due to excessive runtime, hence their results are not reported.
Table 8 indicates the following findings:
• In comparison to the original feature count in Table 5, it is noteworthy that these feature selection algorithms effectively select salient features and reduce data redundancy.
• The optimal feature subsets for different classifiers can vary.
• CSFS selects fewer features than FRFS, HANDI, NFEM, and MFBC, yet slightly more than FBC.
5.2.2 The Runtime of the Algorithms
Table 9 and Fig. 4 present a comparison of the time taken by the six feature selection algorithms to execute a feature selection using parameters that achieved the highest classification accuracy for each classifier.
Figure 4: The time consumption of the feature selection process across various datasets. (a) KNN classifier with the optimal parameter. (b) SVM classifier with the optimal parameter. (c) DT classifier with the optimal parameter. FRFS and NFEM failed to complete execution on the DLBCLNIH dataset within 30 days and the runtime is set to 2,592,000 s (30 days)
Table 9 and Fig. 4a–c indicate the following findings:
• When it comes to time efficiency, CSFS outperforms all other algorithms, exhibiting a clear advantage. Note that the vertical axis representing time consumption in the figure increases exponentially (i.e., it is logarithmically scaled).
• As the sample size increases, the advantage of CSFS becomes increasingly apparent. This observation is consistent with the theoretical time complexity of CSFS, which is O(mn²).
It is worth noting that for the DLBCL NIH dataset with 240 samples and 7399 conditional attributes, as the dimensionality increases, the time required for feature selection by the comparison algorithms is significantly higher than that of the CSFS algorithm. The NFEM algorithm is even unable to produce results within an acceptable time frame.
Tables 10–12 present the classification accuracy of three distinct classifiers on the feature subsets selected by the six algorithms. The highest classification accuracy is shown in bold.
Tables 10–12 clearly indicate the following findings:
• Through 30 rigorous evaluations, algorithm CSFS has emerged as the clear leader, achieving the highest classification accuracy on 19 occasions. In comparison, algorithm FBC excelled on 11 occasions, while algorithms HANDI and MFBC delivered the best results on 8 and 3 occasions, respectively. Algorithms FRFS and NFEM lagged behind, achieving the highest accuracy on just 2 occasions each.
• The standard deviation values clearly indicate that the features selected by algorithm CSFS exhibit excellent representativeness, providing a stable and accurate representation of the data.
Compared with the other methods, when measuring the uncertainty of the target attribute over a feature subset, the CSFS algorithm focuses more attention on samples that have not yet been perfectly partitioned under the current conditions, and it is therefore more sensitive to reducing the uncertainty of such samples when selecting subsequent features.
The value of the parameter ‘λ’ influences the behavior of the CSFS algorithm; its impact is examined below.
Figure 5: The impact of parameter ‘λ’
Figure 6: The impact of parameter ‘λ’
Figs. 5 and 6 clearly indicate the following findings:
• As the value of parameter ‘
• When the value of parameter ‘
• The parameter ‘
Let’s illustrate the impact of the truncation parameter
In this article, we introduce a novel and unique metric for quantifying data uncertainty, termed cross-similarity. This metric iteratively emphasizes similarity relations that substantially influence the uncertainty of fuzzy decision tables. Leveraging this metric, we have devised a forward greedy feature selection algorithm, known as CSFS. The CSFS algorithm significantly reduces the time complexity of feature selection algorithms based on fuzzy rough sets, transforming it from O(m²n²) to O(mn²), thereby greatly enhancing computational efficiency. Experimental results validate that the CSFS algorithm not only accelerates the feature selection process but also accurately identifies the most representative features. Consequently, our algorithm effectively tackles the challenges posed by the high computational complexity of existing fuzzy rough set-based feature selection algorithms.
However, the CSFS algorithm also has some limitations. Firstly, when integrating multi-attribute similarity relationships, it relies on the minimum similarity relationship, making the algorithm sensitive to outlier features. Secondly, the uncertainty measure is based on cross-similarity within the decision table, thus its application is limited to comparing the uncertainty of different attribute subsets within the same decision table, rather than across different decision tables.
In response to the above limitations, our future research endeavors will encompass the following: Firstly, we aim to delve into multi-attribute similarity metrics that consider sample distribution or incorporate linear reconstruction of samples prior to similarity assessment. Both approaches hold promise in reducing the impact of outlier features, ultimately bolstering the algorithm’s robustness. Secondly, to navigate the trade-off between computational complexity and robustness, we will explore the potential of parallel computing. Lastly, we will undertake further research into normalized uncertainty measures grounded in cross-similarity, with the objective of expanding their applicability across diverse scenarios.
Acknowledgement: The authors would like to thank the anonymous reviewers for their valuable comments.
Funding Statement: This work was supported by the Anhui Provincial Department of Education University Research Project (2024AH051375), Research Project of Chizhou University (CZ2022ZRZ06), Anhui Province Natural Science Research Project of Colleges and Universities (2024AH051368) and Excellent Scientific Research and Innovation Team of Anhui Colleges (2022AH010098).
Author Contributions: The authors confirm their contribution to the paper as follows: study conception and design: Wenchang Yu; analysis and interpretation of results: Xiaoqin Ma, Zheqing Zhang; draft manuscript preparation: Qinli Zhang. All authors reviewed the results and approved the final version of the manuscript.
Availability of Data and Materials: Not applicable.
Ethics Approval: Not applicable.
Conflicts of Interest: The authors declare no conflicts of interest to report regarding the present study.
References
1. Pawlak Z. Rough sets. Int J Comput Inf Sci. 1982;11(4):341–56. doi:10.1007/BF01001956. [Google Scholar] [CrossRef]
2. Dinçer H, Yüksel S, An J, Mikhaylov A. Quantum and AI-based uncertainties for impact-relation map of multidimensional NFT investment decisions. Finance Res Lett. 2024;66(3):105723. doi:10.1016/j.frl.2024.105723. [Google Scholar] [CrossRef]
3. Kamran M, Salamat N, Jana C, Xin Q. Decision-making technique with neutrosophic Z-rough set approach for sustainable industry evaluation using sine trigonometric operators. Appl Soft Comput. 2025;169(2):112539. doi:10.1016/j.asoc.2024.112539. [Google Scholar] [CrossRef]
4. Banerjee M, Mitra S, Banka H. Evolutionary rough feature selection in gene expression data. IEEE Trans Syst Man Cybern C Appl Rev. 2007;37(4):622–32. doi:10.1109/TSMCC.2007.897498. [Google Scholar] [CrossRef]
5. Dubois D, Prade H. Rough fuzzy sets and fuzzy rough sets. Int J Gen Syst. 1990;17(3):191–209. doi:10.1080/03081079008935107. [Google Scholar] [CrossRef]
6. Mi J, Leung Y, Zhao H, Feng T. Generalized fuzzy rough sets determined by a triangular norm. Inf Sci. 2008;178(12):3203–13. doi:10.1016/j.ins.2008.03.013. [Google Scholar] [CrossRef]
7. Yao W, Zhang G, Zhou C. Real-valued hemimetric-based fuzzy rough sets and an application to contour extraction of digital surfaces. Fuzzy Sets Syst. 2023;459(1–3):201–19. doi:10.1016/j.fss.2022.07.010. [Google Scholar] [CrossRef]
8. Zhang QL, Chen YY, Zhang GQ, Li ZW, Chen LJ, Wen CF. New uncertainty measurement for categorical data based on fuzzy information structures: an application in attribute reduction. Inf Sci. 2021;580(5):541–77. doi:10.1016/j.ins.2021.08.089. [Google Scholar] [CrossRef]
9. Yao Y, Zhao Y. Attribute reduction in decision-theoretic rough set models. Inf Sci. 2008;178(17):3356–73. doi:10.1016/j.ins.2008.05.010. [Google Scholar] [CrossRef]
10. Zhao S, Tsang E, Chen D, Wang X. Building a rule-based classifier—a fuzzy-rough set approach. IEEE Trans Knowl Data Eng. 2009;22(4):624–38. doi:10.1109/TKDE.2009.118. [Google Scholar] [CrossRef]
11. Hu Q, Zhang L, An S, Zhang D, Yu D. On robust fuzzy rough set models. IEEE Trans Fuzzy Syst. 2011;20(4):636–51. doi:10.1109/TFUZZ.2011.2181180. [Google Scholar] [CrossRef]
12. Ma L. Two fuzzy covering rough set models and their generalizations over fuzzy lattices. Fuzzy Sets Syst. 2016;294(11):1–17. doi:10.1016/j.fss.2015.05.002. [Google Scholar] [CrossRef]
13. Deer L, Cornelis C. A comprehensive study of fuzzy covering-based rough set models: definitions, properties and interrelationships. Fuzzy Sets Syst. 2018;336(16):1–26. doi:10.1016/j.fss.2017.06.010. [Google Scholar] [CrossRef]
14. Yang B, Hu B. On some types of fuzzy covering-based rough sets. Fuzzy Sets Syst. 2017;312:36–65. doi:10.1016/j.fss.2016.10.009. [Google Scholar] [CrossRef]
15. Zhang Q, Yan S, Peng Y, Li Z. Attribute reduction algorithms with an anti-noise mechanism for hybrid data based on fuzzy evidence theory. Eng Appl Artif Intell. 2024;129:107659. doi:10.1016/j.engappai.2023.107659. [Google Scholar] [CrossRef]
16. Zhang L, Zhan J. Fuzzy soft β-covering based fuzzy rough sets and corresponding decision-making applications. Int J Mach Learn Cybern. 2019;10(6):1487–502. doi:10.1007/s13042-018-0828-3. [Google Scholar] [CrossRef]
17. Jensen R, Shen Q. Fuzzy-rough sets assisted attribute selection. IEEE Trans Fuzzy Syst. 2007;15(1):73–89. doi:10.1109/TFUZZ.2006.889761. [Google Scholar] [CrossRef]
18. Yang T, Zhong X, Lang G, Qian Y, Dai J. Granular matrix: a new approach for granular structure reduction and redundancy evaluation. IEEE Trans Fuzzy Syst. 2020;28(12):3133–44. doi:10.1109/TFUZZ.2020.2984198. [Google Scholar] [CrossRef]
19. Huang Z, Li J. A fitting model for attribute reduction with fuzzy β-covering. Fuzzy Sets Syst. 2021;413:114–37. doi:10.1016/j.fss.2020.07.010. [Google Scholar] [CrossRef]
20. Huang Z, Li J. Discernibility measures for fuzzy β covering and their application. IEEE Trans Cybern. 2021;52(4):9722–35. doi:10.1109/TCYB.2021.3054742. [Google Scholar] [PubMed] [CrossRef]
21. Huang Z, Li J, Qian Y. Noise-tolerant fuzzy-β-covering-based multigranulation rough sets and feature subset selection. IEEE Trans Fuzzy Syst. 2021;30(9):2721–35. doi:10.1109/TFUZZ.2021.3093202. [Google Scholar] [CrossRef]
22. Wan J, Chen H, Li T, Yuan Z, Liu J, Huang W, et al. Interactive and complementary feature selection via fuzzy multigranularity uncertainty measures. IEEE Trans Cybern. 2023;53(2):1208–21. doi:10.1109/TCYB.2021.3112203. [Google Scholar] [PubMed] [CrossRef]
23. Hu Y, Zhang Y, Gong D. Multiobjective particle swarm optimization for feature selection with fuzzy cost. IEEE Trans Cybern. 2021;51(3):874–88. doi:10.1109/TCYB.2020.3015756. [Google Scholar] [PubMed] [CrossRef]
24. Deng Z, Li T, Deng D, Liu K, Zhang P, Zhang S, et al. Feature selection for label distribution learning using dual-similarity based neighborhood fuzzy entropy. Inf Sci. 2022;615(10):385–404. doi:10.1016/j.ins.2022.10.054. [Google Scholar] [CrossRef]
25. Dai J, Zou X, Qian Y, Wang X. Multifuzzy β-covering approximation spaces and their information measures. IEEE Trans Fuzzy Syst. 2022;31(5):955–69. doi:10.1109/TFUZZ.2022.3193448. [Google Scholar] [CrossRef]
26. Wang C, Hu Q, Wang X, Chen D, Qian Y, Dong Z. Feature selection based on neighborhood discrimination index. IEEE Trans Neural Netw Learn Syst. 2017;29(8):2986–99. doi:10.1109/TNNLS.2017.2710422. [Google Scholar] [PubMed] [CrossRef]
27. UCI machine learning repository [Internet]. Irvine, CA, USA: University of California; [cited 2024 Oct 1]. Available from: http://archive.ics.uci.edu. [Google Scholar]
28. Kent Ridge bio-medical data set repository [Internet]. Granada, Spain: University of Granada; [cited 2024 Oct 1]. Available from: https://leo.ugr.es/elvira/DBCRepository. [Google Scholar]
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.