Open Access

ARTICLE


Attribute Reduction of Hybrid Decision Information Systems Based on Fuzzy Conditional Information Entropy

Xiaoqin Ma1,2, Jun Wang1, Wenchang Yu1, Qinli Zhang1,2,*

1 School of Big Data and Artificial Intelligence, Chizhou University, Chizhou, 247000, China
2 Anhui Education Big Data Intelligent Perception and Application Engineering Research Center, Chizhou, 247000, China

* Corresponding Author: Qinli Zhang.

Computers, Materials & Continua 2024, 79(2), 2063-2083. https://doi.org/10.32604/cmc.2024.049147

Abstract

The presence of numerous uncertainties in hybrid decision information systems (HDISs) renders attribute reduction a formidable task. Currently available attribute reduction algorithms, including those based on Pawlak attribute importance, the Skowron discernibility matrix, and information entropy, struggle to manage multiple uncertainties in HDISs simultaneously, such as the precise measurement of disparities between nominal attribute values and the handling of attributes with fuzzy boundaries and abnormal values. To address these issues, this paper delves into the study of attribute reduction within HDISs. First of all, a novel metric based on the decision attribute, christened the supervised distance, is introduced; it accurately quantifies the differences between nominal attribute values. Then, based on the newly developed metric, a novel fuzzy relationship is defined from the perspective of “feedback on parity of attribute values to attribute sets”. This new fuzzy relationship serves as a valuable tool in addressing the challenges posed by abnormal attribute values. Furthermore, leveraging the newly introduced fuzzy relationship, the fuzzy conditional information entropy is defined as a solution to the challenges posed by fuzzy attributes. It effectively quantifies the uncertainty associated with fuzzy attribute values, thereby providing a robust framework for handling fuzzy information in hybrid information systems. Finally, an attribute reduction algorithm utilizing the fuzzy conditional information entropy is presented. The experimental results on 12 datasets show that the average reduction rate of our algorithm reaches 84.04%, and that the classification accuracy is improved by 3.91% compared to the original datasets and by an average of 11.25% compared to 9 other state-of-the-art reduction algorithms. These results clearly indicate that our algorithm is highly effective in managing the intricate uncertainties inherent in hybrid data.

Keywords


Symbols and Notations Used in the Paper

Symbols and notations      Meaning
$X$                        A set of objects
$C$                        A set of attributes
$B$                        A set of conditional attributes
$d$                        The decision attribute
$2^X$                      The power set of $X$
$|Y|$                      The cardinality of $Y$
$I$                        $[0,1]$
$I^X$                      The set of all fuzzy sets on $X$
$I^{X\times X}$            The set of all fuzzy relations on $X$
$R$                        A fuzzy relation
$Rx$                       The fuzzy information granule of $x$
$\theta$                   The threshold value of the tolerance relation
$P$                        A subset of conditional attributes
$T_P^\theta$               The tolerance relation determined by $\theta$ and $P$

1  Introduction

1.1 Research Background

In information systems, the attributes of data are not of equal importance. Some of them may not have a direct or significant impact on decision-making; these attributes are considered redundant. The objective of attribute reduction is to eliminate the redundant attributes, while maintaining the same classification accuracy, so as to facilitate more concise and efficient decision-making. Attribute reduction is also called feature selection.

With the emergence of the era of big data, the intricacy and diversity of data have been constantly on the rise. Many datasets encompass both nominal and real-valued attributes; such datasets are referred to as hybrid data and can be found in various domains, including finance, healthcare, and e-commerce. The diverse attribute types of hybrid data necessitate distinct methodologies and strategies for their manipulation, and this complexity intensifies the challenges of attribute reduction. Primarily, the disparities among nominal attribute values are difficult to assess accurately using Euclidean distance. Furthermore, the data may contain additional uncertainties such as fuzzy attributes and abnormal attribute values. The reduction of attributes in hybrid data is a pivotal research topic in the field of data mining and lies at the heart of rough set theory. This line of research not only enhances the efficiency and precision of data processing but also fosters the advancement of data mining and machine learning. Consequently, exploring the attribute reduction problem in hybrid information systems holds profound theoretical significance and practical relevance [1].

1.2 Related Works

The classical rough set theory is limited in its ability to handle continuous data, hence it has been enhanced to include neighborhood rough sets and fuzzy rough sets as extensions. Currently, numerous scholars have delved into the intricacies of attribute reduction by using neighborhood rough sets and fuzzy rough sets, ultimately yielding impressive outcomes.

Neighborhood rough sets can effectively handle continuous values and have been widely applied in attribute reduction. Fan et al. [2] established a max-decision neighborhood rough set model and successfully applied it to achieve attribute reduction for information systems, leading to more accurate and concise decision-making. Shu et al. [3] proposed a neighborhood entropy-based incremental feature selection framework for hybrid data based on the neighborhood rough set model. In their study, Zhang et al. [4] conducted the entropy measurement on the gene space, utilizing the neighborhood rough sets, for effective gene selection. Zhou et al. [5] put forward an online stream-based attribute reduction method based on the new neighborhood rough set theory, eliminating the need for domain knowledge or pre-parameters. Xia et al. [6] built a granular ball neighborhood rough set method for fast adaptive attribute reduction. Liao et al. [7] developed an effective attribute reduction method for handling hybrid data under test cost constraints based on the generalized confidence level and vector neighborhood rough set models. None of the above methods provides a specific strategy for handling the differences between nominal attribute values. Instead, they use Euclidean distance to measure the differences between nominal attribute values, or use category information to simply define the differences between nominal attribute values as 1 or 0.

Fuzzy rough sets are capable of effectively dealing with the ambiguity in real-world scenarios, and have also been successfully applied in attribute reduction. Yuan et al. [8] established an unsupervised attribute reduction model for hybrid data based on fuzzy rough sets. Zeng et al. [9] examined the incremental updating mechanism of fuzzy rough approximations in response to attribute value changes in a hybrid information system. Yang et al. [10] considered the uncertainty measurement and attribute reduction for multi-source fuzzy information systems based on a new multi-granulation rough set model. Wang et al. [11] utilized variable distance parameters and, drawing from fuzzy rough set theory, developed an iterative attribute reduction model. Singh et al. [12] came up with a fuzzy similarity-based rough set approach for attribute reduction in set-valued information systems. Jain et al. [13] created an attribute reduction model by using the intuitionistic fuzzy rough set. The above methods are either not suitable for handling hybrid data or lack special techniques for handling nominal attribute values and abnormal attribute values.

Information entropy quantifies the uncertainty of knowledge and can be used for attribute reduction. Li et al. [14] used conditional information entropy to measure the uncertainty and reduce the attributes of multi-source incomplete information systems. Vergara et al. [15] summarized the feature selection methods based on mutual information. Li et al. [16] measured the uncertainty of gene spaces and designed a gene selection algorithm using information entropy. Zhang et al. [17] defined a fuzzy information entropy of classification data based on the constructed fuzzy information structure and used it for attribute reduction. Zhang et al. [18] constructed a hybrid data attribute reduction model utilizing the λ-conditional entropy derived from fuzzy rough set theory. Sang et al. [19] proposed an incremental attribute reduction method using the fuzzy dominance neighborhood conditional entropy derived from fuzzy dominant neighborhood rough sets. Huang et al. [20] defined a new fuzzy conditional information entropy for fuzzy β covering information systems and applied it to attribute reduction. The above methods likewise do not provide a specific solution for handling hybrid data.

Akram et al. [21,22] presented the concept of attribute reduction and designed the associated attribute reduction algorithms based on the discernibility matrix and discernibility function. These methods are intuitive and easy to comprehend, enabling the calculation of all reducts. However, they exhibit a high computational complexity, rendering them unsuitable for large datasets.

A comprehensive overview of the strengths and weaknesses of diverse attribute reduction algorithms can be found in Table 1.


1.3 Motivation and Contribution

Hybrid data are a common occurrence in data mining. It has been observed that nominal attributes significantly impact the measurement of data similarity and that Euclidean distance does not work well when dealing with nominal attribute values, yet the majority of current reduction algorithms rely heavily on Euclidean distance. Therefore, the supervised distance, a novel metric specifically designed for measuring the difference between nominal attribute values, has been meticulously crafted. Existing attribute reduction algorithms often exhibit sensitivity to abnormal attribute values and lack an efficient mechanism to address them. Consequently, this article introduces a novel fuzzy relationship that replaces similarity measurements based on attribute values with the count of similar attributes. This innovative approach effectively filters out abnormal attribute values, enhancing the overall accuracy and robustness of the algorithm. Because fuzzy conditional information entropy not only considers the fuzziness of data but also has a certain anti-noise ability, this paper uses it to reduce the attributes of hybrid data. The article’s novelty and contributions are outlined below:

1. A new distance metric called the supervised distance is introduced. This metric takes into account the decision attribute’s effect on attribute-value similarity, leading to more precise attribute reduction.

2. Based on the new distance metric, a new fuzzy relationship determined by the quantity of attributes with similar values is established. This approach views the relationship from the perspective of “feedback on parity of attribute values to attribute sets”, resulting in a fuzzy relation that is robust to a few abnormal attribute values.

3. The fuzzy conditional information entropy is defined based on the new fuzzy relationship and an advanced attribute reduction algorithm is developed based on the fuzzy conditional information entropy. This algorithm incorporates an innovative distance function, an innovative fuzzy relationship, and fuzzy conditional information entropy.

4. Through meticulous experimental validation, this study clearly demonstrates that the fuzzy relation offers superior performance to the equivalence relation when dealing with hybrid data, effectively handling its complexity and uncertainties.

1.4 Organization

The structure of this paper is outlined as follows. Section 2 provides a brief overview of fuzzy relationships and HDISs, laying the foundation for the subsequent sections. Section 3 describes the method in detail. Section 4 presents the experimental results and discusses them. Section 5 summarizes the paper.

2  Preliminaries

In this section, we will review some fundamental concepts of fuzzy relations and HDISs.

2.1 Fuzzy Relation

Let $I=[0,1]$. A fuzzy set $F$ on $X$ is known as a mapping $\mu: X \to I$. $\forall x \in X$, $\mu(x)$ is the degree of membership of $x$ in $F$ [23]. $F$ can be represented as follows:

$$F = \frac{\mu(x_1)}{x_1} + \frac{\mu(x_2)}{x_2} + \cdots + \frac{\mu(x_n)}{x_n}. \quad (1)$$

Let $|F| = \sum_{i=1}^{n}\mu(x_i)$. Then $|F|$ is called the cardinality of $F$.

Let $I^X$ denote the family of all fuzzy sets on $X$.

$\forall F \in I^X$ and $\forall G \in 2^X$, $F \cap G$ is defined as follows:

$$(F \cap G)(x) = F(x) \wedge G(x) = \begin{cases} F(x), & x \in G \\ 0, & x \notin G. \end{cases} \quad (2)$$

Then $|F \cap G| \leq |F|$.

If $R$ is a fuzzy set on $X \times X$, then $R$ is referred to as a fuzzy relation on $X$. $R$ can be represented by the matrix $M(R) = (r_{ij})_{n \times n}$, where $r_{ij} = R(x_i, x_j) \in I$ denotes the similarity between $x_i$ and $x_j$.

Let $I^{X \times X}$ denote the set of all fuzzy relations on $X$.

Definition 2.1. Reference [24] Let R be a fuzzy relation on X. Then R is

1) Reflexive, if $R(x,x) = 1$ $(\forall x \in X)$.

2) Symmetric, if $R(x,y) = R(y,x)$ $(\forall x, y \in X)$.

3) Transitive, if $R(x,z) \geq R(x,y) \wedge R(y,z)$ $(\forall x, y, z \in X)$.

If R is reflexive, symmetric, and transitive, then R is called a fuzzy equivalence relation on X. If R is reflexive and symmetric, then R is called a fuzzy tolerance relation on X.

Let $R \in I^{X \times X}$. $\forall x \in X$, the fuzzy set $Rx$ on $X$ is defined as follows:

$$Rx(y) = R(x,y) \quad (\forall y \in X). \quad (3)$$

Then Rx is called the fuzzy information granule of x.

According to (1), $Rx = \frac{R(x,x_1)}{x_1} + \frac{R(x,x_2)}{x_2} + \cdots + \frac{R(x,x_n)}{x_n}$.

Then $|Rx| = \sum_{i=1}^{n} R(x, x_i)$.
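As a small illustration of these notions, the following sketch (assuming NumPy and a hypothetical 4-object universe; the matrix values are made up) represents a fuzzy relation by its matrix $M(R)$ and computes each granule cardinality $|Rx_i|$ as a row sum:

```python
import numpy as np

# Hypothetical fuzzy relation matrix M(R) on X = {x1, x2, x3, x4};
# entry (i, j) is the similarity R(xi, xj) in [0, 1].
M = np.array([
    [1.0, 0.8, 0.2, 0.0],
    [0.8, 1.0, 0.5, 0.1],
    [0.2, 0.5, 1.0, 0.6],
    [0.0, 0.1, 0.6, 1.0],
])

# The fuzzy information granule Rx_i is row i of M(R);
# its cardinality |Rx_i| is the row sum.
granule_cardinality = M.sum(axis=1)
print(granule_cardinality)  # [2.0, 2.4, 2.3, 1.7]
```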

2.2 Hybrid Decision Information Systems

Definition 2.2. Reference [25] $(X,C)$ is called an information system (IS) if each $c \in C$ determines a function $f_c: X \to V_c$, where $V_c = \{f_c(x) : x \in X\}$. If $Q \subseteq C$, then $(X,Q)$ is said to be a subsystem of $(X,C)$.

If $C = B \cup \{d\}$, where $B = \{b_1, b_2, \ldots, b_m\}$ represents the set of conditional attributes and $d$ represents the decision attribute, then $(X,C)$ is called a decision information system, written $(X,B,d)$.

Definition 2.3. Let $(X,B,d)$ be a decision information system. Then $(X,B,d)$ is known as a hybrid decision information system (HDIS) if $B = B_c \cup B_r$, where $B_c$ is a set of categorical attributes and $B_r$ is a set of real-valued attributes.

Example 2.4. Table 2 shows an HDIS, where $X = \{x_1, x_2, \ldots, x_9\}$, $B_c = \{b_1, b_2\}$, and $B_r = \{b_3\}$.


3  Methodology

3.1 A New Distance Function

To accurately measure the difference between two attribute values in HDISs, we have developed a novel distance function that accounts for the various types of attributes as well as missing data. This approach allows for a more comprehensive assessment of the similarity or dissimilarity between attribute values in the system.

We first define the distance between categorical attribute values, which enables the subsequent distance definition for hybrid data.

Definition 3.1. For an HDIS $(X,B,d)$, let $V_d = \{d(x) : x \in X\} = \{d_1, d_2, \ldots, d_r\}$,

$$N(b,x) = |\{y \in X : b(y) = b(x)\}| \quad (b \in B_c,\ b(x) \neq *,\ x \in X),$$

where $*$ denotes a missing value, and

$$N_i(b,x) = |\{y \in X : b(y) = b(x),\ d(y) = d_i\}|.$$

Then $N(b,x) = \sum_{i=1}^{r} N_i(b,x)$.

Example 3.2. (Continuation of Example 2.4)

$V_d = \{d_1 = \text{Flu},\ d_2 = \text{Rhinitis},\ d_3 = \text{Health}\}$,
$N(b_2, x_5) = |\{x \in X : b_2(x) = b_2(x_5) = \text{Yes}\}| = |\{x_1, x_2, x_4, x_5, x_9\}| = 5$,
$N_1(b_2, x_5) = |\{x \in X : b_2(x) = b_2(x_5) = \text{Yes},\ d(x) = d_1 = \text{Flu}\}| = |\{x_1, x_2, x_4\}| = 3$,
$N_2(b_2, x_5) = |\{x \in X : b_2(x) = b_2(x_5) = \text{Yes},\ d(x) = d_2 = \text{Rhinitis}\}| = |\{x_5\}| = 1$,
$N_3(b_2, x_5) = |\{x \in X : b_2(x) = b_2(x_5) = \text{Yes},\ d(x) = d_3 = \text{Health}\}| = |\{x_9\}| = 1$.

Definition 3.3. For an HDIS $(X,B,d)$, let $|V_d| = r$, $b \in B_c$, $x \in X$, $y \in X$, $b(x) \neq *$, and $b(y) \neq *$. Then the distance between $b(x)$ and $b(y)$ is defined as follows:

$$\rho_c(b(x), b(y)) = \frac{1}{2}\sum_{i=1}^{r}\left|\frac{N_i(b,x)}{N(b,x)} - \frac{N_i(b,y)}{N(b,y)}\right|. \quad (4)$$

This distance, defined through the distribution of attribute values across decision classes, is well suited to the characteristics of categorical data.
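A minimal Python sketch of Eq. (4) follows. The column values and decision labels below are hypothetical stand-ins (Table 2 is only available as an image in the source), so the printed number illustrates the computation rather than reproducing Example 3.7:

```python
from collections import Counter

def rho_c(b, d, x, y):
    # Supervised distance (Eq. 4) between the nominal values b[x] and b[y].
    # b: list of nominal attribute values, d: list of decision labels,
    # x, y: object indices; b[x] and b[y] are assumed not to be missing.
    classes = sorted(set(d))

    def distribution(i):
        # Class-conditional distribution N_i(b, x) / N(b, x) of the
        # objects sharing the attribute value b[i].
        idx = [k for k in range(len(b)) if b[k] == b[i]]
        counts = Counter(d[k] for k in idx)
        return [counts[c] / len(idx) for c in classes]

    px, py = distribution(x), distribution(y)
    return 0.5 * sum(abs(p - q) for p, q in zip(px, py))

# Hypothetical nominal column ("runny nose") and decision labels.
b = ["Yes", "Yes", "No", "Yes", "Yes", "No", "No", "No", "Yes"]
d = ["Flu", "Flu", "Flu", "Flu", "Rhinitis", "Rhinitis", "Health", "Health", "Health"]
print(rho_c(b, d, 0, 2))  # 0.35: distance between the values "Yes" and "No"
```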

Proposition 3.4. Let (X,B,d) be an HDIS. Then the following conclusions are valid:

1)   $\rho_c(b(x), b(x)) = 0$,

2)   $0 \leq \rho_c(b(x), b(y)) \leq 1$.

Proof. 1) The conclusion is self-evidently valid.

    2) $\rho_c(b(x), b(y)) \geq 0$ is self-evidently valid.

Since

$$\sum_{i=1}^{r}\frac{N_i(b,x)}{N(b,x)} = \sum_{i=1}^{r}\frac{N_i(b,y)}{N(b,y)} = 1,$$

we have

$$\rho_c(b(x), b(y)) \leq \frac{1}{2}\left(\sum_{i=1}^{r}\frac{N_i(b,x)}{N(b,x)} + \sum_{i=1}^{r}\frac{N_i(b,y)}{N(b,y)}\right) = 1.$$

Definition 3.5. For an HDIS $(X,B,d)$, let $b \in B_r$, $x \in X$, $y \in X$, $b(x) \neq *$, and $b(y) \neq *$. Then the distance between $b(x)$ and $b(y)$ is defined as follows:

$$\rho_r(b(x), b(y)) = \frac{|b(x) - b(y)|}{\max\{b(z) : z \in X\} - \min\{b(z) : z \in X\}}. \quad (5)$$

$\rho_r(b(x), b(x)) = 0$ and $0 \leq \rho_r(b(x), b(y)) \leq 1$ are self-evidently valid.

Definition 3.6. For an HDIS $(X,B,d)$, let $b \in B$, $x \in X$, and $y \in X$. Then the distance between $b(x)$ and $b(y)$ is defined as follows:

$$\rho(b(x), b(y)) = \begin{cases} 0, & b \in B,\ b(x) = b(y) = *,\ d(x) = d(y) \\ 1, & b \in B,\ b(x) = b(y) = *,\ d(x) \neq d(y) \\ \rho_c(b(x), b(y)), & b \in B_c,\ b(x) \neq *,\ b(y) \neq * \\ \rho_r(b(x), b(y)), & b \in B_r,\ b(x) \neq *,\ b(y) \neq *. \end{cases} \quad (6)$$

Example 3.7. (Continuation of Example 2.4) According to Definitions 3.1–3.6, we have

1) $\rho(b_1(x_1), b_1(x_3)) = \left(\left|\frac{2}{2} - \frac{1}{2}\right| + \left|\frac{0}{2} - \frac{1}{2}\right| + \left|\frac{0}{2} - \frac{0}{2}\right|\right)/3 = \frac{1}{3}$,

2) $\rho(b_1(x_1), b_1(x_4)) = \left(\left|\frac{2}{2} - \frac{1}{3}\right| + \left|\frac{0}{2} - \frac{0}{3}\right| + \left|\frac{0}{2} - \frac{2}{3}\right|\right)/3 = \frac{4}{9}$,

3) $\rho(b_1(x_5), b_1(x_9)) = 1$,

4) $\rho(b_2(x_1), b_2(x_3)) = 0$,

5) $\rho(b_3(x_1), b_3(x_3)) = \frac{|39.5 - 39|}{40 - 36} = \frac{1}{8}$.

Definition 3.8. For an HDIS $(X,B,d)$, $b \in B$, $x_i, x_j \in X$, let $M_b = (\rho(b(x_i), b(x_j)))_{n \times n}$. Then $M_b$ is called the distance matrix of $b$.
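Below is a sketch of the hybrid distance $\rho$ of Eq. (6) and the distance matrix $M_b$ of Definition 3.8. It assumes the `rho_c` helper from the previous sketch, encodes the missing value $*$ as Python's `None`, and, like Eq. (6) as printed, handles pairs in which both values are present or both are missing:

```python
import numpy as np

def rho_r(b, x, y):
    # Normalized real-valued distance (Eq. 5); b[x] and b[y] are present.
    vals = [v for v in b if v is not None]
    return abs(b[x] - b[y]) / (max(vals) - min(vals))

def rho(b, d, x, y, categorical, rho_c=None):
    # Hybrid distance (Eq. 6); None encodes a missing value (*).
    if b[x] is None and b[y] is None:
        return 0.0 if d[x] == d[y] else 1.0   # both missing: compare decisions
    if categorical:
        return rho_c(b, d, x, y)              # nominal: supervised distance (Eq. 4)
    return rho_r(b, x, y)                     # real-valued: Eq. (5)

def distance_matrix(b, d, categorical, rho_c=None):
    # Distance matrix M_b of Definition 3.8.
    n = len(b)
    return np.array([[rho(b, d, i, j, categorical, rho_c) for j in range(n)]
                     for i in range(n)])

# Hypothetical real-valued column (e.g., body temperature), no missing values.
b3 = [39.0, 38.5, 39.5, 37.5, 36.0, 37.0, 38.0, 40.0, 38.0]
d  = ["Flu", "Flu", "Flu", "Flu", "Rhinitis", "Rhinitis", "Health", "Health", "Health"]
print(np.round(distance_matrix(b3, d, categorical=False), 3))
```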

Example 3.9. (Continuation of Example 2.4) Here are the distance matrices of $b_1$ and $b_3$:

$$M_{b_1} = \begin{pmatrix} 0 & 0 & 1/3 & 4/9 & 1 & 1/3 & 4/9 & 4/9 & 1 \\ 0 & 0 & 1/3 & 4/9 & 1 & 1/3 & 4/9 & 4/9 & 1 \\ 1/3 & 1/3 & 0 & 4/9 & 1 & 0 & 4/9 & 4/9 & 1 \\ 4/9 & 4/9 & 4/9 & 0 & 1 & 4/9 & 0 & 0 & 1 \\ 1 & 1 & 1 & 1 & 0 & 0 & 1 & 1 & 1 \\ 1/3 & 1/3 & 0 & 4/9 & 0 & 0 & 4/9 & 4/9 & 1 \\ 4/9 & 4/9 & 4/9 & 0 & 1 & 4/9 & 0 & 0 & 0 \\ 4/9 & 4/9 & 4/9 & 0 & 1 & 4/9 & 0 & 0 & 0 \\ 1 & 1 & 1 & 1 & 1 & 1 & 0 & 0 & 0 \end{pmatrix},$$

$$M_{b_3} = \begin{pmatrix} 0 & 1/8 & 1/8 & 3/8 & 3/4 & 1 & 7/8 & 1 & 5/8 \\ 1/8 & 0 & 1/4 & 1/2 & 7/8 & 1 & 1 & 1 & 3/4 \\ 1/8 & 1/4 & 0 & 1/4 & 5/8 & 1 & 3/4 & 1 & 1/2 \\ 3/8 & 1/2 & 1/4 & 0 & 3/8 & 1 & 1/2 & 1 & 1/4 \\ 3/4 & 7/8 & 5/8 & 3/8 & 0 & 0 & 1/8 & 1 & 1/8 \\ 1 & 1 & 1 & 1 & 0 & 0 & 1 & 1 & 1 \\ 7/8 & 1 & 3/4 & 1/2 & 1/8 & 1 & 0 & 0 & 1/4 \\ 1 & 1 & 1 & 1 & 1 & 1 & 0 & 0 & 0 \\ 5/8 & 3/4 & 1/2 & 1/4 & 1/8 & 1 & 1/4 & 0 & 0 \end{pmatrix}.$$

3.2 A New Fuzzy Relation

Definition 3.10. For an HDIS $(X,B,d)$, $P \subseteq B$, $\theta \in [0,1]$, the tolerance relation is defined as follows:

$$T_P^\theta = \{(x,y) \in X \times X : \forall b \in P,\ \rho(b(x), b(y)) \leq \theta\}.$$

Let $T_P^\theta(x) = \{y \in X : (x,y) \in T_P^\theta\}$. $T_P^\theta(x)$ is called the tolerance class of $x$.

$T_P^\theta$ uses distance to measure the similarity of objects. Next, we introduce a new fuzzy relation that evaluates the similarity of objects based on the number of similar attributes.

Definition 3.11. Let $(X,B,d)$ be an HDIS. $\forall P \subseteq B$, $\theta \in [0,1]$, $x \in X$, $y \in X$, define

$$R_P^\theta(x,y) = \frac{1}{|B|}\left|\{b \in P : \rho(b(x), b(y)) \leq \theta\}\right|, \quad R_d = \{(x,y) \in X \times X : d(x) = d(y)\}.$$

Then $R_P^\theta$ and $R_d$ are a fuzzy relation and an equivalence relation on $X$, respectively.

The matrix representation of $R_P^\theta$ is $M(R_P^\theta) = (R_P^\theta(x_i, x_j))_{n \times n}$.

Let $R_P^\theta x(y) = R_P^\theta(x,y)$ and $R_d x = \{y \in X : (x,y) \in R_d\}$.

$R_P^\theta x$ is the fuzzy information granule of $x$ and $R_d x$ is the decision class of $x$.

According to (1), $R_P^\theta x = \frac{R_P^\theta(x,x_1)}{x_1} + \frac{R_P^\theta(x,x_2)}{x_2} + \cdots + \frac{R_P^\theta(x,x_n)}{x_n}$, and $|R_P^\theta x| = \sum_{i=1}^{n} R_P^\theta(x, x_i)$.

Let $X/R_d = \{R_d x : x \in X\} = \{D_1, D_2, \ldots, D_r\}$.

Let $P_\theta^{xy} = \{b \in P : \rho(b(x), b(y)) \leq \theta\}$. Then $R_P^\theta x(y) = \frac{1}{|B|}\left|P_\theta^{xy}\right|$.

$R_P^\theta x(x) = \frac{|P|}{|B|} \leq 1$ and $R_P^\theta x(y) \leq \frac{|P|}{|B|}$ are self-evidently valid.
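The relation of Definition 3.11 is easy to assemble from per-attribute distance matrices: it simply counts, for each pair of objects, how many attributes in $P$ agree within $\theta$, and scales by $1/|B|$. A NumPy sketch (with hypothetical inputs) follows:

```python
import numpy as np

def fuzzy_relation(dist_matrices, P, B_size, theta):
    # M(R_P^theta) from Definition 3.11.
    # dist_matrices: dict mapping attribute name -> n x n distance matrix M_b,
    # P: the attribute subset, B_size: |B|, theta: the similarity threshold.
    n = next(iter(dist_matrices.values())).shape[0]
    count = np.zeros((n, n))
    for b in P:
        count += (dist_matrices[b] <= theta)  # 1 where rho(b(x), b(y)) <= theta
    return count / B_size
```

Note that a single abnormal attribute value can change an entry of $M(R_P^\theta)$ by at most $1/|B|$; this is the robustness to outliers that motivates measuring similarity by the number of similar attributes rather than by the attribute values themselves.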

Proposition 3.12. Let $(X,B,d)$ be an HDIS. If $P \subseteq B$, $P_1 \subseteq P_2 \subseteq B$, and $0 \leq \theta_1 \leq \theta_2 \leq 1$, then $R_{P_1}^\theta x \subseteq R_{P_2}^\theta x$ and $R_P^{\theta_1} x \subseteq R_P^{\theta_2} x$ $(\forall x \in X)$.

Proof. According to Definition 3.11, $\forall y \in X$, $\theta \in [0,1]$, $R_{P_1}^\theta x(y) = \frac{1}{|B|}\left|(P_1)_\theta^{xy}\right|$ and $R_{P_2}^\theta x(y) = \frac{1}{|B|}\left|(P_2)_\theta^{xy}\right|$.

Since $P_1 \subseteq P_2$, $(P_1)_\theta^{xy} \subseteq (P_2)_\theta^{xy}$. Therefore, $\forall y \in X$, $R_{P_1}^\theta x(y) \leq R_{P_2}^\theta x(y)$.

Thus, $\forall x \in X$, $R_{P_1}^\theta x \subseteq R_{P_2}^\theta x$.

Since $0 \leq \theta_1 \leq \theta_2 \leq 1$, $P_{\theta_1}^{xy} \subseteq P_{\theta_2}^{xy}$. Thus $R_P^{\theta_1} x(y) \leq R_P^{\theta_2} x(y)$. Therefore, $R_P^{\theta_1} x \subseteq R_P^{\theta_2} x$.

3.3 Fuzzy Conditional Information Entropy in HDISs

This section explores the concept of fuzzy entropy measures in HDISs to measure the uncertainty of HDISs.

Definition 3.13. $(X,B,d)$ is an HDIS. Let $P \subseteq B$ and $\theta \in [0,1]$. Then the fuzzy information entropy of $P$ is defined as follows:

$$H_\theta(P) = -\sum_{i=1}^{n}\frac{|R_P^\theta x_i|}{n}\log_2\frac{|R_P^\theta x_i|}{n}. \quad (7)$$

Proposition 3.14. $(X,B,d)$ is an HDIS. Let $P \subseteq B$ and $\theta \in [0,1]$. The following inequality holds:

$$0 \leq H_\theta(P) \leq n\log_2\frac{mn}{|P|}.$$

Proof. Since $\frac{|P|}{m} \leq |R_P^\theta x_i| \leq n$, we have $\frac{|P|}{mn} \leq \frac{|R_P^\theta x_i|}{n} \leq 1$.

Thus, $0 \leq -\log_2\frac{|R_P^\theta x_i|}{n} \leq \log_2\frac{mn}{|P|}$. Hence, $0 \leq -\frac{|R_P^\theta x_i|}{n}\log_2\frac{|R_P^\theta x_i|}{n} \leq \log_2\frac{mn}{|P|}$.

The following conclusion follows directly from Definition 3.13:

$$0 \leq H_\theta(P) \leq n\log_2\frac{mn}{|P|}.$$

Definition 3.15. $(X,B,d)$ is an HDIS. Let $P \subseteq B$ and $\theta \in [0,1]$. The fuzzy conditional information entropy of $P$ concerning $d$ is defined as follows:

$$H_\theta(P|d) = -\sum_{i=1}^{n}\sum_{j=1}^{r}\frac{|R_P^\theta x_i \cap D_j|}{n}\log_2\frac{|R_P^\theta x_i \cap D_j|}{|R_P^\theta x_i|}. \quad (8)$$
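Assuming `M` is a relation matrix produced by the previous sketch and `y` is the vector of decision labels, Eqs. (7) and (8) reduce to row-sum computations; a sketch follows (with the usual convention $0 \log_2 0 = 0$):

```python
import numpy as np

def fuzzy_entropy(M):
    # H_theta(P) of Eq. (7); M is the n x n matrix of R_P^theta.
    n = M.shape[0]
    card = M.sum(axis=1)                     # |R_P^theta x_i|
    return -np.sum((card / n) * np.log2(card / n))

def fuzzy_conditional_entropy(M, y):
    # H_theta(P|d) of Eq. (8); y[i] is the decision label of x_i.
    n = M.shape[0]
    y = np.asarray(y)
    card = M.sum(axis=1)
    H = 0.0
    for cls in np.unique(y):
        card_j = M[:, y == cls].sum(axis=1)  # |R_P^theta x_i ∩ D_j|
        nz = card_j > 0                      # 0 * log2(0) is taken as 0
        H -= np.sum((card_j[nz] / n) * np.log2(card_j[nz] / card[nz]))
    return H
```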

Lemma 3.16. $(X,B,d)$ is an HDIS. Let $P \subseteq B$, $\theta \in [0,1]$, and $D \in X/R_d$. The following conclusion is valid:

$$(R_P^\theta x \cap D)(y) + (R_P^\theta x \cap (X - D))(y) = R_P^\theta(x,y) \quad (\forall x, y \in X).$$

Proof. $\forall x, y \in X$, we have

$$(R_P^\theta x \cap D)(y) = R_P^\theta(x,y) \wedge D(y) = \begin{cases} R_P^\theta(x,y), & y \in D \\ 0, & y \notin D, \end{cases}$$

$$(R_P^\theta x \cap (X - D))(y) = R_P^\theta(x,y) \wedge (X - D)(y) = \begin{cases} 0, & y \in D \\ R_P^\theta(x,y), & y \notin D. \end{cases}$$

Thus, $(R_P^\theta x \cap D)(y) + (R_P^\theta x \cap (X - D))(y) = R_P^\theta(x,y)$.

Proposition 3.17. $(X,B,d)$ is an HDIS. Let $P \subseteq B$, $\theta \in [0,1]$, and $D \in X/R_d$. The following conclusion is valid:

$$|R_P^\theta x \cap D| + |R_P^\theta x \cap (X - D)| = |R_P^\theta x| \quad (\forall x \in X).$$

Proof. According to Lemma 3.16,

$$|R_P^\theta x \cap D| + |R_P^\theta x \cap (X - D)| = \sum_{i=1}^{n}(R_P^\theta x \cap D)(x_i) + \sum_{i=1}^{n}(R_P^\theta x \cap (X - D))(x_i) = \sum_{i=1}^{n}\left[(R_P^\theta x \cap D)(x_i) + (R_P^\theta x \cap (X - D))(x_i)\right] = \sum_{i=1}^{n} R_P^\theta(x, x_i) = |R_P^\theta x|.$$

Proposition 3.18. $(X,B,d)$ is an HDIS.

1) If $Q \subseteq P \subseteq B$, then $H_\theta(Q|d) \leq H_\theta(P|d)$.

2) If $0 \leq \theta_1 \leq \theta_2 \leq 1$, then $H_{\theta_1}(P|d) \leq H_{\theta_2}(P|d)$ $(\forall P \subseteq B)$.

Proof. 1) Let $p_{ij}^{(1)} = |R_P^\theta x_i \cap D_j|$, $p_{ij}^{(2)} = |R_P^\theta x_i \cap (X - D_j)|$, $q_{ij}^{(1)} = |R_Q^\theta x_i \cap D_j|$, and $q_{ij}^{(2)} = |R_Q^\theta x_i \cap (X - D_j)|$.

$R_Q^\theta x_i \subseteq R_P^\theta x_i$ according to Proposition 3.12.

Hence, $0 \leq q_{ij}^{(1)} \leq p_{ij}^{(1)}$ and $0 \leq q_{ij}^{(2)} \leq p_{ij}^{(2)}$.

According to Proposition 3.17, we have $p_{ij}^{(1)} + p_{ij}^{(2)} = |R_P^\theta x_i|$ and $q_{ij}^{(1)} + q_{ij}^{(2)} = |R_Q^\theta x_i|$.

Let $f(x,y) = -x\log_2\frac{x}{x+y}$ $(x > 0,\ y > 0)$.

Then $H_\theta(P|d) = -\sum_{i=1}^{n}\sum_{j=1}^{r}\frac{|R_P^\theta x_i \cap D_j|}{n}\log_2\frac{|R_P^\theta x_i \cap D_j|}{|R_P^\theta x_i|} = -\sum_{i=1}^{n}\sum_{j=1}^{r}\frac{p_{ij}^{(1)}}{n}\log_2\frac{p_{ij}^{(1)}}{p_{ij}^{(1)} + p_{ij}^{(2)}} = \frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{r} f(p_{ij}^{(1)}, p_{ij}^{(2)})$ and

$H_\theta(Q|d) = -\sum_{i=1}^{n}\sum_{j=1}^{r}\frac{|R_Q^\theta x_i \cap D_j|}{n}\log_2\frac{|R_Q^\theta x_i \cap D_j|}{|R_Q^\theta x_i|} = -\sum_{i=1}^{n}\sum_{j=1}^{r}\frac{q_{ij}^{(1)}}{n}\log_2\frac{q_{ij}^{(1)}}{q_{ij}^{(1)} + q_{ij}^{(2)}} = \frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{r} f(q_{ij}^{(1)}, q_{ij}^{(2)})$.

Since the function $f(x,y)$ is monotonically increasing in both $x$ and $y$,

$$f(q_{ij}^{(1)}, q_{ij}^{(2)}) \leq f(p_{ij}^{(1)}, q_{ij}^{(2)}) \leq f(p_{ij}^{(1)}, p_{ij}^{(2)}).$$

Hence, $H_\theta(Q|d) \leq H_\theta(P|d)$.

2) Let $s_{ij}^{(1)} = |R_P^{\theta_1} x_i \cap D_j|$, $s_{ij}^{(2)} = |R_P^{\theta_1} x_i \cap (X - D_j)|$, $t_{ij}^{(1)} = |R_P^{\theta_2} x_i \cap D_j|$, and $t_{ij}^{(2)} = |R_P^{\theta_2} x_i \cap (X - D_j)|$.

$R_P^{\theta_1} x_i \subseteq R_P^{\theta_2} x_i$ according to Proposition 3.12. Hence, $0 \leq s_{ij}^{(1)} \leq t_{ij}^{(1)}$ and $0 \leq s_{ij}^{(2)} \leq t_{ij}^{(2)}$.

According to Proposition 3.17, we have $s_{ij}^{(1)} + s_{ij}^{(2)} = |R_P^{\theta_1} x_i|$ and $t_{ij}^{(1)} + t_{ij}^{(2)} = |R_P^{\theta_2} x_i|$.

$H_{\theta_1}(P|d) = -\sum_{i=1}^{n}\sum_{j=1}^{r}\frac{|R_P^{\theta_1} x_i \cap D_j|}{n}\log_2\frac{|R_P^{\theta_1} x_i \cap D_j|}{|R_P^{\theta_1} x_i|} = -\sum_{i=1}^{n}\sum_{j=1}^{r}\frac{s_{ij}^{(1)}}{n}\log_2\frac{s_{ij}^{(1)}}{s_{ij}^{(1)} + s_{ij}^{(2)}} = \frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{r} f(s_{ij}^{(1)}, s_{ij}^{(2)})$ and

$H_{\theta_2}(P|d) = -\sum_{i=1}^{n}\sum_{j=1}^{r}\frac{|R_P^{\theta_2} x_i \cap D_j|}{n}\log_2\frac{|R_P^{\theta_2} x_i \cap D_j|}{|R_P^{\theta_2} x_i|} = -\sum_{i=1}^{n}\sum_{j=1}^{r}\frac{t_{ij}^{(1)}}{n}\log_2\frac{t_{ij}^{(1)}}{t_{ij}^{(1)} + t_{ij}^{(2)}} = \frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{r} f(t_{ij}^{(1)}, t_{ij}^{(2)})$.

It follows from the monotonicity of $f(x,y)$ that $f(s_{ij}^{(1)}, s_{ij}^{(2)}) \leq f(t_{ij}^{(1)}, s_{ij}^{(2)}) \leq f(t_{ij}^{(1)}, t_{ij}^{(2)})$.

Hence, $H_{\theta_1}(P|d) \leq H_{\theta_2}(P|d)$.

Definition 3.19. $(X,B,d)$ is an HDIS. Let $P \subseteq B$ and $\theta \in [0,1]$. The fuzzy joint information entropy of $P$ and $d$ is defined as follows:

$$H_\theta(P \cup d) = -\sum_{i=1}^{n}\sum_{j=1}^{r}\frac{|R_P^\theta x_i \cap D_j|}{n}\log_2\frac{|R_P^\theta x_i \cap D_j|}{n}. \quad (9)$$

Lemma 3.20. $(X,B,d)$ is an HDIS. Let $P \subseteq B$, $\theta \in [0,1]$, $U \in 2^X$, and $V \in 2^X$. Then

$$\sum_{y \in U} R_P^\theta(x,y) + \sum_{y \in V} R_P^\theta(x,y) \geq \sum_{y \in U \cup V} R_P^\theta(x,y) \quad (\forall x \in X).$$

The equality holds when $U \cap V = \emptyset$.

Proof. The conclusion is self-evident.

Proposition 3.21. $(X,B,d)$ is an HDIS. Let $P \subseteq B$ and $\theta \in [0,1]$. Then

$$\sum_{j=1}^{r}|R_P^\theta x \cap D_j| = |R_P^\theta x| \quad (\forall x \in X).$$

Proof. Since $D = \{D_1, D_2, \ldots, D_r\}$ constitutes a partition of $X$, by Lemma 3.20,

$$\sum_{j=1}^{r}|R_P^\theta x \cap D_j| = \sum_{j=1}^{r}\sum_{y \in D_j}\left(R_P^\theta(x,y) \wedge D_j(y)\right) = \sum_{y \in X} R_P^\theta(x,y) = |R_P^\theta x| \quad (\forall x \in X).$$

Proposition 3.22. $(X,B,d)$ is an HDIS. Let $P \subseteq B$ and $\theta \in [0,1]$. Then

$$H_\theta(P|d) = H_\theta(P \cup d) - H_\theta(P).$$

Proof. $\sum_{j=1}^{r}|R_P^\theta x_i \cap D_j| = |R_P^\theta x_i|$ according to Proposition 3.21. Hence,

$$\begin{aligned} H_\theta(P|d) &= -\sum_{i=1}^{n}\sum_{j=1}^{r}\frac{|R_P^\theta x_i \cap D_j|}{n}\log_2\frac{|R_P^\theta x_i \cap D_j|}{|R_P^\theta x_i|} = -\sum_{i=1}^{n}\sum_{j=1}^{r}\frac{|R_P^\theta x_i \cap D_j|}{n}\left(\log_2\frac{|R_P^\theta x_i \cap D_j|}{n} - \log_2\frac{|R_P^\theta x_i|}{n}\right) \\ &= -\sum_{i=1}^{n}\sum_{j=1}^{r}\frac{|R_P^\theta x_i \cap D_j|}{n}\log_2\frac{|R_P^\theta x_i \cap D_j|}{n} + \sum_{i=1}^{n}\sum_{j=1}^{r}\frac{|R_P^\theta x_i \cap D_j|}{n}\log_2\frac{|R_P^\theta x_i|}{n} \\ &= H_\theta(P \cup d) + \sum_{i=1}^{n}\frac{|R_P^\theta x_i|}{n}\log_2\frac{|R_P^\theta x_i|}{n} = H_\theta(P \cup d) - H_\theta(P). \end{aligned}$$

Theorem 3.23. $(X,B,d)$ is an HDIS. Let $P \subseteq B$ and $\theta \in [0,1]$. Then $H_\theta(P|d) \geq 0$.

Proof. $H_\theta(P) = -\sum_{i=1}^{n}\frac{|R_P^\theta x_i|}{n}\log_2\frac{|R_P^\theta x_i|}{n}$ according to Definition 3.13, and $\sum_{j=1}^{r}|R_P^\theta x_i \cap D_j| = |R_P^\theta x_i|$ according to Proposition 3.21.

Thus, $H_\theta(P) = -\sum_{i=1}^{n}\sum_{j=1}^{r}\frac{|R_P^\theta x_i \cap D_j|}{n}\log_2\frac{|R_P^\theta x_i|}{n}$.

$H_\theta(P \cup d) = -\sum_{i=1}^{n}\sum_{j=1}^{r}\frac{|R_P^\theta x_i \cap D_j|}{n}\log_2\frac{|R_P^\theta x_i \cap D_j|}{n}$ according to Definition 3.19.

Since $\log_2\frac{|R_P^\theta x_i \cap D_j|}{n} \leq \log_2\frac{|R_P^\theta x_i|}{n}$, $H_\theta(P) \leq H_\theta(P \cup d)$.

$H_\theta(P|d) = H_\theta(P \cup d) - H_\theta(P)$ according to Proposition 3.22.

Hence, $H_\theta(P|d) \geq 0$.
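As a quick numerical sanity check of Proposition 3.22 and Theorem 3.23, the identity $H_\theta(P|d) = H_\theta(P \cup d) - H_\theta(P)$ can be verified on a randomly generated reflexive, symmetric relation matrix. The sketch below assumes the `fuzzy_entropy` and `fuzzy_conditional_entropy` helpers defined above:

```python
import numpy as np

def fuzzy_joint_entropy(M, y):
    # H_theta(P ∪ d) of Eq. (9).
    n = M.shape[0]
    y = np.asarray(y)
    H = 0.0
    for cls in np.unique(y):
        card_j = M[:, y == cls].sum(axis=1)  # |R_P^theta x_i ∩ D_j|
        nz = card_j > 0
        H -= np.sum((card_j[nz] / n) * np.log2(card_j[nz] / n))
    return H

rng = np.random.default_rng(0)
n = 6
M = rng.integers(0, 4, size=(n, n)) / 4.0  # counts of similar attributes / |B|
M = np.maximum(M, M.T)                     # symmetric
np.fill_diagonal(M, 1.0)                   # reflexive
y = np.array([0, 0, 1, 1, 2, 2])

lhs = fuzzy_conditional_entropy(M, y)
rhs = fuzzy_joint_entropy(M, y) - fuzzy_entropy(M)
assert abs(lhs - rhs) < 1e-9 and lhs >= 0.0
```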

3.4 An Attribute Reduction Algorithm Utilizing Fuzzy Conditional Information Entropy


Definition 3.24. $(X,B,d)$ is an HDIS. Let $A \subseteq B$ and $\theta \in [0,1]$. If $H_\theta(A|d) = H_\theta(B|d)$, then $A$ is called a coordination subset of $B$.

Let $coo(B)$ denote the collection of all coordination subsets of $B$.

Definition 3.25. $(X,B,d)$ is an HDIS. Let $A \subseteq B$ and $\theta \in [0,1]$. If $A \in coo(B)$ and $A - \{a\} \notin coo(B)$ $(\forall a \in A)$, then $A$ is called a reduct of $B$.

Let red(B) denote the collection of all reducts of B.

In accordance with the aforementioned definitions and Proposition 3.18, Algorithm 1 for attribute reduction (referred to as IARFCIE in the experiments) is hereby presented.
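Algorithm 1 appears only as an image in the source, so the following Python sketch is an interpretation rather than the authors' verbatim pseudocode. It implements the greedy reading suggested by Definitions 3.24 and 3.25 together with the monotonicity of Proposition 3.18: attributes are added one at a time, each time choosing the attribute that most increases $H_\theta(A|d)$, until $A$ becomes a coordination subset; a backward pass then removes redundant attributes so that the result is a reduct. The helpers `relation_of` (mapping an attribute subset to $M(R_P^\theta)$) and `H_cond` (mapping a relation matrix to $H_\theta(P|d)$) are assumed to come from the earlier sketches:

```python
def attribute_reduction(attrs, relation_of, H_cond, tol=1e-12):
    # Greedy reduct search driven by fuzzy conditional information entropy.
    target = H_cond(relation_of(attrs))        # H_theta(B|d)
    A, remaining = [], list(attrs)
    current = float("-inf")
    # Forward phase: grow A until it is a coordination subset (Def. 3.24).
    while remaining and abs(current - target) > tol:
        best = max(remaining, key=lambda a: H_cond(relation_of(A + [a])))
        A.append(best)
        remaining.remove(best)
        current = H_cond(relation_of(A))
    # Backward phase: drop attributes whose removal keeps coordination,
    # so the result satisfies Definition 3.25 (a reduct).
    for a in list(A):
        rest = [b for b in A if b != a]
        if rest and abs(H_cond(relation_of(rest)) - target) <= tol:
            A.remove(a)
    return A
```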

The time complexity of Algorithm 1 is shown in Table 3.


4  Experimental Results and Discussions

4.1 Datasets

Across all experiments, we utilized twelve University of California Irvine (UCI) datasets with hybrid attributes to evaluate our approach. Table 4 offers comprehensive details on each dataset.


4.2 The Property of Monotonicity in Fuzzy Conditional Information Entropy

To verify the monotonicity of fuzzy conditional information entropy (FCIE), we calculate the fuzzy conditional information entropy of each dataset as the number of attributes increases.

Fig. 1 demonstrates that the fuzzy conditional information entropy consistently rises as the number of attributes increases, highlighting the monotonic behavior established in Proposition 3.18. Fig. 1 is plotted with the parameter θ set to 0.1; similar behavior is observed for other values of θ.


Figure 1: Monotonicity of fuzzy conditional information entropy

4.3 The Influence of θ on Attribute Reduction and Classification Accuracy

The parameter θ of IARFCIE affects both the reduction results and the classification accuracy. To determine the optimal parameter value, we iterate through each value in {0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1} for θ and select the one that yields the best performance.

In the experiments, a decision tree classifier was employed, and the reported results are the mean of six iterations of 10-fold cross-validation. In each iteration, the complete dataset is evenly divided into 10 distinct subsets; 9 subsets are designated as training data, while the remaining subset serves as the test data, and this rotation is repeated until each of the 10 subsets has served as the test set once. In total, 60 evaluation runs are performed.
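The paper does not name its software stack; a minimal sketch of this evaluation protocol using scikit-learn (an assumption) might look as follows, where `X_reduced` is a dataset restricted to the attributes selected under one candidate value of θ:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

def evaluate(X_reduced, y, repeats=6, folds=10, seed=0):
    # Mean accuracy over `repeats` shuffled runs of `folds`-fold CV,
    # mirroring the six iterations of 10-fold cross-validation above.
    scores = []
    for r in range(repeats):
        cv = StratifiedKFold(n_splits=folds, shuffle=True, random_state=seed + r)
        clf = DecisionTreeClassifier(random_state=seed + r)
        scores.append(cross_val_score(clf, X_reduced, y, cv=cv).mean())
    return float(np.mean(scores))

# Grid search over theta: run the reduction for each candidate value and keep
# the theta whose reduct scores best (reduce() is a hypothetical wrapper
# around Algorithm 1).
# thetas = [i / 10 for i in range(11)]
# best_theta = max(thetas, key=lambda t: evaluate(reduce(X, t), y))
```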

Fig. 2 shows the classification accuracy of the reduction results obtained by algorithm IARFCIE with different values of parameter θ. By referring to Fig. 2, we can effectively identify the optimal parameter values for algorithm IARFCIE across the 12 datasets. Table 5 shows the optimal parameter values for 12 datasets.


Figure 2: Accuracy of classification for 12 datasets under different parameter settings


4.4 Performance Comparison Results of IARFCIE and Other State-of-the-Art Attribute Reduction Algorithms

In this section, we present a comprehensive comparison of the proposed algorithm IARFCIE with nine state-of-the-art attribute reduction algorithms: The positive region backward deletion algorithm (PRBDA), the belief-based attribute reduction algorithm (BARA) [1], the cost-sensitive attribute-scale selection algorithm (CSASSA) [7], the entropy-approximate reduction algorithm (EARA) [11], the information-preserving reduction algorithm (IPRA) [26], the random forest algorithm (RFA) [27], the relief algorithm (RA) [28], the mutual information algorithm (MIA) [18], and the neighborhood rough set algorithm (NRSA) [2]. Tables 6 and 7 show the reduction results of the 10 reduction algorithms on the 12 datasets. Table 8 presents the classification accuracy achieved by the 10 attribute reduction algorithms, as well as by the original datasets, when utilizing the decision tree (DT) classifier. The experimental results are obtained by averaging 10 runs of ten-fold cross-validation.


4.5 Discussions

Tables 6–8 demonstrate that IARFCIE outperforms all other algorithms and the raw datasets across the 12 datasets when the DT classifier is utilized. Although IARFCIE selects slightly more features than EARA, CSASSA, RFA, and RA, the difference in the number of selected features is minimal, and IARFCIE significantly surpasses their accuracy.

CSASSA and RA selected very few features, but their accuracy was also very low, indicating that they suffered from underfitting. PRBDA selected a very large number of features, but its accuracy was not very high, indicating that it suffered from overfitting.

The superior performance of IARFCIE can be attributed to two factors: Firstly, it does not rely on Euclidean distance, instead utilizing a novel distance metric. As is widely recognized, Euclidean distance is unsuitable for assessing the differences between nominal attribute values. The novel distance metric, which employs probability distributions to measure differences between nominal attribute values, aligns more closely with their inherent characteristics. Secondly, the algorithm introduces a new fuzzy relation that replaces similarity calculations based on distance with those derived from the number of attributes. This updated similarity measure effectively filters out a small number of outliers, thereby enhancing the robustness of the reduction algorithm.

5  Conclusions and Future Works

In this article, we introduce a novel difference metric that incorporates the decision attribute. This metric offers a more accurate measurement of disparities between nominal attribute values. Subsequently, based on this new metric, we define a novel fuzzy relationship, which effectively filters out abnormal attribute values by utilizing the number of similar attributes to determine sample similarity. Furthermore, utilizing this new fuzzy relationship, we define a fuzzy conditional information entropy, and an attribute reduction algorithm formulated on this entropy is then developed. Experimental results demonstrate that the algorithm not only achieves a significant attribute reduction rate but also surpasses the original datasets as well as other attribute reduction algorithms in terms of average classification accuracy. Consequently, the novel metric and fuzzy relationship introduced in this article effectively address the challenges of accurately measuring disparities between nominal attribute values and the sensitivity of attribute reduction algorithms to abnormal attribute values. This study thus introduces a fresh perspective on reducing attributes in hybrid data by enhancing the distance and fuzzy relationships. However, the grid search used to optimize the algorithm's parameter significantly impacts its efficiency. To address this issue, we aim to explore automatic parameter optimization methods as a future research direction.

Acknowledgement: The authors would like to thank the anonymous reviewers for their valuable comments.

Funding Statement: This work was supported by Anhui Province Natural Science Research Project of Colleges and Universities (2023AH040321) and Excellent Scientific Research and Innovation Team of Anhui Colleges (2022AH010098).

Author Contributions: The authors confirm contribution to the paper as follows: study conception and design: Xiaoqin Ma; analysis and interpretation of results: Jun Wang, Wenchang Yu; draft manuscript preparation: Qinli Zhang. All authors reviewed the results and approved the final version of the manuscript.

Availability of Data and Materials: Not applicable.

Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.

References

1. X. Chu et al., “Multi-granularity dominance rough concept attribute reduction over hybrid information systems and its application in clinical decision-making,” Inform. Sci., vol. 597, pp. 274–299, 2022. doi: 10.1016/j.ins.2022.03.048.

2. X. D. Fan, W. D. Zhao, C. Z. Wang, and Y. Huang, “Attribute reduction based on max-decision neighborhood rough set model,” Knowl.-Based Syst., vol. 151, no. 1, pp. 16–23, 2018. doi: 10.1016/j.knosys.2018.03.015.

3. W. H. Shu, W. B. Qin, and Y. H. Xie, “Incremental feature selection for dynamic hybrid data using neighborhood rough set,” Knowl.-Based Syst., vol. 194, no. 22, pp. 28–39, 2020. doi: 10.1016/j.knosys.2020.105516.

4. J. Zhang, G. Q. Zhang, Z. W. Li, L. Qu, and C. Wen, “Feature selection in a neighborhood decision information system with application to single cell RNA data classification,” Appl. Soft Comput., vol. 113, no. 1, pp. 107876, 2021. doi: 10.1016/j.asoc.2021.107876.

5. P. Zhou, X. G. Hu, P. P. Li, and X. D. Wu, “Online streaming feature selection using adapted neighborhood rough set,” Inform. Sci., vol. 481, no. 5, pp. 258–279, 2019. doi: 10.1016/j.ins.2018.12.074.

6. S. Y. Xia, H. Zhang, W. H. Li, G. Y. Wang, E. Giem, and Z. Z. Chen, “GBNRS: A novel rough set algorithm for fast adaptive attribute reduction in classification,” IEEE Trans. Knowl. Data Eng., vol. 34, no. 3, pp. 1231–1242, 2022. doi: 10.1109/TKDE.2020.2997039.

7. S. J. Liao, Y. D. Lin, J. J. Li, H. L. Li, and Y. H. Qian, “Attribute-scale selection for hybrid data with test cost constraint: The approach and uncertainty measures,” Int. J. Intell. Syst., vol. 37, no. 6, pp. 3297–3333, 2022. doi: 10.1002/int.22678.

8. Z. Yuan, H. Chen, T. Li, Z. Yu, B. Sang, and C. Luo, “Unsupervised attribute reduction for mixed data based on fuzzy rough set,” Inform. Sci., vol. 572, no. 1, pp. 67–87, 2021. doi: 10.1016/j.ins.2021.04.083.

9. A. P. Zeng, T. R. Li, J. Hu, H. M. Chen, and C. Luo, “Dynamical updating fuzzy rough approximations for hybrid data under the variation of attribute values,” Inform. Sci., vol. 378, no. 6, pp. 363–388, 2017. doi: 10.1016/j.ins.2016.07.056.

10. L. Yang, X. Y. Zhang, X. H. Xu, and B. B. Sang, “Multi-granulation rough sets and uncertainty measurement for multi-source fuzzy information system,” Int. J. Fuzzy Syst., vol. 21, no. 6, pp. 1919–1937, 2019. doi: 10.1007/s40815-019-00667-1.

11. C. Z. Wang, Y. Wang, M. W. Shao, Y. H. Qian, and D. G. Chen, “Fuzzy rough attribute reduction for categorical data,” IEEE Trans. Fuzzy Syst., vol. 28, no. 5, pp. 818–830, 2020. doi: 10.1109/TFUZZ.2019.2949765.

12. S. Singh, S. Shreevastava, T. Som, and G. Somani, “A fuzzy similarity-based rough set approach for attribute selection in set-valued information systems,” Soft Comput., vol. 24, no. 6, pp. 4675–4691, 2020. doi: 10.1007/s00500-019-04228-4.

13. P. Jain, A. K. Tiwari, and T. Som, “A fitting model based intuitionistic fuzzy rough feature selection,” Eng. Appl. Artif. Intell., vol. 89, pp. 1–13, 2020. doi: 10.1016/j.engappai.2019.103421.

14. Z. W. Li, Q. L. Zhang, S. P. Liu, Y. C. Peng, and L. L. Li, “Information fusion and attribute reduction for multi-source incomplete mixed data via conditional information entropy and D-S evidence theory,” Appl. Soft Comput., vol. 151, no. 11, pp. 111149, 2023. doi: 10.1016/j.asoc.2023.111149.

15. J. Vergara and P. Estevez, “A review of feature selection methods based on mutual information,” Neural Comput. Appl., vol. 24, no. 1, pp. 175–186, 2014. doi: 10.1007/s00521-013-1368-0.

16. Z. W. Li, Q. L. Zhang, P. Wang, Y. Song, and C. F. Wen, “Uncertainty measurement for a gene space based on class-consistent technology: An application in gene selection,” Appl. Intell., vol. 53, no. 2, pp. 5416–5436, 2022. doi: 10.1007/s10489-022-03657-3.

17. Q. L. Zhang, Y. Y. Chen, G. Q. Zhang, Z. W. Li, L. J. Chen, and C. F. Wen, “New uncertainty measurement for categorical data based on fuzzy information structures: An application in attribute reduction,” Inform. Sci., vol. 580, no. 5, pp. 541–577, 2021. doi: 10.1016/j.ins.2021.08.089.

18. X. Zhang, C. L. Mei, D. G. Chen, and J. H. Li, “Feature selection in mixed data: A method using a novel fuzzy rough set-based information entropy,” Pattern Recogn., vol. 56, pp. 1–15, 2016. doi: 10.1016/j.patcog.2016.02.013.

19. B. B. Sang, H. M. Chen, L. Yang, T. R. Li, and W. H. Xu, “Incremental feature selection using a conditional entropy based on fuzzy dominance neighborhood rough sets,” IEEE Trans. Fuzzy Syst., vol. 30, no. 6, pp. 1683–1697, 2021. doi: 10.1109/TFUZZ.2021.3064686.

20. Z. H. Huang and J. J. Li, “Discernibility measures for fuzzy β covering and their application,” IEEE Trans. Cybern., vol. 52, no. 9, pp. 9722–9735, 2022. doi: 10.1109/TCYB.2021.3054742.

21. M. Akram, H. S. Nawaz, and M. Deveci, “Attribute reduction and information granulation in Pythagorean fuzzy formal contexts,” Expert Syst. Appl., vol. 222, pp. 119794, 2023. doi: 10.1016/j.eswa.2023.119794.

22. M. Akram, G. Ali, and J. C. R. Alcantud, “Attributes reduction algorithms for m-polar fuzzy relation decision systems,” Int. J. Approx. Reason., vol. 140, no. 3, pp. 232–254, 2022. doi: 10.1016/j.ijar.2021.10.005.

23. L. A. Zadeh, “Fuzzy sets,” Inform. Control, vol. 8, no. 3, pp. 338–353, 1965. doi: 10.1016/S0019-9958(65)90241-X.

24. D. Dubois and H. Prade, “Rough fuzzy sets and fuzzy rough sets,” Int. J. Gen. Syst., vol. 17, no. 2–3, pp. 191–209, 1990. doi: 10.1080/03081079008935107.

25. Z. Pawlak, “Rough sets,” Int. J. Comput. Inf. Sci., vol. 11, no. 5, pp. 341–356, 1982. doi: 10.1007/BF01001956.

26. Q. H. Hu, D. Yu, and Z. Xie, “Information-preserving hybrid data reduction based on fuzzy-rough techniques,” Pattern Recogn. Lett., vol. 27, no. 5, pp. 414–423, 2006. doi: 10.1016/j.patrec.2005.09.004.

27. E. Sylvester et al., “Applications of random forest feature selection for fine-scale genetic population assignment,” Evol. Appl., vol. 11, no. 2, pp. 153–165, 2018. doi: 10.1111/eva.12524.

28. R. J. Urbanowicz, M. Meeker, W. Lacava, R. S. Olson, and J. H. Moore, “Relief-based feature selection: Introduction and review,” J. Biomed. Inform., vol. 85, no. 4, pp. 189–203, 2018. doi: 10.1016/j.jbi.2018.07.014.




This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.