[BACK]
images Computer Modeling in Engineering & Sciences images

DOI: 10.32604/cmes.2022.018050

ARTICLE

Seed-Oriented Local Community Detection Based on Influence Spreading

Shenglong Wang1,*, Jing Yang1,*, Xiaoyu Ding2, Jianpei Zhang1 and Meng Zhao1

1College of Computer Science and Technology, Harbin Engineering University, Harbin, 150001, China
2College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China
*Corresponding Authors: Shenglong Wang. Email: wangsl@hrbeu.edu.cn; Jing Yang. Email: yangjing@hrbeu.edu.cn
Received: 25 June 2021; Accepted: 08 December 2021

Abstract: In recent years, local community detection algorithms have developed rapidly because of their nearly linear computing time and the convenience of obtaining the local information of real-world networks. However, there are still some issues that need to be further studied. First, there is no local community detection algorithm dedicated to detecting a seed-oriented local community, that is, the local community with the seed as the core. The second and third issues are that the quality of local communities detected by the previous local community detection algorithms are largely dependent on the position of the seed and predefined parameters, respectively. To solve the existing problems, we propose a seed-oriented local community detection algorithm, named SOLCD, that is based on influence spreading. First, we propose a novel measure of node influence named k-core centrality that is based on the k-core value of adjacent nodes. Second, we obtain the seed-oriented local community, which is composed of the may-members and the must-member chain of the seed, by detecting the influence scope of the seed. The may-members and the must-members of the seed are determined by judging the influence relationship between the node and the seed. Five state-of-art algorithms are compared to SOLCD on six real-world networks and three groups of artificial networks. The experimental results show that SOLCD can achieve a high-quality seed-oriented local community for various real-world networks and artificial networks with different parameters. In addition, when taking nodes with different influence as seeds, SOLCD can stably obtain high-quality seed-oriented local communities.

Keywords: Complex network; local community detection; influence spreading; seed-oriented; degree centrality; k-core centrality; local expansion

1  Introduction

In recent years, complex networks have been prevalent in various domains, such as social media, bioengineering, computer networks and e-commerce shopping [1]. An important property of complex networks is the community structure, which can be defined as a group in which entities are tightly connected [2]. Community structures, in which the entities are represented by the nodes and the relationship between entities are represented by the links, widely exist in the real world [3]. In addition, nodes within the same community are more similar than those between different communities. In other words, a community is in some fashion separated from the other communities [4].

Research on community detection has attracted extensive attention in the last two decades. As an important branch of community detection, local community detection has rapidly developed because of its nearly linear computing time and the convenience of obtaining the local information of real-world networks. Most of the existing local community detection algorithms concentrate on detecting the community with the optimal quality function where a seed is located [59]. The community with the optimal quality function is the closest to the community in the real world, and its core members are also the closest to the core members of the real-world community but not necessarily the given nodes. It is also of practical significance to obtain the local members that can be influenced by an individual in the community. Sometimes, we just want to know who has the ability to influence the network, not which community this person is located. In addition, this person may not be important in the community, but there may be others who will be affected by him. Therefore, this paper proposes an algorithm aimed at detecting the local community with the seed as the core for the first time.

Many excellent local community detection algorithms have been proposed. However, some problems still hinder the development of local community detection algorithms. First, as mentioned above, the communities detected by previous local community detection algorithms do not take the seed as the core. This leads to the seed deviation problem. Second, the quality of the detected communities depends on the location of the seed. This leads to the seed dependence problem. Third, the quality of the detected communities depends on the predefined parameter. This leads to the parameter dependence problem. In addition, the predefined parameter makes it difficult and time-consuming to obtain the most reasonable parameter.

This paper proposes a seed-oriented local community detection algorithm, named SOLCD, based on influence spreading to solve the three problems mentioned above. In order to solve the seed deviation problem and the seed dependence problem, SOLCD expands the community by constantly exploring the influence scope of the seed so the seed is always in the influence center of the detected community. This ensures that the seed is in the core position in the resulting community and the quality of the resulting community. In order to measure the node influence, we propose k-core centrality based on the k-core decomposition algorithm [10]. In order to solve the parameter dependence problem, SOLCD uses the influence spreading method that needs no parameter. In order to verify the quality of the resulting community, we propose a local community effectiveness index (LCE) and a local community uniqueness index (LCU) to evaluate the quality of the seed-oriented local community. The main contributions of this paper can be summarized as follows:

● We propose a seed-oriented local community detection algorithm for the first time, named SOLCD, based on influence spreading. SOLCD has the capacity to detect the local communities with seeds as the core, which not only enables people to obtain the seed-oriented local communities, but also makes obtaining the information of the target node more quickly and effectively.

● We propose a new measure of node influence, named k-core centrality, based on the k-core decomposition algorithm. Empirical evaluations on artificial and real-world networks show that the proposed algorithm based on k-core centrality is robust and efficient in detecting seed-oriented local communities.

● We propose two indices that can effectively evaluate the quality of seed-oriented communities: a local community effectiveness index (LCE) and a local community uniqueness (LCU) index.

● The experimental results on both artificial and real-world networks show that the SOLCD has the capacity to detect high quality seed-oriented local communities with stronger robustness than some of the latest algorithms. In addition, SOLCD can effectively solve the three problems existing in the research of local community detection: the seed deviation, the seed dependence and the parameter dependence problem.

The rest of this paper is organized as follows. Section 2 reviews the related works on local community detection. In Section 3, we describe the proposed algorithm in detail, and introduce a new node influence measure based on k-core centrality. In Section 4, we introduce a local community effectiveness index and a local community uniqueness index to estimate the quality of the detected local communities, then we test the proposed algorithm and compare it with some latest algorithms. Section 5 summarizes our work.

2  Related Works

Most of the existing local community detection algorithms based on the local expansion method consist of two major components: seed selection and community expansion. In the seed selection process, the algorithms select an appropriate node or node set as the seeds to replace the given node so as to be more suitable for community expansion. In the community expansion process, the algorithms expand the community, composed of seeds originally, by running a variety of expansion mechanisms. This section outlines the current research on the local expansion method and describes the dilemma of the current research.

2.1 Seed Selection

Seed selection is widely concerned because of its importance to the local expansion method [11], and various seed selection methods have been proposed. Lancichinetti et al. [12] randomly selected a node as the seed, which makes the results of this algorithm uncertain and leads to a weakness of this algorithm. Similarly, Baumes et al. [13] proposed an algorithm randomly selects edge as the seed. However, in the searching seeds process, this algorithm produces multiple duplicate communities, which consumes considerable time. Lee et al. [14] took a set of k nodes, in which each pair of nodes had an edge, namely, k-clique, as the seeds. Whang et al. [15] proposed a new seed selection strategy based on the personal PageRank clustering scheme. The key to this algorithm is neighborhood inflation, in which seeds are modified to represent their entire node neighborhood. Ding et al. [16] proposed a robust two-stage local community detection algorithm (RTLCD) to detect the core member of the real-world community as a substitute for the given node based on the node relation strength. Cheng et al. [17] scored the nodes in a network using the technique for order of preference by similarity to ideal solution (TOPSIS) and took the node with the highest score as the seed. In order to reduce the impact of the seed dependence problem, Guo et al. [18] take the core area which is detected by adding neighboring nodes with the maximum optimized local modularity density, as the seeds. Ni et al. [19] took the nodes whose fuzzy relationship with their NGC nodes was greater than the threshold as the seeds. An NGC node [20] is the nearest node with greater centrality.

These excellent seed selection methods mentioned above can effectively improve the quality of the detected community, but there are still two dilemmas that have not received much attention. First, some seed selection methods, such as the random seed selection method, directly take the given node as the seed [1215,21]. This leads to the quality of the detected community depending heavily on the location of the seed, which greatly affects the accuracy of the community detection algorithm. Second, some seed selection methods, such as RTLCD, TOPSIS [1617,19,20], select the nodes that are more suitable for community expansion as substitutes for the given node, which effectively alleviates the dependence of the algorithm on seed location. However, the replacement of the given node by the seed selection method will cause the given node to deviate from the result community, which makes the existing local community detection algorithms unable to effectively detect the community dominated by the given node.

2.2 Community Expansion

The function of community expansion is to expand the initial community into a local community by adding adjacent nodes to the detected community. Common community expansion methods include the quality function [59] and the influence spreading [2226].

The quality function defines the community structure in a network, which can be used to evaluate the community division quality [27]. Yang et al. [28] studied 13 quality functions and tested their sensitivity, robustness and performance on 230 large real-world networks. Based on this research, Yang et al. [28] classify quality functions into four categories: (1) links within a community, (2) links outside a community, (3) links within and outside a community, and (4) modularity.

The main idea of the influence spreading method is to score each node with an influence evaluation mechanism and spread it to the entire network. Raghavan et al. [29] proposed the label propagation algorithm (LPA) based on the epidemic spreading model. LPA assigns each node in the network a unique label, and then updates the node label to be consistent with the label of its majority neighbors until the label no longer changes. Because of the convenience and efficiency of LPA, researchers have successively proposed a series of algorithms based on LPA. Xu et al. [30] proposed an improved LPA algorithm based on a two-level neighborhood similarity measure named TNS, which could help to further divide a network into communities accurately. Inspired by LPA, Wu et al. [31] merged communities whose size was smaller than a threshold, where the threshold was based on a reasonable communities' scale, into reasonable communities to increase the community division accuracy. Based on LPA, Gregory et al. [32] proposed a method with the ability to detect overlapping communities, named COPRA (community overlap propagation algorithm). Tang et al. [33] revealed the overlapping nodes and proposed an algorithm based on the k-lowest-influence.

The community expansion methods described above can obtain high-quality local communities, but there are still some problems that need to be addressed vigorously. First, some community expansion methods need to set parameters before their execution [34,35], which make methods difficult and time-consuming to obtain the most reasonable parameter. Second, existing expansion methods are dedicated to expanding the seed to a community which is the most similar to the real-world community. However, in the community expansion process, the given node may be at the edge of the community, or even be removed from the community. That is, there are no expansion methods that focus on the local community with seeds as the core.

3  Motivations and Basic Definitions

3.1 Motivation

As discussed in Section 2, local community detection algorithms have made excellent achievements in terms of the local expansion method, but there are still three problems hindering the development of community detection research: the seed deviation problem, the seed dependence problem and the parameter dependence problem.

The motivation of solving the three problems is as follows. In order to solve the seed deviation problem and the seed dependence problem, we propose a seed-oriented algorithm, which will always take the given node as the seed, and always take the seed as the influence core in the process of community expansion. In order to solve the parameter dependence problem, we propose a local community detection algorithm based on influence spreading without any parameters.

3.1.1 Motivation for Seed-Oriented Local Community

Traditional local community detection algorithms aim to expand from the seed node to the community that is the most similar to the real community. We call these algorithms community quality-oriented local community detection algorithms. However, in the real world, every individual should have the opportunity to build its own local community that takes the individual as the center. For example, different departments with students can be considered as different communities from the perspective of a college. Every student should have the opportunity to build his own local friendship-community consisting of the members from multiple departments. In addition, every student should have the opportunity to become the center of his own local friendship-community. In this paper, we call this type of local community a seed-oriented local community. Different from traditional algorithms, this paper proposes a seed-oriented local community detection algorithm aiming to build the seed-oriented local community of a given node.

The research value of detecting seed-oriented local communities is as follows. First, when the goal is to measure the influence of a person on the other individuals, we only need to detect the seed-oriented local community where this person is located rather than the quality local community, which helps to improve the efficiency of an algorithm. Second, even if an individual is marginalized in a quality oriented community, this individual influences other individuals in a seed-oriented community. In contrast, in a seed-oriented community, the influence of this individual on the other individuals may be greater than that of the core members of a quality oriented community. Third, as a marginal individual in a quality oriented community, the local influence in a seed-oriented community may be greater than that of the core members in the quality-oriented community.

In a community, core members are described as members at the center of the community, hub members are members in close contact with members outside the community, and outlier members are members on the boundary of the community. In fact, core members have higher influence than hub members, and hub members have higher influence than outlier members. Based on this fact, this paper proposes a seed-oriented local community detection algorithm based on influence spreading. The proposed algorithm is guided by the following: a node tends to become a member of the community that is generated by an adjacent node with higher influence.

A sample of the seed-oriented local community detected by the proposed algorithm is shown in Fig. 1. We show a seed-oriented local community C and its neighboring subnetwork N. The figure shows that all the may-members of C have an influence not lower than that of the seed node. In addition, all the must-members of C, which are connected with the seed node using must-member chains, have an influence lower than the seed node.

images

Figure 1: A sample of seed-oriented local community detected by the proposed algorithm. C denotes the seed-oriented local community, N denotes the neighbor sub-network of C. The nodes colored yellow are the must-members of C, the nodes colored blue are the may-members of C, the node colored red is the seed node. The number inside the node is the node influence. Solid lines connect the nodes within C, dotted lines connect the nodes between C and N

3.2 Problem Definition

This paper considers an unweighted graph G = (V, E), where V denotes the set of nodes and E denotes the set of links between nodes. The adjacent matrix A is a two-dimensional array which stores the connectivity Aij between nodes in graph G, where Aij = 1 denotes node i and node j are connected, otherwise Aij = 0. The communities exist in the graph G can be represented as C = {C1, C2, …, Ci}(C1 ∪ C2, …,∪ Ci⊆ V). A community C consists of a set of nodes, where C = {{v1, v2, …, vj}}(C∈ C, vi ∈ V). The seed-oriented local community detection aims to detect a cover C of the graph, C = C1, C2, …, Ck (C1 ∪ C2 ∪, …, ∪ Ck ⊆ V), where ∀vk ∈ V, ∃vk ∈ Ck and ∀vj ∈ Ck, viCklvkviviCklvjvi where viCklvkvi denotes the sum of length between each node in Ck and vk.

3.3 Basic Definitions

In this paper, we detect the seed-oriented local communities by detecting the influence scope of the seed in a network. To facilitate readers following along with this paper, we display the research path of this paper in Fig. 2, and the subsection will provide the related definitions of this paper.

images

Figure 2: The research path of this paper

Definition 1 (Node neighbors). The node neighbors of node v are defined as follows:

N(v)={u|uV,Auv=1},vV(1)

where A is the adjacent matrix of graph G, and Auv=1 denotes that there is a link between node v and node u.

Definition 2 (Natural community). The natural community of node v is defined as follows:

Γ(v)=N(v){v},vV(2)

The natural community of node v is a node set composed of node v and its neighbors.

Definition 3 (Must-member). The must-members of node v are defined as follows:

Must(v)={u|uN(v),Inf(u)<Inf(v)},vV(3)

The must-members of node v is a node set composed of node v’s neighbors that have lower node influence than that of node v.

Definition 4 (May-member). The may-members of node v are defined as follows:

May(v)={u|uN(v),Inf(u)Inf(v)},vV(4)

The may-members of node v is a node set composed of node v’s neighbors that do not have lower node influence than that of node v.

Property 1 (Transitivity of influence relationship between must-members). Suppose node B is a must-member of node A, and node C is a must-member of node B. Then, the conclusion we can obtain is that there must be a path from node C to node A, that is node C must be reachable to node A, and node A must have no lower influence than node C.

Proof 1. According to the Definition 3, we can know that node A is a neighbor of node B and node B is a neighbor of node C, so there must be a path from node A to node B to node C. In addition, node B has lower node influence than that of node A and node C owns lower node influence than that of node B. Therefore, we can conclude that node C must be reachable for node A, and node C must have lower node influence than that of node A.

Definition 5 (Must-member chain). The must-member chain from node A to node B is defined as follows:

Mustchain(A,B)={v1,v2,,vi},vi+1Must(vi),i|V|,viV(5)

The must-member chain can be regarded as a queue composed of nodes in the network. The members in the queue are arranged in descending order according to their node influence, and each member in the queue must be a must-member of the previous member.

Definition 6 (Reachable must-member). A reachable must-member of node v is defined as follows:

Remust(v)={u1,u2,,ui},sPath(ui)Mustchain(v,ui),i|V|,uiV,vV(6)

where sPath(v, ui) denotes the nodes on the shortest path from node u to node v.

A reachable must-member of node v is a node which is reachable from node v, and each shortest path from node v to this node must be a must-member chain.

Definition 7 (Seed-oriented local community). The seed-oriented local community of seed node v is defined as follows:

SOLCD(v)={v}May(v)Remust(v),vV(7)

The seed-oriented local community of the seed node is a node set composed of the seed, the may-members and all the reachable must-members of the seed node.

3.4 The Proposed Algorithm

In this subsection, we will show the flowchart of the proposed algorithm in Fig. 3, and the pseudocode of the proposed algorithm in Algorithm 1. The proposed algorithm includes two phases: obtaining the may-members phase and obtaining the must-members phase. The processes of each phase are as follows:

images

Figure 3: The flowchart of SOLCD

Initialization (Line 1). Line 1 initializes list Listmay and list Listmust to empty to store the may-members and must-members of seed node respectively. Line 1 assigns the seed vseed to the queue Q.

Obtaining may-members (Lines 2–7). Phase 1 aims to obtain the may-members of the seed node. M in line 4 denotes the node influence evaluation mechanism. Line 3 obtains all the neighbors of the seed node. If the influence of the neighbor vi is higher than that of the seed node (Line 4), then line 5 assigns vi to the may-member list Listmay.

Obtaining must-members (Lines 8–26). Phase 2 aims to obtain the must-members of the seed node. When the queue Q is not empty (Line 9), Line 10 removes the first node vfirst of Q.

Line 11 obtains all the neighbors of vfirst. If the influence of the neighbor vi is lower than that of vfirst and vi does not belong to the must-member list Listmust (Line 12), then line 16 sets the flag to true (Line 16). The flag is a Boolean variable that is used to determine whether the node is a must-member of the seed node. Line 14 obtains the node vn from the union of the neighbors of the seed vi, Q and Listmust. If the influence of the neighbor vn is lower than that of vi (Line 15), then Line 16 sets the flag to false. Lines 15–16 are to ensure that node vi is the reachable must-member of node vfirst. If the flag is true, Line 20 assigns node vi to Q. Line 24 assigns vfirst to the must-member list Listmust. Line 25 removes vfirst from Q, and repeats the algorithm until the queue Q is empty.

Finally, the union of the seed, may-member list Listmay and must-member list Listmust is the seed-oriented local community of the seed node vseed.

images

3.5 Time Complexity Analysis

The time complexity analysis of the proposed algorithm is on a network G, in where the average degree is d¯ and the number of node set is N. In Phase 1, it takes O(d¯) to scan the neighbors of the seed node. Phase 2 has three nested iterations: Iteration 1 (Lines 14–18), Iteration 2 (Lines 11–23) and Iteration 3 (Lines 9–26). For Iteration 1, it takes O(d¯+a) to scan the union of Q, the must-member list Listmust and the neighbors of node vi, where a is a constant. For Iteration 2, it takes O(log|N|) to add or remove node from Q respectively. So the time complexity of Iteration 2 is O(max{d¯2,d¯log|N|}). The time complexity of Iteration 3 is O(max{d¯2,d¯|N|log|N|}). In summary, the time complexity of the proposed algorithm is O(max{d¯2,d¯|N|log|N|}).

The time complexity of proposed algorithm and comparison algorithms is displayed in Table 1, in where k denotes the mean degree; C denotes the detected community; S denotes the shell sub-network of C; N denotes the neighbor sub-network of C.

images

4  Experiments and Analyses

The experimental environment of this paper is as follows: the proposed algorithm and the comparison algorithms are programmed in JAVA; all the programs involved in this paper are running in a computer with Intel (R) Core (TM) i5-4590 CPU, 3.3 GHz, 16GB RAM. The experiments are implemented in the proposed algorithm and six comparison algorithms on six real-world networks and three groups of different parameters artificial networks, and the experimental results are verified by four commonly used local community indicators and two proposed by this paper seed-oriented community indicators. Table 2 displays related symbols and their explanations.

images

4.1 Measures of Node Influence

K-core centrality. This paper proposes a new measure, the k-core centrality, which is based on the k-core decomposition algorithm [10], for node influence.

Ki=jNicj(8)

where Ki is the k-core centrality of node i, Ni is the neighbors of node i, and cj is the core value of node j.

K-core. K-core [1,41] is a subgraph of network G in which the smallest degree of nodes is k. In k-core decomposition algorithms [4244], the k-core is defined as a subgraph of the network where all nodes have a degree not less than k, and a (k + 1)-core must be a subgraph of the k-core. If we say a node has a core value k, it means that the node belongs to a k-core so that the node’s core value is the maximum value. In addition, when node A has a higher influence than B, both of the core values and the k-core centrality of node A are higher than those of node B.

4.2 Simple Test of SOLCD

In order to illustrate the proposed algorithm, we make simple tests on the core members, the hub members and the outlier members of Karate Club Network [45]. In the analysis, we use the inner-links to represent the links within a community and the outer-links to represent the links connecting different communities. It is worth noting that the tests do not mean to compare the seed-oriented local communities detected by the proposed algorithm with the real communities in a global sense, but to show the regularities of the distribution of the nodes of seed-oriented local communities. Fig. 4 shows the distribution of Karate Club Network, in which node 1 and node 34 represent the administrator and the instructor respectively.

images

Figure 4: The distribution of karate club network

4.2.1 Tests of the SOLCD on Core Members

From the real network of Karate Club Network, we choose node 1 and node 34 which own the greatest number of inner-links as the core members. From Fig. 5, we can observe that most of the inner-members of the seed-oriented communities are also the inner-members of the real communities.

images

Figure 5: Tests of SOLCD on core members. (a) and (b) are the seed-oriented local communities of nodes 1 and 34 generated by SOLCD. Nodes colored yellow are the must-members, nodes colored blue are the may-members and nodes colored red are the seed nodes

4.2.2 Tests of the SOLCD on Hub Members

From the real network of Karate Club Network, we choose node 3 and node 9 which have some outer-links as the hub members. From Fig. 6, we can see that the seed-oriented local communities generated from hub members prefer to take the core members (nodes 1 and 34) as may-members. This phenomenon is because hub members connect different communities and have lower influence than core members. Besides that the communities generated from hub members have smaller size than the communities generated from core members.

images

Figure 6: Tests of SOLCD on hub members. (a) and (b) are the seed-oriented local communities of nodes 3 and 9 generated by SOLCD. Nodes colored yellow are the must-members, nodes colored blue are the may-members and nodes colored red are the seed nodes

4.2.3 Tests of the SOLCD on Outlier Members

From the real network of Karate Club Network, we choose node 12 and node 27 with a few of inner-links as the outlier members. From Fig. 7, we can observe that the seed-oriented local communities generated by the outlier members tend to have more may-members than must-members. This is because outlier members are peripheral members of communities and have lower influence value than that of the core members (nodes 1 and 34) and that of the hub members (nodes 3 and 9).

images

Figure 7: Tests of SOLCD on outlier members. (a) and (b) are the seed-oriented local communities of nodes 12 and 27 generated by SOLCD. Nodes colored yellow are the must-members, nodes colored blue are the may-members and nodes colored red are the seed nodes

4.2.4 Characteristics of SOLCD

According to samples on core members, hub members and outlier members, we could summarize some characteristics of SOLCD as follows:

SOLCD is a self-adaptive algorithm without any help of pre-defined parameters. This avoids the parameter-dependent problem.

● Regardless of the seed node’s attributes (core member, hub member or outlier member), the detected seed-oriented local community always take the seed node as core member. This solves the seed-deviation problem.

4.3 Evaluation Criteria

Two common used community detection algorithm evaluation criteria are adopted in this paper to verify the performance of SOLCD: the Normalized Mutual Information [46] (NMI) and F-score [47].

4.3.1 Normalized Mutual Information

Danon et al. [46] used information entropy as the measurement of the similarity between real-world communities and the resulting communities, which is named normal mutual information (NMI). The basis of NMI is a confusion matrix N in which the rows represent the information of real-world communities and columns represent the information of the resulting communities. That is, the intersection of real-world communities and resulting communities are represented by element Nij of matrix N, which denotes the numbers of nodes that exist in both communities. NMI [46] is defined as follows:

NMI(CA,CB)=2i=1|CA|j=1|CB|Nijlog(NijN/Ni.N.j)i=1|CA|Ni.log(Ni./N)j=1|CB|N.jlog(N.j/N)(9)

where |CA| represents the number of real-world communities and |CB| represents the number of resulting communities. Ni. and N.j denote the sums of the elements in Row i and Column j, respectively.

NMI is an evaluation index commonly used to assess the community division quality. The better community division quality, the higher the value of NMI. The maximum value of NMI is 1 when the resulting community is the same as the real-world community.

4.3.2 F-Score

F-score [47] is widely used in classification methods to evaluate the quality of the model. The formula of F-score is as follows:

F=2×Precision×RecallPrecision+Recall(10)

Recall=CRCDCG(11)

Precision=CRCDCD(12)

where CR represents the nodes of real-world communities and CD represents the nodes of detected communities.

Recall is the ratio of the number of correctly found nodes to the number of nodes in the real-world community. Precision is the ratio of the number of correctly found nodes to the number of nodes in detected community. F-score is the weighted harmonic average of Recall and Precision.

4.3.3 LCE

In this paper, we propose a local community effectiveness index (LCE) to measure the quality of seed-oriented local communities. High-quality seed-oriented local communities should satisfy the condition that the seed node is the center of the detected community. In other words, the sum of the shortest path lengths from the seed node to each node of the seed-oriented local community should be smaller than that from other nodes in the rest of the community. LCE is defined as follows:

LCEseed=1,ifkCl,iCllseediiCllki(13)

LCEseed=0,ifkCl,iCllseedi>iCllki(14)

LCE=iClLCEseed|Cl|(15)

where LCEseed denotes LCE value of the seed node seed-oriented local community; Cl denotes the detected local community, and lki denotes the length of the shortest path from node k to node i. We define LCE=1 when the sum of the shortest path lengths from the seed node to the other nodes among all the nodes of the community is the maximum; otherwise, LCEseed=0.

4.3.4 LCU

This paper proposes a local community uniqueness index (LCU) to estimate the uniqueness of seed-oriented local communities. A high quality seed-oriented local community should satisfy the condition of having a unique distribution of nodes. LCU is defined as follows:

LCU=|Cdistintct||Cvalid|(16)

where |Cdistintct| denotes the number of distinct valid local communities, and |Cvalid| denotes the number of all valid local communities.

4.4 Datasets

4.4.1 Artificial Networks

This paper used Lancichinetti et al. [48] (LFR) benchmark networks to generate various types of artificial networks to evaluate the performance of the proposed algorithm. The LFR benchmark network is widely used in the research of complex networks to generate artificial networks that have the same properties as real-world networks. The significance of the parameters affecting the properties of the generated artificial network is as follows. The mixing parameter μ determines the difficulty of detecting the communities for the algorithm. The higher the value of μ is, the harder it is to detect the community structure. |C|¯min and |C|¯max determine the maximum and minimum size of the communities within the artificial network, respectively; d¯ determines the mean degree of the nodes within the network and dmax determines the maximum degree of the nodes within the network; and On and Om determine the overlapping degree of communities in the network. On denotes the number of overlapping nodes between communities and Om denotes the number of overlapping communities of overlapping nodes.

In order to generate different types of artificial networks, we set the parameters of the LFR benchmark network as displayed in Table 3, where the expression [a:b:c] represents the value of parameter value ranges from a to c with a spanning of b. As shown in Table 3, we generate artificial networks in three groups of parameters: LFR-μ, LFR-αsize and LFR-αdegree. These three parameters are used to test the performance of the proposed algorithm in community structure identification, community diversity and node diversity. In order to avoid the influence of the randomness of the generated artificial networks, we generate 10 artificial networks for each parameter and take the average value as the experimental results.

images

4.4.2 Real-World Networks

This paper used 6 real-world networks to test the performance of the proposed algorithm. The characteristics of the real-world networks are listed in Table 4. By observing the relationship between 34 members of a karate club at an American university, Zachary et al. proposed the karate club network [45] in which nodes represent the members of the club and the links between nodes represent the relationships between nodes. By observing the habits of 62 bottlenose dolphins living in New Zealand, Lusseau et al. [49] found that the communication of these dolphins showed a specific pattern and proposed the dolphin network, in which each node represents a bottlenose dolphin and the link between two dolphins represents that these two dolphins are in frequent contact. The books network is a network of the purchasing records of political books on Amazon [50]. In the network, the nodes represent political books and a link between two books indicates that they are purchased together frequently. The football network is the records among the college teams that participated in the 2000 American football season [51]. In the network, each node represents a participating university and a link means that there was a match between two colleges. The Amazon network is a network of purchasing records on Amazon [28]. The DBLP network is a network of a scientific collaboration network where nodes denote authors and edges denote that the connected authors have corporations [28]. In addition, in order to obtain more detailed experimental results, we divide DBLP into 11 subnetworks according to the community size. The characteristics of DBLP after processing are displayed in Table 5.

images

images

4.5 Experimental Settings

We compared SOLCD to 6 state-of-the-art local community detection algorithms: RTLCD (a robust two-stage local community detection algorithm) [16], Clauset et al. [36], LWP (Luo, Wang and Promislow) [37], Chen et al. [38], LS (link similarity) [39] and LCD (local community detection based on maximum cliques) [40].

The RTLCD algorithm is a robust two-stage local community detection algorithm that detects the core member of the target community to replace the seed node in the seed selection stage and expands the community based on the relation strength in the community expansion stage [16]. The Clauset algorithm extends the modularity [36] to the local community, and expands the community by adding nodes that optimize the local community modularity ΔR [36]. The LWP algorithm improves the local community modularity to the indegree divided by the outdegree and adds the termination condition of the algorithm [37]. The Chen algorithm proposes a metric L=Lin/Lexwhich is the internal relation divided by the external relation [38].

Based on the definition of NMI and F-score, the detected local community has a high value of NMI and F-score is similar to the real community in a global sense. However, the goal of seed-oriented local community detection is to detect a local community with the seed node as the core member. In fact, some real-world networks have shown the power law distribution of the node degree and the core member occupies only a small scale of the networks. Therefore, in the local communities with a high NMI and F-score means the seed node cannot become the core member in most cases which indicates that the seed-deviation problem occurs.

Based on the definitions of Precision and Recall, in a detected community with high Precision and low Recall means most of the members of this detected community are also the members of the real-world community in a global sense. It is common sense that most of the members of a local community should be a subset of a global community. Therefore, high Precision and low Recall means that an algorithm prefers to detect communities in a local sense rather than detect communities in a global sense.

When the communities detected by the algorithm have high Precision and low Recall, which means that the algorithm is more inclined to detect communities in the local sense rather than in the global sense. When the detected communities have high precision and low recall, which means that the algorithm is more inclined to detect communities in the local sense rather than in the global sense. Based on the definition of LCE, the community results detected by an algorithm have high LCE, which means that the unique local communities detected by the algorithm occupy a higher proportion. Note that, we define a “seed-oriented” local community as a local community in which the seed node must satisfy LCEseed=1 and have high LCE.

The experiments are conducted on 6 real-world networks and 3 groups of LFR artificial networks. Note that an algorithm running more than 24 h on a single dataset will be terminated.

4.6 Experimental Results on Real-World Networks

Table 6 lists the NMI, Recall, Precision, F-score, LCE, LCU and Time metrics of the proposed algorithms and the other comparison algorithms on five real-world networks.

images

As shown in Table 6, we can observe that SOLCD achieves the highest LCE and LCU on all the real-world networks, and the precision is also excellent among all algorithms, especially on books and Amazon. Although NMI, Recall and F-score of these three indicators achieved by SOLCD are not good enough, we know from the analysis in Section 4.5 that high-quality seed-oriented local communities are mainly measured by LCE, LCU and precision of three indicators rather than NMI, Recall and F-score. Therefore, SOLCD can achieve high-quality seed-oriented local communities among real-world networks. RTLCD is excellent in NMI and F-Score on all the real-world networks, which illustrates that RTLCD is an outstanding community quality-oriented local community detection algorithm. However, RTLCD obtains the worst on LCE and LCU, which proves that RTLCD is severely affected by the seed deviation problem. For the remaining comparison algorithms, Clauset, Chen, LWP, LS and LCD, the performance on various indicators is mediocre. That is, these algorithms have a certain ability to detect the seed-oriented local community and community quality-oriented community, but they are not skilled at this.

As shown in Figs. 8a, 8d and 8f, the performance of all the algorithms worsens as the ID of DBLP increases. As Table 6 shows, the average size of the community increases as the dataset ID increases, which is the main factor that can affect the results of algorithms. The reason for this is that the increase of community size makes the edge of community become more loose, which makes algorithms more difficult to detect the community structure.

images

Figure 8: (a–h) The performance of algorithms on DBLP

Figs. 8b, 8c and 8e show that SOLCD is stable and achieves the highest LCE, LCU and Precision, which proves that SOLCD has the ability to detect high-quality seed-oriented local communities. RTLCD is excellent in NMI and F-score, which illustrates that RTLCD is skilled at detecting local communities in the global sense. However, RTLCD obtains the lowest LCE, which indicates that it has a serious seed-deviation problem. Chen and LS have good LCE performance, but they also have a seed-deviation problem to a certain extent.

The experimental results on real-world networks show that SOLCD has a great ability to achieve high-quality seed-oriented local communities among real-world networks, which proves that SOLCD solves the seed-deviation problem. RTLCD can achieve the communities with the best community quality, but it is poor at detecting seed-oriented local communities. The rest of the comparison algorithms are more or less affected by the seed deviation problem.

4.7 Experimental Results on Artificial Networks

4.7.1 Experimental Results on LFR-μ

LFR-μ aims to verify the ability of algorithms to reveal the community structure in response to changes in the difficulty of revealing the community structure. Fig. 9 shows the performance of all the algorithms on the LFR-μ artificial networks. We observe that all the algorithms show a downward trend on the metrics of NMI, Recall, Precision and F-score. This phenomenon occurs because the community structure becomes increasingly more difficult to find as the mixed parameter μ increases.

images

Figure 9: (a–h) The performance of algorithms on LFR-μ

Figs. 9a and 9f show that LCD and RTLCD perform excellently in NMI and F-score, which shows that these algorithms are good at detecting local communities in the global sense. However, as shown in Figs. 9d and 9e, LCD and RTLCD have high Recalls and low Precisions, which illustrate that these two algorithms have serious seed deviation problem. Fig. 9b in which LCD performs the worst in LCE, confirms this matter. In contrary, SOLCD and Chen have low Recalls and high Precisions, which indicate that although these two algorithms find only a small number of neighbors of the seed node, these neighbors are the correct members of the seed-oriented community. Fig. 9b affirms this statement. LCE of Chen is obviously higher than those of other algorithms and SOLCD achieves the optimal LCE value. As shown in Fig. 9c, SOLCD achieves the highest LCU.

The experimental results show that SOLCD can achieve high-quality seed-oriented local communities as the mixed parameter μ changes, which solves the seed deviation problem.

4.7.2 Experimental Results on LFR-αsize

LFR-αsize aims to verify the ability of algorithms to reveal the community structure when the community size changes. Fig. 10 shows the performance of all the algorithms on the LFR-αsize artificial networks. The scale values of the x-axis at the top and bottom of the graph represent the maximum and minimum of community sizes, respectively. As shown in Fig. 10, the results of most of the algorithms on the NMI, Recall, Precision and F-score worsen as the maximum and minimum community size increase. The reason is as follows. As the maximum and minimum community size increase, the community structure becomes more diverse, and the boundary of the community becomes fuzzy, which makes it difficult for the algorithms to identify the community structure. Fig. 10c shows that SOLCD is stable at the highest LCU.

images

Figure 10: (a–h) The performance of algorithms on LFR-αsize

As shown in the Figs. 10b and 10f, SOLCD is stable at a high level on LCE, LCU and Precision regardless of whether the parameter αsize changes, which proves that SOLCD can effectively detect seed-oriented local communities. RTLCD and LCD are stable on the indices of NMI and F-score, which illustrates that these two algorithms are robust to the parameter αsize in detecting local communities in the global sense. However, RTLCD and LCD perform extremely poorly on LCE and LCU, which indicates that these methods have serious seed deviation problems. Chen and Clauset have good performance on the index of Precision, and moderate performance on LCE and LCU, which shows that these two algorithms have certain capabilities to detect seed-oriented local communities, but they still have seed deviation problems to a certain extent.

The experiments prove that SOLCD is robust to changes in the community size. As the the maximum and minimum community size increase, SOLCD can still achieve high-quality seed-oriented local communities which indicates that SOLCD solves the seed deviation problem.

4.7.3 Experimental Results on LFR-αdegree

LFR-αdegree aims to test the performance of algorithms on revealing the community structure as the node degree changes. Fig. 11 displays the results of all the algorithms on the LFR-αdegree artificial networks. The scale values of the x-axis at the top and bottom of the graph represent the maximum and mean degree of nodes in the network respectively. Figs. 11a and 11f show that the performance of most algorithms on NMI and Precision improve slightly as the parameter αdegree increases. The reason for this outcome is as follows. Increasing the parameter αdegree makes the relationship between nodes become more diverse, so it can provide more node information which makes the algorithms easier to detect the community structure; however, it also increases the complexity of the network, which prevents algorithms from exploring the community structure. Therefore, the curves fluctuate.

images

Figure 11: (a–h) The performance of algorithms on LFR-αdegree

Fig. 11 shows that LCE of SOLCD decreases slightly as the parameter αdegree increases, but it remains at a high level. The LCU and Precision of SOLCD are outstanding. LCD and LWP perform excellently on the indices of NMI and F-score, which indicates that these two algorithms have great abilities to detect the local communities in the global sense. Unfortunately, LCD and LWP obtain poor LCEs which proves that they experience the seed deviation problem. Clauset and Chen have good performance on Precision, and low LCUs and LCEs, which proves that these two algorithms have certain seed-oriented local community detection abilities but they are not considerable. RTLCD underperforms in all indicators except Recall, which illustrates that RTLCD can neither effectively detect local community in global sense nor detect seed-oriented communities in a local sense when the network has a high degree.

The results indicate that SOLCD has some seed-deviation problems as the mean and maximum node degree increase, but it can achieve a high-quality seed-oriented local community.

4.7.4 Experimental Results for the Seed Dependence Problem

To perform a detailed analysis of the seed dependence problem, this paper lists the valid communities generated by seed nodes with different node influences. We take the degree centrality as the node influence measure. We divide all the nodes into ten parts according to their node influence in descending order. Taking Fig. 12 as an example, Fig. 12a is the distribution of the valid seed-oriented local communities detected by the algorithms on the DBLP1 network, and so on. The abscissa represents the ranking of the seed's node influence among the node influences of all nodes in the network (e.g., ‘0.1’ represents that the seed’s node influence is in the top 10% of all nodes, and ‘1’ represents that the bottom 10% of seed nodes). The ordinate represents the proportion of valid seed-oriented local communities in all communities detected by the algorithms. Table 7 displays the standard deviation (SD), arithmetic mean and coefficient of variation (CV) of the proportion of valid seed-oriented local communities detected by all the comparison algorithms on a group of DBLP networks. The standard deviation is a measure of the dispersion of the data distribution, which is used to measure the deviation of data from the arithmetic mean. The smaller the standard deviation is, the less these values deviate from the mean, and vice versa. When comparing the dispersion of the two groups of data, the measurement scales of the two groups of data are too different to be compared directly using the standard deviation. At this time, the coefficient of variation is required, which is the ratio of the standard deviation and arithmetic mean.

images

Figure 12: (a–l) The distribution of the valid seed-oriented local communities detected by the algorithms on a group of DBLP networks

Fig. 12 shows that the curve of SOLCD is stable at the top of Figs. 12a12f but fluctuates in Figs. 12g12k. The above figures show that the nodes in the middle ranking have more difficulty obtaining local communities with them as the core than the nodes at the top of the ranking and the nodes at the bottom of the ranking. There are three reasons for this phenomenon. First, the nodes at the bottom of the ranking have a small influence scope, so it is easy to obtain local communities with these nodes as the core. Second, although the nodes at the top of the ranking have a large influence scope, they can easily attract adjacent nodes to their community because of their strong node influence, so it is easy to obtain local communities with these nodes as the core. Third, for the nodes at the middle of the ranking, there may be multiple adjacent nodes with the same node influence. Therefore, these nodes with the same node influence will be bypassed in the community expansion process, which leads to the irregularity of the resulting community, and it fails to form a local community with these nodes at the middle of the ranking as the core. From the Table 7 shows that SOLCD has the highest mean on all DBLP networks and the lowest CV except on DBLP(1), which illustrates that compared with the other algorithms, SOLCD is more stable in terms of the seed dependence problem. However, the SD of SOLCD on some networks is high, which indicates that SOLCD still has a certain seed dependence problem. The other six comparison algorithms show sharp fluctuations in all subfigures of Fig. 12 and have small SD only on individual network in Table 7 which proves that all these algorithms have serious seed dependence problems.

images

Fig. 13 shows that the curves of each algorithm in different figures are basically consistent. That is, the experimental results are not affected by parameter μ. The performance of SOLCD improves as the node ranking decreases, and the overall performance is excellent. Clauset has perfect performance on the top node ranking, but its performance drops rapidly as the node ranking decreases. The performance of LCD also drops rapidly as the node ranking decreases, but the overall performance is not as good as that of Clauset. The curves of RTLCD, LWP and LS are always at the bottom. The performance of Chen is very stable and does not fluctuate with the node ranking. Table 8 shows that SOLCD achieves the highest mean and a slightly higher SD and CV than Chen and LS. Chen achieves the lowest SD and CV, but its mean is much lower than that of SOLCD. LS performs excellently on SD; however, its CV is high, and the mean is low. The other algorithms have poor performance on the mean, SD or CV. The experimental results show that SOLCD basically has no seed dependence problem on LFR-μ artificial networks. Chen has no seed dependence problem, but its seed-oriented community detection ability is much weaker than that of SOLCD. LS has no seed dependence problem, but has no ability to detect the seed-oriented community. The other algorithms are seriously affected by seed dependence problems.

images

Figure 13: (a–i) The distribution of the valid seed-oriented local communities detected by the algorithms on a group of LFR-μ artificial networks

images

Fig. 14 and Table 9 show that the curves and data are roughly the same as those in Fig. 13 and Table 8. Therefore, we can conclude that the performance of algorithms is similar on LFR-αsize artificial networks and LFR-μ artificial networks.

images

Figure 14: (a–f) The distribution of the valid seed-oriented local communities detected by the algorithms on a group of LFR-αsize artificial networks

images

Fig. 15 shows that SOLCD still performs stably, as in the above two groups of experiments. That is, the performance of SOLCD does not change as the parameter αdegree increases. However, the curves of the other algorithms decrease sharply as the parameter αdegree increases. The reason for this phenomenon is as follows. Table 10 shows that SOLCD achieves the highest mean and lowest CV on most networks and a slightly higher SD than RTLCD, Chen and LS. However, the mean of RTLCD and LS is so low that the results have no reference value. Chen performs well with a low parameter αdegree but worsens the parameter αdegreeincreases. The experimental results show that SOLCD basically has no seed dependence problem on LFR-αdegree artificial networks. The other algorithms are seriously affected by seed dependence problems and are seriously affected by parameter αdegree.

images

images

Figure 15: (a–l) The distribution of the valid seed-oriented local communities detected by the algorithms on a group of LFR-αdegree artificial networks

images

4.8 The Discussion of the Experimental Results

Based on the above experimental results, we can obtain the following conclusions. First, the proposed algorithm has a great ability to detect high-quality seed-oriented local communities among real-world networks, which proves that SOLCD solves the seed-deviation problem. Second, the seed-oriented local community detection ability of the proposed algorithm is not affected by parameters μ, αsize and αdegree. Third, SD and CV are the proportions of valid seed-oriented local communities detected by SOLCD on real-world networks and artificial networks, respectively. SOLCD achieves a low SD and CV, which proves that SOLCD can detect high-quality seed-oriented local communities with nodes with different node influences as seeds. This illustrates that SOLCD solves seed dependence problems. In addition, SOLCD achieves excellent results on groups of artificial networks with different parameters, which shows that SOLCD has strong robustness. However, there are still problems to be solved. SOLCD still does not completely resolve the seed dependence problem, especially when taking the nodes with medium node influence as the seed.

5  Conclusion

Research on local community detection has achieved excellent achievements. However, there are still some problems to be solved, such as the seed deviation problem, the seed dependence problem and the parameter dependence problem. In order to solve these problems, this paper proposes a seed-oriented local community detection algorithm, named SOLCD, based on influence spreading. To solve the seed deviation problem and the seed dependence problem, we propose a seed-oriented algorithm, which always takes the given node as the seed, and always takes the seed as the influence core in the community expansion process. To solve the parameter dependence problem, we propose a local community detection algorithm based on influence spreading without any parameters. In addition, we propose a local community effectiveness index (LCE) and a local community uniqueness index (LCE) to estimate the quality of seed-oriented local communities. Efficient and rapid detection of seed-oriented communities can improve the accuracy of personalized recommendation of goods and information and help public opinion analysis.

This paper compares SOLCD with six other state-of-the-art local community detection algorithms on LFR artificial networks and real-world networks. The experimental results show that SOLCD has a great ability to detect high-quality seed-oriented local communities among real-world networks, which proves that SOLCD solves the seed deviation problem. Taking nodes with different node influences as seeds, SOLCD can detect high-quality seed-oriented local communities, which illustrates that SOLCD solves the seed dependence problem. In addition, SOLCD achieves excellent results on groups of artificial networks with different parameters, which shows that SOLCD has strong robustness.

However, there are still problems to be solved. SOLCD still has not completely resolved the seed dependence problem, especially when taking the nodes with medium node influence as the seed. We will focus on solving the seed dependence problem completely in future research.

Funding Statement: National Natural Science Foundation of China (Nos. 61672179, 61370083, 61402126), Heilongjiang Province Natural Science Foundation of China (No. F2015030), Science Fund for Youths in Heilongjiang Province (No. QC2016083) and Postdoctoral Fellowship in Heilongjiang Province (No. LBH-Z14071).

Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.

References

  1. Fang, Y., Huang, X., Qin, L., Zhang, Y., & Zhang, W. (2020). A survey of community search over big graphs. The VLDB Journal, 29(1), 353-392. [Google Scholar] [CrossRef]
  2. Mittal, R., & Bhatia, M. (2021). Classification and comparative evaluation of community detection algorithms. Archives of Computational Methods in Engineering, 28, 1417-1428. [Google Scholar] [CrossRef]
  3. Pizzuti, C. (2018). Evolutionary computation for community detection in networks: A review. IEEE Transactions on Evolutionary Computation, 22(3), 464-483. [Google Scholar] [CrossRef]
  4. Garza, S. E., & Schaeffer, S. E. (2019). Community detection with the label propagation algorithm: A survey. Physica A: Statistical Mechanics and its Applications, 534, 122058. [Google Scholar] [CrossRef]
  5. Kanawati, R. (2015). Empirical evaluation of applying ensemble methods to ego-centred community identification in complex networks. Neurocomputing, 150, 417-427. [Google Scholar] [CrossRef]
  6. Zhang, R., Li, L., Bao, C. M., Zhou, L. H., Kong, B. (2015). The community detection algorithm based on the node clustering coefficient and the edge clustering coefficient. 11th World Congress on Intelligent Control and Automation, pp. 3240–3245. Shengyang, China.
  7. Peng, W., Jing, L. (2016). A multi-agent genetic algorithm for local community detection by extending the tightest nodes. 2016 IEEE Congress on Evolutionary Computation, pp. 3215–3221. Vancouver, BC, Canada.
  8. Liakos, P., Ntoulas, A., Delis, A. (2016). Scalable link community detection: A local dispersion-aware approach. IEEE International Conference on Big Data (Big Data), pp. 716–725. Washington, DC, USA.
  9. Zhu, J. R., Chen, B. L., & Zeng, Y. F. (2020). Community detection based on modularity and k-plexes. Information Sciences, 513, 127-142. [Google Scholar] [CrossRef]
  10. Carmi, S., Havlin, S., Kirkpatrick, S., Shavitt, Y., & Shir, E. (2007). From the cover: A model of internet topology using k-shell decomposition. Proceedings of the National Academy of Science, 104(27), 11150-11154. [Google Scholar] [CrossRef]
  11. Wang, Z. X., Li, Z. C., Ding, X. F., & Tang, J. H. (2016). Overlapping community detection based on node location analysis. Knowledge-Based Systems, 105, 225-235. [Google Scholar] [CrossRef]
  12. Lancichinetti, A., Fortunato, S., & Kertész, J. (2009). Detecting the overlapping and hierarchical community structure of complex networks. New Journal of Physics, 11(3), [Google Scholar] [CrossRef]
  13. Baumes, J., Goldberg, M. K., Krishnamoorthy, M. S., Magdon-Ismail, M., Preston, N. (2005). Finding communities by clustering a graph into overlapping subgraphs. Proceedings of the IADIS International Conference on Applied Computing, Algarve, Portugal.
  14. Lee, C., Reid, F., Mcdaid, A., Hurley, N. (2010). Detecting highly overlapping community structure by greedy clique expansion. Proceedings of the 4th SNAKDD Workshop, pp. 33–42. USA.
  15. Whang, J. J., Gleich, D. F., & Dhillon, I. S. (2015). Overlapping community detection using neighborhood-inflated seed expansion. IEEE Transactions on Knowledge & Data Engineering, 28(5), 1272-1284. [Google Scholar] [CrossRef]
  16. Ding, X. Y., Zhang, J. P., & Yang, J. (2018). A robust two-stage algorithm for local community detection. Knowledge-Based Systems, 152, 188-199. [Google Scholar] [CrossRef]
  17. Cheng, J. J., Zhang, W. B., Yang, H. J., Su, X., & Ma, T. (2020). A seed-expanding method based on topsis for community detection in complex networks. Complexity, 2020, 1-14. [Google Scholar] [CrossRef]
  18. Guo, K., Huang, X., Wu, L., Chen, Y. (2021). Local community detection algorithm based on local modularity density. Applied Intelligence, (4), 1–16. DOI 10.1007/s10489-020-02052-0. [CrossRef]
  19. Ni, L., Luo, W. J., Zhu, W. J., & Hua, B. (2020). Local overlapping community detection. ACM Transactions on Knowledge Discovery from Data, 14(1), 1-25. [Google Scholar] [CrossRef]
  20. Luo, W. J., Yan, Z. L., Bu, C. Y., & Zhang, D. F. (2017). Community detection by fuzzy relations. IEEE Transactions on Emerging Topics in Computing, 8(2), 478-492. [Google Scholar] [CrossRef]
  21. Kamuhanda, D., Wang, M., & He, K. (2020). Sparse nonnegative matrix factorization for multiple local community detection. IEEE Transactions on Computational Social Systems, 7(5), 1220-1233. [Google Scholar] [CrossRef]
  22. Kloster, K., Gleich, D. F. (2014). Heat kernel based community detection. 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1386–1395. New York, USA.
  23. Hu, Y. M., Yang, B., & Wong, H. S. (2016). A weighted local view method based on observation over ground truth for community detection. Information Sciences, 355, 37-57. [Google Scholar] [CrossRef]
  24. He, K., Sun, Y., Bindel, D., Hopcroft, J. E., Li, Y. (2015). Detecting overlapping communities from local spectral subspaces. IEEE International Conference on Data Mining (ICDM), pp. 769–774. Atlantic City, NJ.
  25. Yao, Y. T., Wu, W., Lei, M. T., Zhang, X. (2016). Community detection based on variable vertex influence. IEEE First International Conference on Data Science in Cyberspace, pp. 418–423. China.
  26. You, X. M., Ma, Y. H., & Liu, Z. Y. (2020). A three-stage algorithm on community detection in social networks. Knowledge-Based Systems, 187, 104822.1-104822.12. [Google Scholar] [CrossRef]
  27. Fortunato, S. (2009). Community detection in graphs. Physics Reports, 486(3–5), [Google Scholar]
  28. Yang, J., & Leskovec, J. (2012). Defining and evaluating network communities based on ground-truth. Knowledge & Information Systems, 42(1), 181-213. [Google Scholar] [CrossRef]
  29. Raghavan, U. N., Albert, R., & Kumara, S. (2007). Near linear time algorithm to detect community structures in large-scale networks. Physical Review E Statistical Nonlinear and Soft Matter Physics, 76(3), 036106. [Google Scholar] [CrossRef]
  30. Xu, G. Q., Guo, J. W., & Yang, P. L. (2020). TNS-IPA: An improved label propagation algorithm for community detection based on two-level neighbourhood similarity. IEEE Access, 9, 23526-23536. [Google Scholar] [CrossRef]
  31. Wu, Z. L., Wang, X., Fang, W. Y., Liu, L. Z., & Tang, S. T. (2020). Community detection based on first passage probabilities. Physics Letters A, 390, [Google Scholar] [CrossRef]
  32. Gregory, S. (2009). Finding overlapping communities in networks by label propagation. New Journal of Physics, 12(10), 2011-2024. [Google Scholar] [CrossRef]
  33. Tang, M. L., Liu, Q., Ma, T. H., Cao, J., & Tian, Y. (2019). K-lowest-influence overlapping nodes based community detection in complex networks. IEEE Access, 7, 109646-109661. [Google Scholar] [CrossRef]
  34. Yi, Y. Q., Jin, L. H., Yu, H., Luo, H. R., & Cheng, F. (2021). Density sensitive random walk for local community detection. IEEE Access, 9, 27773-27782. [Google Scholar] [CrossRef]
  35. Liu, J. X., Shao, Y. X., & Su, S. (2021). Multiple local community detection via high-quality seed identification over both static and dynamic networks. Data Science and Engineering, 6(3), 249-264. [Google Scholar] [CrossRef]
  36. Aaron, C. (2005). Finding local community structure in networks. Physical Review E Statal Nonlinear & Soft Matter Physics, 72(2), 026132. [Google Scholar] [CrossRef]
  37. Luo, F., Wang, J. Z., & Promislow, E. (2008). Exploring local community structures in large networks. Web Intelligence and Agent Systems, 6(4), 387-400. [Google Scholar] [CrossRef]
  38. Chen, J., Zaï, O. R., Goebel, R. (2009). Local community identification in social networks. Proceedings of the 2009 International Conference on Advances in Social Network Analysis and Mining, pp. 237–242. Athens, Greece.
  39. Wu, Y. J., Han, H., Hao, Z. F., & Chen, F. (2012). Local community detection using link similarity. Journal of Computer Science & Technology, 27(6), 1261-1268. [Google Scholar] [CrossRef]
  40. Meng, F. R., Zhu, M., Zhou, Y., & Zhou, R. R. (2014). Local community detection in complex networks based on maximum cliques extension. Mathematical Problems in Engineering, 2014, 1-12. [Google Scholar] [CrossRef]
  41. Batagelj, V., & Zaversnik, M. (2003). An o(m) algorithm for cores decomposition of networks. Computer Science, 1(6), 34-37. [Google Scholar] [CrossRef]
  42. Zhang, Z. (2013). Ranking spreaders by decomposing complex networks. Physics Letters A, 377(14), 1031-1035. [Google Scholar] [CrossRef]
  43. Liu, J. G., Ren, Z. M., & Guo, Q. (2013). Ranking the spreading influence in complex networks. Physica A Statistical Mechanics & its Applications, 392(18), 4154-4159. [Google Scholar] [CrossRef]
  44. Kitsak, M., Gallos, L. K., Havlin, S., Liljeros, F., & Muchnik, L. (2010). Identification of influential spreaders in complex networks. Nature Physics, 6(11), 888-893. [Google Scholar] [CrossRef]
  45. Zachary, W. W. (1977). An information flow model for conflict and fission in small groups. Journal of Anthropological Research, 33(4), 452-473. [Google Scholar] [CrossRef]
  46. Danon, L., Duch, J., Diaz-Guilera, A., & Arenas, A. (2005). Comparing community structure identification. Journal of Statistical Mechanics, 2005(9), 09008. [Google Scholar] [CrossRef]
  47. Li, J., Wang, X., & Wu, P. (2015). Review on community detection methods based on local optimization. Bulletin of Chinese Academy of Science, 30(2), 238-247. [Google Scholar]
  48. Lancichinetti, A., Fortunato, S., & Radicchi, F. (2008). Benchmark graphs for testing community detection algorithms. Physical Review E, 78(4), 046110. [Google Scholar] [CrossRef]
  49. Lusseau, D., Schneider, K., Boisseau, O. J., Haase, P., & Slooten, E. (2003). The bottlenose dolphin community of doubtful sound features a large proportion of long-lasting associations. Behavioral Ecology & Sociobiology, 54(4), 396-405. [Google Scholar] [CrossRef]
  50. Krebs, V. (2004). Social network of political books. www.visualcomplexity.com.
  51. Girvan, M., & Newman, M. E. J. (2002). Community structure in social and biological networks. Proceedings of the National Academy of Sciences of the United States of America, 99(12), 7821-7826. [Google Scholar] [CrossRef]
images This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.