Link prediction has attracted wide attention among interdisciplinary researchers as an important issue in complex network. It aims to predict the missing links in current networks and new links that will appear in future networks. Despite the presence of missing links in the target network of link prediction studies, the network it processes remains macroscopically as a large connected graph. However, the complexity of the real world makes the complex networks abstracted from real systems often contain many isolated nodes. This phenomenon leads to existing link prediction methods not to efficiently implement the prediction of missing edges on isolated nodes. Therefore, the cold-start link prediction is favored as one of the most valuable subproblems of traditional link prediction. However, due to the loss of many links in the observation network, the topological information available for completing the link prediction task is extremely scarce. This presents a severe challenge for the study of cold-start link prediction. Therefore, how to mine and fuse more available non-topological information from observed network becomes the key point to solve the problem of cold-start link prediction. In this paper, we propose a framework for solving the cold-start link prediction problem, a joint-weighted symmetric nonnegative matrix factorization model fusing graph regularization information, based on low-rank approximation algorithms in the field of machine learning. First, the nonlinear features in high-dimensional space of node attributes are captured by the designed graph regularization term. Second, using a weighted matrix, we associate the attribute similarity and first order structure information of nodes and constrain each other. Finally, a unified framework for implementing cold-start link prediction is constructed by using a symmetric nonnegative matrix factorization model to integrate the multiple information extracted together. Extensive experimental validation on five real networks with attributes shows that the proposed model has very good predictive performance when predicting missing edges of isolated nodes.
Link prediction, an important problem in the complex network, has achieved fruitful results [
In recent years, with the increasingly severe anti-terrorism situation, the research of cold-start link prediction has gradually gained attention [
Currently, the studies of cold-start link prediction can be roughly divided into two categories. The first is based entirely on non-topological information methods [
In this paper, considering the advantages of fusing heterogeneous information via a nonnegative matrix factorization (NMF) framework, a cold-start link prediction model,
The rest article develops as follows. Part 2 shows the relevant works. Part 3 is about the establishment of the model and its optimization. Part 4 is experimental design. Part 5 is experiment results and related analysis. The last part contains our conclusions and prospects.
Leroy et al. [
In this section, we first describe the problem of cold-start link prediction. In addition, we review the conventional NMF method.
Given an attribute network
Assuming a real interaction system
Given a matrix
where
Information shortage shapes the major challenge for cold-start link prediction. The presence of isolated nodes leads to an incomplete network structure, so that the network structure topological information cannot be fully utilized to predict link. Therefore, it is necessary to seek more non-topological information to assist cold-start prediction, such as node attribute information.
Considering the advantages of symmetric NMF incorporating heterogeneous information [
where
Although the
The node attribute information of the general network has high-dimensional characteristics, and the hidden deep non-linear characteristics and the attribute-based similarity properties of nodes cannot be completely captured through the second term in
In addition, the observed node interactions in the network semi-structured state indicate that the two nodes in the network have already structurally close distance. At this point, if the node attribute information is again fused and forced into work, it will sometimes pull away the distance between the two in the structural space. Therefore, we need to restrict the role of the attribute information and correct it to play a significant role on the unconnected nodes.
In this way, the node attribute graph regularization term is introduced based on the advantages of graph regularity in data representation [
where, the
where, the
where,
As described in the ref. [
1) 0–1 Weighting.
2) Heat Kernel Weighting. If nodes
3) Dot-Product Weighting. If nodes
In this paper, the matrix
Through the weight matrix
It is impossible that the objective function
Then, expand the
First of all, let
For the
Similarly, the partial derivatives of
Because the
Replace that
Similarly, the partial derivatives of
For these equation above, in terms of the Karush-Kuhn-Tucker (KKT) complementary slackness condition
where
To sum up, the pseudo code of the proposed cold-start link prediction model of jointly weighted symmetric NMF with graph regularization (GJSNMF) is described as follows (see
Algorithm Name: GJSNMF | |
---|---|
The computational complexity of GJSNMF algorithm mainly comes from two parts. One is to compute the weight
In the experiment, the convergence of the model was verified on all the datasets, and the convergence result very well. Model convergence is shown here only on the Facebook dataset (see
We consider the following 7 real-world attribute networks datasets drawn from disparate fields.
Network | N | E | <K> | <d> | C | #Attributes |
---|---|---|---|---|---|---|
228 | 3419 | 29.991 | 1.868 | 0.6162 | 56 | |
Cornell | 195 | 286 | 2.903 | 3.2 | 0.1568 | 1703 |
Texas | 187 | 298 | 3.027 | 3.036 | 0.1937 | 1703 |
Washington | 230 | 366 | 3.373 | 2.995 | 0.1974 | 1703 |
Wisconsin | 265 | 479 | 3.464 | 3.763 | 0.2080 | 1703 |
Lazega | 71 | 378 | 10.8 | 2.104 | 0.3853 | 7 |
Coauthor | 422 | 10755 | 48.665 | 2.585 | 0.5759 | 3449 |
The basic topology features of these networks are summarized in
Like many existing prediction studies, in our work adopts also the most frequently-used metrics AUC (area under the ROC curve) and Precision to measure the performance of link prediction [
The cold-start link prediction method was proposed in the reference [
SASNMF method [
Matrix completion method [
NMF_LP method [
SPM_NMF method [
The network datasets need to be divided into training and test sets. A few nodes were selected randomly and made into isolated nodes by deleting all the links between them. Where, the edges were removed as the test set, while the remaining edges served as the training set.
The sensitivity of these parameters
In experiment, the proportion of networks divided into training and test sets ranged from 75% to 95%. The proportion of edges present under the corresponding network is approximately 50% to 90%.
AUC | NMF | SASNMF | NMF_LP | LEROY | MC | SPM_NMF | GJSNMF |
---|---|---|---|---|---|---|---|
Cornell | 0.4065 | 0.3003 | 0.3072 | 0.4267 | 0.4601 | 0.3380 | 0.4760 |
0.3370 | 0.2612 | 0.2212 | 0.5568 | 0.4904 | 0.4220 | 0.5920 | |
Coauthor | 0.2375 | 0.3930 | 0.1657 | 0.7210 | 0.4927 | 0.4380 | 0.7900 |
Lazega | 0.2748 | 0.2215 | 0.2250 | 0.7625 | 0.4976 | 0.3140 | 0.3380 |
Texas | 0.4120 | 0.2825 | 0.3130 | 0.3575 | 0.4810 | 0.3510 | 0.5440 |
Washington | 0.3930 | 0.3347 | 0.2770 | 0.4360 | 0.4649 | 0.4280 | 0.4600 |
Wisconsin | 0.3302 | 0.3570 | 0.2948 | 0.4533 | 0.4900 | 0.3900 | 0.4900 |
It is necessary to compare the predicted effects of models with different degrees of network structure. Therefore, after dividing existing edges from 50% to 90% in the network datasets. Various methods were subjected for comparative analysis.
From the results, we can see that the accuracy of the GJSNMF framework decreased significantly. This shows that when the topological information of the network is seriously missing, the global structural information availability also drops sharply, resulting in the overall poor prediction performance. Therefore, it is necessary, in the network semi-structured state, to mine more available information as auxiliary properties of the prediction to enhance the overall performance of the prediction.
One of the core points of building the model is to strengthen the model to learn the deep nonlinear manifold hidden attribute feature from the node attributes by adding the graph regularization terms to the model, in order to improve the model prediction performance. Therefore, to test whether the graph regularization term improves the model, this task is specially designed to test during the overall prediction effect of the model. This is to set the parameter
From the test results, after the model adds the graph regularization term, the utilization rate of the node attributes is higher, and the prediction effect is more obvious.
In the model construction section, there are three methods to be used to compute the corresponding weight matrix W when introducing graph regularization terms. In section, the main test is the extent to which the weights calculated under three different methods affect the overall prediction effect of the model. In
From the analysis of the experimental results, the effect of using the third weight calculation method in the model is significantly better. Of course, this is not absolute, in some datasets, the effect is bad.
Analyzing the test results in
The prediction of missing edges of isolated nodes in network semi-connected state has been a difficult problem. Mainly, the network topology structure is a nonconnected graph, and is extremely sparse. Traditional structural similarity methods cannot satisfy missing edge prediction of isolated nodes. In this paper, we analyze the underlying causes of the prediction difficulties and propose a joint weighted symmetric NMF model integrating graph regularization information for cold-start link prediction.
The model takes advantage of the symmetric nonnegative matrix and the graph regular non-negative matrix to integrate the nonlinear characteristics in the high-dimensional space of the node attributes, thus improving the guidance of the attribute information in the prediction process and realizing the cold-start link prediction task in the case of semi-connected network state.
Extensive experiments have validated that the proposed model globally excelled the state-of-the-art methods in the predictions of cold-start nodes. And the algorithm is highly robustness and extensibility. In the future that we would to optimize the model algorithm by parallel methods, in order to improve the prediction efficiency.
We would like to thank the anonymous reviewers for their contributions.