Link prediction in graph construction for supervised and - - PowerPoint PPT Presentation

SLIDE 1

Link prediction in graph construction for supervised and semi-supervised learning

Lilian Berton, Jorge Valverde-Rebaza and Alneu de Andrade Lopes

Laboratory of Computational Intelligence (LABIC), University of São Paulo (USP), Brazil, July 2015

SLIDE 2

Outline

1. Introduction
2. Proposal
3. Experiments
4. Conclusion

Jorge Valverde-Rebaza Link prediction in graph construction 2 / 20

SLIDE 4

Motivation

Networks or graphs are a powerful relational representation that has been employed in different machine learning tasks. Link prediction is an important problem in network analysis that has attracted increasing attention in recent years. Many social, biological and information systems can be naturally described as networks, whereas other data come in flat (tabular) form. To apply graph-based methods to flat data, it is necessary to convert the data into a network; moreover, converting flat data into relational data can help improve classification accuracy. Although many graph construction methods have been proposed, graph construction is still an open problem.

SLIDE 5

Objective and hypothesis

We propose a new method for graph construction based on the intuition behind link prediction. If a network is very sparse, as when a minimum spanning tree is used, it lacks structural information needed by the inference algorithms. If a network is very dense, as when kNN with k > 10 is used, the excess edges become noise. Starting from a basic graph structure, it is possible to add predicted edges, generating a new, balanced graph structure. This can improve the quality of the graph, leading to better classification accuracy in supervised and semi-supervised learning (SSL).

SLIDE 6

Graph Construction

Many data sets are available in flat, tabular format, so they must be converted into a network before a graph-based algorithm can be applied. We apply k-nearest neighbors (kNN), mutual kNN (MkNN), and minimum/maximum spanning trees (MinST/MaxST) to generate an initial graph.

Figure: Graph construction methods: (a) 3NN, (b) M3NN, (c) MinST, (d) MaxST.
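The four initial constructions can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' implementation: it assumes Euclidean distance and undirected graphs, breaks distance ties by vertex index, and reads MaxST as the spanning tree of maximum total Euclidean distance, since the slides do not say which weights MaxST maximizes. The function names (`knn_graph`, `mutual_knn_graph`, `spanning_tree`) are hypothetical.

```python
import math
from itertools import combinations


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_sets(points, k):
    """Indices of the k nearest neighbours of each point (ties broken by index)."""
    n = len(points)
    sets = []
    for i in range(n):
        ranked = sorted((euclidean(points[i], points[j]), j) for j in range(n) if j != i)
        sets.append({j for _, j in ranked[:k]})
    return sets


def knn_graph(points, k):
    """Undirected kNN graph: edge {i, j} if j is among i's k nearest or vice versa."""
    nbrs = knn_sets(points, k)
    return {frozenset((i, j)) for i in range(len(points)) for j in nbrs[i]}


def mutual_knn_graph(points, k):
    """Mutual kNN graph: edge {i, j} only if each is among the other's k nearest."""
    nbrs = knn_sets(points, k)
    return {frozenset((i, j)) for i in range(len(points)) for j in nbrs[i] if i in nbrs[j]}


def spanning_tree(points, maximum=False):
    """Kruskal's algorithm on the complete graph weighted by Euclidean distance.

    maximum=False gives MinST; maximum=True gives the tree of maximum total
    distance (an assumed reading of MaxST).
    """
    n = len(points)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    edges = sorted(
        ((euclidean(points[i], points[j]), i, j) for i, j in combinations(range(n), 2)),
        reverse=maximum,
    )
    tree = set()
    for _, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # edge joins two components, so it cannot form a cycle
            parent[root_i] = root_j
            tree.add(frozenset((i, j)))
    return tree


points = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5)]
print(sorted(tuple(sorted(e)) for e in knn_graph(points, 1)))  # [(0, 1), (0, 2), (3, 4)]
print(sorted(tuple(sorted(e)) for e in spanning_tree(points)))  # [(0, 1), (0, 2), (1, 3), (3, 4)]
```

On these five points, 1NN links each point to its nearest neighbor, mutual 1NN keeps only the reciprocal pairs, and the MinST joins the two clusters through one of their closest cross-cluster pairs.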

SLIDE 7

Link Prediction (LP)

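Link prediction (LP) addresses the problem of predicting the existence of missing or new relations. Below, $\Gamma(v)$ is the set of neighbors of vertex $v$, $w(\cdot,\cdot)$ is an edge weight, $A$ is the adjacency matrix, $\mathrm{paths}^{\langle l \rangle}_{v_i,v_j}$ is the set of paths of length $l$ between $v_i$ and $v_j$, and $\beta$ is a damping factor.

Common Neighbors (c): $s^{c}_{v_i,v_j} = |\Gamma(v_i) \cap \Gamma(v_j)|$

Weighted CN (w): $s^{w}_{v_i,v_j} = \sum_{v_k \in \Gamma(v_i) \cap \Gamma(v_j)} \big( w(v_i, v_k) + w(v_k, v_j) \big)$

Katz (k): $s^{k}_{v_i,v_j} = \sum_{l=1}^{\infty} \beta^{l} \cdot |\mathrm{paths}^{\langle l \rangle}_{v_i,v_j}| = \beta A_{v_i,v_j} + \beta^{2} (A^{2})_{v_i,v_j} + \dots$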

Figure: Link prediction process.
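The three measures can be sketched in plain Python as below. Assumptions, none of which come from the slides: the graph is undirected on vertices 0..n-1, weights live in a dict keyed by `frozenset` edges, and the infinite Katz series is truncated at `max_len` with an arbitrary `beta=0.1`. $(A^l)_{ij}$ counts walks of length $l$ (vertices may repeat), which is what the matrix form of the Katz series sums. All function names are hypothetical.

```python
def adjacency(n, edges):
    """Neighbour sets Γ(v) of an undirected graph on vertices 0..n-1."""
    adj = [set() for _ in range(n)]
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    return adj


def common_neighbors(adj, i, j):
    # s^c = |Γ(v_i) ∩ Γ(v_j)|
    return len(adj[i] & adj[j])


def weighted_common_neighbors(adj, weight, i, j):
    # s^w = sum over shared neighbours v_k of w(v_i, v_k) + w(v_k, v_j);
    # weight maps frozenset({u, v}) to the weight of edge (u, v).
    return sum(weight[frozenset((i, k))] + weight[frozenset((k, j))] for k in adj[i] & adj[j])


def katz(adj, i, j, beta=0.1, max_len=4):
    # Truncated s^k = sum_{l=1}^{max_len} beta^l * (A^l)[i][j]; (A^l)[i][j] counts walks of length l.
    n = len(adj)
    A = [[1 if v in adj[u] else 0 for v in range(n)] for u in range(n)]
    power = [row[:] for row in A]  # A^1
    score = 0.0
    for l in range(1, max_len + 1):
        score += beta ** l * power[i][j]
        if l < max_len:
            power = [[sum(power[r][m] * A[m][c] for m in range(n)) for c in range(n)] for r in range(n)]
    return score


# A 4-cycle 0-1-3-2-0: vertices 0 and 3 share neighbours 1 and 2.
edges = [(0, 1), (0, 2), (1, 3), (2, 3)]
adj = adjacency(4, edges)
unit = {frozenset(e): 1.0 for e in edges}
print(common_neighbors(adj, 0, 3))                 # 2
print(weighted_common_neighbors(adj, unit, 0, 3))  # 4.0
print(katz(adj, 0, 3))                             # about 0.0208 = 0.1^2 * 2 + 0.1^4 * 8
```

For practical sizes, the untruncated Katz matrix can instead be obtained in closed form as $(I - \beta A)^{-1} - I$, which converges when $\beta$ is smaller than the reciprocal of the largest eigenvalue of $A$.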

SLIDE 9

Proposal

To predict new links, a score $s_{v_i,v_j}$ is assigned to each pair of disconnected vertices $v_i$ and $v_j$. All non-observed links are ranked by score, and links connecting more similar vertices are assumed to be more likely to exist. A percentage of the top-ranked links is then added to the initial graph.

Figure: LP construction steps: (a) data set, (b) MinST, (c) MinST+LP (Katz, top 30% of links).
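A minimal sketch of the LP step, under assumptions the slides leave open: only disconnected pairs with a positive score are ranked, the percentage is taken over those candidates, and the number of added links is rounded down. Common neighbors is the default score, but any pairwise function (for example a Katz score) can be passed in. The name `add_predicted_links` is hypothetical.

```python
import math
from itertools import combinations


def add_predicted_links(n, edges, top_fraction, score=None):
    """Return the initial edges plus the top-scoring fraction of candidate pairs."""
    adj = [set() for _ in range(n)]
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    if score is None:
        # Default similarity: number of common neighbours in the initial graph.
        score = lambda a, b: len(adj[a] & adj[b])
    existing = {frozenset(e) for e in edges}
    scored = []
    for i, j in combinations(range(n), 2):
        if frozenset((i, j)) not in existing:
            s = score(i, j)
            if s > 0:  # assumption: only pairs with some supporting evidence are ranked
                scored.append((s, i, j))
    # Highest score first; ties fall back to (i, j) order.
    scored.sort(key=lambda t: (-t[0], t[1], t[2]))
    n_new = math.floor(top_fraction * len(scored))
    return existing | {frozenset((i, j)) for _, i, j in scored[:n_new]}


# A path 0-1-2-3-4, e.g. the MinST of five equally spaced points on a line.
path = [(0, 1), (1, 2), (2, 3), (3, 4)]
print(sorted(tuple(sorted(e)) for e in add_predicted_links(5, path, 1.0)))
# [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (2, 4), (3, 4)]
```

With common neighbors, only pairs two hops apart get a positive score, so the sparse tree gains short "triangle-closing" edges rather than long-range ones.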

SLIDE 11

Datasets

Table: Data set descriptions for SSL classification

| Data set | # Instances | # Attributes | # Classes |
|---|---|---|---|
| g241c | 1500 | 241 | 2 |
| g241n | 1500 | 241 | 2 |
| Digit1 | 1500 | 241 | 2 |
| USPS | 1500 | 241 | 2 |
| COIL2 | 1500 | 241 | 2 |

Table: Data set descriptions for supervised classification

| Data set | # Instances | # Attributes | # Classes |
|---|---|---|---|
| Wine | 178 | 13 | 3 |
| Ecoli | 336 | 8 | 8 |
| Customers | 440 | 8 | 2 |
| Cancer | 699 | 10 | 2 |
| Blood | 748 | 5 | 2 |
| Gaussians3 | 500 | 2 | 2 |
| Gaussians5 | 500 | 2 | 2 |

SLIDE 12

SSL experimental setup

PCA was applied to reduce the dimensionality to 50, since in high-dimensional data the distance to the nearest neighbor approaches the distance to the farthest neighbor, which degrades the quality of the graph. Sets of 10 and 100 labeled vertices were randomly selected. We apply MinST, MaxST, kNN and MkNN with 1 ≤ k ≤ 20, and the LP graphs (our proposal), which combine the same methods with an LP measure: MinST+LP, MaxST+LP, kNN+LP and MkNN+LP with 1 ≤ k ≤ 5. The weight matrix W uses binary weighting. Labels were inferred with the Local and Global Consistency (LGC) algorithm. Evaluation used the average accuracy over 30 runs.
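For reference, LGC as introduced by Zhou et al. iterates $F \leftarrow \alpha S F + (1 - \alpha) Y$ with $S = D^{-1/2} W D^{-1/2}$, where $Y$ holds one-hot labels for the labeled vertices, and assigns each vertex the class with the largest entry of $F$. The sketch below uses binary weights as in this setup; `alpha=0.99`, the fixed iteration count and the name `lgc` are assumptions, not settings reported on the slides.

```python
import math


def lgc(n, edges, labels, n_classes, alpha=0.99, iterations=1000):
    """Local and Global Consistency label propagation, iterative form.

    labels maps labelled vertices to a class index in range(n_classes).
    """
    # Binary weighting: W[i][j] = 1 for every edge of the constructed graph.
    W = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        W[i][j] = W[j][i] = 1.0
    deg = [sum(row) for row in W]
    # S = D^{-1/2} W D^{-1/2}; a nonzero W[i][j] guarantees both degrees are positive.
    S = [[W[i][j] / math.sqrt(deg[i] * deg[j]) if W[i][j] else 0.0 for j in range(n)]
         for i in range(n)]
    Y = [[1.0 if labels.get(i) == c else 0.0 for c in range(n_classes)] for i in range(n)]
    F = [row[:] for row in Y]
    for _ in range(iterations):
        # F <- alpha * S F + (1 - alpha) * Y
        F = [[alpha * sum(S[i][m] * F[m][c] for m in range(n)) + (1 - alpha) * Y[i][c]
              for c in range(n_classes)] for i in range(n)]
    return [max(range(n_classes), key=lambda c: F[i][c]) for i in range(n)]


# Two components, each with one labelled vertex: labels spread only within a component.
print(lgc(6, [(0, 1), (1, 2), (3, 4), (4, 5)], {0: 0, 5: 1}, 2))  # [0, 0, 0, 1, 1, 1]
```

The example also shows why graph construction matters here: LGC can only propagate a label along edges, so a vertex's prediction depends entirely on which edges the construction method created.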

SLIDE 13

Supervised experimental setup

For the Cancer data set, instances with missing values were also removed. We apply MinST, MaxST, kNN and MkNN with 1 ≤ k ≤ 20, and the LP graphs (our proposal), which combine the same methods with an LP measure: MinST+LP, MaxST+LP, kNN+LP and MkNN+LP with 1 ≤ k ≤ 3. The edge weights in W are the opposite of the Euclidean distance. The relational classifiers used were nobayes, nolb-lr-binary, nolb-lr-count, nolb-lr-mode and prn. Evaluation used 10-fold cross-validation accuracy.
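The relational classifiers above label a vertex using the labels of its neighbors in the constructed graph. As a rough illustration only, the sketch below implements a simplified relational-neighbor classifier in the spirit of prn: unlabeled vertices start from a uniform class distribution and repeatedly take the average distribution of their neighbors. It is not the implementation used in the experiments; unit edge weights (rather than the distance-based weights above), the iteration count and the name `relational_neighbor` are assumptions.

```python
def relational_neighbor(n, edges, known, classes, iterations=10):
    """Simplified relational-neighbour classifier (prn-like), unit edge weights assumed.

    known maps labelled vertices to their class; returns predictions for the rest.
    """
    adj = [set() for _ in range(n)]
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    uniform = {c: 1.0 / len(classes) for c in classes}
    # Labelled vertices keep a one-hot distribution; the others start uniform.
    probs = [{c: float(known[v] == c) for c in classes} if v in known else dict(uniform)
             for v in range(n)]
    for _ in range(iterations):
        updated = [dict(p) for p in probs]
        for v in range(n):
            if v in known or not adj[v]:
                continue  # fixed label, or isolated vertex with no evidence
            updated[v] = {c: sum(probs[u][c] for u in adj[v]) / len(adj[v]) for c in classes}
        probs = updated
    return {v: max(classes, key=lambda c: probs[v][c]) for v in range(n) if v not in known}


# Vertex 3 has two neighbours of class "a" and one of class "b"; vertex 4 only touches "b".
edges = [(0, 3), (1, 3), (2, 3), (2, 4)]
print(relational_neighbor(5, edges, {0: "a", 1: "a", 2: "b"}, ["a", "b"]))  # {3: 'a', 4: 'b'}
```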

SLIDE 14

Results

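Figure: Nemenyi post-hoc test for semi-supervised classification (critical difference diagram over average ranks 1 to 8; methods in the order they appear in the diagram: kNN+LP, MkNN, kNN, MinST+LP, MkNN+LP, MaxST+LP, MinST, MaxST).

Figure: Nemenyi post-hoc test for supervised classification (critical difference diagram over average ranks 1 to 8; methods in the order they appear in the diagram: kNN+LP, kNN, MkNN, MinST+LP, MaxST+LP, MinST, MkNN+LP, MaxST).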
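In a Nemenyi post-hoc test, the methods are ranked on each data set (rank 1 is best, ties get the average rank), the ranks are averaged, and two methods differ significantly when their average ranks differ by at least the critical difference $CD = q_\alpha \sqrt{k(k+1)/(6N)}$ for $k$ methods and $N$ data sets. The sketch below uses the $q_{0.05}$ values tabulated by Demšar (2006); the number of data sets behind the diagrams is not stated on the slides, so the usage example uses toy accuracies that are purely illustrative.

```python
import math

# Critical values q_0.05 for the Nemenyi test, indexed by number of methods k.
Q_005 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728, 6: 2.850,
         7: 2.949, 8: 3.031, 9: 3.102, 10: 3.164}


def average_ranks(scores):
    """scores[d][m] is the accuracy of method m on data set d (higher is better)."""
    n_methods = len(scores[0])
    totals = [0.0] * n_methods
    for row in scores:
        order = sorted(range(n_methods), key=lambda m: -row[m])
        pos = 0
        while pos < n_methods:
            end = pos
            while end + 1 < n_methods and row[order[end + 1]] == row[order[pos]]:
                end += 1
            tied_rank = (pos + end) / 2 + 1  # mean of 1-based positions pos..end
            for m in order[pos:end + 1]:
                totals[m] += tied_rank
            pos = end + 1
    return [t / len(scores) for t in totals]


def nemenyi_cd(k, n_datasets):
    return Q_005[k] * math.sqrt(k * (k + 1) / (6 * n_datasets))


# Toy accuracies for three methods on two data sets (illustrative numbers only).
toy = [[0.90, 0.80, 0.70],
       [0.80, 0.80, 0.60]]
print(average_ranks(toy))  # [1.25, 1.75, 3.0]
print(nemenyi_cd(3, 2))    # 2.343
```

With eight graph construction methods, as in the diagrams above, the CD shrinks as $1/\sqrt{N}$, so more data sets (or experimental settings) are needed to separate methods whose average ranks are close.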

SLIDE 15

Parameter analysis

Figure: Distribution of the parameter k and of the top percentage of links used by each graph construction method in supervised classification.

SLIDE 16

Average degree

Figure: Average degree for kNN, MkNN, MST and their LP versions (kNN+LP, MkNN+LP, MST+LP) applied to the Gaussians3 data set (x-axis: k or % of links × 10; y-axis: average degree). The LP versions use k = 3 and the common neighbors measure.
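The average degree of an undirected graph is $2|E|/|V|$. Two consequences follow directly from the constructions: a spanning tree always has $|V| - 1$ edges, so its average degree is $2(|V| - 1)/|V|$, just under 2 regardless of the data; and in a kNN graph built as the union of neighbor lists every vertex has degree at least k while $|E| \le k|V|$, so its average degree lies between k and 2k (in a mutual kNN graph every degree is at most k). The snippet checks the spanning-tree case for 500 vertices, the size of Gaussians3; `average_degree` is a hypothetical helper.

```python
def average_degree(n, edges):
    """Average degree 2|E|/n of an undirected simple graph on n vertices."""
    unique = {frozenset(e) for e in edges}  # count each undirected edge once
    return 2 * len(unique) / n


n = 500  # Gaussians3 has 500 instances
path_tree = [(i, i + 1) for i in range(n - 1)]  # any spanning tree has n - 1 edges
print(average_degree(n, path_tree))  # 1.996
```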

SLIDE 18

Conclusions

Link prediction (LP) has been used in many fields of science, such as online social networks, where links can be recommended as promising friendships. Here, LP was used for graph construction: starting from an initial graph structure, edges are predicted, generating a new, balanced graph. The proposed graphs were evaluated in supervised and semi-supervised classification and improved accuracy. The graphs are sparse and represent the neighborhood of a point well. In future work, other baseline methods could be tested, as well as other LP measures. Our approach could also be applied in other domains of machine learning that use graph-based methods.

SLIDE 20

Thank you

Jorge Valverde-Rebaza jvalverr@icmc.usp.br