Link prediction in graph construction for supervised and semi-supervised learning
Lilian Berton, Jorge Valverde-Rebaza and Alneu de Andrade Lopes
Laboratory of Computational Intelligence (LABIC), University of São Paulo (USP), Brazil
July 2015

Outline
1. Introduction
2. Proposal
3. Experiments
4. Conclusion
Jorge Valverde-Rebaza Link prediction in graph construction 2 / 20

Motivation
Networks or graphs are a powerful relational representation that has been employed in different machine learning tasks. Link prediction is an important problem in network analysis that has attracted increasing attention in recent years. Many social, biological and information systems can be naturally described as networks, while other data are available only as flat data. To apply graph-based methods to flat data, it is necessary to convert the data into a network; furthermore, converting flat data to relational data can help improve classification accuracy. Although many methods for graph construction have been proposed, it is still an open problem.

Objective and hypothesis
Objective: propose a new method for graph construction using the link prediction intuition. If a network is very sparse, for example when a minimum spanning tree is used, it lacks structural information needed by the inference algorithms. If a network is very dense, for example when kNN with k > 10 is used, the excess edges become noise in the graph. Starting from a basic graph structure, it is possible to add predicted edges, generating a new (balanced) graph structure. This can improve the quality of the graphs, leading to better classification accuracy in supervised and semi-supervised learning (SSL) domains.

Graph Construction
Many data sets are available in tabular flat format. It is necessary to convert the data into a network to be able to apply a graph-based algorithm. We apply k-nearest neighbors (kNN), mutual kNN (MkNN) and minimum/maximum spanning tree (MinST/MaxST) to generate an initial graph.
[Figure: Graph construction methods. Panels: (a) 3NN, (b) M3NN, (c) MinST, (d) MaxST.]
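The four initial graphs named above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' implementation: it assumes Euclidean distance between instances, it assumes MaxST is the spanning tree with the largest total distance, and the function names (`knn_graph`, `mutual_knn_graph`, `spanning_tree`) are hypothetical.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbors(points, k):
    """For each point index, the indices of its k nearest other points."""
    result = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: euclidean(p, points[j]))
        result.append(others[:k])
    return result


def knn_graph(points, k):
    """Undirected kNN graph: edge (i, j) if j is among the k nearest of i, or vice versa."""
    edges = set()
    for i, neigh in enumerate(nearest_neighbors(points, k)):
        for j in neigh:
            edges.add((min(i, j), max(i, j)))
    return edges


def mutual_knn_graph(points, k):
    """Mutual kNN graph: edge (i, j) only if each is among the k nearest of the other."""
    neigh = [set(n) for n in nearest_neighbors(points, k)]
    return {(i, j) for i, j in combinations(range(len(points)), 2)
            if j in neigh[i] and i in neigh[j]}


def spanning_tree(points, maximum=False):
    """Kruskal's algorithm on the complete distance graph (assumed: distances as edge weights)."""
    pairs = sorted(combinations(range(len(points)), 2),
                   key=lambda e: euclidean(points[e[0]], points[e[1]]),
                   reverse=maximum)
    parent = list(range(len(points)))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = set()
    for i, j in pairs:
        ri, rj = find(i), find(j)
        if ri != rj:  # adding (i, j) does not create a cycle
            parent[ri] = rj
            tree.add((i, j))
    return tree
```

Note that MkNN is always a subgraph of kNN, and a spanning tree on n vertices always has n - 1 edges, which is why the tree-based graphs are the sparsest starting points.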

Link Prediction (LP)
Link prediction (LP) addresses the problem of predicting the existence of missing relations or new ones.
Common Neighbors (c): $s^{c}_{v_i,v_j} = |\Gamma(v_i) \cap \Gamma(v_j)|$
Weighted CN (w): $s^{w}_{v_i,v_j} = \sum_{v_k \in \Gamma(v_i) \cap \Gamma(v_j)} \left( w(v_i,v_k) + w(v_k,v_j) \right)$
Katz (k): $s^{k}_{v_i,v_j} = \sum_{l=1}^{\infty} \beta^{l} \cdot |\mathrm{paths}^{\langle l \rangle}_{v_i,v_j}| = \beta A_{v_i,v_j} + \beta^{2} (A^{2})_{v_i,v_j} + \dots$
[Figure: Link prediction process.]
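The three measures can be illustrated with a small sketch. It assumes the graph is stored as a nested dictionary `adj[u][v] = weight` (a hypothetical representation, not taken from the slides), reads Γ(v) as the neighbor set of v, and truncates the infinite Katz sum at a hypothetical `max_length`. Because the Katz score is written in terms of powers of the adjacency matrix A, the sketch counts walks, in which vertices may repeat.

```python
def common_neighbors(adj, i, j):
    """s^c: number of neighbors shared by i and j."""
    return len(set(adj[i]) & set(adj[j]))


def weighted_common_neighbors(adj, i, j):
    """s^w: sum of w(i, k) + w(k, j) over every shared neighbor k."""
    shared = set(adj[i]) & set(adj[j])
    return sum(adj[i][k] + adj[k][j] for k in shared)


def katz(adj, i, j, beta=0.05, max_length=4):
    """Truncated s^k: sum over l = 1..max_length of beta^l * (A^l)[i][j]."""
    # counts[v] holds the number of walks of the current length from i to v.
    counts = {v: 0 for v in adj}
    counts[i] = 1
    score = 0.0
    for length in range(1, max_length + 1):
        nxt = {v: 0 for v in adj}
        for u, c in counts.items():
            if c:
                for v in adj[u]:
                    nxt[v] += c
        counts = nxt
        score += (beta ** length) * counts[j]
    return score
```

For example, on a 4-cycle 0-1-2-3-0 the disconnected pair (0, 2) has two common neighbors, so its common-neighbors score is 2, while its Katz score is dominated by the two walks of length 2.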

Proposal
To predict new links, a score $s_{v_i,v_j}$ is assigned to each pair of disconnected vertices $v_i$ and $v_j$. All non-observed links are ranked according to their scores, and links connecting more similar nodes are assumed to be more likely to exist. A percentage of the top-ranked links is then added to the graph.
[Figure: LP construction steps. Panels: (a) Dataset, (b) MinST, (c) MinST+LP (Katz, 30%).]
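The ranking step can be sketched as follows. This is a minimal illustration under stated assumptions: the percentage is taken over all non-observed vertex pairs (the slides do not say what the percentage is relative to), the count is rounded down, and `add_predicted_links` and `common_neighbor_scorer` are hypothetical names.

```python
from itertools import combinations


def common_neighbor_scorer(edges):
    """Build a score function s(u, v) = number of common neighbors in the given graph."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    def score(u, v):
        return len(neighbors.get(u, set()) & neighbors.get(v, set()))

    return score


def add_predicted_links(nodes, edges, score, top_fraction):
    """Return the initial edges plus the top-scoring fraction of non-observed links."""
    observed = {frozenset(e) for e in edges}
    candidates = [(u, v) for u, v in combinations(nodes, 2)
                  if frozenset((u, v)) not in observed]
    # Rank every disconnected pair by decreasing score.
    ranked = sorted(candidates, key=lambda e: score(*e), reverse=True)
    n_keep = int(top_fraction * len(ranked))  # assumption: round down
    return set(edges) | set(ranked[:n_keep])
```

A usage sketch: starting from the path 0-1-2-3, `add_predicted_links(range(4), path_edges, common_neighbor_scorer(path_edges), 0.67)` adds the two pairs that share a neighbor, (0, 2) and (1, 3), and leaves out (0, 3).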

Datasets

Table: Data set descriptions for SSL classification
Data set     # Instances   # Attributes   # Classes
g241c        1500          241            2
g241n        1500          241            2
Digit1       1500          241            2
USPS         1500          241            2
COIL2        1500          241            2

Table: Data set descriptions for supervised classification
Data set     # Instances   # Attributes   # Classes
Wine         178           13             3
Ecoli        336           8              8
Customers    440           8              2
Cancer       699           10             2
Blood        748           5              2
Gaussians3   500           2              2
Gaussians5   500           2              2

SSL experimental setup
PCA was applied to reduce the data to 50 dimensions, since in high-dimensional data the distance to the nearest neighbor approaches the distance to the farthest neighbor, which degrades the quality of the graph. 10 and 100 labeled vertices were randomly selected. We apply MinST, MaxST, kNN and MkNN with 1 ≤ k ≤ 20, and the LP graphs (our proposal) considering the same methods combined with an LP measure: MinST+LP, MaxST+LP, kNN+LP and MkNN+LP with 1 ≤ k ≤ 5. The weighted graph W uses the binary weighting approach. The algorithm used for the label inference task was Local and Global Consistency (LGC). The average accuracy over 30 runs was used for evaluation.
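For readers unfamiliar with LGC, the sketch below shows its usual iterative form, F ← αSF + (1 − α)Y with S = D^{-1/2} W D^{-1/2}, on a binary weight matrix. It is an illustration only: the slides do not give the value of α or the number of iterations, so `alpha=0.9` and `n_iter=300` are assumptions, and the function name `lgc` is hypothetical.

```python
import math


def lgc(W, labels, n_classes, alpha=0.9, n_iter=300):
    """Label propagation by Local and Global Consistency.

    W: symmetric weight matrix as a list of lists (binary weights here).
    labels: dict mapping a labeled vertex index to its class index.
    Returns the predicted class index of every vertex.
    """
    n = len(W)
    degree = [sum(row) for row in W]
    # Symmetrically normalized weights: S = D^{-1/2} W D^{-1/2}.
    S = [[W[i][j] / math.sqrt(degree[i] * degree[j]) if W[i][j] else 0.0
          for j in range(n)] for i in range(n)]
    # Y holds the initial labels; unlabeled rows are all zeros.
    Y = [[1.0 if labels.get(i) == c else 0.0 for c in range(n_classes)]
         for i in range(n)]
    F = [row[:] for row in Y]
    for _ in range(n_iter):
        F = [[alpha * sum(S[i][j] * F[j][c] for j in range(n))
              + (1.0 - alpha) * Y[i][c]
              for c in range(n_classes)] for i in range(n)]
    return [max(range(n_classes), key=lambda c: F[i][c]) for i in range(n)]
```

With two triangles joined by a single edge and one labeled vertex in each triangle, the labels spread within each triangle, which is the behavior a well-constructed graph is meant to support.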

Supervised experimental setup
For the Cancer data set, instances with missing values were removed. We apply MinST, MaxST, kNN and MkNN with 1 ≤ k ≤ 20, and the LP graphs (our proposal) considering the same methods combined with an LP measure: MinST+LP, MaxST+LP, kNN+LP and MkNN+LP with 1 ≤ k ≤ 3. The weighted graph W uses the opposite of the Euclidean distance. The relational algorithms used for classification were nobayes, nolb-lr-binary, nolb-lr-count, nolb-lr-mode and prn. Accuracy under 10-fold cross-validation was used for evaluation.

Results
[Figure: Nemenyi post-hoc test for semi-supervised classification. Critical difference (CD) diagram on an average-rank axis from 1 to 8, comparing kNN+LP, MaxST, MkNN, MinST, kNN, MaxST+LP, MinST+LP and MkNN+LP.]
[Figure: Nemenyi post-hoc test for supervised classification. Critical difference (CD) diagram on an average-rank axis from 1 to 8, comparing kNN+LP, MaxST, kNN, MkNN+LP, MkNN, MinST, MinST+LP and MaxST+LP.]
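The CD bar in these diagrams is the critical difference of the Nemenyi test: two methods are considered significantly different when their average ranks differ by at least CD = q_α · sqrt(k(k + 1) / (6N)), for k methods compared over N data sets. The sketch below only evaluates this formula. The number of data sets and the critical value q_α used in the figures are not given in the slides, so both are left as parameters; about 3.031 is the commonly tabulated value for eight methods at α = 0.05 and should be checked against a table before use.

```python
import math


def nemenyi_critical_difference(n_methods, n_datasets, q_alpha):
    """Critical difference of average ranks for the Nemenyi post-hoc test."""
    return q_alpha * math.sqrt(n_methods * (n_methods + 1) / (6.0 * n_datasets))


def significantly_different(rank_a, rank_b, cd):
    """True when two average ranks differ by at least the critical difference."""
    return abs(rank_a - rank_b) >= cd
```

Because CD shrinks with the square root of N, comparisons over only a handful of data sets need large rank gaps to reach significance.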

Parameter analysis
[Figure: Distribution of the parameter k and of the top percentage of links used by the graph construction methods in supervised classification.]

Average degree
[Figure: Average degree (y-axis) versus k or % of links × 10 (x-axis) for kNN, MkNN, MST and the LP versions kNN+LP, MkNN+LP and MST+LP, applied to the Gaussians3 data set. LP versions use k = 3 and the common neighbors measure.]
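The quantity on the y-axis can be computed directly from an edge list: in an undirected graph each edge contributes to the degree of two vertices, so the average degree is 2|E| / |V|. A minimal sketch follows (the function name `average_degree` is hypothetical).

```python
def average_degree(n_vertices, edges):
    """Average degree 2|E| / |V| of an undirected graph given as vertex pairs."""
    unique = {frozenset(e) for e in edges}  # ignore duplicated or reversed pairs
    return 2.0 * len(unique) / n_vertices
```

A spanning tree on n vertices has n - 1 edges, so its average degree is 2(n - 1)/n, just under 2; each predicted link added to a graph raises its average degree by 2/n.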

Conclusions
Link prediction (LP) has been used in many fields of science, such as online social networks, where links can be recommended as promising friendships. Here LP was used for graph construction: starting from an initial graph structure, edges are predicted, generating a new balanced graph. The proposed graphs were evaluated in supervised and semi-supervised classification, providing improvements in accuracy. The graphs are sparse and represent the neighborhood of a point well. In future work, other baseline methods could be tested, as well as other LP measures. Our approach could also be applied in other domains of machine learning that use graph-based methods.


Thank you Jorge Valverde-Rebaza jvalverr@icmc.usp.br
