Link prediction via matrix factorization
Charles Elkan University of California, San Diego September 6, 2011
Outline
1. Introduction: Three related prediction tasks
2. Link prediction in networks
3. Discussion

Link prediction
[Figure: Test set MAE of Baseline vs. LFL in three settings: Standard, Cold-start users, Cold-start users + movies. Bar values as extracted: 0.7162, 0.8039, 0.9608, 0.7063, 0.7118, 0.7451.]
◮ Contrast: Non-predictive task, e.g. community detection.
◮ Contrast: Algorithm but no goal function, e.g. betweenness.
◮ Contrast: Use only graph structure, e.g. commute time.
◮ Contrast: Clusters, modularity.
◮ Contrast: MCMC, betweenness, SVD.
◮ Contrast: Compare only to classic methods.
◮ For author-author linking, side-information can be words in
◮ For country-country conflict, side-information is geographic
◮ latent vectors $\alpha_i \in \mathbb{R}^k$ for each node $i$
◮ scaling factors $\Lambda \in \mathbb{R}^{k \times k}$
◮ weights $W \in \mathbb{R}^{d \times d}$ for node features
◮ weights $v \in \mathbb{R}^{d'}$ for edge features.
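◮ Predicted probability of a link between nodes $i$ and $j$: $\sigma(\alpha_i^T \Lambda \alpha_j + x_i^T W x_j + v^T z_{ij})$, where $x_i \in \mathbb{R}^d$ are the features of node $i$, $z_{ij} \in \mathbb{R}^{d'}$ are the features of the pair $(i, j)$, and $\sigma(x) = \frac{1}{1+\exp(-x)}$.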
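A minimal Python sketch of the prediction formula, using plain lists so it runs without dependencies. The function and argument names (`link_probability`, `Lam`, and so on) are illustrative choices, not names from the talk; the sketch only evaluates the model for given parameters and does not fit them.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic function sigma(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))

def matvec(M, x):
    """Multiply matrix M (a list of rows) by vector x."""
    return [dot(row, x) for row in M]

def link_probability(alpha_i, alpha_j, Lam, x_i, x_j, W, v, z_ij):
    """sigma(alpha_i^T Lam alpha_j + x_i^T W x_j + v^T z_ij).

    alpha_i, alpha_j: latent vectors in R^k
    Lam:              k x k scaling matrix
    x_i, x_j:         node feature vectors in R^d
    W:                d x d weight matrix for node features
    v, z_ij:          weight vector and edge feature vector in R^{d'}
    """
    latent = dot(alpha_i, matvec(Lam, alpha_j))  # alpha_i^T Lam alpha_j
    node = dot(x_i, matvec(W, x_j))              # x_i^T W x_j
    edge = dot(v, z_ij)                          # v^T z_ij
    return sigmoid(latent + node + edge)

# Usage with k = 2, d = 1, d' = 1: latent term 1, node term 1, edge term -1.
identity = [[1.0, 0.0], [0.0, 1.0]]
p = link_probability([1.0, 0.0], [1.0, 0.0], identity,
                     [1.0], [2.0], [[0.5]], [1.0], [-1.0])
print(round(p, 4))
```

Because $\alpha_i^T \Lambda \alpha_j = \alpha_j^T \Lambda^T \alpha_i$, the latent term is symmetric in $i$ and $j$ whenever $\Lambda$ is symmetric (and likewise the node term when $W$ is symmetric), which an undirected network would presumably call for.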
Learning: optimize jointly over $\alpha, \Lambda, W, v$.
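The objective being minimized is not spelled out in this text. As a hedged illustration only, the sketch below assumes an L2-regularized log-loss over labeled node pairs and fits just the latent vectors by full-batch gradient descent, with $\Lambda$ fixed to the identity and no side-information; the helper `fit_latent` and all hyperparameter values are hypothetical.

```python
import math
import random

def sigmoid(s: float) -> float:
    return 1.0 / (1.0 + math.exp(-s))

def log_loss(s: float, y: int) -> float:
    """Numerically stable -[y log sigma(s) + (1 - y) log(1 - sigma(s))]."""
    # Equals softplus(s) - y * s.
    return max(s, 0.0) + math.log1p(math.exp(-abs(s))) - y * s

def fit_latent(n, pairs, k=2, lam=0.01, lr=0.1, epochs=200, seed=0):
    """Fit one latent vector per node by full-batch gradient descent on the
    L2-regularized log-loss over observed (i, j, y) pairs.

    Assumed simplifications (not from the talk): Lambda fixed to the
    identity, no node or edge features (W and v omitted).
    """
    rng = random.Random(seed)
    alpha = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n)]

    def score(i, j):
        return sum(a * b for a, b in zip(alpha[i], alpha[j]))

    def objective():
        data = sum(log_loss(score(i, j), y) for i, j, y in pairs)
        return data + lam * sum(a * a for row in alpha for a in row)

    history = [objective()]
    for _ in range(epochs):
        # Start from the regularizer gradient 2 * lam * alpha.
        grad = [[2.0 * lam * a for a in row] for row in alpha]
        for i, j, y in pairs:
            g = sigmoid(score(i, j)) - y  # d loss / d score
            for f in range(k):
                grad[i][f] += g * alpha[j][f]
                grad[j][f] += g * alpha[i][f]
        # Update all nodes only after every gradient is accumulated.
        for u in range(n):
            for f in range(k):
                alpha[u][f] -= lr * grad[u][f]
        history.append(objective())
    return alpha, history

# Toy input: pairs {0, 1} and {2, 3} are linked; all cross pairs are not.
pairs = [(0, 1, 1), (2, 3, 1), (0, 2, 0), (0, 3, 0), (1, 2, 0), (1, 3, 0)]
alpha, history = fit_latent(4, pairs)
print(round(history[0], 3), round(history[-1], 3))
```

On this toy input, gradient descent should push nodes 0 and 1 toward one latent direction and nodes 2 and 3 toward the opposite one, so linked pairs end up with probability above 0.5 and cross pairs below.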
◮ Not influenced by the relative sizes of the positive and negative classes.
◮ Sampling is popular, but loses information.
◮ Weighting is merely heuristic.
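The bullets do not name the objective they describe. Assuming it is a ranking criterion such as AUC, a quick check of the first bullet is that AUC (the fraction of positive/negative pairs scored in the right order) is unchanged when every negative example is replicated, whereas thresholded accuracy shifts. The scores and helper names below are made up for illustration; this is not the talk's training loss.

```python
from itertools import product

def auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    total = 0.0
    for p, q in product(pos_scores, neg_scores):
        if p > q:
            total += 1.0
        elif p == q:
            total += 0.5
    return total / (len(pos_scores) * len(neg_scores))

def accuracy(pos_scores, neg_scores, threshold=0.5):
    """Fraction of examples on the correct side of a fixed threshold."""
    correct = sum(s >= threshold for s in pos_scores)
    correct += sum(s < threshold for s in neg_scores)
    return correct / (len(pos_scores) + len(neg_scores))

pos = [0.9, 0.4, 0.35]
neg = [0.5, 0.2, 0.1]
# Replicating negatives 100x changes the class ratio from 1:1 to 1:100.
print(auc(pos, neg), auc(pos, neg * 100))
print(accuracy(pos, neg), accuracy(pos, neg * 100))
```

Replicating negatives multiplies both the count of correctly ordered pairs and the total pair count by the same factor, which is why AUC stays fixed while accuracy drifts toward the majority class.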
Optimization is over all parameters $\alpha, \Lambda, W, v$.
◮ latent features versus unsupervised scores
◮ latent features versus explicit features
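The slide does not say which unsupervised scores are compared. Common neighbors and Adamic-Adar are two standard examples, used here purely as an assumption; unlike learned latent features they need no training and look only at graph structure. The helper names are hypothetical.

```python
import math
from collections import defaultdict

def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) edges."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def common_neighbors(adj, u, v):
    """Number of nodes adjacent to both u and v."""
    return len(adj[u] & adj[v])

def adamic_adar(adj, u, v):
    """Sum of 1 / log(degree) over common neighbors of u and v.

    Assumes u != v, so every common neighbor has degree >= 2 and the
    logarithm is positive.
    """
    return sum(1.0 / math.log(len(adj[w])) for w in adj[u] & adj[v])

adj = build_adjacency([(0, 1), (0, 2), (1, 2), (2, 3), (1, 3)])
# Nodes 0 and 3 are not linked but share neighbors 1 and 2.
print(common_neighbors(adj, 0, 3), round(adamic_adar(adj, 0, 3), 4))
```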
◮ Computational biology: Protein-protein interaction network,
◮ Citation networks: NIPS authors, condensed matter physicists
◮ Social phenomena: Military conflicts between countries,
Dataset      Nodes    |O+|        |O−|   +ve:−ve ratio   Mean degree
Prot-Prot     2617   23710   6,824,979         1 : 300           9.1
Metabolic      668    5564     440,660          1 : 80           8.3
NIPS          2865    9466   8,198,759         1 : 866           3.3
Condmat      14230    2392     429,232         1 : 179          0.17
Conflict       130     320      16,580          1 : 52           2.5
PowerGrid     4941   13188  24,400,293        1 : 2000           2.7

◮ Prot-Prot: protein-protein interaction data from Noble. Per protein: 76 features.
◮ Metabolic: metabolic interactions of S. cerevisiae from the KEGG/PATHWAY database. Per protein: 157 phylogenetic features, 145 gene expression features, 23 location features.
◮ Condmat: condensed-matter physicists [Newman]. Uses node pairs 2 hops apart in the first five years.
◮ Conflict: military disputes [MID 3.0]. Per country: population, GDP, polity. Per dyad: 6 features, e.g. geographic distance.
◮ PowerGrid: US electric power grid network [Watts and Strogatz].
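Judging from the numbers, the mean-degree column equals |O+| divided by the node count (to the precision shown), and the ratio column is |O−|/|O+| rounded fairly coarsely. The short script below recomputes both from the first three columns; this reading of the columns is an inference from the table, not a definition given in the talk, and `derived_columns` is a hypothetical helper.

```python
# (dataset, nodes, |O+|, |O-|, reported ratio r in "1 : r", reported mean degree)
DATASETS = [
    ("Prot-Prot", 2617, 23710, 6_824_979, 300, 9.1),
    ("Metabolic", 668, 5564, 440_660, 80, 8.3),
    ("NIPS", 2865, 9466, 8_198_759, 866, 3.3),
    ("Condmat", 14230, 2392, 429_232, 179, 0.17),
    ("Conflict", 130, 320, 16_580, 52, 2.5),
    ("PowerGrid", 4941, 13188, 24_400_293, 2000, 2.7),
]

def derived_columns(nodes, n_pos, n_neg):
    """Return (negatives per positive, positives per node)."""
    return n_neg / n_pos, n_pos / nodes

for name, nodes, n_pos, n_neg, ratio, degree in DATASETS:
    r, d = derived_columns(nodes, n_pos, n_neg)
    print(f"{name:10s} 1 : {r:7.1f} (listed 1 : {ratio})  "
          f"mean degree {d:5.2f} (listed {degree})")
```

For example, Prot-Prot gives about 1 : 288 and PowerGrid about 1 : 1850, versus the listed 1 : 300 and 1 : 2000.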