Making predictions involving pairwise data
Aditya Menon and Charles Elkan University of California, San Diego September 17, 2010
Overview of talk
◮ Propose a new problem, dyadic label prediction, and explain its importance
◮ Within-network classification is a special case
◮ Example: (user, movie) dyads with a label denoting the rating
◮ Example: (user, user) dyads with a label denoting whether the two users are linked
◮ Entries with value “?” are missing
◮ Link prediction is dyadic prediction on an adjacency matrix
◮ Dyadic prediction is link prediction on a bipartite graph with labelled edges
◮ Will be necessary when comparing methods later in the talk
◮ U ∈ R^{m×k}, V ∈ R^{n×k}, where k ≪ min(m, n) is the number of latent features
◮ Objective:
  min_{U,V} ||X − UV^T||²_O + λ_U ||U||²_F + λ_V ||V||²_F
◮ ||·||²_O is the squared Frobenius norm over the non-missing entries
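A minimal sketch of this factorization, assuming SGD over only the observed entries (with NaN standing in for "?") and a hypothetical toy ratings matrix:

```python
import numpy as np

def factorize(X, k=2, lam_u=0.05, lam_v=0.05, lr=0.05, epochs=500, seed=0):
    """Fit X ~ U V^T by SGD over the observed entries only (NaN = missing),
    minimizing ||X - U V^T||^2_O + lam_u ||U||^2_F + lam_v ||V||^2_F."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = 0.1 * rng.standard_normal((m, k))
    V = 0.1 * rng.standard_normal((n, k))
    rows, cols = np.where(~np.isnan(X))           # indices of observed entries
    for _ in range(epochs):
        for i, j in zip(rows, cols):
            err = X[i, j] - U[i] @ V[j]
            U[i] += lr * (err * V[j] - lam_u * U[i])
            V[j] += lr * (err * U[i] - lam_v * V[j])
    return U, V

# Toy ratings matrix; np.nan plays the role of "?"
X = np.array([[5.0, 4.0, np.nan],
              [4.0, 5.0, 1.0],
              [1.0, 1.0, 5.0]])
U, V = factorize(X)
pred = U @ V.T        # pred[0, 2] imputes the missing rating
```

The missing entry never contributes a gradient, so it is imputed purely from the learned low-rank structure.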
◮ Labels y_i ∈ {0, 1}^L to allow multi-label prediction
◮ Explicit features (side information) for examples may be absent
◮ Relationship information between examples is known via the matrix X
◮ Relationship information may have missing data
◮ Optionally, predict relationship information also
◮ Exploit advantages of dyadic prediction methods, such as the ability to handle missing data
◮ Learn latent features
◮ Can learn them using a latent feature approach
◮ Model X ≈ UV^T and think of U as a feature representation for the row objects
◮ Then fit a linear predictor of the labels:
  min_W ||Y − UW^T||²_F + λ_W ||W||²_F
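Since the second stage is just ridge regression of Y on the learned features U, it has the closed form W^T = (U^T U + λ_W I)^{-1} U^T Y. A sketch with synthetic data (the dimensions and the SVD-based first stage are illustrative assumptions, not details from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k, L = 20, 15, 3, 4
U_true = rng.standard_normal((m, k))
X = U_true @ rng.standard_normal((n, k)).T                       # relationship matrix
Y = (U_true @ rng.standard_normal((L, k)).T > 0).astype(float)   # multi-label tags

# Stage 1: latent features U from a rank-k factorization of X (here via SVD)
u, s, vt = np.linalg.svd(X, full_matrices=False)
U = u[:, :k] * s[:k]

# Stage 2: ridge regression of Y on U: min_W ||Y - U W^T||_F^2 + lam ||W||_F^2
lam = 0.1
W = np.linalg.solve(U.T @ U + lam * np.eye(k), U.T @ Y).T        # shape (L, k)
Y_hat = U @ W.T                                                  # predicted label scores
```

The two stages are decoupled: U is fixed when W is fit, which is exactly what distinguishes this pipeline from the joint objectives discussed next.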
◮ Compute the modularity matrix from the adjacency matrix X:
  Q(X) = X − dd^T / (2m), where d is the degree vector and m the number of edges
◮ Latent features are the top eigenvectors of Q(X)
◮ Use the latent features in standard supervised learning to predict Y
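A sketch of this feature-extraction step, using the standard modularity matrix Q = A − dd^T/(2m) and a toy graph of two triangles joined by one edge (the example graph is an assumption for illustration, not from the talk):

```python
import numpy as np

def socdim_features(A, k=2):
    """SocDim-style latent features: top-k eigenvectors of the
    modularity matrix Q = A - d d^T / (2m) of an undirected graph."""
    d = A.sum(axis=1)                    # degree vector
    m = d.sum() / 2.0                    # number of edges
    Q = A - np.outer(d, d) / (2.0 * m)   # modularity matrix
    vals, vecs = np.linalg.eigh(Q)       # symmetric eigendecomposition
    return vecs[:, np.argsort(vals)[::-1][:k]]   # top-k eigenvectors

# Two triangles joined by a single edge: nodes 0-2 and 3-5
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
U = socdim_features(A, k=1)
# The leading modularity eigenvector separates the two communities:
# entries for nodes {0,1,2} share one sign, {3,4,5} the other.
```

These eigenvector columns then play the role of the feature matrix U in the supervised stage.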
◮ Joint objective:
  min_{U,V,W} ||X − UV^T||²_F + ||Y − UW^T||²_F + (1/2)(λ_U ||U||²_F + λ_V ||V||²_F + λ_W ||W||²_F)
◮ Equivalently, factorize the concatenated matrix:
  min_{U,V,W} ||[X Y] − U[V; W]^T||²_F + (1/2)(λ_U ||U||²_F + λ_V ||V||²_F + λ_W ||W||²_F)
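The equivalence between the separate and concatenated forms (ignoring the regularizers, which are identical on both sides) is easy to verify numerically with random matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, L, k = 8, 6, 3, 2
X = rng.standard_normal((m, n))
Y = rng.standard_normal((m, L))
U = rng.standard_normal((m, k))
V = rng.standard_normal((n, k))
W = rng.standard_normal((L, k))

# Separate residuals:    ||X - U V^T||_F^2 + ||Y - U W^T||_F^2
sep = np.linalg.norm(X - U @ V.T) ** 2 + np.linalg.norm(Y - U @ W.T) ** 2
# Concatenated residual: ||[X Y] - U [V; W]^T||_F^2
cat = np.linalg.norm(np.hstack([X, Y]) - U @ np.vstack([V, W]).T) ** 2
# The two losses coincide for every choice of U, V, W
```

The squared Frobenius norm decomposes entrywise, so concatenating columns of the target simply concatenates the residual blocks.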
◮ Weighted variant trades off the two reconstruction terms:
  min_{U,V,W} ||X − UV^T||²_F + µ||Y − UW^T||²_F + (1/2)(λ_U ||U||²_F + λ_V ||V||²_F + λ_W ||W||²_F)
◮ SMF was designed for directed graphs, unlike SocDim
◮ Deal with missing data in X
◮ Allow arbitrary missingness in Y, including partially observed rows
◮ Exploit side-information about the row objects
◮ Predict calibrated probabilities for tags
◮ Handle nominal and ordinal tags
◮ For each label value r, row i has a latent vector U_i^r and column j has a latent vector V_j^r
◮ Model:
  p(y_ij = r | U, V) ∝ exp((U_i^r)^T V_j^r)
◮ Log-odds between two label values are bilinear:
  log [p(r | U, V) / p(r′ | U, V)] = (U_i^r)^T V_j^r − (U_i^{r′})^T V_j^{r′}
◮ For ordinal labels, predict the expected value ŷ_ij = Σ_r r · p(r | U, V)
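A sketch of this prediction rule, assuming a hypothetical setting with R = 5 rating levels: one latent matrix per label value, a softmax over the label values, then the expectation:

```python
import numpy as np

def lfl_predict(U, V):
    """LFL-style prediction: p(y_ij = r) proportional to exp((U_i^r)^T V_j^r),
    followed by the expected label value sum_r r * p(r).
    U has shape (R, m, k) and V has shape (R, n, k): one matrix per label value."""
    scores = np.einsum('rik,rjk->rij', U, V)      # (U_i^r)^T V_j^r for all i, j, r
    scores -= scores.max(axis=0, keepdims=True)   # numerical stability
    p = np.exp(scores)
    p /= p.sum(axis=0, keepdims=True)             # softmax over label values
    r_vals = np.arange(1, U.shape[0] + 1)         # e.g. ratings 1..R
    return np.einsum('r,rij->ij', r_vals, p)      # expected rating per dyad

rng = np.random.default_rng(0)
R, m, n, k = 5, 4, 3, 2                           # dimensions are illustrative
U = rng.standard_normal((R, m, k))
V = rng.standard_normal((R, n, k))
pred = lfl_predict(U, V)                          # every prediction lies in [1, R]
```

Because the output is an expectation over a proper distribution, the predictions are automatically calibrated to the label range.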
◮ Side information x_i enters through a linear term:
  p(y_ij = r | U, V, w) ∝ exp((U_i^r)^T V_j^r + (w^r)^T x_i)
◮ For graph data, score pairs with a scaling matrix Λ: (U_i^r)^T Λ V_j^r
◮ If rows and columns are distinct sets of entities, let Λ = I
◮ For asymmetric graphs, set V = U and let Λ be unconstrained
◮ For symmetric graphs, set V = U and Λ = I
◮ Training objective: reconstruct X over the observed entries and fit the tags
  min_{U,V,W} ||X − E[X]||²_O + (1/2) Σ_r (λ_U ||U^r||²_F + λ_V ||V^r||²_F) + Σ_{i,l} ℓ(y_il, W_l U_i) + λ_W ||W||²_F
◮ SocDim
◮ SMF
◮ LFL
◮ Since SocDim and SMF operate natively on graphs, the comparison uses graph data
◮ Assume no missing data in X, for fairness to SocDim and SMF
◮ Assume the graph is undirected, as SocDim does
◮ Don't learn latent features in a supervised manner
◮ SocDim solves
  min_{U, Λ diagonal} ||Q(X) − UΛU^T||²_F
◮ A regularized generalization:
  min_{U,Λ} ||X − UΛU^T||²_F + λ_U ||U||²_F + λ_Λ ||Λ||²_F
◮ A sigmoidal link variant:
  min_U ||X − σ(UU^T)||²_F + λ_U ||U||²_F
◮ Unified view:
  min_{U,Λ} ||f(X) − g(U, Λ)||²_F + λ_U ||U||²_F + λ_Λ ||Λ||²_F
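The closed form behind SocDim is the symmetric Eckart–Young property: with Λ diagonal, ||Q − UΛU^T||²_F is minimized by keeping the k eigenpairs of Q with largest magnitude. A numerical check on a random symmetric matrix (the matrix is illustrative, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
S = rng.standard_normal((6, 6))
Q = (S + S.T) / 2                              # symmetric "modularity-like" matrix
k = 2

vals, vecs = np.linalg.eigh(Q)
top = np.argsort(np.abs(vals))[::-1][:k]       # k eigenpairs of largest magnitude
U, Lam = vecs[:, top], np.diag(vals[top])
best = np.linalg.norm(Q - U @ Lam @ U.T) ** 2  # closed-form optimum

# A different choice of k eigenpairs does strictly worse:
other = np.argsort(np.abs(vals))[::-1][1:k + 1]
U2, Lam2 = vecs[:, other], np.diag(vals[other])
worse = np.linalg.norm(Q - U2 @ Lam2 @ U2.T) ** 2
```

This is why SocDim reaches its global optimum in one eigendecomposition, while the sigmoid-link and joint objectives must be attacked with iterative, locally-optimal solvers.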
◮ The graph Laplacian normalizes nodes with respect to their degrees
◮ SocDim uses the modularity matrix, while SMF uses the data matrix
◮ SocDim has a closed-form solution, while SMF does not
◮ SocDim is immune to local optima
◮ Can using the Laplacian matrix with SocDim improve performance?
◮ Can using the modularity or Laplacian matrix with SMF improve performance?
◮ Just impute row/column averages for missing entries?
◮ If so, then SocDim and SMF can be applied to more problems
◮ Shows how dyadic label prediction can solve a difficult version of within-network classification
◮ Evaluation compares the true tags y_il with the predictions ŷ_il