Cross-Graph Learning of Multi-Relational Associations

  1. Cross-Graph Learning of Multi-Relational Associations
     Hanxiao Liu, Yiming Yang
     Carnegie Mellon University
     {hanxiaol, yiming}@cs.cmu.edu
     June 22, 2016

  2. Outline: Task Description · New Contributions · Framework · Scalable Inference · Empirical Evaluation · Summary

  3. Task Description
     Goal: predict associations among heterogeneous graphs.
     [Figure: (a) Drug-Target Interaction: a compound graph (structure similarity) and a protein graph (sequence similarity), linked by Interact edges. (b) Citation Network: author, paper, and venue graphs with Coauthorship, Citation, and Shared-Foci edges, linked by Write, Publish, and Attend edges.]
     Example: "John publishes a reinforcement learning paper at ICML." yields the tuple (John, RL Paper, ICML).

  4. Outline: Task Description · New Contributions · Framework · Scalable Inference · Empirical Evaluation · Summary

  5. New Contributions
     ◮ A unified framework for integrating heterogeneous information from multiple graphs.
     ◮ Transductive learning that leverages both labeled data (sparse) and unlabeled data (massive).
     ◮ A convex approximation enabling scalable inference over a combinatorial number of candidate tuples.

  6. Outline: Task Description · New Contributions · Framework · Scalable Inference · Empirical Evaluation · Summary

  7. Framework: Notation
     ◮ G^(1), G^(2), ..., G^(J) are the individual graphs;
     ◮ n_j is the number of nodes in G^(j);
     ◮ (i_1, i_2, ..., i_J) is a tuple (a multi-relational association);
     ◮ f_{i_1, i_2, ..., i_J} is the predicted score for that tuple;
     ◮ f is a tensor in R^{n_1 × n_2 × ⋯ × n_J}.
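To make the notation concrete, a minimal Python sketch (the sizes n_1 = 4, n_2 = 3, n_3 = 2 are made up for illustration; think authors × papers × venues):

    import numpy as np

    # J = 3 graphs, given as symmetric adjacency/similarity matrices (toy sizes)
    n1, n2, n3 = 4, 3, 2
    rng = np.random.default_rng(0)
    G = [rng.random((n, n)) for n in (n1, n2, n3)]
    G = [(A + A.T) / 2 for A in G]      # symmetrize

    # f holds one score per tuple (i1, i2, i3): n1 * n2 * n3 entries in total
    f = np.zeros((n1, n2, n3))
    f[0, 1, 1] = 1.0                    # e.g., (John, RL Paper, ICML) observed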

  8. Framework: Product Graph
     The product graph P is induced from G^(1), ..., G^(J).
     [Figure: three small graphs G^(1), G^(2), G^(3) combined into a single product graph P.]
     Tensor product: P(G^(1), G^(2), G^(3)) = G^(1) ⊗ G^(2) ⊗ G^(3)
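For two small graphs the tensor product is easy to materialize directly with numpy's kron; a quick sketch with toy adjacency matrices (not from the slides). Each product node is a pair (i, j), and two pairs are adjacent iff both coordinates are adjacent:

    import numpy as np

    # adjacency of a 3-node path (G) and a 2-node edge (H)
    G = np.array([[0, 1, 0],
                  [1, 0, 1],
                  [0, 1, 0]])
    H = np.array([[0, 1],
                  [1, 0]])

    P = np.kron(G, H)   # tensor product: 6 x 6, node (i, j) <-> row 2*i + j
    print(P.shape)      # (6, 6); P[2*i + j, 2*k + l] = G[i, k] * H[j, l]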

  9. Framework
     Why a product graph?
     ◮ It maps the heterogeneous graphs onto a single unified graph, over which labels can be propagated (transductive learning).

  10. Framework
      Assuming
          vec(f) ∼ N(0, P)                                                  (1)
      which implies
          −log p(f | P) ∝ vec(f)^T P^{-1} vec(f) =: ||f||_P^2               (2)
      Optimization problem:
          min_f  ℓ_O(f) + (γ/2) ||f||_P^2                                   (3)
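Where (2) comes from, spelled out: with N = n_1 n_2 ⋯ n_J, P acts as the covariance of the N tuple scores, and the squared P-norm is the only f-dependent term of the Gaussian log-density:

    -\log p(f \mid P)
      = \tfrac{1}{2}\,\mathrm{vec}(f)^{\top} P^{-1} \mathrm{vec}(f)
        + \tfrac{1}{2}\log\det P + \tfrac{N}{2}\log 2\pi
      \;\propto\; \mathrm{vec}(f)^{\top} P^{-1} \mathrm{vec}(f)
      \;=:\; \|f\|_{P}^{2}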

  11. Framework
      For computational tractability, we focus on the spectral graph product (SGP) family of P.
      Spectral Graph Product: the eigensystem of P_κ(G^(1), ..., G^(J)) is parametrized by the eigensystems of the individual graphs, i.e.
          { κ(λ_{i_1}, ..., λ_{i_J}),  ⊗_j v_{i_j} }_{i_1,...,i_J}           (4)
      where λ_{i_j} / v_{i_j} is the i_j-th eigenvalue/eigenvector of the j-th graph.
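A minimal sketch of (4) for J = 2, assuming symmetric adjacency matrices; sgp_eigensystem is a hypothetical helper name, and the final check uses κ(x, y) = x · y, which (per the next slide) recovers the tensor product:

    import numpy as np

    def sgp_eigensystem(G, H, kappa):
        """Eigenvalues/eigenvectors of P_kappa(G, H) from those of G and H."""
        lamG, U = np.linalg.eigh(G)          # G = U diag(lamG) U^T
        lamH, V = np.linalg.eigh(H)
        # every pair (i, j) yields eigenvalue kappa(lamG[i], lamH[j]) ...
        lamP = kappa(lamG[:, None], lamH[None, :]).ravel()
        # ... with eigenvector kron(U[:, i], V[:, j]); build all at once
        VP = np.kron(U, V)                   # columns ordered to match lamP
        return lamP, VP

    G = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
    H = np.array([[0., 1.], [1., 0.]])

    lamP, VP = sgp_eigensystem(G, H, kappa=lambda x, y: x * y)
    # sanity check: kappa(x, y) = x * y reproduces the tensor product
    P = (VP * lamP) @ VP.T
    assert np.allclose(P, np.kron(G, H))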

  12. Framework
      Nice properties of SGP:
      Subsuming basic operations:
          κ(x, y) = x × y  ⇒  P_κ(G, H) = G ⊗ H   (Tensor)                  (5)
          κ(x, y) = x + y  ⇒  P_κ(G, H) = G ⊕ H   (Cartesian)               (6)
      Supporting graph diffusions:
          σ_Heat(P_κ) = I + P_κ + (1/2) P_κ^2 + ⋯ = P_{e^κ}                  (7)
          σ_von-Neumann(P_κ) = I + P_κ + P_κ^2 + ⋯ = P_{1/(1−κ)}             (8)
      Order-insensitive: if κ is commutative, then the SGP is commutative (up to graph isomorphism).
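A quick numeric check of property (7) in the Cartesian case: heat diffusion of G ⊕ H equals the SGP whose kappa is e^(x+y). The sketch reuses sgp_eigensystem from the previous block:

    import numpy as np
    from scipy.linalg import expm

    G = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
    H = np.array([[0., 1.], [1., 0.]])

    # heat diffusion applied to the Cartesian product G (+) H ...
    cartesian = np.kron(G, np.eye(2)) + np.kron(np.eye(3), H)
    heat_of_product = expm(cartesian)

    # ... equals the SGP with kappa(x, y) = exp(x + y)
    lamP, VP = sgp_eigensystem(G, H, kappa=lambda x, y: np.exp(x + y))
    assert np.allclose(heat_of_product, (VP * lamP) @ VP.T)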

  13. Outline: Task Description · New Contributions · Framework · Scalable Inference · Empirical Evaluation · Summary

  14. Scalable Inference
      For a general graph product, the semi-norm is computed as
          ||f||_P^2 = vec(f)^T P^{-1} vec(f)                                 (9)
      For an SGP, P_κ no longer has to be explicitly computed:
          ||f||_{P_κ}^2 = Σ_{i_1,...,i_J=1}^{n_1,...,n_J} [ f(v_{i_1}, ..., v_{i_J}) ]^2 / κ(λ_{i_1}, ..., λ_{i_J})      (10)
      ◮ f(v_{i_1}, v_{i_2}, ..., v_{i_J}) = f ×_1 v_{i_1} ×_2 v_{i_2} ⋯ ×_J v_{i_J};
      ◮ however, even evaluating (10) is expensive.
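A sketch verifying that (9) and (10) agree for J = 2 under κ(x, y) = x · y; random positive-definite matrices stand in for the graph kernels so that P is invertible:

    import numpy as np

    rng = np.random.default_rng(0)

    def spd(n):
        # random symmetric positive-definite matrix (graph-kernel stand-in)
        M = rng.standard_normal((n, n))
        return M @ M.T + n * np.eye(n)

    A, B = spd(4), spd(5)             # two "graphs"
    F = rng.standard_normal((4, 5))   # score tensor f, a matrix for J = 2

    # (9): explicit product graph -- only feasible when n1 * n2 is tiny
    P = np.kron(A, B)
    norm_explicit = F.ravel() @ np.linalg.solve(P, F.ravel())

    # (10): spectral form, never materializing the n1*n2 x n1*n2 matrix P
    lamA, U = np.linalg.eigh(A)
    lamB, V = np.linalg.eigh(B)
    C = U.T @ F @ V                   # C[i, j] = f(v_i, v_j), mode products
    norm_spectral = np.sum(C**2 / np.outer(lamA, lamB))

    assert np.isclose(norm_explicit, norm_spectral)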

  15. Scalable Inference
      Using a low-rank SGP:
      ◮ f lies in the linear span of the eigenvectors of P;
      ◮ eigenvectors of high volatility can be pruned away.
      [Figure: eigenvectors of G (blue), H (red) and P(G, H).]
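A sketch of the pruning step, under the assumption (a reading of the figure, not stated on the slide) that for these similarity matrices the largest eigenvalues correspond to the smoothest eigenvectors:

    import numpy as np

    def top_d_eigensystem(G, d):
        """Keep only the d smoothest eigenpairs of a symmetric matrix."""
        lam, V = np.linalg.eigh(G)    # eigh sorts eigenvalues ascending
        return lam[-d:], V[:, -d:]    # the d largest (assumed smoothest)

    # e.g., reduce each graph from n_j nodes to d_j retained spectral bases:
    # lam1, V1 = top_d_eigensystem(G1, d1); lam2, V2 = top_d_eigensystem(G2, d2)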

  16. Scalable Inference
      Restrict f to the linear span of the "smooth" bases of P:
          f(α) = Σ_{i_1,...,i_J=1}^{d_1,...,d_J} α_{i_1,...,i_J} ⊗_j v_{i_j}            (11)
      where the core tensor α ∈ R^{d_1 × d_2 × ⋯ × d_J} and d_j ≪ n_j.
      The semi-norm becomes
          ||f(α)||_{P_κ}^2 = Σ_{i_1,...,i_J=1}^{d_1,...,d_J} α_{i_1,...,i_J}^2 / κ(λ_{i_1}, λ_{i_2}, ..., λ_{i_J})      (12)
      We then optimize w.r.t. α instead of f. Parameter size: ∏_j n_j → ∏_j d_j.
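A sketch of (11) and (12) for J = 3 using einsum; the orthonormal bases and eigenvalues below are random stand-ins for the retained eigenvectors of real graphs:

    import numpy as np

    rng = np.random.default_rng(0)
    n, d = (40, 30, 20), (5, 4, 3)                   # d_j << n_j

    # retained spectral bases V_j in R^{n_j x d_j} and matching eigenvalues
    V = [np.linalg.qr(rng.standard_normal((n[j], d[j])))[0] for j in range(3)]
    lam = [rng.uniform(1.0, 2.0, d[j]) for j in range(3)]
    alpha = rng.standard_normal(d)                   # core tensor

    # (11): f(alpha) = sum_{abc} alpha_abc * v_a (x) v_b (x) v_c
    f = np.einsum('abc,ia,jb,kc->ijk', alpha, V[0], V[1], V[2])

    # (12): semi-norm with kappa(x, y, z) = x * y * z (tensor-product SGP)
    kappa = np.einsum('a,b,c->abc', lam[0], lam[1], lam[2])
    norm = np.sum(alpha**2 / kappa)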

  17. Scalable Inference
      [Figure: Tucker decomposition of f, where α is the core tensor.]

  18. Scalable Inference
      Revised optimization objective:
          min_{α ∈ R^{d_1 × d_2 × ⋯ × d_J}}  ℓ_O(f(α)) + (γ/2) ||f(α)||_{P_κ}^2          (13)
      Ranking loss over observed tuples O and unobserved tuples Ō, penalizing any unobserved tuple scored above an observed one:
          ℓ_O(f) = (1 / |O × Ō|) Σ_{(i_1,...,i_J) ∈ O} Σ_{(i'_1,...,i'_J) ∈ Ō} [ f_{i'_1,...,i'_J} − f_{i_1,...,i_J} ]_+^2      (14)
          ∇_α = (∂ℓ_O / ∂f) (∂f_{i_1,...,i_J}/∂α − ∂f_{i'_1,...,i'_J}/∂α) + γ α ⊘ κ       (15)
      where ⊘ denotes elementwise division by the tensor of κ(λ_{i_1}, ..., λ_{i_J}) values.
      Tensor algebra is carried out on GPUs.
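A sketch of the ranking loss (14), assuming the margin-free squared hinge shown above; O and O_bar are small lists of index tuples here (in practice the unobserved set would be subsampled):

    import numpy as np

    def ranking_loss(f, observed, unobserved):
        """Squared-hinge pairwise ranking loss: every observed tuple
        should score above every unobserved one (a sketch of Eq. 14)."""
        pos = np.array([f[t] for t in observed])     # f_{i_1,...,i_J}
        neg = np.array([f[t] for t in unobserved])   # f_{i'_1,...,i'_J}
        margins = neg[None, :] - pos[:, None]        # violation if > 0
        return np.sum(np.maximum(margins, 0.0)**2) / (len(pos) * len(neg))

    f = np.zeros((4, 3, 2))
    f[0, 1, 1], f[2, 0, 0] = 1.0, 0.2                # two observed tuples
    O = [(0, 1, 1), (2, 0, 0)]
    O_bar = [(1, 1, 0), (3, 2, 1)]
    print(ranking_loss(f, O, O_bar))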

  19. Outline: Task Description · New Contributions · Framework · Scalable Inference · Empirical Evaluation · Summary

  20. Empirical Evaluation
      Datasets:
      ◮ Enzyme: 445 compounds, 664 proteins.
      ◮ DBLP: 34K authors, 11K papers, 22 venues.
      Representative baselines:
      ◮ TF / GRTF: Tensor Factorization / Graph-Regularized Tensor Factorization
      ◮ NN: one-class Nearest Neighbor
      ◮ RSVM: Ranking SVMs
      ◮ LTKM: Low-Rank Tensor Kernel Machines
