
Learning a Distance Metric for Structured Network Prediction - PowerPoint PPT Presentation



  1. Learning a Distance Metric for Structured Network Prediction Stuart Andrews and Tony Jebara Columbia University Learning to Compare Examples Workshop, December 8, 2006

  2. Outline • Introduction • Context, motivation & problem definition • Contributions • Structured network characterization • Network prediction model • Distance-based score function • Maximum-margin learning • Experiments • 1-Matchings on toy data • Equivalence networks on face images • Preliminary results on social networks • Future & related work, summary and conclusions

  3. Context • Pattern classification • Inputs & outputs • Independent and identically distributed • Pattern classification for structured objects • Sets of inputs & outputs • Model dependencies amongst output variables • Parameterize the model using a Mahalanobis distance metric

  4. Motivation for structured network prediction • Man-made and naturally formed networks exhibit a high degree of structural regularity

  5. Motivation • Scale-free networks • [Figure: protein-interaction network visualization by Jeffrey Heer, Berkeley; Barabási & Oltvai, Nature Genetics, 2004]

  6. Motivation • Equivalence networks • [Figure: equivalence network on Olivetti face images, a union of vertex-disjoint complete subgraphs]

  7. Structured network prediction • Given: n entities { x_1, ..., x_n } with attributes x_k ∈ R^d, and a structural prior on networks • Output: a network y = ( y_{j,k} ), y_{j,k} ∈ { 0, 1 }, linking similar entities with the desired structure • [Figure: five numbered nodes and the corresponding 0/1 adjacency matrix]
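To make the input/output representation concrete, here is a minimal numpy sketch (the attribute values and links are invented for illustration; they are not from the slides): entities are rows of an n × d matrix, and a candidate network y is a symmetric 0/1 adjacency matrix.

```python
import numpy as np

# n = 5 entities, each with d = 3 attributes (values invented for illustration)
X = np.array([[0.1, 0.2, 0.0],
              [0.1, 0.3, 0.1],
              [0.9, 0.8, 1.0],
              [1.0, 0.7, 0.9],
              [0.5, 0.5, 0.5]])

# A candidate network: y[j, k] = 1 if entities j and k are linked, 0 otherwise.
# y is symmetric with an empty diagonal (no self-loops).
y = np.zeros((5, 5), dtype=int)
y[0, 1] = y[1, 0] = 1   # link the first pair of similar entities
y[2, 3] = y[3, 2] = 1   # link the second pair of similar entities
```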

  8. Applications • Tasks • Initializing • Augmenting • Filtering of networks • Domains • E-commerce • Social network analysis • Network biology

  9. Challenges for SNP • How can we take the structural prior into account? • Complex dependencies amongst atomic edge predictions • What similarity should we use? • Avoid engineering a similarity metric for each domain

  10. Structural network priors - 1 • Degree δ(v) of a node v: the number of incident edges (e.g. δ(v) = 5 in the figure) • Degree distribution: the probability of a node having degree k, for all k
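A short numpy sketch of these two quantities, assuming the network is stored as a symmetric 0/1 adjacency matrix (this helper is illustrative, not part of the original talk):

```python
import numpy as np

def degree_distribution(y):
    """Return node degrees delta(v) and the empirical degree distribution p(k)."""
    degrees = y.sum(axis=1)                      # delta(v): number of incident edges
    ks, counts = np.unique(degrees, return_counts=True)
    p_k = counts / counts.sum()                  # fraction of nodes with degree k
    return degrees, dict(zip(ks.tolist(), p_k.tolist()))
```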

  11. Degree distributions • [Figure: log-log degree distribution plots, p(k) versus degree k, for a protein-interaction network (4233 nodes, "rich get richer"), a social network (6848 nodes), and an equivalence network (300 nodes)]

  12. Structural network priors - 2 • Combinatorial families • Chains • Trees & forests • Cycles • Unions of disjoint complete subgraphs • Generalized matchings

  13. B-matchings • A b-matching has δ(v) = b for (almost) all nodes v, i.e. the degree distribution p(k) concentrates at k = b • [Figure: 1-matching, 2-matching, 3-matching and 4-matching on five nodes] • y_{j,k} ∈ { 0, 1 }, Σ_k y_{j,k} = b ∀ j, Σ_j y_{j,k} = b ∀ k, for y ∈ B • We consider b-matching networks because they are flexible and efficient
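The degree constraints above are easy to check programmatically; a small sketch (assuming a symmetric 0/1 adjacency matrix, and ignoring the "almost all" relaxation mentioned on the slide):

```python
import numpy as np

def is_b_matching(y, b):
    """Check that y is 0/1 and every node has exactly b incident edges."""
    binary = np.all((y == 0) | (y == 1))
    rows_ok = np.all(y.sum(axis=1) == b)   # sum_k y_{j,k} = b for all j
    cols_ok = np.all(y.sum(axis=0) == b)   # sum_j y_{j,k} = b for all k
    return bool(binary and rows_ok and cols_ok)
```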

  14. Predictive Model • Maximum-weight b-matching as the predictive model: 1. Receive nodes and attributes 2. Compute edge weights s = ( s_{j,k} ), s_{j,k} ∈ R 3. Select a b-matching with maximal weight, max_{y ∈ B} y^T s = max_{y ∈ B} Σ_{j,k} y_{j,k} s_{j,k} • B-matching requires O(n^3) time
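For the special case b = 1, the maximum-weight selection in step 3 can be illustrated with networkx's general matching routine (a sketch only; the talk's O(n^3) b-matching solver for general b is not reproduced here, and the weight matrix s is assumed symmetric with larger values meaning "more likely to be linked"):

```python
import networkx as nx
import numpy as np

def max_weight_one_matching(s):
    """Select a maximum-weight 1-matching from a symmetric weight matrix s."""
    n = s.shape[0]
    G = nx.Graph()
    G.add_nodes_from(range(n))
    for j in range(n):
        for k in range(j + 1, n):
            G.add_edge(j, k, weight=float(s[j, k]))
    matching = nx.max_weight_matching(G, maxcardinality=True)
    y = np.zeros((n, n), dtype=int)
    for j, k in matching:                 # matching is a set of (j, k) pairs
        y[j, k] = y[k, j] = 1
    return y
```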

  15. Structured network prediction • The question that remains: how do we compute the weights?

  16. Learning the weights • Weights are parameterized by a Mahalanobis distance metric: s_{j,k} = ( x_j − x_k )^T Q ( x_j − x_k ), with Q ⪰ 0 • In other words, we want to find the best linear transformation (rotation & scaling) to facilitate b-matching
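A direct numpy sketch of this score function (the sign convention is left open here: the slides write the weight as a Mahalanobis distance, so depending on how the matching objective is set up one may feed the matching solver s or −s):

```python
import numpy as np

def mahalanobis_edge_weights(X, Q):
    """Compute s_{j,k} = (x_j - x_k)^T Q (x_j - x_k) for all pairs of rows of X."""
    n = X.shape[0]
    s = np.zeros((n, n))
    for j in range(n):
        for k in range(n):
            diff = X[j] - X[k]
            s[j, k] = diff @ Q @ diff
    return s
```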

  17. Learning the weights • We propose to learn the weights from one or more partially observed networks • We observe the attributes of all nodes, but only a subset of the edges • Transductive approach • Learn weights to "fit" the training edges • While structured network prediction is performed over both training and test edges • [Figure: network with training edges and test edges highlighted]

  18. Example • Given the following nodes & edges

  19. Example • 1-matching • [Figure: the matrix Q and the resulting 1-matching]

  20. Example • 1-matching

  21. Example • 1-matching

  22. Maximum-margin (Taskar et al. 2005) • We use the dual-extragradient algorithm to learn Q • Define the margin to be the minimum gap between the predictive value s_Q(x)^T y of the true structure y ∈ B and the predictive values s_Q(x)^T y_1, s_Q(x)^T y_2, s_Q(x)^T y_3, ... of each possible alternative structure y_1, y_2, ... ∈ B, where s_Q(x) = vec( ( d_{j,k} ) ) stacks the pairwise distances d_{1,1}, d_{1,2}, ...

  23. Maximum-margin (Taskar et al. 2005) • (continuing the previous slide) The margin requirement against the first alternative is s_Q(x)^T ( y − y_1 ) ≥ 1

  24. Maximum-margin (Taskar et al. 2005) • (continuing the previous slide) Likewise for every alternative structure: s_Q(x)^T ( y − y_1 ) ≥ 1, s_Q(x)^T ( y − y_2 ) ≥ 1, ...
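These margin constraints can be evaluated directly once the scores and structures are flattened into vectors; a small illustrative helper (in practice the set of alternatives y_1, y_2, ... is handled implicitly by the dual-extragradient algorithm rather than enumerated):

```python
import numpy as np

def margin_gaps(s, y_true, alternatives):
    """Return s_Q(x)^T (y - y_i) for each alternative y_i; values below 1 violate the margin."""
    s_vec = np.asarray(s, dtype=float).ravel()
    y_vec = np.asarray(y_true, dtype=float).ravel()
    return [float(s_vec @ (y_vec - np.asarray(y_i, dtype=float).ravel()))
            for y_i in alternatives]
```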

  25. Maximum-margin • You can think of the dual-extragradient algorithm as successively minimizing the violation of the gap constraints • Each iteration focuses on the "worst offending network": 1. y_bad = argmin_{ỹ ∈ B} s_Q(x)^T ỹ

  26. Maximum-margin • (continuing the previous slide) 1. y_bad = argmin_{ỹ ∈ B} s_Q(x)^T ỹ 2. Q = Q − ε ∂ gap( y, y_bad ) / ∂Q

  27. Maximum-margin • (continuing the previous slide) Since d_{j,k} = ( x_j − x_k )^T Q ( x_j − x_k ) = ⟨ Q, ( x_j − x_k )( x_j − x_k )^T ⟩ is linear in Q, the gradient of the gap is ∂ gap( y, y_bad ) / ∂Q = Σ_{jk ∈ FP} ( x_j − x_k )( x_j − x_k )^T − Σ_{jk ∈ FN} ( x_j − x_k )( x_j − x_k )^T

  28. Maximum-margin • (continuing the previous slide) • Caveat: this is not the whole story! Thanks to Simon Lacoste-Julien for help debugging
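A schematic version of step 2, using the gradient expression from the previous slide (as the caveat says, this is not the full dual-extragradient algorithm; the step size, the projection back onto Q ⪰ 0, and the loss-augmentation details are only sketched here):

```python
import numpy as np

def gap_gradient(X, y_true, y_bad):
    """Sum of (x_j - x_k)(x_j - x_k)^T over false-positive edges minus false-negative edges."""
    n, d = X.shape
    grad = np.zeros((d, d))
    for j in range(n):
        for k in range(n):
            # Each undirected edge is visited as both (j, k) and (k, j),
            # which only rescales the gradient by 2 in this sketch.
            outer = np.outer(X[j] - X[k], X[j] - X[k])
            if y_bad[j, k] == 1 and y_true[j, k] == 0:     # false positive: jk in FP
                grad += outer
            elif y_bad[j, k] == 0 and y_true[j, k] == 1:   # false negative: jk in FN
                grad -= outer
    return grad

def project_psd(Q):
    """Project a symmetric matrix onto the cone of positive semidefinite matrices."""
    w, V = np.linalg.eigh((Q + Q.T) / 2)
    return V @ np.diag(np.clip(w, 0.0, None)) @ V.T

# One schematic update with a small step size eps (placeholder value):
# Q = project_psd(Q - eps * gap_gradient(X, y_true, y_bad))
```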

  29. Experiments • How does it work in practice?
