

SLIDE 1

Learning a Distance Metric for Structured Network Prediction

Stuart Andrews and Tony Jebara, Columbia University
Learning to Compare Examples Workshop, December 8, 2006

SLIDE 2

Outline

  • Introduction
  • Context, motivation & problem definition
  • Contributions
  • Structured network characterization
  • Network prediction model
  • Distance-based score function
  • Maximum-margin learning
  • Experiments
  • 1-Matchings on toy data
  • Equivalence networks on face images
  • Preliminary results on social networks
  • Future & related work, summary and conclusions


SLIDE 3

Context

  • Pattern classification
  • Inputs & outputs
  • Independent and identically distributed
  • Pattern classification for structured objects
  • Sets of inputs & outputs
  • Model dependencies amongst output variables
  • Parameterize model using a Mahalanobis distance metric

SLIDE 4

Motivation for structured network prediction

  • Man-made and naturally formed networks exhibit a high degree of structural regularity

SLIDE 5

Motivation

  • Scale-free networks

[Figure: protein-interaction network, Barabási & Oltvai, Nature Genetics, 2004; visualization by Jeffrey Heer, Berkeley]

SLIDE 6

Motivation

  • Equivalence networks

[Figure: equivalence network on Olivetti face images - a union of vertex-disjoint complete subgraphs]

SLIDE 7

Structured network prediction

  • Given
  • entities with attributes
  • And a structural prior on networks
  • Output
  • Network of similar entities with desired structure

Formally: given n entities with attributes {x_1, . . . , x_n}, x_k ∈ R^d, predict an edge-indicator matrix y = (y_{j,k}) with y_{j,k} ∈ {0, 1}.

[Figure: two example networks on nodes 1-5]

SLIDE 8

Applications

  • Tasks
  • Initializing
  • Augmenting
  • Filtering of networks
  • Domains
  • E-commerce
  • Social network analysis
  • Network biology


SLIDE 9

Challenges for SNP

  • How can we take the structural prior into account?
  • Complex dependencies amongst atomic edge predictions
  • What similarity should we use?
  • Avoid engineering similarity metric for each domain


SLIDE 10

Structural network priors - 1

  • Degree δ(v) of a node v
  • Number of incident edges (e.g., δ(v) = 5 for the node shown in the figure)
  • Degree distribution
  • Probability p(k) of a node having degree k, for all k
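Both quantities fall out of a few lines of code. A minimal sketch, assuming the network is given as a symmetric 0/1 adjacency matrix (Python and numpy are this sketch's choices; the talk shows no code):

```python
# Empirical degree distribution p(k) from a symmetric 0/1 adjacency matrix.
import numpy as np

def degree_distribution(y):
    """Return (ks, p) where p[i] is the fraction of nodes with degree ks[i]."""
    degrees = y.sum(axis=1)                      # delta(v): number of incident edges
    ks, counts = np.unique(degrees, return_counts=True)
    return ks, counts / len(degrees)

# Example: a 4-cycle, where every node has degree 2.
y = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]])
print(degree_distribution(y))                    # (array([2]), array([1.]))
```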

SLIDE 11

Degree distributions

[Figure: log-log degree distributions, p(k) vs. k, for four networks: a protein interaction network (4233 nodes), a social network (6848 nodes), a "rich get richer" network (4000 nodes), and an equivalence network (300 nodes)]

SLIDE 12

Structural network priors - 2

  • Combinatorial families
  • Chains
  • Trees & forests
  • Cycles
  • Unions of disjoint complete subgraphs
  • Generalized matchings


SLIDE 13

B-matchings

  • A b-matching has δ(v) = b for (almost) all v
  • We consider b-matching networks because they are flexible and efficient

[Figure: 1-matching, 2-matching, 3-matching, and 4-matching on nodes 1-5]

Formally, y ∈ B, where

B = { y : Σ_k y_{j,k} = b ∀j,  Σ_j y_{j,k} = b ∀k,  y_{j,k} ∈ {0, 1} }
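Membership in B is easy to verify for a candidate network. A minimal sketch, assuming a symmetric 0/1 edge-indicator matrix with no self-loops; it enforces δ(v) = b exactly, whereas the slide's "(almost) all" allows relaxed variants:

```python
# Check whether an edge-indicator matrix y is a (strict) b-matching.
import numpy as np

def is_b_matching(y, b):
    y = np.asarray(y)
    binary = np.isin(y, (0, 1)).all()            # y_{j,k} in {0, 1}
    symmetric = (y == y.T).all()                 # undirected network
    no_self_loops = (np.diag(y) == 0).all()
    degrees_ok = (y.sum(axis=1) == b).all()      # delta(v) = b for all v
    return bool(binary and symmetric and no_self_loops and degrees_ok)

# A 1-matching on 4 nodes: pairs (0, 1) and (2, 3).
y = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]])
assert is_b_matching(y, b=1)
```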

SLIDE 14

Predictive Model

  • Maximum-weight b-matching as the predictive model:

1. Receive nodes and attributes
2. Compute edge weights s = (s_{j,k}), s_{j,k} ∈ R
3. Select a b-matching with maximal weight:

max_{y ∈ B} Σ_{j,k} y_{j,k} s_{j,k} = max_{y ∈ B} y^T s

  • Computing a b-matching requires O(n^3) time
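For the special case b = 1, an off-the-shelf maximum-weight matching can stand in for the O(n^3) b-matching solver assumed here. A sketch of steps 1-3; networkx and the edge_weight callable are illustrative choices, not from the talk:

```python
# Steps 1-3 for b = 1, using networkx's general max-weight matching.
import networkx as nx
import numpy as np

def predict_1_matching(X, edge_weight):
    """X: (n, d) node attributes; edge_weight(x_j, x_k) -> real-valued s_{j,k}."""
    n = len(X)                                   # step 1: receive nodes/attributes
    G = nx.Graph()
    for j in range(n):
        for k in range(j + 1, n):
            G.add_edge(j, k, weight=edge_weight(X[j], X[k]))   # step 2
    # Step 3: max-weight matching; maxcardinality=True insists every node is matched.
    return nx.max_weight_matching(G, maxcardinality=True)

X = np.random.randn(6, 2)
closeness = lambda a, b: -float(np.sum((a - b) ** 2))  # nearby pairs score higher
print(predict_1_matching(X, closeness))                # a set of matched node pairs
```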

SLIDE 15

Structured network prediction

  • The question that remains is how do we compute the weights?

SLIDE 16

Learning the weights

  • Weights are parameterized by a Mahalanobis distance metric:

s_{j,k} = (x_j − x_k)^T Q (x_j − x_k),   Q ⪰ 0

  • In other words, we want to find the best linear transformation (rotation & scaling) to facilitate b-matching
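Computing all pairwise weights under a given Q takes one einsum. A minimal sketch; forming Q = L L^T is simply a convenient way to guarantee positive semidefiniteness in the test:

```python
# All pairwise Mahalanobis weights s_{j,k} = (x_j - x_k)^T Q (x_j - x_k).
import numpy as np

def mahalanobis_weights(X, Q):
    diffs = X[:, None, :] - X[None, :, :]        # (n, n, d) pairwise x_j - x_k
    return np.einsum('jkd,de,jke->jk', diffs, Q, diffs)

n, d = 5, 3
X = np.random.randn(n, d)
L = np.random.randn(d, d)
Q = L @ L.T                                      # PSD by construction
S = mahalanobis_weights(X, Q)                    # (n, n), zero diagonal
assert np.allclose(S, S.T) and S.min() > -1e-9   # symmetric and nonnegative
```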

SLIDE 17

Learning the weights

[Figure: a partially observed network with train edges and test edges]

  • We propose to learn the weights from one or more partially observed networks
  • We observe the attributes of all nodes
  • But only a subset of the edges
  • Transductive approach
  • Learn weights to “fit” training edges
  • While structured network prediction is performed over training and test edges

SLIDE 18

Example


  • Given the following nodes & edges
SLIDE 19

Example

Q =

  • 1-matching
SLIDE 20

Example

  • 1-matching
SLIDE 21

Example

  • 1-matching
SLIDES 22-24

Maximum-margin

  • We use the dual-extragradient algorithm (Taskar et al. 2005) to learn Q
  • Define the margin to be the minimum gap between the predictive values of the true structure y ∈ B and each possible alternative structure y_1, y_2, . . . ∈ B:

s_Q(x)^T (y − y_i) ≥ 1   for each alternative y_i

where s_Q(x) = vec([ d_{1,1} d_{1,2} . . . ]) stacks the edge scores into a vector
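The gap constraints themselves are cheap to check once the distances are in hand. A minimal sketch, where flattening the upper triangle is this sketch's reading of vec():

```python
# Margin gaps s_Q(x)^T (y - y_i) for a handful of alternative networks.
import numpy as np

def margin_gaps(s, y_true, alternatives):
    """s, y_true, and each alternative: (n, n) arrays over the same node set."""
    j, k = np.triu_indices_from(s, k=1)          # each undirected edge once
    s_vec, y_vec = s[j, k], y_true[j, k]
    return [float(s_vec @ (y_vec - alt[j, k])) for alt in alternatives]

# The constraints above hold when every returned gap is >= 1.
```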

SLIDES 25-28

Maximum-margin

  • You can think of the dual-extragradient algorithm as successively minimizing the violation of the gap constraints
  • Each iteration focuses on the “worst offending network”:

1. y_bad = argmin_{ỹ ∈ B} s_Q(x)^T ỹ

2. Q ← Q − ε ∂gap(y, y_bad) / ∂Q

  • Since d_{j,k} = (x_j − x_k)^T Q (x_j − x_k) = ⟨Q, (x_j − x_k)(x_j − x_k)^T⟩, the score is linear in Q, and

∂gap / ∂Q = Σ_{jk ∈ FP} (x_j − x_k)(x_j − x_k)^T − Σ_{jk ∈ FN} (x_j − x_k)(x_j − x_k)^T

  • Caveat: this is not the whole story! Thanks to Simon Lacoste-Julien for help debugging
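Heeding that caveat, here is only a rough sketch of one iteration for b = 1, following the slide's formulas literally; the matching oracle, the FP/FN bookkeeping relative to the true network, and the eigenvalue clipping that keeps Q positive semidefinite are all assumptions of the sketch, not the dual-extragradient method as published:

```python
# One gradient-style update on Q in the spirit of steps 1-2 above (b = 1 only).
import networkx as nx
import numpy as np

def one_update(X, y_true, Q, eps=0.01):
    n = len(X)
    diffs = X[:, None, :] - X[None, :, :]
    d = np.einsum('jkd,de,jke->jk', diffs, Q, diffs)       # current edge distances

    # Step 1: worst offender = argmin over perfect matchings of total distance.
    G = nx.Graph()
    for j in range(n):
        for k in range(j + 1, n):
            G.add_edge(j, k, weight=-d[j, k])              # max weight == min distance
    y_bad = np.zeros_like(y_true)
    for j, k in nx.max_weight_matching(G, maxcardinality=True):
        y_bad[j, k] = y_bad[k, j] = 1

    # Step 2: gradient step, with FP/FN edges taken relative to y_true.
    grad = np.zeros_like(Q)
    for j in range(n):
        for k in range(j + 1, n):
            outer = np.outer(diffs[j, k], diffs[j, k])
            if y_bad[j, k] and not y_true[j, k]:           # false positive
                grad += outer
            elif y_true[j, k] and not y_bad[j, k]:         # false negative
                grad -= outer
    Q = Q - eps * grad

    # Clip negative eigenvalues so Q stays a valid (PSD) metric.
    w, V = np.linalg.eigh(Q)
    return V @ np.diag(np.clip(w, 0.0, None)) @ V.T
```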

SLIDE 29

Experiments


  • How does it work in practice?
SLIDE 30

Error metrics for SNP

[Figure: ROC curve (fp rate vs. recall); raw dist: acc 98.0, recall 0.0; learned dist: acc 99.5, recall 72.2]

  • Recall & Hamming loss (#FP + #FN)
  • Reward the correct structure, but not the distance metric
  • We construct a structure-sensitive ROC curve
  • Structure predictions are blended with distances:

ỹ_{j,k} = y_{j,k} + ε exp(−d_{j,k})

  • We can now measure
  • Area under the ROC curve (AUC)
  • Recall
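A minimal sketch of the blended ROC; scikit-learn supplies the curve and the AUC (an assumption, since the talk names no implementation):

```python
# Structure-sensitive ROC: blend hard structured predictions with distances.
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

def structure_sensitive_roc(y_true, y_pred, d, eps=1e-3):
    """y_true, y_pred: (n, n) 0/1 edge indicators; d: (n, n) learned distances."""
    y_blend = y_pred + eps * np.exp(-d)          # tilde-y from the formula above
    j, k = np.triu_indices_from(y_blend, k=1)    # score each undirected edge once
    fpr, tpr, _ = roc_curve(y_true[j, k], y_blend[j, k])
    auc = roc_auc_score(y_true[j, k], y_blend[j, k])
    return auc, fpr, tpr
```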

SLIDE 31

Example

  • 300 nodes in 2D
  • 1-matching structure
  • X,Y features

SLIDE 32

Example

[Figure: learned Q, plus AUC vs. iteration, top-N recall vs. iteration, ROC, and precision/recall curves comparing raw vs. learned distance on test and validation edges; raw dist: acc 98.0, recall 0.0; learned dist: acc 99.5, recall 72.2]

SLIDE 33

Equivalence networks

  • Olivetti face images
  • 300 images, 10 per person, 30 PCA features
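A sketch of this preprocessing using sklearn's bundled Olivetti faces; note the bundled set has 400 images of 40 people, so selecting the talk's 300-image subset is left as an assumption:

```python
# Hypothetical preprocessing: load Olivetti faces, project to 30 PCA features.
from sklearn.datasets import fetch_olivetti_faces
from sklearn.decomposition import PCA

faces = fetch_olivetti_faces()                        # images as flat vectors
X = PCA(n_components=30).fit_transform(faces.data)    # (400, 30) node attributes
labels = faces.target                                 # person ids: one clique each
```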

SLIDE 34

Olivetti face images

[Figure: learned Q, plus AUC vs. iteration, top-N recall vs. iteration, ROC, and precision/recall curves comparing raw vs. learned distance on test and validation edges; raw dist: acc 96.6, recall 68.0; learned dist: acc 98.5, recall 91.2]

SLIDE 35

Olivetti face images


Reconstructions of rows of sqrt(Q)

SLIDE 36

Olivetti face images


Reconstructions of rows of sqrt(Q) - using scaled rows (x8)

SLIDE 37

Olivetti face images


Reconstructions of rows of sqrt(Q) - using scaled rows (x11)

SLIDE 38

Olivetti face images


Reconstructions of rows of sqrt(Q) - using scaled rows (x14)

SLIDE 39

Social network ... and future work

  • 6848 users
  • “Assume” b-matching structure
  • Bag-of-words features (favorite music, books, etc.)

[Figure: social network visualization by Jeffrey Heer, Berkeley]

SLIDE 40

Social network ... and future work

[Figure: learned Q and ROC curve (fp rate vs. recall); raw dist: acc 94.0, recall 2.4; learned dist: acc 93.8, recall 1.6]

SLIDE 41

Future work

  • Selecting the parameter b
  • Learning and matching to the true degree distribution
  • Learning over alternate combinatorial structures such as trees, forests, cliques

SLIDE 42

Related Work

  • Structured output models
  • B. Taskar, S. Lacoste-Julien, and M. I. Jordan, “Structured prediction, dual extragradient and Bregman projections,” NIPS 2005
  • I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun, “Large Margin Methods for Structured and Interdependent Output Variables,” JMLR
  • F. Sha and L. Saul, “Large Margin Gaussian Mixture Models for Automatic Speech Recognition,” NIPS 2006
  • Network reconstruction
  • A. Culotta, R. Bekkerman, and A. McCallum, “Extracting social networks and contact information from email and the web,” AAAI 2004
  • M. Rabbat, M. Figueiredo, and R. Nowak, “Network inference from co-occurrences,” University of Wisconsin 2006
  • Network simulation
  • R. Albert and A.-L. Barabási, “Statistical mechanics of complex networks,” Reviews of Modern Physics, and many others ...

SLIDE 43

Related Work

  • Distance metric learning
  • J. Goldberger, S. Roweis, G. Hinton, and R. Salakhutdinov, “Neighbourhood components analysis,” NIPS 2004
  • E. Xing, A. Ng, M. Jordan, and S. Russell, “Distance metric learning, with application to clustering with side-information,” NIPS 2003
  • S. Shalev-Shwartz, Y. Singer, and A. Ng, “Online and batch learning of pseudo-metrics,” ICML 2004, and many others ...

SLIDE 44

Conclusions

  • We address a novel structured network prediction problem
  • We developed a structured output model that uses structural network priors to make predictions
  • We parameterized the model using a Mahalanobis distance metric
  • We demonstrated that it is possible to learn a distance suitable for structured network prediction
  • The advantage of using a structured output model to predict edges is that we obtain higher recall for comparable precision / FP rates

SLIDE 45

Thank you for your attention. Questions & comments?