

SLIDE 1

Learning a Distance Metric for Structured Network Prediction

Stuart Andrews and Tony Jebara, Columbia University
Learning to Compare Examples Workshop, December 8, 2006

SLIDE 2

Outline

  • Introduction
  • Context, motivation & problem definition
  • Contributions
  • Structured network characterization
  • Network prediction model
  • Distance-based score function
  • Maximum-margin learning
  • Experiments
  • 1-Matchings on toy data
  • Equivalence networks on face images
  • Preliminary results on social networks
  • Future & related work, summary and conclusions


SLIDE 3

Context

  • Pattern classification
  • Inputs & outputs
  • Independent and identically distributed
  • Pattern classification for structured objects
  • Sets of inputs & outputs
  • Model dependencies amongst output variables
  • Parameterize model using a Mahalanobis distance metric

SLIDE 4

Motivation for structured network prediction

  • Man-made and naturally formed networks exhibit a high degree of structural regularity

SLIDE 5

Motivation

  • Scale-free networks

[Figure: protein-interaction network, Barabási & Oltvai, Nature Genetics, 2004; visualization by Jeffrey Heer, Berkeley]

SLIDE 6

Motivation

  • Equivalence networks

[Figure: equivalence network on Olivetti face images - a union of vertex-disjoint complete subgraphs]

SLIDE 7

Structured network prediction

  • Given
  • entities with attributes
  • And a structural prior on networks
  • Output
  • Network of similar entities with desired structure

Formally: given n entities with attributes {x_1, . . . , x_n}, x_k ∈ R^d, predict an edge-indicator matrix y = (y_{j,k}) with y_{j,k} ∈ {0, 1}.

[Figure: two example networks on nodes 1-5]

SLIDE 8

Applications

  • Tasks
  • Initializing
  • Augmenting
  • Filtering of networks
  • Domains
  • E-commerce
  • Social network analysis
  • Network biology


SLIDE 9

Challenges for SNP

  • How can we take the structural prior into account?
  • Complex dependencies amongst atomic edge predictions
  • What similarity should we use?
  • Avoid engineering similarity metric for each domain


SLIDE 10

Structural network priors - 1

  • Degree δ(v) of a node v
  • Number of incident edges (e.g., δ(v) = 5 for the node shown in the figure)
  • Degree distribution
  • Probability p(k) of a node having degree k, for all k
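Both quantities fall out of a few lines of code. A minimal sketch, assuming the network is given as a symmetric 0/1 adjacency matrix (Python and numpy are this sketch's choices; the talk shows no code):

```python
# Empirical degree distribution p(k) from a symmetric 0/1 adjacency matrix.
import numpy as np

def degree_distribution(y):
    """Return (ks, p) where p[i] is the fraction of nodes with degree ks[i]."""
    degrees = y.sum(axis=1)                      # delta(v): number of incident edges
    ks, counts = np.unique(degrees, return_counts=True)
    return ks, counts / len(degrees)

# Example: a 4-cycle, where every node has degree 2.
y = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]])
print(degree_distribution(y))                    # (array([2]), array([1.]))
```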

SLIDE 11

Degree distributions

[Figure: log-log degree distributions, p(k) vs. k, for four networks: a protein interaction network (4233 nodes), a social network (6848 nodes), a "rich get richer" network (4000 nodes), and an equivalence network (300 nodes)]

SLIDE 12

Structural network priors - 2

  • Combinatorial families
  • Chains
  • Trees & forests
  • Cycles
  • Unions of disjoint complete subgraphs
  • Generalized matchings


SLIDE 13

B-matchings

  • A b-matching has δ(v) = b for (almost) all v
  • We consider b-matching networks because they are flexible and efficient

[Figure: 1-matching, 2-matching, 3-matching, and 4-matching on nodes 1-5]

Formally, y ∈ B, where

B = { y : Σ_k y_{j,k} = b ∀j,  Σ_j y_{j,k} = b ∀k,  y_{j,k} ∈ {0, 1} }
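Membership in B is easy to verify for a candidate network. A minimal sketch, assuming a symmetric 0/1 edge-indicator matrix with no self-loops; it enforces δ(v) = b exactly, whereas the slide's "(almost) all" allows relaxed variants:

```python
# Check whether an edge-indicator matrix y is a (strict) b-matching.
import numpy as np

def is_b_matching(y, b):
    y = np.asarray(y)
    binary = np.isin(y, (0, 1)).all()            # y_{j,k} in {0, 1}
    symmetric = (y == y.T).all()                 # undirected network
    no_self_loops = (np.diag(y) == 0).all()
    degrees_ok = (y.sum(axis=1) == b).all()      # delta(v) = b for all v
    return bool(binary and symmetric and no_self_loops and degrees_ok)

# A 1-matching on 4 nodes: pairs (0, 1) and (2, 3).
y = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]])
assert is_b_matching(y, b=1)
```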

SLIDE 14

Predictive Model

  • Maximum-weight b-matching as the predictive model:

1. Receive nodes and attributes
2. Compute edge weights s = (s_{j,k}), s_{j,k} ∈ R
3. Select a b-matching with maximal weight:

max_{y ∈ B} Σ_{j,k} y_{j,k} s_{j,k} = max_{y ∈ B} y^T s

  • Computing a b-matching requires O(n^3) time
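For the special case b = 1, an off-the-shelf maximum-weight matching can stand in for the O(n^3) b-matching solver assumed here. A sketch of steps 1-3; networkx and the edge_weight callable are illustrative choices, not from the talk:

```python
# Steps 1-3 for b = 1, using networkx's general max-weight matching.
import networkx as nx
import numpy as np

def predict_1_matching(X, edge_weight):
    """X: (n, d) node attributes; edge_weight(x_j, x_k) -> real-valued s_{j,k}."""
    n = len(X)                                   # step 1: receive nodes/attributes
    G = nx.Graph()
    for j in range(n):
        for k in range(j + 1, n):
            G.add_edge(j, k, weight=edge_weight(X[j], X[k]))   # step 2
    # Step 3: max-weight matching; maxcardinality=True insists every node is matched.
    return nx.max_weight_matching(G, maxcardinality=True)

X = np.random.randn(6, 2)
closeness = lambda a, b: -float(np.sum((a - b) ** 2))  # nearby pairs score higher
print(predict_1_matching(X, closeness))                # a set of matched node pairs
```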

SLIDE 15

Structured network prediction

  • The question that remains is how do we compute the weights?

SLIDE 16

Learning the weights

  • Weights are parameterized by a Mahalanobis distance metric:

s_{j,k} = (x_j − x_k)^T Q (x_j − x_k),   Q ⪰ 0

  • In other words, we want to find the best linear transformation (rotation & scaling) to facilitate b-matching
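Computing all pairwise weights under a given Q takes one einsum. A minimal sketch; forming Q = L L^T is simply a convenient way to guarantee positive semidefiniteness in the test:

```python
# All pairwise Mahalanobis weights s_{j,k} = (x_j - x_k)^T Q (x_j - x_k).
import numpy as np

def mahalanobis_weights(X, Q):
    diffs = X[:, None, :] - X[None, :, :]        # (n, n, d) pairwise x_j - x_k
    return np.einsum('jkd,de,jke->jk', diffs, Q, diffs)

n, d = 5, 3
X = np.random.randn(n, d)
L = np.random.randn(d, d)
Q = L @ L.T                                      # PSD by construction
S = mahalanobis_weights(X, Q)                    # (n, n), zero diagonal
assert np.allclose(S, S.T) and S.min() > -1e-9   # symmetric and nonnegative
```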

SLIDE 17

Learning the weights

[Figure: a partially observed network with train edges and test edges]

  • We propose to learn the weights from one or more partially observed networks
  • We observe the attributes of all nodes
  • But only a subset of the edges
  • Transductive approach
  • Learn weights to “fit” training edges
  • While structured network prediction is performed over training and test edges

SLIDE 18

Example


  • Given the following nodes & edges
SLIDE 19

Example

Q =

  • 1-matching
SLIDE 20

Example

  • 1-matching
SLIDE 21

Example

  • 1-matching
SLIDES 22-24

Maximum-margin

  • We use the dual-extragradient algorithm (Taskar et al. 2005) to learn Q
  • Define the margin to be the minimum gap between the predictive values of the true structure y ∈ B and each possible alternative structure y_1, y_2, . . . ∈ B:

s_Q(x)^T (y − y_i) ≥ 1   for each alternative y_i

where s_Q(x) = vec([ d_{1,1} d_{1,2} . . . ]) stacks the edge scores into a vector
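The gap constraints themselves are cheap to check once the distances are in hand. A minimal sketch, where flattening the upper triangle is this sketch's reading of vec():

```python
# Margin gaps s_Q(x)^T (y - y_i) for a handful of alternative networks.
import numpy as np

def margin_gaps(s, y_true, alternatives):
    """s, y_true, and each alternative: (n, n) arrays over the same node set."""
    j, k = np.triu_indices_from(s, k=1)          # each undirected edge once
    s_vec, y_vec = s[j, k], y_true[j, k]
    return [float(s_vec @ (y_vec - alt[j, k])) for alt in alternatives]

# The constraints above hold when every returned gap is >= 1.
```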

SLIDES 25-28

Maximum-margin

  • You can think of the dual-extragradient algorithm as successively minimizing the violation of the gap constraints
  • Each iteration focuses on the “worst offending network”:

1. y_bad = argmin_{ỹ ∈ B} s_Q(x)^T ỹ

2. Q ← Q − ε ∂gap(y, y_bad) / ∂Q

  • Since d_{j,k} = (x_j − x_k)^T Q (x_j − x_k) = ⟨Q, (x_j − x_k)(x_j − x_k)^T⟩, the score is linear in Q, and

∂gap / ∂Q = Σ_{jk ∈ FP} (x_j − x_k)(x_j − x_k)^T − Σ_{jk ∈ FN} (x_j − x_k)(x_j − x_k)^T

  • Caveat: this is not the whole story! Thanks to Simon Lacoste-Julien for help debugging
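Heeding that caveat, here is only a rough sketch of one iteration for b = 1, following the slide's formulas literally; the matching oracle, the FP/FN bookkeeping relative to the true network, and the eigenvalue clipping that keeps Q positive semidefinite are all assumptions of the sketch, not the dual-extragradient method as published:

```python
# One gradient-style update on Q in the spirit of steps 1-2 above (b = 1 only).
import networkx as nx
import numpy as np

def one_update(X, y_true, Q, eps=0.01):
    n = len(X)
    diffs = X[:, None, :] - X[None, :, :]
    d = np.einsum('jkd,de,jke->jk', diffs, Q, diffs)       # current edge distances

    # Step 1: worst offender = argmin over perfect matchings of total distance.
    G = nx.Graph()
    for j in range(n):
        for k in range(j + 1, n):
            G.add_edge(j, k, weight=-d[j, k])              # max weight == min distance
    y_bad = np.zeros_like(y_true)
    for j, k in nx.max_weight_matching(G, maxcardinality=True):
        y_bad[j, k] = y_bad[k, j] = 1

    # Step 2: gradient step, with FP/FN edges taken relative to y_true.
    grad = np.zeros_like(Q)
    for j in range(n):
        for k in range(j + 1, n):
            outer = np.outer(diffs[j, k], diffs[j, k])
            if y_bad[j, k] and not y_true[j, k]:           # false positive
                grad += outer
            elif y_true[j, k] and not y_bad[j, k]:         # false negative
                grad -= outer
    Q = Q - eps * grad

    # Clip negative eigenvalues so Q stays a valid (PSD) metric.
    w, V = np.linalg.eigh(Q)
    return V @ np.diag(np.clip(w, 0.0, None)) @ V.T
```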

SLIDE 29

Experiments


  • How does it work in practice?
SLIDE 30

Error metrics for SNP

[Figure: ROC curve (fp rate vs. recall); raw dist: acc 98.0, recall 0.0; learned dist: acc 99.5, recall 72.2]

  • Recall & Hamming loss (#FP + #FN)
  • Reward the correct structure, but not the distance metric
  • We construct a structure-sensitive ROC curve
  • Structure predictions are blended with distances:

ỹ_{j,k} = y_{j,k} + ε exp(−d_{j,k})

  • We can now measure
  • Area under the ROC curve (AUC)
  • Recall
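A minimal sketch of the blended ROC; scikit-learn supplies the curve and the AUC (an assumption, since the talk names no implementation):

```python
# Structure-sensitive ROC: blend hard structured predictions with distances.
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

def structure_sensitive_roc(y_true, y_pred, d, eps=1e-3):
    """y_true, y_pred: (n, n) 0/1 edge indicators; d: (n, n) learned distances."""
    y_blend = y_pred + eps * np.exp(-d)          # tilde-y from the formula above
    j, k = np.triu_indices_from(y_blend, k=1)    # score each undirected edge once
    fpr, tpr, _ = roc_curve(y_true[j, k], y_blend[j, k])
    auc = roc_auc_score(y_true[j, k], y_blend[j, k])
    return auc, fpr, tpr
```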

SLIDE 31

Example

  • 300 nodes in 2D
  • 1-matching structure
  • X,Y features

SLIDE 32

Example

[Figure: learned Q, plus AUC vs. iteration, top-N recall vs. iteration, ROC, and precision/recall curves comparing raw vs. learned distance on test and validation edges; raw dist: acc 98.0, recall 0.0; learned dist: acc 99.5, recall 72.2]

SLIDE 33

Equivalence networks

  • Olivetti face images
  • 300 images, 10 per person, 30 PCA features
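A sketch of this preprocessing using sklearn's bundled Olivetti faces; note the bundled set has 400 images of 40 people, so selecting the talk's 300-image subset is left as an assumption:

```python
# Hypothetical preprocessing: load Olivetti faces, project to 30 PCA features.
from sklearn.datasets import fetch_olivetti_faces
from sklearn.decomposition import PCA

faces = fetch_olivetti_faces()                        # images as flat vectors
X = PCA(n_components=30).fit_transform(faces.data)    # (400, 30) node attributes
labels = faces.target                                 # person ids: one clique each
```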

SLIDE 34

Olivetti face images

[Figure: learned Q, plus AUC vs. iteration, top-N recall vs. iteration, ROC, and precision/recall curves comparing raw vs. learned distance on test and validation edges; raw dist: acc 96.6, recall 68.0; learned dist: acc 98.5, recall 91.2]

SLIDE 35

Olivetti face images


Reconstructions of rows of sqrt(Q)

SLIDE 36

Olivetti face images


Reconstructions of rows of sqrt(Q) - using scaled rows (x8)

SLIDE 37

Olivetti face images


Reconstructions of rows of sqrt(Q) - using scaled rows (x11)

SLIDE 38

Olivetti face images


Reconstructions of rows of sqrt(Q) - using scaled rows (x14)

SLIDE 39

Social network ... and future work

  • 6848 users
  • “Assume” b-matching structure
  • Bag-of-words features (favorite music, books, etc.)

[Figure: social network visualization by Jeffrey Heer, Berkeley]

SLIDE 40

Social network ... and future work

[Figure: learned Q and ROC curve (fp rate vs. recall); raw dist: acc 94.0, recall 2.4; learned dist: acc 93.8, recall 1.6]

SLIDE 41

Future work

  • Selecting the parameter b
  • Learning and matching to the true degree distribution
  • Learning over alternate combinatorial structures such as trees, forests, cliques

SLIDE 42

Related Work

  • Structured output models
  • B. Taskar, S. Lacoste-Julien, and M. I. Jordan, “Structured prediction, dual extragradient and Bregman projections,” NIPS 2005
  • I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun, “Large Margin Methods for Structured and Interdependent Output Variables,” JMLR
  • F. Sha and L. Saul, “Large Margin Gaussian Mixture Models for Automatic Speech Recognition,” NIPS 2006
  • Network reconstruction
  • A. Culotta, R. Bekkerman, and A. McCallum, “Extracting social networks and contact information from email and the web,” AAAI 2004
  • M. Rabbat, M. Figueiredo, and R. Nowak, “Network inference from co-occurrences,” University of Wisconsin 2006
  • Network simulation
  • R. Albert and A.-L. Barabási, “Statistical mechanics of complex networks,” Reviews of Modern Physics, and many others ...

SLIDE 43

Related Work

  • Distance metric learning
  • J. Goldberger, S. Roweis, G. Hinton, and R. Salakhutdinov, “Neighbourhood components analysis,” NIPS 2004
  • E. Xing, A. Ng, M. Jordan, and S. Russell, “Distance metric learning, with application to clustering with side-information,” NIPS 2003
  • S. Shalev-Shwartz, Y. Singer, and A. Ng, “Online and batch learning of pseudo-metrics,” ICML 2004, and many others ...

SLIDE 44

Conclusions

  • We address a novel structured network prediction problem
  • We developed a structured output model that uses structural network priors to make predictions
  • We parameterized the model using a Mahalanobis distance metric
  • We demonstrated that it is possible to learn a distance suitable for structured network prediction
  • The advantage of using a structured output model to predict edges is that we obtain higher recall for comparable precision / FP rates

SLIDE 45

Thank you for your attention. Questions & comments?