

SLIDE 1

Ranking and Machine Learning · The Lovász-Bregman divergences · Properties of the Lovász-Bregman · Applications · Summary

The Lovász-Bregman Divergence and Connections to Rank Aggregation, Clustering, and Web Ranking

Rishabh Iyer and Jeff Bilmes

University of Washington, Seattle

UAI-2013

MELODI: MachinE Learning, Optimization, & Data Interpretation @ UW

Iyer & Bilmes, 2013, Lovász Bregman Divergences, page 1 / 24

SLIDE 2

Outline

1. Ranking and Machine Learning
2. The Lovász-Bregman divergences
3. Properties of the Lovász-Bregman
4. Applications
5. Summary

SLIDES 3-6

Combining Scores and Rankings

Occur in a number of Machine Learning applications:

Combining Classifiers (Lebanon & Lafferty, 2002), e.g., three rankings of the same five cities:
  1) Munich 2) Paris 3) London 4) Seattle 5) Atlanta
  1) Seattle 2) Munich 3) London 4) Atlanta 5) Paris
  1) Munich 2) Seattle 3) London 4) Paris 5) Atlanta

Aggregating Preferences (Murphy & Martin, 2003)

Web Ranking (Liu, 2009)

SLIDES 7-12

Combining Scores and Rankings

Denote σ as a permutation of {1, 2, ..., n} such that σ(i) denotes the item at rank i, and σ⁻¹(i) denotes the rank of item i. Denote {σ₁, σ₂, ..., σₖ} as a set of k permutations.

Some important problems concerning rankings:

1. Combining Permutations: Given permutations σ₁, σ₂, ..., σₖ, find a representative σ which is "close" to σ₁, σ₂, ..., σₖ.
2. Combining Scores: Given a set of score vectors x₁, x₂, ..., xₖ, find a representative σ which is "close" to x₁, x₂, ..., xₖ.
3. Clustering: Cluster the set of permutations σ₁, σ₂, ..., σₖ (or, equivalently, the score vectors x₁, x₂, ..., xₖ).

SLIDES 13-15

Rank aggregation

Combine a set of rankings σ₁, σ₂, ..., σₖ.

[Figure: Rank Aggregation]

Often done using permutation based distance metrics.

SLIDES 16-19

Permutation based Distance Metrics d(σ, π)

Metric on the space of permutations.

Kendall τ:

$$d_T(\sigma, \pi) = \sum_{i < j} \mathbb{I}\big(\sigma^{-1}\pi(i) > \sigma^{-1}\pi(j)\big)$$

and Spearman's footrule:

$$d_S(\sigma, \pi) = \sum_{i=1}^{n} \big|\sigma^{-1}(i) - \pi^{-1}(i)\big|$$

Invariance with respect to re-orderings, i.e., d(πσ, πτ) = d(σ, τ).

Given a set of permutations σ₁, σ₂, ..., σₖ, find a permutation σ:

$$\sigma = \operatorname{argmin}_{\pi} \sum_{i=1}^{k} d(\sigma_i, \pi) \quad (1)$$
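As a concrete illustration (added here, not a slide from the talk), both metrics are a few lines of Python; this sketch assumes 0-indexed permutations stored as `sigma[rank] = item`:

```python
def inverse(sigma):
    """Given sigma[rank] = item, return inv with inv[item] = rank."""
    inv = [0] * len(sigma)
    for rank, item in enumerate(sigma):
        inv[item] = rank
    return inv

def kendall_tau(sigma, pi):
    """d_T: number of item pairs that sigma and pi order differently."""
    inv_s = inverse(sigma)
    n = len(pi)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if inv_s[pi[i]] > inv_s[pi[j]])

def spearman_footrule(sigma, pi):
    """d_S: total absolute rank displacement summed over all items."""
    inv_s, inv_p = inverse(sigma), inverse(pi)
    return sum(abs(inv_s[i] - inv_p[i]) for i in range(len(sigma)))
```

For the identity on four items against its reversal, every pair is discordant, so `kendall_tau` gives 6 and `spearman_footrule` gives 8.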

SLIDES 20-24

Score Aggregation

What if one has scores instead of just the orderings? For example:

1. Combining Classifiers: probability distributions
2. Web ranking: feature functions

Need to combine score vectors x₁, x₂, ..., xₖ and find a representative ordering σ.

[Figure: Score Aggregation]

SLIDES 25-28

Score & permutation based divergence d(x||σ)

A natural formulation of this problem is through a score & permutation based divergence. It represents the distortion between a score x and an ordering σ, and carries an additional notion of 'confidence' in the ordering.

Given a set of scores x₁, x₂, ..., xₖ, find a permutation σ:

$$\sigma = \operatorname{argmin}_{\pi} \sum_{i=1}^{k} d(x_i \| \pi) \quad (2)$$

SLIDE 29

This Talk! Lovász Bregman Divergences

SLIDES 30-32

Bregman Divergences

Given a differentiable convex function φ, define (Bregman, 1967):

$$d_\phi(x, y) = \phi(x) - \phi(y) - \langle \nabla\phi(y),\, x - y \rangle$$

[Figure: d_φ(x, y) as the gap at x between φ and its tangent at y]

Occur naturally in many machine learning applications. Subsumes many useful distance measures (e.g., squared Euclidean, KL divergence, Itakura-Saito, etc.)
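To make the definition concrete (an illustration added here, not a slide from the talk), a generic Bregman divergence in Python, instantiated with the two generators named on the slide:

```python
import math

def bregman(phi, grad_phi, x, y):
    """d_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>."""
    return (phi(x) - phi(y)
            - sum(g * (a - b) for g, a, b in zip(grad_phi(y), x, y)))

# phi(x) = sum_i x_i^2 gives the squared Euclidean distance.
sq = lambda x: sum(v * v for v in x)
sq_grad = lambda x: [2.0 * v for v in x]

# phi(x) = sum_i x_i log x_i gives the generalized KL divergence;
# on probability vectors it reduces to KL(x || y).
ent = lambda x: sum(v * math.log(v) for v in x)
ent_grad = lambda x: [math.log(v) + 1.0 for v in x]
```

For example, `bregman(sq, sq_grad, x, y)` equals the squared Euclidean distance between x and y for any pair of vectors.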

SLIDES 33-35

Definition and notation

Submodular functions: a special class of set functions, satisfying the diminishing-returns condition

$$f(A \cup \{v\}) - f(A) \geq f(B \cup \{v\}) - f(B), \quad \text{if } A \subseteq B,\ v \notin B \quad (3)$$

A bit more notation we will use: given a vector y, define the permutation σ_y that "sorts" y, in that y[σ_y(1)] ≥ ... ≥ y[σ_y(n)]. Also, define the cumulative sets Σₖ = {σ(1), σ(2), ..., σ(k)}.

SLIDES 36-38

Lovász extension of a submodular function (Lovász, 1983)

The Lovász Extension:

$$\hat{f}(y) = \langle y,\, h^f_{\sigma_y} \rangle \quad (4)$$

where

$$h^f_{\sigma_y}(\sigma_y(k)) = f(\Sigma_k) - f(\Sigma_{k-1}),\ \forall k \quad (5)$$

and Σₖ is as defined previously.

If the point y is totally ordered (no ties), f̂ has a unique subgradient at y. Moreover, the subgradient h^f_{σ_y} depends only on σ_y.
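Equations (4)-(5) can be computed directly by sorting. A minimal sketch (added for illustration; representing f as a Python callable on frozensets of 0-indexed items is a convention of this sketch, not of the talk):

```python
def lovasz_extension(f, y):
    """Sort y decreasingly to get sigma_y, take the chain of marginal
    gains f(Sigma_k) - f(Sigma_{k-1}) as the subgradient h, and return
    (<y, h>, h). f is a set function on frozensets of 0-indexed items."""
    n = len(y)
    sigma_y = sorted(range(n), key=lambda i: -y[i])
    h = [0.0] * n
    prev, S = f(frozenset()), set()
    for item in sigma_y:
        S.add(item)
        cur = f(frozenset(S))
        h[item] = cur - prev   # f(Sigma_k) - f(Sigma_{k-1})
        prev = cur
    return sum(a * b for a, b in zip(y, h)), h
```

For f(X) = min(|X|, 1), only the first marginal gain is 1, so the extension of a nonnegative y is simply max(y).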

SLIDES 39-42

The Lovász Bregman divergence

Defined via the generalized Bregman divergences (Kiwiel, 1997). A natural expression for the Lovász Bregman when y is totally ordered:

$$d_{\hat{f}}(x, y) = \hat{f}(x) - \langle h^f_{\sigma_y},\, x \rangle = \langle x,\, h^f_{\sigma_x} - h^f_{\sigma_y} \rangle \quad (6)$$

d_f̂(x, y) depends on y only via its permutation σ_y.
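Equation (6) is equally short in code. A sketch (added for illustration; as before, representing f as a callable on frozensets of 0-indexed items is a convention of this sketch):

```python
def chain_subgradient(f, sigma):
    """h with h[sigma[k]] = f(Sigma_k) - f(Sigma_{k-1}), where f is a
    set function on frozensets of 0-indexed items."""
    h = [0.0] * len(sigma)
    prev, S = f(frozenset()), set()
    for item in sigma:
        S.add(item)
        cur = f(frozenset(S))
        h[item] = cur - prev
        prev = cur
    return h

def lb_divergence(f, x, sigma):
    """Equation (6): d(x || sigma) = <x, h_{sigma_x} - h_{sigma}>."""
    n = len(x)
    sigma_x = sorted(range(n), key=lambda i: -x[i])
    hx = chain_subgradient(f, sigma_x)
    hs = chain_subgradient(f, sigma)
    return sum(xi * (a - b) for xi, a, b in zip(x, hx, hs))
```

Nonnegativity follows because, by Edmonds' greedy algorithm, the sorting permutation σ_x maximizes ⟨x, h⟩ over chain subgradients of a submodular f.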

SLIDES 43-45

LB divergence as a score based permutation divergence

The Lovász Bregman is a score based permutation divergence!

$$d_{\hat{f}}(x \| \sigma) = \langle x,\, h^f_{\sigma_x} - h^f_{\sigma} \rangle \quad (7)$$

Lemma: d_f̂(x||σ) = 0 if and only if σ_x = σ.

Akin to the permutation metrics, except for the additional dependence on the valuations.

SLIDES 46-50

Examples of Lovász Bregman

Cut functions: $f(X) = \sum_{i \in X, j \in V \setminus X} d_{ij}$,

$$d_{\hat{f}}(x \| \sigma) = \sum_{i < j} d_{ij}\, |x_i - x_j|\, \mathbb{I}\big(\sigma_x^{-1}\sigma(i) > \sigma_x^{-1}\sigma(j)\big) \quad (8)$$

Akin to the Kendall τ: setting d_ij = 1/|x_i − x_j| gives d_f̂(x||σ) = d_T(σ_x, σ).

Concave over cardinality: f(X) = g(|X|),

$$d_{\hat{f}}(x \| \sigma) = \sum_{i=1}^{n} x(\sigma_x(i))\,\delta_g(i) - \sum_{i=1}^{n} x(\sigma(i))\,\delta_g(i), \quad \delta_g(i) = g(i) - g(i-1) \quad (9)$$

Setting f(X) = min{|X|, k} gives $d_{\hat{f}}(x \| \sigma) = \sum_{i=1}^{k} x(\sigma_x(i)) - \sum_{i=1}^{k} x(\sigma(i))$.
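The cut-function example (8) can be sketched directly. The indexing below, pairs of items i < j with symmetric weights stored as d[i][j], is one reading of the formula, assumed for this illustration only:

```python
def cut_lb(x, sigma, d):
    """Sum of d[i][j] * |x_i - x_j| over item pairs (i < j) that the
    score vector x and the permutation sigma order differently.
    sigma[rank] = item (0-indexed); d[i][j] is the edge weight."""
    n = len(x)
    inv_s = [0] * n
    for rank, item in enumerate(sigma):
        inv_s[item] = rank
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            # discordant iff x and sigma disagree on the pair (i, j)
            if (x[i] - x[j]) * (inv_s[j] - inv_s[i]) < 0:
                total += d[i][j] * abs(x[i] - x[j])
    return total
```

With d[i][j] = 1/|x_i − x_j|, each discordant pair contributes exactly 1, recovering the Kendall τ count d_T(σ_x, σ) as on the slide.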

SLIDES 51-55

Lovász Bregman as Ranking Measures

These subsume commonly used loss measures in web ranking (see paper for details):

The Normalized Discounted Cumulative Gain (NDCG) (Järvelin & Kekäläinen, 2002; Ravikumar et al., 2011): a special instance of the LB divergence corresponding to concave over cardinality functions!

Area Under Curve (AUC) (Fawcett, 2006): a special instance of the LB divergence corresponding to cut functions!

SLIDES 56-58

Properties of the Lovász Bregman

Convexity: The Lovász Bregman d_f̂(x||σ) is convex in x for a given σ.

Invariance over relabellings: Given a submodular function depending only on cardinality, d_f̂(τx||τσ) = d_f̂(x||σ).

Dependence on values and not just orderings: low confidence in the ordering of x ⇒ d_f̂(x||σ) is small for every permutation σ.

[Figure: heat maps of the Lovász Bregman divergence (left) and Kendall τ, d_T(σ_x, σ) (right)]

SLIDES 59-63

Priority for higher rankings: greater penalty for misorderings of σ_x and σ higher up in the rankings.

Extension to partial rankings: natural interpretations for d_f̂(x||σ) when σ is given as a top-k list or a partial ordering.

Lovász Mallows model: forms of the Mallows model and Generalized Mallows model:

$$p(x \mid \theta, \sigma) = \frac{\exp\big(-\theta\, d_{\hat{f}}(x \| \sigma)\big)}{Z(\theta, \sigma)}, \qquad p(\sigma \mid \Theta, X) = \frac{\exp\big(-\sum_{i=1}^{n} \theta_i\, d_{\hat{f}}(x_i \| \sigma)\big)}{Z(\Theta, X)}$$

We shall see interesting connections to web ranking!

SLIDES 64-70

Lovász Bregman Rank Aggregation

Combining Permutations: combine permutations σ₁, σ₂, ..., σₙ:

$$\sigma = \operatorname{argmin}_{\pi} \sum_{i=1}^{n} d(\sigma_i, \pi)$$

NP hard for most permutation based metrics!

Combining Scores: often we have a collection of scores {x₁, x₂, ..., xₙ}:

$$\sigma = \operatorname{argmin}_{\pi} \sum_{i=1}^{n} d_{\hat{f}}(x_i \| \pi)$$

This can be solved in closed form! σ = σ_μ, where $\mu = \frac{1}{n}\sum_{i=1}^{n} w_i x_i$.
slide-76
SLIDE 76

A new view of web ranking

(Figure: a document-feature matrix, with documents d_1, …, d_N as columns and feature vectors d^1, …, d^M as rows.)

D = {d_1, d_2, …, d_N}, d_j ∈ R^M. Feature vectors d^1, d^2, …, d^M, and hence d^i(j) = d_j(i).

Inference Problem:

σ = argmin_π Ψ(D, π) = argmin_π Σ_{i=1}^M w_i d̂_f(d^i ‖ σ)

Closed form: σ = σ_μ, where μ = Σ_{i=1}^M w_i d^i, i.e. μ(j) = ⟨w, d_j⟩.

Functions of this form have been used in the past (Yue et al., 2007; Chakrabarti et al., 2008).
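Under this view, inference is one inner product per document, μ(j) = ⟨w, d_j⟩, followed by a sort. A small sketch (the matrix values, weights, and function name are made up for illustration):

```python
import numpy as np

def rank_documents(D, w):
    """Rank documents by mu(j) = <w, d_j>.

    D: (M, N) matrix whose column j is document d_j in R^M,
       so row i is the feature vector d^i across documents.
    w: length-M feature weight vector.
    Returns the permutation of documents in decreasing score order.
    """
    mu = np.asarray(w, dtype=float) @ np.asarray(D, dtype=float)
    return np.argsort(-mu)

D = np.array([[0.2, 0.9, 0.4],   # feature d^1 over 3 documents
              [0.7, 0.1, 0.6]])  # feature d^2
order = rank_documents(D, w=[0.3, 0.7])
print(order.tolist())  # [0, 2, 1]
```

This is the same sort-the-mean step as in rank aggregation, just with feature vectors d^i playing the role of the score vectors.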

slide-79
SLIDE 79

Conditional Models for ranking

Conditional probability models for ranking:

p(σ | Θ, D) ∝ exp(−Ψ(D, σ)) ∝ exp(−Σ_{i=1}^M w_i d̂_f(d^i ‖ σ))   (10)

This is exactly the Mallows model corresponding to the Lovász-Bregman divergence! These models have been used in past work (Dubey et al., 2009).
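To make equation (10) concrete, recall the divergence's form from earlier in the talk: d̂_f(x ‖ σ) = ⟨h_{σ_x} − h_σ, x⟩, where h_σ is the subgradient of the submodular f picked out by permutation σ, and σ_x sorts x decreasingly. The toy sketch below uses the cardinality-based f(S) = √|S| purely as an illustrative submodular choice and normalizes the Mallows model by brute-force enumeration; none of these choices come from the paper's experiments:

```python
import math
from itertools import permutations
import numpy as np

def f(k):
    # Illustrative cardinality-based submodular function: f(S) = sqrt(|S|).
    return math.sqrt(k)

def subgradient(perm, m):
    # h_sigma(perm[i]) = f(i+1) - f(i): the chain of marginal gains along perm.
    h = np.empty(m)
    for i, e in enumerate(perm):
        h[e] = f(i + 1) - f(i)
    return h

def lb_divergence(x, perm):
    # Lovasz-Bregman d_f(x || sigma) = <h_{sigma_x} - h_sigma, x>,
    # where sigma_x sorts x in decreasing order.
    x = np.asarray(x, dtype=float)
    m = len(x)
    sigma_x = np.argsort(-x)
    return float((subgradient(sigma_x, m) - subgradient(perm, m)) @ x)

def mallows(x):
    # p(sigma | x) proportional to exp(-d_f(x || sigma)), normalized by enumeration.
    perms = list(permutations(range(len(x))))
    scores = np.array([math.exp(-lb_divergence(x, p)) for p in perms])
    return dict(zip(perms, scores / scores.sum()))

p = mallows([3.0, 1.0, 2.0])
best = max(p, key=p.get)
print(best)  # (0, 2, 1): the ordering of x itself is the mode
```

The divergence vanishes exactly when σ agrees with the ordering of x, so the mode of this Mallows model is the sorted order of the scores.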

slide-82
SLIDE 82

Score-based clustering

A k-means-style algorithm for clustering score vectors by their orderings. Each step of the k-means is easy! Some clustering visualizations: clusterings based on orderings in 2 and 3 dimensions.

Iyer & Bilmes, 2013 · Lovász-Bregman Divergences · page 22 / 24
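A toy version of the k-means-style procedure: cluster representatives are permutations, the assignment step compares each score vector to each representative under the divergence, and the mean step is the closed-form aggregation (sort the cluster's mean score vector). The divergence again uses the illustrative f(S) = √|S|; the data, helper names, and seed are assumptions of this sketch:

```python
import math
import numpy as np

def _h(perm, m):
    # Subgradient of the illustrative f(S) = sqrt(|S|) along permutation perm.
    h = np.empty(m)
    for i, e in enumerate(perm):
        h[e] = math.sqrt(i + 1) - math.sqrt(i)
    return h

def _d(x, perm):
    # Lovasz-Bregman d_f(x || perm) = <h_{sigma_x} - h_perm, x>.
    m = len(x)
    return float((_h(np.argsort(-x), m) - _h(perm, m)) @ x)

def lb_kmeans(X, k, iters=20, seed=0):
    """k-means over orderings: centers are permutations.

    Assignment uses the divergence; the mean step is closed form
    (sort each cluster's mean score vector in decreasing order).
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = [np.argsort(-X[i]) for i in rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        assign = np.array([min(range(k), key=lambda c: _d(x, centers[c]))
                           for x in X])
        for c in range(k):
            if np.any(assign == c):
                centers[c] = np.argsort(-X[assign == c].mean(axis=0))
    return assign, centers

# Two groups of score vectors with opposite orderings.
X = [[5, 1, 2], [4, 0, 1],   # ordered 0 > 2 > 1
     [0, 3, 9], [1, 2, 8]]   # ordered 2 > 1 > 0
assign, centers = lb_kmeans(X, k=2)
print(assign.tolist())  # the two orderings land in different clusters
```

Both steps are cheap: assignment is a handful of inner products, and the update is a sort, which is why each step of this k-means is easy.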

slide-83
SLIDE 83

Summary

- Rank aggregation and permutation-based metrics.
- The Lovász-Bregman divergence as a score-and-permutation divergence.
- Properties of the Lovász-Bregman divergence.
- Interesting connections to web ranking and rank aggregation.

slide-84
SLIDE 84

Thank You

Iyer & Bilmes, 2013 · Lovász-Bregman Divergences · page 24 / 24