[PPT] - Finding k-best MAP Solutions Using LP Relaxations Amir Globerson PowerPoint Presentation

SLIDE 1

Finding k-best MAP Solutions Using LP Relaxations

Amir Globerson
School of Computer Science and Engineering, The Hebrew University

Joint Work with: Menachem Fromer (Hebrew Univ.)

SLIDE 3

Prediction Problems

Consider the following problem:

Observe variables: $x_v$ (visible)
Predict variables: $x_h$ (hidden)

Countless applications:

Images: observe noisy image, predict source image
Error correcting codes: observe received bits, predict code word
Medical diagnostics: observe symptoms, predict disease
Text: observe sentence, predict derivation

SLIDE 9

Statistical Models for Prediction

One approach:

Assume (or learn) a model for $p(x_h, x_v)$

Predict the most likely hidden values: $\arg\max_{x_h} p(x_h \mid x_v)$

This conditional distribution often corresponds to a graphical model

Need to know how to find an assignment with maximum probability

SLIDE 10

The MAP Problem

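Given a graphical model over $x_1, \ldots, x_n$:

$f(x) = \sum_{ij} \theta_{ij}(x_i, x_j)$

$p(x) = \frac{1}{Z} e^{f(x)}$

[Figure: nodes $x_i$ and $x_j$ joined by an edge with potential $\theta_{ij}(x_i, x_j)$]

Find the most likely assignment: $\arg\max_x f(x)$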
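
As a point of reference for the MAP problem stated on this slide, here is a minimal brute-force sketch. It assumes a tiny pairwise model stored as a dictionary of edge tables; the names `score`, `brute_force_map`, and the example `theta` are illustrative and do not come from the talk. Enumeration is exponential in n, which is exactly why the later slides turn to LP relaxations.

```python
from itertools import product

def score(x, theta):
    """f(x) = sum over edges (i, j) of theta[(i, j)][x_i][x_j]."""
    return sum(table[x[i]][x[j]] for (i, j), table in theta.items())

def brute_force_map(n, num_states, theta):
    """Enumerate all num_states**n assignments and return a maximizer of f."""
    return max(product(range(num_states), repeat=n),
               key=lambda x: score(x, theta))

# Hypothetical 3-variable binary chain x0 - x1 - x2.
theta = {
    (0, 1): [[1.0, 0.0], [0.0, 2.0]],  # rewards x0 == x1
    (1, 2): [[0.5, 0.0], [0.0, 1.0]],  # rewards x1 == x2
}
x_star = brute_force_map(3, 2, theta)
```
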

SLIDE 20

MAP Approximations

x is discrete, so MAP is generally NP-hard

Many approximation approaches:

  Greedy search
  Loopy belief propagation (e.g., max product)
  Linear programming relaxations

LP approaches:

  Provide optimality certificates
  Optimal in some cases (e.g., submodular functions)
  Can be solved via message passing

SLIDE 27

The k-best MAP Problem

Find the k best assignments for $f(x)$

Denote these by $x^{(1)}, \ldots, x^{(k)}$

Useful in:

  Finding multiple candidate solutions when the energy function is not accurate (e.g., protein design)
  As a first processing stage before applying more complex methods
  Supervised learning

SLIDE 28

From 2 to k best

We can show that given a polynomial algorithm for k=2, the problem can be solved for any k in O(k)

Focus on k=2

Our key question: what is the LP formulation of the problem, and what are its relaxations?

SLIDE 29

Outline

LP formulation of the MAP problem

LP for 2nd best

  General (intractable) exact formulation
  Tractable formulation for tree graphs
  Approximations for non-tree graphs

Experiments

SLIDE 38

MAP and LP

MAP: $\max_x f(x)$ (hard)

MAP as LP: $\max_{\mu \in S} \mu \cdot \theta$

[Figure: the feasible set $S$]

Approximate MAP via LP

Schlesinger, Deza & Laurent, Boros, Wainwright, Kolmogorov

SLIDE 49

LP Formulation of MAP

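$x^* = \arg\max_x \sum_{ij \in E} \theta_{ij}(x_i, x_j)$

$\max_x \sum_{ij} \theta_{ij}(x_i, x_j) = \max_{q(x)} \sum_x q(x) \sum_{ij} \theta_{ij}(x_i, x_j) = \max_{q(x)} \sum_{ij} \sum_{x_i, x_j} q_{ij}(x_i, x_j)\, \theta_{ij}(x_i, x_j)$

[Figure: the optimal distribution $q^*(x)$ puts probability 1 on $x^*$]

Objective depends only on pairwise marginals

But only those that correspond to some distribution $q(x)$

This set is called the Marginal polytope (Wainwright & Jordan)

$\max_x \sum_{ij} \theta_{ij}(x_i, x_j) = \max_{\mu \in M(G)} \sum_{ij} \sum_{x_i, x_j} \mu_{ij}(x_i, x_j)\, \theta_{ij}(x_i, x_j) = \max_{\mu \in M(G)} \mu \cdot \theta$

See: Cut polytope (Deza, Laurent), Quadric polytope (Boros)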

SLIDE 54

The Marginal Polytope

Marginal Polytope $M(G)$: the set of $\mu$ such that there exists a $p(x)$ s.t. $p(x_i, x_j) = \mu_{ij}(x_i, x_j)$

$\max_{\mu \in M(G)} \sum_{ij \in E} \sum_{x_i, x_j} \mu_{ij}(x_i, x_j)\, \theta_{ij}(x_i, x_j)$

Difficult set to characterize. Easy to outer bound

The vertices have integral values and correspond to assignments on x
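
To make the correspondence between vertices and assignments concrete, the following sketch builds the integral marginal vector of a single assignment and checks that $\mu \cdot \theta$ reproduces $f(x)$. The data layout (a dictionary of edge tables) and the names `vertex`, `dot`, and `score` are assumptions made for illustration only.

```python
def vertex(x, edges, num_states):
    """Integral pairwise marginals: mu_ij(a, b) = 1 iff (x_i, x_j) == (a, b)."""
    return {
        (i, j): [[1.0 if (x[i], x[j]) == (a, b) else 0.0
                  for b in range(num_states)]
                 for a in range(num_states)]
        for (i, j) in edges
    }

def dot(mu, theta):
    """mu . theta = sum_ij sum_{a,b} mu_ij(a, b) * theta_ij(a, b)."""
    return sum(mu[e][a][b] * theta[e][a][b]
               for e in theta
               for a in range(len(theta[e]))
               for b in range(len(theta[e][a])))

def score(x, theta):
    """f(x) = sum_ij theta_ij(x_i, x_j)."""
    return sum(table[x[i]][x[j]] for (i, j), table in theta.items())

# Hypothetical binary chain x0 - x1 - x2.
theta = {
    (0, 1): [[1.0, 0.0], [0.0, 2.0]],
    (1, 2): [[0.5, 0.0], [0.0, 1.0]],
}
x = (1, 1, 0)
mu = vertex(x, list(theta), 2)
lp_objective = dot(mu, theta)  # equals f(x) at an integral vertex
```
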

SLIDE 56

Relaxing the MAP LP

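$\max_x \sum_{ij} \theta_{ij}(x_i, x_j) = \max_{\mu \in M(G)} \sum_{ij \in E} \sum_{x_i, x_j} \mu_{ij}(x_i, x_j)\, \theta_{ij}(x_i, x_j)$

Exact but Hard!

[Figure: the marginal polytope $M(G)$]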

SLIDE 62

Relaxing the MAP LP

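$\max_x \sum_{ij} \theta_{ij}(x_i, x_j) \le \max_{\mu \in S} \sum_{ij \in E} \sum_{x_i, x_j} \mu_{ij}(x_i, x_j)\, \theta_{ij}(x_i, x_j)$

[Figure: outer bound $S$ containing $M(G)$]

If optimum is an integral vertex, MAP is solved

Possible outer bound: Pairwise consistency. For edges $ij$ and $jk$ that share node $j$:

$\sum_{x_i} \mu_{ij}(x_i, x_j) = \sum_{x_k} \mu_{jk}(x_j, x_k)$

Exact for trees

Efficient message passing schemes for solving the resulting (dual) LP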
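
The gap between the pairwise-consistency outer bound and $M(G)$ can be seen on a small example. The sketch below uses a standard textbook construction (a binary triangle whose potentials reward disagreement), not an example from the talk; the helper names are hypothetical. The fractional point `mu` passes the consistency check yet scores higher than any assignment, which illustrates the "≤" in the relaxation.

```python
from itertools import product

edges = [(0, 1), (1, 2), (0, 2)]
# theta_ij(a, b) = 1 when a != b: every edge prefers disagreement.
theta = {e: [[0.0, 1.0], [1.0, 0.0]] for e in edges}
# Fractional point: each edge puts mass 1/2 on (0, 1) and 1/2 on (1, 0).
mu = {e: [[0.0, 0.5], [0.5, 0.0]] for e in edges}

def implied_marginals(mu, node):
    """Singleton marginals of `node` implied by each edge that touches it."""
    out = []
    for (i, j), table in mu.items():
        if i == node:
            out.append(tuple(sum(row) for row in table))        # sum over x_j
        elif j == node:
            out.append(tuple(sum(col) for col in zip(*table)))  # sum over x_i
    return out

def is_pairwise_consistent(mu, n, tol=1e-9):
    """True if all edges sharing a node agree on that node's marginal."""
    for node in range(n):
        ms = implied_marginals(mu, node)
        if any(abs(a - b) > tol for m in ms for a, b in zip(m, ms[0])):
            return False
    return True

consistent = is_pairwise_consistent(mu, 3)
lp_value = sum(mu[e][a][b] * theta[e][a][b]
               for e in edges for a in range(2) for b in range(2))
map_value = max(sum(theta[(i, j)][x[i]][x[j]] for (i, j) in edges)
                for x in product(range(2), repeat=3))
```

On a triangle at most two of the three edges can disagree, so no assignment (and hence no distribution) attains the value reached by `mu`.
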

SLIDE 63

Outline

LP formulation of the MAP problem

LP for 2nd best

  General (intractable) exact formulation
  Tractable formulation for tree graphs
  Approximations for non-tree graphs

Experiments

SLIDE 71

The 2nd best problem and LP

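MAP: $\max_x f(x) = \max_{\mu \in M(G)} \mu \cdot \theta$

2nd best: $\max_{x \ne x^{(1)}} f(x) = \max_{\mu \in M(G, x^{(1)})} \mu \cdot \theta$

[Figure: the marginal polytope with the vertex $x^{(1)}$ marked]

Approximations: [figure]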

SLIDE 80

A new marginal polytope

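Given an assignment z, define the Assignment Excluding Marginal Polytope $M(G, z)$:

$M(G, z) = \{ \mu \mid \text{there exists a } p(x) \text{ s.t. } p(x_i, x_j) = \mu_{ij}(x_i, x_j) \text{ and } p(z) = 0 \}$

[Figure: $M(G)$ with the vertex $z$, and $M(G, z)$, which excludes $z$]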

SLIDE 84

LP for the 2nd best problem

The 2nd best problem corresponds to the following LP:

$\max_{x \ne x^{(1)}} f(x; \theta) = \max_{\mu \in M(G, x^{(1)})} \mu \cdot \theta$

Is there a simple characterization of $M(G, x^{(1)})$?

Is it $M(G)$ plus one inequality? If so, what inequality?

[Figure: polytope with the vertex $x^{(1)}$ marked]

SLIDE 85

Outline

LP formulation of the MAP problem

LP for 2nd best

  General (intractable) exact formulation
  Tractable formulation for tree graphs
  Approximations for non-tree graphs

Experiments

SLIDE 92

Adding inequalities to $M(G)$

Any valid inequality must separate z from the other vertices

How about (Santos 91):

$\sum_i \mu_i(z_i) \le n - 1$

The sum on the left is $n$ for z and $n - 1$ or less for other vertices

But: Results in fractional vertices, even for trees

Only an outer bound on $M(G, z)$

[Figure: $M(G)$ with the vertex $z$ cut off]
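
The separation property of this inequality is easy to check at the integral vertices: there, $\sum_i \mu_i(z_i)$ simply counts the positions where an assignment agrees with z. The sketch below enumerates a small hypothetical binary example (the values of `n` and `z` are arbitrary). It says nothing about the fractional vertices mentioned on the slide, which are the reason the inequality gives only an outer bound.

```python
from itertools import product

def agreement(x, z):
    """Sum_i mu_i(z_i) at the vertex of assignment x: counts i with x_i == z_i."""
    return sum(1 for xi, zi in zip(x, z) if xi == zi)

n = 4
z = (0, 1, 1, 0)  # hypothetical assignment to exclude
values = {x: agreement(x, z) for x in product((0, 1), repeat=n)}

# z itself violates the inequality; every other vertex satisfies it.
violators = [x for x, v in values.items() if v > n - 1]
```
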

SLIDE 97

The tree case

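Focus on the case where G is a tree

$M(G)$ is given by pairwise consistency

Define (with $d_i$ the degree of node $i$ in G):

$I(\mu, z) = \sum_i (1 - d_i)\, \mu_i(z_i) + \sum_{ij \in G} \mu_{ij}(z_i, z_j)$

Compare with the Bethe entropy:

$H(\mu) = \sum_i (1 - d_i)\, H_i(X_i) + \sum_{ij \in G} H(X_i, X_j)$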

SLIDE 104

The tree case

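Focus on the case where G is a tree

$M(G)$ is given by pairwise consistency

Define (with $d_i$ the degree of node $i$ in G):

$I(\mu, z) = \sum_i (1 - d_i)\, \mu_i(z_i) + \sum_{ij \in G} \mu_{ij}(z_i, z_j)$

Theorem:

$M(G, z) = \{ \mu \mid \mu \in M(G),\ I(\mu, z) \le 0 \}$

[Figure: the half-space $I(\mu, z) \le 0$ cuts the vertex $z$ off $M(G)$, leaving $M(G, z)$]

Proof...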
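
A quick sanity check of the theorem at the integral vertices: for a tree, $I(\mu, z)$ evaluated at the vertex of an assignment x should be 1 when x = z and at most 0 otherwise. The sketch below enumerates a small hypothetical tree with three states per variable; the function name and the example tree are illustrative. It checks only the vertices and is not a substitute for the proof.

```python
from itertools import product

def tree_inequality_at_vertex(x, z, edges):
    """I(mu, z) at the integral vertex of x:
    sum_i (1 - d_i) [x_i == z_i] + sum_{ij} [x_i == z_i and x_j == z_j]."""
    n = len(z)
    degree = [0] * n
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    node_term = sum(1 - degree[i] for i in range(n) if x[i] == z[i])
    edge_term = sum(1 for i, j in edges if x[i] == z[i] and x[j] == z[j])
    return node_term + edge_term

# Hypothetical tree on 5 nodes and an assignment z to exclude.
edges = [(0, 1), (1, 2), (1, 3), (3, 4)]
z = (0, 1, 2, 0, 1)
values = {x: tree_inequality_at_vertex(x, z, edges)
          for x in product(range(3), repeat=5)}
```

At a vertex the value equals the number of connected components of nodes that agree with z, minus the number of edges with exactly one agreeing endpoint, which is why only z itself gives a positive value on a tree.
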

SLIDE 106

Proof

Define:

$A(G, z) = \{ \mu \mid \mu \in M(G),\ I(\mu, z) \le 0 \}$

Want to show:

$A(G, z) = M(G, z)$

SLIDE 108

Proof

Define:

$A(G, z) = \{ \mu \mid \mu \in M(G),\ I(\mu, z) \le 0 \}$

Want to show that if $\mu \in A(G, z)$, there exists a $p(x)$ that has these marginals and $p(z) = 0$.

Can construct $p(x)$

SLIDE 112

Proof

Define:

$A(G, z) = \{ \mu \mid \mu \in M(G),\ I(\mu, z) \le 0 \}$

Want to show that if $\mu \in A(G, z)$, there exists a $p(x)$ that has these marginals and $p(z) = 0$.

$F(\mu) = \min\ p(z)$ s.t. $p_{ij}(x_i, x_j) = \mu_{ij}(x_i, x_j)$, $p_i(x_i) = \mu_i(x_i)$, $p(x) \ge 0$

That is, we want $F(\mu) = 0 \ \ \forall \mu \in A(G, z)$

In fact we can show that for trees, for $\mu \in M(G)$:

$F(\mu) = \max\{0, I(\mu, z)\}$

SLIDE 119

Proof - key ideas

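$F(\mu) = \min\ p(z)$ s.t. $p_{ij}(x_i, x_j) = \mu_{ij}(x_i, x_j)$, $p_i(x_i) = \mu_i(x_i)$, $p(x) \ge 0$

Consider the same program with nonnegativity required only away from z: $p(x) \ge 0 \ \ \forall x \ne z$

Dual:

$\max\ \lambda \cdot \mu$ s.t.

$\sum_{ij} \lambda_{ij}(x_i, x_j) + \sum_i \lambda_i(x_i) \le 0 \ \ \forall x \ne z$

$\sum_{ij} \lambda_{ij}(z_i, z_j) + \sum_i \lambda_i(z_i) = 1$

We show that the value of the above is $I(\mu, z)$

From there it's easy to conclude that $F(\mu) = \max\{0, I(\mu, z)\}$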

SLIDE 125

Proof - Max marginals

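$\max\ \lambda \cdot \mu$ s.t. $\lambda(x) \le 0 \ \ \forall x \ne z$, $\lambda(z) = 1$

where $\lambda(x) = \sum_{ij} \lambda_{ij}(x_i, x_j) + \sum_i \lambda_i(x_i)$

Use max-marginals:

$\bar\lambda_i(x_i) = \max_{\hat x : \hat x_i = x_i} \lambda(\hat x)$

$\bar\lambda_{ij}(x_i, x_j) = \max_{\hat x : \hat x_i = x_i,\ \hat x_j = x_j} \lambda(\hat x)$

$\bar\lambda_i(z_i) = 1$, and $\bar\lambda_i(x_i) \le 0$ for $x_i \ne z_i$

Rewrite:

$\lambda(x) = \sum_i (1 - d_i)\, \bar\lambda_i(x_i) + \sum_{ij \in T} \bar\lambda_{ij}(x_i, x_j)$

Result follows after some algebra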

SLIDE 133

Tree Graph - Summary

The LP for 2nd best differs from the marginal polytope by one linear inequality constraint:

$M(G, x^{(1)}) = \{ \mu \mid \mu \in M(G),\ I(\mu, x^{(1)}) \le 0 \}$

The 2nd best satisfies $I(\mu, x^{(1)}) = 0$, so it cannot be any assignment

[Figure: the vertex $x^{(1)}$ and candidate vertices for $x^{(2)}$]

SLIDE 134

Non tree graphs

Any graph can be converted into a junction tree

We can apply our tree result there

For a junction tree with cliques $\mathcal{C}$ and separators $\mathcal{S}$, the inequality is:

$\sum_{S \in \mathcal{S}} (1 - d_S)\, \mu_S(z_S) + \sum_{C \in \mathcal{C}} \mu_C(z_C) \le 0$

Specifying the marginal polytope requires a number of variables exponential in the tree width. Not practical.

SLIDE 135

Outline

LP formulation of the MAP problem

LP for 2nd best

  General (intractable) exact formulation
  Tractable formulation for tree graphs
  Approximations for non-tree graphs

Experiments

SLIDE 141

Non trees - Approximations

[Figure: the vertex $x^{(1)}$, the true $M(G, x^{(1)})$, and an outer bound on $M(G)$]

SLIDE 149

Spanning tree inequalities

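Given a spanning subtree T of G, define (with $d_i$ the degree of node $i$ in T):

$I_T(\mu, z) = \sum_i (1 - d_i)\, \mu_i(z_i) + \sum_{ij \in T} \mu_{ij}(z_i, z_j)$

And the constraint:

$I_T(\mu, z) \le 0$

Separates z from the other vertices but might result in fractional vertices

[Figure: the cut $I_T(\mu, z) \le 0$ removes the vertex $z$ and creates a fractional vertex]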

SLIDE 155

Adding all spanning trees

Can we add all spanning tree inequalities efficiently?

Yes, via a cutting plane approach:

  Start with one inequality
  Solve LP
  If solution is fractional, find a violated tree inequality (if one exists) and add it

SLIDE 163

Cutting Plane Algorithm

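How do we find a violated tree inequality?

Note: Even all spanning tree inequalities might not suffice

[Figure: the vertex $z$; the first tree inequality $T_1$ yields the LP solution $\mu_1$; "Is there a tree inequality that $\mu_1$ violates?"; a second tree inequality $T_2$ is added]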

SLIDE 170

Finding a violated spanning tree

For a given $\mu$ find $\max_T I_T(\mu, z)$. If it's positive, add the maximizing tree

How can we maximize over all trees? Note that:

$I_T(\mu, z) = \sum_{ij \in T} \left[ \mu_{ij}(z_i, z_j) - \mu_i(z_i) - \mu_j(z_j) \right] + \sum_i \mu_i(z_i)$

The bracketed term is an edge weight $w_{ij}$, and the last sum is fixed (it does not depend on T)

Decomposes into edge scores. Maximizing tree can be found using a maximum-weight-spanning-tree algorithm (e.g., Wainwright 02)
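
The separation step described on this slide can be sketched with Kruskal's algorithm. The choice of Kruskal is an assumption (the slide only asks for some maximum-weight-spanning-tree algorithm), and the input layout is hypothetical: `node_val[i]` holds $\mu_i(z_i)$ and `edge_val[(i, j)]` holds $\mu_{ij}(z_i, z_j)$. The example point is a fractional, pairwise-consistent point on a triangle, chosen for illustration.

```python
def max_weight_spanning_tree(n, weights):
    """Kruskal's algorithm: scan edges by decreasing weight, keep those joining components."""
    parent = list(range(n))

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    tree = []
    for (i, j), _ in sorted(weights.items(), key=lambda kv: -kv[1]):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j))
    return tree

def most_violated_tree(node_val, edge_val):
    """Return (T, I_T(mu, z)) for the spanning tree T that maximizes I_T."""
    weights = {(i, j): v - node_val[i] - node_val[j]
               for (i, j), v in edge_val.items()}
    tree = max_weight_spanning_tree(len(node_val), weights)
    value = sum(weights[e] for e in tree) + sum(node_val)
    return tree, value

# Hypothetical fractional point on a triangle, evaluated at z.
node_val = [0.5, 0.5, 0.5]                           # mu_i(z_i)
edge_val = {(0, 1): 0.5, (1, 2): 0.5, (0, 2): 0.0}   # mu_ij(z_i, z_j)
tree, value = most_violated_tree(node_val, edge_val)
violated = value > 0  # if True, add the constraint I_T(mu, z) <= 0 for this tree
```
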

SLIDE 171

Experiments

Alternative algorithms for approximate 2nd best:

  Using approximate marginals from max-product (BMMF; Yanover and Weiss 04)
  Lawler/Nilsson (72,80) - Partition the assignments $x \ne x^{(1)}$; maximize over each part approximately. Cost O(n)

    $x_1 \ne x^{(1)}_1,\ x_2 = *,\ x_3 = *,\ \ldots,\ x_n = *$
    $x_1 = x^{(1)}_1,\ x_2 \ne x^{(1)}_2,\ x_3 = *,\ \ldots,\ x_n = *$
    $\ldots$
    $x_1 = x^{(1)}_1,\ x_2 = x^{(1)}_2,\ x_3 = x^{(1)}_3,\ \ldots,\ x_n \ne x^{(1)}_n$

Our algorithm: STRIPES
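
The partition scheme above can be illustrated with a small sketch. It assumes exact (brute-force) maximization inside each part, whereas the slide refers to approximate maximization; the function names and the example `theta` are illustrative. Part i fixes the first i variables to their values in $x^{(1)}$, forces variable i to differ, and leaves the rest free, so the n parts together cover every assignment other than $x^{(1)}$ exactly once.

```python
from itertools import product

def score(x, theta):
    """f(x) = sum over edges (i, j) of theta[(i, j)][x_i][x_j]."""
    return sum(table[x[i]][x[j]] for (i, j), table in theta.items())

def best_in_part(i, x1, n, num_states, theta):
    """Part i: x_k == x1[k] for k < i, x_i != x1[i], remaining variables free."""
    candidates = (x for x in product(range(num_states), repeat=n)
                  if x[:i] == x1[:i] and x[i] != x1[i])
    return max(candidates, key=lambda x: score(x, theta))

def second_best(x1, n, num_states, theta):
    """Best assignment other than x1: the best of the n part winners."""
    winners = [best_in_part(i, x1, n, num_states, theta) for i in range(n)]
    return max(winners, key=lambda x: score(x, theta))

# Hypothetical binary chain x0 - x1 - x2 whose MAP assignment is (1, 1, 1).
theta = {
    (0, 1): [[1.0, 0.0], [0.0, 2.0]],
    (1, 2): [[0.5, 0.0], [0.0, 1.0]],
}
x1 = (1, 1, 1)
x2 = second_best(x1, 3, 2, theta)
ranked = sorted(product(range(2), repeat=3), key=lambda x: -score(x, theta))
```
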

SLIDE 172

Attractive Grids

Ising models with ferromagnetic interaction

The local polytope is guaranteed to yield the exact first best (but is not equal to the marginal polytope)

Goal: Find 50 best. Stripes and Nilsson find all of them exactly. Up to 19 spanning trees added

[Figure: two bar charts, "Rank" and "Run Time", comparing Stripes, Nilsson, and BMMF]

SLIDE 173

Protein Side Chain Prediction

Given protein's 3D shape (backbone), choose most probable side chain configuration

[Figure: protein backbone with side-chains, modeled as a graph G=(V,E) over variables $x_i$, $x_j$, $x_k$, $x_h$] (MRFs from Yanover, Meltzer, Weiss '06)

Can be cast as a MAP problem:

$p(x) \propto e^{\sum_{ij \in E} \theta_{ij}(x_i, x_j)}$

Important to obtain multiple possible solutions

SLIDE 174

Protein Side Chain Prediction

Stripes found the exact solutions for all problems studied

In some cases, we used a tighter approximation of the marginal polytope (Sontag et al., UAI 08)

[Figure: two bar charts comparing Stripes, Nilsson, and BMMF]

SLIDE 181

Open Questions

When are spanning trees enough?

What is the polytope structure for k-best?

Finding k-best "different" solutions

Scalable algorithms

If a given problem is solved with a marginal polytope relaxation, what can we say about the second best?

SLIDE 182

Summary

The 2nd best can be posed as a linear program

For trees, it differs from 1st best by one constraint only

For non-trees, an approximation can be devised by adding inequalities for all spanning trees

Empirically effective