Finding k-best MAP Solutions Using LP Relaxations, Amir Globerson (PowerPoint presentation)

SLIDE 1

Finding k-best MAP Solutions Using LP Relaxations

Amir Globerson, School of Computer Science and Engineering, The Hebrew University

Joint work with: Menachem Fromer (Hebrew Univ.)

SLIDE 3

Prediction Problems

Consider the following problem: observe the visible variables $x_v$ and predict the hidden variables $x_h$.

Countless applications (observe → predict):
- Images: noisy image → source image
- Error-correcting codes: received bits → code word
- Medical diagnostics: symptoms → disease
- Text: sentence → derivation

SLIDE 9

Statistical Models for Prediction

One approach:
- Assume (or learn) a model for $p(x_h, x_v)$
- Predict the most likely hidden values: $\arg\max_{x_h} p(x_h \mid x_v)$

This conditional distribution often corresponds to a graphical model, so we need to know how to find an assignment with maximum probability.

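SLIDE 10

The MAP Problem

Given a graphical model over $x_1, \ldots, x_n$ with pairwise potentials $\theta_{ij}(x_i, x_j)$ on its edges:

$p(x) = \frac{1}{Z} e^{f(x)}, \quad f(x) = \sum_{ij} \theta_{ij}(x_i, x_j)$

Find the most likely assignment: $\arg\max_x f(x)$

[Figure: nodes $x_i$ and $x_j$ joined by an edge carrying $\theta_{ij}(x_i, x_j)$]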
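To make the objective concrete, here is a minimal brute-force sketch. It assumes a dictionary-of-tables layout for the potentials (`theta[(i, j)][a][b]` holds $\theta_{ij}(a, b)$), which is our own choice and not from the talk, and it enumerates all $k^n$ assignments, so it only works for toy models. Since $p(x) \propto e^{f(x)}$, maximizing $f$ is the same as maximizing $p$.

```python
from itertools import product


def score(x, theta):
    """f(x) = sum over edges (i, j) of theta_ij(x_i, x_j).

    Assumed layout: theta[(i, j)][a][b] holds theta_ij(a, b).
    """
    return sum(t[x[i]][x[j]] for (i, j), t in theta.items())


def brute_force_map(n, k, theta):
    """Exhaustive arg max_x f(x) over all k**n assignments (tiny models only)."""
    return max(product(range(k), repeat=n), key=lambda x: score(x, theta))


# A 3-node chain x0 - x1 - x2 with binary variables (illustrative numbers).
theta = {
    (0, 1): [[2.0, 0.0], [0.0, 1.0]],  # favors x0 == x1
    (1, 2): [[0.0, 1.5], [0.5, 0.0]],  # favors x1 != x2
}
x_star = brute_force_map(3, 2, theta)
print(x_star, score(x_star, theta))  # (0, 0, 1) 3.5
```

The potentials are made up for illustration; any practical MAP method replaces the exhaustive search, which is exactly why the approximations on the next slides matter.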

SLIDE 20

MAP Approximations

$x$ is discrete, so MAP is generally NP-hard. Many approximation approaches:
- Greedy search
- Loopy belief propagation (e.g., max-product)
- Linear programming relaxations

LP approaches:
- Provide optimality certificates
- Are optimal in some cases (e.g., submodular functions)
- Can be solved via message passing

SLIDE 27

The k-best MAP Problem

Find the k best assignments for $f(x)$. Denote these by $x^{(1)}, \ldots, x^{(k)}$.

Useful in:
- Finding multiple candidate solutions when the energy function is not accurate (e.g., protein design)
- A first processing stage before applying more complex methods
- Supervised learning
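A reference sketch for the k-best problem on toy models, again by exhaustive enumeration (the helper names and the `theta[(i, j)][a][b]` table layout are our assumptions). `heapq.nlargest` keeps the k best assignments under the key $f(x)$; ties come back in enumeration order. It is only useful for checking faster methods on small instances.

```python
import heapq
from itertools import product


def score(x, theta):
    """f(x) = sum_ij theta_ij(x_i, x_j), with theta[(i, j)][a][b] = theta_ij(a, b)."""
    return sum(t[x[i]][x[j]] for (i, j), t in theta.items())


def brute_force_k_best(n, num_states, theta, k):
    """Return [x(1), ..., x(k)]: the k highest-scoring assignments, best first.

    Enumerates all num_states**n assignments, so this is a reference
    implementation for checking other methods on tiny models only.
    """
    return heapq.nlargest(k, product(range(num_states), repeat=n),
                          key=lambda x: score(x, theta))


theta = {
    (0, 1): [[2.0, 0.0], [0.0, 1.0]],
    (1, 2): [[0.0, 1.5], [0.5, 0.0]],
}
best = brute_force_k_best(3, 2, theta, 3)
print([(x, score(x, theta)) for x in best])
```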

SLIDE 28

From 2 to k best

We can show that, given a polynomial algorithm for k = 2, the problem can be solved for any k with O(k) runs of that algorithm. We therefore focus on k = 2. Our key question: what is the LP formulation of the problem, and what are its relaxations?

SLIDE 29

Outline
- LP formulation of the MAP problem
- LP for 2nd best
  - General (intractable) exact formulation
  - Tractable formulation for tree graphs
  - Approximations for non-tree graphs
- Experiments

SLIDE 38

MAP and LP

MAP: $\max_x f(x)$

MAP as LP: $\max_{\mu \in S} \mu \cdot \theta$, where the set $S$ is hard to characterize.

Approximate MAP via LP (Schlesinger; Deza & Laurent; Boros; Wainwright; Kolmogorov).

SLIDE 49

LP Formulation of MAP

$x^* = \arg\max_x \sum_{ij \in E} \theta_{ij}(x_i, x_j)$

$\max_x \sum_{ij} \theta_{ij}(x_i, x_j) = \max_{q(x)} \sum_x q(x) \sum_{ij} \theta_{ij}(x_i, x_j) = \max_{q(x)} \sum_{ij} \sum_{x_i, x_j} q_{ij}(x_i, x_j)\, \theta_{ij}(x_i, x_j)$

The maximum over distributions $q(x)$ is attained by $q^*(x)$, which puts all its mass (1) on $x^*$.

- The objective depends only on the pairwise marginals
- But only on those that correspond to some distribution $q(x)$
- This set is called the marginal polytope $M(G)$ (Wainwright & Jordan)

$\max_x \sum_{ij} \theta_{ij}(x_i, x_j) = \max_{\mu \in M(G)} \sum_{ij} \sum_{x_i, x_j} \mu_{ij}(x_i, x_j)\, \theta_{ij}(x_i, x_j) = \max_{\mu \in M(G)} \mu \cdot \theta$

See: cut polytope (Deza, Laurent), quadric polytope (Boros).
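The identity on this slide can be checked numerically: for any distribution $q(x)$, its pairwise marginals $\mu$ satisfy $\mu \cdot \theta = \sum_x q(x) f(x)$, and a point mass on $x^*$ gives back $f(x^*)$. A small sketch with hypothetical helper names, assuming `theta[(i, j)][a][b]` holds $\theta_{ij}(a, b)$ and `q` is a dict from assignments to probabilities:

```python
from itertools import product


def score(x, theta):
    """f(x) = sum_ij theta_ij(x_i, x_j), with theta[(i, j)][a][b] = theta_ij(a, b)."""
    return sum(t[x[i]][x[j]] for (i, j), t in theta.items())


def pairwise_marginals(q, theta, k):
    """mu_ij(a, b) = sum of q(x) over all x with x_i = a and x_j = b, per edge."""
    mu = {e: [[0.0] * k for _ in range(k)] for e in theta}
    for x, px in q.items():
        for (i, j) in theta:
            mu[(i, j)][x[i]][x[j]] += px
    return mu


def dot(mu, theta):
    """mu . theta = sum_ij sum_{a, b} mu_ij(a, b) * theta_ij(a, b)."""
    return sum(mu[e][a][b] * t[a][b]
               for e, t in theta.items()
               for a in range(len(t)) for b in range(len(t[a])))


theta = {
    (0, 1): [[2.0, 0.0], [0.0, 1.0]],
    (1, 2): [[0.0, 1.5], [0.5, 0.0]],
}
xs = list(product(range(2), repeat=3))
uniform = {x: 1.0 / len(xs) for x in xs}
expected_f = sum(p * score(x, theta) for x, p in uniform.items())
print(dot(pairwise_marginals(uniform, theta, 2), theta), expected_f)  # 1.25 1.25

delta = {(0, 0, 1): 1.0}  # q*(x): all mass on the MAP assignment of this toy model
print(dot(pairwise_marginals(delta, theta, 2), theta))  # 3.5
```

The point mass gives an integral $\mu$, i.e. a vertex of $M(G)$; mixtures give interior points whose value is an average, never more than the best vertex.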

SLIDE 54

The Marginal Polytope

$\mu \in M(G)$ iff there exists a $p(x)$ s.t. $p(x_i, x_j) = \mu_{ij}(x_i, x_j)$.

$\max_{\mu \in M(G)} \sum_{ij \in E} \sum_{x_i, x_j} \mu_{ij}(x_i, x_j)\, \theta_{ij}(x_i, x_j)$

- Difficult set to characterize, but easy to outer bound
- The vertices have integral values and correspond to assignments of $x$

SLIDE 62

Relaxing the MAP LP

Exact but hard:

$\max_x \sum_{ij} \theta_{ij}(x_i, x_j) = \max_{\mu \in M(G)} \sum_{ij \in E} \sum_{x_i, x_j} \mu_{ij}(x_i, x_j)\, \theta_{ij}(x_i, x_j)$

Relaxation: replace $M(G)$ with an outer bound $S \supseteq M(G)$:

$\max_x \sum_{ij} \theta_{ij}(x_i, x_j) \le \max_{\mu \in S} \sum_{ij \in E} \sum_{x_i, x_j} \mu_{ij}(x_i, x_j)\, \theta_{ij}(x_i, x_j)$

- If the optimum is an integral vertex, MAP is solved
- Possible outer bound: pairwise consistency. For edges $ij$ and $jk$ sharing node $j$: $\sum_{x_i} \mu_{ij}(x_i, x_j) = \sum_{x_k} \mu_{jk}(x_j, x_k)$
- Exact for trees
- Efficient message passing schemes solve the resulting (dual) LP
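A sketch of the pairwise-consistency (local polytope) test, plus a small example of why it is only an outer bound off trees. On a triangle of binary variables, pseudo-marginals that put all mass on "endpoints disagree" for every edge pass the test, yet no assignment can make all three edges disagree, so the relaxed value $\mu \cdot \theta$ exceeds $\max_x f(x)$. Function names are hypothetical; the check follows the slide's condition (edges sharing a node induce the same marginal on it) plus nonnegativity and normalization.

```python
from itertools import product


def induced_node_marginal(table, edge, node):
    """Marginal of `node` implied by an edge table indexed as table[x_i][x_j]."""
    i, _ = edge
    if node == i:
        return [sum(row) for row in table]      # sum out x_j
    return [sum(col) for col in zip(*table)]    # sum out x_i


def is_pairwise_consistent(mu, tol=1e-9):
    """Local-polytope test for edge pseudo-marginals mu[(i, j)][a][b].

    Checks nonnegativity, normalization, and that all edges sharing a node
    induce the same marginal on that node.
    """
    seen = {}
    for edge, table in mu.items():
        if any(v < -tol for row in table for v in row):
            return False
        if abs(sum(map(sum, table)) - 1.0) > tol:
            return False
        for node in edge:
            m = induced_node_marginal(table, edge, node)
            if node in seen and any(abs(a - b) > tol for a, b in zip(m, seen[node])):
                return False
            seen.setdefault(node, m)
    return True


def dot(mu, theta):
    """mu . theta = sum_ij sum_{a, b} mu_ij(a, b) * theta_ij(a, b)."""
    return sum(mu[e][a][b] * t[a][b] for e, t in theta.items()
               for a in range(len(t)) for b in range(len(t[a])))


# Triangle 0-1-2 with binary variables.
triangle = [(0, 1), (1, 2), (0, 2)]
frustrated = {e: [[0.0, 0.5], [0.5, 0.0]] for e in triangle}  # edges always "disagree"
disagree = {e: [[0.0, 1.0], [1.0, 0.0]] for e in triangle}    # theta_ij = [x_i != x_j]

lp_value = dot(frustrated, disagree)
map_value = max(sum(x[i] != x[j] for i, j in triangle)
                for x in product(range(2), repeat=3))
print(is_pairwise_consistent(frustrated), lp_value, map_value)  # True 3.0 2
```

On a tree the same test would be exact, which is the "Exact for trees" bullet.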

SLIDE 71

The 2nd best problem and LP

MAP: $\max_x f(x) = \max_{\mu \in M(G)} \mu \cdot \theta$

2nd best: $\max_{x \neq x^{(1)}} f(x) = \max_{\mu \in M(G, x^{(1)})} \mu \cdot \theta$

Approximations: [figure]

SLIDE 80

A new marginal polytope

Given an assignment $z$, define the Assignment-Excluding Marginal Polytope $M(G, z)$:

$\mu \in M(G, z)$ iff there exists a $p(x)$ s.t. $p(x_i, x_j) = \mu_{ij}(x_i, x_j)$ and $p(z) = 0$.

[Figure: $M(G)$ with the vertex $z$, and the smaller polytope $M(G, z)$]

SLIDE 84

LP for the 2nd best problem

The 2nd best problem corresponds to the following LP:

$\max_{x \neq x^{(1)}} f(x; \theta) = \max_{\mu \in M(G, x^{(1)})} \mu \cdot \theta$

- Is there a simple characterization of $M(G, x^{(1)})$?
- Is it $M(G)$ plus one inequality?
- If so, what inequality?

SLIDE 92

Adding inequalities to $M(G)$

Any valid inequality must separate $z$ from the other vertices. How about (Santos 91):

$\sum_i \mu_i(z_i) \le n - 1$

- The LHS is $n$ for $z$ and $n - 1$ or less for the other vertices
- But: this results in fractional vertices, even for trees
- So it is only an outer bound on $M(G, z)$
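A quick check of the separation claim at integral vertices. At the vertex of an assignment $x$, $\mu_i(z_i)$ is 1 when $x_i = z_i$ and 0 otherwise, so the left-hand side counts agreements with $z$. This sketch (hypothetical helper name) only checks integral vertices; it says nothing about the fractional vertices the slide warns about.

```python
from itertools import product


def santos_lhs(x, z):
    """sum_i mu_i(z_i) at the integral vertex of assignment x.

    There mu_i(z_i) = 1 if x_i == z_i and 0 otherwise, so this counts agreements.
    """
    return sum(1 for xi, zi in zip(x, z) if xi == zi)


n, z = 4, (0, 1, 1, 0)
for x in product(range(2), repeat=n):
    # Only the vertex of z itself violates sum_i mu_i(z_i) <= n - 1.
    assert (santos_lhs(x, z) > n - 1) == (x == z)
print("the inequality separates z from every other integral vertex")
```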

SLIDE 104

The tree case

Focus on the case where $G$ is a tree, so $M(G)$ is given by pairwise consistency. Define, with $d_i$ the degree of node $i$:

$I(\mu, z) = \sum_i (1 - d_i)\, \mu_i(z_i) + \sum_{ij \in G} \mu_{ij}(z_i, z_j)$

(Compare the Bethe entropy: $H(\mu) = \sum_i (1 - d_i) H(X_i) + \sum_{ij \in G} H(X_i, X_j)$.)

Theorem: $M(G, z) = \{\mu \mid \mu \in M(G),\ I(\mu, z) \le 0\}$

[Figure: the half-space $I(\mu, z) \le 0$ cuts the vertex $z$ off $M(G)$, leaving $M(G, z)$]

Proof...
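A small numerical check of the tree inequality at integral vertices, under our own encoding of the vertex of an assignment $x$: $\mu_i(z_i) = [x_i = z_i]$ and $\mu_{ij}(z_i, z_j) = [x_i = z_i \wedge x_j = z_j]$. In our own check (an observation, not a claim from the talk), on a tree $I$ equals 1 minus the number of connected components of the set where $x$ disagrees with $z$: it is 1 at $z$, 0 when $x$ differs from $z$ on one connected piece, and negative otherwise. So $I(\mu, z) \le 0$ keeps every vertex except $z$. Helper names are hypothetical.

```python
from itertools import product


def tree_inequality(x, z, edges, n):
    """I(mu, z) = sum_i (1 - d_i) mu_i(z_i) + sum_ij mu_ij(z_i, z_j), evaluated at the
    integral vertex mu of assignment x."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    agree = [x[i] == z[i] for i in range(n)]
    node_term = sum(1 - deg[i] for i in range(n) if agree[i])
    edge_term = sum(1 for i, j in edges if agree[i] and agree[j])
    return node_term + edge_term


def num_components(nodes, edges):
    """Connected components of the subgraph induced by `nodes` (union-find)."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    count = len(parent)
    for i, j in edges:
        if i in parent and j in parent:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[ri] = rj
                count -= 1
    return count


# A 5-node tree: 0-1, 1-2, 1-3, 3-4.
n, edges, z = 5, [(0, 1), (1, 2), (1, 3), (3, 4)], (0, 1, 0, 1, 1)
for x in product(range(2), repeat=n):
    diff = [i for i in range(n) if x[i] != z[i]]
    value = tree_inequality(x, z, edges, n)
    assert value == 1 - num_components(diff, edges)
    assert (value <= 0) == (x != z)  # the cut removes exactly the vertex z
```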

SLIDE 112

Proof

Define: $A(G, z) = \{\mu \mid \mu \in M(G),\ I(\mu, z) \le 0\}$

Want to show: $A(G, z) = M(G, z)$. That is, if $\mu \in A(G, z)$, we can construct a $p(x)$ that has these marginals and $p(z) = 0$.

Define:

$F(\mu) = \min_p p(z) \;\text{s.t.}\; p_{ij}(x_i, x_j) = \mu_{ij}(x_i, x_j),\ p_i(x_i) = \mu_i(x_i),\ p(x) \ge 0$

We need $F(\mu) = 0$ for all $\mu \in A(G, z)$. In fact, we can show that for trees, for all $\mu \in M(G)$:

$F(\mu) = \max\{0, I(\mu, z)\}$

SLIDE 119

Proof - key ideas

$F(\mu) = \min_p p(z) \;\text{s.t.}\; p_{ij}(x_i, x_j) = \mu_{ij}(x_i, x_j),\ p_i(x_i) = \mu_i(x_i),\ p(x) \ge 0$

Drop the constraint $p(z) \ge 0$, keeping $p(x) \ge 0$ only $\forall x \neq z$. The dual of this relaxed LP is:

$\max_\lambda \lambda \cdot \mu \;\text{s.t.}\; \sum_{ij} \lambda_{ij}(x_i, x_j) + \sum_i \lambda_i(x_i) \le 0 \ \ \forall x \neq z, \qquad \sum_{ij} \lambda_{ij}(z_i, z_j) + \sum_i \lambda_i(z_i) = 1$

We show that the value of the relaxed LP is $I(\mu, z)$. From there it is easy to conclude that $F(\mu) = \max\{0, I(\mu, z)\}$.

SLIDE 125

Proof - Max marginals

Write $\lambda(x) = \sum_{ij} \lambda_{ij}(x_i, x_j) + \sum_i \lambda_i(x_i)$, so the dual reads:

$\max \lambda \cdot \mu \;\text{s.t.}\; \lambda(x) \le 0 \ \ \forall x \neq z, \qquad \lambda(z) = 1$

Use max-marginals:

$\bar\lambda(x_i) = \max_{\hat x : \hat x_i = x_i} \lambda(\hat x), \qquad \bar\lambda(x_i, x_j) = \max_{\hat x : \hat x_i = x_i, \hat x_j = x_j} \lambda(\hat x)$

The constraints imply $\bar\lambda(z_i) = 1$ and $\bar\lambda(x_i) \le 0$ for $x_i \neq z_i$.

Rewrite (for the tree $T$):

$\lambda(x) = \sum_i (1 - d_i)\, \bar\lambda(x_i) + \sum_{ij \in T} \bar\lambda_{ij}(x_i, x_j)$

The result follows after some algebra.
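The max-marginal rewrite can be sanity-checked by brute force on a small tree. This sketch (hypothetical names; integer tables so comparisons are exact) computes $\bar\lambda_i$ and $\bar\lambda_{ij}$ by enumeration and asserts that $\sum_i (1 - d_i)\bar\lambda(x_i) + \sum_{ij}\bar\lambda_{ij}(x_i, x_j)$ reproduces $\lambda(x)$ for every $x$. It relies on the graph being a connected tree; on a graph with cycles the identity generally fails.

```python
from itertools import product


def lam(x, lam_edges, lam_nodes):
    """lambda(x) = sum_ij lambda_ij(x_i, x_j) + sum_i lambda_i(x_i)."""
    return (sum(t[x[i]][x[j]] for (i, j), t in lam_edges.items())
            + sum(v[x[i]] for i, v in lam_nodes.items()))


def max_marginals(n, k, lam_edges, lam_nodes):
    """Brute-force bar-lambda_i(a) and bar-lambda_ij(a, b): max of lambda over consistent x."""
    vals = {x: lam(x, lam_edges, lam_nodes) for x in product(range(k), repeat=n)}
    node = {i: [max(v for x, v in vals.items() if x[i] == a) for a in range(k)]
            for i in range(n)}
    edge = {(i, j): [[max(v for x, v in vals.items() if x[i] == a and x[j] == b)
                      for b in range(k)] for a in range(k)]
            for (i, j) in lam_edges}
    return vals, node, edge


def reparametrized(x, node, edge, n):
    """sum_i (1 - d_i) bar-lambda_i(x_i) + sum_ij bar-lambda_ij(x_i, x_j)."""
    deg = [0] * n
    for i, j in edge:
        deg[i] += 1
        deg[j] += 1
    return (sum((1 - deg[i]) * node[i][x[i]] for i in range(n))
            + sum(t[x[i]][x[j]] for (i, j), t in edge.items()))


# Star-shaped tree: node 1 is joined to nodes 0, 2 and 3 (illustrative tables).
lam_edges = {(0, 1): [[3, -1], [0, 2]], (1, 2): [[1, 4], [-2, 0]], (1, 3): [[0, 1], [5, -3]]}
lam_nodes = {0: [1, 0], 1: [0, 2], 2: [-1, 1], 3: [2, 0]}
vals, node, edge = max_marginals(4, 2, lam_edges, lam_nodes)
assert all(vals[x] == reparametrized(x, node, edge, 4) for x in vals)
```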

SLIDE 133

Tree Graph - Summary

$M(G, x^{(1)}) = \{\mu \mid \mu \in M(G),\ I(\mu, x^{(1)}) \le 0\}$

- The LP for 2nd best differs from the marginal polytope by one linear inequality constraint
- The 2nd best $x^{(2)}$ satisfies $I(\mu, x^{(1)}) = 0$, so it cannot be an arbitrary assignment

[Figure: the vertex $x^{(1)}$ and candidate vertices $x^{(2)}$, some of which are ruled out]

SLIDE 134

Non tree graphs

Any graph can be converted into a junction tree, and we can apply our tree result there. For a junction tree with cliques $\mathcal{C}$ and separators $\mathcal{S}$, the inequality is:

$\sum_{S \in \mathcal{S}} (1 - d_S)\, \mu_S(z_S) + \sum_{C \in \mathcal{C}} \mu_C(z_C) \le 0$

However, specifying the marginal polytope then requires a number of variables exponential in the tree width. Not practical.

SLIDE 141

Non trees - Approximations

[Figure: the vertex $x^{(1)}$, the true $M(G, x^{(1)})$, and an outer bound on $M(G)$]

SLIDE 149

Spanning tree inequalities

Given a spanning subtree $T$ of $G$, define (with $d_i$ the degree of node $i$ in $T$)

$I_T(\mu, z) = \sum_i (1 - d_i)\, \mu_i(z_i) + \sum_{ij \in T} \mu_{ij}(z_i, z_j)$

and the constraint $I_T(\mu, z) \le 0$.

This separates $z$ from the other vertices, but might result in fractional vertices.

[Figure: the cut $I_T(\mu, z) \le 0$ removes $z$ but creates a fractional vertex]

SLIDE 155

Adding all spanning trees

Can we add all spanning tree inequalities efficiently? Yes, via a cutting plane approach:
- Start with one inequality
- Solve the LP
- If the solution is fractional, find a violated tree inequality (if one exists) and add it

SLIDE 163

Cutting Plane Algorithm

[Figure: starting from the inequality for tree $T_1$, the LP returns $\mu_1$. Is there a tree inequality that $\mu_1$ violates? If so, add it (tree $T_2$).]

How do we find a violated tree inequality? Note: even all spanning tree inequalities might not suffice.

SLIDE 170

Finding a violated spanning tree

For a given $\mu$, find $\max_T I_T(\mu, z)$. If it is positive, add the maximizing tree.

How can we maximize over all trees? Note that:

$I_T(\mu, z) = \sum_{ij \in T} \underbrace{\left[\mu_{ij}(z_i, z_j) - \mu_i(z_i) - \mu_j(z_j)\right]}_{w_{ij}} + \underbrace{\sum_i \mu_i(z_i)}_{\text{fixed}}$

This decomposes into edge scores $w_{ij}$, so the maximizing tree can be found using a maximum-weight spanning tree algorithm (e.g., Wainwright 02).
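The separation step can be sketched as follows: compute the edge weights $w_{ij}$, run a maximum-weight spanning tree algorithm (Kruskal's algorithm here, our choice), and add back the fixed term $\sum_i \mu_i(z_i)$. Names and data layout are assumptions. The example uses fractional, pairwise-consistent pseudo-marginals on a triangle; for $z = (0, 1, 0)$ the best tree gives $I_T = 0.5 > 0$, so that tree's inequality cuts this point off. The block also compares against brute force over all spanning trees of the triangle.

```python
from itertools import combinations


def I_T(mu_node, mu_edge, z, tree):
    """I_T(mu, z) = sum_i (1 - d_i) mu_i(z_i) + sum_{ij in T} mu_ij(z_i, z_j), d_i = degree in T."""
    deg = [0] * len(mu_node)
    for i, j in tree:
        deg[i] += 1
        deg[j] += 1
    return (sum((1 - deg[i]) * mu_node[i][z[i]] for i in range(len(mu_node)))
            + sum(mu_edge[(i, j)][z[i]][z[j]] for i, j in tree))


def max_weight_spanning_tree(n, weights):
    """Kruskal's algorithm on a connected graph; weights maps (i, j) -> w_ij."""
    parent = list(range(n))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree = []
    for (i, j), _ in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
        ri, rj = find(i), find(j)
        if ri != rj:  # adding (i, j) does not close a cycle
            parent[ri] = rj
            tree.append((i, j))
    return tree


def most_violated_tree(mu_node, mu_edge, z):
    """Return (max_T I_T(mu, z), maximizing T) via the edge-weight decomposition."""
    w = {(i, j): t[z[i]][z[j]] - mu_node[i][z[i]] - mu_node[j][z[j]]
         for (i, j), t in mu_edge.items()}
    tree = max_weight_spanning_tree(len(mu_node), w)
    fixed = sum(mu_node[i][z[i]] for i in range(len(mu_node)))
    return sum(w[e] for e in tree) + fixed, tree


# Frustrated triangle: pairwise consistent but fractional (every edge "disagrees").
mu_node = [[0.5, 0.5]] * 3
mu_edge = {e: [[0.0, 0.5], [0.5, 0.0]] for e in [(0, 1), (1, 2), (0, 2)]}
z = (0, 1, 0)
value, tree = most_violated_tree(mu_node, mu_edge, z)
# In a triangle every pair of edges is a spanning tree, so brute force is easy.
brute = max(I_T(mu_node, mu_edge, z, list(T)) for T in combinations(mu_edge, 2))
print(value, tree, brute)  # 0.5 [(0, 1), (1, 2)] 0.5
```

Because every spanning tree has $n - 1$ edges, maximizing $\sum_{ij \in T} w_{ij}$ is a standard maximum-weight spanning tree problem even when all weights are negative, as they are here.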

SLIDE 171

Experiments

Alternative algorithms for approximate 2nd best:
- Using approximate max-marginals from max-product (BMMF; Yanover and Weiss 04)
- Lawler/Nilsson (72, 80): partition the assignments $x \neq x^{(1)}$ into $n$ parts and maximize over each part approximately. Cost O(n). The parts are ($*$ = unconstrained):
  - $x_1 \neq x^{(1)}_1$, $x_2 = *$, $x_3 = *$, ..., $x_n = *$
  - $x_1 = x^{(1)}_1$, $x_2 \neq x^{(1)}_2$, $x_3 = *$, ..., $x_n = *$
  - ...
  - $x_1 = x^{(1)}_1$, $x_2 = x^{(1)}_2$, ..., $x_{n-1} = x^{(1)}_{n-1}$, $x_n \neq x^{(1)}_n$
- Our algorithm: STRIPES
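The Lawler/Nilsson partition can be sketched as below for the 2nd best. Each part is a MAP problem with some variables clamped and one value forbidden; here every part is maximized by brute force, standing in for the (approximate) MAP solver the slide refers to, so the number of solver calls is $n$. Function names and the `theta[(i, j)][a][b]` layout are hypothetical.

```python
from itertools import product


def score(x, theta):
    """f(x) = sum_ij theta_ij(x_i, x_j), with theta[(i, j)][a][b] = theta_ij(a, b)."""
    return sum(t[x[i]][x[j]] for (i, j), t in theta.items())


def lawler_cells(x1):
    """Partition {x : x != x1} into n disjoint cells.

    Cell i clamps x_j = x1[j] for j < i, forbids x_i = x1[i], and leaves the
    remaining variables free ("*").
    """
    return [({j: x1[j] for j in range(i)}, i, x1[i]) for i in range(len(x1))]


def in_cell(x, cell):
    clamped, i, forbidden = cell
    return all(x[j] == v for j, v in clamped.items()) and x[i] != forbidden


def second_best(n, k, theta, x1):
    """Best over cells of each cell's constrained maximum (brute-forced here)."""
    candidates = []
    for cell in lawler_cells(x1):
        members = [x for x in product(range(k), repeat=n) if in_cell(x, cell)]
        if members:  # a cell is empty only if some variable has a single state
            candidates.append(max(members, key=lambda x: score(x, theta)))
    return max(candidates, key=lambda x: score(x, theta))


theta = {
    (0, 1): [[2.0, 0.0], [0.0, 1.0]],
    (1, 2): [[0.0, 1.5], [0.5, 0.0]],
}
x1 = (0, 0, 1)  # the MAP assignment of this toy model
print(second_best(3, 2, theta, x1))  # (0, 0, 0)
```

Because the cells are disjoint and cover every $x \neq x^{(1)}$, the best of the per-cell maxima is exactly the 2nd best when each cell is solved exactly; with an approximate solver per cell the result is approximate too.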

SLIDE 172

Attractive Grids

- Ising models with ferromagnetic interactions
- The local polytope is guaranteed to yield the exact first best (but it is not equal to the marginal polytope)
- Goal: find the 50 best. Stripes and Nilsson find all of them exactly
- Up to 19 spanning trees added

[Figure: Stripes, Nilsson, and BMMF compared over the top 50 solutions; panels: Rank, Run Time]

SLIDE 173

Protein Side Chain Prediction

Given a protein's 3D shape (backbone), choose the most probable side chain configuration.

[Figure: protein backbone with side chains $x_i, x_j, x_k, x_h$ as nodes of a graph $G = (V, E)$]

(MRFs from Yanover, Meltzer, Weiss '06)

- Can be cast as a MAP problem: $p(x) \propto e^{\sum_{ij \in E} \theta_{ij}(x_i, x_j)}$
- Important to obtain multiple possible solutions

SLIDE 174

Protein Side Chain Prediction

- Stripes found the exact solutions for all problems studied
- In some cases, we used a tighter approximation of the marginal polytope (Sontag et al., UAI 08)

[Figure: Stripes, Nilsson, and BMMF compared over the top 50 solutions]

SLIDE 181

Open Questions

- When are spanning trees enough?
- What is the polytope structure for k-best?
- Finding k-best "different" solutions
- Scalable algorithms
- If a given problem is solved with a marginal polytope relaxation, what can we say about the second best?

SLIDE 182

Summary

- The 2nd best can be posed as a linear program
- For trees, it differs from the 1st best by one constraint only
- For non-trees, an approximation can be devised by adding inequalities for all spanning trees
- Empirically effective