SLIDE 1 Finding k-best MAP Solutions Using LP Relaxations
Amir Globerson School of Computer Science and Engineering The Hebrew University
Joint Work with: Menachem Fromer (Hebrew Univ.)
SLIDE 3 Prediction Problems
Consider the following problem:
- Observe variables: $x_v$ (visible)
- Predict variables: $x_h$ (hidden)
Countless applications:

Application               Visible ($x_v$)   Hidden ($x_h$)
Images                    Noisy image       Source image
Error correcting codes    Received bits     Code word
Medical diagnostics       Symptoms          Disease
Text                      Sentence          Derivation
SLIDE 9 Statistical Models for Prediction
One approach:
- Assume (or learn) a model for $p(x_h, x_v)$
- Predict the most likely hidden values: $\arg\max_{x_h} p(x_h \mid x_v)$
This conditional distribution often corresponds to a graphical model.
We need to know how to find an assignment with maximum probability.
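SLIDE 10 The MAP Problem
Given a graphical model over $x_1, \ldots, x_n$:
$f(x) = \sum_{ij} \theta_{ij}(x_i, x_j)$, $p(x) = \frac{1}{Z} e^{f(x)}$
Find the most likely assignment: $\arg\max_x f(x)$
[Figure: an edge between nodes $x_i$ and $x_j$ with potential $\theta_{ij}(x_i, x_j)$]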
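To make the MAP problem concrete, here is a minimal brute-force sketch. The 3-variable binary chain and its potentials are hypothetical values chosen for illustration, and exhaustive enumeration is only feasible for tiny models.

```python
import itertools
import math

def score(theta, x):
    """f(x): sum over edges (i, j) of theta[(i, j)][x_i][x_j]."""
    return sum(table[x[i]][x[j]] for (i, j), table in theta.items())

def brute_force_map(theta, n_vars, n_states):
    """Return arg max_x f(x) by enumerating every assignment."""
    assignments = itertools.product(range(n_states), repeat=n_vars)
    return max(assignments, key=lambda x: score(theta, x))

# Hypothetical chain x0 - x1 - x2 with binary variables.
theta = {
    (0, 1): [[1.0, 0.0], [0.0, 2.0]],
    (1, 2): [[0.5, 0.0], [0.0, 0.2]],
}
x_map = brute_force_map(theta, n_vars=3, n_states=2)

# p(x) = exp(f(x)) / Z, with Z summing over all assignments.
Z = sum(math.exp(score(theta, x))
        for x in itertools.product(range(2), repeat=3))
p_map = math.exp(score(theta, x_map)) / Z
```

Because Z does not depend on x, maximizing p(x) and maximizing f(x) pick the same assignment.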
SLIDE 20 MAP Approximations
$x$ is discrete, so the problem is generally NP-hard
Many approximation approaches:
- Greedy search
- Loopy belief propagation (e.g., max product)
- Linear programming relaxations
LP approaches:
- Provide optimality certificates
- Optimal in some cases (e.g., submodular functions)
- Can be solved via message passing
SLIDE 27 The k-best MAP Problem
- Find the k best assignments for $f(x)$
- Denote these by $x^{(1)}, \ldots, x^{(k)}$
- Useful in:
  - Finding multiple candidate solutions when the energy function is not accurate (e.g., protein design)
  - As a first processing stage before applying more complex methods
  - Supervised learning
SLIDE 28 From 2 to k best
- We can show that given a polynomial algorithm for k=2, the problem can be solved for any k in O(k)
- Focus on k=2
- Our key question: what is the LP formulation of the problem, and its relaxations?
SLIDE 29 Outline
- LP formulation of the MAP problem
- LP for 2nd best
  - General (intractable) exact formulation
  - Tractable formulation for tree graphs
  - Approximations for non-tree graphs
- Experiments
SLIDE 38 MAP and LP
- MAP: $\max_x f(x)$
- MAP as LP: $\max_{\mu \in S} \mu \cdot \theta$
- Hard
- Approximate MAP via LP
- Schlesinger, Deza & Laurent, Boros, Wainwright, Kolmogorov
[Figure: the polytope $S$]
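SLIDE 49 LP Formulation of MAP
$x^* = \arg\max_x \sum_{ij} \theta_{ij}(x_i, x_j)$
$\max_{q(x)} \sum_x q(x) \sum_{ij} \theta_{ij}(x_i, x_j) = \max_{q(x)} \sum_{ij} \sum_{x_i, x_j} q_{ij}(x_i, x_j)\,\theta_{ij}(x_i, x_j)$
[Figure: the maximizing $q^*(x)$ puts probability 1 on $x^*$]
- Objective depends only on pairwise marginals
- But only those that correspond to some distribution $q(x)$
- This set is called the Marginal polytope (Wainwright & Jordan)
$\max_x \sum_{ij} \theta_{ij}(x_i, x_j) = \max_{\mu \in M(G)} \sum_{ij} \sum_{x_i, x_j} \mu_{ij}(x_i, x_j)\,\theta_{ij}(x_i, x_j) = \max_{\mu \in M(G)} \mu \cdot \theta$
See: Cut polytope (Deza, Laurent), Quadric polytope (Boros)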
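The step from distributions to marginals can be checked numerically. The sketch below uses hypothetical potentials and an arbitrary distribution q. It shows that the expected score under q equals the linear form µ · θ in q's pairwise marginals, and that a point mass on x* reaches the MAP value.

```python
import itertools

def score(theta, x):
    """f(x): sum over edges of theta[(i, j)][x_i][x_j]."""
    return sum(t[x[i]][x[j]] for (i, j), t in theta.items())

def pairwise_marginals(q, edges, n_states):
    """mu[(i, j)][a][b] = total q-mass of assignments with x_i = a, x_j = b."""
    mu = {e: [[0.0] * n_states for _ in range(n_states)] for e in edges}
    for x, prob in q.items():
        for (i, j) in edges:
            mu[(i, j)][x[i]][x[j]] += prob
    return mu

def dot(mu, theta):
    """The linear objective mu . theta over all edges and state pairs."""
    return sum(mu[e][a][b] * theta[e][a][b]
               for e in theta
               for a in range(len(theta[e]))
               for b in range(len(theta[e][a])))

# Hypothetical potentials on a binary chain x0 - x1 - x2.
theta = {(0, 1): [[1.0, 0.0], [0.0, 2.0]], (1, 2): [[0.5, 0.0], [0.0, 0.2]]}
states = list(itertools.product(range(2), repeat=3))

# An arbitrary distribution q(x): weights 1..8, normalized by 36.
q = {x: (k + 1) / 36.0 for k, x in enumerate(states)}
expected_score = sum(prob * score(theta, x) for x, prob in q.items())
linear_score = dot(pairwise_marginals(q, list(theta), 2), theta)

# A point mass on x* is a vertex of the marginal polytope.
x_star = max(states, key=lambda x: score(theta, x))
q_star = {x: 1.0 if x == x_star else 0.0 for x in states}
map_value = dot(pairwise_marginals(q_star, list(theta), 2), theta)
```

No distribution can score above the point mass on x*, which is why the LP over the marginal polytope is exact.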
SLIDE 54 The Marginal Polytope
$\max_{\mu \in M(G)} \sum_{ij} \sum_{x_i, x_j} \mu_{ij}(x_i, x_j)\,\theta_{ij}(x_i, x_j)$
Marginal polytope $M(G)$: the set of $\mu$ for which there exists a $p(x)$ s.t. $p(x_i, x_j) = \mu_{ij}(x_i, x_j)$
- Difficult set to characterize. Easy to outer bound
- The vertices have integral values and correspond to assignments on $x$
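SLIDE 56 Relaxing the MAP LP
$\max_x \sum_{ij} \theta_{ij}(x_i, x_j) = \max_{\mu \in M(G)} \sum_{ij} \sum_{x_i, x_j} \mu_{ij}(x_i, x_j)\,\theta_{ij}(x_i, x_j)$
Exact but Hard!
[Figure: the marginal polytope $M(G)$]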
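SLIDE 62 Relaxing the MAP LP
$\max_x \sum_{ij} \theta_{ij}(x_i, x_j) \le \max_{\mu \in S} \sum_{ij} \sum_{x_i, x_j} \mu_{ij}(x_i, x_j)\,\theta_{ij}(x_i, x_j)$
[Figure: outer bound $S$ around $M(G)$. Exact for trees]
- If optimum is an integral vertex, MAP is solved
- Possible outer bound: Pairwise consistency. For edges $ij$ and $jk$ sharing node $j$:
  $\sum_{x_i} \mu_{ij}(x_i, x_j) = \sum_{x_k} \mu_{jk}(x_j, x_k)$
- Efficient message passing schemes for solving the resulting (dual) LP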
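A small example shows why pairwise consistency is only an outer bound on non-tree graphs. The sketch below is a standard illustration, not taken from the talk. It builds pseudo-marginals on a binary triangle that are pairwise consistent yet score higher than any real assignment. The potentials and variable names are hypothetical.

```python
import itertools

edges = [(0, 1), (1, 2), (0, 2)]
# Hypothetical potentials that reward neighbors for disagreeing.
theta = {e: [[0.0, 1.0], [1.0, 0.0]] for e in edges}
# Fractional pseudo-marginals: every edge splits its mass between (0,1) and (1,0).
mu = {e: [[0.0, 0.5], [0.5, 0.0]] for e in edges}

def node_marginal(mu, edge, node):
    """Marginal of `node` implied by the pairwise table of `edge`."""
    i, _ = edge
    table = mu[edge]
    if node == i:
        return [sum(row) for row in table]
    return [sum(table[a][b] for a in range(2)) for b in range(2)]

def is_pairwise_consistent(mu, edges, n_vars):
    """All edges touching a node must imply the same node marginal."""
    for v in range(n_vars):
        margs = [node_marginal(mu, e, v) for e in edges if v in e]
        for m in margs:
            if any(abs(m[s] - margs[0][s]) > 1e-12 for s in range(2)):
                return False
    return True

consistent = is_pairwise_consistent(mu, edges, 3)
lp_value = sum(mu[e][a][b] * theta[e][a][b]
               for e in edges for a in range(2) for b in range(2))
best_integral = max(
    sum(theta[(i, j)][x[i]][x[j]] for (i, j) in edges)
    for x in itertools.product(range(2), repeat=3))
```

Here the relaxed value is 3 while the best assignment reaches only 2, because three binary variables on a cycle cannot all disagree. This µ is therefore a fractional vertex of S that lies outside M(G).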
SLIDE 63 Outline
- LP formulation of the MAP problem
- LP for 2nd best
  - General (intractable) exact formulation
  - Tractable formulation for tree graphs
  - Approximations for non-tree graphs
- Experiments
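SLIDE 71 The 2nd best problem and LP
- MAP: $\max_x f(x)$, as LP: $\max_{\mu \in M(G)} \mu \cdot \theta$
- 2nd best: $\max_{x \ne x^{(1)}} f(x)$, as LP: $\max_{\mu \in M(G, x^{(1)})} \mu \cdot \theta$
- Approximations:
[Figure: polytope illustrations with the vertex $x^{(1)}$ marked]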
SLIDE 80 A new marginal polytope
Given an assignment $z$, define the Assignment Excluding Marginal Polytope $M(G, z)$:
the set of $\mu$ for which there exists a $p(x)$ s.t. $p(x_i, x_j) = \mu_{ij}(x_i, x_j)$ and $p(z) = 0$
[Figure: $M(G)$ with the vertex $z$, and $M(G, z)$]
SLIDE 84 LP for the 2nd best problem
The 2nd best problem corresponds to the following LP:
$\max_{x \ne x^{(1)}} f(x; \theta) = \max_{\mu \in M(G, x^{(1)})} \mu \cdot \theta$
- Is there a simple characterization of $M(G, x^{(1)})$?
- Is it $M(G)$ plus one inequality?
- If so, what inequality?
SLIDE 85 Outline
- LP formulation of the MAP problem
- LP for 2nd best
  - General (intractable) exact formulation
  - Tractable formulation for tree graphs
  - Approximations for non-tree graphs
- Experiments
SLIDE 92 Adding inequalities to $M(G)$
- Any valid inequality must separate $z$ from the other vertices
- How about: $\sum_i \mu_i(z_i) \le n - 1$ (Santos 91)
- The left-hand side is $n$ for $z$ and $n - 1$ or less for every other vertex
- But: Results in fractional vertices, even for trees
- Only an outer bound on $M(G, z)$
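SLIDE 97 The tree case
- Focus on the case where G is a tree
- $M(G)$ is given by pairwise consistency
- Define: $I(\mu, z) = \sum_i (1 - d_i)\,\mu_i(z_i) + \sum_{ij} \mu_{ij}(z_i, z_j)$, where $d_i$ is the degree of node $i$
- Bethe: $H(\mu) = \sum_i (1 - d_i)\,H_i(X_i) + \sum_{ij} H(X_i, X_j)$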
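SLIDE 104 The tree case
- Focus on the case where G is a tree
- $M(G)$ is given by pairwise consistency
- Define: $I(\mu, z) = \sum_i (1 - d_i)\,\mu_i(z_i) + \sum_{ij} \mu_{ij}(z_i, z_j)$
- Theorem: $M(G, z) = \{\mu \mid \mu \in M(G),\ I(\mu, z) \le 0\}$
[Figure: the inequality $I(\mu, z) \le 0$ cuts the vertex $z$ off $M(G)$, leaving $M(G, z)$]
Proof...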
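The tree inequality can be checked by hand on a small example. The sketch below assumes a hypothetical 4-node star with binary variables. It evaluates I(µ, z) at every integral vertex, and at one fractional point that the weaker inequality ∑ µi(zi) ≤ n − 1 lets through.

```python
import itertools

def vertex(x, edges, n_states=2):
    """Integral marginal vector (node and edge indicators) of assignment x."""
    nodes = [[1.0 if s == xi else 0.0 for s in range(n_states)] for xi in x]
    pairs = {(i, j): [[1.0 if (a, b) == (x[i], x[j]) else 0.0
                       for b in range(n_states)] for a in range(n_states)]
             for (i, j) in edges}
    return nodes, pairs

def mix(v1, v2, w=0.5):
    """Convex combination w * v1 + (1 - w) * v2 of two marginal vectors."""
    nodes = [[w * a + (1 - w) * b for a, b in zip(r1, r2)]
             for r1, r2 in zip(v1[0], v2[0])]
    pairs = {e: [[w * a + (1 - w) * b for a, b in zip(r1, r2)]
                 for r1, r2 in zip(v1[1][e], v2[1][e])] for e in v1[1]}
    return nodes, pairs

def tree_inequality(mu, z, edges):
    """I(mu, z) = sum_i (1 - d_i) mu_i(z_i) + sum_ij mu_ij(z_i, z_j)."""
    nodes, pairs = mu
    degree = [0] * len(z)
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    return (sum((1 - degree[i]) * nodes[i][z[i]] for i in range(len(z)))
            + sum(pairs[(i, j)][z[i]][z[j]] for (i, j) in edges))

edges = [(0, 1), (0, 2), (0, 3)]   # hypothetical star: node 0 is the center
z = (0, 0, 0, 0)
values = {x: tree_inequality(vertex(x, edges), z, edges)
          for x in itertools.product(range(2), repeat=4)}

# Midpoint between z and the all-ones assignment.
half = mix(vertex(z, edges), vertex((1, 1, 1, 1), edges))
santos_lhs = sum(half[0][i][z[i]] for i in range(4))   # sum_i mu_i(z_i)
I_half = tree_inequality(half, z, edges)
```

Only z itself gives I = 1, and every other vertex gives I ≤ 0. The midpoint has ∑ µi(zi) = 2 ≤ n − 1 = 3, so the weaker inequality accepts it. But I = 0.5 > 0 there, which is consistent with the theorem: a p(x) with these marginals cannot have p(z) = 0.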
SLIDE 106 Proof
Define: $A(G, z) = \{\mu \mid \mu \in M(G),\ I(\mu, z) \le 0\}$
Want to show: $A(G, z) = M(G, z)$
SLIDE 112 Proof
Define: $A(G, z) = \{\mu \mid \mu \in M(G),\ I(\mu, z) \le 0\}$
Want to show that if $\mu \in A(G, z)$ there exists a $p(x)$ that has these marginals and $p(z) = 0$.
$F(\mu) = \min_p p(z)$ s.t. $p_{ij}(x_i, x_j) = \mu_{ij}(x_i, x_j)$, $p_i(x_i) = \mu_i(x_i)$, $p(x) \ge 0$
That is: $F(\mu) = 0 \quad \forall \mu \in A(G, z)$
In fact we can show that for trees, for $\mu \in M(G)$: $F(\mu) = \max\{0, I(\mu, z)\}$
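SLIDE 119 Proof - key ideas
$F(\mu) = \min_p p(z)$ s.t. $p_{ij}(x_i, x_j) = \mu_{ij}(x_i, x_j)$, $p_i(x_i) = \mu_i(x_i)$, $p(x) \ge 0 \quad \forall x \ne z$
Dual: $\max_\lambda \lambda \cdot \mu$ s.t. $\sum_i \lambda_i(x_i) \le 0 \quad \forall x \ne z$, $\sum_i \lambda_i(z_i) = 1$
- We show that the value of the above is $I(\mu, z)$
- From there it's easy to conclude that $F(\mu) = \max\{0, I(\mu, z)\}$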
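SLIDE 125 Proof - Max marginals
$\max_\lambda \lambda \cdot \mu$ s.t. $\lambda(x) \le 0 \quad \forall x \ne z$, $\lambda(z) = 1$
where $\lambda(x) = \sum_{ij} \lambda_{ij}(x_i, x_j) + \sum_i \lambda_i(x_i)$
Use max-marginals:
$\bar\lambda_i(x_i) = \max_{\hat x : \hat x_i = x_i} \lambda(\hat x)$
$\bar\lambda_{ij}(x_i, x_j) = \max_{\hat x : \hat x_i = x_i, \hat x_j = x_j} \lambda(\hat x)$
so that $\bar\lambda_i(z_i) = 1$ and $\bar\lambda_i(x_i) \le 0$ for $x_i \ne z_i$
Rewrite: $\lambda(x) = \sum_i (1 - d_i)\,\bar\lambda_i(x_i) + \sum_{ij} \bar\lambda_{ij}(x_i, x_j)$
Result follows after some algebra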
SLIDE 133 Tree Graph - Summary
$M(G, x^{(1)}) = \{\mu \mid \mu \in M(G),\ I(\mu, x^{(1)}) \le 0\}$
- The LP for 2nd best differs from the marginal polytope by one linear inequality constraint
- The 2nd best satisfies [...] so it cannot be any assignment
[Figure: the vertex $x^{(1)}$ and candidate positions for $x^{(2)}$]
SLIDE 134 Non tree graphs
- Any graph can be converted into a junction tree
- We can apply our tree result there
- For a junction tree with cliques C and separators S, the inequality is:
  $\sum_S (1 - d_S)\,\mu_S(z_S) + \sum_C \mu_C(z_C) \le 0$
- Specifying the marginal polytope requires a number of variables exponential in the tree width. Not practical.
SLIDE 135 Outline
- LP formulation of the MAP problem
- LP for 2nd best
  - General (intractable) exact formulation
  - Tractable formulation for tree graphs
  - Approximations for non-tree graphs
- Experiments
SLIDE 141 Non trees - Approximations
[Figure: the true $M(G, x^{(1)})$ and an outer bound on $M(G)$, with the vertex $x^{(1)}$ marked]
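SLIDE 149 Spanning tree inequalities
Given a spanning subtree $T$ of $G$, define
$I_T(\mu, z) = \sum_i (1 - d_i)\,\mu_i(z_i) + \sum_{ij \in T} \mu_{ij}(z_i, z_j)$
where $d_i$ is the degree of node $i$ in $T$, and the constraint: $I_T(\mu, z) \le 0$
- Separates $z$ from the other vertices but might result in fractional vertices
[Figure: the cut $I_T(\mu, z) \le 0$ near the vertex $z$, with a fractional vertex]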
SLIDE 155 Adding all spanning trees
Can we add all spanning tree inequalities efficiently?
Yes, via a cutting plane approach:
- Start with one inequality
- Solve LP
- If solution is fractional, find a violated tree inequality (if exists) and add it
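The three steps above can be sketched as a generic loop. This is only a skeleton under stated assumptions. The callables `solve_lp`, `find_violated_tree` and `is_integral` are hypothetical placeholders for a real LP solver and separation routine, and the toy stand-ins below exist only to make the control flow runnable.

```python
def cutting_plane(solve_lp, find_violated_tree, is_integral, first_tree,
                  max_rounds=50):
    """Add spanning-tree inequalities until the LP solution is integral
    or no violated tree inequality can be found."""
    trees = [first_tree]
    mu = None
    for _ in range(max_rounds):
        mu = solve_lp(trees)                  # LP with the current inequalities
        if is_integral(mu):
            break                             # integral optimum: done
        tree = find_violated_tree(mu, trees)  # separation step
        if tree is None:
            break                             # fractional, but nothing to add
        trees.append(tree)
    return mu, trees

# Toy stand-ins (assumptions, not a real LP): the "solution" stays
# fractional until two tree inequalities are present.
def toy_solve_lp(trees):
    return [1.0, 0.0] if len(trees) >= 2 else [0.5, 0.5]

def toy_is_integral(mu):
    return all(v in (0.0, 1.0) for v in mu)

def toy_find_violated_tree(mu, trees):
    return "T%d" % (len(trees) + 1)

mu, trees = cutting_plane(toy_solve_lp, toy_find_violated_tree,
                          toy_is_integral, "T1")
```

The loop can stop on a fractional solution when no tree inequality is violated, so the result is not guaranteed to be integral.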
SLIDE 163 Cutting Plane Algorithm
- How do we find a violated tree inequality?
- Note: Even all spanning tree inequalities might not suffice
[Figure: vertex $z$, LP solution $\mu_1$, and tree inequalities $T_1$, $T_2$. Is there a tree inequality that $\mu_1$ violates?]
SLIDE 170 Finding a violated spanning tree
For a given $\mu$ find $\max_T I_T(\mu, z)$. If it's positive, add the maximizing tree.
How can we maximize over all trees? Note that:
$I_T(\mu, z) = \sum_{ij \in T} \left[ \mu_{ij}(z_i, z_j) - \mu_i(z_i) - \mu_j(z_j) \right] + \sum_i \mu_i(z_i)$
- The bracketed term is an edge weight $w_{ij}$
- The last sum is fixed (it does not depend on $T$)
- Decomposes into edge scores. Maximizing tree can be found using a maximum-weight-spanning-tree algorithm (e.g., Wainwright 02)
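The separation step reduces to a maximum-weight spanning tree, which the sketch below implements with Kruskal's algorithm. The choice of Kruskal, the data layout and the example pseudo-marginals on a triangle are assumptions made for illustration.

```python
def edge_weights(mu_nodes, mu_edges, z):
    """w_ij = mu_ij(z_i, z_j) - mu_i(z_i) - mu_j(z_j)."""
    return {(i, j): t[z[i]][z[j]] - mu_nodes[i][z[i]] - mu_nodes[j][z[j]]
            for (i, j), t in mu_edges.items()}

def max_spanning_tree(n, weights):
    """Kruskal's algorithm, taking the heaviest edges first (union-find)."""
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]   # path halving
            a = parent[a]
        return a

    tree = []
    for (i, j), _ in sorted(weights.items(), key=lambda kv: -kv[1]):
        ri, rj = find(i), find(j)
        if ri != rj:                        # edge joins two components
            parent[ri] = rj
            tree.append((i, j))
    return tree

def most_violated_tree(mu_nodes, mu_edges, z):
    """Return (tree, max_T I_T(mu, z)); a positive value means a violation."""
    w = edge_weights(mu_nodes, mu_edges, z)
    tree = max_spanning_tree(len(mu_nodes), w)
    fixed = sum(mu_nodes[i][z[i]] for i in range(len(mu_nodes)))
    return tree, sum(w[e] for e in tree) + fixed

# Hypothetical fractional pseudo-marginals on a binary triangle.
mu_nodes = [[0.5, 0.5]] * 3
mu_edges = {
    (0, 1): [[0.5, 0.0], [0.0, 0.5]],   # x0 and x1 agree
    (1, 2): [[0.5, 0.0], [0.0, 0.5]],   # x1 and x2 agree
    (0, 2): [[0.0, 0.5], [0.5, 0.0]],   # x0 and x2 disagree
}
tree, value = most_violated_tree(mu_nodes, mu_edges, z=(0, 0, 0))
```

In this example the tree {(0,1), (1,2)} gives I_T = 0.5 > 0, so its inequality is violated. The tree {(0,1), (0,2)} gives I_T = 0 and would not cut this point off, which is why the maximization over trees matters.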
SLIDE 171 Experiments
Alternative algorithms for approximate 2nd best:
- Using approximate marginals from max-product (BMMF; Yanover and Weiss 04)
- Lawler/Nilsson (72, 80): partition the assignments $x \ne x^{(1)}$ and maximize over each part approximately. Cost O(n)

  Part 1: $x_1 \ne x^{(1)}_1$, $x_2 = *$, $x_3 = *$, ..., $x_n = *$
  Part 2: $x_1 = x^{(1)}_1$, $x_2 \ne x^{(1)}_2$, $x_3 = *$, ..., $x_n = *$
  ...
  Part n: $x_1 = x^{(1)}_1$, $x_2 = x^{(1)}_2$, $x_3 = x^{(1)}_3$, ..., $x_n \ne x^{(1)}_n$
Our algorithm: STRIPES
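The Lawler/Nilsson partition can be illustrated on a toy problem. The sketch below maximizes over each part exactly by enumeration, whereas the talk does this step approximately. The potentials and function names are hypothetical.

```python
import itertools

def lawler_parts(x1):
    """Part i fixes x_0..x_{i-1} to their values in x1 and requires x_i != x1[i]."""
    return [(tuple(x1[:i]), i, x1[i]) for i in range(len(x1))]

def in_part(x, part):
    prefix, i, excluded = part
    return tuple(x[:i]) == prefix and x[i] != excluded

def second_best(f, x1, n_states):
    """Best assignment other than x1: the best of the per-part maximizers."""
    best = None
    for part in lawler_parts(x1):
        members = [x for x in itertools.product(range(n_states), repeat=len(x1))
                   if in_part(x, part)]
        if not members:
            continue
        candidate = max(members, key=f)
        if best is None or f(candidate) > f(best):
            best = candidate
    return best

# Hypothetical chain potentials; x1 is the MAP assignment of this model.
theta = {(0, 1): [[1.0, 0.0], [0.0, 2.0]], (1, 2): [[0.5, 0.0], [0.0, 0.2]]}

def f(x):
    return sum(t[x[i]][x[j]] for (i, j), t in theta.items())

x1 = (1, 1, 1)
x2 = second_best(f, x1, n_states=2)
```

Every assignment other than x1 first differs from x1 at exactly one position, so it falls in exactly one part. The n parts therefore cover all of x ≠ x(1) without overlap.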
SLIDE 172 Attractive Grids
- Ising models with ferromagnetic interaction
- The local polytope is guaranteed to yield the exact first best (but is not equal to the marginal polytope)
- Goal: Find 50 best. Stripes and Nilsson find all of them exactly
- Up to 19 spanning trees added
[Figure: two panels, Rank and Run Time, comparing Stripes, Nilsson and BMMF]
SLIDE 173 Protein Side Chain Prediction
- Given protein's 3D shape (backbone), choose most probable side chain configuration
- Can be cast as a MAP problem: $p(x) \propto e^{\sum_{ij \in E} \theta_{ij}(x_i, x_j)}$
- Important to obtain multiple possible solutions
(MRFs from Yanover, Meltzer, Weiss '06)
[Figure: protein backbone and side-chains, with the graph $G=(V,E)$ over variables $x_i, x_j, x_k, x_h$]
SLIDE 174 Protein Side Chain Prediction
- Stripes found the exact solutions for all problems studied
- In some cases, we used a tighter approximation of the marginal polytope (Sontag et al, UAI 08)
[Figure: two panels comparing Stripes, Nilsson and BMMF]
SLIDE 181 Open Questions
- When are spanning trees enough?
- What is the polytope structure for k-best?
- Finding k-best "different" solutions
- Scalable algorithms
- If a given problem is solved with a marginal polytope relaxation, what can we say about the second best?
SLIDE 182 Summary
- The 2nd best can be posed as a linear program
- For trees, it differs from the 1st best by one constraint only
- For non-trees, an approximation can be devised by adding inequalities for all spanning trees
- Empirically effective