

SLIDE 1

High-arity Interactions, Polyhedral Relaxations, and Cutting Plane Algorithm for Soft Constraint Optimisation (MAP-MRF)

Tomáš Werner

Center for Machine Perception, Czech Technical University, Prague, Czech Republic

1 / 18

SLIDE 2

Abstract

The LP relaxation approach to finding the most probable configuration of an MRF has mostly been considered only for binary (= pairwise) interactions [e.g. Schlesinger-76, Wainwright-05, Kolmogorov-06]. Based on [Schlesinger-76, Kovalevsky-75, Werner-07], we generalise the approach to n-ary interactions, including the following contributions:

◮ Formulation of the LP relaxation and its dual for n-ary problems.
◮ A simple algorithm to optimise the LP bound, the n-ary max-sum diffusion.
◮ A hierarchy of gradually tighter polyhedral relaxations of MAP-MRF, obtained by adding zero interactions.
◮ A cutting plane algorithm, where the cuts correspond to adding zero interactions and the separation problem to finding an unsatisfiable constraint satisfaction subproblem.
◮ We show that a class of high-arity interactions (e.g. global interactions) can be included into the framework in a principled way.
◮ A simple proof that n-ary max-sum diffusion finds the global optimum for n-ary supermodular problems.

The result is a principled framework for dealing with n-ary problems and designing their tighter relaxations.

2 / 18

SLIDE 3

Problem formulation

V                  (finite) set of variables
v ∈ V              a single variable
Xv                 (finite) domain of variable v ∈ V
xv ∈ Xv            state of variable v ∈ V
A ⊆ V              a subset of variables
XA = ×_{v∈A} Xv    joint domain of variables A ⊆ V
xA ∈ XA            joint state of variables A ⊆ V

Problem: Finding the most probable configuration of an MRF

Instance:
◮ variables V and their domains { Xv | v ∈ V }
◮ hypergraph E ⊆ 2^V
◮ interaction θA: XA → ℝ for each A ∈ E

Task: Compute max_{xV} ∑_{A∈E} θA(xA).

3 / 18
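The task just defined can be checked against a brute-force solver on tiny instances. The sketch below is not from the slides: `map_mrf_bruteforce` and the toy weights are illustrative names and values of my own. It simply enumerates XV and evaluates ∑_{A∈E} θA(xA), which is feasible only for very small problems.

```python
from itertools import product

def map_mrf_bruteforce(domains, interactions):
    """Exhaustively maximise sum_{A in E} theta_A(x_A) over all of X_V.

    domains: dict v -> list of states X_v
    interactions: dict A (frozenset of variables) -> function theta_A taking
        a dict {v: x_v for v in A} and returning a real number
    """
    variables = sorted(domains)
    best_val, best_x = float("-inf"), None
    for combo in product(*(domains[v] for v in variables)):
        x = dict(zip(variables, combo))
        val = sum(th({v: x[v] for v in A}) for A, th in interactions.items())
        if val > best_val:
            best_val, best_x = val, x
    return best_val, best_x

# the shape of slide 4's first example: V = {1,2,3,4},
# E = {{2,3,4},{1,2},{3,4},{3}}, with invented toy weights
domains = {1: [0, 1], 2: [0, 1], 3: [0, 1], 4: [0, 1]}
interactions = {
    frozenset({2, 3, 4}): lambda x: x[2] + x[3] * x[4],
    frozenset({1, 2}): lambda x: 1.0 if x[1] == x[2] else 0.0,
    frozenset({3, 4}): lambda x: -x[3] * x[4],
    frozenset({3}): lambda x: 0.5 * x[3],
}
val, arg = map_mrf_bruteforce(domains, interactions)
# val == 2.5, attained at x1 = x2 = x3 = 1
```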

SLIDE 4

Examples

◮ V = {1, 2, 3, 4} and E = {{2, 3, 4}, {1, 2}, {3, 4}, {3}}:
  max_{x1,x2,x3,x4} [ θ234(x2, x3, x4) + θ12(x1, x2) + θ34(x3, x4) + θ3(x3) ]

◮ E = (V choose 1) ∪ E′ where E′ ⊆ (V choose 2): binary problem
  max_{xV} [ ∑_{v∈V} θv(xv) + ∑_{vv′∈E′} θvv′(xv, xv′) ]

◮ E = (V choose 1) ∪ E′ ∪ {V} where E′ ⊆ (V choose 2): binary problem with a global constraint
  max_{xV} [ ∑_{v∈V} θv(xv) + ∑_{vv′∈E′} θvv′(xv, xv′) + θV(xV) ]

4 / 18
SLIDE 5

Linear programming relaxation

In matrix form, the primal and dual programs are:

primal:  θ⊤µ → max over µ,  subject to  Mµ = 0,  Nµ = 1,  µ ≥ 0
dual:    ψ⊤1 → min over (ϕ, ψ) with ϕ ≶ 0, ψ ≶ 0,  subject to  ϕ⊤M + ψ⊤N ≥ θ⊤

Written out explicitly:

primal:
  ∑_{A∈E} ∑_{xA} θA(xA) µA(xA) → max
  ∑_{xA\B} µA(xA) = µB(xB)    for (A, B) ∈ J, xB ∈ XB    [dual variable ϕA,B(xB) ≶ 0]
  ∑_{xA} µA(xA) = 1           for A ∈ E                   [dual variable ψA ≶ 0]
  µA(xA) ≥ 0

dual:
  ∑_{A∈E} ψA → min
  ∑_{B|(B,A)∈J} ϕB,A(xA) − ∑_{B|(A,B)∈J} ϕA,B(xB) + ψA ≥ θA(xA)    for A ∈ E, xA ∈ XA

where J ⊆ I(E) = { (A, B) | A ∈ E, B ∈ E, B ⊂ A }

5 / 18

SLIDE 6

Meaning of primal LP: Consistency of distributions on joint states

◮ Each A ∈ E is assigned a probability distribution µA: XA → ℝ on its joint states.
◮ For each (A, B) ∈ J, the distribution µA marginalises onto µB, i.e., µB(xB) = ∑_{xA\B} µA(xA).

Example: Let A = {1, 2, 3, 4} and B = {1, 3} ⊂ A. Then the equation µB(xB) = ∑_{xA\B} µA(xA) reads
  µ13(x1, x3) = ∑_{x2,x4} µ1234(x1, x2, x3, x4).

What happens if the distributions are crisp (i.e., they attain only the values 0 and 1)?
◮ Then µA represents a single joint state.
◮ The marginalisation constraint µB(xB) = ∑_{xA\B} µA(xA) represents the fact that joint state µB is the restriction of joint state µA onto the variables B ⊂ A.

6 / 18
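The marginalisation constraint is just a sum over the variables A \ B. A minimal sketch (the function name and the uniform toy distribution are mine, not the slides'):

```python
from itertools import product

def marginalise(mu_A, vars_A, vars_B):
    """Compute mu_B(x_B) = sum over x_{A\\B} of mu_A(x_A).

    mu_A: dict mapping joint-state tuples (ordered as vars_A) to probabilities
    vars_A, vars_B: tuples of variable names, vars_B a subset of vars_A
    """
    keep = [vars_A.index(v) for v in vars_B]
    mu_B = {}
    for x_A, p in mu_A.items():
        x_B = tuple(x_A[i] for i in keep)
        mu_B[x_B] = mu_B.get(x_B, 0.0) + p
    return mu_B

# the slide's example: A = {1,2,3,4}, B = {1,3}, with a uniform toy distribution
states = list(product([0, 1], repeat=4))
mu_1234 = {x: 1.0 / len(states) for x in states}
mu_13 = marginalise(mu_1234, (1, 2, 3, 4), (1, 3))
# mu_13 is uniform over the four joint states of variables {1, 3}
```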

SLIDE 7

Reparameterisations

Definition: A reparameterisation (equivalent transformation) is a change of the weight vector θ that preserves the objective function ∑_{A∈E} θA(xA).

◮ Elementary reparameterisation on a triplet (A, B, xB) with B ⊆ A: add ϕA,B(xB) to the weights { θA(xA) | xA\B ∈ XA\B } and subtract it from θB(xB).
◮ Doing this for all triplets (A, B, xB) such that (A, B) ∈ J yields
  θϕ_A(xA) = θA(xA) + ∑_{B|(A,B)∈J} ϕA,B(xB) − ∑_{B|(B,A)∈J} ϕB,A(xA)

Example: For a binary problem, i.e. E = (V choose 1) ∪ E′ with E′ ⊆ (V choose 2), and J = I(E), we have
  θϕ_v(xv) = θv(xv) − ∑_{v′∈Nv} ϕvv′,v(xv)
  θϕ_vv′(xv, xv′) = θvv′(xv, xv′) + ϕvv′,v(xv) + ϕvv′,v′(xv′)

7 / 18
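An elementary reparameterisation can be verified numerically: adding ϕA,B(xB) to the higher-arity weights and subtracting it from θB(xB) leaves ∑_A θA(xA) unchanged for every joint state. A sketch with invented toy weights (all names are mine):

```python
from copy import deepcopy
from itertools import product

# a tiny binary problem with invented weights: V = {0, 1}, one edge {0, 1}
theta_v = {0: [0.3, 1.2], 1: [0.5, 0.0]}   # theta_v[v][x_v]
theta_e = [[0.0, 0.7], [0.4, 0.1]]         # theta_e[x_0][x_1]

def objective(tv, te, x0, x1):
    return tv[0][x0] + tv[1][x1] + te[x0][x1]

before = {(x0, x1): objective(theta_v, theta_e, x0, x1)
          for x0, x1 in product([0, 1], repeat=2)}

# elementary reparameterisation on (A, B, x_B) = ({0,1}, {0}, x_0 = 1):
# add phi to every edge weight with x_0 = 1, subtract it from theta_0(1)
phi = 0.9
tv2, te2 = deepcopy(theta_v), deepcopy(theta_e)
tv2[0][1] -= phi
for x1 in (0, 1):
    te2[1][x1] += phi

after = {(x0, x1): objective(tv2, te2, x0, x1)
         for x0, x1 in product([0, 1], repeat=2)}
# the objective is unchanged for every joint state, although the weights differ
```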

SLIDE 8

Meaning of dual LP: Minimising the upper bound by reparameterisations

◮ Upper bound on the true optimum:
  max_{xV} ∑_{A∈E} θA(xA) ≤ ∑_{A∈E} max_{xA} θA(xA)
◮ The dual LP can be written as
  min_ϕ ∑_{A∈E} max_{xA} θϕ_A(xA)

When is the upper bound exact?
◮ A joint state xA of variables A ∈ E is called active if it attains max_{xA} θA(xA).
◮ The upper bound is exact iff the constraint satisfaction problem (CSP) formed by the active joint states is satisfiable.

[Figure: an example of the CSP formed by the active joint states.] Is this CSP satisfiable? Yes!

8 / 18
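The exactness test on this slide can be made concrete by brute force: collect the active joint states of each interaction and ask whether some full assignment restricts to an active state of every A ∈ E. A sketch (function names and the toy weights are mine):

```python
from itertools import product

def active_joint_states(theta_A, domains_A):
    """Joint states of A attaining max_{x_A} theta_A (the 'active' states)."""
    states = list(product(*domains_A))
    m = max(theta_A(x) for x in states)
    return {x for x in states if theta_A(x) >= m - 1e-12}

def bound_is_exact(domains, interactions):
    """True iff the CSP formed by the active joint states is satisfiable,
    i.e. some full assignment restricts to an active state of every A.

    domains: dict v -> list of states
    interactions: dict A (tuple of variables) -> function on joint-state tuples
    """
    active = {A: active_joint_states(th, [domains[v] for v in A])
              for A, th in interactions.items()}
    variables = sorted(domains)
    for full in product(*(domains[v] for v in variables)):
        x = dict(zip(variables, full))
        if all(tuple(x[v] for v in A) in active[A] for A in interactions):
            return True
    return False

# two unaries pulling towards different states plus a pairwise term enforcing
# agreement: the maxima cannot all be attained at once, so the bound is loose
domains = {0: [0, 1], 1: [0, 1]}
interactions = {
    (0,): lambda x: float(x[0] == 0),
    (1,): lambda x: float(x[0] == 1),
    (0, 1): lambda x: float(x[0] == x[1]),
}
exact = bound_is_exact(domains, interactions)   # False: the CSP is unsatisfiable
```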

SLIDE 9

N-ary max-sum diffusion

Algorithm (n-ary max-sum diffusion)
1: loop
2:   for (A, B) ∈ J and xB ∈ XB do
3:     ϕA,B(xB) += [ θϕ_B(xB) − max_{xA\B} θϕ_A(xA) ] / 2
       (i.e., do the reparameterisation on (A, B, xB) that makes θϕ_B(xB) = max_{xA\B} θϕ_A(xA))
4:   end for
5: end loop

◮ Monotonically decreases the upper bound by reparameterisations.
◮ Converges to a state in which θϕ_B(xB) = max_{xA\B} θϕ_A(xA) for all (A, B) ∈ J and xB.
◮ For binary problems, it is equivalent to TRW-S [Kolmogorov-06] with edge updates.
◮ It may end up in a local minimum (because coordinate-wise minimisation is applied to a nonsmooth convex function), but this is not a big drawback.

Evaluating max_{xA\B} θϕ_A(xA) means solving an auxiliary problem, whose structure is the hypergraph E ∩ 2^A rather than E.

9 / 18
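As a sketch of the update rule, here is max-sum diffusion specialised to binary (pairwise) problems, i.e. J contains only the pairs (edge, endpoint). All names are mine, and a real implementation would use a convergence test rather than a fixed sweep count:

```python
def max_sum_diffusion(theta_v, theta_e, n_sweeps=50):
    """Max-sum diffusion specialised to pairwise problems: J contains the
    pairs ({u,v}, {u}) and ({u,v}, {v}) for each edge (u, v).

    theta_v: dict v -> list of unary weights theta_v[x]
    theta_e: dict (u, v) -> nested list theta_uv[x_u][x_v]
    Returns the upper bound sum_A max_{x_A} theta^phi_A(x_A) after n_sweeps.
    """
    phi = {(e, v): [0.0] * len(theta_v[v]) for e in theta_e for v in e}

    def th_v(v, x):                       # reparameterised unary weight
        return theta_v[v][x] - sum(phi[e, v][x] for e in theta_e if v in e)

    def th_e(e, xu, xv):                  # reparameterised edge weight
        return theta_e[e][xu][xv] + phi[e, e[0]][xu] + phi[e, e[1]][xv]

    for _ in range(n_sweeps):
        for e in theta_e:
            u, v = e
            for x in range(len(theta_v[u])):      # update for (e, {u})
                m = max(th_e(e, x, y) for y in range(len(theta_v[v])))
                phi[e, u][x] += (th_v(u, x) - m) / 2.0
            for x in range(len(theta_v[v])):      # update for (e, {v})
                m = max(th_e(e, y, x) for y in range(len(theta_v[u])))
                phi[e, v][x] += (th_v(v, x) - m) / 2.0

    bound = sum(max(th_v(v, x) for x in range(len(dom)))
                for v, dom in theta_v.items())
    bound += sum(max(th_e(e, xu, xv)
                     for xu in range(len(theta_v[e[0]]))
                     for xv in range(len(theta_v[e[1]])))
                 for e in theta_e)
    return bound

# single-edge toy problem (a tree, so the relaxation is tight): the bound
# converges to the true maximum 4.0, attained at x = (1, 1)
theta_v = {0: [0.0, 1.0], 1: [2.0, 0.0]}
theta_e = {(0, 1): [[0.0, 0.0], [0.0, 3.0]]}
bound = max_sum_diffusion(theta_v, theta_e)
```

Since the bound is non-increasing under the updates and is always at least the true maximum, on this tree instance it settles at exactly 4.0.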

SLIDE 10

Adding a zero interaction may tighten the relaxation

Idea: Adding a hyperedge A ∉ E to E while setting θA ≡ 0 does not change the objective but may improve the relaxation. In fact, we can virtually add all possible zero interactions: then E = 2^V but only a few θA are non-zero. Now the relaxation is fully determined by J.

Example for V = {1, 2, 3, 4}:
– the lattice I(2^V) of subsets of V
– original E depicted by red nodes
– J ⊆ I(2^V) depicted by red edges

[Figure: the Hasse diagram of the subsets of {1, 2, 3, 4}.]

10 / 18

SLIDE 11

Hierarchy of polyhedral relaxations

J1 ⊆ J2 implies that relaxation J1 is not tighter than J2. Therefore:

Result: All possible sets J ⊆ I(2^V) form a hierarchy of relaxations, partially ordered by the inclusion relation on I(2^V). In particular:
◮ J = ∅: the weakest relaxation (the sum of independent maxima over each hyperedge A ∈ E).
◮ J = I(E): the well-known 'tree' relaxation for binary problems [Schlesinger-76, Koster-98, Wainwright-03].
◮ J = I(2^V): the exact solution.

Note: Even if J1 ⊂ J2, relaxations J1 and J2 may be the same. In particular, J = I(2^V) and J = { (V, A) | A ∈ E } both yield the same relaxation.

Interpretation as lift+constrain+project: Tightening the relaxation can be seen as lifting the original LP polytope, imposing a marginalisation constraint in this lifted space, and projecting back.

11 / 18

SLIDE 12

Example: Adding zero 4-cycle interactions to binary problems

◮ On a number of instances of binary problems, we computed the fraction of instances solved to optimality by the n-ary max-sum diffusion.
◮ Two relaxations were tested:
  ◮ Jtree: the 'traditional' LP relaxation [Schlesinger-76, Kolmogorov-06, ...]
  ◮ J4cycle: Jtree augmented with zero interactions on 4-tuples of variables (thus inducing 4-cycle subproblems).

type     image side   |Xv|   rtree   r4cycle
random       15        5     0.01     1.00
random       25        3     0.00     0.98
random      100        3     0.00     0.72
Potts        15        5     0.79     0.99
Potts        25        5     0.48     0.98
Potts       100        5     0.00     0.81
lines        10        4     0.72     0.88
lines        25        4     0.00     0.00
curve        10        9     0.17     0.65
curve        15        9     0.00     0.24
curve        25        9     0.00     0.00
Pi           15        5     0.00     0.82

12 / 18

SLIDE 13

Cutting plane algorithm

Let max { θ⊤µ | µ ∈ P } be the LP relaxation of the ILP max { θ⊤µ | µ ∈ P ∩ ℤⁿ }.

Cutting plane algorithm for a general ILP in the primal space
1: P′ ← P
2: loop
3:   Find a maximiser µ∗ of max { θ⊤µ | µ ∈ P′ }.
4:   Find a half-space H such that P ∩ ℤⁿ ⊆ H and µ∗ ∉ H. If none exists, halt. (separation problem)
5:   P′ ← P′ ∩ H
6: end loop

Cutting plane algorithm for MAP-MRF in the dual space
1: J ← I(E)
2: loop
3:   Minimise the upper bound of relaxation J by max-sum diffusion.
4:   Find A ∉ E such that the CSP formed by the active joint states restricted to variables A is unsatisfiable. If none exists, halt.
5:   θA ← 0;  E ← E ∪ {A};  J ← I(E)
6: end loop

13 / 18

SLIDE 14

Example

Instead of adding all 4-cycles initially, we add only some of them one by one. ◮ Advantage: Much less memory needed for dual variables (‘messages’) ◮ Drawback: Not very practical (slow) in this simple form

14 / 18

SLIDE 15

Global interactions

A function θA can be represented
◮ in extension: by explicitly storing the numbers { θA(xA) | xA ∈ XA } (possible only for small |A|);
◮ in intension: by an algorithm.

Any interaction θA (possibly of high arity and represented in intension) can be used in the n-ary max-sum diffusion if max_{xA\B} θϕ_A(xA) can be computed efficiently.

There may be a lot of useful high-arity interactions with this property!

15 / 18

SLIDE 16

Example: Minimum graph cut with prescribed partition size

Let E = (V choose 1) ∪ E′ ∪ {V} where E′ ⊆ (V choose 2), i.e.

  ∑_{A∈E} θA(xA) = ∑_{v∈V} θv(xv) + ∑_{vv′∈E′} θvv′(xv, xv′) + θV(xV)

  θvv′(xv, xv′) = 0 if xv = xv′,  −1 if xv ≠ xv′
  θV(xV) = 0 if ∑_{v∈V} [[xv = x̄]] = n,  −∞ otherwise

Interpretation: Minimum st-cut in a graph such that the number of pixels in the first partition equals n (NP-hard).

n required: 2000 3000 4000 5000 5368 6000 7000 8000 9000
n achieved: 2008 3004 4011 5006 5368 6004 7024 7982 9032

16 / 18
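Why does this global interaction stay tractable? Maximising ∑_v ϕV,v(xv) subject to exactly n variables taking the state x̄ reduces to sorting. A sketch under the assumption of binary states with x̄ = 1, maximising over all of xV (the slides' restricted max over xV\{u} follows by fixing xu and running the same routine on the remaining variables); the function name and message values are mine:

```python
def max_with_cardinality(phi, n):
    """Maximise sum_v phi[v][x_v] over binary x subject to exactly n variables
    taking state 1 (the theta_V of this slide with xbar = 1: value 0 if the
    cardinality is n, -inf otherwise).

    phi: dict v -> (phi_v(0), phi_v(1)); runs in O(|V| log |V|)
    """
    if not 0 <= n <= len(phi):
        return float("-inf")
    base = sum(p0 for p0, _ in phi.values())                  # all at state 0
    gains = sorted((p1 - p0 for p0, p1 in phi.values()), reverse=True)
    return base + sum(gains[:n])                              # flip the best n

# invented message values phi_{V,v}
phi = {"a": (0.0, 2.0), "b": (1.0, 0.5), "c": (0.0, 0.1)}
best = max_with_cardinality(phi, 1)   # best == 3.0: only 'a' is put at state 1
```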

SLIDE 17

Is the computation of max_{xA\B} θϕ_A(xA) tractable?

◮ Not for J = I(E)
  ◮ 3 types of updates: (vv′, v), (V, v), (V, vv′)
◮ Yes for J = { ({v, v′}, {v}) | {v, v′} ∈ E′ } ∪ { (V, {v}) | v ∈ V } ⊂ I(E)
  ◮ only two types of updates: (vv′, v), (V, v)
  ◮ for each u ∈ V, compute
    max_{xV\{u}} θϕ_V(xV) = max_{xV\{u}} [ θV(xV) + ∑_{v∈V} ϕV,v(xv) ]

[Figure: the lattice of subsets of V = {1, 2, 3, 4}, highlighting the pairs in J.]

17 / 18

SLIDE 18

Supermodular problems

Let each domain Xv be totally ordered. A function θA is supermodular if
  θA(xA ∧ yA) + θA(xA ∨ yA) ≥ θA(xA) + θA(yA)
for any xA, yA ∈ XA, where ∧ (∨) denotes the elementwise minimum (maximum).

Theorem: Let θA be supermodular for each A ∈ E. Let J ⊇ { (A, {v}) | A ∈ E, v ∈ A }. Then any fixed point of the n-ary max-sum diffusion is the global optimum of the MAP-MRF problem.

Proof (generalisation of the proof in [Schlesinger-Flach-00] for binary problems):
◮ The maximisers of a supermodular function on a distributive lattice form a sublattice of this lattice [Topkis-78]. Hence the active joint states of each A ∈ E form a lattice. It follows that the CSP formed by the active joint states is tractable [Jeavons-Cooper-95] (a lattice CSP).
◮ In any fixed point of the diffusion, the CSP formed by the active joint states is arc consistent. This suffices for a lattice CSP to be satisfiable.

18 / 18
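The supermodularity condition of the theorem can be checked by brute force on small domains. A sketch (the function name and the toy interactions are mine):

```python
from itertools import product

def is_supermodular(theta, domains):
    """Check theta(x ∧ y) + theta(x ∨ y) >= theta(x) + theta(y) for all pairs,
    with ∧ / ∨ the elementwise minimum / maximum, by brute force.

    theta: function on joint-state tuples; domains: list of per-variable states
    """
    states = list(product(*domains))
    for x in states:
        for y in states:
            lo = tuple(map(min, x, y))   # x ∧ y
            hi = tuple(map(max, x, y))   # x ∨ y
            if theta(lo) + theta(hi) < theta(x) + theta(y) - 1e-12:
                return False
    return True

# the agreement interaction x_u * x_v is supermodular ...
agree_ok = is_supermodular(lambda x: x[0] * x[1], [[0, 1], [0, 1]])
# ... while exclusive-or is not (take x = (1,0), y = (0,1))
xor_ok = is_supermodular(lambda x: (x[0] + x[1]) % 2, [[0, 1], [0, 1]])
```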