High-arity Interactions, Polyhedral Relaxations, and Cutting Plane Algorithm for Soft Constraint Optimisation (MAP-MRF) — Tomáš Werner, Center for Machine Perception, Czech Technical University, Prague


  1. High-arity Interactions, Polyhedral Relaxations, and Cutting Plane Algorithm for Soft Constraint Optimisation (MAP-MRF)
     Tomáš Werner
     Center for Machine Perception, Czech Technical University, Prague, Czech Republic

  2. Abstract
     The LP relaxation approach to finding the most probable configuration of an MRF has mostly been considered only for binary (= pairwise) interactions [e.g. Schlesinger-76, Wainwright-05, Kolmogorov-06]. Based on [Schlesinger-76, Kovalevsky-75, Werner-07], we generalise the approach to n-ary interactions, including the following contributions:
     ◮ Formulation of the LP relaxation and its dual for n-ary problems.
     ◮ A simple algorithm to optimise the LP bound: n-ary max-sum diffusion.
     ◮ A hierarchy of gradually tighter polyhedral relaxations of MAP-MRF, obtained by adding zero interactions.
     ◮ A cutting plane algorithm, where the cuts correspond to adding zero interactions and the separation problem to finding an unsatisfiable constraint satisfaction subproblem.
     ◮ We show that a class of high-arity (e.g. global) interactions can be included in the framework in a principled way.
     ◮ A simple proof that n-ary max-sum diffusion finds the global optimum for n-ary supermodular problems.
     The result is a principled framework for dealing with n-ary problems and designing their tighter relaxations.

  3. Problem formulation
     Notation:
       V                     (finite) set of variables
       v ∈ V                 a single variable
       X_v                   (finite) domain of variable v ∈ V
       x_v ∈ X_v             state of variable v ∈ V
       A ⊆ V                 a subset of variables
       X_A = ×_{v∈A} X_v     joint domain of variables A ⊆ V
       x_A ∈ X_A             joint state of variables A ⊆ V

     Problem: finding the most probable configuration of an MRF.
     Instance:
     ◮ variables V and their domains { X_v | v ∈ V }
     ◮ hypergraph E ⊆ 2^V
     ◮ interaction θ_A : X_A → R for each A ∈ E
     Task: compute  max_{x_V} Σ_{A∈E} θ_A(x_A).
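The task above can be made concrete with a brute-force sketch in Python. The instance data below (domains, hyperedges, weights) is invented for illustration; exhaustive enumeration over X_V is of course exponential and only meant to pin down the definition:

```python
from itertools import product

# Toy MAP-MRF instance in the slide's notation: variables V with finite
# domains X_v, a hypergraph E of scopes, one interaction theta_A per scope.
domains = {1: [0, 1], 2: [0, 1], 3: [0, 1]}          # X_v for v in V
E = [(1, 2), (2, 3), (3,)]                            # hyperedges A in E
theta = {
    (1, 2): {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 2.0},
    (2, 3): {(0, 0): 0.5, (0, 1): 1.5, (1, 0): 0.0, (1, 1): 1.0},
    (3,):   {(0,): 0.2, (1,): 0.0},
}

def energy(x):
    """Objective sum_{A in E} theta_A(x_A) for a joint state x of all of V."""
    return sum(theta[A][tuple(x[v] for v in A)] for A in E)

# Exhaustive maximisation over X_V -- only feasible for tiny instances.
V = sorted(domains)
best = max((dict(zip(V, xs)) for xs in product(*(domains[v] for v in V))),
           key=energy)
print(best, energy(best))   # {1: 1, 2: 1, 3: 1} with value 3.0
```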

  4. Examples
     ◮ V = {1,2,3,4} and E = {{2,3,4}, {1,2}, {3,4}, {3}}:
         max_{x_1,x_2,x_3,x_4} [ θ_234(x_2,x_3,x_4) + θ_12(x_1,x_2) + θ_34(x_3,x_4) + θ_3(x_3) ]
     ◮ E = (V choose 1) ∪ E′ where E′ ⊆ (V choose 2): binary problem
         max_{x_V} [ Σ_{v∈V} θ_v(x_v) + Σ_{vv′∈E′} θ_vv′(x_v,x_v′) ]
     ◮ E = (V choose 1) ∪ E′ ∪ {V} where E′ ⊆ (V choose 2): binary problem with a global interaction
         max_{x_V} [ Σ_{v∈V} θ_v(x_v) + Σ_{vv′∈E′} θ_vv′(x_v,x_v′) + θ_V(x_V) ]

  5. Linear programming relaxation
     In matrix form:
       primal:  θ⊤μ → max_μ,    Mμ = 0,  Nμ = 1,  μ ≥ 0
       dual:    ψ⊤1 → min_{φ,ψ},  φ ≶ 0,  ψ ≶ 0,  φ⊤M + ψ⊤N ≥ θ⊤
     Componentwise:
       primal:
         Σ_{A∈E} Σ_{x_A} θ_A(x_A) μ_A(x_A) → max
         Σ_{x_{A\B}} μ_A(x_A) = μ_B(x_B)    for (A,B) ∈ J, x_B ∈ X_B
         Σ_{x_A} μ_A(x_A) = 1               for A ∈ E
         μ_A(x_A) ≥ 0                       for A ∈ E, x_A ∈ X_A
       dual:
         Σ_{A∈E} ψ_A → min
         Σ_{B|(B,A)∈J} φ_{B,A}(x_A) − Σ_{B|(A,B)∈J} φ_{A,B}(x_B) + ψ_A ≥ θ_A(x_A)    for A ∈ E, x_A ∈ X_A
         φ_{A,B}(x_B) ≶ 0,  ψ_A ≶ 0
     where J ⊆ I(E) = { (A,B) | A ∈ E, B ∈ E, B ⊂ A }.

  6. Meaning of primal LP: consistency of distributions on joint states
     ◮ Each A ∈ E is assigned a probability distribution μ_A : X_A → R on its joint states.
     ◮ For each (A,B) ∈ J, distribution μ_A marginalises onto μ_B, i.e.
         μ_B(x_B) = Σ_{x_{A\B}} μ_A(x_A)

     Example: let A = {1,2,3,4} and B = {1,3} ⊂ A. Then the equation μ_B(x_B) = Σ_{x_{A\B}} μ_A(x_A) reads
         μ_13(x_1,x_3) = Σ_{x_2,x_4} μ_1234(x_1,x_2,x_3,x_4).

     What happens if the distributions are crisp (i.e. they can attain only 0 or 1)?
     ◮ Then μ_A represents a single joint state.
     ◮ The marginalisation constraint μ_B(x_B) = Σ_{x_{A\B}} μ_A(x_A) then says that the joint state represented by μ_B is the restriction of the joint state represented by μ_A onto the variables B ⊂ A.
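The marginalisation constraint, and its crisp reading as restriction of a joint state, can be checked with a small sketch (the distribution below is made up; following the slide's example with A = (1,2,3,4), B = (1,3)):

```python
from itertools import product
from collections import defaultdict

# Marginalisation from the slide: mu_B(x_B) = sum over x_{A\B} of mu_A(x_A).
A, B = (1, 2, 3, 4), (1, 3)
dom = {v: [0, 1] for v in A}

# A crisp distribution on X_A: all mass on the single joint state (1,0,1,1).
mu_A = {xs: 0.0 for xs in product(*(dom[v] for v in A))}
mu_A[(1, 0, 1, 1)] = 1.0

def marginalise(mu_A, A, B):
    """Sum mu_A over the variables in A but not B, giving a distribution on X_B."""
    mu_B = defaultdict(float)
    for x_A, p in mu_A.items():
        x = dict(zip(A, x_A))
        mu_B[tuple(x[v] for v in B)] += p
    return dict(mu_B)

mu_B = marginalise(mu_A, A, B)
# All mass lands on (x_1, x_3) = (1, 1): the restriction of (1,0,1,1) onto B.
print(mu_B[(1, 1)])
```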

  7. Reparameterisations
     Definition: a reparameterisation (equivalent transformation) is a change of the weight vector θ that preserves the objective function Σ_{A∈E} θ_A(x_A).
     ◮ Elementary reparameterisation on a triplet (A,B,x_B) with B ⊆ A: add φ_{A,B}(x_B) to the weights { θ_A(x_A) | x_{A\B} ∈ X_{A\B} } and subtract it from θ_B(x_B).
     ◮ Doing this for all triplets (A,B,x_B) such that (A,B) ∈ J yields
         θ^φ_A(x_A) = θ_A(x_A) + Σ_{B|(A,B)∈J} φ_{A,B}(x_B) − Σ_{B|(B,A)∈J} φ_{B,A}(x_A)

     Example: for a binary problem, i.e. E = (V choose 1) ∪ E′ with E′ ⊆ (V choose 2), and J = I(E), we have
         θ^φ_v(x_v) = θ_v(x_v) − Σ_{v′∈N_v} φ_{vv′,v}(x_v)
         θ^φ_vv′(x_v,x_v′) = θ_vv′(x_v,x_v′) + φ_{vv′,v}(x_v) + φ_{vv′,v′}(x_v′)

  8. Meaning of dual LP: minimising an upper bound by reparameterisations
     ◮ Upper bound on the true optimum:
         max_{x_V} Σ_{A∈E} θ_A(x_A) ≤ Σ_{A∈E} max_{x_A} θ_A(x_A)
     ◮ The dual LP can be written as
         min_φ Σ_{A∈E} max_{x_A} θ^φ_A(x_A)

     When is the upper bound exact?
     ◮ A joint state x_A of variables A ∈ E is called active if θ_A(x_A) = max_{x_A} θ_A(x_A).
     ◮ The upper bound is exact iff the constraint satisfaction problem (CSP) formed by the active joint states is satisfiable. (The slide illustrates this with a pictured CSP and asks whether it is satisfiable.)

  9. N-ary max-sum diffusion
     Algorithm (n-ary max-sum diffusion):
       1: loop
       2:   for (A,B) ∈ J and x_B ∈ X_B do
       3:     φ_{A,B}(x_B) += [ θ^φ_B(x_B) − max_{x_{A\B}} θ^φ_A(x_A) ] / 2
              (i.e. do the reparameterisation on (A,B,x_B) that makes θ^φ_B(x_B) = max_{x_{A\B}} θ^φ_A(x_A))
       4:   end for
       5: end loop
     ◮ Monotonically decreases the upper bound by reparameterisations.
     ◮ Converges to a state where θ^φ_B(x_B) = max_{x_{A\B}} θ^φ_A(x_A) for all (A,B) ∈ J and x_B.
     ◮ For binary problems, equivalent to TRW-S [Kolmogorov-06] with edge updates.
     ◮ May end up in a local minimum (because coordinate-wise minimisation is applied to a nonsmooth convex function), but this is not a big drawback.
     ◮ Evaluating max_{x_{A\B}} θ^φ_A(x_A) means solving an auxiliary problem whose structure is the hypergraph E ∩ 2^A rather than E.
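The diffusion update can be sketched for the smallest nontrivial case: one pairwise scope A = (1,2) and its two singleton sub-scopes, so J = {(A,(1,)), (A,(2,))}. All instance data is invented; the point is only to show the update rule and the resulting upper bound Σ_A max θ^φ_A:

```python
from itertools import product

dom = {1: [0, 1], 2: [0, 1]}
theta = {
    (1,):   {(0,): 1.0, (1,): 0.0},
    (2,):   {(0,): 0.0, (1,): 1.5},
    (1, 2): {(0, 0): 0.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 0.0},
}
J = [((1, 2), (1,)), ((1, 2), (2,))]
phi = {(a, b): {xb: 0.0 for xb in product(*(dom[v] for v in b))} for a, b in J}

def theta_phi(scope, x_s):
    """Reparameterised weight theta^phi: a child scope loses phi, a parent gains it."""
    val = theta[scope][x_s]
    for (a, b), f in phi.items():
        if b == scope:                       # scope is the child B of (A,B)
            val -= f[x_s]
        elif a == scope:                     # scope is the parent A of (A,B)
            val += f[tuple(x_s[a.index(v)] for v in b)]
    return val

for _ in range(200):                         # diffusion sweeps
    for (a, b) in J:
        for x_b in phi[(a, b)]:
            m = max(theta_phi(a, x_a) for x_a in product(*(dom[v] for v in a))
                    if tuple(x_a[a.index(v)] for v in b) == x_b)
            phi[(a, b)][x_b] += (theta_phi(b, x_b) - m) / 2

bound = sum(max(theta_phi(s, x_s) for x_s in product(*(dom[v] for v in s)))
            for s in theta)
print(round(bound, 6))   # 4.5 -- equals the true maximum; the bound is tight on this tree
```

The initial bound is 1.0 + 1.5 + 3.0 = 5.5; diffusion drives it down to the true optimum 4.5, as expected since a single edge is tree-structured.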

  10. Adding a zero interaction may tighten the relaxation
      Idea: adding a hyperedge A ∉ E to E while setting θ_A ≡ 0 does not change the objective but may improve the relaxation. In fact, we can virtually add all possible zero interactions: then E = 2^V but only a few θ_A are non-zero. Now the relaxation is fully determined by J.

      Example for V = {1,2,3,4} (figure on the slide): the lattice I(2^V) of subsets of V, with levels {1, 2, 3, 4}, {12, 13, 14, 23, 24, 34}, {123, 124, 134, 234}, {1234}; the original E is depicted by red nodes and J ⊆ I(2^V) by red edges.

  11. Hierarchy of polyhedral relaxations
      J_1 ⊆ J_2 implies that relaxation J_1 is not tighter than J_2. Therefore:
      Result: all possible sets J ⊆ I(2^V) form a hierarchy of relaxations, partially ordered by the inclusion relation on I(2^V). In particular:
      ◮ J = ∅: the weakest relaxation (the sum of independent maxima over the hyperedges A ∈ E).
      ◮ J = I(E): the well-known 'tree' relaxation for binary problems [Schlesinger-76, Koster-98, Wainwright-03].
      ◮ J = I(2^V): the exact solution.
      Note: even if J_1 ⊂ J_2, relaxations J_1 and J_2 may coincide. In particular, J = I(2^V) and J = { (V,A) | A ∈ E } yield the same relaxation.

      Interpretation as lift + constrain + project: tightening the relaxation can be seen as lifting the original LP polytope, imposing a marginalisation constraint in the lifted space, and projecting back.

  12. Example: adding zero 4-cycle interactions to binary problems
      ◮ On a number of instances of binary problems, we computed how many instances the n-ary max-sum diffusion solved to optimality.
      ◮ Two relaxations were tested:
        ◮ J_tree: the 'traditional' LP relaxation [Schlesinger-76, Kolmogorov-06, ...]
        ◮ J_4cycle: J_tree augmented with zero interactions on 4-tuples of variables (thus inducing 4-cycle subproblems).

        type     image side   |X_v|   r_tree   r_4cycle
        random       15         5      0.01      1.00
        random       25         3      0.00      0.98
        random      100         3      0.00      0.72
        Potts        15         5      0.79      0.99
        Potts        25         5      0.48      0.98
        Potts       100         5      0.00      0.81
        lines        10         4      0.72      0.88
        lines        25         4      0.00      0.00
        curve        10         9      0.17      0.65
        curve        15         9      0.00      0.24
        curve        25         9      0.00      0.00
        Pi           15         5      0.00      0.82

  13. Cutting plane algorithm
      Let max { θ⊤μ | μ ∈ P } be the LP relaxation of the ILP max { θ⊤μ | μ ∈ P ∩ Z^n }.

      Cutting plane algorithm for a general ILP, in the primal space:
        1: P′ ← P
        2: loop
        3:   Find a maximiser μ* of max { θ⊤μ | μ ∈ P′ }.
        4:   Find a half-space H such that P ∩ Z^n ⊆ H and μ* ∉ H (the separation problem). If none exists, halt.
        5:   P′ ← P′ ∩ H
        6: end loop

      Cutting plane algorithm for MAP-MRF, in the dual space:
        1: J ← I(E)
        2: loop
        3:   Minimise the upper bound of relaxation J by max-sum diffusion.
        4:   Find A ∉ E such that the CSP formed by the active joint states, restricted to the variables A, is unsatisfiable. If none exists, halt.
        5:   θ_A ← 0;  E ← E ∪ {A};  J ← I(E)
        6: end loop
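The separation step of the dual algorithm can be sketched as a brute-force satisfiability test on the active joint states. The active sets below are invented (a frustrated 3-cycle of inequality constraints on binary variables), but they show the shape of a violated cut:

```python
from itertools import product

dom = {1: [0, 1], 2: [0, 1], 3: [0, 1]}
# Active joint states per hyperedge (those achieving max theta^phi_A):
active = {
    (1, 2): {(0, 1), (1, 0)},     # x1 != x2
    (2, 3): {(0, 1), (1, 0)},     # x2 != x3
    (1, 3): {(0, 1), (1, 0)},     # x1 != x3  -> jointly unsatisfiable
}

def satisfiable(scope_vars, active):
    """Is some joint state of scope_vars consistent with every active set inside it?"""
    for xs in product(*(dom[v] for v in scope_vars)):
        x = dict(zip(scope_vars, xs))
        if all(tuple(x[v] for v in e) in s
               for e, s in active.items() if set(e) <= set(scope_vars)):
            return True
    return False

# A = {1,2,3} yields an unsatisfiable sub-CSP, i.e. a cut: add the zero
# interaction theta_A, enlarge J, and rerun diffusion.
print(satisfiable((1, 2, 3), active))   # False: the odd cycle has no solution
```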

  14. Example
      Instead of adding all 4-cycles initially, we add only some of them, one by one.
      ◮ Advantage: much less memory is needed for the dual variables ('messages').
      ◮ Drawback: not very practical (slow) in this simple form.
