MAP Inference with MILP


  1. 10-418 / 10-618 Machine Learning for Structured Data, Machine Learning Department, School of Computer Science, Carnegie Mellon University. MAP Inference with MILP. Matt Gormley, Lecture 12, Oct. 7, 2019.

  2. Reminders • Homework 2: BP for Syntax Trees – Out: Sat, Sep. 28 – Due: Sat, Oct. 12 at 11:59pm • Last chance to switch between 10-418 / 10-618 is October 7th (drop deadline) • Today’s after-class office hours are un-cancelled (i.e. I am having them)

  3. MBR DECODING

  4. Minimum Bayes Risk Decoding • Suppose we are given a loss function $\ell(\hat{y}, y)$ and are asked for a single tagging. • How should we choose just one from our probability distribution $p(y \mid x)$? • A minimum Bayes risk (MBR) decoder $h(x)$ returns the variable assignment with minimum expected loss under the model’s distribution: $h_\theta(x) = \operatorname{argmin}_{\hat{y}} \mathbb{E}_{y \sim p_\theta(\cdot \mid x)}[\ell(\hat{y}, y)] = \operatorname{argmin}_{\hat{y}} \sum_{y} p_\theta(y \mid x)\, \ell(\hat{y}, y)$
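To make the definition concrete, here is a minimal brute-force sketch (my own, not from the lecture): it enumerates a small output space and returns the assignment with lowest expected loss under $p_\theta(y \mid x)$. The toy distribution table and the helper name `mbr_decode` are illustrative assumptions.

```python
import itertools

def mbr_decode(p, loss, num_vars, num_labels):
    """Brute-force MBR: argmin over candidate assignments of expected loss.

    p: dict mapping each assignment (a tuple of labels) to its probability.
    loss: function loss(y_hat, y) -> float.
    Only feasible when num_labels ** num_vars is small.
    """
    candidates = itertools.product(range(num_labels), repeat=num_vars)
    return min(
        candidates,
        key=lambda y_hat: sum(prob * loss(y_hat, y) for y, prob in p.items()),
    )

# Toy distribution over two binary variables (an illustrative assumption).
p = {(0, 0): 0.4, (0, 1): 0.3, (1, 1): 0.2, (1, 0): 0.1}
hamming = lambda a, b: sum(ai != bi for ai, bi in zip(a, b))
print(mbr_decode(p, hamming, num_vars=2, num_labels=2))
```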

  5. Minimum Bayes Risk Decoding $h_\theta(x) = \operatorname{argmin}_{\hat{y}} \mathbb{E}_{y \sim p_\theta(\cdot \mid x)}[\ell(\hat{y}, y)]$ Consider some example loss functions: The Hamming loss corresponds to accuracy and returns the number of incorrect variable assignments: $\ell(\hat{y}, y) = \sum_{i=1}^{V} (1 - \mathbb{I}(\hat{y}_i = y_i))$ The MBR decoder is: $\hat{y}_i = h_\theta(x)_i = \operatorname{argmax}_{\hat{y}_i} p_\theta(\hat{y}_i \mid x)$ This decomposes across variables and requires the variable marginals.
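A sketch of that decomposition (again my own): with Hamming loss, exact MBR reduces to taking each variable’s argmax marginal, so no enumeration over joint assignments is needed.

```python
def marginal_decode(p, num_vars, num_labels):
    """Hamming-loss MBR: per-variable argmax of the marginals p(y_i | x)."""
    y_hat = []
    for i in range(num_vars):
        # Marginalize the joint table over all variables except variable i.
        marginal = [sum(prob for y, prob in p.items() if y[i] == label)
                    for label in range(num_labels)]
        y_hat.append(max(range(num_labels), key=marginal.__getitem__))
    return tuple(y_hat)

# Agrees with the brute-force decoder above on the same toy table p
# (up to tie-breaking when marginals are equal).
print(marginal_decode(p, num_vars=2, num_labels=2))
```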

  6. Minimum Bayes Risk Decoding $h_\theta(x) = \operatorname{argmin}_{\hat{y}} \mathbb{E}_{y \sim p_\theta(\cdot \mid x)}[\ell(\hat{y}, y)]$ Consider some example loss functions: The 0-1 loss function returns 1 only if the two assignments are identical and 0 otherwise: $\ell(\hat{y}, y) = 1 - \mathbb{I}(\hat{y} = y)$ The MBR decoder is: $h_\theta(x) = \operatorname{argmin}_{\hat{y}} \sum_{y} p_\theta(y \mid x)(1 - \mathbb{I}(\hat{y} = y)) = \operatorname{argmax}_{\hat{y}} p_\theta(\hat{y} \mid x)$, which is exactly the MAP inference problem!
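Plugging the 0-1 loss into the brute-force sketch above recovers MAP decoding, since the expected 0-1 loss of $\hat{y}$ is $1 - p_\theta(\hat{y} \mid x)$: minimizing it maximizes the probability. Reusing the hypothetical `mbr_decode` and `p` from above:

```python
zero_one = lambda a, b: int(a != b)
# Returns (0, 0), the single most probable assignment, i.e. the MAP solution.
print(mbr_decode(p, zero_one, num_vars=2, num_labels=2))
```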

  7. LINEAR PROGRAMMING & INTEGER LINEAR PROGRAMMING

  8. Linear Programming Whiteboard – Example of a Linear Program in 2D – LP Standard Form – Converting an LP to Standard Form – An LP and its Polytope – Simplex algorithm (tableau method) – Interior-point algorithm(s)
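Since the LP material itself is on the whiteboard, here is a small 2-D example of my own, solved with `scipy.optimize.linprog` (assuming SciPy is available); the objective and constraints are illustrative, chosen so the optimum sits at a vertex of the polytope.

```python
import numpy as np
from scipy.optimize import linprog

# Maximize 3*x1 + 2*x2  subject to  x1 + x2 <= 4,  x1 <= 2,  x1, x2 >= 0.
# linprog minimizes, so we negate the objective.
c = np.array([-3.0, -2.0])
A_ub = np.array([[1.0, 1.0],
                 [1.0, 0.0]])
b_ub = np.array([4.0, 2.0])

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x, -res.fun)  # vertex optimum: x = (2, 2) with value 10
```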

  9. Integer Linear Programming Whiteboard – Example of an ILP in 2D – Example of an MILP in 2D
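Likewise, a hedged 2-D ILP sketch using `scipy.optimize.milp` (available in SciPy >= 1.9; the instance is my own). The LP relaxation of this instance has a fractional optimum, so the integrality constraints genuinely matter.

```python
import numpy as np
from scipy.optimize import LinearConstraint, milp

# Maximize x1 + 2*x2  s.t.  x1 + 3*x2 <= 8.5,  3*x1 + x2 <= 9,  x >= 0 integer.
c = np.array([-1.0, -2.0])  # milp minimizes, so negate
constraints = LinearConstraint(np.array([[1.0, 3.0],
                                         [3.0, 1.0]]),
                               ub=np.array([8.5, 9.0]))
integrality = np.ones(2)  # 1 marks each variable as integer

res = milp(c=c, constraints=constraints, integrality=integrality)
print(res.x, -res.fun)  # integer optimum x = (2, 2) with value 6
# (the LP relaxation's optimum is fractional: roughly (2.31, 2.06), value 6.44)
```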

  10. Background: Nonconvex Global Optimization Goal: optimize over the blue surface.

  11. Background: Nonconvex Global Optimization Goal: optimize over the blue surface.

  12. Background: Nonconvex Global Optimization Relaxation: provides an upper bound on the surface.

  13. Background: Nonconvex Global Optimization Branching: partitions the search space into subspaces, and enables tighter relaxations. (Branches: $x_1 \le 0.0$ and $x_1 \ge 0.0$.)

  14. Background: Nonconvex Global Optimization Branching: partitions the search space into subspaces, and enables tighter relaxations. (Branches: $x_1 \le 0.0$ and $x_1 \ge 0.0$.)

  15. Background: Nonconvex Global Optimization Branching: partitions the search space into subspaces, and enables tighter relaxations. (Branches: $x_1 \le 0.0$ and $x_1 \ge 0.0$.)

  16. Background: Nonconvex Global Optimization The max of all relaxed solutions for each of the partitions is a global upper bound.

  17. Background: Nonconvex Global Optimization We can project a relaxed solution onto the feasible region.

  18. Background: Nonconvex Global Optimization The incumbent is ε-optimal if the relative difference between the global upper bound and the incumbent score is less than ε.
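In symbols (my notation, not the slide’s; solvers differ on the exact denominator), with global upper bound $U$ and incumbent score $L$, the stopping test is:

```latex
\frac{U - L}{|L|} < \epsilon
```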

  19. How much should we subdivide?

  20. How much should we subdivide? BRANCH-AND-BOUND • Method for recursively subdividing the search space • Subspace order can be determined heuristically (e.g. best-first search with depth-first plunging) • Prunes subspaces that can’t yield better solutions

  21. Background: Nonconvex Global Optimization If the subspace upper bound is worse than the current incumbent, we can prune that subspace.

  22. Background: Nonconvex Global Optimization If the subspace upper bound is worse than the current incumbent, we can prune that subspace.

  23. Limitations: Branch-and-Bound for the Viterbi Objective • The Viterbi Objective – Nonconvex – NP-hard to solve (Cohen & Smith, 2010) • Branch-and-bound – Kind of tricky to get it right… – Curse of dimensionality kicks in quickly • Nonconvex quadratic optimization by LP-based branch-and-bound usually fails with more than 80 variables (Burer and Vandenbussche, 2009) • Our smallest (toy) problems have hundreds of variables • Preview of Experiments – We solve 5 sentences, but on 200 sentences, we couldn’t run to completion – Our (hybrid) global search framework incorporates local search – This hybrid approach sometimes finds higher likelihood (and higher accuracy) solutions than pure local search

  24. BRANCH-AND-BOUND INGREDIENTS • Mathematical Program • Relaxation • Projection • (Branch-and-Bound Search Heuristics)

  25. Background: Nonconvex Global Optimization We solve the relaxation using the Simplex algorithm.

  26. Background: Nonconvex Global Optimization We can project a relaxed solution onto the feasible region.

  27. Integer Linear Programming Whiteboard – Branch and bound for an ILP in 2D

  28. Branch and Bound. Algorithm 2.1 (Branch-and-bound). Input: a minimization problem instance $R$. Output: an optimal solution $x^\star$ with value $c^\star$, or the conclusion that $R$ has no solution, indicated by $c^\star = \infty$.
      1. Initialize $L := \{R\}$, $\hat{c} := \infty$. [init]
      2. If $L = \emptyset$, stop and return $x^\star = \hat{x}$ and $c^\star = \hat{c}$. [abort]
      3. Choose $Q \in L$, and set $L := L \setminus \{Q\}$. [select]
      4. Solve a relaxation $Q^{\text{relax}}$ of $Q$. If $Q^{\text{relax}}$ is empty, set $\check{c} := \infty$. Otherwise, let $\check{x}$ be an optimal solution of $Q^{\text{relax}}$ and $\check{c}$ its objective value. [solve]
      5. If $\check{c} \ge \hat{c}$, goto Step 2. [bound]
      6. If $\check{x}$ is feasible for $R$, set $\hat{x} := \check{x}$, $\hat{c} := \check{c}$, and goto Step 2. [check]
      7. Split $Q$ into subproblems $Q = Q_1 \cup \ldots \cup Q_k$, set $L := L \cup \{Q_1, \ldots, Q_k\}$, and goto Step 2. [branch]
      Slide from Achterberg (thesis, 2007)
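The seven steps map directly onto code. Below is a minimal sketch of Algorithm 2.1 (my own, not Achterberg’s) specialized to a pure ILP: `scipy.optimize.linprog` plays the [solve] step, branching splits on the first fractional variable, and subproblems are selected depth-first. The toy instance matches the MILP sketch earlier.

```python
import math

import numpy as np
from scipy.optimize import linprog

def branch_and_bound(c, A_ub, b_ub, bounds, tol=1e-6):
    """Algorithm 2.1 for a pure ILP (minimization): LP relaxations give the
    bounds; branching splits on a single fractional variable."""
    open_list = [list(bounds)]     # L := {R}; a subproblem = its var bounds [init]
    x_hat, c_hat = None, math.inf  # incumbent solution and value
    while open_list:               # L empty -> return the incumbent        [abort]
        Q = open_list.pop()        # depth-first selection                  [select]
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=Q)                  # [solve]
        if not res.success:        # relaxation infeasible
            continue
        if res.fun >= c_hat - tol: # bound no better than incumbent: prune  [bound]
            continue
        frac = [i for i, v in enumerate(res.x) if abs(v - round(v)) > tol]
        if not frac:               # integral LP optimum -> new incumbent   [check]
            x_hat, c_hat = np.round(res.x), res.fun
            continue
        i = frac[0]                # split on x_i <= floor(v) / x_i >= ceil(v) [branch]
        v = res.x[i]
        lo, hi = Q[i]
        left, right = list(Q), list(Q)
        left[i] = (lo, math.floor(v))
        right[i] = (math.ceil(v), hi)
        open_list.extend([left, right])
    return x_hat, c_hat

# Toy ILP from the MILP sketch above: maximize x1 + 2*x2 (minimize its negation).
x, val = branch_and_bound(c=np.array([-1.0, -2.0]),
                          A_ub=np.array([[1.0, 3.0], [3.0, 1.0]]),
                          b_ub=np.array([8.5, 9.0]),
                          bounds=[(0, None), (0, None)])
print(x, -val)  # expect x = (2, 2) with objective value 6
```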

  29. Branch and Bound [Figure: the branch-and-bound search tree, showing the root node $R$, solved and unsolved subproblems, pruned subproblems, the current subproblem $Q$ with its new subproblems $Q_1, \ldots, Q_k$, and the current feasible solution.] Slide from Achterberg (thesis, 2007)

  30. Branch and Bound [Figure 2.2: LP-based branching on a single fractional variable, splitting $Q$ into $Q_1$ and $Q_2$ around the fractional relaxed solution $\check{x}$.] Slide from Achterberg (thesis, 2007)
