MAP Inference with MILP
10-418 / 10-618 Machine Learning for Structured Data
Matt Gormley Lecture 12
Oct. 7, 2019
Machine Learning Department School of Computer Science Carnegie Mellon University
Reminders
Homework 2: BP for Syntax
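The Hamming loss corresponds to accuracy and returns the number of incorrect variable assignments:

$$\ell(\hat{y}, y) = \sum_{i=1}^{V} \left(1 - \mathbb{I}(\hat{y}_i = y_i)\right)$$

The MBR decoder is:

$$h_\theta(x) = \operatorname*{argmin}_{\hat{y}} \; \mathbb{E}_{y \sim p_\theta(\cdot \mid x)}\left[\ell(\hat{y}, y)\right]$$

Under the Hamming loss, this decomposes across variables and requires only the variable marginals:

$$\hat{y}_i = h_\theta(x)_i = \operatorname*{argmax}_{\hat{y}_i} \; p_\theta(\hat{y}_i \mid x)$$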
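As a sanity check on the decomposition claim, the following Python sketch computes the MBR decision under Hamming loss two ways: by brute-force minimization of the expected loss over all assignments, and by taking the per-variable argmax of the marginals. The toy distribution and the function names are illustrative assumptions, not from the lecture. On this example the two decoders agree, while the single most probable joint assignment (the MAP assignment) differs.

```python
import itertools
from typing import Dict, List, Tuple

Assignment = Tuple[int, ...]


def hamming_loss(y_hat: Assignment, y: Assignment) -> int:
    """Number of variables whose predicted value differs from the true value."""
    return sum(a != b for a, b in zip(y_hat, y))


def expected_loss(y_hat: Assignment, p: Dict[Assignment, float]) -> float:
    """E_{y ~ p}[loss(y_hat, y)] for an explicitly enumerated distribution p."""
    return sum(prob * hamming_loss(y_hat, y) for y, prob in p.items())


def mbr_brute_force(p: Dict[Assignment, float], num_vars: int, num_vals: int) -> Assignment:
    """argmin over every candidate y_hat of the expected Hamming loss (exponential in V)."""
    candidates = itertools.product(range(num_vals), repeat=num_vars)
    return min(candidates, key=lambda y_hat: expected_loss(y_hat, p))


def mbr_from_marginals(p: Dict[Assignment, float], num_vars: int, num_vals: int) -> Assignment:
    """Decomposed decoder: pick the argmax of each variable's marginal p(y_i | x)."""
    marginals: List[List[float]] = [[0.0] * num_vals for _ in range(num_vars)]
    for y, prob in p.items():
        for i, value in enumerate(y):
            marginals[i][value] += prob
    return tuple(max(range(num_vals), key=lambda v: marginals[i][v]) for i in range(num_vars))


if __name__ == "__main__":
    # Toy distribution over three binary variables (an illustrative assumption).
    p = {(0, 0, 0): 0.4, (1, 1, 0): 0.3, (1, 0, 1): 0.3}
    print("MAP:", max(p, key=p.get))                       # (0, 0, 0)
    print("MBR (brute force):", mbr_brute_force(p, 3, 2))  # (1, 0, 0)
    print("MBR (marginals):", mbr_from_marginals(p, 3, 2)) # (1, 0, 0)
```

The shortcut works because the expected Hamming loss equals $\sum_i (1 - p_\theta(\hat{y}_i \mid x))$, so each term can be minimized independently; the brute-force version is only there to check it.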
[Figure: branching on X1 into two subproblems, X1 ≤ 0.0 and X1 ≥ 0.0]
– Nonconvex
– NP-hard to solve (Cohen & Smith, 2010)
– Kind of tricky to get it right…
– Curse of dimensionality kicks in quickly
branch-and-bound usually fails with more than 80 variables (Burer and Vandenbussche, 2009)
have hundreds of variables
– We solve 5 sentences, but on 200 sentences we couldn't run to completion
– Our (hybrid) global search framework incorporates local search
– This hybrid approach sometimes finds higher-likelihood (and higher-accuracy) solutions than pure local search
[Figure: Mathematical Program, Relaxation, Projection (Branch-and-Bound Search Heuristics)]
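Algorithm 2.1 Branch-and-bound
Input: Minimization problem instance R.
Output: Optimal solution x⋆ with value c⋆, or conclusion that R has no solution, indicated by c⋆ = ∞.
1. Initialize the list of unsolved subproblems ℒ := {R}, set x̂ := ∅ (no solution known yet), and ĉ := ∞. [init]
2. If ℒ = ∅, stop and return x⋆ = x̂ and c⋆ = ĉ. [abort]
3. Choose Q ∈ ℒ and set ℒ := ℒ \ {Q}. [select]
4. Solve a relaxation Q_relax of Q. If Q_relax is empty, set č := ∞. Otherwise, let x̌ be an optimal solution of Q_relax and č its objective value. [solve]
5. If č ≥ ĉ, goto Step 2. [bound]
6. If x̌ is feasible for R, set x̂ := x̌, ĉ := č, and goto Step 2. [check]
7. Split Q into subproblems Q = Q1 ∪ … ∪ Qk, set ℒ := ℒ ∪ {Q1, …, Qk}, and goto Step 2. [branch]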
Slide from Achterberg (thesis, 2007)
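A minimal Python sketch of Algorithm 2.1, assuming a 0-1 knapsack as the problem instance R (an illustrative choice, not from the slides), written in minimization form by negating item values. The relaxation lets each x_j range over [0, 1], and filling greedily by value-to-weight ratio solves that LP exactly; node selection is depth-first, and branching fixes one fractional variable to 0 or 1. The names `solve_relaxation` and `branch_and_bound` are hypothetical; comments mark the step labels from the algorithm.

```python
import math
from typing import Dict, List, Optional, Tuple

Fixings = Dict[int, int]  # variable index -> value (0 or 1) fixed by branching


def solve_relaxation(values, weights, capacity, fixed: Fixings) -> Tuple[Optional[List[float]], float]:
    """[solve] LP relaxation of a 0-1 knapsack subproblem, in minimization form.

    Returns (x_check, c_check) with c_check = -sum(values * x_check), or
    (None, inf) if the fixings already exceed the capacity (empty relaxation).
    Assumes all weights are positive.
    """
    n = len(values)
    x = [0.0] * n
    remaining = capacity
    for j, v in fixed.items():
        x[j] = float(v)
        remaining -= weights[j] * v
    if remaining < 0:
        return None, math.inf
    # For the fractional knapsack LP, filling greedily by value/weight ratio is optimal.
    free = sorted((j for j in range(n) if j not in fixed),
                  key=lambda j: values[j] / weights[j], reverse=True)
    for j in free:
        if remaining <= 0:
            break
        x[j] = min(1.0, remaining / weights[j])
        remaining -= x[j] * weights[j]
    return x, -sum(v * xj for v, xj in zip(values, x))


def branch_and_bound(values, weights, capacity, tol=1e-9):
    open_list: List[Fixings] = [{}]       # [init] L := {R} (no variables fixed)
    best_x, best_c = None, math.inf       # [init] x_hat := none, c_hat := inf
    while open_list:                      # [abort] stop once L is empty
        fixed = open_list.pop()           # [select] depth-first choice of Q
        x, c = solve_relaxation(values, weights, capacity, fixed)  # [solve]
        if c >= best_c:                   # [bound] also prunes empty relaxations
            continue
        fractional = [j for j, xj in enumerate(x) if abs(xj - round(xj)) > tol]
        if not fractional:                # [check] an integral LP solution is feasible for R
            best_x, best_c = [round(xj) for xj in x], c
            continue
        j = fractional[0]                 # [branch] split Q on one fractional variable
        open_list.append({**fixed, j: 0})
        open_list.append({**fixed, j: 1})
    return best_x, best_c                 # best_x is None iff R has no solution


if __name__ == "__main__":
    x_star, c_star = branch_and_bound([60, 100, 120], [10, 20, 30], 50)
    print(x_star, -c_star)  # [0, 1, 1] 220.0
```

On this instance the root relaxation takes two-thirds of the last item (bound 240); the search then finds the integral solution worth 220 and prunes the remaining subproblems in the [bound] step.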
Slide from Achterberg (thesis, 2007)
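[Figure: branch-and-bound search tree rooted at R; the current subproblem Q is split into new unsolved subproblems Q1, …, Qk. Legend: root node, pruned subproblem, solved subproblem, current subproblem, new unsolved subproblems, unsolved subproblems, feasible solution.]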
Slide from Achterberg (thesis, 2007)
[Figure: subproblem Q, with LP solution x̌, branched into Q1 and Q2]
Figure 2.2. LP-based branching on a single fractional variable.
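The branching step in Figure 2.2 can be sketched in Python for general integer variables, assuming each subproblem is described by per-variable (lower, upper) bounds. Which fractional variable to branch on is a design choice; the most-fractional rule used here is one simple option, assumed for illustration rather than taken from the figure. The function names and the example LP solution are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Bound = Tuple[float, float]  # (lower, upper) bound of one variable


def pick_fractional(x: Sequence[float], integer_vars: Sequence[int], tol: float = 1e-9) -> Optional[int]:
    """Most-fractional rule: the integer variable whose LP value is closest to .5.

    Returns None when every integer variable is already integral (within tol).
    """
    best_j, best_score = None, tol
    for j in integer_vars:
        frac = x[j] - math.floor(x[j])
        score = min(frac, 1.0 - frac)  # distance to the nearest integer
        if score > best_score:
            best_j, best_score = j, score
    return best_j


def branch(bounds: List[Bound], x: Sequence[float], j: int) -> Tuple[List[Bound], List[Bound]]:
    """Split Q into Q1 (x_j <= floor(x_j)) and Q2 (x_j >= ceil(x_j)); x is infeasible for both."""
    lo, hi = bounds[j]
    q1, q2 = list(bounds), list(bounds)
    q1[j] = (lo, math.floor(x[j]))
    q2[j] = (math.ceil(x[j]), hi)
    return q1, q2


if __name__ == "__main__":
    x_check = [2.0, 3.4, 0.5]          # hypothetical LP solution of Q
    bounds = [(0, 5)] * 3
    j = pick_fractional(x_check, [0, 1, 2])
    q1, q2 = branch(bounds, x_check, j)
    print(j, q1[j], q2[j])             # 2 (0, 0) (1, 5)
```

Because x̌ violates the new bound in both children, neither relaxation can return it again, and every integer-feasible point of Q survives in exactly one of Q1 or Q2.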