

SLIDE 1

10-418 / 10-618 Machine Learning for Structured Data
Machine Learning Department, School of Computer Science, Carnegie Mellon University

MAP Inference with MILP

Matt Gormley
Lecture 12, Oct. 7, 2019

SLIDE 2

Reminders

  • Homework 2: BP for Syntax Trees
    – Out: Sat, Sep. 28
    – Due: Sat, Oct. 12 at 11:59pm
  • Last chance to switch between 10-418 / 10-618 is October 7th (drop deadline)
  • Today’s after-class office hours are un-cancelled (i.e. I am having them)

SLIDE 3

MBR DECODING

SLIDE 4

Minimum Bayes Risk Decoding

  • Suppose we are given a loss function ℓ(ŷ, y) and are asked for a single tagging.
  • How should we choose just one from our probability distribution p(y|x)?
  • A minimum Bayes risk (MBR) decoder h(x) returns the variable assignment with minimum expected loss under the model’s distribution:

$$h_\theta(x) = \operatorname*{argmin}_{\hat{y}} \; \mathbb{E}_{y \sim p_\theta(\cdot \mid x)}\left[\ell(\hat{y}, y)\right] = \operatorname*{argmin}_{\hat{y}} \; \sum_{y} p_\theta(y \mid x)\, \ell(\hat{y}, y)$$
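A minimal brute-force sketch of this definition (our own illustration, not from the lecture): it enumerates every assignment twice over, so it is only feasible for tiny label spaces; structured decoders exploit the factorization of pθ instead.

```python
# Brute-force MBR decoding sketch. Assumes p(y|x) is a dict over complete
# assignments that we can fully enumerate; all names here are our own.
from itertools import product

def mbr_decode(p, loss, labels, n_vars):
    """Return the y-hat minimizing sum_y p(y) * loss(y_hat, y)."""
    assignments = list(product(labels, repeat=n_vars))
    return min(assignments,
               key=lambda y_hat: sum(p[y] * loss(y_hat, y) for y in assignments))

# Toy 2-variable distribution with Hamming loss.
p = {('a', 'a'): 0.4, ('a', 'b'): 0.3, ('b', 'a'): 0.2, ('b', 'b'): 0.1}
hamming = lambda y_hat, y: sum(u != v for u, v in zip(y_hat, y))
print(mbr_decode(p, hamming, labels=['a', 'b'], n_vars=2))  # ('a', 'a')
```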

SLIDE 5

Minimum Bayes Risk Decoding

Consider some example loss functions:

The Hamming loss corresponds to accuracy and returns the number of incorrect variable assignments:

$$\ell(\hat{y}, y) = \sum_{i=1}^{V} \left(1 - \mathbb{I}(\hat{y}_i, y_i)\right)$$

The MBR decoder is:

$$h_\theta(x) = \operatorname*{argmin}_{\hat{y}} \; \mathbb{E}_{y \sim p_\theta(\cdot \mid x)}\left[\ell(\hat{y}, y)\right] \quad \Rightarrow \quad \hat{y}_i = h_\theta(x)_i = \operatorname*{argmax}_{\hat{y}_i} \; p_\theta(\hat{y}_i \mid x)$$

This decomposes across variables and requires the variable marginals.
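Under Hamming loss the decoder collapses to an independent per-variable argmax over the marginals; a tiny sketch with made-up marginal numbers (in this course's setting they would come from sum-product BP):

```python
# Hamming-loss MBR decoding: choose each variable's most probable value
# under its marginal, independently. Marginal values below are invented.
marginals = [
    {'N': 0.6, 'V': 0.4},   # p(y_1 | x)
    {'N': 0.2, 'V': 0.8},   # p(y_2 | x)
]
y_hat = [max(m, key=m.get) for m in marginals]
print(y_hat)  # ['N', 'V']
```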

SLIDE 6

Minimum Bayes Risk Decoding

Consider some example loss functions:

The 0-1 loss function returns 0 if the two assignments are identical and 1 otherwise:

$$\ell(\hat{y}, y) = 1 - \mathbb{I}(\hat{y}, y)$$

The MBR decoder is:

$$h_\theta(x) = \operatorname*{argmin}_{\hat{y}} \; \sum_{y} p_\theta(y \mid x)\left(1 - \mathbb{I}(\hat{y}, y)\right) = \operatorname*{argmax}_{\hat{y}} \; p_\theta(\hat{y} \mid x)$$

which is exactly the MAP inference problem!
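The middle equality deserves one explicit step: the indicator selects the single term y = ŷ from the sum, so

$$\mathbb{E}_{y \sim p_\theta(\cdot \mid x)}\left[1 - \mathbb{I}(\hat{y}, y)\right] = 1 - \sum_{y} p_\theta(y \mid x)\, \mathbb{I}(\hat{y}, y) = 1 - p_\theta(\hat{y} \mid x),$$

and minimizing 1 − pθ(ŷ | x) over ŷ is the same as maximizing pθ(ŷ | x).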

SLIDE 7

LINEAR PROGRAMMING & INTEGER LINEAR PROGRAMMING

SLIDE 8

Linear Programming

Whiteboard:
  – Example of a Linear Program in 2D
  – LP Standard Form
  – Converting an LP to Standard Form
  – An LP and its Polytope
  – Simplex algorithm (tableau method)
  – Interior-point algorithm(s)
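As a concrete companion to the whiteboard LP example, here is a minimal 2-D LP solved with SciPy's linprog (the objective and constraint numbers are our own toy choices, not the lecture's):

```python
# Tiny 2-D LP sketch: maximize 3*x1 + 2*x2
# s.t. x1 + x2 <= 4, x1 + 3*x2 <= 6, x1, x2 >= 0.
# linprog minimizes, so we negate the objective.
from scipy.optimize import linprog

res = linprog(c=[-3.0, -2.0],
              A_ub=[[1.0, 1.0], [1.0, 3.0]],
              b_ub=[4.0, 6.0],
              bounds=[(0, None), (0, None)])
print(res.x, -res.fun)  # optimal point and value: [4. 0.], 12.0
```

The solution lands at a vertex of the feasible polytope, which is exactly why the Simplex algorithm can afford to walk from vertex to vertex.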

SLIDE 9

Integer Linear Programming

Whiteboard:
  – Example of an ILP in 2D
  – Example of an MILP in 2D
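A matching 2-D MILP sketch using SciPy's milp (assumes SciPy ≥ 1.9; the numbers are again our own). Making x1 integer while x2 stays continuous is what puts the "mixed" in mixed integer linear program:

```python
# 2-D MILP sketch: maximize x1 + 2*x2 s.t. x1 + x2 <= 3.5, x1, x2 >= 0,
# with x1 restricted to integers and x2 continuous.
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

res = milp(c=np.array([-1.0, -2.0]),          # milp minimizes, so negate
           constraints=LinearConstraint(np.array([[1.0, 1.0]]), -np.inf, 3.5),
           integrality=np.array([1, 0]),      # 1 = integer, 0 = continuous
           bounds=Bounds(lb=0.0))
print(res.x, -res.fun)  # [0. 3.5], 7.0
```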

SLIDE 10

Background: Nonconvex Global Optimization

Goal: optimize over the blue surface.
SLIDE 11

Background: Nonconvex Global Optimization

Goal: optimize over the blue surface.
SLIDE 12

Background: Nonconvex Global Optimization

Relaxation: provides an upper bound on the surface.
SLIDE 13

Background: Nonconvex Global Optimization

Branching: partitions the search space into subspaces, and enables tighter relaxations.

[Figure labels: branches X1 ≤ 0.0 and X1 ≥ 0.0.]
SLIDE 14

Background: Nonconvex Global Optimization

Branching: partitions the search space into subspaces, and enables tighter relaxations.

[Figure labels: branches X1 ≤ 0.0 and X1 ≥ 0.0.]
SLIDE 15

Background: Nonconvex Global Optimization

Branching: partitions the search space into subspaces, and enables tighter relaxations.

[Figure labels: branches X1 ≤ 0.0 and X1 ≥ 0.0.]
SLIDE 16

Background: Nonconvex Global Optimization

The max of all relaxed solutions for each of the partitions is a global upper bound.
SLIDE 17

Background: Nonconvex Global Optimization


We can project a relaxed solution onto the feasible region.
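A minimal projection heuristic (our own sketch, with our own toy constraint numbers): round the relaxed point to the integer lattice and keep it only if it still satisfies the constraints; real solvers use more careful repair steps.

```python
# Project a relaxed solution onto the integer-feasible region by rounding.
# Rounding need not stay feasible in general, so we re-check the constraints.
import numpy as np

def project(x_relaxed, A_ub, b_ub):
    """Round to integers; return the point only if it remains feasible."""
    x_int = np.round(x_relaxed)
    return x_int if np.all(np.asarray(A_ub) @ x_int <= b_ub) else None

print(project([0.2, 2.9], A_ub=[[1, 1], [5, 1]], b_ub=[3.5, 10]))  # [0. 3.]
```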

SLIDE 18

Background: Nonconvex Global Optimization


The incumbent is ε-optimal if the relative difference between the global upper bound and the incumbent score is less than ε.
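Written out, with ĉ the incumbent's score and c̄ the global upper bound (one common convention; solvers differ on the exact denominator):

$$\frac{\bar{c} - \hat{c}}{|\hat{c}|} < \epsilon$$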

SLIDE 19

How much should we subdivide?

SLIDE 20

How much should we subdivide?

BRANCH-AND-BOUND

  • Method for recursively subdividing the search space
  • Subspace order can be determined heuristically (e.g. best-first search with depth-first plunging)
  • Prunes subspaces that can’t yield better solutions
SLIDE 21

Background: Nonconvex Global Optimization

If the subspace upper bound is worse than the current incumbent, we can prune that subspace.
SLIDE 22

Background: Nonconvex Global Optimization

If the subspace upper bound is worse than the current incumbent, we can prune that subspace.
SLIDE 23

Branch-and-Bound for the Viterbi Objective

  • The Viterbi Objective
    – Nonconvex
    – NP-hard to solve (Cohen & Smith, 2010)
  • Branch-and-bound
    – Kind of tricky to get it right…
    – Curse of dimensionality kicks in quickly
  • Limitations:
    – Nonconvex quadratic optimization by LP-based branch-and-bound usually fails with more than 80 variables (Burer and Vandenbussche, 2009)
    – Our smallest (toy) problems have hundreds of variables
  • Preview of Experiments
    – We solve 5 sentences, but on 200 sentences, we couldn’t run to completion
    – Our (hybrid) global search framework incorporates local search
    – This hybrid approach sometimes finds higher likelihood (and higher accuracy) solutions than pure local search
SLIDE 24

BRANCH-AND-BOUND INGREDIENTS

  • Mathematical Program
  • Relaxation
  • Projection
  • (Branch-and-Bound Search Heuristics)
SLIDE 25

Background: Nonconvex Global Optimization

We solve the relaxation using the Simplex algorithm.
SLIDE 26

Background: Nonconvex Global Optimization

We can project a relaxed solution onto the feasible region.
SLIDE 27

Integer Linear Programming

Whiteboard:
  – Branch and bound for an ILP in 2D
SLIDE 28

Branch and Bound

Algorithm 2.1 Branch-and-bound
Input: Minimization problem instance R.
Output: Optimal solution x⋆ with value c⋆, or conclusion that R has no solution, indicated by c⋆ = ∞.

  1. Initialize L := {R}, ĉ := ∞. [init]
  2. If L = ∅, stop and return x⋆ = x̂ and c⋆ = ĉ. [abort]
  3. Choose Q ∈ L, and set L := L \ {Q}. [select]
  4. Solve a relaxation Q_relax of Q. If Q_relax is empty, set č := ∞. Otherwise, let x̌ be an optimal solution of Q_relax and č its objective value. [solve]
  5. If č ≥ ĉ, goto Step 2. [bound]
  6. If x̌ is feasible for R, set x̂ := x̌, ĉ := č, and goto Step 2. [check]
  7. Split Q into subproblems Q = Q1 ∪ … ∪ Qk, set L := L ∪ {Q1, …, Qk}, and goto Step 2. [branch]

Slide from Achterberg (thesis, 2007)
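The following is a compact, runnable sketch of Algorithm 2.1 specialized to a pure ILP with an LP relaxation (maximization form; SciPy's linprog stands in for the Simplex step, and all helper names and toy numbers are our own):

```python
# LP-based branch-and-bound sketch for: maximize c @ x
# s.t. A_ub @ x <= b_ub, x >= 0 integer. Bracketed tags mirror Algorithm 2.1.
import math
import numpy as np
from scipy.optimize import linprog

def branch_and_bound(c, A_ub, b_ub, tol=1e-6):
    """Maximize c @ x over nonnegative integer x with A_ub @ x <= b_ub."""
    best_x, best_val = None, -math.inf            # incumbent          [init]
    stack = [[(0, None)] * len(c)]                # L := {R}           [init]
    while stack:                                  # L empty? then stop [abort]
        bounds = stack.pop()                      # choose Q           [select]
        res = linprog(-np.asarray(c, dtype=float), A_ub=A_ub, b_ub=b_ub,
                      bounds=bounds)              # solve Q_relax      [solve]
        if not res.success:                       # Q_relax is empty
            continue
        relax_val = -res.fun                      # upper bound for Q
        if relax_val <= best_val + tol:           # prune              [bound]
            continue
        frac = [i for i, v in enumerate(res.x) if abs(v - round(v)) > tol]
        if not frac:                              # integral: feasible [check]
            best_x, best_val = np.round(res.x), relax_val
            continue
        i, v = frac[0], res.x[frac[0]]            # first fractional variable
        down, up = list(bounds), list(bounds)
        down[i] = (bounds[i][0], math.floor(v))   # Q1: x_i <= floor(v)
        up[i] = (math.ceil(v), bounds[i][1])      # Q2: x_i >= ceil(v)
        stack += [down, up]                       # L := L ∪ {Q1, Q2}  [branch]
    return best_x, best_val

# Toy 2-D ILP: maximize x1 + 2*x2 s.t. x1 + x2 <= 3.5 and 5*x1 + x2 <= 10.
print(branch_and_bound(c=[1, 2], A_ub=[[1, 1], [5, 1]], b_ub=[3.5, 10]))
# -> (array([0., 3.]), 6.0)
```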

SLIDE 29

Branch and Bound

[Figure: the branch-and-bound search tree, with root node R, pruned and solved subproblems, the current subproblem Q, new unsolved subproblems Q1 … Qk, and a feasible solution.]

Slide from Achterberg (thesis, 2007)
SLIDE 30

Branch and Bound

[Figure 2.2: LP-based branching on a single fractional variable x̌, splitting subproblem Q into Q1 and Q2.]

Slide from Achterberg (thesis, 2007)