

slide-1
SLIDE 1

CS 6355: Structured Prediction

Inference: Integer Linear Programs

1

slide-2
SLIDE 2

So far in the class

  • Thinking about structures

– A graph, a collection of parts that are labeled jointly, a collection of decisions

  • Algorithms for learning

– Local learning

  • Learn parameters for individual components independently
  • Learning algorithm not aware of the full structure

– Global learning

  • Learn parameters for the full structure
  • Learning algorithm “knows” about the full structure
  • This section: Prediction

– Sets structured prediction apart from binary/multiclass

2

slide-3
SLIDE 3

Inference

  • What is inference?

– An overview of what we have seen before
– Combinatorial optimization
– Different views of inference

  • Graph algorithms

– Dynamic programming, greedy algorithms, search

  • Integer programming
  • Heuristics for inference

– Sampling

  • Learning to search

3

slide-4
SLIDE 4

The big picture

  • MAP Inference is combinatorial optimization
  • Combinatorial optimization problems can be written as integer linear programs (ILPs)

– The conversion is not always trivial
– Allows injection of “knowledge” into the inference in the form of constraints

  • Different ways of solving ILPs

– Commercial solvers: CPLEX, Gurobi, etc.
– Specialized solvers if you know something about your problem

  • Incremental ILP, Lagrangian relaxation, etc.

– Can relax to linear programs and hope for the best

  • Integer linear programs are NP-hard in general

– No free lunch

4

slide-5
SLIDE 5

Today’s Agenda

  • Linear and integer linear programming

– What are they?
– The geometric perspective

  • ILPs for inference

– Simple example: Multiclass classification
– More general structures

5

slide-6
SLIDE 6

Detour: Linear programming

  • Minimizing a linear objective function subject to a finite number of linear constraints (equality or inequality)
  • Very widely applicable

– Operations research, micro-economics, management

  • Historical note/anecdote

– Developed during World War II to optimize army expenditure

  • Nobel Prize in Economics 1975

– “Programming” not the same as computer programming

  • “Program” referred to military schedules, and programming referred to optimizing the program

6

slide-7
SLIDE 7

Example: The diet problem

A student wants to spend as little money on food as possible while getting sufficient amounts of vitamin Z and nutrient X. Her options are shown below. How should she spend her money to get at least 5 units of vitamin Z and 3 units of nutrient X?

7

Item                 Cost/100g   Vitamin Z   Nutrient X
Carrots              2           4           0.4
Sunflower seeds      6           10          4
Double cheeseburger  0.3         0.01        2

slide-8
SLIDE 8

Example: The diet problem

A student wants to spend as little money on food as possible while getting sufficient amounts of vitamin Z and nutrient X. How should she spend her money to get at least 5 units of vitamin Z and 3 units of nutrient X?

8

Item                 Cost/100g   Vitamin Z   Nutrient X
Carrots              2           4           0.4
Sunflower seeds      6           10          4
Double cheeseburger  0.3         0.01        2

Let c, s and d denote how much of each item is purchased (in units of 100g)

Minimize total cost:                     min 2c + 6s + 0.3d
such that
At least 5 units of vitamin Z:           4c + 10s + 0.01d ≥ 5
At least 3 units of nutrient X:          0.4c + 4s + 2d ≥ 3
The amounts purchased are not negative:  c, s, d ≥ 0
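The diet LP above can be solved numerically. A minimal sketch using SciPy's linprog (assuming SciPy is available): linprog minimizes by default, so each "≥" constraint is negated into the A_ub x ≤ b_ub form it expects.

```python
from scipy.optimize import linprog

# Objective: minimize total cost 2c + 6s + 0.3d
cost = [2, 6, 0.3]

# linprog expects A_ub @ x <= b_ub, so each ">=" constraint is negated:
#   4c + 10s + 0.01d >= 5  ->  -4c - 10s - 0.01d <= -5   (vitamin Z)
#   0.4c + 4s + 2d   >= 3  ->  -0.4c - 4s - 2d   <= -3   (nutrient X)
A_ub = [[-4, -10, -0.01],
        [-0.4, -4, -2]]
b_ub = [-5, -3]

# Default variable bounds in linprog are (0, None), i.e. c, s, d >= 0
res = linprog(cost, A_ub=A_ub, b_ub=b_ub)
c_amt, s_amt, d_amt = res.x
print(f"buy {c_amt:.3f} carrots, {s_amt:.3f} seeds, {d_amt:.3f} burgers")
print(f"total cost: {res.fun:.3f}")
```

The returned solution satisfies both nutrient constraints with nonnegative amounts; the variable names here are just illustrative labels for the three columns of the table.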

slide-13
SLIDE 13

Linear programming

In general: minimize a linear objective cᵀx subject to linear constraints Ax ≥ b

13

slide-14
SLIDE 14

Linear programming

In general, this is a continuous optimization problem

– And yet, there is only a finite set of possible solutions
– For example:

14

[figure: axes x1, x2, x3; the triangular region of the plane a1x1 + a2x2 + a3x3 = b; the objective direction c]

Suppose we had to maximize any cᵀx in this region: these three vertices are the only possible solutions!

slide-22
SLIDE 22

Linear programming

In general This is a continuous optimization problem

– And yet, there is only a finite set of possible solutions
– The constraint matrix defines a convex polytope
– Only the vertices or faces of the polytope can be solutions

22

slide-23
SLIDE 23

Geometry of linear programming

23

The constraint matrix defines a polytope that contains the allowed solutions (possibly not closed)
One of the constraints, Bᵢᵀy ≤ cᵢ: points in the shaded region are not allowed by this constraint
Every constraint forbids a half-space; the points that are allowed form the feasible region
The objective defines a cost for every point in the space
Even though all points in the region are allowed, points on the faces maximize/minimize the cost

slide-31
SLIDE 31

Linear programming

  • In general, this is a continuous optimization problem

– And yet, there is only a finite set of possible solutions
– The constraint matrix defines a convex polytope
– Only the vertices or faces of the polytope can be solutions

  • Linear programs can be solved in polynomial time

31

Questions?

slide-32
SLIDE 32

Integer linear programming

  • In general: maximize cᵀz subject to Az ≤ b, with z integer

32

slide-33
SLIDE 33

Geometry of integer linear programming

33

The constraint matrix defines a polytope that contains the allowed solutions (possibly not closed)
The objective defines a cost for every point in the space
Only integer points are allowed

slide-34
SLIDE 34

Integer linear programming

  • In general: maximize cᵀz subject to Az ≤ b, with z integer
  • Solving integer linear programs in general can be NP-hard!
  • LP-relaxation: Drop the integer constraints and hope for the best

34
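To make the relaxation concrete, here is a toy sketch with made-up numbers: the ILP "maximize x1 + x2 subject to 2x1 + 2x2 ≤ 3, x ∈ {0,1}²" has integer optimum 1, while dropping integrality gives a larger fractional optimum of 1.5. A brute-force check against SciPy's linprog (assuming SciPy is available):

```python
from itertools import product
from scipy.optimize import linprog

# Toy ILP: maximize x1 + x2  subject to  2*x1 + 2*x2 <= 3,  x1, x2 in {0, 1}

# Exact integer optimum by enumerating all 0-1 assignments
ilp_opt = max(x1 + x2
              for x1, x2 in product([0, 1], repeat=2)
              if 2 * x1 + 2 * x2 <= 3)

# LP relaxation: same constraints, but 0 <= x <= 1 is allowed to be
# fractional (linprog minimizes, so negate the objective)
res = linprog([-1, -1], A_ub=[[2, 2]], b_ub=[3], bounds=[(0, 1), (0, 1)])
lp_opt = -res.fun

# The relaxed optimum upper-bounds the integer optimum (here 1.5 vs 1),
# so "hoping for the best" means hoping the gap is small or zero
print(ilp_opt, lp_opt)
```

The gap between the two optima is exactly what the slide's "hope for the best" refers to: when the LP optimum happens to be integral, the relaxation is exact.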

slide-35
SLIDE 35

0-1 integer linear programming

  • In general
  • An instance of integer linear programs

– Still NP-hard

  • Geometry: We are only considering points that are

vertices of the Boolean hypercube

35

slide-36
SLIDE 36

0-1 integer linear programming

  • In general
  • An instance of integer linear programs

– Still NP-hard

  • Geometry: We are only considering points that are

vertices of the Boolean hypercube

– Constraints prohibit certain vertices

36

Eg: Only points within this region are allowed

slide-37
SLIDE 37

0-1 integer linear programming

  • In general
  • An instance of integer linear programs

– Still NP-hard

  • Geometry: We are only considering points that are

vertices of the Boolean hypercube

– Constraints prohibit certain vertices

37

Eg: Only points within this region are allowed Solution can be an interior point of the constraint set defined by Ax · b Questions?

slide-38
SLIDE 38

Back to structured prediction

  • Recall that we are solving argmaxy wᵀφ(x,y)

– The goal is to produce a graph

  • The set of possible values that y can take is finite, but large
  • General idea: Frame the argmax problem as a 0-1 integer linear program

– Allows addition of arbitrary constraints

38

slide-39
SLIDE 39

Thinking in ILPs

Let’s start with multi-class classification

argmaxy 2 {A, B, C} wTÁ(x, y) = argmaxy 2 {A, B, C} score(y)

Introduce decision variables for each label

  • zA = 1 if output = A, 0 otherwise
  • zB = 1 if output = B, 0 otherwise
  • zC = 1 if output = C, 0 otherwise

39

slide-40
SLIDE 40

Thinking in ILPs

Let’s start with multi-class classification

argmaxy 2 {A, B, C} wTÁ(x, y) = argmaxy 2 {A, B, C} score(y)

Introduce decision variables for each label

  • zA = 1 if output = A, 0 otherwise
  • zB = 1 if output = B, 0 otherwise
  • zC = 1 if output = C, 0 otherwise

40

Maximize the score Pick exactly one label

slide-41
SLIDE 41

Thinking in ILPs

Let’s start with multi-class classification

argmaxy 2 {A, B, C} wTÁ(x, y) = argmaxy 2 {A, B, C} score(y)

Introduce decision variables for each label

  • zA = 1 if output = A, 0 otherwise
  • zB = 1 if output = B, 0 otherwise
  • zC = 1 if output = C, 0 otherwise

41

Maximize the score Pick exactly one label An assignment to the z vector gives us a y

slide-42
SLIDE 42

Thinking in ILPs

Let’s start with multi-class classification

argmaxy 2 {A, B, C} wTÁ(x, y) = argmaxy 2 {A, B, C} score(y)

Introduce decision variables for each label

  • zA = 1 if output = A, 0 otherwise
  • zB = 1 if output = B, 0 otherwise
  • zC = 1 if output = C, 0 otherwise

42

Maximize the score Pick exactly one label

We have taken a trivial problem (finding a highest scoring element

  • f a list) and converted it into a representation that is NP-hard in

the worst case! Lesson: Don’t solve multiclass classification with an ILP solver

An assignment to the z vector gives us a y Questions?
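The construction can be verified end-to-end with a brute-force "solver" over the z vectors (the label scores below are hypothetical):

```python
from itertools import product

# Hypothetical scores for the three labels
score = {"A": 1.2, "B": 3.4, "C": 0.7}
labels = ["A", "B", "C"]

# Enumerate all 0-1 vectors z = (zA, zB, zC); keep those satisfying the
# "pick exactly one label" constraint zA + zB + zC == 1, and maximize
# the objective sum_y score(y) * z_y
best_obj, best_z = max(
    (sum(score[y] * z for y, z in zip(labels, vec)), vec)
    for vec in product([0, 1], repeat=3)
    if sum(vec) == 1
)

# Read the predicted label off the z vector
prediction = labels[best_z.index(1)]
print(prediction, best_obj)

# Same answer as plain argmax over the list of labels
assert prediction == max(labels, key=lambda y: score[y])
```

Exactly as the slide warns, this does far more work than a plain max over three numbers; the point is only that the two formulations agree.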

slide-43
SLIDE 43

ILP for a general conditional model

43

[figure: a factor graph over inputs x1, x2, x3 and outputs y1, y2, y3]

Our goal: maxy wᵀφ(x1, y1) + wᵀφ(y1, y2, y3) + wᵀφ(x3, y1, y3) + wᵀφ(x1, x2, y2)

Suppose each yi can be A, B or C. Introduce one decision variable for each part being assigned labels:

z1A, z1B, z1C
z2A, z2B, z2C
z13AA, z13AB, z13AC, z13BA, z13BB, z13BC, z13CA, z13CB, z13CC
z23AA, z23AB, z23AC, z23BA, z23BB, z23BC, z23CA, z23CB, z23CC

Each of these decision variables is associated with a score
Not all decisions can exist together. E.g.: z13AB implies z1A and z3B

Questions?
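A sketch of the same idea on a two-variable slice of the model (hypothetical scores; just y1, y3 and the pair (y1, y3)): enumerate every 0-1 setting of the z variables, keep only the settings satisfying the sum-to-one and agreement constraints, and check that the best one matches direct enumeration over labels.

```python
from itertools import product

labels = ["A", "B", "C"]
# Hypothetical scores: unary for y1, pairwise for (y1, y3)
s1 = {"A": 0.5, "B": 1.0, "C": 0.2}
s13 = {(a, b): 0.1 * i for i, (a, b) in enumerate(product(labels, labels))}
pairs = list(product(labels, labels))

# z layout: 3 unary vars for y1, 3 for y3, 9 pair vars for (y1, y3)
best = None
for bits in product([0, 1], repeat=3 + 3 + 9):
    z1 = dict(zip(labels, bits[0:3]))
    z3 = dict(zip(labels, bits[3:6]))
    z13 = dict(zip(pairs, bits[6:15]))
    # Sum-to-one: each part takes exactly one label
    if sum(z1.values()) != 1 or sum(z3.values()) != 1 or sum(z13.values()) != 1:
        continue
    # Agreement: z13[a,b] = 1 implies z1[a] = 1 and z3[b] = 1
    # (as linear inequalities: z13[a,b] <= z1[a] and z13[a,b] <= z3[b])
    if any(z13[a, b] > min(z1[a], z3[b]) for a, b in pairs):
        continue
    obj = sum(s1[a] * z1[a] for a in labels) + \
          sum(s13[a, b] * z13[a, b] for a, b in pairs)
    if best is None or obj > best:
        best = obj

# Direct enumeration over label assignments gives the same optimum
direct = max(s1[a] + s13[a, b] for a, b in pairs)
assert abs(best - direct) < 1e-9
```

The agreement inequalities are what "not all decisions can exist together" compiles to; without them, the enumeration could pick a pair variable that contradicts the unary variables.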

slide-48
SLIDE 48

Writing constraints as linear inequalities

  • Exactly one of zA, zB, zC can be true

zA + zB + zC = 1

  • At least m of zA, zB, zC should be true

zA + zB + zC ≥ m

  • At most m of zA, zB, zC should be true

zA + zB + zC ≤ m

  • Implication: zi ⇒ zj

– Convert to disjunction: ¬zi ∨ zj

  • At least one of “not zi” or zj

– 1 – zi + zj ≥ 1, i.e. zj ≥ zi

48

Generally: All Boolean formulas can be converted to linear constraints

Exercise: Convert the toy conditional model from the previous slides to an ILP by hand
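Each translation can be checked mechanically by enumerating all 0-1 assignments, since the Boolean formula and its linear encoding must agree on every one. A small sketch:

```python
from itertools import product

# Implication z_i => z_j: the Boolean formula (not z_i) or z_j matches the
# linear encoding 1 - z_i + z_j >= 1 (equivalently z_j >= z_i) on every
# 0-1 assignment
for zi, zj in product([0, 1], repeat=2):
    boolean = (not zi) or zj
    linear = (1 - zi + zj) >= 1
    assert bool(boolean) == linear

# "At least m of zA, zB, zC": the truth of the Boolean statement matches
# the linear encoding zA + zB + zC >= m
m = 2
for za, zb, zc in product([0, 1], repeat=3):
    boolean = [za, zb, zc].count(1) >= m
    linear = (za + zb + zc) >= m
    assert boolean == linear
print("all encodings agree")
```

The same truth-table check works for any of the encodings on this slide, which is one way to gain confidence in a hand-written constraint before handing it to a solver.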

slide-50
SLIDE 50

Integer linear programming for inference

  • Easy to add additional knowledge

– Specify them as Boolean formulas
– Examples

  • “If y1 is an A, then y2 or y3 should be a B or C”
  • “No more than two A’s allowed in the output”
  • Many inference problems have “standard” mappings to ILPs

– Sequences, parsing, dependency parsing

  • Encoding of the problem makes a difference in solving time

– The mechanical encoding may not be efficient to solve

  • Generally: more complex constraints make solving harder

50

slide-51
SLIDE 51

Exercise 1: Sequence labeling

Goal: Find argmaxy wᵀφ(x,y) where y = (y1, y2, …, yn)

[figure: a chain of output variables y1, y2, y3, …, yn]

51

How can this be written as an ILP?
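One possible encoding (a hedged sketch, not the only answer to the exercise): indicator variables for "position i takes label a" and for "positions i, i+1 take labels a, b". The key property is that the score of any label sequence equals the ILP's linear objective evaluated at that sequence's indicator vector; the check below verifies this on a tiny example with hypothetical emission and transition scores.

```python
from itertools import product

labels = ["A", "B"]
n = 3
# Hypothetical scores: emission[i][a] per position, transition[a, b] per edge
emission = [{"A": 0.1, "B": 0.9}, {"A": 0.8, "B": 0.2}, {"A": 0.4, "B": 0.6}]
transition = {("A", "A"): 0.5, ("A", "B"): 0.0,
              ("B", "A"): 0.3, ("B", "B"): 0.1}

def seq_score(y):
    return sum(emission[i][y[i]] for i in range(n)) + \
           sum(transition[y[i], y[i + 1]] for i in range(n - 1))

def ilp_objective(y):
    # Build the 0-1 indicators encoding this sequence, then evaluate the
    # ILP's linear objective: sum emission * u[i,a] + sum transition * p[i,a,b]
    u = {(i, a): int(y[i] == a) for i in range(n) for a in labels}
    p = {(i, a, b): int(y[i] == a and y[i + 1] == b)
         for i in range(n - 1) for a in labels for b in labels}
    return sum(emission[i][a] * u[i, a] for i in range(n) for a in labels) + \
           sum(transition[a, b] * p[i, a, b]
               for i in range(n - 1) for a in labels for b in labels)

# The linear objective agrees with the sequence score on every sequence,
# so argmax over sequences equals argmax over consistent indicator vectors
for y in product(labels, repeat=n):
    assert abs(seq_score(y) - ilp_objective(y)) < 1e-9

best = max(product(labels, repeat=n), key=seq_score)
print(best)
```

A full ILP would also need the sum-to-one and agreement constraints from the previous slides to force the indicators to describe a single consistent sequence.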

slide-52
SLIDE 52

Exercise 2: Alignment

Suppose we have two sequences:

y11 y12 y13 ⋯ y1n
y21 y22 y23 ⋯ y2m

Each pair (y1i, y2j) is assigned a score tij. The goal is to find edges between the two sequences such that the following conditions hold:
1. The total score of the selected edges is maximized
2. No more than one edge is connected to any element of the second sequence

52

How can this be written as an ILP?
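One possible formulation (a sketch, not the only answer): a 0-1 indicator e[i][j] per candidate edge, objective Σ tij e[i][j], and one constraint Σi e[i][j] ≤ 1 for each element j of the second sequence. A brute-force check on hypothetical scores:

```python
from itertools import product

# Hypothetical edge scores t[i][j] for n = 2, m = 3
t = [[1.0, -0.5, 0.3],
     [0.2,  0.7, 0.4]]
n, m = 2, 3

best = None
for bits in product([0, 1], repeat=n * m):
    e = [[bits[i * m + j] for j in range(m)] for i in range(n)]
    # Constraint: at most one edge touches each element j of the second seq
    if any(sum(e[i][j] for i in range(n)) > 1 for j in range(m)):
        continue
    obj = sum(t[i][j] * e[i][j] for i in range(n) for j in range(m))
    if best is None or obj > best:
        best = obj

# Sanity check: with only the per-j constraint, the optimum just picks,
# for each j, the best positively scoring edge (or no edge at all)
greedy = sum(max(0, max(t[i][j] for i in range(n))) for j in range(m))
assert abs(best - greedy) < 1e-9
print(best)
```

Because this toy version has no constraint on the first sequence, a greedy rule already solves it; adding a matching constraint per i as well would make the ILP formulation earn its keep.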

slide-55
SLIDE 55

ILP for inference: Remarks

  • Any combinatorial optimization problem can be written as an ILP

– Even the “easy”/polynomial ones
– Given an ILP, checking whether it represents a polynomial problem is intractable in general

  • ILPs are a general language for thinking about combinatorial optimization

– The representation allows us to make general statements about inference
– Important: Framing/writing down the inference problem is separate from solving it

  • Off-the-shelf solvers for ILPs are quite good

– Gurobi, CPLEX
– Use an off-the-shelf solver only if you can’t solve your inference problem otherwise

55

slide-56
SLIDE 56

The big picture

  • MAP Inference is combinatorial optimization
  • Combinatorial optimization problems can be written as 0-1 integer linear programs

– The conversion is not always trivial
– Allows injection of “knowledge” into the inference in the form of constraints

  • Different ways of solving ILPs

– Commercial solvers: CPLEX, Gurobi, etc.
– Specialized solvers if you know something about your problem

  • Incremental ILP, Lagrangian relaxation, etc.

– Can relax to linear programs and hope for the best

  • Integer linear programs are NP-hard in general

– No free lunch

56

Questions?