SLIDE 1

Learning to Branch

Ellen Vitercik. Joint work with Nina Balcan, Travis Dick, and Tuomas Sandholm. Published in ICML 2018.
SLIDE 2

Integer Programs (IPs)

maximize c · x
subject to A x ≤ b
x ∈ {0,1}^n
SLIDE 3

Facility location problems can be formulated as IPs.

SLIDE 4

Clustering problems can be formulated as IPs.

SLIDE 5

Binary classification problems can be formulated as IPs.

SLIDE 6

Integer Programs (IPs)

maximize c · x
subject to A x ≤ b
x ∈ {0,1}^n

Solving IPs is NP-hard.
SLIDE 7

Branch and Bound (B&B)

  • Most widely-used algorithm for solving IPs (CPLEX, Gurobi)
  • Recursively partitions search space to find an optimal solution
  • Organizes partition as a tree
  • Many parameters
  • CPLEX has a 221-page manual describing 135 parameters

β€œYou may need to experiment.”

SLIDE 8

Why is tuning B&B parameters important?

  • Save time
  • Solve more problems
  • Find better solutions

SLIDE 9

B&B in the real world

Delivery company routes trucks daily

Use integer programming to select routes

Demand changes every day

Solve hundreds of similar optimizations

Using this set of typical problems, can we learn the best B&B parameters?

SLIDE 10

Model

[Diagram: sample IPs (A^(1), b^(1), c^(1)), …, (A^(m), b^(m), c^(m)), drawn from an application-specific distribution, go to the algorithm designer, who outputs B&B parameters.]

How can we use samples to find the best B&B parameters for my domain?
SLIDE 11

Model

This model has been studied in applied communities [Hutter et al. '09].
SLIDE 12

Model

The model has also been studied from a theoretical perspective [Gupta and Roughgarden '16; Balcan et al. '17].
SLIDE 13

Model

  • 1. Fix a set of B&B parameters to optimize
  • 2. Receive sample problems from unknown distribution
  • 3. Find parameters with the best performance on the samples

β€œBest” could mean smallest search tree, for example

SLIDE 14

Questions to address

How do we find parameters that are best on average over the samples?
Will those parameters perform well in expectation on a fresh instance (A, b, c)?
SLIDE 15

Outline

  • 1. Introduction
  • 2. Branch-and-Bound
  • 3. Learning algorithms
  • 4. Experiments
  • 5. Conclusion and Future Directions

SLIDE 16

max (40, 60, 10, 10, 3, 20, 60) · x
s.t. (40, 50, 30, 10, 10, 40, 30) · x ≤ 100
x ∈ {0,1}^7
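With only seven binary variables, the optimal integral value of this instance can be checked by brute force. A minimal sketch in plain Python (not part of the talk's tooling):

```python
from itertools import product

values = (40, 60, 10, 10, 3, 20, 60)
weights = (40, 50, 30, 10, 10, 40, 30)
capacity = 100

# Enumerate all 2^7 assignments, keep the feasible ones, take the best objective.
best_val, best_x = max(
    (sum(v * xi for v, xi in zip(values, x)), x)
    for x in product((0, 1), repeat=7)
    if sum(w * xi for w, xi in zip(weights, x)) <= capacity
)
print(best_val, best_x)  # → 133 (0, 1, 0, 1, 1, 0, 1)
```

The integral optimum is 133, strictly below the LP relaxation's value on the following slides.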

SLIDE 17

max (40, 60, 10, 10, 3, 20, 60) · x
s.t. (40, 50, 30, 10, 10, 40, 30) · x ≤ 100, x ∈ {0,1}^7

Root LP relaxation solution: x = (1/2, 1, 0, 0, 0, 0, 1), objective value 140.
SLIDE 18

B&B

  • 1. Choose leaf of tree

max (40, 60, 10, 10, 3, 20, 60) · x
s.t. (40, 50, 30, 10, 10, 40, 30) · x ≤ 100, x ∈ {0,1}^7

[Tree: a single root node with LP solution (1/2, 1, 0, 0, 0, 0, 1) and value 140.]
SLIDE 19

B&B

  • 1. Choose leaf of tree
  • 2. Branch on a variable

max (40, 60, 10, 10, 3, 20, 60) · x
s.t. (40, 50, 30, 10, 10, 40, 30) · x ≤ 100, x ∈ {0,1}^7

[Tree: root (1/2, 1, 0, 0, 0, 0, 1), value 140; branch on x_1:
x_1 = 0 → (0, 1, 0, 1, 0, 1/4, 1), value 135;
x_1 = 1 → (1, 3/5, 0, 0, 0, 0, 1), value 136.]
SLIDE 20

B&B

  • 1. Choose leaf of tree
  • 2. Branch on a variable

max (40, 60, 10, 10, 3, 20, 60) · x
s.t. (40, 50, 30, 10, 10, 40, 30) · x ≤ 100, x ∈ {0,1}^7

[Tree: root (1/2, 1, 0, 0, 0, 0, 1), value 140; x_1 = 0 → (0, 1, 0, 1, 0, 1/4, 1), value 135; x_1 = 1 → (1, 3/5, 0, 0, 0, 0, 1), value 136. The leaf x_1 = 1 is chosen next.]
SLIDE 21

B&B

  • 1. Choose leaf of tree
  • 2. Branch on a variable

max (40, 60, 10, 10, 3, 20, 60) · x
s.t. (40, 50, 30, 10, 10, 40, 30) · x ≤ 100, x ∈ {0,1}^7

[Tree: root 140; x_1 = 0 → 135; x_1 = 1 → 136. Branch the x_1 = 1 leaf on x_2:
x_2 = 0 → (1, 0, 0, 1, 0, 1/2, 1), value 120;
x_2 = 1 → (1, 1, 0, 0, 0, 0, 1/3), value 120.]
SLIDE 22

B&B

  • 1. Choose leaf of tree
  • 2. Branch on a variable

max (40, 60, 10, 10, 3, 20, 60) · x
s.t. (40, 50, 30, 10, 10, 40, 30) · x ≤ 100, x ∈ {0,1}^7

[Tree: root 140; x_1 = 0 → 135, x_1 = 1 → 136; under x_1 = 1: x_2 = 0 → 120, x_2 = 1 → 120. The leaf x_1 = 0 (value 135) is chosen next.]
SLIDE 23

B&B

  • 1. Choose leaf of tree
  • 2. Branch on a variable

max (40, 60, 10, 10, 3, 20, 60) · x
s.t. (40, 50, 30, 10, 10, 40, 30) · x ≤ 100, x ∈ {0,1}^7

[Tree: root 140; x_1 = 0 → 135, x_1 = 1 → 136; under x_1 = 1: x_2 = 0 → 120, x_2 = 1 → 120. Branch the x_1 = 0 leaf on x_6:
x_6 = 0 → (0, 1, 1/3, 1, 0, 0, 1), value 133.3;
x_6 = 1 → (0, 3/5, 0, 0, 0, 1, 1), value 116.]
SLIDE 24

B&B

  • 1. Choose leaf of tree
  • 2. Branch on a variable

max (40, 60, 10, 10, 3, 20, 60) · x
s.t. (40, 50, 30, 10, 10, 40, 30) · x ≤ 100, x ∈ {0,1}^7

[Tree: root 140; x_1 = 0 → 135, x_1 = 1 → 136; under x_1 = 1: x_2 = 0 → 120, x_2 = 1 → 120; under x_1 = 0: x_6 = 0 → 133.3, x_6 = 1 → 116. The leaf x_6 = 0 (value 133.3) is chosen next.]
SLIDE 25

B&B

  • 1. Choose leaf of tree
  • 2. Branch on a variable

max (40, 60, 10, 10, 3, 20, 60) · x
s.t. (40, 50, 30, 10, 10, 40, 30) · x ≤ 100, x ∈ {0,1}^7

[Tree: root 140; x_1 = 0 → 135, x_1 = 1 → 136; under x_1 = 1: x_2 = 0 → 120, x_2 = 1 → 120; under x_1 = 0: x_6 = 0 → 133.3, x_6 = 1 → 116. Branch the x_6 = 0 leaf on x_3:
x_3 = 0 → (0, 1, 0, 1, 1, 0, 1), value 133;
x_3 = 1 → (0, 4/5, 1, 0, 0, 0, 1), value 118.]
SLIDE 26

B&B

  • 1. Choose leaf of tree
  • 2. Branch on a variable
  • 3. Fathom leaf if:
    i. LP relaxation solution is integral
    ii. LP relaxation is infeasible
    iii. LP relaxation solution isn't better than the best-known integral solution

max (40, 60, 10, 10, 3, 20, 60) · x
s.t. (40, 50, 30, 10, 10, 40, 30) · x ≤ 100, x ∈ {0,1}^7

[Tree: root 140; x_1 = 0 → 135, x_1 = 1 → 136; under x_1 = 1: x_2 = 0 → 120, x_2 = 1 → 120; under x_1 = 0: x_6 = 0 → 133.3, x_6 = 1 → 116; under x_6 = 0: x_3 = 0 → (0, 1, 0, 1, 1, 0, 1), value 133; x_3 = 1 → 118.]
SLIDE 27

B&B

  • 1. Choose leaf of tree
  • 2. Branch on a variable
  • 3. Fathom leaf if:
    i. LP relaxation solution is integral
    ii. LP relaxation is infeasible
    iii. LP relaxation solution isn't better than the best-known integral solution

max (40, 60, 10, 10, 3, 20, 60) · x
s.t. (40, 50, 30, 10, 10, 40, 30) · x ≤ 100, x ∈ {0,1}^7

[Tree: root 140; x_1 = 0 → 135, x_1 = 1 → 136; under x_1 = 1: x_2 = 0 → 120, x_2 = 1 → 120; under x_1 = 0: x_6 = 0 → 133.3, x_6 = 1 → 116; under x_6 = 0: x_3 = 0 → 133, x_3 = 1 → 118. The x_3 = 0 leaf's LP solution (0, 1, 0, 1, 1, 0, 1) is integral, so it is fathomed.]
SLIDE 28

B&B

  • 1. Choose leaf of tree
  • 2. Branch on a variable
  • 3. Fathom leaf if:
    i. LP relaxation solution is integral
    ii. LP relaxation is infeasible
    iii. LP relaxation solution isn't better than the best-known integral solution

max (40, 60, 10, 10, 3, 20, 60) · x
s.t. (40, 50, 30, 10, 10, 40, 30) · x ≤ 100, x ∈ {0,1}^7

[Tree: as above, with the integral leaf of value 133 as incumbent. The remaining leaves (values 120, 120, 116, and 118) are fathomed: their LP bounds don't beat the best-known integral value, 133.]
SLIDE 29

B&B

  • 1. Choose leaf of tree
  • 2. Branch on a variable
  • 3. Fathom leaf if:
    i. LP relaxation solution is integral
    ii. LP relaxation is infeasible
    iii. LP relaxation solution isn't better than the best-known integral solution

This talk: How to choose which variable?

(Assume every other aspect of B&B is fixed.)
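For a single-constraint IP like the running example, the loop above can be sketched in a few lines: the LP relaxation of a knapsack is solvable greedily by value-to-weight ratio, and leaves are fathomed by the three rules just listed. This is a minimal hypothetical illustration, not any solver's implementation; it branches on the first fractional variable, since how to choose the branching variable is exactly the question this talk studies.

```python
def lp_relax(values, weights, cap, fixed):
    """Greedy fractional-knapsack solution of the LP relaxation,
    with some variables fixed to 0 or 1."""
    n = len(values)
    x = [0.0] * n
    for i, v in fixed.items():
        x[i] = float(v)
    cap -= sum(weights[i] for i, v in fixed.items() if v == 1)
    if cap < 0:
        return None, None  # LP relaxation infeasible
    free = sorted((i for i in range(n) if i not in fixed),
                  key=lambda i: values[i] / weights[i], reverse=True)
    for i in free:
        take = min(1.0, cap / weights[i])
        x[i] = take
        cap -= take * weights[i]
        if cap <= 0:
            break
    return sum(v * xi for v, xi in zip(values, x)), x

def branch_and_bound(values, weights, cap):
    """B&B for max values·x s.t. weights·x <= cap, x in {0,1}^n."""
    best = 0.0
    stack = [dict()]  # each node = the dict of variables fixed so far
    while stack:
        fixed = stack.pop()                 # 1. choose a leaf of the tree
        obj, x = lp_relax(values, weights, cap, fixed)
        if obj is None or obj <= best:      # 3.ii / 3.iii: fathom
            continue
        frac = [i for i, xi in enumerate(x) if 0 < xi < 1]
        if not frac:                        # 3.i: integral, fathom and record
            best = max(best, obj)
            continue
        i = frac[0]                         # 2. branch on a variable
        stack.append({**fixed, i: 0})
        stack.append({**fixed, i: 1})
    return best
```

On the slides' instance, `branch_and_bound((40, 60, 10, 10, 3, 20, 60), (40, 50, 30, 10, 10, 40, 30), 100)` returns the integral optimum 133, matching the tree built on the previous slides.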

SLIDE 30

Variable selection policies can have a huge effect on tree size

SLIDE 31

Outline

  • 1. Introduction
  • 2. Branch-and-Bound
  • a. Algorithm Overview
  • b. Variable Selection Policies
  • 3. Learning algorithms
  • 4. Experiments
  • 5. Conclusion and Future Directions

SLIDE 32

Variable selection policies (VSPs)

Score-based VSP: at a leaf Q, branch on the variable x_i maximizing score(Q, i).

Many options! Little is known about which rule to use when.

[Tree fragment: node (1, 3/5, 0, 0, 0, 0, 1), value 136, branched on x_2: x_2 = 0 → (1, 0, 0, 1, 0, 1/2, 1), value 120; x_2 = 1 → (1, 1, 0, 0, 0, 0, 1/3), value 120.]
SLIDE 33

Variable selection policies

For an IP instance Q:

  • Let c_Q be the objective value of its LP relaxation
  • Let Q_i^- be Q with x_i set to 0, and let Q_i^+ be Q with x_i set to 1

Example. For the root of the running example (max (40, 60, 10, 10, 3, 20, 60) · x s.t. (40, 50, 30, 10, 10, 40, 30) · x ≤ 100, x ∈ {0,1}^7), the LP solution is (1/2, 1, 0, 0, 0, 0, 1), so c_Q = 140.
SLIDE 34

Variable selection policies

For an IP instance Q:

  • Let c_Q be the objective value of its LP relaxation
  • Let Q_i^- be Q with x_i set to 0, and let Q_i^+ be Q with x_i set to 1

Example. Branching the root Q of the running example on x_1: the children's LP objective values are c_{Q_1^-} = 135 and c_{Q_1^+} = 136.
SLIDE 35

Variable selection policies

For an IP instance Q:

  • Let c_Q be the objective value of its LP relaxation
  • Let Q_i^- be Q with x_i set to 0, and let Q_i^+ be Q with x_i set to 1

Example. Branching the root of the running example on x_1 gives c_Q = 140, c_{Q_1^-} = 135, and c_{Q_1^+} = 136.

The linear rule (parameterized by μ) [Linderoth & Savelsbergh, 1999]
Branch on the variable x_i maximizing:
score(Q, i) = μ · min(c_Q − c_{Q_i^-}, c_Q − c_{Q_i^+}) + (1 − μ) · max(c_Q − c_{Q_i^-}, c_Q − c_{Q_i^+})
SLIDE 36

Variable selection policies

The (simplified) product rule [Achterberg, 2009]
Branch on the variable x_i maximizing:
score(Q, i) = (c_Q − c_{Q_i^-}) · (c_Q − c_{Q_i^+})

The linear rule (parameterized by μ) [Linderoth & Savelsbergh, 1999]
Branch on the variable x_i maximizing:
score(Q, i) = μ · min(c_Q − c_{Q_i^-}, c_Q − c_{Q_i^+}) + (1 − μ) · max(c_Q − c_{Q_i^-}, c_Q − c_{Q_i^+})

And many more…
SLIDE 37

Variable selection policies

Our parameterized rule: given d scoring rules score_1, …, score_d, branch on the variable x_i maximizing score(Q, i) = μ_1 · score_1(Q, i) + … + μ_d · score_d(Q, i). Goal: learn the best convex combination μ_1, …, μ_d.
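A small sketch of this parameterized rule in code. Everything concrete here is hypothetical: the two rules are min- and max-change rules (as in the linear rule), and the per-variable bound-change pairs are made-up numbers, with Q reduced to just that data for illustration.

```python
def make_score(rules, mus):
    """Convex combination of scoring rules:
    score(Q, i) = mu_1 * score_1(Q, i) + ... + mu_d * score_d(Q, i)."""
    def score(Q, i):
        return sum(mu * rule(Q, i) for mu, rule in zip(mus, rules))
    return score

# Hypothetical data: Q maps each variable i to the pair of LP-bound decreases
# (c_Q - c_{Q_i^-}, c_Q - c_{Q_i^+}).
Q = {1: (5.0, 4.0), 2: (20.0, 20.0), 3: (0.0, 30.0)}

score_min = lambda Q, i: min(Q[i])  # min-change rule
score_max = lambda Q, i: max(Q[i])  # max-change rule

score = make_score([score_min, score_max], [0.7, 0.3])
branch_var = max(Q, key=lambda i: score(Q, i))
print(branch_var)  # the variable with the highest combined score
```

With weights (0.7, 0.3) the combined scores are 4.3, 20.0, and 9.0, so the rule branches on variable 2.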

SLIDE 38

Model

[Diagram: sample IPs (A^(1), b^(1), c^(1)), …, (A^(m), b^(m), c^(m)), drawn from an application-specific distribution, go to the algorithm designer, who outputs B&B parameters.]

How can we use samples to find the best B&B parameters for my domain?
SLIDE 39

Model

[Diagram: sample IPs (A^(1), b^(1), c^(1)), …, (A^(m), b^(m), c^(m)) go to the algorithm designer, who outputs the B&B parameters μ_1, …, μ_d.]
SLIDE 40

Outline

  • 1. Introduction
  • 2. Branch-and-Bound
  • 3. Learning algorithms
  • a. First-try: Discretization
  • b. Our Approach
  • 4. Experiments
  • 5. Conclusion and Future Directions

SLIDE 41

First try: Discretization

  • 1. Discretize parameter space
  • 2. Receive sample problems from unknown distribution
  • 3. Find params in discretization with best average performance

[Plot: average tree size as a function of μ, evaluated at the discretized parameter values.]
SLIDE 42

First try: Discretization

This has been prior work's approach [e.g., Achterberg (2009)].

[Plot: average tree size as a function of μ.]
SLIDE 43

Discretization gone wrong

[Plot: average tree size as a function of μ; the minimizer falls strictly between the discretized parameter values.]
SLIDE 44

Discretization gone wrong

[Plot: average tree size as a function of μ.]

This can actually happen!
SLIDE 45

Discretization gone wrong

Theorem [informal]. For any discretization, there exists a distribution 𝒟 over problem instances inducing this behavior.

Proof ideas:
  • 𝒟's support consists of infeasible IPs with "easy out" variables
  • B&B takes exponential time unless it branches on the "easy out" variables
  • B&B only finds the "easy outs" if it uses parameters from a specific range

[Plot: expected tree size as a function of μ.]
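A toy numeric illustration of the theorem, with an entirely synthetic tree-size function (hand-picked, not derived from real IPs): when the parameters yielding small trees form a narrow band, every point of a coarse grid can land in the expensive region.

```python
def expected_tree_size(mu):
    # Synthetic stand-in: trees are tiny only on a narrow band of parameters.
    return 10 if 0.314 <= mu <= 0.318 else 10**6

grid = [k / 10 for k in range(11)]  # discretization 0.0, 0.1, ..., 1.0
best_on_grid = min(expected_tree_size(mu) for mu in grid)

print(best_on_grid)               # the grid only ever sees huge trees
print(expected_tree_size(0.316))  # searching the full interval finds the good band
```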

SLIDE 46

Outline

  • 1. Introduction
  • 2. Branch-and-Bound
  • 3. Learning algorithms
  • a. First-try: Discretization
  • b. Our Approach

    i. Single-parameter settings
    ii. Multi-parameter settings

  • 4. Experiments
  • 5. Conclusion and Future Directions

SLIDE 47

Simple assumption

Exists πœ† upper bounding the size of largest tree willing to build Common assumption, e.g.:

  • Hutter, Hoos, Leyton-Brown, StΓΌtzle, JAIR’09
  • Kleinberg, Leyton-Brown, Lucier, IJCAI’17

SLIDE 48

Useful lemma

μ ∈ [0,1]

Lemma: For any two scoring rules and any IP Q, O((# variables)^(κ+2)) intervals partition [0,1] such that for any interval [a, b], B&B builds the same tree across all μ ∈ [a, b].

(The number of intervals is much smaller in our experiments!)
SLIDE 49

Useful lemma

Lemma: For any two scoring rules and any IP Q, O((# variables)^(κ+2)) intervals partition [0,1] such that for any interval [a, b], B&B builds the same tree across all μ ∈ [a, b].

[Figure: each variable's root score μ · score_1(Q, i) + (1 − μ) · score_2(Q, i) is linear in μ, so which of "branch on x_1", "branch on x_2", or "branch on x_3" wins partitions [0,1] into intervals.]
SLIDE 50

Useful lemma

Lemma: For any two scoring rules and any IP Q, O((# variables)^(κ+2)) intervals partition [0,1] such that for any interval [a, b], B&B builds the same tree across all μ ∈ [a, b].

[Figure: for any μ in the highlighted interval, the root Q branches on x_2, producing children Q_2^- (x_2 = 0) and Q_2^+ (x_2 = 1).]
SLIDE 51

Useful lemma

Lemma: For any two scoring rules and any IP Q, O((# variables)^(κ+2)) intervals partition [0,1] such that for any interval [a, b], B&B builds the same tree across all μ ∈ [a, b].

[Figure: within that interval, the child Q_2^-'s scores μ · score_1(Q_2^-, i) + (1 − μ) · score_2(Q_2^-, i) are again linear in μ, subdividing the interval: on one piece B&B branches on x_2 then x_3, on the other on x_2 then x_1.]
SLIDE 52

Useful lemma

Lemma: For any two scoring rules and any IP Q, O((# variables)^(κ+2)) intervals partition [0,1] such that for any interval [a, b], B&B builds the same tree across all μ ∈ [a, b].

[Figure: for any μ in the refined subinterval, B&B branches on x_2 (children x_2 = 0, x_2 = 1) and then on x_3 (children x_3 = 0, x_3 = 1).]
SLIDE 53

Useful lemma

Proof idea.

  • Continue dividing [0,1] into intervals such that within each interval, the variable selection order is fixed
  • We can subdivide only a finite number of times
  • The proof follows by induction on tree depth

Lemma: For any two scoring rules and any IP Q, O((# variables)^(κ+2)) intervals partition [0,1] such that for any interval [a, b], B&B builds the same tree across all μ ∈ [a, b].
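The base case of this induction can be made concrete: at a single node, each variable's score μ · score_1 + (1 − μ) · score_2 is a line in μ, so the branching choice can change only at pairwise crossing points. A sketch with made-up score values (the function names and numbers are illustrative, not from the talk):

```python
def argmax_breakpoints(s1, s2, eps=1e-9):
    """For lines f_i(mu) = mu * s1[i] + (1 - mu) * s2[i] on [0, 1], return the
    points where two lines cross, i.e., where the maximizing index can change."""
    cuts = {0.0, 1.0}
    n = len(s1)
    for i in range(n):
        for j in range(i + 1, n):
            denom = (s1[i] - s2[i]) - (s1[j] - s2[j])
            if abs(denom) > eps:  # non-parallel lines cross exactly once
                mu = (s2[j] - s2[i]) / denom
                if 0 < mu < 1:
                    cuts.add(mu)
    return sorted(cuts)

def winner(mu, s1, s2):
    """Index of the variable branched on at parameter mu."""
    return max(range(len(s1)), key=lambda i: mu * s1[i] + (1 - mu) * s2[i])

# Made-up per-variable scores under the two rules.
s1 = [4.0, 20.0, 0.0]
s2 = [5.0, 20.0, 30.0]
bps = argmax_breakpoints(s1, s2)
# Between consecutive breakpoints, the branching variable is constant.
```

Applying the same splitting to each child node within each interval, and recursing, yields the finitely many intervals the lemma promises.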

SLIDE 54

Learning algorithm

Input: a set of IPs sampled from a distribution 𝒟. For each IP, set μ = 0 and, while μ < 1:

  • 1. Run B&B using μ · score_1 + (1 − μ) · score_2, resulting in tree T
  • 2. Find the interval [μ, μ′) such that running B&B with the scoring rule μ″ · score_1 + (1 − μ″) · score_2 for any μ″ ∈ [μ, μ′) builds the same tree T (takes a little bookkeeping)
  • 3. Set μ = μ′

Return: any μ̂ from the interval minimizing average tree size.

SLIDE 55

Learning algorithm guarantees

Let μ̂ be the algorithm's output given Õ((κ³/ε²) · ln(# variables)) samples. With high probability,

𝔼_{Q∼𝒟}[tree-size(Q, μ̂)] − min_{μ ∈ [0,1]} 𝔼_{Q∼𝒟}[tree-size(Q, μ)] < ε

Proof intuition: bound the algorithm class's intrinsic complexity (IC)

  • The lemma bounds the number of "truly different" parameters
  • Parameters that are "the same" come from a simple set

Learning theory lets us translate IC into sample complexity.
SLIDE 56

Outline

  • 1. Introduction
  • 2. Branch-and-Bound
  • 3. Learning algorithms
  • a. First-try: Discretization
  • b. Our Approach

    i. Single-parameter settings
    ii. Multi-parameter settings

  • 4. Experiments
  • 5. Conclusion and Future Directions

SLIDE 57

Useful lemma: higher dimensions

Lemma: For any d scoring rules and any IP, a set ℋ of O((# variables)^(κ+2)) hyperplanes partitions [0,1]^d such that for any connected component R of [0,1]^d ∖ ℋ, B&B builds the same tree across all μ ∈ R.
SLIDE 58

Learning-theoretic guarantees

Fix d scoring rules and draw samples Q_1, …, Q_N ∼ 𝒟. If N = Õ((κ³/ε²) · ln(d · # variables)), then w.h.p., for all μ ∈ [0,1]^d,

| (1/N) Σ_{j=1}^{N} tree-size(Q_j, μ) − 𝔼_{Q∼𝒟}[tree-size(Q, μ)] | < ε

Average tree size generalizes to expected tree size.
SLIDE 59

Outline

  • 1. Introduction
  • 2. Branch-and-Bound
  • 3. Learning algorithms
  • 4. Experiments
  • 5. Conclusion and Future Directions

SLIDE 60

Experiments: Tuning the linear rule

Let:
score_1(Q, i) = min(c_Q − c_{Q_i^-}, c_Q − c_{Q_i^+})
score_2(Q, i) = max(c_Q − c_{Q_i^-}, c_Q − c_{Q_i^+})

Our parameterized rule: branch on the variable x_i maximizing score(Q, i) = μ · score_1(Q, i) + (1 − μ) · score_2(Q, i).

This is the linear rule [Linderoth & Savelsbergh, 1999].
SLIDE 61

Experiments: Combinatorial auctions

Leyton-Brown, Pearson, and Shoham. Towards a universal test suite for combinatorial auction algorithms. In Proceedings of the Conference on Electronic Commerce (EC), 2000.

"Regions" generator: 400 bids, 200 goods, 100 instances
"Arbitrary" generator: 200 bids, 100 goods, 100 instances
SLIDE 62

Additional experiments

Facility location: 70 facilities, 70 customers, 500 instances
Clustering: 5 clusters, 35 nodes, 500 instances
Agnostically learning linear separators: 50 points in ℝ², 500 instances
SLIDE 63

Outline

  • 1. Introduction
  • 2. Branch-and-Bound
  • 3. Learning algorithms
  • 4. Experiments
  • 5. Conclusion and Future Directions

SLIDE 64

Conclusion

  • Study B&B, a widely-used algorithm for combinatorial problems
  • Show how to use ML to weight variable selection rules
  • First sample complexity bounds for tree search algorithm configuration
  • Unlike prior work [Khalil et al. β€˜16; Alvarez et al. β€˜17], which is purely empirical
  • Empirically show our approach can dramatically shrink tree size
  • We prove this improvement can even be exponential
  • Theory applies to other tree search algorithms, e.g., for solving CSPs

SLIDE 65

Future directions

How can we train faster?

  • Don’t want to build every tree B&B will make for every training instance
  • Train on small IPs and then apply the learned policies on large IPs?

What other tree-building applications can we apply our techniques to?

  • E.g., building decision trees and taxonomies

How can we attack other learning problems in B&B?

  • E.g., node-selection policies

Thank you! Questions?