Learning to Branch
Ellen Vitercik Joint work with Nina Balcan, Travis Dick, and Tuomas Sandholm Published in ICML 2018
Integer Programs (IPs): maximize c · z subject to Az ≤ b, z ∈ {0,1}^n
Facility location problems can be formulated as IPs.
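For concreteness, one standard way to write facility location as an IP (a sketch; the symbols f_j, c_ij, x_ij, y_j are illustrative, not from the talk):

```latex
% Uncapacitated facility location as an IP: y_j = 1 iff facility j is opened
% (cost f_j); x_{ij} = 1 iff client i is served by facility j (cost c_{ij}).
% All symbols here are illustrative, not taken from the talk.
\begin{align*}
\min \;& \sum_j f_j y_j + \sum_{i,j} c_{ij} x_{ij} \\
\text{s.t.}\;& \sum_j x_{ij} = 1 && \forall i \\
& x_{ij} \le y_j && \forall i, j \\
& x_{ij}, y_j \in \{0,1\} && \forall i, j
\end{align*}
```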
Clustering problems can be formulated as IPs.
Binary classification problems can be formulated as IPs.
maximize c · z subject to Az ≤ b, z ∈ {0,1}^n (NP-hard)
"You may need to experiment."
Delivery company routes trucks daily
Use integer programming to select routes
Demand changes every day
Solve hundreds of similar optimizations
Using this set of typical problems… can we learn the best parameters?
Application-specific distribution → Algorithm designer → B&B parameters
Samples: (A^(1), b^(1), c^(1)), …, (A^(m), b^(m), c^(m))
How can we use samples to find the best B&B parameters for my domain?
Model has been studied in applied communities [Hutter et al. '09].
Model has been studied from a theoretical perspective [Gupta and Roughgarden '16; Balcan et al. '17].
"Best" could mean smallest search tree, for example.
Samples: (A^(1), b^(1), c^(1)), (A^(2), b^(2), c^(2)), …
How do we find parameters that are best on average over the samples (A^(1), b^(1), c^(1)), (A^(2), b^(2), c^(2)), …? Will those parameters have high performance in expectation on a new instance (A, b, c)?
max (40, 60, 10, 10, 3, 20, 60) · z
s.t. (40, 50, 30, 10, 10, 40, 30) · z ≤ 100, z ∈ {0,1}^7

Branch-and-bound (B&B) builds a search tree over this IP:

The root's LP relaxation has solution (1/2, 1, 0, 0, 0, 0, 1) with objective value 140.

Branch on y1:
y1 = 0: LP solution (0, 1, 0, 1, 0, 1/4, 1), value 135
y1 = 1: LP solution (1, 3/5, 0, 0, 0, 0, 1), value 136

Under y1 = 1, branch on y2:
y2 = 0: LP solution (1, 0, 0, 1, 0, 1/2, 1), value 120
y2 = 1: LP solution (1, 1, 0, 0, 0, 0, 1/3), value 120

Under y1 = 0, branch on y6:
y6 = 0: LP solution (0, 1, 1/3, 1, 0, 0, 1), value ≈ 133.3
y6 = 1: LP solution (0, 3/5, 0, 0, 0, 1, 1), value 116

Under y1 = 0, y6 = 0, branch on y3:
y3 = 0: LP solution (0, 1, 0, 1, 1, 0, 1), value 133 (integral)
y3 = 1: LP solution (0, 4/5, 1, 0, 0, 0, 1), value 118

A leaf is fathomed (pruned) when:
i. its LP relaxation solution is integral,
ii. its LP relaxation is infeasible, or
iii. its LP relaxation's value isn't better than the best-known integral solution.

Here (0, 1, 0, 1, 1, 0, 1), with value 133, is integral and becomes the best-known integral solution. Every other leaf's LP bound (118, 116, 120, 120) is no better than 133, so all remaining leaves are fathomed and B&B returns the optimal solution, of value 133.
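The walkthrough above can be reproduced with a minimal branch-and-bound sketch. This is a simplification, not the talk's method: the knapsack LP relaxation is solved greedily, and the code branches on the LP solution's fractional variable rather than using a learned scoring rule.

```python
# Minimal B&B sketch for the knapsack example above (a simplification:
# the knapsack LP relaxation has a greedy solution, and we branch on the
# fractional variable instead of using a scoring rule).
values = [40, 60, 10, 10, 3, 20, 60]
weights = [40, 50, 30, 10, 10, 40, 30]
CAP = 100

def lp_relaxation(fixed):
    """Greedy fractional-knapsack solution respecting fixed variables.
    Returns (objective, index of the fractional variable or None),
    or None if the fixed variables already exceed the capacity."""
    cap = CAP - sum(weights[i] for i, v in fixed.items() if v == 1)
    if cap < 0:
        return None
    obj = sum(values[i] for i, v in fixed.items() if v == 1)
    frac_var = None
    free = [i for i in range(len(values)) if i not in fixed]
    for i in sorted(free, key=lambda i: values[i] / weights[i], reverse=True):
        if weights[i] <= cap:
            cap -= weights[i]
            obj += values[i]
        elif cap > 0:
            obj += values[i] * cap / weights[i]  # take a fraction of item i
            frac_var, cap = i, 0
    return obj, frac_var

def branch_and_bound():
    best, nodes = 0, [dict()]  # depth-first search over partial assignments
    while nodes:
        fixed = nodes.pop()
        lp = lp_relaxation(fixed)
        if lp is None:          # fathom: LP relaxation infeasible
            continue
        obj, frac_var = lp
        if obj <= best:         # fathom: bound no better than incumbent
            continue
        if frac_var is None:    # fathom: LP solution is integral
            best = obj
            continue
        for v in (0, 1):        # branch on the fractional variable
            nodes.append({**fixed, frac_var: v})
    return best

print(branch_and_bound())  # 133
```

Because the branching order differs from the slides, this sketch explores a different tree, but it fathoms nodes with the same three rules and recovers the optimal value 133.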
This talk: How to choose which variable?
(Assume every other aspect of B&B is fixed.)
Variable selection policies can have a huge effect on tree size
Score-based variable selection policy (VSP): at leaf Q, branch on the variable y_i maximizing score(Q, i). Many options! Little is known about which rule to use when.

For example, at the node with LP solution (1, 3/5, 0, 0, 0, 0, 1) and value 136, branching on y2 gives children with LP solutions (1, 0, 0, 1, 0, 1/2, 1) (y2 = 0, value 120) and (1, 1, 0, 0, 0, 0, 1/3) (y2 = 1, value 120).
For an IP instance Q:
Let Q_i^- be Q with y_i set to 0, and let Q_i^+ be Q with y_i set to 1.
Let c̃_Q denote the objective value of the LP relaxation of Q.

Example. For max (40, 60, 10, 10, 3, 20, 60) · z s.t. (40, 50, 30, 10, 10, 40, 30) · z ≤ 100, z ∈ {0,1}^7, the root's LP relaxation solution is (1/2, 1, 0, 0, 0, 0, 1), so c̃_Q = 140. Branching on y1 gives children with LP solutions (0, 1, 0, 1, 0, 1/4, 1) and (1, 3/5, 0, 0, 0, 0, 1), so c̃_{Q_1^-} = 135 and c̃_{Q_1^+} = 136.

The linear rule (parameterized by μ) [Linderoth & Savelsbergh, 1999]: branch on the variable y_i maximizing
score(Q, i) = μ · min(c̃_Q − c̃_{Q_i^-}, c̃_Q − c̃_{Q_i^+}) + (1 − μ) · max(c̃_Q − c̃_{Q_i^-}, c̃_Q − c̃_{Q_i^+})
The (simplified) product rule [Achterberg, 2009]: branch on the variable y_i maximizing
score(Q, i) = (c̃_Q − c̃_{Q_i^-}) · (c̃_Q − c̃_{Q_i^+})

The linear rule (parameterized by μ) [Linderoth & Savelsbergh, 1999]: branch on the variable y_i maximizing
score(Q, i) = μ · min(c̃_Q − c̃_{Q_i^-}, c̃_Q − c̃_{Q_i^+}) + (1 − μ) · max(c̃_Q − c̃_{Q_i^-}, c̃_Q − c̃_{Q_i^+})

And many more…
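Both rules are simple functions of the three LP bounds. A small sketch (the function names are mine; the example numbers are the root of the knapsack example, where c̃_Q = 140, c̃_{Q_1^-} = 135, c̃_{Q_1^+} = 136):

```python
def linear_score(mu, c_q, c_minus, c_plus):
    """Linear rule: convex combination of the smaller and larger
    LP-bound decreases caused by the two branches."""
    lo = min(c_q - c_minus, c_q - c_plus)
    hi = max(c_q - c_minus, c_q - c_plus)
    return mu * lo + (1 - mu) * hi

def product_score(c_q, c_minus, c_plus):
    """(Simplified) product rule: product of the two bound decreases."""
    return (c_q - c_minus) * (c_q - c_plus)

print(linear_score(0.5, 140, 135, 136))  # 4.5
print(product_score(140, 135, 136))      # 20
```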
Our parameterized rule. Given d scoring rules score1, …, score_d, the goal is to learn the best convex combination μ1 · score1 + ⋯ + μd · score_d. Branch on the variable y_i maximizing score(Q, i) = μ1 · score1(Q, i) + ⋯ + μd · score_d(Q, i).
Application-specific distribution → Algorithm designer → B&B parameters
Samples: (A^(1), b^(1), c^(1)), …, (A^(m), b^(m), c^(m))
How can we use samples to find the best B&B parameters for my domain?
Application-specific distribution → Algorithm designer → B&B parameters μ1, …, μd
Samples: (A^(1), b^(1), c^(1)), …, (A^(m), b^(m), c^(m))
How can we use samples to find the best B&B parameters μ1, …, μd for my domain?
[Plot: average tree size as a function of the parameter μ.]
This has been prior work's approach [e.g., Achterberg (2009)]: discretize the parameter μ and pick the value with the smallest average tree size. [Plot: average tree size at discretized values of μ.]
[Plot: average tree size at the discretized parameter values.]
[Plot: average tree size as a function of μ, where the best parameters fall between the discretized values.] This can actually happen!
Theorem [informal]. For any discretization, there exists a problem instance distribution D inducing this behavior.

Proof ideas:
D's support consists of infeasible IPs with "easy out" variables.
B&B takes exponential time unless it branches on the "easy out" variables.
B&B only finds the "easy outs" if it uses parameters from a specific range.

[Plot: expected tree size as a function of μ.]
i. Single-parameter settings ii. Multi-parameter settings
There exists a κ upper bounding the size of the largest tree we are willing to build (a common assumption).
μ ∈ [0,1]

Lemma: For any two scoring rules and any IP Q, O((# variables)^(κ+2)) intervals partition [0,1] such that for any interval [a, b], B&B builds the same tree across all μ ∈ [a, b]. (Much smaller in our experiments!)
Proof idea. At the root, each variable y_i's score μ · score1(Q, i) + (1 − μ) · score2(Q, i) is a linear function of μ, and B&B branches on the variable whose line is highest. The crossing points of these lines partition [0,1] into intervals on which the root's branching variable is fixed (e.g., any μ in one interval branches on y2). Recursing on the children Q_2^- and Q_2^+ subdivides each interval further (branch on y2 then y3 in one subinterval, on y2 then y1 in another). Within each resulting interval, the variable selection order is fixed, so B&B builds the same tree.
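The root-level step of this argument can be sketched directly: each variable's score line is linear in μ, so the candidate interval endpoints are the pairwise crossings of those lines (a sketch; the (score1, score2) pairs below are illustrative, not from the talk):

```python
def crossings(lines):
    """Given per-variable pairs (a_i, b_i) so that variable i's root score
    is f_i(mu) = mu*a_i + (1-mu)*b_i, return the crossing points in (0, 1):
    the branching choice at the root is constant between consecutive ones."""
    pts = set()
    for j, (a1, b1) in enumerate(lines):
        for (a2, b2) in lines[j + 1:]:
            denom = (a1 - b1) - (a2 - b2)
            if denom != 0:  # parallel score lines never cross
                mu = (b2 - b1) / denom
                if 0 < mu < 1:
                    pts.add(mu)
    return sorted(pts)

# Two variables whose score lines f1 = 5 - mu and f2 = 2 + 4*mu cross once:
print(crossings([(4, 5), (6, 2)]))  # [0.6]
```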
Input: a set of IPs sampled from a distribution D.
For each IP, set μ = 0. While μ < 1: find the largest μ' such that B&B with the rule μ'' · score1 + (1 − μ'') · score2 builds the same tree T for every μ'' ∈ [μ, μ'] (takes a little bookkeeping); record T and set μ = μ'.
Return: any μ̂ from the interval minimizing average tree size.
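Assuming each sample IP has already been reduced to a piecewise-constant map from μ to B&B tree size (the interval data below is made up for illustration), the final selection step looks like this sketch:

```python
def best_mu(piecewise_sizes):
    """Pick a mu minimizing average tree size over the samples, where each
    sample is a list of (left, right, size) intervals covering [0, 1]."""
    # Collect every breakpoint from every sample.
    points = sorted({p for pw in piecewise_sizes
                     for (l, r, _) in pw for p in (l, r)})
    def size_at(pw, mu):
        return next(s for (l, r, s) in pw if l <= mu <= r)
    best = None
    for l, r in zip(points, points[1:]):
        mid = (l + r) / 2  # any mu in an elementary interval behaves the same
        avg = sum(size_at(pw, mid) for pw in piecewise_sizes) / len(piecewise_sizes)
        if best is None or avg < best[0]:
            best = (avg, mid)
    return best[1]

samples = [
    [(0.0, 0.3, 50), (0.3, 1.0, 10)],  # IP 1: small trees for mu > 0.3
    [(0.0, 0.6, 15), (0.6, 1.0, 40)],  # IP 2: small trees for mu < 0.6
]
print(best_mu(samples))  # a mu between 0.3 and 0.6, where both trees are small
```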
Theorem. Let μ̂ be the algorithm's output given Õ(κ³/ε² · ln(# variables)) samples. W.h.p.,
E_{Q~D}[tree-size(Q, μ̂)] − min_{μ ∈ [0,1]} E_{Q~D}[tree-size(Q, μ)] < ε.

Proof intuition: bound the algorithm class's intrinsic complexity (IC); learning theory allows us to translate IC to sample complexity.
i. Single-parameter settings ii. Multi-parameter settings
Lemma: For any d scoring rules and any IP, a set H of O((# variables)^(κ+2)) hyperplanes partitions [0,1]^d such that for any connected component C of [0,1]^d \ H, B&B builds the same tree across all μ ∈ C.
Fix d scoring rules and draw samples Q_1, …, Q_m ~ D. If m = Õ(κ³/ε² · ln(d · # variables)), then w.h.p., for all μ ∈ [0,1]^d,
|1/m · Σ_{j=1}^m tree-size(Q_j, μ) − E_{Q~D}[tree-size(Q, μ)]| < ε.
Average tree size generalizes to expected tree size.
Let:
score1(Q, i) = min(c̃_Q − c̃_{Q_i^-}, c̃_Q − c̃_{Q_i^+})
score2(Q, i) = max(c̃_Q − c̃_{Q_i^-}, c̃_Q − c̃_{Q_i^+})

Our parameterized rule: branch on the variable y_i maximizing score(Q, i) = μ · score1(Q, i) + (1 − μ) · score2(Q, i). This is the linear rule [Linderoth & Savelsbergh, 1999].
Leyton-Brown, Pearson, and Shoham. Towards a universal test suite for combinatorial auction algorithms.
"Regions" generator: 400 bids, 200 goods, 100 instances. "Arbitrary" generator: 200 bids, 100 goods, 100 instances.
Facility location: 70 facilities, 70 customers, 500 instances. Clustering: 5 clusters, 35 nodes, 500 instances. Agnostically learning linear separators: 50 points in ℝ², 500 instances.
How can we train faster?
What other tree-building applications can we apply our techniques to?
How can we attack other learning problems in B&B?
Thank you! Questions?