SLIDE 1

Learning when to use a Decomposition

Markus Kruber · Marco Lübbecke · Axel Parmentier Chair of Operations Research RWTH Aachen University, Germany @mluebbecke #aussois2018 · C.O.W. Aussois · January 9, 2018

SLIDE 2

Machine Learning is Everywhere

@mluebbecke · #aussois2018 · Learning when to use a Decomposition · 2/17

SLIDE 3

Supervised Learning: Classification

◮ data X, d features, labels Y
◮ a feature map φ : X → R^d turns each labeled sample (xi, yi) into (φ(xi), yi)
◮ an algorithm “learns” a classifier f : R^d → Y s.t. the error Σ_{xi ∈ X} ℓ(f(φ(xi)), yi) is “small”
◮ learning is an optimization problem
◮ validate
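The learn-then-validate loop on this slide can be sketched with scikit-learn, which the deck later says the authors use. This is a minimal illustration on a synthetic dataset, not the authors' pipeline; the model choice and all parameters here are assumptions.

```python
# Sketch of the slide's setup: featurized data (x_i, y_i), learn a
# classifier f so the training error is "small", then validate.
# Synthetic data stands in for the real feature vectors phi(x_i).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

f = RandomForestClassifier(random_state=0).fit(X_train, y_train)  # "learn"
print(f.score(X_val, y_val))  # "validate": accuracy on held-out data
```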

SLIDE 10

Binary Classification: Dog or Muffin? Owl or Apple?

SLIDE 11

SCIP

◮ open-source MIP/MINLP (and much more) solver
◮ also: a branch-price-and-cut framework
◮ scip.zib.de

SLIDE 12

GCG

◮ extension to SCIP
◮ fully generic branch-price-and-cut solver
◮ automatically applies Dantzig-Wolfe reformulation to a MIP
◮ www.or.rwth-aachen.de/gcg

SLIDE 13

GCG automatically detects Structure. A lot!

◮ up to a few hundred or thousand decompositions per MIP
◮ GCG performance highly depends on whether the MIP structure is reflected by some decomposition (and on whether we find/select it)


SLIDE 14

Automatic Reformulation in GCG

MIP P → Detection → DEC D1, DEC D2, . . . , DEC Dk
(decomposition types: border, staircase, . . . )
→ Select (GCG internal score) → DWR? yes: GCG, no: SCIP
(the Select step and the DWR? decision are the ones we learn)

this work: a supervised learning approach to select a best decomposition (or decide not to use a decomposition at all)

SLIDE 16

Supervised Learning Approach

remember: given data X, a classifier f predicts a label Y = f(X)

◮ f is a binary classifier if Y ∈ {0, 1}
◮ learn a classifier: find an fθ that best fits a training set ((xi, yi), i = 1, . . . , n) among a family (fθ, θ ∈ Θ)
◮ we use standard algorithms implemented in scikit-learn

data X:
◮ MIP P
◮ decomposition(s) D

labels Y:
◮ use SCIP or GCG?
◮ which decomposition?

SLIDE 17

Feature Map φ

input (P, D) → feature map φ (define φ: 80+ features) → feature vector φ(P, D) ∈ R^d → classifier fθ (choose family Θ, learn fθ) → output fθ(φ(P, D))

examples of features we used (“classics”):

◮ # variables/constraints
◮ variable types
◮ constraint types
◮ products of features
◮ # linking vars/conss
◮ # blocks
◮ min, max, ø block size
◮ detector used (indicator)
◮ detection quality metrics
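A feature map of this kind can be illustrated as follows. This is not GCG's actual feature code; the function, its arguments, and the summary representation of a (MIP, decomposition) pair are all hypothetical.

```python
# Illustrative sketch of a feature map phi(P, D): a few of the listed
# features, computed from hypothetical summary data of the pair.
def phi(n_vars, n_conss, block_sizes, n_linking_vars, n_linking_conss):
    """Map a (MIP, decomposition) summary to a feature vector in R^d."""
    return [
        n_vars,                               # number of variables
        n_conss,                              # number of constraints
        n_linking_vars,                       # linking variables
        n_linking_conss,                      # linking constraints
        len(block_sizes),                     # number of blocks
        min(block_sizes),                     # smallest block
        max(block_sizes),                     # largest block
        sum(block_sizes) / len(block_sizes),  # average block size
        n_linking_conss * n_conss,            # an example product feature
    ]

print(phi(1000, 600, [120, 80, 100], 5, 12))
```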

SLIDE 18

Labeling: Definition of Y

question: should we use SCIP or GCG?

training set:

◮ MIP P
◮ decompositions D
◮ SCIP run on each P
◮ GCG run on each (P, D)

given an input (P, D), we learn a binary classifier Y = f(φ(P, D)) where Y = 1 iff GCG on (P, D) is better than SCIP on P after a given time limit:

◮ GCG solves P and SCIP doesn’t
◮ both solve P and GCG is faster
◮ neither solves P but GCG’s gap is smaller
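The three labeling criteria above can be written as one small function. A minimal sketch, assuming each solver run is summarized by a dict with solved status, runtime, and final gap (that representation is an assumption, not the authors' code):

```python
# Sketch of the labeling rule: Y = 1 iff GCG on (P, D) beats SCIP on P
# within the time limit. Each argument is a dict with keys
# 'solved' (bool), 'time' (seconds), 'gap' (relative gap).
def label(scip, gcg):
    if gcg["solved"] and not scip["solved"]:
        return 1                                       # GCG solves P, SCIP doesn't
    if gcg["solved"] and scip["solved"]:
        return 1 if gcg["time"] < scip["time"] else 0  # both solve: faster wins
    if not gcg["solved"] and not scip["solved"]:
        return 1 if gcg["gap"] < scip["gap"] else 0    # neither solves: smaller gap wins
    return 0                                           # SCIP solves, GCG doesn't
```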

SLIDE 19

Regression: Predict Decomposition Quality

question: if any, which decomposition should we use?

f(φ(P, D)) ∈ [0, 1]: probability that GCG with D beats SCIP.

given decompositions D1, . . . , Dk and remaining time t, use GCG if max_i f(φ(P, Di)) ≥ α

we use 0.5 < α ≤ 1: a decomposition is not a default choice.

if we use GCG, select decomposition arg max_i f(φ(P, Di))
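The threshold-then-argmax rule is a few lines of code. A sketch: the slide only fixes 0.5 < α ≤ 1, so the default α = 0.7 below is an arbitrary example value, and the function name is hypothetical.

```python
# Selection rule sketch: given scores f(phi(P, D_i)) for the detected
# decompositions, run GCG only if the best score reaches alpha;
# otherwise fall back to plain SCIP.
def select(scores, alpha=0.7):
    """Return the index of the chosen decomposition, or None for SCIP."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return best if scores[best] >= alpha else None

print(select([0.3, 0.85, 0.6]))  # 1: decomposition D2 clears the threshold
print(select([0.3, 0.45, 0.6]))  # None: no score reaches alpha, use SCIP
```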

SLIDE 20

400 MIP Instances

results of SCIP on all 400 instances:

             all     clr  stcv  cpmp  sdlb  ctst  gap  ntlb  ltsz  bp   rap  stbl  cvrp  miplib
instances    400     25   25    25    25    25    25   25    25    25   25   25    25    100
opt. sol.    65.5%   19   3     18    10    25    23   25    25    6    12   22    6     68
feas. sol.   31.5%   6    21    7     11    0     2    0     0     19   12   3     19    26
no sol.      3.0%    0    1     0     4     0     0    0     0     0    1    0     0     6

structured instances:
◮ coloring (clr)
◮ set covering (stcv)
◮ capacitated p-median (cpmp)
◮ survivable network design (sdlb)
◮ cutting stock (ctst)
◮ generalized assignment (gap)
◮ network design (ntlb)
◮ resource allocation (rap)
◮ capacitated vehicle routing (cvrp)
◮ lot sizing (ltsz)
◮ bin packing (bp)
◮ stable set (stbl)

non-structured: miplib (100 instances)

SLIDE 21

Overall Performance on Training Data

◮ testset of 131 MIP instances, 99 structured, 32 unstructured
◮ GCG better than SCIP on 34 instances

                   All                          Structured                   Non-structured
Solver             SCIP   GCG    us    opt      SCIP  GCG    us    opt      SCIP   GCG     us     opt
No opt. sol.       52     66     44    39       39    37     31    26       13     29      14     13
CPU time (h)       111.3  142.6  93.1  85.7     83.5  82.2   65.9  58.5     27.8   56.8    29.2   27.2
Geo. mean (s)      127.1  370.4  78.6  67.8     73.4  146.9  39.2  32.2     672.9  5145.0  766.0  646.5

◮ SCIP: apply default SCIP to all instances
◮ GCG: apply default GCG to all instances
◮ us: our supervised learning scheme
◮ opt: best decomposition selected each time


SLIDE 22

Accuracy: How often do we predict the right Solver?

avoid using GCG when we do not find an appropriate structure
is GCG on (P, D) better than SCIP on P?

              All              Structured         Non-structured
              SCIP    GCG      SCIP    GCG        SCIP    GCG
  Pred.       74.0%   26.0%    68.7%   31.3%      90.6%   9.4%
  SCIP        69.5%   12.3%    64.6%   11.1%      84.4%   6.3%
  GCG          4.5%   13.7%     4.1%   20.2%       6.3%   3.1%
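The diagonal cells of each block (SCIP predicted where SCIP is better, GCG predicted where GCG is better) are the correct predictions, so the accuracies implied by the table can be checked with a line of arithmetic:

```python
# Prediction accuracy implied by the confusion table above:
# the two diagonal percentages of each block sum to the accuracy.
for name, scip_correct, gcg_correct in [
    ("all",            69.5, 13.7),
    ("structured",     64.6, 20.2),
    ("non-structured", 84.4,  3.1),
]:
    accuracy = round(scip_correct + gcg_correct, 1)
    print(f"{name}: {accuracy}%")  # 83.2%, 84.8%, 87.5%
```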

SLIDE 24

Take-Away

◮ ML¹ helps us decide whether to use B&C or B&P
◮ don’t ask for reasons, this is ML

¹ this is not my name

SLIDE 25

Learning when to use a Decomposition

Markus Kruber · Marco Lübbecke · Axel Parmentier Chair of Operations Research RWTH Aachen University, Germany @mluebbecke #aussois2018 · C.O.W. Aussois · January 9, 2018
