Learning when to use a Decomposition
Markus Kruber · Marco Lübbecke · Axel Parmentier
Chair of Operations Research, RWTH Aachen University, Germany
@mluebbecke · #aussois2018 · C.O.W. Aussois · January 9, 2018
Machine Learning is Everywhere
@mluebbecke · #aussois2018 · Learning when to use a Decomposition · 2/17
Supervised Learning: Classification
◮ data X, d features, labels Y
◮ training pairs (xi, yi) become (φ(xi), yi) via a feature map φ : X → R^d
◮ an algorithm "learns" a classifier f : R^d → Y s.t. the error Σ_{xi ∈ X} ℓ(f(φ(xi)), yi) is "small" (an optimization problem)
◮ validate on unseen data
Binary Classification: Dog or Muffin? Owl or Apple?
SCIP
◮ open-source MIP/MINLP (and much more) solver ◮ also: a branch-price-and-cut framework ◮ scip.zib.de
GCG
◮ extension to SCIP ◮ fully generic branch-price-and-cut solver ◮ automatically applies Dantzig-Wolfe reformulation to a MIP ◮ www.or.rwth-aachen.de/gcg
GCG automatically detects Structure, a lot of it!
◮ up to a few hundred or even a thousand decompositions per MIP ◮ GCG's performance highly depends on whether the MIP's structure is reflected by some decomposition (and on whether we find/select it)
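The kind of structure a decomposition captures is the classical bordered block-diagonal form that Dantzig-Wolfe reformulation exploits; a sketch (standard textbook form, not GCG's internal representation):

```latex
% K independent blocks D_k, coupled only by linking constraints;
% Dantzig-Wolfe reformulates each block by its own variables x_k.
\begin{align*}
\min\ & \sum_{k=1}^{K} c_k^\top x_k \\
\text{s.t.}\ & \sum_{k=1}^{K} A_k x_k \ge b && \text{(linking constraints)} \\
& D_k x_k \ge d_k, \quad k = 1, \dots, K && \text{(independent blocks)} \\
& x_k \in \mathbb{Z}_{\ge 0}^{n_k}, \quad k = 1, \dots, K
\end{align*}
```

The more of the constraint matrix falls into the blocks D_k (and the fewer linking rows/columns remain in the border), the more the reformulation can pay off.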
Automatic Reformulation in GCG
MIP P → Detection → decompositions D1, D2, . . . , Dk (decomposition types: border, staircase, . . . ) → Select (GCG internal score) → GCG
Automatic Reformulation in GCG
MIP P → Detection → decompositions D1, D2, . . . , Dk (decomposition types: border, staircase, . . . ) → Select (learn) → DWR? (learn) → yes: GCG / no: SCIP
this work: a supervised learning approach to select a best decomposition (or decide not to use a decomposition at all)
Supervised Learning Approach
remember: given data X, a classifier f predicts a label Y = f(X)
◮ f is a binary classifier if Y ∈ {0, 1} ◮ learn a classifier: among a family (fθ, θ ∈ Θ), find the fθ that best fits a training set ((xi, yi), i = 1, . . . , n)
◮ we use standard algorithms implemented in scikit-learn
data X:
◮ MIP P ◮ decomposition(s) D
labels Y :
◮ use SCIP or GCG? ◮ which decomposition?
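A minimal sketch of this workflow with scikit-learn, the library named on the slide; the feature vectors and labels below are synthetic stand-ins, not the actual (P, D) data:

```python
# Sketch: train a binary "use GCG?" classifier with scikit-learn.
# X stands in for feature vectors phi(P, D); y = 1 means "GCG beats SCIP".
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((200, 5))                    # 200 instances, d = 5 features
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)   # synthetic label rule

# Hold out part of the data for validation, as the slides prescribe.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)        # validation accuracy in [0, 1]
```

Any standard classifier family (fθ, θ ∈ Θ) from scikit-learn fits the same pattern; the random forest here is just one choice.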
Feature Map φ
input (P, D) → feature map φ (define φ, 80+ features) → feature vector φ(P, D) ∈ R^d → classifier fθ (choose family Θ, learn fθ) → output fθ(φ(P, D))
examples of features we used
“classics”
◮ # variables/constraints ◮ variable types ◮ constraint types ◮ products of features ◮ # linking vars/conss ◮ # blocks ◮ min, max, average block size ◮ detector used (indicator) ◮ detection quality metrics
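A hypothetical sketch of such a feature map; the concrete features mirror the examples above but the function and its inputs are made up for illustration, not GCG's actual code:

```python
# Hypothetical feature map phi(P, D): summarize a MIP P and a
# decomposition D (given as block sizes plus linking-constraint count).
def phi(n_vars, n_conss, block_sizes, n_linking_conss):
    """Return a small feature vector for the pair (P, D)."""
    n_blocks = len(block_sizes)
    return [
        n_vars,                            # "classics": problem size
        n_conss,
        n_blocks,                          # decomposition shape
        n_linking_conss,
        min(block_sizes),                  # min block size
        max(block_sizes),                  # max block size
        sum(block_sizes) / n_blocks,       # average block size
        n_linking_conss / n_conss,         # ratio-style derived feature
    ]

features = phi(n_vars=1000, n_conss=800,
               block_sizes=[200, 300, 250], n_linking_conss=50)
```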
Labeling: Definition of Y
question: should we use SCIP or GCG?
training set
◮ MIP P ◮ decompositions D ◮ SCIP run on each P ◮ GCG run on each (P, D)
given an input (P, D) we learn a binary classifier Y = f(φ(P, D)) where Y = 1 iff GCG on (P, D) is better than SCIP on P after a given time limit:
◮ GCG solves P and SCIP doesn't ◮ both solve P and GCG is faster ◮ neither solves P but GCG's gap is smaller
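The three-case labeling rule above, written out as a small function; the run records (solved / time / gap) are hypothetical inputs, one per solver run:

```python
# Labeling rule from the slide: y = 1 iff GCG on (P, D) beats SCIP on P
# within the time limit. scip/gcg are dicts with keys
# 'solved' (bool), 'time' (seconds), 'gap' (relative gap at the limit).
def label(scip, gcg):
    if gcg["solved"] and not scip["solved"]:
        return 1                                      # GCG solves P, SCIP doesn't
    if gcg["solved"] and scip["solved"]:
        return 1 if gcg["time"] < scip["time"] else 0  # both solve: faster wins
    if not gcg["solved"] and not scip["solved"]:
        return 1 if gcg["gap"] < scip["gap"] else 0    # neither solves: smaller gap
    return 0                                           # SCIP solves, GCG doesn't

y = label({"solved": True, "time": 120.0, "gap": 0.0},
          {"solved": True, "time": 45.0, "gap": 0.0})  # both solve, GCG faster
```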
Regression: Predict Decomposition Quality
question: if any, which decomposition should we use?
f(φ(P, D)) ∈ [0, 1]: probability that GCG with D beats SCIP.
given decompositions D1, . . . , Dk and remaining time t, use GCG if max_i f(φ(P, Di)) ≥ α.
we use 0.5 < α ≤ 1: decomposition is not a default choice.
if we use GCG, select decomposition arg max_i f(φ(P, Di)).
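The threshold-then-argmax rule as code; the scores stand in for the learned f(φ(P, Di)) values, and the function name and α value are illustrative:

```python
# Selection rule: use GCG iff max_i f(phi(P, D_i)) >= alpha,
# and in that case pick the argmax decomposition.
def choose(scores, alpha=0.7):
    """scores[i] = f(phi(P, D_i)); returns ('GCG', best index) or ('SCIP', None)."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    if scores[best] >= alpha:
        return "GCG", best
    return "SCIP", None

solver, idx = choose([0.2, 0.85, 0.6])  # D2 clears the threshold
```

With α > 0.5 the rule is deliberately conservative: decomposing is opt-in, not the default.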
400 MIP Instances
SCIP results on structured (12 classes × 25 instances) and non-structured (100 MIPLIB) instances:

results            all   clr stcv cpmp sdlb ctst  gap ntlb ltsz   bp  rap stbl cvrp miplib
instances          400    25   25   25   25   25   25   25   25   25   25   25   25    100
opt. sol.  65.5%          19    3   18   10   25   23   25   25    6   12   22    6     68
feas. sol. 31.5%           6   21    7   11    -    2    -    -   19   12    3   19     26
no sol.     3.0%           -    1    -    4    -    -    -    -    -    1    -    -      6
structured instances: coloring (clr), set covering (stcv), capacitated p-median (cpmp), survivable network design (sdlb), cutting stock (ctst), generalized assignment (gap), network design (ntlb), resource allocation (rap), capacitated vehicle routing (cvrp), lot sizing (ltsz), bin packing (bp), stable set (stbl)
Overall Performance on Training Data
◮ test set of 131 MIP instances: 99 structured, 32 unstructured ◮ GCG better than SCIP on 34 instances
Instances           All                         Structured                  Non-structured
Solver             SCIP    GCG    us    opt     SCIP   GCG    us    opt     SCIP    GCG     us    opt
No opt. sol.         52     66    44     39       39    37    31     26       13     29     14     13
CPU time (h)      111.3  142.6  93.1   85.7     83.5  82.2  65.9   58.5     27.8   56.8   29.2   27.2
Geo. mean (s)     127.1  370.4  78.6   67.8     73.4 146.9  39.2   32.2    672.9 5145.0  766.0  646.5
◮ SCIP: apply default SCIP to all instances ◮ GCG: apply default GCG to all instances ◮ us: our supervised learning scheme ◮ opt: best decomposition selected each time
Accuracy: How often do we predict the right Solver?
goal: avoid using GCG when we do not find an appropriate structure
question: is GCG on (P, D) better than SCIP on P?
true best solver       All               Structured         Non-struct.
                     SCIP     GCG      SCIP     GCG       SCIP    GCG
share of instances  74.0%   26.0%     68.7%   31.3%      90.6%   9.4%
predicted SCIP      69.5%   12.3%     64.6%   11.1%      84.4%   6.3%
predicted GCG        4.5%   13.7%      4.1%   20.2%       6.3%   3.1%
Take-Away
◮ ML¹ helps us decide whether to use B&C or B&P ◮ don't ask for reasons, this is ML
¹ this is not my name
Learning when to use a Decomposition
Markus Kruber · Marco Lübbecke · Axel Parmentier
Chair of Operations Research, RWTH Aachen University, Germany
@mluebbecke · #aussois2018 · C.O.W. Aussois · January 9, 2018