

  1. Learning a classification of Mixed-Integer Quadratic Programming problems · 22nd Combinatorial Optimization Workshop · Aussois · January 12, 2018 · Pierre Bonami¹, Andrea Lodi², Giulia Zarpellon² · ¹CPLEX Optimization, IBM Spain · ²Polytechnique Montréal, GERAD, CERC Data Science for Real-Time Decision Making

  2. https://www.gerad.ca/en/papers/G-2017-106

  3. Table of contents: 1. A classification question on MIQPs · 2. Data and experiments · 3. Two working questions

  4. A classification question on MIQPs

  5. Mixed-Integer Quadratic Programming problems

     We consider Mixed-Integer Quadratic Programming (MIQP) problems:

         min  (1/2) x^T Q x + c^T x
         s.t. Ax = b
              l ≤ x ≤ u
              x_i ∈ {0, 1}  ∀ i ∈ I        (1)

     • Modeling of practical applications
     • First extension of linear algorithms into nonlinear ones

     We say an MIQP is convex (resp. nonconvex) if and only if the matrix Q is positive semi-definite, Q ⪰ 0 (resp. indefinite). The IBM-CPLEX solver can solve both convex and nonconvex MIQPs to proven optimality.
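As a concrete illustration of the convex/nonconvex distinction (not code from the talk), a minimal Python sketch classifies an instance by the smallest eigenvalue of a dense symmetric Q; the function name and tolerance are illustrative:

```python
import numpy as np

def classify_miqp(Q, tol=1e-9):
    """Convex iff Q is positive semi-definite, i.e. its smallest
    eigenvalue is (numerically) nonnegative."""
    lam_min = np.linalg.eigvalsh(Q).min()  # eigvalsh: for symmetric matrices
    return "convex" if lam_min >= -tol else "nonconvex"

Q = np.array([[2.0, -1.0],
              [-1.0, 0.1]])
print(classify_miqp(Q))  # "nonconvex": det(Q) < 0, so one eigenvalue is negative
```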

  6. Solving MIQPs with CPLEX

     Convex 0-1:       NLP B&B
     Convex mixed:     NLP B&B
     Nonconvex 0-1:    convexify + NLP B&B, or linearize + MILP B&B
     Nonconvex mixed:  convexification is only a relaxation → Spatial B&B

     Convexification: augment the diagonal of Q, using x_i = x_i^2 for x_i ∈ {0, 1}:
     x^T Q x → x^T (Q + ρ I_n) x − ρ e^T x, where Q + ρ I_n ⪰ 0 for some ρ > 0
     (a numerical sketch follows below)

     Linearization: replace q_ij x_i x_j, where x_i ∈ {0, 1} and l_j ≤ x_j ≤ u_j, with a new variable y_ij and McCormick inequalities.

     Linearization is always full in the 0-1 case.
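The convexification admits a short numerical sketch. A common choice (an assumption here, not necessarily CPLEX's internal one) is ρ = max(0, −λ_min(Q)) + ε, which makes Q + ρ I_n positive semi-definite while the term −ρ e^T x compensates exactly on binary variables:

```python
import numpy as np

def convexify(Q, eps=1e-6):
    """Diagonal perturbation for a 0-1 MIQP: x^T Q x
    == x^T (Q + rho*I) x - rho * e^T x for binary x, since x_i = x_i^2.
    Returns the perturbed Q, the shift to add to the linear term c,
    and rho itself."""
    lam_min = np.linalg.eigvalsh(Q).min()
    rho = max(0.0, -lam_min) + eps   # smallest shift making Q PSD, plus slack
    n = Q.shape[0]
    return Q + rho * np.eye(n), -rho * np.ones(n), rho
```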

  7. Solving MIQPs with CPLEX

     Convex 0-1:      NL: NLP B&B · L: linearize + MILP B&B
     Convex mixed:    NL: NLP B&B · L: linearize + MILP B&B, or linearize + NLP B&B
     Nonconvex 0-1:   NL: convexify + NLP B&B · L: linearize + MILP B&B

     Convexification: augment the diagonal of Q, using x_i = x_i^2 for x_i ∈ {0, 1}:
     x^T Q x → x^T (Q + ρ I_n) x − ρ e^T x, where Q + ρ I_n ⪰ 0 for some ρ > 0

     Linearization: replace q_ij x_i x_j, where x_i ∈ {0, 1} and l_j ≤ x_j ≤ u_j, with a new variable y_ij and McCormick inequalities.

     Linearization is always full in 0-1 MIQPs (it may not be for mixed ones).
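For reference, the McCormick inequalities mentioned above, written out (standard construction) for y_ij = x_i x_j with x_i ∈ {0, 1} and l_j ≤ x_j ≤ u_j:

    y_ij ≥ l_j x_i
    y_ij ≤ u_j x_i
    y_ij ≥ x_j + u_j (x_i − 1)
    y_ij ≤ x_j + l_j (x_i − 1)

They are exact here because x_i is binary: when x_i = 0 they force y_ij = 0, and when x_i = 1 they force y_ij = x_j.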

  8. Linearize vs. not linearize

     The linearization approach seems beneficial also in the convex case, but is linearizing always the best choice?

     Goal: exploit ML predictive machinery to understand whether or not it is favorable to linearize the quadratic part of an MIQP.

     • Learn an offline classifier predicting the most suited resolution approach within the CPLEX framework, in an instance-specific way (the qtolin parameter controls the linearization switch; see the sketch below)
     • Gain methodological insights about which features of the MIQPs most affect the prediction
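The qtolin switch can be steered programmatically. A minimal sketch, assuming the CPLEX Python API exposes it as parameters.preprocessing.qtolin (per the CPXPARAM_Preprocessing_QToLin parameter); the glue function and predicted_label are hypothetical:

```python
import cplex  # IBM CPLEX Python API

def solve_with_predicted_mode(model_path, predicted_label):
    """Hypothetical glue code: set CPLEX's linearization switch
    according to an L / NL prediction, then solve."""
    c = cplex.Cplex(model_path)
    # qtolin: -1 = automatic (default), 0 = do not linearize, 1 = linearize
    if predicted_label == "L":
        c.parameters.preprocessing.qtolin.set(1)
    elif predicted_label == "NL":
        c.parameters.preprocessing.qtolin.set(0)
    c.solve()
    return c.solution.get_objective_value()
```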

  9. Data and experiments

  10. Steps to apply supervised learning

      Dataset generation
      • Generator of MIQPs, spanning various parameters
      • Q = sprandsym(size, density, eigenvalues) (a Python analogue is sketched below)

      Features design
      • Static features (21): mathematical characteristics (variables, constraints, objective, spectrum, ...)
      • Dynamic features (2): early behavior in the optimization process (bounds and times at the root node)

      Labeling procedure
      • Consider tie cases: labels in {L, NL, T}
      • 1h time limit, 5 seeds: solvability and consistency checks
      • Look at runtimes to assign a label

      Result: {(x_k, y_k)}, k = 1..N, where x_k ∈ R^d, y_k ∈ {L, NL, T}, for N MIQPs
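sprandsym is MATLAB; here is a rough Python analogue of the Q generator (an illustration, not the talk's code) that prescribes the spectrum via Q = V diag(λ) V^T with a random orthogonal V. Note it does not reproduce sprandsym's sparsity (density) control:

```python
import numpy as np
from scipy.stats import ortho_group

def random_sym_with_spectrum(n, eigenvalues, seed=None):
    """Random symmetric matrix with a prescribed spectrum:
    Q = V diag(lambda) V^T, with V a random orthogonal matrix."""
    V = ortho_group.rvs(dim=n, random_state=seed)
    return V @ np.diag(np.asarray(eigenvalues, dtype=float)) @ V.T

# A mixed-sign spectrum yields an indefinite, hence nonconvex, instance
Q = random_sym_with_spectrum(5, [-2.0, -0.5, 0.1, 1.0, 3.0], seed=0)
```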

  11. Dataset D analysis, in a nutshell

      • 2300 instances, n ∈ {25, 50, ..., 200}, density d ∈ {0.2, 0.4, ..., 1}
      • Multiclass classifiers: SVM and Decision-Tree-based methods (Random Forests (RF) · Extremely Randomized Trees (EXT) · Gradient Tree Boosting (GTB))
      • Avoid overfitting with ML best practices
      • Tool: the scikit-learn library
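A minimal scikit-learn sketch of this pipeline on stand-in data (the real X would hold the 21 static + 2 dynamic features of the 2300 MIQPs; the mock data and hyperparameters are illustrative, not the paper's):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, cross_val_score

# Stand-in data mocking the dataset's shape: 2300 instances, 23 features
rng = np.random.default_rng(0)
X = rng.normal(size=(2300, 23))
y = rng.choice(["L", "NL", "T"], size=2300)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

rf = RandomForestClassifier(n_estimators=300, random_state=0)
print(cross_val_score(rf, X_tr, y_tr, cv=5).mean())  # overfitting check
rf.fit(X_tr, y_tr)
print(rf.score(X_te, y_te))  # held-out accuracy
```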

  12. Learning results on D_test

      Classifiers perform well with respect to traditional classification measures (D_test · multiclass · all features):

                  SVM    RF     EXT    GTB
      Accuracy    0.85   0.89   0.84   0.87
      Precision   0.82   0.85   0.81   0.85
      Recall      0.85   0.89   0.84   0.87
      F1 score    0.83   0.87   0.82   0.86

      • A major difficulty is posed by the T class, which is (almost) always misclassified
      • Binary setting: removing all tie cases improves performance

      How relevant are ties with respect to the question L vs. NL?
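Continuing the mock sketch above, the per-class breakdown (where a rarely-predicted T class stands out) can be obtained with scikit-learn's classification_report:

```python
from sklearn.metrics import classification_report

y_pred = rf.predict(X_te)  # rf, X_te, y_te from the previous sketch
print(classification_report(y_te, y_pred, labels=["L", "NL", "T"]))
```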

  13. Hints of feature importance

      Ensemble methods based on Decision Trees provide an importance score for each feature. The top-scoring ones are the dynamic features and those about eigenvalues:

      • (dyn.) Difference of the lower bounds for L and NL at the root node
      • (dyn.) Difference of the resolution times of the root node, for L and NL
      • Value of the smallest nonzero eigenvalue
      • Spectral norm of Q, i.e., ‖Q‖ = max_i |λ_i|
      • ...

      Static features setting: removing the dynamic features slightly deteriorates performance.

      How does the prediction change without information at the root node?
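Continuing the same mock sketch, tree ensembles expose these scores as feature_importances_ (the feature names below are placeholders, not the paper's feature list):

```python
import numpy as np

feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # placeholders
order = np.argsort(rf.feature_importances_)[::-1]            # descending
for i in order[:5]:
    print(f"{feature_names[i]}: {rf.feature_importances_[i]:.3f}")
```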

  14. Complementary optimization measures

      Need: evaluate the classifiers' performance in optimization terms, and quantify the gain with respect to CPLEX default (DEF).

      • For each test example, select the runtime corresponding to the predicted label, building a times vector t_clf for each classifier clf and for DEF.

      σ_clf — sum of predicted runtimes: sum over the times in t_clf
      Nσ_clf — normalized time score ∈ [0, 1]: shifted geometric mean of the times in t_clf, normalized between the best and worst cases (a sketch follows below)

                       SVM    RF     EXT    GTB    DEF
      σ_DEF / σ_clf    3.88   4.40   4.04   4.26   −
      Nσ_clf           0.98   0.99   0.98   0.99   0.42
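A sketch of these aggregates; the shift value and the exact normalization between best and worst cases are assumptions, and the paper's definition may differ:

```python
import numpy as np

def shifted_geomean(times, shift=10.0):
    """Shifted geometric mean of a vector of runtimes."""
    t = np.asarray(times, dtype=float)
    return np.exp(np.mean(np.log(t + shift))) - shift

def normalized_time_score(t_clf, t_best, t_worst, shift=10.0):
    """One plausible reading of N_sigma in [0, 1]: 1 when the classifier
    always picks the per-instance best label, 0 at the worst."""
    g = shifted_geomean(t_clf, shift)
    lo = shifted_geomean(t_best, shift)   # elementwise-best label runtimes
    hi = shifted_geomean(t_worst, shift)  # elementwise-worst label runtimes
    return (hi - g) / (hi - lo)
```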

  15. Two working questions

  16. What about other datasets?

      • Selection from QPLIB · 24 instances
      • Part of the CPLEX internal testbed · 175 instances

  17. What about other datasets?

      • Selection from QPLIB · 24 instances
      • Part of the CPLEX internal testbed · 175 instances

      Poor classification, but good optimization measures:

                       SVM    RF     EXT    GTB    DEF
      σ_DEF / σ_clf    0.48   0.53   0.71   0.42   −
      Nσ_clf           0.75   0.90   0.91   0.74   0.96

  18. Why those predictions?

      Convexification and linearization clearly affect
      • formulation size
      • formulation strength
      • implementation efficacy

      Each problem type might have its own decision function for the question L vs. NL, depending on, e.g.,
      • |λ_min|, ... when convexifying
      • the number of nonzero products between continuous variables in Q, ... when linearizing mixed instances
      • matrix conditioning and implementations, ...

      ML could also provide deeper insights.

  19. Thanks! Questions?

  20. Minimal references

      • Bliek C., Bonami P., Lodi A. (2014). Solving mixed-integer quadratic programming problems with IBM-CPLEX: a progress report.
      • Bonami P., Kılınç M., Linderoth J. (2012). Algorithms and software for convex mixed integer nonlinear programs.
      • Fourer R. Quadratic Optimization Mysteries. http://bob4er.blogspot.com/2015/03/quadratic-optimization-mysteries-part-1.html
      • Bishop C.M. (2006). Pattern Recognition and Machine Learning.
