SLIDE 1

Safe Grid Search with Optimal Complexity

Joseph Salmon (http://josephsalmon.eu)
IMAG, Univ Montpellier, CNRS, Montpellier, France

Joint work with:

  • E. Ndiaye (RIKEN, Nagoya)
  • T. Le (RIKEN, Tokyo)
  • O. Fercoq (Institut Polytechnique de Paris)
  • I. Takeuchi (Nagoya Institute of Technology)

1 / 22

SLIDE 2

Simplest model: standard sparse regression

$y \in \mathbb{R}^n$: a signal
$X = [x_1, \dots, x_p] \in \mathbb{R}^{n \times p}$: dictionary of atoms/features

Assumption: the signal is well approximated by a sparse combination $\beta^\star \in \mathbb{R}^p$: $y \approx X\beta^\star$

Objective(s): find $\hat\beta$ such that
  • Estimation: $\hat\beta \approx \beta^\star$
  • Prediction: $X\hat\beta \approx X\beta^\star$
  • Support recovery: $\mathrm{supp}(\hat\beta) \approx \mathrm{supp}(\beta^\star)$

Constraints: large $p$, sparse $\beta^\star$

$$ \underbrace{y}_{y \in \mathbb{R}^n} \;\approx\; \underbrace{[x_1, \dots, x_p]}_{X \in \mathbb{R}^{n \times p}} \cdot \underbrace{\begin{bmatrix} \beta^\star_1 \\ \vdots \\ \beta^\star_p \end{bmatrix}}_{\beta^\star \in \mathbb{R}^p}, \qquad y \approx \sum_{j=1}^{p} \beta^\star_j x_j $$
2 / 22
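To make the setting concrete, here is a minimal NumPy sketch of this generative model; the dimensions, sparsity level, and noise scale are arbitrary choices for illustration, not values used in the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, s = 50, 200, 5                     # n samples, p features, s non-zero coefficients (assumed)
X = rng.standard_normal((n, p))          # dictionary of atoms/features
beta_star = np.zeros(p)
support = rng.choice(p, size=s, replace=False)
beta_star[support] = rng.standard_normal(s)          # sparse ground truth
y = X @ beta_star + 0.1 * rng.standard_normal(n)     # y ~ X beta_star + noise
```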

SLIDE 3

The ℓ1 penalty: Lasso and variants

Vocabulary: the "Modern least squares" Candès et al. (2008)
  • Statistics: Lasso, Tibshirani (1996)
  • Signal processing variant: Basis Pursuit, Chen et al. (1998)

$$ \hat\beta^{(\lambda)} \in \arg\min_{\beta \in \mathbb{R}^p} \Big( \underbrace{\tfrac{1}{2}\|y - X\beta\|^2}_{\text{data fitting term}} + \underbrace{\lambda \|\beta\|_1}_{\text{sparsity-inducing penalty}} \Big) $$

  • Solutions are sparse (sparsity level controlled by λ)
  • Need to tune/choose λ (the standard is Cross-Validation)
  • Theoretical guarantees: Bickel et al. (2009)
  • Refinements: non-convex approaches, Adaptive Lasso Zou (2006), scale invariance Sun and Zhang (2012), etc.
3 / 22
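As a hedged illustration, the objective above can be solved for a single λ with scikit-learn's Lasso. Note that sklearn minimizes (1/(2·n_samples))·‖y − Xw‖² + α‖w‖₁, so α = λ/n_samples matches the formulation on this slide; X, y are reused from the simulation sketch above.

```python
import numpy as np
from sklearn.linear_model import Lasso

lam = 0.1 * np.max(np.abs(X.T @ y))      # an arbitrary fraction of lambda_max (see SLIDE 9)
clf = Lasso(alpha=lam / X.shape[0], fit_intercept=False, tol=1e-8)
clf.fit(X, y)
print("non-zero coefficients:", int(np.sum(clf.coef_ != 0)))
```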


SLIDE 8

Well... many Lassos are needed

$$ \hat\beta^{(\lambda)} \in \arg\min_{\beta \in \mathbb{R}^p} \tfrac{1}{2}\|y - X\beta\|_2^2 + \lambda \|\beta\|_1 $$

In practice:
Step 1: compute T solutions on a grid, i.e., compute $\beta^{(\lambda_0)}, \dots, \beta^{(\lambda_{T-1})}$ approximating $\hat\beta^{(\lambda_0)}, \dots, \hat\beta^{(\lambda_{T-1})}$, for some $\lambda_0 > \dots > \lambda_{T-1}$
Step 2: pick the "best" parameter

Questions:
  • performance criterion: how to pick a "best" λ?
      • cross-validation (and variants)
      • SURE (Stein Unbiased Risk Estimation)
      • etc.
  • grid choice: how to design the grid itself?
4 / 22

SLIDE 9

In practice: who does what?

Standard grid (R-glmnet / Python-sklearn): geometric grid
  • $\lambda_0 = \lambda_{\max} := \|X^\top y\|_\infty = \max_{j \in [p]} |\langle x_j, y \rangle|$ (critical value)
  • $\lambda_t = \lambda_{\max} \times 10^{-\delta t/(T-1)}$, with T = 100 and δ = 3
  • $\lambda_{T-1} = \lambda_{\max}/10^{3} =: \lambda_{\min}$

Parameter choice:
  Python-sklearn: vanilla 5-fold Cross-Validation, pick the λ with the smallest mean squared error (averaged over folds)
  R-glmnet: vanilla 10-fold Cross-Validation, pick the largest λ whose error is smaller than the smallest mean squared error (averaged over folds) plus one standard deviation

5 / 22
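A small sketch of this default geometric grid, written directly from the formulas on this slide (the function name is ours):

```python
import numpy as np

def default_grid(X, y, T=100, delta=3):
    """Geometric grid lambda_t = lambda_max * 10**(-delta * t / (T - 1)), t = 0, ..., T-1."""
    lambda_max = np.max(np.abs(X.T @ y))   # ||X^T y||_inf, the critical value
    t = np.arange(T)
    return lambda_max * 10.0 ** (-delta * t / (T - 1))
```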

SLIDE 10

Hold-out cross-validation

From now on: hold-out cross-validation (a single split)
Standard choice: 80% train ($n_{\text{train}}$), 20% test ($n_{\text{test}}$)
  • $X = X_{\text{train}} \cup X_{\text{test}}$
  • $y = y_{\text{train}} \cup y_{\text{test}}$
  • Evaluate the error on the test (validation) set:
$$ E_{\text{test}}(\hat\beta^{(\lambda)}) = \mathcal{L}(y_{\text{test}}, X_{\text{test}} \hat\beta^{(\lambda)}) := \|y_{\text{test}} - X_{\text{test}} \hat\beta^{(\lambda)}\|^2 $$
6 / 22
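Putting Steps 1 and 2 of SLIDE 8 together with this hold-out protocol gives the following minimal sketch; it reuses X, y and default_grid from the earlier sketches, and sklearn's alpha scaling as before.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)  # 80/20 split
grid = default_grid(X_tr, y_tr)
errors = []
for lam in grid:
    beta = Lasso(alpha=lam / X_tr.shape[0], fit_intercept=False).fit(X_tr, y_tr).coef_
    errors.append(np.linalg.norm(y_te - X_te @ beta) ** 2)   # E_test for this lambda
best_lam = grid[int(np.argmin(errors))]                       # Step 2: pick the "best" parameter
```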

SLIDE 11

Some practical examples

  • leukemia(1): n = 72, p = 7129 (gene expression); y: (binary) measure of disease
  • diabetes(2): n = 442, p = 10 (Age, Sex, Body mass index, Average blood pressure, S1, S2, S3, S4, S5, S6); y: a quantitative measure of disease progression one year after baseline

(1) https://sklearn.org/modules/generated/sklearn.datasets.fetch_mldata.html
(2) https://scikit-learn.org/stable/datasets/index.html#diabetes-dataset

7 / 22

SLIDE 12

Example: Training / Testing (leukemia)

(Figure: leukemia data, λ ranging from λ_min to λ_max. Left, training: normalized objective $P_\lambda(\beta)/P_\lambda(0)$ for the exact solution $P_\lambda(\hat\beta^{(\lambda)})$, the shifted exact curve $P_\lambda(\hat\beta^{(\lambda)}) + \epsilon$, and the approximated solution $P_\lambda(\beta^{(\lambda)})$. Right, testing: $\|y_{\text{test}} - X_{\text{test}} \hat\beta^{(\lambda)}\|^2 / \|y_{\text{test}}\|^2$ for the exact and approximate solutions.)
8 / 22


SLIDE 14

Example: Training / Testing (diabetes)

(Figure: diabetes data, λ ranging from λ_min to λ_max. Left, training: normalized objective $P_\lambda(\beta)/P_\lambda(0)$ for the exact solution $P_\lambda(\hat\beta^{(\lambda)})$, the shifted exact curve $P_\lambda(\hat\beta^{(\lambda)}) + \epsilon$, and the approximated solution $P_\lambda(\beta^{(\lambda)})$. Right, testing: $\|y_{\text{test}} - X_{\text{test}} \hat\beta^{(\lambda)}\|^2 / \|y_{\text{test}}\|^2$ for the exact and approximate solutions.)
9 / 22


SLIDE 16

Hyperparameter tuning

  • Learning task:
$$ \hat\beta^{(\lambda)} \in \arg\min_{\beta \in \mathbb{R}^p} \underbrace{f(X_{\text{train}}\beta)}_{\frac{1}{2}\|X_{\text{train}}\beta - y_{\text{train}}\|^2} + \lambda\,\underbrace{\Omega(\beta)}_{\|\beta\|_1} $$
  • Evaluation: $E_{\text{test}}(\hat\beta^{(\lambda)}) = \mathcal{L}(y_{\text{test}}, X_{\text{test}} \hat\beta^{(\lambda)})$

(Figure: validation curves $\|y_{\text{test}} - X_{\text{test}} \beta^{(\lambda)}\|^2$ computed at machine precision on two datasets, as a function of λ ∈ [λ_min, λ_max].)

How to choose the grid of hyperparameters?
10 / 22


SLIDE 18

Hyperparameter tuning as bilevel optimization

The "optimal" hyperparameter is given by
$$ \hat\lambda \in \arg\min_{\lambda \in [\lambda_{\min}, \lambda_{\max}]} E_{\text{test}}(\hat\beta^{(\lambda)}) = \mathcal{L}(y_{\text{test}}, X_{\text{test}} \hat\beta^{(\lambda)}) \quad \text{s.t.} \quad \hat\beta^{(\lambda)} \in \arg\min_{\beta \in \mathbb{R}^p} f(X_{\text{train}}\beta) + \lambda\,\Omega(\beta) $$

Challenges:
  • non-smooth and non-convex objective function
  • costly to evaluate $E_{\text{test}}(\hat\beta^{(\lambda)})$ (e.g., on a dense/continuous grid)
11 / 22


SLIDE 20

Tracking the curve of solutions

$$ \hat\beta^{(\lambda)} \in \arg\min_{\beta \in \mathbb{R}^p} f(X\beta) + \lambda\,\Omega(\beta) =: P_\lambda(\beta) $$

Exact path: for (f, Ω) = (piecewise quadratic, piecewise linear), the map $\lambda \mapsto \hat\beta^{(\lambda)}$ is piecewise linear (LARS(3)).

Drawbacks:
  • Exponential(4) worst-case complexity for the Lasso: $O((3^p + 1)/2)$
  • Numerical instabilities(5)
  • Hard to generalize to other losses / regularizations
  • Cannot benefit from early stopping rules(6)

(3) B. Efron et al. "Least angle regression". In: Ann. Statist. 32.2 (2004). With discussion, and a rejoinder by the authors, pp. 407–499.
(4) J. Mairal and B. Yu. "Complexity analysis of the Lasso regularization path". In: ICML. 2012, pp. 353–360.
(5) Y. Li and Y. Singer. "The Well Tempered Lasso". In: ICML (2018), pp. 3030–3038.
(6) L. Bottou and O. Bousquet. "The tradeoffs of large scale learning". In: NIPS. 2008, pp. 161–168.
12 / 22
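For moderate p, the exact piecewise-linear path referred to above can be obtained with LARS in scikit-learn; a minimal sketch (reusing X, y from the earlier sketches):

```python
from sklearn.linear_model import lars_path

# Returns the kinks of the path (alphas), the order in which features enter (active),
# and the coefficients at each kink (coefs), cf. Efron et al. (2004).
alphas, active, coefs = lars_path(X, y, method="lasso")
```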


SLIDE 22

Aparté: Duality for the Lasso

$$ \hat\theta^{(\lambda)} = \arg\max_{\theta \in \Delta_X} \underbrace{\tfrac{1}{2}\|y\|^2 - \tfrac{\lambda^2}{2}\Big\|\tfrac{y}{\lambda} - \theta\Big\|^2}_{D_\lambda(\theta)} $$

$\Delta_X = \{\theta \in \mathbb{R}^n : \forall j \in [p],\ |x_j^\top \theta| \le 1\}$: dual feasible set

(Figure: toy visualization with n = 2, p = 3; the dual feasible set $\Delta_X$ is the polytope delimited by the hyperplanes $\{\theta : x_j^\top \theta = \pm 1\}$, and $\hat\theta$ is the projection of $y/\lambda$ onto it.)

Projection problem: $\hat\theta^{(\lambda)} = \Pi_{\Delta_X}(y/\lambda)$
13 / 22


SLIDE 25

Duality gap as a stopping criterion

For any primal-dual pair $(\beta, \theta) \in \mathbb{R}^p \times \Delta_X$:
$$ \text{(Dual)} \quad D_\lambda(\theta) \;\le\; D_\lambda(\hat\theta^{(\lambda)}) \;=\; P_\lambda(\hat\beta^{(\lambda)}) \;\le\; P_\lambda(\beta) \quad \text{(Primal)} $$
Duality gap: $\mathrm{gap}_\lambda(\beta, \theta) := P_\lambda(\beta) - D_\lambda(\theta)$ is an upper bound on the suboptimality gap $P_\lambda(\beta) - P_\lambda(\hat\beta^{(\lambda)})$:
$$ \forall \beta,\ \big(\exists\,\theta \in \Delta_X,\ \mathrm{gap}_\lambda(\beta, \theta) \le \epsilon\big) \;\Rightarrow\; P_\lambda(\beta) - P_\lambda(\hat\beta^{(\lambda)}) \le \epsilon $$
i.e., β is an ε-solution whenever $\mathrm{gap}_\lambda(\beta, \theta) \le \epsilon$.

14 / 22
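A hedged sketch of this stopping criterion for the Lasso: the primal value, a feasible dual point obtained by rescaling the residual (a standard construction, not spelled out on the slide), and the resulting gap.

```python
import numpy as np

def lasso_duality_gap(X, y, beta, lam):
    r = y - X @ beta                                         # residual
    theta = r / max(lam, np.max(np.abs(X.T @ r)))            # rescaled so that |x_j^T theta| <= 1
    primal = 0.5 * np.linalg.norm(r) ** 2 + lam * np.linalg.norm(beta, 1)
    dual = 0.5 * np.linalg.norm(y) ** 2 - 0.5 * lam ** 2 * np.linalg.norm(y / lam - theta) ** 2
    return primal - dual                                     # <= eps  =>  beta is an eps-solution
```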

SLIDE 26

Approximate path: adaptive grid(7)

Start: fix the grid's upper (λ_max) and lower (λ_min) bounds.
Quadratic bound: helps build an ε-accurate grid on [λ_min, λ_max]:
$$ P_\lambda(\beta^{(\lambda_t)}) - P_\lambda(\hat\beta^{(\lambda)}) \;\le\; \mathrm{gap}_\lambda(\beta^{(\lambda_t)}, \theta^{(\lambda_t)}) \;\le\; Q_{\lambda_t}\Big(1 - \frac{\lambda}{\lambda_t}\Big) $$
Rem: holds whenever f is strongly convex.

(Figure: adaptive grid $\lambda_{\max} > \lambda_1 > \dots > \lambda_5 > \lambda_{\min}$; the upper bound of the duality gap is kept below ε while each grid point is computed at accuracy $\epsilon_c$.)

(7) J. Giesen et al. "Approximating concavely parameterized optimization problems". In: NIPS. 2012, pp. 2105–2113.

15 / 22
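One way to read this slide as pseudocode: from the current grid point λ_t, move to the smallest λ for which the quadratic upper bound still stays below ε. The sketch below is only illustrative; the exact coefficients of $Q_{\lambda_t}$ depend on the current iterate and are not reproduced here, so `bound` is a placeholder for the map u ↦ $Q_{\lambda_t}(u)$.

```python
def next_lambda(lambda_t, eps, bound, iters=50):
    """Largest step u = 1 - lambda/lambda_t (by bisection) with bound(u) <= eps.

    Assumes bound is non-decreasing on [0, 1], which holds for the quadratic bound of the slide.
    """
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if bound(mid) <= eps:
            lo = mid
        else:
            hi = mid
    return lambda_t * (1.0 - lo)          # next (smaller) grid point
```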


SLIDE 28

Approximation of the validation path

$$ \arg\min_{\lambda \in [\lambda_{\min}, \lambda_{\max}]} E_{\text{test}}(\hat\beta^{(\lambda)}) = \mathcal{L}(y_{\text{test}}, X_{\text{test}} \hat\beta^{(\lambda)}) \quad \text{s.t.} \quad \hat\beta^{(\lambda)} \in \arg\min_{\beta \in \mathbb{R}^p} f(X_{\text{train}}\beta) + \lambda\,\Omega(\beta) $$

Bound on the validation gap(8),(9):
$$ \big| E_{\text{test}}(\hat\beta^{(\lambda)}) - E_{\text{test}}(\beta^{(\lambda_t)}) \big| \;\le\; \max_{\beta \in \mathcal{B}_\lambda} \mathcal{L}\big(X_{\text{test}}\beta,\ X_{\text{test}}\beta^{(\lambda_t)}\big), $$
where $\mathcal{B}_\lambda = \mathrm{Ball}\big(\beta^{(\lambda_t)}, r_t\big) \ni \hat\beta^{(\lambda)}$

Rem: $r_t = \sqrt{\tfrac{2}{\mu}\,\mathrm{gap}\big(\beta^{(\lambda_t)}, \theta^{(\lambda_t)}\big)}$ for a µ-strongly convex $P_\lambda$ (e.g., Enet)

(8) A. Shibagaki et al. "Regularization Path of Cross-Validation Error Lower Bounds". In: NIPS. 2015, pp. 1666–1674.
(9) E. Ndiaye et al. "Gap Safe screening rules for sparsity enforcing penalties". In: J. Mach. Learn. Res. 18.128 (2017), pp. 1–33.

16 / 22
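A hedged sketch of the quantities in this bound, assuming $\mathcal{L}$ is the squared Euclidean distance on predictions and bounding the maximum over the ball with the spectral norm of X_test (these are our assumptions, not the paper's exact constants):

```python
import numpy as np

def validation_gap_bound(X_test, gap, mu):
    r_t = np.sqrt(2.0 * gap / mu)                     # radius: ||beta_hat - beta_t|| <= r_t
    return (np.linalg.norm(X_test, 2) * r_t) ** 2     # bounds max over the ball of ||X_test (beta - beta_t)||^2
```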


SLIDE 31

Testing (Validation) control

Motivation: fix a precision level $\epsilon_v$ on the testing (or validation) set; then calibrate the optimization accuracy ε needed at training to reach this precision.

Theorem. When $P_\lambda$ is a µ-strongly convex function, with the grid construction provided before,
$$ \forall \lambda \in [\lambda_{\min}, \lambda_{\max}],\ \exists\,\lambda_t \in \text{grid}, \quad \big| E_{\text{test}}(\hat\beta^{(\lambda)}) - E_{\text{test}}(\beta^{(\lambda_t)}) \big| \le \epsilon_v, $$
provided the algorithm is run up to precision ε at training, with
$$ \epsilon = \frac{\mu}{2}\Big(\frac{\epsilon_v}{\|X_{\text{test}}\|}\Big)^2 $$

17 / 22
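The calibration in the theorem translates directly into code; here ‖X_test‖ is taken as the spectral norm, which is our reading of the slide's notation.

```python
import numpy as np

def training_precision(X_test, eps_v, mu):
    """Optimization accuracy eps for validation accuracy eps_v: eps = (mu/2) * (eps_v / ||X_test||)^2."""
    return 0.5 * mu * (eps_v / np.linalg.norm(X_test, 2)) ** 2
```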

SLIDE 32

Approximation of the optimal hyperparameter

(Figure: validation curve $\|y' - X'\beta^{(\lambda)}\|^2$ over [λ_min, λ_max] at machine precision, compared with the curves obtained at low precision ($\delta_v \times 10$) and high precision ($\delta_v / 10$); the tolerance $\epsilon_v$ is indicated.)
18 / 22

SLIDE 33

Conclusion

  • Extension to GLMs (more technical, but done)
  • Take-home message: more connections needed between optimization / statistics / learning
  • Future work: What about several parameters? How to handle vanilla CV & variants?

Code: https://github.com/EugeneNdiaye/safe_grid_search
ICML paper: https://arxiv.org/abs/1810.05471

19 / 22

Powered with MooseTeX

SLIDE 34

One last word

“All models are wrong but some come with good open source implementation and good documentation so use those.”

– A. Gramfort

20 / 22

SLIDE 35

References I

  • Bickel, P. J., Y. Ritov, and A. B. Tsybakov. "Simultaneous analysis of Lasso and Dantzig selector". In: Ann. Statist. 37.4 (2009), pp. 1705–1732.
  • Bottou, L. and O. Bousquet. "The tradeoffs of large scale learning". In: NIPS. 2008, pp. 161–168.
  • Candès, E. J., M. B. Wakin, and S. P. Boyd. "Enhancing Sparsity by Reweighted l1 Minimization". In: J. Fourier Anal. Applicat. 14.5-6 (2008), pp. 877–905.
  • Chen, S. S., D. L. Donoho, and M. A. Saunders. "Atomic decomposition by basis pursuit". In: SIAM J. Sci. Comput. 20.1 (1998), pp. 33–61.
  • Efron, B. et al. "Least angle regression". In: Ann. Statist. 32.2 (2004). With discussion, and a rejoinder by the authors, pp. 407–499.
  • Giesen, J. et al. "Approximating concavely parameterized optimization problems". In: NIPS. 2012, pp. 2105–2113.
21 / 22

SLIDE 36

References II

  • Li, Y. and Y. Singer. "The Well Tempered Lasso". In: ICML (2018), pp. 3030–3038.
  • Mairal, J. and B. Yu. "Complexity analysis of the Lasso regularization path". In: ICML. 2012, pp. 353–360.
  • Ndiaye, E. et al. "Gap Safe screening rules for sparsity enforcing penalties". In: J. Mach. Learn. Res. 18.128 (2017), pp. 1–33.
  • Shibagaki, A. et al. "Regularization Path of Cross-Validation Error Lower Bounds". In: NIPS. 2015, pp. 1666–1674.
  • Sun, T. and C.-H. Zhang. "Scaled sparse linear regression". In: Biometrika 99.4 (2012), pp. 879–898.
  • Tibshirani, R. "Regression Shrinkage and Selection via the Lasso". In: J. R. Stat. Soc. Ser. B Stat. Methodol. 58.1 (1996), pp. 267–288.
  • Zou, H. "The adaptive lasso and its oracle properties". In: J. Amer. Statist. Assoc. 101.476 (2006), pp. 1418–1429.

22 / 22