Bilevel Learning of the Group Lasso Structure

Jordan Frecon¹, Saverio Salzo¹, Massimiliano Pontil¹,²
¹ CSML, Istituto Italiano di Tecnologia
² Department of Computer Science, University College London
Thirty-second Conference on Neural Information Processing Systems (NIPS 2018), Montreal, Canada

Linear Regression and Group Sparsity

Problem: predict $y \in \mathbb{R}^N$ from $X \in \mathbb{R}^{N \times P}$.
Linear regression: find $w \in \mathbb{R}^P$ such that $y \approx Xw$.
In many applications, only a few groups of features are relevant to predict $y$, so $w$ is group-sparse. Examples:
- predicting psychiatric disorders from activity in regions of the brain;
- predicting protein functions from their molecular composition.

Group Lasso

Given $\lambda > 0$ and a group structure $\{\mathcal{G}_1, \dots, \mathcal{G}_L\}$, find
$$\hat w \in \operatorname*{argmin}_{w \in \mathbb{R}^P} \; \frac{1}{2}\|y - Xw\|^2 + \lambda \sum_{l=1}^{L} \|w_{\mathcal{G}_l}\|_2.$$
[Figure: a group-sparse solution $\hat w$ over 50 features partitioned into groups $\mathcal{G}_1, \dots, \mathcal{G}_5$.]
Limitation: the group structure $\{\mathcal{G}_1, \dots, \mathcal{G}_L\}$ may be unknown.
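When the group structure is known and the groups do not overlap, the Group Lasso problem above can be solved with a standard proximal gradient (ISTA) scheme, because the proximal operator of $\lambda \sum_l \|w_{\mathcal{G}_l}\|_2$ is block soft-thresholding. The following is a minimal NumPy sketch, not the authors' solver; the function names `group_soft_threshold` and `group_lasso`, the fixed iteration count, and the representation of groups as disjoint index arrays are assumptions made for illustration.

```python
import numpy as np

def group_soft_threshold(v, groups, thresh):
    """Proximal operator of thresh * sum_l ||v[G_l]||_2 (block soft-thresholding).

    groups: list of disjoint integer index arrays covering all features.
    """
    out = np.zeros_like(v)
    for g in groups:
        norm = np.linalg.norm(v[g])
        if norm > thresh:
            # Shrink the whole block toward zero; otherwise it stays exactly zero
            out[g] = (1.0 - thresh / norm) * v[g]
    return out

def group_lasso(X, y, groups, lam, n_iter=500):
    """Proximal gradient (ISTA) for min_w 0.5*||y - X w||^2 + lam * sum_l ||w[G_l]||_2."""
    step = 1.0 / np.linalg.norm(X, 2) ** 2    # 1 / Lipschitz constant of the gradient
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)              # gradient of the least-squares term
        w = group_soft_threshold(w - step * grad, groups, step * lam)
    return w
```

Each block of the output is either shrunk toward zero or set exactly to zero; this all-or-nothing behavior per group is what yields a group-sparse solution.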

Setting

$T$ Group Lasso problems with a shared group structure:
$$(\forall t \in \{1, \dots, T\}) \quad \hat w_t(\theta) \in \operatorname*{argmin}_{w_t \in \mathbb{R}^P} \; \frac{1}{2}\|y_t - X_t w_t\|^2 + \lambda \sum_{l=1}^{L} \|w_t \odot \theta_l\|_2,$$
where $\theta = [\theta_1 \cdots \theta_L]$ encodes the groups.
[Figure: regression vectors of 10 tasks over 50 features (left) and the matrix $\theta$ with 5 group columns (right).]
Goal: estimate the optimal group structure $\theta^*$.
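One way to read the shared penalty: if each column $\theta_l$ is the 0/1 indicator of a group $\mathcal{G}_l$, then $\|w_t \odot \theta_l\|_2 = \|(w_t)_{\mathcal{G}_l}\|_2$ and each task reduces to the Group Lasso of the previous slide, while real-valued columns turn the group structure into a continuous variable that gradient-based methods can optimize. The NumPy sketch below checks this reduction on a small example; storing $\theta$ as a $P \times L$ array and the helper names `indicator_theta`, `theta_penalty` and `task_objective` are assumptions of this illustration.

```python
import numpy as np

def indicator_theta(groups, P):
    """Build a P x L matrix whose column l is the 0/1 indicator of group G_l."""
    theta = np.zeros((P, len(groups)))
    for l, g in enumerate(groups):
        theta[g, l] = 1.0
    return theta

def theta_penalty(w, theta):
    """Shared penalty sum_l ||w * theta_l||_2 (elementwise product), theta of shape (P, L)."""
    return float(np.sum(np.linalg.norm(w[:, None] * theta, axis=0)))

def task_objective(w, X, y, theta, lam):
    """Lower-level objective of one task: 0.5*||y - X w||^2 + lam * sum_l ||w * theta_l||_2."""
    return 0.5 * float(np.sum((y - X @ w) ** 2)) + lam * theta_penalty(w, theta)

# Indicator columns recover the Group Lasso penalty: ||(3, 4)|| + ||(1, 0)|| = 5 + 1.
w = np.array([3.0, 4.0, 1.0, 0.0])
theta = indicator_theta([np.array([0, 1]), np.array([2, 3])], P=4)
penalty = theta_penalty(w, theta)   # 6.0
```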

A Bilevel Programming Approach

Upper-level problem:
$$\min_{\theta = [\theta_1 \cdots \theta_L] \in \Theta} \; U(\theta) := \sum_{t=1}^{T} \mathcal{E}_t\big(\hat w_t(\theta)\big) \quad \text{(e.g., validation error)},$$
where $\hat w(\theta) = [\hat w_1(\theta) \cdots \hat w_T(\theta)]$ solves the
lower-level problem ($T$ Group Lasso problems):
$$\min_{w \in \mathbb{R}^{P \times T}} \; \mathcal{L}(w, \theta) := \sum_{t=1}^{T} \Big( \frac{1}{2}\|y_t - X_t w_t\|^2 + \lambda \sum_{l=1}^{L} \|\theta_l \odot w_t\|_2 \Big).$$
Difficulties:
- $\hat w(\theta)$ is not available in closed form;
- $\theta \mapsto \hat w(\theta)$ is nonsmooth, hence $U$ is nonsmooth.
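The nonsmoothness of $\theta \mapsto \hat w(\theta)$ is already visible in the smallest instance, $T = L = P = N = 1$ with $X = 1$, where the lower-level problem $\min_w \frac{1}{2}(y - w)^2 + \lambda|\theta w|$ has the closed-form solution $\hat w(\theta) = \operatorname{sign}(y)\max(|y| - \lambda|\theta|, 0)$ (soft-thresholding). The sketch below is a toy example chosen for illustration, not taken from the talk: it evaluates one-sided finite differences at the kink $\lambda|\theta| = |y|$, where the left and right slopes differ, so a smooth validation error $\mathcal{E}$ composed with this map generally inherits the kink. The names `w_hat_scalar` and `one_sided_slopes` are hypothetical.

```python
import numpy as np

def w_hat_scalar(theta, y=1.0, lam=1.0):
    """Solution of min_w 0.5*(y - w)**2 + lam*|theta*w|: soft-thresholding of y at lam*|theta|."""
    return np.sign(y) * np.maximum(np.abs(y) - lam * np.abs(theta), 0.0)

def one_sided_slopes(f, x, h=1e-6):
    """Left and right finite-difference slopes of a scalar function f at x."""
    left = (f(x) - f(x - h)) / h
    right = (f(x + h) - f(x)) / h
    return left, right

# With y = lam = 1 the kink sits at theta = 1: left slope close to -1, right slope 0.
left, right = one_sided_slopes(w_hat_scalar, 1.0)
```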

Approximate Bilevel Problem

Upper-level problem:
$$\min_{\theta = [\theta_1 \cdots \theta_L] \in \Theta} \; U_K(\theta) := \sum_{t=1}^{T} \mathcal{E}_t\big(w_t^{(K)}(\theta)\big), \quad \text{where } w_t^{(K)}(\theta) \to \hat w_t(\theta) \text{ as } K \to \infty.$$
Dual algorithm:
  $u^{(0)}(\theta)$ chosen arbitrarily
  for $k = 0, 1, \dots, K-1$:
    $u^{(k+1)}(\theta) = A(u^{(k)}(\theta), \theta)$  (dual update)
  $[w_1^{(K)}(\theta) \cdots w_T^{(K)}(\theta)] = B(u^{(K)}(\theta), \theta)$  (primal-dual relationship)
Goals:
- find smooth maps $A$ and $B$, so that $w^{(K)}$ is smooth and hence $U_K$ is smooth;
- prove that the approximate bilevel scheme converges.
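The structure of this scheme (iterate a dual map $A$ for $K$ steps, then read off the primal variables through $B$) can be illustrated with a plain Euclidean dual forward-backward method, i.e. projected gradient on the dual. This is a stand-in, not the Bregman-distance algorithm of the talk: its projection onto the $\ell_2$ ball of radius $\lambda$ is nonsmooth, which is exactly what a smooth choice of $A$ is meant to avoid. The sketch treats a single task (for fixed $\theta$ the $T$ lower-level problems decouple) and adds a ridge term $\frac{\varepsilon}{2}\|w\|^2$ so that $B$ is a well-defined linear map; the ridge term, the step-size rule and the name `dual_unrolled` are assumptions.

```python
import numpy as np

def dual_unrolled(X, y, theta, lam, eps=0.1, K=500):
    """Run K steps of dual forward-backward (projected gradient on the dual) for
        min_w 0.5*||y - X w||^2 + (eps/2)*||w||^2 + lam * sum_l ||theta_l * w||_2
    for a single task. theta has shape (P, L); column l is theta_l.
    The ridge term eps > 0 is an assumption of this sketch (it makes B well defined).
    Returns the dual variable u of shape (P, L) and the primal estimate w = B(u).
    """
    P, L = theta.shape
    H = X.T @ X + eps * np.eye(P)      # Hessian of the smooth part (strongly convex)
    H_inv = np.linalg.inv(H)
    Xty = X.T @ y
    # Step 1/Lip, with Lip <= max_p sum_l theta[p, l]**2 / lambda_min(H)
    tau = np.linalg.eigvalsh(H)[0] / np.max(np.sum(theta ** 2, axis=1))

    def B(u):
        # Primal-dual relationship: w = grad f*(-sum_l theta_l * u_l)
        return H_inv @ (Xty - np.sum(theta * u, axis=1))

    def A(u):
        # Forward step on the minimization-form dual, whose gradient in u_l is -theta_l * w
        v = u + tau * theta * B(u)[:, None]
        # Backward step: project each column onto the l2 ball of radius lam
        # (Euclidean projection, which is NOT smooth in u)
        scale = np.maximum(np.linalg.norm(v, axis=0) / lam, 1.0)
        return v / scale

    u = np.zeros((P, L))               # u^(0): any feasible start, here zero
    for _ in range(K):                 # u^(k+1) = A(u^(k))
        u = A(u)
    return u, B(u)
```

For a feasible $u$ (every column satisfies $\|u_l\|_2 \le \lambda$) and $w = B(u)$, the duality gap equals $\sum_l \big(\lambda\|\theta_l \odot w\|_2 - \langle u_l, \theta_l \odot w\rangle\big) \ge 0$, which gives a simple sanity check on the iterates.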

Contributions

- A bilevel framework for estimating the Group Lasso structure.
- Design of a dual forward-backward algorithm with Bregman distances such that $A$ and $B$ are smooth, hence $U_K$ is smooth.
- Convergence of the approximation as $K \to \infty$: (1) $\min U_K \to \min U$; (2) $\operatorname{argmin} U_K \to \operatorname{argmin} U$.
- Implementation of the proxSAGA algorithm, a nonconvex stochastic variant of the projected gradient step $\theta^{(q+1)} = P_\Theta\big(\theta^{(q)} - \gamma \nabla U_K(\theta^{(q)})\big)$.
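The last step can be sketched generically. Since $U_K$ is a sum over tasks, a SAGA-type method samples one term per iteration, corrects its gradient with a table of stored per-term gradients, and then applies $P_\Theta$. The NumPy sketch below is an illustration under stated assumptions, not the authors' implementation: the name `prox_saga`, the choice of $\Theta$ as a box (the slides do not specify $\Theta$), and the toy convex quadratic terms used in place of the task errors are all hypothetical. It minimizes the average $\frac{1}{n}\sum_i f_i$, so for $U_K$, a plain sum, the step size should be scaled accordingly.

```python
import numpy as np

def prox_saga(grad_i, theta0, n_terms, step, n_iter, project, seed=0):
    """SAGA with a projection step for min_{theta in Theta} (1/n) * sum_i f_i(theta).

    grad_i(i, theta): gradient of the i-th term (hypothetically, one task's term of U_K).
    project: Euclidean projection onto Theta (plays the role of P_Theta).
    """
    rng = np.random.default_rng(seed)
    theta = project(np.array(theta0, dtype=float))
    table = np.stack([grad_i(i, theta) for i in range(n_terms)])  # stored gradients
    avg = table.mean(axis=0)
    for _ in range(n_iter):
        i = rng.integers(n_terms)
        g_new = grad_i(i, theta)
        g = g_new - table[i] + avg             # SAGA variance-reduced gradient estimate
        avg = avg + (g_new - table[i]) / n_terms
        table[i] = g_new
        theta = project(theta - step * g)      # theta <- P_Theta(theta - step * g)
    return theta

# Toy usage: f_i(theta) = 0.5 * ||theta - c_i||^2 over the box [0, 1]^2.
c = np.array([[0.0, 2.0], [0.6, 1.0], [0.9, 3.0]])
theta_hat = prox_saga(lambda i, th: th - c[i], np.zeros(2), n_terms=3,
                      step=0.3, n_iter=3000, project=lambda th: np.clip(th, 0.0, 1.0))
# The constrained minimizer is the projection of the mean [0.5, 2.0] onto the box.
```

In this toy problem the iterates approach $[0.5, 1.0]$; in the bilevel setting each `grad_i` call would instead require differentiating through the $K$ unrolled dual steps of one task.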

Numerical Experiment

Setting: $T = 500$ tasks, $N = 25$ noisy observations, $P = 50$ features. Goal: estimate the group structure and group the features into at most $L = 10$ groups.
[Figure: estimated group structure over the 50 features and at most 10 groups.]

Conclusion

Thank you! Our poster (AB #92) will be presented in Room 210 & 230 at 5 pm.
