

SLIDE 1

Screening Rules for Lasso with Non-Convex Sparse Regularizers

Joseph Salmon, Université de Montpellier (http://josephsalmon.eu)
Joint work with A. Rakotomamonjy and G. Gasso


SLIDE 2

Motivation and objective

Lasso and screening
◮ Learning sparse regression models: $X \in \mathbb{R}^{n \times d}$, $y \in \mathbb{R}^n$,
$$\min_{w=(w_1,\dots,w_d)^\top \in \mathbb{R}^d} \ \frac{1}{2}\|y - Xw\|^2 + \lambda \sum_{j=1}^{d} |w_j|$$
◮ Safe screening rules (1), (2): identify vanishing coordinates of a/the solution by exploiting sparsity, convexity and duality

Extension to non-convex regularizers:
◮ non-convex regularizers lead to statistically better models, but...
◮ how to perform screening when the regularizer is non-convex?

[Figure: the $\ell_1$, log-sum and MCP penalties plotted on $[-2, 2]$]
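To make the comparison concrete, here is a minimal plotting sketch (not from the slides) reproducing such a figure; the log-sum and MCP formulas follow the cited references, while the values of λ, θ and γ are illustrative choices.

```python
import numpy as np
import matplotlib.pyplot as plt

t = np.linspace(-2, 2, 401)
lam, theta, gamma = 1.0, 0.5, 1.5  # illustrative parameter values

l1 = lam * np.abs(t)
logsum = lam * np.log(1 + np.abs(t) / theta)            # log-sum penalty (LSP)
mcp = np.where(np.abs(t) <= gamma * lam,                # MCP: linear minus quadratic near 0,
               lam * np.abs(t) - t ** 2 / (2 * gamma),  # constant beyond gamma * lam
               gamma * lam ** 2 / 2)

for curve, label in [(l1, "l1"), (logsum, "logsum"), (mcp, "mcp")]:
    plt.plot(t, curve, label=label)
plt.legend()
plt.show()
```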

(1). L. El Ghaoui, V. Viallon and T. Rabbani. “Safe feature elimination in sparse supervised learning”. In: Pacific Journal of Optimization 8 (2012), pp. 667-698.
(2). A. Bonnefoy et al. “Dynamic screening: Accelerating first-order algorithms for the lasso and group-lasso”. In: IEEE Trans. Signal Process. 63.19 (2015), pp. 5121-5132.

SLIDE 3

Non-convex sparse regression

Non-convex regularization: $r_\lambda(\cdot)$ smooth and concave on $[0, \infty)$,
$$\min_{w \in \mathbb{R}^d} \ \frac{1}{2}\|y - Xw\|^2 + \sum_{j=1}^{d} r_\lambda(|w_j|)$$
Examples (two of them are sketched in code below):
◮ Log-Sum Penalty (LSP) (3)
◮ Smoothly Clipped Absolute Deviation (SCAD) (4)
◮ capped-$\ell_1$ penalty (5)
◮ Minimax Concave Penalty (MCP) (6)
Rem: for pros and cons of such formulations, cf. Soubies et al. (7)
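The MM algorithm on the next slide only uses the derivative $r'_\lambda$ on $(0, \infty)$. Here is a small sketch of $r_\lambda$ and $r'_\lambda$ for LSP and MCP under their usual parameterizations (the shape parameters θ and γ and default values are assumptions for illustration):

```python
import numpy as np

def lsp(t, lam=1.0, theta=0.5):
    """Log-sum penalty r_lambda(|t|) and its derivative r'_lambda(|t|)."""
    a = np.abs(t)
    r = lam * np.log(1 + a / theta)
    dr = lam / (theta + a)  # decreasing: large coefficients get small weights
    return r, dr

def mcp(t, lam=1.0, gamma=1.5):
    """Minimax concave penalty and its derivative."""
    a = np.abs(t)
    r = np.where(a <= gamma * lam,
                 lam * a - a ** 2 / (2 * gamma),
                 gamma * lam ** 2 / 2)
    dr = np.maximum(lam - a / gamma, 0.0)  # exactly 0 for |t| >= gamma * lam,
    return r, dr                           # which is why MCP is hard to screen (cf. Conclusion)
```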

(3). E. J. Candès, M. B. Wakin and S. P. Boyd. “Enhancing Sparsity by Reweighted $\ell_1$ Minimization”. In: J. Fourier Anal. Applicat. 14.5-6 (2008), pp. 877-905.
(4). J. Fan and R. Li. “Variable selection via nonconcave penalized likelihood and its oracle properties”. In: J. Amer. Statist. Assoc. 96.456 (2001), pp. 1348-1360.
(5). T. Zhang. “Analysis of multi-stage convex relaxation for sparse regularization”. In: Journal of Machine Learning Research 11 (2010), pp. 1081-1107.
(6). C.-H. Zhang. “Nearly unbiased variable selection under minimax concave penalty”. In: Ann. Statist. 38.2 (2010), pp. 894-942.
(7). E. Soubies, L. Blanc-Féraud and G. Aubert. “A Unified View of Exact Continuous Penalties for $\ell_2$-$\ell_0$ Minimization”. In: SIAM J. Optim. 27.3 (2017), pp. 2034-2060.

SLIDE 4

Majorization-Minimization

Algorithm: Majorization-Minimization (a runnable sketch follows below)
  input: max. iterations $k_{\max}$, stopping criterion $\epsilon$, $\alpha$, $w^0 (= 0)$
  for $k = 0, \dots, k_{\max} - 1$ do
    Break if stopping criterion smaller than $\epsilon$
    $\lambda_j^k \leftarrow r'_\lambda(|w_j^k|)$  // Majorization
    $w^{k+1} \leftarrow \arg\min_{w \in \mathbb{R}^d} \ \frac{1}{2}\|y - Xw\|^2 + \frac{1}{2\alpha}\|w - w^k\|^2 + \sum_{j=1}^{d} \lambda_j^k |w_j|$  // Minimization
  return $w^k$

Majorization: $r_\lambda(|w_j|) \le r_\lambda(|w_j^k|) + r'_\lambda(|w_j^k|)\,(|w_j| - |w_j^k|)$

Minimization: weighted-Lasso formulation

Rem: $\frac{1}{2\alpha}\|w - w^k\|^2$ acts as a regularization for MM (8) (other majorization alternatives are possible, e.g., with gradient information)
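A minimal runnable sketch of this MM scheme. For self-containedness the inner weighted Lasso is solved here by proximal gradient (ISTA), whereas the experiments later use coordinate descent; `penalty_grad` is a hypothetical helper returning the weights $r'_\lambda(|w_j^k|)$, e.g., `dr` from the penalty sketch above.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of the weighted l1 norm (t may be a vector)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def mm_weighted_lasso(X, y, penalty_grad, alpha=1.0, k_max=20, n_inner=200, eps=1e-8):
    n, d = X.shape
    w = np.zeros(d)
    step = 1.0 / (np.linalg.norm(X, 2) ** 2 + 1.0 / alpha)  # 1 / Lipschitz constant
    for k in range(k_max):
        lam_k = penalty_grad(np.abs(w))   # majorization: per-coordinate weights
        w_anchor = w.copy()               # w^k, anchor of the proximal term
        for _ in range(n_inner):          # minimization: weighted Lasso via ISTA
            grad = X.T @ (X @ w - y) + (w - w_anchor) / alpha
            w_new = soft_threshold(w - step * grad, step * lam_k)
            if np.max(np.abs(w_new - w)) < eps:
                w = w_new
                break
            w = w_new
    return w
```

For instance, `mm_weighted_lasso(X, y, lambda a: 1.0 / (0.5 + a))` runs MM with the LSP weights of the previous sketch ($\lambda = 1$, $\theta = 0.5$).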

(8). Y. Kang, Z. Zhang and W.-J. Li. “On the global convergence of majorization minimization algorithms for nonconvex optimization problems”. In: arXiv preprint arXiv:1504.07791 (2015).

SLIDE 5

Safe Screening / Two-level screening

Safe Screening: for Lasso problems, vanishing coefficients at optimality can be certified without knowing the solution
◮ prior computation starting from a similar set of tuning parameters (sequential (9) / dual warm start)
◮ along the optimization algorithm (dynamic (10))
State-of-the-art safe screening rules rely on the duality gap (11)

Two-level screening for non-convex cases:
◮ Inner-level screening: within each (weighted) Lasso
◮ Outer-level screening: propagate information between Lassos

(9). L. El Ghaoui, V. Viallon and T. Rabbani. “Safe feature elimination in sparse supervised learning”. In: Pacific Journal of Optimization 8 (2012), pp. 667-698.
(10). A. Bonnefoy et al. “Dynamic screening: Accelerating first-order algorithms for the lasso and group-lasso”. In: IEEE Trans. Signal Process. 63.19 (2015), pp. 5121-5132.
(11). E. Ndiaye et al. “Gap Safe screening rules for sparsity enforcing penalties”. In: Journal of Machine Learning Research 18.128 (2017), pp. 1-33.


SLIDE 7

Notation

Notation: $X = [x_1, \dots, x_d]$, $\Lambda = (\lambda_1, \dots, \lambda_d)^\top$, $s \in \mathbb{R}^n$, $v \in \mathbb{R}^d$

Inner (convex) problems:

(Primal) $\ P_\Lambda(w) \triangleq \frac{1}{2}\|y - Xw\|^2 + \frac{1}{2\alpha}\|w - w^k\|^2 + \sum_{j=1}^{d} \lambda_j |w_j|$

(Dual) $\ D_\Lambda(s, v) \triangleq -\frac{1}{2}\|s\|^2 - \frac{\alpha}{2}\|v\|^2 + s^\top y - v^\top w^k \ $ s.t. $\ |X^\top s - v| \preceq \Lambda$ (elementwise)

(Dual gap) $\ G_\Lambda(w, s, v) \triangleq P_\Lambda(w) - D_\Lambda(s, v)$
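In code, these three quantities read as follows (a minimal sketch under the notation above; dual feasibility of $(s, v)$ is assumed rather than checked):

```python
import numpy as np

def primal(X, y, w, w_k, lam, alpha):
    """P_Lambda(w): weighted-Lasso objective with the proximal term."""
    return (0.5 * np.sum((y - X @ w) ** 2)
            + np.sum((w - w_k) ** 2) / (2 * alpha)
            + np.sum(lam * np.abs(w)))

def dual(y, s, v, w_k, alpha):
    """D_Lambda(s, v); feasibility |X^T s - v| <= Lambda is assumed, not checked."""
    return -0.5 * np.sum(s ** 2) - 0.5 * alpha * np.sum(v ** 2) + s @ y - v @ w_k

def dual_gap(X, y, w, s, v, w_k, lam, alpha):
    """G_Lambda(w, s, v) = P_Lambda(w) - D_Lambda(s, v), >= 0 for feasible (s, v)."""
    return primal(X, y, w, w_k, lam, alpha) - dual(y, s, v, w_k, alpha)
```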

SLIDE 10

Screening weighted Lasso

◮ Primal optimization problem $P_\Lambda(w)$:
$$\tilde{w} \leftarrow \arg\min_{w \in \mathbb{R}^d} \ \frac{1}{2}\|y - Xw\|^2 + \frac{1}{2\alpha}\|w - w^k\|^2 + \sum_{j=1}^{d} \lambda_j |w_j|$$
Screening test: $|x_j^\top \tilde{s} - \tilde{v}_j| < \lambda_j \implies \tilde{w}_j = 0$ (impractical),
with $\tilde{s} \triangleq \frac{y - X\tilde{w}}{\rho(\Lambda)}$ and $\tilde{v} \triangleq \frac{\tilde{w} - w^k}{\alpha\,\rho(\Lambda)}$ (for a well-chosen scalar $\rho(\Lambda)$)

◮ (Practical) dynamic Gap safe screening test (12), (13), sketched in code below:
$$\underbrace{|x_j^\top s - v_j| + \sqrt{2\, G_\Lambda(w, s, v)}\left(\|x_j\| + \frac{1}{\alpha}\right)}_{T_j^{(\Lambda)}(w, s, v)} < \lambda_j$$
given a primal-dual approximate solution triplet $(w, s, v)$
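A sketch of the practical test, reusing the hypothetical `dual_gap` helper above; the feasible pair $(s, v)$ is built by rescaling with $\rho(\Lambda)$ as detailed in the appendix, and taking $\rho \ge 1$ is one simple choice that leaves $s$ equal to the residual once it is already feasible.

```python
import numpy as np

def gap_safe_screen(X, y, w, w_k, lam, alpha):
    """Boolean mask of coordinates certified to vanish at optimality (all lam_j > 0)."""
    residual = y - X @ w
    # rescale the natural candidates so that (s, v) is dual feasible
    rho = max(np.max(np.abs(X.T @ residual - (w - w_k) / alpha) / lam), 1.0)
    s, v = residual / rho, (w - w_k) / (alpha * rho)
    gap = max(dual_gap(X, y, w, s, v, w_k, lam, alpha), 0.0)  # guard against round-off
    T = np.abs(X.T @ s - v) + np.sqrt(2 * gap) * (np.linalg.norm(X, axis=0) + 1.0 / alpha)
    return T < lam
```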

(12). O. Fercoq, A. Gramfort and J. Salmon. “Mind the duality gap: safer rules for the lasso”. In: ICML. 2015, pp. 333-342.
(13). E. Ndiaye et al. “Gap Safe screening rules for sparsity enforcing penalties”. In: Journal of Machine Learning Research 18.128 (2017), pp. 1-33.


SLIDE 12

Inner level screening and speed-ups

◮ After iteration $k$, one receives approximate solutions $w^k$, $s^k$ and $v^k$ for the weighted Lasso with weights $\Lambda^k$
Set of screened variables:
$$S \triangleq \left\{\, j \in \{1, \dots, d\} : T_j^{(\Lambda^k)}(w^k, s^k, v^k) < \lambda_j^k \,\right\}$$
◮ Speed-ups: reduce the weighted-Lasso problem size by substituting $X \leftarrow X_{S^c}$ (see the sketch below)
Rem: most beneficial with coordinate-descent-type solvers
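As referenced above, the substitution itself is a one-liner; a minimal sketch assuming the hypothetical `gap_safe_screen` helper above, with `w` the current iterate and `w_anchor` the proximal anchor $w^k$:

```python
screened = gap_safe_screen(X, y, w, w_anchor, lam_k, alpha)  # inner-level screening
keep = ~screened
X_red, lam_red, w_red = X[:, keep], lam_k[keep], w[keep]
# continue the weighted-Lasso solve on (X_red, lam_red) only; screened
# coordinates stay at zero and are re-inserted in the full-size solution
```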


SLIDE 13

Outer screening level / screening propagation

Before iteration $k+1$:
◮ change of weights $\Lambda^{k+1} = \{\lambda_j^{k+1}\}_{j=1,\dots,d}$
◮ update $(w^{k+1}, s^{k+1}, v^{k+1}) \leftarrow \left( w^k, \ \frac{y - Xw^k}{\rho(\Lambda^{k+1})}, \ \frac{w^{k+1} - w^k}{\alpha\,\rho(\Lambda^{k+1})} \right)$

Screening propagation test (sketched in code below):
$$T_j^{(\Lambda^k)}(\hat{w}, \hat{s}, \hat{v}) + \|x_j\|\left(a + \sqrt{2b}\right) + c + \frac{1}{\alpha}\sqrt{2b} < \lambda_j^{k+1}$$
with $\|s^{k+1} - s^k\| \le a$, $\ |G_{\Lambda^k}(w^k, s^k, v^k) - G_{\Lambda^{k+1}}(w^{k+1}, s^{k+1}, v^{k+1})| \le b$, $\ |v_j^{k+1} - v_j^k| \le c$

Rem: same flavor as sequential screening (14)
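As referenced above, a sketch of the propagation test itself; how to compute valid bounds $a$, $b$, $c$ is detailed in the paper, so here they are plain inputs (an assumption of this sketch).

```python
import numpy as np

def propagate_screening(T_prev, col_norms, lam_next, a, b, c, alpha):
    """Screen for the next weighted Lasso from previous test values.

    T_prev[j] = T_j^{(Lambda^k)}(w_hat, s_hat, v_hat); a, b, c bound the dual,
    gap and v shifts between consecutive problems (see the paper).
    """
    lhs = T_prev + col_norms * (a + np.sqrt(2 * b)) + c + np.sqrt(2 * b) / alpha
    return lhs < lam_next
```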

(14). L. El Ghaoui, V. Viallon and T. Rabbani. “Safe feature elimination in sparse supervised learning”. In: Pacific Journal of Optimization 8 (2012), pp. 667-698.

SLIDE 14

Experiments (log-sum penalty)

[Figure: running time over a full regularization path, as a percentage of the time of ncxCD, for tolerances $10^{-3}$, $10^{-4}$, $10^{-5}$; left panel: $n=50$, $d=100$, $p=5$, $\sigma=2.00$; right panel: $n=500$, $d=5000$, $p=5$, $\sigma=2.00$; compared solvers: ncxCD, GIST, MM genuine, MM screening]

◮ ncxCD: coordinate descent
◮ GIST: majorization + iterative soft-thresholding
◮ MM genuine: screening inside proximal weighted-Lasso steps
◮ MM screening: adding screening propagation to the latter


SLIDE 15

Conclusion

◮ First approach for screening with non-convex regularizers
◮ Convexification and propagation
◮ Limits (they exist!): $\lambda_j > 0$ (cannot handle MCP easily)
◮ Variants: active-set extension (15) following Massias et al. (16)
◮ More technical details (17) and code online:

https://github.com/arakotom/screening_ncvx_penalty

(15). A. Rakotomamonjy et al. Provably Convergent Working Set Algorithm for Non-Convex Regularized Regression. Tech. report. 2020.
(16). M. Massias, A. Gramfort and J. Salmon. “Celer: a Fast Solver for the Lasso with Dual Extrapolation”. In: ICML. 2018.
(17). A. Rakotomamonjy, G. Gasso and J. Salmon. “Screening Rules for Lasso with Non-Convex Sparse Regularizers”. In: ICML. Vol. 97. 2019, pp. 5341-5350.

SLIDE 16

BenchOpt: https://benchopt.github.io/

BenchOpt: a package to simplify the comparison of optimization algorithms and make it more transparent and reproducible (18)
Languages available: Python (default), R, Julia, C/C++

(18). J. B. Buckheit and D. L. Donoho. “Wavelab and reproducible research”. In: Wavelets and statistics. Springer, 1995, pp. 55-81.


SLIDE 19

Disclaimer on BenchOpt

Use cases: research, review, fast speed checks on a machine
“For now we handle convex batch methods, but we can do much more with your help (stochastic, non-convex, etc.)” (T. Moreau)
“We are family! Come work with us :)” (A. Gramfort)

Give it a try : https://benchopt.github.io/


SLIDE 20

Papers and code

Contact: Joseph Salmon
joseph.salmon@umontpellier.fr
GitHub: @josephsalmon
Twitter: @salmonjsph
http://josephsalmon.eu


SLIDE 21

References I

Bonnefoy, A. et al. “Dynamic screening: Accelerating first-order algorithms for the lasso and group-lasso”. In: IEEE Trans. Signal Process. 63.19 (2015), pp. 5121-5132.

Buckheit, J. B. and D. L. Donoho. “Wavelab and reproducible research”. In: Wavelets and statistics. Springer, 1995, pp. 55-81.

Candès, E. J., M. B. Wakin and S. P. Boyd. “Enhancing Sparsity by Reweighted $\ell_1$ Minimization”. In: J. Fourier Anal. Applicat. 14.5-6 (2008), pp. 877-905.

El Ghaoui, L., V. Viallon and T. Rabbani. “Safe feature elimination in sparse supervised learning”. In: Pacific Journal of Optimization 8 (2012), pp. 667-698.

Fan, J. and R. Li. “Variable selection via nonconcave penalized likelihood and its oracle properties”. In: J. Amer. Statist. Assoc. 96.456 (2001), pp. 1348-1360.

Fercoq, O., A. Gramfort and J. Salmon. “Mind the duality gap: safer rules for the lasso”. In: ICML. 2015, pp. 333-342.


SLIDE 22

References II

Kang, Y., Z. Zhang and W.-J. Li. “On the global convergence of majorization minimization algorithms for nonconvex optimization problems”. In: arXiv preprint arXiv:1504.07791 (2015).

Massias, M., A. Gramfort and J. Salmon. “Celer: a Fast Solver for the Lasso with Dual Extrapolation”. In: ICML. 2018.

Ndiaye, E. et al. “Gap Safe screening rules for sparsity enforcing penalties”. In: Journal of Machine Learning Research 18.128 (2017), pp. 1-33.

Rakotomamonjy, A., G. Gasso and J. Salmon. “Screening Rules for Lasso with Non-Convex Sparse Regularizers”. In: ICML. Vol. 97. 2019, pp. 5341-5350.

Rakotomamonjy, A. et al. Provably Convergent Working Set Algorithm for Non-Convex Regularized Regression. Tech. report. 2020.


SLIDE 23

References III

Soubies, E., L. Blanc-Féraud and G. Aubert. “A Unified View of Exact Continuous Penalties for $\ell_2$-$\ell_0$ Minimization”. In: SIAM J. Optim. 27.3 (2017), pp. 2034-2060.

Zhang, C.-H. “Nearly unbiased variable selection under minimax concave penalty”. In: Ann. Statist. 38.2 (2010), pp. 894-942.

Zhang, T. “Analysis of multi-stage convex relaxation for sparse regularization”. In: Journal of Machine Learning Research 11 (2010), pp. 1081-1107.


SLIDE 24

Appendix

Computation of $\rho$, needed for dual feasibility (a code sketch follows below):
$$j^\dagger = \arg\max_{j : \lambda_j > 0} \ \underbrace{\frac{1}{\lambda_j} \left| x_j^\top (y - X\hat{w}) - \frac{1}{\alpha}\left(\hat{w}_j - w_j^k\right) \right|}_{\rho_\Lambda(j)} \qquad (1)$$
with $w^k$ coming from the previous problem, i.e., solving:
$$P_\Lambda(w) \triangleq \frac{1}{2}\|y - Xw\|^2 + \frac{1}{2\alpha}\|w - w^k\|^2 + \sum_{j=1}^{d} \lambda_j |w_j|$$
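Mirroring equation (1), a minimal sketch (assuming $\hat{w}$ is the approximate primal solution and that only coordinates with $\lambda_j > 0$ compete for the max, as in the conclusion's caveat):

```python
import numpy as np

def rho_candidates(X, y, w_hat, w_k, lam, alpha):
    """Values rho_Lambda(j) from equation (1), restricted to j with lam_j > 0."""
    scores = np.abs(X.T @ (y - X @ w_hat) - (w_hat - w_k) / alpha)
    rho = np.full(X.shape[1], -np.inf)
    active = lam > 0
    rho[active] = scores[active] / lam[active]
    j_dagger = int(np.argmax(rho))  # maximizer j^dagger; rho[j_dagger] rescales (s, v)
    return rho, j_dagger
```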
