

  1. Screening Rules for Lasso with Non-Convex Sparse Regularizers
     Joseph Salmon, http://josephsalmon.eu, Université de Montpellier
     Joint work with A. Rakotomamonjy and G. Gasso

  2. Motivation and objective
     Lasso and screening
     ◮ Learning sparse regression models: X ∈ R^{n×d}, y ∈ R^n,
           min_{w = (w_1, ..., w_d)^⊤ ∈ R^d}  (1/2)‖y − Xw‖² + λ Σ_{j=1}^{d} |w_j|
     ◮ Safe screening rules (1), (2): identify vanishing coordinates of a/the solution by exploiting sparsity, convexity and duality
     Extension to non-convex regularizers:
     ◮ non-convex regularizers lead to statistically better models, but ...
     ◮ how to perform screening when the regularizer is non-convex?
     [Figure: the ℓ1, log-sum and MCP penalties plotted on [−2, 2]]
     (1). L. El Ghaoui, V. Viallon and T. Rabbani. "Safe feature elimination in sparse supervised learning". In: Pacific Journal of Optimization 8 (2012), pp. 667-698.
     (2). Antoine Bonnefoy et al. "Dynamic screening: Accelerating first-order algorithms for the lasso and group-lasso". In: IEEE Trans. Signal Process. 63.19 (2015), pp. 5121-5132.

  3. Non-convex sparse regression
     Non-convex regularization: r_λ(·) smooth and concave on [0, ∞),
           min_{w ∈ R^d}  (1/2)‖y − Xw‖² + Σ_{j=1}^{d} r_λ(|w_j|)
     Examples:
     ◮ Log-Sum Penalty (LSP) (3)
     ◮ Smoothly Clipped Absolute Deviation (SCAD) (4)
     ◮ capped-ℓ1 penalty (5)
     ◮ Minimax Concave Penalty (MCP) (6)
     Rem: for pros and cons of such formulations, cf. Soubies et al. (7). Two of these penalties are sketched in code after this slide.
     (3). Emmanuel J. Candès, Michael B. Wakin and Stephen P. Boyd. "Enhancing Sparsity by Reweighted ℓ1 Minimization". In: J. Fourier Anal. Applicat. 14.5-6 (2008), pp. 877-905.
     (4). Jianqing Fan and Runze Li. "Variable selection via nonconcave penalized likelihood and its oracle properties". In: J. Amer. Statist. Assoc. 96.456 (2001), pp. 1348-1360.
     (5). Tong Zhang. "Analysis of multi-stage convex relaxation for sparse regularization". In: Journal of Machine Learning Research 11.Mar (2010), pp. 1081-1107.
     (6). Cun-Hui Zhang. "Nearly unbiased variable selection under minimax concave penalty". In: Ann. Statist. 38.2 (2010), pp. 894-942.
     (7). E. Soubies, L. Blanc-Féraud and G. Aubert. "A Unified View of Exact Continuous Penalties for ℓ2-ℓ0 Minimization". In: SIAM J. Optim. 27.3 (2017), pp. 2034-2060.
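For concreteness, here is a minimal Python/NumPy sketch (not from the slides) of two of these penalties and of the derivatives that the MM weights on the next slide rely on. The parameterizations θ (LSP) and γ (MCP) follow the standard definitions in (3) and (6); the names are ours.

    import numpy as np

    # Log-Sum Penalty (LSP): r_lam(t) = lam * log(1 + t/theta) for t >= 0,
    # smooth and concave on [0, inf); theta > 0 is its shape parameter.
    def lsp(t, lam, theta):
        return lam * np.log1p(t / theta)

    def lsp_prime(t, lam, theta):        # derivative r'_lam, used as MM weight
        return lam / (theta + t)

    # Minimax Concave Penalty (MCP): quadratic up to gamma*lam, constant after.
    def mcp(t, lam, gamma):
        return np.where(t <= gamma * lam,
                        lam * t - t ** 2 / (2 * gamma),
                        gamma * lam ** 2 / 2)

    def mcp_prime(t, lam, gamma):        # derivative vanishes past gamma*lam
        return np.maximum(lam - t / gamma, 0.0)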

  4. Majorization-Minimization
     Algorithm: Majorization-Minimization
       input: max. iterations k_max, stopping criterion ε, α, w^0 (= 0)
       for k = 0, ..., k_max − 1 do
         break if stopping criterion smaller than ε
         λ_j^k ← r'_λ(|w_j^k|)                                                        // Majorization
         w^{k+1} ← argmin_{w ∈ R^d}  (1/2)‖y − Xw‖² + (1/(2α))‖w − w^k‖² + Σ_{j=1}^{d} λ_j^k |w_j|   // Minimization
       return w^k
     Majorization: r_λ(|w_j|) ≤ r_λ(|w_j^k|) + r'_λ(|w_j^k|)(|w_j| − |w_j^k|)
     Minimization: weighted-Lasso formulation
     Rem: the term (1/(2α))‖w − w^k‖² acts as a regularization for MM (8) (other majorization alternatives are possible, e.g., with gradient information). A runnable sketch follows.
     (8). Yangyang Kang, Zhihua Zhang and Wu-Jun Li. "On the global convergence of majorization minimization algorithms for nonconvex optimization problems". In: arXiv preprint arXiv:1504.07791 (2015).
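The slides do not fix the inner solver; as a rough, runnable sketch of the loop above, the weighted inner problem can be solved by ISTA (proximal gradient), whose step size follows from the Lipschitz constant ‖X‖² + 1/α of the smooth part of the displayed objective. Function and variable names are ours.

    import numpy as np

    def mm_weighted_lasso(X, y, r_prime, alpha=1.0, k_max=20, eps=1e-6,
                          inner_iters=1000):
        """MM loop of slide 4 (sketch): majorize r_lam by its tangent at
        |w^k|, then solve the resulting weighted Lasso (+ proximal term)."""
        d = X.shape[1]
        w = np.zeros(d)                                # w^0 = 0
        L = np.linalg.norm(X, 2) ** 2 + 1.0 / alpha    # Lipschitz constant of the smooth part
        for k in range(k_max):
            lam = r_prime(np.abs(w))                   # majorization: lam_j^k = r'_lam(|w_j^k|)
            w_k = w.copy()
            for _ in range(inner_iters):               # minimization: ISTA on the inner problem
                grad = X.T @ (X @ w - y) + (w - w_k) / alpha
                z = w - grad / L
                w = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
            if np.linalg.norm(w - w_k) < eps:          # stopping criterion on the MM iterates
                break
        return w

For instance, with the log-sum weights above: mm_weighted_lasso(X, y, lambda t: lsp_prime(t, 0.1, 1.0)).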

  5.-6. Safe Screening / Two-level screening
     Safe screening: for Lasso problems, vanishing coefficients at optimality can be certified without knowing the solution,
     ◮ from a prior computation at a similar value of the tuning parameter (sequential (9) / dual warm start)
     ◮ along the optimization algorithm (dynamic (10))
     State-of-the-art safe screening rules rely on the duality gap (11).
     Two-level screening for the non-convex case:
     ◮ Inner-level screening: within each (weighted) Lasso
     ◮ Outer-level screening: propagate information between Lassos
     (9). L. El Ghaoui, V. Viallon and T. Rabbani. "Safe feature elimination in sparse supervised learning". In: Pacific Journal of Optimization 8 (2012), pp. 667-698.
     (10). Antoine Bonnefoy et al. "Dynamic screening: Accelerating first-order algorithms for the lasso and group-lasso". In: IEEE Trans. Signal Process. 63.19 (2015), pp. 5121-5132.
     (11). E. Ndiaye et al. "Gap Safe screening rules for sparsity enforcing penalties". In: Journal of Machine Learning Research 18.128 (2017), pp. 1-33.

  7.-9. Notation
     Notation: X = [x_1, ..., x_d], Λ = (λ_1, ..., λ_d)^⊤, s ∈ R^n, v ∈ R^d
     Inner (convex) problems:
           P_Λ(w) := (1/2)‖y − Xw‖² + (1/(2α))‖w − w^k‖² + Σ_{j=1}^{d} λ_j |w_j|          (Primal)
           D_Λ(s, v) := −(1/2)‖s‖² − (α/2)‖v‖² + s^⊤ y − v^⊤ w^k   s.t. |X^⊤ s − v| ≤ Λ   (Dual)
           G_Λ(w, s, v) := P_Λ(w) − D_Λ(s, v)                                             (Duality gap)
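A direct transcription of the three quantities as a sketch; checking the dual feasibility |X^⊤s − v| ≤ Λ is left to the caller, and the α scalings are read off the formulas above.

    import numpy as np

    def primal_value(X, y, w, w_k, Lam, alpha):
        """P_Lam(w): inner weighted-Lasso objective with proximal term."""
        return (0.5 * np.sum((y - X @ w) ** 2)
                + np.sum((w - w_k) ** 2) / (2 * alpha)
                + Lam @ np.abs(w))

    def dual_value(y, w_k, s, v, alpha):
        """D_Lam(s, v); only meaningful if |X.T @ s - v| <= Lam holds."""
        return (-0.5 * np.sum(s ** 2) - 0.5 * alpha * np.sum(v ** 2)
                + s @ y - v @ w_k)

    def duality_gap(X, y, w, w_k, s, v, Lam, alpha):
        """G_Lam(w, s, v) = P_Lam(w) - D_Lam(s, v)."""
        return (primal_value(X, y, w, w_k, Lam, alpha)
                - dual_value(y, w_k, s, v, alpha))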

  10.-11. Screening weighted Lasso
     ◮ Primal optimization problem P_Λ(w):
           w̃ ← argmin_{w ∈ R^d}  (1/2)‖y − Xw‖² + (1/(2α))‖w − w^k‖² + Σ_{j=1}^{d} λ_j |w_j|
     Screening test: |x_j^⊤ s̃ − ṽ_j| < λ_j  ⟹  w̃_j = 0   (impractical)
     with s̃ := (y − Xw̃)/ρ(Λ), ṽ := (w̃ − w^k)/(α ρ(Λ)) (for a well-chosen scalar ρ(Λ))
     ◮ (Practical) dynamic Gap-safe screening test (12), (13):
           T_j^{(Λ)}(w, s, v) := |x_j^⊤ s − v_j| + (‖x_j‖ + 1/√α) √(2 G_Λ(w, s, v)) < λ_j
     given a primal-dual approximate solution triplet (w, s, v).
     (12). O. Fercoq, A. Gramfort and J. Salmon. "Mind the duality gap: safer rules for the lasso". In: ICML. 2015, pp. 333-342.
     (13). E. Ndiaye et al. "Gap Safe screening rules for sparsity enforcing penalties". In: Journal of Machine Learning Research 18.128 (2017), pp. 1-33.

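In code, the test is a cheap vector comparison; a sketch reusing duality_gap from above, taking the test exactly as displayed on this slide:

    import numpy as np

    def gap_safe_tests(X, y, w, w_k, s, v, Lam, alpha):
        """Vector of T_j^(Lam)(w, s, v); T_j < Lam_j certifies w_j = 0."""
        G = duality_gap(X, y, w, w_k, s, v, Lam, alpha)
        radius = np.sqrt(2 * max(G, 0.0))          # guard against tiny negative gaps
        col_norms = np.linalg.norm(X, axis=0)      # ||x_j|| for each column
        return np.abs(X.T @ s - v) + (col_norms + 1 / np.sqrt(alpha)) * radius

gap_safe_tests(...) < Lam then gives the boolean mask of coordinates certified to vanish.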

  12. Inner level screening and speed-ups
     ◮ After iteration k, one receives approximate solutions w^k, s^k and v^k for the weighted Lasso with weights Λ^k.
     Set of screened variables:
           S := { j ∈ {1, ..., d} : T_j^{(Λ^k)}(w^k, s^k, v^k) < λ_j^k }
     ◮ Speed-up: the weighted Lasso problem size is reduced by substituting X ← X_{S^c} (see the sketch below).
     Rem: most beneficial with coordinate-descent-type solvers.
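A sketch of the resulting problem reduction (names are ours):

    import numpy as np

    def reduce_problem(X, Lam_k, tests):
        """Substitute X <- X_{S^c}: drop the columns certified zero by the
        inner test values `tests` (e.g. from gap_safe_tests)."""
        keep = ~(tests < Lam_k)                    # S^c, the surviving coordinates
        return X[:, keep], Lam_k[keep], keep

After solving the reduced weighted Lasso for w_red, scatter back with w = np.zeros(d); w[keep] = w_red, so screened coordinates stay exactly zero.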

  13. Outer screening level / screening propagation
     Before iteration k + 1:
     ◮ change of weights Λ^{k+1} = (λ_j^{k+1})_{j=1,...,d}
     ◮ update of the primal-dual triplet:
           (w^{k+1}, s^{k+1}, v^{k+1}) ← ( w^k, (y − Xw^k)/ρ(Λ^{k+1}), (w^{k+1} − w^k)/(α ρ(Λ^{k+1})) )
     Screening propagation test:
           T_j^{(Λ^k)}(ŵ, ŝ, v̂) + ‖x_j‖(a + √(2b)) + c_j + √(2b)/√α < λ_j^{k+1}
     with
           ‖s^{k+1} − s^k‖ ≤ a,
           |G_{Λ^k}(w^k, s^k, v^k) − G_{Λ^{k+1}}(w^{k+1}, s^{k+1}, v^{k+1})| ≤ b,
           |v_j^{k+1} − v_j^k| ≤ c_j.
     Rem: same flavor as sequential screening (14). A sketch of the test follows.
     (14). L. El Ghaoui, V. Viallon and T. Rabbani. "Safe feature elimination in sparse supervised learning". In: Pacific Journal of Optimization 8 (2012), pp. 667-698.
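As a sketch, given the bounds a, b and c_j (whose computation the slide leaves to the paper), the propagation test is again a cheap vector comparison; names are ours.

    import numpy as np

    def propagation_tests(T_prev, X, a, b, c, alpha):
        """Outer-level test values; compare against the next weights Lam_next.
        T_prev: inner test values T_j^(Lam^k) at (w_hat, s_hat, v_hat);
        a, b: scalar bounds on the dual shift and gap variation; c: vector of c_j."""
        col_norms = np.linalg.norm(X, axis=0)
        return (T_prev + col_norms * (a + np.sqrt(2 * b))
                + c + np.sqrt(2 * b) / np.sqrt(alpha))

propagation_tests(...) < Lam_next then screens coordinates of the next weighted Lasso before solving it.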
