

SLIDE 1

Complexity Analysis of the Lasso Regularization Path

Julien Mairal and Bin Yu (Inria, UC Berkeley). SIAM Optimization, San Diego, May 2014

Julien Mairal, Inria Complexity Analysis of the Lasso Regularization Path 1/15

SLIDE 2

What this work is about

• yet another paper about the Lasso/Basis Pursuit [Tibshirani, 1996, Chen et al., 1999]:

$$\min_{w \in \mathbb{R}^p} \ \frac{1}{2}\|y - Xw\|_2^2 + \lambda\|w\|_1; \qquad (1)$$

• the first complexity analysis of the homotopy method [Ritter, 1962, Osborne et al., 2000, Efron et al., 2004] for solving (1);
• some conclusions reminiscent of the simplex algorithm [Klee and Minty, 1972] and of the SVM regularization path [Gärtner, Jaggi, and Maria, 2010].


SLIDE 3

The Lasso Regularization Path and the Homotopy

Under a uniqueness assumption on the Lasso solution, the regularization path is piecewise linear.

[Figure: piecewise-linear coefficient paths w1, ..., w5 as functions of λ]
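As an illustrative sketch (not code from the talk), piecewise linearity is easiest to see in the special case of an orthonormal design, where the path is given coordinate-wise by soft-thresholding the correlations c = X⊤y at level λ:

```python
import numpy as np

def lasso_orthonormal(X, y, lam):
    """Closed-form Lasso solution when X has orthonormal columns:
    soft-threshold the correlations c = X^T y at level lam."""
    c = X.T @ y
    return np.sign(c) * np.maximum(np.abs(c) - lam, 0.0)

# Orthonormal design (the identity): each coordinate w_j(lam) is
# piecewise linear in lam, with a single kink at lam = |c_j|.
X = np.eye(3)
y = np.array([3.0, 2.0, 1.0])
print(lasso_orthonormal(X, y, 1.5))  # [1.5 0.5 0. ]
```

For general designs the path is still piecewise linear, but the kinks must be tracked by the homotopy algorithm described next.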


SLIDE 4

Our Main Results

Theorem - worst case analysis

In the worst case, the regularization path of the Lasso has exactly (3^p + 1)/2 linear segments.

Proposition - approximate analysis

There exists an ε-approximate path with O(1/√ε) linear segments.




SLIDE 7

Brief Introduction to the Homotopy Algorithm

Piecewise linearity

Under a uniqueness assumption on the Lasso solution, the regularization path λ ↦ w⋆(λ) is continuous and piecewise linear.

Recipe of the homotopy method - main ideas

1. find the trivial solution w⋆(λ∞) = 0 with λ∞ = ‖X⊤y‖∞;
2. compute the direction of the current linear segment of the path;
3. follow the direction of the path while decreasing λ;
4. stop at the next “kink” and go back to 2.
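Steps 1 and 2 can be sketched in a few lines of NumPy; this is an illustrative sketch, not the talk's implementation. While a single variable j is active with sign s_j, the optimality condition x_j⊤(Xw − y) = −λ s_j gives the closed form w_j(λ) = (x_j⊤y − λ s_j)/(x_j⊤x_j) on the first linear segment:

```python
import numpy as np

def first_segment(X, y):
    """Sketch of the first homotopy steps: lambda_inf, the first active
    variable, and the closed-form coefficient on the first linear segment."""
    c = X.T @ y
    lam_inf = np.max(np.abs(c))      # w*(lam) = 0 for lam >= lam_inf
    j = int(np.argmax(np.abs(c)))    # first variable entering the path
    s = np.sign(c[j])                # its sign on the first segment
    # On the first segment, w_j(lam) = (x_j^T y - lam * s) / (x_j^T x_j).
    den = X[:, j] @ X[:, j]
    w_j = lambda lam: (c[j] - lam * s) / den
    return lam_inf, j, w_j

# Two orthogonal columns; correlations c = (2, 1), so lam_inf = 2 and j = 0.
X = np.array([[2.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
y = np.array([1.0, 1.0, 0.0])
lam_inf, j, w_j = first_segment(X, y)
print(lam_inf, j, w_j(1.5))  # 2.0 0 0.125
```

The full algorithm then monitors when an inactive variable's correlation reaches λ, or when an active coefficient hits zero, to locate the next kink.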

Caveats

• kinks can be very close to each other;
• the direction of the path can involve ill-conditioned matrices;
• worst-case exponential complexity (main result of this work).


SLIDE 8

Worst case analysis

Theorem - worst case analysis

In the worst case, the regularization path of the Lasso has exactly (3^p + 1)/2 linear segments.

[Figure: worst-case regularization path for p = 6; kinks marked, coefficient values on a log scale]
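A one-liner makes the exponential blow-up of the segment count (3^p + 1)/2 concrete; for p = 6, the setting of the figure, the path already has 365 segments:

```python
# Worst-case number of linear segments of the Lasso path, (3^p + 1)/2.
print([(3**p + 1) // 2 for p in range(1, 7)])  # [2, 5, 14, 41, 122, 365]
```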


SLIDE 9

Worst case analysis

Consider a Lasso problem (y ∈ ℝⁿ, X ∈ ℝⁿˣᵖ). Define the vector ỹ in ℝⁿ⁺¹ and the matrix X̃ in ℝ⁽ⁿ⁺¹⁾ˣ⁽ᵖ⁺¹⁾ as follows:

$$\tilde{y} \triangleq \begin{bmatrix} y \\ y_{n+1} \end{bmatrix}, \qquad \tilde{X} \triangleq \begin{bmatrix} X & 2\alpha y \\ 0 & \alpha y_{n+1} \end{bmatrix},$$

where $y_{n+1} \neq 0$ and $0 < \alpha < \lambda_1 / (2 y^\top y + y_{n+1}^2)$.

Adversarial strategy

If the regularization path of the Lasso (y, X) has k linear segments, the path of (ỹ, X̃) has 3k − 1 linear segments.


SLIDE 10

Worst case analysis

$$\tilde{y} \triangleq \begin{bmatrix} y \\ y_{n+1} \end{bmatrix}, \qquad \tilde{X} \triangleq \begin{bmatrix} X & 2\alpha y \\ 0 & \alpha y_{n+1} \end{bmatrix}.$$

Let us denote by {η¹, …, ηᵏ} the sequence of k sparsity patterns in {−1, 0, 1}ᵖ encountered along the path of the Lasso (y, X). The new sequence of sparsity patterns for (ỹ, X̃) is

$$\underbrace{\begin{bmatrix}\eta^1{=}0\\0\end{bmatrix}, \begin{bmatrix}\eta^2\\0\end{bmatrix}, \ldots, \begin{bmatrix}\eta^k\\0\end{bmatrix}}_{\text{first } k \text{ patterns}},\quad \underbrace{\begin{bmatrix}\eta^k\\1\end{bmatrix}, \begin{bmatrix}\eta^{k-1}\\1\end{bmatrix}, \ldots, \begin{bmatrix}\eta^1{=}0\\1\end{bmatrix}}_{\text{middle } k \text{ patterns}},\quad \underbrace{\begin{bmatrix}-\eta^2\\1\end{bmatrix}, \begin{bmatrix}-\eta^3\\1\end{bmatrix}, \ldots, \begin{bmatrix}-\eta^k\\1\end{bmatrix}}_{\text{last } k-1 \text{ patterns}}.$$


SLIDE 11

Worst case analysis

We are now in a position to build a pathological path with (3^p + 1)/2 linear segments. Note that this lower-bound complexity is tight:

$$y \triangleq \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}, \qquad X \triangleq \begin{bmatrix} \alpha_1 & 2\alpha_2 & 2\alpha_3 & \cdots & 2\alpha_p \\ 0 & \alpha_2 & 2\alpha_3 & \cdots & 2\alpha_p \\ 0 & 0 & \alpha_3 & \cdots & 2\alpha_p \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & \cdots & 0 & \alpha_p \end{bmatrix}.$$
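As an illustrative sketch (not code from the talk), applying the augmentation (y, X) ↦ (ỹ, X̃) repeatedly with y_{n+1} = 1 at every step, starting from the one-dimensional problem y = (1), X = (α₁), reproduces exactly this upper-triangular design. The α values below are placeholders, not the adversarial constants chosen in the paper:

```python
import numpy as np

def augment(y, X, alpha):
    """One step of the adversarial construction with y_{n+1} = 1:
    y_tilde = [y; 1],  X_tilde = [[X, 2*alpha*y], [0, alpha]]."""
    n, p = X.shape
    y_t = np.append(y, 1.0)
    X_t = np.zeros((n + 1, p + 1))
    X_t[:n, :p] = X
    X_t[:n, p] = 2.0 * alpha * y
    X_t[n, p] = alpha
    return y_t, X_t

# Start from the trivial 1-D problem and augment p - 1 times.
alphas = [1.0, 0.1, 0.01]   # placeholder values; the paper picks each
y = np.array([1.0])         # alpha_j adversarially small
X = np.array([[alphas[0]]])
for a in alphas[1:]:
    y, X = augment(y, X, a)

print(y)  # all-ones vector
print(X)  # upper triangular: X[i, i] = alpha_{i+1}, X[i, j] = 2*alpha_{j+1} for j > i
```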


SLIDE 12

Approximate Complexity

Refinement of Giesen, Jaggi, and Laue [2010] for the Lasso

Strong Duality

[Figure: primal curve f(w) and dual curve g(κ), touching at the optimum (w⋆, κ⋆)]

Strong duality means that max_κ g(κ) = min_w f(w).


SLIDE 13

Approximate Complexity

Duality Gaps

[Figure: primal curve f(w) and dual curve g(κ), with the duality gap δ(w̃, κ̃) between the candidate points w̃ and κ̃]

Strong duality means that max_κ g(κ) = min_w f(w). The duality gap guarantees that 0 ≤ f(w̃) − f(w⋆) ≤ δ(w̃, κ̃).


SLIDE 14

Approximate Complexity

$$\min_{w} \ \Big[\, f_\lambda(w) \triangleq \frac{1}{2}\|y - Xw\|_2^2 + \lambda\|w\|_1 \,\Big], \qquad \text{(primal)}$$

$$\max_{\kappa} \ \Big[\, g_\lambda(\kappa) \triangleq -\frac{1}{2}\kappa^\top \kappa - \kappa^\top y \ \ \text{s.t.} \ \ \|X^\top \kappa\|_\infty \le \lambda \,\Big]. \qquad \text{(dual)}$$

ε-approximate solution

A vector w satisfies APPROX_λ(ε) when there exists a dual variable κ such that δ_λ(w, κ) = f_λ(w) − g_λ(κ) ≤ ε f_λ(w).
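The gap is cheap to compute: given any primal candidate w, rescaling the residual Xw − y into the dual feasible set yields a valid κ. A minimal NumPy sketch, where this scaling rule is a standard choice and not necessarily the κ(w) used in the talk:

```python
import numpy as np

def duality_gap(X, y, w, lam):
    """Lasso duality gap at (w, kappa), where kappa is the residual
    X w - y rescaled so that ||X^T kappa||_inf <= lam (dual feasibility)."""
    r = X @ w - y
    f = 0.5 * r @ r + lam * np.abs(w).sum()   # primal objective f_lam(w)
    corr = np.max(np.abs(X.T @ r))
    scale = 1.0 if corr <= lam else lam / corr
    kappa = scale * r
    g = -0.5 * kappa @ kappa - kappa @ y      # dual objective g_lam(kappa)
    return f - g                              # >= f(w) - f(w*) >= 0

# Orthonormal design: the exact solution is soft-thresholding of y.
X = np.eye(2)
y = np.array([2.0, 1.0])
lam = 0.5
w_star = np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)  # [1.5, 0.5]
print(duality_gap(X, y, w_star, lam))  # 0.0 (up to rounding)
```

At the optimum the residual itself is dual feasible and the gap vanishes; for any other w the gap gives a computable upper bound on the suboptimality.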

ε-approximate path

A path P : λ ↦ w(λ) is an ε-approximate path if, for every λ, w(λ) is an ε-approximate solution. (See Giesen et al. [2010] for generic results on approximate paths.)


SLIDE 15

Approximate Complexity

Main relation

$$\text{APPROX}_\lambda(0) \implies \text{APPROX}_{\lambda(1-\sqrt{\varepsilon})}(\varepsilon).$$

Key: find an appropriate dual variable κ(w), plus a simple calculation.

Proposition - approximate analysis

There exists an ε-approximate path with at most log(λ∞/λ1)/√ε segments.


SLIDE 16

Approximate Homotopy

Recipe - main ideas/features

• maintain approximate optimality conditions along the path;
• make steps in λ at least as large as λ → λ(1 − θ√ε);
• when kinks are too close to each other, make a large step and use a first-order method instead;
• between λ∞ and λ1, the number of iterations is upper-bounded by log(λ∞/λ1)/(θ√ε).
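The iteration bound corresponds to decreasing λ along a geometric grid. A small sketch with illustrative numbers (not from the talk):

```python
import math

def geometric_grid(lam_max, lam_min, theta, eps):
    """Geometric lambda-grid: each step multiplies lambda by (1 - theta*sqrt(eps)).
    Its length matches the log(lam_max/lam_min) / (theta*sqrt(eps)) bound."""
    ratio = 1.0 - theta * math.sqrt(eps)
    grid = [lam_max]
    while grid[-1] > lam_min:
        grid.append(grid[-1] * ratio)
    return grid

grid = geometric_grid(10.0, 0.1, theta=1.0, eps=0.01)  # ratio = 0.9
steps = len(grid) - 1
bound = math.log(10.0 / 0.1) / (1.0 * math.sqrt(0.01))  # about 46.05
print(steps, steps <= bound + 1)  # 44 True
```

The bound holds because each multiplicative step shrinks log λ by at least θ√ε (using −log(1 − x) ≥ x).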


SLIDE 17

A Few Messages to Conclude

• despite its worst-case exponential complexity, the homotopy algorithm remains extremely powerful in practice;
• the main issue of the homotopy algorithm might be its numerical stability;
• when one does not care about high precision, the worst-case complexity of the path can be significantly reduced.


SLIDE 18

Advertisement SPAMS toolbox (open-source)

• C++, interfaced with Matlab, R, and Python;
• proximal gradient methods for ℓ0, ℓ1, elastic-net, fused-Lasso, group-Lasso, tree group-Lasso, tree-ℓ0, sparse group-Lasso, overlapping group-Lasso...
• ...for square, logistic, and multi-class logistic loss functions;
• handles sparse matrices, provides duality gaps;
• fast implementations of OMP and LARS/homotopy;
• dictionary learning and matrix factorization (NMF, sparse PCA);
• coordinate descent and block coordinate descent algorithms;
• fast projections onto some convex sets.

Try it! http://www.di.ens.fr/willow/SPAMS/


SLIDE 19

References I

• S. S. Chen, D. L. Donoho, and M. A. Saunders. Atomic decomposition by basis pursuit. SIAM Journal on Scientific Computing, 20:33–61, 1999.
• B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani. Least angle regression. Annals of Statistics, 32(2):407–499, 2004.
• B. Gärtner, M. Jaggi, and C. Maria. An exponential lower bound on the complexity of regularization paths. Preprint arXiv:0903.4817v2, 2010.
• J. Giesen, M. Jaggi, and S. Laue. Approximating parameterized convex optimization problems. In Algorithms – ESA, Lecture Notes in Computer Science, 2010.
• V. Klee and G. J. Minty. How good is the simplex algorithm? In O. Shisha, editor, Inequalities, volume III, pages 159–175. Academic Press, New York, 1972.


SLIDE 20

References II

• M. R. Osborne, B. Presnell, and B. A. Turlach. On the Lasso and its dual. Journal of Computational and Graphical Statistics, 9(2):319–337, 2000.
• K. Ritter. Ein Verfahren zur Lösung parameterabhängiger, nichtlinearer Maximum-Probleme. Mathematical Methods of Operations Research, 6(4):149–166, 1962.
• R. Tibshirani. Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society, Series B, 58(1):267–288, 1996.


SLIDE 21

Worst case analysis - Backup Slide

$$\tilde{y} \triangleq \begin{bmatrix} y \\ y_{n+1} \end{bmatrix}, \qquad \tilde{X} \triangleq \begin{bmatrix} X & 2\alpha y \\ 0 & \alpha y_{n+1} \end{bmatrix}.$$

Some intuition about the adversarial strategy:

1. the sparsity patterns of the new path must be of the form [ηⁱ⊤, 0]⊤ or [±ηⁱ⊤, 1]⊤;
2. the factor α ensures that the (p+1)-th variable enters the path late;
3. after the first k kinks, we have y ≈ Xw⋆(λ) and thus

$$\tilde{X}\begin{bmatrix} w^\star(\lambda) \\ 0 \end{bmatrix} \approx \tilde{y} \approx \tilde{X}\begin{bmatrix} -w^\star(\lambda) \\ 1/\alpha \end{bmatrix}.$$


SLIDE 22

Worst case analysis - Backup Slide 2

$$\min_{\tilde{w}\in\mathbb{R}^p,\ \bar{w}\in\mathbb{R}} \ \frac{1}{2}\Big\|\tilde{y} - \tilde{X}\begin{bmatrix}\tilde{w}\\ \bar{w}\end{bmatrix}\Big\|_2^2 + \lambda\Big\|\begin{bmatrix}\tilde{w}\\ \bar{w}\end{bmatrix}\Big\|_1 = \min_{\tilde{w}\in\mathbb{R}^p,\ \bar{w}\in\mathbb{R}} \ \frac{1}{2}\|(1-2\alpha\bar{w})y - X\tilde{w}\|_2^2 + \frac{1}{2}\big(y_{n+1} - \alpha y_{n+1}\bar{w}\big)^2 + \lambda\|\tilde{w}\|_1 + \lambda|\bar{w}|.$$

In the variable w̃, this is equivalent to

$$\min_{\tilde{w}'\in\mathbb{R}^p} \ \frac{1}{2}\|y - X\tilde{w}'\|_2^2 + \frac{\lambda}{|1-2\alpha\bar{w}^\star|}\|\tilde{w}'\|_1,$$

and then

$$\tilde{w}^\star = (1-2\alpha\bar{w}^\star)\, w^\star\!\Big(\frac{\lambda}{|1-2\alpha\bar{w}^\star|}\Big) \ \text{ if } \bar{w}^\star \neq \frac{1}{2\alpha}, \qquad \tilde{w}^\star = 0 \ \text{ otherwise}.$$
