Complexity Analysis of the Lasso Regularization Path
  1. Complexity Analysis of the Lasso Regularization Path. Julien Mairal and Bin Yu, Inria and UC Berkeley. San Diego, SIAM Optimization, May 2014.

  2. What this work is about: another paper about the Lasso/Basis Pursuit [Tibshirani, 1996, Chen et al., 1999]:

      min_{w ∈ R^p} (1/2) ‖y − Xw‖₂² + λ ‖w‖₁;   (1)

    the first complexity analysis of the homotopy method [Ritter, 1962, Osborne et al., 2000, Efron et al., 2004] for solving (1); some conclusions reminiscent of the simplex algorithm [Klee and Minty, 1972] and of the SVM regularization path [Gärtner, Jaggi, and Maria, 2010].

  3. The Lasso Regularization Path and the Homotopy. Under a uniqueness assumption on the Lasso solution, the regularization path is piecewise linear. [Figure: coefficient values w1, …, w5 plotted against λ; each path is piecewise linear with kinks.]

  4. Our Main Results. Theorem (worst-case analysis): in the worst case, the regularization path of the Lasso has exactly (3^p + 1)/2 linear segments. Proposition (approximate analysis): there exists an ε-approximate path with O(1/√ε) linear segments.

  5. Brief Introduction to the Homotopy Algorithm. Piecewise linearity: under uniqueness assumptions on the Lasso solution, the regularization path λ ↦ w⋆(λ) is continuous and piecewise linear.

  6. Brief Introduction to the Homotopy Algorithm (continued). Recipe of the homotopy method, main ideas: (1) find the trivial solution w⋆(λ∞) = 0 with λ∞ = ‖X⊤y‖∞; (2) compute the direction of the current linear segment of the path; (3) follow the direction of the path by decreasing λ; (4) stop at the next "kink" and go back to (2).

  7. Brief Introduction to the Homotopy Algorithm (continued). Caveats: kinks can be very close to each other; the direction of the path can involve ill-conditioned matrices; worst-case exponential complexity (the main result of this work).
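The four-step recipe above can be turned into a short program. The following Python sketch is my own illustrative implementation, not the authors' code: it traces the path kink by kink, assuming the generic case where exactly one variable enters or leaves the active set per kink and the active submatrix of X stays full rank.

```python
import numpy as np

def lasso_homotopy(X, y, lam_min=1e-6, max_kinks=100):
    """Trace the Lasso regularization path with the homotopy method.

    Returns a list of (lambda, w) pairs at the kinks, starting from
    lambda_inf = ||X^T y||_inf, where w*(lambda) = 0.
    """
    n, p = X.shape
    c = X.T @ y                            # correlations at w = 0
    lam = np.abs(c).max()                  # lambda_inf
    j0 = int(np.argmax(np.abs(c)))
    A = [j0]                               # active set
    s = np.array([np.sign(c[j0])])         # signs of active coefficients
    kinks = [(lam, np.zeros(p))]
    for _ in range(max_kinks):
        XA = X[:, A]
        G = XA.T @ XA
        b = np.linalg.solve(G, XA.T @ y)
        d = np.linalg.solve(G, s)          # on this segment, w_A(t) = b - t*d
        # Event 1: an active coefficient hits zero at t = b_j / d_j.
        t_drop, drop = -np.inf, None
        for j in range(len(A)):
            if abs(d[j]) > 1e-12:
                t = b[j] / d[j]
                if 1e-12 < t < lam - 1e-10 and t > t_drop:
                    t_drop, drop = t, j
        # Event 2: an inactive correlation reaches the bound |c0 + t*c1| = t.
        t_in, enter, sgn_in = -np.inf, None, None
        r0, r1 = y - XA @ b, XA @ d        # residual(t) = r0 + t * r1
        for i in set(range(p)) - set(A):
            c0, c1 = X[:, i] @ r0, X[:, i] @ r1
            for sgn in (1.0, -1.0):
                if abs(sgn - c1) > 1e-12:
                    t = c0 / (sgn - c1)
                    if 1e-12 < t < lam - 1e-10 and t > t_in:
                        t_in, enter, sgn_in = t, i, sgn
        lam = max(t_drop, t_in, lam_min)   # next kink (or stopping point)
        w = np.zeros(p)
        w[A] = b - lam * d
        kinks.append((lam, w))
        if lam == lam_min:
            break
        if t_drop >= t_in:                 # remove a variable
            A.pop(drop)
            s = np.delete(s, drop)
            if not A:
                break
        else:                              # add the entering variable
            A.append(enter)
            s = np.append(s, sgn_in)
    return kinks

# tiny demo on well-conditioned random data
rng = np.random.default_rng(0)
X = rng.standard_normal((30, 8))
y = rng.standard_normal(30)
path = lasso_homotopy(X, y)
```

On random data like this the naive event detection works; on adversarial instances such as the one constructed later in the talk, the kinks become exponentially close, which is exactly the numerical-stability caveat above.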

  8. Worst case analysis. Theorem (worst-case analysis): in the worst case, the regularization path of the Lasso has exactly (3^p + 1)/2 linear segments. [Figure: an adversarial regularization path for p = 6; coefficients on a log scale, with several hundred kinks.]

  9. Worst case analysis. Consider a Lasso problem (y ∈ R^n, X ∈ R^{n×p}). Define the vector ỹ in R^{n+1} and the matrix X̃ in R^{(n+1)×(p+1)} as follows:

      ỹ ≜ [ y ; y_{n+1} ],   X̃ ≜ [ X , 2αy ; 0 , α y_{n+1} ]   (rows separated by ";"),

    where y_{n+1} ≠ 0 and 0 < α < λ1/(2 y⊤y + y_{n+1}²). Adversarial strategy: if the regularization path of the Lasso (y, X) has k linear segments, the path of (ỹ, X̃) has 3k − 1 linear segments.

  10. Worst case analysis. With ỹ ≜ [ y ; y_{n+1} ] and X̃ ≜ [ X , 2αy ; 0 , α y_{n+1} ], let {η1, …, ηk} denote the sequence of k sparsity patterns in {−1, 0, 1}^p encountered along the path of the Lasso (y, X). The new sequence of sparsity patterns for (ỹ, X̃) is:

      first k patterns:    (η1 = 0, 0), (η2, 0), …, (ηk, 0);
      middle k patterns:   (ηk, 1), (η_{k−1}, 1), …, (η1 = 0, 1);
      last k − 1 patterns: (−η2, 1), (−η3, 1), …, (−ηk, 1).
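The tripling of the pattern sequence is easy to state programmatically. Below is a small sketch (the function name is mine, for illustration) that builds the 3k − 1 patterns of the extended problem from the k patterns of the original one; iterating it from the trivial 1-dimensional path reproduces the (3^p + 1)/2 count.

```python
def extend_patterns(etas):
    """Given the k sparsity patterns eta_1..eta_k (lists over {-1, 0, 1})
    of the path for (y, X), return the 3k - 1 patterns for (y~, X~)."""
    first = [eta + [0] for eta in etas]                   # (eta_i, 0), i = 1..k
    middle = [eta + [1] for eta in reversed(etas)]        # (eta_k, 1) .. (eta_1, 1)
    last = [[-v for v in eta] + [1] for eta in etas[1:]]  # (-eta_2, 1) .. (-eta_k, 1)
    return first + middle + last

# the 1-D path visits the patterns (0) then (1); one recursion gives:
print(extend_patterns([[0], [1]]))
# -> [[0, 0], [1, 0], [1, 1], [0, 1], [-1, 1]]
```

Each application maps k segments to 3k − 1, so starting from k = 2 at p = 1 the counts are 2, 5, 14, …, i.e. (3^p + 1)/2.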

  11. Worst case analysis. We are now in shape to build a pathological path with (3^p + 1)/2 linear segments; note that this lower-bound complexity is tight:

      y ≜ (1, 1, …, 1)⊤,   X ≜ [ α1  2α2  2α3  …  2αp
                                  0   α2   2α3  …  2αp
                                  0   0    α3   …  2αp
                                  …                 …
                                  0   0    0    …  αp ].
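A sketch of this explicit design (the function name and the particular decay of the αi are illustrative assumptions; the admissible decay comes from the recursive condition on α in the previous slide and is not checked here):

```python
import numpy as np

def adversarial_design(alphas):
    """Build the pathological Lasso instance from the slides:
    y = (1, ..., 1) and X upper triangular with X[i, i] = alpha_i
    and X[i, j] = 2 * alpha_j for j > i."""
    p = len(alphas)
    X = np.zeros((p, p))
    for i in range(p):
        X[i, i] = alphas[i]
        X[i, i + 1:] = 2 * np.asarray(alphas[i + 1:])
    y = np.ones(p)
    return X, y

# rapidly decaying alphas, chosen here only for illustration
X, y = adversarial_design([10.0 ** -(k * k) for k in range(1, 4)])
```

With αi decaying fast enough, the homotopy on (y, X) visits all (3^p + 1)/2 sparsity patterns, with many kinks packed extremely close together.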

  12. Approximate Complexity. Refinement of Giesen, Jaggi, and Laue [2010] for the Lasso. [Figure: the primal curve f(w) with minimizer w⋆ and the dual curve g(κ) with maximizer κ⋆.] Strong duality means that max_κ g(κ) = min_w f(w).

  13. Approximate Complexity. Duality gaps. [Figure: the primal curve f(w), the dual curve g(κ), a candidate pair (w̃, κ̃), and the gap δ(w̃, κ̃) between them.] Strong duality means that max_κ g(κ) = min_w f(w). The duality gap guarantees that 0 ≤ f(w̃) − f(w⋆) ≤ δ(w̃, κ̃).

  14. Approximate Complexity.

      f_λ(w) ≜ (1/2) ‖y − Xw‖₂² + λ ‖w‖₁,   (primal, minimized over w)
      g_λ(κ) ≜ −(1/2) κ⊤κ − κ⊤y  s.t. ‖X⊤κ‖∞ ≤ λ.   (dual, maximized over κ)

    ε-approximate solution: w satisfies APPROX_λ(ε) when there exists a dual variable κ s.t. δ_λ(w, κ) = f_λ(w) − g_λ(κ) ≤ ε f_λ(w).
    ε-approximate path: a path P: λ ↦ w(λ) is an ε-approximate path if it always contains ε-approximate solutions. (See Giesen et al. [2010] for generic results on approximate paths.)
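The gap δ_λ(w, κ) is directly computable from any primal candidate: a feasible dual point can be obtained by rescaling the residual Xw − y until the dual constraint holds. This is a standard construction; the sketch below is illustrative, not the authors' code.

```python
import numpy as np

def lasso_duality_gap(X, y, w, lam):
    """Duality gap f_lam(w) - g_lam(kappa) for the Lasso, where
    kappa is the residual Xw - y rescaled so that ||X^T kappa||_inf <= lam
    (which makes it feasible for the dual)."""
    r = X @ w - y
    f = 0.5 * np.dot(y - X @ w, y - X @ w) + lam * np.abs(w).sum()
    nrm = np.abs(X.T @ r).max()
    scale = 1.0 if nrm <= lam else lam / nrm   # shrink r into the feasible set
    kappa = scale * r
    g = -0.5 * np.dot(kappa, kappa) - np.dot(kappa, y)
    return f - g

# at lambda = ||X^T y||_inf, w = 0 is optimal and the gap vanishes
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 5))
y = rng.standard_normal(20)
lam_inf = np.abs(X.T @ y).max()
gap0 = lasso_duality_gap(X, y, np.zeros(5), lam_inf)
```

By weak duality the returned gap is nonnegative for any w, and it upper-bounds the primal suboptimality f(w) − f(w⋆), so δ_λ(w, κ) ≤ ε f_λ(w) certifies APPROX_λ(ε).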

  15. Approximate Complexity. Main relation: APPROX_λ(0) ⟹ APPROX_{λ(1−√ε)}(ε). Key: find an appropriate dual variable κ(w), plus a simple calculation. Proposition (approximate analysis): there exists an ε-approximate path with at most ⌈log(λ∞/λ1)/√ε⌉ segments.

  16. Approximate Homotopy. Recipe, main ideas/features: maintain approximate optimality conditions along the path; make steps in λ greater than or equal to λ(1 − θ√ε); when the kinks are too close to each other, make a large step and use a first-order method instead; between λ∞ and λ1, the number of iterations is upper-bounded by ⌈log(λ∞/λ1)/(θ√ε)⌉.
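The step rule λ ← λ(1 − θ√ε) produces a geometric grid, and since log(1/(1 − x)) ≥ x the number of grid points is at most about log(λ∞/λ1)/(θ√ε). A minimal sketch of the grid itself (parameter names are mine):

```python
import math

def approximate_grid(lam_inf, lam_1, eps, theta=1.0):
    """Geometric grid of lambdas with ratio (1 - theta * sqrt(eps)),
    matching the approximate-homotopy step rule."""
    ratio = 1.0 - theta * math.sqrt(eps)
    grid = [lam_inf]
    while grid[-1] > lam_1:
        grid.append(grid[-1] * ratio)
    return grid

grid = approximate_grid(100.0, 1.0, eps=0.01, theta=0.5)
```

The full algorithm does more than sample this grid (it keeps the exact homotopy steps when they are large enough and falls back to a first-order solver otherwise), but the grid explains where the iteration bound comes from.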

  17. A Few Messages to Conclude. Despite its exponential worst-case complexity, the homotopy algorithm remains extremely powerful in practice; the main issue of the homotopy algorithm might be its numerical stability; when one does not care about high precision, the worst-case complexity of the path can be significantly reduced.

  18. Advertisement: SPAMS toolbox (open-source). C++, interfaced with Matlab, R, and Python. Proximal gradient methods for ℓ0, ℓ1, elastic-net, fused-Lasso, group-Lasso, tree group-Lasso, tree-ℓ0, sparse group-Lasso, overlapping group-Lasso… for square, logistic, and multi-class logistic loss functions. Handles sparse matrices, provides duality gaps. Fast implementations of OMP and LARS/homotopy. Dictionary learning and matrix factorization (NMF, sparse PCA). Coordinate descent and block coordinate descent algorithms. Fast projections onto some convex sets. Try it! http://www.di.ens.fr/willow/SPAMS/

  19. References I.
    S. S. Chen, D. L. Donoho, and M. A. Saunders. Atomic decomposition by basis pursuit. SIAM Journal on Scientific Computing, 20:33–61, 1999.
    B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani. Least angle regression. Annals of Statistics, 32(2):407–499, 2004.
    B. Gärtner, M. Jaggi, and C. Maria. An exponential lower bound on the complexity of regularization paths. Preprint arXiv:0903.4817v2, 2010.
    J. Giesen, M. Jaggi, and S. Laue. Approximating parameterized convex optimization problems. In Algorithms - ESA, Lecture Notes in Computer Science, 2010.
    V. Klee and G. J. Minty. How good is the simplex algorithm? In O. Shisha, editor, Inequalities, volume III, pages 159–175. Academic Press, New York, 1972.
