Greedy selection on the Lasso solution grid


  1. Greedy selection on the Lasso solution grid. Piotr Pokarowski, Faculty of Mathematics, Informatics and Mechanics, University of Warsaw. 1 Dec 2016.

  2. Penalized Loss Minimization Framework
Data $= \{(y_1, x_{1\cdot}^T), \ldots, (y_n, x_{n\cdot}^T)\} = \text{Train} \oplus \text{Valid} \oplus \text{Test}$
Fitting: $\widehat\beta(\lambda) = \arg\min_\beta \{\text{loss}(\beta, \text{Train}) + \text{penalty}(\beta, \lambda)\}$
Selection: $\widehat\lambda = \arg\min_\lambda \text{err}\big(\widehat\beta(\lambda), \text{Valid}\big)$
Assessment: $\widehat{\text{err}} = \text{err}\big(\widehat\beta(\widehat\lambda), \text{Test}\big)$
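To make the three-stage protocol concrete, here is a minimal sketch in Python, assuming squared-error loss, scikit-learn's `Lasso` as the penalized fitter, and an illustrative data split and lambda grid; none of these choices come from the slides.

```python
# Minimal sketch of the fit / select / assess protocol: Lasso as the
# penalized fitter, squared error as err, an equal three-way split and a
# log-spaced lambda grid as illustrative choices.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 20))
y = X[:, :3] @ np.array([3.0, -2.0, 1.5]) + rng.standard_normal(300)

# Data = Train (+) Valid (+) Test
X_tr, X_va, X_te = X[:100], X[100:200], X[200:]
y_tr, y_va, y_te = y[:100], y[100:200], y[200:]

def err(model, X, y):
    return np.mean((y - model.predict(X)) ** 2)

# Fitting: beta_hat(lambda) on Train; Selection: lambda_hat on Valid
fits = {lam: Lasso(alpha=lam).fit(X_tr, y_tr) for lam in np.logspace(-3, 0, 30)}
lam_hat = min(fits, key=lambda lam: err(fits[lam], X_va, y_va))

# Assessment: err_hat on Test
print(f"lambda_hat = {lam_hat:.4f}, test error = {err(fits[lam_hat], X_te, y_te):.3f}")
```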

  3. Loss and Penalty
Loss is a relaxation of the prediction error and of a tempered (partial, scaled, etc.) negative log-likelihood:
$$\text{loss}(\beta, \text{Train}) = \sum_{i=1}^n L(y_i, f(x_{i\cdot}, \beta)).$$
Penalty on a model $\beta = (\beta_1, \ldots, \beta_p)^T$:
$$\text{penalty}(\beta, \lambda) = \sum_{j=1}^p P_\lambda(|\beta_j|).$$
The penalties considered lie between the $\ell_0$ and ridge extremes: $\lambda\,\mathbb{1}(t > 0) \succeq P_\lambda(t) \succeq \lambda t^2$.
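For example, squared-error loss $L(y_i, f) = \frac12 (y_i - f)^2$ with the linear predictor $f(x_{i\cdot}, \beta) = x_{i\cdot}^T\beta$ and the $\ell_1$ penalty recovers the familiar Lasso objective:
$$\widehat\beta(\lambda) = \arg\min_\beta \Big\{ \tfrac12\,\|y - X\beta\|_2^2 + \lambda|\beta|_1 \Big\}.$$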

  4. Loss Functions ⊃ linear, logistic models
For $i = 1, \ldots, n$ we have $x_{i\cdot} \in \mathbb{R}^p$, $X = [x_{1\cdot}, \ldots, x_{n\cdot}]^T = [x_{\cdot 1}, \ldots, x_{\cdot p}]$, and $y = (y_1, \ldots, y_n)^T$. For simplicity of presentation $y^T 1_n = 0$ and the columns are standardized such that $x_{\cdot j}^T 1_n = 0$, $x_{\cdot j}^T x_{\cdot j} = 1$ for $j = 1, \ldots, p$. We consider a generalized linear model with a canonical link function: $g(\mathbb{E} y_i) = x_{i\cdot}^T \beta^*$. Let $\varepsilon_i = (y_i - \mathbb{E} y_i)/\mathrm{sd}(y_i)$. We assume that $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_n)^T \in \mathbb{R}^n$ is a vector of iid zero-mean errors having a subgaussian distribution with constant $\sigma$, that is, $\mathbb{E} \exp(u\varepsilon_i) \le \exp(\sigma^2 u^2/2)$ for $u \in \mathbb{R}$.
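The standardization convention above is easy to implement; a minimal NumPy sketch (the function name and variable layout are illustrative):

```python
# Center y, then center each column of X and scale it to unit Euclidean
# norm, matching y^T 1_n = 0, x_.j^T 1_n = 0 and x_.j^T x_.j = 1.
import numpy as np

def standardize(X, y):
    y = y - y.mean()
    X = X - X.mean(axis=0)
    X = X / np.linalg.norm(X, axis=0)
    return X, y
```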

  5. Penalty Functions - Classics
A. Hoerl and R. Kennard, Technometrics 1970: Ridge Regression (RR) ≡ $\ell_2$-penalty, $P_\lambda(t) = \lambda t^2$.
R. Nishii, Ann. Statist. 1984: Generalized Information Criterion (GIC) ≡ $\ell_0$-penalty, $P_\lambda(t) = \lambda\,\mathbb{1}(t > 0)$.
R. Tibshirani, JRSS-B 1996: Lasso ≡ $\ell_1$-penalty, $P_\lambda(t) = \lambda t$.

  6. Penalty Functions - New Propositions
H. Zou and T. Hastie, JRSS-B 2005 (1750 cit.): Elastic Net (EN), $P_{\lambda_1,\lambda_2}(t) = \lambda_1 t + \frac{\lambda_2}{2} t^2$, equivalently $P_{\lambda,\alpha}(t) = \lambda\big(\alpha t + \frac{1-\alpha}{2} t^2\big)$.
C.-H. Zhang, Ann. Statist. 2010 (270 cit.): Minimax Concave Penalty (MCP), $P_{\lambda,\gamma}(t) = \lambda (t \wedge \gamma\lambda)\big(1 - \frac{t \wedge \gamma\lambda}{2\gamma\lambda}\big)$.
Ordering of the penalties: GIC $\succeq$ MCP $\succeq$ Lasso $\succeq$ EN $\succeq$ RR.
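The penalty families on the last two slides are simple to evaluate; below is a sketch in NumPy (the function names are illustrative):

```python
# The penalty families from the two slides above, evaluated at t >= 0.
import numpy as np

def pen_elastic_net(t, lam, alpha):
    # P_{lam,alpha}(t) = lam * (alpha*t + (1 - alpha)/2 * t^2)
    return lam * (alpha * t + 0.5 * (1.0 - alpha) * t ** 2)

def pen_mcp(t, lam, gamma):
    # P_{lam,gamma}(t) = lam * min(t, gamma*lam) * (1 - min(t, gamma*lam)/(2*gamma*lam));
    # equals lam*t - t^2/(2*gamma) for t <= gamma*lam, constant gamma*lam^2/2 beyond.
    u = np.minimum(t, gamma * lam)
    return lam * u * (1.0 - u / (2.0 * gamma * lam))

def pen_l0(t, lam):
    # GIC / l0 penalty: lam * 1(t > 0)
    return lam * (np.asarray(t) > 0).astype(float)
```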

  7. Elastic Net Penalty
[Figure: left panel, the EN penalty $P_{\lambda,\alpha}(t)$; right panel, the corresponding thresholding functions $\widehat\beta$; both for $\alpha = 0.1, 0.5, 0.9$.]

  8. Minimax Concave Penalty
[Figure: left panel, the MCP penalty $P_{\lambda,\gamma}(t)$; right panel, the corresponding thresholding functions $\widehat\beta$; both for $\gamma = 1.1, 2.5, 25$.]

  9. Algorithm 1: GIC-thresholded Lasso (SS)
Input: $y$, $X$ and $\lambda$.
Screening (Lasso): $\widehat\beta = \arg\min_\beta \{\ell(\beta) + \lambda|\beta|_1\}$; order the nonzero coefficients $|\widehat\beta_{j_1}| \ge \ldots \ge |\widehat\beta_{j_s}|$, where $s = |\mathrm{supp}\,\widehat\beta|$; set $\mathcal{J} = \big\{\{j_1\}, \{j_1, j_2\}, \ldots, \mathrm{supp}\,\widehat\beta\big\}$.
Selection (GIC): $\widehat T = \arg\min_{J \in \mathcal{J}} \big\{\ell(\widehat\beta_J^{ML}) + \lambda^2 |J|\big\}$.
Output: $\widehat T$, $\widehat\beta^{SS} = \widehat\beta_{\widehat T}^{ML}$.
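A sketch of Algorithm 1 for the linear model with squared-error loss, where $\ell(\beta) = \frac12\|y - X\beta\|^2$ and the ML refit $\widehat\beta_J^{ML}$ is ordinary least squares on the columns in $J$; the solver and the inclusion of the empty model among the candidates are illustrative choices, not taken from the slides.

```python
# Sketch of Algorithm 1 (SS) for the linear model: Lasso screening, then GIC
# selection over the nested models given by ordering |beta_hat_j|.
# OLS plays the role of the ML refit beta_J^ML; squared-error loss l(beta).
import numpy as np
from sklearn.linear_model import Lasso

def ss(y, X, lam):
    # Screening (Lasso); scikit-learn scales the loss by 1/(2n), so its
    # alpha matches the slide's lambda only up to that convention.
    beta = Lasso(alpha=lam, fit_intercept=False).fit(X, y).coef_
    support = [j for j in np.argsort(-np.abs(beta)) if beta[j] != 0]
    # Candidate models: empty model, {j1}, {j1, j2}, ..., supp(beta)
    candidates = [support[:k] for k in range(len(support) + 1)]

    def gic(J):  # l(beta_J^ML) + lam^2 |J|
        b, *_ = np.linalg.lstsq(X[:, J], y, rcond=None)
        return 0.5 * np.sum((y - X[:, J] @ b) ** 2) + lam ** 2 * len(J)

    T = min(candidates, key=gic)
    b_T, *_ = np.linalg.lstsq(X[:, T], y, rcond=None)
    return T, b_T
```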

  10. Algorithm 2: Greedy Selection on the Lasso Solution Grid (SOSnet)
Input: $y$, $X$ and $(o, \lambda \le \lambda_1 < \ldots < \lambda_m)$.
Screening (Lasso): for $k = 1$ to $m$ do: $\widehat\beta^{(k)} = \arg\min_\beta \{\ell(\beta) + \lambda_k|\beta|_1\}$; order the nonzero coefficients $|\widehat\beta^{(k)}_{j_1}| \ge \ldots \ge |\widehat\beta^{(k)}_{j_{s_k}}|$, $s_k = |\mathrm{supp}\,\widehat\beta^{(k)}|$;
Ordering (squared Wald tests): for $l = 1$ to $o$ do: set $J = \{j_1, j_2, \ldots, j_{s_{kl}}\}$, $s_{kl} = \lfloor s_k \cdot l/o \rfloor$; compute $\widehat\beta_J^{ML}$; sort the predictors in $J$ according to their squared Wald statistics $w^2_{i_1} \ge w^2_{i_2} \ge \ldots \ge w^2_{i_{s_{kl}}}$; set $\mathcal{J}_{kl} = \big\{\{i_1\}, \{i_1, i_2\}, \ldots, \{i_1, i_2, \ldots, i_{s_{kl}}\}\big\}$; end for; end for.
Selection (GIC): $\mathcal{J} = \bigcup_{k=1}^m \bigcup_{l=1}^o \mathcal{J}_{kl}$; $\widehat T = \arg\min_{J \in \mathcal{J}} \big\{\ell(\widehat\beta_J^{ML}) + \lambda^2|J|\big\}$.
Output: $\widehat T$, $\widehat\beta^{SOSnet} = \widehat\beta_{\widehat T}^{ML}$.
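In the same linear-model setting, a sketch of Algorithm 2; for OLS the squared Wald statistics reduce to squared t-statistics $w_j^2 = \widehat b_j^2 / \widehat{\mathrm{Var}}(\widehat b_j)$. The guard against zero variances and the empty-model candidate are illustrative choices.

```python
# Sketch of Algorithm 2 (SOSnet) for the linear model: Lasso screening on a
# grid of lambdas, OLS refits reordered by squared Wald (t-) statistics,
# then GIC selection over all resulting nested models.
import numpy as np
from sklearn.linear_model import Lasso

def sosnet(y, X, lams, o, lam_gic):
    n = len(y)
    candidates = [[]]  # include the empty model
    for lam_k in lams:  # Screening (Lasso)
        beta = Lasso(alpha=lam_k, fit_intercept=False).fit(X, y).coef_
        support = [j for j in np.argsort(-np.abs(beta)) if beta[j] != 0]
        s_k = len(support)
        for l in range(1, o + 1):  # Ordering (squared Wald tests)
            J = support[: (s_k * l) // o]
            if not J:
                continue
            XJ = X[:, J]
            b, *_ = np.linalg.lstsq(XJ, y, rcond=None)
            resid = y - XJ @ b
            sigma2 = resid @ resid / max(n - len(J), 1)
            var_b = sigma2 * np.diag(np.linalg.pinv(XJ.T @ XJ))
            w2 = b ** 2 / np.maximum(var_b, 1e-12)  # squared Wald statistics
            ordered = [J[i] for i in np.argsort(-w2)]
            candidates += [ordered[:m] for m in range(1, len(J) + 1)]

    def gic(J):  # Selection (GIC): l(beta_J^ML) + lambda^2 |J|
        b, *_ = np.linalg.lstsq(X[:, J], y, rcond=None)
        return 0.5 * np.sum((y - X[:, J] @ b) ** 2) + lam_gic ** 2 * len(J)

    T = min(candidates, key=gic)
    b_T, *_ = np.linalg.lstsq(X[:, T], y, rcond=None)
    return T, b_T
```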

  11. When does thresholding separate the true model?
[Figure: true coefficients $\beta^*$ and their estimates $\widehat\beta$ plotted against the coefficient indices $1, \ldots, 8$.]

  12. Lasso separation error (1)
The true model is $T = \mathrm{supp}(\beta^*) = \{j \in F : \beta^*_j \ne 0\}$, with $\beta^*_{\min} = \min_{j \in T} |\beta^*_j|$ and $t = |T|$.
A Bregman divergence: $D(\beta, \beta^*) = \ell(\beta) - \ell(\beta^*) - \dot\ell(\beta^*)^T(\beta - \beta^*)$.
A symmetrized Bregman divergence: $\Delta(\beta, \beta^*) = D(\beta, \beta^*) + D(\beta^*, \beta) = (\beta - \beta^*)^T\big(\dot\ell(\beta) - \dot\ell(\beta^*)\big)$.
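For orientation, in the linear model with squared-error loss these divergences have closed forms (a standard computation, not spelled out on the slide): with $\ell(\beta) = \frac12\|y - X\beta\|^2$ we get $\dot\ell(\beta) = -X^T(y - X\beta)$, hence
$$D(\beta, \beta^*) = \tfrac12\|X(\beta - \beta^*)\|^2, \qquad \Delta(\beta, \beta^*) = \|X(\beta - \beta^*)\|^2.$$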

  13. Lasso separation error (2)
For $a \in (0, 1)$ consider the cone
$$\mathcal{C}_{T,a} = \Big\{\nu \in \mathbb{R}^p : |\nu_{\bar T}|_1 \le \frac{1+a}{1-a}\,|\nu_T|_1\Big\}. \quad (1)$$
A general invertibility factor, defined in J. Huang and C.-H. Zhang, JMLR 2012:
$$\zeta_a = \inf_{\nu \in \mathcal{C}_{T,a}} \frac{\Delta(\beta^* + \nu, \beta^*)}{|\nu_T|_1\,|\nu|_\infty}. \quad (2)$$

  14. Lasso separation error (3)
On the event $A_a = \{|\dot\ell(\beta^*)|_\infty \le a\lambda\}$ we have the so-called oracle inequality
$$|\widehat\beta - \beta^*|_\infty \le (1+a)\lambda\zeta_a^{-1} < \beta^*_{\min}/2.$$
It is easy to check that $A_a \subseteq \{T \in \mathcal{J}\}$. Hence for $\lambda < (1+a)^{-1}\zeta_a\beta^*_{\min}/2$ we have
$$P(T \notin \mathcal{J}) \le 2p\exp\Big(\frac{-a^2\lambda^2}{2\sigma^2}\Big).$$
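The probability bound follows from the subgaussian assumption via a union bound over the $p$ coordinates of $\dot\ell(\beta^*)$; a sketch, assuming each coordinate $\dot\ell_j(\beta^*) = -x_{\cdot j}^T\varepsilon$ is subgaussian with constant $\sigma$ (which holds under the standardization $x_{\cdot j}^T x_{\cdot j} = 1$):
$$P(A_a^c) = P\Big(\max_{j \le p} |\dot\ell_j(\beta^*)| > a\lambda\Big) \le \sum_{j=1}^p P\big(|\dot\ell_j(\beta^*)| > a\lambda\big) \le 2p\exp\Big(\frac{-a^2\lambda^2}{2\sigma^2}\Big).$$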

  15. GIC error (1)
Let $W^* = \mathrm{diag}(\mathrm{sd}(y_1), \ldots, \mathrm{sd}(y_n))$ and $X^* = W^{*1/2} X$. Let $X^*_J$ be the submatrix of $X^*$ with columns having indices in $J$, and $H^*_J$ the orthogonal projection onto the columns of $X^*_J$. Scaled Kullback-Leibler distances between $T$ and its submodels are defined as in X. Shen et al., JASA 2012:
$$\delta_k = \min_{J \subset T,\ |T \setminus J| = k} \|(I - H^*_J) X^* \beta^*\|^2,$$
$$c_k = \min_i\ \min_{\beta_T :\ \|X^*_T\beta_T - X^*\beta^*\| \le \delta_k} \ddot\ell(x_{iT}^T\beta_T)/\ddot\ell(x_{i\cdot}^T\beta^*),$$
$$\tilde\delta^2 = \min_k c_k\delta_k/k.$$

  16. GIC error (2)
If $t\sigma^2 < \lambda^2 < \tilde\delta^2/(1+a)^2$, then
$$P(T \in \mathcal{J},\ \widehat T \subsetneq T) \le \exp\Big(\frac{-a^2\lambda^2}{2\sigma^2}\Big).$$
If $\sigma^2 a^{-2}\min\big(t c_t^{-1}, \log(3p)\big) < \lambda^2$, then
$$P(T \in \mathcal{J},\ \widehat T \supsetneq T) \le 3p\exp\Big(\frac{-a^2\lambda^2}{4\sigma^2}\Big).$$
