On Model Selection Consistency of Lasso. Yewon Kim, 12/08/2015. PowerPoint presentation.



SLIDE 1

On Model Selection Consistency Of Lasso

Yewon Kim 12/08/2015

SLIDE 2

Introduction

Model selection is a commonly used method for finding sparse or parsimonious statistical models, but it usually involves a computationally heavy combinatorial search. Lasso (Tibshirani, 1996) is now being used as a computationally feasible alternative to model selection. In this paper, the authors prove that a single condition, which they call the Irrepresentable Condition, is almost necessary and sufficient for Lasso to select the true model, both in the classical fixed $p$ setting and in the large $p$ setting, as the sample size $n$ gets large.

SLIDE 3

Some previous results

◮ Knight and Fu (2000) showed estimation consistency of Lasso for fixed $p$ and fixed $\beta_n$.

◮ Meinshausen and Bühlmann (2006) showed that Lasso is consistent in estimating the dependency between Gaussian variables even when $p$ grows faster than $n$.

◮ Zhao and Yu (2006) showed model selection consistency for both fixed $p$ and large $p$ problems.

SLIDE 4

Definition

Suppose the linear regression model

$$Y_n = X_n \beta_n + \epsilon_n,$$

where $Y_n$ is an $n \times 1$ response vector, $X_n = (X_1^n, X_2^n, \ldots, X_p^n) = ((x_1)^T, (x_2)^T, \ldots, (x_n)^T)^T$ is an $n \times p$ design matrix, $\beta_n$ is a $p \times 1$ vector of model coefficients, and $\epsilon_n$ is a vector of i.i.d. random errors with mean $0$ and variance $\sigma^2$. The Lasso estimator is

$$\hat{\beta}_n(\lambda) = \arg\min_{\beta} \left( \|Y_n - X_n \beta\|_2^2 + \lambda \|\beta\|_1 \right), \quad \text{with } \lambda \geq 0.$$
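As a concrete illustration (our own code, not from the slides), the estimator above can be computed by cyclic coordinate descent with soft-thresholding; the names `lasso_cd` and `soft_threshold` are hypothetical, and the data are simulated.

```python
import numpy as np

# Minimal coordinate-descent sketch of the Lasso estimator
#   argmin_beta ||Y - X beta||_2^2 + lambda ||beta||_1
# Coordinate update: beta_j = S(X_j^T r_j, lambda/2) / (X_j^T X_j),
# where r_j is the partial residual excluding coordinate j.
def soft_threshold(z, gamma):
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)        # X_j^T X_j for each column
    for _ in range(n_iter):
        for j in range(p):
            r_j = y - X @ beta + X[:, j] * beta[j]   # partial residual
            beta[j] = soft_threshold(X[:, j] @ r_j, lam / 2.0) / col_sq[j]
    return beta

# Toy data: true support is the first two coordinates.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
beta_true = np.array([2.0, -1.5, 0.0, 0.0, 0.0])
y = X @ beta_true + 0.1 * rng.standard_normal(100)
beta_hat = lasso_cd(X, y, lam=20.0)
```

With a moderate penalty the fitted coefficients keep the true signs on the support and shrink the irrelevant coordinates to (essentially) zero, which is exactly the sign-recovery behavior the rest of the slides analyze.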

SLIDE 5

Notation

$$\beta_n = (\beta_1^n, \beta_2^n, \ldots, \beta_q^n, \beta_{q+1}^n, \ldots, \beta_p^n)^T$$

Suppose $\beta_j^n \neq 0$ for $j = 1, 2, \ldots, q$ and $\beta_j^n = 0$ for $j = q+1, \ldots, p$.

$$\beta_n(1) = (\beta_1^n, \ldots, \beta_q^n)^T, \qquad \beta_n(2) = (\beta_{q+1}^n, \ldots, \beta_p^n)^T$$

$$X_n(1) = (X_1^n, \ldots, X_q^n), \qquad X_n(2) = (X_{q+1}^n, \ldots, X_p^n)$$

$$C_n = \frac{1}{n} X_n^T X_n = \begin{pmatrix} C_{11}^n & C_{12}^n \\ C_{21}^n & C_{22}^n \end{pmatrix},$$

  • where $C_{11}^n = \frac{1}{n} X_n(1)^T X_n(1)$, $C_{12}^n = \frac{1}{n} X_n(1)^T X_n(2)$, $C_{21}^n = \frac{1}{n} X_n(2)^T X_n(1)$, and $C_{22}^n = \frac{1}{n} X_n(2)^T X_n(2)$.

SLIDE 6

Definition of Consistency

◮ Estimation consistency: $\hat{\beta}_n - \beta_n \to_p 0$, as $n \to \infty$

◮ Model selection consistency: $P(\{i : \hat{\beta}_i^n \neq 0\} = \{i : \beta_i^n \neq 0\}) \to 1$, as $n \to \infty$

◮ Sign consistency: $P(\hat{\beta}_n =_s \beta_n) \to 1$, as $n \to \infty$, where $\hat{\beta}_n =_s \beta_n \Leftrightarrow \mathrm{sign}(\hat{\beta}_n) = \mathrm{sign}(\beta_n)$

SLIDE 7

Definition 1

Strongly Sign Consistent

Lasso is strongly sign consistent if there exists $\lambda_n = f(n)$ such that $\lim_{n \to \infty} P(\hat{\beta}_n(\lambda_n) =_s \beta_n) = 1$.

General Sign Consistent

Lasso is general sign consistent if $\lim_{n \to \infty} P(\exists \lambda \geq 0 \text{ such that } \hat{\beta}_n(\lambda) =_s \beta_n) = 1$.

SLIDE 8

Definition 2

Strong Irrepresentable Condition

There exists $\eta > 0$ such that

$$\left| C_{21}^n (C_{11}^n)^{-1} \, \mathrm{sign}(\beta_n(1)) \right| \leq \mathbf{1} - \eta,$$

where the inequality holds elementwise and $\mathbf{1}$ is a vector of ones.

Weak Irrepresentable Condition

$$\left| C_{21}^n (C_{11}^n)^{-1} \, \mathrm{sign}(\beta_n(1)) \right| < \mathbf{1}, \quad \text{elementwise.}$$
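As an illustration (our own code, not from the slides), the Strong Irrepresentable Condition can be checked numerically for a given design; `strong_irrepresentable` is a hypothetical helper name, and the orthogonal design is a toy case where the condition holds trivially.

```python
import numpy as np

# Check the Strong Irrepresentable Condition for a design X whose true
# support is the first q coordinates (a sketch; names are our own).
def strong_irrepresentable(X, q, signs, eta=0.0):
    n = X.shape[0]
    C = X.T @ X / n                       # C_n = (1/n) X^T X
    C11, C21 = C[:q, :q], C[q:, :q]
    lhs = np.abs(C21 @ np.linalg.solve(C11, signs))
    return bool(np.all(lhs <= 1.0 - eta)), lhs

# Orthogonal design: C_n = I, so C21 = 0 and the condition holds
# with room to spare for any eta < 1.
rng = np.random.default_rng(0)
Q = np.linalg.qr(rng.standard_normal((50, 5)))[0]
X = Q * np.sqrt(50)                       # columns scaled so C_n = I
ok, lhs = strong_irrepresentable(X, q=2, signs=np.ones(2), eta=0.5)
```

Intuitively, the left-hand side measures how well the irrelevant covariates can be represented by the relevant ones; the condition demands that this representation be uniformly bounded away from $1$.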

SLIDE 9

Result-Small p and q

Classical setting: $p$, $q$, and $\beta_n = \beta$ are all fixed as $n \to \infty$. Suppose the following regularity conditions:

$$C_n \to C > 0 \ \text{(positive definite)}, \quad \text{as } n \to \infty,$$

$$\frac{1}{n} \max_{1 \leq i \leq n} (x_i^n)^T x_i^n \to 0, \quad \text{as } n \to \infty.$$

SLIDE 10

Result-Small p and q

Theorem 1

For fixed $p$, $q$, and $\beta_n = \beta$, under the previous assumptions, Lasso is strongly sign consistent if the Strong Irrepresentable Condition holds. That is, when the Strong Irrepresentable Condition holds, for every $\lambda_n$ that satisfies $\lambda_n / n \to 0$ and $\lambda_n / n^{(1+c)/2} \to \infty$ with $0 \leq c < 1$, we have

$$P(\hat{\beta}_n(\lambda_n) =_s \beta_n) = 1 - o(e^{-n^c}).$$
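As a concrete numerical instance of these rate conditions (our own example, not from the slides): take $c = 1/2$ and $\lambda_n = n^{0.8}$. Then

```latex
\frac{\lambda_n}{n} = n^{-0.2} \to 0,
\qquad
\frac{\lambda_n}{n^{(1+c)/2}} = \frac{n^{0.8}}{n^{0.75}} = n^{0.05} \to \infty,
```

so this choice of $\lambda_n$ yields strong sign consistency at rate $1 - o(e^{-\sqrt{n}})$.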

SLIDE 11

Result-Small p and q

Theorem 2

For fixed $p$, $q$, and $\beta_n = \beta$, under the previous assumptions, Lasso is general sign consistent only if there exists $N$ such that the Weak Irrepresentable Condition holds for $n > N$.

SLIDE 12

Result-Small p and q

Therefore, Strong Irrepresentable Condition $\Rightarrow$ strong sign consistency $\Rightarrow$ general sign consistency $\Rightarrow$ Weak Irrepresentable Condition. So, except for the technical difference between the two conditions, the Irrepresentable Condition is almost necessary and sufficient for both strong sign consistency and general sign consistency.

SLIDE 13

Result-Large p and q

Furthermore, under additional regularity conditions on the noise terms $\epsilon_i^n$, this small $p$ result can be extended to the large $p$ case. That is, when $p$ also tends to infinity, but not too fast relative to $n$, the Strong Irrepresentable Condition again implies strong sign consistency for Lasso.

SLIDE 14

Result-Large p and q

The dimensions of the design $C_n$ and the parameter vector $\beta_n$ now grow with $n$, so $p_n$ and $q_n$ are allowed to grow with $n$. Suppose the following conditions: there exist $0 \leq c_1 < c_2 \leq 1$ and $M_1, M_2, M_3, M_4 > 0$ such that

$$\frac{1}{n} (X_i^n)^T X_i^n \leq M_1 \ \text{for all } i, \qquad \alpha^T C_{11}^n \alpha \geq M_2 \ \text{for all } \|\alpha\|_2^2 = 1,$$

$$q_n = O(n^{c_1}), \qquad n^{(1-c_2)/2} \min_{i=1,\ldots,q} |\beta_i^n| \geq M_3.$$

SLIDE 15

Result-Large p and q

Theorem 3

Assume the $\epsilon_i^n$ are i.i.d. random variables with $E(\epsilon_i^n)^{2k} < \infty$ for an integer $k > 0$. Under the previous conditions, the Strong Irrepresentable Condition implies that Lasso has strong sign consistency for $p_n = o(n^{(c_2 - c_1)k})$. In particular, for every $\lambda_n$ that satisfies $\lambda_n / \sqrt{n} = o(n^{(c_2 - c_1)/2})$ and $\frac{1}{p_n} \left( \frac{\lambda_n}{\sqrt{n}} \right)^{2k} \to \infty$, we have

$$P(\hat{\beta}_n(\lambda_n) =_s \beta_n) \geq 1 - O\!\left( \frac{p_n n^k}{\lambda_n^{2k}} \right) \to 1 \quad \text{as } n \to \infty.$$

SLIDE 16

Result-Large p and q

Theorem 4

Assume the $\epsilon_i^n$ are i.i.d. Gaussian random variables. Under the previous conditions, if there exists $0 < c_3 < c_2 - c_1$ for which $p_n = O(e^{n^{c_3}})$, then the Strong Irrepresentable Condition implies that Lasso has strong sign consistency. In particular, for $\lambda_n \propto n^{(1+c_4)/2}$ with $c_3 < c_4 < c_2 - c_1$,

$$P(\hat{\beta}_n(\lambda_n) =_s \beta_n) \geq 1 - o(e^{-n^{c_3}}) \to 1 \quad \text{as } n \to \infty.$$

SLIDE 17

Discussions

In this paper, the authors have provided Strong and Weak Irrepresentable Conditions that are almost necessary and sufficient for model selection consistency of Lasso under both small $p$ and large $p$ settings. Although much of Lasso's strength lies in its finite-sample performance, which is not the focus here, their asymptotic results offer insights and guidance for applications of Lasso as a feature selection tool, assuming that the typical regularity conditions on the design matrix are satisfied, as in Knight and Fu (2000).

SLIDE 18

References

Peng Zhao and Bin Yu. On Model Selection Consistency of Lasso. Journal of Machine Learning Research 7 (2006), 2541-2563.

Jinzhu Jia and Karl Rohe. Preconditioning To Comply With The Irrepresentable Condition. arXiv preprint (math.ST), 28 Aug 2012.

SLIDE 19

The End