Confidence Sets Based on Sparse Estimators Are Necessarily Large

Benedikt M. Pötscher, Department of Statistics, University of Vienna

Sparse Estimators and the "Oracle" Property
Given is a parametric statistical model indexed by a parameter $\theta \in \mathbb{R}^k$. An estimator $\hat\theta_n$ for $\theta$ is said to be sparse if for every $\theta \in \mathbb{R}^k$ and $i = 1, \dots, k$
\[ \lim_{n \to \infty} P_{n,\theta}\bigl(\hat\theta_{n,i} = 0\bigr) = 1 \quad \text{whenever } \theta_i = 0. \]
Examples of sparse estimators (that are also consistent for $\theta$):
- Post-model-selection estimators based on a consistent model selection procedure.
- Thresholding estimators with suitable choice of threshold $c_n$ (typically $c_n \to 0$, $n^{1/2} c_n \to \infty$).

Sparse Estimators and the "Oracle" Property (cont’d)
- Various penalized maximum likelihood (least squares) estimators (e.g., SCAD, LASSO, adaptive LASSO, certain Bridge estimators) for an appropriate choice of the regularization parameter.
For many (but not all) estimators, sparsity implies the so-called "oracle" property: their (pointwise) asymptotic distribution coincides with the distribution of an infeasible "estimator" (the "oracle") that makes use of the zero restrictions holding for the true parameter vector $\theta$. That is, the estimator "adapts" to the unknown zero restrictions.

A Simple Example
Let $Y_1, \dots, Y_n$ be iid $N(\theta, 1)$ and $\hat\theta_n = \bar Y \, \mathbf{1}(|\bar Y| > c_n)$ with $c_n \to 0$ and $n^{1/2} c_n \to \infty$. This is Hodges’ estimator. It is a post-model-selection estimator (hard-thresholding) based on consistent selection between the unrestricted model $M_U = \mathbb{R}$ and the restricted model $M_R = \{0\}$.
Then $\hat\theta_n$ is consistent for $\theta$ and satisfies the sparsity property
\[ \lim_{n \to \infty} P_{n,\theta}\bigl(\hat\theta_n = 0\bigr) = 1 \quad \text{whenever } \theta = 0, \]
as well as the "oracle" (superefficiency) property
\[ n^{1/2}(\hat\theta_n - \theta) \xrightarrow{d} \begin{cases} N(0,1) & \theta \neq 0 \\ N(0,0) & \theta = 0, \end{cases} \]
the "oracle" being the unrestricted MLE $\hat\theta(U) = \bar Y$ if $\theta \neq 0$, and the restricted MLE $\hat\theta(R) = 0$ if $\theta = 0$.
This seems to say that $\hat\theta_n$ is as good as the unrestricted MLE if $\theta \neq 0$ and as good as the restricted MLE if $\theta = 0$.
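
The sparsity property of Hodges’ estimator can be checked numerically. The following is a minimal Python sketch, not part of the original slides: the threshold `n ** -0.25` is just one illustrative choice satisfying the two rate conditions, and the function names are hypothetical. Since the sample mean is exactly normal with mean theta and variance 1/n, the probability that the estimator equals zero is available in closed form and no simulation is needed.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal cdf


def threshold(n: int) -> float:
    """Illustrative threshold c_n = n**(-1/4): c_n -> 0 and sqrt(n)*c_n -> infinity."""
    return n ** -0.25


def hodges(ybar: float, n: int) -> float:
    """Hodges' estimator: keep the sample mean only if it exceeds the threshold."""
    return ybar if abs(ybar) > threshold(n) else 0.0


def prob_zero(theta: float, n: int) -> float:
    """Exact P_{n,theta}(estimator = 0) = P(|Ybar| <= c_n), with Ybar ~ N(theta, 1/n)."""
    root_n = n ** 0.5
    c = threshold(n)
    return PHI(root_n * (c - theta)) - PHI(root_n * (-c - theta))


# Columns: n, P(estimator = 0) under theta = 0, P(estimator = 0) under theta = 1.
for n in (10, 1_000, 100_000):
    print(n, round(prob_zero(0.0, n), 4), round(prob_zero(1.0, n), 4))
```

Under theta = 0 the printed probability should approach 1 as n grows, while under theta = 1 it should approach 0, which is the pointwise behavior the sparsity and consistency statements describe.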

A Simple Example (cont’d)
The "oracle" property suggests the following confidence interval for $\theta$:
\[ C_n = \begin{cases} \bigl(\hat\theta_n - n^{-1/2} z_{1-\alpha/2},\ \hat\theta_n + n^{-1/2} z_{1-\alpha/2}\bigr) & \text{if } \hat\theta_n \neq 0 \\ \{0\} & \text{if } \hat\theta_n = 0. \end{cases} \]
That is, $C_n$ chooses between the standard confidence intervals based on the unrestricted and restricted MLE, respectively, depending on whether the model selection procedure underlying $\hat\theta_n$ chooses the unrestricted model $M_U = \mathbb{R}$ or the restricted model $M_R = \{0\}$.
Due to the "oracle" property, $C_n$ satisfies
\[ \lim_{n \to \infty} P_{n,\theta}(\theta \in C_n) = \begin{cases} 1 - \alpha & \text{for } \theta \neq 0 \\ 1 & \text{for } \theta = 0 \end{cases} \;\geq\; 1 - \alpha \quad \text{for every } \theta \in \mathbb{R}. \]
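
To make the construction concrete, here is a small Python sketch of the interval, under the same assumptions as before (the threshold `n ** -0.25` and the function name `oracle_interval` are illustrative choices, not taken from the slides). The singleton {0} is encoded as the degenerate pair `(0.0, 0.0)`.

```python
from statistics import NormalDist
from typing import Optional, Tuple


def oracle_interval(ybar: float, n: int, alpha: float = 0.05,
                    c_n: Optional[float] = None) -> Tuple[float, float]:
    """Return the endpoints of C_n; (0.0, 0.0) encodes the singleton {0}."""
    if c_n is None:
        c_n = n ** -0.25  # illustrative threshold, an assumption of this sketch
    theta_hat = ybar if abs(ybar) > c_n else 0.0  # Hodges' estimator
    if theta_hat == 0.0:
        return (0.0, 0.0)  # restricted model selected
    half_width = NormalDist().inv_cdf(1 - alpha / 2) / n ** 0.5
    return (theta_hat - half_width, theta_hat + half_width)


print(oracle_interval(0.8, 100))  # unrestricted model selected
print(oracle_interval(0.1, 100))  # restricted model selected
```

With n = 100 the illustrative threshold is about 0.316, so a sample mean of 0.8 yields the usual interval around the mean, while a sample mean of 0.1 collapses the set to {0}.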

Comments on the "Oracle" Property
A selection of recent papers establishing the "oracle" property for a variety of estimators in (semi)parametric models: Bunea (AS 2004); Bunea & McKeague (JMVA 2005); Fan & Li (JASA 2001, AS 2002, JASA 2004); Zou (JASA 2006); Wang & Leng (JASA 2007); Li & Liang (AS 2007); Wang, G. Li, & Tsai (JRSS B 2007); Zhang & Li (BA 2007); Wang, R. Li, & Tsai (BA 2007); Zou & Yuan (AS 2008); etc.

Comments on the "Oracle" Property (cont’d)
This literature views the "oracle" property as a desirable property of an estimator, as the "oracle" property seems to lead to a gain in efficiency and to a gain in the size of confidence sets. Zou & Yuan (AS 2008) call the "oracle" property a "gold standard for evaluating variable selection and coefficient estimation procedures".

Comments on the "Oracle" Property (cont’d)
However, nothing could be farther from the truth: bad minimax risk behavior of Hodges’ estimator has been known for decades (e.g., Lehmann & Casella (1998)).
Furthermore, the "confidence" set $C_n$ constructed above, although satisfying
\[ \lim_{n \to \infty} P_{n,\theta}(\theta \in C_n) = \begin{cases} 1 - \alpha & \text{for } \theta \neq 0 \\ 1 & \text{for } \theta = 0 \end{cases} \;\geq\; 1 - \alpha \quad \text{for every } \theta \in \mathbb{R}, \]
is dishonest in the sense that its minimal coverage probability satisfies
\[ \lim_{n \to \infty} \inf_{\theta \in \mathbb{R}} P_{n,\theta}(\theta \in C_n) = 0, \]
as pointed out by Beran (1992) and Kabaila (1995).
We establish general results of this sort for arbitrary confidence sets based on arbitrary sparse estimators in general (semi)parametric models.
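
The gap between pointwise and minimal coverage can be seen in an exact calculation for the Hodges example. The Python sketch below is illustrative only: it assumes the threshold `n ** -0.25`, and the function name `coverage` is hypothetical. It evaluates the coverage probability at a fixed nonzero parameter, at zero, and along the moving parameter equal to half the threshold, which is one convenient choice of a "bad" local sequence.

```python
from statistics import NormalDist

STD = NormalDist()


def coverage(theta: float, n: int, alpha: float = 0.05) -> float:
    """Exact P_{n,theta}(theta in C_n) for the interval built from Hodges' estimator."""
    root_n = n ** 0.5
    c_n = n ** -0.25  # illustrative threshold (assumption of this sketch)
    z = STD.inv_cdf(1 - alpha / 2)
    # With Z = sqrt(n) * (Ybar - theta) ~ N(0, 1), the restricted model is
    # selected exactly when Z lies in [a, b].
    a, b = root_n * (-c_n - theta), root_n * (c_n - theta)
    p_select_zero = STD.cdf(b) - STD.cdf(a)
    # Mass of {|Z| < z} that is lost because the restricted model was selected.
    lo, hi = max(-z, a), min(z, b)
    overlap = STD.cdf(hi) - STD.cdf(lo) if hi > lo else 0.0
    covered_unrestricted = (STD.cdf(z) - STD.cdf(-z)) - overlap
    # C_n = {0} covers theta only if theta = 0.
    covered_restricted = p_select_zero if theta == 0.0 else 0.0
    return covered_unrestricted + covered_restricted


# Columns: n, coverage at theta = 1, at theta = 0, and at theta = c_n / 2.
for n in (16, 100, 10_000):
    c_n = n ** -0.25
    print(n, round(coverage(1.0, n), 4), round(coverage(0.0, n), 4),
          round(coverage(c_n / 2, n), 4))
```

At the fixed parameters the coverage stays at or above the nominal level for large n, whereas along the moving parameter it drops toward zero, in line with the statement that the infimum over all parameter values tends to 0.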

Comments on the "Oracle" Property (cont’d)
These results complement results on bad minimax risk behavior of sparse estimators in Yang (BA 2005) and Leeb & Pötscher (JE 2008); earlier minimax risk results can be found in Hosoya (1984), Shibata (AIM 1986), Foster & George (AS 1994).

Results
Assume the statistical experiment $\{P_{n,\theta} : \theta \in \mathbb{R}^k\}$ satisfies for every $\theta \in \mathbb{R}^k$:
\[ P_{n,\theta/\sqrt{n}} \text{ is contiguous w.r.t. } P_{n,0}. \tag{1} \]
Let $C_n$ be a random set in $\mathbb{R}^k$ "based" on the sparse estimator $\hat\theta_n$ in the sense that
\[ P_{n,\theta}(\hat\theta_n \in C_n) = 1 \quad \text{for every } \theta \in \mathbb{R}^k. \tag{2} \]
E.g., $C_n = [\hat\theta_n - a_n, \hat\theta_n + b_n]$ is a $k$-dimensional box centered at $\hat\theta_n$ with $a_n$, $b_n$ possessing only nonnegative coordinates.

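Results (cont’d)
Theorem 1: Suppose Assumption (1) is satisfied, $\hat\theta_n$ is sparse, and $C_n$ satisfies (2). Let $\delta$ denote the asymptotic minimal coverage probability of $C_n$, i.e.,
\[ \delta = \liminf_{n \to \infty} \inf_{\theta \in \mathbb{R}^k} P_{n,\theta}(\theta \in C_n). \]
Then for every $t \geq 0$
\[ \liminf_{n \to \infty} \sup_{\theta \in \mathbb{R}^k} P_{n,\theta}\bigl(\sqrt{n}\, \operatorname{diam}(C_n) \geq t\bigr) \geq \delta. \tag{3} \]
More generally, for every $t \geq 0$ and every unit vector $e \in \mathbb{R}^k$
\[ \liminf_{n \to \infty} \sup_{\theta \in \mathbb{R}^k} P_{n,\theta}\bigl(\sqrt{n}\, \operatorname{ext}(C_n, \hat\theta_n, e) \geq t\bigr) \geq \delta, \tag{4} \]
where $\operatorname{ext}(C_n, \hat\theta_n, e) = \sup\{\lambda \geq 0 : \lambda e + \hat\theta_n \in C_n\}$.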
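
A heuristic sketch of why (3) is plausible may help. It is an informal reading of assumptions (1) and (2) together with sparsity, not the proof from the underlying work, and the local sequence used (a fixed vector of norm t divided by the square root of n) is a choice made here for illustration.

```latex
% Fix \theta \in \mathbb{R}^k with \|\theta\| = t and put \theta_n = \theta / \sqrt{n}.
% Sparsity gives P_{n,0}(\hat\theta_n \neq 0) \to 0, and contiguity (1) transfers
% this to P_{n,\theta_n}(\hat\theta_n \neq 0) \to 0.
% On the event \{\theta_n \in C_n,\ \hat\theta_n = 0\}, property (2) puts both 0 and
% \theta_n in C_n (almost surely), so \operatorname{diam}(C_n) \ge \|\theta_n\| = t / \sqrt{n}.
\begin{align*}
\sup_{\vartheta \in \mathbb{R}^k} P_{n,\vartheta}\bigl(\sqrt{n}\,\operatorname{diam}(C_n) \ge t\bigr)
  &\ge P_{n,\theta_n}\bigl(\sqrt{n}\,\operatorname{diam}(C_n) \ge t\bigr) \\
  &\ge P_{n,\theta_n}\bigl(\theta_n \in C_n,\ \hat\theta_n = 0\bigr) \\
  &\ge P_{n,\theta_n}(\theta_n \in C_n) - P_{n,\theta_n}(\hat\theta_n \neq 0) \\
  &\ge \inf_{\vartheta \in \mathbb{R}^k} P_{n,\vartheta}(\vartheta \in C_n) - o(1).
\end{align*}
% Taking \liminf_{n \to \infty} on both sides yields the lower bound \delta in (3).
```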

Results (cont’d)
- Any confidence set $C_n$ based on a sparse estimator that has positive asymptotic minimal coverage probability is necessarily larger by an order of magnitude than the classical MLE based confidence set, which has diameter of order $n^{-1/2}$. (If $\operatorname{diam} C_n$ is nonrandom, then $\sqrt{n}\, \operatorname{diam} C_n \to \infty$.)
- Confidence sets $C_n$ based on sparse estimators and constructed from the "oracle" property, like the interval in the Hodges’ estimator example, have bounded $\sqrt{n}\, \operatorname{diam} C_n$. Hence, they have asymptotic minimal coverage probability $0$.

Results (cont’d)
- Extension to semiparametric models $\{P_{n,\theta,\tau} : \theta \in \mathbb{R}^k, \tau \in T\}$ and to confidence sets for linear functions $A\theta$ is simple.
- For particular classes of sparse estimators the results in (3) and (4) can be strengthened.
- The assumption $\Theta = \mathbb{R}^k$ is not essential. Results hold as long as $0$ is an interior point of $\Theta$.

Partially Sparse Estimators
Suppose now $\theta = (\alpha', \beta')'$ where $\beta$ is $k_\beta \times 1$, and the estimator $\hat\theta_n$ for $\theta$ is partially sparse in the sense that for every $\theta \in \mathbb{R}^k$ and $i = 1, \dots, k_\beta$
\[ \lim_{n \to \infty} P_{n,\theta}\bigl(\hat\beta_{n,i} = 0\bigr) = 1 \]
holds whenever $\beta_i = 0$.
If $C_n$ is a confidence set for $\beta$ based on $\hat\beta_n$, Theorem 1 (extended to semiparametric models) can be immediately applied to give a similar result. This is not so if confidence sets for $\theta$ or $A\theta$ (with this linear function also depending on $\alpha$) are considered.

Partially Sparse Estimators (cont’d)
Theorem 2: Suppose for some $\alpha \in \mathbb{R}^{k - k_\beta}$ the sequence $P_{n,(\alpha, \beta/\sqrt{n})}$ is contiguous w.r.t. $P_{n,(\alpha, 0)}$ for every $\beta \in \mathbb{R}^{k_\beta}$. Let $\hat\theta_n$ be partially sparse. Let $A = (A_1, A_2)$ be a $q \times k$ matrix of full row-rank satisfying $\operatorname{rank} A_1 < q$. Suppose $C_n$ is based on $A\hat\theta_n$ (i.e., $P_{n,\theta}(A\hat\theta_n \in C_n) = 1$ for every $\theta$). Let $\delta$ denote the asymptotic minimal coverage probability of $C_n$, i.e.,
\[ \delta = \liminf_{n \to \infty} \inf_{\theta \in \mathbb{R}^k} P_{n,\theta}(A\theta \in C_n). \]
Then for every $t \geq 0$
\[ \liminf_{n \to \infty} \sup_{\theta \in \mathbb{R}^k} P_{n,\theta}\bigl(\sqrt{n}\, \operatorname{diam}(C_n) \geq t\bigr) \geq \delta. \]

Partially Sparse Estimators (cont’d)
The condition $\operatorname{rank} A_1 < q$ in Theorem 2 is, e.g., satisfied if $A = I_k$ or $A = (0, I_{k_\beta})$. It is not satisfied if $A = (I_{k - k_\beta}, 0)$. In this case a similar result can be obtained under an additional condition on the estimator.
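
The rank condition can be checked mechanically for the three example matrices. The Python sketch below is illustrative only: it assumes that the first block of A multiplies the non-sparse part of the parameter (the first k minus k_beta columns), which is the reading consistent with the examples above, and it uses hypothetical helper names and the small dimensions k = 3, k_beta = 2.

```python
from fractions import Fraction


def rank(matrix):
    """Rank of a small matrix via exact Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    rank_found = 0
    n_cols = len(rows[0]) if rows else 0
    for col in range(n_cols):
        # Look for a nonzero pivot in this column among the unprocessed rows.
        pivot = next((r for r in range(rank_found, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rank_found], rows[pivot] = rows[pivot], rows[rank_found]
        for r in range(rank_found + 1, len(rows)):
            factor = rows[r][col] / rows[rank_found][col]
            rows[r] = [x - factor * y for x, y in zip(rows[r], rows[rank_found])]
        rank_found += 1
    return rank_found


def satisfies_rank_condition(A, k_beta):
    """Check rank(A_1) < q, where A_1 holds the first k - k_beta columns of A."""
    q, k = len(A), len(A[0])
    A1 = [row[: k - k_beta] for row in A]
    return rank(A1) < q


k, k_beta = 3, 2  # illustrative dimensions
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # A = I_k
beta_only = [[0, 1, 0], [0, 0, 1]]            # A = (0, I_{k_beta})
alpha_only = [[1, 0, 0]]                      # A = (I_{k - k_beta}, 0)
for name, A in (("I_k", identity), ("(0, I)", beta_only), ("(I, 0)", alpha_only)):
    print(name, satisfies_rank_condition(A, k_beta))
```

The first two matrices pass the check and the third does not, matching the three cases listed on the slide.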

Summary
- Confidence sets based on sparse estimators are necessarily larger than standard MLE based confidence sets by an order of magnitude. These results hold under very weak conditions on the (semi)parametric model. Similar results hold for partially sparse estimators.
- Sparse estimators also have bad minimax risk properties (Lehmann & Casella (1998), Yang (2005), Leeb & Pötscher (2008)).
- Hence, despite their appeal at first sight, the sparsity property and the closely related "oracle" property have detrimental consequences for an estimator and associated confidence sets. This downside of sparse estimators is not visible in the pointwise asymptotic framework underlying the "oracle" property concept of Fan & Li (2001) and others.
