Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors
Victor Chernozhukov (MIT), Denis Chetverikov (UCLA), and Kengo Kato (U. of Tokyo)
Sep. 3, 2013


  1. Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors
  Victor Chernozhukov (MIT), Denis Chetverikov (UCLA), and Kengo Kato (U. of Tokyo)
  Sep. 3, 2013
  Chernozhukov Chetverikov K. (MIT, UCLA, UT) GAR and MB for Maxima of Sums of High-Dimensional Vectors Sep. 3. 2013 1 / 24

  2. This talk is based on the paper: Chernozhukov, V., Chetverikov, D., and Kato, K. (2012). Central limit theorems and multiplier bootstrap when $p$ is much larger than $n$. arXiv:1212.6906. [A revised version is to appear in Ann. Statist.; the title was changed during the revision process.] Applications to moment inequality models (if time allows) are based on an ongoing paper.

  3. Introduction
  Let $x_1, \dots, x_n$ be independent random vectors in $\mathbb{R}^p$, $p \ge 2$, with $\mathrm{E}[x_i] = 0$ and $\mathrm{E}[x_i x_i']$ existing. $\mathrm{E}[x_i x_i']$ may be degenerate. (Important!) Possibly $p \gg n$; keep in mind $p = p_n$.
  This paper is about approximating the distribution of
  $$T_0 = \max_{1 \le j \le p} \frac{1}{\sqrt{n}} \sum_{i=1}^n x_{ij}.$$
  By setting $x_{i,p+1} = -x_{i1}, \dots, x_{i,2p} = -x_{ip}$, we have
  $$\max_{1 \le j \le p} \left| \frac{1}{\sqrt{n}} \sum_{i=1}^n x_{ij} \right| = \max_{1 \le j \le 2p} \frac{1}{\sqrt{n}} \sum_{i=1}^n x_{ij},$$
  so maxima of absolute values are covered as well.
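The definition of $T_0$ and the sign-augmentation trick above can be sketched numerically. This is an illustration on synthetic data (the data-generating choices are ours, not from the talk):

```python
import numpy as np

# Synthetic data for illustration: n observations of a p-dimensional vector.
rng = np.random.default_rng(0)
n, p = 200, 50
x = rng.standard_normal((n, p))          # rows are the x_i in R^p, mean zero

# T0 = max_j n^{-1/2} sum_i x_ij
T0 = (x.sum(axis=0) / np.sqrt(n)).max()

# Appending the negated coordinates x_{i,p+j} = -x_{ij} turns the plain max
# over 2p coordinates into the max of absolute values over the original p.
x_aug = np.hstack([x, -x])               # n x 2p
T0_aug = (x_aug.sum(axis=0) / np.sqrt(n)).max()
T0_abs = np.abs(x.sum(axis=0) / np.sqrt(n)).max()
assert np.isclose(T0_aug, T0_abs)
```

The trick matters because results stated for the one-sided maximum then cover two-sided (absolute-value) statistics at the cost of doubling $p$, which is harmless when only $\log p$ enters the rates.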

  4. Introduction
  Let $y_1, \dots, y_n$ be independent normal random vectors with $y_i \sim N(0, \mathrm{E}[x_i x_i'])$. Define
  $$Z_0 = \max_{1 \le j \le p} \frac{1}{\sqrt{n}} \sum_{i=1}^n y_{ij}.$$
  When $p$ is fixed, the central limit theorem (subject to the Lindeberg condition) guarantees that
  $$\sup_{t \in \mathbb{R}} |\mathrm{P}(T_0 \le t) - \mathrm{P}(Z_0 \le t)| \to 0.$$

  5. Introduction
  Basic question: how large can $p = p_n$ be while still having
  $$\sup_{t \in \mathbb{R}} |\mathrm{P}(T_0 \le t) - \mathrm{P}(Z_0 \le t)| \to 0?$$
  Related to multivariate CLTs with growing dimension (Portnoy, 1986, PTRF; Götze, 1991, AoP; Bentkus, 2003, JSPI, etc.). Write
  $$X = \frac{1}{\sqrt{n}} \sum_{i=1}^n x_i, \qquad Y = \frac{1}{\sqrt{n}} \sum_{i=1}^n y_i.$$
  These papers are concerned with conditions under which
  $$\sup_{A \in \mathcal{A}} |\mathrm{P}(X \in A) - \mathrm{P}(Y \in A)| \to 0$$
  while allowing $p = p_n \to \infty$.

  6. Introduction
  Bentkus (2003) proved (in the i.i.d. case with $\mathrm{E}[x_i x_i'] = I$) that
  $$\sup_{A:\ \mathrm{convex}} |\mathrm{P}(X \in A) - \mathrm{P}(Y \in A)| = O\big(p^{1/4} \mathrm{E}[|x_1|^3] n^{-1/2}\big).$$
  Typically $\mathrm{E}[|x_1|^3] = O(p^{3/2})$, so the right-hand side is $o(1)$ provided $p = o(n^{2/7})$.
  The main message of the paper: to make
  $$\sup_{t \in \mathbb{R}} |\mathrm{P}(T_0 \le t) - \mathrm{P}(Z_0 \le t)| \to 0,$$
  $p$ can be much larger. Subject to some conditions, $\log p = o(n^{1/7})$ suffices.

  7. Introduction
  Still, the above approximation results are not directly usable unless the covariance structure between the coordinates of $X$ is known. In some cases we do know it: e.g. think of $x_i = \varepsilon_i z_i$, where $\varepsilon_i$ is a scalar (error) random variable with mean zero and common variance, and $z_i$ is a vector of non-stochastic covariates; then $T_0$ is the maximum of $t$-statistics. But usually we do not, and in such cases the distribution of $Z_0$ is unknown.
  $\Rightarrow$ We propose a Gaussian multiplier bootstrap for approximating the distribution of $T_0$ when the covariance structure between the coordinates of $X$ is unknown. Its validity is established through the Gaussian approximation results. Again, $p$ can be much larger than $n$.
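A minimal sketch of a Gaussian multiplier bootstrap in the spirit of the talk (the variable names, data, and the centering step are our illustrative choices; the paper's exact construction may differ in details): draw i.i.d. $N(0,1)$ multipliers $e_i$ and recompute the maximum with $x_{ij}$ replaced by $e_i x_{ij}$, conditionally on the data.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, B = 200, 50, 1000
x = rng.standard_normal((n, p))          # synthetic data, rows x_i
x = x - x.mean(axis=0)                   # center, since E[x_i] = 0 in the theory

# Observed statistic T0
T0 = (x.sum(axis=0) / np.sqrt(n)).max()

# Multiplier bootstrap: W0 = max_j n^{-1/2} sum_i e_i x_ij with e_i iid N(0,1),
# drawn conditionally on the data; its conditional law mimics that of Z0.
e = rng.standard_normal((B, n))          # B independent draws of the multipliers
W0 = (e @ x / np.sqrt(n)).max(axis=1)    # B bootstrap statistics

# (1 - alpha)-quantile of the bootstrap distribution as a critical value
alpha = 0.05
c_alpha = np.quantile(W0, 1 - alpha)
```

The appeal is that conditionally on the data, each coordinate sum $n^{-1/2}\sum_i e_i x_{ij}$ is exactly Gaussian with the empirical covariance structure, so no knowledge of $\mathrm{E}[x_i x_i']$ is needed.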

  8. Applications
  - Selecting design-adaptive tuning parameters for the Lasso (Tibshirani, 1996, JRSSB) and the Dantzig selector (Candès and Tao, 2007, AoS).
  - Multiple hypothesis testing (too many references to list).
  - Adaptive specification testing.
  These three applications are examined in the arXiv paper.
  - Testing many moment inequalities: will be treated if time allows.

  9. Literature
  - Classical CLTs with $p = p_n \to \infty$: Portnoy (1986, PTRF), Götze (1991, AoP), Bentkus (2003, JSPI), among many others.
  - Modern approaches to multivariate CLTs, developing Stein's method for normal approximation: Chatterjee (2005, arXiv), Chatterjee and Meckes (2008, ALEA), Reinert and Röllin (2009, AoP), Röllin (2011, AIHP); see also Harsha, Klivans, and Meka (2012, J. ACM).
  - Bootstrap in high dimensions: Mammen (1993, AoS), Arlot, Blanchard, and Roquain (2010a,b, AoS).

  10. Main Theorem
  Theorem. Suppose that there exist constants $0 < c_1 < C_1$ such that $c_1 \le n^{-1} \sum_{i=1}^n \mathrm{E}[x_{ij}^2] \le C_1$ for all $1 \le j \le p$. Then
  $$\sup_{t \in \mathbb{R}} |\mathrm{P}(T_0 \le t) - \mathrm{P}(Z_0 \le t)| \le C \inf_{\gamma \in (0,1)} \Big\{ n^{-1/8} \big(M_3^{3/4} \vee M_4^{1/2}\big) \log^{7/8}(pn/\gamma) + n^{-1/2} Q(1-\gamma) \log^{3/2}(pn/\gamma) + \gamma \Big\},$$
  where $C = C(c_1, C_1) > 0$. Here $Q(1-\gamma)$ is the $(1-\gamma)$-quantile of $\max_{i,j} |x_{ij}|$ $\vee$ the $(1-\gamma)$-quantile of $\max_{i,j} |y_{ij}|$, and $M_k = \max_{1 \le j \le p} \big(n^{-1} \sum_{i=1}^n \mathrm{E}[|x_{ij}|^k]\big)^{1/k}$.

  11. Comments
  No restriction on the correlation structure. The extra parameter $\gamma$ appears essentially to avoid terms of the form
  $$\mathrm{E}\Big[\max_{1 \le j \le p} |x_{ij}|^k\Big]$$
  in the bound; notice the difference from $M_k$. To avoid such terms, we use a suitable truncation, and $\gamma$ controls the level of truncation.

  12. Techniques
  Many techniques are used to prove the main theorem. Directly bounding the probability difference $\mathrm{P}(T_0 \le t) - \mathrm{P}(Z_0 \le t)$ is difficult. Instead, transform the problem into bounding
  $$\mathrm{E}[g(X) - g(Y)], \quad g \text{ smooth},$$
  where $X = n^{-1/2} \sum_{i=1}^n x_i$ and $Y = n^{-1/2} \sum_{i=1}^n y_i$. How? Approximate $z = (z_1, \dots, z_p)' \mapsto \max_{1 \le j \le p} z_j$ by
  $$F_\beta(z) = \beta^{-1} \log\Big( \sum_{j=1}^p e^{\beta z_j} \Big).$$
  Then $0 \le F_\beta(z) - \max_{1 \le j \le p} z_j \le \beta^{-1} \log p$.
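The smooth-max bounds on $F_\beta$ are deterministic and easy to check numerically. A small sketch (the helper name `F_beta` and the max-subtraction stabilization are ours):

```python
import numpy as np

def F_beta(z, beta):
    # beta^{-1} log(sum_j exp(beta z_j)), computed in logsumexp form:
    # subtracting the max before exponentiating avoids overflow.
    m = z.max()
    return m + np.log(np.exp(beta * (z - m)).sum()) / beta

rng = np.random.default_rng(2)
p, beta = 100, 5.0
z = rng.standard_normal(p)

# Verify 0 <= F_beta(z) - max_j z_j <= log(p) / beta
gap = F_beta(z, beta) - z.max()
assert 0.0 <= gap <= np.log(p) / beta
```

The lower bound holds because the sum contains the term $e^{\beta \max_j z_j}$, and the upper bound because the sum has at most $p$ terms, each at most $e^{\beta \max_j z_j}$; taking $\beta$ large makes the approximation error $\beta^{-1}\log p$ small at the cost of less smoothness.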

  13. Techniques
  Approximate the indicator function $1(\cdot \le t)$ by a smooth function $h$ (standard), and take $g = h \circ F_\beta$. Use a variant of Stein's method to bound $\mathrm{E}[g(X) - g(Y)]$. (*) Truncation and some fine properties of $F_\beta$ are used here. To obtain a bound on the probability difference from (*), we need an anti-concentration inequality for maxima of normal random vectors. Intuition: from (*), we will have a bound on $\mathrm{P}(T_0 \le t) - \mathrm{P}(Z_0 \le t + \mathrm{error})$, and we want to replace $\mathrm{P}(Z_0 \le t + \mathrm{error})$ by $\mathrm{P}(Z_0 \le t)$.

  14. Simplified anti-concentration inequality
  Lemma (simplified form). Let $(Y_1, \dots, Y_p)'$ be a normal random vector with $\mathrm{E}[Y_j] = 0$ and $\mathrm{E}[Y_j^2] = 1$ for all $1 \le j \le p$. Then for every $\epsilon > 0$,
  $$\sup_{t \in \mathbb{R}} \mathrm{P}\Big( \Big| \max_{1 \le j \le p} Y_j - t \Big| \le \epsilon \Big) \le 4\epsilon \Big( \mathrm{E}\Big[\max_{1 \le j \le p} Y_j\Big] + 1 \Big).$$
  This bound is universally tight (up to a constant).
  Note 1: $\mathrm{E}[\max_{1 \le j \le p} Y_j] \le \sqrt{2 \log p}$.
  Note 2: The inequality is dimension-free: it is easy to extend to separable Gaussian processes.
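Note 1 can be illustrated by Monte Carlo in the i.i.d. case (the bound holds for any unit-variance coordinates; the simulation setup and sample sizes below are our choices):

```python
import numpy as np

# Monte Carlo estimate of E[max_j Y_j] for p iid standard normals,
# compared with the bound sqrt(2 log p) from Note 1.
rng = np.random.default_rng(3)
p, reps = 1000, 2000
Y = rng.standard_normal((reps, p))
est = Y.max(axis=1).mean()               # estimate of E[max_{1<=j<=p} Y_j]
bound = np.sqrt(2 * np.log(p))           # ~3.72 for p = 1000
assert est <= bound
```

For $p = 1000$ the estimate sits visibly below the bound, consistent with the bound being an upper envelope rather than an exact value; for strongly correlated coordinates $\mathrm{E}[\max_j Y_j]$ is smaller still, which is why the anti-concentration bound adapts to the correlation structure.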

  15. Some consequences
  Assumption: either
  (E.1) $\mathrm{E}[\exp(|x_{ij}|/B_n)] \le 2$ for all $i, j$; or
  (E.2) $\big(\mathrm{E}[\max_{1 \le j \le p} x_{ij}^4]\big)^{1/4} \le B_n$ for all $i$.
  Moreover, assume both
  (M.1) $c_1 \le n^{-1} \sum_{i=1}^n \mathrm{E}[x_{ij}^2] \le C_1$ for all $j$; and
  (M.2) $n^{-1} \sum_{i=1}^n \mathrm{E}[|x_{ij}|^{2+k}] \le B_n^k$, $k = 1, 2$, for all $j$.
  Here $B_n \to \infty$ is allowed. E.g. consider the case where $x_i = \varepsilon_i z_i$, with $\varepsilon_i$ a mean-zero scalar error and $z_i$ a vector of non-stochastic covariates normalized so that $n^{-1} \sum_{i=1}^n z_{ij}^2 = 1$ for all $j$. Then (E.2), (M.1), (M.2) are satisfied if $\mathrm{E}[\varepsilon_i^2] \ge c_1$, $\mathrm{E}[\varepsilon_i^4] \le C_1$, and $|z_{ij}| \le B_n$ for all $i, j$, after adjusting constants.

  16. Corollary
  Corollary. Suppose that one of the following conditions is satisfied:
  (i) (E.1) and $B_n^2 \log^7(pn) \le C_1 n^{1-c_1}$; or
  (ii) (E.2) and $B_n^4 \log^7(pn) \le C_1 n^{1-c_1}$.
  Moreover, suppose that (M.1) and (M.2) are satisfied. Then
  $$\sup_{t \in \mathbb{R}} |\mathrm{P}(T_0 \le t) - \mathrm{P}(Z_0 \le t)| \le C n^{-c},$$
  where $c, C$ depend only on $c_1, C_1$.
