Multi-level Thresholding Tests for High Dimensional Means and Covariance Matrices

Song Xi Chen
Guanghua School of Management and Center for Statistical Science, Peking University
Joint work with Bin Guo and Yumou Qiu

Two Sample Testing and Signal Detection
◮ X1, . . . , Xn1 i.i.d. ∼ F1(µ1, Σ1) and Y1, . . . , Yn2 i.i.d. ∼ F2(µ2, Σ2)
◮ Xk = (Xk1, . . . , Xkp)T and Yk = (Yk1, . . . , Ykp)T are p-dimensional
◮ Means: µ1 = (µ11, . . . , µ1p)T and µ2 = (µ21, . . . , µ2p)T
◮ Covariances: Σ1 = (σij1)p×p and Σ2 = (σij2)p×p

Signals in the Mean: H0 : µ1 = µ2 vs. Ha : µ1 ≠ µ2
Signals in the Covariance: H0 : Σ1 = Σ2 vs. Ha : Σ1 ≠ Σ2
Tests for Means: Hotelling's T^2

T^2 = (X̄1 − X̄2)′ {Sn(1/n1 + 1/n2)}^{−1} (X̄1 − X̄2),
where Sn = (n1 + n2 − 2)^{−1} Σ_{i=1}^{2} Σ_{j=1}^{ni} (Xij − X̄i)(Xij − X̄i)′.

Under H0 : µ1 = µ2 and Gaussianity,
{(n1 + n2 − p − 1)/(p(n1 + n2 − 2))} T^2 ∼ F_{p, n1+n2−p−1}.

Reject H0 at level α if {(n1 + n2 − p − 1)/(p(n1 + n2 − 2))} T^2 > F_{p, n1+n2−p−1}(α).
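For reference, the classical procedure above can be sketched in a few lines of Python (a minimal sketch; the function name and the use of `scipy.stats.f` are my own choices, not from the slides):

```python
import numpy as np
from scipy.stats import f

def hotelling_two_sample(X, Y):
    """Two-sample Hotelling T^2 test (Gaussian data, requires p < n1 + n2 - 1)."""
    n1, p = X.shape
    n2 = Y.shape[0]
    diff = X.mean(axis=0) - Y.mean(axis=0)
    # pooled covariance S_n with divisor n1 + n2 - 2
    Sn = ((n1 - 1) * np.cov(X, rowvar=False)
          + (n2 - 1) * np.cov(Y, rowvar=False)) / (n1 + n2 - 2)
    T2 = diff @ np.linalg.solve(Sn * (1 / n1 + 1 / n2), diff)
    stat = (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * T2
    pval = f.sf(stat, p, n1 + n2 - p - 1)  # upper tail of F_{p, n1+n2-p-1}
    return T2, pval
```

The inversion of Sn is exactly what breaks when p exceeds n1 + n2 − 2, which motivates the high-dimensional tests on the next slides.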
HD Tests for Means without Thresholding

◮ Bai and Saranadasa (BS) (1996) removed Sn^{−1} from T^2:
  BS = (X̄1 − X̄2)′(X̄1 − X̄2) − {(n1 + n2)/(n1n2)} tr(Sn).
  Requires: (i) p/n → c ∈ [0, ∞) and (ii) λmax = o{√tr(Σ^2)}.
◮ Srivastava (2009): replaced Sn with the diagonal matrix of Sn in T^2.
  Requires: Gaussian data and p ∼ n.
◮ Chen and Qin (2010): proposed a U-statistic formulation allowing p ≫ n and Σ1 ≠ Σ2.
Chen-Qin (2010, AoS) Test
Qn = Σ_{i≠j} X1i^T X1j / {n1(n1 − 1)} + Σ_{i≠j} X2i^T X2j / {n2(n2 − 1)} − 2 Σ_{i=1}^{n1} Σ_{j=1}^{n2} X1i^T X2j / (n1n2).

◮ A linear combination of one- and two-sample U-statistics.
◮ E(Qn) = µ1^T µ1 + µ2^T µ2 − 2µ1^T µ2 = ‖µ1 − µ2‖^2.
◮ Main assumption for asymptotic normality of Qn: tr(Σ^4)/tr^2(Σ^2) → 0 as p → ∞.
◮ Applicable for ANY p if the eigenvalues are bounded; thus, allows p ≫ n.
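The statistic Qn can be computed without explicit double loops via Gram matrices; a minimal sketch (the function name is mine):

```python
import numpy as np

def cq_statistic(X, Y):
    """Chen-Qin (2010) statistic Qn: one- and two-sample U-statistics of inner products."""
    n1, n2 = X.shape[0], Y.shape[0]
    G1 = X @ X.T                 # Gram matrix of sample 1: entries X1i' X1j
    G2 = Y @ Y.T
    C = X @ Y.T                  # cross inner products X1i' X2j
    term1 = (G1.sum() - np.trace(G1)) / (n1 * (n1 - 1))   # exclude i = j
    term2 = (G2.sum() - np.trace(G2)) / (n2 * (n2 - 1))
    term3 = 2 * C.sum() / (n1 * n2)
    return term1 + term2 - term3
```

Excluding the diagonal of each Gram matrix is what makes each one-sample term an unbiased estimator of µh^T µh.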
Asymptotic Power of Chen-Qin Test
Power ≈ Φ( −zα + nκ(1 − κ)‖µ1 − µ2‖^2 / √{2tr(Σ̃^2)} ),
where Σ̃ = κΣ1 + (1 − κ)Σ2 and κ = lim_{n1,n2→∞} n1/(n1 + n2).

◮ A VALID test under weak assumptions for a wide range of dimensions.
◮ “VALID” means control of the type I error.
◮ The power may be weak under high dimension due to the inflated tr(Σ̃^2).
Thresholding Tests for Means
◮ One-sample Higher Criticism (HC) test (Tukey, 1976).
◮ Donoho and Jin (2004) pioneered the theory under Np(µ, Ip).
◮ µ = (µ1, · · · , µp) and the non-zero µi = √(2r log p).
◮ Faint signals if r ∈ (0, 1).
◮ Sβ = {k : µk ≠ 0}, the signal set.
◮ |Sβ| = p^{1−β} — the number of signals.
◮ Sparse signals if β ∈ (0.5, 1).
Higher Criticism (HC)
◮ X ∼ N(µ, Ip).
◮ Zi is the Z-statistic at the i-th dimension.
◮ pi = P(N(0, 1) > Zi) is the p-value for the i-th null.
◮ Sorted p-values: p(1) ≤ p(2) ≤ · · · ≤ p(p).
◮ The HC statistic:
  HC*n = max_{0≤i≤α*p} √p (i/p − p(i)) / √{p(i)(1 − p(i))}.
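The HC statistic is a normalized comparison of the empirical p-value distribution against the uniform; a minimal sketch under the one-sided Gaussian setup of the slides (the function name and the default cutoff `alpha0=0.5` are my own assumptions):

```python
import numpy as np
from scipy.stats import norm

def higher_criticism(z, alpha0=0.5):
    """HC* statistic from one-sided p-values of z-scores (Donoho-Jin style)."""
    p = len(z)
    pvals = np.sort(norm.sf(z))          # sorted one-sided p-values p(1) <= ... <= p(p)
    i = np.arange(1, p + 1)
    hc = np.sqrt(p) * (i / p - pvals) / np.sqrt(pvals * (1 - pvals))
    k = max(1, int(alpha0 * p))          # maximize only over the smallest alpha0*p p-values
    return hc[:k].max()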
Optimal Detection Boundary Under Np(µ, Ip)
◮ A phase diagram in the (r, β)-plane with boundary r = ̺(β).
◮ If r > ̺(β), H0 and H1 are asymptotically separable; if r < ̺(β), they are not separable.
◮ Donoho and Jin (2004) established the detection boundary of the HC test for Gaussian data:
  ̺*(β) = β − 1/2 for 1/2 < β ≤ 3/4;
  ̺*(β) = (1 − √(1 − β))^2 for 3/4 < β < 1.
◮ Same as the optimal detection boundary of Ingster (1999), attained without knowing the underlying signal strength r and sparsity β.
(i) For any test of the hypothesis, P(Type I Error) + P(Type II Error) → 1 if r < ̺(β) as n, p → ∞;
(ii) there exists a test (HC) such that P(Type I Error) + P(Type II Error) → 0 if r > ̺(β) as n, p → ∞.
Lγ-Thresholding for H0 : µ = 0
◮ Motivated by Donoho and Johnstone (1994) and Fan (1996).
◮ X̄i = n^{−1} Σ_{j=1}^{n} Xij.
◮ The threshold statistics:
  Tγn(s) = Σ_{i=1}^{p} |√n X̄i|^γ I{|√n X̄i| > √(2s log p)} for s ∈ (0, 1).
◮ γ = 0: the HC;
◮ γ = 1: the L1-thresholding (hard thresholding) of Donoho and Johnstone (1994);
◮ γ = 2: the L2-thresholding used in Zhong, Chen and Xu (2013).
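The family Tγn(s) is a one-liner once the standardized means are formed; a minimal sketch (function name mine, γ = 0 returning the exceedance count):

```python
import numpy as np

def threshold_stat(xbar, n, s, gamma):
    """L_gamma-thresholding statistic T_{gamma,n}(s) for H0: mu = 0."""
    p = len(xbar)
    t = np.sqrt(n) * xbar                    # standardized componentwise means
    lam = np.sqrt(2 * s * np.log(p))         # threshold sqrt(2 s log p)
    keep = np.abs(t) > lam
    if gamma == 0:
        return keep.sum()                    # HC-type: count of exceedances
    return (np.abs(t[keep]) ** gamma).sum()  # L1 or L2 thresholding
```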
L2-Thresholding Tests
◮ One sample: Zhong, Chen and Xu (2013, AoS).
◮ Two sample: Chen, Li and Zhong (2019, AoS).
◮ Can also attain Ingster's “optimal” detection boundary when the underlying distribution is unknown and the data are dependent (Σ ≠ Ip).
◮ More powerful than the HC when (r, β) is above the boundary (ZCX).
◮ The detection boundary can be lowered by utilizing the dependence (CLZ): first transform the data Xij to Σ̂^{−1}Xij, then apply the L2-thresholding.
Two-Sample for Means: Signals and Sparsity
◮ δk = µ1k − µ2k — the signal in the k-th dimension.
◮ Sβ = {k : δk ≠ 0}, the signal set.
◮ |Sβ| = p^{1−β} — the number of signals.
◮ Sparse if β ∈ (0.5, 1).
Two-Sample for Means: L2 Test Statistic
◮ An unbiased estimator of the squared signal δk^2 via U-statistics:
  Tnk = Σ_{i≠j} X1i^{(k)} X1j^{(k)} / {n1(n1 − 1)} + Σ_{i≠j} X2i^{(k)} X2j^{(k)} / {n2(n2 − 1)} − 2 Σ_{i=1}^{n1} Σ_{j=1}^{n2} X1i^{(k)} X2j^{(k)} / (n1n2).
◮ Test statistic: T̃n = n Σ_k Tnk.
◮ Chen and Qin (2010, AoS).
Two-Sample for Means: L2 vs L2-Thresholding Statistics
◮ CQ: T̃n = n Σ_{i∈Sβ^c} Tni + n Σ_{k∈Sβ} Tnk.
◮ Oracle: n Σ_{i∈Sβ} Tni.
◮ Thresholding statistic:
  Ln(s) = Σ_{k=1}^{p} nTnk I{nTnk + 1 > λn(s)}, where λn(s) = 2s log(p).
◮ Aims to exclude the dimensions with δk = 0.
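The componentwise U-statistics Tnk and the thresholding statistic Ln(s) admit a vectorized sketch, using the identity Σ_{i≠j} xi xj = (Σ xi)^2 − Σ xi^2 per coordinate (function names are mine; I take n = n1n2/(n1 + n2) as defined on a later slide):

```python
import numpy as np

def tnk(X, Y):
    """Vector of componentwise unbiased estimators T_{nk} of delta_k^2."""
    n1, n2 = X.shape[0], Y.shape[0]
    s1, q1 = X.sum(axis=0), (X ** 2).sum(axis=0)
    s2, q2 = Y.sum(axis=0), (Y ** 2).sum(axis=0)
    t1 = (s1 ** 2 - q1) / (n1 * (n1 - 1))   # sum_{i != j} X1i X1j / {n1(n1-1)}
    t2 = (s2 ** 2 - q2) / (n2 * (n2 - 1))
    t12 = 2 * s1 * s2 / (n1 * n2)           # cross term
    return t1 + t2 - t12

def L_n(X, Y, s):
    """Thresholding statistic L_n(s) with lambda_n(s) = 2 s log p."""
    n = X.shape[0] * Y.shape[0] / (X.shape[0] + Y.shape[0])  # effective sample size
    T = n * tnk(X, Y)
    lam = 2 * s * np.log(X.shape[1])
    return T[T + 1 > lam].sum()
```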
Two-Sample Tests for Means: Variance Comparison – Strong Signal Case
Assume strong signals: nδk^2 > 2 log(p).

Tests          Variances
L2             2p + 2 Σ_{i≠j} ρij^2 + 4n Σ_{k,l∈Sβ} δkδlρkl
Oracle         2p^{1−β} + 2 Σ_{i≠j∈Sβ} ρij^2 + 4n Σ_{k,l∈Sβ} δkδlρkl
Thresholding   2Lp + 2p^{1−β} + 2 Σ_{i≠j∈Sβ} ρij^2 + 4n Σ_{k,l∈Sβ} δkδlρkl

Here Lp denotes a slowly varying function of the form (a log p)^b.
Multi-level Thresholding: Weak Signal Case
Weak signals: δk^2 = 2r log p / n for r < 1.

MLn = max_{s∈Sn} {Ln(s) − µ̂_{Ln(s),0}} / σ̂_{Ln(s),0},
where Sn = {sk : sk = n(X̄1^{(k)} − X̄2^{(k)})^2 / (2 log p) for k = 1, · · · , p}.

Theorem. Under Conditions C1–C3 and H0,
P{a(log p) MLn − b(log p, η) ≤ x} → exp(−e^{−x}),
where a(y) = (2 log y)^{1/2} and b(y, η) = 2 log y + 2^{−1} log log y − 2^{−1} log{4π/(1 − η)^2}.
Detection Boundary of Multi-level Thresholding for Means
The multi-level thresholding test rejects H0 if MLn ≥ Gα = {qα + b(log p, η)}/a(log p), where qα is the upper-α quantile of the Gumbel distribution. Define
̺(β) = β − 1/2 for 1/2 ≤ β ≤ 3/4;
̺(β) = (1 − √(1 − β))^2 for 3/4 < β < 1.

Theorem. Assume Conditions C1–C3. If r > ̺(β), the sum of type I and II errors of the multi-level thresholding test converges to zero as α → 0 and p → ∞.

◮ The same “detection boundary” as the optimal one for the Np(µ, Ip) case.
◮ Signal enhancement by transforming the data with the precision matrix Ω = Σ^{−1}.
◮ Improved detection boundary: lower than ̺(β).
◮ See Chen, Li and Zhong (2019) for details.
Two Sample Tests for Covariance Matrices
◮ H0 : Σ1 = Σ2 vs. Ha : Σ1 ≠ Σ2.
◮ Sn1 = (sij1), Sn2 = (sij2): the two sample covariance matrices.
◮ θij1 = Var{(Xki − µ1i)(Xkj − µ1j)} and θij2 = Var{(Yki − µ2i)(Ykj − µ2j)}.
◮ θ̂ij1 = n1^{−1} Σ_{k=1}^{n1} {(Xki − X̄i)(Xkj − X̄j) − sij1}^2 →p θij1.
◮ θ̂ij2 = n2^{−1} Σ_{k=1}^{n2} {(Yki − Ȳi)(Ykj − Ȳj) − sij2}^2 →p θij2.
◮ The standardized squared differences:
  Mij = (sij1 − sij2)^2 / (θ̂ij1/n1 + θ̂ij2/n2), 1 ≤ i ≤ j ≤ p.
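The matrix of standardized statistics Mij can be computed in one vectorized pass; a minimal sketch (function name mine; I use divisor n for the sample covariances, a detail the slides leave implicit):

```python
import numpy as np

def m_matrix(X, Y):
    """Standardized squared covariance differences M_ij for all pairs (i, j)."""
    n1, n2 = X.shape[0], Y.shape[0]
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    S1 = Xc.T @ Xc / n1                         # sample covariances, divisor n
    S2 = Yc.T @ Yc / n2
    # theta_hat: mean squared deviation of centered cross-products around s_ij
    W1 = np.einsum('ki,kj->kij', Xc, Xc) - S1   # shape (n1, p, p)
    W2 = np.einsum('ki,kj->kij', Yc, Yc) - S2
    th1 = (W1 ** 2).mean(axis=0)
    th2 = (W2 ** 2).mean(axis=0)
    return (S1 - S2) ** 2 / (th1 / n1 + th2 / n2)
```

The `einsum` step materializes an n × p × p array, so for large p one would loop over pairs instead.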
Existing Work
◮ Bai et al. (2009, AoS): corrected likelihood ratio test using RMT.
◮ Cai, Liu and Xia (2013): Lmax statistic Mn = max_{1≤i≤j≤p} Mij.
  ◮ Uses only the maximal signal.
◮ Li and Chen (2012): L2 statistic, summing over all Mij.
  ◮ Includes too many uninformative entries.
◮ Srivastava and Yanagihara (2010): also an L2-type statistic, measuring tr(Σ1^2)/tr^2(Σ1) − tr(Σ2^2)/tr^2(Σ2).
L2-Test Statistic: Li and Chen (2012)
◮ Targets the squared Frobenius norm: tr{(Σ1 − Σ2)^2} = tr(Σ1^2) + tr(Σ2^2) − 2tr(Σ1Σ2).
◮ Note that Σ1 = Σ2 if and only if tr{(Σ1 − Σ2)^2} = 0.
◮ Although the Frobenius norm is a large-scale measure, it brings two advantages:
  (i) relatively easy to analyze for test procedures and the power formula;
  (ii) can target specific sections of the covariance matrix.
Unbiased Estimators of tr(Σh^2) and tr(Σ1Σ2)

For h = 1 or 2,
Anh = Σ_{i≠j} (X′hi Xhj)^2 / {nh(nh − 1)}
      − 2 Σ*_{i,j,k} X′hi Xhj X′hj Xhk / {nh(nh − 1)(nh − 2)}
      + Σ*_{i,j,k,l} X′hi Xhj X′hk Xhl / {nh(nh − 1)(nh − 2)(nh − 3)}.

For tr(Σ1Σ2):
Cn1n2 = Σ_{i,j} (X′1i X2j)^2 / (n1n2)
        − Σ_{i≠k,j} X′1i X2j X′2j X1k / {n1n2(n1 − 1)}
        − Σ_{i≠k,j} X′2i X1j X′1j X2k / {n1n2(n2 − 1)}
        + Σ_{i≠k,j≠l} X′1i X2j X′1k X2l / {n1n2(n1 − 1)(n2 − 1)},

where Σ* denotes summation over mutually distinct indices.
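The distinct-index sums in Anh can be checked against a direct enumeration; the sketch below (function name mine) uses brute-force O(n^4) loops over permutations for clarity, which is only feasible for small n — practical implementations vectorize these sums, and the cross term Cn1n2 follows the same pattern:

```python
import itertools
import numpy as np

def A_nh(X):
    """Unbiased estimator A_nh of tr(Sigma^2): direct enumeration over distinct indices."""
    n = X.shape[0]
    G = X @ X.T          # G[i, j] = X_i' X_j
    s1 = sum(G[i, j] ** 2 for i, j in itertools.permutations(range(n), 2))
    s2 = sum(G[i, j] * G[j, k] for i, j, k in itertools.permutations(range(n), 3))
    s3 = sum(G[i, j] * G[k, l] for i, j, k, l in itertools.permutations(range(n), 4))
    return (s1 / (n * (n - 1))
            - 2 * s2 / (n * (n - 1) * (n - 2))
            + s3 / (n * (n - 1) * (n - 2) * (n - 3)))
```

The second and third terms remove the mean effects, which is what makes the estimator invariant to location shifts.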
Test statistic
Tn1,n2 = An1 + An2 − 2Cn1n2.
◮ E(Tn1,n2) = tr{(Σ1 − Σ2)^2}.
◮ Leading-order variance:
  σ^2_{n1,n2} = Σ_{i=1}^{2} [ (4/ni^2) tr^2(Σi^2) + (8/ni) tr{(Σi^2 − Σ1Σ2)^2} + (4∆i/ni) tr{Γ′i(Σ1 − Σ2)Γi ◦ Γ′i(Σ1 − Σ2)Γi} ] + {8/(n1n2)} tr^2(Σ1Σ2).
◮ Under H0 : Σ1 = Σ2 = Σ,
  σ^2_{0,n1,n2} = 4(1/n1 + 1/n2)^2 tr^2(Σ^2).
Assumptions
◮ A1: As min{n1, n2} → ∞, n1/(n1 + n2) → κ ∈ (0, 1).
◮ A2: As min{n1, n2} → ∞, p(n1, n2) → ∞ and, for i, j, k, l ∈ {1, 2}, tr{(ΣiΣj)(ΣkΣl)} = o{tr(ΣiΣj)tr(ΣkΣl)}.
◮ A3: Xij = ΓiZij + µi, where ΓiΓ′i = Σi, E(Zij) = 0 and Var(Zij) = Imi.
Asymptotic Normality of Tn1,n2
◮ Under Assumptions A1–A3, as min{n1, n2} → ∞,
  σ^{−1}_{n1,n2} [Tn1,n2 − tr{(Σ1 − Σ2)^2}] →d N(0, 1).
◮ Variance estimation: σ̂_{0,n1,n2} := (2/n2)An1 + (2/n1)An2, with
  Ani/tr(Σi^2) →p 1 and σ̂_{0,n1,n2}/σ_{0,n1,n2} →p 1.
Test Procedure and Power
◮ A nominal α-level test rejects H0 if Tn1,n2 ≥ σ̂_{0,n1,n2} zα, where zα is the upper-α quantile of N(0, 1).
◮ Power of the test:
  Φ[ −Zn1,n2(Σ1, Σ2) zα + tr{(Σ1 − Σ2)^2}/σn1,n2 ],
  where Zn1,n2(Σ1, Σ2) = σ^{−1}_{n1,n2} {(2/n2)tr(Σ1^2) + (2/n1)tr(Σ2^2)}.
◮ The Li-Chen test operates under weak assumptions,
◮ but its power is not high for sparse and faint signals.
Standardization of Sample Covariances
◮ Sn1 = (sij1), Sn2 = (sij2): the two sample covariance matrices.
◮ θij1 = Var{(Xki − µ1i)(Xkj − µ1j)} and θij2 = Var{(Yki − µ2i)(Ykj − µ2j)}.
◮ θ̂ij1 = n1^{−1} Σ_{k=1}^{n1} {(Xki − X̄i)(Xkj − X̄j) − sij1}^2 →p θij1.
◮ θ̂ij2 = n2^{−1} Σ_{k=1}^{n2} {(Yki − Ȳi)(Ykj − Ȳj) − sij2}^2 →p θij2.
◮ The standardized squared differences:
  Mij = (sij1 − sij2)^2 / (θ̂ij1/n1 + θ̂ij2/n2), 1 ≤ i ≤ j ≤ p.
Thresholding for Covariance Testing
◮ Under the null hypothesis and some assumptions, as n, p → ∞,
  P{ max_{1≤i≤j≤p} Mij > 4 log(p) } → 0.
◮ Thresholding on Mij:
  Tn(s) = Σ_{1≤i≤j≤p} Mij I{Mij > λp(s)},
◮ where λp(s) = 4s log(p) for a thresholding parameter s ∈ (0, 1).
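Given the matrix M of standardized statistics from the previous slides, the thresholding statistic is a one-liner over the upper triangle; a minimal sketch (function name mine):

```python
import numpy as np

def t_n(M, s):
    """Covariance thresholding statistic T_n(s): sum of M_ij exceeding 4 s log p."""
    p = M.shape[0]
    m = M[np.triu_indices(p)]        # entries with 1 <= i <= j <= p
    lam = 4 * s * np.log(p)
    return m[m > lam].sum()
```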
Sparse and weak signal: A real example on Neg and Bcr
Sparse and Weak Alternative Hypothesis
◮ n = n1n2/(n1 + n2) and q = p(p + 1)/2.
◮ Number of nonzero δij: ma = ⌊q^{1−β}⌋ for a β ∈ (1/2, 1).
◮ Nonzero value: δij = δa = √(4r log(p)/n) for δij ≠ 0.
◮ β ∈ (1/2, 1): sparse signals; r ∈ (0, 1): faint signals.

H0 : δij = 0 for all 1 ≤ i ≤ j ≤ p vs. Ha : there are ma nonzero δij with strength δa.
Main assumptions
Assumption 1A. Exponential rate: log p ∼ n^̟, ̟ ∈ (0, 1/3).
Assumption 1B. Polynomial rate: n ∼ p^ξ, ξ ∈ (0, 2).
Assumption 2. Sub-Gaussian distributions: E[exp{t(Xkj − µ1j)^2}] ≤ C and E[exp{t(Ykj − µ2j)^2}] ≤ C.
Assumption 3. β-mixing: {Xkj}_{j=1}^p and {Ykj}_{j=1}^p are β-mixing after a certain permutation, with mixing coefficients decaying at a polynomial rate.

Remarks:
◮ Tn(s) is invariant to permutations of {Xkj}_{j=1}^p and {Ykj}_{j=1}^p.
◮ No need to know the permutation.
◮ The β-mixing condition holds under weak assumptions on {Xkj}_{j=1}^p and {Ykj}_{j=1}^p (Mokkadem, 1988). For example, normally distributed data with a banded or block-diagonal covariance matrix are special cases.
Mean and variance of Tn(s)
◮ Let φ(·) and Φ̄(·) be the density and survival functions of the standard normal distribution, and Lp = a(log p)^b with b > 0.
◮ µ0(s) = E{Tn(s)|H0} and σ0^2(s) = Var{Tn(s)|H0}.

Proposition. Under Assumption 1A or 1B and some other assumptions,
µ0(s) = µ̃0(s){1 + O(Lp n^{−1/2})}, where µ̃0(s) = q{2λp^{1/2}(s)φ(λp^{1/2}(s)) + 2Φ̄(λp^{1/2}(s))}.
In addition, under either (i) Assumption 1A with s > 1/2 or (ii) Assumption 1B with s > 1/2 − ξ/4,
σ0^2(s) = σ̃0^2(s){1 + o(1)}, where
σ̃0^2(s) = q[2{λp^{3/2}(s) + 3λp^{1/2}(s)}φ(λp^{1/2}(s)) + 6Φ̄(λp^{1/2}(s))].
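The two approximations µ̃0(s) and σ̃0^2(s) are explicit functions of p and s and translate directly into code (function names mine):

```python
import numpy as np
from scipy.stats import norm

def mu0_tilde(p, s):
    """Approximate null mean of T_n(s): q{2 r phi(r) + 2 Phibar(r)}, r = lam_p(s)^{1/2}."""
    q = p * (p + 1) / 2
    r = np.sqrt(4 * s * np.log(p))       # lam_p(s)^{1/2} with lam_p(s) = 4 s log p
    return q * (2 * r * norm.pdf(r) + 2 * norm.sf(r))

def sigma0_tilde_sq(p, s):
    """Approximate null variance of T_n(s): q[2{r^3 + 3r} phi(r) + 6 Phibar(r)]."""
    q = p * (p + 1) / 2
    r = np.sqrt(4 * s * np.log(p))
    return q * (2 * (r ** 3 + 3 * r) * norm.pdf(r) + 6 * norm.sf(r))
```

Both decrease as s grows, reflecting that higher thresholds retain fewer entries under H0.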
Challenge: asymptotic distribution of Tn(s)?
◮ Cannot directly apply mixing results, since the {sij} are not mixing.
◮ Cannot apply the martingale CLT directly, due to the thresholding.
Coupling + Martingale CLT
Matrix Blocking

Partition the coordinates into alternating big and small blocks
{1, . . . , a}, {a + 1, . . . , a + b}, {a + b + 1, . . . , 2a + b}, {2a + b + 1, . . . , 2a + 2b}, . . .
labelled S1, R1, S2, R2, . . .
◮ Small blocks and triangles are negligible.
◮ B1,n: summation over all big blocks.
◮ XSm, YSm: the segments of the data matrices with columns in Sm.
◮ ZSm = {XSm, YSm}.
◮ Coupling (Berbee, 1979): Z*Sm ≈ ZSm with {Z*Sm} independent.
◮ Big blocks in different rows and columns are independent.
Coupling + Martingale CLT
U-statistic Equivalence

◮ U-statistic formulation: B1,n ∼ Σ_{m1<m2} f(Z*Sm1, Z*Sm2).
◮ Apply the martingale CLT, building the filtration on {Z*Sm}.

Theorem. Suppose Assumptions 2 and 3 are satisfied. Then, under H0, and either (i) Assumption 1A with s > 1/2 or (ii) Assumption 1B with s > 1/2 − ξ/4, we have
σ0^{−1}(s){Tn(s) − µ0(s)} →d N(0, 1) as n, p → ∞.
Multi-level Thresholding Test (MTT)
◮ How to choose s in practice?
◮ Standardize Tn(s): Un(s) = σ̃0^{−1}(s){Tn(s) − µ̃0(s)}.
◮ Maximize Un(s) over multiple thresholds: Vn(s0) = sup_{s∈(s0, 1−η]} Un(s).
◮ s0 = 1/2 for log p ∼ n^̟, or s0 = 1/2 − ξ/4 for n ∼ p^ξ.
◮ η is a small positive constant, say 0.05.
◮ To simplify the computation, it can be shown that Vn(s0) = sup_{s∈Sn(s0)} Un(s), where
  Sn(s0) = {tij : tij = Mij/(4 log(p)) and s0 < tij < 1 − η} ∪ {1 − η}.
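Since the supremum is attained on the finite data-driven set Sn(s0), the MTT statistic reduces to a loop over at most q + 1 candidate thresholds; a minimal sketch taking the matrix M of standardized statistics as input (function name mine, with µ̃0 and σ̃0 inlined from the proposition above):

```python
import numpy as np
from scipy.stats import norm

def v_n(M, s0, eta=0.05):
    """Multi-level thresholding statistic V_n(s0): max of U_n(s) over S_n(s0)."""
    p = M.shape[0]
    q = p * (p + 1) / 2
    m = M[np.triu_indices(p)]
    t = m / (4 * np.log(p))                               # implied thresholds t_ij
    cand = np.append(t[(t > s0) & (t < 1 - eta)], 1 - eta)  # candidate set S_n(s0)
    best = -np.inf
    for s in cand:
        lam = 4 * s * np.log(p)
        r = np.sqrt(lam)
        Tn = m[m > lam].sum()
        mu0 = q * (2 * r * norm.pdf(r) + 2 * norm.sf(r))
        sig0 = np.sqrt(q * (2 * (r ** 3 + 3 * r) * norm.pdf(r) + 6 * norm.sf(r)))
        best = max(best, (Tn - mu0) / sig0)
    return best
```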
Multi-level Thresholding Test (MTT)
Theorem. Suppose Assumptions 2 and 3 are satisfied. Then, under H0, and either (i) Assumption 1A with s0 = 1/2 or (ii) Assumption 1B with s0 = 1/2 − ξ/4,
P{a(log(p))Vn(s0) − b(log(p), s0, η) ≤ x} → exp(−e^{−x}),
where a(y) = (2 log(y))^{1/2} and b(y, s0, η) = 2 log(y) + 2^{−1} log log(y) − 2^{−1} log(π) + log(1 − s0 − η).
Multi-level Thresholding Test (MTT)
◮ Reject H0 if Vn(s0) > {qα + b(log(p), s0, η)}/a(log(p)), where qα is the upper-α quantile of the Gumbel distribution.
◮ Size distortion arises because the convergence to the Gumbel distribution is slow.
◮ A bootstrap method is proposed in the simulation.
Detection boundary for covariances
Sparse and weak alternative hypothesis:
H0 : δij = 0 for all 1 ≤ i ≤ j ≤ p vs. Ha : there are ma nonzero δij with strength δa.
◮ n = n1n2/(n1 + n2) and q = p(p + 1)/2.
◮ ma = ⌊q^{1−β}⌋ for a β ∈ (1/2, 1).
◮ δij = δa = √(4r log(p)/n) for the nonzero δij.
Detection boundary for MTT
◮ The standardized signal strength: rij = r/{(1 − κ)θij1 + κθij2}, where κ = lim_{n1,n2→∞} n1/(n1 + n2), θij1 = Var{(Xki − µ1i)(Xkj − µ1j)} and θij2 = Var{(Yki − µ2i)(Ykj − µ2j)}.
◮ The maximal and minimal standardized signal strengths:
  r̄ = max_{(i,j): σij1≠σij2} rij and r̲ = min_{(i,j): σij1≠σij2} rij.
◮ The class of covariances with sparse and weak differences:
  C(β, r̄, r̲) = {(Σ1, Σ2) : under the sparse and weak alternatives Ha, with r̄, r̲ and the assumptions defined previously}.
◮ For any (Σ1, Σ2) ∈ C(β, r̄, r̲), the power of the MTT is
  Powern(Σ1, Σ2) = P[Vn(s0) > {qα + b(log(p), s0, η)}/a(log(p)) | Σ1, Σ2].
Detection boundary for MTT
◮ Consider ξ ∈ (0, 2] for n ∼ p^ξ, and ξ = 0 for log p ∼ n^̟. Define
  ρ*(β, ξ) = {√(4 − 2ξ) − √(6 − 8β − ξ)}^2/8 for 1/2 < β ≤ 5/8 − ξ/16;
  ρ*(β, ξ) = β − 1/2 for 5/8 − ξ/16 < β ≤ 3/4;
  ρ*(β, ξ) = (1 − √(1 − β))^2 for 3/4 < β < 1.
◮ Under the previous assumptions, as n, p → ∞,
  ◮ if r̲ > ρ*(β, ξ), then inf_{(Σ1,Σ2)∈C(β,r̄,r̲)} Powern(Σ1, Σ2) → 1;
  ◮ if r̄ < ρ*(β, ξ), then sup_{(Σ1,Σ2)∈C(β,r̄,r̲)} Powern(Σ1, Σ2) → 0.
◮ As ξ → 2, ρ*(β, ξ) approaches ̺(β), the optimal detection boundary for testing the means with uncorrelated Gaussian data.
◮ Restricting s ≥ s0 = 1/2 − ξ/4 elevates the detection boundary ρ*(β, ξ) of the proposed MTT for 1/2 < β ≤ 5/8 − ξ/16, as a price for controlling the size of the test.
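The piecewise boundary ρ*(β, ξ) is easy to tabulate; a minimal sketch (function name mine), which also makes the continuity at β = 5/8 − ξ/16 easy to verify numerically:

```python
import math

def rho_star(beta, xi):
    """Detection boundary rho*(beta, xi) of the MTT; xi = 0 for log p ~ n^w."""
    if 0.5 < beta <= 5 / 8 - xi / 16:
        return (math.sqrt(4 - 2 * xi) - math.sqrt(6 - 8 * beta - xi)) ** 2 / 8
    if beta <= 0.75:
        return beta - 0.5
    return (1 - math.sqrt(1 - beta)) ** 2
```

At ξ = 0 and β = 5/8, the first branch gives (2 − 1)^2/8 = 1/8, matching β − 1/2 = 1/8 from the second branch.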
Detection boundary for MTT
Figure: The detection boundary ρ*(β, ξ) of the proposed MTT for ξ = 0, 0.75 and 1.5, plotted against β together with the reference curves (1 − √(1 − β))^2 and β − 0.5.
Some remarks on detection boundary
◮ The Lmax test of Cai et al. (2013) is consistent if the maximal standardized signal strength r̄ > 4.
◮ The L2 test of Li and Chen (2012) has no non-trivial power when β > 1/2.
◮ The proposed multi-level thresholding test is more powerful in detecting sparse and weak signals, as it only requires rij ∈ (0, 1).
◮ The detection boundaries indicate that the differences between Σ1 and Σ2 are of the order √(log(p)/n).
◮ Is the order √(log(p)/n) minimax optimal?
Minimax optimality
◮ Let Wα be the collection of all α-level tests for the hypotheses, i.e., P(Wα = 1|H0) ≤ α for any Wα ∈ Wα.
◮ Define C(β, c) = {(Σ1, Σ2) : under the sparse and weak alternatives of Ha with rij ≥ c for all σij1 ≠ σij2}.
◮ The detection boundary results indicate that, for a sufficiently large constant c, as n, p → ∞,
  inf_{(Σ1,Σ2)∈C(β,c)} {Power of the MTT test} → 1.
◮ The lower bound (log(p)/n)^{1/2} for signals in C(β, c) is optimal: no α-level test can distinguish Ha from H0 with probability approaching 1 uniformly over the class C(β, c0) for some c0 > 0, as shown in the following theorem.
Minimax optimality
Theorem. For Gaussian data and under Assumption 1B, for any 0 < ω < 1 − α and max{2/3, (3 − ξ)/4} < β < 1, there exists a constant c0 > 0 such that, as n, p → ∞,
sup_{Wα∈Wα} inf_{(Σ1,Σ2)∈C(β,c0)} P(Wα = 1) ≤ 1 − ω.

◮ This extends the minimax result from the highly sparse regime 3/4 < β < 1 of Cai et al. (2013) to max{2/3, (3 − ξ)/4} < β < 1.
◮ The MTT is at least minimax rate optimal for β > max{2/3, (3 − ξ)/4}, as it can detect signals at the rate {log(p)/n}^{1/2} for β > 1/2.
Simulation
◮ Compare the proposed test with Srivastava and Yanagihara (2010) (SY), Li and Chen (2012) (LC) and Cai, Liu and Xia (2013) (CLX).
◮ The data are generated from Xk = Σ1^{1/2} Z1k and Yk = Σ2^{1/2} Z2k.
◮ Z1k and Z2k are i.i.d. random vectors from a common population:
  (i) N(0, Ip);
  (ii) a Gamma distribution, where the components of Z1k and Z2k were i.i.d. standardized Gamma(4, 2) with mean 0 and variance 1.
Simulation
◮ Define Σ1^{(0)} = D0^{1/2} Σ(*) D0^{1/2}, D0 = diag(d1, . . . , dp), di i.i.d. ∼ U(0.1, 1), Σ(*) = (σ*ij):
  Design 1: σ*ij = 0.4^{|i−j|};
  Design 2: σ*ij = 0.5 I(i = j) + 0.5 I(i, j ∈ [4k0 − 3, 4k0] for some k0).
◮ Under the null hypothesis, Σ1 = Σ2 = Σ1^{(0)}.
◮ Under the alternatives, Σ1 = Σ1^{(⋆)} and Σ2 = Σ2^{(⋆)}, where
  Σ1^{(⋆)} = Σ1^{(0)} + εc Ip and Σ2^{(⋆)} = Σ1^{(0)} + U + εc Ip.
  U = (ukl)p×p is a banded symmetric matrix with ⌊q^{1−β}⌋ nonzero elements, ukl = √(4r log p/n) for the nonzero entries.
  εc = |min{λmin(Σ1^{(0)} + U), 0}| + 0.05 guarantees the positive definiteness of Σ1^{(⋆)} and Σ2^{(⋆)}.
Simulation
◮ The convergence of the MTT statistic to the Gumbel distribution is slow.
◮ A parametric bootstrap procedure:
  ◮ Calculate the multi-level thresholding statistic Vn(s0) from the original sample.
  ◮ Under the null hypothesis, pool {Xk}_{k=1}^{n1} and {Yk}_{k=1}^{n2} to estimate the common covariance (Rothman, 2012), denoted Σ̂.
  ◮ For the b-th bootstrap iteration, draw samples {Xk*(b)}_{k=1}^{n1} and {Yk*(b)}_{k=1}^{n2} independently from N(0, Σ̂), then calculate the bootstrapped MTT statistic Vn*(b)(s0).
  ◮ Compute the p-value from Vn(s0) and the bootstrapped statistics {Vn*(1)(s0), . . . , Vn*(B)(s0)}.
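The bootstrap steps above can be sketched generically for any statistic; this is a minimal sketch in which a plain pooled covariance with a small ridge stands in for Rothman's (2012) positive-definite estimator used on the slides, and `stat_fn` is any two-sample statistic such as Vn(s0) (both the function name and these substitutions are my own):

```python
import numpy as np

def bootstrap_pvalue(X, Y, stat_fn, B=200, seed=None):
    """Parametric-bootstrap p-value: resample both groups from N(0, Sigma_hat)."""
    rng = np.random.default_rng(seed)
    n1, n2, p = X.shape[0], Y.shape[0], X.shape[1]
    v_obs = stat_fn(X, Y)
    # pooled covariance of the within-group centered data, plus a ridge for PD
    Z = np.vstack([X - X.mean(axis=0), Y - Y.mean(axis=0)])
    Sigma = Z.T @ Z / (n1 + n2 - 2) + 1e-6 * np.eye(p)
    L = np.linalg.cholesky(Sigma)
    boot = np.empty(B)
    for b in range(B):
        Xb = rng.standard_normal((n1, p)) @ L.T
        Yb = rng.standard_normal((n2, p)) @ L.T
        boot[b] = stat_fn(Xb, Yb)
    return (1 + np.sum(boot >= v_obs)) / (B + 1)
```

The "+1" correction keeps the p-value strictly positive, a common finite-B convention.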
Simulation
Table: Empirical sizes for the tests of Srivastava and Yanagihara (2010) (SY), Li and Chen (2012) (LC), Cai, Liu and Xia (2013) (CLX) and the proposed multi-level thresholding test based on the limiting distribution (MTT) and the bootstrap calibration (MTT-BT), for Designs 1 and 2 under the Gaussian distribution with nominal level 5%.

p    (n1, n2)    SY     LC     CLX    MTT    MTT-BT
Gaussian Design 1
175  (60, 60)    0.048  0.058  0.054  0.088  0.058
277  (80, 80)    0.052  0.052  0.058  0.064  0.056
396  (100, 100)  0.042  0.046  0.058  0.064  0.054
530  (120, 120)  0.056  0.048  0.050  0.056  0.046
Gaussian Design 2
175  (60, 60)    0.060  0.048  0.052  0.094  0.048
277  (80, 80)    0.040  0.060  0.040  0.064  0.052
396  (100, 100)  0.052  0.042  0.044  0.090  0.048
530  (120, 120)  0.050  0.046  0.044  0.060  0.054
Simulation
Table: Empirical sizes for the tests of Srivastava and Yanagihara (2010) (SY), Li and Chen (2012) (LC), Cai, Liu and Xia (2013) (CLX) and the proposed multi-level thresholding test based on the limiting distribution (MTT) and the bootstrap calibration (MTT-BT), for Designs 1 and 2 under the Gamma distribution with nominal level 5%.

p    (n1, n2)    SY     LC     CLX    MTT    MTT-BT
Gamma Design 1
175  (60, 60)    0.046  0.060  0.066  0.110  0.056
277  (80, 80)    0.060  0.050  0.044  0.076  0.044
396  (100, 100)  0.046  0.052  0.046  0.066  0.054
530  (120, 120)  0.060  0.056  0.048  0.060  0.048
Gamma Design 2
175  (60, 60)    0.070  0.056  0.066  0.108  0.056
277  (80, 80)    0.060  0.058  0.068  0.112  0.044
396  (100, 100)  0.060  0.050  0.044  0.068  0.046
530  (120, 120)  0.054  0.056  0.048  0.056  0.048
Simulation
Figure: Empirical powers with respect to the signal strength r for the tests of Srivastava and Yanagihara (2010) (SY), Li and Chen (2012) (LC), Cai, Liu and Xia (2013) (CLX) and the proposed thresholding test (MTT-BT), for Designs 1 and 2 with Gaussian innovations under β = 0.6, p = 396 and n1 = n2 = 100. Panels (c) and (d) show Designs 1 and 2, respectively.
Simulation
Figure: Empirical powers with respect to the sparsity level β for the tests of Srivastava and Yanagihara (2010) (SY), Li and Chen (2012) (LC), Cai, Liu and Xia (2013) (CLX) and the proposed thresholding test (MTT-BT), for Designs 1 and 2 with Gaussian innovations under r = 0.6, p = 396 and n1 = n2 = 100. Panels (c) and (d) show Designs 1 and 2, respectively.
Real data analysis
◮ A microarray dataset of large airway epithelial cells with 22,283 genes.
◮ 187 smokers: 97 with lung cancer and 90 healthy.
◮ Gene Ontology (GO) terms group sets of genes under three broad functional categories: Biological Processes (BP), Cellular Components (CC) and Molecular Functions (MF).
◮ 3063 unique GO terms in the BP category, 317 sets in the CC category and 442 sets in the MF category.
◮ We are interested in identifying gene-sets with different covariance structures between the smokers with lung cancer and the controls.
◮ This provides useful results toward identifying differential co-expression networks and functionally related genes.
Real data analysis
◮ We tested Hg,0 : Σ1g = Σ2g vs. Hg,a : Σ1g ≠ Σ2g, where Σ1g and Σ2g denote the population covariance matrices of the cancer and control groups for the g-th gene-set.
◮ The FDR (false discovery rate) was controlled at 5%.
◮ The proposed test found more significant gene-sets than the other tests in the BP and CC categories.
◮ The gene-set GO:0001824 in the BP category was discovered only by the proposed MTT-BT test; it has been shown to be highly correlated with lung cancer in other studies.
Real data analysis
Table: Cross-tabulations of the numbers of significant gene-sets with different covariance matrices by the four tests for the three Gene Ontology categories. The diagonal entries give the numbers of significant gene-sets found by each test; the off-diagonal entries are the numbers of gene-sets found in common by pairs of tests.

Biological Processes / Cellular Component
MTT-BT: 64 5 9 2 / 13 3
CLX: 23 1
LC: 54 4 / 9 1
SY: 9 / 1
Molecular Functions
MTT-BT: 10 2 3
CLX: 5 1
LC: 14 1
SY: 1
Conclusion
◮ We proposed a powerful multi-level thresholding test for high-dimensional covariances;
◮ derived the detection boundary of the MTT test;
◮ established its minimax optimality;
◮ the MTT test can be extended to correlation matrices.
Thank you!
References
◮ Bai, Z., Jiang, D., Yao, J.-F. and Zheng, S. (2009). Corrections to LRT on large-dimensional covariance matrix by RMT. The Annals of Statistics, 37, 3822-3840.
◮ Bai, Z. and Saranadasa, H. (1996). Effect of high dimension: by an example of a two sample problem. Statistica Sinica, 6, 311-329.
◮ Berbee, H. (1979). Random Walks with Stationary Increments and Renewal Theory. Amsterdam: Mathematical Centre.
◮ Cai, T., Liu, W. D. and Xia, Y. (2013). Two-sample covariance matrix testing and support recovery in high-dimensional and sparse settings. Journal of the American Statistical Association, 108, 265-277.
◮ Chen, S. X., Li, J. and Zhong, P. S. (2019). Two-sample and ANOVA tests for high dimensional means. The Annals of Statistics, 47, 1443-1474.
◮ Chen, S. X., Guo, B. and Qiu, Y. (2019). Multi-level thresholding test for high dimensional covariance matrices. Manuscript.
◮ Chen, S. X. and Qin, Y.-L. (2010). A two-sample test for high-dimensional data with applications to gene-set testing. The Annals of Statistics, 38, 808-835.
◮ Donoho, D. and Jin, J. (2004). Higher criticism for detecting sparse heterogeneous mixtures. The Annals of Statistics, 32, 962-994.
◮ Donoho, D. L. and Johnstone, I. M. (1994). Ideal spatial adaptation by wavelet shrinkage. Biometrika, 81, 425-455.
◮ Fan, J. (1996). Test of significance based on wavelet thresholding and Neyman's truncation. Journal of the American Statistical Association, 91, 674-688.
◮ Li, J. and Chen, S. X. (2012). Two sample tests for high-dimensional covariance matrices. The Annals of Statistics, 40, 908-940.
◮ Mokkadem, A. (1988). Mixing properties of ARMA processes. Stochastic Processes and their Applications, 29, 309-315.
◮ Srivastava, M. S. (2009). A test for the mean vector with fewer observations than the dimension under non-normality. Journal of Multivariate Analysis, 100, 518-532.
◮ Srivastava, M. S. and Yanagihara, H. (2010). Testing the equality of several covariance matrices with fewer observations than the dimension. Journal of Multivariate Analysis, 101, 1319-1329.
◮ Tukey, J. W. (1976). The higher criticism. Course Notes, Statistics 411, Princeton University.
◮ Ingster, Y. I. (1999). Minimax detection of a signal for ℓn^p-balls. Mathematical Methods of Statistics, 7, 401-428.
◮ Zhong, P., Chen, S. X. and Xu, M. (2013). Tests alternative to higher criticism for high dimensional means under sparsity and column-wise dependence. The Annals of Statistics, 41, 2820-2851.