On corrections of classical multivariate tests for high-dimensional data (PowerPoint presentation)


1. On corrections of classical multivariate tests for high-dimensional data. Jian-feng Yao, with Zhidong Bai, Dandan Jiang, and Shurong Zheng.

2. Overview
   - Introduction: high-dimensional data and new challenges in statistics; a two-sample problem
   - Sample covariance matrices: sample vs. population covariance matrices; Marčenko-Pastur distributions; Bai and Silverstein's CLT for linear spectral statistics
   - Random Fisher matrices
   - Testing covariance matrices I: simulation study I
   - Testing covariance matrices II: simulation study II
   - Multivariate regressions
   - Conclusions

3. Outline (repeated as a section transition to "Introduction")

4. High-dimensional data and new challenges in statistics
   High-dimensional data ≠ high-dimensional models.
   - Nonparametric regression is a very high-dimensional (indeed infinite-dimensional) model, but with one-dimensional data: y_i = f(x_i) + ε_i, f: R → R, i = 1, ..., n.
   - High-dimensional data: observation vectors y_i ∈ R^p, with p relatively high with respect to the sample size n.

5. High-dimensional data: some typical data dimensions and ratios n/p

   data                 dimension p   sample size n   ratio n/p
   portfolio            ~ 50          500             10
   climate survey       320           600             1.9
   speech analysis      a · 10^2      b · 10^2        ~ 1
   ORL face database    1440          320             0.22
   micro-arrays         2000          200             0.1

   - Important: the data ratio n/p is not always large; it can be ≪ 1.
   - Note: we also use the inverse data ratio y = p/n.

6. A two-sample problem: the high-dimensional effect by an example
   The two-sample problem:
   - two independent samples: x_1, ..., x_{n_1} ~ (μ_1, Σ) and y_1, ..., y_{n_2} ~ (μ_2, Σ);
   - we want to test H_0: μ_1 = μ_2 against H_1: μ_1 ≠ μ_2;
   - classical approach: Hotelling's T² test,
       T² = (n_1 n_2 / n) (x̄ − ȳ)' S_n^{−1} (x̄ − ȳ),  n = n_1 + n_2,
     where
       x̄ = (1/n_1) Σ_{i=1}^{n_1} x_i,  ȳ = (1/n_2) Σ_{j=1}^{n_2} y_j,
       S_n = (1/(n − 2)) [ Σ_{i=1}^{n_1} (x_i − x̄)(x_i − x̄)' + Σ_{j=1}^{n_2} (y_j − ȳ)(y_j − ȳ)' ].
     S_n is a (pooled) sample covariance matrix.
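A minimal numerical sketch of the T² statistic defined above, in Python with NumPy (the function name and the synthetic data are mine, not from the slides):

```python
import numpy as np

def hotelling_t2(x, y):
    """Two-sample Hotelling T^2: (n1 n2 / n) (xbar - ybar)' S_n^{-1} (xbar - ybar)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    xbar, ybar = x.mean(axis=0), y.mean(axis=0)
    # Pooled sample covariance S_n with n - 2 degrees of freedom.
    Sn = ((x - xbar).T @ (x - xbar) + (y - ybar).T @ (y - ybar)) / (n - 2)
    d = xbar - ybar
    return (n1 * n2 / n) * (d @ np.linalg.solve(Sn, d))

rng = np.random.default_rng(0)
x = rng.standard_normal((25, 4))   # n1 = 25 observations in R^4
y = rng.standard_normal((20, 4))   # n2 = 20 observations in R^4
t2 = hotelling_t2(x, y)
print(t2 >= 0)                     # quadratic form in a positive definite S_n
```

For p = 1 this reduces to the square of the pooled two-sample t statistic; it requires n − 2 ≥ p so that S_n is invertible.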

7. The two-sample problem: Hotelling's T² test
   Nice properties:
   - invariance under linear transformations;
   - finite-sample optimality in the Gaussian case; asymptotic optimality otherwise.
   Bad news:
   - low power even for moderate data dimensions;
   - high numerical instability in computing S_n^{−1}, even for p = 40;
   - very little is known in the non-Gaussian case;
   - fatal deficiency: when p > n − 2, S_n is not invertible.

8. Dempster's non-exact test (NET) [Dempster A.P., '58, '60]
   - A reasonable test must be based on x̄ − ȳ even when p > n − 2.
   - Idea: choose a new basis in R^n and project the data so that
     1. axis 1 carries the grand mean (n_1 μ_1 + n_2 μ_2)/n;
     2. axis 2 carries x̄ − ȳ.
   - Let X_{n×p} = (x_1, ..., x_{n_1}, y_1, ..., y_{n_2})' be the data matrix and H_n = (h_1, ..., h_n)' the (orthonormal) base change, Z_{n×p} = H_n X, with
       h_1 = (1/√n) 1_n,
       h_2 = ( √(n_2/(n n_1)) 1'_{n_1}, −√(n_1/(n n_2)) 1'_{n_2} )'.
   - Under normality, the rows z_i of Z are n independent N_p(·, Σ) vectors with
       E z_1 = (1/√n)(n_1 μ_1 + n_2 μ_2),  E z_2 = √(n_1 n_2 / n) (μ_1 − μ_2),  E z_i = 0, i = 3, ..., n.
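The base change can be sketched numerically: the first two rows of H_n are fixed by the formulas above, and the remaining n − 2 rows are one arbitrary orthonormal completion (here via a QR factorization; the function name and completion are mine):

```python
import numpy as np

def dempster_basis(n1, n2, seed=0):
    """Orthonormal H_n whose first two rows are Dempster's h_1 and h_2.
    The remaining n - 2 rows are one arbitrary completion (not unique)."""
    n = n1 + n2
    h1 = np.ones(n) / np.sqrt(n)                               # axis 1: grand mean
    h2 = np.concatenate([np.full(n1, np.sqrt(n2 / (n * n1))),  # axis 2: xbar - ybar
                         np.full(n2, -np.sqrt(n1 / (n * n2)))])
    rng = np.random.default_rng(seed)
    M = np.column_stack([h1, h2, rng.standard_normal((n, n - 2))])
    Q, R = np.linalg.qr(M)
    Q = Q * np.sign(np.diag(R))  # fix signs so columns 1-2 equal h1, h2 exactly
    return Q.T                   # rows h_1', ..., h_n'

H = dempster_basis(25, 20)
print(np.allclose(H @ H.T, np.eye(45)))   # H_n is orthonormal
```

Since h_1 and h_2 are already orthonormal, the QR step leaves them unchanged (after the sign fix) and only fills in the remaining rows; their non-uniqueness is exactly the difficulty flagged on slide 10.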

9. Dempster's non-exact test (NET)
   - Test statistic:
       F_D = ‖z_2‖² / [ (‖z_3‖² + ··· + ‖z_n‖²) / (n − 2) ].
   - Under H_0,
       ‖z_j‖² ~ Q := Σ_{k=1}^r α_k χ²_1(k),
     where α_1 ≥ ··· ≥ α_r > 0 are the non-null eigenvalues of Σ.
   - The exact distribution of F_D is complicated.
   - Approximations (hence a non-exact test), reasoning as in the case Σ = I_p:
     1. approximate Q ≃ m χ²_r;
     2. then estimate r by r̂.
   - Finally, under H_0, F_D ≈ F(r̂, (n − 2) r̂).
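As a sanity check on the F approximation: when Σ = I_p, all α_k equal 1, r = p, and F_D is exactly F(p, (n − 2)p) distributed, so its mean is d₂/(d₂ − 2) with d₂ = (n − 2)p. A simulated sketch under H_0 (helper name and design are mine; with Σ = I_p the rows z_2, ..., z_n can be simulated directly as standard normals):

```python
import numpy as np

def dempster_fd(Z):
    """Dempster's NET statistic from the rotated data Z = H_n X (row i is z_i')."""
    n = Z.shape[0]
    return np.sum(Z[1] ** 2) / (np.sum(Z[2:] ** 2) / (n - 2))

rng = np.random.default_rng(1)
n, p = 30, 5
# Under H_0 with Sigma = I_p, rows z_2, ..., z_n are i.i.d. N_p(0, I_p);
# row z_1 (the grand mean direction) is never used by F_D.
fd = np.array([dempster_fd(rng.standard_normal((n, p))) for _ in range(2000)])
# E F(d1, d2) = d2 / (d2 - 2), here d1 = p = 5 and d2 = (n - 2) p = 140.
print(abs(fd.mean() - 140 / 138) < 0.1)
```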

10. Dempster's non-exact test (NET)
    Problems with the NET test:
    - it is difficult to construct the orthogonal transformation H_n = {h_j} for large n;
    - even under Gaussianity, the exact power function depends on H_n.

11. Bai and Saranadasa's test (ANT) [Bai & Saranadasa, '96]
    - Consider directly the statistic M_n = ‖x̄ − ȳ‖² − (n/(n_1 n_2)) tr S_n.
    - Under very mild conditions (this is where RMT comes in!),
        M_n / σ_n ⇒ N(0, 1),  σ_n² := Var(M_n) = (2 n² (n − 1)) / (n_1² n_2² (n − 2)) · tr Σ².
    - A ratio-consistent estimator:
        σ̂_n² = (2 n (n − 1)(n − 2)) / (n_1² n_2² (n − 3)) · [ tr S_n² − (tr S_n)² / (n − 2) ],  σ̂_n² / σ_n² →_P 1.
    - Finally, under H_0,
        Z_n = M_n / σ̂_n ⇒ N(0, 1).
    This is Bai-Saranadasa's asymptotic normal test (ANT).
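A simulation sketch of the ANT as reconstructed above (names and the H_0 design Σ = I_p are mine). Note that p > n − 2 here, so S_n is singular and Hotelling's T² does not even exist, while under H_0 the Z_n values should look approximately standard normal:

```python
import numpy as np

def ant_statistic(x, y):
    """Bai-Saranadasa Z_n = M_n / sigma_hat_n for samples x (n1 x p), y (n2 x p)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    xbar, ybar = x.mean(axis=0), y.mean(axis=0)
    Sn = ((x - xbar).T @ (x - xbar) + (y - ybar).T @ (y - ybar)) / (n - 2)
    Mn = np.sum((xbar - ybar) ** 2) - (n / (n1 * n2)) * np.trace(Sn)
    var_hat = (2 * n * (n - 1) * (n - 2)) / (n1**2 * n2**2 * (n - 3)) \
        * (np.trace(Sn @ Sn) - np.trace(Sn) ** 2 / (n - 2))
    return Mn / np.sqrt(var_hat)

rng = np.random.default_rng(2)
n1, n2, p = 25, 20, 50   # p > n - 2 = 43: S_n singular, T^2 unusable
zs = np.array([ant_statistic(rng.standard_normal((n1, p)),
                             rng.standard_normal((n2, p)))
               for _ in range(300)])
print(abs(zs.mean()) < 0.3, 0.6 < zs.std() < 1.6)
```

The bracketed term in var_hat is nonnegative for any data (tr S² ≥ (tr S)²/rank S and rank S ≤ n − 2), so the square root is always defined.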

12. Comparison between T², NET and ANT: power functions
    - Assume p → ∞, n → ∞, p/n → y ∈ (0, 1), n_1/n → κ.
    - For Hotelling's T², Dempster's NET and Bai-Saranadasa's ANT, with μ = μ_1 − μ_2:
        β_H(μ) = Φ( −ξ_α + √( n(1 − y) / (2y) ) · κ(1 − κ) ‖Σ^{−1/2} μ‖² ) + o(1),
        β_D(μ) = Φ( −ξ_α + n κ(1 − κ) ‖μ‖² / √(2 tr Σ²) ) + o(1) = β_BS(μ),
      where α is the test size and ξ_α = Φ^{−1}(1 − α).
    - Important: because of the factor (1 − y), T² loses power as y increases, i.e. as p grows relative to n.
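The two asymptotic power expansions can be evaluated directly; this sketch uses only the standard library's NormalDist for Φ and Φ^{−1} (function names are mine), and illustrates the (1 − y) factor numerically:

```python
import numpy as np
from statistics import NormalDist

PHI = NormalDist()

def power_hotelling(n, y, kappa, mu, Sigma, alpha=0.05):
    """beta_H: asymptotic power of Hotelling's T^2 (requires y = p/n < 1)."""
    xi = PHI.inv_cdf(1 - alpha)
    shift = np.sqrt(n * (1 - y) / (2 * y)) * kappa * (1 - kappa) \
        * (mu @ np.linalg.solve(Sigma, mu))  # ||Sigma^{-1/2} mu||^2
    return PHI.cdf(-xi + shift)

def power_ant(n, kappa, mu, Sigma, alpha=0.05):
    """beta_D = beta_BS: shared asymptotic power of Dempster's NET and the ANT."""
    xi = PHI.inv_cdf(1 - alpha)
    shift = n * kappa * (1 - kappa) * (mu @ mu) / np.sqrt(2 * np.trace(Sigma @ Sigma))
    return PHI.cdf(-xi + shift)

# The (1 - y) factor at work: same alternative, growing ratio y = p/n.
mu, Sigma = np.full(20, 0.05), np.eye(20)
print(power_hotelling(100, 0.2, 0.5, mu, Sigma)
      > power_hotelling(100, 0.8, 0.5, mu, Sigma))
```

At μ = 0 both expressions reduce to Φ(−ξ_α) = α, i.e. the nominal test size, which is a quick consistency check on the formulas.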

13. Comparison between T², NET and ANT: simulation results 1 (Gaussian case)
    - Choice of covariance: Σ = (1 − ρ) I_p + ρ J_p, with J_p = 1_p 1'_p.
    - Noncentral parameter η = ‖μ_1 − μ_2‖² / √(tr Σ²); sample sizes (n_1, n_2) = (25, 20), n = 45.
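This simulation design is easy to reproduce; the closed form tr Σ² = (p − 1)(1 − ρ)² + (1 + (p − 1)ρ)² follows from the two eigenvalues of the equicorrelation matrix (helper names are mine):

```python
import numpy as np

def equicorrelation(p, rho):
    """Sigma = (1 - rho) I_p + rho J_p, J_p = 1_p 1_p' (the slide's covariance)."""
    return (1 - rho) * np.eye(p) + rho * np.ones((p, p))

def noncentrality(mu1, mu2, Sigma):
    """eta = ||mu1 - mu2||^2 / sqrt(tr Sigma^2), the slide's noncentral parameter."""
    d = np.asarray(mu1) - np.asarray(mu2)
    return (d @ d) / np.sqrt(np.trace(Sigma @ Sigma))

p, rho = 40, 0.2
Sigma = equicorrelation(p, rho)
# Eigenvalues: 1 - rho (multiplicity p - 1) and 1 + (p - 1) rho (multiplicity 1).
closed_form = (p - 1) * (1 - rho) ** 2 + (1 + (p - 1) * rho) ** 2
print(np.isclose(np.trace(Sigma @ Sigma), closed_form))
```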

14. A summary of the introduction
    - High-dimensional effects need to be taken into account.
    - Surprisingly, asymptotic methods based on RMT perform well even for small p (as low as p = 4).
    - Many classical multivariate analysis methods have to be re-examined with respect to high-dimensional effects.

15. Outline (repeated as a section transition to "Sample covariance matrices")
