Spectral distributions of high-dimensional sample correlation matrices under infinite variance


  1. Spectral distributions of high-dimensional sample correlation matrices under infinite variance. Johannes Heiny, Ruhr-University Bochum. Joint work with Jianfeng Yao (HKU), Thomas Mikosch and Jorge Yslas (Copenhagen). Random Matrices and Complex Data Analysis Workshop, December 10-12, 2019, Shanghai. J. Heiny Sample correlation & off-diagonal 1 / 30

  2. Normalized histogram of eigenvalues and MP density. [Figure: histogram of eigenvalues overlaid with the density $y = f_\gamma(x)$; horizontal axis $x$ from 0 to 9, vertical axis $y$ from 0 to 0.9. Caption: These are NOT spikes!]

  3. Setup for the picture. Data matrix $\mathbf{X} = \mathbf{X}_n$: a $p \times n$ matrix with iid centered entries and generic variable $X \overset{d}{=} X_{11}$, that is, $\mathbf{X} = (X_{it})_{i=1,\dots,p;\ t=1,\dots,n}$. Sample covariance matrix: $\mathbf{S} = \frac{1}{n} \mathbf{X}\mathbf{X}'$. Ordered eigenvalues of $\mathbf{S}$: $\lambda_1(\mathbf{S}) \ge \lambda_2(\mathbf{S}) \ge \cdots \ge \lambda_p(\mathbf{S})$. Sample correlation matrix: $\mathbf{R} = (\operatorname{diag}(\mathbf{S}))^{-1/2}\, \mathbf{S}\, (\operatorname{diag}(\mathbf{S}))^{-1/2}$.
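The definitions on this slide can be sketched in a few lines of plain Python. This is only an illustration: the Gaussian entries, the sizes `p = 4`, `n = 50`, the seed, and the function names are assumptions made for the example, not part of the talk. As on the slide, the entries are taken to be centered, so no sample mean is subtracted.

```python
import random

def sample_covariance(X):
    """S = (1/n) X X' for a p x n data matrix X stored as a list of rows."""
    p, n = len(X), len(X[0])
    return [[sum(X[i][t] * X[j][t] for t in range(n)) / n for j in range(p)]
            for i in range(p)]

def sample_correlation(S):
    """R = diag(S)^{-1/2} S diag(S)^{-1/2}."""
    p = len(S)
    d = [S[i][i] ** -0.5 for i in range(p)]  # diagonal of diag(S)^{-1/2}
    return [[d[i] * S[i][j] * d[j] for j in range(p)] for i in range(p)]

rng = random.Random(0)   # fixed seed, chosen only for reproducibility
p, n = 4, 50             # illustrative sizes
X = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(p)]
S = sample_covariance(X)
R = sample_correlation(S)
print([round(R[i][i], 12) for i in range(p)])  # diagonal of R
```

By construction the diagonal of `R` consists of ones and every entry lies in $[-1, 1]$, which is why the normalization by $a_{np}^2$ used later for $\mathbf{S}$ is not needed for $\mathbf{R}$.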

  4. Regular variation. Regular variation with index $\alpha > 0$: $P(|X| > x) = x^{-\alpha} L(x)$, where $L$ is a slowly varying function. This implies $E[|X|^{\alpha+\varepsilon}] = \infty$ for any $\varepsilon > 0$. Normalizing sequence $(a_{np}^2)$ such that $np\, P(X^2 > a_{np}^2 x) \to x^{-\alpha/2}$ as $n \to \infty$, for $x > 0$. Then $a_{np} = (np)^{1/\alpha} \ell(np)$ for a slowly varying function $\ell$.
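A worked special case may help here. For an exact Pareto tail, $P(|X| > x) = x^{-\alpha}$ for $x \ge 1$, the slowly varying functions are constant ($L = \ell = 1$) and the normalizing relation holds with equality rather than only in the limit. The sketch below checks this numerically; the Pareto choice, the parameter values and the function names are assumptions for illustration.

```python
def pareto_tail(x, alpha):
    """P(|X| > x) for an exact Pareto tail: x^{-alpha} for x >= 1, and 1 below."""
    return x ** -alpha if x >= 1.0 else 1.0

def a_np(n, p, alpha):
    """a_np = (np)^{1/alpha}; the slowly varying factor is 1 for exact Pareto."""
    return (n * p) ** (1.0 / alpha)

def scaled_tail(x, n, p, alpha):
    """np * P(X^2 > a_np^2 x), using P(X^2 > y) = P(|X| > sqrt(y))."""
    return n * p * pareto_tail(a_np(n, p, alpha) * x ** 0.5, alpha)

alpha, n, p = 1.6, 1000, 200   # illustrative values
for x in (0.5, 1.0, 2.0):
    # The two printed numbers agree up to floating-point rounding.
    print(x, scaled_tail(x, n, p, alpha), x ** (-alpha / 2))
```

For a general regularly varying tail the two sides agree only asymptotically, and $a_{np}$ carries the extra slowly varying factor $\ell(np)$.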

  5. Reduction to Diagonal. Diagonal: $\mathbf{X}$ with iid regularly varying entries, $\alpha \in (0,4)$, and $p = n^\beta$ with $\beta \in [0,1]$. We have $a_{np}^{-2}\, \|\mathbf{X}\mathbf{X}' - \operatorname{diag}(\mathbf{X}\mathbf{X}')\| \overset{P}{\to} 0$, where $\|\cdot\|$ denotes the spectral norm. Here $(\mathbf{X}\mathbf{X}')_{ij} = \sum_{t=1}^{n} X_{it} X_{jt}$.

  6. Eigenvalues. Weyl's inequality: $\max_{i=1,\dots,p} |\lambda_i(A+B) - \lambda_i(A)| \le \|B\|$. Choose $A + B = \mathbf{X}\mathbf{X}'$ and $A = \operatorname{diag}(\mathbf{X}\mathbf{X}')$ to obtain $a_{np}^{-2} \max_{i=1,\dots,p} |\lambda_i(\mathbf{X}\mathbf{X}') - \lambda_i(\operatorname{diag}(\mathbf{X}\mathbf{X}'))| \overset{P}{\to} 0$ as $n \to \infty$. Note: limit theory for $(\lambda_i(\mathbf{S}))$ is reduced to that of $(S_{ii})$.
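The reduction to the diagonal can be checked on a small simulated example. The sketch below is not from the talk: it uses power iteration as a stand-in eigenvalue routine so that only the standard library is needed, symmetric Pareto entries with $\alpha = 1$, and small sizes `p = 10`, `n = 100`; all of these are assumptions for illustration. It compares $\lambda_1(\mathbf{X}\mathbf{X}')$ with the largest diagonal entry and with the Weyl bound, where the spectral norm of the off-diagonal part is bounded from above by its Frobenius norm.

```python
import math
import random

def gram(X):
    """Return X X' for a p x n matrix X stored as a list of rows."""
    p = len(X)
    return [[sum(a * b for a, b in zip(X[i], X[j])) for j in range(p)]
            for i in range(p)]

def largest_eigenvalue(A, start, iters=500):
    """Rayleigh quotient after power iteration started at e_start.
    For symmetric positive semidefinite A this never exceeds lambda_1(A)."""
    p = len(A)
    v = [1.0 if i == start else 0.0 for i in range(p)]
    for _ in range(iters):
        w = [sum(A[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    Av = [sum(A[i][j] * v[j] for j in range(p)) for i in range(p)]
    return sum(v[i] * Av[i] for i in range(p))

rng = random.Random(1)          # fixed seed, for reproducibility only
alpha, p, n = 1.0, 10, 100      # illustrative values
# Symmetric Pareto(alpha) entries: |X| = U^{-1/alpha} with a random sign.
X = [[rng.choice((-1.0, 1.0)) * (1.0 - rng.random()) ** (-1.0 / alpha)
      for _ in range(n)] for _ in range(p)]
A = gram(X)
diag = [A[i][i] for i in range(p)]
top = max(range(p), key=lambda i: diag[i])
lam1 = largest_eigenvalue(A, top)
# Frobenius norm of the off-diagonal part, an upper bound for its spectral norm.
off_frob = math.sqrt(sum(A[i][j] ** 2
                         for i in range(p) for j in range(p) if i != j))
print("relative gap:", (lam1 - max(diag)) / max(diag))
print("Weyl bound on the gap (relative):", off_frob / max(diag))
```

For heavy tails the relative gap is typically small, which is the finite-sample picture behind the convergence statement. The normalization by $a_{np}^{-2}$ is replaced here by division by the largest diagonal entry, which is of the same order.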

  7. Heavy-tailed case. Theorem (Heiny and Mikosch, 2016): $\mathbf{X}$ with iid regularly varying entries, $\alpha \in (0,4)$, and $p_n = n^\beta \ell(n)$ with $\beta \in [0,1]$. (1) If $\beta \in [0,1]$, then $a_{np}^{-2} \max_{i=1,\dots,p} |\lambda_i(\mathbf{X}\mathbf{X}') - \lambda_i(\operatorname{diag}(\mathbf{X}\mathbf{X}'))| \overset{P}{\to} 0$. (2) If $\beta \in ((\alpha/2 - 1)_+, 1]$, then $a_{np}^{-2} \max_{i=1,\dots,p} |\lambda_i(\mathbf{X}\mathbf{X}') - X^2_{(i),np}| \overset{P}{\to} 0$.

  8. Example: Eigenvalues. [Figure: smoothed histogram based on 20000 simulations of the approximation error for the normalized eigenvalue $a_{np}^{-2} \lambda_1(\mathbf{S})$ for entries $X_{it}$ with $\alpha = 1.6$, $\beta = 1$, $n = 1000$ and $p = 200$.]

  9. Eigenvectors. $v_k$: unit eigenvector of $\mathbf{S}$ associated with $\lambda_k(\mathbf{S})$. The unit eigenvectors of $\operatorname{diag}(\mathbf{S})$ are the canonical basis vectors $e_j$. Eigenvectors: $\mathbf{X}$ with iid regularly varying entries with index $\alpha \in (0,4)$ and $p_n = n^\beta \ell(n)$ with $\beta \in [0,1]$. Then for any fixed $k \ge 1$, $\|v_k - e_{L_k}\|_{\ell^2} \overset{P}{\to} 0$ as $n \to \infty$.

  10. Localization vs. Delocalization. [Two figures, panel titles "Pareto data" and "Normal data": components of the eigenvector $v_1$, plotting size of components against indices of components (1 to 200), for $X \sim \mathrm{Pareto}(0.8)$ and $X \sim N(0,1)$; $p = 200$, $n = 1000$.]
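The contrast between the two panels can be reproduced on a smaller scale. The following sketch approximates the leading eigenvector of $\mathbf{X}\mathbf{X}'$ by power iteration (a stand-in chosen to avoid external libraries, not the method of the talk) and measures its distance to the nearest canonical basis vector. The sizes `p = 20`, `n = 100`, the seed and the function names are assumptions for illustration.

```python
import math
import random

def leading_eigenvector(X, iters=300):
    """Approximate unit eigenvector of X X' for its largest eigenvalue (power iteration)."""
    p = len(X)
    A = [[sum(a * b for a, b in zip(X[i], X[j])) for j in range(p)]
         for i in range(p)]
    start = max(range(p), key=lambda i: A[i][i])
    v = [1.0 if i == start else 0.0 for i in range(p)]
    for _ in range(iters):
        w = [sum(A[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def localization(v):
    """Return (L, ||v - e_L||), with L the index of the largest |component|
    and the sign of v chosen so that its L-th component is nonnegative."""
    L = max(range(len(v)), key=lambda i: abs(v[i]))
    sign = 1.0 if v[L] >= 0 else -1.0
    dist = math.sqrt(sum((sign * v[i] - (1.0 if i == L else 0.0)) ** 2
                         for i in range(len(v))))
    return L, dist

rng = random.Random(2)   # fixed seed, for reproducibility only
p, n = 20, 100           # smaller than the slide's p = 200, n = 1000
pareto = [[(1.0 - rng.random()) ** (-1.0 / 0.8) for _ in range(n)] for _ in range(p)]
normal = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(p)]
v_pareto = leading_eigenvector(pareto)
v_normal = leading_eigenvector(normal)
print("Pareto(0.8):", localization(v_pareto))
print("N(0,1):     ", localization(v_normal))
```

A distance near 0 indicates localization on a single coordinate, while a distance close to $\sqrt{2}$ indicates that no single coordinate dominates.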

  11. Point Process of Normalized Eigenvalues. Point process convergence: $N_n = \sum_{i=1}^{p} \delta_{a_{np}^{-2} \lambda_i(\mathbf{X}\mathbf{X}')} \overset{d}{\to} \sum_{i=1}^{\infty} \delta_{\Gamma_i^{-2/\alpha}} = N$. The limit is a Poisson random measure (PRM) on $(0,\infty)$ with mean measure $\mu(x,\infty) = x^{-\alpha/2}$, $x > 0$, and $\Gamma_i = E_1 + \cdots + E_i$, where $(E_i)$ are iid standard exponential.
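The limiting point process is easy to simulate from its representation through partial sums of exponentials. The sketch below generates the largest points $\Gamma_i^{-2/\alpha}$ and estimates the expected number of points above a level $x$, which should be close to $\mu(x,\infty) = x^{-\alpha/2}$. Truncating to the first 30 points, the number of repetitions and the seed are assumptions made for the illustration.

```python
import random

def limit_points(alpha, count, rng):
    """First `count` points Gamma_i^{-2/alpha} of the limit process, in decreasing order."""
    gamma, points = 0.0, []
    for _ in range(count):
        gamma += rng.expovariate(1.0)          # Gamma_i = E_1 + ... + E_i
        points.append(gamma ** (-2.0 / alpha))
    return points

def mean_count(alpha, x, reps=5000, seed=0):
    """Monte Carlo estimate of E[N(x, inf)], using only the 30 largest points."""
    rng = random.Random(seed)
    total = 0
    for _ in range(reps):
        total += sum(1 for pt in limit_points(alpha, 30, rng) if pt > x)
    return total / reps

alpha = 1.5   # illustrative value
for x in (0.5, 1.0, 2.0):
    print(x, mean_count(alpha, x), x ** (-alpha / 2))
```

The truncation is harmless for the levels used here, because a point exceeds $x$ only if $\Gamma_i < x^{-\alpha/2}$, and $\Gamma_{30}$ is very unlikely to be that small.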

  12. Point Process of Normalized Eigenvalues. Limiting distribution: for $k \ge 1$, $\lim_{n\to\infty} P(a_{np}^{-2} \lambda_k \le x) = \lim_{n\to\infty} P(N_n(x,\infty) < k) = P(N(x,\infty) < k) = e^{-x^{-\alpha/2}} \sum_{s=0}^{k-1} \frac{(x^{-\alpha/2})^s}{s!}$, $x > 0$.
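The limiting distribution function is a Poisson probability: $N(x,\infty)$ is Poisson with mean $m = x^{-\alpha/2}$, so $P(N(x,\infty) < k)$ is the sum of the first $k$ Poisson weights. A minimal sketch of the formula, with an illustrative function name and parameter values:

```python
import math

def limit_cdf(x, k, alpha):
    """exp(-m) * sum_{s=0}^{k-1} m^s / s!  with m = x^{-alpha/2}, for x > 0."""
    m = x ** (-alpha / 2.0)
    return math.exp(-m) * sum(m ** s / math.factorial(s) for s in range(k))

alpha = 1.6   # illustrative value
for k in (1, 2, 3):
    print(k, [round(limit_cdf(x, k, alpha), 4) for x in (0.5, 1.0, 2.0, 5.0)])
```

For $k = 1$ the sum has a single term and the formula reduces to $\exp(-x^{-\alpha/2})$, the Fréchet distribution function with parameter $\alpha/2$.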

  13. Point Process of Normalized Eigenvalues (continued). Largest eigenvalue: $\frac{n}{a_{np}^2} \lambda_1(\mathbf{S}) \overset{d}{\to} \Gamma_1^{-2/\alpha}$, where the limit has a Fréchet distribution with parameter $\alpha/2$. References: Soshnikov (2006), Auffinger et al. (2009), Auffinger and Tang (2016), Davis et al. (2014, 2016²), JH and Mikosch (2016).

  14. $\alpha = 3.99$. [Figure: $\alpha = 3.99$, $n = 2000$, $p = 1000$.]

  15. $\alpha = 3$. [Figure: $\alpha = 3$, $n = 2000$, $p = 1000$.]

  16. $\alpha = 2.1$. [Figure: $\alpha = 2.1$, $n = 10000$, $p = 1000$.]

  17. Infinite variance, $\alpha < 2$. Limiting spectral distribution of $\mathbf{X}\mathbf{X}'$ under $E[X^2] = \infty$. Regular variation with $\alpha < 2$: $F_{a_{n+p}^{-2} \mathbf{X}\mathbf{X}'} \to G_\alpha^\gamma$ weakly, whose density $g_\alpha^\gamma$ satisfies $g_\alpha^\gamma(x) \sim c\, x^{-1-\alpha/2}$ as $x \to \infty$. References: Ben Arous and Guionnet (2008), Belinschi et al. (2009).

  18. Moments of LSD. Assumption: $X$ symmetric and regularly varying with index $\alpha \in (0,2)$. Goal: for $k \ge 1$, find the limit of $E\left[\int x^k\, F_{\mathbf{R}}(dx)\right] = \frac{1}{p} E[\operatorname{tr}(\mathbf{R}^k)]$.
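The identity between the $k$-th moment and the normalized trace is a one-line computation, assuming (as is standard, although the slide does not spell it out) that $F_{\mathbf{R}}$ denotes the empirical spectral distribution of $\mathbf{R}$:

```latex
F_{\mathbf{R}} = \frac{1}{p} \sum_{i=1}^{p} \delta_{\lambda_i(\mathbf{R})}
\quad\Longrightarrow\quad
\int x^k \, F_{\mathbf{R}}(dx)
  = \frac{1}{p} \sum_{i=1}^{p} \lambda_i(\mathbf{R})^k
  = \frac{1}{p} \operatorname{tr}(\mathbf{R}^k).
```

Taking expectations on both sides gives the quantity whose limit is sought.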

  19. Moments of LSD. One has $E[\operatorname{tr}(\mathbf{R}^k)] = \sum_{i_1,\dots,i_k=1}^{p} \underbrace{\sum_{t_1,\dots,t_k=1}^{n} E[Y_{i_1 t_1} Y_{i_2 t_1} \cdots Y_{i_k t_k} Y_{i_1 t_k}]}_{:= F(i_1,\dots,i_k)}$, where $Y_{ij} = \frac{X_{ij}}{\sqrt{\sum_{t=1}^{n} X_{it}^2}}$. Assumption: $X$ symmetric $\Rightarrow$ $Y_{ij}$ symmetric.
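The trace expansion can be verified numerically for $k = 2$, where the product under the expectation reads $Y_{i_1 t_1} Y_{i_2 t_1} Y_{i_2 t_2} Y_{i_1 t_2}$. The sketch below uses the fact that $\mathbf{R} = \mathbf{Y}\mathbf{Y}'$, which follows from the definition of $\mathbf{R}$ on slide 3 and of $Y_{ij}$ above. The symmetric Pareto entries, the tiny sizes and the function names are assumptions for illustration, and the check is done for one realization, without the expectation.

```python
import math
import random

def self_normalize(X):
    """Y_ij = X_ij / sqrt(sum_t X_it^2): every row of X scaled to unit Euclidean norm."""
    return [[x / math.sqrt(sum(v * v for v in row)) for x in row] for row in X]

def trace_R2_matrix(Y):
    """tr(R^2) with R = Y Y', computed from the matrix itself."""
    p = len(Y)
    R = [[sum(a * b for a, b in zip(Y[i], Y[j])) for j in range(p)]
         for i in range(p)]
    return sum(R[i][j] * R[j][i] for i in range(p) for j in range(p))

def trace_R2_expansion(Y):
    """k = 2 case of the index sum over i1, i2 (rows) and t1, t2 (columns)."""
    p, n = len(Y), len(Y[0])
    return sum(Y[i1][t1] * Y[i2][t1] * Y[i2][t2] * Y[i1][t2]
               for i1 in range(p) for i2 in range(p)
               for t1 in range(n) for t2 in range(n))

rng = random.Random(4)       # fixed seed, for reproducibility only
alpha, p, n = 1.5, 3, 6      # illustrative values
X = [[rng.choice((-1.0, 1.0)) * (1.0 - rng.random()) ** (-1.0 / alpha)
      for _ in range(n)] for _ in range(p)]
Y = self_normalize(X)
print(trace_R2_matrix(Y), trace_R2_expansion(Y))  # equal up to rounding
```

Because every row of $\mathbf{Y}$ has unit norm, all moments of the $Y_{ij}$ exist even when $E[X^2] = \infty$, which is what makes the moment method usable for $\mathbf{R}$.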

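  20. Moments of LSD. $\frac{1}{p} E[\operatorname{tr}(\mathbf{R}^k)] \to \beta_k(\gamma) + (\text{correction term})$. The correction term is a double sum over $r = 2, \dots, k-2$ and $q = 0, \dots, r-2$ whose summands contain the factors $\gamma^{r-1}$ and $(\Gamma(1 - \alpha/2))^{-r+q+1}$, together with sums over index sets $I \in \mathcal{C}^{(q)}_{r,k}$ and $T$ of products of Gamma-function terms such as $\Gamma(d_i(I,T))$, $\Gamma(N_i(I))$ and $\Gamma(m_{it}(I,T) - \alpha/2)$ with $(i,t) \in \Delta(I,T)$; its exact form is not recoverable from the extracted slide text.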

  23. Motivation. Random walk $S_n = X_1 + \cdots + X_n$, $n \ge 1$, where $(X_i)$ are iid random variables with generic element $X$. (1) $E[X] = 0$ and $E[X^2] = 1$. (2) Dimension $p = p_n \to \infty$. Consider iid copies $(S_n^{(i)})_{i \le p}$ of $S_n$ and define the point process $N_n = \sum_{i=1}^{p} \delta_{d_p (S_n^{(i)}/\sqrt{n} - d_p)}$.

  24. We want to prove: $N_n = \sum_{i=1}^{p} \delta_{d_p (S_n^{(i)}/\sqrt{n} - d_p)} \overset{d}{\to} N$ as $n \to \infty$, where $N$ is a Poisson random measure with mean measure $\mu(x,\infty) = e^{-x}$, $x \in \mathbb{R}$, and $d_p = \sqrt{2 \log p} - \frac{\log\log p + \log 4\pi}{2 (2 \log p)^{1/2}}$.
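The role of $d_p$ can be illustrated in the Gaussian case. If $S_n^{(i)}/\sqrt{n}$ is replaced by an exactly standard normal variable $Z$ (an assumption made only for this sketch; the talk concerns general random walks), then $E[N_n(x,\infty)] = p\, P(d_p (Z - d_p) > x)$, which should be close to $e^{-x}$ for large $p$. The function names and the value $p = 10^6$ are illustrative.

```python
import math

def d_p(p):
    """d_p = sqrt(2 log p) - (log log p + log 4 pi) / (2 sqrt(2 log p))."""
    root = math.sqrt(2.0 * math.log(p))
    return root - (math.log(math.log(p)) + math.log(4.0 * math.pi)) / (2.0 * root)

def gaussian_mean_measure(p, x):
    """p * P(d_p (Z - d_p) > x) for a standard normal Z, via the complementary error function."""
    d = d_p(p)
    return p * 0.5 * math.erfc((d + x / d) / math.sqrt(2.0))

p = 10 ** 6
for x in (-1.0, 0.0, 1.0):
    print(x, gaussian_mean_measure(p, x), math.exp(-x))
```

The agreement improves only slowly as $p$ grows, since the error in this normalization decays at a logarithmic rate in $p$.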
