Tracy–Widom limit for sample covariance matrices


  1. Tracy–Widom limit for sample covariance matrices. Kevin Schnelli, KTH Royal Institute of Technology. Joint work with Jong Yun Hwang (KAIST) and Ji Oon Lee (KAIST). May 10, 2019.

  2. Sample covariance matrix. Consider a (mean-zero) multivariate Gaussian random variable
$$ y = (y_1, y_2, \ldots, y_M)^\top \in \mathbb{R}^M, \qquad y \sim \mathcal{N}(0, \Sigma), $$
where $\Sigma = \mathbb{E}\, y y^*$, i.e. $\Sigma_{\alpha\beta} = \mathbb{E}\, y_\alpha y_\beta$, is the covariance matrix. Sample covariance matrix:
$$ \widehat\Sigma := \frac{1}{N} \sum_{i=1}^N y_i y_i^*, \qquad y_i \colon \text{samples}. $$
Traditional setup: $M$ fixed, $N \nearrow \infty$; then $\widehat\Sigma \to \Sigma = \mathbb{E}\, y y^*$. High-dimensional setup: $N/M =: d_N \to d \in (0,\infty)$ as $N \nearrow \infty$.
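The two regimes on this slide can be contrasted numerically. A minimal sketch of the traditional setup, assuming NumPy (the matrix sizes and the diagonal $\Sigma$ are illustrative choices, not from the talk): with $M$ fixed and $N$ large, the sample covariance is a consistent estimator of $\Sigma$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Traditional setup: M fixed, N large. Draw N samples y_i ~ N(0, Sigma)
# and form the sample covariance (1/N) sum_i y_i y_i^T.
M, N = 5, 20000
Sigma = np.diag([1.0, 2.0, 3.0, 4.0, 5.0])
Y = rng.multivariate_normal(np.zeros(M), Sigma, size=N).T  # columns are the samples y_i
Sigma_hat = (Y @ Y.T) / N

# With M fixed, Sigma_hat is close to Sigma entrywise, with O(N^{-1/2}) errors.
err = np.max(np.abs(Sigma_hat - Sigma))
print(err)
```

The high-dimensional setup of the following slides is precisely the regime where this entrywise consistency breaks down as a statement about the spectrum.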

  3. Wishart matrix: $\Sigma = I$. Let $\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_M$ denote the ordered eigenvalues of $\widehat\Sigma$. For any fixed interval $J$,
$$ \Big| \mathcal{N}_J - \int_J \rho_{\mathrm{MP}}(x)\,\mathrm{d}x \Big| \to 0, \qquad \mathcal{N}_J := \frac{1}{M}\, \#\{ j : \lambda_j \in J \}, $$
almost surely as $N \to \infty$, where $\rho_{\mathrm{MP}}$ is the Marchenko–Pastur law:
$$ \rho_{\mathrm{MP}}(x)\,\mathrm{d}x = \begin{cases} \nu_d(x)\,\mathrm{d}x + (1-d)\,\delta_0(x)\,\mathrm{d}x, & \text{if } d < 1, \\ \nu_d(x)\,\mathrm{d}x, & \text{if } d \ge 1, \end{cases} $$
where
$$ \nu_d(x) = \frac{d}{2\pi x} \sqrt{(E_+ - x)(x - E_-)}\; \mathbf{1}_{[E_-, E_+]}(x), \qquad E_\pm = (1 \pm d^{-1/2})^2. $$
Figure: Histogram of the eigenvalues of $\widehat\Sigma$; $\Sigma = I$, $N = 2000$, $d = 2$.
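The Marchenko–Pastur approximation in the figure is easy to reproduce by simulation. A sketch, assuming NumPy, with the figure's parameters $N = 2000$, $d = 2$ (so $d \ge 1$ and there is no atom at $0$):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 2.0
N = 2000
M = int(N / d)

# Null case Sigma = I: X has i.i.d. N(0, 1/N) entries, Sigma_hat = X X^T.
X = rng.standard_normal((M, N)) / np.sqrt(N)
lam = np.linalg.eigvalsh(X @ X.T)

E_minus, E_plus = (1 - d**-0.5)**2, (1 + d**-0.5)**2

# Essentially all eigenvalues lie in [E_-, E_+] (small margin for edge fluctuations),
# and the first moment of the M-normalized spectral law is 1.
frac_in_support = np.mean((lam > E_minus - 0.05) & (lam < E_plus + 0.05))
mean_eig = lam.mean()
print(frac_in_support, mean_eig)
```

Plotting a histogram of `lam` against $\nu_d$ reproduces the figure on this slide.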

  4. Largest eigenvalue: $\Sigma = I$. Theorem. Let $E_+$ denote the upper endpoint of the support of $\rho_{\mathrm{MP}}$. Then for any (small) $\varepsilon > 0$ and (large) $D > 0$,
$$ \mathbb{P}\Big( |\lambda_1 - E_+| \ge \frac{N^\varepsilon}{N^{2/3}} \Big) \le \frac{1}{N^D}, $$
for $N \ge N_0(\varepsilon, D)$. Hence we expect that the fluctuations of the largest eigenvalue are of order $N^{-2/3}$.

  5. Largest eigenvalue: $\Sigma = I$. Theorem (Johnstone 2001, Johansson 2000). For real, respectively complex, Gaussians with $\Sigma = I$, there is a constant $\gamma = \gamma(d)$ such that
$$ \lim_{N\to\infty} \mathbb{P}\big( \gamma N^{2/3} (\lambda_1 - E_+) \le s \big) = F_\beta(s), \qquad \beta = 1, 2. $$
Tracy–Widom distributions:
$$ F_2(s) := \exp\Big( -\int_s^\infty (x-s)\, q(x)^2 \,\mathrm{d}x \Big), \qquad F_1(s) := \big(F_2(s)\big)^{1/2} \exp\Big( -\frac{1}{2} \int_s^\infty q(x)\,\mathrm{d}x \Big), $$
where $q$ satisfies $q'' = sq + 2q^3$, $q(s) \sim \operatorname{Ai}(s)$ as $s \to \infty$.
Figure: Histogram of 2000 samples of $\gamma N^{2/3}(\lambda_1 - E_+)$ for $N = 1000$, $d = 2$, real Gaussians.
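The $N^{-2/3}$ scale of the edge fluctuations can be seen directly by repeated sampling, as in the figure. A rough sketch, assuming NumPy, with modest sizes for speed (so finite-size corrections are visible); $\gamma$ is the constant from the Pillai–Yin theorem on the next slide:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 2.0
N = 400
M = int(N / d)
E_plus = (1 + d**-0.5)**2
gamma = d**(-1/6) / (d**0.25 + d**-0.25)   # gamma(d) for Sigma = I

# Sample the rescaled largest eigenvalue gamma * N^{2/3} * (lambda_1 - E_+).
reps = 100
samples = []
for _ in range(reps):
    X = rng.standard_normal((M, N)) / np.sqrt(N)
    lam1 = np.linalg.eigvalsh(X @ X.T)[-1]   # eigvalsh returns ascending order
    samples.append(gamma * N**(2/3) * (lam1 - E_plus))
samples = np.array(samples)

# The rescaled samples are O(1) and roughly Tracy-Widom F_1 distributed
# (mean of F_1 is about -1.21); lambda_1 - E_+ itself is O(N^{-2/3}).
print(samples.mean(), samples.std())
```

A histogram of `samples` against the $F_1$ density reproduces the figure on this slide.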

  6. Universality results. Replace the Gaussian distributions by general distributions. Definition. Let $X = (x_{\alpha i})$ be an $M \times N$ matrix whose entries are i.i.d. real random variables such that
$$ \mathbb{E}\, x_{\alpha i} = 0, \qquad \mathbb{E}\, |x_{\alpha i}|^2 = N^{-1}, \qquad \mathbb{E}\, |x_{\alpha i}|^k \le \frac{C_k}{N^{k/2}}. $$
Assume that $M = M(N)$ with $N/M \to d \in (0,\infty)$ as $N \to \infty$. Sample covariance matrix: $\widehat\Sigma = X X^*$. Theorem (Pillai and Yin 2014). Let $E_+$ denote the upper endpoint of the support of $\rho_{\mathrm{MP}}$. Then
$$ \lim_{N\to\infty} \mathbb{P}\big( \gamma N^{2/3}(\lambda_1 - E_+) \le s \big) = F_1(s), $$
where $\gamma = d^{-1/6} (d^{1/4} + d^{-1/4})^{-1}$.

  7. Sparse sample covariance matrices. Motivation: the biadjacency matrix of a bipartite graph. Two vertex sets: $V = \{v_\alpha\}_{\alpha=1}^M$ of size $M$ and $W = \{w_i\}_{i=1}^N$ of size $N$; edges only between $V$ and $W$. Biadjacency matrix:
$$ B = (b_{\alpha i}), \quad 1 \le \alpha \le M, \ 1 \le i \le N, \qquad b_{\alpha i} = \begin{cases} 1, & \text{if there is an edge between } v_\alpha \text{ and } w_i, \\ 0, & \text{else}. \end{cases} $$

  8. Sparse sample covariance matrices. Biadjacency matrix of a bipartite Erdős–Rényi graph: $B = (b_{\alpha i})$, where the $b_{\alpha i}$ are i.i.d. random variables satisfying
$$ b_{\alpha i} = \begin{cases} 1, & \text{with probability } p, \\ 0, & \text{with probability } 1 - p. \end{cases} $$
Remarks:
◦ $\mathbb{E}\, b_{\alpha i}^k = p$ for all $k \ge 1$.
◦ We allow $p$ to depend on $N$. We say that $B$ is sparse if $p \ll 1$.

  9. Sparse sample covariance matrices. Center and normalize $B = (b_{\alpha i})$:
$$ X_{\alpha i} := \frac{b_{\alpha i} - p}{\sqrt{N p (1-p)}}. $$
Then
$$ \mathbb{E}\, X_{\alpha i} = 0, \qquad \mathbb{E}\, X_{\alpha i}^2 = \frac{1}{N}, \qquad \mathbb{E}\, X_{\alpha i}^k = \frac{1 + O(p)}{N (N p)^{(k-2)/2}} \quad (k \ge 3). $$
Introduce the sparsity parameter $q$ by
$$ q^2 := p N, \qquad 0 < q \le c N^{1/2}. $$
Hence
$$ \mathbb{E}\, X_{\alpha i}^k = \frac{1 + O(q^2/N)}{N q^{k-2}}, \qquad k \ge 3. $$
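The normalization and the moment hierarchy can be checked empirically. A sketch assuming NumPy ($p$, $M$, $N$ are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 4000, 2000
p = 0.01                       # sparsity; q^2 = p * N = 40
q = np.sqrt(p * N)

# Centered, normalized biadjacency matrix of a bipartite Erdos-Renyi graph.
B = (rng.random((M, N)) < p).astype(float)
X = (B - p) / np.sqrt(N * p * (1 - p))

# Empirical checks of E X = 0, E X^2 = 1/N, E X^3 = (1 + O(p)) / (N q).
m1 = X.mean()
m2 = N * (X**2).mean()         # should be ~ 1
m3 = N * q * (X**3).mean()     # should be ~ 1, up to O(p) corrections
print(m1, m2, m3)
```

Note how the third moment is much larger than the Gaussian-like bound $N^{-3/2}$ of slide 6 once $q \ll N^{1/2}$; this is what makes the sparse model genuinely different.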

  10. Sparse sample covariance matrix. Definition. Let $X = (X_{\alpha i})$ be an $M \times N$ matrix whose entries are real i.i.d. random variables such that
$$ \mathbb{E}\, X_{\alpha i} = 0, \qquad \mathbb{E}\, |X_{\alpha i}|^2 = N^{-1}, \qquad \mathbb{E}\, |X_{\alpha i}|^k \le \frac{C_k}{N q^{k-2}}, $$
for some $q = N^\phi$, $0 < \phi \le \tfrac{1}{2}$. Assume that $M = M(N)$ with $N/M \to d \in (0,\infty)$ as $N \to \infty$. Sample covariance matrix: $\widehat\Sigma := X X^*$. Remark: for any $\phi > 0$, the eigenvalues of $\widehat\Sigma$ follow the Marchenko–Pastur law on the global scale. Rescaled cumulants: let $\kappa^{(k)}$ be the $k$-th cumulant of $X_{\alpha i}$:
$$ \log \mathbb{E}\, e^{t X_{\alpha i}} = \sum_{k=1}^\infty \frac{\kappa^{(k)} t^k}{k!}, \qquad \kappa^{(1)} = 0, \quad \kappa^{(2)} = \frac{1}{N}. $$
Set, for $k \ge 3$, $s^{(1)} := 0$, $s^{(2)} := 1$, $s^{(k)} := N q^{k-2} \kappa^{(k)}$.
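For the centered Bernoulli entries of slide 9, the rescaled cumulants $s^{(3)}, s^{(4)}$ have closed forms and are $1 + O(p)$. A quick check, assuming NumPy; the central-moment formulas below are elementary Bernoulli computations, not stated in the talk:

```python
import numpy as np

def rescaled_cumulants(p, N):
    """Exact s^(3), s^(4) for X = (b - p)/sqrt(N p (1 - p)), b ~ Bernoulli(p)."""
    c = np.sqrt(N * p * (1 - p))
    mu2 = p * (1 - p)                         # Var(b)
    mu3 = p * (1 - p) * (1 - 2 * p)           # 3rd central moment = 3rd cumulant
    mu4 = p * (1 - p)**4 + (1 - p) * p**4     # 4th central moment
    kappa3 = mu3 / c**3
    kappa4 = mu4 / c**4 - 3 * (mu2 / c**2)**2  # 4th cumulant
    q = np.sqrt(p * N)
    return N * q * kappa3, N * q**2 * kappa4   # s^(3), s^(4)

# The rescaled cumulants depend only on p, and tend to 1 as p -> 0.
for p in [0.1, 0.01, 0.001]:
    print(rescaled_cumulants(p, N=10_000))
```

This makes the definition $s^{(k)} := N q^{k-2} \kappa^{(k)}$ concrete: the rescaling strips the $N$- and $q$-dependence and leaves an $O(1)$ model parameter.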

  11. Estimates on the largest eigenvalue. Theorem (Hwang–Lee–S. '18). Consider the largest eigenvalue $\lambda_1$ of $X X^*$. Assume that $q \ge N^\phi$, $\phi > 0$. Then there is $L_+ \in \mathbb{R}$ such that for any $\varepsilon > 0$ and $D > 0$,
$$ \mathbb{P}\Big( |\lambda_1 - L_+| > N^\varepsilon \Big( \frac{1}{q^4} + \frac{1}{N^{2/3}} \Big) \Big) < N^{-D}, $$
for $N \ge N_0(\varepsilon, D)$, where
$$ L_+ := \Big( 1 + \frac{1}{\sqrt{d}} \Big)^2 + \frac{s^{(4)}}{q^2} \frac{1}{\sqrt{d}} \Big( 1 + \frac{1}{\sqrt{d}} \Big)^2 + O(q^{-4}). $$

  12. Tracy–Widom limit. Theorem (Hwang–Lee–S. '18). Assume that $\phi > 1/6$ (so that $q \gg N^{1/6}$). Let $\lambda_1$ be the largest eigenvalue of $X X^*$. Then
$$ \lim_{N\to\infty} \mathbb{P}\big( \gamma N^{2/3} (\lambda_1 - L_+) \le s \big) = F_1(s) $$
for all $s \in \mathbb{R}$, where $\gamma^{-1} = d^{1/6}(d^{1/4} + d^{-1/4})$. Remark: for $N^{1/6} \ll q \le N^{1/3}$, the spectral shift $L_+ - E_+ \simeq q^{-2}$ is much larger than the TW fluctuations. Remark: if $\phi < 1/6$, it is believed that the limiting distribution is Gaussian; cf. Huang–Landon–Yau '17.

  13. (Some) previous results.
◦ TW for the Wishart matrix ($X$ Gaussian, $\Sigma = I$): complex case: Johansson (2000); real case: Johnstone (2001).
◦ Null case ($\Sigma = I$): edge universality: Pillai–Yin (2014), Ding–Yang (2018).
◦ (Phase transition for the) spiked Wishart matrix ($X$ Gaussian, $\Sigma$ a finite-rank perturbation of $I$): complex case: Baik–Ben Arous–Péché (2005); real case: Bloemendal–Virág (2011).
◦ Spiked sample covariance matrix ($\Sigma$ a finite-rank perturbation of $I$): edge universality: Bloemendal–Knowles–Yau–Yin (2014).
◦ Non-null case ($\Sigma \ne I$): TW for Gaussian, complex case: El Karoui (2007); real case: Lee–S. (2016), Fan–Johnstone (2017); edge universality: Bao–Pan–Zhou (2015), Knowles–Yin (2017).

  14. Tools: Stieltjes transform and Green function.
◦ Given a probability measure $\mu$ on $\mathbb{R}$, its Stieltjes transform is defined as
$$ m_\mu(z) := \int_{\mathbb{R}} \frac{\mathrm{d}\mu(x)}{x - z}, \qquad z \in \mathbb{C}^+. $$
◦ For $\mu$ the Marchenko–Pastur law, we have
$$ m_{\mathrm{MP}}(z) = \frac{-\big(z + 1 - \tfrac{1}{d}\big) + \sqrt{\big(z + 1 - \tfrac{1}{d}\big)^2 - 4z}}{2z}. $$
◦ Hence $m_{\mathrm{MP}}(z)$ is the solution to
$$ 1 + z m_{\mathrm{MP}} + \Big( z m_{\mathrm{MP}} + 1 - \frac{1}{d} \Big) m_{\mathrm{MP}} = 0, $$
with $m_{\mathrm{MP}}(z) \in \mathbb{C}^+$, for $z \in \mathbb{C}^+$.
◦ In the sparse setup we make the ansatz
$$ 1 + z \widetilde m(z) + \Big( z \widetilde m(z) + 1 - \frac{1}{d} \Big) \widetilde m(z) + \frac{s^{(4)}}{q^2} \Big( z \widetilde m(z) + 1 - \frac{1}{d} \Big)^2 \widetilde m(z)^2 = 0, $$
and pick the solution with $\widetilde m(z) \in \mathbb{C}^+$, $z \in \mathbb{C}^+$.
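The quadratic equation, its $\mathbb{C}^+$ root, and the empirical Stieltjes transform can be cross-checked numerically. A sketch assuming NumPy; note that $m$ here is normalized by $N$, i.e. it approximates $\frac{1}{N}\operatorname{Tr}(X^*X - z)^{-1}$, matching the Green function used on the next slides:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 2.0
N = 1000
M = int(N / d)
z = 2.0 + 0.5j                      # a spectral parameter in C^+

# Root of z m^2 + (z + 1 - 1/d) m + 1 = 0 with Im m > 0.
roots = np.roots([z, z + 1 - 1/d, 1])
m_mp = roots[np.argmax(roots.imag)]

# Empirical Stieltjes transform of the eigenvalues of X^* X, normalized by N.
X = rng.standard_normal((M, N)) / np.sqrt(N)
lam = np.linalg.eigvalsh(X.T @ X)
m_emp = np.mean(1.0 / (lam - z))

print(abs(m_emp - m_mp))            # small: the C^+ root matches the spectrum
```

The sparse ansatz differs from this quadratic only by the $O(q^{-2})$ quartic correction term, which is what shifts the edge from $E_+$ to $L_+$.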

  15. Tools: Stieltjes transform and Green function.
◦ In the sparse setup we make the ansatz
$$ 1 + z \widetilde m(z) + \Big( z \widetilde m(z) + 1 - \frac{1}{d} \Big) \widetilde m(z) + \frac{s^{(4)}}{q^2} \Big( z \widetilde m(z) + 1 - \frac{1}{d} \Big)^2 \widetilde m(z)^2 = 0, $$
and pick the solution with $\widetilde m(z) \in \mathbb{C}^+$, $z \in \mathbb{C}^+$.
◦ Then there is a probability measure $\widetilde\rho$ such that $\widetilde m(z) = \int \frac{\mathrm{d}\widetilde\rho(x)}{x - z}$, supported on $[L_-, L_+]$, where
$$ L_\pm = \Big( 1 \pm \frac{1}{\sqrt{d}} \Big)^2 \pm \frac{s^{(4)}}{q^2} \frac{1}{\sqrt{d}} \Big( 1 \pm \frac{1}{\sqrt{d}} \Big)^2 + O(q^{-4}). $$
Moreover, we have $\widetilde\rho(x) \sim \sqrt{(L_+ - x)_+}$ for $x$ near the upper edge.
◦ Green function/resolvent of $Q := X^* X$ (an $N \times N$ matrix):
$$ G_{X^*X}(z) := \frac{1}{X^* X - z}. $$
◦ By spectral calculus,
$$ \frac{1}{N} \operatorname{Tr} G_{X^*X}(z) = \frac{1}{N} \sum_{i=1}^N \frac{1}{\lambda_i(Q) - z}. $$

  16. Tools: Stieltjes transform and Green function.
◦ Green function/resolvent of $Q := X^* X$ (an $N \times N$ matrix):
$$ G_{X^*X}(z) := \frac{1}{X^* X - z}. $$
◦ By spectral calculus,
$$ \frac{1}{N} \operatorname{Tr} G_{X^*X}(z) = \frac{1}{N} \sum_{i=1}^N \frac{1}{\lambda_i(Q) - z}. $$
Local law at the edge:
$$ \Big| \frac{1}{N} \operatorname{Tr} G_{X^*X}(z) - \widetilde m(z) \Big| \le N^\varepsilon \Big( \frac{1}{q^2} + \frac{1}{N \operatorname{Im} z} \Big), $$
with high probability, for all $z = E + \mathrm{i}\eta$ with $E \in [L_+ - c, L_+ + 1]$, $N^{-1+\varepsilon'} \le \eta \le 1$.

  17. Linearization.
◦ Define an $(N+M) \times (N+M)$ matrix (the linearization of $Q$)
$$ H(z) := \begin{pmatrix} -z I_{N \times N} & X^* \\ X & -I_{M \times M} \end{pmatrix}, \qquad z \in \mathbb{C}^+. $$
◦ Introduce the "Green function" $G(z) := H(z)^{-1}$. By the Schur complement formula,
$$ G_{ij}(z) = \Big( \frac{1}{X^* X - zI} \Big)_{ij}, \qquad 1 \le i, j \le N, $$
$$ G_{\alpha\beta}(z) = z \Big( \frac{1}{X X^* - zI} \Big)_{\alpha\beta}, \qquad N+1 \le \alpha, \beta \le N+M. $$
◦ Weak local law (Ding–Yang 2018):
$$ | G_{ab}(z) - \Pi_{ab}(z) | \le N^\varepsilon \Bigg( \frac{1}{q} + \sqrt{\frac{\operatorname{Im} m_{\mathrm{MP}}(z)}{N \operatorname{Im} z}} + \frac{1}{N \operatorname{Im} z} \Bigg), \qquad 1 \le a, b \le N+M, $$
with high probability, for all $z = E + \mathrm{i}\eta$ with $E \in [L_+ - c, L_+ + 1]$, $N^{-1+\varepsilon'} \le \eta \le 1$, where
$$ \Pi(z) := \begin{pmatrix} m_{\mathrm{MP}}(z)\, I_{N \times N} & 0 \\ 0 & -(1 + m_{\mathrm{MP}}(z))^{-1} I_{M \times M} \end{pmatrix}. $$
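The Schur complement identities for the two diagonal blocks of $G(z)$ are exact linear algebra, so they can be verified directly. A sketch assuming NumPy, with small illustrative sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 60, 120
X = rng.standard_normal((M, N)) / np.sqrt(N)
z = 1.5 + 0.3j

# H(z) = [[-z I_N, X^*], [X, -I_M]], the (N+M) x (N+M) linearization.
H = np.block([[-z * np.eye(N), X.T],
              [X,              -np.eye(M)]])
G = np.linalg.inv(H)

# Schur complements: upper-left block (X^* X - z)^{-1},
# lower-right block z (X X^* - z)^{-1}.
G_upper = np.linalg.inv(X.T @ X - z * np.eye(N))
G_lower = z * np.linalg.inv(X @ X.T - z * np.eye(M))

print(np.allclose(G[:N, :N], G_upper), np.allclose(G[N:, N:], G_lower))
```

The point of the linearization is that $H(z)$ is linear in $X$, so resolvent expansions of $G$ directly probe both $X^*X$ and $XX^*$ at once.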

  18. Moment estimate. Define $P : \mathbb{C}^+ \to \mathbb{C}$ by
$$ P(m) := 1 + zm + \Big( zm + 1 - \frac{1}{d} \Big) m + \frac{s^{(4)}}{q^2} \Big( zm + 1 - \frac{1}{d} \Big)^2 m^2. $$
Then we had $P(\widetilde m(z)) = 0$. Hence we expect that
$$ \Big| P\Big( \frac{1}{N} \operatorname{Tr} \frac{1}{X^* X - z} \Big) \Big| \le N^\varepsilon \Psi(z), $$
with high probability, for some "small" control parameter $\Psi(z)$. In other words:
$$ \mathbb{E}\, \big| P(m(z)) \big|^{2D} \le N^{2D\varepsilon}\, \Psi(z)^{2D}, \qquad m(z) := \frac{1}{N} \operatorname{Tr} \frac{1}{X^* X - z}. $$
