Tracy–Widom limit for sample covariance matrices
Kevin Schnelli (KTH Royal Institute of Technology)
Joint work with Jong Yun Hwang (KAIST) and Ji Oon Lee (KAIST)
May 10, 2019

Sample covariance matrix

Consider a (mean-zero) multivariate Gaussian random variable:

    y = (y_1, y_2, . . . , y_M)^T ∈ R^M ,    y ∼ N(0, Σ) .

Σ = E yy^*, i.e., Σ_αβ = E y_α y_β : covariance matrix.

Sample covariance matrix:

    Σ̂ := (1/N) Σ_{i=1}^N y_i y_i^* ,    y_i : samples .

Traditional setup: M fixed, N ↗ ∞; then Σ̂ → Σ = E yy^*.
High-dimensional setup: N/M =: d_N → d ∈ (0, ∞) , N ↗ ∞ .
Wishart matrix: Σ = I

Let λ_1 ≥ λ_2 ≥ . . . ≥ λ_M denote the ordered eigenvalues of Σ̂. For any fixed interval J,

    | N_J − ∫_J ρ_MP(x) dx | → 0 ,    with N_J := (1/M) #{j : λ_j ∈ J} ,

almost surely as N → ∞, where ρ_MP is the Marchenko–Pastur law:

    ρ_MP(x) dx = { ν_d(x) dx + (1 − d) δ_0(x) dx ,  if d < 1 ,
                 { ν_d(x) dx ,                      if d ≥ 1 ,

where

    ν_d(x) = (d/2π) √((E_+ − x)(x − E_−)) / x · 1_[E_−, E_+](x)    and    E_± = (1 ± d^{−1/2})^2 .
Figure: Histogram of the eigenvalues of Σ̂; Σ = I, N = 2000, d = 2.
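The Marchenko–Pastur prediction is easy to check numerically. A minimal sketch (my own illustration, not from the talk; numpy only, parameters chosen to match the figure):

```python
import numpy as np

# Parameters matching the figure: d = N/M = 2, so M = N/2.
N, M = 2000, 1000
d = N / M

rng = np.random.default_rng(0)
# Gaussian data matrix with entries of variance 1/N, Sigma = I.
X = rng.standard_normal((M, N)) / np.sqrt(N)
eigs = np.linalg.eigvalsh(X @ X.T)  # eigenvalues of the sample covariance XX*

# Marchenko-Pastur edges and density (here d >= 1, so no atom at zero).
E_minus, E_plus = (1 - d**-0.5) ** 2, (1 + d**-0.5) ** 2

def nu_d(x):
    """MP density nu_d(x), supported on [E_minus, E_plus]."""
    inside = (x > E_minus) & (x < E_plus)
    out = np.zeros_like(x)
    out[inside] = d / (2 * np.pi) * np.sqrt(
        (E_plus - x[inside]) * (x[inside] - E_minus)) / x[inside]
    return out

# Compare the empirical spectral measure with the MP prediction on a few bins.
bins = np.linspace(E_minus, E_plus, 21)
hist, _ = np.histogram(eigs, bins=bins, density=True)
grid = 0.5 * (bins[1:] + bins[:-1])
max_err = np.max(np.abs(hist - nu_d(grid)))
print(f"max deviation from MP density: {max_err:.3f}")
```

With these sizes the empirical histogram already tracks ν_d closely away from the edges.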
Largest eigenvalue: Σ = I

Theorem. Let E_+ denote the upper endpoint of the support of ρ_MP. Then for any (small) ε > 0 and (large) D > 0,

    P( |λ_1 − E_+| ≥ N^ε/N^{2/3} ) ≤ N^{−D} ,

for N ≥ N_0(ε, D). Hence, we expect the fluctuations of the largest eigenvalue to be of order N^{−2/3}.
Largest eigenvalue: Σ = I

Theorem (Johnstone 2001, Johansson 2000). For real, respectively complex, Gaussians with Σ = I, there is a constant γ = γ(d) such that

    lim_{N→∞} P( γ N^{2/3} (λ_1 − E_+) ≤ s ) = F_β(s) ,    β = 1, 2 .

Tracy–Widom distributions:

    F_2(s) := exp( − ∫_s^∞ (x − s) q(x)^2 dx ) ,

    F_1(s) := exp( − (1/2) ∫_s^∞ q(x) dx ) (F_2(s))^{1/2} ,

where q solves the Painlevé II equation q′′ = sq + 2q^3, with q(s) ∼ Ai(s) as s → ∞.
Figure: Histogram of 2000 samples of γ N^{2/3} (λ_1 − E_+); N = 1000, d = 2, real Gaussians.
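The Tracy–Widom fluctuations can likewise be probed by Monte Carlo. A quick sketch (my own illustration; smaller N and fewer repetitions than in the figure, to keep it fast):

```python
import numpy as np

# Monte Carlo check of the Tracy-Widom fluctuations of lambda_1
# (real Gaussians, Sigma = I).
N, M = 200, 100            # d = N/M = 2
d = N / M
E_plus = (1 + d**-0.5) ** 2
gamma = 1.0 / (d**(1/6) * (d**0.25 + d**-0.25))

rng = np.random.default_rng(1)
samples = []
for _ in range(300):
    X = rng.standard_normal((M, N)) / np.sqrt(N)
    lam1 = np.linalg.eigvalsh(X @ X.T)[-1]       # largest eigenvalue of XX*
    samples.append(gamma * N**(2/3) * (lam1 - E_plus))
samples = np.asarray(samples)

print(f"mean {samples.mean():.2f}, std {samples.std():.2f}")
```

The TW_1 distribution has mean ≈ −1.21 and standard deviation ≈ 1.27; at this moderate N the sample statistics should reproduce these only roughly, with visible finite-size bias.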
Universality results

Replace the Gaussian distributions by general distributions.

Definition. Let X = (x_αi) be an M × N matrix whose entries are i.i.d. real random variables such that

    E x_αi = 0 ,    E |x_αi|^2 = N^{−1} ,    E |x_αi|^k ≤ C_k N^{−k/2} .

Assume that M = M(N) with N/M → d ∈ (0, ∞) as N → ∞. Sample covariance matrix: Σ̂ = XX^*.

Theorem (Pillai–Yin 2014). Let E_+ denote the upper endpoint of the support of ρ_MP. Then

    lim_{N→∞} P( γ N^{2/3} (λ_1 − E_+) ≤ s ) = F_1(s) ,

where γ = d^{−1/6} (d^{1/4} + d^{−1/4})^{−1}.
Sparse sample covariance matrices

Motivation: biadjacency matrix of a bipartite graph. Two vertex sets: V = {v_α}_{α=1}^M of size M and W = {w_i}_{i=1}^N of size N; edges only between V and W. Biadjacency matrix:

    B = (b_αi) ,    1 ≤ α ≤ M ,    1 ≤ i ≤ N ,

    b_αi = { 1 ,  if there is an edge between v_α and w_i ,
           { 0 ,  else .
Sparse sample covariance matrices

Biadjacency matrix of a bipartite Erdős–Rényi graph: B = (b_αi), where the (b_αi) are i.i.d. random variables satisfying

    b_αi = { 1 ,  with probability p ,
           { 0 ,  with probability 1 − p .

Remarks:
- E b_αi^k = p, k ≥ 1.
- We allow p to depend on N.
- We say that B is sparse if p ≪ 1.
Sparse sample covariance matrices

Center and normalize B = (b_αi):

    X_αi := (b_αi − p) / √(Np(1 − p)) .

Then

    E X_αi = 0 ,    E X_αi^2 = 1/N ,    E X_αi^k = (1/(N (Np)^{(k−2)/2})) (1 + O(p)) ,    (k ≥ 3) .

Introduce the sparsity parameter q by

    q^2 := pN ,    0 < q ≤ c N^{1/2} .

Hence

    E X_αi^k = (1/(N q^{k−2})) (1 + O(q^2/N)) ,    k ≥ 3 .
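Since X_αi takes only two values, the claimed moment scaling can be checked exactly; a sketch (my own check, numpy only):

```python
import numpy as np

# Exact moments of the centered, normalized Bernoulli entry
# X = (b - p)/sqrt(N p (1-p)) with b ~ Bernoulli(p), compared against the
# predicted scaling E X^k ~ 1/(N q^{k-2}) with q^2 = p N.
N = 10_000
p = 0.01                     # sparse regime: p << 1
q = np.sqrt(p * N)

# E (b - p)^k = (1-p)(-p)^k + p (1-p)^k  (two-point distribution).
def moment(k):
    return ((1 - p) * (-p) ** k + p * (1 - p) ** k) / (N * p * (1 - p)) ** (k / 2)

assert abs(moment(1)) < 1e-15                  # centered
assert abs(moment(2) - 1 / N) < 1e-12          # variance 1/N
for k in range(3, 7):
    pred = 1 / (N * q ** (k - 2))              # leading order, up to 1 + O(p)
    assert abs(moment(k) / pred - 1) < 0.1     # within the O(p) correction
print("moment scaling E X^k ≈ 1/(N q^{k-2}) verified for k = 3..6")
```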
Sparse sample covariance matrix

Definition. Let X = (X_αi) be an M × N matrix whose entries are real i.i.d. random variables such that

    E X_αi = 0 ,    E |X_αi|^2 = N^{−1} ,    E |X_αi|^k ≤ C_k / (N q^{k−2}) ,

for some q = N^φ, 0 < φ ≤ 1/2. Assume that M = M(N) with N/M → d ∈ (0, ∞) as N → ∞. Sample covariance matrix: Σ̂ := XX^*.

Remark: For any φ > 0, the eigenvalues of Σ̂ follow the Marchenko–Pastur law on the global scale.

Rescaled cumulants: Let κ^{(k)} be the k-th cumulant of X_αi:

    log E e^{t X_αi} = Σ_{k=1}^∞ κ^{(k)} t^k / k! ,    κ^{(1)} = 0 ,    κ^{(2)} = 1/N .

Set, for k ≥ 3,

    s^{(1)} := 0 ,    s^{(2)} := 1 ,    s^{(k)} := N q^{k−2} κ^{(k)} .
Estimates on largest eigenvalue

Theorem (Hwang–Lee–S. ’18). Consider the largest eigenvalue λ_1 of XX^*. Assume that q ≥ N^φ, φ > 0. Then there is L_+ ∈ R such that for any ε > 0 and D > 0,

    P( |λ_1 − L_+| > N^ε (1/q^4 + 1/N^{2/3}) ) < N^{−D} ,

for N ≥ N_0(ε, D), where

    L_+ := (1 + 1/√d)^2 + (1/√d) (1 + 1/√d)^2 s^{(4)}/q^2 + O(q^{−4}) .
Tracy–Widom limit

Theorem (Hwang–Lee–S. ’18). Assume that φ > 1/6 (so that q ≫ N^{1/6}). Let λ_1 be the largest eigenvalue of XX^*. Then

    lim_{N→∞} P( γ N^{2/3} (λ_1 − L_+) ≤ s ) = F_1(s)

for all s ∈ R, where γ^{−1} = d^{1/6} (d^{1/4} + d^{−1/4}).

Remark: For N^{1/6} ≪ q ≤ N^{1/3}, the spectral shift L_+ − E_+ ≃ 1/q^2 is much larger than the TW fluctuations of order N^{−2/3}.
Remark: If φ < 1/6, it is believed that the limiting distribution is Gaussian; cf. Huang–Landon–Yau ’17.
(Some) previous results

TW for the Wishart matrix (X Gaussian, Σ = I):
- Complex case: Johansson (2000); real case: Johnstone (2001)

Null case (Σ = I):
- Edge universality: Pillai–Yin (2014), Ding–Yang (2018)

(Phase transition for the) spiked Wishart matrix (X Gaussian, Σ a finite-rank perturbation of I):
- Complex case: Baik–Ben Arous–Péché (2005); real case: Bloemendal–Virág (2011)

Spiked sample covariance matrix (Σ a finite-rank perturbation of I):
- Edge universality: Bloemendal–Knowles–Yau–Yin (2014)

Non-null case (Σ ≠ I):
- TW for Gaussian: complex case: El Karoui (2007); real case: Lee–S. (2016), Fan–Johnstone (2017)
- Edge universality: Bao–Pan–Zhou (2015), Knowles–Yin (2017)
Tools: Stieltjes transform and Green function

- Given a probability measure µ on R, its Stieltjes transform is defined as

    m_µ(z) := ∫_R dµ(x)/(x − z) ,    z ∈ C^+ .

- For µ the Marchenko–Pastur law, we have

    m_MP(z) = [ −(z + 1 − 1/d) + √((z + 1 − 1/d)^2 − 4z) ] / (2z) .

- Hence, m_MP(z) is the solution to

    1 + z m_MP + (z m_MP + 1 − 1/d) m_MP = 0 ,

  with m_MP(z) ∈ C^+, for z ∈ C^+.

- In the sparse setup we make the ansatz

    1 + z m̃(z) + (z m̃(z) + 1 − 1/d) m̃(z) + (s^{(4)}/q^2) (z m̃(z) + 1 − 1/d)^2 m̃(z)^2 = 0 ,

  and pick the solution with m̃(z) ∈ C^+, z ∈ C^+.
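The closed form for m_MP and the self-consistent equation it satisfies can be cross-checked numerically; a small sketch (my own, not from the talk):

```python
import numpy as np

# Check that the closed form for m_MP solves the quadratic
# 1 + z m + (z m + 1 - 1/d) m = 0 and maps C+ to C+.
d = 2.0

def m_mp(z):
    """Stieltjes transform of the Marchenko-Pastur law (branch with Im m > 0)."""
    w = z + 1 - 1 / d
    root = np.sqrt(w * w - 4 * z)        # principal branch of the square root
    m = (-w + root) / (2 * z)
    # pick, entrywise, the solution lying in the upper half-plane
    return np.where(m.imag > 0, m, (-w - root) / (2 * z))

z = np.array([0.5 + 0.1j, 2.0 + 0.01j, 3.5 + 1.0j])
m = m_mp(z)
residual = 1 + z * m + (z * m + 1 - 1 / d) * m
assert np.all(np.abs(residual) < 1e-12)
assert np.all(m.imag > 0)
print("m_MP solves the self-consistent equation on C+")
```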
- Then there is a probability measure ρ̃ such that m̃(z) = ∫_R dρ̃(x)/(x − z), with support on [L_−, L_+], where

    L_± = (1 ± 1/√d)^2 ± (1/√d) (1 ± 1/√d)^2 s^{(4)}/q^2 + O(q^{−4}) .

  Moreover, we have ρ̃(x) ∼ √((L_+ − x)_+) for x near the upper edge.

- Green function/resolvent of Q := X^*X (an N × N matrix):

    G_{X^*X}(z) := (X^*X − z)^{−1} .

- By spectral calculus,

    (1/N) Tr G_{X^*X}(z) = (1/N) Σ_{i=1}^N 1/(λ_i(Q) − z) .
- Local law at the edge:

    | (1/N) Tr G_{X^*X}(z) − m̃(z) | ≤ N^ε ( 1/q^2 + 1/(N Im z) ) ,

  with high probability, for all z = E + iη with E ∈ [L_+ − c, L_+ + 1], N^{−1+ε′} ≤ η ≤ 1.
Linearization

- Define the (N + M) × (N + M) matrix (the linearization of Q)

    H(z) := ( −z I_{N×N}    X^*      )
            (  X            −I_{M×M} ) ,    z ∈ C^+ .

- Introduce the “Green function” G(z) := H(z)^{−1}. By the Schur complement formula,

    G_ij(z) = ( (X^*X − zI)^{−1} )_ij ,    1 ≤ i, j ≤ N ,

    G_αβ(z) = z ( (XX^* − zI)^{−1} )_αβ ,    N + 1 ≤ α, β ≤ N + M .

- Weak local law (Ding–Yang 2018):

    | G_ab(z) − Π_ab(z) | ≤ N^ε ( 1/q + √( Im m_MP(z) / (N Im z) ) + 1/(N Im z) ) ,    1 ≤ a, b ≤ N + M ,

  with high probability, for all z = E + iη with E ∈ [L_+ − c, L_+ + 1], N^{−1+ε′} ≤ η ≤ 1, where

    Π(z) := ( m_MP(z) I_{N×N}    0                           )
            ( 0                  −(1 + m_MP(z))^{−1} I_{M×M} ) .
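The two Schur-complement identities are easy to verify on a random instance; a small numerical sketch (my own):

```python
import numpy as np

# Numerical check of the Schur-complement identities for the linearization:
# the (1,1) block of H(z)^{-1} is (X*X - z)^{-1} and the (2,2) block is
# z (XX* - z)^{-1}.
rng = np.random.default_rng(2)
N, M = 8, 5
X = rng.standard_normal((M, N)) / np.sqrt(N)
z = 1.3 + 0.2j

H = np.block([[-z * np.eye(N), X.conj().T],
              [X, -np.eye(M)]])
G = np.linalg.inv(H)

G11 = G[:N, :N]
G22 = G[N:, N:]
assert np.allclose(G11, np.linalg.inv(X.conj().T @ X - z * np.eye(N)))
assert np.allclose(G22, z * np.linalg.inv(X @ X.conj().T - z * np.eye(M)))
print("Schur complement identities verified")
```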
Moment estimate

Define P : C^+ → C by

    P(m) := 1 + zm + (zm + 1 − 1/d) m + (s^{(4)}/q^2) m^2 (zm + 1 − 1/d)^2 .

Then we had P(m̃(z)) = 0. Hence we expect that

    | P(m(z)) | ≤ N^ε Ψ(z) ,    m(z) := (1/N) Tr (X^*X − z)^{−1} ,

with high probability, for some “small” control parameter Ψ(z). In other words,

    E[ P(m(z))^D P̄(m(z))^D ] = E |P(m(z))|^{2D} ≤ N^{2Dε} Ψ(z)^{2D} .
Recursive moment estimate I

We would hope for a recursive estimate of the form

    E[ |P(m(z))|^{2D} ] ≤ N^ε E[ ( 1/q^4 + Im m(z)/(Nη) ) |P(m(z))|^{2D−1} ] ,    η = Im z .

Then Hölder or Young would give

    E[ |P(m(z))|^{2D} ] ≤ C N^ε ( 1/q^4 + Im m(z)/(Nη) )^{2D} ,

and we would be in good shape to apply Markov’s inequality.
Recursive moment estimate II

We would hope that

    E[ |P(m(z))|^{2D} ] ≤ N^ε E[ ( 1/q^4 + Im m(z)/(N Im z) ) |P(m(z))|^{2D−1} ] .

Then Hölder or Young would give

    E[ |P(m(z))|^{2D} ] ≤ C N^ε ( 1/q^4 + Im m(z)/(N Im z) )^{2D} ,

and we would be in good shape.

Lemma.

    E |P(m)|^{2D} ≤ N^ε E[ ( 1/q^4 + Im m/(Nη) + ((N − M)/N)^2 ) |P(m)|^{2D−1} ]
        + N^{−ε/4} q^{−1} E[ |m − m̃|^2 |P(m)|^{2D−1} ]
        + N^ε q^{−8D}
        + N^ε q^{−1} Σ_{s=2}^{2D} Σ_{u′=0}^{s−2} E[ ( Im m/(Nη) + ((N − M)/N)^2 )^{2s−u′−2} |P′(m)|^{u′} |P(m)|^{2D−s} ]
        + N^ε Σ_{s=2}^{2D} E[ ( 1/(Nη) + (1/q) ( Im m/(Nη) + ((N − M)/N)^2 )^{1/2} + 1/q^2 )
              × ( Im m/(Nη) + ((N − M)/N)^2 )^{s−1} |P′(m)|^{s−1} |P(m)|^{2D−s} ]
        =: Φ(z) .
Recursive moment estimate III

Recall that

    P(m) = 1 + zm + (zm + 1 − 1/d) m + (s^{(4)}/q^2) (zm + 1 − 1/d)^2 m^2 .

Recall the definition of the “Green function”: H(z)G(z) = I; in components,

    1 + z G_ii(z) = Σ_{α=N+1}^{M+N} (X^*)_{iα} G_{αi}(z) ,    1 ≤ i ≤ N ,

and m(z) = (1/N) Σ_{i=1}^N G_ii(z). Hence

    E[ (1 + zm) P(m)^{D−1} P̄(m)^D ] = (1/N) Σ_{i,α} E[ X_αi G_αi P(m)^{D−1} P̄(m)^D ] .
Cumulant expansion

Stein-type lemma / cumulant expansion. Fix ℓ ∈ N and let F ∈ C^{ℓ+1}(R; C). Let Y be a centered random variable with finite moments to order ℓ + 2. Then

    E[Y F(Y)] = Σ_{r=1}^ℓ ( κ^{(r+1)}(Y)/r! ) E[F^{(r)}(Y)] + E[Ω_ℓ(Y F(Y))] ,

where κ^{(r+1)}(Y) denotes the (r + 1)-st cumulant of Y and F^{(r)} denotes the r-th derivative of the function F. The error Ω_ℓ is controlled in terms of F^{(ℓ+1)} and κ^{(ℓ+2)}.

Applied to our recursive moment estimate:

    E[ (1 + zm) P^{D−1} P̄^D ] = (1/N) Σ_{r=1}^ℓ (1/r!) ( s^{(r+1)}/(N q^{r−1}) ) E[ Σ_{i,α} ∂_αi^r ( G_αi P^{D−1} P̄^D ) ] + E[Ω_ℓ] ,

where ∂_αi^r = ∂^r/(∂X_αi)^r. We stop expanding at order ℓ = 8D.
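For polynomial F the cumulant expansion terminates and is exact, which gives a simple sanity check; a sketch for a centered Bernoulli variable (my own illustration):

```python
import numpy as np

# Exact check of E[Y F(Y)] = sum_r kappa^{(r+1)}/r! E[F^(r)(Y)] for a centered
# Bernoulli Y = b - p: the expansion terminates for polynomial F.
p = 0.3
# Cumulants of the centered Bernoulli distribution.
k2 = p * (1 - p)
k3 = p * (1 - p) * (1 - 2 * p)
k4 = p * (1 - p) * (1 - 6 * p * (1 - p))

# Two-point distribution: Y = 1-p with prob p, Y = -p with prob 1-p.
vals = np.array([1 - p, -p])
probs = np.array([p, 1 - p])
E = lambda f: np.sum(probs * f(vals))

# F(y) = y^3, so F' = 3y^2, F'' = 6y, F''' = 6.
lhs = E(lambda y: y * y**3)
rhs = k2 * E(lambda y: 3 * y**2) + k3 / 2 * E(lambda y: 6 * y) + k4 / 6 * 6
assert abs(lhs - rhs) < 1e-14
print("cumulant expansion exact for polynomial F")
```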
First order terms r = 1

Need to compute

    (1/N) (s^{(2)}/N) E[ Σ_{i,α} ∂_αi ( G_αi P^{D−1} P̄^D ) ] ,    where s^{(2)} = 1 .

Using ∂_αi G_αi = −G_ii G_αα − G_αi G_αi, we get

    ∂_αi ( G_αi P^{D−1} P̄^D ) = (−G_ii G_αα − G_αi G_αi) P^{D−1} P̄^D
        + (D − 1) P′(m) G_αi (1/N) Σ_j ∂_αi G_jj · P^{D−2} P̄^D
        + D P̄′(m) G_αi (1/N) Σ_j ∂_αi Ḡ_jj · P^{D−1} P̄^{D−1} .

Leading order one term:

    −E[ (1/N^2) Σ_{i,α} G_ii G_αα P^{D−1} P̄^D ] = −E[ m (1/N) Σ_α G_αα P^{D−1} P̄^D ] .

Since X^*X and XX^* share the same non-zero eigenvalues, we have

    (1/N) Σ_α G_αα = (1/N) Σ_α ( z (XX^* − z)^{−1} )_αα = (z/N) Σ_i ( (X^*X − z)^{−1} )_ii + (N − M)/N = zm + 1 − 1/d .
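The trace identity relating the two Green functions can be checked directly; a small numerical sketch (my own):

```python
import numpy as np

# Numerical check of the trace identity used in the r = 1 computation:
# (1/N) sum_alpha [z (XX* - z)^{-1}]_{alpha alpha} = z m(z) + (N - M)/N,
# where m(z) = (1/N) Tr (X*X - z)^{-1}.
rng = np.random.default_rng(3)
N, M = 10, 6
X = rng.standard_normal((M, N)) / np.sqrt(N)
z = 0.7 + 0.3j

m = np.trace(np.linalg.inv(X.T @ X - z * np.eye(N))) / N
lhs = z * np.trace(np.linalg.inv(X @ X.T - z * np.eye(M))) / N
assert abs(lhs - (z * m + (N - M) / N)) < 1e-12
print("trace identity verified")
```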
First order terms r = 1

Hence

    E[ (1 + zm) P(m)^{D−1} P̄(m)^D ] = (1/N^2) E[ Σ_{i,α} ∂_αi ( G_αi P^{D−1} P̄^D ) ] + h.o.t. (r ≥ 2) + O(Φ(z))
        = −E[ m (zm + 1 − 1/d) P^{D−1} P̄^D ] + h.o.t. (r ≥ 2) + O(Φ(z)) ,

and we get cancellation to leading order in the cumulant expansion.

Higher order terms in the cumulant expansion: the terms involving the third cumulant (r = 2) are negligible: the third moment does not matter.
r = 3 terms

    (1/3!) (s^{(4)}/(N q^2)) (1/N) Σ_{i,α} E[ ∂_αi^3 ( G_αi P^{D−1} P̄^D ) ]
        = (1/3!) (s^{(4)}/(N^2 q^2)) E[ Σ_{i,α} (∂_αi^3 G_αi) P^{D−1} P̄^D ] + O(Φ(z)) .

Need to compute

    (1/3!) ∂_αi^3 G_αi = −G_ii^2 G_αα^2 + terms involving off-diagonal Green function entries .

Leading term for r = 3:

    −(s^{(4)}/q^2) E[ (1/N) Σ_i G_ii^2 · (1/N) Σ_α G_αα^2 · P^{D−1} P̄^D ] .

Lemma.

    −(s^{(4)}/q^2) E[ (1/N^2) Σ_{i,α} G_ii^2 G_αα^2 P^{D−1} P̄^D ]
        = −(s^{(4)}/q^2) E[ m^2 ( (1/N) Σ_α G_αα )^2 P^{D−1} P̄^D ] + O(Φ)
        = −(s^{(4)}/q^2) E[ m^2 (zm + 1 − 1/d)^2 P^{D−1} P̄^D ] + O(Φ) .
Wrapping things up:

- We argued that E |P(m)|^{2D} ≤ N^ε Φ(z). Hence we get a high-probability estimate on |P(m)|.
- Studying the local stability of the equation P(m̃(z)) = 0 yields the local law.
- The recursive moment estimate combined with the local law gives an estimate on |λ_1(X^*X) − L_+|.
Ideas to prove the Tracy–Widom fluctuations

- The fluctuations of the largest eigenvalues can be extracted from the expectation of the Green function when the spectral parameter is close to the edge.
- Introduce a continuous interpolation between the given sparse sample covariance matrix and the Wishart covariance matrix: Dyson matrix flow/Ornstein–Uhlenbeck process,

    dX_αi = (1/√N) dB_αi − (1/2) X_αi dt    ⟹    q = q_t ,    L_+ = L_+(t) .
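The Ornstein–Uhlenbeck entry flow can be simulated directly; a minimal Euler–Maruyama sketch (my own; the parameters are illustrative) checking that the flow preserves the entry variance E X^2 = 1/N while interpolating toward Gaussian entries:

```python
import numpy as np

# Euler-Maruyama sketch of the Ornstein-Uhlenbeck entry flow
# dX = dB/sqrt(N) - X/2 dt, which keeps E X^2 = 1/N along the interpolation.
rng = np.random.default_rng(4)
N = 400
n_paths, n_steps, dt = 20_000, 200, 0.01

# Start from sparse (centered Bernoulli) entries with variance 1/N.
p = 0.02
X = (rng.binomial(1, p, size=n_paths) - p) / np.sqrt(N * p * (1 - p))
for _ in range(n_steps):
    X += rng.standard_normal(n_paths) * np.sqrt(dt / N) - 0.5 * X * dt

var_ratio = N * np.mean(X**2)    # should stay close to 1
print(f"N * E X^2 after the flow: {var_ratio:.3f}")
```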
- Follow the associated flow of the Green function and estimate its change over time:

    dE[ N Im m(L_+ + κ + iη_0) ] = Σ_{i,α,j} Im E[ −(1/2) (∂G_ii/∂X_αj) X_αj + (1/(2N)) ∂^2 G_ii/(∂X_αj)^2 ] dt .

- This change is offset by changing the spectral parameter, i.e., by a time-dependent spectral parameter:

    E[ Σ_{i,j} L̇_+ G_ij G_ji ] dt ,    with L̇_+ ≃ 2 (1/√d) (1 + 1/√d)^2 e^{−t} s^{(4)} q_t^{−2} .

- Non-trivial technical estimates are required to ensure that the expectations of certain