Spiked Eigenvalues of High Dimensional Separable Sample Covariance Matrices Guangming Pan, Nanyang Technological University, Singapore November 19, 2019 Guangming Pan, (USTC) Spiked Eigenvalues of High Dimensional Separable Sample Covariance Matrices November 19, 2019 1 / 75

Outline
1. Motivation
   - Misleading PCA on Simulated Data
2. High Dimensional Separable Covariance Model
3. Asymptotic Performance of Largest Eigenvalues
4. Inference on High Dimensional Time Series
   - Implementing Factor Analysis on Our Model
   - Unit Root Models Satisfying Assumption 4
   - A New Test for Unit Root against Factor Model
   - More Thoughts about Panel Data Structures
5. Simulations
   - The Simulation about Proposition 3
   - The Simulation about Proposition 4
6. Reference

The model
$$y_{it} = \ell_{i1} f_{1t} + \ell_{i2} f_{2t} + \varepsilon_{it} = \ell_i^{*} f_t + \varepsilon_{it}, \quad i = 1, 2, \ldots, n; \; t = 1, 2, \ldots, T, \tag{1}$$
where $f_t = (f_{1t}, f_{2t})^{*}$ are two common factors, $\ell_i = (\ell_{i1}, \ell_{i2})^{*}$ are the corresponding factor loadings, and $\varepsilon_{it}$ is the error component. The symbol "$*$" denotes the conventional conjugate transpose.

Scenario: No true common factors
In this case, the factor loadings are generated as $\ell_i = (0, 0)^{*}$. When the original data follow an AR(1) model with $\gamma = 0.2$, Figures 1 and 2 show all eigenvalues of the sample covariance matrix for $(T, n) = (20, 40)$ and $(T, n) = (40, 20)$, respectively. Neither graph shows a spiked eigenvalue, which correctly reflects the fact that there are no common factors in the original data.

Figure 1: $T = 20$, $n = 40$, $\gamma = 0.2$

Figure 2: $T = 40$, $n = 20$, $\gamma = 0.2$

Figure 3: $T = 20$, $n = 40$, $\gamma = 1$

Figure 4: $T = 40$, $n = 20$, $\gamma = 1$

Scenario: No true common factors
However, when the observations are nonstationary ($\gamma = 1$), Figures 3 and 4 show one spiked eigenvalue of the sample covariance matrix, even though the true number of common factors is 0. This example demonstrates that PCA can be misleading on high dimensional data with dependent sample observations.
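The contrast between the stationary and the unit-root case can be reproduced with a small simulation. The sketch below is not the authors' code: it assumes Gaussian innovations, paths started at zero, 200 replications, and uses the share of total variance carried by the largest eigenvalue as a rough (hypothetical) measure of "spikiness". The function names `ar1_panel` and `top_eigen_share` are illustrative only.

```python
import numpy as np

def ar1_panel(T, n, gamma, rng):
    """T x n panel with no common factors: each column is an independent
    AR(1) path e_t = gamma * e_{t-1} + u_t, started at zero, u_t ~ N(0, 1)."""
    u = rng.standard_normal((T, n))
    e = np.zeros((T, n))
    e[0] = u[0]
    for t in range(1, T):
        e[t] = gamma * e[t - 1] + u[t]
    return e

def top_eigen_share(Y):
    """Largest eigenvalue of the T x T matrix Y Y^T / n divided by its trace."""
    eig = np.linalg.eigvalsh(Y @ Y.T / Y.shape[1])  # ascending order
    return eig[-1] / eig.sum()

rng = np.random.default_rng(0)
T, n, reps = 20, 40, 200

# Average share of the top eigenvalue over independent replications.
share_stationary = np.mean(
    [top_eigen_share(ar1_panel(T, n, 0.2, rng)) for _ in range(reps)]
)
share_unit_root = np.mean(
    [top_eigen_share(ar1_panel(T, n, 1.0, rng)) for _ in range(reps)]
)
print(f"gamma = 0.2: {share_stationary:.2f}   gamma = 1: {share_unit_root:.2f}")
```

Under these assumptions the top eigenvalue should dominate the spectrum when $\gamma = 1$ and not when $\gamma = 0.2$, although both panels are generated with zero factor loadings.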

High Dimensional Separable Covariance Model
Consider an $n$-dimensional random vector $y$ with observations $y_1, y_2, \ldots, y_T$. Pool all observations together into a $T \times n$ matrix $Y = (y_1, y_2, \ldots, y_T)^{*}$. The data matrix $Y$ has the structure
$$Y = \Gamma X \Omega^{1/2}, \tag{2}$$
where $X = (x_1, \ldots, x_n) = (x_{ij})_{(T+L) \times n}$ is a $(T+L) \times n$ random matrix with i.i.d. elements; $\Sigma = \Gamma \Gamma^{*}$ and $\Omega$ are $T \times T$ and $n \times n$ deterministic non-negative definite matrices, respectively. Here $\Gamma$ is a $T \times (T+L)$ deterministic matrix.

Separable covariance matrix
The matrix $\Gamma$ describes the dependence among sample observations, while the matrix $\Omega$ measures the cross-sectional dependence of $y$. Under this setting, the sample covariance matrix of $y$ can be expressed as $\Gamma X \Omega X^{*} \Gamma^{*}$. It is also called a separable covariance matrix.
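As a quick numerical check of this expression, the sketch below builds $Y = \Gamma X \Omega^{1/2}$ and confirms that $Y Y^{*}$ equals $\Gamma X \Omega X^{*} \Gamma^{*}$. The particular $\Gamma$ and $\Omega$ are arbitrary random draws used only for illustration, the entries of $X$ are assumed Gaussian with variance $1/n$, and the symmetric square root of $\Omega$ is one possible choice of $\Omega^{1/2}$.

```python
import numpy as np

rng = np.random.default_rng(1)
T, L, n = 6, 2, 8

# Hypothetical deterministic ingredients, drawn at random for illustration.
Gamma = rng.standard_normal((T, T + L))   # T x (T+L): dependence across time
B = rng.standard_normal((n, n))
Omega = B @ B.T                           # n x n, non-negative definite

# Symmetric square root of Omega from its eigendecomposition.
w, Q = np.linalg.eigh(Omega)
Omega_half = Q @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ Q.T

# i.i.d. entries with mean 0 and variance 1/n (Gaussian is an assumption).
X = rng.standard_normal((T + L, n)) / np.sqrt(n)

Y = Gamma @ X @ Omega_half                    # data matrix, T x n
S = Y @ Y.T                                   # T x T sample covariance
S_sep = Gamma @ X @ Omega @ X.T @ Gamma.T     # separable form

print(np.allclose(S, S_sep))
```

The two matrices agree because $\Omega^{1/2} \Omega^{1/2} = \Omega$ for a symmetric square root.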

Largest spiked eigenvalues
We are interested in the largest spiked eigenvalues of the matrix $\Omega$, which describes the cross-sectional dependence. In the classical PCA procedure, the spiked empirical eigenvalues of the sample covariance matrix $\Gamma X \Omega X^{*} \Gamma^{*}$ are used to approximate those of $\Omega$. In this paper, we investigate the spiked empirical eigenvalues from a different angle: how do the spiked eigenvalues of the matrix $\Sigma$ (due to the dependent sample) affect the spiked sample eigenvalues? To this end, we do not impose any spiked structure on the matrix $\Omega$.

Spikiness of the matrix $\Sigma$
We assume spikiness of the matrix $\Sigma$ through the following decomposition. Let the spectral decomposition of $\Gamma$ be $V \Lambda^{1/2} U$, where $V$ and $U$ are $T \times T$ and $T \times (T+L)$ orthogonal matrices respectively ($V V^{*} = U U^{*} = I$), and $\Lambda$ is a diagonal matrix formed from the eigenvalues of $\Sigma = \Gamma \Gamma^{*}$ in descending order. Moreover, we write
$$\Lambda = \begin{pmatrix} \Lambda_S & 0 \\ 0 & \Lambda_P \end{pmatrix},$$
where $\Lambda_S = \mathrm{diag}(\mu_1, \ldots, \mu_K)$, $\Lambda_P = \mathrm{diag}(\mu_{K+1}, \ldots, \mu_T)$, and $\mu_1, \ldots, \mu_K$ are referred to as the spiked eigenvalues, which are significantly bigger than the rest. In addition, we write
$$U = \begin{pmatrix} U_1 \\ U_2 \end{pmatrix} \quad \text{and} \quad \Sigma_2 = U_2^{*} \Lambda_P U_2.$$
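The decomposition can be computed with a singular value decomposition. The sketch below is an illustration under stated assumptions: it takes $\Gamma$ to be the lower-triangular matrix of ones (a random-walk filter with $L = 0$), and it sets $K = 1$ by hand; neither choice is taken from the slides.

```python
import numpy as np

T, K = 50, 1                       # K (number of spikes) is chosen by hand here
Gamma = np.tril(np.ones((T, T)))   # assumed example: random-walk filter, L = 0

# Gamma = V diag(s) U with V V^T = I and U U^T = I; s is in descending order.
V, s, U = np.linalg.svd(Gamma, full_matrices=False)
mu = s**2                          # eigenvalues of Sigma = Gamma Gamma^T

Lambda_S = np.diag(mu[:K])         # spiked block
Lambda_P = np.diag(mu[K:])         # remaining block
U1, U2 = U[:K], U[K:]              # row blocks of U
Sigma2 = U2.T @ Lambda_P @ U2      # Sigma_2 = U_2^* Lambda_P U_2

# Share of the total spectrum carried by the three largest eigenvalues.
print(mu[:3] / mu.sum())
```

With this notation $\Gamma^{*} \Gamma = U_1^{*} \Lambda_S U_1 + \Sigma_2$, so $\Sigma_2$ collects the part of the temporal dependence that is left once the spiked directions are removed.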

Asymptotic Performance of Largest Eigenvalues
This section establishes the asymptotic distribution of the largest spiked empirical eigenvalues. First, we make the following assumptions.
Assumption 1 (Moment Conditions): $\{x_{ij} : i = 1, \ldots, T+L, \; j = 1, \ldots, n\}$ are i.i.d. random variables such that $E x_{ij} = 0$, $E|\sqrt{n}\, x_{ij}|^{2} = 1$ and $E|\sqrt{n}\, x_{ij}|^{4} = \gamma_4 < \infty$.

Assumption 2 (Dependent Sample Structure)
$$\alpha_L = \mu_K = \cdots = \mu_{K - n_L} < \alpha_{L-1} = \mu_{K - n_L + 1} \cdots < \alpha_1 = \mu_{n_1} = \cdots = \mu_1,$$
where $n_1, \ldots, n_L$ are finite. Moreover, there exists a small constant $c > 0$ such that $\alpha_{i-1} - \alpha_i \ge c\,\alpha_i$ for $i = 1, 2, \ldots, L$ and $\mu_K - \mu_{K+1} \ge c\,\mu_K$.

Assumption 3 (Cross-sectional Structure)
The matrix $\Omega$ is non-negative definite and its effective rank satisfies
$$r^{*}(\Omega) = \frac{\mathrm{tr}(\Omega)}{\|\Omega\|_2} \to \infty,$$
where $\|\Omega\|_2$ denotes the spectral norm.
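To give a feel for the effective rank, the sketch below evaluates $r^{*}(\Omega)$ for two illustrative matrices that are not taken from the slides: the identity, whose effective rank equals $n$, and the identity plus a rank-one term of size $n$, whose effective rank stays close to 2 however large $n$ is. The helper name `effective_rank` is hypothetical.

```python
import numpy as np

def effective_rank(Omega):
    """tr(Omega) / ||Omega||_2 for a symmetric non-negative definite Omega.
    For such matrices the spectral norm equals the largest eigenvalue."""
    eig = np.linalg.eigvalsh(Omega)   # ascending order
    return eig.sum() / eig[-1]

n = 200
Omega_flat = np.eye(n)                       # no dominant direction
Omega_spiked = np.eye(n) + np.ones((n, n))   # one eigenvalue equal to n + 1

print(effective_rank(Omega_flat))
print(effective_rank(Omega_spiked))
```

In the second example the trace is $2n$ and the largest eigenvalue is $n + 1$, so the ratio is $2n/(n+1)$ and does not diverge; a cross-sectional structure of that kind would not satisfy the assumption.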

Assumption 4 (Spiked Dependent Sample Structure)
The spiked eigenvalues of the population covariance matrix are much bigger than the rest of the eigenvalues. Precisely, for any $\varepsilon > 0$ there is $K_\varepsilon$, independent of $n$ and $T$, such that when $n$ and $T$ are big enough,
$$\frac{\sum_{i = K_\varepsilon}^{T} \mu_i}{\mu_K} < \varepsilon^{2}. \tag{3}$$
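The ratio in (3) is easy to evaluate numerically for a given $\Gamma$. The sketch below does so for the random-walk filter (lower-triangular matrix of ones), a unit-root example of the kind the outline mentions but whose details are not part of this excerpt. The choices $K = 1$, $K_\varepsilon = 30$ and $\varepsilon = 0.1$ are assumptions made for illustration, and the helper names are hypothetical.

```python
import numpy as np

def tail_ratio(mu, K, K_eps):
    """(sum_{i = K_eps}^{T} mu_i) / mu_K for eigenvalues mu sorted in
    descending order; K and K_eps are 1-based indices as in (3)."""
    return mu[K_eps - 1:].sum() / mu[K - 1]

def unit_root_eigenvalues(T):
    """Eigenvalues of Sigma = Gamma Gamma^T, descending, for the assumed
    random-walk filter Gamma (lower-triangular matrix of ones)."""
    Gamma = np.tril(np.ones((T, T)))
    return np.linalg.eigvalsh(Gamma @ Gamma.T)[::-1]

eps = 0.1
ratios = {
    T: tail_ratio(unit_root_eigenvalues(T), K=1, K_eps=30)
    for T in (100, 200, 400)
}
for T, r in ratios.items():
    print(T, round(r, 4), r < eps**2)
```

If the same $K_\varepsilon$ keeps the ratio below $\varepsilon^{2}$ as $T$ grows, that is consistent with (3) for this example, though a numerical check over a few values of $T$ is of course not a proof.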
