SLIDE 1

Log Covariance Matrix Estimation

Xinwei Deng, Department of Statistics, University of Wisconsin-Madison. Joint work with Kam-Wah Tsui (Univ. of Wisconsin-Madison)

1

SLIDE 2

Outline

  • Background and Motivation
  • The Proposed Log-ME Method
  • Simulation and Real Example
  • Summary and Discussion

2

SLIDE 3

Background

  • Covariance matrix estimation is important in multivariate analysis and many statistical applications.
  • Suppose x1,...,xn are i.i.d. p-dimensional random vectors ∼ N(0,Σ). Let S = (1/n)∑_{i=1}^n xi xi′ be the sample covariance matrix. The negative log-likelihood function is proportional to
Ln(Σ) = −log|Σ−1| + tr[Σ−1S]. (1)
  • Recent interest focuses on cases where p is large or p ≈ n; there S is not a stable estimate.
– The largest eigenvalues of S overestimate the true eigenvalues.
– When p > n, S is singular and its smallest eigenvalue is zero. How to estimate Σ−1?
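As a concrete illustration of (1), a minimal numpy sketch (variable and function names are my own, not from the slides) computing S and the negative log-likelihood:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.multivariate_normal(np.zeros(p), np.eye(p), size=n)  # rows x_i ~ N(0, I)

# Sample covariance S = (1/n) * sum_i x_i x_i'
S = X.T @ X / n

def neg_loglik(Sigma, S):
    """L_n(Sigma) = -log|Sigma^{-1}| + tr(Sigma^{-1} S), up to a constant."""
    Sigma_inv = np.linalg.inv(Sigma)
    sign, logdet = np.linalg.slogdet(Sigma_inv)
    return -logdet + np.trace(Sigma_inv @ S)

# Sanity check: at Sigma = I the log-determinant term vanishes, so L_n(I) = tr(S)
print(np.isclose(neg_loglik(np.eye(p), S), np.trace(S)))  # True
```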

3

SLIDE 4

Recent Estimation Methods on Σ or Σ−1

  • Reduce the number of nonzero estimates of Σ or Σ−1.
– Σ: Bickel and Levina (2008), using thresholding.
– Σ−1: Yuan and Lin (2007), l1 penalty on Σ−1; Friedman et al. (2008), Graphical Lasso; Meinshausen and Buhlmann (2006), reformulated as regression.
  • Shrinkage estimates of the covariance matrix.
– Ledoit and Wolf (2006), ρΣ + (1−ρ)µI.
– Won et al. (2009), control the condition number (largest eigenvalue/smallest eigenvalue).

4

SLIDE 5

Motivation

  • An estimate of Σ or Σ−1 needs to be positive definite.
– This mathematical restriction makes the covariance matrix estimation problem challenging.
  • Any positive definite Σ can be expressed as the matrix exponential of a real symmetric matrix A:
Σ = exp(A) = I + A + A²/2! + ···
– Expressing the likelihood function in terms of A ≡ log(Σ) removes the mathematical restriction.
  • Consider the spectral decomposition Σ = TDT′ with D = diag(d1,...,dp). Then A = TMT′ with M = diag(log(d1),...,log(dp)).
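The spectral relation above can be checked numerically; a small sketch (scipy's logm is used only for comparison; the construction itself needs just an eigendecomposition):

```python
import numpy as np
from scipy.linalg import expm, logm

rng = np.random.default_rng(1)
G = rng.standard_normal((4, 4))
Sigma = G @ G.T + 4 * np.eye(4)          # positive definite

d, T = np.linalg.eigh(Sigma)             # Sigma = T diag(d) T'
A = T @ np.diag(np.log(d)) @ T.T         # A = T diag(log d) T' = log(Sigma)

print(np.allclose(A, logm(Sigma)))       # True
print(np.allclose(expm(A), Sigma))       # True: Sigma = exp(A)
```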

5

SLIDE 6

Idea of the Proposed Method

  • Leonard and Hsu (1992) used this log-transformation method to estimate Σ, approximating the likelihood via the Volterra integral equation.
– Their approximation is based on S being nonsingular ⇒ not applicable when p ≥ n.
  • We extend the likelihood approximation to the case of singular S.
  • We regularize the largest and smallest eigenvalues of Σ simultaneously.
  • An efficient iterative quadratic programming algorithm estimates A (= log Σ).
  • We call the resulting estimate “Log-ME”: the Logarithm-transformed Matrix Estimate.

6

SLIDE 7

A Simple Example

  • Experiment: simulate xi's from N(0, I), i = 1,...,n, where n = 50.
  • For each p varying from 5 to 100, consider the largest and smallest eigenvalues of the covariance matrix estimate.
  • For each p, repeat the experiment 100 times and compute the average of the largest eigenvalues and the average of the smallest eigenvalues for:
– The sample covariance matrix.
– The Log-ME covariance matrix estimate.
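The experiment above, for the sample covariance matrix alone, can be sketched as follows (the Log-ME column requires the full method, so only S is shown; names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 50, 100

for p in (5, 50, 100):
    largest, smallest = [], []
    for _ in range(reps):
        X = rng.standard_normal((n, p))      # x_i ~ N(0, I)
        S = X.T @ X / n
        ev = np.linalg.eigvalsh(S)           # ascending eigenvalues
        largest.append(ev[-1]); smallest.append(ev[0])
    # True eigenvalues are all 1; the extremes of S drift away as p grows
    print(p, round(np.mean(largest), 2), round(np.mean(smallest), 2))
```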

7

SLIDE 8

[Figure: The averages of the largest and smallest eigenvalues of covariance matrix estimates over the dimension p, for the sample covariance matrix and the Log-ME estimate. The true eigenvalues are all equal to 1.]

8

SLIDE 9

The Transformed Log-Likelihood

  • In terms of the covariance matrix logarithm A, the negative log-likelihood function in (1) becomes
Ln(A) = tr(A) + tr[exp(−A)S]. (2)
  • The problem of estimating a positive definite matrix Σ now becomes a problem of estimating a real symmetric matrix A.
  • Because of the matrix exponential term exp(−A)S, estimating A by directly minimizing Ln(A) is nontrivial.
  • Our approach: approximate exp(−A)S using the Volterra integral equation (valid even when S is singular).
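That (2) agrees with (1) under Σ = exp(A) can be verified numerically; a quick sketch (the diagonal stand-in for S is my own choice):

```python
import numpy as np
from scipy.linalg import expm, logm

rng = np.random.default_rng(3)
G = rng.standard_normal((4, 4))
Sigma = G @ G.T + np.eye(4)                # positive definite Sigma
S = np.diag(rng.uniform(0.5, 2.0, 4))      # any p x p "sample covariance" stand-in

A = logm(Sigma)                            # A = log(Sigma)
Sigma_inv = np.linalg.inv(Sigma)
lhs = -np.linalg.slogdet(Sigma_inv)[1] + np.trace(Sigma_inv @ S)   # (1)
rhs = np.trace(A) + np.trace(expm(-A) @ S)                         # (2)
print(np.isclose(lhs, rhs))                # True
```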

9

SLIDE 10

The Volterra Integral Equation

  • The Volterra integral equation (Bellman, 1970, page 175) is
exp(At) = exp(A0t) + ∫_0^t exp(A0(t−s)) (A−A0) exp(As) ds. (3)
  • Repeatedly applying (3) leads to
exp(At) = exp(A0t) + ∫_0^t exp(A0(t−s)) (A−A0) exp(A0s) ds
+ ∫_0^t ∫_0^s exp(A0(t−s)) (A−A0) exp(A0(s−u)) (A−A0) exp(A0u) du ds
+ cubic and higher order terms, (4)
where A0 = log(Σ0) and Σ0 is an initial estimate of Σ.
  • The expansion of exp(−A) is obtained by letting t = 1 in (4) and replacing A, A0 in (4) with −A, −A0.
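The first-order part of expansion (4) at t = 1 can be checked against scipy's expm; a sketch with a small perturbation A = A0 + E (the midpoint-rule quadrature is my own choice):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(2)
p = 3
A0 = rng.standard_normal((p, p)); A0 = (A0 + A0.T) / 2   # symmetric base point
E = 1e-2 * rng.standard_normal((p, p)); E = (E + E.T) / 2
A = A0 + E

m = 400
s = (np.arange(m) + 0.5) / m          # midpoint rule on [0, 1]
ds = 1.0 / m

# First-order correction of (4) at t = 1: int_0^1 exp(A0(1-s)) E exp(A0 s) ds
first = sum(expm(A0 * (1 - si)) @ E @ expm(A0 * si) for si in s) * ds

approx = expm(A0) + first
err1 = np.linalg.norm(expm(A) - approx)   # error with the correction, O(||E||^2)
err0 = np.linalg.norm(expm(A) - expm(A0))  # error without it, O(||E||)
print(err1 < err0)                         # True
```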

10

SLIDE 11

Approximation to the Log-Likelihood

  • The term tr[exp(−A)S] can be written as
tr[exp(−A)S] = tr(SΣ0−1) − ∫_0^1 tr[(A−A0) Σ0^(−s) S Σ0^(s−1)] ds
+ ∫_0^1 ∫_0^s tr[(A−A0) Σ0^(u−s) (A−A0) Σ0^(−u) S Σ0^(s−1)] du ds
+ cubic and higher order terms. (5)
  • Leaving out the higher order terms in (5), we approximate Ln(A) by ln(A):
ln(A) = tr(A) + tr(SΣ0−1) − ∫_0^1 tr[(A−A0) Σ0^(−s) S Σ0^(s−1)] ds
+ ∫_0^1 ∫_0^s tr[(A−A0) Σ0^(u−s) (A−A0) Σ0^(−u) S Σ0^(s−1)] du ds. (6)

11

SLIDE 12

Explicit Form of ln(A)

  • The integrals in ln(A) can be solved analytically through the spectral decomposition of Σ0 = T0 D0 T0′.
  • Some notation:
– D0 = diag(d1(0),...,dp(0)), with the di(0) the eigenvalues of Σ0.
– T0 = (t1(0),...,tp(0)), with ti(0) the eigenvector corresponding to di(0).
– Let B = T0′(A−A0)T0 = (bij)p×p, and S̃ = T0′ S T0 = (s̃ij)p×p.
  • Then ln(A) can be written as a function of the bij:
ln(A) = ∑_{i=1}^p (1/2) ξii bii² + ∑_{i<j} ξij bij² + 2 ∑_{i=1}^p ∑_{j≠i} τij bii bij + ∑_{k=1}^p ∑_{i<j, i≠k, j≠k} ηkij bik bkj − ∑_{i=1}^p βii bii + 2 ∑_{i<j} βij bij, (7)
up to a constant. Getting B ↔ getting A.

12

SLIDE 13

Some Details

  • For the linear term,
βii = s̃ii/di(0) − 1,
βij = s̃ij (di(0) − dj(0)) / (di(0) dj(0)) / (log di(0) − log dj(0)).
  • For the quadratic term,
ξii = s̃ii/di(0),
ξij = (s̃ii/di(0) − s̃jj/dj(0)) / (log dj(0) − log di(0))
+ ((di(0)/dj(0) − 1) s̃ii/di(0) + (dj(0)/di(0) − 1) s̃jj/dj(0)) / (log dj(0) − log di(0))²,
τij = [ (1/dj(0) − 1/di(0)) / (log dj(0) − log di(0))² + (1/di(0)) / (log dj(0) − log di(0)) ] s̃ij,
ηkij = [ (1/di(0) − 1/dj(0)) / (log(dk(0)/dj(0)) log(dj(0)/di(0)))
+ (1/dj(0) − 1/di(0)) / (log(dk(0)/di(0)) log(di(0)/dj(0)))
+ (2/dk(0) − 1/di(0) − 1/dj(0)) / (log(dk(0)/di(0)) log(dk(0)/dj(0))) ] s̃ij.

13

SLIDE 14

The Log-ME Method

  • We propose a regularized method to estimate Σ using the approximate log-likelihood function ln(A).
  • Consider the penalty function ‖A‖F² = tr(A²) = ∑_{i=1}^p (log(di))², where the di are the eigenvalues of the covariance matrix Σ.
– If di goes to zero or diverges to infinity, (log(di))² goes to infinity in both cases.
– Such a penalty function can therefore regularize the largest and smallest eigenvalues of the covariance matrix estimate simultaneously.
  • Estimate Σ, or equivalently A, by minimizing
ln,λ(B) ≡ ln,λ(A) = ln(A) + λ tr(A²), (8)
where λ is a tuning parameter.
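The identity tr(A²) = ∑(log di)² and its two-sided behavior can be verified directly; a minimal sketch using diagonal matrices:

```python
import numpy as np
from scipy.linalg import logm

for eigvals in ([1.0, 1.0], [100.0, 1.0], [1e-4, 1.0]):
    Sigma = np.diag(eigvals)
    A = logm(Sigma)                      # log(Sigma) has eigenvalues log(d_i)
    pen = np.trace(A @ A)                # tr(A^2) = sum of (log d_i)^2
    print(eigvals, np.isclose(pen, np.sum(np.log(eigvals) ** 2)), round(pen, 3))
# The penalty is 0 at Sigma = I and grows when any eigenvalue is
# very large or very small: both extremes are penalized.
```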

14

SLIDE 15

An Iterative Algorithm

  • The ln,λ(B) depends on an initial estimate Σ0, or equivalently A0.
  • We iteratively minimize ln,λ(B) to obtain its minimizer B̂:
Algorithm:
Step 1: Set an initial covariance matrix estimate Σ0, a positive definite matrix.
Step 2: Compute the spectral decomposition Σ0 = T0 D0 T0′, and set A0 = log(Σ0).
Step 3: Compute B̂ by minimizing ln,λ in (10). Then obtain Â = T0 B̂ T0′ + A0, and update the estimate of Σ by Σ̂ = exp(Â) = exp(T0 B̂ T0′ + A0).
Step 4: Stop if ‖Σ̂ − Σ0‖F² is less than a pre-specified positive tolerance value. Otherwise, set Σ0 = Σ̂ and go back to Step 2.
  • The initial Σ0 in Step 1 is set to S + εI.
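A hedged toy version of the idea: instead of the quadratic-programming inner step described above, this sketch minimizes the exact penalized objective Ln(A) + λ tr(A²) over symmetric A with a generic optimizer, which is feasible only for tiny p but illustrates what the iterations converge to. All names and the optimizer choice are mine, not the authors'.

```python
import numpy as np
from scipy.linalg import expm
from scipy.optimize import minimize

def log_me_toy(S, lam, p):
    """Minimize tr(A) + tr(exp(-A) S) + lam * tr(A^2) over symmetric A."""
    iu = np.triu_indices(p)

    def unpack(theta):                      # free parameters -> symmetric A
        A = np.zeros((p, p)); A[iu] = theta
        return A + A.T - np.diag(np.diag(A))

    def obj(theta):
        A = unpack(theta)
        return np.trace(A) + np.trace(expm(-A) @ S) + lam * np.trace(A @ A)

    res = minimize(obj, np.zeros(len(iu[0])), method="Nelder-Mead",
                   options={"maxiter": 5000, "xatol": 1e-8, "fatol": 1e-8})
    return expm(unpack(res.x))              # Sigma_hat = exp(A_hat)

rng = np.random.default_rng(0)
n, p = 20, 3
X = rng.standard_normal((n, p))
S = X.T @ X / n                              # possibly poorly conditioned
Sigma_hat = log_me_toy(S, lam=0.5, p=p)

d = np.linalg.eigvalsh(Sigma_hat)
print(d[0] > 0)                              # True: exp(A_hat) is positive definite
print(np.linalg.cond(S), d[-1] / d[0])       # the penalty typically shrinks the condition number
```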

15

SLIDE 16

Simulation Study

  • Six different covariance models Σ = (σij)p×p are used for comparison; among them:
– Model 1: homogeneous model with Σ = I.
– Model 2: MA(1) model with σii = 1, σi,i−1 = σi−1,i = 0.45.
– Model 3: circle model with σii = 1, σi,i−1 = σi−1,i = 0.3, σ1,p = σp,1 = 0.3.
  • We compare four estimation methods: the banding estimate (Bickel and Levina, 2008), the LW estimate (Ledoit and Wolf, 2006), the Glasso estimate (Yuan and Lin, 2007), and the CN estimate (Won et al., 2009).
  • Two loss functions evaluate the performance of each method:
KL = −log|Σ̂−1| + tr(Σ̂−1Σ) − (−log|Σ−1| + p),
∆1 = |d̂1/d̂p − d1/dp|,
where d1 and dp are the largest and smallest eigenvalues of Σ, and d̂1 and d̂p their estimates.
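The two losses above can be written down directly; a sketch (function names are mine):

```python
import numpy as np

def kl_loss(Sigma_hat, Sigma):
    """KL = -log|Sigma_hat^{-1}| + tr(Sigma_hat^{-1} Sigma) - (-log|Sigma^{-1}| + p),
    which simplifies to tr(M) - log|M| - p with M = Sigma_hat^{-1} Sigma."""
    p = Sigma.shape[0]
    M = np.linalg.solve(Sigma_hat, Sigma)
    sign, logdet = np.linalg.slogdet(M)
    return np.trace(M) - logdet - p

def delta1_loss(Sigma_hat, Sigma):
    """Delta_1: difference of condition numbers |d1_hat/dp_hat - d1/dp|."""
    eh, e = np.linalg.eigvalsh(Sigma_hat), np.linalg.eigvalsh(Sigma)
    return abs(eh[-1] / eh[0] - e[-1] / e[0])

Sigma = np.eye(3)
print(kl_loss(Sigma, Sigma), delta1_loss(Sigma, Sigma))  # 0.0 0.0
```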

16

SLIDE 17

Simulation Results

Averages and standard errors from 100 runs in the case of n = 50, p = 50.

         Log-ME          Banding             LW              Glasso          CN
Model    KL      ∆1      KL        ∆1        KL      ∆1      KL      ∆1      KL      ∆1
1        0.08    0.22    1.31      1.74      0.10    0.18    2.11    1.19    0.22    0.09
         (0.00)  (0.00)  (0.04)    (0.52)    (0.01)  (0.02)  (0.02)  (0.02)  (0.02)  (0.01)
2        12.75   15.19   912.02*   343.60    13.11   15.73   14.67   15.67   13.68   16.62
         (0.02)  (0.05)  (882.90)  (152.82)  (0.02)  (0.04)  (0.03)  (0.03)  (0.02)  (0.02)
3        4.85    1.56    3.72      5.62      4.70    2.10    7.27    1.82    4.88    2.71
         (0.01)  (0.01)  (0.13)    (0.39)    (0.01)  (0.03)  (0.02)  (0.02)  (0.01)  (0.02)

Note: The value marked with * is affected by the matrix singularity.

17

SLIDE 18

Portfolio Optimization of Stock Data

  • We apply the Log-ME method in an application of portfolio optimization.
  • In mean-variance optimization, the risk of a portfolio w = (w1,...,wp) is measured by the standard deviation √(wᵀΣw), where wi ≥ 0 and ∑_{i=1}^p wi = 1.
  • The estimated minimum variance portfolio optimization problem is
min_w wᵀΣ̂w s.t. ∑_{i=1}^p wi = 1, (9)
where Σ̂ is an estimate of the true covariance matrix Σ.
  • An accurate covariance matrix estimate Σ̂ can lead to a better portfolio strategy.
18

SLIDE 19

The Setting-up

  • Consider the weekly returns of the p = 30 components of the Dow Jones Industrial Index from January 8th, 2007 to June 28th, 2010.
  • Use the first n = 50 observations as the training set, the next 50 observations as the validation set, and the remaining 83 observations as the test set.
  • Let Xts be the test set and Sts be the sample covariance matrix of Xts. The performance of a portfolio w is measured by the realized return R(w) = ∑_{x∈Xts} wᵀx and the realized risk σ(w) = √(wᵀSts w).
  • The optimal portfolio w̃ is computed with Σ̂ estimated by the Log-ME method, the CN method (Won et al., 2009), and the sample covariance matrix S, separately.

19

SLIDE 20

The Comparison Results

Table 1. The comparison of the realized return and the realized risk.

                       Log-ME   CN      S
Realized return R(w̃)   0.218    0.123   0.059
Realized risk σ(w̃)     0.029    0.024   0.035

  • The Log-ME method produced a portfolio with a larger realized return than the other methods, and a smaller realized risk than S.

20

SLIDE 21

Comparison in Different Periods

  • Compare the portfolio strategies obtained with various covariance matrix estimation methods over different periods.
  • Given a starting week, use the first 50 observations as the training set, the next 50 observations as a validation set, and the third 50 observations as a test set.
  • Shift the starting week one ahead every time, and evaluate the portfolio strategy over 33 different consecutive test periods.
  • The optimal portfolio w̃ is computed with Σ̂ estimated by the Log-ME method, the CN method, and the sample covariance matrix, separately.

21

SLIDE 22

The Realized Returns

[Figure: Realized returns of the portfolios over the 33 test periods, for Log-ME, CN, and S.]

The proposed Log-ME covariance matrix estimate can lead to higher returns.

22

SLIDE 23

The Realized Risks

[Figure: Realized risks of the portfolios over the 33 test periods, for Log-ME, CN, and S.]

The Log-ME method has relatively higher risks than the CN method, but it provides much larger realized returns than the CN method.

23

SLIDE 24

Summary

  • Estimate the covariance matrix through its matrix logarithm based on a penalized likelihood function.
  • The Log-ME method regularizes the largest and smallest eigenvalues simultaneously by imposing a convex penalty.
  • Other penalty functions can be considered to improve the estimation in different perspectives.
  • Extend to Bayesian covariance matrix estimation for the large-p-small-n problem.

24

SLIDE 25

Thank you!

25

SLIDE 26

The Log-ME Method (Con’t)

  • Note that tr(A²) = tr((T0 B T0′ + A0)²) is equivalent to tr(B²) + 2 tr(BΓ) up to some constant, where Γ = (γij)p×p = T0′ A0 T0.
  • In terms of B, the function ln,λ(A) becomes
ln,λ(B) = ∑_{i=1}^p (1/2) ξii bii² + ∑_{i<j} ξij bij² + 2 ∑_{i=1}^p ∑_{j≠i} τij bii bij + ∑_{k=1}^p ∑_{i<j, i≠k, j≠k} ηkij bik bkj − ∑_{i=1}^p βii bii + 2 ∑_{i<j} βij bij
+ 2λ [ (1/2) ∑_{i=1}^p bii² + ∑_{i<j} bij² + ∑_{i=1}^p γii bii + 2 ∑_{i<j} γij bij ]. (10)
  • The ln,λ(B) is still a quadratic function of B = (bij).

26

SLIDE 27

The CN Method

  • The CN method estimates Σ with a constraint on its condition number (Won et al., 2009).
  • They consider Σ̂ = T diag(û1−1,..., ûp−1) T′, where T is from the spectral decomposition of S = T diag(l1,...,lp) T′.
  • The û1,..., ûp are obtained by solving the constrained optimization
min_{u, u1,...,up} ∑_{i=1}^p (li ui − log ui) s.t. u ≤ ui ≤ κmax u, i = 1,...,p,
where κmax is a tuning parameter.
  • The tuning parameter is computed through an independent validation set.
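The inner problem above has a simple structure: for fixed u, the convex term li ui − log ui is minimized over [u, κmax u] by clipping its unconstrained minimizer 1/li into that interval. A toy sketch that then grid-searches u (the grid search is my simplification of Won et al.'s exact solution path):

```python
import numpy as np

def cn_estimate(S, kappa_max):
    """Condition-number-constrained estimate: eigenvalues of Sigma_hat are 1/u_i."""
    l, T = np.linalg.eigh(S)
    l = np.maximum(l, 1e-12)                       # guard zero sample eigenvalues
    best_u, best_obj = None, np.inf
    for u in np.geomspace(1e-3 / l.max(), 1e3 / l.min(), 2000):
        ui = np.clip(1.0 / l, u, kappa_max * u)    # inner minimizer for fixed u
        obj = np.sum(l * ui - np.log(ui))
        if obj < best_obj:
            best_obj, best_u = obj, u
    ui = np.clip(1.0 / l, best_u, kappa_max * best_u)
    return T @ np.diag(1.0 / ui) @ T.T             # Sigma_hat = T diag(u_i^{-1}) T'

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 10))
Sigma_hat = cn_estimate(X.T @ X / 30, kappa_max=5.0)
e = np.linalg.eigvalsh(Sigma_hat)
print(e[-1] / e[0] <= 5.0 + 1e-8)                  # True: condition number respects kappa_max
```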

27