

  1. Log Covariance Matrix Estimation
     Xinwei Deng, Department of Statistics, University of Wisconsin-Madison
     Joint work with Kam-Wah Tsui (Univ. of Wisconsin-Madison)

  2. Outline
     • Background and Motivation
     • The Proposed Log-ME Method
     • Simulation and Real Example
     • Summary and Discussion

  3. Background
     • Covariance matrix estimation is important in multivariate analysis and many statistical applications.
     • Suppose x_1, ..., x_n are i.i.d. p-dimensional random vectors ∼ N(0, Σ). Let S = ∑_{i=1}^n x_i x_i′ / n be the sample covariance matrix. The negative log-likelihood function is proportional to
          L_n(Σ) = −log|Σ^{−1}| + tr[Σ^{−1} S].   (1)
     • Recent interest centers on the case where p is large or p ≈ n; there S is not a stable estimate.
       – The largest eigenvalues of S overestimate the true eigenvalues.
       – When p > n, S is singular and its smallest eigenvalue is zero.
     How to estimate Σ^{−1}?
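A quick numerical illustration of this instability (not from the slides; it assumes numpy is available and uses an arbitrary seed):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 50  # fewer observations than dimensions

# Draw x_1, ..., x_n i.i.d. from N(0, I_p), so the true Sigma is I.
X = rng.standard_normal((n, p))

# Sample covariance S = (1/n) * sum_i x_i x_i' (mean known to be 0).
S = X.T @ X / n

w = np.linalg.eigvalsh(S)
print(w[-1])  # largest eigenvalue: far above the true value 1
print(w[0])   # smallest eigenvalue: numerically zero, S is singular
```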

  4. Recent Estimation Methods for Σ or Σ^{−1}
     • Reduce the number of nonzero entries in estimates of Σ or Σ^{−1}.
       – Σ: Bickel and Levina (2008), using thresholding.
       – Σ^{−1}: Yuan and Lin (2007), l_1 penalty on Σ^{−1}. Friedman et al. (2008), Graphical Lasso. Meinshausen and Bühlmann (2006), reformulated as regression.
     • Shrinkage estimates of the covariance matrix.
       – Ledoit and Wolf (2006), ρΣ + (1 − ρ)µI.
       – Won et al. (2009), control the condition number (largest eigenvalue / smallest eigenvalue).

  5. Motivation
     • An estimate of Σ or Σ^{−1} needs to be positive definite.
       – This mathematical restriction makes the covariance matrix estimation problem challenging.
     • Any positive definite Σ can be expressed as the matrix exponential of a real symmetric matrix A:
          Σ = exp(A) = I + A + A²/2! + ···
       – Expressing the likelihood function in terms of A ≡ log(Σ) removes the mathematical restriction.
     • Consider the spectral decomposition Σ = TDT′ with D = diag(d_1, ..., d_p). Then A = TMT′ with M = diag(log(d_1), ..., log(d_p)).
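The spectral-decomposition route to A = log(Σ) can be checked numerically (a sketch, not from the slides; assumes numpy/scipy and a random test matrix):

```python
import numpy as np
from scipy.linalg import expm, logm

rng = np.random.default_rng(1)
p = 4

# Build a random positive definite Sigma.
G = rng.standard_normal((p, p))
Sigma = G @ G.T + p * np.eye(p)

# Spectral decomposition Sigma = T D T'; the matrix log is A = T log(D) T'.
d, T = np.linalg.eigh(Sigma)
A = T @ np.diag(np.log(d)) @ T.T

# A is an unconstrained real symmetric matrix, and exp(A) recovers Sigma.
print(np.allclose(A, A.T))          # True
print(np.allclose(expm(A), Sigma))  # True
print(np.allclose(A, logm(Sigma)))  # True
```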

  6. Idea of the Proposed Method
     • Leonard and Hsu (1992) used this log-transformation to estimate Σ by approximating the likelihood via the Volterra integral equation.
       – Their approximation is based on S being nonsingular ⇒ not applicable when p ≥ n.
     • We extend the likelihood approximation to the case of singular S.
     • Regularize the largest and smallest eigenvalues of Σ simultaneously.
     • An efficient iterative quadratic programming algorithm estimates A (= log Σ).
     • Call the resulting estimate "Log-ME": Logarithm-transformed Matrix Estimate.

  7. A Simple Example
     • Experiment: simulate x_i's from N(0, I), i = 1, ..., n, where n = 50.
     • For each p varying from 5 to 100, consider the largest and smallest eigenvalues of the covariance matrix estimate.
     • For each p, repeat the experiment 100 times and compute the average of the largest eigenvalues and the average of the smallest eigenvalues for
       – the sample covariance matrix;
       – the Log-ME covariance matrix estimate.
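The sample-covariance half of this experiment can be reproduced in a few lines (a sketch, not the slides' code; uses 20 repetitions instead of 100 to keep it quick):

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 50, 20

for p in (5, 50, 100):
    largest, smallest = [], []
    for _ in range(reps):
        X = rng.standard_normal((n, p))     # true Sigma = I, all eigenvalues 1
        w = np.linalg.eigvalsh(X.T @ X / n)
        largest.append(w[-1])
        smallest.append(w[0])
    # As p grows, the average largest eigenvalue inflates and the smallest collapses.
    print(p, np.mean(largest), np.mean(smallest))
```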

  8. [Figure: the averages of the largest and smallest eigenvalues of the covariance matrix estimates, plotted against the dimension p. The true eigenvalues are all equal to 1.]

  9. The Transformed Log-Likelihood
     • In terms of the covariance matrix logarithm A, the negative log-likelihood function in (1) becomes
          L_n(A) = tr(A) + tr[exp(−A) S].   (2)
     • The problem of estimating a positive definite matrix Σ now becomes a problem of estimating a real symmetric matrix A.
     • Because of the matrix exponential term exp(−A)S, estimating A by directly minimizing L_n(A) is nontrivial.
     • Our approach: approximate exp(−A)S using the Volterra integral equation (valid even when S is singular).
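A quick numerical check (not from the slides; assumes numpy/scipy and a random test case) that the transformed likelihood (2) agrees with the original form (1), since −log|Σ^{−1}| = tr(A) and exp(−A) = Σ^{−1}:

```python
import numpy as np
from scipy.linalg import expm, logm

rng = np.random.default_rng(3)
n, p = 100, 4

X = rng.standard_normal((n, p))
S = X.T @ X / n

# A candidate positive definite Sigma and its matrix logarithm A.
G = rng.standard_normal((p, p))
Sigma = G @ G.T + np.eye(p)
A = logm(Sigma)

Sigma_inv = np.linalg.inv(Sigma)
L1 = -np.log(np.linalg.det(Sigma_inv)) + np.trace(Sigma_inv @ S)  # form (1)
L2 = np.trace(A) + np.trace(expm(-A) @ S)                         # form (2)
print(np.isclose(L1, L2))  # True: (1) and (2) agree
```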

  10. The Volterra Integral Equation
     • The Volterra integral equation (Bellman, 1970, page 175) is
          exp(At) = exp(A_0 t) + ∫_0^t exp(A_0(t − s)) (A − A_0) exp(As) ds.   (3)
     • Repeatedly applying (3) leads to
          exp(At) = exp(A_0 t) + ∫_0^t exp(A_0(t − s)) (A − A_0) exp(A_0 s) ds
                    + ∫_0^t ∫_0^s exp(A_0(t − s)) (A − A_0) exp(A_0(s − u)) (A − A_0) exp(A_0 u) du ds
                    + cubic and higher order terms,   (4)
       where A_0 = log(Σ_0) and Σ_0 is an initial estimate of Σ.
     • The expression for exp(−A) is obtained by setting t = 1 in (4) and replacing A, A_0 with −A, −A_0.
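Identity (3) at t = 1 can be verified numerically (a sketch, not from the slides; assumes numpy/scipy, random symmetric matrices, and plain trapezoidal quadrature for the integral over s):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(4)
p = 3

M = rng.standard_normal((p, p))
A0 = (M + M.T) / 2            # plays the role of log(Sigma_0)
E = rng.standard_normal((p, p))
A = A0 + 0.3 * (E + E.T)      # a perturbation of A0

# Right-hand side of (3) at t = 1, via composite trapezoidal quadrature in s.
s = np.linspace(0.0, 1.0, 2001)
vals = np.array([expm(A0 * (1 - si)) @ (A - A0) @ expm(A * si) for si in s])
h = s[1] - s[0]
integral = h * (vals[0] / 2 + vals[1:-1].sum(axis=0) + vals[-1] / 2)
rhs = expm(A0) + integral

print(np.max(np.abs(rhs - expm(A))))  # tiny quadrature error
```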

  11. Approximation to the Log-Likelihood
     • The term tr[exp(−A)S] can be written as
          tr[exp(−A)S] = tr(S Σ_0^{−1}) − ∫_0^1 tr[(A − A_0) Σ_0^{−s} S Σ_0^{s−1}] ds
                         + ∫_0^1 ∫_0^s tr[(A − A_0) Σ_0^{u−s} (A − A_0) Σ_0^{−u} S Σ_0^{s−1}] du ds
                         + cubic and higher order terms.   (5)
     • Leaving out the higher order terms in (5), we approximate L_n(A) by l_n(A):
          l_n(A) = tr(A) + tr(S Σ_0^{−1}) − ∫_0^1 tr[(A − A_0) Σ_0^{−s} S Σ_0^{s−1}] ds
                   + ∫_0^1 ∫_0^s tr[(A − A_0) Σ_0^{u−s} (A − A_0) Σ_0^{−u} S Σ_0^{s−1}] du ds.   (6)

  12. Explicit Form of l_n(A)
     • The integrals in l_n(A) can be solved analytically through the spectral decomposition Σ_0 = T_0 D_0 T_0′.
     • Some notation:
       – D_0 = diag(d_1^(0), ..., d_p^(0)), with the d_i^(0)'s the eigenvalues of Σ_0.
       – T_0 = (t_1^(0), ..., t_p^(0)), with t_i^(0) the eigenvector corresponding to d_i^(0).
       – Let B = T_0′ (A − A_0) T_0 = (b_ij)_{p×p}, and S̃ = T_0′ S T_0 = (s̃_ij)_{p×p}.
     • Then l_n(A) can be written as a function of the b_ij, up to a constant:
          l_n(A) = (1/2) ∑_{i=1}^p ξ_ii b_ii² + ∑_{i<j} ξ_ij b_ij² + ∑_{i=1}^p ∑_{j≠i} τ_ij b_ii b_ij
                   + ∑_{k=1}^p ∑_{i<j, i≠k, j≠k} η_kij b_ik b_kj
                   − ( ∑_{i=1}^p β_ii b_ii + 2 ∑_{i<j} β_ij b_ij ).   (7)
       Getting B ↔ getting A.

  13. Some Details
     • For the linear terms,
          β_ii = s̃_ii / d_i^(0) − 1,
          β_ij = s̃_ij (d_i^(0) − d_j^(0)) / [ d_i^(0) d_j^(0) (log d_i^(0) − log d_j^(0)) ].
     • For the quadratic terms,
          ξ_ii = s̃_ii / d_i^(0),
          ξ_ij = (s̃_ii/d_i^(0) − s̃_jj/d_j^(0)) / (log d_i^(0) − log d_j^(0))
                 + [ (d_j^(0)/d_i^(0) − 1) s̃_ii/d_i^(0) + (d_i^(0)/d_j^(0) − 1) s̃_jj/d_j^(0) ] / (log d_i^(0) − log d_j^(0))²,
          τ_ij = [ (1/d_j^(0) − 1/d_i^(0)) / (log d_j^(0) − log d_i^(0))² + (1/d_i^(0)) / (log d_j^(0) − log d_i^(0)) ] s̃_ij,
          η_kij = [ (1/d_i^(0) − 1/d_j^(0)) / ( log(d_k^(0)/d_j^(0)) log(d_j^(0)/d_i^(0)) )
                    + (1/d_j^(0) − 1/d_i^(0)) / ( log(d_j^(0)/d_i^(0)) log(d_k^(0)/d_i^(0)) )
                    + (2/d_k^(0) − 1/d_i^(0) − 1/d_j^(0)) / ( log(d_k^(0)/d_i^(0)) log(d_k^(0)/d_j^(0)) ) ] s̃_ij.

  14. The Log-ME Method
     • We propose a regularized method to estimate Σ using the approximate log-likelihood function l_n(A).
     • Consider the penalty function ‖A‖_F² = tr(A²) = ∑_{i=1}^p (log(d_i))², where the d_i are the eigenvalues of the covariance matrix Σ.
       – If d_i goes to zero or diverges to infinity, the value of (log(d_i))² goes to infinity in both cases.
       – Such a penalty function can therefore simultaneously regularize the largest and smallest eigenvalues of the covariance matrix estimate.
     • Estimate Σ, or equivalently A, by minimizing
          l_{n,λ}(B) ≡ l_{n,λ}(A) = l_n(A) + λ tr(A²),   (8)
       where λ is a tuning parameter.
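The identity behind the penalty, tr(A²) = ∑_i (log d_i)², follows because A = log(Σ) has eigenvalues log(d_i); a quick check (not from the slides; assumes numpy/scipy):

```python
import numpy as np
from scipy.linalg import logm

rng = np.random.default_rng(5)
p = 5

G = rng.standard_normal((p, p))
Sigma = G @ G.T + 0.1 * np.eye(p)

d = np.linalg.eigvalsh(Sigma)   # eigenvalues d_1, ..., d_p of Sigma
A = logm(Sigma)

# tr(A^2) = ||A||_F^2 = sum_i (log d_i)^2: both tiny and huge d_i are penalized.
print(np.isclose(np.trace(A @ A), np.sum(np.log(d) ** 2)))                # True
print(np.isclose(np.linalg.norm(A, 'fro') ** 2, np.sum(np.log(d) ** 2)))  # True
```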

  15. An Iterative Algorithm
     • l_{n,λ}(B) depends on an initial estimate Σ_0, or equivalently A_0.
     • We propose to iteratively minimize l_{n,λ}(B) to obtain its minimizer B̂:
       Step 1: Set an initial covariance matrix estimate Σ_0, a positive definite matrix.
       Step 2: Use the spectral decomposition Σ_0 = T_0 D_0 T_0′, and set A_0 = log(Σ_0).
       Step 3: Compute B̂ by minimizing l_{n,λ} in (8). Then obtain Â = T_0 B̂ T_0′ + A_0, and update the estimate of Σ by
            Σ̂ = exp(Â) = exp(T_0 B̂ T_0′ + A_0).
       Step 4: Stop if ‖Σ̂ − Σ_0‖_F² is less than a pre-specified positive tolerance value. Otherwise, set Σ_0 = Σ̂ and go back to Step 2.
     • In Step 1, set the initial Σ_0 to S + εI.
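The inner quadratic-programming step is specific to the paper. As a rough stand-in (an assumption, not the authors' algorithm), for tiny p one can minimize the exact penalized objective L_n(A) + λ tr(A²) from (2) and (8) directly over symmetric A with a generic optimizer, which illustrates the eigenvalue regularization the method is after:

```python
import numpy as np
from scipy.linalg import expm
from scipy.optimize import minimize

rng = np.random.default_rng(6)
n, p, lam = 40, 5, 0.5

X = rng.standard_normal((n, p))   # true Sigma = I
S = X.T @ X / n

iu = np.triu_indices(p)

def to_sym(v):
    # Rebuild a symmetric A from its upper-triangular parameters.
    A = np.zeros((p, p))
    A[iu] = v
    return A + A.T - np.diag(np.diag(A))

def objective(v):
    A = to_sym(v)
    # Exact penalized negative log-likelihood: tr(A) + tr(exp(-A)S) + lam*tr(A^2)
    return np.trace(A) + np.trace(expm(-A) @ S) + lam * np.trace(A @ A)

res = minimize(objective, np.zeros(len(iu[0])), method='Nelder-Mead',
               options={'maxiter': 40000, 'maxfev': 40000,
                        'xatol': 1e-8, 'fatol': 1e-10})
Sigma_hat = expm(to_sym(res.x))   # positive definite by construction

w_S, w_hat = np.linalg.eigvalsh(S), np.linalg.eigvalsh(Sigma_hat)
print(w_S[0], w_S[-1])      # spread-out sample eigenvalues
print(w_hat[0], w_hat[-1])  # pulled back toward the true value 1
```

This brute-force version scales poorly; the point of the slides' approximation (6)-(7) is precisely to replace it with a tractable quadratic problem in B.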

  16. Simulation Study
     • Six different covariance models Σ = (σ_ij)_{p×p} are used for comparison, including:
       – Model 1: Homogeneous model with Σ = I.
       – Model 2: MA(1) model with σ_ii = 1, σ_{i,i−1} = σ_{i−1,i} = 0.45.
       – Model 3: Circle model with σ_ii = 1, σ_{i,i−1} = σ_{i−1,i} = 0.3, σ_{1,p} = σ_{p,1} = 0.3.
     • We compare four estimation methods: the banding estimate (Bickel and Levina, 2008), the LW estimate (Ledoit and Wolf, 2006), the Glasso estimate (Yuan and Lin, 2007), and the CN estimate (Won et al., 2009).
     • Two loss functions evaluate the performance of each method:
          KL = −log|Σ̂^{−1}| + tr(Σ̂^{−1} Σ) − (−log|Σ^{−1}| + p),
          ∆_1 = | d̂_1/d̂_p − d_1/d_p |,
       where d_1 and d_p are the largest and smallest eigenvalues of Σ, and d̂_1 and d̂_p are their estimates.
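The two loss functions translate directly into code (a sketch, not from the slides; assumes numpy and hypothetical helper names `kl_loss` / `delta1_loss`):

```python
import numpy as np

def kl_loss(Sigma_hat, Sigma):
    # KL = -log|Sigma_hat^{-1}| + tr(Sigma_hat^{-1} Sigma) - (-log|Sigma^{-1}| + p)
    p = Sigma.shape[0]
    P = np.linalg.inv(Sigma_hat)
    logdet_P = np.linalg.slogdet(P)[1]
    logdet_Sigma_inv = -np.linalg.slogdet(Sigma)[1]
    return -logdet_P + np.trace(P @ Sigma) - (-logdet_Sigma_inv + p)

def delta1_loss(Sigma_hat, Sigma):
    # Delta_1 = |d1_hat/dp_hat - d1/dp|: error in the largest/smallest eigenvalue ratio
    w_hat = np.linalg.eigvalsh(Sigma_hat)
    w = np.linalg.eigvalsh(Sigma)
    return abs(w_hat[-1] / w_hat[0] - w[-1] / w[0])

Sigma = np.diag([4.0, 2.0, 1.0])
print(kl_loss(Sigma, Sigma))          # 0 for a perfect estimate
print(delta1_loss(np.eye(3), Sigma))  # |1/1 - 4/1| = 3.0
```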
