High-Dimensional Covariance Decomposition into Sparse Markov and Independence Domains


  1. High-Dimensional Covariance Decomposition into Sparse Markov and Independence Domains Majid Janzamin and Anima Anandkumar U.C. Irvine

  2. High-Dimensional Covariance Estimation. $n$ i.i.d. samples, $p$ variables, $X := [X_1, \ldots, X_p]^T$. Covariance estimation: $\Sigma^* := \mathbb{E}[XX^T]$. High-dimensional regime: both $n, p \to \infty$ with $n \ll p$. Challenge: the empirical (sample) covariance $\hat{\Sigma}^n := \frac{1}{n} \sum_{k=1}^{n} x^{(k)} x^{(k)T}$ is ill-posed when $n \ll p$. Solution: imposing sparsity for tractable high-dimensional estimation.
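(A minimal numerical sketch of the ill-posedness, not from the slides: when $n \ll p$ the sample covariance has rank at most $n$, so it is singular and cannot be inverted without regularization.)

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 100                    # high-dimensional regime: n << p

X = rng.standard_normal((n, p))   # rows are the n i.i.d. samples x^(k)
S = (X.T @ X) / n                 # sample covariance (1/n) sum_k x^(k) x^(k)^T

# rank(S) <= n < p, so S is singular: the Gaussian MLE of the
# inverse covariance does not even exist without regularization.
print(np.linalg.matrix_rank(S), "of", p)
```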

  3. Incorporating Sparsity in High Dimensions. Sparse covariance vs. sparse inverse covariance. [Figure: matrix sparsity patterns for the two models, sparse $\Sigma^*$ and sparse $J^* = \Sigma^{*-1}$.]

  4. Incorporating Sparsity in High Dimensions. Sparse covariance. [Figure: sparsity pattern of $\Sigma^*$.] Relationship with statistical properties (Gaussian). Sparse covariance (independence model): marginal independence.

  5. Incorporating Sparsity in High Dimensions. Sparse inverse covariance. [Figure: sparsity pattern of $J^* = \Sigma^{*-1}$.] Relationship with statistical properties (Gaussian). Sparse inverse covariance (Markov model): conditional independence. Local Markov property: $X_i \perp X_{V \setminus \{\mathrm{nbd}(i) \cup i\}} \mid X_{\mathrm{nbd}(i)}$. For Gaussians: $J_{ij} = 0 \Leftrightarrow (i, j) \notin E$.
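(An illustrative sketch of the Gaussian Markov property, assuming a simple chain graph not taken from the slides: zeros of $J$ mark missing edges, while the covariance itself is dense.)

```python
import numpy as np

# Tridiagonal precision matrix J for the chain graph 1-2-3-4:
# J_ij = 0 exactly when (i, j) is not an edge.
J = np.array([[ 2., -1.,  0.,  0.],
              [-1.,  2., -1.,  0.],
              [ 0., -1.,  2., -1.],
              [ 0.,  0., -1.,  2.]])

Sigma = np.linalg.inv(J)
print(np.round(Sigma, 3))
# Sigma has no zero entries: non-adjacent variables (e.g. X_1 and X_4) are
# conditionally independent given their neighbors, yet marginally dependent.
# Sparsity in the Markov domain does not give sparsity in the covariance.
```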

  6. Incorporating Sparsity in High Dimensions. [Figure: matrix sparsity patterns for the two models, sparse $\Sigma^*$ and sparse $J^* = \Sigma^{*-1}$.] Relationship with statistical properties (Gaussian). Sparse covariance (independence model): marginal independence. Sparse inverse covariance (Markov model): conditional independence.

  7. Incorporating Sparsity in High Dimensions. [Figure: matrix sparsity patterns for the two models, sparse $\Sigma^*$ and sparse $J^* = \Sigma^{*-1}$.] Relationship with statistical properties (Gaussian). Sparse covariance (independence model): marginal independence. Sparse inverse covariance (Markov model): conditional independence. Guarantees under sparsity constraints in high dimensions: consistent estimation when $n = \Omega(\log p)$, i.e., even when $n \ll p$. Consistent here means sparsistent and satisfying reasonable norm guarantees.

  8. Incorporating Sparsity in High Dimensions. [Figure: matrix sparsity patterns for the two models, sparse $\Sigma^*$ and sparse $J^* = \Sigma^{*-1}$.] Relationship with statistical properties (Gaussian). Sparse covariance (independence model): marginal independence. Sparse inverse covariance (Markov model): conditional independence. Guarantees under sparsity constraints in high dimensions: consistent estimation when $n = \Omega(\log p)$, i.e., even when $n \ll p$. Going beyond sparsity in high dimensions?

  9. Going Beyond Sparse Models. Motivation: sparsity constraints can be too restrictive for a faithful representation, and data may not be sparse in any single domain. Solution: sparsity in multiple domains. Challenge: it is hard to impose sparsity in different domains simultaneously.

  10. Going Beyond Sparse Models. Motivation: sparsity constraints can be too restrictive for a faithful representation, and data may not be sparse in any single domain. Solution: sparsity in multiple domains. Challenge: it is hard to impose sparsity in different domains simultaneously. One possibility (this work): propose a sparse Markov model plus a sparse residual perturbation, $\Sigma^* = J_M^{*-1} + \Sigma_R^*$.

  11. Going Beyond Sparse Models. Motivation: sparsity constraints can be too restrictive for a faithful representation, and data may not be sparse in any single domain. Solution: sparsity in multiple domains. Challenge: it is hard to impose sparsity in different domains simultaneously. One possibility (this work): propose a sparse Markov model plus a sparse residual perturbation, $\Sigma^* = J_M^{*-1} + \Sigma_R^*$. Efficient decomposition and estimation in high dimensions? Unique decomposition? Good sample requirements?

  12. Summary of Results. $\Sigma^* = J_M^{*-1} + \Sigma_R^*$.

  13. Summary of Results. $\Sigma^* = J_M^{*-1} + \Sigma_R^*$. Contribution 1: a novel model for decomposition. Decomposition into Markov and residual domains; a statistically meaningful model; a unification of sparse covariance and sparse inverse covariance estimation.

  14. Summary of Results. $\Sigma^* = J_M^{*-1} + \Sigma_R^*$. Contribution 1: a novel model for decomposition. Decomposition into Markov and residual domains; a statistically meaningful model; a unification of sparse covariance and sparse inverse covariance estimation. Contribution 2: methods and guarantees. Conditions for unique decomposition (exact statistics); sparsistency and norm guarantees in both the Markov and independence domains (sample analysis); sample requirement $n = \Omega(\log p)$ for $p$ variables; an efficient method for covariance decomposition and estimation in high dimensions.
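(A toy construction of this model class, with illustrative parameter values that are not from the paper: a chain-graph Markov part plus a two-entry residual. It shows that neither $\Sigma^*$ nor $\Sigma^{*-1}$ need be sparse even though both components are.)

```python
import numpy as np

p = 6

# Sparse Markov part: tridiagonal (chain-graph) precision matrix J_M.
J_M = 2.0 * np.eye(p) \
    + np.diag(-0.8 * np.ones(p - 1), 1) \
    + np.diag(-0.8 * np.ones(p - 1), -1)

# Sparse residual part Sigma_R: two symmetric off-diagonal entries,
# zero diagonal (matching the convention [Sigma_R]_d = 0 used later).
Sigma_R = np.zeros((p, p))
for i, j, v in [(0, 4, 0.15), (2, 5, -0.10)]:
    Sigma_R[i, j] = Sigma_R[j, i] = v

# The model: Sigma* = J_M^{-1} + Sigma_R, still positive definite here.
Sigma_star = np.linalg.inv(J_M) + Sigma_R
assert np.linalg.eigvalsh(Sigma_star).min() > 0

print(np.round(Sigma_star, 2))                 # dense covariance
print(np.round(np.linalg.inv(Sigma_star), 2))  # dense inverse covariance
```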

  15. Related Works. Sparse covariance/inverse covariance estimation. Sparse covariance estimation: covariance thresholding ◮ (Bickel & Levina), (Wagaman & Levina), (Cai et al.). Sparse inverse covariance estimation: ◮ $\ell_1$ penalization (Meinshausen & Bühlmann), (Ravikumar et al.) ◮ non-convex methods (Anandkumar et al.), (Zhang).

  16. Related Works. Sparse covariance/inverse covariance estimation. Sparse covariance estimation: covariance thresholding ◮ (Bickel & Levina), (Wagaman & Levina), (Cai et al.). Sparse inverse covariance estimation: ◮ $\ell_1$ penalization (Meinshausen & Bühlmann), (Ravikumar et al.) ◮ non-convex methods (Anandkumar et al.), (Zhang). Beyond sparse models, decomposition issues: sparse + low-rank (Chandrasekaran et al.), (Candès et al.); decomposable regularizers (Negahban et al.).

  17. Related Works. Sparse covariance/inverse covariance estimation. Sparse covariance estimation: covariance thresholding ◮ (Bickel & Levina), (Wagaman & Levina), (Cai et al.). Sparse inverse covariance estimation: ◮ $\ell_1$ penalization (Meinshausen & Bühlmann), (Ravikumar et al.) ◮ non-convex methods (Anandkumar et al.), (Zhang). Beyond sparse models, decomposition issues: sparse + low-rank (Chandrasekaran et al.), (Candès et al.); decomposable regularizers (Negahban et al.). Multi-resolution Markov + independence models (Choi et al.): decomposition in the inverse covariance domain; lacks theoretical guarantees. Our contribution: guaranteed decomposition and estimation.

  18. Outline: 1. Introduction; 2. Algorithm; 3. Guarantees; 4. Experiments; 5. Proof Techniques; 6. Conclusion.

  19. Some Intuitions and Ideas. Review ideas for the special cases: sparse covariance and sparse inverse covariance.

  20. Some Intuitions and Ideas. Review ideas for the special cases: sparse covariance and sparse inverse covariance. Sparse covariance estimation (independence model): $\Sigma^* = \Sigma_I^*$. $\hat{\Sigma}^n$: sample covariance from $n$ samples, with $p \gg n$ variables.

  21. Some Intuitions and Ideas. Review ideas for the special cases: sparse covariance and sparse inverse covariance. Sparse covariance estimation (independence model): $\Sigma^* = \Sigma_I^*$. $\hat{\Sigma}^n$: sample covariance from $n$ samples, with $p \gg n$ variables. (Bickel & Levina): hard-threshold the off-diagonal entries of $\hat{\Sigma}^n$, with the threshold chosen as $\sqrt{\frac{\log p}{n}}$. Sparsistency (support recovery) and norm guarantees when $n = \Omega(\log p)$, i.e., even when $n \ll p$.
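(A minimal sketch of covariance thresholding in this spirit; the unit constant in front of $\sqrt{\log p / n}$ is an illustrative choice, since in practice the constant is tuned.)

```python
import numpy as np

def threshold_covariance(X):
    """Hard-threshold off-diagonal sample-covariance entries at sqrt(log p / n)."""
    n, p = X.shape
    S = np.cov(X, rowvar=False, bias=True)          # sample covariance
    keep = np.abs(S) >= np.sqrt(np.log(p) / n)
    np.fill_diagonal(keep, True)                    # never threshold the diagonal
    return S * keep

# Independent variables (diagonal Sigma*) with n << p: spurious sample
# correlations of order 1/sqrt(n) fall below the threshold and are zeroed.
rng = np.random.default_rng(1)
n, p = 50, 200
S_hat = threshold_covariance(rng.standard_normal((n, p)))
off = ~np.eye(p, dtype=bool)
print("surviving off-diagonal fraction:", (S_hat[off] != 0).mean())
```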

  22. Recap of Inverse Covariance (Markov) Estimation. $\Sigma^* = J_M^{*-1} + \Sigma_R^*$. $\hat{\Sigma}^n$: sample covariance from $n$ i.i.d. samples.

  23. Recap of Inverse Covariance (Markov) Estimation. $\Sigma^* = J_M^{*-1} + \Sigma_R^*$. $\hat{\Sigma}^n$: sample covariance from $n$ i.i.d. samples. $\ell_1$-MLE for sparse inverse covariance (Ravikumar et al. '08): $\hat{J}_M := \operatorname{argmin}_{J_M \succ 0} \; \langle \hat{\Sigma}^n, J_M \rangle - \log\det J_M + \gamma \|J_M\|_{1,\mathrm{off}}$, where $\|J_M\|_{1,\mathrm{off}} := \sum_{i \neq j} |(J_M)_{ij}|$.
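(The program above is the graphical lasso. As a quick demonstration using scikit-learn's off-the-shelf GraphicalLasso solver, not the authors' code, and with an illustrative penalty value, it recovers a chain graph's sparse precision matrix from samples.)

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

# Sample from a zero-mean Gaussian whose precision matrix is a
# tridiagonal chain-graph J, so the true J is sparse.
rng = np.random.default_rng(2)
p = 10
J = 2.0 * np.eye(p) \
  + np.diag(-0.8 * np.ones(p - 1), 1) \
  + np.diag(-0.8 * np.ones(p - 1), -1)
X = rng.multivariate_normal(np.zeros(p), np.linalg.inv(J), size=500)

# alpha plays the role of the penalty gamma above (illustrative value).
J_hat = GraphicalLasso(alpha=0.05).fit(X).precision_
print(np.round(J_hat, 2))   # entries off the tridiagonal band shrink to ~0
```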

  24. Recap of Inverse Covariance (Markov) Estimation. $\Sigma^* = J_M^{*-1} + \Sigma_R^*$. $\hat{\Sigma}^n$: sample covariance from $n$ i.i.d. samples. $\ell_1$-MLE for sparse inverse covariance (Ravikumar et al. '08): $\hat{J}_M := \operatorname{argmin}_{J_M \succ 0} \; \langle \hat{\Sigma}^n, J_M \rangle - \log\det J_M + \gamma \|J_M\|_{1,\mathrm{off}}$, where $\|J_M\|_{1,\mathrm{off}} := \sum_{i \neq j} |(J_M)_{ij}|$. Max-entropy formulation (Lagrangian dual): $\hat{\Sigma}_M := \operatorname{argmax}_{\Sigma_M \succ 0,\, \Sigma_R} \; \log\det \Sigma_M - \lambda \|\Sigma_R\|_{1,\mathrm{off}}$ s.t. $\|\hat{\Sigma}^n - \Sigma_M - \Sigma_R\|_{\infty,\mathrm{off}} \le \gamma$, $[\Sigma_M]_d = [\hat{\Sigma}^n]_d$, $[\Sigma_R]_d = 0$.

  25. Recap of Inverse Covariance (Markov) Estimation. $\Sigma^* = J_M^{*-1} + \Sigma_R^*$. $\hat{\Sigma}^n$: sample covariance from $n$ i.i.d. samples. $\ell_1$-MLE for sparse inverse covariance (Ravikumar et al. '08): $\hat{J}_M := \operatorname{argmin}_{J_M \succ 0} \; \langle \hat{\Sigma}^n, J_M \rangle - \log\det J_M + \gamma \|J_M\|_{1,\mathrm{off}}$, where $\|J_M\|_{1,\mathrm{off}} := \sum_{i \neq j} |(J_M)_{ij}|$. Max-entropy formulation (Lagrangian dual): $\hat{\Sigma}_M := \operatorname{argmax}_{\Sigma_M \succ 0,\, \Sigma_R} \; \log\det \Sigma_M - \lambda \|\Sigma_R\|_{1,\mathrm{off}}$ s.t. $\|\hat{\Sigma}^n - \Sigma_M - \Sigma_R\|_{\infty,\mathrm{off}} \le \gamma$, $[\Sigma_M]_d = [\hat{\Sigma}^n]_d$, $[\Sigma_R]_d = 0$. Consistent estimation under certain conditions with $n = \Omega(\log p)$.
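(A direct CVXPY transcription of the max-entropy program above, as a sketch only; the $\lambda$ and $\gamma$ defaults are placeholders, and the constraint set follows the reconstruction shown on this slide rather than a verified implementation.)

```python
import cvxpy as cp
import numpy as np

def max_entropy_decomposition(S_hat, lam=0.5, gamma=0.1):
    """Solve: max log det Sigma_M - lam * ||Sigma_R||_{1,off}
       s.t. ||S_hat - Sigma_M - Sigma_R||_{inf,off} <= gamma,
            [Sigma_M]_d = [S_hat]_d, [Sigma_R]_d = 0."""
    p = S_hat.shape[0]
    off_mask = 1.0 - np.eye(p)                  # selects off-diagonal entries
    Sigma_M = cp.Variable((p, p), PSD=True)
    Sigma_R = cp.Variable((p, p), symmetric=True)

    objective = cp.Maximize(
        cp.log_det(Sigma_M)
        - lam * cp.sum(cp.abs(cp.multiply(off_mask, Sigma_R)))
    )
    constraints = [
        # elementwise off-diagonal infinity-norm bound
        cp.abs(cp.multiply(off_mask, S_hat - Sigma_M - Sigma_R)) <= gamma,
        cp.diag(Sigma_M) == np.diag(S_hat),     # [Sigma_M]_d = [S_hat]_d
        cp.diag(Sigma_R) == 0,                  # [Sigma_R]_d = 0
    ]
    cp.Problem(objective, constraints).solve()
    return Sigma_M.value, Sigma_R.value
```

Note that letting $\lambda$ grow forces $\Sigma_R \to 0$, shrinking the program back to the graphical-lasso dual recapped above; this is one way to read the unification claim in the summary of results.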

  26. Extension to Markov + Independence Models? $\Sigma^* = J_M^{*-1} + \Sigma_R^*$. Sparse covariance estimation: hard-threshold the off-diagonal entries of $\hat{\Sigma}^n$. Sparse inverse covariance estimation: add an $\ell_1$ penalty to the maximum-likelihood program (which involves estimating the inverse covariance matrix).

  27. Extension to Markov + Independence Models? $\Sigma^* = J_M^{*-1} + \Sigma_R^*$. Sparse covariance estimation: hard-threshold the off-diagonal entries of $\hat{\Sigma}^n$. Sparse inverse covariance estimation: add an $\ell_1$ penalty to the maximum-likelihood program (which involves estimating the inverse covariance matrix). Is it possible to unify the above methods and guarantees?
