High-Dimensional Covariance Decomposition into Sparse Markov and Independence Domains
Majid Janzamin and Anima Anandkumar, U.C. Irvine
High-Dimensional Covariance Estimation

n i.i.d. samples, p variables X := [X_1, . . . , X_p]^T.
Covariance estimation: Σ* := E[XX^T].
High-dimensional regime: both n, p → ∞ and n ≪ p.
Challenge: the empirical (sample) covariance

    Σ̂^n := (1/n) Σ_{k=1}^{n} x^(k) x^(k)T

is ill-posed when n ≪ p.
Solution: impose sparsity for tractable high-dimensional estimation.
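As a quick numerical sanity check (my minimal numpy sketch, not from the slides; sizes are illustrative), the sample covariance has rank at most n, so it is singular whenever n < p:

```python
import numpy as np

# Minimal illustration of ill-posedness: rank(Sigma^n) <= n < p.
rng = np.random.default_rng(0)
n, p = 50, 200
X = rng.standard_normal((n, p))          # n i.i.d. samples of p variables
Sigma_n = (X.T @ X) / n                  # Sigma^n = (1/n) sum_k x^(k) x^(k)^T
print(np.linalg.matrix_rank(Sigma_n))    # 50: rank-deficient, not invertible
```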
Incorporating Sparsity in High Dimensions

Sparse Covariance: Σ* = Σ*_R, i.e., the covariance matrix itself is sparse.
Sparse Inverse Covariance: Σ* = (J*_M)^{-1}, i.e., the inverse covariance J*_M is sparse.

Relationship with Statistical Properties (Gaussian)
Sparse Covariance (Independence Model): marginal independence.
Sparse Inverse Covariance (Markov Model): conditional independence.
Local Markov Property: X_i ⊥ X_{V ∖ (nbd(i) ∪ {i})} | X_{nbd(i)}.
For Gaussians: J_ij = 0 ⇔ (i, j) ∉ E.

Guarantees under Sparsity Constraints in High Dimensions
Consistent estimation when n = Ω(log p), so n ≪ p is allowed.
Consistent: sparsistent and satisfying reasonable norm guarantees.

Going beyond sparsity in high dimensions?
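To make the Gaussian Markov property concrete, here is a small illustrative check (my example, not from the slides): a zero entry J_ij = 0 gives zero partial correlation between X_i and X_j given the rest, even though the marginal correlation is nonzero.

```python
import numpy as np

# Chain graph X1 - X2 - X3: no edge between X1 and X3, so J[0, 2] = 0.
J = np.array([[1.0, 0.4, 0.0],
              [0.4, 1.0, 0.4],
              [0.0, 0.4, 1.0]])
Sigma = np.linalg.inv(J)
print(Sigma[0, 2])                            # nonzero: marginally dependent
print(-J[0, 2] / np.sqrt(J[0, 0] * J[2, 2]))  # 0: conditionally independent given X2
```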
Going Beyond Sparse Models

Motivation
Sparsity constraints are too restrictive for a faithful representation: data may not be sparse in any single domain.
Solution: sparsity in multiple domains.
Challenge: it is hard to impose sparsity in different domains simultaneously.

One Possibility (This Work): a sparse Markov model plus a sparse residual perturbation:

    Σ* + Σ*_R = (J*_M)^{-1}.

Efficient decomposition and estimation in high dimensions? Unique decomposition? Good sample requirements?
Summary of Results

Model: Σ* + Σ*_R = (J*_M)^{-1}.

Contribution 1: Novel Model for Decomposition
Decomposition into Markov and residual domains.
Statistically meaningful model.
Unification of sparse covariance and sparse inverse covariance estimation.

Contribution 2: Methods and Guarantees
Conditions for unique decomposition (exact statistics).
Sparsistency and norm guarantees in both the Markov and independence domains (sample analysis).
Sample requirement: n = Ω(log p) samples for p variables.
Efficient method for covariance decomposition and estimation in high dimensions.
Related Works

Sparse Covariance / Inverse Covariance Estimation
Sparse covariance estimation: covariance thresholding.
◮ (Bickel & Levina), (Wagaman & Levina), (Cai et al.)
Sparse inverse covariance estimation:
◮ ℓ1 penalization (Meinshausen & Bühlmann), (Ravikumar et al.)
◮ Non-convex methods (Anandkumar et al.), (Zhang)

Beyond Sparse Models: Decomposition Issues
Sparse + low rank (Chandrasekaran et al.), (Candès et al.)
Decomposable regularizers (Negahban et al.)
Multi-resolution Markov + independence models (Choi et al.): decomposition in the inverse covariance domain; lacks theoretical guarantees.

Our contribution: guaranteed decomposition and estimation.
Outline
1 Introduction
2 Algorithm
3 Guarantees
4 Experiments
5 Proof Techniques
6 Conclusion
Some Intuitions and Ideas

Review ideas for the special cases: sparse covariance / sparse inverse covariance.

Sparse Covariance Estimation (Independence Model)
Model: Σ* = Σ*_I.
Σ̂^n: sample covariance using n samples of p variables, p ≫ n.
Hard-threshold the off-diagonal entries of Σ̂^n (Bickel & Levina), with threshold chosen as √(log p / n).
Sparsistency (support recovery) and norm guarantees when n = Ω(log p), so n ≪ p is allowed.
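A minimal numpy sketch of this hard-thresholding estimator (assuming centered data and a unit constant in the threshold; Bickel & Levina analyze a constant times √(log p / n)):

```python
import numpy as np

def hard_threshold_covariance(X):
    """Hard-threshold the off-diagonal entries of the sample covariance."""
    n, p = X.shape                        # assumes rows are centered samples
    S = (X.T @ X) / n                     # sample covariance
    t = np.sqrt(np.log(p) / n)            # threshold ~ sqrt(log p / n)
    off = ~np.eye(p, dtype=bool)
    S[off & (np.abs(S) < t)] = 0.0        # keep diagonal, zero small off-diagonals
    return S
```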
Recap of Inverse Covariance (Markov) Estimation

Model: Σ* = (J*_M)^{-1}.
Σ̂^n: sample covariance using n i.i.d. samples.

ℓ1-MLE for Sparse Inverse Covariance (Ravikumar et al. '08)

    Ĵ_M := argmin_{J_M ≻ 0} ⟨Σ̂^n, J_M⟩ − log det J_M + γ‖J_M‖_{1,off},

where ‖J_M‖_{1,off} := Σ_{i≠j} |(J_M)_{ij}|.

Max-entropy Formulation (Lagrangian Dual)

    Σ̂_M := argmax_{Σ_M ≻ 0} log det Σ_M
    s.t. ‖Σ̂^n − Σ_M‖_{∞,off} ≤ γ,  (Σ_M)_d = (Σ̂^n)_d.

Consistent estimation under certain conditions, with n = Ω(log p).
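For concreteness, the ℓ1-MLE above can be written as a convex program, e.g. in cvxpy (a sketch I am adding for illustration, not the authors' implementation; the penalty γ is a placeholder):

```python
import cvxpy as cp
import numpy as np

def l1_mle(Sigma_n, gamma):
    """l1-penalized Gaussian MLE with an off-diagonal penalty."""
    p = Sigma_n.shape[0]
    off = 1.0 - np.eye(p)                              # mask for off-diagonal entries
    J = cp.Variable((p, p), PSD=True)
    obj = (cp.trace(Sigma_n @ J) - cp.log_det(J)
           + gamma * cp.sum(cp.abs(cp.multiply(off, J))))
    cp.Problem(cp.Minimize(obj)).solve()
    return J.value
```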
Extension to Markov + Independence Models?

Model: Σ* + Σ*_R = (J*_M)^{-1}.

Sparse Covariance Estimation: hard-thresholding the off-diagonal entries of Σ̂^n.
Sparse Inverse Covariance Estimation: add an ℓ1 penalty to the maximum-likelihood program (over the inverse covariance matrix).

Is it possible to unify the above methods and guarantees?

Challenges and Insights
The penalties in the above methods live in different domains.
Insight: consider the dual program of the MLE; for the Markov model, the dual program is in the covariance domain.
Our Algorithm: Covariance Decomposition

Model: Σ* + Σ*_R = (J*_M)^{-1}. Approach: extend the ℓ1-penalized MLE.

ℓ1-MLE for Sparse Inverse Covariance (Ravikumar et al.)

    Ĵ_M := argmin_{J_M ≻ 0} ⟨Σ̂^n, J_M⟩ − log det J_M + γ‖J_M‖_{1,off}.

Its Lagrangian dual is a max-entropy program in the covariance domain:

    Σ̂_M := argmax_{Σ_M ≻ 0} log det Σ_M
    s.t. ‖Σ̂^n − Σ_M‖_{∞,off} ≤ γ,  (Σ_M)_d = (Σ̂^n)_d.

Max-entropy Formulation + ℓ1-penalized Residuals (This Work)

    (Σ̂_M, Σ̂_R) := argmax_{Σ_M ≻ 0, Σ_R} log det Σ_M − λ‖Σ_R‖_{1,off}
    s.t. ‖Σ̂^n − Σ_M + Σ_R‖_{∞,off} ≤ γ,  (Σ_M)_d = (Σ̂^n)_d,  (Σ_R)_d = 0.

ℓ1-ℓ∞-penalized MLE (This Work): the corresponding primal

    Ĵ_M := argmin_{J_M ≻ 0} ⟨Σ̂^n, J_M⟩ − log det J_M + γ‖J_M‖_{1,off}
    s.t. ‖J_M‖_{∞,off} ≤ λ.
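Here is an illustrative cvxpy sketch of the primal ℓ1-ℓ∞ program (my rendering, not the authors' code). The Markov component is read off as Σ̂_M = Ĵ_M^{-1}, and, following the dual constraint, the residual is approximated by Σ̂_R ≈ Σ̂_M − Σ̂^n on the off-diagonals:

```python
import cvxpy as cp
import numpy as np

def l1_linf_mle(Sigma_n, gamma, lam):
    """l1-l_inf penalized MLE; returns (J_M, Sigma_M, Sigma_R) estimates."""
    p = Sigma_n.shape[0]
    off = 1.0 - np.eye(p)
    J = cp.Variable((p, p), PSD=True)
    obj = (cp.trace(Sigma_n @ J) - cp.log_det(J)
           + gamma * cp.sum(cp.abs(cp.multiply(off, J))))
    cons = [cp.max(cp.abs(cp.multiply(off, J))) <= lam]   # ||J_M||_{inf,off} <= lam
    cp.Problem(cp.Minimize(obj), cons).solve()
    J_M = J.value
    Sigma_M = np.linalg.inv(J_M)                # Markov covariance component
    Sigma_R = off * (Sigma_M - Sigma_n)         # approximate residual (off-diagonals)
    return J_M, Sigma_M, Sigma_R
```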
Observations regarding the Proposed Method

ℓ1-ℓ∞-penalized MLE (Primal)

    Ĵ_M := argmin_{J_M ≻ 0} ⟨Σ̂^n, J_M⟩ − log det J_M + γ‖J_M‖_{1,off},  s.t. ‖J_M‖_{∞,off} ≤ λ.

Max-entropy Markov + ℓ1-penalized Residuals (Dual)

    (Σ̂_M, Σ̂_R) := argmax_{Σ_M ≻ 0, Σ_R} log det Σ_M − λ‖Σ_R‖_{1,off}
    s.t. ‖Σ̂^n − Σ_M + Σ_R‖_{∞,off} ≤ γ,  (Σ_M)_d = (Σ̂^n)_d,  (Σ_R)_d = 0.

Case λ → 0 (Sparse Covariance Estimation): with λ = √(log p / n), the method reduces to an approximate shrinkage estimator.
Case λ → ∞ (Sparse Inverse Covariance Estimation): the residual matrix Σ̂_R = 0, recovering the ℓ1-penalized MLE of Ravikumar et al.
Unification of the sparse covariance and sparse inverse covariance models.
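The two limiting regimes can be exercised directly with the l1_linf_mle sketch above (a hypothetical helper from the previous sketch; the γ settings here are placeholders, not the tuned values from the analysis):

```python
import numpy as np

# X: a centered n x p data matrix, e.g. from the earlier sketches.
n, p = X.shape
Sigma_n = (X.T @ X) / n
lam = np.sqrt(np.log(p) / n)             # lambda = Theta(sqrt(log p / n)): shrinkage-like
J_cov, _, _ = l1_linf_mle(Sigma_n, gamma=0.5 * lam, lam=lam)
J_mkv, _, S_R = l1_linf_mle(Sigma_n, gamma=lam, lam=1e6)   # lambda -> infinity:
                                                           # S_R ~ 0, plain l1-MLE recovered
```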
Analysis under Exact Statistics

The same program as with sample statistics, but using the exact Σ* and γ = 0 (no ℓ1 penalization):

    Ĵ_M := argmin_{J_M ≻ 0} ⟨Σ*, J_M⟩ − log det J_M,  s.t. ‖J_M‖_{∞,off} ≤ λ.

The KKT conditions yield identifiability conditions.
The main identifiability condition: Supp(Σ*_R) ⊆ Supp(J*_M).

The node pairs are partitioned as follows:
S_M := Supp(J*_M),  S_R := Supp(Σ*_R),  S := S_M ∖ S_R.
[Figure: partition of the node pairs into S and S_R (together forming S_M) and the complement S_M^c.]
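The support sets and the partition can be computed mechanically from the true matrices (an illustrative helper I am adding; `tol` is an assumed numerical-zero tolerance):

```python
import numpy as np

def support_partition(J_M_star, Sigma_R_star, tol=1e-8):
    """Partition node pairs into S_M = Supp(J*_M), S_R = Supp(Sigma*_R), S = S_M \\ S_R."""
    off = ~np.eye(J_M_star.shape[0], dtype=bool)
    S_M = off & (np.abs(J_M_star) > tol)
    S_R = off & (np.abs(Sigma_R_star) > tol)
    assert not (S_R & ~S_M).any(), "identifiability requires Supp(Sigma*_R) within Supp(J*_M)"
    return S_M, S_R, S_M & ~S_R
```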
Outline
1 Introduction
2 Algorithm
3 Guarantees
4 Experiments
5 Proof Techniques
6 Conclusion
Guarantees for High-Dimensional Estimation

Model: Σ* + Σ*_R = (J*_M)^{-1}.

Conditions for Recovery
Maximum degree ∆ in the Markov graph (corresponding to J*_M).
Number of samples n and number of nodes p satisfying n = Ω(∆² log p).
Mutual-incoherence type conditions.

Theorem
The proposed method outputs estimates (Ĵ_M, Σ̂_R) such that, with high probability,
(Ĵ_M, Σ̂_R) are sparsistent and sign consistent;
they satisfy the norm guarantees ‖Ĵ_M − J*_M‖_∞, ‖Σ̂_R − Σ*_R‖_∞ = O(√(log p / n)).

Guarantees sparsistency and efficient estimation in both domains.
Observations

Corollary 1 (Sparse Covariance Estimation)
With λ = Θ(√(log p / n)), our method reduces to a shrinkage estimator (comparable to the hard-thresholding estimator of Bickel & Levina) and is sparsistent for covariance estimation.

Corollary 2 (Sparse Inverse Covariance Estimation)
With λ → ∞, our method reduces to the ℓ1-penalized MLE (Ravikumar et al.) and is sparsistent for inverse covariance estimation.

Conditions for Recovery
Mutual incoherence-type conditions.
Sample complexity n = Ω(∆² log p), comparable to inverse covariance estimation (Ravikumar et al.).
Outline
1 Introduction
2 Algorithm
3 Guarantees
4 Experiments
5 Proof Techniques
6 Conclusion
Synthetic Data

Model: Σ* + Σ*_R = (J*_M)^{-1},  J* = (Σ*)^{-1}.

Setup
8 × 8 2-D grid for the Markov model.
Mixed Markov model (both positive and negative correlations).

[Figure: ‖Ĵ − J*‖_∞ versus number of samples n (1000 to 6000), comparing the proposed ℓ1 + ℓ∞ method against the plain ℓ1 method.]
[Figure: average mean error of loopy belief propagation (LBP) over iterations, applied to the J* model versus the learned Ĵ_M model.]

The learned model is amenable to efficient inference: an advantage over existing techniques.
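A sketch of how such a synthetic model might be generated (my construction under stated assumptions: the coupling strength, diagonal loading, and sample size are illustrative, not the paper's exact values):

```python
import numpy as np

def grid_precision(m=8, coupling=0.2, seed=0):
    """J*_M for an m x m 2-D grid with randomly signed ('mixed') edge weights."""
    rng = np.random.default_rng(seed)
    p = m * m
    J = np.zeros((p, p))
    for i in range(m):
        for j in range(m):
            u = i * m + j
            for di, dj in ((0, 1), (1, 0)):            # right and down neighbors
                if i + di < m and j + dj < m:
                    v = (i + di) * m + (j + dj)
                    J[u, v] = J[v, u] = coupling * rng.choice((-1.0, 1.0))
    np.fill_diagonal(J, np.abs(J).sum(axis=1) + 0.5)   # diagonal dominance -> J > 0
    return J

J_star = grid_precision()
Sigma_star = np.linalg.inv(J_star)
X = np.random.default_rng(1).multivariate_normal(np.zeros(64), Sigma_star, size=2000)
```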
Experiments on Foreign Exchange Rate Data

Setup
Monthly foreign exchange rates against the US dollar. Apply the proposed method.

[Figure: learned graph over currencies (Malaysia, South Korea, South Africa, India, Japan, Denmark, Sweden, Taiwan, Norway, Thailand, Sri Lanka, China). Solid lines: Markov graph; dotted lines: independence graph.]
Experiments on Stock Market Data

Setup
Monthly stock returns of companies on the S&P index; companies in the divisions E.Trans, Comm, Elec&Gas, and G.Retail Trade. Apply the proposed method.

[Figure: learned graph over tickers (CBS, NSC, SNS, BNI, HD, CMCSA, TGT, MCD, WMT, CVS, FDX, ETR, EXC, T, VZ). Solid lines: Markov graph; dotted lines: independence graph.]
Outline
1 Introduction
2 Algorithm
3 Guarantees
4 Experiments
5 Proof Techniques
6 Conclusion
Analysis under Sample Statistics

    Ĵ_M := argmin_{J_M ≻ 0} ⟨Σ̂^n, J_M⟩ − log det J_M + γ‖J_M‖_{1,off},  s.t. ‖J_M‖_{∞,off} ≤ λ.

Challenges
1) Sparsistency guarantee: it is hard to show Supp(Ĵ_M) ⊆ Supp(J*_M) directly.
2) Decoupling the errors in the two domains of Σ* + Σ*_R = (J*_M)^{-1}.

We therefore propose a modified program that is easier to analyze.
Modified Program (Restricted and Relaxed)

    J̃_M := argmin_{J_M ≻ 0} ⟨Σ̂^n, J_M⟩ − log det J_M + γ‖J_M‖_{1,off}
    s.t. (J_M)_{S_M^c} = 0,  (J_M)_{S_R} = λ sign((J*_M)_{S_R}).

[Figure: partition of the node pairs into S and S_R (together forming S_M) and the complement S_M^c.]
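In cvxpy, the restriction and relaxation amount to equality constraints on the fixed entries (a sketch I am adding; note the oracle sets S_M^c and S_R and the signs of J*_M are available only inside the analysis, not to a practical estimator):

```python
import cvxpy as cp
import numpy as np

def modified_program(Sigma_n, J_star, S_R, gamma, lam, tol=1e-8):
    """Restricted/relaxed program from the analysis, given oracle supports."""
    p = Sigma_n.shape[0]
    off = ~np.eye(p, dtype=bool)
    S_M_c = off & (np.abs(J_star) <= tol)              # complement of the Markov support
    J = cp.Variable((p, p), PSD=True)
    obj = (cp.trace(Sigma_n @ J) - cp.log_det(J)
           + gamma * cp.sum(cp.abs(cp.multiply(off.astype(float), J))))
    cons = [J[i, j] == 0 for i, j in zip(*np.where(S_M_c))]
    cons += [J[i, j] == lam * np.sign(J_star[i, j]) for i, j in zip(*np.where(S_R))]
    cp.Problem(cp.Minimize(obj), cons).solve()
    return J.value
```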
Primal-Dual Witness Method

Analyze the modified program:

    J̃_M := argmin_{J_M ≻ 0} ⟨Σ̂^n, J_M⟩ − log det J_M + γ‖J_M‖_{1,off}
    s.t. (J_M)_{S_M^c} = 0,  (J_M)_{S_R} = λ sign((J*_M)_{S_R}).

Sparsistency Guarantee
Supp(J̃_M) ⊆ Supp(J*_M),  Supp(Σ̃_R) ⊆ Supp(Σ*_R).

Error Decoupling
∆_J := J̃_M − J*_M,  ∆_R := Σ̃_R − Σ*_R.
[Figure: error pattern over the partition of node pairs — ∆_J = λδ on S_R, ∆_R = 0 on S, and ∆_J = 0 on S_M^c.]

Sufficient conditions (mutual incoherence) for equivalence between the modified and original programs:

    (J̃_M, Σ̃_R) = (Ĵ_M, Σ̂_R).