SLIDE 1

High-Dimensional Covariance Decomposition into Sparse Markov and Independence Domains

Majid Janzamin and Anima Anandkumar

U.C. Irvine

SLIDE 2

High-Dimensional Covariance Estimation

n i.i.d. samples, p variables X := [X_1, . . . , X_p]^T. Covariance estimation: Σ* := E[XX^T]. High-dimensional regime: both n, p → ∞ with n ≪ p. Challenge: the empirical (sample) covariance is ill-posed when n ≪ p:

Σ^n := (1/n) Σ_{k=1}^{n} x^{(k)} (x^{(k)})^T.

Solution: impose sparsity for tractable high-dimensional estimation.
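To make the ill-posedness concrete, here is a minimal numpy sketch of Σ^n (the function name and data are illustrative): when n ≪ p, the sample covariance has rank at most n, so it is singular and cannot be inverted.

```python
import numpy as np

def sample_covariance(X):
    """Sample covariance Σ^n = (1/n) Σ_k x^(k) (x^(k))^T.

    X : (n, p) array with one (centered) sample per row.
    """
    n = X.shape[0]
    return X.T @ X / n

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 200))        # n = 50 samples, p = 200 variables
Sigma_n = sample_covariance(X)
print(np.linalg.matrix_rank(Sigma_n))     # at most n = 50 < p: singular
```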

SLIDE 3

Incorporating Sparsity in High Dimensions

Sparse Covariance: Σ* = Σ*_R, with Σ*_R sparse.

Sparse Inverse Covariance: Σ* = (J*_M)^{-1}, with J*_M sparse.
SLIDE 4

Incorporating Sparsity in High Dimensions

Sparse Covariance: Σ* = Σ*_R, with Σ*_R sparse.

Relationship with Statistical Properties (Gaussian)

Sparse Covariance (Independence Model): zeros in Σ* encode marginal independence.
SLIDE 5

Incorporating Sparsity in High Dimensions

Sparse Inverse Covariance: Σ* = (J*_M)^{-1}, with J*_M sparse.

Relationship with Statistical Properties (Gaussian)

Sparse Inverse Covariance (Markov Model): zeros in J* encode conditional independence. Local Markov property: X_i ⊥ X_{V \ (nbd(i) ∪ {i})} | X_{nbd(i)}. For Gaussian: J_ij = 0 ⇔ (i, j) ∉ E.
SLIDE 6

Incorporating Sparsity in High Dimensions

Sparse Covariance: Σ* = Σ*_R, with Σ*_R sparse. Sparse Inverse Covariance: Σ* = (J*_M)^{-1}, with J*_M sparse.

Relationship with Statistical Properties (Gaussian)

Sparse Covariance (Independence Model): marginal independence. Sparse Inverse Covariance (Markov Model): conditional independence.
SLIDE 7

Incorporating Sparsity in High Dimensions

Sparse Covariance: Σ* = Σ*_R, with Σ*_R sparse. Sparse Inverse Covariance: Σ* = (J*_M)^{-1}, with J*_M sparse.

Relationship with Statistical Properties (Gaussian)

Sparse Covariance (Independence Model): marginal independence. Sparse Inverse Covariance (Markov Model): conditional independence.

Guarantees under Sparsity Constraints in High Dimensions

Consistent estimation when n = Ω(log p), so n ≪ p suffices. Consistent here means sparsistent and satisfying reasonable norm guarantees.
SLIDE 8

Incorporating Sparsity in High Dimensions

Sparse Covariance: Σ* = Σ*_R, with Σ*_R sparse. Sparse Inverse Covariance: Σ* = (J*_M)^{-1}, with J*_M sparse.

Relationship with Statistical Properties (Gaussian)

Sparse Covariance (Independence Model): marginal independence. Sparse Inverse Covariance (Markov Model): conditional independence.

Guarantees under Sparsity Constraints in High Dimensions

Consistent estimation when n = Ω(log p), so n ≪ p suffices. Going beyond sparsity in high dimensions?
SLIDE 9

Going Beyond Sparse Models

Motivation

Sparsity constraints are too restrictive for a faithful representation: data may not be sparse in any single domain. Solution: sparsity in multiple domains. Challenge: it is hard to impose sparsity in different domains simultaneously.
SLIDE 10

Going Beyond Sparse Models

Motivation

Sparsity constraints are too restrictive for a faithful representation: data may not be sparse in any single domain. Solution: sparsity in multiple domains. Challenge: it is hard to impose sparsity in different domains simultaneously.

One possibility (this work): a sparse Markov model plus a sparse residual perturbation:

Σ* = (J*_M)^{-1} + Σ*_R.
SLIDE 11

Going Beyond Sparse Models

Motivation

Sparsity constraints are too restrictive for a faithful representation: data may not be sparse in any single domain. Solution: sparsity in multiple domains. Challenge: it is hard to impose sparsity in different domains simultaneously.

One possibility (this work): a sparse Markov model plus a sparse residual perturbation:

Σ* = (J*_M)^{-1} + Σ*_R.

Efficient decomposition and estimation in high dimensions? Unique decomposition? Good sample requirements?
SLIDE 12

Summary of Results

Σ* = (J*_M)^{-1} + Σ*_R.
SLIDE 13

Summary of Results

Σ* = (J*_M)^{-1} + Σ*_R.

Contribution 1: Novel Model for Decomposition

Decomposition into Markov and residual domains. A statistically meaningful model. Unifies sparse covariance and sparse inverse covariance estimation.
SLIDE 14

Summary of Results

Σ* = (J*_M)^{-1} + Σ*_R.

Contribution 1: Novel Model for Decomposition

Decomposition into Markov and residual domains. A statistically meaningful model. Unifies sparse covariance and sparse inverse covariance estimation.

Contribution 2: Methods and Guarantees

Conditions for unique decomposition (exact statistics). Sparsistency and norm guarantees in both the Markov and independence domains (sample analysis). Sample requirement: n = Ω(log p) samples for p variables. An efficient method for covariance decomposition and estimation in high dimensions.
SLIDE 15

Related Works

Sparse Covariance/Inverse Covariance Estimation

Sparse Covariance Estimation: covariance thresholding.

◮ (Bickel & Levina) (Wagaman & Levina) (Cai et al.)

Sparse Inverse Covariance Estimation:

◮ ℓ1 penalization (Meinshausen & Bühlmann) (Ravikumar et al.)

◮ Non-convex methods (Anandkumar et al.) (Zhang)
SLIDE 16

Related Works

Sparse Covariance/Inverse Covariance Estimation

Sparse Covariance Estimation: covariance thresholding.

◮ (Bickel & Levina) (Wagaman & Levina) (Cai et al.)

Sparse Inverse Covariance Estimation:

◮ ℓ1 penalization (Meinshausen & Bühlmann) (Ravikumar et al.)

◮ Non-convex methods (Anandkumar et al.) (Zhang)

Beyond Sparse Models: Decomposition Issues

Sparse + Low Rank (Chandrasekaran et al.) (Candès et al.). Decomposable regularizers (Negahban et al.).
SLIDE 17

Related Works

Sparse Covariance/Inverse Covariance Estimation

Sparse Covariance Estimation: covariance thresholding.

◮ (Bickel & Levina) (Wagaman & Levina) (Cai et al.)

Sparse Inverse Covariance Estimation:

◮ ℓ1 penalization (Meinshausen & Bühlmann) (Ravikumar et al.)

◮ Non-convex methods (Anandkumar et al.) (Zhang)

Beyond Sparse Models: Decomposition Issues

Sparse + Low Rank (Chandrasekaran et al.) (Candès et al.). Decomposable regularizers (Negahban et al.). Multi-Resolution Markov + Independence Models (Choi et al.): decomposition in the inverse covariance domain, lacking theoretical guarantees.

Our contribution: guaranteed decomposition and estimation.
SLIDE 18

Outline

1. Introduction
2. Algorithm
3. Guarantees
4. Experiments
5. Proof Techniques
6. Conclusion
SLIDE 19

Some Intuitions and Ideas

Review Ideas for Special Cases: Sparse Covariance/Inverse Covariance

SLIDE 20

Some Intuitions and Ideas

Review Ideas for Special Cases: Sparse Covariance/Inverse Covariance

Sparse Covariance Estimation (Independence Model)

Σ* = Σ*_I. Σ^n: sample covariance using n samples; p variables with p ≫ n.
SLIDE 21

Some Intuitions and Ideas

Review Ideas for Special Cases: Sparse Covariance/Inverse Covariance

Sparse Covariance Estimation (Independence Model)

Σ* = Σ*_I. Σ^n: sample covariance using n samples; p variables with p ≫ n.

Hard-thresholding the off-diagonal entries of Σ^n (Bickel & Levina), with threshold chosen as √(log p / n). Sparsistency (support recovery) and norm guarantees when n = Ω(log p), so n ≪ p suffices.
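A minimal sketch of the thresholding estimator (the constant c in front of √(log p / n) is an illustrative choice, not a prescribed value):

```python
import numpy as np

def threshold_covariance(Sigma_n, n, c=1.0):
    """Hard-threshold the off-diagonal entries of the sample covariance.

    Off-diagonal entries below c * sqrt(log p / n) in magnitude are zeroed;
    the diagonal is left untouched.
    """
    p = Sigma_n.shape[0]
    t = c * np.sqrt(np.log(p) / n)
    Sigma_hat = np.where(np.abs(Sigma_n) >= t, Sigma_n, 0.0)
    np.fill_diagonal(Sigma_hat, np.diag(Sigma_n))
    return Sigma_hat
```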
SLIDE 22

Recap of Inverse Covariance (Markov) Estimation

Σ* = (J*_M)^{-1} + Σ*_R. Σ^n: sample covariance using n i.i.d. samples.
SLIDE 23

Recap of Inverse Covariance (Markov) Estimation

Σ* = (J*_M)^{-1} + Σ*_R. Σ^n: sample covariance using n i.i.d. samples.

ℓ1-MLE for Sparse Inverse Covariance (Ravikumar et al. '08)

Ĵ_M := argmin_{J_M ≻ 0} ⟨Σ^n, J_M⟩ − log det J_M + γ‖J_M‖_{1,off},

where ‖J_M‖_{1,off} := Σ_{i≠j} |(J_M)_{ij}|.
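For concreteness, a hedged cvxpy transcription of this ℓ1-MLE (a generic convex-programming sketch, not the authors' implementation; the helper name l1_mle is made up):

```python
import cvxpy as cp
import numpy as np

def l1_mle(Sigma_n, gamma):
    """ℓ1-penalized Gaussian MLE for a sparse inverse covariance J_M."""
    p = Sigma_n.shape[0]
    J = cp.Variable((p, p), PSD=True)
    off = 1.0 - np.eye(p)                 # mask selecting off-diagonal entries
    obj = (cp.trace(Sigma_n @ J) - cp.log_det(J)
           + gamma * cp.sum(cp.abs(cp.multiply(off, J))))
    cp.Problem(cp.Minimize(obj)).solve()
    return J.value
```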

SLIDE 24

Recap of Inverse Covariance (Markov) Estimation

Σ* = (J*_M)^{-1} + Σ*_R. Σ^n: sample covariance using n i.i.d. samples.

ℓ1-MLE for Sparse Inverse Covariance (Ravikumar et al. '08)

Ĵ_M := argmin_{J_M ≻ 0} ⟨Σ^n, J_M⟩ − log det J_M + γ‖J_M‖_{1,off},

where ‖J_M‖_{1,off} := Σ_{i≠j} |(J_M)_{ij}|.

Max-entropy Formulation (Lagrangian Dual)

(Σ̂_M, Σ̂_R) := argmax_{Σ_M ≻ 0, Σ_R} log det Σ_M − λ‖Σ_R‖_{1,off}
s. t. ‖Σ^n − Σ_M + Σ_R‖_{∞,off} ≤ γ, (Σ_M)_d = (Σ^n)_d, (Σ_R)_d = 0.
SLIDE 25

Recap of Inverse Covariance (Markov) Estimation

Σ* = (J*_M)^{-1} + Σ*_R. Σ^n: sample covariance using n i.i.d. samples.

ℓ1-MLE for Sparse Inverse Covariance (Ravikumar et al. '08)

Ĵ_M := argmin_{J_M ≻ 0} ⟨Σ^n, J_M⟩ − log det J_M + γ‖J_M‖_{1,off},

where ‖J_M‖_{1,off} := Σ_{i≠j} |(J_M)_{ij}|.

Max-entropy Formulation (Lagrangian Dual)

(Σ̂_M, Σ̂_R) := argmax_{Σ_M ≻ 0, Σ_R} log det Σ_M − λ‖Σ_R‖_{1,off}
s. t. ‖Σ^n − Σ_M + Σ_R‖_{∞,off} ≤ γ, (Σ_M)_d = (Σ^n)_d, (Σ_R)_d = 0.

Consistent estimation under certain conditions, with n = Ω(log p).
SLIDE 26

Extension to Markov+Independence Models?

Σ* = (J*_M)^{-1} + Σ*_R.

Sparse Covariance Estimation

Hard-thresholding the off-diagonal entries of Σ^n.

Sparse Inverse Covariance Estimation

Add an ℓ1 penalty to the maximum-likelihood program (which involves the inverse covariance matrix).
SLIDE 27

Extension to Markov+Independence Models?

Σ* = (J*_M)^{-1} + Σ*_R.

Sparse Covariance Estimation

Hard-thresholding the off-diagonal entries of Σ^n.

Sparse Inverse Covariance Estimation

Add an ℓ1 penalty to the maximum-likelihood program (which involves the inverse covariance matrix). Is it possible to unify the above methods and guarantees?
SLIDE 28

Extension to Markov+Independence Models?

Σ* = (J*_M)^{-1} + Σ*_R.

Sparse Covariance Estimation

Hard-thresholding the off-diagonal entries of Σ^n.

Sparse Inverse Covariance Estimation

Add an ℓ1 penalty to the maximum-likelihood program (which involves the inverse covariance matrix). Is it possible to unify the above methods and guarantees?

Challenges and Insights

The penalties in the above methods live in different domains.
SLIDE 29

Extension to Markov+Independence Models?

Σ* = (J*_M)^{-1} + Σ*_R.

Sparse Covariance Estimation

Hard-thresholding the off-diagonal entries of Σ^n.

Sparse Inverse Covariance Estimation

Add an ℓ1 penalty to the maximum-likelihood program (which involves the inverse covariance matrix). Is it possible to unify the above methods and guarantees?

Challenges and Insights

The penalties in the above methods live in different domains. Insight: consider the dual program of the MLE. For the Markov model, the dual program is in the covariance domain.
SLIDE 30

Our Algorithm: Covariance Decomposition

Σ* = (J*_M)^{-1} + Σ*_R. Extend the ℓ1-penalized MLE.

Max-entropy Formulation

Lagrangian dual of the ℓ1-penalized MLE:

Σ̂_M := argmax_{Σ_M ≻ 0} log det Σ_M
s. t. ‖Σ^n − Σ_M‖_{∞,off} ≤ γ, (Σ_M)_d = (Σ^n)_d.

ℓ1-MLE for Sparse Inverse Covariance (Ravikumar et al.)

Ĵ_M := argmin_{J_M ≻ 0} ⟨Σ^n, J_M⟩ − log det J_M + γ‖J_M‖_{1,off}.
SLIDE 31

Our Algorithm: Covariance Decomposition

Σ* = (J*_M)^{-1} + Σ*_R. Extend the ℓ1-penalized MLE.

Max-entropy Formulation + ℓ1-penalized Residuals (This work)

Lagrangian dual of the ℓ1-penalized MLE:

Σ̂_M := argmax_{Σ_M ≻ 0} log det Σ_M − λ‖Σ_R‖_{1,off}
s. t. ‖Σ^n − Σ_M‖_{∞,off} ≤ γ, (Σ_M)_d = (Σ^n)_d.

ℓ1-MLE for Sparse Inverse Covariance (Ravikumar et al.)

Ĵ_M := argmin_{J_M ≻ 0} ⟨Σ^n, J_M⟩ − log det J_M + γ‖J_M‖_{1,off}.
SLIDE 32

Our Algorithm: Covariance Decomposition

Σ* = (J*_M)^{-1} + Σ*_R. Extend the ℓ1-penalized MLE.

Max-entropy Formulation + ℓ1-penalized Residuals (This work)

Lagrangian dual of the ℓ1-penalized MLE:

(Σ̂_M, Σ̂_R) := argmax_{Σ_M ≻ 0, Σ_R} log det Σ_M − λ‖Σ_R‖_{1,off}
s. t. ‖Σ^n − Σ_M + Σ_R‖_{∞,off} ≤ γ, (Σ_M)_d = (Σ^n)_d, (Σ_R)_d = 0.

ℓ1-MLE for Sparse Inverse Covariance (Ravikumar et al.)

Ĵ_M := argmin_{J_M ≻ 0} ⟨Σ^n, J_M⟩ − log det J_M + γ‖J_M‖_{1,off}.
SLIDE 33

Our Algorithm: Covariance Decomposition

Σ* = (J*_M)^{-1} + Σ*_R. Extend the ℓ1-penalized MLE.

Max-entropy Formulation + ℓ1-penalized Residuals (This work)

Lagrangian dual of the ℓ1-penalized MLE:

(Σ̂_M, Σ̂_R) := argmax_{Σ_M ≻ 0, Σ_R} log det Σ_M − λ‖Σ_R‖_{1,off}
s. t. ‖Σ^n − Σ_M + Σ_R‖_{∞,off} ≤ γ, (Σ_M)_d = (Σ^n)_d, (Σ_R)_d = 0.

ℓ1-ℓ∞-penalized MLE (This work)

Ĵ_M := argmin_{J_M ≻ 0} ⟨Σ^n, J_M⟩ − log det J_M + γ‖J_M‖_{1,off}
s. t. ‖J_M‖_{∞,off} ≤ λ.
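A hedged cvxpy sketch of this ℓ1-ℓ∞-penalized MLE, obtained from the ℓ1-MLE sketch above by adding the elementwise ∞-norm constraint on the off-diagonal of J_M (an illustrative transcription, not the authors' code):

```python
import cvxpy as cp
import numpy as np

def l1_linf_mle(Sigma_n, gamma, lam):
    """ℓ1-ℓ∞-penalized MLE: graphical-lasso objective plus ||J_M||_{∞,off} <= λ."""
    p = Sigma_n.shape[0]
    J = cp.Variable((p, p), PSD=True)
    off = 1.0 - np.eye(p)
    obj = (cp.trace(Sigma_n @ J) - cp.log_det(J)
           + gamma * cp.sum(cp.abs(cp.multiply(off, J))))
    cons = [cp.abs(cp.multiply(off, J)) <= lam]   # ||J_M||_{∞,off} <= λ
    cp.Problem(cp.Minimize(obj), cons).solve()
    return J.value
```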
SLIDE 34

Observations regarding the Proposed Method

ℓ1-ℓ∞-penalized MLE (Primal)

Ĵ_M := argmin_{J_M ≻ 0} ⟨Σ^n, J_M⟩ − log det J_M + γ‖J_M‖_{1,off}, s. t. ‖J_M‖_{∞,off} ≤ λ.

Max-entropy Markov + ℓ1-penalized Residuals (Dual)

(Σ̂_M, Σ̂_R) := argmax_{Σ_M ≻ 0, Σ_R} log det Σ_M − λ‖Σ_R‖_{1,off}
s. t. ‖Σ^n − Σ_M + Σ_R‖_{∞,off} ≤ γ, (Σ_M)_d = (Σ^n)_d, (Σ_R)_d = 0.
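The dual can also be solved directly; a minimal cvxpy sketch with Σ̂_R as an explicit variable (again an illustrative transcription of the program above, not the authors' implementation):

```python
import cvxpy as cp
import numpy as np

def max_entropy_dual(Sigma_n, gamma, lam):
    """Max-entropy program over (Σ_M, Σ_R): Markov part plus sparse residual."""
    p = Sigma_n.shape[0]
    S_M = cp.Variable((p, p), PSD=True)
    S_R = cp.Variable((p, p), symmetric=True)
    off = 1.0 - np.eye(p)
    obj = cp.log_det(S_M) - lam * cp.sum(cp.abs(cp.multiply(off, S_R)))
    cons = [
        cp.abs(cp.multiply(off, Sigma_n - S_M + S_R)) <= gamma,  # off-diag ∞-norm
        cp.diag(S_M) == np.diag(Sigma_n),                        # (Σ_M)_d = (Σ^n)_d
        cp.diag(S_R) == 0,                                       # (Σ_R)_d = 0
    ]
    cp.Problem(cp.Maximize(obj), cons).solve()
    return S_M.value, S_R.value
```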
SLIDE 35

Observations regarding the Proposed Method

ℓ1-ℓ∞-penalized MLE (Primal)

Ĵ_M := argmin_{J_M ≻ 0} ⟨Σ^n, J_M⟩ − log det J_M + γ‖J_M‖_{1,off}, s. t. ‖J_M‖_{∞,off} ≤ λ.

Max-entropy Markov + ℓ1-penalized Residuals (Dual)

(Σ̂_M, Σ̂_R) := argmax_{Σ_M ≻ 0, Σ_R} log det Σ_M − λ‖Σ_R‖_{1,off}
s. t. ‖Σ^n − Σ_M + Σ_R‖_{∞,off} ≤ γ, (Σ_M)_d = (Σ^n)_d, (Σ_R)_d = 0.

Case λ → 0 (Sparse Covariance Estimation): with λ = √(log p / n), the method reduces to an approximate shrinkage estimator.
SLIDE 36

Observations regarding the Proposed Method

ℓ1-ℓ∞-penalized MLE (Primal)

Ĵ_M := argmin_{J_M ≻ 0} ⟨Σ^n, J_M⟩ − log det J_M + γ‖J_M‖_{1,off}, s. t. ‖J_M‖_{∞,off} ≤ λ.

Max-entropy Markov + ℓ1-penalized Residuals (Dual)

(Σ̂_M, Σ̂_R) := argmax_{Σ_M ≻ 0, Σ_R} log det Σ_M − λ‖Σ_R‖_{1,off}
s. t. ‖Σ^n − Σ_M + Σ_R‖_{∞,off} ≤ γ, (Σ_M)_d = (Σ^n)_d, (Σ_R)_d = 0.

Case λ → 0 (Sparse Covariance Estimation): with λ = √(log p / n), the method reduces to an approximate shrinkage estimator.

Case λ → ∞ (Sparse Inverse Covariance Estimation): the residual matrix Σ̂_R = 0, recovering the ℓ1-penalized MLE of Ravikumar et al.
SLIDE 37

Observations regarding the Proposed Method

ℓ1-ℓ∞-penalized MLE (Primal)

Ĵ_M := argmin_{J_M ≻ 0} ⟨Σ^n, J_M⟩ − log det J_M + γ‖J_M‖_{1,off}, s. t. ‖J_M‖_{∞,off} ≤ λ.

Max-entropy Markov + ℓ1-penalized Residuals (Dual)

(Σ̂_M, Σ̂_R) := argmax_{Σ_M ≻ 0, Σ_R} log det Σ_M − λ‖Σ_R‖_{1,off}
s. t. ‖Σ^n − Σ_M + Σ_R‖_{∞,off} ≤ γ, (Σ_M)_d = (Σ^n)_d, (Σ_R)_d = 0.

Case λ → 0 (Sparse Covariance Estimation): with λ = √(log p / n), the method reduces to an approximate shrinkage estimator.

Case λ → ∞ (Sparse Inverse Covariance Estimation): the residual matrix Σ̂_R = 0, recovering the ℓ1-penalized MLE of Ravikumar et al.

Unification of the sparse covariance and sparse inverse covariance models.
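The two limiting regimes can be checked numerically with the illustrative max_entropy_dual sketch defined earlier (all parameters here are arbitrary smoke-test values):

```python
import numpy as np

p, n = 20, 500
rng = np.random.default_rng(1)
X = rng.standard_normal((n, p))
Sigma_n = X.T @ X / n
gamma = np.sqrt(np.log(p) / n)

# λ small: the residual absorbs the data, behaving like shrinkage covariance estimation.
S_M0, S_R0 = max_entropy_dual(Sigma_n, gamma, lam=1e-3)

# λ large: the residual is penalized away, recovering the plain ℓ1-penalized MLE.
S_M1, S_R1 = max_entropy_dual(Sigma_n, gamma, lam=1e3)
print(np.max(np.abs(S_R1)))   # ≈ 0 in the λ → ∞ regime
```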

SLIDE 38

Analysis under Exact Statistics

SLIDE 39

Analysis under Exact Statistics

The same algorithm as with sample statistics, but with the exact Σ* in place of Σ^n and γ = 0 (no ℓ1 penalization):

Ĵ_M := argmin_{J_M ≻ 0} ⟨Σ*, J_M⟩ − log det J_M
s. t. ‖J_M‖_{∞,off} ≤ λ.
SLIDE 40

Analysis under Exact Statistics

The same algorithm as with sample statistics, but with the exact Σ* in place of Σ^n and γ = 0 (no ℓ1 penalization):

Ĵ_M := argmin_{J_M ≻ 0} ⟨Σ*, J_M⟩ − log det J_M
s. t. ‖J_M‖_{∞,off} ≤ λ.

The KKT conditions yield identifiability conditions.
SLIDE 41

Analysis under Exact Statistics

The same algorithm as with sample statistics, but with the exact Σ* in place of Σ^n and γ = 0 (no ℓ1 penalization):

Ĵ_M := argmin_{J_M ≻ 0} ⟨Σ*, J_M⟩ − log det J_M
s. t. ‖J_M‖_{∞,off} ≤ λ.

The KKT conditions yield identifiability conditions. The main identifiability condition: Supp(Σ*_R) ⊆ Supp(J*_M).

Node pairs are partitioned as follows: S_M := Supp(J*_M), S_R := Supp(Σ*_R), S := S_M \ S_R, and the complement S^c_M.
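A small numpy sketch of this support partition, with the identifiability condition as an explicit check (inputs and tolerance are illustrative):

```python
import numpy as np

def off_support(A, tol=1e-8):
    """Boolean mask of the off-diagonal support of a matrix."""
    S = np.abs(A) > tol
    np.fill_diagonal(S, False)
    return S

def partition(J_M, Sigma_R):
    S_M, S_R = off_support(J_M), off_support(Sigma_R)
    assert not np.any(S_R & ~S_M), "need Supp(Σ*_R) ⊆ Supp(J*_M)"
    S = S_M & ~S_R            # pairs in the Markov graph only
    Sc_M = ~S_M               # pairs outside the Markov graph
    np.fill_diagonal(Sc_M, False)
    return S, S_R, Sc_M
```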

SLIDE 42

Outline

1. Introduction
2. Algorithm
3. Guarantees
4. Experiments
5. Proof Techniques
6. Conclusion
SLIDE 43

Guarantees for High-Dimensional Estimation

Σ* = (J*_M)^{-1} + Σ*_R.

Conditions for Recovery

Maximum degree ∆ in the Markov graph (corresponding to J*_M). Number of samples n and number of nodes p satisfying n = Ω(∆² log p). Mutual-incoherence-type conditions.
SLIDE 44

Guarantees for High-Dimensional Estimation

Σ* = (J*_M)^{-1} + Σ*_R.

Conditions for Recovery

Maximum degree ∆ in the Markov graph (corresponding to J*_M). Number of samples n and number of nodes p satisfying n = Ω(∆² log p). Mutual-incoherence-type conditions.

Theorem

W.h.p., the proposed method outputs estimates (Ĵ_M, Σ̂_R) that are sparsistent and sign-consistent, and that satisfy the norm guarantees

‖Ĵ_M − J*_M‖_∞, ‖Σ̂_R − Σ*_R‖_∞ = O(√(log p / n)).

Guarantees sparsistency and efficient estimation in both domains.
SLIDE 45

Observations

Corollary 1 (Sparse Covariance Estimation)

With λ = Θ(√(log p / n)), our method reduces to a shrinkage estimator (comparable to the hard-thresholding estimator of Bickel & Levina) and is sparsistent for covariance estimation.

Corollary 2 (Sparse Inverse Covariance Estimation)

With λ → ∞, our method reduces to the ℓ1-penalized MLE (Ravikumar et al.) and is sparsistent for inverse covariance estimation.

Conditions for Recovery

Mutual-incoherence-type conditions. Sample complexity n = Ω(∆² log p), comparable to inverse covariance estimation (Ravikumar et al.).
SLIDE 46

Outline

1. Introduction
2. Algorithm
3. Guarantees
4. Experiments
5. Proof Techniques
6. Conclusion
SLIDE 47

Synthetic Data

Σ* = (J*_M)^{-1} + Σ*_R, with J* = (Σ*)^{-1}.

Setup

8 × 8 two-dimensional grid for the Markov model. Mixed Markov model (both positive and negative correlations). A sketch of one way to generate such a model appears below.
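This sketch uses networkx for the grid; the coupling strength, sign mixing, and diagonal-dominance trick are illustrative choices, not the paper's exact construction:

```python
import numpy as np
import networkx as nx

def grid_markov_precision(side=8, rho=0.2, seed=0):
    """Precision matrix J*_M on a side x side 2-d grid with mixed-sign couplings."""
    rng = np.random.default_rng(seed)
    G = nx.grid_2d_graph(side, side)
    idx = {node: k for k, node in enumerate(G.nodes)}
    p = side * side
    J = np.zeros((p, p))
    for u, v in G.edges:
        w = rho * rng.choice([-1.0, 1.0])   # mixed positive/negative correlations
        J[idx[u], idx[v]] = J[idx[v], idx[u]] = w
    np.fill_diagonal(J, np.abs(J).sum(axis=1) + 1.0)   # ensure positive definiteness
    return J
```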

J estimation: [plot of ‖Ĵ − J*‖_∞ versus number of samples n, comparing the ℓ1 + ℓ∞ method against the plain ℓ1 method]

Performance under LBP: [plot of average mean error versus iteration, for loopy belief propagation applied to the J* model and to the J*_M model]

The learned model is amenable to efficient inference, an advantage over existing techniques.
SLIDE 48

Experiments on Foreign Exchange Rate Data

Setup

Monthly foreign exchange rates against the US dollar. Apply the proposed method.

[Graph over currencies: Malaysia, South Korea, South Africa, India, Japan, Denmark, Sweden, Taiwan, Norway, Thailand, Sri Lanka, China]

Solid lines: Markov graph. Dotted lines: independence graph.
SLIDE 49

Experiments on Stock Market Data

Setup

Monthly stock returns of companies on the S&P index, from the divisions E.Trans, Comm, Elec&Gas, and G.Retail Trade. Apply the proposed method.

[Graph over tickers: CBS, NSC, SNS, BNI, HD, CMCSA, TGT, MCD, WMT, CVS, FDX, ETR, EXC, T, VZ]

Solid lines: Markov graph. Dotted lines: independence graph.
SLIDE 50

Outline

1. Introduction
2. Algorithm
3. Guarantees
4. Experiments
5. Proof Techniques
6. Conclusion
SLIDE 51

Analysis under Sample Statistics

Ĵ_M := argmin_{J_M ≻ 0} ⟨Σ^n, J_M⟩ − log det J_M + γ‖J_M‖_{1,off}
s. t. ‖J_M‖_{∞,off} ≤ λ.
SLIDE 52

Analysis under Sample Statistics

Ĵ_M := argmin_{J_M ≻ 0} ⟨Σ^n, J_M⟩ − log det J_M + γ‖J_M‖_{1,off}
s. t. ‖J_M‖_{∞,off} ≤ λ.

Challenges

1) Sparsistency guarantee: hard to show Supp(Ĵ_M) ⊆ Supp(J*_M).
SLIDE 53

Analysis under Sample Statistics

Ĵ_M := argmin_{J_M ≻ 0} ⟨Σ^n, J_M⟩ − log det J_M + γ‖J_M‖_{1,off}
s. t. ‖J_M‖_{∞,off} ≤ λ.

Challenges

1) Sparsistency guarantee: hard to show Supp(Ĵ_M) ⊆ Supp(J*_M).

2) Decoupling the errors in Σ* = (J*_M)^{-1} + Σ*_R.
SLIDE 54

Analysis under Sample Statistics

Ĵ_M := argmin_{J_M ≻ 0} ⟨Σ^n, J_M⟩ − log det J_M + γ‖J_M‖_{1,off}
s. t. ‖J_M‖_{∞,off} ≤ λ.

Challenges

1) Sparsistency guarantee: hard to show Supp(Ĵ_M) ⊆ Supp(J*_M).

2) Decoupling the errors in Σ* = (J*_M)^{-1} + Σ*_R.

We propose a modified program that is easier to analyze.

Modified Program (Restricted and Relaxed)

Ĵ_M := argmin_{J_M ≻ 0} ⟨Σ^n, J_M⟩ − log det J_M + γ‖J_M‖_{1,off}
s. t. (J_M)_{S^c_M} = 0, (J_M)_{S_R} = λ sign((J*_M)_{S_R}).

[Diagram: node pairs partitioned into S and S_R (together forming S_M) and S^c_M]
SLIDE 55

Primal-Dual Witness Method

Ĵ_M := argmin_{J_M ≻ 0} ⟨Σ^n, J_M⟩ − log det J_M + γ‖J_M‖_{1,off}
s. t. (J_M)_{S^c_M} = 0, (J_M)_{S_R} = λ sign((J*_M)_{S_R}).
SLIDE 56

Primal-Dual Witness Method

Ĵ_M := argmin_{J_M ≻ 0} ⟨Σ^n, J_M⟩ − log det J_M + γ‖J_M‖_{1,off}
s. t. (J_M)_{S^c_M} = 0, (J_M)_{S_R} = λ sign((J*_M)_{S_R}).

Sparsistency Guarantee

Supp(Ĵ_M) ⊆ Supp(J*_M), Supp(Σ̂_R) ⊆ Supp(Σ*_R).
SLIDE 57

Primal-Dual Witness Method

Ĵ_M := argmin_{J_M ≻ 0} ⟨Σ^n, J_M⟩ − log det J_M + γ‖J_M‖_{1,off}
s. t. (J_M)_{S^c_M} = 0, (J_M)_{S_R} = λ sign((J*_M)_{S_R}).

Sparsistency Guarantee

Supp(Ĵ_M) ⊆ Supp(J*_M), Supp(Σ̂_R) ⊆ Supp(Σ*_R).

Error Decoupling

∆_J := Ĵ_M − J*_M, ∆_R := Σ̂_R − Σ*_R.

[Diagram over the partition S, S_R, S^c_M: (∆_J)_{S_R} = λδ, (∆_R)_S = 0, (∆_J)_{S^c_M} = 0]
SLIDE 58

Primal-Dual Witness Method

Ĵ_M := argmin_{J_M ≻ 0} ⟨Σ^n, J_M⟩ − log det J_M + γ‖J_M‖_{1,off}
s. t. (J_M)_{S^c_M} = 0, (J_M)_{S_R} = λ sign((J*_M)_{S_R}).

Sparsistency Guarantee

Supp(Ĵ_M) ⊆ Supp(J*_M), Supp(Σ̂_R) ⊆ Supp(Σ*_R).

Error Decoupling

∆_J := Ĵ_M − J*_M, ∆_R := Σ̂_R − Σ*_R.

[Diagram over the partition S, S_R, S^c_M: (∆_J)_{S_R} = λδ, (∆_R)_S = 0, (∆_J)_{S^c_M} = 0]

Sufficient conditions (mutual incoherence) ensure equivalence between the modified and original programs: their solutions (Ĵ_M, Σ̂_R) coincide.
SLIDE 59

Outline

1. Introduction
2. Algorithm
3. Guarantees
4. Experiments
5. Proof Techniques
6. Conclusion
SLIDE 60

Conclusion

Summary

Combination of Markov and independence models. Unifies sparse covariance and sparse inverse covariance estimation methods. An efficient method, with guarantees, for estimation in both domains.
SLIDE 61

Conclusion

Summary

Combination of Markov and independence models. Unifies sparse covariance and sparse inverse covariance estimation methods. An efficient method, with guarantees, for estimation in both domains.

Outlook

Other forms of residuals (e.g., low rank). Discrete models (via pseudo-likelihood).

http://arxiv.org/abs/1211.0919