Robust estimation of precision matrices under cellwise contamination


slide-1
SLIDE 1

Scale Covariance Multivariate Conclusion

Robust estimation of precision matrices under cellwise contamination

Garth Tarr, Samuel Müller and Neville Weber

COMPSTAT 2014

slide-2
SLIDE 2

Outline

  • Robust Scale Estimator, Pn
  • Covariance
  • Covariance Matrix
  • Autocovariance
  • Long Range Dependence
  • Short Range Dependence
  • Inverse Covariance Matrix Estimation

slide-3
SLIDE 3

Outline

  • Robust scale estimation with Pn
  • Robust pairwise covariance estimation
  • Robust covariance and precision matrices
  • Summary and key references

slide-4
SLIDE 4

Pairwise mean scale estimator: Pn

  • Consider the U-statistic, based on the pairwise mean kernel,

$$U_n(X) := \binom{n}{2}^{-1} \sum_{i<j} \frac{X_i + X_j}{2}.$$

  • Let $H(t) = P((X_i + X_j)/2 \le t)$ be the cdf of the kernels, with corresponding empirical distribution function,

$$H_n(t) := \binom{n}{2}^{-1} \sum_{i<j} I\left\{\frac{X_i + X_j}{2} \le t\right\}, \quad t \in \mathbb{R}.$$

Definition (Interquartile range of pairwise means)

$$P_n = c\left[H_n^{-1}(0.75) - H_n^{-1}(0.25)\right],$$

where $c \approx 1.048$ is a correction factor to ensure $P_n$ is consistent for the standard deviation when the underlying observations are Gaussian.
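
The definition can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the function name `pn_scale`, the use of the generalised inverse of the empirical cdf for the quantiles, and the outlier value 1000 are all assumptions made here.

```python
import math
import random
from itertools import combinations


def pn_scale(x, c=1.048):
    """Pn: scaled interquartile range of the pairwise means (Xi + Xj) / 2."""
    # All n-choose-2 pairwise means, sorted so order statistics give H_n^{-1}.
    means = sorted((a + b) / 2.0 for a, b in combinations(x, 2))
    m = len(means)

    def quantile(q):
        # Generalised inverse of the empirical cdf: smallest t with H_n(t) >= q.
        k = max(math.ceil(q * m) - 1, 0)
        return means[k]

    return c * (quantile(0.75) - quantile(0.25))


random.seed(1)
clean = [random.gauss(0.0, 3.0) for _ in range(400)]
# Replace 5% of the observations with gross outliers (illustrative value).
dirty = [1000.0 if i < 20 else v for i, v in enumerate(clean)]

print(round(pn_scale(clean), 2))  # should be near the true standard deviation, 3
print(round(pn_scale(dirty), 2))  # stays bounded despite the outliers
```

Sorting all pairwise means costs O(n² log n), which is fine for a sketch but would need a faster algorithm for large n.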

slide-5
SLIDE 5

Outline

  • Robust scale estimation with Pn
  • Robust pairwise covariance estimation
  • Robust covariance and precision matrices
  • Summary and key references

slide-6
SLIDE 6

From scale to covariance: the GK device

  • Gnanadesikan and Kettenring (1972) relate scale and covariance using the following identity,

$$\operatorname{cov}(X, Y) = \frac{1}{4\alpha\beta}\left[\operatorname{var}(\alpha X + \beta Y) - \operatorname{var}(\alpha X - \beta Y)\right],$$

where $X$ and $Y$ are random variables.

  • In general, $X$ and $Y$ can have different units, so we set $\alpha = 1/\sqrt{\operatorname{var}(X)}$ and $\beta = 1/\sqrt{\operatorname{var}(Y)}$.

  • Replacing variance with $P_n^2$ we can similarly construct,

$$\gamma_P(X, Y) = \frac{1}{4\alpha\beta}\left[P_n^2(\alpha X + \beta Y) - P_n^2(\alpha X - \beta Y)\right],$$

where $\alpha = 1/P_n(X)$ and $\beta = 1/P_n(Y)$.
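
As a rough illustration of the GK device, the sketch below plugs a Pn-style scale into the identity. The helper names `pn_scale` and `gk_covariance`, and the simulated bivariate Gaussian sample with correlation 0.6, are assumptions for this example only.

```python
import math
import random
from itertools import combinations


def pn_scale(x, c=1.048):
    """Pn: scaled interquartile range of the pairwise means."""
    means = sorted((a + b) / 2.0 for a, b in combinations(x, 2))
    m = len(means)
    upper = means[max(math.ceil(0.75 * m) - 1, 0)]
    lower = means[max(math.ceil(0.25 * m) - 1, 0)]
    return c * (upper - lower)


def gk_covariance(x, y, scale=pn_scale):
    """GK identity with the variance replaced by a squared robust scale."""
    alpha = 1.0 / scale(x)
    beta = 1.0 / scale(y)
    plus = [alpha * a + beta * b for a, b in zip(x, y)]
    minus = [alpha * a - beta * b for a, b in zip(x, y)]
    return (scale(plus) ** 2 - scale(minus) ** 2) / (4.0 * alpha * beta)


random.seed(2)
rho = 0.6
z1 = [random.gauss(0.0, 1.0) for _ in range(300)]
z2 = [random.gauss(0.0, 1.0) for _ in range(300)]
x = z1
y = [rho * a + math.sqrt(1.0 - rho ** 2) * b for a, b in zip(z1, z2)]

print(round(gk_covariance(x, y), 2))  # compare with the true covariance, 0.6
```

Passing the scale as an argument means the same function could be reused with another robust scale estimator in place of Pn.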

slide-7
SLIDE 7

Outline

  • Robust scale estimation with Pn
  • Robust pairwise covariance estimation
  • Robust covariance and precision matrices
  • Summary and key references

slide-8
SLIDE 8

Estimating dependence

Problem:

To estimate dependence in multivariate settings with cellwise contamination.

slide-9
SLIDE 9

[Figure: a data matrix of 100 observations by 30 variables, shown beside a plot of the percentage of rows contaminated (0% to 100%) against the percentage of cells contaminated (0% to 10%) for p = 30.]

For details see Alqallaf et al. (2009).

slide-10
SLIDE 10

Estimating dependence

Problem:

To estimate dependence in multivariate settings with cellwise contamination.

Solution:

  1. pairwise covariance matrices

slide-11
SLIDE 11

Pairwise approach to the rescue?

[Figure: a data matrix of 100 observations by 2 variables, shown beside a plot of the percentage of rows contaminated (0% to 100%) against the percentage of cells contaminated (0% to 10%) for p = 30 and p = 2.]
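
The contrast between p = 30 and p = 2 can be reproduced with a one-line calculation, under the assumption (made here for illustration) that each cell is contaminated independently with probability ε, so that a row of p cells is affected with probability 1 − (1 − ε)^p. The function name `row_contamination_prob` is hypothetical.

```python
def row_contamination_prob(eps, p):
    """P(a row has at least one contaminated cell) under independent cells."""
    return 1.0 - (1.0 - eps) ** p


for eps in (0.01, 0.05, 0.10):
    full = row_contamination_prob(eps, 30)  # all 30 variables at once
    pair = row_contamination_prob(eps, 2)   # one pair of variables at a time
    print(f"eps={eps:.2f}  p=30: {full:.3f}  p=2: {pair:.3f}")
```

Working with two variables at a time keeps the share of affected rows close to twice the cell contamination rate, whereas with all 30 variables most rows contain at least one contaminated cell once ε reaches 10%.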

slide-12
SLIDE 12

Estimating dependence

Problem:

To estimate dependence in multivariate settings with cellwise contamination.

Solution:

  1. pairwise covariance matrices
  2. correct for positive definiteness

slide-13
SLIDE 13

Positive definite?

  • Standard approach of Maronna and Zamar (2002) suffers from outlier propagation so fails for cellwise contamination.
  • Higham (2002) outlines the nearest positive definite (NPD) approach:
      1. Perform a spectral decomposition of the symmetric matrix of pairwise covariances
      2. Any negative eigenvalues are set to some small positive constant
      3. Reconstruct the covariance matrix using the adjusted eigenstructure

⇒ NPD approach produces poorly conditioned covariance matrices.
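
The three steps listed above can be sketched with NumPy as follows. This follows the slide's description (eigenvalue clipping) rather than any particular published algorithm; the function name `clip_to_positive_definite`, the threshold `eps = 1e-6` and the example matrix are assumptions.

```python
import numpy as np


def clip_to_positive_definite(s, eps=1e-6):
    """Replace eigenvalues below eps by eps and rebuild the matrix."""
    s = (s + s.T) / 2.0                      # enforce exact symmetry
    eigvals, eigvecs = np.linalg.eigh(s)     # step 1: spectral decomposition
    eigvals = np.maximum(eigvals, eps)       # step 2: raise eigenvalues below eps to eps
    return (eigvecs * eigvals) @ eigvecs.T   # step 3: reconstruct


# A symmetric "pairwise" matrix that is not positive definite (illustrative).
s = np.array([[1.0, 0.9, -0.9],
              [0.9, 1.0, 0.9],
              [-0.9, 0.9, 1.0]])
fixed = clip_to_positive_definite(s)

print(np.linalg.eigvalsh(s))       # one eigenvalue is negative
print(np.linalg.eigvalsh(fixed))   # all eigenvalues are now positive
print(np.linalg.cond(fixed))       # large condition number
```

The large condition number in the last line is the poor conditioning the slide refers to: the clipped eigenvalue is tiny relative to the others, so inverting the matrix directly is unstable.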

slide-14
SLIDE 14

Estimating dependence

Problem:

To estimate dependence in multivariate settings with cellwise contamination.

Solution:

  1. pairwise covariance matrices
  2. correct for positive definiteness
  3. regularisation procedure
slide-15
SLIDE 15

Precision matrices

  • In many practical applications, the covariance matrix is not what is really required.
  • PCA, Mahalanobis distance, LDA, etc. use the inverse covariance matrix: the precision matrix, $\Theta = \Sigma^{-1}$.
  • Precision matrices are also of interest in modelling Gaussian Markov random fields, where zeros in the precision matrix correspond to conditional independence between variables.

Sparsity!

In many applications, it is often useful to impose a level of sparsity on the estimated precision matrix.
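
To make the link between zeros and conditional independence concrete, the sketch below uses a small hypothetical tridiagonal precision matrix. It relies on the standard Gaussian result that the partial correlation of variables i and j given the rest is −θ_ij / √(θ_ii θ_jj); the matrix values are chosen only for illustration.

```python
import numpy as np

# Hypothetical tridiagonal precision matrix: only neighbouring variables interact.
theta = np.array([[1.0, 0.4, 0.0, 0.0],
                  [0.4, 1.0, 0.4, 0.0],
                  [0.0, 0.4, 1.0, 0.4],
                  [0.0, 0.0, 0.4, 1.0]])
sigma = np.linalg.inv(theta)

# Partial correlation of i and j given the rest: -theta_ij / sqrt(theta_ii * theta_jj).
d = np.sqrt(np.diag(theta))
partial = -theta / np.outer(d, d) + 0.0  # "+ 0.0" turns -0.0 into 0.0
np.fill_diagonal(partial, 1.0)

print(np.round(sigma, 3))    # no zero entries: every pair is marginally dependent
print(np.round(partial, 3))  # zero wherever theta is zero
```

The covariance matrix is dense even though the precision matrix is sparse, which is why sparsity is imposed on the precision matrix rather than on the covariance matrix.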
slide-16
SLIDE 16

Regularisation techniques

Graphical lasso (glasso) (Friedman, Hastie and Tibshirani, 2008) minimises the penalised negative Gaussian log-likelihood:

$$f(\Theta) = \operatorname{tr}(\hat\Sigma\Theta) - \log|\Theta| + \lambda\|\Theta\|_1,$$

where $\|\Theta\|_1$ is the L1 norm and $\lambda$ is a tuning parameter for the amount of shrinkage.

Quadratic Inverse Covariance (QUIC) (Hsieh et al., 2011) solves the same minimisation problem as the glasso but uses a second order approach.

Constrained ℓ1-minimisation for inverse matrix estimation (CLIME) (Cai, Liu and Luo, 2011) solves the following optimisation problem:

$$\min \|\Theta\|_1 \quad \text{subject to} \quad \|\hat\Sigma\Theta - I\|_\infty \le \lambda.$$
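
The two criteria can be evaluated directly for any candidate Θ, which is a convenient way to check a solver's output. The sketch below only evaluates them and does not solve either problem. It assumes that both norms are taken elementwise (the sum of absolute entries, diagonal included, for the L1 norm and the largest absolute entry for the ∞ norm); the function names are hypothetical.

```python
import numpy as np


def glasso_objective(sigma_hat, theta, lam):
    """tr(sigma_hat @ theta) - log det(theta) + lam * sum |theta_ij|."""
    sign, logdet = np.linalg.slogdet(theta)
    if sign <= 0:
        return np.inf  # log det is undefined when det(theta) <= 0
    return np.trace(sigma_hat @ theta) - logdet + lam * np.abs(theta).sum()


def clime_feasible(sigma_hat, theta, lam):
    """Check the CLIME constraint max |sigma_hat @ theta - I| <= lam."""
    p = sigma_hat.shape[0]
    return np.abs(sigma_hat @ theta - np.eye(p)).max() <= lam


sigma_hat = np.array([[1.0, 0.5],
                      [0.5, 1.0]])
exact_inverse = np.linalg.inv(sigma_hat)
identity = np.eye(2)

# With no penalty the exact inverse has the smaller objective value.
print(glasso_objective(sigma_hat, exact_inverse, 0.0))
print(glasso_objective(sigma_hat, identity, 0.0))
# The identity violates the CLIME constraint for a small lambda.
print(clime_feasible(sigma_hat, identity, 0.1))
```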

slide-17
SLIDE 17

Estimating dependence

Problem:

To estimate dependence in multivariate settings with cellwise contamination.

Solution:

  1. pairwise covariance matrices
  2. correct for positive definiteness
  3. regularisation procedure

Evaluation:

  • Is the estimate “close” to the truth?
slide-18
SLIDE 18

Simulation design

  • sample size: n = 100
  • replications: N = 100
  • variables: p = 30, 60, 90
  • scenarios: banded, sparse and dense precision matrices, $\Theta$
  • true data: $N(0, \Sigma)$ where $\Sigma = \Theta^{-1}$
  • contamination: 0% to 25%, randomly scattered component-wise

  1. Banded
  2. Sparse
  3. Dense
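
One replication of such a design might be generated as in the sketch below. The slides do not give the exact precision matrices or the outlier distribution, so the tridiagonal `banded_precision`, the constant `outlier_value` and both function names are hypothetical stand-ins.

```python
import numpy as np


def banded_precision(p, off_diag=0.4):
    """Hypothetical banded Theta: unit diagonal, off_diag on the first off-diagonals."""
    theta = np.eye(p)
    idx = np.arange(p - 1)
    theta[idx, idx + 1] = off_diag
    theta[idx + 1, idx] = off_diag
    return theta


def contaminate_cells(x, eps, outlier_value, rng):
    """Replace a proportion eps of the cells in every column with outlier_value."""
    x = x.copy()
    n, p = x.shape
    k = int(round(eps * n))
    for j in range(p):
        rows = rng.choice(n, size=k, replace=False)
        x[rows, j] = outlier_value
    return x


rng = np.random.default_rng(0)
n, p = 100, 30
theta = banded_precision(p)
sigma = np.linalg.inv(theta)
sigma = (sigma + sigma.T) / 2.0  # remove rounding asymmetry before sampling
clean = rng.multivariate_normal(np.zeros(p), sigma, size=n)
dirty = contaminate_cells(clean, eps=0.10, outlier_value=10.0, rng=rng)

changed = clean != dirty
print(changed.mean())              # share of contaminated cells
print(changed.any(axis=1).mean())  # share of rows with at least one bad cell
```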
slide-19
SLIDE 19

10% contamination

[Figure: scatter plots of X2 against X1 with 10% contamination, in two panels, "Extreme" and "Moderate"; both axes run from −20 to 20.]

slide-20
SLIDE 20

Evaluating performance

Entropy Loss

Measures how “close” $\hat\Theta$ is to $\Theta$,

$$L(\Theta, \hat\Theta) = \operatorname{tr}(\Theta^{-1}\hat\Theta) - \log|\Theta^{-1}\hat\Theta| - p.$$

Reported as percentage relative improvement in average loss:

$$\operatorname{PRIAL}(\hat\Theta) = \frac{L(\Theta, \hat\Theta_0) - L(\Theta, \hat\Theta)}{L(\Theta, \hat\Theta_0)} \times 100,$$

where $\hat\Theta_0$ is the estimated precision matrix after a regularisation technique has been applied to the classical sample covariance matrix for uncontaminated data.
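
Both quantities are simple to compute. The sketch below assumes that Θ and its estimate are positive definite NumPy arrays; the function names `entropy_loss` and `prial` are chosen here for illustration.

```python
import numpy as np


def entropy_loss(theta, theta_hat):
    """tr(Theta^{-1} Theta_hat) - log det(Theta^{-1} Theta_hat) - p."""
    p = theta.shape[0]
    # Solve Theta M = Theta_hat instead of forming Theta^{-1} explicitly.
    m = np.linalg.solve(theta, theta_hat)
    _, logdet = np.linalg.slogdet(m)  # assumes det(M) > 0, true for positive definite inputs
    return np.trace(m) - logdet - p


def prial(loss_reference, loss_estimate):
    """Percentage relative improvement in average loss over the reference."""
    return 100.0 * (loss_reference - loss_estimate) / loss_reference


theta = np.eye(3)
print(entropy_loss(theta, theta))        # a perfect estimate has zero loss
print(entropy_loss(theta, 2.0 * theta))  # equals 3 - 3 * log(2)
print(prial(1.0, 0.5))                   # loss halved relative to the reference
print(prial(1.0, 3.0))                   # negative: worse than the reference
```

A PRIAL of zero means the estimator matches the reference (regularised classical estimate on clean data), and negative values mean it does worse.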

slide-21
SLIDE 21

Changing dimension: p = 90

[Figure: entropy loss PRIAL (axis marked at −300, −200 and −100) against percent contamination in each variable (5 to 25) for CLIME, banded, p = 90. Methods: Qn with NPD, τ with NPD, Pn with NPD, MCD, Pn with OGK, Classical.]

slide-22
SLIDE 22

Changing dimension: p = 60

[Figure: entropy loss PRIAL (axis marked at −300, −200 and −100) against percent contamination in each variable (5 to 25) for CLIME, banded, p = 60. Methods: Qn with NPD, τ with NPD, Pn with NPD, MCD, Pn with OGK, Classical.]

slide-23
SLIDE 23

Changing dimension: p = 30

[Figure: entropy loss PRIAL (axis marked at −300, −200 and −100) against percent contamination in each variable (5 to 25) for CLIME, banded, p = 30. Methods: Qn with NPD, τ with NPD, Pn with NPD, MCD, Pn with OGK, Classical.]

slide-24
SLIDE 24

Changing regularisation routine: GLASSO

[Figure: entropy loss PRIAL (axis marked at −300, −200 and −100) against percent contamination in each variable (5 to 25) for GLASSO, banded, p = 60. Methods: Qn with NPD, τ with NPD, Pn with NPD, MCD, Pn with OGK, Classical.]

slide-25
SLIDE 25

Changing regularisation routine: QUIC

[Figure: entropy loss PRIAL (axis marked at −300, −200 and −100) against percent contamination in each variable (5 to 25) for QUIC, banded, p = 60. Methods: Qn with NPD, τ with NPD, Pn with NPD, MCD, Pn with OGK, Classical.]

slide-26
SLIDE 26

Changing experiment: banded

[Figure: entropy loss PRIAL (axis marked at −300, −200 and −100) against percent contamination in each variable (5 to 25) for QUIC, banded, p = 90. Methods: Qn with NPD, τ with NPD, Pn with NPD, MCD, Pn with OGK, Classical.]

slide-27
SLIDE 27

Changing experiment: sparse

[Figure: entropy loss PRIAL (axis marked at −300, −200 and −100) against percent contamination in each variable (5 to 25) for QUIC, scattered, p = 90. Methods: Qn with NPD, τ with NPD, Pn with NPD, MCD, Pn with OGK, Classical.]

slide-28
SLIDE 28

Changing experiment: dense

[Figure: entropy loss PRIAL (axis marked at −300, −200 and −100) against percent contamination in each variable (5 to 25) for QUIC, dense, p = 30. Methods: Qn with NPD, τ with NPD, Pn with NPD, MCD, Pn with OGK, Classical.]

slide-29
SLIDE 29

Estimating dependence

Problem:

To estimate dependence in multivariate settings with cellwise contamination.

Solution:

  1. pairwise covariance matrices
  2. correct for positive definiteness
  3. regularisation procedure

Evaluation:

  • Is the estimate “close” to the truth?
  • Are we able to recover the support of a Gaussian graphical model?

slide-30
SLIDE 30

QUIC, uncontaminated data, N = 100 replications

[Figure: six panels (True Θ, Classic, MCD, Pn with NPD, Qn with NPD, τ with NPD) with a scale running from 20 to 100.]

slide-31
SLIDE 31

QUIC, 10% contamination, N = 100 replications

[Figure: six panels (True Θ, Classic, MCD, Pn with NPD, Qn with NPD, τ with NPD) with a scale running from 20 to 100.]

slide-32
SLIDE 32

Evaluating performance

Matthews correlation coefficient (MCC)

Takes into account the number of true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN),

$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}.$$

Basically the correlation between the observed and predicted binary classifications.
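
For support recovery, the four counts come from comparing the zero pattern of the estimated precision matrix with the true one. The sketch below assumes that only off-diagonal pairs i < j are counted, that an entry is "nonzero" when its absolute value exceeds a small tolerance, and that the MCC is set to 0 when the denominator vanishes; these conventions and the function names are choices made here.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from the four confusion counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # convention used here when a margin is empty
    return (tp * tn - fp * fn) / denom


def support_counts(theta_true, theta_hat, tol=1e-8):
    """Count TP, TN, FP, FN over the off-diagonal pairs i < j."""
    p = len(theta_true)
    tp = tn = fp = fn = 0
    for i in range(p):
        for j in range(i + 1, p):
            truth = abs(theta_true[i][j]) > tol
            found = abs(theta_hat[i][j]) > tol
            if truth and found:
                tp += 1
            elif truth:
                fn += 1
            elif found:
                fp += 1
            else:
                tn += 1
    return tp, tn, fp, fn


# True graph has edges (0,1), (1,2), (2,3); the estimate misses (2,3) and adds (0,3).
theta_true = [[1.0, 0.4, 0.0, 0.0],
              [0.4, 1.0, 0.4, 0.0],
              [0.0, 0.4, 1.0, 0.4],
              [0.0, 0.0, 0.4, 1.0]]
theta_hat = [[1.0, 0.3, 0.0, 0.2],
             [0.3, 1.0, 0.5, 0.0],
             [0.0, 0.5, 1.0, 0.0],
             [0.2, 0.0, 0.0, 1.0]]

counts = support_counts(theta_true, theta_hat)
print(counts)
print(mcc(*counts))
```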

slide-33
SLIDE 33

MCC: p = 30

[Figure: Matthews correlation coefficient (0.0 to 0.4) against percent contamination in each variable (5 to 25) for QUIC, scattered, p = 30. Methods: Qn with NPD, τ with NPD, Pn with NPD, MCD, Pn with OGK, Classical.]

slide-34
SLIDE 34

MCC: p = 60

[Figure: Matthews correlation coefficient (0.0 to 0.4) against percent contamination in each variable (5 to 25) for QUIC, scattered, p = 60. Methods: Qn with NPD, τ with NPD, Pn with NPD, MCD, Pn with OGK, Classical.]

slide-35
SLIDE 35

MCC: p = 90

[Figure: Matthews correlation coefficient (0.0 to 0.4) against percent contamination in each variable (5 to 25) for QUIC, scattered, p = 90. Methods: Qn with NPD, τ with NPD, Pn with NPD, MCD, Pn with OGK, Classical.]

slide-36
SLIDE 36

Other considerations

  • Contaminating model: compare with the missingness literature
  • Choice of sparsity parameter: the billion euro question
  • p > n: looks promising
slide-37
SLIDE 37

QUIC, n = 50, p = 60, sparse precision matrix

[Figure: entropy loss PRIAL (axis marked at −300, −200 and −100) against percent contamination in each variable (5 to 25). Methods: Qn with NPD, τ with NPD, Pn with NPD, Classical.]

slide-38
SLIDE 38

Outline

  • Robust scale estimation with Pn
  • Robust pairwise covariance estimation
  • Robust covariance and precision matrices
  • Summary and key references

slide-39
SLIDE 39

Summary

Problem:

To estimate dependence in multivariate settings with cellwise contamination.

Solution:

  1. pairwise covariance matrices
  2. correct for positive definiteness
  3. regularisation procedure

Result:

  • Performs well with moderate amounts of outliers
  • Looks promising for p > n problems
slide-40
SLIDE 40

References

Alqallaf, F., Van Aelst, S., Yohai, V. J. and Zamar, R. H. (2009). Propagation of outliers in multivariate data. The Annals of Statistics, 37:311–331.

Cai, T., Liu, W. and Luo, X. (2011). A constrained ℓ1 minimization approach to sparse precision matrix estimation. Journal of the American Statistical Association, 106:594–607.

Friedman, J., Hastie, T. and Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3):432–441.

Gnanadesikan, R. and Kettenring, J. R. (1972). Robust estimates, residuals and outlier detection with multiresponse data. Biometrics, 28(1):81–124.

slide-41
SLIDE 41

References

Higham, N. J. (2002). Computing the nearest correlation matrix–a problem from finance. IMA Journal of Numerical Analysis, 22(3):329–343.

Hsieh, C.-J., Sustik, M. A., Dhillon, I. S. and Ravikumar, P. K. (2011). Sparse inverse covariance matrix estimation using quadratic approximation. Advances in Neural Information Processing Systems 24, 2330–2338.

Maronna, R. and Zamar, R. (2002). Robust estimates of location and dispersion for high-dimensional datasets. Technometrics, 44(4):307–317.

Tarr, G., Müller, S. and Weber, N. C. (2012). A robust scale estimator based on pairwise means. Journal of Nonparametric Statistics, 24(1):187–199.