Robust estimation of precision matrices under cellwise contamination

Scale Covariance Multivariate Conclusion
Robust estimation of precision matrices under cellwise contamination
Garth Tarr, Samuel Müller and Neville Weber
COMPSTAT 2014

Outline
[Diagram: robust scale estimator, $P_n$; covariance; covariance matrix; autocovariance; short range dependence; long range dependence; inverse covariance matrix estimation]

Outline
• Robust scale estimation with $P_n$
• Robust pairwise covariance estimation
• Robust covariance and precision matrices
• Summary and key references

Pairwise mean scale estimator: $P_n$
• Consider the U-statistic, based on the pairwise mean kernel,
$$U_n(X) := \binom{n}{2}^{-1} \sum_{i<j} \frac{X_i + X_j}{2}.$$
• Let $H(t) = P((X_i + X_j)/2 \le t)$ be the cdf of the kernels with corresponding empirical distribution function,
$$H_n(t) := \binom{n}{2}^{-1} \sum_{i<j} I\left\{ \frac{X_i + X_j}{2} \le t \right\}, \quad \text{for } t \in \mathbb{R}.$$
Definition (Interquartile range of pairwise means)
$$P_n = c \left[ H_n^{-1}(0.75) - H_n^{-1}(0.25) \right],$$
where $c \approx 1.048$ is a correction factor to ensure $P_n$ is consistent for the standard deviation when the underlying observations are Gaussian.
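The definition of $P_n$ can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' implementation: the function name `pn_scale` is hypothetical, the quartiles use NumPy's default interpolation rather than a strict inverse of $H_n$, and all pairs are enumerated explicitly, so memory grows like $n^2$.

```python
import numpy as np

def pn_scale(x, c=1.048):
    """Interquartile range of the pairwise means, scaled by c (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    # Indices of all pairs with i < j.
    i, j = np.triu_indices(x.size, k=1)
    pair_means = (x[i] + x[j]) / 2.0
    # Empirical quartiles of the pairwise means (NumPy default interpolation).
    q25, q75 = np.quantile(pair_means, [0.25, 0.75])
    return c * (q75 - q25)

rng = np.random.default_rng(0)
clean = rng.normal(loc=0.0, scale=2.0, size=1000)
dirty = clean.copy()
dirty[:50] = 1000.0  # replace 5% of the observations with gross outliers

print("clean:", pn_scale(clean), np.std(clean))
print("dirty:", pn_scale(dirty), np.std(dirty))
```

On the clean sample both estimates should be near the true standard deviation of 2; on the contaminated sample the classical standard deviation is dominated by the outliers while `pn_scale` moves only modestly.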

From scale to covariance: the GK device
• Gnanadesikan and Kettenring (1972) relate scale and covariance using the following identity,
$$\operatorname{cov}(X, Y) = \frac{1}{4\alpha\beta} \left[ \operatorname{var}(\alpha X + \beta Y) - \operatorname{var}(\alpha X - \beta Y) \right],$$
where $X$ and $Y$ are random variables.
• In general, $X$ and $Y$ can have different units, so we set $\alpha = 1/\sqrt{\operatorname{var}(X)}$ and $\beta = 1/\sqrt{\operatorname{var}(Y)}$.
• Replacing variance with $P_n^2$ we can similarly construct,
$$\gamma_P(X, Y) = \frac{1}{4\alpha\beta} \left[ P_n^2(\alpha X + \beta Y) - P_n^2(\alpha X - \beta Y) \right],$$
where $\alpha = 1/P_n(X)$ and $\beta = 1/P_n(Y)$.
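A minimal sketch of the GK device with a pluggable scale estimator follows. The names `pn_scale` and `gk_cov` are hypothetical, and `pn_scale` repeats the interquartile-range-of-pairwise-means construction with NumPy's default quantile interpolation, which is an assumption rather than the authors' exact choice.

```python
import numpy as np

def pn_scale(x, c=1.048):
    """Interquartile range of the pairwise means, scaled by c (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    i, j = np.triu_indices(x.size, k=1)
    q25, q75 = np.quantile((x[i] + x[j]) / 2.0, [0.25, 0.75])
    return c * (q75 - q25)

def gk_cov(x, y, scale=pn_scale):
    """GK identity with the variance replaced by a squared robust scale."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Standardise so that the two variables are on comparable units.
    alpha = 1.0 / scale(x)
    beta = 1.0 / scale(y)
    plus = scale(alpha * x + beta * y) ** 2
    minus = scale(alpha * x - beta * y) ** 2
    return (plus - minus) / (4.0 * alpha * beta)

rng = np.random.default_rng(1)
true_cov = np.array([[4.0, 1.2], [1.2, 1.0]])
xy = rng.multivariate_normal([0.0, 0.0], true_cov, size=1000)
print(gk_cov(xy[:, 0], xy[:, 1]), np.cov(xy.T)[0, 1])
```

Passing `scale=np.std` recovers the classical identity, which is a convenient sanity check for the algebra.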

Estimating dependence
Problem: To estimate dependence in multivariate settings with cellwise contamination.

[Figure: percentage of rows contaminated (0% to 100%) against percentage of cells contaminated (0% to 10%) for p = 30, shown beside a schematic data matrix of 100 observations by 30 variables.]
For details see Alqallaf et al. (2009).

Estimating dependence
Problem: To estimate dependence in multivariate settings with cellwise contamination.
Solution:
1. pairwise covariance matrices

Pairwise approach to the rescue?
[Figure: percentage of rows contaminated (0% to 100%) against percentage of cells contaminated (0% to 10%) for p = 30 and p = 2, shown beside a schematic data matrix of 100 observations by 2 variables.]
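The contrast between p = 30 and p = 2 can be reproduced with a one-line calculation. The sketch below assumes every cell is contaminated independently with the same probability, in which case a row of p cells is clean with probability (1 - rate)^p; the function name is hypothetical.

```python
def row_contamination_rate(cell_rate, p):
    """Probability that a row of p cells contains at least one contaminated cell,
    assuming cells are contaminated independently with probability cell_rate."""
    return 1.0 - (1.0 - cell_rate) ** p

for p in (2, 30):
    rates = [round(row_contamination_rate(e, p), 3) for e in (0.0, 0.05, 0.10)]
    print(f"p = {p}: rows contaminated at 0%, 5%, 10% of cells -> {rates}")
```

With only two variables at a time the proportion of contaminated rows stays close to the cell rate, which is the motivation for working pairwise.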

Estimating dependence
Problem: To estimate dependence in multivariate settings with cellwise contamination.
Solution:
1. pairwise covariance matrices
2. correct for positive definiteness

Positive definite?
• Standard approach of Maronna and Zamar (2002) suffers from outlier propagation so fails for cellwise contamination.
• Higham (2002) outlines the nearest positive definite (NPD) approach:
1. Perform a spectral decomposition of the symmetric matrix of pairwise covariances
2. Any negative eigenvalues are set to some small positive constant
3. Reconstruct the covariance matrix using the adjusted eigenstructure
However, the NPD approach produces poorly conditioned covariance matrices.
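The three NPD steps listed above can be sketched with an eigendecomposition. This is a simplified illustration of those steps only, not a full implementation of Higham's algorithm; the function name and the floor `eps` are assumptions, and `np.maximum` also lifts any tiny positive eigenvalues up to the floor.

```python
import numpy as np

def npd_by_eigenvalue_floor(S, eps=1e-6):
    """Spectral decomposition, floor the eigenvalues at eps, then reconstruct."""
    S = (S + S.T) / 2.0                 # enforce exact symmetry
    vals, vecs = np.linalg.eigh(S)      # step 1: spectral decomposition
    vals = np.maximum(vals, eps)        # step 2: raise eigenvalues below eps to eps
    return (vecs * vals) @ vecs.T       # step 3: V diag(vals) V^T

# A symmetric matrix of pairwise "correlations" that is not positive definite.
S = np.array([[ 1.0, 0.9, -0.9],
              [ 0.9, 1.0,  0.9],
              [-0.9, 0.9,  1.0]])
S_pd = npd_by_eigenvalue_floor(S)

print(np.linalg.eigvalsh(S))
print(np.linalg.eigvalsh(S_pd))
print(np.linalg.cond(S_pd))
```

The condition number printed at the end is large because the smallest eigenvalue sits at the floor, which illustrates the conditioning problem mentioned on the slide.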

Estimating dependence
Problem: To estimate dependence in multivariate settings with cellwise contamination.
Solution:
1. pairwise covariance matrices
2. correct for positive definiteness
3. regularisation procedure

Precision matrices
• In many practical applications, the covariance matrix is not what is really required.
• PCA, Mahalanobis distance, LDA, etc. use the inverse covariance matrix: the precision matrix, $\Theta = \Sigma^{-1}$.
• Precision matrices are also of interest in modelling Gaussian Markov random fields, where zeros in the precision matrix correspond to conditional independence between variables.
Sparsity! In many applications, it is useful to impose a level of sparsity on the estimated precision matrix.
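A small numerical illustration of why sparsity is imposed on the precision matrix rather than on the covariance matrix: a tridiagonal (banded) $\Theta$ generally has a fully dense inverse. The matrix below is a hypothetical example chosen for illustration, not one taken from the talk.

```python
import numpy as np

p = 5
Theta = np.eye(p)
idx = np.arange(p - 1)
Theta[idx, idx + 1] = 0.4   # first super-diagonal
Theta[idx + 1, idx] = 0.4   # first sub-diagonal

Sigma = np.linalg.inv(Theta)

print(np.round(Theta, 3))   # zeros outside the band: conditional independence
print(np.round(Sigma, 3))   # every entry non-zero: marginal dependence everywhere
```

Variables 1 and 5 are marginally correlated in `Sigma`, yet the zero in the corresponding entry of `Theta` says they are conditionally independent given the others.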

Regularisation techniques
Graphical lasso (glasso) (Friedman, Hastie, Tibshirani, 2007) minimises the penalised negative Gaussian log-likelihood:
$$f(\Theta) = \operatorname{tr}(\hat{\Sigma}\Theta) - \log|\Theta| + \lambda \|\Theta\|_1,$$
where $\|\Theta\|_1$ is the $L_1$ norm and $\lambda$ is a tuning parameter for the amount of shrinkage.
Quadratic Inverse Covariance (QUIC) (Hsieh et al., 2011) solves the same minimisation problem as the glasso but uses a second order approach.
Constrained $\ell_1$-minimisation for inverse matrix estimation (CLIME) (Cai, Liu and Luo, 2011) solves the following optimisation problem:
$$\min \|\Theta\|_1 \quad \text{subject to:} \quad \|\hat{\Sigma}\Theta - I\|_\infty \le \lambda.$$
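To make the two criteria concrete, the sketch below evaluates the glasso objective and checks the CLIME constraint for a given candidate $\Theta$. It does not solve either problem. Both function names are hypothetical, and two conventions are assumed: $\|\Theta\|_1$ is taken as the sum of absolute values of all entries (diagonal included), and $\|\cdot\|_\infty$ as the largest absolute entry.

```python
import numpy as np

def glasso_objective(S, Theta, lam):
    """tr(S Theta) - log|Theta| + lam * sum|Theta_ij| for positive definite Theta."""
    if np.linalg.eigvalsh((Theta + Theta.T) / 2.0).min() <= 0:
        return np.inf  # the log-determinant term is undefined outside the PD cone
    _, logdet = np.linalg.slogdet(Theta)
    return np.trace(S @ Theta) - logdet + lam * np.abs(Theta).sum()

def clime_feasible(S, Theta, lam):
    """Check the CLIME constraint: largest absolute entry of S Theta - I is <= lam."""
    p = S.shape[0]
    return bool(np.max(np.abs(S @ Theta - np.eye(p))) <= lam)

S = np.array([[1.0, 0.5], [0.5, 1.0]])
Theta_exact = np.linalg.inv(S)
Theta_diag = np.eye(2)

for name, Theta in [("inverse of S", Theta_exact), ("identity", Theta_diag)]:
    print(name, glasso_objective(S, Theta, lam=0.1), clime_feasible(S, Theta, lam=0.1))
```

Any positive semidefinite estimate of $\hat{\Sigma}$ can be passed as `S`, which is what allows a robust pairwise covariance matrix to be plugged in after the positive definiteness correction.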

Estimating dependence
Problem: To estimate dependence in multivariate settings with cellwise contamination.
Solution:
1. pairwise covariance matrices
2. correct for positive definiteness
3. regularisation procedure
Evaluation:
• Is the estimate “close” to the truth?

Simulation design
sample size: $n = 100$
replications: $N = 100$
variables: $p = 30, 60, 90$
scenarios: banded, sparse and dense precision matrices, $\Theta$
true data: $N(0, \Sigma)$ where $\Sigma = \Theta^{-1}$
contamination: 0% to 25% randomly scattered component-wise
[Figure: three panels showing the precision matrix patterns: 1. Banded, 2. Sparse, 3. Dense]
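One replication of such a design might be generated as follows. This is a sketch under stated assumptions, since the slides do not give the exact precision matrices or the contaminating distribution: the tridiagonal `banded_precision`, the independent Bernoulli mask, and the constant `outlier_value` are all hypothetical choices made for illustration.

```python
import numpy as np

def banded_precision(p, off_diag=0.4):
    """Hypothetical banded precision matrix: ones on the diagonal, off_diag beside it."""
    Theta = np.eye(p)
    idx = np.arange(p - 1)
    Theta[idx, idx + 1] = off_diag
    Theta[idx + 1, idx] = off_diag
    return Theta

def simulate_cellwise(n, Theta, cell_rate, rng, outlier_value=10.0):
    """Draw N(0, Theta^{-1}) data, then overwrite randomly chosen cells."""
    p = Theta.shape[0]
    Sigma = np.linalg.inv(Theta)
    Sigma = (Sigma + Sigma.T) / 2.0          # remove round-off asymmetry
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    # Each cell is contaminated independently with probability cell_rate (assumption).
    mask = rng.random(X.shape) < cell_rate
    return np.where(mask, outlier_value, X), mask

rng = np.random.default_rng(2014)
X, mask = simulate_cellwise(n=100, Theta=banded_precision(30), cell_rate=0.10, rng=rng)
print(X.shape, mask.mean(), mask.any(axis=1).mean())
```

The last two printed numbers are the proportion of contaminated cells and the proportion of rows with at least one contaminated cell.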

10% contamination
[Figure: two scatter plots of $X_2$ against $X_1$, both axes from -20 to 20, with panels titled Extreme and Moderate.]

Evaluating performance
Entropy Loss
Measures how “close” $\hat{\Theta}$ is to $\Theta$,
$$L(\Theta, \hat{\Theta}) = \operatorname{tr}(\Theta^{-1}\hat{\Theta}) - \log|\Theta^{-1}\hat{\Theta}| - p.$$
Reported as percentage relative improvement in average loss:
$$\operatorname{PRIAL}(\hat{\Theta}) = \frac{L(\Theta, \hat{\Theta}_0) - L(\Theta, \hat{\Theta})}{L(\Theta, \hat{\Theta}_0)} \times 100,$$
where $\hat{\Theta}_0$ is the estimated precision matrix after a regularisation technique has been applied to the classical sample covariance matrix for uncontaminated data.
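Both quantities are straightforward to compute. In the sketch below the function names are hypothetical, and `prial` averages the losses over replications before forming the ratio, which is one reading of "relative improvement in average loss".

```python
import numpy as np

def entropy_loss(Theta, Theta_hat):
    """tr(Theta^{-1} Theta_hat) - log|Theta^{-1} Theta_hat| - p."""
    p = Theta.shape[0]
    M = np.linalg.solve(Theta, Theta_hat)   # Theta^{-1} Theta_hat without forming the inverse
    _, logdet = np.linalg.slogdet(M)
    return np.trace(M) - logdet - p

def prial(reference_losses, estimator_losses):
    """Percentage relative improvement in average loss over the reference estimator."""
    ref = np.mean(reference_losses)
    est = np.mean(estimator_losses)
    return 100.0 * (ref - est) / ref

Theta = np.eye(3)
print(entropy_loss(Theta, Theta))           # a perfect estimate
print(entropy_loss(Theta, 2.0 * Theta))     # an estimate that is too large by a factor of 2
print(prial([2.0, 2.0], [1.0, 1.0]))
```

A PRIAL of 0 means the estimator matches the reference, and negative values mean it does worse, which is why the vertical axes of the result plots run from 0 down to -300.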

Changing dimension: p = 90
[Figure: entropy loss PRIAL (0 to -300) against percent contamination in each variable (0 to 25) for CLIME, banded, p = 90. Methods: $Q_n$ with NPD, $\tau$ with NPD, $P_n$ with NPD, MCD, $P_n$ with OGK, Classical.]

Changing regularisation routine: GLASSO
[Figure: entropy loss PRIAL (0 to -300) against percent contamination in each variable (0 to 25) for GLASSO, banded, p = 60. Methods: $Q_n$ with NPD, $\tau$ with NPD, $P_n$ with NPD, MCD, $P_n$ with OGK, Classical.]

Changing regularisation routine: QUIC
[Figure: entropy loss PRIAL (0 to -300) against percent contamination in each variable (0 to 25) for QUIC, banded, p = 60. Methods: $Q_n$ with NPD, $\tau$ with NPD, $P_n$ with NPD, MCD, $P_n$ with OGK, Classical.]

Changing experiment: banded
[Figure: entropy loss PRIAL (0 to -300) against percent contamination in each variable (0 to 25) for QUIC, banded, p = 90. Methods: $Q_n$ with NPD, $\tau$ with NPD, $P_n$ with NPD, MCD, $P_n$ with OGK, Classical.]

Changing experiment: sparse
[Figure: entropy loss PRIAL (0 to -300) against percent contamination in each variable (0 to 25) for QUIC, scattered, p = 90. Methods: $Q_n$ with NPD, $\tau$ with NPD, $P_n$ with NPD, MCD, $P_n$ with OGK, Classical.]

Changing experiment: dense
[Figure: entropy loss PRIAL (0 to -300) against percent contamination in each variable (0 to 25) for QUIC, dense, p = 30. Methods: $Q_n$ with NPD, $\tau$ with NPD, $P_n$ with NPD, MCD, $P_n$ with OGK, Classical.]

Estimating dependence
Problem: To estimate dependence in multivariate settings with cellwise contamination.
Solution:
1. pairwise covariance matrices
2. correct for positive definiteness
3. regularisation procedure
Evaluation:
• Is the estimate “close” to the truth?
• Are we able to recover the support of a Gaussian graphical model?
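One way to summarise support recovery over many replications is to record, for each entry of the precision matrix, the percentage of replications in which the estimate is non-zero. The sketch below assumes that summary; the function name and the tolerance `tol` used to decide what counts as non-zero are hypothetical.

```python
import numpy as np

def selection_frequency(estimates, tol=1e-8):
    """Percentage of replications in which each entry is estimated as non-zero."""
    nonzero = np.stack([np.abs(E) > tol for E in estimates])
    return 100.0 * nonzero.mean(axis=0)

# Two toy "replications" of an estimated 3 x 3 precision matrix.
est_1 = np.array([[1.0, 0.3, 0.0],
                  [0.3, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
est_2 = np.array([[1.0, 0.2, 0.1],
                  [0.2, 1.0, 0.0],
                  [0.1, 0.0, 1.0]])

print(selection_frequency([est_1, est_2]))
```

Comparing the resulting matrix with the non-zero pattern of the true $\Theta$ shows which edges of the graphical model are recovered reliably and which are spurious.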

QUIC, uncontaminated data, N = 100 replications
[Figure: six panels (True $\Theta$, Classic, MCD, $P_n$ with NPD, $Q_n$ with NPD, $\tau$ with NPD) with a colour scale from 0 to 100.]
