Robust scatter regularization, G. Haesbroeck and C. Croux (PowerPoint presentation)

SLIDE 1

Robust scatter regularization

  • G. Haesbroeck and C. Croux

University of Liège - University of Leuven

COMPSTAT 2010

  • G. Haesbroeck and C. Croux (Belgium)

Robust scatter regularization COMPSTAT 2010 1 / 14

SLIDE 2

Introduction

Let $X = (X_1, \dots, X_p)^T$ be a $p$-dimensional random vector with $X \sim N_p(\mu, \Sigma)$, where $\mu$ is the mean and $\Sigma$ is the nonsingular covariance matrix.

Aim: estimate, in a robust way, $\mu$ and $\Theta = \Sigma^{-1}$ (the concentration matrix) using a sample $x_1, \dots, x_n$ of size $n$.

SLIDE 3

Maximum Likelihood estimator

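The ML estimator of $(\mu, \Theta)$ maximizes

$$\log\det(\Theta) - \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^T \Theta (x_i - \mu).$$

When the sample covariance matrix $S = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^T$ is nonsingular, $(\hat\mu_{ML}, \hat\Theta_{ML}) = (\bar{x}, S^{-1})$. When $S$ is singular (e.g. when $n \le p$), the objective is unbounded above in $\Theta$ and the ML estimator does not exist.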
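A minimal NumPy sketch of the unpenalized case, assuming the sample is stored row-wise in an $(n, p)$ array; the helper names `log_lik` and `ml_estimate` are hypothetical. It evaluates the objective above and shows that the closed form is no longer available once $S$ is rank deficient.

```python
import numpy as np


def log_lik(X, mu, Theta):
    """log det(Theta) - (1/n) * sum_i (x_i - mu)^T Theta (x_i - mu)."""
    sign, logdet = np.linalg.slogdet(Theta)
    if sign <= 0:
        return -np.inf  # Theta must be positive definite
    Xc = X - mu
    quad = np.einsum("ij,jk,ik->i", Xc, Theta, Xc)  # one quadratic form per row
    return logdet - quad.mean()


def ml_estimate(X):
    """Return (x_bar, S^{-1}) with S = (1/n) sum_i (x_i - x_bar)(x_i - x_bar)^T."""
    n, p = X.shape
    mu = X.mean(axis=0)
    Xc = X - mu
    S = Xc.T @ Xc / n
    if np.linalg.matrix_rank(S) < p:
        raise np.linalg.LinAlgError("S is singular: the ML estimator does not exist")
    return mu, np.linalg.inv(S)


rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))       # n > p: S is invertible
mu_hat, Theta_hat = ml_estimate(X)
print(log_lik(X, mu_hat, Theta_hat))

X_small = rng.standard_normal((5, 10))  # n < p: rank(S) <= n - 1 < p
try:
    ml_estimate(X_small)
except np.linalg.LinAlgError as err:
    print(err)
```

Rescaling $\hat\Theta_{ML}$ or moving $\hat\mu_{ML}$ away from $\bar{x}$ can only lower `log_lik`, which is a quick sanity check of the closed form.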

SLIDE 4

Regularized Maximum Likelihood estimator

The Regularized ML estimator of $(\mu, \Theta)$ maximizes

$$\log\det(\Theta) - \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^T \Theta (x_i - \mu) - \lambda J(\Theta),$$

where $\lambda \ge 0$ is the penalty parameter and $J$ is a penalty function. Typical choices:

  • L1-norm: $J(\Theta) = \sum_{i,j=1}^{p} |\Theta_{ij}|$
  • L2-norm: $J(\Theta) = \sum_{i,j=1}^{p} \Theta_{ij}^2$
  • ...
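To make the penalized objective concrete, here is a NumPy sketch; it is an illustration, not the authors' implementation, and all function names are hypothetical. Since the penalty does not involve $\mu$, $\hat\mu = \bar{x}$ and the objective reduces to $\log\det\Theta - \mathrm{tr}(S\Theta) - \lambda J(\Theta)$. For the L1 penalty this is the graphical lasso problem, which needs an iterative solver, so the solver below assumes the L2 penalty, where a closed form exists: setting the gradient $\Theta^{-1} - S - 2\lambda\Theta$ to zero gives a $\Theta$ with the eigenvectors of $S$ and eigenvalues $\theta_j$ solving $2\lambda\theta_j^2 + s_j\theta_j - 1 = 0$.

```python
import numpy as np


def penalty(Theta, kind):
    """J(Theta): sum_ij |Theta_ij| for "l1", sum_ij Theta_ij^2 for "l2"."""
    if kind == "l1":
        return np.abs(Theta).sum()
    if kind == "l2":
        return np.square(Theta).sum()
    raise ValueError(f"unknown penalty: {kind}")


def penalized_objective(X, mu, Theta, lam, kind):
    """log det(Theta) - (1/n) sum_i (x_i - mu)^T Theta (x_i - mu) - lam * J(Theta)."""
    _, logdet = np.linalg.slogdet(Theta)
    Xc = X - mu
    quad = np.einsum("ij,jk,ik->i", Xc, Theta, Xc).mean()
    return logdet - quad - lam * penalty(Theta, kind)


def reg_ml_l2(X, lam):
    """Closed-form regularized ML estimator for the L2 penalty (requires lam > 0)."""
    n = X.shape[0]
    mu = X.mean(axis=0)  # the penalty does not involve mu, so mu_hat = x_bar
    Xc = X - mu
    s, V = np.linalg.eigh(Xc.T @ Xc / n)  # S = V diag(s) V^T
    # Positive root of 2*lam*t^2 + s*t - 1 = 0, written in a cancellation-free form.
    t = 2.0 / (s + np.sqrt(s ** 2 + 8.0 * lam))
    return mu, (V * t) @ V.T  # V diag(t) V^T


rng = np.random.default_rng(1)
X = rng.standard_normal((10, 30))  # n < p: plain ML does not exist here
mu_hat, Theta_hat = reg_ml_l2(X, lam=0.5)
print(np.linalg.eigvalsh(Theta_hat).min() > 0)  # positive definite check
```

Because the L2-penalized objective is strictly concave in $\Theta$, this stationary point is its unique maximizer, even when $n < p$.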

SLIDE 5

Breakdown Point

Roughly speaking, the breakdown point is the smallest fraction of contamination that can drive the estimator over all bounds. For a scatter estimator, breakdown can occur through

  • explosion: $\lambda_1(\Theta) \to \infty$, or
  • implosion: $\lambda_p(\Theta) \to 0$,

where $\lambda_p(\Theta) \le \dots \le \lambda_1(\Theta)$ are the eigenvalues of $\Theta$.

SLIDE 6

Breakdown of the Regularized ML procedure

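$\mu = 0$, $\Sigma = I_p$, and the last observation is replaced by $x_n' = x_n + x e_1$ (an outlier shifted by $x$ along the first coordinate axis).

[Figure: smallest eigenvalue $\lambda_p(\hat\Theta)$ of the regularized ML estimator plotted against the outlier size $x$]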
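The implosion can be reproduced with a small experiment. The sketch below is not the computation behind the figure: it assumes the L2 penalty, for which $\hat\Theta$ has the eigenvectors of $S$ and eigenvalues $2/(s_j + \sqrt{s_j^2 + 8\lambda})$, together with arbitrary choices $n = 20$, $p = 5$, $\lambda = 0.1$ and a hypothetical helper `smallest_eigenvalue`. Shifting one observation by $x e_1$ inflates the largest eigenvalue $s_1$ of $S$, so the corresponding eigenvalue of $\hat\Theta$, which is $\lambda_p(\hat\Theta)$, shrinks towards 0.

```python
import numpy as np


def reg_ml_l2(X, lam):
    """Closed-form L2-penalized ML estimator (assumed penalty, lam > 0)."""
    mu = X.mean(axis=0)
    Xc = X - mu
    s, V = np.linalg.eigh(Xc.T @ Xc / X.shape[0])
    t = 2.0 / (s + np.sqrt(s ** 2 + 8.0 * lam))
    return mu, (V * t) @ V.T


def smallest_eigenvalue(X, x, lam):
    """lambda_p(Theta_hat) after replacing x_n by x_n + x * e_1."""
    Xc = X.copy()
    Xc[-1, 0] += x  # contaminate the last observation along the first axis
    _, Theta = reg_ml_l2(Xc, lam)
    return np.linalg.eigvalsh(Theta)[0]  # eigvalsh sorts eigenvalues ascending


rng = np.random.default_rng(2)
X = rng.standard_normal((20, 5))  # clean sample from N_5(0, I_5)
for x in [0, 5, 10, 20, 100, 1000]:
    print(x, smallest_eigenvalue(X, x, lam=0.1))
```

A single observation is enough to make $\lambda_p(\hat\Theta)$ as small as desired, so the regularized ML estimator has a breakdown point of $1/n$.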

SLIDE 7

Breakdown of the Regularized ML procedure

Robust alternatives are needed!

SLIDE 8

Minimum Covariance Determinant estimator

Find a subsample $H$ of size $h$ (with $n/2 \le h \le n$) minimizing the generalized variance $\log\det(\Sigma_H)$, where $\Sigma_H$ is the covariance matrix based on the $h$ points. The location and scatter MCD estimates are given by the mean and covariance matrix of the optimal subsample.
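For very small samples the definition can be applied literally by enumerating all $h$-subsets. The sketch below does that with NumPy and `itertools.combinations`; it only illustrates the criterion (practical algorithms such as FAST-MCD use C-steps instead), the name `mcd_exhaustive` is hypothetical, and it assumes $h > p$ so that subsample covariance matrices are generally nonsingular.

```python
import itertools

import numpy as np


def mcd_exhaustive(X, h):
    """Exact MCD: the h-subset with the smallest log det of its covariance matrix."""
    n, p = X.shape
    best_logdet, best_H = np.inf, None
    for H in itertools.combinations(range(n), h):
        cov = np.cov(X[list(H)], rowvar=False, bias=True)  # divisor h
        sign, logdet = np.linalg.slogdet(cov)
        if sign > 0 and logdet < best_logdet:  # skip singular subsamples
            best_logdet, best_H = logdet, H
    XH = X[list(best_H)]
    return XH.mean(axis=0), np.cov(XH, rowvar=False, bias=True), best_H


rng = np.random.default_rng(3)
X = rng.standard_normal((10, 2))
X[9] = [50.0, 50.0]  # one gross outlier
mu_mcd, Sigma_mcd, H = mcd_exhaustive(X, h=6)  # n/2 <= h <= n
print(H)  # indices of the optimal subsample
```

The number of subsets grows combinatorially in $n$, which is why this is only a didactic check.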

SLIDE 9

Regularized MCD estimator

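Find a subsample $H$ of size $h$ maximizing

$$\log\det(\Theta_H) - \frac{1}{h}\sum_{i \in H} (x_i - \mu_H)^T \Theta_H (x_i - \mu_H) - \lambda J(\Theta_H),$$

where $(\mu_H, \Theta_H)$ is the regularized ML estimator computed on $H$. The regularized MCD estimator is given by the regularized ML estimator computed on the optimal subsample.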
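The same brute-force idea carries over to the regularized criterion. In this NumPy sketch (hypothetical helper names), $J$ is assumed to be the L2 penalty, so the regularized ML estimator on each subsample has a closed form: $\Theta_H$ shares the eigenvectors of the subsample covariance matrix $S_H$, with eigenvalues $2/(s + \sqrt{s^2 + 8\lambda})$. The example deliberately takes $n < p$, where every subsample covariance matrix is singular and the unpenalized MCD criterion would be degenerate.

```python
import itertools

import numpy as np


def reg_ml_l2(X, lam):
    """Closed-form regularized ML estimator for J = L2 penalty (assumption)."""
    mu = X.mean(axis=0)
    Xc = X - mu
    s, V = np.linalg.eigh(Xc.T @ Xc / X.shape[0])
    t = 2.0 / (s + np.sqrt(s ** 2 + 8.0 * lam))
    return mu, (V * t) @ V.T


def criterion(XH, mu, Theta, lam):
    """log det(Theta_H) - (1/h) sum_{i in H} d_i - lam * J(Theta_H)."""
    _, logdet = np.linalg.slogdet(Theta)
    Xc = XH - mu
    quad = np.einsum("ij,jk,ik->i", Xc, Theta, Xc).mean()
    return logdet - quad - lam * np.square(Theta).sum()


def reg_mcd_exhaustive(X, h, lam):
    """Brute-force regularized MCD: best h-subset under the penalized criterion."""
    best_value, best = -np.inf, None
    for H in itertools.combinations(range(X.shape[0]), h):
        XH = X[list(H)]
        mu, Theta = reg_ml_l2(XH, lam)
        value = criterion(XH, mu, Theta, lam)
        if value > best_value:
            best_value, best = value, (mu, Theta, H)
    return best


rng = np.random.default_rng(4)
X = rng.standard_normal((8, 12))  # n = 8 < p = 12
X[7] += 20.0                       # shift the last observation far away
mu_rmcd, Theta_rmcd, H = reg_mcd_exhaustive(X, h=5, lam=0.5)
print(H)
```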

SLIDE 10

Properties of the Regularized MCD estimator

  • A. Robustness

The finite-sample breakdown point for joint location and scatter of the Regularized MCD estimator is equal to

$$\varepsilon^*((\hat\mu_{MCD}, \hat\Sigma_{MCD}); X) = \frac{\min(h, n - h + 1)}{n},$$

where $n/2 \le h \le n$ is the number of observations selected in the MCD solution. In particular, for $h = n/2$, $\varepsilon^*((\hat\mu_{MCD}, \hat\Sigma_{MCD}); X) = 1/2$.
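The formula is easy to tabulate. A small sketch using Python's `fractions` module (the function name is hypothetical) shows that the breakdown point is maximal around $h = n/2$ and drops to $1/n$ for $h = n$, where all observations are used, consistent with the breakdown of the plain regularized ML estimator seen earlier.

```python
from fractions import Fraction


def reg_mcd_breakdown(n, h):
    """Finite-sample breakdown point min(h, n - h + 1) / n of the regularized MCD."""
    if not (n / 2 <= h <= n):
        raise ValueError("h must satisfy n/2 <= h <= n")
    return Fraction(min(h, n - h + 1), n)


for h in (25, 26, 40, 50):
    print(h, reg_mcd_breakdown(50, h))
# prints:
# 25 1/2
# 26 1/2
# 40 11/50
# 50 1/50
```

Note that, unlike the unpenalized MCD, the expression does not depend on the dimension $p$.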

SLIDE 11

Properties of the Regularized MCD estimator

  • B. Computation

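Iterative algorithm: $(\hat\mu_0, \hat\Theta_0) \to \dots \to (\hat\mu_k, \hat\Theta_k) \to (\hat\mu_{k+1}, \hat\Theta_{k+1}) \to \dots$

  • $(\hat\mu_0, \hat\Theta_0)$: Regularized ML estimator based on a random subset of 2 observations
  • iteration $k$ to $k + 1$ by means of a C-step: keep the $h$ observations with the smallest distances $(x_i - \hat\mu_k)^T \hat\Theta_k (x_i - \hat\mu_k)$ and compute the regularized ML estimator on them
  • works for $n < p$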
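A compact NumPy sketch of this iteration follows. It assumes the L2 penalty (closed-form regularized ML: $\hat\Theta$ has the eigenvectors of $S$ and eigenvalues $2/(s + \sqrt{s^2 + 8\lambda})$), a fixed number of random starts and keeping the start with the best final criterion; these choices and all names are illustrative assumptions, not the authors' implementation. Because the $h$ observations kept in a C-step have the smallest distances under $(\hat\mu_k, \hat\Theta_k)$, and refitting on them can only increase the criterion, the recorded criterion values never decrease.

```python
import numpy as np


def reg_ml_l2(X, lam):
    """Closed-form regularized ML estimator for the L2 penalty (assumption)."""
    mu = X.mean(axis=0)
    Xc = X - mu
    s, V = np.linalg.eigh(Xc.T @ Xc / X.shape[0])
    t = 2.0 / (s + np.sqrt(s ** 2 + 8.0 * lam))
    return mu, (V * t) @ V.T


def distances(X, mu, Theta):
    """d_i = (x_i - mu)^T Theta (x_i - mu) for every row of X."""
    Xc = X - mu
    return np.einsum("ij,jk,ik->i", Xc, Theta, Xc)


def criterion(XH, mu, Theta, lam):
    """Penalized criterion of the regularized MCD on the subsample XH."""
    _, logdet = np.linalg.slogdet(Theta)
    return logdet - distances(XH, mu, Theta).mean() - lam * np.square(Theta).sum()


def reg_mcd(X, h, lam, n_starts=20, max_iter=100, seed=0):
    """Regularized MCD via C-steps from random 2-observation starts."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    best = None
    for _ in range(n_starts):
        H = rng.choice(n, size=2, replace=False)
        mu, Theta = reg_ml_l2(X[H], lam)  # (mu_0, Theta_0)
        history = []
        for _ in range(max_iter):
            # C-step: keep the h observations with the smallest distances ...
            H_new = np.sort(np.argsort(distances(X, mu, Theta))[:h])
            # ... and refit the regularized ML estimator on them.
            mu, Theta = reg_ml_l2(X[H_new], lam)
            history.append(criterion(X[H_new], mu, Theta, lam))
            if np.array_equal(H_new, H):
                break  # subset unchanged: converged
            H = H_new
        if best is None or history[-1] > best["value"]:
            best = {"value": history[-1], "mu": mu, "Theta": Theta,
                    "H": H_new, "history": history}
    return best


rng = np.random.default_rng(5)
X = rng.standard_normal((40, 60))  # n = 40 < p = 60
X[:4] += 5.0                       # first 4 rows: shift outliers
fit = reg_mcd(X, h=20, lam=0.5)
d = distances(X, fit["mu"], fit["Theta"])  # robust distances
print(np.argsort(d)[-4:])                  # indices with the largest robust distances
```

The robust distances `d` are what an outlier-detection step would rank; the plain regularized ML fit on all rows would instead be pulled towards the shifted observations.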

SLIDE 12

Simulations

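Clean setting: $n = p = 50$, $\Sigma_{ii} = 1$ and $\Sigma_{ij} = 0.5\, I(i, j \le 9)$ for all $i \ne j$ (the first 9 variables are equicorrelated, the others are uncorrelated).

Contaminated setting: 5% of shift and correlation outliers (intermediate or extreme).

Results with the L1 penalty (ML: regularized ML, MCD: regularized MCD):

| Setting | ML: MSE($\hat\mu$) | ML: KL($\hat\Theta$) | MCD: MSE($\hat\mu$) | MCD: KL($\hat\Theta$) |
|---|---|---|---|---|
| Clean | 0.98 | 6.94 | 1.43 | 6.46 |
| 5% Intermediate | 1.70 | 9.76 | 1.42 | 6.53 |
| 5% Extreme | 200.89 | 17.58 | 1.41 | 6.53 |

where $KL(\hat\Theta) = -\log\det(\hat\Theta) + \mathrm{tr}(\hat\Theta\Sigma) - (-\log\det(\Sigma^{-1}) + p)$.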
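The covariance matrix of the simulation and the loss can be written down directly. In this NumPy sketch (function names hypothetical), `kl_loss` follows the slide's formula literally; for reference, that expression equals twice the Kullback-Leibler divergence $KL(N_p(\mu, \Sigma) \,\|\, N_p(\mu, \hat\Theta^{-1}))$, so it is 0 exactly when $\hat\Theta = \Sigma^{-1}$ and positive otherwise.

```python
import numpy as np


def simulation_sigma(p=50, block=9, rho=0.5):
    """Sigma_ii = 1 and Sigma_ij = rho * I(i, j <= block) for i != j (1-based indices)."""
    Sigma = np.eye(p)
    Sigma[:block, :block] = rho
    np.fill_diagonal(Sigma, 1.0)
    return Sigma


def kl_loss(Theta_hat, Sigma):
    """-log det(Theta_hat) + tr(Theta_hat Sigma) - (-log det(Sigma^{-1}) + p)."""
    p = Sigma.shape[0]
    _, logdet_theta = np.linalg.slogdet(Theta_hat)
    _, logdet_sigma = np.linalg.slogdet(Sigma)  # -log det(Sigma^{-1}) = log det(Sigma)
    return -logdet_theta + np.trace(Theta_hat @ Sigma) - (logdet_sigma + p)


Sigma = simulation_sigma()
print(kl_loss(np.linalg.inv(Sigma), Sigma))  # true concentration matrix: 0 up to rounding
print(kl_loss(np.eye(50), Sigma))             # ignoring all correlations: log(256/5)
```

The second value follows because the $9 \times 9$ block has eigenvalues $5$ (once) and $0.5$ (eight times), so $\det\Sigma = 5/256$ and $KL(I_p) = -\log\det\Sigma$.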

SLIDE 13

Applications

  • Detection of outliers in high-dimensional data (with $n < p$ or $n/p$ small)
  • Robust graphical modelling
  • Robust regularized regression
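As an illustration of the graphical-modelling use: in a Gaussian graphical model, variables $i$ and $j$ are conditionally independent given the others exactly when $\Theta_{ij} = 0$, and their partial correlation is $-\Theta_{ij}/\sqrt{\Theta_{ii}\Theta_{jj}}$. The NumPy sketch below (hypothetical helper names and a hypothetical tolerance for declaring an entry zero) reads both off an estimated concentration matrix, for example a sparse estimate obtained with the L1 penalty.

```python
import numpy as np


def partial_correlations(Theta):
    """rho_ij = -Theta_ij / sqrt(Theta_ii * Theta_jj), with 1 on the diagonal."""
    d = np.sqrt(np.diag(Theta))
    P = -Theta / np.outer(d, d)
    np.fill_diagonal(P, 1.0)
    return P


def graph_edges(Theta, tol=1e-8):
    """Edges (i, j), i < j, of the conditional independence graph: Theta_ij != 0."""
    p = Theta.shape[0]
    return [(i, j) for i in range(p) for j in range(i + 1, p) if abs(Theta[i, j]) > tol]


# A tridiagonal concentration matrix encodes the chain graph 0 - 1 - 2.
Theta = np.array([[4.0, -2.0, 0.0],
                  [-2.0, 4.0, -2.0],
                  [0.0, -2.0, 4.0]])
print(graph_edges(Theta))                 # [(0, 1), (1, 2)]
print(partial_correlations(Theta)[0, 1])  # 0.5
```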

SLIDE 14

Detection of outliers

$n = p = 50$, $\Sigma_{ii} = 1$ and $\Sigma_{ij} = 0.5\, I(i, j \le 9)$ for all $i \ne j$, with 5% of shift and correlation outliers.

[Figure: Regularized ML Mahalanobis distances compared with Regularized MCD robust distances]

SLIDE 15

Conclusions

  • Robust regularized scatter estimation is available.
  • Other robust multivariate estimators (e.g. M-estimators, ...) can also be adapted to the penalized setting.
  • There is still room for further research.
