Selecting Variables in Two-Group Robust Linear Discriminant Analysis


SLIDE 1

Selecting Variables in Two-Group Robust Linear Discriminant Analysis
Stefan Van Aelst and Gert Willems

Department of Applied Mathematics and Computer Science, Ghent University, Belgium
COMPSTAT’2010

SLIDE 2

Linear discriminant analysis

Linear discriminant analysis setting

p-dimensional data set
  Group 1: $x_{11}, \ldots, x_{1n_1} \in \Pi_1 \sim F_1 = F_{\mu_1,\Sigma}$
  Group 2: $x_{21}, \ldots, x_{2n_2} \in \Pi_2 \sim F_2 = F_{\mu_2,\Sigma}$
Common covariance matrix $\Sigma$
Equal prior probabilities: $P(X \in \Pi_1) = P(X \in \Pi_2)$

$$d^L_j(x) = \mu_j^t \Sigma^{-1} x - \tfrac{1}{2}\mu_j^t \Sigma^{-1} \mu_j, \qquad j = 1, 2$$

Linear Bayes rule: classify $x \in \mathbb{R}^p$ into $\Pi_1$ if $d^L_1(x) > d^L_2(x)$, and into $\Pi_2$ otherwise.
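To make the rule concrete, here is a minimal Python sketch that evaluates $d^L_1$ and $d^L_2$ for known parameters and applies the Bayes rule. The parameter values and the helper names (`solve`, `lda_scores`, `classify`) are hypothetical, and a small Gaussian elimination routine stands in for a linear-algebra library so that the block runs with the standard library only.

```python
def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for k in range(n):
        piv = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[piv] = M[piv], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for col in range(k, n + 1):
                M[r][col] -= f * M[k][col]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (M[k][n] - sum(M[k][col] * x[col] for col in range(k + 1, n))) / M[k][k]
    return x


def lda_scores(x, mus, Sigma):
    """Return [d_1(x), d_2(x), ...] with d_j(x) = mu_j' Sigma^{-1} x - mu_j' Sigma^{-1} mu_j / 2."""
    scores = []
    for mu in mus:
        w = solve(Sigma, mu)  # w = Sigma^{-1} mu_j; Sigma is symmetric, so mu_j' Sigma^{-1} = w'
        linear = sum(wi * xi for wi, xi in zip(w, x))
        const = 0.5 * sum(wi * mi for wi, mi in zip(w, mu))
        scores.append(linear - const)
    return scores


def classify(x, mu1, mu2, Sigma):
    """Linear Bayes rule: group 1 if d_1(x) > d_2(x), group 2 otherwise."""
    d1, d2 = lda_scores(x, [mu1, mu2], Sigma)
    return 1 if d1 > d2 else 2


# Hypothetical population parameters
mu1, mu2 = [0.0, 0.0], [2.0, 0.0]
Sigma = [[2.0, 0.5], [0.5, 1.0]]
print(classify([0.3, 0.1], mu1, mu2, Sigma), classify([1.8, -0.2], mu1, mu2, Sigma))
```

Ties ($d^L_1(x) = d^L_2(x)$) go to $\Pi_2$, matching "otherwise" in the rule.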

Robust Variable Selection in Discriminant Analysis Van Aelst & Willems 2

SLIDE 4

Linear discriminant analysis

Discriminant coordinate

Direction $a$ that best separates the two populations: $a = \Sigma^{-1}(\mu_1 - \mu_2)$. The projection $a^t x$ is called the canonical variate or discriminant coordinate.
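The discriminant coordinate ties directly back to the linear Bayes rule. Using only the symmetry of $\Sigma$ (so that the cross terms $\mu_1^t\Sigma^{-1}\mu_2$ and $\mu_2^t\Sigma^{-1}\mu_1$ cancel), the difference of the two linear scores can be rewritten as:

$$
\begin{aligned}
d^L_1(x) - d^L_2(x) &= (\mu_1 - \mu_2)^t \Sigma^{-1} x - \tfrac{1}{2}\left(\mu_1^t \Sigma^{-1}\mu_1 - \mu_2^t \Sigma^{-1}\mu_2\right) \\
&= (\mu_1 - \mu_2)^t \Sigma^{-1} x - \tfrac{1}{2}(\mu_1 - \mu_2)^t \Sigma^{-1}(\mu_1 + \mu_2) \\
&= a^t x - \tfrac{1}{2}\, a^t(\mu_1 + \mu_2).
\end{aligned}
$$

So $x$ is assigned to $\Pi_1$ exactly when its projection $a^t x$ exceeds the projected midpoint $a^t(\mu_1 + \mu_2)/2$.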

SLIDE 5

Linear discriminant analysis

Sample LDA

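Estimate the centers $\mu_1$ and $\mu_2$ and the scatter $\Sigma$ from the data. Standard LDA uses the sample means $\bar{x}_1$ and $\bar{x}_2$, and the pooled sample covariance matrix
$$S_n = \frac{(n_1 - 1)S_1 + (n_2 - 1)S_2}{n_1 + n_2 - 2},$$
where $S_1$ and $S_2$ are the sample covariance matrices of the two groups.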
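A plug-in version of the discriminant coordinate, $\hat a = S_n^{-1}(\bar{x}_1 - \bar{x}_2)$, can be sketched in plain Python as follows. The toy data and function names are illustrative assumptions. It is this classical estimate of $a$ that is sensitive to outliers, which is what motivates replacing the means and $S_n$ by robust estimates.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for k in range(n):
        piv = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[piv] = M[piv], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for col in range(k, n + 1):
                M[r][col] -= f * M[k][col]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (M[k][n] - sum(M[k][col] * x[col] for col in range(k + 1, n))) / M[k][k]
    return x


def mean(rows):
    """Coordinate-wise sample mean of a list of p-vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def cov(rows):
    """Sample covariance matrix with divisor n - 1."""
    m, n = mean(rows), len(rows)
    p = len(m)
    return [[sum((r[a] - m[a]) * (r[b] - m[b]) for r in rows) / (n - 1)
             for b in range(p)] for a in range(p)]


def pooled_cov(X1, X2):
    """S_n = ((n1 - 1) S_1 + (n2 - 1) S_2) / (n1 + n2 - 2)."""
    n1, n2 = len(X1), len(X2)
    S1, S2 = cov(X1), cov(X2)
    p = len(S1)
    return [[((n1 - 1) * S1[a][b] + (n2 - 1) * S2[a][b]) / (n1 + n2 - 2)
             for b in range(p)] for a in range(p)]


def sample_lda_direction(X1, X2):
    """Plug-in discriminant coordinate a_hat = S_n^{-1} (xbar_1 - xbar_2)."""
    diff = [u - v for u, v in zip(mean(X1), mean(X2))]
    return solve(pooled_cov(X1, X2), diff)


# Hypothetical two-group data in p = 2 dimensions
X1 = [[1.0, 2.0], [2.0, 3.0], [3.0, 5.0]]
X2 = [[6.0, 1.0], [7.0, 2.0], [8.0, 4.0]]
a_hat = sample_lda_direction(X1, X2)
```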

SLIDE 6

Robust LDA

Robust LDA

Use robust estimators of the centers $\mu_1$ and $\mu_2$ and the common scatter $\Sigma$:
  → S-estimators
  → MM-estimators

SLIDE 7

Robust LDA

One-sample S-estimators

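Observations $\{x_1, \ldots, x_n\} \subset \mathbb{R}^p$; $\rho_0 : [0, \infty[ \to [0, \infty[$ is bounded, increasing and smooth.

S-estimates of the location $\hat\mu_n$ and scatter $\hat\Sigma_n$ minimize $|C|$ subject to
$$\frac{1}{n}\sum_{i=1}^{n} \rho_0\left([(x_i - T)^t C^{-1}(x_i - T)]^{1/2}\right) = b$$
among all $T \in \mathbb{R}^p$ and $C \in \mathrm{PDS}(p)$, the positive definite symmetric $p \times p$ matrices, where $b$ is a fixed constant (Davies 1987, Rousseeuw and Leroy 1987, Lopuhaä 1989).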
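For fixed $T$ and a shape matrix $G$ with $\det(G) = 1$, writing $C = s^2 G$ turns the constraint into a one-dimensional equation $\frac{1}{n}\sum_i \rho_0(d_i/s) = b$, with $d_i = [(x_i - T)^t G^{-1}(x_i - T)]^{1/2}$, and gives $|C| = s^{2p}$. Minimizing $|C|$ therefore amounts to minimizing this M-scale $s$ over $(T, G)$. The sketch below covers only the scale step, assuming the Tukey biweight for $\rho_0$ and an arbitrary illustrative $b$; the distances and function names are hypothetical, and the search over $(T, G)$, which is the expensive part, is not shown.

```python
def rho_biweight(t, c):
    """Tukey biweight rho_c(t); bounded by c^2 / 6."""
    t = abs(t)
    if t >= c:
        return c * c / 6.0
    return t ** 2 / 2 - t ** 4 / (2 * c ** 2) + t ** 6 / (6 * c ** 4)


def m_scale(dist, b, c, iters=200):
    """Solve (1/n) sum_i rho_c(d_i / s) = b for s > 0 by bisection.

    The left-hand side is non-increasing in s, so a bracketing search works.
    A solution exists when 0 < b < (share of d_i > 0) * c^2 / 6.
    """
    n = len(dist)
    share = sum(1 for d in dist if d > 0) / n
    if not 0 < b < share * c * c / 6:
        raise ValueError("b must lie strictly between 0 and share * c^2 / 6")

    def excess(s):
        return sum(rho_biweight(d / s, c) for d in dist) / n - b

    lo, hi = 1.0, 1.0
    while excess(hi) > 0:   # s too small: average rho still above b
        hi *= 2.0
    while excess(lo) <= 0:  # s too large: average rho at or below b
        lo /= 2.0
    for _ in range(iters):  # invariant: excess(lo) > 0 >= excess(hi)
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical distances d_i for some fixed T and shape G with det(G) = 1
dist = [0.4, 0.9, 1.1, 1.3, 1.7, 2.2, 9.0]
c, p = 3.0, 2
b = 0.5 * c * c / 6          # illustrative choice: half the maximum of rho
s = m_scale(dist, b, c)
det_C = s ** (2 * p)         # |C| for C = s^2 G, the quantity the S-estimator minimizes
```

The outlying distance 9.0 only contributes the bounded value $c^2/6$, which is what keeps the scale from exploding.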

SLIDE 8

Robust LDA

ρ functions

A popular family of loss functions is the Tukey biweight (bisquare) family of $\rho$ functions:
$$\rho_c(t) = \begin{cases} \dfrac{t^2}{2} - \dfrac{t^4}{2c^2} + \dfrac{t^6}{6c^4} & \text{if } |t| \le c,\\ \dfrac{c^2}{6} & \text{if } |t| \ge c. \end{cases}$$
The constant $c$ can be tuned for robustness (breakdown point). The choice of $c$ also determines the efficiency of the S-estimator → trade-off between robustness and efficiency.
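The biweight and its derivative $\psi_c(t) = t(1 - t^2/c^2)^2$ for $|t| \le c$ (zero beyond $c$) are easy to code. A minimal sketch; the evaluation points are arbitrary. As $c$ grows, $\rho_c(t)$ approaches the squared loss $t^2/2$, which is the efficiency side of the trade-off, while a small $c$ caps the loss early, which is the robustness side.

```python
def rho(t, c):
    """Tukey biweight rho_c(t)."""
    t = abs(t)
    if t >= c:
        return c * c / 6.0
    return t ** 2 / 2 - t ** 4 / (2 * c ** 2) + t ** 6 / (6 * c ** 4)


def psi(t, c):
    """Derivative of rho_c: t * (1 - (t/c)^2)^2 for |t| < c, and 0 beyond c."""
    if abs(t) >= c:
        return 0.0
    return t * (1 - (t / c) ** 2) ** 2


# Compare two tuning constants: the loss saturates at c^2 / 6 once |t| >= c.
for c in (2.0, 3.0):
    print(c, [round(rho(t, c), 4) for t in (0.5, 1.0, 2.0, 4.0)])
```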

SLIDE 9

Robust LDA

Tukey biweight ρ functions

[Figure: Tukey biweight $\rho_c(t)$ plotted against $t$, for $c = 3$, $c = 2$ and $c = \infty$.]

SLIDE 10

Robust LDA

One-sample MM-estimates

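Put $\tilde\sigma_n = \det(\hat\Sigma_n)^{1/(2p)}$, the S-estimate of scale, where $\hat\Sigma_n$ is the S-estimate of scatter.

Then the MM-estimates of location and shape minimize
$$\frac{1}{n}\sum_{i=1}^{n} \rho_1\left([(x_i - T)^t G^{-1}(x_i - T)]^{1/2} / \tilde\sigma_n\right)$$
among all $T \in \mathbb{R}^p$ and $G \in \mathrm{PDS}(p)$ for which $\det(G) = 1$ (Tatsuoka and Tyler 2000).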
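The two ingredients of the MM step can be sketched as follows: the S-scale $\tilde\sigma_n$ together with the unit-determinant shape obtained by dividing the S-scatter by $\tilde\sigma_n^2$ (so that $\det(G) = \det(\hat\Sigma_n)/\tilde\sigma_n^{2p} = 1$), and the MM objective evaluated at a candidate $(T, G)$. The scatter matrix, the data and the tuning constant `c1` are hypothetical, the biweight is assumed for $\rho_1$, and the minimization over $(T, G)$ itself is not shown.

```python
def det(A):
    """Determinant by Gaussian elimination with partial pivoting."""
    M = [[float(v) for v in row] for row in A]
    n, d = len(M), 1.0
    for k in range(n):
        piv = max(range(k, n), key=lambda r: abs(M[r][k]))
        if M[piv][k] == 0.0:
            return 0.0
        if piv != k:
            M[k], M[piv] = M[piv], M[k]
            d = -d
        d *= M[k][k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for col in range(k, n):
                M[r][col] -= f * M[k][col]
    return d


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for k in range(n):
        piv = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[piv] = M[piv], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for col in range(k, n + 1):
                M[r][col] -= f * M[k][col]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (M[k][n] - sum(M[k][col] * x[col] for col in range(k + 1, n))) / M[k][k]
    return x


def rho(t, c):
    """Tukey biweight rho_c(t)."""
    t = abs(t)
    return c * c / 6.0 if t >= c else t ** 2 / 2 - t ** 4 / (2 * c ** 2) + t ** 6 / (6 * c ** 4)


def scale_and_shape(Sigma_S):
    """sigma_tilde = det(Sigma_S)^(1/(2p)) and shape G = Sigma_S / sigma_tilde^2 (det G = 1)."""
    p = len(Sigma_S)
    sigma = det(Sigma_S) ** (1.0 / (2 * p))
    G = [[v / sigma ** 2 for v in row] for row in Sigma_S]
    return sigma, G


def mm_objective(X, T, G, sigma, c1):
    """(1/n) sum_i rho_1(d_i / sigma_tilde), d_i the Mahalanobis distance w.r.t. (T, G)."""
    total = 0.0
    for x in X:
        r = [xi - ti for xi, ti in zip(x, T)]
        d = sum(ri * zi for ri, zi in zip(r, solve(G, r))) ** 0.5
        total += rho(d / sigma, c1)
    return total / len(X)


Sigma_S = [[4.0, 1.0], [1.0, 2.0]]   # hypothetical S-estimate of scatter
sigma, G = scale_and_shape(Sigma_S)
X = [[0.1, 0.2], [1.0, -0.5], [-1.2, 0.4], [8.0, 7.0]]
value = mm_objective(X, [0.0, 0.0], G, sigma, c1=4.0)
```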

SLIDE 11

Robust LDA

ρ functions

Both $\rho_0$ and $\rho_1$ are taken from the same family.
The constant $c$ in $\rho_0$ can be tuned for robustness (breakdown point); the MM-estimator inherits its robustness from the S-scale.
The constant $c$ in $\rho_1$ can be tuned for the efficiency of the location estimates.

SLIDE 12

Robust LDA

Tukey biweight ρ functions

[Figure: the Tukey biweight functions $\rho_0$ and $\rho_1$, for $p = 2$ and for $p = 5$.]

SLIDE 13

Robust LDA

Robust two-sample estimates

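Pool the scatter estimates $\hat\Sigma_{1,n_1}$ and $\hat\Sigma_{2,n_2}$ of both groups:
$$\hat\Sigma_n = \frac{n_1 \hat\Sigma_{1,n_1} + n_2 \hat\Sigma_{2,n_2}}{n_1 + n_2}$$

Calculate simultaneous S-estimates of the two locations and the common scatter matrix: $\hat\mu_{1,n}$, $\hat\mu_{2,n}$ and $\hat\Sigma_n$ minimize $|C|$ subject to
$$\frac{1}{n_1 + n_2}\sum_{j=1}^{2}\sum_{i=1}^{n_j} \rho_0\left([(x_{ji} - T_j)^t C^{-1}(x_{ji} - T_j)]^{1/2}\right) = b$$
among all $T_1, T_2 \in \mathbb{R}^p$ and $C \in \mathrm{PDS}(p)$ (He and Fung 2000). Similarly, simultaneous MM-estimates can be calculated.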
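Both ways of obtaining a common scatter are straightforward to express for given inputs. A minimal sketch, assuming the group scatter estimates, the group centers and a candidate matrix $C$ are already available (all names are hypothetical, and the biweight is assumed for $\rho_0$): the pooling formula, and the left-hand side of the simultaneous constraint, in which every observation is centered at its own group location $T_j$ while $C$ is shared. In a full S-estimation this quantity is held equal to $b$ while $|C|$ is minimized over $(T_1, T_2, C)$.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for k in range(n):
        piv = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[piv] = M[piv], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for col in range(k, n + 1):
                M[r][col] -= f * M[k][col]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (M[k][n] - sum(M[k][col] * x[col] for col in range(k + 1, n))) / M[k][k]
    return x


def rho(t, c):
    """Tukey biweight rho_c(t)."""
    t = abs(t)
    return c * c / 6.0 if t >= c else t ** 2 / 2 - t ** 4 / (2 * c ** 2) + t ** 6 / (6 * c ** 4)


def pooled_scatter(S1, n1, S2, n2):
    """Sample-size weighted pooling (n1 S1 + n2 S2) / (n1 + n2) of two scatter estimates."""
    p = len(S1)
    return [[(n1 * S1[a][b] + n2 * S2[a][b]) / (n1 + n2) for b in range(p)]
            for a in range(p)]


def two_sample_constraint(groups, centers, C, c):
    """Average of rho_0 over both groups; group j is centred at its own T_j, C is common."""
    total, n = 0.0, 0
    for X, T in zip(groups, centers):
        for x in X:
            r = [xi - ti for xi, ti in zip(x, T)]
            d = sum(ri * zi for ri, zi in zip(r, solve(C, r))) ** 0.5
            total += rho(d, c)
            n += 1
    return total / n


# Hypothetical one-dimensional example
groups = [[[0.0], [1.0]], [[5.0], [6.0]]]
centers = [[0.5], [5.5]]
value = two_sample_constraint(groups, centers, [[1.0]], c=3.0)
```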

SLIDE 14

Robust LDA

Bootstrap inference

Advantages of bootstrap:
  • Few assumptions
  • Wide range of applications

Bootstrapping robust estimators:
  • High computational cost
  • Robustness not guaranteed: a bootstrap sample can contain a higher proportion of outliers than the original sample

SLIDE 16

Fast and robust bootstrap

Fast and robust bootstrap principle

For each bootstrap sample, calculate an approximation of the estimates:
  • Use the estimating equations
  • The approximations are fast to compute
  • They inherit the robustness of the initial solution

SLIDE 17

Fast and robust bootstrap

Fast and robust bootstrap

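Consider estimates that are the solution of a fixed-point equation
$$\hat\Theta_n = g_n(\hat\Theta_n).$$
For a bootstrap sample, $\hat\Theta^*_n = g^*_n(\hat\Theta^*_n)$; consider the one-step approximation
$$\hat\Theta^{1\star}_n = g^*_n(\hat\Theta_n).$$
Take a Taylor expansion about the estimands $\Theta$:
$$\hat\Theta_n = g_n(\Theta) + \nabla g_n(\Theta)(\hat\Theta_n - \Theta) + O_P(n^{-1}),$$
which can be rewritten as
$$\sqrt{n}(\hat\Theta_n - \Theta) = [I - \nabla g_n(\Theta)]^{-1}\sqrt{n}\,(g_n(\Theta) - \Theta) + O_P(n^{-1/2}).$$
We then obtain
$$\sqrt{n}(\hat\Theta^*_n - \hat\Theta_n) = [I - \nabla g_n(\hat\Theta_n)]^{-1}\sqrt{n}\,(g^*_n(\hat\Theta_n) - \hat\Theta_n) + O_P(n^{-1/2}),$$
which yields the FRB estimate
$$\hat\Theta^{R\star}_n = \hat\Theta_n + [I - \nabla g_n(\hat\Theta_n)]^{-1}(\hat\Theta^{1\star}_n - \hat\Theta_n).$$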
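The FRB recipe can be illustrated on a deliberately simple case: a univariate biweight location M-estimate written as a weighted mean, $\hat\theta_n = g_n(\hat\theta_n)$, with the scale held fixed at the MAD. This is a simplified analogue chosen for illustration, not the multivariate MM-estimator used here, and it approximates $\nabla g_n(\hat\theta_n)$ by a central finite difference rather than an analytic derivative. Each replicate takes one step of $g^*_n$ from $\hat\theta_n$ and applies the linear correction $[1 - \nabla g_n(\hat\theta_n)]^{-1}$. The tuning constant and the simulated data are assumptions.

```python
import random
import statistics

C_TUNE = 4.685  # biweight tuning constant (illustrative choice)


def weight(r, c=C_TUNE):
    """Biweight weight psi(r)/r = (1 - (r/c)^2)^2 for |r| < c, and 0 otherwise."""
    return (1 - (r / c) ** 2) ** 2 if abs(r) < c else 0.0


def g(theta, x, s):
    """Fixed-point map: weighted mean with weights evaluated at theta (scale s fixed)."""
    w = [weight((xi - theta) / s) for xi in x]
    return sum(wi * xi for wi, xi in zip(w, x)) / sum(w)


def m_location(x, s, iters=200):
    """Iterate theta <- g_n(theta), starting from the median."""
    theta = statistics.median(x)
    for _ in range(iters):
        theta = g(theta, x, s)
    return theta


def frb(x, B=1000, seed=0, h=1e-6):
    """Return theta_hat and B fast-and-robust bootstrap replicates theta^{R*}."""
    med = statistics.median(x)
    s = statistics.median([abs(xi - med) for xi in x]) / 0.6745  # MAD, kept fixed
    theta = m_location(x, s)
    # Numerical derivative of g_n at theta_hat; correction factor [1 - g_n']^{-1}
    dg = (g(theta + h, x, s) - g(theta - h, x, s)) / (2 * h)
    correction = 1.0 / (1.0 - dg)
    rng = random.Random(seed)
    reps = []
    for _ in range(B):
        xb = [rng.choice(x) for _ in x]
        one_step = g(theta, xb, s)  # Theta^{1*}: one step of g*_n started at theta_hat
        reps.append(theta + correction * (one_step - theta))
    return theta, reps


rng = random.Random(1)
x = [rng.gauss(10.0, 1.0) for _ in range(50)] + [100.0, 120.0]  # two gross outliers
theta_hat, reps = frb(x)
se = statistics.stdev(reps)  # FRB standard error of theta_hat
```

Because the one-step map reuses the weights implied by $\hat\theta_n$, the two gross outliers get weight zero in every bootstrap sample, however often they are resampled.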

SLIDE 19

Fast and robust bootstrap

Properties of fast robust bootstrap

Computational efficiency: the FRB estimates are solutions of a system of linear equations.
Robustness: the FRB estimates use the weights of the MM-estimates at the original sample.
Consistency: under regularity conditions, the FRB distribution of $\sqrt{n}(\hat\Theta^{R\star}_n - \hat\Theta_n)$ and the sampling distribution of $\sqrt{n}(\hat\Theta_n - \Theta)$ converge to the same limiting distribution.
Smooth mappings: FRB commutes with smooth functions, such as $a = \Sigma^{-1}(\mu_1 - \mu_2)$.
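Because FRB produces replicates of the full parameter vector, replicates of a smooth function of it are obtained by applying the function to each replicate. A sketch for the discriminant coordinate; rescaling $a$ to unit length is an assumption made here so that the replicates are on a comparable scale, and the replicate values and helper names are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for k in range(n):
        piv = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[piv] = M[piv], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for col in range(k, n + 1):
                M[r][col] -= f * M[k][col]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (M[k][n] - sum(M[k][col] * x[col] for col in range(k + 1, n))) / M[k][k]
    return x


def unit_direction(mu1, mu2, Sigma):
    """a = Sigma^{-1}(mu1 - mu2), rescaled to unit Euclidean length."""
    a = solve(Sigma, [u - v for u, v in zip(mu1, mu2)])
    norm = sum(ai * ai for ai in a) ** 0.5
    return [ai / norm for ai in a]


def map_replicates(replicates):
    """Apply the smooth map (mu1, mu2, Sigma) -> unit-length a to every FRB replicate."""
    return [unit_direction(m1, m2, S) for m1, m2, S in replicates]


# Two hypothetical FRB replicates of (mu1, mu2, Sigma)
replicates = [
    ([3.0, 4.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]),
    ([3.0, 4.0], [0.0, 0.0], [[5.0, 0.0], [0.0, 5.0]]),
]
directions = map_replicates(replicates)
```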

SLIDE 21

Fast and robust bootstrap

Variable selection in robust LDA

Two-group robust LDA
  • Selection criterion: test for significance of the discriminant coordinate coefficients
  • Use the FRB distribution to estimate p-values
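How the p-values are read off the FRB distribution is not spelled out here. One common choice, assumed in this sketch, is the two-sided percentile p-value $2\min\{P^*(a^*_j \le 0), P^*(a^*_j \ge 0)\}$, computed from the bootstrap replicates of each coefficient. The function name and the toy replicates are hypothetical.

```python
def frb_pvalues(replicates):
    """Two-sided bootstrap p-value per coefficient: 2 * min(P*(a_j <= 0), P*(a_j >= 0)), capped at 1."""
    B = len(replicates)
    p = len(replicates[0])
    pvals = []
    for j in range(p):
        below = sum(1 for rep in replicates if rep[j] <= 0)
        above = sum(1 for rep in replicates if rep[j] >= 0)
        pvals.append(min(1.0, 2.0 * min(below, above) / B))
    return pvals


# Hypothetical FRB replicates of a two-coefficient discriminant coordinate
reps = [[1.0, -1.0], [2.0, 1.0], [3.0, -2.0], [4.0, 2.0]]
print(frb_pvalues(reps))  # [0.0, 1.0]
```

With $B$ replicates the smallest nonzero value this can return is $2/B$, so a reported 0 means "below that resolution".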

SLIDE 22

Examples

Example: Biting Flies

Two groups of 35 flies (Leptoconops torrens and Leptoconops carteri). Measurements of:
  • wing length
  • wing width
  • third palp length
  • third palp width
  • fourth palp length

SLIDE 23

Examples

Biting Flies: outliers

[Figure: wing width (roughly 20 to 50) plotted by group (1 and 2).]

SLIDE 24

Examples

Biting Flies: LDA

Robust LDA:
  • Simultaneous two-sample MM-estimates
  • Backward elimination variable selection

SLIDE 25

Examples

Biting Flies: FRB

[Figure: histograms of the FRB distributions of the coefficients $V_1, \ldots, V_5$, one panel per variable, horizontal axis from −1 to 1.]

SLIDE 26

Examples

Biting Flies: Backward elimination

p-values per variable (-: variable removed at an earlier step)

          Variable
Model     1       2       3       4       5
1         0.490   0.817   0.006   0.296   0.002
2         0.306   -       0.016   0.216   0.000
3         -       -       0.016   0.096   0.000
4         -       -       0.006   -       0.000
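The elimination path in this table can be replayed with a generic backward-elimination loop: at each step, drop the variable with the largest p-value until all remaining p-values are at or below the significance level. A sketch, assuming a 5% level (any level from 0.006 up to, but not including, 0.096 gives the same path) and a hypothetical `pvalues_for` callback. In the actual procedure that callback would refit robust LDA and rerun the FRB on each reduced variable set; here it simply looks up the p-values reported on this slide.

```python
def backward_elimination(variables, pvalues_for, alpha=0.05):
    """Drop the variable with the largest p-value until all p-values are <= alpha."""
    current = list(variables)
    history = []
    while current:
        pv = pvalues_for(tuple(current))
        history.append(dict(pv))
        worst = max(current, key=lambda v: pv[v])
        if pv[worst] <= alpha:
            break
        current.remove(worst)
    return current, history


# p-values from the backward-elimination table, keyed by the variables still in the model
TABLE = {
    (1, 2, 3, 4, 5): {1: 0.490, 2: 0.817, 3: 0.006, 4: 0.296, 5: 0.002},
    (1, 3, 4, 5): {1: 0.306, 3: 0.016, 4: 0.216, 5: 0.000},
    (3, 4, 5): {3: 0.016, 4: 0.096, 5: 0.000},
    (3, 5): {3: 0.006, 5: 0.000},
}
selected, steps = backward_elimination([1, 2, 3, 4, 5], TABLE.__getitem__)
print(selected)  # [3, 5]
```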

SLIDE 27

Conclusions

Conclusions and outlook

  • Robust LDA based on S/MM-estimators
  • Inference based on fast robust bootstrap
  • Simulations confirm its good performance
  • Variable selection based on contributions to the discriminant coordinate
  • More than two groups: use a robust likelihood ratio type test statistic as selection criterion

SLIDE 28

Conclusions

Robust likelihood ratio type test statistics

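$$\Lambda^R_n = \frac{|\hat\Sigma^{(g)}_n|}{|\hat\Sigma^{(1)}_n|} \equiv \frac{\tilde\sigma^{(g)}_n}{\tilde\sigma^{(1)}_n} = \frac{S_n(\hat\mu^{(g)}_{1,n}, \ldots, \hat\mu^{(g)}_{g,n}, \tilde\Gamma^{(g)}_n)}{S_n(\hat\mu^{(1)}_n, \tilde\Gamma^{(1)}_n)}$$

$$\Lambda^R_n = \frac{\sum_{j=1}^{g}\sum_{i=1}^{n_j} \rho_0\left([(x_{ji} - \hat\mu^{(g)}_{j,n})^t (\tilde\Gamma^{(g)}_n)^{-1}(x_{ji} - \hat\mu^{(g)}_{j,n})]^{1/2} / \tilde\sigma^{(g)}_n\right)}{\sum_{j=1}^{g}\sum_{i=1}^{n_j} \rho_0\left([(x_{ji} - \hat\mu^{(1)}_n)^t (\tilde\Gamma^{(1)}_n)^{-1}(x_{ji} - \hat\mu^{(1)}_n)]^{1/2} / \tilde\sigma^{(g)}_n\right)}$$

$$\Lambda^R_n = \frac{\sum_{j=1}^{g}\sum_{i=1}^{n_j} \rho_0\left([(x_{ji} - \hat\mu^{(g)}_{j,n})^t (\hat\Sigma^{(g)}_n)^{-1}(x_{ji} - \hat\mu^{(g)}_{j,n})]^{1/2}\right)}{\sum_{j=1}^{g}\sum_{i=1}^{n_j} \rho_0\left([(x_{ji} - \hat\mu^{(1)}_n)^t (\hat\Sigma^{(g)}_n)^{-1}(x_{ji} - \hat\mu^{(1)}_n)]^{1/2}\right)}$$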
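The last form of $\Lambda^R_n$ only needs the group-wise centers, the one-group center and $\hat\Sigma^{(g)}_n$, so it can be evaluated directly once those estimates are available. A minimal sketch, assuming the estimates are supplied as inputs (computing them is not shown) and using the biweight for $\rho_0$; all names and the toy data are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for k in range(n):
        piv = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[piv] = M[piv], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for col in range(k, n + 1):
                M[r][col] -= f * M[k][col]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (M[k][n] - sum(M[k][col] * x[col] for col in range(k + 1, n))) / M[k][k]
    return x


def rho(t, c):
    """Tukey biweight rho_c(t)."""
    t = abs(t)
    return c * c / 6.0 if t >= c else t ** 2 / 2 - t ** 4 / (2 * c ** 2) + t ** 6 / (6 * c ** 4)


def rho_sum(groups, centers, Sigma, c):
    """sum_j sum_i rho_0(d(x_ji; centers[j], Sigma))."""
    total = 0.0
    for X, T in zip(groups, centers):
        for x in X:
            r = [xi - ti for xi, ti in zip(x, T)]
            total += rho(sum(ri * zi for ri, zi in zip(r, solve(Sigma, r))) ** 0.5, c)
    return total


def robust_lrt(groups, group_centers, common_center, Sigma_g, c):
    """Last form of Lambda_n^R: both sums use Sigma^(g); the denominator centres
    every observation at the common (one-group) location."""
    num = rho_sum(groups, group_centers, Sigma_g, c)
    den = rho_sum(groups, [common_center] * len(groups), Sigma_g, c)
    return num / den


# Hypothetical one-dimensional, two-group example
groups = [[[0.0], [1.0]], [[10.0], [11.0]]]
group_centers = [[0.5], [10.5]]
common_center = [5.5]
Sigma_g = [[1.0]]
lam = robust_lrt(groups, group_centers, common_center, Sigma_g, c=3.0)
```

In this form, values well below 1 suggest that separate group centers fit the data much better than a common center.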

SLIDE 29

References

He, X. and Fung, W.K. (2000). High breakdown estimation for multiple populations with applications to discriminant analysis. Journal of Multivariate Analysis, 72, 151–162.

Lopuhaä, H. (1989). On the relation between S-estimators and M-estimators of multivariate location and covariance. The Annals of Statistics, 17, 1662–1683.

Salibian-Barrera, M., Van Aelst, S., and Willems, G. (2006). PCA based on multivariate MM-estimators with fast and robust bootstrap. Journal of the American Statistical Association, 101, 1198–1211.

Tatsuoka, K.S. and Tyler, D.E. (2000). The uniqueness of S and M-functionals under non-elliptical distributions. The Annals of Statistics, 28, 1219–1243.

Van Aelst, S. and Willems, G. (2010). Inference for robust canonical variate analysis. Advances in Data Analysis and Classification, to appear.
