Selecting Variables in Two-Group Robust Linear Discriminant Analysis
Stefan Van Aelst and Gert Willems
Department of Applied Mathematics and Computer Science, Ghent University, Belgium
COMPSTAT2010
Linear discriminant analysis
Linear discriminant analysis setting
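p-dimensional data set
Group 1: $x_{11}, \ldots, x_{1n_1} \in \Pi_1 \sim F_1 = F_{\mu_1,\Sigma}$
Group 2: $x_{21}, \ldots, x_{2n_2} \in \Pi_2 \sim F_2 = F_{\mu_2,\Sigma}$
Common covariance matrix $\Sigma$
Equal prior probabilities: $P(X \in \Pi_1) = P(X \in \Pi_2)$
Linear discriminant scores: $d^L_j(x) = \mu_j^t \Sigma^{-1} x - \frac{1}{2} \mu_j^t \Sigma^{-1} \mu_j$, $j = 1, 2$
Linear Bayes rule: Classify $x \in \mathbb{R}^p$ into $\Pi_1$ if $d^L_1(x) > d^L_2(x)$ and into $\Pi_2$ otherwise.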
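The linear Bayes rule can be sketched in a few lines of Python. This is only an illustration, assuming NumPy; the function names and the parameter values for the means and covariance are made up here and do not come from the talk.

```python
import numpy as np

def linear_scores(x, mu1, mu2, sigma):
    """Return (d1, d2), the linear discriminant scores of x for groups 1 and 2."""
    scores = []
    for mu in (mu1, mu2):
        w = np.linalg.solve(sigma, mu)          # Sigma^{-1} mu_j
        scores.append(w @ x - 0.5 * (w @ mu))   # mu_j^t Sigma^{-1} x - 1/2 mu_j^t Sigma^{-1} mu_j
    return tuple(scores)

def classify(x, mu1, mu2, sigma):
    """Linear Bayes rule with equal priors: group 1 if d1 > d2, otherwise group 2."""
    d1, d2 = linear_scores(x, mu1, mu2, sigma)
    return 1 if d1 > d2 else 2

# Illustrative population parameters (assumptions for this sketch only).
mu1 = np.array([0.0, 0.0])
mu2 = np.array([2.0, 2.0])
sigma = np.eye(2)

label_near_mu1 = classify(np.array([0.1, -0.2]), mu1, mu2, sigma)
label_near_mu2 = classify(np.array([1.9, 2.3]), mu1, mu2, sigma)
```

With a common covariance matrix the quadratic term in x cancels between the two scores, which is why the rule is linear in x.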
Robust Variable Selection in Discriminant Analysis Van Aelst & Willems 2
Discriminant coordinate
Direction $a$ that best separates the two populations: $a = \Sigma^{-1}(\mu_1 - \mu_2)$
The projection $a^t x$ is called the canonical variate or discriminant coordinate
Sample LDA
Estimate the centers $\mu_1$ and $\mu_2$ and the scatter $\Sigma$ from the data
Standard LDA uses the sample means $\bar{x}_1$ and $\bar{x}_2$, and the pooled sample covariance matrix
$$S_n = \frac{(n_1 - 1) S_1 + (n_2 - 1) S_2}{n_1 + n_2 - 2}$$
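As a sketch of the classical (non-robust) plug-in step, the pooled covariance and the sample version of the discriminant direction $a = \Sigma^{-1}(\mu_1 - \mu_2)$ can be computed as follows. The function names and the simulated data are assumptions for illustration only.

```python
import numpy as np

def pooled_covariance(x1, x2):
    """Pooled sample covariance S_n of two groups (rows are observations)."""
    n1, n2 = len(x1), len(x2)
    s1 = np.cov(x1, rowvar=False)   # divisor n1 - 1
    s2 = np.cov(x2, rowvar=False)   # divisor n2 - 1
    return ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)

def discriminant_direction(x1, x2):
    """Sample discriminant direction: S_n^{-1} (mean of group 1 - mean of group 2)."""
    s_n = pooled_covariance(x1, x2)
    return np.linalg.solve(s_n, x1.mean(axis=0) - x2.mean(axis=0))

# Simulated two-group data (illustrative only).
rng = np.random.default_rng(0)
x1 = rng.normal(loc=[0.0, 0.0], size=(40, 2))
x2 = rng.normal(loc=[2.0, 0.0], size=(30, 2))

a = discriminant_direction(x1, x2)
coords1 = x1 @ a   # discriminant coordinates of group 1
coords2 = x2 @ a   # discriminant coordinates of group 2
```

Because both the sample means and the sample covariances are sensitive to outliers, this is the step that the robust estimators in the rest of the talk replace.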
Robust LDA
Robust LDA
Use robust estimators of the centers $\mu_1$ and $\mu_2$ and the common scatter $\Sigma$
→ S-estimators
→ MM-estimators
One-sample S-estimators
Observations $\{x_1, \ldots, x_n\} \subset \mathbb{R}^p$
$\rho_0 : [0, \infty[ \to [0, \infty[$ is bounded, increasing and smooth
S-estimates of the location $\hat\mu_n$ and scatter $\hat\Sigma_n$ minimize $|C|$ subject to
$$\frac{1}{n} \sum_{i=1}^{n} \rho_0\left( \left[ (x_i - T)^t C^{-1} (x_i - T) \right]^{1/2} \right) = b$$
among all $T \in \mathbb{R}^p$ and $C \in \mathrm{PDS}(p)$ (Davies 1987, Rousseeuw and Leroy 1987, Lopuhaä 1989)
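To make the constraint concrete: writing $C = s^2 G$ with $\det(G) = 1$ gives $|C| = s^{2p}$, so for a fixed candidate location and shape the constraint determines a scale $s$, and minimizing $|C|$ amounts to minimizing that scale. The sketch below solves the constraint for $s$ with a standard fixed-point iteration. It assumes the Tukey biweight for $\rho_0$ (the family shown on the following slide) and sets $b$ to half the maximum of $\rho_0$; the constants, function names and data are illustrative choices, not values from the talk, and $c$ is not tuned for consistency here.

```python
import numpy as np

def rho_biweight(t, c):
    """Tukey biweight rho; constant (c**2 / 6) for |t| >= c."""
    t = np.minimum(np.abs(t), c)   # clipping at c reproduces the constant branch
    return t**2 / 2 - t**4 / (2 * c**2) + t**6 / (6 * c**4)

def m_scale(d, c, b, tol=1e-12, max_iter=1000):
    """Solve mean(rho(d / s)) = b for s > 0 by fixed-point iteration."""
    s = np.median(d)
    for _ in range(max_iter):
        s_new = s * np.sqrt(np.mean(rho_biweight(d / s, c)) / b)
        if abs(s_new - s) <= tol * s:
            return s_new
        s = s_new
    return s

# Illustrative data and one candidate (T, G): sample mean and normalized sample covariance.
rng = np.random.default_rng(1)
x = rng.normal(size=(200, 3))
T = x.mean(axis=0)
C = np.cov(x, rowvar=False)
G = C / np.linalg.det(C) ** (1 / x.shape[1])          # shape matrix with det(G) = 1
diffs = x - T
d = np.sqrt(np.einsum("ij,jk,ik->i", diffs, np.linalg.inv(G), diffs))

c = 3.0
b = 0.5 * c**2 / 6          # half of max(rho): an assumed choice for this sketch
s_hat = m_scale(d, c, b)
```

An actual S-estimator would search over candidates $(T, G)$ for the smallest such scale; this sketch only evaluates one candidate.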
ρ functions
A popular family of loss functions is the Tukey biweight (bisquare) family of ρ functions:
$$\rho_c(t) = \begin{cases} \dfrac{t^2}{2} - \dfrac{t^4}{2c^2} + \dfrac{t^6}{6c^4} & \text{if } |t| \le c \\ \dfrac{c^2}{6} & \text{if } |t| \ge c \end{cases}$$
The constant c can be tuned for robustness (breakdown point)
The choice of c also determines the efficiency of the S-estimator
→ Trade-off robustness vs efficiency
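The piecewise definition can be checked numerically. A minimal Python sketch, assuming NumPy; the function name `rho_biweight` is chosen here for illustration.

```python
import numpy as np

def rho_biweight(t, c):
    """Tukey biweight rho_c(t), following the two-branch definition."""
    t = np.asarray(t, dtype=float)
    inside = t**2 / 2 - t**4 / (2 * c**2) + t**6 / (6 * c**4)
    return np.where(np.abs(t) <= c, inside, c**2 / 6)

grid = np.linspace(-5.0, 5.0, 201)
values_c2 = rho_biweight(grid, 2.0)   # levels off at 2**2 / 6
values_c3 = rho_biweight(grid, 3.0)   # levels off at 3**2 / 6
```

At $|t| = c$ the polynomial branch equals $c^2/2 - c^2/2 + c^2/6 = c^2/6$, so the two branches join continuously. A smaller $c$ caps the loss earlier, which bounds the influence of large distances at the price of efficiency.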
Tukey biweight ρ functions
[Figure: Tukey biweight ρ(t) plotted against t for c = 2, c = 3 and c = ∞.]
One-sample MM-estimates
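Put $\tilde\sigma_n = \det(\hat\Sigma_n)^{1/(2p)}$, the S-estimate of scale
Then the MM-estimates of the location $\hat\mu_n$ and shape $\hat\Gamma_n$ minimize
$$\frac{1}{n} \sum_{i=1}^{n} \rho_1\left( \left[ (x_i - T)^t G^{-1} (x_i - T) \right]^{1/2} / \tilde\sigma_n \right)$$
among all $T \in \mathbb{R}^p$ and $G \in \mathrm{PDS}(p)$ for which $\det(G) = 1$ (Tatsuoka and Tyler 2000)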
ρ functions
Both ρ0 and ρ1 are taken from the same family
The constant c in ρ0 can be tuned for robustness (breakdown point)
MM-estimator inherits its robustness from the S-scale
The constant c in ρ1 can be tuned for efficiency of locations
Tukey biweight ρ functions
[Figure: two panels, p = 2 and p = 5, each showing two ρ functions and their tuning constants c.]
Robust two-sample estimates
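Pool the scatter estimates $\hat\Sigma_{1n_1}$ and $\hat\Sigma_{2n_2}$ of both groups:
$$\hat\Sigma_n = \frac{n_1 \hat\Sigma_{1n_1} + n_2 \hat\Sigma_{2n_2}}{n_1 + n_2}$$
Calculate simultaneous S-estimates of the two locations and the common scatter matrix:
$\hat\mu_{1n}$, $\hat\mu_{2n}$ and $\hat\Sigma_n$ minimize $|C|$ subject to
$$\frac{1}{n_1 + n_2} \sum_{j=1}^{2} \sum_{i=1}^{n_j} \rho_0\left( \left[ (x_{ji} - T_j)^t C^{-1} (x_{ji} - T_j) \right]^{1/2} \right) = b$$
among all $T_1, T_2 \in \mathbb{R}^p$ and $C \in \mathrm{PDS}(p)$ (He and Fung 2000)
Similarly, simultaneous MM-estimates can be calculated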
Bootstrap inference
Advantages of bootstrap:
  Few assumptions
  Wide range of applications
Bootstrapping robust estimators:
  High computational cost
  Robustness not guaranteed
Fast and robust bootstrap
Fast and robust bootstrap principle
For each bootstrap sample:
Calculate an approximation for the estimates
Use the estimating equations
Fast to compute approximations
Inherit robustness of initial solution
Fast and robust bootstrap
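Consider estimates that are the solution of a fixed point equation:
$$\hat\Theta_n = g_n(\hat\Theta_n)$$
For a bootstrap sample, $\hat\Theta_n^{*} = g_n^{*}(\hat\Theta_n^{*})$; consider the one-step approximation
$$\hat\Theta_n^{1*} = g_n^{*}(\hat\Theta_n)$$
Take a Taylor expansion about the estimand $\Theta$:
$$\hat\Theta_n = g_n(\Theta) + \nabla g_n(\Theta)(\hat\Theta_n - \Theta) + O_P(n^{-1})$$
which can be rewritten as:
$$\sqrt{n}(\hat\Theta_n - \Theta) = [I - \nabla g_n(\Theta)]^{-1} \sqrt{n}\,(g_n(\Theta) - \Theta) + O_P(n^{-1/2})$$
We then obtain
$$\sqrt{n}(\hat\Theta_n^{*} - \hat\Theta_n) = [I - \nabla g_n(\hat\Theta_n)]^{-1} \sqrt{n}\,(g_n^{*}(\hat\Theta_n) - \hat\Theta_n) + O_P(n^{-1/2})$$
which yields the FRB estimate
$$\hat\Theta_n^{R*} = \hat\Theta_n + [I - \nabla g_n(\hat\Theta_n)]^{-1} (\hat\Theta_n^{1*} - \hat\Theta_n)$$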
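The FRB recipe can be illustrated on a deliberately simplified case. The sketch below uses a univariate location M-estimator with Tukey biweight weights, whose fixed-point map $g_n$ is a weighted mean. Several things here are assumptions made for the illustration and are not from the talk: the scale is held fixed at the MAD of the original sample instead of being part of $\Theta$, the gradient of $g_n$ is approximated by a central difference, and the tuning constant, function names and data are chosen for the example. The talk applies the same idea to multivariate S/MM-estimates.

```python
import numpy as np

def weights(x, theta, s, c):
    """Biweight weights w(r) = (1 - (r/c)^2)^2 for |r| <= c, else 0."""
    r = (x - theta) / s
    return np.where(np.abs(r) <= c, (1 - (r / c) ** 2) ** 2, 0.0)

def g(x, theta, s, c):
    """Fixed-point map g_n: weighted mean with weights evaluated at theta."""
    w = weights(x, theta, s, c)
    return np.sum(w * x) / np.sum(w)

def m_location(x, s, c, tol=1e-12, max_iter=500):
    """Iterate theta <- g_n(theta) from the median until it stops changing."""
    theta = np.median(x)
    for _ in range(max_iter):
        new = g(x, theta, s, c)
        if abs(new - theta) < tol:
            return new
        theta = new
    return theta

def frb_replicates(x, c=4.685, n_boot=500, seed=0, h=1e-5):
    """Return (theta_hat, FRB replicates) for the simplified location problem."""
    rng = np.random.default_rng(seed)
    s = 1.4826 * np.median(np.abs(x - np.median(x)))   # MAD, held fixed (simplification)
    theta_hat = m_location(x, s, c)
    # Central-difference approximation of the gradient of g_n at theta_hat.
    grad = (g(x, theta_hat + h, s, c) - g(x, theta_hat - h, s, c)) / (2 * h)
    correction = 1.0 / (1.0 - grad)                    # scalar version of [I - grad]^{-1}
    reps = np.empty(n_boot)
    for k in range(n_boot):
        xb = rng.choice(x, size=len(x), replace=True)
        one_step = g(xb, theta_hat, s, c)              # g_n^*(theta_hat), no re-iteration
        reps[k] = theta_hat + correction * (one_step - theta_hat)
    return theta_hat, reps

# Illustrative sample: 45 clean points plus 5 gross outliers.
rng = np.random.default_rng(42)
x = np.concatenate([rng.normal(size=45), np.full(5, 50.0)])
theta_hat, reps = frb_replicates(x)
```

Each replicate costs one weighted mean instead of a full iterative fit, and outliers keep the (zero) weights they received at the original estimate, so a bootstrap sample that happens to contain many outliers does not break the replicate.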
Properties of fast robust bootstrap
Computational efficiency: The FRB estimates are solutions of a system of linear equations
Robustness: The FRB estimates use the weights of the MM-estimates at the original sample
Consistency: Under regularity conditions, the FRB distribution of $\hat\Theta_n$ and the sample distribution of $\hat\Theta_n$ converge to the same limiting distribution
Smooth mappings: FRB commutes with smooth functions, such as $a = \Sigma^{-1}(\mu_1 - \mu_2)$
Variable selection in robust LDA
Two-group robust LDA
Selection criterion: test for significance of the discriminant coordinate coefficients
Use FRB distribution to estimate p-values
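The slides do not say how a p-value is read off the FRB distribution. One common choice, assumed here purely for illustration, is a two-sided percentile-type p-value for the hypothesis that a coefficient is zero: twice the smaller of the fractions of bootstrap replicates on either side of zero. The function name is hypothetical.

```python
import numpy as np

def bootstrap_p_value(replicates):
    """Two-sided percentile-type p-value for H0: coefficient = 0 (assumed rule)."""
    reps = np.asarray(replicates, dtype=float)
    frac_nonpositive = np.mean(reps <= 0)
    frac_nonnegative = np.mean(reps >= 0)
    return min(1.0, 2 * min(frac_nonpositive, frac_nonnegative))

# Toy replicate sets (made up): one clearly away from zero, one straddling zero.
p_clear = bootstrap_p_value([0.5, 1.0, 2.0])
p_straddling = bootstrap_p_value([-1.0, 1.0, 2.0, 3.0])
```

A coefficient whose replicates all fall on one side of zero gets a p-value of 0 under this rule, while replicates spread evenly around zero give a p-value of 1.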
Examples
Example: Biting Flies
Two groups of 35 flies (Leptoconops torrens and Leptoconops carteri)
Measurements of:
  wing length
  wing width
  third palp length
  third palp width
  fourth palp length
Biting Flies: outliers
[Figure: wing width plotted by group (1 and 2).]
Biting Flies: LDA
Robust LDA
Simultaneous two-sample MM-estimates
Backward elimination variable selection
Biting Flies: FRB
[Figure: five histogram panels, V1 to V5, each with horizontal axis Vj running from −1.0 to 1.0.]
Biting Flies: Backward elimination
Entries are p-values; "-" marks a variable already eliminated.

         Variable
Model    1        2        3        4        5
1        0.490    0.817    0.006    0.296    0.002
2        0.306    -        0.016    0.216    0.000
3        -        -        0.016    0.096    0.000
4        -        -        0.006    -        0.000
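The elimination path in this table can be replayed with a short sketch of backward elimination: refit, drop the variable with the largest p-value, and stop once every remaining p-value is below a threshold. The threshold of 0.05 is an assumption (the slides do not state one), the function name is hypothetical, and the assignment of p-values to variables in models 3 and 4 follows the reading that variables 2, 1 and 4 are dropped in turn.

```python
def backward_eliminate(p_values_for, variables, alpha):
    """Drop the variable with the largest p-value until all are below alpha."""
    active = list(variables)
    removed = []
    while active:
        p = p_values_for(frozenset(active))      # p-values for the current model
        worst = max(active, key=lambda v: p[v])
        if p[worst] < alpha:
            break                                # every remaining variable is significant
        active.remove(worst)
        removed.append(worst)
    return active, removed

# p-values transcribed from the table, keyed by the set of variables in each model.
TABLE = {
    frozenset({1, 2, 3, 4, 5}): {1: 0.490, 2: 0.817, 3: 0.006, 4: 0.296, 5: 0.002},
    frozenset({1, 3, 4, 5}): {1: 0.306, 3: 0.016, 4: 0.216, 5: 0.000},
    frozenset({3, 4, 5}): {3: 0.016, 4: 0.096, 5: 0.000},
    frozenset({3, 5}): {3: 0.006, 5: 0.000},
}

selected, dropped = backward_eliminate(TABLE.__getitem__, [1, 2, 3, 4, 5], alpha=0.05)
```

In a real analysis `p_values_for` would refit the robust LDA on the active variables and recompute the FRB p-values; here it only looks up the tabulated values.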
Conclusions
Conclusions and outlook
Robust LDA based on S/MM-estimators
Inference based on fast robust bootstrap
Simulations confirm its good performance
Variable selection based on contributions to discriminant coordinate
More than two groups: Use a robust likelihood ratio type test statistic as selection criterion
Robust likelihood ratio type test statistics
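$$\Lambda^R_n = \frac{|\hat\Sigma^{(g)}_n|}{|\hat\Sigma^{(1)}_n|} \equiv \frac{\tilde\sigma^{(g)}_n}{\tilde\sigma^{(1)}_n} = \frac{S_n(\hat\mu^{(g)}_{1,n}, \ldots, \hat\mu^{(g)}_{g,n}, \tilde\Gamma^{(g)}_n)}{S_n(\hat\mu^{(1)}_n, \tilde\Gamma^{(1)}_n)}$$

$$\Lambda^R_n = \frac{\sum_{j=1}^{g} \sum_{i=1}^{n_j} \rho_0\left( \left[ (x_{ji} - \hat\mu^{(g)}_{j,n})^t (\tilde\Gamma^{(g)}_n)^{-1} (x_{ji} - \hat\mu^{(g)}_{j,n}) \right]^{1/2} / \tilde\sigma^{(g)}_n \right)}{\sum_{j=1}^{g} \sum_{i=1}^{n_j} \rho_0\left( \left[ (x_{ji} - \hat\mu^{(1)}_n)^t (\tilde\Gamma^{(1)}_n)^{-1} (x_{ji} - \hat\mu^{(1)}_n) \right]^{1/2} / \tilde\sigma^{(g)}_n \right)}$$

$$\Lambda^R_n = \frac{\sum_{j=1}^{g} \sum_{i=1}^{n_j} \rho_0\left( \left[ (x_{ji} - \hat\mu^{(g)}_{j,n})^t (\hat\Sigma^{(g)}_n)^{-1} (x_{ji} - \hat\mu^{(g)}_{j,n}) \right]^{1/2} \right)}{\sum_{j=1}^{g} \sum_{i=1}^{n_j} \rho_0\left( \left[ (x_{ji} - \hat\mu^{(1)}_n)^t (\hat\Sigma^{(g)}_n)^{-1} (x_{ji} - \hat\mu^{(1)}_n) \right]^{1/2} \right)}$$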
References
He, X. and Fung, W.K. (2000). High breakdown estimation for multiple populations with applications to discriminant analysis. Journal of Multivariate Analysis, 72, 151–162.
Lopuhaä, H. (1989). On the relation between S-estimators and M-estimators of multivariate location and covariance. The Annals of Statistics, 17, 1662–1683.
Salibian-Barrera, M., Van Aelst, S., and Willems, G. (2006). PCA based on multivariate MM-estimators with fast and robust bootstrap. Journal of the American Statistical Association, 101, 1198–1211.
Tatsuoka, K.S. and Tyler, D.E. (2000). The uniqueness of S and M-functionals under non-elliptical distributions. The Annals of Statistics, 28, 1219–1243.
Van Aelst, S. and Willems, G. (2010). Inference for robust canonical variate analysis. Advances in Data Analysis and Classification, to appear.