 
Fast and Robust Classifiers Adjusted for Skewness
Mia Hubert and Stephan Van der Veeken
Katholieke Universiteit Leuven, Department of Mathematics
Mia.Hubert@wis.kuleuven.be
COMPSTAT 2010, August 24, 2010
Outline
■ Review of some classifiers
  ◆ normally distributed data
  ◆ depth based approaches
■ New approaches based on adjusted outlyingness
■ Simulation results
■ A real data set
■ Conclusions and outlook
Some classifiers
Setting:
■ Observations sampled from $k$ different classes $X_j$, $j = 1, \dots, k$.
■ Data belonging to group $X_j$ are denoted by $x_i^j$ ($i = 1, \dots, n_j$).
■ The dimension of the data space is $p$ and $p \ll n_j$.
■ Outliers possible!
Classification: construct a rule to classify a new observation into one of the $k$ populations.
Some classifiers
Normally distributed data:
■ Classical linear discriminant analysis (when covariance matrices in each group are equal)
■ Classical quadratic discriminant analysis (CQDA)
based on classical mean and covariance matrices.
Robust versions (RLDA, RQDA) are obtained by using robust covariance matrices, such as the MCD-estimator or S-estimators (He and Fung 2000, Croux and Dehon 2001, Hubert and Van Driessen 2004).
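For reference, the quadratic discriminant rule behind CQDA can be written as below. This is the standard textbook form, not a formula taken from the slides; the symbols $\hat\mu_j$ and $\hat\Sigma_j$ (center and covariance estimates of group $j$) and $\hat\pi_j$ (prior membership probability) are notation assumed here. A robust version such as RQDA would plug robust estimates, for example MCD, into the same expression.

```latex
\[
d_j(x) = -\tfrac{1}{2}\ln\lvert\hat\Sigma_j\rvert
         - \tfrac{1}{2}\,(x-\hat\mu_j)^{\top}\hat\Sigma_j^{-1}(x-\hat\mu_j)
         + \ln\hat\pi_j ,
\qquad
\text{assign } x \text{ to } \arg\max_{j}\, d_j(x).
\]
```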
Depth based classifiers
Proposed by Ghosh and Chaudhuri (2005).
■ Consider a depth function (Tukey depth, simplicial depth, ...).
■ For a new observation: compute its depth with respect to each group.
■ Assign the new observation to the group for which it attains the maximal depth.
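The maximal-depth rule above can be sketched in a few lines. This is a minimal illustration under simplifying assumptions, not the authors' implementation: it uses the univariate halfspace (Tukey) depth only, and the function names and toy groups are hypothetical.

```python
def halfspace_depth_1d(x, sample):
    """Univariate halfspace (Tukey) depth of x with respect to sample."""
    n = len(sample)
    left = sum(1 for s in sample if s <= x)   # points in the closed half-line (-inf, x]
    right = sum(1 for s in sample if s >= x)  # points in the closed half-line [x, +inf)
    return min(left, right) / n


def max_depth_classify(x, groups):
    """Assign x to the group in which it attains the maximal depth.

    groups maps a label to a list of observations. Ties are broken by
    dictionary order here, which is an arbitrary choice for this sketch.
    """
    depths = {label: halfspace_depth_1d(x, sample) for label, sample in groups.items()}
    best = max(depths, key=depths.get)
    return best, depths


# Hypothetical toy data: two well separated univariate groups.
groups = {"A": [1.0, 2.0, 3.0, 4.0, 5.0], "B": [6.0, 7.0, 8.0, 9.0, 10.0]}
label, depths = max_depth_classify(2.5, groups)
print(label, depths)
```

A point far from both groups, such as 100.0, gets depth zero with respect to each group, so the rule cannot separate the groups there without an additional tie-breaking step.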
Depth based classifiers
Advantages:
■ does not rely on normality
■ optimality results at normal data
■ robust towards outliers (degree of robustness depends on depth function)
■ can handle multigroup classification, not only two-group
Disadvantages:
■ computation time
■ ties: observations outside the convex hull of all groups have zero depth w.r.t. each group
■ adaptations necessary for unequal sample sizes. Ghosh and Chaudhuri propose methods that rely on kernel density estimates.
New depth based classifiers
New proposals based on adjusted outlyingness. First consider univariate data.
The standard boxplot has whiskers that end at the smallest and the largest data point that do not exceed
$$[\,Q_1 - 1.5\,\mathrm{IQR},\; Q_3 + 1.5\,\mathrm{IQR}\,]$$
The adjusted boxplot has whiskers that end at the smallest and the largest data point that do not exceed
$$[\,Q_1 - 1.5\,e^{-4\,\mathrm{MC}}\,\mathrm{IQR},\; Q_3 + 1.5\,e^{3\,\mathrm{MC}}\,\mathrm{IQR}\,]$$
with the medcouple
$$\mathrm{MC}(X) = \operatorname*{med}_{x_i < m < x_j} h(x_i, x_j)$$
with $m$ the median of $X$ and
$$h(x_i, x_j) = \frac{(x_j - m) - (m - x_i)}{x_j - x_i}$$
(Hubert and Vandervieren, CSDA, 2008)
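The definitions above can be sketched with the Python standard library alone. This is a naive illustration under stated assumptions, not the fast algorithm: the medcouple is computed over all pairs in O(n^2) time, observations equal to the median are skipped (following the strict inequality on the slide), the quartile convention (`method="inclusive"`) is a choice made here, and the branch for negative MC follows Hubert and Vandervieren (2008) rather than the slide, which shows only the interval for MC >= 0. The function names and sample values are hypothetical.

```python
from math import exp
from statistics import median, quantiles


def medcouple(xs):
    """Naive O(n^2) medcouple: median of h(x_i, x_j) over x_i < m < x_j."""
    m = median(xs)
    lower = [x for x in xs if x < m]
    upper = [x for x in xs if x > m]
    # Raises statistics.StatisticsError if no pair straddles the median.
    kernel = [((xj - m) - (m - xi)) / (xj - xi) for xi in lower for xj in upper]
    return median(kernel)


def adjusted_whiskers(xs):
    """Smallest and largest data point inside the adjusted boxplot fences."""
    q1, _, q3 = quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    mc = medcouple(xs)
    if mc >= 0:
        low = q1 - 1.5 * exp(-4 * mc) * iqr
        high = q3 + 1.5 * exp(3 * mc) * iqr
    else:
        # Assumption: mirrored exponents for left-skewed data (from the cited paper).
        low = q1 - 1.5 * exp(-3 * mc) * iqr
        high = q3 + 1.5 * exp(4 * mc) * iqr
    inside = [x for x in xs if low <= x <= high]
    return min(inside), max(inside)


# Hypothetical right-skewed sample (not the hospital data from the talk).
sample = [1, 2, 3, 4, 10, 20, 50]
print(medcouple(sample), adjusted_whiskers(sample))
```

For symmetric data the medcouple is zero and the adjusted fences coincide with the standard ones; for right-skewed data the upper fence moves outward and the lower fence moves inward.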
Medcouple - A robust measure of skewness
■ Robustness:
  ◆ bounded influence function
    → adding a small probability mass at a certain point has a bounded influence on the estimate.
  ◆ high breakdown point: $\varepsilon^*(\mathrm{MC}) = 25\%$
    → 25% of the data needs to be replaced to make the estimator break down
■ Computation:
  ◆ fast algorithm available, $O(n \log n)$
Adjusted boxplot
Example: Length of stay in hospital
[Figure: Comparison of the standard and adjusted boxplot. Panels: "Standard boxplot" and "Adjusted boxplot"; axis labels: "Values", "data".]
Adjusted outlyingness - univariate data
For univariate data, the adjusted outlyingness is defined as:
$$\mathrm{AO}^{(1)}_i = \frac{|x_i - m|}{(w_2 - m)\, I[x_i > m] + (m - w_1)\, I[x_i < m]}$$
with $w_1$ and $w_2$ the whiskers of the adjusted boxplot.
[Figure: points $x_1$ and $x_2$ with distances $d_1$, $d_2$ and scales $s_1$, $s_2$ marked by arrows.]
Adjusted outlyingness - univariate data
■ $\mathrm{AO}^{(1)}(x_1) = d_1/s_1$ and $\mathrm{AO}^{(1)}(x_2) = d_2/s_2$.
■ Although $x_1$ and $x_2$ are located at the same distance from the median, $x_1$ has a higher adjusted outlyingness because the denominator $s_1$ is smaller.
■ Skewness is thus used to estimate the scale differently on both sides of the median.
■ Data-driven (outlying with respect to the bulk of the data)
Brys, Hubert and Rousseeuw (2005), Hubert and Van der Veeken (2008)
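The univariate adjusted outlyingness can be sketched as a small function. This is an illustration only: the median `m` and the adjusted-boxplot whiskers `w1`, `w2` are passed in as given, the numeric values are hypothetical, and returning 0 for a point equal to the median is a convention assumed here, since the slide formula gives 0/0 in that case.

```python
def adjusted_outlyingness(x, m, w1, w2):
    """Univariate adjusted outlyingness of x.

    m is the median; w1 and w2 are the lower and upper whiskers of the
    adjusted boxplot. The scale in the denominator depends on the side
    of the median on which x lies.
    """
    if x > m:
        return (x - m) / (w2 - m)
    if x < m:
        return (m - x) / (m - w1)
    return 0.0  # assumed convention for x == m


# Hypothetical right-skewed setting: short lower whisker, long upper whisker.
m, w1, w2 = 10.0, 6.0, 30.0
ao_x1 = adjusted_outlyingness(4.0, m, w1, w2)   # d1 = 6, s1 = m - w1 = 4
ao_x2 = adjusted_outlyingness(16.0, m, w1, w2)  # d2 = 6, s2 = w2 - m = 20
print(ao_x1, ao_x2)
```

Both points lie 6 units from the median, yet the point on the short-tailed side receives the larger outlyingness because its scale is smaller.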