Robust Statistics, Part 3: Regression Analysis
Peter Rousseeuw, LARS-IASC School, May 2019


  1. Robust Statistics, Part 3: Regression analysis. Peter Rousseeuw, LARS-IASC School, May 2019.

     Outline:
     1. Classical regression estimators
     2. Classical outlier diagnostics
     3. Regression M-estimators
     4. The LTS estimator
     5. Outlier detection
     6. Regression S-estimators and MM-estimators
     7. Regression with categorical predictors
     8. Software

  2. Linear regression: classical estimators.

     The linear regression model says

        y_i = β_0 + β_1 x_{i1} + … + β_p x_{ip} + ε_i = x_i′ β + ε_i

     with i.i.d. errors ε_i ∼ N(0, σ²), x_i = (1, x_{i1}, …, x_{ip})′ and β = (β_0, β_1, …, β_p)′.
     Denote the n × (p+1) matrix containing the predictors x_i as X = (x_1, …, x_n)′, the vector
     of responses as y = (y_1, …, y_n)′ and the error vector as ε = (ε_1, …, ε_n)′. Then:

        y = X β + ε

     Any regression estimate β̂ yields fitted values ŷ = X β̂ and residuals r_i = r_i(β̂) = y_i − ŷ_i.

     The least squares (LS) estimator is

        β̂_LS = argmin_β Σ_{i=1}^n r_i²(β)

     If X has full rank, the solution is unique and given by

        β̂_LS = (X′X)⁻¹ X′y

     The usual unbiased estimator of the error variance is

        σ̂²_LS = (1/(n − p − 1)) Σ_{i=1}^n r_i²(β̂_LS)
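
     To make these formulas concrete, here is a minimal numpy sketch of the LS fit on simulated
     data. Everything in it (the variable names, the simulated data, and the use of lstsq rather
     than the explicit inverse) is illustrative and not part of the slides.

     import numpy as np

     rng = np.random.default_rng(0)
     n, p = 50, 2
     # Design matrix with an intercept column, as in x_i = (1, x_i1, ..., x_ip)'
     X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
     beta_true = np.array([1.0, 2.0, -0.5])
     y = X @ beta_true + rng.normal(scale=0.3, size=n)

     # beta_LS = (X'X)^{-1} X'y, computed via lstsq for numerical stability
     beta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
     residuals = y - X @ beta_ls
     # Unbiased error variance estimate with n - p - 1 degrees of freedom
     sigma2_ls = residuals @ residuals / (n - p - 1)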

  3. Outliers in regression.

     [Figure: scatterplot of y versus x showing the regular data together with a vertical
     outlier, a good leverage point and a bad leverage point.]

     Four types of observations:
     1. regular observations: internal x_i and well-fitting y_i
     2. vertical outliers: internal x_i and non-fitting y_i
     3. good leverage points: outlying x_i and well-fitting y_i
     4. bad leverage points: outlying x_i and non-fitting y_i

  4. Effect of vertical outliers.

     Example: the Telephone data set contains the number of international telephone calls
     (in tens of millions) from Belgium in the years 1950-1973.

     [Figure: Calls versus Year.]
     [Figure: LS fit with all observations ("LS (all)") and LS fit without the outliers
     ("LS (reduced)"); the vertical outliers pull the LS line away from the majority of
     the data.]

  5. Effect of bad leverage points.

     Stars data set: Hertzsprung-Russell diagram of the star cluster CYG OB1 (47 stars).
     Here X is the logarithm of a star's surface temperature, and Y is the logarithm of its
     light intensity.

     [Figure: log.light versus log.Te for the 47 stars.]
     [Figure: LS fit with all stars ("LS (all)") and LS fit without the giant stars
     ("LS (reduced)"); labeled points include 7, 9, 11, 14, 20, 30 and 34.]

  6. Classical outlier diagnostics: standardized residuals.

     The residual plot shows the standardized LS residuals

        r_i(β̂_LS) / σ̂_LS

     [Figure: standardized LS residuals versus index for the Telephone data and the Stars data.]
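
     Continuing the numpy sketch above, standardized residuals are just the raw residuals
     divided by σ̂_LS; the 2.5 flagging cutoff below is a common informal choice, not one the
     slides prescribe.

     # Standardized LS residuals r_i / sigma_LS (continues the sketch above)
     std_resid = residuals / np.sqrt(sigma2_ls)
     # Flag observations beyond an illustrative cutoff of 2.5
     flagged = np.flatnonzero(np.abs(std_resid) > 2.5)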

  7. Studentized residuals.

     The studentized LS residuals are obtained as follows:
     1. Remove observation (x_i, y_i) from the data set.
     2. Compute β̂_LS^{(i)} on the remaining data.
     3. Compute the fitted value of y_i given by ŷ_i^{(i)} = x_i′ β̂_LS^{(i)}.
     4. Compute the "deleted residual" d_i = y_i − ŷ_i^{(i)}.
     5. The studentized residuals are r*_i = d_i / s(d_j), where s(d_j) is the standard
        deviation of all d_j.

     The studentized residuals can be computed without refitting the model each time an
     observation is deleted; see the sketch below.

     [Figure: studentized LS residuals versus index for the Telephone data and the Stars data.]
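
     A closed-form version of the leave-one-out procedure, using the standard identity
     d_i = r_i / (1 − h_ii), where h_ii is the diagonal of the hat matrix defined on the next
     slide. That identity is textbook material rather than something the slides state; the
     final line follows the slide's own definition r*_i = d_i / s(d_j).

     # Hat matrix H = X (X'X)^{-1} X'; its diagonal gives the leverages h_ii
     H = X @ np.linalg.solve(X.T @ X, X.T)
     h = np.diag(H)
     # Deleted residuals without any refitting: d_i = r_i / (1 - h_ii)
     d = residuals / (1.0 - h)
     # Studentized residuals as defined on the slide: d_i / s(d_j)
     r_star = d / d.std(ddof=1)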

  8. Hat matrix.

     The hat matrix H = X(X′X)⁻¹X′ transforms the observed response vector y into its LS
     estimate:

        ŷ = H y,  or equivalently  ŷ_i = h_{i1} y_1 + h_{i2} y_2 + … + h_{in} y_n.

     The element h_{ij} of H thus measures the effect of the j-th observation on ŷ_i, and the
     diagonal element h_{ii} the effect of the i-th observation on its own prediction.
     Since average(h_{ii}) = (p+1)/n and 0 ≤ h_{ii} ≤ 1, it is sometimes suggested to call
     observation i a leverage point iff h_{ii} > 2(p+1)/n.

     [Figure: hat matrix diagonal versus index for the Telephone data and the Stars data.]
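
     The leverage rule is immediate given the h computed above; the printout is only a sanity
     check of the fact that average(h_ii) = (p+1)/n.

     # Classical leverage rule from the slide: flag i when h_ii > 2(p+1)/n
     cutoff = 2 * (p + 1) / n
     leverage_pts = np.flatnonzero(h > cutoff)
     print(f"average h_ii = {h.mean():.3f}, theory (p+1)/n = {(p + 1) / n:.3f}")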

  9. Hat matrix and Mahalanobis distance; Cook's distance.

     It can be shown that there is a one-to-one correspondence between the squared Mahalanobis
     distance of object i and its h_{ii}:

        h_{ii} = MD_i² / (n − 1) + 1/n

     with

        MD_i = MD(x_i) = sqrt( (x_i − x̄_n)′ S_n⁻¹ (x_i − x̄_n) ).

     From this expression we see that h_{ii} measures the distance of x_i to the center of the
     data points in the x-space. On the other hand, it shows that the h_{ii} diagnostic is based
     on nonrobust estimates! Indeed, it often masks outlying x_i.

     Cook's distance D_i measures the influence of the i-th case on all n fitted values:

        D_i = (ŷ − ŷ^{(i)})′ (ŷ − ŷ^{(i)}) / ((p+1) σ̂²_LS).

     It is also equivalent to

        D_i = (β̂ − β̂^{(i)})′ (X′X) (β̂ − β̂^{(i)}) / ((p+1) σ̂²_LS).

     In this sense D_i measures the influence of the i-th case on the regression coefficients.
     Often the cutoff values 1 or 4/n are suggested.
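
     Cook's distance also has a closed form in terms of the quantities already computed,
     D_i = r_i² h_ii / ((p+1) σ̂²_LS (1 − h_ii)²). This is a standard algebraic equivalence
     with the two definitions above, though the slides do not state it themselves.

     # Cook's distance via the standard closed form (no refitting needed)
     D = residuals**2 * h / ((p + 1) * sigma2_ls * (1.0 - h) ** 2)
     # One of the two cutoffs suggested on the slide
     influential = np.flatnonzero(D > 4 / n)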
