Robust Statistics Part 3: Regression analysis Peter Rousseeuw - PDF document

Robust Statistics Part 3: Regression analysis Peter Rousseeuw LARS-IASC School, May 2019 Peter Rousseeuw Robust Statistics, Part 3: Regression LARS-IASC School, May 2019 p. 1 Linear regression Linear regression: Outline Classical regression estimators 1 Classical outlier diagnostics 2 Regression M-estimators 3 The LTS estimator 4 Outlier detection 5 Regression S-estimators and MM-estimators 6 Regression with categorical predictors 7 Software 8 Peter Rousseeuw Robust Statistics, Part 3: Regression LARS-IASC School, May 2019 p. 2

Linear regression Classical estimators The linear regression model The linear regression model says: y i = β 0 + β 1 x i 1 + . . . + β p x ip + ε i = x ′ i β + ε i with i.i.d. errors ε i ∼ N (0 , σ 2 ) , x i = (1 , x i 1 , . . . , x ip ) ′ and β = ( β 0 , β 1 , . . . , β p ) ′ . Denote the n × ( p + 1) matrix containing the predictors x i as X = ( x 1 , . . . , x n ) ′ , the vector of responses y = ( y 1 , . . . , y n ) ′ and the error vector ε = ( ε 1 , . . . , ε n ) ′ . Then: y = X β + ε Any regression estimate ˆ y = X ˆ β yields fitted values ˆ β and residuals r i = r i (ˆ β ) = y i − ˆ y i . Peter Rousseeuw Robust Statistics, Part 3: Regression LARS-IASC School, May 2019 p. 3 Linear regression Classical estimators The least squares estimator Least squares estimator n ˆ � r 2 β LS = argmin i ( β ) β i =1 If X has full rank, then the solution is unique and given by ˆ β LS = ( X ′ X ) − 1 X ′ y The usual unbiased estimator of the error variance is n 1 � i (ˆ σ 2 r 2 ˆ LS = β LS ) n − p − 1 i =1 Peter Rousseeuw Robust Statistics, Part 3: Regression LARS-IASC School, May 2019 p. 4

Linear regression Classical estimators Outliers in regression Different types of outliers: vertical outlier good leverage point • • y • • • regular data • • • • • • • • • • • • • • • • • • •• bad leverage point • • • • x Peter Rousseeuw Robust Statistics, Part 3: Regression LARS-IASC School, May 2019 p. 5 Linear regression Classical estimators Outliers in regression regular observations : internal x i and well-fitting y i 1 vertical outliers : internal x i and non-fitting y i 2 good leverage points : outlying x i and well-fitting y i 3 bad leverage points : outlying x i and non-fitting y i 4 Peter Rousseeuw Robust Statistics, Part 3: Regression LARS-IASC School, May 2019 p. 6

Linear regression Classical estimators Effect of vertical outliers Example: Telephone data set, which contains the number of international telephone calls (in tens of millions) from Belgium in the years 1950-1973. ● 20 ● ● 15 ● ● ● Calls 10 5 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0 50 55 60 65 70 Year Peter Rousseeuw Robust Statistics, Part 3: Regression LARS-IASC School, May 2019 p. 7 Linear regression Classical estimators Effect of vertical outliers LS fit with and without the outliers: ● 20 ● ● 15 ● ● ● Calls 10 LS (all) 5 ● ● ● ● ● ● LS (reduced) ● ● ● ● ● ● ● ● ● ● ● ● 0 50 55 60 65 70 Year Peter Rousseeuw Robust Statistics, Part 3: Regression LARS-IASC School, May 2019 p. 8

Linear regression Classical estimators Effect of bad leverage points Stars data set: Hertzsprung-Russell diagram of the star cluster CYG OB1 (47 stars). Here X is the logarithm of a star’s surface temperature, and Y is the logarithm of its light intensity. ● 6.0 ● ● ● ● ● ● ● ● 5.5 ● ● ● ● ● ● log.light ● ● ● ● ● ● ● ● ● 5.0 ● ● ● ● ● ● ● ● ● ● ● 4.5 ● ● ● ● ● ● ● ● ● ● 4.0 ● ● 3.6 3.8 4.0 4.2 4.4 4.6 log.Te Peter Rousseeuw Robust Statistics, Part 3: Regression LARS-IASC School, May 2019 p. 9 Linear regression Classical estimators Effect of bad leverage points LS fit with and without the giant stars: 34 ● 30 6.0 ● 20 ● 11 ● ● ● 9 ● ● ● 5.5 ● ● ● ● ● ● log.light ● LS (all) ● ● ● ● ● ● ● ● ● 5.0 ● ● ● ● ● 7 ● ● ● ● ● 4.5 ● LS (reduced) ● ● ● ● ● ● ● ● ● 14 4.0 ● ● 3.6 3.8 4.0 4.2 4.4 4.6 log.Te Peter Rousseeuw Robust Statistics, Part 3: Regression LARS-IASC School, May 2019 p. 10

Linear regression Classical outlier diagnostics Classical outlier diagnostics Classical regression estimators 1 Classical outlier diagnostics 2 Regression M-estimators 3 The LTS estimator 4 Outlier detection 5 Regression S-estimators and MM-estimators 6 Regression with categorical predictors 7 Software 8 Peter Rousseeuw Robust Statistics, Part 3: Regression LARS-IASC School, May 2019 p. 11 Linear regression Classical outlier diagnostics Standardized residuals This residual plot shows the standardized LS residuals r i (ˆ β LS ) ˆ σ LS Telephone data Stars data 3 3 ● 2 2 ● ● ● ● ● Standardized LS residual Standardized LS residual ● ● ● ● ● 1 ● ● 1 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0 ● 0 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● −1 ● −1 ● ● ● ● ● ● ● ● ● ● ● ● ● −2 −2 ● ● −3 −3 5 10 15 20 0 10 20 30 40 Index Index Peter Rousseeuw Robust Statistics, Part 3: Regression LARS-IASC School, May 2019 p. 12

Linear regression Classical outlier diagnostics Studentized residuals Residual plot of the studentized LS residuals given by: remove observation ( x i , y i ) from the data set 1 ( i ) compute ˆ LS on the remaining data β 2 ( i ) y ( i ) i ˆ = x ′ compute the fitted value of y i given by ˆ β 3 i LS compute the “deleted residual”: 4 y ( i ) d i = y i − ˆ i the studentized residuals are r ∗ i = d i /s ( d j ) where s ( d i ) is the standard 5 deviation of all d j . The studentized residuals can be computed without refitting the model each time an observation is deleted. Peter Rousseeuw Robust Statistics, Part 3: Regression LARS-IASC School, May 2019 p. 13 Linear regression Classical outlier diagnostics Studentized residuals Telephone data Stars data 3 3 ● 2 2 ● ● ● ● ● ● ● Studentized LS residual Studentized LS residual ● ● ● ● 1 ● ● 1 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0 0 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● −1 ● −1 ● ● ● ● ● ● ● ● ● ● ● ● ● −2 −2 ● ● −3 −3 5 10 15 20 0 10 20 30 40 Index Index Peter Rousseeuw Robust Statistics, Part 3: Regression LARS-IASC School, May 2019 p. 14

Linear regression Classical outlier diagnostics Hat matrix The hat matrix H = X ( X ′ X ) − 1 X ′ transforms the observed response vector y into its LS estimate: ˆ y = H y or equivalently y i = h i 1 y 1 + h i 2 y 2 + . . . + h in y n . ˆ The element h ij of H thus measures the effect of the j th observation on ˆ y i , and the diagonal element h ii the effect of the i th observation on its own prediction. Since it holds that average ( h ii ) = ( p + 1) /n and 0 � h ii � 1 , it is sometimes suggested to call observation i a leverage point iff h ii > 2( p + 1) . n Peter Rousseeuw Robust Statistics, Part 3: Regression LARS-IASC School, May 2019 p. 15 Linear regression Classical outlier diagnostics Hat matrix Telephone data Stars data 0.20 0.20 ● ● ● ● 0.15 ● ● ● ● 0.15 Hat matrix diagonal Hat matrix diagonal ● ● 0.10 ● ● 0.10 ● ● ● ● ● ● ● ● ● 0.05 ● ● 0.05 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.00 0.00 5 10 15 20 0 10 20 30 40 Index Index Peter Rousseeuw Robust Statistics, Part 3: Regression LARS-IASC School, May 2019 p. 16

Linear regression Classical outlier diagnostics Hat matrix It can be shown that there is a one-to-one correspondence between the squared Mahalanobis distance for object i and its h ii : 1 i + 1 n − 1 MD 2 h ii = n with � x n ) ′ S − 1 MD i = MD ( x i ) = ( x i − ¯ n ( x i − ¯ x n ) . From this expression we see that h ii measures the distance of x i to the center of the data points in the x -space. On the other hand, it shows that the h ii diagnostic is based on nonrobust estimates! Indeed, it often masks outlying x i . Peter Rousseeuw Robust Statistics, Part 3: Regression LARS-IASC School, May 2019 p. 17 Linear regression Classical outlier diagnostics Cook’s distance Cook’s distance D i measures the influence of the i th case on all n fitted values: y ( i ) ) ′ (ˆ y ( i ) ) D i = (ˆ y − ˆ y − ˆ . σ 2 ( p + 1)ˆ LS It is also equivalent to ( i ) ) ′ ( X ′ X )(ˆ ( i ) ) D i = (ˆ β − ˆ β − ˆ β β . σ 2 ( p + 1)ˆ LS In this sense D i measures the influence of the i th case on the regression coefficients. Often the cutoff values 1 or 4 /n are suggested. Peter Rousseeuw Robust Statistics, Part 3: Regression LARS-IASC School, May 2019 p. 18

Robust Statistics Part 3: Regression analysis Peter Rousseeuw - PDF document

Robust Statistics Part 3: Regression analysis Peter Rousseeuw LARS-IASC School, May 2019 Peter Rousseeuw Robust Statistics, Part 3: Regression LARS-IASC School, May 2019 p. 1 Linear regression Linear regression: Outline Classical

Business Statistics CONTENTS Multiple regression Dummy regressors Assumptions of regression

Regression 3: Logistic Regression Marco Baroni Practical Statistics in R Outline Logistic

Outlier Outlier Outlier- Outlier - -robust - robust robust robust identification

Regression 1: Linear Regression Marco Baroni Practical Statistics in R Outline Classic linear

Regression Methods 1. Linear Regression and Logistic Regression: definitions, and a common

Part 3 Robust Bayesian statistics & applications in reliability networks by Gero Walter 69

Data-driven window width adaption adaption for robust for robust online moving window regression

Statistics 430/514 Introduction to Regression Analysis/ Statistics for Management and the Social

Stat 8053, Fall 2013: Robust Regression Duncans occupational-prestige regression was introduced

Analysis of variance and regression Other types of regression models Other types of regression

Logistic Regression James H. Steiger Department of Psychology and Human Development Vanderbilt

Kernel Methods for Regression Support Vector Regression Gaussian Mixture Regression Gaussian

Lecture 8: Regression Trees Instructor: Saravanan Thirumuruganathan CSE 5334 Saravanan

Multiple Regression and Logistic Regression I Dajiang Liu @PHS 525 Apr-14-2016 Multiple

Planning and Optimization B2. Regression: Introduction & STRIPS Case Malte Helmert and

Introduction to Regression Analysis Modeling a Response A regression model describes how a

Preserving Statistical Validity in Adaptive Data Analysis Moritz Hardt IBM Research Almaden

1 Analysis Information Where Do Facts Hold? How much information depends on the client

Exploratory Data Analysis Nam Wook Kim Mini-Courses January @ GSAS 2018 Goal Learn the

+ Program Evaluation Planning & Data Analysis ScWk 242 Session 11 Slides + Evaluation

Practical Traffic Analysis Attacks on Secure Messaging Applications Alireza Bahramali, Ramin

Acknowledgement Frank Chen, Glenn Holloway, Dan Janni, Peter Mattson, Lifeng Nai, David

Pr Progr gram T am Trans ansforma,o rma,on f n for A r Aiding iding St Sta,c a,c A

Analysis of the Voice Conversion Challenge 2016 Evaluation Results Mirjam Wester, Zhizheng Wu

Robust Statistics Part 3: Regression analysis Peter Rousseeuw - PDF document

Robust Statistics Part 3: Regression analysis Peter Rousseeuw LARS-IASC School, May 2019 Peter Rousseeuw Robust Statistics, Part 3: Regression LARS-IASC School, May 2019 p. 1 Linear regression Linear regression: Outline Classical

Business Statistics CONTENTS Multiple regression Dummy regressors Assumptions of regression

Regression 3: Logistic Regression Marco Baroni Practical Statistics in R Outline Logistic

Outlier Outlier Outlier- Outlier - -robust - robust robust robust identification

Regression 1: Linear Regression Marco Baroni Practical Statistics in R Outline Classic linear

Regression Methods 1. Linear Regression and Logistic Regression: definitions, and a common

Part 3 Robust Bayesian statistics &amp; applications in reliability networks by Gero Walter 69

Data-driven window width adaption adaption for robust for robust online moving window regression

Statistics 430/514 Introduction to Regression Analysis/ Statistics for Management and the Social

Stat 8053, Fall 2013: Robust Regression Duncans occupational-prestige regression was introduced

Analysis of variance and regression Other types of regression models Other types of regression

Logistic Regression James H. Steiger Department of Psychology and Human Development Vanderbilt

Kernel Methods for Regression Support Vector Regression Gaussian Mixture Regression Gaussian

Lecture 8: Regression Trees Instructor: Saravanan Thirumuruganathan CSE 5334 Saravanan

Multiple Regression and Logistic Regression I Dajiang Liu @PHS 525 Apr-14-2016 Multiple

Planning and Optimization B2. Regression: Introduction &amp; STRIPS Case Malte Helmert and

Introduction to Regression Analysis Modeling a Response A regression model describes how a

Preserving Statistical Validity in Adaptive Data Analysis Moritz Hardt IBM Research Almaden

1 Analysis Information Where Do Facts Hold? How much information depends on the client

Exploratory Data Analysis Nam Wook Kim Mini-Courses January @ GSAS 2018 Goal Learn the

+ Program Evaluation Planning &amp; Data Analysis ScWk 242 Session 11 Slides + Evaluation

Practical Traffic Analysis Attacks on Secure Messaging Applications Alireza Bahramali, Ramin

Acknowledgement Frank Chen, Glenn Holloway, Dan Janni, Peter Mattson, Lifeng Nai, David

Pr Progr gram T am Trans ansforma,o rma,on f n for A r Aiding iding St Sta,c a,c A

Analysis of the Voice Conversion Challenge 2016 Evaluation Results Mirjam Wester, Zhizheng Wu

Part 3 Robust Bayesian statistics & applications in reliability networks by Gero Walter 69

Planning and Optimization B2. Regression: Introduction & STRIPS Case Malte Helmert and

+ Program Evaluation Planning & Data Analysis ScWk 242 Session 11 Slides + Evaluation