Residual Analysis

ST 430/514 Introduction to Regression Analysis/Statistics for Management and the Social Sciences II

Inferences about a regression model are valid only under assumptions about the random errors in the observations.

Objectives:
Show how residuals reveal departures from assumptions;
Suggest procedures for coping with such departures.

Regression Residuals

The random errors $\epsilon$ satisfy $Y = E(Y) + \epsilon$, or $\epsilon = Y - E(Y)$.

We observe $Y$, but we do not know $E(Y)$, so we cannot calculate $\epsilon$.

We estimate $E(Y)$ by $\hat{y}$, the predicted (or fitted) value.

We approximate the random errors by the regression residuals:

$\hat{\epsilon}_i = y_i - \hat{y}_i, \quad i = 1, 2, \dots, n.$
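The definition of the residuals can be checked directly in R. The following is a minimal sketch on simulated data; the variables `x` and `y` and the true line $2 + 3x$ are assumptions made for illustration and are not one of the course data sets.

```r
# Simulated data (illustrative only; not from the course data sets)
set.seed(1)
x <- runif(50, 0, 10)
y <- 2 + 3 * x + rnorm(50, sd = 1.5)   # here the true E(Y) is 2 + 3x

fit   <- lm(y ~ x)
y_hat <- fitted(fit)                   # estimate of E(Y)
e_hat <- y - y_hat                     # residuals, computed from the definition

# residuals() returns the same quantities, up to rounding error
max_diff <- max(abs(e_hat - residuals(fit)))
print(max_diff)
```

In practice `residuals(fit)` is the usual way to obtain them; the manual calculation is shown only to make the definition concrete.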

Properties of residuals

If the model contains an intercept, the sum of the residuals, and also their mean, is zero:

$\sum_{i=1}^{n} \hat{\epsilon}_i = 0$, and so $\bar{\hat{\epsilon}} = 0$.

The covariance of the residuals and any term in the regression model is zero:

$\sum_{i=1}^{n} \hat{\epsilon}_i x_{i,j} = 0, \quad j = 1, 2, \dots, k.$
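Both properties can be verified numerically. This is a sketch on simulated data (the predictors `x1`, `x2` and the coefficients are hypothetical); it also fits a model without an intercept to show that the first property depends on the intercept being present.

```r
# Simulated data (illustrative only)
set.seed(2)
n  <- 40
x1 <- rnorm(n)
x2 <- runif(n)
y  <- 1 + 2 * x1 - x2 + rnorm(n)

fit   <- lm(y ~ x1 + x2)
e_hat <- residuals(fit)

sum_resid <- sum(e_hat)        # zero because the model has an intercept
cross_x1  <- sum(e_hat * x1)   # zero for every term in the model
cross_x2  <- sum(e_hat * x2)
print(c(sum_resid, cross_x1, cross_x2))

# Without an intercept, the residuals need not sum to zero
fit0       <- lm(y ~ 0 + x1 + x2)
sum_resid0 <- sum(residuals(fit0))
print(sum_resid0)
```

The printed values for the model with an intercept are zero only up to floating-point rounding.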

Detecting Lack of Fit

A misspecified model is one that leaves out a relevant predictor.

The residuals from a misspecified model do not have mean zero at each value of the predictors: they still sum to zero overall, but they show a systematic pattern.

Example: serum cholesterol ($y$) and dietary fat ($x$) in Olympic athletes.

    ath <- read.table("Text/Exercises&Examples/OLYMPIC.txt", header = TRUE)
    pairs(ath)

Suppose we ignore the graph, and fit a first-order model:

    l1 <- lm(CHOLESTEROL ~ FAT, ath)
    summary(l1)
    plot(ath$FAT, residuals(l1))

The summary of the fitted model looks reasonable.

But the graph of the residuals against $x$ shows that the assumption $E(\epsilon) = 0$ is violated.

Because this is a straight-line model, this graph is effectively the same as the "residuals versus fitted value" graph from plot(l1).

Because of the curvature, we could fit the second-order (quadratic) model:

    l2 <- lm(CHOLESTEROL ~ FAT + I(FAT^2), ath)
    summary(l2)
    plot(ath$FAT, residuals(l2))

The residual plot suggests that the model is satisfactory.

The quadratic term is highly significant.

Partial residuals

Sometimes the effect of an independent variable is better described by a transformed version: $\log(x)$, $1/x$, etc.

The partial residual plot can help identify the transformation:

The partial residuals for independent variable $x_j$ are $\hat{\epsilon}^* = \hat{\epsilon} + \hat{\beta}_j x_j$.

Plot $\hat{\epsilon}^*$ against $x_j$.

Also known as a "Component + Residual" plot.
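The partial residuals can be computed by hand from a fitted model. The sketch below uses simulated data in which the true effect of `x1` is logarithmic; the data and variable names are assumptions made for illustration.

```r
# Simulated data (illustrative only): the true effect of x1 is logarithmic
set.seed(3)
n  <- 100
x1 <- runif(n, 1, 10)
x2 <- rnorm(n)
y  <- 5 + 4 * log(x1) + x2 + rnorm(n, sd = 0.3)

fit <- lm(y ~ x1 + x2)                      # first-order model, misspecified in x1

# Partial residuals for x1: residual plus the fitted linear component of x1
partial_x1 <- residuals(fit) + coef(fit)["x1"] * x1

plot(x1, partial_x1, xlab = "x1", ylab = "Partial residual for x1")
lines(lowess(x1, partial_x1), col = "red")  # smooth curve to show the shape
```

A straight-line fit to this plot has slope $\hat{\beta}_j$, so any curvature in the smooth curve points to a transformation of `x1`.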

Example

Effect of price ($p$) and advertising ($x_2$) on demand ($y$) for coffee.

    coffee <- read.table("Text/Exercises&Examples/COFFEE2.txt", header = TRUE)
    pairs(coffee)

Try a first-order model:

    l1 <- lm(DEMAND ~ PRICE + AD, coffee)
    summary(l1)
    plot(coffee$PRICE, residuals(l1))

The residual plot shows misspecification.

The Component + Residual plot:

    library(car)
    crPlot(l1, variable = "PRICE")

The curve suggests either adding PRICE^2, or transforming to log(PRICE) or 1/PRICE.

$R^2$ and $R^2_a$ are highest for 1/PRICE.

Note: the partial regression plot is different.

Detecting Unequal Variances

Homoscedasticity versus heteroscedasticity. That is, constant variance versus varying variance.

When the variance is not constant, it is most often related to the mean.

For Poisson-distributed data (counts), $\mathrm{var}(Y) = E(Y)$.

When errors are multiplicative, $Y = E(Y) \times (1 + \epsilon)$, and $\mathrm{var}(Y) \propto E(Y)^2$.

Sometimes the variance can be made constant by transforming $Y$.

For example, with multiplicative errors,

$\log(Y) = \log[E(Y) \times (1 + \epsilon)] = \log[E(Y)] + \log[1 + \epsilon] \approx \log[E(Y)] + \epsilon.$

So $\mathrm{var}[\log Y]$ is (approximately) constant.

Even when a transformation makes the variance constant, a different method may be better than using the transformation.

For example, with Poisson-distributed counts, $\sqrt{Y}$ has approximately constant variance, but a generalized linear model may be more satisfactory.
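The effect of the log transformation under multiplicative errors can be seen in a small simulation. This is a sketch under assumed values (mean $\exp(1 + x)$, relative error standard deviation 0.1); none of these numbers come from the course examples.

```r
# Simulated multiplicative errors (illustrative values only)
set.seed(4)
n   <- 2000
x   <- runif(n, 0, 3)
mu  <- exp(1 + x)               # E(Y), spanning a wide range
eps <- rnorm(n, sd = 0.1)       # small relative error
y   <- mu * (1 + eps)           # Y = E(Y) * (1 + epsilon)

# Split the observations into low-mean and high-mean halves
group <- cut(mu, breaks = quantile(mu, c(0, 0.5, 1)),
             include.lowest = TRUE, labels = c("low", "high"))

sd_raw <- tapply(y - mu, group, sd)            # spread on the original scale
sd_log <- tapply(log(y) - log(mu), group, sd)  # spread on the log scale
print(rbind(raw = sd_raw, log = sd_log))
```

On the original scale the spread grows with the mean, while on the log scale the two halves have nearly the same standard deviation.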

Example

Salary and experience for social workers.

    workers <- read.table("Text/Exercises&Examples/SOCWORK.txt", header = TRUE)
    pairs(workers)

Try a second-order model:

    l2 <- lm(SALARY ~ EXP + I(EXP^2), workers)
    summary(l2)
    plot(l2)

The "Residuals vs Fitted" plot shows a fan-shaped scatter, and the "Scale-Location" plot shows an upward trend.

It suggests $\mathrm{std\,dev}(Y) \propto E(Y)$, hence $\mathrm{var}(Y) \propto E(Y)^2$, so try logarithms.

Second-order model for log(SALARY):

    lLog2 <- lm(log(SALARY) ~ EXP + I(EXP^2), workers)
    summary(lLog2)

The quadratic term is not significant, so try a first-order model:

    lLog1 <- lm(log(SALARY) ~ EXP, workers)
    summary(lLog1)
    plot(lLog1)

The residual plots are more satisfactory.

Simple Test for Heteroscedasticity

Divide the data set in two, for instance low fitted values versus high fitted values.

Fit the model separately to each part, and compare the MSEs (mean squares for error).

Under $H_0$: variance is constant,

$F^* = \dfrac{\mathrm{MSE}_1}{\mathrm{MSE}_2}$

has the $F$-distribution with $\nu_1 = n_1 - (k + 1)$ and $\nu_2 = n_2 - (k + 1)$ degrees of freedom.

This is usually a two-sided test; $H_a$: variance is not constant.

Reject $H_0$ at level $\alpha$ if $F^*$ differs too far from 1 in either direction; that is, if

$F^* < F_{1 - \alpha/2}(\nu_1, \nu_2)$, the lower $\alpha/2$-point of the distribution, or

$F^* > F_{\alpha/2}(\nu_1, \nu_2)$, the upper $\alpha/2$-point of the distribution.

Note: $F_{1 - \alpha/2}(\nu_1, \nu_2) = 1 / F_{\alpha/2}(\nu_2, \nu_1)$, so an equivalent method is based on

$F = \dfrac{\text{Larger MSE}}{\text{Smaller MSE}} = \max\left(F^*, \dfrac{1}{F^*}\right).$

Then we reject $H_0$ if $F > F_{\alpha/2}(\nu_{\text{Larger}}, \nu_{\text{Smaller}})$.
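The test can be sketched as a small R function. The function name `split_f_test`, the choice to split at the median fitted value, and the simulated data are all assumptions made for illustration; the slides do not prescribe a particular split or implementation.

```r
# Hypothetical helper: split-sample F test for constant variance
split_f_test <- function(formula, data, alpha = 0.05) {
  full <- lm(formula, data)
  k    <- length(coef(full)) - 1                  # terms other than the intercept
  low  <- fitted(full) <= median(fitted(full))    # low versus high fitted values

  # MSE of the model fitted to one part of the data
  mse <- function(d) {
    f <- lm(formula, d)
    sum(residuals(f)^2) / df.residual(f)
  }
  mse1 <- mse(data[low, , drop = FALSE]);  nu1 <- sum(low)  - (k + 1)
  mse2 <- mse(data[!low, , drop = FALSE]); nu2 <- sum(!low) - (k + 1)

  f_star <- mse1 / mse2
  # Equivalent form: larger MSE over smaller MSE, with matching df order
  if (f_star >= 1) {
    f <- f_star;     df <- c(nu1, nu2)
  } else {
    f <- 1 / f_star; df <- c(nu2, nu1)
  }
  crit <- qf(1 - alpha / 2, df[1], df[2])         # upper alpha/2 point
  list(f_star = f_star, f = f, df = df, critical = crit, reject = f > crit)
}

# Simulated data with standard deviation proportional to the mean
set.seed(5)
d   <- data.frame(x = runif(200, 1, 10))
d$y <- (2 + 3 * d$x) * (1 + rnorm(200, sd = 0.2))
res <- split_f_test(y ~ x, d)
print(res)
```

Here `qf(1 - alpha / 2, ...)` gives the upper $\alpha/2$-point, matching the notation $F_{\alpha/2}$ used above.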
