SLIDE 1

ST 430/514 Introduction to Regression Analysis/Statistics for Management and the Social Sciences II

Checking Normality

One of the standard assumptions that ensure that inferences are valid is that the random errors $\epsilon = Y - E(Y \mid x)$ are normally distributed. Standard error calculations do not depend on the normality assumption, but P-values do. Except in small samples, departures from normality do not usually invalidate hypothesis tests or confidence intervals.

Residual Analysis: Checking Normality (slide 1 of 26)

SLIDE 2

Often, when data are not normal, they show longer/heavier tails. Heavy tails generally make inferences conservative. For instance, a 95% confidence interval actually covers the true parameter value with a probability higher than 95%. Similarly, the Type I error rate in a hypothesis test is less than the nominal α. Conservative inferences are not optimal (for instance, confidence intervals are wider than they need to be), but they are better than anti-conservative ones.

SLIDE 3

One approach to checking normality is by a hypothesis test: $H_0$: $\epsilon$ is normally distributed, versus $H_a$: $\epsilon$ is not normally distributed. The Shapiro-Wilk test is often recommended. All tests have relatively low power in small samples, and even in moderately large samples. That is, the chance of detecting moderate non-normality is not close to 1.
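As a minimal sketch of how this looks in R: the Shapiro-Wilk test is shapiro.test() in base R. The data below are simulated (the course data sets are not reproduced here):

```r
# Hedged sketch on simulated data, not the course data.
set.seed(42)
x <- runif(50, 0, 20)
y <- 10 + 0.5 * x + rnorm(50, sd = 2)  # errors are truly normal here
fit <- lm(y ~ x)
shapiro.test(residuals(fit))           # H0: the errors are normally distributed
```

Since the simulated errors really are normal, the P-value should usually be unremarkable; rejecting here would be a Type I error.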

SLIDE 4

Graphical checks

Stem-and-leaf display (semi-graphical)

```r
r <- residuals(lm(log(SALARY) ~ EXP + I(EXP^2), workers))
stem(r)
```

produces the display:

```
The decimal point is 1 digit(s) to the left of the |

  -3 | 54
  -2 | 110
  -1 | 87665210
  -0 | 7776555542221
   0 | 01233445788
   1 | 045688
   2 | 1134566
```

SLIDE 5

Histogram

```r
hist(r)
# to match Figure 8.20:
hist(r, breaks = seq(from = -0.425, to = 0.425, by = 0.05), freq = FALSE)
# overlay a normal curve:
curve(dnorm(x, mean = mean(r), sd = sd(r)), col = "red", add = TRUE)
```

Quantile-quantile plot

```r
qqnorm(r)
# to match Figure 8.22:
qqnorm(r, datax = TRUE)
```

Note: The quantile-quantile plot is more useful than the histogram, even with an overlaid normal density.

SLIDE 6

Outliers

Recall that $\hat\epsilon_i = y_i - \hat y_i$ is the ith residual; it has the same units as Y. Residuals are often scaled in some way to make them dimensionless. Terminology varies! Here we follow R (rstandard() and rstudent()) and SAS/INSIGHT, not the text.

SLIDE 7

Scaled residual ("standardized" residual in the text): $z_i = \hat\epsilon_i / s = (y_i - \hat y_i)/s$. Rule of thumb: if $|z_i| > 3$, the ith observation is an outlier. Equivalently, $|y_i - \hat y_i| > 3s$, a "3-σ event".
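A minimal sketch of the rule of thumb, with simulated data and a deliberately planted gross error (none of this is from the course data):

```r
# Hedged sketch: simulated data with one planted data-entry error.
set.seed(1)
x <- 1:30
y <- 2 + 3 * x + rnorm(30)
y[30] <- y[30] + 10                 # corrupt observation 30
fit <- lm(y ~ x)
s <- summary(fit)$sigma             # s, the residual standard error
z <- residuals(fit) / s             # scaled residuals z_i
which(abs(z) > 3)                   # flags the corrupted observation
```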

SLIDE 8

The "hat" matrix

Each observation contributes to the value of $\hat\beta$; in matrix notation, $\hat\beta = (X'X)^{-1} X'y$. So it also contributes to the predicted values: $\hat y = X\hat\beta = X (X'X)^{-1} X'y = Hy$, where $H = X (X'X)^{-1} X'$ is the hat matrix. H "puts the hat on y".
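A quick numerical check of $\hat y = Hy$, as a sketch on simulated data (not the course example):

```r
# Hedged sketch: verify that H %*% y reproduces the fitted values.
set.seed(2)
x <- runif(20)
y <- 1 + 2 * x + rnorm(20, sd = 0.3)
fit <- lm(y ~ x)
X <- model.matrix(fit)                    # the design matrix X
H <- X %*% solve(t(X) %*% X) %*% t(X)     # hat matrix H = X (X'X)^{-1} X'
all.equal(as.vector(H %*% y), unname(fitted(fit)))  # H "puts the hat on y"
```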

SLIDE 9

The residuals are $\hat\epsilon = y - \hat y = (I - H)y$, and consequently (with some matrix algebra) $\mathrm{var}(\hat\epsilon_i) = \sigma^2 (1 - h_i)$, where $h_i$ is the ith diagonal entry in H. The standardized residual

$$z_i^* = \frac{\hat\epsilon_i}{s\sqrt{1 - h_i}} = \frac{y_i - \hat y_i}{s\sqrt{1 - h_i}} = \frac{z_i}{\sqrt{1 - h_i}}$$

("studentized" residual in the text) is adjusted for these different variances. We can also use the rule of thumb with standardized residuals: if $|z_i^*| > 3$, the ith observation is an outlier.
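The formula can be checked against R's built-in rstandard(); the data here are simulated:

```r
# Hedged sketch: manual standardized residuals vs rstandard().
set.seed(3)
x <- runif(25)
y <- 5 - x + rnorm(25)
fit <- lm(y ~ x)
s <- summary(fit)$sigma
h <- hatvalues(fit)                        # the leverages h_i
zstar <- residuals(fit) / (s * sqrt(1 - h))
all.equal(zstar, rstandard(fit))           # matches R's standardized residuals
```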

SLIDE 10

Example: fast food data with a data-entry error:

```r
fast <- read.table("Text/Exercises&Examples/FASTFOOD.txt", header = TRUE)
fastBad <- fast
fastBad[13, "SALES"] <- 82
lBad <- lm(SALES ~ factor(CITY) + TRAFFIC, fastBad)
plot(lBad)
```

Note that the last three plots use standardized residuals $z_i^*$, so the rule of thumb is easy to use. An outlier needs careful scrutiny, to distinguish bad data from unusual data.

SLIDE 11

Leverage

Recall that $\hat y = Hy$, where H is the hat matrix: $\hat y_i = \sum_{j=1}^n h_{i,j} y_j$. The diagonal entry $h_{i,i} = h_i$ is the weight attached to $y_i$ itself in computing $\hat y_i$. The diagonal entry $h_i$ is defined to be the leverage of the ith observation.

Leverage measures the contribution of $y_i$ to its own predicted value $\hat y_i$.
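A sketch of leverage in R, with one deliberately extreme x value (simulated data, not the course example):

```r
# Hedged sketch: hatvalues() and the 2*h-bar rule of thumb.
set.seed(4)
x <- c(runif(19), 5)                  # observation 20 is far from the rest
y <- 1 + x + rnorm(20, sd = 0.2)
fit <- lm(y ~ x)
h <- hatvalues(fit)                   # leverages: diagonal of the hat matrix
p <- length(coef(fit))                # p = k + 1 = 2
mean(h)                               # average leverage is always p/n
which(h > 2 * mean(h))                # observation 20 is a leverage point
```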

SLIDE 12

Leverage satisfies $0 < h_i \le 1$; the average leverage is always $\bar h = p/n$, where $p = k + 1$ is the number of model parameters (including the intercept). In many designed experiments, all observations have the same leverage: $h_i \equiv \bar h$; in observational studies, leverage can vary widely. Rule of thumb: if $h_i > 2\bar h$, the ith observation is a leverage point. In the fourth residual plot, the standardized residuals are plotted against leverage.

SLIDE 13

Influence

An observation can be a leverage point but not have a great influence on $\hat\beta$. Write $\hat\beta_{(i)}$ for the parameter estimates when the ith observation is omitted. If $\hat\beta_{(i)}$ is very different from $\hat\beta$, the ith observation has high influence.

SLIDE 14

One measure of the magnitude of $\hat\beta_{(i)} - \hat\beta$ is Cook's distance,

$$D_i = \frac{\sum_{j=1}^n \bigl(\hat y_j^{(i)} - \hat y_j\bigr)^2}{p s^2} = \frac{\bigl(\hat\beta_{(i)} - \hat\beta\bigr)' (X'X) \bigl(\hat\beta_{(i)} - \hat\beta\bigr)}{p s^2},$$

where $\hat y_j$ is the usual predicted value of $y_j$ and $\hat y_j^{(i)}$ is the predicted value using $\hat\beta_{(i)}$.

SLIDE 15

It can be shown that

$$D_i = \frac{z_i^2}{p}\,\frac{h_i}{(1 - h_i)^2} = \frac{(z_i^*)^2}{p}\,\frac{h_i}{1 - h_i},$$

where $z_i$ is the scaled residual, $z_i^*$ is the standardized residual, and $h_i$ is the leverage. If the ith observation has a large standardized residual $z_i^*$ and high leverage $h_i$, Cook's distance $D_i$ will be large. Rule of thumb: if $D_i > 1$, the ith observation is highly influential.
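The identity is easy to verify against R's cooks.distance() (simulated data again):

```r
# Hedged sketch: Cook's distance from standardized residuals and leverage.
set.seed(5)
x <- runif(30)
y <- 2 + x + rnorm(30, sd = 0.5)
fit <- lm(y ~ x)
p <- length(coef(fit))
h <- hatvalues(fit)
zstar <- rstandard(fit)               # standardized residuals z_i^*
D <- zstar^2 / p * h / (1 - h)        # the identity above
all.equal(D, cooks.distance(fit))     # matches R's built-in
```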

SLIDE 16

Note: some statisticians suggest using the median of the F-distribution with p and n − p degrees of freedom as the threshold for being "highly influential". If p < n/2, this is less than 1, but often close. Others prefer a yet more stringent threshold of 4/n. A threshold of 1 is the simplest rule, and is recommended. The fourth residual plot shows contours of Cook's distance, so the rule of thumb is easy to use.

SLIDE 17

Detecting Correlation

Time series data

Regression models are sometimes used with responses $Y_1, Y_2, \dots, Y_n$ that are collected over time. Often one response is similar to the immediately preceding responses, which means that they are correlated. Since standard errors are usually calculated on the assumption of zero correlation, they can be quite incorrect, often too small by a factor of 2 or more.

SLIDE 18

When such serial correlation is present, both the estimation procedure (least squares) and the calculation of standard errors need to be modified. First we need to know when significant correlation is present.

Durbin-Watson test

The widely available Durbin-Watson test was developed by James Durbin and Geoffrey Watson. It is based on the statistic

$$d = \frac{\sum_{i=2}^n (\hat\epsilon_i - \hat\epsilon_{i-1})^2}{\sum_{i=1}^n \hat\epsilon_i^2}.$$
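As a sketch, d can be computed directly from the residuals; here the errors are simulated from an AR(1) process with positive correlation (not the course data, and no extra package is needed for the statistic itself):

```r
# Hedged sketch: the Durbin-Watson statistic by hand, on simulated AR(1) errors.
set.seed(6)
n <- 100
e <- as.vector(arima.sim(list(ar = 0.7), n))  # positively correlated errors
tt <- 1:n
y <- 1 + 0.1 * tt + e
fit <- lm(y ~ tt)
r <- residuals(fit)
d <- sum(diff(r)^2) / sum(r^2)        # the Durbin-Watson statistic
d                                     # well below 2 for these correlated errors
```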

SLIDE 19

If there is no correlation, d ≈ 2, while if observations are positively correlated, d < 2. Example: trend in sales:

```r
sales35 <- read.table("Text/Exercises&Examples/SALES35.txt", header = TRUE)
plot(SALES ~ T, sales35)
sales35Lm <- lm(SALES ~ T, sales35)
summary(sales35Lm)
plot(sales35Lm)
library(car)
durbinWatsonTest(sales35Lm)
```

The usual four plots give no information about correlation.

SLIDE 20

Looking Ahead...

The arima() function fits a regression model given an assumed model for the residual correlation. One simple model is the first-order autoregression, AR(1):

```r
arima(sales35$SALES, order = c(1, 0, 0), xreg = sales35$T)
```

Note the increase in standard error of the estimated trend, from 0.1069 to 0.1760.

SLIDE 21

Supplementary Material

Alternative Views of Leverage

From the matrix equation $\hat\beta = (X'X)^{-1} X'y$ it is easily shown that the dispersion matrix of $\hat\beta$ is

$$\mathrm{Var}(\hat\beta) = \sigma^2 (X'X)^{-1}.$$

So the precision matrix is

$$\mathrm{Var}(\hat\beta)^{-1} = \sigma^{-2} X'X.$$
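This can be checked numerically against vcov(), substituting $s^2$ for the unknown $\sigma^2$ (simulated data):

```r
# Hedged sketch: Var(beta-hat) = s^2 (X'X)^{-1}, checked against vcov().
set.seed(7)
x <- runif(40)
y <- 3 + 2 * x + rnorm(40)
fit <- lm(y ~ x)
X <- model.matrix(fit)
s2 <- summary(fit)$sigma^2                    # s^2 estimates sigma^2
all.equal(vcov(fit), s2 * solve(t(X) %*% X))  # matches R's dispersion matrix
```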

SLIDE 22

Now

$$\sigma^2\,\mathrm{Var}(\hat\beta)^{-1} = X'X = \sum_{i=1}^n x_i x_i',$$

where $x_i'$, the ith row of X, contains the factor values associated with $y_i$. So the ith observation contributes $x_i x_i'$ to the precision of $\hat\beta$. Which observations contribute the most? We need to measure the "size" of $x_i x_i'$ relative to $X'X$.

SLIDE 23

In general, we measure the "size" of one nonnegative matrix A with respect to another B by the two-matrix eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_p$, which satisfy $A\xi = \lambda B\xi$ for corresponding eigenvectors $\xi$. Because $x_i x_i'$ has rank 1, the equation

$$x_i x_i' \xi = \lambda X'X \xi$$

has only one non-zero solution, $\lambda = x_i' (X'X)^{-1} x_i = h_i$.
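The rank-1 claim can be checked numerically: the generalized problem $A\xi = \lambda B\xi$ becomes an ordinary eigenproblem for $B^{-1}A$. A sketch on simulated data (the observation index 7 is arbitrary):

```r
# Hedged sketch: the one non-zero generalized eigenvalue equals the leverage.
set.seed(8)
x <- runif(15)
y <- x + rnorm(15)
fit <- lm(y ~ x)
X <- model.matrix(fit)
XtX <- t(X) %*% X
i <- 7                                      # any observation
xi <- X[i, ]
lam <- eigen(solve(XtX) %*% (xi %*% t(xi)))$values
max(Re(lam))                                # the single non-zero eigenvalue
unname(hatvalues(fit)[i])                   # equals the leverage h_i
```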

SLIDE 24

So the leverage $h_i$ may also be viewed as a measure of the size of $x_i x_i'$ relative to $X'X$.

But how much does it contribute to $X'X$? Difficult: the elements may even be in different units. Transform: suppose that A is an inverse "square root" of $X'X$, in the sense that $A X'X A' = I$. Then

$$\sum_{i=1}^n A x_i x_i' A' = A X'X A' = I.$$

So now we can just measure the "size" of $A x_i x_i' A'$ relative to I.

SLIDE 25

Problem: A is not unique: if U is any orthogonal matrix, then UA is also an inverse square root. Solution: for any symmetric matrix S, $\mathrm{trace}[(UA)S(UA)'] = \mathrm{trace}(ASA')$. That is, the trace is the same for all such inverse square roots. So in the equation

$$\sum_{i=1}^n \mathrm{trace}(A x_i x_i' A') = \mathrm{trace}(A X'X A') = \mathrm{trace}(I) = p,$$

the contributions $\mathrm{trace}(A x_i x_i' A')$ do not depend on which A is used.

SLIDE 26

Now

$$\mathrm{trace}(A x_i x_i' A') = \mathrm{trace}(x_i' A'A x_i) = x_i' (X'X)^{-1} x_i = h_i,$$

and we have already used the identity $\sum_{i=1}^n h_i = p$. So $h_i/p$ may also be viewed as the fraction of precision contributed by the ith observation. That is, observations with high leverage also contribute most to the precision of $\hat\beta$.
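And the identity $\sum_{i=1}^n h_i = p$ is immediate to check (simulated data):

```r
# Hedged sketch: the leverages always sum to the number of parameters p.
set.seed(9)
x1 <- runif(25)
x2 <- runif(25)
y <- 1 + x1 - x2 + rnorm(25)
fit <- lm(y ~ x1 + x2)
sum(hatvalues(fit))                  # equals p = 3 (intercept + two slopes)
```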
