Week 7: Regression Issues - BUS41100 Applied Regression Analysis (PowerPoint PPT Presentation)

SLIDE 1

BUS41100 Applied Regression Analysis

Week 7: Regression Issues

Standardized and Studentized residuals, outliers and leverage, nonconstant variance, non-normality, nonlinearity, transformations, multicollinearity

Max H. Farrell
The University of Chicago Booth School of Business

SLIDE 2

Model assumptions

Y | X ∼ N(β0 + β1X, σ²)

Key assumptions of our linear regression model:
(i) The conditional mean of Y is linear in X.
(ii) The additive errors (deviations from the line)
◮ are Normally distributed
◮ are independent of each other
◮ are identically distributed (i.e., they have constant variance)
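To make the assumptions concrete, here is a minimal R sketch (the simulated data and parameter values are my own illustration, not from the course) that generates data satisfying (i) and (ii) and fits the line:

# Simulate data satisfying the SLR assumptions:
# linear conditional mean, independent Normal errors, constant variance.
set.seed(41100)                            # arbitrary seed for reproducibility
n <- 100
X <- runif(n, 0, 10)
Y <- 2 + 0.5*X + rnorm(n, mean=0, sd=1)    # beta0=2, beta1=0.5, sigma=1 (assumed values)
fit <- lm(Y ~ X)
summary(fit)                               # estimates should land near 2 and 0.5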

SLIDE 3

Inference and prediction rely on this model being true! If the model assumptions do not hold, then all bets are off:
◮ prediction can be systematically biased
◮ standard errors and confidence intervals are wrong

(but how wrong?)

We will focus on using graphical methods (plots!) to detect violations of the model assumptions. You'll see that
◮ it is more of an art than a science,
◮ but it is grounded in mathematics.

SLIDE 4

Example model violations

Anscombe’s quartet comprises four datasets that have similar statistical properties . . .

> attach(anscombe <- read.csv("anscombe.csv"))
> c(x.m1=mean(x1), x.m2=mean(x2), x.m3=mean(x3), x.m4=mean(x4))
x.m1 x.m2 x.m3 x.m4
   9    9    9    9
> c(y.m1=mean(y1), y.m2=mean(y2), y.m3=mean(y3), y.m4=mean(y4))
    y.m1     y.m2     y.m3     y.m4
7.500909 7.500909 7.500000 7.500909
> c(x.sd1=sd(x1), x.sd2=sd(x2), x.sd3=sd(x3), x.sd4=sd(x4))
   x.sd1    x.sd2    x.sd3    x.sd4
3.316625 3.316625 3.316625 3.316625
> c(y.sd1=sd(y1), y.sd2=sd(y2), y.sd3=sd(y3), y.sd4=sd(y4))
   y.sd1    y.sd2    y.sd3    y.sd4
2.031568 2.031657 2.030424 2.030579
> c(cor1=cor(x1,y1), cor2=cor(x2,y2), cor3=cor(x3,y3), cor4=cor(x4,y4))
     cor1      cor2      cor3      cor4
0.8164205 0.8162365 0.8162867 0.8165214

SLIDE 5

. . . but vary considerably when graphed.

[Figure: scatterplots of y1 vs x1, y2 vs x2, y3 vs x3, and y4 vs x4 for the four Anscombe datasets]

SLIDE 6

Similarly, let’s consider linear regression for each dataset.

[Figure: the same four scatterplots, each with its fitted least squares regression line]

SLIDE 7

The regression lines and even the R² values are the same...

> ansreg <- list(reg1=lm(y1~x1), reg2=lm(y2~x2),
+                reg3=lm(y3~x3), reg4=lm(y4~x4))
> attach(ansreg)
> cbind(reg1$coef, reg2$coef, reg3$coef, reg4$coef)
                 [,1]     [,2]      [,3]      [,4]
(Intercept) 3.0000909 3.000909 3.0024545 3.0017273
x1          0.5000909 0.500000 0.4997273 0.4999091
> smry <- lapply(ansreg, summary)
> c(smry$reg1$r.sq, smry$reg2$r.sq,
+   smry$reg3$r.sq, smry$reg4$r.sq)
[1] 0.6665425 0.6662420 0.6663240 0.6667073

SLIDE 8

...but the residuals (plotted against Ŷ) look totally different.

[Figure: residuals vs fitted values (reg$fitted vs reg$residuals) for each of the four regressions]

SLIDE 9

Plotting e vs Ŷ is your #1 tool for finding fit problems. Why?
◮ Because it gives a quick visual indicator of whether or not the model assumptions are true.
What should we expect to see if they are true?

1. Each εi has the same variance (σ²).
2. Each εi has the same mean (0).
3. The εi collectively have the same Normal distribution.

Remember: Ŷ is made from X in SLR and MLR, so one plot summarizes across the X's.
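As a minimal sketch (reusing reg1 from the Anscombe code above), the plot in question is just:

# Residuals vs fitted values: under the assumptions this should be a
# patternless horizontal band centered at zero.
plot(reg1$fitted, reg1$residuals, xlab="fitted values", ylab="residuals")
abline(h=0, lty=2)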

SLIDE 10

How do we check these? Well, the true errors εi are unknown, so we must look instead at the least squares estimated residuals.
◮ We estimate Yi = b0 + b1Xi + ei, such that the sample least squares regression residuals are ei = Yi − Ŷi.
What should the ei look like if the SLR model is true?

SLIDE 11

If the SLR model is true, it turns out that:

ei ∼ N(0, σ²[1 − hi]),   where   hi = 1/n + (Xi − X̄)² / Σ_{j=1}^{n} (Xj − X̄)².

The hi term is referred to as the ith observation's leverage:
◮ It is that point's share of the data (1/n) plus its proportional contribution to variability in X.
Notice that as n → ∞, hi → 0 and the residuals ei "obtain" the same distribution as the unknown errors εi, i.e., ei ∼ N(0, σ²).
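In R the hi values are available directly from a fitted lm object; a quick sketch using reg3 from the Anscombe code above:

# Leverage values h_i
h <- hatvalues(reg3)
h
# Check against the formula by hand (SLR case):
1/length(x3) + (x3 - mean(x3))^2 / sum((x3 - mean(x3))^2)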

(See the handout on the course page for derivations.)

SLIDE 12

Understanding Leverage

The hi leverage term measures the sensitivity of the estimated least squares regression line to changes in Yi. The term "leverage" provides a mechanical intuition:
◮ The farther you are from a pivot joint, the more torque you have pulling on a lever.
Online illustration of leverage:

https://rstudio-class.chicagobooth.edu

Outliers do more damage if they have high leverage!

SLIDE 13

Standardized residuals

Since ei ∼ N(0, σ²[1 − hi]), we know that

    ei / (σ √(1 − hi)) ∼ N(0, 1).

These transformed ei's are called the standardized residuals.
◮ They all have the same distribution if the SLR model assumptions are true.
◮ They are almost (close enough) independent, so roughly iid ∼ N(0, 1).
◮ Estimate σ² using σ̂² or s².
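A short sketch of computing these in R, by hand and with the built-in function (s plays the role of the unknown σ); reg3 is from the Anscombe code above:

# Standardized residuals: e_i / (s * sqrt(1 - h_i))
e <- residuals(reg3)
h <- hatvalues(reg3)
s <- summary(reg3)$sigma
e / (s * sqrt(1 - h))
rstandard(reg3)        # built-in version, same calculation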

SLIDE 14

About estimating s under sketchy SLR assumptions ... We want to see whether any particular ei is “too big”, but we don’t want a single outlier to make s artificially large.

> plot(x3, y3, col=3, pch=20, cex=1.5)
> abline(reg3, col=3)

[Figure: y3 vs x3 with the fitted regression line]

◮ One big outlier can make s overestimate σ.

SLIDE 15

Studentized residuals

We thus define a Studentized residual as

    ri = ei / ( s(-i) √(1 − hi) )

where s²(-i) = [1/(n − p − 1)] Σ_{j≠i} ej² is σ̂ calculated without ei. These are easy to get in R with the rstudent() function.

> rstudent(reg3)
 [1]   -0.4390554   -0.1855022 1203.5394638   -0.3138441
 [5]   -0.5742948   -1.1559818    0.0664074    0.3618514
 [9]   -0.7356770   -0.0657680    0.2002633

SLIDE 16

Outliers and Studentized residuals

Since the studentized residuals should be ≈ N(0, 1), we should be concerned about any ri outside of about [−3, 3].

[Figure: reg3 residuals vs fitted values (left) and rstudent(reg3) vs fitted values (right)]

These aren't hard-and-fast cutoffs. As n gets bigger, we expect to see some very rare events (big εi) and should not get worried unless |ri| > 4.
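A quick sketch of flagging candidates by this rule (the cutoff of 3 is the rough guideline above):

# Flag observations whose Studentized residual falls outside [-3, 3].
r3 <- rstudent(reg3)
which(abs(r3) > 3)     # for Anscombe dataset 3 this flags the single huge outlier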

SLIDE 17

How to deal with outliers

When should you delete outliers?
◮ Only when you have a really good reason!
There is nothing wrong with running a regression with and without potential outliers to see whether results are significantly impacted. Any time outliers are dropped, the reasons for doing so should be clearly noted.
◮ I maintain that both a statistical and a non-statistical reason are required. (What?)

SLIDE 18

Outliers, leverage, and residuals

Warning: Unfortunately, outliers with high leverage are hard to catch through ri (since the line is pulled towards them). Consider data on house Rents vs SqFt:

[Figure: Rent vs SqFt (left) and rstudent(rentreg) vs SqFt (right)]

◮ Plots of ri or ei against Ŷi or Xi are still your best diagnostic!

SLIDE 19

Normality and studentized residuals

A more subtle issue is the normality of the distribution of ε. We can look at the residuals to judge normality if n is big enough (say > 20; less than that makes it too hard to call). In particular, if we have a decent-sized n, we want the shape of the studentized residual distribution to "look" like N(0, 1). The most obvious tactic is to look at a histogram of ri.

SLIDE 20

For example, consider the residuals from a regression of Rent on SqFt which ignores houses with ≥ 2000 sqft.

> rentreg <- lm(Rent[SqFt<20] ~ SqFt[SqFt<20])
> par(mfrow=c(1,2))
> plot(SqFt[SqFt<20], Rent[SqFt<20], pch=20, col=7,
+      main="Regression for <2000 sqft Rent")
> abline(rentreg)
> hist(rstudent(rentreg), col=7)

[Figure: left, "Regression for <2000 sqft Rent" scatterplot of Rent[SqFt < 20] vs SqFt[SqFt < 20] with the fitted line; right, histogram of rstudent(rentreg)]

SLIDE 21

Assessing normality via Q-Q plots

Higher-fidelity diagnostics are provided by normal Q-Q plots. Q-Q stands for quantile-quantile:
◮ plot the sample quantiles (e.g. the 10th percentile, etc.)
◮ against the true percentiles from a N(0, 1) distribution (e.g. −1.96 is the true 2.5% quantile).
If ri ∼ N(0, 1), these quantiles should be equal, i.e., the points should lie on a line through 0 with slope 1.
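To see what a Q-Q plot is doing, here is a minimal sketch comparing a few sample quantiles of the studentized residuals to the matching N(0, 1) quantiles (the probability grid is my own choice):

# Sample quantiles of r vs theoretical N(0,1) quantiles.
p <- c(0.025, 0.10, 0.25, 0.50, 0.75, 0.90, 0.975)
r <- rstudent(rentreg)
cbind(theoretical = qnorm(p), sample = quantile(r, p))
# If r ~ N(0,1), the two columns should be roughly equal.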

SLIDE 22

R has a function for normal Q-Q plots:

> qqnorm(rstudent(rentreg), col=4)
> abline(a=0, b=1)

[Figure: Normal Q-Q Plot, Theoretical Quantiles vs Sample Quantiles, for rstudent(rentreg)]

◮ It is good to add the line Y = X to see where the points should be.

SLIDE 23

Example Q-Q plots: normal, exponential, and t3 data

> znorm <- rnorm(1000); zexp <- rexp(1000); zt <- rt(1000, df=3)
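The six panels below were presumably produced along these lines (a sketch; the exact plotting options are assumptions):

# Histograms (top row) and normal Q-Q plots (bottom row).
par(mfrow=c(2,3))
hist(znorm); hist(zexp); hist(zt)
qqnorm(znorm, main="Normal Q-Q plot for znorm"); abline(a=0, b=1)
qqnorm(zexp, main="Normal Q-Q plot for zexp"); abline(a=0, b=1)
qqnorm(zt, main="Normal Q-Q plot for zt"); abline(a=0, b=1)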

[Figure: histograms of znorm, zexp, and zt (top row) and their normal Q-Q plots (bottom row)]
SLIDE 24

Example: recall our pickup data regression of price on years:

> attach(pickup <- read.csv("pickup.csv"))
> truckreg <- lm(price ~ year)
> r <- rstudent(truckreg)

Our go-to suite of three diagnostic plots:

> par(mfrow=c(1,3))
> plot(truckreg$fitted, truckreg$residuals,
+      xlab="y.hat", ylab="e",
+      main="residuals vs fitted", pch=20)
> abline(h=0, col=2, lty=2)
> hist(r, col=8)
> qqnorm(r, main="Normal Q-Q plot for r")
> abline(a=0, b=1, col=4, lty=2)

SLIDE 25
[Figure: residuals vs fitted (y.hat vs e), histogram of r, and Normal Q-Q plot for r]

The plots tell us that:
◮ Data are more curved than straight (i.e. the line doesn't fit).
◮ Residuals are skewed to the right.
◮ There is a huge positive ei for an old "classic" truck.

SLIDE 26

Nonconstant variance

One of the most common violations (problems?) in real data
◮ E.g. a trumpet shape in the scatterplot

[Figure: scatter plot of y vs x (left) and residual plot of fit$residual vs fit$fitted (right)]

We can try to stabilize the variance . . . or do robust inference

SLIDE 27

Variance stabilizing transformations

This is one of the most common model violations; luckily, it is usually fixable by transforming the response (Y) variable. log(Y) is the most common variance-stabilizing transform.
◮ If Y has only positive values (e.g. sales) or is a count (e.g. # of customers), take log(Y) (always the natural log).
Also, consider looking at Y/X or dividing by another factor. In general, think about the scale on which you expect linearity.
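As a sketch of how to compare scales in practice (using the pickup data from earlier, anticipating the example two slides ahead), fit both versions and look at the residuals:

# Raw scale vs log scale: pick the one whose residual plot looks healthier.
raw.fit <- lm(price ~ year)
log.fit <- lm(log(price) ~ year)
par(mfrow=c(1,2))
plot(raw.fit$fitted, raw.fit$residuals, main="price ~ year")
plot(log.fit$fitted, log.fit$residuals, main="log(price) ~ year")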

SLIDE 28

For example, suppose Y = β0 + β1X + ε, with ε ∼ N(0, (Xσ)²).
◮ This is not cool!
◮ sd(εi) = |Xi|σ ⇒ nonconstant variance.
But we could look instead at

    Y/X = β0/X + β1 + ε/X = β⋆0 + β⋆1 (1/X) + ε⋆

where var(ε⋆) = X⁻² var(ε) = σ² is now constant. Hence, the proper linear scale is to look at Y/X ∼ 1/X.
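A minimal simulated sketch of this fix (the true parameter values are my own illustration):

# Y = 2 + 3*X + eps with sd(eps) proportional to X, then fit Y/X on 1/X.
set.seed(1)
X <- runif(200, 1, 10)
Y <- 2 + 3*X + rnorm(200, sd = 0.5*X)   # nonconstant variance by construction
fit.ratio <- lm(I(Y/X) ~ I(1/X))
coef(fit.ratio)    # intercept estimates beta1 (about 3), slope estimates beta0 (about 2)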

SLIDE 29

Reconsider the regression of truck price onto year, after removing trucks older than 1993 (truck[year>1992,]).

[Figure: price vs year and log(price) vs year scatterplots (top), and residuals vs fitted for the price ~ year and log(price) ~ year fits (bottom)]

SLIDE 30

Warning: be careful when interpreting the transformed model. If E[log(Y)|X] = b0 + b1X, then E[Y|X] ≈ e^(b0) e^(b1 X). We have a multiplicative model now!
Also, you cannot compare R² values for regressions corresponding to different transformations of the response.
◮ Y and f(Y) may not be on the same scale,
◮ therefore var(Y) and var(f(Y)) may not be either.
Look at the residuals to see which model is better.

SLIDE 31

Heteroskedasticity Robust Inference

What if σ² is not constant?

Predictions, point estimates: Ŷf = b0 + b1Xf
◮ Everything from week 1 still applies, for any X.

Inference: the CI uses σ_b1 = σ / √((n − 1) s²_x)
◮ But week 2 is all wrong!
◮ Luckily, we can find different (more complicated) variance formulas.

⇒ Keep the original model
◮ Same scale, same interpretation
◮ New standard errors (bigger → less precision)
◮ Impacts confidence intervals, tests
◮ What about prediction intervals?

SLIDE 32

Example: back to the full pickup regression of price on years, all trucks. Ignoring the violation:

> truckreg <- lm(price ~ year)
> coef(summary(truckreg))
               Estimate  Std. Error t value   Pr(>|t|)
(Intercept) -1468663.94   202492.62 -7.2529 4.8767e-09
year             738.54      101.28  7.2920 4.2764e-09

Accounting for nonconstant variance:

> library(lmtest)
> library(sandwich)
> coeftest(truckreg, vcov = vcovHC)
               Estimate  Std. Error t value Pr(>|t|)
(Intercept) -1468663.94   574787.49 -2.5551  0.01415
year             738.54      287.37  2.5700  0.01363

SLIDE 33

Nonlinear residual patterns

Consider regression residuals for the second Anscombe dataset:

[Figure: y2 vs x2 (left) and reg2$residuals vs x2 (right), showing a clear curved pattern]

◮ Things are not good! It appears that we do not have a linear mean function; that is, E[Y|X] ≠ β0 + β1X.
◮ . . . but we already know what to do: add X²! (See the sketch below.)
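A sketch of that fix for the second Anscombe dataset (names follow the earlier code):

# Add a quadratic term so the mean function can bend.
reg2q <- lm(y2 ~ x2 + I(x2^2))
plot(x2, reg2q$residuals)   # the curved pattern should now be gone
abline(h=0, lty=2)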

SLIDE 34

Residual diagnostics for MLR

Consider the residuals from the sales data:

[Figure: sales regression residuals plotted against the fitted values, P1, and P2]

We use the same residual diagnostics (scatterplots, Q-Q plots, etc.).
◮ Plot raw residuals against Ŷ to see overall fit.
◮ Compare e against each X to identify problems.
Diagnosing the problem and finding a solution involves looking at lots of residual plots (against different Xj's).

SLIDE 35

For example, the sales, P1, and P2 variables were pre-transformed from raw values to a log scale. On the original scale, things don’t look so good:

> expsalesMLR <- lm(exp(Sales) ~ exp(P1) + exp(P2))

[Figure: expsalesMLR residuals plotted against the fitted values, exp(P1), and exp(P2)]

SLIDE 36

In particular, the studentized residuals are heavily right skewed.

(“studentizing” is the same, but leverage is now distance in d-dim.)

> hist(rstudent(expsalesMLR), col=7,
+      xlab="Studentized Residuals", main="")

[Figure: histogram of the Studentized Residuals, heavily right-skewed]

◮ Our log-log transform fixes this problem.

SLIDE 37

Wrapping Up Diagnostics

Use the three go-to diagnostic plots to check assumptions
◮ Plot residuals vs. X or Ŷ to determine the next step
Think about the correct scale for linearity
◮ Use polynomials for nonlinearities
◮ log() transform is your best friend, gives elasticities
◮ Always pay attention to interpretation!
◮ You can't use R² to compare models under different transformations of Y!

SLIDE 38

Multicollinearity

Our next issue is multicollinearity: strong linear dependence between some of the covariates in a multiple regression.
The usual marginal effect interpretation is lost:
◮ change in one X variable leads to change in others.
Coefficient standard errors will be large (since you don't know which Xj to regress onto)
◮ leads to large uncertainty about the bj's
◮ therefore you may fail to reject βj = 0 for all of the Xj's even if they do have a strong effect on Y.

SLIDE 39

Suppose that you regress Y onto X1 and X2 = 10 × X1. Then

    E[Y|X1, X2] = β0 + β1X1 + β2X2 = β0 + β1X1 + β2(10X1)

and the marginal effect of X1 on Y is

    ∂E[Y|X1, X2] / ∂X1 = β1 + 10β2

◮ X1 and X2 do not act independently!

SLIDE 40

Example: how employee ratings of their supervisor relate to performance metrics. The data:
  Y:  Overall rating of supervisor
  X1: Handles employee complaints
  X2: Opportunity to learn new things
  X3: Does not allow special privileges
  X4: Raises based on performance
  X5: Overly critical of performance
  X6: Rate of advancing to better jobs

SLIDE 41

[Figure: scatterplot matrix of Y and X1 through X6]

SLIDE 42

> attach(supervisor)
> bosslm <- lm(Y ~ X1 + X2 + X3 + X4 + X5 + X6)
> summary(bosslm)
Coefficients:  ## abbreviated output
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 10.78708   11.58926   0.931 0.361634
X1           0.61319    0.16098   3.809 0.000903 ***
X2           0.32033    0.16852   1.901 0.069925 .
X3          -0.07305    0.13572  -0.538 0.595594
X4           0.08173    0.22148   0.369 0.715480
X5           0.03838    0.14700   0.261 0.796334
X6          -0.21706    0.17821  -1.218 0.235577

SLIDE 43

Consider 3 of our supervisor rating covariates:
  X2: Opportunity to learn new things
  X3: Does not allow special privileges
  X4: Raises based on performance
⇒ A boss good at one aspect is usually good at the others.

[Figure: pairwise scatterplots of X2 vs X3 (r = 0.5), X2 vs X4 (r = 0.6), and X3 vs X4 (r = 0.4)]

◮ Sure enough, they are all correlated with each other.

SLIDE 44

In the 3 covariate regression, none of the effects are significant:

> summary(lm(Y ~ X2 + X3 + X4))
Coefficients:  ## abbreviated output
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  14.1672    11.5195   1.230   0.2298
X2            0.3936     0.2044   1.926   0.0651 .
X3            0.1046     0.1682   0.622   0.5396
X4            0.3516     0.2242   1.568   0.1289

Residual standard error: 9.458 on 26 degrees of freedom
Multiple R-squared: 0.4587, Adjusted R-squared: 0.3963
F-statistic: 7.345 on 3 and 26 DF, p-value: 0.00101

SLIDE 45

If you look at individual regression effects, all 3 are significant:

> summary(lm(Y ~ X2))  ## severely abbreviated
   Estimate Std. Error t value Pr(>|t|)
X2   0.6468     0.1532   4.222 0.000231 ***
> summary(lm(Y ~ X3))  ## severely abbreviated
   Estimate Std. Error t value Pr(>|t|)
X3   0.4239     0.1701   2.492   0.0189 *
> summary(lm(Y ~ X4))  ## severely abbreviated
   Estimate Std. Error t value Pr(>|t|)
X4   0.6909     0.1786   3.868 0.000598 ***

SLIDE 46

Multicollinearity is not a big problem in and of itself; you just need to know that it is there. If you recognize multicollinearity (e.g., via the correlation check sketched below):
◮ Understand that the βj are not true marginal effects.
◮ Consider dropping variables to get a simpler model (coming up next).
◮ Expect to see big standard errors on your coefficients (i.e., your coefficient estimates are unstable).
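A quick sketch of the correlation check, using the supervisor covariates from above (assumes the data are attached as on the earlier slide):

# Large off-diagonal correlations signal multicollinearity
# (compare the pairwise r values shown two slides back).
round(cor(cbind(X2, X3, X4)), 2)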

SLIDE 47

The F-test

In the regression on X2 + X3 + X4, all coefficients are individually insignificant . . . but we know they contain information. The F-test asks whether there is "information" in a regression. It tries to formalize the idea of a big R², instead of just testing one coefficient. The test statistic is

    f = [SSR/(p − 1)] / [SSE/(n − p)] = [R²/(p − 1)] / [(1 − R²)/(n − p)]

If f is big, then the regression is "worthwhile":
◮ Big SSR relative to SSE?
◮ R² close to one?
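A sketch verifying the formula against R's reported F-statistic for the three-covariate supervisor regression (n = 30 here, i.e., 26 residual degrees of freedom plus 4 coefficients; the supervisor data are assumed attached):

# Compute f by hand from R^2 and compare with summary()'s F-statistic.
fit3 <- lm(Y ~ X2 + X3 + X4)
R2 <- summary(fit3)$r.squared
n <- length(Y); p <- 4     # p = number of coefficients, including the intercept
(R2/(p - 1)) / ((1 - R2)/(n - p))   # should reproduce the 7.345 reported by summary()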

SLIDE 48

What we are really testing:
    H0: β1 = β2 = · · · = βd = 0
    H1: at least one βj ≠ 0.
The test is contained in the R summary for every MLR fit.

> summary(lm(Y ~ X2 + X3 + X4))
Residual standard error: 9.458 on 26 degrees of freedom
Multiple R-squared: 0.4587, Adjusted R-squared: 0.3963
F-statistic: 7.345 on 3 and 26 DF, p-value: 0.00101

Hypothesis testing only gives a yes/no answer.
◮ Which βj = 0?
◮ How many?

SLIDE 49

Next Steps

We talked today about problems in regressions
◮ . . . and some fixes.
◮ Idea: a "good" regression satisfies our assumptions.
Our next step is variable selection.
◮ Idea: a "good" regression has the right X variables.
◮ What is the right set? We will see it depends on the goal.

SLIDE 50

Glossary and Equations

Leverage:

    hi = 1/n + (Xi − X̄)² / Σ_{j=1}^{n} (Xj − X̄)²

Studentized residuals:

    ri = ei / ( s(-i) √(1 − hi) ),  approximately ∼ N(0, 1)

F-test:
◮ H0: β_{dbase+1} = β_{dbase+2} = . . . = β_{dfull} = 0.
◮ H1: at least one βj ≠ 0 for j > dbase.
◮ f = [R²/(p − 1)] / [(1 − R²)/(n − p)] ∼ F(p−1, n−p)