BUS41100 Applied Regression Analysis
Week 7: Regression Issues
Standardized and Studentized residuals, outliers and leverage, nonconstant variance, non-normality, nonlinearity, transformations, multicollinearity
Max H. Farrell
The University
> attach(anscombe <- read.csv("anscombe.csv"))
> c(x.m1=mean(x1), x.m2=mean(x2), x.m3=mean(x3), x.m4=mean(x4))
x.m1 x.m2 x.m3 x.m4
   9    9    9    9
> c(y.m1=mean(y1), y.m2=mean(y2), y.m3=mean(y3), y.m4=mean(y4))
    y.m1     y.m2     y.m3     y.m4
7.500909 7.500909 7.500000 7.500909
> c(x.sd1=sd(x1), x.sd2=sd(x2), x.sd3=sd(x3), x.sd4=sd(x4))
   x.sd1    x.sd2    x.sd3    x.sd4
3.316625 3.316625 3.316625 3.316625
> c(y.sd1=sd(y1), y.sd2=sd(y2), y.sd3=sd(y3), y.sd4=sd(y4))
   y.sd1    y.sd2    y.sd3    y.sd4
2.031568 2.031657 2.030424 2.030579
> c(cor1=cor(x1,y1), cor2=cor(x2,y2), cor3=cor(x3,y3), cor4=cor(x4,y4))
     cor1      cor2      cor3      cor4
0.8164205 0.8162365 0.8162867 0.8165214
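The summary statistics above are nearly identical across the four data sets, so the four least-squares fits should nearly coincide as well. A minimal sketch of that check follows. It assumes R's built-in `anscombe` data frame holds the same values as the `anscombe.csv` file read on the slide; the names `reg1` to `reg4` follow the residual plots later in the deck, while `coefs` and `r2` are illustrative names.

```r
# Fit the same simple regression to each of Anscombe's four data sets.
# Assumption: the built-in `anscombe` data frame matches anscombe.csv.
data(anscombe)

reg1 <- lm(y1 ~ x1, data = anscombe)
reg2 <- lm(y2 ~ x2, data = anscombe)
reg3 <- lm(y3 ~ x3, data = anscombe)
reg4 <- lm(y4 ~ x4, data = anscombe)

# Collect intercepts and slopes in one table for comparison.
coefs <- rbind(reg1 = coef(reg1), reg2 = coef(reg2),
               reg3 = coef(reg3), reg4 = coef(reg4))
colnames(coefs) <- c("intercept", "slope")
print(round(coefs, 3))

# R-squared for each fit (the squared correlation in simple regression).
r2 <- sapply(list(reg1 = reg1, reg2 = reg2, reg3 = reg3, reg4 = reg4),
             function(m) summary(m)$r.squared)
print(round(r2, 3))
```

The fitted lines and R-squared values come out almost the same, which is why the scatter plots and residual plots are needed to tell the four data sets apart.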
[Figure: scatter plots of y1 against x1, y2 against x2, y3 against x3, and y4 against x4]
[Figure: residuals against fitted values for each fit: reg1$residuals vs. reg1$fitted, reg2$residuals vs. reg2$fitted, reg3$residuals vs. reg3$fitted, reg4$residuals vs. reg4$fitted]
See handout on course page for derivations.
iid
[Figure: scatter plot of y3 against x3]
−i = 1 n−p−1
j is ˆ
[Figure: reg3$residuals against reg3$fitted; rstudent(reg3) against reg3$fitted]
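To see why the Studentized residual for the outlier in the third data set is so much larger than its raw residual, the calculation can be done by hand. The sketch below assumes the built-in `anscombe` data and uses the leave-one-out definition that `rstudent()` implements: each residual is divided by a standard deviation estimated with that observation removed, times sqrt(1 - h_i), where h_i is the leverage. The names `s_loo` and `r_manual` are illustrative.

```r
# Studentized residuals for the third Anscombe data set, computed by hand.
# Assumption: the built-in `anscombe` data frame matches anscombe.csv.
data(anscombe)
reg3 <- lm(y3 ~ x3, data = anscombe)

n <- nrow(anscombe)
e <- resid(reg3)        # raw residuals
h <- hatvalues(reg3)    # leverages h_i

# Residual standard error from refitting without observation i.
s_loo <- sapply(seq_len(n), function(i) {
  summary(lm(y3 ~ x3, data = anscombe[-i, ]))$sigma
})

# Studentized residual: e_i / (s_{-i} * sqrt(1 - h_i)).
r_manual <- e / (s_loo * sqrt(1 - h))

# Compare with the standardized residuals, which use the full-sample sigma.
print(round(cbind(raw = e, standardized = rstandard(reg3),
                  studentized = r_manual), 2))

# Index of the observation with the largest Studentized residual.
worst <- unname(which.max(abs(r_manual)))
print(worst)
```

Because the remaining points lie almost exactly on a line, the leave-one-out standard deviation for the outlier is tiny, and its Studentized residual becomes very large.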
[Figure: Rent against SqFt; rstudent(rentreg) against SqFt]
> rentreg <- lm(Rent[SqFt<20] ~ SqFt[SqFt<20])
> par(mfrow=c(1,2))
> plot(SqFt[SqFt<20], Rent[SqFt<20], pch=20, col=7,
+      main="Regression for <2000 sqft Rent")
> abline(rentreg)
> hist(rstudent(rentreg), col=7)
[Figure: "Regression for <2000 sqft Rent", Rent[SqFt < 20] against SqFt[SqFt < 20]; "Histogram of rstudent(rentreg)"]
[Figure: Normal Q-Q Plot, Sample Quantiles against Theoretical Quantiles]
> znorm <- rnorm(1000); zexp <- rexp(1000); zt <- rt(1000, df=3)
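The three simulated samples can be compared with histograms and normal Q-Q plots using base R graphics. This is a sketch of one way to draw them, not necessarily the code behind the slides: the seed, the `samples` list, and the 2-by-3 panel layout are assumptions made for illustration.

```r
# Simulate the three samples from the slide; the seed is an arbitrary choice.
set.seed(41100)
samples <- list(
  znorm = rnorm(1000),       # normal: the reference case
  zexp  = rexp(1000),        # exponential: right-skewed
  zt    = rt(1000, df = 3)   # t with 3 degrees of freedom: heavy tails
)

par(mfrow = c(2, 3))

# Top row: histograms of each sample.
for (nm in names(samples)) {
  hist(samples[[nm]], main = paste("Histogram of", nm), xlab = nm, col = 7)
}

# Bottom row: normal Q-Q plots with a reference line through the quartiles.
for (nm in names(samples)) {
  qqnorm(samples[[nm]], main = paste("Normal Q-Q plot for", nm), pch = 20)
  qqline(samples[[nm]])
}
```

Points that follow the reference line suggest normality; curvature to one side indicates skewness, and departures at both ends indicate heavy tails.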
[Figure: histograms of znorm, zexp, and zt, with Normal Q-Q plots for znorm, zexp, and zt (Sample Quantiles against Theoretical Quantiles)]
[Figure: residuals vs fitted (e against y.hat); histogram of r; Normal Q-Q plot for r (Sample Quantiles against Theoretical Quantiles)]
[Figure: scatter plot of y against x; residual plot of fit$residual against fit$fitted]
0 + 1
1 + ε⋆
[Figure: price[year > 1992] and log(price[year > 1992]) against year[year > 1992]; residuals against fitted values for price ~ year and for log(price) ~ year]
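The comparison of price ~ year with log(price) ~ year can be imitated on simulated data. The sketch below does not use the course data: the prices are generated with multiplicative errors, and every number in the simulation (sample size, coefficients, noise level, seed) is a made-up assumption. The helper `spread_ratio` is also hypothetical; it compares residual spread in the upper and lower halves of the fitted values.

```r
# Hypothetical data: prices with multiplicative (log-scale) errors.
set.seed(1)
n <- 400
year <- sample(1993:2008, n, replace = TRUE)
price <- exp(7 + 0.15 * (year - 1992) + rnorm(n, sd = 0.3))

fit_raw <- lm(price ~ year)        # regression on the original scale
fit_log <- lm(log(price) ~ year)   # regression after the log transformation

# Ratio of residual spread: fitted values above the median vs. at or below it.
# A ratio near 1 is consistent with constant variance.
spread_ratio <- function(fit) {
  hi <- fitted(fit) > median(fitted(fit))
  sd(resid(fit)[hi]) / sd(resid(fit)[!hi])
}
ratio_raw <- spread_ratio(fit_raw)
ratio_log <- spread_ratio(fit_log)
print(round(c(raw = ratio_raw, log = ratio_log), 2))

# Residuals against fitted values for both models.
par(mfrow = c(1, 2))
plot(fitted(fit_raw), resid(fit_raw), pch = 20, main = "price ~ year",
     xlab = "fitted", ylab = "residuals")
plot(fitted(fit_log), resid(fit_log), pch = 20, main = "log(price) ~ year",
     xlab = "fitted", ylab = "residuals")
```

On the original scale the residuals fan out as the fitted values grow; after taking logs the spread is roughly even, which is the pattern the two residual plots are meant to contrast.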
[Figure: y2 against x2; reg2$residuals against x2]
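The curved pattern in the residuals against x2 suggests the straight-line model is missing a nonlinear term. One possible remedy, shown here as a sketch, is to add a squared term. It assumes the built-in `anscombe` data matches the course file, and `reg2q` is an illustrative name for the quadratic fit.

```r
# Linear vs. quadratic fit for the second Anscombe data set.
# Assumption: the built-in `anscombe` data frame matches anscombe.csv.
data(anscombe)
reg2  <- lm(y2 ~ x2, data = anscombe)
reg2q <- lm(y2 ~ x2 + I(x2^2), data = anscombe)  # I() protects the square

# Residuals against x2: curved for the linear fit, flat for the quadratic.
par(mfrow = c(1, 2))
plot(anscombe$x2, resid(reg2), pch = 20, main = "y2 ~ x2",
     xlab = "x2", ylab = "residuals")
abline(h = 0, lty = 2)
plot(anscombe$x2, resid(reg2q), pch = 20, main = "y2 ~ x2 + x2^2",
     xlab = "x2", ylab = "residuals")
abline(h = 0, lty = 2)

# Compare R-squared for the two fits.
print(c(linear = summary(reg2)$r.squared,
        quadratic = summary(reg2q)$r.squared))
```

Using `I(x2^2)` keeps `^` from being interpreted as a formula operator, so the model really includes the square of x2.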
[Figure: residuals against fitted values, against P1, and against P2]
[Figure: residuals against fitted values, against exp(P1), and against exp(P2)]
[Figure: histogram of Studentized residuals]
[Figure: pairwise scatter plots of the predictors: X3 against X2 (r = 0.5), X4 against X2 (r = 0.6), X4 against X3 (r = 0.4)]
> summary(lm(Y ~ X2 + X3 + X4))
Residual standard error: 9.458 on 26 degrees of freedom
Multiple R-squared: 0.4587, Adjusted R-squared: 0.3963
F-statistic: 7.345 on 3 and 26 DF, p-value: 0.00101
j=1(Xj − ¯
approx
$$\frac{R^2/(p-1)}{(1-R^2)/(n-p)} \sim F_{p-1,\,n-p}$$
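As a check on the formula, the F-statistic in the summary output for Y ~ X2 + X3 + X4 can be reproduced from the reported R-squared alone. In this sketch the values p = 4 and n = 30 are not stated on the slide; they are inferred from the reported "3 and 26 DF" (p - 1 = 3, n - p = 26). The names `f` and `pval` are illustrative.

```r
# Reproduce the overall F-test from R-squared and the degrees of freedom.
R2 <- 0.4587   # Multiple R-squared from the summary output
p  <- 4        # inferred: numerator df is p - 1 = 3
n  <- 30       # inferred: denominator df is n - p = 26

# F = (R^2 / (p - 1)) / ((1 - R^2) / (n - p))
f <- (R2 / (p - 1)) / ((1 - R2) / (n - p))

# Upper-tail probability under the F distribution with (p - 1, n - p) df.
pval <- pf(f, df1 = p - 1, df2 = n - p, lower.tail = FALSE)

print(c(F = f, p.value = pval))
```

The result agrees with the reported F-statistic of 7.345 and p-value of 0.00101, up to the rounding of R-squared to four digits.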