Outline: CHOOSE step; FIT step; Indicator Variables; ASSESS: Nested F-tests
STAT 213 Indicator Variables in MLR
Colin Reimer Dawson
Oberlin College
February 28, 2018
ε
library(Stat2Data); data(Pulse)
head(Pulse, n = 3)
  Active Rest Smoke Sex Exercise Hgt Wgt
1 97 78 1 1 63 119
2 82 68 1 3 70 225
3 88 62 3 72 175
library(dplyr)  # provides %>% (and mutate(), used later)
pulseModel <- lm(Active ~ Rest + Hgt + Wgt, data = Pulse)
coef(pulseModel) %>% round(digits = 2)
(Intercept)        Rest         Hgt         Wgt
      57.26        1.13                    0.11
ε. So... what are d
## Coefficients w/ standard errors and t-tests
summary(pulseModel) %>% coef() %>% round(digits = 2)
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    57.26      25.01    2.29     0.02
Rest            1.13       0.10   11.09     0.00
Hgt                        0.41             0.03
Wgt             0.11       0.05    2.31     0.02

## The estimated standard deviation of the residuals
sigma(pulseModel) %>% round(digits = 2)
[1] 14.91
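The `t value` column is each estimate divided by its standard error, and `Pr(>|t|)` is the two-sided tail area of a t distribution on the residual degrees of freedom. A minimal sketch of that arithmetic for the intercept row, assuming 232 observations (inferred from the 230 residual degrees of freedom reported for the one-predictor model in the ANOVA table later in this deck), which gives 232 - 4 = 228 residual degrees of freedom here. The variable names are illustrative only.

```r
# Values copied from the (Intercept) row of the coefficient table
estimate  <- 57.26
std_error <- 25.01

# ASSUMPTION: n = 232 observations and 4 estimated coefficients
resid_df <- 232 - 4

# t statistic and two-sided p-value
t_value <- estimate / std_error
p_value <- 2 * pt(abs(t_value), df = resid_df, lower.tail = FALSE)

round(c(t_value = t_value, p_value = p_value), 2)
```

Because the inputs are already rounded to two decimals, repeating this for other rows may differ slightly from the printed table (for example, 1.13 / 0.10 gives 11.3 rather than 11.09).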
library(Stat2Data); data(Pulse)
PulseWithBMI <- mutate(
  Pulse,
  BMI = Wgt / Hgt^2 * 703,
  InvActive = 1 / Active,
  InvRest = 1 / Rest,
  Male = 1 - Sex)
### Male = 1 for males, 0 for others
### factor() tells R this represents categories
pulseBySex <- lm(Active ~ factor(Male), data = PulseWithBMI)
coef(pulseBySex) %>% round(digits = 2)
  (Intercept) factor(Male)1
        94.82
summary(pulseBySex) %>% coef() %>% round(digits = 2)
              Estimate Std. Error t value Pr(>|t|)
(Intercept)      94.82       1.77   53.58     0.00
factor(Male)1                2.44             0.01
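One way to read these two coefficients, assuming the usual 0/1 coding in which `factor(Male)1` is an indicator equal to 1 for males and 0 otherwise. The symbols β0, β1 and μ are notation introduced here for illustration.

```latex
\begin{align*}
Active &= \beta_0 + \beta_1 \cdot Male + \varepsilon \\
\mu_{Male = 0} &= \beta_0 \qquad (\hat\beta_0 = 94.82) \\
\mu_{Male = 1} &= \beta_0 + \beta_1
\end{align*}
```

Under this reading the intercept is the mean active pulse in the reference group (Male = 0), and the `factor(Male)1` coefficient is the difference between the two group means, so its t-test matches a pooled two-sample t-test.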
pulseBySexAndRest <- lm(Active ~ Rest + factor(Male), data = PulseWithBMI)
pulseBySexAndRest %>% coef() %>% round(2)
  (Intercept)          Rest factor(Male)1
        16.47          1.12
## CAUTION: don't try to use this with multiple quantitative
## predictors; it won't make sense
library(mosaic)  # provides plotModel()
plotModel(pulseBySexAndRest) +
  scale_color_discrete(
    name = "Sex",
    labels = c("0" = "Others", "1" = "Male"))
[Figure: Active (y-axis) vs. Rest (x-axis), colored by Sex (Others, Male)]
summary(pulseBySexAndRest) %>% coef() %>% round(digits = 2)
              Estimate Std. Error t value Pr(>|t|)
(Intercept)      16.47       7.19    2.29     0.02
Rest              1.12       0.10   11.12     0.00
factor(Male)1                2.00             0.14
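Written out by group, a model with `Rest + factor(Male)` and no interaction gives two parallel lines. A sketch using the coefficients that survive in the output, with the estimated `factor(Male)1` coefficient left symbolic as β̂2 (notation introduced here) because its value is not shown:

```latex
\begin{align*}
\text{Others } (Male = 0):\quad \widehat{Active} &= 16.47 + 1.12 \cdot Rest \\
\text{Males } (Male = 1):\quad \widehat{Active} &= (16.47 + \hat\beta_2) + 1.12 \cdot Rest
\end{align*}
```

The indicator shifts only the intercept, so β̂2 is the vertical gap between the two lines at every value of Rest.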
twoLinesModel <- lm(Active ~ Rest + factor(Male) + Rest:factor(Male),
                    data = PulseWithBMI)
coef(twoLinesModel) %>% round(digits = 2)
       (Intercept)               Rest      factor(Male)1
             11.98               1.18               6.82
Rest:factor(Male)1
## CAUTION: don't try to use this with multiple quantitative
## predictors; it won't make sense
plotModel(twoLinesModel) +
  scale_color_discrete(
    name = "Sex",
    labels = c("0" = "Others", "1" = "Male"))
[Figure: Active (y-axis) vs. Rest (x-axis), colored by Sex (Others, Male)]
summary(twoLinesModel) %>% coef() %>% round(digits = 2)
                   Estimate Std. Error t value Pr(>|t|)
(Intercept)           11.98       9.58    1.25     0.21
Rest                   1.18       0.14    8.74     0.00
factor(Male)1          6.82      13.96    0.49     0.63
Rest:factor(Male)1                0.20             0.48
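With the `Rest:factor(Male)` interaction, each group gets its own intercept and its own slope. A sketch of the two fitted lines from the coefficients shown, with the estimated interaction coefficient left symbolic as β̂3 (notation introduced here) because its value is not shown in the output:

```latex
\begin{align*}
\text{Others } (Male = 0):\quad \widehat{Active} &= 11.98 + 1.18 \cdot Rest \\
\text{Males } (Male = 1):\quad \widehat{Active} &= (11.98 + 6.82) + (1.18 + \hat\beta_3) \cdot Rest \\
&= 18.80 + (1.18 + \hat\beta_3) \cdot Rest
\end{align*}
```

Here `factor(Male)1` is the difference in intercepts (the gap at Rest = 0) and the interaction coefficient is the difference in slopes between the two groups.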
PulseWithBMI <- mutate(PulseWithBMI, RestCentered = Rest - mean(Rest))
twoLinesModel <- lm(Active ~ RestCentered + factor(Male) + RestCentered:factor(Male),
                    data = PulseWithBMI)
coef(twoLinesModel) %>% round(digits = 2)
               (Intercept)               RestCentered
                     92.76                       1.18
             factor(Male)1 RestCentered:factor(Male)1
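Centering the quantitative predictor leaves the slopes alone and moves the intercept terms so that they describe the groups at the mean of the predictor rather than at zero. In the output here, the slope stays at 1.18 while the intercept moves from 11.98 to 92.76. The sketch below illustrates the same behavior on a small made-up data set (not the Pulse data); all names in it are hypothetical.

```r
# Made-up toy data (NOT the Pulse data): x plays the role of Rest,
# g the role of the Male indicator, y the role of Active
toy <- data.frame(
  x = c(60, 70, 80, 90, 65, 75, 85, 95),
  g = factor(c(0, 0, 0, 0, 1, 1, 1, 1)),
  y = c(80, 95, 105, 120, 78, 88, 101, 110)
)
toy$x_centered <- toy$x - mean(toy$x)

raw      <- lm(y ~ x + g + x:g, data = toy)
centered <- lm(y ~ x_centered + g + x_centered:g, data = toy)

b_raw <- unname(coef(raw))       # order: intercept, x, g1, x:g1
b_cen <- unname(coef(centered))  # same order, with x_centered in place of x

# Slopes (positions 2 and 4) are unchanged by centering
slopes_match <- all.equal(b_raw[c(2, 4)], b_cen[c(2, 4)])

# Centered intercept = raw prediction for group 0 at x = mean(x)
intercept_match <- all.equal(b_cen[1], b_raw[1] + b_raw[2] * mean(toy$x))

# Centered g1 coefficient = raw gap between groups at x = mean(x)
gap_match <- all.equal(b_cen[3], b_raw[3] + b_raw[4] * mean(toy$x))

c(slopes_match, intercept_match, gap_match)
```

After centering, the indicator coefficient compares the two groups at an average resting pulse, which is usually easier to interpret than a comparison at a resting pulse of zero.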
plotModel(twoLinesModel) +
  scale_color_discrete(
    name = "Sex",
    labels = c("0" = "Others", "1" = "Male"))
[Figure: Active (y-axis) vs. RestCentered (x-axis), colored by Sex (Others, Male)]
modelA <- lm(Active ~ Rest, data = PulseWithBMI)
modelB <- lm(Active ~ Rest + factor(Male) + factor(Male):Rest,
             data = PulseWithBMI)
anova(modelA, modelB)
Analysis of Variance Table

Model 1: Active ~ Rest
Model 2: Active ~ Rest + factor(Male) + factor(Male):Rest
  Res.Df   RSS Df Sum of Sq      F Pr(>F)
1    230 51953
2    228 51335  2    617.27 1.3708  0.256
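The nested F statistic compares the drop in residual sum of squares per added coefficient with the residual mean square of the larger model. A minimal sketch that redoes this arithmetic from the numbers printed in the table; the variable names are illustrative, and the inputs are the rounded values shown rather than the fitted model objects.

```r
# Values copied from the anova() table
rss_full   <- 51335   # residual sum of squares, Model 2
df_full    <- 228     # residual degrees of freedom, Model 2
sum_of_sq  <- 617.27  # drop in RSS going from Model 1 to Model 2
df_dropped <- 2       # extra coefficients in Model 2 (230 - 228)

# F = (drop in RSS per extra coefficient) / (residual mean square of Model 2)
f_stat  <- (sum_of_sq / df_dropped) / (rss_full / df_full)

# Upper-tail area of the F distribution with (2, 228) degrees of freedom
p_value <- pf(f_stat, df1 = df_dropped, df2 = df_full, lower.tail = FALSE)

round(c(F = f_stat, p = p_value), 4)
```

This should agree with the table's F of 1.3708 and p-value of 0.256 up to rounding. A p-value of this size gives little evidence that adding a separate intercept and slope for males improves on the single-line model.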