STAT 213 ANOVA as Multiple Regression Colin Reimer Dawson Oberlin - - PowerPoint PPT Presentation

SLIDE 1

Outline Last Time One-Way ANOVA as Multiple Regression

STAT 213 ANOVA as Multiple Regression

Colin Reimer Dawson

Oberlin College

5 April 2016

SLIDE 2

Outline

  • Last Time
  • One-Way ANOVA as Multiple Regression

SLIDE 3

Reflection Questions

  • When do you do a nested F-test, and what does it mean if it is statistically significant?
  • In a nested F-test, is MSEFull calculated from the more complex model or from both models?
  • Is the nested F-test only for comparing two models? How do we select among three or more models?

SLIDE 4

Reflection Questions

  • How do you know when to use interaction terms in polynomial regression?
  • What’s the logic behind mutate() in R?

SLIDE 5

Reading Quiz

A group of middle school students performed an experiment to see whether each of two treatments helps lengthen the shelf life of strawberries: (1) spraying with lemon juice, and (2) putting the strawberries on paper towels to soak up the extra moisture. They compared these two treatments to a control treatment in which they did nothing special to the strawberries. Write down the multiple regression model for this experiment (using indicator variables), and explain what each coefficient represents.
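As a general sketch of the indicator-variable idea behind one-way ANOVA as regression, the R code below simulates hypothetical shelf-life data for three groups. The group labels, the `shelf_life` response, and every number are made up for illustration; they are not the students' data. With control as the reference group, the intercept estimates the control mean and each indicator's coefficient estimates that treatment's mean minus the control mean.

```r
# Hypothetical, simulated shelf-life data (days) for three groups; not real data
set.seed(213)
treatment <- factor(rep(c("control", "lemon", "towel"), each = 10),
                    levels = c("control", "lemon", "towel"))
shelf_life <- 5 + c(0, 1.5, 0.8)[as.integer(treatment)] + rnorm(30, sd = 1)

# Explicit 0/1 indicator variables; control is the reference group
lemon <- as.numeric(treatment == "lemon")
towel <- as.numeric(treatment == "towel")
fit <- lm(shelf_life ~ lemon + towel)

# Giving lm() the factor directly builds the same indicators internally
fit_factor <- lm(shelf_life ~ treatment)

group_means <- tapply(shelf_life, treatment, mean)
b <- coef(fit)
b[["(Intercept)"]]   # estimated mean of the control group
b[["lemon"]]         # lemon mean minus control mean
b[["towel"]]         # towel mean minus control mean
group_means
```

The two fits differ only in how the indicators are created; their coefficient estimates agree.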

SLIDE 6

For Thursday

  • Read: Ch. 4.2
  • Write: Finish the multicollinearity lab (not to turn in, but we will discuss it in class)
  • Answer:
    1. True or False: When selecting a set of predictors from a pool, we should prefer a model that yields a larger Mallows’ Cp statistic, all else being equal.
    2. Suppose we have six candidate predictor variables that we might use to build a multiple regression model. How many models will we need to consider in total to find the best two-predictor model according to forward selection?

SLIDE 7

library("mosaic"); library("Stat2Data"); data("Pulse")
PulseWithBMI <- mutate(Pulse,
                       BMI = Wgt / Hgt^2 * 703,  # BMI from weight (lb) and height (in)
                       InvActive = 1 / Active,
                       InvRest = 1 / Rest,
                       Male = 1 - Gender)        # reverses the 0/1 coding of Gender
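On the logic of mutate(): it takes a data frame and returns a copy with new columns computed from existing ones, leaving the original unchanged. The sketch below shows the same idea with base R's `transform()` on a tiny hypothetical stand-in for Pulse; the data frame `pulse_demo` and its values are made up for illustration.

```r
# Hypothetical miniature version of Pulse (values made up for illustration)
pulse_demo <- data.frame(Wgt = c(150, 180), Hgt = c(65, 70),
                         Active = c(80, 100), Rest = c(64, 80),
                         Gender = c(1, 0))

# transform() evaluates each expression using the data frame's columns
# and returns a new data frame with the results added as columns
pulse_demo_bmi <- transform(pulse_demo,
                            BMI = Wgt / Hgt^2 * 703,
                            InvActive = 1 / Active,
                            InvRest = 1 / Rest,
                            Male = 1 - Gender)

names(pulse_demo)      # original columns only: it was not modified
names(pulse_demo_bmi)  # original columns plus BMI, InvActive, InvRest, Male
```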

SLIDE 8

Testing multiple (but not all) predictors

We can test:

  • one term at a time (t-test): $H_0: \beta_k = 0$ vs. $H_1: \beta_k \neq 0$
  • all terms at once (F-test): $H_0: \beta_1 = \beta_2 = \cdots = \beta_K = 0$ vs. $H_1:$ some $\beta_k \neq 0$
  • What if we want to test a subset of the βs together?
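Both existing tests can be read off `summary()` of an `lm` fit. This sketch uses simulated (hypothetical) data with two predictors, only one of which truly matters; it also confirms that the overall F-test is the same as comparing the model against an intercept-only model.

```r
# Simulated (hypothetical) data: y depends on x1 but not on x2
set.seed(42)
n <- 50
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- 2 + 1.5 * x1 + rnorm(n)
fit <- lm(y ~ x1 + x2)
s <- summary(fit)

# One term at a time: t-test of H0: beta_k = 0 for each coefficient
s$coefficients[, c("t value", "Pr(>|t|)")]

# All terms at once: overall F-test of H0: beta_1 = beta_2 = 0
f <- s$fstatistic  # named vector: value, numdf, dendf
p_overall <- pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)
p_overall

# The same F statistic, from comparing against the intercept-only model
anova(lm(y ~ 1), fit)
```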
SLIDE 9

Nested Models

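If Model B has all the terms in Model A and then some, we say that Model A is nested in Model B.

Model A: $\text{Active} = \beta_0 + \beta_1 \cdot \text{Rest}$
Model B: $\text{Active} = \beta_0 + \beta_1 \cdot \text{Rest} + \beta_2 \cdot \text{Male} + \beta_3 \cdot \text{Male} \cdot \text{Rest}$

Model A is nested in Model B.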
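Nesting can be checked directly on the design matrices. In this sketch on simulated (hypothetical) stand-ins for Rest, Male, and Active, every column of Model A's design matrix also appears in Model B's, so least squares for Model B can always do at least as well: its residual sum of squares is never larger.

```r
# Simulated (hypothetical) stand-ins for Rest, Male, and Active
set.seed(1)
n <- 40
Rest <- rnorm(n, mean = 70, sd = 10)
Male <- rbinom(n, size = 1, prob = 0.5)
Active <- 20 + 1.2 * Rest + rnorm(n, sd = 10)

modelA <- lm(Active ~ Rest)
modelB <- lm(Active ~ Rest + Male + Male:Rest)

# Every column of Model A's design matrix is also a column of Model B's
all(colnames(model.matrix(modelA)) %in% colnames(model.matrix(modelB)))

# deviance() of an lm fit is its residual sum of squares
c(RSS_A = deviance(modelA), RSS_B = deviance(modelB))
```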

SLIDE 10

Comparing Nested Models

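  • Is the improved fit for Model B “worth it”?
  • Some of the SSError for the simpler model moves to SSModel for the complex model.
  • Nested F-test: is this difference more than we would expect by chance?
  • $H_0: \beta_{K_A+1} = \cdots = \beta_{K_B} = 0$

$$F_{\text{Comparison}} = \frac{MS_{\text{Comparison}}}{MSE_{\text{Full}}} = \frac{\text{Increase in } SS_{\text{Model}} / \text{Increase in } df_{\text{Model}}}{MSE_{\text{Full}}}$$

where $MSE_{\text{Full}}$ is the mean squared error of the full (more complex) model.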
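A minimal sketch of the nested F-test computed straight from the formula, on simulated (hypothetical) data that mimics the Pulse variables. Because both models share the same response and an intercept, the increase in SSModel equals the decrease in SSError, which is what the code computes; the MSE comes from the full model only. The hand computation is checked against `anova()`.

```r
# Simulated (hypothetical) data resembling Rest, Male, Active
set.seed(2)
n <- 60
Rest <- rnorm(n, mean = 70, sd = 10)
Male <- rbinom(n, size = 1, prob = 0.5)
Active <- 20 + 1.2 * Rest + 5 * Male + rnorm(n, sd = 10)

reduced <- lm(Active ~ Rest)                     # Model A
full    <- lm(Active ~ Rest + Male + Male:Rest)  # Model B

# Increase in SSModel = decrease in SSError (deviance() is the RSS)
ss_comparison <- deviance(reduced) - deviance(full)
df_comparison <- df.residual(reduced) - df.residual(full)  # 2 extra terms
mse_full <- deviance(full) / df.residual(full)             # from the FULL model only

F_comparison <- (ss_comparison / df_comparison) / mse_full
p_value <- pf(F_comparison, df_comparison, df.residual(full), lower.tail = FALSE)
c(F = F_comparison, p = p_value)

anova(reduced, full)  # should report the same F and p-value
```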

SLIDE 11

Nested F-test

modelA <- lm(Active ~ Rest, data = PulseWithBMI)
modelB <- lm(Active ~ Rest + factor(Male) + factor(Male):Rest, data = PulseWithBMI)
anova(modelA, modelB)

Analysis of Variance Table

Model 1: Active ~ Rest
Model 2: Active ~ Rest + factor(Male) + factor(Male):Rest
  Res.Df   RSS Df Sum of Sq      F Pr(>F)
1    230 51953
2    228 51335  2    617.27 1.3708  0.256
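As a check on the formula, the F statistic can be recomputed from the table. The increase in SSModel is the drop in RSS (the displayed RSS values are rounded, so 51953 - 51335 gives 618 rather than the reported 617.27), the increase in df is 230 - 228 = 2, and MSE_Full is Model 2's RSS over its residual df:

```latex
F_{\text{Comparison}}
  = \frac{\text{Increase in } SS_{\text{Model}} / \text{Increase in } df_{\text{Model}}}{MSE_{\text{Full}}}
  = \frac{617.27 / (230 - 228)}{51335 / 228}
  \approx \frac{308.64}{225.15}
  \approx 1.371
```

With 2 and 228 degrees of freedom this corresponds to the reported p-value of 0.256, so these data give no significant evidence that Male and the Male·Rest interaction together improve on Model A.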

SLIDE 12

Conclusion of a Nested F-test

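If the nested F-test comes out significant, we have evidence that the additional predictor variables are collectively useful for predicting the response, beyond the predictors already in the simpler model (that is, at least one of the added coefficients is nonzero).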

SLIDE 13

Polynomial Regression

We can create “new” predictors from old ones, e.g.:

$$Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \cdots + \beta_p X^p$$

with $p = 1$ (linear), $p = 2$ (quadratic), $p = 3$ (cubic), etc.

SLIDE 14

R: Three Equivalent Methods

library("mosaic"); library("mosaicData"); data("SAT")  ## sat = mean SAT score

Method 1: Explicit Variable Creation

SAT.augmented <- mutate(SAT, frac.squared = frac^2)
quadratic.model <- lm(sat ~ frac + frac.squared, data = SAT.augmented)

Method 2: Inline transformation (note use of I())

quadratic.model <- lm(sat ~ frac + I(frac^2), data = SAT.augmented)

Method 3: Using poly() to generate polynomials

quadratic.model <- lm(sat ~ poly(frac, degree = 2, raw = TRUE), data = SAT.augmented)

Printed output for the Method 2 fit (all three methods give the same estimates):

Call:
lm(formula = sat ~ frac + I(frac^2), data = SAT.augmented)

Coefficients:
(Intercept)         frac    I(frac^2)
 1094.09787     -6.52850      0.05242
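To keep the check self-contained, this sketch uses simulated (hypothetical) data with a shape loosely like the SAT example instead of the mosaicData SAT data, and base R column assignment in place of mutate(). It confirms that the three methods give the same coefficients, and that `poly()` with its default `raw = FALSE` (orthogonal polynomials) gives different coefficients but identical fitted values. The `I()` matters because inside a formula `^` means crossing of terms, so `frac^2` without `I()` would reduce to just `frac`.

```r
# Simulated (hypothetical) stand-in for the SAT data
set.seed(3)
sim <- data.frame(frac = runif(50, min = 0, max = 100))
sim$sat <- 1100 - 6.5 * sim$frac + 0.05 * sim$frac^2 + rnorm(50, sd = 25)

# Method 1: explicit squared column
sim$frac.squared <- sim$frac^2
m1 <- lm(sat ~ frac + frac.squared, data = sim)
# Method 2: inline transformation; I() makes ^ mean arithmetic squaring
m2 <- lm(sat ~ frac + I(frac^2), data = sim)
# Method 3: raw polynomial basis from poly()
m3 <- lm(sat ~ poly(frac, degree = 2, raw = TRUE), data = sim)
# Default poly(): orthogonal basis, same fitted curve, different coefficients
m4 <- lm(sat ~ poly(frac, degree = 2), data = sim)

rbind(method1 = unname(coef(m1)),
      method2 = unname(coef(m2)),
      method3 = unname(coef(m3)),
      orthogonal = unname(coef(m4)))
```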

SLIDE 15

Example: State SAT Scores

f.hat <- makeFun(quadratic.model)
xyplot(sat ~ frac, data = SAT)
plotFun(f.hat(frac) ~ frac, add = TRUE)

[Figure: scatterplot of sat vs. frac with the fitted quadratic curve overlaid]

plot(quadratic.model, which = 1)

[Figure: Residuals vs Fitted plot for lm(sat ~ frac + I(frac^2)); observations 48, 4, and 37 are labeled]

SLIDE 16

Selecting Polynomial Order

  • Start with a higher-order model, then remove the highest-order term if it is not significant.
  • Repeat until the highest-order term is significant.
  • To be safe: run a nested F-test between the final model and the highest-order model.
  • Don’t remove lower-order terms, even if they are nonsignificant!
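The procedure above can be sketched as a loop. The data are simulated (hypothetical) with a quadratic trend, and the starting degree of 4 and the 0.05 cutoff are assumptions for illustration, not values from the slides. Using `poly(..., raw = TRUE)` keeps every lower-order term in each fit, in line with the last bullet.

```r
# Simulated (hypothetical) data whose true trend is quadratic
set.seed(4)
x <- runif(80, min = -2, max = 2)
y <- 1 + 0.5 * x + 2 * x^2 + rnorm(80, sd = 0.5)

max_degree <- 4   # assumed starting order
alpha <- 0.05     # assumed significance cutoff

# Fit a raw polynomial of degree d (all lower-order terms included)
fit_degree <- function(d) {
  lm(as.formula(paste0("y ~ poly(x, degree = ", d, ", raw = TRUE)")))
}

d <- max_degree
repeat {
  coefs <- summary(fit_degree(d))$coefficients
  p_top <- coefs[nrow(coefs), "Pr(>|t|)"]  # p-value of the highest-order term
  if (p_top < alpha || d == 1) break       # stop once the top term is significant
  d <- d - 1                               # otherwise drop only the top term
}
d

# To be safe: nested F-test of the chosen model against the highest-order model
if (d < max_degree) anova(fit_degree(d), fit_degree(max_degree))
```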
SLIDE 17

Interaction Terms and Second-Order Models

Consider the model:

sat = β0 + β1 · frac + β2 · expend + β3 · frac · expend + ε

where expend is the state’s education expenditure per pupil. β3 represents the change in the slope for expend for each one-unit increase in frac (or vice versa).
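To see what β3 does, this sketch fits the interaction model to simulated (hypothetical) data rather than the real SAT data. At a fixed frac, the slope for expend is β2 + β3 · frac; the code checks this against the change in the prediction when expend rises by one unit with frac held at 50.

```r
# Simulated (hypothetical) state-level data; values are made up
set.seed(5)
n <- 50
frac <- runif(n, 5, 80)
expend <- runif(n, 4, 10)
sat <- 1000 - 2 * frac + 10 * expend + 0.1 * frac * expend + rnorm(n, sd = 20)

fit <- lm(sat ~ frac * expend)  # expands to frac + expend + frac:expend
b <- coef(fit)

# Slope for expend at a given frac: beta2 + beta3 * frac
slope_expend <- function(f) b[["expend"]] + b[["frac:expend"]] * f
slope_expend(c(10, 50))  # the slope changes as frac changes

# Check: raise expend by one unit while holding frac = 50
newd <- data.frame(frac = c(50, 50), expend = c(6, 7))
diff(predict(fit, newdata = newd))
```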

SLIDE 18

So many models...

  • How to decide among all these models?
    1. Understand the subject area! Build sensible models.
    2. Nested F-tests
SLIDE 19

One-Way ANOVA as Multiple Regression

Worksheet