Unit 7: Multiple Linear Regression
Lecture 1: Introduction to MLR
Statistics 101, Thomas Leininger, June 20, 2013
Many variables in a model
Weights of books
    weight (g)  volume (cm³)  cover
 1         800           885     hc
 2         950          1016     hc
 3        1050          1125     hc
 4         350           239     hc
 5         750           701     hc
 6         600           641     hc
 7        1075          1228     hc
 8         250           412     pb
 9         700           953     pb
10         650           929     pb
11         975          1492     pb
12         350           419     pb
13         950          1010     pb
14         425           595     pb
15         725          1034     pb
Statistics 101 (Thomas Leininger) U7 - L1: Multiple Linear Regression June 20, 2013 2 / 17
Weights of hardcover and paperback books
Can you identify a trend in the relationship between the volume and weight of hardcover and paperback books?

[Scatterplot: weight (g) vs. volume (cm³), hardcover and paperback books plotted separately]
Modeling weights of books using volume and cover type
book_mlr = lm(weight ~ volume + cover, data = allbacks)
summary(book_mlr)

Coefficients:
              Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)  197.96284    59.19274    3.344  0.005841 **
volume         0.71795     0.06153   11.669   6.6e-08 ***
cover:pb    -184.04727    40.49420   -4.545  0.000672 ***

Residual standard error: 78.2 on 12 degrees of freedom
Multiple R-squared: 0.9275, Adjusted R-squared: 0.9154
F-statistic: 76.73 on 2 and 12 DF, p-value: 1.455e-07

Conditions for MLR?
Linear model
              Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)     197.96       59.19     3.34      0.01
volume            0.72        0.06    11.67      0.00
cover:pb       -184.05       40.49    -4.55      0.00
weight-hat = 197.96 + 0.72 × volume − 184.05 × cover:pb

1. For hardcover books: plug in 0 for cover

   weight-hat = 197.96 + 0.72 × volume − 184.05 × 0
              = 197.96 + 0.72 × volume

2. For paperback books: plug in 1 for cover

   weight-hat = 197.96 + 0.72 × volume − 184.05 × 1
              = 13.91 + 0.72 × volume
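The two parallel fitted lines can be verified with a short calculation. This is a plain-Python sketch of the slide's model equation, using the rounded coefficients; the helper function name is our own, not from the slides.

```python
# Fitted MLR model from the slide (rounded coefficients):
#   weight-hat = 197.96 + 0.72 * volume - 184.05 * cover:pb
def predicted_weight(volume, cover_pb):
    """Predicted book weight (g) given volume (cm^3) and cover type.

    cover_pb is an indicator variable: 0 for hardcover, 1 for paperback.
    """
    return 197.96 + 0.72 * volume - 184.05 * cover_pb

# Hardcover line: plug in 0 for cover, so the intercept is 197.96
hc_intercept = predicted_weight(0, 0)
# Paperback line: plug in 1 for cover, so the intercept drops to 13.91
pb_intercept = predicted_weight(0, 1)

print(round(hc_intercept, 2), round(pb_intercept, 2))  # 197.96 13.91
```

Both lines share the slope 0.72; the cover coefficient only shifts the intercept, which is exactly why the model fits two parallel lines.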
Visualising the linear model
[Scatterplot: weight (g) vs. volume (cm³), with the two parallel fitted lines for hardcover and paperback books]
Interpretation of the regression coefficients
              Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)     197.96       59.19     3.34      0.01
volume            0.72        0.06    11.67      0.00
cover:pb       -184.05       40.49    -4.55      0.00
Slope of volume: All else held constant, for each 1 cm³ increase in volume we would expect weight to increase on average by 0.72 grams.

Slope of cover: All else held constant, the model predicts that paperback books weigh 184.05 grams less than hardcover books, on average.

Intercept: Hardcover books with no volume are expected, on average, to weigh about 197.96 grams. Obviously, the intercept does not make sense in context; it only serves to adjust the height of the line.
Prediction
Question: Which of the following is the correct calculation for the predicted weight of a paperback book that is 600 cm³?

              Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)     197.96       59.19     3.34      0.01
volume            0.72        0.06    11.67      0.00
cover:pb       -184.05       40.49    -4.55      0.00

(a) 197.96 + 0.72 × 600 − 184.05 × 1
(b) 184.05 + 0.72 × 600 − 197.96 × 1
(c) 197.96 + 0.72 × 600 − 184.05 × 0
(d) 197.96 + 0.72 × 1 − 184.05 × 600
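The candidate calculations can be checked numerically. A paperback book has cover:pb = 1, so option (a) applies the coefficients correctly; the sketch below evaluates all four (variable names are ours):

```python
# Candidate predictions for a 600 cm^3 paperback book,
# using the rounded coefficients from the table above.
a = 197.96 + 0.72 * 600 - 184.05 * 1   # correct: volume slope on volume, cover:pb = 1
b = 184.05 + 0.72 * 600 - 197.96 * 1   # swaps the intercept and the cover coefficient
c = 197.96 + 0.72 * 600 - 184.05 * 0   # this would be a hardcover book (cover:pb = 0)
d = 197.96 + 0.72 * 1 - 184.05 * 600   # swaps the roles of the two slopes

print(round(a, 2))  # 445.91 grams, the predicted weight from option (a)
```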
A note on “interaction” variables
weight-hat = 197.96 + 0.72 × volume − 184.05 × cover:pb

[Scatterplot: weight (g) vs. volume (cm³), with the two parallel fitted lines for hardcover and paperback books]
This model assumes that hardcover and paperback books have the same slope for the relationship between their volume and weight. If this isn’t reasonable, then we would include an “interaction” variable in the model (beyond the scope of this course).
Regression topics we may/may not cover
Adjusted R²
Inference in MLR
Collinearity in MLR
Interactions between variables
Model diagnostics and transformations
Logistic/Poisson/other regression
and many more
Adjusted R²

R² vs. adjusted R²
When any variable is added to the model, R² will always increase. If the added variable doesn't really provide any new information, or is completely unrelated, adjusted R² does not increase.
Properties of R²_adj:

R²_adj will always be smaller than R².

R²_adj applies a penalty for the number of predictors included in the model. Therefore, we choose models with higher R²_adj over others.
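The penalty can be made concrete with the standard adjusted R² formula (not printed on the slide, but it is what R reports): R²_adj = 1 − (1 − R²)(n − 1)/(n − k − 1). A minimal sketch, plugging in the values from the book-weights model:

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R^2: shrinks R^2 according to the number of
    predictors k relative to the sample size n."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Book-weights model: R^2 = 0.9275, n = 15 books, k = 2 predictors
r2_adj = adjusted_r_squared(0.9275, n=15, k=2)
print(round(r2_adj, 4))  # 0.9154, matching the "Adjusted R-squared" in the R output
```

Note that the penalty grows with k: adding a useless predictor raises R² slightly but shrinks the (n − k − 1) denominator, so R²_adj can fall.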
Collinearity and parsimony
Collinearity between explanatory variables
Two predictor variables are said to be collinear when they are correlated, and this collinearity (also called multicollinearity) complicates model estimation.
Remember: Predictors are also called explanatory or independent variables, so they should be independent of each other.
We don't like adding predictors that are associated with each other to the model, because often the addition of such a variable brings nothing to the table. Instead, we prefer the simplest best model, i.e. the parsimonious model. In addition, including collinear variables can result in biased estimates of the slope parameters.
Inference for the model as a whole
Is the model as a whole significant?

H0: β1 = β2 = · · · = βk = 0
HA: at least one of the βi ≠ 0
F-statistic: 29.74 on 4 and 429 DF, p-value: < 2.2e-16
Since the p-value < 0.05, the model as a whole is significant. The F test yielding a significant result doesn't mean the model fits the data well; it just means at least one of the βs is non-zero. The F test not yielding a significant result doesn't mean the individual variables included in the model are not good predictors of y; it just means that the combination of these variables doesn't yield a good model.
Inference for the slope(s)
Is whether or not the mother went to high school a significant predictor of the kid's cognitive test score, given all other variables in the model?

H0: β1 = 0, when all other variables are included in the model
HA: β1 ≠ 0, when all other variables are included in the model
              Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)   19.59241     9.21906    2.125    0.0341
mom_hsyes      5.09482     2.31450    2.201    0.0282
mom_iq         0.56147     0.06064    9.259    <2e-16
mom_workyes    2.53718     2.35067    1.079    0.2810
mom_age        0.21802     0.33074    0.659    0.5101

Residual standard error: 18.14 on 429 degrees of freedom
T = 2.201, df = n − k − 1 = 434 − 4 − 1 = 429, p-value = 0.0282 Since p-value < 0.05, whether or not mom went to high school is a significant predictor of kid’s test score, given all other variables in the model.
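The test statistic is just the estimate divided by its standard error, with df = n − k − 1. A quick check with the mom_hs numbers from the regression output:

```python
# t statistic for the slope of mom_hs, given the other variables in the model
estimate = 5.09482   # slope estimate from the regression output
se = 2.31450         # its standard error

t = estimate / se          # T = estimate / SE
df = 434 - 4 - 1           # df = n - k - 1 (n = 434 kids, k = 4 predictors)

print(round(t, 3), df)     # 2.201 429, matching the slide
```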
Interpreting the slope
Question What is the correct interpretation of the slope for mom work?
              Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)      19.59        9.22     2.13      0.03
mom_hs:yes        5.09        2.31     2.20      0.03
mom_iq            0.56        0.06     9.26      0.00
mom_work:yes      2.54        2.35     1.08      0.28
mom_age           0.22        0.33     0.66      0.51

All else being equal, kids whose moms worked during the first three years of the kid's life
(a) are estimated to score 2.54 points lower
(b) are estimated to score 2.54 points higher
than those whose moms did not work.
Application exercise: CI for slope in MLR

Construct a 95% confidence interval for the slope of mom_work.

b_i ± t⋆ × SE_bi,   df = n − k − 1 = 434 − 4 − 1 = 429 → use the df = 400 row of the t table

2.54 ± 1.97 × 2.35
2.54 ± 4.62
(−2.08, 7.16)

We are 95% confident that, all else being equal, kids whose moms worked during the first three years of the kid's life are estimated to score 2.08 points lower to 7.16 points higher than those whose moms did not work.
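The interval arithmetic can be reproduced directly. Note the slide truncates the margin of error 1.97 × 2.35 = 4.6295 to 4.62 before adding and subtracting, which is why it reports (−2.08, 7.16); carrying full precision gives roughly (−2.09, 7.17).

```python
# 95% CI for the slope of mom_work: b_i +/- t_star * SE
b = 2.54        # rounded slope estimate for mom_work
se = 2.35       # its rounded standard error
t_star = 1.97   # critical t value, df = 400 row of the t table

margin = t_star * se              # 4.6295; the slide truncates this to 4.62
lower, upper = b - margin, b + margin
print(lower, upper)               # about -2.09 and 7.17 at full precision
```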
Inference for the slope(s) (cont.)
Given all variables in the model, which ones are significant predictors of the kid's cognitive test score?

              Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)   19.59241     9.21906    2.125    0.0341
mom_hsyes      5.09482     2.31450    2.201    0.0282
mom_iq         0.56147     0.06064    9.259    <2e-16
mom_workyes    2.53718     2.35067    1.079    0.2810
mom_age        0.21802     0.33074    0.659    0.5101
mom hs and mom iq are significant, mom work and mom age are not.