Technical conditions for linear regression, Jo Hardin

SLIDE 1

DataCamp Inference for Linear Regression in R

Technical conditions for linear regression

INFERENCE FOR LINEAR REGRESSION IN R

Jo Hardin

Professor, Pomona College

SLIDE 2

What are the technical conditions?

$Y = \beta_0 + \beta_1 \cdot X + \epsilon$, where $\epsilon \sim N(0, \sigma_\epsilon)$

L: linear model
I: independent observations
N: points are normally distributed around the line
E: equal variability around the line for all values of the explanatory variable
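As a rough illustration of what the conditions describe, the sketch below simulates data that satisfies them by construction and fits the model. The sample size, coefficients, and error standard deviation are all assumed values chosen for illustration; none of them come from the course data.

```r
# Hypothetical simulation: every number here is an assumption for illustration.
set.seed(42)
n <- 200
beta0 <- 2       # assumed intercept
beta1 <- 0.5     # assumed slope
sigma_eps <- 1   # assumed error standard deviation

explanatory <- runif(n, min = 0, max = 10)

# I, N, E: independent normal errors with the same spread at every X
epsilon <- rnorm(n, mean = 0, sd = sigma_eps)

# L: the mean of the response is a linear function of the explanatory variable
response <- beta0 + beta1 * explanatory + epsilon

sim_data <- data.frame(explanatory, response)
sim_fit <- lm(response ~ explanatory, data = sim_data)
coef(sim_fit)
```

With the conditions met, the estimated coefficients should land close to the assumed `beta0` and `beta1`.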

SLIDE 3

Linear model: residuals

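fitted value: $\hat{Y}_i = b_0 + b_1 X_i$
residual: $e_i = Y_i - \hat{Y}_i$

library(broom)    # augment()
library(ggplot2)  # ggplot()

linear_lm <- augment(
  lm(response ~ explanatory, data = lineardata)
)

# Residuals against fitted values, with a reference line at zero
ggplot(linear_lm, aes(x = .fitted, y = .resid)) +
  geom_point() +
  geom_hline(yintercept = 0)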
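To connect the fitted-value and residual definitions to what R returns, here is a minimal base-R sketch that computes both by hand and compares them with `fitted()` and `residuals()`. The `toy` data frame is hypothetical and simulated only for this check.

```r
# Hypothetical toy data, simulated for illustration only
set.seed(1)
toy <- data.frame(explanatory = 1:20)
toy$response <- 3 + 2 * toy$explanatory + rnorm(20)

toy_fit <- lm(response ~ explanatory, data = toy)
b <- coef(toy_fit)

# Fitted value: b0 + b1 * X_i
fitted_by_hand <- b[["(Intercept)"]] + b[["explanatory"]] * toy$explanatory

# Residual: observed response minus fitted value
resid_by_hand <- toy$response - fitted_by_hand

# Both comparisons should report agreement with R's own extractors
all.equal(fitted_by_hand, unname(fitted(toy_fit)))
all.equal(resid_by_hand, unname(residuals(toy_fit)))
```

These are the same quantities that `augment()` stores in the `.fitted` and `.resid` columns.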

SLIDE 4

Not linear

$Y = \beta_0 + \beta_1 \cdot X + \epsilon$, where $\epsilon \sim N(0, \sigma_\epsilon)$

L: linear model
I: independent observations
N: points are normally distributed around the line
E: equal variability around the line for all values of the explanatory variable

SLIDE 5

Not linear: residuals

fitted value: $\hat{Y}_i = b_0 + b_1 X_i$
residual: $e_i = Y_i - \hat{Y}_i$

nonlinear_lm <- augment(
  lm(response ~ explanatory, data = nonlineardata)
)

ggplot(nonlinear_lm, aes(x = .fitted, y = .resid)) +
  geom_point() +
  geom_hline(yintercept = 0)

SLIDE 6

Not normal

$Y = \beta_0 + \beta_1 \cdot X + \epsilon$, where $\epsilon \sim N(0, \sigma_\epsilon)$

L: linear model
I: independent observations
N: points are normally distributed around the line
E: equal variability around the line for all values of the explanatory variable

SLIDE 7

Not normal: residuals

fitted value: $\hat{Y}_i = b_0 + b_1 X_i$
residual: $e_i = Y_i - \hat{Y}_i$

nonnormal_lm <- augment(
  lm(response ~ explanatory, data = nonnormaldata)
)

ggplot(nonnormal_lm, aes(x = .fitted, y = .resid)) +
  geom_point() +
  geom_hline(yintercept = 0)

SLIDE 8

Not equal variance

$Y = \beta_0 + \beta_1 \cdot X + \epsilon$, where $\epsilon \sim N(0, \sigma_\epsilon)$

L: linear model
I: independent observations
N: points are normally distributed around the line
E: equal variability around the line for all values of the explanatory variable

SLIDE 9

Not equal variance: residuals

fitted value: $\hat{Y}_i = b_0 + b_1 X_i$
residual: $e_i = Y_i - \hat{Y}_i$

nonequal_lm <- augment(
  lm(response ~ explanatory, data = nonequaldata)
)

ggplot(nonequal_lm, aes(x = .fitted, y = .resid)) +
  geom_point() +
  geom_hline(yintercept = 0)
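One possible numeric companion to the residual plot is to compare the spread of the residuals in the lower and upper halves of the fitted values. The sketch below does this on simulated data in which the error standard deviation is assumed to grow with the explanatory variable. Both the data and the half-split comparison are illustrative choices, not something the slides prescribe.

```r
# Hypothetical data with unequal variance, simulated for illustration only
set.seed(7)
n <- 400
explanatory <- runif(n, min = 1, max = 10)

# Assumption: the error standard deviation is proportional to explanatory
response <- 1 + 2 * explanatory + rnorm(n, mean = 0, sd = 0.5 * explanatory)

hetero_fit <- lm(response ~ explanatory)
fitted_vals <- fitted(hetero_fit)
resid_vals <- residuals(hetero_fit)

# Split the observations at the median fitted value and compare residual spread
lower_half <- fitted_vals <= median(fitted_vals)
sd_lower <- sd(resid_vals[lower_half])
sd_upper <- sd(resid_vals[!lower_half])
c(lower = sd_lower, upper = sd_upper)
```

A clearly larger spread in one half matches the fan shape that a residual plot would show when the equal-variance condition fails.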

SLIDE 10

Let's practice!

INFERENCE FOR LINEAR REGRESSION IN R

SLIDE 11

Effect of an outlier

INFERENCE FOR LINEAR REGRESSION IN R

Jo Hardin

Professor, Pomona College

SLIDE 12

SLIDE 13

Different regression lines

SLIDE 14

SLIDE 15

Different regression models

starbucks_lowFib <- starbucks %>%
  filter(Fiber < 15)

lm(Protein ~ Fiber, data = starbucks) %>%
  tidy()
#          term estimate std.error statistic      p.value
# 1 (Intercept) 7.526138 0.9924180  7.583637 1.101756e-11
# 2       Fiber 1.383684 0.2451395  5.644476 1.286752e-07

lm(Protein ~ Fiber, data = starbucks_lowFib) %>%
  tidy()
#          term estimate std.error statistic      p.value
# 1 (Intercept) 6.537053 1.0633640  6.147521 1.292803e-08
# 2       Fiber 1.796844 0.2995901  5.997675 2.600224e-08
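The same comparison can be sketched without the Starbucks data. The example below uses a small hypothetical data set with one point placed far from the rest, then fits the model with and without it. The values and the cutoff of 15 on `x` are assumptions chosen to mirror the filtering step above, and `subset()` stands in for `filter()` so that only base R is needed.

```r
# Hypothetical data: ten points near the line y = 2x, plus one distant point
pts <- data.frame(
  x = c(1:10, 30),
  y = c(2 * (1:10) + rep(c(0.1, -0.1), 5), 10)
)

# Drop the distant point, analogous to filter(Fiber < 15) above
pts_trimmed <- subset(pts, x < 15)

slope_all <- coef(lm(y ~ x, data = pts))[["x"]]
slope_trimmed <- coef(lm(y ~ x, data = pts_trimmed))[["x"]]

c(all_points = slope_all, without_outlier = slope_trimmed)
```

A single point that is far from the others in the explanatory variable can change the estimated slope substantially, which is the pattern the two `tidy()` outputs illustrate on the real data.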

SLIDE 16

Different regression randomization tests

FULL DATA SET

perm_slope %>%
  mutate(abs_perm_slope = abs(stat)) %>%
  summarize(p_value = mean(abs_perm_slope > abs(obs_slope)))
# A tibble: 1 x 1
#   p_value
#     <dbl>
# 1       0

LOW FIBER DATA SET

perm_slope_lowFib %>%
  mutate(abs_perm_slope = abs(stat)) %>%
  summarize(p_value = mean(abs_perm_slope > abs(obs_slope_lowFib)))
# A tibble: 1 x 1
#   p_value
#     <dbl>
# 1       0
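The slide does not show how `perm_slope` and `obs_slope` were built. The sketch below is one way to produce comparable objects in base R, under stated assumptions: the data are simulated, the number of permutations (500) is arbitrary, and shuffling the response is used as the permutation scheme. It is not necessarily the approach used in the course.

```r
# Hypothetical data, simulated for illustration only
set.seed(123)
n <- 50
dat <- data.frame(x = runif(n))
dat$y <- 1 + 2 * dat$x + rnorm(n, sd = 0.5)

# Observed slope from the original pairing of x and y
obs_slope <- coef(lm(y ~ x, data = dat))[["x"]]

# Null distribution: shuffle y to break any link with x, then refit
n_perm <- 500
perm_stats <- replicate(n_perm, {
  shuffled_y <- sample(dat$y)
  coef(lm(shuffled_y ~ dat$x))[[2]]
})

# Two-sided p-value: share of permuted slopes more extreme than the observed one
p_value <- mean(abs(perm_stats) > abs(obs_slope))
p_value
```

A reported p-value of 0 only means that none of the permuted slopes was as extreme as the observed one. With a finite number of permutations, it is better read as "smaller than one over the number of permutations" than as exactly zero.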

SLIDE 17

Let's practice!

INFERENCE FOR LINEAR REGRESSION IN R

SLIDE 18

Moving forward when model assumptions are violated

INFERENCE FOR LINEAR REGRESSION IN R

Jo Hardin

Professor, Pomona College

SLIDE 19

Linear Model

$Y = \beta_0 + \beta_1 \cdot X + \epsilon$, where $\epsilon \sim N(0, \sigma_\epsilon)$

SLIDE 20

Transforming the explanatory variable

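$Y = \beta_0 + \beta_1 \cdot X + \beta_2 \cdot X^2 + \epsilon$, where $\epsilon \sim N(0, \sigma_\epsilon)$
$Y = \beta_0 + \beta_1 \cdot \ln(X) + \epsilon$, where $\epsilon \sim N(0, \sigma_\epsilon)$
$Y = \beta_0 + \beta_1 \cdot \sqrt{X} + \epsilon$, where $\epsilon \sim N(0, \sigma_\epsilon)$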

SLIDE 21

Squaring the explanatory variable

ggplot(data = data_nonlinear, aes(x = explanatory, y = response)) +
  geom_point()

ggplot(data = data_nonlinear, aes(x = explanatory^2, y = response)) +
  geom_point()
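The plots show the transformation visually. To use it in a model, the squared term can be wrapped in `I()` inside the `lm()` formula so that `^` is treated as arithmetic. The sketch below is a hypothetical illustration on simulated data, where `curved` is an assumed stand-in for `data_nonlinear`, and compares R-squared before and after the transformation.

```r
# Hypothetical data with a quadratic relationship, simulated for illustration
set.seed(11)
n <- 200
curved <- data.frame(explanatory = runif(n, min = 0, max = 10))
curved$response <- 1 + 0.5 * curved$explanatory^2 + rnorm(n)

# Straight-line fit on the untransformed explanatory variable
fit_raw <- lm(response ~ explanatory, data = curved)

# I() makes ^ mean "square" instead of formula crossing
fit_squared <- lm(response ~ I(explanatory^2), data = curved)

r2_raw <- summary(fit_raw)$r.squared
r2_squared <- summary(fit_squared)$r.squared
c(raw = r2_raw, squared = r2_squared)
```

A residual plot for `fit_squared` would still be the natural check of whether the linearity condition now looks reasonable.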

SLIDE 22

Transforming the response variable

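$Y^2 = \beta_0 + \beta_1 \cdot X + \epsilon$, where $\epsilon \sim N(0, \sigma_\epsilon)$
$\ln(Y) = \beta_0 + \beta_1 \cdot X + \epsilon$, where $\epsilon \sim N(0, \sigma_\epsilon)$
$\sqrt{Y} = \beta_0 + \beta_1 \cdot X + \epsilon$, where $\epsilon \sim N(0, \sigma_\epsilon)$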

SLIDE 23

A natural log transformation

ggplot(data = data_nonnorm, aes(x = explanatory, y = response)) +
  geom_point()

ggplot(data = data_nonnorm, aes(x = explanatory, y = log(response))) +
  geom_point()
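Fitting the model on the log scale follows the same pattern as the plot: `log()` is applied to the response inside the formula. The sketch below is a hypothetical example on simulated data, where `skewed` is an assumed stand-in for `data_nonnorm` and the coefficients 0.5 and 0.3 are chosen arbitrarily.

```r
# Hypothetical data: the response is linear in explanatory on the log scale
set.seed(5)
n <- 200
skewed <- data.frame(explanatory = runif(n, min = 0, max = 10))
skewed$response <- exp(0.5 + 0.3 * skewed$explanatory + rnorm(n, sd = 0.3))

# In R, log() is the natural log, matching ln(Y) in the model above
log_fit <- lm(log(response) ~ explanatory, data = skewed)
coef(log_fit)
```

The coefficients are on the log scale, so the slope describes the change in ln(Y), not in Y itself, for a one-unit change in the explanatory variable.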

SLIDE 24

Let's practice!

INFERENCE FOR LINEAR REGRESSION IN R