Technical conditions for linear regression

Jo Hardin, Professor, Pomona College. DataCamp course: Inference for Linear Regression in R.



  1. Technical conditions for linear regression. Jo Hardin, Professor, Pomona College.

  2. What are the technical conditions?

     Y = β₀ + β₁ ⋅ X + ε, where ε ~ N(0, σ_ε)

     L: linear model
     I: independent observations
     N: points are normally distributed around the line
     E: equal variability around the line for all values of the explanatory variable
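     The slides do not show how data satisfying these conditions might be generated. As a minimal sketch, not from the course, the parameter values and the contents of lineardata below are assumptions chosen only so that the LINE conditions hold and the later residual-plot code runs:

     # Sketch only: simulate data consistent with Y = beta_0 + beta_1 * X + epsilon,
     # epsilon ~ N(0, sigma). Parameter values and lineardata are assumed, not the
     # course's actual dataset.
     library(tibble)

     set.seed(4747)
     n      <- 100
     beta_0 <- 10   # assumed intercept
     beta_1 <- 2    # assumed slope
     sigma  <- 3    # assumed error standard deviation

     lineardata <- tibble(
       explanatory = runif(n, min = 0, max = 10),
       response    = beta_0 + beta_1 * explanatory + rnorm(n, mean = 0, sd = sigma)
     )

     lm(response ~ explanatory, data = lineardata)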

  3. Linear model: residuals

     library(broom)     # augment()
     library(ggplot2)

     linear_lm <- augment(
       lm(response ~ explanatory, data = lineardata)
     )

     ggplot(linear_lm, aes(x = .fitted, y = .resid)) +
       geom_point() +
       geom_hline(yintercept = 0)

     fitted value: Ŷᵢ = b₀ + b₁ ⋅ Xᵢ
     residual: eᵢ = Yᵢ − Ŷᵢ

  4. Not linear

     Y = β₀ + β₁ ⋅ X + ε, where ε ~ N(0, σ_ε)

     L: linear model; I: independent observations; N: points are normally distributed around the line; E: equal variability around the line for all values of the explanatory variable

  5. Not linear: residuals

     nonlinear_lm <- augment(
       lm(response ~ explanatory, data = nonlineardata)
     )

     ggplot(nonlinear_lm, aes(x = .fitted, y = .resid)) +
       geom_point() +
       geom_hline(yintercept = 0)

     fitted value: Ŷᵢ = b₀ + b₁ ⋅ Xᵢ
     residual: eᵢ = Yᵢ − Ŷᵢ

  6. Not normal

     Y = β₀ + β₁ ⋅ X + ε, where ε ~ N(0, σ_ε)

     L: linear model; I: independent observations; N: points are normally distributed around the line; E: equal variability around the line for all values of the explanatory variable

  7. Not normal: residuals

     nonnormal_lm <- augment(
       lm(response ~ explanatory, data = nonnormaldata)
     )

     ggplot(nonnormal_lm, aes(x = .fitted, y = .resid)) +
       geom_point() +
       geom_hline(yintercept = 0)

     fitted value: Ŷᵢ = b₀ + b₁ ⋅ Xᵢ
     residual: eᵢ = Yᵢ − Ŷᵢ

  8. Not equal variance

     Y = β₀ + β₁ ⋅ X + ε, where ε ~ N(0, σ_ε)

     L: linear model; I: independent observations; N: points are normally distributed around the line; E: equal variability around the line for all values of the explanatory variable

  9. Not equal variance: residuals

     nonequal_lm <- augment(
       lm(response ~ explanatory, data = nonequaldata)
     )

     ggplot(nonequal_lm, aes(x = .fitted, y = .resid)) +
       geom_point() +
       geom_hline(yintercept = 0)

     fitted value: Ŷᵢ = b₀ + b₁ ⋅ Xᵢ
     residual: eᵢ = Yᵢ − Ŷᵢ
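     The course datasets nonlineardata, nonnormaldata, and nonequaldata are not shown on the slides. As a rough sketch with assumed data-generating processes (not the course's data), each kind of violation could be simulated like this and run through the same augment()/ggplot() residual check:

     # Sketch only: hypothetical ways to generate each violation.
     library(tibble)

     set.seed(4747)
     x <- runif(100, min = 0, max = 10)

     nonlineardata <- tibble(explanatory = x,   # curved relationship (L violated)
                             response    = 10 + 2 * x^2 + rnorm(100, sd = 5))

     nonnormaldata <- tibble(explanatory = x,   # skewed errors (N violated)
                             response    = 10 + 2 * x + rexp(100, rate = 1/5))

     nonequaldata  <- tibble(explanatory = x,   # spread grows with x (E violated)
                             response    = 10 + 2 * x + rnorm(100, sd = x))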

  10. Let's practice!

  11. Effect of an outlier. Jo Hardin, Professor, Pomona College.

  12. (Figure slide; no text was transcribed.)

  13. Different regression lines

  14. (Figure slide; no text was transcribed.)

  15. Different regression models

     starbucks_lowFib <- starbucks %>%
       filter(Fiber < 15)

     lm(Protein ~ Fiber, data = starbucks) %>% tidy()
     #          term estimate std.error statistic      p.value
     # 1 (Intercept) 7.526138 0.9924180  7.583637 1.101756e-11
     # 2       Fiber 1.383684 0.2451395  5.644476 1.286752e-07

     lm(Protein ~ Fiber, data = starbucks_lowFib) %>% tidy()
     #          term estimate std.error statistic      p.value
     # 1 (Intercept) 6.537053 1.0633640  6.147521 1.292803e-08
     # 2       Fiber 1.796844 0.2995901  5.997675 2.600224e-08
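     To overlay the two fits on one plot (the "Different regression lines" picture on slide 13), one option not shown in the slides is a pair of geom_smooth() layers; the color choice here is arbitrary:

     library(ggplot2)

     # Full-data fit in the default color, low-fiber (Fiber < 15) fit in red.
     ggplot(starbucks, aes(x = Fiber, y = Protein)) +
       geom_point() +
       geom_smooth(method = "lm", se = FALSE) +
       geom_smooth(data = starbucks_lowFib, method = "lm", se = FALSE, color = "red")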

  16. Different regression randomization tests

     FULL DATA SET:
     perm_slope %>%
       mutate(abs_perm_slope = abs(stat)) %>%
       summarize(p_value = mean(abs_perm_slope > abs(obs_slope)))
     # A tibble: 1 x 1
     #   p_value
     #     <dbl>
     # 1       0

     LOW FIBER DATA SET:
     perm_slope_lowFib %>%
       mutate(abs_perm_slope = abs(stat)) %>%
       summarize(p_value = mean(abs_perm_slope > abs(obs_slope_lowFib)))
     # A tibble: 1 x 1
     #   p_value
     #     <dbl>
     # 1       0
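     The slides do not show how obs_slope and perm_slope were built. A plausible sketch, assuming the infer package is used for the permutations and 1000 replicates:

     library(infer)
     library(dplyr)

     # Observed slope of Protein on Fiber (a single number). Assumed construction.
     obs_slope <- starbucks %>%
       specify(Protein ~ Fiber) %>%
       calculate(stat = "slope") %>%
       pull(stat)

     # Null (permutation) distribution of the slope: shuffle Fiber relative to
     # Protein, refit, and record the slope each time.
     perm_slope <- starbucks %>%
       specify(Protein ~ Fiber) %>%
       hypothesize(null = "independence") %>%
       generate(reps = 1000, type = "permute") %>%
       calculate(stat = "slope")

     The low-fiber versions (obs_slope_lowFib, perm_slope_lowFib) would be built the same way from starbucks_lowFib.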

  17. Let's practice!

  18. Moving forward when model assumptions are violated. Jo Hardin, Professor, Pomona College.

  19. Linear Model

     Y = β₀ + β₁ ⋅ X + ε, where ε ~ N(0, σ_ε)

  20. Transforming the explanatory variable

     Y = β₀ + β₁ ⋅ X + β₂ ⋅ X² + ε, where ε ~ N(0, σ_ε)
     Y = β₀ + β₁ ⋅ ln(X) + ε, where ε ~ N(0, σ_ε)
     Y = β₀ + β₁ ⋅ √X + ε, where ε ~ N(0, σ_ε)
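     In R model syntax, these transformed-explanatory fits might look like the following sketch (data_nonlinear is the dataset used on the next slide; log() and sqrt() assume a positive explanatory variable):

     lm(response ~ explanatory + I(explanatory^2), data = data_nonlinear)  # add a squared term
     lm(response ~ log(explanatory),               data = data_nonlinear)  # natural log of X
     lm(response ~ sqrt(explanatory),              data = data_nonlinear)  # square root of X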

  21. Squaring the explanatory variable

     ggplot(data = data_nonlinear,
            aes(x = explanatory, y = response)) +
       geom_point()

     ggplot(data = data_nonlinear,
            aes(x = explanatory^2, y = response)) +
       geom_point()

  22. Transforming the response variable

     Y² = β₀ + β₁ ⋅ X + ε, where ε ~ N(0, σ_ε)
     ln(Y) = β₀ + β₁ ⋅ X + ε, where ε ~ N(0, σ_ε)
     √Y = β₀ + β₁ ⋅ X + ε, where ε ~ N(0, σ_ε)
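     Correspondingly, the transformed-response models can be fit and then re-checked with the same residual plot. A sketch using data_nonnorm from the next slide (log() and sqrt() assume a positive response):

     library(broom)
     library(ggplot2)

     lm(I(response^2)  ~ explanatory, data = data_nonnorm)
     lm(log(response)  ~ explanatory, data = data_nonnorm)
     lm(sqrt(response) ~ explanatory, data = data_nonnorm)

     # After refitting, re-check the residuals on the transformed scale.
     log_lm <- augment(lm(log(response) ~ explanatory, data = data_nonnorm))
     ggplot(log_lm, aes(x = .fitted, y = .resid)) +
       geom_point() +
       geom_hline(yintercept = 0)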

  23. A natural log transformation

     ggplot(data = data_nonnorm,
            aes(x = explanatory, y = response)) +
       geom_point()

     ggplot(data = data_nonnorm,
            aes(x = explanatory, y = log(response))) +
       geom_point()

  24. Let's practice!
