
Workshop 7: (Generalized) Linear models

Murray Logan

July 19, 2017

Table of contents

1. Linear model Assumptions
2. Multiple (Generalized) Linear Regression
3. Centering data
4. Assumptions
5. Multiple linear models in R
6. Model selection
7. Worked Examples
8. Anova Parameterization
9. Partitioning of variance (ANOVA)
10. Worked Examples

  • 1. Linear model Assumptions

1.1. Assumptions

  • Independence - unbiased sampling, at the scale of the treatments
  • Normality - of the residuals
  • Homogeneity of variance - of the residuals
  • Linearity

1.2. Assumptions

1.2.1. Normality

[Figure: two distributions of y (values 5-30) illustrating the normality assumption for residuals]

1.3. Assumptions

1.3.1. Homogeneity of variance

[Figure: scatterplots of Y against X with matching plots of residuals against predicted values, contrasting homogeneous and heterogeneous variance]

1.4. Assumptions

1.4.1. Linearity Trendline

[Figure: scatterplot (x 5-30, y 10-60) with a linear trendline, and the corresponding residual plot]

1.5. Assumptions

1.5.1. Linearity Loess (lowess) smoother

[Figure: the same data with a loess smoother and the corresponding residual plot]

1.6. Assumptions

1.6.1. Linearity Spline smoother

[Figure: the same data with a spline smoother and the corresponding residual plot]

1.7. Assumptions

yᵢ = β₀ + β₁xᵢ + εᵢ,  εᵢ ∼ N(0, σ²)
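To make the model concrete, a minimal sketch that simulates data from it in R (the parameter values and seed are arbitrary assumptions, not from the slides):

> set.seed(1)                            # assumed seed, for reproducibility
> beta0 <- 2; beta1 <- 1.5; sigma <- 1   # hypothetical parameter values
> x <- 0:6
> y <- beta0 + beta1*x + rnorm(length(x), mean=0, sd=sigma)  # yᵢ = β₀ + β₁xᵢ + εᵢ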


1.9. The linear predictor

y = β₀ + β₁x

   Y  X
 3.0  0
 2.5  1
 6.0  2
 5.5  3
 9.0  4
 8.6  5
12.0  6

 3.0 = β₀ × 1 + β₁ × 0
 2.5 = β₀ × 1 + β₁ × 1
 6.0 = β₀ × 1 + β₁ × 2
 5.5 = β₀ × 1 + β₁ × 3
 9.0 = β₀ × 1 + β₁ × 4
 8.6 = β₀ × 1 + β₁ × 5
12.0 = β₀ × 1 + β₁ × 6

⎡ 3.0⎤   ⎡1 0⎤
⎢ 2.5⎥   ⎢1 1⎥
⎢ 6.0⎥   ⎢1 2⎥
⎢ 5.5⎥ = ⎢1 3⎥ × ⎛β₀⎞
⎢ 9.0⎥   ⎢1 4⎥   ⎝β₁⎠
⎢ 8.6⎥   ⎢1 5⎥
⎣12.0⎦   ⎣1 6⎦

Response values = Model matrix × Parameter vector
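As a check on this matrix formulation, a minimal sketch that solves the normal equations directly (assuming the DATA object used on these slides, with columns Y and X); solve() and crossprod() are base R:

> Xmat <- model.matrix(~X, data=DATA)
> solve(crossprod(Xmat), crossprod(Xmat, DATA$Y))   # (X'X)⁻¹X'y; matches coef(lm(Y~X, data=DATA))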

1.10. Linear models in R

> lm(formula, data= DATAFRAME)

Model             R formula     Description
yᵢ = β₀ + β₁xᵢ    y~1+x, y~x    Full model
yᵢ = β₀           y~1           Null model
yᵢ = β₁xᵢ         y~-1+x        Through origin

> lm(Y~X, data= DATA)

Call:
lm(formula = Y ~ X, data = DATA)

Coefficients:
(Intercept)            X  
      2.136        1.507  

1.11. Linear models in R

> Xmat = model.matrix(~X, data=DATA)
> Xmat

  (Intercept) X
1           1 0
2           1 1
3           1 2
4           1 3
5           1 4
6           1 5
7           1 6
attr(,"assign")
[1] 0 1

> lm.fit(Xmat,DATA$Y)

$coefficients
(Intercept)           X 
   2.135714    1.507143 

$residuals
[1]  0.8642857 -1.1428571  0.8500000 -1.1571429  0.8357143 -1.0714286  0.8214286

$effects
(Intercept)           X 
-17.6131444   7.9750504   0.6507186  -1.5697499   0.2097817  -1.9106868   0.2311552

$rank
[1] 2

$fitted.values
[1]  2.135714  3.642857  5.150000  6.657143  8.164286  9.671429 11.178571

$assign
[1] 0 1

$qr
$qr
  (Intercept)           X
1  -2.6457513 -7.93725393
2   0.3779645  5.29150262
3   0.3779645  0.03347335
4   0.3779645 -0.15550888
5   0.3779645 -0.34449112
6   0.3779645 -0.53347335
7   0.3779645 -0.72245559
attr(,"assign")
[1] 0 1

$qraux
[1] 1.377964 1.222456

$pivot
[1] 1 2

$tol
[1] 1e-07

$rank
[1] 2

attr(,"class")
[1] "qr"

$df.residual
[1] 5

1.12. Example

1.12.1. Linear Model

yᵢ = β₀ + β₁xᵢ + εᵢ,  εᵢ ∼ N(0, σ²)

> DATA.lm<-lm(Y~X, data=DATA)


1.12.2. Generalized Linear Model

g(µᵢ) = β₀ + β₁xᵢ,  yᵢ ∼ N(µᵢ, σ²)

> DATA.glm<-glm(Y~X, data=DATA, family='gaussian')
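With the gaussian family and its default identity link, glm() fits exactly the same model as lm(); a quick check using the two objects above:

> coef(DATA.lm)
> coef(DATA.glm)   # identical estimates; glm arrives at them by iterative weighted least squares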

1.13. Model diagnostics

1.13.1. Residuals

[Figure: scatterplot of y against x with the fitted line; residuals are the vertical distances from each point to the line]


1.15. Model diagnostics

1.15.1. Leverage

[Figure: scatterplot of y against x illustrating leverage - an observation with an extreme x value]


1.16. Model diagnostics

1.16.1. Cook’s D

[Figure: scatterplot of y against x illustrating an influential observation (high Cook's D)]

1.17. Example

1.17.1. Model evaluation

Extractor     Description
residuals()   Extracts residuals from model

> residuals(DATA.lm)

         1          2          3          4          5          6          7 
 0.8642857 -1.1428571  0.8500000 -1.1571429  0.8357143 -1.0714286  0.8214286 

1.18. Example


1.18.1. Model evaluation

Extractor     Description
residuals()   Extracts residuals from model
fitted()      Extracts the predicted values

> fitted(DATA.lm)

        1         2         3         4         5         6         7 
 2.135714  3.642857  5.150000  6.657143  8.164286  9.671429 11.178571 

1.19. Example

1.19.1. Model evaluation

Extractor     Description
residuals()   Extracts residuals from model
fitted()      Extracts the predicted values
plot()        Series of diagnostic plots

> plot(DATA.lm)

[Figure: the four lm diagnostic plots - Residuals vs Fitted, Normal Q-Q, Scale-Location, and Residuals vs Leverage with Cook's distance contours]

1.20. Example

1.20.1. Model evaluation

Extractor              Description
residuals()            Residuals
fitted()               Predicted values
plot()                 Diagnostic plots
influence.measures()   Leverage (hat) and Cook's D

1.21. Example

1.21.1. Model evaluation

> influence.measures(DATA.lm)

Influence measures of
  lm(formula = Y ~ X, data = DATA) :

   dfb.1_     dfb.X  dffit cov.r cook.d   hat inf
1  0.9603 -7.99e-01  0.960  1.82 0.4553 0.464    
2 -0.7650  5.52e-01 -0.780  1.15 0.2756 0.286    
3  0.3165 -1.63e-01  0.365  1.43 0.0720 0.179    
4 -0.2513 -7.39e-17 -0.453  1.07 0.0981 0.143    
5  0.0443  1.60e-01  0.357  1.45 0.0696 0.179    
6  0.1402 -5.06e-01 -0.715  1.26 0.2422 0.286    
7 -0.3466  7.50e-01  0.901  1.91 0.4113 0.464    
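Leverage and Cook's D can also be extracted individually with the base R extractors hatvalues() and cooks.distance():

> hatvalues(DATA.lm)       # leverage (hat) values, as in the hat column above
> cooks.distance(DATA.lm)  # Cook's D; a common rule of thumb flags values near or above 1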

1.22. Example

1.22.1. Model evaluation

Extractor              Description
residuals()            Residuals
fitted()               Predicted values
plot()                 Diagnostic plots
influence.measures()   Leverage, Cook's D
summary()              Summarizes important output from model

1.23. Example

1.23.1. Model evaluation

> summary(DATA.lm)

Call:
lm(formula = Y ~ X, data = DATA)

Residuals:
      1       2       3       4       5       6       7 
 0.8643 -1.1429  0.8500 -1.1571  0.8357 -1.0714  0.8214 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   2.1357     0.7850   2.721 0.041737 *  
X             1.5071     0.2177   6.923 0.000965 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.152 on 5 degrees of freedom
Multiple R-squared:  0.9055,  Adjusted R-squared:  0.8866 
F-statistic: 47.92 on 1 and 5 DF,  p-value: 0.0009648

1.24. Example

1.24.1. Model evaluation

Extractor              Description
residuals()            Residuals
fitted()               Predicted values
plot()                 Diagnostic plots
influence.measures()   Leverage, Cook's D
summary()              Model output
confint()              Confidence intervals of parameters

1.25. Example


1.25.1. Model evaluation

> confint(DATA.lm)

                2.5 %   97.5 %
(Intercept) 0.1178919 4.153537
X           0.9474996 2.066786
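Predictions with confidence intervals follow the same extractor pattern; a minimal sketch (the new X value is arbitrary):

> predict(DATA.lm, newdata=data.frame(X=3.5), interval='confidence')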

1.26. Worked Examples

> fert <- read.csv('../data/fertilizer.csv', strip.white=T)
> fert

   FERTILIZER YIELD
1          25    84
2          50    80
3          75    90
4         100   154
5         125   148
6         150   169
7         175   206
8         200   244
9         225   212
10        250   248

> head(fert)

  FERTILIZER YIELD
1         25    84
2         50    80
3         75    90
4        100   154
5        125   148
6        150   169

> summary(fert)

   FERTILIZER         YIELD      
 Min.   : 25.00   Min.   : 80.0  
 1st Qu.: 81.25   1st Qu.:104.5  
 Median :137.50   Median :161.5  
 Mean   :137.50   Mean   :163.5  
 3rd Qu.:193.75   3rd Qu.:210.5  
 Max.   :250.00   Max.   :248.0  

> str(fert)

'data.frame':   10 obs. of  2 variables:
 $ FERTILIZER: int  25 50 75 100 125 150 175 200 225 250
 $ YIELD     : int  84 80 90 154 148 169 206 244 212 248


1.27. Worked Examples

Question: is there a relationship between fertilizer concentration and grass yield?

Linear model: Yᵢ = β₀ + β₁Fᵢ + εᵢ,  ε ∼ N(0, σ²)

The analysis proceeds through the following steps (a sketch in R follows this list):

  • 1. Tidy data
  • 2. EDA
  • 3. Fit model
  • 4. Evaluate model
  • 5. Summarize model
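A minimal sketch of steps 3-5 for the fert data loaded above (tidying and EDA omitted):

> fert.lm <- lm(YIELD ~ FERTILIZER, data=fert)   # 3. fit the model
> plot(fert.lm)                                  # 4. evaluate: diagnostic plots
> summary(fert.lm)                               # 5. summarize: estimates, R², F-test
> confint(fert.lm)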

1.28. Worked Examples

> polis <- read.csv('../data/polis.csv', strip.white=T)
> head(polis)

      ISLAND RATIO PA
1       Bota 15.41  1
2     Cabeza  5.63  1
3    Cerraja 25.92  1
4 Coronadito 15.17  0
5     Flecha 13.04  1
6   Gemelose 18.85  0

> summary(polis)

      ISLAND       RATIO             PA        
 Angeldlg: 1   Min.   : 0.21   Min.   :0.0000  
 Bahiaan : 1   1st Qu.: 5.86   1st Qu.:0.0000  
 Bahiaas : 1   Median :15.17   Median :1.0000  
 Blanca  : 1   Mean   :18.74   Mean   :0.5263  
 Bota    : 1   3rd Qu.:23.20   3rd Qu.:1.0000  
 Cabeza  : 1   Max.   :63.16   Max.   :1.0000  
 (Other) :13                                   
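PA here is a binary presence/absence response, so a Gaussian linear model is not appropriate; a hedged sketch of the natural GLM for these data (logistic regression: binomial family with its default logit link):

> polis.glm <- glm(PA ~ RATIO, data=polis, family=binomial)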

1.29. Worked Examples

> peake <- read.csv('../data/peakquinn.csv', strip.white=T)
> head(peake)

     AREA INDIV
1  516.00    18
2  469.06    60
3  462.25    57
4  938.60   100
5 1357.15    48
6 1773.66   118

> summary(peake)

      AREA            INDIV        
 Min.   :  462.2   Min.   :  18.0  
 1st Qu.: 1773.7   1st Qu.: 148.0  
 Median : 4451.7   Median : 338.0  
 Mean   : 7802.0   Mean   : 446.9  
 3rd Qu.: 9287.7   3rd Qu.: 632.0  
 Max.   :27144.0   Max.   :1402.0  

1.30. Worked Examples

Question: is there a relationship between mussel clump area and number of individuals?

Linear model:

Indivᵢ = β₀ + β₁Areaᵢ + εᵢ,  ε ∼ N(0, σ²)

ln(Indivᵢ) = β₀ + β₁ln(Areaᵢ) + εᵢ,  ε ∼ N(0, σ²)

1.31. Worked Examples

Question: is there a relationship between mussel clump area and number of individuals?

Linear model:

Indivᵢ ∼ Pois(λᵢ)
log(λᵢ) = β₀ + β₁ln(Areaᵢ)
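A minimal sketch of this Poisson model, assuming the peake data frame loaded above (the poisson family's default link is log):

> peake.glm <- glm(INDIV ~ log(AREA), data=peake, family=poisson)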

  • 2. Multiple (Generalized) Linear Regression

2.1. Multiple Linear Regression

2.1.1. Additive model

growth = intercept + temperature + nitrogen

yᵢ = β₀ + β₁xᵢ₁ + β₂xᵢ₂ + ... + βⱼxᵢⱼ + εᵢ

OR

yᵢ = β₀ + Σⱼ₌₁ⁿ βⱼxⱼᵢ + εᵢ

2.2. Multiple Linear Regression

2.2.1. Additive model

growth = intercept + temperature + nitrogen

yᵢ = β₀ + β₁xᵢ₁ + β₂xᵢ₂ + ... + βⱼxᵢⱼ + εᵢ

  • effect of one predictor holding the other(s) constant

2.3. Multiple Linear Regression

2.3.1. Additive model

growth = intercept + temperature + nitrogen

yᵢ = β₀ + β₁xᵢ₁ + β₂xᵢ₂ + ... + βⱼxᵢⱼ + εᵢ

    Y    X1   X2
  3.0  22.7  0.9
  2.5  23.7  0.5
  6.0  25.7  0.6
  5.5  29.1  0.7
  9.0  22.0  0.8
  8.6  29.0  1.3
 12.0  29.4  1.0

2.4. Multiple Linear Regression

2.4.1. Additive model

 3.0 = β₀ + (β₁ × 22.7) + (β₂ × 0.9) + ε₁
 2.5 = β₀ + (β₁ × 23.7) + (β₂ × 0.5) + ε₂
 6.0 = β₀ + (β₁ × 25.7) + (β₂ × 0.6) + ε₃
 5.5 = β₀ + (β₁ × 29.1) + (β₂ × 0.7) + ε₄
 9.0 = β₀ + (β₁ × 22.0) + (β₂ × 0.8) + ε₅
 8.6 = β₀ + (β₁ × 29.0) + (β₂ × 1.3) + ε₆
12.0 = β₀ + (β₁ × 29.4) + (β₂ × 1.0) + ε₇

2.5. Multiple Linear Regression

2.5.1. Multiplicative model

growth = intercept + temp + nitro + temp × nitro

yᵢ = β₀ + β₁xᵢ₁ + β₂xᵢ₂ + β₃xᵢ₁xᵢ₂ + ... + εᵢ

2.6. Multiple Linear Regression


2.6.1. Multiplicative model

 3.0 = β₀ + (β₁ × 22.7) + (β₂ × 0.9) + (β₃ × 22.7 × 0.9) + ε₁
 2.5 = β₀ + (β₁ × 23.7) + (β₂ × 0.5) + (β₃ × 23.7 × 0.5) + ε₂
 6.0 = β₀ + (β₁ × 25.7) + (β₂ × 0.6) + (β₃ × 25.7 × 0.6) + ε₃
 5.5 = β₀ + (β₁ × 29.1) + (β₂ × 0.7) + (β₃ × 29.1 × 0.7) + ε₄
 9.0 = β₀ + (β₁ × 22.0) + (β₂ × 0.8) + (β₃ × 22.0 × 0.8) + ε₅
 8.6 = β₀ + (β₁ × 29.0) + (β₂ × 1.3) + (β₃ × 29.0 × 1.3) + ε₆
12.0 = β₀ + (β₁ × 29.4) + (β₂ × 1.0) + (β₃ × 29.4 × 1.0) + ε₇

  • 3. Centering data

3.1. Multiple Linear Regression

3.1.1. Centering

[Figure: scatterplot of y (−20 to 20) against uncentered x (10-60)]

3.2. Multiple Linear Regression


3.2.1. Centering

[Figure: x axis with raw values 47-54]



3.4. Multiple Linear Regression

3.4.1. Centering

[Figure: x axis with raw values 47-54 relabelled as centered values −3 to 4]

3.5. Multiple Linear Regression

3.5.1. Centering

[Figure: scatterplot of y (16-24) against the centered predictor cx1 (−4 to 4)]
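Centering is one line in R; a minimal sketch (the cx1/cx2 names match those used on the following slides, which presumably created them the same way):

> data$cx1 <- data$x1 - mean(data$x1)   # centre x1 on its mean
> data$cx2 <- data$x2 - mean(data$x2)
> # equivalently: scale(data$x1, center=TRUE, scale=FALSE)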

  • 4. Assumptions

4.1. Multiple Linear Regression

4.1.1. Assumptions

Normality, homogeneity of variance and linearity - as for simple linear regression

4.2. Multiple Linear Regression

4.2.1. Assumptions (multi)collinearity

4.3. Multiple Linear Regression

4.3.1. Variance inflation

The strength of the relationship between one predictor and the other predictor(s) can be measured by R²; it is considered strong when R² ≥ 0.8.

4.4. Multiple Linear Regression

4.4.1. Variance inflation

var.inf = 1 / (1 − R²)

Predictors are considered collinear when var.inf ≥ 5 (some prefer a stricter cutoff of 3). Note that R² = 0.8 corresponds exactly to var.inf = 1/(1 − 0.8) = 5.
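As a check, the variance inflation factor for cx1 can be computed by hand from the R² of cx1 regressed on the other predictor; a sketch assuming the data object used on these slides:

> r2 <- summary(lm(cx1 ~ cx2, data))$r.squared   # R² between the two predictors
> 1/(1 - r2)                                     # should match vif(lm(y ~ cx1 + cx2, data))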

4.5. Multiple Linear Regression

4.5.1. Assumptions (multi)collinearity

> library(car)
> # additive model - scaled predictors
> vif(lm(y ~ cx1 + cx2, data))

     cx1      cx2 
1.743817 1.743817 

4.6. Multiple Linear Regression

4.6.1. Assumptions (multi)collinearity

> library(car)
> # additive model - scaled predictors
> vif(lm(y ~ cx1 + cx2, data))

     cx1      cx2 
1.743817 1.743817 

> # multiplicative model - raw predictors
> vif(lm(y ~ x1 * x2, data))

       x1        x2     x1:x2 
 7.259729  5.913254 16.949468 

4.7. Multiple Linear Regression

4.7.1. Assumptions

> # multiplicative model - raw predictors
> vif(lm(y ~ x1 * x2, data))

       x1        x2     x1:x2 
 7.259729  5.913254 16.949468 

> # multiplicative model - scaled predictors
> vif(lm(y ~ cx1 * cx2, data))

     cx1      cx2  cx1:cx2 
1.769411 1.771994 1.018694 

  • 5. Multiple linear models in R

5.1. Model fitting

Additive model

yᵢ = β₀ + β₁xᵢ₁ + β₂xᵢ₂ + εᵢ

> data.add.lm <- lm(y~cx1+cx2, data)

5.2. Model fitting

Additive model

yᵢ = β₀ + β₁xᵢ₁ + β₂xᵢ₂ + εᵢ

> data.add.lm <- lm(y~cx1+cx2, data)

Multiplicative model

yᵢ = β₀ + β₁xᵢ₁ + β₂xᵢ₂ + β₃xᵢ₁xᵢ₂ + εᵢ

> data.mult.lm <- lm(y~cx1+cx2+cx1:cx2, data)
> #OR
> data.mult.lm <- lm(y~cx1*cx2, data)

5.3. Model evaluation

Additive model

> plot(data.add.lm)

[Figure: lm diagnostic plots for the additive model - Residuals vs Fitted, Normal Q-Q, Scale-Location, and Residuals vs Leverage with Cook's distance contours]

5.4. Model evaluation

Multiplicative model

> plot(data.mult.lm)

[Figure: lm diagnostic plots for the multiplicative model - Residuals vs Fitted, Normal Q-Q, Scale-Location, and Residuals vs Leverage with Cook's distance contours]

5.5. Model summary

Additive model

> summary(data.add.lm)

Call:
lm(formula = y ~ cx1 + cx2, data = data)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.39418 -0.75888 -0.02463  0.73688  2.37938 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  -1.5161     0.1055 -14.364  < 2e-16 ***
cx1           2.5749     0.4683   5.499  3.1e-07 ***
cx2          -4.0475     0.3734 -10.839  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.055 on 97 degrees of freedom
Multiple R-squared:  0.5567,  Adjusted R-squared:  0.5476 
F-statistic: 60.91 on 2 and 97 DF,  p-value: < 2.2e-16

5.6. Model summary

Additive model

> confint(data.add.lm)

                2.5 %    97.5 %
(Intercept) -1.725529 -1.306576
cx1          1.645477  3.504300
cx2         -4.788628 -3.306308

5.7. Model summary

Multiplicative model

> summary(data.mult.lm)

Call:
lm(formula = y ~ cx1 * cx2, data = data)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.23364 -0.62188  0.01763  0.80912  1.98568 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  -1.6995     0.1228 -13.836  < 2e-16 ***
cx1           2.7232     0.4571   5.957 4.22e-08 ***
cx2          -4.1716     0.3648 -11.435  < 2e-16 ***
cx1:cx2       2.5283     0.9373   2.697  0.00826 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.023 on 96 degrees of freedom
Multiple R-squared:  0.588,  Adjusted R-squared:  0.5751 
F-statistic: 45.66 on 3 and 96 DF,  p-value: < 2.2e-16

5.8. Graphical summaries

Additive model

> library(effects)
> plot(effect("cx1", data.add.lm, partial.residuals=TRUE))

[Figure: cx1 effect plot with partial residuals]

  • 5.9. Graphical summaries

Additive model

> library(effects)
> library(ggplot2)
> e <- effect("cx1", data.add.lm, xlevels=list(
+     cx1=seq(-0.4, 0.4, len=10)), partial.residuals=TRUE)
> newdata <- data.frame(fit=e$fit, cx1=e$x, lower=e$lower,
+     upper=e$upper)
> resids <- data.frame(resid=e$partial.residuals.raw,
+     cx1=e$data$cx1)

Error in data.frame(resid = e$partial.residuals.raw, cx1 = e$data$cx1): arguments imply differing number of rows: 0, 100

(The error arises because the effect object contains no partial.residuals.raw component in the version of the effects package used to build these notes.)

> ggplot(newdata, aes(y=fit, x=cx1)) +
+     geom_point(data=resids, aes(y=resid, x=cx1)) +
+     geom_ribbon(aes(ymin=lower, ymax=upper), fill='blue',
+         alpha=0.2) +
+     geom_line() + theme_classic()

Error in fortify(data): object 'resids' not found


5.10. Graphical summaries

Additive model

> library(effects)
> plot(allEffects(data.add.lm))

[Figure: effect plots for cx1 and cx2 from the additive model]

5.11. Graphical summaries

Multiplicative model

> library(effects)
> plot(allEffects(data.mult.lm))

[Figure: cx1*cx2 effect plot - the cx1 effect panelled at cx2 = −0.5, 0 and 0.5]

5.12. Graphical summaries

Multiplicative model

> library(effects)
> plot(Effect(focal.predictors=c("cx1","cx2"), data.mult.lm))

[Figure: cx1*cx2 effect plot - the cx1 effect panelled at cx2 = −0.5, 0 and 0.5]

  • 6. Model selection

6.1. How good is a model?

“All models are wrong, but some are useful” - George E. P. Box

6.1.1. Criteria

  • R² - no (it can only increase as predictors are added)
  • Information criteria

– AIC, AICc – penalize for complexity

6.2. Model selection

6.2.1. Candidates

> AIC(data.add.lm, data.mult.lm)

             df      AIC
data.add.lm   4 299.5340
data.mult.lm  5 294.2283

> library(MuMIn)
> AICc(data.add.lm, data.mult.lm)

             df     AICc
data.add.lm   4 299.9551
data.mult.lm  5 294.8666

6.3. Model selection

6.3.1. Dredging

> library(MuMIn)
> data.mult.lm <- lm(y~cx1*cx2, data, na.action=na.fail)
> dredge(data.mult.lm, rank="AICc", trace=TRUE)


0 : lm(formula = y ~ 1, data = data, na.action = na.fail)
1 : lm(formula = y ~ cx1 + 1, data = data, na.action = na.fail)
2 : lm(formula = y ~ cx2 + 1, data = data, na.action = na.fail)
3 : lm(formula = y ~ cx1 + cx2 + 1, data = data, na.action = na.fail)
7 : lm(formula = y ~ cx1 + cx2 + cx1:cx2 + 1, data = data, na.action = na.fail)
Global model call: lm(formula = y ~ cx1 * cx2, data = data, na.action = na.fail)
---
Model selection table 
  (Int)     cx1    cx2 cx1:cx2 df   logLik  AICc delta weight
8 -1.699  2.7230 -4.172   2.528  5 -142.114 294.9  0.00  0.927
4 -1.516  2.5750 -4.047          4 -145.767 300.0  5.09  0.073
3 -1.516         -2.706          3 -159.333 324.9 30.05  0.000
1 -1.516                         2 -186.446 377.0 82.15  0.000
2 -1.516 -0.7399                 3 -185.441 377.1 82.27  0.000
Models ranked by AICc(x) 

6.4. Multiple Linear Regression

6.4.1. Model averaging

> library(MuMIn)
> data.dredge <- dredge(data.mult.lm, rank="AICc")
> model.avg(data.dredge, subset=delta<20)

Call:
model.avg(object = data.dredge, subset = delta < 20)

Component models: 
'123' '12'

Coefficients: 
       (Intercept)      cx1       cx2  cx1:cx2
full     -1.686125 2.712397 -4.162525 2.344227
subset   -1.686125 2.712397 -4.162525 2.528328

6.5. Multiple Linear Regression

6.5.1. Model selection

Or, preferably:

  • identify 10-15 candidate models
  • compare these via AIC (etc)
  • 7. Worked Examples

7.1. Worked examples

> loyn <- read.csv('../data/loyn.csv', strip.white=T)
> head(loyn)

  ABUND AREA YR.ISOL DIST LDIST GRAZE ALT
1   5.3  0.1    1968   39    39     2 160
2   2.0  0.5    1920  234   234     5  60
3   1.5  0.5    1900  104   311     5 140
4  17.1  1.0    1966   66    66     3 160
5  13.8  1.0    1918  246   246     5 140
6  14.1  1.0    1965  234   285     3 130

7.2. Worked Examples

Question: what effects do fragmentation variables have on the abundance of forest birds?

Linear model: Abundᵢ = β₀ + Σⱼ₌₁ⁿ βⱼXⱼᵢ + εᵢ,  ε ∼ N(0, σ²)

7.3. Worked Examples

> paruelo <- read.csv('../data/paruelo.csv', strip.white=T)
> head(paruelo)

    C3   LAT   LONG MAP  MAT JJAMAP DJFMAP
1 0.65 46.40 119.55 199 12.4   0.12   0.45
2 0.65 47.32 114.27 469  7.5   0.24   0.29
3 0.76 45.78 110.78 536  7.2   0.24   0.20
4 0.75 43.95 101.87 476  8.2   0.35   0.15
5 0.33 46.90 102.82 484  4.8   0.40   0.14
6 0.03 38.87  99.38 623 12.0   0.40   0.11

7.4. Worked Examples

Question: what effects do geographical variables have on the abundance of C3 grasses?

Linear model: C3ᵢ = β₀ + Σⱼ₌₁ⁿ βⱼXⱼᵢ + εᵢ,  ε ∼ N(0, σ²)


  • 8. Anova Parameterization

8.1. Simple ANOVA

Three treatments (One factor - 3 levels), three replicates

8.2. Simple ANOVA

Two treatments, three replicates

8.3. Categorical predictor

   Y   A   dummy1  dummy2  dummy3
   2  G1        1       0       0
   3  G1        1       0       0
   4  G1        1       0       0
   6  G2        0       1       0
   7  G2        0       1       0
   8  G2        0       1       0
  10  G3        0       0       1
  11  G3        0       0       1
  12  G3        0       0       1

yᵢⱼ = µ + β₁(dummy1)ᵢⱼ + β₂(dummy2)ᵢⱼ + β₃(dummy3)ᵢⱼ + εᵢⱼ


8.4. Overparameterized

yᵢⱼ = µ + β₁(dummy1)ᵢⱼ + β₂(dummy2)ᵢⱼ + β₃(dummy3)ᵢⱼ + εᵢⱼ

   Y   A   Intercept  dummy1  dummy2  dummy3
   2  G1           1       1       0       0
   3  G1           1       1       0       0
   4  G1           1       1       0       0
   6  G2           1       0       1       0
   7  G2           1       0       1       0
   8  G2           1       0       1       0
  10  G3           1       0       0       1
  11  G3           1       0       0       1
  12  G3           1       0       0       1

8.5. Overparameterized

yij = µ + β1(dummy1)ij + β2(dummy2)ij + β3(dummy3)ij + εij

  • three treatment groups
  • four parameters to estimate
  • need to re-parameterize

8.6. Categorical predictor

yᵢ = µ + β₁(dummy1)ᵢ + β₂(dummy2)ᵢ + β₃(dummy3)ᵢ + εᵢ

8.6.1. Means parameterization

yᵢ = β₁(dummy1)ᵢ + β₂(dummy2)ᵢ + β₃(dummy3)ᵢ + εᵢ

yᵢⱼ = αᵢ + εᵢⱼ,  i = 1, ..., p

8.7. Categorical predictor

8.7.1. Means parameterization

yᵢ = β₁(dummy1)ᵢ + β₂(dummy2)ᵢ + β₃(dummy3)ᵢ + εᵢ


   Y   A   dummy1  dummy2  dummy3
   2  G1        1       0       0
   3  G1        1       0       0
   4  G1        1       0       0
   6  G2        0       1       0
   7  G2        0       1       0
   8  G2        0       1       0
  10  G3        0       0       1
  11  G3        0       0       1
  12  G3        0       0       1

8.8. Categorical predictor

8.8.1. Means parameterization

       Y  A
1   2.00 G1
2   3.00 G1
3   4.00 G1
4   6.00 G2
5   7.00 G2
6   8.00 G2
7  10.00 G3
8  11.00 G3
9  12.00 G3

yᵢ = α₁D1ᵢ + α₂D2ᵢ + α₃D3ᵢ + εᵢ

yᵢ = αₚ + εᵢ, where p = number of levels of the factor and D = dummy variables

⎡ 2⎤   ⎡1 0 0⎤          ⎡ε₁⎤
⎢ 3⎥   ⎢1 0 0⎥          ⎢ε₂⎥
⎢ 4⎥   ⎢1 0 0⎥   ⎛α₁⎞   ⎢ε₃⎥
⎢ 6⎥   ⎢0 1 0⎥   ⎜α₂⎟   ⎢ε₄⎥
⎢ 7⎥ = ⎢0 1 0⎥ × ⎝α₃⎠ + ⎢ε₅⎥
⎢ 8⎥   ⎢0 1 0⎥          ⎢ε₆⎥
⎢10⎥   ⎢0 0 1⎥          ⎢ε₇⎥
⎢11⎥   ⎢0 0 1⎥          ⎢ε₈⎥
⎣12⎦   ⎣0 0 1⎦          ⎣ε₉⎦

8.9. Categorical predictor

8.9.1. Means parameterization

Parameter   Estimates         Null Hypothesis
α₁          mean of group 1   H₀: α₁ = 0
α₂          mean of group 2   H₀: α₂ = 0
α₃          mean of group 3   H₀: α₃ = 0

> summary(lm(Y~-1+A))$coef

    Estimate Std. Error   t value     Pr(>|t|)
AG1        3  0.5773503  5.196152 2.022368e-03
AG2        7  0.5773503 12.124356 1.913030e-05
AG3       11  0.5773503 19.052559 1.351732e-06

  • but we are typically more interested in exploring the effects (differences between groups)

8.10. Categorical predictor

yᵢ = µ + β₁(dummy1)ᵢ + β₂(dummy2)ᵢ + β₃(dummy3)ᵢ + εᵢ

8.10.1. Effects parameterization

yᵢⱼ = µ + β₂(dummy2)ᵢⱼ + β₃(dummy3)ᵢⱼ + εᵢⱼ

yᵢⱼ = µ + αᵢ + εᵢⱼ,  i = 1, ..., p − 1

8.11. Categorical predictor

8.11.1. Effects parameterization

yᵢ = α + β₂(dummy2)ᵢ + β₃(dummy3)ᵢ + εᵢ

   Y   A   alpha  dummy2  dummy3
   2  G1       1       0       0
   3  G1       1       0       0
   4  G1       1       0       0
   6  G2       1       1       0
   7  G2       1       1       0
   8  G2       1       1       0
  10  G3       1       0       1
  11  G3       1       0       1
  12  G3       1       0       1

8.12. Categorical predictor

8.12.1. Effects parameterization

       Y  A
1   2.00 G1
2   3.00 G1
3   4.00 G1
4   6.00 G2
5   7.00 G2
6   8.00 G2
7  10.00 G3
8  11.00 G3
9  12.00 G3

yᵢ = µ + β₂D2ᵢ + β₃D3ᵢ + εᵢ, where D = dummy variables (one fewer than the number of factor levels)

⎡ 2⎤   ⎡1 0 0⎤          ⎡ε₁⎤
⎢ 3⎥   ⎢1 0 0⎥          ⎢ε₂⎥
⎢ 4⎥   ⎢1 0 0⎥   ⎛µ ⎞   ⎢ε₃⎥
⎢ 6⎥   ⎢1 1 0⎥   ⎜α₂⎟   ⎢ε₄⎥
⎢ 7⎥ = ⎢1 1 0⎥ × ⎝α₃⎠ + ⎢ε₅⎥
⎢ 8⎥   ⎢1 1 0⎥          ⎢ε₆⎥
⎢10⎥   ⎢1 0 1⎥          ⎢ε₇⎥
⎢11⎥   ⎢1 0 1⎥          ⎢ε₈⎥
⎣12⎦   ⎣1 0 1⎦          ⎣ε₉⎦
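The design matrices for the two parameterizations can be inspected directly with base R's model.matrix() (assuming Y and A as above):

> model.matrix(~-1+A)   # means parameterization: one dummy column per group
> model.matrix(~A)      # effects parameterization: intercept plus p-1 dummies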

8.13. Categorical predictor

8.13.1. Treatment contrasts


Parameter   Estimates                                     Null Hypothesis
Intercept   mean of control group                         H₀: µ = µ₁ = 0
α₂          mean of group 2 minus mean of control group   H₀: α₂ = 0
α₃          mean of group 3 minus mean of control group   H₀: α₃ = 0

> contrasts(A) <- contr.treatment
> contrasts(A)

   2 3
G1 0 0
G2 1 0
G3 0 1

> summary(lm(Y~A))$coef

            Estimate Std. Error  t value     Pr(>|t|)
(Intercept)        3  0.5773503 5.196152 2.022368e-03
A2                 4  0.8164966 4.898979 2.713682e-03
A3                 8  0.8164966 9.797959 6.506149e-05

8.14. Categorical predictor

8.14.1. Treatment contrasts

Parameter   Estimates                                     Null Hypothesis
Intercept   mean of control group                         H₀: µ = µ₁ = 0
α₂          mean of group 2 minus mean of control group   H₀: α₂ = 0
α₃          mean of group 3 minus mean of control group   H₀: α₃ = 0

> summary(lm(Y~A))$coef

            Estimate Std. Error  t value     Pr(>|t|)
(Intercept)        3  0.5773503 5.196152 2.022368e-03
A2                 4  0.8164966 4.898979 2.713682e-03
A3                 8  0.8164966 9.797959 6.506149e-05

8.15. Categorical predictor

8.15.1. User defined contrasts

Contrast                      Grp1   Grp2   Grp3
α₂: Grp2 vs Grp3                 0      1     −1
α₃: Grp1 vs (Grp2 & Grp3)        1   −0.5   −0.5

> contrasts(A) <- cbind(c(0,1,-1), c(1,-0.5,-0.5))
> contrasts(A)

   [,1] [,2]
G1    0  1.0
G2    1 -0.5
G3   -1 -0.5

8.16. Categorical predictor

8.16.1. User defined contrasts

  • p − 1 comparisons (contrasts)
  • all contrasts must be orthogonal

8.17. Categorical predictor

8.17.1. Orthogonality

Four groups (A, B, C, D), p − 1 = 3 comparisons:

  • 1. A vs B :: A > B
  • 2. B vs C :: B > C
  • 3. A vs C :: already implied by 1 and 2 (A > C), so this comparison is not independent

8.18. Categorical predictor

8.18.1. User defined contrasts

> contrasts(A) <- cbind(c(0,1,-1), c(1,-0.5,-0.5))
> contrasts(A)

   [,1] [,2]
G1    0  1.0
G2    1 -0.5
G3   -1 -0.5

Orthogonality check - the cross-products of the two contrast columns sum to zero:

 0 × 1.0    =  0.0
 1 × (−0.5) = −0.5
−1 × (−0.5) =  0.5
        sum =  0.0


8.19. Categorical predictor

8.19.1. User defined contrasts

> contrasts(A) <- cbind(c(0,1,-1), c(1,-0.5,-0.5))
> contrasts(A)

   [,1] [,2]
G1    0  1.0
G2    1 -0.5
G3   -1 -0.5

> crossprod(contrasts(A))

     [,1] [,2]
[1,]    2  0.0
[2,]    0  1.5

> summary(lm(Y~A))$coef

            Estimate Std. Error   t value     Pr(>|t|)
(Intercept)        7  0.3333333 21.000000 7.595904e-07
A1                -2  0.4082483 -4.898979 2.713682e-03
A2                -4  0.4714045 -8.485281 1.465426e-04

8.20. Categorical predictor

8.20.1. User defined contrasts

> contrasts(A) <- cbind(c(1,-0.5,-0.5), c(1,-1,0))
> contrasts(A)

   [,1] [,2]
G1  1.0    1
G2 -0.5   -1
G3 -0.5    0

> crossprod(contrasts(A))

     [,1] [,2]
[1,]  1.5  1.5
[2,]  1.5  2.0

The non-zero off-diagonal entries (1.5) show that this pair of contrasts is not orthogonal.

  • 9. Partitioning of variance (ANOVA)

9.1. ANOVA

9.1.1. Partitioning variance

9.2. ANOVA

9.2.1. Partitioning variance

> anova(lm(Y~A))


Analysis of Variance Table

Response: Y
          Df Sum Sq Mean Sq F value    Pr(>F)    
A          2     96      48      48 0.0002035 ***
Residuals  6      6       1                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

9.3. Categorical predictor

9.3.1. Post-hoc comparisons

With c independent pairwise comparisons each tested at α = 0.05, the familywise Type I error probability is 1 − 0.95ᶜ:

No. of Groups   No. of comparisons   Familywise Type I error probability
3               3                    0.14
5               10                   0.40
10              45                   0.90
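A quick sketch reproducing the table in R (choose() counts the pairwise comparisons among the groups):

> groups <- c(3, 5, 10)
> comparisons <- choose(groups, 2)   # 3, 10, 45
> 1 - (1 - 0.05)^comparisons         # 0.14, 0.40, 0.90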

9.4. Categorical predictor

9.4.1. Post-hoc comparisons Bonferroni

> summary(lm(Y~A))$coef

            Estimate Std. Error   t value     Pr(>|t|)
(Intercept)        7  0.3333333 21.000000 7.595904e-07
A1                -8  0.9428090 -8.485281 1.465426e-04
A2                 4  0.8164966  4.898979 2.713682e-03

> 0.05/3

[1] 0.01666667

9.5. Categorical predictor

9.5.1. Post-hoc comparisons Tukey’s test

> library(multcomp)
> data.lm <- lm(Y~A)
> summary(glht(data.lm, linfct=mcp(A="Tukey")))

         Simultaneous Tests for General Linear Hypotheses

Multiple Comparisons of Means: Tukey Contrasts

Fit: lm(formula = Y ~ A)

Linear Hypotheses:
             Estimate Std. Error t value Pr(>|t|)    
G2 - G1 == 0   4.0000     0.8165   4.899  0.00647 ** 
G3 - G1 == 0   8.0000     0.8165   9.798  < 0.001 ***
G3 - G2 == 0   4.0000     0.8165   4.899  0.00645 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Adjusted p values reported -- single-step method)

9.6. Assumptions

  • Normality
  • Homogeneity of variance
  • Independence
  • As for regression
  • 10. Worked Examples

10.1. Worked Examples

> day <- read.csv('../data/day.csv', strip.white=T)
> head(day)

  TREAT BARNACLE
1  ALG1       27
2  ALG1       19
3  ALG1       18
4  ALG1       23
5  ALG1       25
6  ALG2       24

10.2. Worked Examples

Question: what effects do different substrate types have on barnacle recruitment?

Linear model: Barnacleᵢ = µ + αⱼ + εᵢ,  ε ∼ N(0, σ²)

10.3. Worked Examples

> partridge <- read.csv('../data/partridge.csv', strip.white=T)
> head(partridge)

  GROUP LONGEVITY
1 PREG8        35
2 PREG8        37
3 PREG8        49
4 PREG8        46
5 PREG8        63
6 PREG8        39


> str(partridge)

'data.frame':   125 obs. of  2 variables:
 $ GROUP    : Factor w/ 5 levels "NONE0","PREG1",..: 3 3 3 3 3 3 3 3 3 3 ...
 $ LONGEVITY: int  35 37 49 46 63 39 46 56 63 65 ...

10.4. Worked Examples

Question: what effects does mating have on the longevity of male fruitflies?

Linear model: Longevityᵢ = µ + αⱼ + εᵢ,  ε ∼ N(0, σ²)
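A minimal sketch of fitting and testing this model for the partridge data (column names as shown by head() above):

> partridge.lm <- lm(LONGEVITY ~ GROUP, data=partridge)   # effects parameterization
> anova(partridge.lm)                                     # partition the variance
> summary(partridge.lm)                                   # treatment contrasts against the first level, NONE0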