workshop 3
play

Workshop 3 Building from Linear Models to Generalised Linear Models - PowerPoint PPT Presentation

1 Workshop 3 Building from Linear Models to Generalised Linear Models Part 2: GLMs 2 2 What are linear models? Stage 1: response continuous - General Linear Model Procedure Response Predictors Single linear regression y ~ x Continuous


  1. 1 Workshop 3 Building from Linear Models to Generalised Linear Models Part 2: GLMs

  2. 2 2 What are linear models? Stage 1: response continuous - General Linear Model Procedure Response Predictors Single linear regression y ~ x Continuous 1 Continuous/discrete Two-sample t-test y ~ x Continuous 1 categorical (2 levels) One-way ANOVA y ~ x Continuous 1 categorical (2 or more levels) Two-way ANOVA y ~ x1*x2 Continuous 2 categorical (2 or more levels each) Stage 2: incl other types of response - Generalised Linear Model

  3. 3 3 Key points T-tests, ANOVA and regression are fundamentally the same Collectively ‘general linear model’ Can be extended to ‘generalised linear model’ for different types of response

  4. 4 What are Generalised linear models? • Response can be continuous, discrete or categorical • Predictors can be continuous or categorical • Type of response determines type of GLM

  5. 5 Two types of GLM Counts (“Poisson”) number of cases, individuals Binary (“Binomial”) true-false, yes-no, proportions These are the two types of GLM we will consider

  6. 6 GLM • We have to specify which one lm(y ~ x) Same as: glm(y ~ x, family = gaussian) glm(y ~ x, family = poisson) glm(y ~ x, family = binomial)

  7. 7 LM – GLM summary • Linear models (lm) – Response assumed normal – Residuals assumed normal and homoscedastic – Residual SS measures fit • GLM – Response can be other than normal – Variance need not be constant Levels of competence Minimum required – Deviance measures fit Above the minimum Advanced

  8. 8 GLM - poisson • Response is a count • Explanatory is continuous, categorical or combination • If explanatory is category then you have a chi-squared

  9. 9 Example: Poisson GLM • Does proximity to a nuclear power plant alter the risk of cancer? • Number of cases of cancer reported by a clinic by its distance from nuclear plant (note not good epidemiology) > str(cases) Each row is a clinic 'data.frame': 43 obs. of 2 variables: $ cancers : int 0 0 4 0 0 0 0 1 0 0 ... $ distance: num 154.37 93.14 3.83 60.83 142.61 ...

  10. 10 Example: Poisson GLM ggplot(cases, aes(x = distance, y = cancers)) + geom_point() Each row is a clinic > str(cases) 'data.frame': 43 obs. of 2 variables: $ cancers : int 0 0 4 0 0 0 0 1 0 0 ... $ distance: num 154.37 93.14 3.83 60.83 142.61 ...

  11. 11 Example: Poisson GLM Selecting and applying mod <- glm(cancers ~ distance, data = cases, family = poisson) summary(mod) Call: glm(formula = cancers ~ distance, family = poisson, data = cases) Deviance Residuals: Min 1Q Median 3Q Max -1.8423 -0.7437 -0.4834 0.4206 1.8933 Estimate is –’ve Coefficients: Estimate Std. Error z value Pr(>|z|) Cancers decrease with (Intercept) 1.019169 0.308709 3.301 0.000962 *** distance -0.021495 0.005034 -4.270 1.96e-05 *** increasing distance... --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 and significantly (Dispersion parameter for poisson family taken to be 1) Estimate is logged! Null deviance: 54.522 on 42 degrees of freedom Residual deviance: 31.790 on 41 degrees of freedom “Estimate is on the scale of the link function” AIC: 78.163 Number of Fisher Scoring iterations: 5

  12. 12 Example: Poisson GLM mod <- glm(cancers ~ distance, data = cases, family = poisson) anova(mod,test=”Chisq”) anova() on the object more useful for models with more than Analysis of Deviance Table one explanatory variable Model: poisson, link: log Response: cancers lm object Terms added sequentially (first to last) anova(mod, test = ”F”) summary(mod) Df Deviance Resid. Df Resid. Dev Pr(>Chi) NULL 42 54.522 distance 1 22.732 41 31.790 1.862e-06 *** --- glm object Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 anova(mod, test = ”Chisq”) summary(mod)

  13. 13 Example: Poisson GLM ggplot(cases, aes(x = distance, y = cancers)) + geom_point()+ stat_smooth(method="glm", method.args = list(family = "poisson"), col="black",se=FALSE) + xlab("Distance (km)") + Need to antilog estimates to understand ylab("Number of Cancer cases per year")+ theme_bw() > exp(coef(mod)) (Intercept) distance 2.770892 0.978734 2.77 cancer cases are expected at 0km For every km from the nuclear plant, a clinic reports 0.97 of the cases reported at previous km

  14. 14 GLM - Binomial • Response is a binary variable 1 – (0 and 1, Y and N, T and F, red and blue) • Explanatory is continuous or combination 1. actually, response can be in several equivalent formats but here we use 0 and 1

  15. 15 Example: Binomial GLM • Do fibrinogen levels affect whether a person’s erythrocyte sedimentation rate (ESR) is classed is healthy or unhealthy • ESR classed as 0 (healthy) or 1 (ill) by fibrinogen levels • aka logistic regression > str(plasma) Each row is an individual 'data.frame': 32 obs. of 2 variables: $ fibrinogen: num 2.52 2.56 2.19 2.18 3.41 2.46 3.22 2.21 3.15 2.6 ... $ ESR : num 0 0 0 0 0 0 0 0 0 0 ...

  16. 16 Example: Binomial GLM ggplot(plasma, aes(x = fibrinogen, y = ESR)) + geom_point()

  17. 17 Example: Binomial GLM mod2 <- glm(ESR ~ fibrinogen,data = plasma, family = binomial) summary(mod2) Call: glm(formula = ESR ~ fibrinogen, family = binomial, data = plasma) Deviance Residuals: Min 1Q Median 3Q Max -0.9298 -0.5399 -0.4382 -0.3356 2.4794 Estimate is +’ve Coefficients: Estimate Std. Error z value Pr(>|z|) Probability of ‘ill’ (1) is (Intercept) -6.8451 2.7703 -2.471 0.0135 * fibrinogen 1.8271 0.9009 2.028 0.0425 * higher as fibrinogen --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 increases (Dispersion parameter for binomial family taken to be 1) Estimate is on logit scale! Null deviance: 30.885 on 31 degrees of freedom Residual deviance: 24.840 on 30 degrees of freedom “Estimate is on the scale of the link function” AIC: 28.84 Number of Fisher Scoring iterations: 5

  18. 18 Example: Binomial GLM Estimates are odds ratios and hard to ggplot(plasma, aes(x = fibrinogen, y = ESR)) + geom_point()+ understand, instead: stat_smooth(method = "glm", Use predict() to understand method.args = list(family = "binomial"), col = "black",se = FALSE) + fibv <- seq(1, 6, 1) xlab("Fibrinogen") + newdat <- data.frame(fibrinogen = fibv) ylab("ESR class")+ newdat$pr <- predict(mod2, type = "response", newdata = newdat) theme_bw() newdat fibrinogen pr 1 1 0.006574281 2 2 0.039509110 3 3 0.203618144 4 4 0.613784514 5 5 0.908072941 6 6 0.983974363

  19. 19 LM – GLM summary • Linear models (lm) – response assumed continuous; Residuals assumed normal and homoscedastic • GLM – Response can be other than normal; variance need not be constant – Specify the family • poisson for counts • binomial for binary outcome mod <- glm( response ~ explanatory1 , data = mydata , family = family ) summary(mod) mod <- glm( response ~ explanatory1 * explanatory2 , data = mydata, family = family ) summary(mod) anova(mod,test = “Chisq”)

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend