workshop 11 1 generalized linear models
play

Workshop 11.1: Generalized linear models Murray Logan 26-011-2013 - PDF document

Workshop 11.1: Generalized linear models Murray Logan 26-011-2013 Other data types Binary - only 0 and 1 (dead/alive) (present/absent) Proportional abundance - range from 0 to 100 Count data - min of zero General linear models


  1. Workshop 11.1: Generalized linear models Murray Logan 26-011-2013 Other data types • Binary - only 0 and 1 (dead/alive) (present/absent) • Proportional abundance - range from 0 to 100 • Count data - min of zero General linear models General linear models Residual i = y i − E ( Y i ) General linear models General linear models E ( Y ) = β 0 + β 1 x 1 + ... + β p x p + ε, ε ∼ Dist ( ... ) � �� � � �� � Link function Systematic General linear models E ( Y ) = β 0 + β 1 x 1 + ... + β p x p + e � �� � � �� � � �� � Random Link function Systematic • Random component. E ( Y i ) ∼ N ( µ i , σ 2 ) • Systematic component β 0 + β 1 x 1 + ... + β p x p 1

  2. • Link function µ = β 0 + β 1 x 1 + ... + β p x p Generalized linear models g ( µ ) = β 0 + β 1 x 1 + ... + β p x p + e � �� � ���� � �� � Random Link function Systematic • Random component. A nominated distribution (Gaussian, Pois- son, Binomial, Gamma, Beta,. . . ) • Systematic component • Link function Generalized linear models Response variable Probability distribution Link function Model name Continuous measurements Gaussian identity: µ Linear regression Binary Binomial � � π logit: log 1 − π Logistic regression � α + β.X � 2 Z 2 � 1 − 1 probit: exp dZ √ 2 π −∞ Probit regression 2

  3. Complimentary log-log: log ( − log (1 − π )) Logistic regression Counts Poisson log: logµ Poisson regressionlog-linear model Negative binomial � � µ log µ + θ Negative biomial regression Quasi-poisson logµ Poisson regression OLS Maximum Likelihood Logistic regression Binary data � � π Link: log 1 − π Transform scale of linear predictor ( −∞ , ∞ ) into that of the response (0,1) Logistic regression � n � p x (1 − p ) n − x E ( Y ) = x Dispersion Spread assumed to be equal to mean. ( φ = 1) 3

  4. Dispersion Over-dispersion Sample more varied than expected from its mean • variability due to other unmeasured influences – quasi- model • due to more zeros than expected – zero-inflated model Logistic regression Binary data • Fit model dat.glmL <- glm (y ~ x, data = dat, family = "binomial") Logistic regression Binary data • Fit model dat.glmL <- glm (y ~ x, data = dat, family = "binomial") • Explore residuals plot (dat.glmL) Logistic regression Binary data • Fit model dat.glmL <- glm (y ~ x, data = dat, family = "binomial") • Explore residuals 4

  5. plot (dat.glmL) • Explore goodness of fit – Pearson’s χ 2 residuals – Deviance ( G 2 ) Logistic regression Binary data • Explore model parameters summary (data.glmL } Slope parameter is on log odds-ratio scale Logistic regression Binary data • Explore model parameters summary (data.glmL } Slope parameter is on log odds-ratio scale • Quasi R 2 � � deviance quasiR 2 = 1 − null deviance Logistic regression Binary data • LD50 LD 50 = − intercept slope 5

  6. Worked Example polis <- read.table ("../data/polis.csv", header = T, sep = ",", strip.white = T) head (polis) ISLAND RATIO PA 1 Bota 15.41 1 2 Cabeza 5.63 1 3 Cerraja 25.92 1 4 Coronadito 15.17 0 5 Flecha 13.04 1 6 Gemelose 18.85 0 plot (PA ~ RATIO, data = polis) abline ( lm (PA ~ RATIO, data = polis)) with (polis, lines ( lowess (PA ~ RATIO))) polis.glm <- glm (PA ~ RATIO, family = binomial, data = polis) par (mfrow = c (2, 3)) plot (polis.glm) ## Check the model for lack of fit via: Pearson ## chisq polis.resid <- sum ( resid (polis.glm, type = "pearson")^2) 1 - pchisq (polis.resid, polis.glm$df.resid) [1] 0.5715 # No evidence of a lack of fit # Deviance 1 - pchisq (polis.glm$deviance, polis.glm$df.resid) [1] 0.6514 # No evidence of a lack of fit # Estimate dispersal polis.resid/polis.glm$df.resid 6

  7. Figure 1: plot of chunk unnamed-chunk-8 7

  8. [1] 0.9019 # OR polis.glm$deviance/polis.glm$df.resid [1] 0.8365 # Number of zeros polis.tab <- table (polis$PA == 0) polis.tab/ sum (polis.tab) FALSE TRUE 0.5263 0.4737 summary (polis.glm) Call: glm(formula = PA ~ RATIO, family = binomial, data = polis) Deviance Residuals: Min 1Q Median 3Q Max -1.607 -0.638 0.237 0.433 2.099 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) 3.606 1.695 2.13 0.033 RATIO -0.220 0.101 -2.18 0.029 (Intercept) * RATIO * --- Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1 (Dispersion parameter for binomial family taken to be 1) Null deviance: 26.287 on 18 degrees of freedom Residual deviance: 14.221 on 17 degrees of freedom AIC: 18.22 Number of Fisher Scoring iterations: 6 8

  9. exp (-0.296) [1] 0.7438 # R2 1 - (polis.glm$deviance/polis.glm$null) [1] 0.459 anova (polis.glm, test = "Chisq") Analysis of Deviance Table Model: binomial, link: logit Response: PA Terms added sequentially (first to last) Df Deviance Resid. Df Resid. Dev Pr(>Chi) NULL 18 26.3 RATIO 1 12.1 17 14.2 0.00051 NULL RATIO *** --- Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1 par (mar = c (4, 5, 0, 0)) plot (PA ~ RATIO, data = polis, type = "n", ann = F, axes = F) points (PA ~ RATIO, data = polis, pch = 16) xs <- seq ( min (polis$RATIO), max (polis$RATIO), len = 1000) ys <- predict (polis.glm, newdata = data.frame (RATIO = xs), type = "link", se = T) ys$lwr <- polis.glm$family$ linkinv (ys$fit - ys$se.fit) ys$upr <- polis.glm$family$ linkinv (ys$fit + ys$se.fit) ys$fit <- polis.glm$family$ linkinv (ys$fit) 9

  10. lines (ys$fit ~ xs, col = "black") lines (ys$lwr ~ xs, col = "black", lty = 2) lines (ys$upr ~ xs, col = "black", lty = 2) axis (1) mtext ("Island perimeter to area ratio", 1, cex = 1.5, line = 3) axis (2, las = 2) mtext ( expression ( paste ( italic (Uta), " lizard presence/absence")), 2, cex = 1.5, line = 3) box (bty = "l") # LD50 ld50 <- -polis.glm$coef[1]/polis.glm$coef[2] Poisson regression p ( Y i ) = e − λ λ x x ! log ( µ ) = β 0 + β 1 x i + ... + β p x p Dispersion Spread assumed to be equal to mean. ( φ = 1) Dispersion Over-dispersion Sample more varied than expected from its mean • variability due to other unmeasured influences – quasi-poisson model • clumpiness – negative binomial model • due to more zeros than expected – zero-inflated model 10

  11. Figure 2: plot of chunk unnamed-chunk-8 11

  12. Figure 3: Figure 4: 12

  13. Worked Example data.pois <- read.table ("../data/data.pois.csv", header = T, sep = ",", strip.white = T) head (data.pois) y x 1 1 1.025 2 2 2.697 3 3 3.626 4 2 4.949 5 4 6.025 6 8 6.254 library (car) scatterplot (y ~ x, data = data.pois) scatterplot (y ~ x, data = data.pois, log = "y") rug ( jitter (data.pois$y), side = 2) hist (data.pois$y) boxplot (data.pois$y, horizontal = TRUE) rug ( jitter (data.pois$y), side = 1) # plot(y~x, data.pois, log=’y’) with(data.pois, # lines(lowess(y~x))) library(car) # scatterplot(y~x,data=data.pois, log=’y’) # rug(jitter(data.pois$y), side=2) # Zero inflation proportion of 0’s in the data data.pois.tab <- table (data.pois$y == 0) data.pois.tab/ sum (data.pois.tab) FALSE 1 13

  14. Figure 5: plot of chunk unnamed-chunk-9 14

  15. Figure 6: plot of chunk unnamed-chunk-9 15

  16. Figure 7: plot of chunk unnamed-chunk-9 16

  17. Figure 8: plot of chunk unnamed-chunk-9 17

  18. # proportion of 0’s expected from a Poisson # distribution mu <- mean (data.pois$y) cnts <- rpois (1000, mu) data.pois.tab <- table (cnts == 0) data.pois.tab/ sum (data.pois.tab) FALSE TRUE 0.997 0.003 # fit model data.pois.glm <- glm (y ~ x, data = data.pois, family = "poisson") data.pois.glm <- glm (y ~ x, data = data.pois, family = poisson (link = "log")) # Model validation par (mfrow = c (2, 3)) plot (data.pois.glm, ask = F, which = 1:6) # goodness of fit data.pois.resid <- sum ( resid (data.pois.glm, type = "pearson")^2) 1 - pchisq (data.pois.resid, data.pois.glm$df.resid) [1] 0.4897 # Deviance based 1 - pchisq (data.pois.glm$deviance, data.pois.glm$df.resid) [1] 0.5076 data.pois.glmG <- glm (y ~ x, data = data.pois, family = "gaussian") AIC (data.pois.glm, data.pois.glmG) df AIC data.pois.glm 2 90.32 data.pois.glmG 3 98.28 # Poisson deviance data.pois.glm$deviance 18

  19. Figure 9: plot of chunk unnamed-chunk-9 19

  20. [1] 17.23 # Gaussian deviance data.pois.glmG$deviance [1] 118.1 data.pois.resid/data.pois.glm$df.resid [1] 0.9717 data.pois.glm$deviance/data.pois.glm$df.resid [1] 0.957 # Or alternatively, via Pearson’s residuals Resid <- resid (data.pois.glm, type = "pearson") sum (Resid^2)/( nrow (data.pois) - length ( coef (data.pois.glm))) [1] 0.9717 summary (data.pois.glm) Call: glm(formula = y ~ x, family = poisson(link = "log"), data = data.pois) Deviance Residuals: Min 1Q Median 3Q Max -1.635 -0.705 0.044 0.562 2.046 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) 0.5600 0.2539 2.21 0.027 x 0.1115 0.0186 6.00 2e-09 (Intercept) * x *** --- Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1 20

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend