Workshop 11.1: Generalized linear models

Murray Logan

26-011-2013

Other data types

  • Binary - only 0 or 1 (dead/alive, present/absent)
  • Proportional abundance - ranges from 0 to 100%
  • Count data - minimum of zero

General linear models

Residuali = yi − E(Yi)

Y = β0 + β1x1 + ... + βpxp + ε,   ε ∼ Dist(...)

A general linear model has three components:

  • Random component: Yi ∼ N(µi, σ²)
  • Systematic component: β0 + β1x1 + ... + βpxp
  • Link function (identity): µ = β0 + β1x1 + ... + βpxp

Generalized linear models

g(µ) = β0 + β1x1 + ... + βpxp + e

  • Random component: a nominated distribution (Gaussian, Poisson, Binomial, Gamma, Beta, ...)
  • Systematic component: β0 + β1x1 + ... + βpxp
  • Link function: g(µ)

Generalized linear models

Response variable         Probability distribution   Link function                                   Model name
Continuous measurements   Gaussian                   identity: µ                                     Linear regression
Binary                    Binomial                   logit: log(π/(1−π))                             Logistic regression
                                                     probit: (1/√(2π)) ∫ from −∞ to α+βX of exp(−Z²/2) dZ   Probit regression
                                                     complementary log-log: log(−log(1−π))           Logistic regression
Counts                    Poisson                    log: log(µ)                                     Poisson regression (log-linear model)
                          Negative binomial          log: log(µ/(µ+θ))                               Negative binomial regression
                          Quasi-Poisson              log: log(µ)                                     Poisson regression

GLMs are fitted by maximum likelihood rather than OLS.

Logistic regression

Binary data

Link: log(π/(1−π))

  • Transforms the scale of the linear predictor (−∞, ∞) into that of the response (0, 1)
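The mapping between the two scales is easy to verify numerically. A minimal sketch (Python used here purely for illustration; the input values are arbitrary):

```python
import math

def logit(p):
    """Map a probability in (0, 1) onto the whole real line."""
    return math.log(p / (1 - p))

def inv_logit(eta):
    """Map a linear-predictor value in (-inf, inf) back into (0, 1)."""
    return 1 / (1 + math.exp(-eta))

print(round(inv_logit(0.0), 2))   # → 0.5 (a linear predictor of 0 means 50:50 odds)
print(round(logit(0.5), 2))       # → 0.0
print(round(inv_logit(-5.0), 4))  # large negative predictors approach 0
print(round(inv_logit(5.0), 4))   # large positive predictors approach 1
```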

Logistic regression

Binary data follow a binomial distribution:

P(Y = x) = (n choose x) p^x (1 − p)^(n−x)

Dispersion

Spread is assumed to be equal to the mean (φ = 1).


Dispersion

Over-dispersion: the sample is more varied than expected from its mean

  • variability due to other unmeasured influences

– quasi- model

  • due to more zeros than expected

– zero-inflated model
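A common diagnostic (used in the worked examples below) is the sum of squared Pearson residuals divided by the residual degrees of freedom; values well above 1 suggest over-dispersion. A minimal sketch in Python, assuming an intercept-only Poisson model so the fitted mean is just the sample mean (the counts here are made up for illustration):

```python
import numpy as np

# Hypothetical counts; under a Poisson model the variance should match the mean
y = np.array([0, 1, 1, 2, 2, 3, 4, 8, 9, 12])

mu = y.mean()                       # fitted mean of an intercept-only Poisson model
pearson = (y - mu) / np.sqrt(mu)    # Pearson residuals: (obs - fitted)/sqrt(variance)
df_resid = len(y) - 1               # n minus number of estimated parameters
phi = np.sum(pearson**2) / df_resid

print(round(phi, 2))  # → 3.9, well above 1: over-dispersed
```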

Logistic regression

Binary data

  • Fit model

dat.glmL <- glm(y ~ x, data = dat, family = "binomial")

  • Explore residuals

plot(dat.glmL)

  • Explore goodness of fit

    – Pearson's χ² residuals
    – Deviance (G²)

Logistic regression

Binary data

  • Explore model parameters

summary(dat.glmL)

The slope parameter is on the log odds-ratio scale.

  • Quasi R²

quasi R² = 1 − (deviance / null deviance)
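Plugging in the deviances reported for the polis logistic regression in the worked example below (residual deviance 14.221, null deviance 26.287), the quasi R² is a one-line calculation:

```python
# Quasi R^2 from the deviances of a fitted GLM
# (values taken from the polis logistic regression in the worked example)
residual_deviance = 14.221
null_deviance = 26.287

quasi_r2 = 1 - residual_deviance / null_deviance
print(round(quasi_r2, 3))  # → 0.459, matching the R output below
```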

Logistic regression

Binary data

  • LD50 (the predictor value at which the probability of presence is 0.5)

LD50 = −intercept / slope
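Using the coefficients reported for the polis example below (intercept 3.606, slope −0.220), the LD50 works out as:

```python
# LD50 from logistic-regression coefficients
# (values taken from the polis worked example below)
intercept = 3.606
slope = -0.220

ld50 = -intercept / slope
print(round(ld50, 2))  # → 16.39: presence drops below 50% above this ratio
```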

Worked Example

polis <- read.table("../data/polis.csv", header = T, sep = ",",
    strip.white = T)
head(polis)

      ISLAND RATIO PA
1       Bota 15.41  1
2     Cabeza  5.63  1
3    Cerraja 25.92  1
4 Coronadito 15.17  0
5     Flecha 13.04  1
6   Gemelose 18.85  0

plot(PA ~ RATIO, data = polis)
abline(lm(PA ~ RATIO, data = polis))
with(polis, lines(lowess(PA ~ RATIO)))

polis.glm <- glm(PA ~ RATIO, family = binomial, data = polis)
par(mfrow = c(2, 3))
plot(polis.glm)

## Check the model for lack of fit via:
# Pearson chisq
polis.resid <- sum(resid(polis.glm, type = "pearson")^2)
1 - pchisq(polis.resid, polis.glm$df.resid)
[1] 0.5715
# No evidence of a lack of fit

# Deviance
1 - pchisq(polis.glm$deviance, polis.glm$df.resid)
[1] 0.6514
# No evidence of a lack of fit

# Estimate dispersion
polis.resid/polis.glm$df.resid


[1] 0.9019
# OR
polis.glm$deviance/polis.glm$df.resid
[1] 0.8365

# Number of zeros
polis.tab <- table(polis$PA == 0)
polis.tab/sum(polis.tab)

 FALSE   TRUE
0.5263 0.4737

summary(polis.glm)

Call:
glm(formula = PA ~ RATIO, family = binomial, data = polis)

Deviance Residuals:
   Min     1Q Median     3Q    Max
-1.607 -0.638  0.237  0.433  2.099

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)    3.606      1.695    2.13    0.033 *
RATIO         -0.220      0.101   -2.18    0.029 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 26.287 on 18 degrees of freedom
Residual deviance: 14.221 on 17 degrees of freedom
AIC: 18.22

Number of Fisher Scoring iterations: 6

exp(-0.296)
[1] 0.7438

# R2
1 - (polis.glm$deviance/polis.glm$null)
[1] 0.459

anova(polis.glm, test = "Chisq")

Analysis of Deviance Table

Model: binomial, link: logit
Response: PA
Terms added sequentially (first to last)

      Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL                     18       26.3
RATIO  1     12.1        17       14.2  0.00051 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

par(mar = c(4, 5, 0, 0))
plot(PA ~ RATIO, data = polis, type = "n", ann = F, axes = F)
points(PA ~ RATIO, data = polis, pch = 16)
xs <- seq(min(polis$RATIO), max(polis$RATIO), len = 1000)
ys <- predict(polis.glm, newdata = data.frame(RATIO = xs), type = "link", se = T)
ys$lwr <- polis.glm$family$linkinv(ys$fit - ys$se.fit)
ys$upr <- polis.glm$family$linkinv(ys$fit + ys$se.fit)
ys$fit <- polis.glm$family$linkinv(ys$fit)

lines(ys$fit ~ xs, col = "black")
lines(ys$lwr ~ xs, col = "black", lty = 2)
lines(ys$upr ~ xs, col = "black", lty = 2)
axis(1)
mtext("Island perimeter to area ratio", 1, cex = 1.5, line = 3)
axis(2, las = 2)
mtext(expression(paste(italic(Uta), " lizard presence/absence")), 2,
    cex = 1.5, line = 3)
box(bty = "l")

# LD50
ld50 <- -polis.glm$coef[1]/polis.glm$coef[2]

Poisson regression

P(y) = e^(−λ) λ^y / y!

log(µ) = β0 + β1x1 + ... + βpxp
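The Poisson probability mass function above can be checked numerically. A small sketch (Python used purely for illustration; λ and y are arbitrary):

```python
import math

def poisson_pmf(y, lam):
    """P(Y = y) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam**y / math.factorial(y)

lam = 2.0
print(round(poisson_pmf(3, lam), 4))  # → 0.1804

# the pmf sums to 1 over the support (truncated here at y = 100)
total = sum(poisson_pmf(y, lam) for y in range(100))
print(round(total, 6))  # → 1.0
```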

Dispersion

Spread is assumed to be equal to the mean (φ = 1).

Dispersion

Over-dispersion: the sample is more varied than expected from its mean

  • variability due to other unmeasured influences

– quasi-poisson model

  • clumpiness

– negative binomial model

  • due to more zeros than expected

– zero-inflated model


Worked Example

data.pois <- read.table("../data/data.pois.csv", header = T, sep = ",",
    strip.white = T)
head(data.pois)

  y     x
1 1 1.025
2 2 2.697
3 3 3.626
4 2 4.949
5 4 6.025
6 8 6.254

library(car)
scatterplot(y ~ x, data = data.pois)
scatterplot(y ~ x, data = data.pois, log = "y")
rug(jitter(data.pois$y), side = 2)
hist(data.pois$y)
boxplot(data.pois$y, horizontal = TRUE)
rug(jitter(data.pois$y), side = 1)
# plot(y~x, data.pois, log='y')
# with(data.pois, lines(lowess(y~x)))
# library(car)
# scatterplot(y~x,data=data.pois, log='y')
# rug(jitter(data.pois$y), side=2)

# Zero inflation: proportion of 0's in the data
data.pois.tab <- table(data.pois$y == 0)
data.pois.tab/sum(data.pois.tab)

FALSE
    1


# proportion of 0's expected from a Poisson distribution
mu <- mean(data.pois$y)
cnts <- rpois(1000, mu)
data.pois.tab <- table(cnts == 0)
data.pois.tab/sum(data.pois.tab)

FALSE  TRUE
0.997 0.003

# fit model
data.pois.glm <- glm(y ~ x, data = data.pois, family = "poisson")
data.pois.glm <- glm(y ~ x, data = data.pois, family = poisson(link = "log"))

# Model validation
par(mfrow = c(2, 3))
plot(data.pois.glm, ask = F, which = 1:6)

# goodness of fit
data.pois.resid <- sum(resid(data.pois.glm, type = "pearson")^2)
1 - pchisq(data.pois.resid, data.pois.glm$df.resid)
[1] 0.4897
# Deviance based
1 - pchisq(data.pois.glm$deviance, data.pois.glm$df.resid)
[1] 0.5076

data.pois.glmG <- glm(y ~ x, data = data.pois, family = "gaussian")
AIC(data.pois.glm, data.pois.glmG)

               df   AIC
data.pois.glm   2 90.32
data.pois.glmG  3 98.28

# Poisson deviance
data.pois.glm$deviance


[1] 17.23
# Gaussian deviance
data.pois.glmG$deviance
[1] 118.1

data.pois.resid/data.pois.glm$df.resid
[1] 0.9717
data.pois.glm$deviance/data.pois.glm$df.resid
[1] 0.957

# Or alternatively, via Pearson's residuals
Resid <- resid(data.pois.glm, type = "pearson")
sum(Resid^2)/(nrow(data.pois) - length(coef(data.pois.glm)))
[1] 0.9717

summary(data.pois.glm)

Call:
glm(formula = y ~ x, family = poisson(link = "log"), data = data.pois)

Deviance Residuals:
   Min     1Q Median     3Q    Max
-1.635 -0.705  0.044  0.562  2.046

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   0.5600     0.2539    2.21    0.027 *
x             0.1115     0.0186    6.00    2e-09 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 55.614 on 19 degrees of freedom
Residual deviance: 17.226 on 18 degrees of freedom
AIC: 90.32

Number of Fisher Scoring iterations: 4

library(gmodels)
ci(data.pois.glm)

            Estimate CI lower CI upper Std. Error   p-value
(Intercept)   0.5600  0.02649   1.0935    0.25395 2.744e-02
x             0.1115  0.07247   0.1506    0.01858 1.971e-09

1 - (data.pois.glm$deviance/data.pois.glm$null)
[1] 0.6903

par(mfrow = c(1, 1), mar = c(4, 5, 0, 0))
plot(y ~ x, data = data.pois, type = "n", ann = F, axes = F)
points(y ~ x, data = data.pois, pch = 16)
xs <- seq(min(data.pois$x), max(data.pois$x), len = 1000)
# ys <- predict(data.pois.glm, newdata = data.frame(x = xs),
#     type = 'response', se = T)
ys <- predict(data.pois.glm, newdata = data.frame(x = xs), type = "link", se = T)
ys$lwr <- exp(ys$fit - 2 * ys$se.fit)
ys$upr <- exp(ys$fit + 2 * ys$se.fit)
ys$fit <- exp(ys$fit)
lines(ys$fit ~ xs, col = "black")
lines(ys$lwr ~ xs, col = "black", lty = 2)
lines(ys$upr ~ xs, col = "black", lty = 2)

polygon(c(xs, rev(xs)), c(ys$lwr, rev(ys$upr)), border = NA,
    col = "#0000FF50")
# lines(ys$fit - 2 * ys$se.fit ~ xs, col = 'black', type = 'l', lty = 2)
# lines(ys$fit + 2 * ys$se.fit ~ xs, col = 'black', type = 'l', lty = 2)
axis(1)
mtext("X", 1, cex = 1.5, line = 3)
axis(2, las = 2)
mtext("Abundance of Y", 2, cex = 1.5, line = 3)
box(bty = "l")


Quasi-Poisson

  • specify var(y) = φµ
  • φ (the scale parameter) is estimated
  • parameter estimates are the same as for the Poisson model
  • standard errors become √φ × SE
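The standard-error adjustment in the last bullet is a simple multiplication. A sketch with hypothetical numbers (the SE and φ below are illustrative only, loosely echoing values that appear in the worked examples):

```python
import math

# Hypothetical Poisson-model standard error and estimated scale parameter
se_poisson = 0.0186
phi = 3.699

# Quasi-Poisson inflates the SE by sqrt(phi); point estimates are unchanged
se_quasi = math.sqrt(phi) * se_poisson
print(round(se_quasi, 4))  # → 0.0358
```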

Worked Example

data.qp <- read.table("../data/data.qp.csv", header = T, sep = ",",
    strip.white = T)
head(data.qp)

  y     x
1 0 1.025
2 3 2.697
3 0 3.626
4 5 4.949
5 6 6.025
6 1 6.254

## library(car)
## scatterplot(y~x,data=data.qp)
## scatterplot(y~x,data=data.qp, log='y')
plot(y ~ x, data.qp)
with(data.qp, lines(lowess(y ~ x)))
rug(jitter(data.qp$y), side = 2)
plot(y ~ x, data.qp, log = "y")
with(data.qp, lines(lowess(y ~ x)))
rug(jitter(data.qp$y), side = 2)
hist(data.qp$y)
boxplot(data.qp$y, horizontal = TRUE)
rug(jitter(data.qp$y), side = 1)


# Zero inflation: proportion of 0's in the data
data.qp.tab <- table(data.qp$y == 0)
data.qp.tab/sum(data.qp.tab)

FALSE  TRUE
 0.85  0.15

# proportion of 0's expected from a Poisson distribution
mu <- mean(data.qp$y)
cnts <- rpois(1000, mu)
data.qp.tab <- table(cnts == 0)
data.qp.tab/sum(data.qp.tab)

FALSE  TRUE
0.998 0.002

# fit model
data.qp.glm <- glm(y ~ x, data = data.qp, family = "quasipoisson")

# Model validation
par(mfrow = c(2, 3))
plot(data.qp.glm, ask = F, which = 1:6)

data.qp.glm$deviance/data.qp.glm$df.resid
[1] 3.888

# Or alternatively, via Pearson's residuals
Resid <- resid(data.qp.glm, type = "pearson")
sum(Resid^2)/(nrow(data.qp) - length(coef(data.qp.glm)))
[1] 3.699

summary(data.qp.glm)


Call:
glm(formula = y ~ x, family = "quasipoisson", data = data.qp)

Deviance Residuals:
   Min     1Q Median     3Q    Max
-3.366 -1.736 -0.124  0.744  3.933

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.6997     0.4746    1.47    0.158
x             0.1005     0.0353    2.85    0.011 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for quasipoisson family taken to be 3.699)

    Null deviance: 101.521 on 19 degrees of freedom
Residual deviance:  69.987 on 18 degrees of freedom
AIC: NA

Number of Fisher Scoring iterations: 5

library(gmodels)
ci(data.qp.glm)

            Estimate CI lower CI upper Std. Error p-value
(Intercept)   0.6997 -0.29744   1.6967     0.4746 0.15770
x             0.1005  0.02632   0.1746     0.0353 0.01071

1 - (data.qp.glm$deviance/data.qp.glm$null)
[1] 0.3106

par(mar = c(4, 5, 0, 0))

plot(y ~ x, data = data.qp, type = "n", ann = F, axes = F)
points(y ~ x, data = data.qp, pch = 16)
xs <- seq(0, 20, l = 1000)
ys <- predict(data.qp.glm, newdata = data.frame(x = xs), type = "link", se = T)
ys$lwr <- exp(ys$fit - 2 * ys$se.fit)
ys$upr <- exp(ys$fit + 2 * ys$se.fit)
ys$fit <- exp(ys$fit)
lines(ys$fit ~ xs, col = "black")
lines(ys$lwr ~ xs, col = "black", lty = 2)
lines(ys$upr ~ xs, col = "black", lty = 2)
axis(1)
mtext("X", 1, cex = 1.5, line = 3)
axis(2, las = 2)
mtext("Abundance of Y", 2, cex = 1.5, line = 3)
box(bty = "l")

Negative Binomial

p(yi) = [Γ(yi + ω) / (Γ(ω) yi!)] × [µi^yi ω^ω / (µi + ω)^(yi+ω)]

  • mixture model
  • ηi ∼ Gamma(µi, ω)
  • yi ∼ Poisson(ηi)
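This gamma-Poisson mixture implies a variance of µ + µ²/ω, which is what lets the negative binomial absorb over-dispersion. A simulation sketch (Python for illustration; µ is arbitrary and ω = 2.36 simply echoes the theta in the worked example below):

```python
import math
import random

random.seed(1)

def rpois(lam):
    # Knuth's Poisson sampler (adequate for small lam)
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

mu, omega = 4.0, 2.36   # mean and size; 2.36 matches the example's theta
n = 100_000

# gamma-Poisson mixture: eta ~ Gamma(shape=omega, scale=mu/omega), y ~ Poisson(eta)
ys = [rpois(random.gammavariate(omega, mu / omega)) for _ in range(n)]

mean = sum(ys) / n
var = sum((y - mean) ** 2 for y in ys) / n
print(round(mean, 1), round(var, 1))
# the variance should be close to mu + mu^2/omega = 4 + 16/2.36 ≈ 10.8,
# well above the Poisson variance of 4
```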

Worked Example

data.nb <- read.table("../data/data.nb.csv", header = T, sep = ",",
    strip.white = T)
head(data.nb)

  y     x
1 1 1.042
2 7 2.498
3 2 3.677
4 4 3.798
5 1 4.871
6 9 7.690

scatterplot(y ~ x, data.nb, log = "y")
Error: NA/NaN/Inf in 'y'

plot(y ~ x, data.nb)
with(data.nb, lines(lowess(y ~ x)))

plot(y ~ x, data.nb, log = "y")
with(data.nb, lines(lowess(y ~ x)))
rug(jitter(data.nb$y), side = 2)
hist(data.nb$y, br = 10)


boxplot(data.nb$y, horizontal = TRUE)
rug(jitter(data.nb$y), side = 1)

# Zero inflation: proportion of 0's in the data
data.nb.tab <- table(data.nb$y == 0)
data.nb.tab/sum(data.nb.tab)

FALSE  TRUE
 0.95  0.05

# proportion of 0's expected from a Poisson distribution
mu <- mean(data.nb$y)
cnts <- rpois(1000, mu)
hist(cnts)
tab <- table(cnts == 0)
tab/sum(tab)

FALSE  TRUE
0.996 0.004

# fit model
library(MASS)
data.nb.glm <- glm.nb(y ~ x, data = data.nb)

# Model validation
par(mfrow = c(2, 3))
plot(data.nb.glm, ask = F, which = 1:6)

data.ss <- sum(resid(data.nb1, type = "pearson")^2)
Error: object 'data.nb1' not found

data.ss/data.nb1$df.resid
Error: object 'data.ss' not found

data.nb1 <- glm(y ~ x, data = data.nb, family = "poisson")
plot(data.nb1, ask = F, which = 1:6)

1 - pchisq(data.nb1$deviance, data.nb1$df.resid)
[1] 8.6e-08

# goodness of fit
data.nb.resid <- sum(resid(data.nb.glm, type = "pearson")^2)
1 - pchisq(data.nb.resid, data.nb.glm$df.resid)
[1] 0.5825
# Deviance based
1 - pchisq(data.nb.glm$deviance, data.nb.glm$df.resid)
[1] 0.2407

data.nb.glmG <- glm(y ~ x, data = data.nb, family = "gaussian")
AIC(data.nb.glm, data.nb.glmG)

             df   AIC
data.nb.glm   3 115.8
data.nb.glmG  3 128.2

# Poisson deviance
data.nb.glm$deviance
[1] 21.81
# Gaussian deviance
data.nb.glmG$deviance
[1] 527.6

summary(data.nb.glm)

Call:
glm.nb(formula = y ~ x, data = data.nb, init.theta = 2.359878187,
    link = log)

Deviance Residuals:
   Min     1Q Median     3Q    Max
-2.787 -0.894 -0.317  0.442  1.366

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   0.7454     0.4253    1.75   0.0797 .
x             0.0949     0.0344    2.76   0.0058 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for Negative Binomial(2.36) family taken to be 1)

    Null deviance: 30.443 on 19 degrees of freedom
Residual deviance: 21.806 on 18 degrees of freedom
AIC: 115.8

Number of Fisher Scoring iterations: 1

              Theta:  2.36
          Std. Err.:  1.10
 2 x log-likelihood:  -109.85

library(gmodels)
ci(data.nb.glm)

            Estimate CI lower CI upper Std. Error  p-value
(Intercept)  0.74543 -0.14818   1.6390    0.42534 0.079682
x            0.09494  0.02265   0.1672    0.03441 0.005792

1 - (data.nb.glm$deviance/data.nb.glm$null)
[1] 0.2837

par(mfrow = c(1, 1), mar = c(4, 5, 0, 0))

plot(y ~ x, data = data.nb, type = "n", ann = F, axes = F)
points(y ~ x, data = data.nb, pch = 16)
xs <- seq(0, 20, l = 1000)
ys <- predict(data.nb.glm, newdata = data.frame(x = xs), type = "link", se = T)
ys$lwr <- exp(ys$fit - 2 * ys$se.fit)
ys$upr <- exp(ys$fit + 2 * ys$se.fit)
ys$fit <- exp(ys$fit)
lines(ys$fit ~ xs, col = "black")
lines(ys$lwr ~ xs, col = "black", lty = 2)
lines(ys$upr ~ xs, col = "black", lty = 2)
axis(1)
mtext("X", 1, cex = 1.5, line = 3)
axis(2, las = 2)
mtext("Abundance of Y", 2, cex = 1.5, line = 3)
box(bty = "l")

Zero-inflated model

  • Two sources of zeros

    – those expected from the Poisson (or binomial) distribution
    – false zeros

  • Mixture model

    – binary logistic (rate of false zeros)
    – Poisson (trend in counts)
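Under this mixture, a zero can arise from either process, so P(0) = π + (1 − π)e^(−λ), where π is the false-zero rate and λ the Poisson mean. A quick numeric check (Python for illustration; π and λ are arbitrary):

```python
import math

# Zero-inflated Poisson: zeros come from the false-zero process (prob pi_zero)
# or from the Poisson count process itself (prob exp(-lam))
pi_zero = 0.4   # hypothetical false-zero rate
lam = 2.0       # hypothetical Poisson mean of the count process

p0 = pi_zero + (1 - pi_zero) * math.exp(-lam)
print(round(p0, 3))  # → 0.481, far more zeros than the plain Poisson's 0.135
```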

Worked Example

data.zip <- read.table("../data/data.zip.csv", header = T, sep = ",",
    strip.white = T)
head(data.zip)

  y     x
1 0 1.042
2 3 2.498
3 1 3.677
4 4 3.798
5 5 4.871
6 1 7.690

plot(y ~ x, data.zip)
with(data.zip, lines(lowess(y ~ x)))

plot(y ~ x, data.zip, log = "y")
with(data.zip, lines(lowess(y ~ x)))
rug(jitter(data.pois$y), side = 2)

hist(data.zip$y, br = 20)

boxplot(data.zip$y, horizontal = TRUE)
rug(jitter(data.zip$y), side = 1)

# Zero inflation: proportion of 0's in the data
data.zip.tab <- table(data.zip$y == 0)
data.zip.tab/sum(data.zip.tab)


FALSE  TRUE
 0.55  0.45

# proportion of 0's expected from a Poisson distribution
mu <- mean(data.zip$y)
cnts <- rpois(1000, mu)
data.zip.tab <- table(cnts == 0)
data.zip.tab/sum(data.zip.tab)

FALSE  TRUE
0.935 0.065

summary(glm(y ~ x, data.zip, family = "poisson"))

Call:
glm(formula = y ~ x, family = "poisson", data = data.zip)

Deviance Residuals:
   Min     1Q Median     3Q    Max
-2.541 -2.377 -0.975  1.174  3.638

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   0.6705     0.3302    2.03    0.042 *
x             0.0296     0.0266    1.11    0.266
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 85.469 on 19 degrees of freedom
Residual deviance: 84.209 on 18 degrees of freedom
AIC: 124

Number of Fisher Scoring iterations: 6

plot(glm(y ~ x, data.zip, family = "poisson"))

library(pscl)
data.zip.zip <- zeroinfl(y ~ x | 1, dist = "poisson", data = data.zip)
plot(resid(data.zip.zip) ~ fitted(data.zip.zip))
summary(data.zip.zip)

Call:
zeroinfl(formula = y ~ x | 1, data = data.zip, dist = "poisson")


Pearson residuals:
   Min     1Q Median     3Q   Max
-1.001 -0.956 -0.393  0.966 1.619

Count model coefficients (poisson with log link):
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   0.7047     0.3196    2.21  0.02745 *
x             0.0873     0.0253    3.45  0.00056 ***

Zero-inflation model coefficients (binomial with logit link):
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   -0.229      0.456    -0.5     0.62
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Number of iterations in BFGS optimization: 13
Log-likelihood: -36.2 on 3 Df

data.zip.zip1 <- zeroinfl(y ~ x | x, dist = "poisson", data = data.zip)
plot(resid(data.zip.zip1) ~ fitted(data.zip.zip1))
summary(data.zip.zip1)

Call:
zeroinfl(formula = y ~ x | x, data = data.zip, dist = "poisson")

Pearson residuals:
   Min     1Q Median     3Q   Max
-1.276 -0.781 -0.593  0.908 2.132

Count model coefficients (poisson with log link):
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   0.6111     0.3300    1.85  0.06404 .
x             0.0939     0.0257    3.65  0.00027 ***


Zero-inflation model coefficients (binomial with logit link):
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   -2.731      1.890   -1.44     0.15
x              0.217      0.146    1.49     0.14
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Number of iterations in BFGS optimization: 17
Log-likelihood: -34.4 on 4 Df

AIC(data.zip.zip, data.zip.zip1)

              df   AIC
data.zip.zip   3 78.34
data.zip.zip1  4 76.81

par(mar = c(4, 5, 0, 0))
plot(y ~ x, data = data.zip, type = "n", ann = F, axes = F)
points(y ~ x, data = data.zip, pch = 16)
xs <- seq(0, 20, l = 1000)
# ys <- predict(data.pois.glm, newdata = data.frame(x = xs),
#     type = 'response', se = T)
coefs <- coef(data.zip.zip)[1:2]
mm <- model.matrix(~x, data = data.frame(x = xs))
fit <- coefs %*% t(mm)
se <- sqrt(diag(mm %*% vcov(data.zip.zip)[-3, -3] %*% t(mm)))
lwr <- as.vector(exp(fit - 2 * se))
upr <- as.vector(exp(fit + 2 * se))
fit <- as.vector(exp(fit))
lines(fit ~ xs, col = "black")
lines(lwr ~ xs, col = "black", lty = 2)
lines(upr ~ xs, col = "black", lty = 2)
polygon(c(xs, rev(xs)), c(lwr, rev(upr)), border = NA, col = "#0000FF50")
# lines(ys$fit - 2 * ys$se.fit ~ xs, col = 'black',

#     type = 'l', lty = 2)
# lines(ys$fit + 2 * ys$se.fit ~ xs, col = 'black', type = 'l', lty = 2)
axis(1)
mtext("X", 1, cex = 1.5, line = 3)
axis(2, las = 2)
mtext("Abundance of Y", 2, cex = 1.5, line = 3)
box(bty = "l")

Generalized Linear Mixed effects Models (GLMM)
