Lecture 3: Residual Analysis + Generalized Linear Models
Colin Rundel, 1/23/2017


  1. Lecture 3: Residual Analysis + Generalized Linear Models. Colin Rundel, 1/23/2017

  2. Residual Analysis

  3. [Figure: Atmospheric CO2 (ppm) from Mauna Loa; x-axis: date (1988 to 1996), y-axis: co2]

  4. Where to start? Well, it looks like stuff is going up on average …

  5. Where to start? Well, it looks like stuff is going up on average … [Figure: two panels against date (1988 to 1996); top: co2, bottom: resid]

  6. and then? Well, there is some periodicity, let's add the month …

  7. and then? Well, there is some periodicity, let's add the month … [Figure: two panels against date (1988 to 1996); top: resid, bottom: resid2]

  8. and then and then? Maybe there is some different effect by year …

  9. and then and then? Maybe there is some different effect by year … [Figure: two panels against date (1988 to 1996); top: resid2, bottom: resid3]
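
The three steps above can be sketched in R. This is a minimal sketch rather than the lecture's own code: the slides never show how `co2_df` is built, so the block assumes a stand-in made from R's built-in `co2` series (monthly Mauna Loa readings) restricted to 1985 onward, and it refits a larger `lm` at each step. The slides may instead have modeled the previous step's residuals directly.

```r
# Hypothetical stand-in for the slides' co2_df, built from datasets::co2
co2_ts <- window(co2, start = c(1985, 1))
co2_df <- data.frame(
  co2   = as.numeric(co2_ts),
  date  = as.numeric(time(co2_ts)),           # decimal year
  month = factor(month.abb[cycle(co2_ts)]),   # Jan, Feb, ...
  year  = floor(as.numeric(time(co2_ts)) + 1e-8)
)

# Step 1: linear trend only ("going up on average")
l1 <- lm(co2 ~ date, data = co2_df)
co2_df$resid <- residuals(l1)

# Step 2: add a month effect for the seasonal cycle
l2 <- lm(co2 ~ date + month, data = co2_df)
co2_df$resid2 <- residuals(l2)

# Step 3: add a separate effect for each year
l3 <- lm(co2 ~ date + month + as.factor(year), data = co2_df)
co2_df$resid3 <- residuals(l3)

# The residual spread shrinks at each step
sapply(co2_df[c("resid", "resid2", "resid3")], sd)
```

Because the models are nested, the residual spread cannot grow from one step to the next; whether the extra parameters are worth it is a separate question.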

  10. Too much
      (lm = lm(co2 ~ date + month + as.factor(year), data = co2_df))
      ## Call:
      ## lm(formula = co2 ~ date + month + as.factor(year), data = co2_df)
      The printed fit has 25 coefficients: (Intercept) (-2.645e+03), date (1.508e+00), 11 month levels (monthAug through monthSep), and 12 year levels (as.factor(year)1986 through as.factor(year)1997). One of the estimates is NA.

  11. [Figure: co2 against date, 1988 to 1996]

  12. Generalized Linear Models

  13. Background
      A generalized linear model has three key components:
      1. a probability distribution (from the exponential family) that describes your response variable,
      2. a linear predictor $\eta = X\beta$,
      3. and a link function $g$ such that $g(E(Y|X)) = g(\mu) = \eta$.
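
The three components map onto R's family objects. The block below is a small sketch with a made-up design matrix `X` and coefficient vector `beta` (both hypothetical), using the Poisson family with a log link as the example.

```r
fam <- poisson(link = "log")   # distribution plus link function g

X    <- cbind(1, c(0, 1, 2))   # hypothetical design matrix: intercept + one covariate
beta <- c(0.5, 0.3)            # hypothetical coefficients

eta <- drop(X %*% beta)        # linear predictor: eta = X beta
mu  <- fam$linkinv(eta)        # mean: mu = g^{-1}(eta), here exp(eta)

fam$linkfun(mu)                # g(mu) recovers eta
fam$variance(mu)               # Poisson variance function: Var(Y) = mu
```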

  14. Poisson Regression

  15. Model Specification
      A generalized linear model for count data where we assume the outcome variable follows a Poisson distribution (mean = variance).
      $$Y_i \sim \text{Poisson}(\lambda_i)$$
      $$\log E(Y|X) = \log \lambda = X\beta$$
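
To see the specification in action, one can simulate counts from it and check that `glm` recovers the coefficients. This is a sketch on simulated data; the sample size, covariate range, and "true" `beta` are arbitrary choices made for illustration.

```r
set.seed(1)

n      <- 500
x      <- runif(n, 0, 2)
beta   <- c(1, 0.5)                      # hypothetical true coefficients
lambda <- exp(beta[1] + beta[2] * x)     # log(lambda) = X beta
y      <- rpois(n, lambda)               # Y_i ~ Poisson(lambda_i)

fit <- glm(y ~ x, family = poisson)
coef(fit)                                # estimates should land near beta
```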

  16. Example - AIDS in Belgium [Figure: AIDS cases in Belgium; x-axis: year, y-axis: cases]

  17. Frequentist glm fit
      g = glm(cases~year, data=aids, family=poisson)
      pred = data.frame(year=seq(1981,1993,by=0.1))
      pred$cases = predict(g, newdata=pred, type = "response")
      [Figure: AIDS cases in Belgium with the fitted curve; x-axis: year, y-axis: cases]

  18. Residuals
      Standard residuals: $r_i = Y_i - \hat{Y}_i = Y_i - \hat\lambda_i$
      Pearson residuals: $r_i = \frac{Y_i - E(Y_i|X)}{\sqrt{Var(Y_i|X)}} = \frac{Y_i - \hat\lambda_i}{\sqrt{\hat\lambda_i}}$
      Deviance residuals: $d_i = \text{sign}(y_i - \hat\lambda_i)\sqrt{2\left(y_i \log(y_i/\hat\lambda_i) - (y_i - \hat\lambda_i)\right)}$
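
The three definitions can be checked against `residuals()` on a fitted model. The slides never print the `aids` data frame, so the values below are an assumption: they are the commonly cited Belgian AIDS counts for 1981 to 1993, and they reproduce the mean and variance reported on the Overdispersion slide.

```r
# Assumed data: not shown in the slides
aids <- data.frame(
  year  = 1981:1993,
  cases = c(12, 14, 33, 50, 67, 74, 123, 141, 165, 204, 253, 246, 240)
)
g <- glm(cases ~ year, data = aids, family = poisson)

y          <- aids$cases
lambda_hat <- fitted(g)

standard <- y - lambda_hat
pearson  <- (y - lambda_hat) / sqrt(lambda_hat)
dev_res  <- sign(y - lambda_hat) *
  sqrt(2 * (y * log(y / lambda_hat) - (y - lambda_hat)))

# Each hand-computed version should match R's built-in residual type
all.equal(standard, residuals(g, type = "response"), check.attributes = FALSE)
all.equal(pearson,  residuals(g, type = "pearson"),  check.attributes = FALSE)
all.equal(dev_res,  residuals(g, type = "deviance"), check.attributes = FALSE)
```

The deviance formula as written needs every count to be positive; a zero count would need the convention that `0 * log(0)` is 0.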

  19. Deviance and deviance residuals
      Deviance can be interpreted as the difference between your model's fit and the fit of an ideal model (where $E(\hat{Y}_i) = Y_i$).
      $$D = 2\left(\mathcal{L}(Y|\theta_{best}) - \mathcal{L}(Y|\hat\theta)\right) = \sum_{i=1}^n d_i^2$$
      Deviance is a measure of goodness of fit in a similar way to the residual sum of squares (which is just the sum of squared standard residuals).
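
The identity can be verified numerically by treating the "ideal" model as the saturated Poisson model with $\lambda_i = y_i$. As before, the `aids` values are an assumption (the slides do not print the data), chosen because they match the summary statistics shown later in the deck.

```r
# Assumed data: not shown in the slides
aids <- data.frame(
  year  = 1981:1993,
  cases = c(12, 14, 33, 50, 67, 74, 123, 141, 165, 204, 253, 246, 240)
)
g <- glm(cases ~ year, data = aids, family = poisson)
y <- aids$cases

ll_model <- sum(dpois(y, fitted(g), log = TRUE))  # log-likelihood at theta_hat
ll_best  <- sum(dpois(y, y, log = TRUE))          # saturated model: lambda_i = y_i
D        <- 2 * (ll_best - ll_model)

# All three numbers should agree
c(by_hand = D,
  sum_d2  = sum(residuals(g, type = "deviance")^2),
  builtin = deviance(g))
```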

  20. Deviance residuals derivation
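
The body of this slide did not survive, so the following is a standard derivation for the Poisson case, offered as a sketch rather than as the lecture's own steps. It takes the saturated model ($\lambda_i = y_i$) as the ideal fit and writes $\mathcal{L}$ for the log-likelihood.

```latex
\begin{aligned}
\mathcal{L}(Y|\lambda) &= \sum_{i=1}^n \left( y_i \log \lambda_i - \lambda_i - \log y_i! \right) \\
\mathcal{L}(Y|\theta_{best}) &= \sum_{i=1}^n \left( y_i \log y_i - y_i - \log y_i! \right) \\
D &= 2\left(\mathcal{L}(Y|\theta_{best}) - \mathcal{L}(Y|\hat\theta)\right)
   = \sum_{i=1}^n 2\left( y_i \log(y_i/\hat\lambda_i) - (y_i - \hat\lambda_i) \right)
   = \sum_{i=1}^n d_i^2 \\
d_i &= \text{sign}(y_i - \hat\lambda_i)\sqrt{2\left( y_i \log(y_i/\hat\lambda_i) - (y_i - \hat\lambda_i) \right)}
\end{aligned}
```

The sign is attached so that $d_i$ points in the same direction as the standard residual $y_i - \hat\lambda_i$.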

  21. Residual plots [Figure: three panels (standard, pearson, deviance); x-axis: year, y-axis: residual]

  22. print(aids_fit) [Figure: AIDS cases in Belgium; x-axis: year, y-axis: cases]

  23. Quadratic fit
      g2 = glm(cases~year+I(year^2), data=aids, family=poisson)
      pred2 = data.frame(year=seq(1981,1993,by=0.1))
      pred2$cases = predict(g2, newdata=pred2, type = "response")
      [Figure: AIDS cases in Belgium with the quadratic fit; x-axis: year, y-axis: cases]
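
One way to judge whether the quadratic term earns its place is to compare the two fits on deviance and AIC. This is a sketch, not something the slides show, and the `aids` values are again an assumption (the data frame is not printed in the deck).

```r
# Assumed data: not shown in the slides
aids <- data.frame(
  year  = 1981:1993,
  cases = c(12, 14, 33, 50, 67, 74, 123, 141, 165, 204, 253, 246, 240)
)
g  <- glm(cases ~ year,             data = aids, family = poisson)
g2 <- glm(cases ~ year + I(year^2), data = aids, family = poisson)

c(linear = deviance(g), quadratic = deviance(g2))  # residual deviances
AIC(g, g2)                                         # lower is better
anova(g, g2, test = "Chisq")                       # likelihood-ratio test for the extra term
```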

  24. Quadratic fit - residuals [Figure: three panels (standard, pearson, deviance); x-axis: year, y-axis: residual]

  25. Bayesian Poisson Regression Model
      ## model{
      ##   # Likelihood
      ##   for(i in 1:length(Y)){
      ##     Y[i] ~ dpois(lambda[i])
      ##     log(lambda[i]) <- beta[1] + beta[2]*X[i]
      ##
      ##     # In-sample prediction
      ##     Y_hat[i] ~ dpois(lambda[i])
      ##   }
      ##
      ##   # Prior for beta
      ##   for(j in 1:2){
      ##     beta[j] ~ dnorm(0,1/100)
      ##   }
      ## }

  26. Bayesian Model fit
      library(rjags)
      m = jags.model(
        textConnection(poisson_model1), quiet = TRUE,
        data = list(Y=aids$cases, X=aids$year)
      )
      update(m, n.iter=1000, progress.bar="none")
      samp = coda.samples(
        m, variable.names=c("beta","lambda","Y_hat"),
        n.iter=5000, progress.bar="none"
      )

  27. MCMC Diagnostics [Figure: trace and density plots for beta[1] and beta[2]; iterations 2000 to 7000, N = 5000; the beta[1] density axis runs from about -6 to 0 and the beta[2] density axis from about 0.002 to 0.005]

  28. Model fit? [Figure: AIDS cases in Belgium; x-axis: year, y-axis: cases]

  29. What went wrong?
      summary(g)
      ##
      ## Call:
      ## glm(formula = cases ~ year, family = poisson, data = aids)
      ##
      ## Deviance Residuals:
      ##     Min       1Q   Median       3Q      Max
      ## -4.6784  -1.5013  -0.2636   2.1760   2.7306
      ##
      ## Coefficients:
      ##               Estimate Std. Error z value Pr(>|z|)
      ## (Intercept) -3.971e+02  1.546e+01  -25.68   <2e-16 ***
      ## year         2.021e-01  7.771e-03   26.01   <2e-16 ***
      ## ---
      ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
      ##
      ## (Dispersion parameter for poisson family taken to be 1)
      ##
      ##     Null deviance: 872.206  on 12  degrees of freedom
      ## Residual deviance:  80.686  on 11  degrees of freedom
      ## AIC: 166.37
      ##
      ## Number of Fisher Scoring iterations: 4

  31. summary(glm(cases~I(year-1981), data=aids, family=poisson))
      ##
      ## Call:
      ## glm(formula = cases ~ I(year - 1981), family = poisson, data = aids)
      ##
      ## Deviance Residuals:
      ##     Min       1Q   Median       3Q      Max
      ## -4.6784  -1.5013  -0.2636   2.1760   2.7306
      ##
      ## Coefficients:
      ##                Estimate Std. Error z value Pr(>|z|)
      ## (Intercept)    3.342711   0.070920   47.13   <2e-16 ***
      ## I(year - 1981) 0.202121   0.007771   26.01   <2e-16 ***
      ## ---
      ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
      ##
      ## (Dispersion parameter for poisson family taken to be 1)
      ##
      ##     Null deviance: 872.206  on 12  degrees of freedom
      ## Residual deviance:  80.686  on 11  degrees of freedom
      ## AIC: 166.37
      ##
      ## Number of Fisher Scoring iterations: 4
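
The slides do not spell out the diagnosis, but one plausible reading of the two summaries is that the uncentered intercept sits far outside the prior. JAGS parameterizes `dnorm` by precision, so `dnorm(0, 1/100)` has standard deviation 10. The sketch below compares each intercept to that scale and also looks at how strongly the two estimates are correlated. The `aids` values are an assumption (the data frame is not printed in the deck), and `g_centered` is a name introduced here.

```r
# Assumed data: not shown in the slides
aids <- data.frame(
  year  = 1981:1993,
  cases = c(12, 14, 33, 50, 67, 74, 123, 141, 165, 204, 253, 246, 240)
)
g          <- glm(cases ~ year,           data = aids, family = poisson)
g_centered <- glm(cases ~ I(year - 1981), data = aids, family = poisson)

prior_sd <- sqrt(1 / (1 / 100))   # precision 1/100 corresponds to sd 10

# Intercepts measured in prior standard deviations
coef(g)[1] / prior_sd             # uncentered: dozens of prior sds from 0
coef(g_centered)[1] / prior_sd    # centered: well inside the prior

# Correlation between the intercept and slope estimates
cov2cor(vcov(g))[1, 2]            # uncentered: very close to -1
cov2cor(vcov(g_centered))[1, 2]
```

Centering leaves the slope and the fit unchanged; it only moves the intercept to a value the prior can reach and weakens the dependence between the two parameters.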

  32. Revising the model
      ## model{
      ##   # Likelihood
      ##   for(i in 1:length(Y)){
      ##     Y[i] ~ dpois(lambda[i])
      ##     log(lambda[i]) <- beta[1] + beta[2]*(X[i] - 1981)
      ##
      ##     # In-sample prediction
      ##     Y_hat[i] ~ dpois(lambda[i])
      ##   }
      ##
      ##   # Prior for beta
      ##   for(j in 1:2){
      ##     beta[j] ~ dnorm(0,1/100)
      ##   }
      ## }

  33. MCMC Diagnostics [Figure: trace and density plots for beta[1] and beta[2]; iterations 2000 to 7000, N = 5000; the beta[1] density axis runs from about 3.1 to 3.6 and the beta[2] density axis from about 0.18 to 0.23]

  34. Model fit [Figure: AIDS cases in Belgium (Linear Poisson Model); x-axis: year, y-axis: cases]

  35. Bayesian Residual Plots [Figure: three panels (standard, pearson, deviance); x-axis: year, y-axis: post_mean]

  36. Model fit [Figure: AIDS cases in Belgium (Quadratic Poisson Model); x-axis: year, y-axis: cases]

  37. Bayesian Residual Plots [Figure: three panels (standard, pearson, deviance); x-axis: year, y-axis: post_mean]

  38. Negative Binomial Regression

  39. Overdispersion
      If we are constructing a model where we claim that our response variable Y follows a Poisson distribution then we are making a very strong assumption which has implications for both inference and prediction.
      One of the properties of the Poisson distribution is that if $X \sim \text{Pois}(\lambda)$ then $E(X) = Var(X) = \lambda$.
      mean(aids$cases)
      ## [1] 124.7692
      var(aids$cases)
      ## [1] 8124.526
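
The raw mean and variance mix the time trend with any extra dispersion, so a complementary check is the Pearson dispersion estimate from the fitted model, which should be near 1 if the Poisson assumption holds. This is a sketch that goes slightly beyond the slide; the `aids` values are an assumption (the data frame is not printed in the deck) that reproduces the mean and variance shown above, and `phi_hat` is a name introduced here.

```r
# Assumed data: not shown in the slides
aids <- data.frame(
  year  = 1981:1993,
  cases = c(12, 14, 33, 50, 67, 74, 123, 141, 165, 204, 253, 246, 240)
)

# Marginal comparison, as on the slide
c(mean = mean(aids$cases), var = var(aids$cases))

# Conditional comparison: Pearson chi-square over residual degrees of freedom
g       <- glm(cases ~ year, data = aids, family = poisson)
phi_hat <- sum(residuals(g, type = "pearson")^2) / df.residual(g)
phi_hat   # values well above 1 suggest overdispersion relative to the Poisson model
```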
