size estimation statistical models for underreporting
play

Size Estimation - Statistical Models for Underreporting Gerhard - PowerPoint PPT Presentation

Size Estimation - Statistical Models for Underreporting Gerhard Neubauer, Gordana Djura & Herwig Friedl JOANNEUM RESEARCH and Technical University, Graz useR! 2009, Rennes 8.-10. July 2009 p. 1 1 Introduction useR! 2009, Rennes


  1. Size Estimation - Statistical Models for Underreporting Gerhard Neubauer, Gordana Djuraš & Herwig Friedl JOANNEUM RESEARCH and Technical University, Graz useR! 2009, Rennes 8.-10. July 2009 – p. 1

  2. 1 Introduction useR! 2009, Rennes 8.-10. July 2009 – p. 2

  3. Underreporting Any sample of count data may be incomplete ■ criminology: crimes with an aspect of shame (sexuality, domestic violence) or theft of low values goods ■ public health: infectious (HIV) or chronic (diabetes) disease data ■ production: error counts in a production process ■ traffic accidents with minor damage Estimation of total number of cases useR! 2009, Rennes 8.-10. July 2009 – p. 3

  4. Binomial Model Event reported Yes No R 1 − R R ∼ Bernoulli ( π ) useR! 2009, Rennes 8.-10. July 2009 – p. 4

  5. Binomial Model Event reported Yes No R 1 − R R ∼ Bernoulli ( π ) iid sample of size λ ⇒ Y = � i R i ∼ Binomial ( λ, π ) var( Y ) = σ 2 = λπ (1 − π ) E( Y ) = µ = λπ, useR! 2009, Rennes 8.-10. July 2009 – p. 4

  6. Binomial Model Event reported Yes No R 1 − R R ∼ Bernoulli ( π ) iid sample of size λ ⇒ Y = � i R i ∼ Binomial ( λ, π ) var( Y ) = σ 2 = λπ (1 − π ) E( Y ) = µ = λπ, Y the number of reported events π the reporting probability the total number of events - size parameter λ U = λ − Y the number of unreported events useR! 2009, Rennes 8.-10. July 2009 – p. 4

  7. Binomial Model Event reported Yes No R 1 − R R ∼ Bernoulli ( π ) iid sample of size λ ⇒ Y = � i R i ∼ Binomial ( λ, π ) var( Y ) = σ 2 = λπ (1 − π ) E( Y ) = µ = λπ, Both λ and π have to be estimated No longer member of Exponential Family useR! 2009, Rennes 8.-10. July 2009 – p. 5

  8. Estimation For T iid samples Y t Method of Moments For the binomial we have var( Y ) = µ − µ 2 /λ ≤ µ Limitation to data with s 2 < ¯ y For s 2 > ¯ y 1. Regression approach using ML ind Y t ∼ Binomial ( λ t , π ) with λ t = f ( x t , β ) Neubauer & Friedl (2006) 2. Mixed model approaches useR! 2009, Rennes 8.-10. July 2009 – p. 6

  9. 2 Alternative Models useR! 2009, Rennes 8.-10. July 2009 – p. 7

  10. Beta-Binomial Random reporting probability P Y t | P ∼ Binomial ( λ, p ) P ∼ Beta ( γ, δ ) Y t ∼ Beta-Binomial ( λ, γ, δ ) Mean-variance relation µ − µ 2 � � var( Y t ) = φ λ φ = λ + γ + δ 1 + γ + δ ≥ 1 useR! 2009, Rennes 8.-10. July 2009 – p. 8

  11. Poisson Random total number of cases L Y t | L ∼ Binomial ( l, π ) L ∼ Poisson ( λ ) Y t ∼ Poisson ( λπ ) Parameters not identified useR! 2009, Rennes 8.-10. July 2009 – p. 9

  12. Negative Binomial Additional randomness in E( L ) L | K ∼ Poisson ( kλ ) Y t | K ∼ Poisson ( kλπ ) K ∼ Gamma ( ω, ω ) Y t ∼ Negative Binomial ( ω, 1 − π ) ω the number of unreported cases π the reporting probability Mean-variance relation var( Y t ) = µ + µ 2 ω ≥ µ useR! 2009, Rennes 8.-10. July 2009 – p. 10

  13. Beta-Poisson Consider both binomial parameters as random Y | L, P ∼ Binomial ( L, P ) L ∼ Poisson ( λ ) P ∼ Beta ( γ, δ ) Y ∼ Beta-Poisson ( λ, γ, δ ) γ E ( Y ) = λπ = µ where π = γ + δ φ = 1 + λ (1 − π ) var( Y ) = µφ with 1 + γ + δ ≥ 1 useR! 2009, Rennes 8.-10. July 2009 – p. 11

  14. Generalized Poisson Distribution Consul(1989) Moments θ θ E( Y ) = var( Y ) = (1 − τ ) (1 − τ ) 3 ■ τ = 0 : E( Y ) = var( Y ) ⇒ Poisson ( θ ) ■ 0 < τ < 1 : E( Y ) < var( Y ) ⇒ Neg. Binomial ■ τ < 0 : E( Y ) > var( Y ) ⇒ Binomial useR! 2009, Rennes 8.-10. July 2009 – p. 12

  15. Conditional Poisson Models Motivation: π → 1 leads to Y → λ in the binomial approach Assume Y | L ∼ Poisson ( L ) Choose p ( L ) such that E( Y ) = λπ and var( Y ) = λπφ For example: L ∼ Binomial ( λ, π ) 1 < φ = 2 − π < 2 2 < φ = 2 − π L ∼ Negative Binomial ( λ, π ) 1 − π < ∞ useR! 2009, Rennes 8.-10. July 2009 – p. 13

  16. Regression Modelling For all models - except the GP - we use exp( α ) λ t,β = exp( x ′ t β ) and π α = 1 + exp( α ) x t , d -vector of known regressors β , d -vector of unknown parameters For the GP model we use θ t,β = exp( x ′ t β ) τ α = 1 − exp( − α ) and Test α = 0 ⇒ Identify Poisson misdispersion useR! 2009, Rennes 8.-10. July 2009 – p. 14

  17. 3 Implementation useR! 2009, Rennes 8.-10. July 2009 – p. 15

  18. R package sizEst Full ML estimation of all models: done Conditional Poisson in work Testing competing models: in development Testing parameters within models: done Model diagnostics: done Main functions: arrayEst, sizEst Implemented methods: sizEst: plot, predict, residuals, summary summary.arrayEst useR! 2009, Rennes 8.-10. July 2009 – p. 16

  19. 4 Real Data Application useR! 2009, Rennes 8.-10. July 2009 – p. 17

  20. Stroke Data Hypothesis: Slight strokes are not seen in hospitals Data: Hospital discharges Output from function arrayEst() iterations loglik chi.sq gradient p1 GP 0.5 3 -405.256 94.602 0.000 0.4594 NegBin 0.5 3 -405.195 94.411 0.000 0.4604 BetaBin 0.5 19 -402.135 80.779 0.000 0.8230 BetaPois 0.5 30 -408.170 80.811 0.000 0.9097 useR! 2009, Rennes 8.-10. July 2009 – p. 18

  21. Stroke Data Output from function sizEst() Distribution: BetaPois Formula: y ˜ beta01 + T.cos1 + T.sin1 - 1 Estimate Std. Error t value Pr(>|t|) alpha1 2.309 0.289 8.002 0 beta01 5.071 0.025 204.435 0 T.cos1 0.060 0.016 3.673 0 T.sin1 -0.083 0.016 -5.249 0 Theta 11.231 --- --- - Performance measures: measures loglik -408.170 chi.sq 80.811 df.residual 92.000 aic 824.341 bic 834.599 Reporting Probabilities: lower estimate upper alpha1 0.8622 0.9097 0.9571 useR! 2009, Rennes 8.-10. July 2009 – p. 19

  22. Stroke Data Distribution=BetaPois, p0=0.3 Distribution=BetaPois, p0=0.3 200 150 150 Frequency Frequency 100 100 50 50 0 0 0 20 40 60 80 0 20 40 60 80 Mean function from Quasi−Poisson−Model (blue) and BetaPois model (red) Lambda function from BetaPois model (red) Pearson residuals Pearson residuals 1.0 0.4 0.8 0.6 0.3 Density 0.4 ACF 0.2 0.2 0.1 −0.2 0.0 −2 −1 0 1 2 0 5 10 15 Histogram of residuals and N(0,1) density Lag π = 0 . 91 ˆ useR! 2009, Rennes 8.-10. July 2009 – p. 20

  23. Crime Register Data SIMO : Austrian online crime register ■ Time range: 2004 - 2007 ■ weekly counts ■ 132 regions ■ different crime categories In most cases Poisson overdispersion useR! 2009, Rennes 8. -10. July 2009 – p. 21

  24. GP model estimates Shop Lifting Bicycle Theft 120 100 100 80 80 Frequency Frequency 60 60 40 40 20 20 0 0 0 50 100 150 200 0 50 100 150 200 Week Week π = 0 . 69 ˆ π = 0 . 67 ˆ useR! 2009, Rennes 8. -10. July 2009 – p. 22

  25. Beta-Poisson model estimates Shop Lifting Bicycle Theft 120 150 100 80 Frequency Frequency 100 60 40 50 20 0 0 0 50 100 150 200 0 50 100 150 200 Week Week π = 0 . 65 ˆ π = 0 . 52 ˆ useR! 2009, Rennes 8. -10. July 2009 – p. 23

  26. Summary ■ Great variety of models ■ MLE based implementation in R ■ Good performance for simulated data ■ Reasonable estimates for real data useR! 2009, Rennes 8. -10. July 2009 – p. 24

  27. Summary ■ Great variety of models ■ MLE based implementation in R ■ Good performance for simulated data ■ Reasonable estimates for real data Future work ■ Implement Conditional Poisson models ■ Non-nested Testing for more than two models useR! 2009, Rennes 8. -10. July 2009 – p. 24

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend