
Multilevel Models for Estimating the Number of Deaths in Armed Conflict (in Colombia)

Multilevel Models for Estimating the Number of Deaths in Armed Conflict (in Colombia) Shira Mitchell JSM 2014 Collaborators: Brent Coull, Alan Zaslavsky, Al Ozonoff, Kristian Lum, Patrick Ball, Megan Price August 5, 2014 Shira Mitchell


  1. Multilevel Models for Estimating the Number of Deaths in Armed Conflict (in Colombia)
     Shira Mitchell, JSM 2014
     Collaborators: Brent Coull, Alan Zaslavsky, Al Ozonoff, Kristian Lum, Patrick Ball, Megan Price
     Shira Mitchell (Columbia) August 5, 2014 1 / 27


  4. Data from the Human Rights Data Analysis Group (HRDAG)
     NGOs and government groups provide lists of killings in 1998-2007 in Casanare, Colombia.
     Casanare: department of Colombia, population 300,000, BP oil pipeline, much corruption and violence.
     Goal: Estimate the number of killings in Casanare in the years 1998-2007.


  7. $$n_{k_1 k_2} \sim \text{Pois}(\mu_{k_1 k_2}) \quad \text{independent}$$

  8. $$\log(\mu_{k_1 k_2}) = \lambda_0 + \lambda_1 k_1 + \lambda_2 k_2$$
     $$\Rightarrow \quad \widehat{E[N]}_{\text{MLE}} = \frac{n_{1+}\, n_{+1}}{n_{11}}$$
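The two-list estimate on the slide above can be sketched in a few lines. This is only an illustration: the function name and the counts are hypothetical and are not taken from the Casanare data.

```python
def two_list_estimate(n11: int, n10: int, n01: int) -> float:
    """Two-list estimate of the total under the independence model.

    n11: recorded on both lists
    n10: recorded on list 1 only
    n01: recorded on list 2 only
    """
    if n11 <= 0:
        raise ValueError("the estimate is undefined when the lists do not overlap")
    n1_plus = n11 + n10   # margin n_{1+}: everyone on list 1
    n_plus1 = n11 + n01   # margin n_{+1}: everyone on list 2
    return n1_plus * n_plus1 / n11


# Hypothetical counts, for illustration only.
estimate = two_list_estimate(n11=20, n10=80, n01=40)
print(estimate)  # 100 * 60 / 20 = 300.0
```

With these counts, 140 people are observed and the model attributes the remaining 160 to the unobserved cell.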

  9. General number of lists
     $$k = (k_1, k_2, \ldots, k_J)$$
     For example, a person in lists 3, 4, and 6 has $k = (0, 0, 1, 1, 0, 1)$.
     Independence model:
     $$\log(\mu_k) = \lambda_0 + \lambda_1 k_1 + \cdots + \lambda_J k_J$$
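As a small sketch of the notation on the slide above, the following encodes a capture pattern and evaluates the independence-model linear predictor. The function names and all parameter values are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def capture_pattern(lists_present, num_lists):
    """Return k = (k_1, ..., k_J), with k_j = 1 if the person is on list j (1-indexed)."""
    present = set(lists_present)
    if any(j < 1 or j > num_lists for j in present):
        raise ValueError("list indices must lie between 1 and num_lists")
    return tuple(1 if j in present else 0 for j in range(1, num_lists + 1))


def log_mu_independence(k, lam0, lam):
    """Independence model: log(mu_k) = lam0 + sum_j lam_j * k_j."""
    return lam0 + sum(lam_j * k_j for lam_j, k_j in zip(lam, k))


k = capture_pattern([3, 4, 6], num_lists=6)
print(k)  # (0, 0, 1, 1, 0, 1)

# Hypothetical coefficients, for illustration only.
lam = [-1.0, -0.5, 0.5, 0.25, -2.0, 1.0]
print(log_mu_independence(k, lam0=2.0, lam=lam))  # 2.0 + 0.5 + 0.25 + 1.0 = 3.75
```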


  11. Data - from HRDAG
      Matching:

      Commission of Jurists
      | Year | Gender | Location |
      |------|--------|----------|
      | ...  | ...    | ...      |
      | 1998 | male   | TAMARA   |
      | ...  | ...    | ...      |

      National Police
      | Year | Perpetrator | Gender |
      |------|-------------|--------|
      | ...  | ...         | ...    |
      | 1998 | FARC        | male   |

      6 lists contain among them 2619 observed killings.

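  12. Data - from HRDAG

      | Year | IMLM (govt) | PN0 (govt) | VP (govt) | CCJ (NGO) | CIN (NGO) | CCE (NGO) |
      |------|-------------|------------|-----------|-----------|-----------|-----------|
      | 1998 | 1           | 0          | 0         | 14        | 13        | 3         |
      | 1999 | 2           | 0          | 0         | 6         | 8         | 2         |
      | 2000 | 213         | 0          | 5         | 22        | 23        | 0         |
      | 2001 | 262         | 0          | 2         | 21        | 12        | 0         |
      | 2002 | 268         | 1          | 0         | 33        | 9         | 0         |
      | 2003 | 348         | 274        | 2         | 12        | 11        | 0         |
      | 2004 | 412         | 324        | 295       | 14        | 11        | 1         |
      | 2005 | 210         | 155        | 138       | 8         | 13        | 16        |
      | 2006 | 104         | 71         | 26        | 3         | 2         | 15        |
      | 2007 | 54          | 0          | 33        | 27        | 36        | 35        |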

  13. We’re far from the independence model
      - Heterogeneity of a person’s recordability
      - Groups collecting data interact
      - Want yearly estimates, but very little data exist in some years
      - Groups operating in different but overlapping time periods

  14. Heterogeneity of a person’s recordability
      $$P_j(\theta) = P(\text{person with recordability } \theta \text{ is recorded on list } j)$$
      $$\log\left(\frac{P_j(\theta_{\text{govt}})}{1 - P_j(\theta_{\text{govt}})}\right) = \theta_{\text{govt}} + \lambda_j \quad \text{for } j \in \text{govts}$$
      $$\log\left(\frac{P_j(\theta_{\text{NGO}})}{1 - P_j(\theta_{\text{NGO}})}\right) = \theta_{\text{NGO}} + \lambda_j \quad \text{for } j \in \text{NGOs}$$
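A minimal sketch of why this kind of heterogeneity breaks the independence model: if recordability varies across people, two lists of the same type become positively dependent once recordability is averaged out. The two-point distribution for theta, the list effects, and the assumption that lists are independent given theta are all illustrative choices, not values from the talk.

```python
import math


def recording_probability(theta, lam_j):
    """P_j(theta) under logit(P_j(theta)) = theta + lam_j."""
    return 1.0 / (1.0 + math.exp(-(theta + lam_j)))


# Hypothetical population: half the people are hard to record, half are easy.
theta_distribution = [(-2.0, 0.5), (2.0, 0.5)]   # (theta, probability)
lam_1, lam_2 = 0.0, 0.0                          # hypothetical list effects

# Marginal probability of appearing on each list.
p1 = sum(w * recording_probability(t, lam_1) for t, w in theta_distribution)
p2 = sum(w * recording_probability(t, lam_2) for t, w in theta_distribution)

# Probability of appearing on both, assuming independence given theta.
p_both = sum(
    w * recording_probability(t, lam_1) * recording_probability(t, lam_2)
    for t, w in theta_distribution
)

# Under list independence p_both would equal p1 * p2; heterogeneity makes it larger.
print(p_both > p1 * p2)  # True
```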

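  15. Heterogeneity of a person’s recordability
      Let $(\theta_{\text{NGO}}, \theta_{\text{govt}}) \sim p(\theta_{\text{NGO}}, \theta_{\text{govt}})$. Then
      $$\log(\mu_k) = \lambda_0 + \lambda_1 k_1 + \cdots + \lambda_6 k_6 + \gamma\left(k_+^{\text{NGO}}, k_+^{\text{govt}}\right)$$
      where $k_+^{\text{NGO}}$ and $k_+^{\text{govt}}$ are the numbers of NGO and government lists on which the person appears.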

  16. $$\log(\mu_k) = \lambda_0 + \lambda_1 k_1 + \cdots + \lambda_6 k_6 + \omega_{\text{NGO}} \sum_{j, j' \in \text{NGOs}} k_j k_{j'} + \omega_{\text{govt}} \sum_{j, j' \in \text{govts}} k_j k_{j'} + \omega_{\text{mix}} \sum_{j \in \text{NGOs},\, j' \in \text{govts}} k_j k_{j'}$$
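The grouped interaction terms in the model above can be evaluated as in the sketch below. Two things are assumptions here rather than statements from the talk: that the first three positions of k are the government lists and the last three the NGO lists, and that each within-group sum runs over unordered pairs of distinct lists. All coefficient values are hypothetical.

```python
from itertools import combinations

GOVT = (0, 1, 2)   # assumed positions of the government lists in k
NGO = (3, 4, 5)    # assumed positions of the NGO lists in k


def within_group_pairs(k, group):
    """Sum of k_j * k_j' over unordered pairs of distinct lists in one group."""
    return sum(k[a] * k[b] for a, b in combinations(group, 2))


def cross_group_pairs(k, group_a, group_b):
    """Sum of k_j * k_j' over pairs with one list from each group."""
    return sum(k[a] for a in group_a) * sum(k[b] for b in group_b)


def log_mu(k, lam0, lam, w_ngo, w_govt, w_mix):
    """Main effects plus the three pooled two-way interaction terms."""
    main = lam0 + sum(lam_j * k_j for lam_j, k_j in zip(lam, k))
    return (main
            + w_ngo * within_group_pairs(k, NGO)
            + w_govt * within_group_pairs(k, GOVT)
            + w_mix * cross_group_pairs(k, NGO, GOVT))


k = (0, 0, 1, 1, 0, 1)                       # on lists 3, 4, and 6
print(within_group_pairs(k, NGO))            # 1 (lists 4 and 6)
print(within_group_pairs(k, GOVT))           # 0 (only list 3)
print(cross_group_pairs(k, NGO, GOVT))       # 2 (pairs 4-3 and 6-3)
print(log_mu(k, 0.0, [0.0] * 6, w_ngo=0.5, w_govt=0.25, w_mix=-0.125))  # 0.25
```

Pooling the interactions into three coefficients keeps the model far smaller than one with a separate parameter for each of the 15 list pairs.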

  17. $$\log\left(\mu_k^{(t)}\right) = \lambda_{0,t} + \lambda_{1,t} k_1 + \cdots + \lambda_{6,t} k_6 + \omega_{\text{NGO}} \sum_{j, j' \in \text{NGOs}} k_j k_{j'} + \omega_{\text{govt}} \sum_{j, j' \in \text{govts}} k_j k_{j'} + \omega_{\text{mix}} \sum_{j \in \text{NGOs},\, j' \in \text{govts}} k_j k_{j'}$$

  18. $$\log\left(\mu_k^{(t)}\right) = \lambda_{0,t} + \lambda_{1,t} k_1 + \cdots + \lambda_{6,t} k_6 + \omega_{\text{NGO}} \sum_{j, j' \in \text{NGOs}} k_j k_{j'} + \omega_{\text{govt}} \sum_{j, j' \in \text{govts}} k_j k_{j'} + \omega_{\text{mix}} \sum_{j \in \text{NGOs},\, j' \in \text{govts}} k_j k_{j'}$$
      $$\lambda_{j,t} \sim N(\mu_j, \tau^2) \quad \text{for } j = 1, \ldots, 6$$

  19. AR1 Model
      $$\begin{pmatrix} \lambda_{j,1} \\ \vdots \\ \lambda_{j,T} \end{pmatrix} \Bigm|\, \mu_j, \rho, \tau^2 \sim N\left( \begin{pmatrix} \mu_j \\ \vdots \\ \mu_j \end{pmatrix},\ \tau^2 \begin{pmatrix} 1 & \rho & \cdots & \rho^{T-1} \\ \rho & \ddots & \ddots & \vdots \\ \vdots & \ddots & \ddots & \rho \\ \rho^{T-1} & \cdots & \rho & 1 \end{pmatrix} \right)$$
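The covariance matrix in the AR1 model has entry tau^2 * rho^|s - t| for years s and t. A minimal sketch of building it, with a hypothetical function name and hypothetical parameter values:

```python
def ar1_covariance(num_years, rho, tau2):
    """Return the T x T matrix with entries tau2 * rho ** |s - t|."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    if tau2 <= 0:
        raise ValueError("tau2 must be positive")
    return [
        [tau2 * rho ** abs(s - t) for t in range(num_years)]
        for s in range(num_years)
    ]


# Hypothetical values, for illustration only.
for row in ar1_covariance(num_years=3, rho=0.5, tau2=2.0):
    print(row)
# [2.0, 1.0, 0.5]
# [1.0, 2.0, 1.0]
# [0.5, 1.0, 2.0]
```

Setting rho to 0 recovers the exchangeable model with independent year effects, so the AR1 prior can be read as letting adjacent years share more information than distant ones.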

  20. Mixture Model
      $$\lambda_{j,t} \mid \gamma_{j,t} \sim (1 - \gamma_{j,t})\, N(\mu_{\text{inactive}}, \sigma^2_{\text{inactive}}) + \gamma_{j,t}\, N(\mu_j, \tau^2) \quad \text{for } j = 1, \ldots, 6$$
      $$\gamma_{j,t} \sim \text{Bern}(p) \quad \text{independent}$$
      $$p \sim \text{Unif}(0, 1)$$
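To make the mixture prior concrete, the sketch below draws list effects for one list over ten years. It only simulates from the prior as written on the slide; it is not a fitting procedure, and the hyperparameter values and function name are hypothetical.

```python
import random


def draw_list_effect(rng, p, mu_j, tau2, mu_inactive, sigma2_inactive):
    """Draw (gamma, lambda) for one list-year from the two-component mixture."""
    gamma = 1 if rng.random() < p else 0          # gamma_{j,t} ~ Bern(p)
    if gamma == 1:
        lam = rng.gauss(mu_j, tau2 ** 0.5)        # active: N(mu_j, tau^2)
    else:
        # inactive: N(mu_inactive, sigma^2_inactive), a very negative list effect
        lam = rng.gauss(mu_inactive, sigma2_inactive ** 0.5)
    return gamma, lam


rng = random.Random(2014)      # fixed seed so the sketch is reproducible
p = rng.random()               # p ~ Unif(0, 1)

# Hypothetical hyperparameters for a single list j over T = 10 years.
draws = [
    draw_list_effect(rng, p, mu_j=-1.0, tau2=1.0,
                     mu_inactive=-9.0, sigma2_inactive=3.0)
    for _ in range(10)
]
for gamma, lam in draws:
    print(gamma, round(lam, 2))
```

A strongly negative inactive mean pushes the recording probability for that list-year close to zero, which is how the mixture represents a list that was not operating.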

  21. Missing Data
      Consider inactive lists as missing data [Zwane et al., 2004].
      In year $t$, lists 3 and 4 are inactive. Treat $n^{(t)}_{01000}$ as the margin $n^{(t)}_{01++0}$, and the cells $n^{(t)}_{01000}$, $n^{(t)}_{01010}$, $n^{(t)}_{01100}$, $n^{(t)}_{01110}$ as missing data.
      Zeros from missing data (ZM) vs zeros from sampling (ZS)
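The relabelling described on the slide above, where an observed cell is reinterpreted as a margin over the inactive lists, can be sketched as follows. The function name and the counts are hypothetical; list positions are 0-based in the code, so lists 3 and 4 are positions 2 and 3.

```python
def collapse_over_inactive(counts, inactive_positions):
    """Sum cell counts over inactive lists, marking those positions with '+'."""
    margins = {}
    for pattern, n in counts.items():
        key = "".join(
            "+" if i in inactive_positions else digit
            for i, digit in enumerate(pattern)
        )
        margins[key] = margins.get(key, 0) + n
    return margins


# Hypothetical counts for a year in which lists 3 and 4 were inactive,
# so every observed pattern has 0 in those two positions.
observed = {"01000": 7, "11000": 2, "00001": 4}
print(collapse_over_inactive(observed, inactive_positions={2, 3}))
# {'01++0': 7, '11++0': 2, '00++1': 4}
```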

  22. Models compared

      |                           | Zeros from sampling (ZS) | Zeros from missing data (ZM) |
      |---------------------------|--------------------------|------------------------------|
      | Unpooled main effects (U) | U-ZS                     | U-ZM                         |
      | Multilevel model (M)      | M-ZS, AR1-ZS             | M-ZM                         |

  23. Casanare Data Results
      Separate each year
      [Figure: estimated killings (y-axis, 0 to 2500) by year (x-axis, 1998 to 2007) for models U-ZS, M-ZS, AR1-ZS, U-ZM, and M-ZM.]

  24. Posterior Predictive Checks: M-ZS
      [Figure: three histograms (y-axis: Frequency) with panels "Number recorded once", "Number recorded twice", and "Maximum number of recordings"; p-values shown on the panels: 0.46, 0.35, 0.4.]
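A posterior predictive check of this kind compares a statistic of the observed data with the same statistic computed on replicated datasets. The sketch below shows one common convention for the p-value (the share of replicates at least as large as the observed value); whether the talk used exactly this convention is an assumption, and the patterns and replicated values are made up.

```python
def number_recorded_exactly(patterns, m):
    """Count people whose capture pattern has exactly m recordings."""
    return sum(1 for k in patterns if sum(k) == m)


def max_recordings(patterns):
    """Largest number of lists on which any single person appears."""
    return max(sum(k) for k in patterns)


def posterior_predictive_p_value(replicated_stats, observed_stat):
    """Share of replicated statistics that are >= the observed statistic."""
    if not replicated_stats:
        raise ValueError("need at least one replicated statistic")
    at_least = sum(1 for s in replicated_stats if s >= observed_stat)
    return at_least / len(replicated_stats)


# Hypothetical observed capture patterns over three lists.
observed_patterns = [(1, 0, 0), (0, 1, 0), (1, 1, 0), (1, 0, 0), (1, 1, 1)]
observed_once = number_recorded_exactly(observed_patterns, 1)
print(observed_once)                      # 3
print(max_recordings(observed_patterns))  # 3

# Hypothetical values of the same statistic from four replicated datasets.
replicated_once = [2, 3, 4, 1]
print(posterior_predictive_p_value(replicated_once, observed_once))  # 0.5
```

Values near 0 or 1 would indicate that the model rarely reproduces the observed statistic; values in the middle, like those reported on the slide, do not flag a misfit for these statistics.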

  25. Simulations
      - Simulate from the posterior predictive distribution of M-ZM, M-ZS, and AR1-ZS fit to the Casanare data.
      - Fit all the models.
      - Coverage is similar for all models.
      - Multilevel models have narrower intervals and lower bias.

  26. Recommendations
      - In many applications, lists concentrate effort in different years, locations, or demographics.
      - If these groups overlap ⇒ fit joint models, to be able to model more list interactions and to borrow information across strata.

  27. We recommend Multilevel Models
      - In years with little data, we might not trust unpooled estimates: they have high variance and are likely to be extreme.
      - Exchangeability and normality can be assessed via posterior predictive checks, and relaxed by expanding the model.
      - If we want monthly estimates at the municipality level, there is less and less data per stratum.
      - Colombia (2003-2011), Syria (2011-2013)

  28. References I
      - F Dominici. Combining contingency tables with missing dimensions. Biometrics, 56(2):546–553, 2000.
      - A Gelman, A Jakulin, M G Pittau, and Y Su. A weakly informative default prior distribution for logistic and other regression models. The Annals of Applied Statistics, 2(4):1360–1383, 2008.
      - E N Zwane, K van der Pal-de Bruin, and P G M van der Heijden. The multiple-record systems estimator when registrations refer to different but overlapping populations. Statistics in Medicine, 23:2267–2281, 2004.


  30. EM-like algorithm
      E step:
      $$\hat{n}^{(t)}_{01010} = n^{(t)}_{01++0} \cdot \frac{\sum_{s=1}^{T} \mu^{(s)}_{01010}}{\sum_{s=1}^{T} \left( \mu^{(s)}_{01000} + \mu^{(s)}_{01010} + \mu^{(s)}_{01100} + \mu^{(s)}_{01110} \right)}$$
      M step: Fit the log-linear model to the completed data $\{ n^{(t)}_k \}_{k \neq 00000,\, 00010,\, 00100,\, 00110}$.
      Bayesian version [Dominici, 2000].
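The E step on the EM-like algorithm slide splits an observed margin across the cells it covers, in proportion to fitted means summed over years. The sketch below follows that reading of the slide; the function name, the data layout (one dictionary of fitted means per year), and every number are assumptions made for illustration.

```python
def e_step_allocate(margin_count, mu_by_year, target_cell, covered_cells):
    """Expected count for target_cell given the observed margin.

    margin_count: observed margin, e.g. n_{01++0} in year t
    mu_by_year:   list of dicts mapping cell pattern -> fitted mean, one per year
    covered_cells: the cells that the margin sums over
    """
    numerator = sum(mu[target_cell] for mu in mu_by_year)
    denominator = sum(mu[cell] for mu in mu_by_year for cell in covered_cells)
    if denominator <= 0:
        raise ValueError("fitted means for the covered cells must sum to a positive value")
    return margin_count * numerator / denominator


cells = ["01000", "01010", "01100", "01110"]
# Hypothetical fitted means for two years.
mu_by_year = [
    {"01000": 4.0, "01010": 2.0, "01100": 1.0, "01110": 1.0},
    {"01000": 2.0, "01010": 2.0, "01100": 3.0, "01110": 1.0},
]

allocated = {c: e_step_allocate(12, mu_by_year, c, cells) for c in cells}
print(allocated["01010"])        # 12 * 4 / 16 = 3.0
print(sum(allocated.values()))   # 12.0, the allocations add back up to the margin
```

The M step would then refit the log-linear model to the completed table and the two steps would alternate until the fitted means stop changing.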

  31. Sensitivity Analysis: Choice of $\mu_{\text{inactive}}$, $\tau^2_{\text{inactive}}$
      [Figure: estimated killings (y-axis, 0 to 2000) by year (x-axis, 1998 to 2007); legend entries: "−13, 3", "−9, 1.5", "−9, 3", "−9, 6", "−5, 3", "no mix", "estimate offs".]
