

SLIDE 1

Multilevel Models for Estimating the Number of Deaths in Armed Conflict (in Colombia)

Shira Mitchell

JSM 2014

Collaborators: Brent Coull, Alan Zaslavsky, Al Ozonoff, Kristian Lum, Patrick Ball, Megan Price

August 5, 2014

Shira Mitchell (Columbia) August 5, 2014 1 / 27

SLIDE 2

SLIDE 3

SLIDE 4

Data from the Human Rights Data Analysis Group (HRDAG)

NGOs and government groups provide lists of killings in Casanare, Colombia, for 1998-2007.

Casanare is a department of Colombia: population 300,000, BP oil pipeline, much corruption and violence.

Goal: estimate the number of killings in Casanare in the years 1998-2007.

SLIDE 5

SLIDE 6

SLIDE 7

$n_{k_1 k_2} \sim \mathrm{Pois}(\mu_{k_1 k_2})$, independent

SLIDE 8

$\log(\mu_{k_1 k_2}) = \lambda_0 + \lambda_1 k_1 + \lambda_2 k_2 \;\Rightarrow\; E[N]_{\mathrm{MLE}} = \dfrac{n_{1+}\, n_{+1}}{n_{11}}$
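As a minimal sketch of the two-list estimate above: the function name and the counts are hypothetical and are not taken from the Casanare data. It only evaluates $n_{1+} n_{+1} / n_{11}$ from the three observed cells.

```python
def lincoln_petersen(n11, n10, n01):
    """Two-list estimate of the total count under the independence model.

    n11: recorded on both lists
    n10: recorded on list 1 only
    n01: recorded on list 2 only
    """
    if n11 <= 0:
        raise ValueError("the estimate needs at least one record on both lists")
    n1_plus = n11 + n10  # margin: everyone on list 1
    n_plus1 = n11 + n01  # margin: everyone on list 2
    return n1_plus * n_plus1 / n11


# Hypothetical counts for illustration only.
estimate = lincoln_petersen(n11=20, n10=80, n01=30)
print(estimate)
```

The estimate is undefined when no one appears on both lists, which is why the sketch refuses `n11 = 0`.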

SLIDE 9

General number of lists: $k = (k_1, k_2, \dots, k_J)$; for example, a person in lists 3, 4, and 6 has $k = (0, 0, 1, 1, 0, 1)$.

Independence model: $\log(\mu_k) = \lambda_0 + \lambda_1 k_1 + \dots + \lambda_J k_J$
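The independence model can be sketched as a Poisson log-linear fit. This is an illustrative implementation, not the fitting code used for the talk: the helper names are hypothetical, the counts are made up, and the fit uses plain iteratively reweighted least squares in NumPy. The fitted $\exp(\lambda_0)$ is the expected count in the unobserved cell $k = (0, \dots, 0)$.

```python
import itertools
import numpy as np


def pattern_from_lists(lists, num_lists):
    """Indicator vector k for a person recorded on the given 1-based lists."""
    return tuple(int(j in lists) for j in range(1, num_lists + 1))


def capture_patterns(num_lists):
    """All observable patterns k (recorded on at least one list)."""
    cells = itertools.product([0, 1], repeat=num_lists)
    return np.array([k for k in cells if any(k)], dtype=float)


def fit_independence(patterns, counts, iterations=50):
    """Fit log(mu_k) = lam0 + sum_j lam_j * k_j by IRLS; returns (lam0, lam_1..J)."""
    X = np.column_stack([np.ones(len(patterns)), patterns])
    y = np.asarray(counts, dtype=float)
    mu = y + 0.5                      # crude positive starting values
    for _ in range(iterations):
        eta = np.log(mu)
        z = eta + (y - mu) / mu       # working response for the log link
        XtW = X.T * mu                # Poisson working weights equal mu
        beta = np.linalg.solve(XtW @ X, XtW @ z)
        mu = np.exp(X @ beta)
    return beta


patterns = capture_patterns(2)        # rows: (0,1), (1,0), (1,1)
counts = [30, 80, 20]                 # hypothetical counts, same row order
beta = fit_independence(patterns, counts)
unobserved = np.exp(beta[0])          # fitted mean of the cell k = (0, 0)
total = sum(counts) + unobserved
```

With two lists the model is saturated on the three observed cells, so `total` reproduces the two-list estimate $n_{1+} n_{+1} / n_{11}$.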

SLIDE 10

SLIDE 11

Data - from HRDAG

Matching:

Commission of Jurists:

Year   Gender   Location   ...
1998   male     TAMARA     ...
...

National Police:

Year   Perpetrator   Gender   ...
...
1998   FARC          male     ...

6 lists contain among them 2619 observed killings.

SLIDE 12

Data - from HRDAG

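Columns: year, IMLM (govt), PN0 (govt), VP (govt), CCJ (NGO), CIN (NGO), CCE (NGO). Rows with fewer than six counts have empty cells whose column positions are not preserved here.

1998: 1, 14, 13, 3
1999: 2, 6, 8, 2
2000: 213, 5, 22, 23
2001: 262, 2, 21, 12
2002: 268, 1, 33, 9
2003: 348, 274, 2, 12, 11
2004: 412, 324, 295, 14, 11, 1
2005: 210, 155, 138, 8, 13, 16
2006: 104, 71, 26, 3, 2, 15
2007: 54, 33, 27, 36, 35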

SLIDE 13

We're far from the independence model:

Heterogeneity of a person's recordability
Groups collecting data interact
Want yearly estimates, but very little data exist in some years
Groups operating in different but overlapping time periods

SLIDE 14

Heterogeneity of a person’s recordability

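$P_j(\theta) = P(\text{person with recordability } \theta \text{ is recorded on list } j)$

$\log\left(\dfrac{P_j(\theta_{\mathrm{govt}})}{1 - P_j(\theta_{\mathrm{govt}})}\right) = \theta_{\mathrm{govt}} + \lambda_j$ for $j \in$ govts

$\log\left(\dfrac{P_j(\theta_{\mathrm{NGO}})}{1 - P_j(\theta_{\mathrm{NGO}})}\right) = \theta_{\mathrm{NGO}} + \lambda_j$ for $j \in$ NGOs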
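A small sketch of the logit relation above: inverting it gives the probability that a person with recordability $\theta$ appears on list $j$. The function name and the numeric values are hypothetical.

```python
import math


def record_probability(theta, lam_j):
    """P_j(theta): inverse logit of theta + lambda_j."""
    return 1.0 / (1.0 + math.exp(-(theta + lam_j)))


# Hypothetical values: a person with theta_govt = 0.7 and a list with lambda_j = -1.2.
p = record_probability(theta=0.7, lam_j=-1.2)
log_odds = math.log(p / (1.0 - p))   # recovers theta + lambda_j
```

A larger $\theta$ raises the probability of being recorded on every list in the group, while $\lambda_j$ shifts it for list $j$ alone.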

SLIDE 15

Heterogeneity of a person’s recordability

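Let $(\theta_{\mathrm{NGO}}, \theta_{\mathrm{govt}}) \sim p(\theta_{\mathrm{NGO}}, \theta_{\mathrm{govt}})$. Then

$\log(\mu_k) = \lambda_0 + \lambda_1 k_1 + \dots + \lambda_6 k_6 + \gamma(k^{\mathrm{NGO}}_{+}, k^{\mathrm{govt}}_{+})$

where the subscript $+$ denotes summation of $k_j$ over the lists in that group.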

SLIDE 16

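$\log(\mu_k) = \lambda_0 + \lambda_1 k_1 + \dots + \lambda_6 k_6 + \sum_{j, j' \in \mathrm{NGOs}} \omega_{\mathrm{NGO}} k_j k_{j'} + \sum_{j, j' \in \mathrm{govts}} \omega_{\mathrm{govt}} k_j k_{j'} + \sum_{j \in \mathrm{NGOs},\, j' \in \mathrm{govts}} \omega_{\mathrm{mix}} k_j k_{j'}$

Heterogeneity of a person's recordability
Groups collecting data interact
Want yearly estimates, but very little data exist in some years
Groups operating in different but overlapping time periods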
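Because each group shares one interaction coefficient, the three sums reduce to three covariates per capture pattern. The sketch below counts them under two assumptions that the slides do not state: the government lists occupy the first three positions of $k$ (following the column order of the data table), and each within-group sum runs over unordered pairs $j < j'$.

```python
from itertools import combinations

GOVT = (0, 1, 2)  # assumed positions of the government lists in k
NGO = (3, 4, 5)   # assumed positions of the NGO lists in k


def interaction_covariates(k):
    """Multipliers of (omega_NGO, omega_govt, omega_mix) for pattern k."""
    ngo_pairs = sum(k[a] * k[b] for a, b in combinations(NGO, 2))
    govt_pairs = sum(k[a] * k[b] for a, b in combinations(GOVT, 2))
    mix_pairs = sum(k[a] * k[b] for a in NGO for b in GOVT)
    return ngo_pairs, govt_pairs, mix_pairs


# A person recorded on lists 3, 4, and 6 (1-based).
covariates = interaction_covariates((0, 0, 1, 1, 0, 1))
```

Under these assumptions the pattern above has one NGO pair, no government pair, and two mixed pairs.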

SLIDE 17

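$\log(\mu_k^{(t)}) = \lambda_{0t} + \lambda_{1,t} k_1 + \dots + \lambda_{6,t} k_6 + \sum_{j, j' \in \mathrm{NGOs}} \omega_{\mathrm{NGO}} k_j k_{j'} + \sum_{j, j' \in \mathrm{govts}} \omega_{\mathrm{govt}} k_j k_{j'} + \sum_{j \in \mathrm{NGOs},\, j' \in \mathrm{govts}} \omega_{\mathrm{mix}} k_j k_{j'}$

Heterogeneity of a person's recordability
Groups collecting data interact
Want yearly estimates, but very little data exist in some years
Groups operating in different but overlapping time periods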

SLIDE 18

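$\log(\mu_k^{(t)}) = \lambda_{0t} + \lambda_{1,t} k_1 + \dots + \lambda_{6,t} k_6 + \sum_{j, j' \in \mathrm{NGOs}} \omega_{\mathrm{NGO}} k_j k_{j'} + \sum_{j, j' \in \mathrm{govts}} \omega_{\mathrm{govt}} k_j k_{j'} + \sum_{j \in \mathrm{NGOs},\, j' \in \mathrm{govts}} \omega_{\mathrm{mix}} k_j k_{j'}$

$\lambda_{j,t} \sim N(\mu_j, \tau^2)$ for $j = 1, \dots, 6$

Heterogeneity of a person's recordability
Groups collecting data interact
Want yearly estimates, but very little data exist in some years
Groups operating in different but overlapping time periods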

SLIDE 19

AR1 Model

$\begin{pmatrix} \lambda_{j,1} \\ \vdots \\ \lambda_{j,T} \end{pmatrix} \Big|\; \mu_j, \rho, \tau^2 \sim N\left( \begin{pmatrix} \mu_j \\ \vdots \\ \mu_j \end{pmatrix},\; \begin{pmatrix} 1 & \rho & \cdots & \rho^{T-1} \\ \rho & \ddots & & \vdots \\ \vdots & & \ddots & \rho \\ \rho^{T-1} & \cdots & \rho & 1 \end{pmatrix} \tau^2 \right)$

Heterogeneity of a person's recordability
Groups collecting data interact
Want yearly estimates, but very little data exist in some years
Groups operating in different but overlapping time periods
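The AR1 covariance has entries $\tau^2 \rho^{|s - t|}$, which the sketch below builds and then uses to draw one list's yearly effects. The function name and the values of $\rho$, $\tau^2$, and $\mu_j$ are hypothetical.

```python
import numpy as np


def ar1_covariance(num_years, rho, tau2):
    """Covariance matrix with entries tau2 * rho ** |s - t|."""
    years = np.arange(num_years)
    lags = np.abs(years[:, None] - years[None, :])
    return tau2 * rho ** lags


rng = np.random.default_rng(0)
T = 10                                        # one effect per year, 1998-2007
cov = ar1_covariance(T, rho=0.8, tau2=1.0)    # hypothetical rho and tau2
mean = np.full(T, -2.0)                       # hypothetical mu_j, constant over years
lam_j = rng.multivariate_normal(mean, cov)    # one draw of (lambda_j1, ..., lambda_jT)
```

Setting `rho = 0` makes the matrix diagonal, so the yearly effects become independent $N(\mu_j, \tau^2)$ draws.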

SLIDE 20

Mixture Model

$\lambda_{j,t} \mid \gamma_{j,t} \sim (1 - \gamma_{j,t})\, N(\mu_{\mathrm{inactive}}, \sigma^2_{\mathrm{inactive}}) + \gamma_{j,t}\, N(\mu_j, \tau^2)$ for $j = 1, \dots, 6$

$\gamma_{j,t} \sim \mathrm{Bern}(p)$, independent

$p \sim \mathrm{Unif}(0, 1)$

Heterogeneity of a person's recordability
Groups collecting data interact
Want yearly estimates, but very little data exist in some years
Groups operating in different but overlapping time periods
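To make the mixture concrete, the sketch below simulates list-by-year effects from it. Every numeric value is illustrative rather than fitted, and the variable names are hypothetical. Each $\lambda_{j,t}$ comes from the "inactive" component when $\gamma_{j,t} = 0$ and from the list's own component when $\gamma_{j,t} = 1$.

```python
import numpy as np

rng = np.random.default_rng(1)
J, T = 6, 10                                  # six lists, ten years

mu_inactive, sigma2_inactive = -9.0, 3.0      # illustrative inactive component
mu = rng.normal(-2.0, 1.0, size=J)            # illustrative list means mu_j
tau2 = 0.25                                   # illustrative active variance

p = rng.uniform(0.0, 1.0)                     # p ~ Unif(0, 1)
gamma = rng.binomial(1, p, size=(J, T))       # gamma_jt ~ Bern(p), independent

# rng.normal takes standard deviations, so take square roots of the variances.
active_draw = rng.normal(mu[:, None], np.sqrt(tau2), size=(J, T))
inactive_draw = rng.normal(mu_inactive, np.sqrt(sigma2_inactive), size=(J, T))
lam = np.where(gamma == 1, active_draw, inactive_draw)
```

A strongly negative inactive mean gives a list a very small fitted recording rate in the years when it is switched off.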

SLIDE 21

Missing Data

Consider inactive lists as missing data [Zwane et al., 2004].

In year $t$, lists 3 and 4 are inactive. Treat $n^{(t)}_{01000}$ as the margin $n^{(t)}_{01++0}$, and the cells $n^{(t)}_{01000}$, $n^{(t)}_{01010}$, $n^{(t)}_{01100}$, $n^{(t)}_{01110}$ as missing data.

Zeros from missing data (ZM) vs. zeros from sampling (ZS)
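The bookkeeping for this margin notation can be sketched as follows; the helper names are hypothetical. One helper turns an observed pattern into its margin label given the inactive lists, and the other enumerates the full cells that the margin covers.

```python
from itertools import product


def collapse_to_margin(pattern, inactive):
    """Replace the positions of inactive lists (1-based) with '+'."""
    return "".join(
        "+" if j in inactive else c for j, c in enumerate(pattern, start=1)
    )


def cells_in_margin(margin):
    """All complete patterns consistent with a margin label."""
    options = [("0", "1") if c == "+" else (c,) for c in margin]
    return ["".join(cell) for cell in product(*options)]


margin = collapse_to_margin("01000", inactive={3, 4})
cells = cells_in_margin(margin)
```

For the example in the text, `margin` is `01++0` and `cells` lists the four cells treated as missing data.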

SLIDE 22

                             Zeros from missing data (ZM)   Zeros from sampling (ZS)
Unpooled main effects (U)    U-ZM                           U-ZS
Multilevel model (M)         M-ZM                           M-ZS, AR1-ZS

SLIDE 23

Casanare Data Results

[Figure: estimated killings (vertical axis, 500 to 2500) by year (1998-2007) for each method: Separate each year, U-ZS, M-ZS, AR1-ZS, U-ZM, M-ZM.]

SLIDE 24

Posterior Predictive Checks: M-ZS

[Figures: histograms (frequency) of three test statistics, each with its p-value.]

Number recorded once: p = 0.46
Number recorded twice: p = 0.35
Maximum number of recordings: p = 0.4
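The three check statistics can be computed from the counts per capture pattern, as in the sketch below. The function names and data are hypothetical, and the p-value convention used here (the share of replicated statistics at least as large as the observed one) is an assumption, since the slides do not say which tail is reported.

```python
import numpy as np


def check_statistics(patterns, counts):
    """Return (number recorded once, number recorded twice, max recordings)."""
    patterns = np.asarray(patterns)
    counts = np.asarray(counts)
    times_recorded = patterns.sum(axis=1)      # how many lists each pattern hits
    once = int(counts[times_recorded == 1].sum())
    twice = int(counts[times_recorded == 2].sum())
    max_recordings = int(times_recorded[counts > 0].max())
    return once, twice, max_recordings


def posterior_predictive_p(observed_stat, replicated_stats):
    """Share of replicated statistics >= the observed statistic (assumed convention)."""
    return float(np.mean(np.asarray(replicated_stats) >= observed_stat))


# Hypothetical two-list data: patterns (0,1), (1,0), (1,1).
stats = check_statistics([(0, 1), (1, 0), (1, 1)], [30, 80, 20])
p_value = posterior_predictive_p(5, [3, 5, 7, 9])
```

In practice `replicated_stats` would hold the statistic computed on each data set simulated from the posterior predictive distribution.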

SLIDE 25

Simulations

Simulate from the posterior predictive distribution of M-ZM, M-ZS, and AR1-ZS fit to the Casanare data.
Fit all the models.
Coverage is similar for all models.
Multilevel models have narrower intervals and lower bias.

SLIDE 26

Recommendations

In many applications, lists concentrate effort in different years, locations, or demographics. If these groups overlap ⇒ fit joint models, so that more list interactions can be modeled and information can be borrowed across strata.

SLIDE 27

We recommend Multilevel Models

In years with little data, we might not trust unpooled estimates: they have high variance and are likely to be extreme.

Exchangeability and normality can be assessed via posterior predictive checks and relaxed by expanding the model.

If we want monthly estimates at the municipality level, there are less and less data per stratum.

Colombia (2003-2011) Syria (2011-2013)

SLIDE 28

References I

F Dominici. Combining contingency tables with missing dimensions. Biometrics, 56(2):546–553, 2000.

A Gelman, A Jakulin, M G Pittau, and Y Su. A weakly informative default prior distribution for logistic and other regression models. The Annals of Applied Statistics, 2(4):1360–1383, 2008.

E N Zwane, K van der Pal-de Bruin, and P G M van der Heijden. The multiple-record systems estimator when registrations refer to different but overlapping populations. Statistics in Medicine, 23:2267–2281, 2004.

SLIDE 29

SLIDE 30

EM-like algorithm

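E step: $\hat{n}^{(t)}_{01010} = \dfrac{\sum_{s=1}^{T} \mu^{(s)}_{01010}}{\sum_{s=1}^{T} \left( \mu^{(s)}_{01000} + \mu^{(s)}_{01010} + \mu^{(s)}_{01100} + \mu^{(s)}_{01110} \right)} \; n^{(t)}_{01++0}$.

M step: Fit log-linear model to completed data $\{n^{(t)}_k\}_{k = 00000, 00010, 00100, 00110}$.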

Bayesian version [Dominici, 2000].
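The E step above can be sketched as a proportional allocation of the margin count across its cells. The function name and the fitted means are hypothetical, and the sketch covers only this allocation, not the M step or the Bayesian version.

```python
import numpy as np


def e_step(margin_count, mu_by_year):
    """Split a margin count across its cells in proportion to summed fitted means.

    mu_by_year has shape (T, C): fitted mu^(s) for each year s and each of the
    C cells inside the margin, e.g. 01000, 01010, 01100, 01110.
    """
    totals = np.asarray(mu_by_year, dtype=float).sum(axis=0)  # sum over years s
    return margin_count * totals / totals.sum()


# Hypothetical fitted means for two years and the four cells of margin 01++0.
mu_by_year = [[4.0, 2.0, 1.0, 1.0],
              [6.0, 2.0, 3.0, 1.0]]
completed = e_step(margin_count=40, mu_by_year=mu_by_year)
```

The completed cell counts always add back up to the observed margin count.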

SLIDE 31

Sensitivity Analysis: Choice of $\mu_{\mathrm{inactive}}$, $\tau^2_{\mathrm{inactive}}$

[Figure: estimated killings (vertical axis, 500 to 2000) by year (1998-2007) for the settings −13, 3; −9, 1.5; −9, 3; −9, 6; −5, 3; no mix; estimate offs.]

SLIDE 32

Posterior Predictive Checks: AR1-ZS

[Figures: histograms (frequency) of three test statistics, each with its p-value.]

Number recorded once: p = 0.49
Number recorded twice: p = 0.37
Maximum number of recordings: p = 0.43

SLIDE 33

Posterior Predictive Checks: M-ZM

[Figures: histograms (frequency) of three test statistics, each with its p-value.]

Number recorded once: p = 0.53
Number recorded twice: p = 0.2
Maximum number of recordings: p = 0.36

SLIDE 34

Generate data from M-ZM: Coverage

[Figure: coverage (axis from 0.2 to 1) for each method: Separate each year, U-ZS, M-ZS, AR1-ZS, U-ZM, M-ZM.]

SLIDE 35

Generate data from M-ZS: Coverage

[Figure: coverage (axis from 0.2 to 1) for each method: Separate each year, U-ZS, M-ZS, AR1-ZS, U-ZM, M-ZM.]

SLIDE 36

Generate data from AR1-ZS: Coverage

[Figure: coverage (axis from 0.2 to 1) for each method: Separate each year, U-ZS, M-ZS, AR1-ZS, U-ZM, M-ZM.]

SLIDE 37

Generate data from M-ZM: Bias

[Figure: average bias (log scale) for each method: Separate each year, U-ZS, M-ZS, AR1-ZS, U-ZM, M-ZM.]

SLIDE 38

Generate data from M-ZS: Bias

[Figure: average bias (log scale) for each method: Separate each year, U-ZS, M-ZS, AR1-ZS, U-ZM, M-ZM.]

SLIDE 39

Generate data from AR1-ZS: Bias

[Figure: average bias (log scale) for each method: Separate each year, U-ZS, M-ZS, AR1-ZS, U-ZM, M-ZM.]

SLIDE 40

Generate data from M-ZM: Interval Width

[Figure: average interval width (log scale) for each method: Separate each year, U-ZS, M-ZS, AR1-ZS, U-ZM, M-ZM.]

SLIDE 41

Generate data from M-ZS: Interval Width

[Figure: average interval width (log scale) for each method: Separate each year, U-ZS, M-ZS, AR1-ZS, U-ZM, M-ZM.]

SLIDE 42

Generate data from AR1-ZS: Interval Width

[Figure: average interval width (log scale) for each method: Separate each year, U-ZS, M-ZS, AR1-ZS, U-ZM, M-ZM.]

SLIDE 43

Continuous Model Expansion

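$\log(\mu_k^{(t)}) = \lambda_{0t} + \lambda_{1,t} k_1 + \dots + \lambda_{6,t} k_6 + \sum_{j, j' \in \mathrm{NGOs}} \omega_{\mathrm{NGO}} k_j k_{j'} + \sum_{j, j' \in \mathrm{govts}} \omega_{\mathrm{govt}} k_j k_{j'} + \sum_{j \in \mathrm{NGOs},\, j' \in \mathrm{govts}} \omega_{\mathrm{mix}} k_j k_{j'}$

$\omega_{j,j'} \sim N(\omega_{\mathrm{NGO}}, \sigma^2_{\mathrm{NGO}})$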

SLIDE 44

Continuous Model Expansion: 3-way log-linear interactions

Population heterogeneity ⇒ higher-order interactions.
Story for list cooperations?

SLIDE 45

Continuous Model Expansion: 3-way log-linear interactions

A Story:

SLIDE 46

Continuous Model Expansion: 3-way log-linear interactions

Cauchy priors: regularization [Gelman et al., 2008]
Exchangeable based on NGO/govt
