slide-1
SLIDE 1

Doubly robust treatment effect estimation with missing attributes

Effect of tranexamic acid on mortality of patients with traumatic brain injury

Imke Mayer, Julie Josse, Stefan Wager, Tobias Gauss, Jean-Denis Moyer

EHESS; École Polytechnique; Stanford Business School; Traumabase® Group

Statistique, Mathématique et Applications, Fréjus, 3 September 2019

slide-2
SLIDE 2

Introduction

slide-3
SLIDE 3

Traumabase

  • 20,000 patients
  • 250 continuous and categorical variables: heterogeneous
  • 16 hospitals: multilevel data
  • 4,000 new patients/year

Center    Accident   Age   Sex   Weight   Lactates   BP    Shock   ...
Beaujon   fall       54    m     85       NM         180   yes
Pitie     gun        26    m     NR       NA         131   no
Beaujon   moto       63    m     80       3.9        145   yes
Pitie     moto       30    w     NR       Imp        107   no
HEGP      knife      16    m     98       2.5        118   no
...

slide-4
SLIDE 4

Traumabase

⇒ Goal: estimate the causal effect of administering the treatment "tranexamic acid" (within 3 hours after the accident) on the outcome mortality for traumatic brain injury patients.

slide-5
SLIDE 5

Missing values

[Figure: percentage of missing values (axis "Percentage", ticks at 25, 50, 75, 100) for each of 48 Traumabase variables (axis "Variable", e.g. Tranexamic.acid, Death.in.ICU, SBP, Hemoglobin), broken down by type of missing value: NA, Not Informed, Not made, Not Applicable, Impossible.]

slide-6
SLIDE 6

Causal inference: classical framework

slide-7
SLIDE 7

Potential outcome framework (Neyman, 1923; Rubin, 1974)

Causal effect

Binary treatment $w \in \{0, 1\}$ on the $i$-th individual, with potential outcomes $Y_i(1)$ and $Y_i(0)$. Individual causal effect of the treatment: $\Delta_i = Y_i(1) - Y_i(0)$.

  • Problem: $\Delta_i$ is never observed (only one outcome is observed per individual).

Causal inference as a missing value problem?

Covariates         Treatment   Outcome(s)
X1     X2    X3    W           Y(0)   Y(1)
1.1    20    F     1           NA     T
-6     45    F     0           F      NA
0      15    M     1           NA     F
...    ...   ...   ...         ...    ...
-2     52    M     0           T      NA

slide-8
SLIDE 8

Potential outcome framework (Neyman, 1923; Rubin, 1974)

  • Average treatment effect (ATE): $\tau = \mathbb{E}[\Delta_i] = \mathbb{E}[Y_i(1) - Y_i(0)]$.

The ATE is the difference between the average outcome had everyone been treated and the average outcome had nobody been treated.

⇒ First solution: estimate $\tau$ with randomized controlled trials (RCTs).
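To make the ATE and the role of randomization concrete, here is a minimal simulation sketch. The data-generating process (a single covariate, a constant effect of 1) is an assumption chosen for illustration and is not taken from the talk. Both potential outcomes are simulated, only one is revealed per unit, and under randomized assignment the difference in observed group means should land close to the true ATE.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical potential outcomes: every unit has both, with a true ATE of 1.
x = rng.normal(size=n)
y0 = x + rng.normal(size=n)
y1 = y0 + 1.0
true_ate = np.mean(y1 - y0)

# Randomized assignment: W is independent of (Y(0), Y(1)).
w = rng.integers(0, 2, size=n)

# Only one potential outcome is revealed per unit.
y_obs = np.where(w == 1, y1, y0)

# Difference in means between treated and control units.
diff_in_means = y_obs[w == 1].mean() - y_obs[w == 0].mean()
print(f"true ATE: {true_ate:.3f}, difference in means: {diff_in_means:.3f}")
```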

slide-9
SLIDE 9

Observational data

Non-random assignment → confounding

Overall mortality rate: 20%; treated: 38%; not treated: 16%. Does the treatment kill?

                      survived      deceased    Pr(survived | treatment)   Pr(deceased | treatment)
TA not administered   2,167 (68%)   399 (13%)   0.84                       0.16
TA administered       374 (12%)     228 (7%)    0.62                       0.38

Table 1: Occurrence and frequency table for traumatic brain injury patients (total number: 3,168).
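The rates quoted on this slide follow directly from the counts in Table 1. The short sketch below redoes that arithmetic; the variable names are illustrative. The resulting unadjusted difference is only a descriptive comparison, not a causal effect.

```python
# Counts from Table 1 (TA = tranexamic acid).
counts = {
    "TA not administered": {"survived": 2167, "deceased": 399},
    "TA administered": {"survived": 374, "deceased": 228},
}

total = sum(sum(row.values()) for row in counts.values())
deaths = sum(row["deceased"] for row in counts.values())
overall_mortality = deaths / total

# Mortality rate within each treatment group.
mortality = {
    group: row["deceased"] / (row["survived"] + row["deceased"])
    for group, row in counts.items()
}

# Unadjusted (confounded) difference between treated and untreated patients.
naive_difference = mortality["TA administered"] - mortality["TA not administered"]
print(total, round(overall_mortality, 2), round(naive_difference, 2))
```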

slide-10
SLIDE 10

Unconfoundedness and the propensity score

Assumptions

  • $n$ i.i.d. samples $(X_i, Y_i, W_i)$,
  • $Y_i = W_i Y_i(1) + (1 - W_i) Y_i(0)$ (SUTVA),
  • Treatment assignment is random conditionally on $X_i$: $\{Y_i(0), Y_i(1)\} \perp\!\!\!\perp W_i \mid X_i$ (unconfoundedness assumption).

Propensity score and overlap assumption

$e(x) := \mathbb{P}(W_i = 1 \mid X_i = x)$ for all $x \in \mathcal{X}$. We will assume overlap, i.e. $0 < e(x) < 1$ for all $x \in \mathcal{X}$.

Key property

$e$ is a balancing score, i.e. under unconfoundedness, it satisfies $\{Y_i(0), Y_i(1)\} \perp\!\!\!\perp W_i \mid e(X_i)$.
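The following sketch illustrates these assumptions on simulated data. The model (one confounder, a logistic propensity score, a constant effect of 1) is a hypothetical choice made for the example. Treatment depends on the covariate only, overlap holds, and yet the unadjusted difference in means is far from the true effect because treated and control units differ in their covariate.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

x = rng.normal(size=n)                 # single confounder (hypothetical)
e = 1.0 / (1.0 + np.exp(-x))           # propensity score e(x) = P(W = 1 | X = x)
w = rng.binomial(1, e)                 # assignment depends on X only (unconfoundedness)

y0 = x + rng.normal(size=n)
y1 = y0 + 1.0                          # true ATE = 1
y = w * y1 + (1 - w) * y0              # observed outcome (SUTVA)

# Overlap: every unit has a propensity score strictly between 0 and 1.
overlap_holds = bool((e > 0).all() and (e < 1).all())

# Unadjusted comparison: biased, since treated units have larger x on average.
naive = y[w == 1].mean() - y[w == 0].mean()
print(f"overlap: {overlap_holds}, naive difference: {naive:.2f} (true ATE: 1)")
```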

slide-11
SLIDE 11

Propensity based estimators

Inverse Propensity Weighted estimator

$$\hat{\tau}_{IPW} := \frac{1}{n} \sum_{i=1}^{n} \left( \frac{W_i Y_i}{\hat{e}(X_i)} - \frac{(1 - W_i) Y_i}{1 - \hat{e}(X_i)} \right)$$
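A short derivation may help to see why the weighting works. It is a sketch that assumes SUTVA, unconfoundedness and overlap, and uses the true propensity score $e$ in place of its estimate $\hat{e}$. For the treated term:

```latex
\begin{aligned}
\mathbb{E}\left[\frac{W_i Y_i}{e(X_i)}\right]
  &= \mathbb{E}\left[\frac{W_i Y_i(1)}{e(X_i)}\right]
     && \text{(SUTVA: } W_i Y_i = W_i Y_i(1)\text{)} \\
  &= \mathbb{E}\left[\frac{\mathbb{E}[W_i \mid X_i]\,\mathbb{E}[Y_i(1) \mid X_i]}{e(X_i)}\right]
     && \text{(unconfoundedness)} \\
  &= \mathbb{E}\big[\mathbb{E}[Y_i(1) \mid X_i]\big] = \mathbb{E}[Y_i(1)]
     && \text{(since } \mathbb{E}[W_i \mid X_i] = e(X_i) > 0\text{)}
\end{aligned}
```

The same argument with $1 - W_i$ and $1 - e(X_i)$ gives $\mathbb{E}[Y_i(0)]$ for the control term, so the difference of the two terms has expectation $\tau$.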

slide-12
SLIDE 12

Propensity based estimators

Augmented IPW: a doubly robust estimator

Define $\mu_{(w)}(x) := \mathbb{E}[Y_i(w) \mid X_i = x]$.

$$\hat{\tau}_{AIPW} := \frac{1}{n} \sum_{i=1}^{n} \left( \hat{\mu}_{(1)}(X_i) - \hat{\mu}_{(0)}(X_i) + W_i \frac{Y_i - \hat{\mu}_{(1)}(X_i)}{\hat{e}(X_i)} - (1 - W_i) \frac{Y_i - \hat{\mu}_{(0)}(X_i)}{1 - \hat{e}(X_i)} \right)$$

$\hat{\tau}_{AIPW}$ is consistent if either the $\hat{\mu}_{(w)}(x)$ are consistent or $\hat{e}(x)$ is consistent.

⇒ The AIPW estimator has better statistical properties than IPW (Robins et al., 1994; Chernozhukov et al., 2018).

⇒ Any (machine learning) procedure, such as random forests or deep nets, can be used to estimate $\hat{e}(x)$ and $\hat{\mu}_{(w)}(x)$ without harming the interpretability of the causal effect estimation.

R package grf (Athey et al., 2019)
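The double robustness property can be checked numerically. The sketch below is an illustration under assumptions, not the authors' implementation: the data are simulated, and the nuisance functions are plugged in as known (oracle) or deliberately wrong functions instead of being estimated with grf or any other learner. The function names `ipw` and `aipw` are hypothetical helpers.

```python
import numpy as np

def ipw(y, w, e_hat):
    """Inverse propensity weighted estimate of the ATE."""
    return np.mean(w * y / e_hat - (1 - w) * y / (1 - e_hat))

def aipw(y, w, e_hat, mu0_hat, mu1_hat):
    """Augmented IPW estimate of the ATE."""
    return np.mean(
        mu1_hat - mu0_hat
        + w * (y - mu1_hat) / e_hat
        - (1 - w) * (y - mu0_hat) / (1 - e_hat)
    )

rng = np.random.default_rng(0)
n = 200_000
x = rng.normal(size=n)
e_true = 1.0 / (1.0 + np.exp(-x))          # true propensity score
w = rng.binomial(1, e_true)
y = x + 1.0 * w + rng.normal(size=n)       # mu_(0)(x) = x, mu_(1)(x) = x + 1, ATE = 1

e_wrong = np.full(n, 0.5)                  # misspecified propensity score
mu_wrong = np.zeros(n)                     # misspecified outcome model

ipw_wrong_e = ipw(y, w, e_wrong)                         # biased
aipw_wrong_e = aipw(y, w, e_wrong, x, x + 1.0)           # outcome models correct
aipw_wrong_mu = aipw(y, w, e_true, mu_wrong, mu_wrong)   # propensity score correct
print(ipw_wrong_e, aipw_wrong_e, aipw_wrong_mu)
```

In this setup the IPW estimate with the wrong propensity score is far from 1, while both AIPW estimates stay close to 1 as long as one of the two nuisance components is correct.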

slide-13
SLIDE 13

Causal inference: with missing attributes?

slide-14
SLIDE 14

Unconfoundedness with missing attributes?

Without any changes to the previous framework, the only straightforward (but generally biased) solution is complete-case analysis.

Covariates         Treatment   Outcome(s)
X1     X2    X3    W           Y(0)   Y(1)
NA     20    F     1           NA     T
-6     45    NA    0           F      NA
0      NA    M     1           NA     F
NA     32    F     1           NA     T
1      63    M     1           NA     F
-2     NA    M     0           T      NA

slide-15
SLIDE 15

Unconfoundedness with missing attributes?

Covariates         Treatment   Outcome
X1     X2    X3    W           Y
NA     20    F     1           T
-6     45    NA    0           F
0      NA    M     1           F
NA     32    F     1           T
1      63    M     1           F
-2     NA    M     0           T
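As a small illustration of what complete-case analysis does to such a table, the sketch below keeps only the rows with no missing covariate. The rows are transcribed from the toy table; cells that are blank in the extracted text (the treatment value of the second and last rows, and X1 of the third row) are assumed to be 0, and T/F outcomes are encoded as booleans.

```python
# Toy rows transcribed from the table (None encodes NA).
# Assumption: cells that are blank in the extracted text are taken to be 0.
rows = [
    {"X1": None, "X2": 20,   "X3": "F",  "W": 1, "Y": True},
    {"X1": -6,   "X2": 45,   "X3": None, "W": 0, "Y": False},
    {"X1": 0,    "X2": None, "X3": "M",  "W": 1, "Y": False},
    {"X1": None, "X2": 32,   "X3": "F",  "W": 1, "Y": True},
    {"X1": 1,    "X2": 63,   "X3": "M",  "W": 1, "Y": False},
    {"X1": -2,   "X2": None, "X3": "M",  "W": 0, "Y": True},
]

covariates = ("X1", "X2", "X3")

# Complete-case analysis keeps only rows where every covariate is observed.
complete_cases = [r for r in rows if all(r[c] is not None for c in covariates)]
fraction_kept = len(complete_cases) / len(rows)
print(len(complete_cases), "of", len(rows), "rows kept")
```

Only one of the six toy rows survives, which shows how quickly complete-case analysis discards data when several covariates are incomplete.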

slide-17
SLIDE 17

Unconfoundedness with missing attributes?

Without any changes to the previous framework, the only straightforward (but generally biased) solution is complete-case analysis.

→ Often not a good idea! What are the alternatives?

Two families of methods

  • Unconfoundedness despite missingness
  • Classical missing values mechanisms (MCAR, MAR, MNAR; Rubin, 1976)

slide-18
SLIDE 18

Unconfoundedness with missing attributes?

Unconfoundedness despite missingness

Adapt the initial assumptions so that treatment assignment is unconfounded given only the observed information, that is, the observed covariates and the response pattern.

slide-19
SLIDE 19

Unconfoundedness with missing attributes?

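Notations

  • Response pattern $R \in \{\mathrm{NA}, 1\}^p$, with $R_j := \mathbb{1}\{X_j \text{ is observed}\} + \mathrm{NA} \cdot \mathbb{1}\{X_j \text{ is missing}\}$,
  • $X^* = R \odot X \in (\mathbb{R} \cup \{\mathrm{NA}\})^p$

Unconfoundedness despite missingness

Treatment is unconfounded given $X^*$:

$$\{Y_i(1), Y_i(0)\} \perp\!\!\!\perp W_i \mid X^* \qquad (1)$$

or alternatively:

$$\{Y_i(1), Y_i(0)\} \perp\!\!\!\perp W_i \mid X_i, R_i, \qquad \begin{cases} \text{CIT: } W_i \perp\!\!\!\perp X_i \mid X_i^*, R_i \\ \quad \text{or} \\ \text{CIO: } Y_i(t) \perp\!\!\!\perp X_i \mid X_i^*, R_i \ \text{ for } t \in \{0, 1\} \end{cases} \qquad (2)$$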
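The notation can be mirrored in a few lines of code. This is a sketch with made-up numbers, in which NA is encoded as `np.nan` and the response pattern as a boolean mask (True for observed); both encodings are choices made for the example.

```python
import numpy as np

# Hypothetical complete covariates X (n = 3, p = 3) and mask of observed entries.
X = np.array([[1.1, 20.0, 0.3],
              [-6.0, 45.0, 1.2],
              [0.0, 15.0, -0.7]])
observed = np.array([[False, True, True],
                     [True, True, False],
                     [True, False, True]])

# X* keeps the observed entries and replaces the missing ones by NA (np.nan here).
X_star = np.where(observed, X, np.nan)

# The response pattern can be recovered from X* alone, so conditioning on X*
# means conditioning on both the observed values and the pattern.
R = ~np.isnan(X_star)
print(X_star)
print(R)
```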

slide-20
SLIDE 20

Unconfoundedness with missing attributes?

Unconfoundedness despite missingness

[Figure: causal graphs over the nodes X, X*, R, W and Y(w) for the two conditional independence assumptions of (2): (a) CIT, (b) CIO.]

slide-21
SLIDE 21

Generalized propensity score and random forests

Generalized propensity score (Rosenbaum and Rubin, 1984)

$e^*(X^*) = \mathbb{P}(W = 1 \mid X^*)$.

→ Allows balancing the treatment and control groups on the observed information $X^*$ in the case of missing values, under (1).

slide-22
SLIDE 22

Generalized propensity score and random forests

→ Random forests can incorporate missing values directly, since they allow semi-discrete variables (e.g. $X^* \in (\mathbb{R} \cup \{\mathrm{NA}\})^p$).

→ With a specific representation/encoding of missing values (missing incorporated in attributes, MIA), splits are possible either on the observed variables or on the response pattern (Josse et al., 2019).

slide-23
SLIDE 23

Generalized propensity score and random forests

→ Recursively find the partition that minimizes the empirical risk. For every covariate $X_j$ and threshold $z$, there are three possible splits:

  • $\{X_j^* \le z \text{ or } X_j^* = \mathrm{NA}\}$ vs $\{X_j^* > z\}$
  • $\{X_j^* \le z\}$ vs $\{X_j^* > z \text{ or } X_j^* = \mathrm{NA}\}$
  • $\{X_j^* = \mathrm{NA}\}$ vs $\{X_j^* \ne \mathrm{NA}\}$
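The three candidate splits can be enumerated explicitly for one covariate and one threshold. The sketch below is a toy illustration of the MIA splitting rule with a squared-error risk. It is not how grf implements it, and the helper names `mia_candidate_splits` and `split_risk` as well as the data are hypothetical.

```python
import numpy as np

def mia_candidate_splits(xj, z):
    """Return the three MIA candidate splits of one covariate at threshold z.

    Each split is a boolean mask selecting the left child; NaN encodes NA.
    """
    missing = np.isnan(xj)
    below = np.zeros_like(missing)
    below[~missing] = xj[~missing] <= z     # compare observed values only
    return {
        "NA goes left": below | missing,
        "NA goes right": below,
        "NA vs observed": missing,
    }

def split_risk(target, left):
    """Squared-error risk when each child predicts its own mean."""
    risk = 0.0
    for child in (target[left], target[~left]):
        if child.size:
            risk += float(np.sum((child - child.mean()) ** 2))
    return risk

# Toy data in which treatment is driven by the missingness of the covariate.
xj = np.array([0.2, np.nan, 1.5, np.nan, -0.3, 2.1])
w = np.array([0.0, 1.0, 0.0, 1.0, 0.0, 0.0])

splits = mia_candidate_splits(xj, z=1.0)
risks = {name: split_risk(w, left) for name, left in splits.items()}
best = min(risks, key=risks.get)
print(risks, best)
```

Because the treatment here depends on whether the covariate is observed, the split on the response pattern has the lowest risk, which is the kind of structure a forest with MIA is able to pick up.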

slide-24
SLIDE 24

Generalized propensity score and random forests

→ This procedure targets:

$$e^*(X^*) = \sum_{r \in \{0,1\}^p} \mathbb{E}[W \mid X^*, R = r] \, \mathbb{1}_{R = r}.$$

slide-25
SLIDE 25

Unconfoundedness with missing attributes?

Assumption on missing values mechanism

Assume standard unconfoundedness and a MAR mechanism. Then multiple imputation using $(X, W, Y)$ is consistent (Hill, 2004; Seaman and White, 2014; Leyrat et al., 2019).

→ Problem: can we use $Y$ for propensity score estimation?

→ What happens with informative missing values?

slide-26
SLIDE 26

Simulations

Setup

  • Different data generating models (linear, nonlinear, latent, etc.)
  • Different missingness mechanisms

Figure 2: Estimated and true average treatment effect ($\tau = 1$, MCAR).

[Figure: estimates by method (mean.loglin, mean.grf, mice, mf, saem, mia.grf) for the AIPW and IPW estimators, under standard unconfoundedness and under unconfoundedness despite missingness; tick labels 100, 500, 1000, 5000.]

slide-27
SLIDE 27

Simulations

Figure 2: Estimated and true average treatment effect ($\tau = 1$, MNAR).

[Figure: estimates by method (mean.loglin, mean.grf, mice, mf, saem, mia.grf) for the AIPW and IPW estimators, under standard unconfoundedness and under unconfoundedness despite missingness; tick labels 100, 500, 1000, 5000.]

slide-28
SLIDE 28

Simulations

Results

  • AIPW estimators perform better than their IPW counterparts.
  • For $\hat{\tau}_{mia}$, unconfoundedness despite missingness is indeed necessary.
  • $\hat{\tau}_{mia}$ is unbiased for all missingness mechanisms, in particular for MNAR.
  • Multiple imputation (mice) only requires standard unconfoundedness, but cannot handle informative missing values.

slide-29
SLIDE 29

Application: Traumabase

slide-30
SLIDE 30

Plausibility of underlying assumptions with Traumabase data

  • Unconfoundedness despite missingness: seems plausible (physicians decide based on what they observe and record).
  • Many variables have informative missing data.

slide-31
SLIDE 31

Results

40 covariates, 13 confounders. 3,168 patients.

ATE estimations (×100) for the effect of tranexamic acid on in-hospital mortality for TBI patients. Imputations on all patients (TBI + no-TBI).

[Figure: ATE (in %) with bootstrap confidence intervals (x-axis) for each estimation approach (y-axis): (a) MIA, (b) MF, (c) imputation (MICE). Solid: DR, dotted: IPW, turquoise: without adjustment. Legend title: Imputation.set.]

We compute the mortality rate in the treated group and the mortality rate in the control group (after covariate balancing). The obtained value corresponds to the difference in percentage points between mortality rates in treatment and control.
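For readers unfamiliar with bootstrap confidence intervals, here is a minimal sketch for the unadjusted difference in mortality rates, the only estimate that can be rebuilt from Table 1 alone. The percentile bootstrap with 1,000 resamples is an assumption; the slides do not say which bootstrap variant was used, and the adjusted (DR, IPW) estimates would need the patient-level covariates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Patient-level arrays rebuilt from the counts of Table 1 (no covariates).
w = np.concatenate([np.ones(602, dtype=int), np.zeros(2566, dtype=int)])
death = np.concatenate([
    np.ones(228), np.zeros(374),     # TA administered
    np.ones(399), np.zeros(2167),    # TA not administered
])

def unadjusted_difference(w, death):
    """Difference in mortality rates (percentage points), treated minus control."""
    return 100.0 * (death[w == 1].mean() - death[w == 0].mean())

point = unadjusted_difference(w, death)

n = len(w)
boot = np.empty(1000)
for b in range(boot.size):
    idx = rng.integers(0, n, size=n)   # resample patients with replacement
    boot[b] = unadjusted_difference(w[idx], death[idx])

# Percentile bootstrap interval at the 95% level.
ci_low, ci_high = np.percentile(boot, [2.5, 97.5])
print(f"{point:.1f} points, 95% CI [{ci_low:.1f}, {ci_high:.1f}]")
```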

slide-32
SLIDE 32

Conclusion

slide-33
SLIDE 33

Conclusion and perspectives

Conclusion

  • Missing attributes alter causal analyses.
  • Additional assumptions are needed, guaranteeing either unconfoundedness given missing values or MAR.
  • New proposal to handle missing values in causal inference.
  • Prefer AIPW to IPW estimators, in theory and in practice.
  • First application on real data.

slide-34
SLIDE 34

Conclusion and perspectives

Ongoing work

  • Extension to heterogeneous treatment effects (Athey and Imbens, 2015) and optimal policy learning (Imai and Ratkovic, 2013).
  • Compare results to those from the CRASH-3 study (Dewan et al., 2012).
  • Apply the methodology to other causal questions on the Traumabase (e.g. treatment "bundles" in case of TBI).

slide-35
SLIDE 35

Missing value website

"One of the ironies of Big Data is that missing data play an ever more significant role" (R. Samworth, 2019)

More information and details on missing values: R-miss-tastic platform.

→ Theoretical and practical tutorials, popular datasets, bibliography, workflows (in R), active contributors/researchers in the community, etc.

https://rmisstastic.netlify.com

M., Josse, J., Tierney, N., & Vialaneix, N. (2019). R-miss-tastic: a unified platform for missing values methods and workflows. arXiv preprint arXiv:1908.04822.

slide-36
SLIDE 36

THANK YOU

slide-37
SLIDE 37

References

slide-38
SLIDE 38

References i

Athey, S., Tibshirani, J., and Wager, S. (2019). Generalized random forests. The Annals of Statistics, 47(2):1148–1178.

Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., and Robins, J. (2018). Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal, 21(1):C1–C68.

Dewan, Y., Komolafe, E. O., Mejía-Mantilla, J. H., Perel, P., Roberts, I., and Shakur, H. (2012). CRASH-3 - tranexamic acid for the treatment of significant traumatic brain injury: study protocol for an international randomized, double-blind, placebo-controlled trial. Trials, 13(1):87.

Hill, J. (2004). Reducing bias in treatment effect estimation in observational studies suffering from missing data. Technical report, Institute for Social and Economic Research and Policy, Columbia University.

Josse, J., Prost, N., Scornet, E., and Varoquaux, G. (2019). On the consistency of supervised learning with missing values. arXiv preprint.

Leyrat, C., Seaman, S. R., White, I. R., Douglas, I., Smeeth, L., Kim, J., Resche-Rigon, M., Carpenter, J. R., and Williamson, E. J. (2019). Propensity score analysis with partially observed covariates: How should multiple imputation be used? Statistical Methods in Medical Research, 28(1):3–19.

Robins, J. M., Rotnitzky, A., and Zhao, L. P. (1994). Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association, 89(427):846–866.

slide-39
SLIDE 39

References ii

Rosenbaum, P. R. and Rubin, D. B. (1984). Reducing bias in observational studies using subclassification on the propensity score. Journal of the American Statistical Association, 79(387):516–524.

Rubin, D. B. (1976). Inference and missing data. Biometrika, 63(3):581–592.

Seaman, S. and White, I. (2014). Inverse probability weighting with missing predictors of treatment assignment or missingness. Communications in Statistics - Theory and Methods, 43(16):3499–3515.

Textor, J., Hardt, J., and Knüppel, S. (2011). DAGitty: a graphical tool for analyzing causal diagrams. Epidemiology, 22(5):745.

slide-40
SLIDE 40

Traumabase

Graph produced using DAGitty (Textor et al., 2011)

slide-41
SLIDE 41

Simulations: importance of CIT/CIO and performance of ˆ τmia

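Conditional independences

CIT: $W \sim Z \odot R$ (where $R_{i,j} = \mathbb{1}\{Z_{i,j} \text{ is observed}\}$ and $\odot$ is the Hadamard product).

Example: for fixed $\alpha \in \mathbb{R}^4$ and $\tau \in \mathbb{R}$:

$$r_i = (1, 1, 0, 0, 0, 1, 0, 0, 0, 1) \;\Rightarrow\; \mathrm{logit}\,\mathbb{P}(W_i = 1 \mid Z_{i,obs} = z_{i,obs}, R_i = r_i) = \alpha_0 + \alpha_1 z_{i,1} + \alpha_2 z_{i,2} + \alpha_6 z_{i,6} + \alpha_{10} z_{i,10}$$

$$r_j = (0, 1, 0, 0, 0, 0, 0, 0, 0, 0) \;\Rightarrow\; \mathrm{logit}\,\mathbb{P}(W_j = 1 \mid Z_{j,obs} = z_{j,obs}, R_j = r_j) = \alpha_0 + \alpha_2 z_{j,2}$$

¬CIT: $\mathrm{logit}\,\mathbb{P}(W_i = 1 \mid Z_i = z_i) = \alpha_0 + \alpha^T z_i$.

CIO: $Y \sim Z \odot R$.

Example: for fixed $\beta \in \mathbb{R}^4$ and $\tau \in \mathbb{R}$:

$$r_i = (1, 1, 0, 0, 0, 1, 0, 0, 0, 1) \;\Rightarrow\; \mathbb{E}(Y_i \mid Z_{i,obs} = z_{i,obs}, R_i = r_i, W_i = w_i) = \beta_0 + \beta_1 z_{i,1} + \beta_2 z_{i,2} + \beta_6 z_{i,6} + \beta_{10} z_{i,10} + \tau w_i$$

$$r_j = (0, 1, 0, 0, 0, 0, 0, 0, 0, 0) \;\Rightarrow\; \mathbb{E}(Y_j \mid Z_{j,obs} = z_{j,obs}, R_j = r_j, W_j = w_j) = \beta_0 + \beta_2 z_{j,2} + \tau w_j$$

¬CIO: $\mathbb{E}(Y_i \mid Z_i = z_i, W_i = w_i) = \beta_0 + \beta^T z_i + \tau w_i$.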
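The CIT treatment model can be simulated in a few lines. The sketch below uses arbitrary illustrative coefficients, a tiny sample, and a response pattern drawn completely at random apart from the first row, which is set to the pattern $r_i$ of the example. None of these choices come from the simulation study itself.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 5, 10

Z = rng.normal(size=(n, p))              # full covariates (hypothetical)
R = rng.integers(0, 2, size=(n, p))      # 1 = observed, 0 = missing
R[0] = [1, 1, 0, 0, 0, 1, 0, 0, 0, 1]    # pattern r_i from the example

alpha0 = -0.5                            # illustrative coefficients
alpha = rng.normal(size=p)

# CIT: only observed entries enter the treatment model (Hadamard product Z * R).
logit_cit = alpha0 + (Z * R) @ alpha

# Not CIT: all entries enter the treatment model, whether observed or not.
logit_full = alpha0 + Z @ alpha

# Draw the treatment under CIT.
p_treat = 1.0 / (1.0 + np.exp(-logit_cit))
W = rng.binomial(1, p_treat)
print(logit_cit[0], logit_full[0], W)
```

For the first row, the CIT logit reduces to the intercept plus the terms for covariates 1, 2, 6 and 10 (indices 0, 1, 5 and 9 in the code), matching the first display of the example.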