Treatment effect estimation with missing attributes
Julie Josse
École Polytechnique, INRIA Visiting Researcher, Google Brain
Mathematical Methods of Modern Statistics, June 2020
Collaborators
Methods: Imke Mayer (PhD X, EHESS)
Treatment   Survived        Deceased      Pr(survived | trt)   Pr(deceased | trt)
HCQ         497 (11.4%)     111 (2.6%)    0.817                0.183
HCQ+AZI     158 (3.6%)      54 (1.2%)     0.745                0.255
None        2699 (62.1%)    830 (19.1%)   0.765                0.235
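The conditional probabilities are the row-wise survival proportions; a quick R check (the vector names are mine):

surv <- c(HCQ = 497, HCQ_AZI = 158, none = 2699)
dead <- c(HCQ = 111, HCQ_AZI = 54,  none = 830)
round(surv / (surv + dead), 3)  # 0.817 0.745 0.765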
[Figure: densities of Age in each treatment arm (HCQ vs. nothing), with mean and median marked.]
Comparison of the distribution of Age between HCQ-treated and untreated patients.
Covariates            Treatment   Outcome(s)
X1     X2    X3       W           Y(0)        Y(1)
1.1    20    F        1           ?           Survived
       45    F        0           Dead        ?
       15    M        1           ?           Survived
...    ...   ...      ...         ...         ...
       52    M        0           Survived    ?
Setup: covariates $X$, treatment $W$, observed outcome $Y = W\,Y(1) + (1-W)\,Y(0)$; unconfoundedness: $\{Y(0), Y(1)\} \perp\!\!\!\perp W \mid X$.
[Figure: treated units have higher X on average; reweighting control units with high X adjusts for the difference. Credit: S. Athey]
Doubly robust AIPW estimator:
$$\hat{\tau}_{\mathrm{AIPW}} = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{\mu}_{(1)}(X_i)-\hat{\mu}_{(0)}(X_i) + W_i\,\frac{Y_i-\hat{\mu}_{(1)}(X_i)}{\hat{e}(X_i)} - (1-W_i)\,\frac{Y_i-\hat{\mu}_{(0)}(X_i)}{1-\hat{e}(X_i)}\right)$$
$$\sqrt{n}\,\big(\hat{\tau}_{\mathrm{AIPW}}-\tau\big)\ \xrightarrow[n\to\infty]{d}\ \mathcal{N}(0, V^*),\quad V^* \text{ the semiparametric efficient variance.}$$
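As a concrete illustration, a minimal AIPW sketch in R on simulated data, with glm/lm nuisances standing in for any consistent estimators of e(x) and the mu_(w)(x); the data-generating process and all names are illustrative, not from the slides:

set.seed(1)
n <- 2000
X <- rnorm(n)
e <- plogis(0.5 * X)                    # true propensity score
W <- rbinom(n, 1, e)
Y <- X + W * (1 + 0.5 * X) + rnorm(n)   # true ATE = 1 (since E[X] = 0)
ps  <- glm(W ~ X, family = binomial)$fitted.values
mu1 <- predict(lm(Y ~ X, subset = W == 1), data.frame(X = X))
mu0 <- predict(lm(Y ~ X, subset = W == 0), data.frame(X = X))
mean(mu1 - mu0 + W * (Y - mu1) / ps - (1 - W) * (Y - mu0) / (1 - ps))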
[Figure: percentage of missing values per variable (axis: 20, 40, 60): age, days_since_first_case_datetime, gender, num_hospitals, period, asthma, cancer, chemotherapy_radiotherapy, chronic_hepatic_disease, chronic_obstructive_pulmonary_disease, chronic_respiratory_failure, diabetes, dyslipidemia, heart_arrhythmia, hematological_malignancies, hypertension, ischemic_heart_disease, kidney_disease, obesity, smoker, CREAT_value, CRP_value, PNN_value, LYM_value, TP_value, GDS_PaCO2_value, GDS_PaO2_value, weigh_kg, GDS_SAT_value, LDH_value, DDI_value.]
Covariates            Treatment   Outcome(s)
X1     X2    X3       W           Y(0)        Y(1)
NA     20    F        1           ?           Survived
       45    NA       0           Dead        ?
NA           M        1           ?           Survived
NA     32    F        1           ?           Dead
1      63    M        1           ?           Dead
NA           M        0           Survived    ?
Same data with incomplete-covariate notation:
Covariates            Treatment   Outcome(s)
X*1    X*2   X*3      W           Y(0)   Y(1)
NA     20    F        1           ?      S
       45    NA       0           D      ?
NA           M        1           ?      S
NA     32    F        1           ?      D
1      63    M        1           ?      D
NA           M        0           S      ?
With incomplete covariates, the same AIPW form applies with nuisances defined on $X^*$:
$$\hat{\tau}^* = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{\mu}^*_{(1)}(X^*_i)-\hat{\mu}^*_{(0)}(X^*_i) + W_i\,\frac{Y_i-\hat{\mu}^*_{(1)}(X^*_i)}{\hat{e}^*(X^*_i)} - (1-W_i)\,\frac{Y_i-\hat{\mu}^*_{(0)}(X^*_i)}{1-\hat{e}^*(X^*_i)}\right)$$
where the generalized regression functions decompose over the $2^d$ missingness patterns, e.g. $\mu^*_{(w)}(x^*) = \sum_{r\in\{0,1\}^d}\mathbb{E}\big[Y \mid X_{obs(r)}=x_{obs(r)},\, R=r\big]\,\mathbf{1}_{r(x^*)=r}$.¹
1. On the consistency of supervised learning with missing values. J., Prost, Scornet, Varoquaux. JMLR 2020.
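In practice this can be run directly on the incomplete covariate matrix: assuming grf >= 1.1, causal_forest accepts NAs in X and handles them with the MIA splitting rule, and average_treatment_effect returns an AIPW-style estimate. A sketch, with X (matrix possibly containing NAs), W and Y as placeholder names:

library(grf)
cf <- causal_forest(X, Y, W)   # NAs in X are allowed (handled via MIA)
average_treatment_effect(cf)   # doubly robust ATE estimate and std. error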
[Figure: three scatterplots of gravity score (GCS) against systolic blood pressure.]
Multiple imputation: generate several plausible values for each missing entry, producing $M$ completed datasets.

Incomplete data:
X*1    X*2   X*3   ...   W   Y
NA     20    10    ...   1   survived
       45    NA    ...   1   survived
NA           30    ...       died
NA     32    35    ...       survived
       NA    12    ...       died
1      63    40    ...   1   survived

Each of the $M = 3$ completed datasets replaces the NAs with different draws (e.g. the first-row NA becomes 3 in one completed table and 7 in another) while observed entries stay unchanged; the analysis is then run on each completed table.
Combining the $M$ estimates (Rubin's rules):
$$\hat{\tau}_{MI} = \frac{1}{M}\sum_{m=1}^{M}\hat{\tau}_m, \qquad \widehat{V} = \frac{1}{M}\sum_{m=1}^{M}\widehat{V}_m + \left(1+\frac{1}{M}\right)\frac{1}{M-1}\sum_{m=1}^{M}\big(\hat{\tau}_m-\hat{\tau}_{MI}\big)^2$$
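A minimal multiple-imputation sketch with mice, pooling a (placeholder) difference-in-means ATE across completed datasets via the rules above; dat, W and Y are illustrative names, not from the slides:

library(mice)
M   <- 20
imp <- mice(dat, m = M, printFlag = FALSE)
tau_m <- sapply(seq_len(M), function(m) {
  d <- complete(imp, m)                       # m-th completed dataset
  mean(d$Y[d$W == 1]) - mean(d$Y[d$W == 0])   # plug in any ATE estimator here
})
tau_hat <- mean(tau_m)   # pooled point estimate
B <- var(tau_m)          # between-imputation variance
# total variance: T = mean(V_m) + (1 + 1/M) * B, with V_m the within variances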
[DAG: complete covariates X, observed incomplete covariates X*, response pattern R, latent factors Z, treatment W, outcome Y with potential outcomes {Y(0), Y(1)}.]
[Table: each method is scored (✓/✗) on: covariate model (multivariate normal vs. general), missingness mechanism (M(C)AR vs. general), unconfoundedness assumption (classical, missing, or latent), and model for (W, Y) (logistic-linear² vs. nonparametric); rows include 3. MissDeepCausal.]
2. Use of EM algorithms for logistic regression with missing values. Jiang et al. (2020).
[Figure: ATE estimation error in the complete-data unconfoundedness setting, for sample sizes n = 100, 500, 1000, 5000 (x-range −5 to 5); methods: mean.loglin, mice, mf, saem, grf.]
[Figure: distribution of ATE estimates (range 0.6 to 1.2) for MDC.mi, MDC.process, MF, MICE, X, Z.]
HCQ vs. nothing, ATE estimation (4137 patients).
[Figure: ATE (×100, range −50 to 50) with confidence intervals; y-axis: estimation approach (Matrix Facto.grf, GRF-MIA, MICE.grf, Matrix Facto.glm, MICE.glm, plus an unadjusted estimate); solid: doubly robust AIPW, dotted: IPW.]
Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., and Robins, J. (2018). Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal, 21(1):C1–C68.
Jiang, W., Josse, J., Lavielle, M., and the TraumaBase Group (2020). Logistic regression with missing covariates: parameter estimation, model selection and prediction within a joint-modeling framework. Computational Statistics & Data Analysis.
Kallus, N., Mao, X., and Udell, M. (2018). Causal inference with noisy and missing covariates via matrix factorization. In Advances in Neural Information Processing Systems, pages 6921–6932.
Louizos, C., Shalit, U., Mooij, J. M., Sontag, D., Zemel, R., and Welling, M. (2017). Causal effect inference with deep latent-variable models. In Advances in Neural Information Processing Systems, pages 6446–6456.
Mattei, P.-A. and Frellsen, J. (2019). MIWAE: Deep generative modelling and imputation of incomplete data sets. In Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pages 4413–4423. PMLR.
Mayer, I., Josse, J., Tierney, N., and Vialaneix, N. (2019). R-miss-tastic: a unified platform for missing values methods and workflows. arXiv preprint arXiv:1908.04822.
Rosenbaum, P. R. and Rubin, D. B. (1984). Reducing bias in observational studies using subclassification on the propensity score. Journal of the American Statistical Association, 79(387):516–524.
Rubin, D. B. (1976). Inference and missing data. Biometrika, 63(3):581–592.
Seaman, S. and White, I. (2014). Inverse probability weighting with missing predictors of treatment assignment or missingness. Communications in Statistics - Theory and Methods, 43(16):3499–3515.
Unconfoundedness with incomplete covariates: either CIT, $W_i \perp\!\!\!\perp \{Y_i(0), Y_i(1)\} \mid X^*_i, R_i$, or CIO, $Y_i(w) \perp\!\!\!\perp W_i \mid X^*_i, R_i$ for $w \in \{0, 1\}$.
[DAGs over X, X*, R, W, Y, {Y(0), Y(1)} illustrating the two assumptions.]
$(X_i, Y_i)$ i.i.d. $\sim \mathcal{N}_2((\mu_x, \mu_y), \Sigma_{xy})$
[Figure: scatterplots of the complete bivariate normal sample, the sample with missing entries, and the mean-imputed sample; mean imputation distorts the joint distribution.]
[Figure: PCA of the ecological data³, individuals and variables factor maps, two versions: Dim 1 (44.79%) / Dim 2 (23.50%) and Dim 1 (91.18%) / Dim 2 (4.97%). Biomes: alpine, boreal, desert, grass/m, temp_for, temp_rf, trop_for, trop_rf, tundra, wland; traits: LL, LMA, Nmass, Pmass, Amass, Rmass.]
library(FactoMineR)
PCA(ecolo)
# Warning message: Missing values are imputed by the mean of the variable
# You should instead use imputePCA from missMDA:
library(missMDA)
nb  <- estim_ncpPCA(ecolo)             # choose the number of dimensions
imp <- imputePCA(ecolo, ncp = nb$ncp)  # iterative low-rank imputation
PCA(imp$completeObs)
3. Wright, I. et al. (2004). The worldwide leaf economics spectrum. Nature.
[Figure: scatterplots of (X, Y) under mean imputation, regression imputation, and stochastic regression imputation.]
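The three panels are easy to reproduce; a small simulation (all names mine) showing how mean and regression imputation respectively shrink and inflate the X-Y association while the stochastic variant preserves it:

set.seed(1)
n <- 1000
x <- rnorm(n)
y <- x + rnorm(n)
m <- runif(n) < 0.3                # 30% of y missing completely at random
fit <- lm(y ~ x, subset = !m)
y_mean <- replace(y, m, mean(y[!m]))                          # mean imputation
y_reg  <- replace(y, m, predict(fit, data.frame(x = x[m])))   # regression
y_sto  <- replace(y, m, predict(fit, data.frame(x = x[m])) +
                        rnorm(sum(m), 0, sigma(fit)))         # stochastic regression
round(c(true = cor(x, y), mean = cor(x, y_mean),
        reg = cor(x, y_reg), sto = cor(x, y_sto)), 2)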
Low-rank imputation: $X = \mu + \varepsilon$ with $\mu$ low rank and $\varepsilon$ i.i.d. noise (softImpute, Hastie & Mazumder; missMDA, J. & Husson).⁴ ⁵
4. J., Husson, Robin & Narasimhan (2018). Imputation of mixed data with multilevel SVD.
5. https://rmisstastic.netlify.com/
Mean imputation of $x = (x_1, x_2, \dots, x_d)$: $x'_1 = x_1\,\mathbf{1}_{R_1=0} + \mathbb{E}[X_1]\,\mathbf{1}_{R_1=1}$, and the Bayes predictor on the imputed data satisfies $f^*_{\mathrm{impute}}(x') = \mathbb{E}[Y \mid X^* = x^*]$.
[Figure: four scatterplots of y against x.]
[Figure: explained variance vs. sample size (10³ to 10⁵) for decision trees, random forests, and XGBoost on three problems (linear with high noise, Friedman with high noise, non-linear with low noise); methods: surrogates (rpart), mean imputation, Gaussian imputation, MIA, Bayes rate, block (XGBoost).]
CART selects the split by minimizing, over candidate splits $(j, z) \in \mathcal{S}$, a criterion computed on observed values only:
$$\min_{(j,z)\in\mathcal{S}}\ \mathbb{E}\left[\big(Y-\mathbb{E}[Y\mid X_j\le z]\big)^2\cdot\mathbf{1}_{X_j\le z,\,R_j=0} + \big(Y-\mathbb{E}[Y\mid X_j> z]\big)^2\cdot\mathbf{1}_{X_j> z,\,R_j=0}\right]$$
Incomplete observations are then propagated down the tree with surrogate splits⁶ (rpart, partykit) or probabilistic splits, sent left with probability #L / (#L + #R) (RWeka).
6. Variable selection bias (not a problem for prediction): partykit package, Hothorn et al.
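For instance, rpart implements surrogate splits out of the box; the usesurrogate option controls how observations missing the primary split variable are routed (the data frame dat and response Y are placeholders):

library(rpart)
fit <- rpart(Y ~ ., data = dat,
             control = rpart.control(usesurrogate = 2, maxsurrogate = 5))
summary(fit)  # lists primary and surrogate splits at each node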
MIA (Missing Incorporated in Attributes): optimize over $f \in \mathcal{P}_{c,\mathrm{miss}}$, where each split takes one of the forms
$\{X^*_j \le z \ \vee\ X^*_j = \mathrm{NA}\}$ vs. $\{X^*_j > z\}$
$\{X^*_j \le z\}$ vs. $\{X^*_j > z \ \vee\ X^*_j = \mathrm{NA}\}$
$\{X^*_j \ne \mathrm{NA}\}$ vs. $\{X^*_j = \mathrm{NA}\}$
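MIA can also be emulated with any standard tree implementation by duplicating each incomplete feature, sending its NAs to +Inf in one copy and to -Inf in the other, so that every MIA split above is available as an ordinary CART cut; a sketch of this encoding (the function name is mine):

mia_encode <- function(X) {
  # X: data frame of numeric features, possibly with NAs
  out <- lapply(names(X), function(j) {
    xp <- X[[j]]; xp[is.na(xp)] <- Inf    # any split X <= z sends NAs right
    xm <- X[[j]]; xm[is.na(xm)] <- -Inf   # any split X <= z sends NAs left
    setNames(data.frame(xp, xm), paste0(j, c(".na.right", ".na.left")))
  })
  do.call(cbind, out)  # splitting a copy beyond its finite range isolates NAs
}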
Target: the pattern-wise Bayes predictor $f^*(X^*) = \sum_{r\in\{0,1\}^d}\mathbb{E}\big[Y\mid X_{obs(r)}, R=r\big]\,\mathbf{1}_{R=r}$.⁷
7. Implemented in conditional forests (partykit), generalized random forests (grf), and scikit-learn.