Classifying Treatment Responders Under Causal Effect Monotonicity


SLIDE 1

Classifying Treatment Responders Under Causal Effect Monotonicity

Nathan Kallus June 12, 2019 ICML 2019 ORIE and Cornell Tech, Cornell University

SLIDE 2

Heterogeneous Treatment Effect Estimation

X_Age  X_Weight  X_BMI  X_SysBP  T (Anticoagulant)  Y (Hemorrhage)
49 106 31 Warfarin 1
54 89 26 None
43 130 38 None 1
. . .

Fit the conditional average treatment effect (CATE) τ(X) = E[Y(1) − Y(0) | X] to data on X, T, Y

E.g.: Causal Forest (Wager & Athey ’17), TARNet (Shalit et al. ’17), ...
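
As a rough illustration of what "fitting τ(X)" means, the sketch below estimates the CATE by a difference in mean outcomes within each feature stratum. It assumes randomized treatment and a single discrete feature; it is not Causal Forest or TARNet, and the function name `cate_by_stratum` and the toy rows are hypothetical.

```python
from collections import defaultdict

def cate_by_stratum(rows):
    """Difference in mean outcome (treated minus control) per stratum of x.

    rows: iterable of (x, t, y) with discrete x and binary t, y.
    Assumes treatment was randomized, so the difference in means is causal.
    """
    # Per stratum: [sum of y among treated, n treated, sum of y among control, n control]
    stats = defaultdict(lambda: [0.0, 0, 0.0, 0])
    for x, t, y in rows:
        s = stats[x]
        if t == 1:
            s[0] += y
            s[1] += 1
        else:
            s[2] += y
            s[3] += 1
    # Only report strata that contain both treated and control units.
    return {x: s[0] / s[1] - s[2] / s[3]
            for x, s in stats.items() if s[1] > 0 and s[3] > 0}

# Hypothetical toy data: (stratum, treatment, outcome)
rows = [("young", 1, 1), ("young", 1, 1), ("young", 0, 0), ("young", 0, 0),
        ("old", 1, 1), ("old", 1, 1), ("old", 0, 1), ("old", 0, 0)]
tau_hat = cate_by_stratum(rows)
```

With continuous or high-dimensional features, strata are too sparse for this to work, which is where the flexible estimators cited above come in.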

Nathan Kallus Classifying Treatment Responders Under Causal Effect Monotonicity 2

SLIDE 3

Often Outcome is Binary

Treatment (T)                      Outcome Observed (Y)
Give anticoagulant                 Hemorrhage?
Personalized discount              Buy?
Target job training                Employed in 6 months?
Homelessness prevention program    Re-enter?
Recidivism prevention program      Recidivate?
Support for minority CS students   Drop out?

SLIDE 4

Often We Want to Predict Response

Treatment (T)                      Individual Label of Interest (Y(1) − Y(0))
Give anticoagulant                 Hemorrhage iff medicated
Personalized discount              Would buy iff discounted
Target job training                Would get job iff trained
Homelessness prevention program    Re-enter iff not targeted
Recidivism prevention program      Recidivate iff not targeted
Support for minority CS students   Drop out iff not targeted

SLIDE 5

Classifying Responders: The Problem

◮ Each unit consists of
  ◮ Features X
  ◮ Potential outcomes Y(1), Y(0) ∈ {0, 1}

◮ “Non-responder” has Y(0) = Y(1)
  ◮ Would’ve bought (or not bought) regardless of discount
  ◮ Would’ve hemorrhaged (or not) regardless of anticoagulant

◮ “Responder” has Y(1) = 1 > 0 = Y(0)
  ◮ Would’ve bought if and only if offered discount
  ◮ R = I[Y(1) > Y(0)]
  ◮ Ground truth NOT observed in X, T, Y data

◮ Want classifier f : X → {0, 1} with small loss
  Lθ(f) = θ P(false positive) + (1 − θ) P(false negative)
        = θ P(f(X) = 1, R = 0) + (1 − θ) P(f(X) = 0, R = 1)
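
To make the loss concrete, here is a minimal sketch that evaluates the empirical version of Lθ(f) on toy data. The function name `weighted_loss` and the label vectors are hypothetical, and the true responder labels R are available only because the data are made up; in real X, T, Y data they are never observed.

```python
def weighted_loss(f_pred, r_true, theta):
    """Empirical L_theta: theta * P(false positive) + (1 - theta) * P(false negative)."""
    n = len(r_true)
    # False positive: predicted responder (f = 1) but truly a non-responder (R = 0).
    fp = sum(1 for f, r in zip(f_pred, r_true) if f == 1 and r == 0)
    # False negative: predicted non-responder (f = 0) but truly a responder (R = 1).
    fn = sum(1 for f, r in zip(f_pred, r_true) if f == 0 and r == 1)
    return theta * fp / n + (1 - theta) * fn / n

# Hypothetical toy labels (R is known here only because the data are synthetic).
r = [1, 0, 0, 1, 0]
f = [1, 1, 0, 0, 0]
loss_balanced = weighted_loss(f, r, theta=0.5)    # equal weight on both error types
loss_fp_only = weighted_loss(f, r, theta=1.0)     # only false positives are penalized
```

Setting θ closer to 1 penalizes false positives more heavily, and θ closer to 0 penalizes missed responders more heavily.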

SLIDE 7

Monotonicity

◮ Monotone treatment response assumption: Y(1) ≥ Y(0)
  ◮ Discount never causes a would-be buyer to not buy
  ◮ Job training never causes someone to not get employed?

◮ Under monotonicity, R = Y(1) − Y(0) ∈ {0, 1}
  ◮ So, P(R = 1 | X) = τ(X) = E[Y(1) − Y(0) | X]
  ◮ f(X) = I[τ(X) ≥ θ] minimizes Lθ(f)
  ◮ Can take plug-in approach using any CATE estimator τ̂
  ◮ Question: any value to a direct classification approach?
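
The optimality of the threshold rule follows from a short pointwise argument, sketched here using only the definitions of Lθ and τ from the slides and the monotonicity identity P(R = 1 | X) = τ(X):

```latex
\begin{align*}
L_\theta(f)
  &= \theta\, P(f(X)=1,\ R=0) + (1-\theta)\, P(f(X)=0,\ R=1) \\
  &= E\big[\, \theta\, f(X)\,\bigl(1-\tau(X)\bigr)
        + (1-\theta)\,\bigl(1-f(X)\bigr)\,\tau(X) \,\big].
\end{align*}
% For each fixed X, predicting f(X)=1 costs \theta(1-\tau(X)),
% while predicting f(X)=0 costs (1-\theta)\tau(X). Hence f(X)=1 is optimal when
\[
\theta\,\bigl(1-\tau(X)\bigr) \le (1-\theta)\,\tau(X)
\iff \theta \le \tau(X),
\]
% which is the rule f(X) = I[\tau(X) \ge \theta].
```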

SLIDE 8

Classifying Responders

◮ For simplicity, consider completely randomized data with P(T = 1) = 0.5

◮ Let Z = I[Y = T] (observable!)
  ◮ R = 1 ⟹ Z = 1
  ◮ R = 0 ⟹ Z ∼ Bernoulli(0.5)

◮ Z is like a corrupted observation of R

◮ Seeing Z = 0 is more informative about R: Z = 0 rules out a responder, while Z = 1 is consistent with either a responder or a non-responder

◮ Using Z as a surrogate label for R leads to new direct approaches to the classification problem

◮ Two instantiations of this are RespSVM, RespNet
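
The relationship between Z and R can be checked with a small simulation. This is a sketch under assumed values, not the RespSVM or RespNet method: the constant responder probability `tau`, the baseline outcome rate of 0.2 for non-responders, and the sample size are all hypothetical. Since P(Z = 1) = τ + (1 − τ)/2 under the setup above, the responder probability should be recoverable as 2 P(Z = 1) − 1.

```python
import random

random.seed(0)
n = 200_000
tau = 0.3            # assumed probability of being a responder (hypothetical)
baseline_rate = 0.2  # assumed P(Y = 1) for non-responders (hypothetical)

z_count = 0
responders_with_z0 = 0
for _ in range(n):
    r = 1 if random.random() < tau else 0           # responder label R (unobserved in practice)
    if r:
        y0, y1 = 0, 1                               # responder: Y(1) = 1 > 0 = Y(0)
    else:
        y0 = 1 if random.random() < baseline_rate else 0
        y1 = y0                                     # non-responder: Y(1) = Y(0)
    t = 1 if random.random() < 0.5 else 0           # completely randomized, P(T = 1) = 0.5
    y = y1 if t == 1 else y0                        # observed outcome
    z = 1 if y == t else 0                          # surrogate label Z = I[Y = T]
    z_count += z
    if r == 1 and z == 0:
        responders_with_z0 += 1

p_z = z_count / n
tau_hat = 2 * p_z - 1   # inverts P(Z = 1) = tau + (1 - tau) / 2
```

In this simulation no responder ever has Z = 0, and `tau_hat` lands close to the assumed `tau`, regardless of the baseline rate chosen for non-responders.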

SLIDE 9

Empirical Results: Synthetic

[Figure: four scatter plots, each on axes running from −3 to 3. Panels: "The true label R" (legend: Responder, Non-responder); "The observable label Z" (legend: Z = +1, Z = −1); "T = +1" (legend: T = +1, Y = +1; T = +1, Y = −1); "T = 0" (legend: T = −1, Y = +1; T = −1, Y = −1).]

SLIDE 10

Empirical Results: Synthetic

[Figure: accuracy (0.5 to 1.0) against a log-scale horizontal axis from 10^1 to 10^3, in two rows of three panels (d = 2, d = 10, d = 20). First row: linear responder classification boundary. Second row: spherical responder classification boundary. Methods compared: RespSVM lin, RespSVM RBF, RespLR-gen, RespLR-disc, RespNet-gen, RespNet-disc, RF, CF, TARNet.]

SLIDE 11

Empirical Results: Census Data

◮ Predict whether the sex-at-birth of mother’s first two kids being the same influences her decision to have a third

  ◮ Follows data construction by Angrist & Evans ’96
  ◮ Covariates: ethnicity of mother and father; their ages at marriage, at census, at 1st kid, and at 2nd kid; year of marriage; and education level

Method  Lθ (in 0.01)  % 1st  % 2nd  % 3rd
RespSVM lin  49 ± 2.7  100%
RespLR-gen  57 ± 2.4  100%
RespLR-disc  58 ± 2.3  2%
LR  58 ± 2.3  92%
RF  58 ± 2.3  6%

SLIDE 12

Thank you!

Poster: Today 6:30pm @ Pacific Ballroom #74