SLIDE 1
Examining moderated effects of additional adolescent substance use treatment: Structural nested mean model estimation using inverse-weighted regression-with-residuals
Daniel Almirall1, Daniel F. McCaffrey2, Beth Ann Griffin2, Rajeev Ramchand2, Susan A. Murphy1
1Univ of Michigan, Institute for Social Research 2RAND, Statistics
Institute of Mathematical Statistics Asia Pacific Rim Meeting — July 2, 2012
SLIDE 2
1 Time-Varying Setting
The data structure in the time-varying setting is:
S0  a1  S1(a1)  a2  S2(ā2)  a3  Y(ā3)
Motivating Example: Adolescents & Substance Use Treatment
- S0: Age, severity @ intake, contr. env in p.90
- a1: 0-3mo treatment; binary, a1 = yes/no
- S1(a1): Severity @ 0-3mo
- a2: 3-6mo treatment; binary, a2 = yes/no
- S2(a1, a2): Severity @ 3-6mo
- a3: 6-9mo treatment; binary, a3 = yes/no
- Y(a1, a2, a3): Substance use frequency 9-12mo
SLIDE 3
2 What Scientific Question of Interest?
The data structure: {S0, a1, S1(a1), a2, S2(a1, a2), a3, Y(a1, a2, a3)}.
We began wondering about: Cumulative effect of treatment?
Observed treatment sequences in data are:
(A1, A2, A3)   Rate
(0,0,0)        11%
(1,0,0)        41%
(1,1,0)        19%
(1,1,1)        17%
(0,0,1)         2%
(0,1,1)         2%
(1,0,1)         5%
(0,1,0)         2%
More specific questions emerged: What are the incremental effects of additional substance use treatment? Are these effects heterogeneous? i.e., do they differ as a function of severity at intake and improvements over time?
SLIDE 4
3 Time-Varying Effect Moderation
The data structure: {S0, a1, S1(a1), a2, S2(a1, a2), a3, Y(a1, a2, a3)}.
Overarching question: What are the incremental effects of additional substance use treatment, as a function of severity at intake and improvements over time?
More specifically, there are 3 types of causal effects of interest:
- 1. Distal moderated effect of initial treatment: What are the effects of (1,0,0) vs (0,0,0) on Y given S0?
- 2. Medial moderated effect of cumulative treatment: What are the effects of (1,1,0) vs (1,0,0) on Y given (S0, S1)?
- 3. Proximal moderated effect of cumulative treatment: What are the effects of (1,1,1) vs (1,1,0) on Y given (S0, S1, S2)?
SLIDE 5
What are the distal moderated effects of initial treatment?
What are the effects of (1,0,0) vs (0,0,0) on Y given S0? µ1 = E[Y (1, 0, 0) − Y (0, 0, 0) | S0 = s0]
S0 a1 a2 = 0 a3 = 0 Y (¯ a3)
SLIDE 6
What are the medial moderated effects of cumulative initial treatment?
What are the effects of (1,1,0) vs (1,0,0) on Y given (S0, S1)? µ2 = E[Y (1, 1, 0) − Y (1, 0, 0) | S0 = s0, S1(1) = s1]
S0 a1 a2 a3 = 0 Y (¯ a3) S1(a1)
SLIDE 7
What are the proximal moderated effects of cumulative initial treatment?
What are the effects of (1,1,1) vs (1,1,0) on Y given (S0, S1, S2)? µ3 = E[Y (1, 1, 1) − Y (1, 1, 0) | ¯ S2(1, 1) = ¯ s2]
S0 a1 S1(a1) a2 S2(¯ a2) a3 Y (¯ a3)
SLIDE 8
4 Robins’ Structural Nested Mean Model
decomposes E(Y | ·) into nuisance and causal parts:
\[
\begin{aligned}
E[Y(a_1, a_2) \mid S_0, S_1(a_1)]
  &= E[Y(0,0)] \\
  &\quad + \{E[Y(0,0) \mid S_0] - E[Y(0,0)]\} \\
  &\quad + E[Y(a_1,0) - Y(0,0) \mid S_0] \\
  &\quad + \{E[Y(a_1,0) \mid \bar{S}_1(a_1)] - E[Y(a_1,0) \mid S_0]\} \\
  &\quad + E[Y(a_1,a_2) - Y(a_1,0) \mid \bar{S}_1(a_1)] \\
  &= \mu_0 + \epsilon_1(s_0) + \mu_1(s_0, a_1) + \epsilon_2(\bar{s}_1, a_1) + \mu_2(\bar{s}_1, \bar{a}_2)
\end{aligned}
\]
Constraint: $\mu_t = 0$ when $a_t = 0$
Constraint: $E_{S_1 \mid S_0}[\epsilon_2(\bar{s}_1, a_1) \mid S_0 = s_0] = 0$, and $E_{S_0}[\epsilon_1(s_0)] = 0$
SLIDE 9
5 Problems with Traditional Regression
Ex: Use the Traditional Estimator to model the t = 2 SNMM as:
\[
E(Y \mid \bar{S}_1 = \bar{s}_1, \bar{A}_2 = \bar{a}_2) = \beta_0^* + \eta_1 s_0 + \beta_1^* a_1 + \beta_2^* a_1 s_0 + \eta_2 s_1 + \beta_3^* a_2 + \beta_4^* a_2 s_0 + \beta_5^* a_2 s_1
\]
- Two problems arise from the way we condition on St: (1) WRONG EFFECT, (2) SPURIOUS BIAS
- One problem arises from not adjusting for time-varying confounders: (3) TIME-VARYING CONFOUNDING BIAS
SLIDE 10
First problem with the Traditional Approach
Wrong Effect
S0 a1 a2 = 0 Y (¯ a2) S1
But what about the effect transmitted through S1(a1)? So the end result is that the term $\beta_1^* a_1 + \beta_2^* a_1 s_0$ does not capture the “total” impact of (a1, 0) vs (0, 0) on Y given values of S0.
SLIDE 11
Second problem with the Traditional Approach
Spurious Bias
S0 a1 a2 = 0 Y (¯ a2) S1 V
This is also known as “Berkson’s paradox”; and is related to Judea Pearl’s back-door criterion and “collider bias”
SLIDE 12
Intuition about the Spurious Bias
[Diagram: Txt → Substance Use (−); Social Support → Substance Use (−); Social Support → Substance Use Later (−)]
Imagine an adolescent who is a high user despite getting treated:
Q: What does this tell you in terms of his social support?
A: There must be poor social support.
Implication: Conditional on substance use, getting treated is associated with more substance use! Bias is −1(−)(−)(−) = +.
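The sign argument can be checked with a small simulation. The sketch below is illustrative only: the variable names, coefficients, and sample size are assumptions, not values from the study. Treatment is randomized and has no effect at all on later use, yet adjusting for post-treatment substance use (which depends on both treatment and unmeasured social support) yields a positive coefficient on treatment.

```python
import random

def ols(X, y):
    """Least-squares coefficients via the normal equations (Gauss-Jordan)."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    M = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(p):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [M[i][p] for i in range(p)]

rng = random.Random(0)
n = 20000
rows_unadj, rows_adj, y = [], [], []
for _ in range(n):
    txt = 1.0 if rng.random() < 0.5 else 0.0        # treatment, randomized
    support = rng.gauss(0, 1)                       # unmeasured social support
    use = -1.0 * txt - 1.0 * support + rng.gauss(0, 1)   # use right after treatment
    use_later = -1.0 * support + rng.gauss(0, 1)    # later use: no treatment effect
    rows_unadj.append([1.0, txt])
    rows_adj.append([1.0, txt, use])
    y.append(use_later)

coef_unadj = ols(rows_unadj, y)[1]   # close to the true effect, 0
coef_adj = ols(rows_adj, y)[1]       # pushed positive by conditioning on `use`
print("unadjusted:", round(coef_unadj, 3), "adjusted for use:", round(coef_adj, 3))
```

With these hypothetical coefficients the adjusted coefficient is about +0.5 in expectation, which matches the positive sign derived from the three negative paths.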
SLIDE 13
Proposed Regression with Residuals Estimator
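Instead of the traditional regression estimator
\[
E(Y \mid \bar{S}_1 = \bar{s}_1, \bar{A}_2 = \bar{a}_2) = \beta_0^* + \eta_1 s_0 + \beta_1^* a_1 + \beta_2^* a_1 s_0 + \eta_2 s_1 + \beta_3^* a_2 + \beta_4^* a_2 s_0 + \beta_5^* a_2 s_1,
\]
we use the following
\[
E(Y \mid \bar{S}_1 = \bar{s}_1, \bar{A}_2 = \bar{a}_2) = \beta_0^* + \eta_1 s_0 + \beta_1^* a_1 + \beta_2^* a_1 s_0 + \eta_2 \left[ s_1 - E(S_1 \mid A_1 = a_1, S_0 = s_0) \right] + \beta_3^* a_2 + \beta_4^* a_2 s_0 + \beta_5^* a_2 s_1.
\]
We call it “regression with residuals” because first we estimate E(S1 | A1, S0), then use the residual $s_1 - \hat{E}(S_1 \mid A_1, S_0)$ in a second regression to get the β’s.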
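A minimal sketch of the two-stage idea follows, under assumptions that are not from the slides: simulated data, randomized treatments (so no confounding), hypothetical coefficient values, and a slightly reduced model without the a2 × s0 term. It contrasts the regression-with-residuals coefficient on a1 with the coefficient from the traditional regression that enters s1 directly.

```python
import random

def ols(X, y):
    """Least-squares coefficients via the normal equations (Gauss-Jordan)."""
    p = len(X[0])
    M = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(p):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [M[i][p] for i in range(p)]

rng = random.Random(1)
n = 40000
beta1, eta2 = -0.3, 0.8     # hypothetical true values (assumptions)
s0, a1, s1, a2, y = [], [], [], [], []
for _ in range(n):
    s0_i = rng.gauss(0, 1)
    a1_i = float(rng.random() < 0.5)
    s1_i = 0.5 * s0_i - 1.0 * a1_i + rng.gauss(0, 1)   # treatment lowers severity
    a2_i = float(rng.random() < 0.5)
    resid_i = s1_i - (0.5 * s0_i - 1.0 * a1_i)         # s1 minus E(S1 | a1, s0)
    y_i = (1.0 + 0.5 * s0_i + a1_i * (beta1 + 0.2 * s0_i) + eta2 * resid_i
           + a2_i * (-0.4 + 0.3 * s1_i) + rng.gauss(0, 1))
    s0.append(s0_i); a1.append(a1_i); s1.append(s1_i); a2.append(a2_i); y.append(y_i)

# Stage 1: estimate E(S1 | A1, S0) and form the residual.
g = ols([[1.0, s, a] for s, a in zip(s0, a1)], s1)
resid = [x - (g[0] + g[1] * s + g[2] * a) for x, s, a in zip(s1, s0, a1)]

def design(middle):
    """Columns: 1, s0, a1, a1*s0, <middle term>, a2, a2*s1."""
    return [[1.0, s, a, a * s, m, b, b * x]
            for s, a, m, b, x in zip(s0, a1, middle, a2, s1)]

rr_a1 = ols(design(resid), y)[2]    # stage 2 with the residual
trad_a1 = ols(design(s1), y)[2]     # traditional: s1 entered directly
print("RR a1 coefficient:", round(rr_a1, 3), "traditional:", round(trad_a1, 3))
```

In this setup the regression-with-residuals coefficient targets beta1, while the traditional coefficient targets beta1 + eta2, because the part of the a1 effect that runs through s1 is absorbed by the s1 term.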
SLIDE 14
Proposed Regression with Residuals Estimator
\[
E(Y \mid \bar{S}_1 = \bar{s}_1, \bar{A}_2 = \bar{a}_2) = \beta_0^* + \eta_1 s_0 + \beta_1^* a_1 + \beta_2^* a_1 s_0 + \eta_2 \left[ s_1 - E(S_1 \mid A_1 = a_1, S_0 = s_0) \right] + \beta_3^* a_2 + \beta_4^* a_2 s_0 + \beta_5^* a_2 s_1.
\]
The proposed estimator is unbiased for the µt’s provided:
- 1. Correctly modeled SNMM, incl. the ǫt functions.
- 2. A1 ⊥ {Y(a1, a2)} | S0, and
- 3. A2 ⊥ {Y(a1, a2)} | S0, A1, S1
Together, 2. and 3. are a Sequential Ignorability Assumption. But there may be other measured time-varying confounders...
SLIDE 15
Third Problem with Traditional Approach
Time-varying Confounding Bias: Time-varying covariates Xt that are confounders, but not moderators of interest?
S0 A1 A2 Y (¯ a2) S1 X0 X1
Use RR with S1
What is Xt? EPS + SPS + MAXCE - LRI + AGE - NONWHITE - ...and so on.
The auxiliary variables Xt may be high-dimensional.
SLIDE 16
Solution: Inverse-Probability-of-Treatment Weights
We use IPTW version of the proposed 2-Stage RR Estimator:
S0 A1 A2 Y (¯ a2) S1 X0 X1
Use RR with S1 Use IPTW Use IPTW
What is Xt? EPS + SPS + MAXCE - LRI + AGE - NONWHITE - ...
The proposed IPTW estimator is unbiased provided (1) correct SNMM, (2) sequential ignorability given ( ¯ St, ¯ Xt), (3) consistency, and (4) get the “right” weights.
SLIDE 17
The Form of the IPT Weights
\[
W_1 = \frac{1}{\Pr(A_1 = a_1 \mid S_0 = s_0, X_0 = x_0)}, \qquad
W_2 = \frac{1}{\Pr(A_2 = a_2 \mid S_0 = s_0, X_0 = x_0, A_1 = a_1, S_1 = s_1, X_1 = x_1)}
\]
- Assumes denominator probabilities are non-zero.
- We use logistic regressions to estimate the denominator
probs.; models chosen to result in “best” balance.
- W1 × W2 is used in the IPTW+RR estimator of the SNMM.
- Following Murphy, van der Laan, Robins (unpublished), we
use a stabilized version where the numerator for Wt is Pr(At = at | ¯ At−1, ¯ St−1).
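As a rough illustration of how the product weight is assembled, the sketch below takes already-fitted treatment probabilities as inputs. The function name `ipt_weight` and its arguments are hypothetical; the slides fit the probabilities with logistic regressions, which are not reproduced here.

```python
def ipt_weight(a, p_denom, p_num=None):
    """Product over time points of numerator / denominator probabilities.

    a        : list of observed 0/1 treatments (a1, a2, ...)
    p_denom  : fitted Pr(A_t = 1 | full history incl. S and X), one per t
    p_num    : fitted Pr(A_t = 1 | reduced history) for stabilized weights,
               or None for unstabilized weights (numerator 1)
    """
    w = 1.0
    for t, a_t in enumerate(a):
        # Probability of the treatment actually received at time t.
        d = p_denom[t] if a_t == 1 else 1.0 - p_denom[t]
        if d <= 0.0:
            raise ValueError("denominator probability must be non-zero")
        if p_num is None:
            num = 1.0
        else:
            num = p_num[t] if a_t == 1 else 1.0 - p_num[t]
        w *= num / d
    return w

# Treated at t = 1, untreated at t = 2.
w_unstab = ipt_weight([1, 0], [0.8, 0.25])             # (1/0.8) * (1/0.75)
w_stab = ipt_weight([1, 0], [0.8, 0.25], [0.5, 0.5])   # (0.5/0.8) * (0.5/0.75)
print(w_unstab, w_stab)
```

The stabilized version keeps weights closer to 1, which is the usual motivation for adding a numerator model.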
SLIDE 18
6 Data Analysis
- From US substance abuse prgms (CSAT ⊂ SAMHSA)
- GAIN: structured clinical interview; over 100 scales/indices
- n = 2870 adolescents; data every 3 months for 1 year
- {(S0, X0), A1, (S1, X1), A2, (S2, X2), A3, Y }
- S0 = hx controlled environment, age
- St = substance frequency scale at intake, 0-3, 3-6
- Xt = measured time-varying confounders at intake, 0-3, 3-6
- At = none (0) vs some txt (1=outpt, inpt, or both)
- Y = substance frequency scale at 9-12mo
SLIDE 19
The weights did a good job adjusting for Xt.
[Figure: balance plots for t = 1, 2, 3. Each panel shows effect sizes (axis 0.0 to 1.2, with small / medium / large reference marks), unweighted vs. weighted.]
t = 1: B = 0.161 (unweighted), B = 0.041 (weighted)
t = 2: B = 0.155 (unweighted), B = 0.024 (weighted)
t = 3: B = 0.198 (unweighted), B = 0.037 (weighted)
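The balance plots report effect sizes before and after weighting. The sketch below shows one common way to compute such a quantity, a weighted standardized mean difference between treated and untreated groups. Whether the slides use exactly this definition is an assumption, and the function name and toy data are hypothetical.

```python
import math

def std_mean_diff(x, a, w=None):
    """(Weighted) mean of x among treated minus untreated, over the unweighted SD of x."""
    n = len(x)
    w = [1.0] * n if w is None else w

    def wmean(group):
        sw = sum(wi for wi, ai in zip(w, a) if ai == group)
        return sum(wi * xi for wi, xi, ai in zip(w, x, a) if ai == group) / sw

    xbar = sum(x) / n
    sd = math.sqrt(sum((xi - xbar) ** 2 for xi in x) / (n - 1))
    return (wmean(1) - wmean(0)) / sd

x = [1.0, 2.0, 3.0, 4.0]      # a covariate
a = [1, 1, 0, 0]              # treatment indicator
smd_unweighted = std_mean_diff(x, a)
smd_weighted = std_mean_diff(x, a, w=[1.0, 3.0, 3.0, 1.0])
print(round(smd_unweighted, 4), round(smd_weighted, 4))
```

In the toy data the weights pull the group means toward each other, so the absolute standardized difference shrinks.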
SLIDE 20
EDA
[Figure: EDA panels by subgroup (Under 16, No CE; 16 or older, No CE; Under 16, Yes CE; 16 or older, Yes CE). Time-varying moderator = sfs8pt; Y = sfs8p12; axes run from 0.1 to 0.7.]
µ1 = Distal effects of initial treatment, given sfs8p0, age, and baseline CE status: (1,0,0) vs (0,0,0)
µ2 = Medial effects of additional treatment, given sfs8p3, age, and baseline CE status: (1,1,0) vs (1,0,0)
µ3 = Proximal effects of additional treatment, given sfs8p6, age, and baseline CE status: (1,1,1) vs (1,1,0)
SLIDE 21
Effect Estimates from SNMM, using RR+IPTW
Contrast                  Subgroup                       Est.     Eff.Sz.   P-val
µ1: Distal
(1, 0, 0) vs (0, 0, 0)    no intake sevrty, < 16yrs      −0.004   −0.03     0.74
(1, 0, 0) vs (0, 0, 0)    hi intake sevrty, ≥ 16yrs       0.033    0.25     0.08
µ2: Medial
(1, 1, 0) vs (1, 0, 0)    no 0-3 severity                −0.008   −0.06     0.42
(1, 1, 0) vs (1, 0, 0)    hi 0-3 severity, yes ce        −0.048   −0.36     0.21
(1, 1, 0) vs (1, 0, 0)    hi 0-3 severity, no ce          0.021    0.16     0.66
µ3: Proximal
(1, 1, 1) vs (1, 1, 0)    no 6-9 severity                −0.006   −0.04     0.59
(1, 1, 1) vs (1, 1, 0)    hi 6-9 severity                −0.168   −1.27     < 0.01
(., ., 1) vs (., ., 0)    no 6-9 severity                 0.026    0.19     0.12
(., ., 1) vs (., ., 0)    hi 6-9 severity                −0.165   −1.24     < 0.01
SLIDE 22
Some conjectures about the substantive story
- Initial treatment alone may be iatrogenic for older kids with
high severity at intake (evidence is not so strong here).
- An additional 3mos of treatment may be more helpful for kids
still severe at the end of 3 months who have a hx of a controlled environment (evidence is very weak here).
- Providing full treatment is especially beneficial for the kids
who are still looking bad after 6 months (evidence here is reasonably strong).
- There is not a lot of evidence for a treatment effect for kids
who are not severe.
SLIDE 23
7 Summary
- Time-varying causal effect moderation: “What is the
incremental effect of additional community-based substance use treatment, as a function of severity at intake and improvements over time?”
- Examine using Robins’ Structural Nested Mean Model
- Propose wtd regression with residuals estimator for SNMM
– Resembles traditional regression estimator; easy-to-use
– Adjust time-varying confounders via IPT Weighting
- Standard errors: In simulation experiments, we find bootstrap
SEs to be better than ASEs in small samples
SLIDE 24
Acknowledgements
NIDA Funding:
- The Methodology Center at Penn State University (P50-DA-010075; PIs: Collins, Murphy & Co-I: Almirall)
- RAND (R01-DA-015697; PIs: McCaffrey, Griffin & Co-I: Ramchand)
NIMH Funding:
- Univ of Michigan (R01-MH-080015; PI: Murphy)
Special Help From:
- Mary Ellen Slaughter, RAND
- Bobby Yuen, Graduate Student, Michigan Statistics
SLIDE 25
Thank you.
Contact Information: Daniel Almirall, Univ of Michigan, ISR
mailto:dalmiral@umich.edu, 734 936 3077
* Robins (1994), Communications in Statistics.
* Almirall, Ten Have, Murphy (2010), Biometrics.
* Almirall, McCaffrey, Ramchand, Murphy (2011), Prevention Science. Software to fit SNMM using RR on my website.
* Almirall, McCaffrey, Griffin, Ramchand, Murphy (to submit).
SLIDE 26
8 Back Pocket Slides
SLIDE 27
Warm-up: Suppose we want A → Y .
S A Y ?
Examples:
- S = pre-A covt: Social Support
- A = txt/expsr: Inpatient vs. Outpatient
- Y = outcome: Substance Abuse
Why condition on (“adjust for”) pre-exposure covariables S?
SLIDE 28
Suppose we want the effect of A on Y . Why condition on (adjust for) pre-treatment (or pre-exposure) variables S?
- 1. Confounding: S is correlated with both A and Y . In this
case, S is known as a “confounder” of the effect of A on Y .
- 2. Precision: S may be a pre-treatment measure of Y, or any other variable highly correlated with Y.
- 3. Missing Data: The outcome Y is missing for some units, S
and A predict missingness, and S is associated with Y .
- 4. Effect Heterogeneity: S may moderate, temper, or specify
the effect of A on Y . In this case, S is known as a “moderator” of the effect of A on Y .
SLIDE 29
Suppose we want the effect of A on Y . Why condition on (adjust for) pre-treatment (or pre-exposure) variables S?
S A Y
- 4. Effect Heterogeneity: S may moderate, temper, or specify
the effect of A on Y . In this case, S is known as a “moderator” of the effect of A on Y . Formalized in next slide.
SLIDE 30
Final Warm-up: Mean Model in One Time Point
Decomposition of the conditional mean E(Y(a) | S) and the prototypical linear model:
\[
\begin{aligned}
E(Y(a) \mid S = s) &= E(Y(0)) + \{E(Y(0) \mid S = s) - E(Y(0))\} + E(Y(a) - Y(0) \mid S = s) \\
  &= \eta_0 + \epsilon(s) + \mu(s, a) \\
  \text{e.g.} \quad &= \eta_0 + \eta_1 (s - E(S)) + \beta_1 a + \beta_2 a s.
\end{aligned}
\]
Boils down to what we always do anyway: that is, treatment × covariate interaction terms to examine effect heterogeneity.
SLIDE 31
Effect Moderation in One Time Point
µ(s, a) ≡ E(Y (a) − Y (0) | S = s)
[Figure: µ(s) = E(Y(inpat) − Y(outpat) | S = s) plotted against S = Social Support (high is better), with a reference line at µ = 0 (no effect). Y(a) = Substance Use (low is better); a = 1 = residential, a = 0 = outpatient.]
Outpatient substance abuse treatment is better than residential treatment for individuals with higher levels of social support.
SLIDE 32
Causal Effect Moderation in Context: Relevance?
Theoretical Implication: Understanding the heterogeneity of treatment or exposures effects enhances our understanding of various (competing) scientific theories; and it may suggest new scientific hypotheses to be tested. Practical Implication: Identifying types, or subgroups, of individuals for which treatment or exposure is not effective may suggest altering the treatment to suit the needs of those types of individuals.
SLIDE 33
Prototypical Linear Parametric Model
We use β for our causal parameters of interest:
\[
E(Y(a) \mid S) = \eta_0 + \phi(S) + \mu(S, a; \beta) = \eta_0 + \phi(S) + aH\beta
\]
where H is a function of S. Sometimes we parameterize φ(S) using $\phi(S; \eta_{-0}) = G\eta_{-0}$, where G is a function of S.
Example: Let G = (S) and H = (1, S):
\[
E(Y(a) \mid S = s) = \eta_0 + \eta_1 s + a \times (\beta_1 + \beta_2 s).
\]
If a and S are binary, then this is the fully saturated model.
SLIDE 34
Estimation in One Time Point
Consider three estimators for β in µ(S, a; β):
- 1. Traditional Regression
- 2. Semi-parametric Estimation Method: Robins’ E-Estimator
- 3. Inverse Probability of Treatment Weighted (IPTW) Regression
We discuss these (and more) in turn, supposing that
- 1. a is binary (0,1), and
- 2. True model for µ(s, a) is µ(S, a; β) = aHβ for some H.
Example: H = (1, S) ⇒ aHβ = a(β1 + β2s). An important consideration in estimation is how A comes about.
SLIDE 35
Traditional Ordinary Least Squares Regression
Recall true model: E(Y(a) | S) = η0 + φ(S) + aHβ.
Useful when S is sole confounder, and we have a good model for φ(s).
Requires model for nuisance function: $\phi(S; \eta_{-0}) = G\eta_{-0}$.
Regress Y ∼ [1, G, A × H] to get $(\hat{\eta}, \hat{\beta})$. The $\hat{\beta}$ estimates solve
\[
0 = P_n \left[ \left( Y - \hat{\eta}_0 - G\hat{\eta}_{-0} - AH\hat{\beta} \right) (AH)^T \right].
\]
$\hat{\beta}$ is unbiased for β if $\phi(S; \eta_{-0}) = G\eta_{-0}$ is the true model for φ(s) and A ⊥ {Y(0), Y(1)} given S.
SLIDE 36
Semi-parametric E-Estimator
Recall true model: E(Y(a) | S) = η0 + φ(S) + aHβ.
Useful when S is sole confounder, but we have no model for φ(s).
Does NOT require model for nuisance function φ(s).
Get $\hat{\beta}$ by solving the following estimating equations
\[
0 = P_n \left[ \left( Y - b(S; \xi) - AH\beta \right) \left( A - p(S; \alpha) \right) H^T \right]
\]
where b(S; ξ) is a guess for E(Y − AHβ | S) = η0 + φ(S).
$\hat{\beta}$ is unbiased for β if p(S; α) is the true model for Pr(A = 1 | S), and A ⊥ {Y(0), Y(1)} given S. (Discuss double-robustness.)
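For intuition, the estimating equation has a closed-form solution when H = 1 (a single constant effect). The sketch below is a simulation under assumed values: the data-generating model, the known propensity p(S), and the deliberately poor guess b(S) = 0 are all hypothetical choices made for illustration.

```python
import math
import random

rng = random.Random(2)
n = 20000
beta = -0.5                      # hypothetical true effect of a on Y

num = den = 0.0                  # pieces of the closed-form solution
sum1 = sum0 = n1 = n0 = 0.0      # for the naive difference in means
for _ in range(n):
    s = rng.gauss(0, 1)
    p = 1.0 / (1.0 + math.exp(-0.8 * s))      # true Pr(A = 1 | S), assumed known
    a = 1.0 if rng.random() < p else 0.0
    y = 2.0 * math.sin(s) + a * beta + rng.gauss(0, 1)   # nonlinear phi(S)
    b = 0.0                                   # poor guess for eta0 + phi(S)
    # With H = 1: 0 = sum (y - b - a*beta)(a - p)  =>  beta = num / den
    num += (y - b) * (a - p)
    den += a * (a - p)
    if a == 1.0:
        sum1 += y; n1 += 1
    else:
        sum0 += y; n0 += 1

beta_hat = num / den
naive = sum1 / n1 - sum0 / n0    # ignores confounding by S
print("E-estimate:", round(beta_hat, 3), "naive:", round(naive, 3))
```

Even though b(S) is wrong here, the estimate stays close to the true value because the propensity model is correct, while the naive contrast is pulled upward by confounding through S.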
SLIDE 37
IPT Weighted Regression (WLS)
Recall true model: E(Y(a) | S) = η0 + φ(S) + aHβ.
Useful when we have measured confounders V (⊃ S).
Requires model for nuisance function: $\phi(S; \eta_{-0}) = G\eta_{-0}$.
Regress Y ∼ [1, G, A × H] to get $(\hat{\eta}, \hat{\beta})$, where weights are
\[
w(V, A) = A \times \frac{\Pr(A = 1 \mid S)}{\Pr(A = 1 \mid V)} + (1 - A) \times \frac{\Pr(A = 0 \mid S)}{\Pr(A = 0 \mid V)}.
\]
$\hat{\beta}$ is unbiased for β if $\phi(S; \eta_{-0}) = G\eta_{-0}$ is the true model for φ(s), and A ⊥ {Y(0), Y(1)} given V.
SLIDE 38
Semi-parametric Regression Method (Encore)
Now, model is: E(Y(a) | V) = η0 + φ∗(V) + aHβ.
Useful with confounders V (⊃ S), when we have no model for φ∗(V), and if we can assume that V − S does not moderate the impact of a on Y(a).
Does NOT require model for nuisance function φ∗(V).
Get $\hat{\beta}$ by solving the following estimating equations
\[
0 = P_n \left[ \left( Y - b(V; \xi) - AH\beta \right) \left( A - p(V; \alpha) \right) H^T \right].
\]
$\hat{\beta}$ is unbiased for β if p(V; α) is the true model for Pr(A = 1 | V), and A ⊥ {Y(0), Y(1)} given V.
SLIDE 39
An Overview of Estimation Strategies
Model A: E(Y(a) | S) = η0 + φ(S) + aHβ
Model B: E(Y(a) | V) = φ0 + φ(V) + aHβ
H is always a function of S. Ex: H = (1, S)

                  Model A                                               Model B
                  No Confnders      S is Sole Confnder   Confnders V♯       Modrtrs S, Confndrs V
φ Is Known        OLS∗              OLS∗                 IPTW Regression∗   OLS
φ Is Not Known    OLS (if S ⊥ A)    E-estimtr∗†          IPTW E-estimtr∗†   E-estimtr♯

∗ just discussed   † need Pr(A = 1 | S)   ♯ need Pr(A = 1 | V)
SLIDE 40
As a Decomposition of the Marginal Causal Effect
Recall the data structure {S1, a1, S2(a1), a2, Y(a1, a2)}. Consider the following arithmetic decomposition of the causal effect of (a1, a2) on Y, using the covariates $\bar{S}_2(a_1)$:
\[
E[Y(a_1, a_2) - Y(0, 0)] = E\left\{ E[Y(a_1, a_2) - Y(a_1, 0) \mid \bar{S}_2(a_1)] \right\} + E\left\{ E[Y(a_1, 0) - Y(0, 0) \mid S_1] \right\}.
\]
The inner expectations represent the conditional intermediate causal effects µ2 and µ1, respectively.
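The decomposition follows from adding and subtracting Y(a1, 0) and then applying the law of iterated expectations to each piece. A short derivation, written in the notation of this slide (the only step not shown on the slide is the add-and-subtract line):

```latex
\begin{aligned}
Y(a_1, a_2) - Y(0, 0)
  &= \{Y(a_1, a_2) - Y(a_1, 0)\} + \{Y(a_1, 0) - Y(0, 0)\}, \\
E[Y(a_1, a_2) - Y(a_1, 0)]
  &= E\{ E[Y(a_1, a_2) - Y(a_1, 0) \mid \bar{S}_2(a_1)] \}
   = E\{ \mu_2(\bar{S}_2(a_1), \bar{a}_2) \}, \\
E[Y(a_1, 0) - Y(0, 0)]
  &= E\{ E[Y(a_1, 0) - Y(0, 0) \mid S_1] \}
   = E\{ \mu_1(S_1, a_1) \}.
\end{aligned}
```

Summing the last two lines gives the marginal effect as the average of the two conditional intermediate effects.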
SLIDE 41
Robins’ Structural Nested Mean Model
The SNMM for the conditional mean of Y(a1, a2) given $\bar{S}_2(a_1)$ is:
\[
\begin{aligned}
E[Y(a_1, a_2) \mid S_1, S_2(a_1)]
  &= E[Y(0,0)] \\
  &\quad + \{E[Y(0,0) \mid S_1] - E[Y(0,0)]\} \\
  &\quad + E[Y(a_1,0) - Y(0,0) \mid S_1] \\
  &\quad + \{E[Y(a_1,0) \mid \bar{S}_2(a_1)] - E[Y(a_1,0) \mid S_1]\} \\
  &\quad + E[Y(a_1,a_2) - Y(a_1,0) \mid \bar{S}_2(a_1)] \\
  &= \mu_0 + \epsilon_1(s_1) + \mu_1(s_1, a_1) + \epsilon_2(\bar{s}_2, a_1) + \mu_2(\bar{s}_2, \bar{a}_2)
\end{aligned}
\]
SLIDE 42
SNMM Property I: µt = 0 when at = 0.
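\[
\begin{aligned}
E[Y(a_1, a_2) \mid S_0, S_1(a_1)]
  &= E[Y(0,0)] \\
  &\quad + \{E[Y(0,0) \mid S_0] - E[Y(0,0)]\} \\
  &\quad + E[Y(a_1,0) - Y(0,0) \mid S_0] \\
  &\quad + \{E[Y(a_1,0) \mid \bar{S}_1(a_1)] - E[Y(a_1,0) \mid S_0]\} \\
  &\quad + E[Y(a_1,a_2) - Y(a_1,0) \mid \bar{S}_1(a_1)] \\
  &= \mu_0 + \epsilon_1(s_0) + \mu_1(s_0, a_1) + \epsilon_2(\bar{s}_1, a_1) + \mu_2(\bar{s}_1, \bar{a}_2)
\end{aligned}
\]
$\mu_1(s_0, 0) = 0$. Ex Model: $a_1(\beta_{10} + \beta_{11} s_0)$
$\mu_2(\bar{s}_1, a_1, 0) = 0$. Ex Model: $a_2(\beta_{20} + \beta_{21} s_1)$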
SLIDE 43
SNMM Property II: ǫt’s are conditional mean zero.
\[
\begin{aligned}
E[Y(a_1, a_2) \mid S_0, S_1(a_1)]
  &= E[Y(0,0)] \\
  &\quad + \{E[Y(0,0) \mid S_0] - E[Y(0,0)]\} \\
  &\quad + E[Y(a_1,0) - Y(0,0) \mid S_0] \\
  &\quad + \{E[Y(a_1,0) \mid \bar{S}_1(a_1)] - E[Y(a_1,0) \mid S_0]\} \\
  &\quad + E[Y(a_1,a_2) - Y(a_1,0) \mid \bar{S}_1(a_1)] \\
  &= \mu_0 + \epsilon_1(s_0) + \mu_1(s_0, a_1) + \epsilon_2(\bar{s}_1, a_1) + \mu_2(\bar{s}_1, \bar{a}_2)
\end{aligned}
\]
$E_{S_1 \mid S_0}[\epsilon_2(\bar{s}_1, a_1) \mid S_0 = s_0] = 0$, and $E_{S_0}[\epsilon_1(s_0)] = 0$
The ǫt’s make the SNMM a non-standard regression model.
SLIDE 44
So what’s wrong with the Traditional Estimator?
Ex: Use the Traditional Estimator to model the t = 2 SNMM as:
\[
E(Y \mid \bar{S}_1 = \bar{s}_1, \bar{A}_2 = \bar{a}_2) = \beta_0^* + \eta_1 s_0 + \beta_1^* a_1 + \beta_2^* a_1 s_0 + \eta_2 s_1 + \beta_3^* a_2 + \beta_4^* a_2 s_0 + \beta_5^* a_2 s_1
\]
- Two problems arise with the interpretation of $\beta_1^*$ and $\beta_2^*$.
- These two problems may occur even when
– We use the correct model for the conditional mean, or
– The sole time-varying confounder is the putative time-varying moderator St, or
– There is no time-varying confounding bias at all!
SLIDE 45
Traditional approach to estimate µ1 is problematic.
To explain what is wrong with the traditional estimator, we focus on estimating µ1 using the traditional approach.
µ1(s0, a1) = E[Y (a1, 0) − Y (0, 0) | S0 = s0]
S0 a1 a2 = 0 Y (¯ a2)
SLIDE 46
First problem with the Traditional Approach
Wrong Effect
S0 a1 a2 = 0 Y (¯ a2) S1
But what about the effect transmitted through S1(a1)? So the end result is that the term $\beta_1^* a_1 + \beta_2^* a_1 s_0$ does not capture the “total” impact of (a1, 0) vs (0, 0) on Y given values of S0.
SLIDE 47
Second problem with the Traditional Approach
Spurious Bias
S0 a1 a2 = 0 Y (¯ a2) S1 V
This is also known as “Berkson’s paradox”; and is related to Judea Pearl’s back-door criterion and “collider bias”
SLIDE 48
Intuition about the Spurious Bias
[Diagram: Txt → Substance Use (−); Social Support → Substance Use (−); Social Support → Substance Use Later (−)]
Imagine an adolescent who is a high user despite getting treated:
Q: What does this tell you in terms of his social support?
A: There must be poor social support.
Implication: Conditional on substance use, getting treated is associated with more substance use! Bias is −1(−)(−)(−) = +.
SLIDE 49
The “old” warning against adjusting for post-treatment measures.
- Robins, Hernán, Cole, van der Laan, Pearl, VanderWeele, & many others have published countless articles elucidating this problem.
- Rosenbaum has an early article on this issue as well.
- Berkson’s paradox—in the context of case-control studies
using hospitalized samples—is related to this problem.
- Clinical trialists have been warning against this for a very long
time! This is part of the reason why they advocate for ITT.
SLIDE 50
Proposed 2-Stage Regression Estimator
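Recall that
\[
E[Y(a_1, a_2) \mid \bar{S}_2(a_1) = \bar{s}_2] = \mu_2(\bar{s}_2, \bar{a}_2; \beta_2) + \epsilon_2(\bar{s}_2, a_1; \eta_2, \gamma_2) + \mu_1(s_1, a_1; \beta_1) + \epsilon_1(s_1; \eta_1, \gamma_1) + \mu_0.
\]
- 1. We have models for the µ’s: $A_1 H_1 \beta_1$ and $A_2 H_2 \beta_2$; set aside.
- 2. Model $m_1(\gamma_1) = E(S_1)$, estimate $\gamma_1$ with GLM; model $m_2(s_1, a_1; \gamma_2) = E(S_2(a_1) \mid S_1 = s_1)$, estimate $\gamma_2$ with GLM.
- 3. Form the residuals $\hat{\delta}_1 = s_1 - \hat{m}_1(\hat{\gamma}_1)$ and $\hat{\delta}_2 = s_2 - \hat{m}_2(s_1, a_1; \hat{\gamma}_2)$.
- 4. Construct models for the ǫ’s: $G_1 \hat{\delta}_1 \eta_1 = G_1^* \eta_1$ and $G_2 \hat{\delta}_2 \eta_2 = G_2^* \eta_2$.
- 5. Obtain $\hat{\beta}$ and $\hat{\eta}$ using OLS of $Y \sim [1, G_1^*, A_1 H_1, G_2^*, A_2 H_2]$.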
SLIDE 51
Data Descriptives: Treatment Trajectories
Treatment (A1, A2, A3)   Frequency   Proportion
(0,0,0)                  310         11%
(0,1,0)                  56          2%
(1,0,0)                  1184        41%
(1,1,0)                  555         19%
(0,0,1)                  56          2%
(0,1,1)                  56          2%
(1,0,1)                  153         5%
(1,1,1)                  499         17%
SLIDE 52
Data Descriptives: Moderators and Outcomes
Moderators        Mean    SD     Range
S0   sfs8p0       0.18    0.18   (0, 0.89)
     b2a          15.98   1.4    (12, 25)
     maxce0       13.95   24.6   (0, 90)
S1   sfs8p3       0.07    0.11   (0, 0.67)
S2   sfs8p6       0.08    0.13   (0, 0.73)
Outcome           Mean    SD     Range
Y    sfs8p12      0.09    0.13   (0, 0.78)
SLIDE 53
How did we choose our weights?
[Figure: Selecting Denominator Model, one panel set each for t = 1, t = 2, t = 3. Candidate models (All confounders, ES Step, COR Step) are compared on ESS, maxES, maxW, and WBAL; legend: imp=1, imp=2.]
t = 1: BVAL = 0.161
t = 2: BVAL = 0.155
t = 3: BVAL = 0.198
SLIDE 54
How did the weights do?
                                                          Balance
t   No.   Denom Pr.      Denominator weights   ESS       J     B       M
          (min, max)     (min, max)
1   19    (0.20, 0.98)   (1.02, 32.83)         1214.8    46    0.041   0.13
2   45    (0.03, 0.95)   (1.03, 15.43)         1867.7    86    0.024   0.13
3   76    (0.01, 0.97)   (1.01, 37.93)         1057.0    126   0.037   0.16
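The ESS column summarizes how much information is left after weighting. A common definition is Kish's effective sample size, (Σw)² / Σw². Whether the table uses exactly this formula is an assumption, and the function name below is hypothetical.

```python
def effective_sample_size(weights):
    """Kish's effective sample size: (sum of w)^2 / (sum of w^2)."""
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return total * total / total_sq

ess_equal = effective_sample_size([2.0, 2.0, 2.0])    # equal weights: ESS = n
ess_unequal = effective_sample_size([1.0, 3.0])       # 16 / 10
print(ess_equal, ess_unequal)
```

Equal weights leave the sample size unchanged, while a few very large weights (such as maxima near 30 to 40) can reduce it substantially.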
SLIDE 55
Existing Semi-parametric G-Estimator
Recall our SNMM:
\[
E[Y(a_1, a_2) \mid S_1, S_2(a_1)] = \mu_0 + \epsilon_1(s_1) + \beta_{10} a_1 + \beta_{11} a_1 s_1 + \epsilon_2(\bar{s}_2, a_1) + \beta_{20} a_2 + \beta_{21} a_2 s_1 + \beta_{22} a_2 s_2
\]
Robins’ G-Estimator models the ǫt’s implicitly, as part of an algorithm. It also allows for incorrect models for the ǫt’s if models for the time-varying propensity scores, $p_t = \Pr(A_t \mid \bar{S}_t, A_{t-1})$, are correctly specified. That is, if either the pt’s or the ǫt’s are correctly specified, then the G-Estimator yields unbiased estimates of the causal β’s.
SLIDE 56
Robins’ Semi-parametric G-Estimator
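Robins’ G-Estimator is the solution to these estimating equations:
\[
\begin{aligned}
0 = P_n \Big\{ & \left[ Y - A_2 H_2 \beta_2 - b_2(\bar{S}_2, A_1) \right] \times \left[ A_2 - p_2(\bar{S}_2, A_1) \right] \times \left( 0, \; H_2' \right)' \\
 & + \left[ Y - A_2 H_2 \beta_2 - A_1 H_1 \beta_1 - b_1(S_1) \right] \times \left[ A_1 - p_1(S_1) \right] \times \left( H_1', \; \Delta'(H_1) \right)' \Big\}
\end{aligned}
\]
where
\[
\begin{aligned}
\Delta(H_1) &= E[H_2 A_2 \mid S_1, A_1 = 1] - E[H_2 A_2 \mid S_1, A_1 = 0] \\
b_2(\bar{S}_2, A_1) &= E[Y - A_2 H_2 \beta_2 \mid \bar{S}_2, A_1] \\
p_2(\bar{S}_2, A_1) &= \Pr[A_2 = 1 \mid \bar{S}_2, A_1] \\
b_1(S_1) &= E[Y - A_2 H_2 \beta_2 - A_1 H_1 \beta_1 \mid S_1] \\
p_1(S_1) &= \Pr[A_1 = 1 \mid S_1]
\end{aligned}
\]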
SLIDE 57
Bias-Variance Trade-off
This discussion assumes true models for the causal effects, the µt’s:
- Robins’ G-Estimator is unbiased if either pt or bt is correctly specified. So-called double-robustness property.
- Robins’ G-Estimator is semi-parametric efficient if pt, bt, and ∆ are all correctly specified.
- 2-Stage Regression Estimator is unbiased only if the nuisance functions are correctly specified.
- 2-Stage Regression Estimator with correctly specified nuisance is more efficient than G-Estimator.
But what happens as we mis-specify the nuisance functions?
SLIDE 58
Mis-specifying ǫt’s using S∗ = S × N(1, sd = ν)
[Figure: Scaled Root Mean Squared Difference (SRMSD), axis 0.0 to 0.6, plotted against υ (true, 0.5, 1, 1.5, 2).]
Larger values of υ correspond to worse fitting 2−Stage Regression estimators. MSD is the mean squared difference between the true nuisance function and the mis−specified nuisance function. SRMSD is equal to root−MSD divided by the standard deviation of the response Y.
SLIDE 59
Results
[Figure: Relative MSE versus level of mis-specification. Relative Mean Squared Error for β = MSE(Robins’ G-Estimator) / MSE(2-Stage Estimator), plotted against υ and SRMSD in panels for a1, a1:s1, a2, a2:I((s1 + s2)/2), a3, and a3:I((s1 + s2 + s3)/3).]
SLIDE 60
The Generative Model in Simulations
nits = 1000 simulated data sets each of size n = 500
- 1. […] S1 ← 0.40 + δ1.
- 2. Z ← Bin(n, p = 0.50). Then A1 ← 0 if Z = 0; otherwise A1 ← Bin(n, p1 = Λ(1.0 − 0.24 s1))
- 3. δ2 ∼ Nn(0, sd = 0.75). Generate S2 by setting
$S_2 \leftarrow 0.27 + 0.41 s_1 + 0.01 a_1 - 0.01 s_1^2 - 0.27 s_1 a_1 + \delta_2$.
- 4. A2 ← 0 if A1 = 0; otherwise A2 ← Bin(n, p2 = Λ(1.0 + 0.40 s1 − 0.31 s2)).
SLIDE 61
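- 5. δ3 ∼ Nn(0, sd = 0.51). Generate S3 by setting
$S_3 \leftarrow 0.17 + 0.10 s_1 - 0.25 a_1 + 0.30 s_2 - 0.75 a_2 + 0.05 s_1^2 - 0.04 s_2^2 - 0.1 a_1 s_1 + \delta_3$.
- 6. A3 ← 0 if A2 = 0; otherwise A3 ← Bin(n, p3 = Λ(1.0 − 0.2 s1 − 0.3 s2 + 0.4 s3)).
SNMM generated as follows:
\[
\begin{aligned}
Y \leftarrow{} & \text{intercept} + \epsilon_1^{TRUE}(s_1; \eta_1) + a_1(\beta_{1,1} + \beta_{1,2} s_1) \\
 & + \epsilon_2^{TRUE}(\bar{s}_2, a_1; \eta_2) + a_2(\beta_{2,1} + \beta_{2,2}(s_1 + s_2)/2) \\
 & + \epsilon_3^{TRUE}(\bar{s}_3, \bar{a}_2; \eta_3) + a_3(\beta_{3,1} + \beta_{3,2}(s_1 + s_2 + s_3)/3) + \delta_y,
\end{aligned}
\]
where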
SLIDE 62
- 1. intercept = 3.55
- 2. β1,1 = β2,1 = β3,2 = 0.30,
- 3. β1,2 = β2,2 = β3,1 = −0.30,
- 4. δy is a random sample of size n from N(0, sd = 0.7),
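and where the true nuisance functions are defined as
\[
\begin{aligned}
\epsilon_1^{TRUE}(s_1; \eta_1) &= 0.45 \times \delta_1, \\
\epsilon_2^{TRUE}(\bar{s}_2, a_1; \eta_2) &= (0.30 + 0.20 s_1 + 0.15 a_1 + 0.15 a_1 s_1 + 1.0 \sin(4.5 s_1)) \times \delta_2, \\
\epsilon_3^{TRUE}(\bar{s}_3, \bar{a}_2; \eta_3) &= (0.40 - 0.30 s_2 + 0.30 a_2 + 0.60 a_2 s_2 + 1.6 \sin(2.5 s_2)) \times \delta_3.
\end{aligned}
\]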
SLIDE 63
Scaled Root Mean Squared Difference
This is how we measured the amount of mis-specification:
\[
SRMSD(\nu) = \sqrt{ \frac{ E\left[ \left( \sum_{t=1}^{K} \epsilon_t^{TRUE} - \sum_{t=1}^{K} \epsilon_t^{\nu}(\hat{\eta}, \hat{\gamma}) \right)^2 \right] }{ Var(Y) } },
\]
where K = 3 and ν corresponds to a mis-specified 2-Stage Regression Estimator. The expectation E and variance Var in SRMSD are over the data $D = (\bar{S}_3, \bar{A}_3, Y)$ for fixed $(\hat{\eta}, \hat{\gamma})$. Calculated via Monte Carlo integration. I claim SRMSD has an “effect-size-like” interpretation.
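A Monte Carlo version of this quantity is a few lines of arithmetic once the summed nuisance terms are available for each simulated draw. The sketch below assumes those per-draw sums have already been computed; the function name and the toy inputs are hypothetical.

```python
import math

def srmsd(eps_true, eps_mis, y):
    """Root mean squared difference between summed nuisance terms, scaled by SD(Y).

    eps_true : per-draw sum over t of the true nuisance functions
    eps_mis  : per-draw sum over t of the mis-specified (fitted) nuisance functions
    y        : per-draw outcome values
    """
    n = len(y)
    msd = sum((t - m) ** 2 for t, m in zip(eps_true, eps_mis)) / n
    ybar = sum(y) / n
    var_y = sum((v - ybar) ** 2 for v in y) / (n - 1)
    return math.sqrt(msd / var_y)

# Toy check: a constant gap of 1 and Var(Y) = 4/3 give sqrt(3/4).
value = srmsd([1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0], [0.0, 2.0, 0.0, 2.0])
print(round(value, 4))
```

A value of 0 means the fitted nuisance functions match the truth on every draw; larger values express the gap in units of the outcome's standard deviation.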
SLIDE 64
Confounding in
PROSPECT
                 Before Weighting    After Weighting
Variable Name    Absolute            Sign    Effect
                 Effect Size†                Size‡
A1 = HSANY 4
HAMDA 0          0.77                +       0.18
RE 0             0.64
RE16N 0          0.66
MCS 0            0.53
MMSE2 0          0.45                +       0.22
SSI 0            0.40
A2 = HSANY 8
CAD 4            0.80                +       0.27
DYSTH 0          0.69
OPS 0            0.49
HAMDA 4          0.55
CAD 0            0.51                +       0.47
POSAF 0          0.53                +
A3 = HSANY 12
CAD 8            0.90                +       0.37
WHITE1           0.71                +       0.08
RP16N 0          0.60                +       0.28
SLIDE 65
Adolescent Substance Use and Community-based Treatment
- Motivating data set / application
- Collected by Center for Substance Abuse Treatment
- From a combination of major substance abuse programs
- Managed and cleaned by Chestnut Health Systems, IL
(Michael Dennis)
- Global Appraisal of Individual Needs (GAIN): structured
clinical interview, over 100 measures
- Full adolescent data set is n = 6000 and counting...
SLIDE 66
The Illustrative Data Set
- n = 2870 adolescents
- Interested in fitting a K = 2 time points SNMM
S1 A1 A2 S2 Y = ERS
- S1 = need0 = binary indicator of need/severity at baseline
- A1 = anytxt3 = reported no treatment (0) versus some
treatment (1=outpatient, inpatient, or both) at 3-months
- S2 = need6 = binary indicator of need/severity at 6-months
- A2 = anytxt9 = treatment indicator at 9-months
- Y = ERS = Environmental Risk Scale at 12-months
SLIDE 67
What is the Scientific Question?
µ1 = What is the effect of receiving treatment versus not at 3-months (and not receiving treatment in the future) on 12-month ERS scores, conditional on baseline severity?
µ2 = What is the effect of receiving treatment versus not at 9-months on 12-month ERS scores, as a function of baseline severity, having received (or not) treatment at 3-months, and 6-month severity?
SLIDE 68
SLIDE 69
SLIDE 70
Specifying the Saturated SNMM
Causal effects:
- 1. µ1 = anytxt3 (β10 + β11 need0),
- 2. µ2 = anytxt9 (β20 + β21 need0 + β22 anytxt3 + β23 need6 + β24 need0 anytxt3 + β25 need0 need6 + β26 anytxt3 need6 + β27 need0 anytxt3 need6)
Nuisance functions:
- 1. ǫ1 = η11 × (need0 − Pr(need0 = 1)),
- 2. ǫ2 = (η21 + η22 need0 + η23 anytxt3 + η24 need0 anytxt3) × (need6 − Pr(need6 = 1 | need0, anytxt3)),
where Pr(need6 = 1 | need0, anytxt3) = γ20 + γ21 need0 + γ22 anytxt3 + γ23 need0 anytxt3.
SLIDE 71
Estimates of the SNMM Using the 2-Stage Regression Estimator
2-Stage Estimator
      Term                  Parameter   Est.      SE      P-val
µ0    Int                   β00         39.76     1.36    < 0.01
µ1    Int                   β10         −2.72     1.5     0.07
      need0                 β11         −6.89     3.73    0.06
µ2    Int                   β20         −6.59     4.04    0.10
      need0                 β21         −2.12     6.17    0.73
      anytxt3               β22         1.13      4.20    0.78
      need6                 β23         3.37      17.53   0.85
      need0anytxt3          β24         4.26      6.52    0.52
      need0need6            β25         5.79      20.49   0.77
      anytxt3need6          β26         −0.47     17.98   0.99
      need0anytxt3need6     β27         −12.15    21.3    0.57
SLIDE 72
RR+IPTW vs RR vs TRAD
                                                         Different Estimators
Contrast                  Subgroup                       RR+IPTW   RR       TRAD
µ1: Distal
(1, 0, 0) vs (0, 0, 0)    no intake sevrty, < 16yrs      −0.004    −0.016   −0.002
(1, 0, 0) vs (0, 0, 0)    hi intake sevrty, ≥ 16yrs       0.033     0.015    0.038
µ2: Medial
(1, 1, 0) vs (1, 0, 0)    no 0-3 severity                −0.008    −0.005    0.000
(1, 1, 0) vs (1, 0, 0)    hi 0-3 severity, yes ce        −0.048    −0.067   −0.040
(1, 1, 0) vs (1, 0, 0)    hi 0-3 severity, no ce          0.021    −0.037   −0.010
µ3: Proximal
(1, 1, 1) vs (1, 1, 0)    no 3-6 severity                −0.006    −0.012   −0.012
(1, 1, 1) vs (1, 1, 0)    hi 3-6 severity                −0.168    −0.110   −0.110
(., ., 1) vs (., ., 0)    no 3-6 severity                 0.026     0.002    0.002
(., ., 1) vs (., ., 0)    hi 3-6 severity                −0.165    −0.144   −0.144
SLIDE 73
Some conjectures about the methodological story
- Spurious bias for the distal effect is probably POS
(see arguments I made earlier)
- Confounding bias for the distal effect is probably NEG
(good kids get (1,0,0) and also have better/lower Y )
- Spurious and confounding bias cancel each other out and this
is why we see TRAD approximately the same as RR+IPTW.
- Confounding bias for the proximal effect is probably POS
(bad kids get (1,1,1) and also have worse/higher Y ) and this is why we see the estimated proximal effects under RR+IPTW (vs are much stronger NEG
SLIDE 74
9 Connections with the Marginal Structural Model
- The MSM is a model for E(Y (a1, a2) | S0)
- The SNMM is a model for E(Y (a1, a2) | S0, S1(a1))
- So the law of iterated expectations gives us the MSM:
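\[
\begin{aligned}
E[Y(a_1, a_2) \mid S_0 = s_0]
  &= E_{S_1(a_1) \mid S_0} \left\{ E[Y(a_1, a_2) \mid \bar{S}_1(a_1) = \bar{s}_1] \right\} \\
  &= \mu_0 + \epsilon_1(s_0) + \mu_1(s_0, a_1) + E_{S_1(a_1) \mid S_0} \left[ \epsilon_2(\bar{s}_1, a_1) + \mu_2(\bar{s}_1, \bar{a}_2) \right] \\
  &= \mu_0 + \epsilon_1(s_0) + \mu_1(s_0, a_1) + E_{S_1(a_1) \mid S_0} \left[ \mu_2(\bar{s}_1, \bar{a}_2) \right]
\end{aligned}
\]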
SLIDE 75
Connections with the Marginal Structural Model
- Due to linearity: If effect moderation ∃, we can get MSM
estimates by plugging-in the estimated stage 1 regression for the time-varying moderators in the µt’s. Think path analysis. But now we have to believe our stage 1 models are causal!
- If there is no effect moderation (∄), estimates for the µt’s are indeed estimates for the marginal effects. Just read them off.
- Regardless, it is possible to use the RR+IPTW to get a
double robust estimate of the marginal effects. Useful when we fail to balance on some covariates. To do this, (i) employ the plug-in estimator above, and (ii) don’t use the numerator propensity score model in the weights.
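As a worked illustration of the plug-in idea under linearity, suppose µ2 takes the example form a2(β20 + β21 s1) and the stage 1 regression for S1 is linear. The stage 1 coefficients γ0, γ1, γ2 below are hypothetical names introduced only for this sketch:

```latex
\begin{aligned}
\mu_2(\bar{s}_1, \bar{a}_2) &= a_2(\beta_{20} + \beta_{21} s_1), \qquad
E[S_1(a_1) \mid S_0 = s_0] = \gamma_0 + \gamma_1 s_0 + \gamma_2 a_1, \\
E_{S_1(a_1) \mid S_0}\left[ \mu_2(\bar{s}_1, \bar{a}_2) \right]
  &= a_2 \left( \beta_{20} + \beta_{21} E[S_1(a_1) \mid S_0 = s_0] \right) \\
  &= a_2 \left( \beta_{20} + \beta_{21} (\gamma_0 + \gamma_1 s_0 + \gamma_2 a_1) \right).
\end{aligned}
```

When β21 = 0 (no moderation by s1) the expression reduces to a2 β20, so the marginal effect can be read off directly. Otherwise the result depends on γ2, which is why the stage 1 model then has to be interpreted causally.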