Structural Nested Mean Models for Assessing Time-Varying Causal Effect Moderation (PowerPoint PPT Presentation)



SLIDE 1

Structural Nested Mean Models for Assessing Time-Varying Causal Effect Moderation

Daniel Almirall [1], Thomas R. Ten Have [2], Susan A. Murphy [3]

[1] Health Services Research in Primary Care, Durham VA MC, and Biostatistics & Bioinformatics Department, Duke Univ MC
[2] Clinical Epi & Biostatistics, Univ of Pennsylvania Medicine
[3] Statistics Department & ISR, Univ of Michigan

May 21, 2009, 2009 Atlantic Causal Modeling Conference, Philadelphia, Pennsylvania

SLIDE 2

Contents

1 Warm-up: Suppose we want A → Y.  4
2 Effect Moderation in One Time Point  7
3 Mean Model in One Time Point  9
4 Time-Varying Effect Moderation  10
5 Robins' Structural Nested Mean Model  13
6 Estimation in Time-Varying Setting  15

SLIDE 3

7 Sequential Ignorability Given S̄_K  21
8 Application of the SNMM  22
9 Future Work  25
10 Extra Slides  34

SLIDE 4

1 Warm-up: Suppose we want A → Y .

[Diagram: S → A → Y, with a question mark on the A → Y arrow]

Examples:

  S = pre-A covariate | A = treatment/exposure   | Y = outcome
  Suicidal?           | Medication?              | Depression
  Gender, SES         | SAT Coaching?            | SAT Math Score
  Social Support      | Inpatient vs. Outpatient | Substance Abuse

Why condition on ("adjust for") pre-exposure covariables S?

SLIDE 5

Suppose we want the effect of A on Y . Why condition on (adjust for) pre-treatment (or pre-exposure) variables S?

  1. Confounding: S is correlated with both A and Y. In this case, S is known as a "confounder" of the effect of A on Y.
  2. Precision: S may be a pre-treatment measure of Y, or any other variable highly correlated with Y.
  3. Missing Data: The outcome Y is missing for some units, S and A predict missingness, and S is associated with Y.
  4. Effect Heterogeneity: S may moderate, temper, or specify the effect of A on Y. In this case, S is known as a "moderator" of the effect of A on Y.

SLIDE 6

Suppose we want the effect of A on Y . Why condition on (adjust for) pre-treatment (or pre-exposure) variables S?

[Diagram: S → A → Y]

  4. Effect Heterogeneity: S may moderate, temper, or specify the effect of A on Y. In this case, S is known as a "moderator" of the effect of A on Y. Formalized in the next slide.

SLIDE 7

2 Effect Moderation in One Time Point

µ(s, a) ≡ E(Y (a) − Y (0) | S = s)

Example:
  S = Social Support (high is better)
  Y(a) = Substance Use (low is better)
  a = 1: residential; a = 0: outpatient
  µ(s) = E( Y(inpat) − Y(outpat) | S = s ); µ = 0 means no effect

Outpatient substance abuse treatment is better than residential treatment for individuals with higher levels of social support.

SLIDE 8

Causal Effect Moderation in Context: Relevance?

Theoretical Implication: Understanding the heterogeneity of the effects of treatments or exposures enhances our understanding of various (competing) scientific theories, and it may suggest new scientific hypotheses to be tested.

Elaboration of Yu Xie's Social Grouping Principle: We really want Yi(a) − Yi(0) for all i. We settle for "groupings" of effects (here, groupings by S); µ(s, a) "comes closer" than E(Y(a) − Y(0)).

Practical Implication: Identifying types, or subgroups, of individuals for which treatment or exposure is not effective may suggest altering the treatment to suit the needs of those types of individuals.

SLIDE 9

3 Mean Model in One Time Point

Decomposition of the conditional mean E(Y(a) | S) and the prototypical linear model:

  E(Y(a) | S = s) = E(Y(0) | S = 0)
                  + [ E(Y(0) | S = s) − E(Y(0) | S = 0) ]
                  + E(Y(a) − Y(0) | S = s)
                  = η0 + φ(s) + µ(s, a)

  e.g. = η0 + η1 s + β1 a + β2 a s.
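As a concrete illustration (not from the talk), this prototypical model with an A-by-S interaction can be fit by ordinary least squares; the data and coefficient values below are simulated and hypothetical:

```python
import numpy as np

# Fit E(Y | S=s, A=a) = eta0 + eta1*s + beta1*a + beta2*a*s by OLS.
# The moderated effect of A at S = s is then beta1 + beta2*s.
def fit_moderation_model(y, a, s):
    X = np.column_stack([np.ones_like(s), s, a, a * s])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef  # (eta0, eta1, beta1, beta2)

# Simulated data: the true effect of A is 1.0 - 0.5*s, so S moderates
# (here, tempers) the effect of A on Y. A is randomized.
rng = np.random.default_rng(0)
n = 20_000
s = rng.normal(size=n)
a = rng.integers(0, 2, size=n).astype(float)
y = 0.3 + 0.8 * s + a * (1.0 - 0.5 * s) + rng.normal(scale=0.5, size=n)

eta0, eta1, beta1, beta2 = fit_moderation_model(y, a, s)
```

With A randomized, β̂1 and β̂2 recover the true values 1.0 and −0.5 up to sampling error.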

SLIDE 10

4 Time-Varying Effect Moderation

The data structure in the time-varying setting is:

  S1 → a1 → S2(a1) → a2 → Y(a1, a2)

PROSPECT (Prevention of Suicide in Primary Care Elderly: Collaborative Trial):
  (a1, a2) = time-varying treatment pattern; at is binary (0, 1)
  Y(a1, a2) = depression at the end of the study; continuous
  S1 = suicidal ideation at the baseline visit; continuous
  S2(a1) = suicidal ideation at the second visit; continuous

Example: What is the effect of switching off treatment for depression early versus later, as a function of time-varying suicidal ideation?

SLIDE 11

Formal Definition of Time-Varying Causal Effects

Conditional Intermediate Causal Effect at t = 2:

  µ2(s̄2, ā2) = E[ Y(a1, a2) − Y(a1, 0) | S1 = s1, S2(a1) = s2 ]

  [Diagram: S1 → a1 → S2(a1) → a2 → Y(a1, a2)]

Conditional Intermediate Causal Effect at t = 1 (a2 set to 0):

  µ1(s1, a1) = E[ Y(a1, 0) − Y(0, 0) | S1 = s1 ]

  [Diagram: S1 → a1 → Y(a1, 0), with a2 = 0]


SLIDE 13

5 Robins’ Structural Nested Mean Model

The SNMM for the conditional mean of Y(a1, a2) given S̄2(a1) is:

  E[ Y(a1, a2) | S1, S2(a1) ]
    = E[Y(0, 0)]
    + { E[Y(0, 0) | S1] − E[Y(0, 0)] }
    + E[ Y(a1, 0) − Y(0, 0) | S1 ]
    + { E[Y(a1, 0) | S̄2(a1)] − E[Y(a1, 0) | S1] }
    + E[ Y(a1, a2) − Y(a1, 0) | S̄2(a1) ]
    = µ0 + ε1(s1) + µ1(s1, a1) + ε2(s̄2, a1) + µ2(s̄2, ā2)

  e.g. = µ0 + ε1(s1) + β10 a1 + β11 a1 s1 + ε2(s̄2, a1) + β20 a2 + β21 a2 s1 + β22 a2 s2

SLIDE 14

Constraints on the Causal and Nuisance Portions

  E[ Y(a1, a2) | S̄2(a1) = s̄2 ] = µ0 + ε1(s1) + µ1(s1, a1) + ε2(s̄2, a1) + µ2(s̄2, ā2),

where
  · µ2(s̄2, a1, 0) = 0 and µ1(s1, 0) = 0,
  · ε2(s̄2, a1) = E[Y(a1, 0) | S̄2(a1) = s̄2] − E[Y(a1, 0) | S1 = s1],
  · ε1(s1) = E[Y(0, 0) | S1 = s1] − E[Y(0, 0)],
  · E_{S2|S1}[ ε2(s̄2, a1) | S1 = s1 ] = 0, and E_{S1}[ ε1(s1) ] = 0.

The εt's make the SNMM a non-standard regression model.

SLIDE 15

6 Estimation in Time-Varying Setting

Recall that parametric models for our causal estimands µ1 and µ2 are based on the set of parameters β = (β1′, β2′)′.

We considered two estimators for β:

  1. Proposed 2-Stage Regression Estimator
  2. Robins' Semi-parametric G-Estimator

In order to make causal inferences, both estimators rely on Robins' Sequential Ignorability (or Sequential Randomization) Assumption. We discuss the two estimators in turn, but first . . .
SLIDE 16

So what’s wrong with the Traditional Estimator?

An Example of the Traditional Estimator: apply OLS with

  E(Y | S̄2 = s̄2, Ā2 = ā2) = β*0 + η1 s1 + β*1 a1 + β*2 a1 s1
                             + η2 s2 + β*3 a2 + β*4 a2 s1 + β*5 a2 s2

  • Possibly incorrectly specified nuisance functions.
  • Two problems arise with the interpretation of β*1 and β*2 (i.e., the parameters thought to represent µ1) when using the traditional regression estimator. We describe them next.
  • These problems may occur even in the absence of time-varying confounders (that is, even under Sequential Ignorability) . . .

SLIDE 17

First problem with the Traditional Approach

Wrong Effect

  [Diagram: S1 → a1 → S2(a1) → Y(a1, 0), with a2 set to 0; timeline: Baseline, 4-month Visit, 8-month Visit]

But what about the effect transmitted through S2(a1)? The term β*1 a1 + β*2 a1 s1 does not capture the "total" impact of (a1, 0) vs (0, 0) on Y(a1, a2) given values of S1.

SLIDE 18

Second problem with the Traditional Approach

Spurious Effect

  [Diagram: S1 → a1 → S2(a1) → Y(a1, 0), with a2 set to 0 and a variable V0 pointing into both S2(a1) and Y(a1, 0); timeline: Baseline, 4-month Visit, 8-month Visit]

This is also known as "Berkson's paradox", and is related to Judea Pearl's back-door criterion.

SLIDE 19

Proposed 2-Stage Regression Estimator

The proposed 2-Stage Estimator resembles the Traditional Estimator. Instead of using the Traditional Estimator

  E(Y | S̄2 = s̄2, Ā2 = ā2) = β*0 + η1 s1 + β*1 a1 + β*2 a1 s1
                             + η2 s2 + β*3 a2 + β*4 a2 s1 + β*5 a2 s2,

we use the following:

  E(Y | S̄2 = s̄2, Ā2 = ā2) = β*0 + η1 s1 + β*1 a1 + β*2 a1 s1
                             + η2 [ s2 − E(S2 | A1, S1) ] + β*3 a2 + β*4 a2 s1 + β*5 a2 s2.

We call it "2-Stage" because first we estimate E(S2 | A1, S1), then use the residual s2 − Ê(S2 | A1, S1) in a second regression to get the β's. Use sandwich/robust SEs for inference (p-values, CIs, etc.).
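A minimal numerical sketch of this 2-stage idea (simulated, sequentially randomized data; the model forms and constants below are illustrative, not the talk's):

```python
import numpy as np

def two_stage_estimator(y, s1, a1, s2, a2):
    """2-stage regression sketch for a two-time-point SNMM.

    Stage 1: model E(S2 | A1, S1) by OLS and form the residual.
    Stage 2: regress Y on the causal terms plus the *residualized* s2,
    so the nuisance term is mean-zero given (S1, A1).
    """
    # Stage 1
    X1 = np.column_stack([np.ones_like(s1), s1, a1, a1 * s1])
    g, *_ = np.linalg.lstsq(X1, s2, rcond=None)
    resid2 = s2 - X1 @ g
    # Stage 2: design is [1, s1, a1, a1*s1, resid2, a2, a2*s1, a2*s2]
    X2 = np.column_stack([np.ones_like(s1), s1, a1, a1 * s1,
                          resid2, a2, a2 * s1, a2 * s2])
    coef, *_ = np.linalg.lstsq(X2, y, rcond=None)
    return coef

# Simulated data with known causal coefficients:
# mu1 = a1*(0.5 - 0.3*s1) and mu2 = a2*(0.4 + 0.2*s1 - 0.6*s2).
rng = np.random.default_rng(1)
n = 40_000
s1 = rng.normal(size=n)
a1 = rng.integers(0, 2, size=n).astype(float)
e2 = rng.normal(scale=0.8, size=n)
s2 = 0.5 * s1 + 0.4 * a1 + e2
a2 = rng.integers(0, 2, size=n).astype(float)
y = (1.0 + 0.7 * s1 + a1 * (0.5 - 0.3 * s1) + 0.9 * e2
     + a2 * (0.4 + 0.2 * s1 - 0.6 * s2) + rng.normal(scale=0.5, size=n))

coef = two_stage_estimator(y, s1, a1, s2, a2)
b1 = coef[2:4]   # estimates of (0.5, -0.3)
b2 = coef[5:8]   # estimates of (0.4, 0.2, -0.6)
```

Because the stage-2 design uses the residualized s2 rather than s2 itself, the nuisance column is orthogonal (in the limit) to functions of (S1, A1), and the causal β's are recovered.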

SLIDE 20

Existing Semi-parametric G-Estimator

Recall our SNMM:

  E[ Y(a1, a2) | S1, S2(a1) ] = µ0 + ε1(s1) + β10 a1 + β11 a1 s1
                               + ε2(s̄2, a1) + β20 a2 + β21 a2 s1 + β22 a2 s2

Robins' G-Estimator models the εt's implicitly, as part of an algorithm. It also allows for incorrect models for the εt's if models for the time-varying propensity scores, pt = Pr(At | S̄t, Āt−1), are correctly specified. That is, if either the pt's or the εt's are correctly specified, then the G-Estimator yields unbiased estimates of the causal β's.

SLIDE 21

7 Sequential Ignorability Given S̄_K

Or: the absence of confounders (known, unknown, measured, or unmeasured) other than S̄t. Formally, for each t = 1, 2, . . . , K,

  At is independent of {Y(ā_K)} given (S1, A1, S2, A2, . . . , St).

  [Diagram: S1, A1, S2, A2, unmeasured X1 and X2, and Y(a1, a2)]

SLIDE 22

8 Application of the SNMM

  • n = 277 geriatric primary care patients from the PROSPECT Study
  • K = 3: visits to clinic at baseline, 4, 8, and 12 months
  • Data structure is {S1, A1, S2, A2, S3, A3, Y}
  • St1 = suicidal ideation, St2 = intermediate depression, At = adherence, Y = 12-month (end-of-study) depression
  • Adherence is defined as ever meeting with a health specialist
  • Monotonic adherence pattern: (0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)
  • Sequential Ignorability given S̄3 is very likely violated

SLIDE 23

Models Used in the Application

Causal effects (expect βt0 < 0 and βt1 > 0):

  1. µ1(S1, a1) = a1 (β10 + β11 SSI1 + β12 HAMD1),
  2. µ2(S̄2, ā2) = a2 (β20 + β21 SSI2 + β22 HAMD2), and
  3. µ3(S̄3, ā3) = a3 (β30 + β31 SSI3 + β32 HAMD3).

Nuisance functions used in the 2-Stage Estimator:

  1. For t = 1, 2, for both SSI and HAMD, we used the most parsimonious model for the εt's.
  2. ε31(S31, S̄2, Ā2) = (η3,1,1 + η3,1,2 SSI1 + η3,1,3 HAMD1 + η3,1,4 SSI2 + η3,1,5 HAMD2 + η3,1,6 SSI1 HAMD2) δ31(S31, S̄2, Ā2; γ31)
  3. ε32(S32, S̄2, Ā2) = (η3,2,1 + η3,2,2 SSI2) δ32(S32, S̄2, Ā2; γ32)

SLIDE 24

Results

                        2-Stage Estimator        Robins' G-Estimator
  Parameter             β̂      SE    P-val      β̂      SE    P-val
  µ1  Int    β10       −0.20   0.37  0.59      −0.21   0.37  0.57
      SSI    β11        0.11   0.15  0.48       0.09   0.34  0.79
      HAMD   β12       −0.24   0.35  0.49      −0.11   0.33  0.75
  µ2  Int    β20        0.40   0.31  0.20       0.52   0.33  0.11
      SSI    β21        0.22   0.27  0.42      −0.03   0.37  0.94
      HAMD   β22        0.11   0.16  0.49       0.11   0.24  0.65
  µ3  Int    β30       −0.27   0.26  0.30      −0.42   0.30  0.17
      SSI    β31       −0.12   0.26  0.65      −0.60   0.50  0.23
      HAMD   β32       −0.02   0.19  0.94       0.03   0.23  0.91

SLIDE 25

9 Future Work

SLIDE 26

(a) Handling Time-varying Confounders in the SNMM

How do we handle time-varying covariates Xt that are possible confounders, but are not moderators of interest?

  [Diagram: S1, A1, S2, A2, time-varying covariates X1 and X2, and Y(a1, a2)]

Note: In practice, the set Xt is much larger than St.

SLIDE 27

Proposed Solution: Propensity Score Weighting

We can use IPT-Weighted versions of the proposed 2-Stage Estimator (or G-Estimator):

  [Diagram: S1, A1, S2, A2, time-varying covariates X1 and X2, and Y(a1, a2)]

SLIDE 28

(b) Multi-Component MSMs and SNMMs

  • Thus far, we have had 1 treatment/exposure variable per time point; and so, we have considered potential outcomes Y(a1, a2).
  • What if we had multi-component treatments/exposures at each time point?
  • That is, what if at is now a vector of J treatment/exposure variables?
  • Y(a1, a2), where at = (at1, . . . , atJ)
SLIDE 29

Multi-Component MSMs and SNMMs: Example

  [Diagram: baseline S_B (WEIGHT) → multi-component baseline exposure A_B (CARBS, FAT, PROTEIN, ALCOHOL) → S6 (WGT, EXER, . . . ) → multi-component 6-month exposure A6 (CARBS, FAT, PROTEIN, ALCOHOL) → Y18(a1, a2)]

SLIDE 30

Multi-Component MSMs and SNMMs: Example

  [Figure: standardized effects (roughly −0.06 to 0.06) of baseline and 6-month macronutrients (RIAPRO, RIVPRO, RIFAT, RIDFIB, NFC, RIALC) on weight at 18 months]

SLIDE 31

(c) SNMMs With Longitudinal Outcomes

  [Diagram: S1, A1, S2, A2, with longitudinal outcomes Y1(a1) and Y2(a1, a2)]

SLIDE 32

(d) Informing RCTs for Developing Adaptive Treatment Strategies

Clinicians are becoming more interested in developing adaptive treatment strategies (ATSs), which are sequences of individually tailored decision rules that specify whether, how, and when to alter the intensity, type, or delivery of treatment at critical decision points in the medical care process. Specialized trials are available that can be used to inform the development of ATSs. How can/should MSMs and SNMMs be used to inform the design of these specialized trials?
SLIDE 33

Thank you! More Questions?

SLIDE 34

10 Extra Slides

SLIDE 35

Mean Model in One Time Point

Decompose the conditional mean E(Y(a) | S) as follows:

  E(Y(a) | S = s) = E(Y(0) | S = 0)
                  + [ E(Y(0) | S = s) − E(Y(0) | S = 0) ]
                  + E(Y(a) − Y(0) | S = s)
                  = η0 + φ(s) + µ(s, a).

The intercept η0 and the function φ(s) are non-causal. They are known as nuisance functions. φ(s) is the "associational main effect" of S on Y(0).

SLIDE 36

Prototypical Linear Parametric Model

We use β for our causal parameters of interest:

  E(Y(a) | S) = η0 + φ(S) + µ(S, a; β) = η0 + φ(S) + aHβ,

where H is a function of S. Sometimes we parameterize φ(S) using φ(S; η−0) = Gη−0, where G is a function of S.

Example: Let G = (S) and H = (1, S):

  E(Y(a) | S = s) = η0 + η1 s + a × (β1 + β2 s).

If a and S are binary, then this is the fully saturated model.

SLIDE 37

Estimation in One Time Point

Consider three estimators for β in µ(S, a; β):

  1. Traditional Regression
  2. Semi-parametric Estimation Method: Robins' E-Estimator
  3. Inverse Probability of Treatment Weighted (IPTW) Regression

We discuss these (and more) in turn, supposing that

  1. a is binary (0, 1), and
  2. the true model for µ(s, a) is µ(S, a; β) = aHβ for some H.

Example: H = (1, S) ⇒ aHβ = a(β1 + β2 s). An important consideration in estimation is how A comes about.

SLIDE 38

Traditional Ordinary Least Squares Regression

Recall the true model: E(Y(a) | S) = η0 + φ(S) + aHβ.

Useful when S is the sole confounder and we have a good model for φ(s). Requires a model for the nuisance function: φ(S; η−0) = Gη−0.

Regress Y ∼ [1, G, A × H] to get (η̂, β̂). The β estimates solve

  0 = Pn[ ( Y − η0 − Gη−0 − AHβ ) (AH)′ ].

β̂ is unbiased for β if φ(S; η−0) = Gη−0 is the true model for φ(s) and A ⊥ {Y(0), Y(1)} given S.

SLIDE 39

Semi-parametric E-Estimator

Recall the true model: E(Y(a) | S) = η0 + φ(S) + aHβ.

Useful when S is the sole confounder but we have no model for φ(s). Does NOT require a model for the nuisance function φ(s). Get β̂ by solving the following estimating equations:

  0 = Pn[ ( Y − b(S; ξ) − AHβ ) ( A − p(S; α) ) H′ ],

where b(S; ξ) is a guess for E(Y − AHβ | S) = η0 + φ(S).

β̂ is unbiased for β if p(S; α) is the true model for Pr(A = 1 | S), and A ⊥ {Y(0), Y(1)} given S. (Discuss double-robustness.)
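Because the estimating equation is linear in β, it can be solved in closed form. A sketch with H = (1, S), on simulated data with illustrative constants; the true propensity is passed in, and the working guess b is deliberately left at zero to show that a correct p alone suffices:

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def e_estimator(y, a, s, b_of_s, p_of_s):
    """Solve 0 = sum_i (Y_i - b(S_i) - A_i H_i beta)(A_i - p(S_i)) H_i'
    for beta, with H = (1, S). The equation is linear in beta."""
    H = np.column_stack([np.ones_like(s), s])
    w = a - p_of_s(s)
    lhs = H.T @ (H * (w * a)[:, None])   # sum_i A_i (A_i - p_i) H_i' H_i
    rhs = H.T @ ((y - b_of_s(s)) * w)    # sum_i (A_i - p_i)(Y_i - b_i) H_i'
    return np.linalg.solve(lhs, rhs)

# Simulated data: phi(s) = 1.5*sin(2s) is never modeled, and the guess
# b(s) = 0 is crude, yet the estimator is consistent because the
# propensity model p(s) is correct.
rng = np.random.default_rng(2)
n = 100_000
s = rng.normal(size=n)
a = rng.binomial(1, expit(0.3 + 0.5 * s)).astype(float)
y = (2.0 + 1.5 * np.sin(2 * s) + a * (1.0 + 0.5 * s)
     + rng.normal(scale=0.5, size=n))

beta = e_estimator(y, a, s,
                   b_of_s=lambda s: 0.0 * s,
                   p_of_s=lambda s: expit(0.3 + 0.5 * s))
```

A better guess b(S) would not change consistency, only reduce variance, which is the double-robustness idea mentioned on the slide.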

SLIDE 40

IPT Weighted Regression (WLS)

Recall the true model: E(Y(a) | S) = η0 + φ(S) + aHβ.

Useful when we have measured confounders V (⊃ S). Requires a model for the nuisance function: φ(S; η−0) = Gη−0.

Regress Y ∼ [1, G, A × H] with weights w to get (η̂, β̂), where the weights are

  w(V, A) = A × Pr(A = 1 | S) / Pr(A = 1 | V) + (1 − A) × Pr(A = 0 | S) / Pr(A = 0 | V).

β̂ is unbiased for β if φ(S; η−0) = Gη−0 is the true model for φ(s), and A ⊥ {Y(0), Y(1)} given V.
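A simulation sketch of the weighted fit (illustrative constants; to keep it self-contained, the extra confounder X is constructed so that Pr(A = 1 | S) = 1/2 exactly, rather than estimating both propensity models):

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def iptw_regression(y, a, s, p_s, p_v):
    """Weighted OLS of Y on [1, S, A, A*S] with the IPT weights
    w = A*p(S)/p(V) + (1-A)*(1-p(S))/(1-p(V))."""
    w = a * p_s / p_v + (1 - a) * (1 - p_s) / (1 - p_v)
    X = np.column_stack([np.ones_like(s), s, a, a * s])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef

# V = (S, X); X confounds A and Y but is not a moderator of interest.
# Since X ~ N(0,1) and Pr(A=1|V) = expit(0.8*X), symmetry of expit
# gives Pr(A=1|S) = 0.5 exactly.
rng = np.random.default_rng(3)
n = 50_000
s = rng.normal(size=n)
x = rng.normal(size=n)
p_v = expit(0.8 * x)
a = rng.binomial(1, p_v).astype(float)
y = (0.5 + 0.8 * s + 0.6 * x + a * (1.0 - 0.4 * s)
     + rng.normal(scale=0.5, size=n))

coef = iptw_regression(y, a, s, p_s=0.5, p_v=p_v)
```

The weighting removes the confounding by X; an unweighted fit of the same design would overstate the main effect of A, since A and X are positively related and X raises Y.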

SLIDE 41

Semi-parametric Regression Method (Encore)

Now the model is: E(Y(a) | V) = η0 + φ*(V) + aHβ.

Useful with confounders V (⊃ S), when we have no model for φ*(V), and if we can assume that V − S does not moderate the impact of a on Y(a). Does NOT require a model for the nuisance function φ*(V). Get β̂ by solving the following estimating equations:

  0 = Pn[ ( Y − b(V; ξ) − AHβ ) ( A − p(V; α) ) H′ ].

β̂ is unbiased for β if p(V; α) is the true model for Pr(A = 1 | V), and A ⊥ {Y(0), Y(1)} given V.

SLIDE 42

An Overview of Estimation Strategies

Model A: E(Y(a) | S) = η0 + φ(S) + aHβ
Model B: E(Y(a) | V) = η0 + φ*(V) + aHβ
H is always a function of S. Ex: H = (1, S)

                      Model A:            Model A:               Model B:
                      No Confounder       S is Sole Confounder   Confounders V (⊃ S), Moderators S
  φ is known          OLS Regression∗     OLS∗                   IPTW OLS♯
  φ is not known      OLS (if S ⊥ A)      E-estimator∗†          IPTW E-estimator♯

∗ just discussed   † needs Pr(A = 1 | S)   ♯ needs Pr(A = 1 | V)

SLIDE 43

Application in One Time Point

n = 1984 adolescents who are substance abusers.
Motivation: American Society of Addiction Medicine (ASAM) Patient Placement Criteria (PPC).
Two levels of care (LOC): A = 0 outpatient, A = 1 residential.
Illustrate the methodology using:
  S = Needle Frequency Index (high is bad)
  Y = Substance Frequency Index (high is bad)
  V = 86 covariates to adjust for (possible confounders)

SLIDE 44

Covariate Balance Before-After Weighting

  [Figure: covariate balance before vs. after weighting. Left panel: absolute standardized differences, with B = average absolute standardized difference; B = 0.3476 unweighted vs. B = 0.0688 weighted. Right panel: p-values for no difference, with N = number of p-values < 0.10; N = 86 unweighted vs. N = 11 weighted.]

SLIDE 45

Effect Moderation by S = Needle Frequency Index

  [Figure: Substance Frequency Index (sfi7pfu) vs. Needle Frequency Index (nfip), for Outpatient LOC and Inpatient/Residential LOC; left panel: unweighted regression, right panel: IPT-weighted regression.]

SLIDE 46

As a Decomposition of the Marginal Causal Effect

Recall the data structure {S1, a1, S2(a1), a2, Y(a1, a2)}. Consider the following arithmetic decomposition of the causal effect of (a1, a2) on Y, using the covariates S̄2(a1):

  E[ Y(a1, a2) − Y(0, 0) ]
    = E{ E[ Y(a1, a2) − Y(a1, 0) | S̄2(a1) ] }
    + E{ E[ Y(a1, 0) − Y(0, 0) | S1 ] }.

The inner expectations represent the conditional intermediate causal effects µ2 and µ1, respectively.

SLIDE 47

Robins’ Structural Nested Mean Model

The SNMM for the conditional mean of Y(a1, a2) given S̄2(a1) is:

  E[ Y(a1, a2) | S1, S2(a1) ]
    = E[Y(0, 0)]
    + { E[Y(0, 0) | S1] − E[Y(0, 0)] }
    + E[ Y(a1, 0) − Y(0, 0) | S1 ]
    + { E[Y(a1, 0) | S̄2(a1)] − E[Y(a1, 0) | S1] }
    + E[ Y(a1, a2) − Y(a1, 0) | S̄2(a1) ]
    = µ0 + ε1(s1) + µ1(s1, a1) + ε2(s̄2, a1) + µ2(s̄2, ā2)

SLIDE 48

Parameterizing the Nuisance Functions

So we must parameterize the nuisance functions correctly. Recall the constraints on the nuisance functions:

  · ε2(s̄2, a1) = E[Y(a1, 0) | S̄2(a1) = s̄2] − E[Y(a1, 0) | S1 = s1],
  · ε1(s1) = E[Y(0, 0) | S1 = s1] − E[Y(0, 0)],
  · E_{S2|S1}[ ε2(s̄2, a1) | S1 = s1 ] = 0, and E_{S1}[ ε1(s1) ] = 0.

Example parameterizations for the nuisance functions:

  ε1(s1) = (say) η1,1 [ s1 − E(S1) ]
  ε2(s̄2, a1) = (say) η2,1 [ s2 − E(S2(a1) | S1 = s1) ]
SLIDE 49

Proposed 2-Stage Regression Estimator

Recall that

  E[ Y(a1, a2) | S̄2(a1) = s̄2 ] = µ2(s̄2, ā2; β2) + ε2(s̄2, a1; η2, γ2) + µ1(s1, a1; β1) + ε1(s1; η1, γ1) + µ0.

  1. We have models for the µ's: A1H1β1 and A2H2β2; set aside.
  2. Model m1(γ1) = E(S1), estimating γ1 with a GLM; model m2(s1, a1; γ2) = E(S2(a1) | S1 = s1), estimating γ2 with a GLM.
  3. Construct residuals δ̂1 = s1 − m̂1(γ̂1) and δ̂2 = s2 − m̂2(s1, a1; γ̂2).
  4. Construct models for the ε's: G1 δ̂1 η1 = G*1 η1 and G2 δ̂2 η2 = G*2 η2.
  5. Obtain β̂ and η̂ using OLS of Y ∼ [1, G*1, A1H1, G*2, A2H2].

SLIDE 50

Robins’ Semi-parametric G-Estimator

Robins' G-Estimator is the solution to these estimating equations:

  0 = Pn[ ( Y − A2H2β2 − b2(S̄2, A1) ) × ( A2 − p2(S̄2, A1) ) × ( 0, H2′ )′
        + ( Y − A2H2β2 − A1H1β1 − b1(S1) ) × ( A1 − p1(S1) ) × ( H1′, ∆′(H1) )′ ],

where

  ∆(H1) = E[ H2A2 | S1, A1 = 1 ] − E[ H2A2 | S1, A1 = 0 ],
  b2(S̄2, A1) = E[ Y − A2H2β2 | S̄2, A1 ],
  p2(S̄2, A1) = Pr[ A2 = 1 | S̄2, A1 ],
  b1(S1) = E[ Y − A2H2β2 − A1H1β1 | S1 ],
  p1(S1) = Pr[ A1 = 1 | S1 ].

SLIDE 51

Bias-Variance Trade-off

This discussion assumes true models for the causal effects, the µt's:

  • Robins' G-Estimator is unbiased if either the pt's or the bt's are correctly specified (the so-called double-robustness property).
  • Robins' G-Estimator is semi-parametric efficient if the pt's, the bt's, and ∆ are all correctly specified.
  • The 2-Stage Regression Estimator is unbiased only if the nuisance functions are correctly specified.
  • The 2-Stage Regression Estimator with correctly specified nuisance functions is more efficient than the G-Estimator.

But what happens as we mis-specify the nuisance functions?

SLIDE 52

Mis-specifying the εt's using S* = S × N(1, sd = ν)

  [Figure: Scaled Root Mean Squared Difference (SRMSD) for ν = 0.5, 1, 1.5, 2 and for the true model. Larger values of ν correspond to worse-fitting 2-Stage Regression estimators. MSD is the mean squared difference between the true nuisance function and the mis-specified nuisance function; SRMSD equals root-MSD divided by the standard deviation of the response Y.]

SLIDE 53

Results

Relative MSE versus level of mis-specification

  [Figure: Relative Mean Squared Error for β̂, MSE(Robins' G-Estimator) / MSE(2-Stage Estimator), plotted against ν (with SRMSD ≈ 0.02, 0.5, 0.58, 0.6, 0.6) for the parameters a1, a1:s1, a2, a2:I((s1 + s2)/2), a3, and a3:I((s1 + s2 + s3)/3); relative MSE ranges from about 0.0 to 2.5.]

SLIDE 54

The Generative Model in Simulations

nits = 1000 simulated data sets, each of size n = 500.

  1. δ1 ∼ res1. Then S1 ← −0.40 + δ1.
  2. Z ← Bin(n, p = 0.50). Then A1 ← 0 if Z = 0; otherwise A1 ← Bin(n, p1 = Λ(1.0 − 0.24 s1)).
  3. δ2 ∼ Nn(0, sd = 0.75). Generate S2 by setting
     S2 ← 0.27 + 0.41 s1 + 0.01 a1 − 0.01 s1² − 0.27 s1 a1 + δ2.
  4. Set A2 ← 0 if A1 = 0; otherwise A2 ← Bin(n, p2 = Λ(1.0 + 0.40 s1 − 0.31 s2)).

SLIDE 55

  5. δ3 ∼ Nn(0, sd = 0.51). Generate S3 by setting
     S3 ← 0.17 + 0.10 s1 − 0.25 a1 + 0.30 s2 − 0.75 a2 + 0.05 s1² − 0.04 s2² − 0.1 a1 s1 + δ3.
  6. Set A3 ← 0 if A2 = 0; otherwise A3 ← Bin(n, p3 = Λ(1.0 − 0.2 s1 − 0.3 s2 + 0.4 s3)).

The SNMM outcome is generated as follows:

  Y ← intercept + ε1^TRUE(s1; η1) + a1(β1,1 + β1,2 s1)
      + ε2^TRUE(s̄2, a1; η2) + a2(β2,1 + β2,2 (s1 + s2)/2)
      + ε3^TRUE(s̄3, ā2; η3) + a3(β3,1 + β3,2 (s1 + s2 + s3)/3) + δy,

where

SLIDE 56

  1. intercept = 3.55,
  2. β1,1 = β2,1 = β3,2 = 0.30,
  3. β1,2 = β2,2 = β3,1 = −0.30,
  4. δy is a random sample of size n from N(0, sd = 0.7),

and where the true nuisance functions are defined as

  1. ε1^TRUE(s1; η1) = 0.45 × δ1,
  2. ε2^TRUE(s̄2, a1; η2) = (0.30 + 0.20 s1 + 0.15 a1 + 0.15 a1 s1 + 1.0 sin(4.5 s1)) × δ2,
  3. ε3^TRUE(s̄3, ā2; η3) = (0.40 − 0.30 s2 + 0.30 a2 + 0.60 a2 s2 + 1.6 sin(2.5 s2)) × δ3.
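A runnable, simplified two-period analogue of this generative model. The distribution of δ1 is not recoverable from the source, so N(0, 1) is assumed, and the sin and quadratic nuisance terms are dropped for brevity; all of that is labeled in the comments:

```python
import numpy as np

def expit(x):  # the logistic function Lambda(.)
    return 1.0 / (1.0 + np.exp(-x))

def generate(n, rng):
    """Simplified two-period analogue of the generative model above.
    Nuisance terms are (function of history) * delta_t, so they have
    conditional mean zero, as the SNMM constraints require."""
    d1 = rng.normal(0.0, 1.0, n)              # delta_1 (distribution assumed)
    s1 = -0.40 + d1
    a1 = rng.binomial(1, expit(1.0 - 0.24 * s1)).astype(float)
    d2 = rng.normal(0.0, 0.75, n)
    s2 = 0.27 + 0.41 * s1 + 0.01 * a1 - 0.27 * s1 * a1 + d2
    # monotone treatment pattern: A2 = 0 whenever A1 = 0
    a2 = np.where(a1 == 1.0,
                  rng.binomial(1, expit(1.0 + 0.40 * s1 - 0.31 * s2)),
                  0.0)
    eps1 = 0.45 * d1                                     # sin terms dropped
    eps2 = (0.30 + 0.20 * s1 + 0.15 * a1) * d2
    y = (3.55 + eps1 + a1 * (0.30 - 0.30 * s1)
         + eps2 + a2 * (0.30 - 0.30 * (s1 + s2) / 2)
         + rng.normal(0.0, 0.7, n))
    return s1, a1, s2, a2, y

s1, a1, s2, a2, y = generate(100_000, np.random.default_rng(4))
```

Multiplying each nuisance piece by the mean-zero innovation δt is what enforces the constraint E[εt | history] = 0 by construction.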

SLIDE 57

Scaled Root Mean Squared Difference

This is how we measured the amount of mis-specification:

  SRMSD(ν) = sqrt( E[ ( Σt εt^TRUE − Σt εt^ν(η̂, γ̂) )² ] / Var(Y) ),

where ν corresponds to a mis-specified 2-Stage Regression Estimator. The expectation E and variance Var in the SRMSD are over the data D = (S̄3, Ā3, Y) for fixed (η̂, γ̂), calculated via Monte Carlo integration. I claim the SRMSD has an "effect-size-like" interpretation.

SLIDE 58

Confounding in PROSPECT

                          Before Weighting           After Weighting
  Variable Name           Effect Size†    Sign       Absolute Effect Size‡
  A1 = HSANY 4
    HAMDA 0               0.77            +          0.18
    RE 0                  0.64                       0.05
    RE16N 0               0.66                       0.05
    MCS 0                 0.53                       0.08
    MMSE2 0               0.45            +          0.22
    SSI 0                 0.40                       0.29
  A2 = HSANY 8
    CAD 4                 0.80            +          0.27
    DYSTH 0               0.69                       0.66
    OPS 0                 0.49                       0.07
    HAMDA 4               0.55                       0.07
    CAD 0                 0.51            +          0.47
    POSAF 0               0.53            +          0.28
  A3 = HSANY 12
    CAD 8                 0.90            +          0.37
    WHITE1                0.71            +          0.08
    RP16N 0               0.60            +          0.28