SLIDE 1

Assessing Proximal and Lagged Moderated Effects in Mobile Health

University of Washington Biostatistics Department Seattle, WA

November 2, 2016

Audrey Boruvka¹, Daniel Almirall¹, Katie Witkiewitz², and Susan A. Murphy¹

¹University of Michigan and ²University of New Mexico

SLIDE 2

Outline

1. Three examples: BASICS Mobile, HeartSteps and Sense2Stop

2. What does data from a micro-randomized trial look like?
3. Proximal and lagged moderated effects
4. Estimating the proximal and lagged moderated effects
5. Simulation experiments
6. A data example using BASICS Mobile

1 / 34

SLIDE 3

BASICS-Mobile Example (College Drinking)

PI: Katie Witkiewitz

Smartphone-based intervention to curb heavy drinking and smoking in college students
Data Collected: EMA up to 3x/day (morning, afternoon, evening)
Intervention Frequency: Up to 2x/day (afternoon, evening)
Intervention Content: Mindfulness-based message vs general health information (GHI) (binary treatment)
Intervention Availability: Based on answering an EMA
Typical Question: Is the effect of providing a mindfulness-based intervention (vs GHI) on subsequent smoking rate moderated by an increase in need to self-regulate?

2 / 34

SLIDE 4

HeartSteps Example (Physical Activity)

PI: Pedja Klasnja

Wearable activity-tracker + smartphone-based intervention to encourage physical activity
Data Collected: "Continuously" + EMA each evening
Intervention Frequency: Up to 5x/day (before work, lunch, 2pm, after work, evening)
Intervention Content: Delivers vs does not deliver (binary treatment) a contextually relevant activity suggestion via the smartphone
Intervention Availability: Not in vehicle, not exercising, app not "snoozed", phone on
Typical Question: Does time of day or busyness influence the effect of suggesting an activity on step count?

3 / 34

SLIDE 5

Sense2Stop Example (Smoking Cessation)

PI: Bonnie Spring

Wearable chest-strap + wrist-band + smartphone-based intervention to sense stress and reduce smoking
Data Collected: "Continuously" + EMA
Intervention Frequency: 3x/day on average, with 50% chance of happening when stressed and 50% chance of happening when not stressed
Intervention Content: Deliver or not deliver prompt (binary treatment) via smartphone to use one of 3 stress-management apps
Intervention Availability: Not in vehicle, ≥ 60 min since intervention, ≥ 10 min since EMA, cannot have uncertain stress classification, phone on
Typical Question: Will delivering the message be more effective than not delivering the message in times of stress? In times of no stress? Or equally effective in either?

4 / 34

SLIDE 6

Data from a Micro-randomized Trial

$t$: treatment occasion
$X_t$: individual and contextual characteristics at $t$
$A_t$: binary treatment at $t$
$Y_{t+1}$: continuous response following $t$ and before $t + 1$
$H_t$: history through $t$: $(\bar X_t, \bar Y_t, \bar A_{t-1})$

Data in temporal order looks like this:

$$\underbrace{X_1, A_1, Y_2, \ldots, X_t}_{H_t}, A_t, Y_{t+1}, \ldots, X_T, A_T, Y_{T+1}$$

$\rho_t(1 \mid H_t)$ is the known randomization probability $P(A_t = 1 \mid H_t)$ that generates $A_t$.

5 / 34
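
A minimal sketch of this data layout for one participant, in Python. The function name `simulate_participant`, the toy response model, and the callable `rho` are illustrative assumptions, not part of the talk; the point is only the temporal order $X_t, A_t, Y_{t+1}$ and the fact that the randomization probability is known and recorded.

```python
import random

def simulate_participant(T, rho, seed=0):
    """Return per-occasion records in temporal order X_t, A_t, Y_{t+1}.

    rho(history, x_t) is a hypothetical callable giving the known
    randomization probability P(A_t = 1 | H_t).
    """
    rng = random.Random(seed)
    history = []   # past (X, A, Y) triples; together with x_t this is H_t
    records = []
    for t in range(1, T + 1):
        x_t = rng.gauss(0.0, 1.0)                  # toy context at occasion t
        p_t = rho(history, x_t)                    # known by design, so store it
        a_t = 1 if rng.random() < p_t else 0       # micro-randomization
        y_next = 0.5 * x_t + 0.3 * a_t + rng.gauss(0.0, 1.0)  # toy response
        records.append({"t": t, "X": x_t, "A": a_t, "rho": p_t, "Y_next": y_next})
        history.append((x_t, a_t, y_next))
    return records

# Constant randomization probability of 0.6 at every occasion.
records = simulate_participant(T=5, rho=lambda history, x_t: 0.6)
```

Storing `rho` alongside each record matters later: the weights in the estimator use it directly.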

SLIDE 7

Example Data Structure

BASICS Mobile

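[Timeline figure: successive EMAs (. . ., Morning, Afternoon, Evening, Morning, . . .) with treatment occasions $t - 1$ (afternoon, $A_{t-1}$) and $t$ (evening, $A_t$), and the variables $X_{t-1}$, $X_t$, $Y_{t+1}$.]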

6 / 34

SLIDE 8

Proximal moderated effect

$A_t$ on $Y_{t+1}$

$Y_{t+1}(\bar a_t)$: response, had the treatments $\bar a_t$ been provided
$S_{1t}(\bar a_{t-1})$: vector of candidate moderators from the history through $t$, $H_t$, had the treatments $\bar a_{t-1}$ been provided

The proximal treatment effect is

$$E\left[ Y_{t+1}(\bar A_{t-1}, 1) - Y_{t+1}(\bar A_{t-1}, 0) \mid S_{1t}(\bar A_{t-1}) \right].$$

A contrast of the average $Y_{t+1}$ under treatment 1 vs the average $Y_{t+1}$ under treatment 0, conditional on $S_{1t}$.
Think of $S_{1t}(\bar a_{t-1})$ as a "state" of particular interest.

7 / 34

SLIDE 9

Proximal moderated effect

$A_t$ on $Y_{t+1}$

A proximal treatment effect is

$$E\left[ Y_{t+1}(\bar A_{t-1}, 1) - Y_{t+1}(\bar A_{t-1}, 0) \mid S_{1t}(\bar A_{t-1}) \right].$$

$S_{1t}(\bar a_{t-1})$ is low-dimensional and pre-selected by the scientist.
It can be the "empty set".
It can include time trends.
The proximal effect is averaged over any variables in $H_t$ not represented in $S_{1t}$.
The definition depends on the distribution of (past) treatments in the data.

8 / 34

SLIDE 10

What does a Proximal Effect look like?

9 / 34

SLIDE 11

Lagged moderated effect

$A_t$ on $Y_{t+2}$

A lagged treatment effect is

$$E\left[ Y_{t+2}(\bar A_{t-1}, 1, A_{t+1}^{a_t=1}) - Y_{t+2}(\bar A_{t-1}, 0, A_{t+1}^{a_t=0}) \mid S_{2t}(\bar A_{t-1}) \right],$$

where $A_{t+1}^{a_t=a} = A_{t+1}(\bar A_{t-1}, a)$.

$S_{2t}(\bar a_{t-1})$ is again low-dimensional and pre-selected by the scientist.
The delayed effect is averaged over any variables in $H_t$ not represented in $S_{kt}$.
The delayed effect is averaged over the future treatment $A_{t+1}^{a_t}$.
Here, lag = 2.

10 / 34

SLIDE 12

General case: Lag k treatment effects

$$E\left[ Y_{t+k}(\bar A_{t-1}, 1, A_{t+1}^{a_t=1}, \ldots, A_{t+k-1}^{a_t=1}) \mid S_{kt}(\bar A_{t-1}) \right] - E\left[ Y_{t+k}(\bar A_{t-1}, 0, A_{t+1}^{a_t=0}, \ldots, A_{t+k-1}^{a_t=0}) \mid S_{kt}(\bar A_{t-1}) \right],$$

where $A_{t+1}^{a_t=a}$ denotes $A_{t+1}(\bar A_{t-1}, a)$, $A_{t+2}^{a_t=a}$ denotes $A_{t+2}(\bar A_{t-1}, a, A_{t+1}(\bar A_{t-1}, a))$, and so on.

$k$ is pre-selected by the scientist; interest in multiple $k$'s is OK.
$S_{kt}(\bar a_{t-1})$ is again low-dimensional and pre-selected by the scientist for examining the lag $k$ effect (it could differ by $k$).

11 / 34

SLIDE 13

Identification (Effects in terms of observed data)

Under sequential randomization, consistency and positivity assumptions, the proximal treatment effect is

$$\begin{aligned}
& E\left[ Y_{t+1}(\bar A_{t-1}, 1) - Y_{t+1}(\bar A_{t-1}, 0) \mid S_{1t}(\bar A_{t-1}) \right] \\
&= E\left[ E[Y_{t+1} \mid A_t = 1, H_t] - E[Y_{t+1} \mid A_t = 0, H_t] \mid S_{1t} \right] \\
&= E\left[ \frac{I(A_t = 1)\,Y_{t+1}}{\rho_t(1 \mid H_t)} - \frac{I(A_t = 0)\,Y_{t+1}}{1 - \rho_t(1 \mid H_t)} \,\middle|\, S_{1t} \right],
\end{aligned}$$

where $\rho_t(1 \mid H_t) = \Pr(A_t = 1 \mid H_t)$ is the probability used to randomize sequentially.
Lagged treatment effects can be identified similarly.

12 / 34
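
The last line of the identification result can be checked numerically. The sketch below is an illustration under assumed toy values (a binary moderator $S$, randomization probabilities 0.3 and 0.6, true conditional effects 0.5 and 1.5); none of these numbers come from the talk. It averages the inverse-probability-weighted contrast within each level of $S$.

```python
import random

def ipw_effect(data, s_value):
    """Mean of I(A=1)Y/rho - I(A=0)Y/(1-rho) among occasions with S == s_value."""
    terms = [
        (a * y / rho) - ((1 - a) * y / (1 - rho))
        for (s, a, y, rho) in data
        if s == s_value
    ]
    return sum(terms) / len(terms)

def simulate(n, seed=1):
    """Toy occasions (S, A, Y, rho); the effect of A on Y is 0.5 + 1.0 * S."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        s = rng.choice([0, 1])
        rho = 0.3 if s == 0 else 0.6            # known, history-dependent probability
        a = 1 if rng.random() < rho else 0
        y = 1.0 + 2.0 * s + a * (0.5 + 1.0 * s) + rng.gauss(0.0, 1.0)
        data.append((s, a, y, rho))
    return data

data = simulate(200_000)
effect_s0 = ipw_effect(data, 0)   # should be near the assumed true value 0.5
effect_s1 = ipw_effect(data, 1)   # should be near the assumed true value 1.5
```

No outcome model is fitted here; the known $\rho_t$ alone recovers the conditional contrast, which is what makes the identification result usable in a micro-randomized trial.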

SLIDE 15

Extension of the Structural Nested Mean Model

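The treatment "blip" of $a_t$ versus stochastic $A_t(\bar a_{t-1})$ on $Y_{t+k}$ is

$$\begin{aligned}
\mu_{t,t+k}(H_t(\bar A_{t-1}), \bar A_{t-1}, a_t)
&= E\left[ Y_{t+k}(\bar A_{t-1}, a_t, A_{t+1}^{a_t=a}, \ldots, A_{t+k-1}^{a_t=a}) \mid H_t(\bar A_{t-1}) \right] \\
&\quad - E\left[ Y_{t+k}(\bar A_{t-1}, A_t, A_{t+1}, \ldots, A_{t+k-1}) \mid H_t(\bar A_{t-1}) \right] \\
&= E\left[ Y_{t+k}(\bar A_t) \mid A_t = a_t, H_t(\bar A_{t-1}) \right] - E\left[ Y_{t+k}(\bar A_t) \mid H_t(\bar A_{t-1}) \right]
\end{aligned}$$

(by sequential ignorability and consistency).

Our lagged effects are expected contrasts of the "blips":

$$\begin{aligned}
& E\left[ \mu_{t,t+k}(H_t(\bar A_{t-1}), \bar A_{t-1}, 1) \mid S_{kt}(\bar A_{t-1}) \right] - E\left[ \mu_{t,t+k}(H_t(\bar A_{t-1}), \bar A_{t-1}, 0) \mid S_{kt}(\bar A_{t-1}) \right] \\
&= E\left[ Y_{t+k}(\bar A_{t-1}, 1, A_{t+1}^{a_t=1}, \ldots, A_{t+k-1}^{a_t=1}) \mid S_{kt}(\bar A_{t-1}) \right] - E\left[ Y_{t+k}(\bar A_{t-1}, 0, A_{t+1}^{a_t=0}, \ldots, A_{t+k-1}^{a_t=0}) \mid S_{kt}(\bar A_{t-1}) \right].
\end{aligned}$$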

13 / 34

SLIDE 16

The Notion of Availability

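Not all individuals are available for treatment at all time points (e.g., Wang et al. 2012; Robins 2004).
For simplicity, we define this in terms of the observed data, with $I_t = 1$ indicating availability at $t$:

$$\begin{aligned}
& E\left[ E[Y_{t+k} \mid A_t = 1, I_t = 1, H_t] \mid I_t = 1, S_{kt} \right] - E\left[ E[Y_{t+k} \mid A_t = 0, I_t = 1, H_t] \mid I_t = 1, S_{kt} \right] \\
&= E\left[ \frac{1(A_t = 1)\,Y_{t+k}}{\rho_t(1 \mid H_t)} - \frac{1(A_t = 0)\,Y_{t+k}}{1 - \rho_t(1 \mid H_t)} \,\middle|\, I_t = 1, S_{kt} \right].
\end{aligned}$$

Note that $I_t = 1$ is not a static subpopulation, and we expect prior treatment to affect it.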

14 / 34

SLIDE 17

Modeling assumptions

We consider linear models for each lag $k$ effect:

$$E\left[ E[Y_{t+k} \mid A_t = 1, I_t = 1, H_t] \mid I_t = 1, S_{kt} \right] - E\left[ E[Y_{t+k} \mid A_t = 0, I_t = 1, H_t] \mid I_t = 1, S_{kt} \right] = f_k(t, S_{kt})^\top \beta_k.$$

Recall $k = 1$ is the proximal effect.
These models do not constrain each other across $k$ (Robins, Rotnitzky and Scharfstein 2000, Theorem 8.6).
We assume these treatment effect models are correct.

15 / 34

SLIDE 18

Modeling assumptions: Example 1

Suppose $S_{kt}$ is the null set. Here, the analyst is interested in a marginal effect that could vary over time; for example,

$$E\left[ E[Y_{t+1} \mid A_t = 1, I_t = 1, H_t] \mid I_t = 1 \right] - E\left[ E[Y_{t+1} \mid A_t = 0, I_t = 1, H_t] \mid I_t = 1 \right] = \beta_{k1} + \beta_{k2}\,t + \beta_{k3}\,t^2.$$

16 / 34

SLIDE 19

What does a Proximal Effect look like?

17 / 34

SLIDE 20

Modeling assumptions: Example 2

Suppose $S_{kt} = \text{Stress}_t$ is binary. Here, an example model is

$$E\left[ E[Y_{t+1} \mid A_t = 1, I_t = 1, H_t] \mid I_t = 1, \text{Stress}_t \right] - E\left[ E[Y_{t+1} \mid A_t = 0, I_t = 1, H_t] \mid I_t = 1, \text{Stress}_t \right] = \beta_{k1} + \beta_{k2}\,\text{Stress}_t + \beta_{k3}\,t + \beta_{k4}\,t\,\text{Stress}_t.$$

Here, the moderators of interest to the scientist are time and stress.

18 / 34
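
As a small illustration of Example 2, the feature vector $f_k(t, S_{kt}) = (1, \text{Stress}_t, t, t\,\text{Stress}_t)$ and the modeled effect $f_k^\top \beta_k$ can be written out directly. The function names and the coefficient values below are hypothetical, chosen only to make the arithmetic visible.

```python
def effect_features(t, stress):
    """f_k(t, S_kt) = (1, Stress_t, t, t * Stress_t) for binary stress."""
    return [1.0, float(stress), float(t), float(t) * float(stress)]

def modeled_effect(t, stress, beta):
    """Linear treatment-effect model f_k(t, S_kt)^T beta_k."""
    return sum(b * f for b, f in zip(beta, effect_features(t, stress)))

# Hypothetical coefficients (beta_k1, beta_k2, beta_k3, beta_k4).
beta_example = [-0.5, 0.3, 0.01, -0.02]

effect_stressed = modeled_effect(10, 1, beta_example)      # -0.5 + 0.3 + 0.1 - 0.2
effect_not_stressed = modeled_effect(10, 0, beta_example)  # -0.5 + 0.1
```

Setting `stress` to 0 leaves only the intercept and time trend, which is the Example 1 model without the quadratic term.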

SLIDE 21

Estimation

What would we like in an estimator?

Easy: a familiar and easy-to-use estimation method that allows the scientist to
Versatile: examine proximal or lagged effects of $A_t$ conditional on any $S_{kt}$, a low-dimensional subset of $H_t$,
Efficiency: while incorporating working knowledge about the association of $H_t$ and $Y_{t+k}$ for statistical power,
Robustness: yet not requiring this working knowledge to be correct, which can be difficult or impossible! Recall $H_t$ is high-dimensional (especially in mobile health!).

19 / 34

SLIDE 24

Weighted & Centered least squares

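The scientist has already selected $k$, $S_{kt}$, and the treatment effect models.

$\tilde A_t = A_t - \tilde p_t(1 \mid S_{kt})$, the centered treatment

$$W_t = \left( \frac{\tilde p_t(1 \mid S_{kt})}{\rho_t(1 \mid H_t)} \right)^{A_t} \left( \frac{1 - \tilde p_t(1 \mid S_{kt})}{1 - \rho_t(1 \mid H_t)} \right)^{1 - A_t}$$

$g_{kt}(H_t)^\top \alpha_k$ is a working model for $E[W_t Y_{t+k} \mid H_t]$.

Solve for $(\alpha_k, \beta_k)$ in $0 = \mathbb{P}_n U_W(\alpha_k, \beta_k)$, where

$$U_W = \sum_{t=1}^{T-k+1} \left( Y_{t+k} - g_{kt}(H_t)^\top \alpha_k - \tilde A_t f_k(t, S_{kt})^\top \beta_k \right) W_t \begin{pmatrix} g_{kt}(H_t) \\ \tilde A_t f_k(t, S_{kt}) \end{pmatrix}.$$

More intuitively: just do the $W_t$-weighted regression

$$Y_{t+k} \sim \underbrace{g_{kt}(H_t)}_{\text{Efficiency}} + \underbrace{\tilde A_t f_k(t, S_{kt})}_{\text{Causal}}.$$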

20 / 34
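
A minimal sketch of the weighted and centered regression in plain Python, offered as an illustration rather than the authors' software. It pools occasions, treats $\tilde p_t$ as a constant, and returns point estimates only (no standard errors). The helper names and the toy data (binary context `x`, true marginal effect 1.0, control model deliberately omitting `x`) are assumptions.

```python
import random

def solve(A, b):
    """Solve a small linear system A x = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def weighted_least_squares(rows, ys, ws):
    """Minimize sum_i w_i (y_i - rows_i . theta)^2 via the normal equations."""
    p = len(rows[0])
    xtwx = [[0.0] * p for _ in range(p)]
    xtwy = [0.0] * p
    for row, y, w in zip(rows, ys, ws):
        for i in range(p):
            xtwy[i] += w * row[i] * y
            for j in range(p):
                xtwx[i][j] += w * row[i] * row[j]
    return solve(xtwx, xtwy)

def weighted_centered_fit(data, p_tilde):
    """data: list of (g, f, a, y, rho) per occasion; returns (alpha_hat, beta_hat)."""
    rows, ys, ws = [], [], []
    for g, f, a, y, rho in data:
        a_centered = a - p_tilde                                   # centered treatment
        w = p_tilde / rho if a == 1 else (1 - p_tilde) / (1 - rho)  # W_t
        rows.append(list(g) + [a_centered * fj for fj in f])
        ys.append(y)
        ws.append(w)
    theta = weighted_least_squares(rows, ys, ws)
    q = len(data[0][0])
    return theta[:q], theta[q:]

# Toy data: effect of A is 0.5 + 1.0 * x, so the marginal effect is 1.0.
rng = random.Random(0)
data = []
for _ in range(100_000):
    x = rng.choice([0, 1])
    rho = 0.3 if x == 0 else 0.7
    a = 1 if rng.random() < rho else 0
    y = 2.0 * x + a * (0.5 + 1.0 * x) + rng.gauss(0.0, 1.0)
    data.append(([1.0], [1.0], a, y, rho))   # g is intercept-only: x is omitted on purpose

alpha_hat, beta_hat = weighted_centered_fit(data, p_tilde=0.5)
```

Even though the control model leaves out `x`, `beta_hat[0]` should land near the assumed marginal effect of 1.0, which is the robustness property the slide describes.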

SLIDE 25

The Weights

Reminiscent of IPTW in causal inference but no confounding in MRTs!

Weighting is used to facilitate estimating marginal quantities.
Weighting + centering are used to make $\hat\beta_k$ robust against a misspecified $g_{kt}(H_t)^\top \alpha_k$.

About the numerator $\tilde p_t(1 \mid S_{kt})$:

1. It determines the limit of $\hat\beta_k$ when the modeling assumption for the lag $k$ treatment effect is false.
2. Bias can result if we "stabilize" the numerator with variables not in $S_{kt}$.

21 / 34
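
A short worked example of the weight for binary treatment, with illustrative numbers that are not from the talk. The function name `mrt_weight` is hypothetical.

```python
def mrt_weight(a, p_tilde, rho):
    """W_t = (p_tilde / rho)^a * ((1 - p_tilde) / (1 - rho))^(1 - a), a in {0, 1}."""
    if not (0.0 < rho < 1.0):
        raise ValueError("positivity requires 0 < rho < 1")
    return p_tilde / rho if a == 1 else (1.0 - p_tilde) / (1.0 - rho)

# Numerator 0.5, randomization probability 0.25.
w_treated = mrt_weight(1, p_tilde=0.5, rho=0.25)   # 0.5 / 0.25
w_control = mrt_weight(0, p_tilde=0.5, rho=0.25)   # 0.5 / 0.75

# When the numerator equals the randomization probability, the weight is 1.
w_same = mrt_weight(1, p_tilde=0.25, rho=0.25)
```

The last case is why the weights drop out whenever $\tilde p_t(1 \mid S_{kt}) = \rho_t(1 \mid H_t)$.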

SLIDE 28

Weighted & Centered least squares: Special Case 1

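Recall the estimating function

$$\sum_{t=1}^{T-k+1} \left( Y_{t+k} - g_{kt}(H_t)^\top \alpha_k - \tilde A_t f_k(t, S_{kt})^\top \beta_k \right) W_t \begin{pmatrix} g_{kt}(H_t) \\ \tilde A_t f_k(t, S_{kt}) \end{pmatrix}.$$

Suppose $\rho_t = 1/4$.

Choose $k = 1$ and $S_{1t} = \emptyset$; model the marginal proximal effect $E\left[ Y_{t+1}(\bar A_{t-1}, 1) - Y_{t+1}(\bar A_{t-1}, 0) \right]$ using $\beta_{11}$.

Suppose $g_{1t}(H_t)^\top \alpha_1 = \alpha_{10} + \alpha_{11} H_t$ (working model).

Just "standard least squares":

$$Y_{t+1} \sim \underbrace{\text{int} + H_t}_{\text{Efficiency}} + \underbrace{(A_t - 1/4)}_{\text{Effect}}$$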

22 / 34

SLIDE 31

Weighted & Centered least squares: Special Case 2

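Recall the estimating function

$$\sum_{t=1}^{T-k+1} \left( Y_{t+k} - g_{kt}(H_t)^\top \alpha_k - \tilde A_t f_k(t, S_{kt})^\top \beta_k \right) W_t \begin{pmatrix} g_{kt}(H_t) \\ \tilde A_t f_k(t, S_{kt}) \end{pmatrix}.$$

Suppose all variables used in the micro-randomization are of interest as moderators: $S_{1t} = H_t$. So $\tilde p_t(1 \mid S_{1t}) = \rho_t(1 \mid H_t)$, or $W_t = 1$, and $g_{1t}(H_t)^\top \alpha_1$ is now a working model for $E[Y_{t+1} \mid H_t]$.

Choose $k = 1$; use $\beta_{11} + \beta_{12} S_{1t}$ to model the proximal effects $E\left[ Y_{t+1}(\bar A_{t-1}, 1) - Y_{t+1}(\bar A_{t-1}, 0) \mid S_{1t}(\bar A_{t-1}) \right]$.

Suppose $g_{1t}(H_t)^\top \alpha_1 = \alpha_{10} + \alpha_{11} H_t$ (working model).

Just "centered least squares":

$$Y_{t+1} \sim \underbrace{\text{int} + H_t}_{\text{Efficiency}} + \underbrace{\tilde A_t + \tilde A_t S_{1t}}_{\text{Effect}}$$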

23 / 34

SLIDE 32

How does this compare to “a standard” GEE?

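Proposed:

$$\sum_{t=1}^{T-k+1} \left( Y_{t+k} - g_{kt}(H_t)^\top \alpha_k - \tilde A_t f_k(t, S_{kt})^\top \beta_k \right) W_t \begin{pmatrix} g_{kt}(H_t) \\ \tilde A_t f_k(t, S_{kt}) \end{pmatrix}$$

Standard GEE with independence working correlation:

$$\sum_{t=1}^{T-k+1} \left( Y_{t+k} - g_{kt}(H_t)^\top \alpha_k - A_t f_k(t, S_{kt})^\top \beta_k \right) \begin{pmatrix} g_{kt}(H_t) \\ A_t f_k(t, S_{kt}) \end{pmatrix},$$

but this requires

$$E[Y_{t+k} \mid H_t, A_t] = \underbrace{g_{kt}(H_t)^\top \alpha_k}_{\text{Got it right}} + \underbrace{A_t f_k(t, S_{kt})^\top \beta_k}_{\text{Got it right}}$$

(difficult or impossible, especially if the scientist is interested in multiple $k$).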

24 / 34

SLIDE 33

Implementation is easy

Estimation can be implemented with standard GEE software.
Availability? Just replace $W_t$ with $I_t W_t$.
Only the independence working correlation structure may be employed. Alternative structures induce bias.
Extra code (available in R) is needed for SEs with estimated (i) numerator or (ii) denominator of the weights (i.e., not using MRT data).

25 / 34

SLIDE 34

Simulation Experiment 1

Omitting an underlying moderator variable induces bias in standard GEE but not in our proposed Weighting and Centering Estimator.

26 / 34

SLIDE 35

Simulation Experiment 1 with n = T = 30

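$$Y_{t+1} = 0.8(S_t - 0.5) + (A_t - \rho_t(1 \mid H_t))(-0.2 + \beta^*_{11} S_t) + \epsilon_{t+1}$$

$\epsilon_t \sim N(0, 1)$ with $\mathrm{Corr}(\epsilon_u, \epsilon_t) = 0.5^{|u-t|}$
$\rho_t(1 \mid H_t) = \mathrm{expit}(-0.8 A_{t-1} + 0.8 S_t)$ for $S_t \in \{-1, 1\}$
$\Pr(S_t = 1 \mid A_{t-1}, H_{t-1}) = 0.5$

Proximal effect conditional on $S_t$:

$$E\left[ E[Y_{t+1} \mid A_t = 1, H_t] - E[Y_{t+1} \mid A_t = 0, H_t] \mid S_t \right] = -0.2 + \beta^*_{11} S_t$$

Marginal effect:

$$E\left[ E[Y_{t+1} \mid A_t = 1, H_t] - E[Y_{t+1} \mid A_t = 0, H_t] \right] = -0.2 + \beta^*_{11} E[S_t] = -0.2$$

We will vary $\beta^*_{11}$ and compare with GEE.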

27 / 34
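
A rough sketch of this experiment in plain Python, not the authors' simulation code. Assumptions made here for illustration: $\beta^*_{11} = 0.8$, the treatment before the first occasion is set to 0, many more participants than $n = 30$ are pooled so that a single run shows the limiting behavior, and $\tilde p_t$ is taken as the overall treated fraction. Both fits use $(1, S_t)$ as the control model and target the marginal effect $-0.2$.

```python
import math
import random

def solve(A, b):
    """Solve a small linear system A x = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def wls(rows, ys, ws):
    """Weighted least squares via the normal equations."""
    p = len(rows[0])
    xtwx = [[0.0] * p for _ in range(p)]
    xtwy = [0.0] * p
    for row, y, w in zip(rows, ys, ws):
        for i in range(p):
            xtwy[i] += w * row[i] * y
            for j in range(p):
                xtwx[i][j] += w * row[i] * row[j]
    return solve(xtwx, xtwy)

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def simulate_person(T, beta11, rng):
    """One participant under the slide's model; returns (S, A, Y, rho) per occasion."""
    out, a_prev = [], 0                      # assumption: no treatment before t = 1
    eps = rng.gauss(0.0, 1.0)
    for t in range(T):
        if t > 0:                            # AR(1) errors: Corr = 0.5^{|u - t|}, Var = 1
            eps = 0.5 * eps + math.sqrt(0.75) * rng.gauss(0.0, 1.0)
        s = rng.choice([-1, 1])
        rho = expit(-0.8 * a_prev + 0.8 * s)
        a = 1 if rng.random() < rho else 0
        y = 0.8 * (s - 0.5) + (a - rho) * (-0.2 + beta11 * s) + eps
        out.append((s, a, y, rho))
        a_prev = a
    return out

rng = random.Random(2016)
BETA11 = 0.8                                 # assumed value of beta*_11
pooled = [obs for _ in range(4000) for obs in simulate_person(30, BETA11, rng)]
ys = [y for _, _, y, _ in pooled]
p_tilde = sum(a for _, a, _, _ in pooled) / len(pooled)

# Standard GEE with independence working correlation: unweighted Y ~ 1 + S + A.
gee_effect = wls([[1.0, s, a] for s, a, _, _ in pooled], ys, [1.0] * len(pooled))[2]

# Weighted and centered: Y ~ 1 + S + (A - p_tilde), with weights W_t.
wc_rows = [[1.0, s, a - p_tilde] for s, a, _, _ in pooled]
wc_weights = [p_tilde / rho if a == 1 else (1 - p_tilde) / (1 - rho)
              for _, a, _, rho in pooled]
wc_effect = wls(wc_rows, ys, wc_weights)[2]
```

Under these assumptions `wc_effect` should sit close to $-0.2$, while `gee_effect` is pulled toward zero because the unweighted fit averages the moderated effect over a treatment distribution that depends on $S_t$.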

SLIDE 36

Experiment 1: Omitting an Underlying Moderator

Results

Similar results for varying levels of n, T, marginal proximal effect sizes and types of residual correlation structure in the generative model.

28 / 34

SLIDE 37

Two Other Simulation Experiments in Paper

Illustrate how bias can result if we "stabilize" the numerator with variables not in $S_{kt}$.
Illustrate how employing a non-independence working correlation structure with the weighting and centering can result in bias.

29 / 34

SLIDE 38

4 Easy, Take Home Messages

1. We proposed a new estimand, particularly useful to behavioral intervention scientists working in mHealth.
2. Weighting and centering allow us to estimate causal effects that are marginal over $H_t$ and robust to mis-specification of $E[W_t Y_{t+k} \mid H_t]$.
3. Easy implementation using "over the counter" GEE software.
4. For now... we advise using only an independent within-person correlation matrix (i.e., do not use random effects modeling).

30 / 34

SLIDE 39

Recall BASICS Mobile Example

Smartphone-based intervention to curb heavy drinking and smoking in college students
Data Collected: EMA up to 3x/day (morning, afternoon, evening)
Intervention Frequency: Up to 2x/day (afternoon, evening)
Intervention Content: Mindfulness-based message vs general health information (binary)
Intervention Availability: Based on answering an EMA
Typical Question: Is the effect of providing a mindfulness-based intervention (vs general health information) on subsequent smoking rate moderated by an increase in need to self-regulate?

31 / 34

SLIDE 40

BASICS Mobile Data Example

$A_t$: indicator that the user received a mindfulness-based message
$Y_{t+1}$: smoking rate reported at the EMA following $A_t$
$k$: examined a proximal ($k = 1$) and a delayed ($k = 2$) effect
$S_{1t}$: indicator of increased self-regulation from $t - 1$ to $t$
$S_{2t}$: the empty set (marginal delayed effect)
$p_t(1 \mid I_t = 1, H_t)$: mindfulness treatment more likely if urge is high or past smoking is high
$\tilde p_t(1 \mid I_t = 1, S_t)$: estimated using $\mathbb{P}_n \sum_{t=1}^{T} A_t / T = 0.67$

Treatment effect            Estimate   SE     95% CI            p-value
Proximal, ↑ self-reg        −0.05      0.94   (−2.03, 1.93)     0.96
Proximal, no ↑ self-reg     −2.78      1.27   (−5.47, −0.10)    0.04
Delayed                     −0.47      0.60   (−1.74, 0.80)     0.45

32 / 34

SLIDE 41

Future Work

Apply these methods with HeartSteps and Smoking Cessation Data
How best to include random effects?
Variable selection (e.g., penalization) for the covariates used in the working model
Is there a scientific rationale for sharing parameters across k (proximal and lagged treatment effects)?
Does estimating and choosing an appropriate model for the working diagonal variance structure increase efficiency?

33 / 34

SLIDE 42

Thank you!

My email: dalmiral@umich.edu
My colleagues: Susan, Audrey and Katie

34 / 34