SLIDE 1

Generalizing experimental study results to target populations

Elizabeth Stuart
Johns Hopkins Bloomberg School of Public Health
Departments of Mental Health, Biostatistics, and Health Policy and Management

estuart@jhu.edu
www.biostat.jhsph.edu/~estuart

Funding thanks to NSF DRL-1335843, IES R305D150003

February 26, 2016

Elizabeth Stuart (JHSPH) Generalizability February 26, 2016 1 / 25

SLIDE 2

Outline

1. Introduction, context, and framework
2. The setting and overview of approaches
3. Reweighting approaches
4. Conclusions

SLIDE 3

Outline (section transition: 1. Introduction, context, and framework)

SLIDE 4

Making research results relevant: A range of policy or practice questions

- A given district or school may go to the What Works Clearinghouse to see whether a new reading intervention is "evidence-based" and helpful for them
- The state of Maryland may be deciding whether to recommend the new program for all schools or districts in the state
  - Or for all "struggling" schools?
- Medicare may be deciding whether to approve payment for a new treatment for back pain
- Should a broad public health media campaign be started around not switching car seats to forward facing until a child is 12 months old?

SLIDE 5

From individual to population effects

- All of these reflect a "population" average treatment effect
  - e.g., across individuals in a population, does this intervention work "on average"?
  - This population could be fairly narrow, or quite broad
- There may actually be underlying treatment effect heterogeneity
  - e.g., stronger effects for some individuals
  - Lots of interest in tailoring treatments for individuals; not my focus today
- But the policy questions that motivate today's talk call for an overall average effect

SLIDE 6

- At this point, relatively little attention has been paid to how well results from a given study carry over to a relevant target population
- This talk discusses recent work trying to get people to start thinking about these issues, while taking advantage of recent advances in study quality and data

SLIDE 7

How much do we need to worry about external validity?

- Lots of evidence that the people or groups that participate in trials differ from general populations
  - Districts that participate in rigorous educational evaluations are much larger than typical US districts (Stuart et al., under review)
  - People who participate in trials of drug abuse treatment have higher education levels than those in drug abuse treatment nationwide (Susukida et al., in press)
  - Increasing worries about lack of minority representation in clinical trials
- These differences will cause bias if the factors that differ also moderate treatment effects
- And these differences can lead to external validity bias (Bell et al., in press)

SLIDE 8

Outline (section transition: 2. The setting and overview of approaches)

SLIDE 9

The setting

- Assume we have one randomized trial, already conducted
- And also covariate data on some target population of interest (we do not have treatment or outcome values in the population)
- The question: How can we use these data to estimate the effects of the intervention in the target population?
- Note: Focused on assessing and enhancing external validity with respect to the characteristics of trial and population subjects
  - Lots of other threats to external validity as well: scale-up problems, implementation, different settings, . . . (see Cook, 2014)

SLIDE 10

Analysis approaches for estimating population effects

- Meta-analysis: useful when multiple studies are available, but does not necessarily give population estimates
- Cross-design synthesis: explicitly combines experimental and non-experimental effect estimates (Pressler & Kaizar, 2013)
- Model-based approaches: model the outcome in the trial, then use the model to predict outcomes in the population (e.g., BART; Kern et al., 2016)
- Post-stratification: estimate separate effects within strata, then combine using population proportions
- Reweighting: like a smoothed version of post-stratification (Cole & Stuart, 2009; O'Muircheartaigh & Hedges, 2014)
- (Of course design options exist too, e.g., aiming to enroll representative (or "balanced") samples (Royall!))
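Post-stratification is simple enough to sketch in a few lines. A minimal Python illustration (all stratum effects and proportions below are invented, not from the talk): estimate effects within strata of the trial, then average them using population rather than trial proportions.

```python
# Post-stratification sketch: combine stratum-specific trial effects
# using population proportions. All numbers are illustrative, not from
# any real study.

# Stratum-specific treatment effect estimates from the trial
trial_effects = {"age<30": -0.10, "age30-39": 0.25, "age40+": 0.15}

# Stratum shares in the trial sample vs. the target population
trial_props = {"age<30": 0.10, "age30-39": 0.50, "age40+": 0.40}
pop_props = {"age<30": 0.40, "age30-39": 0.35, "age40+": 0.25}

# In-sample average effect: stratum effects weighted by trial shares
sample_ate = sum(trial_effects[s] * trial_props[s] for s in trial_effects)

# Population average effect: the same stratum effects, reweighted by
# population shares
population_ate = sum(trial_effects[s] * pop_props[s] for s in trial_effects)

print(f"trial-sample ATE: {sample_ate:.3f}")      # 0.175
print(f"population ATE:   {population_ate:.3f}")  # 0.085
```

Because the youngest stratum (which benefits least in this toy example) makes up a larger share of the population than of the trial, the population estimate comes out smaller than the in-sample one.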

SLIDE 11

Outline (section transition: 3. Reweighting approaches)

SLIDE 12

Case study: The ACTG Trial

- Examined highly active antiretroviral therapy (HAART) for HIV compared to standard combination therapy
- 577 US HIV+ adults randomized to treatment, 579 to control
- 33/577 and 63/579 endpoints (AIDS or death) during 52-week follow-up
- Intent-to-treat analysis: hazard ratio of 0.51 (95% CI: 0.33, 0.77)
- Cole & Stuart (2010)

SLIDE 13

The target population

- We don't necessarily care only about the people in the trial: what would the effects of the treatment be if implemented nationwide?
- Target: US estimates of the number of people newly infected with HIV in 2006 (CDC, 2008)
  - HIV incidence was estimated using a statistical approach with adjustment for testing frequency, and extrapolated to the US
- We have the joint distribution of sex, race, and age group for the newly infected individuals

SLIDE 14

Inverse probability of selection weighting

- Weight the trial subjects up to the population
- Each subject i in the trial receives weight w_i = 1 / P(S_i = 1 | X_i), the inverse of their probability of being in the trial given covariates X_i
- Use those weights when calculating means or running regressions
- Related to inverse probability of treatment weighting and to Horvitz-Thompson estimation in surveys
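The weighting step itself is a one-liner once the selection probabilities are in hand. A minimal sketch, assuming P(S_i = 1 | X_i) has already been estimated (in practice it would typically come from a logistic regression of trial membership on covariates in the combined trial-plus-population data); all numbers are invented for illustration.

```python
# Inverse-probability-of-selection weighting (IPSW) sketch.
# Each pair is (individual-level effect estimate, P(S_i = 1 | X_i));
# the selection probabilities would normally be estimated, e.g. by
# logistic regression of trial membership on covariates. Numbers are
# invented for illustration.
trial_subjects = [
    (0.30, 0.60),  # covariate profile over-represented in the trial
    (0.10, 0.20),  # profile under-represented in the trial
    (0.20, 0.40),
]

# w_i = 1 / P(S_i = 1 | X_i): inverse of the probability of being in
# the trial, so under-represented profiles get larger weights
weights = [1.0 / p for _, p in trial_subjects]

# Unweighted (trial-sample) vs. weighted (population) average effect
sample_ate = sum(e for e, _ in trial_subjects) / len(trial_subjects)
population_ate = (sum(e * w for (e, _), w in zip(trial_subjects, weights))
                  / sum(weights))

print(f"sample ATE:     {sample_ate:.3f}")      # 0.200
print(f"population ATE: {population_ate:.3f}")  # 0.164
```

Normalizing by the sum of the weights (a Hajek-style estimator) rather than the population size keeps the estimate stable when a few weights are large.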

SLIDE 15

Standard assumptions

- The experiment was randomized
- "Sample ignorability for treatment effects": selection into the trial is independent of impacts, given the observed covariates
  - For the same value of observed covariates, impacts are the same across trial and population
  - No unmeasured variables related to both selection into the trial and treatment effects (sensitivity analysis for this: Nguyen et al., under review)
- "Overlap": all individuals in the population had a non-zero probability of participating in the trial
- Analogous to strong ignorability/unconfoundedness of treatment assignment in non-experimental studies
- (If the outcome under control is observed in the population, a slightly different assumption can be used)
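When covariates are coarse, the overlap assumption can be probed directly: look for population covariate cells with no trial representation, since for those cells the selection probability is estimated as zero and the weight 1/P(S=1|X) is undefined. A toy sketch with invented sex-by-age cells:

```python
# Crude overlap diagnostic: flag population covariate cells that have
# no counterpart in the trial. Cells below are invented for illustration.
from collections import Counter

trial_cells = Counter([("M", "30-39"), ("M", "40+"), ("F", "30-39")])
pop_cells = Counter([("M", "30-39"), ("F", "30-39"), ("F", "13-29")])

# Cells present in the population but absent from the trial violate
# overlap: their estimated selection probability is zero
no_overlap = [cell for cell in pop_cells if cell not in trial_cells]
print("population cells unrepresented in the trial:", no_overlap)
# population cells unrepresented in the trial: [('F', '13-29')]
```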

SLIDE 16

Effect heterogeneity and predictors of participation

People in the trial were more likely to be:

- Older (not 13-29)
- Male
- White or Hispanic

Those characteristics also moderate effects in the trial:

- Detrimental effects for young people
- Largest effects for those 30-39
- Larger effects for males than for females
- Larger effects for blacks than for whites or Hispanics

SLIDE 17

Estimated population effects

                        Hazard ratio   95% CI
Crude trial results         0.51       0.33, 0.77
Age weighted                0.68       0.39, 1.17
Sex weighted                0.53       0.34, 0.82
Race weighted               0.46       0.29, 0.72
Age-sex-race weighted       0.57       0.33, 1.00

- CIs are wider for the weighted results
- Effects are generally somewhat attenuated, except when weighting only by race

SLIDE 18

Placebo checks

- Can also use the weighting as a diagnostic
- The weighted control-group mean should match the population outcome mean if the control conditions are the same (a "placebo check")
- In the HAART case, if we had mortality information in the population, we could check whether the weighted mortality rate in the control group matched the population mortality rate (assuming no treatment in the population)
- If the placebo check fails, it may indicate unobserved differences between the groups
- (Hartman et al., 2013; Stuart et al., 2011)
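The arithmetic behind a placebo check is just a weighted mean comparison. A sketch with invented outcomes, weights, and population rate: compare the selection-weighted control-group outcome mean to the known population outcome mean.

```python
# Placebo-check sketch: the selection-weighted control-group mean
# should be close to the population outcome mean if no unobserved
# differences remain. All values are invented for illustration.

# Control-group subjects: (binary outcome, selection weight 1/P(S=1|X))
controls = [(1, 2.0), (0, 5.0), (1, 2.5), (0, 1.5)]

weighted_mean = sum(y * w for y, w in controls) / sum(w for _, w in controls)
population_mean = 0.40  # hypothetical known population outcome rate

# A large gap would suggest unobserved trial-population differences
discrepancy = abs(weighted_mean - population_mean)
print(f"weighted control mean: {weighted_mean:.3f}")  # 0.409
print(f"gap vs. population:    {discrepancy:.3f}")
```

What counts as a "large" gap is a judgment call in practice; sampling variability in both means should be accounted for before declaring a failure.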

SLIDE 19

Outline (section transition: 4. Conclusions)

SLIDE 20

Everyone wants to assume that study results generalize

- But very few statistical methods exist
- At this point, lots of "hand waving" and qualitative statements
- Need more statistical methods to quantify and improve external validity
  - For both study design and study analysis

SLIDE 21

What do we need to assess and enhance external validity?

- Information on the factors that influence treatment effect heterogeneity
- Information on the factors that influence participation in rigorous evaluations
- Data on all of these factors in both the trial and the population
  - Not very helpful if these factors are not observed in the population
- Methods that allow for the differences between trial and population on these factors
  - These are coming along

SLIDE 22

Data a primary limiting factor

- Right now we have very little information on the factors that influence effects or participation in trials
- Sometimes hard to find population data; trial data also often not publicly available
- Even harder to find population data with the same measures as the trial of interest
  - Stuart & Rhodes (under review): hard to find appropriate population data, and even then, of over 400 measures in each dataset, only about 7 were comparable

SLIDE 23

Conclusions

- We can't necessarily assume that average effects seen in a trial carry over directly to a target population
- Methods allow us to adjust for differences in observed characteristics between the trial sample and the population, to estimate population treatment effects
- But they are only as good as the data available!

SLIDE 24

And remember . . .

"With better data, fewer assumptions are needed."
  – Rubin (2005, p. 324)

"You can't fix by analysis what you bungled by design."
  – Light, Singer, and Willett (1990, p. v)

"Real world relationships are invariably more complicated than those we can represent in mathematically tractable models."
  – Royall and Pfeffermann (1981, p. 16)

SLIDE 25

References, with thanks to all my co-authors

Bell, S.H., Olsen, R.B., Orr, L.L., and Stuart, E.A. (in press). Estimates of external validity bias when impact evaluations select sites non-randomly. Forthcoming in Educational Evaluation and Policy Analysis.

Cole, S.R. and Stuart, E.A. (2010). Generalizing evidence from randomized clinical trials to target populations: the ACTG-320 trial. American Journal of Epidemiology 172: 107-115.

Imai, K., King, G., and Stuart, E.A. (2008). Misunderstandings between experimentalists and observationalists about causal inference. Journal of the Royal Statistical Society, Series A 171: 481-502.

Kern, H.L., Stuart, E.A., Hill, J., and Green, D.P. (2016). Assessing methods for generalizing experimental impact estimates to target populations. Journal of Research on Educational Effectiveness.

Olsen, R., Bell, S., Orr, L., and Stuart, E.A. (2013). External validity in policy evaluations that choose sites purposively. Journal of Policy Analysis and Management 32(1): 107-121.

Stuart, E.A., Cole, S.R., Bradshaw, C.P., and Leaf, P.J. (2011). The use of propensity scores to assess the generalizability of results from randomized trials. Journal of the Royal Statistical Society, Series A 174(2): 369-386.

Stuart, E.A., Bradshaw, C.P., and Leaf, P.J. (2015). Assessing the generalizability of randomized trial results to target populations. Prevention Science 16(3): 475-485.

Susukida, R., Crum, R., Stuart, E.A., and Mojtabai, R. (in press). Assessing sample representativeness in randomized control trials: application to the National Institute of Drug Abuse Clinical Trials Network. Forthcoming in Addiction.