
Bridging observational studies and randomized experiments by embedding the former in the latter
D.B. Rubin (with M.-A. Bind)
LISER, July 13th 2017

Once upon a time... many people smoked

Today: how would you investigate whether parental smoking has an impact on children?

Does parental smoking have an impact on children’s lung function?
Goal: quantify the impact of smoking / the benefit of smoking reduction

Four stages to address causality from an observational dataset
1) A conceptual stage
2) A design stage
3) A statistical analysis stage
4) A summary stage

First two stages to address causality
1) A conceptual stage that involves the precise formulation of the causal question (and related assumptions) using potential outcomes, described in terms of a hypothetical randomized experiment where the exposure is randomly assigned to units; this description includes the timing of random assignment and defines the target population; no computation is needed at this stage.
2) A design stage that attempts to reconstruct (or approximate) the design of a randomized experiment before any outcome data are observed (that is, with unconfounded assignment of exposure using the observed background and treatment assignment data); typically, heavy use of computing is needed at this stage, e.g., for multivariate matched sampling and extensive balance diagnostics.

Last two stages to address causality
3) A statistical analysis stage defined in a protocol specified before seeing any outcome data, comparing the outcomes of interest in similar (e.g., hypothetically randomly divided) exposed and non-exposed units of the hypothetical randomized experiment; this stage is the one that most closely parallels the standard model-based analyses but uses more flexible methods.
4) A summary stage providing conclusions about statistical evidence for the sizes of possible causal effects of the exposure; no computing is required at this stage, just thoughtful summarization, e.g., focusing on real-world interventions.

First stage: formulation of the causal question in terms of a hypothetical randomized experiment using potential outcomes for the observational dataset

i         Age   Height   Sex   Parental smoking   FEV-1(0)   FEV-1(1)
1
...
N = 654
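The columns on this slide can be mimicked with a small sketch. All numbers, the assumed effect size, and the helper names `make_child` and `observe` are hypothetical and invented for illustration; only the column layout and N = 654 come from the slide. The point is that each child has two potential outcomes, FEV-1(0) and FEV-1(1), but only the one corresponding to the parents' actual smoking status is ever observed.

```python
import random

random.seed(0)

def make_child(i):
    """Build one hypothetical row: covariates, exposure, and both potential outcomes."""
    age = random.randint(3, 19)                        # years (made-up range)
    height = 40 + 1.5 * age + random.gauss(0, 2)       # inches (made-up relation)
    sex = random.randint(0, 1)
    smoking = random.randint(0, 1)                     # parental smoking indicator
    fev0 = 0.5 + 0.05 * height + random.gauss(0, 0.2)  # FEV-1(0): parents do not smoke
    fev1 = fev0 - 0.1                                  # FEV-1(1): parents smoke (assumed effect)
    return {"i": i, "age": age, "height": height, "sex": sex,
            "smoking": smoking, "fev0": fev0, "fev1": fev1}

def observe(row):
    """Return the row as an analyst sees it: the unobserved potential outcome is None."""
    seen = dict(row)
    if row["smoking"] == 1:
        seen["fev0"] = None
    else:
        seen["fev1"] = None
    return seen

full_table = [make_child(i) for i in range(1, 655)]    # N = 654, as on the slide
observed = [observe(r) for r in full_table]
```

In the real dataset only the `observed` version exists; the causal question concerns the comparison of the two outcome columns, one of which is always missing for each child.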

Non-exposed vs. exposed children
What can you say about the distributions of age in the 654 children?

Non-exposed vs. exposed children
What can you say about the distributions of height in the 654 children?

Hypothetical experiments
All hypothetical experiments assume that the 654 families have smoking parents and a child yet to be born. We also assume full compliance with the assigned treatment.
Hypothetical experiment A
Suppose we intervene on smoking households before they have children and randomize them to stop smoking with probability 9/10, and thus to continue to smoke with probability 1/10. The result is a completely randomized experiment with N_Smoking = 65 children with smoking parents and N_Non-Smoking = 589 children with non-smoking parents.
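A completely randomized experiment with the group sizes of experiment A can be sketched as follows. The function name `completely_randomized` and the seed are illustrative assumptions; the sizes 654, 65 and 589 are the ones on the slide. Fixing the number of still-smoking families in advance makes every allocation with exactly 65 such families equally likely.

```python
import random

def completely_randomized(n_units, n_smoking, seed=None):
    """Return a 0/1 list with exactly n_smoking ones, every such allocation equally likely."""
    rng = random.Random(seed)
    smokers = set(rng.sample(range(n_units), n_smoking))
    return [1 if i in smokers else 0 for i in range(n_units)]

assignment = completely_randomized(654, 65, seed=1)
print(sum(assignment), len(assignment) - sum(assignment))  # prints "65 589"
```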

Hypothetical experiments
Hypothetical experiment B
Suppose we selected boundaries for the covariates age and height (viewed as surrogates for properties of parents) and restricted the experiment to the 361 families who fell within those boundaries. Among the restricted families, we assigned, completely at random, 61 families to continue smoking and 300 to stop smoking. This strategy led to N_Smoking = 61 children with smoking parents and N_Non-Smoking = 300 children with non-smoking parents.

Hypothetical experiments
Hypothetical experiment C
A randomized block experiment with blocks that are relatively homogeneous with respect to age and height, viewed as properties of parents. The rules for blocking are not clearly specified (cem). This formulation led to N_Smoking = 57 children with smoking parents and N_Non-Smoking = 216 children with non-smoking parents.

Hypothetical experiments
Other hypothetical randomized experiments would also intervene on smoking parents before their child’s conception; we describe two such experiments, both of which selected a pool of 126 families in a non-random way that depended on the covariates age and height, again viewed as characteristics of the parents.
Hypothetical experiment D.1: a completely randomized experiment that creates two equal-sized groups of parents similar on background characteristics, that is, N_Smoking = N_Non-Smoking = 63 children.
Hypothetical experiment D.2: a rerandomized experiment with two equal-sized groups of similar parents (with N_Smoking = N_Non-Smoking = 63), in which a randomized allocation is accepted only when the differences in covariate means (e.g., height) between smokers and non-smokers fall within calipers defined a priori.
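The rerandomization rule of experiment D.2 can be sketched as a rejection loop: draw a completely randomized allocation, check the covariate mean difference, and redraw until it falls within the caliper. The heights, the caliper value of 0.5 and the function name `rerandomize` are hypothetical; the slides do not give the calipers actually used.

```python
import random
from statistics import mean

def rerandomize(covariate, n_treated, caliper, seed=None, max_tries=10_000):
    """Draw completely randomized allocations until the absolute difference in
    covariate means between the two groups is within the caliper."""
    rng = random.Random(seed)
    n = len(covariate)
    for _ in range(max_tries):
        treated = set(rng.sample(range(n), n_treated))
        mean_t = mean(covariate[i] for i in treated)
        mean_c = mean(covariate[i] for i in range(n) if i not in treated)
        if abs(mean_t - mean_c) <= caliper:
            return [1 if i in treated else 0 for i in range(n)]
    raise RuntimeError("no acceptable allocation found; widen the caliper")

data_rng = random.Random(0)
heights = [data_rng.gauss(62, 5) for _ in range(126)]  # hypothetical heights, in inches
w = rerandomize(heights, n_treated=63, caliper=0.5, seed=1)
```

Because the acceptance rule is fixed before any allocation is drawn, the set of allowed allocations is known in advance, which is what distinguishes this design from balance checking after the fact.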

Hypothetical experiments
Another hypothetical randomized experiment, Hypothetical experiment E, would intervene after the child’s conception, from the point in time at which we know the child’s gender. We select a pool of 126 children according to the same rule as in Hypothetical experiment D, except that we reject all samples of 126 with an odd number of females or an odd number of males. We create pairs of "similar" parents expecting a child of the same gender, where "similar" refers to time of conception and height of the parents. A coin flip determines which of the two similar sets of parents in a pair continues to smoke, so that their child is exposed (with N_Smoking = N_Non-Smoking = 63 children).
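The coin flip within pairs in experiment E corresponds to a paired randomized experiment, sketched below. The unit labels and the function name `paired_randomization` are hypothetical; only the structure (63 pairs, one still-smoking member per pair) follows the slide.

```python
import random

def paired_randomization(pairs, seed=None):
    """For each pair (a, b) of unit labels, flip a fair coin to decide which
    member is assigned to the still-smoking condition. Returns {unit: 0 or 1}."""
    rng = random.Random(seed)
    assignment = {}
    for a, b in pairs:
        if rng.random() < 0.5:
            a, b = b, a
        assignment[a] = 1   # parents continue to smoke
        assignment[b] = 0   # parents stop smoking
    return assignment

pairs = [(2 * k, 2 * k + 1) for k in range(63)]  # 63 hypothetical pairs, 126 units
w = paired_randomization(pairs, seed=1)
```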

Second stage: design stage that attempts to reconstruct the ideal conditions for a randomized experiment
What type(s) of design do you know?


A few quotes on matching (Imbens and Rubin, 2016)
Matching can be interpreted as reorganizing the data from an observational study in such a way that the assumptions from a randomized experiment hold, at least approximately.
Unconfoundedness is not guaranteed (as it is, in expectation, for a randomized experiment).
Matching may be inexact; systematic differences in pre-exposure variables across the matched pairs may remain, but they can subsequently be adjusted for in the analysis stage.

One approach using a propensity score matching strategy
Overall picture: compare "like with like".

$$\mathrm{logit}\, P(\text{Smoking}=1 \mid \text{Age}, \text{Height}, \text{Sex}) = \beta_0 + \beta_1\,\text{Age} + \beta_2\,\text{Height} + \beta_3\,\text{Sex}$$

Fitted values = $\hat{P}(\text{Smoking}=1 \mid \text{Age}, \text{Height}, \text{Sex})$ = propensity score

i     Age   Height   Sex   Parental smoking   Propensity score
1     9     58       1     1                  0.01
...   ...   ...      ...   ...                ...
654   9     58       1     0                  0.01

1-1 matching with caliper on the estimated propensity score.
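The two steps on this slide, fitting the logistic model and then matching 1-1 with a caliper, can be sketched in plain Python. Everything below is an illustration under stated assumptions: the data are simulated, the covariates are standardized, the model is fitted with a fixed number of gradient steps rather than a production-quality optimizer, and the matching is a simple greedy nearest-neighbor rule. The slides do not say which matching algorithm or caliper width was used, and the names `fit_logistic` and `caliper_match` are hypothetical.

```python
import math
import random

def sigmoid(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def fit_logistic(X, w, steps=500, lr=0.5):
    """Gradient ascent on the mean log-likelihood of
    logit P(w = 1 | x) = beta[0] + beta[1:] . x (a fixed number of steps)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(steps):
        grad = [0.0] * (p + 1)
        for xi, wi in zip(X, w):
            resid = wi - sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], xi)))
            grad[0] += resid
            for j, x in enumerate(xi):
                grad[j + 1] += resid * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def caliper_match(ps, w, caliper):
    """Greedy 1-1 matching without replacement: each exposed unit takes the
    closest unused non-exposed unit, if it lies within the caliper."""
    free = [i for i, wi in enumerate(w) if wi == 0]
    pairs = []
    for t in (i for i, wi in enumerate(w) if wi == 1):
        if not free:
            break
        c = min(free, key=lambda j: abs(ps[j] - ps[t]))
        if abs(ps[c] - ps[t]) <= caliper:
            pairs.append((t, c))
            free.remove(c)
    return pairs

# Simulated data (hypothetical): standardized age and height, binary sex;
# older children are more likely to have smoking parents.
rng = random.Random(0)
X, w = [], []
for _ in range(300):
    age = rng.gauss(0, 1)
    height = 0.8 * age + rng.gauss(0, 0.6)
    sex = rng.randint(0, 1)
    X.append((age, height, sex))
    w.append(1 if rng.random() < sigmoid(-2.0 + 1.0 * age) else 0)

beta = fit_logistic(X, w)
ps = [sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], xi))) for xi in X]
pairs = caliper_match(ps, w, caliper=0.05)  # caliper width is an assumption
```

Exposed units with no non-exposed unit inside the caliper are left unmatched, which is one way the matched sample ends up smaller than the original one.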

Overlap
154 unexposed children were "unmatchable" to exposed children (i.e., outside the range of the exposed children in terms of covariates), and 2 exposed children were "unmatchable" to unexposed children.

Overlap
After trimming and refitting
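The overlap slides describe dropping "unmatchable" children who fall outside the range of the other group. A minimal sketch of such trimming is given below, under the assumption that the range is checked on the estimated propensity score; the slides refer to covariates, and the exact rule used is not stated. The scores and the function name `trim_to_overlap` are hypothetical.

```python
def trim_to_overlap(ps, w):
    """Keep the indices of units whose estimated propensity score lies within
    the range of scores observed in both exposure groups."""
    ps_t = [p for p, wi in zip(ps, w) if wi == 1]
    ps_c = [p for p, wi in zip(ps, w) if wi == 0]
    lo = max(min(ps_t), min(ps_c))
    hi = min(max(ps_t), max(ps_c))
    return [i for i, p in enumerate(ps) if lo <= p <= hi]

ps = [0.01, 0.05, 0.20, 0.40, 0.60, 0.02, 0.30, 0.55, 0.90]  # hypothetical scores
w = [0, 0, 0, 0, 0, 1, 1, 1, 1]
kept = trim_to_overlap(ps, w)
print(kept)  # prints [1, 2, 3, 4, 5, 6, 7]
```

After trimming, the propensity score model would be refitted on the retained units, as the slide indicates.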

Matched pairs based on propensity score

Pair   Age        Height         Sex     Parental smoking   Estimated PS
1      (9, 9)     (58, 58)       (1,1)   (1,0)              (0.01, 0.01)
...    ...        ...            ...     ...                ...
63     (18, 16)   (70.5, 66.5)   (1,0)   (1,0)              (0.6, 0.6)

We ended up with N = 126 children (i.e., N_T = N_C = 63, forming 63 "similar" matched pairs).

Diagnostics for second stage: Love plots
Propensity score matching. Black: before matching; grey: after matching

Diagnostics for second stage: Love plots
After optimal matching. Black: before matching; grey: after matching
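Love plots typically display, for each covariate, a standardized mean difference between exposed and non-exposed units before and after matching. The slides do not state which formula was used; the sketch below assumes the common version that divides by the square root of the average of the two group variances. The function name and the toy data are hypothetical.

```python
import math
from statistics import mean, variance

def standardized_mean_difference(x, w):
    """(mean of exposed - mean of non-exposed) / sqrt(average of the two sample variances)."""
    xt = [xi for xi, wi in zip(x, w) if wi == 1]
    xc = [xi for xi, wi in zip(x, w) if wi == 0]
    pooled_sd = math.sqrt((variance(xt) + variance(xc)) / 2)
    return (mean(xt) - mean(xc)) / pooled_sd

age = [10, 12, 14, 8, 10, 12]   # toy ages
w = [1, 1, 1, 0, 0, 0]
smd = standardized_mean_difference(age, w)
print(smd)  # prints 1.0
```

Computing this quantity for every covariate on the original and on the matched data gives the two series of points shown in a Love plot.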

Diagnostics for second stage: age histograms in the original, ps-matched, and optimal paired datasets
KS p-values: 1) before matching = $10^{-16}$; 2) after matching, not significant

Diagnostics for second stage: height histograms in the original, ps-matched, and optimal paired datasets
KS p-values: 1) before matching = $10^{-12}$; 2) after matching, not significant
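The KS p-values on these slides come from two-sample Kolmogorov-Smirnov tests comparing a covariate's distribution between exposed and non-exposed children. A minimal sketch of the underlying test statistic is shown below on toy data; it computes only the statistic, not the p-value, and the function name `ks_statistic` is hypothetical.

```python
from bisect import bisect_right

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest absolute gap
    between the two empirical distribution functions."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in sorted(set(a) | set(b)):
        fa = bisect_right(a, x) / len(a)   # share of a that is <= x
        fb = bisect_right(b, x) / len(b)   # share of b that is <= x
        d = max(d, abs(fa - fb))
    return d

print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))  # prints 0.5
```

A statistic near 0 means the two empirical distributions are close, which is what balance after matching should look like; in practice a library routine such as `scipy.stats.ks_2samp` also returns the p-value.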

Third stage: analysis stage that compares the outcomes of interest in the exposed versus non-exposed units of the hypothetical randomized experiment
