Why Propensity Scores Should Be Used for Matching (Ben Jann)

Why Propensity Scores Should Be Used for Matching Ben Jann University of Bern, ben.jann@soz.unibe.ch 2017 German Stata Users Group Meeting Berlin, June 23, 2017 Ben Jann (University of Bern) Propensity Scores Matching Berlin, 23.06.2017 1

Contents

1. Potential Outcomes and Causal Inference
2. Matching
3. Propensity Score Matching
4. King and Nielsen’s “Why Propensity Scores Should Not Be Used for Matching”
5. Are King and Nielsen right?
6. Illustration using kmatch
7. Conclusions

Counterfactual Causality (see Neyman 1923, Rubin 1974, 1990)
a.k.a. Rubin Causal Model, a.k.a. Potential Outcomes Framework

John Stuart Mill (1806–1873):
“Thus, if a person eats of a particular dish, and dies in consequence, that is, would not have died if he had not eaten of it, people would be apt to say that eating of that dish was the cause of his death.” (Mill 2002[1843]:214)

Counterfactual Causality (see Neyman 1923, Rubin 1974, 1990)
a.k.a. Rubin Causal Model, a.k.a. Potential Outcomes Framework

Treatment variable $D$:
- $D = 1$: treatment (eats of a particular dish)
- $D = 0$: control (does not eat of a particular dish)

Potential outcomes $Y^1$ and $Y^0$:
- $Y^1$: potential outcome with treatment ($D = 1$)
  - If person $i$ ate of a particular dish, would she die or would she survive?
- $Y^0$: potential outcome without treatment ($D = 0$)
  - If person $i$ did not eat of a particular dish, would she die or would she survive?

Causal effect of the treatment for individual $i$: the difference between the potential outcomes

$$\delta_i = Y_i^1 - Y_i^0$$

Fundamental Problem of Causal Inference

The causal effect of $D$ on $Y$ for individual $i$ is defined as the difference in potential outcomes:

$$\delta_i = Y_i^1 - Y_i^0$$

However, the observed outcome variable is

$$Y_i = \begin{cases} Y_i^1 & \text{if } D_i = 1 \\ Y_i^0 & \text{if } D_i = 0 \end{cases}$$

That is, only one of the two potential outcomes will be realized and, hence, only $Y_i^1$ or $Y_i^0$ can be observed, but never both.

Consequence: The individual treatment effect $\delta_i$ cannot be observed!

Average Treatment Effect

Although individual causal effects cannot be observed, the average causal effect in a population (the so-called “Average Treatment Effect”) can be identified by comparing the expected values of $Y^1$ and $Y^0$:

$$\mathrm{ATE} = E[\delta] = E[Y^1 - Y^0] = E[Y^1] - E[Y^0]$$

Some other quantities of interest:
- Average Treatment Effect on the Treated (ATT):
  $$\mathrm{ATT} = E[Y^1 - Y^0 \mid D = 1] = E[Y^1 \mid D = 1] - E[Y^0 \mid D = 1]$$
- Average Treatment Effect on the Untreated (ATC):
  $$\mathrm{ATC} = E[Y^1 - Y^0 \mid D = 0] = E[Y^1 \mid D = 0] - E[Y^0 \mid D = 0]$$
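As a short worked step (not on the original slide), the three quantities can be linked through the law of total expectation, splitting the population into treated and untreated units. This is the population counterpart of weighting the ATT and ATC by the group shares:

```latex
\begin{align*}
\mathrm{ATE} = E[Y^1 - Y^0]
  &= \Pr(D = 1)\, E[Y^1 - Y^0 \mid D = 1] + \Pr(D = 0)\, E[Y^1 - Y^0 \mid D = 0] \\
  &= \Pr(D = 1) \cdot \mathrm{ATT} + \Pr(D = 0) \cdot \mathrm{ATC}
\end{align*}
```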

Average Treatment Effect

To determine the average effect, unbiased estimates of $E[Y^0]$ and $E[Y^1]$ are required.

If the independence assumption

$$(Y^0, Y^1) \perp\!\!\!\perp D$$

applies, that is, if $D$ is independent of $Y^0$ and $Y^1$, then

$$E[Y^0] = E[Y^0 \mid D = 0], \qquad E[Y^1] = E[Y^1 \mid D = 1]$$

In this case the average causal effect can be measured by a simple group comparison (mean difference) between observations without treatment ($D = 0$) and observations with treatment ($D = 1$).

Randomized experiments solve the problem: If the assignment of $D$ is randomized, $D$ is independent of $Y^0$ and $Y^1$ by design.
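The role of randomization can be illustrated with a small simulation. This is a minimal sketch on made-up data, not part of the talk: the function names, the constant effect of 2, and the self-selection rule are all assumptions chosen for illustration. Both potential outcomes are generated, so the true ATE is known, and the simple group comparison is then computed under randomized and under self-selected assignment.

```python
import random


def mean(values):
    return sum(values) / len(values)


def group_difference(y, d):
    """Mean of y among treated (d == 1) minus mean among controls (d == 0)."""
    treated = [yi for yi, di in zip(y, d) if di == 1]
    control = [yi for yi, di in zip(y, d) if di == 0]
    return mean(treated) - mean(control)


def simulate(n=20000, effect=2.0, seed=1):
    rng = random.Random(seed)
    # Potential outcomes: Y0 is noise and Y1 = Y0 + effect,
    # so the true ATE equals `effect` (an assumed, constant effect).
    y0 = [rng.gauss(0, 1) for _ in range(n)]
    y1 = [v + effect for v in y0]

    # (a) Randomized assignment: D is independent of (Y0, Y1) by design.
    d_random = [1 if rng.random() < 0.5 else 0 for _ in range(n)]
    # (b) Self-selection (assumed rule): units with high Y0 are treated more often.
    d_selected = [1 if v + rng.gauss(0, 1) > 0 else 0 for v in y0]

    def observed(d):
        # Only the potential outcome matching the realized treatment is seen.
        return [b if di == 1 else a for a, b, di in zip(y0, y1, d)]

    return (group_difference(observed(d_random), d_random),
            group_difference(observed(d_selected), d_selected))


diff_random, diff_selected = simulate()
print(round(diff_random, 2), round(diff_selected, 2))
```

Under randomization the mean difference should land close to the true effect of 2, while under self-selection it is inflated because treated units already have higher $Y^0$.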


Conditional Independence / Strong Ignorability

Can causal effects also be identified from “observational” (i.e. non-experimental) data?

Sometimes it can be argued that the independence assumption is valid conditionally (conditional independence, “unconfoundedness”):

$$(Y^0, Y^1) \perp\!\!\!\perp D \mid X$$

If, in addition, the overlap assumption

$$0 < \Pr(D = 1 \mid X = x) < 1 \quad \text{for all } x$$

holds, then the ATE (or ATT or ATC) can be identified by conditioning on $X$. For example:

$$\mathrm{ATE} = \sum_x \Pr[X = x] \, \bigl\{ E[Y \mid D = 1, X = x] - E[Y \mid D = 0, X = x] \bigr\}$$
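The identification formula above can be turned into a simple plug-in estimator when $X$ is discrete. The following is a minimal sketch on made-up data: the function name `stratified_ate` and the toy records are hypothetical, sample shares stand in for $\Pr[X = x]$, and within-stratum group means stand in for the conditional expectations.

```python
from collections import defaultdict


def stratified_ate(records):
    """records: iterable of (x, d, y) with discrete x and d in {0, 1}."""
    cells = defaultdict(lambda: {0: [], 1: []})
    for x, d, y in records:
        cells[x][d].append(y)

    n = sum(len(c[0]) + len(c[1]) for c in cells.values())
    ate = 0.0
    for x, c in cells.items():
        # Overlap: every stratum needs both treated and control units.
        if not c[0] or not c[1]:
            raise ValueError(f"overlap violated in stratum {x!r}")
        share = (len(c[0]) + len(c[1])) / n           # estimate of Pr[X = x]
        diff = sum(c[1]) / len(c[1]) - sum(c[0]) / len(c[0])
        ate += share * diff
    return ate


# Toy data: (x, d, y)
data = [("a", 1, 5), ("a", 1, 7), ("a", 0, 3),
        ("b", 1, 10), ("b", 0, 8), ("b", 0, 6), ("b", 0, 4)]
print(stratified_ate(data))
```

In the toy data the within-stratum differences are 3 (stratum "a", 3 of 7 units) and 4 (stratum "b", 4 of 7 units), so the weighted estimate is 25/7, whereas the unconditional mean difference would be smaller.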

Matching

Matching is one approach to “condition on $X$” if strong ignorability holds.

Basic idea:
1. For each observation in the treatment group, find “statistical twins” in the control group with the same (or at least very similar) $X$ values (and vice versa).
2. The $Y$ values of these matching observations are then used to compute the counterfactual outcome for the observation at hand.
3. An estimate for the average causal effect can be obtained as the mean of the differences between the observed values and the “imputed” counterfactual values over all observations.

Matching

Formally:

$$\widehat{\mathrm{ATT}} = \frac{1}{N_{D=1}} \sum_{i \mid D=1} \left( Y_i - \hat{Y}_i^0 \right) = \frac{1}{N_{D=1}} \sum_{i \mid D=1} \Bigl( Y_i - \sum_{j \mid D=0} w_{ij} Y_j \Bigr)$$

$$\widehat{\mathrm{ATC}} = \frac{1}{N_{D=0}} \sum_{i \mid D=0} \left( \hat{Y}_i^1 - Y_i \right) = \frac{1}{N_{D=0}} \sum_{i \mid D=0} \Bigl( \sum_{j \mid D=1} w_{ij} Y_j - Y_i \Bigr)$$

$$\widehat{\mathrm{ATE}} = \frac{N_{D=1}}{N} \cdot \widehat{\mathrm{ATT}} + \frac{N_{D=0}}{N} \cdot \widehat{\mathrm{ATC}}$$

Different matching algorithms use different definitions of $w_{ij}$.
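Given a matrix of matching weights, the three estimators reduce to a few weighted sums. The sketch below assumes the weights are already available as nested lists (one row per unit to be matched, each row summing to one); the function names and the toy numbers are illustrative only.

```python
def att(y_treated, y_control, w):
    """w[i][j]: weight of control j in the counterfactual of treated unit i."""
    diffs = []
    for yi, row in zip(y_treated, w):
        y0_hat = sum(wij * yj for wij, yj in zip(row, y_control))
        diffs.append(yi - y0_hat)          # observed minus imputed Y0
    return sum(diffs) / len(diffs)


def atc(y_control, y_treated, w):
    """w[i][j]: weight of treated j in the counterfactual of control unit i."""
    diffs = []
    for yi, row in zip(y_control, w):
        y1_hat = sum(wij * yj for wij, yj in zip(row, y_treated))
        diffs.append(y1_hat - yi)          # imputed Y1 minus observed
    return sum(diffs) / len(diffs)


def ate(att_hat, atc_hat, n_treated, n_control):
    n = n_treated + n_control
    return n_treated / n * att_hat + n_control / n * atc_hat


# Toy example with two treated and three control units.
y_t, y_c = [10, 12], [7, 9, 11]
w_t = [[1, 0, 0], [0, 0.5, 0.5]]           # controls matched to each treated
w_c = [[1, 0], [0.5, 0.5], [0, 1]]         # treated matched to each control

att_hat = att(y_t, y_c, w_t)
atc_hat = atc(y_c, y_t, w_c)
print(att_hat, atc_hat, ate(att_hat, atc_hat, len(y_t), len(y_c)))
```

Here the imputed counterfactuals for the treated are 7 and 10, giving an ATT estimate of 2.5; the ATC estimate is 2.0, and the ATE estimate is their share-weighted average.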

Exact Matching

Exact matching:

$$w_{ij} = \begin{cases} 1 / k_i & \text{if } X_i = X_j \\ 0 & \text{else} \end{cases}$$

with $k_i$ as the number of observations for which $X_i = X_j$ applies.

The result is equivalent to “perfect stratification” or “subclassification” (see, e.g., Cochran 1968).

Problem: If $X$ contains several variables, there is a large probability that no exact matches can be found for many observations (the “curse of dimensionality”).
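A minimal sketch of the exact-matching weights follows. It assumes that $k_i$ counts the units in the opposite group with the same covariate values as unit $i$, so that each row of weights sums to one; units without any exact match receive a row of zeros. The function name and the toy covariates are hypothetical.

```python
def exact_weights(x_treated, x_control):
    """Return w[i][j] = 1/k_i if control j has the same X as treated i, else 0."""
    weights = []
    for xi in x_treated:
        hits = [1 if xj == xi else 0 for xj in x_control]
        k = sum(hits)                       # number of exact matches for unit i
        if k:
            weights.append([h / k for h in hits])
        else:
            weights.append([0.0] * len(x_control))   # unmatched unit
    return weights


# Two covariates per unit; the third treated unit has no exact match.
x_t = [("f", 1), ("m", 2), ("m", 3)]
x_c = [("f", 1), ("f", 1), ("m", 2)]
w = exact_weights(x_t, x_c)
print(w)
```

Even in this tiny example one treated unit remains unmatched, which hints at how quickly exact matches become scarce once $X$ contains several variables.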

Multivariate Distance Matching (MDM)

An alternative is to match based on a distance metric that measures the proximity between observations in the multivariate space of $X$.

The idea then is to use observations that are “close”, but not necessarily equal, as matches.

A common approach is to use

$$MD(X_i, X_j) = \sqrt{(X_i - X_j)' \, \Sigma^{-1} (X_i - X_j)}$$

as distance metric, where $\Sigma$ is an appropriate scaling matrix.
- Mahalanobis matching: $\Sigma$ is the covariance matrix of $X$.
- Euclidean matching: $\Sigma$ is the identity matrix.
- Mahalanobis matching is equivalent to Euclidean matching based on standardized and orthogonalized $X$.
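The distance metric can be sketched in a few lines. To stay dependency-free, the example below handles the matrix inverse only for two covariates (`invert_2x2` is a hypothetical helper); the scaling matrices are made-up values, not estimates from data.

```python
import math


def invert_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("scaling matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def md(xi, xj, sigma_inv):
    """sqrt((xi - xj)' * sigma_inv * (xi - xj)) for covariate vectors xi, xj."""
    diff = [a - b for a, b in zip(xi, xj)]
    size = len(diff)
    quad = sum(diff[r] * sigma_inv[r][c] * diff[c]
               for r in range(size) for c in range(size))
    return math.sqrt(quad)


identity = [[1.0, 0.0], [0.0, 1.0]]         # Euclidean matching
sigma = [[2.0, 1.0], [1.0, 2.0]]            # assumed covariance matrix of X
sigma_inv = invert_2x2(sigma)               # Mahalanobis matching

print(md((0, 0), (3, 4), identity))         # plain Euclidean distance
print(md((0, 0), (1, 1), sigma_inv))        # along the correlation
print(md((0, 0), (1, -1), sigma_inv))       # against the correlation
```

With positively correlated covariates, two points that differ along the direction of the correlation are treated as closer than two points that differ against it, even though their Euclidean distances are identical.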

Matching Algorithms

Various matching algorithms can be employed to find potential matches based on $MD$ and to determine the matching weights $w_{ij}$.

- Pair matching (one-to-one matching without replacement): For each observation $i$ in the treatment group, find the observation $j$ in the control group for which $MD_{ij}$ is smallest. Once observation $j$ is used as a match, do not use it again.
- Nearest-neighbor matching: For each observation $i$ in the treatment group, find the $k$ closest observations in the control group. A single control can be used multiple times as a match. In case of ties (multiple controls with identical $MD$), use all ties as matches. $k$ is set by the researcher.
- Caliper matching: Like nearest-neighbor matching, but only use controls for which $MD$ is smaller than some threshold $c$.
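The three algorithms above can be sketched on a precomputed distance matrix `dist`, where `dist[i][j]` is the distance between treated unit `i` and control unit `j`. This is a simplified illustration, not the implementation of any particular package: pair matching is done greedily in row order, and matched controls are assumed to receive equal weights.

```python
def pair_match(dist):
    """Greedy one-to-one matching without replacement, in row order."""
    used, pairs = set(), {}
    for i, row in enumerate(dist):
        free = [j for j in range(len(row)) if j not in used]
        if not free:
            break                            # no controls left
        j = min(free, key=lambda c: row[c])  # closest unused control
        pairs[i] = j
        used.add(j)
    return pairs


def nn_weights(dist, k=1, caliper=None):
    """Nearest-neighbor matching with replacement; ties are all kept.

    If `caliper` is given, only controls with distance < caliper are used.
    """
    weights = []
    for row in dist:
        cutoff = sorted(row)[min(k, len(row)) - 1]   # k-th smallest distance
        matches = [j for j, dij in enumerate(row)
                   if dij <= cutoff and (caliper is None or dij < caliper)]
        if matches:
            share = 1 / len(matches)                 # equal weights (assumption)
            weights.append([share if j in matches else 0.0
                            for j in range(len(row))])
        else:
            weights.append([0.0] * len(row))         # no match within caliper
    return weights


dist = [[0.1, 0.1, 0.5],
        [0.4, 0.2, 0.3],
        [0.9, 0.8, 0.7]]
print(pair_match(dist))
print(nn_weights(dist, k=1, caliper=0.6))
```

In the toy matrix, the first treated unit has two tied nearest neighbors that share the weight, and the third treated unit is left unmatched once the caliper of 0.6 is imposed.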

Mahalanobis Matching

- Radius matching: Use all controls as matches for which $MD$ is smaller than some threshold $c$.
- Kernel matching: Like radius matching, but give larger weight to controls for which $MD$ is small (using some kernel function such as, e.g., the Epanechnikov kernel).

In addition, since matching is no longer exact, it may make sense to refine the estimates by applying regression-adjustment to the matched data (also known as “bias-adjustment” in the context of nearest-neighbor matching).
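Radius and kernel matching differ only in how the controls inside the threshold are weighted. The sketch below computes both sets of weights for a single treated unit; normalizing the kernel values so that the weights sum to one is an assumption made here so that the weighted mean of control outcomes is a proper average, and the function names are illustrative.

```python
def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) for |u| < 1, zero otherwise."""
    return 0.75 * (1 - u * u) if abs(u) < 1 else 0.0


def radius_weights(row, c):
    """Equal weights for all controls with distance smaller than c."""
    hits = [1.0 if dij < c else 0.0 for dij in row]
    total = sum(hits)
    return [h / total for h in hits] if total else hits


def kernel_weights(row, c, kernel=epanechnikov):
    """Kernel weights with bandwidth c, normalized to sum to one."""
    k = [kernel(dij / c) for dij in row]
    total = sum(k)
    return [v / total for v in k] if total else k


row = [0.0, 0.5, 1.0, 2.0]      # distances from one treated unit to four controls
print(radius_weights(row, c=1.0))
print(kernel_weights(row, c=1.0))
```

Both methods use the same two controls here, but kernel matching shifts weight toward the control at distance 0 (4/7 versus 3/7), whereas radius matching splits the weight evenly.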


The Propensity Score Theorem (Rosenbaum and Rubin 1983)

If the conditional independence assumption is true, then

$$\Pr(D_i = 1 \mid Y_i^0, Y_i^1, X_i) = \Pr(D_i = 1 \mid X_i) = \pi(X_i)$$

where $\pi(X)$ is called the propensity score. That is,

$$(Y^0, Y^1) \perp\!\!\!\perp D \mid X \quad \text{implies} \quad (Y^0, Y^1) \perp\!\!\!\perp D \mid \pi(X)$$

so that under strong ignorability the average causal effect can be estimated by conditioning on the propensity score $\pi(X)$ instead of $X$.

This is remarkable, because the information in $X$, which may include many variables, can be reduced to just one dimension. This greatly simplifies the matching task.
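The dimension reduction can be seen in a toy example with two binary covariates. In this sketch the propensity score is estimated nonparametrically as the share of treated units in each covariate cell, which only works for discrete $X$ with few cells; in applied work a model such as a logit is commonly used instead. All names and data below are made up for illustration.

```python
from collections import defaultdict


def propensity_by_cell(x, d):
    """Estimate pi(x) as the share of treated units within each cell of x."""
    counts = defaultdict(lambda: [0, 0])    # cell -> [n treated, n total]
    for xi, di in zip(x, d):
        counts[xi][0] += di
        counts[xi][1] += 1
    return {cell: treated / total for cell, (treated, total) in counts.items()}


def nearest_on_score(scores, d, i):
    """Indices of controls whose score is closest to that of unit i (ties kept)."""
    controls = [j for j, dj in enumerate(d) if dj == 0]
    best = min(abs(scores[i] - scores[j]) for j in controls)
    return [j for j in controls if abs(scores[i] - scores[j]) == best]


# Four covariate cells with four units each.
x = [(0, 0)] * 4 + [(0, 1)] * 4 + [(1, 0)] * 4 + [(1, 1)] * 4
d = [1, 0, 0, 0,  1, 1, 0, 0,  1, 1, 0, 0,  1, 1, 1, 0]

pi = propensity_by_cell(x, d)
scores = [pi[xi] for xi in x]
print(pi)
print(nearest_on_score(scores, d, 4))       # unit 4 is treated with X = (0, 1)
```

The cells (0, 1) and (1, 0) differ in $X$ but share the same estimated score of 0.5, so the treated unit with $X = (0, 1)$ can be matched to controls from both cells. Matching on $X$ exactly would have offered only the two controls in its own cell.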
