CEM: A Matching Method for Observational Data in the Social Sciences

S.M. Iacus (Univ. of Milan), G. King (Harvard Univ.) and G. Porro (Univ. of Trieste)
Rennes, useR! 2009, July 8th - 10th

The problem of matching

We consider an observational study with n observations. For each unit i we have:

- $Y_i$ = outcome
- $T_i$ = treatment indicator
- $X_i$ = covariates

ESTIMATION GOAL: the treatment effect

$TE_i = Y_i(T_i = 1) - Y_i(T_i = 0) = Y_i(1) - Y_i(0)$

but for a treated unit $Y_i(0)$ is not observed. For the treated unit i with covariates $X_i$, it is natural to look for another unit j in the sample for which $Y_j(0)$ is observed and such that $X_j \simeq X_i$.

MATCHING GOAL: for each treated unit i, find the "twin" control unit j (i.e. with $X_j \simeq X_i$) in order to reduce bias in the estimation of $TE_i$.
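The "twin" idea can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the data are made up, the variable names (`X`, `treat`, `Y`, `twin`) are hypothetical, and a plain nearest-neighbour search on a single covariate stands in for whatever matching method is actually used.

```r
# Toy data (invented for illustration): one covariate, a true effect of 500.
X     <- c(21, 35, 48, 52, 23, 33, 50, 60)   # covariate X_i
treat <- c(1, 1, 1, 1, 0, 0, 0, 0)           # treatment indicator T_i
Y     <- 1000 + 50 * X + 500 * treat         # observed outcome Y_i

treated  <- which(treat == 1)
controls <- which(treat == 0)

# For each treated unit i, pick the control j whose X_j is closest to X_i.
twin <- sapply(treated, function(i) {
  controls[which.min(abs(X[controls] - X[i]))]
})

# The twin's observed Y_j(0) stands in for the unobserved Y_i(0).
te_hat <- Y[treated] - Y[twin]
mean(te_hat)
```

Because X_j only approximates X_i, each unit-level estimate in this sketch is off by the covariate gap times the slope on X; the closer the twin, the smaller that bias.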

Matching solutions in R (incomplete list)

- MatchIt: (pscore, mahalanobis, etc.)
- Matching: (genetic matching, pscore, etc.)
- optmatch: (full optimal matching)
- rrp: (random recursive partitioning)
- arm: (single nearest neighbour)
- SpectralGEM: (spectral graph theory)
- analogue: (analogue matching, nearest neighbour)
- PSAgraphics: (diagnostic)
- RItools: (diagnostic)

CEM Overview

Coarsened Exact Matching (CEM) is a simple (and ancient) method of causal inference with powerful, previously unexplored properties. CEM is as simple as:

1. Temporarily coarsen X as much as you're willing (e.g., for education: grade school, high school, college, graduate);
2. Perform exact matching on the coarsened data C(X): sort observations into strata and prune any stratum with 0 treated or 0 control units, i.e. set weight=0 for pruned observations and assign CEM weights to matched units;
3. Use the original uncoarsened data X (with appropriate weights) in your analysis, excluding the pruned units.

Maximum imbalance is controlled ex-ante by the choice of coarsening.
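The three steps can be sketched in base R without the cem package. Everything here is an assumption made for illustration: the toy data, the cutpoints, and the variable names are invented. The weight formula is also not stated on the slides; it is the one usually given for CEM: treated units get weight 1, and a control in stratum s gets (treated in s / controls in s) * (matched controls / matched treated).

```r
# Toy data (invented): 4 treated and 6 control units.
d <- data.frame(
  treated   = c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0),
  age       = c(22, 27, 41, 58, 24, 29, 28, 43, 35, 36),
  education = c(10, 12, 16, 18, 11, 12, 9, 15, 8, 13)
)

# Step 1: temporarily coarsen each covariate (cutpoints are arbitrary choices).
c_age <- cut(d$age, breaks = c(0, 30, 50, Inf))
c_edu <- cut(d$education, breaks = c(0, 8, 12, 16, Inf),
             labels = c("grade school", "high school", "college", "graduate"))

# Step 2: exact matching on the coarsened data, one stratum per observed cell.
stratum <- interaction(c_age, c_edu, drop = TRUE)
n_t  <- tapply(d$treated == 1, stratum, sum)    # treated units per stratum
n_c  <- tapply(d$treated == 0, stratum, sum)    # control units per stratum
keep <- names(n_t)[n_t > 0 & n_c > 0]           # strata with both groups
matched <- stratum %in% keep                    # FALSE means pruned

# Weights: 0 if pruned, 1 if treated, rescaled stratum ratio if control.
m_t <- sum(matched & d$treated == 1)
m_c <- sum(matched & d$treated == 0)
s   <- as.character(stratum)
w   <- ifelse(!matched, 0,
              ifelse(d$treated == 1, 1, (n_t[s] / n_c[s]) * (m_c / m_t)))
w   <- as.numeric(w)

# Step 3: the analysis uses the original, uncoarsened columns plus weights.
analysis_data <- cbind(d[matched, ], w = w[matched])
```

In this sketch the coarsened variables exist only to form strata; `analysis_data` still carries the original `age` and `education` values, and the control weights sum to the number of matched controls.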

CEM Overview: the analysis stage

1. Coarsen the data X into C(X).
2. Do exact matching on the coarsened data C(X); this yields the CEM weights.
3. Pass the original uncoarsened data X, together with the CEM weights, to the analysis stage: lm, glm, randomForest, coxph, etc.

CEM package

cem offers standard 1-dim measures as well as a new multidimensional measure of imbalance, $L_1 \in [0, 1]$: the distance between the multidimensional histograms of the distributions of treated and control units.

R> library(cem)
R> data(LL) # The Lalonde (1986) benchmark data
R> # initial imbalance
R> imb <- imbalance(LL$treated,LL,drop=c("re78","treated"))
R> imb

Multivariate Imbalance Measure: L1=0.735
Percentage of local common support: LCS=17.8%

Univariate Imbalance Measures:

              statistic   type           L1 min 25%       50%      75%        max
age        1.792038e-01 (diff) 4.705882e-03   0   1   0.00000  -1.0000    -6.0000
education  1.922361e-01 (diff) 9.811844e-02   1   0   1.00000   1.0000     2.0000
black      1.346801e-03 (diff) 1.346801e-03   0   0   0.00000   0.0000     0.0000
married    1.070311e-02 (diff) 1.070311e-02   0   0   0.00000   0.0000     0.0000
nodegree  -8.347792e-02 (diff) 8.347792e-02   0  -1   0.00000   0.0000     0.0000
re74      -1.014862e+02 (diff) 5.551115e-17   0   0  69.73096 584.9160 -2139.0195
re75       3.941545e+01 (diff) 5.551115e-17   0   0 294.18457 660.6865   490.3945
hispanic  -1.866508e-02 (diff) 1.866508e-02   0   0   0.00000   0.0000     0.0000
u74       -2.009903e-02 (diff) 2.009903e-02   0   0   0.00000   0.0000     0.0000
u75       -4.508616e-02 (diff) 4.508616e-02   0   0   0.00000   0.0000     0.0000
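To make "distance between multidimensional histograms" concrete, here is a minimal base-R sketch. It assumes the usual definition of the measure (half the sum, over all cells of the cross-tabulated bins, of the absolute difference between treated and control relative frequencies). The function `l1_imbalance`, its arguments and the toy data are hypothetical; this is not the code behind `imbalance()`.

```r
# Hypothetical helper: L1 distance between treated and control histograms.
# `covariates` is a data frame; `breaks` is a list of cutpoints, one per column.
l1_imbalance <- function(treated, covariates, breaks) {
  # Bin every covariate, then cross the bins into multidimensional cells.
  binned <- Map(function(x, b) cut(x, breaks = b, include.lowest = TRUE),
                covariates, breaks)
  cell <- interaction(binned, drop = TRUE)
  # Relative frequencies per cell; table() keeps all levels, so f and g align.
  f <- table(cell[treated == 1]) / sum(treated == 1)
  g <- table(cell[treated == 0]) / sum(treated == 0)
  sum(abs(f - g)) / 2
}

# Toy data (invented for illustration).
treated <- c(1, 1, 1, 1, 0, 0, 0, 0)
covs <- data.frame(
  age       = c(22, 27, 41, 58, 24, 29, 43, 35),
  education = c(10, 12, 16, 18, 11, 12, 15, 8)
)
brk <- list(c(0, 30, 50, Inf), c(0, 8, 12, 16, Inf))

l1 <- l1_imbalance(treated, covs, brk)
```

Under this definition a value of 0 means the two histograms coincide and 1 means they share no cell, which matches the [0, 1] range stated above. The value depends on the bins, which is why the slides reuse the same breaks when comparing before and after matching.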

CEM package

After matching with CEM:

R> mat <- cem("treated", LL, drop="re78",L1.breaks=imb$L1$breaks)
R> mat

           G0  G1
All       425 297
Matched   222 163
Unmatched 203 134

Multivariate Imbalance Measure: L1=0.432
Percentage of local common support: LCS=44.7%

Univariate Imbalance Measures:

              statistic   type           L1 min 25%      50%       75%      max
age        1.862046e-01 (diff) 5.551115e-17   0   0   0.0000   1.00000    1.000
education  1.022495e-02 (diff) 1.022495e-02   0   0   0.0000   0.00000    0.000
black     -1.110223e-16 (diff) 6.245005e-17   0   0   0.0000   0.00000    0.000
married    0.000000e+00 (diff) 5.898060e-17   0   0   0.0000   0.00000    0.000
nodegree  -1.110223e-16 (diff) 5.551115e-17   0   0   0.0000   0.00000    0.000
re74       7.197514e+00 (diff) 5.551115e-17   0   0   0.0000 -70.85522  416.416
re75       1.220698e+01 (diff) 5.551115e-17   0   0 234.4843 140.79126 -852.252
hispanic   0.000000e+00 (diff) 5.551115e-17   0   0   0.0000   0.00000    0.000
u74        0.000000e+00 (diff) 2.775558e-17   0   0   0.0000   0.00000    0.000
u75        0.000000e+00 (diff) 5.551115e-17   0   0   0.0000   0.00000    0.000

Diagnostic tool

The choice of coarsening affects the matching solution. Thanks to the high computational efficiency of cem, the function relax.cem can explore relaxations of the coarsening automatically.

R> relax.cem(mat,LL)
Executing 42 different relaxations
.......[20%]....[40%].....[60%]....[80%]....[100%]

[Figure: number of matched units (and % matched) for each of the 42 relaxations, each point labeled with its L1 value. Pre-relax: 163 matched (54.9 %). The x-axis lists the relaxations, from <start> through education(9), education(8), hispanic(1), re74(7), ... to age(1).]
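The trade-off that this diagnostic explores can be illustrated with a toy example: coarser bins leave more treated units with at least one control in their stratum. The sketch below is an assumption-laden illustration in base R, not the procedure that `relax.cem` implements. The data, the function `matched_treated` and the choice of nested equal-width bins are all hypothetical.

```r
# Toy data (invented): one covariate, 5 treated and 5 control units.
age     <- c(21, 26, 34, 47, 59, 23, 31, 38, 52, 44)
treated <- c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0)

# Count treated units that share a bin with at least one control
# when age is cut into k equal-width bins on [20, 60].
matched_treated <- function(k) {
  bins  <- cut(age, breaks = seq(20, 60, length.out = k + 1),
               include.lowest = TRUE)
  has_t <- tapply(treated == 1, bins, any)   # NA for empty bins
  has_c <- tapply(treated == 0, bins, any)
  ok    <- names(has_t)[!is.na(has_t) & has_t & has_c]
  sum(treated == 1 & bins %in% ok)
}

# Halving k merges neighbouring bins, so each step is a relaxation.
n_matched <- sapply(c(8, 4, 2, 1), matched_treated)
```

Since each coarsening in this sketch merges the bins of the previous one, the matched count cannot go down as k shrinks. The price is a looser bound on imbalance, which is what the L1 labels in the figure track.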

ATT estimation and extrapolation

ATT estimation on the matched data only:

R> att(mat, re78 ~ treated, LL) -> TE
R> TE

           G0  G1
All       425 297
Matched   222 163
Unmatched 203 134

Linear regression model on CEM matched data:
SATT point estimate: 550.962564 (p.value=0.368242)
95% conf. interval: [-647.777701, 1749.702830]

ATT estimation on all treated observations via extrapolation:

R> att(mat, re78 ~ treated, LL, extrapolate=TRUE)

           G0  G1
All       425 297
Matched   222 163
Unmatched 203 134

Linear regression model with extrapolation:
SATT point estimate: 1290.247549 (p.value=0.062168)
95% conf. interval: [391.886467, 2188.608631]

The distribution of the treatment effect across CEM strata can be further visualized:

R> plot(TE,mat,LL,vars=c("re75","re74","education","age","hispanic"))
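As a rough sketch of what "linear regression on matched data with weights" amounts to, the base-R example below fits a weighted `lm` on a tiny invented data set. The numbers are made up (they are not the Lalonde data), the weights are assumed to come from a CEM-style matching step, and nothing here claims to reproduce the internals of `att()`.

```r
# Invented matched sample: 3 treated, 5 controls, with CEM-style weights.
m <- data.frame(
  re78    = c(5200, 6100, 9000, 4800, 5000, 5500, 8300, 8700),
  treated = c(1, 1, 1, 0, 0, 0, 0, 0),
  w       = c(1, 1, 1, 10/9, 10/9, 10/9, 5/6, 5/6)
)

# Weighted least squares on the original (uncoarsened) outcome.
fit <- lm(re78 ~ treated, data = m, weights = w)
satt_hat <- coef(fit)[["treated"]]

# With a single binary regressor, the coefficient equals the difference
# between the weighted means of the two groups.
by_hand <- with(m, weighted.mean(re78[treated == 1], w[treated == 1]) -
                   weighted.mean(re78[treated == 0], w[treated == 0]))
```

Adding covariates to the formula (for example `re78 ~ treated + age`, if such a column were present) would adjust for the imbalance that remains inside each stratum; that variation is an assumption of this sketch, not something the slides show.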

ATT estimation and visualization

[Figure: "Linear regression model on CEM matched data". Treatment effect estimates by CEM stratum (x-axis: Treatment Effect, from -5000 to 20000), with three panels for strata with negative, zero and positive effects; each panel plots hispanic, age, education, re74 and re75 on a Min to Max scale.]

For more information

For the latest version of the manuscript, R and Stata software, visit http://GKing.Harvard.edu/cem
