Kernel matching with automatic bandwidth selection, Ben Jann

Kernel matching with automatic bandwidth selection Ben Jann University of Bern, ben.jann@soz.unibe.ch 2017 London Stata Users Group meeting London, September 7–8, 2017 Ben Jann (University of Bern) Kernel matching London, 07.09.2017 1

Contents
1. Background
◮ What is Matching?
◮ Multivariate Distance Matching (MDM)
◮ Propensity Score Matching (PSM)
◮ Matching Algorithms
◮ “Why PSM Should Not Be Used for Matching”
2. The kmatch command
◮ Features
◮ Examples
◮ Some Simulation Results
3. Conclusions

What is Matching?
Matching is an approach to “condition on X” between a treatment group and a control group. Basic idea:
1. For each observation in the treatment group, find “statistical twins” in the control group with the same (or at least very similar) X values.
2. The Y values of these matching observations are then used to compute the counterfactual outcome without treatment for the observation at hand.
3. An estimate for the average treatment effect can be obtained as the mean of the differences between the observed values and the “imputed” counterfactual values over all observations.

What is Matching?
Formally:
$$\widehat{ATT} = \frac{1}{N_{T=1}} \sum_{i \mid T=1} \left[ Y_i - \hat{Y}_i^0 \right] \quad \text{with} \quad \hat{Y}_i^0 = \sum_{j \mid T=0} w_{ij} Y_j$$
$$\widehat{ATC} = \frac{1}{N_{T=0}} \sum_{i \mid T=0} \left[ \hat{Y}_i^1 - Y_i \right] \quad \text{with} \quad \hat{Y}_i^1 = \sum_{j \mid T=1} w_{ij} Y_j$$
$$\widehat{ATE} = \frac{N_{T=1}}{N} \cdot \widehat{ATT} + \frac{N_{T=0}}{N} \cdot \widehat{ATC}$$
Different matching algorithms use different definitions of $w_{ij}$.
ATE: average treatment effect; ATT: average treatment effect on the treated; ATC: average treatment effect on the untreated
T: treatment indicator (0/1)
Y: observed outcome; $Y^1$: potential outcome with treatment; $Y^0$: potential outcome without treatment
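The ATT formula above can be sketched in a few lines of Python. This is only an illustration, not code from the talk: the function name `att_estimate`, the dictionary layout of the weights, and the toy data are all assumptions, and the weights for each treated unit are assumed to sum to 1.

```python
def att_estimate(y, t, weights):
    """Matching estimate of the ATT (illustrative sketch).

    y:       list of observed outcomes Y_i
    t:       list of treatment indicators T_i (0/1)
    weights: dict mapping each treated index i to {control index j: w_ij};
             the w_ij of each treated unit are assumed to sum to 1
    """
    diffs = []
    for i, ti in enumerate(t):
        if ti != 1:
            continue
        # Imputed counterfactual outcome without treatment for unit i
        y0_hat = sum(w * y[j] for j, w in weights[i].items())
        diffs.append(y[i] - y0_hat)
    # Mean difference over all treated observations
    return sum(diffs) / len(diffs)


# Toy data (made up): two treated units, three controls
y = [10.0, 12.0, 7.0, 8.0, 9.0]
t = [1, 1, 0, 0, 0]
weights = {0: {2: 0.5, 3: 0.5}, 1: {4: 1.0}}
att = att_estimate(y, t, weights)
```

The ATC is obtained the same way with the roles of the two groups swapped, and the ATE is the group-size-weighted average of the two.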

Exact Matching
Exact matching:
$$w_{ij} = \begin{cases} 1/k_i & \text{if } X_i = X_j \\ 0 & \text{else} \end{cases}$$
with $k_i$ as the number of observations for which $X_i = X_j$ applies.
The result is equivalent to “perfect stratification” or “subclassification” (see, e.g., Cochran 1968).
Problem: If X contains several variables, there is a large probability that no exact matches can be found for many observations (the “curse of dimensionality”).
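As a rough sketch of the exact-matching weights, the following Python function assigns 1/k_i to every control whose covariate values equal those of treated unit i. The function name, the tuple representation of X, and the toy data are assumptions for illustration only.

```python
def exact_match_weights(x, t):
    """Exact-matching weights for treated units (illustrative sketch).

    x: list of covariate tuples X_i
    t: list of treatment indicators (0/1)
    Returns {i: {j: w_ij}} with w_ij = 1/k_i for controls j where X_j == X_i.
    """
    controls = [j for j, tj in enumerate(t) if tj == 0]
    weights = {}
    for i, ti in enumerate(t):
        if ti != 1:
            continue
        matches = [j for j in controls if x[j] == x[i]]
        # k_i is the number of exact matches; unmatched units get no weights
        weights[i] = {j: 1.0 / len(matches) for j in matches}
    return weights


# Toy data (made up): covariates are (sex, age)
x = [("f", 30), ("m", 40), ("f", 30), ("f", 30), ("m", 41)]
t = [1, 1, 0, 0, 0]
w = exact_match_weights(x, t)
```

In this toy example the second treated unit has no exact match (age 40 versus 41), which is the problem the slide describes: with more covariates, such unmatched observations become the norm.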

Multivariate Distance Matching (MDM)
An alternative is to match based on a distance metric that measures the proximity between observations in the multivariate space of X.
The idea then is to use observations that are “close”, but not necessarily equal, as matches.
A common approach is to use
$$MD(X_i, X_j) = \sqrt{(X_i - X_j)' \Sigma^{-1} (X_i - X_j)}$$
as distance metric, where $\Sigma$ is an appropriate scaling matrix.
◮ Mahalanobis matching: $\Sigma$ is the covariance matrix of X.
◮ Euclidean matching: $\Sigma$ is the identity matrix.
◮ Mahalanobis matching is equivalent to Euclidean matching based on standardized and orthogonalized X.
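To make the distance metric concrete, here is a minimal Python sketch for the special case of two covariates, where the 2x2 scaling matrix can be inverted by hand. The function names and the sample points are assumptions; a real application would use a linear algebra library and any number of covariates.

```python
import math


def covariance_2d(points):
    """Sample covariance matrix (2x2) of a list of (x1, x2) pairs."""
    n = len(points)
    m1 = sum(p[0] for p in points) / n
    m2 = sum(p[1] for p in points) / n
    s11 = sum((p[0] - m1) ** 2 for p in points) / (n - 1)
    s22 = sum((p[1] - m2) ** 2 for p in points) / (n - 1)
    s12 = sum((p[0] - m1) * (p[1] - m2) for p in points) / (n - 1)
    return [[s11, s12], [s12, s22]]


def distance(a, b, sigma):
    """sqrt((a - b)' Sigma^{-1} (a - b)) for two-dimensional a and b."""
    (s11, s12), (s21, s22) = sigma
    det = s11 * s22 - s12 * s21
    d1, d2 = a[0] - b[0], a[1] - b[1]
    # Quadratic form written out with the explicit inverse of a 2x2 matrix
    q = (s22 * d1 * d1 - (s12 + s21) * d1 * d2 + s11 * d2 * d2) / det
    return math.sqrt(q)


# Toy data (made up): (education in years, age)
points = [(12.0, 30.0), (16.0, 40.0), (20.0, 50.0), (14.0, 45.0)]
identity = [[1.0, 0.0], [0.0, 1.0]]

euclidean = distance(points[0], points[1], identity)
mahalanobis = distance(points[0], points[1], covariance_2d(points))
```

Passing the identity matrix gives the Euclidean distance; passing the covariance matrix of X gives the Mahalanobis distance, which rescales the covariates so that differences in age and education become comparable.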

Propensity Score Matching (PSM)
$(Y^0, Y^1) \perp\!\!\!\perp T \mid X$ implies $(Y^0, Y^1) \perp\!\!\!\perp T \mid \pi(X)$, where $\pi(X)$ is the treatment probability conditional on X (the “propensity score”) (Rosenbaum and Rubin 1983).
This simplifies the matching task as we can match on one-dimensional $\pi(X)$ instead of multi-dimensional X.
Procedure
◮ Step 1: Estimate the propensity score, e.g. using a Logit model.
◮ Step 2: Apply a matching algorithm using differences in the propensity score, $|\hat{\pi}(X_i) - \hat{\pi}(X_j)|$, instead of multivariate distances.
PSM is very popular
◮ https://scholar.google.ch/scholar?q="propensity+score"+AND+(matching+OR+matched+OR+match)

Matching Algorithms
Various matching algorithms can be used to find potential matches based on MD or $\hat{\pi}(X)$ and determine the matching weights $w_{ij}$.
Pair matching (one-to-one matching without replacement)
◮ For each observation in the treatment group find the closest observation in the control group. Each control is only used once.
Nearest-neighbor matching (with replacement)
◮ For each observation in the treatment group find the k closest observations in the control group. A single control can be used multiple times. In case of ties, use all ties as matches. k is set by the researcher.
Caliper matching
◮ Like nearest-neighbor matching, but only use controls with a distance smaller than some threshold c.
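Nearest-neighbor matching with an optional caliper might look as follows when the distance is the absolute difference in a scalar score (for example an estimated propensity score). This is a sketch under assumptions: the function name, the equal weighting of all matched controls, and the toy scores are illustrative choices, not taken from the talk.

```python
def nn_match_weights(score_t, score_c, k=1, caliper=None):
    """Nearest-neighbor matching with replacement on a scalar score.

    score_t: scores of the treated units
    score_c: scores of the control units
    k:       number of neighbors, set by the researcher
    caliper: optional threshold c; controls at distance >= c are not used
    Returns one dict {control index j: w_ij} per treated unit. All matched
    controls get equal weight (an assumption made for this sketch).
    """
    weights = []
    for s in score_t:
        dists = sorted((abs(s - sc), j) for j, sc in enumerate(score_c))
        if caliper is not None:
            dists = [(d, j) for d, j in dists if d < caliper]
        if not dists:
            weights.append({})  # no control within the caliper
            continue
        # Distance of the k-th closest control; ties at this distance are kept
        cutoff = dists[min(k, len(dists)) - 1][0]
        matched = [j for d, j in dists if d <= cutoff]
        weights.append({j: 1.0 / len(matched) for j in matched})
    return weights


# Toy scores (made up)
treated = [0.5, 0.875]
controls = [0.25, 0.75, 0.9]
w_nn = nn_match_weights(treated, controls, k=1)
w_caliper = nn_match_weights(treated, controls, k=1, caliper=0.1)
```

Because matching is done with replacement, the same control can appear in the weights of several treated units. The first treated unit has two controls at the same distance, so both are used; with the caliper it is left unmatched.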

Matching Algorithms
Radius matching
◮ Use all controls with a distance smaller than some threshold c.
Kernel matching
◮ Like radius matching, but give larger weight to controls with smaller distances (using some kernel function such as, e.g., the Epanechnikov kernel).
Optional: remove remaining imbalance after matching using regression adjustment (a.k.a. “bias correction” in the context of nearest-neighbor matching).
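The kernel-matching weights for a single treated unit can be sketched as below. The function names and the example distances are assumptions, and the bandwidth is simply fixed by hand here; the sketch does not attempt any automatic bandwidth selection.

```python
def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) for |u| < 1, else 0."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def kernel_match_weights(dist_row, bandwidth):
    """Normalized kernel weights w_ij for one treated unit.

    dist_row:  distances from the treated unit to each control
    bandwidth: distance at which the kernel weight drops to zero
    """
    raw = [epanechnikov(d / bandwidth) for d in dist_row]
    total = sum(raw)
    if total == 0.0:
        return [0.0] * len(raw)  # no control within the bandwidth
    # Normalize so that the weights of the matched controls sum to 1
    return [r / total for r in raw]


# Toy distances (made up) from one treated unit to four controls
w_kernel = kernel_match_weights([0.0, 0.5, 1.0, 2.0], bandwidth=1.0)
```

Radius matching corresponds to replacing the kernel by a constant inside the threshold, so that all controls within the radius receive the same weight.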

“Why PSM Should Not Be Used for Matching”
The message of a recent paper by Gary King and Richard Nielsen is: Do not use PSM, it is really, really bad.
◮ The paper: http://j.mp/1sexgVw
◮ Slides: https://gking.harvard.edu/presentations/why-propensity-scores-should-not-be-used-matching-6
◮ Watch it: https://www.youtube.com/watch?v=rBv39pK1iEs
Their argument goes roughly as follows:
◮ In experimental language, PSM approximates complete randomization.
◮ Other methods such as MDM approximate fully blocked randomization.
◮ A fully blocked design is more efficient. It leads to less data imbalance and less “model dependence” (dependence of results on modeling decisions by the researcher).
◮ Hence, procedures such as MDM dominate PSM.
◮ King and Nielsen provide evidence suggesting that PSM performs shockingly badly.

Types of Experiments (slides by King and Nielsen)

Balance of covariates   Complete randomization   Fully blocked
Observed                On average               Exact
Unobserved              On average               On average

Fully blocked dominates complete randomization for: imbalance, model dependence, power, efficiency, bias, research costs, robustness. E.g., Imai, King, Nall 2009: SEs 600% smaller!
Goal of Each Matching Method (in Observational Data)
• PSM: complete randomization
• Other methods: fully blocked
• Other matching methods dominate PSM (wait, it gets worse)

Best Case: Mahalanobis Distance Matching
[Figure: scatter plot of all treated (T) and control (C) units, Age (20 to 80) against Education in years (12 to 28); slides by King and Nielsen]

Best Case: Mahalanobis Distance Matching
[Figure: scatter plot of Age (20 to 80) against Education in years (12 to 28) in which only treated units (T) and their matched controls (C) remain; slides by King and Nielsen]
