

SLIDE 1

PS 406 – Week 4 Section: Matching and GLMs for Binary Outcomes

D.J. Flynn April 23, 2014

D.J. Flynn PS406 – Week 4 Section Spring 2014 1 / 21

SLIDE 2

Matching

Intuitive solution to the problem of confounding? Kind of. Relative to experiments...

  • We lose efficiency because we need to estimate more parameters (e.g., when calculating p-scores)
  • SEs aren't straightforward (bootstrapping)
  • Regression often does a better job of replicating experimental findings than matching estimators (Peikes et al. 2008)
  • (My two cents:) A lot of decisions to make. Which covariates to match on? Which matching estimator? etc.

Really good overview: sekhon.polisci.berkeley.edu/papers/annualreview.pdf

SLIDE 3

Setting up the data

library(car)  # for recode()

framing <- read.csv("~/Downloads/framing-exp-data.csv")

# Running example: effect of PID on support for renewables
framing$support.renew <- recode(framing$renewables, "1:4=0; 5:7=1",
                                as.factor.result=FALSE)
framing$dem <- recode(framing$party, "1=1; else=0", as.factor.result=FALSE)

framing.new <- na.omit(data.frame(
  TA=framing$TA, condition=framing$condition, renewables=framing$renewables,
  gmf=framing$gmf, sex=framing$sex, year=framing$year, party=framing$party,
  understand=framing$understand, interest=framing$interest,
  dem=framing$dem, support.renew=framing$support.renew))

SLIDE 4

Estimating GLMs in R

# Let's use logit:
logit <- glm(support.renew ~ as.factor(TA) + understand + interest,
             family=binomial(link="logit"), data=framing.new)
summary(logit)

Coefficients:
                Estimate Std. Error z value Pr(>|z|)
(Intercept)       1.4905     2.1122   0.706   0.4804
as.factor(TA)2   -0.5258     0.9769  -0.538   0.5905
as.factor(TA)3   -0.6315     0.9904  -0.638   0.5237
understand        0.9575     0.4500   2.128   0.0334 *
interest         -0.7423     0.7192  -1.032   0.3020
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

SLIDE 5

Propensity scores

Recall that a propensity score is the probability that a given unit is assigned to treatment, conditional on covariates: Pr(Di = T | Xi)

framing.new$pscore <- logit$fitted.values
summary(framing.new$pscore)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.6335  0.9214  0.9288  0.9200  0.9619  0.9829

hist(framing.new$pscore)
plot(framing.new$pscore)
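A side note: the fitted values above come from the outcome model on the previous slide, but the conventional propensity-score setup models the treatment indicator (here, dem) as a function of covariates. A minimal base-R sketch of that convention, using simulated data as a stand-in for framing.new (the column names are assumed from the data frame built earlier):

```r
# Simulated stand-in for framing.new (assumed columns: dem, understand, interest).
set.seed(406)
d <- data.frame(dem        = rbinom(100, 1, 0.5),
                understand = rnorm(100),
                interest   = rnorm(100))

# Conventional propensity-score model: regress the treatment indicator
# (not the outcome) on pre-treatment covariates.
ps.model <- glm(dem ~ understand + interest,
                family = binomial(link = "logit"), data = d)

d$pscore <- ps.model$fitted.values   # estimated Pr(dem = 1 | X)
summary(d$pscore)                    # all scores lie strictly in (0, 1)
```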

SLIDE 6

Now we can proceed with matching...

install.packages("Matching")
library(Matching)

SLIDE 7

Pairwise matching

Pairwise matching matches controls to treatment cases with the closest p-score.

match <- with(framing.new, Match(Y=support.renew, Tr=dem,
                                 X=pscore, est="ATT"))
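The nearest-neighbor idea behind pairwise matching can be sketched in a few lines of base R (toy p-scores, not the Match() internals):

```r
# Toy illustration of pairwise (nearest-neighbor) matching on the p-score,
# with replacement -- just the core idea, not what Match() actually runs.
ps.treated <- c(0.80, 0.92, 0.95)         # p-scores of treated units
ps.control <- c(0.60, 0.79, 0.90, 0.96)   # p-scores of control units

# For each treated unit, find the control unit with the closest p-score:
nearest <- sapply(ps.treated, function(p) which.min(abs(ps.control - p)))
nearest   # -> 2 3 4: treated units match the controls at 0.79, 0.90, 0.96
```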

SLIDE 8

summary(match)

Estimate...  0.20402
AI SE......  0.056228
T-stat.....  3.6285
p.val......  0.00028506

Original number of observations..............  100
Original number of treated obs...............   58
Matched number of observations...............   58
Matched number of observations  (unweighted).  221

SLIDE 9

Let’s check the quality of our matches...

# Balance is assessed on the treatment indicator (dem), not the outcome,
# and match.out takes the object returned by Match():
with(framing.new, MatchBalance(dem ~ as.factor(TA) + understand + interest,
                               match.out=match))

[Long List of Info]

  • Successful matching = similar means for treated and control cases
  • Use p-values and the KS bootstrap to gauge balance
  • The Kolmogorov-Smirnov stat is a non-parametric test for the equality of sample distributions (cf., t-test)
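As a reminder of what the KS statistic measures, a standalone toy example with base R's ks.test (two samples from the same distribution versus a shifted one):

```r
# Kolmogorov-Smirnov test: compares two empirical distributions.
set.seed(1)
treated <- rnorm(200, mean = 0)
control <- rnorm(200, mean = 0)   # same distribution as 'treated'
shifted <- rnorm(200, mean = 1)   # distribution shifted by one SD

ks.test(treated, control)   # p-value typically large: no evidence of difference
ks.test(treated, shifted)   # p-value tiny: distributions clearly differ
```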

SLIDE 10

Caliper matching

Caliper matching specifies a maximum acceptable distance between propensity scores, measured in standard deviations (e.g., we don't want to match two cases that are very different, even if the match is the "best" possible in the data).

caliper <- with(framing.new, Match(Y=support.renew, Tr=dem,
                                   X=pscore, est="ATT", caliper=0.10))

SLIDE 11

summary(caliper)

Estimate...  0.1891
AI SE......  0.049145
T-stat.....  3.8479
p.val......  0.00011916

Original number of observations..............  100
Original number of treated obs...............   58
Matched number of observations...............   52
Matched number of observations  (unweighted).  215

Caliper (SDs)................................  0.1
Number of obs dropped by 'exact' or 'caliper'  6

SLIDE 12

Common support matching

Common support matching creates a range where propensity scores for treated and control cases overlap; cases with p-scores outside this range are dropped. Generally, caliper matching is better at dealing with outliers (because common support throws away inliers too). The Matching documentation on CRAN is blunt: "Seriously, don't use it [common support matching]."

common <- with(framing.new, Match(Y=support.renew, Tr=dem,
                                  X=pscore, est="ATT", CommonSupport=TRUE))

SLIDE 13

summary(common)

Estimate...  0.17879
AI SE......  0.052167
T-stat.....  3.4272
p.val......  0.00060973

Original number of observations..............   96
Original number of treated obs...............   55
Matched number of observations...............   55
Matched number of observations  (unweighted).  217

SLIDE 14

Bias Adjustment

Bias-adjusted matching uses regression adjustment to correct the bias that comes from inexact matches on the p-score. (Unlike OLS under its assumptions, matching estimators are not generally unbiased in expectation.)

bias.adj <- with(framing.new, Match(Y=support.renew, Tr=dem,
                                    X=pscore, est="ATT", BiasAdjust=TRUE))

SLIDE 15

summary(bias.adj)

Estimate...  0.21874
AI SE......  0.05897
T-stat.....  3.7094
p.val......  0.00020776

Original number of observations..............  100
Original number of treated obs...............   58
Matched number of observations...............   58
Matched number of observations  (unweighted).  221

SLIDE 16

Exact matching

Under exact matching, only cases with the same p-scores will be matched; others will be discarded. You can specify which covariates to use for exact matches (e.g., no continuous covariates).

exact <- with(framing.new, Match(Y=support.renew, Tr=dem,
                                 X=pscore, est="ATT", exact=TRUE))

SLIDE 17

summary(exact)

Estimate...  0.16667
AI SE......  0.043309
T-stat.....  3.8483
p.val......  0.00011893

Original number of observations..............  100
Original number of treated obs...............   58
Matched number of observations...............   47
Matched number of observations  (unweighted).  204

Number of obs dropped by 'exact' or 'caliper'  11

SLIDE 18

Rosenbaum sensitivity analysis

Matching relies on the propensity score, which we estimated using a vector of covariates X specified a priori. Thus, we want to check how sensitive our effect estimates are to potential unobserved confounders, that is, variables that affect assignment to treatment. Rosenbaum sensitivity analysis helps us do this: it shows how our results might change given different values of a sensitivity parameter.

A short, readable paper on RSA: www.personal.psu.edu/ljk20/rbounds%20vignette.pdf

SLIDE 19

Sensitivity analysis

For binary outcomes:

library(rbounds)
binarysens()

For continuous outcomes:

library(rbounds)
psens()

SLIDE 20

Back to our data: What do we see?

binarysens(bias.adj)

Rosenbaum Sensitivity Test

Unconfounded estimate ....

Gamma   Lower bound   Upper bound
1                     0.00000
2                     0.00000
3                     0.00001
4                     0.00011
5                     0.00057
6                     0.00180

Note: Gamma is Odds of Differential Assignment To
Treatment Due to Unobserved Factors

SLIDE 21

Closing thoughts: GLMs for binary outcomes

We used logit to estimate propensity scores. We also could've used a linear probability model (OLS), probit, cloglog, others...

  • Key takeaway: always use one of these models (not OLS) when your DV is binary, to prevent impossible predicted probabilities (e.g., Pr(turnout) = 1.23???)
  • When you estimate one of these, you're no longer modeling E[Y]. Instead you're modeling Pr(Yi = 1 | Xi)
  • I stick with one (usually logit); that way I can use the "divide by 4" rule to get a rough idea of effect sizes. But always present substantive effects for readers. Otherwise, who cares about the effect of a given X on the log odds of Y??
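A quick base-R sketch of the "divide by 4" rule and the inverse-logit mapping it approximates (the coefficient is the 'understand' estimate from the logit output earlier, reused purely for illustration):

```r
# The "divide by 4" rule: a logit coefficient divided by 4 is an upper
# bound on the change in Pr(Y = 1) per one-unit change in X, because the
# logistic density peaks at 1/4 (where Pr(Y = 1) = 0.5).
b <- 0.9575                          # illustrative coefficient (cf. 'understand')
b / 4                                # rough maximum effect: about 0.24

# The exact inverse-logit change across one unit of X centered where
# Pr(Y = 1) = 0.5; plogis() is R's inverse logit, exp(z) / (1 + exp(z)):
plogis(0.5 * b) - plogis(-0.5 * b)   # close to, and never larger than, b/4
```

This is why sticking with logit is convenient for quick reads of a regression table; the rule has no analogue as clean for probit or cloglog coefficients.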
