Regression discontinuity I & II
April 1, 2020
PMAP 8521: Program Evaluation for Public Service Andrew Young School of Policy Studies Spring 2020
Plan for today
Arbitrary cutoffs & causal inference
Drawing lines & measuring gaps
Running variable: index or measure that determines eligibility
Cutoff: number that formally assigns access to program
[Diagram nodes: Running variable, Above cutoff, Program, Outcome]
Size   Annual    Monthly   138%      150%      200%
1      $12,760   $1,063    $17,609   $19,140   $25,520
2      $17,240   $1,437    $23,791   $25,860   $34,480
3      $21,720   $1,810    $29,974   $32,580   $43,440
4      $26,200   $2,183    $36,156   $39,300   $52,400
5      $30,680   $2,557    $42,338   $46,020   $61,360
6      $35,160   $2,930    $48,521   $52,740   $70,320
7      $39,640   $3,303    $54,703   $59,460   $79,280
8      $44,120   $3,677    $60,886   $66,180   $88,240
Medicaid: 138%
SNAP/Free lunch: 130%
Reduced lunch: 130–185%
ACA subsidies: 100*–400%
CHIP: 200%
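As a rough illustration of how a percentage of the poverty guideline becomes a dollar cutoff, here is a minimal Python sketch. It uses the first four annual guideline rows from the table; the function names are hypothetical, and treating "at or below the cutoff" as eligible is an assumption of this sketch, not a statement of program rules.

```python
# Annual poverty guidelines by household size, copied from the table above.
ANNUAL_GUIDELINE = {1: 12_760, 2: 17_240, 3: 21_720, 4: 26_200}


def income_cutoff(size: int, percent: float) -> int:
    """Income cutoff in whole dollars at `percent` of the guideline."""
    return round(ANNUAL_GUIDELINE[size] * percent / 100)


def below_cutoff(income: float, size: int, percent: float) -> bool:
    """True when income (the running variable) is at or below the cutoff.

    Counting "exactly at the cutoff" as eligible is an assumption here.
    """
    return income <= income_cutoff(size, percent)


medicaid_cutoff = income_cutoff(1, 138)  # 138% of $12,760
print(medicaid_cutoff)                   # 17609, matching the 138% column
print(below_cutoff(17_000, 1, 138), below_cutoff(18_000, 1, 138))
```

Two people earning $17,000 and $18,000 are nearly identical, yet only one falls under the 138% line. That arbitrary split is what regression discontinuity exploits.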
[Figures: Participated in AIG program (FALSE/TRUE) by AIG test score; full range and zoomed in to scores 69–81]
[Figures: Final test score by AIG test score; full range and zoomed in to scores 69–81]
[Figure: Turnout by treatment status (eastern side of time zone border: No/Yes)]
Figure 1 shows counties (with their geographic centroids marked) on either side of the time zones in the continental United States as of Election Day in 2010. The map shows counties within 1 degree (latitude and longitude) of the time zone boundaries.
When Time Is of the Essence: A Natural Experiment
Jerome Schafer, Ludwig Maximilian University of Munich John B. Holbein, University of Virginia
Foundational theories of voter turnout suggest that time is a key input in the voting decision, but we possess little causal evidence about how this resource affects electoral behavior. In this article, we use over two decades of elections data and a novel geographic regression discontinuity design that leverages US time zone boundaries. Our results show that exogenous shifts in time allocations have significant political consequences. Namely, we find that citizens are less likely to vote if they live on the eastern side of a time zone border. Time zones also exacerbate participatory inequality and push election results toward Republicans. Exploring potential mechanisms, we find suggestive evidence that these effects are the consequence of insufficient sleep and moderated by the convenience of voting. Regardless of the exact mechanisms, our results indicate that local differences in daily schedules affect how difficult it is to vote and shape the composition of the electorate.
Although in recent years the administrative barriers to voting have declined in many democracies (Blais 2010), many eligible citizens still fail to vote. In the United States, about 40% of registered voters do not participate in presidential elections, with abstention rates soaring as […] vote, many nonvoters report “not having enough time”, or a close derivative (e.g., “I’m too busy” or “[Voting] takes too long”; Pew Research Center 2006). Moreover, recent studies suggest that levels of turnout may be shaped by time costs such as how long it takes to register to vote (Leighley and Nagler […]
Lower turnout in counties on the eastern side of the boundary
Election schedules cause fluctuations in turnout
California requires that insurance cover two days of post-partum hospitalization
Does extra time in the hospital improve health outcomes?
After Midnight: A Regression Discontinuity Design in Length of Postpartum Hospital Stays†
By Douglas Almond and Joseph J. Doyle Jr.*
Estimates of moral hazard in health insurance markets can be confounded by adverse selection. This paper considers a plausibly exogenous source of variation in insurance coverage for childbirth in […] substantial extensions in length of hospital stay for mother and new- […] readmissions or mortality, and the estimates are precise. Our results suggest that for uncomplicated births, minimum insurance mandates incur substantial costs without detectable health benefits. (JEL D82, G22, I12, I18, J13)
[Figure: Panel B. Additional midnights: after law change, by minute of birth]
Being born at 12:01 AM makes you stay longer in the hospital…
[Figure: Panel B. Twenty-eight day readmission rate: after law change, by time of birth]
[Figure: Panel D. Twenty-eight day mortality rate: after law change, by time of birth]
…but being born at 12:01 AM has no effect on readmission or mortality rates
Does going to the main state university (i.e., UGA) make you earn more money?
An SAT score threshold provides an arbitrary cutoff for admission to the university
THE EFFECT OF ATTENDING THE FLAGSHIP STATE UNIVERSITY ON EARNINGS: A DISCONTINUITY-BASED APPROACH
Mark Hoekstra*
Abstract: This paper examines the effect of attending the flagship state university on the earnings of 28 to 33 year olds by combining confidential admissions records from a large state university with earnings data collected through the state’s unemployment insurance program. To distinguish the effect of attending the flagship state university from the effects […] probability of enrollment at the admission cutoff. The results indicate that attending the most selective state university causes earnings to be approximately 20% higher for white men.
I. Introduction
While there has been considerable study of the effect […] regarding the economic returns to college quality. This paper examines the economic returns to college quality in the context of attending the most selective public state […] nuity design that compares the earnings of 28 to 33 year […] individuals who were barely rejected.
Convincingly estimating the economic returns to college quality requires overcoming the selection bias arising from the fact that attendance at more selective universities is likely correlated with unobserved characteristics that themselves will affect future earnings. Such biases could arise for […] leges but chose to attend less selective institutions. They find that attending more selective colleges has a positive effect on earnings only for students from low-income fam- […] payoff by explicitly modeling high school students’ choice […] elite private institution for all students. Behrman, Rosenzweig, and Taubman (1996) identify the effect by comparing female twin pairs and find evidence of a positive payoff from attending Ph.D.-granting private universities with well-paid senior faculty. Using a similar approach, Lindahl and Regner (2005) use Swedish sibling data and show that cross-sectional estimates of the selective college wage premium are twice the within-family estimates.
This paper uses a different strategy in that it identifies the effect of school selectivity on earnings by comparing the earnings of those just below the cutoff for admission to the flagship state university to those of applicants who were barely above the cutoff for admission. To do so, I combined confidential administrative records from a large flagship state university with earnings records collected by the state through the unemployment insurance program. To put the selectivity of the flagship in context, the average SAT scores […]
Cutoff seems rule-based
Earnings slightly higher
[Figure: Local averages (vertical axis from .1 to 1) by SAT points above the admission cutoff]
[Figure: (Residual) natural log of earnings by SAT points above the admission cutoff, with predicted earnings and local averages. Estimated Discontinuity = 0.095 (z = 3.01)]
They’re intuitive, compelling, and highly graphical
Methods Matter: P-Hacking and Causal Inference in Economics*
ABSTRACT: The economics ‘credibility revolution’ has promoted the identification of causal relationships using difference-in-differences (DID), instrumental variables (IV), randomized control trials (RCT) and regression discontinuity design (RDD) methods. The extent to which a reader should trust claims about the statistical significance of results proves very sensitive to […]
Less susceptible to p-hacking and selective publication than DID or IV
[Figure: Final test score by AIG test score]
[Figures: fitted lines of increasing flexibility, each shown with its general form and an example]
Linear: $y = \beta_0 + \beta_1 x$, e.g. $y = 10 + 4x$
Quadratic: $y = \beta_0 + \beta_1 x + \beta_2 x^2$, e.g. $y = 120 - 3x + 0.07x^2$
Cubic: $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3$, e.g. $y = 300 - 25x + 0.65x^2 - 0.004x^3$
Sine: $y = \beta_0 + \beta_1 x + \beta_2 \sin(x)$, e.g. $y = 10 + 4x + 50 \times \sin(x/4)$
[Figures: linear and quadratic fits compared; linear, quadratic, and cubic fits compared]
Use the data to find the best line
Locally estimated/weighted scatterplot smoothing (LOESS/LOWESS) is a common method
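The idea behind LOESS can be sketched in a few lines of Python: at each point, fit a small weighted line using only nearby observations, with closer observations counting more. This is a simplified illustration with a hypothetical function name, not the implementation used by any particular package. It assumes distinct x values and uses the common tricube weighting. The `frac` parameter (the share of points in each neighbourhood) is a choice made for this sketch.

```python
def loess_at(x0, xs, ys, frac=0.5):
    """Tricube-weighted local linear fit evaluated at x0 (assumes distinct xs)."""
    k = max(3, round(frac * len(xs)))            # points in the neighbourhood
    h = sorted(abs(x - x0) for x in xs)[k - 1]   # distance to the k-th nearest point
    # Tricube weights: 1 at x0, falling to 0 at distance h and beyond.
    w = [(1 - min(1.0, abs(x - x0) / h) ** 3) ** 3 for x in xs]
    sw = sum(w)
    xbar = sum(wi * x for wi, x in zip(w, xs)) / sw
    ybar = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - xbar) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - xbar) * (y - ybar) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx
    return ybar + slope * (x0 - xbar)            # local line evaluated at x0


# Made-up data that is exactly linear, so the smoother should reproduce it.
xs = list(range(10))
ys = [2 * x + 1 for x in xs]
smooth = [loess_at(x, xs, ys) for x in xs]
```

With curved or noisy data the local lines bend to follow the points, which is why no single formula describes the resulting curve.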
[Figures: LOESS fit; linear, quadratic, and LOESS fits compared]
[Figure: Final test score by AIG test score]
Easiest way: center the running variable
ID   outcome   running_var   running_var_centered   treatment
1    90.0      69            -6                     FALSE
2    85.7      75            0                      TRUE
3    85.8      78            3                      TRUE
4    85.7      65            -10                    FALSE
5    84.4      76            1                      TRUE
$y = \beta_0 + \beta_1 \text{Running variable (centered)} + \beta_2 \text{Indicator for treatment}$
[Figure: Final test score by AIG test score]
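Centering and the model above can be sketched as follows. The data are made up and noiseless so that the coefficients are easy to check. The cutoff of 75 follows the AIG example, while the helper functions `solve` and `ols` are hypothetical and written only to keep the example free of dependencies. In practice you would fit the same model with a regression function from a statistics package.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


CUTOFF = 75
scores = [62, 66, 70, 73, 74, 75, 76, 79, 83, 88]   # made-up running variable
centered = [s - CUTOFF for s in scores]              # 0 now means "at the cutoff"
treated = [1 if c >= 0 else 0 for c in centered]     # indicator for treatment
# Made-up outcome with a built-in gap of 8 points at the cutoff.
final = [40 + 0.5 * c + 8 * t for c, t in zip(centered, treated)]

design = [[1, c, t] for c, t in zip(centered, treated)]
b0, b1, b2 = ols(design, final)
print(round(b0, 3), round(b1, 3), round(b2, 3))
```

Because the running variable is centered, the coefficient on the treatment indicator (`b2`) is the size of the gap right at the cutoff, and `b0` is the predicted outcome just below it.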
With a nonparametric line you can’t read the gap from a regular regression; use the rdrobust R package
[Figures: Final test score by AIG test score]
Observations far away don’t matter because they’re not comparable
[Figures: the same data fit with Bandwidth = 5 and with Bandwidth = 2.5]
Maybe ±5 for the AIG test?
[Figure: Weight by distance from cutoff for uniform (rectangular), triangular, and Epanechnikov kernels]
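The three kernels can be written as small functions of the scaled distance u = (distance from cutoff) / bandwidth. This is a minimal sketch using the standard textbook forms; the function names are hypothetical. The constant in front of each kernel does not change a weighted regression, only the relative weights matter.

```python
def uniform(u: float) -> float:
    """Every observation inside the bandwidth counts equally."""
    return 0.5 if abs(u) <= 1 else 0.0


def triangular(u: float) -> float:
    """Weight falls linearly from 1 at the cutoff to 0 at the bandwidth edge."""
    return max(0.0, 1 - abs(u))


def epanechnikov(u: float) -> float:
    """Weight falls off as a parabola, reaching 0 at the bandwidth edge."""
    return max(0.0, 0.75 * (1 - u * u))


bandwidth = 5
for distance in (0, 2.5, 5, 7):
    u = distance / bandwidth
    print(distance, uniform(u), triangular(u), epanechnikov(u))
```

Observations beyond the bandwidth get zero weight under all three kernels. The kernels differ only in how much more the observations closest to the cutoff count.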
Line type (parametric vs. nonparametric)
Bandwidth (wide vs. narrow)
Kernel weighting
[Figures: Linear fits using all the data, Linear (bw = 5), and Linear (bw = 2.5); the same fits shown with Bandwidth = 5 and Bandwidth = 2.5]
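To see how line type, bandwidth, and kernel combine, here is a minimal sketch of a local linear estimate of the gap: fit a triangular-weighted line on each side of the cutoff, using only observations inside the bandwidth, and compare the two intercepts. The function names and data are made up for illustration. This is not what rdrobust does internally (it also chooses bandwidths and corrects for bias), only the core idea.

```python
def weighted_line(x, y, w):
    """Weighted least-squares intercept and slope for y = a + b * x."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def rd_gap(centered, outcome, bandwidth):
    """Difference between the two fitted lines at the cutoff (centered = 0)."""
    def intercept(on_this_side):
        # Triangular kernel: weight 1 at the cutoff, 0 at the bandwidth edge.
        pts = [(x, y, 1 - abs(x) / bandwidth)
               for x, y in zip(centered, outcome)
               if on_this_side(x) and abs(x) < bandwidth]
        xs, ys, ws = zip(*pts)
        return weighted_line(xs, ys, ws)[0]
    return intercept(lambda x: x >= 0) - intercept(lambda x: x < 0)


# Made-up, noiseless data: different slopes on each side and a gap of 8.
centered = [-13, -4, -2, -1, 0, 1, 3, 13]
outcome = [40 + 0.5 * x if x < 0 else 48 + 0.3 * x for x in centered]
gap_bw5 = rd_gap(centered, outcome, bandwidth=5)
gap_bw2_5 = rd_gap(centered, outcome, bandwidth=2.5)
print(round(gap_bw5, 3), round(gap_bw2_5, 3))
```

Both bandwidths give the same answer here only because the made-up data have no noise. With real data the estimate usually moves as the bandwidth and kernel change, which is why it is worth reporting several.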
(But can you really do that with RCTs and diff-in-diff anyway?)
Angrist and Pischke, Mostly Harmless Econometrics, p. 23–24
[Figures: three panels (A, B, C), each with a cutoff. Reported estimates, in order: 63.29 (t = 14.197; p < 0.001), 25.02 (t = 5.694; p < 0.001), 8.8 (t = 1.997; p = 0.046)]
(if people know about the cutoff)
[Figure: Distribution of marathon finishing times. Number of finishers by finish time, 02:00 to 07:00 (each bar is one minute); N = 9,589,053]
Eric J. Allen, Patricia M. Dechow, Devin G. Pope, George Wu (2017) Reference-Dependent Preferences: Evidence from Marathon Runners. Management Science 63(6):1657-1672. https://doi.org/10.1287/mnsc.2015.2417
rddensity::rdplotdensity() in R
[Figure: Density of the running variable with manipulation]
[Figure: Density of the running variable with no manipulation]
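A crude way to see the logic of a manipulation check is to count how many observations fall just below and just above the cutoff and ask whether the split is plausible under a 50/50 coin flip. The sketch below does this with an exact binomial test. It is a simplified stand-in, not the density test that rddensity runs, and the function names, window, and data are all made up for illustration.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int) -> float:
    """Exact two-sided p-value for k successes in n fair coin flips."""
    probs = [comb(n, i) / 2 ** n for i in range(n + 1)]
    # Add up every outcome that is no more likely than the observed one.
    return min(1.0, sum(p for p in probs if p <= probs[k] * (1 + 1e-9)))


def bunching_check(running, cutoff, window):
    """Counts just below and just above the cutoff, plus the binomial p-value."""
    below = sum(1 for r in running if cutoff - window <= r < cutoff)
    above = sum(1 for r in running if cutoff <= r < cutoff + window)
    return below, above, binomial_two_sided_p(above, below + above)


# Made-up scores: one sample piles up just above 75, the other is balanced.
manipulated = [73] * 1 + [74] * 1 + [75] * 10 + [76] * 8
balanced = [73] * 5 + [74] * 5 + [75] * 5 + [76] * 5

print(bunching_check(manipulated, cutoff=75, window=2))
print(bunching_check(balanced, cutoff=75, window=2))
```

A very small p-value suggests people are sorting themselves to one side of the cutoff, which undermines the claim that those just above and just below are comparable.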
The ACA, subsidies, Medicaid, and 138% of the poverty line
[Figure: Perfect compliance. Participated in AIG program (FALSE/TRUE) by AIG test score, with the cutoff at a test score of 75]
[Figure: Imperfect compliance. Participated in AIG program (FALSE/TRUE) by AIG test score, with the cutoff at a test score of 75]
Use which side of the cutoff people should be on as an instrument for actual treatment
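With imperfect compliance, the simplest version of the instrument idea is a ratio: the jump in the outcome at the cutoff divided by the jump in the share actually treated. The sketch below computes that ratio from plain averages inside a bandwidth. The function name and data are made up, and a real analysis would typically use local regressions or two-stage least squares rather than raw means.

```python
def mean(values):
    return sum(values) / len(values)


def fuzzy_rd_wald(centered, treated, outcome, bandwidth):
    """Jump in outcome at the cutoff divided by jump in treatment share."""
    above = [i for i, x in enumerate(centered) if 0 <= x <= bandwidth]
    below = [i for i, x in enumerate(centered) if -bandwidth <= x < 0]
    jump_outcome = (mean([outcome[i] for i in above])
                    - mean([outcome[i] for i in below]))
    jump_treatment = (mean([treated[i] for i in above])
                      - mean([treated[i] for i in below]))
    return jump_outcome / jump_treatment


# Made-up data: 25% treated below the cutoff, 75% treated above it,
# and a true treatment effect of 10 points.
centered = [-3, -2, -1, -1, 0, 1, 2, 3]
treated = [0, 0, 0, 1, 1, 1, 0, 1]
outcome = [50 + 10 * t for t in treated]

effect = fuzzy_rd_wald(centered, treated, outcome, bandwidth=3)
print(effect)
```

The raw gap in outcomes here is only 5 points because crossing the cutoff raises the chance of treatment by just 50 percentage points. Dividing by that 0.5 recovers the effect for people whose treatment status is actually moved by the cutoff.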
1: Is assignment to treatment rule-based?
If not, stop!
2: Is design fuzzy or sharp?
Either is fine; sharp is easier.
3: Is there a discontinuity in running variable at cutpoint?
Hopefully not.
4: Is there a discontinuity in outcome variable at cutpoint in running variable?
Hopefully.
5: How big is the gap?
Measure parametrically and nonparametrically.