Regression discontinuity I & II, April 1, 2020, PMAP 8521



slide-1
SLIDE 1

Regression discontinuity I & II

April 1, 2020

PMAP 8521: Program Evaluation for Public Service Andrew Young School of Policy Studies Spring 2020

slide-2
SLIDE 2

Plan for today

Arbitrary cutoffs & causal inference
Drawing lines & measuring gaps
Main RDD concerns
RDD with R

slide-3
SLIDE 3

Arbitrary cutoffs & causal inference

slide-4
SLIDE 4

Rules to access programs

Lots of policies and programs are based on arbitrary rules and thresholds

If you’re above the threshold, you’re in the program; if you’re below, you’re not

slide-5
SLIDE 5

Key terms

Running/forcing variable: Index or measure that determines eligibility

Cutoff/cutpoint/threshold: Number that formally assigns access to program

slide-6
SLIDE 6

[Diagram relating four nodes: Running variable, Above cutoff, Program, Outcome]

slide-7
SLIDE 7

Discontinuities everywhere!

Size   Annual    Monthly   138%      150%      200%
1      $12,760   $1,063    $17,609   $19,140   $25,520
2      $17,240   $1,437    $23,791   $25,860   $34,480
3      $21,720   $1,810    $29,974   $32,580   $43,440
4      $26,200   $2,183    $36,156   $39,300   $52,400
5      $30,680   $2,557    $42,338   $46,020   $61,360
6      $35,160   $2,930    $48,521   $52,740   $70,320
7      $39,640   $3,303    $54,703   $59,460   $79,280
8      $44,120   $3,677    $60,886   $66,180   $88,240

Medicaid: 138%
SNAP/Free lunch: 130%
Reduced lunch: 130–185%
ACA subsidies: 100*–400%
CHIP: 200%

slide-8
SLIDE 8

Hypothetical AIG program

If you score 75+ on a test, you get into an academically and intellectually gifted (AIG) during-school program

slide-9
SLIDE 9

[Figure: participation in the AIG program (FALSE/TRUE) by AIG test score, 40 to 100]

slide-10
SLIDE 10

Causal inference intuition

People right before and right after the threshold are essentially the same

slide-11
SLIDE 11

[Figure: participation in the AIG program (FALSE/TRUE) by AIG test score, 40 to 100]

slide-12
SLIDE 12

[Figure: participation in the AIG program (FALSE/TRUE) by AIG test score, zoomed in to scores 69 to 81]

slide-13
SLIDE 13

Causal inference intuition

People right before and right after the threshold are essentially the same

Pseudo treatment and control groups!

Compare outcomes for those right before/after, calculate difference
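The comparison described above can be sketched in a few lines. This is an illustration in Python rather than R, and the student data, the 3-point window, and the function name `naive_gap` are all made up for the example; only the cutoff of 75 comes from the slides.

```python
from statistics import mean

CUTOFF = 75   # AIG threshold from the slides
WINDOW = 3    # hypothetical: how close counts as "right before/after"

# Hypothetical (AIG test score, final test score) pairs, invented for illustration.
students = [(70, 68), (72, 70), (73, 71), (74, 72),
            (75, 80), (76, 81), (77, 82), (90, 95)]

def naive_gap(rows, cutoff, window):
    """Mean outcome just above the cutoff minus mean outcome just below it."""
    below = [y for x, y in rows if cutoff - window <= x < cutoff]
    above = [y for x, y in rows if cutoff <= x < cutoff + window]
    return mean(above) - mean(below)

gap = naive_gap(students, CUTOFF, WINDOW)
print(gap)
```

Students far from the cutoff (scores of 70 and 90 here) are ignored, which is the whole point: only the near-identical people on either side are compared.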

slide-14
SLIDE 14

[Figure: final test score by AIG test score]

slide-15
SLIDE 15

[Figure: final test score by AIG test score, with the gap at the cutoff labeled δ]

slide-16
SLIDE 16

[Figure: final test score by AIG test score, zoomed in to scores 69 to 81]

slide-17
SLIDE 17

Geographic discontinuities

[Figure: turnout (0.2 to 0.6) by treatment status (eastern side of time zone border: No/Yes)]

[Map] Figure 1 shows counties (with their geographic centroids marked) on either side of the time zones in the continental United States as of Election Day in 2010. The map shows counties within 1 degree (latitude and longitude) of the time zone boundaries.

When Time Is of the Essence: A Natural Experiment on How Time Constraints Influence Elections

Jerome Schafer, Ludwig Maximilian University of Munich
John B. Holbein, University of Virginia

Foundational theories of voter turnout suggest that time is a key input in the voting decision, but we possess little causal evidence about how this resource affects electoral behavior. In this article, we use over two decades of elections data and a novel geographic regression discontinuity design that leverages US time zone boundaries. Our results show that exogenous shifts in time allocations have significant political consequences. Namely, we find that citizens are less likely to vote if they live on the eastern side of a time zone border. Time zones also exacerbate participatory inequality and push election results toward Republicans. Exploring potential mechanisms, we find suggestive evidence that these effects are the consequence of insufficient sleep and moderated by the convenience of voting. Regardless of the exact mechanisms, our results indicate that local differences in daily schedules affect how difficult it is to vote and shape the composition of the electorate.

Although in recent years the administrative barriers to voting have declined in many democracies (Blais 2010), many eligible citizens still fail to vote. In the United States, about 40% of registered voters do not participate in presidential elections, with abstention rates soaring as [...] vote, many nonvoters report “not having enough time”, or a close derivative (e.g., “I’m too busy” or “[Voting] takes too long”; Pew Research Center 2006). Moreover, recent studies suggest that levels of turnout may be shaped by time costs such as how long it takes to register to vote (Leighley and Nagler [...]
slide-18
SLIDE 18

Geographic discontinuities

Lower turnout in counties on the eastern side of the boundary

Election schedules cause fluctuations in turnout

slide-19
SLIDE 19

Time discontinuities

California requires that insurance cover two days of post-partum hospitalization

Does extra time in the hospital improve health outcomes?

After Midnight: A Regression Discontinuity Design in Length of Postpartum Hospital Stays

By Douglas Almond and Joseph J. Doyle Jr.

Estimates of moral hazard in health insurance markets can be confounded by adverse selection. This paper considers a plausibly exogenous source of variation in insurance coverage for childbirth in California. We find that additional health insurance coverage induces substantial extensions in length of hospital stay for mother and newborn. However, remaining in the hospital longer has no effect on readmissions or mortality, and the estimates are precise. Our results suggest that for uncomplicated births, minimum insurance mandates incur substantial costs without detectable health benefits. (JEL D82, G22, I12, I18, J13)

slide-20
SLIDE 20

Time discontinuities

[Figure: additional midnights in the hospital by minute of birth, 12:00 to 10:00 (Panel B: after law change)]

Being born at 12:01 AM makes you stay longer in the hospital…

slide-21
SLIDE 21

Time discontinuities

[Figure: twenty-eight day readmission rate by time of birth (Panel B: after law change)]

[Figure: twenty-eight day mortality rate by time of birth (Panel D: after law change)]

…but being born at 12:01 AM has no effect on readmission rates or mortality rates
slide-22
SLIDE 22

Test score discontinuities

Does going to the main state university (i.e. UGA) make you earn more money?

SAT scores are an arbitrary cutoff for accessing the university

THE EFFECT OF ATTENDING THE FLAGSHIP STATE UNIVERSITY ON EARNINGS: A DISCONTINUITY-BASED APPROACH

Mark Hoekstra

Abstract: This paper examines the effect of attending the flagship state university on the earnings of 28 to 33 year olds by combining confidential admissions records from a large state university with earnings data collected through the state’s unemployment insurance program. To distinguish the effect of attending the flagship state university from the effects of confounding factors correlated with the university’s admission decision or the applicant’s enrollment decision, I exploit a large discontinuity in the probability of enrollment at the admission cutoff. The results indicate that attending the most selective state university causes earnings to be approximately 20% higher for white men.

I. Introduction

While there has been considerable study of the effect of educational attainment on earnings, less is known regarding the economic returns to college quality. This paper examines the economic returns to college quality in the context of attending the most selective public state university. It does so using an intuitive regression discontinuity design that compares the earnings of 28 to 33 year olds who were barely admitted to the flagship to those of individuals who were barely rejected. Convincingly estimating the economic returns to college quality requires overcoming the selection bias arising from the fact that attendance at more selective universities is likely correlated with unobserved characteristics that themselves will affect future earnings. Such biases could arise for [...]

[...] colleges but chose to attend less selective institutions. They find that attending more selective colleges has a positive effect on earnings only for students from low-income families. Brewer, Eide, and Ehrenberg (1999) estimate the payoff by explicitly modeling high school students’ choice of college type and find significant returns to attending an elite private institution for all students. Behrman, Rosenzweig, and Taubman (1996) identify the effect by comparing female twin pairs and find evidence of a positive payoff from attending Ph.D.-granting private universities with well-paid senior faculty. Using a similar approach, Lindahl and Regner (2005) use Swedish sibling data and show that cross-sectional estimates of the selective college wage premium are twice the within-family estimates. This paper uses a different strategy in that it identifies the effect of school selectivity on earnings by comparing the earnings of those just below the cutoff for admission to the flagship state university to those of applicants who were barely above the cutoff for admission. To do so, I combined confidential administrative records from a large flagship state university with earnings records collected by the state through the unemployment insurance program. To put the selectivity of the flagship in context, the average SAT scores [...]

slide-23
SLIDE 23

Test score discontinuities

Cutoff seems rule-based

Earnings slightly higher

[Figure: local averages (0.1 to 1) by SAT points above the admission cutoff, -300 to 350]

[Figure: (residual) natural log of earnings (-0.4 to 0.2) by SAT points above the admission cutoff, -300 to 350, showing predicted earnings and local averages]

Estimated Discontinuity = 0.095 (z = 3.01)

slide-24
SLIDE 24

RDDs are all the rage

People love these things!

They’re intuitive, compelling, and highly graphical

Methods Matter: P-Hacking and Causal Inference in Economics

ABSTRACT

The economics ‘credibility revolution’ has promoted the identification of causal relationships using difference-in-differences (DID), instrumental variables (IV), randomized control trials (RCT) and regression discontinuity design (RDD) methods. The extent to which a reader should trust claims about the statistical significance of results proves very sensitive to method. Applying multiple methods to 13,440 hypothesis tests reported in 25 top economics journals in 2015, we show that selective publication and p-hacking is a substantial problem in research employing DID and (in particular) IV. RCT and RDD are much less problematic. Almost 25% of claims of marginally significant results in IV papers are misleading.

JEL Classification: A11, B41, C13, C44
Keywords: research methods, causal inference, p-curves, p-hacking, publication bias

Less susceptible to p-hacking and selective publication than DID or IV

slide-25
SLIDE 25

Drawing lines & measuring gaps

slide-26
SLIDE 26

Main goal of RD

Measure the gap in the outcome for people on both sides of the cutpoint

Gap = δ = local average treatment effect (LATE)
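One common way to write the gap formally, assuming X is the running variable, c the cutoff, and Y the outcome (this notation is an addition, not from the slides):

```latex
\delta = \lim_{x \downarrow c} E[Y \mid X = x] \;-\; \lim_{x \uparrow c} E[Y \mid X = x]
```

In the AIG example, c = 75, so δ compares the expected final test score just above 75 with the expected final test score just below 75.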

slide-27
SLIDE 27

[Figure: final test score by AIG test score, with the gap at the cutoff labeled δ]

slide-28
SLIDE 28

Drawing lines

The size of the gap depends on how you draw the lines on each side of the cutoff

The type of lines you choose can change the estimate of δ, sometimes by a lot!

There’s no one right way to draw lines!

slide-29
SLIDE 29

Line-drawing considerations

Parametric vs. nonparametric lines
Bandwidths
Kernels
Measuring the gap

slide-30
SLIDE 30

Parametric lines

Formulas with parameters

$$y = mx + b$$

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2$$
slide-31
SLIDE 31
[Figure: plot of the straight line below, x from 25 to 100]

$$y = 10 + 4x$$
slide-32
SLIDE 32

Parametric lines

Not just for straight lines! Make curvy with exponents or trigonometry

$$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3$$

$$y = \beta_0 + \beta_1 x + \beta_2 \sin(x)$$
slide-33
SLIDE 33

[Figure: plot of the quadratic curve below, x from 25 to 100]

$$y = 120 - 3x + 0.07x^2$$
slide-34
SLIDE 34

[Figure: plot of the cubic curve below, x from 25 to 100]

$$y = 300 - 25x + 0.65x^2 - 0.004x^3$$
slide-35
SLIDE 35
[Figure: plot of the sine-based curve below, x from 25 to 100]

$$y = 10 + 4x + 50 \times \sin\left(\frac{x}{4}\right)$$
slide-36
SLIDE 36

Parametric lines

It’s important to get the parameters right!

Line should fit the data pretty well

slide-37
SLIDE 37

[Figure: the same data with a linear fit and a quadratic fit]

slide-38
SLIDE 38

[Figure: the same data with linear, quadratic, and cubic fits]

slide-39
SLIDE 39

Nonparametric lines

Lines without parameters

Use the data to find the best line, often with windows and moving averages

Locally estimated/weighted scatterplot smoothing (LOESS/LOWESS) is a common method
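To make "windows and moving averages" concrete, here is a minimal sketch of a windowed moving average in Python. It is a much simpler relative of LOESS, not LOESS itself (LOESS fits weighted local regressions instead of plain averages). The data and the `half_width` parameter are invented for illustration.

```python
def moving_average(xs, ys, half_width):
    """For each x, average the y values whose x lies within half_width of it."""
    smoothed = []
    for x0 in xs:
        window = [y for x, y in zip(xs, ys) if abs(x - x0) <= half_width]
        smoothed.append(sum(window) / len(window))
    return smoothed

# Hypothetical data points.
xs = [1, 2, 3, 4, 5]
ys = [2.0, 4.0, 3.0, 7.0, 6.0]

smooth = moving_average(xs, ys, half_width=1)
print(smooth)
```

No formula with parameters comes out of this: the "line" is just the list of smoothed values, which is what makes the approach nonparametric.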

slide-40
SLIDE 40

[Figure: Loess curve fit to the data; the formula after "y =" is not recoverable]
slide-41
SLIDE 41
slide-42
SLIDE 42

[Figure: the same data with a linear fit, a quadratic fit, and a Loess curve]

slide-43
SLIDE 43

Measuring gap with parametric lines

[Figure: final test score by AIG test score]

slide-44
SLIDE 44

Measuring gap with parametric lines

Easiest way: center the running variable

ID   outcome   running_var   running_var_centered   treatment
1    90.0      69            -6                     FALSE
2    85.7      75            0                      TRUE
3    85.8      78            3                      TRUE
4    85.7      65            -10                    FALSE
5    84.4      76            1                      TRUE

$$y = \beta_0 + \beta_1 \text{Running variable (centered)} + \beta_2 \text{Indicator for treatment}$$
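A minimal sketch of this regression, in Python rather than R so it runs with no packages. The `ols` helper, the scores, and the noise-free outcomes are all made up for illustration; in R the same model would typically be fit with `lm()`. Because the running variable is centered at 75, the coefficient on the treatment indicator is the gap at the cutoff.

```python
def ols(X, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Build the augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

CUTOFF = 75
scores = [60, 65, 70, 74, 75, 78, 82, 90]   # hypothetical running variable
# Hypothetical outcomes built as 70 + 0.5 * centered + 10 * treated (no noise).
final = [70 + 0.5 * (s - CUTOFF) + 10 * (s >= CUTOFF) for s in scores]

# Columns: intercept, centered running variable, treatment indicator.
rows = [[1.0, s - CUTOFF, 1.0 if s >= CUTOFF else 0.0] for s in scores]
b0, b1, b2 = ols(rows, final)
print(round(b0, 3), round(b1, 3), round(b2, 3))
```

Here `b2` recovers the jump of 10 that was built into the fake data. With real, noisy data it would be an estimate of δ with a standard error.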
slide-45
SLIDE 45

Measuring gap with parametric lines

[Figure: final test score by AIG test score]

slide-46
SLIDE 46

Measuring gap with nonparametric lines

Can’t use regression; use rdrobust R package

[Figure: final test score by AIG test score]

slide-47
SLIDE 47

Measuring gap with nonparametric lines

[Figure: final test score by AIG test score]

slide-48
SLIDE 48

Bandwidths

All you really care about is the area right around the cutoff

Observations far away don’t matter because they’re not comparable

Bandwidth = window around cutoff

slide-49
SLIDE 49

[Figure: two panels of the same scatterplot, one with bandwidth = 5 and one with bandwidth = 2.5]

slide-50
SLIDE 50

Bandwidths

Algorithms exist to choose optimal width

Also use common sense

For robustness, check what happens if you double and halve the bandwidth

Maybe ±5 for the AIG test?
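The double-and-halve check can be sketched as a small loop. This is an illustrative Python sketch, not the procedure any R package uses: `fit_at_cutoff` and `gap_within` are hypothetical helper names, the data are fabricated with a built-in jump of 10, and the baseline bandwidth of 5 follows the "maybe ±5" suggestion above.

```python
def fit_at_cutoff(points, cutoff):
    """Simple linear regression of y on (x - cutoff); return the intercept."""
    xs = [x - cutoff for x, _ in points]
    ys = [y for _, y in points]
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar - slope * x_bar

def gap_within(points, cutoff, bandwidth):
    """Gap between separate linear fits on each side, using only nearby points."""
    left = [(x, y) for x, y in points if cutoff - bandwidth <= x < cutoff]
    right = [(x, y) for x, y in points if cutoff <= x <= cutoff + bandwidth]
    return fit_at_cutoff(right, cutoff) - fit_at_cutoff(left, cutoff)

CUTOFF = 75
# Hypothetical noise-free data: slope 0.5 on both sides, jump of 10 at the cutoff.
data = [(x, 40 + 0.5 * x + (10 if x >= CUTOFF else 0)) for x in range(55, 96)]

for bw in (2.5, 5, 10):   # halve and double a baseline bandwidth of 5
    print(bw, round(gap_within(data, CUTOFF, bw), 3))
```

With this artificial data the gap is the same at every bandwidth. With real data, a gap that swings wildly as the bandwidth changes is a warning sign.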

slide-51
SLIDE 51

Kernels

Because we care the most about observations right by the cutoff, give more distant ones less weight

Kernel = method for assigning importance to values based on their distance to the cutoff
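As a rough illustration, here are three textbook kernel weight functions in Python, taking the distance from the cutoff divided by the bandwidth. The function names and the bandwidth of 5 are assumptions for the example, and the 0.75 scaling in the Epanechnikov kernel is the conventional textbook constant, which may differ from how any particular plot or package scales it.

```python
def uniform(u):
    """Equal weight for every observation inside the bandwidth."""
    return 1.0 if abs(u) <= 1 else 0.0

def triangular(u):
    """Weight falls linearly from 1 at the cutoff to 0 at the bandwidth edge."""
    return max(0.0, 1.0 - abs(u))

def epanechnikov(u):
    """Weight falls quadratically; 0.75 is the conventional scaling constant."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1 else 0.0

CUTOFF, BANDWIDTH = 75, 5   # cutoff from the slides; bandwidth is hypothetical
for score in (75, 77, 79, 82):
    u = (score - CUTOFF) / BANDWIDTH   # scaled distance from the cutoff
    print(score, uniform(u), round(triangular(u), 2), round(epanechnikov(u), 2))
```

A score of 82 sits outside the bandwidth, so every kernel gives it a weight of zero. Inside the bandwidth, only the uniform kernel treats near and far observations alike.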

slide-52
SLIDE 52

[Figure: weight (0 to 1) by distance from cutoff (-1 to 1) for the uniform, triangular, and Epanechnikov kernels]

slide-53
SLIDE 53

[Figure: weight (0 to 1) under the rectangular, triangular, and Epanechnikov kernels]

slide-54
SLIDE 54

Try everything!

Your estimate of δ depends on all these:

Line type (parametric vs. nonparametric)
Bandwidth (wide vs. narrow)
Kernel weighting

Try lots of different combinations!

slide-55
SLIDE 55

[Figure: scatterplot with a linear fit on each side of the cutoff]

slide-56
SLIDE 56

[Figure: scatterplot with linear fits using all data and using bandwidth = 5]

slide-57
SLIDE 57

[Figure: scatterplot with linear fits using all data, bandwidth = 5, and bandwidth = 2.5]

slide-58
SLIDE 58

Main RDD concerns

slide-59
SLIDE 59

It’s greedy!

You need lots of data, since you’re throwing most of it away

[Figure: two panels of the same scatterplot, one with bandwidth = 5 and one with bandwidth = 2.5]

slide-60
SLIDE 60

It’s limited in scope!

You’re only measuring the ATE for people in the bandwidth

Local Average Treatment Effect (LATE)

slide-61
SLIDE 61

It’s limited in scope!

You can’t make population-level claims with a LATE

(But can you really do that with RCTs and diff-in-diff anyway?)

“The realistic conclusion to draw is that all quantitative empirical results that we encounter are ‘local’”

Angrist and Pischke, Mostly Harmless Econometrics, p. 23–24

slide-62
SLIDE 62

Graphics are neat!

[Figure: three discontinuity plots, panels A, B, and C]

slide-63
SLIDE 63

Which ones are significant?

[Figure: three discontinuity plots, panels A, B, and C]

slide-64
SLIDE 64

All of them!

[Figure: the same three discontinuity plots, now labeled with their estimates]

δ = 63.29; t = 14.197; p < 0.001
δ = 25.02; t = 5.694; p < 0.001
δ = 8.8; t = 1.997; p = 0.046

slide-65
SLIDE 65

Don’t rely only on graphics!

Super clear breaks are uncommon

δ = 8.8; t = 1.997; p = 0.046

Make graphs, but also find the actual δ value

slide-66
SLIDE 66

Manipulation!

People might fudge numbers or work to hit the threshold to get in/out of program (if people know about the cutoff)

If so, those right next to the cutoff are no longer comparable treatment/control groups
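A crude way to look for this kind of bunching is to count observations just below and just above the cutoff and ask whether the split is plausibly 50/50. The Python sketch below does that with an exact binomial test; it is a rough stand-in, not a formal density test, and the scores, the width of 3, and the function names are all invented for illustration.

```python
from math import comb

def binomial_two_sided_p(k, n):
    """Exact two-sided p-value for k successes in n fair coin flips."""
    probs = [comb(n, i) / 2 ** n for i in range(n + 1)]
    # Sum the probabilities of all outcomes at least as unlikely as the one observed.
    return min(1.0, sum(p for p in probs if p <= probs[k] * (1 + 1e-9)))

def bunching_check(values, cutoff, width):
    """Count observations just below and just above the cutoff; test for imbalance."""
    below = sum(1 for v in values if cutoff - width <= v < cutoff)
    above = sum(1 for v in values if cutoff <= v < cutoff + width)
    return below, above, binomial_two_sided_p(above, below + above)

# Hypothetical scores: suspiciously many land at or just above 75.
scores = [72, 73, 74, 75, 75, 75, 75, 76, 76, 76, 76, 77, 77, 77]

below, above, p = bunching_check(scores, cutoff=75, width=3)
print(below, above, round(p, 4))
```

A small p-value here only says the counts are lopsided near the cutoff. It ignores any underlying trend in the density, which is what a proper density test accounts for.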

slide-67
SLIDE 67

Distribution of marathon finishing times

[Figure: number of finishers (up to 100,000) by finish time, 02:00 to 07:00; each bar is one minute; N = 9,589,053]

Eric J. Allen, Patricia M. Dechow, Devin G. Pope, George Wu (2017) Reference-Dependent Preferences: Evidence from Marathon Runners. Management Science 63(6):1657-1672. https://doi.org/10.1287/mnsc.2015.2417

slide-68
SLIDE 68

Manipulation

Check with a McCrary density test

rddensity::rdplotdensity() in R

[Figure: density by running variable (-2 to 2), panel titled "Manipulation"]

[Figure: density by running variable (-2 to 2), panel titled "No manipulation"]

slide-69
SLIDE 69

Noncompliance!

People on the margin of the cutoff might end up in/out of the program

The ACA, subsidies, Medicaid, and 138% of the poverty line

Sharp vs. fuzzy discontinuities

slide-70
SLIDE 70

Sharp discontinuity

Perfect compliance

[Figure: participation in the AIG program (FALSE/TRUE) by AIG test score, 40 to 100, with the cutoff at a test score of 75]

slide-71
SLIDE 71

Fuzzy discontinuity

Imperfect compliance

[Figure: participation in the AIG program (FALSE/TRUE) by AIG test score, 40 to 100, with the cutoff at a test score of 75]

slide-72
SLIDE 72

Fuzzy discontinuities

Address noncompliance with instrumental variables (more on those next time!)

Use an instrument for which side of the cutoff people should be on

Effect is only for compliers near the cutoff (complier LATE; doubly local)
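The simplest version of this idea is a ratio: the jump in the outcome at the cutoff divided by the jump in program participation at the cutoff. The Python sketch below uses plain differences in means within a window; the student records, the window of 3, and the `jump` helper are all made up, and real analyses would use regression-based estimates rather than raw means.

```python
from statistics import mean

def jump(rows, cutoff, bandwidth, column):
    """Mean of `column` just above the cutoff minus its mean just below."""
    below = [r[column] for r in rows if cutoff - bandwidth <= r["score"] < cutoff]
    above = [r[column] for r in rows if cutoff <= r["score"] < cutoff + bandwidth]
    return mean(above) - mean(below)

# Hypothetical students: one below 75 gets in anyway, one above 75 stays out.
rows = [
    {"score": 72, "in_program": 0, "final": 70},
    {"score": 73, "in_program": 1, "final": 80},
    {"score": 74, "in_program": 0, "final": 70},
    {"score": 74, "in_program": 0, "final": 70},
    {"score": 75, "in_program": 1, "final": 80},
    {"score": 76, "in_program": 1, "final": 80},
    {"score": 76, "in_program": 0, "final": 70},
    {"score": 77, "in_program": 1, "final": 80},
]

outcome_jump = jump(rows, 75, 3, "final")        # jump in final scores
takeup_jump = jump(rows, 75, 3, "in_program")    # jump in participation rate
fuzzy_effect = outcome_jump / takeup_jump
print(outcome_jump, takeup_jump, fuzzy_effect)
```

Because crossing the cutoff only raises participation by half in this fake data, the raw outcome jump understates the program's effect on those who actually comply, and dividing by the take-up jump scales it back up.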

slide-73
SLIDE 73

RDD with R

slide-74
SLIDE 74

1: Is assignment to treatment rule-based?

If not, stop!

2: Is design fuzzy or sharp?

Either is fine; sharp is easier.

3: Is there a discontinuity in running variable at cutpoint?

Hopefully not.

4: Is there a discontinuity in outcome variable at cutpoint in running variable?

Hopefully.

5: How big is the gap?

Measure parametrically and nonparametrically.

slide-75
SLIDE 75

R time!