
Gov 2002 - Causal Inference III: Regression Discontinuity Designs

Matthew Blackwell and Arthur Spirling, October 16th, 2014

Introduction

◮ Causal inference for us so far: selection on observables, and instrumental variables for when this doesn't hold
◮ Basic idea behind both: find some plausibly exogenous variation in the treatment assignment
◮ Selection on observables: treatment as-if random conditional on Xi
◮ IV: instrument provides exogenous variation
◮ Regression Discontinuity: exogenous variation from a discontinuity in treatment assignment


Plan of attack

◮ Sharp Regression Discontinuity Designs
◮ Estimation in the SRD
◮ Readings
◮ Fuzzy Regression Discontinuity Designs


Sharp Regression Discontinuity Designs

Setup

◮ The basic idea behind regression discontinuity designs is that we have a variable, Xi, that we call the forcing variable, which determines (partly or wholly) the treatment assignment on either side of a fixed threshold.
◮ This variable may or may not be related to the potential outcomes, but we assume that relationship is smooth, so that changes in the outcome around the threshold can be interpreted as a causal effect.
◮ The classic example of this is in the educational context:
  ◮ Scholarships allocated based on a test score threshold (Thistlethwaite and Campbell, 1960)
  ◮ The effect of class size on test scores, using total-enrollment thresholds that trigger the creation of new classes (Angrist and Lavy, 1999)

Notation

◮ Treatment: Ai = 1 or Ai = 0
◮ Potential outcomes: Yi(1) and Yi(0)
◮ Observed outcomes: Yi = Yi(1)Ai + Yi(0)(1 − Ai)
◮ Forcing variable: Xi ∈ R
◮ Covariates: an M-length vector Zi = (Z1i, …, ZMi)

Design

◮ In a sharp RD design, the treatment assignment is a deterministic function of the forcing variable and the threshold c, so that:

Assumption SRD
Ai = 1{Xi ≥ c} ∀i

◮ When test scores are above 1500 → offered scholarship
◮ When test scores are below 1500 → not offered scholarship
◮ Key assumption: no compliance problems (assignment is deterministic)
◮ At the threshold c we only see treated units, and just below the threshold, at c − ε, we only see control values:

P(Ai = 1|Xi = c) = 1
P(Ai = 1|Xi = c − ε) = 0

Threshold

◮ Intuitively, we are interested in the discontinuity in the outcome at the discontinuity in the treatment assignment.
◮ We want to investigate the behavior of the outcome around the threshold:

lim_{x↓c} E[Yi|Xi = x] − lim_{x↑c} E[Yi|Xi = x]

◮ Under certain assumptions, this quantity identifies the ATE at the threshold: τSRD = E[Yi(1) − Yi(0)|Xi = c]

Plotting the RDD (Imbens and Lemieux, 2008)

[Figure]

Comparison to traditional setup

◮ Note that ignorability holds here by design, because conditional on the forcing variable, the treatment is deterministic:

Yi(1), Yi(0) ⊥⊥ Ai | Xi

◮ Again, we can't directly use this, because we know that the usual positivity assumption is violated. Remember that positivity is an overlap condition: 0 < Pr[Ai = 1|Xi = x] < 1
◮ Here, obviously, the propensity score is only 0 or 1, depending on the value of the forcing variable.
◮ Thus, we need to extrapolate from the treated to the control group and vice versa.

Extrapolation and smoothness

◮ Remember the quantity of interest here is the effect at the threshold:

τSRD = E[Yi(1) − Yi(0)|Xi = c] = E[Yi(1)|Xi = c] − E[Yi(0)|Xi = c]

◮ But by design we never observe E[Yi(0)|Xi = c], so we're going to extrapolate from E[Yi(0)|Xi = c − ε].
◮ Extrapolation, even at short distances, requires a certain smoothness in the functions we are extrapolating.

Continuity of the CEFs

Assumption 1: Continuity
The functions E[Yi(0)|Xi = x] and E[Yi(1)|Xi = x] are continuous in x.

◮ This continuity implies the following:

E[Yi(0)|Xi = c] = lim_{x↑c} E[Yi(0)|Xi = x]           (continuity)
                = lim_{x↑c} E[Yi(0)|Ai = 0, Xi = x]   (SRD)
                = lim_{x↑c} E[Yi|Xi = x]              (consistency/SRD)

◮ The same argument works for the treated group: E[Yi(1)|Xi = c] = lim_{x↓c} E[Yi|Xi = x]

Identification results

◮ Thus, under the ignorability assumption, the sharp RD assumption, and the continuity assumption, we have:

τSRD = E[Yi(1) − Yi(0)|Xi = c]
     = E[Yi(1)|Xi = c] − E[Yi(0)|Xi = c]
     = lim_{x↓c} E[Yi|Xi = x] − lim_{x↑c} E[Yi|Xi = x]

◮ Note that each of these limits is identified, at least with infinite data, as long as Xi has positive density around the cutpoint
◮ Why? With arbitrarily high N, we'll get arbitrarily good approximations to the conditional expectation function near the cutpoint
◮ How to estimate these nonparametrically is difficult, as we'll see (boundary points are a big problem)

What can go wrong?

◮ If the potential outcomes change at the discontinuity for reasons other than the treatment, then smoothness will be violated.
◮ For instance, if people sort around the threshold, then you might get jumps other than the one you care about.
◮ If things other than the treatment change at the threshold, then that might cause discontinuities in the potential outcomes.


Estimation in the SRD

Graphical approaches

◮ Simple plot of mean outcomes within bins of the forcing variable:

Ȳk = (1/Nk) Σ_{i=1}^{N} Yi · I(bk < Xi ≤ bk+1)

where Nk is the number of units within bin k and the bk are the bin cutpoints.

◮ Obvious discontinuity at the threshold?
◮ Are there other, unexplained discontinuities?
◮ As Imbens and Lemieux say:

“The formal statistical analyses discussed below are essentially just sophisticated versions of this, and if the basic plot does not show any evidence of a discontinuity, there is relatively little chance that the more sophisticated analyses will lead to robust and credible estimates with statistically and substantially significant magnitudes.”
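A minimal sketch of this binned-means plot in Python (the data arrays x and y, the cutpoint c, and the bin width are all hypothetical; bins are aligned so that the cutpoint is a bin edge, which keeps treated and control units out of the same bin):

```python
import numpy as np
import matplotlib.pyplot as plt

def binned_means(x, y, cutpoint, bin_width):
    """Mean outcome within bins of the forcing variable."""
    # Build bin edges outward from the cutpoint so no bin straddles it.
    left = np.arange(cutpoint, x.min() - bin_width, -bin_width)[::-1]
    right = np.arange(cutpoint, x.max() + bin_width, bin_width)
    edges = np.concatenate([left[:-1], right])  # cutpoint appears exactly once
    centers, means = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (x > lo) & (x <= hi)
        if in_bin.any():
            centers.append((lo + hi) / 2)
            means.append(y[in_bin].mean())
    return np.array(centers), np.array(means)

# Hypothetical usage, with c the threshold:
# centers, means = binned_means(x, y, cutpoint=c, bin_width=0.5)
# plt.scatter(centers, means)
# plt.axvline(c, linestyle="--")
# plt.show()
```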

Example from RD on extending unemployment

[Figure]

Other graphs to include

◮ Next, it's a good idea to plot covariates by the forcing variable to see whether these covariates also jump at the discontinuity.
◮ Same binning strategy:

Z̄km = (1/Nk) Σ_{i=1}^{N} Zim · I(bk < Xi ≤ bk+1)

◮ Intuition: our key assumption is that the potential outcomes are smooth in the forcing variable.
◮ Discontinuities in covariates unaffected by the threshold could be indications of discontinuities in the potential outcomes.
◮ Similar to balance tests in matching
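These covariate plots need no new machinery: in the hypothetical binned_means sketch above, passing a covariate column in place of the outcome (for example, binned_means(x, z_m, cutpoint=c, bin_width=0.5)) produces the binned covariate means Z̄km.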

Checking covariates at the discontinuity

[Figure]

General estimation strategy

◮ The main goal in RD is to estimate the limits of various CEFs, such as: lim_{x↑c} E[Yi|Xi = x]
◮ It turns out that this is a hard problem, because we want to estimate the regression at a single point, and that point is a boundary point.
◮ As a result, the usual kinds of nonparametric estimators perform poorly.
◮ In general, we are going to have to choose some way of estimating the regression functions around the cutpoint.
◮ Using the entire sample on either side will obviously lead to bias, because values far from the cutpoint are clearly different from those nearer to the cutpoint.
◮ → restrict our estimation to units close to the threshold.

Example of misleading trends

[Figure: scatterplot of y against x]

Nonparametric and semiparametric approaches

◮ Let's define

μR(x) = lim_{z↓x} E[Yi(1)|Xi = z]
μL(x) = lim_{z↑x} E[Yi(0)|Xi = z]

◮ For the SRD, we have τSRD = μR(c) − μL(c).
◮ One nonparametric approach is to estimate μL(c) with a uniform kernel:

μ̂L(c) = [Σ_{i=1}^{N} Yi · I{c − h ≤ Xi < c}] / [Σ_{i=1}^{N} I{c − h ≤ Xi < c}]

◮ Here, h is a bandwidth parameter, selected by you.
◮ Basically, calculate means among units no more than h away from the threshold.
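A minimal sketch of this uniform-kernel estimator in Python (numpy assumed; the arrays x and y, the cutpoint c, and the bandwidth h are hypothetical):

```python
import numpy as np

def local_mean(x, y, c, h, side):
    """Uniform-kernel estimate of the CEF limit at the cutpoint c.

    side="left"  averages Yi over c - h <= Xi < c   (estimates mu_L(c))
    side="right" averages Yi over c <= Xi <= c + h  (estimates mu_R(c))
    """
    if side == "left":
        keep = (x >= c - h) & (x < c)
    else:
        keep = (x >= c) & (x <= c + h)
    return y[keep].mean()

# Hypothetical usage: the jump at the threshold is the difference of the sides.
# tau_hat = local_mean(x, y, c, h, "right") - local_mean(x, y, c, h, "left")
```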

Bandwidths equal to 7, 5, and 1

[Figures: local averages of y against x around the cutoff, for bandwidths h = 7, 5, and 1]

Local averages

◮ Estimate the mean of Yi when Xi ∈ [c, c + h] and when Xi ∈ [c − h, c).
◮ Can do this with the following regression on those units less than h away from c:

(α̂, τ̂) = arg min_{α,τ} Σ_{i: Xi ∈ [c−h, c+h]} (Yi − α − τAi)²

◮ Here, τ̂SRD = τ̂.
◮ This turns out to have very large bias as we increase the bandwidth.

Local linear regression

◮ Instead of a local constant, we can use a local linear regression.
◮ Run a linear regression of Yi on Xi − c in the group Xi ∈ [c, c + h] to estimate μR(c), and the same regression in the group with Xi ∈ [c − h, c):

(α̂L, β̂L) = arg min_{α,β} Σ_{i: Xi ∈ [c−h, c)} (Yi − α − β(Xi − c))²
(α̂R, β̂R) = arg min_{α,β} Σ_{i: Xi ∈ [c, c+h]} (Yi − α − β(Xi − c))²

◮ Our estimate is

τ̂SRD = μ̂R(c) − μ̂L(c) = α̂R + β̂R(c − c) − α̂L − β̂L(c − c) = α̂R − α̂L

More practical estimation

◮ We can estimate this local linear regression by dropping observations more than h away from c and then running the following regression:

Yi = α + β(Xi − c) + τAi + γ(Xi − c)Ai + ηi

◮ Here we just have an interaction term between the treatment status and the forcing variable.
◮ Here, τ̂SRD = τ̂, the coefficient on the treatment.
◮ This yields numerically the same estimate as the separate regressions.
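A minimal sketch of this interacted local linear regression in Python, using plain numpy least squares (the arrays x and y, the cutpoint c, and the bandwidth h are hypothetical):

```python
import numpy as np

def srd_local_linear(x, y, c, h):
    """Sharp RD estimate from one interacted local linear regression.

    Keeps units within h of the cutpoint and regresses
    Y on [1, X - c, A, (X - c) * A]; tau is the coefficient on A.
    """
    keep = np.abs(x - c) <= h
    xc = x[keep] - c
    a = (x[keep] >= c).astype(float)  # sharp treatment indicator 1{X >= c}
    design = np.column_stack([np.ones_like(xc), xc, a, xc * a])
    coef, *_ = np.linalg.lstsq(design, y[keep], rcond=None)
    return coef[2]  # tau_hat: the estimated jump at c

# Hypothetical usage, e.g. a 1500 test-score cutoff with a 50-point window:
# tau_hat = srd_local_linear(x, y, c=1500.0, h=50.0)
```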

Bandwidths equal to 10 (global), 7, 5, and 1

[Figures: local linear fits of y against x around the cutoff, for bandwidths h = 10 (the full sample), 7, 5, and 1]

Odds and ends for the SRD

◮ Standard errors: robust standard errors from the local OLS are valid.
◮ Covariates: shouldn't matter, but you can include them for increased precision.
◮ ALWAYS REPORT MODELS WITHOUT COVARIATES FIRST
◮ You can include polynomials of the forcing variable in the local regression. Let X̃i = Xi − c:

Yi = α + β1X̃i + β2X̃i² + τAi + γ1X̃iAi + γ2X̃i²Ai + ηi

◮ Make sure that your effects aren't dependent on the polynomial choice.
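In the hypothetical srd_local_linear sketch above, this quadratic specification amounts to adding xc**2 and (xc**2) * a columns to the design matrix; τ̂ is still the coefficient on a.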


Bandwidth selection

◮ The choice of bandwidth is fairly important here, and we want it to shrink as N grows.
◮ In general, we can use cross-validation techniques to choose the optimal bandwidth.
◮ See Imbens and Kalyanaraman (2012) for optimal bandwidth selection.
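One simple version of the cross-validation idea, sketched in Python (this is a generic leave-one-out criterion for one-sided local linear fits, not the specific procedure of any of the cited papers; the candidate grid is hypothetical):

```python
import numpy as np

def _line_predict(x_train, y_train, x0):
    """OLS line through (x_train, y_train), evaluated at x0."""
    X = np.column_stack([np.ones_like(x_train), x_train])
    coef, *_ = np.linalg.lstsq(X, y_train, rcond=None)
    return coef[0] + coef[1] * x0

def cv_bandwidth(x, y, c, candidates):
    """Leave-one-out CV over candidate bandwidths.

    Each unit's outcome is predicted from a linear fit to the other
    units on the same side of c that lie within h of it; pick the h
    with the smallest mean squared prediction error.
    """
    best_h, best_err = None, np.inf
    for h in candidates:
        sq_errs = []
        for i in range(len(x)):
            same_side = (x >= c) == (x[i] >= c)
            near = same_side & (np.abs(x - x[i]) <= h)
            near[i] = False  # leave unit i out
            if near.sum() < 2:
                continue
            pred = _line_predict(x[near], y[near], x[i])
            sq_errs.append((y[i] - pred) ** 2)
        err = np.mean(sq_errs)
        if err < best_err:
            best_h, best_err = h, err
    return best_h

# Hypothetical usage:
# h_star = cv_bandwidth(x, y, c, candidates=np.linspace(0.5, 10.0, 20))
```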


Readings

Reading 1

[Figure]

Reading 2

[Figure]

Fuzzy Regression Discontinuity Designs

Setup

◮ With fuzzy RD, the treatment assignment is no longer a deterministic function of the forcing variable, but there is still a discontinuity in the probability of treatment at the threshold:

Assumption FRD
lim_{x↓c} Pr[Ai = 1|Xi = x] ≠ lim_{x↑c} Pr[Ai = 1|Xi = x]

◮ In the sharp RD this is also true, but there the jump in the probability was further required to be from 0 to 1.
◮ Fuzzy RD is often useful when a threshold encourages participation in a program but does not actually force units to participate.

Fuzzy RD in a graph

[Figure]

Fuzzy RD is IV

◮ The forcing variable is an instrument: it affects Yi, but only through Ai (at the threshold)
◮ Let Ai(x) be the potential value of treatment when we set the forcing variable to x, for some small neighborhood around c.
  ◮ Ai(x) = 1 if unit i would take the treatment when Xi were x
  ◮ Ai(x) = 0 if unit i would take the control when Xi were x

Fuzzy RD assumptions

Assumption 2: Monotonicity
There exists ε such that Ai(c + e) ≥ Ai(c − e) for all 0 < e < ε

◮ Moving the forcing variable from just below to just above the threshold never discourages a unit from taking the treatment (no defiers)

Assumption 3: Local Exogeneity of the Forcing Variable
In a neighborhood of c, {τi, Ai(x)} ⊥⊥ Xi

◮ Basically, in an ε-ball around c, the forcing variable is as-if randomly assigned.

Compliance in Fuzzy RDs

◮ Compliers are those i such that, for all 0 < e < ε: Ai(c + e) = 1 and Ai(c − e) = 0
◮ Think about college students who get above a certain GPA being encouraged to apply to grad school.
◮ Compliers would:
  ◮ apply to grad school if their GPA was just above the threshold
  ◮ not apply to grad school if their GPA was just below the threshold
◮ We don't get to see their compliance status, due to the fundamental problem of causal inference
◮ Could also think about this as changing the threshold instead of changing Xi
Compliance graph

[Figure: Ai(x) plotted against the cutoff, with marks at c − ε, c, and c + ε; the compliers' step from 1 to 0 is at c]

◮ Compliers would not take the treatment if they had Xi = c and we increased the cutoff by some small amount
◮ These are compliers at the threshold

Compliance groups

◮ Compliers: Ai(c + e) = 1 and Ai(c − e) = 0
◮ Always-takers: Ai(c + e) = Ai(c − e) = 1
◮ Never-takers: Ai(c + e) = Ai(c − e) = 0

[Figure: Ai(x) plotted against the cutoff, with marks at c − ε, c, and c + ε, showing the flat always-taker (at 1) and never-taker (at 0) lines alongside the compliers' step]

LATE in the Fuzzy RD

◮ We can define an estimand in the spirit of IV:

τFRD = [lim_{x↓c} E[Yi|Xi = x] − lim_{x↑c} E[Yi|Xi = x]] / [lim_{x↓c} E[Ai|Xi = x] − lim_{x↑c} E[Ai|Xi = x]]
     = (effect of threshold on Yi) / (effect of threshold on Ai)

◮ Under the FRD assumption, continuity, consistency, monotonicity, and local exogeneity, this equals the effect at the threshold for compliers:

τFRD = lim_{e↓0} E[τi | Ai(c + e) > Ai(c − e)]

Proof

◮ To prove this, we'll look at the discontinuity in Yi in a window around the threshold and then shrink that window:

E[Yi|Xi = c + e] − E[Yi|Xi = c − e]

◮ First, remember that by consistency,

Yi = Yi(1)Ai + Yi(0)(1 − Ai) = Yi(0) + (Yi(1) − Yi(0))Ai = Yi(0) + τiAi

◮ Plug this into the CEF of the outcome, using Ai = Ai(c + e) given Xi = c + e and then local exogeneity to drop the conditioning:

E[Yi|Xi = c + e] = E[Yi(0) + τiAi|Xi = c + e] = E[Yi(0) + τiAi(c + e)]

◮ Thus, we can write the difference around the threshold as:

E[Yi|Xi = c + e] − E[Yi|Xi = c − e] = E[τi(Ai(c + e) − Ai(c − e))]

Proof (cont)

◮ Let's break this expectation apart using the law of iterated expectations, conditioning on compliance type:

E[τi(Ai(c + e) − Ai(c − e))] = E[τi × 1 | complier] × Pr[complier]
                             + E[τi × −1 | defier] × Pr[defier]
                             + E[τi × (Ai(c + e) − Ai(c − e)) | always-taker] × Pr[always-taker]
                             + E[τi × (Ai(c + e) − Ai(c − e)) | never-taker] × Pr[never-taker]

◮ Monotonicity rules out defiers, so Pr[defier] = 0, and always-takers and never-takers have Ai(c + e) − Ai(c − e) = 0, leaving:

E[τi(Ai(c + e) − Ai(c − e))] = E[τi | complier] × Pr[complier]

Proof (cont)

◮ So far, we've shown that the outcome jump at the discontinuity is the LATE times the probability of compliance:

E[Yi|Xi = c + e] − E[Yi|Xi = c − e] = E[τi | complier] × Pr[complier]

◮ What is the probability of compliance, though?

Pr[complier] = Pr[Ai(c + e) − Ai(c − e) = 1]
             = E[Ai(c + e) − Ai(c − e)]
             = E[Ai(c + e)] − E[Ai(c − e)]
             = E[Ai(c + e)|Xi = c + e] − E[Ai(c − e)|Xi = c − e]   (local exogeneity)
             = E[Ai|Xi = c + e] − E[Ai|Xi = c − e]                 (consistency)

◮ Thus,

(E[Yi|Xi = c + e] − E[Yi|Xi = c − e]) / (E[Ai|Xi = c + e] − E[Ai|Xi = c − e]) = E[τi | Ai(c + e) > Ai(c − e)]

Misc notes

◮ Taking the limit as e → 0, we've shown that:

τFRD = [lim_{x↓c} E[Yi|Xi = x] − lim_{x↑c} E[Yi|Xi = x]] / [lim_{x↓c} E[Ai|Xi = x] − lim_{x↑c} E[Ai|Xi = x]]
     = lim_{e↓0} E[τi | Ai(c + e) > Ai(c − e)]

◮ Note that the FRD estimator encompasses the SRD estimator, because with a sharp design:

lim_{x↓c} E[Ai|Xi = x] − lim_{x↑c} E[Ai|Xi = x] = 1

◮ A note on external validity: obviously, FRD puts even more restrictions on the external validity of our estimates, because not only are we discussing a LATE, but the effect is also only at the threshold. That should give us pause about generalizing to other populations, for both the SRD and FRD.

Estimation in FRD

◮ Remember that we had:

τFRD = [lim_{x↓c} E[Yi|Xi = x] − lim_{x↑c} E[Yi|Xi = x]] / [lim_{x↓c} E[Ai|Xi = x] − lim_{x↑c} E[Ai|Xi = x]]

◮ We can estimate the numerator using the SRD approaches we just outlined: τ̂SRD.
◮ For the denominator, we simply apply the local linear regression to the Ai:

(α̂aL, β̂aL) = arg min_{α,β} Σ_{i: Xi ∈ [c−h, c)} (Ai − α − β(Xi − c))²
(α̂aR, β̂aR) = arg min_{α,β} Σ_{i: Xi ∈ [c, c+h]} (Ai − α − β(Xi − c))²

◮ Use this to calculate the effect of the threshold on Ai: τ̂a = α̂aR − α̂aL
◮ Calculate the ratio estimator: τ̂FRD = τ̂SRD / τ̂a
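A sketch of this ratio estimator, reusing the hypothetical srd_local_linear helper from the sharp-RD section (the arrays x, y, and a, the cutpoint c, and the bandwidth h are all assumed):

```python
def frd_ratio(x, y, a, c, h):
    """Fuzzy RD ratio estimator: jump in Y over jump in A at c.

    Both jumps come from interacted local linear regressions within
    bandwidth h; a is the observed (non-deterministic) treatment.
    """
    jump_y = srd_local_linear(x, y, c, h)  # effect of threshold on Y
    jump_a = srd_local_linear(x, a, c, h)  # effect of threshold on A (first stage)
    return jump_y / jump_a

# Hypothetical usage:
# tau_frd = frd_ratio(x, y, a, c=1500.0, h=50.0)
```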
More practical FRD estimation

◮ The ratio estimator above is equivalent to a TSLS approach.
◮ Use the same specification as above, with the following covariates:

Vi = (1, I{Xi < c}(Xi − c), I{Xi ≥ c}(Xi − c))′

◮ First stage: Ai = δ1′Vi + ρ·I{Xi ≥ c} + νi
◮ Second stage: Yi = δ2′Vi + τAi + ηi
◮ Thus, being above the threshold is treated like an instrument, controlling for trends in Xi.
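A minimal two-stage least squares sketch in plain numpy (point estimate only; standard errors from this manual construction would not be valid TSLS standard errors; the arrays x, y, and a and the tuning values c and h are hypothetical):

```python
import numpy as np

def frd_tsls(x, y, a, c, h):
    """TSLS version of the fuzzy RD estimator within bandwidth h of c.

    Instrument: the above-threshold indicator I{X >= c}.
    Controls V: an intercept plus separate linear trends in X - c
    on each side of the cutoff.
    """
    keep = np.abs(x - c) <= h
    xc = x[keep] - c
    z = (x[keep] >= c).astype(float)  # instrument
    v = np.column_stack([np.ones_like(xc),
                         (z == 0) * xc,   # I{X < c}(X - c)
                         (z == 1) * xc])  # I{X >= c}(X - c)
    # First stage: regress A on V and the instrument, keep fitted values.
    first = np.column_stack([v, z])
    delta, *_ = np.linalg.lstsq(first, a[keep], rcond=None)
    a_hat = first @ delta
    # Second stage: regress Y on V and the fitted treatment.
    second = np.column_stack([v, a_hat])
    beta, *_ = np.linalg.lstsq(second, y[keep], rcond=None)
    return beta[-1]  # tau_hat

# Hypothetical usage:
# tau_frd = frd_tsls(x, y, a, c=1500.0, h=50.0)
```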