Synthetic Difference in Differences
Dmitry Arkhangelsky Susan Athey David Hirshberg Guido Imbens Stefan Wager
JSM. August 3rd, 2020.
When Berkeley implemented the first soda tax, we compared to San Francisco.
While Berkeley, the first U.S. city to pass a “soda tax,” saw a substantial decline of 0.13 times/day in the consumption of soda in the months following implementation of the tax in March 2015, neighboring San Francisco, where a soda-tax measure was defeated, and Oakland, saw a 0.03 times/day increase, according to a study published today in the American Journal of Public Health.
This is how we did it.
[Figure: lines labeled Berkeley, San Francisco, and Hallucinated Parallel Berkeley.]
This is a “Difference-in-Differences” estimate
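$$\tau = Y^{(1)}_{\text{BK,post}} - Y^{(0)}_{\text{BK,post}}$$
$$\hat\tau = \big[Y^{(1)}_{\text{BK,post}} - Y^{(0)}_{\text{BK,pre}}\big] - \big[Y^{(0)}_{\text{SF,post}} - Y^{(0)}_{\text{SF,pre}}\big]$$
$$Y^{(0)}_{\text{city,time}} \approx \alpha_{\text{city}} + \beta \cdot 1\{\text{time} = \text{post}\}$$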
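As a rough illustration, the two-by-two estimate can be computed from the changes quoted in the study summary above. The function name `did_2x2` is made up for this sketch, and the pre-period levels are set to a hypothetical baseline of 0.0 because only changes are quoted; the estimate does not depend on that baseline.

```python
def did_2x2(treated_pre, treated_post, control_pre, control_post):
    """Difference-in-differences from four group-by-period means."""
    treated_change = treated_post - treated_pre
    control_change = control_post - control_pre
    return treated_change - control_change


# Only changes are quoted in the text, so levels use a hypothetical
# baseline of 0.0 in the pre-period for both groups.
berkeley_change = -0.13    # times/day, decline after the tax
comparison_change = 0.03   # times/day, increase in the comparison cities
tau_hat = did_2x2(0.0, berkeley_change, 0.0, comparison_change)
print(round(tau_hat, 2))
```

Shifting either group's baseline by a constant leaves `tau_hat` unchanged, which is the role of the city effect in the model above.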
Difference in Differences
Things get interesting when we observe many units over many time periods. We focus on simultaneous adoption.

                                1, . . . , T0     T0 + 1, . . . , T = T0 + T1
1, . . . , N0                   no treatment      no treatment
N0 + 1, . . . , N = N0 + N1     no treatment      treatment

Fitting $Y_{it} \sim \alpha_i + \beta_t + w_{it}\tau$ by least squares gives the same two-by-two estimate, applied to the averages over our 4 ‘blocks’.
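A small numerical check of the block-average reading of diff-in-diff, assuming a balanced panel with control units in the first rows and pre-treatment periods in the first columns. The function names and the simulated data (including the effect of 2.0) are hypothetical.

```python
import numpy as np


def did_blocks(Y, N0, T0):
    """DiD applied to the averages of the four blocks of an N x T panel."""
    control_pre = Y[:N0, :T0].mean()
    control_post = Y[:N0, T0:].mean()
    treated_pre = Y[N0:, :T0].mean()
    treated_post = Y[N0:, T0:].mean()
    return (treated_post - treated_pre) - (control_post - control_pre)


def twfe_tau(Y, N0, T0):
    """Least squares fit of Y_it ~ alpha_i + beta_t + w_it * tau; returns tau."""
    N, T = Y.shape
    W = np.zeros((N, T))
    W[N0:, T0:] = 1.0
    unit = np.kron(np.eye(N), np.ones((T, 1)))   # unit dummies, row-major order
    time = np.kron(np.ones((N, 1)), np.eye(T))   # time dummies
    X = np.column_stack([unit, time, W.ravel()])
    # X is rank deficient, but tau is identified, so any solution agrees on it.
    coef, *_ = np.linalg.lstsq(X, Y.ravel(), rcond=None)
    return coef[-1]


rng = np.random.default_rng(0)
N0, N1, T0, T1 = 6, 3, 5, 4
alpha = rng.normal(size=(N0 + N1, 1))
beta = rng.normal(size=(1, T0 + T1))
Y = alpha + beta + rng.normal(scale=0.1, size=(N0 + N1, T0 + T1))
Y[N0:, T0:] += 2.0   # hypothetical treatment effect
print(did_blocks(Y, N0, T0), twfe_tau(Y, N0, T0))
```

The two printed numbers should agree up to floating-point error in this simultaneous-adoption design.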
California’s anti-smoking legislation (Proposition 99)
[Figure: Average Control and California, 1970 to 2000.]
A 25 cents/pack excise tax increase took effect in 1989.
California ≈ (1/49) Alaska + (1/49) Alabama + . . .
California’s anti-smoking legislation: Difference-in-Differences
[Figure: Average Control and California, 1970 to 2000.]
If we average and hallucinate a line, it obviously doesn’t fit.
Synthetic Controls
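Instead of the average control, compare to something else: a synthetic control [Abadie, Diamond, and Hainmueller, 2010], a weighted average of control units with weights chosen so that
$$\sum_{n=1}^{N_0} \hat\omega_n Y_{nt} \approx \bar Y_{\text{treated},t} \quad \text{for all } t \le T_0.$$
The estimate is the mean post-treatment difference between treated and synthetic control.
$$\hat\tau = \frac{1}{T_1} \sum_{t=T_0+1}^{T} \Big( \bar Y_{\text{treated},t} - \sum_{n=1}^{N_0} \hat\omega_n Y_{nt} \Big)$$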
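Given weights, the synthetic control estimate is a short computation. The sketch below assumes the weights are already fitted; the function name, the toy data, and the effect of -4.0 are hypothetical.

```python
import numpy as np


def sc_estimate(Y_control, y_treated, omega, T0):
    """Mean post-treatment gap between the treated average and the synthetic control.

    Y_control: (N0, T) control outcomes; y_treated: (T,) treated average;
    omega: (N0,) weights, assumed nonnegative and summing to one.
    """
    synthetic = omega @ Y_control      # (T,) synthetic control path
    gap = y_treated - synthetic
    return gap[T0:].mean()


# Toy panel: the treated path is an exact mix of controls before treatment.
Y_control = np.array([[1.0, 2.0, 3.0, 4.0, 5.0],
                      [2.0, 2.0, 2.0, 2.0, 2.0],
                      [5.0, 4.0, 3.0, 2.0, 1.0]])
omega = np.array([0.5, 0.3, 0.2])
T0 = 3
y_treated = omega @ Y_control
y_treated[T0:] -= 4.0                  # hypothetical effect after period T0
tau_sc = sc_estimate(Y_control, y_treated, omega, T0)
print(tau_sc)
```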
California’s anti-smoking legislation: Synthetic Control
[Figure: Synthetic Control and California, 1970 to 2000.]
When comparing to a synthetic control, trends line up better. California ≈ .3 Utah + .2 Nevada + .15 Montana + . . .
Improving on Synthetic Control
Instead of constructing a unit for a cross-sectional comparison, construct a unit and time period for a diff-in-diff comparison.
This is a doubly robust version of synthetic control. If the before/after comparison is good, the unit comparison doesn’t have to be.
And it’s easier to make them good. Constant shifts get differenced out, so constructed parallel trends are as good as overlaid ones.
California’s anti-smoking legislation: Constructed Parallel Trends
[Figure: sdid and sc, 1970 to 2000.]
California’s anti-smoking legislation: Double Robustness
[Figure: one entry per control state, Alabama through Wyoming (38 states); legends: unit.weight (0.10 to 0.25) and estimator (sdid).]
Implementation
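Estimate the unit weights ω by simplex-constrained least squares.
$$\hat\omega = \arg\min_{\omega_0,\omega} \sum_{t=1}^{T_0} \Big( \omega_0 + \sum_{n=1}^{N_0} \omega_n Y_{nt} - \bar Y_{\text{treated},t} \Big)^2 + \zeta^2 T_0 \|\omega\|^2 \quad \text{subject to } \omega_1, \ldots, \omega_{N_0} \ge 0, \ \sum_{n=1}^{N_0} \omega_n = 1$$
Use an intercept. We want parallel lines, not overlaid ones.
Use a ridge penalty; multicollinearity is typical. Shrinkage helps control variance and own-observation bias.
A similar regression gives the time weights λ, on the control units.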
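One way to sketch the unit-weight fit with NumPy alone is a Frank-Wolfe iteration with exact line search. This is an illustrative solver choice, not a description of any package's internals; the function name, the iteration count, and the simulated data are assumptions. The intercept is profiled out by centering each series over the pre-treatment periods.

```python
import numpy as np


def simplex_ridge_weights(Y_control_pre, y_treated_pre, zeta=0.0, n_iter=2000):
    """Frank-Wolfe sketch: simplex-constrained least squares with intercept and ridge.

    Y_control_pre: (N0, T0) control outcomes; y_treated_pre: (T0,) treated average.
    Returns (omega0, omega).
    """
    N0, T0 = Y_control_pre.shape
    # Minimizing over the intercept is the same as centering over time.
    A = (Y_control_pre - Y_control_pre.mean(axis=1, keepdims=True)).T   # (T0, N0)
    b = y_treated_pre - y_treated_pre.mean()
    eta = zeta**2 * T0
    omega = np.full(N0, 1.0 / N0)
    for _ in range(n_iter):
        grad = 2 * A.T @ (A @ omega - b) + 2 * eta * omega
        d = -omega.copy()
        d[np.argmin(grad)] += 1.0      # direction toward the best simplex vertex
        curvature = 2 * (np.sum((A @ d) ** 2) + eta * np.sum(d**2))
        if curvature <= 0:
            break
        step = np.clip(-(grad @ d) / curvature, 0.0, 1.0)   # exact line search
        omega = omega + step * d
    omega0 = y_treated_pre.mean() - omega @ Y_control_pre.mean(axis=1)
    return omega0, omega


rng = np.random.default_rng(1)
N0, T0 = 8, 30
Y_control_pre = rng.normal(size=(N0, T0)).cumsum(axis=1)
true_w = np.array([0.5, 0.3, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0])
y_treated_pre = true_w @ Y_control_pre + 5.0   # parallel to the mix, shifted by 5
omega0, omega = simplex_ridge_weights(Y_control_pre, y_treated_pre, zeta=0.0)
fit = omega0 + omega @ Y_control_pre
rmse = np.sqrt(np.mean((fit - y_treated_pre) ** 2))
print(np.round(omega, 3), round(float(rmse), 4))
```

Under the same assumptions, time weights could be sketched by swapping roles, for example `simplex_ridge_weights(Y_control_pre.T, Y_control_post.mean(axis=1))`, where `Y_control_post` is a hypothetical array of post-treatment control outcomes.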
Synthetic Difference-in-Differences
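The estimate is the diff-in-diff of four weighted cells.

                     synthetic pre-treatment                                  average post-treatment
synthetic control    $\sum_n \sum_t \hat\omega_n \hat\lambda_t Y_{nt}$        $\sum_n \hat\omega_n T_1^{-1} \sum_t Y_{nt}$
average treated      $N_1^{-1} \sum_n \sum_t \hat\lambda_t Y_{nt}$            $N_1^{-1} T_1^{-1} \sum_n \sum_t Y_{nt}$

Sums over n run over control units in the first row and treated units in the second. Sums over t run over pre-treatment periods in the first column and post-treatment periods in the second.
DID uses equal weights $\omega_n = 1/N_0$, $\lambda_t = 1/T_0$. SC only takes one difference (uses zero time weights $\lambda_t = 0$).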
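The four cells combine into one double difference. The sketch below takes the weights as given and shows how equal weights reproduce DID and zero time weights reproduce the single SC difference; the function name and the simulated panel are hypothetical.

```python
import numpy as np


def weighted_did(Y, N0, T0, omega, lam):
    """Double difference of the four weighted cells.

    Y: (N, T) panel with controls in the first N0 rows and pre-treatment
    periods in the first T0 columns; omega: (N0,) unit weights; lam: (T0,)
    time weights.
    """
    treated_avg = Y[N0:].mean(axis=0)   # (T,) average treated path
    synthetic = omega @ Y[:N0]          # (T,) weighted control path

    def post_minus_pre(path):
        return path[T0:].mean() - lam @ path[:T0]

    return post_minus_pre(treated_avg) - post_minus_pre(synthetic)


rng = np.random.default_rng(2)
N0, N1, T0, T1 = 5, 2, 4, 3
alpha = rng.normal(size=(N0 + N1, 1))
beta = rng.normal(size=(1, T0 + T1))
Y = alpha + beta                        # exact additive model, no noise
Y[N0:, T0:] += 2.0                      # hypothetical treatment effect

did = weighted_did(Y, N0, T0, np.full(N0, 1 / N0), np.full(T0, 1 / T0))
omega = rng.dirichlet(np.ones(N0))      # arbitrary simplex weights
lam = rng.dirichlet(np.ones(T0))
sdid_like = weighted_did(Y, N0, T0, omega, lam)
sc_like = weighted_did(Y, N0, T0, omega, np.zeros(T0))
print(did, sdid_like, sc_like)
```

In this noiseless additive panel any simplex weights recover the effect once both differences are taken, while the zero-time-weight version still carries the gap in unit effects between the treated average and the weighted controls.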
Theory
A General Setting
$$Y_{nt} = L_{nt} + W_{nt}\tau_{nt} + \varepsilon_{nt}, \qquad E[\varepsilon \mid W] = 0$$
We estimate the ATT
$$\bar\tau = \frac{1}{N_1 T_1} \sum_{n,t} W_{nt}\tau_{nt}$$
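A minimal simulation of this setting, to make the target concrete. The rank-2 form of L, the effect distribution, and the noise scale are all assumptions chosen for illustration; the model above only requires the noise to be mean zero given W.

```python
import numpy as np

rng = np.random.default_rng(0)
N0, N1, T0, T1 = 30, 5, 20, 8
N, T = N0 + N1, T0 + T1

# Hypothetical systematic component: unit factors times time factors.
L = rng.normal(size=(N, 2)) @ rng.normal(size=(2, T))

W = np.zeros((N, T))
W[N0:, T0:] = 1.0                             # simultaneous adoption block

tau = 1.0 + 0.5 * rng.normal(size=(N, T))     # heterogeneous effects tau_nt
eps = rng.normal(scale=0.3, size=(N, T))      # noise drawn independently of W
Y = L + W * tau + eps

att = (W * tau).sum() / (N1 * T1)             # average effect over treated cells
print(att)
```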
What can go wrong?
Underfitting. We don’t create parallel trends in pre-treatment outcomes.
Overfitting. We do, but by predicting signal from noise.
Failed identification. We adjust as intended, but we’re still confounded.
Underfitting
It happens, but it tends to be something we can see. e.g., California cigarette consumption with southeastern states as controls.
[Figure: california, 1970 to 2000.]
California ≈ .82 Louisiana + .10 Mississippi + . . .
Overfitting
We prove concentration around an oracle estimator to rule out overfitting.
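The oracle weights minimize expected (as opposed to empirical) mean squared error.
$$\tilde\omega = \arg\min_{(\omega_0,\omega) \in \mathbb{R} \times S} \sum_{t=1}^{T_0} \Big( \omega_0 + \sum_{n=1}^{N_0} \omega_n L_{nt} - \bar L_{\text{treated},t} \Big)^2 + [\operatorname{trace}(\Sigma) + \zeta^2 T_0] \|\omega\|^2$$
$$\tilde\lambda = \arg\min_{(\lambda_0,\lambda) \in \mathbb{R} \times S} \sum_{n=1}^{N_0} \Big( \lambda_0 + \sum_{t=1}^{T_0} \lambda_t L_{nt} - \bar L_{n,\text{post}} \Big)^2 + N_0 \|\Sigma^{1/2}(\lambda - \psi)\|^2$$
We’re in an errors-in-variables model, so implicit ridge penalty terms arise as the expectation of quadratics in the noise matrix ε.
$\Sigma = E[\varepsilon_{n,\text{pre}}^T \varepsilon_{n,\text{pre}}]$ is the pre-treatment autocovariance matrix.
$\psi = \arg\min_{v \in \mathbb{R}^{T_0}} E(\varepsilon_{n,\text{pre}} v - \bar\varepsilon_{n,\text{post}})^2$ is the post-on-pre autoregression vector.
The oracle estimator $\tilde\tau$ uses these in place of the empirical minimizers.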
Concentration around the oracle
Deviation from the oracle is essentially bilinear in the weight differences.
$$\hat\tau - \tilde\tau \approx (\hat\omega - \tilde\omega)^T L_{\text{control,pre}} (\hat\lambda - \tilde\lambda) \le \|\hat\omega - \tilde\omega\| \, \|L_{\text{control,pre}}(\hat\lambda - \tilde\lambda)\|$$
Cauchy-Schwarz bounds depend on prediction error and coefficient error. We characterize these using a version of the ‘slow rate’ analysis for the lasso.
that depends logarithmically on its dimension.
ridge regularization and improves with its strength ζ.
depending on the fit and dispersion (2-norm) of the limiting weights. Ridge regularization helps, as long as the limiting weights still fit the data.
λ − ˜ λ)
˜ λ
Nef = min
ω−1 , ˆ ω − ˜ ω √log T0 ζT 1/2
˜ ω
Tef = min
λ−1 .
Oracle bias
The oracle estimator’s bias is caused by changes in the predictive bias
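$$\underbrace{\big[ a_{N_1}^T L_{\text{tre,post}} - \tilde\omega^T L_{\text{con,post}} - \tilde\omega_0 \big] a_{T_1}}_{\text{counterfactual post-treatment bias of } \tilde\omega} - \underbrace{\big[ a_{N_1}^T L_{\text{tre,pre}} - \tilde\omega^T L_{\text{con,pre}} - \tilde\omega_0 \big] \tilde\lambda}_{\text{bias of } \tilde\omega \text{ over the synthetic pre-treatment period}}$$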
This change is small if either:
I’ve written this in terms of the bias of the unit weights $\tilde\omega$ above, but there is an analogous decomposition swapping the roles of $\tilde\omega$ and $\tilde\lambda$. Here $a_n = n^{-1}\mathbf{1} \in \mathbb{R}^n$.
Oracle normality
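Our oracle estimator’s error is approximately normal around the oracle bias.
$$\tilde\tau - \tau - \text{bias} \approx a_{N_1}^T \big( \varepsilon_{\text{tre,post}} a_{T_1} - \varepsilon_{\text{tre,pre}} \tilde\lambda \big) - \tilde\omega^T \big( \varepsilon_{\text{con,post}} a_{T_1} - \varepsilon_{\text{con,pre}} \tilde\lambda \big)$$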
This goes for the real estimator if its deviation from the oracle is negligible, in which case we can estimate variance by resampling units. With autocorrelated noise, variance is reduced by the inclusion of time weights, as they are predictive of the post-treatment noise.
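Resampling units can be sketched as a bootstrap over the rows of the panel. The helper names and the simulated data are hypothetical, and plain block DiD stands in for the estimator to keep the example short; in practice the weights would be re-estimated on each resampled panel.

```python
import numpy as np


def did_blocks(Y, N0, T0):
    """Block-average DiD, used here as a stand-in estimator."""
    return (Y[N0:, T0:].mean() - Y[N0:, :T0].mean()) - (
        Y[:N0, T0:].mean() - Y[:N0, :T0].mean()
    )


def bootstrap_se(Y, N0, estimator, n_boot=200, seed=0):
    """Standard error from resampling units (rows) with replacement.

    estimator(Y_boot, N0_boot) returns a scalar for a panel whose first
    N0_boot rows are controls. Draws lacking either group are skipped.
    """
    rng = np.random.default_rng(seed)
    N = Y.shape[0]
    estimates = []
    while len(estimates) < n_boot:
        idx = np.sort(rng.integers(0, N, size=N))   # sorting keeps controls first
        n_control = int(np.sum(idx < N0))
        if n_control == 0 or n_control == N:
            continue
        estimates.append(estimator(Y[idx], n_control))
    return float(np.std(estimates, ddof=1))


rng = np.random.default_rng(3)
N0, N1, T0, T1 = 20, 10, 8, 4
Y = rng.normal(size=(N0 + N1, 1)) + rng.normal(size=(N0 + N1, T0 + T1))
Y[N0:, T0:] += 1.0                                  # hypothetical effect
se = bootstrap_se(Y, N0, lambda Yb, n0: did_blocks(Yb, n0, T0))
print(se)
```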
[Figure: Minimum wage assignment. Horizontal axis: estimate (−0.15 to 0.15); vertical axis: Density; estimators: SDID, SC, DID.]
Error in simulation based on Bertrand, Duflo, and Mullainathan [2004].
Is our identification strategy a problem?
we need more ‘minimal’ assumptions.
arxiv.org/abs/1812.09970
github.com/davidahirshberg/synthdid
References
Alberto Abadie, Alexis Diamond, and Jens Hainmueller. Synthetic control methods for comparative case studies: Estimating the effect of California’s tobacco control program. Journal of the American Statistical Association, 105(490), 2010.
Marianne Bertrand, Esther Duflo, and Sendhil Mullainathan. How much should we trust differences-in-differences estimates? The Quarterly Journal of Economics, 119(1):249–275, 2004.