Heterogeneity, Endogeneity and Causal Effect Estimation
Kevin Sheppard
https://www.kevinsheppard.com
Oxford MFE
This version: March 11, 2020
March 2020
Potential Outcomes
Challenges in Effect Estimation
Experimental and Quasi-Experimental Data
◮ Randomized Controlled Experiments and ATE
◮ Imperfect Compliance and LATE
Observational Data
◮ Regression Discontinuity
◮ Difference-in-Difference
◮ Panel Models
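Observed outcome for individual or firm $i$: $Y_i$
$D_i$ is the treatment status variable for individual $i$
Outcome variable is determined by
$$Y_i = \beta_{0i} + \beta_{1i} D_i$$
$\beta_{1i}$ is a heterogeneous treatment effect for individual $i$
Also known as the potential outcomes model
Two outcomes
◮ $Y_i(1) = \beta_{0i} + \beta_{1i}$ is the outcome if treated
◮ $Y_i(0) = \beta_{0i}$ is the outcome if not treated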
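ATE is a weighted average of the effect on the treated (TOT) and the effect on the untreated (TUT)
Average Treatment Effect on the Untreated (TUT): $E[\beta_{1i} \mid D_i = 0]$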
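One way to write the three averages, as a sketch that assumes the heterogeneous effect $\beta_{1i}$ of the potential outcomes model and writes $p = \Pr(D_i = 1)$ (a symbol introduced here only for illustration):

```latex
\begin{align*}
ATE &= E[Y_i(1) - Y_i(0)] = E[\beta_{1i}] \\
TOT &= E[\beta_{1i} \mid D_i = 1] \\
TUT &= E[\beta_{1i} \mid D_i = 0] \\
ATE &= p \, TOT + (1 - p) \, TUT
\end{align*}
```

The last line is the law of iterated expectations applied to $E[\beta_{1i}]$, which is why ATE is a weighted average with weights given by the share treated.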
Should we measure ATE or TOT?
◮ TOT makes sense when treatment is non-compulsory
  Individuals who do not undertake treatment are not relevant for cost-benefit analysis
◮ ATE is more sensible for mandatory programs
  Measures the effect on both those who would like to participate and those who would not
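Estimate the regression on observed data
$$Y_i = \beta_0 + \beta_1 D_i + \epsilon_i$$
◮ $\hat{\beta}_1 \xrightarrow{p} E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0]$
Leads to selection bias
$$\hat{\beta}_1 \xrightarrow{p} \underbrace{E[Y_i(1) - Y_i(0) \mid D_i = 1]}_{TOT} + \underbrace{E[Y_i(0) \mid D_i = 1] - E[Y_i(0) \mid D_i = 0]}_{SB}$$
In terms of the regression, SB is the difference in the no-treatment outcomes for the treated and the untreated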
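The split of the naive comparison into TOT plus SB can be checked numerically. The sketch below is an illustration only: the data-generating process, the variable names and the parameter values are assumptions, chosen so that units with a high no-treatment outcome are more likely to select into treatment.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical data-generating process (illustration only)
y0 = rng.normal(1.0, 1.0, n)             # no-treatment outcome Y_i(0)
beta1 = rng.normal(-0.5, 0.5, n)         # heterogeneous effect beta_1i
y1 = y0 + beta1                          # treated outcome Y_i(1)

# Self-selection: units with a high Y_i(0) are more likely to take treatment
d = (y0 + rng.normal(0.0, 1.0, n) > 1.0).astype(int)
y = np.where(d == 1, y1, y0)             # only one potential outcome is observed

naive = y[d == 1].mean() - y[d == 0].mean()      # OLS slope on a binary regressor
tot = (y1 - y0)[d == 1].mean()                   # effect on the treated
sb = y0[d == 1].mean() - y0[d == 0].mean()       # selection bias

print(f"naive={naive:.3f}  TOT={tot:.3f}  SB={sb:.3f}")
```

In this design TOT is negative while SB is positive, so the naive difference has the opposite sign to the effect on the treated.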
Fundamental problem: Cannot see counterfactual
No data on $Y_i(1)$ when $D_i = 0$ and $Y_i(0)$ when $D_i = 1$
TOT measures the effect conditional on receiving treatment
◮ Missing counterfactual: $E[Y_i(0) \mid D_i = 1]$
Example: Financial Stress and Payday loans
Outcome is a measure of financial distress: 90-days delinquent on a debt
Treatment is taking out a payday loan
TOT: Difference in delinquency if loan taken or not, given loan wanted
SB: Difference in outcome if loan not taken for those who want a loan and those who do not
◮ Plausible TOT is negative but SB is positive
◮ Positive SB if those who want a loan are more likely to be delinquent even without one
◮ Observed effect could have either sign
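Randomization removes selection bias
Well executed Randomized Controlled Trials are the gold standard for estimating causal effects
An RCT ensures that treatment status is independent of the potential outcomes, $D_i \perp \{Y_i(0), Y_i(1)\}$
◮ $W \perp Z$: knowledge of $W$ provides no information about $Z$
Randomly give loans only to those seeking them
◮ Creates group with $Y_i(0)$ as if $D_i = 1$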
Track outcomes of both groups
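A small simulation can illustrate why tracking both groups is enough once assignment is random. This is a sketch under assumed values (sample size, effect distribution and variable names are all hypothetical): treatment is a coin flip, so the no-treatment outcomes of the two groups match on average and the difference in means is close to the ATE.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

y0 = rng.normal(1.0, 1.0, n)             # Y_i(0), hypothetical
beta1 = rng.normal(-0.5, 0.5, n)         # beta_1i, hypothetical
y1 = y0 + beta1                          # Y_i(1)

d = rng.integers(0, 2, n)                # coin-flip assignment, independent of (Y_i(0), Y_i(1))
y = np.where(d == 1, y1, y0)             # observed outcome

diff_means = y[d == 1].mean() - y[d == 0].mean()
ate = beta1.mean()
sb = y0[d == 1].mean() - y0[d == 0].mean()   # close to zero under randomization

print(f"difference in means={diff_means:.3f}  ATE={ate:.3f}  SB={sb:.3f}")
```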
Internal Validity: are the results valid for the sample used?
◮ Is the assignment actually random?
◮ Are participants complying?
◮ Are there spill-overs of non-rival treatments to non-treated?
◮ Hawthorne Effect: studying a subject changes their behavior
External Validity: do the results generalize to a broader sample?
◮ Is the RCT sample representative of the target population?
◮ Are there other key personnel that are essential for success?
Previous result requires perfect compliance
◮ Treated if offered, not-treated if not offered
When treatment is not random, or compliance is not perfect, simple comparisons of treated and untreated outcomes suffer from selection bias
Possible to use an instrument to recover a meaningful measure of the treatment effect
Measure is local in the sense that it measures the effect for a particular group: those whose treatment status responds to the instrument
Notation
◮ $D_i$ is treatment status
◮ $Z_i$ is treatment assignment (offer to treat)
Compliance
◮ Perfect if $D_i = Z_i$ for all $i$
◮ Imperfect if $D_i \neq Z_i$ for some $i$
$Z_i$ may be random even if $D_i$ is not
◮ Treatment assignment is made by lottery due to limited capacity ($Z_i$)
◮ Treatment status conditional on offer depends on expected benefits ($D_i$)
Leads to two-equation system
$$Y_i = \beta_{0i} + \beta_{1i} D_i$$
$$D_i = \pi_{0i} + \pi_{1i} Z_i$$
Causal chain $Z_i \to D_i \to Y_i$
Treatment equation measures potential treatment status
◮ $D_i(0) = \pi_{0i}$ is status when not assigned
◮ $D_i(1) = \pi_{0i} + \pi_{1i}$ is status when assigned
◮ Both $D_i(0)$ and $D_i(1)$ may be 0 or 1
Treatment responsiveness $\pi_{1i}$ is heterogeneous like treatment effect $\beta_{1i}$
Often described as "as if randomly assigned"
Note that the instrument is independent of the potential treatment status
◮ $Z_i$ does not affect the probability that either occurs ($\pi_{\bullet i}$)
◮ $Z_i$ does not affect the outcomes if treatment is taken or not ($\beta_{\bullet i}$)
Is this a reasonable assumption?
◮ Often plausible when $Z_i$ is assigned using randomization (lottery)
◮ Sometimes plausible for $Z_i$ taken from observational data
Violations of the exclusion restriction mean that $Z_i$ affects $Y_i$ through more than one channel
Classic example is when $Z_i$ directly affects both $Y_i$ and $D_i$
In many cases, $Z_i$ affects $D_i$ and another variable $X_i$ which in turn affects $Y_i$
Suppose selection for a randomly assigned government funding program is used as the instrument
If selection also increases the probability that a firm receives series B funding, selection affects the outcome through a second channel and the restriction fails
Exclusion restriction ensures that $Z$ does not affect the potential outcome except through treatment
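The 2SLS estimator is obtained by
$$\hat{\beta}_1^{2SLS} = \frac{\widehat{\mathrm{Cov}}(Y_i, Z_i)}{\widehat{\mathrm{Cov}}(D_i, Z_i)}$$
In large samples
$$\hat{\beta}_1^{2SLS} \xrightarrow{p} \frac{E[\beta_{1i}\pi_{1i}]}{E[\pi_{1i}]} = LATE$$
LATE is a weighted average of treatment effects
Weights are determined by responsiveness to treatment assignment
◮ Also holds if either $D_i$ or $Z_i$ is not binary
If effects are not heterogeneous ($\beta_{1i} = \beta_1$ or $\pi_{1i} = \pi_1$) then LATE = ATE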
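The large-sample result can be illustrated with simulated data. The sketch below is hypothetical throughout: the population shares, the effect sizes by type and the variable names are assumptions, and there are no units with $\pi_{1i} = -1$. Units with $\pi_{1i} = 1$ have a smaller average effect than the rest, so LATE differs from ATE.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400_000

# Hypothetical shares: 50% respond to assignment, 30% never treated, 20% always treated
kind = rng.choice(["responds", "never", "always"], size=n, p=[0.5, 0.3, 0.2])
pi0 = (kind == "always").astype(float)       # D_i(0)
pi1 = (kind == "responds").astype(float)     # D_i(1) - D_i(0)

# Treatment effects differ by type (illustrative values)
beta1 = np.select([kind == "responds", kind == "never"], [1.0, 3.0], default=2.0)
beta1 = beta1 + rng.normal(0.0, 0.5, n)
beta0 = rng.normal(0.0, 1.0, n)

z = rng.integers(0, 2, n)                    # random assignment (lottery)
d = pi0 + pi1 * z                            # treatment equation
y = beta0 + beta1 * d                        # outcome equation

# Just-identified 2SLS: ratio of sample covariances
iv = np.cov(y, z)[0, 1] / np.cov(d, z)[0, 1]
late = (beta1 * pi1).mean() / pi1.mean()     # E[beta_1i pi_1i] / E[pi_1i]
ate = beta1.mean()

print(f"IV={iv:.3f}  LATE={late:.3f}  ATE={ate:.3f}")
```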
Useful to describe structure implied by $D_i$ and $Z_i$
Four types of program participants
◮ Compliers: $D_i = Z_i$ ($\pi_{0i} = 0$, $\pi_{1i} = 1$)
◮ Always-takers: $D_i = 1$ for any $Z_i$ ($\pi_{0i} = 1$, $\pi_{1i} = 0$)
◮ Never-takers: $D_i = 0$ for any $Z_i$ ($\pi_{0i} = \pi_{1i} = 0$)
◮ Defiers: $D_i = 1 - Z_i$ ($\pi_{0i} = 1$, $\pi_{1i} = -1$)
Compliers are the ideal candidates and ultimately what we can measure
Defiers invalidate measurement using the instrument
LATE is determined only by compliers and defiers because $\pi_{1i} = 0$ for always-takers and never-takers
Common to report the effect on the intention-to-treat
$$ITT = E[Y_i \mid Z_i = 1] - E[Y_i \mid Z_i = 0]$$
Difference in outcomes conditional on only the instrument, which ignores whether treatment was actually taken
With perfect compliance ITT = LATE
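The relationship between ITT and LATE with a binary instrument can be sketched as a ratio: the ITT divided by the difference in treatment rates between the assigned and the unassigned. The helper name `itt_and_late` and the tiny sample below are made up for illustration.

```python
import numpy as np

def itt_and_late(y, d, z):
    """Return (ITT, ITT / first stage) for outcome y, treatment d and binary assignment z."""
    y, d, z = (np.asarray(a, dtype=float) for a in (y, d, z))
    itt = y[z == 1].mean() - y[z == 0].mean()
    first_stage = d[z == 1].mean() - d[z == 0].mean()   # share whose status responds to z
    return itt, itt / first_stage

# Made-up sample: only two of the four assigned units take the treatment
z = np.array([1, 1, 1, 1, 0, 0, 0, 0])
d = np.array([1, 1, 0, 0, 0, 0, 0, 0])
y = np.array([3.0, 3.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0])

itt, late = itt_and_late(y, d, z)
print(itt, late)
```

When `d` equals `z` for every unit the first stage is 1 and the two numbers coincide.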
2008 Medicaid expansion in US state of Oregon
Used a lottery to choose participants from a waiting list
Constructed a control group from non-winners
Not everyone selected participated in the program (imperfect compliance)
Non-selected prohibited from participation
Finkelstein, A., Taubman, S., Wright, B., Bernstein, M., Gruber, J., Newhouse, J.P., Allen, H., Baicker, K. and Oregon Health Study Group, 2012. The Oregon Health Insurance Experiment: Evidence from the first year. The Quarterly Journal of Economics, 127(3), pp.1057-1106.
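Estimating Intent-to-Treat (ITT)
$$y_{ihj} = \beta_0 + \beta_1 LOTTERY_h + X_{ih}\beta_2 + V_{ih}\beta_3 + \epsilon_{ihj}$$
◮ $i$: individual, $h$: household, $j$: domain of variable
◮ $LOTTERY_h$ indicates household was selected by the lottery ($Z_i = 1$)
◮ $X_{ih}$ are required controls and $V_{ih}$ are optional controls
Estimating LATE
$$y_{ihj} = \pi_0 + \pi_1 INSURANCE_{ih} + X_{ih}\pi_2 + V_{ih}\pi_3 + \nu_{ihj}$$
◮ $INSURANCE_{ih}$ is a measure of insurance coverage ($D_i$)
◮ Use 2SLS with $LOTTERY_h$ as the instrument for $INSURANCE_{ih}$
◮ Insurance is "Ever on Medicaid"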
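Repeated cross-sections across multiple periods
Compare outcomes of a treated group and a control group
Control chosen to be similar except for treatment
Basic model
$$Y_{igt} = \alpha_g + \gamma_t + \beta D_{gt} + \epsilon_{igt}$$
◮ $\alpha_g$ is a group effect, $\gamma_t$ is a time effect and $D_{gt}$ indicates treatment
Assume two periods, treatment only in second
Treated individuals or firms have $D_{gt} = 1$ only in the second period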
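Two groups, A (treated) and B (untreated)
Construct averages $\bar{Y}_{A1}$, $\bar{Y}_{A2}$, $\bar{Y}_{B1}$ and $\bar{Y}_{B2}$
Difference across time within a group removes own (group) effects
$$\bar{Y}_{A2} - \bar{Y}_{A1}, \quad \bar{Y}_{B2} - \bar{Y}_{B1}$$
Difference across groups within a period removes time effects
$$\bar{Y}_{A2} - \bar{Y}_{B2}, \quad \bar{Y}_{A1} - \bar{Y}_{B1}$$
Solution is to difference twice
$$(\bar{Y}_{A2} - \bar{Y}_{A1}) - (\bar{Y}_{B2} - \bar{Y}_{B1})$$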
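The double difference is simple arithmetic on four group averages. The numbers below are invented purely to show the steps.

```python
# Hypothetical group averages by (group, period); values are illustrative only
ybar = {("A", 1): 10.0, ("A", 2): 15.0, ("B", 1): 6.0, ("B", 2): 8.0}

change_a = ybar["A", 2] - ybar["A", 1]   # removes A's group effect; leaves time change plus treatment
change_b = ybar["B", 2] - ybar["B", 1]   # removes B's group effect; leaves time change only
did = change_a - change_b                # difference-in-difference estimate

# Counterfactual for A in period 2: A's first-period average plus B's change
counterfactual_a2 = ybar["A", 1] + change_b

print(did, counterfactual_a2)
```

Group A rises by 5 and group B by 2, so the estimate is 3 and the implied untreated level for A in the second period is 12.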
Key assumption: parallel trends
$$E[Y_{A2} \mid D = 0] - E[Y_{A1}] = E[Y_{B2} \mid D = 0] - E[Y_{B1}]$$
Uses lagged average of A to remove group-specific effects
Generates counter-factual using group B
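Counter-factual can be interpreted as parallel line
[Figure: $E[Y_{A1}]$ and $E[Y_{B1}]$ at $t = 1$; $E[Y_{A2} \mid D = 1]$, the counterfactual $E[Y_{A2} \mid D = 0]$ and $E[Y_{B2} \mid D = 0]$ at $t = 2$. The effect is the gap between $E[Y_{A2} \mid D = 1]$ and $E[Y_{A2} \mid D = 0]$.]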
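Estimated as a regression
$$Y_{it} = \beta_0 + \beta_1 I_{[i \in A]} + \beta_2 I_{[t = 2]} + \beta_3 I_{[i \in A]} I_{[t = 2]} + \epsilon_{it}$$
◮ Extends trivially to include other controls
Uses dummy variables
◮ $I_{[i \in A]}$ is 1 for the treated group, $I_{[t = 2]}$ is 1 in the second period, and $\beta_3$ measures the effect
◮ Extends to multiple groups and time periods
$$Y_{it} = \sum_{g=1}^{G} \alpha_g I_{[i \in g]} + \sum_{s=1}^{T} \gamma_s I_{[t = s]} + \beta D_{it} + \epsilon_{it}$$
Main issue: Violation of parallel trends assumption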
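With two groups and two periods, the coefficient on the interaction of the group and period dummies reproduces the double difference of cell means. The sketch below uses simulated data; the sample size, coefficients and variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4_000
treated = rng.integers(0, 2, n)          # 1 if the unit is in group A
after = rng.integers(0, 2, n)            # 1 if observed in the second period
effect = 2.0                             # hypothetical true effect
y = 1.0 + 0.5 * treated + 1.5 * after + effect * treated * after + rng.normal(0, 1, n)

# OLS with a constant, two dummies and their interaction
X = np.column_stack([np.ones(n), treated, after, treated * after])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

def cell_mean(g, t):
    """Average outcome for group indicator g in period indicator t."""
    return y[(treated == g) & (after == t)].mean()

did = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))
print(f"interaction coefficient={coef[3]:.3f}  double difference={did:.3f}")
```

The two numbers agree up to floating-point error because the regression is saturated in the four cells.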
[Figure: treated and untreated outcomes over $t = 1, \ldots, 5$, marking the intervention, the false counterfactual and the resulting false effect.]
Examine how uninsured deposits are affected by implicit Too Big To Fail guarantees
Variation introduced by changes in the limits of the amounts insured
Main model
◮ Bank $b$, Year $t$
◮ $k$ indicates the range of deposits in DKK 50,000 bins (e.g., 0–50K, ...)
◮ Interest in the difference between accounts that remain insured and those that do not
Key variable is $Above_k \times After_t$ which would have a 0 coefficient if no effect
Iyer, R., Lærkholm Jensen, T., Johannesen, N., & Sheridan, A. (2019). The Distortive Effects of Too Big To Fail: Evidence from the Danish Market for Retail Deposits.
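Regression discontinuity exploits discrete jumps in treatment as a function of a variable $X_i$
Individuals with $X_i < X$ are untreated
Individuals with $X_i \geq X$ are treated
Theoretical framework examines the difference locally around the threshold $X$
$$\lim_{x \downarrow X} E[Y_i \mid X_i = x] - \lim_{x \uparrow X} E[Y_i \mid X_i = x]$$
Intuition is that individuals are homogeneous local to $X$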
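In practice often included as part of a model
$$Y_i = \beta_0 + \beta_1 I_{[X_i \geq X]} + \beta_2 X_i + \epsilon_i$$
Can use more sophisticated models
Causality from model requires treatment indicator $I_{[X_i \geq X]}$ to be uncorrelated with $\epsilon_i$
Also requires functional form to be correct so that $E[\epsilon_i \mid X_i] = 0$
◮ Rules out neglected nonlinearities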
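A regression of the outcome on a threshold indicator and the running variable can be sketched on simulated data. Everything here is an assumption made for illustration: the cutoff at 0, the linear conditional mean, the jump of 0.5 and the helper name `rd_estimate`. Restricting the sample to a window around the cutoff gives a more local estimate at the cost of fewer observations.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5_000
threshold = 0.0                                  # hypothetical cutoff
x = rng.uniform(-1.0, 1.0, n)                    # running variable
treat = (x >= threshold).astype(float)           # indicator for x at or above the cutoff
y = 1.0 + 0.8 * x + 0.5 * treat + rng.normal(0, 0.3, n)   # true jump of 0.5

def rd_estimate(x, y, treat, threshold, bandwidth):
    """OLS of y on a constant, the treatment indicator and x, using only
    observations within `bandwidth` of the cutoff. Returns the jump estimate."""
    keep = np.abs(x - threshold) <= bandwidth
    X = np.column_stack([np.ones(keep.sum()), treat[keep], x[keep] - threshold])
    coef, *_ = np.linalg.lstsq(X, y[keep], rcond=None)
    return coef[1]

jump_full = rd_estimate(x, y, treat, threshold, bandwidth=1.0)
jump_local = rd_estimate(x, y, treat, threshold, bandwidth=0.25)
print(f"full sample={jump_full:.3f}  local window={jump_local:.3f}")
```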
[Figure: $E[Y \mid X]$ against $X$ showing the treatment threshold, the fitted regression and the estimated effect (discontinuity).]
[Figure: further panels of $E[Y \mid X]$ against $X$, each showing the fitted regression and the estimated effect.]
Investigate the effect of CSR proposals that just pass or just fail board votes
Key assumption: the difference between just passing and failing is as if random
◮ Close failures and close passes are identical aside from the vote
Regress abnormal returns on a dummy for passing
◮ Returns computed using Carhart 4 factor model (FF3 + Momentum)
RDD estimate is based on a small window near a tied vote
Also considers a full model which is piece-wise polynomial
Flammer, C. (2015). Does corporate social responsibility lead to superior financial performance? A regression discontinuity approach. Management Science, 61(11), 2549-2568.
[Figure: abnormal return on the day of the vote (vertical axis, –0.010 to 0.020) against victory margin in 2% bins (horizontal axis, –50 to 50). Abnormal returns are computed using the four-factor model of Carhart (1997).]
Panel data is double indexed: $Y_{it}$, $X_{it}$
◮ $i$ is the entity (or unit): Traders, Firms, Borrowers, ...
◮ $t$ is the time period
Panels track entities over time
$N$ entities, $T$ time periods
◮ $N$ is assumed to be large, $T$ is usually small
◮ Asymptotics assume $N \to \infty$
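Panel data can be used to estimate pooled OLS models
$$Y_{it} = X_{it}\beta + \epsilon_{it}$$
◮ Ignores the panel structure
Panel structure allows us to model unobserved heterogeneity
$$Y_{it} = X_{it}\beta + W_i\gamma + \epsilon_{it}$$
$W_i$ is a vector of entity-specific characteristics
Key: Assumed to be time invariant
Estimating pooled OLS results in biased coefficients if $W_i$ is correlated with $X_{it}$
In large samples,
$$\hat{\beta} \xrightarrow{p} \beta + \Sigma_{XX}^{-1}\Sigma_{XW}\gamma$$
◮ $\Sigma_{XX} = E[X_{it}'X_{it}]$ and $\Sigma_{XW} = E[X_{it}'W_i]$
◮ This is omitted variable bias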
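The omitted variable bias of pooled OLS can be seen in a short simulation. The design is hypothetical: one regressor, a scalar entity effect standing in for $W_i\gamma$, and unit variances, so the population slope of pooled OLS is $1 + 1/2 = 1.5$ rather than the true value of 1.

```python
import numpy as np

rng = np.random.default_rng(5)
n_entities, n_periods = 500, 4
beta = 1.0                                            # true coefficient, hypothetical

omega = rng.normal(0, 1, (n_entities, 1))             # entity effect, unobserved
x = omega + rng.normal(0, 1, (n_entities, n_periods)) # regressor correlated with the entity effect
y = beta * x + omega + rng.normal(0, 1, (n_entities, n_periods))

# Pooled OLS ignores the panel structure: stack all observations and regress y on x
X = np.column_stack([np.ones(x.size), x.ravel()])
pooled = np.linalg.lstsq(X, y.ravel(), rcond=None)[0][1]
print(f"pooled OLS slope={pooled:.3f}  true beta={beta}")
```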
[Figure: observations for Entity 1, Entity 2 and Entity 3 with the pooled OLS fit.]
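Could collect data on $W_i$ if available, and include in the model
In many plausible scenarios it is not observable
◮ Individual ability or intrinsic motivation
◮ Firm management culture
Fixed Effect estimator allows $\beta$ to be estimated when $W_i$ is not known
Note that $W_i\gamma$ is a constant for entity $i$
$$Y_{it} = X_{it}\beta + \omega_i + \epsilon_{it}, \quad \omega_i = W_i\gamma$$
Demean entity-by-entity
$$Y_{it} - \bar{Y}_i = (X_{it} - \bar{X}_i)\beta + (\omega_i - \bar{\omega}_i) + (\epsilon_{it} - \bar{\epsilon}_i)$$
$\omega_i$ is time-invariant so $\omega_i - \bar{\omega}_i = 0$
Note that FE models cannot estimate time-invariant effects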
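Model is equivalent to including entity dummies
$$Y_{it} = X_{it}\beta + \sum_{j=1}^{N} \omega_j I_{[i = j]} + \epsilon_{it}$$
When $T = 2$ identical to first-difference estimator
$$Y_{i2} - Y_{i1} = (X_{i2} - X_{i1})\beta + (\epsilon_{i2} - \epsilon_{i1})$$
Inefficient to use first difference estimator for $T \geq 3$ when the errors are serially uncorrelated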
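The equivalences on this slide can be verified numerically. The sketch below simulates a hypothetical panel with $T = 2$ and one regressor (all names and values are assumptions), then computes the slope three ways: OLS on demeaned data, OLS with one dummy per entity, and OLS on first differences without an intercept.

```python
import numpy as np

rng = np.random.default_rng(6)
n_entities, n_periods = 200, 2
omega = rng.normal(0, 1, (n_entities, 1))             # entity effects
x = omega + rng.normal(0, 1, (n_entities, n_periods))
y = 1.0 * x + omega + rng.normal(0, 1, (n_entities, n_periods))

# Within estimator: demean entity-by-entity, then OLS without an intercept
x_dm = x - x.mean(axis=1, keepdims=True)
y_dm = y - y.mean(axis=1, keepdims=True)
fe = (x_dm * y_dm).sum() / (x_dm ** 2).sum()

# Dummy-variable regression: y on x and one dummy per entity
dummies = np.kron(np.eye(n_entities), np.ones((n_periods, 1)))
X = np.column_stack([x.ravel(), dummies])
lsdv = np.linalg.lstsq(X, y.ravel(), rcond=None)[0][0]

# First differences (valid comparison because T = 2)
dx, dy = x[:, 1] - x[:, 0], y[:, 1] - y[:, 0]
fd = (dx * dy).sum() / (dx ** 2).sum()

print(f"within={fe:.4f}  dummies={lsdv:.4f}  first difference={fd:.4f}")
```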
[Figure: demeaned observations for Entity 1, Entity 2 and Entity 3 with the fixed-effect regression fit.]
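$\hat{\beta}$ is estimated using OLS on entity-wise demeaned data
$$\hat{\beta} = (\tilde{X}'\tilde{X})^{-1}\tilde{X}'\tilde{Y}$$
◮ $\tilde{X}_{it} = X_{it} - \bar{X}_i$ and $\tilde{Y}_{it} = Y_{it} - \bar{Y}_i$
◮ Numerically identical to the Least Squares Dummy Variable (LSDV) estimator
◮ Intercept is not meaningful in FE models when reported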
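Robust inference requires a clustered variance covariance estimator
◮ White Covariance
$$\Sigma_{XX}^{-1} S \Sigma_{XX}^{-1}$$
$$\hat{\Sigma}_{XX} = (NT)^{-1} \sum_{i=1}^{N}\sum_{t=1}^{T} \tilde{X}_{it}'\tilde{X}_{it}, \quad \hat{S} = (NT)^{-1} \sum_{i=1}^{N}\sum_{t=1}^{T} \hat{\epsilon}_{it}^2 \tilde{X}_{it}'\tilde{X}_{it}$$
◮ Clustered (Rogers) covariance
$$\hat{S}_C = (NT)^{-1} \sum_{i=1}^{N} \hat{\xi}_i'\hat{\xi}_i, \quad \hat{\xi}_i = \sum_{t=1}^{T} \hat{\epsilon}_{it}\tilde{X}_{it}$$
Replace $S$ with an estimator that allows dependence within entity
$\xi'\xi$ contains all squares and cross-products
Imposes no restrictions on the dependence within an entity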
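The two sandwich estimators differ only in the middle matrix, which the following sketch makes explicit. The function name, the array layout (entity, period, regressor) and the $(NT)^{-1}$ scaling are assumptions for illustration, and no small-sample correction is applied.

```python
import numpy as np

def white_and_clustered(x_dm, resid):
    """x_dm: (N, T, k) demeaned regressors; resid: (N, T) residuals.
    Returns the White and the entity-clustered sandwich covariance matrices."""
    n, t, k = x_dm.shape
    flat = x_dm.reshape(n * t, k)
    sigma_xx = flat.T @ flat / (n * t)
    scores = x_dm * resid[:, :, None]                 # residual times regressor, shape (N, T, k)
    s_white = np.einsum("ntj,ntk->jk", scores, scores) / (n * t)   # squares only
    xi = scores.sum(axis=1)                           # sum over t within each entity, shape (N, k)
    s_cluster = xi.T @ xi / (n * t)                   # squares and within-entity cross-products
    inv = np.linalg.inv(sigma_xx)
    return inv @ s_white @ inv, inv @ s_cluster @ inv

# Hypothetical panel with two regressors
rng = np.random.default_rng(7)
n, t = 100, 4
x = rng.normal(size=(n, t, 2))
y = x @ np.array([1.0, -1.0]) + rng.normal(size=(n, t))
x_dm = x - x.mean(axis=1, keepdims=True)
y_dm = y - y.mean(axis=1, keepdims=True)
beta_hat = np.linalg.lstsq(x_dm.reshape(-1, 2), y_dm.ravel(), rcond=None)[0]
resid = y_dm - x_dm @ beta_hat

white, clustered = white_and_clustered(x_dm, resid)
print(white, clustered, sep="\n")
```

With a single observation per entity the within-entity sum has one term, so the two estimators coincide.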
Panel models often include time effects
$$Y_{it} = X_{it}\beta + \omega_i + \gamma_t + \epsilon_{it}$$
$\gamma_t$ is a shock that affects all observations in period $t$
◮ Commonly used to model common aggregate movements
Estimated regression uses data demeaned by entity and time period
When $N$ is large and $T$ is small, time effects are consistently estimated
◮ Does not need special treatment for inference
◮ Identical to including dummies for each time period
In general, fixed effects can be used to remove constant effects in any group
◮ Industry
◮ City, County, State, or Region
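Demeaning by entity and time period can be sketched for a balanced panel, where subtracting the entity mean and the period mean and adding back the overall mean removes both sets of effects. The helper name `two_way_demean` and the simulated design are assumptions; unbalanced panels need a different approach.

```python
import numpy as np

def two_way_demean(a):
    """Remove entity (row) and time (column) means from a balanced (N, T) panel."""
    return a - a.mean(axis=1, keepdims=True) - a.mean(axis=0, keepdims=True) + a.mean()

rng = np.random.default_rng(8)
n_entities, n_periods = 300, 5
omega = rng.normal(0, 1, (n_entities, 1))             # entity effects
gamma = rng.normal(0, 1, (1, n_periods))              # time effects
x = omega + gamma + rng.normal(0, 1, (n_entities, n_periods))
y = 1.0 * x + omega + gamma + rng.normal(0, 1, (n_entities, n_periods))

x_dm, y_dm = two_way_demean(x), two_way_demean(y)
beta_hat = (x_dm * y_dm).sum() / (x_dm ** 2).sum()    # OLS slope on the demeaned data
print(f"beta_hat={beta_hat:.3f}")
```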
Examine how test pass rate is affected by teacher financial distress
$$PassRate_{cgt} = \beta\,Bankruptcy_{cgt} + X_{cgt}\gamma + Z_{cgt}\theta + \delta_{dt} + \eta_{gt} + \phi_{cg} + \epsilon_{cgt}$$
◮ Subscripts: $c$: campus, $g$: grade of student, $t$: year
◮ $Bankruptcy_{cgt}$ is teacher bankruptcy
◮ $X_{cgt}$ are average teacher characteristics
◮ $Z_{cgt}$ are student demographic characteristics
Maturana, G., & Nickerson, J. (2019). Real effects of workers' financial distress: Evidence from teacher spillovers. Journal of Financial Economics.
Fixed Effects
◮ $\delta_{dt}$ - District-Year: Control for local economic conditions
◮ $\eta_{gt}$ - Grade-Year: Control for changes to the test across grade and time
◮ $\phi_{cg}$ - Campus-Grade: Control for heterogeneity across different campuses and grades
Estimates are computed as deviations from all three FE
Standard errors are clustered by campus-grade and campus-year
◮ CG allows arbitrary correlation within all students in a single grade and campus
◮ CY allows arbitrary correlation within all students in a single campus and year