SLIDE 1 Randomized Experiments
- The goal of randomized experiments is to identify…
- The causal effect!
- The advantage of causal predictors, as described by statisticians:
“The advantage of causal predictors compared with non-causal predictors is that their influence on the target variable remains invariant under different changes of the environment.” (Peters, Bühlmann and Meinshausen 2016, Journal of the Royal Statistical Society)
- Correlation is not causation!
SLIDE 2
SLIDE 3
SLIDE 4 Randomized Experiments
- The gold standard to estimate a causal effect is a
randomized experiment.
- The validity of a randomized experiment depends on:
- 1. Randomization.
- 2. A well-constructed control group.
SLIDE 5
SLIDE 6 1) What to randomize on:
1. Randomize eligibility
2. Randomize after acceptance into the program
3. Randomize incentives for take-up
Randomize after acceptance into the program:
– R=1 if randomized in (treatment group)
– R=0 if randomized out (control group)
– D denotes whether someone applies to the program and is subject to randomization [here D=1 for all people who are in the randomization]
– Random assignment implies:
- For treatment group: E(Y1|X, D=1, R=1) = E(Y1|X, D=1)
- For control group: E(Y0|X, D=1, R=0) = E(Y0|X, D=1)
The experiment gives TTE = E(Y1–Y0|X, D=1)
What to Take into Account when Conducting a Randomized Experiment?
SLIDE 7 2) Power calculations
- Def: power of the design is the probability that, for a given effect
size and statistical significance level, we will be able to reject the hypothesis of zero effect.
- Design choices that affect the “power” of an experiment:
– Sample size
– Minimum size of the effect that the researcher wants to be able to detect
– Multiple treatment groups
– Partial compliance and drop-out
– Control variables (important to know how much they absorb of the residual variance)
- Standard software exists for the single-site case (the “power” command in Stata)
- Multi-site power analyses get complicated
– Need to know the impact variation and correlations across sites
What to Take into Account when Conducting a Randomized Experiment?
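The single-site calculation can be sketched in a few lines. This is a minimal normal-approximation sample-size formula for a two-arm trial (the kind of computation Stata’s “power” command automates); the function name and default values here are illustrative, not from the slides:

```python
import math
from statistics import NormalDist

def sample_size_per_arm(effect, sd, alpha=0.05, power=0.80):
    """Approximate n per arm for a two-sided, two-sample test of means
    (normal approximation): n = 2 * ((z_{1-alpha/2} + z_power) * sd / effect)^2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    return math.ceil(2 * ((z_alpha + z_power) * sd / effect) ** 2)

# A minimum detectable effect of 0.2 standard deviations needs roughly 393 subjects per arm
print(sample_size_per_arm(effect=0.2, sd=1.0))  # -> 393
```

Note how the required n grows with the square of 1/effect: halving the minimum detectable effect quadruples the sample size, which is why the design choices listed above matter so much.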
SLIDE 8 3) Choosing the sites in multi-site experiments
- External validity: choose sites at random
- Realistic impacts: choose sites that are representative
- Efficacy: choose sites that will best implement the
treatment
- Avoid contamination: choose sites with little or no
contact of any sort
What to Take into Account when Conducting a Randomized Experiment?
SLIDE 9 Examples of Randomized Experiments
- Large-scale experiments, e.g. in the US/Canada:
– US National JTPA (Job Training Partnership Act) Study, Tennessee class size experiment (STAR)
- More recently, randomized experiments in developing
countries:
– Small experiments addressing very specific questions, for example microfinance experiments by Dean Karlan, education experiments (e.g. schooling inputs) by Michael Kremer and Esther Duflo, etc.
– Example of a large-scale and very successful conditional cash transfer program: Progresa/Oportunidades in Mexico (1997–2003)
SLIDE 10 Example: the STAR Experiment (Stock and Watson Ch. 13)
Tennessee Project STAR (Student-Teacher Achievement Ratio): 4-year US study with an overall budget of $12 million. 79 Tennessee public schools, for a single cohort of students in kindergarten through third grade in the years 1985-89. Upon entering the school system a student was randomly assigned to one of three groups:
- Regular class (22–25 students).
- Regular class + full-time teacher’s aide.
- Small class (13–17 students).
Regular classes’ students were re-randomized after the first year between regular and regular + aide classes.
Y = Stanford achievement test scores.
SLIDE 11 “Natural” (or Quasi-) Experiments
A quasi-experiment or natural experiment: “nature” provides random events that can be used as a source of exogenous variation. Treatment (D) is “as if” randomly assigned.
Example: Effect of changes in minimum wage on employment. D = change in minimum wage law in some States (it changes only in some States, thus State is “as if” randomly assigned).
The natural random event operates as an instrumental variable:
- Relevance: it is strongly correlated with the treatment D (so much that it defines the treatment!).
- Exogeneity: it does not affect the outcome Y other than via the treatment D.
SLIDE 12
“Natural” (or Quasi-) Experiments
The idea of quasi-experiments follows that of “real” randomized experiments: find an exogenous source of variation (i.e. a variable that affects participation but not the outcome directly).
It is important to understand the source of variation that helps to identify the treatment effect.
SLIDE 13
“Natural” (or Quasi-) Experiments
Disadvantage: nature provides only a small number of random events…
Advantage: when nature does provide random events, they can usefully be exploited.
Example: Card, D. and Krueger, A. (1994) “Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania”, American Economic Review, Vol. 84, No. 4, pp. 772-793.
SLIDE 14 Regression Analysis of Experiments for Differences Estimator
- In an ideal randomized controlled experiment the
treatment D is randomly assigned: Y=a+b*D+u (1)
- If D is randomly assigned, then u and D are independently
distributed, so E(u|D)=0 (conditional mean independence).
dE(Y|D)/dD = b = average causal effect of D on Y.
OLS of (1) gives an unbiased estimate of the causal effect of D on Y.
- When the treatment is binary, the causal effect b is the
difference in mean outcomes in the treatment vs the control group. This difference in means is the differences estimator.
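A minimal simulation (all numbers invented) illustrating the point above: with a binary randomized treatment, the OLS slope of Y on D equals the simple difference in mean outcomes:

```python
import random
from statistics import mean

random.seed(42)

# Hypothetical RCT: true causal effect b = 1.5, intercept a = 2.0
n = 20000
D = [random.random() < 0.5 for _ in range(n)]        # random assignment
Y = [2.0 + 1.5 * d + random.gauss(0, 1) for d in D]  # outcome

# Differences estimator: difference in mean outcomes, treated vs control
diff = mean(y for y, d in zip(Y, D) if d) - mean(y for y, d in zip(Y, D) if not d)

# OLS slope of Y on D: b = cov(D, Y) / var(D) -- algebraically identical to diff
Dbar, Ybar = mean(D), mean(Y)
b_ols = sum((d - Dbar) * (y - Ybar) for d, y in zip(D, Y)) / sum((d - Dbar) ** 2 for d in D)

print(round(diff, 2), round(b_ols, 2))  # both close to the true effect 1.5
```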
SLIDE 15 Regression Analysis of Experiments for Differences Estimator
We can add covariates X to the model: Y=a+b*D+c*X+u (2) Advantages of adding the covariates X:
1. Check if randomization worked: if D is randomly assigned, the OLS estimates of b in model (1) and (2) (that is with and without the covariates X) should be similar – if they aren’t, this suggests that D was not randomly assigned
- NOTE: to check directly for randomization, we can regress the treatment indicator, D, on the covariates X, and do an F-test.
2. Increases efficiency: smaller standard errors.
3. Adjust for conditional randomization (apply conditional randomization if interested in treatment effects for different groups; for example, schools’ effects if randomization was within but not across schools).
SLIDE 16 Problems with Randomized Experiments
- Randomization per se does not assure that the treatment
and the control group are perfectly comparable.
– In any given RCT, nothing ensures that other causal factors are balanced across the groups at the point of randomization (Deaton and Cartwright 2017).
- Randomization per se only means that, on average, if
several experiments are repeated, the estimated effect of the treatment is the true effect.
– Unbiasedness says that, if we were to repeat the trial many times, we would be right on average. Yet we are almost never in such a situation, and with only one trial (as is virtually always the case) unbiasedness does nothing to prevent our single estimate from being very far away from the truth (Deaton and Cartwright 2017).
SLIDE 17 Solvable Problems with Randomized Experiments
- 1. Drop-out of treatment: some subjects in the treatment
group may drop out before completing the program.
- 2. Contamination bias: some subjects in the control group
get treatment.
SLIDE 18 Two Solutions to Drop-out of Treatment and Contamination Bias
- 1. Define treatment as “intent-to-treat” or “offer of
treatment”: focus on those who were invited to be treated, whether or not they actually agreed to be treated.
- 2. Treatment assignment can be used as an instrument:
- Wald estimator: IV when the instrument is a binary variable.
SLIDE 19
Wald Estimator to Solve Drop-out of Treatment and Contamination Bias
Start with some notation:
– Initial random assignment: R=0/1
– Decision to participate: D=0/1
– Drop-out of treatment: R=1 and D=0
– Contamination bias: R=0 and D=1
– p0 = P(D=1|R=0), p1 = P(D=1|R=1)
– We observe R, D, p0, p1, Y0 if D=0 and Y1 if D=1
E(Y|R=0) = E(Y1|R=0)*p0 + E(Y0|R=0)*(1-p0)
E(Y|R=1) = E(Y1|R=1)*p1 + E(Y0|R=1)*(1-p1)
SLIDE 20
Given:
E(Y|R=0) = E(Y1|R=0)*p0 + E(Y0|R=0)*(1-p0)
E(Y|R=1) = E(Y1|R=1)*p1 + E(Y0|R=1)*(1-p1)
Because of randomization: E(Y1|R=1) = E(Y1|R=0) = E(Y1) (and the same for Y0)
Therefore: E(Y|R=1) – E(Y|R=0) = E(Y1)*(p1-p0) – E(Y0)*(p1-p0)
ATE = E(Y1) – E(Y0) = [E(Y|R=1) – E(Y|R=0)]/(p1-p0) [Wald estimator]
Wald Estimator to Solve Drop-out of Treatment and Contamination Bias
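A small simulation (hypothetical parameters, constant treatment effect) of drop-out and contamination, showing that the Wald estimator above recovers the true effect by using the random assignment R rather than actual participation D:

```python
import random
from statistics import mean

random.seed(7)

n = 100000
true_effect = 2.0   # E(Y1 - Y0), what the Wald estimator should recover
p1, p0 = 0.8, 0.1   # take-up: drop-out in treatment (p1 < 1), contamination in control (p0 > 0)

R = [random.random() < 0.5 for _ in range(n)]         # random assignment
D = [random.random() < (p1 if r else p0) for r in R]  # actual participation
Y = [random.gauss(1, 1) + true_effect * d for d in D] # observed outcome

# Wald estimator: [E(Y|R=1) - E(Y|R=0)] / (p1_hat - p0_hat)
ey1 = mean(y for y, r in zip(Y, R) if r)
ey0 = mean(y for y, r in zip(Y, R) if not r)
p1_hat = mean(d for d, r in zip(D, R) if r)
p0_hat = mean(d for d, r in zip(D, R) if not r)

wald = (ey1 - ey0) / (p1_hat - p0_hat)
print(round(wald, 2))  # close to the true effect 2.0
```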
SLIDE 21 Unsolvable Problems with Randomized Experiments
- Not implementable: e.g. effect of a merger on a firm’s
outputs – we cannot force a firm to merge.
- Costs are too high.
- Ethical considerations: e.g. all poor households should
receive a given income subsidy.
- Estimates would only be available after many years: e.g.
effect of healthy diet on longevity.
SLIDE 22 Threats to Internal Validity of Experiments
A) Threats to internal validity (ability to estimate causal effects within the study population)
1. Failure to randomize (or imperfect randomization). Randomization changes the nature of the program (e.g. greater recruitment needs may lead to a change in acceptance standards); ethical problems and political opposition to randomization (e.g. the poorest should get the program first).
2. Failure to follow the treatment protocol (“partial compliance”):
- Some controls get treatment, some treated drop out of the program.
- Differential attrition (e.g. in a job training program, controls who find jobs move out of town).
3. Experimental effects:
- Experimenter bias: treatment is associated with “extra effort”.
- The experiment is perceived differently by the researcher and by the subjects.
- Subject behavior is affected by taking part in an experiment (Hawthorne effect).
4. Validity of the instruments (in quasi-experiments).
Threats to internal validity imply that Cov(D,u)≠0, so the differences estimator is biased.
SLIDE 23 B) Threats to external validity (ability to estimate causal effects that are valid for other populations and settings)
- Nonrepresentative sample.
- Nonrepresentative “treatment” (that is program or policy).
- General equilibrium effects (effect of a program can depend on its
scale), and peer effects.
- Experiments involving human subjects typically need informed
consent: no guarantee that inferences for populations that give consent generalize to populations that do not.
- Which aspects of the treatment are responsible for the effect?
External validity can never be guaranteed, neither in randomized experiments nor in studies using observational data.
Threats to External Validity of Experiments
SLIDE 24 Internal and External Validity of Randomized Experiments
- The threats to the internal and external validity of an experiment are
different from the threats to the internal and external validity of an OLS regression using observational data (OVB, sample selection bias, reversed causality, wrong functional form, and measurement error).
- The threats for experiments refer to estimating causal effects with an experiment:
– Threats to internal validity (ability to estimate causal effects within the study population)
– Threats to external validity (ability to estimate causal effects that are valid for other populations and settings)
SLIDE 25 Still Lots of Benefits of Experiments
- Combine several experiments to estimate heterogeneous treatment effects:
– Run several experiments in settings that differ in population, nature of treatments, treatment rate, etc. to assess the credibility of generalizing the results to other settings. Ex.: Meager (2016) considers data from 7 randomized experiments on microfinance programs and finds consistency across studies.
- They reveal facts that are sometimes in contrast with
simple mean comparisons – two famous examples:
1. Effectiveness of job training programs on earnings (positive only when assessed with randomized experiments -> why???)
2. Effectiveness of reducing class size on students’ test scores (positive only when assessed with randomized experiments, the famous Tennessee STAR Program -> why???)
SLIDE 26 Experiments versus Observational Studies
- Internal validity is more problematic in observational studies; external validity is more problematic in experiments:
– Observational studies with large samples are more representative of the overall population, but run the risk of omitted variable bias.
– Experiments with small samples have little external validity due to differences between the sample and the target population.
- Combine experiments and observational studies: e.g. search for treatments with large effects that should be detected even in observational studies, and use experiments to study the effects of specific treatments.
- To estimate a causal effect it is necessary to have a theory to establish:
– Which variables in addition to the treatment affect Y. – How to control for these variables.
SLIDE 27 Machine Learning and Randomized Experiments
- Ludwig, Mullainathan and Spiess (2019):
- RCTs are costly in terms of both time and dollars: the Negative
Income Tax experiments cost $60 million, Congress set aside $100 million for the Moving to Opportunity experiment, which has taken 25 years.
- Pre-analysis plans have been used to specify the size of the control groups, the variables to control for, subgroups, and functional forms. However, the arbitrariness of these choices can undermine the credibility of the results.
- Idea: use an ML index that predicts the outcome of interest from the full vector of all controls to: assess balance between the treatment and control distributions at baseline; test whether treatment effects are heterogeneous; and test whether all heterogeneity is captured by the included controls.
SLIDE 28 Machine Learning and Control Groups: (Varian 2016, PNAS)
- Varian, H. (2016) “Causal Inference in Economics and Marketing”, PNAS, Vol. 113, No. 27, pp. 7310-7315.
- Introduction to causal inference in Economics written for readers
familiar with machine learning methods.
- Discussion of how machine learning techniques can be
useful for developing better estimates of the counterfactuals.
SLIDE 29 Machine Learning and Control Groups: (Varian 2016, PNAS)
Two main types of questions:
- 1. Quantify how a given treatment affects the subjects:
– Examples: effect of a drug on health outcomes; effect of class size on students’ learning; effect of an ad campaign on consumers’ spending.
– Classic treatment-control group comparison; machine learning can help by building counterfactuals through predictions.
- 2. Quantify how a given treatment affects the “experimenter”:
– Example: if I increase ad expenditure by x%, how many extra sales will I get? – The answer depends on how consumers respond to the ad, but we do not need to model how they respond.
- Example: we care about an increase in the number of visits to our website rather than how it happened (more clicks on a given ad, more search queries, etc.)
Machine learning is essential to build a predictive model for the counterfactual.
SLIDE 30 Machine Learning and Control Groups: Example of Type 2 Question (Varian 2016, PNAS)
- Example of an ad campaign.
– Research question: if I increase ad expenditure by x%, how many extra sales will I get?
- The advertiser increases ad spend for a given period of
time and would like to compare the sales amount after the increase with what would have happened to sales without the increase in ad expenditure.
– NOTE: this differs from “pure prediction problems” where causal inference is not necessary.
- How to compute the counterfactual?
– With a predictive model using data from before treatment.
SLIDE 31 Machine Learning and Control Groups: Example of Type 2 Question (Varian 2016, PNAS)
- For type 1 questions (effect of treatment on subjects):
– Treated and untreated (control) groups. – Comparison of outcomes between treated and control groups.
- For type 2 questions (effect of treatment on experimenter):
– All subjects are treated for a given period: one unit of analysis over time.
– 4-step process (TTTC) to build and use a predictive model:
- 1. TRAIN: machine learning tools to tune model’s parameters.
- 2. TEST: apply the model to a test set to check how well it performs.
- 3. TREAT: apply the model during treatment period to predict the
counterfactual.
- 4. COMPARE: compare what actually happened to the treated to the
prediction (given by the model) of what would have happened without the treatment.
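The four TTTC steps above can be sketched with invented data, using a simple least-squares trend as a stand-in for the machine-learning model (all numbers, including the +10 campaign effect, are hypothetical):

```python
import random

random.seed(1)

# Hypothetical weekly sales with a linear trend; an (invented) campaign adds +10 from week 80
sales = [100 + 0.5 * t + random.gauss(0, 2) + (10 if t >= 80 else 0) for t in range(100)]

def fit_line(ts, ys):
    """Least-squares y = a + b*t, a stand-in for the 'machine learning' model."""
    tbar, ybar = sum(ts) / len(ts), sum(ys) / len(ys)
    b = sum((t - tbar) * (y - ybar) for t, y in zip(ts, ys)) / sum((t - tbar) ** 2 for t in ts)
    return ybar - b * tbar, b

# 1. TRAIN: tune the model on weeks 0-59
a, b = fit_line(range(60), sales[:60])
# 2. TEST: check performance on held-out pre-treatment weeks 60-79
test_err = sum(abs(sales[t] - (a + b * t)) for t in range(60, 80)) / 20
# 3. TREAT: predict the counterfactual (no campaign) for the treatment period, weeks 80-99
counterfactual = [a + b * t for t in range(80, 100)]
# 4. COMPARE: actual minus predicted estimates the campaign's effect
lift = sum(sales[80 + i] - counterfactual[i] for i in range(20)) / 20
print(round(test_err, 1), round(lift, 1))  # small test error; lift close to +10
```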
SLIDE 32 Machine Learning and Control Groups: Example of Type 2 Question (Varian 2016, PNAS)
- TTTC process is a generalization of the classic treatment-
control approach to experiments.
– The classic approach requires a control group, which provides an estimate of the counterfactual.
– TTTC allows constructing a predictive model of the counterfactual even if we do NOT have a true control group: one unit of analysis over time.
- NOTE: TTTC estimates only the TTE (average effect of treatment on the treated).
SLIDE 33 Different Approaches
1. Run an experiment and use the simple differences estimator.
2. Use observational data to construct the counterfactual:
- a. Selection on observables:
- Unconfoundedness assumption: we assume we observe all X variables that affect both the participation decision and the outcome.
- Differences-in-Differences (DID)
- Matching
- Regression discontinuity
- b. Selection on unobservables
- We assume participation depends on unobserved variables.
- Instrumental variable estimation
- Control function approach
SLIDE 34 Differences Estimator
- The differences estimator is the simple difference in mean
outcomes (Y) between treatment and control.
- Problem 1: time-constant unobserved differences between
treated and untreated that are correlated with outcomes.
- Ex: effect of a job training program on earnings. Those who participate in the program are more motivated to work, so they would earn more even without the training program; thus the effect of the program is overestimated.
- Solution to problem 1: compare outcome of participants
before and after “treatment” using panel data.
- Problem 2: time-trends (e.g. business cycles).
- Ex: if recession after treatment, underestimation of treatment effect.
SLIDE 35 Differences-in-Differences Estimator
- The Differences-in-Differences estimator (DID) differences
out time-constant differences between treatment and
control and time-trends by comparing treated and untreated before and after the program.
- Data requirement: to implement DID it is necessary to
have panel data where each unit of analysis (individual, firm, state) is observed for at least two consecutive periods.
SLIDE 36 Differences-in-Differences Estimator
The DID estimator adjusts for pre-experimental differences by subtracting off each subject’s pre-experimental value of Y:
- Y(i, before) = value of Y for subject i before the expt
- Y(i, after) = value of Y for subject i after the expt
- ΔY(i) = Y(i, after) – Y(i, before) = change over the course of the expt
The DID estimator differences out: time-constant (level) differences by computing ΔY(i), and time-trends by comparing treated and untreated before and after the program:
b̂(diffs-in-diffs) = (Ybar(treat, after) – Ybar(treat, before)) – (Ybar(control, after) – Ybar(control, before))
SLIDE 37 Differences-in-Differences Estimator
b̂(diffs-in-diffs) = (Ybar(treat, after) – Ybar(treat, before)) – (Ybar(control, after) – Ybar(control, before))
(figure from Stock and Watson)
SLIDE 38
Differences-in-Differences Estimator
Main assumption of DID:
Counterfactual LEVELS for treated and non-treated can be different, but must have the same TIME VARIATION.
COMMON TREND assumption: E(Y0(t1) – Y0(t0)|D=1) = E(Y0(t1) – Y0(t0)|D=0)
SLIDE 39
Differences-in-Differences Estimator
Main assumption of DID:
Counterfactual LEVELS for treated and non-treated can be different, but must have the same TIME VARIATION.
COMMON TREND assumption: E(Y0(t1) – Y0(t0)|D=1) = E(Y0(t1) – Y0(t0)|D=0)
In the absence of the treatment, the change in the treated outcome would have been the same as the change in the non-treated outcome, i.e. changes in the economy or life-cycle that are unrelated to the treatment affect the two groups in a similar way.
What is NOT allowed are unobserved time-varying effects that affect the treatment and the control group differently.
SLIDE 40 Differences-in-Differences Estimator and Machine Learning
- We can use machine learning to construct the
counterfactuals.
- DID can be combined with TTTC:
– TTTC builds a predictive model for the outcome in the absence of the treatment: it predicts the missing potential outcome under the no-treatment status.
– When estimating DID, we can build a predictive model for the group that did not receive the treatment, using the same 4-step process that we discussed for TTTC:
- 1. TRAIN: machine learning tools to tune parameters.
- 2. TEST: apply the model to a test set to check how well it performs.
- 3. TREAT: apply the model to the treated units to predict the counterfactual.
- 4. COMPARE: compare what actually happened to the treated to the prediction given
by the model of what would have happened without the treatment.
SLIDE 41
Differences-in-Differences Estimator: Summing Up
DID differences out time-constant differences between treatment and control and time-trends by comparing treated and untreated before and after the treatment.
Validity assumption: COMMON TREND – absent treatment, the change in the treated outcome would have been the same as the change in the non-treated outcome. What is NOT allowed are unobserved time-varying effects that affect treatment and control differently.
DID identifies the TTE; however, if the assignment to the treatment is random, we can also estimate the ATE.
Of course, as for the differences estimator, we can control for additional covariates, with the same advantages that we discussed.
SLIDE 42 Regression Analysis of Experiments for DID Estimator
- Data requirement: to implement DID it is necessary to
have panel data where each unit of analysis (individual, firm, state) is observed for at least two consecutive periods.
- You have already studied panel data in your
Microeconometrics course.
SLIDE 43 Brief Review of Panel Data
- Panel data with k regressors
{X1(it), …, Xk(it), Y(it)}, i=1,…,n (number of entities), t=1,…,T (number of time periods)
- Another term for panel data is longitudinal data.
- Balanced panel: no missing observations.
- Unbalanced panel: some entities (units of analysis) are not
observed for some time periods.
SLIDE 44 Why are Panel Data Useful?
Two main cases:
- 1. Control for entity fixed effects: effects that vary across
entities (unit of analysis), but do not vary over time.
- 2. Control for time fixed effects: effects that vary over
time, but do not vary across units of analysis.
SLIDE 45 Why are Panel Data Useful? Entity Fixed Effects
With panel data we can control for factors that:
- Vary across entities (units of analysis), but do not vary
over time.
- Are unobserved or unmeasured – and therefore cannot
be included in the regression.
- Could cause omitted variable bias if they were omitted.
- Example: Can alcohol taxes reduce traffic deaths?
(Chapter 10 in Stock and Watson)
SLIDE 46
SLIDE 47
SLIDE 48 Panel Data and Omitted Variable Bias
- Why more traffic deaths in States that have higher alcohol taxes?
- Other factors that determine traffic fatality rate such as:
– Density of cars on the road – “Culture” around drinking and driving
- These omitted factors could cause omitted variable bias.
- Example: traffic density
1. High traffic density means more traffic deaths.
2. States with higher traffic density have higher alcohol taxes.
The two conditions for omitted variable bias are satisfied. Specifically, “high taxes” could reflect “high traffic density”, so the OLS coefficient would be biased upwards.
- Panel data allow eliminating omitted variable bias when the omitted
variables are constant over time within a given unit of analysis (State in the example here).
SLIDE 49 Consider the panel data model
FatalityRate(it) = a + b*BeerTax(it) + c*Z(i) + u(it),
where Z(i) is a factor that does not change over time (e.g. traffic density), at least during the years for which we have data. Suppose Z(i) is not observed, so its omission could result in omitted variable bias.
The effect of Z(i) can be eliminated using T=2 years.
Panel Data and Omitted Variable Bias
SLIDE 50 Consider the panel data model
FatalityRate(it) = a + b*BeerTax(it) + c*Z(i) + u(it),
where Z(i) is a factor that does not change over time (e.g. traffic density), at least during the years for which we have data. Suppose Z(i) is not observed, so its omission could result in omitted variable bias.
The effect of Z(i) can be eliminated using T=2 years.
- Key idea: Any change in the fatality rate from 1982 to 1988 cannot be caused by Z(i), because Z(i) (by assumption) does not change between 1982 and 1988.
- Estimate the difference in fatality rate as a function of the difference in beer tax using OLS.
Panel Data and Omitted Variable Bias
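A minimal simulation (all parameters invented) of the T=2 “changes” idea above: OLS on levels is biased by the unobserved time-constant Z(i), while OLS on first differences eliminates Z(i) and recovers b:

```python
import random

random.seed(3)

# Hypothetical panel, T=2: Y(it) = b*X(it) + c*Z(i) + u(it),
# with Z(i) unobserved and correlated with X -> levels OLS is biased
n, b_true, c_true = 5000, -0.5, 2.0
Z = [random.gauss(0, 1) for _ in range(n)]       # time-constant omitted factor
X1 = [z + random.gauss(0, 1) for z in Z]         # period-1 X, correlated with Z
X2 = [z + random.gauss(0, 1) + 0.3 for z in Z]   # period-2 X
Y1 = [b_true * x + c_true * z + random.gauss(0, 1) for x, z in zip(X1, Z)]
Y2 = [b_true * x + c_true * z + random.gauss(0, 1) for x, z in zip(X2, Z)]

def slope(xs, ys):
    """OLS slope of ys on xs (with intercept)."""
    xbar, ybar = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sum((x - xbar) ** 2 for x in xs)

b_levels = slope(X1 + X2, Y1 + Y2)                      # biased by the omitted Z(i)
b_changes = slope([x2 - x1 for x1, x2 in zip(X1, X2)],  # "changes" regression:
                  [y2 - y1 for y1, y2 in zip(Y1, Y2)])  # Z(i) differences out
print(round(b_levels, 2), round(b_changes, 2))  # levels far from -0.5; changes close to -0.5
```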
SLIDE 51
SLIDE 52
- What if you have more than 2 time periods (T>2)?
- For i=1,…,n and t=1,…, T
Y(it) = a + b*X(it) + c*Z(i) + u(it)
We can rewrite this in two useful ways:
1. “Fixed Effects” regression model: Y(it) = a(i) + b*X(it) + u(it)
The intercept a(i) is unique for each State; the slope b is the same in all States.
Panel Data with T>2
SLIDE 53
Panel Data with T>2
SLIDE 54
Y(it) = a + b*X(it) + c*Z(i) + u(it)
We can rewrite this in two useful ways:
1. “Fixed Effects” regression model: Y(it) = a(i) + b*X(it) + u(it)
The intercept a(i) is unique for each State; the slope b is the same in all States.
2. “n-1 binary regressors” regression model:
Y(it) = a + b*X(it) + c2*D2(i) + c3*D3(i) + … + cn*Dn(i) + u(it)
where D2(i) = 1{i=2}, i.e. D2(i) equals 1 if the observation is from State 2, and similarly for the other indicators.
Panel Data with T>2
SLIDE 55
Three estimation methods:
1. “n-1 binary regressors” OLS regression
2. “Entity-demeaned” OLS regression
3. “Changes” specification (if and only if T=2)
These three methods produce identical estimates of the regression coefficients and identical standard errors.
Panel Data with T≥2
SLIDE 56 Three estimation methods:
1. “n-1 binary regressors” OLS regression
2. “Entity-demeaned” OLS regression
3. “Changes” specification (if and only if T=2)
These three methods produce identical estimates of the regression coefficients and identical standard errors.
Method 1. Y(it) = a + b*X(it) + c2*D2(i) + c3*D3(i) + … + cn*Dn(i) + u(it)
- First create the binary variables, D2(i),…, Dn(i)
- Then estimate above equation by OLS
- Inference (hypothesis tests, confidence intervals) is as usual
(using heteroscedasticity-robust standard errors)
- Impractical when n is very large
Panel Data with T≥2
SLIDE 57 Method 2. Y(it)= a(i) +b*X(it)+u(it)
- First construct the demeaned variables
- Then estimate above equation by OLS
- Inference (hypothesis tests, confidence intervals) is as usual
(using heteroscedasticity-robust standard errors)
- This is like the “changes” approach (method 3), but Y(it) is deviated from the state average instead of from Y(i1).
Estimation can be done easily in Stata:
- “areg” automatically demeans the data (useful when n is large)
- The reported intercept is the average of the n-1 dummy variables (no clear interpretation)
Panel Data with T≥2
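A sketch (hypothetical data) of why entity demeaning works: pooled OLS is biased when the entity effects a(i) are correlated with X, while the within (demeaned) estimator recovers b. The demeaning step is the computation that Stata’s “areg” automates:

```python
import random
from statistics import mean

random.seed(5)

# Hypothetical panel: Y(it) = a(i) + b*X(it) + u(it), with entity effects a(i)
# correlated with X (the classic omitted-variable problem in levels)
n, T, b_true = 500, 5, 1.0
data = []  # rows of (entity, x, y)
for i in range(n):
    base = random.gauss(0, 1)
    a_i = 2.0 * base  # time-constant entity effect, correlated with X
    for t in range(T):
        x = base + random.gauss(0, 1)
        data.append((i, x, a_i + b_true * x + random.gauss(0, 0.5)))

def slope(xs, ys):
    """OLS slope of ys on xs (with intercept)."""
    xbar, ybar = mean(xs), mean(ys)
    return sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sum((x - xbar) ** 2 for x in xs)

# Pooled OLS ignores a(i) and is biased
b_pooled = slope([x for _, x, _ in data], [y for _, _, y in data])

# Entity demeaning: deviate X and Y from each entity's own average
xm = {i: mean(x for j, x, _ in data if j == i) for i in range(n)}
ym = {i: mean(y for j, _, y in data if j == i) for i in range(n)}
b_within = slope([x - xm[i] for i, x, _ in data], [y - ym[i] for i, _, y in data])

print(round(b_pooled, 2), round(b_within, 2))  # pooled is biased upward; within is close to 1.0
```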
SLIDE 58 Why are Panel Data Useful? Time Fixed Effects
An omitted variable might vary over time but not across units (ex. States):
- Safer cars (air bags, etc); changes in national laws
- These produce intercepts that change over time
- Let these changes (“safer cars”) be denoted by the variable, S(t),
which changes over time but not across States
- The resulting population regression model is:
Y(it) = a + b*X(it) + c*S(t) + u(it)
The intercept varies from one year to the next, e.g. m(1982) = a + c*S(1982).
SLIDE 59 Why are Panel Data Useful? Time Fixed Effects
An omitted variable might vary over time but not across units (ex. States):
- Safer cars (air bags, etc); changes in national laws
- These produce intercepts that change over time
- Let these changes (“safer cars”) be denoted by the variable, S(t),
which changes over time but not across States
- The resulting population regression model is:
Y(it) = a + b*X(it) + c*S(t) + u(it)
The intercept varies from one year to the next, e.g. m(1982) = a + c*S(1982).
Again, two formulations for time fixed effects:
1. “Binary regressor” formulation: “T-1 binary regressors” OLS regression.
2. “Time effects” formulation: “Year-demeaned” OLS regression (deviate Y(it) and X(it) from year averages), then estimate by OLS.
SLIDE 60 Time and Entity Fixed Effects
Y(it) = a(t) + b*T(it) + m(i) + u(it), where T(it) = 1 if in the treatment group and after the treatment, 0 otherwise.
Equivalently: Y(it) = a + b*D(it)*Z(it) + c*Z(it) + d*D(it) + u(it), where
- D(it) = 1 if in the treatment group, 0 otherwise
- Z(it) = 1 if in the “after” period, 0 in the “before” period
- D(it)*Z(it) = 1 if in the treatment group in the “after” period (interaction effect)
b is the DID estimator.
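A small simulation (invented numbers) checking that the coefficient b on the interaction D(it)*Z(it) equals the DID of group means; the tiny normal-equations solver is only there to keep the sketch free of external libraries:

```python
import random
from statistics import mean

random.seed(9)

# Hypothetical two-period data: b_true is the DID treatment effect
n, b_true = 4000, 3.0
rows = []  # (D, Z, Y): D = 1 if treatment group, Z = 1 if "after" period
for _ in range(n):
    d = int(random.random() < 0.5)
    level = 2.0 if d else 0.0  # the groups may differ in levels...
    for z in (0, 1):
        y = level + 1.5 * z + b_true * d * z + random.gauss(0, 1)  # ...and share a common trend
        rows.append((d, z, y))

# DID of the four group means
m = lambda d, z: mean(y for dd, zz, y in rows if dd == d and zz == z)
did = (m(1, 1) - m(1, 0)) - (m(0, 1) - m(0, 0))

def ols(X, y):
    """Solve the normal equations X'X b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)] +
         [sum(r[i] * yy for r, yy in zip(X, y))] for i in range(k)]
    for i in range(k):
        p = A[i][i]
        A[i] = [v / p for v in A[i]]
        for j in range(k):
            if j != i:
                A[j] = [vj - A[j][i] * vi for vj, vi in zip(A[j], A[i])]
    return [A[i][k] for i in range(k)]

# Y = a + b*D*Z + c*Z + d*D + u: the interaction coefficient is the DID estimator
b_ols = ols([[1.0, d * z, z, d] for d, z, _ in rows], [y for _, _, y in rows])[1]

print(round(did, 2), round(b_ols, 2))  # identical, and close to b_true = 3.0
```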
SLIDE 61 Time and Entity Fixed Effects: Estimation
Different equivalent ways to allow for both entity and time fixed effects:
- Differences and intercept (T=2 only)
- Entity (or time) demeaning and T-1 time (or N-1 entity)
indicators
- T-1 time indicators and n-1 entity indicators
- Entity and time demeaning
SLIDE 62
- Under the fixed effects regression assumptions, which are
basically extensions of the least squares assumptions, the OLS fixed effects estimator of b is normally distributed.
- BUT there are difficulties associated with computing
standard errors that do not come up with cross-sectional data.
1. Fixed effects regression assumptions.
2. Standard errors for fixed effects regression.
3. Proof of consistency and normality of the fixed effects estimator.
Time and Entity Fixed Effects: Estimation
SLIDE 63 Additions to DID
- It is possible to use repeated cross-sections instead of panel data under certain conditions, e.g. stable group composition over time (see Meyer 1995, and Abadie 2005).
- Caveats and extensions:
– Endogenous treatment (Besley/Case 2000): DID assumptions exclude the possibility that a State increases the alcohol tax because of a high rate of traffic fatalities in the past.
– Parallel trends conditional on X: trends can be different in treated and control groups if the distribution of X is different (Abadie 2005: “Semi-parametric DID”), a mix of “diffs-in-diffs” and “matching” methods.
– Bertrand et al. (2004): propose a solution for the case when residual autocorrelation over time is not accounted for, so the variance may be underestimated (heteroscedasticity- and autocorrelation-consistent asymptotic variance).
SLIDE 64
Appendix 1: Fixed-Effects Regression Assumptions and Variance-Covariance Matrix
SLIDE 65
Fixed-Effects Regression Assumptions
Under these assumptions, FE is consistent and normally distributed.
SLIDE 68
Assumption #2
SLIDE 70
Variance-Covariance Matrix
SLIDE 71 Variance-Covariance Matrix
- In general, we would like to allow the error terms to be correlated over time for a given entity, and this makes the formula for the asymptotic variance complicated.
- You can also allow for heteroskedasticity. Then you can compute
the “heteroskedasticity- and autocorrelation-consistent asymptotic variance”.
- You can also compute “clustered standard errors” because there is
a grouping, or “cluster”, within which the error term is possibly correlated, but outside of which (across groups) it is not. For example, you can allow for correlation of the errors for individuals within the same family but not between families.
SLIDE 72
Variance-Covariance Matrix - special case: no correlation across time within entities
SLIDE 75
Case 1: Serial Correlation
Heteroskedasticity and autocorrelation-consistent asymptotic variance
SLIDE 76
Case 2: No Serial Correlation
SLIDE 77
Case 3: No Serial Correlation and No Heteroskedasticity
SLIDE 78
Appendix 2: Proofs of Consistency and Normality of Fixed Effects Estimator
SLIDE 79
Consistency of Fixed Effects Estimator
SLIDE 84
Normality of Fixed Effects Estimator