Randomized Experiments The goal of randomized experiments is to - - PowerPoint PPT Presentation




Randomized Experiments

  • The goal of randomized experiments is to identify…
  • The causal effect!
  • Advantage of causal effect described by statisticians:

“The advantage of causal predictors compared with non-causal predictors is that their influence on the target variable remains invariant under different changes of the environment.” (Peters, Bühlmann and Meinshausen 2016, Journal of the Royal Statistical Society, Series B)

  • Correlation is not causation!

Randomized Experiments

  • The gold standard to estimate a causal effect is a randomized experiment.

  • The validity of a randomized experiment depends on:
  • 1. Randomization.
  • 2. A well-constructed control group.

What to Take into Account when Conducting a Randomized Experiment?

1) What to randomize on:

1. Randomize eligibility.
2. Randomize after acceptance into the program.
3. Randomize incentives for take-up.

Randomize after acceptance into the program:
– R=1 if randomized in (treatment group)
– R=0 if randomized out (control group)
– D indicates whether someone applies to the program and is subject to randomization (here D=1 for everyone in the randomization)
– Random assignment implies:

  • For treatment group: E(Y1|X, D=1,R=1) = E(Y1|X, D=1)
  • For control group: E(Y0|X, D=1, R=0) = E(Y0|X, D=1)

Subtracting the two conditional means, the experiment identifies the average treatment effect on the treated: TTE = E(Y1-Y0|X, D=1)

What to Take into Account when Conducting a Randomized Experiment?


2) Power calculations

  • Def: the power of the design is the probability that, for a given effect size and statistical significance level, we will be able to reject the hypothesis of zero effect.

  • Design choices that affect the “power” of an experiment:
– Sample size
– Minimum size of the effect that the researcher wants to be able to detect
– Multiple treatment groups
– Partial compliance and drop-out
– Control variables (important to know how much of the residual variance they absorb)

  • Standard software is available for the single-site case (e.g. the “power” command in Stata)

  • Multi-site power analyses get complicated

– Need to know the impact variation and correlations across sites
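Below is a rough single-site sketch of the power definition above. It uses the normal approximation for a two-sided test of a difference in means between two arms. The function names, the default 5% level and 80% target power, and the `r2` argument (the share of outcome variance assumed to be absorbed by control variables) are illustrative assumptions. This is not a reproduction of Stata's `power` command.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def se_diff_in_means(sigma, n_treat, n_control, r2=0.0):
    """Standard error of a treatment-control difference in means.

    r2 is the assumed share of outcome variance absorbed by control
    variables (r2=0 means no controls).
    """
    resid_sd = sigma * math.sqrt(1.0 - r2)
    return resid_sd * math.sqrt(1.0 / n_treat + 1.0 / n_control)


def power(effect, sigma, n_treat, n_control, alpha=0.05, r2=0.0):
    """Probability of rejecting H0: effect = 0 with a two-sided z-test."""
    se = se_diff_in_means(sigma, n_treat, n_control, r2)
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    shift = abs(effect) / se
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


def minimum_detectable_effect(sigma, n_treat, n_control,
                              alpha=0.05, target_power=0.8, r2=0.0):
    """Approximate smallest true effect detected with probability target_power."""
    se = se_diff_in_means(sigma, n_treat, n_control, r2)
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    return (z_crit + STD_NORMAL.inv_cdf(target_power)) * se


# Effect of 0.2 outcome standard deviations with 393 subjects per arm
print(power(0.2, 1.0, 393, 393))                 # about 0.80
print(power(0.2, 1.0, 393, 393, r2=0.3))         # higher: controls absorb variance
print(minimum_detectable_effect(1.0, 393, 393))  # about 0.20
```

Partial compliance can be handled by plugging in the diluted intent-to-treat effect. That is the effect on those whose take-up changes, multiplied by p1 - p0, in the notation used later for the Wald estimator. The multi-site complications listed above (impact variation, cross-site correlations) are ignored here.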

What to Take into Account when Conducting a Randomized Experiment?


3) Choosing the sites in multi-site experiments

  • External validity: choose sites at random
  • Realistic impacts: choose sites that are representative
  • Efficacy: choose sites that will best implement the treatment
  • Avoid contamination: choose sites with little or no contact of any sort


Examples of Randomized Experiments

  • Large-scale experiments, e.g. in the US/Canada:

– US National JTPA (Job Training Partnership Act) Study, Tennessee class size experiment (STAR)

  • More recently, randomized experiments in developing countries:
– Small experiments addressing very specific questions, for example microfinance experiments by Dean Karlan and education experiments (e.g. schooling inputs) by Michael Kremer and Esther Duflo, etc.
– Example of a large-scale and very successful conditional cash transfer program: Progresa/Oportunidades in Mexico (1997-2003)


Example: the STAR Experiment (Stock and Watson Ch. 13)

Tennessee Project STAR (Student-Teacher Achievement Ratio):
– 4-year US study with an overall budget of $12 million.
– 79 Tennessee public schools, for a single cohort of students from kindergarten through third grade in the years 1985-89.
– Upon entering the school system, a student was randomly assigned to one of three groups:
  • Regular class (22 – 25 students).
  • Regular class + full-time teacher’s aide.
  • Small class (13 – 17 students).
– Students in regular classes were re-randomized after the first year to regular or regular + aide classes.
– Y = Stanford achievement test scores.


“Natural” (or Quasi-) Experiments

A quasi-experiment or natural experiment: “nature” provides random events that can be used as a source of exogenous variation. Treatment (D) is “as if” randomly assigned. Example:
– Effect of changes in the minimum wage on employment.
– D = change in minimum wage law in some States (it changes only in some States, thus State is “as if” randomly assigned).

The natural random event operates as an instrumental variable:
– Relevance: it is strongly correlated with the treatment D (so much that it defines the treatment!).
– Exogeneity: it does not affect the outcome Y other than through the treatment D.


“Natural” (or Quasi-) Experiments

Idea of quasi-experiments follows that of “real” randomized experiments:
– Find an exogenous source of variation (i.e. a variable that affects participation but not the outcome directly).
– It is important to understand the source of variation that helps to identify the treatment effect.


“Natural” (or Quasi-) Experiments

– Disadvantage: nature provides only a small number of random events…
– Advantage: when nature does provide random events, they can usefully be exploited.
Example: Card D. and Krueger A. (1994) “Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania”, American Economic Review, Vol. 84, No. 4, pp. 772-793.


Regression Analysis of Experiments for Differences Estimator

  • In an ideal randomized controlled experiment the treatment D is randomly assigned:
Y = a + b*D + u   (1)

  • If D is randomly assigned, then u and D are independently distributed, so E(u|D)=0 (conditional mean independence):
– dE(Y|D)/dD = b is the average causal effect of D on Y.
– OLS on (1) gives an unbiased estimate of the causal effect of D on Y.

  • When the treatment is binary, the causal effect b is the difference in mean outcomes between treatment and control.
– This difference in means is the differences estimator.
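A small simulation illustrates the last point. It assumes a = 1 and b = 2 with normal errors. With a randomly assigned binary D, the OLS slope from (1) is numerically the same as the difference in mean outcomes, and the OLS intercept is the control-group mean. The helper names are hypothetical.

```python
import random
from statistics import fmean


def ols_simple(x, y):
    """Intercept and slope of a simple OLS regression of y on x."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def differences_estimator(d, y):
    """Difference in mean outcomes: treatment (D=1) minus control (D=0)."""
    treated = [yi for di, yi in zip(d, y) if di == 1]
    control = [yi for di, yi in zip(d, y) if di == 0]
    return fmean(treated) - fmean(control)


random.seed(0)
n = 1000
d = [random.randint(0, 1) for _ in range(n)]                # random assignment
y = [1.0 + 2.0 * di + random.gauss(0.0, 1.0) for di in d]   # a = 1, b = 2

a_hat, b_hat = ols_simple(d, y)
print(b_hat, differences_estimator(d, y))   # the two numbers coincide
```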


Regression Analysis of Experiments for Differences Estimator

We can add covariates X to the model:
Y = a + b*D + c*X + u   (2)
Advantages of adding the covariates X:

1. Check whether randomization worked: if D is randomly assigned, the OLS estimates of b in models (1) and (2) (that is, without and with the covariates X) should be similar; if they aren’t, this suggests that D was not randomly assigned.

  • NOTE: to check directly for randomization, we can regress the treatment indicator D on the covariates X and do an F-test (under randomization the covariates should be jointly insignificant).
2. Increased efficiency: smaller standard errors.
3. Adjustment for conditional randomization (conditional randomization is applied when we are interested in treatment effects for different groups; for example, control for school effects if randomization was within but not across schools).
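The direct randomization check in the NOTE can be sketched as follows. Regress D on an intercept and the covariates, then form the F-statistic for the null that all slopes are zero: F = (R²/k)/((1 - R²)/(n - k - 1)). The covariates and both assignment rules are simulated assumptions. A real analysis would normally use a statistics package and also report the p-value. The hypothetical `ols` helper solves the normal equations directly, which is adequate for a handful of regressors.

```python
import random


def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y.

    X is a list of rows and must include a column of ones for the intercept.
    Solved by Gauss-Jordan elimination with partial pivoting.
    """
    k = len(X[0])
    aug = [[sum(r[i] * r[j] for r in X) for j in range(k)]
           + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda row: abs(aug[row][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(k):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def balance_f_stat(d, covariates):
    """F-statistic for H0: all slopes are zero in a regression of D on X."""
    n, k = len(d), len(covariates[0])
    X = [[1.0] + list(row) for row in covariates]
    b = ols(X, d)
    fitted = [sum(bj * xj for bj, xj in zip(b, row)) for row in X]
    d_bar = sum(d) / n
    ssr = sum((di - fi) ** 2 for di, fi in zip(d, fitted))
    sst = sum((di - d_bar) ** 2 for di in d)
    r2 = 1.0 - ssr / sst
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


random.seed(1)
n = 500
covs = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)]
d_random = [float(random.random() < 0.5) for _ in range(n)]   # coin-flip assignment
d_selected = [float(x1 > 0) for x1, _ in covs]                # assignment driven by X

print(balance_f_stat(d_random, covs))     # small: no evidence against randomization
print(balance_f_stat(d_selected, covs))   # very large: D is predictable from X
```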


Problems with Randomized Experiments

  • Randomization per se does not ensure that the treatment and the control group are perfectly comparable.

– In any given RCT, nothing ensures that other causal factors are balanced across the groups at the point of randomization (Deaton and Cartwright 2017).

  • Randomization per se only means that, on average, if the experiment were repeated many times, the estimated effect of the treatment would equal the true effect.

– Unbiasedness says that, if we were to repeat the trial many times, we would be right on average. Yet we are almost never in such a situation, and with only one trial (as is virtually always the case) unbiasedness does nothing to prevent our single estimate from being very far away from the truth (Deaton and Cartwright 2017).
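The Deaton and Cartwright point can be illustrated by re-running the randomization many times on one fixed, small, hypothetical study population with a constant effect of 1. The average of the differences estimates is close to the true effect, but any single trial can land far from it. The population size and outcome spread are arbitrary assumptions.

```python
import random
from statistics import fmean, pstdev

rng = random.Random(2)
n = 40
true_effect = 1.0
# Fixed study population: heterogeneous untreated outcomes Y0, constant effect
y0 = [rng.gauss(0.0, 3.0) for _ in range(n)]
y1 = [v + true_effect for v in y0]


def one_trial(gen):
    """Randomize half of the population into treatment; return the diff in means."""
    treated = set(gen.sample(range(n), n // 2))
    mean_t = fmean([y1[i] for i in treated])
    mean_c = fmean([y0[i] for i in range(n) if i not in treated])
    return mean_t - mean_c


estimates = [one_trial(rng) for _ in range(5000)]
print(fmean(estimates))                 # close to 1.0: unbiased on average
print(min(estimates), max(estimates))   # individual trials can miss badly
```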


Solvable Problems with Randomized Experiments

  • 1. Drop-out of treatment: some subjects in the treatment group may drop out before completing the program.

  • 2. Contamination bias: some subjects in the control group get treatment.


Two Solutions to Drop-out of Treatment and Contamination Bias

  • 1. Define treatment as “intent-to-treat” or “offer of treatment”: focus on those who were invited to be treated, whether or not they actually agreed to be treated.

  • 2. Treatment assignment can be used as an instrument:
  • Wald estimator: IV when the instrument is a binary variable.

Wald Estimator to Solve Drop-out of Treatment and Contamination Bias

Start with some notation:
– Initial random assignment: R=0/1
– Decision to participate: D=0/1
  • Drop-out of treatment: R=1 and D=0
  • Contamination bias: R=0 and D=1
p0=P(D=1|R=0), p1=P(D=1|R=1)
We observe R, D, p0, p1, Y0 if D=0 and Y1 if D=1:
E(Y|R=0)=E(Y1|R=0)*p0 + E(Y0|R=0)*(1-p0)
E(Y|R=1)=E(Y1|R=1)*p1 + E(Y0|R=1)*(1-p1)


Given:
E(Y|R=0)=E(Y1|R=0)*p0 + E(Y0|R=0)*(1-p0)
E(Y|R=1)=E(Y1|R=1)*p1 + E(Y0|R=1)*(1-p1)
Because of randomization: E(Y1|R=1)=E(Y1|R=0)=E(Y1) (and the same for Y0).
Therefore:
E(Y|R=1) - E(Y|R=0) = E(Y1)*(p1-p0) - E(Y0)*(p1-p0)
so that
ATE = E(Y1) - E(Y0) = [E(Y|R=1)-E(Y|R=0)]/(p1-p0)   [Wald estimator]
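The Wald estimator can be checked on simulated data. Here both drop-out and contamination depend on an unobserved "motivation" variable that also raises Y. The sketch assumes a constant treatment effect of 2. With heterogeneous effects, the Wald ratio is generally interpreted as the average effect for the compliers, i.e. those whose participation is changed by R. The participation thresholds and the sample size are arbitrary choices.

```python
import random
from statistics import fmean

random.seed(4)
n, effect = 20000, 2.0     # hypothetical sample size and constant effect
rows = []                  # (R, D, Y)
for _ in range(n):
    motivation = random.gauss(0.0, 1.0)      # unobserved by the analyst
    r = int(random.random() < 0.5)           # initial random assignment R
    if r == 1:
        d = int(motivation > -0.8)           # least motivated drop out (R=1, D=0)
    else:
        d = int(motivation > 1.2)            # most motivated get treated anyway (R=0, D=1)
    y = 1.0 + motivation + effect * d + random.gauss(0.0, 1.0)
    rows.append((r, d, y))


def mean_of(column, keep):
    """Mean of one column (0=R, 1=D, 2=Y) over rows satisfying keep(row)."""
    return fmean([row[column] for row in rows if keep(row)])


p1 = mean_of(1, lambda row: row[0] == 1)     # P(D=1|R=1)
p0 = mean_of(1, lambda row: row[0] == 0)     # P(D=1|R=0)
itt = mean_of(2, lambda row: row[0] == 1) - mean_of(2, lambda row: row[0] == 0)
wald = itt / (p1 - p0)                       # [E(Y|R=1)-E(Y|R=0)]/(p1-p0)
naive = mean_of(2, lambda row: row[1] == 1) - mean_of(2, lambda row: row[1] == 0)
print(p1, p0)       # roughly 0.79 and 0.12
print(itt, wald)    # intent-to-treat is diluted; Wald is close to 2.0
print(naive)        # participants vs non-participants: biased upward
```

The numerator alone is the intent-to-treat effect from the first solution. The naive comparison of participants with non-participants also picks up the motivation gap between the two groups.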


Unsolvable Problems with Randomized Experiments

  • Not implementable: e.g. the effect of a merger on a firm’s outputs (we cannot force a firm to merge).
  • Costs are too high.
  • Ethical considerations: e.g. all poor households should receive a given income subsidy.
  • Estimates would only be available after many years: e.g. the effect of a healthy diet on longevity.


Threats to Internal Validity of Experiments

A) Threats to internal validity (ability to estimate causal effects within the study population)
1. Failure to randomize (or imperfect randomization): randomization changes the nature of the program (e.g. greater recruitment needs may lead to a change in acceptance standards); ethical problems and political opposition to randomization (e.g. the poorest should get the program first).

2. Failure to follow the treatment protocol (“partial compliance”):

  • Some controls get treatment, some treated drop out of the program.
  • Differential attrition (e.g. in a job training program, controls who find jobs move out of town).

3. Experimental effects:

  • Experimenter bias: treatment is associated with “extra effort”.
  • The experiment is perceived differently by the researcher and by the subjects. Subject behavior is affected by taking part in an experiment (Hawthorne effect).

4. Validity of the instruments (in quasi-experiments).
Threats to internal validity imply that Cov(D,u)≠0, so the differences estimator is biased.


B) Threats to external validity (ability to estimate causal effects that are valid for other populations and settings)

  • Nonrepresentative sample.
  • Nonrepresentative “treatment” (that is, program or policy).
  • General equilibrium effects (the effect of a program can depend on its scale), and peer effects.

  • Experiments involving human subjects typically need informed consent: there is no guarantee that inferences for populations that give consent generalize to populations that do not.

  • Which aspects of the treatment are responsible for the effect?

External validity can never be guaranteed, neither in randomized experiments nor in studies using observational data.

Threats to External Validity of Experiments


Internal and External Validity of Randomized Experiments

  • The threats to the internal and external validity of an experiment are different from the threats to the internal and external validity of an OLS regression using observational data (OVB, sample selection bias, reverse causality, wrong functional form, and measurement error).

  • The threats for experiments refer to estimating the causal effects with an experiment:
– Threats to internal validity (ability to estimate causal effects within the study population)
– Threats to external validity (ability to estimate causal effects that are valid for other populations and settings)


Still Lots of Benefits of Experiments

  • Combine several experiments to estimate heterogeneous treatment effects:
– Run several experiments in settings that differ in population, nature of treatments, treatment rate, etc. to assess the credibility of generalizing the results to other settings. Ex.: Meager (2016) considers data from 7 randomized experiments on microfinance programs and finds consistency across studies.

  • They reveal facts that are sometimes in contrast with simple mean comparisons; two famous examples:
1. Effectiveness of job training programs on earnings (positive only when assessed with randomized experiments -> why???)
2. Effectiveness of reducing class size on students’ test scores (positive only when assessed with randomized experiments, famous Tennessee STAR program -> why???)


Experiments versus Observational Studies

  • Internal validity is more problematic in observational studies, external validity is more problematic in experiments:
– Observational studies with large samples are more representative of the overall population but run the risk of omitted variable bias.
– Experiments with small samples have little external validity due to differences between the sample and the target population.

  • Combine experiments and observational studies: e.g. search for treatments with large effects that should be detected even in observational studies, and use experiments to study the effects of specific treatments.

  • To estimate a causal effect it is necessary to have a theory to establish:
– Which variables in addition to the treatment affect Y.
– How to control for these variables.


Machine Learning and Randomized Experiments

  • Ludwig, Mullainathan and Spiess (2019):
  • RCTs are costly in terms of both time and dollars: the Negative Income Tax experiments cost $60 million, and Congress set aside $100 million for the Moving to Opportunity experiment, which has taken 25 years.

  • Pre-analysis plans have been used to specify in advance the size of the control groups, the variables to control for, subgroups and functional forms. However, the arbitrariness of these choices can undermine the credibility of the results.

  • Idea: use an ML index that predicts the outcome of interest from the full vector of all controls to assess: balance (no difference between treatment and control distributions at baseline); whether treatment effects are heterogeneous; and whether all heterogeneity is captured by the included controls.


Machine Learning and Control Groups: (Varian 2016, PNAS)

  • Varian, H. (2016) “Causal Inference in Economics and Marketing”, PNAS, Vol. 113, No. 27, pages 7310-7315.

  • Introduction to causal inference in Economics written for readers familiar with machine learning methods.

  • Discussion of how machine learning techniques can be useful for developing better estimates of the counterfactuals.


Machine Learning and Control Groups: (Varian 2016, PNAS)

Two main types of questions:

  • 1. Quantify how a given treatment affects the subjects:
– Examples: effect of a drug on health outcomes; effect of class size on students’ learning; effect of an ad campaign on consumers’ spending.
– Classic treatment-control group comparison; machine learning can help by building counterfactuals through predictions.

  • 2. Quantify how a given treatment affects the “experimenter”:
– Example: if I increase ad expenditure by x%, how many extra sales will I get?
– The answer depends on how consumers respond to the ad, but we do not need to model how they respond.

  • Example: we care about an increase in the number of visits to our website rather than about how this happened (more clicks on a given ad, more search queries, etc.).

  • Machine learning is essential to build a predictive model for the counterfactual.


Machine Learning and Control Groups: Example of Type 2 Question (Varian 2016, PNAS)

  • Example of an ad campaign.

– Research question: if I increase ad expenditure by x%, how many extra sales will I get?

  • The advertiser increases ad spend for a given period of time and would like to compare sales after the increase with what sales would have been without the increase in ad expenditure.

– NOTE: this differs from “pure prediction problems” where causal inference is not necessary.

  • How to compute the counterfactual?

– With a predictive model using data from before treatment.


Machine Learning and Control Groups: Example of Type 2 Question (Varian 2016, PNAS)

  • For type 1 questions (effect of treatment on subjects):

– Treated and untreated (control) groups. – Comparison of outcomes between treated and control groups.

  • For type 2 questions (effect of treatment on experimenter):
– All subjects are treated for a given period: one unit of analysis over time.
– 4-step process (TTTC) to build and use a predictive model:
  • 1. TRAIN: use machine learning tools to tune the model’s parameters.
  • 2. TEST: apply the model to a test set to check how well it performs.
  • 3. TREAT: apply the model during the treatment period to predict the counterfactual.
  • 4. COMPARE: compare what actually happened to the treated with the prediction (given by the model) of what would have happened without the treatment.
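The four TTTC steps can be sketched on a single simulated weekly sales series. To keep the example self-contained, a least-squares linear trend stands in for the machine learning model in the TRAIN step. That substitution, the weekly horizon and the size of the campaign lift are all assumptions made for this sketch.

```python
import random
from statistics import fmean


def fit_linear_trend(ts, ys):
    """Least-squares line through (t, y); a simple stand-in for an ML model."""
    mt, my = fmean(ts), fmean(ys)
    slope = (sum((t - mt) * (v - my) for t, v in zip(ts, ys))
             / sum((t - mt) ** 2 for t in ts))
    intercept = my - slope * mt
    return lambda t: intercept + slope * t


random.seed(9)
weeks = list(range(100))
lift = 10.0   # hypothetical true effect of the extra ad spend (weeks 80-99)
sales = [100.0 + 0.5 * t + random.gauss(0.0, 2.0) + (lift if t >= 80 else 0.0)
         for t in weeks]

train, test, treat = weeks[:60], weeks[60:80], weeks[80:]

# 1. TRAIN: fit the model on early pre-treatment weeks
model = fit_linear_trend(train, [sales[t] for t in train])
# 2. TEST: mean absolute error on held-out pre-treatment weeks
test_mae = fmean([abs(sales[t] - model(t)) for t in test])
# 3. TREAT: predicted counterfactual sales (no extra ad spend) during treatment
counterfactual = [model(t) for t in treat]
# 4. COMPARE: actual minus predicted, averaged over the treatment period
effect_on_treated = fmean([sales[t] for t in treat]) - fmean(counterfactual)
print(test_mae, effect_on_treated)
```

In practice the TRAIN step would use a richer model and the TEST step would guide model choice. The logic of comparing actual outcomes with a predicted counterfactual stays the same.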


Machine Learning and Control Groups: Example of Type 2 Question (Varian 2016, PNAS)

  • The TTTC process is a generalization of the classic treatment-control approach to experiments.

  • Key difference:
– The classic approach requires a control group, which provides an estimate of the counterfactual.
– TTTC allows constructing a predictive model of the counterfactual even if we do NOT have a true control group (one unit of analysis over time).
  • NOTE: TTTC estimates only the TTE (average effect of treatment on the treated).


Different Approaches to Program Evaluation

1. Run an experiment and use the simple differences estimator.
2. Use observational data to construct the counterfactual:
  • a. Selection on observables:
  • Unconfoundedness assumption: we assume that we observe all X variables that affect both the participation decision and the outcome.
  • Differences-in-Differences (DID)
  • Matching
  • Regression discontinuity
  • b. Selection on unobservables:
  • We assume participation depends on unobserved variables.
  • Instrumental variable estimation
  • Control function approach

Differences Estimator

  • The differences estimator is the simple difference in mean outcomes (Y) between treatment and control.
  • Problem 1: time-constant unobserved differences between treated and untreated that are correlated with outcomes.

  • Ex: effect of a job training program on earnings. Those who participate in the program are more motivated to work, so they would earn more even without the training program; thus the effect of the program is overestimated.

  • Solution to problem 1: compare the outcome of participants before and after “treatment” using panel data.

  • Problem 2: time trends (e.g. business cycles).
  • Ex: if there is a recession after the treatment, the treatment effect is underestimated.

Differences-in-Differences Estimator

  • Differences-in-Differences estimator (DID): differences out time-constant differences between treatment and control, as well as time trends, by comparing treated and untreated before and after the program.

  • Data requirement: to implement DID it is necessary to have panel data where each unit of analysis (individual, firm, state) is observed for at least two consecutive periods.


Differences-in-Differences Estimator

The DID estimator adjusts for pre-experimental differences by subtracting off each subject’s pre-experimental value of Y:
  • $Y_i^{\text{before}}$ = value of Y for subject i before the experiment
  • $Y_i^{\text{after}}$ = value of Y for subject i after the experiment
  • $\Delta Y_i = Y_i^{\text{after}} - Y_i^{\text{before}}$ = change over the course of the experiment

The DID estimator differences out:
  • time-constant (level) differences, by computing each subject’s change $\Delta Y_i$ before and after the program;
  • time trends, by comparing treated and untreated before and after the program.

$$\hat{\beta}_1^{\text{diffs-in-diffs}} = \left(\bar{Y}^{\text{treat,after}} - \bar{Y}^{\text{treat,before}}\right) - \left(\bar{Y}^{\text{control,after}} - \bar{Y}^{\text{control,before}}\right) = \overline{\Delta Y}^{\text{treat}} - \overline{\Delta Y}^{\text{control}}$$


(from Stock and Watson)
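A toy numerical check of the DID formula uses hypothetical before/after values. The treated group starts 4 units higher, both groups share a common trend of +1, and the treatment adds +3. DID computed from the four group means and from the average changes ΔY_i gives the same answer. The simple before-after comparison and the post-period comparison do not.

```python
from statistics import fmean


def did_from_means(treat_before, treat_after, control_before, control_after):
    """(Ybar treat,after - Ybar treat,before) - (Ybar control,after - Ybar control,before)."""
    return ((fmean(treat_after) - fmean(treat_before))
            - (fmean(control_after) - fmean(control_before)))


def did_from_changes(treat_pairs, control_pairs):
    """Average change Delta Y_i = Y_after - Y_before, treated minus control."""
    delta_t = fmean([after - before for before, after in treat_pairs])
    delta_c = fmean([after - before for before, after in control_pairs])
    return delta_t - delta_c


# Hypothetical panel of (Y_before, Y_after) pairs
treated = [(9.0, 13.0), (10.0, 14.0), (11.0, 15.0)]
control = [(5.0, 6.0), (6.0, 7.0), (7.0, 8.0)]

tb, ta = [b for b, _ in treated], [a for _, a in treated]
cb, ca = [b for b, _ in control], [a for _, a in control]
print(did_from_means(tb, ta, cb, ca))       # 3.0
print(did_from_changes(treated, control))   # 3.0
print(fmean(ta) - fmean(tb))                # 4.0: before-after mixes effect and trend
print(fmean(ta) - fmean(ca))                # 7.0: post-period gap mixes effect and level
```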


Differences-in-Differences Estimator

Main assumption of DID:

Counterfactual LEVELS for treated and non-treated can be different, but they have the same TIME VARIATION (COMMON TREND assumption):
E(Y0(t1)-Y0(t0)|D=1) = E(Y0(t1)-Y0(t0)|D=0)
  • In the absence of the treatment, the change in the treated outcome would have been the same as the change in the non-treated outcome, i.e. changes in the economy or life-cycle that are unrelated to the treatment affect the two groups in a similar way.
  • What is NOT allowed are unobserved time-varying effects that affect the treatment and the control group differently.


Differences-in-Differences Estimator and Machine Learning

  • We can use machine learning to construct the counterfactuals.

  • DID can be combined with TTTC:
– TTTC builds a predictive model for the outcome in the absence of the treatment: it predicts the missing potential outcome under no treatment.
– When estimating DID, we can build a predictive model for the group that did not receive the treatment using the same 4-step process that we discussed for TTTC:
  • 1. TRAIN: use machine learning tools to tune parameters.
  • 2. TEST: apply the model to a test set to check how well it performs.
  • 3. TREAT: apply the model to the treated units to predict the counterfactual.
  • 4. COMPARE: compare what actually happened to the treated with the prediction given by the model of what would have happened without the treatment.


Differences-in-Differences Estimator: Summing Up

  • DID differences out time-constant differences between treatment and control, as well as time trends, by comparing treated and untreated before and after the treatment.
  • Validity assumption: COMMON TREND; absent treatment, the change in the treated outcome would have been the same as the change in the non-treated outcome.
  • What is NOT allowed are unobserved time-varying effects that affect treatment and control differently.
  • DID identifies the TTE; however, if the assignment to the treatment is random, we can also estimate the ATE.
  • Of course, as for the differences estimator, we can control for additional covariates, with the same advantages that we discussed.


Regression Analysis of Experiments for DID Estimator

  • Data requirement: to implement DID it is necessary to have panel data where each unit of analysis (individual, firm, state) is observed for at least two consecutive periods.

  • You have already studied panel data in your Microeconometrics course.


Brief Review of Panel Data

  • Panel data with k regressors:
{X1(it),…, Xk(it), Y(it)}, i=1,…, n (number of entities), t=1,…, T (number of time periods)

  • Another term for panel data is longitudinal data.
  • Balanced panel: no missing observations.
  • Unbalanced panel: some entities (units of analysis) are not observed for some time periods.

Why are Panel Data Useful?

Two main cases:

  • 1. Control for entity fixed effects: effects that vary across entities (units of analysis) but do not vary over time.

  • 2. Control for time fixed effects: effects that vary over time but do not vary across units of analysis.


Why are Panel Data Useful? Entity Fixed Effects

With panel data we can control for factors that:

  • Vary across entities (units of analysis), but do not vary over time.
  • Are unobserved or unmeasured, and therefore cannot be included in the regression.
  • Could cause omitted variable bias if they were omitted.
  • Example: Can alcohol taxes reduce traffic deaths? (Chapter 10 in Stock and Watson)


Panel Data and Omitted Variable Bias

  • Why are there more traffic deaths in States that have higher alcohol taxes?
  • Other factors determine the traffic fatality rate, such as:
– Density of cars on the road
– “Culture” around drinking and driving

  • These omitted factors could cause omitted variable bias.
  • Example: traffic density
1. High traffic density means more traffic deaths.
2. States with higher traffic density have higher alcohol taxes.
Both conditions for omitted variable bias are satisfied. Specifically, “high taxes” could reflect “high traffic density”, so the OLS coefficient would be biased upwards.

  • Panel data allow eliminating omitted variable bias when the omitted variables are constant over time within a given unit of analysis (State in the example here).


Consider the panel data model
FatalityRate(it) = a + b*BeerTax(it) + c*Z(i) + u(it),
where Z(i) is a factor that does not change over time (e.g. traffic density), at least during the years for which we have data. Suppose Z(i) is not observed, so its omission could result in omitted variable bias.

  • The effect of Z(i) can be eliminated using T=2 years.

  • Key idea: any change in the fatality rate from 1982 to 1988 cannot be caused by Z(i), because Z(i) (by assumption) does not change between 1982 and 1988.

  • Subtracting the 1982 equation from the 1988 equation eliminates c*Z(i):
FatalityRate(i1988) - FatalityRate(i1982) = b*[BeerTax(i1988) - BeerTax(i1982)] + u(i1988) - u(i1982)

  • Estimate the difference in fatality rate as a function of the difference in beer tax using OLS.
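The T=2 argument can be reproduced in a simulation. An unobserved, time-invariant Z(i) raises both the beer tax and the fatality rate, so pooled OLS on the levels is biased upward. OLS on the 1988-1982 changes removes Z(i), and its intercept absorbs any common shift between the two years. All numbers (500 States, a true effect of -0.5, a common year shift of 0.3) are made up and are not the Stock and Watson data.

```python
import random
from statistics import fmean


def ols_simple(x, y):
    """Intercept and slope of a simple OLS regression of y on x."""
    mx, my = fmean(x), fmean(y)
    slope = (sum((a - mx) * (c - my) for a, c in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope


random.seed(6)
n_states, true_b = 500, -0.5          # hypothetical values, not real data
tax = {1982: [], 1988: []}
fatal = {1982: [], 1988: []}
for _ in range(n_states):
    z = random.gauss(0.0, 1.0)        # unobserved, time-invariant Z(i)
    for year, shift in ((1982, 0.0), (1988, 0.3)):
        x = 1.0 + 0.8 * z + random.gauss(0.0, 0.5)   # high-Z States tax more
        tax[year].append(x)
        fatal[year].append(2.0 + shift + true_b * x + 1.5 * z
                           + random.gauss(0.0, 0.3))

# Pooled OLS on levels, omitting Z(i): biased upward
_, b_pooled = ols_simple(tax[1982] + tax[1988], fatal[1982] + fatal[1988])
# "Changes" specification: regress the 1988-1982 change in Y on the change in X
d_tax = [a - b for a, b in zip(tax[1988], tax[1982])]
d_fatal = [a - b for a, b in zip(fatal[1988], fatal[1982])]
intercept_fd, b_fd = ols_simple(d_tax, d_fatal)
print(b_pooled, b_fd)   # pooled slope is positive; changes slope is near -0.5
```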

Panel Data with T>2

  • What if you have more than 2 time periods (T>2)?

  • For i=1,…,n and t=1,…, T:
Y(it) = a + b*X(it) + c*Z(i) + u(it)
We can rewrite this in two useful ways:
1. “Fixed Effects” regression model: Y(it) = a(i) + b*X(it) + u(it), with a(i) = a + c*Z(i)
– The intercept a(i) is unique for each State; the slope b is the same in all States.
2. “n-1 binary regressors” regression model: Y(it) = a + b*X(it) + c2*D2(i) + c3*D3(i) + … + cn*Dn(i) + u(it), where D2(i)=1{i=2}, i.e. D2(i) equals 1 if the ith observation is from State 2 and 0 otherwise (and similarly for D3(i),…, Dn(i)).


Panel Data with T≥2


Three estimation methods:
1. “n-1 binary regressors” OLS regression
2. “Entity-demeaned” OLS regression
3. “Changes” specification (if and only if T=2)
  • These three methods produce identical estimates of the regression coefficients and identical standard errors.

Method 1. Y(it) = a + b*X(it) + c2*D2(i) + c3*D3(i) + … + cn*Dn(i) + u(it)
  • First create the binary variables D2(i),…, Dn(i).
  • Then estimate the above equation by OLS.
  • Inference (hypothesis tests, confidence intervals) is as usual (using heteroskedasticity-robust standard errors).
  • Impractical when n is very large.


Method 2. Y(it) = a(i) + b*X(it) + u(it)
  • First construct the demeaned variables, e.g. Y(it) - Ybar(i), where Ybar(i) is the average of Y over time for entity i (and similarly for X and u), so that a(i) drops out:
Y(it) - Ybar(i) = b*[X(it) - Xbar(i)] + [u(it) - ubar(i)]
  • Then estimate the above equation by OLS.
  • Inference (hypothesis tests, confidence intervals) is as usual (using heteroskedasticity-robust standard errors).
  • This is like the “changes” approach (method 3), but Y(it) is deviated from the State average instead of from Y(i1).
Estimation can be done easily in Stata:
  • “areg” automatically demeans the data (useful when n is large).
  • The reported intercept is the average of the n-1 dummy variables (no clear interpretation).
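To see that Methods 1 and 2 give the same slope, the sketch below estimates b on a small simulated panel in two ways. The first uses an intercept plus n-1 binary regressors; the second uses entity-demeaned data. The panel dimensions, the effect size and the correlation between X(it) and a(i) are assumptions. The equality of the two slopes is an algebraic result (the Frisch-Waugh-Lovell theorem), so it does not depend on these choices.

```python
import random
from statistics import fmean


def ols(X, y):
    """OLS via the normal equations, solved by Gauss-Jordan elimination."""
    k = len(X[0])
    aug = [[sum(r[i] * r[j] for r in X) for j in range(k)]
           + [sum(r[i] * v for r, v in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda row: abs(aug[row][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(k):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def within_estimator(panel):
    """Entity-demeaned OLS slope; panel maps entity -> list of (x, y)."""
    num = den = 0.0
    for obs in panel.values():
        x_bar = fmean([x for x, _ in obs])
        y_bar = fmean([y for _, y in obs])
        num += sum((x - x_bar) * (y - y_bar) for x, y in obs)
        den += sum((x - x_bar) ** 2 for x, _ in obs)
    return num / den


random.seed(7)
n, T, true_b = 6, 5, 1.5   # hypothetical: 6 States observed for 5 years
panel = {}
for i in range(n):
    a_i = random.gauss(0.0, 3.0)                  # entity effect a(i)
    panel[i] = []
    for _ in range(T):
        x = 0.5 * a_i + random.gauss(0.0, 1.0)    # X correlated with a(i)
        panel[i].append((x, a_i + true_b * x + random.gauss(0.0, 0.5)))

# Method 1: intercept, X and n-1 binary regressors D2(i), ..., Dn(i)
X_rows, y_vals = [], []
for i, obs in panel.items():
    for x, y in obs:
        X_rows.append([1.0, x] + [1.0 if i == j else 0.0 for j in range(1, n)])
        y_vals.append(y)
b_lsdv = ols(X_rows, y_vals)[1]

# Method 2: entity-demeaned OLS
b_within = within_estimator(panel)
print(b_lsdv, b_within)   # identical up to rounding
```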


Why are Panel Data Useful? Time Fixed Effects

An omitted variable might vary over time but not across units (e.g. States):

  • Safer cars (air bags, etc.); changes in national laws.
  • These produce intercepts that change over time.
  • Let these changes (“safer cars”) be denoted by the variable S(t), which changes over time but not across States.

  • The resulting population regression model is:
Y(it) = a + b*X(it) + c*S(t) + u(it)
  • The intercept varies from one year to the next: m(1982) = a + c*S(1982), and in general m(t) = a + c*S(t).
Again, two formulations for time fixed effects:
1. “Binary regressor” formulation: “T-1 binary regressors” OLS regression.
2. “Time effects” formulation: “year-demeaned” OLS regression (deviate Y(it) and X(it) from year averages, then estimate by OLS).


Time and Entity Fixed Effects, or “Back to DID”

Y(it) = a(t) + b*T(it) + m(i) + u(it), where T(it)=1 if in the treatment group and after the treatment, 0 otherwise

or

Y(it) = a + b*D(it)*Z(it) + c*Z(it) + d*D(it) + u(it), where:
  • D(it)=1 if in the treatment group, 0 otherwise
  • Z(it)=1 if in the “after” period, 0 in the “before” period
  • D(it)*Z(it)=1 if in the treatment group in the “after” period (interaction effect)

b is the DID estimator
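The claim that b is the DID estimator can be checked numerically. With the four (D, Z) cells, the second model is saturated, so the OLS coefficient on D(it)*Z(it) coincides with the 2x2 DID computed from the cell means. The data, group sizes and coefficient values below are simulated assumptions.

```python
import random
from statistics import fmean


def ols(X, y):
    """OLS via the normal equations, solved by Gauss-Jordan elimination."""
    k = len(X[0])
    aug = [[sum(r[i] * r[j] for r in X) for j in range(k)]
           + [sum(r[i] * v for r, v in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda row: abs(aug[row][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(k):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


random.seed(5)
rows = []   # (D, Z, Y)
for d in (0, 1):
    for z in (0, 1):
        for _ in range(200 + 100 * d):          # unequal group sizes
            y_it = 5.0 + 2.0 * d + 1.0 * z + 3.0 * d * z + random.gauss(0.0, 1.0)
            rows.append((d, z, y_it))

# Columns: intercept (a), D*Z (b), Z (c), D (d)
X = [[1.0, float(d * z), float(z), float(d)] for d, z, _ in rows]
Y = [y_it for _, _, y_it in rows]
a_hat, b_hat, c_hat, d_hat = ols(X, Y)


def cell_mean(d, z):
    """Mean outcome in the cell with treatment status d and period z."""
    return fmean([v for dd, zz, v in rows if dd == d and zz == z])


did_means = ((cell_mean(1, 1) - cell_mean(1, 0))
             - (cell_mean(0, 1) - cell_mean(0, 0)))
print(b_hat, did_means)   # the interaction coefficient equals the 2x2 DID
```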


Time and Entity Fixed Effects: Estimation

Different equivalent ways to allow for both entity and time fixed effects:

  • Differences and intercept (T=2 only)
  • Entity (or time) demeaning and T-1 time (or n-1 entity) indicators

  • T-1 time indicators and n-1 entity indicators
  • Entity and time demeaning
  • Under the fixed effects regression assumptions, which are basically extensions of the least squares assumptions, the OLS fixed effects estimator of b is normally distributed in large samples.

  • BUT there are difficulties associated with computing standard errors that do not come up with cross-sectional data.

  • In Appendix 1 and 2:
1. Fixed effects regression assumptions.
2. Standard errors for fixed effects regression.
3. Proof of consistency and normality of the fixed effects estimator.


Additions to DID

  • It is possible to use repeated cross-sections instead of panel data under certain conditions, e.g. stable group composition over time (see Meyer 1995, and Abadie 2005).
  • Caveats and extensions:
– Endogenous treatment (Besley/Case 2000). Example: the DID assumptions exclude the possibility that a State increases the alcohol tax because of a high rate of traffic fatalities in the past.
– Parallel trends conditional on X: trends can differ between treated and control groups if the distribution of X is different (Abadie 2005: “Semiparametric DID”), a mix of “diffs-in-diffs” and “matching” methods.
– Bertrand et al. (2004): propose a solution for the case in which residual autocorrelation over time is not accounted for, so that the variance may be underestimated (heteroskedasticity- and autocorrelation-consistent asymptotic variance).


Appendix 1: Fixed-Effects Regression Assumptions and Variance-Covariance Matrix


Fixed-Effects Regression Assumptions

Under these assumptions, the FE estimator is consistent and asymptotically normally distributed.


Assumption #2


Variance-Covariance Matrix


  • In general, we would like to allow the error terms to be correlated over time for a given entity, and this makes the formula for the asymptotic variance complicated.

  • You can also allow for heteroskedasticity. Then you can compute the “heteroskedasticity- and autocorrelation-consistent asymptotic variance”.

  • You can also compute “clustered standard errors” when there is a grouping, or “cluster”, within which the error term is possibly correlated, but outside of which (across groups) it is not. For example, you can allow for correlation of the errors for individuals within the same family but not between families.
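The sketch below shows why clustering matters, for a simple regression with an intercept. The heteroskedasticity-robust variance squares each observation's score x̃·û separately. The cluster-robust variance first sums the scores within each cluster and then squares the sums. The simulated "families" share both a regressor component and an error component, which is where the two estimates differ most. No small-sample correction is applied here, although software packages typically apply one. The data-generating values are assumptions.

```python
import random
from statistics import fmean


def slope_with_ses(x, y, cluster):
    """OLS slope of y on x (with intercept) plus HC0 and CR0 standard errors."""
    mx, my = fmean(x), fmean(y)
    xt = [xi - mx for xi in x]                        # demeaned regressor
    sxx = sum(v * v for v in xt)
    slope = sum(v * (yi - my) for v, yi in zip(xt, y)) / sxx
    resid = [(yi - my) - slope * v for v, yi in zip(xt, y)]   # OLS residuals
    # Heteroskedasticity-robust: every observation is its own "cluster"
    var_hc0 = sum((v * e) ** 2 for v, e in zip(xt, resid)) / sxx ** 2
    # Cluster-robust: add up scores within each cluster before squaring
    scores = {}
    for g, v, e in zip(cluster, xt, resid):
        scores[g] = scores.get(g, 0.0) + v * e
    var_cr0 = sum(s * s for s in scores.values()) / sxx ** 2
    return slope, var_hc0 ** 0.5, var_cr0 ** 0.5


random.seed(8)
x, y, cluster = [], [], []
for g in range(200):                   # 200 hypothetical families (clusters)
    x_g = random.gauss(0.0, 1.0)       # family-level part of the regressor
    v_g = random.gauss(0.0, 1.0)       # family-level error shared by members
    for _ in range(10):                # 10 members per family
        xi = x_g + random.gauss(0.0, 0.3)
        x.append(xi)
        y.append(1.0 + 0.5 * xi + v_g + random.gauss(0.0, 1.0))
        cluster.append(g)

b, se_robust, se_cluster = slope_with_ses(x, y, cluster)
print(b, se_robust, se_cluster)   # clustered SE is noticeably larger here
```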


Variance-Covariance Matrix - special case: no correlation across time within entities


Case 1: Serial Correlation

  • Heteroskedasticity- and autocorrelation-consistent asymptotic variance


Case 2: No Serial Correlation


Case 3: No Serial Correlation and No Heteroskedasticity


Appendix 2: Proofs of Consistency and Normality of Fixed Effects Estimator


Consistency of Fixed Effects Estimator


Normality of Fixed Effects Estimator
