SLIDE 1
TESTING AND CORRECTING FOR ENDOGENEITY IN NONLINEAR UNOBSERVED EFFECTS MODELS
IAAE Lecture, 21st International Panel Data Conference
Central European University, Budapest, June 30, 2015
Jeff Wooldridge, Michigan State University
SLIDE 2
- 1. Introduction
- 2. Linear Model
- 3. Exponential Model
- 4. Probit Response Function
- 5. Empirical Example
- 6. Extensions and Future Directions
SLIDE 3
∙ In unobserved effects models we can think of two kinds of
endogeneity for an explanatory variable:
- 1. Correlation with unobserved effect(s) (time constant)
- 2. Correlation with innovations (time varying)
SLIDE 4
∙ Application to linear model: Levitt (1996, QJE), effects of
prison size on violent crime.
∙ Nonlinear models: Can combine the correlated random effects (CRE) and control function (CF) approaches for certain nonlinear models.
SLIDE 5
∙ Papke and Wooldridge (2008, J of E) provides one approach. Simple but not ideal for testing purposes: It conflates the two kinds of endogeneity.
∙ Application to effects of spending on school/district test
pass rates.
∙ Here focus on continuous endogenous explanatory variables (EEVs), but some suggestions for discrete EEVs.
SLIDE 6
∙ Other Approaches:
- 1. “Fixed Effects” (Heterogeneity as parameters to estimate): Incidental parameters problem with small T. Bias adjustments available for parameters and average partial effects, but usually stationarity and weak dependence (even independence) are assumed. Difficult to incorporate time effects. Endogenous explanatory variables?
SLIDE 7
- 2. Conditional MLE: Only works in special cases. Relies on conditional independence across time. Partial effects in nonlinear models often unidentified. Extensions to EEVs?
- 3. Finite Number of Types (Bonhomme and Manresa): Conditional Independence. Nonlinear Models? Extension to EEVs?
SLIDE 8
| | Ideal | FE | CMLE | CRE |
|---|---|---|---|---|
| Restricts $D(c_i|x_i)$? | No | No | No | Yes |
| Incidental Parameters with Small T? | No | Yes | No | No |
| Restricts Time Series Dependence/Heterogeneity? | No | Yes [1] | Yes [2] | No |
| Restricts Amount of Heterogeneity? | No | No [3] | Yes | No |
| APEs Identified? | Yes | Yes [4] | No | Yes |
| Unbalanced Panels? | Yes | Yes | Yes | Yes [5] |
| Can Estimate $D(c_i)$? | Yes | Yes [4] | No | Yes [6] |
| Endogenous Explanatory Variables? | Yes | Yes [4] | No [7] | Yes |
SLIDE 9
- 1. The large T approximations assume weak dependence and often stationarity.
- 2. Usually conditional independence, unless estimator is inherently fully robust (linear, Poisson).
- 3. Need at least one more time period than sources of heterogeneity.
- 4. Subject to the incidental parameters problem.
- 5. Subject to exchangeability restrictions.
- 6. Under conditional independence or some other restriction.
- 7. Unless one makes parametric assumptions on the reduced form and imposes conditional
independence.
SLIDE 10
∙ Consider a “structural” equation
$$y_{it1} = x_{it1}\beta_1 + c_{i1} + u_{it1},$$
where $x_{it1} = (z_{it1}, y_{it2})$.
∙ The outside instruments are $z_{it2}$.
∙ $z_{it1}$ can include time effects, but we suppress them.
SLIDE 11
∙ Both $z_{it}$ and $y_{it2}$ may be correlated with $c_{i1}$.
∙ Assume $z_{it}$ is strictly exogenous with respect to $u_{it1}$:
$$\mathrm{Cov}(z_{it}, u_{ir1}) = 0, \quad \text{all } t, r = 1, \ldots, T.$$
∙ $y_{it2}$ may be correlated with $u_{it1}$ (across all time periods).
∙ Given a rank condition, $\beta_1$ can be estimated by fixed effects IV (FEIV).
SLIDE 12
∙ How can we test the null hypothesis that $y_{it2}$ is exogenous with respect to $u_{it1}$?
∙ Hausman test comparing FE and FEIV. Cumbersome due to deficient rank. Original Hausman test not robust to serial correlation or heteroskedasticity in $u_{it1}$.
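SLIDE 13
∙ Variable Addition Test (Control Function):
- 1. Estimate the reduced form of $y_{it2}$,
$$y_{it2} = z_{it}\delta_2 + c_{i2} + u_{it2},$$
by fixed effects, and obtain the FE residuals,
$$\ddot{u}_{it2} = \ddot{y}_{it2} - \ddot{z}_{it}\hat{\delta}_2, \qquad \ddot{y}_{it2} = y_{it2} - T^{-1}\sum_{r=1}^{T} y_{ir2}.$$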
SLIDE 14
- 2. Estimate
$$y_{it1} = x_{it1}\beta_1 + \ddot{u}_{it2}\rho_1 + c_{i1} + \mathrm{error}_{it1}$$
by usual FE and compute a robust test of $H_0: \rho_1 = 0$.
∙ In step (2), $\hat{\beta}_1$ is the FEIV estimator.
∙ Note that the nature of $y_{it2}$ is unrestricted (discrete, continuous, both features).
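The two-step variable addition test can be sketched on simulated data. This is a minimal illustration, not the lecture's code: the data-generating process, the sample sizes, and the helper names `ols` and `demean` are all assumptions made for the example, and the robust (cluster) standard error needed for the actual test of the coefficient on the residuals is omitted. It checks the algebraic claim that the coefficients on the structural regressors from the control function regression coincide with FEIV computed as within-2SLS.

```python
import random

def ols(X, y):
    """OLS coefficients from the normal equations (Gauss-Jordan elimination)."""
    k = len(X[0])
    A = [[sum(r[a] * r[b] for r in X) for b in range(k)]
         + [sum(r[a] * v for r, v in zip(X, y))] for a in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [u - f * w for u, w in zip(A[r], A[c])]
    return [A[a][k] / A[a][a] for a in range(k)]

def demean(v, T):
    """Within transformation; v is sorted by unit, then time (balanced panel)."""
    out = []
    for s in range(0, len(v), T):
        m = sum(v[s:s + T]) / T
        out.extend(x - m for x in v[s:s + T])
    return out

# Simulated balanced panel; every number below is an illustrative assumption.
random.seed(0)
N, T = 200, 4
z1, z2, y2, y1 = [], [], [], []
for i in range(N):
    c1, c2 = random.gauss(0, 1), random.gauss(0, 1)
    for t in range(T):
        a = 0.5 * c1 + random.gauss(0, 1)      # included exogenous variable
        b = 0.5 * c2 + random.gauss(0, 1)      # outside instrument
        u2 = random.gauss(0, 1)
        u1 = 0.6 * u2 + random.gauss(0, 1)     # endogeneity through innovations
        w = 0.5 * a + b + c2 + 0.5 * c1 + u2   # reduced form for y2
        z1.append(a); z2.append(b); y2.append(w)
        y1.append(a + 0.5 * w + c1 + u1)       # structural equation

dz1, dz2, dy2, dy1 = (demean(v, T) for v in (z1, z2, y2, y1))

# Step 1: FE reduced form for y2, then the FE residuals.
Z = list(zip(dz1, dz2))
d2 = ols(Z, dy2)
fit2 = [d2[0] * p + d2[1] * q for p, q in Z]
u2dd = [v - f for v, f in zip(dy2, fit2)]

# Step 2: FE regression of y1 on (z1, y2) with the residuals added.
b_z1, b_y2, rho = ols(list(zip(dz1, dy2, u2dd)), dy1)

# FEIV computed directly as within-2SLS, for comparison.
iv_z1, iv_y2 = ols(list(zip(dz1, fit2)), dy1)

print(f"CF: beta_y2={b_y2:.4f}, rho={rho:.4f}; FEIV: beta_y2={iv_y2:.4f}")
```

The equality holds because the FE residuals are orthogonal to the demeaned instruments, so adding them to the regression reproduces the second stage of 2SLS on the within-transformed data.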
SLIDE 15
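∙ We can also use a correlated random effects approach, but some care is needed to get a proper test.
∙ The Mundlak equation for $y_{it2}$ is
$$y_{it2} = \psi_2 + z_{it}\delta_2 + \bar{z}_i\xi_2 + a_{i2} + u_{it2}, \qquad \bar{z}_i = T^{-1}\sum_{t=1}^{T} z_{it}.$$
∙ We are operating as if
$$\mathrm{Cov}(z_{it}, u_{is2}) = 0, \ \text{all } t, s; \qquad \mathrm{Cov}(z_{it}, a_{i2}) = 0, \ \text{all } t.$$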
SLIDE 16
∙ Key: How should we apply the Mundlak device to $c_{i1}$ in
$$y_{it1} = x_{it1}\beta_1 + c_{i1} + u_{it1}?$$
∙ Projecting $c_{i1}$ only onto $\bar{z}_i$ is fine for estimation.
∙ For testing, it does not distinguish between $\mathrm{Cov}(y_{it2}, c_{i1}) \neq 0$ and $\mathrm{Cov}(y_{it2}, u_{is1}) \neq 0$.
SLIDE 17
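∙ Better is to project $c_{i1}$ onto $(\bar{z}_i, \bar{v}_{i2})$, where
$$y_{it2} = \psi_2 + z_{it}\delta_2 + \bar{z}_i\xi_2 + v_{it2},$$
$$c_{i1} = \eta_1 + \bar{z}_i\lambda_1 + \bar{v}_{i2}\pi_1 + a_{i1},$$
$$\mathrm{Cov}(z_i, a_{i1}) = 0, \qquad \mathrm{Cov}(y_{i2}, a_{i1}) = 0.$$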
SLIDE 18
∙ Plugging in gives the estimating equation
$$y_{it1} = x_{it1}\beta_1 + \eta_1 + \bar{z}_i\lambda_1 + \bar{v}_{i2}\pi_1 + a_{i1} + u_{it1} = x_{it1}\beta_1 + \psi_1 + \bar{z}_i\xi_1 + \bar{y}_{i2}\pi_1 + a_{i1} + u_{it1}.$$
∙ By the Mundlak device, $a_{i1}$ is uncorrelated with all RHS variables.
∙ By assumption, $z_i$ is uncorrelated with $u_{it1}$.
∙ Now test whether $y_{it2}$, equivalently $v_{it2}$, is uncorrelated with $u_{it1}$.
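The step from time averages of the reduced form errors to time averages of $y_{it2}$ can be written out by averaging the Mundlak reduced form over $t$. The symbols here are assumed labels, since the Greek letters are not recoverable from the slide text: $\psi_2, \delta_2, \xi_2$ for the reduced form intercept and coefficients on $z_{it}$ and $\bar{z}_i$, and $\eta_1, \lambda_1, \pi_1$ for the projection of $c_{i1}$ on $(1, \bar{z}_i, \bar{v}_{i2})$.

```latex
\begin{align*}
\bar{v}_{i2} &= \bar{y}_{i2} - \psi_2 - \bar{z}_i(\delta_2 + \xi_2), \\
c_{i1} &= \eta_1 + \bar{z}_i\lambda_1 + \bar{v}_{i2}\pi_1 + a_{i1} \\
       &= (\eta_1 - \psi_2\pi_1) + \bar{z}_i\,[\lambda_1 - (\delta_2 + \xi_2)\pi_1]
          + \bar{y}_{i2}\pi_1 + a_{i1} \\
       &\equiv \psi_1 + \bar{z}_i\xi_1 + \bar{y}_{i2}\pi_1 + a_{i1}.
\end{align*}
```

Only the intercept and the coefficient on $\bar{z}_i$ are relabeled; the coefficient on $\bar{y}_{i2}$ is the same $\pi_1$ that multiplies $\bar{v}_{i2}$.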
SLIDE 19
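- 1. Run a pooled OLS regression (or use random effects),
$$y_{it2} = \psi_2 + z_{it}\delta_2 + \bar{z}_i\xi_2 + v_{it2},$$
and obtain the residuals, $\hat{v}_{it2}$.
- 2. Estimate
$$y_{it1} = x_{it1}\beta_1 + \psi_1 + \bar{z}_i\xi_1 + \bar{y}_{i2}\pi_1 + \hat{v}_{it2}\rho_1 + \mathrm{error}_{it1}$$
by POLS or RE.
- 3. Use a robust Wald test of $H_0: \rho_1 = 0$.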
SLIDE 20
∙ Algebra:
(i) $\hat{\beta}_1$ in step (2) is the FEIV estimator.
(ii) $\hat{\rho}_1$ is identical to that from estimating
$$y_{it1} = x_{it1}\beta_1 + \ddot{u}_{it2}\rho_1 + c_{i1} + \mathrm{error}_{it1}$$
by FE.
∙ Result (i) still holds if $\bar{y}_{i2}$ is dropped from the estimating equation, but (ii) does not.
SLIDE 21
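∙ Conclusion: In using the CRE/CF approach for testing $H_0: \mathrm{Cov}(y_{it2}, u_{is1}) = 0$, use the equations
$$y_{it2} = \hat{\psi}_2 + z_{it}\hat{\delta}_2 + \bar{z}_i\hat{\xi}_2 + \hat{v}_{it2},$$
$$y_{it1} = x_{it1}\beta_1 + \psi_1 + \bar{z}_i\xi_1 + \bar{y}_{i2}\pi_1 + \hat{v}_{it2}\rho_1 + \mathrm{error}_{it1}.$$
∙ Also works in the unbalanced case when the complete cases are used (Joshi and Wooldridge, 2015).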
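The algebraic equivalences behind this conclusion can be checked numerically. The sketch below is an illustration under assumed names and an assumed simulated design in which the endogenous variable is correlated with the heterogeneity but not with the idiosyncratic error. It compares three regressions: pooled OLS with the time averages of the instruments and of the endogenous variable plus the Mundlak residuals, the same regression with the time average of the endogenous variable dropped, and the FE control function regression.

```python
import random

def ols(X, y):
    """OLS coefficients from the normal equations (Gauss-Jordan elimination)."""
    k = len(X[0])
    A = [[sum(r[a] * r[b] for r in X) for b in range(k)]
         + [sum(r[a] * v for r, v in zip(X, y))] for a in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [u - f * w for u, w in zip(A[r], A[c])]
    return [A[a][k] / A[a][a] for a in range(k)]

def tmean(v, T):
    """Unit time averages, repeated T times (data sorted by unit, then time)."""
    out = []
    for s in range(0, len(v), T):
        out.extend([sum(v[s:s + T]) / T] * T)
    return out

def demean(v, T):
    return [a - m for a, m in zip(v, tmean(v, T))]

# Assumed design: y2 loads on c1, but u1 is independent of u2.
random.seed(0)
N, T = 200, 4
z1, z2, y2, y1 = [], [], [], []
for i in range(N):
    c1, c2 = random.gauss(0, 1), random.gauss(0, 1)
    for t in range(T):
        a = 0.5 * c1 + random.gauss(0, 1)
        b = 0.5 * c2 + random.gauss(0, 1)
        w = 0.5 * a + b + c2 + c1 + random.gauss(0, 1)
        z1.append(a); z2.append(b); y2.append(w)
        y1.append(a + 0.5 * w + c1 + random.gauss(0, 1))

one = [1.0] * (N * T)
z1b, z2b, y2b = (tmean(v, T) for v in (z1, z2, y2))

# Step 1: Mundlak reduced form by pooled OLS; keep the residuals.
X2 = list(zip(one, z1, z2, z1b, z2b))
g = ols(X2, y2)
v2hat = [y - sum(gj * xj for gj, xj in zip(g, r)) for r, y in zip(X2, y2)]

# Step 2: pooled OLS including the time averages of z and of y2.
cre = ols(list(zip(one, z1, y2, z1b, z2b, y2b, v2hat)), y1)
beta_cre, rho_cre = cre[2], cre[6]

# Same regression with the time average of y2 dropped.
drop = ols(list(zip(one, z1, y2, z1b, z2b, v2hat)), y1)
beta_drop, rho_drop = drop[2], drop[5]

# FE control function regression for comparison.
dz1, dz2, dy2, dy1 = (demean(v, T) for v in (z1, z2, y2, y1))
d = ols(list(zip(dz1, dz2)), dy2)
u2dd = [y - d[0] * p - d[1] * q for y, p, q in zip(dy2, dz1, dz2)]
_, beta_fe, rho_fe = ols(list(zip(dz1, dy2, u2dd)), dy1)

print(f"beta: CRE={beta_cre:.4f} drop={beta_drop:.4f} FE={beta_fe:.4f}")
print(f"rho:  CRE={rho_cre:.4f} drop={rho_drop:.4f} FE={rho_fe:.4f}")
```

In this design the coefficient on the residuals from the regression without the time average of the endogenous variable also reflects correlation with the heterogeneity, which is the sense in which that version conflates the two kinds of endogeneity.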
SLIDE 22
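∙ What about using Chamberlain in place of Mundlak?
∙ Reusing notation, with $z_i = (z_{i1}, \ldots, z_{iT})$ and $y_{i2} = (y_{i12}, \ldots, y_{iT2})$,
$$y_{it2} = \hat{\psi}_2 + z_{it}\hat{\delta}_2 + z_i\hat{\xi}_2 + \hat{v}_{it2},$$
$$y_{it1} = x_{it1}\beta_1 + \psi_1 + z_i\xi_1 + y_{i2}\pi_1 + \hat{v}_{it2}\rho_1 + \mathrm{error}_{it1}.$$
∙ Estimates of $\beta_1$ and $\rho_1$ are identical to Mundlak, as is the robust Wald test (POLS or RE).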
SLIDE 23
∙ Conclusion: Include time averages of $z_{it}$ and $y_{it2}$ to obtain a clean test of endogeneity of $y_{it2}$ with respect to $u_{it1}$.
∙ If POLS or RE are used, Chamberlain = Mundlak.
∙ All goes through with time effects.
∙ Time constant variables can be included in the CRE/CF approach.
SLIDE 24
∙ Ignoring the pre-testing problem, a strategy for testing is:
- 1. If the VAT rejects, use FEIV.
Or, then test REIV against FEIV, as instruments may be “super” exogenous.
- 2. If the VAT fails to reject, use FE or compare RE and FE.
SLIDE 25
∙ Fully robust test for exogeneity:
- 1. Estimate the reduced form for $y_{it2}$ by fixed effects and obtain the FE residuals,
$$\ddot{u}_{it2} = \ddot{y}_{it2} - \ddot{z}_{it}\hat{\delta}_2.$$
- 2. Use FE Poisson on the mean function
“$E(y_{it1}|z_{it1}, y_{it2}, \ddot{u}_{it2}, c_{i1}) = c_{i1}\exp(x_{it1}\beta_1 + \ddot{u}_{it2}\rho_1)$”
and use a robust Wald test of $H_0: \rho_1 = 0$.
SLIDE 26
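∙ The null hypothesis is
$$H_0: E(y_{it1}|z_i, y_{i2}, \ddot{u}_{it2}, c_{i1}) = E(y_{it1}|z_{it1}, y_{it2}, c_{i1}) = c_{i1}\exp(x_{it1}\beta_1).$$
∙ Algebra: The Poisson FE estimates of $(\beta_1, \rho_1)$ are unchanged if we estimate the Mundlak reduced form
$$y_{it2} = \psi_2 + z_{it}\delta_2 + \bar{z}_i\xi_2 + v_{it2}$$
and use residuals, $\hat{v}_{it2}$ ($\hat{v}_{it2} \neq \ddot{u}_{it2}$ but $\ddot{v}_{it2} = \ddot{u}_{it2}$).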
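The invariance of the FE Poisson estimates to the choice of residuals can be illustrated numerically. The sketch below is an assumption-laden illustration, not the lecture's implementation: it simulates a small panel, maximizes the conditional (multinomial) log-likelihood by plain Newton steps with no step-size control, and uses hypothetical helper names throughout. Because the Mundlak residuals differ from the FE residuals only by a unit-specific constant, which the conditional likelihood absorbs, the two sets of estimates should agree up to rounding.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    k = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(k):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [u - f * w for u, w in zip(M[r], M[c])]
    return [M[a][k] / M[a][a] for a in range(k)]

def ols(X, y):
    k = len(X[0])
    A = [[sum(r[a] * r[b] for r in X) for b in range(k)] for a in range(k)]
    return solve(A, [sum(r[a] * v for r, v in zip(X, y)) for a in range(k)])

def tmean(v, T):
    """Unit time averages, repeated T times (data sorted by unit, then time)."""
    out = []
    for s in range(0, len(v), T):
        out.extend([sum(v[s:s + T]) / T] * T)
    return out

def demean(v, T):
    return [a - m for a, m in zip(v, tmean(v, T))]

def rpois(mu):
    """Poisson draw by Knuth's multiplication method (adequate for small mu)."""
    limit, k, p = math.exp(-mu), 0, random.random()
    while p > limit:
        k += 1
        p *= random.random()
    return k

def fe_poisson(X, y, T, iters=30):
    """Newton iterations on the conditional (multinomial) log-likelihood."""
    k = len(X[0])
    b = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for s in range(0, len(y), T):
            rows, ys = X[s:s + T], y[s:s + T]
            idx = [sum(bj * xj for bj, xj in zip(b, r)) for r in rows]
            top = max(idx)
            w = [math.exp(v - top) for v in idx]
            tot, n = sum(w), sum(ys)
            xbar = [sum(wi * r[j] for wi, r in zip(w, rows)) / tot
                    for j in range(k)]
            for r, yi, wi in zip(rows, ys, w):
                d = [r[j] - xbar[j] for j in range(k)]
                for j in range(k):
                    grad[j] += yi * d[j]
                    for l in range(k):
                        info[j][l] += n * wi / tot * d[j] * d[l]
        b = [bj + sj for bj, sj in zip(b, solve(info, grad))]
    return b

# Simulated panel; all parameter values are illustrative assumptions.
random.seed(1)
N, T = 300, 4
z1, z2, y2, y1 = [], [], [], []
for i in range(N):
    c1, c2 = random.gauss(0, 1), random.gauss(0, 1)
    for t in range(T):
        a = 0.5 * c1 + random.gauss(0, 1)
        b = 0.5 * c2 + random.gauss(0, 1)
        u2 = random.gauss(0, 1)
        u1 = 0.3 * u2 + 0.3 * random.gauss(0, 1)
        w = 0.5 * a + b + c2 + u2
        z1.append(a); z2.append(b); y2.append(w)
        y1.append(rpois(math.exp(0.3 * a + 0.2 * w + 0.3 * c1 + u1)))

# Mundlak residuals from pooled OLS of y2 on (1, z, zbar).
X2 = list(zip([1.0] * (N * T), z1, z2, tmean(z1, T), tmean(z2, T)))
g = ols(X2, y2)
v2hat = [y - sum(gj * xj for gj, xj in zip(g, r)) for r, y in zip(X2, y2)]

# FE residuals from the within regression of y2 on z.
dz1, dz2, dy2 = demean(z1, T), demean(z2, T), demean(y2, T)
d = ols(list(zip(dz1, dz2)), dy2)
u2dd = [y - d[0] * p - d[1] * q for y, p, q in zip(dy2, dz1, dz2)]

b_mundlak = fe_poisson(list(zip(z1, y2, v2hat)), y1, T)
b_fe = fe_poisson(list(zip(z1, y2, u2dd)), y1, T)
print("Mundlak residuals:", [round(v, 4) for v in b_mundlak])
print("FE residuals:     ", [round(v, 4) for v in b_fe])
```

The third element of each coefficient list is the coefficient on the residuals; a robust Wald test of that coefficient would still require a cluster-robust variance estimate, which this sketch does not compute.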
SLIDE 27
∙ For testing, no restrictions are put on the RF of $y_{it2}$.
∙ When would the Mundlak/FE Poisson approach be consistent for $\beta_1$?
∙ Assume that
$$E(y_{it1}|z_i, y_{i2}, c_{i1}, u_{it1}) = E(y_{it1}|z_{it1}, y_{it2}, c_{i1}, u_{it1}) = c_{i1}\exp(x_{it1}\beta_1 + u_{it1}),$$
where $x_{it1}$ can be any function of $(z_{it1}, y_{it2})$.
SLIDE 28
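∙ The reduced form is
$$y_{it2} = \psi_2 + z_{it}\delta_2 + \bar{z}_i\xi_2 + a_{i2} + u_{it2}, \qquad v_{it2} = a_{i2} + u_{it2}.$$
∙ Sufficient is
$$u_{it1} = u_{it2}\rho_1 + e_{it1} = v_{it2}\rho_1 - a_{i2}\rho_1 + e_{it1},$$
with $e_{it1}$ independent of $(z_i, c_{i1}, c_{i2}, u_{i2})$.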
SLIDE 29
∙ This appears to impose a restriction of only contemporaneous correlation between $u_{it1}$ and $u_{it2}$. How important is it?
∙ In the linear case it makes no difference.
SLIDE 30
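∙ Is consistently estimating $\beta_1$ enough?
∙ The average structural function (Blundell and Powell, 2003) is identified:
$$\mathrm{ASF}(x_{t1}) = E_{(c_{i1},u_{it1})}[c_{i1}\exp(x_{t1}\beta_1 + u_{it1})] = E_{(c_{i1},u_{it1})}[c_{i1}\exp(u_{it1})]\exp(x_{t1}\beta_1).$$
∙ But the average partial effects are not generally identified.
∙ For a continuous $x_{t1j}$,
$$\mathrm{APE}_{tj} = \beta_{1j}E_{(x_{it1},c_{i1},u_{it1})}[c_{i1}\exp(x_{it1}\beta_1 + u_{it1})].$$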
SLIDE 31
∙ A CRE/CF approach identifies both. What could it look like?
∙ At least two choices. Let
$$E(y_{it1}|z_{it1}, y_{it2}, c_{i1}, u_{it1}) = c_{i1}\exp(x_{it1}\beta_1 + u_{it1}),$$
$$v_{it1} = c_{i1}\exp(u_{it1}),$$
$$y_{it2} = \psi_2 + z_{it}\delta_2 + \bar{z}_i\xi_2 + v_{it2}.$$
- 1. Use Mundlak (or Chamberlain) on
$$D(v_{it1}|z_i, v_{it2}) = D(v_{it1}|\bar{z}_i, v_{it2})$$
(Papke and Wooldridge, 2008).
SLIDE 32
∙ Uses weaker exogeneity requirements (but not in linear
model).
∙ Cannot use GLS-type methods; pooled methods or GMM
from moment conditions.
∙ Does not lead to cleanest test of endogeneity.
SLIDE 33
- 2. Use Mundlak (or Chamberlain) on
$$D(v_{it1}|z_i, v_{i2}) = D(v_{it1}|\bar{z}_i, \bar{v}_{i2}, v_{it2}) = D(v_{it1}|\bar{z}_i, \bar{y}_{i2}, v_{it2}).$$
∙ Strict exogeneity is assumed, so GLS-type procedures can be used.
∙ Separates endogeneity of $y_{it2}$ with respect to $c_{i1}$ and $u_{it1}$.
∙ Because of the linear index structure, we can use the Mundlak residuals or FE residuals for the RF of $y_{it2}$.
SLIDE 34
∙ In the simplest case, the estimating equation is
$$E(y_{it1}|z_i, y_{i2}) = \exp(\psi_1 + x_{it1}\beta_1 + \bar{z}_i\xi_1 + \bar{y}_{i2}\pi_1 + v_{it2}\rho_1)$$
and the Mundlak or FE residuals are plugged in for $v_{it2}$.
∙ One can apply GLS (“generalized estimating equations”) methods, or GMM.
∙ Unlike in the linear case, no equivalence between CRE and FE Poisson estimates.
∙ Wooldridge (1991)/Windmeijer (2000) moments approach is an alternative.
SLIDE 35
- 4. Probit Response Function
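∙ Now consider a probit conditional mean for $y_{it1} \in [0, 1]$:
$$E(y_{it1}|z_i, y_{i2}, r_{it1}) = E(y_{it1}|z_{it1}, y_{it2}, r_{it1}) = \Phi(x_{it1}\beta_1 + r_{it1}).$$
∙ Thinking of
$$r_{it1} = c_{i1} + u_{it1}.$$
∙ For continuous EEVs,
$$y_{it2} = \psi_2 + z_{it}\delta_2 + \bar{z}_i\xi_2 + v_{it2}.$$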
SLIDE 36
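∙ Assume $v_{it2}$ is independent of $z_i$.
∙ Key assumption:
$$D(r_{it1}|z_i, v_{i2}) = D(r_{it1}|\bar{z}_i, \bar{v}_{i2}, v_{it2}) = D(r_{it1}|\bar{z}_i, \bar{y}_{i2}, v_{it2}).$$
∙ Leading case: homoskedastic normal with linear mean:
$$r_{it1}|(\bar{z}_i, \bar{y}_{i2}, v_{it2}) \sim \mathrm{Normal}(\psi_1 + \bar{z}_i\xi_1 + \bar{y}_{i2}\pi_1 + v_{it2}\rho_1, 1).$$
(Variance normalization has no effect on ASF or APEs.)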
SLIDE 37
∙ Then
$$E(y_{it1}|z_i, y_{i2}) = \Phi(x_{it1}\beta_1 + \psi_1 + \bar{z}_i\xi_1 + \bar{y}_{i2}\pi_1 + v_{it2}\rho_1)$$
and two-step procedures are immediate:
- 1. Obtain $\hat{v}_{it2}$ by pooled OLS.
- 2. With $\hat{v}_{it2}$ in place of $v_{it2}$, use pooled (fractional) probit.
∙ Can use a quasi-GLS procedure (GEE) because of strict exogeneity.
∙ Can test $H_0: \rho_1 = 0$ using a robust Wald test.
SLIDE 38
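∙ ASF is identified from
$$\mathrm{ASF}(x_{t1}) = E_{(\bar{z}_i,\bar{y}_{i2},v_{it2})}[\Phi(x_{t1}\beta_1 + \psi_1 + \bar{z}_i\xi_1 + \bar{y}_{i2}\pi_1 + v_{it2}\rho_1)],$$
$$\widehat{\mathrm{ASF}}(x_{t1}) = N^{-1}\sum_{i=1}^{N}\Phi(x_{t1}\hat{\beta}_1 + \hat{\psi}_1 + \bar{z}_i\hat{\xi}_1 + \bar{y}_{i2}\hat{\pi}_1 + \hat{v}_{it2}\hat{\rho}_1).$$
∙ APEs:
$$\beta_{1j}E_{(x_{it1},\bar{z}_i,\bar{y}_{i2},v_{it2})}[\phi(x_{it1}\beta_1 + \psi_1 + \bar{z}_i\xi_1 + \bar{y}_{i2}\pi_1 + v_{it2}\rho_1)].$$
∙ Stata margins command gets the correct estimates, but does not adjust inference for two-step estimation.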
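Once second-step coefficients are in hand, the ASF and APE estimates are sample averages of the normal cdf and pdf. The sketch below is a minimal illustration with hypothetical function names and made-up coefficient values; the estimation step itself (pooled fractional probit) is assumed to have been done elsewhere. Each row of `ctrl` holds one unit's controls for the period of interest: its time-averaged instruments, time-averaged endogenous variable, and first-stage residual.

```python
import math

def Phi(a):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))

def phi(a):
    """Standard normal pdf."""
    return math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)

def control_index(ctrl, psi, gamma):
    """Intercept plus the linear index in the controls, one value per unit."""
    return [psi + sum(g * c for g, c in zip(gamma, row)) for row in ctrl]

def asf(x, beta, ctrl, psi, gamma):
    """Average of Phi(x*beta + control index) over units, holding x fixed."""
    xb = sum(b * v for b, v in zip(beta, x))
    idx = control_index(ctrl, psi, gamma)
    return sum(Phi(xb + h) for h in idx) / len(idx)

def ape(j, X, beta, ctrl, psi, gamma):
    """beta[j] times the average of phi(x_i*beta + control index)."""
    idx = control_index(ctrl, psi, gamma)
    dens = [phi(sum(b * v for b, v in zip(beta, x)) + h)
            for x, h in zip(X, idx)]
    return beta[j] * sum(dens) / len(dens)

# Hypothetical inputs: three units, one covariate, three controls per unit.
ctrl = [(0.2, 1.0, -0.3), (0.5, 0.8, 0.1), (-0.1, 1.2, 0.4)]
X = [[0.9], [1.1], [1.0]]
beta, psi, gamma = [0.8], -0.5, [0.3, -0.2, 0.1]

print("ASF at x = 1.0:", asf([1.0], beta, ctrl, psi, gamma))
print("APE of x:      ", ape(0, X, beta, ctrl, psi, gamma))
```

Standard errors for these averages would need to account for the first-stage estimation, for example by a panel bootstrap over units; that is not shown here.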
SLIDE 39
∙ Lots of useful embellishments. For example,
$$\mathrm{Var}(r_{it1}|\bar{z}_i, \bar{y}_{i2}, v_{it2}) = \exp[2(\bar{z}_i\omega_1 + \bar{y}_{i2}\tau_1 + v_{it2}\nu_1)].$$
∙ Run a heteroskedastic “probit” with mean function depending on $(1, x_{it1}, \bar{z}_i, \bar{y}_{i2}, \hat{v}_{it2})$ and variance function depending on $(\bar{z}_i, \bar{y}_{i2}, \hat{v}_{it2})$.
SLIDE 40
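∙ The ASF is consistently estimated as
$$\widehat{\mathrm{ASF}}(x_{t1}) = N^{-1}\sum_{i=1}^{N}\Phi\left[\frac{x_{t1}\hat{\beta}_1 + \hat{\psi}_1 + \bar{z}_i\hat{\xi}_1 + \bar{y}_{i2}\hat{\pi}_1 + \hat{v}_{it2}\hat{\rho}_1}{\exp(\bar{z}_i\hat{\omega}_1 + \bar{y}_{i2}\hat{\tau}_1 + \hat{v}_{it2}\hat{\nu}_1)}\right].$$
∙ Stata 14 does the estimation with fracreg (pooled estimation).
∙ Stata margins (should) give the correct estimates for the APEs because $x_{it1}$ appears only in the “mean” function.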
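With a variance function, the ASF estimate divides each unit's mean index by the exponentiated variance index before applying the normal cdf. The following sketch assumes estimated coefficients are already available; the function name `asf_het` and all numerical values are hypothetical.

```python
import math

def Phi(a):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))

def asf_het(x, beta, ctrl, psi, gamma, omega):
    """Average Phi(mean index / exp(variance index)) over units for fixed x.

    ctrl[i] holds unit i's controls (time averages and first-stage residual);
    gamma are their mean-function coefficients, omega their variance-function
    coefficients.
    """
    xb = sum(b * v for b, v in zip(beta, x))
    total = 0.0
    for row in ctrl:
        mean_idx = xb + psi + sum(g * c for g, c in zip(gamma, row))
        log_sd = sum(w * c for w, c in zip(omega, row))
        total += Phi(mean_idx / math.exp(log_sd))
    return total / len(ctrl)

# Hypothetical inputs: three units, one covariate, three controls per unit.
ctrl = [(0.2, 1.0, -0.3), (0.5, 0.8, 0.1), (-0.1, 1.2, 0.4)]
beta, psi = [0.8], -0.5
gamma, omega = [0.3, -0.2, 0.1], [0.1, 0.0, -0.2]

print("Heteroskedastic ASF at x = 1.0:",
      asf_het([1.0], beta, ctrl, psi, gamma, omega))
```

Setting every element of `omega` to zero makes the denominator one, so the function reduces to the homoskedastic average of the normal cdf.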
SLIDE 41
∙ In the spirit of Blundell and Powell (2004, REStud), one can directly use flexible functional forms (squares, interactions, higher order terms) in $(x_{it1}, \bar{z}_i, \bar{y}_{i2}, \hat{v}_{it2})$ and compute partial effects with respect to $x_{it1}$. Average out the other variables.
∙ As in Altonji and Matzkin (2005, Econometrica), functions other than time averages can be used. Can use nonexchangeable functions, too.
SLIDE 42
∙ Papke and Wooldridge (2008), Michigan School Reform.
∙ math4, a pass rate, is a fractional response.
∙ “Foundation Allowance” as an IV for average district spending.
∙ Other controls: enrollment, poverty rate, year effects.
∙ Uses a kinked relationship. IV is strong.
∙ $N = 501$, $T = 7$
SLIDE 43
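| Model: | Linear | Linear | FProbit | FProbit | FProbit | FProbit |
|---|---|---|---|---|---|---|
| Estimation: | FE | FEIV | PQMLE | PQMLE | PQMLE | PQMLE |
| | Coef | Coef | Coef | APE | Coef | APE |
| lavgrexp | .377 | .420 | .821 | .277 | .797 | .269 |
| | (.071) | (.115) | (.334) | (.112) | (.338) | (.114) |
| $\hat{v}_2$ | — | −.060 | .076 | — | −.666 | — |
| | | (.146) | (.145) | | (.396) | |
| $\overline{lavgrexp}$ included? | — | — | Yes | Yes | No | No |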
SLIDE 44
- 6. Extensions and Future Directions
∙ Extension to discrete EEVs. Lose identification without
strong assumptions. Can combine CRE and use one-step pooled quasi-MLE (“bivariate probit” is an example, as in Wooldridge (2014, J
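Use generalized residuals as the CFs, such as
$$gr_{it2} = y_{it2}\lambda(w_{it}\hat{\theta}_2) - (1 - y_{it2})\lambda(-w_{it}\hat{\theta}_2),$$
where $\lambda(\cdot) = \phi(\cdot)/\Phi(\cdot)$ is the inverse Mills ratio, when $y_{it2}$ follows a reduced form probit.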
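The probit generalized residual is simple to compute once the first-stage index is fitted. A minimal sketch follows, with hypothetical function names; the fitted index is assumed to come from a reduced form probit estimated elsewhere.

```python
import math

def inv_mills(a):
    """Inverse Mills ratio: standard normal pdf divided by cdf."""
    pdf = math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * math.erfc(-a / math.sqrt(2.0))  # cdf via erfc for tail accuracy
    return pdf / cdf

def generalized_residual(y, index):
    """Probit generalized residual for binary y and fitted index."""
    return y * inv_mills(index) - (1 - y) * inv_mills(-index)

# Hypothetical fitted indices and outcomes for four observations.
indices = [0.0, 0.7, -1.2, 2.0]
outcomes = [1, 0, 0, 1]
residuals = [generalized_residual(y, a) for y, a in zip(outcomes, indices)]
print(residuals)
```

Conditional on the regressors, the generalized residual has mean zero when the probit is correctly specified, which is what makes it usable as a control function. For indices far in the lower tail (below roughly -38) the cdf underflows to zero in double precision, so a production version would need an asymptotic approximation there.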
SLIDE 45
∙ Unbalanced panels. Condition on time averages and
number of time periods and use complete cases (in reduced forms and CREs).
∙ Dynamic models with heterogeneity and EEVs (Giles and
Murtazashvili, 2013, JEM).
∙ Resiliency to model misspecification? (CRE functions, control functions, and semiparametrics.)