Panel Data Analysis Part II — Feasible Estimators James J. Heckman University of Chicago Econ 312 This draft, May 26, 2006

1 Feasible GLS estimation

How to do feasible GLS? Feasible GLS estimation follows a two-step process:

(1) Perform the OLS regression, fit $\hat\beta$, and form the residuals $\hat\varepsilon_{it}$.
(2) Estimate $\sigma_f^2$ and $\sigma_U^2$ using the estimated residuals.

Step 2 of the above procedure is done as follows. With $\hat\varepsilon_{it} = y_{it} - x_{it}\hat\beta$, take the person means

$$\hat f_i \equiv \hat\varepsilon_{i\cdot} = \frac{1}{T}\sum_{t=1}^{T}\hat\varepsilon_{it},$$

which estimate $f_i + \frac{1}{T}\sum_{t=1}^{T} U_{it}$.

Then we have (using the assumptions that $E[U_{it}U_{i't'}] = 0$ for $(i,t) \neq (i',t')$ and $E[U_{it}f_i] = 0$)

$$E\left[\frac{\sum_{i=1}^{I}(\hat f_i)^2}{I-1}\right] = \sigma_f^2 + \frac{\sigma_U^2}{T}. \qquad (*)$$

$\hat f_i$ is unbiased for $f_i$ but not consistent. Why? For $T$ fixed, the averaged noise $\frac{1}{T}\sum_t U_{it}$ does not vanish, so we cannot generate consistent estimates.

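Impose the restriction $\sum_{i=1}^{I}\hat f_i = 0$. Then

$$E\left[\sum_{i=1}^{I}\sum_{t=1}^{T}(\hat\varepsilon_{it} - \hat\varepsilon_{i\cdot})^2\right] = [I(T-1) - (K-1)]\,\sigma_U^2, \qquad (**)$$

where $\hat\varepsilon_{i\cdot} = \frac{1}{T}\sum_{t=1}^{T}\hat\varepsilon_{it}$ and $K$ counts the regressors, including the constant.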

We use the residuals from the regression to estimate $\sigma_U^2$:

$$\hat\sigma_U^2 = \frac{\sum_{i=1}^{I}\sum_{t=1}^{T}(\hat\varepsilon_{it} - \hat\varepsilon_{i\cdot})^2}{I(T-1) - (K-1)}.$$

Then from equations $(*)$ and $(**)$ we have

$$\hat\sigma_f^2 = \frac{\sum_{i=1}^{I}(\hat f_i)^2}{I-1} - \frac{\hat\sigma_U^2}{T}$$

(but this may go negative: if so, set $\hat\sigma_f^2$ at zero). Alternatively, simply use

$$\frac{\sum_{i=1}^{I}(\hat f_i)^2}{I-1}$$

to estimate $\sigma_f^2$. It is always positive, and it is consistent as $T \to \infty$ (for fixed $T$ it overstates $\sigma_f^2$ by $\sigma_U^2/T$).
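The two-step procedure can be sketched numerically. The following is a minimal illustration, assuming a balanced panel stored as flat NumPy arrays with integer person labels `ids` running from 0 to I-1. The function names and the quasi-demeaning form of the final GLS step are choices made for this sketch, not something specified in the notes.

```python
import numpy as np

def person_means(a, ids, T):
    """Person-level means of each column of `a`, repeated for every row."""
    a2 = a.reshape(len(a), -1)
    m = np.stack([np.bincount(ids, weights=a2[:, k])
                  for k in range(a2.shape[1])], axis=1) / T
    return m[ids].reshape(a.shape)

def variance_components(y, X, ids, T):
    """Steps (1)-(2): pooled OLS, then sigma_U^2 and sigma_f^2 from residuals.

    X must contain a constant column, so the person means of the
    residuals sum to zero automatically in a balanced panel.
    """
    I, K = ids.max() + 1, X.shape[1]
    beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ beta_ols                        # composite residuals
    f_hat = np.bincount(ids, weights=e) / T     # person means of residuals
    within_ss = ((e - f_hat[ids]) ** 2).sum()
    s2_u = within_ss / (I * (T - 1) - (K - 1))
    # The difference estimator can go negative, so truncate at zero.
    s2_f = max((f_hat ** 2).sum() / (I - 1) - s2_u / T, 0.0)
    return s2_u, s2_f

def fgls(y, X, ids, T):
    """Feasible GLS via quasi-demeaning with the estimated components."""
    s2_u, s2_f = variance_components(y, X, ids, T)
    theta = 1.0 - np.sqrt(s2_u / (s2_u + T * s2_f))
    y_star = y - theta * person_means(y, ids, T)
    X_star = X - theta * person_means(X, ids, T)
    return np.linalg.lstsq(X_star, y_star, rcond=None)[0]
```

When the truncation binds (`s2_f` equals zero), `theta` is zero and the sketch collapses to pooled OLS.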

• Assuming normality for the error term, we get that the FGLS estimator is unbiased, i.e.:

$$E(\hat\beta_{\text{FGLS}}) = E(\hat\beta_{\text{GLS}}) = \beta.$$

• Further, by a theorem of Taylor, there is only a 17% reduction in efficiency from estimating the error covariance matrix $\Sigma$ rather than knowing it.

Mundlak Problem: An example of using panel data to ferret out and identify a cross-sectional relationship.

Mundlak posed the problem that $x_{it}$ is correlated with $f_i$ (e.g., $f_i$ is managerial ability; $x_{it}$ is inputs):

$$E(f_i\,x_{it}) \neq 0,$$

and we have a specification error bias:

$$E[\hat\beta_{\text{OLS}}] = \beta + E\left[(x'x)^{-1}x'f\right],$$

where the second term is $\neq 0$ since $E(x'f) \neq 0$.

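OLS is inconsistent and biased. One way to eliminate the problem is to use the within estimator:

$$\left[I - \frac{\iota\iota'}{T}\right]y_i = \left[I - \frac{\iota\iota'}{T}\right]x_i\beta + \left[I - \frac{\iota\iota'}{T}\right]U_i,$$

$$y_{it} - y_{i\cdot} = [x_{it} - x_{i\cdot}]'\beta + U_{it} - U_{i\cdot},$$

where $\iota$ is a $T \times 1$ vector of ones, so the transformation sweeps out $f_i$.

On the transformed data, we get an estimator of $\beta$ that is unbiased and consistent. The estimator of the fixed effect is not consistent (we acquire an incidental parameters problem, but we can eliminate it: as $T \to \infty$ with $I$ fixed, we get a consistent estimator).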
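A small simulation makes the contrast between pooled OLS and the within estimator concrete. This is only a sketch under an assumed design: a scalar regressor built as `rho` times the person effect plus noise, unit variances, and a true slope of 1. All function names and parameter values are hypothetical.

```python
import numpy as np

def simulate_mundlak(I=2000, T=4, beta=1.0, rho=0.8, seed=0):
    """y_it = beta*x_it + f_i + U_it, with x_it = rho*f_i + noise (assumed design)."""
    rng = np.random.default_rng(seed)
    ids = np.repeat(np.arange(I), T)
    f = rng.normal(size=I)                       # person effect, e.g. ability
    x = rho * f[ids] + rng.normal(size=I * T)    # inputs correlated with f_i
    y = beta * x + f[ids] + rng.normal(size=I * T)
    return y, x, ids

def pooled_ols(y, x):
    """Slope from a pooled regression of y on a constant and x."""
    xc, yc = x - x.mean(), y - y.mean()
    return (xc @ yc) / (xc @ xc)

def within(y, x, ids, T):
    """Slope from the regression on deviations from person means."""
    yw = y - (np.bincount(ids, weights=y) / T)[ids]
    xw = x - (np.bincount(ids, weights=x) / T)[ids]
    return (xw @ yw) / (xw @ xw)

y, x, ids = simulate_mundlak()
b_ols = pooled_ols(y, x)
b_within = within(y, x, ids, T=4)
print(f"pooled OLS: {b_ols:.3f}   within: {b_within:.3f}   (true beta = 1)")
```

In this design the pooled OLS slope converges to $\beta + \rho/(\rho^2 + 1)$, roughly 1.49 for the default values, while the within slope converges to $\beta$.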

Notice, however, that if there exists a variable that stays constant over the spell for all persons, we cannot estimate the associated $\beta$. In that case

$$\hat f_i = f_i + x_i^{c}\beta^{c},$$

where $x_i^{c}$ and $\beta^{c}$ are the variables and associated coefficients that stay fixed over spells. (We can regress the estimated fixed effects on the $x_i^{c}$, provided that they stay constant over the spell and are not correlated with the $f_i$.)

• In a cross section context, without some other information the model is not identified unless we can invoke IV estimation.
• The F.E. estimator is a conditional version of the R.E. estimator. In the R.E. estimator, $f_i$ and $U_{it}$ are both random; in the F.E. estimator we condition on the values of $f_i$.

How To Test For the Presence of Bias?

$H_0$: No bias in the OLS estimator.
$H_A$: The OLS and between estimators are biased; the within estimator is unbiased.

Compare $\hat\beta_W$ vs. $\hat\beta_B$:

$$\hat\beta_W = \beta + (W_{xx})^{-1}\sum_{i=1}^{I} x_i'\left[I - \frac{1}{T}\iota\iota'\right]\varepsilon_i,$$

$$\hat\beta_B = \beta + (B_{xx})^{-1}\left[\sum_{i=1}^{I} x_i'\,\frac{1}{T}\iota\iota'\,\varepsilon_i\right].$$

Under $H_0$,

$$\mathrm{COV}(\hat\beta_W, \hat\beta_B) = 0.$$

The two estimators are independently distributed under a normality assumption $\Rightarrow$ we can test (just pool the standard errors).
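One way to carry out the comparison in the scalar case is sketched below. It assumes a balanced panel and forms the squared difference between the within and between slopes divided by the sum of their estimated variances, which is referred to a chi-square distribution with one degree of freedom. The function names, the simulated design, and the 5% critical value constant are illustrative choices, not part of the notes.

```python
import numpy as np

CHI2_1_CRIT_5PCT = 3.841  # 5% critical value, chi-square with 1 d.f.

def within_between_test(y, x, ids, T):
    """Return (b_within, b_between, test statistic) for a scalar regressor."""
    I = ids.max() + 1
    y_bar = np.bincount(ids, weights=y) / T
    x_bar = np.bincount(ids, weights=x) / T
    # Within: deviations from person means.
    yw, xw = y - y_bar[ids], x - x_bar[ids]
    b_w = (xw @ yw) / (xw @ xw)
    s2_w = ((yw - b_w * xw) ** 2).sum() / (I * (T - 1) - 1)
    var_w = s2_w / (xw @ xw)
    # Between: person means in deviations from the grand mean.
    yb, xb = y_bar - y_bar.mean(), x_bar - x_bar.mean()
    b_b = (xb @ yb) / (xb @ xb)
    s2_b = ((yb - b_b * xb) ** 2).sum() / (I - 2)
    var_b = s2_b / (xb @ xb)
    # Zero covariance under H0, so the variances simply add.
    stat = (b_w - b_b) ** 2 / (var_w + var_b)
    return b_w, b_b, stat

def simulate(I=2000, T=4, beta=1.0, rho=0.8, seed=0):
    """Assumed design: x_it = rho*f_i + noise, y_it = beta*x_it + f_i + U_it."""
    rng = np.random.default_rng(seed)
    ids = np.repeat(np.arange(I), T)
    f = rng.normal(size=I)
    x = rho * f[ids] + rng.normal(size=I * T)
    y = beta * x + f[ids] + rng.normal(size=I * T)
    return y, x, ids

y, x, ids = simulate(rho=0.8)
b_w, b_b, stat = within_between_test(y, x, ids, T=4)
print("reject H0 at 5%:", stat > CHI2_1_CRIT_5PCT)
```

Setting `rho=0.0` in `simulate` gives data that satisfy $H_0$, in which case the two slopes should be close to each other.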

Strict Exogeneity Test

Basic idea: check whether

$$E(f_i \mid x_i) \neq 0, \quad \text{where } x_i = (x_{i1}, \ldots, x_{iT}).$$

A nonzero conditional mean is a failure of strict exogeneity, in the language of the time series literature.

Regression Function (Scalar Case)

$$E^*(f_i \mid x_i) = \lambda_0 + x_{i1}\lambda_1 + x_{i2}\lambda_2 + x_{i3}\lambda_3 + \cdots$$

where $E^*$ denotes linear projection. Then, in Mundlak's problem, we get for $t = 1, \ldots, T$:

$$y_{it} = \beta_0 + x_{it}\beta_1 + [\lambda_0 + x_{i1}\lambda_1 + x_{i2}\lambda_2 + \cdots] + \omega_{it},$$

where $\omega_{it}$ collects $U_{it}$ and the projection error.

Then we can test whether future and past values of $x_{it}$ enter the equation [if so, we get a violation of strict exogeneity in this setup].

Notice that we can estimate $\lambda_2$ (from the first equation), $\lambda_1$ (from the second equation), and so forth $\Rightarrow$ we can estimate $\beta_1$ [but we cannot separate out the intercepts in this equation, nor can we identify the coefficients on variables that don't vary over time].

This is just a control function in the sense of Heckman and Robb (1985, 1986).

Chamberlain's Strict Exogeneity Test

$$y_{it} = x_{it}\beta + \varepsilon_{it}, \quad i = 1, \ldots, I, \quad t = 1, \ldots, T.$$

$x_{it}$ is strictly exogenous if $E(\varepsilon_{it} \mid x_i) = 0$ $\Rightarrow$ the model can be fitted by OLS.

We can test this as in time series, by adding an extraneous variable (a future value of $x$):

$$y_{it} = x_{it}\beta + x_{i,t+j}\gamma + \varepsilon_{it}, \quad i = 1, \ldots, I, \quad t = 1, \ldots, T.$$

We have strict exogeneity in the process if $\gamma = 0$ (assumption: $x_{it}$ is correlated over time with $x_{i,t+j}$). A future value of a variable is in the equation (where it doesn't belong) $\Rightarrow$ we can do an exact test.

Consider a special error structure (one-factor setup):

$$\varepsilon_{it} = f_i + U_{it}, \quad U_{it} \text{ i.i.d.},$$

$$E^*(f_i \mid x_{i1}, x_{i2}, x_{i3}, \ldots, x_{iT}) = \sum_{s=1}^{T}\lambda_s x_{is}.$$

Then if we relax the strict exogeneity assumption, we have that

$$E^*(y_{it} \mid x_{it}, [x_{i1}, \ldots, x_{iT}]) = x_{it}\beta + \sum_{s=1}^{T}\lambda_s x_{is}, \qquad E^*(\varepsilon_{it} \mid x_i) = x_i\lambda.$$

Array the $y_{it}$ into a supervector $y_i = (y_{i1}, \ldots, y_{iT})'$, with $x_i$ stacked the same way:

$$E^*(y_i \mid x_i) = \Pi x_i, \qquad \Pi = \mathrm{DIAG}\{\beta, \ldots, \beta\} + \iota\lambda'$$

$\Rightarrow$ in all $T$ regressions, $\lambda$ stays fixed $\Rightarrow$ we can test this assumption.

When applying this test in particular economic situations, we must interpret the results with caution. For example, in the application of this test to the permanent income hypothesis, significant coefficients on future values cannot be ruled out under the model.
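The stacked regressions can be illustrated by estimating every period's equation by OLS on all periods' regressors and inspecting the resulting coefficient matrix. The sketch below assumes a scalar regressor, a common slope, and a person effect that is linear in the regressors plus noise. The names `estimate_pi` and `beta_from_pi`, and the parameter values, are hypothetical.

```python
import numpy as np

def simulate(I=20000, beta=1.0, lam=(0.5, 0.3, 0.2), seed=0):
    """Assumed design: f_i = sum_s lam_s*x_is + v_i, y_it = beta*x_it + f_i + U_it."""
    rng = np.random.default_rng(seed)
    lam = np.asarray(lam)
    X = rng.normal(size=(I, lam.size))
    f = X @ lam + rng.normal(size=I)
    Y = beta * X + f[:, None] + rng.normal(size=X.shape)
    return Y, X

def estimate_pi(Y, X):
    """Row t holds the OLS coefficients of y_t on (x_1, ..., x_T)."""
    Z = np.column_stack([np.ones(len(X)), X])
    coef = np.linalg.lstsq(Z, Y, rcond=None)[0]   # shape (T+1, T), one column per y_t
    return coef[1:].T                              # drop intercepts, transpose

def beta_from_pi(Pi):
    """Diagonal minus the average off-diagonal entry in the same column."""
    T = Pi.shape[0]
    lam_hat = (Pi.sum(axis=0) - np.diag(Pi)) / (T - 1)
    return float(np.mean(np.diag(Pi) - lam_hat))

Y, X = simulate()
Pi_hat = estimate_pi(Y, X)
print(np.round(Pi_hat, 2))
print("implied beta:", round(beta_from_pi(Pi_hat), 2))
```

Under the assumed structure each column of the estimated matrix is constant off the diagonal, which is the restriction that $\lambda$ stays fixed across the $T$ regressions.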

Example: Chamberlain test with T = 3 periods

Simple regression setting with $\varepsilon_{it} = f_i + U_{it}$, $U_{it}$ i.i.d. (the person subscript $i$ is suppressed below). We have:

$$f = \lambda_1 x_1 + \lambda_2 x_2 + \lambda_3 x_3 + v.$$

Then

$$y_1 = \beta_1 x_1 + \lambda_1 x_1 + \lambda_2 x_2 + \lambda_3 x_3 + v + U_1$$
$$y_2 = \beta_2 x_2 + \lambda_1 x_1 + \lambda_2 x_2 + \lambda_3 x_3 + v + U_2$$
$$y_3 = \beta_3 x_3 + \lambda_1 x_1 + \lambda_2 x_2 + \lambda_3 x_3 + v + U_3$$

For a factor structure, $\varepsilon_{it} = \alpha_t f_i + U_{it}$, $U_{it}$ i.i.d. Then:

$$y_1 = \beta_1 x_1 + \alpha_1(\lambda_1 x_1 + \lambda_2 x_2 + \lambda_3 x_3) + \alpha_1 v + U_1$$
$$y_2 = \beta_2 x_2 + \alpha_2(\lambda_1 x_1 + \lambda_2 x_2 + \lambda_3 x_3) + \alpha_2 v + U_2$$
$$y_3 = \beta_3 x_3 + \alpha_3(\lambda_1 x_1 + \lambda_2 x_2 + \lambda_3 x_3) + \alpha_3 v + U_3$$

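Can identify

$$\begin{pmatrix} \beta_1 + \alpha_1\lambda_1 & \alpha_1\lambda_2 & \alpha_1\lambda_3 \\ \alpha_2\lambda_1 & \beta_2 + \alpha_2\lambda_2 & \alpha_2\lambda_3 \\ \alpha_3\lambda_1 & \alpha_3\lambda_2 & \beta_3 + \alpha_3\lambda_3 \end{pmatrix}.$$

Normalize: set $\alpha_1 \equiv 1$; then we can identify $\alpha_2$, $\alpha_3$, $\lambda_1$, $\lambda_2$ and $\lambda_3$.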
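The identification argument can be checked mechanically. The sketch below takes a 3 x 3 coefficient matrix whose (t, s) entry is the loading for period t times the projection coefficient for period s, plus a period-specific slope on the diagonal, and backs out the parameters under the normalization that the first loading equals one. The function name and the numerical values are made up for illustration, and the recovery assumes the entries it divides by are nonzero.

```python
import numpy as np

def identify_factor_model(Pi):
    """Recover (alpha, lam, beta) from Pi = diag(beta) + outer(alpha, lam), alpha_1 = 1."""
    lam2, lam3 = Pi[0, 1], Pi[0, 2]   # row 1 off-diagonals, since alpha_1 = 1
    alpha2 = Pi[1, 2] / lam3          # alpha_2 * lam_3 divided by lam_3
    lam1 = Pi[1, 0] / alpha2          # alpha_2 * lam_1 divided by alpha_2
    alpha3 = Pi[2, 0] / lam1          # alpha_3 * lam_1 divided by lam_1
    alpha = np.array([1.0, alpha2, alpha3])
    lam = np.array([lam1, lam2, lam3])
    beta = np.diag(Pi) - alpha * lam  # diagonal entries net of alpha_t * lam_t
    return alpha, lam, beta

# Hypothetical parameter values used only to build a test matrix.
alpha0 = np.array([1.0, 0.5, 2.0])
lam0 = np.array([0.4, 0.3, 0.2])
beta0 = np.array([1.0, 1.5, 2.0])
Pi = np.diag(beta0) + np.outer(alpha0, lam0)

alpha, lam, beta = identify_factor_model(Pi)
print(alpha, lam, beta)
```

The entry `Pi[2, 1]` is never used in the recovery. The matrix has nine entries and there are eight free parameters after the normalization, so that entry supplies one overidentifying restriction.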

2 Maximum likelihood panel data estimators

Considering these models from a more general viewpoint, we can form different maximum likelihood estimators of the parameters of interest. Assume

$$\varepsilon_{it} = f_i + U_{it}.$$

Write $z_i = (y_{i1}, \ldots, y_{iT}, x_{i1}, \ldots, x_{iT})$, $i = 1, \ldots, I$.

$z_i$ is an i.i.d. random vector with distribution depending on

$$\tilde\theta = (\theta, f_1, \ldots, f_I) = (\theta, f)$$

(treat $f_i$ as a parameter). The likelihood is

$$\mathcal{L} = \prod_{i=1}^{I} g(y_i \mid \theta, x_{i1}, \ldots, x_{iT}, f_i).$$

Maximize $\mathcal{L}$ w.r.t. $\tilde\theta$ $\Rightarrow$ $(\hat\theta, \hat f)$.

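$\hat f_i \nrightarrow f_i$ as $I \to \infty$ with $T$ fixed. In general, $\hat\theta \nrightarrow \theta$ as $I \to \infty$ because of this. This is not like linear models (in general, the roots of these equations are interconnected and we have problems). The likelihood equations form a joint system:

$$\frac{\partial \ln \mathcal{L}}{\partial \theta} = 0, \qquad \frac{\partial \ln \mathcal{L}}{\partial f_i} = 0, \quad i = 1, \ldots, I.$$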
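A standard textbook illustration of this inconsistency, which is not taken from these notes, is the Neyman-Scott problem: each person has their own mean and all persons share one variance. The sketch below assumes normal data and shows that the joint maximum likelihood estimate of the common variance settles near $\sigma^2 (T-1)/T$ rather than $\sigma^2$ as the number of persons grows with $T$ fixed. Function names and parameter values are hypothetical.

```python
import numpy as np

def simulate(I, T, sigma2=1.0, seed=0):
    """y_it = f_i + U_it with U_it ~ N(0, sigma2); the f_i are arbitrary."""
    rng = np.random.default_rng(seed)
    f = rng.normal(scale=3.0, size=(I, 1))
    return f + rng.normal(scale=np.sqrt(sigma2), size=(I, T))

def joint_mle_sigma2(Y):
    """Joint MLE of sigma^2 when every f_i is treated as a parameter."""
    f_hat = Y.mean(axis=1, keepdims=True)       # MLE of each f_i
    return float(((Y - f_hat) ** 2).mean())     # divides by I*T, not I*(T-1)

for I in (100, 10_000, 100_000):
    print(I, round(joint_mle_sigma2(simulate(I, T=2)), 3))
```

With $T = 2$ and $\sigma^2 = 1$ the limit is 0.5, so adding more persons does not remove the bias. Only a larger $T$ does.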

This set of likelihood equations can be solved using three distinct concepts:

1. Marginal Likelihood;
2. Conditional Likelihood; and
3. Integrated Likelihood.
