 
Slide Set 5
CLRM: sample properties of OLS

Pietro Coretto, pcoretto@unisa.it
Econometrics, Master in Economics and Finance (MEF)
Università degli Studi di Napoli “Federico II”
Version: Saturday 28th December, 2019 (h16:05)

By finite-sample we mean $n < +\infty$, that is, a finite sample size.

Note: the terminology “finite sample” means a sample size $n$ that is not “too large”. When $n$ is “large enough”, the asymptotic regime takes over, but this will be the object of the second part of the course.

We will investigate properties of both $b$ and $s^2$.

A1–A4 are the assumptions from the previous slide sets.
Finite sample properties of $b$

Proposition (unbiasedness of $b$). Assume A1, A2 and A3. Then $\mathrm{E}[b \mid X] = \beta$, that is, $b$ is an unbiased estimator of $\beta$ conditional on $X$.

Proof. First note that, by the linearity of expectations, $\mathrm{E}[b \mid X] = \beta$ whenever $\mathrm{E}[b - \beta \mid X] = 0$. Recall the decomposition of the estimation error
\[
b - \beta = (X'X)^{-1} X' \varepsilon ,
\]
then
\[
\mathrm{E}[b - \beta \mid X] = \mathrm{E}\left[ (X'X)^{-1} X' \varepsilon \mid X \right] .
\]
Pull out what's known and obtain
\[
\mathrm{E}[b - \beta \mid X] = (X'X)^{-1} X' \, \mathrm{E}[\varepsilon \mid X] = 0 . \qquad \square
\]

Note that this is a conditional unbiasedness statement, and it is stronger than the more traditional unconditional statement. It is stronger because, by the law of iterated expectations,
\[
\mathrm{E}[b \mid X] = \beta \implies \mathrm{E}\big[\mathrm{E}[b \mid X]\big] = \mathrm{E}[b] = \beta .
\]
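The unbiasedness statement can be illustrated with a small simulation. The sketch below is not part of the original slides: it assumes NumPy, i.i.d. normal errors (one convenient way to satisfy $\mathrm{E}[\varepsilon \mid X] = 0$), and hypothetical values for `n`, `K`, `beta` and `sigma`. The design matrix is drawn once and held fixed, which mimics conditioning on $X$.

```python
import numpy as np

rng = np.random.default_rng(0)

n, K = 50, 3
beta = np.array([1.0, -2.0, 0.5])   # hypothetical true coefficients
sigma = 1.0                          # hypothetical error standard deviation

# Draw the design matrix once and keep it fixed: we condition on X.
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])

# A = (X'X)^{-1} X', so that b = A y.
A = np.linalg.solve(X.T @ X, X.T)

R = 5000                                     # number of simulated samples
eps = rng.normal(scale=sigma, size=(R, n))   # each row is one draw of the error vector
Y = X @ beta + eps                           # each row is one simulated y
B = Y @ A.T                                  # each row is the OLS estimate for one sample

mean_b = B.mean(axis=0)
print("true beta    :", beta)
print("average of b :", mean_b)
```

Averaging $b$ over many error draws with the same $X$ approximates $\mathrm{E}[b \mid X]$, so the two printed vectors should be close.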
Proposition (variance-covariance of $b$). Assume A1, A2, A3 and A4. Then the following hold:
(a) $\mathrm{Var}[b \mid X] = \sigma^2 (X'X)^{-1}$
(b) (Gauss-Markov theorem) Let $b_0$ be any estimator of $\beta$ that is linear in $y$ and unbiased. Then $\mathrm{Var}[b_0 \mid X] \succeq \mathrm{Var}[b \mid X]$
(c) $\mathrm{Cov}[b, e \mid X] = 0$

Recall that $\succeq$ means “bigger or equal” for matrices: $B \succeq C$ when $B - C$ is positive semidefinite (PSD). Because of (b), the OLS estimator is also called BLUE = Best Linear Unbiased Estimator. Best here is in terms of mean square error (MSE): since $b$ is unbiased, its MSE is equal to its variance.

Proof of part (a). Since $\beta$ is not a random variable, its variance is zero, so
\[
\begin{aligned}
\mathrm{Var}[b \mid X] &= \mathrm{Var}[b - \beta \mid X] \\
&= \mathrm{Var}\left[ (X'X)^{-1} X' \varepsilon \mid X \right] \\
&= \mathrm{Var}[A \varepsilon \mid X], \quad \text{for } A = (X'X)^{-1} X' \\
&= A \, \mathrm{Var}[\varepsilon \mid X] \, A' \\
&= (X'X)^{-1} X' \, \mathrm{Var}[\varepsilon \mid X] \, X (X'X)^{-1} \qquad (*) \\
&= (X'X)^{-1} X' (\sigma^2 I_n) X (X'X)^{-1} \\
&= \sigma^2 (X'X)^{-1} (X'X) (X'X)^{-1} \\
&= \sigma^2 (X'X)^{-1} . \qquad \square
\end{aligned}
\]
(*) This is because, for an invertible matrix $B$, we have $(B')^{-1} = (B^{-1})'$. Together with the symmetry of $X'X$, this gives $A' = X (X'X)^{-1}$.
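Part (a) can be checked numerically in two ways: exactly, by evaluating the sandwich $A \, \mathrm{Var}[\varepsilon \mid X] \, A'$, and approximately, by simulating the estimation error $b - \beta = A\varepsilon$. This is only a sketch under stated assumptions: NumPy, i.i.d. normal errors with a hypothetical variance `sigma2`, and a randomly generated design matrix held fixed.

```python
import numpy as np

rng = np.random.default_rng(1)

n, K = 40, 3
sigma2 = 2.0                         # hypothetical error variance
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])

XtX_inv = np.linalg.inv(X.T @ X)
A = XtX_inv @ X.T                    # b - beta = A eps

# Exact sandwich formula: A Var[eps | X] A' with Var[eps | X] = sigma^2 I_n.
sandwich = A @ (sigma2 * np.eye(n)) @ A.T
target = sigma2 * XtX_inv

# Monte Carlo counterpart: sample covariance of b - beta over many error draws.
R = 20000
eps = rng.normal(scale=np.sqrt(sigma2), size=(R, n))
errors = eps @ A.T                   # each row is one draw of b - beta
mc_cov = np.cov(errors, rowvar=False)

print("sigma^2 (X'X)^{-1}:")
print(np.round(target, 4))
print("simulated covariance of b:")
print(np.round(mc_cov, 4))
```

The sandwich matches $\sigma^2 (X'X)^{-1}$ up to rounding, while the simulated covariance agrees only up to Monte Carlo error.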
Proof of part (b). First note that OLS is a linear map of $y$. In fact, define $A = (X'X)^{-1} X'$, then $b = Ay$. Define an alternative linear estimator $b_0$, assuming that it exists:
\[
b_0 = Cy = (D + A) y, \quad \text{with } D = C - A .
\]
Then
\[
b_0 = (D + A) y = Dy + Ay = D(X\beta + \varepsilon) + b = DX\beta + D\varepsilon + b .
\]
By assumption $b_0$ is unbiased conditional on $X$, therefore $\mathrm{E}[b_0 \mid X] = \beta$. On the other hand,
\[
\begin{aligned}
\mathrm{E}[b_0 \mid X] &= \mathrm{E}[DX\beta + D\varepsilon + b \mid X] \\
&= DX\beta + \mathrm{E}[D\varepsilon \mid X] + \mathrm{E}[b \mid X] \\
&= DX\beta + D \, \mathrm{E}[\varepsilon \mid X] + \beta \\
&= DX\beta + \beta .
\end{aligned}
\]
Since unbiasedness must hold for every $\beta$, we need $DX\beta = 0$ for every $\beta$, which implies $DX = 0$. Hence
\[
b_0 = D\varepsilon + b . \tag{5.1}
\]
Now subtract $\beta$ from both sides of (5.1) and obtain the estimation error of $b_0$:
\[
b_0 - \beta = D\varepsilon + (b - \beta) = (D + A)\varepsilon .
\]
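To make the construction concrete, one can build an explicit competitor $b_0 = (D + A)y$. The sketch below uses one convenient (assumed, not from the slides) way of obtaining a matrix with $DX = 0$: take $D = GM$, where $G$ is an arbitrary $K \times n$ matrix and $M = I_n - X(X'X)^{-1}X'$ satisfies $MX = 0$. All numerical values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

n, K = 30, 3
beta = np.array([0.5, 1.0, -1.0])    # hypothetical true coefficients
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])

XtX_inv = np.linalg.inv(X.T @ X)
A = XtX_inv @ X.T                    # OLS: b = A y
M = np.eye(n) - X @ A                # annihilator matrix, M X = 0

# Any D of the form G M satisfies D X = 0; G is an arbitrary K x n matrix.
G = rng.normal(size=(K, n))
D = G @ M
C = D + A                            # alternative linear estimator b0 = C y

eps = rng.normal(size=n)
y = X @ beta + eps
b = A @ y
b0 = C @ y

print("max |DX|               :", np.abs(D @ X).max())
print("max |b0 - (b + D eps)| :", np.abs(b0 - (b + D @ eps)).max())
```

Both printed numbers should be zero up to floating-point rounding, in line with $DX = 0$ and equation (5.1).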
In order to show that OLS is BLUE we need to show that, for any $b_0$ (linear in $y$ and unbiased), $\mathrm{Var}[b_0 \mid X]$ is bigger than $\mathrm{Var}[b \mid X]$. That is, we need to show that $\mathrm{Var}[b_0 \mid X] \succeq \mathrm{Var}[b \mid X]$, meaning that $\mathrm{Var}[b_0 \mid X] - \mathrm{Var}[b \mid X]$ is PSD. Same trick as before: the variance of a constant is zero, so we start from $\mathrm{Var}[b_0 \mid X] = \mathrm{Var}[(b_0 - \beta) \mid X]$.
\[
\begin{aligned}
\mathrm{Var}[b_0 \mid X] &= \mathrm{Var}[(b_0 - \beta) \mid X] \\
&= \mathrm{Var}[(D + A)\varepsilon \mid X] \\
&= (D + A) \, \mathrm{Var}[\varepsilon \mid X] \, (D + A)' \\
&= \sigma^2 (D + A)(D' + A') \\
&= \sigma^2 (DD' + AD' + DA' + AA') .
\end{aligned}
\]
Now
\[
DA' = DX (X'X)^{-1} = 0, \qquad AD' = (DA')' = 0,
\]
\[
AA' = (X'X)^{-1} X'X (X'X)^{-1} = (X'X)^{-1},
\]
so that
\[
\mathrm{Var}[b_0 \mid X] = \sigma^2 \left[ DD' + (X'X)^{-1} \right] = \sigma^2 DD' + \sigma^2 (X'X)^{-1} .
\]
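The matrix $DD'$ is PSD: for any vector $v$ we have $v'DD'v = \lVert D'v \rVert^2 \geq 0$. Equivalently, $\sigma^2 DD' = \mathrm{Var}[D\varepsilon \mid X]$, and any variance-covariance matrix is PSD. Finally
\[
\mathrm{Var}[b_0 \mid X] - \mathrm{Var}[b \mid X] = \sigma^2 DD',
\]
which is PSD. This completes the proof. $\square$

Proof of part (c). First note that $e = M\varepsilon$. In fact, recall that $MX = 0$ and
\[
e = My = M(X\beta + \varepsilon) = MX\beta + M\varepsilon = M\varepsilon .
\]
It follows that $\mathrm{E}[e \mid X] = M \, \mathrm{E}[\varepsilon \mid X] = 0$. Now work out the formula of the covariance between vectors, using the symmetry of $M$:
\[
\begin{aligned}
\mathrm{Cov}[b, e \mid X] &= \mathrm{E}\left[ (b - \mathrm{E}[b \mid X])(e - \mathrm{E}[e \mid X])' \mid X \right] \\
&= \mathrm{E}\left[ (X'X)^{-1} X' \varepsilon (M\varepsilon)' \mid X \right] \\
&= \mathrm{E}\left[ (X'X)^{-1} X' \varepsilon \varepsilon' M \mid X \right] \\
&= (X'X)^{-1} X' \, \mathrm{E}[\varepsilon \varepsilon' \mid X] \, M \qquad \text{(pull out what's known)} \\
&= \sigma^2 (X'X)^{-1} X' M .
\end{aligned}
\]
Since $X'M = (MX)' = 0$, then $\mathrm{Cov}[b, e \mid X] = 0$. $\square$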
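Parts (b) and (c) can both be checked with a few matrix products. The following is a hedged sketch rather than anything taken from the slides: it assumes NumPy, a hypothetical error variance `sigma2`, and a competitor estimator built from $D = GM$ with an arbitrary $G$, which is one simple way to guarantee $DX = 0$.

```python
import numpy as np

rng = np.random.default_rng(3)

n, K = 30, 3
sigma2 = 1.5                         # hypothetical error variance
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])

XtX_inv = np.linalg.inv(X.T @ X)
A = XtX_inv @ X.T
M = np.eye(n) - X @ A                # residual maker: e = M eps

# Hypothetical competitor b0 = (A + D) y with D X = 0.
D = 0.1 * rng.normal(size=(K, n)) @ M

var_b = sigma2 * XtX_inv
var_b0 = sigma2 * (D + A) @ (D + A).T
gap = var_b0 - var_b                 # should equal sigma^2 D D'

# Part (b): the gap is PSD, so its eigenvalues are non-negative (up to rounding).
eigenvalues = np.linalg.eigvalsh((gap + gap.T) / 2)

# Part (c): Cov[b, e | X] = sigma^2 (X'X)^{-1} X' M.
cov_b_e = sigma2 * XtX_inv @ X.T @ M

print("eigenvalues of the gap:", eigenvalues)
print("max |Cov[b, e | X]|   :", np.abs(cov_b_e).max())
```

The gap is symmetrized before calling `eigvalsh` only to remove rounding asymmetry; the eigenvalues quantify how much precision is lost by using $b_0$ instead of $b$.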
Finite sample properties of $s^2$

In order to do inference on $\beta$, we need the variance of $b$, but it depends on $\sigma^2$. Since $\varepsilon_i$ is recovered based on $e_i$, the natural analog estimator of $\sigma^2$ would be
\[
\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} e_i^2 = \frac{e'e}{n} .
\]
However, this is biased because the residuals are two-stage estimates of $\varepsilon_i$: we need to pre-estimate $\beta$ to get $e$. An unbiased estimator is obtained by correcting the biased one with the degrees of freedom as usual, that is
\[
s^2 = \frac{1}{n - K} \sum_{i=1}^{n} e_i^2 = \frac{e'e}{n - K} .
\]

Proposition (unbiased estimation of $\sigma^2$). Assume A1, A2, A3 and A4. Then $s^2$ is an unbiased estimator of $\sigma^2$.

Proof.
\[
\begin{aligned}
\mathrm{E}[e'e \mid X] &= \mathrm{E}[(M\varepsilon)' M\varepsilon \mid X] \\
&= \mathrm{E}[\varepsilon' M' M \varepsilon \mid X] \\
&= \mathrm{E}[\varepsilon' M \varepsilon \mid X] \\
&= \sum_{i=1}^{n} \sum_{j=1}^{n} m_{ij} \, \mathrm{E}[\varepsilon_i \varepsilon_j \mid X] \qquad \text{(verify this!)} \\
&= \sigma^2 \sum_{i=1}^{n} m_{ii} \\
&= \sigma^2 \, \mathrm{tr}(M) .
\end{aligned}
\]
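The chain of equalities above can be explored numerically. The sketch below is an illustration under assumptions that are not in the slides: NumPy, i.i.d. normal errors, a small sample (`n = 20`, `K = 4`) so that the bias of $\hat{\sigma}^2$ is visible, and a hypothetical `sigma2`. It checks the double-sum form of $\varepsilon' M \varepsilon$ on one draw and compares the simulated average of $e'e$ with $\sigma^2 \, \mathrm{tr}(M)$.

```python
import numpy as np

rng = np.random.default_rng(4)

n, K = 20, 4
sigma2 = 4.0                         # hypothetical error variance
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)   # e = M eps

R = 20000
eps = rng.normal(scale=np.sqrt(sigma2), size=(R, n))
resid = eps @ M                      # M is symmetric, so each row is (M eps)'
ee = (resid ** 2).sum(axis=1)        # e'e for each simulated sample

# Double-sum form of eps' M eps for the first draw (the "verify this!" step).
first = eps[0]
double_sum = sum(M[i, j] * first[i] * first[j] for i in range(n) for j in range(n))

print("e'e, first draw        :", ee[0])
print("double sum, first draw :", double_sum)
print("average e'e            :", ee.mean())
print("sigma^2 tr(M)          :", sigma2 * np.trace(M))
print("average of e'e / n     :", (ee / n).mean())
print("average of e'e / (n-K) :", (ee / (n - K)).mean())
```

Dividing by `n` gives an average visibly below `sigma2`, while dividing by `n - K` gives an average close to it, up to Monte Carlo error.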
To finish the proof, compute the trace of $M$:
\[
\mathrm{tr}(M) = \mathrm{tr}(I_n - P) = n - \mathrm{tr}(P)
\]
and
\[
\mathrm{tr}(P) = \mathrm{tr}\left( X (X'X)^{-1} X' \right) = \mathrm{tr}\left( (X'X)^{-1} X'X \right) = \mathrm{tr}(I_K) = K .
\]
Therefore $\mathrm{E}[e'e \mid X] = (n - K)\sigma^2$. This proves that the analog estimator $\hat{\sigma}^2$ is biased, in fact
\[
\mathrm{E}\left[ \hat{\sigma}^2 \mid X \right] = \mathrm{E}\left[ \frac{e'e}{n} \,\Big|\, X \right] = \frac{n - K}{n} \sigma^2 .
\]
This also shows that $s^2$ is unbiased, in fact
\[
\mathrm{E}[s^2 \mid X] = \mathrm{E}\left[ \frac{e'e}{n - K} \,\Big|\, X \right] = \frac{n - K}{n - K} \sigma^2 = \sigma^2 . \qquad \square
\]

Estimated standard errors of $b$

We know that
\[
\mathrm{Var}[b \mid X] = \sigma^2 (X'X)^{-1},
\]
which depends on the unknown population quantity $\sigma^2$. We can estimate the variance matrix of $b$ based on the plug-in principle:
\[
\widehat{\mathrm{Var}}(b) = s^2 (X'X)^{-1} .
\]
Let $\mathrm{Se}(b_k)$ be the square root of the $k$th element on the diagonal of $\widehat{\mathrm{Var}}(b)$; then $\mathrm{Se}(b_k)$ is the standard error of the parameter estimate $b_k$, that is
\[
\mathrm{Se}(b_k) = s \sqrt{\left[ (X'X)^{-1} \right]_{kk}} .
\]
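The plug-in recipe for the standard errors translates almost line by line into code. The function name `ols_with_standard_errors` and the simulated data below are hypothetical and only meant as a minimal sketch, assuming NumPy. The explicit inverse is used to mirror the formulas, which is fine for small, well-conditioned examples.

```python
import numpy as np

def ols_with_standard_errors(X, y):
    """Return (b, s2, se) for the regression of y on X."""
    n, K = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y            # OLS estimate
    e = y - X @ b                    # residuals
    s2 = (e @ e) / (n - K)           # unbiased estimator of sigma^2
    var_hat = s2 * XtX_inv           # plug-in estimate of Var[b | X]
    se = np.sqrt(np.diag(var_hat))   # Se(b_k) = s * sqrt([(X'X)^{-1}]_kk)
    return b, s2, se

rng = np.random.default_rng(5)
n, K = 100, 3
beta = np.array([1.0, 2.0, -0.5])    # hypothetical true coefficients
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = X @ beta + rng.normal(size=n)

b, s2, se = ols_with_standard_errors(X, y)

# Trace identity used in the proof: tr(M) = n - K.
M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T
print("tr(M):", np.trace(M), " n - K:", n - K)
print("b :", b)
print("se:", se)
```

Each entry of `se` estimates the conditional standard deviation of the corresponding entry of `b`.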