1. Least Squares Estimation: Large-Sample Properties
   Ping Yu, School of Economics and Finance, The University of Hong Kong

2. Outline
   - Asymptotics for the LSE
   - Covariance Matrix Estimators
   - Functions of Parameters
   - The t Test
   - p-Value
   - Confidence Interval
   - The Wald Test
   - Confidence Region
   - Problems with Tests of Nonlinear Hypotheses
   - Test Consistency
   - Asymptotic Local Power

3. Introduction

   If $u \mid \mathbf{x} \sim N(0, \sigma^2)$, we have shown that $\widehat{\beta} \mid \mathbf{X} \sim N\!\left(\beta,\ \sigma^2 (\mathbf{X}'\mathbf{X})^{-1}\right)$. In general the distribution of $u \mid \mathbf{x}$ is unknown. Even if it is known, the unconditional distribution of $\widehat{\beta}$ is hard to derive, since $\widehat{\beta} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$ is a complicated function of $\{\mathbf{x}_i\}_{i=1}^{n}$.

   The asymptotic (or large-sample) method approximates the (unconditional) sampling distribution by the limiting experiment in which the sample size $n$ tends to infinity. It does not require any assumption on the distribution of $u \mid \mathbf{x}$; only some moment restrictions are imposed.

   Three steps: consistency, asymptotic normality, and estimation of the covariance matrix.

4. Asymptotics for the LSE

5. Consistency

   Express $\widehat{\beta}$ as
   $$\widehat{\beta} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'(\mathbf{X}\beta + \mathbf{u}) = \beta + (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{u}. \qquad (1)$$
   To show $\widehat{\beta}$ is consistent, we impose the following additional assumptions.

   Assumption OLS.1': $\operatorname{rank}\left(E[\mathbf{x}\mathbf{x}']\right) = k$.
   Assumption OLS.2': $y = \mathbf{x}'\beta + u$ with $E[\mathbf{x}u] = 0$.

   Assumption OLS.1' implicitly assumes that $E\big[\|\mathbf{x}\|^2\big] < \infty$. Assumption OLS.1' is the large-sample counterpart of Assumption OLS.1, and Assumption OLS.2' is weaker than Assumption OLS.2.
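
   A minimal numerical sketch of decomposition (1), assuming a simulated design with $k = 3$ regressors and i.i.d. errors (the data-generating values are illustrative only): the sampling error $\widehat{\beta} - \beta$ equals $(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{u}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 1000, 3
beta = np.array([1.0, 2.0, -0.5])

X = rng.normal(size=(n, k))          # regressors x_i; E[x x'] has full rank k
u = rng.normal(size=n)               # errors with E[x u] = 0
y = X @ beta + u

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)         # LSE
sampling_error = np.linalg.solve(X.T @ X, X.T @ u)   # (X'X)^{-1} X'u from (1)

# The two ways of writing beta_hat - beta agree up to rounding error.
print(np.allclose(beta_hat - beta, sampling_error))
```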

6. Theorem

   Under Assumptions OLS.0, OLS.1', OLS.2' and OLS.3, $\widehat{\beta} \xrightarrow{p} \beta$.

   Proof. From (1), to show $\widehat{\beta} \xrightarrow{p} \beta$ we need only show that $(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{u} \xrightarrow{p} 0$. Note that
   $$(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{u} = \left(\frac{1}{n}\sum_{i=1}^{n} \mathbf{x}_i \mathbf{x}_i'\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^{n} \mathbf{x}_i u_i\right) = g\!\left(\frac{1}{n}\sum_{i=1}^{n} \mathbf{x}_i \mathbf{x}_i',\ \frac{1}{n}\sum_{i=1}^{n} \mathbf{x}_i u_i\right) \xrightarrow{p} E[\mathbf{x}_i\mathbf{x}_i']^{-1} E[\mathbf{x}_i u_i] = 0.$$
   Here, the convergence in probability follows from (I) the WLLN, which implies
   $$\frac{1}{n}\sum_{i=1}^{n} \mathbf{x}_i \mathbf{x}_i' \xrightarrow{p} E[\mathbf{x}_i\mathbf{x}_i'] \quad \text{and} \quad \frac{1}{n}\sum_{i=1}^{n} \mathbf{x}_i u_i \xrightarrow{p} E[\mathbf{x}_i u_i]; \qquad (2)$$
   and (II) the fact that $g(A, b) = A^{-1}b$ is a continuous function at $\left(E[\mathbf{x}_i\mathbf{x}_i'],\ E[\mathbf{x}_i u_i]\right)$. The last equality is from Assumption OLS.2'.
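
   A small simulation sketch of the two convergences in (2), with a hypothetical two-regressor design chosen only for illustration; as $n$ grows, both sample averages settle near their population counterparts, so $g(\cdot,\cdot)$ tends to 0.

```python
import numpy as np

rng = np.random.default_rng(1)

for n in [100, 10_000, 1_000_000]:
    X = rng.normal(size=(n, 2))              # i.i.d. regressors
    u = rng.normal(size=n)                   # i.i.d. errors with E[x u] = 0
    Q_hat = X.T @ X / n                      # (1/n) sum x_i x_i'  ->  E[x x']
    g_hat = X.T @ u / n                      # (1/n) sum x_i u_i   ->  E[x u] = 0
    print(n, np.linalg.solve(Q_hat, g_hat))  # g(Q_hat, g_hat) shrinks toward 0
```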

7. Proof (continued)

   (I) To apply the WLLN, we require (i) $\mathbf{x}_i\mathbf{x}_i'$ and $\mathbf{x}_i u_i$ to be i.i.d., which is implied by Assumption OLS.0 and the fact that functions of i.i.d. data are also i.i.d.; and (ii) $E\big[\|\mathbf{x}\|^2\big] < \infty$ (OLS.1') and $E[\|\mathbf{x}u\|] < \infty$. The latter is implied by the Cauchy-Schwarz inequality,[a]
   $$E[\|\mathbf{x}u\|] \le E\big[\|\mathbf{x}\|^2\big]^{1/2} E\big[|u|^2\big]^{1/2},$$
   which is finite by Assumptions OLS.1' and OLS.3.

   (II) To guarantee that $A^{-1}b$ is a continuous function at $\left(E[\mathbf{x}_i\mathbf{x}_i'],\ E[\mathbf{x}_i u_i]\right)$, we must assume that $E[\mathbf{x}_i\mathbf{x}_i']^{-1}$ exists, which is implied by Assumption OLS.1'.[b]

   [a] Cauchy-Schwarz inequality: for any random $m \times n$ matrices $X$ and $Y$, $E[\|X'Y\|] \le E\big[\|X\|^2\big]^{1/2} E\big[\|Y\|^2\big]^{1/2}$, where the inner product is defined as $\langle X, Y\rangle = E[\|X'Y\|]$, and for an $m \times n$ matrix $A$, $\|A\| = \big(\sum_{i=1}^{m}\sum_{j=1}^{n} a_{ij}^2\big)^{1/2} = [\operatorname{trace}(A'A)]^{1/2}$.
   [b] If $x_i \in \mathbb{R}$, $E[x_i x_i']^{-1} = E[x_i^2]^{-1}$ is the reciprocal of $E[x_i^2]$, which is a continuous function of $E[x_i^2]$ only if $E[x_i^2] \ne 0$.
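
   A quick numerical check of the matrix norm and the Cauchy-Schwarz bound in footnote [a], using randomly drawn matrices (dimensions and number of draws are arbitrary); the sample analog of the inequality holds exactly for every draw because $\|X'Y\| \le \|X\|\,\|Y\|$.

```python
import numpy as np

rng = np.random.default_rng(8)
reps, m, nc = 20_000, 3, 2
X = rng.normal(size=(reps, m, nc))
Y = rng.normal(size=(reps, m, nc))

def frob(A):
    # ||A|| = sqrt(sum of squared entries) = sqrt(trace(A'A))
    return np.sqrt((A**2).sum(axis=(-2, -1)))

lhs = np.mean(frob(np.swapaxes(X, -2, -1) @ Y))                   # E[||X'Y||]
rhs = np.sqrt(np.mean(frob(X)**2)) * np.sqrt(np.mean(frob(Y)**2))  # E[||X||^2]^{1/2} E[||Y||^2]^{1/2}
print(lhs <= rhs)   # Cauchy-Schwarz holds in the sample analog
```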

8. Consistency of $\widehat{\sigma}^2$ and $s^2$

   Theorem. Under the assumptions of Theorem 1, $\widehat{\sigma}^2 \xrightarrow{p} \sigma^2$ and $s^2 \xrightarrow{p} \sigma^2$.

   Proof. Note that
   $$\widehat{u}_i = y_i - \mathbf{x}_i'\widehat{\beta} = u_i + \mathbf{x}_i'\beta - \mathbf{x}_i'\widehat{\beta} = u_i - \mathbf{x}_i'\big(\widehat{\beta} - \beta\big).$$
   Thus
   $$\widehat{u}_i^2 = u_i^2 - 2 u_i \mathbf{x}_i'\big(\widehat{\beta} - \beta\big) + \big(\widehat{\beta} - \beta\big)' \mathbf{x}_i \mathbf{x}_i' \big(\widehat{\beta} - \beta\big) \qquad (3)$$
   and
   $$\widehat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} \widehat{u}_i^2.$$

9. Proof (continued)

   $$\widehat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} u_i^2 - 2\left(\frac{1}{n}\sum_{i=1}^{n} u_i \mathbf{x}_i'\right)\big(\widehat{\beta} - \beta\big) + \big(\widehat{\beta} - \beta\big)'\left(\frac{1}{n}\sum_{i=1}^{n} \mathbf{x}_i \mathbf{x}_i'\right)\big(\widehat{\beta} - \beta\big) \xrightarrow{p} \sigma^2,$$
   where the last line uses the WLLN, (2), Theorem 1 and the CMT. Finally, since $n/(n-k) \to 1$, it follows that
   $$s^2 = \frac{n}{n-k}\,\widehat{\sigma}^2 \xrightarrow{p} \sigma^2$$
   by the CMT.

   One implication of this theorem is that multiple estimators can be consistent for the same population parameter. While $\widehat{\sigma}^2$ and $s^2$ are unequal in any given application, they are close in value when $n$ is very large.
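
   A quick check on simulated data with $\sigma^2 = 4$ (all values illustrative only) that $\widehat{\sigma}^2$ and $s^2$ are both close to $\sigma^2$ in large samples, even though they differ by the factor $n/(n-k)$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, sigma2 = 50_000, 3, 4.0
X = rng.normal(size=(n, k))
u = rng.normal(scale=np.sqrt(sigma2), size=n)
y = X @ np.array([1.0, 0.5, -1.0]) + u

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
u_hat = y - X @ beta_hat                 # residuals

sigma2_hat = np.mean(u_hat**2)           # divides by n
s2 = np.sum(u_hat**2) / (n - k)          # divides by n - k
print(sigma2_hat, s2)                    # both approximately 4
```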

10. Asymptotic Normality

   To study the asymptotic normality of $\widehat{\beta}$, we impose the following additional assumption.

   Assumption OLS.5: $E[u^4] < \infty$ and $E\big[\|\mathbf{x}\|^4\big] < \infty$.

   Theorem. Under Assumptions OLS.0, OLS.1', OLS.2', OLS.3 and OLS.5,
   $$\sqrt{n}\big(\widehat{\beta} - \beta\big) \xrightarrow{d} N(0, V),$$
   where $V = Q^{-1}\Omega Q^{-1}$ with $Q = E[\mathbf{x}_i\mathbf{x}_i']$ and $\Omega = E[\mathbf{x}_i\mathbf{x}_i' u_i^2]$.

   Proof. From (1),
   $$\sqrt{n}\big(\widehat{\beta} - \beta\big) = \left(\frac{1}{n}\sum_{i=1}^{n} \mathbf{x}_i \mathbf{x}_i'\right)^{-1}\left(\frac{1}{\sqrt{n}}\sum_{i=1}^{n} \mathbf{x}_i u_i\right).$$

11. Proof (continued)

   Note first that
   $$E\big[\|\mathbf{x}_i\mathbf{x}_i' u_i^2\|\big] \le E\big[\|\mathbf{x}_i\mathbf{x}_i'\|^2\big]^{1/2} E\big[u_i^4\big]^{1/2} \le E\big[\|\mathbf{x}_i\|^4\big]^{1/2} E\big[u_i^4\big]^{1/2} < \infty, \qquad (4)$$
   where the first inequality is from the Cauchy-Schwarz inequality, the second inequality is from the Schwarz matrix inequality,[a] and the last inequality is from Assumption OLS.5. So by the CLT,
   $$\frac{1}{\sqrt{n}}\sum_{i=1}^{n} \mathbf{x}_i u_i \xrightarrow{d} N(0, \Omega).$$
   Given that $n^{-1}\sum_{i=1}^{n} \mathbf{x}_i\mathbf{x}_i' \xrightarrow{p} Q$,
   $$\sqrt{n}\big(\widehat{\beta} - \beta\big) \xrightarrow{d} Q^{-1} N(0, \Omega) = N(0, V)$$
   by Slutsky's theorem.

   [a] Schwarz matrix inequality: for any random $m \times n$ matrices $X$ and $Y$, $\|X'Y\| \le \|X\|\,\|Y\|$. This is a special form of the Cauchy-Schwarz inequality, where the inner product is defined as $\langle X, Y\rangle = \|X'Y\|$.

   In the homoskedastic model, $V$ reduces to $V^0 = \sigma^2 Q^{-1}$. We call $V^0$ the homoskedastic covariance matrix.
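
   The sandwich matrix $V = Q^{-1}\Omega Q^{-1}$ can be estimated by plugging in sample moments. A sketch on simulated heteroskedastic data; the names Q_hat, Omega_hat and V_hat are illustrative, not notation from the slides.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
u = rng.normal(size=n) * (0.5 + np.abs(X[:, 1]))     # heteroskedastic errors
y = X @ np.array([1.0, 2.0]) + u

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ beta_hat

Q_hat = X.T @ X / n                                  # estimate of Q = E[x x']
Omega_hat = (X * u_hat[:, None]**2).T @ X / n        # estimate of Omega = E[x x' u^2]
V_hat = np.linalg.solve(Q_hat, Omega_hat) @ np.linalg.inv(Q_hat)   # Q^{-1} Omega Q^{-1}
V0_hat = np.mean(u_hat**2) * np.linalg.inv(Q_hat)    # homoskedastic form sigma^2 Q^{-1}

print(V_hat)    # asymptotic variance of sqrt(n)(beta_hat - beta)
print(V0_hat)   # differs from V_hat because the errors are heteroskedastic
```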

12. Partitioned Formula of $V^0$

   Sometimes, to state the asymptotic distribution of part of $\widehat{\beta}$ as in the residual regression, we partition $Q$ and $\Omega$ as
   $$Q = \begin{pmatrix} Q_{11} & Q_{12} \\ Q_{21} & Q_{22} \end{pmatrix}, \qquad \Omega = \begin{pmatrix} \Omega_{11} & \Omega_{12} \\ \Omega_{21} & \Omega_{22} \end{pmatrix}.$$
   Recall from the proof of the FWL theorem,
   $$Q^{-1} = \begin{pmatrix} Q_{11.2}^{-1} & -Q_{11.2}^{-1} Q_{12} Q_{22}^{-1} \\ -Q_{22.1}^{-1} Q_{21} Q_{11}^{-1} & Q_{22.1}^{-1} \end{pmatrix},$$
   where $Q_{11.2} = Q_{11} - Q_{12} Q_{22}^{-1} Q_{21}$ and $Q_{22.1} = Q_{22} - Q_{21} Q_{11}^{-1} Q_{12}$.

   Thus when the error is homoskedastic,
   $$n \cdot \operatorname{AVar}\big(\widehat{\beta}_1\big) = \sigma^2 Q_{11.2}^{-1}, \qquad n \cdot \operatorname{ACov}\big(\widehat{\beta}_1, \widehat{\beta}_2\big) = -\sigma^2 Q_{11.2}^{-1} Q_{12} Q_{22}^{-1}.$$

   We can also derive the general formulas in the heteroskedastic case, but those formulas are not easily interpretable and so are less useful.
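
   A numerical check of the partitioned-inverse formulas, using an arbitrary positive definite $3 \times 3$ matrix split into $1 + 2$ blocks (purely illustrative).

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(3, 3))
Q = A @ A.T + 3 * np.eye(3)                      # a positive definite Q

Q11, Q12 = Q[:1, :1], Q[:1, 1:]
Q21, Q22 = Q[1:, :1], Q[1:, 1:]

Q11_2 = Q11 - Q12 @ np.linalg.solve(Q22, Q21)    # Q_{11.2} = Q_11 - Q_12 Q_22^{-1} Q_21
Q_inv = np.linalg.inv(Q)

# Upper-left block of Q^{-1} is Q_{11.2}^{-1}; upper-right is -Q_{11.2}^{-1} Q_12 Q_22^{-1}.
print(np.allclose(Q_inv[:1, :1], np.linalg.inv(Q11_2)))
print(np.allclose(Q_inv[:1, 1:], -np.linalg.inv(Q11_2) @ Q12 @ np.linalg.inv(Q22)))
```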

13. LSE as a MoM Estimator

   The LSE is a MoM estimator: the moment conditions are $E[\mathbf{x}u] = 0$ with $u = y - \mathbf{x}'\beta$. The sample analog is the normal equation
   $$\frac{1}{n}\sum_{i=1}^{n} \mathbf{x}_i\big(y_i - \mathbf{x}_i'\beta\big) = 0,$$
   the solution of which is exactly the LSE.

   Here $M = -E[\mathbf{x}_i\mathbf{x}_i'] = -Q$ and $\Omega = E[\mathbf{x}_i\mathbf{x}_i' u_i^2]$, so
   $$\sqrt{n}\big(\widehat{\beta} - \beta\big) \xrightarrow{d} N\big(0,\ Q^{-1}\Omega Q^{-1}\big) = N(0, V).$$
   Note that the asymptotic variance $V$ takes the sandwich form. The larger $E[\mathbf{x}_i\mathbf{x}_i']$, the smaller $V$.

   Although the LSE is a MoM estimator, it is a special MoM estimator because it can be treated as a "projection" estimator.
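
   The normal equation can be verified directly: a sketch on simulated data (illustrative design) confirming that the LSE sets the sample moment $n^{-1}\sum_i \mathbf{x}_i(y_i - \mathbf{x}_i'\beta)$ to zero.

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 2000, 2
X = rng.normal(size=(n, k))
y = X @ np.array([0.3, -1.2]) + rng.normal(size=n)

# Sample moment condition: m(beta) = (1/n) * sum_i x_i (y_i - x_i' beta)
def m(beta):
    return X.T @ (y - X @ beta) / n

beta_lse = np.linalg.solve(X.T @ X, X.T @ y)
print(m(beta_lse))   # approximately zero: the LSE solves the normal equation
```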

14. Intuition

   Consider a simple linear regression model $y_i = \beta x_i + u_i$, where $E[x_i]$ is normalized to be 0. From introductory econometrics courses,
   $$\widehat{\beta} = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2} = \frac{\widehat{\operatorname{Cov}}(x, y)}{\widehat{\operatorname{Var}}(x)},$$
   and under homoskedasticity,
   $$\operatorname{AVar}\big(\widehat{\beta}\big) = \frac{\sigma^2}{n \operatorname{Var}(x)}.$$
   So the larger $\operatorname{Var}(x)$, the smaller $\operatorname{AVar}\big(\widehat{\beta}\big)$. Actually, $\operatorname{Var}(x) = \left|\frac{\partial E[xu]}{\partial \beta}\right|$.
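
   A small Monte Carlo sketch of the formula $\operatorname{AVar}(\widehat{\beta}) = \sigma^2 / (n \operatorname{Var}(x))$, assuming $\sigma^2 = 1$ and comparing $\operatorname{Var}(x) = 1$ with $\operatorname{Var}(x) = 4$ (all values illustrative).

```python
import numpy as np

rng = np.random.default_rng(6)
n, reps, beta = 200, 5000, 1.5

for sd_x in [1.0, 2.0]:                        # Var(x) = 1, then 4
    est = np.empty(reps)
    for r in range(reps):
        x = rng.normal(scale=sd_x, size=n)     # E[x] = 0
        y = beta * x + rng.normal(size=n)      # sigma^2 = 1
        est[r] = np.sum(x * y) / np.sum(x**2)  # simple-regression LSE
    # Simulated variance of beta_hat vs. the formula sigma^2 / (n * Var(x))
    print(sd_x, est.var(), 1.0 / (n * sd_x**2))
```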

15. Asymptotics for the Weighted Least Squares (WLS) Estimator

   The WLS estimator is a special GLS estimator with a diagonal weight matrix. Recall that
   $$\widehat{\beta}_{GLS} = (\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\mathbf{X}'\mathbf{W}\mathbf{y},$$
   which reduces to
   $$\widehat{\beta}_{WLS} = \left(\sum_{i=1}^{n} w_i \mathbf{x}_i \mathbf{x}_i'\right)^{-1}\left(\sum_{i=1}^{n} w_i \mathbf{x}_i y_i\right)$$
   when $\mathbf{W} = \operatorname{diag}\{w_1, \cdots, w_n\}$.

   Note that this estimator is a MoM estimator under the moment condition (check!)
   $$E[w_i \mathbf{x}_i u_i] = 0,$$
   so
   $$\sqrt{n}\big(\widehat{\beta}_{WLS} - \beta\big) \xrightarrow{d} N(0, V_W),$$
   where $V_W = E[w_i \mathbf{x}_i\mathbf{x}_i']^{-1}\, E[w_i^2 \mathbf{x}_i\mathbf{x}_i' u_i^2]\, E[w_i \mathbf{x}_i\mathbf{x}_i']^{-1}$.
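
   A sketch of the WLS estimator and a plug-in estimate of $V_W$ on simulated heteroskedastic data; the weights $w_i = 1/(0.5 + x_i^2)$ and the other values are chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 10_000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
u = rng.normal(size=n) * np.sqrt(0.5 + X[:, 1]**2)    # heteroskedastic errors
y = X @ np.array([1.0, -0.7]) + u

w = 1.0 / (0.5 + X[:, 1]**2)                          # diagonal weights w_i
Xw = X * w[:, None]

beta_wls = np.linalg.solve(Xw.T @ X, Xw.T @ y)        # (sum w x x')^{-1} sum w x y
u_hat = y - X @ beta_wls

A = Xw.T @ X / n                                      # estimate of E[w x x']
B = (X * (w * u_hat)[:, None]**2).T @ X / n           # estimate of E[w^2 x x' u^2]
V_W = np.linalg.solve(A, B) @ np.linalg.inv(A)        # sandwich variance V_W

print(beta_wls)
print(V_W / n)   # approximate variance of beta_wls itself
```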

16. Covariance Matrix Estimators
