  1. Least Squares Estimation - Finite-Sample Properties
  Ping Yu
  School of Economics and Finance, The University of Hong Kong

  2. Outline
  1 Terminology and Assumptions
  2 Goodness of Fit
  3 Bias and Variance
  4 The Gauss-Markov Theorem
  5 Multicollinearity
  6 Hypothesis Testing: An Introduction
  7 LSE as a MLE

  3. Terminology and Assumptions

  4. Terminology
  The linear regression model is
      $y = x'\beta + u$,  $E[u|x] = 0$.
  $u$ is called the error term, disturbance, or unobservable.

  Table 1: Terminology for Linear Regression
  y                    | x
  -------------------- | ---------------------------
  Dependent variable   | Independent variable
  Explained variable   | Explanatory variable
  Response variable    | Control (Stimulus) variable
  Predicted variable   | Predictor variable
  Regressand           | Regressor
  LHS variable         | RHS variable
  Endogenous variable  | Exogenous variable
                       | Covariate
                       | Conditioning variable

  5. Assumptions
  We maintain the following assumptions in this chapter.
  Assumption OLS.0 (random sampling): $(y_i, x_i)$, $i = 1, \ldots, n$, are independent and identically distributed (i.i.d.).
  Assumption OLS.1 (full rank): $\mathrm{rank}(X) = k$.
  Assumption OLS.2 (first moment): $E[y|x] = x'\beta$.
  Assumption OLS.3 (second moment): $E[u^2] < \infty$.
  Assumption OLS.3' (homoskedasticity): $E[u^2|x] = \sigma^2$.
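  A minimal numerical sketch (not from the slides; the sample size, design, and coefficient values are hypothetical) of data satisfying OLS.0-OLS.3' and of the least squares estimator $\hat\beta = (X'X)^{-1}X'y$:

```python
import numpy as np

# Hypothetical i.i.d. data satisfying OLS.0-OLS.3', then the LSE b = (X'X)^{-1} X'y.
rng = np.random.default_rng(0)
n, k = 200, 3
beta = np.array([1.0, 2.0, -0.5])            # true coefficients (assumed for illustration)

X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])  # full-rank design with a constant
u = rng.normal(size=n)                        # homoskedastic error with E[u|x] = 0
y = X @ beta + u

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # least squares estimator
print(beta_hat)
```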

  6. Discussion
  Assumption OLS.2 is equivalent to $y = x'\beta + u$ (linear in parameters) plus $E[u|x] = 0$ (zero conditional mean).
  To study the finite-sample properties of the LSE, such as unbiasedness, we always assume Assumption OLS.2, i.e., the model is a linear regression.¹
  Assumption OLS.3' is stronger than Assumption OLS.3. The linear regression model under Assumption OLS.3' is called the homoskedastic linear regression model:
      $y = x'\beta + u$,
      $E[u|x] = 0$,
      $E[u^2|x] = \sigma^2$.
  If $E[u^2|x] = \sigma^2(x)$ depends on $x$, we say $u$ is heteroskedastic.
  ¹ For large-sample properties such as consistency, we require only weaker assumptions.
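  A small illustrative sketch of the distinction (the data-generating process below is hypothetical, not from the slides): the conditional standard deviation of $u$ depends on $x$, so $E[u^2|x] = \sigma^2(x)$ varies and Assumption OLS.3' fails even though $E[u|x] = 0$ still holds.

```python
import numpy as np

# Hypothetical heteroskedastic errors: sd(u|x) = 0.5*x, so Var(u|x) = 0.25*x^2 depends on x.
rng = np.random.default_rng(1)
n = 100_000
x = rng.uniform(1.0, 3.0, size=n)
u = rng.normal(scale=0.5 * x)                 # E[u|x] = 0, but Var(u|x) varies with x

# The error variance differs across slices of x, i.e., u is heteroskedastic.
print(u[x < 1.5].var(), u[x > 2.5].var())
```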

  7. Goodness of Fit

  8. Residual and SER
  Express
      $y_i = \hat y_i + \hat u_i$,   (1)
  where $\hat y_i = x_i'\hat\beta$ is the predicted value, and $\hat u_i = y_i - \hat y_i$ is the residual.²
  Often, the error variance $\sigma^2 = E[u^2]$ is also a parameter of interest. It measures the variation in the "unexplained" part of the regression. Its method of moments (MoM) estimator is the sample average of the squared residuals,
      $\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^n \hat u_i^2 = \frac{1}{n}\hat u'\hat u$.
  An alternative estimator uses the formula
      $s^2 = \frac{1}{n-k}\sum_{i=1}^n \hat u_i^2 = \frac{1}{n-k}\hat u'\hat u$.
  This estimator adjusts for the degrees of freedom (df) of $\hat u$.
  ² $\hat u_i$ is different from $u_i$: the latter is unobservable, while the former is a by-product of OLS estimation.
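  A sketch of these two estimators on hypothetical simulated data (the design and coefficients are assumed, not from the slides):

```python
import numpy as np

# Fitted values, residuals, and the two error-variance estimators sigma_hat^2 and s^2.
rng = np.random.default_rng(2)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta_hat                          # predicted values
u_hat = y - y_hat                             # residuals

sigma2_hat = (u_hat @ u_hat) / n              # MoM estimator: divides by n
s2 = (u_hat @ u_hat) / (n - k)                # df-adjusted estimator: divides by n - k
print(sigma2_hat, s2)
```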

  9. Coefficient of Determination
  If $X$ includes a column of ones, $1'\hat u = \sum_{i=1}^n \hat u_i = 0$, so $\bar y = \bar{\hat y}$.
  Subtracting $\bar y$ from both sides of (1), we have
      $\tilde y_i \equiv y_i - \bar y = (\hat y_i - \bar y) + \hat u_i \equiv \tilde{\hat y}_i + \hat u_i$.
  Since $\tilde{\hat y}'\hat u = \hat y'\hat u - \bar y\,1'\hat u = \hat\beta'X'\hat u - \bar y\,1'\hat u = 0$,
      $\mathrm{SST} \equiv \|\tilde y\|^2 = \|\tilde{\hat y} + \hat u\|^2 = \|\tilde{\hat y}\|^2 + 2\tilde{\hat y}'\hat u + \|\hat u\|^2 = \|\tilde{\hat y}\|^2 + \|\hat u\|^2 \equiv \mathrm{SSE} + \mathrm{SSR}$,   (2)
  where SST, SSE and SSR mean the total sum of squares, the explained sum of squares, and the residual sum of squares (or the sum of squared residuals), respectively.
  Dividing both sides of (2) by SST,³ we have
      $1 = \frac{\mathrm{SSE}}{\mathrm{SST}} + \frac{\mathrm{SSR}}{\mathrm{SST}}$.
  The R-squared of the regression, sometimes called the coefficient of determination, is defined as
      $R^2 = \frac{\mathrm{SSE}}{\mathrm{SST}} = 1 - \frac{\mathrm{SSR}}{\mathrm{SST}} = 1 - \frac{\hat\sigma^2}{\hat\sigma_y^2}$.
  ³ When can we conduct this operation, i.e., when is SST ≠ 0?
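  A numerical check (on hypothetical data) of the decomposition (2) and of $R^2$, which relies on $X$ containing a column of ones:

```python
import numpy as np

# Verify SST = SSE + SSR and compute R^2 = 1 - SSR/SST on simulated data.
rng = np.random.default_rng(3)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta_hat
u_hat = y - y_hat

SST = np.sum((y - y.mean()) ** 2)
SSE = np.sum((y_hat - y.mean()) ** 2)         # mean of y equals mean of y_hat here
SSR = np.sum(u_hat ** 2)

print(np.isclose(SST, SSE + SSR))             # True: the decomposition in (2)
print(1 - SSR / SST)                          # R^2
```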

  10. More on R²
  $R^2$ is defined only if $x$ includes a constant. It is usually interpreted as the fraction of the sample variation in $y$ that is explained by (nonconstant) $x$.
  When there is no constant term in $x_i$, we need to define the so-called uncentered $R^2$, denoted as $R_u^2$:
      $R_u^2 = \frac{\hat y'\hat y}{y'y}$.
  $R^2$ can also be treated as an estimator of $\rho^2 = 1 - \sigma^2/\sigma_y^2$. It is often useful in algebraic manipulation of some statistics.
  An alternative estimator of $\rho^2$, proposed by Henri Theil (1924-2000) and called the adjusted R-squared or "R-bar-squared", is
      $\bar R^2 = 1 - \frac{s^2}{\tilde\sigma_y^2} = 1 - (1 - R^2)\frac{n-1}{n-k} \le R^2$,
  where $\tilde\sigma_y^2 = \tilde y'\tilde y/(n-1)$. $\bar R^2$ adjusts the degrees of freedom in the numerator and denominator of $R^2$.
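  The following sketch (hypothetical data, not from the slides) computes the centered $R^2$, the uncentered $R_u^2$, and $\bar R^2$, and confirms that the two expressions for $\bar R^2$ coincide:

```python
import numpy as np

# Centered R^2, uncentered R^2_u, and adjusted R-bar-squared on simulated data.
rng = np.random.default_rng(4)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta_hat
u_hat = y - y_hat

SST = np.sum((y - y.mean()) ** 2)
SSR = np.sum(u_hat ** 2)

R2 = 1 - SSR / SST
R2_u = (y_hat @ y_hat) / (y @ y)                       # uncentered R^2
R2_bar = 1 - (SSR / (n - k)) / (SST / (n - 1))         # 1 - s^2 / sigma_tilde_y^2
print(R2, R2_u)
print(R2_bar, 1 - (1 - R2) * (n - 1) / (n - k))        # the two R-bar-squared formulas agree
```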

  11. Degree of Freedom
  Why is it called "degree of freedom"? Roughly speaking, the degree of freedom is the dimension of the space in which a vector can stay, or how "freely" a vector can move.
  For example, $\hat u$, as an $n$-dimensional vector, can only stay in a subspace of dimension $n - k$. Why? Because $X'\hat u = 0$, $k$ constraints are imposed on $\hat u$, so $\hat u$ cannot move completely freely and loses $k$ degrees of freedom.
  Similarly, the degree of freedom of $\tilde y$ is $n - 1$. Figure 1 illustrates why the degree of freedom of $\tilde y$ is $n - 1$ when $n = 2$.
  Table 2 summarizes the degrees of freedom for the three terms in (2).

  Table 2: Degrees of Freedom for Three Variations
  Variation | Notation                        | df
  --------- | ------------------------------- | -----
  SSE       | $\tilde{\hat y}'\tilde{\hat y}$ | k - 1
  SSR       | $\hat u'\hat u$                 | n - k
  SST       | $\tilde y'\tilde y$             | n - 1
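  A quick numerical illustration (hypothetical data) of the $k$ constraints on $\hat u$ and of the dimension of the residual space, using the residual-maker matrix $M = I_n - X(X'X)^{-1}X'$:

```python
import numpy as np

# X'u_hat = 0 imposes k constraints on u_hat; the residual space has dimension n - k.
rng = np.random.default_rng(5)
n, k = 50, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = rng.normal(size=n)

M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)    # residual-maker (annihilator) matrix
u_hat = M @ y

print(np.allclose(X.T @ u_hat, 0))                   # True: the k linear constraints hold
print(np.trace(M), np.linalg.matrix_rank(M))         # both equal n - k = 46
```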

  12. Figure 1: Although $\dim(\tilde y) = 2$, $\mathrm{df}(\tilde y) = 1$, where $\tilde y = (\tilde y_1, \tilde y_2)'$.

  13. Bias and Variance

  14. Unbiasedness of the LSE
  Assumption OLS.2 implies that $y = x'\beta + u$, $E[u|x] = 0$. Then
      $E[u|X] = \begin{pmatrix} \vdots \\ E[u_i|X] \\ \vdots \end{pmatrix} = \begin{pmatrix} \vdots \\ E[u_i|x_i] \\ \vdots \end{pmatrix} = 0$,
  where the second equality is from the assumption of independent sampling (Assumption OLS.0).
  Now,
      $\hat\beta = (X'X)^{-1}X'y = (X'X)^{-1}X'(X\beta + u) = \beta + (X'X)^{-1}X'u$,
  so
      $E[\hat\beta - \beta \mid X] = E[(X'X)^{-1}X'u \mid X] = (X'X)^{-1}X'E[u|X] = 0$,
  i.e., $\hat\beta$ is unbiased.
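  A Monte Carlo sketch of unbiasedness (the design and number of replications are hypothetical): holding $X$ fixed and redrawing $u$ with $E[u|X] = 0$, the average of $\hat\beta$ across replications is close to $\beta$.

```python
import numpy as np

# Monte Carlo check: conditional on X, the mean of beta_hat over many error draws is ~ beta.
rng = np.random.default_rng(6)
n, k, reps = 100, 3, 5000
beta = np.array([1.0, 2.0, -0.5])
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])   # fixed design
XtX_inv_Xt = np.linalg.solve(X.T @ X, X.T)                       # (X'X)^{-1} X'

draws = np.empty((reps, k))
for r in range(reps):
    u = rng.normal(size=n)                    # E[u|X] = 0
    draws[r] = XtX_inv_Xt @ (X @ beta + u)    # beta_hat for this sample

print(draws.mean(axis=0))                     # approximately equal to beta
```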

  15. Variance of the LSE
      $\mathrm{Var}(\hat\beta \mid X) = \mathrm{Var}\big((X'X)^{-1}X'u \mid X\big) = (X'X)^{-1}X'\,\mathrm{Var}(u|X)\,X(X'X)^{-1} = (X'X)^{-1}X'DX(X'X)^{-1}$.
  Note that
      $\mathrm{Var}(u_i|X) = \mathrm{Var}(u_i|x_i) = E[u_i^2|x_i] - E[u_i|x_i]^2 = E[u_i^2|x_i] \equiv \sigma_i^2$,
  and
      $\mathrm{Cov}(u_i, u_j|X) = E[u_iu_j|X] - E[u_i|X]E[u_j|X] = E[u_iu_j|x_i,x_j] - E[u_i|x_i]E[u_j|x_j] = E[u_i|x_i]E[u_j|x_j] - E[u_i|x_i]E[u_j|x_j] = 0$,
  so $D$ is a diagonal matrix:
      $D = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_n^2)$.
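  A sketch of the sandwich formula on a hypothetical heteroskedastic design (the conditional variances $\sigma_i^2$ below are assumed for illustration):

```python
import numpy as np

# Var(beta_hat | X) = (X'X)^{-1} X'DX (X'X)^{-1} with D = diag(sigma_1^2, ..., sigma_n^2).
rng = np.random.default_rng(7)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
sigma2_i = 0.5 + X[:, 1] ** 2                 # assumed conditional variances sigma_i^2

XtX_inv = np.linalg.inv(X.T @ X)
XDX = (X * sigma2_i[:, None]).T @ X           # X'DX without forming the n x n matrix D
var_beta = XtX_inv @ XDX @ XtX_inv
print(np.diag(var_beta))                      # conditional variances of the k coefficients
```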

  16. Variance of the LSE (continued)
  It is useful to note that
      $X'DX = \sum_{i=1}^n x_ix_i'\sigma_i^2$.
  In the homoskedastic case, $\sigma_i^2 = \sigma^2$ and $D = \sigma^2 I_n$, so $X'DX = \sigma^2 X'X$ and
      $\mathrm{Var}(\hat\beta \mid X) = \sigma^2(X'X)^{-1}$.
  You are asked to show that
      $\mathrm{Var}(\hat\beta_j \mid X) = \sum_{i=1}^n w_{ij}\sigma_i^2 / \mathrm{SSR}_j$, $j = 1, \ldots, k$,
  where $w_{ij} > 0$, $\sum_{i=1}^n w_{ij} = 1$, and $\mathrm{SSR}_j$ is the SSR in the regression of $x_j$ on all other regressors. So under homoskedasticity,
      $\mathrm{Var}(\hat\beta_j \mid X) = \sigma^2/\mathrm{SSR}_j = \sigma^2/[\mathrm{SST}_j(1 - R_j^2)]$, $j = 1, \ldots, k$,
  (why?), where $\mathrm{SST}_j$ is the SST of $x_j$, and $R_j^2$ is the R-squared from the regression of $x_j$ on the remaining regressors (which include an intercept).
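  A numerical check (hypothetical data and $\sigma^2$) that under homoskedasticity the $j$-th diagonal element of $\sigma^2(X'X)^{-1}$ equals $\sigma^2/[\mathrm{SST}_j(1 - R_j^2)]$:

```python
import numpy as np

# Compare the (j, j) element of sigma^2 (X'X)^{-1} with sigma^2 / [SST_j (1 - R_j^2)].
rng = np.random.default_rng(8)
n, sigma2 = 300, 2.0
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)            # correlated regressors
X = np.column_stack([np.ones(n), x1, x2])

j = 2                                         # coefficient on x2
var_j = sigma2 * np.linalg.inv(X.T @ X)[j, j]

# Regress x_j on the remaining regressors (constant and x1) to get SST_j and R_j^2.
X_others = X[:, [0, 1]]
fit = X_others @ np.linalg.solve(X_others.T @ X_others, X_others.T @ X[:, j])
SST_j = np.sum((X[:, j] - X[:, j].mean()) ** 2)
R2_j = 1 - np.sum((X[:, j] - fit) ** 2) / SST_j

print(var_j, sigma2 / (SST_j * (1 - R2_j)))   # the two numbers coincide
```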

  17. Bias of σ̂²
  Recall that $\hat u = Mu$, where we abbreviate $M_X$ as $M$, so by the properties of projection matrices and the trace operator, we have
      $\hat\sigma^2 = \frac{1}{n}\hat u'\hat u = \frac{1}{n}u'MMu = \frac{1}{n}u'Mu = \frac{1}{n}\mathrm{tr}(Muu')$.
  Then
      $E[\hat\sigma^2 \mid X] = \frac{1}{n}\mathrm{tr}\big(E[Muu'|X]\big) = \frac{1}{n}\mathrm{tr}\big(M\,E[uu'|X]\big) = \frac{1}{n}\mathrm{tr}(MD)$.
  In the homoskedastic case, $D = \sigma^2 I_n$, so
      $E[\hat\sigma^2 \mid X] = \frac{1}{n}\mathrm{tr}(M)\sigma^2 = \left(\frac{n-k}{n}\right)\sigma^2$.
  Thus $\hat\sigma^2$ underestimates $\sigma^2$. Alternatively, $s^2 = \frac{1}{n-k}\hat u'\hat u$ is unbiased for $\sigma^2$. This is the justification for the common preference of $s^2$ over $\hat\sigma^2$ in empirical practice.
  However, this estimator is only unbiased in the special case of the homoskedastic linear regression model. It is not unbiased in the absence of homoskedasticity or in the projection model.
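  A Monte Carlo sketch (hypothetical homoskedastic design) of the bias result $E[\hat\sigma^2 \mid X] = \frac{n-k}{n}\sigma^2$ and of the unbiasedness of $s^2$:

```python
import numpy as np

# Holding X fixed, average sigma_hat^2 ~ (n-k)/n * sigma^2, while average s^2 ~ sigma^2.
rng = np.random.default_rng(9)
n, k, sigma2, reps = 30, 5, 4.0, 20000
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)    # u_hat = M u

ss = np.empty(reps)
for r in range(reps):
    u = rng.normal(scale=np.sqrt(sigma2), size=n)    # homoskedastic errors
    u_hat = M @ u
    ss[r] = u_hat @ u_hat                            # sum of squared residuals

print(ss.mean() / n, (n - k) / n * sigma2)           # sigma_hat^2 vs. its conditional mean
print(ss.mean() / (n - k), sigma2)                   # s^2 vs. sigma^2
```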

  18. The Gauss-Markov Theorem
