SLIDE 1

Least Squares Estimation-Finite-Sample Properties

Ping Yu

School of Economics and Finance The University of Hong Kong

Ping Yu (HKU) Finite-Sample 1 / 29

SLIDE 2

Terminology and Assumptions

1. Terminology and Assumptions
2. Goodness of Fit
3. Bias and Variance
4. The Gauss-Markov Theorem
5. Multicollinearity
6. Hypothesis Testing: An Introduction
7. LSE as a MLE

Ping Yu (HKU) Finite-Sample 2 / 29

SLIDE 3

Terminology and Assumptions

Terminology and Assumptions

Ping Yu (HKU) Finite-Sample 2 / 29

SLIDE 4

Terminology and Assumptions

Terminology

$$y = x'\beta + u, \qquad E[u\,|\,x] = 0.$$

$u$ is called the error term, disturbance or unobservable.

    y                     x
    -------------------   ---------------------------
    Dependent variable    Independent variable
    Explained variable    Explanatory variable
    Response variable     Control (Stimulus) variable
    Predicted variable    Predictor variable
    Regressand            Regressor
    LHS variable          RHS variable
    Endogenous variable   Exogenous variable
                          Covariate
                          Conditioning variable

Table 1: Terminology for Linear Regression

Ping Yu (HKU) Finite-Sample 3 / 29

SLIDE 5

Terminology and Assumptions

Assumptions

We maintain the following assumptions in this chapter.

Assumption OLS.0 (random sampling): $(y_i, x_i)$, $i = 1, \ldots, n$, are independent and identically distributed (i.i.d.).

Assumption OLS.1 (full rank): $\mathrm{rank}(X) = k$.

Assumption OLS.2 (first moment): $E[y\,|\,x] = x'\beta$.

Assumption OLS.3 (second moment): $E[u^2] < \infty$.

Assumption OLS.3$'$ (homoskedasticity): $E[u^2\,|\,x] = \sigma^2$.

Ping Yu (HKU) Finite-Sample 4 / 29

SLIDE 6

Terminology and Assumptions

Discussion

Assumption OLS.2 is equivalent to $y = x'\beta + u$ (linear in parameters) plus $E[u\,|\,x] = 0$ (zero conditional mean).

To study the finite-sample properties of the LSE, such as unbiasedness, we always assume Assumption OLS.2, i.e., the model is a linear regression.¹

Assumption OLS.3$'$ is stronger than Assumption OLS.3. The linear regression model under Assumption OLS.3$'$ is called the homoskedastic linear regression model,

$$y = x'\beta + u, \qquad E[u\,|\,x] = 0, \qquad E[u^2\,|\,x] = \sigma^2.$$

If $E[u^2\,|\,x] = \sigma^2(x)$ depends on $x$, we say $u$ is heteroskedastic.

¹ For large-sample properties such as consistency, we require only weaker assumptions.

Ping Yu (HKU) Finite-Sample 5 / 29

SLIDE 7

Goodness of Fit

Goodness of Fit

Ping Yu (HKU) Finite-Sample 6 / 29

SLIDE 8

Goodness of Fit

Residual and SER

Express

$$y_i = \hat y_i + \hat u_i, \qquad (1)$$

where $\hat y_i = x_i'\hat\beta$ is the predicted value, and $\hat u_i = y_i - \hat y_i$ is the residual.²

Often, the error variance $\sigma^2 = E[u^2]$ is also a parameter of interest. It measures the variation in the "unexplained" part of the regression. Its method of moments (MoM) estimator is the sample average of the squared residuals,

$$\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^n \hat u_i^2 = \frac{1}{n}\hat u'\hat u.$$

An alternative estimator uses the formula

$$s^2 = \frac{1}{n-k}\sum_{i=1}^n \hat u_i^2 = \frac{1}{n-k}\hat u'\hat u.$$

This estimator adjusts for the degrees of freedom (df) of $\hat u$.

² $\hat u_i$ is different from $u_i$. The latter is unobservable while the former is a by-product of OLS estimation.

Ping Yu (HKU) Finite-Sample 7 / 29
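As a numerical companion (a minimal NumPy sketch with a made-up design, not from the original slides), the following code computes the OLS fit, the residuals, and both error-variance estimators:

    import numpy as np

    rng = np.random.default_rng(0)
    n, k = 100, 3
    X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])  # includes a constant
    y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)   # the LSE
    y_hat = X @ beta_hat                               # predicted values
    u_hat = y - y_hat                                  # residuals
    sigma2_hat = u_hat @ u_hat / n                     # MoM estimator, biased downward
    s2 = u_hat @ u_hat / (n - k)                       # df-adjusted estimator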

SLIDE 9

Goodness of Fit

Coefficient of Determination

If $X$ includes a column of ones, $1'\hat u = \sum_{i=1}^n \hat u_i = 0$, so $\bar y = \bar{\hat y}$. Subtracting $\bar y$ from both sides of (1), we have

$$\tilde y_i \equiv y_i - \bar y = (\hat y_i - \bar y) + \hat u_i \equiv \tilde{\hat y}_i + \hat u_i.$$

Since $\tilde{\hat y}'\hat u = \hat y'\hat u - \bar y\,1'\hat u = \hat\beta'X'\hat u - \bar y\,1'\hat u = 0$,

$$\mathrm{SST} \equiv \|\tilde y\|^2 = \tilde y'\tilde y = \|\tilde{\hat y}\|^2 + 2\tilde{\hat y}'\hat u + \|\hat u\|^2 = \|\tilde{\hat y}\|^2 + \|\hat u\|^2 \equiv \mathrm{SSE} + \mathrm{SSR}, \qquad (2)$$

where SST, SSE and SSR mean the total sum of squares, the explained sum of squares, and the residual sum of squares (or the sum of squared residuals), respectively. Dividing both sides of (2) by SST,³ we have

$$1 = \frac{\mathrm{SSE}}{\mathrm{SST}} + \frac{\mathrm{SSR}}{\mathrm{SST}}.$$

The R-squared of the regression, sometimes called the coefficient of determination, is defined as

$$R^2 = \frac{\mathrm{SSE}}{\mathrm{SST}} = 1 - \frac{\mathrm{SSR}}{\mathrm{SST}} = 1 - \frac{\hat\sigma^2}{\hat\sigma_y^2}.$$

³ When can we conduct this operation, i.e., SST ≠ 0?

Ping Yu (HKU) Finite-Sample 8 / 29
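Continuing the numerical sketch from the previous slide, the decomposition (2) and $R^2$ can be verified directly:

    SST = np.sum((y - y.mean()) ** 2)       # total sum of squares
    SSE = np.sum((y_hat - y.mean()) ** 2)   # explained sum of squares
    SSR = u_hat @ u_hat                     # residual sum of squares
    assert np.isclose(SSE + SSR, SST)       # holds because X contains a constant
    R2 = 1 - SSR / SST                      # coefficient of determination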

SLIDE 10

Goodness of Fit

More on R2

$R^2$ is defined only if $x$ includes a constant. It is usually interpreted as the fraction of the sample variation in $y$ that is explained by the (nonconstant) $x$.

When there is no constant term in $x_i$, we need to define the so-called uncentered $R^2$, denoted as $R_u^2$:

$$R_u^2 = \frac{\hat y'\hat y}{y'y}.$$

$R^2$ can also be treated as an estimator of $\rho^2 = 1 - \sigma^2/\sigma_y^2$. It is often useful in algebraic manipulation of some statistics.

An alternative estimator of $\rho^2$, proposed by Henri Theil (1924-2000) and called the adjusted R-squared or "R-bar-squared", is

$$\bar R^2 = 1 - \frac{s^2}{\tilde\sigma_y^2} = 1 - (1 - R^2)\frac{n-1}{n-k} \le R^2,$$

where $\tilde\sigma_y^2 = \tilde y'\tilde y/(n-1)$. $\bar R^2$ adjusts the degrees of freedom in the numerator and denominator of $R^2$.

Ping Yu (HKU) Finite-Sample 9 / 29
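A one-line check of Theil's formula against the definition, continuing the same sketch:

    R2_bar = 1 - (1 - R2) * (n - 1) / (n - k)        # adjusted R-squared
    assert np.isclose(R2_bar, 1 - s2 / (SST / (n - 1)))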

SLIDE 11

Goodness of Fit

Degree of Freedom

Why called "degree of freedom"? Roughly speaking, the degree of freedom is the dimension of the space where a vector can stay, or how "freely" a vector can move. For example, b u, as a n-dimensional vector, can only stay in a subspace with dimension n k. Why? This is because X0b u = 0, so k constraints are imposed on b u, and b u cannot move completely freely and loses k degree of freedom. Similarly, the degree of freedom of e y is n 1. Figure 1 illustrates why the degree of freedom of e y is n 1 when n = 2. Table 2 summarizes the degrees of freedom for the three terms in (2). Variation Notation df SSE e b y

0e

b y k 1 SSR b u0b u n k SST e y0e y n 1 Table 2: Degrees of Freedom for Three Variations

Ping Yu (HKU) Finite-Sample 10 / 29
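These df claims can be checked numerically in the running sketch: $X'\hat u = 0$ imposes $k$ constraints, and the annihilator matrix $M$ has rank $n-k$:

    M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)   # annihilator matrix M_X
    assert np.allclose(X.T @ u_hat, 0)                  # k constraints on u_hat
    assert np.linalg.matrix_rank(M) == n - k            # df(u_hat) = n - k
    assert np.isclose(np.trace(M), n - k)               # trace equals rank for projections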

SLIDE 12

Goodness of Fit

Figure 1: Although $\dim(\tilde y) = 2$, $\mathrm{df}(\tilde y) = 1$, where $\tilde y = (\tilde y_1, \tilde y_2)$

Ping Yu (HKU) Finite-Sample 11 / 29

SLIDE 13

Bias and Variance

Bias and Variance

Ping Yu (HKU) Finite-Sample 12 / 29

SLIDE 14

Bias and Variance

Unbiasedness of the LSE

Assumption OLS.2 implies that $y = x'\beta + u$, $E[u\,|\,x] = 0$. Then

$$E[u\,|\,X] = \begin{pmatrix} \vdots \\ E[u_i\,|\,X] \\ \vdots \end{pmatrix} = \begin{pmatrix} \vdots \\ E[u_i\,|\,x_i] \\ \vdots \end{pmatrix} = 0,$$

where the second equality is from the assumption of independent sampling (Assumption OLS.0).

Now,

$$\hat\beta = (X'X)^{-1}X'y = (X'X)^{-1}X'(X\beta + u) = \beta + (X'X)^{-1}X'u,$$

so

$$E\big[\hat\beta - \beta\,|\,X\big] = E\big[(X'X)^{-1}X'u\,|\,X\big] = (X'X)^{-1}X'E[u\,|\,X] = 0,$$

i.e., $\hat\beta$ is unbiased.

Ping Yu (HKU) Finite-Sample 13 / 29
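A Monte Carlo illustration of conditional unbiasedness (a sketch; the design, coefficients, and number of replications are arbitrary assumptions):

    rng = np.random.default_rng(1)
    n, reps = 50, 5000
    beta = np.array([1.0, 2.0])
    X = np.column_stack([np.ones(n), rng.normal(size=n)])   # hold the design fixed
    draws = np.empty((reps, 2))
    for r in range(reps):
        u = rng.normal(size=n)                              # E[u | X] = 0
        draws[r] = np.linalg.solve(X.T @ X, X.T @ (X @ beta + u))
    print(draws.mean(axis=0))                               # close to (1.0, 2.0)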

SLIDE 15

Bias and Variance

Variance of the LSE

$$\mathrm{Var}\big(\hat\beta\,|\,X\big) = \mathrm{Var}\big((X'X)^{-1}X'u\,|\,X\big) = (X'X)^{-1}X'\,\mathrm{Var}(u\,|\,X)\,X(X'X)^{-1} \equiv (X'X)^{-1}X'DX(X'X)^{-1}.$$

Note that

$$\mathrm{Var}(u_i\,|\,X) = \mathrm{Var}(u_i\,|\,x_i) = E\big[u_i^2\,|\,x_i\big] - E[u_i\,|\,x_i]^2 = E\big[u_i^2\,|\,x_i\big] \equiv \sigma_i^2,$$

and

$$\mathrm{Cov}(u_i, u_j\,|\,X) = E[u_iu_j\,|\,X] - E[u_i\,|\,X]E[u_j\,|\,X] = E[u_iu_j\,|\,x_i,x_j] - E[u_i\,|\,x_i]E[u_j\,|\,x_j] = E[u_i\,|\,x_i]E[u_j\,|\,x_j] - E[u_i\,|\,x_i]E[u_j\,|\,x_j] = 0,$$

so $D$ is a diagonal matrix: $D = \mathrm{diag}\big(\sigma_1^2, \ldots, \sigma_n^2\big)$.

Ping Yu (HKU) Finite-Sample 14 / 29

SLIDE 16

Bias and Variance

continue...

It is useful to note that

$$X'DX = \sum_{i=1}^n x_ix_i'\sigma_i^2.$$

In the homoskedastic case, $\sigma_i^2 = \sigma^2$ and $D = \sigma^2 I_n$, so $X'DX = \sigma^2 X'X$, and

$$\mathrm{Var}\big(\hat\beta\,|\,X\big) = \sigma^2(X'X)^{-1}.$$

You are asked to show that

$$\mathrm{Var}\big(\hat\beta_j\,|\,X\big) = \sum_{i=1}^n w_{ij}\sigma_i^2\big/\mathrm{SSR}_j, \quad j = 1, \ldots, k,$$

where $w_{ij} > 0$, $\sum_{i=1}^n w_{ij} = 1$, and $\mathrm{SSR}_j$ is the SSR in the regression of $x_j$ on all other regressors.

So under homoskedasticity,

$$\mathrm{Var}\big(\hat\beta_j\,|\,X\big) = \sigma^2\big/\mathrm{SSR}_j = \sigma^2\big/\big[\mathrm{SST}_j(1-R_j^2)\big], \quad j = 1, \ldots, k,$$

(why?), where $\mathrm{SST}_j$ is the SST of $x_j$, and $R_j^2$ is the R-squared from the regression of $x_j$ on the remaining regressors (which include an intercept).

Ping Yu (HKU) Finite-Sample 15 / 29
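The sandwich form is straightforward to compute (a sketch continuing the Monte Carlo design above; the skedastic function is a made-up assumption, purely for illustration):

    sigma2_i = 0.5 + X[:, 1] ** 2                    # hypothetical conditional variances
    D = np.diag(sigma2_i)
    XtX_inv = np.linalg.inv(X.T @ X)
    V_hetero = XtX_inv @ (X.T @ D @ X) @ XtX_inv     # (X'X)^{-1} X'DX (X'X)^{-1}
    V_homo = 1.0 * XtX_inv                           # sigma^2 (X'X)^{-1} with sigma^2 = 1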

SLIDE 17

Bias and Variance

Bias of $\hat\sigma^2$

Recall that $\hat u = Mu$, where we abbreviate $M_X$ as $M$, so by the properties of projection matrices and the trace operator, we have

$$\hat\sigma^2 = \frac{1}{n}\hat u'\hat u = \frac{1}{n}u'MMu = \frac{1}{n}u'Mu = \frac{1}{n}\mathrm{tr}\big(u'Mu\big) = \frac{1}{n}\mathrm{tr}\big(Muu'\big).$$

Then

$$E\big[\hat\sigma^2\,|\,X\big] = \frac{1}{n}\mathrm{tr}\big(E[Muu'\,|\,X]\big) = \frac{1}{n}\mathrm{tr}\big(ME[uu'\,|\,X]\big) = \frac{1}{n}\mathrm{tr}(MD).$$

In the homoskedastic case, $D = \sigma^2 I_n$, so

$$E\big[\hat\sigma^2\,|\,X\big] = \frac{1}{n}\mathrm{tr}(M)\,\sigma^2 = \sigma^2\Big(\frac{n-k}{n}\Big).$$

Thus $\hat\sigma^2$ underestimates $\sigma^2$. Alternatively, $s^2 = \frac{1}{n-k}\hat u'\hat u$ is unbiased for $\sigma^2$. This is the justification for the common preference for $s^2$ over $\hat\sigma^2$ in empirical practice. However, this estimator is unbiased only in the special case of the homoskedastic linear regression model. It is not unbiased in the absence of homoskedasticity or in the projection model.

Ping Yu (HKU) Finite-Sample 16 / 29
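A simulation sketch of the downward bias (continuing the same design, where $\sigma^2 = 1$, $n = 50$, $k = 2$, so $E[\hat\sigma^2\,|\,X] = 48/50$):

    sig2_draws, s2_draws = np.empty(5000), np.empty(5000)
    for r in range(5000):
        u = rng.normal(size=n)                              # homoskedastic, sigma^2 = 1
        uh = u - X @ np.linalg.solve(X.T @ X, X.T @ u)      # residuals: M u
        sig2_draws[r] = uh @ uh / n
        s2_draws[r] = uh @ uh / (n - 2)
    print(sig2_draws.mean(), s2_draws.mean())               # approx 0.96 vs. approx 1.00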

SLIDE 18

The Gauss-Markov Theorem

The Gauss-Markov Theorem

Ping Yu (HKU) Finite-Sample 17 / 29

SLIDE 19

The Gauss-Markov Theorem

The Gauss-Markov Theorem

The LSE has some optimality properties among a restricted class of estimators in a restricted class of models. The model is restricted to be the homoskedastic linear regression model, and the class of estimators is restricted to be linear unbiased.

Here, "linear" means the estimator is a linear function of $y$. In other words, the estimator, say $\tilde\beta$, can be written as

$$\tilde\beta = A'y = A'(X\beta + u) = A'X\beta + A'u,$$

where $A$ is any $n \times k$ function of $X$.

Unbiasedness implies that $E[\tilde\beta\,|\,X] = E[A'y\,|\,X] = A'X\beta = \beta$, or $A'X = I_k$. In this case, $\tilde\beta = \beta + A'u$, so under homoskedasticity,

$$\mathrm{Var}\big(\tilde\beta\,|\,X\big) = A'\,\mathrm{Var}(u\,|\,X)\,A = A'A\sigma^2.$$

The Gauss-Markov Theorem states that the best choice of $A'$ is $(X'X)^{-1}X'$, in the sense that this choice of $A$ achieves the smallest variance.

Ping Yu (HKU) Finite-Sample 18 / 29

SLIDE 20

The Gauss-Markov Theorem

continue...

Theorem. In the homoskedastic linear regression model, the best (minimum-variance) linear unbiased estimator (BLUE) is the LSE.

Proof. The variance of the LSE is $(X'X)^{-1}\sigma^2$ and that of $\tilde\beta$ is $A'A\sigma^2$, so it is sufficient to show that $A'A - (X'X)^{-1} \ge 0$. Set $C = A - X(X'X)^{-1}$. Note that $X'C = 0$. Then we calculate that

$$A'A - (X'X)^{-1} = \big(C + X(X'X)^{-1}\big)'\big(C + X(X'X)^{-1}\big) - (X'X)^{-1} = C'C + C'X(X'X)^{-1} + (X'X)^{-1}X'C + (X'X)^{-1}X'X(X'X)^{-1} - (X'X)^{-1} = C'C \ge 0.$$

Ping Yu (HKU) Finite-Sample 19 / 29
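A numerical companion to the proof (a sketch; the perturbation W is an arbitrary assumption): any A with $A'X = I_k$ yields a conditional variance exceeding that of the LSE by a positive semi-definite matrix.

    k = X.shape[1]
    XtX_inv = np.linalg.inv(X.T @ X)
    M = np.eye(n) - X @ XtX_inv @ X.T          # annihilator, X'M = 0
    W = rng.normal(size=(n, k))                # arbitrary perturbation
    A = X @ XtX_inv + M @ W                    # satisfies A'X = I_k
    assert np.allclose(A.T @ X, np.eye(k))
    gap = A.T @ A - XtX_inv                    # equals C'C >= 0 in the proof
    assert np.linalg.eigvalsh(gap).min() >= -1e-8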

SLIDE 21

The Gauss-Markov Theorem

Limitation and Extension of the Gauss-Markov Theorem

The scope of the Gauss-Markov Theorem is quite limited given that it requires the class of estimators to be linear unbiased and the model to be homoskedastic. This leaves open the possibility that a nonlinear or biased estimator could have lower mean squared error (MSE) than the LSE in a heteroskedastic model.

MSE: for simplicity, suppose $\dim(\beta) = 1$; then

$$\mathrm{MSE}\big(\tilde\beta\big) = E\Big[\big(\tilde\beta - \beta\big)^2\Big] = \mathrm{Var}\big(\tilde\beta\big) + \mathrm{Bias}\big(\tilde\beta\big)^2.$$

To exclude such possibilities, we need asymptotic (or large-sample) arguments. Chamberlain (1987) shows that in the model $y = x'\beta + u$, if the only available information is $E[xu] = 0$ or ($E[u\,|\,x] = 0$ and $E[u^2\,|\,x] = \sigma^2$), then among all estimators, the LSE achieves the lowest asymptotic MSE.

Ping Yu (HKU) Finite-Sample 20 / 29

SLIDE 22

Multicollinearity

Multicollinearity

Ping Yu (HKU) Finite-Sample 21 / 29

SLIDE 23

Multicollinearity

Multicollinearity

If $\mathrm{rank}(X'X) < k$, then $\hat\beta$ is not uniquely defined. This is called strict (or exact) multicollinearity. This happens when the columns of $X$ are linearly dependent, i.e., there is some $\alpha \ne 0$ such that $X\alpha = 0$. Most commonly, this arises when sets of regressors that are identically related are included; for example, when $X$ includes a column of ones and dummies for both male and female. When this happens, the applied researcher quickly discovers the error, as the statistical software will be unable to construct $(X'X)^{-1}$. Since the error is discovered quickly, this is rarely a problem for applied econometric practice.

The more relevant issue is near multicollinearity, which is often called "multicollinearity" for brevity. This is the situation when the $X'X$ matrix is near singular, or when the columns of $X$ are close to being linearly dependent. This definition is not precise, because we have not said what it means for a matrix to be "near singular". This is one difficulty with the definition and interpretation of multicollinearity.

Ping Yu (HKU) Finite-Sample 22 / 29

SLIDE 24

Multicollinearity

continue...

One implication of near singularity of matrices is that the numerical reliability of the calculations is reduced.

A more relevant implication of near multicollinearity is that individual coefficient estimates will be imprecise. We can see this most simply in a homoskedastic linear regression model with two regressors

$$y_i = x_{1i}\beta_1 + x_{2i}\beta_2 + u_i,$$

and

$$\frac{1}{n}X'X = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}.$$

In this case,

$$\mathrm{Var}\big(\hat\beta\,|\,X\big) = \frac{\sigma^2}{n}\begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}^{-1} = \frac{\sigma^2}{n(1-\rho^2)}\begin{pmatrix} 1 & -\rho \\ -\rho & 1 \end{pmatrix}.$$

The correlation $\rho$ indexes collinearity, since as $\rho$ approaches 1 the matrix becomes singular: $\sigma^2/[n(1-\rho^2)] \to \infty$ as $\rho \to 1$. Thus the more "collinear" are the regressors, the worse the precision of the individual coefficient estimates.

Ping Yu (HKU) Finite-Sample 23 / 29
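To see the explosion numerically (a minimal sketch; $\sigma^2 = 1$ and $n = 100$ are arbitrary assumptions):

    sigma2, n_obs = 1.0, 100
    for rho in [0.0, 0.5, 0.9, 0.99, 0.999]:
        var_b1 = sigma2 / (n_obs * (1 - rho ** 2))   # diagonal entry of Var(beta_hat | X)
        print(f"rho = {rho:6.3f}   Var(beta_1_hat | X) = {var_b1:.4f}")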

SLIDE 25

Multicollinearity

continue...

In the general model

$$y_i = x_{1i}\beta_1 + x_{2i}'\beta_2 + u_i,$$

recall that

$$\mathrm{Var}\big(\hat\beta_1\,|\,X\big) = \frac{\sigma^2}{\mathrm{SST}_1(1 - R_1^2)}. \qquad (3)$$

Because the R-squared measures goodness of fit, a value of $R_1^2$ close to one indicates that $x_2$ explains much of the variation in $x_1$ in the sample. This means that $x_1$ and $x_2$ are highly correlated. When $R_1^2$ approaches 1, the variance of $\hat\beta_1$ explodes.

$1/(1-R_1^2)$ is often termed the variance inflation factor (VIF). Usually, a VIF larger than 10 should draw our attention.

Intuition: $\beta_1$ measures the effect on $y$ as $x_1$ changes one unit, holding $x_2$ fixed. When $x_1$ and $x_2$ are highly correlated, you cannot change $x_1$ while holding $x_2$ fixed, so $\beta_1$ cannot be estimated precisely.

Multicollinearity is a small-sample problem. As larger and larger data sets become available nowadays, i.e., $n \gg k$, it is seldom a problem in current econometric practice.

Ping Yu (HKU) Finite-Sample 24 / 29
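A sketch of a VIF computation via the auxiliary regression in (3) (the function name and data layout are illustrative; it assumes the intercept column stays among the "other" regressors):

    def vif(X, j):
        """VIF of column j of X: regress x_j on the remaining columns."""
        xj = X[:, j]
        X_others = np.delete(X, j, axis=1)              # keeps the intercept column
        coef, *_ = np.linalg.lstsq(X_others, xj, rcond=None)
        resid = xj - X_others @ coef
        R2_j = 1 - (resid @ resid) / np.sum((xj - xj.mean()) ** 2)
        return 1.0 / (1.0 - R2_j)

For a design with an intercept and uncorrelated regressors the VIF is near 1; it grows without bound as $R_j^2 \to 1$.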

SLIDE 26

Hypothesis Testing: An Introduction

Hypothesis Testing: An Introduction

Ping Yu (HKU) Finite-Sample 25 / 29

SLIDE 27

Hypothesis Testing: An Introduction

Basic Concepts

  • null hypothesis, alternative hypothesis
  • point hypothesis, one-sided hypothesis, two-sided hypothesis (we consider only the point null hypothesis in this course)
  • simple hypothesis, composite hypothesis
  • acceptance region and rejection (or critical) region
  • test statistic, critical value
  • type I error and type II error
  • size and power
  • significance level, statistically (in)significant

Ping Yu (HKU) Finite-Sample 26 / 29

SLIDE 28

Hypothesis Testing: An Introduction

Summary

A hypothesis test includes the following steps:

1. specify the null and alternative.
2. construct the test statistic.
3. derive the distribution of the test statistic under the null.
4. determine the decision rule (acceptance and rejection regions) by specifying a level of significance.
5. study the power of the test.

Steps 2, 3 and 5 are key, since steps 1 and 4 are usually trivial. Of course, in some cases how to specify the null and the alternative is also subtle, and in some cases the critical value is not easy to determine if the asymptotic distribution is complicated.

Ping Yu (HKU) Finite-Sample 27 / 29
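To make the steps concrete, here is a sketch of a two-sided t-test of H0: beta_1 = 0 at the 5% level, reusing the simulated design from the earlier sketches (under the assumed DGP the null is false, so it should typically be rejected):

    from scipy import stats

    u = rng.normal(size=n)
    y = X @ beta + u                                  # data under the assumed DGP
    b = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ b
    k = X.shape[1]
    s2 = resid @ resid / (n - k)
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])   # homoskedastic std. error of beta_1_hat
    t_stat = b[1] / se                                # step 2: test statistic
    crit = stats.t.ppf(0.975, df=n - k)               # steps 3-4: null distribution, 5% level
    print("reject H0" if abs(t_stat) > crit else "fail to reject H0")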

SLIDE 29

LSE as a MLE

LSE as a MLE

Ping Yu (HKU) Finite-Sample 28 / 29

SLIDE 30

LSE as a MLE

LSE as a MLE

Another motivation for the LSE can be obtained from the normal regression model:

Assumption OLS.4 (normality): $u\,|\,x \sim N(0, \sigma^2)$, or $u\,|\,X \sim N(0, I_n\sigma^2)$.

That is, the error $u_i$ is independent of $x_i$ and has the distribution $N(0, \sigma^2)$, which obviously implies $E[u\,|\,x] = 0$ and $E[u^2\,|\,x] = \sigma^2$.

The average log-likelihood is

$$\ell_n\big(\beta, \sigma^2\big) = \frac{1}{n}\sum_{i=1}^n \ln\!\left(\frac{1}{\sqrt{2\pi\sigma^2}}\exp\!\left(-\frac{(y_i - x_i'\beta)^2}{2\sigma^2}\right)\right) = -\frac{1}{2}\log(2\pi) - \frac{1}{2}\log\big(\sigma^2\big) - \frac{1}{n}\sum_{i=1}^n\frac{(y_i - x_i'\beta)^2}{2\sigma^2},$$

so $\hat\beta_{\mathrm{MLE}} = \hat\beta_{\mathrm{LSE}}$. It is not hard to show that

$$\hat\beta - \beta\,|\,X = (X'X)^{-1}X'u\,|\,X \sim N\big(0,\ \sigma^2(X'X)^{-1}\big).$$

But recall the trade-off between efficiency and robustness, which applies here as well. In any case, this is part of the classical theory of least squares estimation. We will skip this section and proceed to the asymptotic theory of the LSE, which is more robust and does not require the normality assumption.

Ping Yu (HKU) Finite-Sample 29 / 29
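A numerical check that the Gaussian MLE of $\beta$ coincides with the LSE (a sketch; the scipy optimizer and starting values are assumptions, and agreement is only up to optimizer tolerance):

    from scipy.optimize import minimize

    def neg_avg_loglik(theta):
        b, log_s2 = theta[:-1], theta[-1]   # sigma^2 = exp(log_s2) keeps the variance positive
        r = y - X @ b
        return 0.5 * np.log(2 * np.pi) + 0.5 * log_s2 + np.mean(r ** 2) / (2 * np.exp(log_s2))

    fit = minimize(neg_avg_loglik, x0=np.zeros(X.shape[1] + 1))
    beta_mle = fit.x[:-1]
    beta_lse = np.linalg.solve(X.T @ X, X.T @ y)
    print(beta_mle, beta_lse)               # the two should agree closely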