 
Slide Set 4
CLRM estimation

Pietro Coretto
pcoretto@unisa.it

Econometrics
Master in Economics and Finance (MEF)
Università degli Studi di Napoli “Federico II”

Version: Saturday 28th December, 2019 (h16:05)

P. Coretto • MEF CLRM estimation


Least Squares Method (LS)

Given an additive regression model
$$y = f(X; \beta) + \varepsilon,$$
note that $\varepsilon$ is not observed, but it is a function of the observables and the unknown parameter:
$$\varepsilon = y - f(X; \beta).$$

LS method:
- assume the signal $f(X; \beta)$ is much stronger than the error $\varepsilon$;
- look for a $\beta$ such that the “size” of $\varepsilon$ is as small as possible;
- the size of $\varepsilon$ is measured by some norm $\|\varepsilon\|$.

Ordinary Least Squares estimator (OLS)

OLS = LS with the Euclidean norm $\|\cdot\|_2$. Therefore the OLS objective function is
$$S(\beta) = \|\varepsilon\|_2^2 = \varepsilon'\varepsilon = (y - f(X;\beta))'(y - f(X;\beta)),$$
and the OLS estimator $b$ is defined as the optimal solution
$$b = \arg\min_{\beta \in \mathbb{R}^K} S(\beta).$$
For the linear model
$$S(\beta) = \|\varepsilon\|_2^2 = \varepsilon'\varepsilon = (y - X\beta)'(y - X\beta) = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} (y_i - x_i'\beta)^2.$$
$S(\beta)$ is nicely convex! It is a quadratic function of $\beta$ with Hessian $2X'X$, which is positive semidefinite.


Proposition: OLS estimator

The “unique” OLS estimator is
$$b = (X'X)^{-1}X'y.$$
To see this, first we introduce two simple matrix derivative rules:

1. Let $a, b \in \mathbb{R}^p$, then
$$\frac{\partial\, a'b}{\partial b} = \frac{\partial\, b'a}{\partial b} = a.$$
2. Let $b \in \mathbb{R}^p$, and let $A \in \mathbb{R}^{p \times p}$ be symmetric, then
$$\frac{\partial\, b'Ab}{\partial b} = 2Ab = (2b'A)'.$$
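
The two derivative rules can be checked numerically. The sketch below is only an illustration and not part of the original slides: the helper `numerical_gradient`, the seed, and the dimension `p = 4` are assumptions made for the example. It compares central finite differences with the closed forms $a$ and $2Ab$.

```python
import numpy as np


def numerical_gradient(func, b, step=1e-6):
    """Central finite-difference gradient of a scalar function at b."""
    grad = np.zeros_like(b)
    for j in range(b.size):
        shift = np.zeros_like(b)
        shift[j] = step
        grad[j] = (func(b + shift) - func(b - shift)) / (2.0 * step)
    return grad


rng = np.random.default_rng(0)  # arbitrary seed, illustration only
p = 4                           # hypothetical dimension
a = rng.normal(size=p)
b = rng.normal(size=p)
G = rng.normal(size=(p, p))
A = G + G.T                     # symmetric by construction

# Rule 1: the gradient of a'b with respect to b is a
grad_linear = numerical_gradient(lambda v: a @ v, b)

# Rule 2: the gradient of b'Ab with respect to b is 2Ab when A is symmetric
grad_quadratic = numerical_gradient(lambda v: v @ A @ v, b)

print("rule 1 holds:", np.allclose(grad_linear, a, atol=1e-6))
print("rule 2 holds:", np.allclose(grad_quadratic, 2 * (A @ b), atol=1e-6))
```

Rule 2 relies on the symmetry of $A$; for a general square $A$ the gradient would be $(A + A')b$.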

Proof. Rewrite the LS objective function:
$$S(\beta) = (y - X\beta)'(y - X\beta) = y'y - \beta'X'y - y'X\beta + \beta'X'X\beta.$$
Note that the transpose of a scalar is the scalar itself, so
$$y'X\beta = (y'X\beta)' = \beta'X'y,$$
and we can write
$$S(\beta) = y'y - 2\beta'(X'y) + \beta'(X'X)\beta. \tag{4.1}$$
Since $S(\cdot)$ is convex, a minimum $b$ is characterized by the first order conditions
$$\left.\frac{\partial S(\beta)}{\partial \beta}\right|_{\beta = b} = 0.$$

By applying the previous derivative rules (1) and (2) to the 2nd and 3rd terms of (4.1),
$$\frac{\partial S(b)}{\partial b} = -2(X'y) + 2(X'X)b = 0,$$
which leads to the so-called “normal equations”
$$(X'X)b = X'y.$$
The matrix $X'X$ is square and symmetric (see homeworks). Based on A3, with probability 1 $X'X$ is nonsingular, so $(X'X)^{-1}$ exists and the normal equations can be written as
$$(X'X)^{-1}(X'X)b = (X'X)^{-1}X'y \implies b = (X'X)^{-1}X'y,$$
which proves the desired result. $\square$
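
As a minimal sketch of the result just proved, the following code solves the normal equations on simulated data and checks the first order conditions. The data, the seed, and the names `beta_true`, `S`, and `gradient_at_b` are assumptions made for the illustration; they do not come from the slides.

```python
import numpy as np

rng = np.random.default_rng(1)          # arbitrary seed, illustration only
n, K = 50, 3
# Simulated design matrix with a constant in the first column (hypothetical data)
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
beta_true = np.array([1.0, 2.0, -0.5])  # hypothetical "true" parameter
y = X @ beta_true + rng.normal(size=n)


def S(beta):
    """OLS objective: sum of squared errors at a candidate beta."""
    resid = y - X @ beta
    return resid @ resid


# Normal equations (X'X) b = X'y, solved without forming the inverse explicitly
b = np.linalg.solve(X.T @ X, X.T @ y)

# First order conditions: -2 X'y + 2 X'X b should be numerically zero at b
gradient_at_b = -2 * (X.T @ y) + 2 * (X.T @ X) @ b

# Cross-check against NumPy's least-squares routine
b_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]

print("b:", b)
print("max |gradient| at b:", np.abs(gradient_at_b).max())
print("agrees with lstsq:", np.allclose(b, b_lstsq))
print("S(b) <= S(beta_true):", S(b) <= S(beta_true))
```

Solving the linear system with `np.linalg.solve` is a design choice of this sketch; it gives the same $b$ as the formula $(X'X)^{-1}X'y$ while avoiding an explicit matrix inverse.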

Formulation in terms of sample averages

It can be shown (see homeworks) that
$$X'X = \sum_{i=1}^{n} x_i x_i' \quad \text{and} \quad X'y = \sum_{i=1}^{n} x_i y_i.$$
Define
$$S_{xx} = \frac{1}{n} X'X = \frac{1}{n}\sum_{i=1}^{n} x_i x_i' \quad \text{and} \quad s_{xy} = \frac{1}{n} X'y = \frac{1}{n}\sum_{i=1}^{n} x_i y_i.$$
Therefore $b = (X'X)^{-1}X'y$ can be written as
$$b = \left(\frac{1}{n} X'X\right)^{-1} \frac{1}{n} X'y = \left(\frac{1}{n}\sum_{i=1}^{n} x_i x_i'\right)^{-1} \left(\frac{1}{n}\sum_{i=1}^{n} x_i y_i\right) = S_{xx}^{-1} s_{xy}.$$


Residuals and fitted values

Once $\beta$ is estimated via $b$, the estimated error, also called the “residual”, is obtained as
$$e = y - Xb.$$
Fitted values, also called predicted values, are
$$\hat{y} = Xb,$$
so that $e = y - \hat{y}$. Note that
$$\hat{y}_i = b_1 + b_2 x_{i2} + b_3 x_{i3} + \ldots + b_K x_{iK} \quad \text{for all } i = 1, 2, \ldots, n.$$
What is $\hat{y}_i$? It is the estimated conditional expectation of $Y$ when $X_1 = 1, X_2 = x_{i2}, \ldots, X_K = x_{iK}$.
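
The sample-average formulation can be illustrated as follows. This is a sketch on simulated data; the seed, the coefficient values, and the variable names are assumptions of the example. It builds $S_{xx}$ and $s_{xy}$ observation by observation and compares $S_{xx}^{-1}s_{xy}$ with the matrix formula, then computes fitted values and residuals.

```python
import numpy as np

rng = np.random.default_rng(2)   # arbitrary seed, illustration only
n, K = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = X @ np.array([0.5, 1.5, -1.0]) + rng.normal(size=n)  # hypothetical coefficients

# Sample averages built one observation at a time (x_i is the i-th row of X)
S_xx = sum(np.outer(x_i, x_i) for x_i in X) / n
s_xy = sum(x_i * y_i for x_i, y_i in zip(X, y)) / n

# b from sample averages and from the matrix formula
b_avg = np.linalg.solve(S_xx, s_xy)
b_mat = np.linalg.solve(X.T @ X, X.T @ y)

# Fitted values and residuals
y_hat = X @ b_mat
e = y - y_hat

# Fitted value of the first observation written out term by term
i = 0
y_hat_i = b_mat[0] + b_mat[1] * X[i, 1] + b_mat[2] * X[i, 2]

print("same estimator:", np.allclose(b_avg, b_mat))
print("y_hat[0] term by term:", np.isclose(y_hat_i, y_hat[i]))
```

The factor $1/n$ cancels in $S_{xx}^{-1}s_{xy}$, which is why both routes give the same $b$.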

Algebraic/Geometric properties of the OLS

Proposition (orthogonality of residuals)
The column space of $X$ is orthogonal to the residual vector.

Proof. Write the normal equations:
$$X'Xb - X'y = 0 \implies X'(y - Xb) = 0 \implies X'e = 0.$$
Therefore, for every column $X_{\cdot k}$ (observed regressor), the inner product satisfies $X_{\cdot k}'e = 0$. $\square$


Proposition (residuals sum to zero)
If the linear model includes the constant term, then
$$\sum_{i=1}^{n} e_i = \sum_{i=1}^{n} (y_i - x_i'b) = 0.$$

Proof. By assumption we have a linear model with a constant/intercept term, that is,
$$y_i = \beta_1 + \beta_2 x_{i2} + \beta_3 x_{i3} + \ldots + \varepsilon_i.$$
Therefore $X_{\cdot 1} = 1_n = (1, 1, \ldots, 1)'$. Applying the previous property to the 1st column of $X$ gives
$$X_{\cdot 1}'e = 1_n'e = \sum_{i=1}^{n} e_i = 0,$$
and this proves the property. $\square$
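
Both propositions can be checked on simulated data. The sketch below is an illustration under assumed data and names (`ols_residuals`, `X_const`, `X_noconst` are hypothetical): it fits one model with a constant and one without, so the role of the intercept in the second proposition becomes visible.

```python
import numpy as np

rng = np.random.default_rng(3)   # arbitrary seed, illustration only
n = 30
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
y = 2.0 + x2 - x3 + rng.normal(size=n)        # hypothetical data with intercept 2

X_const = np.column_stack([np.ones(n), x2, x3])  # model with constant term
X_noconst = np.column_stack([x2, x3])            # model without constant term


def ols_residuals(X, y):
    """Residual vector e = y - Xb, with b solving the normal equations."""
    b = np.linalg.solve(X.T @ X, X.T @ y)
    return y - X @ b


e_const = ols_residuals(X_const, y)
e_noconst = ols_residuals(X_noconst, y)

# Orthogonality X'e = 0 holds in both models
print("X'e (with constant):   ", X_const.T @ e_const)
print("X'e (without constant):", X_noconst.T @ e_noconst)

# The residuals sum to zero only when the constant is included
print("sum of residuals (with constant):   ", e_const.sum())
print("sum of residuals (without constant):", e_noconst.sum())
```

Without the column of ones, $X'e = 0$ still holds, but nothing forces $1_n'e = 0$.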

Proposition (Fitted vector is a projection)
$\hat{y}$ is the projection of $y$ onto the space spanned by the columns of $X$ (regressors).

Proof.
$$\hat{y} = Xb = X(X'X)^{-1}X'y = Py.$$
It suffices to show that $P = X(X'X)^{-1}X'$ is symmetric and idempotent.
$$P' = \left(X(X'X)^{-1}X'\right)' = X\left((X'X)^{-1}\right)'X' = X\left((X'X)'\right)^{-1}X' = X(X'X)^{-1}X' = P.$$
Therefore $P$ is symmetric. Moreover,
$$PP = \left(X(X'X)^{-1}X'\right)\left(X(X'X)^{-1}X'\right) = X(X'X)^{-1}(X'X)(X'X)^{-1}X' = X(X'X)^{-1}X' = P,$$
which shows that $P$ is also idempotent, and this completes the proof. $\square$

$P$ is called the influence matrix, because it measures the impact of the observed $y$'s on each predicted $\hat{y}_i$. The elements of the diagonal of $P$ are called leverages, because they measure the influence of $y_i$ on the corresponding $\hat{y}_i$.
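
The properties of $P$ can be verified numerically. In this sketch the data, the seed, and the index `i = 0` are arbitrary assumptions; the last step illustrates the leverage interpretation by raising one $y_i$ by one unit and measuring the change in the corresponding fitted value.

```python
import numpy as np

rng = np.random.default_rng(4)   # arbitrary seed, illustration only
n, K = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = rng.normal(size=n)           # any y works for these algebraic properties

# P = X (X'X)^{-1} X', computed through a linear solve
P = X @ np.linalg.solve(X.T @ X, X.T)

b = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = P @ y
leverages = np.diag(P)

print("P symmetric: ", np.allclose(P, P.T))
print("P idempotent:", np.allclose(P @ P, P))
print("Py equals Xb:", np.allclose(y_hat, X @ b))

# Leverage as influence: raise y_i by one unit and look at the change in y_hat_i
i = 0
y_bumped = y.copy()
y_bumped[i] += 1.0
change_in_fit = (P @ y_bumped)[i] - y_hat[i]
print("change in y_hat_i equals leverage:", np.isclose(change_in_fit, leverages[i]))
```

Forming the full $n \times n$ matrix $P$ is fine for a small illustration; for large $n$ one would normally avoid building it explicitly.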

Proposition (Orthogonal decomposition)
The OLS fitting decomposes the observed vector $y$ into the sum of two orthogonal components:
$$y = \hat{y} + e = Py + My.$$
Remark: orthogonality implies that the individual contributions of each term of the decomposition of $y$ are somewhat well identified.

Proof. First notice that
$$e = y - \hat{y} = y - Py = (I - P)y = My,$$
where $M = I - P$. Therefore
$$y = \hat{y} + e = Py + My.$$
It remains to show that $\hat{y} = Py$ and $e = My$ are orthogonal vectors. First note that $MP = PM = 0$; in fact
$$(I - P)P = IP - PP = P - P = 0.$$
Moreover,
$$\langle Py, My \rangle = (Py)'(My) = y'P'My = y'PMy = y'0y = 0,$$
and this completes the proof. $\square$

$M = I - P$ is called the residual maker matrix because it maps $y$ into $e$. It allows us to write $e$ in terms of the observables $y$ and $X$. Properties:
- $M$ is idempotent and symmetric (show it);
- $MX = 0$; in fact $MX = (I - P)X = X - PX = X - X = 0$.

Remark: it can be shown that this decomposition is also unique (a consequence of the Hilbert projection theorem).
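
The decomposition and the properties of $M$ can be checked in a few lines. This is a sketch on simulated data with an arbitrary seed; the variable names are assumptions of the example.

```python
import numpy as np

rng = np.random.default_rng(5)   # arbitrary seed, illustration only
n, K = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = rng.normal(size=n)           # any y works for these algebraic properties

P = X @ np.linalg.solve(X.T @ X, X.T)   # projection (influence) matrix
M = np.eye(n) - P                       # residual maker matrix

y_hat = P @ y
e = M @ y

print("y = Py + My:           ", np.allclose(y, y_hat + e))
print("PM = 0:                ", np.allclose(P @ M, 0.0, atol=1e-10))
print("MX = 0:                ", np.allclose(M @ X, 0.0, atol=1e-10))
print("M symmetric/idempotent:", np.allclose(M, M.T) and np.allclose(M @ M, M))
print("inner product <Py, My>:", y_hat @ e)
```

The printed inner product is zero only up to floating-point rounding, which is why the checks use a small absolute tolerance.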

OLS Projection

[Figure: OLS projection. Source: Greene, W. H. (2011) “Econometric Analysis”, 7th Edition]


Estimate of the variance of the error term

The minimum of the LS objective function is
$$S(b) = (y - Xb)'(y - Xb) = e'e.$$
This is called the “residual sum of squares”:
$$RSS = \sum_{i=1}^{n} e_i^2 = e'e.$$
Note that, since $MX = 0$,
$$e = My = M(X\beta + \varepsilon) = M\varepsilon$$
and
$$RSS = e'e = (M\varepsilon)'(M\varepsilon) = \varepsilon'M'M\varepsilon = \varepsilon'M\varepsilon.$$
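
In a simulation the error vector is known, so the identities $e = M\varepsilon$ and $RSS = \varepsilon'M\varepsilon$ can be verified directly. The sketch below assumes simulated data; `beta`, `eps`, and the seed are hypothetical choices for the illustration.

```python
import numpy as np

rng = np.random.default_rng(6)   # arbitrary seed, illustration only
n, K = 25, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
beta = np.array([2.0, 1.0, -1.0])   # hypothetical true parameter
eps = rng.normal(size=n)            # error term, observable only in a simulation
y = X @ beta + eps

P = X @ np.linalg.solve(X.T @ X, X.T)
M = np.eye(n) - P

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b                       # residuals from the OLS fit

rss = e @ e
rss_from_eps = eps @ M @ eps        # eps' M eps

print("e = My:            ", np.allclose(e, M @ y))
print("e = M eps:         ", np.allclose(e, M @ eps))
print("RSS = eps' M eps:  ", np.isclose(rss, rss_from_eps))
```

With real data $\varepsilon$ is not observed, so these identities are tools for deriving properties of $RSS$ rather than formulas one can compute.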

Unbiased estimation of the error variance

$$s^2 = \frac{1}{n - K}\sum_{i=1}^{n} e_i^2 = \frac{e'e}{n - K} = \frac{RSS}{n - K}$$

SER = “standard error of the regression” = $s$


Estimation error decomposition

The sampling estimation error is given by $b - \beta$. Now
$$
\begin{aligned}
b - \beta &= (X'X)^{-1}X'y - \beta \\
&= (X'X)^{-1}X'(X\beta + \varepsilon) - \beta \\
&= (X'X)^{-1}(X'X)\beta + (X'X)^{-1}X'\varepsilon - \beta \\
&= \beta + (X'X)^{-1}X'\varepsilon - \beta \\
&= (X'X)^{-1}X'\varepsilon.
\end{aligned}
$$
The bias is the expected estimation error:
$$\mathrm{Bias}(b) = \mathrm{E}[b - \beta].$$
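
The following sketch computes $s^2$ and SER, checks the identity $b - \beta = (X'X)^{-1}X'\varepsilon$ on one simulated sample, and then runs a small Monte Carlo experiment. Everything about the simulation is an assumption of the example and not part of the slides: $X$ is held fixed, the errors are i.i.d. normal with mean zero and variance `sigma**2`, and the helper `ols`, the seed, and `R = 2000` replications are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(7)   # arbitrary seed, illustration only
n, K, sigma = 30, 3, 1.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
beta = np.array([1.0, -2.0, 0.5])   # hypothetical true parameter


def ols(X, y):
    """Return the OLS estimate b and s^2 = RSS / (n - K)."""
    b = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ b
    s2 = (e @ e) / (X.shape[0] - X.shape[1])
    return b, s2


# One sample: estimation error computed directly and through the formula
eps = sigma * rng.normal(size=n)
y = X @ beta + eps
b, s2 = ols(X, y)
ser = np.sqrt(s2)                   # standard error of the regression
error_direct = b - beta
error_formula = np.linalg.solve(X.T @ X, X.T @ eps)
print("identity holds:", np.allclose(error_direct, error_formula))

# Small Monte Carlo with X held fixed: average s^2 and average estimation error
R = 2000
draws_b = np.empty((R, K))
draws_s2 = np.empty(R)
for r in range(R):
    y_r = X @ beta + sigma * rng.normal(size=n)
    draws_b[r], draws_s2[r] = ols(X, y_r)

mean_s2 = draws_s2.mean()
mean_error = draws_b.mean(axis=0) - beta
print("average s^2:", mean_s2, "(error variance used in the simulation:", sigma**2, ")")
print("average estimation error:", mean_error)
```

Under these simulation assumptions the average of $s^2$ should be close to the error variance and the average estimation error close to zero, up to Monte Carlo noise.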

TSS = total sum of squares

Let $\bar{y}$ be the sample average of the observed $y_1, y_2, \ldots, y_n$:
$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i,$$
and let $\bar{\mathbf{y}} = (\bar{y}, \bar{y}, \ldots, \bar{y})'$ be the vector that repeats $\bar{y}$ $n$ times. We can also write $\bar{\mathbf{y}} = \bar{y}\,1_n$.

TSS = the deviance (variability) observed in the dependent variable $y$:
$$TSS = \sum_{i=1}^{n} (y_i - \bar{y})^2 = (y - \bar{\mathbf{y}})'(y - \bar{\mathbf{y}}).$$
This is a variability measure, because it computes the squared deviations of $y$ from its observed unconditional mean.


ESS = explained sum of squares

ESS = the overall deviance of the predicted values of $y$ with respect to the unconditional mean of $y$:
$$ESS = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 = (\hat{y} - \bar{\mathbf{y}})'(\hat{y} - \bar{\mathbf{y}}).$$
At first look this is not exactly a measure of variability (why?). But it turns out that another property of the OLS, when the model includes the constant term so that the residuals sum to zero, is that
$$\frac{1}{n}\sum_{i=1}^{n} \hat{y}_i = \frac{1}{n}\sum_{i=1}^{n} y_i.$$
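
TSS and ESS can be computed in both the vector form and the summation form. The sketch below uses simulated data from a model with a constant term; the seed, the coefficients, and the variable names are assumptions of the example. It also checks that the fitted values have the same sample mean as $y$.

```python
import numpy as np

rng = np.random.default_rng(8)   # arbitrary seed, illustration only
n = 25
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # includes the constant
y = X @ np.array([3.0, 1.0, -1.0]) + rng.normal(size=n)     # hypothetical coefficients

b = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ b

y_bar = y.mean()                  # scalar sample average
y_bar_vec = y_bar * np.ones(n)    # the vector that repeats y_bar n times

# Vector forms
tss = (y - y_bar_vec) @ (y - y_bar_vec)
ess = (y_hat - y_bar_vec) @ (y_hat - y_bar_vec)

# Summation forms
tss_sum = sum((y_i - y_bar) ** 2 for y_i in y)
ess_sum = sum((yh_i - y_bar) ** 2 for yh_i in y_hat)

print("TSS:", tss, " ESS:", ess)
print("vector and summation forms agree:",
      np.isclose(tss, tss_sum) and np.isclose(ess, ess_sum))
print("mean of fitted values equals mean of y:", np.isclose(y_hat.mean(), y_bar))
```

Because the two means coincide, ESS measures the squared deviations of the fitted values from their own sample mean, so it is a variability measure after all.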