Slide Set 4 CLRM estimation

Pietro Coretto pcoretto@unisa.it

Econometrics

Master in Economics and Finance (MEF) Università degli Studi di Napoli “Federico II”

Version: Saturday 28th December, 2019 (h16:05)


Least Squares Method (LS)

Given an additive regression model
$$ y = f(X; \beta) + \varepsilon $$
note that ε is not observed, but it is a function of the observables and of the unknown parameter:
$$ \varepsilon = y - f(X; \beta) $$
LS method:
  • assume the signal f(X; β) is much stronger than the error ε
  • look for a β such that the "size" of ε is as small as possible
  • the size of ε is measured by some norm ‖ε‖


Ordinary Least Squares estimator (OLS)

OLS = LS with the norm ‖·‖₂. Therefore the OLS objective function is
$$ S(\beta) = \|\varepsilon\|_2^2 = \varepsilon'\varepsilon = (y - f(X;\beta))'(y - f(X;\beta)), $$
and the OLS estimator b is defined as the optimal solution
$$ b = \arg\min_{\beta \in \mathbb{R}^K} S(\beta) $$
For the linear model
$$ S(\beta) = \|\varepsilon\|_2^2 = \varepsilon'\varepsilon = (y - X\beta)'(y - X\beta) = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} (y_i - x_i'\beta)^2 $$
S(β) is nicely convex!
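Side note (an addition, not in the slides): a minimal numpy sketch that evaluates S(β) on simulated data. The design matrix, the value β = (1, 2, −0.5)', and the error scale are arbitrary assumptions made for the illustration.

import numpy as np

rng = np.random.default_rng(0)
n, K = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])  # intercept + 2 regressors
beta_true = np.array([1.0, 2.0, -0.5])                          # assumed for the example
y = X @ beta_true + rng.normal(scale=0.5, size=n)

def S(beta):
    # OLS objective: squared 2-norm of the implied error vector
    eps = y - X @ beta
    return eps @ eps

print(S(beta_true))     # small: the signal dominates the error
print(S(np.zeros(K)))   # much larger for a poor candidate beta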


Proposition (OLS estimator). The "unique" OLS estimator is
$$ b = (X'X)^{-1}X'y $$
To see this, first we introduce two simple matrix derivative rules:

1. Let a, b ∈ R^p; then
$$ \frac{\partial a'b}{\partial b} = \frac{\partial b'a}{\partial b} = a $$
2. Let b ∈ R^p, and let A ∈ R^{p×p} be symmetric; then
$$ \frac{\partial b'Ab}{\partial b} = 2Ab $$
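Both rules can be sanity-checked numerically. A small sketch (an addition, with arbitrary test vectors) compares them against central finite differences:

import numpy as np

rng = np.random.default_rng(1)
p = 4
a, b = rng.normal(size=p), rng.normal(size=p)
A = rng.normal(size=(p, p)); A = (A + A.T) / 2   # make A symmetric

def num_grad(f, b, h=1e-6):
    # central finite-difference gradient of f at b
    g = np.zeros_like(b)
    for j in range(len(b)):
        e = np.zeros_like(b); e[j] = h
        g[j] = (f(b + e) - f(b - e)) / (2 * h)
    return g

print(np.allclose(num_grad(lambda v: a @ v, b), a))              # rule 1: d(a'b)/db = a
print(np.allclose(num_grad(lambda v: v @ A @ v, b), 2 * A @ b))  # rule 2: d(b'Ab)/db = 2Ab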

Proof. Rewrite the LS objective function:
$$ S(\beta) = (y - X\beta)'(y - X\beta) = y'y - \beta'X'y - y'X\beta + \beta'X'X\beta $$
Note that the transpose of a scalar is the scalar itself, hence y'Xβ = (y'Xβ)' = β'X'y, so that we can write
$$ S(\beta) = y'y - 2\beta'(X'y) + \beta'(X'X)\beta \qquad (4.1) $$
Since S(·) is convex, there exists a minimum b which satisfies the first order conditions
$$ \left. \frac{\partial S(\beta)}{\partial \beta} \right|_{\beta = b} = 0 $$
Applying the previous derivative rules (1) and (2) to the 2nd and 3rd terms of (4.1):
$$ \frac{\partial S(b)}{\partial b} = -2(X'y) + 2(X'X)b = 0 $$
which leads to the so-called "normal equations"
$$ (X'X)b = X'y $$
The matrix X'X is square and symmetric (see homeworks). Based on A3, with probability 1 X'X is nonsingular, so (X'X)^{-1} exists and the normal equations can be written as
$$ (X'X)^{-1}(X'X)b = (X'X)^{-1}X'y \implies b = (X'X)^{-1}X'y $$
which proves the desired result. ∎
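A quick numpy illustration (an addition, simulated data): solving the normal equations reproduces the least-squares solution returned by np.linalg.lstsq. Using np.linalg.solve on (X'X)b = X'y avoids forming the inverse explicitly, which is numerically preferable.

import numpy as np

rng = np.random.default_rng(2)
n, K = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)   # assumed DGP for the example

b = np.linalg.solve(X.T @ X, X.T @ y)        # normal equations (X'X)b = X'y
b_check, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(b, b_check))               # True: same OLS solution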


Formulation in terms of sample averages

It can be shown (see homeworks) that
$$ X'X = \sum_{i=1}^{n} x_i x_i' \qquad \text{and} \qquad X'y = \sum_{i=1}^{n} x_i y_i $$
Define
$$ S_{xx} = \frac{1}{n} X'X = \frac{1}{n} \sum_{i=1}^{n} x_i x_i' \qquad \text{and} \qquad s_{xy} = \frac{1}{n} X'y = \frac{1}{n} \sum_{i=1}^{n} x_i y_i $$
Therefore b = (X'X)^{-1}X'y can be written as
$$ b = \left( \frac{1}{n} X'X \right)^{-1} \frac{1}{n} X'y = \left( \frac{1}{n} \sum_{i=1}^{n} x_i x_i' \right)^{-1} \frac{1}{n} \sum_{i=1}^{n} x_i y_i = S_{xx}^{-1} s_{xy} $$
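A brief check of the sample-average form (an addition, simulated data): the 1/n factors cancel, so S_xx^{-1} s_xy must coincide with the usual OLS solution.

import numpy as np

rng = np.random.default_rng(3)
n, K = 150, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = X @ np.array([0.5, -1.0, 2.0]) + rng.normal(size=n)   # assumed DGP

Sxx = (X.T @ X) / n                 # (1/n) sum_i x_i x_i'
sxy = (X.T @ y) / n                 # (1/n) sum_i x_i y_i
b = np.linalg.solve(Sxx, sxy)       # b = Sxx^{-1} s_xy
print(np.allclose(b, np.linalg.lstsq(X, y, rcond=None)[0]))  # True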


Once β is estimated via b, the estimated error, also called the "residual", is obtained as
$$ e = y - Xb $$
Fitted values, also called the predicted values, are
$$ \hat{y} = Xb \qquad \text{so that} \qquad e = y - \hat{y} $$
Note that
$$ \hat{y}_i = b_1 + b_2 x_{i2} + b_3 x_{i3} + \ldots + b_K x_{iK} \qquad \text{for all } i = 1, 2, \ldots, n $$
What is ŷ_i? It is the estimated conditional expectation of Y when X₁ = 1, X₂ = x_{i2}, …, X_K = x_{iK}.
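To make the interpretation concrete, a simulated sketch (an addition, arbitrary data-generating values): since the true β is known here, the fitted values can be seen to track the true conditional mean Xβ.

import numpy as np

rng = np.random.default_rng(4)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([1.0, 2.0])                 # assumed DGP
y = X @ beta + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ b                               # fitted values: estimated E[Y | X = x_i]
e = y - y_hat                               # residuals: estimated errors
print(np.max(np.abs(y_hat - X @ beta)))     # small: fit tracks the true conditional mean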


Algebraic/Geometric properties of the OLS

Proposition (orthogonality of residuals). The column space of X is orthogonal to the residual vector.

Proof. Write the normal equations:
$$ X'Xb - X'y = 0 \implies X'(y - Xb) = 0 \implies X'e = 0 $$
Therefore for every column X_{·k} (observed regressor) it holds true that the inner product X_{·k}'e = 0. ∎
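The property is purely algebraic, so any y works in the check below (an addition, simulated data):

import numpy as np

rng = np.random.default_rng(5)
X = np.column_stack([np.ones(80), rng.normal(size=(80, 2))])
y = rng.normal(size=80)                        # any y: the property does not depend on the DGP
e = y - X @ np.linalg.solve(X.T @ X, X.T @ y)  # OLS residuals
print(X.T @ e)                                 # numerically zero: e is orthogonal to every regressor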


Proposition (residuals sum to zero). If the linear model includes the constant term, then
$$ \sum_{i=1}^{n} e_i = \sum_{i=1}^{n} (y_i - x_i'b) = 0 $$

Proof. By assumption we have a linear model with a constant/intercept term, that is,
$$ y_i = \beta_1 + \beta_2 x_{i2} + \beta_3 x_{i3} + \ldots + \varepsilon_i $$
Therefore X_{·1} = 1_n = (1, 1, …, 1)'. Applying the previous property to the 1st column of X:
$$ X_{·1}'e = 1'e = \sum_{i=1}^{n} e_i = 0 $$
and this proves the property. ∎
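A sketch contrasting the two cases (an addition, simulated data): with the constant term the residuals sum to zero exactly; dropping it, they generally do not.

import numpy as np

rng = np.random.default_rng(6)
X = np.column_stack([np.ones(80), rng.normal(size=(80, 2))])  # first column is the constant
y = 2.0 + rng.normal(size=80)                                 # assumed nonzero mean
e = y - X @ np.linalg.solve(X.T @ X, X.T @ y)
print(np.isclose(e.sum(), 0))    # True, because 1_n is a column of X

Xni = X[:, 1:]                   # drop the intercept column
eni = y - Xni @ np.linalg.solve(Xni.T @ Xni, Xni.T @ y)
print(np.isclose(eni.sum(), 0))  # generally False without the constant term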


Proposition (Fitted vector is a projection). ŷ is the projection of y onto the space spanned by the columns of X (regressors).

Proof.
$$ \hat{y} = Xb = X(X'X)^{-1}X'y = Py $$
It suffices to show that P = X(X'X)^{-1}X' is symmetric and idempotent.
$$ P' = \left( X(X'X)^{-1}X' \right)' = X \left( (X'X)^{-1} \right)' X' = X \left( (X'X)' \right)^{-1} X' = X(X'X)^{-1}X' = P $$
Therefore P is symmetric.
$$ PP = \left( X(X'X)^{-1}X' \right) X(X'X)^{-1}X' = X(X'X)^{-1}(X'X)(X'X)^{-1}X' = X(X'X)^{-1}X' = P $$
which shows that P is also idempotent, and this completes the proof. ∎

P is called the influence matrix, because it measures the impact of the observed y's on each predicted ŷ_i. The elements of the diagonal of P are called leverages, because they measure the influence of y_i on the corresponding ŷ_i.
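A sketch of these facts (an addition, simulated data): it builds P, checks symmetry and idempotency, and extracts the leverages. The final check uses trace(P) = K, a standard fact not stated in the slides.

import numpy as np

rng = np.random.default_rng(7)
n = 60
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
P = X @ np.linalg.solve(X.T @ X, X.T)     # projection (influence) matrix, no explicit inverse

print(np.allclose(P, P.T))                # symmetric
print(np.allclose(P @ P, P))              # idempotent
lev = np.diag(P)                          # leverages
print(np.isclose(lev.sum(), X.shape[1]))  # trace(P) = K (assumed known fact, not in the slides)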


Proposition (Orthogonal decomposition). The OLS fitting decomposes the observed vector y into the sum of two orthogonal components:
$$ y = \hat{y} + e = Py + My $$
Remark: orthogonality implies that the individual contributions of each term of the decomposition of y are somewhat well identified.

Proof. First notice that
$$ e = y - \hat{y} = y - Py = (I - P)y = My $$
where M = (I − P). Therefore y = ŷ + e = Py + My. It remains to show that ŷ = Py and e = My are orthogonal vectors. First note that MP = PM = 0; in fact
$$ (I - P)P = IP - PP = P - P = 0 $$
Moreover
$$ \langle Py, My \rangle = (Py)'(My) = y'P'My = y'PMy = y'0y = 0 $$
and this completes the proof. ∎

M = I − P is called the residual maker matrix because it maps y into e. It allows us to write e in terms of the observables y and X. Properties:
  • M is idempotent and symmetric (show it)
  • MX = 0; in fact MX = (I − P)X = X − X = 0
Remark: it can be shown that this decomposition is also unique (a consequence of the Hilbert projection theorem).
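These properties of M are easy to verify numerically (an addition, simulated data):

import numpy as np

rng = np.random.default_rng(8)
n = 60
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = rng.normal(size=n)

P = X @ np.linalg.solve(X.T @ X, X.T)
M = np.eye(n) - P                         # residual maker

print(np.allclose(M @ X, 0))              # MX = 0
print(np.allclose(M @ M, M))              # idempotent (symmetry checks the same way)
print(np.isclose((P @ y) @ (M @ y), 0))   # y_hat = Py and e = My are orthogonal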


OLS Projection

[Figure: geometric illustration of the OLS projection of y onto the column space of X.]

Source: Greene, W. H. (2011) "Econometric Analysis", 7th Edition


Estimate of the variance of the error term

The minimum of the LS objective function is
$$ S(b) = (y - Xb)'(y - Xb) = e'e $$
This is called the "residual sum of squares":
$$ RSS = \sum_{i=1}^{n} e_i^2 = e'e $$
Note that
$$ e = My = M(X\beta + \varepsilon) = M\varepsilon $$
and
$$ RSS = e'e = (M\varepsilon)'(M\varepsilon) = \varepsilon'M'M\varepsilon = \varepsilon'M\varepsilon $$
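In a simulation we observe ε, so the identity RSS = ε'Mε can be checked directly (an addition, assumed data-generating values):

import numpy as np

rng = np.random.default_rng(9)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta = np.array([1.0, -0.5, 2.0])           # assumed DGP
eps = rng.normal(size=n)
y = X @ beta + eps

M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)
e = M @ y                                   # residuals via the residual maker
print(np.isclose(e @ e, eps @ (M @ eps)))   # RSS = e'e = eps'M eps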


Unbiased estimation of the error variance:
$$ s^2 = \frac{1}{n - K} \sum_{i=1}^{n} e_i^2 = \frac{e'e}{n - K} = \frac{RSS}{n - K} $$
SER = "standard error of the regression" = s
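A sketch of the estimator (an addition): with a true error standard deviation of 0.7 assumed in the simulation, the SER should land near 0.7.

import numpy as np

rng = np.random.default_rng(10)
n, K = 500, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.7, size=n)  # assumed sigma = 0.7

e = y - X @ np.linalg.solve(X.T @ X, X.T @ y)
s2 = (e @ e) / (n - K)      # unbiased estimate of the error variance
SER = np.sqrt(s2)           # standard error of the regression
print(s2, SER)              # SER should be close to the assumed sigma = 0.7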


Estimation error decomposition

The sampling estimation error is given by b − β. Now
$$
\begin{aligned}
b - \beta &= (X'X)^{-1}X'y - \beta \\
          &= (X'X)^{-1}X'(X\beta + \varepsilon) - \beta \\
          &= (X'X)^{-1}(X'X)\beta + (X'X)^{-1}X'\varepsilon - \beta \\
          &= \beta + (X'X)^{-1}X'\varepsilon - \beta \\
          &= (X'X)^{-1}X'\varepsilon
\end{aligned}
$$
The bias is the expected estimation error: Bias(b) = E[b − β]
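The slide only defines the bias here; still, a quick Monte Carlo sketch (an addition, with exogenous simulated errors) suggests that the average estimation error is near zero in this setting.

import numpy as np

rng = np.random.default_rng(11)
n, K, R = 100, 3, 2000
beta = np.array([1.0, 2.0, -0.5])                               # assumed DGP
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])  # fixed design across replications

err = np.zeros(K)
for _ in range(R):
    y = X @ beta + rng.normal(size=n)
    b = np.linalg.solve(X.T @ X, X.T @ y)
    err += (b - beta) / R        # running average of the estimation error

print(err)   # close to zero: the average estimation error vanishes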


TSS = total sum of squares

Let ȳ be the sample average of the observed y₁, y₂, …, y_n:
$$ \bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i $$
and let $\bar{y}\,1_n$ denote the n-vector $(\bar{y}, \bar{y}, \ldots, \bar{y})'$.

TSS = the deviance (variability) observed in the dependent variable y:
$$ TSS = \sum_{i=1}^{n} (y_i - \bar{y})^2 = (y - \bar{y}1_n)'(y - \bar{y}1_n) $$
This is a variability measure, because it computes the squared deviations of y from its observed unconditional mean.

ESS = explained sum of squares

ESS = the overall deviance of the predicted values of y w.r.t. the unconditional mean of y:
$$ ESS = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 = (\hat{y} - \bar{y}1_n)'(\hat{y} - \bar{y}1_n) $$
At first look this is not exactly a measure of variability (why?). But it turns out that another property of the OLS is that
$$ \frac{1}{n} \sum_{i=1}^{n} \hat{y}_i = \frac{1}{n} \sum_{i=1}^{n} y_i $$
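That equal-means property is quick to verify (an addition, simulated data); it holds because the model below includes an intercept.

import numpy as np

rng = np.random.default_rng(12)
n = 120
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = 1.0 + rng.normal(size=n)

y_hat = X @ np.linalg.solve(X.T @ X, X.T @ y)
print(np.isclose(y_hat.mean(), y.mean()))   # True: the intercept forces equal means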


TSS decomposition and goodness of fit

It can be shown (we don't do this here) that
$$ TSS = ESS + RSS $$
From the previous decomposition we get a famous (and misused) goodness of fit statistic:
$$ R^2 = \frac{ESS}{TSS} = 1 - \frac{RSS}{TSS} $$
R² is the portion of the deviance observed in y that is explained by the linear model. This is also called the coefficient of determination.
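A sketch of the decomposition (an addition, simulated data with an intercept): it verifies TSS = ESS + RSS and the two equivalent ways to compute R².

import numpy as np

rng = np.random.default_rng(13)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)   # assumed DGP

y_hat = X @ np.linalg.solve(X.T @ X, X.T @ y)
e = y - y_hat
TSS = np.sum((y - y.mean()) ** 2)
ESS = np.sum((y_hat - y.mean()) ** 2)
RSS = e @ e

print(np.isclose(TSS, ESS + RSS))   # the decomposition holds (intercept included)
print(ESS / TSS, 1 - RSS / TSS)     # two equivalent ways to compute R^2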


Problems with R²:
  • It increases by adding more regressors. For this reason it's better to look at the so-called adjusted R² (adjusted for the degrees of freedom), which is computed as follows:
$$ \bar{R}^2 = 1 - \frac{RSS/(n - K)}{TSS/(n - 1)} $$
  • R² ∈ [0, 1] only if the constant term is included in the model. So when you estimate without an intercept, don't be scared if you get R² < 0 (see the sketch below).
  • An extremely large R² is pathological: guess why!
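A sketch of both points (an addition, simulated data): the adjusted R² on a fitted model with an intercept, then a fit on an unrelated regressor without a constant term, where 1 − RSS/TSS typically turns negative.

import numpy as np

rng = np.random.default_rng(14)
n, K = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = X @ np.array([3.0, 2.0, -0.5]) + rng.normal(size=n)   # assumed DGP, nonzero mean

e = y - X @ np.linalg.solve(X.T @ X, X.T @ y)
TSS = np.sum((y - y.mean()) ** 2)
R2 = 1 - (e @ e) / TSS
R2_adj = 1 - ((e @ e) / (n - K)) / (TSS / (n - 1))
print(R2, R2_adj)                 # the adjusted value is slightly smaller

Xni = rng.normal(size=(n, 1))     # single unrelated regressor, no constant term
eni = y - Xni @ np.linalg.solve(Xni.T @ Xni, Xni.T @ y)
print(1 - (eni @ eni) / TSS)      # typically negative: no intercept in the model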
