Slide Set 3 Regression Models and the Classical Linear Regression Model (CLRM)

Pietro Coretto pcoretto@unisa.it

Econometrics

Master in Economics and Finance (MEF) Università degli Studi di Napoli “Federico II”

Version: Tuesday 21st January, 2020 (h11:31)


Regression analysis

Regression analysis is a set of statistical techniques for modeling and analyzing the relationship between a dependent variable Y and one or more independent variables X. Typically X = (X1, X2, . . . , XK)′ is a vector of variables. Depending on the context and the field of application, these variables have different names:

Y: dependent variable, response variable, outcome variable, output variable, target variable, etc.
X: independent variable, regressor, covariate, explanatory variable, predictor, feature, etc.

In regression analysis we assume a certain mechanism linking the X to the Y. We want to use observed data to understand this link.



Regression function and types of regression models

The link is formalized in terms of a regression function, which models the relationship between Y and X:

Y ≈ f(X)

Building a regression model requires specifying:
how f(·) transforms X
in which sense f(·) approximates Y


Depending on how f(·) transforms X we have:

nonparametric models
parametric models

Nonparametric models. f(·) itself is treated as the unknown, so the object of interest belongs to an infinite-dimensional space. Usually we restrict the quest to some well-defined class; for instance, we may restrict the analysis to

{ f : f is real valued, smooth, and ∫ |f(x)| dx < +∞ }

Nonparametric models allow for a lot of flexibility, but this comes at a price.
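To make this concrete, here is a minimal sketch of one classic nonparametric estimator, the Nadaraya–Watson kernel smoother (the simulated data, the Gaussian kernel, and the bandwidth h are illustrative assumptions, not from the slides):

```python
# A minimal sketch of a nonparametric regression estimator, the
# Nadaraya-Watson kernel smoother: f(x0) is estimated as a locally
# weighted average of the observed y's.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, 200)
y = np.sin(x) + rng.normal(0.0, 0.3, 200)    # the analyst does not know f

def nw_estimate(x0, x, y, h=0.5):
    """Estimate f(x0) with Gaussian kernel weights of bandwidth h."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)
    return np.sum(w * y) / np.sum(w)

print([round(nw_estimate(x0, x, y), 2) for x0 in (2.0, 5.0, 8.0)])
```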


Parametric models. f is assumed to have a specific form controlled by a scalar parameter β or a vector of parameters β = (β1, β2, . . . , βp)′. Therefore the object of interest is β. Examples:

linear parametric regression function: f(X; β) = β1X1 + β2X2
nonlinear parametric regression function: f(X; β) = sin(β1X1) + e^(β2X2)

Some nonlinear regression functions can be linearized. Example:

f(X; β) = e^(X1 + βX2)  →  log(f(X; β)) = X1 + βX2

Sometimes a regression function is not linear in the original X, but it is linear in a transformation of X:

f(X; β) = β1X1² + β2X2
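To make the linear-in-parameters idea concrete, here is a minimal sketch (simulated data and coefficients are my own illustrative assumptions) where least squares on the transformed design Z = (X1², X2) recovers β:

```python
# A sketch of a model nonlinear in X1 but linear in the transformed
# regressor Z1 = X1^2: OLS on the transformed design recovers beta.
import numpy as np

rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 * x1**2 - 1.0 * x2 + rng.normal(0.0, 0.1, n)   # true beta = (2, -1)

Z = np.column_stack([x1**2, x2])       # linear-in-parameters design
beta_hat, *_ = np.linalg.lstsq(Z, y, rcond=None)
print(beta_hat)                        # close to [2.0, -1.0]
```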


Depending on what kind of approximation f(·) provides about Y:

(conditional) mean regression: f(X) = E[Y | X]
(conditional) quantile regression: f(X) = median[Y | X], f(X) = quantileα[Y | X], etc.

Conditional mean regression functions are central in regression analysis for several reasons:
approximating Y by an average is intuitive
most theoretical models are expressed in terms of expectations
the “Optimal Predictor Theorem”
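The following small numerical illustration (my own, not from the slides) shows why the mean and the median are different “best” approximations: over constant predictors c, the squared loss E[(Y − c)²] is minimized at the mean of Y, while the absolute loss E[|Y − c|] is minimized at the median:

```python
# Numerical check: the mean minimizes squared loss, the median absolute loss.
import numpy as np

rng = np.random.default_rng(2)
y = rng.lognormal(0.0, 1.0, 50_000)      # skewed, so mean != median

grid = np.linspace(0.1, 5.0, 500)        # candidate constant predictors c
mse = np.array([np.mean((y - c) ** 2) for c in grid])
mae = np.array([np.mean(np.abs(y - c)) for c in grid])

print("argmin MSE:", grid[mse.argmin()], "vs mean  :", y.mean())
print("argmin MAE:", grid[mae.argmin()], "vs median:", np.median(y))
```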



The quality of the approximation Y ≈ f(X) can be measured by the quadratic risk, or MSE:

E[(Y − f(X))²]

Theorem (Optimal Predictor). Under suitable regularity conditions,

inf over f of E[(Y − f(X))²] is attained at f(X) = E[Y | X]

In other words, f(X) = E[Y | X] gives the best approximation to Y in terms of MSE. If we want to guess Y based on the information generated by X, then f(X) = E[Y | X] is the best guess in terms of MSE.
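A Monte Carlo sanity check of the theorem (an illustrative sketch; the data-generating process Y = X² + noise is an assumption): the conditional mean E[Y | X] = X² attains a smaller sample MSE than competing predictors:

```python
# Monte Carlo check of the Optimal Predictor theorem under an assumed DGP.
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=1_000_000)
y = x**2 + rng.normal(size=x.size)       # E[Y | X] = X^2

predictors = {
    "f(X) = E[Y|X] = X^2": x**2,
    "f(X) = X": x,
    "f(X) = 1 (constant)": np.ones_like(x),
}
for name, fx in predictors.items():
    print(name, "-> MSE:", round(np.mean((y - fx) ** 2), 3))
# The conditional mean yields MSE ~ Var(noise) = 1, the smallest.
```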


Proof. We need to show that for any function f(X),

E[(Y − E[Y|X])²] ≤ E[(Y − f(X))²]

Decompose

E[(Y − f(X))²] = E[( (Y − E[Y|X]) + (E[Y|X] − f(X)) )²] = E[(A + B)²]

with A = Y − E[Y|X] and B = E[Y|X] − f(X). Computing (A + B)² and using the linearity of expectations,

E[A²] + E[B²] + 2 E[AB] = E[(Y − f(X))²]   (3.1)

Since E[B²] = E[(E[Y|X] − f(X))²] ≥ 0, (3.1) implies

E[A²] + 2 E[AB] ≤ E[(Y − f(X))²]   (3.2)



Moreover,

E[AB | X] = E[ (Y − E[Y|X]) (E[Y|X] − f(X)) | X ]
          = (E[Y|X] − f(X)) E[ (Y − E[Y|X]) | X ]   (pull out what’s known)
          = (E[Y|X] − f(X)) { E[Y | X] − E[ E[Y | X] | X ] }
          = (E[Y|X] − f(X)) × 0 = 0

so that E[AB] = E[ E[AB | X] ] = 0 by the law of iterated expectations. Therefore (3.2) becomes

E[A²] ≤ E[(Y − f(X))²]

with E[A²] = E[(Y − E[Y | X])²], which proves the result.


In this course we focus on conditional mean regression models where the regression function has a linear parametric form:

Y ≈ E[Y | X] = f(X; β) = β1X1 + β2X2 + . . . + βKXK

The reason why this class of regression models is so popular is that they can reproduce the correlation between Y and the Xs. Going back to the example of Slide Set #1:



[Figure: scatter plot of Y (range ≈ 50–250) against X (range ≈ 5–20)]


[Figure: the same scatter plot, highlighting the conditional sample mean at x = 5: ȳ|x = 135.1]



[Figure: the conditional sample mean at x = 10: ȳ|x = 171.8]


[Figure: the conditional sample mean at x = 15: ȳ|x = 208.6]
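A small sketch reproducing the computation behind these figures on simulated data (the original Slide Set #1 dataset is not available here, so the data-generating process below is an illustrative assumption):

```python
# Sample conditional means of Y at distinct values of X, as in the figures.
import numpy as np

rng = np.random.default_rng(8)
x = rng.choice([5, 10, 15, 20], size=400)
y = 100 + 7 * x + rng.normal(0, 10, 400)    # illustrative linear DGP

for x0 in (5, 10, 15):
    print(f"x = {x0:2d}  mean of y given x: {y[x == x0].mean():.1f}")
```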



The model postulates that Y ≈ E[Y | X], but we cannot observe E[Y | X]. For each sample unit i = 1, 2, . . . , n we observe the sample (Yi, Xi1, Xi2, . . . , XiK). Therefore, we need an additional term which summarizes the difference between Yi and its conditional mean E[Yi | X]. A way to reproduce the previous sampling mechanism is to add an error term εi, a random variable that “summarizes” the deviations of Yi from its conditional mean E[Yi | X]. Therefore

Yi = f(Xi; β) + εi = E[Yi | X] + εi = β1Xi1 + β2Xi2 + . . . + βKXiK + εi

The short name for this class of models is linear regression models = linear parametric regression function plus an additive error term.
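A minimal simulation of this sampling mechanism (β, the distribution of the regressors, and the error distribution are illustrative assumptions):

```python
# Simulating y_i = x_i' beta + eps_i for n units.
import numpy as np

rng = np.random.default_rng(4)
n, K = 100, 3
X = rng.normal(size=(n, K))
beta = np.array([1.0, 0.5, -2.0])
eps = rng.normal(0.0, 1.0, n)      # error term: deviation from E[Y_i | X]
y = X @ beta + eps                 # each y_i scatters around x_i' beta
print(y[:5])
```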


Partial or marginal effects

The partial/marginal effect measures the effect on the regression function of a change in a regressor Xj, holding all the other regressors constant (waee = “with all else equal”). Let us focus on conditional mean models. Assuming differentiability, the partial/marginal effect of a change ∆Xj is given by

∆E[Y | X] = (∂E[Y | X] / ∂Xj) ∆Xj,   holding fixed X1, . . . , Xj−1, Xj+1, . . . , XK

Computing marginal/partial effects makes sense only when the model has a causal interpretation (see later).



For the linear regression model

∂E[Y | X] / ∂Xj = βj

Therefore, the unknown parameter βj coincides with the partial effect of a unit change in Xj, waee. For a discrete regressor Xj, partial effects are computed as the variations in E[Y | X] obtained by changing the level of Xj, waee. Suppose Xk ∈ {a, b, c}. The partial effect when Xk changes from level a to b (waee) is given by

E[Y | Xk = b, X] − E[Y | Xk = a, X]

Another measure of the regressors’ effect on Y is the partial/marginal elasticity (see homeworks).
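A sketch of both kinds of partial effects on simulated data (my own example): for a continuous regressor the effect is the coefficient itself; for a discrete regressor coded with dummies, it is the dummy coefficient, i.e. a difference of conditional means, waee:

```python
# Partial effects in a fitted linear model with a continuous regressor x1
# and a discrete regressor with levels a, b, c coded as dummies.
import numpy as np

rng = np.random.default_rng(5)
n = 1_000
x1 = rng.normal(size=n)                # continuous regressor
lev = rng.choice([0, 1, 2], size=n)    # discrete regressor: levels a, b, c
Db = (lev == 1).astype(float)          # dummy for level b
Dc = (lev == 2).astype(float)          # dummy for level c
y = 1 + 2 * x1 + 3 * Db + 5 * Dc + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, Db, Dc])
bhat, *_ = np.linalg.lstsq(X, y, rcond=None)
print("unit change in x1 (waee):   ", bhat[1])   # ~ 2
print("level a -> b change (waee): ", bhat[2])   # ~ 3
```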


Notation

Indexes and constants:

n: number of sample units
K: number of covariates/regressors/features measured on each of the n sample units
i = 1, 2, . . . , n: indexes sample units
k = 1, 2, . . . , K: indexes regressors

y ∈ Rn: column vector of the dependent/response variable,

y = (y1, y2, . . . , yn)′



xi ∈ RK: column vector containing the K regressors measured on the ith sample unit,

xi = (xi1, xi2, . . . , xiK)′

The so-called design matrix is the (n × K) matrix X whose rows correspond to sample units and whose columns correspond to regressors:

X =
  ⎡ x11  x12  . . .  x1K ⎤
  ⎢ x21  x22  . . .  x2K ⎥
  ⎢  .    .   . . .   .  ⎥
  ⎣ xn1  xn2  . . .  xnK ⎦

that is, the ith row of X is xi′, so X = (x1, x2, . . . , xn)′.


ε ∈ Rn: column vector containing the error term for each unit,

ε = (ε1, ε2, . . . , εn)′

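To fix ideas, the notation translates to code as follows (toy numbers, my own illustration):

```python
# The notation in code: y in R^n, the (n x K) design matrix X whose ith
# row is x_i', and the error vector eps in R^n.
import numpy as np

X = np.array([[1.0, 3.5],      # x_1' = (x_11, x_12), K = 2 regressors
              [2.0, 1.2],      # x_2'
              [0.5, 4.1]])     # x_3'
y = np.array([10.0, 7.3, 12.9])        # response vector, n = 3
eps = np.array([0.3, -0.8, 0.5])       # error terms, one per unit
n, K = X.shape
print(n, K, X[1])                      # X[1] is x_2' (0-indexed)
```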


Classical linear regression model (CLRM)

A1: linearity. For all i = 1, 2, . . . , n the observed data are generated by the following linear model:

yi = β1xi1 + β2xi2 + . . . + βKxiK + εi   (3.3)
   = xi′β + εi   (3.4)

where β ∈ RK is a vector of coefficients. In matrix form,

y = Xβ + ε   (3.5)

Remark 1: linearity of the model is wrt the parameters, not the regressors. For x2 = log(consumption), the model is still linear wrt log(consumption).


Remark 2: often a constant/intercept term is introduced in the model:

yi = β1 + β2xi2 + . . . + βKxiK + εi,   for i = 1, 2, . . . , n

In this case we conventionally read (3.3) as if xi1 = 1 for all i in the sample. If the model includes a constant/intercept term, then the first column of X is the unit column vector, that is X·1 = 1n = (1, 1, . . . , 1)′:

X =
  ⎡ 1  x12  . . .  x1K ⎤
  ⎢ 1  x22  . . .  x2K ⎥
  ⎢ .   .   . . .   .  ⎥
  ⎣ 1  xn2  . . .  xnK ⎦
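In code, the intercept convention amounts to prepending a column of ones to the regressors (toy numbers, my own illustration):

```python
# Prepend a column of ones so that x_i1 = 1 for all i and beta_1 acts
# as the constant term.
import numpy as np

X0 = np.array([[3.5, 0.2],
               [1.2, 4.4],
               [4.1, 2.7]])                      # regressors without constant
X = np.column_stack([np.ones(X0.shape[0]), X0])  # first column = 1_n
print(X)
```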




A2: strict exogeneity. E[εi | X] = 0 for all i = 1, 2, . . . , n, or, in vector form, E[ε | X] = 0.

Implications:

1. E[εi] = 0 for all i = 1, 2, . . . , n.
2. All regressors are orthogonal to the error term for all units: E[xjkεi] = 0 for all i, j = 1, 2, . . . , n and k = 1, 2, . . . , K.
3. Orthogonality implies the zero-correlation conditions: Cov[xjk, εi] = 0 for all i, j = 1, 2, . . . , n and k = 1, 2, . . . , K.

If i = time (time-series data), strict exogeneity implies that the error term is orthogonal to past, current, and future regressors. For most time-series data this condition is not satisfied, so the finite-sample theory based on strict exogeneity is rarely applicable in time-series contexts.


Proof of the implications:

1. It follows from the law of iterated expectations:

E[εi] = E[ E[εi | X] ] = E[0] = 0

2. By the law of iterated expectations, write

E[xjkεi] = E[ E[xjkεi | xjk] ]

By the linearity of the conditional expectation (“pull out what’s known”),

E[ E[xjkεi | xjk] ] = E[ xjk E[εi | xjk] ]

But A2 implies E[εi | xjk] = E[ E[εi | X] | xjk ] = 0 (conditioning on xjk is conditioning on less information than X), which proves the desired result.

3. This follows from the previous results:

Cov[xjk, εi] = E[xjkεi] − E[xjk] E[εi] = E[xjkεi] = 0
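A simulation check of these implications (an assumed DGP in which ε is drawn independently of the regressor, so strict exogeneity holds by construction):

```python
# Sample analogues of E[eps], E[x*eps] and Cov[x, eps] are close to zero.
import numpy as np

rng = np.random.default_rng(6)
n = 1_000_000
x = rng.normal(2.0, 1.0, n)
eps = rng.normal(0.0, 1.0, n)          # independent of x => E[eps | x] = 0
print("E[eps]      ~", eps.mean())
print("E[x eps]    ~", np.mean(x * eps))
print("Cov[x, eps] ~", np.cov(x, eps)[0, 1])
```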



A3: absence of multicollinearity. The (n × K) design matrix X has rank K with probability 1. This assumption means that X has full column rank, i.e. the columns of X are linearly independent. Note that A3 also requires n ≥ K. Essentially, A3 is a technical assumption guaranteeing that most of the computations below can be performed. . . see this later.
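In code, A3 can be checked directly on a concrete design matrix (toy numbers, my own illustration); full column rank fails when one column is an exact linear combination of the others:

```python
# Rank check for A3: the third column of X_bad is twice the second.
import numpy as np

X_good = np.array([[1.0, 2.0], [1.0, 3.0], [1.0, 5.0]])
X_bad = np.array([[1.0, 2.0, 4.0], [1.0, 3.0, 6.0], [1.0, 5.0, 10.0]])

for X in (X_good, X_bad):
    n, K = X.shape
    print("full column rank:", np.linalg.matrix_rank(X) == K)  # True, False
```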


A4: spherical error variance. For all i = 1, 2, . . . , n and all j ≠ i:

1. E[εi² | X] = σ² > 0 (homoskedasticity)
2. E[εiεj | X] = 0 (units are uncorrelated)

A vector random variable is said to have a spherical distribution if its covariance matrix is a scalar multiple of the identity matrix. The sphericity here is shown as follows:

Var[εi | X] = E[εi² | X] − (E[εi | X])² = E[εi² | X] = σ²

and

Cov[εi, εj | X] = E[εiεj | X] − E[εi | X] E[εj | X] = E[εiεj | X] = 0

Now it is easy to show (do it as an exercise) that

E[εε′ | X] = Var[ε | X] = σ²In
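A sketch of A4 in matrix form plus a simulation check (σ², the number of units, and normality of the errors are illustrative assumptions):

```python
# Under A4 the conditional covariance matrix of eps is sigma^2 * I_n.
import numpy as np

n, sigma2 = 4, 2.5
V = sigma2 * np.eye(n)                 # Var[eps | X] = sigma^2 I_n
print(V)

# Simulation check under assumed iid normal errors: the sample covariance
# of many draws of eps approaches sigma^2 * I_n.
rng = np.random.default_rng(7)
draws = rng.normal(0.0, np.sqrt(sigma2), size=(100_000, n))
print(np.round(np.cov(draws, rowvar=False), 2))
```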

