BS2247 Introduction to Econometrics, Lecture 6: The multiple regression model

OLS unbiasedness, OLS variances, including irrelevant variables and excluding relevant variables, and the Gauss-Markov Theorem

Dr. Kai Sun
Aston Business School

Assumptions for Unbiasedness

MLR stands for “Multiple Linear Regression”.

◮ Assumption MLR.1: The population model is linear in parameters, i.e., y = β0 + β1x1 + β2x2 + ... + βkxk + u.

◮ Assumption MLR.2: We use a random sample of size n, {(x1i, ..., xki, yi) : i = 1, 2, ..., n}.

◮ Assumption MLR.3: No perfect collinearity: there is no exact linear relationship among the independent variables. For example, we cannot estimate a model like y = β0 + β1x1 + β2x2 + u in which x1 = 2x2.


Assumptions for Unbiasedness

◮ Assumption MLR.4: Zero conditional mean, i.e., E(ui | x1i, ..., xki) = E(ui) = 0: the error term is not correlated with any of the regressors (i.e., all the regressors are exogenous).

MLR.1-MLR.4 guarantee that E(β̂j) = βj for all j = 0, 1, 2, ..., k, so β̂j is unbiased for βj. (The proof requires linear algebra and is not required.)
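
To make the unbiasedness result concrete, here is a small Monte Carlo sketch in Python (an illustrative addition, not part of the original slides; the sample size and coefficient values are arbitrary): it draws many samples from a model satisfying MLR.1-MLR.4 and checks that the OLS estimates average out to the true parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 200, 5000
beta = np.array([1.0, 0.5, -0.3])           # true (beta0, beta1, beta2)

estimates = np.empty((reps, 3))
for r in range(reps):
    x1 = rng.normal(size=n)
    x2 = 0.4 * x1 + rng.normal(size=n)      # correlated regressors, but no perfect collinearity
    u = rng.normal(size=n)                  # E(u | x1, x2) = 0, as in MLR.4
    y = beta[0] + beta[1] * x1 + beta[2] * x2 + u
    X = np.column_stack([np.ones(n), x1, x2])
    estimates[r] = np.linalg.lstsq(X, y, rcond=None)[0]   # OLS fit

print("true betas:    ", beta)
print("mean estimates:", estimates.mean(axis=0))   # close to the true betas (unbiasedness)
```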


Including irrelevant variables

◮ Suppose that the true model is y = β̂0 + β̂1x1 + β̂2x2 + û.

◮ However, we estimate the model y = β̃0 + β̃1x1 + β̃2x2 + β̃3x3 + ũ.

◮ This means that we included an irrelevant variable, x3.

◮ But if the estimated model still satisfies MLR.1-MLR.4, then E(β̃0) = β0, E(β̃1) = β1, E(β̃2) = β2, E(β̃3) = β3 = 0 (although the variances of the parameter estimates in the estimated model would be larger than those in the true model).
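
A hedged simulation sketch of this point (not from the slides; the parameter values are invented): the estimate of β1 stays centred on the truth when an irrelevant x3 is included, but its sampling variance is larger because x3 is correlated with x1.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 200, 5000
b0, b1, b2 = 1.0, 0.5, -0.3                  # true parameters; beta3 = 0

b1_true_spec, b1_with_x3 = np.empty(reps), np.empty(reps)
for r in range(reps):
    x1 = rng.normal(size=n)
    x2 = rng.normal(size=n)
    x3 = 0.8 * x1 + rng.normal(size=n)       # irrelevant for y, but correlated with x1
    y = b0 + b1 * x1 + b2 * x2 + rng.normal(size=n)
    Xt = np.column_stack([np.ones(n), x1, x2])        # true specification
    Xi = np.column_stack([np.ones(n), x1, x2, x3])    # includes the irrelevant x3
    b1_true_spec[r] = np.linalg.lstsq(Xt, y, rcond=None)[0][1]
    b1_with_x3[r]   = np.linalg.lstsq(Xi, y, rcond=None)[0][1]

print("mean of beta1-hat:", b1_true_spec.mean(), b1_with_x3.mean())  # both close to 0.5
print("sd of beta1-hat:  ", b1_true_spec.std(),  b1_with_x3.std())   # larger with x3 included
```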


Excluding relevant variables: Omitted Variable Bias

◮ Suppose that the true model is y = β̂0 + β̂1x1 + β̂2x2 + û.

◮ However, we estimate the model y = β̃0 + β̃1x1 + ũ.

◮ This means that we omitted a possibly relevant variable, x2.

◮ So the estimated model is actually a simple regression model, and β̃1 = Σi (x1i − x̄1) yi / Σi (x1i − x̄1)².

◮ We want to link β̃1 (in the estimated model) with β̂1 (in the true model).


Excluding relevant variables: Omitted Variable Bias

So plug the true model, yi = β̂0 + β̂1x1i + β̂2x2i + ûi, into β̃1:

β̃1 = Σi (x1i − x̄1)(β̂0 + β̂1x1i + β̂2x2i + ûi) / Σi (x1i − x̄1)² = β̂1 + β̂2 δ̃1,

where δ̃1 is the coefficient from regressing x2 on x1.
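
The decomposition β̃1 = β̂1 + β̂2 δ̃1 is an exact algebraic identity in any sample, not just an approximation. A small numerical check in Python (an added illustration; the data-generating values are made up):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)            # x2 is correlated with x1
y = 1.0 + 0.5 * x1 - 0.3 * x2 + rng.normal(size=n)

def ols(X, y):
    """Return the OLS coefficient vector for a regression of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)
b_long  = ols(np.column_stack([ones, x1, x2]), y)   # beta-hats: long regression of y on x1, x2
b_short = ols(np.column_stack([ones, x1]), y)       # beta-tildes: short regression of y on x1
delta   = ols(np.column_stack([ones, x1]), x2)      # delta-tilde: regression of x2 on x1

print(b_short[1])                                   # beta1-tilde
print(b_long[1] + b_long[2] * delta[1])             # beta1-hat + beta2-hat * delta1-tilde (same value)
```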

Excluding relevant variables: Omitted Variable Bias

Take the conditional expectation of both sides of β̃1 = β̂1 + β̂2 δ̃1:

E(β̃1 | x1, x2) = E(β̂1 + β̂2 δ̃1 | x1, x2) = E(β̂1 | x1, x2) + δ̃1 E(β̂2 | x1, x2) = β1 + δ̃1 β2

◮ Define the bias of β̃1 as bias(β̃1) = E(β̃1 | x1, x2) − β1 = δ̃1 β2.

◮ So we can say that bias(β̃1) = 0 if and only if (1) δ̃1 = 0: x1 and x2 are uncorrelated, or (2) β2 = 0: x2 is irrelevant.

◮ The bias is positive if and only if (1) δ̃1 > 0 and β2 > 0, or (2) δ̃1 < 0 and β2 < 0.

◮ The bias is negative if and only if (1) δ̃1 > 0 and β2 < 0, or (2) δ̃1 < 0 and β2 > 0.
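
As an illustration of the sign rule (a sketch with invented parameter values, not from the slides), the Monte Carlo below omits a relevant x2 that is positively related to both y (β2 > 0) and x1 (δ1 > 0), so the omitted variable bias in β̃1 should be positive:

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 200, 5000
b1, b2, delta1 = 0.5, 0.7, 0.6        # beta2 > 0 and delta1 > 0  ->  positive bias expected

b1_tilde = np.empty(reps)
for r in range(reps):
    x1 = rng.normal(size=n)
    x2 = delta1 * x1 + rng.normal(size=n)        # x2 depends positively on x1
    y = 1.0 + b1 * x1 + b2 * x2 + rng.normal(size=n)
    X_short = np.column_stack([np.ones(n), x1])  # x2 omitted
    b1_tilde[r] = np.linalg.lstsq(X_short, y, rcond=None)[0][1]

print("true beta1:         ", b1)
print("mean of beta1-tilde:", b1_tilde.mean())       # roughly b1 + delta1 * b2 = 0.92
print("approximate bias:   ", b1_tilde.mean() - b1)  # positive, as predicted
```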


Variance of OLS

◮ Now we know that the sampling distribution of our estimator is centered around the true parameter (unbiasedness).

◮ We want to think about how spread out this distribution is.

◮ It is much easier to think about this variance under an additional assumption, Assumption MLR.5: Homoskedasticity, i.e., Var(u | x1, x2, ..., xk) = σ², or equivalently Var(y | x1, x2, ..., xk) = σ². The conditional variance is a constant (not a function of the x's).


Variance of OLS

◮ The assumptions MLR.1-MLR.4 for unbiasedness, plus the homoskedasticity assumption MLR.5, are known as the Gauss-Markov assumptions.

◮ Given the Gauss-Markov assumptions,

Var(β̂j) = σ² / [SSTj (1 − R²j)], for all j = 1, ..., k,

where SSTj = Σi (xji − x̄j)² is the total sum of squares for xj, and R²j is the R² from regressing xj on all the other regressors, including an intercept.
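
If matrix notation is familiar, the same conditional variance is the j-th diagonal element of σ²(X′X)⁻¹. The Python sketch below (an added illustration; the numbers are arbitrary) checks that the slide's formula σ² / [SSTj (1 − R²j)] reproduces that diagonal element for j = 1:

```python
import numpy as np

rng = np.random.default_rng(4)
n, sigma2 = 200, 1.5
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])

# Matrix form: Var(beta-hat | X) = sigma^2 * (X'X)^{-1}
var_matrix = sigma2 * np.linalg.inv(X.T @ X)

# Slide formula for j = 1: sigma^2 / (SST_1 * (1 - R^2_1))
sst1 = np.sum((x1 - x1.mean()) ** 2)
# R^2_1 from regressing x1 on the other regressors (here just x2, plus an intercept)
Z = np.column_stack([np.ones(n), x2])
g = np.linalg.lstsq(Z, x1, rcond=None)[0]
resid = x1 - Z @ g
r2_1 = 1 - np.sum(resid ** 2) / sst1

print(var_matrix[1, 1])                 # Var(beta1-hat) from the matrix formula
print(sigma2 / (sst1 * (1 - r2_1)))     # identical value from the slide's formula
```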


Variance of OLS

In the simple regression model, SSTj = SSTx (because there is only one regressor, x), and R²j = 0 (regressing x on nothing but an intercept explains none of its variation, so the explained sum of squares is 0). Hence

Var(β̂j) = σ² / SSTx, with j = 1.

Variance of OLS

◮ What happens to Var(β̂j) when σ², SSTj, or R²j increases?
(1) a larger σ² implies a larger variance for the OLS estimators;
(2) a larger SSTj implies a smaller variance for the estimators;
(3) a larger R²j implies a larger variance for the estimators.

◮ Var(β̂j) → ∞ as R²j → 1, i.e., as xj becomes close to perfectly collinear with the other regressors (the multicollinearity problem).
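
To see point (3) and the multicollinearity problem numerically, the sketch below (an added illustration with arbitrary values) computes Var(β̂1) from σ²(X′X)⁻¹ as the correlation between x1 and x2 approaches 1; the variance grows without bound:

```python
import numpy as np

rng = np.random.default_rng(5)
n, sigma2 = 200, 1.0
x1 = rng.normal(size=n)
noise = rng.normal(size=n)

for rho in [0.0, 0.5, 0.9, 0.99, 0.999]:
    # Build x2 with (population) correlation rho with x1
    x2 = rho * x1 + np.sqrt(1 - rho ** 2) * noise
    X = np.column_stack([np.ones(n), x1, x2])
    var_b1 = sigma2 * np.linalg.inv(X.T @ X)[1, 1]
    print(f"corr about {rho:5.3f}   Var(beta1-hat) = {var_b1:.4f}")
```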


◮ We can detect the multicollinearity problem by looking at the “variance inflation factor”, VIFj = 1 / (1 − R²j) ≥ 1, and re-writing Var(β̂j) as Var(β̂j) = (σ² / SSTj) · VIFj.

◮ So we want the x's to be as weakly correlated with each other as possible; however, they will typically not be perfectly uncorrelated if they are all relevant variables.
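
A short sketch of computing VIFj by hand with numpy (added for illustration; the simulated data and correlations are made up). For each regressor, regress it on the others, obtain R²j, and form 1 / (1 − R²j):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 300
x1 = rng.normal(size=n)
x2 = 0.7 * x1 + rng.normal(size=n)          # fairly strongly related to x1
x3 = rng.normal(size=n)                     # roughly unrelated to the others
regressors = {"x1": x1, "x2": x2, "x3": x3}

def vif(target, others):
    """VIF_j = 1 / (1 - R^2_j), with R^2_j from regressing x_j on the other regressors."""
    Z = np.column_stack([np.ones(len(target))] + others)
    coef = np.linalg.lstsq(Z, target, rcond=None)[0]
    resid = target - Z @ coef
    r2 = 1 - np.sum(resid ** 2) / np.sum((target - target.mean()) ** 2)
    return 1 / (1 - r2)

for name, xj in regressors.items():
    others = [v for k, v in regressors.items() if k != name]
    print(name, "VIF =", round(vif(xj, others), 2))   # values near 1 mean little collinearity
```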


Misspecified Models

◮ Suppose that the true model is y = β̂0 + β̂1x1 + β̂2x2 + û.

◮ However, we estimate the model y = β̃0 + β̃1x1 + ũ (this is a misspecified model).

◮ We know that if we exclude a relevant variable, x2 in this case, then E(β̃1) ≠ β1, i.e., β̃1 is biased for β1.

◮ But what about the variance of β̃1? Is it larger or smaller than that of β̂1 in the true model?


◮ Var(β̂1) = σ² / [SST1 (1 − R²1)] = (σ² / SST1) · VIF1,

where SST1 is the total variation in x1 and R²1 is the R² from regressing x1 on x2.

◮ Var(β̃1) = σ² / SST1 ≤ Var(β̂1), because VIF1 ≥ 1.

◮ So the variance is smaller in the misspecified model!

◮ However, as n → ∞, SST1 → ∞, so both Var(β̂1) and Var(β̃1) → 0, meaning that the difference between the two variances becomes negligible.
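
The sketch below (an illustrative addition with arbitrary numbers) computes the two conditional variances for a single design, confirming Var(β̃1) = σ²/SST1 ≤ σ²/[SST1(1 − R²1)] = Var(β̂1), and shows both shrinking as n grows:

```python
import numpy as np

rng = np.random.default_rng(7)
sigma2 = 1.0

for n in [50, 500, 5000]:
    x1 = rng.normal(size=n)
    x2 = 0.6 * x1 + rng.normal(size=n)
    sst1 = np.sum((x1 - x1.mean()) ** 2)
    # R^2_1 from regressing x1 on x2 (with an intercept)
    Z = np.column_stack([np.ones(n), x2])
    g = np.linalg.lstsq(Z, x1, rcond=None)[0]
    resid = x1 - Z @ g
    r2_1 = 1 - np.sum(resid ** 2) / sst1

    var_hat   = sigma2 / (sst1 * (1 - r2_1))   # correctly specified model
    var_tilde = sigma2 / sst1                  # misspecified model (x2 omitted)
    print(f"n = {n:5d}   Var(beta1-hat) = {var_hat:.5f}   Var(beta1-tilde) = {var_tilde:.5f}")
```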


σ² is unknown

We can use the residuals to form an estimate of the error variance:

σ̂² = Σi ûi² / (n − k − 1) = SSR / (n − k − 1),

where n − k − 1 is the degrees of freedom (DF) for SSR (the ûi are subject to k + 1 moment restrictions). The estimated variance and the standard error of β̂j are then

Var̂(β̂j) = σ̂² / [SSTj (1 − R²j)],   se(β̂j) = √Var̂(β̂j).

σ̂ = √σ̂² is the standard error of the regression, and E(σ̂²) = σ²: σ̂² is unbiased for σ².
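
A minimal Python sketch of these formulas (added for illustration; simulated data): fit by least squares, form σ̂² = SSR / (n − k − 1), and build the standard errors from the diagonal of σ̂²(X′X)⁻¹, which equals σ̂² / [SSTj (1 − R²j)] for each slope.

```python
import numpy as np

rng = np.random.default_rng(8)
n, k = 200, 2
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
y = 1.0 + 0.5 * x1 - 0.3 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta_hat

ssr = np.sum(resid ** 2)
sigma2_hat = ssr / (n - k - 1)                  # sigma-hat^2 = SSR / (n - k - 1)
var_beta = sigma2_hat * np.linalg.inv(X.T @ X)  # estimated Var(beta-hat | X)
se = np.sqrt(np.diag(var_beta))                 # standard errors

print("sigma-hat (std. error of the regression):", np.sqrt(sigma2_hat))
print("coefficients:   ", beta_hat)
print("standard errors:", se)
```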


The Gauss-Markov Theorem

Given our five Gauss-Markov assumptions (MLR.1-MLR.5), it can be shown that OLS is “BLUE”: the Best Linear Unbiased Estimator.
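
As an informal illustration of “best” (this sketch is an addition, not from the slides, and the alternative estimator is just one convenient example), the code below compares OLS in a simple regression against another estimator that is also linear in y and unbiased conditional on x: a grouping estimator that contrasts observations with xi above and below the median of x. Under the Gauss-Markov assumptions, OLS should display the smaller sampling variance.

```python
import numpy as np

rng = np.random.default_rng(9)
n, reps = 200, 5000                 # n even, so the two groups have equal size
b0, b1 = 1.0, 0.5

ols_est, grp_est = np.empty(reps), np.empty(reps)
for r in range(reps):
    x = rng.normal(size=n)
    y = b0 + b1 * x + rng.normal(size=n)

    # OLS slope
    ols_est[r] = np.sum((x - x.mean()) * y) / np.sum((x - x.mean()) ** 2)

    # An alternative linear unbiased estimator: grouping on the median of x.
    # Weights w_i = d_i / sum(d_j x_j) with d_i = +/-1 satisfy sum(w_i) = 0 and
    # sum(w_i x_i) = 1, so sum(w_i y_i) is unbiased for b1 conditional on x.
    d = np.where(x > np.median(x), 1.0, -1.0)
    w = d / np.sum(d * x)
    grp_est[r] = np.sum(w * y)

print("means:    ", ols_est.mean(), grp_est.mean())   # both close to 0.5 (unbiased)
print("variances:", ols_est.var(),  grp_est.var())    # OLS has the smaller variance (BLUE)
```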

Reading

Chapter 3, Introductory Econometrics: A Modern Approach, 4th edition, J. Wooldridge.