BS2247 Introduction to Econometrics Lecture 3: The simple regression model - PowerPoint PPT Presentation



BS2247 Introduction to Econometrics Lecture 3: The simple regression model

OLS, Algebraic properties, Goodness-of-fit

Dr. Kai Sun

Aston Business School


The Simple Regression Model: y = β0 + β1x + u

In the simple linear regression of y on x, we typically refer to y as the

◮ Dependent Variable, or
◮ Left-Hand Side Variable, or
◮ Explained Variable, or
◮ Regressand


The Simple Regression Model: y = β0 + β1x + u

We typically refer to x as the

◮ Independent Variable, or
◮ Right-Hand Side Variable, or
◮ Explanatory Variable, or
◮ Regressor, or
◮ Covariate, or
◮ Control Variable


The Simple Regression Model: y = β0 + β1x + u

◮ β0: intercept parameter, or constant term
◮ β1: slope parameter
◮ u: error term, or disturbance, or unobservable (there would be no Econometrics without it!)


A Simple Assumption

The expected value of u, the error term, in the population is zero:
E(u) = 0
"There is no error on average." On its own, this assumption is trivial: as long as the model contains an intercept, β0 can always be adjusted so that it holds.


A Crucial Assumption: Zero Conditional Mean

◮ Assume how u and x are related: E(u|x) = E(u), i.e., x and u are completely unrelated (u is mean independent of x, which is stronger than mere zero correlation).
◮ Intuitively, this means that knowing something about x does not give us any information about u.
◮ It is crucial to assume that E(u|x) = 0, which (by iterated expectations) also implies E(u) = 0.
◮ E(u|x) = 0 is the zero conditional mean assumption.


A Crucial Assumption: Zero Conditional Mean

◮ Assume how u and x are related

E(u|x) = E(u): x and u are completely uncorrelated.

◮ Intuitively, this means that knowing something about x does

not give us any information about u.

◮ It is crucial to assume that E(u|x) = 0, and therefore,

E(u) = 0.

◮ E(u|x) = 0 is the zero conditional mean assumption.

6 / 28

slide-10
SLIDE 10

◮ E(u|x) = 0 implies that E(y|x) = β0 + β1x

Proof: y = β0 + β1x + u, so u = y − β0 − β1x. Then
E(u|x) = E(y − β0 − β1x | x) = E(y|x) − β0 − β1x = 0,
which gives E(y|x) = β0 + β1x. QED.

◮ E(y|x) is called the population regression function.
◮ E(y|x) is the expected value of y given a particular value of x.


Ordinary Least Squares (OLS)

◮ The basic idea of regression is to estimate the unknown population parameters, β0 and β1, from a sample.
◮ Let {(xi, yi) : i = 1, . . . , n} denote a random sample of size n from the population.
◮ For each observation in this sample, it will be the case that yi = β0 + β1xi + ui (simulated in the sketch below).
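To make this concrete, here is a minimal Stata sketch that draws such a random sample from the population model; the parameter values β0 = 1, β1 = 2 and the sample size n = 100 are hypothetical choices for illustration, not from the lecture:

    * Simulate a random sample from y = beta0 + beta1*x + u
    * (hypothetical values: beta0 = 1, beta1 = 2, n = 100)
    clear
    set seed 12345
    set obs 100
    generate x = rnormal()      // the regressor
    generate u = rnormal()      // the error term, independent of x, E(u) = 0
    generate y = 1 + 2*x + u    // the population model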


Population regression line, sample data points and the associated error terms

[Figure: four sample points (x1, y1), . . . , (x4, y4) scattered around the population regression line E(y|x) = β0 + β1x, with the errors u1, . . . , u4 drawn as the vertical distances between each point and the line.]


Deriving OLS Estimates

◮ Use the assumption E(u|x) = E(u) = 0. x and u are uncorrelated ⇒ Cov(u, x) = 0.
◮ Recall that E(ux) = E(u)E(x) + Cov(u, x). Since E(u) = 0 and Cov(u, x) = 0, E(ux) = 0.


Deriving OLS Estimates

◮ Use the two “algebraically feasible” assumptions: (1) E(u) = 0; (2) E(ux) = 0.
◮ Since u = y − β0 − β1x, these two assumptions become: (1) E(y − β0 − β1x) = 0; (2) E[(y − β0 − β1x)x] = 0.
◮ These are called population moment restrictions.


Deriving OLS Estimates using M.O.M.

Use the population moment restrictions to solve for the population parameters:
(1) E(yi − β0 − β1xi) = 0
(2) E[(yi − β0 − β1xi)xi] = 0
(These moment restrictions hold for each observation in the population.)


Deriving OLS Estimates using M.O.M.

This leaves us with two equations in two unknowns (β0 and β1). To solve them, we can rewrite the first condition as
E(yi) = β0 + β1E(xi)
or
β0 = E(yi) − β1E(xi)


Deriving OLS Estimates using M.O.M.

Substituting β0 = E(yi) − β1E(xi) into the second population moment restriction gives
E[(yi − (E(yi) − β1E(xi)) − β1xi)xi] = 0
Solving for β1 yields
β1 = E[(xi − E(xi))(yi − E(yi))] / E[(xi − E(xi))²]
that is, the covariance between x and y divided by the variance of x.


Deriving OLS Estimates using M.O.M.

To emphasize that β0 and β1 are estimated using a finite sample, which is mostly the case in empirical study, we replace the population moments by their sample counterparts and put “hats” over the β’s, that is,
β̂0 = ȳ − β̂1x̄
β̂1 = Σi (xi − x̄)(yi − ȳ) / Σi (xi − x̄)²
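Continuing the simulated sample from above, a minimal sketch of these formulas computed directly from sample moments rather than via Stata's built-in regress command (the 1/(n − 1) factors in the sample covariance and sample variance cancel in the ratio):

    * Slope: sample covariance of (x, y) divided by sample variance of x;
    * correlate with the covariance option stores r(cov_12) and r(Var_1)
    quietly correlate x y, covariance
    scalar b1hat = r(cov_12) / r(Var_1)
    * Intercept: ybar - b1hat * xbar
    quietly summarize y
    scalar ybar = r(mean)
    quietly summarize x
    scalar b0hat = ybar - b1hat*r(mean)
    display "b0hat = " b0hat "   b1hat = " b1hat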


Summary of OLS slope estimate

◮ The slope estimate is the sample covariance between x and y divided by the sample variance of x.
◮ If x and y are positively correlated, the slope will be positive.
◮ If x and y are negatively correlated, the slope will be negative.
◮ We only need x to vary in our sample; otherwise Σi (xi − x̄)² = 0 and the slope estimate is undefined.


Alternative approach to derivation

◮ OLS fits a line through the sample points such that the sum of squared errors is as small as possible, hence the term least squares.
◮ Given this intuitive idea of fitting a line, we can set up a formal minimization problem: min over β0, β1 of E(ui²) = E[(yi − β0 − β1xi)²]
◮ If one uses calculus to solve the minimization problem for the two parameters, the first-order conditions are exactly the two population moment restrictions (sketched below).
◮ This alternative derivation is not required for this course.
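For completeness, a brief sketch of those first-order conditions, written in LaTeX (differentiating the expected squared error with respect to each parameter):

    \[
    \frac{\partial}{\partial \beta_0} E\big[(y_i - \beta_0 - \beta_1 x_i)^2\big]
      = -2\,E\big[y_i - \beta_0 - \beta_1 x_i\big] = 0
    \]
    \[
    \frac{\partial}{\partial \beta_1} E\big[(y_i - \beta_0 - \beta_1 x_i)^2\big]
      = -2\,E\big[(y_i - \beta_0 - \beta_1 x_i)\,x_i\big] = 0
    \]

Dividing both conditions by −2 recovers exactly (1) E(u) = 0 and (2) E(ux) = 0.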


Sample regression line, sample data points and the associated estimated error terms

From the estimates β̂0 and β̂1, we can construct the sample regression function: ŷi = β̂0 + β̂1xi
[Figure: the four sample points (x1, y1), . . . , (x4, y4) scattered around the fitted line ŷ = β̂0 + β̂1x, with the residuals û1, . . . , û4 drawn as the vertical distances between each point and the fitted line.]


A Comparison

Note that we can construct y using either the population or the sample regression line, as
y = E(y|x) + u = β0 + β1x + u
or
y = ŷ + û = β̂0 + β̂1x + û
û is called the residual. It is an estimate of the error term, u, and is the difference between y (i.e., actual y) and ŷ (i.e., fitted y).


Algebraic Properties of OLS

(1) Σi ûi = 0: the sum (or average) of the OLS residuals is zero. This is similar to the first sample moment restriction. One can also plug β̂0 and β̂1 into ûi = yi − β̂0 − β̂1xi to prove that Σi ûi = 0.
(2) Σi xiûi = 0: the sample covariance between the regressor and the OLS residuals is zero. This is similar to the second sample moment restriction. One can also plug β̂0 and β̂1 into ûi = yi − β̂0 − β̂1xi to prove that Σi xiûi = 0. (Both properties are checked numerically below.)
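A quick Stata check of both properties on the simulated sample (the xb and residuals options of predict are standard after regress):

    quietly regress y x
    predict yhat, xb           // fitted values
    predict uhat, residuals    // OLS residuals
    summarize uhat             // mean (and hence sum) is zero up to rounding
    correlate x uhat           // correlation with x is zero up to rounding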


Algebraic Properties of OLS

(3) The point (x̄, ȳ) lies on the sample regression line ŷ = β̂0 + β̂1x.
Proof: When x = x̄, ŷ = β̂0 + β̂1x̄ = (ȳ − β̂1x̄) + β̂1x̄ = ȳ.


◮ We can think of OLS as decomposing each yi into two parts, a fitted value and a residual. That is: yi = ŷi + ûi.
◮ Note that the sample covariance between ŷi and ûi is zero (i.e., Cov(ŷi, ûi) = (1/n) Σi ŷiûi = 0).
◮ So we want to disentangle the explained part, ŷi, and the unexplained part, ûi.


Definition of variations

Define the following measures of variation (sums of squared deviations from the mean):
(1) SST = Σi (yi − ȳ)²: total sum of squares (the variation of yi)
(2) SSE = Σi (ŷi − ȳ)²: explained sum of squares (the variation of ŷi; in fact, the sample mean of ŷ equals ȳ)
(3) SSR = Σi ûi²: residual sum of squares (the variation of ûi; in fact, the sample mean of û is 0)
It can be shown that SST = SSE + SSR (verified numerically below).
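A sketch verifying these quantities on the simulated sample (yhat and uhat come from the earlier predict step):

    quietly summarize y
    scalar ybar = r(mean)
    generate double sst_i = (y - ybar)^2       // (yi - ybar)^2
    generate double sse_i = (yhat - ybar)^2    // (yhat_i - ybar)^2
    generate double ssr_i = uhat^2             // uhat_i^2
    quietly summarize sst_i
    scalar SST = r(sum)
    quietly summarize sse_i
    scalar SSE = r(sum)
    quietly summarize ssr_i
    scalar SSR = r(sum)
    display "SST = " SST "   SSE + SSR = " (SSE + SSR)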


SST = Σi (yi − ȳ)²
    = Σi (yi − ŷi + ŷi − ȳ)²
    = Σi (ûi + (ŷi − ȳ))²
    = Σi ûi² + 2 Σi ûi(ŷi − ȳ) + Σi (ŷi − ȳ)²
    = SSR + 2 Σi ûi(ŷi − ȳ) + SSE
Since Σi ûi(ŷi − ȳ) = 0, SST = SSR + SSE.


Recall that SSE is the sample variation of ŷi. In fact, we can also understand it as the sample variation of xi:
SSE = Σi (ŷi − ȳ)²
    = Σi ((β̂0 + β̂1xi) − (β̂0 + β̂1x̄))²   (using property (3): ȳ = β̂0 + β̂1x̄)
    = β̂1² Σi (xi − x̄)²
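This identity can be checked numerically as well; summarize stores the (n − 1)-denominator variance in r(Var), so Σi (xi − x̄)² = (n − 1) · r(Var):

    quietly summarize x
    scalar ssx = (r(N) - 1) * r(Var)    // sum of (xi - xbar)^2
    display "SSE = " SSE "   b1hat^2 * SSx = " (b1hat^2 * ssx)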


Goodness-of-fit

◮ We’d like to know how well our sample regression line fits our sample data.
◮ We can compute the fraction of the total sum of squares (SST) that is explained by the model; call this the R-squared of the regression: R² = SSE/SST = 1 − SSR/SST
◮ R² × 100 is the percentage of the sample variation of y (i.e., SST) that can be explained by the sample variation of x (i.e., SSE). (Computed numerically below.)
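Continuing the sketch, R² can be computed from the scalars above and compared with the value Stata stores after regress:

    display "R2 = " (SSE/SST) "  or equivalently  " (1 - SSR/SST)
    quietly regress y x
    display "e(r2) = " e(r2)    // the R-squared stored by regress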


Using Stata for OLS regressions

◮ Now that we’ve derived the formulas for calculating the OLS estimates of our parameters, you’ll be happy to know you don’t have to compute them by hand.
◮ Regressions in Stata are very simple: to run the regression of y on x, just type reg y x and Stata will do the rest (i.e., calculate β̂0 and β̂1, etc.). See the example below.
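For example, with Stata's built-in auto dataset (any dataset with a y and an x would do):

    sysuse auto, clear    // load a built-in example dataset
    reg price mpg         // regress price (y) on mpg (x)
    display e(r2)         // the R-squared stored by regress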


Reading

Chapter 2, Introductory Econometrics: A Modern Approach, 4th Edition, J. Wooldridge.
