

SLIDE 1

Lecture 1 Economic Data and Simple Linear Regression

CHUNG-MING KUAN

Department of Finance & CRETA National Taiwan University

February 25, 2020

C.-M. Kuan (Finance & CRETA, NTU) Lecture 1: Data & Simple Regression February 25, 2020 1 / 43

SLIDE 2

Lecture Outline

1

1. Economic Data
   • Taiwan's Macroeconomic Data
   • Taiwan's Microeconomic Data
   • Why Econometrics?

2. Simple Linear Regression
   • Least-Squares Estimation
   • Algebraic Properties of LS Estimation
   • Statistical Properties of LS Estimation

SLIDE 3

Introduction

Economic data are records of economic activities. Such data are usually compiled by government agencies (e.g. GDP and unemployment rates), collected from controlled experiments or surveys (e.g. the Survey of Family Income and Expenditure), or recorded by electronic systems (e.g. stock market transaction data).

Internet activities, such as visiting a website, posting on social media, shopping or booking online, and clicking through an online ad, also produce a large amount of data unintentionally. Such data are known as "digital footprints".

For some analyses, artificial data may be generated (simulated) computationally using certain algorithms, or randomly re-arranged by re-sampling methods.

SLIDE 4

Economic data may be time series if they are recorded over a period of time, cross-section data if they are recorded across different units (agents, households, firms, industries, or countries) at a particular time point, or panel data if they are recorded across different units over a period of time.

Econometrics offers various statistical, mathematical, and computational methods that can be used to establish (or analyze) economic relations based on economic data. Econometric analysis typically relies on numeric data; text documents are typically converted to numeric data (using text-mining techniques) before they can be analyzed by econometric methods.

SLIDE 5

GDP and Unemployment Rates

Taiwan's GDP data are collected and compiled by the DGBAS (Directorate-General of Budget, Accounting and Statistics).

• Annual data since 1951
• Quarterly data since 1961
• Seasonally adjusted quarterly data since 1982

Since Nov. 2014, all national income statistics have been calculated in accordance with the United Nations guidelines (2008 SNA). In particular, GDP growth rates are now computed using the chain-linked method.

Taiwan's unemployment data are also collected by the DGBAS.

• Monthly data since 1978
• Seasonally adjusted monthly data since 2011

SLIDE 6

Taiwan’s Annual GDP: 1951–2019

[Figure: GDP (billion NTD) and GDP growth rates (%), annual, 1951–2019. Average growth rate: 7.38%.]

SLIDE 7

Summary Statistics of Annual GDP Growth Rates

Period   Avg     S.d.    Max     Min
52–19    7.38    3.66    14.28   −1.61
52–59    8.67    1.76    12.00    6.17
60–69    9.85    1.83    12.63    7.05
70–79   10.86    3.83    14.28    2.67
80–89    8.48    2.57    12.75    4.81
90–99    6.62    1.26     8.37    4.20
00–09    3.88    3.34     6.95   −1.61
10–19    3.51    2.51    10.25    1.47

SLIDE 8

Taiwan’s GDP Annual Growth Rates: 1952–2019

[Figure: annual GDP growth rates (%), 1952–2019, with 10-year averages. Max: 14.28% (1976); min: −1.61% (2009).]

SLIDE 9

Taiwan’s Annual Unemployment Rates: 1978–2019

[Figure: annual unemployment rates (%), 1978–2019. Average rate: 3.07%.]

SLIDE 10

Summary Statistics of Annual Unemployment Rates

Period   Avg     S.d.    Max     Min
78–19    3.07    1.29    5.85    1.23
80–89    2.07    0.60    2.91    1.23
90–99    2.04    0.61    2.92    1.45
00–09    4.41    0.79    5.85    2.99
10–19    4.09    0.46    5.21    3.71

SLIDE 11

Taiwan’s Annual Unemployment Rates: 1978–2019

[Figure: annual unemployment rates (%), 1978–2019, with 10-year averages. Max: 5.85% (2009); min: 1.23% (1980).]

SLIDE 12

Taiwan’s Quarterly GDP: 1961Q1–2019Q4

[Figure: seasonally unadjusted quarterly GDP (billion NTD) and its YoY growth rates (%), 1961Q1–2019Q4.]

SLIDE 13

Summary Statistics of Quarterly GDP Growth Rates (YoY)

Period   Avg     S.d.    Max     Min
62–19    7.24    4.38    17.26   −7.88
62–69   10.59    2.23    14.39    6.05
70–79   10.89    4.54    17.26   −2.81
80–89    8.49    2.86    14.25    3.57
90–99    6.63    1.53    10.53    3.16
00–09    3.92    4.57    10.88   −7.88
10–19    3.60    2.83    12.02   −0.28

SLIDE 14

Taiwan’s Quarterly GDP: 1962Q1–2019Q4

[Figure: quarterly GDP YoY growth rates (%, NSA), 1962Q1–2019Q4, with 10-year averages. Max: 17.26% (1978Q3); min: −7.88% (2009Q1).]

SLIDE 15

Manpower Utilization Survey

The Manpower Utilization Survey is conducted by the DGBAS every May, together with the Manpower Survey.

• Individuals aged 15 and above in each sampled household are surveyed.
• Approximately 20,000 households (about 60,000 individuals) are surveyed each time.
• In 2010, 11,561 males and 9,348 females were surveyed.
• The survey questions include work status, working hours, earnings, education level, etc.

SLIDE 16

Summary Statistics of log(wage)

                      Avg     S.d.    Max     Min
Full sample           5.13    0.44    7.46    2.82
Education    ≤ 9      4.93    0.38    6.83    2.99
years        10–12    5.00    0.38    6.87    2.82
             ≥ 13     5.29    0.44    7.46    3.36
Working      ≤ 5      4.98    0.36    6.54    3.18
experience   6–15     5.12    0.40    7.05    2.82
             16–25    5.21    0.46    7.46    3.36
             ≥ 26     5.15    0.49    6.99    2.99

Note: Wage is the real hourly wage in NTD; the base year is 2000.

SLIDE 17

Male Wage vs. Education Level: 2010

[Figure: log(wage) vs. years of education, males, 2010; partial sample of 500 observations.]

SLIDE 18

Female Wage vs. Education Level: 2010

[Figure: log(wage) vs. years of education, females, 2010; partial sample of 500 observations.]

SLIDE 19

Male Wage vs. Working Experience: 2010

[Figure: log(wage) vs. working experience, males, 2010; partial sample of 500 observations. Experience = age − years of education − 6.]

SLIDE 20

Female Wage vs. Working Experience: 2010

[Figure: log(wage) vs. working experience, females, 2010; partial sample of 500 observations. Experience = age − years of education − 6.]

SLIDE 21

Why Econometrics

One may examine data based on their summary statistics (e.g. mean, median, s.d.). Yet these statistics cannot tell us how one variable is related to other variables.

Economists are interested in knowing the relations between economic variables. It is thus important to analyze the variable of interest conditional on other economic variables. This is precisely what econometrics does.

The purpose of econometrics is to estimate economic relations based on some models, test economic theories and hypotheses, predict the future behavior of economic variables, and/or evaluate the effects of government and business programs.

SLIDE 22

Linear Specification

For the variable of interest (dependent variable, regressand) y, its systematic behavior is characterized by a function of the explanatory variable (regressor) x. A simple and convenient choice is the linear function of x, \beta_0 + \beta_1 x, with unknown parameters \beta_0 and \beta_1. Thus,

y = \underbrace{\beta_0 + \beta_1 x}_{\text{systematic part}} + \underbrace{u(\beta_0, \beta_1)}_{\text{error}},

where the error u(\beta_0, \beta_1) = y - (\beta_0 + \beta_1 x) is the non-systematic part of y. Given the sample data (x_i, y_i), i = 1, \ldots, n, we have

y_i = \beta_0 + \beta_1 x_i + u_i, \quad i = 1, \ldots, n,

where u_i = u_i(\beta_0, \beta_1) is the i-th error term.

SLIDE 23

Least-Squares Minimization

Estimating the unknown parameters \beta_0 and \beta_1 amounts to finding a line that "best" fits the data (x_i, y_i), i = 1, \ldots, n. This can be done by minimizing the sum of squared errors; that is, we minimize the following least-squares (LS) criterion function with respect to \beta_0 and \beta_1:

Q_n(\beta_0, \beta_1) := \sum_{i=1}^n u_i^2 = \sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)^2.

The first-order conditions (FOCs) of the LS minimization problem are:

\frac{\partial Q_n(\beta_0, \beta_1)}{\partial \beta_0} = -2 \sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i) = 0,

\frac{\partial Q_n(\beta_0, \beta_1)}{\partial \beta_1} = -2 \sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i) x_i = 0.

SLIDE 24

Least-Squares Estimator

Solving the FOCs for \beta_0 and \beta_1, we obtain the ordinary least-squares (OLS) estimators (verify!):

\hat\beta_1 = \frac{\sum_{i=1}^n (y_i - \bar y)(x_i - \bar x)}{\sum_{i=1}^n (x_i - \bar x)^2}, \qquad \hat\beta_0 = \bar y - \hat\beta_1 \bar x,

where \bar y = \sum_{i=1}^n y_i / n and \bar x = \sum_{i=1}^n x_i / n.

Remark: The OLS method does not require any assumption, except that the x_i are not constant. When x_i = c for all i, we have \bar x = c, so \hat\beta_1 cannot be computed because its denominator is zero.
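The estimator formulas above can be computed directly; a minimal sketch, with made-up data chosen for illustration only:

```python
def ols(x, y):
    """OLS slope and intercept from the closed-form formulas:
    beta1_hat = sum (y_i - ybar)(x_i - xbar) / sum (x_i - xbar)^2,
    beta0_hat = ybar - beta1_hat * xbar."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    b1 = sum((yi - ybar) * (xi - xbar) for xi, yi in zip(x, y)) \
         / sum((xi - xbar) ** 2 for xi in x)
    b0 = ybar - b1 * xbar
    return b0, b1

# Hypothetical sample roughly following y = 0.1 + 2x plus noise.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 8.0, 9.8]
b0, b1 = ols(x, y)
print(b0, b1)
```

Note that if all x_i were equal, the denominator in `b1` would be zero, matching the remark above.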

SLIDE 25

Estimated regression line: \hat y = \hat\beta_0 + \hat\beta_1 x, with the i-th fitted value \hat y_i = \hat\beta_0 + \hat\beta_1 x_i.

\hat\beta_1 = d\hat y / dx is the slope of the estimated regression line, which predicts how much y would change when x changes by one unit. \hat\beta_0 is the intercept, which predicts the level of y when x = 0.

Residual: \hat u = u(\hat\beta_0, \hat\beta_1) = y - \hat y is the error evaluated at \hat\beta_0 and \hat\beta_1, with the i-th residual \hat u_i = y_i - \hat y_i, the difference between the observed value y_i and the fitted value \hat y_i.


slide-27
SLIDE 27

Special Cases

For the specification y_i = \beta_1 x_i + u_i, the LS criterion function is Q_n(\beta_1) = \sum_{i=1}^n (y_i - \beta_1 x_i)^2, with the FOC \sum_{i=1}^n (y_i - \beta_1 x_i) x_i = 0. The resulting OLS estimator of \beta_1 is

\hat\beta_1 = \sum_{i=1}^n x_i y_i \Big/ \sum_{i=1}^n x_i^2.

For the specification y_i = \beta_0 + u_i, the LS criterion function is Q_n(\beta_0) = \sum_{i=1}^n (y_i - \beta_0)^2, with the FOC \sum_{i=1}^n (y_i - \beta_0) = 0. The resulting OLS estimator of \beta_0 is

\hat\beta_0 = \frac{1}{n} \sum_{i=1}^n y_i = \bar y.

This shows that the sample average gives the "best" fit of y_i when there is no other information.
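Both special cases can be checked with a short sketch (the numbers are made up for illustration):

```python
# No-intercept model: beta1_hat = sum x_i y_i / sum x_i^2.
# Intercept-only model: beta0_hat = ybar, the sample average.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.1, 5.9, 8.2]

b1_no_intercept = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
b0_only = sum(y) / len(y)

print(b1_no_intercept)  # slope of the best-fitting line through the origin
print(b0_only)          # the sample average of y
```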

SLIDE 28

Other Minimization Programs

Given the linear specification, LS minimization yields the "best" fit of the data in the sense that the sum of squared errors is the smallest. This is not the only way to find the best fit, however; there are other minimization programs for data fitting. For example, minimizing the sum of absolute errors,

\sum_{i=1}^n |y_i - \beta_0 - \beta_1 x_i|,

yields the least absolute deviations (LAD) estimators of \beta_0 and \beta_1. Different criterion functions lead to different fits of the data and hence different regression lines.

SLIDE 29

A special case: For the specification y_i = \beta_0 + u_i, the LAD criterion function is

\sum_{i=1}^n |y_i - \beta_0| = \sum_{i: y_i > \beta_0} (y_i - \beta_0) - \sum_{i: y_i < \beta_0} (y_i - \beta_0).

The FOC is -\sum_{i: y_i > \beta_0} 1 + \sum_{i: y_i < \beta_0} 1 = 0, so the solution satisfies

(the number of y_i > \beta_0) = (the number of y_i < \beta_0).

That is, the LAD estimator of \beta_0 is the sample median of y_i. Thus the LAD regression line in effect describes the median behavior of y_i conditional on x_i. This result also shows that the median (0.5 quantile) may be computed via an optimization program.
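A brute-force sketch of this result, on made-up data: minimizing the LAD criterion for the intercept-only model recovers the sample median.

```python
import statistics

y = [3.0, 1.0, 7.0, 4.0, 10.0]

def lad(b0):
    """Sum of absolute deviations of y around a candidate intercept b0."""
    return sum(abs(yi - b0) for yi in y)

# A minimizer of the LAD criterion is always attained at a data point,
# so it suffices to search over the observed values.
b0_lad = min(y, key=lad)
print(b0_lad, statistics.median(y))  # both are 4.0
```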

SLIDE 30

Algebraic Properties

Plugging \hat\beta_0 and \hat\beta_1 into the FOC \sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i) = 0, we have

\sum_{i=1}^n (y_i - \hat\beta_0 - \hat\beta_1 x_i) = \sum_{i=1}^n \hat u_i = 0.

That is, the positive and negative residuals must cancel out. Also,

\sum_{i=1}^n (y_i - \hat\beta_0 - \hat\beta_1 x_i) x_i = \sum_{i=1}^n \hat u_i x_i = 0,

which suggests that the sample covariance between x_i and \hat u_i is zero. As \sum_{i=1}^n \hat u_i = 0, we can see

\bar y = \frac{1}{n} \sum_{i=1}^n y_i = \hat\beta_0 + \hat\beta_1 \left( \frac{1}{n} \sum_{i=1}^n x_i \right) + \frac{1}{n} \sum_{i=1}^n \hat u_i = \hat\beta_0 + \hat\beta_1 \bar x,

which shows the estimated regression line must pass through (\bar x, \bar y).
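These three algebraic properties can be verified numerically; a sketch on illustrative data:

```python
# Check: residuals sum to zero, residuals are orthogonal to x,
# and the fitted line passes through (xbar, ybar).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 2.3, 2.9, 4.4, 5.1]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

b1 = sum((yi - ybar) * (xi - xbar) for xi, yi in zip(x, y)) \
     / sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

sum_resid = sum(resid)                                  # ~0
sum_resid_x = sum(ui * xi for ui, xi in zip(resid, x))  # ~0
on_line = abs((b0 + b1 * xbar) - ybar)                  # ~0
print(sum_resid, sum_resid_x, on_line)
```

All three quantities are zero up to floating-point rounding, exactly as the FOCs require.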

SLIDE 31

Goodness of Fit

It is easy to verify that

\underbrace{\sum_{i=1}^n (y_i - \bar y)^2}_{\text{SST}} = \sum_{i=1}^n (\hat u_i + \hat y_i - \bar y)^2 = \underbrace{\sum_{i=1}^n \hat u_i^2}_{\text{SSR}} + \underbrace{\sum_{i=1}^n (\hat y_i - \bar y)^2}_{\text{SSE}},

where SST denotes the total sum of squares, SSR the residual sum of squares, and SSE the explained sum of squares. The goodness of fit of the estimated regression line is measured by the coefficient of determination, defined as the proportion of the total variation (SST) of y_i due to the variation of \hat y_i (SSE):

R^2 = SSE/SST = 1 - SSR/SST.
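The decomposition SST = SSR + SSE, and hence R^2, can be checked on any OLS fit; a sketch with made-up data:

```python
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.0, 2.5, 2.0, 4.5, 5.0, 5.5]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

# OLS fit.
b1 = sum((yi - ybar) * (xi - xbar) for xi, yi in zip(x, y)) \
     / sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar
fitted = [b0 + b1 * xi for xi in x]

# Sum-of-squares decomposition and the coefficient of determination.
sst = sum((yi - ybar) ** 2 for yi in y)
ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
sse = sum((fi - ybar) ** 2 for fi in fitted)
r2 = sse / sst
print(sst, ssr + sse, r2)  # SST equals SSR + SSE; 0 <= R^2 <= 1
```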

SLIDE 32

0 ≤ R^2 ≤ 1

R^2 is based on sums of squares and hence cannot be negative. As SSE ≤ SST, R^2 cannot exceed one.

R^2 = 0 if SSE is zero. This happens when \hat y_i = \bar y for all i; that is, the estimated regression line is the horizontal line at \bar y.

R^2 = 1 if SSR is zero. This happens when the regression line fits the data perfectly; that is, the estimated regression line passes through all data points.

R^2 values from different regressions are comparable only when the regressions have the same dependent variable y but different regressors x. The regression line with the higher R^2 is the one that fits the data better.

SLIDE 33

Example: Simple Wage Regressions

Taiwan's estimated wage models based on 2010 male data (11,561 obs):

\widehat{\log(\text{wage})} = 4.5929 + 0.0494\, \text{educ}, \quad R^2 = 0.133,

\widehat{\log(\text{wage})} = 5.1208 + 0.0059\, \text{exper}, \quad R^2 = 0.026,

where educ denotes the years of education and exper the years of working experience. Note that the slope coefficient is

\frac{d \log(\text{wage})}{dx} = \frac{1}{\text{wage}} \frac{d(\text{wage})}{dx},

which is the predicted percentage change of wage when x changes by one unit. Thus, for a male receiving one more year of education, the first regression line predicts that his wage would increase by almost 5%.

SLIDE 34

Classical Assumption I

The random variables y_i, i = 1, \ldots, n, follow the population model y_i = b_0 + b_1 x_i + u_i for some numbers b_0 and b_1, where (i) x_i are non-random, (ii) E(y_i) = b_0 + b_1 x_i, and (iii) var(y_i) = \sigma^2 and cov(y_i, y_j) = 0 for i \neq j.

The assumption of non-random x_i is convenient for deriving the statistical properties of the OLS estimators. Assumption (ii) means the linear function \beta_0 + \beta_1 x is the correct specification for the mean function; it also implies E(u_i) = 0. Assumption (iii) requires the y_i to be uncorrelated and have a constant variance (homoskedasticity); when y_i have unequal variances, they are said to exhibit heteroskedasticity. Note that var(y_i) = var(u_i), so \sigma^2 is known as the error variance.

SLIDE 35

Example of Conditionally Homoskedastic Data

[Figure: simulated data; y exhibits homogeneous variation across x.]

SLIDE 36

Example of Conditionally Heteroskedastic Data

[Figure: simulated data; y exhibits heterogeneous variation across x.]

SLIDE 37

Example of Conditionally Heteroskedastic Data

[Figure: simulated data; y exhibits heterogeneous variation across x.]

SLIDE 38

Some Algebra

\sum_{i=1}^n (x_i - \bar x) = 0, so that \bar x \sum_{i=1}^n (x_i - \bar x) = 0 and

\sum_{i=1}^n (x_i - \bar x)^2 = \sum_{i=1}^n x_i (x_i - \bar x).

Similarly, \sum_{i=1}^n (y_i - \bar y)(x_i - \bar x) = \sum_{i=1}^n y_i (x_i - \bar x). The OLS estimator \hat\beta_1 can then be written as

\hat\beta_1 = \frac{\sum_{i=1}^n y_i (x_i - \bar x)}{\sum_{i=1}^n (x_i - \bar x)^2} = \frac{\sum_{i=1}^n (b_0 + b_1 x_i + u_i)(x_i - \bar x)}{\sum_{i=1}^n (x_i - \bar x)^2}

= b_0 \frac{\sum_{i=1}^n (x_i - \bar x)}{\sum_{i=1}^n (x_i - \bar x)^2} + b_1 \frac{\sum_{i=1}^n x_i (x_i - \bar x)}{\sum_{i=1}^n (x_i - \bar x)^2} + \frac{\sum_{i=1}^n u_i (x_i - \bar x)}{\sum_{i=1}^n (x_i - \bar x)^2}

= b_1 + \frac{\sum_{i=1}^n u_i (x_i - \bar x)}{\sum_{i=1}^n (x_i - \bar x)^2}.

SLIDE 39

Unbiasedness of the OLS Estimators

Under Classical Assumption I(i) and (ii), the OLS estimators \hat\beta_0 and \hat\beta_1 are unbiased for b_0 and b_1, respectively. To see this, note that

E(\hat\beta_1) = b_1 + E\left[ \frac{\sum_{i=1}^n u_i (x_i - \bar x)}{\sum_{i=1}^n (x_i - \bar x)^2} \right] = b_1 + \frac{\sum_{i=1}^n E(u_i)(x_i - \bar x)}{\sum_{i=1}^n (x_i - \bar x)^2} = b_1,

where the second equality holds because the x_i are non-random. As E(y_i) = b_0 + b_1 x_i, we have E(\bar y) = b_0 + b_1 \bar x and

E(\hat\beta_0) = E(\bar y - \hat\beta_1 \bar x) = b_0 + b_1 \bar x - E(\hat\beta_1) \bar x = b_0.
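Unbiasedness can be illustrated by a small Monte Carlo sketch: with x held fixed and E(u_i) = 0, the average of \hat\beta_1 over many simulated samples approaches b_1. The population values b_0 = 1 and b_1 = 2 below are chosen arbitrarily for illustration.

```python
import random

random.seed(0)
b0_true, b1_true = 1.0, 2.0
x = [float(i) for i in range(1, 21)]   # non-random regressor, fixed across draws
xbar = sum(x) / len(x)
sxx = sum((xi - xbar) ** 2 for xi in x)

estimates = []
for _ in range(2000):
    # Draw a fresh sample from y_i = b0 + b1 x_i + u_i with u_i ~ N(0, 1).
    y = [b0_true + b1_true * xi + random.gauss(0, 1) for xi in x]
    ybar = sum(y) / len(y)
    b1_hat = sum((yi - ybar) * (xi - xbar) for xi, yi in zip(x, y)) / sxx
    estimates.append(b1_hat)

avg_b1 = sum(estimates) / len(estimates)
print(avg_b1)   # close to the true slope, 2
```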

SLIDE 40

Variance of the OLS Estimators

Under Classical Assumption I(i), (ii), and (iii),

var(\hat\beta_1) = \sigma^2 \frac{1}{\sum_{i=1}^n (x_i - \bar x)^2}, \qquad var(\hat\beta_0) = \sigma^2 \frac{\sum_{i=1}^n x_i^2 / n}{\sum_{i=1}^n (x_i - \bar x)^2}.

It is easy to verify that

var(\hat\beta_1) = var\left( \frac{\sum_{i=1}^n u_i (x_i - \bar x)}{\sum_{i=1}^n (x_i - \bar x)^2} \right) = \frac{\sum_{i=1}^n var(u_i)(x_i - \bar x)^2}{\left[ \sum_{i=1}^n (x_i - \bar x)^2 \right]^2} = \frac{\sigma^2 \sum_{i=1}^n (x_i - \bar x)^2}{\left[ \sum_{i=1}^n (x_i - \bar x)^2 \right]^2} = \sigma^2 \frac{1}{\sum_{i=1}^n (x_i - \bar x)^2}.

We omit the details of deriving var(\hat\beta_0).

Remark: var(\hat\beta_1) would be smaller if the x_i are more dispersed (about \bar x). In this case, the estimated regression line is more stable and is not affected much by a few x_i.

SLIDE 41

The OLS estimator of \sigma^2 is an average of squared residuals:

\hat\sigma^2 = \frac{1}{n-2} \sum_{i=1}^n \hat u_i^2,

and \hat\sigma is also known as the regression standard error. Note that the sum of squared residuals is divided by n − 2, rather than n, because the \hat u_i must satisfy the two FOCs of LS estimation and hence lose two degrees of freedom. The result below shows that \hat\sigma^2 is also an unbiased estimator of \sigma^2; the proof is omitted.

Unbiasedness of \hat\sigma^2: Under Classical Assumption I(i), (ii), and (iii), E(\hat\sigma^2) = \sigma^2.

SLIDE 42

Replacing \sigma^2 in var(\hat\beta_1) and var(\hat\beta_0) with \hat\sigma^2, we obtain the following variance estimators:

\widehat{var}(\hat\beta_1) = \hat\sigma^2 \frac{1}{\sum_{i=1}^n (x_i - \bar x)^2}, \qquad \widehat{var}(\hat\beta_0) = \hat\sigma^2 \frac{\sum_{i=1}^n x_i^2 / n}{\sum_{i=1}^n (x_i - \bar x)^2},

which are also unbiased for var(\hat\beta_1) and var(\hat\beta_0), respectively. Their square roots are the standard errors of \hat\beta_1 and \hat\beta_0.
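Putting the pieces together, a sketch of how \hat\sigma^2 and the standard errors are computed from a fitted regression (the data are invented for illustration):

```python
import math

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.2, 2.8, 4.1, 4.9, 6.3, 6.8, 8.1, 8.7]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)

# OLS fit and residuals.
b1 = sum((yi - ybar) * (xi - xbar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# sigma^2 hat = SSR / (n - 2); standard errors from the slide's formulas.
sigma2_hat = sum(u * u for u in resid) / (n - 2)
se_b1 = math.sqrt(sigma2_hat / sxx)
se_b0 = math.sqrt(sigma2_hat * (sum(xi * xi for xi in x) / n) / sxx)
print(b1, se_b1, b0, se_b0)
```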

SLIDE 43

Example: Simple Wage Regressions

The estimated wage models based on Taiwan's 2010 male data (11,561 obs), with log(wage) as the dependent variable:

\widehat{\log(\text{wage})} = \underset{(0.0156)}{4.5929} + \underset{(0.0012)}{0.0494}\, \text{educ}, \quad R^2 = 0.133, \quad \hat\sigma = 0.3971,

\widehat{\log(\text{wage})} = \underset{(0.0073)}{5.1208} + \underset{(0.0003)}{0.0059}\, \text{exper}, \quad R^2 = 0.026, \quad \hat\sigma = 0.4208,

where the numbers in parentheses are the standard errors of \hat\beta_0 and \hat\beta_1. These results show that the parameter estimates are quite precise, as their standard errors are very small.
