SLIDE 1

Gov 2000: 9. Regression with Two Independent Variables

Matthew Blackwell

Fall 2016

SLIDE 2
  • 1. Why Add Variables to a Regression?
  • 2. Adding a Binary Covariate
  • 3. Adding a Continuous Covariate
  • 4. OLS Mechanics with Two Covariates
  • 5. OLS Assumptions with Two Covariates
  • 6. Omitted Variable Bias
  • 7. Goodness of Fit & Multicollinearity

SLIDE 3

Where are we? Where are we going?

[Figure: course progress along an axis labeled "Number of Covariates in Our Regressions" (2 to 10), with "Last Week" marked at the low end.]

SLIDE 4

Where are we? Where are we going?

[Figure: the same axis, "Number of Covariates in Our Regressions" (2 to 10), now marking "Last Week" and "This Week".]

SLIDE 5

Where are we? Where are we going?

[Figure: the same axis, "Number of Covariates in Our Regressions" (2 to 10), marking "Last Week", "This Week", and "Next Week".]

SLIDE 6

1/ Why Add Variables to a Regression?

SLIDE 7

SLIDE 8

Berkeley gender bias

  • Graduate admissions data from Berkeley, 1973
  • Acceptance rates:

    ▶ Men: 8442 applicants, 44% admission rate
    ▶ Women: 4321 applicants, 35% admission rate

  • Evidence of discrimination against women in admissions?
  • This is a marginal relationship.
  • What about the conditional relationship within departments?

SLIDE 9

Berkeley gender bias, II

  • Within departments:

               Men                  Women
    Dept   Applied  Admitted    Applied  Admitted
    A          825       62%        108       82%
    B          560       63%         25       68%
    C          325       37%        593       34%
    D          417       33%        375       35%
    E          191       28%        393       24%
    F          373        6%        341        7%

  • Within departments, women do somewhat better than men!
  • Women apply to more challenging departments.
  • Marginal relationship (admissions and gender) ≠ conditional relationship given a third variable (department).
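One can verify the reversal by recomputing both sets of rates from the table above. A minimal R sketch using only the numbers on this slide (the pooled rates cover just these six departments, so the women's pooled rate differs a bit from the campus-wide 35%):

men.app <- c(825, 560, 325, 417, 191, 373)
men.rate <- c(0.62, 0.63, 0.37, 0.33, 0.28, 0.06)
wom.app <- c(108, 25, 593, 375, 393, 341)
wom.rate <- c(0.82, 0.68, 0.34, 0.35, 0.24, 0.07)

## marginal (pooled) rates: weight each department by its applicant count
sum(men.app * men.rate) / sum(men.app)  ## about 0.45
sum(wom.app * wom.rate) / sum(wom.app)  ## about 0.30

## conditional rates: women do as well or better in most departments
rbind(men.rate, wom.rate)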

SLIDE 10

Simpson’s paradox

[Figure: scatterplot of Y against X with points in two groups, Z = 0 and Z = 1.]

  • Overall, a positive relationship between Y_i and X_i.

SLIDE 11

Simpson’s paradox

[Figure: the same scatterplot of Y against X, grouped by Z = 0 and Z = 1.]

  • Overall, a positive relationship between Y_i and X_i.
  • But within levels of Z_i, the opposite.
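To make the paradox concrete, here is a small simulated example (all numbers invented for illustration) in which the pooled slope is positive but the slope within each level of Z_i is negative:

set.seed(2138)
n <- 100
z <- rbinom(n, 1, 0.5)           ## binary group indicator
x <- rnorm(n, mean = 2 * z)      ## z = 1 shifts x to the right
y <- 3 * z - 0.5 * x + rnorm(n)  ## within groups, x is negatively related to y

coef(lm(y ~ x))      ## pooled slope: positive
coef(lm(y ~ x + z))  ## conditional on z, the slope is about -0.5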

SLIDE 12

Basic idea

  • Old goal: estimate the mean of Y as a function of some independent variable, X: E[Y_i | X_i].
  • For continuous X’s, we modeled the CEF/regression function with a line:

    Y_i = β_0 + β_1 X_i + u_i

  • New goal: estimate the relationship between two variables, Y_i and X_i, conditional on a third variable, Z_i:

    Y_i = β_0 + β_1 X_i + β_2 Z_i + u_i

  • The β’s are the population parameters we want to estimate.

SLIDE 13

Why control for another variable

  • Descriptive
    ▶ Get a sense for the relationships in the data.
    ▶ Conditional on the number of steps I’ve taken, do higher activity levels correlate with lower weight?
  • Predictive
    ▶ We can usually make better predictions about the dependent variable with more information on independent variables.
  • Causal
    ▶ Block potential confounding, which is when X doesn’t cause Y but only appears to because a third variable Z causally affects both of them.

SLIDE 14

Plan of attack

  • 1. Interpretation with a binary Z_i
  • 2. Interpretation with a continuous Z_i
  • 3. Mechanics of OLS with 2 covariates
  • 4. OLS assumptions with 2 covariates:
    ▶ Omitted variable bias
    ▶ Multicollinearity

SLIDE 15

What we won’t cover in lecture

  • 1. The OLS formulas for 2 covariates
  • 2. Proofs
  • 3. The second covariate being a function of the first: Z_i = X_i²
  • 4. Hypothesis tests/confidence intervals (almost exactly the same)

SLIDE 16

2/ Adding a Binary Covariate

SLIDE 17

Example

[Figure: Log GDP per capita (4-11) against Strength of Property Rights (2-10), with African and non-African countries marked separately.]

SLIDE 18

Basics

  • Ye olde model:

    E[Y_i | X_i] = α_0 + α_1 X_i

    ▶ (α_0, α_1) are the bivariate intercept/slope, e_i is the bivariate error.

  • Concern: AJR might be picking up an “African effect”:
    ▶ African countries might have low incomes and weak property rights.
  • Condition on a country being in Africa or not to remove this:

    E[Y_i | X_i, Z_i] = β_0 + β_1 X_i + β_2 Z_i

    ▶ Z_i = 1 indicates that i is an African country
    ▶ Z_i = 0 indicates that i is a non-African country
    ▶ Effects are now within Africa or within non-Africa, not between.

SLIDE 19

AJR model

ajr.mod <- lm(logpgp95 ~ avexpr + africa, data = ajr)
summary(ajr.mod)

##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   5.6556     0.3134   18.04   <2e-16 ***
## avexpr        0.4242     0.0397   10.68   <2e-16 ***
## africa       -0.8784     0.1471   -5.97    3e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.625 on 108 degrees of freedom
##   (52 observations deleted due to missingness)
## Multiple R-squared: 0.708, Adjusted R-squared: 0.702
## F-statistic: 131 on 2 and 108 DF, p-value: <2e-16

SLIDE 20

Two lines, one regression

  • How can we interpret this model?
  • Plug in the two possible values of Z_i and rearrange.
  • When Z_i = 0:

    Ŷ_i = β̂_0 + β̂_1 X_i + β̂_2 Z_i = β̂_0 + β̂_1 X_i + β̂_2 × 0 = β̂_0 + β̂_1 X_i

  • When Z_i = 1:

    Ŷ_i = β̂_0 + β̂_1 X_i + β̂_2 Z_i = β̂_0 + β̂_1 X_i + β̂_2 × 1 = (β̂_0 + β̂_2) + β̂_1 X_i

  • Two different intercepts, same slope. (A sketch recovering both lines follows.)
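The two implied lines can be read directly off the fitted model. A minimal sketch, reusing the ajr.mod object from the previous slide:

b <- coef(ajr.mod)

## non-African countries (africa = 0): intercept and slope
unname(c(b["(Intercept)"], b["avexpr"]))

## African countries (africa = 1): the intercept shifts by the africa
## coefficient; the slope is unchanged
unname(c(b["(Intercept)"] + b["africa"], b["avexpr"]))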

SLIDE 21

Interpretation of the coefficients

                                    Intercept for X_i    Slope for X_i
    Non-African country (Z_i = 0)   β̂_0                  β̂_1
    African country (Z_i = 1)       β̂_0 + β̂_2            β̂_1

  • In this example, we have:

    Ŷ_i = 5.656 + 0.424 × X_i − 0.878 × Z_i

  • β̂_0: the average log income for a non-African country (Z_i = 0) with property rights measured at 0 is 5.656.
  • β̂_1: a one-unit increase in property rights is associated with a 0.424 increase in average log income, comparing two African countries (or two non-African countries).
  • β̂_2: there is a −0.878 average difference in log income per capita between African and non-African countries, conditional on property rights.

SLIDE 22

General interpretation of the coefficients

    Ŷ_i = β̂_0 + β̂_1 X_i + β̂_2 Z_i

  • β̂_0: the average value of Y_i when both X_i and Z_i are equal to 0
  • β̂_1: a one-unit increase in X_i is associated with a β̂_1-unit change in Y_i for units with the same value of Z_i
  • β̂_2: the average difference in Y_i between the Z_i = 1 group and the Z_i = 0 group for units with the same value of X_i

SLIDE 23

Adding a binary variable, visually

[Figure: Log GDP per capita against Strength of Property Rights, with the fitted line for non-African countries; β̂_0 = 5.656, β̂_1 = 0.424.]

SLIDE 24

Adding a binary variable, visually

[Figure: the same plot with both fitted lines; the line for African countries is shifted down from β̂_0 to β̂_0 + β̂_2, with β̂_0 = 5.656, β̂_1 = 0.424, β̂_2 = −0.878.]

SLIDE 25

Marginal vs conditional

[Figure: Log GDP per capita against Strength of Property Rights, comparing the marginal fit with the conditional within-group fits.]

SLIDE 26

3/ Adding a Continuous Covariate

SLIDE 27

Adding a continuous variable

  • Ye olde model:

    E[Y_i | X_i] = α_0 + α_1 X_i

  • New concern: geography is confounding the effect:
    ▶ geography might affect political institutions
    ▶ geography might affect average incomes (through diseases like malaria)
  • Condition on Z_i, the mean temperature in country i (continuous):

    E[Y_i | X_i, Z_i] = β_0 + β_1 X_i + β_2 Z_i

SLIDE 28

AJR model, revisited

ajr.mod2 <- lm(logpgp95 ~ avexpr + meantemp, data = ajr)
summary(ajr.mod2)

##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   6.8063     0.7518    9.05  1.3e-12 ***
## avexpr        0.4057     0.0640    6.34  3.9e-08 ***
## meantemp     -0.0602     0.0194   -3.11    0.003 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.643 on 57 degrees of freedom
##   (103 observations deleted due to missingness)
## Multiple R-squared: 0.615, Adjusted R-squared: 0.602
## F-statistic: 45.6 on 2 and 57 DF, p-value: 1.48e-12

SLIDE 29

Interpretation with a continuous Z

    Z_i (mean temp.)   Intercept for X_i     Slope for X_i
    0 °C               β̂_0                   β̂_1
    21 °C              β̂_0 + β̂_2 × 21        β̂_1
    24 °C              β̂_0 + β̂_2 × 24        β̂_1
    26 °C              β̂_0 + β̂_2 × 26        β̂_1

  • In this example we have:

    Ŷ_i = 6.806 + 0.406 × X_i − 0.06 × Z_i

  • β̂_0: the average log income for a country with property rights measured at 0 and a mean temperature of 0 is 6.806.
  • β̂_1: a one-unit increase in property rights is associated with a 0.406 change in average log income, conditional on a country’s mean temperature.
  • β̂_2: a one-degree increase in mean temperature is associated with a −0.06 change in average log income, conditional on the strength of property rights.

SLIDE 30

General interpretation

    Ŷ_i = β̂_0 + β̂_1 X_i + β̂_2 Z_i

  • The coefficient β̂_1 measures how the predicted outcome varies in X_i for units with the same value of Z_i.
  • The coefficient β̂_2 measures how the predicted outcome varies in Z_i for units with the same value of X_i.
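One way to see the parallel-lines interpretation is to predict at several temperatures while holding property rights fixed. A sketch reusing ajr.mod2, with an arbitrarily chosen property-rights score of 7:

## predictions at four temperatures, avexpr held at 7
predict(ajr.mod2, newdata = data.frame(avexpr = 7, meantemp = c(0, 21, 24, 26)))
## each additional degree lowers the prediction by the same 0.06,
## whatever the value of avexpr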

SLIDE 31

4/ OLS Mechanics with Two Covariates

SLIDE 32

Fitted values and residuals

  • Where do we get our hats, β̂_0, β̂_1, β̂_2?
  • Fitted values for i = 1, …, n:

    Ŷ_i = Ê[Y_i | X_i, Z_i] = β̂_0 + β̂_1 X_i + β̂_2 Z_i

  • Residuals for i = 1, …, n:

    û_i = Y_i − Ŷ_i

  • Minimize the sum of the squared residuals, just like before:

    (β̂_0, β̂_1, β̂_2) = argmin_{b_0, b_1, b_2} Σ_{i=1}^{n} (Y_i − b_0 − b_1 X_i − b_2 Z_i)²

  • We’ll derive closed-form estimators with arbitrary covariates next week.
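Since the closed-form formulas wait until next week, we can check the definition directly by minimizing the sum of squared residuals numerically. A sketch, assuming the ajr data from the running example:

## drop rows with missing values so optim() and lm() use the same sample
dat <- na.omit(ajr[, c("logpgp95", "avexpr", "meantemp")])

ssr <- function(b) {
  sum((dat$logpgp95 - b[1] - b[2] * dat$avexpr - b[3] * dat$meantemp)^2)
}

optim(c(0, 0, 0), ssr)$par                          ## numerical minimizer
coef(lm(logpgp95 ~ avexpr + meantemp, data = dat))  ## matches up to optimizer tolerance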

SLIDE 33

OLS estimator recipe using two steps

  • No explicit OLS formulas this week, but a recipe instead.
  • The “partialling out” OLS recipe:
  • 1. Run a regression of X_i on Z_i:

    X̂_i = Ê[X_i | Z_i] = δ̂_0 + δ̂_1 Z_i

  • 2. Calculate the residuals from this regression:

    r̂_i = X_i − X̂_i

  • 3. Run a simple regression of Y_i on the residuals, r̂_i:

    Ŷ_i = α̂_0 + α̂_1 r̂_i

  • The estimate α̂_1 will be equivalent to β̂_1 from the “big” regression:

    Ŷ_i = β̂_0 + β̂_1 X_i + β̂_2 Z_i

SLIDE 34

First regression

  • Regress X_i on Z_i:

## when missing data exists, we need the na.action in order
## to place residuals or fitted values back into the data
ajr.first <- lm(avexpr ~ meantemp, data = ajr, na.action = na.exclude)
summary(ajr.first)

##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   9.9568     0.8202    12.1  < 2e-16 ***
## meantemp     -0.1490     0.0347    -4.3 0.000067 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.32 on 58 degrees of freedom
##   (103 observations deleted due to missingness)
## Multiple R-squared: 0.241, Adjusted R-squared: 0.228
## F-statistic: 18.4 on 1 and 58 DF, p-value: 0.0000673

SLIDE 35

Regression of log income on the residuals

  • Save the residuals:

## store the residuals
ajr$avexpr.res <- residuals(ajr.first)

  • Now we compare the estimated slopes:

coef(lm(logpgp95 ~ avexpr.res, data = ajr))

## (Intercept)  avexpr.res
##      8.0543      0.4057

coef(lm(logpgp95 ~ avexpr + meantemp, data = ajr))

## (Intercept)    avexpr  meantemp
##     6.80627   0.40568  -0.06025

SLIDE 36

Residual/partial regression plot

  • We can plot the conditional relationship between property rights and income given temperature:

[Figure: Log GDP per capita (6-10) plotted against the residuals from Property Rights ~ Mean Temperature (−3 to 3).]

SLIDE 37

5/ OLS Assumptions with Two Covariates

SLIDE 38

OLS assumptions for unbiasedness

  • The simple regression assumptions for unbiasedness/consistency of OLS:
  • 1. Linearity
  • 2. Random/iid sample
  • 3. Variation in X_i
  • 4. Zero conditional mean error: E[u_i | X_i] = 0

  • Small modifications to these with 2 covariates:
  • 1. Linearity:

    Y_i = β_0 + β_1 X_i + β_2 Z_i + u_i

  • 2. Random/iid sample
  • 3. No perfect collinearity
  • 4. Zero conditional mean error (both X_i and Z_i unrelated to u_i):

    E[u_i | X_i, Z_i] = 0

SLIDE 39

New assumption

Assumption 3: No perfect collinearity
(1) No independent variable is constant in the sample, and (2) there are no exact linear relationships among the independent variables.

  • Two components:
  • 1. Both X_i and Z_i have to vary.
  • 2. Z_i cannot be a deterministic, linear function of X_i.
  • Part 2 rules out anything of the form:

    Z_i = a + b X_i

  • What’s the correlation between Z_i and X_i in that case? Exactly 1 (or −1 if b < 0)!

SLIDE 40

Perfect collinearity example

  • Simple example:
    ▶ X_i = 1 if a country is not in Africa and 0 otherwise.
    ▶ Z_i = 1 if a country is in Africa and 0 otherwise.
  • But clearly we have the following:

    Z_i = 1 − X_i

  • These two variables are perfectly collinear.
  • What about the following?
    ▶ X_i = property rights
    ▶ Z_i = X_i²
  • Do we have to worry about collinearity here?
  • No! While Z_i is a deterministic function of X_i, it is a nonlinear function of X_i, so the assumption is not violated (see the check below).
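Indeed, R happily fits a nonlinear transformation alongside the original variable. A one-line check with the ajr data:

## avexpr^2 is a deterministic but nonlinear function of avexpr, so
## nothing gets dropped
coef(lm(logpgp95 ~ avexpr + I(avexpr^2), data = ajr))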

SLIDE 41

R and perfect collinearity

  • R, Stata, et al. will drop one of the variables if there is perfect collinearity:

ajr$nonafrica <- 1 - ajr$africa
summary(lm(logpgp95 ~ africa + nonafrica, data = ajr))

##
## Coefficients: (1 not defined because of singularities)
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   8.7164     0.0899   96.94  < 2e-16 ***
## africa       -1.3612     0.1631   -8.35  4.9e-14 ***
## nonafrica         NA         NA      NA       NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.913 on 146 degrees of freedom
##   (15 observations deleted due to missingness)
## Multiple R-squared: 0.323, Adjusted R-squared: 0.318
## F-statistic: 69.7 on 1 and 146 DF, p-value: 4.87e-14

SLIDE 42

6/ Omitted Variable Bias

SLIDE 43

Unbiasedness revisited

  • Long regression:

    Y_i = β_0 + β_1 X_i + β_2 Z_i + u_i

  • Assumptions 1-4 ⇒ OLS is unbiased for β_0, β_1, β_2.
  • What happens if we ignore Z_i and just run the simple linear regression on X_i alone?
  • Short regression:

    Y_i = α_0 + α_1 X_i + u*_i

  • OLS estimates from the short regression: (α̂_0, α̂_1)
  • Question: will E[α̂_1] = β_1? If not, what will be the difference?

SLIDE 44

Deriving the short regression

  • How can we relate α_1 to β_1?
    ▶ The short regression will be unbiased for the CEF of Y_i given just X_i.
  • Write the “short CEF” in terms of the “long” regression model:

    E[Y_i | X_i] = E[β_0 + β_1 X_i + β_2 Z_i + u_i | X_i]
                 = β_0 + β_1 X_i + β_2 E[Z_i | X_i] + E[u_i | X_i]

  • By assumption 4, X_i is unrelated to the long-regression error, so E[u_i | X_i] = 0:

    E[Y_i | X_i] = β_0 + β_1 X_i + β_2 E[Z_i | X_i]

SLIDE 45

Deriving the short regression

    E[Y_i | X_i] = β_0 + β_1 X_i + β_2 E[Z_i | X_i]

  • Let E[Z_i | X_i] = δ_0 + δ_1 X_i be the (population) CEF from a regression of Z_i on X_i.
  • Then we can write the short CEF as:

    E[Y_i | X_i] = β_0 + β_1 X_i + β_2 (δ_0 + δ_1 X_i)
                 = (β_0 + β_2 δ_0) + (β_1 + β_2 δ_1) X_i
                 = α_0 + α_1 X_i

  • Under these assumptions, short-regression OLS is unbiased for α_1:

    E[α̂_1] = α_1 = β_1 + β_2 δ_1

SLIDE 46

Omitted variable bias

  • Omitted variable bias: the bias in the short-regression slope, relative to the long-regression coefficient β_1, from omitting Z_i:

    Bias(α̂_1) = E[α̂_1] − β_1 = β_2 δ_1

  • In other words, omitted variable bias is:

    (“effect” of Z_i on Y_i) × (“effect” of X_i on Z_i)
    (omitted → outcome) × (included → omitted)
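In-sample, this decomposition holds exactly for the OLS estimates. A sketch verifying α̂_1 = β̂_1 + β̂_2 δ̂_1 on a common estimation sample, using the ajr data again:

dat <- na.omit(ajr[, c("logpgp95", "avexpr", "meantemp")])
short <- coef(lm(logpgp95 ~ avexpr, data = dat))             ## alpha-hats
long  <- coef(lm(logpgp95 ~ avexpr + meantemp, data = dat))  ## beta-hats
aux   <- coef(lm(meantemp ~ avexpr, data = dat))             ## delta-hats

short["avexpr"]                                    ## alpha_1-hat
long["avexpr"] + long["meantemp"] * aux["avexpr"]  ## identical, by construction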

SLIDE 47

Omitted variable bias, summary

  • Remember that, by OLS, the “effect” of X_i on Z_i is:

    δ_1 = cov(Z_i, X_i) / var(X_i)

  • We can summarize the direction of the bias like so:

               cov(X_i, Z_i) > 0   cov(X_i, Z_i) < 0   cov(X_i, Z_i) = 0
    β_2 > 0    Positive bias       Negative bias       No bias
    β_2 < 0    Negative bias       Positive bias       No bias
    β_2 = 0    No bias             No bias             No bias

  • Very relevant if Z_i is unobserved for some reason!

SLIDE 48

Including irrelevant variables

  • What if we do the opposite and include an irrelevant variable?
  • What would it mean for Z_i to be an irrelevant variable?

    Y_i = β_0 + β_1 X_i + 0 × Z_i + u_i

  • So in this case the true value of β_2 = 0. But under Assumptions 1-4, OLS is unbiased for all the parameters:

    E[β̂_0] = β_0,  E[β̂_1] = β_1,  E[β̂_2] = 0

  • Including an irrelevant variable will, however, increase the standard errors for β̂_1 when Z_i is correlated with X_i, as the simulation below illustrates.
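A small simulation (all numbers invented) showing that the estimate stays unbiased while its standard error grows:

set.seed(2138)
n <- 1000
x <- rnorm(n)
z <- x + rnorm(n)          ## correlated with x, but irrelevant for y
y <- 1 + 2 * x + rnorm(n)  ## the true coefficient on z is 0

summary(lm(y ~ x))$coef["x", c("Estimate", "Std. Error")]
summary(lm(y ~ x + z))$coef["x", c("Estimate", "Std. Error")]
## both estimates are near the true 2; the second standard error is larger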

SLIDE 49

7/ Goodness of Fit & Multicollinearity

SLIDE 50

Prediction error

  • How do we judge how well a regression fits the data?
  • How much does X_i help us predict Y_i?
  • Prediction errors without X_i:
    ▶ The best prediction is the mean, Ȳ.
    ▶ The prediction error is called the total sum of squares (SS_tot):

    SS_tot = Σ_{i=1}^{n} (Y_i − Ȳ)²

  • Prediction errors with X_i:
    ▶ The best predictions are the fitted values, Ŷ_i.
    ▶ The prediction error is the sum of the squared residuals (SS_res):

    SS_res = Σ_{i=1}^{n} (Y_i − Ŷ_i)²

SLIDE 51

Total SS vs. SSR

[Figure: total prediction errors, the vertical deviations of each Y_i from the mean Ȳ, on the Log GDP per capita vs. Strength of Property Rights scatterplot.]

SLIDE 52

Total SS vs. SSR

[Figure: residuals, the vertical deviations of each Y_i from its fitted value Ŷ_i, on the same scatterplot.]

SLIDE 53

R-squared

  • Regression will always improve the in-sample fit: SS_tot ≥ SS_res.
  • How much better does using X_i do? The coefficient of determination, or R²:

    R² = (SS_tot − SS_res) / SS_tot = 1 − SS_res / SS_tot

  • R² is the fraction of the total prediction error eliminated by conditioning on X_i.
  • Common interpretation: R² is the fraction of the variation in Y_i that is “explained by” X_i.
    ▶ R² = 0 means no linear relationship
    ▶ R² = 1 implies a perfect linear fit
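A sketch computing R² from its definition and checking it against what summary() reports, using ajr.mod2 from earlier:

y.used <- model.frame(ajr.mod2)$logpgp95  ## outcome on the estimation sample
ss.tot <- sum((y.used - mean(y.used))^2)
ss.res <- sum(residuals(ajr.mod2)^2)

1 - ss.res / ss.tot          ## by hand
summary(ajr.mod2)$r.squared  ## as reported: about 0.615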

SLIDE 54

Sampling variance for bivariate regression

  • Under simple linear regression and homoskedasticity, the sampling variance of the slope was:

    V[β̂_1 | X] = σ²_u / Σ_{i=1}^{n} (X_i − X̄)² = σ²_u / ((n − 1) s²_X)

  • Factors affecting the standard errors:
    ▶ The error variance σ²_u (a higher conditional variance of Y_i leads to bigger SEs)
    ▶ The sample variance of X_i, s²_X (lower variation in X_i leads to bigger SEs)
    ▶ The sample size n (a higher sample size leads to lower SEs)

SLIDE 55

Sampling variation with 2 covariates

  • Regression with an additional independent variable:

    V[β̂_1 | X_i, Z_i] = σ²_u / ((1 − R²_1)(n − 1) s²_X)

  • Here, R²_1 is the R² from the regression of X_i on Z_i:

    X̂_i = δ̂_0 + δ̂_1 Z_i

  • Factors now affecting the standard errors:
    ▶ The error variance: σ²_u
    ▶ The sample variance of X_i: s²_X
    ▶ The sample size n
    ▶ The strength of the (linear) relationship between X_i and Z_i (stronger relationships mean a higher R²_1 and thus bigger SEs)
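The inflation factor 1/(1 − R²_1) can be computed directly. A sketch for the running example, where Z_i is mean temperature:

r2.1 <- summary(lm(avexpr ~ meantemp, data = ajr))$r.squared
r2.1            ## about 0.24 on this sample
1 / (1 - r2.1)  ## the variance inflation factor for the avexpr slope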

SLIDE 56

Multicollinearity

Definition
Multicollinearity is defined to be high, but not perfect, correlation between two independent variables in a regression.

  • With multicollinearity, we’ll have R²_1 ≈ 1, but not exactly.
  • The stronger the relationship between X_i and Z_i, the closer R²_1 will be to 1, and the higher the SEs will be:

    V[β̂_1 | X_i, Z_i] = σ²_u / ((1 − R²_1)(n − 1) s²_X)

  • Given the symmetry, it will also increase var(β̂_2).

SLIDE 57

Intuition for multicollinearity

  • Remember the OLS recipe:
    ▶ r̂_i are the residuals from the regression of X_i on Z_i
    ▶ β̂_1 comes from the regression of Y_i on r̂_i
  • Estimated coefficient:

    β̂_1 = cov(r̂_i, Y_i) / var(r̂_i)   (sample covariance over sample variance)

  • When Z_i and X_i have a strong relationship, the residuals will have low variation.
  • We explain away a lot of the variation in X_i through Z_i.

SLIDE 58

Multicollinearity, visualized

[Figure: scatterplots of X against Z under a weak X-Z relationship and a strong X-Z relationship.]

SLIDE 59

Multicollinearity, visualized

[Figure: Y plotted against the residuals from X ~ Z, for the weak and strong X-Z cases; under the strong relationship the residuals have much less variation.]

SLIDE 60

Multicollinearity, visualized

[Figure: Y against the residuals from X ~ Z, weak vs. strong X-Z relationship, as on the previous slide.]

SLIDE 61

Effects of multicollinearity

  • No effect on the bias of OLS.
  • Only increases the standard errors.
  • Really just a sample size problem:
    ▶ If X_i and Z_i are extremely highly correlated, you’re going to need a much bigger sample to accurately differentiate between their effects.

SLIDE 62

Conclusion

  • In this brave new world with 2 independent variables:
  • 1. The β’s have slightly different interpretations.
  • 2. OLS still minimizes the sum of the squared residuals.
  • 3. Adding or omitting variables in a regression can affect the bias and the variance of OLS.
  • Remainder of class:
  • 1. Regression in its most general glory (matrices)
  • 2. How to diagnose and fix violations of the OLS assumptions