Gov 2000: 9. Regression with Two Independent Variables
Matthew Blackwell
Fall 2016
1. Why Add Variables to a Regression?
2. Adding a Binary Covariate
3. Adding a Continuous Covariate
4. OLS Mechanics with Two Covariates
5. OLS
[Figure: number of covariates in our regressions (axis from 2 to 10), marking last week, this week, and next week.]
▶ Men: 8442 applicants, 44% admission rate
▶ Women: 4321 applicants, 35% admission rate
        Men                  Women
Dept    Applied  Admitted    Applied  Admitted
A       825      62%         108      82%
B       560      63%         25       68%
C       325      37%         593      34%
D       417      33%         375      35%
E       191      28%         393      24%
F       373      6%          341      7%
relationship given third variable (department).
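The reversal in the table can be checked directly. The following is a minimal sketch that retypes the department-level figures from the table; the data frame and variable names are illustrative, not from the slides. The six departments listed account for only part of the applicant totals quoted above, so the pooled rates computed here are for these six departments only and need not match the 44% and 35% figures.

```r
# Department-level figures retyped from the table above.
admissions <- data.frame(
  dept          = c("A", "B", "C", "D", "E", "F"),
  men_applied   = c(825, 560, 325, 417, 191, 373),
  men_rate      = c(0.62, 0.63, 0.37, 0.33, 0.28, 0.06),
  women_applied = c(108, 25, 593, 375, 393, 341),
  women_rate    = c(0.82, 0.68, 0.34, 0.35, 0.24, 0.07)
)

# Pooled admission rate = applicant-weighted average of the department rates.
men_pooled   <- with(admissions, weighted.mean(men_rate, men_applied))
women_pooled <- with(admissions, weighted.mean(women_rate, women_applied))

# Number of departments in which women are admitted at a higher rate than men.
n_women_higher <- sum(admissions$women_rate > admissions$men_rate)

round(c(men = men_pooled, women = women_pooled), 3)
n_women_higher
```

Pooling over departments favors men, while the within-department comparison favors women in most departments, because women applied disproportionately to departments with low admission rates.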
[Figure: scatterplot of Y against X, with points marked by Z = 0 and Z = 1.]
independent variable, $X$: $\mathbb{E}[Y_i \mid X_i]$.
with a line: $Y_i = \beta_0 + \beta_1 X_i + u_i$
$X_i$, conditional on a third variable, $Z_i$: $Y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i + u_i$
▶ Get a sense for the relationships in the data.
▶ Conditional on the number of steps I've taken, do higher activity levels correlate with less weight?
▶ We can usually make better predictions about the dependent variable with more information on independent variables.
▶ Block potential confounding, which is when $X$ doesn't cause $Y$, but only appears to because a third variable $Z$ causally affects both of them.
▶ Omitted variable bias
▶ Multicollinearity
[Figure: log GDP per capita against strength of property rights, with African and non-African countries marked separately.]
$\mathbb{E}[Y_i \mid X_i] = \alpha_0 + \alpha_1 X_i$
▶ $(\alpha_0, \alpha_1)$ are the bivariate intercept/slope, $e_i$ is the bivariate error.
▶ African countries might have low incomes and weak property rights.
$\mathbb{E}[Y_i \mid X_i, Z_i] = \beta_0 + \beta_1 X_i + \beta_2 Z_i$
▶ $Z_i = 1$ to indicate that $i$ is an African country
▶ $Z_i = 0$ to indicate that $i$ is a non-African country
▶ Effects are now within Africa or within non-Africa, not between
ajr.mod <- lm(logpgp95 ~ avexpr + africa, data = ajr)
summary(ajr.mod)
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   5.6556     0.3134   18.04   <2e-16 ***
## avexpr        0.4242     0.0397   10.68   <2e-16 ***
## africa                   0.1471            3e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.625 on 108 degrees of freedom
##   (52 observations deleted due to missingness)
## Multiple R-squared:  0.708, Adjusted R-squared:  0.702
## F-statistic:  131 on 2 and 108 DF,  p-value: <2e-16
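Non-African countries ($Z_i = 0$):
$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{\beta}_2 Z_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{\beta}_2 \times 0 = \hat{\beta}_0 + \hat{\beta}_1 X_i$
African countries ($Z_i = 1$):
$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{\beta}_2 Z_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{\beta}_2 \times 1 = (\hat{\beta}_0 + \hat{\beta}_2) + \hat{\beta}_1 X_i$
                                  Intercept for $X_i$                Slope for $X_i$
Non-African country ($Z_i = 0$)   $\hat{\beta}_0$                    $\hat{\beta}_1$
African country ($Z_i = 1$)       $\hat{\beta}_0 + \hat{\beta}_2$    $\hat{\beta}_1$
$\hat{Y}_i = 5.656 + 0.424 \times X_i - 0.878 \times Z_i$
$\hat{\beta}_0$: average log income for a non-African country ($Z_i = 0$) with property rights measured at 0 is 5.656
$\hat{\beta}_1$: a one-unit increase in property rights is associated with a 0.424 increase in average log incomes when comparing two African countries (or two non-African countries)
$\hat{\beta}_2$: there is a $-0.878$ average difference in log income per capita between African and non-African countries conditional on property rights
In general:
$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{\beta}_2 Z_i$
$\hat{\beta}_0$: average value of $Y_i$ when both $X_i$ and $Z_i$ are equal to 0
$\hat{\beta}_1$: a 1-unit increase in $X_i$ is associated with a $\hat{\beta}_1$-unit change in $Y_i$ for units with the same value of $Z_i$
$\hat{\beta}_2$: average difference in $Y_i$ between the $Z_i = 1$ group and the $Z_i = 0$ group for units with the same value of $X_i$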
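As a small worked illustration of the two parallel lines, the sketch below plugs the rounded estimates from the fitted equation into a prediction function. The helper `predict_loginc` and the grid of property-rights scores are hypothetical, chosen only for illustration.

```r
# Rounded coefficient estimates quoted in the fitted equation above.
b0 <- 5.656   # intercept
b1 <- 0.424   # slope on property rights (avexpr)
b2 <- -0.878  # shift for African countries (africa = 1)

# Hypothetical helper: predicted log GDP per capita for given covariates.
predict_loginc <- function(avexpr, africa) {
  b0 + b1 * avexpr + b2 * africa
}

avexpr_grid <- c(4, 6, 8)  # illustrative property-rights scores
non_africa  <- predict_loginc(avexpr_grid, africa = 0)
africa      <- predict_loginc(avexpr_grid, africa = 1)

# The gap between the groups is b2 at every value of avexpr ...
africa - non_africa
# ... and the slope within a group is b1.
diff(non_africa) / diff(avexpr_grid)
```

Because the model has no interaction term, the binary covariate only shifts the intercept; the slope on property rights is forced to be the same in both groups.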
[Figure: log GDP per capita against strength of property rights, with the fitted line for non-African countries: intercept $\hat{\beta}_0 = 5.656$, slope $\hat{\beta}_1 = 0.424$.]
[Figure: same plot adding the parallel line for African countries, with intercept $\hat{\beta}_0 + \hat{\beta}_2$ and $\hat{\beta}_2 = -0.878$.]
$\mathbb{E}[Y_i \mid X_i] = \alpha_0 + \alpha_1 X_i$
▶ geography might affect political institutions
▶ geography might affect average incomes (through diseases like malaria)
$\mathbb{E}[Y_i \mid X_i, Z_i] = \beta_0 + \beta_1 X_i + \beta_2 Z_i$
ajr.mod2 <- lm(logpgp95 ~ avexpr + meantemp, data = ajr)
summary(ajr.mod2)
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   6.8063     0.7518    9.05  1.3e-12 ***
## avexpr        0.4057     0.0640    6.34  3.9e-08 ***
## meantemp                 0.0194            0.003 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.643 on 57 degrees of freedom
##   (103 observations deleted due to missingness)
## Multiple R-squared:  0.615, Adjusted R-squared:  0.602
## F-statistic: 45.6 on 2 and 57 DF,  p-value: 1.48e-12
                 Intercept for $X_i$                          Slope for $X_i$
$Z_i = 0$ °C     $\hat{\beta}_0$                              $\hat{\beta}_1$
$Z_i = 21$ °C    $\hat{\beta}_0 + \hat{\beta}_2 \times 21$    $\hat{\beta}_1$
$Z_i = 24$ °C    $\hat{\beta}_0 + \hat{\beta}_2 \times 24$    $\hat{\beta}_1$
$Z_i = 26$ °C    $\hat{\beta}_0 + \hat{\beta}_2 \times 26$    $\hat{\beta}_1$
$\hat{Y}_i = 6.806 + 0.406 \times X_i - 0.06 \times Z_i$
$\hat{\beta}_0$: average log income for a country with property rights measured at 0 and a mean temperature of 0 is 6.806
$\hat{\beta}_1$: a one-unit increase in property rights is associated with a 0.406 change in average log incomes conditional on a country's mean temperature
$\hat{\beta}_2$: a one-degree increase in mean temperature is associated with a $-0.06$ change in average log incomes conditional on strength of property rights
$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{\beta}_2 Z_i$
$\hat{\beta}_1$ measures how the predicted outcome varies in $X_i$ for units with the same value of $Z_i$.
$\hat{\beta}_2$ measures how the predicted outcome varies in $Z_i$ for units with the same value of $X_i$.
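The intercept table for the continuous covariate can be filled in numerically. This is a sketch using the rounded estimates from the fitted equation; the variable names are illustrative.

```r
# Rounded estimates from the fitted equation with mean temperature.
b0 <- 6.806   # intercept
b1 <- 0.406   # slope on property rights
b2 <- -0.06   # slope on mean temperature

temps <- c(0, 21, 24, 26)        # temperatures (°C) used in the table
intercepts <- b0 + b2 * temps    # intercept of the X-line at each temperature

# One line per temperature: different intercepts, one common slope.
data.frame(meantemp = temps, intercept = intercepts, slope = b1)
```

Each temperature gives its own line in $X_i$, shifted down by 0.06 per degree, with the slope on property rights unchanged.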
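$\hat{\beta}_0, \hat{\beta}_1, \hat{\beta}_2$
Fitted values: $\hat{Y}_i = \hat{\mathbb{E}}[Y_i \mid X_i, Z_i] = \hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{\beta}_2 Z_i$
Residuals: $\hat{u}_i = Y_i - \hat{Y}_i$
$(\hat{\beta}_0, \hat{\beta}_1, \hat{\beta}_2) = \arg\min_{b_0, b_1, b_2} \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i - b_2 Z_i)^2$
next week.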
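To make the minimization concrete, here is a sketch on simulated data (the data-generating values are arbitrary assumptions, not from the slides). It compares the coefficients from `lm()` with a generic numerical minimizer applied to the sum of squared residuals.

```r
set.seed(2000)
n <- 200
z <- rnorm(n)
x <- 0.5 * z + rnorm(n)
y <- 1 + 2 * x - 1 * z + rnorm(n)   # arbitrary "true" coefficients

# Sum of squared residuals for candidate coefficients b = (b0, b1, b2).
ssr <- function(b) sum((y - b[1] - b[2] * x - b[3] * z)^2)

# OLS coefficients from lm().
b_ols <- unname(coef(lm(y ~ x + z)))

# Numerical minimization of the same objective from an arbitrary start.
b_num <- optim(c(0, 0, 0), ssr, method = "BFGS")$par

round(rbind(lm = b_ols, optim = b_num), 3)
```

The two rows should agree up to numerical tolerance, and moving any coefficient away from the `lm()` solution increases `ssr()`.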
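$\hat{X}_i = \hat{\mathbb{E}}[X_i \mid Z_i] = \hat{\delta}_0 + \hat{\delta}_1 Z_i$
$\hat{r}_{xz,i} = X_i - \hat{X}_i$
$\hat{r}_{xz,i}$: $\hat{Y}_i = \hat{\alpha}_0 + \hat{\alpha}_1 \hat{r}_{xz,i}$
$\hat{\alpha}_1$ will be equivalent to $\hat{\beta}_1$ from the "big" regression: $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{\beta}_2 Z_i$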
## when missing data exists, need the na.action in order
## to place residuals or fitted values back into the data
ajr.first <- lm(avexpr ~ meantemp, data = ajr, na.action = na.exclude)
summary(ajr.first)
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   9.9568     0.8202    12.1  < 2e-16 ***
## meantemp                 0.0347
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.32 on 58 degrees of freedom
##   (103 observations deleted due to missingness)
## Multiple R-squared:  0.241, Adjusted R-squared:  0.228
## F-statistic: 18.4 on 1 and 58 DF,  p-value: 0.0000673
## store the residuals
ajr$avexpr.res <- residuals(ajr.first)
coef(lm(logpgp95 ~ avexpr.res, data = ajr))
## (Intercept)  avexpr.res
##      8.0543      0.4057
coef(lm(logpgp95 ~ avexpr + meantemp, data = ajr))
## (Intercept)      avexpr    meantemp
##     6.80627     0.40568
and income given temperature:
[Figure: log GDP per capita against Residuals(Property Rights ~ Mean Temperature).]
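The residual-regression result does not depend on the particular data set. The sketch below repeats the two-step procedure on simulated data with no missing values (all names and data-generating values are assumptions made for illustration).

```r
set.seed(9)
n <- 500
z <- rnorm(n)
x <- 1 - 0.8 * z + rnorm(n)             # x is correlated with z
y <- 2 + 0.4 * x - 0.5 * z + rnorm(n)

# Step 1: residuals from regressing x on z (the part of x not explained by z).
r_xz <- residuals(lm(x ~ z))

# Step 2: regress y on those residuals.
slope_resid <- unname(coef(lm(y ~ r_xz))["r_xz"])

# Compare with the slope on x from the regression with both covariates.
slope_long <- unname(coef(lm(y ~ x + z))["x"])

all.equal(slope_resid, slope_long)
```

The two slopes agree exactly (up to floating-point error); only the intercepts differ, as in the `ajr` output above.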
OLS:
$Y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i + u_i$
$\mathbb{E}[u_i \mid X_i, Z_i] = 0$
Assumption 3: No perfect collinearity
(1) No independent variable is constant in the sample and (2) there are no exactly linear relationships among the independent variables.
$Z_i = a + b X_i$
▶ $X_i = 1$ if a country is not in Africa and 0 otherwise.
▶ $Z_i = 1$ if a country is in Africa and 0 otherwise.
$Z_i = 1 - X_i$
▶ $X_i$ = property rights
▶ $Z_i = X_i^2$
nonlinear function of $X_i$.
collinearity:
ajr$nonafrica <- 1 - ajr$africa
summary(lm(logpgp95 ~ africa + nonafrica, data = ajr))
##
## Coefficients: (1 not defined because of singularities)
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   8.7164     0.0899   96.94  < 2e-16 ***
## africa                   0.1631          4.9e-14 ***
## nonafrica         NA         NA      NA       NA
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.913 on 146 degrees of freedom
##   (15 observations deleted due to missingness)
## Multiple R-squared:  0.323, Adjusted R-squared:  0.318
## F-statistic: 69.7 on 1 and 146 DF,  p-value: 4.87e-14
$Y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i + u_i$
linear regression with just $X_i$?
$Y_i = \alpha_0 + \alpha_1 X_i + u_i^*$
$(\hat{\alpha}_0, \hat{\alpha}_1)$
$\mathbb{E}[\hat{\alpha}_1] = \beta_1$? If not, what will be the difference?
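▶ Short regression will be unbiased for CEF of $Y_i$ just given $X_i$.
$\mathbb{E}[Y_i \mid X_i] = \mathbb{E}[\beta_0 + \beta_1 X_i + \beta_2 Z_i + u_i \mid X_i] = \beta_0 + \beta_1 X_i + \beta_2 \mathbb{E}[Z_i \mid X_i] + \mathbb{E}[u_i \mid X_i]$
so $\mathbb{E}[u_i \mid X_i] = 0$.
$\mathbb{E}[Y_i \mid X_i] = \beta_0 + \beta_1 X_i + \beta_2 \mathbb{E}[Z_i \mid X_i]$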
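$\mathbb{E}[Y_i \mid X_i] = \beta_0 + \beta_1 X_i + \beta_2 \mathbb{E}[Z_i \mid X_i]$
regression of $Z_i$ on $X_i$: $\mathbb{E}[Z_i \mid X_i] = \delta_0 + \delta_1 X_i$.
$\mathbb{E}[Y_i \mid X_i] = \beta_0 + \beta_1 X_i + \beta_2(\delta_0 + \delta_1 X_i) = (\beta_0 + \beta_2 \delta_0) + (\beta_1 + \beta_2 \delta_1) X_i = \alpha_0 + \alpha_1 X_i$
$\hat{\alpha}_1$: $\mathbb{E}[\hat{\alpha}_1] = \alpha_1 = \beta_1 + \beta_2 \delta_1$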
$\text{Bias}(\hat{\alpha}_1) = \mathbb{E}[\hat{\alpha}_1] - \beta_1 = \beta_2 \delta_1$
("effect" of $Z_i$ on $Y_i$) × ("effect" of $X_i$ on $Z_i$)
(omitted → outcome) × (included → omitted)
$\delta_1 = \frac{\text{cov}(Z_i, X_i)}{\text{var}(X_i)}$
                 cov($X_i$, $Z_i$) > 0    cov($X_i$, $Z_i$) < 0    cov($X_i$, $Z_i$) = 0
$\beta_2 > 0$    Positive bias            Negative bias            No bias
$\beta_2 < 0$    Negative bias            Positive bias            No bias
$\beta_2 = 0$    No bias                  No bias                  No bias
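The bias formula has an exact in-sample counterpart: the short-regression slope equals the long-regression slope plus the coefficient on the omitted variable times the slope from regressing the omitted variable on the included one. The sketch below checks this on simulated data; the data-generating values are arbitrary assumptions, chosen so that both $\beta_2$ and cov($X_i$, $Z_i$) are positive.

```r
set.seed(62)
n <- 1000
x <- rnorm(n)
z <- 0.6 * x + rnorm(n)              # cov(x, z) > 0
y <- 1 + 0.5 * x + 2 * z + rnorm(n)  # coefficient on z is positive

long  <- coef(lm(y ~ x + z))  # regression with both covariates
short <- coef(lm(y ~ x))      # regression omitting z
aux   <- coef(lm(z ~ x))      # regression of omitted on included

# Long slope plus (coefficient on z) times (slope of z on x).
implied <- unname(long["x"] + long["z"] * aux["x"])

c(short = unname(short["x"]), implied = implied, long = unname(long["x"]))
```

With these signs the short-regression slope sits above the long-regression slope, matching the "Positive bias" cell of the table.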
$Y_i = \beta_0 + \beta_1 X_i + 0 \times Z_i + u_i$
Assumptions 1-4, OLS is unbiased for all the parameters:
$\mathbb{E}[\hat{\beta}_0] = \beta_0$
$\mathbb{E}[\hat{\beta}_1] = \beta_1$
$\mathbb{E}[\hat{\beta}_2] = 0$
errors for $\hat{\beta}_1$.
▶ Best prediction is the mean, $\bar{Y}$
▶ The prediction error, called the total sum of squares ($SS_{tot}$), would be:
$SS_{tot} = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$
▶ Best predictions are the fitted values, $\hat{Y}_i$.
▶ Prediction error is the sum of the squared residuals or $SS_{res}$:
$SS_{res} = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$
[Figure: "Total Prediction Errors". Log GDP per capita against strength of property rights.]
[Figure: "Residuals". Log GDP per capita against strength of property rights.]
determination or $R^2$: $R^2 = \frac{SS_{tot} - SS_{res}}{SS_{tot}} = 1 - \frac{SS_{res}}{SS_{tot}}$
conditioning on $X_i$.
$Y_i$ is "explained by" $X_i$.
▶ $R^2 = 0$ means no linear relationship
▶ $R^2 = 1$ implies perfect linear fit
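The two sums of squares can be computed by hand and compared with what `summary()` reports. This is a sketch on simulated data; the data-generating values are arbitrary assumptions.

```r
set.seed(53)
n <- 100
x <- rnorm(n)
z <- rnorm(n)
y <- 1 + x + 0.5 * z + rnorm(n)

fit <- lm(y ~ x + z)

ss_tot <- sum((y - mean(y))^2)    # prediction error using only the mean
ss_res <- sum(residuals(fit)^2)   # prediction error using the fitted values

r2_by_hand <- 1 - ss_res / ss_tot
all.equal(r2_by_hand, summary(fit)$r.squared)
```

The hand-computed value matches the "Multiple R-squared" line of the regression summary.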
sampling variance of the slope was:
$\mathbb{V}[\hat{\beta}_1 \mid X] = \frac{\sigma_u^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2} = \frac{\sigma_u^2}{(n-1) S_X^2}$
▶ The error variance $\sigma_u^2$ (higher conditional variance of $Y_i$ leads to bigger SEs)
▶ The sample variance of $X_i$: $S_X^2$ (lower variation in $X_i$ leads to bigger SEs)
▶ The sample size $n$ (higher sample size leads to lower SEs)
$\mathbb{V}[\hat{\beta}_1 \mid X_i, Z_i] = \frac{\sigma_u^2}{(1 - R_1^2)(n-1) S_X^2}$
$R_1^2$ is the $R^2$ from the regression of $X_i$ on $Z_i$:
$\hat{X}_i = \hat{\delta}_0 + \hat{\delta}_1 Z_i$
▶ The error variance: $\sigma_u^2$
▶ The sample variance of $X_i$: $S_X^2$
▶ The sample size $n$
▶ The strength of the (linear) relationship between $X_i$ and $Z_i$ (stronger relationships mean higher $R_1^2$ and thus bigger SEs)
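The variance formula can be checked against the standard error that `lm()` reports, replacing $\sigma_u^2$ with its usual estimate (sum of squared residuals divided by $n - 3$). A sketch on simulated data, with arbitrary data-generating values:

```r
set.seed(55)
n <- 300
z <- rnorm(n)
x <- 0.7 * z + rnorm(n)
y <- 1 + 0.5 * x + 0.5 * z + rnorm(n)

fit <- lm(y ~ x + z)

# Estimated error variance: SSR / (n - 3), three coefficients estimated.
sigma2_hat <- sum(residuals(fit)^2) / (n - 3)

# R-squared from the regression of x on z.
r2_1 <- summary(lm(x ~ z))$r.squared

# Plug into the formula; var(x) is the sample variance with n - 1 denominator.
se_formula <- sqrt(sigma2_hat / ((1 - r2_1) * (n - 1) * var(x)))
se_lm      <- summary(fit)$coefficients["x", "Std. Error"]

all.equal(se_formula, se_lm)
```

The two standard errors coincide, so every term in the formula maps to something computable from the sample.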
Definition
Multicollinearity is defined to be high, but not perfect, correlation between two independent variables in a regression.
$R_1^2 \approx 1$, but not exactly.
The stronger the relationship between $X_i$ and $Z_i$, the closer the $R_1^2$ will be to 1, and the higher the SEs will be:
$\mathbb{V}[\hat{\beta}_1 \mid X_i, Z_i] = \frac{\sigma_u^2}{(1 - R_1^2)(n-1) S_X^2}$
$\hat{\beta}_2$) as well.
▶ $\hat{r}_{xz,i}$ are the residuals from the regression of $X_i$ on $Z_i$
▶ $\hat{\beta}_1$ from regression of $Y_i$ on $\hat{r}_{xz,i}$
$\hat{\beta}_1 = \frac{\widehat{\text{cov}}[\hat{r}_{xz,i}, Y_i]}{\hat{\mathbb{V}}[\hat{r}_{xz,i}]}$
will have low variation
[Figure: X against Z in two panels, "Weak X-Z" and "Strong X-Z".]
[Figure: Y against Residuals(X ~ Z) in two panels, "Weak X-Z" and "Strong X-Z".]
▶ If $X_i$ and $Z_i$ are extremely highly correlated, you're going to need a much bigger sample to accurately differentiate between their effects.
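A small simulation illustrates the sample-size point. The helper `se_for_rho` is hypothetical: it draws $X_i$ and $Z_i$ with population correlation `rho`, fits the regression with both covariates, and returns the standard error on the coefficient of $X_i$. All data-generating values are arbitrary assumptions.

```r
set.seed(61)

# Hypothetical helper: SE of the coefficient on x when cor(x, z) is about rho.
se_for_rho <- function(rho, n = 200) {
  z <- rnorm(n)
  x <- rho * z + sqrt(1 - rho^2) * rnorm(n)  # unit variance, correlation rho
  y <- 1 + x + z + rnorm(n)
  summary(lm(y ~ x + z))$coefficients["x", "Std. Error"]
}

se_weak       <- se_for_rho(0.10)            # weak X-Z relationship
se_strong     <- se_for_rho(0.95)            # strong X-Z relationship
se_strong_big <- se_for_rho(0.95, n = 2000)  # strong, with ten times the data

c(weak = se_weak, strong = se_strong, strong_big_n = se_strong_big)
```

With a correlation of 0.95, $1 - R_1^2$ is roughly a tenth of its value under a correlation of 0.10, so it takes on the order of ten times as many observations to bring the standard error back down to a comparable level.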
and the variance of OLS