Gov 2000: 7. What is Regression?
Matthew Blackwell
Fall 2016
1. Relationships between Variables
2. Conditional Expectation
3. Estimating the CEF
4. Linear CEFs and Linear Projections
5. Least Squares

Where are we?
variable.
the relationships between variables. How does one variable change as we change the values of another variable? These will be the bread and butter of the class moving forward.
[Figure: scatterplot of log GDP per capita, 1995 (y-axis) against average protection against expropriation risk (x-axis).]
about how two variables are related
▶ Does turnout vary by types of mailers received?
▶ Is the quality of political institutions related to average incomes?
▶ Does conflict mediation help reduce civil conflict?
left-hand-side variable or response
▶ Voter turnout
▶ Log GDP per capita
▶ Number of battle deaths
regressor or right-hand-side variable or treatment or predictor
▶ Social pressure mailer versus Civic Duty mailer
▶ Average expropriation risk
▶ Presence of conflict mediation
▶ $Y_i$ and $X_i$ are measured on the same unit $i$
▶ WARNING: different from our use of $Y_i$ and $X_i$ as r.v.s for different groups.
▶ There, $Y_i$ and $X_i$ corresponded to different units.
▶ Covariance/correlation
▶ Conditional expectation
population parameters for estimating relationships.
▶ Population-first approach.
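changes as $X_i$ changes.
$$\mu(x) = \mathbb{E}[Y_i \mid X_i = x]$$
$$\mathbb{E}[Y_i \mid X_i = x] = \int_{-\infty}^{\infty} y \, f_{Y|X}(y \mid x) \, dy$$
$$\hat{\mu}(x) = \widehat{\mathbb{E}}[Y_i \mid X_i = x]$$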
▶ $Y_i$ is the time respondent $i$ waited in line to vote.
▶ $X_i = 1$ for whites, $X_i = 0$ for non-whites.
$$\mu(\text{white}) = \mathbb{E}[Y_i \mid X_i = \text{white}]$$
$$\mu(\text{non-white}) = \mathbb{E}[Y_i \mid X_i = \text{non-white}]$$
then these two conditional means completely summarize the CEF.
[Figure: voting wait times for whites and non-whites, with the conditional means μ(1) and μ(0) marked.]
are shorter on average than for non-whites.
wait times.
$i$'s polling station.
[Figure: voting wait time by number of booths (5, 10, 15, 20), with the conditional means μ(5), μ(10), μ(15), and μ(20) marked.]
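multiple variables:
$$\mu(\text{white}, \text{man}) = \mathbb{E}[Y_i \mid X_i = \text{white}, Z_i = \text{man}]$$
$$\mu(\text{white}, \text{woman}) = \mathbb{E}[Y_i \mid X_i = \text{white}, Z_i = \text{woman}]$$
$$\mu(\text{non-white}, \text{man}) = \mathbb{E}[Y_i \mid X_i = \text{non-white}, Z_i = \text{man}]$$
$$\mu(\text{non-white}, \text{woman}) = \mathbb{E}[Y_i \mid X_i = \text{non-white}, Z_i = \text{woman}]$$
paribus).
non-white citizens of the same gender:
$$\mu(\text{white}, \text{man}) - \mu(\text{non-white}, \text{man})$$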
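A minimal R sketch of conditioning on two variables. The data are simulated, and the data frame `sim`, its column names, and every numeric value are hypothetical, not taken from the lecture data. `tapply()` takes the sample mean of the outcome within each race-by-gender cell, and differencing two cells with the same gender gives the "same gender" comparison.

```r
## Simulated data: all names and numbers here are hypothetical
set.seed(2016)
n <- 2000
sim <- data.frame(
  white = rbinom(n, size = 1, prob = 0.6),
  man   = rbinom(n, size = 1, prob = 0.5)
)
## wait time depends on both covariates plus noise
sim$wait <- 30 - 8 * sim$white + 2 * sim$man + rnorm(n, sd = 5)

## sample analogue of the two-variable CEF: mean wait time in each cell
cell.means <- tapply(sim$wait, list(white = sim$white, man = sim$man), mean)
cell.means

## white versus non-white, holding gender fixed at "man"
diff.men <- cell.means["1", "1"] - cell.means["0", "1"]
diff.men
```

The same subtraction within the `man = 0` column gives the comparison among women.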
$\mathbb{E}[Y_i \mid X_i = x]$.
▶ Writing out each value of the CEF is no longer feasible.
What does this function look like?
▶ Linear: $\mu(x) = \alpha + \beta x$
▶ Quadratic: $\mu(x) = \alpha + \beta x + \gamma x^2$
▶ Crazy, nonlinear: $\mu(x) = \alpha / (\beta + x)$
to make producing an estimator $\hat{\mu}(x)$ very difficult!
[Figure: wait times (y-axis) against income from $25k to $200k (x-axis), built up across slides to mark the conditional means μ($25k), μ($50k), μ($75k), and μ($150k).]
$$Y_i = \mathbb{E}[Y_i \mid X_i] + u_i$$
▶ The mean of the error doesn't depend on $X_i$:
$$\mathbb{E}[u_i \mid X_i] = \mathbb{E}[u_i] = 0$$
▶ The error is uncorrelated with any function of $X_i$.
part that is uncorrelated with $X_i$.
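A small simulated check of the two error properties, offered as a sketch under assumptions rather than part of the original slides. The covariate takes five values and the true CEF is assumed to be x squared; `ave()` supplies the sample CEF (the group mean attached to each observation).

```r
## Hypothetical simulation: discrete covariate, nonlinear CEF
set.seed(1)
n <- 10000
x <- sample(1:5, size = n, replace = TRUE)
y <- x^2 + rnorm(n)            # assumed true CEF: mu(x) = x^2

mu.hat <- ave(y, x)            # sample CEF: mean of y within each value of x
u.hat <- y - mu.hat            # CEF error

tapply(u.hat, x, mean)         # error averages to zero at every value of x
cor(u.hat, x)                  # uncorrelated with x ...
cor(u.hat, x^2)                # ... and with functions of x
cor(u.hat, log(x))
```

In the sample these hold exactly (up to floating-point error) because the group means are subtracted within each value of `x`.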
predictions about $Y_i$ using $X_i$.
define the mean squared error (MSE) of the prediction $g(X_i)$ as:
$$\mathbb{E}[(Y_i - g(X_i))^2]$$
prediction error:
$$\mathbb{E}[(Y_i - g(X_i))^2] \geq \mathbb{E}[(Y_i - \mu(X_i))^2]$$
$X_i$.
▶ …in terms of squared error.
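A hedged illustration of the MSE inequality on simulated data (all names and values are hypothetical). It compares three predictors of the outcome: the sample CEF, the best-fitting line from `lm()` (least squares is covered later in these slides), and a constant that ignores the covariate.

```r
## Hypothetical simulation with a nonlinear CEF
set.seed(2)
n <- 10000
x <- sample(1:5, size = n, replace = TRUE)
y <- x^2 + rnorm(n)

## mean squared error of a vector of predictions
mse <- function(pred) mean((y - pred)^2)

mse.cef   <- mse(ave(y, x))            # group means (sample CEF)
mse.lin   <- mse(fitted(lm(y ~ x)))    # best linear function of x
mse.const <- mse(rep(mean(y), n))      # ignores x entirely
c(cef = mse.cef, linear = mse.lin, constant = mse.const)
```

The ordering is guaranteed here: a line is one particular function of `x`, and a constant is one particular line, so each restriction can only raise the squared error.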
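$\mathbb{E}[Y_i \mid X_i = x]$?
$$\widehat{\mathbb{E}}[Y_i \mid X_i = 1] = \frac{1}{n_1} \sum_{i : X_i = 1} Y_i$$
$$\widehat{\mathbb{E}}[Y_i \mid X_i = 0] = \frac{1}{n_0} \sum_{i : X_i = 0} Y_i$$
$n_1 = \sum_{i=1}^{n} X_i$ is the number of women in the sample.
is a woman.
estimating the means within each group of $X_i$.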
## mean of log GDP among non-African countries
mean(ajr$logpgp95[ajr$africa == 0], na.rm = TRUE)
## [1] 8.716
## mean of log GDP among African countries
mean(ajr$logpgp95[ajr$africa == 1], na.rm = TRUE)
## [1] 7.355
plot(ajr$africa, ajr$logpgp95, ylab = "Log GDP per capita", xlab = "Africa",
    bty = "n")
## overlay the two group means as large red points
points(x = 0, y = mean(ajr$logpgp95[ajr$africa == 0], na.rm = TRUE),
    pch = 19, col = "red", cex = 3)
points(x = 1, y = mean(ajr$logpgp95[ajr$africa == 1], na.rm = TRUE),
    pch = 19, col = "red", cex = 3)
[Figure: log GDP per capita against the Africa indicator (0/1), with the two group means marked in red.]
with the sample mean among those who have $X_i = x$:
$$\widehat{\mathbb{E}}[Y_i \mid X_i = x] = \frac{1}{n_x} \sum_{i : X_i = x} Y_i$$
weight <- read.csv("../data/weight.csv", stringsAsFactors = FALSE)
## day of the week, coded 1 (Sunday) through 7 (Saturday)
weight$weekday <- as.numeric(format(as.Date(weight$date,
    format = "%m/%d/%y%n%H:%M"), "%w")) + 1
weight$date <- as.Date(weight$date, format = "%m/%d/%y%n%H:%M")
## average weight within each day of the week
day.means <- rep(NA, times = 7)
names(day.means) <- c("1 - Su", "2 - Mo", "3 - Tu", "4 - We", "5 - Th",
    "6 - Fr", "7 - Sa")
for (i in 1:7) {
    day.means[i] <- mean(weight$weight[weight$weekday == i])
}
day.means
## 1 - Su 2 - Mo 3 - Tu 4 - We 5 - Th 6 - Fr 7 - Sa
##  170.4  170.2  169.6  169.5  169.7  169.8  170.2
plot(x = weight$weekday, y = weight$weight, xaxt = "n", xlab = "Weekday",
    ylab = "Average Weight", pch = 19, col = "grey60")
## overlay the day-of-week means
lines(x = 1:7, y = day.means, pch = 19, col = "indianred", lwd = 3)
points(x = 1:7, y = day.means, pch = 21, col = "white", cex = 3,
    bg = "indianred")
axis(side = 1, at = 1:7, labels = names(day.means))
[Figure: weight by weekday (1 - Su through 7 - Sa), with the day-of-week means overlaid.]
value of $X_i$?
will be the same in a continuous variable is 0.
$\widehat{\mathbb{E}}[Y_i \mid X_i = x]$, since $n_x$ will be at most 1 for any value of $x$.
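A brief sketch of why the group-mean estimator breaks down with a continuous covariate, using simulated data (hypothetical names and values). Every value of `x` appears once, so each "group mean" is just the single observation at that value.

```r
## Hypothetical simulation with a continuous covariate
set.seed(4)
x <- runif(200)
y <- 1 + 2 * x + rnorm(200)

n.x <- table(x)                  # number of observations at each value of x
max(n.x)                         # no value of x repeats

cef.hat <- tapply(y, x, mean)    # "mean" of y within each distinct x
## the estimate simply reproduces the data, sorted by x
reproduces <- all(as.numeric(cef.hat) == y[order(x)])
reproduces
```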
am during the day
active minutes in the previous day using this approach.
fitbit <- read.csv("../data/fitbit.csv", stringsAsFactors = FALSE)
fitbit$date <- as.Date(fitbit$date, format = "%m/%d/%y")
## lag fitbit by one day
fitbit$date <- fitbit$date + 1
## merge fitbit and weight data
weight <- merge(weight, fitbit, by = "date")
plot(weight$active.mins[order(weight$active.mins)],
    weight$weight[order(weight$active.mins)], type = "l", lwd = 3, pch = 19,
    col = "indianred", xlab = "Active Minutes Previous Day", ylab = "Weight")
points(weight$active.mins, weight$weight, pch = 19, cex = 0.5)
[Figure: weight against active minutes on the previous day, with a line connecting every observation.]
we can take the continuous variable and turn it into a discrete
each stratum.
into 3 categories: lazy (< 30 min), active (30 to 60 min), and very active (60 min or more).
lowactivity.mean <- mean(weight$weight[weight$active.mins < 30])
medactivity.mean <- mean(weight$weight[weight$active.mins >= 30 &
    weight$active.mins < 60])
hiactivity.mean <- mean(weight$weight[weight$active.mins >= 60])
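The same stratification can be sketched more compactly with `cut()` and `tapply()`. The data below are simulated stand-ins (the vectors `active.mins` and `wt` and their values are hypothetical, not the lecture's fitbit data), and the cutoffs follow the three categories above.

```r
## Hypothetical stand-in data
set.seed(5)
n <- 500
active.mins <- runif(n, min = 0, max = 120)
wt <- 172 - 0.03 * active.mins + rnorm(n)

## right = FALSE gives the intervals [0, 30), [30, 60), [60, Inf)
strata <- cut(active.mins, breaks = c(0, 30, 60, Inf), right = FALSE,
              labels = c("lazy", "active", "very active"))

## mean weight within each stratum
strata.means <- tapply(wt, strata, mean)
strata.means
```

Changing `breaks` shows how sensitive the estimated step function is to the choice of cutoffs.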
[Figure: weight against active minutes on the previous day, with the Lazy, Active, and Very Active strata marked.]
covariates.
▶ Even stratification had many hidden assumptions: number of categories, cutoffs for the categories, constant means within strata, etc.
CEF is linear:
$$\mu(x) = \mathbb{E}[Y_i \mid X_i = x] = \beta_0 + \beta_1 x$$
change in $X_i$
▶ $\beta_0$: average income among people with 0 years of education.
▶ $\beta_1$: expected difference in income between two adults that differ by 1 year of education.
$$\mathbb{E}[Y_i \mid X_i = 12] - \mathbb{E}[Y_i \mid X_i = 11] = \mathbb{E}[Y_i \mid X_i = 16] - \mathbb{E}[Y_i \mid X_i = 15] = \beta_1$$
getting a college degree.
white and $X_i = 0$ being non-white.
▶ Two possible values of the CEF: $\mu(1)$ for whites and $\mu(0)$ for non-whites.
$$\mathbb{E}[Y_i \mid X_i = x] = \mu(x) = \mu(0) + (\mu(1) - \mu(0)) \, x$$
$$\mu(x) = \beta_0 + \beta_1 x$$
▶ $\beta_0$: expected wait time for non-whites
▶ $\beta_1$: difference in expected wait times between whites and non-whites.
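A quick simulated check of the binary case (hypothetical data and variable names). With a 0/1 covariate, the intercept from `lm()` equals the mean among the `0` group and the slope equals the difference in group means; `lm()` itself is covered in the Least Squares section.

```r
## Hypothetical simulation: binary covariate
set.seed(6)
n <- 1000
white <- rbinom(n, size = 1, prob = 0.6)
wait <- 30 - 8 * white + rnorm(n, sd = 5)

mu0 <- mean(wait[white == 0])    # sample mean for non-whites
mu1 <- mean(wait[white == 1])    # sample mean for whites

b <- coef(lm(wait ~ white))
rbind(regression = b, group.means = c(mu0, mu1 - mu0))
```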
is the best linear approximation to $Y_i$.
the squared prediction errors:
$$(\beta_0, \beta_1) = \arg\min_{(b_0, b_1)} \mathbb{E}[(Y_i - (b_0 + b_1 X_i))^2]$$
the population linear regression of $Y_i$ onto $X_i$.
▶ The CEF, $\mu(x)$, is the best predictor of $Y_i$ among all functions.
▶ The linear projection is the best predictor among linear functions.
[Figure: wait times (y-axis) against income from $25k to $200k (x-axis), with the CEF and the linear projection overlaid.]
best fit to the joint distribution of $Y_i$ and $X_i$?
$$\beta_0 = \mathbb{E}[Y_i] - \beta_1 \mathbb{E}[X_i] \qquad \beta_1 = \frac{\text{Cov}[Y_i, X_i]}{\mathbb{V}[X_i]}$$
and is well-defined even if the CEF is nonlinear.
CEF is linear
If the CEF is a linear function, $\mathbb{E}[Y_i \mid X_i] = b_0 + b_1 X_i$, then it will be equal to the linear projection: $\mathbb{E}[Y_i \mid X_i] = \beta_0 + \beta_1 X_i$.
Linear projection approximates CEF
The linear projection is the best linear approximation to the CEF, so that:
$$(\beta_0, \beta_1) = \arg\min_{(b_0, b_1)} \mathbb{E}[(\mu(X_i) - (b_0 + b_1 X_i))^2]$$
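A simulated sketch of the second result (hypothetical setup: a uniform covariate on [0, 2] and an assumed CEF of x squared). Projecting the outcome on `x` and projecting the CEF itself on `x` give nearly the same line, and the slope matches the covariance-over-variance formula even though the CEF is nonlinear.

```r
## Hypothetical simulation with a nonlinear CEF
set.seed(7)
n <- 100000
x <- runif(n, min = 0, max = 2)
mu <- x^2                        # assumed CEF
y <- mu + rnorm(n)

proj.y  <- coef(lm(y ~ x))       # projection of the outcome on x
proj.mu <- coef(lm(mu ~ x))      # projection of the CEF on x
rbind(proj.y, proj.mu)

## slope from the Cov / V formula, estimated from the draws
cov(y, x) / var(x)
```

Under these assumptions the population projection has slope 2 and intercept -2/3, which the large-sample estimates should approximate.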
▶ Defined a population line of best fit, $\beta_0 + \beta_1 X_i$.
▶ If CEF is linear, it is equal to this line.
like $\mu$ or $\sigma^2$!
population joint distribution, $f_{Y,X}(y, x)$
best fit:
$$(\beta_0, \beta_1) = \arg\min_{(b_0, b_1)} \mathbb{E}[(Y_i - b_0 - b_1 X_i)^2]$$
expectation with a sample mean:
$$(\hat{\beta}_0, \hat{\beta}_1) = \arg\min_{b_0, b_1} \frac{1}{n} \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)^2$$
squares (OLS).
[Figure: log GDP per capita against average protection against expropriation risk, with the OLS line and an alternative line drawn for comparison.]
$Y_i$ for a particular observation with independent variable $X_i$:
$$\hat{Y}_i = \widehat{\mathbb{E}}[Y_i \mid X_i] = \hat{\beta}_0 + \hat{\beta}_1 X_i$$
value of $Y_i$ and the fitted value, $\hat{Y}_i$:
$$\hat{u}_i = Y_i - \hat{Y}_i = Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i$$
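A short sketch computing fitted values and residuals by hand and comparing them with R's `fitted()` and `residuals()`. The data are simulated, and the coefficients used to generate them are hypothetical.

```r
## Hypothetical simulation
set.seed(8)
n <- 100
x <- runif(n, min = 3, max = 10)
y <- 4.6 + 0.5 * x + rnorm(n, sd = 0.7)

mod <- lm(y ~ x)
b <- coef(mod)

y.hat <- b[1] + b[2] * x         # fitted values from the estimated line
u.hat <- y - y.hat               # residuals: observed minus fitted

all.equal(as.numeric(y.hat), as.numeric(fitted(mod)))
all.equal(as.numeric(u.hat), as.numeric(residuals(mod)))
```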
[Figure: the same scatterplot with the observation IRN highlighted, built up across slides to show its observed value, its fitted value on the OLS line, and the residual as the gap between them; on the alternative line the residual for IRN is approximately 0.]
$\hat{u}_i = Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i$, tell us how well the line fits the data.
▶ Larger magnitude residuals mean that points are very far from the line
▶ Residuals close to 0 mean points are very close to the line
doing at predicting $Y_i$
[Figure: log GDP per capita against average protection against expropriation risk, with the OLS line and the alternative line.]
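$$(\hat{\beta}_0, \hat{\beta}_1) = \arg\min_{b_0, b_1} \frac{1}{n} \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)^2$$
$\hat{\beta}_0$) and slope ($\hat{\beta}_1$) in terms
Yes!
$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} \qquad \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})(X_i - \bar{X})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$$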
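The slides state the closed form without the intermediate steps. A sketch of the standard calculus argument, writing $S(b_0, b_1)$ (a label introduced here, not in the original) for the average squared residual:

```latex
\begin{align*}
S(b_0, b_1) &= \frac{1}{n} \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)^2 \\
\frac{\partial S}{\partial b_0} = 0
  &\;\Longrightarrow\; \sum_{i=1}^{n} (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i) = 0
  \;\Longrightarrow\; \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} \\
\frac{\partial S}{\partial b_1} = 0
  &\;\Longrightarrow\; \sum_{i=1}^{n} X_i (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i) = 0 \\
\text{substitute } \hat{\beta}_0
  &\;\Longrightarrow\; \sum_{i=1}^{n} X_i \left[ (Y_i - \bar{Y}) - \hat{\beta}_1 (X_i - \bar{X}) \right] = 0 \\
\text{subtract } \bar{X} \sum_{i=1}^{n} \left[ (Y_i - \bar{Y}) - \hat{\beta}_1 (X_i - \bar{X}) \right] = 0
  &\;\Longrightarrow\; \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})
     = \hat{\beta}_1 \sum_{i=1}^{n} (X_i - \bar{X})^2
\end{align*}
```

Dividing the last line by $\sum_{i=1}^{n} (X_i - \bar{X})^2$ gives the slope formula, provided $X_i$ is not constant in the sample.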
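$$\widehat{\text{Cov}}[X, Y] = \frac{1}{n - 1} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})$$
$$\widehat{\mathbb{V}}[X_i] = \frac{1}{n - 1} \sum_{i=1}^{n} (X_i - \bar{X})^2$$
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})(X_i - \bar{X})}{\sum_{i=1}^{n} (X_i - \bar{X})^2} = \frac{\widehat{\text{Cov}}[X, Y]}{\widehat{\mathbb{V}}[X_i]}$$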
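population:
$$\beta_0 = \mathbb{E}[Y_i] - \beta_1 \mathbb{E}[X_i] \qquad \beta_1 = \frac{\text{Cov}[Y_i, X_i]}{\mathbb{V}[X_i]}$$
$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} \qquad \hat{\beta}_1 = \frac{\widehat{\text{Cov}}[X, Y]}{\widehat{\mathbb{V}}[X_i]}$$
sample versions!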
## keep complete cases on the two variables
ajr <- na.omit(ajr[, c("avexpr", "logpgp95")])
cov.xy <- cov(ajr$avexpr, ajr$logpgp95)
var.x <- var(ajr$avexpr)
## slope: sample covariance over sample variance
cov.xy/var.x
## [1] 0.5319
## intercept
mean(ajr$logpgp95) - cov.xy/var.x * mean(ajr$avexpr)
## [1] 4.626
coef(lm(logpgp95 ~ avexpr, data = ajr))
## (Intercept)      avexpr
##      4.6261      0.5319
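$$\sum_{i=1}^{n} \hat{u}_i = 0$$
$$\sum_{i=1}^{n} X_i \hat{u}_i = 0 \;\rightsquigarrow\; \widehat{\text{Cov}}(X_i, \hat{u}_i) = 0$$
$$\sum_{i=1}^{n} \hat{Y}_i \hat{u}_i = 0 \;\rightsquigarrow\; \widehat{\text{Cov}}(\hat{Y}_i, \hat{u}_i) = 0$$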
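mod <- lm(logpgp95 ~ avexpr, data = ajr)
## residuals average to zero
mean(residuals(mod))
## [1] -2.006e-17
## residuals are uncorrelated with the regressor
cor(ajr$avexpr, residuals(mod))
## [1] -3.185e-17
## residuals are uncorrelated with the fitted values
cor(fitted(mod), residuals(mod))
## [1] -1.16e-16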