 
Gov 2000: 7. What is Regression?
Matthew Blackwell
Fall 2016

1. Relationships between Variables
2. Conditional Expectation
3. Estimating the CEF
4. Linear CEFs and Linear Projections
5. Least Squares

Where are we? Where are we going?
• What we’ve been up to: estimating parameters of population distributions. Generally we’ve been learning about a single variable.
• This week and for the rest of the term, we’ll be interested in the relationships between variables. How does one variable change as we change the values of another variable? These will be the bread and butter of the class moving forward.

[Figure: "AJR data". Log GDP pc, 1995 (vertical axis) against Average Protection Against Expropriation Risk (horizontal axis).]
• How do we draw this line?

1/ Relationships between Variables

What is a relationship and why do we care?
• Most of what we want to do in the social sciences is learn about how two variables are related.
• Examples:
  ▶ Does turnout vary by types of mailers received?
  ▶ Is the quality of political institutions related to average incomes?
  ▶ Does conflict mediation help reduce civil conflict?

Notation and conventions
• $Z_j$ - the dependent variable or outcome or regressand or left-hand-side variable or response
  ▶ Voter turnout
  ▶ Log GDP per capita
  ▶ Number of battle deaths
• $Y_j$ - the independent variable or explanatory variable or regressor or right-hand-side variable or treatment or predictor
  ▶ Social pressure mailer versus Civic Duty mailer
  ▶ Average Expropriation Risk
  ▶ Presence of conflict mediation

Joint distribution review
• $(Z_j, Y_j)$ are i.i.d. draws from a joint distribution $g_{Z,Y}$
  ▶ $Z_j$ and $Y_j$ are measured on the same unit $j$
  ▶ WARNING: different than our use of $Z_j$ and $Y_j$ as r.v.s for different groups.
  ▶ There, $Z_j$ and $Y_j$ corresponded to different units.
• Several ways to summarize the joint population distribution:
  ▶ Covariance/correlation
  ▶ Conditional expectation
• Today we’ll spend a lot of time thinking about the relevant population parameters for estimating relationships.
  ▶ Population-first approach.

2/ Conditional Expectation

Conditional expectation function
• Conditional expectation function (CEF): how the mean of $Z_j$ changes as $Y_j$ changes.
  $$\nu(y) = \mathbb{E}[Z_j \mid Y_j = y]$$
• The CEF is a feature of the joint distribution of $Z_j$ and $Y_j$:
  $$\mathbb{E}[Z_j \mid Y_j = y] = \int_{-\infty}^{\infty} z \, g_{Z|Y}(z \mid y) \, dz$$
• Goal of regression is to estimate the CEF:
  $$\hat{\nu}(y) = \widehat{\mathbb{E}}[Z_j \mid Y_j = y]$$

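The integral in the CEF definition can be evaluated numerically once a conditional density is given. A minimal sketch in R, assuming a hypothetical conditional density in which $Z_j$ given $Y_j = y$ is Normal with mean $2 + 3y$ and variance 1 (this density, and the names `cond_density` and `cef`, are illustrative assumptions, not part of the lecture):

```r
# Hypothetical conditional density: Z_j | Y_j = y ~ Normal(2 + 3y, 1).
# Chosen only for illustration, so that the true CEF is nu(y) = 2 + 3y.
cond_density <- function(z, y) dnorm(z, mean = 2 + 3 * y, sd = 1)

# nu(y) = integral over z of z * g(z | y), evaluated numerically
cef <- function(y) {
  integrate(function(z) z * cond_density(z, y),
            lower = -Inf, upper = Inf)$value
}

# Evaluate the CEF at a few values of y
cef_values <- sapply(c(0, 0.5, 1), cef)
print(cef_values)
```

Under this assumed density the printed values should sit close to $2 + 3y$ at each $y$, which is the sense in which the CEF is a feature of the joint distribution rather than of any sample.
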
CEF for binary covariates
• Example:
  ▶ $Z_j$ is the time respondent $j$ waited in line to vote.
  ▶ $Y_j = 1$ for whites, $Y_j = 0$ for non-whites.
• Then the mean in each group is just a conditional expectation:
  $$\nu(\text{white}) = \mathbb{E}[Z_j \mid Y_j = \text{white}]$$
  $$\nu(\text{non-white}) = \mathbb{E}[Z_j \mid Y_j = \text{non-white}]$$
• Notice here that since $Y_j$ can only take on two values, 0 and 1, these two conditional means completely summarize the CEF.

Why is the CEF useful?
[Figure: distributions of voting wait time (horizontal axis, 0 to 60) for non-whites and whites, with each group's conditional mean marked.]
• The CEF encodes relationships between variables.
• If $\nu(\text{white}) < \nu(\text{non-white})$, then waiting times for whites are shorter on average than for non-whites.
• This indicates a relationship in the population between race and wait times.

CEF for discrete covariates
• New covariate: $Y_j$ is the number of polling booths at citizen $j$’s polling station.
• The mean of $Z_j$ changes as $Y_j$ changes:
[Figure: distributions of voting wait time (horizontal axis, 0 to 15) at polling stations with 5, 10, 15, and 20 booths, with the conditional mean marked for each.]

CEF with multiple covariates
• We could also be interested in the CEF conditioning on multiple variables:
  $$\nu(\text{white}, \text{man}) = \mathbb{E}[Z_j \mid Y_j = \text{white}, a_j = \text{man}]$$
  $$\nu(\text{white}, \text{woman}) = \mathbb{E}[Z_j \mid Y_j = \text{white}, a_j = \text{woman}]$$
  $$\nu(\text{non-white}, \text{man}) = \mathbb{E}[Z_j \mid Y_j = \text{non-white}, a_j = \text{man}]$$
  $$\nu(\text{non-white}, \text{woman}) = \mathbb{E}[Z_j \mid Y_j = \text{non-white}, a_j = \text{woman}]$$
• Why? Allows more credible all-else-equal comparisons (ceteris paribus).
• Ex: average difference in wait times between white and non-white citizens of the same gender:
  $$\nu(\text{white}, \text{man}) - \nu(\text{non-white}, \text{man})$$

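A minimal sketch in R of conditioning on two covariates at once, using simulated data. The variables `white`, `man`, and `wait` and every number used to generate them are made up for illustration; they are not the survey data behind the slides.

```r
# Simulated data: all values below are invented for illustration only.
set.seed(1)
n     <- 2000
white <- rbinom(n, 1, 0.6)                            # hypothetical race indicator
man   <- rbinom(n, 1, 0.5)                            # hypothetical gender indicator
wait  <- 20 - 5 * white + 2 * man + rnorm(n, sd = 4)  # made-up wait times

# One sample mean per (white, man) cell: a 2 x 2 table of conditional means
cell_means <- tapply(wait, list(white = white, man = man), mean)
print(round(cell_means, 2))

# Difference between white and non-white respondents, holding gender at "man"
diff_men <- cell_means["1", "1"] - cell_means["0", "1"]
print(diff_men)
```

Each entry of `cell_means` is the sample analogue of one of the four conditional expectations, and `diff_men` is the same-gender comparison from the last bullet.
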
CEF for continuous covariates
• What if our independent variable, $Y_j$, is income?
• Many possible values of $Y_j$ ⇝ many possible values of $\mathbb{E}[Z_j \mid Y_j = y]$.
  ▶ Writing out each value of the CEF is no longer feasible.
• Now we will think about $\nu(y) = \mathbb{E}[Z_j \mid Y_j = y]$ as a function. What does this function look like:
  ▶ Linear: $\nu(y) = \beta + \gamma y$
  ▶ Quadratic: $\nu(y) = \beta + \gamma y + \delta y^2$
  ▶ Crazy, nonlinear: $\nu(y) = \beta / (\gamma + y)$
• These are unknown functions in the population! This is going to make producing an estimator $\hat{\nu}(y)$ very difficult!

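One way to see the difficulty is that, with a continuous covariate, almost every observed value appears exactly once, so "the group with $Y_j = y$" contains a single unit. A small sketch in R, with simulated `income` and `wait` values that are purely hypothetical:

```r
# Simulated data: all values below are invented for illustration only.
set.seed(42)
n      <- 500
income <- runif(n, 25, 200)                     # hypothetical income, in $1000s
wait   <- 40 - 0.1 * income + rnorm(n, sd = 5)  # made-up wait times

# Count distinct covariate values and the size of the largest "group"
n_distinct  <- length(unique(income))
group_sizes <- table(income)
print(c(n = n,
        distinct_values = n_distinct,
        largest_group = max(group_sizes)))
```

When the number of distinct values equals the sample size, a within-group sample mean just returns each observation, which is why some functional form (or another smoothing device) is needed for $\hat{\nu}(y)$.
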
Wait times and income
[Figure: scatterplot of wait times (vertical axis, 0 to 80) against income (horizontal axis, $25k to $200k). Successive builds of the slide add a side panel of wait times (0 to 60) marking the conditional mean of wait times at incomes of $25k, $50k, $75k, and $150k.]

The CEF decomposition
• We can always decompose $Z_j$ into the CEF and an error:
  $$Z_j = \mathbb{E}[Z_j \mid Y_j] + v_j$$
• Here, the CEF error has two definitional properties:
  ▶ The mean of the error doesn’t depend on $Y_j$:
    $$\mathbb{E}[v_j \mid Y_j] = \mathbb{E}[v_j] = 0$$
  ▶ The error is uncorrelated with any function of $Y_j$.
• $Z_j$ can be decomposed into the part “explained by $Y_j$” and a part that is uncorrelated with $Y_j$.

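The two properties of the CEF error have exact sample analogues when the CEF is replaced by within-group sample means. A minimal sketch in R on simulated data (the data-generating process and variable names are assumptions made for illustration):

```r
# Simulated data: all values below are invented for illustration only.
set.seed(7)
n <- 1000
y <- sample(1:4, n, replace = TRUE)   # discrete covariate
z <- sqrt(y) + rnorm(n)               # made-up outcome

cef_hat <- ave(z, y)                  # each unit gets the mean of z in its y-group
v       <- z - cef_hat                # sample analogue of the CEF error

# Property 1: the error averages to zero within every group of y
mean_by_group <- tapply(v, y, mean)

# Property 2: the error has zero covariance with functions of y
cov_with_funs <- c(cov(v, y), cov(v, y^2), cov(v, exp(y)))

print(mean_by_group)
print(cov_with_funs)
```

All printed values should be zero up to floating-point error. The three functions of `y` are arbitrary choices; the same holds for any other function of `y`.
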
Best predictor
• Another reason to focus on the CEF: it generates the best predictions about $Z_j$ using $Y_j$.
• Let $h(Y_j)$ be some function that generates predictions, and define the mean squared error (MSE) of the prediction as:
  $$\mathbb{E}[(Z_j - h(Y_j))^2]$$
• What function should you pick? The CEF minimizes this prediction error:
  $$\mathbb{E}[(Z_j - h(Y_j))^2] \geq \mathbb{E}[(Z_j - \nu(Y_j))^2]$$
• We say the CEF is the best predictor of $Z_j$ among functions of $Y_j$.
  ▶ …in terms of squared error.

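The inequality can be checked in a sample by comparing the mean squared error of a few candidate prediction functions. A sketch in R on simulated data; the nonlinear data-generating process and the helper `mse` are illustrative assumptions:

```r
# Simulated data: all values below are invented for illustration only.
set.seed(11)
n <- 1000
y <- sample(1:4, n, replace = TRUE)   # discrete covariate
z <- (y - 2)^2 + rnorm(n)             # made-up nonlinear relationship

# In-sample mean squared error of a vector of predictions
mse <- function(pred) mean((z - pred)^2)

mse_cef    <- mse(ave(z, y))              # within-group means (sample CEF)
mse_linear <- mse(fitted(lm(z ~ y)))      # best-fitting linear function of y
mse_const  <- mse(mean(z))                # a constant that ignores y

print(c(cef = mse_cef, linear = mse_linear, constant = mse_const))
```

In-sample, the within-group means cannot do worse than any other function of `y`, so `mse_cef` is the smallest of the three. How far the linear fit falls behind depends on how nonlinear the assumed relationship is.
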
3/ Estimating the CEF

Estimating the CEF for binary covariates
• How do we estimate $\mathbb{E}[Z_j \mid Y_j = y]$?
• Sample means within each group:
  $$\widehat{\mathbb{E}}[Z_j \mid Y_j = 1] = \frac{1}{n_1} \sum_{j : Y_j = 1} Z_j$$
  $$\widehat{\mathbb{E}}[Z_j \mid Y_j = 0] = \frac{1}{n_0} \sum_{j : Y_j = 0} Z_j$$
• $n_1 = \sum_{j=1}^{n} Y_j$ is the number of women in the sample.
• $n_0 = n - n_1$ is the number of men.
• $\sum_{j : Y_j = 1}$ sums only over the $j$ that have $Y_j = 1$, meaning that $j$ is a woman.
• ⇝ estimate the mean of $Z_j$ conditional on $Y_j$ by just estimating the means within each group of $Y_j$.

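The two formulas translate line by line into R. A minimal sketch on simulated data (the variables `y` and `z` and their values are made up; only the arithmetic mirrors the slide):

```r
# Simulated data: all values below are invented for illustration only.
set.seed(3)
n <- 200
y <- rbinom(n, 1, 0.5)        # hypothetical binary covariate
z <- 10 + 2 * y + rnorm(n)    # made-up outcome

n1 <- sum(y)                  # number of units with y equal to 1
n0 <- n - n1                  # number of units with y equal to 0

# Sum the outcome over one group only, then divide by that group's size
cef_hat_1 <- sum(z[y == 1]) / n1
cef_hat_0 <- sum(z[y == 0]) / n0

print(c(cef_hat_0 = cef_hat_0, cef_hat_1 = cef_hat_1))
```

These are the same numbers that `mean(z[y == 1])` and `mean(z[y == 0])` return; writing out the sum and the group size just makes the correspondence with the formulas explicit.
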
Binary covariate example

    ## mean of log GDP among non-African countries
    mean(ajr$logpgp95[ajr$africa == 0], na.rm = TRUE)
    ## [1] 8.716

    ## mean of log GDP among African countries
    mean(ajr$logpgp95[ajr$africa == 1], na.rm = TRUE)
    ## [1] 7.355

Binary covariate CEF plot

    plot(ajr$africa, ajr$logpgp95, ylab = "Log GDP per capita",
         xlab = "Africa", bty = "n")
    points(x = 0, y = mean(ajr$logpgp95[ajr$africa == 0], na.rm = TRUE),
           pch = 19, col = "red", cex = 3)
    points(x = 1, y = mean(ajr$logpgp95[ajr$africa == 1], na.rm = TRUE),
           pch = 19, col = "red", cex = 3)

[Figure: log GDP per capita (vertical axis, 6 to 10) against the Africa indicator (horizontal axis, 0 to 1), with the two group means drawn as large red points.]

Discrete covariate: estimating the CEF
• What if $Y_j$ isn’t binary, but takes on > 2 discrete values?
• The same logic applies: we can still estimate $\mathbb{E}[Z_j \mid Y_j = y]$ with the sample mean among those who have $Y_j = y$:
  $$\widehat{\mathbb{E}}[Z_j \mid Y_j = y] = \frac{1}{n_y} \sum_{j : Y_j = y} Z_j$$

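With more than two values the estimator is still one sample mean per value of the covariate. A sketch in R using simulated polling-booth data; `booths`, `wait`, and the numbers generating them are hypothetical:

```r
# Simulated data: all values below are invented for illustration only.
set.seed(5)
n      <- 600
booths <- sample(c(5, 10, 15, 20), n, replace = TRUE)  # hypothetical covariate
wait   <- 30 - booths + rnorm(n, sd = 3)               # made-up wait times

n_y     <- table(booths)               # number of units at each value of booths
cef_hat <- tapply(wait, booths, mean)  # sample mean of wait at each value

# One row per value of the covariate: group size and estimated CEF
print(data.frame(booths  = names(cef_hat),
                 n_y     = as.vector(n_y),
                 cef_hat = round(as.vector(cef_hat), 2)))
```

`tapply()` applies `mean` separately within each distinct value of `booths`, which is the displayed formula evaluated once per value of $y$.
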