Gov 2000: 7. What is Regression?
Matthew Blackwell
Fall 2016


1. Relationships between Variables
2. Conditional Expectation
3. Estimating the CEF
4. Linear CEFs and Linear Projections
5. Least Squares

Where are we? Where are we going?
• What we’ve been up to: estimating parameters of population distributions. Generally we’ve been learning about a single variable.
• This week and for the rest of the term, we’ll be interested in the relationships between variables. How does one variable change as we change the values of another variable? These will be the bread and butter of the class moving forward.

AJR data
[Figure: scatterplot of log GDP per capita, 1995 (y-axis) against average protection against expropriation risk (x-axis), with a fitted line.]
• How do we draw this line?

1/ Relationships between Variables

What is a relationship and why do we care?
• Most of what we want to do in the social sciences is learn about how two variables are related.
• Examples:
▶ Does turnout vary by types of mailers received?
▶ Is the quality of political institutions related to average incomes?
▶ Does conflict mediation help reduce civil conflict?

Notation and conventions
• $Z_j$ - the dependent variable or outcome or regressand or left-hand-side variable or response
▶ Voter turnout
▶ Log GDP per capita
▶ Number of battle deaths
• $Y_j$ - the independent variable or explanatory variable or regressor or right-hand-side variable or treatment or predictor
▶ Social pressure mailer versus civic duty mailer
▶ Average expropriation risk
▶ Presence of conflict mediation

Joint distribution review
• $(Z_j, Y_j)$ are i.i.d. draws from a joint distribution $g_{Z,Y}$
▶ $Z_j$ and $Y_j$ are measured on the same unit $j$
▶ WARNING: this is different from our earlier use of $Z_j$ and $Y_j$ as r.v.s for different groups.
▶ There, $Z_j$ and $Y_j$ corresponded to different units.
• Several ways to summarize the joint population distribution:
▶ Covariance/correlation
▶ Conditional expectation
• Today we’ll spend a lot of time thinking about the relevant population parameters for estimating relationships.
▶ Population-first approach.

2/ Conditional Expectation

Conditional expectation function
• Conditional expectation function (CEF): how the mean of $Z_j$ changes as $Y_j$ changes.
$\mu(y) = \mathbb{E}[Z_j \mid Y_j = y]$
• The CEF is a feature of the joint distribution of $Z_j$ and $Y_j$:
$\mathbb{E}[Z_j \mid Y_j = y] = \int_{-\infty}^{\infty} z \, g_{Z|Y}(z \mid y) \, dz$
• Goal of regression is to estimate the CEF:
$\hat{\mu}(y) = \widehat{\mathbb{E}}[Z_j \mid Y_j = y]$

CEF for binary covariates
• Example:
▶ $Z_j$ is the time respondent $j$ waited in line to vote.
▶ $Y_j = 1$ for whites, $Y_j = 0$ for non-whites.
• Then the mean in each group is just a conditional expectation:
$\mu(\text{white}) = \mathbb{E}[Z_j \mid Y_j = \text{white}]$
$\mu(\text{non-white}) = \mathbb{E}[Z_j \mid Y_j = \text{non-white}]$
• Notice here that since $Y_j$ can only take on two values, 0 and 1, these two conditional means completely summarize the CEF.

Why is the CEF useful?
[Figure: distributions of voting wait time (0 to 60) for non-whites and whites, with the group means μ(0) and μ(1) marked.]
• The CEF encodes relationships between variables.
• If $\mu(\text{white}) < \mu(\text{non-white})$, then waiting times for whites are shorter on average than for non-whites.
• This indicates a relationship in the population between race and wait times.

CEF for discrete covariates
• New covariate: $Y_j$ is the number of polling booths at citizen $j$’s polling station.
• The mean of $Z_j$ changes as $Y_j$ changes:
[Figure: distributions of voting wait time (0 to 15) for stations with 5, 10, 15, and 20 booths, with the means μ(5), μ(10), μ(15), and μ(20) marked.]

CEF with multiple covariates
• We could also be interested in the CEF conditioning on multiple variables:
$\mu(\text{white}, \text{man}) = \mathbb{E}[Z_j \mid Y_j = \text{white}, a_j = \text{man}]$
$\mu(\text{white}, \text{woman}) = \mathbb{E}[Z_j \mid Y_j = \text{white}, a_j = \text{woman}]$
$\mu(\text{non-white}, \text{man}) = \mathbb{E}[Z_j \mid Y_j = \text{non-white}, a_j = \text{man}]$
$\mu(\text{non-white}, \text{woman}) = \mathbb{E}[Z_j \mid Y_j = \text{non-white}, a_j = \text{woman}]$
• Why? Allows more credible all-else-equal comparisons (ceteris paribus).
• Ex: average difference in wait times between white and non-white citizens of the same gender:
$\mu(\text{white}, \text{man}) - \mu(\text{non-white}, \text{man})$
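The same-gender comparison can be sketched in R with simulated data. The data frame `voters`, its columns `wait`, `race`, and `gender`, and the built-in gaps are all hypothetical, made up for illustration; they are not the data behind these slides.

```r
## Simulated (hypothetical) data: wait time in minutes, race, and gender
set.seed(2016)
n <- 2000
voters <- data.frame(
  race   = sample(c("white", "non-white"), n, replace = TRUE),
  gender = sample(c("man", "woman"), n, replace = TRUE)
)
## Build in a 10-minute gap by race and a 2-minute gap by gender
voters$wait <- 20 + 10 * (voters$race == "non-white") +
  2 * (voters$gender == "woman") + rnorm(n, sd = 5)

## Sample analogue of the CEF with two covariates:
## mean wait time in each race-by-gender cell
cef <- tapply(voters$wait, list(voters$race, voters$gender), mean)
cef

## Difference between white and non-white citizens, holding gender fixed at "man"
gap_men <- cef["white", "man"] - cef["non-white", "man"]
gap_men
```

Because the simulation builds in a 10-minute gap, `gap_men` should land near -10; with real data the cell means are simply whatever the sample gives.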

CEF for continuous covariates
• What if our independent variable, $Y_j$, is income?
• Many possible values of $Y_j$ ⇝ many possible values of $\mathbb{E}[Z_j \mid Y_j = y]$.
▶ Writing out each value of the CEF is no longer feasible.
• Now we will think about $\mu(y) = \mathbb{E}[Z_j \mid Y_j = y]$ as a function. What does this function look like?
▶ Linear: $\mu(y) = \beta + \gamma y$
▶ Quadratic: $\mu(y) = \beta + \gamma y + \delta y^2$
▶ Crazy, nonlinear: $\mu(y) = \beta / (\gamma + y)$
• These are unknown functions in the population! This is going to make producing an estimator $\hat{\mu}(y)$ very difficult!
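To make the three candidate shapes concrete, here is a minimal R sketch. The function names and every parameter value are arbitrary choices for illustration, with income measured in thousands of dollars; nothing here is estimated from data.

```r
## Three candidate shapes for the CEF; parameter values are arbitrary
mu_linear    <- function(y, beta = 40, gamma = -0.1) beta + gamma * y
mu_quadratic <- function(y, beta = 40, gamma = -0.3, delta = 0.001) {
  beta + gamma * y + delta * y^2
}
mu_nonlinear <- function(y, beta = 2000, gamma = 25) beta / (gamma + y)

## Evaluate each shape at a few income levels (thousands of dollars)
income <- c(25, 50, 100, 150, 200)
round(cbind(income,
            linear    = mu_linear(income),
            quadratic = mu_quadratic(income),
            nonlinear = mu_nonlinear(income)), 1)

## With a continuous covariate, nearly every sampled value is distinct,
## so there are no groups to average within
set.seed(7)
y <- runif(500, min = 25, max = 200)
length(unique(y))
```

The last line shows why the group-means approach breaks down: when almost every observation has its own value of the covariate, each "group" contains a single unit.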

Wait times and income
[Figure: scatterplot of wait times (y-axis, 0 to 80) against income (x-axis, $25k to $200k). Successive builds of the slide add the conditional distribution of wait times at $25k, $50k, $75k, and $150k, marking μ($25k), μ($50k), μ($75k), and μ($150k).]

The CEF decomposition
• We can always decompose $Z_j$ into the CEF and an error:
$Z_j = \mathbb{E}[Z_j \mid Y_j] + v_j$
• Here, the CEF error has two definitional properties:
▶ The mean of the error doesn’t depend on $Y_j$: $\mathbb{E}[v_j \mid Y_j] = \mathbb{E}[v_j] = 0$
▶ The error is uncorrelated with any function of $Y_j$.
• $Z_j$ can be decomposed into the part “explained by $Y_j$” and a part that is uncorrelated with $Y_j$.
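A small simulation can make the two properties of the CEF error concrete. Everything below is an invented setup: the covariate `y`, the outcome `z`, and the choice of a true CEF equal to `y^2` are assumptions for illustration only. When the CEF is estimated with group means, both properties hold in the sample by construction, mirroring the population statement.

```r
## Hypothetical data: discrete covariate y, outcome z with true CEF y^2
set.seed(43)
n <- 100000
y <- sample(1:4, n, replace = TRUE)
z <- y^2 + rnorm(n)

## Sample CEF: the mean of z among units sharing the same value of y
mu_hat <- ave(z, y)
## CEF error: what is left of z after removing the CEF
v <- z - mu_hat

## Property 1: the mean of the error is zero at every value of y
tapply(v, y, mean)

## Property 2: the error is uncorrelated with functions of y
c(cor(v, y), cor(v, y^2), cor(v, exp(y)))
```

All of the printed numbers should be zero up to floating-point rounding.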

Best predictor
• Another reason to focus on the CEF: it generates best predictions about $Z_j$ using $Y_j$.
• Let $h(Y_j)$ be some function that generates predictions and define the mean squared error (MSE) of the prediction as:
$\mathbb{E}[(Z_j - h(Y_j))^2]$
• What function should you pick? The CEF minimizes this prediction error:
$\mathbb{E}[(Z_j - h(Y_j))^2] \geq \mathbb{E}[(Z_j - \mu(Y_j))^2]$
• We say the CEF is the best predictor of $Z_j$ among functions of $Y_j$.
▶ …in terms of squared error.
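Why the inequality holds can be sketched in a few lines. Write $\mu(Y_j)$ for the CEF, $h(Y_j)$ for any candidate predictor, and $v_j = Z_j - \mu(Y_j)$ for the CEF error. The only ingredient is the decomposition property that $v_j$ has mean zero and is uncorrelated with any function of $Y_j$.

```latex
\begin{align*}
\mathbb{E}\big[(Z_j - h(Y_j))^2\big]
  &= \mathbb{E}\big[(v_j + \mu(Y_j) - h(Y_j))^2\big] \\
  &= \mathbb{E}[v_j^2]
     + 2\,\mathbb{E}\big[v_j\,(\mu(Y_j) - h(Y_j))\big]
     + \mathbb{E}\big[(\mu(Y_j) - h(Y_j))^2\big] \\
  &= \mathbb{E}\big[(Z_j - \mu(Y_j))^2\big]
     + \mathbb{E}\big[(\mu(Y_j) - h(Y_j))^2\big] \\
  &\geq \mathbb{E}\big[(Z_j - \mu(Y_j))^2\big]
\end{align*}
```

The cross term vanishes because $\mu(Y_j) - h(Y_j)$ is itself a function of $Y_j$, and the last step drops a term that cannot be negative. Equality requires $h(Y_j) = \mu(Y_j)$ with probability one.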

3/ Estimating the CEF

Estimating the CEF for binary covariates
• How do we compute the estimate $\widehat{\mathbb{E}}[Z_j \mid Y_j = y]$?
• Sample means within each group:
$\widehat{\mathbb{E}}[Z_j \mid Y_j = 1] = \frac{1}{n_1} \sum_{j : Y_j = 1} Z_j$
$\widehat{\mathbb{E}}[Z_j \mid Y_j = 0] = \frac{1}{n_0} \sum_{j : Y_j = 0} Z_j$
• $n_1 = \sum_{j=1}^{n} Y_j$ is the number of women in the sample.
• $n_0 = n - n_1$ is the number of men.
• $\sum_{j : Y_j = 1}$ sums only over the $j$ that have $Y_j = 1$, meaning that $j$ is a woman.
• ⇝ estimate the mean of $Z_j$ conditional on $Y_j$ by just estimating the means within each group of $Y_j$.

Binary covariate example

    ## assumes the AJR data are already loaded as the data frame `ajr`
    ## mean of log GDP among non-African countries
    mean(ajr$logpgp95[ajr$africa == 0], na.rm = TRUE)
    ## [1] 8.716

    ## mean of log GDP among African countries
    mean(ajr$logpgp95[ajr$africa == 1], na.rm = TRUE)
    ## [1] 7.355

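Binary covariate CEF plot

    ## scatterplot of log GDP per capita against the Africa indicator
    plot(ajr$africa, ajr$logpgp95, ylab = "Log GDP per capita",
         xlab = "Africa", bty = "n")
    ## add the group mean for non-African countries as a large red dot
    points(x = 0, y = mean(ajr$logpgp95[ajr$africa == 0], na.rm = TRUE),
           pch = 19, col = "red", cex = 3)
    ## add the group mean for African countries as a large red dot
    points(x = 1, y = mean(ajr$logpgp95[ajr$africa == 1], na.rm = TRUE),
           pch = 19, col = "red", cex = 3)

[Figure: log GDP per capita (y-axis, 6 to 10) against the Africa indicator (x-axis, 0 to 1), with the two group means shown as red points.]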

Discrete covariate: estimating the CEF
• What if $Y_j$ isn’t binary, but takes on > 2 discrete values?
• The same logic applies; we can still estimate $\mathbb{E}[Z_j \mid Y_j = y]$ with the sample mean among those who have $Y_j = y$:
$\widehat{\mathbb{E}}[Z_j \mid Y_j = y] = \frac{1}{n_y} \sum_{j : Y_j = y} Z_j$
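The within-group sample mean for a covariate with several values can be sketched in R as follows. The variables `booths` and `wait` are simulated and hypothetical, loosely echoing the polling-booth example; they are not the course data.

```r
## Hypothetical data: number of booths at the polling station and wait time
set.seed(55)
n <- 1000
booths <- sample(c(5, 10, 15, 20), n, replace = TRUE)
wait <- 30 - booths + rnorm(n, sd = 4)

## Number of observations at each value of the covariate
n_y <- table(booths)
n_y

## Sample mean of the outcome within each value of the covariate,
## computed for all values at once
cef_hat <- tapply(wait, booths, mean)
cef_hat

## The same estimate by hand for stations with 10 booths:
## sum the outcome over matching units, divide by their count
by_hand <- sum(wait[booths == 10]) / sum(booths == 10)
by_hand
```

`tapply()` is one convenient choice here; `aggregate()` or a loop over the unique values of the covariate would give the same numbers.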
