

SLIDE 1

PS 405 – Week 5 Section: OLS Regression and Its Assumptions

D.J. Flynn, February 11, 2014

SLIDE 2

Today’s plan

◮ Basic OLS set-up
◮ Estimation/interpretation of OLS models in R
◮ Gauss-Markov assumptions

SLIDE 3

Basic set-up

◮ Scalar:

Yi = β0 + β1X1i + β2X2i + ... + βKXKi + εi

◮ Matrix:

Yi = Xiβ + εi

◮ Y is a (quasi-)continuous outcome, the Xs are independent variables, ε is a residual
◮ We’ll use matrix form and assume X could include k = 1, 2, ... variables
◮ Our goal: specify a model (pick the Xs) and estimate parameters (β0, β1, ... βK) such that error is minimized (a closed-form sketch follows below)
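The slides stop at the verbal goal, but minimizing the squared errors has a standard closed-form solution, β̂ = (X′X)⁻¹X′Y. A minimal R sketch, using simulated data (not from the slides), that computes it by hand and checks it against lm():

set.seed(405)                                  # arbitrary seed for reproducibility
x1 <- rnorm(100); x2 <- rnorm(100)             # two made-up predictors
y  <- 2 + 0.5*x1 - 1.2*x2 + rnorm(100)         # outcome with known coefficients
X  <- cbind(1, x1, x2)                         # design matrix (intercept column first)
beta.hat <- solve(t(X) %*% X) %*% t(X) %*% y   # (X'X)^-1 X'Y
beta.hat                                       # matches the coefficients from lm():
coef(lm(y ~ x1 + x2))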

SLIDE 4

Re-cap from last week

Yi = Xiβ + εi,

where i = 1, 2, ... N and k = 1, 2, ... K. For each term:

◮ vector or matrix?
◮ size?
◮ why are some (not all) terms indexed by i?

SLIDE 5

Estimating OLS

◮ Collect data on Y and X.
◮ Estimate the model and obtain parameters: β0, β1, ... βK.
◮ Make predictions for each observation’s outcome (Ŷ) via linear combination.

Suppose our model is:

Turnouti = β0 + β1Competitivenessi + β2AdSpendingi + εi,

where Turnout is measured 0-100, Competitiveness is a dummy, and AdSpending is measured 1-5. We estimate the model in R and get these coefficients: β0 = 11, βC = 25, βAS = 6.25.

SLIDE 6

Now we can predict turnout in any election given competitiveness and ad spending data. For a competitive election with lots of spending (5/5), the predicted level of turnout is:

Ŷi = 11 + 25(1) + 6.25(5) = 67.25%.

Suppose true turnout in that election was 71%. Then ui = Yi − Ŷi = 71 − 67.25 = 3.75%. Recall, OLS estimates parameters such that these errors are minimized over the whole dataset:

min Σᵢ₌₁ᴺ (Yi − Ŷi)²
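A quick R translation of this arithmetic, using the slide’s hypothetical coefficients (nothing here is estimated from real data):

b0 <- 11; bC <- 25; bAS <- 6.25   # hypothetical estimates from the slide
y.hat <- b0 + bC*1 + bAS*5        # competitive election (1), max ad spending (5)
y.hat                             # 67.25
71 - y.hat                        # residual if true turnout is 71: 3.75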

SLIDE 7

Estimating/Interpreting OLS in R

◮ Practice estimating a model using the USArrests dataset:

library(datasets)                 # USArrests ships with base R
summary(USArrests)                # inspect the variables first
murder.model <- lm(Murder ~ Assault + Rape + UrbanPop, data = USArrests)
summary(murder.model)

◮ Thanks to the linearity assumption, we can interpret coefficients as the effect of a one-unit increase in X on Y.
◮ Thus, we MUST know the units of X and Y to interpret.
◮ Check out the description of the variable codings here.

SLIDE 8

R output

Call:
lm(formula = Murder ~ Assault + Rape + UrbanPop, data = USArrests)

Residuals:
    Min      1Q  Median      3Q     Max
-4.3990 -1.9127 -0.3444  1.2557  7.4279

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  3.276639   1.737997   1.885   0.0657 .
Assault      0.039777   0.005912   6.729 2.33e-08 ***
Rape         0.061399   0.055740   1.102   0.2764
UrbanPop    -0.054694   0.027880  -1.962   0.0559 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

SLIDE 9

Let’s look at fitted values:

fitted.values(murder.model)
predict.lm(murder.model, interval = "confidence")
plot(fitted.values(murder.model), USArrests$Murder)

Residuals:

resid(murder.model)
plot(murder.model)

Other helpful commands:

coef(murder.model)
murder.model$coef[1]
confint(murder.model)

Later: lots of diagnostics for checking assumptions

SLIDE 10

Key point on interpretation

ANOVA asks: does a factor (regardless of which category you’re in) predict the outcome? OLS asks: does some variable X affect the outcome relative to a baseline (the omitted category)? Helpful example: estimating treatment effects in experiments.

                 DV: Policy Support (1-7)
EE Treatment     −0.311    (0.263)
J Treatment      −0.609∗∗  (0.248)
HA Treatment     −0.621∗∗  (0.254)
Constant          5.508∗∗∗ (0.186)
Observations      272

∗p<0.1; ∗∗p<0.05; ∗∗∗p<0.01
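A hedged sketch of how a table like this comes out of R: code the conditions as a factor, and lm() omits the first level (here a hypothetical "Control" group) as the baseline, so each treatment coefficient is that group’s difference from control. The variable names and data below are illustrative, not the actual study:

# fake data: 272 subjects randomized into control + three treatments
treat <- factor(sample(c("Control", "EE", "J", "HA"), 272, replace = TRUE),
                levels = c("Control", "EE", "J", "HA"))  # first level = omitted baseline
support <- rnorm(272, mean = 5.5, sd = 1.5)              # fake 1-7 policy support scale
summary(lm(support ~ treat))  # intercept = control mean; other rows = differences from control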

SLIDE 11

Gauss-Markov

Under certain assumptions, OLS is the Best Linear Unbiased Estimator (BLUE) of β. Assumptions:¹

1. Linearity
2. Homoskedasticity
3. Error terms are i.i.d.
4. Strict Exogeneity
5. Errors are Normally Distributed
6. No (Perfect) Multicollinearity

¹ Note: every regression text you read will express/refer to these differently.

SLIDE 12

Assumption 1: Linearity

◮ Y is a linear function of the data: Ŷi = Xiβ
◮ Typically OK if the DV is continuous.
◮ Categorical/limited DVs break linearity and require more advanced (non-linear) models, which you’ll learn in 407.
◮ Common DVs that break linearity: binary responses, which require models like logit (see the sketch below). Notice the function is non-linear: Ŷi = 1 / (1 + e^(−Xiβ))
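For reference, binary outcomes like this are fit in R with glm() rather than lm(); a minimal sketch with simulated data (not part of the slides):

x <- rnorm(200)                               # made-up predictor
y <- rbinom(200, 1, plogis(0.5 + 1.2*x))      # binary DV generated from a logit curve
logit.model <- glm(y ~ x, family = binomial(link = "logit"))
summary(logit.model)                          # coefficients are on the log-odds scale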

SLIDE 13

Assumption 2: Homoskedasticity

◮ homoskedasticity: constant error variance, i.e., the errors are approximately the same size across subgroups of the data: var(ε|X) = σ², where σ² is some constant
◮ heteroskedasticity: non-constant error variance, i.e., the errors differ across subgroups of the data
◮ easily testable/fixable (later this quarter); one common test is sketched below
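One common check, shown here as a sketch rather than the course’s prescribed method, is the Breusch-Pagan test from the lmtest package (assuming it is installed), plus a residuals-vs-fitted plot:

library(lmtest)                               # install.packages("lmtest") if needed
bptest(murder.model)                          # H0: constant error variance
plot(fitted.values(murder.model), resid(murder.model))  # look for fanning/funneling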

SLIDE 14

[no text recovered from this slide]

SLIDE 15

Assumption 3: Error terms are i.i.d.

◮ no correlation between the error terms of different observations: E(εi · εj) = 0 for i ≠ j
◮ common violation: autocorrelation
◮ easy fix: use time-series models (not simple OLS); a quick diagnostic is sketched below
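As a preview of those diagnostics, the Durbin-Watson test from the lmtest package is a standard check for first-order autocorrelation. It is mainly meaningful for time-series data, so running it on the cross-sectional murder.model is purely illustrative:

library(lmtest)
dwtest(murder.model)   # H0: no first-order autocorrelation; DW near 2 is consistent with H0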

SLIDE 16

Assumption 4: Strict Exogeneity

◮ Many ways to express it. Usually:

E(εi|Xi) = 0

◮ Jay will write it this way (same idea):

X ⊥ ε

◮ the Xs are determined outside the model and are uncorrelated with the error term
◮ a challenging assumption for political scientists (e.g., democracy/GDP, media choice/political knowledge, etc.); the simulation below shows the consequence of violating it
◮ possible solution: instrumental variables regression (next quarter)
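A small simulation, with made-up numbers, shows why the assumption matters: when X is correlated with the error term, the OLS slope is biased even in large samples:

set.seed(405)
u <- rnorm(1000)            # the error term
x <- 0.8*u + rnorm(1000)    # X partly determined by the error: exogeneity violated
y <- 2 + 1*x + u            # true slope is 1
coef(lm(y ~ x))             # estimated slope lands well above 1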

SLIDE 17

Assumption 5: Errors are Normally Distributed

◮ given your data and model, the errors are Normal: ε ∼ N(0, σ²), where σ² is some constant
◮ depends on the distribution of your variables and your model
◮ an easy problem to detect with normal probability plots: plot(murder.model, which = 2)
◮ if violated, the coefficients are OK but hypothesis testing is invalid (a quick check is sketched below)
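Beyond eyeballing the Q-Q plot, a quick numerical check (a supplementary sketch, not from the slides) is the Shapiro-Wilk test on the residuals:

plot(murder.model, which = 2)        # normal Q-Q plot, as on the slide
shapiro.test(resid(murder.model))    # H0: residuals are normally distributed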

SLIDE 18

Assumption 6: No (Perfect) Multicollinearity

◮ multicollinearity: correlation among the independent variables in a model (e.g., ideology, PID)
◮ perfect multicollinearity: two variables perfectly predict one another (e.g., dummies for male and female), so we can’t estimate the effect of one relative to the other
◮ a challenging assumption for political scientists (especially behavioralists)
◮ What does R do with perfectly multicollinear regressors? ...

SLIDE 19

dep.var <- rnorm(100, 10, 2)                      # simulated outcome
female <- rbinom(100, 1, .51)                     # dummy for female
male <- ifelse(female == 1, 0, 1)                 # male is perfectly determined by female
perf.collin.model <- lm(dep.var ~ female + male)
summary(perf.collin.model)

Coefficients: (1 not defined because of singularities)
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  10.0323     0.2757  36.388   <2e-16 ***
female       -0.2203     0.3861  -0.571     0.57
male              NA         NA      NA       NA
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.93 on 98 degrees of freedom
Multiple R-squared: 0.003313, Adjusted R-squared: -0.006858
F-statistic: 0.3257 on 1 and 98 DF, p-value: 0.5695

Answer: R drops male (reporting NA) because it is a perfect linear combination of female and the intercept.

SLIDE 20

Consequences of violating assumptions

Note: from Yanna’s lecture (2/6/14)