

SLIDE 1

PS 405 – Week 5 Section: OLS Regression and Its Assumptions

D.J. Flynn, February 11, 2014

SLIDE 2

Today’s plan

◮ Basic OLS set-up
◮ Estimation/interpretation of OLS models in R
◮ Gauss-Markov assumptions

SLIDE 3

Basic set-up

◮ Scalar:

Yi = β0 + β1X1i + β2X2i + ... + βKXKi + εi

◮ Matrix:

Yi = Xiβ + εi

◮ Y is a (quasi-)continuous outcome, the Xs are independent variables, ε is a residual
◮ We’ll use matrix form and assume X could include k = 1, 2, ... variables
◮ Our goal: specify a model (pick the Xs) and estimate parameters (β0, β1, ... βK) such that error is minimized (a closed-form sketch follows below)
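The slides stop at the verbal goal, but minimizing the squared errors has a standard closed-form solution, β̂ = (X′X)⁻¹X′Y. A minimal R sketch, using simulated data (not from the slides), that computes it by hand and checks it against lm():

set.seed(405)                                  # arbitrary seed for reproducibility
x1 <- rnorm(100); x2 <- rnorm(100)             # two made-up predictors
y  <- 2 + 0.5*x1 - 1.2*x2 + rnorm(100)         # outcome with known coefficients
X  <- cbind(1, x1, x2)                         # design matrix (intercept column first)
beta.hat <- solve(t(X) %*% X) %*% t(X) %*% y   # (X'X)^-1 X'Y
beta.hat                                       # matches the coefficients from lm():
coef(lm(y ~ x1 + x2))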

SLIDE 4

Re-cap from last week

Yi = Xiβ + εi,

where i = 1, 2, ... N and k = 1, 2, ... K. For each term:

◮ vector or matrix?
◮ size?
◮ why are some (not all) terms indexed by i?

SLIDE 5

Estimating OLS

◮ Collect data on Y and X.
◮ Estimate the model and obtain parameters: β0, β1, ... βK.
◮ Make predictions for each observation’s outcome (Ŷ) via linear combination.

Suppose our model is:

Turnouti = β0 + β1Competitivenessi + β2AdSpendingi + εi,

where Turnout is measured 0-100, Competitiveness is a dummy, and AdSpending is measured 1-5. We estimate the model in R and get these coefficients: β0 = 11, βC = 25, βAS = 6.25.

SLIDE 6

Now we can predict turnout in any election given competitiveness and ad spending data. For a competitive election with lots of spending (5/5), the predicted level of turnout is:

Ŷi = 11 + 25(1) + 6.25(5) = 67.25%.

Suppose true turnout in that election was 71%. Then ui = Yi − Ŷi = 71 − 67.25 = 3.75%. Recall, OLS estimates parameters such that these errors are minimized over the whole dataset:

min Σᵢ₌₁ᴺ (Yi − Ŷi)²
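A quick R translation of this arithmetic, using the slide’s hypothetical coefficients (nothing here is estimated from real data):

b0 <- 11; bC <- 25; bAS <- 6.25   # hypothetical estimates from the slide
y.hat <- b0 + bC*1 + bAS*5        # competitive election (1), max ad spending (5)
y.hat                             # 67.25
71 - y.hat                        # residual if true turnout is 71: 3.75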

SLIDE 7

Estimating/Interpreting OLS in R

◮ Practice estimating a model using the USArrests dataset:

library(datasets)                 # USArrests ships with base R
summary(USArrests)                # inspect the variables first
murder.model <- lm(Murder ~ Assault + Rape + UrbanPop, data = USArrests)
summary(murder.model)

◮ Thanks to the linearity assumption, we can interpret coefficients as the effect of a one-unit increase in X on Y.
◮ Thus, we MUST know the units of X and Y to interpret.
◮ Check out the description of the variable codings here.

SLIDE 8

R output

Call:
lm(formula = Murder ~ Assault + Rape + UrbanPop, data = USArrests)

Residuals:
    Min      1Q  Median      3Q     Max
-4.3990 -1.9127 -0.3444  1.2557  7.4279

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  3.276639   1.737997   1.885   0.0657 .
Assault      0.039777   0.005912   6.729 2.33e-08 ***
Rape         0.061399   0.055740   1.102   0.2764
UrbanPop    -0.054694   0.027880  -1.962   0.0559 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

SLIDE 9

Let’s look at fitted values:

fitted.values(murder.model)
predict.lm(murder.model, interval = "confidence")
plot(fitted.values(murder.model), USArrests$Murder)

Residuals:

resid(murder.model)
plot(murder.model)

Other helpful commands:

coef(murder.model)
murder.model$coef[1]
confint(murder.model)

Later: lots of diagnostics for checking assumptions

SLIDE 10

Key point on interpretation

ANOVA asks: does a factor (regardless of which category you’re in) predict the outcome? OLS asks: does some variable X affect the outcome relative to a baseline (the omitted category)? Helpful example: estimating treatment effects in experiments.

                 DV: Policy Support (1-7)
EE Treatment     −0.311    (0.263)
J Treatment      −0.609∗∗  (0.248)
HA Treatment     −0.621∗∗  (0.254)
Constant          5.508∗∗∗ (0.186)
Observations      272

∗p<0.1; ∗∗p<0.05; ∗∗∗p<0.01
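A hedged sketch of how a table like this comes out of R: code the conditions as a factor, and lm() omits the first level (here a hypothetical "Control" group) as the baseline, so each treatment coefficient is that group’s difference from control. The variable names and data below are illustrative, not the actual study:

# fake data: 272 subjects randomized into control + three treatments
treat <- factor(sample(c("Control", "EE", "J", "HA"), 272, replace = TRUE),
                levels = c("Control", "EE", "J", "HA"))  # first level = omitted baseline
support <- rnorm(272, mean = 5.5, sd = 1.5)              # fake 1-7 policy support scale
summary(lm(support ~ treat))  # intercept = control mean; other rows = differences from control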

SLIDE 11

Gauss-Markov

Under certain assumptions, OLS is the Best Linear Unbiased Estimator (BLUE) of β. Assumptions:¹

1. Linearity
2. Homoskedasticity
3. Error terms are i.i.d.
4. Strict Exogeneity
5. Errors are Normally Distributed
6. No (Perfect) Multicollinearity

¹ Note: every regression text you read will express/refer to these differently.

SLIDE 12

Assumption 1: Linearity

◮ Y is a linear function of the data: Ŷi = Xiβ
◮ Typically OK if the DV is continuous.
◮ Categorical/limited DVs break linearity and require more advanced (non-linear) models, which you’ll learn in 407.
◮ Common DVs that break linearity: binary responses, which require models like logit (see the sketch below). Notice the function is non-linear: Ŷi = 1 / (1 + e^(−Xiβ))
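For reference, binary outcomes like this are fit in R with glm() rather than lm(); a minimal sketch with simulated data (not part of the slides):

x <- rnorm(200)                               # made-up predictor
y <- rbinom(200, 1, plogis(0.5 + 1.2*x))      # binary DV generated from a logit curve
logit.model <- glm(y ~ x, family = binomial(link = "logit"))
summary(logit.model)                          # coefficients are on the log-odds scale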

SLIDE 13

Assumption 2: Homoskedasticity

◮ homoskedasticity: constant error variance, i.e., the errors are approximately the same size across subgroups of the data: var(ε|X) = σ², where σ² is some constant
◮ heteroskedasticity: non-constant error variance, i.e., the errors differ across subgroups of the data
◮ easily testable/fixable (later this quarter); one common test is sketched below
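One common check, shown here as a sketch rather than the course’s prescribed method, is the Breusch-Pagan test from the lmtest package (assuming it is installed), plus a residuals-vs-fitted plot:

library(lmtest)                               # install.packages("lmtest") if needed
bptest(murder.model)                          # H0: constant error variance
plot(fitted.values(murder.model), resid(murder.model))  # look for fanning/funneling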

SLIDE 14

[no text recovered from this slide]

SLIDE 15

Assumption 3: Error terms are i.i.d.

◮ no correlation between the error terms of different observations: E(εi · εj) = 0 for i ≠ j
◮ common violation: autocorrelation
◮ easy fix: use time-series models (not simple OLS); a quick diagnostic is sketched below
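As a preview of those diagnostics, the Durbin-Watson test from the lmtest package is a standard check for first-order autocorrelation. It is mainly meaningful for time-series data, so running it on the cross-sectional murder.model is purely illustrative:

library(lmtest)
dwtest(murder.model)   # H0: no first-order autocorrelation; DW near 2 is consistent with H0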

SLIDE 16

Assumption 4: Strict Exogeneity

◮ Many ways to express it. Usually:

E(εi|Xi) = 0

◮ Jay will write it this way (same idea):

X ⊥ ε

◮ the Xs are determined outside the model and are uncorrelated with the error term
◮ a challenging assumption for political scientists (e.g., democracy/GDP, media choice/political knowledge, etc.); the simulation below shows the consequence of violating it
◮ possible solution: instrumental variables regression (next quarter)
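A small simulation, with made-up numbers, shows why the assumption matters: when X is correlated with the error term, the OLS slope is biased even in large samples:

set.seed(405)
u <- rnorm(1000)            # the error term
x <- 0.8*u + rnorm(1000)    # X partly determined by the error: exogeneity violated
y <- 2 + 1*x + u            # true slope is 1
coef(lm(y ~ x))             # estimated slope lands well above 1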

SLIDE 17

Assumption 5: Errors are Normally Distributed

◮ given your data and model, the errors are Normal: ε ∼ N(0, σ²), where σ² is some constant
◮ depends on the distribution of your variables and your model
◮ an easy problem to detect with normal probability plots: plot(murder.model, which = 2)
◮ if violated, the coefficients are OK but hypothesis testing is invalid (a quick check is sketched below)
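Beyond eyeballing the Q-Q plot, a quick numerical check (a supplementary sketch, not from the slides) is the Shapiro-Wilk test on the residuals:

plot(murder.model, which = 2)        # normal Q-Q plot, as on the slide
shapiro.test(resid(murder.model))    # H0: residuals are normally distributed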

SLIDE 18

Assumption 6: No (Perfect) Multicollinearity

◮ multicollinearity: correlation among the independent variables in a model (e.g., ideology, PID)
◮ perfect multicollinearity: two variables perfectly predict one another (e.g., dummies for male and female), so we can’t estimate the effect of one relative to the other
◮ a challenging assumption for political scientists (especially behavioralists)
◮ What does R do with perfectly multicollinear regressors? ...

SLIDE 19

dep.var <- rnorm(100, 10, 2)                      # simulated outcome
female <- rbinom(100, 1, .51)                     # dummy for female
male <- ifelse(female == 1, 0, 1)                 # male is perfectly determined by female
perf.collin.model <- lm(dep.var ~ female + male)
summary(perf.collin.model)

Coefficients: (1 not defined because of singularities)
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  10.0323     0.2757  36.388   <2e-16 ***
female       -0.2203     0.3861  -0.571     0.57
male              NA         NA      NA       NA
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.93 on 98 degrees of freedom
Multiple R-squared: 0.003313, Adjusted R-squared: -0.006858
F-statistic: 0.3257 on 1 and 98 DF, p-value: 0.5695

Answer: R drops male (reporting NA) because it is a perfect linear combination of female and the intercept.

SLIDE 20

Consequences of violating assumptions

Note: from Yanna’s lecture (2/6/14)