PS 405 Week 5 Section: OLS Regression and Its Assumptions


  1. PS 405 – Week 5 Section: OLS Regression and Its Assumptions D.J. Flynn February 11, 2014

  2. Today’s plan
  ◮ Basic OLS set-up
  ◮ Estimation/interpretation of OLS models in R
  ◮ Gauss-Markov assumptions

  3. Basic set-up
  ◮ Scalar: Y_i = β_0 + β_1 X_{1i} + β_2 X_{2i} + ... + β_K X_{Ki} + ε_i
  ◮ Matrix: Y_i = X_i β + ε_i
  ◮ Y is a (quasi-)continuous outcome, the Xs are independent variables, and ε is a residual
  ◮ We’ll use matrix form and assume X could include k = 1, 2, ... variables
  ◮ Our goal: specify a model (pick the Xs) and estimate parameters (β_0, β_1, ..., β_K) such that error is minimized

  4. Re-cap from last week
  Y_i = X_i β + ε_i, where i = 1, 2, ..., N and k = 1, 2, ..., K. For each term,
  ◮ vector or matrix?
  ◮ size?
  ◮ why are some (not all) terms indexed by i?
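A minimal R sketch, not from the original slides, using simulated data to make the sizes concrete: X is N x (K+1) (a column of ones plus K regressors), β is (K+1) x 1, and only Y, X, and ε vary with i. It also computes the OLS estimate (X'X)^{-1}X'Y by hand and checks it against lm():

  set.seed(1)
  N <- 100
  x1 <- rnorm(N); x2 <- rnorm(N)
  X <- cbind(1, x1, x2)                          # N x (K+1) design matrix: column of 1s plus K regressors
  beta <- c(2, 0.5, -1)                          # (K+1) x 1 parameter vector: the same for every i
  y <- X %*% beta + rnorm(N)                     # N x 1 outcome: y and the error vary by i
  beta.hat <- solve(t(X) %*% X) %*% t(X) %*% y   # (X'X)^{-1} X'y
  cbind(beta.hat, coef(lm(y ~ x1 + x2)))         # the by-hand estimate matches lm()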

  5. Estimating OLS
  ◮ Collect data on Y and X.
  ◮ Estimate the model and obtain parameters: β_0, β_1, ..., β_K.
  ◮ Make predictions for each observation’s outcome (Ŷ) via linear combination.
  Suppose our model is: Turnout_i = β_0 + β_1 Competitiveness_i + β_2 AdSpending_i + ε_i, where Turnout is measured 0-100, Competitiveness is a dummy, and AdSpending is measured 1-5. We estimate the model in R and get these coefficients: β_0 = 11, β_C = 25, β_AS = 6.25.

  6. Now we can predict turnout in any election given competitiveness and ad spending data. For a competitive election with lots of spending (5/5), the predicted level of turnout is:
  Ŷ_i = 11 + 25(1) + 6.25(5) = 67.25%.
  Suppose true turnout in that election was 71%. Then u_i = Y_i − Ŷ_i = 71 − 67.25 = 3.75%. Recall, OLS estimates parameters such that these errors are minimized over the whole dataset:
  min Σ_{i=1}^{N} (Y_i − Ŷ_i)²
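For reference, the same arithmetic in R, using the coefficients from the slide (the variable names are illustrative only):

  b <- c(intercept = 11, competitive = 25, ad.spending = 6.25)
  x.new <- c(1, 1, 5)       # intercept term, competitive election (dummy = 1), ad spending = 5
  y.hat <- sum(b * x.new)   # 11 + 25*1 + 6.25*5 = 67.25
  71 - y.hat                # error for this election: 3.75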

  7. Estimating/Interpreting OLS in R
  ◮ Practice estimating a model using the USArrests dataset:
  library(datasets)
  summary(USArrests)
  murder.model <- lm(Murder ~ Assault + Rape + UrbanPop, data = USArrests)
  summary(murder.model)
  ◮ Thanks to the linearity assumption, we can interpret coefficients as the effect of a one-unit increase in X on Y.
  ◮ Thus, we MUST know the units of X and Y to interpret them.
  ◮ Check out the description of the variable codings with ?USArrests.

  8. R output
  Call:
  lm(formula = Murder ~ Assault + Rape + UrbanPop, data = USArrests)

  Residuals:
      Min      1Q  Median      3Q     Max
  -4.3990 -1.9127 -0.3444  1.2557  7.4279

  Coefficients:
               Estimate Std. Error t value Pr(>|t|)
  (Intercept)  3.276639   1.737997   1.885   0.0657 .
  Assault      0.039777   0.005912   6.729 2.33e-08 ***
  Rape         0.061399   0.055740   1.102   0.2764
  UrbanPop    -0.054694   0.027880  -1.962   0.0559 .
  ---
  Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

  9. Let’s look at fitted values:
  fitted.values(murder.model)
  predict.lm(murder.model, interval = "confidence")
  plot(fitted.values(murder.model), USArrests$Murder)
  Residuals:
  resid(murder.model)
  plot(murder.model)
  Other helpful commands:
  coef(murder.model)
  murder.model$coef[1]
  confint(murder.model)
  Later: lots of diagnostics for checking assumptions
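As a usage sketch (not on the original slide), predict.lm() can also take new data; the predictor values below are made up:

  new.state <- data.frame(Assault = 200, Rape = 20, UrbanPop = 70)   # hypothetical state profile
  predict(murder.model, newdata = new.state, interval = "confidence")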

  10. Key point on interpretation
  ◮ ANOVA = does a factor (regardless of which category you’re in) predict the outcome?
  ◮ OLS = does some variable, X, affect the outcome relative to a baseline (the omitted category)?
  Helpful example: estimating treatment effects in experiments.
  DV: Policy Support (1-7)
  EE Treatment   −0.311 (0.263)
  J Treatment    −0.609** (0.248)
  HA Treatment   −0.621** (0.254)
  constant        5.508*** (0.186)
  Observations    272
  *p < 0.1; **p < 0.05; ***p < 0.01
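A simulated sketch of the contrast (made-up data, not the experiment reported in the table): anova() asks whether the treatment factor as a whole predicts the outcome, while the lm() coefficients compare each treatment to the omitted baseline category:

  set.seed(2)
  treat <- factor(sample(c("Control", "EE", "J", "HA"), 272, replace = TRUE))
  support <- 5.5 + ifelse(treat == "Control", 0, -0.5) + rnorm(272)   # simulated 1-7 policy support
  exp.model <- lm(support ~ treat)
  summary(exp.model)   # each coefficient: effect of that treatment relative to the omitted baseline ("Control")
  anova(exp.model)     # F test: does the treatment factor, taken as a whole, predict support?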

  11. Gauss-Markov
  Under certain assumptions, OLS is the Best Linear Unbiased Estimator of β.
  Assumptions:¹
  1. Linearity
  2. Homoskedasticity
  3. Error terms are i.i.d.
  4. Strict exogeneity
  5. Errors are normally distributed
  6. No (perfect) multicollinearity
  ¹ Note: every regression text you read will express/refer to these differently.

  12. Assumption 1: Linearity
  ◮ Y is a linear function of the data: Ŷ_i = X_i β
  ◮ Typically OK if the DV is continuous.
  ◮ Categorical/limited DVs break linearity and require more advanced (non-linear) models, which you’ll learn in 407.
  ◮ Common DVs that break linearity: binary responses, which call for models like the logit. Notice the function is non-linear:
  Ŷ_i = 1 / (1 + e^{−X_i β})
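A small simulated illustration (not from the slides): fitting the same binary DV with a linear model and with a logit shows why the linear form breaks down:

  set.seed(3)
  x <- rnorm(200)
  y <- rbinom(200, 1, plogis(0.5 + 1.2 * x))    # simulated binary DV
  lpm <- lm(y ~ x)                              # linear fit: predictions can fall outside [0, 1]
  logit.model <- glm(y ~ x, family = binomial)  # logit: Yhat = 1 / (1 + exp(-XB)), always in (0, 1)
  range(fitted(lpm))
  range(fitted(logit.model))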

  13. Assumption 2: Homoskedasticity
  ◮ Homoskedasticity: constant error variance, i.e., errors are approximately the same size across subgroups of the data: var(ε | X) = σ², where σ² is some constant.
  ◮ Heteroskedasticity: non-constant error variance, i.e., errors differ across subgroups of the data.
  ◮ Easily testable/fixable (later this quarter); one common check is sketched below.
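One common check, sketched with the murder.model from earlier (the lmtest package is an assumption here and must be installed separately):

  plot(fitted(murder.model), resid(murder.model))   # look for a fan/funnel shape in the residuals
  # install.packages("lmtest")                      # if not already installed
  library(lmtest)
  bptest(murder.model)                              # Breusch-Pagan test: H0 = constant error variance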

  14. Assumption 3: Error terms are i.i.d.
  ◮ No correlation between error terms on different observations: E(ε_i · ε_j) = 0 for i ≠ j
  ◮ Common violation: autocorrelation
  ◮ Easy fix: use time series models (not simple OLS)
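Two quick diagnostics, run on murder.model purely to show the commands (USArrests is cross-sectional, so autocorrelation is not really the worry here; lmtest is assumed to be installed):

  acf(resid(murder.model))   # with time-ordered data, spikes outside the bands suggest autocorrelation
  library(lmtest)
  dwtest(murder.model)       # Durbin-Watson test: H0 = no first-order autocorrelation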

  15. Assumption 4: Strict Exogeneity
  ◮ Many ways to express it. Usually: E(ε_i | X_i) = 0
  ◮ Jay will write it this way (same idea): X ⊥ ε
  ◮ The Xs are determined outside the model and are uncorrelated with the error term.
  ◮ A challenging assumption for political scientists (e.g., democracy/GDP, media choice/political knowledge, etc.)
  ◮ Possible solution: instrumental variables regression (next quarter)
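A hypothetical sketch of the instrumental-variables fix, using simulated data and the AER package (an assumption, not part of the slides): z shifts x but is unrelated to the structural error, so it can serve as an instrument:

  set.seed(4)
  z <- rnorm(500)                       # instrument
  e <- rnorm(500)                       # structural error
  x <- 0.8 * z + 0.5 * e + rnorm(500)   # x is endogenous: correlated with e
  y <- 1 + 2 * x + e
  # install.packages("AER")
  library(AER)
  summary(lm(y ~ x))                    # OLS estimate of the true effect (2) is biased
  summary(ivreg(y ~ x | z))             # two-stage least squares using z as the instrument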

  16. Assumption 5: Errors are Normally Distributed
  ◮ Given your data and model, the errors are Normal: ε ~ N(0, σ²), where σ² is some constant.
  ◮ Depends on the distribution of your variables and the model.
  ◮ An easy problem to detect with normal probability plots: plot(murder.model, which = 2)
  ◮ If violated, the coefficients are OK, but hypothesis testing is invalid.
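In addition to the plot on the slide, a formal test of the residuals is available in base R:

  plot(murder.model, which = 2)        # normal probability (Q-Q) plot: points should fall on the line
  shapiro.test(resid(murder.model))    # Shapiro-Wilk test: H0 = residuals are normally distributed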

  17. Assumption 6: No (Perfect) Multicollinearity
  ◮ Multicollinearity: correlation among independent variables in a model (e.g., ideology, PID)
  ◮ Perfect multicollinearity: two variables perfectly predict one another (e.g., dummies for male, female), so we can’t estimate the effect of one relative to the other.
  ◮ A challenging assumption for political scientists (especially behavioralists).
  ◮ What does R do with perfectly multicollinear regressors?......

  18. dep.var <- rnorm(100, 10, 2)
  female <- rbinom(100, 1, .51)
  male <- ifelse(female == 1, 0, 1)
  perf.collin.model <- lm(dep.var ~ female + male)
  summary(perf.collin.model)

  Coefficients: (1 not defined because of singularities)
              Estimate Std. Error t value Pr(>|t|)
  (Intercept)  10.0323     0.2757  36.388   <2e-16 ***
  female       -0.2203     0.3861  -0.571     0.57
  male              NA         NA      NA       NA
  ---
  Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

  Residual standard error: 1.93 on 98 degrees of freedom
  Multiple R-squared: 0.003313, Adjusted R-squared: -0.006858
  F-statistic: 0.3257 on 1 and 98 DF, p-value: 0.5695
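R silently drops one of the perfectly collinear regressors (the NA row above). For non-perfect multicollinearity, which R will not drop for you, variance inflation factors are one common diagnostic; this sketch uses the car package (an assumption, not part of the slides) and the earlier murder.model:

  # install.packages("car")
  library(car)
  vif(murder.model)   # rule of thumb: values well above ~5-10 suggest problematic multicollinearity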

  19. Consequences of violating assumptions
  Note: from Yanna’s lecture (2/6/14)
