

SLIDE 1

OLS Goodness-of-Fit Inference

Ordinary Least Squares (Linear) Regression

Department of Political Science and Government Aarhus University

February 17, 2015

SLIDE 2

1. OLS
2. Goodness-of-Fit
3. Inference

SLIDE 4

Uses of Regression

1. Description
2. Prediction
3. Causal Inference

SLIDE 5

Descriptive Inference

1. We want to understand a population of cases
2. We cannot observe them all, so:
   1. Draw a representative sample
   2. Perform mathematical procedures on the sample data
   3. Use assumptions to make inferences about the population
   4. Express uncertainty about those inferences based on assumptions
SLIDE 6

Parameter Estimation

We want to know the population parameter θ. If we obtain a representative sample of population units:
- Our sample statistic θ̂ is an unbiased estimate of θ
- Our sampling procedure dictates how uncertain we are about the value of θ

SLIDE 7

An Example

We want to know Ȳ (the population mean). Our estimator is the sample mean, which produces the sample estimate ȳ:

  ȳ = (1/n) Σ_{i=1}^{n} yi   (1)

The sampling variance is our uncertainty:

  Var(ȳ) = s²/n   (2)

where s² is the sample element variance.
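As a quick numerical check of equations (1) and (2), here is a minimal NumPy sketch; the six data values are made up for illustration:

```python
import numpy as np

# Hypothetical sample of n = 6 units.
y = np.array([1.0, 5.0, 3.0, 6.0, 2.0, 7.0])
n = len(y)

y_bar = y.sum() / n                       # equation (1): sample mean
s2 = ((y - y_bar) ** 2).sum() / (n - 1)   # sample element variance s²
var_y_bar = s2 / n                        # equation (2): sampling variance of ȳ
se_y_bar = np.sqrt(var_y_bar)             # standard error of the mean
```

The standard error shrinks with √n: quadrupling the sample size halves it.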

SLIDE 8

Uncertainty

- We never know θ; our θ̂ is an estimate that may not equal θ
- Unbiased due to the Law of Large Numbers; for ȳ: ȳ ~ N(Ȳ, σ²/n)
- The size of the sampling variance depends on: the element variance and the sample size!
- Note: SE(ȳ) = √Var(ȳ)
- We may want to know θ̂ per se, but we are mostly interested in it as an estimate of θ

SLIDE 9

Causal Inference

1. Everything that goes into descriptive inference
2. Plus, philosophical assumptions
3. Plus, randomization or a perfectly specified model

SLIDE 13

Questions about philosophical assumptions?

SLIDE 14

Ways of Thinking About OLS

1. Estimating a unit-level causal effect
2. Ratio of Cov(X, Y) and Var(X)
3. Minimizing the residual sum of squares (SSR)
4. Line (or surface) of best fit

SLIDE 19

Bivariate Regression I

Y is continuous; X is a randomized treatment indicator/dummy (0, 1). How do we know if the treatment X had an effect on Y? Look at the mean difference: E[Yi|Xi = 1] − E[Yi|Xi = 0]
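The mean difference is exactly the OLS slope on the treatment dummy, which a minimal NumPy sketch can confirm (the outcome values here are hypothetical):

```python
import numpy as np

# Hypothetical randomized experiment: x is a 0/1 treatment dummy.
x = np.array([0, 0, 0, 1, 1, 1], dtype=float)
y = np.array([1.0, 2.0, 3.0, 4.0, 6.0, 5.0])

# Difference in group means: E[Y|X=1] - E[Y|X=0]
mean_diff = y[x == 1].mean() - y[x == 0].mean()

# OLS slope as Cov(x, y) / Var(x) (sums of cross-products; the 1/(n-1) factors cancel)
b1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
```

With a binary regressor the two computations always coincide.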

SLIDE 21

Three Equations

1. Population: Y = β0 + β1X (+ ε)
2. Sample estimate: ŷ = β̂0 + β̂1x
3. Unit: yi = β̂0 + β̂1xi + ei = ȳ0 + (y1i − y0i)xi + (y0i − ȳ0)

SLIDE 22

Bivariate Regression I

The mean difference (E[Yi|Xi = 1] − E[Yi|Xi = 0]) is the regression line slope. The slope (β) is defined as ΔY/ΔX:
  ΔY = E[Yi|X = 1] − E[Yi|X = 0]
  ΔX = 1 − 0 = 1

SLIDES 24–30

[Scatterplot build (x and y both on 1–7 scales): the group means ȳ0 and ȳ1, the rise Δy over the run Δx giving the slope β̂1, the intercept β̂0, and the fitted line.]

  ŷ = 2 + 3x
  yi = 2 + 3xi + ei, where ei is the residual

SLIDE 31

Systematic versus unsystematic components of the data

- Systematic: the regression line (slope). Linear regression estimates the conditional means of the population data (i.e., E[Y|X])
- Unsystematic: the error term is the deviation of observations from the line
- The difference between each value yi and ŷi is the residual: ei
- OLS produces an estimate of the relationship between X and Y that minimizes the residual sum of squares
SLIDE 32

Why are there residuals?

- Omitted variables
- Measurement error
- Fundamental randomness

SLIDE 34

Bivariate Regression I

The mean difference (E[Yi|Xi = 1] − E[Yi|Xi = 0]) is the regression line slope, with ΔY = E[Yi|X = 1] − E[Yi|X = 0] and ΔX = 1 − 0 = 1.

How do we know if this is a significant difference? We'll come back to that.

SLIDE 36

Ways of Thinking About OLS

1. Estimating a unit-level causal effect
2. Ratio of Cov(X, Y) and Var(X)

SLIDE 38

Bivariate Regression II

Y is continuous; X is continuous (and randomized). How do we know if the treatment X had an effect on Y?
- Correlation coefficient (ρ)
- Regression coefficient (slope; β1)

SLIDE 39

Correlation Coefficient (ρ)

Measures how well a scatterplot is represented by a straight (non-horizontal) line.

Formal definition: ρ = Cov(X, Y) / (σX σY)

As a reminder:
  Cov(x, y) = Σ_{i=1}^{n} (xi − x̄)(yi − ȳ)
  sx = √( Σ_{i=1}^{n} (xi − x̄)² )

SLIDE 43

OLS Coefficient (β1)¹

Measures ΔY given ΔX.

Formal definition: β1 = Cov(X, Y) / Var(X)

As a reminder:
  Cov(x, y) = Σ_{i=1}^{n} (xi − x̄)(yi − ȳ)
  Var(x) = Σ_{i=1}^{n} (xi − x̄)²

ρ̂ and β̂1 are just scaled versions of Cov(x, y).

¹ The multivariate formula involves matrices; Week 20.

SLIDE 46

Minimum Mathematical Requirements

1. Do we need variation in X? Yes; otherwise we would be dividing by zero
2. Do we need variation in Y? No; β̂1 can equal zero
3. How many observations do we need? n ≥ k, where k is the number of parameters to be estimated

SLIDES 52–56

[Scatterplot build (x and y both on 1–7 scales): the six data points with the means x̄ and ȳ marked, along with each point's deviations from them.]

SLIDE 57

Calculations

  xi  yi  xi − x̄  yi − ȳ  (xi − x̄)(yi − ȳ)  (xi − x̄)²
   1   1     ?       ?            ?               ?
   2   5     ?       ?            ?               ?
   3   3     ?       ?            ?               ?
   4   6     ?       ?            ?               ?
   5   2     ?       ?            ?               ?
   6   7     ?       ?            ?               ?
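The table's columns can be filled in programmatically; a NumPy sketch using the slide's six (x, y) pairs:

```python
import numpy as np

# The six (x, y) pairs from the slide's table.
x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([1, 5, 3, 6, 2, 7], dtype=float)

dx = x - x.mean()      # column: xi - x̄
dy = y - y.mean()      # column: yi - ȳ
cross = dx * dy        # column: (xi - x̄)(yi - ȳ)
dx2 = dx ** 2          # column: (xi - x̄)²

b1 = cross.sum() / dx2.sum()    # OLS slope = Cov(x, y) / Var(x)
b0 = y.mean() - b1 * x.mean()   # OLS intercept
```

The column sums are 12 and 17.5, so β̂1 = 12/17.5 ≈ 0.6857, matching the intercept example a few slides later.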

SLIDE 58

Intercept β̂0

- Simple formula: β̂0 = ȳ − β̂1x̄
- Intuition: the OLS fit always runs through the point (x̄, ȳ)
- Example: β̂0 = 4 − 0.6857 × 3.5 = 1.6, so ŷ = 1.6 + 0.6857x

SLIDE 62

[Scatterplot (x and y both on 1–7 scales) with the fitted OLS line through (x̄, ȳ).]

SLIDE 63

Ways of Thinking About OLS

1. Estimating a unit-level causal effect
2. Ratio of Cov(X, Y) and Var(X)
3. Minimizing residual sum of squares (SSR)
slide-65
SLIDE 65

OLS Goodness-of-Fit Inference

OLS Minimizes SSR

Total Sum of Squares (SST):

n i=1(yi − ¯

y)2 We can partition SST into two parts (ANOVA):

Explained Sum of Squares (SSE) Residual Sum of Squares (SSR)

SST = SSE + SSR OLS is the line with the lowest SSR
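The decomposition can be verified numerically; a NumPy sketch using the six data points from the calculations table:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([1, 5, 3, 6, 2, 7], dtype=float)

# Fit the OLS line
b1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

sst = ((y - y.mean()) ** 2).sum()       # total sum of squares
sse = ((y_hat - y.mean()) ** 2).sum()   # explained sum of squares
ssr = ((y - y_hat) ** 2).sum()          # residual sum of squares
# SST = SSE + SSR holds exactly for the OLS fit
```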

SLIDES 66–72

[Scatterplot build (x and y both on 1–7 scales) with x̄ and ȳ marked, illustrating the deviations that make up the sums of squares.]

SLIDE 73

Questions about OLS calculations?

SLIDE 74

Are Our Estimates Any Good?

Yes, if:
1. Works mathematically
2. Causally valid theory
3. Linear relationship between X and Y
4. X is measured without error
5. No missing data (or MCAR; see Lecture 5)
6. No confounding

SLIDE 75

Linear Relationship

- If linear, no problems
- If non-linear, we need to transform:
  - Power terms (e.g., x², x³)
  - Logs (e.g., log(x))
  - Other transformations
  - If categorical: convert to a set of indicators
  - Multivariate interactions (next week)

SLIDE 76

Coefficient Interpretation Activity

Four types of variables:
1. Indicator (0, 1)
2. Categorical
3. Ordinal
4. Interval

How do we interpret a coefficient on each of these types of variables?

SLIDE 77

Notes on Interpretation

- The effect β1 is constant across values of x. That is not true when there are:
  - Interaction terms (next week)
  - Nonlinear transformations (e.g., x²)
  - Nonlinear regression models (e.g., logit/probit)
- Interpretations are sample-level: sample representativeness determines generalizability
- Remember uncertainty: these are estimates, not population parameters

SLIDE 81

Measurement Error in Regressor(s)

We want the effect of x, but we observe x*, where x = x* + w:

  y = β0 + β1x* + ε
    = β0 + β1(x − w) + ε
    = β0 + β1x + (ε − β1w)
    = β0 + β1x + v

SLIDE 82

Measurement Error in Regressor(s)

- Produces attenuation: as measurement error increases, β1 → 0
- Our coefficients fit the observed data, but they are biased estimates of our population equation
- This applies to all β̂ in a multivariate regression, and the direction of the bias is unknown
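Attenuation is easy to see by simulation; a sketch with an invented data-generating process (true slope 2, regressor and measurement error both with unit variance, so the attenuated slope should be near 1):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(size=n)                    # true regressor, Var(x) = 1
y = 1.0 + 2.0 * x + rng.normal(size=n)    # true slope = 2
x_star = x + rng.normal(size=n)           # observed with measurement error, Var(w) = 1

def slope(a, b):
    """Bivariate OLS slope: Cov(a, b) / Var(a)."""
    return ((a - a.mean()) * (b - b.mean())).sum() / ((a - a.mean()) ** 2).sum()

b_clean = slope(x, y)       # close to the true slope, 2
b_noisy = slope(x_star, y)  # attenuated toward 2 * Var(x) / (Var(x) + Var(w)) = 1
```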

SLIDE 83

Measurement Error in Y

- Not necessarily a problem
- If random (i.e., uncorrelated with x), it costs us precision
- If systematic, who knows?!
- If censored, see Lectures 11 and/or 12

SLIDE 84

Missing Data

Missing data can be a big problem We will discuss it in Lecture 5

SLIDE 85

Confounding (Selection Bias)

- If x is not randomly assigned, potential outcomes are not independent of x
- Other factors explain why a unit i received its particular value xi
- In matching, we obtain this conditional independence by comparing units that are identical on all confounding variables

SLIDE 86

Omitted Variables

  E[Yi|Xi = 1] − E[Yi|Xi = 0]          (Naive Effect)
  = E[Y1i|Xi = 1] − E[Y0i|Xi = 1]      (Average Treatment Effect on the Treated, ATT)
  + E[Y0i|Xi = 1] − E[Y0i|Xi = 0]      (Selection Bias)
SLIDE 87

[Causal graph over the variables X, D, Y, Z, A, B, and C.]
slide-88
SLIDE 88

OLS Goodness-of-Fit Inference

Omitted Variable Bias

We want to estimate: Y = β0 + β1X + β2Z + ǫ We actually estimate: ˜ y = ˜ β0 + ˜ β1x + ǫ = ˜ β0 + ˜ β1x + (0 ∗ z) + ǫ = ˜ β0 + ˜ β1x + ν Bias: ˜ β1 = ˆ β1 + ˆ β2˜ δ1, where ˜ z = ˜ δ0 + ˜ δ1x
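The bias formula is an exact in-sample identity, which a NumPy sketch with simulated data can confirm (the coefficients and correlations below are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)
z = 0.5 * x + rng.normal(size=n)                 # z correlated with x
y = 1.0 + 2.0 * x - 3.0 * z + rng.normal(size=n)

def ols(X, y):
    """OLS coefficients (intercept first) via least squares."""
    X = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(X, y, rcond=None)[0]

b_full = ols(np.column_stack([x, z]), y)  # [β̂0, β̂1, β̂2] from the full model
b_short = ols(x, y)                       # [β̃0, β̃1] from the model omitting z
delta = ols(x, z)                         # auxiliary regression: z̃ = δ̃0 + δ̃1x

# Identity: β̃1 = β̂1 + β̂2·δ̃1, exactly (not just in expectation)
```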

SLIDE 89

Size and Direction of Bias

Bias: β̃1 = β̂1 + β̂2δ̃1, where z̃ = δ̃0 + δ̃1x

            Corr(x, z) < 0   Corr(x, z) > 0
  β2 < 0    Positive         Negative
  β2 > 0    Negative         Positive

SLIDE 90

Aside: Three Meanings of "Endogeneity"

Formally, endogeneity is when Cov(X, ε) ≠ 0:
1. Measurement error in regressors
2. Omitted variables associated with included regressors ("specification error"; confounding)
3. Lack of temporal precedence

SLIDE 91

Example: Englebert

What is his research question? What is his theory? What does the graph look like? What is his analysis?

SLIDE 92

Common Conditioning Strategies

1. Condition on nothing ("naive effect")
2. Condition on some variables
3. Condition on all observables

Which of these are good strategies?

SLIDE 97

What goes in our regression?

Use theory to build causal models; often, a causal graph helps. Some guidance:
- Include confounding variables

SLIDE 99

[Causal graph over the variables X, D, Y, Z, A, B, and C.]

SLIDE 100

What goes in our regression?

Use theory to build causal models; often, a causal graph helps. Some guidance:
- Include confounding variables
- Do not include post-treatment variables

SLIDE 102

[Causal graph over the variables X, D, Y, Z, A, B, and C.]

SLIDE 103

Post-treatment Bias

We usually want to know the total effect of a cause. If we include a mediator, D, of the X → Y relationship, the coefficient on X:
- Only reflects the direct effect
- Excludes the indirect effect of X through D

So don't control for mediators!

SLIDE 104

What goes in our regression?

Use theory to build causal models; often, a causal graph helps. Some guidance:
- Include confounding variables
- Do not include post-treatment variables
- Do not include collinear variables

SLIDE 106

Minimum Mathematical Requirements

1. Do we need variation in X? Yes; otherwise we would be dividing by zero
2. Do we need variation in Y? No; β̂1 can equal zero
3. How many observations do we need? n ≥ k, where k is the number of parameters to be estimated
4. Can we have highly correlated regressors? Generally no (due to multicollinearity)

SLIDE 109

What goes in our regression?

Use theory to build causal models; often, a causal graph helps. Some guidance:
- Include confounding variables
- Do not include post-treatment variables
- Do not include collinear variables
- Including irrelevant variables costs certainty
- Including variables that affect Y alone increases certainty

SLIDE 112

[Causal graph over the variables X, D, Y, Z, A, B, and C.]

SLIDE 113

Questions about specification?

SLIDE 114

Multivariate Regression Interpretation

- All our interpretation rules from earlier still apply in a multivariate regression
- Now we interpret a coefficient as an effect "all else constant"
- Generally, it is not good to give all coefficients a causal interpretation:
  - Think "forward causal inference"
  - We're interested in the X → Y effect
  - All other coefficients are there as "controls"

SLIDE 115

From Line to Surface I

- In simple regression, we estimate a line
- In multiple regression, we estimate a surface
- Each coefficient is the marginal effect, all else constant (at the mean)
- This can be hard to picture in your mind

SLIDES 116–118

From Line to Surface II

[Build from a line to a surface: first ŷ = β̂0 + β̂1X in the (x, y) plane, then ŷ = β̂0 + β̂2Z, then the plane ŷ = β̂0 + β̂1X + β̂2Z over axes x, y, and z.]

SLIDE 119

Are Our Estimates Any Good?

Yes, if:
1. Works mathematically
2. Causally valid theory
3. Linear relationship between X and Y
4. X is measured without error
5. No missing data (or MCAR; see Lecture 5)
6. No confounding

SLIDE 120

OLS is BLUE

BLUE: Best Linear Unbiased Estimator. Gauss-Markov assumptions:
1. Linearity in parameters
2. Random sampling
3. No perfect multicollinearity
4. Exogeneity (E[ε|X] = 0)
5. Homoskedasticity (Var(ε|X) = σ²)

Assumptions 1–4 establish that OLS is unbiased; assumption 5 makes OLS the best (minimum-variance) linear unbiased estimator.

SLIDE 121

Squared vs. Absolute Errors

- We conventionally use the sum of squared errors
- Using absolute errors is also unbiased
- The sum of squared errors:
  - more heavily weights outliers
  - has a smaller variance
- Thus OLS is the Best LUE

SLIDE 122

1 OLS 2 Goodness-of-Fit 3 Inference

SLIDE 123

Goodness-of-Fit

We want to know: "How good is our model?" We can answer: "How well does our model fit the observed data?" Is this what we want to know?

SLIDE 126

Correlation

- Definition: Corr(x, y) = r̂x,y = Cov(x, y) / ((n − 1) sx sy)
- The slope β̂1 and the correlation r̂x,y are simply different scalings of Cov(x, y)
- Interpretation: how well is the cloud of points summarized by a straight line?
- Units: none (range −1 to 1)

SLIDE 127

Coefficient of Determination (R²)

- Definition: R² = r̂²x,y = SSE/SST = 1 − SSR/SST
- Interpretation: how much of the total variation in y is explained by the model?
- But R² increases simply by adding more variables, so:
  Adjusted R² = R² − (1 − R²) · k/(n − k − 1), where k is the number of regressors
- Units: none (range 0 to 1)
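Both quantities are straightforward to compute; a NumPy sketch using the running six-point example:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([1, 5, 3, 6, 2, 7], dtype=float)
n, k = len(y), 1   # k = number of regressors

# Fit the bivariate OLS line
b1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
b0 = y.mean() - b1 * x.mean()
ssr = ((y - (b0 + b1 * x)) ** 2).sum()
sst = ((y - y.mean()) ** 2).sum()

r2 = 1 - ssr / sst                        # R² = 1 - SSR/SST
adj_r2 = r2 - (1 - r2) * k / (n - k - 1)  # penalize for the number of regressors
```

The adjustment always pulls R² down, and more so when k is large relative to n.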

SLIDE 128

Standard Error of the Regression (SER)

- Also called "root mean squared error" or just σ̂
- Definition: σ̂ = √( SSR / (n − p) ), where p is the number of parameters estimated
- Interpretation: how far, on average, are the observed y values from their corresponding fitted values ŷ?
  - sd(y) is how far, on average, a given yi is from ȳ
  - σ̂ is how far, on average, a given yi is from ŷi
- Units: same as y (range 0 to sd(y))

SLIDE 129

The F-test

- Definition: a test of whether any of our coefficients differ from zero
- In a bivariate regression, F = t²
- Interpretation: do any of the coefficients differ from zero?
- Not a very interesting measure
- Units: none (range 0 to ∞)

SLIDE 130

. reg growth lcon

      Source |       SS       df       MS           Number of obs =      44
-------------+------------------------------       F(  1,    42) =    0.09
       Model |  .000038348     1  .000038348       Prob > F      =  0.7615
    Residual |  .017255198    42  .000410838       R-squared     =  0.0022
-------------+------------------------------       Adj R-squared = -0.0215
       Total |  .017293546    43  .000402175       Root MSE      =  .02027

------------------------------------------------------------------------------
      growth |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lcon |  -.0017819   .0058325    -0.31   0.761    -.0135524    .0099886
       _cons |   .0158988   .0390155     0.41   0.686    -.0628376    .0946353
------------------------------------------------------------------------------

SLIDE 131

The F-test for nested models

We can use an F-test to compare the fit of two nested models:

  ŷ = β̂0 + β̂1x1
  ŷ = β̂0 + β̂1x1 + β̂2x2 + . . .

The reduced model is nested within the expanded model. Interpretation: does adding the additional variables significantly reduce SSR?

SLIDE 132

. nestreg: reg growth lcon (lconsq)

  +-------------------------------------------------------------+
  |       |          Block  Residual                     Change |
  | Block |       F     df        df  Pr > F       R2     in R2 |
  |-------+-----------------------------------------------------|
  |     1 |    0.09      1        42  0.7615   0.0022           |
  |     2 |    7.98      1        41  0.0073   0.1649    0.1626 |
  +-------------------------------------------------------------+

SLIDE 133

Questions about model fit?

SLIDE 134

1 OLS 2 Goodness-of-Fit 3 Inference

SLIDE 135

Inference from Sample to Population

- We want to know the population parameter θ; we only observe the sample estimate θ̂
- We have a guess but are also uncertain
- What range of values for θ does our θ̂ imply? Are values in that range large or meaningful?

SLIDE 137

How Uncertain Are We?

- Our uncertainty depends on sampling procedures, most importantly the sample size
- As n → ∞, uncertainty → 0
- We typically summarize our uncertainty as the standard error

SLIDE 138

Standard Errors (SEs)

Definition: "The standard error of a sample estimate is the average distance that a sample estimate (θ̂) would be from the population parameter (θ) if we drew many separate random samples and applied our estimator to each."

In bivariate regression:

  Var(β̂1) = ( SSR / (n − 2) ) / SSTx

Thus the SE is a ratio of the unexplained variance in y (weighted by sample size) to the variation in x. Units: same as the coefficient (y/x).
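The variance formula can be applied directly; a NumPy sketch using the running six-point example:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([1, 5, 3, 6, 2, 7], dtype=float)
n = len(y)

sst_x = ((x - x.mean()) ** 2).sum()   # SSTx: total variation in x
b1 = ((x - x.mean()) * (y - y.mean())).sum() / sst_x
b0 = y.mean() - b1 * x.mean()
ssr = ((y - (b0 + b1 * x)) ** 2).sum()

var_b1 = (ssr / (n - 2)) / sst_x      # the slide's formula
se_b1 = np.sqrt(var_b1)               # ≈ 0.53 for these data
```

Note how the two levers on the slide show up directly: a larger SSTx or a smaller SSR shrinks the standard error.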

SLIDE 139

What affects the size of SEs?

- Larger variance in x means smaller SEs
- More unexplained variance in y means bigger SEs
- More observations shrink the numerator, thus smaller SEs
- Other factors: homoskedasticity, clustering

Interpretation:
- Large SE: uncertain about the population effect size
- Small SE: certain about the population effect size

SLIDE 140

Ways to Express Our Uncertainty

1. Standard error
2. Confidence interval
3. t-statistic
4. p-value

SLIDE 141

. reg growth lcon
(same regression output as SLIDE 130)

SLIDE 142

Confidence Interval (CI)

Definition: were we to repeat our procedure of sampling, applying our estimator, and calculating a confidence interval from the population, a fixed percentage of the resulting intervals would include the true population-level slope.

Interpretation: if the confidence interval overlaps zero, we are uncertain whether β differs from zero.

SLIDE 143

Confidence Interval (CI)

- A CI is simply a range, centered on the slope
- Units: same scale as the coefficient (y/x)
- We can calculate CIs of varying confidence levels
- Conventionally, α = 0.05, so 95% of the CIs will include β
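A 95% CI for the slope is β̂1 ± t* × SE(β̂1); a NumPy sketch using the running six-point example, with the critical value t* hardcoded from a t table (2.776 for n − 2 = 4 degrees of freedom):

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([1, 5, 3, 6, 2, 7], dtype=float)
n = len(y)

sst_x = ((x - x.mean()) ** 2).sum()
b1 = ((x - x.mean()) * (y - y.mean())).sum() / sst_x
b0 = y.mean() - b1 * x.mean()
ssr = ((y - (b0 + b1 * x)) ** 2).sum()
se_b1 = np.sqrt((ssr / (n - 2)) / sst_x)

t_crit = 2.776  # t_{0.975} with 4 degrees of freedom (from a t table)
ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
```

Here the interval overlaps zero: with only six observations we cannot rule out a flat population slope.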

SLIDE 144

t-statistic

A measure of how large a coefficient is relative to our uncertainty about its size. Typically used to test a formal null hypothesis:
- No-effect null: t = β̂1 / SE(β̂1)
- Any other null: t = (β̂1 − α) / SE(β̂1), where α is our null-hypothesis effect size

Note: the t-statistic from a t-test of mean difference is the same as the t-statistic from a t-test on an OLS slope for a dummy covariate.
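That equivalence can be verified numerically; a NumPy sketch with hypothetical two-group data:

```python
import numpy as np

# Hypothetical two-group experiment.
y0 = np.array([1.0, 2.0, 3.0, 2.0])   # control outcomes
y1 = np.array([4.0, 6.0, 5.0, 5.0])   # treated outcomes
y = np.concatenate([y0, y1])
x = np.concatenate([np.zeros(len(y0)), np.ones(len(y1))])  # 0/1 dummy
n = len(y)

# t-statistic on the OLS dummy slope
sst_x = ((x - x.mean()) ** 2).sum()
b1 = ((x - x.mean()) * (y - y.mean())).sum() / sst_x
b0 = y.mean() - b1 * x.mean()
ssr = ((y - (b0 + b1 * x)) ** 2).sum()
t_ols = b1 / np.sqrt((ssr / (n - 2)) / sst_x)

# Equal-variance two-sample t-statistic on the mean difference
sp2 = (((y0 - y0.mean()) ** 2).sum() + ((y1 - y1.mean()) ** 2).sum()) / (n - 2)
t_mean_diff = (y1.mean() - y0.mean()) / np.sqrt(sp2 * (1 / len(y0) + 1 / len(y1)))
```

The two statistics agree to machine precision because the pooled within-group sum of squares is exactly the regression's SSR.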

SLIDE 146

p-value

- A summary measure in a hypothesis test
- General definition: "the probability of a statistic as extreme as the one we observed, if the null hypothesis were true, the statistic is distributed as we assume, and the data are as variable as observed"
- Definition in a regression context: "the probability of a slope as large as the one we observed . . . "
slide-147
SLIDE 147

OLS Goodness-of-Fit Inference

The p-value is not:

The probability that a hypothesis is true or false A reflection of our confidence or certainty about the result The probability that the true slope is in any particular range of values A statement about the importance or substantive size of the effect

slide-148
SLIDE 148

OLS Goodness-of-Fit Inference

Significance

1 Substantive significance 2 Statistical significance

slide-149
SLIDE 149

OLS Goodness-of-Fit Inference

Significance

1 Substantive significance

Is the effect size (or range of possible effect sizes) important in the real world?

2 Statistical significance

slide-150
SLIDE 150

OLS Goodness-of-Fit Inference

Significance

1 Substantive significance

Is the effect size (or range of possible effect sizes) important in the real world?

2 Statistical significance

Is the effect size (or range of possible effect sizes) larger than a predetermined threshold? Conventionally, p ≤ 0.05

slide-151
SLIDE 151

OLS Goodness-of-Fit Inference

Questions about inference?
