Gov 2000: 12. Troubleshooting the Linear Model — Matthew Blackwell (slide deck)



SLIDE 1

Gov 2000: 12. Troubleshooting the Linear Model

Matthew Blackwell

Fall 2016

1 / 67

SLIDE 2
  • 1. Outliers, leverage points, and influential observations
  • 2. Heteroskedasticity
  • 3. Nonlinearity of the regression function

SLIDE 3

Where are we? Where are we going?

  • Last few weeks: estimation and inference for the linear model under the Gauss-Markov assumptions (and sometimes conditional Normality)
  • This week: what happens when the assumptions fail? Can we tell? Can we fix it?
  • Next weeks: dealing with panel data.

SLIDE 4

Review of the OLS assumptions

  • 1. Linearity: y_i = x_i′β + u_i
  • 2. Random sample: (y_i, x_i′) are an iid sample from the population.
  • 3. Full rank: X is an n × (k + 1) matrix with rank k + 1
  • 4. Zero conditional mean: E[u_i | x_i] = 0
  • 5. Homoskedasticity: V[u_i | x_i] = σ²_u
  • 6. Normality: u_i | x_i ∼ N(0, σ²_u)
  • 1-4 give us unbiasedness/consistency
  • 1-5 are the Gauss-Markov assumptions and allow for large-sample inference
  • 1-6 allow for small-sample inference

SLIDE 5

Violations of the assumptions

Three issues today:

  • 1. Influential observations that skew regression estimates
  • 2. Violations of homoskedasticity
    ▶ ⇒ SEs are biased (usually downward)
  • 3. Incorrect functional form/nonlinearity
    ▶ ⇒ biased/inconsistent estimates

SLIDE 6

1/ Outliers, leverage points, and influential observations

SLIDE 7

Example: Buchanan votes in Florida, 2000

  • 2000 Presidential election in FL (Wand et al., 2001, APSR)

SLIDE 8

Example: Buchanan votes in Florida, 2000

[Scatterplot of Buchanan Votes (500-3500) against Total Votes (100,000-600,000) across Florida counties]

SLIDE 9

Example: Buchanan votes in Florida, 2000

[Same scatterplot with each county labeled (Duval, Lee, Broward, Miami-Dade, …, Palm Beach); Palm Beach sits far above the rest]

SLIDE 10

Example: Buchanan votes

mod <- lm(edaybuchanan ~ edaytotal, data = flvote)
summary(mod)
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 54.22945   49.14146    1.10     0.27
## edaytotal    0.00232    0.00031    7.48  2.4e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 333 on 65 degrees of freedom
## Multiple R-squared: 0.463, Adjusted R-squared: 0.455
## F-statistic: 56 on 1 and 65 DF, p-value: 2.42e-10

SLIDE 11

Three types of extreme values

  • 1. Leverage point: extreme in the x direction
  • 2. Outlier: extreme in the y direction
  • 3. Influence point: extreme in both directions
  • Not all of these are problematic
  • If the data are truly "contaminated" (come from a different distribution), they can cause inefficiency and possibly bias
  • Can be a violation of iid (not identically distributed)
  • Diagnostics are loose

SLIDE 12

Leverage point definition

[Scatterplot comparing the fitted line for the full sample to the fit without the leverage point]

  • Values that are extreme in the x direction
  • That is, values far from the center of the covariate distribution
  • Decrease SEs (more x variation)
  • No bias if typical in the y dimension

SLIDE 13

Hat matrix

  • First we need to define an important matrix:

    H = X(X′X)⁻¹X′
    û = y − Xβ̂ = y − X(X′X)⁻¹X′y ≡ y − Hy = (I − H)y

  • H is the hat matrix because it puts the "hat" on y: ŷ = Hy

▶ H is an n × n symmetric matrix
▶ H is idempotent: HH = H

SLIDE 14

Hat values

    ŷ = Xβ̂ = X(X′X)⁻¹X′y = Hy

  • For a particular observation i, we can show this means:

    ŷ_i = Σ_{j=1}^{n} h_ij y_j

  • h_ij = importance of observation j for the fitted value ŷ_i
  • Leverage/hat values: h_i = h_ii, the diagonal entries of the hat matrix
  • With a simple linear regression, we have

    h_i = 1/n + (x_i − x̄)² / Σ_{j=1}^{n} (x_j − x̄)²

▶ ⇒ how far i is from the center of the x distribution

  • Rule of thumb: examine hat values greater than 2(k + 1)/n
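The simple-regression formula for h_i can be checked against hatvalues() directly. A sketch on simulated data standing in for the county data:

```r
# Hat values by hand for a simple regression, vs. hatvalues()
set.seed(1)
n <- 67
x <- rexp(n) * 1e5                 # skewed regressor, like county vote totals
y <- 50 + 0.002 * x + rnorm(n, sd = 300)
mod <- lm(y ~ x)

h_manual <- 1 / n + (x - mean(x))^2 / sum((x - mean(x))^2)
max(abs(h_manual - unname(hatvalues(mod))))  # ~0

# Rule-of-thumb screen: flag hat values above 2(k + 1)/n
k <- 1
flagged <- which(h_manual > 2 * (k + 1) / n)
```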

SLIDE 15

Buchanan hats

head(hatvalues(mod), 5)
##       1       2       3       4       5
## 0.04179 0.02285 0.22066 0.01556 0.01493

SLIDE 16

Buchanan hats

[Dot plot of the hat values by county (roughly 0.01 to 0.25), with every county labeled]

SLIDE 17

Outlier definition

[Scatterplot comparing the fitted line for the full sample to the fit without the outlier]

  • An outlier is a data point with very large regression errors, u_i
  • Very distant from the rest of the data in the y-dimension
  • Increases standard errors (by increasing σ̂²)
  • No bias if typical in the x's

SLIDE 18

Detecting outliers

  • Look for big residuals, right?

▶ Problem: the û_i are not identically distributed.
▶ Variance of the ith residual:

    V[û_i | X] = σ²_u (1 − h_ii)

  • Rescale to get standardized residuals with constant variance:

    û′_i = û_i / (σ̂ √(1 − h_ii))

  • Rule of thumb:

▶ |û′_i| > 2 will be relatively rare.
▶ |û′_i| > 4-5 should definitely be checked.
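This rescaling is exactly what rstandard() computes. A quick check, again on simulated stand-in data:

```r
# Standardized residuals by hand, vs. rstandard()
set.seed(2)
n <- 67
x <- rexp(n) * 1e5
y <- 50 + 0.002 * x + rnorm(n, sd = 300)
mod <- lm(y ~ x)

u_hat <- resid(mod)
h <- hatvalues(mod)
sigma_hat <- summary(mod)$sigma

std_resid <- u_hat / (sigma_hat * sqrt(1 - h))
max(abs(std_resid - rstandard(mod)))  # ~0

# Rule-of-thumb screens
sum(abs(std_resid) > 2)
which(abs(std_resid) > 4)
```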

SLIDE 19

Buchanan outliers

std.resids <- rstandard(mod)

[Index plot of the standardized residuals; Palm Beach stands out far above the rest, with a value above 4]

SLIDE 20

Detecting outliers

  • Standardized or regular residuals are not good for detecting outliers because outliers might pull the regression line close to them.
  • Better: leave-one-out prediction errors.
  • 1. Regress y_(−i) on X_(−i), where these omit unit i:

    β̂_(−i) = (X′_(−i) X_(−i))⁻¹ X′_(−i) y_(−i)

  • 2. Calculate the predicted value of y_i using that regression:

    ỹ_i = x_i′ β̂_(−i)

  • 3. Calculate the prediction error:

    ũ_i = y_i − ỹ_i

  • Possible to relate the prediction errors to the residuals:

    ũ_i = û_i / (1 − h_i)
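The shortcut in the last bullet spares us from refitting n regressions. A sketch that checks it against the brute-force loop (simulated data):

```r
# Leave-one-out prediction errors: brute force vs. u_hat / (1 - h)
set.seed(3)
n <- 60
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
mod <- lm(y ~ x)

# Brute force: refit dropping each observation in turn
loo_err <- sapply(seq_len(n), function(i) {
  fit_i <- lm(y[-i] ~ x[-i])
  y[i] - sum(coef(fit_i) * c(1, x[i]))  # y_i minus x_i' beta_hat_(-i)
})

# Shortcut formula from the slide
shortcut <- resid(mod) / (1 - hatvalues(mod))
max(abs(loo_err - unname(shortcut)))  # ~0
```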

SLIDE 21

Influence points

[Scatterplot comparing the fitted line for the full sample to the fit without the influence point]

  • An influence point is one that is both an outlier and a leverage point.
  • Extreme in both the x and y dimensions
  • Causes the regression line to move toward it (bias?)

SLIDE 22

Overall measures of influence

  • A rough measure of influence is to look at the difference between the fitted value and the leave-one-out predicted value: ŷ_i − ỹ_i

▶ This is equivalent to ũ_i h_i, which is just "outlier-ness × leverage"

  • Cook's distance (cooks.distance()):

    D_i = (ũ_i² / ((k + 1) σ̂²)) × h_i

▶ Basically: "normalized outlier-ness × leverage"
▶ D_i > 4/(n − k − 1) is considered "large", but cutoffs are arbitrary

  • Influence plot:

▶ x-axis: hat values, h_i
▶ y-axis: standardized residuals, û′_i
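The "normalized outlier-ness × leverage" formula above matches what cooks.distance() returns. A sketch on simulated data:

```r
# Cook's distance by hand, vs. cooks.distance()
set.seed(4)
n <- 60
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
mod <- lm(y ~ x)

h <- hatvalues(mod)
u_tilde <- resid(mod) / (1 - h)     # leave-one-out prediction errors
sigma2_hat <- summary(mod)$sigma^2
k <- 1

D <- u_tilde^2 / ((k + 1) * sigma2_hat) * h
max(abs(D - cooks.distance(mod)))  # ~0

# Arbitrary rule-of-thumb cutoff
large <- which(D > 4 / (n - k - 1))
```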

SLIDE 23

Influence plot from lm output

plot(mod, which = 5, labels.id = flvote$county)

[Residuals-vs-leverage plot for lm(edaybuchanan ~ edaytotal) with Cook's distance contours; Palm Beach, Miami-Dade, and Broward are flagged]

SLIDE 24

Limitations of the standard tools

[Scatterplot with two influence points and the fitted lines that result from dropping each one]

  • What happens when there are two influence points?
  • Red line drops the red influence point
  • Blue line drops the blue influence point
  • "Leave-one-out" approaches help recover the line

SLIDE 25

What to do about outliers and influential units?

  • Is the data corrupted?
▶ Fix the observation (obvious data entry errors)
▶ Remove the observation
▶ Be transparent either way
  • Is the outlier part of the data generating process?
▶ Transform the dependent variable (log(y))
▶ Use a method that is robust to outliers (robust regression, least absolute deviations)

SLIDE 26

2/ Heteroskedasticity

SLIDE 27

Review of homoskedasticity

  • Remember: β̂ = (X′X)⁻¹X′y
  • V[u | X] = Σ is the variance-covariance matrix of the errors.
  • Assumptions 1-4 give us this expression for the sampling variance:

    V[β̂ | X] = (X′X)⁻¹ X′ΣX (X′X)⁻¹

  • Under homoskedasticity, we simplified this to:

    V[β̂ | X] = σ² (X′X)⁻¹

  • Replacing σ² with the estimate σ̂² gives us our estimate of the covariance matrix

SLIDE 28

Non-constant error variance

  • Homoskedastic:

    V[u | X] = σ²I = diag(σ², σ², …, σ²)

  • Heteroskedastic:

    V[u | X] = diag(σ²_1, σ²_2, …, σ²_n)

  • Independent, not identical
  • Cov[u_i, u_j | X] = 0
  • V[u_i | x_i] = σ²_i

SLIDE 29

Violations of homoskedasticity

  • Violations: the magnitude of u_i differs at different levels of X_i.

[Two scatterplots of Y against X, one labeled Heteroskedastic and one labeled Homoskedastic]

SLIDE 30

Consequences of Heteroskedasticity

  • Standard error estimates biased, likely downward
  • Test statistics won't have t or F distributions
  • For α-level tests, the probability of Type I error ≠ α
  • Coverage of 1 − α CIs ≠ 1 − α
  • OLS is not BLUE
  • β̂ still unbiased and consistent for β

SLIDE 31

Visual diagnostics

  • 1. Plot of residuals versus fitted values
▶ In R, plot(mod, which = 1)
▶ Residuals should have the same variance across the x-axis
  • 2. Spread-location plots
▶ y-axis: square root of the absolute value of the residuals
▶ x-axis: fitted values
▶ Usually has a loess trend curve, which should be flat
▶ In R, plot(mod, which = 3)

SLIDE 32

Diagnostics

plot(mod, which = 1, lwd = 3)
plot(mod, which = 3, lwd = 3)

[Residuals-vs-fitted and scale-location plots for mod; observations 3, 9, and 64 are flagged]

SLIDE 33

Dealing with non-constant error variance

  • 1. Transform the dependent variable
  • 2. Model the heteroskedasticity using Weighted Least Squares (WLS)
  • 3. Use an estimator of V[β̂ | X] that is robust to heteroskedasticity
  • 4. Admit we have the wrong model and use a different approach

SLIDE 34

Example: Transforming Buchanan votes

mod2 <- lm(log(edaybuchanan) ~ log(edaytotal), data = flvote)
summary(mod2)
##
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)
## (Intercept)      -2.728      0.400   -6.83  3.5e-09 ***
## log(edaytotal)    0.729      0.038   19.15  < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.469 on 65 degrees of freedom
## Multiple R-squared: 0.849, Adjusted R-squared: 0.847
## F-statistic: 367 on 1 and 65 DF, p-value: <2e-16

SLIDE 35

Example: Transformed scale-location plot

plot(mod2, which = 3)

[Scale-location plot for lm(log(edaybuchanan) ~ log(edaytotal)); observations 39, 55, and 64 are flagged]

SLIDE 36

Weighted least squares

  • Suppose that the heteroskedasticity is known up to a multiplicative constant:

    V[u_i | X] = b_i σ²

    where b_i = b(x_i) is a positive and known function of x_i

  • WLS: multiply each term of the regression equation by 1/√b_i:

    y_i/√b_i = β_0 (1/√b_i) + β_1 (x_i1/√b_i) + ⋯ + β_k (x_ik/√b_i) + u_i/√b_i

SLIDE 37

WLS intuition

  • Rescales the errors to u_i/√b_i, which maintains the zero mean of the errors
  • But makes the error variance constant again:

    V[(1/√b_i) u_i | X] = (1/b_i) V[u_i | X] = (1/b_i) b_i σ² = σ²

  • If you know b_i, then you can use this approach to make the model homoskedastic and, thus, BLUE again
  • When do we know b_i?

SLIDE 38

WLS procedure

  • Define the weighting matrix:

    W = diag(1/√b_1, 1/√b_2, …, 1/√b_n)

  • Run the following regression:

    Wy = WXβ + Wu
    y* = X*β + u*

  • Run the regression of y* = Wy on X* = WX and all Gauss-Markov assumptions are satisfied
  • Plugging into the usual formula for β̂:

    β̂_W = (X′W′WX)⁻¹ X′W′Wy
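The transformed regression really is the same thing as lm()'s weights = argument. A sketch on simulated data, assuming (for illustration) that the error variance is proportional to b_i = x_i:

```r
# Manual WLS via the W-transformed regression, vs. lm(..., weights = 1/b)
set.seed(5)
n <- 67
x <- rexp(n) * 1e5
b <- x                                      # assumed variance weight b_i
y <- 50 + 0.002 * x + rnorm(n, sd = 0.01 * sqrt(b))

# Built-in WLS
mod_wls <- lm(y ~ x, weights = 1 / b)

# Manual version: regress y/sqrt(b) on 1/sqrt(b) and x/sqrt(b), no intercept
y_star <- y / sqrt(b)
X_star <- cbind(1, x) / sqrt(b)             # WX, rows scaled by 1/sqrt(b_i)
beta_manual <- solve(t(X_star) %*% X_star, t(X_star) %*% y_star)

max(abs(drop(beta_manual) - unname(coef(mod_wls))))  # ~0
```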

SLIDE 39

WLS example

  • In R, use the weights = argument to lm() and give the weights squared: 1/b_i
  • With the Buchanan data, maybe the variance is proportional to the total number of ballots cast:

mod.wls <- lm(edaybuchanan ~ edaytotal, weights = 1/edaytotal, data = flvote)
summary(mod.wls)
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 27.06785    8.50723    3.18   0.0022 **
## edaytotal    0.00263    0.00025   10.50  1.2e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.565 on 65 degrees of freedom
## Multiple R-squared: 0.629, Adjusted R-squared: 0.624
## F-statistic: 110 on 1 and 65 DF, p-value: 1.22e-15

SLIDE 40

Comparing WLS to OLS

plot(mod, which = 3, lwd = 2, sub = "")
plot(mod.wls, which = 3, lwd = 2, sub = "")

[Side-by-side scale-location plots for the OLS and WLS fits; observations 3, 9, and 64 are flagged in both]

SLIDE 41

Heteroskedasticity consistent estimator

  • Under non-constant error variance:

    V[u | X] = Σ = diag(σ²_1, σ²_2, …, σ²_n)

  • When Σ ≠ σ²I, we are stuck with this expression:

    V[β̂ | X] = (X′X)⁻¹ X′ΣX (X′X)⁻¹

  • White (1980) shows that we can consistently estimate this if we have an estimate of Σ:

    V̂[β̂ | X] = (X′X)⁻¹ X′Σ̂X (X′X)⁻¹

  • Sandwich estimator with bread (X′X)⁻¹ and meat X′Σ̂X

SLIDE 42

Computing HC/robust standard errors

  • 1. Fit the regression and obtain the residuals û
  • 2. Construct the "meat" matrix Σ̂ with the squared residuals on the diagonal:

    Σ̂ = diag(û²_1, û²_2, …, û²_n)

  • 3. Plug Σ̂ into the sandwich formula to obtain the HC/robust estimator of the covariance matrix:

    V̂[β̂ | X] = (X′X)⁻¹ X′Σ̂X (X′X)⁻¹

  • Small-sample correction (called 'HC1'):

    V̂[β̂ | X] = n/(n − k − 1) · (X′X)⁻¹ X′Σ̂X (X′X)⁻¹
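These three steps take only a few lines of base R. A sketch on simulated heteroskedastic data (with the sandwich package installed, vcovHC(mod, type = "HC0") should agree with vcov_hc0 below):

```r
# Manual HC0/HC1 sandwich estimator in base R
set.seed(6)
n <- 67
x <- rexp(n) * 1e5
y <- 50 + 0.002 * x + rnorm(n, sd = 0.005 * x)  # heteroskedastic errors
mod <- lm(y ~ x)

X <- model.matrix(mod)
u2 <- resid(mod)^2
bread <- solve(t(X) %*% X)
meat <- t(X) %*% (X * u2)   # X' diag(u^2) X, without forming the n x n matrix

vcov_hc0 <- bread %*% meat %*% bread
k <- ncol(X) - 1
vcov_hc1 <- n / (n - k - 1) * vcov_hc0          # small-sample correction

robust_se <- sqrt(diag(vcov_hc1))
```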

SLIDE 43

Robust SEs in Florida data

coeftest(mod)
##
## t test of coefficients:
##
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 54.22945   49.14146    1.10     0.27
## edaytotal    0.00232    0.00031    7.48  2.4e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

coeftest(mod, vcovHC(mod, type = "HC0"))
##
## t test of coefficients:
##
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 54.22945   40.61283    1.34   0.1864
## edaytotal    0.00232    0.00087    2.67   0.0096 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

SLIDE 44

Robust SEs with correction

lmtest::coeftest(mod, sandwich::vcovHC(mod, type = "HC0"))
##
## t test of coefficients:
##
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 54.22945   40.61283    1.34   0.1864
## edaytotal    0.00232    0.00087    2.67   0.0096 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

lmtest::coeftest(mod, sandwich::vcovHC(mod, type = "HC1"))
##
## t test of coefficients:
##
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept) 54.229453  41.232904    1.32    0.193
## edaytotal    0.002323   0.000884    2.63    0.011 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

SLIDE 45

WLS vs. White's Estimator

  • WLS:
▶ With known weights, WLS is efficient
▶ and the estimated SE of β̂_WLS is consistent
▶ but the weights usually aren't known
  • White's Estimator:
▶ Doesn't change the estimate β̂
▶ Consistent for V[β̂] under any form of heteroskedasticity
▶ Because it relies on consistency, it is a large-sample result, best with large n
▶ For small n, performance might be poor

SLIDE 46

3/ Nonlinearity of the regression function

SLIDE 47

Buchanan model, part 2

mod3 <- lm(edaybuchanan ~ edaytotal + absnbuchanan, data = flvote)
summary(mod3)
##
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)
## (Intercept)  -29.34807   55.19635   -0.53   0.5969
## edaytotal      0.00110    0.00048    2.29   0.0253 *
## absnbuchanan   6.89546    2.12942    3.24   0.0019 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 317 on 61 degrees of freedom
##   (3 observations deleted due to missingness)
## Multiple R-squared: 0.536, Adjusted R-squared: 0.521
## F-statistic: 35.2 on 2 and 61 DF, p-value: 6.71e-11

SLIDE 48

Added variable plot

  • Need a way to visualize the conditional relationship between Y and X_j
  • How to construct an added variable plot:
  • 1. Get the residuals from the regression of Y on all covariates except X_j
  • 2. Get the residuals from the regression of X_j on all other covariates
  • 3. Plot the residuals from (1) against the residuals from (2)
  • In R: avPlots(model) from the car package
  • The OLS fit to this plot will have slope exactly β̂_j and 0 intercept
  • Use a local smoother (loess) to detect any non-linearity
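The "slope exactly β̂_j, zero intercept" property is easy to verify by building the AV-plot residuals by hand (a sketch on simulated data; variable names are illustrative):

```r
# Added-variable plot residuals by hand: the regression of the
# residualized y on the residualized x1 recovers the full-model slope
set.seed(7)
n <- 64
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n)     # correlated covariates
y <- 1 + 2 * x1 - x2 + rnorm(n)
full <- lm(y ~ x1 + x2)

# AV plot for x1: residualize y and x1 on the other covariate
e_y <- resid(lm(y ~ x2))
e_x <- resid(lm(x1 ~ x2))
av_fit <- lm(e_y ~ e_x)

abs(coef(av_fit)["e_x"] - coef(full)["x1"]) < 1e-8  # slopes match
abs(coef(av_fit)["(Intercept)"]) < 1e-8             # zero intercept
```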

SLIDE 49

Buchanan AV plot

par(mfrow = c(1, 2))
out <- car::avPlots(mod3, "edaytotal")
lines(loess.smooth(x = out$edaytotal[, 1], y = out$edaytotal[, 2]),
      col = "dodgerblue", lwd = 2)
out2 <- car::avPlots(mod3, "absnbuchanan")
lines(loess.smooth(x = out2$absnbuchanan[, 1], y = out2$absnbuchanan[, 2]),
      col = "dodgerblue", lwd = 2)

[Added variable plots for edaytotal and absnbuchanan, each with a loess smoother overlaid]

SLIDE 50

How to deal with non-linearity

  • Breaking up categorical variables into dummy variables
  • Including interaction terms
  • Including polynomial terms
  • Using transformations
  • Using more flexible models:
▶ Generalized additive models and splines allow the data to tell us what the functional form is.
▶ Complicated math, but important ideas.

SLIDE 51

Basis functions

  • Basis functions are the functions of x_i that we include in the model:
▶ Examples we've seen: h_m(x_i) = x_i, h_m(x_i) = x_i², h_m(x_i) = log(x_i)
  • Different basis functions will allow for different forms of non-linearity
  • We could always break up X_i into bins and estimate a piecewise constant:

    h_1 = 1, h_2 = 1(b_1 < x_i < b_2), h_3 = 1(x_i > b_2)

  • b_1 < b_2 are knots
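A piecewise-constant fit is just a regression on these indicator basis functions. A sketch on simulated data, using the same knot locations (-1.5 and 1.5) as the code later in the deck:

```r
# Piecewise-constant fit via indicator basis functions,
# with illustrative knots b1 = -1.5 and b2 = 1.5
set.seed(8)
x <- sort(runif(200, -4, 4))
y <- sin(x) + rnorm(200, sd = 0.3)

h2 <- 1 * (x > -1.5 & x < 1.5)   # 1(b1 < x < b2)
h3 <- 1 * (x >= 1.5)             # 1(x > b2)
pw_const <- lm(y ~ h2 + h3)      # h1 = 1 is the intercept

# The fitted values are flat within each bin: only three distinct values
length(unique(round(fitted(pw_const), 8)))
```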

SLIDE 52

Piecewise constant

[Scatterplot with the piecewise-constant fit overlaid]

SLIDE 53

Piecewise linear

  • We could allow there to be different regression lines in each bin by adding interactions:

    h_1(x_i) = 1, h_2(x_i) = x_i,
    h_3(x_i) = 1(b_1 < x_i < b_2), h_4(x_i) = x_i · 1(b_1 < x_i < b_2),
    h_5(x_i) = 1(x_i ≥ b_2), h_6(x_i) = x_i · 1(x_i ≥ b_2)

SLIDE 54

Piecewise linear

[Scatterplot with the piecewise-linear fit overlaid]

SLIDE 55

Continuous piecewise linear

  • Problem: piecewise functions are discontinuous.
  • Can use clever basis functions to get a continuous piecewise linear function of X_i:

    h_1(x_i) = 1, h_2(x_i) = x_i, h_3(x_i) = (x_i − b_1)₊, h_4(x_i) = (x_i − b_2)₊

  • (x_i − b_1)₊ = x_i − b_1 when x_i > b_1, and 0 otherwise

SLIDE 56

Why continuous?

    y_i = β_0 + β_1 x_i + β_2 (x_i − b_1)₊ + β_3 (x_i − b_2)₊ + u_i

  • Value at b_1 approaching from below: β_0 + β_1 b_1
  • Value at b_1 approaching from above: β_0 + β_1 b_1 + β_2 (b_1 − b_1)₊ = β_0 + β_1 b_1
  • The function is thus continuous at the knot points, but the slopes change:

▶ β_1 = slope when X_i < b_1
▶ β_1 + β_2 = slope when b_1 < X_i < b_2
▶ β_1 + β_2 + β_3 = slope when X_i > b_2
▶ Function is continuous at the cutpoints

SLIDE 57

Continuous piecewise linear

h2 <- x
h3 <- 1 * (x > -1.5) * (x - -1.5)
h4 <- 1 * (x > 1.5) * (x - 1.5)
reg <- lm(y ~ h2 + h3 + h4)

[Scatterplot with the continuous piecewise-linear fit overlaid]

SLIDE 58

Cubic splines

  • Continuous piecewise linear has "kinks" at the knots, but we probably want "smooth" functions.
▶ What does smooth mean? Continuous derivatives!
▶ ⇒ use higher-order polynomials in the basis functions
  • Cubic spline basis: bases that produce continuous functions with continuous first and second derivatives:

    h_1(x_i) = 1, h_2(x_i) = x_i, h_3(x_i) = x_i², h_4(x_i) = x_i³,
    h_5(x_i) = (x_i − b_1)³₊, h_6(x_i) = (x_i − b_2)³₊

  • Basic idea: local polynomial regressions (between knots) that have to connect and be smooth at the knots.
▶ Ensure this by allowing only the coefficient on the cubic term to change at the knot point.

SLIDE 59

Cubic spline

h2 <- x
h3 <- x^2
h4 <- x^3
h5 <- 1 * (x > -1.5) * (x - -1.5)^3
h6 <- 1 * (x > 1.5) * (x - 1.5)^3
reg <- lm(y ~ h2 + h3 + h4 + h5 + h6)

[Scatterplot with the cubic-spline fit overlaid]

SLIDE 60

Cubic spline vs global

h2 <- x
h3 <- x^2
h4 <- x^3
rr <- lm(y ~ h2 + h3 + h4)

[Scatterplot comparing the global cubic fit to the local cubic spline]

SLIDE 61

Knotty problems

  • Any function can be approximated as we increase the number of knot points.
  • How to choose the number/location of knot points?
▶ More knot points ⇒ "rougher" function, less in-sample bias, more variance.
▶ Fewer knot points ⇒ "smoother" function, more in-sample bias, less variance.
  • In-sample fit might be great, but out-of-sample fit might be terrible.
  • More general smoothing approaches have different ways of representing this trade-off other than knots.

SLIDE 62

Cross-validation

  • General strategy for bias-variance trade-offs: cross-validation.
  • Set aside units to test out-of-sample prediction
  • Cross-validation procedure:
  • 1. Choose a number of evenly spread knots, b.
  • 2. Withholding unit i, estimate the CEF of y_i given x_i using a cubic spline with b knots.
  • 3. Get the predicted value for i, ŷ_i^(−i), and calculate the squared prediction error: (y_i − ŷ_i^(−i))².
  • 4. Repeat 2-3 for each observation and take the average to get the MSE with b knots.
  • 5. Repeat 1-4 for different values of b and choose the value of b with the lowest MSE.
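The five steps above can be sketched in a few lines, reusing the leave-one-out shortcut ũ_i = û_i/(1 − h_i) from earlier so that no refitting loop is needed. This is an illustration on simulated data, with "evenly spread" knots placed at quantiles of x (an assumption on my part) and splines::bs supplying the cubic-spline basis:

```r
# LOO cross-validation over the number of knots for a cubic spline
library(splines)

set.seed(9)
x <- runif(150, -4, 4)
y <- sin(x) + rnorm(150, sd = 0.4)

loo_mse <- sapply(1:8, function(b) {
  kn <- quantile(x, probs = seq_len(b) / (b + 1))  # b spread-out knots
  fit <- lm(y ~ bs(x, knots = kn))                 # cubic-spline basis
  # LOO squared prediction errors via the u_hat / (1 - h) shortcut
  mean((resid(fit) / (1 - hatvalues(fit)))^2)
})

best_b <- which.min(loo_mse)   # number of knots with the lowest LOO MSE
```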

SLIDE 63

Automatic knot selection

smth <- smooth.spline(x, y)
plot(x, y, ylim = c(-3, 3), pch = 19, col = "grey50", bty = "n")
lines(smth, col = "indianred", lwd = 2)

[Scatterplot with the smoothing-spline fit overlaid]

SLIDE 64

Generalized additive models

  • Generalized additive models (GAMs) allow you to estimate the spline of any particular variable in the regression.
▶ Each spline is additive: y_i = f_1(x_i1) + f_2(x_i2) + u_i
  • Can plot the AV-plot of the spline to get a sense of the nonlinearity of the functional form.
  • Use cross-validation to select the number of knots

SLIDE 65

GAM example fit

library(mgcv)  ## GAM package
out <- gam(edaybuchanan ~ s(edaytotal) + s(absnbuchanan), data = flvote,
           subset = county != "Palm Beach")
summary(out)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## edaybuchanan ~ s(edaytotal) + s(absnbuchanan)
##
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   221.84       6.41    34.6   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
##                  edf Ref.df    F p-value
## s(edaytotal)    6.85   7.82 10.6 1.6e-09 ***
## s(absnbuchanan) 2.95   3.64 22.6 1.6e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.95   Deviance explained = 95.8%
## GCV = 3129   Scale est. = 2592.3   n = 63

SLIDE 66

Example: generalized additive models

plot(out, shade = TRUE, residual = TRUE, pch = 1)

[Estimated smooth terms s(edaytotal, 6.85) and s(absnbuchanan, 2.95), with shaded confidence bands and partial residuals]

SLIDE 67

Summary

  • For influential points, heteroskedasticity, and nonlinearity:
▶ Check your data! summary(), plot(), etc.
▶ Use transformations to make the assumptions more plausible
▶ Weaken linearity when you need to.