STAT 213 Cross-Validation (and Multifactor ANOVA?)


SLIDE 1

STAT 213 Cross-Validation (and Multifactor ANOVA?)

Colin Reimer Dawson

Oberlin College

12 April 2016

SLIDE 2

Outline

  • Last Time
  • Cross-Validation

SLIDE 3

Reflection Questions

How do you decide among all these predictor-selection methods?

SLIDE 4

For Thursday

  • Read: see last time
  • Write: finish today’s worksheet
  • Answer: see last time

SLIDE 5

Outline

  • Last Time
  • Cross-Validation

SLIDES 6-9

Multicollinearity

  • When one predictor is highly predictable from the other predictors, the model suffers from multicollinearity.
  • One measure: the R² from a model predicting X_j using X_1, ..., X_{j−1}, X_{j+1}, ..., X_k.
  • Rough rule: if this R² is > 0.80, tests/intervals for individual coefficients may not be meaningful.
  • Equivalently: VIF = 1/(1 − R²) > 5. (The two cutoffs agree: R² = 0.80 gives VIF = 1/(1 − 0.80) = 5.)
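
To connect the formula to the output on the next slide, a quick check in R, using the R² of 0.9498368 reported there (the model name m.both is taken from that slide):

  r2 <- 0.9498368    # R² of Midterm ~ Quiz (next slide)
  1 / (1 - r2)       # 19.93495, matching vif(m.both)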

SLIDE 10

Variance Inflation Factor

# Each predictor is highly predictable from the other:
m.midterm <- lm(Midterm ~ Quiz, data = Scores)
summary(m.midterm)$r.squared

[1] 0.9498368

m.quiz <- lm(Quiz ~ Midterm, data = Scores)
summary(m.quiz)$r.squared

[1] 0.9498368

# vif() is from the 'car' package; m.both (both predictors together)
# and m.rotated (a decorrelated version) are presumably fit earlier.
vif(m.both)

 Midterm     Quiz
19.93495 19.93495

vif(m.rotated)

V1 V2
 1  1

SLIDE 11

Remedies for Multicollinearity

  • 1. Remove redundant predictors
  • 2. Combine predictors into a scale (see the sketch below)
  • 3. Use the multicollinear model anyway; just don’t use tests/intervals for individual coefficients.
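
As an illustration of remedy 2, a minimal sketch that collapses the two correlated exam scores from the VIF slide into a single standardized composite; the response Final is a hypothetical stand-in.

  # Replace two collinear predictors with one composite "scale":
  # the average of their z-scores.
  Scores$ExamScale <- as.numeric(scale(Scores$Midterm) + scale(Scores$Quiz)) / 2
  m.scale <- lm(Final ~ ExamScale, data = Scores)  # 'Final' is hypothetical
  summary(m.scale)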

SLIDE 12

Model Selection

“Scoring”
  • Adj. R²
  • Mallows’ Cp

“Search”
  • Domain Knowledge
  • Best Subset
  • Forward Selection
  • Backward Selection
  • Stepwise Selection

SLIDES 13-14

Criteria to "score" models

  • 1. Adj. R²: balances fit and complexity for a model in isolation
  • 2. Mallows’ Cp / Akaike Information Criterion (AIC): estimates mean squared prediction error based on the error-variance estimate σ̂²_ε from a “full” model
SLIDE 15

Mallows’ Cp / AIC

For a model with p coefficients (including the intercept), selected from a pool of predictors, fit using n observations:

  Cp = SSE_reduced / MSE_full + 2p − n    (1)
     = p + SSE_diff / MSE_full            (2)

Smaller values correspond to better fit and simpler models.
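
A minimal sketch of equation (1) in R; the data frame dat, response Y, and predictors X1-X3 are hypothetical stand-ins.

  m.full    <- lm(Y ~ X1 + X2 + X3, data = dat)  # "full" model
  m.reduced <- lm(Y ~ X1 + X2, data = dat)       # candidate model

  n        <- nrow(dat)
  p        <- length(coef(m.reduced))            # coefficients, incl. intercept
  sse.red  <- sum(resid(m.reduced)^2)            # SSE_reduced
  mse.full <- sum(resid(m.full)^2) / df.residual(m.full)  # MSE_full

  sse.red / mse.full + 2 * p - n                 # Cp, equation (1)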

SLIDE 16

Methods to Explore the Space of Combinations

  • 1. Domain Knowledge: only build models that make sense
  • 2. Best subset: consider all possible combinations (2^k of them)
  • 3. Forward selection: start with the null model, and consider adding one predictor at a time
  • 4. Backward elimination: start with the full model and consider removing one predictor at a time
  • 5. Stepwise regression: consider steps in both directions at each iteration

Note: choose the best step based on adj-R² or Cp/AIC, not based on P-values. (A sketch using R’s step() follows.)
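
A minimal sketch of methods 3-5 using base R’s step(), which scores candidate steps by AIC; the data frame dat, response Y, and predictors X1-X3 are hypothetical stand-ins.

  m.null <- lm(Y ~ 1, data = dat)            # intercept-only model
  m.full <- lm(Y ~ X1 + X2 + X3, data = dat)

  # Forward selection: start null, consider adding one predictor at a time
  step(m.null, scope = formula(m.full), direction = "forward")

  # Backward elimination: start full, consider dropping one at a time
  step(m.full, direction = "backward")

  # Stepwise: consider both additions and removals at each iteration
  step(m.null, scope = formula(m.full), direction = "both")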

SLIDE 17

Outline

  • Last Time
  • Cross-Validation

SLIDES 18-19

A third dimension

What data should we use to (a) fit the models? (b) evaluate the models?

Two answers:

  • 1. Use all the data for both (what we’ve done so far)
  • 2. Separate the data set into distinct “training” and “validation” sets.

SLIDES 20-23

In-Sample vs. Out-of-Sample Prediction

  • Idea: a good model should make accurate predictions on data it hasn’t seen.
  • Evaluating in-sample is subject to overfitting: since we try to minimize SSE (and maximize SSM), we are liable to extract too much “signal”. Some of the SSM will really be “noise”.
  • This is particularly likely if we have lots of model d.f.
  • Approaches such as adjusted R² and Mallows’ Cp try to account for overfitting, but why not actually try to predict on different data than was used for fitting? (See the train/validation sketch below.)
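
A minimal sketch of out-of-sample evaluation with a single random train/validation split; dat, Y, and X1 are hypothetical stand-ins. The out-of-sample MSE is typically the larger of the two, and is the honest measure of predictive accuracy.

  set.seed(213)
  n         <- nrow(dat)
  train.idx <- sample(n, size = floor(n / 2))   # random half for training
  train     <- dat[train.idx, ]
  valid     <- dat[-train.idx, ]

  m <- lm(Y ~ X1, data = train)                 # fit on training data only

  mean(resid(m)^2)                              # in-sample MSE
  mean((valid$Y - predict(m, valid))^2)         # out-of-sample MSE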

SLIDES 24-25

Cross-Validation

Cross-validation is a technique whereby the full dataset is divided into training and validation (held-out) sets. The first is used for fitting parameters; the second for evaluating predictive power.

Versions:

  • 1. Two-fold: divide the data (randomly) in half. Fit two models, exchanging the roles of training and validation.
  • 2. k-fold: divide the data into k equal-sized sets; fit k models, letting each set serve in turn as the validation set.
  • 3. Leave-one-out (n-fold): let each observation be its own validation set. Requires fitting n models.

We can “score” a model form using its average predictive accuracy on the held-out sets, as in the k-fold sketch below.
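
A minimal sketch of k-fold cross-validation in base R; dat, Y, and X1 are hypothetical stand-ins.

  set.seed(213)
  k     <- 5
  n     <- nrow(dat)
  folds <- sample(rep(1:k, length.out = n))  # random fold assignment

  mse <- numeric(k)
  for (i in 1:k) {
    train  <- dat[folds != i, ]              # k-1 folds for fitting
    valid  <- dat[folds == i, ]              # one fold held out
    m      <- lm(Y ~ X1, data = train)
    mse[i] <- mean((valid$Y - predict(m, valid))^2)
  }
  mean(mse)   # cross-validated estimate of prediction error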