STAT 213 Model Selection II
Colin Reimer Dawson, Oberlin College




SLIDE 1

Outline Model Selection Exploring Model Space

STAT 213 Model Selection II

Colin Reimer Dawson

Oberlin College

March 30, 2018

Outline

  • Model Selection
  • Exploring Model Space


SLIDE 2


So many models...

How do we decide among all these models?

  1. Understand the subject area! Build sensible models.
  2. Nested F-tests
  3. Model quality measures


What Makes a Good Model?

  • Fit: high R², small SSE, large F
  • Validity: strong evidence for predictors
  • Simple (parsimonious)
  • Generalizes outside the sample


Why Does Parsimony Matter?

Don’t we just care about good predictions? Not exclusively...

  • We also use models to understand the world (harder with more complexity).

And even so...

  • We really care about making predictions for data we haven’t seen yet.


SLIDE 3


Criteria to “score” models

  1. High R² / low SSE / low σ̂²ε: always prefers more complex models
  2. Adjusted R²: balances fit and complexity
  3. Mallows's Cp / Akaike Information Criterion (AIC): estimates mean squared prediction error based on σ̂²ε from a "full" model
  4. Out-of-sample predictive accuracy (next time)
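To see concretely why plain R² always rewards complexity while adjusted R² need not, here is a small simulation sketch in Python with numpy (the course demo itself presumably uses R; the data, seed, and helper name `fit_stats` are invented for this illustration, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: y depends on x1 only; x2 is pure noise.
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 + 1.5 * x1 + rng.normal(scale=1.0, size=n)

def fit_stats(X, y):
    """OLS fit; return R^2 and adjusted R^2 (X includes an intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sse = resid @ resid
    sst = ((y - y.mean()) ** 2).sum()
    p = X.shape[1]                      # total parameters, incl. intercept
    r2 = 1 - sse / sst
    adj_r2 = 1 - (sse / (n - p)) / (sst / (n - 1))
    return r2, adj_r2

ones = np.ones(n)
r2_small, adj_small = fit_stats(np.column_stack([ones, x1]), y)
r2_big, adj_big = fit_stats(np.column_stack([ones, x1, x2]), y)

# Plain R^2 can never decrease when x2 is added to a nested OLS model;
# adjusted R^2 can decrease, penalizing the extra parameter.
print(r2_big >= r2_small)
```

Adding the noise predictor x2 cannot lower R², but it lowers adjusted R² whenever the added predictor's t-statistic is below 1 in magnitude.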


Mallows's Cp / AIC

Two measures that reduce to the same thing in the case of MLR with independent, equal-variance, Normal residuals. For a "reduced" model with p_reduced total parameters (including the intercept), nested in a "full" model with p_full parameters, both fit using n observations:

    Cp = SSE_reduced / MSE_full + 2 p_reduced − n    (1)
       = p_reduced + SSE_diff / MSE_full             (2)

where smaller values indicate a simpler model (smaller p_reduced) and/or a better fit (smaller SSE_diff).
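Equation (1) can be computed directly from two least-squares fits. A minimal numpy sketch, with simulated data (the variable names and seed are illustrative assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 60
X_raw = rng.normal(size=(n, 3))               # three candidate predictors
y = 1.0 + 2.0 * X_raw[:, 0] + rng.normal(size=n)

def sse(X, y):
    """Sum of squared residuals from an OLS fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return r @ r

ones = np.ones((n, 1))
X_full = np.hstack([ones, X_raw])             # p_full = 4
X_red = np.hstack([ones, X_raw[:, :1]])       # p_reduced = 2

p_full, p_red = X_full.shape[1], X_red.shape[1]
mse_full = sse(X_full, y) / (n - p_full)      # sigma^2 estimate from the full model

# Equation (1): Cp = SSE_reduced / MSE_full + 2 * p_reduced - n
cp = sse(X_red, y) / mse_full + 2 * p_red - n
print(cp)   # tends to be near p_reduced when the reduced model is adequate
```

The classic rule of thumb follows: when the reduced model captures all the real structure, Cp lands near p_reduced, so models with Cp well above their parameter count are suspect.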

Outline

  • Model Selection
  • Exploring Model Space


SLIDE 4


Model Selection

Five predictor-selection methods:

  1. Domain knowledge (+ a few F-tests)
  2. Best subset
  3. Forward selection
  4. Backward selection
  5. Stepwise selection


Automated exploration of predictor subsets

  1. Best subset: consider all possible combinations (2^K of them for K candidate predictors)
  2. Forward selection: start with the null model and consider adding one predictor at a time
  3. Backward elimination: start with the full model and consider removing one predictor at a time
  4. Stepwise regression: consider both additions and removals at each iteration

Note: choose the best step based on adj. R² or Cp/AIC, not based on P-values.
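The greedy searches above are easy to sketch in code. Below is a minimal forward-selection loop scored by adjusted R², in Python with numpy (the simulated data and helper `adj_r2` are assumptions for illustration; the course demo itself presumably uses R):

```python
import numpy as np

rng = np.random.default_rng(2)
n, K = 80, 4
X = rng.normal(size=(n, K))                   # K candidate predictors
y = 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 2] + rng.normal(size=n)

def adj_r2(cols):
    """Adjusted R^2 for the OLS model using the given predictor columns."""
    Xd = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    sse = ((y - Xd @ beta) ** 2).sum()
    sst = ((y - y.mean()) ** 2).sum()
    p = Xd.shape[1]
    return 1 - (sse / (n - p)) / (sst / (n - 1))

# Forward selection: start from the null model; greedily add the predictor
# that most improves adjusted R^2; stop when no addition helps.
chosen, best = [], adj_r2([])
while True:
    candidates = [c for c in range(K) if c not in chosen]
    if not candidates:
        break
    top_score, top_c = max((adj_r2(chosen + [c]), c) for c in candidates)
    if top_score <= best:
        break
    chosen.append(top_c)
    best = top_score

print(sorted(chosen))   # should include the truly active predictors 0 and 2
```

Swapping the scoring function for Cp/AIC (minimizing instead of maximizing) gives the same loop; backward elimination just runs the greedy step in reverse, starting from all K predictors.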


Model Selection

  • "Scoring": adj. R², Cp, CV error (next time)
  • "Search": domain knowledge, best subset, forward selection, backward selection, stepwise selection


SLIDE 5


Example: Baseball Win % Demo

