

SLIDE 1

Linear Models for Statistical Learning, Regression

David Dalpiaz
STAT 430, Fall 2017

SLIDE 2

Announcements

  • Homework 01 due today.
  • Homework 02 released later today. (Hopefully.)

SLIDE 3

Statistical Learning

  • Supervised Learning
      • Regression
      • Classification
  • Unsupervised Learning

SLIDE 4

Regression Setup

$$Y = f(x_1, x_2, x_3, \ldots, x_p) + \epsilon$$

numeric response = signal + noise

  • Want to learn the signal
  • Want to be very careful not to “learn noise”

SLIDE 5

Using a Linear Model

Setup:

$$Y = f(x_1, x_2, x_3, \ldots, x_p) + \epsilon$$

Assume:

$$f(x_1, x_2, x_3, \ldots, x_p) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$$

SLIDE 6

The Linear Model

$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \epsilon, \qquad \epsilon \sim N(0, \sigma^2)$$

$$Y \mid X \sim N(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p,\ \sigma^2)$$

There are a total of p + 2 parameters in this model:

  • The p + 1 β parameters, or coefficients, control the signal
  • The σ² controls the noise
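
To make this concrete, here is a minimal R sketch that simulates data from such a model with p = 2; the coefficient values, σ, and the name sim_data are arbitrary choices for illustration.

```r
# Simulate from Y = beta0 + beta1*x1 + beta2*x2 + eps, eps ~ N(0, sigma^2)
set.seed(42)
n     <- 100
beta  <- c(5, 2, -1)   # beta_0, beta_1, beta_2 (arbitrary)
sigma <- 1.5           # noise standard deviation (arbitrary)

x1 <- runif(n)
x2 <- runif(n)
y  <- beta[1] + beta[2] * x1 + beta[3] * x2 + rnorm(n, sd = sigma)

sim_data <- data.frame(y, x1, x2)
```

With p = 2 there are p + 2 = 4 parameters: β0, β1, β2, and σ².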

SLIDE 7

Fitting a Linear Model

This is a parametric model, meaning that to fit the model, we need to estimate the parameters. For the sake of making predictions, we only need to estimate the β parameters, since

$$\hat{f}(x_1, x_2, x_3, \ldots, x_p) = \hat{y}(x_1, x_2, x_3, \ldots, x_p) = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \cdots + \hat{\beta}_p x_p$$

Using either least squares or maximum likelihood, this becomes the same optimization problem:

$$\underset{\beta_0, \beta_1, \ldots, \beta_p}{\operatorname{argmin}} \; \sum_{i=1}^{n} \left( y_i - (\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip}) \right)^2$$
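
In R, lm() solves exactly this least-squares problem. A quick sketch, assuming the hypothetical sim_data simulated earlier:

```r
# Fit by least squares; lm() minimizes the sum of squared residuals above
fit <- lm(y ~ x1 + x2, data = sim_data)

coef(fit)           # the estimated beta-hat coefficients
head(predict(fit))  # y-hat values for the first few observations
```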

SLIDE 8

Estimating σ²

While it is not needed to make predictions, to fully estimate the model, we would also need to estimate σ².

$$s_e^2 = \frac{1}{n - (p + 1)} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \qquad \text{(Least Squares)}$$

$$\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \qquad \text{(MLE)}$$

Both are estimates of σ². What is the difference?
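
Continuing with the hypothetical fit from the previous sketch, both estimates come from the same residuals; only the denominator differs:

```r
# Two estimates of sigma^2 from the same residual sum of squares
rss <- sum(residuals(fit)^2)
n   <- nobs(fit)
p   <- length(coef(fit)) - 1      # predictors, excluding the intercept

s2_e      <- rss / (n - (p + 1))  # least squares: divides by degrees of freedom
sigma2_ml <- rss / n              # MLE: divides by n

c(s2_e, sigma2_ml, summary(fit)$sigma^2)  # the last agrees with s2_e
```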

SLIDE 9

Model “Size”

Consider two models:

$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon$$

$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \epsilon$$

Which is bigger?

SLIDE 10

Model Complexity

In general, we are interested in the complexity, or flexibility, of a model. For nested linear models, the more parameters a model has, the bigger, and thus more complex, it is. Models that are more complex will be more wiggly.

SLIDE 11

Pictures of Complexity

Go to ISL Slides

SLIDE 12

Test-Train Split

We’ve already discussed the Test-Train Split and RMSE:

$$\text{RMSE}_{\text{Train}} = \text{RMSE}(\hat{f}, \text{Train Data}) = \sqrt{\frac{1}{n_{\text{Tr}}} \sum_{i \in \text{Train}} \left( y_i - \hat{f}(x_i) \right)^2}$$

$$\text{RMSE}_{\text{Test}} = \text{RMSE}(\hat{f}, \text{Test Data}) = \sqrt{\frac{1}{n_{\text{Te}}} \sum_{i \in \text{Test}} \left( y_i - \hat{f}(x_i) \right)^2}$$
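
A minimal R sketch of this computation, reusing the hypothetical sim_data from the earlier simulation; the helper name rmse and the 50-50 split proportion are our own choices:

```r
# Root mean squared error of predictions against observed values
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Randomly split the data into train and test halves
set.seed(42)
trn_idx  <- sample(nrow(sim_data), size = nrow(sim_data) / 2)
trn_data <- sim_data[trn_idx, ]
tst_data <- sim_data[-trn_idx, ]

fit <- lm(y ~ x1 + x2, data = trn_data)

rmse(trn_data$y, predict(fit, trn_data))  # Train RMSE
rmse(tst_data$y, predict(fit, tst_data))  # Test RMSE
```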

SLIDE 13

Overfitting

  • Overfitting occurs when a model is too complex (too flexible) for the data
  • Underfitting occurs when a model is not complex enough (too inflexible) for the data

SLIDE 14

Train RMSE

[Figure: “Prediction Error vs Model Complexity”, showing Train RMSE against model complexity; x-axis: Complexity (Parameters), y-axis: Error (RMSE).]

SLIDE 15

(Expected) Test RMSE

[Figure: “Prediction Error vs Model Complexity”, showing (Expected) Test and Train RMSE curves against model complexity; x-axis: Complexity (Parameters), y-axis: Error (RMSE).]

SLIDE 16

The “Best” Model

  • Pick the model with the lowest Test RMSE (see the sketch after this list)
  • Compared to this. . .
      • More complex models with higher Test RMSE are Overfitting
      • Less complex models with higher Test RMSE are Underfitting
  • This is only a “guess” of the “best” model based on available information
  • In practice, Test RMSE might not be such a nice curve
      • This is due to the randomness of the split
      • You could get lucky, or unlucky
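
A sketch of this selection step, under the same assumptions as the earlier snippets (the hypothetical trn_data, tst_data, and rmse helper); using polynomial degree in x1 as the notion of complexity is our own illustration:

```r
# Fit candidate models of increasing complexity; compare by Test RMSE
degrees  <- 1:8
tst_rmse <- sapply(degrees, function(d) {
  fit <- lm(y ~ poly(x1, d) + x2, data = trn_data)
  rmse(tst_data$y, predict(fit, tst_data))
})

degrees[which.min(tst_rmse)]  # the "best" degree, a guess specific to this split
```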

SLIDE 17

Explanation vs Prediction

  • Sometimes we check model assumptions directly
  • When predicting, we make assumptions and check them indirectly
  • If we assume a correct (or close to correct) form of the model, the Test RMSE will be low

SLIDE 18

If Time. . .

  • rmarkdown Tables
  • Using code from the Internet
  • Back to Test-Train Split Lab
  • What would be a good Test RMSE?
  • Overfitting: n vs p
  • Randomness of Split
  • Pseudo RNG
