SLIDE 1

Bias-Variance Tradeoff

David Dalpiaz STAT 432, Fall 2019

SLIDE 2

Announcements

  • See Compass2g announcement
  • See www.stat432.org
  • Quiz 02
  • Quiz 03
  • Quiz 04 and Analysis 01 incoming

SLIDE 3

Statistical Learning

  • Supervised Learning
    • Regression
      • Parametric
      • Non-Parametric
    • Classification
  • Unsupervised Learning

SLIDE 4

Regression Setup

Given a random pair (X, Y) ∈ R^p × R, we would like to “predict” Y with some function of X, say f(X).

Define the squared error loss of estimating Y using f(X) as

L(Y, f(X)) = (Y − f(X))²

We call the expected loss the risk of estimating Y using f(X):

R(Y, f(X)) = E[L(Y, f(X))] = E_{X,Y}[(Y − f(X))²]

SLIDE 5

Minimizing Risk

After conditioning on X,

E_{X,Y}[(Y − f(X))²] = E_X E_{Y|X}[(Y − f(X))² | X = x]

we see that the risk is minimized by the conditional mean

f(x) = E(Y | X = x)

We call this the regression function.
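Since the regression function minimizes the risk, any other predictor must do worse. A minimal Monte Carlo sketch of that claim (in Python/NumPy here, though the deck's code is R; the x² setup mirrors the simulation later in these slides):

```python
import numpy as np

# Monte Carlo check that the regression function f(x) = E[Y | X = x]
# minimizes squared-error risk. Same data generating process as the
# simulation later in the deck: E[Y | X = x] = x^2, noise sd 0.3.
rng = np.random.default_rng(42)
x = rng.uniform(0.0, 1.0, size=100_000)
y = x ** 2 + rng.normal(0.0, 0.3, size=x.size)

risk_regression_fn = np.mean((y - x ** 2) ** 2)  # predict with E[Y | X = x]
risk_alternative = np.mean((y - x) ** 2)         # predict with f(x) = x instead
```

The regression function's estimated risk is close to the noise floor σ² = 0.09, while the competing predictor pays an extra E[(X² − X)²] = 1/30 ≈ 0.033 on top of it.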

SLIDE 6

Estimating f

Given data D = (x_i, y_i) ∈ R^p × R, our goal is to find some f̂ that is a good estimate of the regression function f.

SLIDE 7

Expected Prediction Error

EPE(Y, f̂(X)) = E_{X,Y,D}[(Y − f̂(X))²]

SLIDE 8

Reducible and Irreducible Error

EPE(Y, f̂(x)) = E_{Y|X,D}[(Y − f̂(X))² | X = x]
             = E_D[(f(x) − f̂(x))²] + V_{Y|X}[Y | X = x]

The first term, E_D[(f(x) − f̂(x))²], is the reducible error; the second, V_{Y|X}[Y | X = x], is the irreducible error.

SLIDE 9

Bias and Variance

Recall the definition of the bias of an estimator:

bias(θ̂) = E[θ̂] − θ

Also recall the definition of the variance of an estimator:

var(θ̂) = V(θ̂) = E[(θ̂ − E[θ̂])²]
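As a concrete illustration of these two definitions, consider the divide-by-n sample variance, a standard example of a biased estimator. A short simulation sketch (Python/NumPy for illustration; not from the deck):

```python
import numpy as np

# Empirical bias and variance of an estimator, following the definitions
# above. Estimator: the divide-by-n sample variance of N(0, 1) data, so
# theta = 1 and the exact bias is -1/n.
rng = np.random.default_rng(1)
n, n_sims = 10, 100_000
samples = rng.normal(0.0, 1.0, size=(n_sims, n))
theta_hats = samples.var(axis=1)  # ddof=0: the biased, divide-by-n estimator

bias_hat = theta_hats.mean() - 1.0                        # E[theta_hat] - theta
var_hat = np.mean((theta_hats - theta_hats.mean()) ** 2)  # E[(theta_hat - E[theta_hat])^2]
```

For normal data, bias_hat lands near −1/n = −0.1 and var_hat near 2(n − 1)/n² = 0.18.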

SLIDE 10

Bias and Variance

SLIDE 11

Bias-Variance Decomposition

MSE(f(x), f̂(x)) = E_D[(f(x) − f̂(x))²]
                = (f(x) − E[f̂(x)])² + E[(f̂(x) − E[f̂(x)])²]

The first term is bias²(f̂(x)); the second is var(f̂(x)).

SLIDE 12

Bias-Variance Decomposition

MSE(f(x), f̂(x)) = bias²(f̂(x)) + var(f̂(x))
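This decomposition is an algebraic identity, and it holds exactly for sample moments too, which is what the simulation later in the deck relies on. A quick numeric check (Python/NumPy sketch; the target 0.81 and the bias and spread of the estimates are arbitrary illustrative choices):

```python
import numpy as np

# Check MSE(f(x), f_hat(x)) = bias^2 + var on simulated estimates of a fixed
# target. The target 0.81 (which happens to equal f(0.90) in the upcoming
# simulation) and the bias 0.05 / sd 0.2 are made-up illustrative values.
rng = np.random.default_rng(0)
f_x = 0.81
f_hat = f_x + 0.05 + rng.normal(0.0, 0.2, 100_000)

mse = np.mean((f_x - f_hat) ** 2)
bias_sq = (f_x - f_hat.mean()) ** 2
var = np.mean((f_hat - f_hat.mean()) ** 2)
assert np.isclose(mse, bias_sq + var)  # the identity holds exactly
```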
SLIDE 13

Bias-Variance Decomposition

[Figure: Decomposition of Prediction Error. Two panels plot squared bias, variance, Bayes error, and EPE against model complexity: one where variance is more dominant, one where bias is more dominant.]

SLIDE 14

Expected Test Error

[Figure: Error versus Model Complexity. Train and validation error curves; moving from low to high complexity, bias goes from high to low and variance from low to high.]

SLIDE 15

Simulation Study, Regression Function

We will illustrate these decompositions, most importantly the bias-variance tradeoff, through simulation. Suppose we would like to train a model to learn the true regression function f(x) = x².

f = function(x) {
  x ^ 2
}

SLIDE 16

Simulation Study, Regression Function

More specifically, we’d like to predict an observation, Y, given that X = x by using f̂(x), where E[Y | X = x] = f(x) = x² and V[Y | X = x] = σ².

SLIDE 17

Simulation Study, Data Generating Process

To carry out a concrete simulation example, we need to fully specify the data generating process. We do so with the following R code.

gen_sim_data = function(f, sample_size = 100) {
  x = runif(n = sample_size, min = 0, max = 1)
  y = rnorm(n = sample_size, mean = f(x), sd = 0.3)
  data.frame(x, y)
}

SLIDE 18

Simulation Study, Models

Using this setup, we will generate datasets, D, with a sample size n = 100 and fit four models.

predict(fit0, x) = f̂₀(x) = β̂₀
predict(fit1, x) = f̂₁(x) = β̂₀ + β̂₁x
predict(fit2, x) = f̂₂(x) = β̂₀ + β̂₁x + β̂₂x²
predict(fit9, x) = f̂₉(x) = β̂₀ + β̂₁x + β̂₂x² + … + β̂₉x⁹

SLIDE 19

Simulation Study, Trained Models

[Figure: Four Polynomial Models fit to a Simulated Dataset. Fits of y ~ 1, y ~ poly(x, 1), y ~ poly(x, 2), and y ~ poly(x, 9), plotted with the truth f(x) = x².]

SLIDE 20

Simulation Study, Repeated Training

[Figure: Simulated Datasets 1, 2, and 3, each showing fits of y ~ 1 and y ~ poly(x, 9).]

SLIDE 21

Simulation Study, KNN

[Figure: KNN fits on Simulated Datasets 1, 2, and 3, with k = 5 and k = 100.]

SLIDE 22

Simulation Study, Setup

set.seed(1)
n_sims = 250
n_models = 4
x = data.frame(x = 0.90)
predictions = matrix(0, nrow = n_sims, ncol = n_models)

SLIDE 23

Simulation Study, Running Simulations

for (sim in 1:n_sims) {
  sim_data = gen_sim_data(f = f)
  # fit models
  fit_0 = lm(y ~ 1, data = sim_data)
  fit_1 = lm(y ~ poly(x, degree = 1), data = sim_data)
  fit_2 = lm(y ~ poly(x, degree = 2), data = sim_data)
  fit_9 = lm(y ~ poly(x, degree = 9), data = sim_data)
  # get predictions
  predictions[sim, 1] = predict(fit_0, x)
  predictions[sim, 2] = predict(fit_1, x)
  predictions[sim, 3] = predict(fit_2, x)
  predictions[sim, 4] = predict(fit_9, x)
}

SLIDE 24

Simulation Study, Results

[Figure: Simulated Predictions for Polynomial Models. Boxplots of the simulated predictions at x = 0.90 for each polynomial degree (0, 1, 2, 9).]

SLIDE 25

Bias-Variance Tradeoff

  • As complexity increases, bias decreases.
  • As complexity increases, variance increases.

SLIDE 26

Simulation Study, Quantities of Interest

MSE(f(0.90), f̂ₖ(0.90)) = (E[f̂ₖ(0.90)] − f(0.90))² + E[(f̂ₖ(0.90) − E[f̂ₖ(0.90)])²]

The first term is bias²(f̂ₖ(0.90)); the second is var(f̂ₖ(0.90)).

SLIDE 27

Estimation Using Simulation

MSE(f(0.90), f̂ₖ(0.90)) ≈ (1/n_sims) Σ_{i=1}^{n_sims} (f(0.90) − f̂ₖ⁽ⁱ⁾(0.90))²

bias(f̂ₖ(0.90)) ≈ ((1/n_sims) Σ_{i=1}^{n_sims} f̂ₖ⁽ⁱ⁾(0.90)) − f(0.90)

var(f̂ₖ(0.90)) ≈ (1/n_sims) Σ_{i=1}^{n_sims} (f̂ₖ⁽ⁱ⁾(0.90) − (1/n_sims) Σ_{j=1}^{n_sims} f̂ₖ⁽ʲ⁾(0.90))²

Here f̂ₖ⁽ⁱ⁾(0.90) denotes the prediction at x = 0.90 from the degree-k model trained on the i-th simulated dataset.
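The deck carries this out in R. An equivalent end-to-end sketch in Python/NumPy (np.polyfit stands in for lm with poly(), giving the same least-squares fits; exact numbers will differ from the R results because the random streams differ):

```python
import numpy as np

# Re-run the deck's simulation in NumPy: fit polynomials of degree 0, 1, 2, 9
# to 250 simulated datasets and estimate bias^2, variance, and MSE of the
# prediction at x = 0.90, where the true value is f(0.90) = 0.81.
rng = np.random.default_rng(1)

def f(x):
    return x ** 2

n_sims, n, x0 = 250, 100, 0.90
degrees = [0, 1, 2, 9]
preds = np.empty((n_sims, len(degrees)))

for sim in range(n_sims):
    x = rng.uniform(0.0, 1.0, n)
    y = f(x) + rng.normal(0.0, 0.3, n)
    for j, d in enumerate(degrees):
        coefs = np.polyfit(x, y, deg=d)    # least-squares polynomial fit
        preds[sim, j] = np.polyval(coefs, x0)

bias_sq = (preds.mean(axis=0) - f(x0)) ** 2   # squared bias per degree
var = preds.var(axis=0)                       # variance per degree
mse = np.mean((f(x0) - preds) ** 2, axis=0)   # MSE per degree
```

As in the slides, degree 0 is dominated by squared bias, degree 9 by variance, and mse equals bias_sq + var for every degree.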

SLIDE 28

Simulation Study, Results

Degree   Mean Squared Error   Bias Squared   Variance
     0              0.22643        0.22476    0.00167
     1              0.00829        0.00508    0.00322
     2              0.00387        0.00005    0.00381
     9              0.01019        0.00002    0.01017

SLIDE 29

If Time

  • Note that f̂₉(x) is unbiased
  • Some live coding