SLIDE 1

Bias-Variance Tradeoff

David Dalpiaz STAT 432, Fall 2019

SLIDE 2

Announcements

  • See Compass2g announcement
  • See www.stat432.org
  • Quiz 02
  • Quiz 03
  • Quiz 04 and Analysis 01 incoming

SLIDE 3

Statistical Learning

  • Supervised Learning
    • Regression
      • Parametric
      • Non-Parametric
    • Classification
  • Unsupervised Learning

SLIDE 4

Regression Setup

Given a random pair (X, Y) ∈ R^p × R, we would like to “predict” Y with some function of X, say f(X).

Define the squared error loss of estimating Y using f(X) as

L(Y, f(X)) = (Y − f(X))²

We call the expected loss the risk of estimating Y using f(X):

R(Y, f(X)) = E[L(Y, f(X))] = E_{X,Y}[(Y − f(X))²]

SLIDE 5

Minimizing Risk

After conditioning on X,

E_{X,Y}[(Y − f(X))²] = E_X E_{Y|X}[(Y − f(X))² | X = x]

we see that the risk is minimized by the conditional mean

f(x) = E(Y | X = x)

We call this the regression function.
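Since the regression function minimizes the risk, any other predictor must do worse. A minimal Monte Carlo sketch of that claim (in Python/NumPy here, though the deck's code is R; the x² setup mirrors the simulation later in these slides):

```python
import numpy as np

# Monte Carlo check that the regression function f(x) = E[Y | X = x]
# minimizes squared-error risk. Same data generating process as the
# simulation later in the deck: E[Y | X = x] = x^2, noise sd 0.3.
rng = np.random.default_rng(42)
x = rng.uniform(0.0, 1.0, size=100_000)
y = x ** 2 + rng.normal(0.0, 0.3, size=x.size)

risk_regression_fn = np.mean((y - x ** 2) ** 2)  # predict with E[Y | X = x]
risk_alternative = np.mean((y - x) ** 2)         # predict with f(x) = x instead
```

The regression function's estimated risk is close to the noise floor σ² = 0.09, while the competing predictor pays an extra E[(X² − X)²] = 1/30 ≈ 0.033 on top of it.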

SLIDE 6

Estimating f

Given data D = (x_i, y_i) ∈ R^p × R, our goal is to find some f̂ that is a good estimate of the regression function f.

SLIDE 7

Expected Prediction Error

EPE(Y, f̂(X)) = E_{X,Y,D}[(Y − f̂(X))²]

SLIDE 8

Reducible and Irreducible Error

EPE(Y, f̂(x)) = E_{Y|X,D}[(Y − f̂(X))² | X = x]
             = E_D[(f(x) − f̂(x))²] + V_{Y|X}[Y | X = x]

The first term, E_D[(f(x) − f̂(x))²], is the reducible error; the second, V_{Y|X}[Y | X = x], is the irreducible error.

SLIDE 9

Bias and Variance

Recall the definition of the bias of an estimator:

bias(θ̂) = E[θ̂] − θ

Also recall the definition of the variance of an estimator:

var(θ̂) = V(θ̂) = E[(θ̂ − E[θ̂])²]
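As a concrete illustration of these two definitions, consider the divide-by-n sample variance, a standard example of a biased estimator. A short simulation sketch (Python/NumPy for illustration; not from the deck):

```python
import numpy as np

# Empirical bias and variance of an estimator, following the definitions
# above. Estimator: the divide-by-n sample variance of N(0, 1) data, so
# theta = 1 and the exact bias is -1/n.
rng = np.random.default_rng(1)
n, n_sims = 10, 100_000
samples = rng.normal(0.0, 1.0, size=(n_sims, n))
theta_hats = samples.var(axis=1)  # ddof=0: the biased, divide-by-n estimator

bias_hat = theta_hats.mean() - 1.0                        # E[theta_hat] - theta
var_hat = np.mean((theta_hats - theta_hats.mean()) ** 2)  # E[(theta_hat - E[theta_hat])^2]
```

For normal data, bias_hat lands near −1/n = −0.1 and var_hat near 2(n − 1)/n² = 0.18.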

SLIDE 10

Bias and Variance

SLIDE 11

Bias-Variance Decomposition

MSE(f(x), f̂(x)) = E_D[(f(x) − f̂(x))²]
                = (f(x) − E[f̂(x)])² + E[(f̂(x) − E[f̂(x)])²]

The first term is bias²(f̂(x)); the second is var(f̂(x)).

SLIDE 12

Bias-Variance Decomposition

MSE(f(x), f̂(x)) = bias²(f̂(x)) + var(f̂(x))
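This decomposition is an algebraic identity, and it holds exactly for sample moments too, which is what the simulation later in the deck relies on. A quick numeric check (Python/NumPy sketch; the target 0.81 and the bias and spread of the estimates are arbitrary illustrative choices):

```python
import numpy as np

# Check MSE(f(x), f_hat(x)) = bias^2 + var on simulated estimates of a fixed
# target. The target 0.81 (which happens to equal f(0.90) in the upcoming
# simulation) and the bias 0.05 / sd 0.2 are made-up illustrative values.
rng = np.random.default_rng(0)
f_x = 0.81
f_hat = f_x + 0.05 + rng.normal(0.0, 0.2, 100_000)

mse = np.mean((f_x - f_hat) ** 2)
bias_sq = (f_x - f_hat.mean()) ** 2
var = np.mean((f_hat - f_hat.mean()) ** 2)
assert np.isclose(mse, bias_sq + var)  # the identity holds exactly
```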
SLIDE 13

Bias-Variance Decomposition

[Figure: Decomposition of Prediction Error. Two panels plot squared bias, variance, Bayes error, and EPE against model complexity: one where variance is more dominant, one where bias is more dominant.]

SLIDE 14

Expected Test Error

[Figure: Error versus Model Complexity. Train and validation error curves; moving from low to high complexity, bias goes from high to low and variance from low to high.]

SLIDE 15

Simulation Study, Regression Function

We will illustrate these decompositions, most importantly the bias-variance tradeoff, through simulation. Suppose we would like to train a model to learn the true regression function f(x) = x².

f = function(x) {
  x ^ 2
}

SLIDE 16

Simulation Study, Regression Function

More specifically, we’d like to predict an observation, Y, given that X = x by using f̂(x), where E[Y | X = x] = f(x) = x² and V[Y | X = x] = σ².

SLIDE 17

Simulation Study, Data Generating Process

To carry out a concrete simulation example, we need to fully specify the data generating process. We do so with the following R code.

gen_sim_data = function(f, sample_size = 100) {
  x = runif(n = sample_size, min = 0, max = 1)
  y = rnorm(n = sample_size, mean = f(x), sd = 0.3)
  data.frame(x, y)
}

SLIDE 18

Simulation Study, Models

Using this setup, we will generate datasets, D, with a sample size n = 100 and fit four models.

predict(fit0, x) = f̂₀(x) = β̂₀
predict(fit1, x) = f̂₁(x) = β̂₀ + β̂₁x
predict(fit2, x) = f̂₂(x) = β̂₀ + β̂₁x + β̂₂x²
predict(fit9, x) = f̂₉(x) = β̂₀ + β̂₁x + β̂₂x² + … + β̂₉x⁹

SLIDE 19

Simulation Study, Trained Models

[Figure: Four Polynomial Models fit to a Simulated Dataset. Fits of y ~ 1, y ~ poly(x, 1), y ~ poly(x, 2), and y ~ poly(x, 9), plotted with the truth f(x) = x².]

SLIDE 20

Simulation Study, Repeated Training

[Figure: Simulated Datasets 1, 2, and 3, each showing fits of y ~ 1 and y ~ poly(x, 9).]

SLIDE 21

Simulation Study, KNN

[Figure: KNN fits on Simulated Datasets 1, 2, and 3, with k = 5 and k = 100.]

SLIDE 22

Simulation Study, Setup

set.seed(1)
n_sims = 250
n_models = 4
x = data.frame(x = 0.90)
predictions = matrix(0, nrow = n_sims, ncol = n_models)

SLIDE 23

Simulation Study, Running Simulations

for (sim in 1:n_sims) {
  sim_data = gen_sim_data(f = f)
  # fit models
  fit_0 = lm(y ~ 1, data = sim_data)
  fit_1 = lm(y ~ poly(x, degree = 1), data = sim_data)
  fit_2 = lm(y ~ poly(x, degree = 2), data = sim_data)
  fit_9 = lm(y ~ poly(x, degree = 9), data = sim_data)
  # get predictions
  predictions[sim, 1] = predict(fit_0, x)
  predictions[sim, 2] = predict(fit_1, x)
  predictions[sim, 3] = predict(fit_2, x)
  predictions[sim, 4] = predict(fit_9, x)
}

SLIDE 24

Simulation Study, Results

[Figure: Simulated Predictions for Polynomial Models. Boxplots of the simulated predictions at x = 0.90 for each polynomial degree (0, 1, 2, 9).]

SLIDE 25

Bias-Variance Tradeoff

  • As complexity increases, bias decreases.
  • As complexity increases, variance increases.

SLIDE 26

Simulation Study, Quantities of Interest

MSE(f(0.90), f̂ₖ(0.90)) = (E[f̂ₖ(0.90)] − f(0.90))² + E[(f̂ₖ(0.90) − E[f̂ₖ(0.90)])²]

The first term is bias²(f̂ₖ(0.90)); the second is var(f̂ₖ(0.90)).

SLIDE 27

Estimation Using Simulation

MSE(f(0.90), f̂ₖ(0.90)) ≈ (1/n_sims) Σ_{i=1}^{n_sims} (f(0.90) − f̂ₖ⁽ⁱ⁾(0.90))²

bias(f̂ₖ(0.90)) ≈ ((1/n_sims) Σ_{i=1}^{n_sims} f̂ₖ⁽ⁱ⁾(0.90)) − f(0.90)

var(f̂ₖ(0.90)) ≈ (1/n_sims) Σ_{i=1}^{n_sims} (f̂ₖ⁽ⁱ⁾(0.90) − (1/n_sims) Σ_{j=1}^{n_sims} f̂ₖ⁽ʲ⁾(0.90))²

Here f̂ₖ⁽ⁱ⁾(0.90) denotes the prediction at x = 0.90 from the degree-k model trained on the i-th simulated dataset.
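The deck carries this out in R. An equivalent end-to-end sketch in Python/NumPy (np.polyfit stands in for lm with poly(), giving the same least-squares fits; exact numbers will differ from the R results because the random streams differ):

```python
import numpy as np

# Re-run the deck's simulation in NumPy: fit polynomials of degree 0, 1, 2, 9
# to 250 simulated datasets and estimate bias^2, variance, and MSE of the
# prediction at x = 0.90, where the true value is f(0.90) = 0.81.
rng = np.random.default_rng(1)

def f(x):
    return x ** 2

n_sims, n, x0 = 250, 100, 0.90
degrees = [0, 1, 2, 9]
preds = np.empty((n_sims, len(degrees)))

for sim in range(n_sims):
    x = rng.uniform(0.0, 1.0, n)
    y = f(x) + rng.normal(0.0, 0.3, n)
    for j, d in enumerate(degrees):
        coefs = np.polyfit(x, y, deg=d)    # least-squares polynomial fit
        preds[sim, j] = np.polyval(coefs, x0)

bias_sq = (preds.mean(axis=0) - f(x0)) ** 2   # squared bias per degree
var = preds.var(axis=0)                       # variance per degree
mse = np.mean((f(x0) - preds) ** 2, axis=0)   # MSE per degree
```

As in the slides, degree 0 is dominated by squared bias, degree 9 by variance, and mse equals bias_sq + var for every degree.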

SLIDE 28

Simulation Study, Results

Degree   Mean Squared Error   Bias Squared   Variance
     0              0.22643        0.22476    0.00167
     1              0.00829        0.00508    0.00322
     2              0.00387        0.00005    0.00381
     9              0.01019        0.00002    0.01017

SLIDE 29

If Time

  • Note that f̂₉(x) is unbiased
  • Some live coding