Bias-Variance Tradeoff
David Dalpiaz
STAT 430, Fall 2017
1
Announcements
- Homework 03 released
- Regrade policy
- Style policy?
2
Statistical Learning
- Supervised Learning
  - Regression
    - Parametric
    - Non-Parametric
  - Classification
- Unsupervised Learning
3
Regression Setup
Given a random pair $(X, Y) \in \mathbb{R}^p \times \mathbb{R}$, we would like to "predict" $Y$ with some function of $X$, say, $f(X)$.

Define the squared error loss of estimating $Y$ using $f(X)$ as

$$L(Y, f(X)) \triangleq (Y - f(X))^2$$

We call the expected loss the risk of estimating $Y$ using $f(X)$:

$$R(Y, f(X)) \triangleq \mathbb{E}[L(Y, f(X))] = \mathbb{E}_{X,Y}\left[(Y - f(X))^2\right]$$
4
Minimizing Risk
After conditioning on $X$,

$$\mathbb{E}_{X,Y}\left[(Y - f(X))^2\right] = \mathbb{E}_X \, \mathbb{E}_{Y \mid X}\left[(Y - f(X))^2 \mid X = x\right]$$

we see that the risk is minimized by the conditional mean

$$f(x) = \mathbb{E}(Y \mid X = x)$$

We call this the regression function.
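Why the conditional mean minimizes the risk (a step not shown on the slide): expanding the inner expectation around $\mathbb{E}[Y \mid X = x]$ gives, for any candidate $f$,

```latex
\mathbb{E}_{Y \mid X}\left[(Y - f(X))^2 \mid X = x\right]
  = \mathbb{V}[Y \mid X = x]
  + \left(\mathbb{E}[Y \mid X = x] - f(x)\right)^2
```

The first term does not depend on $f$, and the second term is minimized, at zero, by taking $f(x) = \mathbb{E}(Y \mid X = x)$.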
5
Estimating f
Given data $\mathcal{D} = \{(x_i, y_i)\} \subset \mathbb{R}^p \times \mathbb{R}$, our goal is to find some $\hat{f}$ that is a good estimate of the regression function $f$.
6
Expected Prediction Error
$$\text{EPE}\left(Y, \hat{f}(X)\right) \triangleq \mathbb{E}_{X, Y, \mathcal{D}}\left[\left(Y - \hat{f}(X)\right)^2\right]$$
7
Reducible and Irreducible Error
$$\begin{aligned}
\text{EPE}\left(Y, \hat{f}(x)\right)
  &= \mathbb{E}_{Y \mid X, \mathcal{D}}\left[\left(Y - \hat{f}(X)\right)^2 \mid X = x\right] \\
  &= \underbrace{\mathbb{E}_{\mathcal{D}}\left[\left(f(x) - \hat{f}(x)\right)^2\right]}_{\text{reducible error}}
   + \underbrace{\mathbb{V}_{Y \mid X}\left[Y \mid X = x\right]}_{\text{irreducible error}}
\end{aligned}$$
8
Bias and Variance
Recall the definition of the bias of an estimator:

$$\text{bias}(\hat{\theta}) \triangleq \mathbb{E}\left[\hat{\theta}\right] - \theta$$

Also recall the definition of the variance of an estimator:

$$\mathbb{V}(\hat{\theta}) = \text{var}(\hat{\theta}) \triangleq \mathbb{E}\left[\left(\hat{\theta} - \mathbb{E}\left[\hat{\theta}\right]\right)^2\right]$$
9
Bias and Variance
Figure 1: Dartboard Analogy of Bias and Variance
10
Bias-Variance Decomposition
$$\begin{aligned}
\text{MSE}\left(f(x), \hat{f}(x)\right)
  &\triangleq \mathbb{E}_{\mathcal{D}}\left[\left(f(x) - \hat{f}(x)\right)^2\right] \\
  &= \underbrace{\left(f(x) - \mathbb{E}\left[\hat{f}(x)\right]\right)^2}_{\text{bias}^2\left(\hat{f}(x)\right)}
   + \underbrace{\mathbb{E}\left[\left(\hat{f}(x) - \mathbb{E}\left[\hat{f}(x)\right]\right)^2\right]}_{\text{var}\left(\hat{f}(x)\right)}
\end{aligned}$$

11
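The middle step of this decomposition is omitted on the slide; it follows by adding and subtracting $\mathbb{E}\left[\hat{f}(x)\right]$ inside the square, after which the cross term has expectation zero:

```latex
\begin{aligned}
\mathbb{E}_{\mathcal{D}}\left[\left(f(x) - \hat{f}(x)\right)^2\right]
  &= \left(f(x) - \mathbb{E}\left[\hat{f}(x)\right]\right)^2
   + \mathbb{E}\left[\left(\hat{f}(x) - \mathbb{E}\left[\hat{f}(x)\right]\right)^2\right] \\
  &\quad + 2\left(f(x) - \mathbb{E}\left[\hat{f}(x)\right]\right)
     \underbrace{\mathbb{E}\left[\mathbb{E}\left[\hat{f}(x)\right] - \hat{f}(x)\right]}_{=\,0}
\end{aligned}
```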
Bias-Variance Decomposition
$$\text{MSE}\left(f(x), \hat{f}(x)\right) = \text{bias}^2\left(\hat{f}(x)\right) + \text{var}\left(\hat{f}(x)\right)$$

12
Bias-Variance Decomposition
Figure: Decomposition of prediction error versus model complexity, shown in two regimes: one where variance is more dominant and one where bias is more dominant. Curves: squared bias, variance, Bayes error, and EPE.
13
Expected Test Error
Figure: Error versus model complexity, showing (expected) test error and train error. Moving from low to high complexity, bias goes from high to low while variance goes from low to high.
14
Simulation Study, Regression Function
We will illustrate these decompositions, most importantly the bias-variance tradeoff, through simulation. Suppose we would like to train a model to learn the true regression function $f(x) = x^2$.

    f = function(x) {
      x ^ 2
    }
15
Simulation Study, Regression Function
More specifically, we'd like to predict an observation, $Y$, given that $X = x$, by using $\hat{f}(x)$, where

$$\mathbb{E}[Y \mid X = x] = f(x) = x^2 \quad \text{and} \quad \mathbb{V}[Y \mid X = x] = \sigma^2.$$
16
Simulation Study, Data Generating Process
To carry out a concrete simulation example, we need to fully specify the data generating process. We do so with the following R code.

    get_sim_data = function(f, sample_size = 100) {
      x = runif(n = sample_size, min = 0, max = 1)
      y = rnorm(n = sample_size, mean = f(x), sd = 0.3)
      data.frame(x, y)
    }
17
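As a quick sanity check of this data generating process, one might simulate a larger dataset and verify that the noise around the true mean has standard deviation near 0.3. This snippet is not from the slides; it simply re-uses the definitions above.

```r
# true regression function and data generating process, as defined above
f = function(x) {
  x ^ 2
}

get_sim_data = function(f, sample_size = 100) {
  x = runif(n = sample_size, min = 0, max = 1)
  y = rnorm(n = sample_size, mean = f(x), sd = 0.3)
  data.frame(x, y)
}

set.seed(42)
sim_data = get_sim_data(f, sample_size = 5000)

# residuals around the true mean should have sd near 0.3
nrow(sim_data)
sd(sim_data$y - f(sim_data$x))
```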
Simulation Study, Models
Using this setup, we will generate datasets, $\mathcal{D}$, with a sample size $n = 100$, and fit four models:

$$\begin{aligned}
\texttt{predict(fit0, x)} &= \hat{f}_0(x) = \hat{\beta}_0 \\
\texttt{predict(fit1, x)} &= \hat{f}_1(x) = \hat{\beta}_0 + \hat{\beta}_1 x \\
\texttt{predict(fit2, x)} &= \hat{f}_2(x) = \hat{\beta}_0 + \hat{\beta}_1 x + \hat{\beta}_2 x^2 \\
\texttt{predict(fit9, x)} &= \hat{f}_9(x) = \hat{\beta}_0 + \hat{\beta}_1 x + \hat{\beta}_2 x^2 + \ldots + \hat{\beta}_9 x^9
\end{aligned}$$
18
Simulation Study, Trained Models
Figure: Four polynomial models (y ~ 1, y ~ poly(x, 1), y ~ poly(x, 2), y ~ poly(x, 9)) fit to a simulated dataset, plotted along with the truth.
19
Simulation Study, Repeated Training
Figure: The y ~ 1 and y ~ poly(x, 9) models fit to three different simulated datasets (Simulated Dataset 1, 2, and 3).
20
Simulation Study, KNN
Figure: KNN models with k = 5 and k = 100 fit to three different simulated datasets (Simulated Dataset 1, 2, and 3).
21
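The slides do not show the KNN fitting code. A minimal base-R sketch of KNN regression in one dimension (the function name `knn_reg` and this implementation are mine, not from the course) illustrates the two extremes plotted above:

```r
# KNN regression: predict at x0 with the mean y of the k nearest x values
knn_reg = function(x0, x, y, k) {
  nearest = order(abs(x - x0))[1:k]
  mean(y[nearest])
}

set.seed(42)
x = runif(100, min = 0, max = 1)
y = rnorm(100, mean = x ^ 2, sd = 0.3)

# k = 5 tracks local structure (low bias, high variance);
# k = 100 averages all 100 points, collapsing to the global mean of y
knn_reg(0.90, x, y, k = 5)
knn_reg(0.90, x, y, k = 100)
```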
Simulation Study, Setup
    set.seed(1)
    n_sims = 250
    n_models = 4
    x = data.frame(x = 0.90)
    predictions = matrix(0, nrow = n_sims, ncol = n_models)
22
Simulation Study, Running Simulations
    for (sim in 1:n_sims) {
      sim_data = get_sim_data(f)

      # fit models
      fit_0 = lm(y ~ 1, data = sim_data)
      fit_1 = lm(y ~ poly(x, degree = 1), data = sim_data)
      fit_2 = lm(y ~ poly(x, degree = 2), data = sim_data)
      fit_9 = lm(y ~ poly(x, degree = 9), data = sim_data)

      # get predictions
      predictions[sim, 1] = predict(fit_0, x)
      predictions[sim, 2] = predict(fit_1, x)
      predictions[sim, 3] = predict(fit_2, x)
      predictions[sim, 4] = predict(fit_9, x)
    }
23
Simulation Study, Results
Figure: Simulated predictions for the polynomial models at x = 0.90, plotted by polynomial degree (0, 1, 2, 9).
24
Bias-Variance Tradeoff
- As complexity increases, bias decreases.
- As complexity increases, variance increases.
25
Simulation Study, Quantities of Interest
$$\begin{aligned}
\text{MSE}\left(f(0.90), \hat{f}_k(0.90)\right)
  &= \underbrace{\left(\mathbb{E}\left[\hat{f}_k(0.90)\right] - f(0.90)\right)^2}_{\text{bias}^2\left(\hat{f}_k(0.90)\right)}
   + \underbrace{\mathbb{E}\left[\left(\hat{f}_k(0.90) - \mathbb{E}\left[\hat{f}_k(0.90)\right]\right)^2\right]}_{\text{var}\left(\hat{f}_k(0.90)\right)}
\end{aligned}$$

26
Estimation Using Simulation
$$\widehat{\text{MSE}}\left(f(0.90), \hat{f}_k(0.90)\right) = \frac{1}{n_{\text{sims}}} \sum_{i=1}^{n_{\text{sims}}} \left(f(0.90) - \hat{f}_k^{[i]}(0.90)\right)^2$$

$$\widehat{\text{bias}}\left(\hat{f}_k(0.90)\right) = \frac{1}{n_{\text{sims}}} \sum_{i=1}^{n_{\text{sims}}} \hat{f}_k^{[i]}(0.90) - f(0.90)$$

$$\widehat{\text{var}}\left(\hat{f}_k(0.90)\right) = \frac{1}{n_{\text{sims}}} \sum_{i=1}^{n_{\text{sims}}} \left(\hat{f}_k^{[i]}(0.90) - \frac{1}{n_{\text{sims}}} \sum_{j=1}^{n_{\text{sims}}} \hat{f}_k^{[j]}(0.90)\right)^2$$

where $\hat{f}_k^{[i]}(0.90)$ denotes the fit from the $i$th simulated dataset.
27
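These estimators can be computed directly from the predictions matrix filled in by the simulation loop. A self-contained sketch (it repeats the earlier setup so it runs on its own; the variable names `mse`, `bias`, and `variance` are mine):

```r
f = function(x) {
  x ^ 2
}

get_sim_data = function(f, sample_size = 100) {
  x = runif(n = sample_size, min = 0, max = 1)
  y = rnorm(n = sample_size, mean = f(x), sd = 0.3)
  data.frame(x, y)
}

set.seed(1)
n_sims = 250
x0 = data.frame(x = 0.90)
predictions = matrix(0, nrow = n_sims, ncol = 4)

for (sim in 1:n_sims) {
  sim_data = get_sim_data(f)
  fit_0 = lm(y ~ 1, data = sim_data)
  fit_1 = lm(y ~ poly(x, degree = 1), data = sim_data)
  fit_2 = lm(y ~ poly(x, degree = 2), data = sim_data)
  fit_9 = lm(y ~ poly(x, degree = 9), data = sim_data)
  predictions[sim, 1] = predict(fit_0, x0)
  predictions[sim, 2] = predict(fit_1, x0)
  predictions[sim, 3] = predict(fit_2, x0)
  predictions[sim, 4] = predict(fit_9, x0)
}

# plug-in estimates of the quantities of interest at x = 0.90
mse      = apply(predictions, 2, function(p) mean((p - f(0.90)) ^ 2))
bias     = apply(predictions, 2, mean) - f(0.90)
variance = apply(predictions, 2, function(p) mean((p - mean(p)) ^ 2))

# the decomposition holds exactly for these plug-in estimates
round(cbind(degree = c(0, 1, 2, 9), mse, bias_sq = bias ^ 2, variance), 5)
```

Note that the decomposition $\widehat{\text{MSE}} = \widehat{\text{bias}}^2 + \widehat{\text{var}}$ is an exact algebraic identity for these plug-in estimates, not just an approximation.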
Simulation Study, Results
| Degree | Mean Squared Error | Bias Squared | Variance |
|--------|--------------------|--------------|----------|
| 0      | 0.22643            | 0.22476      | 0.00167  |
| 1      | 0.00829            | 0.00508      | 0.00322  |
| 2      | 0.00387            | 0.00005      | 0.00381  |
| 9      | 0.01019            | 0.00002      | 0.01017  |
28
If Time
- Note that $\hat{f}_9(x)$ is unbiased, since the degree-9 model contains the true quadratic model; its larger MSE comes from variance.
- Some live coding