ECE 6254 - Spring 2020 - Lecture 24 v1.0 - revised April 11, 2020
Bias-Variance Tradeoff
Matthieu R. Bloch
We have formalized the problem of supervised learning as finding a function (or hypothesis) h in a given set H that minimizes the true risk R(h). In the context of classification, we hope to approximate the optimal Bayes classifier, while in the context of regression, we hope to approximate the true underlying function. We have already seen that the choice of H must strike a delicate tradeoff between two desirable characteristics:
- a more complex H leads to a better chance of approximating the ideal classifier/function;
- a less complex H leads to a better chance of generalizing to unseen data.
Regularization plays a similar role by biasing the answer away from complex functions. This is particularly crucial for regression, in which the complexity must be carefully limited to avoid overfitting. In the context of classification, we have already seen that the tradeoff can be precisely quantified in terms of the VC generalization bound, which takes the form R(h) ⩽ R̂N(h) + ϵ(H, N) with high probability. We now develop an alternative method to quantify the tradeoff, called the bias-variance decomposition, which takes the form R(h) ≈ bias² + variance. Therein, the bias captures how well H can approximate the true h∗, while the variance captures how likely we are to pick a good h ∈ H. This approach generalizes more easily to regression than the VC dimension approach developed for classification.

1 Setup for bias-variance decomposition analysis

We formalize the bias-variance tradeoff assuming the following:
- f : Rd → R is the unknown target function that we are trying to learn;
- D = {(xi, yi)}Ni=1 is the dataset, where the pairs (xi, yi) are independent and identically distributed (i.i.d.); specifically, xi ∈ Rd and yi = f(xi) + εi ∈ R, where εi is a zero-mean noise random variable independent of xi with variance σε² (for instance, εi ∼ N(0, σε²));
- ĥD : Rd → R is our choice of function in H, selected using D;
- The performance of ĥD is measured in terms of the mean squared error R(ĥD) = EXY[(ĥD(x) − y)²].
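
To make the setup concrete, here is a minimal Monte Carlo sketch in Python. Everything in it is an illustrative assumption rather than part of the notes: the target f(x) = sin(2πx), the noise level, and the choice of H as degree-3 polynomials fitted by least squares. Averaging ĥD over many independent draws of D gives the average hypothesis, from which the bias and variance terms can be estimated; the extra σε² term in the check is the irreducible noise floor that the approximation R(h) ≈ bias² + variance absorbs.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setting (not from the notes): f(x) = sin(2*pi*x) on [0, 1],
# noise variance sigma_eps^2 = 0.1, and H = polynomials of degree 3.
def f(x):
    return np.sin(2 * np.pi * x)

sigma_eps = np.sqrt(0.1)   # noise standard deviation
N = 20                     # size of each dataset D
degree = 3                 # complexity of the hypothesis set H
trials = 2000              # number of independent datasets D

x_test = np.linspace(0.0, 1.0, 200)      # evaluation points
preds = np.empty((trials, x_test.size))  # stores h_D(x_test) for each D

for t in range(trials):
    # Draw a fresh dataset D = {(x_i, y_i)} with y_i = f(x_i) + eps_i
    x = rng.uniform(0.0, 1.0, N)
    y = f(x) + sigma_eps * rng.standard_normal(N)
    # Least-squares polynomial fit plays the role of h_D
    preds[t] = np.polyval(np.polyfit(x, y, degree), x_test)

h_bar = preds.mean(axis=0)                 # average hypothesis E_D[h_D(x)]
bias2 = np.mean((h_bar - f(x_test)) ** 2)  # bias^2, averaged over x
variance = np.mean((preds - h_bar) ** 2)   # variance, averaged over D and x

# Direct estimate of the risk E[(h_D(x) - y)^2] for comparison
y_test = f(x_test) + sigma_eps * rng.standard_normal(preds.shape)
risk = np.mean((preds - y_test) ** 2)

print(f"bias^2 = {bias2:.4f}  variance = {variance:.4f}")
print(f"bias^2 + variance + sigma_eps^2 = {bias2 + variance + sigma_eps ** 2:.4f}")
print(f"direct risk estimate            = {risk:.4f}")

Rerunning the sketch with a larger degree typically drives bias² down and variance up, which is exactly the tradeoff the decomposition is meant to quantify.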