
CZECH TECHNICAL UNIVERSITY IN PRAGUE
Faculty of Electrical Engineering, Department of Cybernetics

Bias-variance trade-off. Crossvalidation. Regularization.

Petr Pošík

P. Pošík © 2015, Artificial Intelligence – 1 / 13


How to evaluate a predictive model?



Model evaluation


Fundamental question: What is a good measure of “model quality” from the machine-learning standpoint?

■ We have various measures of model error:
  ■ For regression tasks: MSE, MAE, ...
  ■ For classification tasks: misclassification rate, measures based on the confusion matrix, ...
■ Some of them can be regarded as finite approximations of the Bayes risk.
■ Are these functions good approximations when measured on the data the models were trained on?

[Figure: two models fitted to the same data points: f(x) = x and f(x) = x³ − 3x² + 3x]

Using MSE only, both models are equivalent!!!

[Figure: a linear fit f(x) = −0.09 + 0.99x and a cubic fit f(x) = 0.00 − 0.31x + 1.67x² − 0.51x³ to the same data]

Using MSE only, the cubic model is better than the linear one!!!

A basic method of evaluation is model validation on a different, independent data set from the same source, i.e. on testing data.
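To make these error measures concrete, here is a minimal NumPy sketch computing MSE, MAE and the misclassification rate; the numbers are made up purely for illustration.

```python
import numpy as np

# Regression: true targets vs. model predictions (made-up numbers).
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.8, 3.4, 3.9])
mse = np.mean((y_true - y_pred) ** 2)    # mean squared error
mae = np.mean(np.abs(y_true - y_pred))   # mean absolute error

# Classification: true labels vs. predicted labels (made-up numbers).
c_true = np.array([0, 1, 1, 0, 1])
c_pred = np.array([0, 1, 0, 0, 1])
misclassification_rate = np.mean(c_true != c_pred)

print(mse, mae, misclassification_rate)  # 0.055 0.2 0.2
```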

Validation on testing data


Example: Polynomial regression with varying degree: X ∼ U(−1, 3), Y ∼ X² + N(0, 1)

[Figure: polynomial fits of increasing degree to the training data, with the testing data overlaid]
  Polynomial degree 0: training error 8.319, testing error 6.901
  Polynomial degree 1: training error 2.013, testing error 2.841
  Polynomial degree 2: training error 0.647, testing error 0.925
  Polynomial degree 3: training error 0.645, testing error 0.919
  Polynomial degree 5: training error 0.611, testing error 0.979
  Polynomial degree 9: training error 0.545, testing error 1.067
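The following is a minimal NumPy sketch of this experiment; the sample sizes and the random seed are assumptions, so the exact error values will differ from those listed above.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    # X ~ U(-1, 3), Y ~ X^2 + N(0, 1), as in the example above.
    x = rng.uniform(-1, 3, size=n)
    return x, x**2 + rng.normal(0, 1, size=n)

x_tr, y_tr = make_data(30)    # training data
x_te, y_te = make_data(30)    # testing data

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

for degree in (0, 1, 2, 3, 5, 9):
    model = np.poly1d(np.polyfit(x_tr, y_tr, degree))   # least-squares polynomial fit
    print(f"degree {degree}: tr. err. {mse(y_tr, model(x_tr)):.3f}, "
          f"test err. {mse(y_te, model(x_te)):.3f}")
```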


Training and testing error


[Figure: training and testing MSE as functions of the polynomial degree]

■ The training error decreases with increasing model flexibility.
■ The testing error is minimal for a certain degree of model flexibility.


Overfitting


Definition of overfitting:

■ Let H be a hypothesis space.
■ Let h ∈ H and h′ ∈ H be 2 different hypotheses from this space.
■ Let ErrTr(h) be the error of the hypothesis h measured on the training dataset (training error).
■ Let ErrTst(h) be the error of the hypothesis h measured on the testing dataset (testing error).
■ We say that h is overfitted if there is another h′ for which ErrTr(h) < ErrTr(h′) ∧ ErrTst(h) > ErrTst(h′).

[Figure: model error vs. model flexibility for training and testing data]

■ “When overfitted, the model works well for the training data, but fails for new (testing) data.”
■ Overfitting is a general phenomenon affecting all kinds of inductive learning.
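As a concrete illustration of this definition, the sketch below compares two hypotheses fitted to the quadratic example data: a flexible degree-9 polynomial h and a degree-2 polynomial h′; the data sizes, degrees and seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    x = rng.uniform(-1, 3, size=n)
    return x, x**2 + rng.normal(0, 1, size=n)

x_tr, y_tr = make_data(15)       # small training set, so a flexible model can overfit
x_te, y_te = make_data(1000)     # large testing set approximating the true error

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

h       = np.poly1d(np.polyfit(x_tr, y_tr, 9))   # very flexible hypothesis
h_prime = np.poly1d(np.polyfit(x_tr, y_tr, 2))   # less flexible hypothesis

# h is overfitted if it beats h' on the training data but loses on the testing data.
overfitted = (mse(y_tr, h(x_tr)) < mse(y_tr, h_prime(x_tr))
              and mse(y_te, h(x_te)) > mse(y_te, h_prime(x_te)))
print("h overfitted with respect to h':", overfitted)
```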

We want models and learning algorithms with a good generalization ability, i.e.

■ we want models that encode only the patterns valid in the whole domain, not the specifics of the training data,
■ we want algorithms able to find only the patterns valid in the whole domain and to ignore the specifics of the training data.


Bias vs Variance


[Figure: three polynomial fits to the same training data, with the testing data overlaid]
  Degree 1 (tr. err. 2.013, test err. 2.841): high bias, model not flexible enough (underfit)
  Degree 2 (tr. err. 0.647, test err. 0.925): “just right” (good fit)
  Degree 9 (tr. err. 0.545, test err. 1.067): high variance, model flexibility too high (overfit)

High bias problem:
■ ErrTr(h) is high
■ ErrTst(h) ≈ ErrTr(h)

High variance problem:
■ ErrTr(h) is low
■ ErrTst(h) ≫ ErrTr(h)

[Figure: model error vs. model flexibility for training and testing data]


Crossvalidation


Simple crossvalidation:

■ Split the data into training and testing subsets.
■ Train the model on the training data.
■ Evaluate the model error on the testing data.

K-fold crossvalidation:

■ Split the data into k folds (k is usually 5 or 10).
■ In each iteration:
  ■ Use k − 1 folds to train the model.
  ■ Use 1 fold to test the model, i.e. to measure the error.

  Iter. 1:  Training | Training | Testing
  Iter. 2:  Training | Testing  | Training
  ...
  Iter. k:  Testing  | Training | Training

■ Aggregate (average) the k error measurements to get the final error estimate.
■ Train the model on the whole data set.

Leave-one-out (LOO) crossvalidation:

■ k = |T|, i.e. the number of folds is equal to the training set size.
■ Time consuming for large |T|.
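Below is a minimal NumPy sketch of k-fold crossvalidation for the polynomial example; the data, sample size and k are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 3, size=40)
y = x**2 + rng.normal(0, 1, size=40)

def kfold_mse(x, y, degree, k=5):
    """Estimate the MSE of a degree-`degree` polynomial by k-fold crossvalidation."""
    folds = np.array_split(rng.permutation(len(x)), k)   # shuffle, then split into k folds
    errors = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        model = np.poly1d(np.polyfit(x[train_idx], y[train_idx], degree))
        errors.append(np.mean((y[test_idx] - model(x[test_idx])) ** 2))
    return np.mean(errors)   # aggregate the k error measurements

for degree in (1, 2, 3, 5, 9):
    print(f"degree {degree}: CV error estimate = {kfold_mse(x, y, degree):.3f}")

# Leave-one-out CV is the special case k = len(x).
```

In practice the same procedure is also available off the shelf, e.g. as cross_val_score in scikit-learn.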


How to determine a suitable model flexibility


Simply test models of varying complexities and choose the one with the best testing error, right?

■ The testing data are used here to tune a meta-parameter of the model.
■ The testing data are used to train (a part of) the model; they thus essentially become part of the training data.
■ The error on the testing data is then no longer an unbiased estimate of the model error; it underestimates it.
■ A new, separate data set is needed to estimate the model error.

Using simple crossvalidation:

1. Training data: use about 50 % of the data for model building.
2. Validation data: use about 25 % of the data to search for a suitable model flexibility.
3. Train the chosen model on the training + validation data.
4. Testing data: use about 25 % of the data for the final estimate of the model error.

Using k-fold crossvalidation:

1. Training data: use about 75 % of the data to find and train a suitable model using crossvalidation.
2. Testing data: use about 25 % of the data for the final estimate of the model error.

The ratios are not set in stone; there are other possibilities, e.g. 60:20:20, etc.
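A minimal sketch of the simple-crossvalidation variant (a 50/25/25 split) for choosing the polynomial degree; the data, split ratios and candidate degrees are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 3, size=200)
y = x**2 + rng.normal(0, 1, size=200)

# Split the data into training (50 %), validation (25 %) and testing (25 %) parts.
idx = rng.permutation(len(x))
tr, val, te = np.split(idx, [int(0.50 * len(x)), int(0.75 * len(x))])

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

# Use the validation data to search for a suitable model flexibility (the degree).
candidate_degrees = range(10)
val_errors = [mse(y[val], np.poly1d(np.polyfit(x[tr], y[tr], d))(x[val]))
              for d in candidate_degrees]
best_degree = int(np.argmin(val_errors))

# Re-train the chosen model on training + validation data.
trval = np.concatenate([tr, val])
final_model = np.poly1d(np.polyfit(x[trval], y[trval], best_degree))

# The testing data give the final estimate of the model error.
print(f"chosen degree: {best_degree}, estimated error: {mse(y[te], final_model(x[te])):.3f}")
```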


How to prevent overfitting?


1. Reduce the number of features.
   ■ Select manually which features to keep.
   ■ Try to identify a suitable subset of features during the learning phase.

2. Regularization.
   ■ Keep all the features, but reduce the magnitudes of the parameters w.
   ■ Works well if we have a lot of features, each of which contributes a bit to predicting y.


Regularization



Ridge regularization (a.k.a. Tikhonov regularization)


Ridge regularization penalizes the size of the model coefficients:

■ Modification of the optimization criterion:

  $$J(\mathbf{w}) = \frac{1}{|T|} \sum_{i=1}^{|T|} \left( y^{(i)} - h_{\mathbf{w}}(\mathbf{x}^{(i)}) \right)^2 + \alpha \sum_{d=1}^{D} w_d^2$$

■ The solution is given by a modified normal equation:

  $$\mathbf{w}^* = \left( \mathbf{X}^T \mathbf{X} + \alpha \mathbf{I} \right)^{-1} \mathbf{X}^T \mathbf{y}$$

■ As α → 0, w_ridge → w_OLS.
■ As α → ∞, w_ridge → 0.

Training and testing errors as functions of the regularization parameter:

[Figure: training and testing MSE vs. the regularization factor α, log scale]

The values of the coefficients as functions of the regularization parameter:

[Figure: coefficient sizes vs. the regularization factor α, log scale]
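A minimal sketch of ridge regression via the modified normal equation above, applied to polynomial features of the running example; the feature degree, sample size and α values are assumptions, and for simplicity the intercept coefficient is penalized together with the others.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Solve w* = (X^T X + alpha * I)^-1 X^T y without forming the inverse explicitly."""
    A = X.T @ X + alpha * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)

rng = np.random.default_rng(0)
x = rng.uniform(-1, 3, size=50)
y = x**2 + rng.normal(0, 1, size=50)
X = np.vander(x, N=10, increasing=True)      # columns 1, x, x^2, ..., x^9

for alpha in (1e-8, 1e-2, 1.0, 1e2, 1e6):
    w = ridge_fit(X, y, alpha)
    mse = np.mean((y - X @ w) ** 2)
    # Larger alpha shrinks the coefficients towards zero (w_ridge -> 0 as alpha -> inf).
    print(f"alpha = {alpha:g}: training MSE = {mse:.3f}, ||w|| = {np.linalg.norm(w):.3f}")
```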


Lasso regularization


Lasso regularization penalizes the size of the model coefficients:

■ Modification of the optimization criterion:

  $$J(\mathbf{w}) = \frac{1}{|T|} \sum_{i=1}^{|T|} \left( y^{(i)} - h_{\mathbf{w}}(\mathbf{x}^{(i)}) \right)^2 + \alpha \sum_{d=1}^{D} |w_d|$$

■ The solution is usually found by quadratic programming.
■ As α → ∞, Lasso regularization decreases the number of non-zero coefficients.

Training and testing errors as functions of the regularization parameter:

[Figure: training and testing MSE vs. the regularization factor α, log scale]

The values of the coefficients as functions of the regularization parameter:

[Figure: coefficient sizes vs. the regularization factor α, log scale]
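The slides do not prescribe a particular solver; one readily available option is scikit-learn's Lasso, used in the sketch below. Note that it relies on coordinate descent rather than quadratic programming and scales the data term by 1/(2|T|), so its alpha is not numerically identical to the α above; the feature degree and alpha values are assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
x = rng.uniform(-1, 3, size=50)
y = x**2 + rng.normal(0, 1, size=50)
X = np.vander(x, N=10, increasing=True)[:, 1:]   # features x, x^2, ..., x^9; intercept fit separately

for alpha in (1e-4, 1e-2, 1e-1, 1.0):
    # Unscaled polynomial features converge slowly, hence the generous max_iter.
    model = Lasso(alpha=alpha, max_iter=100_000).fit(X, y)
    n_nonzero = int(np.sum(model.coef_ != 0))
    # Increasing alpha drives more and more coefficients exactly to zero.
    print(f"alpha = {alpha:g}: non-zero coefficients = {n_nonzero}, "
          f"training MSE = {np.mean((y - model.predict(X)) ** 2):.3f}")
```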