
Bias-variance trade-off. Crossvalidation. Regularization. Petr Pošík - PowerPoint PPT Presentation



  1. CZECH TECHNICAL UNIVERSITY IN PRAGUE, Faculty of Electrical Engineering, Department of Cybernetics. Bias-variance trade-off. Crossvalidation. Regularization. Petr Pošík. (P. Pošík © 2015, Artificial Intelligence)

  2. How to evaluate a predictive model?

  3.–9. Model evaluation
  Fundamental question: What is a good measure of “model quality” from the machine-learning standpoint?
  ■ We have various measures of model error:
    ■ For regression tasks: MSE, MAE, ...
    ■ For classification tasks: misclassification rate, measures based on the confusion matrix, ...
  ■ Some of them can be regarded as finite approximations of the Bayes risk.
  ■ Are these measures good approximations when they are computed on the same data the models were trained on?
  [Figure, left panel: a training sample together with two candidate target functions, f(x) = x and f(x) = x³ − 3x² + 3x. Using MSE only, both models are equivalent!!!]
  [Figure, right panel: the same sample with two least-squares fits, the linear f(x) = −0.09 + 0.99x and the cubic f(x) = 0.00 − 0.31x + 1.67x² − 0.51x³. Using MSE only, the cubic model is better than the linear one!!!]
  A basic method of evaluation is model validation on a different, independent data set from the same source, i.e. on testing data.
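To make the point of slides 3–9 concrete, here is a minimal sketch in Python (not the author's code; the sample sizes, noise level and random seed are my own assumptions). It fits a linear and a cubic polynomial to the same training sample drawn from a linear target: the cubic fit always reaches a training MSE at least as low as the linear one, and only the independent testing set shows which model generalizes better.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    x = rng.uniform(-0.5, 2.5, size=n)
    y = x + rng.normal(scale=0.3, size=n)   # the true relation is linear: f(x) = x
    return x, y

x_tr, y_tr = make_data(20)       # training data
x_te, y_te = make_data(1000)     # independent testing data from the same source

for degree in (1, 3):            # linear vs cubic polynomial, as on the slide
    coeffs = np.polyfit(x_tr, y_tr, deg=degree)
    mse_tr = np.mean((np.polyval(coeffs, x_tr) - y_tr) ** 2)
    mse_te = np.mean((np.polyval(coeffs, x_te) - y_te) ** 2)
    print(f"degree {degree}: training MSE {mse_tr:.3f}, testing MSE {mse_te:.3f}")
```

On a typical run the cubic model wins on the training data but loses, or at best ties, on the testing data.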

  10. Validation on testing data
  Example: polynomial regression with varying degree, on data generated as X ∼ U(−1, 3), Y ∼ X² + N(0, 1).
  [Figure: six panels of training and testing data with the fitted polynomial; the panel titles are listed below.]
  Polynomial deg. 0: training error 8.319, testing error 6.901
  Polynomial deg. 1: training error 2.013, testing error 2.841
  Polynomial deg. 2: training error 0.647, testing error 0.925
  Polynomial deg. 3: training error 0.645, testing error 0.919
  Polynomial deg. 5: training error 0.611, testing error 0.979
  Polynomial deg. 9: training error 0.545, testing error 1.067
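The experiment on slide 10 can be roughly re-created with a few lines of NumPy; the sketch below uses assumed sample sizes and a fixed seed, so the printed errors will not match the slide's numbers exactly, but they follow the same pattern (training error falls monotonically with the degree, testing error bottoms out around degree 2–3).

```python
import numpy as np

rng = np.random.default_rng(1)

def sample(n):
    x = rng.uniform(-1.0, 3.0, size=n)     # X ~ U(-1, 3)
    y = x ** 2 + rng.normal(size=n)        # Y ~ X^2 + N(0, 1)
    return x, y

x_tr, y_tr = sample(30)
x_te, y_te = sample(30)

for deg in (0, 1, 2, 3, 5, 9):
    c = np.polyfit(x_tr, y_tr, deg)
    tr = np.mean((np.polyval(c, x_tr) - y_tr) ** 2)
    te = np.mean((np.polyval(c, x_te) - y_te) ** 2)
    print(f"degree {deg}: training error {tr:.3f}, testing error {te:.3f}")
```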

  11. Training and testing error
  [Figure: MSE on the training and testing data as a function of polynomial degree. The slide's sidebar gives the lecture outline: How to evaluate a predictive model? (Model evaluation, Training and testing error, Overfitting, Bias vs Variance, Crossvalidation, How to determine a suitable model flexibility, How to prevent overfitting?); Regularization.]
  ■ The training error decreases with increasing model flexibility.
  ■ The testing error is minimal at a certain degree of model flexibility.
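The curve on slide 11 can be redrawn from the same kind of synthetic data; the snippet below is an illustrative sketch (not the author's plotting code) that plots the training and testing MSE against the polynomial degree with matplotlib.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
x_tr = rng.uniform(-1.0, 3.0, 30)
y_tr = x_tr ** 2 + rng.normal(size=30)
x_te = rng.uniform(-1.0, 3.0, 300)
y_te = x_te ** 2 + rng.normal(size=300)

degrees = list(range(10))                  # polynomial degrees 0..9
train_err, test_err = [], []
for d in degrees:
    c = np.polyfit(x_tr, y_tr, d)
    train_err.append(np.mean((np.polyval(c, x_tr) - y_tr) ** 2))
    test_err.append(np.mean((np.polyval(c, x_te) - y_te) ** 2))

plt.plot(degrees, train_err, marker="o", label="Training error")
plt.plot(degrees, test_err, marker="o", label="Testing error")
plt.xlabel("Polynomial degree")
plt.ylabel("MSE")
plt.legend()
plt.show()
```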

  12.–13. Overfitting
  Definition of overfitting:
  ■ Let H be a hypothesis space.
  ■ Let h ∈ H and h′ ∈ H be two different hypotheses from this space.
  ■ Let Err_Tr(h) be the error of hypothesis h measured on the training dataset (training error).
  ■ Let Err_Tst(h) be the error of hypothesis h measured on the testing dataset (testing error).
  ■ We say that h is overfitted if there is another h′ for which
      Err_Tr(h) < Err_Tr(h′) ∧ Err_Tst(h) > Err_Tst(h′).
  ■ “When overfitted, the model works well for the training data, but fails for new (testing) data.”
  ■ Overfitting is a general phenomenon affecting all kinds of inductive learning.
  [Figure: model error vs. model flexibility; the training-error curve keeps decreasing while the testing-error curve turns upward beyond a certain flexibility.]
  We want models and learning algorithms with a good generalization ability, i.e.
  ■ we want models that encode only the patterns valid in the whole domain, not the specifics of the training data,
  ■ we want algorithms that are able to find only the patterns valid in the whole domain and to ignore the specifics of the training data.
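The definition on slides 12–13 translates almost literally into code. The sketch below is my own illustration: the Hypothesis record and the is_overfitted helper are hypothetical names, and the error values are copied from slide 10.

```python
from typing import NamedTuple

class Hypothesis(NamedTuple):
    name: str
    err_tr: float    # Err_Tr(h): error measured on the training dataset
    err_tst: float   # Err_Tst(h): error measured on the testing dataset

def is_overfitted(h: Hypothesis, hypothesis_space: list[Hypothesis]) -> bool:
    """h is overfitted if some other h' has a higher training error but a lower testing error."""
    return any(h.err_tr < h2.err_tr and h.err_tst > h2.err_tst
               for h2 in hypothesis_space if h2 is not h)

# The degree-2 and degree-9 polynomial fits from slide 10:
fits = [Hypothesis("degree 2", 0.647, 0.925),
        Hypothesis("degree 9", 0.545, 1.067)]
print(is_overfitted(fits[1], fits))  # True: degree 9 is better on training data, worse on testing data
print(is_overfitted(fits[0], fits))  # False
```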

  14. Bias vs Variance
  [Figure: three of the fits from slide 10, for polynomial degrees 1, 2 and 9.]
  ■ Degree 1 (training error 2.013, testing error 2.841): high bias, the model is not flexible enough (underfit).
  ■ Degree 2 (training error 0.647, testing error 0.925): “just right” (good fit).
  ■ Degree 9 (training error 0.545, testing error 1.067): high variance, the model flexibility is too high (overfit).
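Bias and variance can be estimated empirically by refitting the same model class on many independent training sets and measuring how far the average prediction is from the truth (bias) and how much the individual fits scatter around that average (variance). The sketch below is my own construction on the slide-10 task (the sample size, the number of repetitions and the evaluation grid are assumptions); it typically shows a large bias for degree 1, both terms small for degree 2, and a large variance for degree 9.

```python
import numpy as np

rng = np.random.default_rng(3)
x_grid = np.linspace(-1.0, 3.0, 50)
true_f = x_grid ** 2                       # the noiseless target on the evaluation grid

for deg in (1, 2, 9):                      # underfit, good fit, overfit
    preds = []
    for _ in range(200):                   # 200 independent training sets of 30 points each
        x_tr = rng.uniform(-1.0, 3.0, 30)
        y_tr = x_tr ** 2 + rng.normal(size=30)
        c = np.polyfit(x_tr, y_tr, deg)
        preds.append(np.polyval(c, x_grid))
    preds = np.asarray(preds)              # shape (200, 50): one row per training set
    bias2 = np.mean((preds.mean(axis=0) - true_f) ** 2)
    variance = np.mean(preds.var(axis=0))
    print(f"degree {deg}: bias^2 = {bias2:.3f}, variance = {variance:.3f}")
```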
