Overfitting: Many hypotheses consistent with/close to the data


  1. About this class
     - The problem of overfitting and how to deal with it
     - Modifying logistic regression training to avoid overfitting
     - Evaluating classifiers: accuracy, precision, recall, ROC curves

     Overfitting: many hypotheses are consistent with (or close to) the data. With enough features and a rich enough hypothesis space, it becomes easy to find meaningless regularity in the data. Day/Month/Rain may give you a function that exactly matches the outcomes of dice rolls, but would that function be a good predictor of future dice rolls?

     Linear regression vs. polynomial example: how do you decide on a particular preference? Simpler functions vs. more complex ones. But how do we define complexity in all cases?
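     The linear-vs-polynomial point can be made concrete with a small numerical sketch (not from the slides; the data, noise level, and degrees below are made up for illustration): a degree-9 polynomial fits the noisy training points almost exactly but predicts held-out points from the same linear source worse than a straight line.

        # Illustrative only: compare a line and a high-degree polynomial on
        # noisy data whose true underlying relationship is linear.
        import numpy as np

        rng = np.random.default_rng(0)

        x = np.linspace(0, 1, 12)
        y = 2.0 * x + rng.normal(scale=0.2, size=x.size)            # training data

        x_test = np.linspace(0, 1, 100)
        y_test = 2.0 * x_test + rng.normal(scale=0.2, size=x_test.size)

        for degree in (1, 9):
            coeffs = np.polyfit(x, y, degree)                       # least-squares fit
            train_mse = np.mean((np.polyval(coeffs, x) - y) ** 2)
            test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
            print(f"degree {degree}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")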

  2. Regularization

     Simpler polynomial: lower degree. Simpler decision tree: less depth. Simpler linear function: lower weights?

     Prevent overfitting by creating a function to maximize (on the training data) that explicitly penalizes model complexity. We have to make a tradeoff between fit to the training data and complexity; sometimes we will see that we can make this mathematically explicit.

     We'll see a number of different examples, but let's examine regularization in the context of logistic regression (Section 3.3 of the Mitchell chapter).

     Other possibilities: statistical significance, simulating unseen test data.
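     A minimal sketch of the idea, assuming the common L2 (squared-weight) penalty: gradient ascent on the conditional log-likelihood minus (lam/2)·||w||², so large weights are explicitly penalized. The toy data, learning rate, and lam below are made up for illustration, and the details in Mitchell's Section 3.3 may differ.

        # Illustrative sketch: L2-regularized logistic regression trained by gradient
        # ascent on  sum_j [y_j log p_j + (1 - y_j) log(1 - p_j)] - (lam/2) ||w||^2.
        import numpy as np

        def sigmoid(z):
            return 1.0 / (1.0 + np.exp(-z))

        def train_logistic_l2(X, y, lam=0.1, lr=0.1, steps=500):
            w = np.zeros(X.shape[1])
            for _ in range(steps):
                p = sigmoid(X @ w)                 # predicted P(y = 1 | x)
                grad = X.T @ (y - p) - lam * w     # penalty term shrinks weights toward 0
                w += lr * grad
            return w

        # Toy usage with two features and four labeled examples (made up).
        X = np.array([[1.0, 0.2], [0.9, 0.4], [0.2, 0.8], [0.1, 0.9]])
        y = np.array([1, 1, 0, 0])
        print(train_logistic_l2(X, y))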

  3. Evaluating Accuracy: Random Training/Test Splits

     - Divide the dataset into a training set and a test set.
     - Apply the learning algorithm to the training set, generating hypothesis h.
     - Classify the examples in the test set using h, and measure the percentage of correct predictions made by h (accuracy).
     - Repeat a set number of times.
     - Repeat the whole thing for differently sized training and test sets, if you want to construct a learning curve...

     How do you compute confidence intervals for test accuracy? What is the central limit theorem? The sum of independent random variables with finite mean and variance tends to normality in the limit. Notice: there is no condition on the distribution from which they are drawn! Therefore, the mean of the independent r.v.s tends to normality as well.

     The sampling distribution of the mean then tells us that a 95% confidence interval is given by the mean ± 1.96 σ̂/√n.
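     A minimal sketch of that interval for a single train/test split, treating each test prediction as an independent 0/1 outcome so that σ̂ = sqrt(acc·(1 − acc)); the counts below are made up for illustration.

        # Illustrative only: normal-approximation 95% confidence interval for test accuracy.
        import math

        def accuracy_confidence_interval(n_correct, n_total, z=1.96):
            acc = n_correct / n_total
            stderr = math.sqrt(acc * (1.0 - acc) / n_total)   # sigma_hat / sqrt(n)
            return acc - z * stderr, acc + z * stderr

        print(accuracy_confidence_interval(83, 100))  # roughly (0.756, 0.904)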

  4. Cross-Validation

     Another possible method if you have two different candidate models: attempt to estimate the accuracy of the models on simulated "test" data. The standard approach is n-fold cross-validation (very typical: n = 10). Divide the data into n equally sized sets, train on n − 1 of them, and test on the nth; repeat for all n folds.

     Is the accuracy of the better model then a good estimate of expected accuracy on unseen test data? If you tune your parameters in any way on training data (including for model selection), you must test on fresh test data to get a good estimate!

     Leave-One-Out Cross-Validation: just what it sounds like. Train on all examples except one and then test on that one example; repeat for all examples in the training data. This is the most efficient use of the available data in terms of getting an estimate of accuracy, but it can be horribly computationally inefficient, unless you can figure out a smart way to retrain without throwing away everything when swapping in one example for another.
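     A minimal sketch of n-fold cross-validation, assuming hypothetical train_fn(X, y) and score_fn(model, X, y) placeholders for whatever learner and accuracy measure you are using, with X and y as NumPy arrays. Setting n_folds = len(y) gives leave-one-out cross-validation.

        # Illustrative only: generic n-fold cross-validation over a user-supplied learner.
        import numpy as np

        def cross_val_accuracy(X, y, train_fn, score_fn, n_folds=10, seed=0):
            rng = np.random.default_rng(seed)
            idx = rng.permutation(len(y))                  # shuffle before splitting
            folds = np.array_split(idx, n_folds)           # n roughly equal-sized sets
            scores = []
            for i in range(n_folds):
                test_idx = folds[i]
                train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != i])
                model = train_fn(X[train_idx], y[train_idx])              # train on n - 1 folds
                scores.append(score_fn(model, X[test_idx], y[test_idx]))  # test on the i-th fold
            return float(np.mean(scores))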

  5. Confusion Matrices, Precision, and Recall

     Two kinds of errors: false positives and false negatives (this generalizes to k classes as "predicted class i, actually class j").

                        Pred. Negative   Pred. Positive
     Act. Negative            TN               FP
     Act. Positive            FN               TP

     Precision: the percentage of predicted positives that were actually positive – TP / (TP + FP) (also known as positive predictive value).

     Recall: the percentage of actual positives that were predicted positive – TP / (TP + FN) (also known as sensitivity).

     Spam example: if Spam messages are considered "positives", then recall is how much of the Spam you catch, and precision is how much of what you label as Spam really is Spam (low precision means many legitimate messages get labeled as Spam).

     Precision and recall are usually traded off against each other. 100% recall can be achieved by predicting everything to be positive; extremely high precision can be achieved by predicting only the examples you are most confident of to be positive.

     Both are important in information retrieval. What would precision and recall be in terms of searching for information on the web?
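     A minimal sketch of computing precision and recall from 0/1 labels and predictions, following the TP / (TP + FP) and TP / (TP + FN) definitions above; the toy vectors are made up for illustration.

        # Illustrative only: confusion-matrix counts, precision, and recall.
        def precision_recall(y_true, y_pred):
            tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
            fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
            fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
            precision = tp / (tp + fp) if tp + fp else 0.0   # TP / (TP + FP)
            recall = tp / (tp + fn) if tp + fn else 0.0      # TP / (TP + FN)
            return precision, recall

        print(precision_recall([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))  # (0.667, 0.667)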

  6. ROC Curves

     Let's generalize: for any predictor, for a given false positive rate, what will the true positive rate be? How do we get this? Rank the test examples by confidence and then take cutoffs wherever we want to test.

     If one classifier dominates another at every point on the ROC curve, it is better. People often use the area under the curve (AUC) as a single summary statistic to measure the performance of classifiers.
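     A minimal sketch of that ranking idea: sort test examples by confidence, sweep the cutoff, record the (false positive rate, true positive rate) pairs, and integrate to get AUC. The toy labels and scores are made up, and ties between scores are not handled.

        # Illustrative only: ROC points from confidence-ranked examples, AUC by trapezoids.
        import numpy as np

        def roc_curve_points(y_true, scores):
            y_true = np.asarray(y_true)
            order = np.argsort(-np.asarray(scores))        # most confident first
            y_sorted = y_true[order]
            P = y_sorted.sum()
            N = len(y_sorted) - P
            tpr = np.concatenate(([0.0], np.cumsum(y_sorted) / P))       # true positive rate
            fpr = np.concatenate(([0.0], np.cumsum(1 - y_sorted) / N))   # false positive rate
            return fpr, tpr

        def auc(fpr, tpr):
            return float(np.trapz(tpr, fpr))               # area under the ROC curve

        fpr, tpr = roc_curve_points([1, 0, 1, 1, 0], [0.9, 0.8, 0.7, 0.3, 0.2])
        print(auc(fpr, tpr))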
