SLIDE 1

Learning From Data Lecture 11: Overfitting

  • What is Overfitting?
  • When does Overfitting Occur?
  • Stochastic and Deterministic Noise

M. Magdon-Ismail

CSCI 4100/6100

SLIDE 2

recap: Nonlinear Transforms

  • 1. Original data: xn ∈ X
  • 2. Transform the data: zn = Φ(xn) ∈ Z
  • 3. Separate the data in Z-space: g̃(z) = sign(w̃ᵀz)
  • 4. Classify in X-space via ‘Φ⁻¹’: g(x) = g̃(Φ(x)) = sign(w̃ᵀΦ(x))

X-space is R^d:
  x = (1, x1, . . . , xd)ᵀ
  data x1, x2, . . . , xN with labels y1, y2, . . . , yN
  no weights; dvc = d + 1
  g(x) = sign(w̃ᵀΦ(x))

Z-space is R^d̃:
  z = Φ(x) = (1, Φ1(x), . . . , Φd̃(x))ᵀ = (1, z1, . . . , zd̃)ᵀ
  data z1, z2, . . . , zN with labels y1, y2, . . . , yN
  weights w̃ = (w0, w1, . . . , wd̃)ᵀ; dvc = d̃ + 1
  g̃(z) = sign(w̃ᵀz)
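A minimal sketch of these four steps in code (assuming a 1-D input, a 3rd order polynomial Φ, and least-squares weights via the pseudo-inverse; the tiny data set and all names are illustrative, not the lecture's):

    import numpy as np

    def phi(x, Q=3):
        """Feature transform: x -> (1, x, x^2, ..., x^Q)."""
        return np.array([x ** q for q in range(Q + 1)])

    # 1. Original data (xn, yn) with labels +/-1 (made-up numbers)
    X = np.array([-1.0, -0.5, 0.1, 0.6, 1.0])
    y = np.array([1, 1, -1, 1, 1])

    # 2. Transform the data to Z-space: one row per point, d~+1 = Q+1 columns
    Z = np.vstack([phi(x) for x in X])

    # 3. Separate the data in Z-space; here w~ comes from least squares
    w_tilde = np.linalg.pinv(Z) @ y

    # 4. Classify in X-space: g(x) = sign(w~' Phi(x))
    g = lambda x: np.sign(w_tilde @ phi(x))
    print([g(x) for x in X])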

SLIDE 3

recap: Digits Data “1” Versus “All”

[Two scatter plots of the digits data: average intensity vs. symmetry]

Model                        Ein     Eout
Linear model                 2.13%   2.38%
3rd order polynomial model   1.75%   1.87%

SLIDE 4

Superstitions – Myth or Reality?

  • Paraskevidekatriaphobia – fear of Friday the 13th.

– Are future Friday the 13ths really more dangerous?

  • OCD [medical journal, citation lost, can you find it?]

The subject performs an action that leads to a good outcome, and generalizes it as cause and effect: the action will always give good results. Having overfit the data, the subject compulsively engages in that activity.

Humans are overfitting machines, very good at “finding coincidences”.

SLIDE 5

An Illustration of Overfitting on a Simple Example

  • Quadratic target f
  • 5 data points
  • A little noise (measurement error)
  • 5 data points → 4th order polynomial fit

[Plot: the 5 data points and the quadratic target]

Classic overfitting: simple target with excessively complex H. The noise did us in. (why?)

SLIDE 6

An Illustration of Overfitting on a Simple Example

  • Quadratic target f
  • 5 data points
  • A little noise (measurement error)
  • 5 data points → 4th order polynomial fit

[Plot: the 5 data points, the quadratic target, and the 4th order fit]

Classic overfitting: simple target with excessively complex H.

Ein ≈ 0; Eout ≫ 0

The noise did us in. (why?)
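A quick numerical sketch of this slide (the quadratic target, noise level, and random seed are made-up illustrations):

    import numpy as np

    rng = np.random.default_rng(0)
    f = lambda x: x ** 2                      # simple quadratic target

    # 5 data points with a little measurement noise
    x_train = np.linspace(-1, 1, 5)
    y_train = f(x_train) + 0.1 * rng.standard_normal(5)

    # 4th order polynomial: 5 coefficients for 5 points -> exact interpolation
    coeffs = np.polyfit(x_train, y_train, deg=4)

    x_test = np.linspace(-1, 1, 200)
    E_in = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    E_out = np.mean((np.polyval(coeffs, x_test) - f(x_test)) ** 2)
    print(f"E_in  = {E_in:.2e}")              # ~0: the data are fit exactly
    print(f"E_out = {E_out:.2e}")             # much larger: the noise was fit too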

SLIDE 7

What is Overfitting?

Fitting the data more than is warranted

SLIDE 8

Overfitting is Not Just Bad Generalization

[Plot of Error vs. VC dimension dvc, showing the in-sample error, the out-of-sample error, and the region of bad generalization]

VC Analysis: Covers bad generalization but with lots of slack – the VC bound is loose

SLIDE 9

Overfitting is Not Just Bad Generalization

[The same plot of in-sample and out-of-sample error vs. VC dimension dvc, now with the overfitting region marked]

Overfitting: Going for lower and lower Ein results in higher and higher Eout

SLIDE 10

Case Study: 2nd vs 10th Order Polynomial Fit

[Two plots: x vs. y, showing data and target for each problem]

The two problems: a 10th order f with noise, and a 50th order f with no noise.
The two models: H2, a 2nd order polynomial fit, and H10, a 10th order polynomial fit.

Both are special cases of linear models with the feature transform x → (1, x, x², · · · ).

Which model do you pick for which problem and why?

SLIDE 12

Case Study: 2nd vs 10th Order Polynomial Fit

[Two plots: the data with the 2nd order and 10th order fits, one per problem]

        simple noisy target          complex noiseless target
        2nd Order    10th Order      2nd Order    10th Order
Ein     0.050        0.034           0.029        10⁻⁵
Eout    0.127        9.00            0.120        7680

Go figure: Simpler H is better even for the more complex target with no noise.
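A rough sketch of how the noiseless complex-target column might be reproduced (the target here is a fixed random 50th order polynomial standing in for the book's construction; N, the trial count, and the test grid are assumptions):

    import numpy as np

    rng = np.random.default_rng(1)

    # Stand-in "complex noiseless target": a fixed random 50th order polynomial
    a = rng.standard_normal(51)
    f = lambda x: np.polyval(a, x)

    N, trials = 15, 500
    x_test = np.linspace(-1, 1, 500)
    E2, E10 = [], []
    for _ in range(trials):
        x = rng.uniform(-1, 1, N)
        y = f(x)                               # no stochastic noise at all
        for deg, out in ((2, E2), (10, E10)):
            c = np.polyfit(x, y, deg)
            out.append(np.mean((np.polyval(c, x_test) - f(x_test)) ** 2))

    print(f"H2:  Eout = {np.mean(E2):.3g}")
    print(f"H10: Eout = {np.mean(E10):.3g}")   # typically far worse at this N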

SLIDE 13

Is there Really “No Noise” with the Complex f?

[Two plots: data and target, for the simple f and for the complex f]

Simple f with noise. Complex f with no noise.

H should match the quantity and quality of the data, not f.

SLIDE 14

Is there Really “No Noise” with the Complex f?

[The same two plots showing only the data, targets hidden]

Simple f with noise. Complex f with no noise. From the data alone, you cannot tell which is which.

H should match the quantity and quality of the data, not f.

SLIDE 15

When is H2 Better than H10?

[Two plots of expected error vs. number of data points N: learning curves (Ein and Eout) for H2 and for H10]

Overfitting: Eout(H10) > Eout(H2)
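A sketch of how such learning curves can be estimated: average Ein and Eout over many random data sets for each N (the target, noise level, and grid of N values are illustrative):

    import numpy as np

    rng = np.random.default_rng(2)
    f = lambda x: np.sin(np.pi * x)            # illustrative target
    sigma = 0.2                                # illustrative noise level

    x_test = np.linspace(-1, 1, 500)
    for deg, name in ((2, "H2"), (10, "H10")):
        print(name)
        for N in (12, 25, 50, 100):
            e_in, e_out = [], []
            for _ in range(300):               # average over data sets
                x = rng.uniform(-1, 1, N)
                y = f(x) + sigma * rng.standard_normal(N)
                c = np.polyfit(x, y, deg)
                e_in.append(np.mean((np.polyval(c, x) - y) ** 2))
                e_out.append(np.mean((np.polyval(c, x_test) - f(x_test)) ** 2))
            print(f"  N={N:4d}  E_in={np.mean(e_in):.3f}  E_out={np.mean(e_out):.3f}")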

SLIDE 16

Overfit Measure: Eout(H10) − Eout(H2)

[Plot of the overfit measure as a function of the number of data points N and the noise level σ²]

SLIDE 17

Overfit Measure: Eout(H10) − Eout(H2)

[Two plots of the overfit measure: as a function of N and noise level σ², and as a function of N and target complexity Qf]

  • Number of data points ↑  ⇒  Overfitting ↓
  • Noise ↑  ⇒  Overfitting ↑
  • Target complexity ↑  ⇒  Overfitting ↑
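A sketch of an experiment behind these arrows: turn the noise knob and the target-complexity knob while measuring the overfit measure Eout(H10) − Eout(H2) (the targets, N, and trial counts are all assumptions):

    import numpy as np

    rng = np.random.default_rng(3)
    x_test = np.linspace(-1, 1, 500)
    N = 40                                     # fixed sample size

    def overfit_measure(f, sigma, trials=200):
        """Average Eout(H10) - Eout(H2) over random noisy data sets."""
        diffs = []
        for _ in range(trials):
            x = rng.uniform(-1, 1, N)
            y = f(x) + sigma * rng.standard_normal(N)
            e = {d: np.mean((np.polyval(np.polyfit(x, y, d), x_test)
                             - f(x_test)) ** 2) for d in (2, 10)}
            diffs.append(e[10] - e[2])
        return np.mean(diffs)

    # Knob 1: stochastic noise level, simple target
    f_simple = lambda x: np.sin(np.pi * x)
    for sigma in (0.0, 0.3, 0.6):
        print(f"sigma={sigma}: overfit measure {overfit_measure(f_simple, sigma):+.3f}")

    # Knob 2: target complexity Qf, no stochastic noise
    for Qf in (5, 20, 50):
        a = rng.standard_normal(Qf + 1)
        f_cplx = lambda x, a=a: np.polyval(a, x)
        print(f"Qf={Qf}: overfit measure {overfit_measure(f_cplx, 0.0):+.3f}")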

SLIDE 18

Noise

That part of y we cannot model

It has two sources . . .

SLIDE 19

Stochastic Noise — Data Error

We would like to learn from yn = f(xn).
Unfortunately, we only observe yn = f(xn) + ‘stochastic noise’, and no one can model the noise term.

[Plot: the target y = f(x) with noisy data points; the vertical deviations from f are the stochastic noise]

Stochastic Noise: fluctuations/measurement errors we cannot model.

SLIDE 20

Deterministic Noise — Model Error

We would like to learn from yn = h∗(xn), the best approximation to f in H.
Unfortunately, we only observe yn = f(xn) = h∗(xn) + ‘deterministic noise’, and H cannot model the noise term.

[Plot: y = f(x) together with h∗(x), the best approximation to f in H; the gap between them is the deterministic noise]

Deterministic Noise: the part of f we cannot model.
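A sketch making this concrete: compute the best 2nd order approximation h∗ to a fixed f over a dense grid, and look at the residual (the target and the grid are illustrative choices):

    import numpy as np

    f = lambda x: np.sin(np.pi * x)        # illustrative target outside H2
    x = np.linspace(-1, 1, 1000)           # dense grid standing in for all of X

    # h*: best 2nd order approximation to f itself (fit the target, not data)
    c_star = np.polyfit(x, f(x), deg=2)
    h_star = np.polyval(c_star, x)

    det_noise = f(x) - h_star              # the part of f that H2 cannot model
    print(f"mean squared deterministic noise = {np.mean(det_noise ** 2):.4f}")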

SLIDE 21

Stochastic & Deterministic Noise Hurt Learning

[Two plots: y = f(x) + stoch. noise, and y = h∗(x) + det. noise]

                 Stochastic Noise             Deterministic Noise
Source           random measurement errors    the learner’s H cannot model f
Re-measure yn    the noise changes            the noise stays the same
Change H         the noise stays the same     the noise changes

We have a single D and a fixed H, so we cannot distinguish the two noises.

SLIDE 22

Noise and the Bias-Variance Decomposition

y = f(x) + ǫ, where ǫ is the measurement error.

E[Eout(x)] = ED,ǫ[(g(x) − f(x) − ǫ)²]
           = ED,ǫ[(g(x) − f(x))² + 2(g(x) − f(x))ǫ + ǫ²]
           = (bias + var) + 0 + σ²
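Spelling out why the cross term contributes nothing (this uses the standard assumption that ǫ has zero mean and is independent of the data set D):

    \mathbb{E}_{\mathcal{D},\epsilon}\big[2(g(\mathbf{x})-f(\mathbf{x}))\,\epsilon\big]
      = 2\,\mathbb{E}_{\mathcal{D}}\big[g(\mathbf{x})-f(\mathbf{x})\big]\,\mathbb{E}_{\epsilon}[\epsilon] = 0,
    \qquad
    \mathbb{E}_{\epsilon}\big[\epsilon^{2}\big] = \sigma^{2}.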

SLIDE 23

Noise and the Bias-Variance Decomposition

y = f(x) + ǫ, where ǫ is the measurement error.

E[Eout(x)] = σ² + bias + var

  • σ²: the stochastic noise
  • bias: the deterministic noise (the part of f we cannot model)
  • var: indirectly impacted by the noise

SLIDE 24

Noise is the Culprit

Overfitting is the disease. Noise is the cause.

Learning is led astray by fitting the noise more than the signal

Cures:
  • Regularization: putting on the brakes (a sketch follows below).
  • Validation: a reality check from peeking at Eout (the bottom line).
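As a teaser for the next lecture, a sketch of one form of regularization (weight decay / ridge) applied to the earlier 5-point example; the penalty λ and the whole setup are illustrative, not the lecture's exact method:

    import numpy as np

    rng = np.random.default_rng(0)
    f = lambda x: x ** 2
    x = np.linspace(-1, 1, 5)
    y = f(x) + 0.1 * rng.standard_normal(5)

    Z = np.vander(x, 5, increasing=True)   # 4th order features (1, x, ..., x^4)
    lam = 1e-2                             # weight-decay strength (illustrative)

    w_plain = np.linalg.solve(Z, y)        # exact interpolation: no brakes
    w_reg = np.linalg.solve(Z.T @ Z + lam * np.eye(5), Z.T @ y)  # with brakes

    x_test = np.linspace(-1, 1, 200)
    Zt = np.vander(x_test, 5, increasing=True)
    for name, w in (("no regularization", w_plain), ("regularization", w_reg)):
        print(f"{name}: Eout = {np.mean((Zt @ w - f(x_test)) ** 2):.4f}")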

SLIDE 25

Regularization

[Plot, “no regularization”: Data, Target, and Fit]

SLIDE 26

Regularization

[Two plots, “no regularization” vs. “regularization!”: Data, Target, and Fit in each]
