


ML II

Last Time…

  • Decision trees and how to build them
    • Information Gain
    • Entropy
  • Next up:
    • Elements of a Learning System
    • What can go wrong?
    • How do we know how it went?


What might we learn from these examples?

ML Intro: Review

What we have:

  • Data: examples of our problem
    • Processed to produce features
      • Average R, G, B values of pixels
      • Fuzzy or not fuzzy
    • Turned into a feature vector
      • X1: <200, 200, 40, yes> …
      • X3: <220, 10, 22, no> …
    • Sometimes labeled, sometimes not
      • X1: <200, 200, 40, yes, yellow=yes>

What we want:

  • A prediction over new data


Learning Produces Models

  • Trying to build a model of what it means to be, e.g., yellow
  • 1. Train over data
  • 2. Test on different data
  • 3. Deploy: the real test
  • Every step needs its own data
  • Split what we have into training data and test data to see if our learner is good (a split is sketched below)
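A minimal sketch of such a split in Python (the function name, data layout, and test fraction are illustrative assumptions, not from the slides):

    import random

    def split_data(examples, test_fraction=0.25, seed=0):
        """Shuffle labeled examples, then hold out a fraction for testing.
        A toy sketch; in practice a library routine such as scikit-learn's
        train_test_split is the usual choice."""
        rng = random.Random(seed)            # fixed seed -> reproducible split
        shuffled = list(examples)
        rng.shuffle(shuffled)
        n_test = max(1, int(len(shuffled) * test_fraction))
        return shuffled[n_test:], shuffled[:n_test]   # (training, test)

    # Feature vectors in the slides' format: (R, G, B, fuzzy?, yellow label)
    data = [(205, 200, 40, 'Y', 'yes'), (90, 250, 90, 'N', 'no'),
            (220, 10, 22, 'N', 'no'), (205, 210, 10, 'N', 'yes'),
            (235, 210, 30, 'N', 'yes'), (50, 215, 60, 'Y', 'no')]
    train, test = split_data(data)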

One Possible Decision Tree

sample    R    G    B    Fuzzy?    label: Yellow?
X1        205  200  40   Y         yes
X2        90   250  90   N         no
X3        220  10   22   N         no
X4        205  210  10   N         yes
X5        235  210  30   N         yes
X6        50   215  60   Y         no

[Slide figure: a decision tree over these attributes, with training samples X1-X6 at its leaves]


One Possible Decision Tree

  • Predictions:

sample    R    G    B    Fuzzy?    Prediction: Is it yellow?
X7        215  45   190  N         no   ✔
X8        220  240  225  N         yes  ✗

ruh roh
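The same failure can be reproduced with a library learner on the slide's table; a minimal sketch, assuming scikit-learn is available (the 0/1 encoding of Fuzzy? is mine, and the exact tree learned may differ from the slide's):

    from sklearn.tree import DecisionTreeClassifier

    # Training table from the slide; Fuzzy? encoded as 1 = Y, 0 = N
    X_train = [[205, 200, 40, 1], [90, 250, 90, 0], [220, 10, 22, 0],
               [205, 210, 10, 0], [235, 210, 30, 0], [50, 215, 60, 1]]
    y_train = ['yes', 'no', 'no', 'yes', 'yes', 'no']
    tree = DecisionTreeClassifier().fit(X_train, y_train)

    # The slide's test rows: X7 should come out 'no'; X8 (white, but with
    # high R and G) is exactly the kind of example such a tree can get wrong
    print(tree.predict([[215, 45, 190, 0], [220, 240, 225, 0]]))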


Overfitting


  • Sometimes a model fits the training data well but doesn’t do well on test data
  • It may have “overfit” the training data
  • The model is too specific to the training data
  • It doesn’t generalize well to new information
  • Learned model: (Y∧Y∧Y→B ∨ Y∧N∧N→M ∨ ...)

Examples (training data):

            Attributes                   Outcome
            Bipedal   Flies   Feathers
Sparrow     Y         Y       Y          B
Monkey      Y         N       N          M
Ostrich     Y         N       Y          B
Bat         Y         Y       N          M
Elephant    N         N       N          M

Overfitting 2

  • Irrelevant attributes → overfitting
  • If the hypothesis space has many dimensions (many attributes), we may find meaningless regularity
    • Ex: Name starts with [A-M] → Mammal

Examples (training data):

            Attributes           Class
            Bipedal   Feathers
Sparrow     Y         Y          B
Monkey      Y         N          M
Ostrich     Y         Y          B
Bat         Y         N          M
Elephant    N         N          M

Overfitting 3

  • Incomplete training data → overfitting
  • Bad training/test split → overfitting

Overfitting

  • Fix by removing irrelevant features
    • E.g., remove ‘first letter’ from the feature vector
  • Fix by getting more training data
  • Fix by pruning low nodes in the decision tree (sketched below)
    • E.g., if the improvement from the best attribute at a node is below a threshold, stop and make this node a leaf rather than generating child nodes
  • Lots of other choices…
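A sketch of the threshold-based stopping rule from the pruning bullet above (names such as grow and majority are mine; info_gain is passed in, and one possible implementation appears under “Information Gain” below):

    from collections import Counter

    def majority(labels):
        """Most common label; used when we refuse to split further."""
        return Counter(labels).most_common(1)[0][0]

    def grow(examples, attributes, info_gain, threshold=0.05):
        """Build a tree from (features_dict, label) pairs, but stop and
        emit a majority-vote leaf when even the best attribute's
        information gain falls below `threshold`."""
        labels = [lab for _, lab in examples]
        if len(set(labels)) == 1 or not attributes:
            return majority(labels)
        best = max(attributes, key=lambda a: info_gain(examples, a))
        if info_gain(examples, best) < threshold:
            return majority(labels)        # prune: a leaf, not child nodes
        rest = [a for a in attributes if a != best]
        return (best, {v: grow([(f, l) for f, l in examples if f[best] == v],
                               rest, info_gain, threshold)
                       for v in {f[best] for f, _ in examples}})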

Noisy Data

  • Many kinds of “noise” can occur in the examples:
    • Two examples have the same attribute/value pairs but different classifications
    • Some values of attributes are incorrect
      • Errors in the data acquisition process or the preprocessing phase
    • The classification is wrong (e.g., + instead of −) because of some error
    • Some attributes are irrelevant to the decision-making process, e.g., the color of a die is irrelevant to its outcome
    • Some attributes are missing (are pangolins bipedal?)


Pruning Decision Trees

  • Replace a whole subtree by a leaf node
  • If: a decision rule establishes that the expected error rate in the subtree is greater than in the single leaf. E.g.:
    • Training: one red success and two blue failures
    • Test: three red failures and one blue success
  • Consider replacing this subtree by a single FAILURE (leaf) node
  • After replacement we will have only two errors instead of five:

[Figure: the Color split before and after pruning]
  Training:  Color → red: 1 success, 0 failures | blue: 0 successes, 2 failures
  Test:      Color → red: 1 success, 3 failures | blue: 1 success, 1 failure (2 successes, 4 failures in total)
  Pruned:    a single FAILURE leaf
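The comparison itself is just error counting; a minimal sketch using one reading of the figure’s test counts (the helper name is mine):

    def branch_errors(predicts_success, successes, failures):
        """Held-out errors at a branch that keeps its training-time label."""
        return failures if predicts_success else successes

    # Training fixed the predictions: red -> success, blue -> failure.
    # Test counts as drawn: red 1 success/3 failures, blue 1 success/1 failure.
    subtree = branch_errors(True, 1, 3) + branch_errors(False, 1, 1)
    leaf = min(2, 4)      # one FAILURE leaf errs only on the successes
    print(subtree, leaf)  # the leaf makes fewer errors, so prune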



Summary: Decision Tree Learning

  • One of the most widely used learning methods in practice
  • Can outperform human experts on many problems
  • Strengths include:
    • Fast
    • Simple to implement
    • Can convert the result to a set of easily interpretable rules
    • Empirically validated in many commercial products
    • Handles noisy data
  • Weaknesses:
    • Univariate splits/partitioning use only one attribute at a time (limits the types of possible trees)
    • Large decision trees may be hard to understand
    • Requires fixed-length feature vectors
    • Non-incremental (i.e., a batch method)


Next Up

  • Evaluating a Learned Model
  • Elements of a Learning System


A Learning System

Four components of a machine learning system:

  • 1. Representation: how do we describe the problem space?
  • 2. Actor: the part of the system that actually does things
  • 3. Critic: provides the experience we learn from
  • 4. Learner: the actual learning algorithm

[Figure: a learning agent in its environment. Sensors and effectors connect the agent to the environment; a critic applies a performance standard to give feedback to the learning element, which makes changes to the performer (with its knowledge base) and sets learning goals for a problem generator]

General Model of Learning Agents

Representing The Problem

  • Representing the problem to be solved is the first decision to be made (and the most important)
  • Requires understanding the domain – the field in which the problem is set
  • There are two aspects of representing a problem:
    • 1. The behavior that we want to learn
    • 2. The inputs we will learn from

Representation: Examples to think about

  • How do we describe a problem?
    • Guessing an animal?
    • Playing checkers?
    • Labeling spam email?
    • OCRing a check?
    • Noticing new help desk topics?
  • What data do you need to represent for each of these? What model might you learn?


Representation: Examples

  • Guessing an animal: a tree of questions and answers
  • Playing checkers: board, piece positions, rules; weights for legal moves
  • Labeling spam email: the frequencies of words used in this email and in our entire mailbox; Naive Bayes
  • OCRing: matrix of light/dark pixels; % light pixels; # straight lines, etc.; neural net
  • Noticing new help desk topics: clustering algorithms

Actor

  • We want a system to do something:
    • Make a prediction
    • Sort into categories
    • Look for similarities
  • Once a model has been learned, we keep using this piece

How Does the Actor Act?

  • Guessing an animal: walk the tree, ask the questions
  • Playing checkers: look through rules and weights to identify a move
  • Identifying spam: examine the set of features, calculate the probability of spam
  • OCRing a check: input the features for a digit, output a probability for each of 0 through 9
  • Help desk topics: output a representation of clusters

Critic

  • Provides the experience we learn from
  • Typically a set of examples plus the action that should be taken
  • But it can be any kind of feedback that indicates how close we are to where we want to be
  • Feedback may come after one action, or after a sequence

Critic: Think About

  • How do we judge correct actions?
    • Guessing an animal:
    • OCRing digits:
    • Identifying spam:
    • Playing checkers:
    • Grouping documents:

Critic: Possible Answers

  • How do we judge correct actions?
    • Guessing an animal: human feedback
    • OCRing digits: a human-categorized training set
    • Identifying spam: match to a set of human-categorized test documents
    • Playing checkers: who won?
    • Grouping documents: which are most similar in language or content?
  • These can be generally categorized as supervised, unsupervised, or reinforcement learning


Learner

  • The learner is the core of a machine learning system. It will:
    • Examine the information provided by the critic
    • Modify the representation to improve performance
    • Repeat until performance is satisfactory, or until it stops improving
  • The learner component is what people mean when they talk about a machine learning algorithm

What Does the Learner Do?

  • Guessing an animal: ask the user for a question, add it to the binary tree
  • OCRing digits: modify the importance of different input features
  • Identifying spam: change which words are likely to appear in spam
  • Playing checkers: increase the chance of using some rules, decrease the chance for others
  • Grouping documents: find clusters of similar documents

Information Gain

  • Concept: make decisions that increase the homogeneity of the data subsets (with respect to outcomes)
  • Information gain is based on:
    • The decrease in entropy
    • After a dataset is split on an attribute
  • → High homogeneity, i.e., a high likelihood that samples will have the same class (outcome)
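A minimal sketch of both quantities (the data layout and names are mine), checked against the bipedal/feathers table from “Overfitting 2”:

    from collections import Counter
    from math import log2

    def entropy(labels):
        """H = -sum p_i * log2(p_i) over the class distribution."""
        n = len(labels)
        return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

    def info_gain(examples, attr):
        """Decrease in entropy after splitting (features_dict, label)
        pairs on attribute `attr`."""
        labels = [lab for _, lab in examples]
        remainder = 0.0
        for value in {feats[attr] for feats, _ in examples}:
            subset = [lab for feats, lab in examples if feats[attr] == value]
            remainder += len(subset) / len(labels) * entropy(subset)
        return entropy(labels) - remainder

    animals = [({'bipedal': 'Y', 'feathers': 'Y'}, 'B'),
               ({'bipedal': 'Y', 'feathers': 'N'}, 'M'),
               ({'bipedal': 'Y', 'feathers': 'Y'}, 'B'),
               ({'bipedal': 'Y', 'feathers': 'N'}, 'M'),
               ({'bipedal': 'N', 'feathers': 'N'}, 'M')]
    print(info_gain(animals, 'feathers'))   # ~0.97: feathers alone separates B from M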


Extensions of the Decision Tree Learning Algorithm

  • Using gain ratios
  • Real-valued data
  • Noisy data and overfitting
  • Generation of rules
  • Setting parameters
  • Cross-validation for experimental validation of performance
  • C4.5 is an extension of ID3 that accounts for unavailable values, continuous attribute value ranges, pruning of decision trees, rule derivation, and so on


Using Gain Ratios

  • Information gain favors attributes with a large number of values
  • If we have an attribute D that has a distinct value for each record, then Info(D,T) is 0, so Gain(D,T) is maximal
  • To compensate, use the following ratio instead of Gain:

GainRatio(D,T) = Gain(D,T) / SplitInfo(D,T)

  • SplitInfo(D,T) is the information due to the split of T on the basis of the value of the categorical attribute D:

SplitInfo(D,T) = I(|T1|/|T|, |T2|/|T|, …, |Tm|/|T|)

where {T1, T2, …, Tm} is the partition of T induced by the value of D
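As a sketch, the two formulas in Python (function names are mine; the gain value is assumed computed elsewhere, e.g., by the info_gain sketch above):

    from math import log2

    def split_info(part_sizes):
        """I(|T1|/|T|, ..., |Tm|/|T|): the entropy of the partition sizes
        themselves, ignoring class labels."""
        total = sum(part_sizes)
        return -sum((s / total) * log2(s / total) for s in part_sizes)

    def gain_ratio(gain, part_sizes):
        """GainRatio(D,T) = Gain(D,T) / SplitInfo(D,T)."""
        return gain / split_info(part_sizes)

    # A distinct-value-per-record attribute splits 8 records into 8
    # singletons: its gain is maximal, but SplitInfo = 3 bits, so the
    # ratio penalizes it heavily.
    print(split_info([1] * 8))   # 3.0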


Real-Valued Data

  • Select a set of thresholds defining intervals
  • Each interval becomes a discrete value of the attribute
  • How?
    • Use simple heuristics…
      • Always divide into quartiles
    • Use domain knowledge…
      • Divide age into infant (0-2), toddler (3-5), school-aged (5-8)
    • Or treat this as another learning problem
      • Try a range of ways to discretize the continuous variable and see which yields “better results” w.r.t. some metric
      • E.g., try the midpoint between every pair of values (sketched below)
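A sketch of the midpoint heuristic (the function name is mine), using the R values from the earlier fruit table:

    def candidate_thresholds(values):
        """Midpoints between adjacent distinct sorted values: candidate
        split points for a real-valued attribute."""
        v = sorted(set(values))
        return [(a + b) / 2 for a, b in zip(v, v[1:])]

    print(candidate_thresholds([205, 90, 220, 205, 235, 50]))
    # [70.0, 147.5, 212.5, 227.5]: score each split R <= t (e.g., by
    # information gain) and keep the best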



Measuring Model Quality

  • Training error
    • Train on all data; measure error on all data
    • Subject to overfitting (of course we’ll make good predictions on the data we trained on!)
  • Regularization
    • An attempt to avoid overfitting
    • Explicitly minimize the complexity of the function while minimizing loss
    • The tradeoff is modeled with a regularization parameter
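Ridge regression is one standard concrete instance of this idea (an illustration, not something specific to these slides): minimize the data loss plus lambda times a complexity term on the weights.

    import numpy as np

    def ridge_fit(X, y, lam=1.0):
        """Closed-form ridge regression: minimize ||Xw - y||^2 + lam*||w||^2.
        `lam` is the regularization parameter trading fit against complexity."""
        d = X.shape[1]
        return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

Larger values of lam force smaller weights (a simpler function) at the cost of a worse fit to the training data; lam = 0 recovers ordinary least squares.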


Measuring Model Quality

  • How good is a model?
    • Predictive accuracy
    • False positives / false negatives for a given cutoff threshold
    • Loss function (accounts for the cost of different types of errors)
    • Area under the curve (AUC)
  • Minimizing loss can lead to problems with overfitting


Cross-Validation

  • Holdout cross-validation:
    • Divide the data into a training set and a test set
    • Train on the training set; measure error on the test set
    • Better than training error, since we are measuring generalization to new data
    • To get a good estimate, we need a reasonably large test set
    • But this leaves less data to train on, reducing our model quality!

Cross-Validation, cont.

  • k-fold cross-validation:
    • Divide the data into k folds
    • Train on k-1 folds, use the kth fold to measure error
    • Repeat k times; use the average error to measure generalization accuracy
    • Statistically valid and gives good accuracy estimates
  • Leave-one-out cross-validation (LOOCV):
    • k-fold cross-validation where k = N (test data = 1 instance!)
    • Quite accurate, but also quite expensive, since it requires building N models
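A sketch of the k-fold loop (the training and error functions are passed in; names are mine, and real code would shuffle and often stratify the folds first):

    def cross_validate(train_fn, error_fn, data, k=5):
        """Average held-out error over k train/test rotations; with
        k = len(data) this is LOOCV."""
        folds = [set(range(i, len(data), k)) for i in range(k)]
        errors = []
        for held_out in folds:
            test = [d for i, d in enumerate(data) if i in held_out]
            train = [d for i, d in enumerate(data) if i not in held_out]
            errors.append(error_fn(train_fn(train), test))
        return sum(errors) / k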


Correctness

  • True positive: predicted positive, actually positive
  • True negative: predicted negative, actually negative
  • False positive: predicted positive, actually negative
  • False negative: predicted negative, actually positive

Precision/Recall

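Precision is TP / (TP + FP), the fraction of flagged positives that are real; recall is TP / (TP + FN), the fraction of real positives that we found. A minimal sketch (names are mine, and zero denominators are not guarded against, for brevity):

    def precision_recall(preds, truths, positive='yes'):
        """Precision and recall for one designated positive class."""
        tp = sum(p == positive == t for p, t in zip(preds, truths))
        fp = sum(p == positive != t for p, t in zip(preds, truths))
        fn = sum(t == positive != p for p, t in zip(preds, truths))
        return tp / (tp + fp), tp / (tp + fn)

    print(precision_recall(['yes', 'yes', 'no', 'no'],
                           ['yes', 'no', 'yes', 'no']))   # (0.5, 0.5)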