


Artificial Intelligence: Representation and Problem Solving
15-381, April 12, 2007

Decision Trees 2

Michael S. Lewicki, Carnegie Mellon

20 questions

  • Consider this game of 20 questions on the web: 20Q.net Inc.


Pick your poison

  • How do you decide if a mushroom is edible?
  • What’s the best identification strategy?
  • Let’s try decision trees.

(Image: a “Death Cap” mushroom.)


Some mushroom data (from the UCI machine learning repository)

 #   EDIBLE?    CAP-SHAPE  CAP-SURFACE  CAP-COLOR  ODOR   STALK-SHAPE  POPULATION  HABITAT  ...
 1   edible     flat       fibrous      red        none   tapering     several     woods    ...
 2   poisonous  convex     smooth       red        foul   tapering     several     paths    ...
 3   edible     flat       fibrous      brown      none   tapering     abundant    grasses  ...
 4   edible     convex     scaly        gray       none   tapering     several     woods    ...
 5   poisonous  convex     smooth       red        foul   tapering     several     woods    ...
 6   edible     convex     fibrous      gray       none   tapering     several     woods    ...
 7   poisonous  flat       scaly        brown      fishy  tapering     several     leaves   ...
 8   poisonous  flat       scaly        brown      spicy  tapering     several     leaves   ...
 9   poisonous  convex     fibrous      yellow     foul   enlarging    several     paths    ...
 10  poisonous  convex     fibrous      yellow     foul   enlarging    several     woods    ...
 11  poisonous  flat       smooth       brown      spicy  tapering     several     woods    ...
 12  edible     convex     smooth       yellow     anise  tapering     several     woods    ...
 13  poisonous  knobbed    scaly        red        foul   tapering     several     leaves   ...
 14  poisonous  flat       smooth       brown      foul   tapering     several     leaves   ...
 15  poisonous  flat       fibrous      gray       foul   enlarging    several     woods    ...
 16  edible     sunken     fibrous      brown      none   enlarging    solitary    urban    ...
 17  poisonous  flat       smooth       brown      foul   tapering     several     woods    ...
 18  poisonous  convex     smooth       white      foul   tapering     scattered   urban    ...
 19  poisonous  flat       scaly        yellow     foul   enlarging    solitary    paths    ...
 20  edible     convex     fibrous      gray       none   tapering     several     woods    ...

(The trailing “...” marks additional attribute columns not shown here.)
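The table already hints at why ODOR will turn out to be so informative: on these 20 rows it separates edible from poisonous mushrooms perfectly. As an illustration (my own sketch, not code from the lecture), the following Python computes the information gain of ODOR on exactly these rows:

```python
from collections import Counter
from math import log2

# The 20 (class, odor) pairs from the table above.
rows = [
    ("edible", "none"), ("poisonous", "foul"), ("edible", "none"),
    ("edible", "none"), ("poisonous", "foul"), ("edible", "none"),
    ("poisonous", "fishy"), ("poisonous", "spicy"), ("poisonous", "foul"),
    ("poisonous", "foul"), ("poisonous", "spicy"), ("edible", "anise"),
    ("poisonous", "foul"), ("poisonous", "foul"), ("poisonous", "foul"),
    ("edible", "none"), ("poisonous", "foul"), ("poisonous", "foul"),
    ("poisonous", "foul"), ("edible", "none"),
]

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attr_index=1):
    """H(class) minus the expected entropy after splitting on an attribute."""
    labels = [r[0] for r in rows]
    h_after = 0.0
    for value in set(r[attr_index] for r in rows):
        subset = [r[0] for r in rows if r[attr_index] == value]
        h_after += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - h_after

print(information_gain(rows))  # ~0.934 bits
```

The gain equals the full class entropy of the sample (about 0.934 bits) because every odor value in these rows is class-pure.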

An easy problem: two attributes provide most of the information

(Decision tree diagram:)

Root: Poisonous: 44, Edible: 46
  ODOR is almond, anise, or none?
    no  → Poisonous: 43, Edible: 0
    yes → Poisonous: 1, Edible: 46
          SPORE-PRINT-COLOR is green?
            yes → Poisonous: 1, Edible: 0
            no  → Poisonous: 0, Edible: 46

100% classification accuracy on 100 examples.
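Because the tree has only two splits, it can be transcribed directly. Here is a sketch of it as Python; the dict keys and attribute values follow the UCI naming, and the counts in the comments are the ones from the slide:

```python
def classify(mushroom):
    """The two-split tree above, written as nested conditionals.

    `mushroom` is assumed to be a dict with 'odor' and
    'spore-print-color' keys holding UCI-style attribute values.
    """
    if mushroom["odor"] in ("almond", "anise", "none"):
        # 1 poisonous / 46 edible reach this branch
        if mushroom["spore-print-color"] == "green":
            return "poisonous"   # 1 poisonous / 0 edible
        return "edible"          # 0 poisonous / 46 edible
    return "poisonous"           # 43 poisonous / 0 edible

print(classify({"odor": "foul", "spore-print-color": "brown"}))  # poisonous
```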

Same problem with no odor or spore-print-color

(Decision tree diagram: a deeper tree that splits on GILL-COLOR, GILL-SPACING, STALK-SURFACE-ABOVE-RING, CAP-COLOR, and GILL-SIZE, with edible/poisonous leaves.)

100% classification accuracy on 100 examples.

Pretty good, right? What if we go off hunting with this decision tree? Performance on another set of 100 mushrooms: 80%. Why?


Not enough examples?

(Learning-curve plot: % correct on another set of the same size vs. number of training examples, from 200 to 2000; the training curve stays above the testing curve.)

Why is performance on the test set always lower than on the training set?
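Curves like these are easy to reproduce. The sketch below is my own illustration rather than the lecture's experiment: it uses scikit-learn and a synthetic noisy dataset as a stand-in for the mushroom data, trains an unpruned tree on increasing numbers of examples, and scores it on a disjoint set of the same size.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the mushroom data: a noisy binary problem.
X, y = make_classification(n_samples=4000, n_features=8, flip_y=0.1,
                           random_state=0)

rng = np.random.default_rng(0)
for n in (200, 400, 800, 1600):
    idx = rng.permutation(len(X))
    train, test = idx[:n], idx[n:2 * n]      # disjoint sets of equal size
    tree = DecisionTreeClassifier(random_state=0).fit(X[train], y[train])
    print(n,
          round(tree.score(X[train], y[train]), 3),  # training accuracy
          round(tree.score(X[test], y[test]), 3))    # accuracy on fresh data
```

The unpruned tree memorizes its training set (training accuracy near 100%) while test accuracy stays lower; the label noise guarantees a persistent gap.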


The Overfitting Problem: Example

(Scatter plot: points from Class A and Class B in the (X1, X2) plane.)

  • Suppose that, in an ideal world, class B is everything such that X2 >= 0.5 and class A is everything with X2 < 0.5.
  • Note that attribute X1 is irrelevant.
  • Generating a decision tree would be trivial, right?

(The following examples are from Prof. Hebert.)


The Overfitting Problem: Example

  • But in the real world, our observations have variability.
  • They can also be corrupted by noise.
  • Thus, the observed pattern looks more complex than the underlying structure really is.

The Overfitting Problem: Example

  • Noise makes the decision tree more complex than it should be.
  • The algorithm tries to classify all of the training set perfectly.
  • This is a fundamental problem in learning and is called overfitting.

  • The tree classifies this point as ‘A’, but it won’t generalize to new examples.

  • The problem started here: X1 is irrelevant to the underlying structure.

  • Is there a way to identify that splitting such a node is not helpful? Idea: prune when splitting would result in a tree that is too “complex”.

Addressing overfitting

  • Grow the tree based on training data. This yields an unpruned tree.
  • Then prune nodes from the tree that are unhelpful. How do we know when this is the case?
    • Use additional data not used in training, i.e., test data.
    • Use a statistical significance test to see if extra nodes are different from noise.
    • Penalize the complexity of the tree.

(Figures: the training data, and the unpruned decision tree grown from it.)

(Figure: the training data with the partitions induced by the decision tree. Notice the tiny regions at the top necessary to correctly classify the ‘A’ outliers!)


Unpruned decision tree from training data. Performance (% correctly classified): Training 100%, Test 77.5%. (Figures: training data and test data with the tree’s partitions.)

Pruned decision tree from training data. Performance (% correctly classified): Training 95%, Test 80%.

Pruned decision tree from training data. Performance (% correctly classified): Training 80%, Test 97.5%.

(Plot: % of data correctly classified vs. size of decision tree, showing performance on the training set and on the test set; the tree with the best performance on the test set is marked.)

General principle

  • As its complexity increases, the model is able to better classify the training data.
  • Performance on the test data initially increases, but then falls as the model overfits, i.e., becomes specialized for classifying the noise in the training data.
  • The complexity of a decision tree is its number of free parameters, i.e., the number of nodes.

(Plot: % correct classification vs. complexity of the model, e.g., size of tree. Classification performance on the training data keeps rising; performance on the test data peaks and then falls in the region where the model overfits the training data.)

Strategies for avoiding overfitting: Pruning

  • Avoiding overfitting is equivalent to achieving good generalization.
  • All strategies need some way to control the complexity of the model.
  • Pruning:
    • construct a standard decision tree, but keep a test data set on which the model is not trained
    • prune leaves recursively
    • splits are eliminated (pruned) by evaluating performance on the test data
    • a leaf is pruned if classification on the test data improves by removing the split

(Diagram: tree (1) with the split intact vs. tree (2) with the split replaced by a leaf. Prune the node if classification performance on the test set is greater for (2) than for (1).)
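A minimal sketch of this procedure (my own toy representation, not the lecture's code): a tree is either a leaf label (a string) or a tuple (attribute index, threshold, left subtree, right subtree), and pruning walks the tree bottom-up, replacing a split by a majority leaf whenever the leaf does better on the held-out test data.

```python
# A tree is either a leaf label (str) or (attr_index, threshold, left, right).

def predict(node, x):
    while not isinstance(node, str):                 # descend to a leaf
        attr, thresh, left, right = node
        node = left if x[attr] < thresh else right
    return node

def accuracy(node, data):
    return sum(predict(node, x) == y for x, y in data) / len(data)

def majority(data):
    labels = [y for _, y in data]
    return max(set(labels), key=labels.count)

def prune(node, test_data):
    """Bottom-up reduced-error pruning against held-out test data."""
    if isinstance(node, str) or not test_data:
        return node
    attr, thresh, left, right = node
    left_data = [(x, y) for x, y in test_data if x[attr] < thresh]
    right_data = [(x, y) for x, y in test_data if x[attr] >= thresh]
    node = (attr, thresh, prune(left, left_data), prune(right, right_data))
    leaf = majority(test_data)                       # candidate (2): one leaf
    if accuracy(leaf, test_data) > accuracy(node, test_data):
        return leaf                                  # (2) beats (1): prune
    return node

# Hypothetical unpruned tree whose lower split fits noise:
tree = (0, 0.5, "A", (1, 0.7, "B", "A"))
test = [((0.2, 0.1), "A"), ((0.8, 0.9), "B"), ((0.9, 0.2), "B")]
print(prune(tree, test))   # (0, 0.5, 'A', 'B'): the noisy lower split is removed
```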


Strategies for avoiding overfitting: Statistical significance tests

  • For each split, ask if there is a significant increase in the information gain.
  • If we’re splitting noise, then the data are random.
  • What proportion of the data go to the left node?
  • If the data were random, how many would we expect to go to the left?
  • Is there a statistically significant difference between what we observe and what we expect? If not, don’t split!

Example:
  • # class A in root node: $N_A = 2$; # class B in root node: $N_B = 7$
  • # class A in left node: $N_{AL} = 1$; # class B in left node: $N_{BL} = 4$

$$p_L = \frac{N_{AL} + N_{BL}}{N_A + N_B} = \frac{5}{9}, \qquad \hat{N}_{AL} = N_A \times p_L = \frac{10}{9}, \qquad \hat{N}_{BL} = N_B \times p_L = \frac{35}{9}$$


Detecting statistically significant splits

  • K is a measure of statistical significance: it measures how much the split deviates from what we would expect from random data.
  • Small K ⇒ the information gain from the split is not significant.
  • Here,

$$K = \frac{(\hat{N}_{AL} - N_{AL})^2}{\hat{N}_{AL}} + \frac{(\hat{N}_{BL} - N_{BL})^2}{\hat{N}_{BL}} + \frac{(\hat{N}_{AR} - N_{AR})^2}{\hat{N}_{AR}} + \frac{(\hat{N}_{BR} - N_{BR})^2}{\hat{N}_{BR}}$$

$$K = \frac{(10/9 - 1)^2}{10/9} + \frac{(35/9 - 4)^2}{35/9} + \cdots = 0.0321$$


“χ² criterion”: general case

(Diagram: a node with N data points split into children with N_L and N_R points, in proportions p_L and p_R.)

  • Small “chi-square” values imply low statistical significance.
  • Nodes that have K smaller than a threshold are pruned.
  • The threshold regulates the complexity of the model:
    • low thresholds allow larger trees and more overfitting
    • high thresholds keep trees small but may sacrifice performance

$$K = \sum_{\text{all classes } i} \ \sum_{\text{all children } j} \frac{(N_{ij} - \hat{N}_{ij})^2}{\hat{N}_{ij}}$$

where $N_{ij}$ is the number of points from class $i$ in child $j$, and $\hat{N}_{ij} = N_i \times p_j$ is the number of points from class $i$ expected in child $j$ assuming random selection.
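This K is just Pearson's chi-square statistic for the class-by-child contingency table. A short sketch (my implementation of the slide's formula, not the lecture's code) that handles any number of classes and children and reproduces the toy value:

```python
def chi_square_K(counts):
    """counts[i][j] = number of class-i points observed in child j.

    Compares observed child counts with the counts expected if the
    split assigned points at random, i.e. expected_ij = N_i * p_j.
    """
    n_total = sum(sum(row) for row in counts)
    class_totals = [sum(row) for row in counts]            # N_i
    child_totals = [sum(col) for col in zip(*counts)]      # column sums
    K = 0.0
    for i, row in enumerate(counts):
        for j, n_ij in enumerate(row):
            expected = class_totals[i] * child_totals[j] / n_total
            K += (n_ij - expected) ** 2 / expected
    return K

# The toy split above: class A = [1 left, 1 right], class B = [4 left, 3 right].
print(round(chi_square_K([[1, 1], [4, 3]]), 4))   # 0.0321
```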


Illustration on our toy problem

(Tree diagram annotated with K at each split: K = 10.58 at the root split; K = 0.0321 and K = 0.83 at the lower splits. The gains obtained by these lower splits are not significant.)


Illustration on our toy problem

  • With appropriate thresholding, we get the decision tree we expect, i.e., only one split.
  • Note: this approach can be applied to both continuous and discrete attributes.

A real example: Fisher’s Iris data

  • three classes of irises
  • four attributes

Class       Sepal Length (SL)  Sepal Width (SW)  Petal Length (PL)  Petal Width (PW)
Setosa      5.1                3.5               1.4                0.2
Setosa      4.9                3.0               1.4                0.2
Setosa      5.4                3.9               1.7                0.4
Versicolor  5.2                2.7               3.9                1.4
Versicolor  5.0                2.0               3.5                1.0
Versicolor  6.0                2.2               4.0                1.0
Virginica   6.4                2.8               5.6                2.1
Virginica   7.2                3.0               5.8                1.6
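For a modern reproduction of the experiments on the following slides, one could use scikit-learn (my choice of library; the lecture predates it). Here cost-complexity pruning via ccp_alpha plays the role of the pruning threshold, and the exact numbers will differ from the slides:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Unpruned tree vs. a tree pruned by penalizing complexity (ccp_alpha).
full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
pruned = DecisionTreeClassifier(random_state=0,
                                ccp_alpha=0.02).fit(X_train, y_train)

for name, tree in [("full", full), ("pruned", pruned)]:
    print(name,
          tree.score(X_train, y_train),   # training accuracy
          tree.score(X_test, y_test))     # test accuracy
print(export_text(pruned, feature_names=load_iris().feature_names))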


Full (unpruned) decision tree

The scatter plot of the data with decision boundaries

(Scatter plot: Petal Width (PW) vs. Petal Length (PL), with the decision boundaries induced by the tree.)

Tree statistics


Pruning one level



Pruning two levels

The tree with pruned decision boundaries

(Scatter plot: Petal Width (PW) vs. Petal Length (PL), with the pruned tree’s decision boundaries.)


Recap: What you should understand

  • Learning is fitting models (estimating their parameters) from data.
  • The goal of learning is to achieve good predictions/classifications for novel data, i.e., good generalization.
  • The complexity of a model (related, but not identical, to the number of parameters) determines how well it can fit the data.
  • If there are insufficient data relative to the complexity, the model will exhibit poor generalization, i.e., it will overfit the data.
  • To avoid this, learning algorithms divide examples into training and testing data.
  • Decision trees:
    • a simple hierarchical approach to classification
    • the goal is to achieve the best classification with a minimal number of decisions
    • work with binary, categorical, or continuous data
    • information gain is a useful splitting strategy (there are many others)
    • the tree is built recursively (a minimal sketch follows below)
    • it can be pruned to reduce the problem of overfitting
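To tie the recap together, here is a minimal sketch of the recursive construction with information-gain splitting for categorical attributes (my own illustration; real implementations add stopping criteria, continuous splits, and the pruning discussed above):

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def build_tree(rows, attrs):
    """rows: list of (attribute_dict, label); attrs: names still available."""
    labels = [y for _, y in rows]
    if len(set(labels)) == 1 or not attrs:     # pure node, or nothing to split
        return Counter(labels).most_common(1)[0][0]
    def gain(a):                               # information gain of attribute a
        split = Counter(x[a] for x, _ in rows)
        rem = sum(cnt / len(rows) * entropy([y for x, y in rows if x[a] == v])
                  for v, cnt in split.items())
        return entropy(labels) - rem
    best = max(attrs, key=gain)                # greedy choice of attribute
    children = {}
    for v in set(x[best] for x, _ in rows):    # recurse on each value
        subset = [(x, y) for x, y in rows if x[best] == v]
        children[v] = build_tree(subset, [a for a in attrs if a != best])
    return (best, children)

rows = [({"odor": "none"}, "edible"), ({"odor": "foul"}, "poisonous")]
print(build_tree(rows, ["odor"]))
# ('odor', {'none': 'edible', 'foul': 'poisonous'})
```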