

SLIDE 1

Extending Decision Trees

Alice Gao

Lecture 10 Based on work by K. Leyton-Brown, K. Larson, and P. van Beek

SLIDE 2

Outline

▶ Learning Goals
▶ Real-valued features
▶ Noise and over-fitting
▶ Revisiting the Learning Goals

SLIDE 3

Learning Goals

By the end of the lecture, you should be able to

▶ Construct decision trees with real-valued features.
▶ Construct a decision tree for noisy data to avoid over-fitting.
▶ Choose the best maximum depth of a decision tree by K-fold cross-validation.

SLIDE 4

Jeeves the valet - training set

Day  Outlook   Temp  Humidity  Wind    Tennis?
 1   Sunny     Hot   High      Weak    No
 2   Sunny     Hot   High      Strong  No
 3   Overcast  Hot   High      Weak    Yes
 4   Rain      Mild  High      Weak    Yes
 5   Rain      Cool  Normal    Weak    Yes
 6   Rain      Cool  Normal    Strong  No
 7   Overcast  Cool  Normal    Strong  Yes
 8   Sunny     Mild  High      Weak    No
 9   Sunny     Cool  Normal    Weak    Yes
10   Rain      Mild  Normal    Weak    Yes
11   Sunny     Mild  Normal    Strong  Yes
12   Overcast  Mild  High      Strong  Yes
13   Overcast  Hot   Normal    Weak    Yes
14   Rain      Mild  High      Strong  No

SLIDE 5

Jeeves the valet - test set

Day  Outlook   Temp  Humidity  Wind    Tennis?
 1   Sunny     Mild  High      Strong  No
 2   Rain      Hot   Normal    Strong  No
 3   Rain      Cool  High      Strong  No
 4   Overcast  Hot   High      Strong  Yes
 5   Overcast  Cool  Normal    Weak    Yes
 6   Rain      Hot   High      Weak    Yes
 7   Overcast  Mild  Normal    Weak    Yes
 8   Overcast  Cool  High      Weak    Yes
 9   Rain      Cool  High      Weak    Yes
10   Rain      Mild  Normal    Strong  No
11   Overcast  Mild  High      Weak    Yes
12   Sunny     Mild  Normal    Weak    Yes
13   Sunny     Cool  High      Strong  No
14   Sunny     Cool  High      Weak    No

SLIDE 6

Extending Decision Trees

  • 1. Real-valued features
  • 2. Noise and over-fitting

SLIDE 7

Jeeves dataset with real-valued temperatures

Day Outlook Temp Humidity Wind Tennis? 1 Sunny 29.4 High Weak No 2 Sunny 26.6 High Strong No 3 Overcast 28.3 High Weak Yes 4 Rain 21.1 High Weak Yes 5 Rain 20.0 Normal Weak Yes 6 Rain 18.3 Normal Strong No 7 Overcast 17.7 Normal Strong Yes 8 Sunny 22.2 High Weak No 9 Sunny 20.6 Normal Weak Yes 10 Rain 23.9 Normal Weak Yes 11 Sunny 23.9 Normal Strong Yes 12 Overcast 22.2 High Strong Yes 13 Overcast 27.2 Normal Weak Yes 14 Rain 21.7 High Strong No

SLIDE 8

Jeeves dataset ordered by temperatures

Day  Outlook   Temp  Humidity  Wind    Tennis?
 7   Overcast  17.7  Normal    Strong  Yes
 6   Rain      18.3  Normal    Strong  No
 5   Rain      20.0  Normal    Weak    Yes
 9   Sunny     20.6  Normal    Weak    Yes
 4   Rain      21.1  High      Weak    Yes
14   Rain      21.7  High      Strong  No
 8   Sunny     22.2  High      Weak    No
12   Overcast  22.2  High      Strong  Yes
10   Rain      23.9  Normal    Weak    Yes
11   Sunny     23.9  Normal    Strong  Yes
 2   Sunny     26.6  High      Strong  No
13   Overcast  27.2  Normal    Weak    Yes
 3   Overcast  28.3  High      Weak    Yes
 1   Sunny     29.4  High      Weak    No

SLIDE 9

Handling a real-valued feature

▶ Discretize it, as in the sketch below.
▶ Dynamically choose a split point.
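A minimal sketch of the discretization option, assuming pandas is available; the three bins and their labels are illustrative choices, not part of the lecture:

```python
import pandas as pd

# Temperatures from the Jeeves dataset with real-valued temperatures.
temps = pd.Series([29.4, 26.6, 28.3, 21.1, 20.0, 18.3, 17.7,
                   22.2, 20.6, 23.9, 23.9, 22.2, 27.2, 21.7])

# Discretize into three equal-width bins so the feature becomes
# categorical again; the bin count and label names are arbitrary here.
temp_binned = pd.cut(temps, bins=3, labels=["Cool", "Mild", "Hot"])
print(temp_binned.value_counts())
```

Discretizing is simple, but the fixed bin boundaries ignore the labels; dynamically choosing a split point (next slide) picks the boundary that is most informative for classification.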

SLIDE 10

Choosing a split point for a real-valued feature

  • 1. Sort the instances according to the real-valued feature.
  • 2. Possible split points are values that are midway between two different values.
  • 3. Suppose that the feature changes from X to Y. Should we consider (X + Y)/2 as a possible split point?
  • 4. Let LX be all the labels for the examples where the feature takes the value X.
  • 5. Let LY be all the labels for the examples where the feature takes the value Y.
  • 6. If there exists a label a ∈ LX and a label b ∈ LY such that a ≠ b, then we will consider (X + Y)/2 as a possible split point.
  • 7. Determine the expected information gain for each possible split point and choose the split point with the largest gain (see the sketch after this list).
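A sketch of steps 1 to 7 in Python. The function names and the use of binary tests of the form value ≤ split are assumptions for illustration, not code from the lecture:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy (in bits) of a collection of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def best_split_point(values, labels):
    """Return (split, gain) with the largest expected information gain."""
    pairs = sorted(zip(values, labels))       # step 1: sort by the feature
    n, base = len(pairs), entropy(labels)
    best_split, best_gain = None, -1.0
    for i in range(1, n):
        x, y = pairs[i - 1][0], pairs[i][0]
        if x == y:
            continue                          # not a boundary between two values
        lx = {l for v, l in pairs if v == x}  # step 4: labels at value X
        ly = {l for v, l in pairs if v == y}  # step 5: labels at value Y
        if len(lx | ly) == 1:
            continue                          # step 6: all labels agree, skip
        split = (x + y) / 2                   # steps 2-3: midway split point
        left = [l for v, l in pairs if v <= split]
        right = [l for v, l in pairs if v > split]
        gain = (base - (len(left) / n) * entropy(left)
                     - (len(right) / n) * entropy(right))  # step 7
        if gain > best_gain:
            best_split, best_gain = split, gain
    return best_split, best_gain
```

On the ordered temperatures above, this considers midpoints such as (17.7 + 18.3)/2 = 18.0, where the labels change from Yes to No, but skips (20.0 + 20.6)/2 because both neighbouring examples are labelled Yes.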

SLIDE 11

CQ: Testing a discrete feature

CQ: Suppose that feature X has discrete values (e.g., Temp is Cool, Mild, or Hot). On any path from the root to a leaf, how many times can we test feature X?
(A) 0 times
(B) 1 time
(C) > 1 time
(D) Two of (A), (B), and (C) are correct.
(E) All of (A), (B), and (C) are correct.

SLIDE 12

CQ: Testing a real-valued feature

CQ: Assume that we will do binary tests at each node in a decision tree. Suppose that feature X has real values (e.g., Temp ranges from 17.7 to 29.4). On any path from the root to a leaf, how many times can we test feature X?
(A) 0 times
(B) 1 time
(C) > 1 time
(D) Two of (A), (B), and (C) are correct.
(E) All of (A), (B), and (C) are correct.

SLIDE 13

Jeeves the valet - training set

Day  Outlook   Temp  Humidity  Wind    Tennis?
 1   Sunny     Hot   High      Weak    No
 2   Sunny     Hot   High      Strong  No
 3   Overcast  Hot   High      Weak    Yes
 4   Rain      Mild  High      Weak    Yes
 5   Rain      Cool  Normal    Weak    Yes
 6   Rain      Cool  Normal    Strong  No
 7   Overcast  Cool  Normal    Strong  Yes
 8   Sunny     Mild  High      Weak    No
 9   Sunny     Cool  Normal    Weak    Yes
10   Rain      Mild  Normal    Weak    Yes
11   Sunny     Mild  Normal    Strong  Yes
12   Overcast  Mild  High      Strong  Yes
13   Overcast  Hot   Normal    Weak    Yes
14   Rain      Mild  High      Strong  No

SLIDE 14

Decision tree generated by ID3

Outlook
  Sunny: Humidity
    Normal: Yes
    High: No
  Overcast: Yes
  Rain: Wind
    Weak: Yes
    Strong: No

Test error is 0/14.

SLIDE 15

Jeeves training set is corrupted

Day  Outlook   Temp  Humidity  Wind    Tennis?
 1   Sunny     Hot   High      Weak    No
 2   Sunny     Hot   High      Strong  No
 3   Overcast  Hot   High      Weak    No
 4   Rain      Mild  High      Weak    Yes
 5   Rain      Cool  Normal    Weak    Yes
 6   Rain      Cool  Normal    Strong  No
 7   Overcast  Cool  Normal    Strong  Yes
 8   Sunny     Mild  High      Weak    No
 9   Sunny     Cool  Normal    Weak    Yes
10   Rain      Mild  Normal    Weak    Yes
11   Sunny     Mild  Normal    Strong  Yes
12   Overcast  Mild  High      Strong  Yes
13   Overcast  Hot   Normal    Weak    Yes
14   Rain      Mild  High      Strong  No

(Day 3's label has been flipped from Yes to No.)

SLIDE 16

Decision tree for the corrupted data set

Outlook
  Sunny: Humidity
    Normal: Yes
    High: No
  Overcast: Humidity
    Normal: Yes
    High: Wind
      Weak: No
      Strong: Yes
  Rain: Wind
    Weak: Yes
    Strong: No

Test error is 2/14.

SLIDE 17

Dealing with noisy data

Problem: When the data is noisy, the ID3 algorithm grows the tree until it perfectly classifies the training examples, so over-fitting occurs.

However, a smaller tree is likely to generalize better to unseen data.

▶ Grow the tree to a pre-specified maximum depth (sketched below).
▶ Enforce a minimum number of examples at a leaf node.
▶ Post-prune the tree using a validation set.
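As a concrete illustration, the first two strategies correspond to pre-pruning parameters of scikit-learn's DecisionTreeClassifier. The integer encoding below is an assumption (scikit-learn trees need numeric features; Temp is omitted for brevity), and the particular depth and leaf-size values are arbitrary:

```python
from sklearn.tree import DecisionTreeClassifier

# Corrupted Jeeves training set encoded as [Outlook, Humidity, Wind] with
# Sunny=0 / Overcast=1 / Rain=2, High=0 / Normal=1, Weak=0 / Strong=1.
X = [[0, 0, 0], [0, 0, 1], [1, 0, 0], [2, 0, 0], [2, 1, 0],
     [2, 1, 1], [1, 1, 1], [0, 0, 0], [0, 1, 0], [2, 1, 0],
     [0, 1, 1], [1, 0, 1], [1, 1, 0], [2, 0, 1]]
y = ["No", "No", "No", "Yes", "Yes", "No", "Yes",
     "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"]

# Pre-pruning: cap the depth and require a minimum number of training
# examples per leaf, so the tree cannot grow until it fits every noisy label.
clf = DecisionTreeClassifier(max_depth=2, min_samples_leaf=2)
clf.fit(X, y)
```

One caveat: this ordinal encoding lets the tree split categories by threshold (e.g. Outlook ≤ 0.5 separates Sunny from the rest), which is close to, but not identical to, the multiway splits used on the slides.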

SLIDE 18

Growing the tree to a maximum depth

▶ Randomly split the entire dataset into a training set and a validation set. (For example, 2/3 is the training set and 1/3 is the validation set.)
▶ For each pre-specified maximum depth, generate a tree with that maximum depth on the training set.
▶ Calculate the prediction accuracy of the generated tree on the validation set.
▶ Choose the maximum depth which results in the tree with the highest prediction accuracy. (A sketch follows below.)
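A sketch of this procedure with scikit-learn, reusing the encoded X and y from the previous sketch; the 1/3 split, the candidate depth range, and the seed are arbitrary choices:

```python
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Randomly hold out 1/3 of the data as the validation set.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=1/3, random_state=0)

best_depth, best_acc = None, -1.0
for depth in range(1, 6):                            # candidate maximum depths
    tree = DecisionTreeClassifier(max_depth=depth).fit(X_train, y_train)
    acc = tree.score(X_val, y_val)                   # validation accuracy
    if acc > best_acc:
        best_depth, best_acc = depth, acc
print("best depth:", best_depth, "validation accuracy:", best_acc)
```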

SLIDE 19

K-fold cross-validation

Suppose that K = 5.

  • 1. For each pre-specified maximum depth, do steps 2 to 6.
  • 2. Split the data into 5 equal subsets.
  • 3. Perform 5 rounds of learning.
  • 4. In each round, 1/5 of the data is used as the validation set and 4/5 of the data is used as the training set.
  • 5. Over the 5 rounds, generate 5 different trees and determine their prediction accuracies on the 5 different validation sets.
  • 6. Calculate the average prediction accuracy on the validation sets.
  • 7. Choose the maximum depth that results in the highest average prediction accuracy on the validation sets (see the sketch after this list).
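The same depth search with 5-fold cross-validation, again assuming the encoded X and y from the earlier sketch; cross_val_score performs the splitting, the 5 rounds of training, and the per-fold validation scoring:

```python
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Step 6: average validation accuracy over the 5 folds, per candidate depth.
mean_acc = {
    depth: cross_val_score(
        DecisionTreeClassifier(max_depth=depth), X, y, cv=5).mean()
    for depth in range(1, 6)
}
best_depth = max(mean_acc, key=mean_acc.get)  # step 7: pick the best depth
```

Compared to a single train/validation split, every example is used for validation exactly once, which makes the accuracy estimate less sensitive to one unlucky split.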

SLIDE 20

Revisiting the Learning Goals

By the end of the lecture, you should be able to

▶ Construct decision trees with real-valued features.
▶ Construct a decision tree for noisy data to avoid over-fitting.
▶ Choose the best maximum depth of a decision tree by K-fold cross-validation.