Decision tree learning (lecture slides, Chapter 18, Sections 1–3)


SLIDE 1

Decision tree learning

Aim: find a small tree consistent with the training examples
Idea: (recursively) choose “most significant” attribute as root of (sub)tree

function DTL(examples, attributes, default) returns a decision tree
    if examples is empty then return default
    else if all examples have the same classification then return the classification
    else if attributes is empty then return Mode(examples)
    else
        best ← Choose-Attribute(attributes, examples)
        tree ← a new decision tree with root test best
        for each value vi of best do
            examplesi ← {elements of examples with best = vi}
            subtree ← DTL(examplesi, attributes − best, Mode(examples))
            add a branch to tree with label vi and subtree subtree
        return tree

Chapter 18, Sections 1–3 24
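The DTL pseudocode translates almost line for line into Python. A minimal sketch, where the example representation is my own and Choose-Attribute is passed in as a parameter (stubbed to "pick the first attribute" in the demo, rather than the information-gain criterion developed on later slides):

```python
from collections import Counter

def mode(examples):
    """Most common classification among the examples (Mode in the pseudocode)."""
    return Counter(label for _, label in examples).most_common(1)[0][0]

def dtl(examples, attributes, default, choose_attribute):
    """examples: list of (dict attribute->value, label) pairs."""
    if not examples:
        return default
    labels = {label for _, label in examples}
    if len(labels) == 1:                       # all examples have the same classification
        return labels.pop()
    if not attributes:
        return mode(examples)
    best = choose_attribute(attributes, examples)
    tree = {best: {}}                          # new decision tree with root test `best`
    for v in {attrs[best] for attrs, _ in examples}:
        subset = [(a, c) for a, c in examples if a[best] == v]
        tree[best][v] = dtl(subset,
                            [a for a in attributes if a != best],
                            mode(examples), choose_attribute)
    return tree

# e.g. three toy examples split on Patrons?:
toy = [({'Patrons': 'None'}, 'F'),
       ({'Patrons': 'Some'}, 'T'),
       ({'Patrons': 'Some'}, 'T')]
print(dtl(toy, ['Patrons'], 'F', lambda attrs, ex: attrs[0]))
# e.g. {'Patrons': {'None': 'F', 'Some': 'T'}}
```

The tree is represented as nested dicts: each internal node maps an attribute name to a dict of value → subtree, and leaves are classifications.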

SLIDE 2

Choosing an attribute

Idea: a good attribute splits the examples into subsets that are (ideally) “all positive” or “all negative”

[Figure: the 12 restaurant examples split two ways. Patrons? (values None, Some, Full) yields nearly pure subsets; Type? (values French, Italian, Thai, Burger) leaves each subset half positive, half negative.]

Patrons? is a better choice: it gives information about the classification


SLIDE 3

Information Theory

♦ Consider communicating two messages (A and B) between two parties
♦ Bits are used to measure message size
♦ If P(A) = 1 and P(B) = 0, how many bits are needed?
♦ If P(A) = .5 and P(B) = .5, how many bits are needed?


SLIDE 4

Information Theory

♦ Consider communicating two messages (A and B) between two parties
♦ Bits are used to measure message size
♦ If P(A) = 1 and P(B) = 0, how many bits are needed?
♦ If P(A) = .5 and P(B) = .5, how many bits are needed?
♦ Information: I(P(v1), ..., P(vn)) = Σ_{i=1}^{n} −P(vi) log2 P(vi)
♦ I(1, 0) = 0 bits
♦ I(0.5, 0.5) = −0.5 × log2 0.5 − 0.5 × log2 0.5 = 1 bit

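The two values above are easy to verify numerically. A minimal sketch of the information formula (the function name `entropy` is my own):

```python
import math

def entropy(*probs):
    """I(P(v1), ..., P(vn)) = sum over i of -P(vi) * log2 P(vi), with 0 log 0 = 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy(1, 0))      # 0.0 bits: a certain message carries no information
print(entropy(0.5, 0.5))  # 1.0 bit: a fair two-way choice carries one bit
```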

SLIDE 5

Information Theory

♦ Consider communicating two messages (A and B) between two parties
♦ Bits are used to measure message size
♦ If P(A) = 1 and P(B) = 0, how many bits are needed?
♦ If P(A) = .5 and P(B) = .5, how many bits are needed?
♦ Information: I(P(v1), ..., P(vn)) = Σ_{i=1}^{n} −P(vi) log2 P(vi)
♦ I(1, 0) = 0 bits
♦ I(0.5, 0.5) = −0.5 × log2 0.5 − 0.5 × log2 0.5 = 1 bit
♦ I measures the information content of a communication (or the uncertainty in what is already known)
♦ The more one knows, the less needs to be communicated, and the smaller I is
♦ The less one knows, the more needs to be communicated, and the larger I is


SLIDE 6

Using Information Theory

♦ (P(pos), P(neg)): probabilities of positive and negative examples
♦ Attribute color: black (1, 0), white (0, 1)
♦ Attribute size: large (.5, .5), small (.5, .5)


SLIDE 7

Using Information Theory

♦ (P(pos), P(neg)): probabilities of positive and negative examples
♦ Attribute color: black (1, 0), white (0, 1)
♦ Attribute size: large (.5, .5), small (.5, .5)
♦ Before selecting an attribute

  • p = number of positive examples, n = number of negative examples
  • Estimating probabilities: P(pos) = p/(p+n), P(neg) = n/(p+n)
  • Before() = I(P(pos), P(neg))

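Before() can be checked with the same entropy sketch as before (function names are mine; a 50/50 split such as p = n = 6 should give exactly 1 bit of uncertainty):

```python
import math

def entropy(*probs):
    """I(P(v1), ..., P(vn)) = sum over i of -P(vi) * log2 P(vi), with 0 log 0 = 0."""
    return -sum(q * math.log2(q) for q in probs if q > 0)

def before(p, n):
    """Before() = I(P(pos), P(neg)), with P(pos) = p/(p+n) and P(neg) = n/(p+n)."""
    return entropy(p / (p + n), n / (p + n))

print(before(6, 6))  # 1.0 bit: half positive, half negative
print(before(4, 0))  # 0.0 bits: all positive, nothing uncertain
```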

SLIDE 8

Selecting an Attribute

♦ Evaluating an attribute (e.g., color)

  • pi = number of positive examples with value i (e.g., black), ni = number of negative ones
  • Estimating probabilities for value i: Pi(pos) = pi/(pi+ni), Pi(neg) = ni/(pi+ni)
  • v values for attribute A (e.g., 2 for color)
  • Remainder(A) = After(A) = Σ_{i=1}^{v} (pi+ni)/(p+n) · I(Pi(pos), Pi(neg)) [expected information]


SLIDE 9

Selecting an Attribute

♦ Evaluating an attribute (e.g., color)

  • pi = number of positive examples with value i (e.g., black), ni = number of negative ones
  • Estimating probabilities for value i: Pi(pos) = pi/(pi+ni), Pi(neg) = ni/(pi+ni)
  • v values for attribute A (e.g., 2 for color)
  • Remainder(A) = After(A) = Σ_{i=1}^{v} (pi+ni)/(p+n) · I(Pi(pos), Pi(neg)) [expected information]

♦ “Information Gain” (reduction in uncertainty about what is known)

  • Gain(A) = Before() − After(A) [Before() has more uncertainty]
  • Choose the attribute A with the largest Gain(A)

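Remainder and Gain can be combined into a short sketch. The per-value (pi, ni) counts below are taken from the textbook's 12-example restaurant dataset, not from these slides (Patrons: None 0+/2−, Some 4+/0−, Full 2+/4−; Type: every value split evenly); function names are mine:

```python
import math

def entropy(*probs):
    return -sum(q * math.log2(q) for q in probs if q > 0)

def remainder(splits):
    """splits: one (pi, ni) pair per attribute value; returns After(A)."""
    total = sum(p + n for p, n in splits)
    return sum((p + n) / total * entropy(p / (p + n), n / (p + n))
               for p, n in splits)

def gain(splits):
    """Gain(A) = Before() - After(A)."""
    p = sum(pi for pi, _ in splits)
    n = sum(ni for _, ni in splits)
    return entropy(p / (p + n), n / (p + n)) - remainder(splits)

print(round(gain([(0, 2), (4, 0), (2, 4)]), 3))          # Gain(Patrons) ≈ 0.541 bits
print(round(gain([(1, 1), (1, 1), (2, 2), (2, 2)]), 3))  # Gain(Type): 0 bits
```

Patrons? has the largest gain and is chosen as the root, matching the earlier slide's intuition; Type? carries no information because every subset stays half positive, half negative.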

SLIDE 10

Example contd.

Decision tree learned from the 12 examples:

[Figure: decision tree learned from the 12 examples. Root: Patrons? (None → F, Some → T, Full → Hungry?); Hungry? (No → F, Yes → Type?); Type? (French → T, Italian → F, Thai → Fri/Sat?, Burger → T); Fri/Sat? (No → F, Yes → T).]
Substantially simpler than the “true” tree: a more complex hypothesis isn’t justified by such a small amount of data


SLIDE 11

Performance measurement

How do we know that h ≈ f?
How about measuring the accuracy of h on the examples that were used to learn h?


SLIDE 12

Performance measurement

How do we know that h ≈ f? (Hume’s Problem of Induction)

  1. Use theorems of computational/statistical learning theory
  2. Try h on a new test set of examples
     • use the same distribution over the example space as for the training set
     • divide the examples into two disjoint subsets: training and test sets
     • prediction accuracy: accuracy on the (unseen) test set

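The holdout methodology above can be sketched in a few lines of Python (the split ratio, the toy data, and the stand-in hypothesis are placeholders of my own):

```python
import random

def train_test_split(examples, test_fraction=0.3, seed=0):
    """Divide the examples into two disjoint subsets: training and test sets."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

def accuracy(h, test_set):
    """Prediction accuracy: fraction of unseen test examples h classifies correctly."""
    return sum(h(x) == label for x, label in test_set) / len(test_set)

# Toy data drawn from one distribution; h stands in for a learned hypothesis.
data = [(i, i % 2 == 0) for i in range(20)]
train, test = train_test_split(data)
h = lambda x: x % 2 == 0
print(accuracy(h, test))  # 1.0 on this noise-free toy problem
```

Measuring accuracy on the held-out test set, rather than on the training examples, answers the question raised on the previous slide: training-set accuracy says little about how h generalizes.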

SLIDE 13

Performance measurement

Learning curve = % correct on test set as a function of training set size

[Figure: learning curve. x-axis: training set size (0 to 100); y-axis: % correct on test set (roughly 0.4 up to near 1), rising as the training set grows.]
