

SLIDE 1

Decision Trees II


COSC 425: Introduction to Machine Learning Fall 2020 (CRN: 44874)

  • Dr. Alex Williams

August 26, 2020

SLIDE 2

Today’s Agenda

We will address:

  • 1. How do you train and test decision trees?
  • 2. How can decision trees generalize?
  • 3. What is the inductive bias of decision trees?
SLIDE 3

Refresher

(Figure: an example decision tree asking isCompilers, isOnline, isMorning?, and isEasy, with leaves labeled Like / Dislike.)

Decision Tree Overview

  • 1. Questions → Trees

Problem: Asking the right questions.

  • 2. Terminology

Instance, Question, Answer, Label

  • 3. Finding the “Right” Tree

Informative / Uninformative Questions

  • 4. Boolean Functions

Trees ↔ If-Then Rules

  • 5. Decision Boundaries

Plotting trees in 2D space

SLIDE 4

  • 1. How do you train / test decision trees?

SLIDE 5

Decision Tree: Usage

Suppose we get a new instance: radius = 16, texture = 12. How do we classify it?

Procedure:

  • At every node, test the corresponding attribute.
  • Follow the branch based on the test.
  • When you reach a leaf, you have two options:
    1. Predict the class of the majority of the training examples at that leaf; or
    2. Sample from the probabilities of the two classes.
SLIDE 6

Decision Tree: Usage

DecisionTreeTest(tree, testPoint)
  if tree is of the form LEAF(guess) then:
    return guess
  else if tree is of the form NODE(f, left, right) then:
    if f is "no" in testPoint then:
      return DecisionTreeTest(left, testPoint)
    else:
      return DecisionTreeTest(right, testPoint)
    end if
  end if

Note: Decision tree algorithms are generally variations of core top-down algorithms.

(See Quinlan, C4.5: Programs for Machine Learning, 1993.)
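As a concrete (and purely illustrative) rendering of the procedure above, here is a minimal Python sketch; the Leaf/Node containers and the dictionary-of-booleans test point are assumptions for the sketch, not part of the slides.

# A minimal sketch of DecisionTreeTest in Python (illustrative names, binary features).
from dataclasses import dataclass
from typing import Union

@dataclass
class Leaf:
    guess: str                 # class label predicted at this leaf

@dataclass
class Node:
    feature: str               # the question asked at this node
    left: "Tree"               # followed when the answer is "no"
    right: "Tree"              # followed when the answer is "yes"

Tree = Union[Leaf, Node]

def decision_tree_test(tree: Tree, test_point: dict) -> str:
    """Answer each node's question with the test point until a leaf is reached."""
    if isinstance(tree, Leaf):
        return tree.guess
    if not test_point.get(tree.feature, False):      # feature is "no" in testPoint
        return decision_tree_test(tree.left, test_point)
    return decision_tree_test(tree.right, test_point)

# Tiny illustrative tree (not the one from the refresher slide):
toy_tree = Node("isEasy", left=Leaf("Dislike"), right=Leaf("Like"))
print(decision_tree_test(toy_tree, {"isEasy": True}))  # -> Like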

SLIDE 7

Decision Tree: Training

Given a set of training instances (i.e., <xi, yi>), we build a tree. Let's say that this is our current node.

  • 1. We iterate over all the features available at our current node. (Blue arrow in the slide figure.)
  • 2. For each feature, we test how "useful" it is to split on this feature from the current node. This *always* produces two child nodes.
SLIDE 8

Decision Tree: Training

Given a set of training instances (i.e., <xi, yi>), we build a tree. Let's say that this is our current node.

  • 1. Exit Condition: If all training instances have the same class label (yi), create a leaf with that class label and exit.
  • 2. Test Selection: Pick the best test to split the data on.
  • 3. Splitting: Split the training set according to the value of the outcome of the selected test from #2.
  • 4. Recurse: Recursively repeat steps 1-3 on each subset of the training data.

SLIDE 9

Decision Tree: Training

DecisionTreeTrain(data, remaining features)
  guess ← most frequent answer in data                          [1: Leaf Creation]
  if the labels in data are unambiguous then:
    return LEAF(guess)
  else if remaining features is empty then:
    return LEAF(guess)
  else:
    for all f in remaining features do:                         [2: Splitting Criterion]
      NO  ← the subset of data on which f = no
      YES ← the subset of data on which f = yes
      score(f) ← # of majority-vote answers in NO + # of majority-vote answers in YES
    end for
    [ … continued on the next slide … ]
  end if

SLIDE 10

Decision Tree: Training

DecisionTreeTrain(data, remaining features)
  [ … Step 1 (Leaf Creation) on the prior slide … ]
  else:
    for all f in remaining features do:                         [2: Splitting Criterion]
      NO  ← the subset of data on which f = no
      YES ← the subset of data on which f = yes
      score(f) ← # of majority-vote answers in NO + # of majority-vote answers in YES
    end for
    f ← the feature with the maximal score(f)                   [3: Split Selection]
    NO  ← the subset of data on which f = no
    YES ← the subset of data on which f = yes
    left  ← DecisionTreeTrain(NO, remaining features \ { f })
    right ← DecisionTreeTrain(YES, remaining features \ { f })
    return NODE(f, left, right)
  end if
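The same recursion, sketched in Python under the same illustrative assumptions as the test sketch above (binary features stored in a dict, labels as strings); this is a teaching sketch, not a production implementation.

# Sketch of DecisionTreeTrain: score = total majority-vote answers across the two subsets.
from collections import Counter
from dataclasses import dataclass
from typing import Union

@dataclass
class Leaf:
    guess: str

@dataclass
class Node:
    feature: str
    left: "Tree"    # subtree for feature == False ("no")
    right: "Tree"   # subtree for feature == True  ("yes")

Tree = Union[Leaf, Node]

def majority_count(data):
    """How many examples agree with the majority label (0 if data is empty)."""
    if not data:
        return 0
    return Counter(y for _, y in data).most_common(1)[0][1]

def decision_tree_train(data, remaining_features, parent_guess=None) -> Tree:
    if not data:                                              # nothing reached this branch
        return Leaf(parent_guess)
    guess = Counter(y for _, y in data).most_common(1)[0][0]  # (1) leaf creation
    if len({y for _, y in data}) == 1 or not remaining_features:
        return Leaf(guess)

    def score(f):                                             # (2) splitting criterion
        no = [(x, y) for x, y in data if not x.get(f, False)]
        yes = [(x, y) for x, y in data if x.get(f, False)]
        return majority_count(no) + majority_count(yes)

    best = max(remaining_features, key=score)                 # (3) split selection
    no = [(x, y) for x, y in data if not x.get(best, False)]
    yes = [(x, y) for x, y in data if x.get(best, False)]
    rest = remaining_features - {best}
    return Node(best,
                decision_tree_train(no, rest, guess),
                decision_tree_train(yes, rest, guess))

# Example call with two toy instances:
data = [({"isEasy": True}, "Like"), ({"isEasy": False}, "Dislike")]
tree = decision_tree_train(data, {"isEasy"})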

SLIDE 11

Decision Tree: Training

What makes a good test? A "good" test provides information about the class label.

Example: Say that you're given 40 examples (30 positive & 10 negative). Consider two tests that would split the examples as follows:

  • Option 1: Positives split evenly across the two branches, with the negatives split roughly evenly, too.
  • Option 2: All negatives bucketed to one branch, with some division of the positives.

SLIDE 12

Decision Tree: Training

What makes a good test? A "good" test provides information about the class label.

Example: Say that you're given 40 examples (30 positive & 10 negative). Consider two tests that would split the examples as follows:

  • T1: Positives split evenly across the two branches, with the negatives split roughly evenly, too.
  • T2: All negatives bucketed to one branch, with some division of the positives.

Which is best? We prefer attributes that separate the classes well. Problem: How can we quantify this?

SLIDE 13

Splitting Mechanisms

Quantifying Prospective Splits

1. Information Gain → Measure the entropy of a node's information.

SLIDE 14

Information Content as a Metric

Consider three cases: a die, a two-sided coin, and a biased coin. Each case yields a different amount of uncertainty in its observed outcome.

SLIDE 15

Information Content as a Metric

Let E be an event that occurs with probability P(E). If we are told that E has occurred with certainty, then we receive I(E) = log2(1 / P(E)) bits of information.

Alternative Perspective: Think of information as "surprise" in the outcome. For example, if P(E) = 1, then I(E) = 0.

  • Fair Coin Flip → log2 2 = 1 bit of information
  • Fair Die Roll → log2 6 ≈ 2.58 bits of information
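As a quick illustration with made-up numbers: a biased coin with P(heads) = 0.9 gives I(heads) = log2(1 / 0.9) ≈ 0.15 bits, while the rare outcome gives I(tails) = log2(1 / 0.1) ≈ 3.32 bits, so the unlikely outcome carries far more surprise.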
SLIDE 16

Information Content as a Metric

An example is the English alphabet. Consider all the letters within it. The lower their probability, the higher their information content / surprise.

SLIDE 17

Information Entropy

Calculating Entropy

Given an information source S which yields k symbols from an alphabet {s1, …, sk} with probabilities {p1, …, pk}, where each yield is independent of the others, the entropy H(S) of the information source is

  H(S) = Σi pi · log2(1 / pi)

In other words …

  • 1. Take the log of 1 / pi.
  • 2. Multiply the value from Step 1 by pi.
  • 3. Rinse and repeat for all "symbols", then sum the results.
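Checking against the earlier examples: a fair coin has H = 2 · (1/2) · log2 2 = 1 bit, and a fair die has H = 6 · (1/6) · log2 6 ≈ 2.58 bits, matching the information content of a single flip or roll.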

SLIDE 18

Information Entropy

Several ways to think about Entropy:

  • Average amount of information per symbol.
  • Average amount of surprise when observing the symbol.
  • Uncertainty the observer has before seeing the symbol.
  • Average number of bits needed to communicate the symbol.


SLIDE 19

Binary Classification

Let’s now try to classify a sample of the data S using a decision tree. Suppose we have p positive samples and n negative samples. What’s the entropy of the dataset?
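For reference, the binary entropy of such a set works out to:

  H(S) = −(p / (p + n)) · log2(p / (p + n)) − (n / (p + n)) · log2(n / (p + n))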

SLIDE 20

Binary Classification

Example: Say that you’re given 40 examples. (30 Positive & 10 Negative)
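Plugging the 30/10 split into the binary entropy formula:

  H(S) = −(30/40) · log2(30/40) − (10/40) · log2(10/40) ≈ 0.311 + 0.500 ≈ 0.811 bits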

SLIDE 21

Binary Classification

Interpreting Entropy:

Entropy = 0 when all members of S belong to the same class.
Entropy = 1 when the classes in S are represented equally (i.e., number of positives == number of negatives).

Problem: Raw entropy only works for the current node.
→ Because child nodes have access to smaller subsets of the data.

SLIDE 22

Conditional Entropy

The conditional entropy H(y | x) is the average specific conditional entropy of y given the values of x:

  H(y | x) = Σv P(x = v) · H(y | x = v)

Plain English: When we evaluate a prospective child node, we need to evaluate how a node's information changes probabilistically.

SLIDE 23

Decision Tree: Training

What makes a good test? A "good" test provides information about the class label.

Example: Say that you're given 40 examples (30 positive & 10 negative), split by T1 and T2 as before. Now, you split on the feature that gives you the highest information gain: H(S) − H(S | x).
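To make the "highest information gain" rule concrete, here is a small Python sketch; the T2-style counts below are made up purely for illustration and are not from the slides.

# Information gain = H(S) - H(S | x), sketched for label-count dictionaries.
from math import log2

def entropy(counts):
    """Entropy (in bits) of a class distribution given as {label: count}."""
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values() if c > 0)

def information_gain(parent, children):
    """parent: {label: count}; children: list of {label: count}, one per branch."""
    total = sum(parent.values())
    cond = sum(sum(ch.values()) / total * entropy(ch) for ch in children)
    return entropy(parent) - cond

# Hypothetical T2-style split of the 40 examples (30 +, 10 -): counts are illustrative.
parent = {"+": 30, "-": 10}
t2 = [{"+": 6, "-": 10}, {"+": 24}]            # all negatives land in one branch
print(round(information_gain(parent, t2), 3))  # ~0.43 with these made-up counts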

SLIDE 24

Splitting Mechanisms

Quantifying Prospective Splits

1. Information Gain → Measure the entropy of a node's information.
2. Gini Impurity → Measure the "impurity" of a node's information.

Note: Another measure is variance (for continuous targets only).

SLIDE 25

Gini Impurity

Given an information source S which yields k symbols from an alphabet {s1, …, sk} with probabilities {p1, …, pk}, where each yield is independent of the others, the Gini impurity is

  G(S) = 1 − Σi pi²

So what? Gini is computationally cheaper than entropy (no log calls).
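For the running 40-example set (30 positive, 10 negative): G(S) = 1 − (0.75² + 0.25²) = 0.375, compared with an entropy of about 0.811 bits; both measures are zero for a pure node and largest at a 50/50 split.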

SLIDE 26

Decision Trees as Search Problems

We can think about decision tree learning as searching in a space of hypotheses that fit our training examples. The hypothesis space searched is the set of possible trees. The search begins with an empty tree, and we move forward by considering progressively more elaborate trees.

(Figure: the search starts at the empty tree and steps through progressively more elaborate candidate trees H1, H2, …, H9, H10.)

SLIDE 27

Considerations

What if we have more than two targets? Information Gain changes with the target space.
  → More targets, more sensitivity. (Less accuracy?)
  → Features can still be relevant.

Important Consideration for Designing Tests
  → You can always make binary tests for features. (Often multiple possibilities!)

Real-World: C4.5 uses only binary tests.

SLIDE 28

  • 2. How can decision trees generalize?

SLIDE 29

Not All Trees are Fruitful

Decision tree construction continues until a node reaches purity (i.e., it contains examples of only one class). As a decision tree grows, performance can wane.

SLIDE 30

Overfitting

Definition: Given a hypothesis space H, a hypothesis h in H is said to overfit the training data if there exists some alternative hypothesis h' in H such that h has a smaller error than h' over the training examples, but h' has a smaller error than h over the entire distribution of instances.

In Plain English: A learning algorithm has mapped to its training data too well.

Consider the Following: A new racecar driver spends a year's time learning to race professionally. Their training sessions have been conducted in sunny conditions with the same race car each session. On their first race day, it rains. Further, they discover they've entered a motorcycle race, not a "racecar" race.

How do they perform? … Probably pretty poorly.

SLIDE 31

Overfitting: How to Avoid

General Idea: Remove nodes to generalize better.

  • 1. Early Stopping: Stop growing the tree when further splitting does not improve information gain on the dataset you're using for validation.
  • 2. Post-Pruning: Build the complete tree, then revisit / prune subtrees that have low information gain on the dataset you're using for validation. (A sketch follows below.)

Preferred Solution: Post-pruning is generally recognized as more beneficial, as it allows you to account for settings where combinations of features are useful (instead of being forced to evaluate the utility of individual features at construction time).
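One common way to realize post-pruning is reduced-error pruning against the validation set. The sketch below is an illustrative variant only; the tree layout, helper names, and the choice to take the majority over validation labels are assumptions, not prescribed by the slides.

# A sketch of reduced-error post-pruning on a held-out validation set.
from collections import Counter
from dataclasses import dataclass
from typing import Union

@dataclass
class Leaf:
    guess: str

@dataclass
class Node:
    feature: str
    left: "Tree"   # branch for feature == False ("no")
    right: "Tree"  # branch for feature == True  ("yes")

Tree = Union[Leaf, Node]

def predict(tree: Tree, x: dict) -> str:
    while isinstance(tree, Node):
        tree = tree.right if x.get(tree.feature, False) else tree.left
    return tree.guess

def prune(tree: Tree, val_at_node) -> Tree:
    """Bottom-up pruning: collapse a subtree into a leaf when doing so does not
    hurt accuracy on the validation examples that reach that subtree."""
    if isinstance(tree, Leaf):
        return tree
    no = [(x, y) for x, y in val_at_node if not x.get(tree.feature, False)]
    yes = [(x, y) for x, y in val_at_node if x.get(tree.feature, False)]
    tree = Node(tree.feature, prune(tree.left, no), prune(tree.right, yes))
    if not val_at_node:
        return tree   # no validation evidence at this node; keep it as-is
    # Candidate leaf: majority label among the validation examples reaching here
    # (one simple variant; the majority can also be taken over training data).
    majority = Counter(y for _, y in val_at_node).most_common(1)[0][0]
    leaf_correct = sum(y == majority for _, y in val_at_node)
    tree_correct = sum(predict(tree, x) == y for x, y in val_at_node)
    return Leaf(majority) if leaf_correct >= tree_correct else tree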

SLIDE 32

Overfitting: Post-Pruning

Pruning isn’t a panacea, … but it helps!

SLIDE 33

  • 3. What is the inductive bias of decision trees?

SLIDE 34

Inductive Biases

What are Decision Trees' Inductive Biases?

  • Shorter trees are better than longer trees.
  • Trees that place high-information-gain features closer to the root are preferred over those that do not.

→ We aid these with (1) "good" metrics and (2) overfitting-avoidance techniques.

All learning algorithms operate on assumptions.

  • Provided x, y, z → this learning method will generalize.
  • We want our learned function to generalize beyond the data we have at hand!

SLIDE 35

Today’s Agenda

We have addressed:

  • 1. How do you train and test decision trees?
  • 2. How can decision trees generalize?
  • 3. What is the inductive bias of decision trees?
SLIDE 36

Reading

  • Daumé, A Course in Machine Learning. Chapter 1.
SLIDE 37

Next Time

We will address:

  • 1. How do you define “performance”?
  • 2. How well can we generalize?