Supervised Learning via Decision Trees (Lecture 4)
Wentworth Institute of Technology, COMP4050 Machine Learning | Fall 2015 | Derbinsky


SLIDE 1

Supervised Learning via Decision Trees

Lecture 4

October 13, 2015

SLIDE 2

Outline

  • 1. Learning via feature splits
  • 2. ID3
– Information gain
  • 3. Extensions
– Continuous features
– Gain ratio
– Ensemble learning

SLIDE 3

Decision Trees

  • Sequence of decisions at choice nodes from root to a leaf node
– Each choice node splits on a single feature
  • Can be used for classification or regression
  • Explicit, easy for humans to understand
  • Typically very fast at testing/prediction time


h"ps://en.wikipedia.org/wiki/Decision_tree_learning

SLIDE 4

Weather Example

SLIDE 5

IRIS Example

SLIDE 6

Training Issues

  • Approximation

– Optimal tree-building is NP-complete
– Typically greedy, top-down

  • Bias vs. Variance

– Occam’s Razor vs. splitting on near-unique identifiers (e.g., CC/SSN)

  • Pruning, ensemble methods
  • Splitting metric

– Information gain, gain ratio, Gini impurity
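
For reference, here is a quick sketch (my own, not from the slides; the helper names are illustrative) contrasting two of these metrics, Shannon entropy and Gini impurity, on a node's class labels:

# Hypothetical helpers contrasting two impurity metrics on a node's class labels.
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy: sum of p_i * log2(1/p_i)."""
    n = len(labels)
    return sum((c / n) * log2(n / c) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity: 1 - sum of p_i^2."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

print(entropy(["A", "A", "B", "B"]), gini(["A", "A", "B", "B"]))  # 1.0 0.5
print(entropy(["A", "A", "A", "A"]), gini(["A", "A", "A", "A"]))  # 0.0 0.0 (pure node)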

SLIDE 7

Iterative Dichotomiser 3

  • Invented by Ross Quinlan in 1986

– Precursor to C4.5/C5.0

  • Categorical data only
  • Greedily consumes features

– Subtrees cannot consider previous feature(s) for further splits
– Typically produces shallow trees

SLIDE 8

ID3: Algorithm Sketch

  • If all examples “same”, return f(examples)
  • If no more features, return f(examples)
  • A = “best” feature

– For each distinct value of A

  • branch = ID3( attributes - {A} )
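
To make this sketch concrete, here is a minimal ID3 implementation in Python (my own illustration, not lecture code). It assumes categorical features, each example encoded as a dict of feature name to value with the class labels in a parallel list, and uses the entropy-based information gain defined on later slides to pick the "best" feature:

# A minimal ID3 sketch (illustrative; names and representation are assumptions).
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels, in shannons/bits."""
    n = len(labels)
    return sum((c / n) * log2(n / c) for c in Counter(labels).values())

def info_gain(examples, labels, feature):
    """IG(T, a) = H(T) - sum_i |T_i|/|T| * H(T_i)."""
    n = len(labels)
    remainder = 0.0
    for value in set(ex[feature] for ex in examples):
        subset = [lab for ex, lab in zip(examples, labels) if ex[feature] == value]
        remainder += (len(subset) / n) * entropy(subset)
    return entropy(labels) - remainder

def id3(examples, labels, features):
    # Base case 1: all examples have the same class -> leaf with that class
    if len(set(labels)) == 1:
        return labels[0]
    # Base case 2: no features left to split on -> leaf with the majority class
    if not features:
        return Counter(labels).most_common(1)[0][0]
    # Recursive step: split on the feature with the highest information gain
    best = max(features, key=lambda f: info_gain(examples, labels, f))
    node = {best: {}}
    for value in set(ex[best] for ex in examples):
        idx = [i for i, ex in enumerate(examples) if ex[best] == value]
        node[best][value] = id3([examples[i] for i in idx],
                                [labels[i] for i in idx],
                                [f for f in features if f != best])
    return node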

SLIDE 9

Details

Classification:
  • “same” = same class
  • f(examples) = majority

Regression:
  • “same” = std. dev. < ε
  • f(examples) = average

SLIDE 10

Recursion

  • A method of programming in which a function refers to itself in order to solve a problem
– Example: ID3 calls itself for subtrees
  • Never necessary
– In some situations, results in simpler and/or easier-to-write code
– Can often be more expensive in terms of memory + time

SLIDE 11

Example

Consider the factorial function


$$n! = \prod_{k=1}^{n} k = 1 \cdot 2 \cdot 3 \cdots n$$

SLIDE 12

Iterative Implementation

def factorial(n):
    result = 1
    for i in range(n):
        result *= (i + 1)
    return result

SLIDE 13

Consider a Recursive Definition


Base case:       0! = 1
Recursive step:  n! = n(n − 1)!  when n ≥ 1

SLIDE 14

Recursive Implementation

def factorial_r(n):
    if n == 0:
        return 1
    else:
        return n * factorial_r(n - 1)

SLIDE 15

How the Code Executes

Function stack (one line per stack frame, growing downward):

main         print factorial_r( 4 )
factorial_r  return 4 * factorial_r( 3 )
factorial_r  return 3 * factorial_r( 2 )
factorial_r  return 2 * factorial_r( 1 )
factorial_r  return 1 * factorial_r( 0 )
factorial_r  return 1

SLIDE 16

How the Code Executes

Function stack (one line per stack frame):

main         print factorial_r( 4 )
factorial_r  return 4 * factorial_r( 3 )
factorial_r  return 3 * factorial_r( 2 )
factorial_r  return 2 * factorial_r( 1 )
factorial_r  return 1 * 1

SLIDE 17

How the Code Executes

Function stack (one line per stack frame):

main         print factorial_r( 4 )
factorial_r  return 4 * factorial_r( 3 )
factorial_r  return 3 * factorial_r( 2 )
factorial_r  return 2 * 1

SLIDE 18

How the Code Executes

Function stack (one line per stack frame):

main         print factorial_r( 4 )
factorial_r  return 4 * factorial_r( 3 )
factorial_r  return 3 * 2

SLIDE 19

How the Code Executes

Function stack (one line per stack frame):

main         print factorial_r( 4 )
factorial_r  return 4 * 6

SLIDE 20

How the Code Executes

Function stack (one line per stack frame):

main         print 24

SLIDE 21

ID3: Algorithm Sketch

  • If all examples “same”, return f(examples)   [base case]
  • If no more features, return f(examples)   [base case]
  • A = “best” feature   [recursive step]

– For each distinct value of A

  • branch = ID3( attributes - {A} )

SLIDE 22

Splitting Metric: The “best” Feature

  • Classification: Information gain
– Goal: choose splits that proceed from much -> little uncertainty
  • Regression: Standard Deviation Reduction

h"p://www.saedsayad.com/ decision_tree_reg.htm

SLIDE 23

Shannon Entropy

  • Measure of “impurity” or uncertainty
  • Intuition: the less likely the event, the more information is transmitted

SLIDE 24

Entropy Range

[Figure: example distributions with small vs. large entropy]

SLIDE 25

Quantifying Entropy

$$H(X) = E[I(X)] = \sum_i P(x_i)\,I(x_i) \;\; \text{(discrete)} \qquad \int P(x)\,I(x)\,dx \;\; \text{(continuous)}$$

Expected value of information

SLIDE 26

Intuition for Information

We want a definition of information I(X) = ... such that:

  • It shouldn’t be negative: I(X) ≥ 0
  • Events that always occur communicate no information: I(1) = 0
  • Information from independent events is additive: I(X1, X2) = I(X1) + I(X2)

SLIDE 27

Quantifying Information

$$I(X) = \log_b \frac{1}{P(X)} = -\log_b P(X)$$

Log base = units: 2 = bit (binary digit), 3 = trit, e = nat

$$H(X) = -\sum_i P(x_i) \log_b P(x_i)$$

Log base = units: 2 = shannon/bit
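
As a quick numeric check of these formulas (my own snippet; the helper names are assumptions), the coin and dataset examples on the following slides can be reproduced directly:

# Self-information and entropy, base 2 (bits/shannons); illustrative only.
from math import log2

def information(p):
    """I(x) = log2(1/p) for an event with probability p > 0."""
    return log2(1 / p)

def entropy(probs):
    """H(X) = sum_i p_i * I(x_i) over a discrete distribution."""
    return sum(p * information(p) for p in probs if p > 0)

print(information(0.5))          # fair coin side: 1.0 bit
print(entropy([0.5, 0.5]))       # fair coin toss: 1.0 shannon
print(entropy([1.0]))            # double-headed coin: 0.0 shannons
print(entropy([0.25, 0.75]))     # weighted coin: ~0.811 shannons
print(entropy([16/30, 14/30]))   # 16 circles vs. 14 crosses: ~0.997 shannons
print(entropy([0.25] * 4))       # uniform over 4 outcomes: 2.0 = log2(4)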

SLIDE 28

Example: Fair Coin Toss

$$I(\text{heads}) = \log_2 \tfrac{1}{0.5} = \log_2 2 = 1 \text{ bit}$$
$$I(\text{tails}) = \log_2 \tfrac{1}{0.5} = \log_2 2 = 1 \text{ bit}$$
$$H(\text{fair toss}) = (0.5)(1) + (0.5)(1) = 1 \text{ shannon}$$

SLIDE 29

Example: Double Headed Coin

$$H(\text{double head}) = (1) \cdot I(\text{head}) = (1) \cdot \log_2 \tfrac{1}{1} = (1) \cdot 0 = 0 \text{ shannons}$$

SLIDE 30

Exercise: Weighted Coin

Compute the entropy of a coin that will land on heads about 25% of the time, and tails the remaining 75%.

SLIDE 31

Answer

$$H(\text{weighted toss}) = (0.25) \cdot I(\text{head}) + (0.75) \cdot I(\text{tails}) = (0.25) \cdot \log_2 \tfrac{1}{0.25} + (0.75) \cdot \log_2 \tfrac{1}{0.75} \approx 0.81 \text{ shannons}$$

SLIDE 32

Entropy vs. P

SLIDE 33

Exercise

Calculate the entropy of the following data: 30 examples, of which 16 are green circles and 14 are purple crosses.

SLIDE 34

Answer

$$H(\text{data}) = \tfrac{16}{30} \cdot I(\text{green circle}) + \tfrac{14}{30} \cdot I(\text{purple cross}) = \tfrac{16}{30} \cdot \log_2 \tfrac{30}{16} + \tfrac{14}{30} \cdot \log_2 \tfrac{30}{14} = 0.99679 \text{ shannons}$$

SLIDE 35

Bounds on Entropy

$$H(X) \geq 0$$
$$H(X) = 0 \iff \exists x \in X : P(x) = 1$$
$$H_b(X) \leq \log_b(|X|)$$

|X| denotes the number of elements in the range of X

$$H_b(X) = \log_b(|X|) \iff X \text{ has a uniform distribution over } |X|$$

SLIDE 36

Information Gain

To use entropy as a splitting metric, we consider the information gain of a split: the resulting reduction in entropy

$$IG(T, a) = H(T) - H(T \mid a) = H(T) - \sum_i \frac{|T_i|}{|T|} H(T_i)$$

(the sum is the weighted average entropy of the children)

SLIDE 37

Example Split

Parent node class distribution: {16/30, 14/30}
Children after the split: {4/17, 13/17} and {12/13, 1/13}

SLIDE 38

Example Information Gain

$$H_1 = \tfrac{4}{17} \log_2 \tfrac{17}{4} + \tfrac{13}{17} \log_2 \tfrac{17}{13} \approx 0.79$$
$$H_2 = \tfrac{12}{13} \log_2 \tfrac{13}{12} + \tfrac{1}{13} \log_2 \tfrac{13}{1} \approx 0.39$$
$$IG = H(T) - \left(\tfrac{17}{30} H_1 + \tfrac{13}{30} H_2\right) = 0.99679 - 0.62 = 0.38 \text{ shannons}$$
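
The same numbers can be verified with a few lines of Python (my own arithmetic check, restating a small entropy helper over probabilities):

# Verify the example split's information gain (illustrative arithmetic only).
from math import log2

def H(probs):
    """Shannon entropy of a discrete distribution, in shannons."""
    return sum(p * log2(1 / p) for p in probs if p > 0)

parent = H([16/30, 14/30])                   # ~0.99679
child1 = H([4/17, 13/17])                    # ~0.79
child2 = H([12/13, 1/13])                    # ~0.39
gain = parent - (17/30 * child1 + 13/30 * child2)
print(round(parent, 5), round(child1, 2), round(child2, 2), round(gain, 2))
# 0.99679 0.79 0.39 0.38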

SLIDE 39

Exercise

Consider the following dataset. Compute the information gain for each of the non-target attributes. Decide which attribute is the best to split on.

X  Y  Z  Class
1  1  1  A
1  1  0  A
0  0  1  B
1  0  0  B

SLIDE 40

H(C)

$$H(C) = -(0.5) \log_2 0.5 - (0.5) \log_2 0.5 = 1 \text{ shannon}$$

SLIDE 41

IG(C,X)

$$H(C \mid X) = \tfrac{3}{4}\left[\tfrac{2}{3} \log_2 \tfrac{3}{2} + \tfrac{1}{3} \log_2 \tfrac{3}{1}\right] + \tfrac{1}{4}[0] = 0.689 \text{ shannons}$$
$$IG(C, X) = 1 - 0.689 = 0.311 \text{ shannons}$$

SLIDE 42

IG(C,Y)

$$H(C \mid Y) = \tfrac{1}{2}[0] + \tfrac{1}{2}[0] = 0 \text{ shannons}$$
$$IG(C, Y) = 1 - 0 = 1 \text{ shannon}$$

SLIDE 43

IG(C,Z)

$$H(C \mid Z) = \tfrac{1}{2}[1] + \tfrac{1}{2}[1] = 1 \text{ shannon}$$
$$IG(C, Z) = 1 - 1 = 0 \text{ shannons}$$

SLIDE 44

Feature Split Choice

IG(C, X) = 0.311   IG(C, Y) = 1.0   IG(C, Z) = 0.0

Split on Y: the Y = 1 branch is class A, the Y = 0 branch is class B.
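
This choice can also be checked against the ID3 sketch given after slide 8 (assuming its info_gain and id3 helpers are in scope; the dict-based encoding of the table is mine):

# Reproduce the exercise with the sketch from slide 8 (illustrative).
examples = [{"X": 1, "Y": 1, "Z": 1},
            {"X": 1, "Y": 1, "Z": 0},
            {"X": 0, "Y": 0, "Z": 1},
            {"X": 1, "Y": 0, "Z": 0}]
labels = ["A", "A", "B", "B"]

for f in ["X", "Y", "Z"]:
    print(f, round(info_gain(examples, labels, f), 3))  # X 0.311, Y 1.0, Z 0.0
print(id3(examples, labels, ["X", "Y", "Z"]))
# {'Y': {...}}: a tree that splits only on Y; Y=1 -> 'A', Y=0 -> 'B'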

SLIDE 45

ID3: Algorithm Sketch

  • If all examples “same”, return f(examples)
  • If no more features, return f(examples)
  • A = “best” feature

– For each distinct value of A

  • branch = ID3( attributes - {A} )

SLIDE 46

Example (MLiA)

No Surfacing  Flippers?  Fish?
Yes           Yes        Yes
Yes           Yes        Yes
Yes           No         No
No            Yes        No
No            Yes        No

SLIDE 47

  • 0. Preliminaries


  • Examples not the same class
  • Features remain
  • H(Fish?) = 0.971
SLIDE 48

1a: No Surfacing


  • H(Fish? | No Surfacing) = 0.55
  • IG(Fish?, No Surfacing) = 0.42
SLIDE 49

1b: Flippers?


  • H(Fish? | Flippers?) = 0.8
  • IG(Fish?, Flippers) = 0.17
SLIDE 50

2: Split on No Surfacing

Tree so far: root No Surfacing, with branches No (left) and Yes (right).

Left subset (No Surfacing = No):
Flippers?  Fish?
Yes        No
Yes        No

Right subset (No Surfacing = Yes):
Flippers?  Fish?
Yes        Yes
Yes        Yes
No         No

  • Recurse(left)

SLIDE 51

  • 2. Left
  • Examples the same class!

– Return class leaf node

Flippers?  Fish?
Yes        No
Yes        No

SLIDE 52

2: Split on No Surfacing

Right subset (No Surfacing = Yes):
Flippers?  Fish?
Yes        Yes
Yes        Yes
No         No

  • Recurse(right)

Tree so far: No Surfacing; the No branch is a leaf (No), the Yes branch is still to be expanded.

SLIDE 53

  • 2. Right
  • Examples not the same class
  • One feature remaining

– Split!

Flippers?  Fish?
Yes        Yes
Yes        Yes
No         No

SLIDE 54

  • 3. Split on Flippers
  • Recurse(left)

Tree: Flippers, with branches No (left) and Yes (right).
Left subset (Flippers = No): Fish? = No (one example)
Right subset (Flippers = Yes): Fish? = Yes, Yes (two examples)

SLIDE 55

  • 3. Left
  • Examples the same class!

– Return class leaf node

Fish?
No

SLIDE 56

  • 3. Split on Flippers
  • Recurse(right)

Tree: Flippers; the No branch is a leaf (No), recursing into the Yes branch.
Right subset (Flippers = Yes): Fish? = Yes, Yes

SLIDE 57

  • 3. Right
  • Examples the same class!

– Return class leaf node

Fish?
Yes
Yes

SLIDE 58

  • 3. Split on Flippers
  • Return!

Returned subtree: Flippers; the No branch -> No, the Yes branch -> Yes.

SLIDE 59

2: Split on No Surfacing


  • Done!

Final tree:
  No Surfacing = No  -> No
  No Surfacing = Yes -> Flippers = No  -> No
                        Flippers = Yes -> Yes
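
Running the id3 sketch from slide 8 on this dataset reproduces the same tree (my own snippet; the dict encoding and the predict helper are assumptions, not lecture code):

# Build and query the fish tree with the slide-8 sketch (illustrative).
examples = [{"No Surfacing": "Yes", "Flippers?": "Yes"},
            {"No Surfacing": "Yes", "Flippers?": "Yes"},
            {"No Surfacing": "Yes", "Flippers?": "No"},
            {"No Surfacing": "No",  "Flippers?": "Yes"},
            {"No Surfacing": "No",  "Flippers?": "Yes"}]
labels = ["Yes", "Yes", "No", "No", "No"]

tree = id3(examples, labels, ["No Surfacing", "Flippers?"])
print(tree)
# {'No Surfacing': {'No': 'No', 'Yes': {'Flippers?': {'No': 'No', 'Yes': 'Yes'}}}}
# (key order may vary)

def predict(node, example):
    """Walk the nested-dict tree down to a class label."""
    while isinstance(node, dict):
        feature = next(iter(node))
        node = node[feature][example[feature]]
    return node

print(predict(tree, {"No Surfacing": "Yes", "Flippers?": "No"}))  # 'No'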

SLIDE 60

Additional Base Case

  • What to do given the following example input to ID3?
– No additional features upon which to split
  • For categorical, majority vote

Fish?
Yes
Yes
No
No
No

Majority vote result: No

SLIDE 61

Extensions

  • Generalization
  • Continuous features
  • Ensemble learning

SLIDE 62

Generalization

  • Information gain biases towards features with many distinct values
– Consider the value of CC/SSN
  • Approaches to mediate
– Gain ratio is a metric that divides each IG term by “SplitInfo”, which is large for features with many partitions (used in C4.5); a small sketch follows below
– There are several pruning techniques that replace subtrees
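
A small sketch of the gain-ratio idea (my own illustration; it assumes the info_gain helper from the slide-8 sketch is in scope):

# Gain ratio = information gain / split information (illustrative sketch).
from collections import Counter
from math import log2

def split_info(examples, feature):
    """Entropy of the partition induced by `feature` (its "SplitInfo")."""
    n = len(examples)
    counts = Counter(ex[feature] for ex in examples)
    return sum((c / n) * log2(n / c) for c in counts.values())

def gain_ratio(examples, labels, feature):
    si = split_info(examples, feature)
    return info_gain(examples, labels, feature) / si if si > 0 else 0.0

# e.g., on the X/Y/Z exercise data: gain_ratio(examples, labels, "Y")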

SLIDE 63

Continuous Features

  • You can always discretize/bin yourself
– Runs the risk of a suboptimal binning depending on tree location
  • Simple approach: binary splits, whereby the left branch is ≤ threshold
  • Consider each distinct value as a candidate threshold, calculate gain (a sketch follows below)
– Computationally expensive for large numbers of values
  • C4.5 penalizes these splits similarly to features with large distinct value sets
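
Here is a minimal sketch of that threshold search for a single continuous feature (my own illustration; helper names are assumptions):

# Pick the best binary threshold for one continuous feature (illustrative).
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return sum((c / n) * log2(n / c) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Try the midpoint between each pair of adjacent distinct values;
    return (threshold, information gain) for the best 'value <= t' split."""
    pairs = sorted(zip(values, labels))
    distinct = sorted(set(v for v, _ in pairs))
    best = (None, 0.0)
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [lab for v, lab in pairs if v <= t]
        right = [lab for v, lab in pairs if v > t]
        gain = (entropy(labels)
                - (len(left) / len(labels)) * entropy(left)
                - (len(right) / len(labels)) * entropy(right))
        if gain > best[1]:
            best = (t, gain)
    return best

# e.g. a petal-length-style feature:
print(best_threshold([1.4, 1.3, 4.7, 4.5, 5.1], ["A", "A", "B", "B", "B"]))
# roughly (2.95, 0.97): splitting near 2.95 separates the classes perfectly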

SLIDE 64

Ensemble Learning

  • The Random Forest algorithm is an exemplar of using multiple trees (a small bagging sketch follows below)
– Each tree is trained via bootstrapped data (i.e. sampled with replacement)
– Each choice node's feature is selected from a random subset of the overall feature set
– Decisions are bagged (i.e. aggregated over many trees)
– Can use a validation set to weight votes by the expected accuracy of each tree
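
Below is a minimal bagging sketch built on the id3 function from slide 8 (my own illustration; a full Random Forest would additionally restrict each node to a random feature subset, which this sketch omits):

# Bootstrap + majority vote over ID3 trees (illustrative; assumes id3 from slide 8).
import random
from collections import Counter

def predict(node, example):
    """Walk a nested-dict tree to a label; None if a value was unseen in training."""
    while isinstance(node, dict):
        feature = next(iter(node))
        node = node[feature].get(example[feature])
    return node

def bagged_trees(examples, labels, features, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(examples)) for _ in examples]  # sample with replacement
        forest.append(id3([examples[i] for i in idx],
                          [labels[i] for i in idx],
                          list(features)))
    return forest

def vote(forest, example):
    """Aggregate (bag) the individual tree predictions by majority vote."""
    preds = [predict(t, example) for t in forest]
    preds = [p for p in preds if p is not None]
    return Counter(preds).most_common(1)[0][0] if preds else None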

SLIDE 65

Checkup

  • ML task(s)?

– Classification: binary/multi-class?

  • Feature type(s)?
  • Implicit/explicit?
  • Parametric?
  • Online?

SLIDE 66

Summary: ID3/Decision Trees

  • Practicality

– Easy, generally applicable
– Need to know nothing about the underlying process
– Very popular, easy to understand

  • Efficiency

– Training: relatively fast, batch
– Testing: typically very fast

  • Performance

– Possible to get stuck in suboptimal trees

  • Methods to help, hard in general
