Supervised Learning via Decision Trees
Lecture 4 | Wentworth Institute of Technology COMP4050 – Machine Learning | Fall 2015 | Derbinsky | October 13, 2015


  1. Supervised Learning via Decision Trees (Lecture 4, October 13, 2015)

  2. Outline
     1. Learning via feature splits
     2. ID3
        – Information gain
     3. Extensions
        – Continuous features
        – Gain ratio
        – Ensemble learning

  3. Decision Trees
     • Sequence of decisions at choice nodes from root to a leaf node
       – Each choice node splits on a single feature
     • Can be used for classification or regression
     • Explicit, easy for humans to understand
     • Typically very fast at testing/prediction time
     https://en.wikipedia.org/wiki/Decision_tree_learning

  4. Weather Example
     [figure: example decision tree for the weather data]

  5. IRIS Example
     [figure: decision tree learned for the IRIS dataset]
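     As a companion to the figure, here is a minimal sketch of fitting a decision tree to the IRIS data with scikit-learn (my reconstruction, not the lecture's code; the entropy criterion and the text printout are assumptions):

         from sklearn.datasets import load_iris
         from sklearn.tree import DecisionTreeClassifier, export_text

         # Fit a tree using the entropy criterion (information gain).
         # Note scikit-learn's CART is not ID3: it handles continuous
         # features via threshold splits, for example.
         iris = load_iris()
         clf = DecisionTreeClassifier(criterion="entropy")
         clf.fit(iris.data, iris.target)

         # Print the learned tree as indented text
         print(export_text(clf, feature_names=list(iris.feature_names)))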

  6. Training Issues
     • Approximation
       – Optimal tree-building is NP-complete
       – Typically greedy, top-down
     • Bias vs. variance
       – Occam's Razor vs. CC/SSN (splitting on a unique identifier such as a credit card or social security number fits the training data perfectly but generalizes terribly)
       – Pruning, ensemble methods
     • Splitting metric
       – Information gain, gain ratio, Gini impurity

  7. Iterative Dichotomiser 3 (ID3)
     • Invented by Ross Quinlan in 1986
       – Precursor to C4.5/C5.0
     • Categorical data only
     • Greedily consumes features
       – Subtrees cannot consider previous feature(s) for further splits
       – Typically produces shallow trees

  8. ID3: Algorithm Sketch
     • If all examples are the "same", return f(examples)
     • If no more features, return f(examples)
     • Otherwise, A = "best" feature
       – For each distinct value of A:
         • branch = ID3(attributes - {A})
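     A minimal Python sketch of this recursion (names and the dict-based example format are mine; `score` stands in for the "best"-feature metric covered on slide 22):

         from collections import Counter

         def id3(examples, features, target, score):
             # examples: list of dicts mapping feature name -> categorical value
             # features: set of candidate feature names; target: the label key
             labels = [ex[target] for ex in examples]

             # Base case 1: all examples are the "same" (one class)
             if len(set(labels)) == 1:
                 return labels[0]

             # Base case 2: no more features -> f(examples) = majority class
             if not features:
                 return Counter(labels).most_common(1)[0][0]

             # A = "best" feature according to the splitting metric
             best = max(features, key=lambda f: score(examples, f, target))

             # Recursive step: one branch per distinct value of A,
             # with A removed from the features available to subtrees
             branches = {}
             for value in set(ex[best] for ex in examples):
                 subset = [ex for ex in examples if ex[best] == value]
                 branches[value] = id3(subset, features - {best}, target, score)
             return (best, branches)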

  9. Details
     Classification:
     • "same" = all examples have the same class
     • f(examples) = majority class
     Regression:
     • "same" = standard deviation < ε
     • f(examples) = average
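     In code, those tests and leaf functions might look like this (a sketch; the helper names and the ε default are mine):

         from collections import Counter
         from statistics import mean, pstdev

         # Classification: "same" = one class; f(examples) = majority class
         def same_class(labels):
             return len(set(labels)) == 1

         def majority(labels):
             return Counter(labels).most_common(1)[0][0]

         # Regression: "same" = std. dev. below a threshold; f(examples) = average
         def same_value(values, eps=0.05):
             return pstdev(values) < eps

         def average(values):
             return mean(values)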

 10. Recursion
     • A method of programming in which a function refers to itself in order to solve a problem
       – Example: ID3 calls itself for subtrees
     • Never necessary
       – In some situations, results in simpler and/or easier-to-write code
       – Can often be more expensive in terms of memory + time

 11. Example
     Consider the factorial function:

         n! = ∏_{k=1}^{n} k = 1 ∗ 2 ∗ 3 ∗ … ∗ n

 12. Iterative Implementation

         def factorial(n):
             result = 1
             for i in range(n):
                 result *= (i + 1)
             return result

 13. Consider a Recursive Definition
     Base case:      0! = 1
     Recursive step: n! = n · (n − 1)!   when n ≥ 1

 14. Recursive Implementation

         def factorial_r(n):
             if n == 0:
                 return 1
             else:
                 return n * factorial_r(n - 1)

 15. How the Code Executes
     Function stack (most recent call on top):
         factorial_r(0)   return 1
         factorial_r(1)   return 1 * factorial_r(0)
         factorial_r(2)   return 2 * factorial_r(1)
         factorial_r(3)   return 3 * factorial_r(2)
         factorial_r(4)   return 4 * factorial_r(3)
         main             print factorial_r(4)

 16. How the Code Executes
     Function stack:
         factorial_r(1)   return 1 * 1
         factorial_r(2)   return 2 * factorial_r(1)
         factorial_r(3)   return 3 * factorial_r(2)
         factorial_r(4)   return 4 * factorial_r(3)
         main             print factorial_r(4)

 17. How the Code Executes
     Function stack:
         factorial_r(2)   return 2 * 1
         factorial_r(3)   return 3 * factorial_r(2)
         factorial_r(4)   return 4 * factorial_r(3)
         main             print factorial_r(4)

 18. How the Code Executes
     Function stack:
         factorial_r(3)   return 3 * 2
         factorial_r(4)   return 4 * factorial_r(3)
         main             print factorial_r(4)

 19. How the Code Executes
     Function stack:
         factorial_r(4)   return 4 * 6
         main             print factorial_r(4)

 20. How the Code Executes
     Function stack:
         main             print 24

 21. ID3: Algorithm Sketch
     • If all examples are the "same", return f(examples)   [base case]
     • If no more features, return f(examples)              [base case]
     • Otherwise, A = "best" feature
       – For each distinct value of A:
         • branch = ID3(attributes - {A})                   [recursive step]

 22. Splitting Metric: The "best" Feature
     Classification:
     • Information gain
       – Goal: choose splits that proceed from much to little uncertainty
     Regression:
     • Standard deviation reduction
       – http://www.saedsayad.com/decision_tree_reg.htm
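     For the regression column, a sketch of standard deviation reduction (helper names are mine; the classification analogue, information gain, is sketched after entropy is quantified below):

         from statistics import pstdev

         def sd_reduction(examples, feature, target):
             # Parent std. dev. minus the size-weighted std. dev. of the
             # children produced by splitting on `feature`
             n = len(examples)
             parent = pstdev([ex[target] for ex in examples])
             children = 0.0
             for value in set(ex[feature] for ex in examples):
                 subset = [ex[target] for ex in examples if ex[feature] == value]
                 children += len(subset) / n * pstdev(subset)
             return parent - children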

 23. Shannon Entropy
     • Measure of "impurity" or uncertainty
     • Intuition: the less likely the event, the more information is transmitted

 24. Entropy Range
     [figure: entropy ranges from small (one outcome dominates) to large (outcomes equally likely)]

 25. Quantifying Entropy
     Entropy is the expected value of information:

         H(X) = E[I(X)] = Σ_i P(x_i) I(x_i)    (discrete)
         H(X) = E[I(X)] = ∫ P(x) I(x) dx       (continuous)

 26. Intuition for Information
     I(X) = ? Whatever it is, it should satisfy:
     • I(X) ≥ 0: information shouldn't be negative
     • I(1) = 0: events that always occur communicate no information
     • I(X₁, X₂) = I(X₁) + I(X₂): information from independent events is additive

 27. Quantifying Information

         I(X) = log_b(1 / P(X)) = −log_b P(X)

     Log base = units: 2 = bit (binary digit), 3 = trit, e = nat

         H(X) = −Σ_i P(x_i) log_b P(x_i)

     Log base = units: 2 = shannon/bit
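     Putting the formula to work, a sketch of entropy and of the information-gain splitting metric from slide 22 (helper names are mine):

         from collections import Counter
         from math import log2

         def entropy(labels):
             # H(X) = -sum_i P(x_i) * log2 P(x_i), in shannons
             n = len(labels)
             return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

         def information_gain(examples, feature, target):
             # Entropy of the parent minus the size-weighted entropy of the
             # children produced by splitting on `feature`
             n = len(examples)
             parent = entropy([ex[target] for ex in examples])
             remainder = 0.0
             for value in set(ex[feature] for ex in examples):
                 subset = [ex[target] for ex in examples if ex[feature] == value]
                 remainder += len(subset) / n * entropy(subset)
             return parent - remainder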

 28. Example: Fair Coin Toss

         I(heads) = log₂(1 / 0.5) = log₂ 2 = 1 bit
         I(tails) = log₂(1 / 0.5) = log₂ 2 = 1 bit
         H(fair toss) = (0.5)(1) + (0.5)(1) = 1 shannon

 29. Example: Double-Headed Coin

         H(double head) = (1) · I(head) = (1) · log₂(1 / 1) = (1) · (0) = 0 shannons

 30. Exercise: Weighted Coin
     Compute the entropy of a coin that will land on heads about 25% of the time, and tails the remaining 75%.
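     A quick numerical check (note: the snippet is mine, and it gives away the exercise's answer):

         from math import log2

         # H = -(0.25 * log2 0.25 + 0.75 * log2 0.75)
         H = -(0.25 * log2(0.25) + 0.75 * log2(0.75))
         print(round(H, 3))   # about 0.811 shannons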
