Supervised Learning via Decision Trees
Lecture 9

Wentworth Institute of Technology
COMP3770 – Artificial Intelligence | Spring 2016 | Derbinsky
March 22, 2016


  1. [Title slide] Supervised Learning via Decision Trees – Lecture 9 – March 22, 2016

  2. Outline
     1. Learning via feature splits
     2. ID3
        – Information gain
     3. Extensions
        – Continuous features
        – Gain ratio
        – Ensemble learning

  3. Decision Trees
     • A sequence of decisions at choice nodes, from the root to a leaf node
       – Each choice node splits on a single feature
     • Can be used for classification or regression
     • Explicit and easy for humans to understand
     • Typically very fast at testing/prediction time
     https://en.wikipedia.org/wiki/Decision_tree_learning
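
To make the classification case concrete, here is a minimal sketch using scikit-learn (my own illustration, not from the slides; the toy data and integer encoding are invented for the example):

    # Minimal decision-tree sketch with scikit-learn (illustrative only;
    # the toy data and encoding below are assumptions, not the slides' data).
    from sklearn.tree import DecisionTreeClassifier, export_text

    # Toy weather-style data: [outlook, humidity] encoded as integers.
    X = [[0, 0], [0, 1], [1, 0], [1, 1], [2, 0], [2, 1]]
    y = [0, 0, 1, 1, 1, 0]  # 0 = don't play, 1 = play

    clf = DecisionTreeClassifier(criterion="entropy")  # entropy-based splits
    clf.fit(X, y)

    print(export_text(clf, feature_names=["outlook", "humidity"]))
    print(clf.predict([[1, 1]]))  # prediction = one fast root-to-leaf walk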

  4. Input Data (Weather)
     [Figure: table of the weather training examples]

  5. Output Tree (Weather)
     [Figure: decision tree learned from the weather data]

  6. Training Issues
     • Approximation
       – Optimal tree-building is NP-complete
       – Typically greedy, top-down
     • Under-/over-fitting
       – Occam’s Razor vs. CC/SSN (an identifier-like feature, e.g. a credit-card or Social Security number, splits the data perfectly but does not generalize)
       – Pruning, ensemble methods
     • Splitting metric
       – Information gain, gain ratio, Gini impurity (a quick sketch of Gini follows below)
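
As a concrete reference for the last bullet, here is a small sketch of Gini impurity, one of the splitting metrics named above (my own illustration, not from the slides):

    from collections import Counter

    def gini_impurity(labels):
        """Gini impurity: 1 - sum_i p_i^2, where p_i is the fraction of
        class i. 0 for a pure node; maximal when classes are evenly mixed."""
        n = len(labels)
        return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

    print(gini_impurity(["A", "A", "B", "B"]))  # 0.5 (maximally mixed, 2 classes)
    print(gini_impurity(["A", "A", "A", "A"]))  # 0.0 (pure node)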

  7. ID3 (Iterative Dichotomiser 3)
     • Invented by Ross Quinlan in 1986
       – Precursor to C4.5/C5.0
     • Categorical data only (can’t split on numbers)
     • Greedily consumes features
       – Subtrees cannot reconsider previous feature(s) for further splits
       – Typically produces shallow trees

  8. ID3: Algorithm Sketch
     • If all examples “same”, return f(examples)
     • If no more features, return f(examples)
     • A = “best” feature
       – For each distinct value of A: branch = ID3(attributes − {A})

     Classification:                    Regression:
     • “same” = same class              • “same” = std. dev. < ε
     • f(examples) = majority class     • f(examples) = average
     • “best” = information gain        • “best” = std. dev. reduction

     http://www.saedsayad.com/decision_tree_reg.htm
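
The sketch above translates fairly directly into code. Below is my own self-contained Python rendering of the classification variant (names like `id3` and the nested-dict tree encoding are my choices, not the slides'):

    from collections import Counter
    from math import log2

    def entropy(labels):
        """Shannon entropy of a list of class labels, in shannons (base 2)."""
        n = len(labels)
        return sum((c / n) * log2(n / c) for c in Counter(labels).values())

    def id3(examples, features, target):
        """examples: list of dicts; features: feature names still available;
        target: name of the class attribute. Returns a nested-dict tree."""
        labels = [ex[target] for ex in examples]
        # Base cases: all examples "same" (one class), or no features left;
        # f(examples) = majority class.
        if len(set(labels)) == 1 or not features:
            return Counter(labels).most_common(1)[0][0]
        # "best" feature = minimum weighted child entropy
        # (equivalent to maximum information gain).
        def child_entropy(f):
            total = 0.0
            for v in set(ex[f] for ex in examples):
                sub = [ex[target] for ex in examples if ex[f] == v]
                total += len(sub) / len(examples) * entropy(sub)
            return total
        best = min(features, key=child_entropy)
        # Branch on each distinct value of the best feature, consuming it.
        return {best: {
            v: id3([ex for ex in examples if ex[best] == v],
                   [f for f in features if f != best], target)
            for v in set(ex[best] for ex in examples)
        }}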

  9. Shannon Entropy
     • A measure of “impurity” or uncertainty
     • Intuition: the less likely the event, the more information is transmitted

  10. Entropy Range
      [Figure: example distributions ranging from small entropy to large entropy]

  11. Quantifying Entropy
      Entropy is the expected value of information:
      H(X) = E[I(X)]
      Discrete: H(X) = Σ_i P(x_i) · I(x_i)
      Continuous: H(X) = ∫ P(x) · I(x) dx

  12. Intuition for Information
      I(X) = ?
      • Shouldn’t be negative: I(X) ≥ 0
      • Events that always occur communicate no information: I(1) = 0
      • Information from independent events is additive: I(X₁, X₂) = I(X₁) + I(X₂)

  13. Quantifying Information
      I(X) = log_b(1 / P(X)) = −log_b P(X)
      Log base = units: 2 = bit (binary digit), 3 = trit, e = nat

      H(X) = −Σ_i P(x_i) · log_b P(x_i)
      Log base = units: 2 = shannon/bit
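
Both formulas are one-liners in code. A quick sketch (mine, not the slides') that the coin examples on the next slides can be checked against:

    from math import log2

    def information(p):
        """Self-information I(x) = -log2 P(x), in shannons/bits."""
        return -log2(p)

    def entropy(probs):
        """H(X) = sum_i P(x_i) * I(x_i); terms with P = 0 contribute nothing."""
        return sum(p * information(p) for p in probs if p > 0)

    print(information(0.5))     # 1.0 bit (one fair coin flip)
    print(entropy([0.5, 0.5]))  # 1.0 shannon (fair coin, slide 14)
    print(entropy([1.0]))       # 0.0 shannons (double-headed coin, slide 15)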

  14. Example: Fair Coin Toss
      I(heads) = log₂(1/0.5) = log₂ 2 = 1 bit
      I(tails) = log₂(1/0.5) = log₂ 2 = 1 bit
      H(fair toss) = (0.5)(1) + (0.5)(1) = 1 shannon

  15. Example: Double-Headed Coin
      H(double head) = (1) · I(head) = (1) · log₂(1/1) = (1) · (0) = 0 shannons

  16. Exercise: Weighted Coin
      Compute the entropy of a coin that will land on heads about 25% of the time, and tails the remaining 75%.

  17. Answer
      H(weighted toss) = (0.25) · I(heads) + (0.75) · I(tails)
                       = (0.25) · log₂(1/0.25) + (0.75) · log₂(1/0.75)
                       ≈ 0.81 shannons
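
A one-line numeric check of this answer:

    from math import log2
    print(0.25 * log2(1 / 0.25) + 0.75 * log2(1 / 0.75))  # ≈ 0.8113 shannons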

  18. Entropy vs. P
      [Figure: entropy of a binary variable as a function of P, peaking at 1 shannon when P = 0.5]

  19. Exercise
      Calculate the entropy of the following data.
      [Figure: a scatter of 30 points — 16 green circles and 14 purple crosses]

  20. Answer
      H(data) = (16/30) · I(green circle) + (14/30) · I(purple cross)
              = (16/30) · log₂(30/16) + (14/30) · log₂(30/14)
              ≈ 0.99679 shannons
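
Checking this numerically:

    from math import log2
    print(16/30 * log2(30/16) + 14/30 * log2(30/14))  # ≈ 0.99679 shannons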

  21. Bounds on Entropy
      H(X) ≥ 0
      H(X) = 0 ⟺ ∃x ∈ X: P(x) = 1
      H_b(X) ≤ log_b(|X|), where |X| denotes the number of elements in the range of X
      H_b(X) = log_b(|X|) ⟺ X is uniformly distributed over its |X| values
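
These bounds are easy to check numerically; for example, a uniform distribution attains the upper bound (my sketch, not the slides'):

    from math import log2
    probs = [0.25] * 4                       # uniform over |X| = 4 outcomes
    H = sum(p * log2(1 / p) for p in probs)
    print(H, log2(4))                        # 2.0 2.0 — upper bound attained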

  22. Information Gain
      To use entropy as a splitting metric, we consider the information gain of an action: the resulting change in entropy.

      IG(T, a) = H(T) − H(T | a)
               = H(T) − Σ_i (|T_i| / |T|) · H(T_i)

      The second term is the weighted average entropy of the children.
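
In code, the formula is the parent's entropy minus a weighted average of child entropies. A small sketch (my own; it takes lists of labels rather than probabilities):

    from collections import Counter
    from math import log2

    def entropy(labels):
        """Shannon entropy of a list of class labels, in shannons."""
        n = len(labels)
        return sum((c / n) * log2(n / c) for c in Counter(labels).values())

    def information_gain(parent, children):
        """IG(T, a) = H(T) - sum_i |T_i|/|T| * H(T_i).
        parent: all labels; children: the labels partitioned by the split."""
        n = len(parent)
        return entropy(parent) - sum(len(c) / n * entropy(c) for c in children)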

  23. Example Split
      [Figure: the 30-point dataset split into two children]
      Parent: {16/30, 14/30}
      Left child: {4/17, 13/17}
      Right child: {12/13, 1/13}

  24. Example Information Gain
      H₁ = (4/17) · log₂(17/4) + (13/17) · log₂(17/13) ≈ 0.79
      H₂ = (12/13) · log₂(13/12) + (1/13) · log₂(13/1) ≈ 0.39
      IG = H(T) − ((17/30) · H₁ + (13/30) · H₂)
         = 0.99679 − 0.62 ≈ 0.38 shannons
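
Reproducing this with the `information_gain` sketch from slide 22 (16 green / 14 purple in the parent, split 4+13 and 12+1):

    parent = ["green"] * 16 + ["purple"] * 14
    left   = ["green"] * 4  + ["purple"] * 13
    right  = ["green"] * 12 + ["purple"] * 1
    print(information_gain(parent, [left, right]))  # ≈ 0.38 shannons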

  25. Exercise
      Consider the following dataset. Compute the information gain for each of the non-target attributes, and decide which attribute is best to split on.

      X  Y  Z  Class
      1  1  1  A
      1  1  0  A
      0  0  1  B
      1  0  0  B

  26. H(C)
      H(C) = −(0.5) · log₂ 0.5 − (0.5) · log₂ 0.5 = 1 shannon
      (Classes A and B each cover half of the four examples in the table on slide 25.)

  27. IG(C, X)
      H(C | X) = (3/4) · [(2/3) · log₂(3/2) + (1/3) · log₂ 3] + (1/4) · [0] ≈ 0.689 shannons
      IG(C, X) = 1 − 0.689 = 0.311 shannons

  28. IG(C, Y)
      H(C | Y) = (1/2) · [0] + (1/2) · [0] = 0 shannons
      IG(C, Y) = 1 − 0 = 1 shannon

  29. IG(C, Z)
      H(C | Z) = (1/2) · [1] + (1/2) · [1] = 1 shannon
      IG(C, Z) = 1 − 1 = 0 shannons
      Y yields the highest information gain (1 shannon), so it is the best attribute to split on.
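
The whole exercise can be reproduced in a few lines (my own sketch; it redefines the entropy helper so it runs standalone):

    from collections import Counter
    from math import log2

    def H(labels):
        n = len(labels)
        return sum(c / n * log2(n / c) for c in Counter(labels).values())

    rows = [(1, 1, 1, "A"), (1, 1, 0, "A"), (0, 0, 1, "B"), (1, 0, 0, "B")]
    classes = [r[3] for r in rows]
    for i, name in enumerate(["X", "Y", "Z"]):
        # Partition the class labels by the value of attribute i, then apply
        # IG(C, a) = H(C) - sum_v (|C_v| / |C|) * H(C_v).
        groups = {}
        for r in rows:
            groups.setdefault(r[i], []).append(r[3])
        ig = H(classes) - sum(len(g) / len(rows) * H(g) for g in groups.values())
        print(name, round(ig, 3))  # X 0.311, Y 1.0, Z 0.0  → split on Y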
