
Supervised Learning via Decision Trees (Lecture 8) - PowerPoint PPT Presentation



  1. Supervised Learning via Decision Trees (Lecture 8) | Wentworth Institute of Technology, COMP3770 – Artificial Intelligence, Spring 2017, Derbinsky | March 27, 2017

  2. Outline
     1. Learning via feature splits
     2. ID3 – information gain
     3. Extensions – continuous features, gain ratio, ensemble learning

  3. Decision Trees
     • Sequence of decisions at choice nodes from root to a leaf node – each choice node splits on a single feature
     • Can be used for classification or regression
     • Explicit, easy for humans to understand
     • Typically very fast at testing/prediction time
     https://en.wikipedia.org/wiki/Decision_tree_learning
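
Not part of the original slides: a minimal sketch of the workflow just described, using scikit-learn's DecisionTreeClassifier (scikit-learn assumed to be installed); the iris dataset is only a stand-in for the weather data on the next slides, and the parameter choices are illustrative.

```python
# Minimal sketch: train a decision tree and query it.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)   # stand-in dataset, not the weather table

# "criterion" selects the splitting metric discussed later in the lecture;
# scikit-learn offers "entropy" (information gain) and "gini".
clf = DecisionTreeClassifier(criterion="entropy", random_state=0)
clf.fit(X, y)

# Prediction follows a single root-to-leaf path per example, hence it is fast.
print(clf.predict(X[:3]))
```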

  4. Input Data (Weather) [dataset shown as a table on the slide]

  5. Output Tree (Weather) [resulting decision tree shown on the slide]

  6. Training Issues
     • Approximation – optimal tree-building is NP-complete; typically greedy, top-down
     • Under/over-fitting – Occam’s Razor vs. CC/SSN; addressed via pruning and ensemble methods
     • Splitting metric – information gain, gain ratio, Gini impurity
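
A hedged sketch of the remedies the slide names: pruning-style hyperparameters on a single tree and a simple ensemble, again with scikit-learn; the specific parameter values are illustrative, not from the lecture.

```python
# Illustrative over-fitting controls and a simple ensemble.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Limiting depth and leaf size is a basic guard against over-fitting.
pruned = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5).fit(X, y)

# An ensemble of randomized trees averages away the variance of single trees.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

print(pruned.score(X, y), forest.score(X, y))
```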

  7. Iterative Dichotomiser 3 (ID3)
     • Invented by Ross Quinlan in 1986 – precursor to C4.5/C5.0
     • Categorical data only (can’t split on numbers)
     • Greedily consumes features – subtrees cannot reconsider previous feature(s) for further splits; typically produces shallow trees

  8. ID3: Algorithm Sketch
     • If all examples are “same”, return f(examples)
     • If no more features, return f(examples)
     • Otherwise, let A = “best” feature; for each distinct value of A: branch = ID3(attributes – {A})
     Classification: “same” = same class; f(examples) = majority class; “best” = information gain (covered next)
     Regression: “same” = std. dev. < ε; f(examples) = average; “best” = std. dev. reduction
     http://www.saedsayad.com/decision_tree_reg.htm
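
A rough Python rendering of the classification branch of this sketch; the helper names (entropy, information_gain, id3) and the dict-of-dicts tree representation are mine, not from the slides.

```python
# ID3 sketch for categorical features (classification case): stop when all
# labels agree or no features remain, else split on the highest-gain feature.
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, feature):
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    children = groups.values()
    return entropy(labels) - sum(len(c) / len(labels) * entropy(c) for c in children)

def id3(rows, labels, features):
    if len(set(labels)) == 1:                        # all examples "same"
        return labels[0]
    if not features:                                 # no more features: majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(features, key=lambda f: information_gain(rows, labels, f))
    tree = {best: {}}
    for value in {row[best] for row in rows}:        # one branch per distinct value
        sub = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        sub_rows, sub_labels = [r for r, _ in sub], [l for _, l in sub]
        tree[best][value] = id3(sub_rows, sub_labels, features - {best})
    return tree
```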

  9. Shannon Entropy
     • Measure of “impurity” or uncertainty
     • Intuition: the less likely the event, the more information is transmitted

  10. Entropy Range [figure contrasting small vs. large entropy]

  11. Quantifying Entropy
     H(X) = E[I(X)], the expected value of information
     Discrete: H(X) = Σ_i P(x_i) · I(x_i)
     Continuous: H(X) = ∫ P(x) · I(x) dx

  12. Intuition for Information
     I(X) should satisfy:
     • I(X) ≥ 0 – information shouldn’t be negative
     • I(1) = 0 – events that always occur communicate no information
     • I(X1, X2) = I(X1) + I(X2) – information from independent events is additive

  13. Quantifying Information
     I(X) = log_b(1 / P(X)) = −log_b P(X)
     Log base = units: 2 = bit (binary digit), 3 = trit, e = nat
     H(X) = −Σ_i P(x_i) · log_b P(x_i)
     Log base = units: 2 = shannon/bit
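
As a quick numerical companion (not from the slides), the two formulas translate directly into code; base 2 is used below, so the results are in bits/shannons.

```python
# I(x) = -log_b P(x) and H(X) = -sum_i P(x_i) log_b P(x_i), here with base 2.
import math

def information(p, base=2):
    return -math.log(p, base)

def entropy(probs, base=2):
    return sum(p * information(p, base) for p in probs if p > 0)

print(information(0.5))      # 1.0 bit
print(entropy([0.5, 0.5]))   # 1.0 shannon
```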

  14. Example: Fair Coin Toss
     I(heads) = log2(1 / 0.5) = log2 2 = 1 bit
     I(tails) = log2(1 / 0.5) = log2 2 = 1 bit
     H(fair toss) = (0.5)(1) + (0.5)(1) = 1 shannon

  15. Example: Double-Headed Coin
     H(double head) = (1) · I(head) = (1) · log2(1 / 1) = (1) · (0) = 0 shannons

  16. Exercise: Weighted Coin
     Compute the entropy of a coin that will land on heads about 25% of the time, and tails the remaining 75%.

  17. Answer
     H(weighted toss) = (0.25) · I(heads) + (0.75) · I(tails)
                      = (0.25) · log2(1 / 0.25) + (0.75) · log2(1 / 0.75)
                      ≈ 0.81 shannons
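
The three coin examples can also be checked numerically; a small sketch (values rounded as on the slides):

```python
# Entropy of the fair, double-headed, and weighted coins, in shannons.
import math

def H(probs):
    return sum(-p * math.log2(p) for p in probs if p > 0)

print(H([0.5, 0.5]))    # fair coin      -> 1.0
print(H([1.0]))         # double-headed  -> 0.0
print(H([0.25, 0.75]))  # weighted coin  -> 0.811..., ~0.81
```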

  18. Entropy vs. P [figure: entropy as a function of P]

  19. Exercise
     Calculate the entropy of the following data. [figure: 30 examples – 16 green circles and 14 purple crosses]

  20. Answer
     H(data) = (16/30) · I(green circle) + (14/30) · I(purple cross)
             = (16/30) · log2(30/16) + (14/30) · log2(30/14)
             = 0.99679 shannons

  21. Bounds on Entropy
     H(X) ≥ 0
     H(X) = 0 ⟺ ∃ x ∈ X such that P(x) = 1
     H_b(X) ≤ log_b(|X|), where |X| denotes the number of elements in the range of X
     H_b(X) = log_b(|X|) ⟺ X has a uniform distribution over |X|
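
A small numerical illustration of these bounds (not from the slides), with |X| = 4 and base 2:

```python
# Entropy is 0 when one outcome is certain and peaks at log2(|X|) when uniform.
import math

def H(probs):
    return sum(-p * math.log2(p) for p in probs if p > 0)

print(H([1.0, 0.0, 0.0, 0.0]))       # 0.0
print(H([0.25] * 4), math.log2(4))   # both 2.0
print(H([0.7, 0.1, 0.1, 0.1]))       # about 1.36, strictly below 2.0
```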

  22. Information Gain
     To use entropy as a splitting metric, we consider the information gain of an action: the resulting change in entropy.
     IG(T, a) = H(T) − H(T | a)
              = H(T) − Σ_i (|T_i| / |T|) · H(T_i)
     The summation is the size-weighted average entropy of the children.
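
The definition maps directly to a short function; in this sketch (mine, not the lecture's code), a split is given as a list of child label lists.

```python
# IG(T, a) = H(T) minus the size-weighted entropy of the children of the split.
import math
from collections import Counter

def H(labels):
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent_labels, children):
    n = len(parent_labels)
    remainder = sum(len(child) / n * H(child) for child in children)
    return H(parent_labels) - remainder
```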

  23. Example Split
     Parent: {16/30, 14/30}
     Child 1: {4/17, 13/17}
     Child 2: {12/13, 1/13}

  24. Example Information Gain
     H_1 = (4/17) · log2(17/4) + (13/17) · log2(17/13) ≈ 0.79
     H_2 = (12/13) · log2(13/12) + (1/13) · log2(13/1) ≈ 0.39
     IG = H(T) − ((17/30) · H_1 + (13/30) · H_2) = 0.99679 − 0.62 ≈ 0.38 shannons
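
Reproducing the arithmetic above as a quick check (not from the slides): 30 examples with class proportions {16/30, 14/30} are split into children of size 17 and 13.

```python
# Worked example: parent {16/30, 14/30}, children {4/17, 13/17} and {12/13, 1/13}.
import math

def H(probs):
    return sum(-p * math.log2(p) for p in probs if p > 0)

parent = H([16/30, 14/30])                   # ~0.99679
h1 = H([4/17, 13/17])                        # ~0.79
h2 = H([12/13, 1/13])                        # ~0.39
print(parent - (17/30 * h1 + 13/30 * h2))    # ~0.38 shannons
```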

  25. Exercise
     Consider the following dataset. Compute the information gain for each of the non-target attributes, and decide which attribute is best to split on.
     X  Y  Z  Class
     1  1  1  A
     1  1  0  A
     0  0  1  B
     1  0  0  B

  26. H(C)
     H(C) = −(0.5) · log2 0.5 − (0.5) · log2 0.5 = 1 shannon

  27. IG(C, X)
     H(C | X) = (3/4) · [(2/3) · log2(3/2) + (1/3) · log2 3] + (1/4) · [0] = 0.689 shannons
     IG(C, X) = 1 − 0.689 = 0.311 shannons

  28. IG(C, Y)
     H(C | Y) = (1/2) · [0] + (1/2) · [0] = 0 shannons
     IG(C, Y) = 1 − 0 = 1 shannon

  29. IG(C, Z)
     H(C | Z) = (1/2) · [1] + (1/2) · [1] = 1 shannon
     IG(C, Z) = 1 − 1 = 0 shannons
     Y yields the highest information gain, so it is the best attribute to split on.
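
The whole exercise can also be checked with a few lines of Python (a sketch, not part of the lecture); it confirms that Y is the best attribute to split on.

```python
# Information gain of X, Y, Z for the 4-example table above (target: Class).
import math
from collections import Counter

def H(labels):
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

rows = [
    {"X": 1, "Y": 1, "Z": 1, "Class": "A"},
    {"X": 1, "Y": 1, "Z": 0, "Class": "A"},
    {"X": 0, "Y": 0, "Z": 1, "Class": "B"},
    {"X": 1, "Y": 0, "Z": 0, "Class": "B"},
]
labels = [r["Class"] for r in rows]

for feature in ("X", "Y", "Z"):
    children = {}
    for r in rows:
        children.setdefault(r[feature], []).append(r["Class"])
    remainder = sum(len(c) / len(rows) * H(c) for c in children.values())
    print(feature, round(H(labels) - remainder, 3))  # X: 0.311, Y: 1.0, Z: 0.0
```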
