Decision tree learning
  1. Decision tree learning
     Aim: find a small tree consistent with the training examples
     Idea: (recursively) choose the “most significant” attribute as the root of the (sub)tree

     function DTL(examples, attributes, default) returns a decision tree
         if examples is empty then return default
         else if all examples have the same classification then return the classification
         else if attributes is empty then return Mode(examples)
         else
             best ← Choose-Attribute(attributes, examples)
             tree ← a new decision tree with root test best
             for each value v_i of best do
                 examples_i ← {elements of examples with best = v_i}
                 subtree ← DTL(examples_i, attributes − best, Mode(examples))
                 add a branch to tree with label v_i and subtree subtree
             return tree
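Below is a minimal Python sketch of the DTL pseudocode above. The data representation (each example as a dict of attribute → value plus a 'class' key) and the helper names (mode, dtl, choose_attribute) are assumptions made for illustration, not part of the slides.

```python
from collections import Counter

def mode(examples):
    """Most common classification among the examples (majority vote)."""
    return Counter(e['class'] for e in examples).most_common(1)[0][0]

def dtl(examples, attributes, default, choose_attribute):
    """Recursively build a decision tree as nested dicts {attr: {value: subtree}}."""
    if not examples:
        return default
    classes = {e['class'] for e in examples}
    if len(classes) == 1:
        return classes.pop()                        # all examples agree
    if not attributes:
        return mode(examples)                       # no attributes left: majority vote
    best = choose_attribute(attributes, examples)   # e.g., largest information gain
    tree = {best: {}}
    for v in {e[best] for e in examples}:
        examples_v = [e for e in examples if e[best] == v]
        subtree = dtl(examples_v,
                      [a for a in attributes if a != best],
                      mode(examples), choose_attribute)
        tree[best][v] = subtree
    return tree
```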

  2. Choosing an attribute
     Idea: a good attribute splits the examples into subsets that are (ideally) “all positive” or “all negative”
     [Figure: splitting the examples on Patrons? (None / Some / Full) versus on Type? (French / Italian / Thai / Burger)]
     Patrons? is the better choice: it gives information about the classification

  3. Information Theory
     ♦ Consider communicating one of two possible messages (A and B) between two parties
     ♦ Bits are used to measure message size
     ♦ If P(A) = 1 and P(B) = 0, how many bits are needed?
     ♦ If P(A) = 0.5 and P(B) = 0.5, how many bits are needed?

  4. Information Theory
     ♦ Consider communicating one of two possible messages (A and B) between two parties
     ♦ Bits are used to measure message size
     ♦ If P(A) = 1 and P(B) = 0, how many bits are needed?
     ♦ If P(A) = 0.5 and P(B) = 0.5, how many bits are needed?
     ♦ Information: I(P(v_1), ..., P(v_n)) = Σ_{i=1}^{n} −P(v_i) log2 P(v_i)
     ♦ I(1, 0) = 0 bits
     ♦ I(0.5, 0.5) = −0.5 × log2 0.5 − 0.5 × log2 0.5 = 1 bit
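A small Python sketch of the information formula above (the function name information is an assumption); it reproduces the two example values from the slide.

```python
import math

def information(probs):
    """I(P(v_1), ..., P(v_n)) = sum of -P(v_i) * log2 P(v_i), taking 0 * log 0 = 0."""
    return sum(-p * math.log(p, 2) for p in probs if p > 0)

print(information([1.0, 0.0]))   # 0.0 bits: the outcome is already certain
print(information([0.5, 0.5]))   # 1.0 bit: maximal uncertainty over two messages
```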

  5. Information Theory
     ♦ Consider communicating one of two possible messages (A and B) between two parties
     ♦ Bits are used to measure message size
     ♦ If P(A) = 1 and P(B) = 0, how many bits are needed?
     ♦ If P(A) = 0.5 and P(B) = 0.5, how many bits are needed?
     ♦ Information: I(P(v_1), ..., P(v_n)) = Σ_{i=1}^{n} −P(v_i) log2 P(v_i)
     ♦ I(1, 0) = 0 bits
     ♦ I(0.5, 0.5) = −0.5 × log2 0.5 − 0.5 × log2 0.5 = 1 bit
     ♦ I measures the information content of the communication (or the uncertainty in what is already known)
     ♦ The more one already knows, the less there is to communicate, and the smaller I is
     ♦ The less one already knows, the more there is to communicate, and the larger I is

  6. Using Information Theory
     ♦ (P(pos), P(neg)): probabilities of positive and negative classifications
     ♦ Attribute color: black (1, 0), white (0, 1)
     ♦ Attribute size: large (0.5, 0.5), small (0.5, 0.5)

  7. Using Information Theory
     ♦ (P(pos), P(neg)): probabilities of positive and negative classifications
     ♦ Attribute color: black (1, 0), white (0, 1)
     ♦ Attribute size: large (0.5, 0.5), small (0.5, 0.5)
     ♦ Before selecting an attribute
       • p = number of positive examples, n = number of negative examples
       • Estimating probabilities: P(pos) = p / (p + n), P(neg) = n / (p + n)
       • Before() = I(P(pos), P(neg))
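Continuing the sketch, Before() can be computed from the positive/negative counts using the information() helper above; the 6/6 split in the usage line corresponds to the 12-example restaurant data set used later, and the function name is an assumption.

```python
def before(p, n):
    """Uncertainty about the classification before any attribute is tested."""
    return information([p / (p + n), n / (p + n)])

print(before(6, 6))   # 1.0 bit for a data set with 6 positive and 6 negative examples
```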

  8. Selecting an Attribute
     ♦ Evaluating an attribute (e.g., color)
       • p_i = number of positive examples with value i (e.g., black), n_i = number of negative ones
       • Estimating probabilities for value i: P_i(pos) = p_i / (p_i + n_i), P_i(neg) = n_i / (p_i + n_i)
       • v = number of values of attribute A (e.g., 2 for color)
       • Remainder(A) = After(A) = Σ_{i=1}^{v} (p_i + n_i)/(p + n) × I(P_i(pos), P_i(neg))   [expected information]

  9. Selecting an Attribute
     ♦ Evaluating an attribute (e.g., color)
       • p_i = number of positive examples with value i (e.g., black), n_i = number of negative ones
       • Estimating probabilities for value i: P_i(pos) = p_i / (p_i + n_i), P_i(neg) = n_i / (p_i + n_i)
       • v = number of values of attribute A (e.g., 2 for color)
       • Remainder(A) = After(A) = Σ_{i=1}^{v} (p_i + n_i)/(p + n) × I(P_i(pos), P_i(neg))   [expected information]
     ♦ “Information Gain” (reduction in uncertainty about the classification)
       • Gain(A) = Before() − After(A)   [Before() has more uncertainty]
       • Choose the attribute A with the largest Gain(A)
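A sketch of Remainder(A) and Gain(A) built on the helpers above; representing an attribute as a list of (p_i, n_i) count pairs, one per value, is an assumption made for illustration.

```python
def remainder(value_counts, p, n):
    """Expected information still needed after testing A (a.k.a. After(A))."""
    total = 0.0
    for p_i, n_i in value_counts:
        weight = (p_i + n_i) / (p + n)
        total += weight * information([p_i / (p_i + n_i), n_i / (p_i + n_i)])
    return total

def gain(value_counts, p, n):
    """Gain(A) = Before() - After(A); pick the attribute with the largest gain."""
    return before(p, n) - remainder(value_counts, p, n)

# For the color/size example above (assuming 2 positive and 2 negative examples):
print(gain([(2, 0), (0, 2)], 2, 2))  # color: gain = 1.0 bit (perfect split)
print(gain([(1, 1), (1, 1)], 2, 2))  # size:  gain = 0.0 bits (no information)
```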

  10. Example contd.
      Decision tree learned from the 12 examples:
      [Figure: learned tree — Patrons? (None → F, Some → T, Full → Hungry?); Hungry? (No → F, Yes → Type?); Type? (French → T, Italian → F, Thai → Fri/Sat?, Burger → T); Fri/Sat? (No → F, Yes → T)]
      Substantially simpler than the “true” tree: a more complex hypothesis isn’t justified by such a small amount of data
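For concreteness, here is the learned tree written in the nested-dict format the dtl() sketch above produces, together with a small classifier. The branch labels follow the figure only as far as it is recoverable, so treat this as an illustrative reconstruction rather than an exact transcription.

```python
learned_tree = {'Patrons?': {
    'None': 'F',
    'Some': 'T',
    'Full': {'Hungry?': {
        'No': 'F',
        'Yes': {'Type?': {
            'French': 'T',
            'Italian': 'F',
            'Burger': 'T',
            'Thai': {'Fri/Sat?': {'No': 'F', 'Yes': 'T'}},
        }},
    }},
}}

def classify(tree, example):
    """Walk the tree until a leaf label ('T' or 'F') is reached."""
    while isinstance(tree, dict):
        attr, branches = next(iter(tree.items()))
        tree = branches[example[attr]]
    return tree

print(classify(learned_tree, {'Patrons?': 'Full', 'Hungry?': 'Yes',
                              'Type?': 'Thai', 'Fri/Sat?': 'Yes'}))  # 'T'
```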

  11. Performance measurement
      How do we know that h ≈ f?
      How about measuring the accuracy of h on the examples that were used to learn h?

  12. Performance measurement
      How do we know that h ≈ f? (Hume’s Problem of Induction)
      1. Use theorems of computational/statistical learning theory
      2. Try h on a new test set of examples
         • use the same distribution over the example space as for the training set
         • divide the examples into two disjoint subsets: a training set and a test set
         • prediction accuracy: accuracy on the (unseen) test set
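A sketch of this test-set methodology using the helpers above; the split fraction and function names are assumptions.

```python
import random

def accuracy(tree, examples):
    """Fraction of examples whose class the tree predicts correctly."""
    correct = sum(classify(tree, e) == e['class'] for e in examples)
    return correct / len(examples)

def train_test_accuracy(examples, attributes, choose_attribute, test_fraction=0.3):
    """Learn h on a random training split and report accuracy on the unseen test split."""
    shuffled = examples[:]
    random.shuffle(shuffled)
    split = int(len(shuffled) * (1 - test_fraction))
    train, test = shuffled[:split], shuffled[split:]
    h = dtl(train, attributes, mode(train), choose_attribute)
    return accuracy(h, test)
```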

  13. Performance measurement
      Learning curve = % correct on the test set as a function of training set size
      [Figure: learning curve — % correct on test set (vertical axis, roughly 0.4 to 1.0) versus training set size (horizontal axis, 0 to 100)]
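And a sketch of how such a learning curve could be generated with the same helpers: train on increasingly large random subsets and average test accuracy over several trials. All names and parameters here are assumptions.

```python
def learning_curve(examples, attributes, choose_attribute, sizes, trials=20):
    """Return (training set size, average test accuracy) points.

    Each size in `sizes` should be smaller than len(examples) so that the
    held-out test split is non-empty.
    """
    points = []
    for m in sizes:
        scores = []
        for _ in range(trials):
            shuffled = examples[:]
            random.shuffle(shuffled)
            train, test = shuffled[:m], shuffled[m:]
            h = dtl(train, attributes, mode(train), choose_attribute)
            scores.append(accuracy(h, test))
        points.append((m, sum(scores) / len(scores)))
    return points
```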
