  1. Decision Trees 2-26-16

  2. Reading Quiz
     Decision trees are an algorithm for which machine learning task?
     a) clustering  b) dimensionality reduction  c) classification  d) regression

  3. Reading Quiz
     Which error metric is most appropriate for evaluating a {0,1} classification task?
     a) worst-case error  b) sum of squares error  c) entropy  d) precision and recall

  4. Terminology
     Learning a model:
     ● input = training examples
       ○ feature = dimension
     ● output = model = hypothesis
     Using a learned model:
     ● input = test example
     ● output = class = label = target

  5. Decision trees setting
     ● Supervised learning
     ● Classification
     ● Input: can be continuous or discrete
     ● Output: must be discrete; can be {0,1} or multi-class

  6. What type of model are we building?
     ● Should I play tennis?
     ● Should I read a Reddit post?
     ● Who plays tennis when it’s raining but not when it’s humid?

  7. How do we build such a model?
     Modeling questions:
     ● How many decision nodes should there be?
     ● At each node, what feature should we split on?
     ● For each such feature, how should we split it?
       ○ This is trivial if the feature is boolean.
     Bad idea: generate all possible trees and test how well they work.
     Better idea: build the tree incrementally.

  8. Building the tree incrementally
     Within a region, pick the best:
     ● feature to split on
     ● value at which to split it
     Sort the training data into the sub-regions.
     Recursively build decision trees for the sub-regions.
     [Figure: training points plotted by elevation vs. $ / ft²]
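The recursive procedure on this slide can be sketched in Python. This is a minimal illustration, not the lecture's implementation: the helper `best_split` here scores candidate splits by misclassification count, a simple stand-in for the entropy criterion introduced on a later slide, and all names are my own.

```python
def majority(labels):
    """Most common label in a region."""
    return max(set(labels), key=labels.count)

def best_split(points, labels):
    """Try every feature and every midpoint between consecutive values;
    score each candidate by the number of misclassified points when each
    side predicts its majority class (a stand-in for entropy)."""
    best = (None, None, float("inf"))
    for f in range(len(points[0])):
        values = sorted(set(p[f] for p in points))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [labels[i] for i, p in enumerate(points) if p[f] <= t]
            right = [labels[i] for i, p in enumerate(points) if p[f] > t]
            errs = (len(left) - left.count(majority(left))
                    + len(right) - right.count(majority(right)))
            if errs < best[2]:
                best = (f, t, errs)
    return best[0], best[1]

def build_tree(points, labels, depth=0, max_depth=3):
    """Recursively split regions; leaves predict the majority class."""
    if depth >= max_depth or len(set(labels)) <= 1:
        return {"class": majority(labels)}
    f, t = best_split(points, labels)
    if f is None:  # no distinct feature values left to split on
        return {"class": majority(labels)}
    left = [i for i, p in enumerate(points) if p[f] <= t]
    right = [i for i, p in enumerate(points) if p[f] > t]
    return {"feature": f, "threshold": t,
            "left": build_tree([points[i] for i in left],
                               [labels[i] for i in left], depth + 1, max_depth),
            "right": build_tree([points[i] for i in right],
                                [labels[i] for i in right], depth + 1, max_depth)}

def predict(tree, point):
    """Walk the tree from the root to a leaf."""
    while "class" not in tree:
        branch = "left" if point[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["class"]
```

On well-separated 1-D data such as `points = [(1,), (2,), (3,), (10,), (11,), (12,)]` with labels `[0, 0, 0, 1, 1, 1]`, the root split lands between the two clusters and the tree classifies both sides correctly.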

  9. Picking the best split
     Try all features. Try all possible splits of that feature.
     If feature F is ______, there are ________ possible splits to consider.
     ● binary ... one
     ● discrete and ordered ... |F| - 1
     ● discrete and unordered ... 2^(|F|-1) - 1 (two options for where to put each value)
     ● continuous ... |training set| - 1 (any threshold between the same two adjacent points yields the same partition)
     |F| denotes the number of possible values a discrete feature can take on.

  10. Discussion question: Can we do better?
      Try all features. Try all possible splits of that feature.
      If feature F is ______, there are ________ possible splits to consider.
      ● binary ... one
      ● discrete and ordered ... |F| - 1
      ● discrete and unordered ... 2^(|F|-1) - 1 (two options for where to put each value)
      ● continuous ... |training set| - 1 (any threshold between the same two adjacent points yields the same partition)
      Ordered or continuous cases: binary search. Unordered case: local search.
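The counts in the table above can be sanity-checked with a small helper. This is an illustrative sketch; the function `n_splits` and its argument names are my own, not part of the lecture.

```python
def n_splits(kind, n_values=None, n_train=None):
    """Number of candidate splits for one feature, by feature type.
    n_values = |F| (possible values of a discrete feature);
    n_train = number of training examples (for a continuous feature)."""
    if kind == "binary":
        return 1
    if kind == "discrete_ordered":
        return n_values - 1            # one threshold between each adjacent pair
    if kind == "discrete_unordered":
        return 2 ** (n_values - 1) - 1  # nontrivial two-way partitions of the values
    if kind == "continuous":
        return n_train - 1             # one distinct split per gap between sorted points
    raise ValueError(f"unknown feature kind: {kind}")
```

For example, a 4-value unordered feature admits 2^3 - 1 = 7 distinct two-way partitions, which is why the unordered case is the expensive one.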

  11. How do we pick the best split?
      Key idea: minimize entropy.
      -∑_{e ∈ E} ∑_{Y ∈ T} [val(e,Y) * log pval(e,Y) + (1 - val(e,Y)) * log(1 - pval(e,Y))]
      ● e: training example
      ● Y: target feature
      ● val(e,Y): true value
      ● pval(e,Y): predicted value

  12. Entropy: alternative explanation
      ● S is a collection of positive and negative examples
      ● Pos: proportion of positive examples in S
      ● Neg: proportion of negative examples in S
      Entropy(S) = -Pos * log2(Pos) - Neg * log2(Neg)
      ● Entropy is 0 when all members of S belong to the same class, e.g. Pos = 1 and Neg = 0 (taking 0 * log2(0) = 0 by convention)
      ● Entropy is 1 when S contains an equal number of positive and negative examples, i.e. Pos = Neg = 1/2
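The definition above translates directly into code. A minimal sketch (the function name is mine); note the explicit guard for the 0 * log2(0) = 0 convention:

```python
import math

def entropy(labels):
    """Entropy, in bits, of a collection of {0,1} labels."""
    if not labels:
        return 0.0
    pos = sum(labels) / len(labels)   # proportion of positive examples
    neg = 1 - pos                     # proportion of negative examples
    h = 0.0
    for p in (pos, neg):
        if p > 0:                     # 0 * log2(0) is taken to be 0
            h -= p * math.log2(p)
    return h
```

As the slide states, `entropy([1, 1, 1, 1])` is 0 (pure region) and `entropy([0, 1])` is 1 (evenly mixed region).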

  13. When do we stop splitting?
      Bad idea: when every training point is classified correctly.
      Why is this a bad idea? (The tree memorizes the training set, noise included, and overfits.)
      Better idea: stop at a maximum depth, or at a minimum number of points in a region.
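The stopping rules above can be combined into a single predicate that a tree builder would check before splitting a region. This is an illustrative sketch; the name and the default thresholds are assumptions, not values from the lecture.

```python
def should_stop(labels, depth, max_depth=5, min_points=4):
    """Stop splitting a region when it is pure, the tree has reached its
    maximum depth, or too few training points remain in the region.
    (Default thresholds are arbitrary illustrative choices.)"""
    is_pure = len(set(labels)) <= 1
    return is_pure or depth >= max_depth or len(labels) < min_points
```

Capping depth and region size trades a little training accuracy for a model that generalizes better than one grown until every point is classified correctly.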

  14. Exercise: build a decision tree.
      (Each row is one training example: a class label, R or D, followed by 16 y/n/? features.)

      Class | HI CS BR FF EA RS TB AC MX IM SC ES SS CR DF SA
      ------+------------------------------------------------
        R   |  n  y  n  y  y  y  n  n  n  y  ?  y  y  y  n  y
        R   |  n  y  n  y  y  y  n  n  n  n  n  y  y  y  n  ?
        D   |  ?  y  y  ?  y  y  n  n  n  n  y  n  y  y  n  n
        D   |  n  y  y  n  ?  y  n  n  n  n  y  n  y  n  n  y
        D   |  y  y  y  n  y  y  n  n  n  n  y  ?  y  y  y  y
        D   |  n  y  y  n  y  y  n  n  n  n  n  n  y  y  y  y
        D   |  n  y  n  y  y  y  n  n  n  n  n  n  ?  y  y  y
        R   |  n  y  n  y  y  y  n  n  n  n  n  n  y  y  ?  y
        R   |  n  y  n  y  y  y  n  n  n  n  n  y  y  y  n  y
        D   |  y  y  y  n  n  n  y  y  y  n  n  n  n  n  ?  ?
        R   |  n  y  n  y  y  n  n  n  n  n  ?  ?  y  y  n  n
        R   |  n  y  n  y  y  y  n  n  n  n  y  ?  y  y  ?  ?
        D   |  n  y  y  n  n  n  y  y  y  n  n  n  y  n  ?  ?
        D   |  y  y  y  n  n  y  y  y  ?  y  y  ?  n  n  y  ?
        R   |  n  y  n  y  y  y  n  n  n  n  n  y  ?  ?  n  ?
        R   |  n  y  n  y  y  y  n  n  n  y  n  y  y  ?  n  ?
        D   |  y  n  y  n  n  y  n  y  ?  y  y  y  ?  n  n  y
        D   |  y  ?  y  n  n  n  y  y  y  n  n  n  y  n  y  y
        R   |  n  y  n  y  y  y  n  n  n  n  n  ?  y  y  n  n
        D   |  y  y  y  n  n  n  y  y  y  n  y  n  n  n  y  y
