Decision Trees (2-26-16) - PowerPoint PPT Presentation



SLIDE 1

Decision Trees

2-26-16

SLIDE 2

Reading Quiz

Decision trees are an algorithm for which machine learning task? a) clustering b) dimensionality reduction c) classification d) regression

SLIDE 3

Which error metric is most appropriate for evaluating a {0,1} classification task? a) worst-case error b) sum of squares error c) entropy d) precision and recall

Reading Quiz

SLIDE 4

Terminology

Learning a model:

  • input = training examples
    ○ feature = dimension
  • output = model = hypothesis

Using a learned model:

  • input = test example
  • output = class = label = target
SLIDE 5

Decision trees setting

  • Supervised learning
  • Classification
  • Input: can be continuous or discrete
  • Output: must be discrete; can be {0,1} or multi-class

SLIDE 6

What type of model are we building?

Should I play tennis? Should I read a Reddit post? Who plays tennis when it’s raining but not when it’s humid?

SLIDE 7

How do we build such a model?

Modeling questions:

  • How many decision nodes should there be?
  • At each node, what feature should we split on?
  • For each such feature, how should we split it?
    ○ This is trivial if the feature is boolean.

Bad idea: generate all possible trees and test how well they work. Better idea: build the tree incrementally.

SLIDE 8

Building the tree incrementally:

Within a region, pick the best:

  • feature to split on
  • value at which to split it

Sort the training data into the sub-regions. Recursively build decision trees for the sub-regions.

[Figure: scatter plot of training data, axes $/ft² vs. elevation]
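The incremental recipe (pick the best split, sort the data into sub-regions, recurse) can be sketched in Python. The data format, the helper names, and the stopping parameters below are illustrative assumptions, not the lecture's code; the entropy-based split scoring anticipates the later slides.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a label multiset, with the convention 0 * log2(0) = 0."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(examples, features):
    """Try every (feature, value) pair; return the one minimizing the
    weighted entropy of the two sub-regions it creates."""
    best, best_ent = None, float("inf")
    for f in features:
        for v in {x[f] for x, _ in examples}:
            left = [(x, y) for x, y in examples if x[f] == v]
            right = [(x, y) for x, y in examples if x[f] != v]
            if not left or not right:
                continue
            ent = (len(left) * entropy([y for _, y in left]) +
                   len(right) * entropy([y for _, y in right])) / len(examples)
            if ent < best_ent:
                best_ent, best = ent, (f, v)
    return best

def build_tree(examples, features, depth=0, max_depth=3, min_size=2):
    """Recursively split a region; stop at max depth, minimum region size,
    or a pure region, and predict the majority class at leaves."""
    labels = [y for _, y in examples]
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(examples) < min_size or len(set(labels)) == 1:
        return majority
    split = best_split(examples, features)
    if split is None:
        return majority
    f, v = split
    left = [(x, y) for x, y in examples if x[f] == v]
    right = [(x, y) for x, y in examples if x[f] != v]
    return (f, v,
            build_tree(left, features, depth + 1, max_depth, min_size),
            build_tree(right, features, depth + 1, max_depth, min_size))
```

A tree is represented here as either a leaf label or a tuple `(feature, value, matching-subtree, non-matching-subtree)`, mirroring the region/sub-region picture on the slide.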

SLIDE 9

Picking the best split

Try all features. Try all possible splits of that feature. If feature F is ______, there are ________ possible splits to consider.

  • binary … one
  • discrete and ordered … |F| - 1
  • discrete and unordered … 2^(|F|-1) - 1 (two options for where to put each value)
  • continuous … |training set| - 1 (any threshold between the same two adjacent points gives the same split)

|F| denotes the number of possible values a discrete feature can take on.
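As a sanity check on the continuous case, a small sketch (the function name is illustrative) that enumerates candidate thresholds for an ordered or continuous feature, one between each pair of adjacent distinct values:

```python
def candidate_thresholds(values):
    """Candidate split points for an ordered/continuous feature: one
    threshold midway between each pair of adjacent distinct sorted values."""
    vs = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(vs, vs[1:])]

# n distinct training values yield n - 1 candidate splits
print(candidate_thresholds([3.0, 1.0, 2.0, 4.0]))   # → [1.5, 2.5, 3.5]
```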

SLIDE 10

Discussion question: Can we do better?

Try all features. Try all possible splits of that feature. If feature F is ______, there are ________ possible splits to consider.

  • binary … one
  • discrete and ordered … |F| - 1
  • discrete and unordered … 2^(|F|-1) - 1 (two options for where to put each value)
  • continuous … |training set| - 1 (any threshold between the same two adjacent points gives the same split)

Ordered or continuous cases: binary search. Unordered case: local search.

SLIDE 11

How do we pick the best split?

Key idea: minimize entropy (the negative log likelihood of the data):

−∑e∈E ∑Y∈T [ val(e,Y) · log p_val(e,Y) + (1 − val(e,Y)) · log(1 − p_val(e,Y)) ]

  • e: training example
  • Y: target feature
  • val(e,Y): true value of Y on example e
  • p_val(e,Y): predicted value

SLIDE 12

Entropy: alternative explanation

  • S is a collection of positive and negative examples
  • Pos: proportion of positive examples in S
  • Neg: proportion of negative examples in S

Entropy(S) = -Pos · log2(Pos) - Neg · log2(Neg)

  • Entropy is 0 when all members of S belong to the same class, for example when Pos = 1 and Neg = 0
  • Entropy is 1 when S contains an equal number of positive and negative examples, i.e., when Pos = 1/2 and Neg = 1/2
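The two-class formula translates directly into code; the only subtlety is the standard convention that 0 · log2(0) = 0, which the slide's "Entropy is 0 for a pure set" statement relies on:

```python
import math

def entropy2(pos, neg):
    """Entropy of a set with `pos` positive and `neg` negative examples,
    using the convention 0 * log2(0) = 0."""
    total = pos + neg
    ent = 0.0
    for count in (pos, neg):
        if count:                      # skip empty classes: 0*log2(0) = 0
            p = count / total
            ent -= p * math.log2(p)
    return ent

print(entropy2(1, 0))   # → 0.0  (pure set)
print(entropy2(5, 5))   # → 1.0  (balanced set)
```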

SLIDE 13

When do we stop splitting?

Bad idea: when every training point is classified correctly. Why is this a bad idea? Better idea: maximum depth, or minimum number of points in a region.

SLIDE 14

Party  HI CS BR FF EA RS TB AC MX IM SC ES SS CR DF SA
R      n  y  n  y  y  y  n  n  n  y  ?  y  y  y  n  y
R      n  y  n  y  y  y  n  n  n  n  n  y  y  y  n  ?
D      ?  y  y  ?  y  y  n  n  n  n  y  n  y  y  n  n
D      n  y  y  n  ?  y  n  n  n  n  y  n  y  n  n  y
D      y  y  y  n  y  y  n  n  n  n  y  ?  y  y  y  y
D      n  y  y  n  y  y  n  n  n  n  n  n  y  y  y  y
D      n  y  n  y  y  y  n  n  n  n  n  n  ?  y  y  y
R      n  y  n  y  y  y  n  n  n  n  n  n  y  y  ?  y
R      n  y  n  y  y  y  n  n  n  n  n  y  y  y  n  y
D      y  y  y  n  n  n  y  y  y  n  n  n  n  n  ?  ?
R      n  y  n  y  y  n  n  n  n  n  ?  ?  y  y  n  n
R      n  y  n  y  y  y  n  n  n  n  y  ?  y  y  ?  ?
D      n  y  y  n  n  n  y  y  y  n  n  n  y  n  ?  ?
D      y  y  y  n  n  y  y  y  ?  y  y  ?  n  n  y  ?
R      n  y  n  y  y  y  n  n  n  n  n  y  ?  ?  n  ?
R      n  y  n  y  y  y  n  n  n  y  n  y  y  ?  n  ?
D      y  n  y  n  n  y  n  y  ?  y  y  y  ?  n  n  y
D      y  ?  y  n  n  n  y  y  y  n  n  n  y  n  y  y
R      n  y  n  y  y  y  n  n  n  n  n  ?  y  y  n  n
D      y  y  y  n  n  n  y  y  y  n  y  n  n  n  y  y

Exercise: build a decision tree.
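As a starting point for the exercise, a sketch that ranks the vote columns by information gain over the twenty rows above (party is the target; treating "?" as a third feature value is an assumption, as is everything else in the code):

```python
import math
from collections import Counter

ROWS = """\
R n y n y y y n n n y ? y y y n y
R n y n y y y n n n n n y y y n ?
D ? y y ? y y n n n n y n y y n n
D n y y n ? y n n n n y n y n n y
D y y y n y y n n n n y ? y y y y
D n y y n y y n n n n n n y y y y
D n y n y y y n n n n n n ? y y y
R n y n y y y n n n n n n y y ? y
R n y n y y y n n n n n y y y n y
D y y y n n n y y y n n n n n ? ?
R n y n y y n n n n n ? ? y y n n
R n y n y y y n n n n y ? y y ? ?
D n y y n n n y y y n n n y n ? ?
D y y y n n y y y ? y y ? n n y ?
R n y n y y y n n n n n y ? ? n ?
R n y n y y y n n n y n y y ? n ?
D y n y n n y n y ? y y y ? n n y
D y ? y n n n y y y n n n y n y y
R n y n y y y n n n n n ? y y n n
D y y y n n n y y y n y n n n y y""".splitlines()

FEATURES = "HI CS BR FF EA RS TB AC MX IM SC ES SS CR DF SA".split()

def entropy(labels):
    """Entropy of a label multiset, with 0 * log2(0) treated as 0."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

parties = [r.split()[0] for r in ROWS]   # target: party of each legislator
votes = [r.split()[1:] for r in ROWS]    # 16 vote features per legislator

def gain(col):
    """Information gain of splitting on vote column `col`."""
    base = entropy(parties)
    remainder = 0.0
    for v in set(row[col] for row in votes):
        subset = [p for p, row in zip(parties, votes) if row[col] == v]
        remainder += len(subset) / len(parties) * entropy(subset)
    return base - remainder

# Rank the columns: the top one is the natural root split for the tree.
for col in sorted(range(len(FEATURES)), key=gain, reverse=True)[:3]:
    print(FEATURES[col], round(gain(col), 3))
```

Repeating the same ranking inside each sub-region (as on the "building the tree incrementally" slide) completes the exercise.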