
Decision Trees

Steven J Zeil

Old Dominion Univ.

Fall 2010


Outline

1. Univariate Trees: Classification, Regression, Pruning

2. Rule Extraction

3. Multivariate Trees

Decision Trees


Tree Structure

Internal nodes: split on one attribute (univariate) or on multiple attributes (multivariate)

Numeric x_i: relational split, x_i > w_m
Discrete x_i: n-way split on all possible values

Leaves

Classification: a class label, or labels with proportions
Regression: the average of the numeric outputs r, or a local fit

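As a concrete sketch of this structure (not from the slides; the field names attribute, threshold, children, and output are illustrative), a univariate tree node might be represented in Python as:

    from dataclasses import dataclass, field
    from typing import Optional

    @dataclass
    class Node:
        """One node of a univariate decision tree."""
        # Internal-node fields: which attribute to test and, for a numeric
        # attribute, the threshold w_m used in the split x_i > w_m.
        attribute: Optional[int] = None
        threshold: Optional[float] = None
        # Maps each branch label (True/False for numeric splits, the
        # attribute value for discrete splits) to a child Node.
        children: dict = field(default_factory=dict)
        # Leaf field: a class label (classification) or an average (regression).
        output: object = None

        def is_leaf(self) -> bool:
            return not self.children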


Classification Trees

For a node m, let N_m be the number of training instances that reach m, of which N_m^i are in class C_i.

P̂(C_i | x, m) ≡ p_m^i = N_m^i / N_m

Node m is pure if each p_m^i is 0 or 1.

The measure of purity is “entropy”. Variants: ID3, CART, C4.5
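A minimal sketch of this estimate in Python, assuming the class labels of the instances reaching node m are given as a plain list (the helper names are mine, not the slides'):

    from collections import Counter

    def class_proportions(labels):
        """p_m^i = N_m^i / N_m for each class C_i seen at node m."""
        n_m = len(labels)
        return {c: n_mi / n_m for c, n_mi in Counter(labels).items()}

    def is_pure(labels):
        """A node is pure when every instance reaching it has the same class."""
        return len(set(labels)) <= 1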


Entropy

Shannon’s entropy measures the expected number of bits required to encode a message.

Alternatively, it is the amount of information we are missing if we don’t know the message.

It represents a limit on what can be achieved with lossless compression.


Why Entropy?

Information content of x ∈ {v_1, ..., v_n}: I(x) = −log p(x)

The log (base 2) reflects the idea of encoding as a bit string.

Entropy: H(x) ≡ E[I(x)] = −Σ_x p(x) log p(x)

Why “entropy” rather than “expected information content”? This formula resembles one from physics, H = −k_B Σ_i p_i ln p_i, and, like the original entropy, information loss is minimized as order increases.
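A sketch of node entropy under these definitions, reusing the class_proportions helper sketched above:

    import math

    def entropy(labels):
        """H = -sum_i p_i log2(p_i) over the classes present at a node.

        Terms with p_i = 0 are skipped, following the convention
        0 log 0 = 0.
        """
        return -sum(p * math.log2(p)
                    for p in class_proportions(labels).values()
                    if p > 0)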


Entropy and Tree Nodes

Entropy is a measure of the impurity of a node. If all instances that reach a node are in the same class, then we lose no information by not asking which instances actually brought us here. By contrast, if the instances reaching this node are evenly distributed among the classes, we learn nothing by coming here that we did not know in the step before.
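Concretely, with the entropy sketch above:

    # A pure node carries no missing information:
    entropy(['A', 'A', 'A', 'A'])   # 0.0
    # An evenly split node is maximally impure (log2 K bits for K classes):
    entropy(['A', 'A', 'B', 'B'])   # 1.0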



Greedy Splitting

If node m is pure, generate a leaf and stop. Otherwise, split and continue recursively.

Impurity after the split: suppose N_mj of the N_m instances take branch j, and that N_mj^i of these belong to class C_i.

P̂(C_i | x, m, j) ≡ p_mj^i = N_mj^i / N_mj

I′_m = −Σ_{j=1..n} (N_mj / N_m) Σ_{i=1..K} p_mj^i log2 p_mj^i

Select the variable and split that minimizes impurity.

For numeric variables, include choices of split positions.
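A sketch of the post-split impurity I′_m, assuming each branch's class labels are passed as a separate list and reusing the entropy sketch above:

    def split_entropy(*branches):
        """I'_m: entropy of each branch j, weighted by N_mj / N_m."""
        n_m = sum(len(b) for b in branches)
        return sum(len(b) / n_m * entropy(b) for b in branches)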


GenerateTree(X)

if I(X) < θI then
    Create leaf labelled by majority class in X
else
    i ← SplitAttribute(X)
    for all branches of xi do
        Find Xi falling in branch
        GenerateTree(Xi)
    end for
end if
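A rough Python rendering of this pseudocode (a sketch, not the slides' implementation). It assumes the Node, Counter, and entropy pieces sketched earlier, discrete attributes only, X given as a list of (feature_tuple, label) pairs, and a split_attribute that, for convenience, also returns the partition of X (see the sketch after the next slide):

    def generate_tree(X, theta_i=0.1):
        """Grow a classification tree greedily (sketch of GenerateTree)."""
        labels = [y for _, y in X]
        if entropy(labels) < theta_i:
            # Node is (almost) pure: leaf labelled by the majority class.
            majority = Counter(labels).most_common(1)[0][0]
            return Node(output=majority)
        i, partition = split_attribute(X)
        node = Node(attribute=i)
        for value, X_i in partition.items():
            node.children[value] = generate_tree(X_i, theta_i)
        return node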


SplitAttribute(X)

MinEnt ← ∞
for all attributes i = 1, . . . , d do
    if xi is discrete with n values then
        Split X into X1, . . . , Xn by xi
        e ← SplitEntropy(X1, . . . , Xn)
        if e < MinEnt then
            MinEnt ← e
            bestf ← i
        end if
    else
        for all possible splits of numeric attribute xi do
            Split X into X1, X2 by xi
            e ← SplitEntropy(X1, X2)
            if e < MinEnt then
                MinEnt ← e
                bestf ← i
            end if
        end for
    end if
end for
return bestf
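A matching sketch of SplitAttribute under the same assumptions (discrete attributes only; the numeric case would add the threshold search shown above), using the split_entropy sketch from earlier:

    def split_attribute(X):
        """Return the attribute index minimizing split entropy, plus the
        partition of X it induces (sketch of SplitAttribute)."""
        min_ent, best_f, best_partition = float('inf'), None, None
        d = len(X[0][0])                       # number of attributes
        for i in range(d):
            partition = {}                     # branch value -> subset of X
            for x, y in X:
                partition.setdefault(x[i], []).append((x, y))
            e = split_entropy(*[[y for _, y in s] for s in partition.values()])
            if e < min_ent:
                min_ent, best_f, best_partition = e, i, partition
        return best_f, best_partition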

Regression Trees

Let b_m(x) = 1 if x ∈ X_m, and 0 otherwise.

Error at node m:

E_m = (1/N_m) Σ_t (r^t − g_m)² b_m(x^t), where g_m = Σ_t b_m(x^t) r^t / Σ_t b_m(x^t)

After splitting:

E′_m = (1/N_m) Σ_j Σ_t (r^t − g_mj)² b_mj(x^t), where g_mj = Σ_t b_mj(x^t) r^t / Σ_t b_mj(x^t)
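A sketch of these error computations in Python, with the indicator b_m replaced by simply passing in the responses r^t of the instances that reach each node or branch:

    def node_mean_and_error(r):
        """g_m and E_m for the responses r reaching node m."""
        g_m = sum(r) / len(r)
        e_m = sum((rt - g_m) ** 2 for rt in r) / len(r)
        return g_m, e_m

    def split_error(*branches):
        """E'_m: within-branch squared error, pooled over all branches."""
        n_m = sum(len(b) for b in branches)
        total = 0.0
        for b in branches:
            g_mj = sum(b) / len(b)
            total += sum((rt - g_mj) ** 2 for rt in b)
        return total / n_m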


Examples

[This slide shows figures only, not reproduced in this transcript; the pruning example below refers to the third of these trees.]

Pruning

Pre-pruning: stop generating nodes when the number of training instances reaching a node is small (e.g., 5% of the training set).

Post-pruning: grow the full tree, then prune subtrees that overfit the training set:

Set aside a pruning set. Replace each subtree by a leaf and evaluate on the pruning set. If the leaf performs as well as the subtree, keep the leaf.

Example: replace the lowest node (x < 6.31) in the third tree by a leaf with value 0.9.
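A hedged sketch of the post-pruning loop, assuming the Node sketch above with every node (not just leaves) storing its majority class in output, and an evaluate(tree, pruning_set) scorer supplied by the caller; neither assumption is from the slides:

    def prune(node, tree, pruning_set, evaluate):
        """Bottom-up reduced-error pruning (sketch).

        `evaluate(tree, pruning_set)` is an assumed scorer returning
        accuracy on the set-aside pruning set."""
        for child in node.children.values():
            prune(child, tree, pruning_set, evaluate)
        if node.is_leaf():
            return
        subtree_score = evaluate(tree, pruning_set)
        saved = node.children
        node.children = {}                  # tentatively replace by a leaf
        if evaluate(tree, pruning_set) < subtree_score:
            node.children = saved           # subtree was better: restore it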


Rule Extraction

Trees can be interpreted as programming instructions:

R1: IF (age > 38.5) AND (years-in-job > 2.5) THEN y = 0.8
R2: IF (age > 38.5) AND (years-in-job <= 2.5) THEN y = 0.6
R3: IF (age <= 38.5) AND (job-type = 'A') THEN y = 0.4
R4: IF (age <= 38.5) AND (job-type = 'B') THEN y = 0.3
R5: IF (age <= 38.5) AND (job-type = 'C') THEN y = 0.2

Called a rule base.
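A sketch of extracting such a rule base by walking a tree, assuming the Node sketch above and discrete splits (a numeric split would emit xi > wm / xi <= wm conditions instead; the condition format is illustrative):

    def extract_rules(node, conditions=()):
        """Yield one 'IF ... THEN ...' string per root-to-leaf path."""
        if node.is_leaf():
            yield 'IF ' + ' AND '.join(conditions) + f' THEN y = {node.output}'
            return
        for value, child in node.children.items():
            cond = f'(x{node.attribute} = {value!r})'
            yield from extract_rules(child, conditions + (cond,))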


Multivariate Trees
