Decision Tree Grafting

Emil Brissman & Kajsa Eriksson
2011-12-07

Agenda

 Background
 Techniques
 Example
 Applications
 Summary



Background: The problem

 Decision trees:
  ○ Need to have low prediction error
  ○ Should not be overfitted to the training data
  ○ Should generally have low complexity, for interpretation purposes


Background: Existing solutions

 Pruning:
  ○ A post-process that reduces overfitting and decision tree complexity
  ○ Risk of the tree becoming underfitted
 Boosting:
  ○ Reduces prediction error by training a series of separate classifiers and then combining them
  ○ The complexity of the resulting model increases drastically (both trade-offs are sketched below)
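As a minimal sketch (not from the slides; scikit-learn stands in for C4.5, and the data set and parameter values are arbitrary assumptions), the two existing solutions look as follows:

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Pruning: ccp_alpha > 0 trades training accuracy for a simpler tree;
# too large a value risks underfitting.
pruned = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0).fit(X_tr, y_tr)

# Boosting: many shallow trees combined; prediction error drops, but the
# combined model is far more complex than any single tree.
boosted = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),
    n_estimators=50,
    random_state=0,
).fit(X_tr, y_tr)

print("pruned tree accuracy:   ", pruned.score(X_te, y_te))
print("boosted stumps accuracy:", boosted.score(X_te, y_te))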


Background: Grafting

 The aim is to show that a more complex tree can have a lower prediction error without being overfitted to the training data
 The idea is to reclassify regions of the instance space that contain no training data, or only misclassified data
 Reclassification gives a higher probability of correctly classifying data that falls into empty regions


Techniques (1/2)

 There are four grafting algorithms, each built upon the previous one
 They are designed as a post-process for the C4.5 classification technique
 C4.5X is the first algorithm, developed just to test the theory of grafting
 C4.5+ is a formal grafting algorithm, developed because of the success of C4.5X


Techniques (2/2)

 C4.5++ is a further development that is proven not to produce overfitting in the tree; in other words, it balances the bias and variance of the tree
 C4.5A is the fourth and final algorithm, a performance update over the previous ones: by considering a smaller set of data, the computational time is reduced


Example (1/4)

 Classification of the instance space after the C4.5 algorithm (reconstructed in the sketch below):

[Figure: C4.5 decision tree over attributes A and B, with branches A <= 2 / A > 2, A <= 7 / A > 7, B <= 5 / B > 5, and leaves labelled ◊ and *]
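A minimal sketch of this tree as plain data (the nesting and all leaf labels except the blue * leaf are assumptions; only the cut values A <= 2, A <= 7, B <= 5 and the classes ◊ and * appear on the slide):

# Internal nodes are dicts; leaves are plain class labels.
tree = {
    "attr": "A", "threshold": 2,
    "le": "◊",                      # A <= 2 -> ◊
    "gt": {
        "attr": "A", "threshold": 7,
        "gt": "◊",                  # A > 7 -> ◊ (assumed)
        "le": {
            "attr": "B", "threshold": 5,
            "le": "◊",              # 2 < A <= 7, B <= 5 -> ◊ (assumed)
            "gt": "*",              # 2 < A <= 7, B > 5  -> * (the blue leaf)
        },
    },
}

def classify(node, x):
    # Walk the tree until a leaf (a plain class label) is reached.
    while isinstance(node, dict):
        node = node["le"] if x[node["attr"]] <= node["threshold"] else node["gt"]
    return node

print(classify(tree, {"A": 5, "B": 8}))  # '*': falls into the blue leaf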


Example (2/4)

 The blue region is a leaf in the induced decision tree that C4.5 classified as *
 But what is really the most likely class for the area marked with "?"
 By applying grafting as a post-process, a new prediction for the area can be made


Example (3/4)

 Step 1: For each leaf, the algorithm visits all ancestor nodes and tries to find possible cuts that split the leaf region
 Step 2: It chooses the cuts that have the highest Laplacian accuracy estimate (a worked example follows below)
 Laplace: (P + 1) / (T + 2)
  ○ T – number of instances below a certain ancestor
  ○ P – number of instances of the majority class below the same ancestor
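A worked example of the estimate (the counts are made up, not from the slides):

def laplace_accuracy(p, t):
    # (P + 1) / (T + 2): P majority-class instances out of T below an ancestor.
    return (p + 1) / (t + 2)

# 8 of 10 instances below an ancestor belong to the majority class:
print(laplace_accuracy(8, 10))  # 0.75
# An empty region backs off towards an even 50/50 prior:
print(laplace_accuracy(0, 0))   # 0.5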


Example (4/4)

 Step 3: The best-supported cuts are introduced into the decision tree as new branches and leaves with a more likely class (a rough sketch of the three steps follows below)
 Result: 3 new leaves
  ○ a – ◊
  ○ b – *
  ○ c – ◊
 The region with the ? now belongs to class ◊
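A rough sketch of the three steps for a single leaf (invented for illustration; this is not Webb's actual C4.5+ implementation, and the candidate cuts are simplified to (attribute, threshold) pairs):

from collections import Counter

def best_graft(ancestor_cuts, instances):
    # Steps 1-3 for one leaf: score each candidate ancestor cut by the
    # Laplace accuracy of the majority class among the instances it
    # captures, and keep the best-supported cut.
    scored = []
    for attr, threshold in ancestor_cuts:                  # step 1
        captured = [x for x in instances if x[attr] <= threshold]
        counts = Counter(x["class"] for x in captured)
        if not counts:
            continue
        majority, p = counts.most_common(1)[0]
        score = (p + 1) / (len(captured) + 2)              # step 2: Laplace
        scored.append((score, attr, threshold, majority))
    return max(scored, default=None)                       # step 3: graft this cut

cuts = [("A", 2), ("B", 5)]
data = [{"A": 1, "B": 4, "class": "◊"},
        {"A": 1, "B": 6, "class": "◊"},
        {"A": 6, "B": 3, "class": "*"}]
print(best_graft(cuts, data))  # (0.75, 'A', 2, '◊')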


Applications

 Grafting as a post-process to C4.5 is implemented in Weka as J48graft (see the sketch below)
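A minimal sketch of driving J48graft from Python via the python-weka-wrapper3 package (an assumption; the slides only name Weka itself). It assumes a Weka version that still ships J48graft, such as the 3.6 line that was current in 2011, and a dataset file data.arff:

import weka.core.jvm as jvm
from weka.core.converters import Loader
from weka.classifiers import Classifier

jvm.start()
data = Loader(classname="weka.core.converters.ArffLoader").load_file("data.arff")
data.class_is_last()              # the last attribute is the class

grafted = Classifier(classname="weka.classifiers.trees.J48graft")
grafted.build_classifier(data)
print(grafted)                    # prints the grafted tree
jvm.stop()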


Summary

 Grafting is a post-process that successfully reduces the prediction error of a decision tree by re-evaluating areas of the instance space where no training data exists
 It has been proved that the increased complexity of a grafted tree does not mean that the tree is more overfitted
 Grafting together with pruning most often gives even better results, probably because the two algorithms complement each other


Bibliography

Tan, P.-N., Steinbach, M. & Kumar, V. (2006). Introduction to Data Mining. Boston: Pearson Addison-Wesley.

Quinlan, J. R. (1993). C4.5: Programs for Machine Learning. Los Altos: Morgan Kaufmann.

University of Waikato. Weka 3: Data Mining Software in Java. http://www.cs.waikato.ac.nz/ml/weka/index.html [2011-12-01]

Webb, G.I. (1996). Further Experimental Evidence against the Utility of Occam's Razor. Journal of Artificial Intelligence Research, vol. 4, pp. 397-417.

Webb, G.I. (1997). Decision Tree Grafting. In Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence (IJCAI '97), vol. 2, pp. 846-851.

Webb, G.I. (1999). Decision Tree Grafting from the All-Test-But-One Partition. In Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence (IJCAI '99), vol. 2, pp. 702-707.
