Keep the Decision Tree and Estimate the Class Probabilities Using its Decision Boundary
Isabelle Alvarez (1,2), Stephan Bernard (2), Guillaume Deffuant (2)
(1) LIP6, Paris VI University, 4 place Jussieu, 75005 Paris, France. isabelle.alvarez@lip6.fr
(2) Cemagref, LISC, F-63172 Aubiere, France. stephan.bernard@cemagref.fr, guillaume.deffuant@cemagref.fr

Abstract
This paper proposes a new method to estimate the class membership probability of the cases classified by a decision tree. When the data are numerical, the method provides smooth class probability estimates without any modification of the tree. It applies a posteriori and does not require additional training cases. It relies on the distance to the decision boundary induced by the decision tree: this distance is computed on the training sample and then used as the input of a very simple one-dimensional kernel-based density estimator, which provides the estimate of the class membership probability. This geometric method gives good results even with pruned trees, so the intelligibility of the tree is fully preserved.
1 Introduction
Decision Tree (DT) algorithms are very popular and widely used for classification purposes since, contrary to other learning methods, they provide an intelligible model of the data relatively easily. Intelligibility is a very desirable property in artificial intelligence, considering the interactions with the end-user, all the more when the end-user is an expert. On the other hand, the end-user of a classification system needs additional information beyond the output class in order to assess the result. This information generally consists of the confusion matrix, the accuracy, and specific error rates (such as specificity, sensitivity, and likelihood ratios, possibly including costs, which are commonly used in diagnosis applications). In the context of a decision aid system, the most valuable information is the class membership probability. Unfortunately, a DT can only provide piecewise constant estimates of the class posterior probabilities, since all the cases classified by a leaf share the same posterior probabilities. Moreover, as a consequence of the tree's main objective, which is to separate the different classes, the raw estimate at the leaf is highly biased.
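As a minimal numerical sketch of this bias, consider the following Python fragment. The counts are invented for illustration, and the Laplace correction shown is one of the standard smoothing methods reviewed in Section 2, not the method proposed in this paper:

```python
# Raw leaf frequency versus a Laplace-smoothed estimate (two-class problem).
n_pos, n_neg = 3, 0                          # class counts at a small pure leaf

raw = n_pos / (n_pos + n_neg)                # 1.0: almost certainly overconfident
laplace = (n_pos + 1) / (n_pos + n_neg + 2)  # (3+1)/(3+2) = 0.8

print(raw, laplace)
```

The raw estimate claims certainty from only three cases, precisely because the splitting criterion has driven the leaf towards purity; smoothing pulls the estimate back towards the prior, but the resulting estimate is still constant over the whole leaf.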
On the contrary, methods that are well suited to probability estimation generally produce less intelligible models. A lot of work aims at improving the class probability estimate at the leaf: smoothing methods, specialized trees, combined methods (decision trees combined with other algorithms), fuzzy methods, and ensemble methods (see Section 2). Most of these methods (except smoothing) induce a drastic change in the fundamental properties of the tree: either the structure of the tree as a model is modified, or its main objective, or its intelligibility. The method we propose here aims at improving the class probability estimate without modifying the tree itself, in order to preserve its intelligibility and other uses. Besides the attributes of the cases, we consider a new feature: the distance from the decision boundary induced by the DT (the boundary of the inverse image of the different class labels). We propose to use this new feature (which can be seen as the margin of the DT) to estimate the posterior probabilities, as we expect the class membership probability to be closely related to the distance from the decision boundary. This is the case for other geometric methods, such as Support Vector Machines (SVM). A SVM defines a unique hyperplane in the feature space to classify the data (in the original input space the corresponding decision boundary can be very complex). The distance from this hyperplane can be used to estimate the posterior probabilities; see [Platt, 2000] for the details in the two-class problem. In the case of a DT, the decision boundary consists of several pieces of hyperplanes instead of a unique hyperplane. We propose to compute the distance to this decision boundary for the training cases. Adapting an idea from [Smyth et al., 1995], we then train a kernel-based density estimator (KDE), not on the attributes of the cases but on this single new feature.
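To make the pipeline concrete, here is a minimal sketch in Python. It is an illustration under loud assumptions, not the paper's implementation: the tree is a scikit-learn DecisionTreeClassifier; the hypothetical helper path_margin approximates the distance to the decision boundary by the smallest threshold gap along a case's decision path, whereas the paper computes the exact geometric distance to the hyperplane pieces; and the class membership probability is obtained by applying Bayes' rule to two one-dimensional KDEs, fitted on the margins of the correctly and incorrectly classified training cases respectively.

```python
import numpy as np
from scipy.stats import gaussian_kde  # simple 1-D kernel density estimator

def path_margin(clf, x):
    """Crude proxy for the distance from x to the tree's decision boundary:
    the smallest |x[feature] - threshold| over the splits on x's path."""
    t = clf.tree_
    node, margin = 0, np.inf
    while t.feature[node] >= 0:  # internal nodes have feature >= 0
        f, thr = t.feature[node], t.threshold[node]
        margin = min(margin, abs(x[f] - thr))
        node = t.children_left[node] if x[f] <= thr else t.children_right[node]
    return margin

def fit_margin_kdes(clf, X, y):
    """Fit one 1-D KDE on the margins of the correctly classified training
    cases and one on the margins of the misclassified ones. Assumes the
    (pruned) tree misclassifies at least a few training cases."""
    margins = np.array([path_margin(clf, x) for x in X])
    correct = clf.predict(X) == y
    return (gaussian_kde(margins[correct]),
            gaussian_kde(margins[~correct]),
            correct.mean())

def prob_of_predicted_class(clf, x, kde_ok, kde_ko, prior_ok):
    """Bayes' rule on the single margin feature: estimated probability
    that the class predicted by the tree is the true class of x."""
    d = path_margin(clf, x)
    num = kde_ok(d)[0] * prior_ok
    return num / (num + kde_ko(d)[0] * (1.0 - prior_ok))
```

Note that the second KDE is only well defined if some training cases are misclassified, which a pruned tree usually guarantees; Section 3 describes the exact distance computation that the proxy above stands in for.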
The paper is organized as follows: Section 2 discusses related work on probability estimates for DTs. Section 3 presents in detail the distance-based estimate of the posterior probabilities. Section 4 reports the experiments performed on the numerical databases of the UCI repository and the comparison between the distance-based method and smoothing methods. Section 5 discusses the use of geometrically defined subsets of the training set in order to enhance the probability estimate.