

  1. Decision Trees ID3: A Python implementation. Daniel Pettersson, Otto Nordander, and Pierre Nugues (supervisor), Department of Computer Science, Lund University. EDAN70, 2017.

  2. Outline: (1) Introduction: decision trees, scikit-learn; (2) ID3: features of ID3; (3) Scikit-learn: current state, integration and API, scikit-learn-contrib; (4) ID3 and our extensions: extensions; (5) Current state of our work: demo and usage.

  3. Introduction: Decision trees. Decision trees are easy to explain and relate more closely to human decision-making than other machine learning approaches. Trees can be displayed in an easy-to-understand manner and give a basic understanding of the data. Their predictions are often less accurate, but they are very fast.
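To make the "easy to display" point concrete, here is a small sketch that prints a tree as indented if-then lines. The nested-dict tree format, the `render` function, and the hand-written example tree are all hypothetical, chosen only for illustration; they are not the decision-tree-id3 package's internal representation or its export functions.

```python
# Hypothetical tree format (an assumption for this sketch):
#   - a leaf is a class label (str)
#   - an internal node is {"feature": name, "children": {value: subtree}}


def render(tree, indent=""):
    """Return the tree as indented lines, one per branch test or leaf."""
    if not isinstance(tree, dict):
        return [f"{indent}-> {tree}"]  # leaf: the predicted class
    lines = []
    for value, subtree in tree["children"].items():
        lines.append(f"{indent}{tree['feature']} = {value}")
        lines.extend(render(subtree, indent + "    "))
    return lines


# A hand-written example tree, not learned from data
weather_tree = {
    "feature": "outlook",
    "children": {
        "sunny": {"feature": "humidity",
                  "children": {"high": "no", "normal": "yes"}},
        "overcast": "yes",
        "rainy": {"feature": "windy",
                  "children": {"true": "no", "false": "yes"}},
    },
}

print("\n".join(render(weather_tree)))
```

Each root-to-leaf path reads as one rule, for example "outlook = sunny and humidity = high gives no".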

  4. Introduction: Scikit-learn. Scikit-learn is a very popular machine learning toolbox. It is scriptable and easy to integrate through its fit/predict interface. It is written with NumPy and SciPy. It has no support for decision trees with nominal values.

  5. ID3: Features of ID3. ID3 handles categorical values, selects splits using entropy/information gain, and lets a node have several children (one per value of the split feature).
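As a rough sketch of the entropy/information-gain criterion on categorical values, the pure-Python functions below compute both for a multiway split, with one partition (one child) per distinct feature value. The dict-based row format and the function names are assumptions for illustration, not the decision-tree-id3 API.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy reduction from splitting on a categorical feature.

    rows: list of dicts mapping feature name -> categorical value.
    One partition (i.e. one child node) is created per distinct value.
    """
    n = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[feature], []).append(label)
    remainder = sum(len(part) / n * entropy(part) for part in partitions.values())
    return entropy(labels) - remainder


rows = [
    {"outlook": "sunny", "windy": "false"},
    {"outlook": "sunny", "windy": "true"},
    {"outlook": "overcast", "windy": "false"},
    {"outlook": "rainy", "windy": "true"},
]
labels = ["no", "no", "yes", "yes"]

print(information_gain(rows, labels, "outlook"))  # 1.0: every child is pure
print(information_gain(rows, labels, "windy"))    # 0.0: children as mixed as the parent
```

ID3 would pick `outlook` here, since it has the highest information gain; note that it produces three children, not two.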

  6. Scikit-learn: Current state. Scikit-learn's decision trees use CART, which supports only numerical values (no nominal values) and has no post-pruning.

  7. Scikit-learn: Integration and API. Scikit-learn estimators and our decision tree share the same fit/predict interface, where data and target are the training features and labels and data_test holds the features to predict:

       from sklearn import linear_model, tree
       import id3

       # Scikit-learn linear regression
       regr = linear_model.LinearRegression()
       regr.fit(data, target)
       regr.predict(data_test)

       # Scikit-learn decision tree
       clf = tree.DecisionTreeClassifier()
       clf.fit(data, target)
       clf.predict(data_test)

       # Our decision tree (pip install decision-tree-id3)
       clf = id3.Id3Estimator()
       clf.fit(data, target)
       clf.predict(data_test)
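To show how an ID3 learner can fit the same fit/predict convention, here is a minimal sketch in pure Python. `TinyId3Classifier` is a hypothetical class written for illustration; it is not the decision-tree-id3 package's `Id3Estimator` and has none of its extensions (pruning, gain ratio, numerical values). It follows two scikit-learn conventions: `fit` returns the estimator, and learned state goes in an attribute with a trailing underscore (`tree_`).

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


class TinyId3Classifier:
    """Hypothetical minimal ID3 classifier for nominal features (illustration only)."""

    def fit(self, X, y):
        # X: list of rows of nominal values; y: list of class labels
        n_features = len(X[0])
        self.tree_ = self._build(list(X), list(y), set(range(n_features)))
        return self  # scikit-learn convention: fit returns the estimator itself

    def predict(self, X):
        return [self._predict_row(self.tree_, row) for row in X]

    def _gain(self, X, y, feature):
        # Information gain of a multiway split: one group per distinct value
        groups = {}
        for row, label in zip(X, y):
            groups.setdefault(row[feature], []).append(label)
        remainder = sum(len(g) / len(y) * entropy(g) for g in groups.values())
        return entropy(y) - remainder

    def _build(self, X, y, features):
        majority = Counter(y).most_common(1)[0][0]
        if len(set(y)) == 1 or not features:
            return majority  # leaf: pure node or no features left
        best = max(sorted(features), key=lambda f: self._gain(X, y, f))
        if self._gain(X, y, best) <= 0:
            return majority  # no split reduces entropy
        children = {}
        for value in sorted({row[best] for row in X}):
            idx = [i for i, row in enumerate(X) if row[best] == value]
            children[value] = self._build(
                [X[i] for i in idx], [y[i] for i in idx], features - {best}
            )
        # One child per observed value of the chosen feature (multiway split)
        return {"feature": best, "majority": majority, "children": children}

    def _predict_row(self, node, row):
        while isinstance(node, dict):
            child = node["children"].get(row[node["feature"]])
            if child is None:
                return node["majority"]  # value unseen in training: majority class
            node = child
        return node


X_train = [
    ["sunny", "high"],
    ["sunny", "normal"],
    ["overcast", "high"],
    ["rainy", "high"],
    ["rainy", "normal"],
]
y_train = ["no", "yes", "yes", "no", "yes"]

clf = TinyId3Classifier().fit(X_train, y_train)
print(clf.predict([["overcast", "high"], ["rainy", "high"]]))  # ['yes', 'no']
```

Falling back to the node's majority class for a value never seen during training is one simple choice made for this sketch; other implementations may handle unseen values differently.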

  8. Scikit-learn: Scikit-learn-contrib. Compatible with scikit-learn. Enforces standards on code and documentation. Deployed to PyPI.

  9. ID3 and our extensions: Extensions. Gain ratio, where Ex is the set of examples and a is an attribute:

     $$IV(Ex, a) = -\sum_{v \in \mathrm{values}(a)} \frac{|\{x \in Ex \mid \mathrm{value}(x, a) = v\}|}{|Ex|} \cdot \log_2 \frac{|\{x \in Ex \mid \mathrm{value}(x, a) = v\}|}{|Ex|} \tag{1}$$

     $$IGR(Ex, a) = \frac{IG(Ex, a)}{IV(Ex, a)} \tag{2}$$

     IG is the information gain and IV is the intrinsic value. The gain ratio is used instead of information gain to reduce the bias toward features that have many possible values.
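The following sketch computes the intrinsic value and the gain ratio for one attribute, and shows the bias that the gain ratio is meant to reduce: a unique record ID splits the data perfectly, just like a genuinely informative attribute, so both get the same information gain, but the ID's many values give it a larger IV and therefore a lower gain ratio. The function names and the toy data are illustrative assumptions, and returning 0 when IV = 0 (an attribute with a single value) is a choice made here to avoid division by zero.

```python
from collections import Counter
from math import log2


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """IG(Ex, a): values[i] is value(x_i, a), labels[i] the class of x_i."""
    n = len(labels)
    groups = {}
    for v, label in zip(values, labels):
        groups.setdefault(v, []).append(label)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())


def intrinsic_value(values):
    """IV(Ex, a): entropy of the attribute's own value distribution."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def gain_ratio(values, labels):
    """IGR(Ex, a) = IG / IV; 0 when IV is 0 (assumption for single-valued attributes)."""
    iv = intrinsic_value(values)
    return information_gain(values, labels) / iv if iv > 0 else 0.0


labels = ["no", "no", "yes", "yes"]
record_id = ["d1", "d2", "d3", "d4"]         # one value per example
outlook = ["sunny", "sunny", "rainy", "rainy"]

print(information_gain(record_id, labels), information_gain(outlook, labels))  # 1.0 1.0
print(gain_ratio(record_id, labels), gain_ratio(outlook, labels))              # 0.5 1.0
```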

  10. ID3 and our extensions: Extensions. Pre-pruning: a minimum number of samples required to split a node (min samples split) and a maximum tree depth (max depth). Post-pruning: split the data into a training set and a test set, transform feature nodes into classifying nodes, and keep a transformation (prune) if the new test error is lower.
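Both pruning styles can be sketched on a small dict-based tree. `should_stop` illustrates pre-pruning: it is checked before a split is attempted, with parameters named after the slide's "min samples split" and "max depth". `post_prune` follows the post-pruning steps above, bottom-up: each feature node is tentatively replaced by a leaf predicting its majority training label, and the replacement is kept only if the error on the held-out test rows is lower. Only test rows that reach a node can change when that node is pruned, so comparing errors on those rows is equivalent to comparing the whole test error. The tree format, all function names, and the choice to keep nodes that no test row reaches are assumptions of this sketch, not the package's implementation.

```python
# Hypothetical tree format (an assumption): a leaf is a class label; a feature node is
# {"feature": column index, "majority": majority training label, "children": {value: subtree}}


def should_stop(labels, depth, min_samples_split=2, max_depth=None):
    """Pre-pruning: decide before splitting whether to make a leaf instead."""
    if len(set(labels)) <= 1:
        return True  # already pure
    if len(labels) < min_samples_split:
        return True  # too few samples to split
    return max_depth is not None and depth >= max_depth


def predict_one(node, row):
    while isinstance(node, dict):
        child = node["children"].get(row[node["feature"]])
        if child is None:
            return node["majority"]
        node = child
    return node


def count_errors(node, X, y):
    return sum(predict_one(node, row) != label for row, label in zip(X, y))


def post_prune(node, X_test, y_test):
    """Return a pruned copy of node, judged on the test rows that reach it."""
    if not isinstance(node, dict):
        return node
    children = {}
    for value, child in node["children"].items():
        idx = [i for i, row in enumerate(X_test) if row[node["feature"]] == value]
        children[value] = post_prune(child, [X_test[i] for i in idx], [y_test[i] for i in idx])
    node = {**node, "children": children}  # copy; the input tree is not modified
    if not y_test:
        return node  # no test rows reach this node: keep it (an assumption)
    leaf_errors = sum(label != node["majority"] for label in y_test)
    if leaf_errors < count_errors(node, X_test, y_test):
        return node["majority"]  # transform into a classifying node (prune)
    return node


tree = {
    "feature": 1, "majority": "yes",
    "children": {
        "high": {"feature": 0, "majority": "no",
                 "children": {"overcast": "yes", "rainy": "no", "sunny": "no"}},
        "normal": "yes",
    },
}
# Held-out test rows (assumed to come from an earlier train/test split)
X_test = [["overcast", "high"], ["sunny", "high"], ["rainy", "normal"]]
y_test = ["no", "no", "yes"]

pruned = post_prune(tree, X_test, y_test)
print(pruned)
```

Here the "high" subtree misclassifies one test row while its majority leaf "no" misclassifies none, so it is pruned; the root is kept because replacing it with "yes" would raise the test error.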

  11. ID3 and our extensions: Extensions. Support for numerical values, multiclass classification, and reusing features.
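The slides do not say how numerical values are handled. One common approach, used in C4.5, is a binary split of the form value <= t, with t chosen among the midpoints between consecutive distinct sorted values to maximize information gain. The sketch below takes that approach as an assumption. Since entropy is defined for any number of classes, the same code covers multiclass labels. With thresholds, a numerical feature can also usefully be split again further down a branch at a different t.

```python
from collections import Counter
from math import log2


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, gain) of the best binary split value <= t vs value > t.

    Candidate thresholds are midpoints between consecutive distinct sorted values.
    Returns (None, 0.0) if no split has positive gain.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy(labels)
    best = (None, 0.0)
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold fits between equal values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        gain = base - (len(left) / n * entropy(left) + len(right) / n * entropy(right))
        if gain > best[1]:
            best = ((lo + hi) / 2, gain)
    return best


# Three classes on one numerical feature (toy data)
values = [9, 1, 5, 10, 2, 11, 6, 12]
labels = ["c", "a", "b", "c", "a", "c", "b", "c"]
print(best_threshold(values, labels))  # (7.5, 1.0): separates class c from a and b
```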

  12. Current state of our work: Demo and usage.
       Demo...
       GitHub: https://github.com/svaante/decision-tree-id3/
       PyPI: pip install decision-tree-id3
       Scikit-learn-contrib
