SLIDE 1

Decision Trees ID3

A Python implementation

Daniel Pettersson (1), Otto Nordander (2), Pierre Nugues (3)

(1) Department of Computer Science, Lund University
(2) Department of Computer Science, Lund University
(3) Department of Computer Science, Lund University (supervisor)

EDAN70, 2017

Daniel Pettersson, Otto Nordander, Pierre Nugues (Lund University) Decision Trees ID3 EDAN70, 2017 1 / 12

SLIDE 2

Outline

1. Introduction
     Decision trees
     Scikit-learn

2. ID3
     Features of ID3

3. Scikit-Learn
     Current state
     Integration and API
     Scikit-learn-contrib

4. ID3 and our extensions
     Extensions

5. Current state of our work
     Demo and Usage

SLIDE 3

Introduction

Decision trees

Easy to explain.
Mirror human decision-making more closely than other machine learning approaches.
Can be displayed in an easy-to-understand manner.
Give a basic understanding of the data.
Often less accurate than other methods, but very fast.

SLIDE 4

Introduction

Scikit-learn

Very popular toolbox for machine learning.
Scriptable and easy to integrate (fit, predict).
Built on NumPy and SciPy.
No support for decision trees with nominal values.

SLIDE 5

ID3

Features of ID3

Categorical values
Entropy/Information gain
Nodes can have several children
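As a minimal sketch of the entropy and information gain criterion on categorical values, the snippet below scores two nominal features on a toy dataset. The data, the feature names, and the helper names `entropy` and `information_gain` are hypothetical and chosen for illustration only; this is not the authors' implementation.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting on one categorical feature."""
    partitions = {}
    for row, label in zip(rows, labels):
        # One partition (and hence one child node) per distinct feature value.
        partitions.setdefault(row[feature], []).append(label)
    remainder = sum(
        len(part) / len(labels) * entropy(part) for part in partitions.values()
    )
    return entropy(labels) - remainder


# Hypothetical toy data: each row maps a feature name to a nominal value.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["play", "play", "stay", "stay"]

# ID3 splits on the feature with the highest information gain.
best = max(["outlook", "windy"], key=lambda f: information_gain(rows, labels, f))
```

Here `outlook` separates the classes perfectly while `windy` carries no information, so `best` is `"outlook"`.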

SLIDE 6

Current state

Scikit-learn

Uses CART.
Only numerical, no nominal values.
No post-pruning.

SLIDE 7

Integration and API

Scikit-learn

Scikit-learn linear regression

    from sklearn import linear_model
    regr = linear_model.LinearRegression()
    regr.fit(data, target)
    regr.predict(data_test)

Scikit-learn decision tree

    from sklearn import tree
    clf = tree.DecisionTreeClassifier()
    clf.fit(data, target)
    clf.predict(data_test)

Our decision tree

    import id3
    clf = id3.Id3Estimator()
    clf.fit(data, target)
    clf.predict(data_test)

SLIDE 8

Scikit-learn-contrib

Compatible with Scikit-learn.
Enforcing standards on code and documentation.
Deploy to PyPI.

SLIDE 9

Extensions

ID3 and our extensions

Gain ratio

\[
IV(Ex, a) = -\sum_{v \in values(a)} \frac{|\{x \in Ex \mid value(x, a) = v\}|}{|Ex|} \cdot \log_2\left(\frac{|\{x \in Ex \mid value(x, a) = v\}|}{|Ex|}\right) \tag{1}
\]

\[
IGR(Ex, a) = \frac{IG}{IV} \tag{2}
\]

Where IG is Information Gain and IV is Intrinsic Value. Gain ratio is used in place of information gain to reduce bias towards features that have many possible values.
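The effect of equations (1) and (2) can be sketched in a few lines. The intrinsic value is simply the entropy of the feature's own value distribution, so a feature with many distinct values is penalised. The function names, the toy data, and the choice to return 0 when IV is 0 are assumptions made for this illustration, not the authors' code.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def gain_ratio(feature_values, labels):
    """IGR = IG / IV for one categorical feature, following equations (1)-(2)."""
    total = len(labels)
    by_value = {}
    for value, label in zip(feature_values, labels):
        by_value.setdefault(value, []).append(label)
    ig = entropy(labels) - sum(
        len(part) / total * entropy(part) for part in by_value.values()
    )
    # Intrinsic value: entropy of the feature's value distribution (equation 1).
    iv = entropy(feature_values)
    # Assumption: a feature with a single value (IV = 0) gets a ratio of 0.
    return ig / iv if iv > 0 else 0.0


labels = ["yes", "yes", "no", "no"]
row_id = ["a", "b", "c", "d"]   # hypothetical feature: unique value per sample
binary = ["x", "x", "y", "y"]   # hypothetical feature: two values

ratio_row_id = gain_ratio(row_id, labels)
ratio_binary = gain_ratio(binary, labels)
```

Both features have an information gain of 1 bit, but the identifier-like feature has IV = 2, so its gain ratio drops to 0.5 while the binary feature keeps a ratio of 1.0.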

SLIDE 10

Extensions

ID3 and our extensions

Pre-pruning

Min samples split
Max depth
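The two pre-pruning criteria act as stop conditions checked before a node is split. A minimal sketch follows; the function name `should_stop` is hypothetical, and the parameter names `max_depth` and `min_samples_split` are modelled on the slide's terms and on scikit-learn's tree parameters rather than taken from the authors' code.

```python
def should_stop(labels, depth, max_depth=None, min_samples_split=2):
    """Return True if the node holding `labels` at `depth` should become a leaf."""
    if len(set(labels)) <= 1:
        return True                       # node is already pure
    if max_depth is not None and depth >= max_depth:
        return True                       # pre-pruning: depth limit reached
    if len(labels) < min_samples_split:
        return True                       # pre-pruning: too few samples to split
    return False


mixed = ["play", "stay", "play"]
stop_by_depth = should_stop(mixed, depth=3, max_depth=3)
stop_by_size = should_stop(mixed, depth=1, min_samples_split=5)
keep_growing = not should_stop(mixed, depth=1, max_depth=3)
```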

Post-pruning

Split the data into a training set and a test set.
Tentatively replace a feature node with a classifying (leaf) node.
If the error on the test set is lower, keep the replacement (prune).
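The post-pruning steps above can be sketched as a bottom-up pass over a small hand-built tree. The `Node` class, the helper names, and the held-out data are hypothetical and exist only for this illustration; the authors' implementation may differ in traversal order and tie handling.

```python
class Node:
    """A leaf if `children` is empty, otherwise a split on `feature`."""

    def __init__(self, majority, feature=None, children=None):
        self.majority = majority        # most common training class at this node
        self.feature = feature
        self.children = children or {}  # feature value -> child Node

    def predict(self, row):
        child = self.children.get(row.get(self.feature))
        return child.predict(row) if child else self.majority


def error(tree, rows, labels):
    """Number of misclassified samples."""
    return sum(tree.predict(r) != y for r, y in zip(rows, labels))


def prune(node, root, rows, labels):
    """Collapse feature nodes bottom-up when that lowers the held-out error."""
    for child in node.children.values():
        prune(child, root, rows, labels)
    if not node.children:
        return
    before = error(root, rows, labels)
    saved, node.children = node.children, {}  # turn into a classifying node
    if error(root, rows, labels) >= before:   # not strictly lower: undo
        node.children = saved


# Hypothetical overfitted tree: the "windy" split below "rainy" fits noise.
rainy = Node("stay", "windy", {"yes": Node("stay"), "no": Node("play")})
root = Node("play", "outlook", {"sunny": Node("play"), "rainy": rainy})

# Hypothetical held-out samples.
test_rows = [
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
    {"outlook": "sunny", "windy": "no"},
]
test_labels = ["stay", "stay", "play"]

errors_before = error(root, test_rows, test_labels)
prune(root, root, test_rows, test_labels)
errors_after = error(root, test_rows, test_labels)
```

In this example the `windy` split is collapsed because doing so lowers the held-out error, while the `outlook` split at the root is kept.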

SLIDE 11

Extensions

ID3 and our extensions

Numerical values.
Multiclass classification.
Reuse features.
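One common way to handle numerical values in an entropy-based tree is to turn each numerical feature into a binary test against a threshold, picking the threshold with the highest information gain. The sketch below follows that approach as an assumption; the slides do not say how the authors implement it, and the function names and data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, gain) of the best binary split `value <= threshold`."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best_gain, best_t = 0.0, None
    # Candidate thresholds: midpoints between consecutive distinct values.
    for (low, _), (high, _) in zip(pairs, pairs[1:]):
        if low == high:
            continue
        t = (low + high) / 2
        left = [y for v, y in pairs if v <= t]
        right = [y for v, y in pairs if v > t]
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        gain = base - weighted
        if gain > best_gain:
            best_gain, best_t = gain, t
    return best_t, best_gain


# Hypothetical numerical feature (temperature) with two classes.
temperatures = [15, 18, 21, 24]
labels = ["stay", "stay", "play", "play"]

threshold, gain = best_threshold(temperatures, labels)
```

With this data the best split is `temperature <= 19.5`, which separates the two classes completely.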

SLIDE 12

Current state of our work

Demo and Usage

Demo
GitHub: https://github.com/svaante/decision-tree-id3/
PyPI: pip install decision-tree-id3
Scikit-learn-contrib