SLIDE 1

Decision trees

PRISM - Nicolas Sutton-Charani 20/01/2020

SLIDE 2
  • 1. Introduction
  • 2. Use of decision trees
      2.1 Prediction
      2.2 Interpretability: descriptive data analysis
  • 3. Learning of decision trees
      3.1 Purity criteria
      3.2 Stopping criteria
      3.3 Learning algorithm
  • 4. Pruning of decision trees
      4.1 Cost-complexity trade-off
  • 5. Extension: Random forests
SLIDE 3

Introduction

What is a decision tree?

[Tree diagram: internal nodes test attributes, branches carry their values, leaves give label predictions]

SLIDE 4

Introduction

What is a decision tree?

[Tree diagram: internal nodes test attributes, branches carry their values, leaves give label predictions]

SLIDE 5

Introduction

What is a decision tree? → supervised learning

[Tree diagram: internal nodes test attributes, branches carry the attribute values, leaves give label predictions]

SLIDE 6

Introduction

A little history

⚠ Decision trees in machine learning (or data mining) are not the same thing as decision trees in decision theory.

SLIDE 7

Introduction

Types of decision trees

Type of class label:
  • numerical → regression tree
  • nominal → classification tree
Type of algorithm (→ structure):
  • CART: statistics, binary trees
  • C4.5: computer science, small trees
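To make this concrete, here is a minimal sketch (my own illustration, not from the slides) with scikit-learn, whose trees are CART-style binary trees; the type of the class label picks the tree type:

```python
# Minimal sketch (assumes scikit-learn): nominal labels -> classification
# tree, numerical labels -> regression tree.
from sklearn.datasets import load_diabetes, load_iris
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X_c, y_c = load_iris(return_X_y=True)         # nominal labels (3 species)
clf = DecisionTreeClassifier().fit(X_c, y_c)  # -> classification tree

X_r, y_r = load_diabetes(return_X_y=True)     # numerical labels
reg = DecisionTreeRegressor().fit(X_r, y_r)   # -> regression tree
```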

SLIDE 8

Use of decision trees: Prediction

Plan

(Outline slide repeated; next: 2.1 Prediction)
SLIDE 9

Use of decision trees: Prediction

Classification trees

Will the badminton match take place?

SLIDE 10

Use of decision trees: Prediction

Classification trees

What fruit is it?

SLIDE 11

Use of decision trees: Prediction

Classification trees

Will he/she come to my party?

SLIDE 12

Use of decision trees: Prediction

Classification trees

Will they wait?

SLIDE 13

Use of decision trees: Prediction

Classification trees

Who will win the election in this county?

SLIDE 14

Use of decision trees: Prediction

Regression trees

What grade will a student get (given their homework average grade)?

SLIDE 15

Use of decision trees: Interpretability (descriptive data analysis)

Plan

(Outline slide repeated; next: 2.2 Interpretability: descriptive data analysis)
SLIDE 16

Use of decision trees: Interpretability (descriptive data analysis)

Data analysis tool

Trees are highly interpretable: they partition the attribute space
  → a tree can be summarized by its leaves, which define a mixture of laws
  → a wonderful collaboration tool for working with domain experts

⚠ INSTABILITY ← overfitting

SLIDE 17

Learning of decision trees

Formalism

Learning dataset (supervised learning):

$$\begin{pmatrix} x_1 & y_1 \\ \vdots & \vdots \\ x_N & y_N \end{pmatrix} = \begin{pmatrix} x_1^1 & \dots & x_1^J & y_1 \\ \vdots & & \vdots & \vdots \\ x_N^1 & \dots & x_N^J & y_N \end{pmatrix}$$

  • samples are assumed to be i.i.d.
  • Attributes: $X = (X^1, \dots, X^J) \in \mathcal{X} = \mathcal{X}^1 \times \dots \times \mathcal{X}^J$; the spaces $\mathcal{X}^j$ can be categorical or numerical
  • Class label: $Y \in \Omega = \{\omega_1, \dots, \omega_K\}$ ($Y \in \mathbb{R}$ for regression)
  • Tree: $P_H = \{t_1, \dots, t_H\}$, with $\pi_h = P(t_h) \approx \frac{|t_h|}{N}$, where $|t_h| = \#\{i : x_i \in t_h\}$
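A tiny sketch of this notation in code (my illustration; the leaf assignment is hypothetical): `X` is the N x J attribute matrix, `y` the labels, and `pi_h = |t_h| / N` estimates `P(t_h)`.

```python
import numpy as np

X = np.array([[1.0, 0.2], [0.5, 0.9], [1.5, 0.1], [0.4, 0.8]])  # N=4, J=2
y = np.array(["yes", "no", "yes", "no"])                        # Omega = {yes, no}

leaf_of = np.array([0, 1, 0, 1])    # hypothetical leaf t_h of each sample
for h in range(2):
    size = np.sum(leaf_of == h)     # |t_h|
    print(f"pi_{h} = {size / len(y)}")  # pi_h ~ |t_h| / N
```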

SLIDE 18

Learning of decision trees

Recursive partitioning

SLIDE 19

Learning of decision trees

Recursive partitioning

SLIDE 20

Learning of decision trees

Recursive partitioning

SLIDE 21

Learning of decision trees

Recursive partitioning

SLIDE 22

Learning of decision trees

Learning principle

Start with the whole dataset in the initial node. Choose the best splits (on the attributes) in order to obtain pure leaves.

Classification trees (purity = homogeneity in terms of class labels):
  • CART → Gini impurity: $i(t_h) = \sum_{k=1}^{K} p_k (1 - p_k)$
  • ID3, C4.5 → Shannon entropy: $i(t_h) = -\sum_{k=1}^{K} p_k \log_2(p_k)$
with $p_k = P(Y = \omega_k \mid t_h)$

Regression trees (purity = low variance of the class labels):
  • $i(t_h) = \mathrm{Var}(Y \mid t_h) = \frac{1}{|t_h|} \sum_{x_i \in t_h} (y_i - E(Y \mid t_h))^2$, with $E(Y \mid t_h) = \frac{1}{|t_h|} \sum_{x_i \in t_h} y_i$
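These three impurity measures are short to implement; a minimal sketch (my own code, assuming numpy):

```python
import numpy as np

def gini(y):
    """CART: i(t) = sum_k p_k (1 - p_k) over the labels in leaf t."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return float(np.sum(p * (1.0 - p)))

def entropy(y):
    """ID3/C4.5: i(t) = -sum_k p_k log2(p_k)."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def variance(y):
    """Regression trees: i(t) = Var(Y | t)."""
    y = np.asarray(y, dtype=float)
    return float(np.mean((y - y.mean()) ** 2))

print(gini(["yes", "yes", "no"]))     # 0.444... (p = 2/3, 1/3)
print(entropy(["yes", "yes", "no"]))  # 0.918...
print(variance([10, 12, 14]))         # 2.666...
```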

SLIDE 23

Learning of decision trees

Impurity measures

SLIDE 24

Learning of decision trees: Purity criteria

Plan

(Outline slide repeated; next: 3.1 Purity criteria)
SLIDE 25

Learning of decision trees: Purity criteria

Purity criteria

[Diagram: a leaf $t_h$ to be split]

Impurity measure + tree structure → criteria:
  • CART, ID3: purity gain
  • C4.5: information gain ratio
  • Regression trees (CART): variance minimisation

SLIDE 26

Learning of decision trees: Purity criteria

Purity criteria

[Diagram: leaf $t_h$ split on an attribute into two children $t_L$ and $t_R$]

Impurity measure + tree structure → criteria:
  • CART, ID3: purity gain → $\Delta i = i(t_h) - \pi_L \, i(t_L) - \pi_R \, i(t_R)$
  • C4.5: information gain ratio → $IGR = \frac{\Delta i}{H(\pi_L, \pi_R)}$
  • Regression trees (CART): variance minimisation → $\Delta i = i(t_h) - \pi_L \, i(t_L) - \pi_R \, i(t_R)$
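In code (a sketch of mine, reusing a Gini helper), the two criteria for one candidate binary split:

```python
import numpy as np

def gini(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return float(np.sum(p * (1.0 - p)))

def purity_gain(y_parent, y_left, y_right, impurity=gini):
    """CART/ID3: Delta_i = i(t_h) - pi_L * i(t_L) - pi_R * i(t_R)."""
    n = len(y_parent)
    pi_L, pi_R = len(y_left) / n, len(y_right) / n
    return impurity(y_parent) - pi_L * impurity(y_left) - pi_R * impurity(y_right)

def information_gain_ratio(y_parent, y_left, y_right, impurity=gini):
    """C4.5: IGR = Delta_i / H(pi_L, pi_R), normalizing by the split entropy."""
    n = len(y_parent)
    pi = np.array([len(y_left) / n, len(y_right) / n])
    split_entropy = float(-np.sum(pi * np.log2(pi)))  # H(pi_L, pi_R)
    return purity_gain(y_parent, y_left, y_right, impurity) / split_entropy
```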

SLIDE 27

Learning of decision trees: Stopping criteria

Plan

(Outline slide repeated; next: 3.2 Stopping criteria)
SLIDE 28

Learning of decision trees: Stopping criteria

Stopping criteria (pre-pruning)

For all leaves $\{t_h\}_{h=1,\dots,H}$ and their potential children:
  • leaf purity: $\exists k \in \{1, \dots, K\} : p_k = 1$
  • leaf and children sizes: $|t_h| \leq$ minLeafSize
  • leaf and children weights: $\pi_h = \frac{|t_h|}{|t_0|} \leq$ minLeafProba
  • number of leaves: $H \geq$ maxNumberLeaves
  • tree depth: $\mathrm{depth}(P_H) \geq$ maxDepth
  • purity gain: $\Delta i \leq$ minPurityGain
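These pre-pruning criteria map closely onto hyperparameters of scikit-learn's trees; the correspondence below is my reading, not part of the course, and the threshold values are arbitrary examples:

```python
from sklearn.tree import DecisionTreeClassifier

# Each argument mirrors one stopping criterion from the slide:
clf = DecisionTreeClassifier(
    min_samples_leaf=5,             # minLeafSize
    min_weight_fraction_leaf=0.01,  # minLeafProba
    max_leaf_nodes=20,              # maxNumberLeaves
    max_depth=4,                    # maxDepth
    min_impurity_decrease=0.001,    # minPurityGain
)
```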

SLIDE 29

Learning of decision trees: Learning algorithm

Plan

(Outline slide repeated; next: 3.3 Learning algorithm)
SLIDE 30

Learning of decision trees: Learning algorithm

Learning algorithm

Result: learnt tree
Start with all the learning data in an initial node (a single leaf);
while stopping criteria are not verified for all leaves do
    for each splittable leaf do
        compute the purity gains obtained from all possible splits;
    end
    SPLIT: select the split achieving the maximum purity gain;
end
prune the obtained tree;

Recursive partitioning
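As an illustration only (my code, not the course's implementation), a compact recursive partitioner for numerical attributes, greedy on Gini purity gain, with max-depth and minimum-gain stopping criteria:

```python
import numpy as np

def gini(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return float(np.sum(p * (1.0 - p)))

def best_split(X, y):
    """Scan every (attribute, threshold) pair; return the split with max purity gain."""
    best = None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j])[:-1]:          # candidate thresholds
            left = X[:, j] <= thr
            gain = (gini(y) - left.mean() * gini(y[left])
                            - (~left).mean() * gini(y[~left]))
            if best is None or gain > best[0]:
                best = (gain, j, thr)
    return best                                      # (Delta_i, attribute, threshold)

def grow(X, y, depth=0, max_depth=3, min_gain=1e-9):
    """Recursive partitioning: split while a split with positive purity gain exists."""
    split = best_split(X, y)
    if depth >= max_depth or split is None or split[0] <= min_gain:
        labels, counts = np.unique(y, return_counts=True)
        return labels[np.argmax(counts)]             # leaf: majority-label prediction
    _, j, thr = split
    left = X[:, j] <= thr
    return (j, thr,
            grow(X[left], y[left], depth + 1, max_depth, min_gain),
            grow(X[~left], y[~left], depth + 1, max_depth, min_gain))

# Usage on a toy 2-attribute dataset:
X = np.array([[1.0, 0.2], [0.9, 0.8], [0.2, 0.7], [0.1, 0.3]])
y = np.array(["yes", "yes", "no", "no"])
print(grow(X, y))   # e.g. (0, 0.2, 'no', 'yes'): split on attribute 0 at 0.2
```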

SLIDE 31

Learning of decision trees: Learning algorithm

ID3 - Training Examples – [9+,5-]

SLIDE 32

Learning of decision trees: Learning algorithm

ID3 - Selecting Next Attribute

SLIDE 33

Learning of decision trees: Learning algorithm

ID3 - Selecting Next Attribute

SLIDE 34

Learning of decision trees: Learning algorithm

ID3 - Selecting Next Attribute

SLIDE 35

Learning of decision trees: Learning algorithm

ID3 - Best Attribute - Outlook

SLIDE 36

Learning of decision trees: Learning algorithm

ID3 - S_sunny

SLIDE 37

Learning of decision trees: Learning algorithm

ID3 - Results

SLIDE 38

Pruning of decision trees

Overfitting

SLIDE 39

Pruning of decision trees

Overfitting

Remark: decision trees do not need variable selection or dimension reduction (in terms of accuracy).

SLIDE 40

Pruning of decision trees: Cost-complexity trade-off

Plan

(Outline slide repeated; next: 4.1 Cost-complexity trade-off)
SLIDE 41

Pruning of decision trees: Cost-complexity trade-off

Cost-Complexity Pruning

The idea:
  • trade off predictive efficiency against complexity
  • find a subtree that fulfills this trade-off

Metrics: 'Err' ← misclassification rate (classification) or MSE (regression)
Criterion: $R_\alpha = \mathrm{Err} + \alpha H$ (where $H$ is the number of leaves)

Steps:
  • find a useful sequence of nested subtrees
  • choose the right subtree
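A tiny worked example (the numbers are mine) of the criterion $R_\alpha = \mathrm{Err} + \alpha H$:

```python
# R_alpha = Err + alpha * H, where H is the tree's number of leaves.
def cost_complexity(err: float, n_leaves: int, alpha: float) -> float:
    return err + alpha * n_leaves

# A bigger tree with lower error can lose to a smaller one once alpha > 0:
print(cost_complexity(err=0.10, n_leaves=12, alpha=0.01))  # 0.22
print(cost_complexity(err=0.14, n_leaves=3,  alpha=0.01))  # 0.17
```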

SLIDE 42

Pruning of decision trees: Cost-complexity trade-off

Cost-Complexity Pruning

Sequence of subtrees creation
Result: a sequence of nested subtrees of $T_0$: $T_0 \supset T_1 \supset T_2 \supset \dots \supset T_S = P_1$ (the initial node)
Learn the biggest tree $T_0 := P_{H_{max}}$, obtained for $\alpha_0 = 0$ ($s = 0$);
while $T_s \neq P_1$ do
    $T_{s+1} = \underset{t \in \mathrm{subtrees}(T_s)}{\mathrm{argmin}} \, [R_{\alpha_s}(t) - R_{\alpha_s}(T_s)]$;
    $\alpha_{s+1} = R_{\alpha_s}(T_{s+1}) - R_{\alpha_s}(T_s)$;
end
We get two sets in bijection: $\{T_0, \dots, T_S\}$ and $\{\alpha_0, \dots, \alpha_S\}$ (with $T_S = P_1$)

Selection: $T_{s^*} = \underset{T_s \in \{T_0, \dots, T_S\}}{\mathrm{argmin}} \, \mathrm{Err}(T_s)$ ← estimated on a pruning set or by cross-validation
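scikit-learn implements this weakest-link procedure; a sketch (dataset and validation split are my choices) of building the $(T_s, \alpha_s)$ sequence and selecting on a pruning set:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# The sequence of alphas (matching the nested subtrees of the full tree):
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)

# Selection: refit at each alpha_s, keep the subtree with lowest error
# on the pruning set (here a held-out validation split).
best = max(
    (DecisionTreeClassifier(random_state=0, ccp_alpha=a).fit(X_train, y_train)
     for a in path.ccp_alphas),
    key=lambda tree: tree.score(X_val, y_val),
)
print(best.get_n_leaves(), best.score(X_val, y_val))
```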

SLIDE 43

Pruning of decision trees: Cost-complexity trade-off

Cost-Complexity Pruning

Figure – Sequence of nested subtrees

Here, $\alpha_1 < \alpha_2 \Rightarrow T \setminus T_1 \subset T \setminus T_2$: as $\alpha$ grows, more of $T$ is pruned away and the subtrees shrink.

SLIDE 44

Extension: Random forest

Random forest

Motivation:
  • trees' instability
  • bias-variance trade-off

Averaging reduces variance: $\mathrm{Var}(\bar{X}) = \frac{\mathrm{Var}(X)}{N}$ (for independent predictions; see the sketch after this list)
→ average models to reduce the model variance

One problem:
  • there is only one training set
  • where do multiple models come from?
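A quick numerical check (my illustration) that averaging $N$ independent predictions divides the variance by $N$:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 25
preds = rng.normal(loc=0.0, scale=1.0, size=(100_000, N))  # Var(X) = 1

print(preds[:, 0].var())         # ~1.0  : a single model
print(preds.mean(axis=1).var())  # ~0.04 : average of N=25 models -> Var(X)/N
```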
SLIDE 45

Extension: Random forest

Bagging: Bootstrap Aggregation

Tin Kam Ho (1995) → Leo Breiman (2001)
Take repeated bootstrap samples from the training set.
Bootstrap sampling: given a training set D containing N examples, draw N examples at random with replacement from D.
Bagging (see the sketch after this list):
  • create B bootstrap samples D_1, ..., D_B
  • train a distinct classifier on each D_b
  • classify new instances by majority vote / averaging / aggregating the predictions
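A minimal bagging sketch (my example, not the course's code; `estimator` was named `base_estimator` before scikit-learn 1.2):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

bagging = BaggingClassifier(
    estimator=DecisionTreeClassifier(),  # one tree per bootstrap sample
    n_estimators=100,                    # B bootstrap samples
    bootstrap=True,                      # draw N examples with replacement
    random_state=0,
)
print(cross_val_score(bagging, X, y).mean())  # vs. a single tree:
print(cross_val_score(DecisionTreeClassifier(random_state=0), X, y).mean())
```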

SLIDE 46

Extension: Random forest

Random forest

SLIDE 47

Extension: Random forest

References

  • L. Breiman, J. Friedman, R. A. Olshen, and C. J. Stone, Classification and Regression Trees, 1984.
  • J. R. Quinlan, "Induction of decision trees," Machine Learning, vol. 1, pp. 81–106, 1986.
  • L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
  • G. Biau, L. Devroye, and G. Lugosi, "Consistency of random forests and other averaging classifiers," Journal of Machine Learning Research, vol. 9, pp. 2015–2033, 2008.
