Introduction to Machine Learning CART: Growing a Tree - PowerPoint PPT Presentation



SLIDE 1

Introduction to Machine Learning CART: Growing a Tree

[Figure: classification tree after the first split, Petal.Length < 2.5 (yes/no):
Node 1 (root, 100%): setosa, class shares .33 / .33 / .33
Node 2 (yes, 33%): setosa, 1.00 / .00 / .00
Node 3 (no, 67%): versicolor, .00 / .50 / .50]
Learning goals

  • Understand how a tree is grown by an exhaustive search over all possible features and split points
  • Know where exactly the split point is set if several split points yield the same empirical risk

SLIDE 2

TREE GROWING

We start with an empty tree, a root node that contains all the data. Trees are then grown by recursively applying greedy optimization to each node N. Greedy means we do an exhaustive search:

  • All possible splits of N on all possible points t, for all features xj, are compared in terms of their empirical risk R(N, j, t).
  • The training data is then distributed to the child nodes according to the optimal split, and the procedure is repeated in the child nodes.
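The exhaustive search described above can be sketched in a few lines of Python. This is an illustrative sketch, not the lecture's code: the function name `best_split` and the use of Gini impurity as the empirical risk R(N, j, t) are assumptions.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (used here as empirical risk)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y):
    """Exhaustive (greedy) search: compare all features j and split points t.

    Returns the (j, t, risk) minimizing the weighted impurity of the children.
    """
    n, p = len(X), len(X[0])
    best = (None, None, float("inf"))
    for j in range(p):                          # all features
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):  # all candidate split points
            t = (lo + hi) / 2                   # midpoint between observed values
            left = [y[i] for i in range(n) if X[i][j] < t]
            right = [y[i] for i in range(n) if X[i][j] >= t]
            risk = (len(left) * gini(left) + len(right) * gini(right)) / n
            if risk < best[2]:
                best = (j, t, risk)
    return best

# Toy node data: feature 0 separates the two classes perfectly at t = 2.5.
X = [[1.0, 5.0], [2.0, 1.0], [3.0, 4.0], [4.0, 2.0]]
y = ["a", "a", "b", "b"]
print(best_split(X, y))  # -> (0, 2.5, 0.0)
```

Note the strict `<` comparison when updating `best`: among several splits with the same empirical risk, the first one found is kept.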

© Introduction to Machine Learning – 1 / 5
SLIDE 3

TREE GROWING

Start with a root node of all data, then search for a feature and split point that minimizes the empirical risk in the child nodes.

[Figure: Iris data, Petal.Width vs. Petal.Length, points colored by response (setosa, versicolor, virginica).]

[Figure: classification tree after the first split, Petal.Length < 2.5 (yes/no):
Node 1 (root, 100%): setosa, class shares .33 / .33 / .33
Node 2 (yes, 33%): setosa, 1.00 / .00 / .00
Node 3 (no, 67%): versicolor, .00 / .50 / .50]

Nodes display their current label distribution here for illustration.

SLIDE 4

TREE GROWING

We then proceed recursively for each child node: iterate over all features, and for each feature over all possible split points. Select the best split and divide the data in the parent node into left and right child nodes:

[Figure: Iris data, Petal.Width vs. Petal.Length, colored by response (setosa, versicolor, virginica), with both split lines drawn; tree after the second split, Petal.Width < 1.8 in node 3 (yes/no):
Node 1 (root, 100%): setosa, .33 / .33 / .33
Node 2 (Petal.Length < 2.5, 33%): setosa, 1.00 / .00 / .00
Node 3 (Petal.Length ≥ 2.5, 67%): versicolor, .00 / .50 / .50
Node 6 (Petal.Width < 1.8, 36%): versicolor, .00 / .91 / .09
Node 7 (Petal.Width ≥ 1.8, 31%): virginica, .00 / .02 / .98]
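The recursive procedure can be sketched end to end as follows. This is a minimal illustration, not the lecture's implementation: the names `grow` and `predict`, the stopping rule (pure node or depth limit), and Gini impurity as the risk are all assumptions.

```python
from collections import Counter

def gini(ys):
    n = len(ys)
    return 0.0 if n == 0 else 1.0 - sum((c / n) ** 2 for c in Counter(ys).values())

def best_split(X, y):
    """Exhaustive search over all features and midpoint split candidates."""
    n, best = len(X), (None, None, float("inf"))
    for j in range(len(X[0])):
        vals = sorted(set(row[j] for row in X))
        for lo, hi in zip(vals, vals[1:]):
            t = (lo + hi) / 2
            left = [y[i] for i in range(n) if X[i][j] < t]
            right = [y[i] for i in range(n) if X[i][j] >= t]
            risk = (len(left) * gini(left) + len(right) * gini(right)) / n
            if risk < best[2]:
                best = (j, t, risk)
    return best

def grow(X, y, depth=0, max_depth=3):
    """Recursively split each child node until it is pure (or too deep)."""
    j, t, _ = best_split(X, y) if gini(y) > 0.0 and depth < max_depth else (None, None, None)
    if j is None:                                    # leaf: predict majority class
        return {"leaf": Counter(y).most_common(1)[0][0]}
    L = [i for i in range(len(X)) if X[i][j] < t]
    R = [i for i in range(len(X)) if X[i][j] >= t]
    return {"split": (j, t),
            "left": grow([X[i] for i in L], [y[i] for i in L], depth + 1, max_depth),
            "right": grow([X[i] for i in R], [y[i] for i in R], depth + 1, max_depth)}

def predict(tree, x):
    """Route an observation down the tree until a leaf is reached."""
    while "split" in tree:
        j, t = tree["split"]
        tree = tree["left"] if x[j] < t else tree["right"]
    return tree["leaf"]

# Tiny iris-like sample: (Petal.Length, Petal.Width) -> species.
X = [[1.4, 0.2], [1.5, 0.2], [4.5, 1.5], [5.1, 1.9], [4.7, 1.4], [5.8, 2.2]]
y = ["setosa", "setosa", "versicolor", "virginica", "versicolor", "virginica"]
tree = grow(X, y)
print(tree["split"])  # root split on feature 0 (Petal.Length)
```

As on the slides, the first split separates setosa from the rest on Petal.Length, and the remaining mixed child node is split again in the next recursion step.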

SLIDE 5

TREE GROWING

We then proceed recursively for each child node: iterate over all features, and for each feature over all possible split points. Select the best split and divide the data in the parent node into left and right child nodes:

[Figure: Iris data, Petal.Width vs. Petal.Length, colored by response (setosa, versicolor, virginica), with all three split lines drawn; tree after the third split, Petal.Length < 5 in node 6 (yes/no):
Node 1 (root, 100%): setosa, .33 / .33 / .33
Node 2 (Petal.Length < 2.5, 33%): setosa, 1.00 / .00 / .00
Node 3 (Petal.Length ≥ 2.5, 67%): versicolor, .00 / .50 / .50
Node 6 (Petal.Width < 1.8, 36%): versicolor, .00 / .91 / .09
Node 12 (Petal.Length < 5, 32%): versicolor, .00 / .98 / .02
Node 13 (Petal.Length ≥ 5, 4%): virginica, .00 / .33 / .67
Node 7 (Petal.Width ≥ 1.8, 31%): virginica, .00 / .02 / .98]

SLIDE 6

SPLIT PLACEMENT

[Figure: observed values of Sepal.Length (5.0–7.0) and Sepal.Width (2.0–3.5), with split points placed midway between neighboring observations.]

Splits are usually placed at the midpoint between the two observations they separate: the large margin to the next closest observations makes good generalization on new, unseen data more likely.
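This midpoint rule can be made concrete with a small helper (`candidate_splits` is a hypothetical name used for illustration; the feature values below are made up):

```python
def candidate_splits(values):
    """Split points placed at the midpoint between consecutive distinct
    observed values, so each split has the largest possible margin to the
    nearest observations on both sides."""
    vals = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(vals, vals[1:])]

# Some observed Sepal.Length values (illustrative):
print(candidate_splits([5.0, 5.5, 6.0, 6.5, 7.0]))  # -> [5.25, 5.75, 6.25, 6.75]
```

Any threshold strictly between two neighboring observations yields the same empirical risk on the training data; the midpoint is the canonical choice among them.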
