Decision Trees: Representation


SLIDE 1

Machine Learning

Decision Trees: Representation


Some slides from Tom Mitchell, Dan Roth and others

SLIDE 2

Key issues in machine learning

  • Modeling
    – How to formulate your problem as a machine learning problem?
    – How to represent data?
    – Which algorithms to use? What learning protocols?

  • Representation
    – Good hypothesis spaces and good features

  • Algorithms
    – What is a good learning algorithm?
    – What is success?
    – Generalization vs. overfitting
    – The computational question: How long will learning take?


SLIDE 4

Coming up… (the rest of the semester)

Different hypothesis spaces and learning algorithms

– Decision trees and the ID3 algorithm
– Linear classifiers

  • Perceptron
  • SVM
  • Logistic regression

– Combining multiple classifiers

  • Boosting, bagging

– Combining multiple classifiers (boosting, bagging)
– Non-linear classifiers
– Nearest neighbors

Important issues to consider

  • 1. What do these hypotheses represent?
  • 2. Implicit assumptions and tradeoffs
  • 3. Generalization?
  • 4. How do we learn?
SLIDE 5

This lecture: Learning Decision Trees

  • 1. Representation: What are decision trees?
  • 2. Algorithm: Learning decision trees

– The ID3 algorithm: A greedy heuristic

  • 3. Some extensions



SLIDE 7

Representing data

Data can be represented as a big table, with columns denoting different attributes

Name                     Label
Claire Cardie            −
Peter Bartlett           +
Eric Baum                +
Haym Hirsh               −
Leslie Pack Kaelbling    +
Yoav Freund              −
SLIDE 8

Representing data

Data can be represented as a big table, with columns denoting different attributes

Name                    Name has punctuation?   Second character of first name   Length of first name > 5?   Same first letter in two names?   Label
Claire Cardie           No                      l                                Yes                         Yes                               −
Peter Bartlett          No                      e                                No                          No                                +
Eric Baum               No                      r                                No                          No                                +
Haym Hirsh              No                      a                                No                          Yes                               −
Leslie Pack Kaelbling   No                      e                                Yes                         No                                +
Yoav Freund             No                      o                                No                          No                                −
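To make the attribute columns concrete, here is a small sketch (my illustration, not from the slides; the function name and the exact punctuation test are assumptions) of how each row could be computed from the raw name:

```python
def attributes(name: str) -> dict:
    """Compute the four attribute columns from a raw name string."""
    parts = name.split()
    first = parts[0]
    return {
        "has_punctuation": any(not (c.isalpha() or c.isspace()) for c in name),
        "second_char_of_first_name": first[1].lower(),
        "first_name_longer_than_5": len(first) > 5,
        "same_first_letter": len({p[0].lower() for p in parts}) == 1,
    }

print(attributes("Claire Cardie"))
# {'has_punctuation': False, 'second_char_of_first_name': 'l',
#  'first_name_longer_than_5': True, 'same_first_letter': True}
```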

SLIDE 13

Representing data

Data can be represented as a big table, with columns denoting different attributes.

(Same table as Slide 8.)

With these four attributes, how many unique rows are possible? 2 × 26 × 2 × 2 = 208
If there are 100 attributes, all binary, how many unique rows are possible? 2 × 2 × 2 × ⋯ × 2 (100 times) = 2^100

If we wanted to store all possible rows, this number is too large. We need to figure out how to represent data in a better, more efficient way.
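To make the counting concrete, a tiny sketch (my illustration, not part of the slides) that reproduces both numbers:

```python
import math

# Four attributes: punctuation? (2 values), second character (26),
# first name longer than 5? (2), same first letter? (2)
print(math.prod([2, 26, 2, 2]))   # 208 distinct rows

# 100 binary attributes: 2 * 2 * ... * 2, one factor per attribute
print(2 ** 100)                   # 1267650600228229401496703205376 distinct rows
```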

SLIDE 14

What are decision trees?

  • A hierarchical data structure that represents data using a divide-and-conquer strategy
  • Can be used as a hypothesis class for non-parametric classification or regression
  • General idea: Given a collection of examples, learn a decision tree that represents it


SLIDE 15

What are decision trees?

  • Decision trees are a family of classifiers for instances that are represented by collections of attributes (i.e., features)

  • Nodes are tests for feature values
  • There is one branch for every value that the feature can take
  • Leaves of the tree specify the class labels

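A minimal way to render this structure in code (an illustrative sketch; the class names are mine, not from the lecture):

```python
from dataclasses import dataclass, field
from typing import Dict, Union

@dataclass
class Leaf:
    label: str                 # leaves specify the class label

@dataclass
class Node:
    feature: str               # the attribute this node tests
    # one child per value that the feature can take
    children: Dict[str, Union["Node", "Leaf"]] = field(default_factory=dict)
```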

SLIDE 16

Let’s build a decision tree for classifying shapes

[Figure: a collection of colored shapes (circles, squares, triangles), each labeled A, B, or C]

SLIDE 17

Let’s build a decision tree for classifying shapes


Before building a decision tree: What is the label for a red triangle? And why?



SLIDE 19

Let’s build a decision tree for classifying shapes

What are some attributes of the examples?

Color, Shape



SLIDE 25

Let's build a decision tree for classifying shapes

What are some attributes of the examples? Color, Shape

[Figure: the finished decision tree. Root: Color? with branches Blue, Red, Green. Blue → Shape? (square → A, triangle → B, circle → C); Red → Shape? (circle and square branches, with leaves A and B); Green → B]

  • 1. How do we learn a decision tree? Coming up soon…
  • 2. How to use a decision tree for prediction?
    – What is the label for a red triangle? Just follow a path from the root to a leaf
    – What about a green triangle?
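As a sketch of the prediction question (illustrative code, not from the slides; the Red subtree's exact leaf assignment is hard to read off the figure, so it is a guess here), the tree can be written as nested dicts, and prediction really is just following a path from the root to a leaf:

```python
# Internal nodes: {feature: {value: subtree}}; leaves are plain label strings.
shape_tree = {
    "Color": {
        "Blue":  {"Shape": {"square": "A", "triangle": "B", "circle": "C"}},
        "Red":   {"Shape": {"circle": "A", "square": "B"}},   # leaf assignment assumed
        "Green": "B",
    }
}

def predict(tree, example):
    """Walk from the root to a leaf, branching on the example's feature values."""
    while isinstance(tree, dict):
        feature, branches = next(iter(tree.items()))
        tree = branches[example[feature]]
    return tree

print(predict(shape_tree, {"Color": "Blue", "Shape": "circle"}))     # C
print(predict(shape_tree, {"Color": "Green", "Shape": "triangle"}))  # B (Green is already a leaf)
```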


SLIDE 27

Expressivity of decision trees

What Boolean functions can decision trees represent?
– Any Boolean function

(Color=blue AND Shape=triangle → Label=B) AND
(Color=blue AND Shape=square → Label=A) AND
(Color=blue AND Shape=circle → Label=C) AND …

Every path from the root to a leaf is a rule. The full tree is equivalent to the conjunction of all the rules.

Any Boolean function can be represented as a decision tree.
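As a concrete instance of this claim (my example, not from the slides): XOR, which a single linear threshold cannot represent, fits in a decision tree of depth two, with one test per variable along every path:

```python
# XOR(x1, x2) as a decision tree: test x1 at the root, then test x2 on each branch.
xor_tree = {
    "x1": {
        0: {"x2": {0: 0, 1: 1}},
        1: {"x2": {0: 1, 1: 0}},
    }
}

for x1 in (0, 1):
    for x2 in (0, 1):
        leaf = xor_tree["x1"][x1]["x2"][x2]
        assert leaf == (x1 ^ x2)   # every path from root to leaf agrees with XOR
```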


SLIDE 30

Decision Trees

  • Outputs are discrete categories
  • But real-valued outputs are also possible (regression trees; see the sketch below)
  • Well-studied methods exist for handling noisy data (noise in the label or in the features) and for handling missing attributes
    – Pruning trees helps with noise
    – More on this later…
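For the regression-tree remark, a one-line sketch of what changes (an assumption on my part: a common choice, not spelled out on the slide, is to store the mean of the training targets at each leaf):

```python
# A regression-tree leaf predicts a number instead of a class label;
# a common choice is the mean of the training targets that reach the leaf.
def leaf_prediction(targets):
    return sum(targets) / len(targets)

print(leaf_prediction([2.0, 3.5, 4.5]))   # 3.333...
```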

SLIDE 35

Numeric attributes and decision boundaries

  • We have seen instances represented as attribute-value pairs (color=blue, second letter=e, etc.)
    – Values have been categorical
  • How do we deal with numeric feature values? (e.g., length = ?)
    – Discretize them, or use thresholds on the numeric values
    – This example divides the feature space into axis-parallel rectangles

[Figure, left: points labeled + and − in the (X, Y) plane, with thresholds at X = 1, X = 3, Y = 5, and Y = 7 carving it into axis-parallel rectangles]

[Figure, right: the corresponding decision tree, whose internal nodes test thresholds such as X < 3, Y < 5, Y > 7, and X < 1, and whose leaves are labeled + or −]

Decision boundaries can be non-linear
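A sketch of such a tree over two numeric features (illustrative only: the thresholds 1, 3, 5, 7 come from the figure, but its exact structure and leaf labels are not recoverable from the text, so they are assumptions here):

```python
def predict(x: float, y: float) -> str:
    # Each test compares one feature to a threshold, so every split is an
    # axis-parallel line and the resulting regions are axis-parallel rectangles.
    if x < 3:
        if x < 1:
            return "-"
        return "+" if y < 5 else "-"
    else:
        return "+" if y > 7 else "-"

print(predict(2.0, 4.0))   # "+"  (1 <= x < 3 and y < 5)
print(predict(6.0, 9.0))   # "+"  (x >= 3 and y > 7)
```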

SLIDE 36

Summary: Decision trees

  • Decision trees can represent any Boolean function
  • A way to represent a lot of data
  • A natural representation (think 20 questions)
  • Predicting with a decision tree is easy
  • Clearly, given a dataset, there are many decision trees that can represent it. [Exercise: Why?]
  • Learning a good representation from data is the next question


SLIDE 38

Exercises

1. Write down the decision tree for the shapes data if the root node was Shape instead of Color.
2. Will the two trees make the same predictions for unseen shape/color combinations?
3. Show that multiple structurally different decision trees can represent the same Boolean function of two or more variables. (Think about what it means for two trees to be structurally different.)