

SLIDE 1

Decision Trees I

COSC 425: Introduction to Machine Learning, Fall 2020 (CRN: 44874)

Dr. Alex Williams

August 24, 2020

SLIDE 2

SLIDE 3

Today’s Agenda

We will address:

  • 1. What are decision trees?
  • 2. What functions can we learn with decision trees?
SLIDE 4

  • 1. What are Decision Trees?
SLIDE 5

Types of Machine Learning

  • Supervised Learning
  • Unsupervised Learning
  • Reinforcement Learning

SLIDE 6

Suppose you have data about students’ preferences for courses at UTK.

Decision Trees: Example Data

student_id  course_type  course_location  difficulty  grade  …  rating
s1          ML           online           easy        80     …  like
s1          Compilers    face-to-face     easy        87     …  like
s2          Compilers    face-to-face     hard        72     …  dislike
s3          OS           online           hard        79     …  dislike
s3          Algorithms   online           hard        85     …  dislike
s4          ML           online           hard        66     …  like
…

Dataset (i.e., with input-output pairs): input variables (features) and an output variable (target: “rating”). Each row is an example / instance.

SLIDE 7

Training Data: input-output pairs (xi, yi) are fed to a Learning Algorithm, which outputs a learned function f.

Testing Data: apply f to a new input x to predict its output y, i.e., f(x) = y.

Goal: Predict whether a student will like a course.
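The pipeline above can be sketched end to end. The learner below is a deliberately trivial stand-in (it just returns the majority training label), and the names and data are made up for illustration; it only shows the shape of the interface: training pairs in, learned function f out.

```python
# Minimal sketch of the supervised-learning pipeline: a learning
# algorithm consumes input-output pairs (x_i, y_i) and returns a
# function f, which is then applied to unseen test inputs.
from collections import Counter

def learn(training_pairs):
    """A trivial 'learning algorithm': always predict the majority label."""
    labels = [y for _, y in training_pairs]
    majority = Counter(labels).most_common(1)[0][0]
    return lambda x: majority  # the learned function f

# Made-up training data in the spirit of the course example.
train = [({"course_type": "ML"}, "like"),
         ({"course_type": "OS"}, "dislike"),
         ({"course_type": "ML"}, "like")]

f = learn(train)                          # training phase
print(f({"course_type": "Compilers"}))    # testing phase: prints "like"
```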

SLIDE 8

Decision Trees: Questions

You: Is the course a Compilers course? Goal: Predict whether a student will like a course. Me: Yes.

f x f(x)

Prospective Course Predicted Like Learned Function

1 2 3

You: Is the course online? Me: Yes. You: Were past online courses difficult? Me: Yes. You: I predict the student will not like this course.

SLIDE 9

Decision Trees: Questions

Goal: Predict whether a student will like a course. The learned function f maps a prospective course x to a predicted like/dislike f(x).

  1. You: Is the course a Compilers course? Me: Yes.
  2. You: Is the course online? Me: Yes.
  3. You: Has the student liked most online courses? Me: No.

You: I predict the student will not like this course.

Prediction is about finding questions that matter.

SLIDE 10

Decision Trees: Questions

Learned function f: a prospective course x maps to a predicted like/dislike f(x).

[Figure: a decision tree asking isCompilers?, isOnline?, isMorning?, and isEasy?, with yes/no branches leading to Like and Dislike leaves.]

SLIDE 11

From Questions to Learning

Terminology for Decision Trees

  • instance = a set of feature values, e.g. <“Compilers”, “online”, “easy”, 80, …, “like”>
  • question = a conditional constructed from the features: isOnline? isEasy? grade > 80? isTaughtByDrWilliams?
  • question answer = determined by the feature values: yes/no, or categorical (e.g. “online”, “face-to-face”, “hybrid”)
  • label / target = the class being predicted: “rating”

SLIDE 12

From Questions to Learning

Learning is concerned with finding the “best” tree for the data.

We could enumerate all possible trees and evaluate each one. … Okay. So, how many trees is that? Answer: Too many! Finding the optimal tree is NP-hard (see Hyafil and Rivest, 1976).

Thus: We greedily ask, “If I could ask only one question, what would it be?”

Alternative framing: “What is the one question that would be most helpful in estimating whether a student will enjoy a particular course?”

SLIDE 13

From Questions to Learning

Decision Trees Split Your Data

  • Each node represents a question that splits your data.
  • Decision tree learning = choosing what internal nodes should be.

Questions are Conditionals

  • Grade > 80
  • Grade in [80, 90]
  • Location in {“online”, “hybrid”, “face-to-face”}
  • Teacher is DR_WILLIAMS
  • MLgrade * 2 + COMPILERgrade * 3
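The conditionals above can be sketched as predicates over an instance's feature values; asking one at a node simply partitions the rows by their answer. The rows and feature names below are made up to mirror the course example.

```python
# Sketch: a "question" is a conditional over feature values, and asking
# it splits the data into the rows that answer yes and those that answer no.

def split(rows, question):
    yes = [r for r in rows if question(r)]
    no = [r for r in rows if not question(r)]
    return yes, no

rows = [
    {"grade": 85, "location": "online", "rating": "like"},
    {"grade": 72, "location": "face-to-face", "rating": "dislike"},
    {"grade": 66, "location": "online", "rating": "like"},
]

grade_over_80 = lambda r: r["grade"] > 80        # "Grade > 80"
is_online = lambda r: r["location"] == "online"  # a location test

yes, no = split(rows, grade_over_80)
print(len(yes), len(no))  # prints "1 2"
```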
SLIDE 14

From Questions to Learning

Distribution of Like/Dislike labels for each question.

[Figure: label distributions for the questions easy?, Compilers?, morning?, and online?]
SLIDE 15

From Questions to Learning

[Figure: label distributions for easy? (informative) and Compilers? (uninformative).]
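One standard way to make "informative vs. uninformative" precise is information gain: how much a question's split reduces the entropy of the label distribution. The labels below are made up for illustration, not the slide's data.

```python
# Sketch: score a question by how much purer its branches' label
# distributions are than the parent's (entropy reduction).
from collections import Counter
from math import log2

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * log2(c / total)
                for c in Counter(labels).values())

def information_gain(labels, yes_labels, no_labels):
    p_yes = len(yes_labels) / len(labels)
    return (entropy(labels)
            - p_yes * entropy(yes_labels)
            - (1 - p_yes) * entropy(no_labels))

parent = ["like", "like", "dislike", "dislike"]

# An informative question separates the labels perfectly:
print(information_gain(parent, ["like", "like"], ["dislike", "dislike"]))  # 1.0
# An uninformative question leaves each branch as mixed as the parent:
print(information_gain(parent, ["like", "dislike"], ["like", "dislike"]))  # 0.0
```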

SLIDE 16

  • 2. What Functions Can We Learn?
SLIDE 17

Supervised Learning: Theory

Problem Setting:

  • Set of possible instances: X
  • Unknown target function: f: X → Y
  • Set of function hypotheses: H = { h | h: X → Y }

(Daumé, pg. 9)

The Learning Algorithm:

  • Input: training examples { (xi, yi) }
  • Output: a hypothesis h ∈ H that best approximates the target function f

The set of all hypotheses that can be “spat out” by a learning algorithm is called the hypothesis space.

SLIDE 18

Supervised Learning: Theory

Problem Setting:

  • Set of possible instances: each instance x is a feature vector.
  • Unknown target function: y = 1 if a student likes the course; otherwise, y = 0.
  • Set of function hypotheses: each hypothesis is a decision tree!

The Learning Algorithm:

  • Input: training examples
  • Output: a hypothesis that best approximates the target function

SLIDE 19

Trees as Functions: Boolean Logic

Example: Weather Prediction

Translate the Tree to Boolean Logic
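The slide's weather tree itself did not survive extraction, so the sketch below uses a hypothetical two-feature tree to show the translation: a decision tree over Boolean features computes a Boolean formula, one conjunction per path that reaches a "yes" leaf.

```python
# Hypothetical tree:        sunny?
#                          /      \
#                       yes        no
#                        |          |
#                     windy?      False
#                     /    \
#                  yes      no
#                   |        |
#                 False     True
def tree_predict(sunny, windy):
    if sunny:
        return not windy
    return False

# The same function written as Boolean logic: sunny AND (NOT windy).
def boolean_formula(sunny, windy):
    return sunny and not windy

for sunny in (True, False):
    for windy in (True, False):
        assert tree_predict(sunny, windy) == boolean_formula(sunny, windy)
print("tree and formula agree on all four inputs")
```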

SLIDE 20

Example: Cancer Recurrence Prediction

Output variable: N = No Recurrence; R = Recurrence

radius  texture  perimeter  …  outcome
18.02   27.6     117.5      …  N
17.99   10.38    122.8      …  N
20.29   14.34    135.1      …  R
…       …        …          …  …

Input variables (features); output variable (target); each row is an example / instance.

SLIDE 21

Example: Cancer Recurrence Prediction

What does a node represent? A partitioning of the input space.

Internal Nodes: a test or question.

  • Discrete features: branch on each possible value.
  • Continuous (real-valued) features: branch on a threshold value.

Leaf Nodes: contain the instances that satisfy the tests along the branch.

Remember the following:

  • Each instance maps to exactly one leaf.
  • Each leaf typically contains more than one example.
SLIDE 22


SLIDE 23

Example: Cancer Recurrence Prediction

Conversion: Decision trees translate to sets of if-then rules.
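The conversion can be sketched directly: each root-to-leaf path becomes one if-then rule. The thresholds below are hypothetical (chosen only to be consistent with the example rows shown earlier), not the tree actually learned on this dataset.

```python
# Sketch: a tiny recurrence tree written out as its equivalent rule set.
def predict_recurrence(radius, texture):
    # Rule 1: IF radius > 19 THEN R
    if radius > 19:
        return "R"
    # Rule 2: IF radius <= 19 AND texture > 30 THEN R
    if texture > 30:
        return "R"
    # Rule 3: otherwise N
    return "N"

print(predict_recurrence(radius=20.29, texture=14.34))  # prints "R"
print(predict_recurrence(radius=18.02, texture=27.6))   # prints "N"
```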

SLIDE 24

Example: Cancer Recurrence Prediction

Conversion: Decision trees can represent probability of recurrence.
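A sketch of the probabilistic reading: keep per-class counts of the training examples that reach each leaf, and report the recurrence fraction instead of a hard label. The leaf names and counts are made up.

```python
# Each leaf stores counts of the training examples that landed in it;
# the predicted probability of recurrence is the leaf's R-fraction.
leaf_counts = {
    "leaf_a": {"R": 3, "N": 7},
    "leaf_b": {"R": 9, "N": 1},
}

def p_recurrence(leaf):
    counts = leaf_counts[leaf]
    return counts["R"] / (counts["R"] + counts["N"])

print(p_recurrence("leaf_a"))  # 0.3
print(p_recurrence("leaf_b"))  # 0.9
```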

SLIDE 25

Decision Trees: Interpretation

Important: Decision trees form boundaries in your data.

SLIDE 26

Decision Trees: Interpretation

Task: Predict whether a person is a skier (+) or a snowboarder (-) from their Height (H) and Width (W).

[Figure: a scatter plot of skiers (+) and snowboarders (-) in the Width-Height plane, partitioned by a decision tree with threshold tests H > 148, W > 20, W < 17, and H > 125; the leaves predict Ski or SB, and each test draws an axis-aligned boundary in the plane.]
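The tests in the figure can be sketched as code. The thresholds (148, 20, 17, 125) come from the slide, but the branch structure is a plausible reconstruction, since the original figure did not survive extraction; the point is that every node compares a single feature to a threshold, so each split adds one axis-aligned boundary in the (W, H) plane.

```python
# Plausible reconstruction of the slide's skier/snowboarder tree.
def predict(height, width):
    if height > 148:
        return "Ski"   # very tall region -> skier
    if width > 20:
        return "SB"    # wide region -> snowboarder
    if width < 17:
        return "SB"
    return "Ski" if height > 125 else "SB"

print(predict(height=180, width=25))  # prints "Ski" (first test fires)
print(predict(height=140, width=25))  # prints "SB"
```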

SLIDE 27

Decision Trees: Interpretation

See Ishwaran H. and Rao J.S. (2009)

SLIDE 28

Hypothesis Space

For decision trees, the hypothesis space is the set of all possible finite discrete functions that can be learned from the data: f(x) ∈ { category1, category2, …, categoryN }. Every finite discrete function can be represented by some decision tree. (… Hence the need to be greedy!)

SLIDE 29

Hypothesis Space

Boolean functions can be fully expressed as decision trees.

  • Each entry in a truth table can be one path. (Inefficient!)
  • Most Boolean functions can be encoded more compactly.

Note: Encoding Challenges

  • Some functions demand exponentially large decision trees to represent.
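A quick illustration of the exponential case: n-bit parity flips its value whenever any single input flips, so a decision tree for parity must test every variable on every path, ending up with one leaf per truth-table row, i.e. 2^n leaves.

```python
# Sketch: enumerate the truth table of 3-bit parity; a decision tree
# for this function needs one path (and leaf) per row.
from itertools import product

def parity(bits):
    return sum(bits) % 2 == 1

n = 3
table = {bits: parity(bits) for bits in product([0, 1], repeat=n)}
print(len(table))  # prints 8, i.e. 2**3 paths/leaves
```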

SLIDE 30

Decision Boundaries


SLIDE 31

Decision Boundaries for Real-Valued Features

Use Real-Valued Features with “Nice” Bounds.

  • Best used when labels occupy “axis-orthogonal” regions of input space.
SLIDE 32

Today’s Agenda

We have addressed:

  • 1. What are decision trees?
  • 2. What functions can we learn with decision trees?
SLIDE 33

Reading

  • Daumé, Chapter 1
SLIDE 34

Next Time

We will address:

  • 1. How do you train and test decision trees?
  • 2. How can decision trees generalize?
  • 3. What is the inductive bias of decision trees?