Decision Trees I
1 COSC 425: Intro. to Machine Learning
COSC 425: Introduction to Machine Learning Fall 2020 (CRN: 44874)
- Dr. Alex Williams
August 24, 2020

Today's Agenda
We will address:
1. What
student_id  course_type  course_location  difficulty  grade  …  rating
s1          ML           [missing]        easy        80     …  like
s1          Compilers    face-to-face     easy        87     …  like
s2          Compilers    face-to-face     hard        72     …  dislike
s3          OS           [missing]        hard        79     …  dislike
s3          Algorithms   [missing]        hard        85     …  dislike
s4          ML           [missing]        hard        66     …  like
…

Dataset (i.e., with input-output pairs)
Input Variables (Features): course_type, course_location, difficulty, grade, …
Output Variables (Targets): rating
Example / Instance: one row of the table
Training Data (input-output pairs) → Learning Algorithm
Testing Data (input-output pairs)
Prospective Course → Learned Function → Predicted Like
[Figure: decision tree with question nodes isCompilers?, isOnline?, isMorning?, isEasy? (yes/no branches) and leaves Like / Dislike.]
instance = a set of feature values
    e.g. <“Compilers”, “online”, “easy”, 80, …, “like”>
question = conditionals constructed based on features
    e.g. isOnline? isEasy? grade > 80? isTaughtByDrWilliams?
question answer = determined by feature values
    yes/no
    categorical (e.g. “online”, “face-to-face”, “hybrid”)
label / target class = “rating”
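The terminology above can be sketched in a few lines of Python. This is only an illustration: the dictionary keys reuse the dataset's column names, the feature values mirror the example instance, and the helper name `answer_all` is hypothetical.

```python
# Hypothetical instance: keys follow the dataset's column names and the
# values mirror the example instance on the slide.
instance = {
    "course_type": "Compilers",
    "course_location": "online",
    "difficulty": "easy",
    "grade": 80,
    "rating": "like",  # label / target class
}

# Questions are conditionals constructed from feature values.
questions = {
    "isOnline?": lambda x: x["course_location"] == "online",
    "isEasy?": lambda x: x["difficulty"] == "easy",
    "grade > 80?": lambda x: x["grade"] > 80,
}


def answer_all(x):
    """Return the yes/no answer to every question for instance x."""
    return {name: ("yes" if ask(x) else "no") for name, ask in questions.items()}


answers = answer_all(instance)
print(answers)
```

Note that a grade of exactly 80 answers "no" to the question grade > 80?, since the comparison is strict.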
We could enumerate all possible trees and evaluate each tree. … Okay. So, how many trees is that? Answer: Too many!
→ Finding the optimal tree is NP-hard. (See Hyafil and Rivest 1976.)
Alternative framing: “What is the one question that would be most helpful in estimating whether a student will enjoy a particular course?”
<“Compilers”, “online”, “easy”, 80, …, “like”>
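One way to make "the one most helpful question" concrete is to score each yes/no question by the training accuracy obtained when guessing the majority label on each side of the split. The sketch below assumes that scoring rule (other criteria, such as information gain, are also common) and uses made-up toy data, not the course dataset; the names `score` and `best_question` are hypothetical.

```python
from collections import Counter

# Made-up toy examples for illustration only: (answers to questions, label).
data = [
    ({"isCompilers": True,  "isOnline": False, "isEasy": True},  "like"),
    ({"isCompilers": True,  "isOnline": False, "isEasy": False}, "dislike"),
    ({"isCompilers": False, "isOnline": True,  "isEasy": True},  "like"),
    ({"isCompilers": False, "isOnline": True,  "isEasy": False}, "dislike"),
    ({"isCompilers": False, "isOnline": False, "isEasy": True},  "like"),
    ({"isCompilers": True,  "isOnline": True,  "isEasy": False}, "like"),
]


def majority_count(labels):
    """Number of examples a majority-vote guess gets right (0 if empty)."""
    return max(Counter(labels).values()) if labels else 0


def score(question, examples):
    """Training accuracy when asking only `question` and guessing the
    majority label separately on the yes side and the no side."""
    yes = [label for x, label in examples if x[question]]
    no = [label for x, label in examples if not x[question]]
    return (majority_count(yes) + majority_count(no)) / len(examples)


def best_question(examples):
    """Greedy choice: the single question with the highest score."""
    candidates = examples[0][0].keys()
    return max(candidates, key=lambda q: score(q, examples))


print(best_question(data))
```

Applying the same choice recursively to the yes and no subsets grows a tree one question at a time, which avoids enumerating every possible tree but does not guarantee the optimal one.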
[Figure labels: easy, Compilers, morning]
The set of all hypotheses that can be “spat out” by a learning algorithm is called the hypothesis space. (Daumé, pg. 9)
Each instance is a feature vector. y = 1 if a student likes the course; otherwise, y = 0.
Each hypothesis is a decision tree!
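As a minimal sketch of "each hypothesis is a decision tree", a tree can be stored as nested tuples and evaluated by walking from the root to a leaf. The tuple layout, the particular tree, and the function name `predict` are assumptions made for illustration.

```python
# Hypothetical tree: an internal node is (question, yes_subtree, no_subtree)
# and a leaf is the predicted y (1 = likes the course, 0 = does not).
tree = ("isEasy", 1, ("isOnline", 1, 0))


def predict(node, x):
    """Follow x's answers from the root until a leaf (an int) is reached."""
    while isinstance(node, tuple):
        question, yes_branch, no_branch = node
        node = yes_branch if x[question] else no_branch
    return node


# A feature vector here is a dict of yes/no answers.
print(predict(tree, {"isEasy": False, "isOnline": True}))
```

Every distinct tuple structure is a different hypothesis, so the hypothesis space is the set of all such trees over the available questions.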
Example: Weather Prediction
radius  texture  perimeter  …  (target)
18.02   27.6     117.5      …  N
17.99   10.38    122.8      …  N
20.29   14.34    135.1      …  R
…       …        …             …

Input Variables (Features): radius, texture, perimeter, …
Output Variables (Targets): the N / R column
Example / Instance: one row of the table
A partitioning of the input space.
Remember the following:
Internal Nodes: A test or question.
Leaf Nodes: Include instances that satisfy the tests along the branch.
Predict a person’s interest in skiing or snowboarding.
[Figure: plot of Height (H) against Width (W) with the thresholds 20, 125, and 148 marked, next to a decision tree with the tests H > 148, W > 20, W < 17, and H > 125 (yes/no branches) and leaves Ski / Snowboarder (SB).]
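To illustrate how threshold tests on H and W partition the input space, here is a hedged Python sketch. The exact branch layout of the slide's tree is not recoverable from the text, so the arrangement below is an assumption that reuses only three of the tests (H > 148, W > 20, H > 125); the node format and the names `predict` and `leaf_regions` are hypothetical.

```python
import math

# Assumed arrangement of some of the slide's tests (illustrative only).
# Internal node: (feature, threshold, subtree_if_greater, subtree_otherwise)
# Leaf: a label string.
tree = ("H", 148, "Ski",
        ("W", 20, "SB",
         ("H", 125, "Ski", "SB")))


def predict(node, point):
    """Follow the threshold tests until a leaf label is reached."""
    while isinstance(node, tuple):
        feature, threshold, greater, otherwise = node
        node = greater if point[feature] > threshold else otherwise
    return node


def leaf_regions(node, bounds=None):
    """Yield (label, bounds) for every leaf. bounds[f] = (low, high)
    means the leaf covers points with low < f <= high."""
    if bounds is None:
        bounds = {"H": (-math.inf, math.inf), "W": (-math.inf, math.inf)}
    if not isinstance(node, tuple):
        yield node, bounds
        return
    feature, threshold, greater, otherwise = node
    low, high = bounds[feature]
    # "Greater" side raises the lower bound; the other side lowers the upper.
    yield from leaf_regions(greater, {**bounds, feature: (max(low, threshold), high)})
    yield from leaf_regions(otherwise, {**bounds, feature: (low, min(high, threshold))})


regions = list(leaf_regions(tree))
for label, box in regions:
    print(label, box)
```

Each leaf corresponds to one axis-aligned rectangle in the (H, W) plane, and every point falls into exactly one of them, which is what "a partitioning of the input space" means here.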
See Ishwaran H. and Rao J.S. (2009)
→ f(x) ∈ { category1, category2, …, categoryN }