

SLIDE 1

Decision Trees I

COSC 425: Introduction to Machine Learning, Fall 2020 (CRN: 44874)

Dr. Alex Williams

August 24, 2020

SLIDE 2

SLIDE 3

Today’s Agenda

We will address:

  • 1. What are decision trees?
  • 2. What functions can we learn with decision trees?
SLIDE 4

  • 1. What are Decision Trees?
SLIDE 5

Types of Machine Learning

  • Supervised Learning
  • Unsupervised Learning
  • Reinforcement Learning

SLIDE 6

Suppose you have data about students’ preferences for courses at UTK.

Decision Trees: Example Data

student_id  course_type  course_location  difficulty  grade  …  rating
s1          ML           online           easy        80     …  like
s1          Compilers    face-to-face     easy        87     …  like
s2          Compilers    face-to-face     hard        72     …  dislike
s3          OS           online           hard        79     …  dislike
s3          Algorithms   online           hard        85     …  dislike
s4          ML           online           hard        66     …  like
…

Dataset (i.e., with input-output pairs): input variables (features) and an output variable (target: “rating”). Each row is an example / instance.

SLIDE 7

Training Data: input-output pairs (xi, yi) are fed to a Learning Algorithm, which outputs a learned function f.

Testing Data: apply f to a new input x to predict its output y, i.e., f(x) = y.

Goal: Predict whether a student will like a course.
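The pipeline above can be sketched end to end. The learner below is a deliberately trivial stand-in (it just returns the majority training label), and the names and data are made up for illustration; it only shows the shape of the interface: training pairs in, learned function f out.

```python
# Minimal sketch of the supervised-learning pipeline: a learning
# algorithm consumes input-output pairs (x_i, y_i) and returns a
# function f, which is then applied to unseen test inputs.
from collections import Counter

def learn(training_pairs):
    """A trivial 'learning algorithm': always predict the majority label."""
    labels = [y for _, y in training_pairs]
    majority = Counter(labels).most_common(1)[0][0]
    return lambda x: majority  # the learned function f

# Made-up training data in the spirit of the course example.
train = [({"course_type": "ML"}, "like"),
         ({"course_type": "OS"}, "dislike"),
         ({"course_type": "ML"}, "like")]

f = learn(train)                          # training phase
print(f({"course_type": "Compilers"}))    # testing phase: prints "like"
```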

SLIDE 8

Decision Trees: Questions

You: Is the course a Compilers course? Goal: Predict whether a student will like a course. Me: Yes.

f x f(x)

Prospective Course Predicted Like Learned Function

1 2 3

You: Is the course online? Me: Yes. You: Were past online courses difficult? Me: Yes. You: I predict the student will not like this course.

SLIDE 9

Decision Trees: Questions

Goal: Predict whether a student will like a course. The learned function f maps a prospective course x to a predicted like/dislike f(x).

  1. You: Is the course a Compilers course? Me: Yes.
  2. You: Is the course online? Me: Yes.
  3. You: Has the student liked most online courses? Me: No.

You: I predict the student will not like this course.

Prediction is about finding questions that matter.

SLIDE 10

Decision Trees: Questions

Learned function f: a prospective course x maps to a predicted like/dislike f(x).

[Figure: a decision tree asking isCompilers?, isOnline?, isMorning?, and isEasy?, with yes/no branches leading to Like and Dislike leaves.]

SLIDE 11

From Questions to Learning

Terminology for Decision Trees

  • instance = a set of feature values, e.g. <“Compilers”, “online”, “easy”, 80, …, “like”>
  • question = a conditional constructed from the features: isOnline? isEasy? grade > 80? isTaughtByDrWilliams?
  • question answer = determined by the feature values: yes/no, or categorical (e.g. “online”, “face-to-face”, “hybrid”)
  • label / target = the class being predicted: “rating”

SLIDE 12

From Questions to Learning

Learning is concerned with finding the “best” tree for the data.

We could enumerate all possible trees and evaluate each one. … Okay. So, how many trees is that? Answer: Too many! Finding the optimal tree is NP-hard (see Hyafil and Rivest, 1976).

Thus: We greedily ask, “If I could ask only one question, what would it be?”

Alternative framing: “What is the one question that would be most helpful in estimating whether a student will enjoy a particular course?”

SLIDE 13

From Questions to Learning

Decision Trees Split Your Data

  • Each node represents a question that splits your data.
  • Decision tree learning = choosing what internal nodes should be.

Questions are Conditionals

  • Grade > 80
  • Grade in [80, 90]
  • Location in {“online”, “hybrid”, “face-to-face”}
  • Teacher is DR_WILLIAMS
  • MLgrade * 2 + COMPILERgrade * 3
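The conditionals above can be sketched as predicates over an instance's feature values; asking one at a node simply partitions the rows by their answer. The rows and feature names below are made up to mirror the course example.

```python
# Sketch: a "question" is a conditional over feature values, and asking
# it splits the data into the rows that answer yes and those that answer no.

def split(rows, question):
    yes = [r for r in rows if question(r)]
    no = [r for r in rows if not question(r)]
    return yes, no

rows = [
    {"grade": 85, "location": "online", "rating": "like"},
    {"grade": 72, "location": "face-to-face", "rating": "dislike"},
    {"grade": 66, "location": "online", "rating": "like"},
]

grade_over_80 = lambda r: r["grade"] > 80        # "Grade > 80"
is_online = lambda r: r["location"] == "online"  # a location test

yes, no = split(rows, grade_over_80)
print(len(yes), len(no))  # prints "1 2"
```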
SLIDE 14

From Questions to Learning

Distribution of Like/Dislike labels for each question.

[Figure: label distributions for the questions easy?, Compilers?, morning?, and online?]
SLIDE 15

From Questions to Learning

[Figure: label distributions for easy? (informative) and Compilers? (uninformative).]
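One standard way to make "informative vs. uninformative" precise is information gain: how much a question's split reduces the entropy of the label distribution. The labels below are made up for illustration, not the slide's data.

```python
# Sketch: score a question by how much purer its branches' label
# distributions are than the parent's (entropy reduction).
from collections import Counter
from math import log2

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * log2(c / total)
                for c in Counter(labels).values())

def information_gain(labels, yes_labels, no_labels):
    p_yes = len(yes_labels) / len(labels)
    return (entropy(labels)
            - p_yes * entropy(yes_labels)
            - (1 - p_yes) * entropy(no_labels))

parent = ["like", "like", "dislike", "dislike"]

# An informative question separates the labels perfectly:
print(information_gain(parent, ["like", "like"], ["dislike", "dislike"]))  # 1.0
# An uninformative question leaves each branch as mixed as the parent:
print(information_gain(parent, ["like", "dislike"], ["like", "dislike"]))  # 0.0
```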

SLIDE 16

  • 2. What Functions Can We Learn?
SLIDE 17

Supervised Learning: Theory

Problem Setting:

  • Set of possible instances: X
  • Unknown target function: f: X → Y
  • Set of function hypotheses: H = { h | h: X → Y }

(Daumé, pg. 9)

The Learning Algorithm:

  • Input: training examples { (xi, yi) }
  • Output: a hypothesis h ∈ H that best approximates the target function f

The set of all hypotheses that can be “spat out” by a learning algorithm is called the hypothesis space.

SLIDE 18

Supervised Learning: Theory

Problem Setting:

  • Set of possible instances: each instance x is a feature vector.
  • Unknown target function: y = 1 if a student likes the course; otherwise, y = 0.
  • Set of function hypotheses: each hypothesis is a decision tree!

The Learning Algorithm:

  • Input: training examples
  • Output: a hypothesis that best approximates the target function

SLIDE 19

Trees as Functions: Boolean Logic

Example: Weather Prediction

Translate the Tree to Boolean Logic
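The slide's weather tree itself did not survive extraction, so the sketch below uses a hypothetical two-feature tree to show the translation: a decision tree over Boolean features computes a Boolean formula, one conjunction per path that reaches a "yes" leaf.

```python
# Hypothetical tree:        sunny?
#                          /      \
#                       yes        no
#                        |          |
#                     windy?      False
#                     /    \
#                  yes      no
#                   |        |
#                 False     True
def tree_predict(sunny, windy):
    if sunny:
        return not windy
    return False

# The same function written as Boolean logic: sunny AND (NOT windy).
def boolean_formula(sunny, windy):
    return sunny and not windy

for sunny in (True, False):
    for windy in (True, False):
        assert tree_predict(sunny, windy) == boolean_formula(sunny, windy)
print("tree and formula agree on all four inputs")
```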

SLIDE 20

Example: Cancer Recurrence Prediction

Output variable: N = No Recurrence; R = Recurrence

radius  texture  perimeter  …  outcome
18.02   27.6     117.5      …  N
17.99   10.38    122.8      …  N
20.29   14.34    135.1      …  R
…       …        …          …  …

Input variables (features); output variable (target); each row is an example / instance.

SLIDE 21

Example: Cancer Recurrence Prediction

What does a node represent? A partitioning of the input space.

Internal Nodes: a test or question.

  • Discrete features: branch on each possible value.
  • Continuous (real-valued) features: branch on a threshold value.

Leaf Nodes: contain the instances that satisfy the tests along the branch.

Remember the following:

  • Each instance maps to exactly one leaf.
  • Each leaf typically contains more than one example.
SLIDE 22


SLIDE 23

Example: Cancer Recurrence Prediction

Conversion: Decision trees translate to sets of if-then rules.
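The conversion can be sketched directly: each root-to-leaf path becomes one if-then rule. The thresholds below are hypothetical (chosen only to be consistent with the example rows shown earlier), not the tree actually learned on this dataset.

```python
# Sketch: a tiny recurrence tree written out as its equivalent rule set.
def predict_recurrence(radius, texture):
    # Rule 1: IF radius > 19 THEN R
    if radius > 19:
        return "R"
    # Rule 2: IF radius <= 19 AND texture > 30 THEN R
    if texture > 30:
        return "R"
    # Rule 3: otherwise N
    return "N"

print(predict_recurrence(radius=20.29, texture=14.34))  # prints "R"
print(predict_recurrence(radius=18.02, texture=27.6))   # prints "N"
```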

SLIDE 24

Example: Cancer Recurrence Prediction

Conversion: Decision trees can represent probability of recurrence.
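A sketch of the probabilistic reading: keep per-class counts of the training examples that reach each leaf, and report the recurrence fraction instead of a hard label. The leaf names and counts are made up.

```python
# Each leaf stores counts of the training examples that landed in it;
# the predicted probability of recurrence is the leaf's R-fraction.
leaf_counts = {
    "leaf_a": {"R": 3, "N": 7},
    "leaf_b": {"R": 9, "N": 1},
}

def p_recurrence(leaf):
    counts = leaf_counts[leaf]
    return counts["R"] / (counts["R"] + counts["N"])

print(p_recurrence("leaf_a"))  # 0.3
print(p_recurrence("leaf_b"))  # 0.9
```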

SLIDE 25

Decision Trees: Interpretation

Important: Decision trees form boundaries in your data.

SLIDE 26

Decision Trees: Interpretation

Task: Predict whether a person is a skier (+) or a snowboarder (-) from their Height (H) and Width (W).

[Figure: a scatter plot of skiers (+) and snowboarders (-) in the Width-Height plane, partitioned by a decision tree with threshold tests H > 148, W > 20, W < 17, and H > 125; the leaves predict Ski or SB, and each test draws an axis-aligned boundary in the plane.]
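The tests in the figure can be sketched as code. The thresholds (148, 20, 17, 125) come from the slide, but the branch structure is a plausible reconstruction, since the original figure did not survive extraction; the point is that every node compares a single feature to a threshold, so each split adds one axis-aligned boundary in the (W, H) plane.

```python
# Plausible reconstruction of the slide's skier/snowboarder tree.
def predict(height, width):
    if height > 148:
        return "Ski"   # very tall region -> skier
    if width > 20:
        return "SB"    # wide region -> snowboarder
    if width < 17:
        return "SB"
    return "Ski" if height > 125 else "SB"

print(predict(height=180, width=25))  # prints "Ski" (first test fires)
print(predict(height=140, width=25))  # prints "SB"
```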

SLIDE 27

Decision Trees: Interpretation

See Ishwaran H. and Rao J.S. (2009)

SLIDE 28

Hypothesis Space

For decision trees, the hypothesis space is the set of all possible finite discrete functions that can be learned from the data: f(x) ∈ { category1, category2, …, categoryN }. Every finite discrete function can be represented by some decision tree. (… Hence the need to be greedy!)

SLIDE 29

Hypothesis Space

Boolean functions can be fully expressed as decision trees.

  • Each entry in a truth table can be one path. (Inefficient!)
  • Most Boolean functions can be encoded more compactly.

Note: Encoding Challenges

  • Some functions demand exponentially large decision trees to represent.
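A quick illustration of the exponential case: n-bit parity flips its value whenever any single input flips, so a decision tree for parity must test every variable on every path, ending up with one leaf per truth-table row, i.e. 2^n leaves.

```python
# Sketch: enumerate the truth table of 3-bit parity; a decision tree
# for this function needs one path (and leaf) per row.
from itertools import product

def parity(bits):
    return sum(bits) % 2 == 1

n = 3
table = {bits: parity(bits) for bits in product([0, 1], repeat=n)}
print(len(table))  # prints 8, i.e. 2**3 paths/leaves
```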

SLIDE 30

Decision Boundaries


SLIDE 31

Decision Boundaries for Real-Valued Features

Use Real-Valued Features with “Nice” Bounds.

  • Best used when labels occupy “axis-orthogonal” regions of input space.
SLIDE 32

Today’s Agenda

We have addressed:

  • 1. What are decision trees?
  • 2. What functions can we learn with decision trees?
SLIDE 33

Reading

  • Daumé, Chapter 1
SLIDE 34

Next Time

We will address:

  • 1. How do you train and test decision trees?
  • 2. How can decision trees generalize?
  • 3. What is the inductive bias of decision trees?