

1. Decision Trees I. Dr. Alex Williams. August 24, 2020. COSC 425: Introduction to Machine Learning, Fall 2020 (CRN: 44874).

2. (Title/image slide; no text.)

3. Today's Agenda. We will address: 1. What are decision trees? 2. What functions can we learn with decision trees?

4. 1. What are Decision Trees?

5. Types of Machine Learning: Supervised Learning, Unsupervised Learning, Reinforcement Learning.

6. Decision Trees: Example Data. Suppose you have data about students' preferences for courses at UTK. Each row is an example/instance; together the rows form a dataset of input-output pairs, with input variables (features) and an output variable (target):

student_id  course_type  course_location  difficulty  grade  …  rating
s1          ML           online           easy        80     …  like
s1          Compilers    face-to-face     easy        87     …  like
s2          Compilers    face-to-face     hard        72     …  dislike
s3          OS           online           hard        79     …  dislike
s3          Algorithms   online           hard        85     …  dislike
s4          ML           online           hard        66     …  like

7. Goal: Predict whether a student will like a course. (Pipeline diagram: training data, a set of input-output pairs (x_i, y_i), feeds a learning algorithm, which outputs a function f; on testing data, a new input x yields the prediction f(x) = y.)
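A minimal sketch of this pipeline in Python with scikit-learn, using the data from slide 6; the one-hot encoding and the DecisionTreeClassifier defaults are illustrative assumptions, not part of the lecture:

```python
# Sketch: train a decision tree f on input-output pairs, then predict f(x).
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Training data: the course-preference examples from slide 6.
train = pd.DataFrame({
    "course_type":     ["ML", "Compilers", "Compilers", "OS", "Algorithms", "ML"],
    "course_location": ["online", "face-to-face", "face-to-face", "online", "online", "online"],
    "difficulty":      ["easy", "easy", "hard", "hard", "hard", "hard"],
    "grade":           [80, 87, 72, 79, 85, 66],
    "rating":          ["like", "like", "dislike", "dislike", "dislike", "like"],
})

# One-hot encode the categorical features so the tree can branch on them.
X = pd.get_dummies(train.drop(columns="rating"))
y = train["rating"]

# Learning algorithm: fit f on the input-output pairs (x_i, y_i).
f = DecisionTreeClassifier().fit(X, y)

# Prediction: apply f to a new input x, encoded with the same columns.
x_new = pd.get_dummies(pd.DataFrame({
    "course_type": ["Compilers"], "course_location": ["online"],
    "difficulty": ["hard"], "grade": [75],
})).reindex(columns=X.columns, fill_value=0)
print(f.predict(x_new))  # e.g. ['dislike']
```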

8. Decision Trees: Questions. Goal: Predict whether a student will like a course. (Diagram: a prospective course x enters the learned function f, which outputs a prediction f(x).) The learned function behaves like a series of questions:
You: Is the course a Compilers course? Me: Yes.
You: Is the course online? Me: Yes.
You: Were past online courses difficult? Me: Yes.
You: I predict the student will not like this course.

9. Decision Trees: Questions. Goal: Predict whether a student will like a course. Prediction is about finding questions that matter:
You: Is the course a Compilers course? Me: Yes.
You: Is the course online? Me: Yes.
You: Has the student liked most online courses? Me: No.
You: I predict the student will not like this course.

10. Decision Trees: Questions. The questions arrange into a tree:

isCompilers?
  yes → Dislike
  no  → isOnline?
    yes → isMorning?
      yes → Like
      no  → Dislike
    no  → isEasy?
      yes → Like
      no  → Dislike
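That tree is just nested conditionals; a minimal sketch in Python (the function and argument names are mine, for illustration):

```python
# The slide-10 tree written as nested if/else.
def predict_rating(is_compilers: bool, is_online: bool,
                   is_easy: bool, is_morning: bool) -> str:
    if is_compilers:
        return "dislike"
    if is_online:
        return "like" if is_morning else "dislike"
    return "like" if is_easy else "dislike"

print(predict_rating(is_compilers=True, is_online=True,
                     is_easy=True, is_morning=True))  # dislike
```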

11. From Questions to Learning: Terminology for Decision Trees.
- instance: a set of feature values, e.g. <"Compilers", "online", "easy", 80, …, "like">
- question: a conditional constructed from features, e.g. isOnline? isEasy? grade > 80? isTaughtByDrWilliams?
- question answer: determined by the feature values; yes/no or categorical (e.g. "online", "face-to-face", "hybrid")
- label / target class: the "rating" column

12. From Questions to Learning. Learning is concerned with finding the "best" tree for the data. We could enumerate all possible trees and evaluate each one. Okay, so how many trees is that? Answer: too many! Finding the optimal tree is NP-hard (see Hyafil and Rivest, 1976). Thus, we greedily ask: "If I could ask one question, what would it be?" Alternative framing: "What is the one question that would be most helpful in estimating whether a student will enjoy a particular course?"
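For intuition about "too many", a back-of-envelope count, assuming n binary features (an assumption, not a number from the slide): every labeling of the 2^n possible inputs is a distinct finite function, and each one is representable by some tree, so there are at least

```latex
\[
2^{\,2^{n}} \text{ candidate functions}, \qquad
n = 6 \;\Rightarrow\; 2^{64} \approx 1.8 \times 10^{19}.
\]
```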

13. From Questions to Learning: Decision Trees Split Your Data. Each node represents a question that splits your data.
- Decision tree learning = choosing what the internal nodes should be.
- Questions are conditionals, e.g.:
  - Grade > 80
  - Grade in [80, 90]
  - Location in {"online", "hybrid", "face-to-face"}
  - Teacher is DR_WILLIAMS
  - MLgrade * 2 + COMPILERgrade * 3
A sketch of generating such candidate questions from data follows below.
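One way candidate questions could be generated mechanically; the helper below is an illustrative sketch (rows as dicts keyed by feature name), not the lecture's algorithm:

```python
# Enumerate candidate split questions from a dataset.
def candidate_questions(rows, features):
    """Yield (description, predicate) pairs, one per candidate question."""
    for f in features:
        values = sorted({row[f] for row in rows})
        if all(isinstance(v, (int, float)) for v in values):
            # Continuous feature: thresholds midway between observed values.
            for lo, hi in zip(values, values[1:]):
                t = (lo + hi) / 2
                yield (f"{f} > {t}", lambda row, f=f, t=t: row[f] > t)
        else:
            # Discrete feature: one equality question per observed value.
            for v in values:
                yield (f"{f} == {v!r}", lambda row, f=f, v=v: row[f] == v)
```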

14. From Questions to Learning. (Figure: the distribution of Like/Dislike labels for each candidate question: morning, Compilers, online, easy.)

15. From Questions to Learning. (Figure: the Compilers and easy questions contrasted; one produces an uninformative split, the other an informative one.)
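"Informative" can be made precise with a purity measure. The lecture has not named one at this point; entropy and information gain, sketched below, are the standard choice:

```python
# Score a question by how much it reduces label entropy (information gain).
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, predicate):
    """Reduction in label entropy from splitting rows on predicate."""
    yes = [l for r, l in zip(rows, labels) if predicate(r)]
    no  = [l for r, l in zip(rows, labels) if not predicate(r)]
    if not yes or not no:
        return 0.0  # the question does not split the data at all
    weighted = (len(yes) * entropy(yes) + len(no) * entropy(no)) / len(labels)
    return entropy(labels) - weighted

# An uninformative question leaves the like/dislike mix unchanged on both
# sides (gain near 0); an informative one yields purer sides (higher gain).
```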

16. 2. What Functions Can We Learn?

17. Supervised Learning: Theory. Problem setting:
- Set of possible instances
- Unknown target function
- Set of function hypotheses
The learning algorithm:
- Input: training examples
- Output: the hypothesis that best approximates the target function (Daumé, pg. 9)
The set of all hypotheses that can be "spat out" by a learning algorithm is called the hypothesis space.

18. Supervised Learning: Theory. Problem setting:
- Set of possible instances: each instance is a feature vector.
- Unknown target function: y = 1 if a student likes the course; otherwise, y = 0.
- Set of function hypotheses: each hypothesis is a decision tree!
The learning algorithm:
- Input: training examples
- Output: the hypothesis that best approximates the target function
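Written out with conventional symbols (a standard notation following texts like Daumé, rather than anything taken verbatim from the slide):

```latex
\begin{align*}
&\text{Set of possible instances: } X \\
&\text{Unknown target function: } f : X \to Y,\quad Y = \{0,1\},\
  y = 1 \text{ iff the student likes the course} \\
&\text{Set of function hypotheses: } H = \{\, h \mid h : X \to Y \,\}
  \text{ (here: all decision trees)} \\
&\text{Input: training examples } \{(x_i, y_i)\}_{i=1}^{N} \\
&\text{Output: } h \in H \text{ that best approximates } f
\end{align*}
```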

19. Trees as Functions: Boolean Logic. Translate the tree to Boolean logic. Example: weather prediction.
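The weather tree itself is not reproduced here, but the course tree from slide 10 translates the same way: each "Like" leaf contributes one conjunction of the tests along its path, and the whole tree is their disjunction:

```latex
\[
\textsf{Like} \;\Leftrightarrow\;
(\lnot \textsf{isCompilers} \land \textsf{isOnline} \land \textsf{isMorning})
\;\lor\;
(\lnot \textsf{isCompilers} \land \lnot \textsf{isOnline} \land \textsf{isEasy})
\]
```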

20. Example: Cancer Recurrence Prediction. Input variables (features) and an output variable (target); each row is an example/instance:

radius  texture  perimeter  …  outcome
18.02   27.6     117.5      …  N
17.99   10.38    122.8      …  N
20.29   14.34    135.1      …  R
…       …        …          …  …

N = No Recurrence; R = Recurrence.

21. Example: Cancer Recurrence Prediction. What does a node represent? A partitioning of the input space.
- Internal nodes: a test or question. Discrete features: branch on all values. Continuous features: branch on a threshold value.
- Leaf nodes: contain the instances that satisfy the tests along the branch.
Remember the following: each instance maps to a particular leaf, and each leaf typically contains more than one example.

22. Example: Cancer Recurrence Prediction. (Repeats the notes from slide 21 alongside the data table from slide 20.)
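A sketch of the two branching rules as code; the helper names and the row representation (dicts keyed by feature name) are illustrative assumptions:

```python
# Each internal-node test partitions the instances reaching that node.
def split_continuous(rows, feature, threshold):
    """Branch on a threshold value (e.g. radius > 18.0)."""
    return ([r for r in rows if r[feature] > threshold],   # yes-branch
            [r for r in rows if r[feature] <= threshold])  # no-branch

def split_discrete(rows, feature):
    """Branch on all values of a discrete feature (one child per value)."""
    children = {}
    for r in rows:
        children.setdefault(r[feature], []).append(r)
    return children
```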

23. Example: Cancer Recurrence Prediction. Conversion: decision trees translate to sets of if-then rules.
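For example, each root-to-leaf path becomes one rule (a conjunction of the tests along the path). The tree and thresholds below are hypothetical, chosen only to match the features above, not the tree on the slide:

```python
# A hypothetical two-test tree rewritten as mutually exclusive if-then rules.
def outcome(radius: float, texture: float) -> str:
    if radius > 18.0 and texture > 20.0:
        return "R"   # recurrence
    if radius > 18.0 and texture <= 20.0:
        return "N"   # no recurrence
    return "N"       # radius <= 18.0
```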

24. Example: Cancer Recurrence Prediction. Conversion: decision trees can represent the probability of recurrence.
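The usual reading, assuming the standard frequency estimate at a leaf:

```latex
\[
\hat{P}(\text{R} \mid \text{leaf}) =
\frac{\#\{\text{training examples in the leaf with outcome R}\}}
     {\#\{\text{training examples in the leaf}\}}
\]
```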

25. Decision Trees: Interpretation. Important: decision trees form boundaries in your data.

26. Decision Trees: Interpretation. Predict a person's interest in skiing or snowboarding from Height (H) and Width (W). (Scatter plot: + = Skier, − = Snowboarder; splits at W = 17, W = 20, H = 125, and H = 148 carve the plane into axis-aligned regions.) The tree:

H > 148?
  yes → Ski
  no  → W > 20?
    yes → SB
    no  → W < 17?
      yes → SB
      no  → H > 125?
        yes → Ski
        no  → SB
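As code, under one consistent reading of the figure; treat the exact leaf labels as an assumption:

```python
# Classify a person as skier ("Ski") or snowboarder ("SB") from
# height H and width W, following the slide-26 thresholds.
def ski_or_snowboard(H: float, W: float) -> str:
    if H > 148:
        return "Ski"
    if W > 20 or W < 17:
        return "SB"
    return "Ski" if H > 125 else "SB"
```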

27. Decision Trees: Interpretation. See Ishwaran, H. and Rao, J.S. (2009).

28. Hypothesis Space. For decision trees, the hypothesis space is the set of all possible finite discrete functions that can be learned from the data: f(x) ∈ {category_1, category_2, …, category_N}. Every finite discrete function can be represented by some decision tree (hence the need to be greedy!).

29. Hypothesis Space. Note: encoding challenges.
- Some functions demand exponentially large decision trees to represent.
- Boolean functions can be fully expressed as decision trees: each entry in a truth table can be one path (inefficient!).
- Most Boolean functions can be encoded more compactly.
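A standard worst-case example (not on the slide) is parity: its value flips with every feature, so any tree that tests one feature per node must query all n features on every path, giving a full truth-table tree:

```latex
\[
f(x_1,\dots,x_n) = x_1 \oplus x_2 \oplus \cdots \oplus x_n
\quad\Longrightarrow\quad 2^{n} \text{ leaves are required.}
\]
```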

30. Decision Boundaries. (Repeats the encoding-challenges notes from slide 29.)

31. Decision Boundaries for Real-Valued Features. Use real-valued features with "nice" bounds: decision trees are best used when the labels occupy "axis-orthogonal" (axis-aligned) regions of the input space.

32. Today's Agenda. We have addressed: 1. What are decision trees? 2. What functions can we learn with decision trees?

33. Reading: Daumé, Chapter 1.
