BBM406 Fundamentals of Machine Learning, Lecture 18: Decision Trees


SLIDE 1

BBM406
Fundamentals of Machine Learning

Lecture 18: Decision Trees

Aykut Erdem // Hacettepe University // Fall 2019

Photo by Unsplash user @technobulka

SLIDE 2

Today

  • Decision Trees
  • Tree construction
  • Overfitting
  • Pruning
  • Real-valued inputs

SLIDE 3

Machine Learning in the ER

[Figure: an ER timeline from T=0 through 30 min to 2 hrs, ending in a disposition, annotated with the data generated along the way: triage information (free text), lab results (continuous valued), MD comments (free text), specialist consults, physician documentation, and repeated vital signs (continuous values, measured every 30 s).]

slide by David Sontag
SLIDE 4

Can we predict infection?

[Figure: the same ER timeline and data sources as the previous slide.]

Many crucial decisions about a patient’s care are made here!

slide by David Sontag
SLIDE 5

Can we predict infection?

  • Previous automatic approaches were based on simple criteria:
  • Temperature < 96.8 °F or > 100.4 °F
  • Heart rate > 90 beats/min
  • Respiratory rate > 20 breaths/min

  • Too simplified... e.g., heart rate depends on age!

slide by David Sontag
SLIDE 6

Can we predict infection?

  • These are the attributes we have for each patient:
  • Temperature
  • Heart rate (HR)
  • Respiratory rate (RR)
  • Age
  • Acuity and pain level
  • Diastolic and systolic blood pressure (DBP, SBP)
  • Oxygen saturation (SaO2)

  • We have these attributes + label (infection) for 200,000 patients!

  • Let’s learn to classify infection

slide by David Sontag
SLIDE 7

Predicting infection using decision trees

slide by David Sontag
SLIDE 8

Example: Image Classification

[Criminisi et al., 2011]

slide by Nando de Freitas
SLIDE 9

Example: Mushrooms

http://www.usask.ca/biology/fungi/

slide by Jerry Zhu
SLIDE 10

Mushroom features

  • 1. cap-shape: bell=b, conical=c, convex=x, flat=f, knobbed=k, sunken=s
  • 2. cap-surface: fibrous=f, grooves=g, scaly=y, smooth=s
  • 3. cap-color: brown=n, buff=b, cinnamon=c, gray=g, green=r, pink=p, purple=u, red=e, white=w, yellow=y
  • 4. bruises?: bruises=t, no=f
  • 5. odor: almond=a, anise=l, creosote=c, fishy=y, foul=f, musty=m, none=n, pungent=p, spicy=s
  • 6. gill-attachment: attached=a, descending=d, free=f, notched=n
  • 7. ...

slide by Jerry Zhu
SLIDE 11

Two mushrooms

x1 = x,s,n,t,p,f,c,n,k,e,e,s,s,w,w,p,w,o,p,k,s,u   y1 = p (poisonous)
x2 = x,s,y,t,a,f,c,b,k,e,c,s,s,w,w,p,w,o,p,n,n,g   y2 = e (edible)

  • 1. cap-shape: bell=b, conical=c, convex=x, flat=f, knobbed=k, sunken=s
  • 2. cap-surface: fibrous=f, grooves=g, scaly=y, smooth=s
  • 3. cap-color: brown=n, buff=b, cinnamon=c, gray=g, green=r, pink=p, purple=u, red=e, white=w, yellow=y
  • 4. …
slide by Jerry Zhu
SLIDE 12

Example: Automobile Miles-per-gallon prediction

mpg   cylinders  displacement  horsepower  weight  acceleration  modelyear  maker
good  4          low           low         low     high          75to78     asia
bad   6          medium        medium      medium  medium        70to74     america
bad   4          medium        medium      medium  low           75to78     europe
bad   8          high          high        high    low           70to74     america
bad   6          medium        medium      medium  medium        70to74     america
bad   4          low           medium      low     medium        70to74     asia
bad   4          low           medium      low     low           70to74     asia
bad   8          high          high        high    low           75to78     america
:     :          :             :           :       :             :          :
bad   8          high          high        high    low           70to74     america
good  8          high          medium      high    high          79to83     america
bad   8          high          high        high    low           75to78     america
good  4          low           low         low     low           79to83     america
bad   6          medium        medium      medium  high          75to78     america
good  4          medium        low         low     low           79to83     america
good  4          low           low         medium  high          79to83     america
bad   8          high          high        high    low           70to74     america
good  4          low           medium      low     medium        75to78     europe
bad   5          medium        medium      medium  medium        75to78     europe

slide by Jerry Zhu
SLIDE 13

Hypotheses: decision trees f : X → Y

[Figure: a decision tree for the MPG data. The root tests Cylinders (3, 4, 5, 6, 8); the 4-cylinder branch leads to a test on Maker (america, asia, europe) and the 8-cylinder branch to a test on Horsepower (low, med, high); all other branches end directly in good/bad leaves.]

Human interpretable!

  • Each internal node tests an attribute xi
  • Each branch assigns an attribute value xi = v
  • Each leaf assigns a class y

  • To classify input x: traverse the tree from root to leaf, output the label y
slide by David Sontag
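To make the traversal concrete, here is a minimal Python sketch of a categorical decision tree and the root-to-leaf classification loop. It is illustrative only, not the lecture's own code; in particular the leaf labels in `mpg_tree` are made up, and only the tree's shape follows the slide.

```python
# A minimal sketch of a categorical decision tree, assuming each example
# is a dict mapping attribute names to values.

class Leaf:
    def __init__(self, label):
        self.label = label              # class y assigned at this leaf

class Node:
    def __init__(self, attribute, children):
        self.attribute = attribute      # attribute x_i tested at this node
        self.children = children        # dict: attribute value v -> subtree

def classify(tree, x):
    """Traverse from root to leaf, output the label y."""
    while isinstance(tree, Node):
        tree = tree.children[x[tree.attribute]]
    return tree.label

# Same shape as the slide's tree: root tests cylinders; the 4- and
# 8-cylinder branches test maker and horsepower. Leaf labels illustrative.
mpg_tree = Node("cylinders", {
    3: Leaf("good"), 5: Leaf("bad"), 6: Leaf("bad"),
    4: Node("maker", {"america": Leaf("bad"),
                      "asia": Leaf("good"),
                      "europe": Leaf("good")}),
    8: Node("horsepower", {"low": Leaf("bad"),
                           "med": Leaf("good"),
                           "high": Leaf("bad")}),
})

print(classify(mpg_tree, {"cylinders": 4, "maker": "asia"}))  # -> good
```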
SLIDE 14

Hypothesis space

  • How many possible hypotheses?

  • What functions can be represented?

[Figure: the same MPG decision tree as the previous slide.]

slide by David Sontag
SLIDE 15

What functions can be represented?

  • Decision trees can represent any function of the input attributes!

  • For Boolean functions, the path to a leaf gives a truth table row

  • But, could require exponentially many nodes…

A  B  A xor B
F  F  F
F  T  T
T  F  T
T  T  F

[Figure: a decision tree computing A xor B (figure from Stuart Russell), and the MPG tree from slide 13, whose "good" class corresponds to:]

cyl=3 ∨ (cyl=4 ∧ (maker=asia ∨ maker=europe)) ∨ …

slide by David Sontag
SLIDE 16

Are all decision trees equal?

  • Many trees can represent the same concept
  • But, not all trees will have the same size
  • e.g., φ = (A ∧ B) ∨ (¬A ∧ C), i.e. (A and B) or (not A and C)

[Figure: two decision trees for φ: a small one that tests A at the root and then just B or C, and a larger one that tests B and C first and must re-test A in several subtrees.]

  • Which tree do we prefer?

slide by David Sontag
SLIDE 17

Learning decision trees is hard!!!

  • Learning the simplest (smallest) decision tree is an NP-complete problem [Hyafil & Rivest ’76]

  • Resort to a greedy heuristic:
  • Start from empty decision tree
  • Split on next best attribute (feature)
  • Recurse

slide by David Sontag
SLIDE 18

A Decision Stump

Internal node question: "What is the number of cylinders?" Leaves: classify by majority vote.

slide by Jerry Zhu
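A stump like this is easy to write down directly. The sketch below is a hypothetical implementation, assuming examples are attribute dicts as in the earlier snippet: group the records by their answer to one question and label each group by majority vote.

```python
from collections import Counter, defaultdict

def majority_label(labels):
    """Majority vote among a list of class labels."""
    return Counter(labels).most_common(1)[0][0]

def decision_stump(X, y, attribute):
    """Map each value of `attribute` to the majority label among the
    training records that take that value (the stump's leaves)."""
    groups = defaultdict(list)
    for x, label in zip(X, y):
        groups[x[attribute]].append(label)
    return {v: majority_label(labels) for v, labels in groups.items()}

# Toy MPG-style records:
X = [{"cylinders": 4}, {"cylinders": 4}, {"cylinders": 8}, {"cylinders": 8}]
y = ["good", "good", "bad", "bad"]
print(decision_stump(X, y, "cylinders"))  # {4: 'good', 8: 'bad'}
```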
SLIDE 19

Key idea: Greedily learn trees using recursion

Take the original dataset and partition it according to the value of the attribute we split on:

  • Records in which cylinders = 4
  • Records in which cylinders = 5
  • Records in which cylinders = 6
  • Records in which cylinders = 8

slide by David Sontag
SLIDE 20

Recursive Step

For each partition (the records in which cylinders = 4, 5, 6, and 8), build a tree from those records.

slide by David Sontag
SLIDE 21

Second level of tree

Recursively build a tree from the seven records in which there are four cylinders and the maker was based in Asia. (Similar recursion in the other cases.)

slide by David Sontag
SLIDE 22

The full decision tree

  • 1. Do not split when all examples have the same label
  • 2. Cannot split when we run out of questions

slide by Jerry Zhu
SLIDE 23

Splitting: Choosing a good attribute

  • Would we prefer to split on X1 or X2?

X1  X2  Y
T   T   T
T   F   T
T   T   T
T   F   T
F   T   T
F   F   F
F   T   F
F   F   F

Split on X1: X1=t gives Y=t: 4, Y=f: 0;  X1=f gives Y=t: 1, Y=f: 3
Split on X2: X2=t gives Y=t: 3, Y=f: 1;  X2=f gives Y=t: 2, Y=f: 2

Idea: use counts at leaves to define probability distributions, so we can measure uncertainty!

slide by David Sontag
SLIDE 24

Measuring uncertainty

  • Good split if we are more certain about classification after split
  • Deterministic good (all true or all false)
  • Uniform distribution bad
  • What about distributions in between?

Uniform:  P(Y=A) = 1/4,  P(Y=B) = 1/4,  P(Y=C) = 1/4,  P(Y=D) = 1/4
Peaked:   P(Y=A) = 1/2,  P(Y=B) = 1/4,  P(Y=C) = 1/8,  P(Y=D) = 1/8

slide by David Sontag
SLIDE 25

Entropy

  • Entropy H(Y) of a random variable Y:

    H(Y) = - Σ_y P(Y=y) log2 P(Y=y)

  • More uncertainty, more entropy!

  • Information Theory interpretation: H(Y) is the expected number of bits needed to encode a randomly drawn value of Y (under the most efficient code)

[Figure: entropy of a coin flip as a function of the probability of heads; it peaks at 1 bit when the coin is fair.]

slide by David Sontag
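As a quick sketch, the definition translates directly into code; probabilities are just class frequencies, and the 0.65 value below reappears in the worked example two slides ahead.

```python
import math

def entropy(probs):
    """H(Y) = -sum_y P(Y=y) log2 P(Y=y); 0 log 0 is taken as 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))   # fair coin flip: 1.0 bit
print(entropy([1.0]))        # deterministic variable: 0.0 bits
print(entropy([5/6, 1/6]))   # ~0.65, the running example
```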
SLIDE 26

High, Low Entropy

  • “High Entropy”
  • Y is from a uniform-like distribution
  • Flat histogram
  • Values sampled from it are less predictable

  • “Low Entropy”
  • Y is from a varied (peaks and valleys) distribution
  • Histogram has many lows and highs
  • Values sampled from it are more predictable

slide by Vibhav Gogate
SLIDE 27

Entropy Example

X1  X2  Y
T   T   T
T   F   T
T   T   T
T   F   T
F   T   T
F   F   F

P(Y=t) = 5/6,  P(Y=f) = 1/6
H(Y) = - 5/6 log2 5/6 - 1/6 log2 1/6 ≈ 0.65

slide by David Sontag
SLIDE 28

Conditional Entropy

Conditional entropy H(Y|X) of a random variable Y conditioned on a random variable X:

    H(Y|X) = Σ_v P(X=v) H(Y | X=v)

Example (using the table from the previous slide):

X1=t: Y=t: 4, Y=f: 0    X1=f: Y=t: 1, Y=f: 1
P(X1=t) = 4/6,  P(X1=f) = 2/6

H(Y|X1) = - 4/6 (1 log2 1 + 0 log2 0)
          - 2/6 (1/2 log2 1/2 + 1/2 log2 1/2)
        = 2/6

slide by David Sontag
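A sketch of the same computation in code, reusing `entropy` from the earlier snippet. The assumed data layout is two parallel lists: attribute values and labels.

```python
from collections import defaultdict

def conditional_entropy(xs, ys):
    """H(Y|X) = sum_v P(X=v) H(Y | X=v)."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)          # bucket labels by the value of X
    n = len(ys)
    total = 0.0
    for labels in groups.values():
        probs = [labels.count(c) / len(labels) for c in set(labels)]
        total += (len(labels) / n) * entropy(probs)
    return total

# The slide's six-record example:
x1 = ["T", "T", "T", "T", "F", "F"]
y  = ["T", "T", "T", "T", "T", "F"]
print(conditional_entropy(x1, y))    # 4/6 * 0 + 2/6 * 1 = 1/3
```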
SLIDE 29

Information gain

  • Decrease in entropy (uncertainty) after splitting:

    IG(X) = H(Y) - H(Y|X)

In our running example:
IG(X1) = H(Y) - H(Y|X1) = 0.65 - 0.33 = 0.32
IG(X1) > 0, so we prefer the split!

slide by David Sontag
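Combining the two previous helpers gives information gain directly; this sketch continues the previous snippet's `x1` and `y`.

```python
def information_gain(xs, ys):
    """IG(X) = H(Y) - H(Y|X)."""
    probs = [ys.count(c) / len(ys) for c in set(ys)]
    return entropy(probs) - conditional_entropy(xs, ys)

print(information_gain(x1, y))  # ~0.65 - 0.33 = 0.32 > 0: prefer the split
```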
SLIDE 30

Learning decision trees

  • Start from empty decision tree
  • Split on next best attribute (feature)
  • Use, for example, information gain to select the attribute, i.e. split on

    arg max_i IG(Xi) = arg max_i [ H(Y) - H(Y|Xi) ]

  • Recurse

slide by David Sontag
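Putting the pieces together, here is a sketch of the greedy recursive learner in the spirit of ID3, reusing `Leaf`, `Node`, `majority_label`, and `information_gain` from the earlier snippets. The two base cases anticipate slides 32-34; this is an illustrative sketch, not the lecture's code.

```python
def build_tree(X, y, attributes):
    """Greedy recursive tree construction (ID3-style sketch)."""
    if len(set(y)) == 1:                  # all labels agree: pure leaf
        return Leaf(y[0])
    if not attributes:                    # ran out of questions: majority leaf
        return Leaf(majority_label(y))
    # Split on the attribute with the highest information gain.
    best = max(attributes,
               key=lambda a: information_gain([x[a] for x in X], y))
    children = {}
    for v in set(x[best] for x in X):
        Xv = [x for x in X if x[best] == v]
        yv = [l for x, l in zip(X, y) if x[best] == v]
        rest = [a for a in attributes if a != best]
        children[v] = build_tree(Xv, yv, rest)
    return Node(best, children)
```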
SLIDE 31

When to stop?

  • First split looks good! But, when do we stop?

slide by David Sontag
SLIDE 32

Base Case One

Don’t split a node if all matching records have the same output value.
slide by David Sontag
SLIDE 33

Base Case Two

Don’t split a node if all data points are identical on the remaining attributes.

slide by David Sontag
SLIDE 34

Base Cases: An idea

  • Base Case One: If all records in the current data subset have the same output, then don’t recurse

  • Base Case Two: If all records have exactly the same set of input attributes, then don’t recurse

Proposed Base Case 3: If all attributes have small information gain, then don’t recurse

  • This is not a good idea!
slide by David Sontag
SLIDE 35

The problem with proposed case 3

y = a XOR b

The information gains: [Figure: IG is zero for both a and b on this dataset, so proposed case 3 would stop before making any split at all.]

slide by David Sontag
SLIDE 36

If we omit proposed case 3:

y = a XOR b

[Figure: the resulting decision tree, which splits on both a and b and classifies the XOR data perfectly.]

Instead, perform pruning after building a tree.

slide by David Sontag
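The slide does not fix a particular pruning method; one common choice is reduced-error pruning, sketched below under that assumption: prune bottom-up, replacing a subtree with a majority-vote leaf whenever that does not hurt accuracy on held-out validation data. Reuses `Leaf`, `Node`, `classify`, and `majority_label` from the earlier snippets.

```python
def accuracy(tree, X, y):
    return sum(classify(tree, x) == l for x, l in zip(X, y)) / len(y)

def prune(tree, X_train, y_train, X_val, y_val):
    """Reduced-error pruning sketch: prune children first, then try
    collapsing this node into a majority leaf. Assumes validation
    attribute values were also seen in training."""
    if isinstance(tree, Leaf):
        return tree
    a = tree.attribute
    for v in list(tree.children):
        Xt = [x for x in X_train if x[a] == v]
        yt = [l for x, l in zip(X_train, y_train) if x[a] == v]
        Xv = [x for x in X_val if x[a] == v]
        yv = [l for x, l in zip(X_val, y_val) if x[a] == v]
        tree.children[v] = prune(tree.children[v], Xt, yt, Xv, yv)
    # Candidate replacement: a leaf labeled by the training majority here.
    leaf = Leaf(majority_label(y_train))
    if X_val and accuracy(leaf, X_val, y_val) >= accuracy(tree, X_val, y_val):
        return leaf
    return tree
```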
SLIDE 37

Decision trees will overfit

slide by David Sontag
SLIDE 38

Decision trees will overfit

  • Standard decision trees have no learning bias
  • Training set error is always zero! (if there is no label noise)
  • Lots of variance
  • Must introduce some bias towards simpler trees

  • Many strategies for picking simpler trees:
  • Fixed depth
  • Fixed number of leaves

  • Random forests

slide by David Sontag
SLIDE 39

Real-valued inputs

  • What should we do if some of the inputs are real-valued?

Infinite number of possible split values!!!

slide by David Sontag
SLIDE 40

“One branch for each numeric value” idea:

Hopeless: a hypothesis with such a high branching factor will shatter any dataset and overfit.

slide by David Sontag
SLIDE 41

Threshold splits

  • Binary tree: split on attribute X at value t
  • One branch: X < t
  • Other branch: X ≥ t

  • Requires a small change:
  • Allow repeated splits on the same variable along a path

[Figure: a tree that splits on Year < 78 vs. ≥ 78 and then again on Year < 70 vs. ≥ 70, with good/bad leaves.]

slide by David Sontag
SLIDE 42

The set of possible thresholds

  • Binary tree, split on attribute X
  • One branch: X < t
  • Other branch: X ≥ t

  • Search through possible values of t
  • Seems hard!!!
  • But only a finite number of t’s are important:
  • Sort data according to X into {x1, ..., xm}
  • Consider split points of the form xi + (xi+1 – xi)/2
  • Moreover, only splits between examples from different classes matter!

[Figure: data points of classes c1 and c2 along the axis of attribute Xj, with candidate thresholds t1 and t2 at the class boundaries.]

slide by David Sontag
SLIDE 43

Picking the best threshold

  • Suppose X is real-valued with threshold t

  • Want IG(Y | X:t), the information gain for Y when testing if X is greater than or less than t

  • Define:
  • H(Y | X:t) = p(X < t) H(Y | X < t) + p(X ≥ t) H(Y | X ≥ t)
  • IG(Y | X:t) = H(Y) - H(Y | X:t)
  • IG*(Y | X) = max_t IG(Y | X:t)

  • Use IG*(Y | X) for continuous variables

slide by David Sontag
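A sketch of the threshold search, reusing `entropy` from earlier: sort by the attribute, try only midpoints between neighboring examples of different classes, and return the t achieving IG*(Y | X).

```python
def best_threshold(xs, ys):
    """Return (t, IG*(Y | X)) over candidate thresholds t."""
    pairs = sorted(zip(xs, ys))            # sort data according to X
    n = len(ys)
    h_y = entropy([ys.count(c) / n for c in set(ys)])

    def h(labels):                         # empirical entropy of a label list
        return entropy([labels.count(c) / len(labels) for c in set(labels)])

    best_t, best_ig = None, float("-inf")
    for (v1, l1), (v2, l2) in zip(pairs, pairs[1:]):
        if l1 == l2 or v1 == v2:           # only class boundaries matter
            continue
        t = v1 + (v2 - v1) / 2             # midpoint split candidate
        left  = [l for v, l in pairs if v < t]
        right = [l for v, l in pairs if v >= t]
        h_cond = len(left) / n * h(left) + len(right) / n * h(right)
        if h_y - h_cond > best_ig:
            best_t, best_ig = t, h_y - h_cond
    return best_t, best_ig

print(best_threshold([1.0, 2.0, 3.0, 4.0], ["a", "a", "b", "b"]))  # (2.5, 1.0)
```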
SLIDE 44

Example with MPG

slide by David Sontag
SLIDE 45

Example tree for our continuous dataset

slide by David Sontag
SLIDE 46

Demo time…

SLIDE 47

What you need to know about decision trees

  • Decision trees are one of the most popular ML tools
  • Easy to understand, implement, and use
  • Computationally cheap (to solve heuristically)

  • Information gain to select attributes (ID3, C4.5, ...)

  • Presented for classification; can be used for regression and density estimation too

  • Decision trees will overfit!!!
  • Must use tricks to find “simple trees”, e.g.,
  • Fixed depth / early stopping
  • Pruning
  • Or, use ensembles of different trees (random forests)

slide by David Sontag
SLIDE 48

Decision Trees vs SVM

Characteristic                                        SVM   Trees
Natural handling of data of “mixed” type              ▼     ▲
Handling of missing values                            ▼     ▲
Robustness to outliers in input space                 ▼     ▲
Insensitive to monotone transformations of inputs     ▼     ▲
Computational scalability (large N)                   ▼     ▲
Ability to deal with irrelevant inputs                ▼     ▲
Ability to extract linear combinations of features    ▲     ▼
Interpretability                                      ▼     ◆
Predictive power                                      ▲     ▼

(▲ = good, ◆ = fair, ▼ = poor)

Hastie et al., “The Elements of Statistical Learning: Data Mining, Inference, and Prediction”, Springer (2009)

slide by Vibhav Gogate