Decision Trees: Lecture 12, David Sontag, New York University

Decision Trees
Lecture 12
David Sontag, New York University
Slides adapted from Luke Zettlemoyer, Carlos Guestrin, and Andrew Moore

Machine Learning in the ER
[Figure: timeline of an emergency department visit, from triage (T=0) through 30 min and 2 hrs to disposition. Data collected along the way: triage information (free text), MD comments (free text), specialist consults, physician documentation, repeated vital signs (continuous values, measured every 30 s), and lab results (continuous valued).]

Can we predict infection?
[Figure: the same ER timeline (triage information, MD comments, specialist consults, physician documentation, repeated vital signs, lab results), with the callout: "Many crucial decisions about a patient’s care are made here!"]

Can we predict infection?
• Previous automatic approaches were based on simple criteria:
  – Temperature < 96.8 °F or > 100.4 °F
  – Heart rate > 90 beats/min
  – Respiratory rate > 20 breaths/min
• Too simplified… e.g., heart rate depends on age!
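The threshold criteria above can be written as a tiny rule-based screen. This is only a sketch: the function name is hypothetical, and it assumes a patient is flagged when any one criterion fires, since the slide does not say how the criteria are combined.

```python
def flags_simple_criteria(temp_f, heart_rate, resp_rate):
    """Return True if any of the three threshold criteria fires.

    Assumption: the criteria are OR-ed together; the slide does not specify this.
    """
    abnormal_temp = temp_f < 96.8 or temp_f > 100.4   # degrees Fahrenheit
    fast_heart = heart_rate > 90                      # beats/min
    fast_breathing = resp_rate > 20                   # breaths/min
    return abnormal_temp or fast_heart or fast_breathing


# The rule ignores age entirely: a heart rate of 120 is flagged for any patient.
print(flags_simple_criteria(temp_f=98.6, heart_rate=120, resp_rate=16))
print(flags_simple_criteria(temp_f=98.6, heart_rate=70, resp_rate=16))
```

Because the thresholds are fixed, the rule cannot express interactions such as "heart rate above 90 matters less for young patients", which is the limitation the slide points out.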

Can we predict infection?
• These are the attributes we have for each patient:
  – Temperature
  – Heart rate (HR)
  – Respiratory rate (RR)
  – Age
  – Acuity and pain level
  – Diastolic and systolic blood pressure (DBP, SBP)
  – Oxygen saturation (SaO2)
• We have these attributes + label (infection) for 200,000 patients!
• Let’s learn to classify infection

Predicting infection using decision trees

Hypotheses: decision trees f : X → Y
• Each internal node tests an attribute x_i
• Each branch assigns an attribute value x_i = v
• Each leaf assigns a class y
• To classify input x: traverse the tree from root to leaf, output the labeled y
Human interpretable!
[Figure: decision tree with root Cylinders (branches 3, 4, 5, 6, 8). The cylinders = 4 branch tests Maker (america, asia, europe), another branch tests Horsepower (low, med, high), and every leaf is labeled good or bad.]
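Classification by root-to-leaf traversal can be sketched in a few lines. The tuple/dict encoding and the `classify` name are choices made for this illustration, and the leaf labels and the placement of Horsepower under cylinders = 8 are assumptions read off the flattened figure, not confirmed values.

```python
# Internal node: (attribute_name, {attribute_value: subtree}); leaf: a class label string.
# Leaf labels below are illustrative assumptions, not taken verbatim from the slide.
TREE = ("cylinders", {
    3: "good",
    4: ("maker", {"america": "bad", "asia": "good", "europe": "good"}),
    5: "bad",
    6: "bad",
    8: ("horsepower", {"low": "bad", "med": "good", "high": "bad"}),
})


def classify(tree, x):
    """Walk from the root to a leaf, following the branch that matches x."""
    while not isinstance(tree, str):      # still at an internal node
        attribute, branches = tree
        tree = branches[x[attribute]]     # follow the branch x_i = v
    return tree                           # the leaf's class label


print(classify(TREE, {"cylinders": 4, "maker": "asia", "horsepower": "low"}))
```

Only the attributes on the chosen path are ever inspected, which is part of why the resulting prediction is easy to explain.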

Hypothesis space
• How many possible hypotheses?
• What functions can be represented?
[Figure: the Cylinders / Maker / Horsepower decision tree from the previous slide.]

What functions can be represented?
• Decision trees can represent any function of the input attributes!
• For Boolean functions, path to leaf gives truth table row
• But, could require exponentially many nodes…
[Figure from Stuart Russell: truth table for A xor B (F F → F, F T → T, T F → T, T T → F) next to the equivalent tree, which tests A at the root and B on each branch.]
[Figure: the Cylinders / Maker / Horsepower tree, written as a formula:] cyl=3 ∨ (cyl=4 ∧ (maker=asia ∨ maker=europe)) ∨ …

Are all decision trees equal?
• Many trees can represent the same concept
• But, not all trees will have the same size!
  – e.g., φ = (A ∧ B) ∨ (¬A ∧ C), i.e., ((A and B) or (not A and C))
[Figure: two trees for φ. One tests A at the root, then B or C. The other tests B at the root, then C, then A, and is larger.]
• Which tree do we prefer?
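To make the size difference concrete, the sketch below encodes two trees for φ and checks that they agree on all eight inputs. The tuple encoding and the exact shape of the B-rooted tree are reconstructions for illustration (the figure is not recoverable in detail); the check itself confirms both trees compute φ.

```python
from itertools import product


def phi(a, b, c):
    """The target concept: (A and B) or (not A and C)."""
    return (a and b) or ((not a) and c)


# Internal node: (attribute, subtree_if_true, subtree_if_false); leaf: True or False.
TREE_A_FIRST = ("A", ("B", True, False), ("C", True, False))
TREE_B_FIRST = ("B",
                ("C", True, ("A", True, False)),
                ("C", ("A", False, True), False))


def evaluate(tree, x):
    """Follow the branches chosen by assignment x until a leaf is reached."""
    while not isinstance(tree, bool):
        attribute, if_true, if_false = tree
        tree = if_true if x[attribute] else if_false
    return tree


def count_internal(tree):
    """Number of attribute tests (internal nodes) in the tree."""
    if isinstance(tree, bool):
        return 0
    _, if_true, if_false = tree
    return 1 + count_internal(if_true) + count_internal(if_false)


for a, b, c in product([True, False], repeat=3):
    x = {"A": a, "B": b, "C": c}
    assert evaluate(TREE_A_FIRST, x) == evaluate(TREE_B_FIRST, x) == phi(a, b, c)

print(count_internal(TREE_A_FIRST), count_internal(TREE_B_FIRST))  # 3 5
```

Both trees represent the same concept, but splitting on A first needs three tests while splitting on B first needs five.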

Learning decision trees is hard!!!
• Learning the simplest (smallest) decision tree is an NP-complete problem [Hyafil & Rivest ’76]
• Resort to a greedy heuristic:
  – Start from empty decision tree
  – Split on next best attribute (feature)
  – Recurse

A Decision Stump

Key idea: Greedily learn trees using recursion
Take the original dataset and partition it according to the value of the attribute we split on.
[Figure: the dataset split into four groups: records in which cylinders = 4, cylinders = 5, cylinders = 6, and cylinders = 8.]

Recursive Step
[Figure: for each of the four partitions (records in which cylinders = 4, 5, 6, and 8): build tree from these records.]
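The partition step that feeds the recursion might look like the following sketch. The `partition` helper and the sample records are hypothetical; the slides do not show the underlying data.

```python
from collections import defaultdict


def partition(records, attribute):
    """Group records by their value of `attribute`, one group per observed value."""
    groups = defaultdict(list)
    for record in records:
        groups[record[attribute]].append(record)
    return dict(groups)


# Hypothetical records, for illustration only.
records = [
    {"cylinders": 4, "maker": "asia", "mpg": "good"},
    {"cylinders": 4, "maker": "america", "mpg": "bad"},
    {"cylinders": 8, "maker": "america", "mpg": "bad"},
]

for value, subset in partition(records, "cylinders").items():
    # In the full algorithm, a subtree would be built recursively from `subset`.
    print(value, len(subset))
```

Each group becomes the training set for one child of the node, so the same procedure is applied again on a strictly smaller dataset.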

Second level of tree
Recursively build a tree from the seven records in which there are four cylinders and the maker was based in Asia. (Similar recursion in the other cases.)

A full tree

Splitting: choosing a good attribute
Would we prefer to split on X1 or X2?

X1  X2  Y
T   T   T
T   F   T
T   T   T
T   F   T
F   T   T
F   F   F
F   T   F
F   F   F

Split on X1: X1 = t gives Y=t : 4, Y=f : 0; X1 = f gives Y=t : 1, Y=f : 3
Split on X2: X2 = t gives Y=t : 3, Y=f : 1; X2 = f gives Y=t : 2, Y=f : 2

Idea: use counts at leaves to define probability distributions, so we can measure uncertainty!

Measuring uncertainty
• Good split if we are more certain about classification after split
  – Deterministic good (all true or all false)
  – Uniform distribution bad
  – What about distributions in between?
P(Y=A) = 1/2, P(Y=B) = 1/4, P(Y=C) = 1/8, P(Y=D) = 1/8
P(Y=A) = 1/4, P(Y=B) = 1/4, P(Y=C) = 1/4, P(Y=D) = 1/4

Entropy
Entropy H(Y) of a random variable Y:
H(Y) = − Σ_i P(Y = y_i) log2 P(Y = y_i)
More uncertainty, more entropy!
Information theory interpretation: H(Y) is the expected number of bits needed to encode a randomly drawn value of Y (under the most efficient code).
[Figure: entropy of a coin flip, plotted against the probability of heads.]
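A minimal sketch of the entropy computation, assuming the standard base-2 definition and the usual convention that outcomes with zero probability contribute nothing. The `entropy` helper is a name chosen for this example.

```python
import math


def entropy(probs):
    """Shannon entropy in bits; zero-probability outcomes are skipped (0 log 0 = 0)."""
    return sum(-p * math.log2(p) for p in probs if p > 0)


# Coin flip: maximal at a fair coin, zero when the outcome is certain.
print(entropy([0.5, 0.5]))   # 1.0
print(entropy([1.0, 0.0]))   # 0.0

# The two four-valued distributions from the "Measuring uncertainty" slide.
print(entropy([1/2, 1/4, 1/8, 1/8]))   # 1.75
print(entropy([1/4, 1/4, 1/4, 1/4]))   # 2.0
```

The skewed distribution needs 1.75 bits on average while the uniform one needs 2, matching the intuition that the uniform case is the most uncertain.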

High, Low Entropy
• “High Entropy”
  – Y is from a uniform-like distribution
  – Flat histogram
  – Values sampled from it are less predictable
• “Low Entropy”
  – Y is from a varied (peaks and valleys) distribution
  – Histogram has many lows and highs
  – Values sampled from it are more predictable
(Slide from Vibhav Gogate)

Entropy Example
[Figure: entropy of a coin flip, plotted against the probability of heads.]

X1  X2  Y
T   T   T
T   F   T
T   T   T
T   F   T
F   T   T
F   F   F

P(Y=t) = 5/6, P(Y=f) = 1/6
H(Y) = − 5/6 log2 5/6 − 1/6 log2 1/6 = 0.65

Conditional Entropy
Conditional entropy H(Y|X) of a random variable Y conditioned on a random variable X:
H(Y|X) = − Σ_j P(X = x_j) Σ_i P(Y = y_i | X = x_j) log2 P(Y = y_i | X = x_j)

Example:

X1  X2  Y
T   T   T
T   F   T
T   T   T
T   F   T
F   T   T
F   F   F

Split on X1: X1 = t gives Y=t : 4, Y=f : 0; X1 = f gives Y=t : 1, Y=f : 1
P(X1=t) = 4/6, P(X1=f) = 2/6
H(Y|X1) = − 4/6 (1 log2 1 + 0 log2 0) − 2/6 (1/2 log2 1/2 + 1/2 log2 1/2) = 2/6 (taking 0 log2 0 = 0)
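The conditional entropy of the six-record example can be checked numerically. This is a sketch: the row encoding (tuples of X1, X2, Y) and the helper names are choices made here, not part of the slides.

```python
import math
from collections import Counter

# The six records of the example, as (x1, x2, y).
ROWS = [
    (True, True, True), (True, False, True), (True, True, True),
    (True, False, True), (False, True, True), (False, False, False),
]


def entropy_of_labels(labels):
    """Entropy (bits) of the empirical label distribution."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


def conditional_entropy(rows, attr_index):
    """H(Y|X): entropy of Y within each group X = v, weighted by P(X = v)."""
    n = len(rows)
    total = 0.0
    for value in {row[attr_index] for row in rows}:
        labels = [row[-1] for row in rows if row[attr_index] == value]
        total += len(labels) / n * entropy_of_labels(labels)
    return total


print(f"H(Y|X1) = {conditional_entropy(ROWS, 0):.2f}")  # H(Y|X1) = 0.33
```

The X1 = t group is pure and contributes zero, so the whole value comes from the two records with X1 = f, weighted by 2/6.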

Information gain
• Decrease in entropy (uncertainty) after splitting:
IG(X) = H(Y) − H(Y|X)
In our running example (the six records of the entropy example):
IG(X1) = H(Y) − H(Y|X1) = 0.65 − 0.33 = 0.32
IG(X1) > 0 → we prefer the split!
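Information gain for both candidate attributes of the running example can be computed as in the sketch below. The helper names and the tuple encoding of the records are assumptions made for this illustration.

```python
import math
from collections import Counter, defaultdict

# The six records of the running example, as (x1, x2, y).
ROWS = [
    (True, True, True), (True, False, True), (True, True, True),
    (True, False, True), (False, True, True), (False, False, False),
]


def entropy(labels):
    """Entropy (bits) of the empirical label distribution."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, attr_index):
    """IG(X) = H(Y) - H(Y|X) for the attribute stored at attr_index."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attr_index]].append(row[-1])   # collect labels per attribute value
    h_y = entropy([row[-1] for row in rows])
    h_y_given_x = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return h_y - h_y_given_x


ig_x1 = information_gain(ROWS, 0)
ig_x2 = information_gain(ROWS, 1)
print(f"IG(X1) = {ig_x1:.2f}, IG(X2) = {ig_x2:.2f}")  # IG(X1) = 0.32, IG(X2) = 0.19
```

On these six records X1 gives the larger gain, so a greedy learner that ranks attributes by information gain would split on X1 first.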

Learning decision trees
• Start from empty decision tree
• Split on next best attribute (feature)
  – Use, for example, information gain to select the attribute: arg max_i IG(X_i) = arg max_i [H(Y) − H(Y|X_i)]
• Recurse

When to stop?
First split looks good! But, when do we stop?

Base Case One Don’t split a node if all matching records have the same output value

Base Case Two Don’t split a node if data points are identical on remaining attributes

Base Cases: An idea
• Base Case One: If all records in current data subset have the same output then don’t recurse
• Base Case Two: If all records have exactly the same set of input attributes then don’t recurse
Proposed Base Case 3: If all attributes have small information gain then don’t recurse
• This is not a good idea

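The problem with proposed case 3
y = a XOR b
The information gains: [Figure: information gain of splitting on a and of splitting on b at the root; both are zero, so proposed case 3 would stop before making any split.]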
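A short worked calculation shows why the root-level information gains vanish for XOR, assuming the training set contains each of the four (a, b) combinations exactly once:

```latex
\begin{align*}
H(Y) &= -\tfrac{1}{2}\log_2\tfrac{1}{2} - \tfrac{1}{2}\log_2\tfrac{1}{2} = 1 \\
H(Y \mid a) &= P(a{=}0)\,H(Y \mid a{=}0) + P(a{=}1)\,H(Y \mid a{=}1)
             = \tfrac{1}{2}\cdot 1 + \tfrac{1}{2}\cdot 1 = 1 \\
IG(a) &= H(Y) - H(Y \mid a) = 0
\end{align*}
```

Within either value of a, y is still a fair coin flip determined by b, so knowing a alone tells us nothing. By symmetry IG(b) = 0 as well, even though a and b together determine y exactly.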

If we omit proposed case 3:
y = a XOR b
The resulting decision tree: [Figure: the full tree for y = a XOR b.]
Instead, perform pruning after building a tree.
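Putting the pieces together, the greedy recursion with only Base Cases One and Two can be sketched as follows. This is an illustrative implementation under stated assumptions, not the lecture's reference code: records are dicts, ties in information gain go to the first attribute, an attribute is used at most once per path, and a majority vote labels leaves that cannot be split further.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy (bits) of the empirical label distribution."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """IG(attr) = H(Y) - H(Y|attr) on the current subset."""
    n = len(rows)
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [y for r, y in zip(rows, labels) if r[attr] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def build_tree(rows, labels, attrs):
    # Base Case One: all records have the same output.
    if len(set(labels)) == 1:
        return labels[0]
    # Base Case Two: records are identical on all remaining attributes
    # (vacuously true when no attributes remain); fall back to a majority vote.
    if all(r[a] == rows[0][a] for r in rows for a in attrs):
        return Counter(labels).most_common(1)[0][0]
    # Greedy step: split on the attribute with the largest information gain.
    # No "small gain" stopping rule, i.e. proposed case 3 is omitted.
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    remaining = [a for a in attrs if a != best]
    branches = {}
    for value in sorted({r[best] for r in rows}):
        keep = [i for i, r in enumerate(rows) if r[best] == value]
        branches[value] = build_tree([rows[i] for i in keep],
                                     [labels[i] for i in keep], remaining)
    return (best, branches)


# y = a XOR b: every root-level gain is zero, yet the recursion still finds the tree.
rows = [{"a": 0, "b": 0}, {"a": 0, "b": 1}, {"a": 1, "b": 0}, {"a": 1, "b": 1}]
labels = [0, 1, 1, 0]
tree = build_tree(rows, labels, ["a", "b"])
print(tree)
```

On the XOR data the learner splits on a despite its zero gain, after which b separates the classes perfectly in each branch. Growing the tree fully and pruning afterwards keeps such splits available, whereas an early "small gain" stop would not.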

Decision trees will overfit
