Tree Based Methods (Ensemble Schemes) Machine Learning - PowerPoint PPT Presentation

Tree Based Methods (Ensemble Schemes)
Machine Learning, Spring 2018
Feb 26, 2018
Kasthuri Kannan, kasthuri.kannan@nyumc.org

Overview
• Decision Trees
  • Overview
  • Splitting nodes
  • Limitations
• Bagging/Bootstrap Aggregating and Boosting
  • How does bagging reduce variance?
  • Boosting
• Random Forests
  • Overview
  • Why does RF work?
• Cancer Genomics Application

Classification
• Given a collection of records (training set)
• Each record contains a set of attributes; one of the attributes is the class/label
• Find a model for the class attribute as a function of the values of the other attributes.
• Goal: previously unseen records should be assigned a class as accurately as possible.
• A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.
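
The train/test protocol described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records are made up, the 70/30 split ratio is arbitrary, and the "model" is a trivial majority-class predictor standing in for a real classifier. The names `train_test_split` and `accuracy` are hypothetical helpers, not part of the slides.

```python
import random
from collections import Counter

def train_test_split(records, test_fraction=0.3, seed=0):
    """Shuffle a copy of the records and hold out a fraction for testing."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def accuracy(model, test_set):
    """Fraction of held-out records whose predicted label matches the true label."""
    correct = sum(1 for features, label in test_set if model(features) == label)
    return correct / len(test_set)

# Made-up labeled records: (features, class label).
records = [({"id": i}, "Yes" if i % 3 == 0 else "No") for i in range(10)]
train, test = train_test_split(records)

# Stand-in model: always predict the majority class seen in the training set.
majority = Counter(label for _, label in train).most_common(1)[0][0]
model = lambda features: majority

acc = accuracy(model, test)
print(len(train), len(test), acc)
```

The point of the sketch is only that the model is built from `train` and scored on records it never saw; any classifier can replace the majority-class stand-in.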

Classification Illustration
Training Set:
Tid  Attrib1  Attrib2  Attrib3  Class
1    Yes      Large    125K     No
2    No       Medium   100K     No
3    No       Small    70K      No
4    Yes      Medium   120K     No
5    No       Large    95K      Yes
6    No       Medium   60K      No
7    Yes      Large    220K     No
8    No       Small    85K      Yes
9    No       Medium   75K      No
10   No       Small    90K      Yes
Test Set:
Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
12   Yes      Medium   80K      ?
13   Yes      Large    110K     ?
14   No       Small    95K      ?
15   No       Large    67K      ?
Workflow: Training Set -> Learning algorithm -> Induction (Learn Model) -> Model; Test Set -> Apply Model -> Deduction
Courtesy: www.cs.kent.edu/~jin/DM07/

Classification Examples
• Predicting tumor cells as benign or malignant
• Classifying credit card transactions as legitimate or fraudulent
• Classifying secondary structures of protein as alpha-helix, beta-sheet, or random coil
• Categorizing news stories as finance, weather, entertainment, sports, etc.
• Several algorithms: Decision trees, Support Vector Machines, Rule-based Methods, etc.

Decision Tree (Example)
Training Data:
Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes
Model: Decision Tree (splitting attributes: Refund, MarSt, TaxInc)
Refund?
  Yes -> NO
  No  -> MarSt?
    Married -> NO
    Single, Divorced -> TaxInc?
      < 80K -> NO
      > 80K -> YES

Decision Tree (Another Example)
Training data: the same ten records (Tid 1 to 10; attributes Refund, Marital Status, Taxable Income; class Cheat).
MarSt?
  Married -> NO
  Single, Divorced -> Refund?
    Yes -> NO
    No  -> TaxInc?
      < 80K -> NO
      > 80K -> YES
There could be more than one tree that fits the same data!

Applying the model
Once the decision tree is built, the model can be used to classify previously unclassified data.
Test Data: Refund = No, Marital Status = Married, Taxable Income = 80K, Cheat = ?
Refund?
  Yes -> NO
  No  -> MarSt?
    Married -> NO
    Single, Divorced -> TaxInc?
      < 80K -> NO
      > 80K -> YES
The test record follows Refund = No, then MarSt = Married, and reaches the leaf NO: assign Cheat to "No".
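
A minimal sketch of applying this tree in code: the tree is written out as nested conditionals and the test record is passed through it. The function name `classify` and the dictionary keys are illustrative choices, and sending an income of exactly 80K down the "> 80K" branch is an assumption, since the slide labels the branches only "< 80K" and "> 80K".

```python
def classify(record):
    """Walk the tree: Refund, then Marital Status, then Taxable Income."""
    if record["Refund"] == "Yes":
        return "No"
    if record["Marital Status"] == "Married":
        return "No"
    # Single or Divorced: split on taxable income.
    # Assumption: exactly 80K is treated as the upper branch.
    return "No" if record["Taxable Income"] < 80_000 else "Yes"

test_record = {"Refund": "No", "Marital Status": "Married", "Taxable Income": 80_000}
prediction = classify(test_record)
print(prediction)  # No
```

The test record never reaches the income test, so the boundary assumption does not affect its label.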

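Tumor Classification (Example)
[Figure: tumor (T) and normal (N) samples/patients plotted by their expression in (gene1, gene2), with gene1 on the x axis and gene2 on the y axis. The plane is partitioned by the lines x = 1, y = 1 and y = 0.5, and the point (1.5, 0.8) is marked. The accompanying decision tree tests x < 1 at the root and y < 1 and y < 0.5 at the child nodes, with leaves labeled T and N.]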

Decision Tree Algorithms
• Many algorithms:
  • Hunt's Algorithm (one of the earliest)
  • CART
  • ID3, C4.5
  • SLIQ, SPRINT

Hunt's Algorithm (General Idea)
Let D_t be the set of training records that reach a node t.
General procedure:
• If D_t contains records that all belong to the same class y_t, then t is a leaf node labeled as y_t.
• If D_t is an empty set, then t is a leaf node labeled by the default class, y_d.
• If D_t contains records that belong to more than one class, use an attribute test to split the data into smaller subsets. Recursively apply the procedure to each subset.
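
The three cases of the general procedure map directly onto a short recursive function. The sketch below is one possible reading, not the slides' implementation, and it makes several assumptions that the general idea leaves open: attributes are tried in a fixed order, each test is a multi-way split on a categorical value, Taxable Income is pre-binarized at 80K, and the majority class of the parent serves as the default class for its children. All names (`hunt`, `predict`, `RAW`) are hypothetical.

```python
from collections import Counter

def hunt(records, attributes, domains, default):
    """Grow a tree from (features, label) pairs following Hunt's three cases."""
    if not records:                       # D_t is empty: leaf with default class
        return default
    labels = [label for _, label in records]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attributes:
        return majority                   # pure node (or no tests left): leaf
    attr, rest = attributes[0], attributes[1:]
    children = {}
    for value in domains[attr]:           # attribute test: one branch per value
        subset = [(f, y) for f, y in records if f[attr] == value]
        children[value] = hunt(subset, rest, domains, majority)
    return (attr, children)

def predict(tree, features):
    """Follow branches until a leaf (a plain class label) is reached."""
    while isinstance(tree, tuple):
        attr, children = tree
        tree = children[features[attr]]
    return tree

# The ten training records: Refund, Marital Status, Taxable Income (K), Cheat.
RAW = [
    ("Yes", "Single", 125, "No"), ("No", "Married", 100, "No"),
    ("No", "Single", 70, "No"), ("Yes", "Married", 120, "No"),
    ("No", "Divorced", 95, "Yes"), ("No", "Married", 60, "No"),
    ("Yes", "Divorced", 220, "No"), ("No", "Single", 85, "Yes"),
    ("No", "Married", 75, "No"), ("No", "Single", 90, "Yes"),
]
data = [({"Refund": r, "Marital": m, "Income": ">=80K" if inc >= 80 else "<80K"}, y)
        for r, m, inc, y in RAW]
attributes = ["Refund", "Marital", "Income"]
domains = {a: sorted({f[a] for f, _ in data}) for a in attributes}

tree = hunt(data, attributes, domains, default="No")
print(tree)
```

Practical algorithms replace the fixed attribute order with a criterion that picks the best test at each node.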

Hunt's Algorithm (Illustration)
Training data: the ten records (Tid 1 to 10) with attributes Refund, Marital Status, Taxable Income and class Cheat.
Step 1: a single leaf labeled Don't Cheat.
Step 2: split on Refund. Yes -> Don't Cheat; No -> Don't Cheat.
Step 3: under Refund = No, split on Marital Status. Single, Divorced -> Cheat; Married -> Don't Cheat.
Step 4: under Marital Status = Single, Divorced, split on Taxable Income. < 80K -> Don't Cheat; >= 80K -> Cheat.

Tree Induction
• Greedy strategy.
  • Split the records based on an attribute test that optimizes a certain criterion.
• Main questions
  • Determine how to split the records
    • Binary or multi-way split?
    • How to determine the best split?
  • Determine when to stop splitting

Test Condition (Splitting Based on Nominal/Ordinal Attributes)
• Multi-way split: use as many partitions as distinct values.
  Car Type -> Family | Sports | Luxury
• Binary split: divides values into two subsets. Need to find the optimal partitioning.
  Car Type -> {Sports, Luxury} | {Family}   OR   Car Type -> {Family, Luxury} | {Sports}
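
Finding the optimal binary partitioning means searching over the candidate two-way groupings of the attribute values. A nominal attribute with k distinct values has 2^(k-1) - 1 such groupings, which the sketch below enumerates for Car Type. The helper name `binary_partitions` is hypothetical, and the sketch ignores ordering, so for an ordinal attribute some of the enumerated groupings would not respect the value order.

```python
from itertools import combinations

def binary_partitions(values):
    """Enumerate every split of the values into two non-empty subsets.

    The first value is pinned to the left subset so that each partition
    is produced once rather than twice (left/right swapped).
    """
    values = sorted(values)
    first, rest = values[0], values[1:]
    partitions = []
    for r in range(len(rest)):            # left subset never takes all values
        for combo in combinations(rest, r):
            left = {first, *combo}
            right = set(values) - left
            partitions.append((left, right))
    return partitions

for left, right in binary_partitions(["Family", "Sports", "Luxury"]):
    print(sorted(left), "|", sorted(right))
```

For three car types this yields three candidate splits, two of which are the ones shown on the slide.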

Test Condition (Splitting Based on Continuous Attributes)
(i) Binary split: Taxable Income > 80K? -> Yes | No
(ii) Multi-way split: Taxable Income? -> < 10K | [10K,25K) | [25K,50K) | [50K,80K) | > 80K
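
Both test conditions amount to mapping a numeric value to a branch. A small sketch using the thresholds from the slide; the function names are hypothetical, and assigning an income of exactly 80K to the last interval of the multi-way split (written here as ">= 80K") is an assumption made so that the intervals cover every value.

```python
from bisect import bisect_right

# Interval boundaries of the multi-way split, in the same units as the income.
EDGES = [10_000, 25_000, 50_000, 80_000]
LABELS = ["< 10K", "[10K, 25K)", "[25K, 50K)", "[50K, 80K)", ">= 80K"]

def binary_split(income, threshold=80_000):
    """Binary test condition: Taxable Income > 80K?"""
    return "Yes" if income > threshold else "No"

def multiway_split(income):
    """Multi-way test condition: index of the half-open interval holding income."""
    return LABELS[bisect_right(EDGES, income)]

for income in (9_999, 10_000, 60_000, 95_000):
    print(income, binary_split(income), multiway_split(income))
```

`bisect_right` returns the number of edges less than or equal to the value, which is exactly the index of the left-closed, right-open interval it falls in.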

Binary vs. Multi-way split: which is the best?
http://www.cse.msu.edu/~cse802/DecisionTrees.pdf

Determining Best Split
Before splitting: 10 records of class 0, 10 records of class 1
Own Car?     Yes: C0 = 6, C1 = 4    No: C0 = 4, C1 = 6
Car Type?    Family: C0 = 1, C1 = 3    Sports: C0 = 8, C1 = 0    Luxury: C0 = 1, C1 = 7
Student ID?  c1: C0 = 1, C1 = 0  ...  c10: C0 = 1, C1 = 0    c11: C0 = 0, C1 = 1  ...  c20: C0 = 0, C1 = 1
Which attribute is the best for splitting?

Determining Best Split
• Greedy approach:
  • Nodes with homogeneous class distribution are preferred
• Need a measure of node impurity:
  C0: 5, C1: 5 -> non-homogeneous, high degree of impurity
  C0: 9, C1: 1 -> homogeneous, low degree of impurity

Measures of Node Impurity
• Gini Index: $GINI(t) = 1 - \sum_j [p(j|t)]^2$
• Entropy: $Entropy(t) = -\sum_j p(j|t) \log p(j|t)$
• Misclassification error: $Error(t) = 1 - \max_i P(i|t)$
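
The three measures can be computed directly from the class counts at a node. The sketch below evaluates them on the two nodes used earlier to illustrate impurity, (C0 = 5, C1 = 5) and (C0 = 9, C1 = 1). The function names are illustrative; using base-2 logarithms and treating 0 log 0 as 0 are assumptions about the convention intended by the formulas.

```python
from math import log2

def class_probabilities(counts):
    """Relative frequency p(j|t) of each class j at a node."""
    total = sum(counts)
    return [c / total for c in counts]

def gini(counts):
    return 1.0 - sum(p * p for p in class_probabilities(counts))

def entropy(counts):
    # Classes with p = 0 are skipped, i.e. 0 * log 0 is taken to be 0.
    return -sum(p * log2(p) for p in class_probabilities(counts) if p > 0)

def classification_error(counts):
    return 1.0 - max(class_probabilities(counts))

for counts in [(5, 5), (9, 1)]:
    print(counts,
          round(gini(counts), 3),
          round(entropy(counts), 3),
          round(classification_error(counts), 3))
# (5, 5) 0.5 1.0 0.5
# (9, 1) 0.18 0.469 0.1
```

All three measures are largest for the evenly mixed node and shrink as one class dominates.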

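Computing Measures of Node Impurity
Before splitting: the parent node has class counts C0 = N00, C1 = N01 and impurity M0.
Split on A? (Yes/No): Node N1 (C0 = N10, C1 = N11; impurity M1) and Node N2 (C0 = N20, C1 = N21; impurity M2), with combined impurity M12.
Split on B? (Yes/No): Node N3 (C0 = N30, C1 = N31; impurity M3) and Node N4 (C0 = N40, C1 = N41; impurity M4), with combined impurity M34.
Gain = M0 - M12 vs M0 - M34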

Computing Measures of Node Impurity (Gini Index)
• Gini Index for a given node t: $GINI(t) = 1 - \sum_j [p(j|t)]^2$
  (NOTE: p(j|t) is the relative frequency of class j at node t)
• Maximum $(1 - 1/n_c)$ when records are equally distributed among all classes, implying least interesting information
• Minimum (0.0) when all records belong to one class, implying most interesting information

Example (Gini Index of a Node)
$GINI(t) = 1 - \sum_j [p(j|t)]^2$
Split A? (Yes/No) into Node N1 and Node N2, with impurities M1 and M2.
Node with C0 = 0, C1 = 6: P(C0) = 0/6 = 0, P(C1) = 6/6 = 1, Gini = 1 - P(C0)^2 - P(C1)^2 = 1 - 0 - 1 = 0
Node with C0 = 1, C1 = 5: P(C0) = 1/6, P(C1) = 5/6, Gini = 1 - (1/6)^2 - (5/6)^2 = 0.278

Splitting Based on Gini Index
• Used in CART, SLIQ, SPRINT
• When a node p is split into k partitions (children), the quality of the split is computed from the size-weighted Gini index of the children. For a binary split:
  $GINI_{split} = GINI(p) - \frac{n_L}{n} GINI(L) - \frac{n_R}{n} GINI(R)$
  where n_L = number of records at the left child node, n_R = number of records at the right child node, and n = n_L + n_R
• Split on the attribute that maximizes Gini split

Splitting Based on Gini Index
• Splits into two partitions
• Effect of weighing partitions: larger and purer partitions are sought.
Parent: C1 = 6, C2 = 6, Gini = 0.500
Split B? (Yes -> Node N1, No -> Node N2):
      N1  N2
C1    5   1
C2    2   4
Gini(N1) = 1 - (5/7)^2 - (2/7)^2 = 0.408
Gini(N2) = 1 - (1/5)^2 - (4/5)^2 = 0.320
Gini(Children) = 7/12 * 0.408 + 5/12 * 0.320 = 0.371
Gini split(B) = 0.500 - 0.371 = 0.129
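
The arithmetic for split B can be checked with a short sketch that recomputes it from the class counts, using each child's own record count (7 for N1, 5 for N2) as the denominator of its class probabilities. The function names `weighted_gini` and `gini_gain` are hypothetical.

```python
def gini(counts):
    """Gini index of a node given its per-class record counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def weighted_gini(children):
    """Gini of the children, each weighted by its share of the parent's records."""
    n = sum(sum(child) for child in children)
    return sum(sum(child) / n * gini(child) for child in children)

def gini_gain(parent, children):
    """Reduction in Gini achieved by the split: parent minus weighted children."""
    return gini(parent) - weighted_gini(children)

parent = (6, 6)                  # (C1, C2) counts before the split
children = [(5, 2), (1, 4)]      # N1 and N2 after splitting on B

print(round(gini(children[0]), 3))           # 0.408
print(round(gini(children[1]), 3))           # 0.32
print(round(weighted_gini(children), 3))     # 0.371
print(round(gini_gain(parent, children), 3)) # 0.129
```

Because each child is weighted by its size, a large pure partition lowers the weighted Gini more than a small one.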

Measures of Node Impurity (Entropy)
• Entropy at a given node t: $Entropy(t) = -\sum_j p(j|t) \log p(j|t)$
  (NOTE: p(j|t) is the relative frequency of class j at node t)
• Measures homogeneity of a node.
  • Maximum $(\log n_c)$ when records are equally distributed among all classes, implying least information
  • Minimum (0.0) when all records belong to one class, implying most information
• Entropy based computations are similar to the GINI index computations

Example (Entropy)
C1 = 0, C2 = 6: P(C1) = 0/6 = 0, P(C2) = 6/6 = 1, Entropy = - 0 log 0 - 1 log 1 = - 0 - 0 = 0
C1 = 1, C2 = 5: P(C1) = 1/6, P(C2) = 5/6, Entropy = - (1/6) log_2 (1/6) - (5/6) log_2 (5/6) = 0.65
C1 = 2, C2 = 4: P(C1) = 2/6, P(C2) = 4/6, Entropy = - (2/6) log_2 (2/6) - (4/6) log_2 (4/6) = 0.92
