Decision Trees l Highly used and successful l Iteratively split the - PowerPoint PPT Presentation

Decision Trees l Highly used and successful l Iteratively split the Data Set into subsets one attribute at a time, using most informative attributes first – Thus, constructively chooses which attributes to use and ignore l Continue until you can label each leaf node with a class l Attribute Features – discrete/nominal (can extend to continuous features) l Smaller/shallower trees (i.e. using just the most informative attributes) generalizes the best – Searching for smallest tree takes exponential time l Typically use a greedy iterative approach to create the tree by selecting the currently most informative attribute to use CS 472 - Decision Trees 1

Decision Tree Learning l Assume A 1 is nominal binary feature (Size: S/L) l Assume A 2 is nominal 3 value feature (Color: R/G/B) l A goal is to get “pure” leaf nodes. What would you do? R G A 2 B A 1 S L CS 472 - Decision Trees 2

Decision Tree Learning l Assume A 1 is nominal binary feature (Size: S/L) l Assume A 2 is nominal 3 value feature (Color: R/G/B) l Next step for left and right children? A 1 R R G A 2 A 2 G B B A 1 A 1 S L S L CS 472 - Decision Trees 3

Decision Tree Learning l Assume A 1 is nominal binary feature (Size: S/L) l Assume A 2 is nominal 3 value feature (Color: R/G/B) l Decision surfaces are axis aligned Hyper-Rectangles A 1 A 2 R R G A 2 G A 2 B B A 1 A 1 S L S L CS 472 - Decision Trees 4

Decision Tree Learning l Assume A 1 is nominal binary feature (Size: S/L) l Assume A 2 is nominal 3 value feature (Color: R/G/B) l Decision surfaces are axis aligned Hyper-Rectangles A 1 A 2 R R G A 2 G A 2 B B A 1 A 1 S L S L CS 472 - Decision Trees 5

ID3 Learning Approach l C is a set of examples l A test on attribute A partitions C into { C i , C 2 ,...,C |A| } where |A| is the number of values A can take on l Start with TS as C and first find a good A for root l Continue recursively until subsets unambiguously classified, you run out of attributes, or some stopping criteria is reached CS 472 - Decision Trees 6

Which Attribute/Feature to split on l Twenty Questions - what are good questions, ones which when asked decrease the information remaining l Regularity required l What would be good attribute tests for a DT l Let’s come up with our own approach for scoring the quality of a node after attribute selection CS 472 - Decision Trees 7

Which Attribute to split on l Twenty Questions - what are good questions, ones which when asked decrease the information remaining l Regularity required l What would be good attribute tests for a DT l Let’s come up with our own approach for scoring the quality of a node after attribute selection n majority n total Purity CS 472 - Decision Trees 8

Which Attribute to split on l Twenty Questions - what are good questions, ones which when asked decrease the information remaining l Regularity required l What would be good attribute tests for a DT l Let’s come up with our own approach for scoring the quality of a node after attribute selection n majority n total – Want both purity and statistical significance (e.g SS#) CS 472 - Decision Trees 9

Which Attribute to split on l Twenty Questions - what are good questions, ones which when asked decrease the information remaining l Regularity required l What would be good attribute tests for a DT l Let’s come up with our own approach for scoring the quality of a node after attribute selection n majority n maj + 1 n total n total + | C | – Want both purity and statistical significance – Laplacian CS 472 - Decision Trees 10

Which Attribute to split on l Twenty Questions - what are good questions, ones which when asked decrease the information remaining l Regularity required l What would be good attribute tests for a DT l Let’s come up with our own approach for scoring the quality of a node after attribute selection n majority n maj + 1 n total n total + | C | – This is just for one node – Best attribute will be good across many/most of its partitioned nodes CS 472 - Decision Trees 11

Which Attribute to split on l Twenty Questions - what are good questions, ones which when asked decrease the information remaining l Regularity required l What would be good attribute tests for a DT l Let’s come up with our own approach for scoring the quality of a node after attribute selection | A | n majority n maj + 1 n maj , i + 1 n total , i ∑ ⋅ n total n total + | C | n total n total , i + | C | i = 1 – Now we just try each attribute to see which gives the highest score, and we split on that attribute and repeat at the next level CS 472 - Decision Trees 12

Which Attribute to split on l Twenty Questions - what are good questions, ones which when asked decrease the information remaining l Regularity required l What would be good attribute tests for a DT l Let’s come up with our own approach for scoring the quality of each possible attribute – then pick highest | A | n majority n maj + 1 n maj , i + 1 n total , i ∑ ⋅ n total n total + | C | n total n total , i + | C | i = 1 – Sum of Laplacians – a reasonable and common approach – Another approach (used by ID3): Entropy l Just replace Laplacian part with information(node) CS 472 - Decision Trees 13

Information l Information of a message in bits: I ( m ) = -log 2 ( p m ) l If there are 16 equiprobable messages, I for each message is -log 2 (1/16) = 4 bits l If there is a set S of messages of only c types (i.e. there can be many of the same type [class] in the set), then information for one message is still: I = -log 2 ( p m ) l If the messages are not equiprobable then could we represent them with less bits? – Highest disorder (randomness) is maximum information CS 472 - Decision Trees 14

Information Gain Metric l Info( S ) is the average amount of information needed to identify the class of an example in S log 2 (| C |) | C | ∑ Info p i log 2 ( p i ) − l Info( S ) = Entropy( S ) = i = 1 l 0 £ Info( S ) £ log 2 (| C |), | C | is # of output classes 0 1 prob l Expected Information after partitioning using A : | A | | S i | l Info A ( S ) = where | A | is # of values ∑ | S | Info ( S i ) for attribute A i = 1 l Gain( A ) = Info( S ) - Info A ( S ) (i.e. minimize Info A ( S )) l Gain does not deal directly with the statistical significance issue more on that later – CS 472 - Decision Trees 15

ID3 Learning Algorithm 1. S = Training Set 2. Calculate gain for each remaining attribute: Gain( A ) = Info( S ) - Info A ( S ) 3. Select highest and create a new node for each partition 4. For each partition – if pure (one class) or if stopping criteria met (pure enough or small enough set remaining), then end – else if > 1 class then go to 2 with remaining attributes, or end if no remaining attributes and label with most common class of parent – else if empty, label with most common class of parent (or set as null) |%| 𝐽𝑜𝑔𝑝 𝑇 = − ( 𝑞 ! 𝑚𝑝𝑕 & 𝑞 ! !"# ( ( |%| 𝑇 𝑇 ' ' 𝐽𝑜𝑔𝑝 𝐵 𝑇 = ( 𝑇 𝐽𝑜𝑔𝑝 𝑇 ' = ( 𝑇 , − ( 𝑞 ! 𝑚𝑝𝑕 & 𝑞 ! '"# '"# !"# CS 472 - Decision Trees 16

ID3 Learning Algorithm 1. S = Training Set 2. Calculate gain for each remaining attribute: Gain( A ) = Info( S ) - Info A ( S ) 3. Select highest and create a new node for each partition 4. For each partition – if one class (or if stopping criteria met) then end – else if > 1 class then go to 2 with remaining attributes, or end if no remaining attributes and label with most common class of parent – else if empty, label with most common class of parent (or set as null) Meat Crust Veg Quality N,Y D,S,T N,Y B,G,Gr |%| Y Thin N Great 𝐽𝑜𝑔𝑝 𝑇 = − ( 𝑞 ! 𝑚𝑝𝑕 & 𝑞 ! N Deep N Bad !"# N Stuffed Y Good ( ( |%| Y Stuffed Y Great 𝑇 𝑇 ' ' 𝐽𝑜𝑔𝑝 𝐵 𝑇 = ( 𝑇 𝐽𝑜𝑔𝑝 𝑇 ' = ( 𝑇 , − ( 𝑞 ! 𝑚𝑝𝑕 & 𝑞 ! Y Deep N Good Y Deep Y Great '"# '"# !"# N Thin Y Good Y Deep N Good CS 472 - Decision Trees 17 N Thin N Bad

Meat Crust Veg Quality Example and Homework N,Y D,S,T N,Y B,G,Gr Y Thin N Great |%| N Deep N Bad N Stuffed Y Good 𝐽𝑜𝑔𝑝 𝑇 = − ( 𝑞 ! 𝑚𝑝𝑕 & 𝑞 ! Y Stuffed Y Great !"# Y Deep N Good Y Deep Y Great ( ( |%| 𝑇 𝑇 N Thin Y Good ' ' 𝐽𝑜𝑔𝑝 𝐵 𝑇 = ( 𝑇 𝐽𝑜𝑔𝑝 𝑇 ' = ( 𝑇 , − ( 𝑞 ! 𝑚𝑝𝑕 & 𝑞 ! Y Deep N Good '"# '"# !"# N Thin N Bad l Info( S ) = - 2/9·log 2 2/9 - 4/9·log 2 4/9 -3/9·log 2 3/9 = 1.53 – Not necessary unless you want to calculate information gain l Starting with all instances, calculate gain for each attribute l Let’s do Meat: l Info Meat ( S ) = ? – Information Gain is ? CS 472 - Decision Trees 18

Decision Trees l Highly used and successful l Iteratively split the - PowerPoint PPT Presentation

Decision Trees l Highly used and successful l Iteratively split the Data Set into subsets one attribute at a time, using most informative attributes first Thus, constructively chooses which attributes to use and ignore l Continue until you can

Decision Trees Lecture 23 To left or to right 1 Decision Trees 2 Decision Trees A different

Decision Trees Lecture 22 To left or to right 1 Decision Trees 2 Decision Trees A different

Learning Decision Trees Representation is a decision tree. Bias is towards simple decision

Trees Trees CSE, IIT KGP Trees and Spanning Trees Trees and Spanning Trees A graph having

( ( ) ) ( ) ( ) = = Work = h log t n B- B -Trees Trees B B- -Trees

Trees Chapter 11 Chapter Summary Introduction to Trees Applications of Trees Tree

Decision Tree R Greiner Cmput 466 / 551 Learning Decision Trees Def'n: Decision Trees

Trees Eric McCreath Overview In this lecture we will explore: general trees, binary trees,

Lecture 23: Decision Trees Decision trees Prof. Julia Hockenmaier

Outline Univariate Trees 1 Decision Trees Classification Regression Pruning Steven J Zeil

2-3-4 Trees and Red- Black Trees 204 erm CS 16: Balanced Trees 2-3-4 Trees Revealed Nodes

/ + - * * 5 3 2 6 5 2 Examples Binary Trees BSTs Augmenting BinExpr General Trees

Learning Decision Trees Machine Learning 1 Some slides from Tom Mitchell, Dan Roth and others

Optimal Sparse Decision Trees Xiyang Hu Cynthia Rudin Margo Seltzer Carnegie Mellon Duke

Decision Trees: Discussion Machine Learning 1 Some slides from Tom Mitchell, Dan Roth and others

Decision trees Decision Trees / Discrete Variables Location Season Location Fun? Ski Slope

New +! +!+ +! +! +!Oscillation +! Results from NOvA Jeremy Wolcott (Tufts

Generative Adversarial Network Tianze Wang tianzew@kth.se The Course Web Page

Callcentre rollout strategy Presented by: Myprotector Group MYPROTECTOR PRODUCTS product set-up

FOUNDATIONS [Track One : Believer To Disciple] Lesson 05 : Prayer Lesson 05 : Prayer Prayer

COPS and RAP Overview Raj Yavatkar, Intel (on behalf of the RAP working group) 9/10/98 Policy

(example) Joint IAEA-ICTP Essential Knowladge Workshop on Nuclear Power Plant Design Safety

Static and dynamic verification Static and dynamic V&V Software inspections Concerned

Contextual Privacy Management in Extended RBAC Model Nabil Ajam, Nora Cuppens, Frdric

Decision Trees l Highly used and successful l Iteratively split the - PowerPoint PPT Presentation

Decision Trees l Highly used and successful l Iteratively split the Data Set into subsets one attribute at a time, using most informative attributes first Thus, constructively chooses which attributes to use and ignore l Continue until you can

Decision Trees Lecture 23 To left or to right 1 Decision Trees 2 Decision Trees A different

Decision Trees Lecture 22 To left or to right 1 Decision Trees 2 Decision Trees A different

Learning Decision Trees Representation is a decision tree. Bias is towards simple decision

Trees Trees CSE, IIT KGP Trees and Spanning Trees Trees and Spanning Trees A graph having

( ( ) ) ( ) ( ) = = Work = h log t n B- B -Trees Trees B B- -Trees

Trees Chapter 11 Chapter Summary Introduction to Trees Applications of Trees Tree

Decision Tree R Greiner Cmput 466 / 551 Learning Decision Trees Def'n: Decision Trees

Trees Eric McCreath Overview In this lecture we will explore: general trees, binary trees,

Lecture 23: Decision Trees Decision trees Prof. Julia Hockenmaier

Outline Univariate Trees 1 Decision Trees Classification Regression Pruning Steven J Zeil

2-3-4 Trees and Red- Black Trees 204 erm CS 16: Balanced Trees 2-3-4 Trees Revealed Nodes

/ + - * * 5 3 2 6 5 2 Examples Binary Trees BSTs Augmenting BinExpr General Trees

Learning Decision Trees Machine Learning 1 Some slides from Tom Mitchell, Dan Roth and others

Optimal Sparse Decision Trees Xiyang Hu Cynthia Rudin Margo Seltzer Carnegie Mellon Duke

Decision Trees: Discussion Machine Learning 1 Some slides from Tom Mitchell, Dan Roth and others

Decision trees Decision Trees / Discrete Variables Location Season Location Fun? Ski Slope

New +! +!+ +! +! +!Oscillation +! Results from NOvA Jeremy Wolcott (Tufts

Generative Adversarial Network Tianze Wang tianzew@kth.se The Course Web Page

Callcentre rollout strategy Presented by: Myprotector Group MYPROTECTOR PRODUCTS product set-up

FOUNDATIONS [Track One : Believer To Disciple] Lesson 05 : Prayer Lesson 05 : Prayer Prayer

COPS and RAP Overview Raj Yavatkar, Intel (on behalf of the RAP working group) 9/10/98 Policy

(example) Joint IAEA-ICTP Essential Knowladge Workshop on Nuclear Power Plant Design Safety

Static and dynamic verification Static and dynamic V&amp;V Software inspections Concerned

Contextual Privacy Management in Extended RBAC Model Nabil Ajam, Nora Cuppens, Frdric

Static and dynamic verification Static and dynamic V&V Software inspections Concerned