Lecture 24: Other (Non-linear) Classifiers: Decision Tree Learning, Boosting, and Support Vector Classification

Instructor: Prof. Ganesh Ramakrishnan

October 20, 2016


Decision Trees: Cascade of step functions on individual features

Figure: the PlayTennis decision tree —
Outlook = Sunny → test Humidity (High → No, Normal → Yes)
Outlook = Overcast → Yes
Outlook = Rain → test Wind (Strong → No, Weak → Yes)



Use cases for Decision Tree Learning



The Canonical PlayTennis Dataset

Day  Outlook   Temperature  Humidity  Wind    PlayTennis
D1   Sunny     Hot          High      Weak    No
D2   Sunny     Hot          High      Strong  No
D3   Overcast  Hot          High      Weak    Yes
D4   Rain      Mild         High      Weak    Yes
D5   Rain      Cool         Normal    Weak    Yes
D6   Rain      Cool         Normal    Strong  No
D7   Overcast  Cool         Normal    Strong  Yes
D8   Sunny     Mild         High      Weak    No
D9   Sunny     Cool         Normal    Weak    Yes
D10  Rain      Mild         Normal    Weak    Yes
D11  Sunny     Mild         Normal    Strong  Yes
D12  Overcast  Mild         High      Strong  Yes
D13  Overcast  Hot          Normal    Weak    Yes
D14  Rain      Mild         High      Strong  No



Decision tree representation

Each internal node tests an attribute
Each branch corresponds to an attribute value
Each leaf node assigns a classification

How would we represent: ∧, ∨, XOR, (A ∧ B) ∨ (C ∧ ¬D ∧ E), M of N?
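To make the representational point concrete, here is a minimal Python sketch (my own illustration, not from the slides): the boolean concept (A ∧ B) ∨ (C ∧ ¬D ∧ E) evaluated as a cascade of single-attribute tests, which is the shape a decision tree over boolean attributes takes. The function name and the ordering of the tests are illustrative choices.

```python
# Illustrative sketch: (A ∧ B) ∨ (C ∧ ¬D ∧ E) as a cascade of single-attribute tests.
# Each `if` inspects exactly one attribute, mirroring one internal node of a decision tree.
def tree_predict(a, b, c, d, e):
    if a:                      # root node: test A
        if b:                  # A = True branch: test B
            return True        # leaf: A ∧ B holds
    if c:                      # remaining paths: test C
        if not d:              # then ¬D
            if e:              # then E
                return True    # leaf: C ∧ ¬D ∧ E holds
    return False               # every other path ends in a "No" leaf

assert tree_predict(True, True, False, True, False) is True    # A ∧ B
assert tree_predict(False, False, True, False, True) is True   # C ∧ ¬D ∧ E
assert tree_predict(False, True, False, False, False) is False
# XOR, by contrast, cannot be decided by any single test: every path of its tree must
# examine both attributes, and M-of-N concepts similarly force deep or wide trees.
```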




Top-Down Induction of Decision Trees

Main loop:

1. φi ← the “best” decision attribute for the next node
2. Assign φi as the decision attribute for node
3. For each value of φi, create a new descendant of node
4. Sort training examples to leaf nodes
5. If training examples are perfectly classified, then STOP; else iterate over new leaf nodes

Which attribute is best? Answer: that which brings about maximum reduction in impurity Imp(Sv) of the data subset Sv ⊆ D induced by φi = v.

S is a sample of training examples, pCi is the proportion of examples belonging to class Ci in S. Entropy measures the impurity of S:

H(S) ≡ − ∑_{i=1}^{K} pCi log2 pCi

Gain(S, φi) = expected reduction in entropy due to splitting/sorting on φi:

Gain(S, φi) ≡ H(S) − ∑_{v ∈ Values(φi)} (|Sv| / |S|) H(Sv)
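As a concrete illustration, here is a small Python sketch (my own, not the lecture's code) that computes H(S) and Gain(S, φi) on the PlayTennis table above; the tuple layout and the attribute-to-index map are assumptions of this sketch.

```python
# Entropy and information gain on the PlayTennis table. Each row is
# (Outlook, Temperature, Humidity, Wind, PlayTennis); the class label is the last field.
from collections import Counter
from math import log2

data = [
    ("Sunny", "Hot", "High", "Weak", "No"),     ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"), ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),  ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"), ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"), ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"), ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"), ("Rain", "Mild", "High", "Strong", "No"),
]
ATTRS = {"Outlook": 0, "Temperature": 1, "Humidity": 2, "Wind": 3}

def entropy(rows):
    """H(S) = - sum_i p_Ci * log2(p_Ci), computed over the class labels."""
    n = len(rows)
    return -sum((c / n) * log2(c / n) for c in Counter(r[-1] for r in rows).values())

def gain(rows, attr):
    """Gain(S, attr) = H(S) - sum_v (|S_v| / |S|) * H(S_v)."""
    idx = ATTRS[attr]
    remainder = 0.0
    for v in {r[idx] for r in rows}:
        subset = [r for r in rows if r[idx] == v]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(rows) - remainder

for a in ATTRS:
    print(a, round(gain(data, a), 3))
# Outlook yields the largest gain on this data, so top-down induction picks it as the root.
```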



Common Impurity Measures (Tutorial 9)

φs = arg max_{φi} ( Imp(S) − ∑_{vij ∈ V(φi)} (|Svij| / |S|) Imp(Svij) )

where Svij ⊆ D is the subset of the dataset such that each instance x has attribute value φi(x) = vij.

Table: Decision Tree: Impurity measures

Name                     Imp(S)
Entropy                  − ∑_{i=1}^{K} Pr(Ci) · log(Pr(Ci))
Gini Index               ∑_{i=1}^{K} Pr(Ci) (1 − Pr(Ci))
Class (Min Prob) Error   min_i (1 − Pr(Ci))

These measure the extent of spread/confusion of the probabilities over the classes
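For intuition, a short sketch (an assumed toy example, not from the tutorial) comparing the three impurity measures on a two-class distribution parameterised by p = Pr(C1):

```python
# Comparing the three impurity measures on a two-class distribution with p = Pr(C1).
from math import log2

def entropy(p):
    """- sum_i Pr(Ci) log2 Pr(Ci), treating 0·log 0 as 0."""
    return -sum(q * log2(q) for q in (p, 1 - p) if q > 0)

def gini(p):
    """sum_i Pr(Ci) (1 - Pr(Ci))."""
    return sum(q * (1 - q) for q in (p, 1 - p))

def class_error(p):
    """min_i (1 - Pr(Ci)), i.e. 1 - max_i Pr(Ci)."""
    return min(p, 1 - p)

for p in (0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0):
    print(f"p = {p:.1f}   entropy = {entropy(p):.3f}   gini = {gini(p):.3f}   error = {class_error(p):.3f}")
# All three peak at p = 0.5 (maximum confusion) and vanish when one class holds all the mass,
# which is the "spread/confusion" behaviour described above and plotted on the next slide.
```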



Alternative impurity measures (Tutorial 9)

Figure: Plot of Entropy, Gini Index and Misclassification Accuracy. Source: https://inspirehep.net/record/1225852/files/TPZ_Figures_impurity.png

These measure the extent of spread/confusion of the probabilities over the classes




Regularization in Decision Tree Learning

Premise: Split data into train and validation set [1]. Structural Regularization [2] based on Occam's razor [3]:

1. Stop growing when a data split is not statistically significant
   ⋆ Use parametric/non-parametric hypothesis tests
2. Grow the full tree, then post-prune the tree
   ⋆ Minimum Description Length (MDL): minimize size(tree) + size(misclassifications_val(tree))
   ⋆ Achieved as follows (sketched in code below): Do until further pruning is harmful
     (1) Evaluate the impact on the validation set of pruning each possible node (plus those below it)
     (2) Greedily remove the one that most improves validation set accuracy
3. Convert the tree into a set of rules and post-prune each rule independently (C4.5 Decision Tree Learner)

[1] Note: The test set still remains separate
[2] Like we discussed in the case of Convolutional Neural Networks
[3] Prefer the shortest hypothesis that fits the data
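Below is a rough Python sketch of the reduced-error post-pruning loop from point 2. The tree encoding is an assumption of this sketch: an internal node is a dict with "attr", "children" and "majority" (the majority class of the training examples that reached the node), a leaf is just a class label, and validation data is a list of (example-dict, label) pairs.

```python
# Reduced-error pruning: repeatedly replace the internal node whose removal most helps
# validation accuracy, stopping when every possible prune hurts.
import copy

def predict(tree, x):
    """Route example x (a dict: attribute -> value) down to a leaf label."""
    while isinstance(tree, dict):
        tree = tree["children"].get(x.get(tree["attr"]), tree["majority"])
    return tree

def accuracy(tree, data):
    return sum(predict(tree, x) == y for x, y in data) / len(data)

def internal_nodes(tree, path=()):
    """Yield the path (sequence of branch values) to every internal node."""
    if isinstance(tree, dict):
        yield path
        for v, sub in tree["children"].items():
            yield from internal_nodes(sub, path + (v,))

def pruned_at(tree, path):
    """Copy of `tree` with the node at `path` replaced by its majority-class leaf."""
    new = copy.deepcopy(tree)
    if not path:
        return new["majority"]
    node = new
    for v in path[:-1]:
        node = node["children"][v]
    node["children"][path[-1]] = node["children"][path[-1]]["majority"]
    return new

def reduced_error_prune(tree, val_data):
    """Greedily prune while validation accuracy does not drop (ties favour the smaller tree)."""
    while True:
        best, best_acc = None, accuracy(tree, val_data)
        for path in internal_nodes(tree):
            candidate = pruned_at(tree, path)
            acc = accuracy(candidate, val_data)
            if acc >= best_acc:
                best, best_acc = candidate, acc
        if best is None:
            return tree
        tree = best

# Tiny usage example (hypothetical tree over the PlayTennis attributes, tiny validation set):
tree = {"attr": "Outlook", "majority": "Yes",
        "children": {"Overcast": "Yes",
                     "Sunny": {"attr": "Humidity", "majority": "No",
                               "children": {"High": "No", "Normal": "Yes"}},
                     "Rain": {"attr": "Wind", "majority": "Yes",
                              "children": {"Strong": "No", "Weak": "Yes"}}}}
val = [({"Outlook": "Sunny", "Humidity": "High", "Wind": "Weak"}, "No"),
       ({"Outlook": "Rain", "Humidity": "High", "Wind": "Strong"}, "No")]
print(reduced_error_prune(tree, val))
```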


General Minimum Description Length

Data is D and the theory about the data is T. MDL principle: Define I(D|T) and I(T) and choose T such that it minimizes I(D|T) + I(T). This is also aligned with the Occam's razor principle. Bayes Estimation: I(D|T) = − log P(D|T) and I(T) = − log P(T), so minimizing I(D|T) + I(T) amounts to maximizing P(D|T) P(T) ∝ P(T|D).




General Feature Selection based on Gain

S is a sample of training examples, pCi is the proportion of examples with class Ci in S. Entropy measures the impurity of S:

H(S) ≡ − ∑_{i=1}^{K} pCi log2 pCi

Gain(S, φi) = expected gain due to the choice of φi. E.g., gain based on entropy:

Gain(S, φi) ≡ H(S) − ∑_{v ∈ Values(φi)} (|Sv| / |S|) H(Sv)

Selecting the R best attributes (the greedy loop is sketched below): Let R = ∅. Do:

1. φ∗ = argmax over φi (not already in R) of Gain(S, φi)
2. R = R ∪ {φ∗}

Until |R| = R

Q: What other measures of Gain could you think of?
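A small sketch (assumed data layout, not the lecture's code) of the greedy top-R selection loop, written so that any impurity measure can be plugged in; one answer to the question above is to swap entropy for the Gini index or the min-probability error from the impurity-measures table earlier.

```python
# Greedy selection of the R attributes with the largest gain, with a pluggable impurity measure.
from collections import Counter
from math import log2

def impurity_entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain(rows, labels, attr, impurity=impurity_entropy):
    """Imp(S) - sum_v (|S_v| / |S|) * Imp(S_v); rows are dicts mapping attribute -> value."""
    total = impurity(labels)
    for v in {r[attr] for r in rows}:
        sub = [y for r, y in zip(rows, labels) if r[attr] == v]
        total -= len(sub) / len(labels) * impurity(sub)
    return total

def select_top_R(rows, labels, attributes, R, impurity=impurity_entropy):
    """Greedily add the not-yet-chosen attribute with the largest gain until R are selected."""
    chosen = []
    while len(chosen) < R:
        remaining = [a for a in attributes if a not in chosen]
        chosen.append(max(remaining, key=lambda a: gain(rows, labels, a, impurity)))
    return chosen

# Tiny illustrative call:
rows = [{"Outlook": "Sunny", "Wind": "Weak"}, {"Outlook": "Rain", "Wind": "Strong"},
        {"Outlook": "Overcast", "Wind": "Weak"}]
labels = ["No", "Yes", "Yes"]
print(select_top_R(rows, labels, ["Outlook", "Wind"], R=1))   # -> ['Outlook']
```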




Injecting Randomness: Bagging and Ensemble

Main loop:

1. φi ← “best” decision attribute for the next node
2. Assign φi as the decision attribute for node
3. For each value of φi, create a new descendant of node
4. Sort training examples to leaf nodes
5. If training examples are perfectly classified, then STOP; else iterate over new leaf nodes

Steps (1) and (4) are prohibitive with large numbers of attributes (1000s) and training examples (100,000s). Alternatives? Uniformly at random (with replacement), sample subsets Ds ⊆ D of the training data and Φs ⊆ Φ of the attribute set, and construct a decision tree Ts for each such random subset.

Random Forest Algorithm: For s = 1 to B repeat:

1. Bagging: Draw a bootstrap sample Ds of size ms from the training data D of size m
2. Grow a random decision tree Ts on Ds by recursively repeating steps (1)-(5) of the decision tree construction algorithm, with the following difference in step (1):
   (1) φi ← “best” decision attribute for the next node, chosen from Φs, where Φs ⊆ Φ is a sample of size ns

Output: Ensemble of trees {Ts}, s = 1, …, B
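A rough sketch of the loop above (my own example, not the lecture's code): drive the bagging with NumPy and use scikit-learn's DecisionTreeClassifier as the base tree grower. The dataset is synthetic, and note one assumption: max_features re-draws the ns attributes at every split (the standard Breiman-style variant), which differs slightly from drawing one Φs per tree as written on the slide.

```python
# Random-forest loop: bootstrap D_s, grow a randomized tree T_s, repeat B times.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))                      # m = 500 examples, |Φ| = 10 attributes
y = (X[:, 0] + X[:, 1] * X[:, 2] > 0).astype(int)   # a synthetic, nonlinear target

B, m_s, n_s = 25, 500, 3                            # number of trees, bootstrap size, attrs per split
forest = []
for s in range(B):
    idx = rng.integers(0, len(X), size=m_s)         # bootstrap sample D_s (with replacement)
    tree = DecisionTreeClassifier(max_features=n_s, random_state=s)
    tree.fit(X[idx], y[idx])                        # grow T_s on D_s
    forest.append(tree)

print(f"grew an ensemble of {len(forest)} trees")
```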




Random Forest applied to Query (Test) data

Output of the Random Forest Algorithm: Ensemble of trees {Ts}, s = 1, …, B

Consider Prt(c | x) for each tree t ∈ T and each class c ∈ [1..K], based on the proportion of training points of class c at the leaf node determined by the path of query point x in tree t.

Decision for a new test point x: Pr(c | x) = (1/|T|) ∑_{t=1}^{|T|} Prt(c | x)

For m data points, with |T| = √m, consistency results have been proved [4]

[4] Breiman et al. http://www.jmlr.org/papers/volume9/biau08a/biau08a.pdf and https://www.microsoft.com/en-us/research/publication/decision-forests-a-unified-framework-for-classification-regression-density-estimation-manifold-learning-and-semi-supervised-learning/ for several other results on random forests
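A tiny sketch of the decision rule itself (the probability values below are made up): average the per-tree class-probability vectors, i.e. the leaf-node class proportions each tree assigns to the query point.

```python
# Ensemble decision rule: Pr(c | x) = (1 / |T|) * sum_t Pr_t(c | x).
# `per_tree_probs` holds, for one query point x, each tree's class proportions at the leaf x reaches.
import numpy as np

def ensemble_posterior(per_tree_probs):
    return np.mean(np.asarray(per_tree_probs, dtype=float), axis=0)

# Three hypothetical trees, K = 2 classes:
print(ensemble_posterior([[0.9, 0.1], [0.6, 0.4], [0.7, 0.3]]))   # -> [0.7333... 0.2666...]
```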



Random Forest: Balancing Bias and Variance

Decision for a new test point x: Pr(c | x) = (1/|T|) ∑_{t=1}^{|T|} Prt(c | x)

Each single decision tree, viewed as an estimator of the ideal tree, has high variance but very little bias (few assumptions). But since the decision trees Ti and Tj are uncorrelated, when the decision is averaged across them, it tends to be very accurate.
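A quick numerical check of the averaging argument (an assumed toy model, not from the slides): if each tree's prediction behaves like an unbiased estimate with variance σ², the mean of B uncorrelated predictions has variance σ²/B while the bias is unchanged.

```python
# Averaging uncorrelated, unbiased estimators shrinks variance by a factor of B.
import numpy as np

rng = np.random.default_rng(0)
true_value, sigma, B = 0.7, 0.2, 25
single = rng.normal(true_value, sigma, size=100_000)                       # one "tree"
ensemble = rng.normal(true_value, sigma, size=(100_000, B)).mean(axis=1)   # average of B "trees"
print("single-tree variance ≈", round(float(single.var()), 4))             # ≈ sigma^2 = 0.04
print("ensemble    variance ≈", round(float(ensemble.var()), 4))           # ≈ sigma^2 / B = 0.0016
```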



Extra Reading: Bias-Variance Trade-off

Instructor: Prof. Ganesh Ramakrishnan



Bias and Variance

Bias and Variance are two important properties of a machine learning model. They help us measure the accuracy of the model and the dependence between the trained model and the training data set. (Q: Is greater dependence good?)

The variance of a model is the variance in the predictions of models trained over different training data sets. (Is high variance good?)

The bias of a model is the difference between the expected prediction of the model and the true values which we are trying to predict. (Is low bias good?)

In this lecture we will talk about the trade-off between the two.



Bias and Variance

Figure: The distance of the cluster from the bull's eye represents bias and the spread of the cluster represents variance. (src: zhangjunhd.github.io/2014/10/01/bias-variance-tradeoff.html)



Expected loss of a model

Say we are given the training data TD containing values for x, and the target variable is y. P(x, y) is the joint distribution over x and y. f(x) is our target function (since this function depends on TD as well, it is more appropriate to call it f(x, TD)). To find the expected loss of the model over the distribution of the data, we first simplify the expected loss expression. For square loss we get,

E_{P(x,y)}[(f(x) − y)²] = ∫_x ∫_y (f(x) − y)² P(x, y) dx dy



E_{P(x,y)}[(f(x) − y)²] = ∫_x ∫_y (f(x) − y)² P(x, y) dx dy

= ∫_x ∫_y (f(x) − E(y|x) + E(y|x) − y)² P(x, y) dx dy

= ∫_x ∫_y (f(x) − E(y|x))² P(x, y) dx dy + ∫_x ∫_y (E(y|x) − y)² P(x, y) dx dy + 2 ∫_x ∫_y (f(x) − E(y|x)) (E(y|x) − y) P(x, y) dx dy

We will rewrite the 3rd term in the final equation as:

2 ∫_x ∫_y (f(x) − E(y|x)) (E(y|x) − y) P(x, y) dx dy = 2 ∫_x (f(x) − E(y|x)) ( ∫_y (E(y|x) − y) P(y|x) dy ) P(x) dx

By definition ∫_y y P(y|x) dy = E(y|x). Therefore the inner integral is 0.



Finally we get,

E_{P(x,y)}[(f(x) − y)²] = ∫_x ∫_y (f(x) − E(y|x))² P(x, y) dx dy + ∫_x ∫_y (E(y|x) − y)² P(x, y) dx dy

The 2nd term is independent of f. Can you think of a situation when the 2nd term will be 0? Q: For what value of f will this loss be minimized?



The minimum loss will be achieved when f(x) = E(y|x).

Now let us find the expected loss over the training data. Using our previous analysis we see that only the (f(x) − E(y|x))² component can be minimized. (Remember f is dependent on TD.) (Simple Q: Why is integrating over TD and over (x, y) the same?)

∫_{TD} (f(x, TD) − E(y|x))² P(TD) dTD
= E_{TD}[(f(x, TD) − E_{TD}[f(x, TD)] + E_{TD}[f(x, TD)] − E(y|x))²]
= E_{TD}[(f(x, TD) − E_{TD}[f(x, TD)])² + (E_{TD}[f(x, TD)] − E(y|x))² + 2 (E_{TD}[f(x, TD)] − E(y|x)) (f(x, TD) − E_{TD}[f(x, TD)])]

The last term vanishes (WHY?) and we get:

E_{TD}[(f(x, TD) − E_{TD}[f(x, TD)])²] + (E_{TD}[f(x, TD)] − E(y|x))²



Bias and Variance

E_{TD}[(f(x, TD) − E_{TD}[f(x, TD)])²] + (E_{TD}[f(x, TD)] − E(y|x))² = Variance + Bias²

Finally we say the expected loss of the model is: Variance + Bias² + Noise. The noise in the measurement can cause errors in prediction; that is depicted by the third term.



Interpret with example - Linear Regression

If we were to use linear regression with a low-degree polynomial, we are introducing a bias: that the dependency of the predicted variable is simple. Similarly, when we add a regularizer term, we are implicitly biased towards weights that are not big. By being biased towards a smaller class of models, the predicted values will have smaller variation when trained over different samples (low variance), but may fit poorly compared to a complex model (high bias). The low variance makes the model generalizable over the samples.



Interpret with example - Linear Regression

Suppose we complicate our regression model by increasing the degree of the polynomial used. As we have seen before, this will lead to complex curves that tend to pass through all the points. Here we have put fewer restrictions on our model and hence have less bias: for a given training set our prediction could be very good (low bias). However, if we consider different training sets, our models could vary wildly (high variance). This reduces the generalizability of the model.
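The trade-off in the last two slides can be simulated directly. The sketch below (an assumed setup: data from y = sin(2πx) + noise, 20-point training sets) fits a degree-1 and a degree-9 polynomial to many resampled training sets and estimates the variance and squared bias of the prediction at a fixed query point.

```python
# Bias-variance of simple vs. complex polynomial fits over many resampled training sets.
import numpy as np

rng = np.random.default_rng(0)
true_f = lambda x: np.sin(2 * np.pi * x)
x_query, n_train, n_datasets, noise = 0.25, 20, 500, 0.3    # true_f(x_query) = 1.0

for degree in (1, 9):
    preds = []
    for _ in range(n_datasets):                     # different training sets TD
        x = rng.uniform(0, 1, n_train)
        y = true_f(x) + rng.normal(0, noise, n_train)
        coeffs = np.polyfit(x, y, degree)           # least-squares polynomial fit f(., TD)
        preds.append(np.polyval(coeffs, x_query))   # f(x_query, TD)
    preds = np.array(preds)
    variance = preds.var()                          # E_TD[(f - E_TD[f])^2]
    bias_sq = (preds.mean() - true_f(x_query)) ** 2 # (E_TD[f] - E(y|x))^2
    print(f"degree {degree}:  variance = {variance:.4f},  bias^2 = {bias_sq:.4f}")
# Typically the degree-1 fit shows small variance but a large bias^2 at x = 0.25,
# while the degree-9 fit shows the reverse.
```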



Conclusion

This is the Bias-Variance Trade-off in action. Simple models usually have low variance but high bias, and complex models usually have high variance and low bias. Food for thought: So how should we choose our model? Also, whenever you learn about a new algorithm, it is a good exercise to see how the trade-off works there. For example, think about how the trade-off manifests itself in the K Nearest Neighbor algorithm.
