Introduction to Machine Learning CART: Splitting Criteria


SLIDE 1

Introduction to Machine Learning CART: Splitting Criteria

compstat-lmu.github.io/lecture_i2ml

SLIDE 2

TREES

Classification Tree:

[Figure: classification tree fit to the Iris data, using Petal.Length and Petal.Width to predict Species (setosa, versicolor, virginica).]

Regression Tree:

[Figure: regression tree with constant predictions in the leaf nodes (-1.20, -0.42, 0.98, -0.20, -0.01).]

SLIDE 3

SPLITTING CRITERIA

How to find good splitting rules to define the tree?

⇒ empirical risk minimization

SLIDE 4

SPLITTING CRITERIA: FORMALIZATION

Let $\mathcal{N} \subseteq \mathcal{D}$ be the data that is assigned to a terminal node $\mathcal{N}$ of a tree. Let $c$ be the predicted constant value for the data assigned to $\mathcal{N}$: $\hat{y} \equiv c$ for all $(x, y) \in \mathcal{N}$. Then the risk $\mathcal{R}(\mathcal{N})$ for a leaf is simply the average loss for the data assigned to that leaf under a given loss function $L$:

$$\mathcal{R}(\mathcal{N}) = \frac{1}{|\mathcal{N}|} \sum_{(x,y) \in \mathcal{N}} L(y, c)$$

The prediction is given by the optimal constant $c = \arg\min_c \mathcal{R}(\mathcal{N})$.

SLIDE 5

SPLITTING CRITERIA: FORMALIZATION

A split w.r.t. feature $x_j$ at split point $t$ divides a parent node $\mathcal{N}$ into

$$\mathcal{N}_1 = \{(x, y) \in \mathcal{N} : x_j \leq t\} \quad \text{and} \quad \mathcal{N}_2 = \{(x, y) \in \mathcal{N} : x_j > t\}.$$

In order to evaluate how good a split is, we compute the empirical risks in both child nodes and sum them up:

$$\mathcal{R}(\mathcal{N}, j, t) = \frac{|\mathcal{N}_1|}{|\mathcal{N}|} \mathcal{R}(\mathcal{N}_1) + \frac{|\mathcal{N}_2|}{|\mathcal{N}|} \mathcal{R}(\mathcal{N}_2) = \frac{1}{|\mathcal{N}|} \left( \sum_{(x,y) \in \mathcal{N}_1} L(y, c_1) + \sum_{(x,y) \in \mathcal{N}_2} L(y, c_2) \right)$$

Finding the best way to split $\mathcal{N}$ into $\mathcal{N}_1, \mathcal{N}_2$ means solving

$$\arg\min_{j,t} \mathcal{R}(\mathcal{N}, j, t)$$
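
A hedged sketch of the exhaustive search implied by this argmin: for every feature $j$ and every candidate threshold $t$, split the node, compute the weighted child risks, and keep the best pair. The names `best_split` and `node_risk` are illustrative (here with L2 loss), not code from the lecture:

```python
import numpy as np

def node_risk(y):
    """R(N) under L2 loss with the optimal constant (the node mean): the node variance."""
    return np.mean((y - y.mean()) ** 2) if len(y) else 0.0

def best_split(X, y):
    """Search over all features j and thresholds t for argmin_{j,t} R(N, j, t)."""
    n, p = X.shape
    best = (None, None, np.inf)
    for j in range(p):
        # candidate thresholds: midpoints between sorted unique feature values
        values = np.unique(X[:, j])
        for t in (values[:-1] + values[1:]) / 2:
            left, right = X[:, j] <= t, X[:, j] > t
            risk = (left.sum() * node_risk(y[left]) + right.sum() * node_risk(y[right])) / n
            if risk < best[2]:
                best = (j, t, risk)
    return best  # feature index, threshold, weighted child risk

X = np.array([[1.0], [2.0], [3.0], [10.0]])
y = np.array([1.0, 1.1, 0.9, 5.0])
print(best_split(X, y))  # splits off the outlying point at t = 6.5
```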

SLIDE 6

SPLITTING CRITERIA: REGRESSION

For regression trees, we usually use the L2 loss:

$$\mathcal{R}(\mathcal{N}) = \frac{1}{|\mathcal{N}|} \sum_{(x,y) \in \mathcal{N}} (y - c)^2$$

The best constant prediction under the L2 loss is the mean:

$$c = \bar{y}_\mathcal{N} = \frac{1}{|\mathcal{N}|} \sum_{(x,y) \in \mathcal{N}} y$$
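
A one-line derivation (standard calculus, not spelled out on the slide) of why the mean is the risk-minimal constant: setting the derivative of the node risk w.r.t. $c$ to zero gives

$$\frac{\partial}{\partial c} \frac{1}{|\mathcal{N}|} \sum_{(x,y) \in \mathcal{N}} (y - c)^2 = -\frac{2}{|\mathcal{N}|} \sum_{(x,y) \in \mathcal{N}} (y - c) = 0 \quad \Longrightarrow \quad c = \frac{1}{|\mathcal{N}|} \sum_{(x,y) \in \mathcal{N}} y = \bar{y}_\mathcal{N}.$$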

SLIDE 7

SPLITTING CRITERIA: REGRESSION

This means the best split is the one that minimizes the (pooled) variance of the target distribution in the child nodes $\mathcal{N}_1$ and $\mathcal{N}_2$. We can also interpret this as a way of measuring the impurity of the target distribution, i.e., how much it diverges from a constant in each of the child nodes. For the L1 loss, $c$ is the median of the $y$ in $\mathcal{N}$.
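
As a quick numerical check (illustrative, not part of the slides): the constant minimizing the average L1 loss over a node is indeed the median, while the mean minimizes the average L2 loss:

```python
import numpy as np

y = np.array([1.0, 2.0, 3.0, 4.0, 10.0])      # skewed sample in one node
grid = np.linspace(0.0, 12.0, 1201)           # candidate constants c

l1_risk = [np.mean(np.abs(y - c)) for c in grid]   # average L1 loss per constant
l2_risk = [np.mean((y - c) ** 2) for c in grid]    # average L2 loss per constant

print(grid[np.argmin(l1_risk)], np.median(y))  # 3.0 and 3.0: L1 optimum is the median
print(grid[np.argmin(l2_risk)], np.mean(y))    # 4.0 and 4.0: L2 optimum is the mean
```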

SLIDE 8

SPLITTING CRITERIA: CLASSIFICATION

Typically, we use either the Brier score (i.e., L2 loss on probabilities) or the Bernoulli loss (as in logistic regression) as the loss function. The predicted probabilities in node $\mathcal{N}$ are simply the class proportions in the node:

$$\hat{\pi}_k^{(\mathcal{N})} = \frac{1}{|\mathcal{N}|} \sum_{(x,y) \in \mathcal{N}} \mathbb{I}(y = k)$$

This is the optimal constant prediction under both the logistic / Bernoulli loss and the Brier loss.

[Figure: bar chart of the predicted class probabilities per label for one node.]
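
A small sketch of these quantities (function names illustrative; labels assumed to be integer-coded, with one-hot targets for the Brier score):

```python
import numpy as np

def class_proportions(y, n_classes):
    """pi_hat_k: share of observations of class k in the node."""
    return np.bincount(y, minlength=n_classes) / len(y)

def brier_node_risk(y, n_classes):
    """Average Brier score when predicting the class proportions for every observation."""
    pi = class_proportions(y, n_classes)
    onehot = np.eye(n_classes)[y]
    return np.mean(np.sum((onehot - pi) ** 2, axis=1))

def bernoulli_node_risk(y, n_classes):
    """Average negative log-likelihood (Bernoulli / log loss) of the class proportions."""
    pi = class_proportions(y, n_classes)
    return np.mean(-np.log(pi[y]))

y = np.array([0, 0, 0, 1, 1, 2])   # labels in one node
print(class_proportions(y, 3))     # [0.5, 0.333, 0.167]
print(brier_node_risk(y, 3), bernoulli_node_risk(y, 3))
```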

SLIDE 9

SPLITTING CRITERIA: COMMENTS

Splitting criteria for trees are usually defined in terms of "impurity reduction". Instead of minimizing the empirical risk in the child nodes over all possible splits, a measure of "impurity" of the distribution of the target $y$ in the child nodes is minimized. For regression trees, the "impurity" of a node is usually defined as the variance of the $y^{(i)}$ in the node. Minimizing this "variance impurity" is equivalent to minimizing the squared error loss for a predicted constant in the nodes.
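
To make the claimed equivalence explicit (one added step, using the definitions from the earlier slides): plugging the optimal constant $c = \bar{y}_\mathcal{N}$ into the L2 node risk gives exactly the variance of the target values in the node,

$$\mathcal{R}(\mathcal{N}) = \frac{1}{|\mathcal{N}|} \sum_{(x,y) \in \mathcal{N}} (y - \bar{y}_\mathcal{N})^2 = \operatorname{Var}_\mathcal{N}(y),$$

so picking the split with minimal weighted child risks is the same as picking the split with minimal (pooled) variance impurity.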

SLIDE 10

SPLITTING CRITERIA: COMMENTS

Minimizing the Brier score is equivalent to minimizing the Gini impurity

$$I(\mathcal{N}) = \sum_{k=1}^{g} \hat{\pi}_k^{(\mathcal{N})} \left(1 - \hat{\pi}_k^{(\mathcal{N})}\right)$$

Minimizing the Bernoulli loss is equivalent to minimizing the entropy impurity

$$I(\mathcal{N}) = -\sum_{k=1}^{g} \hat{\pi}_k^{(\mathcal{N})} \log \hat{\pi}_k^{(\mathcal{N})}$$

The approach based on loss functions instead of impurity measures is simpler and more straightforward, is mathematically equivalent, and shows that growing a tree can be understood in terms of empirical risk minimization.
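
A hedged numerical check (function names illustrative) that the Gini and entropy impurities follow directly from the class proportions, and that the Brier-based node risk coincides with the Gini impurity:

```python
import numpy as np

def gini(pi):
    """Gini impurity I(N) = sum_k pi_k (1 - pi_k)."""
    return np.sum(pi * (1 - pi))

def entropy(pi):
    """Entropy impurity I(N) = -sum_k pi_k log pi_k (terms with pi_k = 0 contribute 0)."""
    pi = pi[pi > 0]
    return -np.sum(pi * np.log(pi))

y = np.array([0, 0, 0, 1, 1, 2])                     # labels in one node
pi = np.bincount(y, minlength=3) / len(y)            # class proportions
onehot = np.eye(3)[y]
brier = np.mean(np.sum((onehot - pi) ** 2, axis=1))  # Brier node risk

print(gini(pi), brier)   # identical: ~0.611 and ~0.611
print(entropy(pi))       # ~1.011
```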

SLIDE 11

SPLITTING WITH MISCLASSIFICATION LOSS

Why don’t we use the misclassification loss for classification trees, i.e., always predict the majority class in each child node and count how many errors we make? In many other cases we are interested in minimizing exactly this kind of error, but have to approximate it by some other criterion, since the misclassification loss does not have derivatives that we can use for optimization. We don’t need derivatives when we optimize the tree, so we could go for it! This is possible, but the Brier score and the Bernoulli loss are more sensitive to changes in the node probabilities, and are therefore often preferred.

SLIDE 12

SPLITTING WITH MISCLASSIFICATION LOSS

Example: a two-class problem with 400 observations in each class and two possible splits:

Split 1:      class 0    class 1
    N1          300        100
    N2          100        300

Split 2:      class 0    class 1
    N1          400        200
    N2            0        200

Both splits are equivalent in terms of misclassification error: each misclassifies 200 observations. But Split 2 produces one pure node and is probably preferable. The Brier loss (Gini impurity) and the Bernoulli loss (entropy impurity) prefer the second split.

SLIDE 13

SPLITTING WITH MISCLASSIFICATION LOSS

Calculation for Gini:

$$\text{Split 1:} \quad \frac{|\mathcal{N}_1|}{|\mathcal{N}|} \cdot 2 \, \hat{\pi}_1^{(\mathcal{N}_1)} \left(1 - \hat{\pi}_1^{(\mathcal{N}_1)}\right) + \frac{|\mathcal{N}_2|}{|\mathcal{N}|} \cdot 2 \, \hat{\pi}_1^{(\mathcal{N}_2)} \left(1 - \hat{\pi}_1^{(\mathcal{N}_2)}\right) = \frac{1}{2} \cdot 2 \cdot \frac{3}{4} \cdot \frac{1}{4} + \frac{1}{2} \cdot 2 \cdot \frac{1}{4} \cdot \frac{3}{4} = \frac{3}{16} + \frac{3}{16} = \frac{3}{8}$$

$$\text{Split 2:} \quad \frac{3}{4} \cdot 2 \cdot \frac{2}{3} \cdot \frac{1}{3} + \frac{1}{4} \cdot 2 \cdot 1 \cdot 0 = \frac{1}{3}$$

Since $\frac{1}{3} < \frac{3}{8}$, the Gini criterion prefers Split 2.
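
A small check of these numbers (illustrative code; the counts are taken from the table on the previous slide):

```python
import numpy as np

def weighted_impurity(children, impurity):
    """Weighted sum over child nodes: sum_i |N_i|/|N| * impurity(N_i)."""
    total = sum(sum(c) for c in children)
    return sum(sum(c) / total * impurity(np.array(c) / sum(c)) for c in children)

gini = lambda pi: np.sum(pi * (1 - pi))
misclass = lambda pi: 1 - pi.max()

split1 = [(300, 100), (100, 300)]   # (class 0, class 1) counts per child node
split2 = [(400, 200), (0, 200)]

for name, split in [("split 1", split1), ("split 2", split2)]:
    print(name,
          "errors:", 800 * weighted_impurity(split, misclass),   # misclassified observations
          "gini:", weighted_impurity(split, gini))
# split 1 errors: 200.0  gini: 0.375   (= 3/8)
# split 2 errors: 200.0  gini: 0.333…  (= 1/3)
```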
