Influence measures for CART

Jean-Michel Poggi, Orsay, Paris Sud & Paris Descartes
Joint work with Avner Bar-Hen and Servane Gey (MAP5, Paris Descartes)

Outline: Introduction (CART), Influence measures for CART, Exploring the Paris Tax Revenues dataset

CART

Classification And Regression Trees, Breiman et al. (1984)
▶ Learning set $\mathcal{L} = \{(X_1, Y_1), \ldots, (X_n, Y_n)\}$, $n$ i.i.d. observations of a random vector $(X, Y)$
▶ Vector $X = (X^1, \ldots, X^p)$ of explanatory variables, $X \in \mathbb{R}^p$, and $Y \in \mathcal{Y}$, where $\mathcal{Y}$ is either a set of class labels or a numerical response
▶ For classification problems, a classifier $t$ is a mapping $t : \mathbb{R}^p \to \mathcal{Y}$ and the Bayes classifier is the one to estimate
▶ For regression problems, we suppose that $Y = f(X) + \varepsilon$ and $f$ is the regression function to estimate

CART tree

[Figure: a CART tree, and the same CART tree as a piecewise constant function]

Growing step, stopping rule:
▶ recursive partitioning by maximizing the local decrease in heterogeneity
▶ do not split a pure node or a node containing too few observations

Pruning step:
▶ the maximal tree overfits the data
▶ an optimal tree is a pruned subtree, obtained by penalizing the prediction error by the model complexity

Penalized criterion

$$\mathrm{crit}_\alpha(T) = R_n(f, \hat{f}_{|T}, \mathcal{L}_n) + \alpha \frac{|\tilde{T}|}{n}$$

where $R_n(f, \hat{f}_{|T}, \mathcal{L}_n)$ is the error term (MSE for regression or misclassification rate) and $|\tilde{T}|$ is the number of leaves of $T$.
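The trade-off expressed by the penalized criterion can be illustrated with a minimal sketch. The candidate subtrees, their errors and leaf counts below are made-up values for illustration, and the function names are hypothetical; this is not how rpart computes the criterion internally.

```python
def penalized_criterion(error, n_leaves, n, alpha):
    """crit_alpha(T) = R_n + alpha * |T~| / n for one candidate subtree."""
    return error + alpha * n_leaves / n


def best_subtree(candidates, n, alpha):
    """Return the (name, error, n_leaves) tuple minimizing the criterion.

    Ties are broken in favour of the subtree with fewer leaves.
    """
    return min(
        candidates,
        key=lambda c: (penalized_criterion(c[1], c[2], n, alpha), c[2]),
    )


# Hypothetical subtrees of a maximal tree fitted on n = 100 observations:
# (name, resubstitution error, number of leaves)
candidates = [("T_max", 0.02, 12), ("T_mid", 0.08, 4), ("root", 0.40, 1)]

small_alpha = best_subtree(candidates, n=100, alpha=0.1)
medium_alpha = best_subtree(candidates, n=100, alpha=2.0)
large_alpha = best_subtree(candidates, n=100, alpha=20.0)
print(small_alpha[0], medium_alpha[0], large_alpha[0])
```

With a small penalty the maximal tree wins; as the penalty grows, smaller subtrees are preferred, down to the root.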

CART

Classification And Regression Trees, Breiman et al. (1984)
▶ nonparametric model + data partitioning
▶ numerical + categorical predictors
▶ easy-to-interpret models
▶ nonlinear modelling
▶ base rule for: bagging, boosting, random forests
▶ single framework for: regression, binary or multiclass classification
▶ see Zhang, Singer (2010) and Hastie, Tibshirani, Friedman (2009)
▶ In the sequel, CART trees are obtained using
  ▶ the R package rpart
  ▶ the default parameters (Gini heterogeneity function to grow the maximal tree, and pruning with 10-fold CV)

CART and stability
▶ CART instability
  ▶ Cheze, Poggi (2006): outliers using boosting
  ▶ Briand et al. (2009): sensitivity using a similarity measure between trees
  ▶ Bousquet, Elisseeff (2002): stability through jackknife
▶ Classically, robustness deals with model stability, considered globally
▶ We focus on the diagnosis of individual observations rather than on model properties or variable selection problems
▶ We use decision trees to perform diagnosis on observations
▶ We use the influence function, a classical diagnostic method, to measure the perturbation induced by a single observation: the stability issue is addressed through the jackknife

Influence measures for CART
▶ Quantifying the differences between
  ▶ the reference tree $T$ obtained from the complete sample $\mathcal{L}_n$
  ▶ the jackknife trees $(T^{(-i)})_{1 \leq i \leq n}$ obtained from $(\mathcal{L}_n \setminus \{(X_i, Y_i)\})_{1 \leq i \leq n}$

Three kinds of influence function (IF) for CART
▶ we derive three kinds of IF based on jackknife trees
  ▶ influence on predictions, focusing on predictive performance
  ▶ influence on partitions, highlighting the tree structure
  (following a classical distinction, see Miglio and Soffritti (2004))
  ▶ plus a CART specific influence derived from the pruned sequences of trees
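The construction of the reference and jackknife trees can be sketched as follows. To stay self-contained, the learner here is a one-split "stump" on a single numerical variable, used as a stand-in for CART (the slides use rpart); `fit_stump`, `jackknife_samples` and the toy sample are hypothetical names and data introduced only for illustration.

```python
from collections import Counter


def majority(labels):
    """Most frequent label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(sample):
    """Stand-in learner (one split, NOT full CART) on (x, y) pairs.

    Tries every midpoint between consecutive distinct x values and keeps
    the threshold with the fewest misclassified training points.
    """
    xs = sorted({x for x, _ in sample})
    best = None
    for lo, hi in zip(xs, xs[1:]):
        thr = (lo + hi) / 2
        left = [y for x, y in sample if x <= thr]
        right = [y for x, y in sample if x > thr]
        l_lab, r_lab = majority(left), majority(right)
        errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
        if best is None or errors < best[0]:
            best = (errors, thr, l_lab, r_lab)
    _, thr, l_lab, r_lab = best
    return lambda x: l_lab if x <= thr else r_lab


def jackknife_samples(sample):
    """Yield (i, sample without its i-th observation) for every i."""
    for i in range(len(sample)):
        yield i, sample[:i] + sample[i + 1:]


sample = [(0.1, "a"), (0.4, "a"), (0.6, "b"), (0.9, "b"), (1.2, "b")]
reference = fit_stump(sample)                      # plays the role of T
jackknife = {i: fit_stump(s) for i, s in jackknife_samples(sample)}  # T^(-i)

# Removing observation 1 moves the split, so its own prediction changes.
print(reference(0.4), jackknife[1](0.4))
```

Each jackknife model is fitted on n - 1 observations but can still be evaluated on every point of the complete sample, which is what the influence measures rely on.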

Influence on predictions

$I_1$ and $I_2$ are based only on the predictions.

Definition ($I_1$ and $I_2$)
▶ $I_1$, closely related to the resubstitution estimate of the prediction error, evaluates the impact of a single change on all the predictions:

$$I_1(x_i) = \sum_{k=1}^{n} \mathbb{1}_{T(x_k) \neq T^{(-i)}(x_k)}$$

▶ $I_2$, closely related to the leave-one-out estimate of the prediction error:

$$I_2(x_i) = \mathbb{1}_{T(x_i) \neq T^{(-i)}(x_i)}$$
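Once the predictions of the reference tree and of one jackknife tree are available on the whole sample, the two indices reduce to simple counts. The sketch below assumes the predictions are given as lists of labels; the function names and the example labels are hypothetical.

```python
def influence_i1(ref_preds, jack_preds):
    """I1(x_i): number of sample points whose predicted label differs
    between the reference tree T and the jackknife tree T^(-i)."""
    return sum(r != j for r, j in zip(ref_preds, jack_preds))


def influence_i2(ref_preds, jack_preds, i):
    """I2(x_i): 1 if the predicted label of x_i itself differs, else 0."""
    return int(ref_preds[i] != jack_preds[i])


# Made-up predictions on a sample of 5 points.
ref_preds = ["a", "a", "b", "b", "b"]    # T(x_k), k = 0..4
jack_preds = ["a", "b", "b", "a", "b"]   # T^(-i)(x_k) for i = 1 (0-based)

i1 = influence_i1(ref_preds, jack_preds)
i2 = influence_i2(ref_preds, jack_preds, i=1)
print(i1, i2)
```

$I_1$ takes values between 0 and $n$, whereas $I_2$ is binary.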

Influence on predictions

$I_3$ is based on the distribution of the labels in each leaf.

Definition ($I_3$)
▶ $I_3$ measures the distance between the label distributions in the nodes of $T$ and $T^{(-i)}$ where $x_i$ falls:

$$I_3(x_i) = d\left(p_{x_i, T},\, p_{x_i, T^{(-i)}}\right)$$

where $d$ is the total variation distance

$$d(p, q) = \max_{A \subset \{1; \ldots; J\}} |p(A) - q(A)| = \frac{1}{2} \sum_{j=1}^{J} |p(j) - q(j)|$$
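The two expressions of the total variation distance can be checked numerically on a small example. This is a sketch with made-up leaf distributions over $J = 3$ classes; the brute-force maximum over subsets is only practical for small $J$ and is included to illustrate the identity.

```python
from itertools import combinations


def tv_half_l1(p, q):
    """Total variation distance as half the L1 distance."""
    return 0.5 * sum(abs(pj - qj) for pj, qj in zip(p, q))


def tv_max_subset(p, q):
    """Total variation distance as max over all subsets A of |p(A) - q(A)|."""
    classes = range(len(p))
    best = 0.0
    for size in range(len(p) + 1):
        for subset in combinations(classes, size):
            gap = abs(sum(p[j] for j in subset) - sum(q[j] for j in subset))
            best = max(best, gap)
    return best


# Hypothetical label distributions in the leaves containing x_i.
p_ref = [0.5, 0.3, 0.2]    # leaf of the reference tree T
p_jack = [0.2, 0.3, 0.5]   # leaf of the jackknife tree T^(-i)

i3 = tv_half_l1(p_ref, p_jack)
print(i3, tv_max_subset(p_ref, p_jack))
```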

Influence on partitions

Definition
▶ $I_4$ measures the variation in the number of clusters between the two partitions:

$$I_4(x_i) = |\tilde{T}^{(-i)}| - |\tilde{T}|$$

▶ $I_5$ is based on the dissimilarity between the two partitions:

$$I_5(x_i) = 1 - J\left(\tilde{T}, \tilde{T}^{(-i)}\right)$$

where $J$ is the Jaccard coefficient, so that $1 - J$ is the Jaccard dissimilarity, between the partitions of $\mathcal{L}$ defined by $\tilde{T}$ and $\tilde{T}^{(-i)}$ (the sets of the leaves of the trees)
▶ Jaccard coefficient:

$$J(C_1, C_2) = \frac{a}{a + b + c}$$

  ▶ $a$ = number of pairs of points of $\mathcal{L}$ in the same cluster in both partitions $C_1$ and $C_2$
  ▶ $b$ (resp. $c$) = number of pairs of points in the same cluster in $C_1$ but not in $C_2$ (resp. in $C_2$ but not in $C_1$)
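As an illustration, the pair counts $a$, $b$, $c$ and the resulting indices can be computed from the leaf identifier assigned to each observation by the two trees. The sketch below assumes each partition is given as a list of leaf ids and that every leaf contains at least one observation, so the number of leaves equals the number of distinct ids; names and data are hypothetical.

```python
from itertools import combinations


def jaccard_coefficient(leaves_1, leaves_2):
    """J(C1, C2) = a / (a + b + c), counted over all pairs of points."""
    a = b = c = 0
    for k, l in combinations(range(len(leaves_1)), 2):
        same_1 = leaves_1[k] == leaves_1[l]
        same_2 = leaves_2[k] == leaves_2[l]
        if same_1 and same_2:
            a += 1      # together in both partitions
        elif same_1:
            b += 1      # together in C1 only
        elif same_2:
            c += 1      # together in C2 only
    total = a + b + c
    return a / total if total else 1.0  # convention when no pair is ever grouped


# Leaf id of each of 5 observations under T and under T^(-i) (made-up values).
ref_leaves = [1, 1, 2, 2, 3]
jack_leaves = [1, 1, 1, 2, 2]

i4 = len(set(jack_leaves)) - len(set(ref_leaves))       # |T~^(-i)| - |T~|
i5 = 1 - jaccard_coefficient(ref_leaves, jack_leaves)   # Jaccard dissimilarity
print(i4, i5)
```

Here one pair is grouped in both partitions, one only under $T$ and three only under $T^{(-i)}$, so $J = 1/5$.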

CART specific influence

Focus on the cp complexity cost constant
▶ consider the $N_{cp} \leq K_T + \sum_{1 \leq i \leq n} K_{T^{(-i)}}$ distinct values $\{cp_1; \ldots; cp_{N_{cp}}\}$, where $K_T$ is the length of the sequence leading to tree $T$
▶ usually $N_{cp} \ll K_T + \sum_{1 \leq i \leq n} K_{T^{(-i)}}$, since the jackknife sequences are the same for many observations

Definition ($I_6$)
▶ $I_6$ is the number of complexities for which the predicted labels of $x_i$ differ:

$$I_6(x_i) = \sum_{j=1}^{N_{cp}} \mathbb{1}_{T_{cp_j}(x_i) \neq T^{(-i)}_{cp_j}(x_i)}$$

where $\mathbb{1}_{T_{cp_j}(x_i) \neq T^{(-i)}_{cp_j}(x_i)}$ indicates whether the reference and jackknife subtrees corresponding to the same complexity $cp_j$ provide different predicted labels for $x_i$.
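A minimal sketch of the $I_6$ count follows. It assumes each pruning sequence is summarized by its increasing critical complexities together with the label each corresponding subtree predicts for $x_i$, and that the subtree in force at a given cp is the one with the largest critical value not exceeding it. For brevity the cp grid is built from only two sequences, whereas the slide pools the reference and all jackknife sequences; all names and values are hypothetical.

```python
from bisect import bisect_right


def label_at(thresholds, labels, cp):
    """Label predicted for x_i by the subtree in force at complexity cp."""
    return labels[bisect_right(thresholds, cp) - 1]


def influence_i6(ref_seq, jack_seq, cp_grid):
    """Number of complexities in cp_grid at which the two labels differ."""
    return sum(
        label_at(*ref_seq, cp) != label_at(*jack_seq, cp) for cp in cp_grid
    )


# (critical complexities, label predicted for x_i by each subtree)
ref_seq = ([0.0, 0.05, 0.2], ["a", "a", "b"])   # sequence of T
jack_seq = ([0.0, 0.1], ["a", "b"])             # sequence of T^(-i)

cp_grid = sorted(set(ref_seq[0]) | set(jack_seq[0]))  # distinct cp values
i6 = influence_i6(ref_seq, jack_seq, cp_grid)
print(cp_grid, i6)
```

In this example the two sequences disagree on $x_i$ only at cp = 0.1, where the jackknife sequence has already switched to label "b".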

CART tree: pruning sequence

Penalized criterion

$$\mathrm{crit}_\alpha(T) = R_n(f, \hat{f}_{|T}, \mathcal{L}_n) + \alpha \frac{|\tilde{T}|}{n}$$

where $R_n(f, \hat{f}_{|T}, \mathcal{L}_n)$ is the error term and $|\tilde{T}|$ the number of leaves.

Pruning procedure: how to find $T_\alpha$ minimizing $\mathrm{crit}_\alpha(T)$ for any given $\alpha$
▶ a finite decreasing (nested) sequence of subtrees pruned from $T_{max}$,

$$T_K = \{t_1\} \prec T_{K-1} \prec \ldots \prec T_1$$

corresponding to critical complexities

$$0 = \alpha_1 < \alpha_2 < \ldots < \alpha_{K-1} < \alpha_K$$

such that if $\alpha_k \leq \beta < \alpha_{k+1}$ then $T_\beta = T_{\alpha_k} = T_k$
▶ Remark: this sequence is a subsequence of the best trees with $m$ leaves
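Under the penalized criterion above, two consecutive subtrees $T_k$ and $T_{k+1}$ of the nested sequence have equal criterion values when $\alpha = n\,(R_{k+1} - R_k) / (|\tilde{T}_k| - |\tilde{T}_{k+1}|)$, which gives the critical complexity $\alpha_{k+1}$. The sketch below assumes the nested sequence produced by pruning is already known and described by made-up (error, leaves) pairs; it does not perform the pruning itself, and the function names are hypothetical.

```python
def critical_alphas(sequence, n):
    """Critical complexities for a nested sequence T_1, ..., T_K.

    sequence: [(error, n_leaves), ...] ordered from T_1 (largest subtree)
    to T_K (root). alpha_1 = 0; alpha_{k+1} is the break-even penalty
    between T_k and T_{k+1}.
    """
    alphas = [0.0]
    for (err_big, leaves_big), (err_small, leaves_small) in zip(
        sequence, sequence[1:]
    ):
        alphas.append(n * (err_small - err_big) / (leaves_big - leaves_small))
    return alphas


def subtree_index(alphas, beta):
    """0-based index k such that alphas[k] <= beta < alphas[k + 1]."""
    return max(k for k, a in enumerate(alphas) if a <= beta)


# Hypothetical nested sequence on n = 100 observations.
sequence = [(0.02, 12), (0.08, 4), (0.40, 1)]
alphas = critical_alphas(sequence, n=100)

chosen = [subtree_index(alphas, beta) for beta in (0.5, 5.0, 20.0)]
print(alphas, chosen)
```

The selected subtree is piecewise constant in $\beta$: it only changes when $\beta$ crosses one of the critical complexities.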

PATARE dataset
▶ Tax revenues of households in 2007 from the 143 cities surrounding Paris
▶ Cities are grouped into four counties ("département" in French)
  ▶ Paris: 20 "arrondissements" (districts)
  ▶ Seine-Saint-Denis (north of Paris): 40 cities
  ▶ Hauts-de-Seine (west of Paris): 36 cities
  ▶ Val-de-Marne (south of Paris): 48 cities
▶ Data freely available on http://www.data-publica.com/data
▶ Variables = characteristics of the distribution of the tax revenues per city
▶ For each city:
  ▶ first and 9th deciles (D1, D9)
  ▶ quartiles (Q1, Q2 and Q3)
  ▶ mean, and % of the tax revenues coming from the salaries and treatments (PtSal)

PATARE dataset: the classification problem
▶ supervised classification problem (four-class response variable): predict the county of a city from the characteristics of its tax revenues distribution
▶ the county cannot be easily retrieved from the explanatory variables alone, without the county information

[Figure: poor recovery of counties through clusters. Map of the cities drawn according to a k-means (k = 4) clustering, superimposed with the borders of the counties]
