Stability Assessment of Tree Ensembles and Psychotrees Using the stablelearner Package


SLIDE 1

Stability Assessment of Tree Ensembles and Psychotrees

Using the stablelearner* package

Lennart Schneider (1,2), Achim Zeileis (3), Carolin Strobl (2)

(1) Ludwig Maximilian University of Munich, (2) University of Zurich, (3) University of Innsbruck

28.02.2020

* Philipp, Zeileis, and Strobl (2016) and Philipp et al. (2018)

SLIDE 2

◮ Decision Trees
◮ stablelearner
◮ stablelearner and Tree Ensembles
◮ stablelearner and psychotrees

SLIDE 3

Decision Trees

SLIDE 4

Classification, Regression and Model-Based Trees

Decision trees are supervised learners that predict the value of a target variable based on several input variables:

[Tree plot: root split on simulated_gene_1 (p = 0.015) at cutpoint 8.592; the left branch splits on gene_3207 (p = 0.028) at 8.453 into Node 3 (n = 7) and Node 4 (n = 21); the right branch is Node 5 (n = 33); terminal nodes show the proportions of Bipolar disorder vs. Healthy control]

In R, e.g., party or partykit (Hothorn, Hornik, and Zeileis 2006; Zeileis, Hothorn, and Hornik 2008)

SLIDE 5

Classification, Regression and Model-Based Trees

◮ Easy to understand and interpret
◮ Handles both numerical and categorical data
◮ But: a single tree can be very non-robust
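This non-robustness is easy to demonstrate: refitting the same ctree on a single bootstrap sample can already change the selected split variables. A minimal sketch using the Bipolar2009 data shipped with stablelearner:

```r
library("partykit")
library("stablelearner")  # provides the Bipolar2009 example data
data("Bipolar2009", package = "stablelearner")

## original tree
ct_orig <- ctree(status ~ ., data = Bipolar2009)

## refit on a single bootstrap sample of the same data
set.seed(1)
idx <- sample(nrow(Bipolar2009), replace = TRUE)
ct_boot <- ctree(status ~ ., data = Bipolar2009[idx, ])

## the selected split variables often differ between the two trees
print(ct_orig)
print(ct_boot)
```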

SLIDE 6

Classification, Regression and Model-Based Trees

[Tree plot: a tree with a single split on gender (p = 0.01): Node 2, Female (n = 25), and Node 3, Male (n = 36), each showing the proportions of Bipolar disorder vs. Healthy control]

SLIDE 7

stablelearner

SLIDE 8

stablelearner

stablelearner (Philipp, Zeileis, and Strobl 2016; Philipp et al. 2018):
◮ A toolkit of descriptive measures and graphical illustrations based on resampling and refitting
◮ Can be used to assess the stability of the variable and cutpoint selection in recursive partitioning

SLIDE 9

stablelearner - How does it work?

[Workflow diagram for a single tree (the tree-ensemble variant follows on a later slide):
1. Original Tree
2. Resampling & Refitting
3. Aggregating & Visualizing]
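The steps above amount to a resample-refit-aggregate loop. A conceptual sketch, not the package internals: rpart is used here only because its fitted objects expose the split variables conveniently, and the `variable_selection_freq` helper is hypothetical:

```r
library("rpart")  # illustration only; stablelearner works with ctree and others

## hypothetical helper mimicking step 2 (resampling & refitting)
## and step 3 (aggregating variable selection frequencies)
variable_selection_freq <- function(formula, data, B = 100) {
  predictors <- attr(terms(formula, data = data), "term.labels")
  counts <- setNames(numeric(length(predictors)), predictors)
  for (b in seq_len(B)) {
    idx <- sample(nrow(data), replace = TRUE)   # resample (bootstrap)
    tr  <- rpart(formula, data = data[idx, ])   # refit
    splits <- setdiff(unique(as.character(tr$frame$var)), "<leaf>")
    counts[splits] <- counts[splits] + 1        # aggregate
  }
  sort(counts / B, decreasing = TRUE)           # selection frequencies
}

variable_selection_freq(Species ~ ., data = iris, B = 50)
```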

SLIDE 10

stablelearner

library("partykit")
library("stablelearner")
data("Bipolar2009", package = "stablelearner")
Bipolar2009$simulated_gene_2 <- cut(Bipolar2009$simulated_gene_2,
  breaks = 3, ordered_result = TRUE)

str(Bipolar2009, list.len = 6)
## 'data.frame': 61 obs. of 106 variables:
##  $ age      : int 41 51 29 45 45 29 33 56 48 42 ...
##  $ brain_pH : num 6.6 6.67 6.7 6.03 6.35 6.39 6.51 6.07 6.5 6.65 ...
##  $ status   : Factor w/ 2 levels "Bipolar disorder",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ gender   : Factor w/ 2 levels "Female","Male": 2 2 2 2 2 2 1 2 1 2 ...
##  $ gene_921 : num 8.33 7.99 8.01 7.83 8.51 ...
##  $ gene_4211: num 6.25 7.02 6.54 6.14 6.65 ...
##   [list output truncated]

ct <- ctree(status ~ ., data = Bipolar2009)
ct_stable <- stabletree(ct)

SLIDE 11

stablelearner - summary

summary(ct_stable)
##
## Call:
## partykit::ctree(formula = status ~ ., data = Bipolar2009)
##
## Sampler:
## B = 500
## Method = Bootstrap sampling with 100.0% data
##
## Variable selection overview:
##
##                   freq * mean *
## simulated_gene_1 0.514 1 0.514 1
## simulated_gene_2 0.178 0 0.178 0
## gene_4318        0.128 0 0.128 0
## gene_3069        0.104 0 0.104 0
## gene_3207        0.094 1 0.094 1
## gene_31          0.062 0 0.062 0
## gene_1440        0.060 0 0.060 0
## gene_6935        0.046 0 0.048 0
## gene_9850        0.046 0 0.046 0
...

SLIDE 12

stablelearner - barplot

Variable selection frequencies

[Barplot of relative variable selection frequencies (in %), variables ordered by frequency; simulated_gene_1 clearly leads, followed by simulated_gene_2, gene_4318, and gene_3069, with a long tail of rarely selected genes]
SLIDE 13

stablelearner - plot

[Histogram of the selected cutpoints for simulated_gene_1 across the 500 refits (cutpoint axis approx. 8 to 13, counts up to 500); the cutpoint of the original tree is marked]

SLIDE 14

stablelearner - plot

[Figure: cutpoint plots for an ordered variable; two step-function panels f(x) over x in [0, 1] and histograms of the selected split points on a 0-1 scale across the 500 refits]

SLIDE 15

stablelearner and Tree Ensembles

SLIDE 16

What About Tree Ensembles, e.g., Random Forests?

[Workflow diagram: for a single tree, step 1 is the original tree; for a tree ensemble, step 1 is the base learner; steps 2 (Resampling & Refitting) and 3 (Aggregating & Visualizing) are the same in both cases]

Two possibilities:

1. Fit a random forest in stablelearner using, e.g., ctrees as a base learner
2. Fit a random forest using the randomForest function of the randomForest package (Liaw and Wiener 2002), or the cforest function of the party or partykit package, and coerce the forest to a stabletree object using the as.stabletree function

SLIDE 17

Random Forests in stablelearner

Possibility 1: Use an appropriately specified ctree as a base learner and mimic a cforest of the partykit package:

ct_base <- ctree(status ~ ., data = Bipolar2009,
  control = ctree_control(mtry = 11, teststat = "quadratic",
    testtype = "Univariate", mincriterion = 0, saveinfo = FALSE))
cf_stable <- stabletree(ct_base, sampler = subsampling,
  savetrees = TRUE, B = 500, v = 0.632)

Note that this allows for custom builds, e.g., with respect to the resampling method (bootstrap, subsampling, samplesplitting, jackknife, splithalf, or user-supplied sampler functions).
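As an example of such a custom build, the same base learner can be refitted under bootstrap resampling instead of subsampling (a sketch reusing the ct_base object from above; the built-in samplers are documented with the package):

```r
## same base learner, bootstrap resampling instead of subsampling
cf_stable_boot <- stabletree(ct_base, sampler = bootstrap,
  savetrees = TRUE, B = 500)
summary(cf_stable_boot, original = FALSE)
```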

SLIDE 18

Random Forests in stablelearner

summary(cf_stable, original = FALSE)
##
## Call:
## ctree(formula = status ~ ., data = Bipolar2009, control = ctree_control(mtry = 11,
##     teststat = "quadratic", testtype = "Univariate", mincriterion = 0,
##     saveinfo = FALSE))
##
## Sampler:
## B = 500
## Method = Subsampling with 63.2% data
##
## Variable selection overview:
##
##                   freq  mean
## simulated_gene_1 0.152 0.152
## simulated_gene_2 0.132 0.134
## gene_4318        0.118 0.118
## gene_3069        0.098 0.098
## gene_2807        0.072 0.072
## gene_1440        0.068 0.068
## gene_12029       0.052 0.052
...

SLIDE 19

stablelearner - barplot

Variable selection frequencies

[Barplot of relative variable selection frequencies (in %) for the forest; the distribution is much flatter than for the single tree: simulated_gene_1 leads at about 15%, followed closely by simulated_gene_2, gene_4318, and gene_3069]
SLIDE 20

stablelearner - plot

[Histogram of the selected cutpoints for simulated_gene_1 (approx. 8 to 13) across the refitted forest trees]

SLIDE 21

Random Forests in stablelearner

Possibility 2: Fit a random forest externally, e.g., using the cforest function of the partykit package and coerce the forest.

cf_partykit <- cforest(status ~ ., data = Bipolar2009, mtry = 11)
cf_partykit_stable <- as.stabletree(cf_partykit)
summary(cf_partykit_stable)
## ...

SLIDE 22

stablelearner and psychotrees

SLIDE 23

raschtrees

The psychotree package provides functionality for model-based trees of, e.g., the Rasch model (Strobl, Kopf, and Zeileis 2015). This allows for a global test of Differential Item Functioning (DIF).

SLIDE 24

Stability of raschtrees

library("psychotree")
data("VerbalAggression", package = "psychotools")
str(VerbalAggression)
## 'data.frame': 316 obs. of 4 variables:
##  $ resp  : num [1:316, 1:24] 0 0 1 1 1 2 2 0 0 2 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : NULL
##   .. ..$ : chr "S1WantCurse" "S1DoCurse" "S1WantScold" "S1DoScold" ...
##  $ resp2 : num [1:316, 1:24] 0 0 1 1 1 1 1 0 0 1 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : NULL
##   .. ..$ : chr "S1WantCurse" "S1DoCurse" "S1WantScold" "S1DoScold" ...
##  $ gender: Factor w/ 2 levels "female","male": 2 2 1 1 1 1 1 1 1 1 ...
##  $ anger : int 20 11 17 21 17 21 39 21 24 16 ...

rt <- raschtree(resp2 ~ gender + anger, data = VerbalAggression,
  minsize = 30)

SLIDE 25

Stability of raschtrees

[Rasch tree: split on gender (p = 0.02) into Node 2, female (n = 243), and Node 3, male (n = 73), each showing item parameter profiles for items 1-24 on a scale from -2.59 to 3.49]

SLIDE 26

Stability of raschtrees

rt_stable <- stabletree(rt, sampler = subsampling, v = 0.632)
summary(rt_stable)
##
## Call:
## raschtree(formula = resp2 ~ gender + anger, data = VerbalAggression,
##     minsize = 30L)
##
## Sampler:
## B = 500
## Method = Subsampling with 63.2% data
##
## Variable selection overview:
##
##         freq * mean *
## gender 0.270 1 0.270 1
## anger  0.004 0 0.004 0
## (* = original tree)

SLIDE 27

Some Observations

◮ Setting minsize too small results in very unstable item parameter estimates for the Rasch models fitted within the raschtree
◮ Stability results vary strongly with the sampler (bootstrap vs. subsampling vs. strata sampling); subsampling appears to perform well, see also Strobl et al. (2007)
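The sampler sensitivity can be inspected directly by refitting the same raschtree under both schemes and comparing the variable selection overviews (a sketch reusing the rt object fitted earlier):

```r
## stability of the raschtree under two resampling schemes
rt_boot <- stabletree(rt, sampler = bootstrap, B = 500)
rt_sub  <- stabletree(rt, sampler = subsampling, B = 500, v = 0.632)
summary(rt_boot)
summary(rt_sub)
```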

SLIDE 28

Conclusion

◮ stablelearner can now be used to assess the stability of tree ensembles, either by growing the ensemble from a base learner or by coercing an externally fitted ensemble
◮ Stability assessment of psychotrees is technically straightforward

SLIDE 29

Thanks!

If anyone has experience with IRT resampling, please share your knowledge with me!

SLIDE 30

References I

Hothorn, T., K. Hornik, and A. Zeileis. 2006. "Unbiased Recursive Partitioning: A Conditional Inference Framework." Journal of Computational and Graphical Statistics 15 (3): 651-74. https://doi.org/10.1198/106186006x133933.

Liaw, A., and M. Wiener. 2002. "Classification and Regression by randomForest." R News 2 (3): 18-22.

Philipp, M., T. Rusch, K. Hornik, and C. Strobl. 2018. "Measuring the Stability of Results from Supervised Statistical Learning." Journal of Computational and Graphical Statistics 27 (4): 685-700. https://doi.org/10.1080/10618600.2018.1473779.

Philipp, M., A. Zeileis, and C. Strobl. 2016. "A Toolkit for Stability Assessment of Tree-Based Learners." In Proceedings of COMPSTAT 2016 - 22nd International Conference on Computational Statistics, edited by A. Colubi, A. Blanco, and C. Gatu, 315-25. The International Statistical Institute/International Association for Statistical Computing.

slide-31
SLIDE 31

References II

Strobl, C., A.-L. Boulesteix, A. Zeileis, and T. Hothorn. 2007. "Bias in Random Forest Variable Importance Measures: Illustrations, Sources and a Solution." BMC Bioinformatics 8 (25). https://doi.org/10.1186/1471-2105-8-25.

Strobl, C., J. Kopf, and A. Zeileis. 2015. "Rasch Trees: A New Method for Detecting Differential Item Functioning in the Rasch Model." Psychometrika 80 (2): 289-316. https://doi.org/10.1007/s11336-013-9388-3.

Zeileis, A., T. Hothorn, and K. Hornik. 2008. "Model-Based Recursive Partitioning." Journal of Computational and Graphical Statistics 17 (2): 492-514. https://doi.org/10.1198/106186008x319331.