Stability Assessment of Tree Ensembles and Psychotrees Using the stablelearner package


  1. Stability Assessment of Tree Ensembles and Psychotrees Using the stablelearner [1] package
     Lennart Schneider (1,2), Achim Zeileis (3), Carolin Strobl (2)
     (1) Ludwig Maximilian University of Munich, (2) University of Zurich, (3) University of Innsbruck
     28.02.2020
     [1] Philipp, Zeileis, and Strobl (2016) and Philipp et al. (2018)

  2. ◮ Decision Trees
     ◮ stablelearner
     ◮ stablelearner and Tree Ensembles
     ◮ stablelearner and psychotrees

  3. Decision Trees

  4. Classification, Regression and Model-Based Trees
     Decision trees are supervised learners that predict the value of a target variable based on several input variables.
     [Tree figure: root split on simulated_gene_1 (p = 0.015) at 8.592; the left branch splits on gene_3207 (p = 0.028) at 8.453 into Node 3 (n = 7) and Node 4 (n = 21), the right branch is terminal Node 5 (n = 33); each terminal node shows the proportions of "Bipolar disorder" vs. "Healthy control".]
     In R, e.g., party or partykit (Hothorn, Hornik, and Zeileis 2006; Zeileis, Hothorn, and Hornik 2008)

  5. Classification, Regression and Model-Based Trees
     ◮ Easy to understand and interpret
     ◮ Handle both numerical and categorical data
     ◮ But: a single tree can be very non-robust

  6. Classification, Regression and Model-Based Trees
     [Tree figure illustrating this non-robustness: root split on gender (p = 0.01), Female vs. Male, with terminal Node 2 (n = 25) and Node 3 (n = 36), each showing the proportions of "Bipolar disorder" vs. "Healthy control".]

  7. stablelearner

  8. stablelearner
     stablelearner (Philipp, Zeileis, and Strobl 2016; Philipp et al. 2018):
     ◮ A toolkit of descriptive measures and graphical illustrations based on resampling and refitting
     ◮ Can be used to assess the stability of the variable and cutpoint selection in recursive partitioning

  9. stablelearner - How does it work?
     Single Tree                    Tree Ensemble
     1. Original Tree
     2. Resampling & Refitting
     3. Aggregating & Visualizing

  10. stablelearner
      library("partykit")
      library("stablelearner")
      data("Bipolar2009", package = "stablelearner")
      Bipolar2009$simulated_gene_2 <- cut(Bipolar2009$simulated_gene_2,
        breaks = 3, ordered_result = TRUE)
      str(Bipolar2009, list.len = 6)
      ## 'data.frame': 61 obs. of 106 variables:
      ##  $ age      : int  41 51 29 45 45 29 33 56 48 42 ...
      ##  $ brain_pH : num  6.6 6.67 6.7 6.03 6.35 6.39 6.51 6.07 6.5 6.65 ...
      ##  $ status   : Factor w/ 2 levels "Bipolar disorder",..: 1 1 1 1 1 1 1 1 1 1 ...
      ##  $ gender   : Factor w/ 2 levels "Female","Male": 2 2 2 2 2 2 1 2 1 2 ...
      ##  $ gene_921 : num  8.33 7.99 8.01 7.83 8.51 ...
      ##  $ gene_4211: num  6.25 7.02 6.54 6.14 6.65 ...
      ##   [list output truncated]
      ct <- ctree(status ~ ., data = Bipolar2009)
      ct_stable <- stabletree(ct)
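      Note: by default, stabletree() draws B = 500 bootstrap samples and refits the tree on each, as the summary on the next slide confirms. A minimal sketch of the equivalent explicit call, mirroring the argument pattern used on slide 17:
      # equivalent to the default: refit ct on B = 500 bootstrap samples
      ct_stable <- stabletree(ct, sampler = bootstrap, B = 500)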

  11. stablelearner - summary
      summary(ct_stable)
      ##
      ## Call:
      ## partykit::ctree(formula = status ~ ., data = Bipolar2009)
      ##
      ## Sampler:
      ## B = 500
      ## Method = Bootstrap sampling with 100.0% data
      ##
      ## Variable selection overview:
      ##
      ##                   freq * mean *
      ## simulated_gene_1 0.514 1 0.514 1
      ## simulated_gene_2 0.178 0 0.178 0
      ## gene_4318        0.128 0 0.128 0
      ## gene_3069        0.104 0 0.104 0
      ## gene_3207        0.094 1 0.094 1
      ## gene_31          0.062 0 0.062 0
      ## gene_1440        0.060 0 0.060 0
      ## gene_6935        0.046 0 0.048 0
      ## gene_9850        0.046 0 0.046 0
      ...
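      The starred columns refer to the original tree; e.g., simulated_gene_1 and gene_3207, the two splitting variables of ct, are flagged with 1. This comparison can be dropped via the original argument (used on slide 18 for the ensemble case):
      # summary without the comparison against the original tree
      summary(ct_stable, original = FALSE)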

  12. stablelearner - barplot
      [Barplot: "Variable selection frequencies"; the y-axis shows the relative frequency (in %, from 0 to 100) with which each predictor was selected across the 500 refitted trees, with one bar per predictor on the x-axis.]
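      This barplot is produced by the barplot method that stablelearner provides for stabletree objects:
      # relative variable selection frequencies across the 500 refitted trees
      barplot(ct_stable)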

  13. stablelearner - plot
      [Histogram: cutpoints selected for simulated_gene_1 across the refitted trees; counts from 0 to 500 over cutpoint values ranging from about 8 to 13.]
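      The histogram comes from the plot method for stabletree objects; a minimal sketch, assuming the method accepts a select argument to restrict the display to individual variables:
      # cutpoint distribution for simulated_gene_1 across the refitted trees
      plot(ct_stable, select = "simulated_gene_1")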

  14. stablelearner - plot
      [Plots: two panels, each showing a step function f(x) over x in [0, 1] together with a histogram of counts (0 to 500) over the range 0.0 to 1.0.]

  15. stablelearner and Tree Ensembles

  16. What About Tree Ensembles, e.g., Random Forests?
      Single Tree                    Tree Ensemble
      1. Original Tree               1. Base Learner
      2. Resampling & Refitting      2. Resampling & Refitting
      3. Aggregating & Visualizing   3. Aggregating & Visualizing
      Two possibilities:
      1. Fit a random forest in stablelearner using, e.g., ctree as a base learner
      2. Fit a random forest using the randomForest function of the randomForest package (Liaw and Wiener 2002), or the cforest function (of the party or partykit package), and coerce the forest to a stabletree object using the as.stabletree function (see the sketch below)
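      A minimal sketch of possibility 2 with randomForest, assuming default forest settings (data preparation as on slide 10):
      library("randomForest")
      library("stablelearner")
      data("Bipolar2009", package = "stablelearner")
      # possibility 2: fit a classical random forest ...
      set.seed(1)
      rf <- randomForest(status ~ ., data = Bipolar2009)
      # ... and coerce it to a stabletree object, so that summary(),
      # barplot(), and plot() apply as for a single tree
      rf_stable <- as.stabletree(rf)
      summary(rf_stable, original = FALSE)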

  17. Random Forests in stablelearner
      Possibility 1: Use an appropriately specified ctree as a base learner and mimic a cforest of the partykit package:
      ct_base <- ctree(status ~ ., data = Bipolar2009,
        control = ctree_control(mtry = 11, teststat = "quadratic",
          testtype = "Univariate", mincriterion = 0, saveinfo = FALSE))
      cf_stable <- stabletree(ct_base, sampler = subsampling,
        savetrees = TRUE, B = 500, v = 0.632)
      Note that this allows for custom setups, e.g., with respect to the resampling method (bootstrap, subsampling, samplesplitting, jackknife, splithalf, or own sampling functions).
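      For instance, only the sampler argument needs to change to switch the resampling method; a minimal sketch reusing ct_base from above:
      # same base learner, but refitted on B = 500 bootstrap samples
      # instead of 63.2% subsamples
      cf_stable_boot <- stabletree(ct_base, sampler = bootstrap, B = 500)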

  18. Random Forests in stablelearner
      summary(cf_stable, original = FALSE)
      ##
      ## Call:
      ## ctree(formula = status ~ ., data = Bipolar2009, control = ctree_control(mtry = 11,
      ##     teststat = "quadratic", testtype = "Univariate", mincriterion = 0,
      ##     saveinfo = FALSE))
      ##
      ## Sampler:
      ## B = 500
      ## Method = Subsampling with 63.2% data
      ##
      ## Variable selection overview:
      ##
      ##                   freq  mean
      ## simulated_gene_1 0.152 0.152
      ## simulated_gene_2 0.132 0.134
      ## gene_4318        0.118 0.118
      ## gene_3069        0.098 0.098
      ## gene_2807        0.072 0.072
      ## gene_1440        0.068 0.068
      ## gene_12029       0.052 0.052
      ...
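      The graphical methods shown for the single tree carry over to the ensemble; a minimal sketch (again assuming a select argument for plot, as on slide 13):
      # variable selection frequencies of the mimicked random forest
      barplot(cf_stable)
      # cutpoint distribution of the most frequently selected variable
      plot(cf_stable, select = "simulated_gene_1")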
