Why and how to use random forest


  1. Why and how to use random forest. Carolin Strobl (LMU München) and Achim Zeileis (WU Wien), carolin.strobl@stat.uni-muenchen.de, useR! 2008, Dortmund. Outline: Introduction, Construction, R functions, Variable importance measures (and how you shouldn't), Tests for variable importance, Conditional importance, Summary, References.

  2.–6. Introduction: Random forests (one slide, built up in steps)
  ◮ have become increasingly popular in, e.g., genetics and the neurosciences [imagine a long list of references here]
  ◮ can deal with “small n, large p” problems, high-order interactions, and correlated predictor variables
  ◮ are used not only for prediction, but also to assess variable importance

  7. (Small) random forest
  [Figure: a grid of small classification trees, each grown on a different sample; every tree splits on the predictors Start, Age, and Number, with split p-values (p < 0.001) and class proportions y in the terminal nodes.]
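The trees in this figure split on Start, Age, and Number, which match the kyphosis data shipped with the rpart package. Assuming that data set (an assumption, it is not named on the slide), a single conditional inference tree of this kind can be grown and plotted with party::ctree; this is only an illustrative sketch, not the code used to produce the slide:

    ## Sketch: fit and plot one tree resembling those in the figure.
    ## The kyphosis data is an assumption, not stated on the slide.
    library(party)
    data("kyphosis", package = "rpart")
    plot(ctree(Kyphosis ~ Age + Number + Start, data = kyphosis))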

  8.–11. Construction of a random forest (one slide, built up in steps)
  ◮ draw ntree bootstrap samples from the original sample
  ◮ fit a classification tree to each bootstrap sample ⇒ ntree trees
  ◮ this creates a diverse set of trees, because
    ◮ trees are unstable w.r.t. changes in the learning data ⇒ ntree different-looking trees (bagging)
    ◮ mtry splitting variables are randomly preselected in each split ⇒ ntree even more different-looking trees (random forest)
  (a code sketch of this construction follows below)
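To make these steps concrete, here is a minimal R sketch of the bootstrap-plus-random-preselection recipe. It is not code from the talk: the rpart package, the kyphosis data, and the small ntree/mtry values are all assumptions chosen for illustration, and restricting each whole tree to mtry predictors is a crude stand-in for the per-split preselection a real random forest performs.

    ## Minimal sketch of the construction described above (illustration only;
    ## rpart, the kyphosis data, and all parameter values are assumptions).
    library(rpart)
    data("kyphosis", package = "rpart")

    ntree <- 5                                # number of bootstrap samples / trees
    mtry  <- 2                                # number of randomly preselected splitting variables
    predictors <- c("Age", "Number", "Start")

    set.seed(2008)
    forest <- lapply(seq_len(ntree), function(i) {
      ## draw one bootstrap sample from the original sample
      boot <- kyphosis[sample(nrow(kyphosis), replace = TRUE), ]
      ## crude variant of random preselection: restrict this whole tree to
      ## mtry randomly chosen predictors (a real random forest redraws the
      ## mtry candidate variables in every single split)
      vars <- sample(predictors, mtry)
      rpart(reformulate(vars, response = "Kyphosis"), data = boot,
            method = "class")
    })
    length(forest)                            # ntree different-looking trees

In practice one would of course call one of the ready-made implementations from the next slide rather than assemble the forest by hand.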

  12. Random forests in R
  ◮ randomForest (pkg: randomForest)
    ◮ reference implementation based on CART trees (Breiman, 2001; Liaw and Wiener, 2008)
    – for variables of different types: biased in favor of continuous variables and variables with many categories (Strobl, Boulesteix, Zeileis, and Hothorn, 2007)
  ◮ cforest (pkg: party)
    ◮ based on unbiased conditional inference trees (Hothorn, Hornik, and Zeileis, 2006)
    + for variables of different types: unbiased when subsampling, instead of bootstrap sampling, is used (Strobl, Boulesteix, Zeileis, and Hothorn, 2007)
  (a usage sketch follows below)
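A hedged usage sketch for the two implementations named above; the kyphosis data (from rpart) and all parameter values are assumptions for illustration, not the authors' example code.

    ## Usage sketch for randomForest() and cforest(); data set and parameter
    ## values are assumptions, not taken from the talk.
    library(randomForest)
    library(party)
    data("kyphosis", package = "rpart")

    set.seed(42)

    ## reference implementation based on CART trees
    rf <- randomForest(Kyphosis ~ Age + Number + Start, data = kyphosis,
                       ntree = 500, mtry = 2, importance = TRUE)
    importance(rf)           # permutation and Gini importance

    ## conditional inference forest; cforest_unbiased() uses unbiased trees
    ## and subsampling (without replacement) instead of bootstrap sampling
    cf <- cforest(Kyphosis ~ Age + Number + Start, data = kyphosis,
                  controls = cforest_unbiased(ntree = 500, mtry = 2))
    varimp(cf)               # permutation importance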
