SLIDE 1

Outline

Introduction
Construction
R functions
Variable importance
Tests for variable importance
Conditional importance
Summary
References

Why and how to use random forest variable importance measures (and how you shouldn’t)

Carolin Strobl (LMU München) and Achim Zeileis (WU Wien)

carolin.strobl@stat.uni-muenchen.de
useR! 2008, Dortmund


SLIDE 6

Introduction

Random forests

◮ have become increasingly popular in, e.g., genetics and the neurosciences [imagine a long list of references here]

◮ can deal with “small n large p”-problems, high-order interactions, and correlated predictor variables

◮ are used not only for prediction, but also to assess variable importance

SLIDE 7

(Small) random forest

[Figure: the individual classification trees of a small random forest; each tree splits repeatedly on the predictors Start, Number, and Age, with differing structures and terminal-node class proportions]


SLIDE 11

Construction of a random forest

◮ draw ntree bootstrap samples from the original sample

◮ fit a classification tree to each bootstrap sample ⇒ ntree trees

◮ this creates a diverse set of trees, because

◮ trees are unstable w.r.t. changes in the learning data ⇒ ntree different-looking trees (bagging)

◮ mtry splitting variables are randomly preselected in each split ⇒ ntree even more different-looking trees (random forest)
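The recipe above can be sketched in a few lines. The talk's examples are in R; here is a language-agnostic Python illustration in which decision stumps (single splits) stand in for full trees, so the code stays short. All function names are our own, not the randomForest or party API.

```python
import random
from collections import Counter

def stump_fit(X, y, mtry):
    """Best single split, searching only `mtry` randomly preselected features."""
    p = len(X[0])
    candidates = random.sample(range(p), mtry)      # random preselection per split
    best = None
    for j in candidates:
        for cut in sorted(set(row[j] for row in X)):
            left  = [yi for row, yi in zip(X, y) if row[j] <= cut]
            right = [yi for row, yi in zip(X, y) if row[j] >  cut]
            if not left or not right:
                continue
            # misclassifications if each side predicts its majority class
            err = (len(left)  - max(Counter(left).values())
                   + len(right) - max(Counter(right).values()))
            if best is None or err < best[0]:
                best = (err, j, cut,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    _, j, cut, pred_l, pred_r = best
    return lambda row: pred_l if row[j] <= cut else pred_r

def forest_fit(X, y, ntree, mtry):
    n, trees = len(X), []
    for _ in range(ntree):
        idx = [random.randrange(n) for _ in range(n)]    # bootstrap sample
        trees.append(stump_fit([X[i] for i in idx], [y[i] for i in idx], mtry))
    return trees

def forest_predict(trees, row):
    return Counter(t(row) for t in trees).most_common(1)[0][0]  # majority vote

random.seed(1)
X = [[i, random.random()] for i in range(20)]   # feature 1 is pure noise
y = [int(row[0] >= 10) for row in X]            # class depends on feature 0 only
trees = forest_fit(X, y, ntree=25, mtry=2)
```

Bagging alone already diversifies the trees via the bootstrap `idx`; the `random.sample` inside `stump_fit` adds the per-split feature preselection that turns bagging into a random forest.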

SLIDE 12

Random forests in R

◮ randomForest (pkg: randomForest)
◮ reference implementation based on CART trees (Breiman, 2001; Liaw and Wiener, 2008)
− for variables of different types: biased in favor of continuous variables and variables with many categories (Strobl, Boulesteix, Zeileis, and Hothorn, 2007)

◮ cforest (pkg: party)
◮ based on unbiased conditional inference trees (Hothorn, Hornik, and Zeileis, 2006)
+ for variables of different types: unbiased when subsampling, instead of bootstrap sampling, is used (Strobl, Boulesteix, Zeileis, and Hothorn, 2007)


SLIDE 14

Measuring variable importance

◮ Gini importance: mean Gini gain produced by Xj over all trees

◮ obj <- randomForest(..., importance=TRUE)
obj$importance, column MeanDecreaseGini
importance(obj, type=2)

− for variables of different types: biased in favor of continuous variables and variables with many categories

SLIDE 15

Measuring variable importance

◮ permutation importance: mean decrease in classification accuracy after permuting Xj, over all trees

◮ obj <- randomForest(..., importance=TRUE)
obj$importance, column MeanDecreaseAccuracy
importance(obj, type=1)

◮ obj <- cforest(...)
varimp(obj)

+ for variables of different types: unbiased only when subsampling is used, as in cforest(..., controls = cforest_unbiased())

SLIDE 16

The permutation importance

within each tree t:

VI^{(t)}(x_j) = \frac{\sum_{i \in \bar{B}^{(t)}} I\left(y_i = \hat{y}_i^{(t)}\right)}{|\bar{B}^{(t)}|} - \frac{\sum_{i \in \bar{B}^{(t)}} I\left(y_i = \hat{y}_{i,\pi_j}^{(t)}\right)}{|\bar{B}^{(t)}|}

where \bar{B}^{(t)} is the out-of-bag sample of tree t,

\hat{y}_i^{(t)} = f^{(t)}(x_i) = predicted class before permuting,

\hat{y}_{i,\pi_j}^{(t)} = f^{(t)}(x_{i,\pi_j}) = predicted class after permuting X_j, with

x_{i,\pi_j} = (x_{i,1}, \ldots, x_{i,j-1}, x_{\pi_j(i),j}, x_{i,j+1}, \ldots, x_{i,p})

Note: VI^{(t)}(x_j) = 0 by definition, if X_j is not in tree t.
SLIDE 17

The permutation importance

over all trees:

1. raw importance

VI(x_j) = \frac{\sum_{t=1}^{ntree} VI^{(t)}(x_j)}{ntree}

◮ obj <- randomForest(..., importance=TRUE)
importance(obj, type=1, scale=FALSE)

SLIDE 18

The permutation importance

over all trees:

2. scaled importance (z-score)

z_j = \frac{VI(x_j)}{\hat{\sigma}/\sqrt{ntree}}

◮ obj <- randomForest(..., importance=TRUE)
importance(obj, type=1, scale=TRUE) (the default)
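Both aggregation steps fit in a small sketch (plain Python rather than the R calls above; we take σ̂ to be the sample standard deviation of the per-tree importances, and the per-tree values are invented):

```python
from math import sqrt

def raw_importance(vi_per_tree):
    """VI(x_j): mean of the per-tree importances."""
    return sum(vi_per_tree) / len(vi_per_tree)

def z_score(vi_per_tree):
    """z_j = VI(x_j) / (sigma_hat / sqrt(ntree))."""
    ntree = len(vi_per_tree)
    vi = raw_importance(vi_per_tree)
    sigma = sqrt(sum((v - vi) ** 2 for v in vi_per_tree) / (ntree - 1))
    return vi / (sigma / sqrt(ntree))

vi_t = [0.1, 0.3, 0.2, 0.0, 0.4]   # invented per-tree importances VI^(t)(x_j)
vi = raw_importance(vi_t)          # raw importance: 0.2
z = z_score(vi_t)                  # scaled importance
```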


SLIDE 22

Tests for variable importance

for variable selection purposes

◮ Breiman and Cutler (2008): simple significance test based on normality of the z-score; randomForest, scale=TRUE + α-quantile of N(0,1)

◮ Díaz-Uriarte and Alvarez de Andrés (2006): backward elimination (throw out the least important variables until the out-of-bag prediction accuracy drops); varSelRF (pkg: varSelRF), depends on randomForest

◮ Díaz-Uriarte (2007) and Rodenburg et al. (2008): plots and significance test (randomly permute the response values to mimic the overall null hypothesis that none of the predictor variables is relevant = baseline)


SLIDE 25

Tests for variable importance

problems of these approaches:

◮ (at least) Breiman and Cutler (2008): strange statistical properties (Strobl and Zeileis, 2008)

◮ all: preference for correlated predictor variables (see also Nicodemus and Shugart, 2007; Archer and Kimes, 2008)

SLIDE 26

Breiman and Cutler’s test

under the null hypothesis of zero importance:

z_j \overset{as.}{\sim} N(0, 1)

if z_j exceeds the α-quantile of N(0,1) ⇒ reject the null hypothesis of zero importance for variable X_j

SLIDE 27

Raw importance

[Figure: mean raw importance as a function of relevance (0.0 to 0.4), in panels for ntree = 100, 200, 500 and sample sizes 100, 200, 500]

SLIDE 28

z-score and power

[Figure: power and z-score as functions of relevance (0.0 to 0.4), in panels for ntree = 100, 200, 500 and sample sizes 100, 200, 500]

SLIDE 29

Findings

z-score and power

◮ increase with ntree
◮ decrease with sample size

⇒ rather use the raw, unscaled permutation importance!
importance(obj, type=1, scale=FALSE)
varimp(obj)
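The first finding can be read directly off the z-score formula: with the raw importance and the per-tree spread held fixed, z_j = VI / (σ̂/√ntree) grows like √ntree, so apparent significance can be manufactured simply by growing more trees. A small numeric illustration (all numbers invented):

```python
from math import sqrt

vi, sigma = 0.05, 0.2    # raw importance and per-tree sd, held fixed
z = {ntree: vi / (sigma / sqrt(ntree)) for ntree in (100, 200, 500)}
# the test statistic inflates with ntree even though the importance does not
```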

SLIDE 30

What null hypothesis were we testing in the first place?

Obs | Y   | X_j            | Z
1   | y_1 | x_{\pi_j(1),j} | z_1
⋮   | ⋮   | ⋮              | ⋮
i   | y_i | x_{\pi_j(i),j} | z_i
⋮   | ⋮   | ⋮              | ⋮
n   | y_n | x_{\pi_j(n),j} | z_n

H_0: X_j \perp (Y, Z), i.e. X_j \perp Y \wedge X_j \perp Z

P(Y, X_j, Z) \overset{H_0}{=} P(Y, Z) \cdot P(X_j)
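The permutation scheme in the table can be made concrete: the unconditional permutation π_j breaks the association of X_j not only with Y but also with Z, which is why the null hypothesis being tested is the joint one. A Python illustration with invented data:

```python
import random

def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    den = (sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b)) ** 0.5
    return num / den

rng = random.Random(7)
z = [float(i) for i in range(1000)]
xj = [zi + rng.gauss(0, 1) for zi in z]   # X_j strongly associated with Z
xp = rng.sample(xj, len(xj))              # marginal permutation pi_j
before, after = corr(xj, z), corr(xp, z)
# before is essentially 1, after is near 0: the X_j-Z association is destroyed
```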


SLIDE 32

What null hypothesis were we testing in the first place?

The current null hypothesis reflects independence of X_j from both Y and the remaining predictor variables Z ⇒ a high variable importance can result from a violation of either one!

SLIDE 33

Suggestion: Conditional permutation scheme

Obs | Y    | X_j                   | Z
1   | y_1  | x_{\pi_{j|Z=a}(1),j}  | z_1 = a
3   | y_3  | x_{\pi_{j|Z=a}(3),j}  | z_3 = a
27  | y_27 | x_{\pi_{j|Z=a}(27),j} | z_27 = a
6   | y_6  | x_{\pi_{j|Z=b}(6),j}  | z_6 = b
14  | y_14 | x_{\pi_{j|Z=b}(14),j} | z_14 = b
33  | y_33 | x_{\pi_{j|Z=b}(33),j} | z_33 = b
⋮   | ⋮    | ⋮                     | ⋮

H_0: X_j \perp Y \mid Z

P(Y, X_j \mid Z) \overset{H_0}{=} P(Y \mid Z) \cdot P(X_j \mid Z), or equivalently P(Y \mid X_j, Z) \overset{H_0}{=} P(Y \mid Z)
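In code, the conditional scheme permutes X_j only among observations sharing the same value of the conditioning variable(s) Z, so the X_j-Z association is preserved while the X_j-Y link is broken. A plain-Python sketch (the function name is our own, not the party internals):

```python
import random
from collections import defaultdict

def conditional_permute(xj, z, rng):
    """Return x_{pi_{j|Z}}: xj permuted within each level of z."""
    groups = defaultdict(list)
    for i, zi in enumerate(z):
        groups[zi].append(i)
    out = list(xj)
    for idx in groups.values():
        shuffled = rng.sample(idx, len(idx))   # permute indices within the group
        for i, s in zip(idx, shuffled):
            out[i] = xj[s]
    return out

rng = random.Random(42)
z  = ["a", "a", "a", "b", "b", "b"]
xj = [1, 2, 3, 10, 20, 30]
xp = conditional_permute(xj, z, rng)
# each group keeps its own values: {1, 2, 3} stays with z = a, {10, 20, 30} with z = b
```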


SLIDE 37

Technically

◮ use any partition of the feature space for conditioning
◮ here: use the binary partition already learned by the tree (use the cutpoints as bisectors of the feature space)
◮ condition on all correlated variables, or select some of them

Strobl et al. (2008); available in cforest from party version 0.9-994:
varimp(obj, conditional = TRUE)

SLIDE 38

Simulation study

◮ dgp: y_i = \beta_1 x_{i,1} + \cdots + \beta_{12} x_{i,12} + \varepsilon_i, with \varepsilon_i i.i.d. \sim N(0, 0.5)

◮ X_1, \ldots, X_{12} \sim N(0, \Sigma), where X_1, \ldots, X_4 are block-correlated with pairwise correlation 0.9 and the remaining predictors are uncorrelated:

\Sigma = \begin{pmatrix}
1 & 0.9 & 0.9 & 0.9 & & & \\
0.9 & 1 & 0.9 & 0.9 & & 0 & \\
0.9 & 0.9 & 1 & 0.9 & & & \\
0.9 & 0.9 & 0.9 & 1 & & & \\
& & & & 1 & & \\
& 0 & & & & \ddots & \\
& & & & & & 1
\end{pmatrix}

X_j:  X_1  X_2  X_3  X_4  X_5  X_6  X_7  X_8  ⋯  X_12
β_j:   5    5    2    0   −5   −5   −2    0   ⋯   0
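The dgp can be sketched as follows (the study itself was run in R; here Python). Two modelling choices are our reading of the slide: the 0.9-correlated block X_1, ..., X_4 is built from a shared latent factor, 0.5 is treated as the error variance, and the coefficient vector β = (5, 5, 2, 0, −5, −5, −2, 0, ..., 0) follows Strobl et al. (2008).

```python
import random
from math import sqrt

beta = [5, 5, 2, 0, -5, -5, -2, 0, 0, 0, 0, 0]

def draw_predictors(rng):
    """X_1..X_4 with pairwise correlation 0.9; X_5..X_12 independent N(0, 1)."""
    latent = rng.gauss(0, 1)
    x = [sqrt(0.9) * latent + sqrt(0.1) * rng.gauss(0, 1) for _ in range(4)]
    return x + [rng.gauss(0, 1) for _ in range(8)]

def draw_obs(rng):
    x = draw_predictors(rng)
    y = sum(b * xi for b, xi in zip(beta, x)) + rng.gauss(0, sqrt(0.5))
    return x, y

def corr(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return num / (sum((u - ma) ** 2 for u in a)
                  * sum((v - mb) ** 2 for v in b)) ** 0.5

rng = random.Random(0)
xs = [draw_predictors(rng) for _ in range(5000)]
c12 = corr([r[0] for r in xs], [r[1] for r in xs])   # inside the correlated block
c15 = corr([r[0] for r in xs], [r[4] for r in xs])   # across blocks
```

The latent-factor construction gives each block variable variance 0.9 + 0.1 = 1 and pairwise covariance 0.9, matching the Σ above.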

SLIDE 39

Results

[Figure: distributions of variable importance for variables 1 to 12, in panels for mtry = 1, 3, 8]

SLIDE 40

Peptide-binding data

[Figure: unconditional vs. conditional variable importance (scale 0 to 0.005) for the peptide-binding data, with the predictors h2y8, flex8, and pol3 highlighted]

SLIDE 41

Summary


SLIDE 44

Summary

If your predictor variables are of different types: use cforest (pkg: party) with the default option controls = cforest_unbiased() and the permutation importance varimp(obj).

Otherwise: feel free to use cforest (pkg: party) with the permutation importance varimp(obj), or randomForest (pkg: randomForest) with the permutation importance importance(obj, type=1) or the Gini importance importance(obj, type=2), but don’t fall for the z-score! (i.e. set scale=FALSE)

If your predictor variables are highly correlated: use the conditional importance in cforest (pkg: party).


SLIDE 46

References

Archer, K. J. and R. V. Kimes (2008). Empirical characterization of random forest variable importance measures. Computational Statistics & Data Analysis 52(4), 2249–2260.

Breiman, L. (2001). Random forests. Machine Learning 45(1), 5–32.

Breiman, L. and A. Cutler (2008). Random Forests – Classification Manual. Website accessed in 1/2008; http://www.math.usu.edu/~adele/forests.

Breiman, L., A. Cutler, A. Liaw, and M. Wiener (2006). Breiman and Cutler’s Random Forests for Classification and Regression. R package version 4.5-16.

Díaz-Uriarte, R. (2007). GeneSrF and varSelRF: A web-based tool and R package for gene selection and classification using random forest. BMC Bioinformatics 8:328.

SLIDE 47

Hothorn, T., K. Hornik, and A. Zeileis (2006). Unbiased recursive partitioning: A conditional inference framework. Journal of Computational and Graphical Statistics 15(3), 651–674.

Strobl, C., A.-L. Boulesteix, A. Zeileis, and T. Hothorn (2007). Bias in random forest variable importance measures: Illustrations, sources and a solution. BMC Bioinformatics 8:25.

Strobl, C. and A. Zeileis (2008). Danger: High power! – Exploring the statistical properties of a test for random forest variable importance. In Proceedings of the 18th International Conference on Computational Statistics, Porto, Portugal.

Strobl, C., A.-L. Boulesteix, T. Kneib, T. Augustin, and A. Zeileis (2008). Conditional variable importance for random forests. BMC Bioinformatics 9:307.