Introduction to Random Forest



  1. Introduction to Random Forest (TREE-BASED MODELS IN R) Erin LeDell, Instructor

  2. Random Forest: an improved version of bagging that samples a subset of the features at each split, giving reduced correlation between the sampled trees and better performance.

  3. Random Forest in R
     library(randomForest)
     ?randomForest

  4. randomForest Example
     library(randomForest)
     # Train a default RF model (500 trees)
     model <- randomForest(formula = response ~ ., data = train)
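Once trained, the model can generate predictions with predict(). A minimal, self-contained sketch, using the built-in iris data as a stand-in for the course's train data frame (the response and train names on the slide are placeholders):

```r
library(randomForest)

# Train a default RF model (500 trees) on a stand-in dataset
set.seed(1)
model <- randomForest(formula = Species ~ ., data = iris)

# Predicted classes for new data (here reusing iris purely for illustration)
preds <- predict(model, newdata = iris)
head(preds)
```

In practice newdata would be a held-out set with the same columns as the training data.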

  5. Let's practice!

  6. Understanding the Random Forest model output. Erin LeDell, Instructor

  7. Random Forest output
     # Print the credit_model output
     print(credit_model)
     Call:
      randomForest(formula = default ~ ., data = credit_train)
                    Type of random forest: classification
                          Number of trees: 500
     No. of variables tried at each split: 4
             OOB estimate of error rate: 24.12%
     Confusion matrix:
          no yes class.error
     no  516  46  0.08185053
     yes 147  91  0.61764706
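As a sanity check, the 24.12% OOB estimate above can be recovered directly from the confusion matrix: the off-diagonal (misclassified) counts divided by the total number of out-of-bag predictions.

```r
# Confusion matrix from the print(credit_model) output above
conf <- matrix(c(516, 46,
                 147, 91),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("no", "yes"), c("no", "yes")))

# OOB error = misclassified counts over all OOB predictions
oob_err <- (conf["no", "yes"] + conf["yes", "no"]) / sum(conf)
print(oob_err)  # 0.24125, i.e. the 24.12% reported above
```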

  8. Out-of-bag error matrix
     # Grab OOB error matrix & take a look
     err <- credit_model$err.rate
     head(err)
               OOB        no       yes
     [1,] 0.3414634 0.2657005 0.5375000
     [2,] 0.3311966 0.2462908 0.5496183
     [3,] 0.3232831 0.2476636 0.5147929
     [4,] 0.3164933 0.2180294 0.5561224
     [5,] 0.3197756 0.2095808 0.5801887
     [6,] 0.3176944 0.2115385 0.5619469

  9. Out-of-bag error estimate
     # Look at final OOB error rate
     oob_err <- err[nrow(err), "OOB"]
     print(oob_err)
     0.24125
     print(credit_model)
     Call:
      randomForest(formula = default ~ ., data = credit_train)
                    Type of random forest: classification
                          Number of trees: 500
     No. of variables tried at each split: 4
             OOB estimate of error rate: 24.12%

  10. Plot the OOB error rates
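The plot on this slide can be reproduced with plot() on the fitted model, which draws the OOB and per-class error rates against the number of trees. A sketch, using iris as a stand-in for the course's credit_train data:

```r
library(randomForest)

# Stand-in model; the course uses credit_model trained on credit_train
set.seed(1)
credit_model <- randomForest(Species ~ ., data = iris)

# plot() on a randomForest object draws err.rate (OOB + per-class) vs. trees
plot(credit_model)
legend("topright",
       legend = colnames(credit_model$err.rate),
       col = 1:ncol(credit_model$err.rate),
       lty = 1:ncol(credit_model$err.rate))
```

The legend call is needed because plot.randomForest() does not label the curves itself.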

  11. Let's practice!

  12. OOB error vs. test set error. Erin LeDell, Instructor

  13. Advantages & Disadvantages of OOB estimates
     Advantages:
     - Can evaluate your model without a separate test set
     - Computed automatically by the randomForest() function
     Disadvantages:
     - OOB error only estimates error (not AUC, log-loss, etc.)
     - Can't compare Random Forest performance to other types of models
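Computing both estimates side by side makes the comparison concrete. A sketch using an illustrative 80/20 split of iris (the course uses credit_train and a separate credit test set):

```r
library(randomForest)

# Illustrative train/test split
set.seed(1)
idx   <- sample(nrow(iris), 0.8 * nrow(iris))
train <- iris[idx, ]
test  <- iris[-idx, ]

model <- randomForest(Species ~ ., data = train)

# OOB error: last row of the err.rate matrix (no test set needed)
oob_err <- model$err.rate[nrow(model$err.rate), "OOB"]

# Test-set error: fraction of misclassified held-out rows
preds    <- predict(model, newdata = test)
test_err <- mean(preds != test$Species)

print(c(OOB = oob_err, test = test_err))
```

The two numbers will usually be close but not identical, since each tree's OOB sample is smaller than the full test set.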

  14. Let's practice!

  15. Tuning a Random Forest model. Erin LeDell, Instructor

  16. Random Forest Hyperparameters
     ntree: number of trees
     mtry: number of variables randomly sampled as candidates at each split
     sampsize: number of samples to train on
     nodesize: minimum size (number of samples) of the terminal nodes
     maxnodes: maximum number of terminal nodes
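All five hyperparameters listed above map directly to arguments of randomForest(). A sketch with illustrative values (not tuning recommendations), again using iris as stand-in data:

```r
library(randomForest)

set.seed(1)
model <- randomForest(Species ~ ., data = iris,
                      ntree    = 250,         # number of trees
                      mtry     = 2,           # variables sampled at each split
                      sampsize = nrow(iris),  # samples drawn to grow each tree
                      nodesize = 5,           # minimum size of terminal nodes
                      maxnodes = 20)          # maximum number of terminal nodes
print(model)
```

In practice these values are chosen by a search (e.g. over a grid, or with tuneRF() for mtry) rather than set by hand.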

  17. Tuning mtry with tuneRF()
     # Execute the tuning process
     set.seed(1)
     res <- tuneRF(x = train_predictor_df,
                   y = train_response_vector,
                   ntreeTry = 500)
     # Look at results
     print(res)
           mtry OOBError
     2.OOB    2   0.2475
     4.OOB    4   0.2475
     8.OOB    8   0.2425
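The matrix returned by tuneRF() has one row per mtry value tried, so the best value can be extracted programmatically. A sketch using iris as a stand-in for the slide's train_predictor_df / train_response_vector placeholders:

```r
library(randomForest)

# Tune mtry; tuneRF() returns a matrix with columns "mtry" and "OOBError"
set.seed(1)
res <- tuneRF(x = iris[, -5], y = iris$Species, ntreeTry = 500)

# Pick the mtry with the lowest OOB error
best_mtry <- res[which.min(res[, "OOBError"]), "mtry"]
print(best_mtry)
```

Alternatively, calling tuneRF() with doBest = TRUE returns a forest already fit with the best mtry found.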

  18. Let's practice!
