SLIDE 1

Random Forests

and other Ensembles of Independent Predictors


Tufts COMP 135: Introduction to Machine Learning https://www.cs.tufts.edu/comp/135/2020f/

Prof. Mike Hughes

Many slides attributable to: Liping Liu and Roni Khardon (Tufts), T. Q. Chen (UW), James, Witten, Hastie, Tibshirani (ISL/ESL books)

SLIDE 2

Ensembles: Unit Objectives

Big idea: We can improve performance by aggregating decisions from MANY predictors

  • Today: Predictors are Independently Trained
    • Using bootstrap samples of examples: “Bagging”
    • Using random subsets of features
    • Exemplary method: Random Forest / ExtraTrees
  • Next class: Predictors are Sequentially Trained
    • Each successive predictor “boosts” performance
    • Exemplary method: XGBoost

SLIDE 3

Motivating Example

3 binary classifiers

Model the predictions as independent random variables. Each one is correct 70% of the time. What is the chance that the majority vote is correct?

SLIDE 4

Motivating Example

5 binary classifiers

Model the predictions as independent random variables. Each one is correct 70% of the time. What is the chance that the majority vote is correct?

SLIDE 5

Motivating Example

101 binary classifiers

Model the predictions as independent random variables. Each one is correct 70% of the time. What is the chance that the majority vote is correct?
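This can be checked directly. Below is a minimal sketch (not from the slides) that treats each classifier's correctness as an independent Bernoulli(0.7) variable and sums the binomial probabilities of a correct majority vote; the function name is illustrative:

```python
from math import comb

def prob_majority_correct(n_classifiers, p_correct=0.7):
    """P(strictly more than half of n independent classifiers are correct)."""
    k_min = n_classifiers // 2 + 1  # smallest count that wins a majority (n odd)
    return sum(
        comb(n_classifiers, k) * p_correct**k * (1.0 - p_correct)**(n_classifiers - k)
        for k in range(k_min, n_classifiers + 1))

for n in (3, 5, 101):
    print(n, round(prob_majority_correct(n), 4))
# prints roughly 0.784 for 3, 0.8369 for 5, and ~1.0 for 101 classifiers
```

Independence is the key assumption: with 101 exact copies of one classifier, the vote would still be correct only 70% of the time. That is why the following slides focus on making the predictors diverse.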

SLIDE 6


Key Idea: Diversity

  • Vary the training data
SLIDE 7

Bootstrap Sampling

SLIDE 8

Bootstrap Sampling in Python
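The slide's original code listing is not reproduced in this transcript; the following is a minimal sketch of the same idea with NumPy (function and variable names are illustrative):

```python
import numpy as np

def draw_bootstrap_replica(X, y, seed=0):
    """Return one bootstrap replica of (X, y): N rows drawn with replacement."""
    prng = np.random.RandomState(seed)
    N = X.shape[0]
    row_ids = prng.choice(N, size=N, replace=True)  # duplicate rows are expected
    return X[row_ids], y[row_ids]

# scikit-learn provides the same operation as a one-liner:
# from sklearn.utils import resample
# X_replica, y_replica = resample(X, y, replace=True, random_state=0)
```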

SLIDE 9

Bootstrap Aggregation: BAgg-ing

  • Draw B “replicas” of training set
  • Use bootstrap sampling with replacement
  • Make prediction by averaging
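As a concrete instance of this recipe, scikit-learn's BaggingRegressor trains one tree per bootstrap replica and averages their predictions (a minimal sketch; the hyperparameter values are illustrative):

```python
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

# B = 10 bootstrap replicas, one regression tree per replica
bagged_trees = BaggingRegressor(
    DecisionTreeRegressor(),
    n_estimators=10,
    bootstrap=True,       # sample training examples with replacement
    random_state=0)
# bagged_trees.fit(x_train, y_train)
# yhat = bagged_trees.predict(x_test)  # average of the 10 trees' predictions
```

For classification, BaggingClassifier plays the same role but combines the trees by voting (or averaging predicted class probabilities) instead of averaging a numeric output.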

SLIDE 10

Regression Example: 1 tree

Image Credit: Adele Cutler’s slides

SLIDE 11

Regression Example: 10 trees

The solid black line is the ground truth; the red lines are the predictions of single regression trees.

Image Credit: Adele Cutler’s slides

SLIDE 12

Regression Average of 10 trees

The solid black line is the ground truth; the blue line is the prediction of the average of 10 regression trees.

Image Credit: Adele Cutler’s slides

SLIDE 13

Binary Classification

Image Credit: Adele Cutler’s slides

SLIDE 14

Decision Boundary: 1 tree

Image Credit: Adele Cutler’s slides

SLIDE 15

Decision boundary: 25 trees

Image Credit: Adele Cutler’s slides

SLIDE 16

Average over 25 trees

Image Credit: Adele Cutler’s slides

SLIDE 17

Variance of averages


  • Given B independent observations
  • Each one has variance v
  • Compute the mean of the B observations
  • What is the variance of this estimator?

z_1, z_2, \ldots, z_B

\bar{z} = \frac{1}{B} \sum_{b=1}^{B} z_b
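The answer is a standard result worth spelling out here: because the z_b are independent, their variances add, so

\mathrm{Var}(\bar{z}) = \frac{1}{B^2} \sum_{b=1}^{B} \mathrm{Var}(z_b) = \frac{B v}{B^2} = \frac{v}{B}

Averaging B independent observations cuts the variance by a factor of B, which is exactly the effect bagging exploits on the next slide.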

SLIDE 18

Why Bagging Works: Reduce Variance!


  • Flexible learners applied to small datasets have high variance w.r.t. the data distribution
  • Small change in training set -> big change in predictions on heldout set
  • Bagging decreases heldout error by decreasing the variance of predictions
  • Bagging can be applied to any base classifiers/regressors

SLIDE 19

Another Idea for Diversity

  • Vary the features

SLIDE 20

Random Forest


Combine example diversity AND feature diversity.

For t = 1 to T (# trees):
    Draw an independent bootstrap sample of the training set.
    Greedily train a tree on this sample. For each node (up to a maximum depth):
        Randomly select M of the F features.
        Find the best split among these M features.
Average the T trees to get predictions for new data.
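In scikit-learn this procedure is available as RandomForestClassifier / RandomForestRegressor; a minimal sketch with illustrative hyperparameter values (T corresponds to n_estimators, M to max_features):

```python
from sklearn.ensemble import RandomForestClassifier

forest = RandomForestClassifier(
    n_estimators=100,      # T: number of bootstrap-trained trees
    max_features='sqrt',   # M: number of features considered at each split
    max_depth=None,        # grow each tree fully, or set a maximum depth
    random_state=0)
# forest.fit(x_train, y_train)
# yhat  = forest.predict(x_test)        # majority vote across the trees
# proba = forest.predict_proba(x_test)  # class probabilities averaged across trees
```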

SLIDE 21

Single tree (Image Credit: ISL textbook)

SLIDE 22

Extremely Randomized Trees aka “ExtraTrees” in sklearn


Adds speed on top of example diversity and feature diversity.

For t = 1 to T (# trees):
    Draw an independent bootstrap sample of the training set.
    Greedily train a tree on this sample. For each node (up to a maximum depth):
        Randomly select M of the F features.
        Rather than searching each feature for its best split, try 1 random split at each of the M features, then keep the best of these M candidate splits.
Average the trees to get predictions for new data.
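scikit-learn exposes this method as ExtraTreesClassifier / ExtraTreesRegressor. One caveat: unlike the pseudocode above, sklearn's default is bootstrap=False (each tree sees the full training set), so set bootstrap=True if you also want bootstrap replicas. A minimal illustrative sketch:

```python
from sklearn.ensemble import ExtraTreesClassifier

extra_forest = ExtraTreesClassifier(
    n_estimators=100,      # T trees
    max_features='sqrt',   # M candidate features per split
    bootstrap=True,        # match the pseudocode; sklearn's default is False
    random_state=0)
# extra_forest.fit(x_train, y_train)
# yhat = extra_forest.predict(x_test)
```

Because each split is chosen from a handful of random candidates instead of an exhaustive search, each tree is faster to grow (and individually noisier); averaging many such trees recovers the accuracy.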

SLIDE 23

SLIDE 24

SLIDE 25

Applications of Random Forest in Industry

Microsoft Kinect RGB-D camera

How does the Kinect classify each pixel into a body part?

SLIDE 26

SLIDE 27

Summary: Ensembles of Independent Base Classifiers

  • Average over independent base predictors
  • Why it works: Reduces variance
  • PRO
    • Often better heldout performance than the base model
  • CON
    • Training B separate models is expensive, but can be parallelized
