SLIDE 1

Random Forests

and other Ensembles of Independent Predictors


Tufts COMP 135: Introduction to Machine Learning https://www.cs.tufts.edu/comp/135/2020f/

Prof. Mike Hughes

Many slides attributable to: Liping Liu and Roni Khardon (Tufts), T. Q. Chen (UW), James, Witten, Hastie, Tibshirani (ISL/ESL books)

SLIDE 2

Ensembles: Unit Objectives

Big idea: We can improve performance by aggregating decisions from MANY predictors

  • Today: Predictors are Independently Trained
    • Using bootstrap samples of examples: “Bagging”
    • Using random subsets of features
    • Exemplary method: Random Forest / ExtraTrees
  • Next class: Predictors are Sequentially Trained
    • Each successive predictor “boosts” performance
    • Exemplary method: XGBoost

SLIDE 3

Motivating Example

3 binary classifiers

Model the predictions as independent random variables. Each one is correct 70% of the time. What is the chance that the majority vote is correct?

SLIDE 4

Motivating Example

5 binary classifiers

Model the predictions as independent random variables. Each one is correct 70% of the time. What is the chance that the majority vote is correct?

SLIDE 5

Motivating Example

101 binary classifiers

Model the predictions as independent random variables. Each one is correct 70% of the time. What is the chance that the majority vote is correct?
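This can be checked directly. Below is a minimal sketch (not from the slides) that treats each classifier's correctness as an independent Bernoulli(0.7) variable and sums the binomial probabilities of a correct majority vote; the function name is illustrative:

```python
from math import comb

def prob_majority_correct(n_classifiers, p_correct=0.7):
    """P(strictly more than half of n independent classifiers are correct)."""
    k_min = n_classifiers // 2 + 1  # smallest count that wins a majority (n odd)
    return sum(
        comb(n_classifiers, k) * p_correct**k * (1.0 - p_correct)**(n_classifiers - k)
        for k in range(k_min, n_classifiers + 1))

for n in (3, 5, 101):
    print(n, round(prob_majority_correct(n), 4))
# prints roughly 0.784 for 3, 0.8369 for 5, and ~1.0 for 101 classifiers
```

Independence is the key assumption: with 101 exact copies of one classifier, the vote would still be correct only 70% of the time. That is why the following slides focus on making the predictors diverse.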

SLIDE 6


Key Idea: Diversity

  • Vary the training data
SLIDE 7

Bootstrap Sampling

SLIDE 8

Bootstrap Sampling in Python
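The slide's original code listing is not reproduced in this transcript; the following is a minimal sketch of the same idea with NumPy (function and variable names are illustrative):

```python
import numpy as np

def draw_bootstrap_replica(X, y, seed=0):
    """Return one bootstrap replica of (X, y): N rows drawn with replacement."""
    prng = np.random.RandomState(seed)
    N = X.shape[0]
    row_ids = prng.choice(N, size=N, replace=True)  # duplicate rows are expected
    return X[row_ids], y[row_ids]

# scikit-learn provides the same operation as a one-liner:
# from sklearn.utils import resample
# X_replica, y_replica = resample(X, y, replace=True, random_state=0)
```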

SLIDE 9

Bootstrap Aggregation: BAgg-ing

  • Draw B “replicas” of training set
  • Use bootstrap sampling with replacement
  • Make prediction by averaging
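As a concrete instance of this recipe, scikit-learn's BaggingRegressor trains one tree per bootstrap replica and averages their predictions (a minimal sketch; the hyperparameter values are illustrative):

```python
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

# B = 10 bootstrap replicas, one regression tree per replica
bagged_trees = BaggingRegressor(
    DecisionTreeRegressor(),
    n_estimators=10,
    bootstrap=True,       # sample training examples with replacement
    random_state=0)
# bagged_trees.fit(x_train, y_train)
# yhat = bagged_trees.predict(x_test)  # average of the 10 trees' predictions
```

For classification, BaggingClassifier plays the same role but combines the trees by voting (or averaging predicted class probabilities) instead of averaging a numeric output.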

SLIDE 10

Regression Example: 1 tree

Image Credit: Adele Cutler’s slides

SLIDE 11

Regression Example: 10 trees

The solid black line is the ground truth; the red lines are the predictions of single regression trees.

Image Credit: Adele Cutler’s slides

SLIDE 12

Regression Average of 10 trees

The solid black line is the ground truth; the blue line is the prediction of the average of 10 regression trees.

Image Credit: Adele Cutler’s slides

SLIDE 13

Binary Classification

Image Credit: Adele Cutler’s slides

SLIDE 14

Decision Boundary: 1 tree

Image Credit: Adele Cutler’s slides

SLIDE 15

Decision boundary: 25 trees

Image Credit: Adele Cutler’s slides

SLIDE 16

Average over 25 trees

Image Credit: Adele Cutler’s slides

SLIDE 17

Variance of averages


  • Given B independent observations
  • Each one has variance v
  • Compute the mean of the B observations
  • What is the variance of this estimator?

z_1, z_2, \ldots, z_B

\bar{z} = \frac{1}{B} \sum_{b=1}^{B} z_b
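The answer is a standard result worth spelling out here: because the z_b are independent, their variances add, so

\mathrm{Var}(\bar{z}) = \frac{1}{B^2} \sum_{b=1}^{B} \mathrm{Var}(z_b) = \frac{B v}{B^2} = \frac{v}{B}

Averaging B independent observations cuts the variance by a factor of B, which is exactly the effect bagging exploits on the next slide.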

SLIDE 18

Why Bagging Works: Reduce Variance!


  • Flexible learners applied to small datasets have high variance w.r.t. the data distribution
  • Small change in training set -> big change in predictions on heldout set
  • Bagging decreases heldout error by decreasing the variance of predictions
  • Bagging can be applied to any base classifiers/regressors

SLIDE 19

Another Idea for Diversity

  • Vary the features

SLIDE 20

Random Forest


Combine example diversity AND feature diversity.

For t = 1 to T (# trees):
    Draw an independent bootstrap sample of the training set.
    Greedily train a tree on this sample. For each node (up to a maximum depth):
        Randomly select M of the F features.
        Find the best split among these M features.
Average the T trees to get predictions for new data.
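In scikit-learn this procedure is available as RandomForestClassifier / RandomForestRegressor; a minimal sketch with illustrative hyperparameter values (T corresponds to n_estimators, M to max_features):

```python
from sklearn.ensemble import RandomForestClassifier

forest = RandomForestClassifier(
    n_estimators=100,      # T: number of bootstrap-trained trees
    max_features='sqrt',   # M: number of features considered at each split
    max_depth=None,        # grow each tree fully, or set a maximum depth
    random_state=0)
# forest.fit(x_train, y_train)
# yhat  = forest.predict(x_test)        # majority vote across the trees
# proba = forest.predict_proba(x_test)  # class probabilities averaged across trees
```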

SLIDE 21

Single tree (Image Credit: ISL textbook)

SLIDE 22

Extremely Randomized Trees aka “ExtraTrees” in sklearn


Adds speed on top of example diversity and feature diversity.

For t = 1 to T (# trees):
    Draw an independent bootstrap sample of the training set.
    Greedily train a tree on this sample. For each node (up to a maximum depth):
        Randomly select M of the F features.
        Rather than searching each feature for its best split, try 1 random split at each of the M features, then keep the best of these M candidate splits.
Average the trees to get predictions for new data.
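scikit-learn exposes this method as ExtraTreesClassifier / ExtraTreesRegressor. One caveat: unlike the pseudocode above, sklearn's default is bootstrap=False (each tree sees the full training set), so set bootstrap=True if you also want bootstrap replicas. A minimal illustrative sketch:

```python
from sklearn.ensemble import ExtraTreesClassifier

extra_forest = ExtraTreesClassifier(
    n_estimators=100,      # T trees
    max_features='sqrt',   # M candidate features per split
    bootstrap=True,        # match the pseudocode; sklearn's default is False
    random_state=0)
# extra_forest.fit(x_train, y_train)
# yhat = extra_forest.predict(x_test)
```

Because each split is chosen from a handful of random candidates instead of an exhaustive search, each tree is faster to grow (and individually noisier); averaging many such trees recovers the accuracy.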

SLIDE 23

SLIDE 24

SLIDE 25

Applications of Random Forest in Industry

Microsoft Kinect RGB-D camera

How does the Kinect classify each pixel into a body part?

SLIDE 26

SLIDE 27

Summary: Ensembles of Independent Base Classifiers

  • Average over independent base predictors
  • Why it works: Reduces variance
  • PRO
    • Often better heldout performance than the base model
  • CON
    • Training B separate models is expensive, but can be parallelized
