
BBM406 Fundamentals of Machine Learning, Lecture 19: What is Ensemble Learning? Bagging, Random Forests



  1. Photo by Unsplash user @nathananderson. BBM406 Fundamentals of Machine Learning, Lecture 19: What is Ensemble Learning? Bagging, Random Forests. Aykut Erdem // Hacettepe University // Fall 2019

  2. Last time… Decision Trees slide by David Sontag 2

  3. Last time… Information Gain • Decrease in entropy (uncertainty) after splitting. In our running example, with rows (X1, X2, Y): (T,T,T), (T,F,T), (T,T,T), (T,F,T), (F,T,T), (F,F,F), we get IG(X1) = H(Y) – H(Y|X1) = 0.65 – 0.33; IG(X1) > 0, so we prefer the split! slide by David Sontag 3
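Below is a minimal Python sketch of the information gain computation on this toy example; the helper names (entropy, information_gain) and the column layout are my own, not from the slide.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H of a collection of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(xs, ys):
    """IG(X) = H(Y) - H(Y|X): drop in label entropy after splitting on attribute X."""
    n = len(ys)
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(x, []).append(y)
    h_y_given_x = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(ys) - h_y_given_x

# The X1 and Y columns of the running example above
X1 = ['T', 'T', 'T', 'T', 'F', 'F']
Y  = ['T', 'T', 'T', 'T', 'T', 'F']
print(information_gain(X1, Y))  # about 0.65 - 0.33 = 0.32 > 0, so the split is preferred
```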

  4. Last time… Continuous features • Binary tree, split on attribute X - One branch: X < t - Other branch: X ≥ t • Search through possible values of t - Seems hard!!! • But only a finite number of t's are important: - Sort data according to X into {x1, ..., xm} - Consider split points of the form xi + (xi+1 – xi)/2 - Moreover, only splits between examples from different classes matter! [figure: a one-dimensional feature Xj with classes c1, c2 and candidate thresholds t1, t2] slide by David Sontag 4
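A short sketch of the candidate-threshold rule described on this slide: sort by the feature, take midpoints between consecutive values, and keep only those that separate examples of different classes (the function name and the tie-skipping detail are my own).

```python
def candidate_splits(values, labels):
    """Candidate thresholds t for a continuous attribute: midpoints x_i + (x_{i+1} - x_i)/2
    between consecutive sorted values, kept only where the class label changes."""
    pairs = sorted(zip(values, labels))
    thresholds = []
    for (x_i, y_i), (x_next, y_next) in zip(pairs, pairs[1:]):
        if y_i != y_next and x_i != x_next:   # only boundaries between different classes matter
            thresholds.append(x_i + (x_next - x_i) / 2)
    return thresholds

# Four sorted points with a single class change -> a single candidate threshold
print(candidate_splits([1.0, 2.0, 3.0, 4.0], ['c1', 'c1', 'c2', 'c2']))  # [2.5]
```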

  5. Last time… Decision trees will overfit • Standard decision trees have no learning bias - Training set error is always zero! (if there is no label noise) - Lots of variance - Must introduce some bias towards simpler trees • Many strategies for picking simpler trees - Fixed depth - Fixed number of leaves - Random forests slide by David Sontag 5

  6. Today • Ensemble Methods - Bagging - Random Forests 6

  7. Ensemble Methods • High level idea – Generate multiple hypotheses – Combine them into a single classifier • Two important questions – How do we generate multiple hypotheses? (we have only one sample) – How do we combine the multiple hypotheses? (majority vote, AdaBoost, ...) slide by Yishay Mansour 7
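A tiny illustration of the "combine by majority" option; the three threshold "hypotheses" here are made up purely to show the voting mechanics.

```python
from collections import Counter

def majority_vote(hypotheses, x):
    """Combine several hypotheses (callables mapping x to a label) by majority vote."""
    votes = [h(x) for h in hypotheses]
    return Counter(votes).most_common(1)[0][0]

# Three toy hypotheses that threshold a single feature at different points
hs = [lambda x: x > 0.3, lambda x: x > 0.5, lambda x: x > 0.7]
print(majority_vote(hs, 0.6))  # True: two of the three hypotheses vote positive
```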

  8. Bias/Variance Tradeoff slide by David Sontag [Hastie, Tibshirani, Friedman, “Elements of Statistical Learning”, 2001] 8

  9. Bias/Variance Tradeoff slide by David Sontag Graphical illustration of bias and variance. http://scott.fortmann-roe.com/docs/BiasVariance.html 9

  10. Fighting the bias-variance tradeoff • Simple (a.k.a. weak) learners are good - e.g., naïve Bayes, logistic regression, decision stumps (or shallow decision trees) - Low variance, don't usually overfit • Simple (a.k.a. weak) learners are bad – High bias, can't solve hard learning problems slide by Aarti Singh 10

  11. Reduce Variance Without Increasing Bias • Averaging reduces variance: Var(X̄) = Var(X)/N (when predictions are independent) • Average models to reduce model variance • One problem: - Only one training set - Where do multiple models come from? slide by David Sontag 11
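A quick numeric check of the claim, assuming N independent predictions with the same variance; averaging them shrinks the variance by roughly a factor of N (the numbers are illustrative).

```python
import numpy as np

rng = np.random.default_rng(0)
N, trials = 25, 10_000

# Each "prediction" is a noisy estimate of the same target (mean 0, variance 1).
single = rng.normal(0.0, 1.0, size=trials)
averaged = rng.normal(0.0, 1.0, size=(trials, N)).mean(axis=1)

print(single.var())    # close to 1.0
print(averaged.var())  # close to 1/N = 0.04: independent errors average out
```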

  12. Bagging (Bootstrap Aggregating) • Leo Breiman (1994) • Take repeated bootstrap samples from training set D. • Bootstrap sampling: Given set D containing N training examples, create D′ by drawing N examples at random with replacement from D. • Bagging: - Create k bootstrap samples D1 ... Dk. - Train a distinct classifier on each Di. - Classify new instances by majority vote / average. slide by David Sontag 12
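A sketch of the bagging procedure above, using scikit-learn decision trees as the distinct base classifiers; the helper names and defaults (k = 25 bootstrap samples) are my own choices.

```python
import numpy as np
from collections import Counter
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, k=25, seed=0):
    """Train k trees, each on a bootstrap sample (N draws with replacement) of (X, y)."""
    rng = np.random.default_rng(seed)
    n = len(X)                                  # X, y are NumPy arrays
    models = []
    for _ in range(k):
        idx = rng.integers(0, n, size=n)        # bootstrap sample D_i
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def bagging_predict(models, x):
    """Classify a new instance by majority vote over the k classifiers."""
    votes = [m.predict(x.reshape(1, -1))[0] for m in models]
    return Counter(votes).most_common(1)[0][0]
```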

  13. Bagging • Best case: variance is reduced by a factor of 1/N • In practice: - models are correlated, so the reduction is smaller than 1/N - the variance of models trained on fewer training cases is usually somewhat larger slide by David Sontag 13

  14. Bagging Example slide by David Sontag 14

  15. CART* decision boundary slide by David Sontag * A decision tree learning algorithm; very similar to ID3 15

  16. 100 bagged trees slide by David Sontag • Shades of blue/red indicate strength of vote for particular classification 16

  17. Random Forests 17

  18. Random Forests • Ensemble method specifically designed for decision tree classifiers • Introduces two sources of randomness: “bagging” and “random input vectors” - Bagging method: each tree is grown using a bootstrap sample of the training data - Random vector method: at each node, the best split is chosen from a random sample of m attributes instead of all attributes slide by David Sontag 18
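One way to realize both sources of randomness with scikit-learn's RandomForestClassifier: bootstrap=True grows each tree on a bootstrap sample, and max_features='sqrt' restricts every split to a random subset of m ≈ √p attributes. The synthetic dataset is only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

forest = RandomForestClassifier(
    n_estimators=100,      # number of trees in the ensemble
    bootstrap=True,        # "bagging": each tree sees a bootstrap sample of the data
    max_features="sqrt",   # "random input vectors": m = sqrt(p) attributes tried per split
    random_state=0,
)
forest.fit(X, y)
print(forest.score(X, y))
```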

  19. Classification tree [figure: training data in feature space and query points to classify] slide by Nando de Freitas [Criminisi et al., 2011] 19

  20. Use information gain to decide splits [figure panels: before split, split 1, split 2] slide by Nando de Freitas [Criminisi et al., 2011] 20

  21. Advanced: Gaussian information gain to decide splits [figure panels: before split, split 1, split 2] slide by Nando de Freitas [Criminisi et al., 2011] 21

  22. [figure: a tree with split nodes and leaf nodes; each split node applies a weak learner at training and test time, and each leaf stores a probabilistic leaf model] slide by Nando de Freitas [Criminisi et al., 2011] 22

  23. Alternative node decisions • Examples of weak learners: axis-aligned split, oriented line, conic section slide by Nando de Freitas 23

  24. Building a random tree slide by Nando de Freitas 24

  25. Random Forests algorithm slide by Nando de Freitas 25 [From the book of Hastie, Friedman and Tibshirani]

  26. Randomization slide by Nando de Freitas 26

  27. Building a forest (ensemble) Tree t=1 t=2 t=3 slide by Nando de Freitas 27

  28. Effect of forest size slide by Nando de Freitas 28
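A rough way to reproduce the forest-size effect on synthetic data (not the slide's dataset): test accuracy generally improves with more trees and then plateaus.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Accuracy typically rises quickly and then flattens as the forest grows.
for n_trees in (1, 5, 25, 100, 400):
    rf = RandomForestClassifier(n_estimators=n_trees, random_state=0).fit(X_tr, y_tr)
    print(n_trees, rf.score(X_te, y_te))
```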

  29. Effect of forest size slide by Nando de Freitas 29

  30. Effect of more classes and noise slide by Nando de Freitas [Criminisi et al., 2011] 30

  31. Effect of more classes and noise slide by Nando de Freitas 31

  32. Effect of tree depth (D) • Training points: 4-class mixed • D=3 (underfitting), D=6, D=15 (overfitting) slide by Nando de Freitas 32
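A sketch of the depth experiment on a synthetic 4-class problem (not the slide's data): shallow trees underfit, very deep trees can overfit, and an intermediate depth usually cross-validates best.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=600, n_features=10, n_informative=4,
                           n_classes=4, n_clusters_per_class=1, random_state=0)

# Compare shallow (D=3), intermediate (D=6) and deep (D=15) trees, as on the slide.
for depth in (3, 6, 15):
    rf = RandomForestClassifier(n_estimators=50, max_depth=depth, random_state=0)
    print(depth, cross_val_score(rf, X, y, cv=5).mean())
```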

  33. Effect of bagging • No bagging => max-margin slide by Nando de Freitas 33

  34. Random Forests and the Kinect slide by Nando de Freitas 34

  35. Random Forests and the Kinect adapted from Nando de Freitas depth image → body parts → 3D joint proposals [Jamie Shotton et al., 2011] 35

  36. Random Forests and the Kinect • Use computer graphics to generate plenty of data - synthetic (train & test) - real (test) adapted from Nando de Freitas [Jamie Shotton et al., 2011] 36

  37. Reduce Bias² and Decrease Variance? • Bagging reduces variance by averaging • Bagging has little effect on bias • Can we average and reduce bias? • Yes: Boosting slide by David Sontag 37

  38. Next Lecture: Boosting 38
