
BBM406 Fundamentals of Machine Learning, Lecture 19: What is Ensemble Learning? Bagging, Random Forests



  1. Photo by Unsplash user @nathananderson. BBM406 Fundamentals of Machine Learning, Lecture 19: What is Ensemble Learning? Bagging, Random Forests. Aykut Erdem // Hacettepe University // Fall 2019

  2. Last time… Decision Trees slide by David Sontag 2

  3. Last time… Information Gain • Decrease in entropy (uncertainty) after splitting. In our running example, with rows (X1, X2, Y): (T,T,T), (T,F,T), (T,T,T), (T,F,T), (F,T,T), (F,F,F), we get IG(X1) = H(Y) – H(Y|X1) = 0.65 – 0.33; IG(X1) > 0, so we prefer the split! slide by David Sontag 3
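Below is a minimal Python sketch of the information gain computation on this toy example; the helper names (entropy, information_gain) and the column layout are my own, not from the slide.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H of a collection of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(xs, ys):
    """IG(X) = H(Y) - H(Y|X): drop in label entropy after splitting on attribute X."""
    n = len(ys)
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(x, []).append(y)
    h_y_given_x = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(ys) - h_y_given_x

# The X1 and Y columns of the running example above
X1 = ['T', 'T', 'T', 'T', 'F', 'F']
Y  = ['T', 'T', 'T', 'T', 'T', 'F']
print(information_gain(X1, Y))  # about 0.65 - 0.33 = 0.32 > 0, so the split is preferred
```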

  4. Last time… Continuous features • Binary tree, split on attribute X - One branch: X < t - Other branch: X ≥ t • Search through possible values of t - Seems hard!!! • But only a finite number of t's are important: - Sort data according to X into {x1, ..., xm} - Consider split points of the form xi + (xi+1 – xi)/2 - Moreover, only splits between examples from different classes matter! [figure: a one-dimensional feature Xj with classes c1, c2 and candidate thresholds t1, t2] slide by David Sontag 4
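A short sketch of the candidate-threshold rule described on this slide: sort by the feature, take midpoints between consecutive values, and keep only those that separate examples of different classes (the function name and the tie-skipping detail are my own).

```python
def candidate_splits(values, labels):
    """Candidate thresholds t for a continuous attribute: midpoints x_i + (x_{i+1} - x_i)/2
    between consecutive sorted values, kept only where the class label changes."""
    pairs = sorted(zip(values, labels))
    thresholds = []
    for (x_i, y_i), (x_next, y_next) in zip(pairs, pairs[1:]):
        if y_i != y_next and x_i != x_next:   # only boundaries between different classes matter
            thresholds.append(x_i + (x_next - x_i) / 2)
    return thresholds

# Four sorted points with a single class change -> a single candidate threshold
print(candidate_splits([1.0, 2.0, 3.0, 4.0], ['c1', 'c1', 'c2', 'c2']))  # [2.5]
```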

  5. Last time… Decision trees will overfit • Standard decision trees have no learning bias - Training set error is always zero! (if there is no label noise) - Lots of variance - Must introduce some bias towards simpler trees • Many strategies for picking simpler trees - Fixed depth - Fixed number of leaves - Random forests slide by David Sontag 5

  6. Today • Ensemble Methods - Bagging - Random Forests 6

  7. Ensemble Methods • High level idea – Generate multiple hypotheses – Combine them into a single classifier • Two important questions – How do we generate multiple hypotheses? (we have only one sample) – How do we combine the multiple hypotheses? (majority vote, AdaBoost, ...) slide by Yishay Mansour 7
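A tiny illustration of the "combine by majority" option; the three threshold "hypotheses" here are made up purely to show the voting mechanics.

```python
from collections import Counter

def majority_vote(hypotheses, x):
    """Combine several hypotheses (callables mapping x to a label) by majority vote."""
    votes = [h(x) for h in hypotheses]
    return Counter(votes).most_common(1)[0][0]

# Three toy hypotheses that threshold a single feature at different points
hs = [lambda x: x > 0.3, lambda x: x > 0.5, lambda x: x > 0.7]
print(majority_vote(hs, 0.6))  # True: two of the three hypotheses vote positive
```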

  8. Bias/Variance Tradeoff slide by David Sontag [Hastie, Tibshirani, Friedman, “Elements of Statistical Learning”, 2001] 8

  9. Bias/Variance Tradeoff slide by David Sontag Graphical illustration of bias and variance. http://scott.fortmann-roe.com/docs/BiasVariance.html 9

  10. Fighting the bias-variance tradeoff • Simple (a.k.a. weak) learners are good - e.g., naïve Bayes, logistic regression, decision stumps (or shallow decision trees) - Low variance, don't usually overfit • Simple (a.k.a. weak) learners are bad – High bias, can't solve hard learning problems slide by Aarti Singh 10

  11. Reduce Variance Without Increasing Bias • Averaging reduces variance: Var(X̄) = Var(X)/N (when predictions are independent) • Average models to reduce model variance • One problem: - Only one training set - Where do multiple models come from? slide by David Sontag 11
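A quick numeric check of the claim, assuming N independent predictions with the same variance; averaging them shrinks the variance by roughly a factor of N (the numbers are illustrative).

```python
import numpy as np

rng = np.random.default_rng(0)
N, trials = 25, 10_000

# Each "prediction" is a noisy estimate of the same target (mean 0, variance 1).
single = rng.normal(0.0, 1.0, size=trials)
averaged = rng.normal(0.0, 1.0, size=(trials, N)).mean(axis=1)

print(single.var())    # close to 1.0
print(averaged.var())  # close to 1/N = 0.04: independent errors average out
```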

  12. Bagging (Bootstrap Aggregating) • Leo Breiman (1994) • Take repeated bootstrap samples from training set D. • Bootstrap sampling: Given set D containing N training examples, create D′ by drawing N examples at random with replacement from D. • Bagging: - Create k bootstrap samples D1 ... Dk. - Train a distinct classifier on each Di. - Classify new instances by majority vote / average. slide by David Sontag 12
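A sketch of the bagging procedure above, using scikit-learn decision trees as the distinct base classifiers; the helper names and defaults (k = 25 bootstrap samples) are my own choices.

```python
import numpy as np
from collections import Counter
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, k=25, seed=0):
    """Train k trees, each on a bootstrap sample (N draws with replacement) of (X, y)."""
    rng = np.random.default_rng(seed)
    n = len(X)                                  # X, y are NumPy arrays
    models = []
    for _ in range(k):
        idx = rng.integers(0, n, size=n)        # bootstrap sample D_i
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def bagging_predict(models, x):
    """Classify a new instance by majority vote over the k classifiers."""
    votes = [m.predict(x.reshape(1, -1))[0] for m in models]
    return Counter(votes).most_common(1)[0][0]
```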

  13. Bagging • Best case: variance is reduced by a factor of 1/N • In practice: - models are correlated, so the reduction is smaller than 1/N - the variance of models trained on fewer training cases is usually somewhat larger slide by David Sontag 13

  14. Bagging Example slide by David Sontag 14

  15. CART* decision boundary slide by David Sontag * A decision tree learning algorithm; very similar to ID3 15

  16. 100 bagged trees slide by David Sontag • Shades of blue/red indicate strength of vote for particular classification 16

  17. Random Forests 17

  18. Random Forests • Ensemble method specifically designed for decision tree classifiers • Introduces two sources of randomness: “bagging” and “random input vectors” - Bagging method: each tree is grown using a bootstrap sample of the training data - Random vector method: at each node, the best split is chosen from a random sample of m attributes instead of all attributes slide by David Sontag 18
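One way to realize both sources of randomness with scikit-learn's RandomForestClassifier: bootstrap=True grows each tree on a bootstrap sample, and max_features='sqrt' restricts every split to a random subset of m ≈ √p attributes. The synthetic dataset is only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

forest = RandomForestClassifier(
    n_estimators=100,      # number of trees in the ensemble
    bootstrap=True,        # "bagging": each tree sees a bootstrap sample of the data
    max_features="sqrt",   # "random input vectors": m = sqrt(p) attributes tried per split
    random_state=0,
)
forest.fit(X, y)
print(forest.score(X, y))
```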

  19. Classification tree [figure: training data in feature space and query points to classify] slide by Nando de Freitas [Criminisi et al., 2011] 19

  20. Use information gain to decide splits [figure panels: before split, split 1, split 2] slide by Nando de Freitas [Criminisi et al., 2011] 20

  21. Advanced: Gaussian information gain to decide splits [figure panels: before split, split 1, split 2] slide by Nando de Freitas [Criminisi et al., 2011] 21

  22. [figure: a tree with split nodes and leaf nodes; each split node applies a weak learner at training and test time, and each leaf stores a probabilistic leaf model] slide by Nando de Freitas [Criminisi et al., 2011] 22

  23. Alternative node decisions • Examples of weak learners: axis-aligned split, oriented line, conic section slide by Nando de Freitas 23

  24. Building a random tree slide by Nando de Freitas 24

  25. Random Forests algorithm slide by Nando de Freitas 25 [From the book of Hastie, Friedman and Tibshirani]

  26. Randomization slide by Nando de Freitas 26

  27. Building a forest (ensemble) Tree t=1 t=2 t=3 slide by Nando de Freitas 27

  28. Effect of forest size slide by Nando de Freitas 28
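A rough way to reproduce the forest-size effect on synthetic data (not the slide's dataset): test accuracy generally improves with more trees and then plateaus.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Accuracy typically rises quickly and then flattens as the forest grows.
for n_trees in (1, 5, 25, 100, 400):
    rf = RandomForestClassifier(n_estimators=n_trees, random_state=0).fit(X_tr, y_tr)
    print(n_trees, rf.score(X_te, y_te))
```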

  29. Effect of forest size slide by Nando de Freitas 29

  30. Effect of more classes and noise slide by Nando de Freitas [Criminisi et al., 2011] 30

  31. Effect of more classes and noise slide by Nando de Freitas 31

  32. Effect of tree depth (D) • Training points: 4-class mixed • D=3 (underfitting), D=6, D=15 (overfitting) slide by Nando de Freitas 32
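A sketch of the depth experiment on a synthetic 4-class problem (not the slide's data): shallow trees underfit, very deep trees can overfit, and an intermediate depth usually cross-validates best.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=600, n_features=10, n_informative=4,
                           n_classes=4, n_clusters_per_class=1, random_state=0)

# Compare shallow (D=3), intermediate (D=6) and deep (D=15) trees, as on the slide.
for depth in (3, 6, 15):
    rf = RandomForestClassifier(n_estimators=50, max_depth=depth, random_state=0)
    print(depth, cross_val_score(rf, X, y, cv=5).mean())
```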

  33. Effect of bagging • No bagging => max-margin slide by Nando de Freitas 33

  34. Random Forests and the Kinect slide by Nando de Freitas 34

  35. Random Forests and the Kinect adapted from Nando de Freitas depth image → body parts → 3D joint proposals [Jamie Shotton et al., 2011] 35

  36. Random Forests and the Kinect • Use computer graphics to generate plenty of data - synthetic (train & test) - real (test) adapted from Nando de Freitas [Jamie Shotton et al., 2011] 36

  37. Reduce Bias² and Decrease Variance? • Bagging reduces variance by averaging • Bagging has little effect on bias • Can we average and reduce bias? • Yes: Boosting slide by David Sontag 37

  38. Next Lecture: Boosting 38
