
Boosting: Machine Learning 10-601, Geoff Gordon and Miroslav Dudík

  1. Boosting. Machine Learning 10-601, Geoff Gordon and Miroslav Dudík (partly based on slides of Rob Schapire and Carlos Guestrin). http://www.cs.cmu.edu/~ggordon/10601/ November 9, 2009.
     Ensembles of trees: BAGGING and RANDOM FORESTS vs. BOOSTING
     Bagging and random forests:
     • learn many big trees
     • each tree aims to fit the same target concept
       – random training sets
       – randomized tree growth
     • voting ≈ averaging: DECREASE in VARIANCE
     Boosting:
     • learn many small trees (weak classifiers)
     • each tree ‘specializes’ to a different part of the target concept
       – reweight training examples
       – higher weights where still errors
     • voting increases expressivity: DECREASE in BIAS
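To make the bagging column concrete, here is a minimal sketch (my own code, not from the slides) of bagging with deep scikit-learn trees: each tree is fit to a bootstrap resample of the training set, and the ensemble predicts by majority vote. The function names are mine, and labels are assumed to be in {-1, +1}.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, n_trees=50, seed=0):
    """Fit n_trees fully grown trees, each on a bootstrap resample of (X, y)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, n, size=n)      # sample n indices with replacement
        tree = DecisionTreeClassifier()       # a "big" tree: grown to full depth
        trees.append(tree.fit(X[idx], y[idx]))
    return trees

def bagging_predict(trees, X):
    """Majority vote over the ensemble; assumes labels are -1 / +1."""
    votes = np.sum([t.predict(X) for t in trees], axis=0)
    return np.where(votes >= 0, 1, -1)
```

Because each tree sees a slightly different training set, averaging their votes mainly reduces variance; boosting instead keeps the whole training set and reweights it, which is what reduces bias (a sketch of that loop appears after slide 4 below).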

  2. Boosting
     • boosting = a general method for converting rough rules of thumb (e.g., decision stumps) into a highly accurate prediction rule
     • technically:
       – assume we are given a “weak” learning algorithm that can consistently find classifiers (“rules of thumb”) at least slightly better than random, say, accuracy ≥ 55% (in a two-class setting)
       – given sufficient data, a boosting algorithm can provably construct a single classifier with very high accuracy, say, 99%
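As a worked illustration of how a 55%-accurate weak learner can yield a highly accurate combined classifier (the numbers below are mine, not from the slides), recall the standard AdaBoost training-error bound in terms of the per-round edge γ_t over random guessing:

```latex
% Standard AdaBoost training-error bound, with eps_t the weighted error in round t
\mathrm{err}_{\mathrm{train}}(H_T)
  \;\le\; \prod_{t=1}^{T} 2\sqrt{\varepsilon_t (1-\varepsilon_t)}
  \;\le\; \exp\!\Bigl(-2 \sum_{t=1}^{T} \gamma_t^{2}\Bigr),
  \qquad \gamma_t = \tfrac{1}{2} - \varepsilon_t .
```

With accuracy ≥ 55% in every round, γ_t ≥ 0.05, so the bound is at most exp(−0.005·T), which falls below 1% once T ≥ ln(100)/0.005 ≈ 921 rounds; given sufficient data, generalization bounds (discussed on later slides) carry this over from training error to true error.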

  3. AdaBoost [Freund-Schapire 1995]
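For reference, this is the standard statement of the AdaBoost updates for binary labels y_i ∈ {−1, +1}, weak hypotheses h_t, and m training examples (the notation is mine, but the algorithm is the one cited on the slide):

```latex
% AdaBoost (Freund & Schapire, 1995)
D_1(i) = \tfrac{1}{m}, \qquad
\varepsilon_t = \Pr_{i \sim D_t}\!\bigl[h_t(x_i) \neq y_i\bigr], \qquad
\alpha_t = \tfrac{1}{2}\ln\frac{1-\varepsilon_t}{\varepsilon_t},

D_{t+1}(i) = \frac{D_t(i)\,\exp\!\bigl(-\alpha_t\, y_i\, h_t(x_i)\bigr)}{Z_t}, \qquad
H(x) = \operatorname{sign}\!\Bigl(\sum_{t=1}^{T} \alpha_t\, h_t(x)\Bigr),
```

where Z_t normalizes D_{t+1} to a probability distribution. Each round, misclassified examples have their weight multiplied by e^{α_t} > 1 and correctly classified ones by e^{−α_t} < 1, which implements the "higher weights where still errors" idea from slide 1.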

  4. Weak classifiers = decision stumps (vertical or horizontal half-planes)
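Below is a minimal sketch (my own code, assuming labels in {−1, +1} and a real-valued feature matrix) of a weighted decision stump, i.e., an axis-aligned threshold corresponding to a vertical or horizontal half-plane in 2-D, together with the AdaBoost loop from the previous slide that reweights examples around it.

```python
import numpy as np

def fit_stump(X, y, w):
    """Exhaustive search for the (feature, threshold, sign) stump with lowest weighted error."""
    best = (np.inf, 0, 0.0, 1)                    # (weighted error, feature, threshold, sign)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = np.where(X[:, j] <= thr, sign, -sign)
                err = w[pred != y].sum()
                if err < best[0]:
                    best = (err, j, thr, sign)
    return best

def stump_predict(X, j, thr, sign):
    """Predict +sign on one side of the threshold, -sign on the other."""
    return np.where(X[:, j] <= thr, sign, -sign)

def adaboost(X, y, T=50):
    """Run T rounds of AdaBoost with decision stumps; y must be -1/+1."""
    n = len(y)
    w = np.full(n, 1.0 / n)                       # D_1: uniform example weights
    ensemble = []
    for _ in range(T):
        err, j, thr, sign = fit_stump(X, y, w)
        err = np.clip(err, 1e-10, 1 - 1e-10)      # guard against log(0) / division by zero
        alpha = 0.5 * np.log((1 - err) / err)
        pred = stump_predict(X, j, thr, sign)
        w = w * np.exp(-alpha * y * pred)         # up-weight examples that are still wrong
        w = w / w.sum()                           # normalize (the Z_t step)
        ensemble.append((alpha, (j, thr, sign)))
    return ensemble

def predict(ensemble, X):
    """Weighted vote: sign of the sum of alpha_t * h_t(x)."""
    score = sum(a * stump_predict(X, *stump) for a, stump in ensemble)
    return np.where(score >= 0, 1, -1)
```

Typical usage on 2-D toy data: `ensemble = adaboost(X, y, T=50)` followed by `y_hat = predict(ensemble, X)`.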

  5. (figure-only slide)

  6. A typical run of AdaBoost
     • training error rapidly drops (combining weak learners increases expressivity)
     • test error does not increase with the number of trees T (robustness to overfitting)

  7. (figure-only slide)

  8. Bounding the true error [Freund-Schapire 1997]
     • T = number of rounds
     • d = VC dimension of the weak learner
     • m = number of training examples
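Up to constants and log factors, the bound referred to here has the form below; note that it grows with the number of rounds T:

```latex
% Naive bound on the true error of the combined classifier after T rounds
\Pr_{\mathcal{D}}\bigl[H(x) \neq y\bigr]
  \;\le\;
\widehat{\Pr}_{S}\bigl[H(x) \neq y\bigr]
  + \tilde{O}\!\left(\sqrt{\frac{T d}{m}}\right).
```

Read literally, this predicts that running more and more rounds should eventually hurt test error, which is exactly what the typical run on slide 6 fails to show.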

  9. Bounding the true error (a first guess). A typical run contradicts the naïve bound.

  10. Finer analysis: margins [Schapire et al. 1998]. Empirical evidence: the margin distribution.
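The margin in question is the normalized vote of the combined classifier f(x) = Σ_t α_t h_t(x) on a labeled example (x, y), as usually defined:

```latex
% Normalized margin of the voted classifier on a labeled example (x, y)
\operatorname{margin}_f(x, y)
  \;=\;
\frac{y \sum_{t} \alpha_t\, h_t(x)}{\sum_{t} |\alpha_t|}
  \;\in\; [-1, 1].
```

It is positive exactly when the example is classified correctly, and its magnitude measures how decisive the vote is; the empirical observation is that AdaBoost keeps pushing the distribution of training margins to the right even after the training error reaches zero.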

  11. Theoretical evidence: large margins ⇒ simple classifiers. Previously the bound depended on T, d, and m; more technically, the refined bound depends on:
      • d = VC dimension of the weak learner
      • m = number of training examples
      • the entire distribution of training margins
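Up to constants and log factors, the margin-based bound has the following form; it holds simultaneously for every threshold θ > 0 and, unlike the naive bound, does not involve T:

```latex
% Margin-based generalization bound (Schapire et al., 1998), for any theta > 0
\Pr_{\mathcal{D}}\bigl[y f(x) \le 0\bigr]
  \;\le\;
\widehat{\Pr}_{S}\bigl[\operatorname{margin}_f(x, y) \le \theta\bigr]
  + \tilde{O}\!\left(\sqrt{\frac{d}{m\,\theta^{2}}}\right).
```

So if boosting drives most training margins above some moderate θ, the first term is small and the second does not grow with further rounds, which is how the margin analysis explains the typical run on slide 6.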

  12. Practical advantages of AdaBoost. Application: detecting faces [Viola-Jones 2001].

  13. Caveats: “hard” predictions can slow down learning!

  14. Confidence-rated predictions [Schapire-Singer 1999]
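In the confidence-rated setting, each weak hypothesis outputs a real number whose sign is the prediction and whose magnitude is its confidence (with α_t absorbed into h_t), and each round chooses h_t to minimize that round's normalizer; as usually stated:

```latex
% Confidence-rated AdaBoost (Schapire & Singer, 1999): h_t maps into the reals
Z_t \;=\; \sum_{i=1}^{m} D_t(i)\, \exp\!\bigl(-y_i\, h_t(x_i)\bigr).
```

For a weak hypothesis that partitions the inputs into blocks (e.g., the two sides of a stump), the Z_t-minimizing confidence on block j is ½ ln(W₊ʲ / W₋ʲ), where W₊ʲ and W₋ʲ are the weights of positive and negative examples landing in that block; in practice these ratios are smoothed to avoid infinite confidences, which relates to the caveat on slide 13 that hard ±1 predictions can slow learning down.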

  15. Confidence-rated predictions help a lot! Loss in logistic regression.

  16. Loss in AdaBoost. Logistic regression vs. AdaBoost.
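Written as functions of the (unnormalized) margin z = y f(x), the two per-example losses being compared are:

```latex
% Per-example losses as a function of the margin z = y f(x)
\ell_{\mathrm{AdaBoost}}(z) = e^{-z},
\qquad
\ell_{\mathrm{logistic}}(z) = \ln\!\bigl(1 + e^{-z}\bigr).
```

Both decrease smoothly in z and dominate the 0/1 loss (the logistic loss when written with base-2 logarithms); the exponential loss punishes badly misclassified points far more heavily, and AdaBoost can be read as greedy single-coordinate minimization of it, versus the full optimization of the log loss in logistic regression, which is the comparison summarized on the final slide.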

  17. Benefits of the model-fitting view. What you should know about boosting:
      • weak classifiers → strong classifiers
        – weak: slightly better than random on the training data
        – strong: eventually zero error on the training data
      • AdaBoost prevents overfitting by increasing margins
      • regimes when AdaBoost overfits:
        – weak learner too strong: use small trees or stop early
        – data noisy: stop early
      • AdaBoost vs. logistic regression:
        – exponential loss vs. log loss
        – single-coordinate updates vs. full optimization
