

SLIDE 1

http://poloclub.gatech.edu/cse6242

CSE6242: Data & Visual Analytics

Ensemble Methods (Model Combination)

Duen Horng (Polo) Chau
Associate Professor, College of Computing
Associate Director, MS Analytics
Georgia Tech

Mahdi Roozbahani
Lecturer, Computational Science & Engineering, Georgia Tech
Founder of Filio, a visual asset management platform

Partly based on materials by Professors Guy Lebanon, Jeffrey Heer, John Stasko, Christos Faloutsos

SLIDE 2

Numerous Possible Classifiers!

Classifier               Training time   Cross-validation   Testing time   Accuracy
kNN classifier           None            Can be slow        Slow           ??
Decision trees           Slow            Very slow          Very fast      ??
Naïve Bayes classifier   Fast            None               Fast           ??
…                        …               …                  …              …


SLIDE 3

Which Classifier/Model to Choose?

Possible strategies:

  • Go from the simplest model to more complex models, until you obtain the desired accuracy
  • Discover a new model if the existing ones do not work for you
  • Combine all (simple) models


SLIDE 4

Common Strategy: Bagging
(Bootstrap Aggregating)

Originally designed for combining multiple models to improve classification “stability” [Leo Breiman, '94]. Uses random training datasets (sampled from one dataset).

http://statistics.about.com/od/Applications/a/What-Is-Bootstrapping.htm

SLIDE 5

Common Strategy: Bagging
(Bootstrap Aggregating)

Consider the data set S = {(x_i, y_i)}, i = 1, …, n

  • Pick a sample S* of size n, with replacement (S* is called a “bootstrap sample”)
  • Train on S* to get a classifier f*
  • Repeat the above steps B times to get classifiers f_1, f_2, …, f_B
  • Final classifier: f(x) = majority{ f_b(x) }, b = 1, …, B

http://statistics.about.com/od/Applications/a/What-Is-Bootstrapping.htm
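To make the recipe concrete, here is a minimal from-scratch sketch of bagging in Python. The helper names bagging_fit/bagging_predict are made up for illustration; it assumes NumPy arrays, integer class labels, and a scikit-learn-style base estimator:

```python
import numpy as np
from sklearn.base import clone
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, base_estimator, B=100, seed=None):
    """Train B classifiers f_1, ..., f_B, each on a bootstrap sample of size n."""
    rng = np.random.default_rng(seed)
    n = len(X)
    classifiers = []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)          # sample n points with replacement
        f_star = clone(base_estimator).fit(X[idx], y[idx])
        classifiers.append(f_star)
    return classifiers

def bagging_predict(classifiers, X):
    """Final classifier: majority vote over f_1(x), ..., f_B(x)."""
    votes = np.stack([f.predict(X) for f in classifiers])   # shape (B, n_samples)
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)

# Tiny demo on synthetic data
X = np.random.default_rng(0).normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
fs = bagging_fit(X, y, DecisionTreeClassifier(max_depth=3), B=50, seed=0)
print("training accuracy:", (bagging_predict(fs, X) == y).mean())
```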

SLIDE 6

Bagging decision trees

Consider the data set S

  • Pick a sample S* of size n, with replacement
  • Grow a decision tree T_b on S*
  • Repeat B times to get T_1, …, T_B
  • The final classifier is the majority vote over the trees: f(x) = majority{ T_b(x) }, b = 1, …, B
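The same procedure is available off the shelf as scikit-learn's BaggingClassifier; a minimal sketch (the iris data and parameter values are illustrative, and the estimator argument was named base_estimator before scikit-learn 1.2):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# B = 100 trees, each grown on a bootstrap sample of size n
bag = BaggingClassifier(estimator=DecisionTreeClassifier(),
                        n_estimators=100, max_samples=1.0,
                        bootstrap=True, random_state=0)
bag.fit(X_tr, y_tr)
print("test accuracy:", bag.score(X_te, y_te))
```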

SLIDE 7

Random Forests

Almost identical to bagging decision trees, except we introduce some randomness:

  • Randomly pick m of the d available attributes at every split when growing the tree (i.e., the other d - m attributes are ignored at that split)

Bagged random decision trees = Random forests
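In scikit-learn, this per-split attribute subsampling is controlled by max_features; a minimal sketch using the common m = √d choice (dataset and other values illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# max_features="sqrt" picks m = sqrt(d) of the d attributes at every split
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
rf.fit(X_tr, y_tr)
print("test accuracy:", rf.score(X_te, y_te))
```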

SLIDE 8

Explicit CV not necessary

  • An unbiased test error can be estimated using the out-of-bag data points (the OOB error estimate)
  • You can still do CV explicitly, but that's not necessary, since research shows the OOB estimate is just as accurate

Section 15.3.1 of http://statweb.stanford.edu/~tibs/ElemStatLearn/printings/ESLII_print10.pdf
https://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm#ooberr
http://stackoverflow.com/questions/18541923/what-is-out-of-bag-error-in-random-forests

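In scikit-learn the OOB estimate comes essentially for free when fitting the forest; a minimal sketch (dataset illustrative):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)

# Each tree is scored on the points left out of its bootstrap sample
rf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
rf.fit(X, y)
print("OOB accuracy estimate:", rf.oob_score_)   # no explicit CV needed
```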

SLIDE 9

Important points about random forests

Algorithm (hyper) parameters

  • Usual values for m: m ≈ √d for classification, m ≈ d/3 for regression (the defaults recommended in ESL, Section 15.3)
  • Usual value for B: keep adding trees until the training error stabilizes (see the sketch below)
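One concrete way to choose B is to grow the forest incrementally and stop once the error curve flattens; a sketch using scikit-learn's warm_start, monitoring the OOB error as the proxy (dataset and step size illustrative):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)

# warm_start=True keeps the trees already grown when n_estimators increases
rf = RandomForestClassifier(warm_start=True, oob_score=True, random_state=0)
for b in range(25, 301, 25):
    rf.n_estimators = b
    rf.fit(X, y)                                  # only grows the new trees
    print(f"B={b:3d}  OOB error={1 - rf.oob_score_:.4f}")
# Stop adding trees once the printed error stabilizes
```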

SLIDE 10

Important points about random forests

Algorithm (hyper) parameters

  • Size/#nodes of each tree, as when building a single decision tree
  • May randomly pick an attribute, and may even randomly pick the split point!
    • Significantly simplifies implementation and increases training speed
    • PERT (Perfect Random Tree Ensembles): http://www.interfacesymposia.org/I01/I2001Proceedings/ACutler/ACutler.pdf
    • Extremely randomized trees (see the sketch below): http://orbi.ulg.be/bitstream/2268/9357/1/geurts-mlj-advance.pdf
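Scikit-learn ships an implementation of extremely randomized trees as ExtraTreesClassifier; a minimal sketch (dataset illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Split thresholds are drawn at random for each candidate attribute,
# rather than searched for exhaustively; this speeds up training
et = ExtraTreesClassifier(n_estimators=100, random_state=0)
print("5-fold CV accuracy:", cross_val_score(et, X, y, cv=5).mean())
```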

SLIDE 11

Advantages

  • Efficient and simple training
  • Allows you to work with simple classifiers
  • Random forests are generally useful and accurate in practice (one of the best classifiers)
    • The other is gradient-boosted trees: http://fastml.com/what-is-better-gradient-boosted-trees-or-random-forest/
  • Embarrassingly parallelizable
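Since the B trees are grown independently of one another, parallelizing training is a one-parameter change in scikit-learn; a sketch (n_jobs=-1 uses all available cores):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)

# Each of the 500 trees can be grown on a separate CPU core
rf = RandomForestClassifier(n_estimators=500, n_jobs=-1, random_state=0)
rf.fit(X, y)
```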

SLIDE 12

Final words

Reading material

  • Bagging: ESL Chapter 8.7
  • Random forests: ESL Chapter 15

http://www-stat.stanford.edu/~tibs/ElemStatLearn/printings/ESLII_print10.pdf