
Ensemble Methods

Roman Kern, KDDM2
ISDS, TU Graz

> Motivation: Consider Kaggle, where the winners routinely employ ensembles to gain an advantage. > Goal: In this lecture, the main approaches for ensembles and their main assumptions will be presented.

Outline

  1. Introduction
  2. Classification
  3. Clustering

> Ensembles can be utilised in a supervised as well as an unsupervised setting. > Ensembles play an important part in data science.

Introduction

Motivation & Basics

Quick facts

  • Basic idea: Have multiple models and a method to combine them into a single one.
  • Predominantly used in classification and regression
  • Sometimes called: combined models, meta learning, committee machines, multiple classifier systems
  • Ensemble methods have a long history and have been used in statistics for more than 200 years


Types of ensembles

  • ... different hypotheses
  • ... different algorithms
  • ... different parts of the data set

> ... or integrate different sources of evidence. > One might not always be aware of working with an ensemble. > The page https://xgboost.readthedocs.io/en/latest/tutorials/model.html gives a nice example of an ensemble method. > Goal: Predict if someone likes computer games. > The first tree is built upon the age, and the second one on the daily commute behaviour. > The prediction is then based on their combination. > In some ensembles the hypothesis changes during learning (e.g., boosting, which learns to correct the errors of the other ensemble members)

Motivation

  • ... as every model has its limitations
  • Goal: combine the strengths of all models
  • e.g., improve the accuracy by using an ensemble
  • e.g., be more robust with regard to noise

> Do you need more data? No (but it certainly helps).

Basic Approaches

  • Averaging
  • Voting
  • Probabilistic methods

Combination of Models

  • Need a function to combine the results from the models
  • For real-valued output: Linear combination, Product rule
  • For categorical output, e.g. class labels: Majority vote

Linear combination

  • Simple form of combining the output of an ensemble
  • Given T models f_t(y|x):

$$g(y|x) = \sum_{t=1}^{T} w_t f_t(y|x)$$

  • Problem of estimating the optimal weights w_t
  • e.g., simple solution: use the uniform distribution: w_t = 1/T

> Assuming a dataset comprising independent variables x and dependent variables y, > ... with the goal to predict y given x (i.e., a discriminative classifier). > The simplest form of such a function is a linear combination of the models’ outputs f_t, i.e. a weighted average. > ... and its combination g.
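To make the linear combination concrete, here is a minimal Python/NumPy sketch; the function name and the shape convention for the stacked model outputs are illustrative assumptions, not from the slides.

```python
import numpy as np

def linear_combination(model_probs, weights=None):
    """g(y|x) = sum_t w_t * f_t(y|x).

    model_probs: array of shape (T, n_samples, n_classes),
    i.e. one predicted distribution per model and sample.
    """
    T = len(model_probs)
    if weights is None:
        weights = np.full(T, 1.0 / T)  # uniform weights w_t = 1/T
    # Contract the model axis against the weights.
    return np.tensordot(weights, model_probs, axes=1)
```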


Product rule

  • Alternative form of combining the output of an ensemble

$$g(y|x) = \frac{1}{Z} \prod_{t=1}^{T} f_t(y|x)^{w_t}$$

  • ... where Z is a normalisation factor
  • Again, estimating the weights is non-trivial
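A corresponding sketch for the product rule, under the same shape convention as above; renormalising per sample plays the role of 1/Z.

```python
import numpy as np

def product_rule(model_probs, weights=None):
    """g(y|x) proportional to prod_t f_t(y|x)^{w_t}, renormalised per sample."""
    T = len(model_probs)
    if weights is None:
        weights = np.full(T, 1.0 / T)
    g = np.prod(model_probs ** weights[:, None, None], axis=0)
    return g / g.sum(axis=-1, keepdims=True)  # the division implements 1/Z
```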


Majority Vote

  • Combining the output, if categorical
  • The models produce a label as output, e.g. h_t(x) ∈ {+1, −1}

$$H(x) = \operatorname{sign}\left(\sum_{t=1}^{T} w_t h_t(x)\right)$$

  • If the weights are non-uniform, it is a weighted vote

> Like the two previous cases, this is just one example. > The exact way the models are combined is an essential part of the ensemble.
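A weighted majority vote matching the formula above; it assumes per-model predictions h_t(x) ∈ {+1, −1} stacked into an array.

```python
import numpy as np

def weighted_majority_vote(model_labels, weights=None):
    """H(x) = sign(sum_t w_t * h_t(x)) for model_labels of shape (T, n)."""
    T = len(model_labels)
    if weights is None:
        weights = np.full(T, 1.0 / T)  # uniform weights -> plain majority vote
    return np.sign(np.tensordot(weights, model_labels, axes=1))
```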

Selection of models

  • The models should not be identical, i.e. produce identical results
  • ... therefore an ensemble should represent a degree of diversity
  • Two basic ways of achieving this diversity:
  • Implicitly, e.g. by integrating randomness (bagging)
  • Explicitly, e.g. by integrating variance into the process (boosting)

> Key insights, which will be later analysed more closely. > ... we need diversity. > Simple explanation: Just using the very same model multiple times will not improve our results. > Most of the methods implicitly integrate diversity.

Motivation for ensemble methods (1/2)

Statistical

  • Large number of hypotheses (in relation to the training data-set)
  • Not clear which hypothesis is the best
  • Using an ensemble reduces the risk of picking a bad model


Motivation for ensemble methods (2/2)

Computational

  • Avoid local minima
  • Partially addressed by heuristics

Representational

  • A single model/hypothesis might not be able to represent the data

Dietterich, T. G. (2000). Ensemble methods in machine learning. In Multiple classifier systems (pp. 1–15).

Classification

Ensemble Methods for Classification

Diversity

Underlying question: How much of the ensemble prediction is due to the accuracies of the individual models, and how much is due to their combination?

→ express the ensemble error as two terms:

  • Error of the individual models
  • Impact of the interactions, i.e. the diversity

> It depends on the combination whether one can separate the two terms.

Diversity

Regression error for the linear combination. Squared error of the ensemble regression:

$$(g(x) - d)^2 = \frac{1}{T}\sum_{t=1}^{T} (g_t(x) - d)^2 - \frac{1}{T}\sum_{t=1}^{T} (g_t(x) - g(x))^2$$

  • First term: error of the individual models
  • Second term: interactions between the predictions ... the ambiguity, ≥ 0

→ Therefore it is preferable to increase the ambiguity (diversity)

Krogh, A., & Vedelsby, J. (1995). Neural network ensembles, cross-validation and active learning. In Advances in neural information processing systems (pp. 231–238). Cambridge, MA: MIT Press.

> The lhs represents the difference b/w the prediction of the (ensemble) method g() and the ground truth d. > Actually there is a tradeoff of bias, variance and covariance, known as the accuracy-diversity dilemma.


Diversity

Classification error for the linear combination. For a simple averaging ensemble (and some assumptions):

$$e_{ave} = e_{add}\left(\frac{1 + \delta(T-1)}{T}\right)$$

  • ... where e_add is the error of the individual models
  • ... and δ is the correlation between the models

Tumer, K., & Ghosh, J. (1996). Error correlation and error reduction in ensemble classifiers. Connection Science, 8(3–4), 385–403.

> The bigger the correlation b/w the models (i.e., the more similar they are), the higher the error. > So, independent models should be preferred (as long as their individual, respective error is sufficiently small). > ... later we will see that sufficiently small just means better than random guessing.

Basic Approaches

  • Bagging: combines strong learners → reduce variance
  • Boosting: combines weak learners → reduce bias
  • Many more: mixture of experts, cascades, ...

> A weak learner might be just better than random guessing.

Bootstrap Sampling

  • Create a distribution of data-sets from a single dataset
  • If used within ensemble methods, it is typically called bagging
  • Simple approach, but has been shown to increase performance

Davison, A. C., & Hinkley, D. (2006). Bootstrap methods and their applications (8th ed.). Cambridge: Cambridge Series in Statistical and Probabilistic Mathematics.

> Sampling from the dataset will create subsets that should be independent. > Of course the dataset needs to be sufficiently large.

Bagging

  • Each member of the ensemble is generated from a different dataset
  • Good for unstable models
  • ... where small differences in the input dataset yield big differences in output
  • Also known as high-variance models

Note: Bagging is an abbreviation for bootstrap aggregating.

Breiman, L. (1998). Arcing classifiers. Annals of Statistics, 26(3), 801–845.

> → not so good for simple models.


Bagging Algorithm (train)

  1. Input: Ensemble size T, training set D = {(x1, y1), ..., (xn, yn)}
  2. For each model Mt:
     a. For n′ times, where n′ ≤ n:
        i. Sample (randomly) from D with replacement
     b. Train model Mt with the resulting subset

Bagging Algorithm (classify)

  • For classification: typically majority vote
  • For regression: typically linear combination

> The subset may contain duplicates, i.e. even if n′ = n
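A minimal bagging sketch following the algorithm above, assuming scikit-learn-style estimators with fit/predict and non-negative integer class labels; names like base_model and n_prime are illustrative.

```python
import numpy as np
from sklearn.base import clone

def bagging_train(base_model, X, y, T=10, n_prime=None, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    n_prime = n_prime or n  # n' <= n; with replacement, so duplicates occur
    models = []
    for _ in range(T):
        idx = rng.integers(0, n, size=n_prime)  # sample from D with replacement
        models.append(clone(base_model).fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    votes = np.stack([m.predict(X) for m in models])  # shape (T, n_samples)
    # Majority vote per sample; assumes non-negative integer class labels.
    return np.apply_along_axis(lambda c: np.bincount(c).argmax(), 0, votes)
```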

Boosting

  • Family of ensemble learners
  • Boost weak learners to a strong learner
  • AdaBoost is the most prominent one
  • Weak learners need to be better than random guessing

AdaBoost

  • Basic idea: Weight the individual instances of the data-set
  • Iteratively learn models and record their errors
  • Distribute the effort of the next round on the misclassified examples


AdaBoost (train)

  1. Input: Ensemble size T, training set D = {(x1, y1), ..., (xn, yn)}
  2. Define a uniform distribution W1 over the elements of D
  3. For each model Mt:
     a. Train model Mt using distribution Wt
     b. Calculate the error εt of the model and its weight αt = ½ ln((1 − εt)/εt)
     c. ... if εt > 0.5, break (and discard the model)
     d. ... else update the distribution Wt+1 according to εt

AdaBoost (classify)

Linear combination:

$$H(x) = \operatorname{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$$
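A compact AdaBoost sketch for labels in {+1, −1}, following the train and classify steps above; the decision-stump base learner and the numeric guard are illustrative choices.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_train(X, y, T=50):
    n = len(X)
    w = np.full(n, 1.0 / n)                  # uniform distribution W_1 over D
    models, alphas = [], []
    for _ in range(T):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        eps = w[pred != y].sum()             # weighted error epsilon_t
        if eps > 0.5:                        # worse than random: discard, stop
            break
        eps = max(eps, 1e-12)                # numeric guard for eps == 0
        alpha = 0.5 * np.log((1 - eps) / eps)
        w *= np.exp(-alpha * y * pred)       # up-weight misclassified examples
        w /= w.sum()                         # renormalise to a distribution
        models.append(stump)
        alphas.append(alpha)
    return models, alphas

def adaboost_predict(models, alphas, X):
    # H(x) = sign(sum_t alpha_t * h_t(x))
    scores = sum(a * m.predict(X) for m, a in zip(models, alphas))
    return np.sign(scores)
```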

Stacked generalisation

Idea: Have the output of a layer of classifiers as input to another layer. For 2 layers:

  1. Split the training data-set into two parts
  2. Learn the first layer using the first part
  3. Classify the second part and
  4. ... take the decisions as input for the second layer

Wolpert, D. H. (1992). Stacked generalization. Neural Networks, 5(2), 241–259.
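A two-layer stacking sketch following the four steps above; the logistic-regression meta-learner and the 50/50 split are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def stacking_train(base_models, X, y, seed=0):
    # 1. Split the training data-set into two parts
    X1, X2, y1, y2 = train_test_split(X, y, test_size=0.5, random_state=seed)
    # 2. Learn the first layer using the first part
    for m in base_models:
        m.fit(X1, y1)
    # 3./4. Classify the second part; the decisions feed the second layer
    Z = np.column_stack([m.predict(X2) for m in base_models])
    meta = LogisticRegression().fit(Z, y2)
    return base_models, meta

def stacking_predict(base_models, meta, X):
    Z = np.column_stack([m.predict(X) for m in base_models])
    return meta.predict(Z)
```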

Mixture of Experts

  • Idea: some models should specialise on parts of the input space
  • Ingredients:
  • Base models (e.g. specialised models, so-called experts)
  • Component to estimate probabilities, often called a gating network
  • The gating network learns to select the appropriate expert for parts of the input space



Mixture of Experts - Example #1

  • Ensemble of base learners combined using a weighted linear combination
  • The weights are found via a neural network
  • The neural network is learnt via the same input data-set
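A toy sketch of this gated combination for a regression-style output; the softmax gate standing in for the neural network and the function names are illustrative.

```python
import numpy as np

def moe_predict(experts, gate_scores_fn, X):
    """Combine expert predictions with input-dependent gating weights."""
    scores = gate_scores_fn(X)                    # (n, T) gating scores
    gates = np.exp(scores - scores.max(axis=1, keepdims=True))
    gates /= gates.sum(axis=1, keepdims=True)     # softmax over the experts
    preds = np.stack([e.predict(X) for e in experts], axis=1)  # (n, T)
    return (gates * preds).sum(axis=1)            # weighted linear combination
```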


Mixture of Experts - Example #2

  • Mixtures of expert models are called mixture models
  • e.g. the Expectation-Maximisation algorithm

Cascade of classifiers

Setting:

  • Have a sequence of models, each with a high hitrate (≥ h) and a low false alarm rate (< f)
  • ... with increasing complexity
  • In the data-set the negative examples are more common
  • The cascade is learnt via boosting

Viola, P., & Jones, M. (2001). Rapid object detection using a boosted cascade of simple features. In Computer Vision and Pattern Recognition, 2001.

Cascade of classifiers

  • For example: for h = 0.99 and f = 0.3 and a cascade of size 10,
  • ... one gets a hitrate of about 0.99¹⁰ ≈ 0.9 and a false alarm rate of about 0.3¹⁰ ≈ 0.000006, since the per-stage rates compound multiplicatively.

> One example for a cascade of classifiers is the face detection in cameras. > Here a series of identification algorithms work: > The first one with a high false positive rate, but very quick. > Successively, the candidates will be filtered out by increasingly lower false positive rates, at the expense of runtime. > i.e., the last one is the “slowest” but most precise.
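A quick sanity check of these numbers: with (approximately) independent stages, the per-stage rates multiply across the cascade.

```python
h, f, stages = 0.99, 0.3, 10
print(h ** stages)  # overall hitrate: ~0.904
print(f ** stages)  # overall false alarm rate: ~5.9e-06
```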


Decision Stump

  • Decision stumps are a popular choice for (some) ensemble learning
  • ... as they are fast
  • ... as they are less prone to overfitting
  • A decision stump is a decision tree that only uses a single feature (attribute)

Holte, R. C. (1993). Very simple classification rules perform well on most commonly used datasets. Machine Learning, 11, 63–91.

Random Subset Method

  • Basic idea: Instead of taking a subset of the data-set, use a subset of the feature set
  • ... will work best if there are many features
  • ... and will not work as well if most of the features are just noise

> Also interesting if many features are correlated with each other. > A phenomenon also known as multi-collinearity, which, e.g., simple linear regression struggles with. > Often a result of confounders, which lead to partial correlation between (otherwise independent) variables.
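A minimal random-subspace sketch: each ensemble member is trained on a random subset of the features rather than of the rows; the √m default and the names are illustrative assumptions.

```python
import numpy as np
from sklearn.base import clone

def random_subspace_train(base_model, X, y, T=10, k=None, seed=0):
    rng = np.random.default_rng(seed)
    m = X.shape[1]
    k = k or max(1, int(np.sqrt(m)))          # size of each feature subset
    members = []
    for _ in range(T):
        feats = rng.choice(m, size=k, replace=False)  # random feature subset
        members.append((feats, clone(base_model).fit(X[:, feats], y)))
    return members

def random_subspace_predict(members, X):
    votes = np.stack([model.predict(X[:, feats]) for feats, model in members])
    # Majority vote per sample; assumes non-negative integer class labels.
    return np.apply_along_axis(lambda c: np.bincount(c).argmax(), 0, votes)
```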

Random Forest

  • Combines two randomisation strategies:
  • Select a random subset of the dataset to learn each decision tree (bagging), e.g., learn n = 100 random trees
  • Select a random subset of the features, e.g., select √m features
  • Random forests can be used to estimate the importance of features (by comparing the error using a feature vs. not using it)

Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32.

> Typically good performance, therefore often the go-to method for data science. > Multi-collinearity is not a big problem, but the feature importance may suffer in such cases.
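Illustrative use of scikit-learn's random forest with the slide's parameter examples (100 trees, √m features per split); permutation importance matches the "error with vs. without a feature" idea more closely than the default impurity-based scores. The dataset is a placeholder.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
rf = RandomForestClassifier(n_estimators=100,      # n = 100 random trees
                            max_features="sqrt",   # sqrt(m) features per split
                            random_state=0).fit(X, y)
result = permutation_importance(rf, X, y, n_repeats=5, random_state=0)
print(result.importances_mean)  # estimated importance per feature
```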

Boosted Trees

  • Idea: Sequence of trees, whose results are added
  • Could be seen as incrementally correcting the errors of the predecessor trees
  • Gradient boosting: Take the gradient of a (differentiable) objective function into account while building the trees

> The objective function is typically a loss (e.g., RMSE), plus a regularisation term. > Each new tree learns on the residuals of the previous ensemble. > The residuals can be seen as (negative) gradients (delta b/w true and predicted). > Gradient boosting is flexible: change tree types, loss functions, even integrate bagging (Stochastic Gradient Boosting), ... > Popular implementations of the idea are: LightGBM, XGBoost.
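Illustrative usage of XGBoost, one of the implementations named above; the dataset and parameter values are placeholders, not from the slides.

```python
import xgboost as xgb
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=500, random_state=0)
model = xgb.XGBClassifier(
    n_estimators=100,    # number of trees in the additive sequence
    learning_rate=0.1,   # shrinkage of each tree's contribution
    max_depth=3,         # complexity of the individual trees
    subsample=0.8,       # row subsampling, i.e. stochastic gradient boosting
)
model.fit(X, y)          # each new tree fits the current residuals/gradients
```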


Multiclass Classification

  • Basic idea: split a multi-class problem into a set of binary classification problems
  • e.g., Error-correcting output codes

Kong, E. B., & Dietterich, T. G. (1995). Error-correcting output coding corrects bias and variance. In International conference on machine learning.

> Some classifiers can deal with multiple classes (e.g., k-NN), while others cannot (e.g., logistic regression). > There are multiple ways to achieve multi-class classification with just binary classifiers, > e.g., one-vs-one, one-vs-rest.

Vote / Veto Classification

Ensemble classification for multi-class problems:

  • Have different base classifiers for different parts of the feature set
  • Train all base classifiers using the training data-set
  • Record their performance with cross-validation for each class
  • ... have two thresholds, min_precision and min_recall
  • If the precision for a certain class and model is ≥ min_precision → allowed to vote
  • If the recall for a certain class and model is ≥ min_recall → allowed to vote against (veto)

Vote / Veto Classification (continued)

  • In the classification, use a weighted vote, where a veto is a negative vote
  • ... and the weight is set according to the respective measure (precision or recall)

Kern, R., Seifert, C., Zechner, M., & Granitzer, M. (2011). Vote/Veto Meta-Classifier for Authorship Identification. Notebook for PAN at CLEF 2011.

Active Learning

  • Active learning is a form of semi-supervised learning
  • The basic idea is to give the human instances to label
  • ... which carry the most information (to update the model)
  • Query by Committee: use an ensemble, i.e. the disagreement of multiple classifiers, to pick instances
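A query-by-committee sketch that picks the unlabelled instance with the highest committee disagreement, here measured by vote entropy; the measure and the names are illustrative choices.

```python
import numpy as np

def query_by_committee(models, X_unlabelled):
    votes = np.stack([m.predict(X_unlabelled) for m in models])  # (T, n)

    def vote_entropy(col):
        _, counts = np.unique(col, return_counts=True)
        p = counts / counts.sum()
        return -(p * np.log(p)).sum()

    entropies = np.apply_along_axis(vote_entropy, 0, votes)
    return int(entropies.argmax())  # index of the most informative instance
```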



Clustering

Other Tasks and Conclusions

Cluster Ensembles

  • Idea: Have multiple clustering algorithms group a data-set
  • ... combine all results into a single clustering result
  • Motivation: More reliable results than individual cluster solutions

Consensus Clustering

  • Have a set of clusterings: {C1, ..., Cm}
  • Find an overall clustering solution C
  • Minimise the disagreement using a metric:

$$D(C) = \sum_{i=1}^{m} d(C, C_i)$$

  • Also known as clustering aggregation

Mirkin Metric

  • The metric reflects the number of pairs of instances ...
  • ... being together in the overall clustering, but separate in Ci
  • ... and vice versa
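A small sketch of the Mirkin distance between two flat clusterings given as label vectors; it counts the pairs that are together in one solution but separate in the other.

```python
import numpy as np

def mirkin_distance(a, b):
    a, b = np.asarray(a), np.asarray(b)
    same_a = a[:, None] == a[None, :]    # pair together in clustering a?
    same_b = b[:, None] == b[None, :]    # pair together in clustering b?
    disagree = same_a != same_b          # together in one, separate in the other
    return int(np.triu(disagree, k=1).sum())  # count each pair once

print(mirkin_distance([0, 0, 1, 1], [0, 1, 1, 1]))  # 3 disagreeing pairs
```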



Other Ensembles

  • Ensemble methods are not limited to machine learning tasks alone
  • For example, in the field of recommender systems they are known as hybrid recommender systems
  • e.g. combine a content-based recommender with a collaborative filtering one

Ensembles in Data Science - Pros

  • Typically good results, especially if the dataset is not well understood
  • Cope well with noisy datasets
  • Give insights into ...
  • ... what features are important
  • ... what hypothesis might be the most suitable

Ensembles in Data Science - Cons

  • Computationally complex
  • Motivate a try-run-repeat approach

The End

Thank you for your attention!