SLIDE 1

Ensemble methods

CS 446

slide-2
SLIDE 2

Why ensembles?

Standard machine learning setup:
◮ We have some data.
◮ We train 10 predictors (3-nn, least squares, SVM, ResNet, . . . ).
◮ We output the best on a validation set.

1 / 27

slide-3
SLIDE 3

Why ensembles?

Standard machine learning setup:
◮ We have some data.
◮ We train 10 predictors (3-nn, least squares, SVM, ResNet, . . . ).
◮ We output the best on a validation set.

Question: can we do better than the best?

1 / 27

slide-4
SLIDE 4

Why ensembles?

Standard machine learning setup:
◮ We have some data.
◮ We train 10 predictors (3-nn, least squares, SVM, ResNet, . . . ).
◮ We output the best on a validation set.

Question: can we do better than the best? What if we use an ensemble/aggregate/combination?

1 / 27

slide-5
SLIDE 5

Why ensembles?

Standard machine learning setup:
◮ We have some data.
◮ We train 10 predictors (3-nn, least squares, SVM, ResNet, . . . ).
◮ We output the best on a validation set.

Question: can we do better than the best? What if we use an ensemble/aggregate/combination? We’ll consider two approaches: boosting and bagging.

1 / 27

slide-6
SLIDE 6

Bagging

2 / 27

slide-7
SLIDE 7

Bagging?

This first approach is based upon a simple idea:
◮ If the predictors have independent errors, a majority vote of their outputs should be good.
Let’s first check this.

3 / 27

slide-8
SLIDE 8

Combining classifiers

Suppose we have n classifiers. Suppose each is wrong independently with probability 0.4. Model classifier errors as random variables (Z_i)_{i=1}^n (thus E(Z_i) = 0.4).

4 / 27

slide-9
SLIDE 9

Combining classifiers

Suppose we have n classifiers. Suppose each is wrong independently with probability 0.4. Model classifier errors as random variables (Z_i)_{i=1}^n (thus E(Z_i) = 0.4).

We can model the distribution of errors with Binom(n, 0.4).

4 / 27

slide-10
SLIDE 10

Combining classifiers

Suppose we have n classifiers. Suppose each is wrong independently with probability 0.4. Model classifier errors as random variables (Z_i)_{i=1}^n (thus E(Z_i) = 0.4).

We can model the distribution of errors with Binom(n, 0.4).

[Figure: PMF of Binom(n, 0.4); #classifiers = n = 10.]

4 / 27

slide-11
SLIDE 11

Combining classifiers

Suppose we have n classifiers. Suppose each is wrong independently with probability 0.4. Model classifier errors as random variables (Z_i)_{i=1}^n (thus E(Z_i) = 0.4).

We can model the distribution of errors with Binom(n, 0.4).

[Figure: PMF of Binom(n, 0.4); #classifiers = n = 20.]

4 / 27

slide-12
SLIDE 12

Combining classifiers

Suppose we have n classifiers. Suppose each is wrong independently with probability 0.4. Model classifier errors as random variables (Z_i)_{i=1}^n (thus E(Z_i) = 0.4).

We can model the distribution of errors with Binom(n, 0.4).

[Figure: PMF of Binom(n, 0.4); #classifiers = n = 30.]

4 / 27

slide-13
SLIDE 13

Combining classifiers

Suppose we have n classifiers. Suppose each is wrong independently with probability 0.4. Model classifier errors as random variables (Z_i)_{i=1}^n (thus E(Z_i) = 0.4).

We can model the distribution of errors with Binom(n, 0.4).

[Figure: PMF of Binom(n, 0.4); #classifiers = n = 40.]

4 / 27

slide-14
SLIDE 14

Combining classifiers

Suppose we have n classifiers. Suppose each is wrong independently with probability 0.4. Model classifier errors as random variables (Z_i)_{i=1}^n (thus E(Z_i) = 0.4).

We can model the distribution of errors with Binom(n, 0.4).

[Figure: PMF of Binom(n, 0.4); #classifiers = n = 50.]

4 / 27

slide-15
SLIDE 15

Combining classifiers

Suppose we have n classifiers. Suppose each is wrong independently with probability 0.4. Model classifier errors as random variables (Z_i)_{i=1}^n (thus E(Z_i) = 0.4).

We can model the distribution of errors with Binom(n, 0.4).

[Figure: PMF of Binom(n, 0.4); #classifiers = n = 60.]

4 / 27

slide-16
SLIDE 16

Combining classifiers

Suppose we have n classifiers. Suppose each is wrong independently with probability 0.4. Model classifier errors as random variables (Z_i)_{i=1}^n (thus E(Z_i) = 0.4).

We can model the distribution of errors with Binom(n, 0.4). Red: all classifiers wrong.

[Figure: PMF of Binom(n, 0.4); #classifiers = n = 2, fraction red = 0.16.]

4 / 27

slide-17
SLIDE 17

Combining classifiers

Suppose we have n classifiers. Suppose each is wrong independently with probability 0.4. Model classifier errors as random variables (Z_i)_{i=1}^n (thus E(Z_i) = 0.4).

We can model the distribution of errors with Binom(n, 0.4). Red: all classifiers wrong.

[Figure: PMF of Binom(n, 0.4); #classifiers = n = 3, fraction red = 0.064.]

4 / 27

slide-18
SLIDE 18

Combining classifiers

Suppose we have n classifiers. Suppose each is wrong independently with probability 0.4. Model classifier errors as random variables (Z_i)_{i=1}^n (thus E(Z_i) = 0.4).

We can model the distribution of errors with Binom(n, 0.4). Red: all classifiers wrong.

[Figure: PMF of Binom(n, 0.4); #classifiers = n = 4, fraction red = 0.0256.]

4 / 27

slide-19
SLIDE 19

Combining classifiers

Suppose we have n classifiers. Suppose each is wrong independently with probability 0.4. Model classifier errors as random variables (Z_i)_{i=1}^n (thus E(Z_i) = 0.4).

We can model the distribution of errors with Binom(n, 0.4). Red: all classifiers wrong.

[Figure: PMF of Binom(n, 0.4); #classifiers = n = 5, fraction red = 0.01024.]

4 / 27

slide-20
SLIDE 20

Combining classifiers

Suppose we have n classifiers. Suppose each is wrong independently with probability 0.4. Model classifier errors as random variables (Z_i)_{i=1}^n (thus E(Z_i) = 0.4).

We can model the distribution of errors with Binom(n, 0.4). Red: all classifiers wrong.

[Figure: PMF of Binom(n, 0.4); #classifiers = n = 6, fraction red = 0.004096.]

4 / 27

slide-21
SLIDE 21

Combining classifiers

Suppose we have n classifiers. Suppose each is wrong independently with probability 0.4. Model classifier errors as random variables (Z_i)_{i=1}^n (thus E(Z_i) = 0.4).

We can model the distribution of errors with Binom(n, 0.4). Red: all classifiers wrong.

[Figure: PMF of Binom(n, 0.4); #classifiers = n = 7, fraction red = 0.0016384.]

4 / 27

slide-22
SLIDE 22

Combining classifiers

Suppose we have n classifiers. Suppose each is wrong independently with probability 0.4. Model classifier errors as random variables (Z_i)_{i=1}^n (thus E(Z_i) = 0.4).

We can model the distribution of errors with Binom(n, 0.4). Green: at least half classifiers wrong.

[Figure: PMF of Binom(n, 0.4); #classifiers = n = 10, fraction green = 0.366897.]

4 / 27

slide-23
SLIDE 23

Combining classifiers

Suppose we have n classifiers. Suppose each is wrong independently with probability 0.4. Model classifier errors as random variables (Z_i)_{i=1}^n (thus E(Z_i) = 0.4).

We can model the distribution of errors with Binom(n, 0.4). Green: at least half classifiers wrong.

[Figure: PMF of Binom(n, 0.4); #classifiers = n = 20, fraction green = 0.244663.]

4 / 27

slide-24
SLIDE 24

Combining classifiers

Suppose we have n classifiers. Suppose each is wrong independently with probability 0.4. Model classifier errors as random variables (Z_i)_{i=1}^n (thus E(Z_i) = 0.4).

We can model the distribution of errors with Binom(n, 0.4). Green: at least half classifiers wrong.

[Figure: PMF of Binom(n, 0.4); #classifiers = n = 30, fraction green = 0.175369.]

4 / 27

slide-25
SLIDE 25

Combining classifiers

Suppose we have n classifiers. Suppose each is wrong independently with probability 0.4. Model classifier errors as random variables (Z_i)_{i=1}^n (thus E(Z_i) = 0.4).

We can model the distribution of errors with Binom(n, 0.4). Green: at least half classifiers wrong.

[Figure: PMF of Binom(n, 0.4); #classifiers = n = 40, fraction green = 0.129766.]

4 / 27

slide-26
SLIDE 26

Combining classifiers

Suppose we have n classifiers. Suppose each is wrong independently with probability 0.4. Model classifier errors as random variables (Z_i)_{i=1}^n (thus E(Z_i) = 0.4).

We can model the distribution of errors with Binom(n, 0.4). Green: at least half classifiers wrong.

[Figure: PMF of Binom(n, 0.4); #classifiers = n = 50, fraction green = 0.0978074.]

4 / 27

slide-27
SLIDE 27

Combining classifiers

Suppose we have n classifiers. Suppose each is wrong independently with probability 0.4. Model classifier errors as random variables (Z_i)_{i=1}^n (thus E(Z_i) = 0.4).

We can model the distribution of errors with Binom(n, 0.4). Green: at least half classifiers wrong.

[Figure: PMF of Binom(n, 0.4); #classifiers = n = 60, fraction green = 0.0746237.]

4 / 27

slide-28
SLIDE 28

Combining classifiers

Suppose we have n classifiers. Suppose each is wrong independently with probability 0.4. Model classifier errors as random variables (Z_i)_{i=1}^n (thus E(Z_i) = 0.4).

We can model the distribution of errors with Binom(n, 0.4). Green: at least half classifiers wrong.

[Figure: PMF of Binom(n, 0.4); #classifiers = n = 60, fraction green = 0.0746237.]

Green region is error of majority vote! 0.075 ≪ 0.4 !!!
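A quick numerical sketch (not from the slides; it assumes scipy is available) that reproduces these green fractions, i.e. Pr[Binom(n, 0.4) ≥ n/2], and compares them with the exponential bound exp(−n(1/2 − p)^2) used below:

```python
# Reproduce the "fraction green" numbers: Pr[Binom(n, p) >= n/2] for p = 0.4.
import math
from scipy.stats import binom

p = 0.4
for n in [10, 20, 30, 40, 50, 60]:
    k = math.ceil(n / 2)                   # "at least half of the classifiers wrong"
    maj_err = binom.sf(k - 1, n, p)        # Pr[X >= k] for X ~ Binom(n, p)
    bound = math.exp(-n * (0.5 - p) ** 2)  # exponential bound from the slides
    print(f"n={n:2d}  Pr[majority wrong]={maj_err:.6f}  bound={bound:.3f}")
# n=10 gives 0.366897 and n=60 gives 0.074624, matching the figures above.
```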

4 / 27

slide-29
SLIDE 29

Majority vote

[Figure: PMF of Binom(n, 0.4); #classifiers = n = 10, fraction green = 0.366897.]

Green region is error of majority vote! Suppose y_i ∈ {−1, +1}. MAJ(y_1, . . . , y_n) := +1 when Σ_i y_i ≥ 0, and −1 when Σ_i y_i < 0.

Error rate of majority classifier (with individual error probability p):

Pr[Binom(n, p) ≥ n/2] = Σ_{i ≥ n/2} (n choose i) p^i (1−p)^{n−i} ≤ exp(−n(1/2 − p)^2).

5 / 27

slide-30
SLIDE 30

Majority vote

[Figure: PMF of Binom(n, 0.4); #classifiers = n = 20, fraction green = 0.244663.]

Green region is error of majority vote! Suppose y_i ∈ {−1, +1}. MAJ(y_1, . . . , y_n) := +1 when Σ_i y_i ≥ 0, and −1 when Σ_i y_i < 0.

Error rate of majority classifier (with individual error probability p):

Pr[Binom(n, p) ≥ n/2] = Σ_{i ≥ n/2} (n choose i) p^i (1−p)^{n−i} ≤ exp(−n(1/2 − p)^2).

5 / 27

slide-31
SLIDE 31

Majority vote

[Figure: PMF of Binom(n, 0.4); #classifiers = n = 30, fraction green = 0.175369.]

Green region is error of majority vote! Suppose y_i ∈ {−1, +1}. MAJ(y_1, . . . , y_n) := +1 when Σ_i y_i ≥ 0, and −1 when Σ_i y_i < 0.

Error rate of majority classifier (with individual error probability p):

Pr[Binom(n, p) ≥ n/2] = Σ_{i ≥ n/2} (n choose i) p^i (1−p)^{n−i} ≤ exp(−n(1/2 − p)^2).

5 / 27

slide-32
SLIDE 32

Majority vote

[Figure: PMF of Binom(n, 0.4); #classifiers = n = 40, fraction green = 0.129766.]

Green region is error of majority vote! Suppose y_i ∈ {−1, +1}. MAJ(y_1, . . . , y_n) := +1 when Σ_i y_i ≥ 0, and −1 when Σ_i y_i < 0.

Error rate of majority classifier (with individual error probability p):

Pr[Binom(n, p) ≥ n/2] = Σ_{i ≥ n/2} (n choose i) p^i (1−p)^{n−i} ≤ exp(−n(1/2 − p)^2).

5 / 27

slide-33
SLIDE 33

Majority vote

[Figure: PMF of Binom(n, 0.4); #classifiers = n = 50, fraction green = 0.0978074.]

Green region is error of majority vote! Suppose y_i ∈ {−1, +1}. MAJ(y_1, . . . , y_n) := +1 when Σ_i y_i ≥ 0, and −1 when Σ_i y_i < 0.

Error rate of majority classifier (with individual error probability p):

Pr[Binom(n, p) ≥ n/2] = Σ_{i ≥ n/2} (n choose i) p^i (1−p)^{n−i} ≤ exp(−n(1/2 − p)^2).

5 / 27

slide-34
SLIDE 34

Majority vote

[Figure: PMF of Binom(n, 0.4); #classifiers = n = 60, fraction green = 0.0746237.]

Green region is error of majority vote! Suppose y_i ∈ {−1, +1}. MAJ(y_1, . . . , y_n) := +1 when Σ_i y_i ≥ 0, and −1 when Σ_i y_i < 0.

Error rate of majority classifier (with individual error probability p):

Pr[Binom(n, p) ≥ n/2] = Σ_{i ≥ n/2} (n choose i) p^i (1−p)^{n−i} ≤ exp(−n(1/2 − p)^2).

5 / 27

slide-35
SLIDE 35

Bottom line

[Figure: PMF of Binom(n, 0.4); #classifiers = n = 60, fraction green = 0.0746237.]

Green region is error of majority vote!

6 / 27

slide-36
SLIDE 36

Bottom line

[Figure: PMF of Binom(n, 0.4); #classifiers = n = 60, fraction green = 0.0746237.]

Green region is error of majority vote! Error of majority vote classifier goes down exponentially in n:

Pr[Binom(n, p) ≥ n/2] = Σ_{i ≥ n/2} (n choose i) p^i (1−p)^{n−i} ≤ exp(−n(1/2 − p)^2).

6 / 27

slide-37
SLIDE 37

From independent errors to an algorithm

How to use independent errors in an algorithm?

1. For t = 1, 2, . . . , T:
   1.1 Obtain IID data S_t := ((x_i^{(t)}, y_i^{(t)}))_{i=1}^n,
   1.2 Train classifier f_t on S_t.
2. Output x → MAJ( f_1(x), . . . , f_T(x) ).

7 / 27

slide-38
SLIDE 38

From independent errors to an algorithm

How to use independent errors in an algorithm?

1. For t = 1, 2, . . . , T:
   1.1 Obtain IID data S_t := ((x_i^{(t)}, y_i^{(t)}))_{i=1}^n,
   1.2 Train classifier f_t on S_t.
2. Output x → MAJ( f_1(x), . . . , f_T(x) ).

◮ Good news: errors are independent! (Our exponential error estimate from before is valid.)
◮ Bad news: each classifier is trained on only a 1/T fraction of the data (why not just train ResNet on all of it. . . ).
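One way to realize step 1.1 with a fixed dataset is to split it into T disjoint chunks; a minimal sketch (my own illustration, the decision-tree base learner is an arbitrary choice, labels assumed in {−1, +1}):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def split_and_vote_ensemble(X, y, T=10):
    # Partition the data into T disjoint chunks and train one classifier per chunk.
    chunks = np.array_split(np.random.permutation(len(X)), T)
    models = [DecisionTreeClassifier().fit(X[idx], y[idx]) for idx in chunks]

    def predict(X_new):
        votes = sum(clf.predict(X_new) for clf in models)
        return np.where(votes >= 0, 1, -1)   # MAJ: ties go to +1, as on the slide
    return predict
```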

7 / 27

slide-39
SLIDE 39

Bagging

Bagging = Bootstrap aggregating (Leo Breiman, 1994).

1. Obtain IID data S := ((x_i, y_i))_{i=1}^n.
2. For t = 1, 2, . . . , T:
   2.1 Resample n points uniformly at random with replacement from S, obtaining “bootstrap sample” S_t.
   2.2 Train classifier f_t on S_t.
3. Output x → MAJ( f_1(x), . . . , f_T(x) ).

8 / 27

slide-40
SLIDE 40

Bagging

Bagging = Bootstrap aggregating (Leo Breiman, 1994).

1. Obtain IID data S := ((x_i, y_i))_{i=1}^n.
2. For t = 1, 2, . . . , T:
   2.1 Resample n points uniformly at random with replacement from S, obtaining “bootstrap sample” S_t.
   2.2 Train classifier f_t on S_t.
3. Output x → MAJ( f_1(x), . . . , f_T(x) ).

◮ Good news: using most of the data for each f_t!
◮ Bad news: errors no longer independent. . . ?
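A minimal bagging sketch (my own illustration, assuming NumPy/scikit-learn, a decision-tree base learner, and labels in {−1, +1}):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, T=50, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    models = []
    for _ in range(T):
        idx = rng.integers(0, n, size=n)   # bootstrap: n points with replacement
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X_new):
    votes = sum(clf.predict(X_new) for clf in models)   # labels in {-1, +1}
    return np.where(votes >= 0, 1, -1)
```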

8 / 27

slide-41
SLIDE 41

Sampling with replacement?

Question: Take n samples uniformly at random with replacement from a population of size n. What is the probability that a given individual is not picked?

9 / 27

slide-42
SLIDE 42

Sampling with replacement?

Question: Take n samples uniformly at random with replacement from a population of size n. What is the probability that a given individual is not picked? Answer:

(1 − 1/n)^n ; for large n: lim_{n→∞} (1 − 1/n)^n = 1/e ≈ 0.3679.

9 / 27

slide-43
SLIDE 43

Sampling with replacement?

Question: Take n samples uniformly at random with replacement from a population of size n. What is the probability that a given individual is not picked? Answer:

(1 − 1/n)^n ; for large n: lim_{n→∞} (1 − 1/n)^n = 1/e ≈ 0.3679.

Implications for bagging:
◮ Each bootstrap sample contains about 63% of the data set.
◮ The remaining 37% can be used to estimate the error rate of the classifier trained on that bootstrap sample.
◮ If we have three classifiers, some of their error estimates must share examples! Independence is violated!
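A quick numerical check of the limit (plain arithmetic, nothing assumed beyond the formula above):

```python
import math

for n in [10, 100, 1000, 10000]:
    print(n, (1 - 1 / n) ** n)   # approaches 1/e ~= 0.3679 from below
print("1/e =", 1 / math.e)
# So a bootstrap sample misses ~37% of the points and contains ~63% of them.
```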

9 / 27

slide-44
SLIDE 44

Random Forests

Random Forests (Leo Breiman, 2001).

1. Obtain IID data S := ((x_i, y_i))_{i=1}^n.
2. For t = 1, 2, . . . , T:
   2.1 Resample n points uniformly at random with replacement from S, obtaining “bootstrap sample” S_t.
   2.2 Train a decision tree f_t on S_t as follows: when greedily splitting tree nodes, consider only √d (not all d) possible features.
3. Output x → MAJ( f_1(x), . . . , f_T(x) ).

10 / 27

slide-45
SLIDE 45

Random Forests

Random Forests (Leo Breiman, 2001).

1. Obtain IID data S := ((x_i, y_i))_{i=1}^n.
2. For t = 1, 2, . . . , T:
   2.1 Resample n points uniformly at random with replacement from S, obtaining “bootstrap sample” S_t.
   2.2 Train a decision tree f_t on S_t as follows: when greedily splitting tree nodes, consider only √d (not all d) possible features.
3. Output x → MAJ( f_1(x), . . . , f_T(x) ).

◮ Heuristic news: maybe errors are more independent now?
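In practice this procedure is available off the shelf; a sketch using scikit-learn (the hyperparameter values here are illustrative, not from the slides):

```python
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(
    n_estimators=100,      # T bootstrap-trained trees
    max_features="sqrt",   # consider only sqrt(d) features at each split
    bootstrap=True,        # resample n points with replacement for each tree
)
# rf.fit(X_train, y_train); rf.predict(X_test)  -- predictions are a majority vote.
```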

10 / 27

slide-46
SLIDE 46

Boosting — a quick look

11 / 27

slide-47
SLIDE 47

Boosting overview

◮ We no longer assume classifiers have independent errors.
◮ We no longer output a simple majority: we reweight the classifiers via optimization.
◮ There is a rich theory with many interpretations.

12 / 27

slide-48
SLIDE 48

Simplified boosting scheme

1. Start with data ((x_i, y_i))_{i=1}^n and classifiers (h_1, . . . , h_T).
2. Find weights w ∈ R^T which approximately minimize

   (1/n) Σ_{i=1}^n ℓ( y_i Σ_{j=1}^T w_j h_j(x_i) ) = (1/n) Σ_{i=1}^n ℓ( y_i w^T z_i ),

   where z_i = ( h_1(x_i), . . . , h_T(x_i) ) ∈ R^T. (We use classifiers to give us features.)
3. Predict with x → Σ_{j=1}^T w_j h_j(x).

13 / 27

slide-49
SLIDE 49

Simplified boosting scheme

1. Start with data ((x_i, y_i))_{i=1}^n and classifiers (h_1, . . . , h_T).
2. Find weights w ∈ R^T which approximately minimize

   (1/n) Σ_{i=1}^n ℓ( y_i Σ_{j=1}^T w_j h_j(x_i) ) = (1/n) Σ_{i=1}^n ℓ( y_i w^T z_i ),

   where z_i = ( h_1(x_i), . . . , h_T(x_i) ) ∈ R^T. (We use classifiers to give us features.)
3. Predict with x → Σ_{j=1}^T w_j h_j(x).

Remarks.
◮ If ℓ is convex, this is standard linear prediction: convex in w.
◮ In the classical setting: ℓ(r) = exp(−r), optimizer = coordinate descent, T = ∞.
◮ Most commonly, (h_1, . . . , h_T) are decision stumps.
◮ Popular software implementation: xgboost.
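A sketch of step 2 for the concrete convex choice ℓ(r) = exp(−r), using plain gradient descent over w (the classifier list, step size, and iteration count are placeholders, not the course's choices):

```python
import numpy as np

def fit_ensemble_weights(hs, X, y, steps=500, eta=0.1):
    # Z[i, j] = h_j(x_i); y has entries in {-1, +1}; hs is a list of classifiers.
    Z = np.column_stack([h(X) for h in hs])
    n, T = Z.shape
    w = np.zeros(T)
    for _ in range(steps):
        margins = y * (Z @ w)                                  # y_i * w^T z_i
        grad = -(Z * (y * np.exp(-margins))[:, None]).mean(axis=0)
        w -= eta * grad                                        # gradient descent step
    return w                                                   # predict with sign(Z_new @ w)
```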

13 / 27

slide-50
SLIDE 50

Decision stumps?

[Figure: scatter plot of irises; axes are sepal length/width and petal length/width.]

Classifying irises by sepal and petal measurements
◮ X = R^2, Y = {1, 2, 3}
◮ x_1 = ratio of sepal length to width
◮ x_2 = ratio of petal length to width

14 / 27

slide-51
SLIDE 51

Decision stumps?

[Figure: scatter plot of irises; axes are sepal length/width and petal length/width.]

Classifying irises by sepal and petal measurements
◮ X = R^2, Y = {1, 2, 3}
◮ x_1 = ratio of sepal length to width
◮ x_2 = ratio of petal length to width

ŷ = 2

14 / 27

slide-52
SLIDE 52

Decision stumps?

[Figure: scatter plot of irises; axes are sepal length/width and petal length/width.]

Classifying irises by sepal and petal measurements
◮ X = R^2, Y = {1, 2, 3}
◮ x_1 = ratio of sepal length to width
◮ x_2 = ratio of petal length to width

Split: x_1 > 1.7

14 / 27

slide-53
SLIDE 53

Decision stumps?

[Figure: scatter plot of irises; axes are sepal length/width and petal length/width.]

Classifying irises by sepal and petal measurements
◮ X = R^2, Y = {1, 2, 3}
◮ x_1 = ratio of sepal length to width
◮ x_2 = ratio of petal length to width

Split: x_1 > 1.7, with leaves ŷ = 1 and ŷ = 3

14 / 27

slide-54
SLIDE 54

Decision stumps?

[Figure: scatter plot of irises; axes are sepal length/width and petal length/width.]

Classifying irises by sepal and petal measurements
◮ X = R^2, Y = {1, 2, 3}
◮ x_1 = ratio of sepal length to width
◮ x_2 = ratio of petal length to width

Split: x_1 > 1.7, with leaves ŷ = 1 and ŷ = 3

. . . and stop there!

14 / 27

slide-55
SLIDE 55

Boosting decision stumps

Minimizing (1/n) Σ_{i=1}^n ℓ( y_i Σ_{j=1}^T w_j h_j(x_i) ) over w ∈ R^T, where (h_1, . . . , h_T) are decision stumps.

[Figure: decision-surface contour plots. Panels: Boosted stumps (O(n) param.), 2-layer ReLU (O(n) param.), 3-layer ReLU (O(n) param.).]

15 / 27

slide-56
SLIDE 56

Boosting decision stumps

Minimizing (1/n) Σ_{i=1}^n ℓ( y_i Σ_{j=1}^T w_j h_j(x_i) ) over w ∈ R^T, where (h_1, . . . , h_T) are decision stumps.

[Figure: decision-surface contour plots. Panels: Boosted stumps (O(n) param.), 2-layer ReLU (O(n) param.), 3-layer ReLU (O(n) param.).]

15 / 27

slide-57
SLIDE 57

Boosting decision stumps

Minimizing (1/n) Σ_{i=1}^n ℓ( y_i Σ_{j=1}^T w_j h_j(x_i) ) over w ∈ R^T, where (h_1, . . . , h_T) are decision stumps.

[Figure: decision-surface contour plots. Panels: Boosted stumps (O(n) param.), 2-layer ReLU (O(n) param.), 3-layer ReLU (O(n) param.).]

15 / 27

slide-58
SLIDE 58

Boosting decision stumps

Minimizing (1/n) Σ_{i=1}^n ℓ( y_i Σ_{j=1}^T w_j h_j(x_i) ) over w ∈ R^T, where (h_1, . . . , h_T) are decision stumps.

[Figure: decision-surface contour plots. Panels: Boosted stumps (O(n) param.), 2-layer ReLU (O(n) param.), 3-layer ReLU (O(n) param.).]

15 / 27

slide-59
SLIDE 59

Boosting decision stumps

Minimizing (1/n) Σ_{i=1}^n ℓ( y_i Σ_{j=1}^T w_j h_j(x_i) ) over w ∈ R^T, where (h_1, . . . , h_T) are decision stumps.

[Figure: decision-surface contour plots. Panels: Boosted stumps (O(n) param.), 2-layer ReLU (O(n) param.), 3-layer ReLU (O(n) param.).]

15 / 27

slide-60
SLIDE 60
slide-61
SLIDE 61

Boosting — classical perspective

16 / 27

slide-62
SLIDE 62

Coordinate descent?

The classical methods used coordinate descent:
◮ Find the maximum-magnitude coordinate of the gradient:

  arg max_j | d/dw_j Σ_{i=1}^n ℓ( Σ_j w_j h_j(x_i) y_i ) |
    = arg max_j | Σ_{i=1}^n ℓ′( Σ_j w_j h_j(x_i) y_i ) h_j(x_i) y_i |
    = arg max_j | Σ_{i=1}^n q_i h_j(x_i) y_i |,

  where we’ve defined q_i := ℓ′( Σ_j w_j h_j(x_i) y_i ).

◮ Iterate: w′ := w − η s e_j, where j is the maximum coordinate, s ∈ {−1, +1} is its sign, and η is a step size.
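A sketch of one such coordinate-descent step for ℓ(r) = exp(−r) (array shapes, the step size, and variable names are my own assumptions for illustration):

```python
import numpy as np

def coordinate_descent_step(w, Z, y, eta=0.5):
    # Z[i, j] = h_j(x_i); y has entries in {-1, +1}.
    q = -np.exp(-(Z @ w) * y)        # q_i = ell'( sum_j w_j h_j(x_i) y_i )
    corr = Z.T @ (q * y)             # j-th entry: d/dw_j of the total loss
    j = np.argmax(np.abs(corr))      # maximum-magnitude coordinate
    s = np.sign(corr[j])             # its sign
    w = w.copy()
    w[j] -= eta * s                  # w' = w - eta * s * e_j
    return w
```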

17 / 27

slide-63
SLIDE 63

Interpreting coordinate descent

Suppose h_j : R^d → {−1, +1}; then h_j(x)y = 2 · 1[h_j(x) = y] − 1, and each step solves

  arg max_j | Σ_{i=1}^n q_i h_j(x_i) y_i | = arg max_j | Σ_{i=1}^n q_i ( 1[h_j(x_i) = y_i] − 1/2 ) |.

We are solving a weighted zero-one loss minimization problem.

18 / 27

slide-64
SLIDE 64

Interpreting coordinate descent

Suppose h_j : R^d → {−1, +1}; then h_j(x)y = 2 · 1[h_j(x) = y] − 1, and each step solves

  arg max_j | Σ_{i=1}^n q_i h_j(x_i) y_i | = arg max_j | Σ_{i=1}^n q_i ( 1[h_j(x_i) = y_i] − 1/2 ) |.

We are solving a weighted zero-one loss minimization problem.

Remarks:
◮ The classical choice of coordinate descent is equivalent to solving a problem akin to weighted zero-one loss minimization.
◮ We can abstract away the finite set (h_1, . . . , h_T) and allow an arbitrary set of predictors (e.g., all linear classifiers).

18 / 27

slide-65
SLIDE 65

Classical boosting setup

There is a Weak Learning Oracle, and a corresponding γ-weak-learnable assumption: a set of points is γ-weak-learnable by a weak learning oracle if, for any weighting q, it returns a predictor h so that

  E_q[ h(X) Y ] ≥ γ.

Interpretation: for any reweighting q, we get a predictor h which is at least γ-correlated with the target.

19 / 27

slide-66
SLIDE 66

Classical boosting setup

There is a Weak Learning Oracle, and a corresponding γ-weak-learnable assumption: a set of points is γ-weak-learnable by a weak learning oracle if, for any weighting q, it returns a predictor h so that

  E_q[ h(X) Y ] ≥ γ.

Interpretation: for any reweighting q, we get a predictor h which is at least γ-correlated with the target.

Remarks:
◮ The classical methods iteratively invoke the oracle with different weightings and then output a final aggregated predictor.
◮ The best-known method, AdaBoost, performs coordinate-descent updates (invoking the oracle) with a specific step size, and needs O( (1/γ^2) ln(1/ε) ) iterations for accuracy ε > 0.
◮ The original description of AdaBoost is in terms of the sequence of weightings q_1, q_2, . . ., and says nothing about coordinate descent.
◮ Adaptive Boosting: the method doesn’t need to know γ, and adapts to varying γ_t := E_{q_t}( h_t(X) Y ).
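A compact sketch of AdaBoost in that reweighting form (my own illustration; `weak_learner(X, y, q)` is an assumed stand-in for the oracle, returning a classifier that does well under the weights q):

```python
import numpy as np

def adaboost(X, y, weak_learner, rounds=10):
    n = len(y)
    q = np.full(n, 1.0 / n)                     # q_1: uniform weighting
    hs, alphas = [], []
    for _ in range(rounds):
        h = weak_learner(X, y, q)
        pred = h(X)                             # predictions in {-1, +1}
        eps = q[pred != y].sum()                # weighted error this round
        alpha = 0.5 * np.log((1 - eps) / eps)   # classical AdaBoost step size
        q = q * np.exp(-alpha * y * pred)       # upweight mistakes, downweight hits
        q /= q.sum()
        hs.append(h); alphas.append(alpha)

    def predict(X_new):
        return np.sign(sum(a * h(X_new) for a, h in zip(alphas, hs)))
    return predict
```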

19 / 27

slide-67
SLIDE 67

Example: AdaBoost with decision stumps

(This example from Schapire&Freund’s book.)

Weak learning oracle (WLO): pick the best decision stump, meaning

  F := { x → sign(x_i − b) : i ∈ {1, . . . , d}, b ∈ R }.

(Straightforward to handle weights in ERM.)

20 / 27

slide-68
SLIDE 68

Example: AdaBoost with decision stumps

(This example from Schapire&Freund’s book.)

Weak learning oracle (WLO): pick the best decision stump, meaning

  F := { x → sign(x_i − b) : i ∈ {1, . . . , d}, b ∈ R }.

(Straightforward to handle weights in ERM.)

Remark:
◮ Only need to consider O(n) stumps (Why?) — see the sketch below.
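A sketch of this weighted-ERM stump oracle (my own illustration, assuming NumPy and labels in {−1, +1}; ties in feature values are ignored for simplicity). Sorting each feature once shows the answer to the "(Why?)": only the O(n) thresholds between consecutive sorted values ever need to be tried.

```python
import numpy as np

def best_stump(X, y, q):
    n, d = X.shape
    best = (np.inf, None)                       # (weighted error, (feature, threshold, sign))
    for j in range(d):
        order = np.argsort(X[:, j])
        xs, ys, qs = X[order, j], y[order], q[order]
        err = qs[ys != 1].sum()                 # threshold below all points: predict +1 everywhere
        for i in range(n):
            err += qs[i] if ys[i] == 1 else -qs[i]   # point i now falls below the threshold
            if err < best[0]:
                best = (err, (j, xs[i], +1))
            if q.sum() - err < best[0]:              # flipped stump: predict -1 above threshold
                best = (q.sum() - err, (j, xs[i], -1))
    j, b, s = best[1]
    return lambda Xnew: s * np.where(Xnew[:, j] > b, 1, -1)
```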

20 / 27

slide-69
SLIDE 69

Example: execution of AdaBoost

[Figure: training points under the initial weighting D1.]

21 / 27

slide-70
SLIDE 70

Example: execution of AdaBoost

[Figure: weighting D1 and the first weak classifier f1.]

21 / 27

slide-71
SLIDE 71

Example: execution of AdaBoost

[Figure: weighting D1, reweighted D2, and classifier f1.]

21 / 27

slide-72
SLIDE 72

Example: execution of AdaBoost

[Figure: weightings D1, D2 and classifiers f1, f2.]

21 / 27

slide-73
SLIDE 73

Example: execution of AdaBoost

[Figure: weightings D1, D2, D3 and classifiers f1, f2; training points labeled + and −.]

21 / 27

slide-74
SLIDE 74

Example: execution of AdaBoost

[Figure: weightings D1, D2, D3 and classifiers f1, f2, f3; training points labeled + and −.]

21 / 27

slide-75
SLIDE 75

Example: final classifier from AdaBoost

[Figure: training points (labeled + and −) with the three classifiers f1, f2, f3.]

22 / 27

slide-76
SLIDE 76

Example: final classifier from AdaBoost

[Figure: training points (labeled + and −) with classifiers f1, f2, f3 combined.]

Final classifier: f̂(x) = sign( 0.42 f1(x) + 0.65 f2(x) + 0.92 f3(x) ). (Zero training error rate!)

22 / 27

slide-77
SLIDE 77
slide-78
SLIDE 78

A typical run of boosting.

AdaBoost+C4.5 on the “letters” dataset.

[Figure: error rate vs. number of rounds T (10 to 1000, log scale); curves for AdaBoost training error, AdaBoost test error, and C4.5 test error. The number of nodes across all decision trees in f̂ is >2 × 10^6.]

Training error rate is zero after just five rounds, but test error rate continues to decrease, even up to 1000 rounds!

(Figure 1.7 from Schapire & Freund text)

23 / 27

slide-79
SLIDE 79

Boosting the margin.

Final classifier from AdaBoost:

  f̂(x) = sign( Σ_{t=1}^T α_t f_t(x) / Σ_{t=1}^T |α_t| ),   with g(x) := Σ_{t=1}^T α_t f_t(x) / Σ_{t=1}^T |α_t| ∈ [−1, +1].

Call y · g(x) ∈ [−1, +1] the margin achieved on example (x, y). (Note: ℓ1 not ℓ2 normalized.)

24 / 27

slide-80
SLIDE 80

Boosting the margin.

Final classifier from AdaBoost:

  f̂(x) = sign( Σ_{t=1}^T α_t f_t(x) / Σ_{t=1}^T |α_t| ),   with g(x) := Σ_{t=1}^T α_t f_t(x) / Σ_{t=1}^T |α_t| ∈ [−1, +1].

Call y · g(x) ∈ [−1, +1] the margin achieved on example (x, y). (Note: ℓ1 not ℓ2 normalized.)

Margin theory [Schapire, Freund, Bartlett, and Lee, 1998]:
◮ Larger margins ⇒ better generalization, independent of T.
◮ AdaBoost tends to increase margins on training examples.

“letters” dataset:

                        T = 5    T = 100   T = 1000
  training error rate    0.0%      0.0%      0.0%
  test error rate        8.4%      3.3%      3.1%
  % margins ≤ 0.5        7.7%      0.0%      0.0%
  min. margin            0.14      0.52      0.55

◮ Similar phenomenon in deep networks and gradient descent.

24 / 27

slide-81
SLIDE 81

Margin plots

Given ((x_i, y_i))_{i=1}^n and f, plot the unnormalized margin distribution

  f(x_i)_{y_i} − max_{y ≠ y_i} f(x_i)_y .
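A small sketch of how such a margin distribution can be computed (my framing of the formula above: `scores` is assumed to be an (n, k) array of per-class scores f(x_i), and `y` holds integer class labels):

```python
import numpy as np

def margin_distribution(scores, y):
    n = len(y)
    correct = scores[np.arange(n), y]          # f(x_i)_{y_i}
    masked = scores.copy()
    masked[np.arange(n), y] = -np.inf          # exclude the true class
    runner_up = masked.max(axis=1)             # max over y' != y_i of f(x_i)_{y'}
    return np.sort(correct - runner_up)        # sorted margins, ready to plot
```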

25 / 27

slide-82
SLIDE 82

Margin plots

Given ((x_i, y_i))_{i=1}^n and f, plot the unnormalized margin distribution

  f(x_i)_{y_i} − max_{y ≠ y_i} f(x_i)_y .

[Figure: margin distribution and decision-surface contours. Panel: Boosted stumps (O(n) param.).]

25 / 27

slide-83
SLIDE 83

Margin plots

Given ((x_i, y_i))_{i=1}^n and f, plot the unnormalized margin distribution

  f(x_i)_{y_i} − max_{y ≠ y_i} f(x_i)_y .

[Figure: margin distribution and decision-surface contours. Panel: Boosted stumps (O(n) param.).]

25 / 27

slide-84
SLIDE 84

Margin plots

Given ((x_i, y_i))_{i=1}^n and f, plot the unnormalized margin distribution

  f(x_i)_{y_i} − max_{y ≠ y_i} f(x_i)_y .

[Figure: margin distribution and decision-surface contours. Panel: Boosted stumps (O(n) param.).]

25 / 27

slide-85
SLIDE 85

Margin plots

Given ((x_i, y_i))_{i=1}^n and f, plot the unnormalized margin distribution

  f(x_i)_{y_i} − max_{y ≠ y_i} f(x_i)_y .

[Figure: margin distribution and decision-surface contours. Panel: Boosted stumps (O(n) param.).]

25 / 27

slide-86
SLIDE 86

Margin plots

Given ((x_i, y_i))_{i=1}^n and f, plot the unnormalized margin distribution

  f(x_i)_{y_i} − max_{y ≠ y_i} f(x_i)_y .

[Figure: margin distribution and decision-surface contours. Panel: Boosted stumps (O(n) param.).]

25 / 27

slide-87
SLIDE 87

Margin plots

Given ((x_i, y_i))_{i=1}^n and f, plot the unnormalized margin distribution

  f(x_i)_{y_i} − max_{y ≠ y_i} f(x_i)_y .

[Figure: margin distributions and decision-surface contours. Panels: Boosted stumps (O(n) param.), 2-layer ReLU (O(n) param.), 3-layer ReLU (O(n) param.).]

25 / 27

slide-88
SLIDE 88

Margin plots

Given ((x_i, y_i))_{i=1}^n and f, plot the unnormalized margin distribution

  f(x_i)_{y_i} − max_{y ≠ y_i} f(x_i)_y .

[Figure: margin distributions and decision-surface contours. Panels: Boosted stumps (O(n) param.), 2-layer ReLU (O(n) param.), 3-layer ReLU (O(n) param.).]

25 / 27

slide-89
SLIDE 89

Margin plots

Given ((x_i, y_i))_{i=1}^n and f, plot the unnormalized margin distribution

  f(x_i)_{y_i} − max_{y ≠ y_i} f(x_i)_y .

[Figure: margin distributions and decision-surface contours. Panels: Boosted stumps (O(n) param.), 2-layer ReLU (O(n) param.), 3-layer ReLU (O(n) param.).]

25 / 27

slide-90
SLIDE 90

Margin plots

Given ((x_i, y_i))_{i=1}^n and f, plot the unnormalized margin distribution

  f(x_i)_{y_i} − max_{y ≠ y_i} f(x_i)_y .

[Figure: margin distributions and decision-surface contours. Panels: Boosted stumps (O(n) param.), 2-layer ReLU (O(n) param.), 3-layer ReLU (O(n) param.).]

25 / 27

slide-91
SLIDE 91

Margin plots

Given ((x_i, y_i))_{i=1}^n and f, plot the unnormalized margin distribution

  f(x_i)_{y_i} − max_{y ≠ y_i} f(x_i)_y .

[Figure: margin distributions and decision-surface contours. Panels: Boosted stumps (O(n) param.), 2-layer ReLU (O(n) param.), 3-layer ReLU (O(n) param.).]

25 / 27

slide-92
SLIDE 92

Summary

26 / 27

slide-93
SLIDE 93

Summary

◮ We can do better than the best predictor.
◮ (Bagging.) If errors are independent, a majority vote works well.
◮ (Boosting.) If they are not independent, a reweighted majority works well; the weights can be found with convex optimization.

27 / 27