CSC 411 Lecture 5: Ensembles II
Roger Grosse, Amir-massoud Farahmand, and Juan Carrasquilla
University of Toronto
UofT CSC 411: 05-Ensembles II 1 / 22
Boosting

Recall that an ensemble is a set of predictors whose individual decisions are combined in some way to classify new examples.
◮ Decision trees
◮ Even simpler, a Decision Stump: a decision tree with only a single split
[The formal definition of weak learnability has quantifiers such as "for any distribution over data" and the requirement that its guarantee holds only probabilistically.]
[Figure: decision stumps pictured as vertical half spaces and horizontal half spaces.]

Weighted training data: weights w_i ≥ 0 with Σ_{i=1}^N w_i = 1. The weak learning assumption: the weak learner returns a classifier whose weighted error is at most 1/2 − γ for some γ > 0.
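A weak learner over such axis-aligned half spaces can be sketched in Python (a minimal illustration, not the course's code; the exhaustive search over data-derived thresholds is an assumption):

```python
def fit_stump(X, t, w):
    """Pick the (feature, threshold, sign) decision stump minimizing weighted error.
    X: list of feature vectors, t: labels in {-1, +1}, w: nonnegative weights."""
    best = None
    for d in range(len(X[0])):                    # each feature = one axis
        for thresh in sorted({x[d] for x in X}):  # candidate thresholds from data
            for sign in (+1, -1):                 # which side predicts +1
                err = sum(wi for x, ti, wi in zip(X, t, w)
                          if (sign if x[d] > thresh else -sign) != ti)
                if best is None or err < best[0]:
                    best = (err, d, thresh, sign)
    err, d, thresh, sign = best
    return (lambda x: sign if x[d] > thresh else -sign), err

# Example: 2-D points separable by a horizontal half space (uniform weights)
X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
t = [-1, -1, +1, +1]
w = [0.25] * 4
h, err = fit_stump(X, t, w)
print(err)  # 0: a single stump classifies this data perfectly
```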
Weighted error of classifier h_1:

err_1 = Σ_{i=1}^N w_i I{h_1(x^(i)) ≠ t^(i)} / Σ_{i=1}^N w_i
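As a tiny numeric illustration of the weighted-error formula (the weights and predictions below are made up):

```python
# err_1 = sum_i w_i * I{h1(x^(i)) != t^(i)} / sum_i w_i
w    = [0.1, 0.2, 0.3, 0.4]   # example sample weights (hypothetical)
t    = [+1, +1, -1, -1]       # true labels
pred = [+1, -1, -1, +1]       # h1's predictions (hypothetical)
err1 = sum(wi for wi, p, ti in zip(w, pred, t) if p != ti) / sum(w)
# misclassified weight is 0.2 + 0.4 out of total weight 1.0, so err1 ≈ 0.6
```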
AdaBoost algorithm

Input: data D_N = {(x^(i), t^(i))}_{i=1}^N, weak classifier WeakLearn (a classification procedure that returns a classifier h, e.g. the best decision stump from a class H), number of iterations T
Output: classifier H(x)

Initialize sample weights: w_i = 1/N for i = 1, . . . , N
For t = 1, . . . , T:
◮ Fit a classifier to data using weighted samples (ht ← WeakLearn(DN, w)), e.g.
  h_t ← argmin_{h∈H} Σ_{i=1}^N w_i I{h(x^(i)) ≠ t^(i)}
◮ Compute weighted error err_t = Σ_{i=1}^N w_i I{h_t(x^(i)) ≠ t^(i)} / Σ_{i=1}^N w_i
◮ Compute classifier coefficient α_t = (1/2) log((1 − err_t) / err_t)
◮ Update data weights: w_i ← w_i exp(2 α_t I{h_t(x^(i)) ≠ t^(i)})
Return H(x) = sign(Σ_{t=1}^T α_t h_t(x))
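The AdaBoost loop can be sketched in plain Python (a minimal illustration with an exhaustive decision-stump weak learner; the small-ε clamp guarding log(0) is an added assumption, not part of the lecture's pseudocode):

```python
import math

def fit_stump(X, t, w):
    # Exhaustive search over (feature, threshold, sign) decision stumps
    best = None
    for d in range(len(X[0])):
        for thresh in sorted({x[d] for x in X}):
            for sign in (+1, -1):
                err = sum(wi for x, ti, wi in zip(X, t, w)
                          if (sign if x[d] > thresh else -sign) != ti)
                if best is None or err < best[0]:
                    best = (err, d, thresh, sign)
    _, d, thresh, sign = best
    return lambda x: sign if x[d] > thresh else -sign

def adaboost(X, t, T):
    N = len(X)
    w = [1.0 / N] * N                                # w_i = 1/N
    hs, alphas = [], []
    for _ in range(T):
        h = fit_stump(X, t, w)                       # h_t <- WeakLearn(D_N, w)
        miss = [h(x) != ti for x, ti in zip(X, t)]
        err = sum(wi for wi, m in zip(w, miss) if m) / sum(w)
        err = min(max(err, 1e-12), 1 - 1e-12)        # guard log(0) (assumption)
        alpha = 0.5 * math.log((1 - err) / err)      # classifier coefficient
        w = [wi * math.exp(2 * alpha) if m else wi   # upweight mistakes
             for wi, m in zip(w, miss)]
        hs.append(h); alphas.append(alpha)
    return lambda x: 1 if sum(a * h(x) for a, h in zip(alphas, hs)) > 0 else -1

# 1-D data that no single stump classifies perfectly; three rounds suffice here
X = [(float(i),) for i in range(8)]
t = [+1, +1, +1, -1, -1, -1, +1, +1]
H = adaboost(X, t, T=3)
print([H(x) for x in X] == t)  # True: zero training error after 3 rounds
```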
Theorem (AdaBoost training error bound). Assume that at each iteration the weak classifier satisfies err_t ≤ 1/2 − γ for all t = 1, . . . , T with γ > 0. Then the training error of the output classifier H(x) = sign(Σ_{t=1}^T α_t h_t(x)) satisfies

L_N(H) = (1/N) Σ_{i=1}^N I{H(x^(i)) ≠ t^(i)} ≤ exp(−2 γ² T)
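The bound exp(−2γ²T) shrinks exponentially in T; plugging in an illustrative edge of γ = 0.1 (an assumed value, chosen only for the example):

```python
import math

# AdaBoost training-error bound exp(-2 * gamma^2 * T) for weak-learner edge gamma
gamma = 0.1
for T in (10, 100, 1000):
    print(T, math.exp(-2 * gamma ** 2 * T))
# even a tiny edge over chance drives the bound toward 0 as T grows
```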
[Slide credit: Robert Schapire's slides, http://www.cs.princeton.edu/courses/archive/spring12/cos598A/schedule.html]
◮ There is a neat trick for computing the total intensity in a rectangle in only a few operations: the integral image, a table of cumulative sums over the image.
◮ So it is easy to evaluate a huge number of base classifiers, and they are very fast at test time.
◮ The algorithm adds classifiers greedily based on their quality on the weighted training cases.
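The rectangle trick is the integral image (summed-area table): after one pass over the image, any rectangle sum costs four lookups. A minimal sketch:

```python
def integral_image(img):
    """S[r][c] holds the sum of img over rows 0..r-1 and cols 0..c-1."""
    R, C = len(img), len(img[0])
    S = [[0] * (C + 1) for _ in range(R + 1)]
    for r in range(R):
        for c in range(C):
            # one pass: each entry from three neighbors plus one pixel
            S[r + 1][c + 1] = img[r][c] + S[r][c + 1] + S[r + 1][c] - S[r][c]
    return S

def rect_sum(S, r0, c0, r1, c1):
    """Total intensity of rows r0..r1-1, cols c0..c1-1 in four lookups."""
    return S[r1][c1] - S[r0][c1] - S[r1][c0] + S[r0][c0]

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
S = integral_image(img)
print(rect_sum(S, 1, 1, 3, 3))  # 5 + 6 + 8 + 9 = 28
```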
◮ Pre-define weak classifiers, so optimization = selection
◮ Change loss function for weak learners: false positives less costly than misses
◮ Smart way to do inference in real-time (on 2001 hardware)
Boosting:
◮ Reduces bias
◮ Increases variance (a large ensemble can cause overfitting)
◮ Sequential
◮ High dependency between ensemble elements

Bagging:
◮ Reduces variance (a large ensemble can't cause overfitting)
◮ Bias is not changed (much)
◮ Parallel
◮ Want to minimize correlation between ensemble elements