SLIDE 1

CS480/680 Lecture 22: July 22, 2019

Ensemble Learning [RN] Sec. 18.10, [M] Sec. 16.2.5, [B] Chap. 14, [HTF] Chap 15-16, [D] Chap. 11

University of Waterloo, CS480/680 Spring 2019, Pascal Poupart

SLIDE 2

Outline

  • Ensemble Learning

– Bagging
– Boosting

SLIDE 3

Supervised Learning

  • So far…

– K-nearest neighbours
– Mixture of Gaussians
– Logistic regression
– Support vector machines
– HMMs
– Perceptrons
– Neural networks

  • Which technique should we pick?

SLIDE 4

Ensemble Learning

  • Sometimes each learning technique yields a different hypothesis
  • But no perfect hypothesis…
  • Could we combine several imperfect hypotheses into a better hypothesis?

SLIDE 5

Ensemble Learning

  • Analogies:

– Elections combine voters’ choices to pick a good candidate
– Committees combine experts’ opinions to make better decisions

  • Intuitions:

– Individuals often make mistakes, but the “majority” is less likely to make mistakes
– Individuals often have partial knowledge, but a committee can pool expertise to make better decisions

SLIDE 6

Ensemble Learning

  • Definition: method to select and combine an ensemble of hypotheses into a (hopefully) better hypothesis
  • Can enlarge the hypothesis space
– Perceptrons: linear separators
– Ensemble of perceptrons: polytope

SLIDE 7

Bagging

  • Majority Voting

[Diagram: an instance x is passed to an ensemble of hypotheses h1, …, h5; the output classification is Majority(h1(x), h2(x), h3(x), h4(x), h5(x))]

For the classification to be wrong, at least 3 out of 5 hypotheses have to be wrong
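
As a concrete sketch, here is majority voting over five stand-in classifiers in Python (the threshold hypotheses below are hypothetical examples, not from the lecture):

    # Majority vote over an ensemble: each hypothesis labels x, and the
    # most common label wins.
    from collections import Counter

    def majority(hypotheses, x):
        votes = [h(x) for h in hypotheses]
        return Counter(votes).most_common(1)[0][0]

    # Five noisy threshold classifiers on a scalar instance:
    hs = [lambda x, t=t: 1 if x > t else -1 for t in (-0.2, -0.1, 0.0, 0.1, 0.2)]
    print(majority(hs, 0.15))   # four of five vote +1, so the ensemble says +1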

SLIDE 8

Bagging

  • Assumptions:

– Each hi makes an error with probability p
– The hypotheses are independent

  • Majority voting of n hypotheses:

– Probability that exactly k hypotheses make an error: $\binom{n}{k}\, p^k (1-p)^{n-k}$
– Majority makes an error: $\sum_{k > n/2} \binom{n}{k}\, p^k (1-p)^{n-k}$
– With n = 5, p = 0.1 ⇒ err(majority) < 0.01
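
The last claim is easy to verify numerically; a quick sketch of the binomial tail computation (variable names are ours):

    # Majority of n = 5 independent hypotheses errs only when k > n/2 of
    # them err together: sum the binomial tail.
    from math import comb

    n, p = 5, 0.1
    err_majority = sum(comb(n, k) * p**k * (1 - p)**(n - k)
                       for k in range(n // 2 + 1, n + 1))
    print(err_majority)   # 0.00856 < 0.01, matching the slide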

SLIDE 9

Weighted Majority

  • In practice

– Hypotheses are rarely independent
– Some hypotheses make fewer errors than others

  • Let’s take a weighted majority
  • Intuition:

– Decrease the weight of correlated hypotheses
– Increase the weight of good hypotheses
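
With ±1 labels this is one line: scale each vote by its hypothesis weight and take the sign of the total. A minimal sketch (the hypotheses and weights are placeholders):

    # Weighted majority vote for +1/-1 labels: good hypotheses (large w)
    # pull the total harder than weak or correlated ones (small w).
    def weighted_majority(hypotheses, weights, x):
        total = sum(w * h(x) for h, w in zip(hypotheses, weights))
        return 1 if total > 0 else -1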

SLIDE 10

Boosting

  • Very popular ensemble technique
  • Computes a weighted majority
  • Can “boost” a “weak learner”
  • Operates on a weighted training set

SLIDE 11

Weighted Training Set

  • Learning with a weighted training set

– Supervised learning → minimize training error
– Bias the algorithm to correctly learn instances with high weights

  • Idea: when an instance is misclassified by a hypothesis, increase its weight so that the next hypothesis is more likely to classify it correctly
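
Concretely, the quantity being minimized changes from a plain count of mistakes to a weighted one; a sketch (names are illustrative):

    # Weighted 0-1 training error: a mistake on a high-weight instance
    # costs more, biasing the learner toward classifying it correctly.
    def weighted_error(h, dataset, weights):
        return sum(w for (x, y), w in zip(dataset, weights) if h(x) != y)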

SLIDE 12

Boosting Framework

  • Set all instance weights wx to 1
  • Repeat

– hi ← learn(dataset, weights)
– Increase wx of misclassified instances x

  • Until sufficient number of hypotheses
  • Ensemble hypothesis is the weighted majority of the hi’s, with weights wi proportional to the accuracy of hi (a sketch of the loop follows below)
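
A minimal sketch of this loop in Python, assuming a hypothetical learn(dataset, weights) that returns a classifier; the doubling of misclassified weights is a placeholder update (AdaBoost, two slides down, derives a principled one):

    # Generic boosting loop from this slide. `learn` is an assumed
    # weight-aware learner; the x2 update on mistakes is a placeholder.
    def boost(dataset, learn, M):
        weights = [1.0] * len(dataset)        # all instance weights start at 1
        hypotheses, hyp_weights = [], []
        for _ in range(M):
            h = learn(dataset, weights)
            for j, (x, y) in enumerate(dataset):
                if h(x) != y:
                    weights[j] *= 2.0         # boost misclassified instances
            accuracy = sum(h(x) == y for x, y in dataset) / len(dataset)
            hypotheses.append(h)
            hyp_weights.append(accuracy)      # wi proportional to accuracy
        return hypotheses, hyp_weights

The ensemble prediction is then the weighted majority (slide 9) applied to the returned hypotheses and weights.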

SLIDE 13

Boosting Framework

SLIDE 14

AdaBoost (Adaptive Boosting)

  • wj ← 1/N ∀j
  • For m = 1 to M do
– hm ← learn(dataset, w)
– err ← 0
– For each (xj, yj) in dataset do
  • If hm(xj) ≠ yj then err ← err + wj
– For each (xj, yj) in dataset do
  • If hm(xj) = yj then wj ← wj · err / (1 − err)
– w ← normalize(w)
– zm ← log [(1 − err) / err]
  • Return weighted-majority(h, z)

w: vector of N instance weights; z: vector of M hypothesis weights
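
A runnable transcription of this pseudocode (a sketch, not the lecture’s reference implementation): decision stumps stand in for learn, labels are ±1, and the early exit for err = 0 or err ≥ 0.5 is an addition not in the pseudocode.

    # AdaBoost with decision stumps as the weak learner. Labels must be +1/-1.
    import numpy as np

    def learn_stump(X, y, w):
        # Weak learner: the one-feature threshold rule with lowest weighted error.
        best, best_err = None, np.inf
        for f in range(X.shape[1]):
            for t in np.unique(X[:, f]):
                for s in (1, -1):                     # polarity of the rule
                    pred = np.where(X[:, f] <= t, s, -s)
                    err = w[pred != y].sum()
                    if err < best_err:
                        best_err, best = err, (f, t, s)
        f, t, s = best
        return lambda X: np.where(X[:, f] <= t, s, -s)

    def adaboost(X, y, M=20):
        N = len(y)
        w = np.full(N, 1.0 / N)                       # wj <- 1/N for all j
        hs, zs = [], []
        for m in range(M):
            h = learn_stump(X, y, w)                  # hm <- learn(dataset, w)
            pred = h(X)
            err = w[pred != y].sum()                  # sum weights of mistakes
            if err == 0 or err >= 0.5:                # guard (not in pseudocode)
                break
            w[pred == y] *= err / (1 - err)           # downweight correct instances
            w /= w.sum()                              # w <- normalize(w)
            hs.append(h)
            zs.append(np.log((1 - err) / err))        # zm <- log[(1-err)/err]
        def predict(Xq):                              # weighted-majority(h, z)
            return np.sign(sum(z * h(Xq) for h, z in zip(hs, zs)))
        return predict

    # Toy usage: boosted stumps fit a diagonal boundary no single stump can.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
    print((adaboost(X, y, M=10)(X) == y).mean())      # training accuracy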

SLIDE 15

What can we boost?

  • Weak learner: produces hypotheses at least slightly better than a random classifier

  • Examples:

– Rules of thumb
– Decision stumps (decision trees of one node)
– Perceptrons
– Naïve Bayes models

SLIDE 16

Boosting Paradigm

  • Advantages

– No need to learn a perfect hypothesis
– Can boost any weak learning algorithm
– Boosting is very simple to program
– Good generalization

  • Paradigm shift

– Don’t try to learn a perfect hypothesis
– Just learn simple rules of thumb and boost them

SLIDE 17

Boosting Paradigm

  • When we already have a bunch of hypotheses, boosting provides a principled approach to combining them

  • Useful for

– Sensor fusion
– Combining experts

SLIDE 18

Applications

  • Any supervised learning task

– Collaborative filtering (Netflix challenge)
– Body part recognition (Kinect)
– Spam filtering
– Speech recognition / natural language processing
– Data mining
– Etc.

SLIDE 19

Netflix Challenge

  • Problem: predict movie ratings based on a database of ratings by previous users

  • Launch: 2006

– Goal: improve Netflix predictions by 10%
– Grand Prize: $1 million

SLIDE 20

Progress

  • 2007: BellKor 8.43% improvement
  • 2008:

– No individual algorithm improves by > 9.43%
– Top two teams BellKor and BigChaos unite

  • Start of ensemble learning
  • Jointly improve by > 9.43%
  • June 26, 2009:

– Top 3 teams BellKor, BigChaos and Pragmatic unite
– Jointly improve > 10%
– 30 days left for anyone to beat them

SLIDE 21

The Ensemble

  • Formation of “Grand Prize Team”:

– Anyone could join
– Share of the $1 million grand prize proportional to improvement in team score
– Improvement: 9.46%

  • 5 days to the deadline

– “The Ensemble” team is born

  • Union of Grand Prize team and Vanderlay Industries
  • Ensemble of many researchers

SLIDE 22

Finale

  • Last Day: July 26, 2009
  • 6:18 pm:

– BellKor’s Pragmatic Chaos: 10.06% improvement

  • 6:38 pm:

– The Ensemble: 10.06% improvement

  • Tie breaker: time of submission (BellKor’s Pragmatic Chaos wins, having submitted 20 minutes earlier)
