SLIDE 1

CS480/680 Lecture 22: July 22, 2019

Ensemble Learning [RN] Sec. 18.10, [M] Sec. 16.2.5, [B] Chap. 14, [HTF] Chap 15-16, [D] Chap. 11

University of Waterloo, CS480/680 Spring 2019, Pascal Poupart

SLIDE 2

Outline

  • Ensemble Learning

– Bagging
– Boosting

SLIDE 3

Supervised Learning

  • So far…

– K-nearest neighbours
– Mixture of Gaussians
– Logistic regression
– Support vector machines
– HMMs
– Perceptrons
– Neural networks

  • Which technique should we pick?

SLIDE 4

Ensemble Learning

  • Sometimes each learning technique yields a different hypothesis
  • But no perfect hypothesis…
  • Could we combine several imperfect hypotheses into a better hypothesis?

SLIDE 5

Ensemble Learning

  • Analogies:

– Elections combine voters’ choices to pick a good candidate
– Committees combine experts’ opinions to make better decisions

  • Intuitions:

– Individuals often make mistakes, but the “majority” is less likely to make mistakes
– Individuals often have partial knowledge, but a committee can pool expertise to make better decisions

SLIDE 6

Ensemble Learning

  • Definition: method to select and combine an ensemble of hypotheses into a (hopefully) better hypothesis
  • Can enlarge the hypothesis space
– Perceptrons: linear separators
– Ensemble of perceptrons: polytope

SLIDE 7

Bagging

  • Majority Voting

[Diagram: an instance x is passed to an ensemble of hypotheses h1, …, h5; the output classification is Majority(h1(x), h2(x), h3(x), h4(x), h5(x))]

For the classification to be wrong, at least 3 out of 5 hypotheses have to be wrong
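
As a concrete sketch, here is majority voting over five stand-in classifiers in Python (the threshold hypotheses below are hypothetical examples, not from the lecture):

    # Majority vote over an ensemble: each hypothesis labels x, and the
    # most common label wins.
    from collections import Counter

    def majority(hypotheses, x):
        votes = [h(x) for h in hypotheses]
        return Counter(votes).most_common(1)[0][0]

    # Five noisy threshold classifiers on a scalar instance:
    hs = [lambda x, t=t: 1 if x > t else -1 for t in (-0.2, -0.1, 0.0, 0.1, 0.2)]
    print(majority(hs, 0.15))   # four of five vote +1, so the ensemble says +1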

SLIDE 8

Bagging

  • Assumptions:

– Each hi makes an error with probability p
– The hypotheses are independent

  • Majority voting of n hypotheses:

– Probability that exactly k hypotheses make an error: $\binom{n}{k}\, p^k (1-p)^{n-k}$
– Majority makes an error: $\sum_{k > n/2} \binom{n}{k}\, p^k (1-p)^{n-k}$
– With n = 5, p = 0.1 ⇒ err(majority) < 0.01
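
The last claim is easy to verify numerically; a quick sketch of the binomial tail computation (variable names are ours):

    # Majority of n = 5 independent hypotheses errs only when k > n/2 of
    # them err together: sum the binomial tail.
    from math import comb

    n, p = 5, 0.1
    err_majority = sum(comb(n, k) * p**k * (1 - p)**(n - k)
                       for k in range(n // 2 + 1, n + 1))
    print(err_majority)   # 0.00856 < 0.01, matching the slide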

SLIDE 9

Weighted Majority

  • In practice

– Hypotheses are rarely independent
– Some hypotheses make fewer errors than others

  • Let’s take a weighted majority
  • Intuition:

– Decrease the weight of correlated hypotheses
– Increase the weight of good hypotheses
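
With ±1 labels this is one line: scale each vote by its hypothesis weight and take the sign of the total. A minimal sketch (the hypotheses and weights are placeholders):

    # Weighted majority vote for +1/-1 labels: good hypotheses (large w)
    # pull the total harder than weak or correlated ones (small w).
    def weighted_majority(hypotheses, weights, x):
        total = sum(w * h(x) for h, w in zip(hypotheses, weights))
        return 1 if total > 0 else -1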

SLIDE 10

Boosting

  • Very popular ensemble technique
  • Computes a weighted majority
  • Can “boost” a “weak learner”
  • Operates on a weighted training set

SLIDE 11

Weighted Training Set

  • Learning with a weighted training set

– Supervised learning → minimize training error
– Bias the algorithm to correctly learn instances with high weights

  • Idea: when an instance is misclassified by a hypothesis, increase its weight so that the next hypothesis is more likely to classify it correctly
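
Concretely, the quantity being minimized changes from a plain count of mistakes to a weighted one; a sketch (names are illustrative):

    # Weighted 0-1 training error: a mistake on a high-weight instance
    # costs more, biasing the learner toward classifying it correctly.
    def weighted_error(h, dataset, weights):
        return sum(w for (x, y), w in zip(dataset, weights) if h(x) != y)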

SLIDE 12

Boosting Framework

  • Set all instance weights wx to 1
  • Repeat

– hi ← learn(dataset, weights)
– Increase wx of misclassified instances x

  • Until sufficient number of hypotheses
  • Ensemble hypothesis is the weighted majority of the hi’s, with weights wi proportional to the accuracy of hi (a sketch of the loop follows below)
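
A minimal sketch of this loop in Python, assuming a hypothetical learn(dataset, weights) that returns a classifier; the doubling of misclassified weights is a placeholder update (AdaBoost, two slides down, derives a principled one):

    # Generic boosting loop from this slide. `learn` is an assumed
    # weight-aware learner; the x2 update on mistakes is a placeholder.
    def boost(dataset, learn, M):
        weights = [1.0] * len(dataset)        # all instance weights start at 1
        hypotheses, hyp_weights = [], []
        for _ in range(M):
            h = learn(dataset, weights)
            for j, (x, y) in enumerate(dataset):
                if h(x) != y:
                    weights[j] *= 2.0         # boost misclassified instances
            accuracy = sum(h(x) == y for x, y in dataset) / len(dataset)
            hypotheses.append(h)
            hyp_weights.append(accuracy)      # wi proportional to accuracy
        return hypotheses, hyp_weights

The ensemble prediction is then the weighted majority (slide 9) applied to the returned hypotheses and weights.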

SLIDE 13

Boosting Framework

SLIDE 14

AdaBoost (Adaptive Boosting)

  • wj ← 1/N ∀j
  • For m = 1 to M do
– hm ← learn(dataset, w)
– err ← 0
– For each (xj, yj) in dataset do
  • If hm(xj) ≠ yj then err ← err + wj
– For each (xj, yj) in dataset do
  • If hm(xj) = yj then wj ← wj · err / (1 − err)
– w ← normalize(w)
– zm ← log [(1 − err) / err]
  • Return weighted-majority(h, z)

w: vector of N instance weights; z: vector of M hypothesis weights
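
A runnable transcription of this pseudocode (a sketch, not the lecture’s reference implementation): decision stumps stand in for learn, labels are ±1, and the early exit for err = 0 or err ≥ 0.5 is an addition not in the pseudocode.

    # AdaBoost with decision stumps as the weak learner. Labels must be +1/-1.
    import numpy as np

    def learn_stump(X, y, w):
        # Weak learner: the one-feature threshold rule with lowest weighted error.
        best, best_err = None, np.inf
        for f in range(X.shape[1]):
            for t in np.unique(X[:, f]):
                for s in (1, -1):                     # polarity of the rule
                    pred = np.where(X[:, f] <= t, s, -s)
                    err = w[pred != y].sum()
                    if err < best_err:
                        best_err, best = err, (f, t, s)
        f, t, s = best
        return lambda X: np.where(X[:, f] <= t, s, -s)

    def adaboost(X, y, M=20):
        N = len(y)
        w = np.full(N, 1.0 / N)                       # wj <- 1/N for all j
        hs, zs = [], []
        for m in range(M):
            h = learn_stump(X, y, w)                  # hm <- learn(dataset, w)
            pred = h(X)
            err = w[pred != y].sum()                  # sum weights of mistakes
            if err == 0 or err >= 0.5:                # guard (not in pseudocode)
                break
            w[pred == y] *= err / (1 - err)           # downweight correct instances
            w /= w.sum()                              # w <- normalize(w)
            hs.append(h)
            zs.append(np.log((1 - err) / err))        # zm <- log[(1-err)/err]
        def predict(Xq):                              # weighted-majority(h, z)
            return np.sign(sum(z * h(Xq) for h, z in zip(hs, zs)))
        return predict

    # Toy usage: boosted stumps fit a diagonal boundary no single stump can.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
    print((adaboost(X, y, M=10)(X) == y).mean())      # training accuracy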

SLIDE 15

What can we boost?

  • Weak learner: produces hypotheses at least slightly better than a random classifier

  • Examples:

– Rules of thumb
– Decision stumps (decision trees of one node)
– Perceptrons
– Naïve Bayes models

SLIDE 16

Boosting Paradigm

  • Advantages

– No need to learn a perfect hypothesis
– Can boost any weak learning algorithm
– Boosting is very simple to program
– Good generalization

  • Paradigm shift

– Don’t try to learn a perfect hypothesis
– Just learn simple rules of thumb and boost them

SLIDE 17

Boosting Paradigm

  • When we already have a bunch of hypotheses, boosting provides a principled approach to combining them

  • Useful for

– Sensor fusion
– Combining experts

SLIDE 18

Applications

  • Any supervised learning task

– Collaborative filtering (Netflix challenge)
– Body part recognition (Kinect)
– Spam filtering
– Speech recognition / natural language processing
– Data mining
– Etc.

SLIDE 19

Netflix Challenge

  • Problem: predict movie ratings based on a database of ratings by previous users

  • Launch: 2006

– Goal: improve Netflix predictions by 10%
– Grand Prize: $1 million

SLIDE 20

Progress

  • 2007: BellKor 8.43% improvement
  • 2008:

– No individual algorithm improves by > 9.43%
– Top two teams BellKor and BigChaos unite

  • Start of ensemble learning
  • Jointly improve by > 9.43%
  • June 26, 2009:

– Top 3 teams BellKor, BigChaos and Pragmatic unite
– Jointly improve > 10%
– 30 days left for anyone to beat them

SLIDE 21

The Ensemble

  • Formation of “Grand Prize Team”:

– Anyone could join
– Share of the $1 million grand prize proportional to improvement in team score
– Improvement: 9.46%

  • 5 days to the deadline

– “The Ensemble” team is born

  • Union of Grand Prize team and Vanderlay Industries
  • Ensemble of many researchers

SLIDE 22

Finale

  • Last Day: July 26, 2009
  • 6:18 pm:

– BellKor’s Pragmatic Chaos: 10.06% improvement

  • 6:38 pm:

– The Ensemble: 10.06% improvement

  • Tie breaker: time of submission (BellKor’s Pragmatic Chaos wins, having submitted 20 minutes earlier)
