CSC 411 Lecture 5: Ensembles II
Roger Grosse, Amir-massoud Farahmand, and Juan Carrasquilla
University of Toronto
UofT CSC 411: 05-Ensembles II 1 / 22
Boosting

Recall that an ensemble is a set of predictors whose individual decisions are combined in some way to classify new examples.
◮ Decision trees
◮ Even simpler, a Decision Stump: a decision tree with only a single split
[The formal definition of weak learnability has quantifiers such as "for any distribution over data" and the requirement that its guarantee holds only probabilistically.]
[Figure: decision stumps pictured as vertical half spaces and horizontal half spaces.]

Weighted training data: weights w_i ≥ 0 with Σ_{i=1}^N w_i = 1. The weak learning assumption: the weak learner returns a classifier whose weighted error is at most 1/2 − γ for some γ > 0.
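A weak learner over such axis-aligned half spaces can be sketched in Python (a minimal illustration, not the course's code; the exhaustive search over data-derived thresholds is an assumption):

```python
def fit_stump(X, t, w):
    """Pick the (feature, threshold, sign) decision stump minimizing weighted error.
    X: list of feature vectors, t: labels in {-1, +1}, w: nonnegative weights."""
    best = None
    for d in range(len(X[0])):                    # each feature = one axis
        for thresh in sorted({x[d] for x in X}):  # candidate thresholds from data
            for sign in (+1, -1):                 # which side predicts +1
                err = sum(wi for x, ti, wi in zip(X, t, w)
                          if (sign if x[d] > thresh else -sign) != ti)
                if best is None or err < best[0]:
                    best = (err, d, thresh, sign)
    err, d, thresh, sign = best
    return (lambda x: sign if x[d] > thresh else -sign), err

# Example: 2-D points separable by a horizontal half space (uniform weights)
X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
t = [-1, -1, +1, +1]
w = [0.25] * 4
h, err = fit_stump(X, t, w)
print(err)  # 0: a single stump classifies this data perfectly
```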
Weighted error of classifier h_1:

err_1 = Σ_{i=1}^N w_i I{h_1(x^(i)) ≠ t^(i)} / Σ_{i=1}^N w_i
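As a tiny numeric illustration of the weighted-error formula (the weights and predictions below are made up):

```python
# err_1 = sum_i w_i * I{h1(x^(i)) != t^(i)} / sum_i w_i
w    = [0.1, 0.2, 0.3, 0.4]   # example sample weights (hypothetical)
t    = [+1, +1, -1, -1]       # true labels
pred = [+1, -1, -1, +1]       # h1's predictions (hypothetical)
err1 = sum(wi for wi, p, ti in zip(w, pred, t) if p != ti) / sum(w)
# misclassified weight is 0.2 + 0.4 out of total weight 1.0, so err1 ≈ 0.6
```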
AdaBoost algorithm

Input: data D_N = {(x^(i), t^(i))}_{i=1}^N, weak classifier WeakLearn (a classification procedure that returns a classifier h, e.g. the best decision stump from a class H), number of iterations T
Output: classifier H(x)

Initialize sample weights: w_i = 1/N for i = 1, . . . , N
For t = 1, . . . , T:
◮ Fit a classifier to data using weighted samples (ht ← WeakLearn(DN, w)), e.g.
  h_t ← argmin_{h∈H} Σ_{i=1}^N w_i I{h(x^(i)) ≠ t^(i)}
◮ Compute weighted error err_t = Σ_{i=1}^N w_i I{h_t(x^(i)) ≠ t^(i)} / Σ_{i=1}^N w_i
◮ Compute classifier coefficient α_t = (1/2) log((1 − err_t) / err_t)
◮ Update data weights: w_i ← w_i exp(2 α_t I{h_t(x^(i)) ≠ t^(i)})
Return H(x) = sign(Σ_{t=1}^T α_t h_t(x))
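The AdaBoost loop can be sketched in plain Python (a minimal illustration with an exhaustive decision-stump weak learner; the small-ε clamp guarding log(0) is an added assumption, not part of the lecture's pseudocode):

```python
import math

def fit_stump(X, t, w):
    # Exhaustive search over (feature, threshold, sign) decision stumps
    best = None
    for d in range(len(X[0])):
        for thresh in sorted({x[d] for x in X}):
            for sign in (+1, -1):
                err = sum(wi for x, ti, wi in zip(X, t, w)
                          if (sign if x[d] > thresh else -sign) != ti)
                if best is None or err < best[0]:
                    best = (err, d, thresh, sign)
    _, d, thresh, sign = best
    return lambda x: sign if x[d] > thresh else -sign

def adaboost(X, t, T):
    N = len(X)
    w = [1.0 / N] * N                                # w_i = 1/N
    hs, alphas = [], []
    for _ in range(T):
        h = fit_stump(X, t, w)                       # h_t <- WeakLearn(D_N, w)
        miss = [h(x) != ti for x, ti in zip(X, t)]
        err = sum(wi for wi, m in zip(w, miss) if m) / sum(w)
        err = min(max(err, 1e-12), 1 - 1e-12)        # guard log(0) (assumption)
        alpha = 0.5 * math.log((1 - err) / err)      # classifier coefficient
        w = [wi * math.exp(2 * alpha) if m else wi   # upweight mistakes
             for wi, m in zip(w, miss)]
        hs.append(h); alphas.append(alpha)
    return lambda x: 1 if sum(a * h(x) for a, h in zip(alphas, hs)) > 0 else -1

# 1-D data that no single stump classifies perfectly; three rounds suffice here
X = [(float(i),) for i in range(8)]
t = [+1, +1, +1, -1, -1, -1, +1, +1]
H = adaboost(X, t, T=3)
print([H(x) for x in X] == t)  # True: zero training error after 3 rounds
```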
Theorem (AdaBoost training error bound). Assume that at each iteration the weak classifier satisfies err_t ≤ 1/2 − γ for all t = 1, . . . , T with γ > 0. Then the training error of the output classifier H(x) = sign(Σ_{t=1}^T α_t h_t(x)) satisfies

L_N(H) = (1/N) Σ_{i=1}^N I{H(x^(i)) ≠ t^(i)} ≤ exp(−2 γ² T)
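The bound exp(−2γ²T) shrinks exponentially in T; plugging in an illustrative edge of γ = 0.1 (an assumed value, chosen only for the example):

```python
import math

# AdaBoost training-error bound exp(-2 * gamma^2 * T) for weak-learner edge gamma
gamma = 0.1
for T in (10, 100, 1000):
    print(T, math.exp(-2 * gamma ** 2 * T))
# even a tiny edge over chance drives the bound toward 0 as T grows
```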
[Slide credit: Robert Schapire's slides, http://www.cs.princeton.edu/courses/archive/spring12/cos598A/schedule.html]
◮ There is a neat trick for computing the total intensity in a rectangle in only a few operations: the integral image, a table of cumulative sums over the image.
◮ So it is easy to evaluate a huge number of base classifiers, and they are very fast at test time.
◮ The algorithm adds classifiers greedily based on their quality on the weighted training cases.
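The rectangle trick is the integral image (summed-area table): after one pass over the image, any rectangle sum costs four lookups. A minimal sketch:

```python
def integral_image(img):
    """S[r][c] holds the sum of img over rows 0..r-1 and cols 0..c-1."""
    R, C = len(img), len(img[0])
    S = [[0] * (C + 1) for _ in range(R + 1)]
    for r in range(R):
        for c in range(C):
            # one pass: each entry from three neighbors plus one pixel
            S[r + 1][c + 1] = img[r][c] + S[r][c + 1] + S[r + 1][c] - S[r][c]
    return S

def rect_sum(S, r0, c0, r1, c1):
    """Total intensity of rows r0..r1-1, cols c0..c1-1 in four lookups."""
    return S[r1][c1] - S[r0][c1] - S[r1][c0] + S[r0][c0]

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
S = integral_image(img)
print(rect_sum(S, 1, 1, 3, 3))  # 5 + 6 + 8 + 9 = 28
```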
◮ Pre-define weak classifiers, so optimization = selection
◮ Change loss function for weak learners: false positives less costly than misses
◮ Smart way to do inference in real-time (on 2001 hardware)
Boosting:
◮ Reduces bias
◮ Increases variance (a large ensemble can cause overfitting)
◮ Sequential
◮ High dependency between ensemble elements

Bagging:
◮ Reduces variance (a large ensemble can't cause overfitting)
◮ Bias is not changed (much)
◮ Parallel
◮ Want to minimize correlation between ensemble elements