CSC 411: Lecture 17: Ensemble Methods I
Class based on Raquel Urtasun & Rich Zemel's lectures
Sanja Fidler
University of Toronto
March 23, 2016
Urtasun, Zemel, Fidler (UofT) CSC 411: 17-Ensemble Methods I March 23, 2016 1 / 34
Today: ensemble methods (bagging and boosting)
◮ Parallel training with different training sets: bagging
◮ Sequential training, iteratively re-weighting training examples so that the current classifier focuses on examples the earlier classifiers got wrong: boosting
◮ Parallel training with objective encouraging division of labor: mixture of experts
◮ Also known as meta-learning
◮ Typically applied to weak models, such as decision stumps (single-node decision trees) or linear classifiers
◮ Averaging the predictions of several models reduces variance, i.e., it reduces sensitivity to individual data points
◮ Averaging models can also reduce bias substantially by increasing capacity; boosting additionally controls variance by fitting one component at a time
Ensembles help most when the individual classifiers are:
◮ Accurate (better than guessing)
◮ Diverse (different errors on new examples)
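These two conditions can be made concrete with a toy calculation: if the classifiers' errors were independent (an idealized assumption, not something real ensembles guarantee), a majority vote of many barely-better-than-chance voters errs far less often than any single one. A minimal sketch:

```python
from math import comb

def majority_vote_error(m, p_err):
    """P(majority of m independent classifiers is wrong), each with error p_err (m odd)."""
    return sum(comb(m, k) * p_err**k * (1 - p_err)**(m - k)
               for k in range(m // 2 + 1, m + 1))

# A single classifier with 40% error vs. a committee of 21 such (independent) voters.
single = majority_vote_error(1, 0.4)     # 0.4
committee = majority_vote_error(21, 0.4)
```

If the voters make the *same* mistakes (no diversity), the committee is no better than one member, which is why diversity matters as much as accuracy.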
◮ "Our experience is that most efforts should be concentrated in deriving substantially different approaches, rather than refining a single technique."
◮ "We strongly believe that the success of an ensemble approach depends on the ability of its various predictors to expose different, complementing aspects of the data."
(Quotes from the winning team of the Netflix Prize.)
y_bag(x) = (1/M) Σ_{m=1}^{M} y_m(x)
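A minimal sketch of this average for bagged regression, on hypothetical toy data (polynomial fits stand in for the base models; the data and all names here are illustrative, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy regression data: t = sin(x) + noise.
x = np.linspace(0, 3, 40)
t = np.sin(x) + 0.3 * rng.standard_normal(x.size)

def bagged_predict(x_query, M=25, degree=5):
    """y_bag(x) = (1/M) * sum_m y_m(x), each y_m fit on a bootstrap sample."""
    preds = []
    for _ in range(M):
        idx = rng.integers(0, x.size, size=x.size)   # sample N points with replacement
        coeffs = np.polyfit(x[idx], t[idx], degree)  # fit one committee member
        preds.append(np.polyval(coeffs, x_query))
    return np.mean(preds, axis=0)

y_bag = bagged_predict(x)
```

Each bootstrap sample omits some points and repeats others, so the M fits differ; averaging them damps the fluctuations of any single fit.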
◮ That seems like a weak assumption, but beware!
◮ Theorists showed how to do this, and it actually led to an effective new learning procedure: boosting (Freund & Schapire, 1996)
◮ How do we re-weight the data?
◮ How do we weight the models in the committee?
Each training case n carries a weight w_n^m at round m. The weighted error rate of classifier y_m(x) is

    ε_m = Σ_{n=1}^N w_n^m 1[y_m(x^(n)) ≠ t^(n)] / Σ_{n=1}^N w_n^m

where the indicator 1[·] is 1 if y_m misclassifies example n and 0 o.w.

Data weights are then updated multiplicatively:

    w_n^{m+1} ∝ w_n^m exp{α_m 1[y_m(x^(n)) ≠ t^(n)]}

so examples the current classifier gets wrong receive more weight in the next round.
AdaBoost algorithm:
Input: training set {x^(n), t^(n)}_{n=1}^N with t^(n) ∈ {−1, +1}, and WeakLearn: learning procedure, produces classifier y(x)
Initialize data weights: w_n^1 = 1/N
For m = 1, …, M:
◮ y_m(x) = WeakLearn({x}, t, w), fit classifier by minimizing
    J_m = Σ_{n=1}^N w_n^m 1[y_m(x^(n)) ≠ t^(n)]
◮ Compute unnormalized error rate
    ε_m = Σ_{n=1}^N w_n^m 1[y_m(x^(n)) ≠ t^(n)]
◮ Compute classifier coefficient α_m = (1/2) log((1 − ε_m)/ε_m)
◮ Update data weights
    w_n^{m+1} = w_n^m exp{−α_m t^(n) y_m(x^(n))} / Σ_{n'=1}^N w_{n'}^m exp{−α_m t^(n') y_m(x^(n'))}
Final model: Y(x) = sign(Σ_{m=1}^M α_m y_m(x))
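The steps above can be sketched in code. This is a minimal illustration rather than the lecture's implementation: it assumes decision stumps as the WeakLearn, ±1 labels, and hypothetical toy data.

```python
import numpy as np

def stump_predict(X, feat, thresh, sign):
    """Decision stump: predict +/-1 by thresholding one feature."""
    return sign * np.where(X[:, feat] > thresh, 1, -1)

def fit_stump(X, t, w):
    """WeakLearn: exhaustively pick the stump minimizing the weighted error J_m."""
    best, best_err = None, np.inf
    for feat in range(X.shape[1]):
        for thresh in np.unique(X[:, feat]):
            for sign in (1, -1):
                err = np.sum(w * (stump_predict(X, feat, thresh, sign) != t))
                if err < best_err:
                    best, best_err = (feat, thresh, sign), err
    return best

def adaboost(X, t, M=10):
    """Labels t must be +/-1. Returns a list of (alpha_m, stump_m) pairs."""
    N = X.shape[0]
    w = np.full(N, 1.0 / N)                         # w_n^1 = 1/N
    ensemble = []
    for _ in range(M):
        stump = fit_stump(X, t, w)
        y = stump_predict(X, *stump)
        eps = np.clip(np.sum(w * (y != t)) / np.sum(w), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - eps) / eps)       # classifier coefficient
        w = w * np.exp(-alpha * t * y)              # up-weight misclassified cases
        w = w / np.sum(w)                           # normalize
        ensemble.append((alpha, stump))
    return ensemble

def predict(ensemble, X):
    """Final model: sign of the alpha-weighted committee vote."""
    return np.sign(sum(a * stump_predict(X, *s) for a, s in ensemble))

# Hypothetical toy data: label is the sign of x1 + x2.
rng = np.random.default_rng(1)
X = rng.standard_normal((60, 2))
t = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
ensemble = adaboost(X, t, M=20)
train_acc = np.mean(predict(ensemble, X) == t)
```

Note how accurate stumps (ε_m well below 1/2) receive large α_m, while a stump no better than chance would receive α_m ≈ 0.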
◮ AdaBoost can be derived as stagewise additive modeling (Friedman et al. 2000): it greedily minimizes the exponential loss
    E = Σ_{n=1}^N exp{−t^(n) f_m(x^(n))}
where f_m(x) = Σ_{ℓ=1}^m α_ℓ y_ℓ(x) is the additive model after m stages.
At stage m, the contribution of the first m − 1 classifiers is fixed:

    E = Σ_{n=1}^N exp{−t^(n) f_{m−1}(x^(n)) − α_m t^(n) y_m(x^(n))}
      = Σ_{n=1}^N w_n^m exp{−α_m t^(n) y_m(x^(n))}

where w_n^m = exp{−t^(n) f_{m−1}(x^(n))} plays the role of a data weight.

Splitting the sum into correctly and incorrectly classified examples:

    E = e^{−α_m} Σ_{n: y_m(x^(n)) = t^(n)} w_n^m + e^{α_m} Σ_{n: y_m(x^(n)) ≠ t^(n)} w_n^m
      = (e^{α_m} − e^{−α_m}) Σ_{n=1}^N w_n^m 1[y_m(x^(n)) ≠ t^(n)] + e^{−α_m} Σ_{n=1}^N w_n^m

Minimizing E with respect to α_m recovers α_m = (1/2) log((1 − ε_m)/ε_m), and the multiplicative weight update w_n^{m+1} = w_n^m exp{−α_m t^(n) y_m(x^(n))} follows directly from the definition of w_n^m.
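As a sanity check on the last step, one can minimize E(α) numerically and compare against the closed form. A sketch, assuming a hypothetical weighted error rate ε = 0.3 and total data weight normalized to 1:

```python
import numpy as np

eps = 0.3  # hypothetical weighted error rate of the current weak classifier

def E(alpha):
    """Exponential loss as a function of alpha_m, after splitting the sum:
    e^{-alpha} * (correct weight) + e^{alpha} * (misclassified weight)."""
    return np.exp(-alpha) * (1 - eps) + np.exp(alpha) * eps

alphas = np.linspace(-3, 3, 20001)
alpha_numeric = alphas[np.argmin(E(alphas))]       # grid-search minimizer
alpha_closed = 0.5 * np.log((1 - eps) / eps)       # (1/2) log((1 - eps)/eps)
```

The two agree up to the grid resolution, confirming that the coefficient in the algorithm is the exact minimizer of the exponential loss at each stage.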
◮ There is a neat trick for computing the total intensity in a rectangle in only a few operations: the integral image
◮ So it's easy to evaluate a huge number of base classifiers, and they are very fast at test time
◮ The algorithm adds classifiers greedily, based on their quality on the weighted training cases
◮ Pre-define weak classifiers, so optimization = selection
◮ Change loss function for weak learners: false positives less costly than misses
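The rectangle trick above works by precomputing cumulative sums once, after which any rectangle's total intensity costs only four array lookups. A minimal sketch (the 4×4 `img` is a hypothetical toy image, not from the lecture):

```python
import numpy as np

def integral_image(img):
    """ii[r, c] = total intensity of img[:r, :c]; padded with a zero row/column."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def rect_sum(ii, r0, c0, r1, c1):
    """Total intensity of img[r0:r1, c0:c1] using only four lookups."""
    return ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]

img = np.arange(16.0).reshape(4, 4)  # hypothetical tiny image
ii = integral_image(img)
total = rect_sum(ii, 1, 1, 3, 3)     # equals img[1:3, 1:3].sum()
```

Because each Haar-like feature is a difference of a few rectangle sums, evaluating thousands of candidate weak classifiers per window becomes cheap.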