Stefan Roth, 11.05.2012 | Department of Computer Science | GRIS
Statistical Machine Learning
A Crash Course
Part III: Boosting
Combining Classifiers

Horse race prediction:
It is hard to find a single highly accurate prediction rule, but easy to find many rough, only slightly better-than-random rules.
Approach: derive a rule of thumb from a subset of the examples, then repeat on new subsets, each time concentrating on the "hardest" examples (those most often misclassified by previous rules of thumb). Finally, combine all rules-of-thumb into a single accurate prediction rule.
Boosting combines T "weak classifiers" h_t, each with a (training) error of ε_t ≤ 1/2 − γ, γ > 0, into a strong classifier

H(x) = sign( Σ_{t=1}^{T} α_t h_t(x) )

that has a low training error.
[Figure: boosting illustrated over several rounds — 1st weak classifier; reweighted training data; 2nd weak classifier; reweighted training data; 3rd weak classifier]
The AdaBoost algorithm (N training examples (x_i, y_i), y_i ∈ {−1, +1}; T = # of boosting rounds):

- Initialize the example weights: D_1(i) = 1/N.
- For t = 1, …, T:
  - Train a weak classifier h_t so that the weighted error ε_t with weights D_t is minimized.
  - Set α_t = ½ ln((1 − ε_t) / ε_t).
  - Update the weights: D_{t+1}(i) = (1/Z_t) · D_t(i) · exp{−α_t y_i h_t(x_i)}, where Z_t is chosen such that D_{t+1} sums to 1.
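The loop above can be sketched in code. A minimal, illustrative AdaBoost with exhaustively searched decision stumps as weak learners, on made-up 1D data (the data, stump form, and round count are assumptions for illustration, not from the slides):

```python
import numpy as np

def train_stump(X, y, D):
    """Find the threshold/sign decision stump minimizing the weighted error under D."""
    best = None
    for thresh in np.unique(X):
        for sign in (+1, -1):
            pred = np.where(X >= thresh, sign, -sign)
            err = np.sum(D[pred != y])
            if best is None or err < best[0]:
                best = (err, thresh, sign)
    return best  # (epsilon_t, threshold, sign)

def adaboost(X, y, T):
    N = len(X)
    D = np.full(N, 1.0 / N)            # D_1(i) = 1/N
    stumps, alphas = [], []
    for t in range(T):
        eps, thresh, sign = train_stump(X, y, D)
        eps = max(eps, 1e-12)          # guard against a perfect stump
        alpha = 0.5 * np.log((1 - eps) / eps)
        pred = np.where(X >= thresh, sign, -sign)
        # D_{t+1}(i) ∝ D_t(i) * exp(-alpha_t * y_i * h_t(x_i)); Z_t renormalizes
        D = D * np.exp(-alpha * y * pred)
        D /= D.sum()
        stumps.append((thresh, sign))
        alphas.append(alpha)
    return stumps, alphas

def predict(stumps, alphas, X):
    """Weighted majority vote H(x) = sign(sum_t alpha_t h_t(x))."""
    f = sum(a * np.where(X >= th, s, -s) for (th, s), a in zip(stumps, alphas))
    return np.sign(f)

# Toy 1D data that no single stump classifies perfectly:
X = np.array([0., 1., 2., 3., 4., 5.])
y = np.array([1, 1, -1, -1, 1, 1])
stumps, alphas = adaboost(X, y, T=3)
print((predict(stumps, alphas, X) == y).mean())
```

Three rounds already suffice here: the weighted vote classifies all six points correctly even though no individual stump can.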
A new weak classifier is trained in every round, each time on a reweighted version of the training examples: the final strong classifier is the weighted majority vote H(x) = sign(Σ_{t=1}^{T} α_t h_t(x)).
Weak learner: in round t, choose

h_t = argmin_{h∈H} Σ_{i=1}^{N} D_t(i) · 1[h(x_i) ≠ y_i],

with weighted error ε_t ≤ 1/2 − γ_t, γ_t > 0, i.e. each weak classifier only needs to be slightly better than random guessing.
Training error bound [Freund & Schapire]: the training error of the strong classifier satisfies

(1/N) Σ_{i=1}^{N} 1[H(x_i) ≠ y_i] ≤ Π_{t=1}^{T} Z_t.
Proof idea: unravel the recursive update D_{t+1}(i) = (1/Z_t) D_t(i) exp{−α_t y_i h_t(x_i)} starting from D_1(i) = 1/N, which gives D_{T+1}(i) = exp{−y_i f(x_i)} / (N Π_{t=1}^{T} Z_t) with f(x) = Σ_t α_t h_t(x).
The update D_{t+1}(i) = (1/Z_t) D_t(i) exp{−α_t y_i h_t(x_i)} increases the weight on incorrectly classified examples and decreases the weight on correctly classified examples:

exp{−α_t y_i h_t(x_i)} > 1 if y_i ≠ h_t(x_i), and < 1 if y_i = h_t(x_i)

(since α_t > 0 whenever ε_t < 1/2).
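A quick numeric check of this direction (ε_t = 0.3 is an arbitrary illustrative value):

```python
import math

eps = 0.3                                 # illustrative weighted error < 1/2
alpha = 0.5 * math.log((1 - eps) / eps)   # alpha_t > 0
wrong = math.exp(-alpha * (-1))           # y_i != h_t(x_i): y_i * h_t(x_i) = -1
right = math.exp(-alpha * (+1))           # y_i == h_t(x_i): y_i * h_t(x_i) = +1
print(wrong, right)                       # multiplier > 1 vs. multiplier < 1
```

Note that the two multipliers are reciprocal, so the normalizer Z_t rebalances the distribution toward the mistakes.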
[Figure: AdaBoost on a 2D toy example — decision stumps h_t = argmin_{h∈H} of the weighted error, example weights D_{t+1}(i) = (1/Z_t) D_t(i) exp{−α_t y_i h_t(x_i)}, and the combined decision boundary shown after rounds t = 0, 1, 2, 3, 4, 5, 6, 7, and 40]
Here γ_t = 1/2 − ε_t measures how much better h_t is compared to random guessing. One can show

training error ≤ Π_{t=1}^{T} 2√(ε_t(1 − ε_t)) = Π_{t=1}^{T} √(1 − 4γ_t²) ≤ exp(−2 Σ_{t=1}^{T} γ_t²).

Hence any weak learner that consistently beats random guessing (γ_t ≥ γ > 0) can make the boosted classifier perform arbitrarily well (on the training data).
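Plugging hypothetical edges into the bound shows how quickly it shrinks (the γ values here are made up for illustration):

```python
import math

gammas = [0.2, 0.15, 0.1, 0.1, 0.05] * 4        # hypothetical edges for T = 20 rounds

prod = 1.0
for g in gammas:
    eps = 0.5 - g                               # epsilon_t = 1/2 - gamma_t
    prod *= 2 * math.sqrt(eps * (1 - eps))      # factor equals sqrt(1 - 4 g^2)
exp_bound = math.exp(-2 * sum(g * g for g in gammas))
print(prod, exp_bound)                          # product bound <= exponential bound
```

Each round multiplies the training-error bound by a factor strictly below 1, so the bound decays geometrically in T.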
One might expect that adding more and more weak classifiers eventually hurts: at some point the model starts to overfit the training data. Empirically, however, the test error of boosted classifiers often continues to decrease even after the training error has dropped to 0.
This can be explained through margins. Define the margin of a training example (x_i, y_i) as

margin(x_i, y_i) = y_i Σ_{t=1}^{T} α_t h_t(x_i) / Σ_{t=1}^{T} α_t ∈ [−1, 1].

A large positive margin corresponds to a confident correct classification of (x_i, y_i). Even after the training error reaches 0, boosting continues to increase the margins, concentrating on the most difficult cases:

Rounds            |   5   |  100  | 1000
Training error    |  0.0  |  0.0  |  0.0
Test error        |  8.4  |  3.3  |  3.1
% margins ≤ 0.5   |  7.7  |  0.0  |  0.0
Minimum margin    |  0.14 |  0.52 |  0.55
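The normalized margin can be computed directly from the ensemble's outputs; a sketch with hypothetical weak-classifier responses and α values (all numbers below are made up):

```python
import numpy as np

# Hypothetical weak-classifier outputs h_t(x_i) in {-1, +1}: 3 rounds, 4 examples
H = np.array([[ 1,  1, -1, -1],
              [ 1, -1, -1,  1],
              [ 1,  1, -1,  1]])
alphas = np.array([0.35, 0.55, 0.80])   # made-up alpha_t values
y = np.array([1, 1, -1, 1])

f = alphas @ H                          # f(x_i) = sum_t alpha_t h_t(x_i)
margins = y * f / alphas.sum()          # normalized margin, always in [-1, 1]
print(margins)
```

All margins are positive here (every example is classified correctly), but their magnitudes differ: examples on which the weak classifiers disagree get small margins.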
Remarks: boosting is simple to implement and works with very simple weak learners (in contrast to more complex models such as neural networks). It can fail, however, when the weak learners become too weak (γ_t → 0), since the error bound then no longer decreases.
Application: sliding-window object detection — the boosted classifier is evaluated over the whole image by scanning over scale and translation. (Image source: A. Zisserman)
The first couple of features selected by boosting are quite intuitive.
Pedestrian detection (Viola, Jones and Snow, ICCV'03):
- Examples of simple linear filters; many different possible filters of this type.
- 24×24 windows applied at multiple scales.
- 45,396 possible features in each window.
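Each rectangle filter can be evaluated in constant time using an integral image, which is what makes scanning tens of thousands of candidate features per 24×24 window practical. A minimal sketch on dummy data (the function names and the random image are illustrative):

```python
import numpy as np

def integral_image(img):
    """ii[r, c] holds the sum of img[:r, :c]; padded with a zero row and column."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, r, c, h, w):
    """Sum over img[r:r+h, c:c+w] using only four table lookups."""
    return ii[r + h, c + w] - ii[r, c + w] - ii[r + h, c] + ii[r, c]

rng = np.random.default_rng(0)
img = rng.random((24, 24))               # dummy 24x24 detection window
ii = integral_image(img)

# Two-rectangle Haar-like feature: left half-block minus right half-block
feat = rect_sum(ii, 4, 4, 8, 4) - rect_sum(ii, 4, 8, 8, 4)
print(feat)
```

Once the integral image is built (one pass over the window), every rectangle sum costs four lookups regardless of its size, so all candidate features share the same precomputation.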