CS 6316 Machine Learning
Boosting
Yangfeng Ji
Department of Computer Science University of Virginia
Overview: two strategies for building a strong classifier from simple ones
◮ Boosting: start with simple classifiers and gradually combine them into a powerful one
◮ Bagging: create multiple copies of the data, train a classifier on each copy, then combine the trained classifiers
The hypothesis class of decision stumps is

DS = {x ↦ b · sign(x·,j − θ) : θ ∈ ℝ, j ∈ [d], b ∈ {−1, +1}}

where j selects a single feature, θ is a threshold on that feature, and b determines which side of the threshold is labeled +1.

[Figure: a stump from DS with j = 1 and b = +1, plotted on features x1 and x2]
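As a concrete reading of this definition, here is a minimal sketch of a stump's prediction rule, assuming examples are stored as rows of a NumPy array; the function name and the convention sign(0) = +1 are illustrative choices.

```python
import numpy as np

def stump_predict(X, j, theta, b):
    """Decision stump h(x) = b * sign(x_j - theta).

    X: (m, d) array of examples; j: feature index; theta: threshold;
    b: sign in {-1, +1}. Returns (m,) predicted labels in {-1, +1}.
    """
    # map sign(0) to +1 so every point receives a label
    return b * np.where(X[:, j] - theta >= 0, 1, -1)

# Example: threshold feature 1 at 0.5, labeling the upper side +1:
# stump_predict(np.array([[0.2, 0.9], [0.3, 0.1]]), 1, 0.5, 1) -> [1, -1]
```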
ERM for decision stumps: given a training set S of size m, find the stump in DS that minimizes the empirical risk

LS(h) = (1/m) Σ_{i=1}^m 1[h(xi) ≠ yi]
Although θ ranges over all of ℝ, the empirical risk only changes when θ crosses a training point, so it suffices to check one threshold between each pair of consecutive sorted values. For a fixed feature j and sign b, the efficient ERM procedure is as follows (see the sketch after this list):

◮ Sort the training examples such that

x1,j ≤ x2,j ≤ · · · ≤ xm,j (8)

◮ Define the candidate thresholds

Θj = {(xi,j + xi+1,j)/2 : i ∈ [m − 1]} ∪ {x1,j − 1, xm,j + 1}

◮ Try each θ′ ∈ Θj and keep the one with minimal weighted risk for feature j:

LD(hθ′,j,b) = Σ_{i=1}^m Di · 1[hθ′,j,b(xi) ≠ yi] (9)

where D = (D1, . . . , Dm) is a distribution over the training examples; the uniform choice Di = 1/m recovers the unweighted risk above.
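The procedure translates almost directly into code. The following sketch (assuming NumPy; the name erm_stump is illustrative) loops over features and the candidate set Θj, scoring each stump with the weighted risk of equation (9). A running-sum update over the sorted values would avoid recomputing each risk from scratch, but the literal version is kept for clarity.

```python
import numpy as np

def erm_stump(X, y, D):
    """Find the decision stump minimizing the weighted risk (9).

    X: (m, d) features; y: (m,) labels in {-1, +1};
    D: (m,) nonnegative example weights summing to 1.
    Returns the best (j, theta, b).
    """
    m, d = X.shape
    best_risk, best = np.inf, (0, 0.0, 1)
    for j in range(d):
        xs = np.sort(X[:, j])            # x_{1,j} <= ... <= x_{m,j}, eq. (8)
        # candidate thresholds: midpoints plus one value on each side
        thetas = np.concatenate(([xs[0] - 1.0],
                                 (xs[:-1] + xs[1:]) / 2.0,
                                 [xs[-1] + 1.0]))
        for theta in thetas:
            base = np.where(X[:, j] - theta >= 0, 1, -1)
            for b in (-1, 1):
                risk = np.sum(D * (b * base != y))  # weighted risk, eq. (9)
                if risk < best_risk:
                    best_risk, best = risk, (j, theta, b)
    return best
```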
[Figure: illustration on features x1 and x2]
The AdaBoost algorithm:

1: Input: S = {(x1, y1), . . . , (xm, ym)}, weak learner A, number of rounds T
2: Initialize D(1) = (1/m, . . . , 1/m)
3: for t = 1, . . . , T do
4:   invoke the weak learner: ht = A(S, D(t))
5:   compute the weighted error: εt = Σ_{i=1}^m D(t)i · 1[ht(xi) ≠ yi]
6:   set the hypothesis weight: wt = (1/2) log(1/εt − 1)
7:   update the distribution: D(t+1)i = D(t)i exp(−wt yi ht(xi)) / Σ_{j=1}^m D(t)j exp(−wt yj ht(xj)), for all i ∈ [m]
8: end for
9: Output: the hypothesis hS(x) = sign(Σ_{t=1}^T wt ht(x))
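A minimal sketch of the algorithm in code, assuming NumPy and a weak_learner callable that returns a hypothesis as a function of the data; the clipping of εt is an illustrative guard against a weak learner with zero (or full) weighted error, not part of the pseudocode.

```python
import numpy as np

def adaboost(X, y, weak_learner, T):
    """AdaBoost, following the pseudocode above.

    X: (m, d) features; y: (m,) labels in {-1, +1}.
    weak_learner(X, y, D) must return a callable h with h(X) -> (m,)
    predicted labels in {-1, +1}. Returns a list of (w_t, h_t) pairs.
    """
    m = X.shape[0]
    D = np.full(m, 1.0 / m)                   # line 2: uniform D^(1)
    ensemble = []
    for _ in range(T):
        h = weak_learner(X, y, D)             # line 4: invoke weak learner
        pred = h(X)
        eps = np.sum(D * (pred != y))         # line 5: weighted error
        eps = np.clip(eps, 1e-12, 1 - 1e-12)  # guard the log below
        w = 0.5 * np.log(1.0 / eps - 1.0)     # line 6: hypothesis weight
        D = D * np.exp(-w * y * pred)         # line 7: upweight mistakes
        D = D / D.sum()                       #         and renormalize
        ensemble.append((w, h))
    return ensemble

def adaboost_predict(ensemble, X):
    """Line 9: h_S(x) = sign(sum_t w_t h_t(x)), with sign(0) -> +1."""
    score = sum(w * h(X) for w, h in ensemble)
    return np.where(score >= 0, 1, -1)
```

The erm_stump learner sketched earlier fits this interface once wrapped so that it returns the stump's prediction function rather than the parameters (j, θ, b).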
[Figure: the combined classifier after (a) t = 1, (b) t = 2, (c) t = 3 boosting rounds]
References

Mohri, M., Rostamizadeh, A., and Talwalkar, A. (2018). Foundations of Machine Learning. MIT Press.

Shalev-Shwartz, S. and Ben-David, S. (2014). Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press.