Boosting: more than an ensemble method for prediction
Peter Bühlmann, ETH Zürich
1. Historically: Boosting is about multiple predictions
Data: (X1, Y1), . . . , (Xn, Yn) (i.i.d. or stationary)
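The slides contain no code; purely as an illustration of this historical view (many predictions from a re-weighted base learner, aggregated by weighted voting), here is a minimal AdaBoost-style sketch with decision stumps. All names and parameter choices below are my own and only for illustration.

```python
import numpy as np

def fit_stump(X, y, w):
    """Weighted decision stump: split a single coordinate at a single threshold."""
    best = None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = sign * np.where(X[:, j] > thr, 1, -1)
                err = np.sum(w * (pred != y))
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best  # (weighted error, coordinate, threshold, sign)

def adaboost(X, y, n_rounds=50):
    """y must be coded in {-1, +1}; returns a list of weighted stumps."""
    n = len(y)
    w = np.full(n, 1.0 / n)                    # case weights, updated each round
    ensemble = []
    for _ in range(n_rounds):
        err, j, thr, sign = fit_stump(X, y, w)
        err = min(max(err, 1e-12), 1 - 1e-12)  # guard against log(0)
        alpha = 0.5 * np.log((1 - err) / err)  # vote of this single prediction
        pred = sign * np.where(X[:, j] > thr, 1, -1)
        w = w * np.exp(-alpha * y * pred)      # up-weight the misclassified cases
        w = w / w.sum()
        ensemble.append((alpha, j, thr, sign))
    return ensemble

def predict(ensemble, X):
    """Aggregate the many individual predictions by weighted majority vote."""
    votes = sum(alpha * sign * np.where(X[:, j] > thr, 1, -1)
                for alpha, j, thr, sign in ensemble)
    return np.sign(votes)
```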
[Figures: MSE versus boosting iterations]
θ̂(x) = β̂_Ŝ x^(Ŝ),   β̂_j = Σ_{i=1}^n Y_i X_i^(j) / Σ_{i=1}^n (X_i^(j))^2,   Ŝ = argmin_j Σ_{i=1}^n (Y_i − β̂_j X_i^(j))^2
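This is the componentwise linear least squares base procedure: each β̂_j is the simple least squares coefficient of the response on the j-th predictor alone, and Ŝ selects the predictor with the smallest residual sum of squares. A minimal numpy sketch of this selector inside an L2Boosting loop follows; the step length nu and the number of iterations are my own illustrative choices, not values prescribed by the slides.

```python
import numpy as np

def componentwise_ls(X, r):
    """For each coordinate j: beta_j = sum_i r_i X_i^(j) / sum_i (X_i^(j))^2;
    S_hat is the coordinate whose simple fit has the smallest residual sum of squares."""
    beta = X.T @ r / np.sum(X ** 2, axis=0)
    rss = np.sum((r[:, None] - X * beta) ** 2, axis=0)
    s = int(np.argmin(rss))
    return s, beta[s]

def l2boost(X, y, n_iter=100, nu=0.1):
    """L2Boosting with the componentwise linear base procedure:
    repeatedly refit it to the current residuals and take a small step."""
    coef = np.zeros(X.shape[1])
    fit = np.zeros(len(y))
    for _ in range(n_iter):
        s, b = componentwise_ls(X, y - fit)   # base procedure on residuals
        coef[s] += nu * b                     # shrunken update of coordinate S_hat
        fit += nu * b * X[:, s]
    return coef
```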
[Figure: sorted regression coefficients of the selected genes]
Y = 10 sin(πX1X2) + 20(X3 − 0.5)^2 + 10X4 + 5X5 + N(0, 1), X = (X1, . . . , X20) ∼ Unif.([0, 1]^20)
[Figure: MSE versus boosting iterations for L2Boost (AIC_c-stopped) and MARS; p = 20, p-eff = 10, n = 50]
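For reference, data from the simulation model above could be generated roughly as follows (a sketch of my own; n = 50 and p = 20 match the panel title, the seed is arbitrary):

```python
import numpy as np

def friedman_data(n=50, p=20, seed=0):
    """Draw X uniformly on [0, 1]^p and
    Y = 10 sin(pi X1 X2) + 20 (X3 - 0.5)^2 + 10 X4 + 5 X5 + N(0, 1)."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(size=(n, p))
    y = (10 * np.sin(np.pi * X[:, 0] * X[:, 1])
         + 20 * (X[:, 2] - 0.5) ** 2
         + 10 * X[:, 3]
         + 5 * X[:, 4]
         + rng.normal(size=n))
    return X, y
```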
[Figure: MSE versus boosting iterations for AIC-stopped L2Boost, Lasso, forward variable selection and OLS; uncorrelated design (left) and correlated design (right)]
[Figure: twelve panels plotted against the predictor, with panel titles df = 3.5, 2.7, 0, 2.6, 4.9, 5.3, 6.9, 8.2, 6.3, 6.4, 0.9, 2.1]
[Figure: generalization squared error versus boosting iteration m (left, boosting) and versus degrees of freedom (right, varying df)]
[Figure: loss as a function of the margin yf; monotone losses: exponential, log-likelihood, SVM, 0-1 (left); non-monotone losses: L2, L1, 0-1 (right)]
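These are the usual margin-based losses written as functions of the margin v = yf with y in {-1, +1}; the small sketch below uses the standard formulas (the exact scaling of the log-likelihood loss is a convention, not taken from the slides):

```python
import numpy as np

# losses as functions of the margin v = y * f, with class labels y in {-1, +1}
def loss_01(v):     return (v <= 0).astype(float)       # misclassification
def loss_exp(v):    return np.exp(-v)                   # exponential loss (AdaBoost)
def loss_loglik(v): return np.log1p(np.exp(-v))         # logistic / negative log-likelihood
def loss_hinge(v):  return np.maximum(0.0, 1.0 - v)     # SVM hinge loss
def loss_l2(v):     return (1.0 - v) ** 2               # (y - f)^2 rewritten in v; non-monotone
def loss_l1(v):     return np.abs(1.0 - v)              # |y - f| rewritten in v; non-monotone
```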
[Figure: threshold functions of z: hard-thresholding, nonnegative garrote, soft-thresholding]
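The three threshold functions shown here have standard closed forms; a brief sketch with the threshold lam left as a free parameter (the definitions are the usual ones, not copied from the slides):

```python
import numpy as np

def hard_threshold(z, lam):
    """Keep z unchanged if |z| > lam, otherwise set it to 0."""
    return z * (np.abs(z) > lam)

def soft_threshold(z, lam):
    """Shrink z towards 0 by lam, truncated at 0."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def nn_garrote(z, lam):
    """Nonnegative garrote: shrink z by the factor (1 - lam^2 / z^2)_+,
    intermediate between hard- and soft-thresholding."""
    with np.errstate(divide="ignore", invalid="ignore"):
        factor = np.maximum(1.0 - lam ** 2 / z ** 2, 0.0)
    return np.nan_to_num(z * factor)
```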
[Figure: MSE versus boosting iterations for L2Boosting, SparseL2Boosting and MARS; interaction modelling with p = 20, effective p = 5]
Y = 10 sin(πX1X2) + 20(X3 − 0.5)^2 + 10X4 + 5X5 + N(0, 1), X = (X1, . . . , X20) ∼ Unif.([0, 1]^20)
[Figure: four scatter plots of log-concentration versus log-expression]