Model Building:
Ensemble Methods
Max Kuhn and Kjell Johnson, Nonclinical Statistics, Pfizer
Splitting Example: Boston Housing. Searching through the first left split ( ), the best split again uses the lower status %. In the …
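The split search described in the example above can be sketched in a few lines. This is a minimal illustration, assuming a regression tree that picks the split with the smallest summed squared error in the two child nodes. The function names (`sse`, `best_split`) and the toy data are hypothetical, and the numbers are made up rather than taken from the Boston Housing data.

```python
def sse(values):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(rows, targets):
    """Return (feature_index, threshold, total_sse) with the lowest child SSE."""
    best = None
    for j in range(len(rows[0])):
        values = sorted(set(row[j] for row in rows))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [y for row, y in zip(rows, targets) if row[j] < threshold]
            right = [y for row, y in zip(rows, targets) if row[j] >= threshold]
            total = sse(left) + sse(right)
            if best is None or total < best[2]:
                best = (j, threshold, total)
    return best


# Made-up toy data (NOT the Boston Housing values): two predictors, one outcome.
toy_rows = [(1.0, 7.0), (2.0, 3.0), (3.0, 9.0), (8.0, 4.0), (9.0, 8.0), (10.0, 2.0)]
toy_targets = [30.0, 31.0, 29.0, 12.0, 11.0, 13.0]

# First split over all observations.
feature, threshold, total = best_split(toy_rows, toy_targets)

# Repeat the search inside the left child only.
left_idx = [i for i, row in enumerate(toy_rows) if row[feature] < threshold]
left_split = best_split([toy_rows[i] for i in left_idx],
                        [toy_targets[i] for i in left_idx])
```

On this toy data the first split and the split inside the left child both use predictor 0, which mirrors the situation in the example where the same variable is chosen again.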
[Figure: diagram labels "sample" and "predictions"]
Randomly select a subset of variables from the original data
Build trees
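The two steps above can be sketched as follows. This is a simplified illustration under several assumptions that the slide does not state: each tree is a one-split regression stump, each tree is fit to a bootstrap sample, the variable subset is drawn once per tree, and predictions are averaged. All names (`fit_stump`, `fit_forest`, `n_vars`) and the toy data are hypothetical.

```python
import random


def fit_stump(rows, targets, features):
    """Fit a one-split regression tree using only the given feature indices."""
    overall = sum(targets) / len(targets)
    best = None
    for j in features:
        values = sorted(set(row[j] for row in rows))
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [y for row, y in zip(rows, targets) if row[j] < threshold]
            right = [y for row, y in zip(rows, targets) if row[j] >= threshold]
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            error = (sum((y - left_mean) ** 2 for y in left)
                     + sum((y - right_mean) ** 2 for y in right))
            if best is None or error < best[0]:
                best = (error, j, threshold, left_mean, right_mean)
    if best is None:
        # No usable split (selected variables are constant): predict the mean.
        return (None, None, overall, overall)
    return best[1:]


def predict_stump(stump, row):
    feature, threshold, left_mean, right_mean = stump
    if feature is None:
        return left_mean
    return left_mean if row[feature] < threshold else right_mean


def fit_forest(rows, targets, n_trees=25, n_vars=1, seed=0):
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        # Assumption: bootstrap sample of the observations for each tree.
        idx = [rng.randrange(n) for _ in range(n)]
        # Randomly select a subset of variables, then build the tree on them.
        features = rng.sample(range(len(rows[0])), n_vars)
        forest.append(fit_stump([rows[i] for i in idx],
                                [targets[i] for i in idx], features))
    return forest


def predict_forest(forest, row):
    """Average the predictions of all trees."""
    return sum(predict_stump(s, row) for s in forest) / len(forest)


# Made-up toy data: two predictors, one outcome.
toy_rows = [(1.0, 7.0), (2.0, 3.0), (3.0, 9.0), (8.0, 4.0), (9.0, 8.0), (10.0, 2.0)]
toy_targets = [30.0, 31.0, 29.0, 12.0, 11.0, 13.0]

forest = fit_forest(toy_rows, toy_targets)
pred = predict_forest(forest, toy_rows[0])
```

Common random forest implementations draw a fresh variable subset at every split of a full-depth tree; the per-tree draw here only keeps the sketch short.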
Stage 1: Build weighted tree. Root n=200, split on X1 at 5.2 (X1 > 5.2 / X1 < 5.2), child nodes n=90 and n=110. Compute error. Compute stage weight: βstage 1 = f(32.9). Reweight: determine the weight wi of the ith observation (i = 1, 2, ..., n). The larger the error, the higher the weight.
Stage 2: Root n=200, split on X27 at 22.4 (X27 > 22.4 / X27 < 22.4), child nodes n=64 and n=136. βstage 2 = f(26.7). Determine weight of ith observation.
. . .
Stage M: Root n=200, split on X6 at 0 (X6 > 0 / X6 < 0), child nodes n=161 and n=39. βstage M = f(29.5).
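The stage loop above (build a weighted tree, compute its error, compute the stage weight, reweight the observations) can be sketched as follows. This is a minimal illustration, assuming a two-class problem with labels +1 and -1, one-split stumps as the weighted trees, and the AdaBoost.M1 choice f(err) = log((1 - err) / err) for the stage weight. The slide does not specify f, so that formula is an assumption, and all function names and the toy data are hypothetical.

```python
import math


def fit_weighted_stump(rows, labels, weights):
    """Return (error, feature, threshold, sign) with the lowest weighted error."""
    best = None
    for j in range(len(rows[0])):
        values = sorted(set(row[j] for row in rows))
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            for sign in (1, -1):
                preds = [sign if row[j] > threshold else -sign for row in rows]
                err = sum(w for w, p, y in zip(weights, preds, labels) if p != y)
                if best is None or err < best[0]:
                    best = (err, j, threshold, sign)
    return best


def boost(rows, labels, n_stages=3):
    n = len(rows)
    weights = [1.0 / n] * n  # start with equal observation weights
    stages = []
    for _ in range(n_stages):
        err, j, threshold, sign = fit_weighted_stump(rows, labels, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
        # Assumed form of f(error): the AdaBoost.M1 stage weight.
        beta = math.log((1 - err) / err)
        preds = [sign if row[j] > threshold else -sign for row in rows]
        # Reweight: misclassified observations receive a higher weight.
        weights = [w * math.exp(beta) if p != y else w
                   for w, p, y in zip(weights, preds, labels)]
        total = sum(weights)
        weights = [w / total for w in weights]
        stages.append((beta, j, threshold, sign))
    return stages


def predict(stages, row):
    """Weighted vote of all stages, using the stage weights."""
    score = sum(beta * (sign if row[j] > threshold else -sign)
                for beta, j, threshold, sign in stages)
    return 1 if score >= 0 else -1


# Made-up toy data: one predictor, labels that no single stump separates.
toy_rows = [(1.0,), (2.0,), (3.0,), (4.0,), (5.0,), (6.0,)]
toy_labels = [1, 1, -1, -1, -1, 1]

stages = boost(toy_rows, toy_labels, n_stages=3)
```

In this sketch the first stump misclassifies one of six equally weighted observations, so its stage weight is log(5). That observation then carries half of the total weight in stage 2, which pushes the second stump to split elsewhere.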
Rating scale: Excellent, Very Good, Average, Fair, Poor
ZV = zero-variance predictor; NZV = near-zero-variance predictor; CS = center and scale; HCP = highly correlated predictor.
* Depends on implementation.
Tree Regression PLS MARS RF/Bagging Boosted Tree SVM NNet