Cross-validation and the Bootstrap
- In this section we discuss two resampling methods: cross-validation and the bootstrap.
[Figure: prediction error versus model complexity (low to high) for the training sample and the test sample. Low complexity: high bias, low variance. High complexity: low bias, high variance.]
[Figure: two panels, each showing mean squared error versus degree of polynomial.]
[Figure: mean squared error versus degree of polynomial, in two panels: LOOCV and 10-fold CV.]
[Figure: three panels, each showing mean squared error versus flexibility.]
[Diagrams (two slides): a samples-by-predictors data matrix with the outcome, the selected set of predictors, and the CV folds.]
[Figure: four panels, each plotting Y against X.]
[Figure: three panels of estimates of α, with labels True and Bootstrap.]
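[Figure: schematic of the bootstrap on a dataset with three observations. Bootstrap datasets $Z^{*1}, Z^{*2}, \ldots, Z^{*B}$ are drawn from the original data and give the estimates $\hat{\alpha}^{*1}, \hat{\alpha}^{*2}, \ldots, \hat{\alpha}^{*B}$.]

Original Data (Z)
Obs   X     Y
1     4.3   2.4
2     2.1   1.1
3     5.3   2.8

$Z^{*1}$: observations 3, 1, 3
$Z^{*2}$: observations 2, 3, 1
$Z^{*B}$: observations 2, 2, 1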
[Diagram: Real World: population, random sampling, data, estimate. Bootstrap World: estimated population, random sampling, bootstrap dataset $(z_1^*, z_2^*, \ldots, z_n^*)$, bootstrap estimate.]
[Diagram labels: Response, Pre-validated Predictor, Observations, Predictors, Omitted data, Logistic Regression, Fixed predictors.]