SLIDE 10 3
Simulation study
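[Figure: pipeline diagram in two phases.
Learning. Training: algorithms $A_1, A_2, \dots, A_M$ yield models $\hat f_1, \dots, \hat f_M: X \to \hat Y$. Validation: performance estimates $\hat\vartheta^V_1, \dots, \hat\vartheta^V_M$. A selection rule passes a subset of models on (in the example $\hat f_2, \hat f_4, \dots, \hat f_M$).
Evaluation. Test: estimates $\hat\vartheta^E_2, \hat\vartheta^E_4, \dots, \hat\vartheta^E_M$ with confidence intervals $CI_2, CI_4, \dots, CI_M$ and test decisions $\phi_2, \phi_4, \dots, \phi_M$ ($\phi_1, \phi_3$ for the unselected models); an argmax step returns the final model $\hat f_2: X \to \hat Y$.]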
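The evaluation stage of the diagram (test estimates, confidence intervals, test decisions, argmax) can be sketched roughly as below. This is not the study's implementation: the accuracy metric, the one-sided Wald interval, the Bonferroni adjustment, the benchmark `theta0`, setting $\phi = 0$ for unselected models, and taking the argmax over test estimates are all assumptions made for illustration, and `evaluate_selected` is a hypothetical helper.

```python
import math
from statistics import NormalDist


def evaluate_selected(test_correct, selected, n_models, theta0=0.5, alpha=0.05):
    """Evaluation stage for the models passed on by the selection rule.

    test_correct: dict mapping model index -> list of 0/1 flags (1 = correct test prediction)
    selected: indices of the selected models (0-based)
    n_models: total number of candidate models M
    theta0: hypothetical accuracy benchmark (assumption, not from the slide)
    alpha: overall significance level
    """
    # Bonferroni-adjusted one-sided normal quantile: a simple stand-in
    # for whatever multiplicity adjustment the study actually uses.
    z = NormalDist().inv_cdf(1 - alpha / len(selected))
    est, ci = {}, {}
    phi = {j: 0 for j in range(n_models)}  # unselected models: no positive decision (assumption)
    for j in selected:
        flags = test_correct[j]
        p = sum(flags) / len(flags)               # test accuracy estimate
        se = math.sqrt(p * (1 - p) / len(flags))  # binomial standard error
        est[j] = p
        ci[j] = (p - z * se, 1.0)                 # one-sided Wald interval
        phi[j] = int(ci[j][0] > theta0)           # reject H0: accuracy <= theta0?
    final = max(selected, key=lambda j: est[j])   # argmax over test estimates (assumption)
    return est, ci, phi, final


correct = {1: [1] * 90 + [0] * 10, 3: [1] * 70 + [0] * 30}
est, ci, phi, final = evaluate_selected(correct, selected=[1, 3], n_models=4)
print(final, phi)  # 1 {0: 0, 1: 1, 2: 0, 3: 1}
```

With 0-based indices, `selected=[1, 3]` plays the role of $\hat f_2$ and $\hat f_4$ in the diagram; both clear the hypothetical benchmark, and the argmax returns the model with 90% test accuracy.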
Idea: simulate data, then train, select and evaluate binary classifiers in different scenarios
- 24 artificial classification tasks
- 72,000 replications of the complete ML pipeline
- 28,800,000 distinct models (EN (elastic net), CART, SVM, XGB (XGBoost))
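The three counts on the slide are linked by simple division. The first ratio gives the average number of models per pipeline replication; the second assumes, which the slide does not state, that replications are spread evenly over the tasks.

```latex
\frac{28\,800\,000 \text{ models}}{72\,000 \text{ replications}} = 400 \text{ models per replication (on average)}
\qquad
\frac{72\,000 \text{ replications}}{24 \text{ tasks}} = 3\,000 \text{ replications per task (if evenly split)}
```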
Goal: comparison of different evaluation strategies
- default: best validation model only
- within 1 SE: all models within one standard error (SE) of the best validation model
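A minimal sketch of the two selection rules, assuming performance is validation accuracy and that "1 SE" means the binomial standard error of the best model's validation accuracy; the slide does not specify either, and `select_models` is a hypothetical helper.

```python
import math


def select_models(val_acc, n_val, rule="default"):
    """Return indices of the models passed on to the evaluation stage.

    val_acc: validation accuracies, one per candidate model
    n_val: number of validation observations (assumed equal for all models)
    rule: "default" (best validation model only) or "within1se"
    """
    best = max(range(len(val_acc)), key=lambda j: val_acc[j])
    if rule == "default":
        return [best]
    if rule == "within1se":
        p = val_acc[best]
        se = math.sqrt(p * (1 - p) / n_val)  # binomial SE of the best accuracy (assumption)
        return [j for j, a in enumerate(val_acc) if a >= p - se]
    raise ValueError(f"unknown rule: {rule}")


acc = [0.80, 0.84, 0.78, 0.83, 0.70]
print(select_models(acc, n_val=100))                     # [1]
print(select_models(acc, n_val=100, rule="within1se"))   # [1, 3]
```

Here the SE is about 0.037, so the cutoff is roughly 0.803 and the model at 0.83 joins the best one. The within-1-SE rule passes more models to the test stage, which is why the evaluation then has to account for multiple comparisons.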
Max Westphal Improving Model Selection
7 / 9