
SLIDE 1

Improving Model Selection by Employing the Test Data

Max Westphal, Werner Brannath

University of Bremen, Germany
Institute for Statistics, Research Training Group π3
mwestphal@uni-bremen.de
https://github.com/maxwestphal/

ICML 2019, Long Beach, June 11, 2019

Max Westphal Improving Model Selection

1 / 9

SLIDE 2

Train-Validation-Test Split

[Diagram: Learning phase. Training: the data (X, Y) are passed to the candidate algorithms A1, …, AM, producing fitted models f̂1, …, f̂M. Validation: each model receives a validation performance estimate ϑ̂V1, …, ϑ̂VM; argmax over these estimates selects a single model.]




SLIDE 5

Train-Validation-Test Split

Particularly in regulated environments, we need a reliable performance assessment before implementing a prediction model in practice.
Example: disease diagnosis / prognosis based on clinical data.
Usually recommended strategy: evaluate a single final model on independent test data.


SLIDE 6

Train-Validation-Test Split

[Diagram: Evaluation phase. Only the selected model f̂4 is evaluated on the test data, yielding a test performance estimate ϑ̂E4, a confidence interval CI4, and a test decision ϕ4.]


SLIDE 7

Train-Validation-Test Split

Particularly in regulated environments, we need a reliable performance assessment before implementing a prediction model in practice.
Example: disease diagnosis / prognosis based on clinical data.
Usually recommended strategy: evaluate a single final model on independent test data!
This strategy is easy to use and allows for a reliable performance assessment and simple inference.
However, there is no way to correct a poor model selection once the test data have been observed.
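As a minimal sketch of this default strategy (all model names and numbers below are hypothetical, not from the talk): the final model is the validation argmax, and only that one model ever touches the test data, so a standard binomial confidence interval applies without any multiplicity adjustment.

```python
import numpy as np

def wald_ci(correct, n, alpha=0.05):
    """Wald confidence interval for accuracy from n test predictions."""
    p = correct / n
    z = 1.959963984540054  # standard-normal 97.5% quantile for alpha = 0.05
    half = z * np.sqrt(p * (1 - p) / n)
    return max(0.0, p - half), min(1.0, p + half)

# Default strategy: pick the single best validation model ...
val_acc = {"f1": 0.81, "f2": 0.84, "f3": 0.79}   # hypothetical validation accuracies
final = max(val_acc, key=val_acc.get)            # argmax over validation performance

# ... then evaluate only that model once on the independent test data.
n_test, n_correct = 200, 162                     # hypothetical test-set results
lo, hi = wald_ci(n_correct, n_test)              # CI for the final model's accuracy
```

If `f2` then turns out to be a poor choice, the test data are already spent; that is exactly the limitation the slide points out.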


SLIDE 8

Simultaneous Model Evaluation

[Diagram: Learning phase as before (training data (X, Y), algorithms A1, …, AM, models f̂1, …, f̂M, validation estimates ϑ̂V1, …, ϑ̂VM), but the argmax is replaced by a more liberal selection rule that may carry several candidate models forward.]


SLIDE 9

Simultaneous Model Evaluation

[Diagram: Evaluation phase. All selected models (here f̂2, f̂4, …, f̂M) are evaluated on the test data, yielding test estimates ϑ̂E2, ϑ̂E4, …, ϑ̂EM, simultaneous confidence intervals CI2, CI4, …, CIM, and test decisions ϕ2, ϕ4, …, ϕM; the final model (here f̂2) is chosen by argmax over the test estimates.]


SLIDE 10

Simulation Study

[Diagram: Complete pipeline combining the learning phase (training, validation, selection rule) and the evaluation phase (test estimates, simultaneous confidence intervals, test decisions, final argmax on the test data).]

Idea: simulate data, then train, select, and evaluate binary classifiers in different scenarios.

- 24 artificial classification tasks
- 72,000 replications of the complete ML pipeline
- 28,800,000 distinct models (EN, CART, SVM, XGB)

Goal: comparison of different evaluation strategies.

- default: best validation model only
- within 1 SE: all models within 1 SE of the best validation model
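The "within 1 SE" rule can be sketched as follows; the binomial standard error and the accuracy values are illustrative assumptions, not the paper's exact definition:

```python
import numpy as np

def within_one_se(val_acc, n_val):
    """Select all models whose validation accuracy lies within one
    standard error of the best validation model (a sketch of the
    'within 1 SE' selection rule; SE taken as the binomial standard
    error of the best model's accuracy estimate)."""
    val_acc = np.asarray(val_acc, dtype=float)
    best = val_acc.max()
    se = np.sqrt(best * (1 - best) / n_val)
    return np.flatnonzero(val_acc >= best - se)

# Hypothetical validation accuracies for M = 5 candidate models:
acc = [0.78, 0.84, 0.83, 0.70, 0.82]
idx = within_one_se(acc, n_val=100)   # indices carried forward to the test set
```

All selected models are then evaluated simultaneously on the test data, in contrast to the default strategy, which would keep only the single argmax model.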


SLIDE 11

Simulation Results

SLIDE 12

Simulation Results

SLIDE 13

Conclusions

When in doubt, delay the final model choice to the test data.

- Improvements in model performance and in the probability of correctly identifying a good model in all investigated scenarios.
- Adjustment for multiple comparisons via an approximate parametric procedure that takes model similarity into account (maxT approach).

Questions & feedback welcome!
mwestphal@uni-bremen.de
https://github.com/maxwestphal/
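A rough sketch of the maxT idea (assuming normal approximations and a given correlation matrix R of the models' test statistics; this is not the authors' implementation): the simultaneous critical value is the (1 − α) quantile of max|Z| for Z ~ N(0, R), which shrinks as the evaluated models become more similar, i.e. more correlated.

```python
import numpy as np

rng = np.random.default_rng(0)

def maxT_critical_value(R, alpha=0.05, draws=100_000):
    """Monte Carlo approximation of the maxT critical value:
    the (1 - alpha) quantile of max|Z| for Z ~ N(0, R)."""
    Z = rng.multivariate_normal(np.zeros(R.shape[0]), R, size=draws)
    return np.quantile(np.abs(Z).max(axis=1), 1 - alpha)

# Similar models -> highly correlated test statistics -> a smaller
# critical value than in the independent (Bonferroni-like) case.
M = 3
R_similar = np.full((M, M), 0.9)
np.fill_diagonal(R_similar, 1.0)
c_similar = maxT_critical_value(R_similar)
c_indep = maxT_critical_value(np.eye(M))
# Simultaneous CIs would then be theta_hat_m +/- c * se_m per model.
```

Because model similarity is exploited, the simultaneous intervals are less conservative than a plain multiplicity correction that ignores correlation.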

POSTER #123 (Pacific Ballroom)
