Improving Model Selection by Employing the Test Data



  1. Improving Model Selection by Employing the Test Data. Max Westphal, Werner Brannath. University of Bremen, Germany; Institute for Statistics; Research Training Group π³. mwestphal@uni-bremen.de, https://github.com/maxwestphal/. ICML 2019, Long Beach, June 11, 2019.

  2.-4. Train-Validation-Test Split: [Diagram, shown over three animation builds: learning algorithms A_1, ..., A_M produce candidate models f̂_1, ..., f̂_M from the training data (X, Y); each model's performance ϑ̂^V_m is estimated on the validation data; the learning phase ends with an argmax over the validation estimates, which selects a single model.]
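To make the learning phase sketched above concrete, here is a minimal Python example, assuming a scikit-learn style workflow; the simulated data, the three candidate learners and all variable names are illustrative and not taken from the talk.

```python
# Minimal sketch of the learning phase: train M candidate models, estimate
# their performance on validation data, and select the argmax (default rule).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Illustrative simulated binary classification data.
rng = np.random.default_rng(0)
X = rng.normal(size=(1500, 10))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=1500) > 0).astype(int)

# Train / validation / test split.
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

# Candidate models f_1, ..., f_M produced by different learning algorithms A_m.
candidates = [LogisticRegression(max_iter=1000),
              DecisionTreeClassifier(max_depth=3),
              SVC()]
models = [m.fit(X_train, y_train) for m in candidates]

# Validation performance estimates (theta^V_m) and the default argmax selection.
theta_val = np.array([accuracy_score(y_val, m.predict(X_val)) for m in models])
selected = int(np.argmax(theta_val))
print("validation accuracies:", theta_val, "-> selected model index:", selected)
```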

  5. Train-Validation-Test Split: Particularly in regulated environments, we need a reliable performance assessment before implementing a prediction model in practice. Example: disease diagnosis / prognosis based on clinical data. Usually recommended strategy: evaluate a single final model on independent test data.

  6. Train-Validation-Test Split: [Diagram: only the selected model (here f̂_4) is carried forward to the test data, where its performance ϑ̂^E_4 is estimated, yielding a confidence interval CI_4 and a test decision ϕ_4.]

  7. Train-Validation-Test Split: Particularly in regulated environments, we need a reliable performance assessment before implementing a prediction model in practice. Example: disease diagnosis / prognosis based on clinical data. Usually recommended strategy: evaluate a single final model on independent test data! This strategy is easy to use and allows a reliable performance assessment with simple inference. However, there is no way to correct a bad model selection after the test data have been observed.
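A minimal sketch of the default evaluation step described above: the single selected model is assessed once on independent test data, yielding a point estimate, a confidence interval and a test decision. The Wald-type interval, the benchmark value theta_0 = 0.75 and the helper name evaluate_final_model are my own illustrative choices, not necessarily the procedure used in the talk.

```python
# Sketch of the default evaluation: the single selected model is assessed once
# on independent test data (point estimate, Wald-type CI, one-sided test
# decision against a hypothetical benchmark theta_0).
import numpy as np
from scipy.stats import norm

def evaluate_final_model(y_true, y_pred, theta_0=0.75, alpha=0.05):
    """Accuracy estimate theta^E, (1 - alpha) CI and test decision phi."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    n = len(y_true)
    theta_hat = np.mean(y_true == y_pred)                     # theta^E
    se = np.sqrt(theta_hat * (1.0 - theta_hat) / n)           # binomial SE
    z = norm.ppf(1.0 - alpha / 2.0)
    ci = (theta_hat - z * se, theta_hat + z * se)
    phi = theta_hat - norm.ppf(1.0 - alpha) * se > theta_0    # reject H0: theta <= theta_0?
    return theta_hat, ci, phi

# Usage, continuing the earlier sketch (names are illustrative):
# theta_hat, ci, phi = evaluate_final_model(y_test, models[selected].predict(X_test))
```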

  8. Simultaneous Model Evaluation: [Diagram: the learning phase is unchanged (models f̂_1, ..., f̂_M are trained and assessed on validation data), but the final argmax is replaced by a selection rule that may carry several promising models forward.]

  9. Simultaneous Model Evaluation: [Diagram: the selected candidates (here f̂_2, f̂_4, ..., f̂_M) are evaluated simultaneously on the test data; each receives a performance estimate ϑ̂^E_m, a confidence interval CI_m and a test decision ϕ_m, and the final model is chosen by an argmax over the test estimates.]
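The simultaneous evaluation idea on slides 8 and 9 can be sketched as follows: several promising candidates are carried to the test data, the final model is chosen by an argmax over the test estimates, and the confidence intervals share a common maxT-type critical value that accounts for the correlation between models. The Monte Carlo approximation of that critical value, the use of accuracy as the performance measure and the function name simultaneous_evaluation are illustrative assumptions rather than the exact procedure from the talk.

```python
# Sketch of simultaneous model evaluation: several promising candidates are
# carried to the test data, the final model is the argmax of the test
# estimates, and all CIs share a maxT-type critical value that accounts for
# the correlation between the models' accuracy estimates.
import numpy as np

def simultaneous_evaluation(pred_matrix, y_test, alpha=0.05, n_mc=100_000, seed=1):
    """pred_matrix: (S, n) array; row s holds the test predictions of candidate s."""
    rng = np.random.default_rng(seed)
    correct = (pred_matrix == np.asarray(y_test)).astype(float)  # (S, n) correctness indicators
    S, n = correct.shape
    theta_hat = correct.mean(axis=1)                             # theta^E_m per model
    se = np.sqrt(theta_hat * (1.0 - theta_hat) / n)
    # Estimated correlation between the model-wise accuracy estimates
    # (rows with constant correctness would need special handling).
    R = np.nan_to_num(np.corrcoef(correct))
    np.fill_diagonal(R, 1.0)
    # maxT critical value: (1 - alpha) quantile of max_m |Z_m| with Z ~ N(0, R),
    # approximated by Monte Carlo.
    Z = rng.multivariate_normal(np.zeros(S), R, size=n_mc)
    c = np.quantile(np.max(np.abs(Z), axis=1), 1.0 - alpha)
    ci = np.column_stack([theta_hat - c * se, theta_hat + c * se])
    final = int(np.argmax(theta_hat))                            # final choice on the test data
    return final, theta_hat, ci
```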

  10. Simulation study: [Diagram: the complete pipeline from training and validation-based selection to simultaneous evaluation of the selected models on the test data.] Idea: simulate data and train, select and evaluate binary classifiers in different scenarios. Scope: 24 artificial classification tasks; 72,000 replications of the complete ML pipeline; 28,800,000 distinct models (EN, CART, SVM, XGB). Goal: comparison of different evaluation strategies. default: best validation model only. within 1 SE: all models within 1 SE of the best validation model.
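The "within 1 SE" rule compared in this simulation study can be sketched as follows; the binomial standard error for validation accuracy, the helper name within_one_se and the example numbers are illustrative assumptions.

```python
# Sketch of the "within 1 SE" rule: every model whose validation accuracy lies
# within one standard error of the best validation model is carried forward
# to the test data (instead of only the argmax).
import numpy as np

def within_one_se(theta_val, n_val):
    """Indices of all models within 1 SE of the best validation estimate."""
    theta_val = np.asarray(theta_val, dtype=float)
    best = int(np.argmax(theta_val))
    se_best = np.sqrt(theta_val[best] * (1.0 - theta_val[best]) / n_val)
    return np.flatnonzero(theta_val >= theta_val[best] - se_best)

# Example: validation accuracies of five models, estimated on 200 samples.
print(within_one_se([0.81, 0.84, 0.86, 0.79, 0.855], n_val=200))   # -> [1 2 4]
```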

  11. Simulation Results

  12. Simulation Results

  13. Conclusions: When in doubt, delay the final model choice to the test data. Improvements in model performance and in the probability of correctly identifying a good model were observed in all investigated scenarios. Adjustment for multiple comparisons is done via an approximate parametric procedure that takes model similarity into account (maxT-approach). Questions & feedback welcome! mwestphal@uni-bremen.de, https://github.com/maxwestphal/, POSTER #123 (Pacific Ballroom).
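The maxT adjustment mentioned here can be written out as follows, under the assumption that the vector of test-data performance estimates is approximately multivariate normal; the notation below is mine, not from the slides.

```latex
\hat{\vartheta}^{E} \overset{\text{approx.}}{\sim} \mathcal{N}\!\big(\vartheta,\, \widehat{\Sigma}\big), \qquad
CI_m = \Big[\hat{\vartheta}^{E}_m \pm c_{1-\alpha}\,\hat{\sigma}_m\Big], \qquad
\Pr\!\Big(\max_{m \in \mathcal{S}} \big|\hat{\vartheta}^{E}_m - \vartheta_m\big| \big/ \hat{\sigma}_m \le c_{1-\alpha}\Big) \approx 1-\alpha
```

Here c_{1-alpha} is the equicoordinate quantile of the corresponding multivariate normal distribution with estimated correlation matrix, so that the confidence intervals for all selected models hold simultaneously; the similarity between models enters through this correlation matrix.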
