ModelDB: a system for managing ML models (Manasi Vartak)


  1. ModelDB: a system for managing ML models. Manasi Vartak, PhD Candidate, MIT Database Group. mvartak@csail.mit.edu | @DataCereal

  2. Why Model Management?

  3. IMDB Prediction Task • Given data about movies (e.g. year made, studio, genres, actors) • Predict IMDB_score

  4. Model 1: LinearRegression. Accuracy: 62%

  5. Model 2: CrossValidation. Accuracy: 68%

  6. Model 3: RandomForest, CrossValidation. Accuracy: 75%

  7. Model 4: FeatureEngg, RandomForest, CrossValidation. Accuracy: 80%

  8. Model 50: GBDT, FeatureEngg, CrossValidation. Accuracy: 84%

  9. Why is this a problem?
     • No record of experiments: "Did my colleague do that already?"
     • Insights lost along the way: "How did normalization affect my ROC?"
     • Difficult to reproduce results: "What params did I use?"
     • Cannot search for or query models: "Where is the prod version of the model for churn?"
     • Difficult to collaborate: "How does someone review your model?"

  10. ModelDB: an end-to-end model management system
      • Ingest models, metadata
      • Store and version modeling artifacts
      • Query
      • Collaborate, Reproduce results
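
The ingest/store/query idea on this slide can be illustrated with a toy in-memory log. This is only a sketch: `ModelRecord`, `ExperimentLog`, and their methods are hypothetical names invented for illustration and are not ModelDB's actual API. The model names, pipeline stages, and accuracies are taken from slides 4 to 8.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ModelRecord:
    """One logged experiment: which pipeline stages ran and how it scored."""
    name: str
    stages: List[str]
    metrics: Dict[str, float] = field(default_factory=dict)


class ExperimentLog:
    """Hypothetical in-memory store (an assumption, not ModelDB's API)."""

    def __init__(self) -> None:
        self._records: List[ModelRecord] = []

    def ingest(self, record: ModelRecord) -> None:
        # Store the record so it can be found again later.
        self._records.append(record)

    def query(self, predicate: Callable[[ModelRecord], bool]) -> List[ModelRecord]:
        # Return every record matching the predicate, in ingestion order.
        return [r for r in self._records if predicate(r)]

    def best(self, metric: str) -> ModelRecord:
        # Return the record with the highest value for the given metric.
        scored = [r for r in self._records if metric in r.metrics]
        return max(scored, key=lambda r: r.metrics[metric])


log = ExperimentLog()
log.ingest(ModelRecord("Model 1", ["LinearRegression"], {"accuracy": 0.62}))
log.ingest(ModelRecord("Model 3", ["RandomForest", "CrossValidation"], {"accuracy": 0.75}))
log.ingest(ModelRecord("Model 4", ["FeatureEngg", "RandomForest", "CrossValidation"], {"accuracy": 0.80}))
log.ingest(ModelRecord("Model 50", ["GBDT", "FeatureEngg", "CrossValidation"], {"accuracy": 0.84}))

# "Which models used a random forest?" and "Which model scored best?"
with_rf = log.query(lambda r: "RandomForest" in r.stages)
top = log.best("accuracy")
```

Even this small structure answers questions like "What did I try already?" that are hard to answer when experiments live only in notebooks; a real system would add persistence, versioning, and a query frontend.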

  11. ModelDB Architecture
      • Clients: spark.ml (Scala), scikit-learn (Python), Light Client, …
      • Events
      • ModelDB Backend (thrift)
      • Storage
      • ModelDB Frontend: vis + query

  12. Demo

  13. ML Infrastructure
      • Data Processing: DBMSs, Spark, Hive, CSV, Custom
      • Model Training: Spark.ml, sklearn, R, DL frameworks, H2O
      • Model Serving: Custom, TF-serving, Clipper
      • Monitoring: Custom
      • Model Management
      • + A/B testing, + Model Retraining, + Visualizations, + Interpretability, + Debugging

  14. Benefits of model management
      • Offline
        • Developer Productivity: + Provenance, + Reproducibility, + Meta-analyses
        • Increased Transparency: + What models have been built, + How well do models work?, + Auditability
      • Online
        • Model Monitoring: + Model performance over time, + Anomaly detection, + Trigger retraining
        • Fast Failure Analyses: + How was this model built?, + What has changed?
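
As a rough illustration of the online benefits listed on this slide (tracking model performance over time and triggering retraining), the sketch below compares recent scores against an earlier baseline. The function `should_retrain`, its `window` and `tolerance` parameters, and the accuracy values are all assumptions made up for this example; the slides do not describe how ModelDB would implement monitoring.

```python
from statistics import mean
from typing import Sequence


def should_retrain(history: Sequence[float], window: int = 3, tolerance: float = 0.05) -> bool:
    """Flag retraining when recent scores drop well below the earlier baseline.

    Compares the mean of the last `window` scores with the mean of all
    earlier scores; returns True if the drop exceeds `tolerance`.
    """
    if len(history) <= window:
        # Not enough history to form both a baseline and a recent window.
        return False
    baseline = mean(history[:-window])
    recent = mean(history[-window:])
    return baseline - recent > tolerance


# Made-up weekly accuracy values, for illustration only.
degrading = [0.84, 0.83, 0.84, 0.80, 0.76, 0.74]
stable = [0.84, 0.83, 0.84, 0.84, 0.83, 0.85]

retrain_degrading = should_retrain(degrading)
retrain_stable = should_retrain(stable)
```

A fixed threshold like this is the simplest possible trigger; anomaly detection in practice would likely account for noise and seasonality in the metric.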

  15. At last NIPS • Initial version of ModelDB with sklearn, spark.ml support • Early adopters (banks, financial firms), early feedback • Focus on developer productivity

  16. Since last NIPS! • Initial release of ModelDB in early 2017 (February) • Adoption/evaluation at Adobe, banks, financial institutions, and tech companies • Won AIGrant for open-source projects • See papers at SIGMOD, NIPS workshops

  17. Since last NIPS!
      • Easy installation: docker, pip
      • Light clients (R, YAML, packages outside of sklearn)
      • Flexible metadata storage
      • Collecting metrics over time
      • In the (research) pipeline
        • Data and intermediate storage
        • Model diagnosis
        • Fine-grained visualizations

  18. ModelDB so far • Incredible inbound interest • Banks, finance, insurance, tech • Lots of feature requests (e.g., monitoring, diagnosis, DL). More than research resources can handle :) • Validation • Every data scientist building > 10 models needs model management and is looking for these tools • Vision: Industry standard tool for managing ML models and metadata

  19. Moving to Apache Incubation • With MIT, Adobe, other partners (*MLSys community) • Open development to wider community • Contributions across industry • Roadmap • Multiple storage backends, DL frameworks, R • Monitoring capabilities

  20. Call for Contributions! • Community over code • Build once, reuse many times • Why? • It will measurably improve your workflow • Pay it forward • Be part of larger open-source project

  21. How to Contribute • Test it out and give feedback • Share: teams, meetups, data science meetings, blogs • Documentation • Code: • Lots of issues on GitHub • Add support for your favorite ML frameworks

  22. Informal Meeting at MLSys • Interested in testing/adopting ModelDB? • Did you build such a system, can you share lessons? • Open-source Contributors! • How/when • Whova app (“Model Management Meetup”) • mvartak@csail.mit.edu • Poster

  23. People

  24. ModelDB https://github.com/mitdbg/modeldb http://modeldb.csail.mit.edu Manasi Vartak | @DataCereal
