
Data Science in the Cloud, by Stefan Krawczyk (@stefkrawczyk)

Data Science in the Cloud. Stefan Krawczyk, @stefkrawczyk, linkedin.com/in/skrawczyk, November 2016.
Who are Data Scientists? Skills vary wildly, but they're in demand and expensive ("The Sexiest Job of the 21st Century", HBR).


  1. Online & Streamed Computation
     ● Very likely you start with a batch system
     ● Do you need to recompute:
       ○ features for all users?
       ○ predicted results for all users?
     ● Are you heavily dependent on your ETL running every night?
     ● Online vs Streamed depends on in house factors:
       ○ Number of models
       ○ How often they change
       ○ Cadence of output required
       ○ In house eng. expertise
       ○ etc.
     ● We use an online system for recommendations
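
As a rough illustration of the recompute question on this slide, the sketch below contrasts a batch job that rebuilds a feature for every user with an online update that touches only the user whose event just arrived. The feature (a per-user event count), the names `batch_recompute` and `OnlineFeatureStore`, and the event shape are assumptions made for illustration; none of them come from the talk.

```python
from collections import Counter, defaultdict


def batch_recompute(events):
    """Batch style: rebuild the feature for ALL users from the full event log."""
    return Counter(user_id for user_id, _item in events)


class OnlineFeatureStore:
    """Online/streamed style: update only the user an incoming event belongs to."""

    def __init__(self):
        self.counts = defaultdict(int)

    def update(self, user_id, item):
        # Only this user's feature changes; nobody else is recomputed.
        self.counts[user_id] += 1
        return self.counts[user_id]


# Hypothetical (user_id, item) events.
events = [("u1", "a"), ("u2", "b"), ("u1", "c")]

store = OnlineFeatureStore()
for user_id, item in events:
    store.update(user_id, item)

# Both approaches agree on the feature values; they differ in how much
# work is done per new event.
assert dict(store.counts) == dict(batch_recompute(events))
```

The batch version does work proportional to the whole log on every run, while the online version does constant work per event, which is the trade-off the slide asks about.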

  2. Streamed Example

  8. Online/Streaming Thoughts
     ● Dedicated infrastructure → More room on batch infrastructure
       ○ Hopefully $$$ savings
       ○ Hopefully less stressed Data Scientists
     ● Requires better software engineering practices
       ○ Code portability/reuse
       ○ Designing APIs/Tools Data Scientists will use
     ● Prototyping on AWS Lambda & Kinesis was surprisingly quick
       ○ Need to compile C libs on an Amazon Linux instance
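
To give a feel for why a Lambda and Kinesis prototype can come together quickly, here is a minimal sketch of a Python Lambda handler that scores records from a Kinesis stream. Lambda delivers Kinesis records with the payload base64-encoded under `record["kinesis"]["data"]`. The message fields (`user_id`, `features`) and the `score` function are hypothetical stand-ins, not the system described in the talk.

```python
import base64
import json


def score(features):
    # Hypothetical stand-in for a real model: sums the feature values.
    return sum(features.values())


def handler(event, context):
    """Entry point Lambda would call with a batch of Kinesis records."""
    results = []
    for record in event["Records"]:
        # Kinesis payloads arrive base64-encoded inside the Lambda event.
        payload = base64.b64decode(record["kinesis"]["data"])
        message = json.loads(payload)
        results.append(
            {"user_id": message["user_id"], "score": score(message["features"])}
        )
    return results


# Local smoke test with a hand-built event in the same shape.
raw = json.dumps({"user_id": "u1", "features": {"a": 1, "b": 2}}).encode("utf-8")
fake_event = {
    "Records": [{"kinesis": {"data": base64.b64encode(raw).decode("ascii")}}]
}
print(handler(fake_event, None))
```

This sketch uses only the standard library. Dependencies with C extensions are where the slide's note about compiling on an Amazon Linux instance would apply.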

  9. What’s in a Model? Scaling model knowledge

  15. Ever:
      ● Had someone leave and then nobody understands how they trained their models?
        ○ Or you didn’t remember yourself?
      ● Had a performance dip in models and had trouble figuring out why?
        ○ Or not known what’s changed between model deployments?
      ● Wanted to compare model performance over time?
      ● Wanted to train a model in R/Python/Spark and then deploy it to a web server?

  24. Produce Model Artifacts
      ● Isn’t that just saving the coefficients/model values?
        ○ NO!
      ● Why?
        ○ Helps a lot with model debugging & diagnosis!
        ○ How do you deal with organizational drift?
        ○ Makes it easy to keep an archive and track changes over time
        ○ Can more easily be used in downstream processes
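
One way to read "more than the coefficients" is an artifact that records how the model was produced alongside its values. The sketch below is an assumption about what such an artifact might contain (code version, feature names, a fingerprint of the training data, metrics). The field names and the helpers `build_artifact` and `changed_fields` are hypothetical, not the format used in the talk.

```python
import hashlib
import json
from datetime import datetime, timezone


def build_artifact(coefficients, feature_names, training_rows, metrics, code_version):
    """Bundle model values with the context needed to understand them later."""
    # Fingerprint the training data so a change in inputs is detectable.
    data_fingerprint = hashlib.sha256(
        json.dumps(training_rows, sort_keys=True).encode("utf-8")
    ).hexdigest()
    return {
        "created_at": datetime.now(timezone.utc).isoformat(),
        "code_version": code_version,
        "feature_names": feature_names,
        "coefficients": coefficients,
        "training_data_sha256": data_fingerprint,
        "metrics": metrics,
    }


def changed_fields(old, new):
    """List which parts of the artifact differ between two deployments."""
    return sorted(k for k in new if k != "created_at" and old.get(k) != new[k])


# Two hypothetical training runs of the same code on different data.
v1 = build_artifact([0.5, -1.2], ["age", "visits"], [[30, 2], [41, 7]],
                    {"auc": 0.81}, "abc123")
v2 = build_artifact([0.4, -1.1], ["age", "visits"], [[30, 2], [41, 7], [25, 1]],
                    {"auc": 0.78}, "abc123")

# The artifact is plain JSON, so it can be archived and read by other tools.
archived = json.dumps(v1, indent=2)
print(changed_fields(v1, v2))
```

Comparing two archived artifacts this way speaks to the earlier questions about what changed between deployments and how performance moved over time.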
