
It Takes a Village to Raise a Machine Learning Model. Lucian Lita



  1. It Takes a Village to Raise a Machine Learning Model. Lucian Lita (@datariver)


  3. Algorithms

  4. Data Big Data Sheep @bigdatasheep n n 5yr more data is better than complex algorithms #BigData Big Data Sheep @bigdatasheep n n 4yr more clean data is better than more data #BigData Big Data Sheep @bigdatasheep n n 3yr more labeled data is better than more data #BigData Big Data Sheep @bigdatasheep n n 2yr more smart data is better than purple data #BigData **inflated historical depiction @datariver

  5. Data

  6. Next Frontier: well-designed software architectures. Personalization, experimentation, anomaly detection, fraud detection, …

  7. Battle Plan. Personalization: deep dive, software-architecture flavor. Anomaly detection: quick peek. Music streaming, advertising, medical informatics: brief stories.


  9. [Diagram] Product as is: no customization. Segmentation: reasonable coverage. Personalization: tailored to each user (x all).

  10. Childhood. Approaches.

  11. Broad vs. deep.

  12. Push-button vs. push-scientist. Both sit on the same stack: app, API, delivery, storage. Optimization: ML algorithms; data (more, better, smarter); features and feature selection.

  13. Push-button vs. push-scientist, on the same stack (app, API, delivery, storage). Push-button focuses on scale & automation: model build, model deploy, single instrumentation. Push-scientist focuses on optimization: ML algorithms; data (more, better, smarter); features and feature selection.

  14. Push-scientist: invest in ML; start with a thin system. How much effort should go into platform & automation? (A) the best you can do in x weeks; (B) one step above a prototype; (C) enough baling wire and duct tape to support a first use case.

  15. Push-button: invest in scale & automation; basic ML. How much effort should go into ML? (A) the best generic model setup in y weeks? (B) noticeably better than random? (C) enough punch to be visible, but no more.
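One way to make "noticeably better than random" concrete is to compare a model's ranking quality with chance. The Python sketch below assumes binary feedback labels and uses ROC AUC, for which random scoring has an expected value of 0.5; the function names and the 0.05 margin are hypothetical choices, not taken from the talk.

```python
def auc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def beats_random(scores, labels, margin=0.05):
    """Random scores give an expected AUC of 0.5; require a hypothetical margin above it."""
    return auc(scores, labels) >= 0.5 + margin


# Usage: model scores vs. observed clicks (1) and non-clicks (0).
print(auc([0.9, 0.2, 0.7, 0.4, 0.1, 0.8], [1, 0, 1, 0, 0, 1]))  # 1.0
```

The pairwise loop is quadratic in the number of examples, which is fine for a quick sanity check on a sample but not for full evaluation sets.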

  16. Push-button vs. push-scientist.

  17. Adolescence. Platform Patterns.

  18. (A) Stored. The app sends feedback to an API (capture) and fetches personalized content from an API (retrieve). Content is pre-computed: a batch job periodically trains the model, and the model is run periodically to refresh the stored content.
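A minimal Python sketch of the stored pattern, assuming a periodically trained model has already produced a score for every (user, item) pair. The names `batch_precompute`, `retrieve`, and `capture` are hypothetical stand-ins for the batch job, API (retrieve), and API (capture); a real system would write to a data store rather than a dict.

```python
def batch_precompute(model_scores, k=3):
    """Periodic batch job: run the model offline and keep the top-k items per user.

    model_scores maps user -> {item: score}, as produced by the latest batch-trained model.
    """
    store = {}
    for user, item_scores in model_scores.items():
        ranked = sorted(item_scores, key=item_scores.get, reverse=True)
        store[user] = ranked[:k]
    return store


def retrieve(store, user, fallback):
    """API (retrieve): a lookup at request time; no model runs on this path."""
    return store.get(user, fallback)


feedback_log = []


def capture(user, item, event):
    """API (capture): append feedback for the next periodic training run."""
    feedback_log.append((user, item, event))


store = batch_precompute({"u1": {"a": 0.1, "b": 0.9, "c": 0.5, "d": 0.7}}, k=2)
print(retrieve(store, "u1", fallback=["popular"]))  # ['b', 'd']
print(retrieve(store, "u2", fallback=["popular"]))  # ['popular']
capture("u1", "b", "click")
```

Request-time work is a single lookup, so latency is low, but recommendations are only as fresh as the last batch run.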

  19. (B) On-the-Fly. The app sends feedback to an API (capture) and gets personalized content from an API (compute), which computes it on the fly; the model is batch-trained periodically.
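Under the on-the-fly pattern the model is still trained in batch, but features and scores are computed per request. This sketch assumes a linear model whose weights come from the last batch run; the feature function and the item fields (`interests`, `tags`, `popularity`) are hypothetical.

```python
def compute_features(user_profile, item):
    """Hypothetical features: interest/tag overlap and item popularity."""
    overlap = len(set(user_profile["interests"]) & set(item["tags"]))
    return [float(overlap), item["popularity"]]


def score(weights, features):
    """Linear model; weights come from the last periodic batch training run."""
    return sum(w * f for w, f in zip(weights, features))


def compute_api(weights, user_profile, candidates, k=2):
    """API (compute): build features and score candidates at request time."""
    ranked = sorted(
        candidates,
        key=lambda item: score(weights, compute_features(user_profile, item)),
        reverse=True,
    )
    return [item["id"] for item in ranked[:k]]


weights = [1.0, 0.5]  # assumed output of the latest batch training job
user = {"interests": ["jazz", "rock"]}
items = [
    {"id": "x", "tags": ["jazz"], "popularity": 0.2},
    {"id": "y", "tags": ["pop"], "popularity": 0.9},
    {"id": "z", "tags": ["jazz", "rock"], "popularity": 0.1},
]
print(compute_api(weights, user, items))  # ['z', 'x']
```

Compared with the stored pattern, this trades request latency for freshness: a new item or a just-updated profile is reflected immediately, without waiting for a batch refresh.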

  20. (C) Aggressive. The app sends feedback to an API (capture) and gets personalized content from an API (deliver). Challenge accepted: asymptotically real time!
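For the aggressive pattern, one option is to update the model on every captured feedback event instead of waiting for the next batch run. The sketch swaps in online logistic regression with one stochastic gradient step per event; this is an illustrative technique chosen here, not necessarily what the systems in the talk used, and `OnlineLogisticModel` is a hypothetical name.

```python
import math


class OnlineLogisticModel:
    """Logistic regression that takes one SGD step per feedback event."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.lr = lr

    def predict(self, x):
        """Probability of a positive response (e.g. a click) for feature vector x."""
        z = sum(wi * xi for wi, xi in zip(self.w, x))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, y):
        """The gradient of log loss for one event is (p - y) * x."""
        error = self.predict(x) - y
        self.w = [wi - self.lr * error * xi for wi, xi in zip(self.w, x)]


model = OnlineLogisticModel(n_features=2)
for _ in range(20):  # each captured click nudges the model immediately
    model.update([1.0, 0.0], 1)
print(model.predict([1.0, 0.0]) > 0.5)  # True
```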


  22. Maturity. Patterns and Assumptions.

  23. Components: model building, model deployment, data store, content delivery, analytics, data capture. For each one: what do you really need? Do you need it now?

  24. Model Building. What do you really need? Algorithms, algorithm space, data, evaluation, compute, operators, metrics, security, scalability, HA.
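A hypothetical sketch of the model-building loop this slide names: a space of candidate algorithms and parameters, training data, an evaluation set, and a metric. The two toy algorithms (`fit_majority`, `fit_threshold`) and the 1-D data are assumptions for illustration; compute, operators, security, scalability, and HA are out of scope here.

```python
def accuracy(predict, data):
    """Metric: fraction of (x, y) pairs predicted correctly."""
    return sum(predict(x) == y for x, y in data) / len(data)


def fit_majority(train):
    """Toy algorithm: always predict the majority label."""
    label = int(2 * sum(y for _, y in train) >= len(train))
    return lambda x: label


def fit_threshold(train, shift=0.0):
    """Toy algorithm: cut halfway between class means (assumes positives have larger x)."""
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2 + shift
    return lambda x: int(x >= cut)


def search(space, train, holdout, metric=accuracy):
    """Fit every (name, algorithm, params) in the space; keep the best holdout score."""
    best = (None, None, float("-inf"))
    for name, fit, params in space:
        predict = fit(train, **params)
        result = metric(predict, holdout)
        if result > best[2]:
            best = (name, params, result)
    return best


space = [("majority", fit_majority, {})] + [
    ("threshold", fit_threshold, {"shift": s}) for s in (-0.5, 0.0, 0.5)
]
train = [(1.0, 0), (2.0, 0), (3.0, 1), (4.0, 1)]
holdout = [(1.5, 0), (2.2, 0), (2.8, 1), (3.6, 1)]
print(search(space, train, holdout))  # ('threshold', {'shift': 0.0}, 1.0)
```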


  26. Model Deployment. What do you really need? API, swapping model M_i for M_{i+1}, environment, versioning, deploy; ditto for performance, sharing, security, scalability, HA.
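Model versioning can be sketched as a registry that keeps every deployed version, so moving from M_i to M_{i+1} is a pointer swap and rolling back is cheap. `ModelRegistry` is a hypothetical in-memory class; a production setup would persist versions and handle environments and sharing.

```python
class ModelRegistry:
    """Keeps every deployed model version; one version is active at a time."""

    def __init__(self):
        self._versions = []  # (version number, model callable), in deployment order
        self._active = None  # index into self._versions

    def deploy(self, model):
        """Register M_{i+1} and make it active; returns its version number."""
        version = len(self._versions) + 1
        self._versions.append((version, model))
        self._active = len(self._versions) - 1
        return version

    def rollback(self):
        """Reactivate the previously deployed version."""
        if self._active is None or self._active == 0:
            raise RuntimeError("no earlier version to roll back to")
        self._active -= 1
        return self._versions[self._active][0]

    def active(self):
        if self._active is None:
            raise RuntimeError("nothing deployed yet")
        return self._versions[self._active]

    def predict(self, x):
        """Serve a prediction with whichever version is currently active."""
        return self.active()[1](x)


registry = ModelRegistry()
registry.deploy(lambda x: x + 1)   # M_1
registry.deploy(lambda x: x * 10)  # M_2
print(registry.predict(3))         # 30
registry.rollback()
print(registry.predict(3))         # 4
```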

  27. Personalization Delivery. What do you really need?

  28. Personalization Delivery. What do you really need? API, instrumentation, exploit vs. explore; ditto for performance, sharing, security, scalability, HA.
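Exploit vs. explore is commonly handled with a multi-armed bandit. The sketch below uses epsilon-greedy, one simple bandit strategy chosen here for illustration; the class name, the reward convention (for example 1 for a click, 0 otherwise), and the default epsilon are assumptions. `record` plays the role of instrumentation.

```python
import random


class EpsilonGreedy:
    """Explore with probability epsilon; otherwise exploit the best observed mean reward."""

    def __init__(self, arms, epsilon=0.1, seed=None):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.counts = {a: 0 for a in self.arms}
        self.rewards = {a: 0.0 for a in self.arms}
        self.rng = random.Random(seed)

    def mean(self, arm):
        n = self.counts[arm]
        return self.rewards[arm] / n if n else 0.0

    def choose(self):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)  # explore: try any variant
        return max(self.arms, key=self.mean)   # exploit: best variant so far

    def record(self, arm, reward):
        """Instrumentation hook: log the outcome of what was shown."""
        self.counts[arm] += 1
        self.rewards[arm] += reward


bandit = EpsilonGreedy(["layout_a", "layout_b"], epsilon=0.1, seed=7)
shown = bandit.choose()
bandit.record(shown, 1.0)  # e.g. the user clicked
```

A larger epsilon learns faster about under-shown variants but spends more traffic on weaker ones.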

  29. Data Store. What do you really need? API, t (time), content; ditto for performance, HA, history, scalability, consumers, governance, triggers, sharing.
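One reading of "t" and "history" on this slide is a store that keeps every write with its timestamp, so readers can ask for a value as of time t, with triggers fired on each write. `HistoryStore` is a hypothetical in-memory sketch of that idea; governance, HA, and sharing are not modeled.

```python
import bisect


class HistoryStore:
    """Keeps every (timestamp, value) write per key so reads can ask 'as of time t'."""

    def __init__(self):
        self._times = {}     # key -> sorted list of timestamps
        self._values = {}    # key -> values aligned with self._times[key]
        self._triggers = []  # callables fired after each write

    def on_write(self, callback):
        self._triggers.append(callback)

    def put(self, key, t, value):
        times = self._times.setdefault(key, [])
        values = self._values.setdefault(key, [])
        i = bisect.bisect_right(times, t)  # keep history sorted even for late writes
        times.insert(i, t)
        values.insert(i, value)
        for callback in self._triggers:
            callback(key, t, value)

    def get(self, key, as_of=None):
        """Latest value, or the value in effect at time as_of (None if none yet)."""
        times = self._times.get(key, [])
        if not times:
            return None
        if as_of is None:
            return self._values[key][-1]
        i = bisect.bisect_right(times, as_of)
        return self._values[key][i - 1] if i else None


store = HistoryStore()
store.on_write(lambda key, t, value: print("trigger:", key, t))
store.put("u1:genre", 10, "jazz")
store.put("u1:genre", 20, "rock")
print(store.get("u1:genre", as_of=15))  # jazz
```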

  30. Data Store. To HA or not to HA: now, or later (blasphemy)? Weigh: revenue driver, in-app critical, user benefit, infrastructure cost, known use cases, build & operate.

  31. Data Store. APIs

  32. Data Capture. What do you really need? API, t (time), triggers, consumers, content; ditto for history, sharing, performance, scalability, security, HA.
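Data capture with several consumers can be sketched as an append-only event log in which each consumer (say, model training and analytics) reads at its own pace from its own offset. `CaptureLog` and the event fields are hypothetical; a real deployment would typically use a durable log service.

```python
import time


class CaptureLog:
    """Append-only event log; each registered consumer reads from its own offset."""

    def __init__(self, clock=time.time):
        self._events = []
        self._offsets = {}  # consumer name -> index of the next unread event
        self._clock = clock

    def capture(self, user, action, content_id):
        """API (capture): timestamp and append one feedback event."""
        event = {"t": self._clock(), "user": user, "action": action, "content": content_id}
        self._events.append(event)
        return len(self._events) - 1

    def register(self, consumer):
        """New consumers start from the beginning of the log."""
        self._offsets.setdefault(consumer, 0)

    def poll(self, consumer, max_events=100):
        """Return unread events for this consumer and advance its offset."""
        start = self._offsets[consumer]
        batch = self._events[start:start + max_events]
        self._offsets[consumer] = start + len(batch)
        return batch


log = CaptureLog()
log.register("trainer")
log.capture("u1", "click", "song-42")
print(len(log.poll("trainer")))  # 1
```

Keeping offsets per consumer means a slow analytics job never blocks model training, and a new consumer can replay history.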

  33. Analytics. What do you really need? API, t (time), content; ditto for performance, history, scalability, flexibility, consumers.

  34. Analytics. Experimentation & Personalization
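Experiments on a personalization system need stable assignment: the same user should keep seeing the same variant. A common approach, sketched here under assumed names (`assign_variant`, the experiment key), hashes the user and experiment IDs to a point in [0, 1) and maps that point onto weighted buckets.

```python
import hashlib


def assign_variant(user_id, experiment, variants, weights):
    """Deterministically map a user to a variant: same user + experiment -> same variant."""
    if len(variants) != len(weights) or sum(weights) <= 0:
        raise ValueError("variants and positive weights must line up")
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    point = int(digest[:15], 16) / 16**15  # roughly uniform in [0, 1)
    total = sum(weights)
    cumulative = 0.0
    for variant, weight in zip(variants, weights):
        cumulative += weight / total
        if point < cumulative:
            return variant
    return variants[-1]  # guard against floating-point rounding at the top end


variant = assign_variant("u42", "homepage-reco", ["control", "personalized"], [50, 50])
```

Including the experiment name in the hash keeps assignments independent across experiments, so a user in the control group of one test is not systematically in the control group of the next.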

  35. Data Lake. What do you really need? Say ‘big data lake’ one more time!

  36. Evolving Architecture. Before you know it …

  37. [Diagram: evolving architecture. Apps send feedback and data and receive personalized content, direct or in-app, through API (capture), API (compute), API (delivery), API (push), and API (analytics). Captured events land in an event log as raw data or features and feed real-time analytics; model building periodically trains new models, and model deployment runs them and periodically re-runs new ones.] **terribly incomplete, mildly inaccurate

  38. Not an Exact Blueprint

  39. Know this: it is non-trivial, and there is no one-size-fits-all. Upfront: what do you really need? Know thy target architecture. As you embark … do it! A working system in weeks; fast iterations; ship & test; interfaaaaaaaces!

  40. [Diagram: the village model] **not drawn to effort scale

  41. Software architecture is the next frontier! Fail fast still applies! Personalize your personalization platform!

  42. Better algorithms → more, better, smarter data → well-designed software architectures (the next frontier).

  43. A Brief Look at Anomaly Detection

  44. Applications ¡ System health – servers, network ¡ Cyber-intrusion detection ¡ Enterprise anomaly detection ¡ Image processing ¡ Textual anomaly detection ¡ Sensor networks ¡ Fraud detection ¡ Medical anomaly detection ¡ Industrial damage detection ¡ … @datariver

  45. Algorithms ¡ Supervised ¡ Unsupervised ¡ Generic statistical ¡ Information theory ¡ … “What algorithms are you going to use?” @datariver

  46. Data. Low data volume: invest in data acquisition; invest in high coverage. High data volume: invest in defining signal; invest in labeling, tools, and crowdsourcing.

  47. Architectures Again. Capture → Labeling → Compute. Data collectors: clickstream, user input, …, real time, DBs, … Labeling: crowdsourcing (broad, time bounded) and active learning (deep, open ended). Processors (M&A): run models. **check assumptions
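The active-learning branch of this pipeline can be sketched as uncertainty sampling: send labelers the items the current model is least sure about. `active_learning_round`, the probability function, and the labeler callback (standing in for crowdsourcing or an expert) are hypothetical names; the uncertainty measure assumes a binary classifier.

```python
def active_learning_round(unlabeled, predict_proba, ask_labeler, budget):
    """One round: pick the `budget` items with probability closest to 0.5, get labels."""
    ranked = sorted(unlabeled, key=lambda item: abs(predict_proba(item) - 0.5))
    chosen, remaining = ranked[:budget], ranked[budget:]
    new_labels = [(item, ask_labeler(item)) for item in chosen]
    return new_labels, remaining


# Toy usage: each item is its own predicted probability; the labeler thresholds at 0.5.
labels, rest = active_learning_round(
    [0.1, 0.45, 0.9, 0.55, 0.7],
    predict_proba=lambda x: x,
    ask_labeler=lambda x: int(x >= 0.5),
    budget=2,
)
```

In a full loop, the new labels would be added to the training set, the model retrained, and the next round run on the remaining items until the labeling budget runs out.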

  48. Advertising

  49. Music Streaming

  50. Medical Informatics

  51. Better algorithms → more, better, smarter data → well-designed software architectures (the next frontier).

  52. Thank you! Lucian Lita (@datariver). [always hiring] data@intuit.com

  55. Extra Content

  56. Security. What do you really need?

  58. App. Who does the app talk to? (a) API (retrieve), returning personalized content: apply op logic; retrieve pre-computed content. (b) API (compute), returning dynamic personalized content: retrieve static data; apply op logic; compute features; run model; log actions.
