
ML Infra at an Early-Stage Feature Service. Spencer Barton, Data Scientist



  1. ML Infra at an Early-Stage Feature Service. Spencer Barton, Data Scientist, April 2019

  2. Branch in the Numbers

  3. [Figure-only slide; no text recovered]

  4. Our mission is to deliver world-class financial services to the mobile generation.

  5. From Install to Approval in Minutes
      1. Answer 3 questions to register. KYC checks with external APIs, mobile data mined and analysed.
      2. Eligible loan offers are displayed. Credit score calculated in seconds.
      3. Deposit to bank account or mobile wallet. Repayment schedule set and monitored.

  6. How Branch works behind the scenes
      ● Collect Phone Data. We collect: text messages, installed apps, contact lists, in-app events.
      ● Generate Features. We extract: bank balance, number of contacts, read the FAQ, installed Facebook app.
      ● Credit Model. We predict probability of repayment.

  7. How do I build ML into my product?

  8. Big Firms Can Build Custom ML Infrastructure. Team sizes shown in the diagram: 5 engineers, 10 engineers, 2 product managers, 5 engineers, 5 data scientists, 10 engineers. Source: Bighead - Airbnb’s End-to-End Machine Learning Platform

  9. Can the rest of us do machine learning? We too can build infrastructure but must be strategic. Build a Feature Service!

  10. What does a feature service do for me?
      ● Faster development of new features
      ● Fewer bugs thanks to consistent feature definitions
      ● Faster calculation of slow features
      ● Easy feature discovery and sharing

  11. Where do you start?

  12. You want to start basic. https://en.wikipedia.org/wiki/Linear_regression

  13. You will gradually mature your ML. https://towardsdatascience.com/polynomial-regression-bbe8b9d97491

  14. The basics will only get you so far

  15. What do you focus on beyond the basics? Pipeline stages: Gather Data, Build Features, Train Model, Serve Model.

  16. We needed to improve our features. Our data sources were in OK shape, but:
      ● Differences in features between dev, training and production led to bugs
      ● Inconsistent feature definitions led to bugs
      ● Feature creation was a training bottleneck

  17. We invested in infrastructure to improve features. We decided to build a Feature Service.

  18. What is a Feature Service? A Feature Service computes a feature vector for a specific object at a specific time.
      Request: Get features for user 90234
      Response (feature vector for user 90234): { "average_bank_balance": 324090, "number_referrals": 15, "read_faq": true }

  19. Features are computed relative to a timestamp
      Request: Get features for user 90234 on 2016-10-2
      Response (feature vector for user 90234 on 2016-10-2): { "average_bank_balance": 504090, "number_referrals": 0, "read_faq": false }
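
One way to read slides 18 and 19 is that the service filters raw data to what was known at the requested time before computing anything. The sketch below is an illustration only: the event records, their field names and the `feature_vector` helper are hypothetical and not taken from the talk. The sample balances were picked so that the 2016-10-2 request reproduces the vector on slide 19.

```python
from datetime import date

# Hypothetical raw events for one user; field names are illustrative only.
EVENTS = [
    {"day": date(2016, 9, 1), "kind": "bank_balance", "value": 500000},
    {"day": date(2016, 9, 20), "kind": "bank_balance", "value": 508180},
    {"day": date(2017, 3, 5), "kind": "bank_balance", "value": 100000},
    {"day": date(2017, 4, 1), "kind": "referral", "value": 1},
    {"day": date(2017, 5, 9), "kind": "read_faq", "value": 1},
]


def feature_vector(events, as_of):
    """Compute features using only events dated on or before `as_of`."""
    visible = [e for e in events if e["day"] <= as_of]
    balances = [e["value"] for e in visible if e["kind"] == "bank_balance"]
    return {
        # None signals "no bank balance seen yet" (an assumed convention).
        "average_bank_balance": sum(balances) / len(balances) if balances else None,
        "number_referrals": sum(1 for e in visible if e["kind"] == "referral"),
        "read_faq": any(e["kind"] == "read_faq" for e in visible),
    }


historical = feature_vector(EVENTS, date(2016, 10, 2))
latest = feature_vector(EVENTS, date(2017, 6, 1))
```

Filtering by date before aggregating is what keeps a historical feature vector from leaking later events into training data.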

  20. Features are accessed by a simple API
      GET feature/bank_balance/v0_1?pid=12314
      GET feature/bank_balance/v0_3?pid=1214&date=2017-12-3
      GET feature/loan_repayment/v0_1?pid=3531
      ● Path segments: feature name, then feature version
      ● date: date for historical features
      ● pid = primary id, like user id
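
The request shape on slide 20 can be taken apart with the Python standard library alone. This is a minimal sketch, assuming the path is always `feature/<name>/<version>` and that a missing `date` means "current features"; the `parse_feature_request` helper and its return format are hypothetical, not the service's actual code.

```python
from urllib.parse import urlsplit, parse_qs


def parse_feature_request(url):
    """Split 'feature/<name>/<version>?pid=...&date=...' into its parts."""
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    if len(segments) != 3 or segments[0] != "feature":
        raise ValueError(f"unexpected path: {parts.path!r}")
    _, name, version = segments
    query = parse_qs(parts.query)
    if "pid" not in query:
        raise ValueError("pid is required")
    return {
        "name": name,
        "version": version,
        "pid": int(query["pid"][0]),
        # Assumption: no date means "compute the current features".
        "date": query.get("date", [None])[0],
    }


current = parse_feature_request("feature/bank_balance/v0_1?pid=12314")
historical = parse_feature_request("feature/bank_balance/v0_3?pid=1214&date=2017-12-3")
```

Keeping the version in the path means an old model and a new model can request different definitions of the same feature side by side.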

  21. Why build a custom solution? Diagram labels: Gather Data, Build Features, Train Model, Serve Model; Feature Service.

  22. What are we building? A Feature Service made of:
      ● Server infrastructure
      ● Cache infrastructure
      ● A Python framework

  23. Data source dependencies were messy. Diagram labels: Raw Data Source A, Raw Data Source B; Inference, Training, Development. Legend: Write, Read.

  24. We abstracted complicated data sources. Diagram labels: Raw Data Source A, Raw Data Source B; Feature Service; Inference, Training, Development. Legend: Write, Read.

  25. Features were being created all over the place. Diagram labels: Raw Data Source A, Raw Data Source B; Inference, Training, Development. Legend: Write, Read.

  26. Every step of ML shares consistent features. Diagram labels: Feature Service; Inference, Training, Development. Legend: Write, Read.

  27. New models were recreating features. Diagram labels: Raw Data Source A, Raw Data Source B; Model 1, Model 2. Legend: Write, Read.

  28. ML models now share the same features. Diagram labels: Feature Service; Model 1, Model 2, Model 3. Legend: Write, Read.

  29. The Feature Service server helps a lot
      ● Abstracted data sources
      ● Shared features
      ● Consistent features
      Now on to storage…

  30. Features were computed once and forgotten. Timeline:
      ● Inference for user 3: compute all features
      ● Inference for user 3: compute all the same features again
      ● Inference for user 3: compute all the same features again

  31. We built feature storage and caching. Diagram labels: Feature Service, Feature Storage, Analytics, Monitoring. Legend: Write, Read.

  32. We sped up training with a cache. Timeline, all stages backed by Feature Storage:
      ● Training: calculate and cache training features
      ● Model Iteration: use cached features for model development
      ● Inference: use cached features for model in production
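
The compute-once, read-many pattern from slides 30 to 32 can be sketched with an in-memory dictionary standing in for the feature store (the talk names AWS DynamoDB for the real storage). The `FeatureCache` class, its key layout and the `slow_compute` placeholder are assumptions made for illustration.

```python
class FeatureCache:
    """In-memory stand-in for feature storage; a sketch, not the real service."""

    def __init__(self, compute):
        self._compute = compute  # callable(name, version, pid, date) -> value
        self._store = {}
        self.misses = 0  # counts how often the expensive path ran

    def get(self, name, version, pid, date):
        # Assumed key layout: one entry per feature, version, object and date.
        key = (name, version, pid, date)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._compute(name, version, pid, date)
        return self._store[key]


def slow_compute(name, version, pid, date):
    # Placeholder for an expensive calculation over raw data.
    return len(name) + pid


cache = FeatureCache(slow_compute)
first = cache.get("bank_balance", "v0_1", 12314, "2017-12-3")
second = cache.get("bank_balance", "v0_1", 12314, "2017-12-3")
```

Including the version and the date in the key keeps a bug-fixed version from returning values cached by its predecessor. Requests without a date would need an expiry policy, which this sketch leaves out.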

  33. Feature storage helps too
      ● Remove recomputation of features
      ● Enable analytics and monitoring
      ● Increase training speed

  34. We built with simple components
      ● Feature Service: Flask App deployed on AWS Elastic Beanstalk
      ● Feature Storage: AWS DynamoDB

  35. Simple infrastructure solved many problems
      ● Feature Service: simple (Flask) app, common source, data abstraction
      ● Feature Storage: caching, analytics, monitoring
      Diagram labels: Raw Data Source A, Raw Data Source B; Inference, Training, Development. Legend: Write, Read.

  36. How do we actually generate features? Diagram labels: Raw Data Source, Feature Service, Development; example data: text messages, bank balance. Legend: Write, Read.

  37. We built a framework. Features are composed of:
      ● One or more Extractors, which pull data from a Raw Data Source
      ● Many Transformers, which convert the data into numeric or categorical features
      Feature: average_bank_balance
      Raw Data Source (S3) > Extract SMS [Extractor] > Select bank messages > Pull out values > Average [Transformers] > "average_bank_balance": 324090

  38. Extractors and Transformers are shared. Both features read from the same Raw Data Source (S3):
      Feature: average_bank_balance
      Extract SMS > Select bank messages > Pull out values > Average > "average_bank_balance": 324090
      Feature: maximum_bank_balance
      Extract SMS > Select bank messages > Pull out values > Maximum > "maximum_bank_balance": 500034
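
The composition on slides 37 and 38 can be sketched as one Extractor feeding a chain of Transformers, with the first steps reused by both features. Every class name, the message fields and the sample data below are hypothetical; the talk does not show the framework's real base classes. The two sample balances were chosen so that the results match the values on the slides.

```python
from statistics import mean


class Extractor:
    def extract(self, pid):
        raise NotImplementedError


class Transformer:
    def transform(self, data):
        raise NotImplementedError


class SmsExtractor(Extractor):
    def __init__(self, raw):
        self.raw = raw  # hypothetical stand-in for the S3 raw data source

    def extract(self, pid):
        return self.raw.get(pid, [])


class SelectBankMessages(Transformer):
    def transform(self, data):
        return [m for m in data if m["sender"] == "BANK"]


class PullOutValues(Transformer):
    def transform(self, data):
        return [m["balance"] for m in data]


class Aggregate(Transformer):
    def __init__(self, fn):
        self.fn = fn

    def transform(self, data):
        return self.fn(data) if data else None


class Feature:
    def __init__(self, name, extractor, transformers):
        self.name = name
        self.extractor = extractor
        self.transformers = transformers

    def compute(self, pid):
        data = self.extractor.extract(pid)
        for step in self.transformers:
            data = step.transform(data)
        return {self.name: data}


RAW = {90234: [
    {"sender": "BANK", "balance": 148146},
    {"sender": "FRIEND", "balance": None},
    {"sender": "BANK", "balance": 500034},
]}

# The extractor and the first two transformers are shared by both features.
sms = SmsExtractor(RAW)
shared = [SelectBankMessages(), PullOutValues()]
average = Feature("average_bank_balance", sms, shared + [Aggregate(mean)])
maximum = Feature("maximum_bank_balance", sms, shared + [Aggregate(max)])
```

Only the final aggregation step differs between the two features, which is the sharing slide 38 describes.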

  39. Framework example
      ● Everything is built on base classes with automated testing
      ● Features are built on versioned extracts and transforms
      ● As flexible as Python
      ● Chain of transformations
      ● Custom one-off transforms

  40. Feature versions support new models. Both models read from the Feature Service Flask App:
      ● Old Credit Model: buggy feature bank_balance:v1
      ● New Credit Model: bug fixed, bank_balance:v2
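
A registry keyed by name and version is one simple way to keep a buggy version available to an old model while a new model uses the fix. The sketch below is an assumption about how this could look: the `feature` decorator, the `REGISTRY` dictionary and the bug itself are invented for illustration and are not the bug described in the talk.

```python
REGISTRY = {}


def feature(name, version):
    """Register a feature function under (name, version)."""
    def decorator(fn):
        key = (name, version)
        if key in REGISTRY:
            raise ValueError(f"{name}:{version} is already registered")
        REGISTRY[key] = fn
        return fn
    return decorator


@feature("bank_balance", "v1")
def bank_balance_v1(balances):
    # Illustrative bug: divides by a fixed window instead of the sample count.
    return sum(balances) / 30


@feature("bank_balance", "v2")
def bank_balance_v2(balances):
    # Fixed version: a true average, None when there is no data.
    return sum(balances) / len(balances) if balances else None


def compute(name, version, balances):
    return REGISTRY[(name, version)](balances)


old_value = compute("bank_balance", "v1", [100, 200])
new_value = compute("bank_balance", "v2", [100, 200])
```

Refusing to overwrite a registered version means a model trained on `v1` keeps seeing the values it was trained on, even after the fix ships as `v2`.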

  41. The framework makes development easy
      ● Feature definitions are consistent
      ● New features are easy to build from shared components
      ● Versioning allows backwards compatibility and bug fixes

  42. The Feature Service solves many problems
      ● Feature Service: simple (Flask) app, common source, data abstraction
      ● Framework: consistency, easy development, versioning
      ● Feature Storage: caching, analytics, monitoring
      Diagram labels: Raw Data Source A, Raw Data Source B; Inference, Training, Development. Legend: Write, Read.

  43. Should I build a Feature Service?
      ● Is feature quality a problem for you?
      ● Are your data sources complex and varied?
      ● Do you want to support multiple models?
      ● Are your features difficult to compute?

  44. We’re benefitting from our Feature Service
      ● Feature generation time reduced!
      ● Fixed a lot of bugs by using the framework!
      ● New models without remaking features!
      ● New data scientists can contribute within a week of joining!
      ● And our model performance has improved!

  45. What should I take away?
      ● You don’t have to be a big company to use ML infrastructure
      ● But your resources are limited so be strategic
      ● And invest in a Feature Service!
      ● Stay informed because the landscape changes fast
          ○ Airbnb Big Head may be open sourced soon

  46. The Team: Dennis Van Der Staay, Dave Bernthal, Ting Ting Liu, Nick Handel, Spencer Barton

  47. Thank You! Spencer Barton, spencer@branch.co

  48. Appendix

  49. Who else is talking about Feature Services?
      ● Nick Handel delivering an earlier version of this presentation
      ● Varant Zanoyan, Zipline at Airbnb
      ● Uber’s Michelangelo
