Designing an ML-Minded Product and a Product-Minded ML System
ACM Webinar, January 23, 2019
Grace Huang
Personalized Homefeed
Personalized Homefeed Personalization: Scoring and ranking Picking the best of the best among candidates
A ranking model
• Supervised learning with labels:
◦ 1 = some positive engagements
◦ 0 = no engagement or negative actions
• Learns to predict a positive engagement (ranking) score
• Pins are then sorted by engagement score = f(pin, user, …)
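As a rough sketch of the scoring-and-sorting step above (the features, weights, and pin IDs here are made up for illustration, not Pinterest's actual model), scoring candidates with a learned function and sorting by score might look like:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def engagement_score(pin_features, user_features, weights):
    """Score = f(pin, user): predicted probability of a positive engagement."""
    x = np.concatenate([pin_features, user_features])
    return sigmoid(weights @ x)

def rank_pins(candidate_pins, user_features, weights):
    """Sort candidate pins by descending engagement score."""
    scored = [(pin_id, engagement_score(feats, user_features, weights))
              for pin_id, feats in candidate_pins]
    return sorted(scored, key=lambda p: p[1], reverse=True)

# Toy example: two candidate pins, one user, hypothetical learned weights.
pins = [("pin_a", np.array([1.0, 0.0])), ("pin_b", np.array([0.0, 1.0]))]
user = np.array([0.5, 0.5])
w = np.array([2.0, -1.0, 0.3, 0.3])  # pin-feature weights, then user-feature weights
ranking = rank_pins(pins, user, w)
```

In production the weights would come from supervised training on the 0/1 engagement labels described above; here they are hard-coded only to make the sketch runnable.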
Data collection → Feature engineering → Model training → Prediction (store and serve)
Components of a production ML system
Data → Training → Serving → Evaluation → Launch
[Diagram: the data pipeline produces training/test data, data to make predictions, and data for offline evaluations; training produces the production model; serving makes predictions; evaluation spans online experiments and offline evaluations; successful models launch]
We will focus on data, evaluation and shipping
Considerations for a data pipeline
Engagement score = f(pin, user, …)
- User profile
- User's past actions: engagement signals
- Derived user profiles from past actions
- Pin and derived pin information
[Screenshot: example pin "The perfect path to cold brew" from Caffeinated Inc., saved to a user's "Cravings" board]
Considerations for a data pipeline
- Logging (and changes)
- Aggregations (ETLs)
- ETL management libraries
- Data validation
- Monitoring and alerts for the pipeline
Training data should be carefully managed
- Sampling scheme
- Version control
- Monitoring feature distribution changes
- Feature extraction and transformations
- Feature value validation
- Shared feature store or individual pipelines
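One common way to monitor feature distribution changes (my illustration of the bullet above, not necessarily the tooling Pinterest uses) is the population stability index (PSI), which compares a current feature distribution against a training-time reference:

```python
import numpy as np

def population_stability_index(expected, actual, bins=10):
    """PSI between a reference (training) and a current feature distribution.
    Rule of thumb (an assumption, not from the talk): PSI > 0.2 signals a
    major shift worth investigating. Values of `actual` outside the
    reference's range fall out of the histogram; fine for a sketch."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Floor the bin fractions to avoid log(0).
    e_pct = np.clip(e_pct, 1e-6, None)
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(0)
train_feature = rng.normal(0.0, 1.0, 10_000)
same_dist = rng.normal(0.0, 1.0, 10_000)   # healthy: same distribution
shifted = rng.normal(1.0, 1.0, 10_000)     # drifted: mean moved by 1 sigma
psi_ok = population_stability_index(train_feature, same_dist)
psi_bad = population_stability_index(train_feature, shifted)
```

Running this check per feature on a schedule, and alerting when PSI crosses a threshold, is one concrete way to implement "monitoring feature distribution changes."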
Training and serving data discrepancy (skew)?
- Is training data sampled differently from serving data?
- Is there a lag before certain features are populated (e.g. features that take a long time to compute)?
- Logging changes?
- ETL breaks?
- Seasonality
- Market differences
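A cheap guard against the feature-lag flavor of skew listed above is a coverage check: compare, per feature, how often it is populated in training data versus in logged serving data. This is a sketch (the feature names and the 5% threshold are made up):

```python
def feature_coverage(rows, feature):
    """Fraction of rows where a feature is present and non-null."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if r.get(feature) is not None) / len(rows)

def skew_report(train_rows, serving_rows, features, max_gap=0.05):
    """Flag features whose coverage differs materially between training
    and serving -- e.g. a feature computed by a slow batch job that is
    present at training time but missing at serving time."""
    flagged = {}
    for f in features:
        t = feature_coverage(train_rows, f)
        s = feature_coverage(serving_rows, f)
        if abs(t - s) > max_gap:
            flagged[f] = {"train": t, "serving": s}
    return flagged

# Toy example: "fresh_score" lags at serving time; "topic" is always present.
train = [{"fresh_score": 0.7, "topic": "food"} for _ in range(100)]
serve = [{"fresh_score": (0.7 if i < 40 else None), "topic": "food"}
         for i in range(100)]
report = skew_report(train, serve, ["fresh_score", "topic"])
```

A report like this can feed the pipeline's monitoring and alerting, so skew is caught before it silently degrades the model.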
How to evaluate a candidate model
- Your favorite offline performance measures
- Human evaluation
- Custom tools (e.g. side-by-side comparisons, simulated debuggers for sanity checks, funnels, etc.)
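A typical "favorite offline performance measure" for a binary engagement model is ROC AUC. As a self-contained sketch (a from-scratch version; in practice you would likely use a library implementation):

```python
def auc(labels, scores):
    """ROC AUC via the rank-sum (Mann-Whitney) formulation.
    Sketch only: ties in scores are not handled specially."""
    ranked = sorted(zip(scores, labels))
    rank_sum, pos = 0.0, 0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            rank_sum += rank  # sum of ranks of the positive examples
            pos += 1
    neg = len(labels) - pos
    return (rank_sum - pos * (pos + 1) / 2) / (pos * neg)

# Perfect separation scores 1.0; a partial ordering scores in between.
perfect = auc([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9])
partial = auc([0, 1, 0, 1], [0.1, 0.2, 0.3, 0.4])
```

AUC rewards ranking positives above negatives, which matches what a ranking model is asked to do, though (as the next slide notes) offline metrics alone never settle a launch decision.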
How to evaluate a candidate model
- Goal metrics
- Leading indicators
- Debug metrics
- Guardrail metrics
- Custom tools
- Metrics vs. loss function
Shipping criteria should include…
- Metrics
- Infrastructure cost
- Maintenance overhead (regularization!)
- Product vision
- Cannibalization
- Speed vs. iteration
Once shipped, continue to monitor
- Continuous monitoring:
  - Goal metrics on dashboards
  - Alerts for data and prediction distribution drifts
- Runbook, tools and delegation for investigations
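A minimal version of the "alerts for prediction distribution drifts" bullet is a check of live prediction scores against a baseline snapshot (the threshold and numbers here are illustrative assumptions):

```python
import statistics

def prediction_drift_alert(baseline_scores, live_scores, max_mean_shift=0.05):
    """Fire an alert when the mean predicted score drifts away from a
    baseline snapshot -- a cheap first-line proxy for distribution drift.
    A real system would also compare full distributions, not just means."""
    shift = abs(statistics.mean(live_scores) - statistics.mean(baseline_scores))
    return shift > max_mean_shift

baseline = [0.30, 0.32, 0.28, 0.31]   # snapshot taken at launch
healthy = [0.29, 0.33, 0.30, 0.30]    # live scores, no drift
drifted = [0.10, 0.12, 0.09, 0.11]    # live scores after an upstream break
```

When such an alert fires, the runbook mentioned above tells the on-call engineer where to start investigating (recent logging changes, ETL breaks, feature coverage drops).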
Automation is key
Lessons learned
#1 Beware of Data and System Bias
#2 Testing & Monitoring… (Do it!)
#3 Good Infrastructure Speeds Up Iteration
#4 Measurement and Understanding are Crucial
#5 Build a Sustainable Ecosystem
#6 Design an ML-Minded Product, and a Product-Minded ML System
#1 Beware of Data and System Bias
Engagement data complements pin information
Engagement data is a double-edged sword!
Remove bias and effects of the existing system as much as possible (so the rich don't get richer)
#2 Testing & Monitoring …..(Do it!)
GBDT migration to neural network
[Chart: an important metric degraded for weeks after the migration — not good!]
What went wrong:
- Offline data distribution != online data distribution
- Data coverage drops or corruption -> silent failures
- Data changes
#3 Good Infrastructure Speeds Up Iteration
Can multiple engineers work on the system simultaneously?
• Are there automated training/deploy pipelines? Can they ship multiple experiments at once?
• Are there effective offline analysis tools to help reduce the number of live experiments needed?
#4 Measurement and Understanding are Crucial
Offline performance != Online performance
• The final bar is performance on live traffic
• Run experiments to learn
Invest in tooling and experiments to understand the black box
• Ablation experiments
• Are sub-populations of users disproportionately impacted?
• Analyses and tools to help us understand long-term, ecosystem effects
It's easy to get what you wish for, but not what you want… (Goodhart's Law)
#5 Build a Sustainable Ecosystem
Do we handle cold starts elegantly?
Are we taking care of fresh content with fewer impressions?
[Chart: fresher content receives lower ranking scores; older content receives higher scores]
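One way to keep fresh, low-impression content in the running (an illustrative sketch in the spirit of upper-confidence-bound exploration, not a description of Pinterest's system) is to add a bonus that shrinks as impressions accumulate:

```python
import math

def adjusted_score(base_score, impressions, bonus_weight=0.1):
    """Add an uncertainty bonus that decays with impressions, so fresh
    pins with few impressions still get a chance to be shown and to
    collect the engagement data the ranker needs."""
    return base_score + bonus_weight / math.sqrt(impressions + 1)

# A fresh pin with a lower base score can outrank a seasoned one.
fresh = adjusted_score(0.50, impressions=0)
seasoned = adjusted_score(0.55, impressions=9999)
```

The `bonus_weight` knob trades off how aggressively the system explores fresh content against short-term engagement.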
Do we handle cold starts elegantly?
Are we taking care of content with missing features (or features whose generation is delayed)?
Streaky, offensive content!
Build a system with tight negative feedback, and make use of (explicit) negative signals as much as possible
• Model / objective function: change the label / prediction target / model architecture so that negative events are tied to the objective function we optimize
• Features: add more features that help in predicting negative events
But separate spam/racy filtering from negative-signal incorporation in ML models
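One simple way to tie negative events to the objective function, as the first bullet suggests (a sketch of the general technique, with made-up numbers and weights), is a per-example weighted log loss where explicit negative actions are up-weighted:

```python
import math

def weighted_log_loss(labels, preds, weights):
    """Binary log loss where each example carries a weight. Explicit
    negative actions (e.g. a 'hide') can be up-weighted so the model is
    penalized more for recommending content users actively reject."""
    total = 0.0
    for y, p, w in zip(labels, preds, weights):
        p = min(max(p, 1e-9), 1.0 - 1e-9)  # clamp to avoid log(0)
        total += -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / sum(weights)

# The third example is a badly mispredicted explicit negative (label 0,
# predicted 0.9). Up-weighting it raises the average loss sharply.
labels = [1, 0, 0]
preds = [0.8, 0.3, 0.9]
uniform = weighted_log_loss(labels, preds, [1.0, 1.0, 1.0])
upweighted = weighted_log_loss(labels, preds, [1.0, 1.0, 5.0])
```

Changing the label definition itself (e.g. treating a "hide" as a hard 0, or as a separate prediction head) is the heavier-weight alternative the slide also mentions.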
#6 Design an ML-Minded Product, and a Product-Minded ML System
Do you really need ML?
For complex problems like diversity and freshness, ML components need to work in concert. Beware of bottlenecks!
Important to have a way to build policy and product vision into the ML system
Independent surfaces for exploitation vs. exploration
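The simplest traffic split between the two surfaces (an epsilon-greedy-style sketch; the 5% fraction is an assumption for illustration) just routes a small, fixed share of requests to the exploration surface:

```python
import random

def choose_surface(rng, epsilon=0.05):
    """Route a small fraction of traffic to an exploration surface;
    the rest goes to the fully engagement-optimized (exploitation)
    surface. Exploration traffic generates the unbiased engagement
    data that keeps the main ranker from only reinforcing itself."""
    return "exploration" if rng.random() < epsilon else "exploitation"

rng = random.Random(42)  # seeded for a reproducible sketch
counts = {"exploration": 0, "exploitation": 0}
for _ in range(10_000):
    counts[choose_surface(rng)] += 1
```

Keeping the surfaces independent, as the slide suggests, means exploration results can be evaluated and tuned without destabilizing the main exploitation experience.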
Build a system for the users of tomorrow (or users you really care about)
[Chart: global engagement vs. local engagement]
Thank you