Designing an ML-Minded Product and a Product-Minded ML System




slide-1
SLIDE 1

Designing an ML-Minded Product and a Product-Minded ML System

ACM Webinar January 23, 2019 Grace Huang

slide-2
SLIDE 2

Personalized Homefeed

slide-3
SLIDE 3

Personalized Homefeed

Personalization: Scoring and ranking

Picking the best of the best among candidates

slide-4
SLIDE 4
A ranking model

  • Supervised learning with labels:
      • 1 = some positive engagements
      • 0 = no engagement or negative actions
  • Learns to predict a positive engagement (ranking) score
  • Pins are then sorted by engagement score = f(pin, user, …)
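
A minimal sketch of the scoring-and-sorting step on this slide. The logistic scorer, the feature names (`topic`, `quality`, `interests`) and the hand-set weights are illustrative assumptions, not the model from the talk; in practice f would be learned from the 0/1 engagement labels.

```python
import math

# Hypothetical hand-set weights standing in for a trained model.
WEIGHTS = {"topic_match": 2.0, "pin_quality": 1.0, "bias": -1.5}


def engagement_score(pin, user):
    """Return f(pin, user): a probability-like score in (0, 1)."""
    topic_match = 1.0 if pin["topic"] in user["interests"] else 0.0
    z = (WEIGHTS["topic_match"] * topic_match
         + WEIGHTS["pin_quality"] * pin["quality"]
         + WEIGHTS["bias"])
    return 1.0 / (1.0 + math.exp(-z))


def rank_pins(candidates, user):
    """Sort candidate pins by predicted engagement score, best first."""
    return sorted(candidates,
                  key=lambda pin: engagement_score(pin, user),
                  reverse=True)


user = {"interests": {"coffee", "hiking"}}
candidates = [
    {"id": "a", "topic": "cars", "quality": 0.9},
    {"id": "b", "topic": "coffee", "quality": 0.4},
    {"id": "c", "topic": "coffee", "quality": 0.8},
]
ranked_ids = [pin["id"] for pin in rank_pins(candidates, user)]
```

Here the two coffee pins outrank the higher-quality but off-topic pin, because the assumed topic-match weight dominates.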

slide-5
SLIDE 5

Data collection → Feature engineering → Model training → Prediction (store and serve)

slide-6
SLIDE 6

Components of a production ML system

[Diagram: pipeline stages Data → Training → Evaluation → Launch → Serving]

  • Data (for training/testing)
  • Data pipeline (training/test)
  • Model
  • Offline evaluations
  • On-line experiments
  • Launch (production model)
  • Data (to make predictions)
  • Predictions

slide-7
SLIDE 7

We will focus on data, evaluation and shipping


slide-8
SLIDE 8

Considerations for a data pipeline


slide-9
SLIDE 9

[Figure: example Pin "The perfect path to cold brew"; visible labels: Cravings, Omar Seyal, 36 Caffeinated Inc.]

User profile:
  • User’s past actions: engagement signals
  • Derived user profiles from past actions

Pin:
  • Derived pin information

Engagement score = f(pin, user, …)

slide-10
SLIDE 10

Considerations for a data pipeline


  • Logging (and changes)
  • Aggregations (ETLs)
  • ETL management libraries
  • Data validation
  • Monitoring and alerts for the pipeline
slide-11
SLIDE 11

Training data should be carefully managed


  • Sampling scheme
  • Version control
  • Monitoring feature distribution changes
  • Feature extraction and transformations
  • Feature value validation
  • Shared feature store or individual pipelines
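
The "feature value validation" item above can be sketched as a per-row range check. The feature names and bounds in `FEATURE_RANGES` are hypothetical; a real pipeline would load them from a shared schema or feature store.

```python
import math

# Hypothetical feature specs: name -> (min, max). Assumed for illustration.
FEATURE_RANGES = {
    "ctr_7d": (0.0, 1.0),
    "pin_age_days": (0.0, 3650.0),
}


def validate_row(row):
    """Return a list of human-readable problems found in one feature row."""
    problems = []
    for name, (lo, hi) in FEATURE_RANGES.items():
        value = row.get(name)
        if value is None:
            problems.append(f"{name}: missing")
        elif isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{name}: not numeric")
        elif math.isnan(value):
            problems.append(f"{name}: NaN")
        elif not lo <= value <= hi:
            problems.append(f"{name}: {value} outside [{lo}, {hi}]")
    return problems
```

Running this on training rows before each training job, and on a sample of serving rows, gives one cheap signal that an upstream ETL or logging change has altered a feature.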
slide-12
SLIDE 12

Training and serving data discrepancy (skew)?


  • Is training data sampled differently from serving data?
  • Is there a lag before certain features are populated (e.g. they take a long time to compute)?
  • Logging change?
  • ETL breaks?
  • Seasonality
  • Market differences
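
One way to quantify the skew questions above is to compare the distribution of a feature in the training set against a sample logged at serving time. The sketch below uses the population stability index (PSI), which is a technique chosen here for illustration, not one named in the talk. The bin count and smoothing constant are assumptions.

```python
import math


def psi(train_values, serve_values, num_bins=10, eps=1e-6):
    """Population stability index between two samples of one numeric feature."""
    lo, hi = min(train_values), max(train_values)
    width = (hi - lo) / num_bins or 1.0  # avoid zero width for constant features

    def bin_fractions(values):
        counts = [0] * num_bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), num_bins - 1)  # clamp out-of-range values
            counts[idx] += 1
        # Smooth empty bins so the logarithm below is defined.
        return [max(c / len(values), eps) for c in counts]

    train_frac = bin_fractions(train_values)
    serve_frac = bin_fractions(serve_values)
    return sum((s - t) * math.log(s / t)
               for t, s in zip(train_frac, serve_frac))
```

Identical samples give a PSI of 0, and larger values mean a larger shift. A common rule of thumb treats values above roughly 0.2 as worth investigating, but the right threshold depends on the feature and should be tuned.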
slide-13
SLIDE 13

How to evaluate a candidate model


  • Your favorite offline performance measures
  • Human evaluation
  • Custom tools (e.g. side by side, simulated debuggers for sanity checks, funnels, etc.)

slide-14
SLIDE 14

How to evaluate a candidate model


  • Goal metrics
  • Leading indicators
  • Debug metrics
  • Guardrail metrics
  • Custom tools
  • Metrics vs. loss function
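
A rough sketch of how goal and guardrail metrics from an online experiment might be combined into a ship recommendation. The `MetricResult` type, the `guardrail_floor` threshold and the "higher is better" convention are assumptions made for this example. Leading indicators and debug metrics are left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class MetricResult:
    name: str
    kind: str               # "goal" or "guardrail"
    relative_change: float  # treatment vs. control, e.g. 0.02 means +2%
    significant: bool       # outcome of whatever statistical test the team uses


def ship_recommendation(results, guardrail_floor=-0.01):
    """Return (ok_to_ship, reasons). Assumes higher is better for every metric."""
    reasons = []
    goal_wins = [r for r in results
                 if r.kind == "goal" and r.significant and r.relative_change > 0]
    if not goal_wins:
        reasons.append("no significant goal-metric win")
    for r in results:
        if (r.kind == "guardrail" and r.significant
                and r.relative_change < guardrail_floor):
            reasons.append(
                f"guardrail breached: {r.name} ({r.relative_change:+.1%})")
    return (not reasons, reasons)
```

The point of the separation is that the goal metric decides whether a change is worth shipping, while guardrails can only veto it.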
slide-15
SLIDE 15

Shipping criteria should include…


  • Metrics
  • Infrastructure cost
  • Maintenance overhead (regularization!)
  • Product vision
  • Cannibalization
  • Speed vs. iteration

slide-16
SLIDE 16

Once shipped, continue to monitor


  • Continuous monitoring:
  • Goal metrics on dashboards
  • Alerts for data and prediction distribution drifts
  • Runbook, tools and delegation for investigations

slide-17
SLIDE 17

Automation is key


slide-18
SLIDE 18

Lessons learned

#1 Beware of Data and System Bias
#2 Testing & Monitoring … (Do it!)
#3 Good Infrastructure Speeds Up Iteration
#4 Measurement and Understanding are Crucial
#5 Build a Sustainable Ecosystem
#6 Design an ML-Minded Product, and a Product-Minded ML System

slide-19
SLIDE 19

#1 Beware of Data and System Bias

slide-20
SLIDE 20

VS

Engagement data complements pin information

slide-21
SLIDE 21

VS

Engagement data is a double-edged sword!

slide-22
SLIDE 22

Remove bias and effects of the existing system as much as possible (so the rich don’t get richer)
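
The slide does not say how the bias is removed. One common approach, shown here purely as an illustration, is inverse propensity weighting for position bias: engagements logged in slots that users rarely examine are up-weighted at training time. The propensity values and the clipping constant are made-up assumptions; in practice propensities would have to be estimated, for example from randomized traffic.

```python
# Hypothetical examination propensities: P(user looks at slot k).
POSITION_PROPENSITY = {1: 1.0, 2: 0.6, 3: 0.4, 4: 0.25}
MAX_WEIGHT = 10.0  # clip to limit variance from rarely examined slots


def training_weight(label, position):
    """Sample weight for one logged impression.

    Positives are up-weighted by 1 / propensity of the slot they were
    shown in; negatives keep weight 1.0 in this simplified sketch.
    """
    if label == 0:
        return 1.0
    # Unknown (deeper) slots fall back to the lowest known propensity.
    propensity = POSITION_PROPENSITY.get(
        position, min(POSITION_PROPENSITY.values()))
    return min(1.0 / propensity, MAX_WEIGHT)
```

With these numbers an engagement in slot 4 counts four times as much as one in slot 1, which counteracts the advantage that already highly ranked content gets from exposure alone.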

slide-23
SLIDE 23

#2 Testing & Monitoring … (Do it!)

slide-24
SLIDE 24

[Chart: "Some important metric" plotted over weeks; annotation: "Not good!!!"]

slide-25
SLIDE 25

[Chart: "Some important metric" plotted over weeks; annotation: "Not good!!!". Labels: GBDT; Migration to Neural Network]

slide-26
SLIDE 26

[Chart: "Some important metric" plotted over weeks; annotation: "Not good!!!". Events marked: Migration to Neural Network; Data change]

  • Offline data distribution != Online data distribution
  • Data coverage drop or corruption -> Silent failures
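
Silent failures from a coverage drop can be caught with a simple check that compares how often each feature is populated now against a trusted baseline window. This is a sketch under assumptions: rows are plain dicts, a missing feature is an absent key or `None`, and the `max_drop` threshold is arbitrary.

```python
def coverage(rows, feature):
    """Fraction of rows where the feature is present and not None."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if r.get(feature) is not None) / len(rows)


def coverage_alerts(baseline_rows, current_rows, features, max_drop=0.05):
    """Return {feature: (baseline, current)} for features whose coverage
    fell by more than max_drop compared with the baseline."""
    alerts = {}
    for feature in features:
        base = coverage(baseline_rows, feature)
        cur = coverage(current_rows, feature)
        if base - cur > max_drop:
            alerts[feature] = (base, cur)
    return alerts
```

Wiring the result into an alerting system turns a problem that would otherwise surface weeks later in a goal metric into a same-day page.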

slide-27
SLIDE 27

#3 Good Infrastructure Speeds Up Iteration

slide-28
SLIDE 28

Can multiple engineers work on the system simultaneously?

  • Are there automated training/deploy pipelines? Can they ship multiple experiments at once?
  • Are there effective offline analysis tools to help reduce the number of live experiments needed?

slide-29
SLIDE 29

#4 Measurement and Understanding are Crucial

slide-30
SLIDE 30
Offline performance != Online performance

  • Final bar is running on live traffic
  • Run experiments to learn

slide-31
SLIDE 31

Invest in tooling and experiments to understand the black box

  • Ablation experiments
  • Are sub-populations of users disproportionately impacted?
  • Analyses and tools to help us understand long-term ecosystem effects
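
A small sketch of the sub-population question: break an experiment's metric down by user segment and compare treatment against control in each. The row layout `(segment, arm, value)` and the segment names are assumptions, and no significance testing is done here, so small segments will be noisy.

```python
from collections import defaultdict


def lift_by_segment(rows):
    """rows: iterable of (segment, arm, value), arm in {"control", "treatment"}.

    Returns {segment: relative lift of the treatment mean over the control mean}.
    """
    sums = defaultdict(lambda: {"control": [0.0, 0], "treatment": [0.0, 0]})
    for segment, arm, value in rows:
        sums[segment][arm][0] += value
        sums[segment][arm][1] += 1
    lifts = {}
    for segment, arms in sums.items():
        (c_sum, c_n), (t_sum, t_n) = arms["control"], arms["treatment"]
        if c_n == 0 or t_n == 0 or c_sum == 0:
            continue  # not enough data to compute a lift
        lifts[segment] = (t_sum / t_n) / (c_sum / c_n) - 1.0
    return lifts
```

A model that wins overall can still hurt one segment badly, for example new users, and an aggregate metric will hide that.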

slide-32
SLIDE 32

It’s easy to get what you wish for, but not what you want… (Goodhart’s Law)

slide-33
SLIDE 33

#5 Build a Sustainable Ecosystem

slide-34
SLIDE 34

Are we taking care of fresh content with few impressions?

[Chart: ranking score (lower to higher) against content age (fresher to older)]

Do we handle cold starts elegantly?

slide-35
SLIDE 35

Are we taking care of content with missing features (or features whose generation is delayed)?

Do we handle cold starts elegantly?

Streaky, offensive content!

slide-36
SLIDE 36

Build a system with tight negative feedback, and make use of (explicit) negative signals as much as possible

But separate spam/racy filtering from negative signal incorporation in ML models

  • Model / Objective Function - Change label / prediction target / model architecture so that negative events are tied to the objective function we optimize
  • Features - Add more features that help in predicting negative events
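
One possible reading of "tie negative events to the objective function" is sketched below: keep the binary label, but give explicit negative actions a larger sample weight in the loss. The action names (`save`, `click`, `hide`, `report`) and `NEGATIVE_WEIGHT` are hypothetical, and changing the prediction target or architecture, as the slide also mentions, would be alternative routes.

```python
import math

# Hypothetical action names and weight, assumed for illustration.
POSITIVE_ACTIONS = {"save", "click"}
NEGATIVE_ACTIONS = {"hide", "report"}
NEGATIVE_WEIGHT = 5.0


def label_and_weight(actions):
    """Map the actions on one impression to (label, sample_weight)."""
    actions = set(actions)
    if actions & NEGATIVE_ACTIONS:
        return 0, NEGATIVE_WEIGHT  # explicit negative: penalized more heavily
    if actions & POSITIVE_ACTIONS:
        return 1, 1.0
    return 0, 1.0                  # no engagement


def weighted_log_loss(examples):
    """examples: list of (predicted_probability, actions)."""
    total, weight_sum = 0.0, 0.0
    for p, actions in examples:
        y, w = label_and_weight(actions)
        p = min(max(p, 1e-7), 1 - 1e-7)  # keep the logarithms finite
        total += -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
        weight_sum += w
    return total / weight_sum
```

With this weighting, confidently recommending a pin the user then hides costs the model more than recommending one the user merely ignores.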


slide-37
SLIDE 37

#6 Design an ML-Minded Product, and a Product-Minded ML System

slide-38
SLIDE 38

Do you really need ML?

slide-39
SLIDE 39

For complex problems like diversity and freshness, ML components need to work in concert

Beware of bottlenecks!!

slide-40
SLIDE 40

Important to have a way to build policy and product vision into the ML system

slide-41
SLIDE 41

Independent surfaces for exploitation vs. exploration

[Diagram labels: Exploration; Exploitation]

slide-42
SLIDE 42

Build a system for tomorrow’s users (or users you really care about)

Global engagement Local engagement

slide-43
SLIDE 43


Thank you