Machine Learning Pipeline for Real-time Forecasting @Uber Marketplace



slide-1
SLIDE 1

Machine Learning Pipeline for Real-time Forecasting @Uber Marketplace

Chong Sun, Danny Yuan

slide-2
SLIDE 2

Forecasting On A Global Scale

slide-3
SLIDE 3
slide-4
SLIDE 4

01.01.17

Cases For Real-Time Forecasting

slide-5
SLIDE 5

Dynamic Pricing: Every Minute, Every Where

slide-6
SLIDE 6

Dynamic Pricing: Every Minute, Every Where, Every Trip

slide-7
SLIDE 7

We Forecast Time Series

slide-8
SLIDE 8

We Forecast Time Series For Given Geo Locations

slide-9
SLIDE 9
slide-10
SLIDE 10

A Few Constraints

  • More recent data has more signals
slide-11
SLIDE 11

A Few Constraints

  • Smaller areas have more noise
slide-12
SLIDE 12

A Few Constraints

  • Smaller areas have more noise
slide-13
SLIDE 13

A Few Constraints

  • More recent data has more signals
  • Smaller areas have more noise
  • We were rolling out business city by city with competing models
      ○ FFT
      ○ Kalman Filter
      ○ Regressions
      ○ LSTM

slide-14
SLIDE 14

First Pipeline

slide-15
SLIDE 15

The Training Pipeline

slide-16
SLIDE 16

The Training Pipeline

slide-17
SLIDE 17

The Training Pipeline

slide-18
SLIDE 18

The Training Pipeline

  • Airflow
  • PySpark
  • SciPy
slide-19
SLIDE 19

The Training Pipeline

  • Cassandra
slide-20
SLIDE 20

A Need for Fast Time Series DB

  • Cassandra
  • Elasticsearch
slide-21
SLIDE 21

A Need For Streaming Data

  • Kafka
slide-22
SLIDE 22

A Need For Unified Feature Engine

slide-23
SLIDE 23

A Digression To Feature Engine

slide-24
SLIDE 24

A Digression To Feature Engine

  • DataFlow API
slide-25
SLIDE 25

A Digression To Feature Engine

  • Flink
slide-26
SLIDE 26

A Digression To Feature Engine

  • Reusable functions
  • Schema driven
  • Discoverable by metadata
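
The three properties on this slide can be illustrated with a small registry. This is only a sketch under stated assumptions, not the feature engine from the talk: `FeatureSpec`, `FeatureRegistry`, and the `demand_supply_ratio` feature with its `requests` and `open_cars` inputs are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass(frozen=True)
class FeatureSpec:
    """Schema entry for one feature (all field names are hypothetical)."""
    name: str
    inputs: Sequence[str]     # raw fields the feature reads, in argument order
    dtype: str                # declared output type, e.g. "float"
    description: str          # free-text metadata used for discovery
    fn: Callable[..., float]  # reusable function, callable from batch or streaming code


class FeatureRegistry:
    """Holds feature schemas so they can be looked up, searched, and computed."""

    def __init__(self) -> None:
        self._specs: Dict[str, FeatureSpec] = {}

    def register(self, spec: FeatureSpec) -> None:
        if spec.name in self._specs:
            raise ValueError(f"feature {spec.name!r} is already registered")
        self._specs[spec.name] = spec

    def search(self, keyword: str) -> List[str]:
        """Discover features whose description mentions the keyword."""
        kw = keyword.lower()
        return sorted(n for n, s in self._specs.items() if kw in s.description.lower())

    def compute(self, name: str, record: Dict[str, float]) -> float:
        """Evaluate a feature on one record by following its declared schema."""
        spec = self._specs[name]
        return spec.fn(*(record[field] for field in spec.inputs))


registry = FeatureRegistry()
registry.register(FeatureSpec(
    name="demand_supply_ratio",          # hypothetical feature
    inputs=("requests", "open_cars"),    # hypothetical raw fields
    dtype="float",
    description="Ratio of trip requests to available drivers",
    fn=lambda requests, open_cars: requests / max(open_cars, 1.0),
))
ratio = registry.compute("demand_supply_ratio", {"requests": 10.0, "open_cars": 5.0})  # 2.0
```

Because the function is reached only through its schema entry, the same definition can in principle be applied to a batch of historical records or to one streaming record at a time.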
slide-27
SLIDE 27

Inferencing Pipeline

  • Elasticsearch
slide-28
SLIDE 28

Inferencing Pipeline

slide-29
SLIDE 29

Real-time Visualization

slide-30
SLIDE 30

Real-time Validation

slide-31
SLIDE 31

A New Challenge: Model Management

slide-32
SLIDE 32
slide-33
SLIDE 33

More Signals

slide-34
SLIDE 34

Scalable Model Evaluation

slide-35
SLIDE 35

Metrics-as-a-Service

slide-36
SLIDE 36

Model Lifecycle Management System (MLMS)

slide-37
SLIDE 37

What if you're supporting 5+ teams and 10+ products, with 4000+ model instances in production?

slide-38
SLIDE 38
slide-39
SLIDE 39
slide-40
SLIDE 40
slide-41
SLIDE 41
slide-42
SLIDE 42

Machine Learning Model Lifecycle

slide-43
SLIDE 43

Machine Learning Model Lifecycle

slide-44
SLIDE 44

Machine Learning Model Lifecycle

slide-45
SLIDE 45

Machine Learning Model Lifecycle

slide-46
SLIDE 46

Machine Learning Model Lifecycle

slide-47
SLIDE 47

Machine Learning Model Lifecycle

slide-48
SLIDE 48

Common Questions in the process ...

  • Where am I going to save and serve my models?
  • How do I keep track of the model metadata, e.g., training data used?
  • How can I easily find a previous model for testing and performance comparison?
  • How can I automatically deploy a large number of models?
  • When should I decide to trigger model re-training?
  • How can I make sure I do not overwrite any (production) models?
  • How do we manage multiple dependent models?
  • … ...
slide-49
SLIDE 49

Common Questions in the process ...

  • Where am I going to save and serve my models?
  • How do I keep track of the model metadata, e.g., training data used?
  • How can I easily find a previous model for testing and performance comparison?
  • How can I automatically deploy a large number of models?
  • When should I decide to trigger model re-training?
  • How can I make sure I do not overwrite any (production) models?
  • How do we manage multiple dependent models?
  • … ...

Model Lifecycle Management System (MLMS)

slide-50
SLIDE 50

MLMS Design Principles

  • Immutable Models
  • Model Neutral
  • Flexible
  • Automated Dynamic Orchestration
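
A minimal sketch of the "Immutable Models" and "Model Neutral" principles, assuming a simple in-memory store. `ModelStore`, `ModelRecord`, and their methods are hypothetical names, not the MLMS API: every save creates a new version, stored records cannot be modified, and production is only a pointer to an existing version.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Dict, Mapping, Tuple


@dataclass(frozen=True)
class ModelRecord:
    name: str
    version: int
    artifact: bytes              # serialized model, opaque to the store (model neutral)
    metadata: Mapping[str, str]  # e.g. a reference to the training data used


class ModelStore:
    """Append-only store: saving never replaces an existing version."""

    def __init__(self) -> None:
        self._records: Dict[Tuple[str, int], ModelRecord] = {}
        self._latest: Dict[str, int] = {}
        self._production: Dict[str, int] = {}

    def save(self, name: str, artifact: bytes, metadata: Mapping[str, str]) -> ModelRecord:
        version = self._latest.get(name, 0) + 1
        # Copy the metadata and wrap it read-only so later edits cannot leak in.
        record = ModelRecord(name, version, artifact, MappingProxyType(dict(metadata)))
        self._records[(name, version)] = record
        self._latest[name] = version
        return record

    def get(self, name: str, version: int) -> ModelRecord:
        return self._records[(name, version)]

    def promote(self, name: str, version: int) -> None:
        """Point production at an existing version; the model itself is untouched."""
        if (name, version) not in self._records:
            raise KeyError(f"unknown model version: {name} v{version}")
        self._production[name] = version

    def production(self, name: str) -> ModelRecord:
        return self._records[(name, self._production[name])]
```

With this shape, rolling back is a matter of promoting an earlier version, and a previous model stays available for testing and performance comparison.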
slide-51
SLIDE 51

MLMS Architecture

slide-52
SLIDE 52

MLMS Architecture

slide-53
SLIDE 53

MLMS Architecture

slide-54
SLIDE 54

MLMS Architecture

slide-55
SLIDE 55

MLMS Architecture

slide-56
SLIDE 56

MLMS Architecture

slide-57
SLIDE 57

MLMS Architecture

slide-58
SLIDE 58

Machine Learning Model Lifecycle MLMS

slide-59
SLIDE 59

Data Science and Engineering Work Flow

slide-60
SLIDE 60

Data Scientists And Engineers Work In Lockstep

slide-61
SLIDE 61

Engineers Are Blocked Before Modeling Is Done

slide-62
SLIDE 62

Time For Productization Is Often Squeezed

slide-63
SLIDE 63

Rolling Out To All Cities Is Slow And Painful

slide-64
SLIDE 64

Analysis of Bottlenecks

  • Model Exploration (DS, Python)
  • Model Training and Serving Implementation (DS/Eng, Python/Go/Java)
  • Model Serving Production (Eng, Go/Java)

slide-65
SLIDE 65

Analysis of Bottlenecks

  • Model Exploration (DS, Python)
  • Model Training and Serving Implementation (DS/Eng, Python/Go/Java)
  • Model Serving Production (Eng, Go/Java)

Bottleneck: Restricted Models

slide-66
SLIDE 66

Analysis of Bottlenecks

  • Model Exploration (DS, Python)
  • Model Training and Serving Implementation (DS/Eng, Python/Go/Java)
  • Model Serving Production (Eng, Go/Java)

Bottlenecks: DS → Eng Knowledge Transfer; Reimplementing Model

slide-67
SLIDE 67

Analysis of Bottlenecks

  • Model Exploration (DS, Python)
  • Model Training and Serving Implementation (DS/Eng, Python/Go/Java)
  • Model Serving Production (Eng, Go/Java)

Bottleneck: DS/Eng Model Parity

slide-68
SLIDE 68

Analysis of Bottlenecks

  • Model Exploration (DS, Python)
  • Model Training and Serving Implementation (DS/Eng, Python/Go/Java)
  • Model Serving Production (Eng, Go/Java)

Bottleneck: DS/Eng Performance Debug

slide-69
SLIDE 69

Key Insight: Can We All Enjoy One ML Ecosystem?

slide-70
SLIDE 70

Unified Framework → Many Benefits

  • Standardized project structure
  • Out-of-the-box support of local and remote deployment
  • Reusable algorithms and framework
  • Design review between engineer and DS
  • Code review between engineer and DS
  • Who codes, who debugs
slide-71
SLIDE 71
slide-72
SLIDE 72
slide-73
SLIDE 73
slide-74
SLIDE 74
slide-75
SLIDE 75

TensorFlow

  • Model Exploration (DS, Python)
  • Model Training and Serving Implementation (DS/Eng, Python/Java)
  • Model Serving Production (Eng, Java)

Bottlenecks: Restricted Models; DS → Eng Knowledge Transfer; Reimplementing Model; DS/Eng Model Parity; Eng Model Performance Debug

[Diagram labels: Dev (Python); Train (Python); Serve (Python/Java); TensorFlow Graph (C++); Client; Runtime]

slide-76
SLIDE 76

Enable DS to Write Production-Ready Code

  • TensorFlow
      ○ Efficient core
      ○ DS-friendly API
  • Engineers focusing on optimization and automation
      ○ Parallelization of algorithms
      ○ End-to-end automation
      ○ Visualization
      ○ Integration
      ○ Project scaffolding

slide-77
SLIDE 77

Example

  • Build your own FTRL
  • Use a framework
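
To give a sense of what "build your own" involves, here is a minimal sketch of the per-coordinate FTRL-Proximal update. The slide does not show its code, so the following are assumptions: logistic loss, sparse features passed as a dict, and the hyperparameter names `alpha`, `beta`, `l1`, `l2` taken from the usual description of the algorithm. The class name `FTRLProximal` is hypothetical.

```python
import math
from typing import Dict


class FTRLProximal:
    """Per-coordinate FTRL-Proximal for logistic regression on sparse features."""

    def __init__(self, alpha: float = 0.1, beta: float = 1.0,
                 l1: float = 0.0, l2: float = 0.0) -> None:
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z: Dict[str, float] = {}  # accumulated adjusted gradients
        self.n: Dict[str, float] = {}  # accumulated squared gradients

    def _weight(self, key: str) -> float:
        """Closed-form weight for one coordinate; L1 clips small z to exactly 0."""
        z = self.z.get(key, 0.0)
        if abs(z) <= self.l1:
            return 0.0
        n = self.n.get(key, 0.0)
        sign = 1.0 if z > 0 else -1.0
        return -(z - sign * self.l1) / ((self.beta + math.sqrt(n)) / self.alpha + self.l2)

    def predict(self, x: Dict[str, float]) -> float:
        score = sum(self._weight(k) * v for k, v in x.items())
        score = max(min(score, 35.0), -35.0)  # keep exp() in a safe range
        return 1.0 / (1.0 + math.exp(-score))

    def update(self, x: Dict[str, float], y: int) -> float:
        """One online step on example (x, y) with y in {0, 1}; returns the prediction."""
        p = self.predict(x)
        for k, v in x.items():
            g = (p - y) * v                    # gradient of log loss for this coordinate
            n = self.n.get(k, 0.0)
            w = self._weight(k)                # weight before this step's update
            sigma = (math.sqrt(n + g * g) - math.sqrt(n)) / self.alpha
            self.z[k] = self.z.get(k, 0.0) + g - sigma * w
            self.n[k] = n + g * g
        return p
```

Even this small version leaves out batching, parallelization, checkpointing, and serving, which is the kind of work a shared framework takes over.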

slide-78
SLIDE 78

Building Tools

  • Model Lifecycle Management System
  • Hyperparameter Tuning
  • Horovod for Distributed TensorFlow Training
slide-79
SLIDE 79

Conclusion

  • A fully automated MLMS is key to the success of complex ML systems
  • A single framework for DS and engineers boosts productivity
  • Building great tools is crucial to ML projects
slide-80
SLIDE 80

Q & A

slide-81
SLIDE 81
slide-82
SLIDE 82

How do we make the forecasts?

slide-83
SLIDE 83

Batch forecasting (2015)

[Diagram labels: Batch Forecast; Data Sources; Forecasts (ARIMA, FFT)]

slide-84
SLIDE 84

Batch forecasting + Real-time Adjustment

[Diagram labels: Batch Forecast; Data Sources; Forecasts (ARIMA, FFT); Realtime Adjust & Serve; Consumer (Exponential Smoothing)]
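
The slide names exponential smoothing alongside the real-time adjustment step but does not show how it was applied. One plausible reading, sketched here as an assumption, is to smooth the recent error of the batch forecast and add it to the next batch value. The class `RealtimeAdjuster` and the parameter `alpha` are hypothetical.

```python
class RealtimeAdjuster:
    """Corrects a batch forecast with an exponentially smoothed recent error."""

    def __init__(self, alpha: float = 0.3) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha    # higher alpha puts more weight on the newest error
        self.residual = 0.0   # smoothed (actual - batch forecast)

    def observe(self, actual: float, batch_forecast: float) -> None:
        """Fold the latest observed forecast error into the smoothed residual."""
        error = actual - batch_forecast
        self.residual = self.alpha * error + (1.0 - self.alpha) * self.residual

    def adjust(self, batch_forecast: float) -> float:
        """Return the batch forecast shifted by the smoothed residual."""
        return batch_forecast + self.residual


adjuster = RealtimeAdjuster(alpha=0.5)
adjuster.observe(actual=120.0, batch_forecast=100.0)  # residual becomes 10.0
adjuster.observe(actual=110.0, batch_forecast=100.0)  # residual stays 10.0
adjusted = adjuster.adjust(100.0)                     # 110.0
```

The smoothing factor expresses the earlier constraint that more recent data carries more signal: older errors decay geometrically instead of being averaged equally.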

slide-85
SLIDE 85

Issues Observed

  • Not many ML libraries for Node.js
  • Real-time component (Node.js) cannot support CPU-intensive computation
  • Cannot handle large-scale data features in real time
  • Cannot share code between batch and online processing

slide-86
SLIDE 86

Second Generation of Forecasting Engine

(Inspired by DataFlow and TensorFlow)

Some interesting design principles:

  • Both real-time and batch prediction: prediction is minute-level, while backtesting/evaluation requires batch processing

slide-87
SLIDE 87

Machine Learning Model Lifecycle

slide-88
SLIDE 88

MLMS Architecture

Given model_name=linear_demand_model and city_id=1
When status == 'alerting' and time_sustained > 3 days
Then retrainModel(model_name, city_id, model_version)
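
The Given/When/Then rule above could be evaluated roughly as follows. This is a sketch, not the MLMS rule engine: `ModelStatus`, `RetrainRule`, and `apply_rule` are hypothetical names, and the `retrain_model` callback stands in for the slide's `retrainModel`.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Callable, Iterable


@dataclass(frozen=True)
class ModelStatus:
    model_name: str
    city_id: int
    model_version: int
    status: str                # e.g. "ok" or "alerting"
    time_sustained: timedelta  # how long the model has been in this status


@dataclass(frozen=True)
class RetrainRule:
    model_name: str                              # Given: which model
    city_id: int                                 # Given: which city
    trigger_status: str = "alerting"             # When: status to react to
    min_sustained: timedelta = timedelta(days=3) # When: must last longer than this

    def matches(self, s: ModelStatus) -> bool:
        given = s.model_name == self.model_name and s.city_id == self.city_id
        when = s.status == self.trigger_status and s.time_sustained > self.min_sustained
        return given and when


def apply_rule(rule: RetrainRule,
               statuses: Iterable[ModelStatus],
               retrain_model: Callable[[str, int, int], None]) -> int:
    """Then: call retrain_model for every matching status; return how many fired."""
    fired = 0
    for s in statuses:
        if rule.matches(s):
            retrain_model(s.model_name, s.city_id, s.model_version)
            fired += 1
    return fired
```

Keeping the rule as data means the same evaluator can cover thousands of model instances, with one rule per model and city, instead of hand-written retraining logic for each.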

slide-89
SLIDE 89