SLIDE 1 Machine Learning Pipeline for Real-time Forecasting @Uber Marketplace
Chong Sun, Danny Yuan
SLIDE 2
Forecasting On A Global Scale
SLIDE 3
SLIDE 4 01.01.17
Cases For Real-Time Forecasting
SLIDE 5
Dynamic Pricing: Every Minute, Everywhere
SLIDE 6
Dynamic Pricing: Every Minute, Everywhere, Every Trip
SLIDE 7
We Forecast Time Series
SLIDE 8
We Forecast Time Series For Given Geo Locations
SLIDE 9
SLIDE 10 A Few Constraints
- More recent data carries more signal
SLIDE 11 A Few Constraints
- Smaller areas have more noise
SLIDE 12 A Few Constraints
- Smaller areas have more noise
SLIDE 13 A Few Constraints
- More recent data carries more signal
- Smaller areas have more noise
- We were rolling out business city by city with competing models
  ○ FFT
  ○ Kalman Filter
  ○ Regressions
  ○ LSTM
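The first constraint (recent data matters more) is often handled by down-weighting older observations. The following is a minimal sketch of that idea only, not the speakers' method: the function names, the `half_life` parameter, and the sample numbers are all assumptions made for illustration.

```python
import math


def recency_weights(n, half_life):
    """Return n weights, oldest first, that halve every `half_life` steps.

    The newest observation always gets weight 1.0.
    """
    decay = math.log(2) / half_life
    return [math.exp(-decay * (n - 1 - i)) for i in range(n)]


def recency_weighted_mean(values, half_life=3.0):
    """Average `values` (oldest first), trusting recent points more."""
    if not values:
        raise ValueError("values must not be empty")
    weights = recency_weights(len(values), half_life)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Hypothetical per-minute demand counts: a recent jump from 10 to 20.
demand = [10, 10, 10, 10, 20, 20]
plain_mean = sum(demand) / len(demand)
weighted = recency_weighted_mean(demand)
print(plain_mean, weighted)
```

The weighted estimate sits closer to the recent level than the plain mean does, which is the behavior the constraint asks for.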
SLIDE 14
First Pipeline
SLIDE 15
The Training Pipeline
SLIDE 16
The Training Pipeline
SLIDE 17
The Training Pipeline
SLIDE 18 The Training Pipeline
SLIDE 19 The Training Pipeline
SLIDE 20 A Need for Fast Time Series DB
SLIDE 21 A Need For Streaming Data
SLIDE 22
A Need For Unified Feature Engine
SLIDE 23
A Digression To Feature Engine
SLIDE 24 A Digression To Feature Engine
SLIDE 25 A Digression To Feature Engine
SLIDE 26 A Digression To Feature Engine
- Reusable functions
- Schema driven
- Discoverable by metadata
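One way to read the three properties above is as a registry of feature functions that declare their inputs and carry searchable tags. The sketch below is an assumption about what such an engine could look like; `FeatureSpec`, `feature`, `find_features`, `compute`, and the `trips_per_driver` feature are hypothetical names, not part of the system described in the talk.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class FeatureSpec:
    name: str
    inputs: Tuple[str, ...]   # schema: raw fields the function needs
    output_type: str          # schema: declared result type
    tags: Tuple[str, ...]     # metadata used for discovery
    fn: Callable


REGISTRY: Dict[str, FeatureSpec] = {}


def feature(name, inputs, output_type, tags=()):
    """Register a reusable feature function together with its schema."""
    def decorator(fn):
        REGISTRY[name] = FeatureSpec(name, tuple(inputs), output_type, tuple(tags), fn)
        return fn
    return decorator


@feature("trips_per_driver", inputs=["trips", "drivers"],
         output_type="float", tags=["demand", "supply"])
def trips_per_driver(trips, drivers):
    return trips / drivers if drivers else 0.0


def find_features(tag):
    """Discover registered features by metadata tag."""
    return sorted(spec.name for spec in REGISTRY.values() if tag in spec.tags)


def compute(name, row):
    """Evaluate a feature on one row, validating the row against the schema."""
    spec = REGISTRY[name]
    missing = [field for field in spec.inputs if field not in row]
    if missing:
        raise KeyError(f"row is missing {missing} for feature {name!r}")
    return spec.fn(*(row[field] for field in spec.inputs))


print(find_features("demand"))
print(compute("trips_per_driver", {"trips": 30, "drivers": 10}))
```

Because the same registered function can be called from a training job and from a serving path, a design like this is one route to the "unified" engine the earlier slide asks for.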
SLIDE 27 Inferencing Pipeline
SLIDE 28
Inferencing Pipeline
SLIDE 29
Real-time Visualization
SLIDE 30
Real-time Validation
SLIDE 31
A New Challenge: Model Management
SLIDE 32
SLIDE 33
More Signals
SLIDE 34
Scalable Model Evaluation
SLIDE 35
Metrics-as-a-Service
SLIDE 36
Model Lifecycle Management System (MLMS)
SLIDE 37
What if you're supporting 5+ teams, 10+ products with 4000+ model instances in production?
SLIDE 38
SLIDE 39
SLIDE 40
SLIDE 41
SLIDE 42
Machine Learning Model Lifecycle
SLIDE 43
Machine Learning Model Lifecycle
SLIDE 44
Machine Learning Model Lifecycle
SLIDE 45
Machine Learning Model Lifecycle
SLIDE 46
Machine Learning Model Lifecycle
SLIDE 47
Machine Learning Model Lifecycle
SLIDE 48 Common Questions in the process ...
- Where am I going to save and serve my models?
- How do I keep track of the model metadata, e.g., training data used?
- How can I easily find a previous model for testing and performance comparison?
- How can I automatically deploy a large number of models?
- When should I decide to trigger model re-training?
- How can I make sure I do not overwrite any (production) models?
- How do we manage multiple dependent models?
- … ...
SLIDE 49 Common Questions in the process ...
- Where am I going to save and serve my models?
- How do I keep track of the model metadata, e.g., training data used?
- How can I easily find a previous model for testing and performance comparison?
- How can I automatically deploy a large number of models?
- When should I decide to trigger model re-training?
- How can I make sure I do not overwrite any (production) models?
- How do we manage multiple dependent models?
- … ...
Model Lifecycle Management System (MLMS)
SLIDE 50 MLMS Design Principles
- Immutable Models
- Model Neutral
- Flexible
- Automated Dynamic Orchestration
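The "Immutable Models" and "Model Neutral" principles can be illustrated with a store that only ever appends new versions and treats the model as opaque bytes. This is a hedged, in-memory sketch under those assumptions; `ImmutableModelStore`, its methods, and the metadata keys are hypothetical and do not describe the actual MLMS API.

```python
import hashlib
from types import MappingProxyType


class ImmutableModelStore:
    """Append-only store: saving never replaces an existing version."""

    def __init__(self):
        self._records = {}  # (name, version) -> record

    def latest_version(self, name):
        """Highest saved version for `name`, or 0 if none exists."""
        return max((v for n, v in self._records if n == name), default=0)

    def save(self, name, payload, metadata):
        """Store `payload` (opaque bytes) as a brand-new version."""
        version = self.latest_version(name) + 1
        self._records[(name, version)] = {
            "payload": bytes(payload),
            "sha256": hashlib.sha256(payload).hexdigest(),
            # Read-only view so callers cannot edit metadata after the fact.
            "metadata": MappingProxyType(dict(metadata)),
        }
        return version

    def load(self, name, version):
        record = self._records[(name, version)]
        return record["payload"], record["metadata"]


store = ImmutableModelStore()
v1 = store.save("linear_demand_model", b"weights-a", {"city_id": 1})
v2 = store.save("linear_demand_model", b"weights-b", {"city_id": 1})
print(v1, v2, store.load("linear_demand_model", v1)[0])
```

Retraining produces version 2 alongside version 1, so an earlier model stays available for comparison and a production model is never silently replaced.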
SLIDE 51
MLMS Architecture
SLIDE 52
MLMS Architecture
SLIDE 53
MLMS Architecture
SLIDE 54
MLMS Architecture
SLIDE 55
MLMS Architecture
SLIDE 56
MLMS Architecture
SLIDE 57
MLMS Architecture
SLIDE 58
Machine Learning Model Lifecycle MLMS
SLIDE 59
Data Science and Engineering Work Flow
SLIDE 60
Data Scientists And Engineers Work In Lockstep
SLIDE 61
Engineers Are Blocked Before Modeling Is Done
SLIDE 62
Time For Productization Is Often Squeezed
SLIDE 63
Rolling Out To All Cities Is Slow And Painful
SLIDE 64 Analysis of Bottlenecks
- Model Exploration (DS, Python)
- Model Training and Serving Implementation (DS/Eng, Python/Go/Java)
- Model Serving Production (Eng, Go/Java)
SLIDE 65 Analysis of Bottlenecks
- Model Exploration (DS, Python)
- Model Training and Serving Implementation (DS/Eng, Python/Go/Java)
- Model Serving Production (Eng, Go/Java)
Bottleneck: Restricted Models
SLIDE 66 Analysis of Bottlenecks
- Model Exploration (DS, Python)
- Model Training and Serving Implementation (DS/Eng, Python/Go/Java)
- Model Serving Production (Eng, Go/Java)
Bottlenecks: DS → Eng Knowledge Transfer; Reimplementing Model
SLIDE 67 Analysis of Bottlenecks
- Model Exploration (DS, Python)
- Model Training and Serving Implementation (DS/Eng, Python/Go/Java)
- Model Serving Production (Eng, Go/Java)
Bottleneck: DS/Eng Model Parity
SLIDE 68 Analysis of Bottlenecks
- Model Exploration (DS, Python)
- Model Training and Serving Implementation (DS/Eng, Python/Go/Java)
- Model Serving Production (Eng, Go/Java)
Bottleneck: DS/Eng Performance Debug
SLIDE 69
Key Insight: Can We All Enjoy One ML Ecosystem?
SLIDE 70 Unified Framework → Many Benefits
- Standardized project structure
- Out-of-box support of local and remote deployment
- Reusable algorithms and framework
- Design review between engineer and DS
- Code review between engineer and DS
- Who codes, who debugs
SLIDE 71
SLIDE 72
SLIDE 73
SLIDE 74
SLIDE 75 TensorFlow
- Model Exploration (DS, Python)
- Model Training and Serving Implementation (DS/Eng, Python/Java)
- Model Serving Production (Eng, Java)
Bottlenecks: Restricted Models; DS → Eng Knowledge Transfer; DS/Eng Model Parity; Eng Model Performance Debug; Reimplementing Model
- Dev (Python)
- Train (Python)
- Serve (Python/Java)
- TensorFlow Graph (C++)
- Client Runtime
SLIDE 76 Enable DS to Write Production-Ready Code
  ○ Efficient core
  ○ DS-friendly API
- Engineers focusing on optimization and automation
  ○ Parallelization of algorithms
  ○ End-to-end automation
  ○ Visualization
  ○ Integration
  ○ Project scaffolding
SLIDE 77
Example
- Build your own FTRL
- Use a framework
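To give a sense of what "build your own" involves, here is a compact sketch of the per-coordinate FTRL-Proximal update for logistic regression, as it is commonly described in the literature. It assumes "FTRL" on the slide means Follow-The-Regularized-Leader; the class name, defaults, and toy data are illustrative assumptions, not the speakers' code.

```python
import math


class FTRLProximal:
    """Logistic regression trained with per-coordinate FTRL-Proximal."""

    def __init__(self, dim, alpha=0.1, beta=1.0, l1=0.0, l2=0.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = [0.0] * dim  # accumulated adjusted gradients
        self.n = [0.0] * dim  # accumulated squared gradients

    def _weight(self, i):
        """Closed-form weight for coordinate i; L1 clips small z to zero."""
        z = self.z[i]
        if abs(z) <= self.l1:
            return 0.0
        sign = 1.0 if z > 0 else -1.0
        denom = (self.beta + math.sqrt(self.n[i])) / self.alpha + self.l2
        return -(z - sign * self.l1) / denom

    def weights(self):
        return [self._weight(i) for i in range(len(self.z))]

    def predict(self, x):
        """x is a sparse example: {feature_index: value}."""
        score = sum(self._weight(i) * v for i, v in x.items())
        score = max(min(score, 35.0), -35.0)  # keep exp() in a safe range
        return 1.0 / (1.0 + math.exp(-score))

    def update(self, x, y):
        """One online step on example (x, y) with y in {0, 1}."""
        p = self.predict(x)
        for i, v in x.items():
            g = (p - y) * v                     # gradient of log loss
            w = self._weight(i)                 # weight before this update
            sigma = (math.sqrt(self.n[i] + g * g) - math.sqrt(self.n[i])) / self.alpha
            self.z[i] += g - sigma * w
            self.n[i] += g * g
        return p


# Toy data: feature 0 is a bias term, feature 1 determines the label.
data = [({0: 1.0, 1: 1.0}, 1), ({0: 1.0, 1: -1.0}, 0)]
model = FTRLProximal(dim=2)
for _ in range(50):
    for x, y in data:
        model.update(x, y)
print(model.weights())
```

Even this small version leaves out hashing, parallelism, and serving concerns, which is part of the argument for the second option on the slide.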
SLIDE 78 Building Tools
- Model Lifecycle Management System
- Hyperparameter Tuning
- Horovod for Distributed TensorFlow Training
SLIDE 79 Conclusion
- A fully automated MLMS is key to the success of complex ML systems
- A single framework for DS and engineers boosts productivity
- Building great tools is crucial to ML projects
SLIDE 80
Q & A
SLIDE 81
SLIDE 82
How do we make the forecasts?
SLIDE 83 Batch forecasting (2015)
[Diagram labels: Batch Forecast; Data Sources; Forecasts; (ARIMA, FFT)]
SLIDE 84 Batch forecasting + Real-time Adjustment
[Diagram labels: Batch Forecast; Data Sources; Forecasts; (ARIMA, FFT); Realtime Adjust & Serve; Consumer; (Exponential Smoothing)]
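The slide appears to pair a batch forecast with an exponential-smoothing step at serving time. One plausible reading, sketched below, is to smooth the recent error of the batch forecast and add it back as a correction. How the adjustment was actually applied is not stated in the deck, so `RealtimeAdjuster`, `alpha`, and the numbers are assumptions.

```python
class RealtimeAdjuster:
    """Correct a batch forecast with an exponentially smoothed error."""

    def __init__(self, alpha=0.5):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.smoothed_error = 0.0

    def observe(self, actual, batch_forecast):
        """Fold the latest forecast error into the running estimate."""
        error = actual - batch_forecast
        self.smoothed_error = (
            self.alpha * error + (1.0 - self.alpha) * self.smoothed_error
        )

    def adjust(self, batch_forecast):
        """Return the batch forecast shifted by the smoothed error."""
        return batch_forecast + self.smoothed_error


adjuster = RealtimeAdjuster(alpha=0.5)
# Hypothetical minutes where actual demand ran 20 above the batch forecast.
adjuster.observe(actual=120, batch_forecast=100)
adjuster.observe(actual=120, batch_forecast=100)
print(adjuster.adjust(100))
```

A larger `alpha` reacts faster to recent errors but passes more noise through, which ties back to the earlier constraint about small, noisy areas.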
SLIDE 85
Issues Observed
- Not many ML libraries for Node.js
- Real-time component (Node.js) cannot support CPU-intensive computation
- Cannot handle large-scale data features in real time
- Cannot share code for batch and online processing
SLIDE 86
Second Generation of Forecasting Engine
(Inspired by DataFlow and TensorFlow)
Some interesting design principles:
- Both real-time and batch prediction: prediction is minute-level; backtesting/evaluation requires batch processing
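The principle of serving and backtesting from the same logic can be sketched as a single forecast function called from both paths. The moving-average model and all function names here are placeholders chosen for brevity, not the engine's actual design.

```python
def forecast_next(history, window=3):
    """Shared model logic: predict the next value from recent history."""
    if not history:
        raise ValueError("history must not be empty")
    recent = history[-window:]
    return sum(recent) / len(recent)


def serve_realtime(history):
    """Real-time path: one prediction for the upcoming minute."""
    return forecast_next(history)


def backtest(series, warmup=3):
    """Batch path: replay history through the same function, return MAE."""
    errors = []
    for t in range(warmup, len(series)):
        predicted = forecast_next(series[:t])
        errors.append(abs(series[t] - predicted))
    if not errors:
        raise ValueError("series must be longer than warmup")
    return sum(errors) / len(errors)


minute_counts = [2, 2, 2, 5]  # hypothetical per-minute observations
print(serve_realtime(minute_counts), backtest(minute_counts))
```

Because both paths call `forecast_next`, an evaluation result describes exactly the code that runs in production, which addresses the code-sharing issue raised on the previous slide.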
SLIDE 87
Machine Learning Model Lifecycle
SLIDE 88 MLMS Architecture
Given model_name=linear_demand_model and city_id=1
When status == 'alerting' and time_sustained > 3 days
Then retrainModel(model_name, city_id, model_version)
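The Given/When/Then rule above maps naturally onto a small data structure plus a check. The sketch below assumes that shape; `ModelHealth`, `RetrainRule`, `evaluate`, and the callback signature are hypothetical, and only the rule's fields and threshold come from the slide.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class ModelHealth:
    model_name: str
    city_id: int
    model_version: int
    status: str
    time_sustained: timedelta  # how long the current status has held


@dataclass(frozen=True)
class RetrainRule:
    model_name: str
    city_id: int
    status: str = "alerting"
    min_sustained: timedelta = timedelta(days=3)

    def matches(self, health):
        # "Given": the rule applies to one model in one city.
        given = (health.model_name == self.model_name
                 and health.city_id == self.city_id)
        # "When": the status has been sustained longer than the threshold.
        when = (health.status == self.status
                and health.time_sustained > self.min_sustained)
        return given and when


def evaluate(rule, health, retrain_model):
    """'Then': call retrain_model if the rule matches; report whether it fired."""
    if rule.matches(health):
        retrain_model(health.model_name, health.city_id, health.model_version)
        return True
    return False


rule = RetrainRule(model_name="linear_demand_model", city_id=1)
health = ModelHealth("linear_demand_model", 1, 7, "alerting", timedelta(days=4))
triggered = []
print(evaluate(rule, health, lambda *args: triggered.append(args)), triggered)
```

Keeping rules as data rather than code is one way to let an orchestrator evaluate many model instances without a deployment per rule change.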
SLIDE 89