Intelligent Services: Serving Machine Learning

Joseph E. Gonzalez
jegonzal@cs.berkeley.edu; Assistant Professor @ UC Berkeley
joseph@dato.com; Co-Founder @ Dato Inc.
Contemporary Learning Systems
Training systems: MLlib, Create, MLC, LIBSVM, VW, Oryx 2, BIDMach
Training
What happens after we train a model?
Dashboards and reports, conference papers, and driving actions.
Suggesting items at checkout, fraud detection, cognitive assistance, the Internet of Things: predictions must be low-latency, personalized, and rapidly changing.
The Life of a Query in an Intelligent Service
User → Web Serving Tier → Intelligent Service

Request: "items like x" → feature lookups (user data, product info) and model lookup (model info) → Top-K query → top items → new page (images, content) → feedback: preferred item.
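The query path above can be sketched as a minimal handler; all names and data here are hypothetical stand-ins, not the actual serving tier:

```python
import heapq

# Hypothetical stores backing the feature and model lookups.
user_features = {"u1": [1.0, 0.0, 2.0]}
item_features = {"i%d" % k: [0.1 * k, 1.0, 0.5] for k in range(100)}
model = {"weights": [0.2, 0.5, 0.3]}          # "model info" lookup

def score(u, v, w):
    # Weighted interaction between user and item features.
    return sum(wi * ui * vi for wi, ui, vi in zip(w, u, v))

def top_k(user_id, k=5):
    u = user_features[user_id]                 # feature lookup
    w = model["weights"]                       # model lookup
    return heapq.nlargest(
        k, item_features, key=lambda i: score(u, item_features[i], w))

items = top_k("u1")  # "Request: items like x" -> top items for the page
```

The feedback ("preferred item") would flow back into training, closing the loop described later in the talk.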
Essential Attributes of Intelligent Services

Responsive: intelligent applications are interactive. Predictions must be computed in < 20 ms for complex models and queries (Top-K rankings, feature lookups such as SELECT * FROM users JOIN items, click_logs, pages WHERE …), under heavy query load and in the presence of system failures.

Adaptive: ML models are out-of-date the moment learning is done.

Manageable: many models created by multiple people.
Experiment: End-to-end Latency in Spark MLlib

Pipeline: HTTP request → feature transformation → evaluate model → encode prediction to JSON → HTTP response. End-to-end latency for digits classification (784-dimension input, served using MLlib and Dato), measured in milliseconds over 1000 queries:

NOP:                        Avg = 5.5,   P99 = 20.6
Single Logistic Regression: Avg = 21.8,  P99 = 38.6
Decision Tree:              Avg = 22.4,  P99 = 63.8
One-vs-all LR (10-class):   Avg = 137.7, P99 = 217.7
100 Tree Random Forest:     Avg = 50.5,  P99 = 73.4
500 Tree Random Forest:     Avg = 172.6, P99 = 268.7
AlexNet CNN (C++):          Avg = 418.7, P99 = 549.8
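A toy harness in the spirit of this experiment, assuming a single logistic regression with random stand-in weights (not a trained model): it times the decode → featurize → evaluate → encode stages and reports average and P99 latency.

```python
import json
import time
import numpy as np

# Hypothetical 784-dimension digits model: a single logistic regression
# (random stand-in weights, not a trained model).
rng = np.random.default_rng(0)
w = rng.normal(size=784)
b = 0.0

def serve(request_json):
    # Decode request, featurize, evaluate model, encode response:
    # the same stages measured in the MLlib experiment above.
    x = np.asarray(json.loads(request_json)["pixels"])
    score = 1.0 / (1.0 + np.exp(-(x @ w + b)))
    return json.dumps({"is_four": bool(score > 0.5)})

# Measure end-to-end latency over 1000 queries.
query = json.dumps({"pixels": rng.normal(size=784).tolist()})
latencies = []
for _ in range(1000):
    start = time.perf_counter()
    serve(query)
    latencies.append((time.perf_counter() - start) * 1000.0)  # ms

print(f"Avg = {np.mean(latencies):.2f} ms, P99 = {np.percentile(latencies, 99):.2f} ms")
```

Even for this trivial model, the tail (P99) typically sits well above the average, which is why the slide reports both.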
Adaptive to Change at All Scales

Rate of change spans months to minutes; granularity of data spans the whole population to a single session (shopping for Mom vs. shopping for me).
Population granularity, changing over months: the law of large numbers → change is slow → rely on efficient offline retraining → high-throughput systems.
Session granularity, changing over minutes (shopping for Mom vs. shopping for me): small data → rapidly changing; low latency → online learning, which is sensitive to feedback bias.
The Feedback Loop
I once looked at cameras on Amazon …
Similar cameras and accessories
Opportunity for Bandit Algorithms!

Systems that can take actions can adversely bias future data. Bandits present new challenges: the exploration / exploitation tradeoff.
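A minimal epsilon-greedy sketch of the exploration / exploitation tradeoff (illustrative only; the talk does not prescribe this particular algorithm, and the click rates below are made up):

```python
import random

# Epsilon-greedy bandit: with probability eps explore a random item,
# otherwise exploit the item with the best observed click-through estimate.
class EpsilonGreedy:
    def __init__(self, n_arms, eps=0.1, seed=0):
        self.eps = eps
        self.counts = [0] * n_arms
        self.values = [0.0] * n_arms
        self.rng = random.Random(seed)

    def select(self):
        if self.rng.random() < self.eps:
            return self.rng.randrange(len(self.counts))          # explore
        return max(range(len(self.counts)),
                   key=lambda a: self.values[a])                 # exploit

    def update(self, arm, reward):
        # Running mean of observed rewards for this arm.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

# Simulate recommendations: arm 2 has the highest true click rate.
true_ctr = [0.05, 0.10, 0.30]
bandit = EpsilonGreedy(n_arms=3, eps=0.1)
rng = random.Random(1)
for _ in range(5000):
    arm = bandit.select()
    bandit.update(arm, 1.0 if rng.random() < true_ctr[arm] else 0.0)
best = max(range(3), key=lambda a: bandit.values[a])
```

Note the bias the slide warns about: a pure exploit policy (eps = 0) could lock onto an early lucky arm and never gather the data to correct itself.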
Management: Collaborative Development

Teams of data scientists working on similar tasks produce "competing" features and models, and complex model dependencies: e.g., a cat photo feeds an animal classifier (isAnimal), a cat classifier (isCat), and a cuteness predictor (Cute!).
Predictive Services UC Berkeley AMPLab
Daniel Crankshaw, Xin Wang, Joseph Gonzalez Peter Bailis, Haoyuan, Zhao Zhang, Michael J. Franklin, Ali Ghodsi, and Michael I. Jordan
Active Research Project
Focuses on the multi-task learning (MTL) domain
[CIDR’15, LearningSys’15]
Multi-task examples: spam classification, content recommendation scoring, and localized anomaly detection. Each user/session gets its own model: session 1 → f1(·), session 2 → f2(·), and so on.
Personalized Models (Multi-task Learning) [CIDR'15, LearningSys'15]

A "separate" model maps input to output for each user/context. Split each model into a shared feature model and a per-user personalization model.
Hybrid Offline + Online Learning

Split the model: update the per-user weights online; update the shared feature functions offline using batch solvers.
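A sketch of the split, under the assumption (consistent with the anytime-prediction formula later in the talk) that the prediction is w_u · f(x; θ): the feature model θ is held fixed as if trained offline, while one user's weights w_u are updated online with SGD. All shapes and functions here are illustrative, not the formulation from [CIDR'15].

```python
import numpy as np

rng = np.random.default_rng(0)
theta = rng.normal(size=(5, 3))          # "offline" feature model (fixed here)

def features(x, theta):
    return np.tanh(theta @ x)            # stand-in feature function f(x; θ)

def online_update(w_u, x, y, lr=0.1):
    # One SGD step on squared loss for a single user's weights.
    f = features(x, theta)
    err = f @ w_u - y
    return w_u - lr * err * f

# Simulate one user whose true preference over the features is w_true.
w_true = np.array([1.0, -0.5, 2.0, 0.0, 0.3])
w_u = np.zeros(5)
for _ in range(500):
    x = rng.normal(size=3)
    y = features(x, theta) @ w_true      # observed feedback
    w_u = online_update(w_u, x, y)
```

Each online step touches only a small per-user weight vector, which is what makes the hybrid scheme so much cheaper than full retraining.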
Hybrid Online + Offline Learning Results

Similar test error to full retraining with substantially faster training; plots compare Hybrid, Offline, and Full retraining across a user preference change.
The split also lets us cache feature evaluation on the input: feature caching across users, anytime feature evaluation, and approximate feature hashing.
Feature Hash Table

For an input x: hash the input, h(x); compute the feature, f(x; θ); store it in the table.

For a new input z ≠ x: hash the new input, h(z). On a collision an exact cache would return the wrong value; instead, use the cached value anyway. This requires a locality-sensitive hash function.
Locality-Sensitive Hashing: x ≈ z ⇒ h(x) = h(z)
Locality-Sensitive Caching: f(x; θ) ≈ f(z; θ) ⇒ h(x) = h(z)
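One standard way to get such an h is random-hyperplane hashing (SimHash); this is an illustrative stand-in, not necessarily the scheme used in the talk. Nearby inputs share a sign pattern with high probability, so a feature cache keyed on h(x) can serve f(z; θ) for z ≈ x.

```python
import numpy as np

rng = np.random.default_rng(0)
planes = rng.normal(size=(16, 784))       # 16 random hyperplanes -> 16-bit key

def h(x):
    # Sign pattern of projections onto the random hyperplanes.
    return tuple((planes @ x > 0).astype(int))

cache = {}

def cached_feature(x, feature_fn):
    key = h(x)
    if key not in cache:
        cache[key] = feature_fn(x)        # compute expensive feature once
    return cache[key]

expensive_f = lambda x: float(np.tanh(x).sum())  # stand-in for f(x; θ)
x = rng.normal(size=784)
z = x + 0.001 * rng.normal(size=784)      # z ≈ x
v1 = cached_feature(x, expensive_f)
v2 = cached_feature(z, expensive_f)       # likely a cache hit: h(z) == h(x)
```

A coarser hash (fewer hyperplanes) raises the hit rate but also the chance of caching across genuinely different inputs, which is exactly the coarsening tradeoff below.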
Anytime feature evaluation: compute features asynchronously; if a particular feature does not arrive in time, use its estimator instead. The system can always render a prediction by the latency deadline:

f1(x; θ) w_u1 + E[f2(x; θ)] w_u2 + f3(x; θ) w_u3
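A tiny sketch of that fallback rule, with hypothetical weights and feature means (the real system would precompute E[f_i(x; θ)] from training data):

```python
import numpy as np

w_u = np.array([0.5, -1.0, 2.0])           # hypothetical user weights
feature_means = np.array([0.1, 0.0, 0.4])  # E[f_i(x; θ)] per feature

def predict_anytime(computed):
    # `computed` maps feature index -> value for features that finished
    # before the deadline; the rest fall back to their expectations.
    f = feature_means.copy()
    for i, v in computed.items():
        f[i] = v
    return float(w_u @ f)

full = predict_anytime({0: 0.2, 1: -0.3, 2: 0.5})   # all features arrived
partial = predict_anytime({0: 0.2, 2: 0.5})          # f2 timed out -> E[f2]
```

Either way a prediction is rendered by the deadline; a timed-out feature only degrades accuracy toward the mean, it never blocks the response.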
[Plot: prediction quality vs. hash coarseness, comparing no coarsening, coarsening + anytime predictions, and overly coarsened hashing (where fi(x; θ) ≈ E[fi(x; θ)]); coarsening + anytime predictions does best, with more features shifting the sweet spot.]
Locality-sensitive caching gives fi(x; θ) ≈ fi(z; θ). Check out our poster!
The stack: Spark Streaming, Spark SQL, GraphX, the ML library, BlinkDB, and MLbase handle training on Spark; Velox (Model Manager + Prediction Service) handles management + serving; storage: HDFS, S3, …, Tachyon; resource management: Mesos.
Predictive Services: a production-ready model serving and management system
- Elastic scaling and load balancing of docker.io containers
- AWS CloudWatch metrics and reporting
- Serves Dato Create models, scikit-learn, and custom Python
- Distributed shared caching: scale out to address latency
- REST management API: Demo?
Key insights: intelligent services must be responsive, adaptive, and manageable, via caching, bandits & management, online/offline learning, and latency vs. accuracy tradeoffs.
Future of Learning Systems
Thank You
jegonzal@cs.berkeley.edu, Assistant Professor @ UC Berkeley
joseph@dato.com, Co-Founder @ Dato