Intelligent Services: Serving Machine Learning - Joseph E. Gonzalez (PowerPoint PPT Presentation)




SLIDE 1

Joseph E. Gonzalez

jegonzal@cs.berkeley.edu; Assistant Professor @ UC Berkeley
joseph@dato.com; Co-Founder @ Dato Inc.

Intelligent Services

Serving Machine Learning

SLIDE 2

Contemporary Learning Systems

Big Training Data → Big Models

SLIDE 3

Contemporary Learning Systems

MLlib

Create

MLC

LIBSVM

VW

Oryx 2

BIDMach

SLIDE 4

Data → Training → Model

What happens after we train a model?

Dashboards and reports, conference papers, drive actions


SLIDE 6

Suggesting items at checkout, fraud detection, cognitive assistance, the Internet of Things: low-latency, personalized, rapidly changing.

SLIDE 7

Data → Train → Model

SLIDE 8

Data → Train → Model → Actions

SLIDE 9

Machine Learning → Intelligent Services

SLIDE 10

The Life of a Query in an Intelligent Service

Web Serving Tier

User

Product Info

Intelligent Service

User Data, Model Info

Request: "Items like x" → Lookup Model → Feature Lookup → Top-K Query → Top Items → Content Request → New Page (Images, …) → Feedback: Preferred Item


SLIDE 11

Essential Attributes of Intelligent Services Responsive

Intelligent applications are interactive

Adaptive

ML models are out-of-date the moment learning is done

Manageable

Many models created by multiple people

SLIDE 12

Responsive: Now and Always

Compute predictions in < 20 ms for complex models, under heavy query load, even with system failures.

Queries → Models → Top-K

Features: SELECT * FROM users JOIN items, click_logs, pages WHERE …

SLIDE 13

Experiment: End-to-end Latency in Spark MLlib

HTTP Req. → To JSON → Feature Trans. → Evaluate Model → Encode Prediction → HTTP Response (predicted digit: "4")

SLIDE 14

Count (out of 1,000 queries); latency measured in milliseconds.

  • NOP: Avg = 5.5, P99 = 20.6
  • Single Logistic Regression: Avg = 21.8, P99 = 38.6
  • Decision Tree: Avg = 22.4, P99 = 63.8
  • One-vs-all LR (10-class): Avg = 137.7, P99 = 217.7
  • 100-Tree Random Forest: Avg = 50.5, P99 = 73.4
  • 500-Tree Random Forest: Avg = 172.6, P99 = 268.7
  • AlexNet CNN: Avg = 418.7, P99 = 549.8

End-to-end latency for digits classification (784-dimension input), served using MLlib and Dato Inc.
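The figures above come from the authors' MLlib/Dato benchmark. As a rough illustration of how such numbers are gathered, here is a minimal, hypothetical harness; the `predict` callable, query shape, and trial count are stand-ins, not the benchmark code:

```python
import random
import time

def measure_latency(predict, queries, trials=1000):
    """Call predict() on random queries; report avg and P99 latency in ms."""
    latencies = []
    for _ in range(trials):
        x = random.choice(queries)
        start = time.perf_counter()
        predict(x)  # end-to-end: transform + evaluate + encode would go here
        latencies.append((time.perf_counter() - start) * 1000.0)
    latencies.sort()
    avg = sum(latencies) / len(latencies)
    p99 = latencies[min(len(latencies) - 1, int(0.99 * len(latencies)))]
    return avg, p99

# Example: a trivial stand-in model over 784-dimensional digit inputs.
queries = [[0.0] * 784 for _ in range(10)]
avg, p99 = measure_latency(lambda x: sum(x), queries, trials=200)
```

Sorting the samples and indexing at the 99th percentile is the same tail-latency measurement the slide's P99 column reports.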

SLIDE 15

Average latency in milliseconds, predicting the digit "4": LR 21.8, Decision Tree 22.4, 10-Class LR 137.7, 100-Tree Random Forest 50.5, 500-Tree Random Forest 172.6, C++ AlexNet 418.7 (baseline 4.3).

SLIDE 16

Adaptive to Change at All Scales

Rate of Change: months → minutes; Granularity of Data: population → session ("Shopping for Mom" vs. "Shopping for Me")

SLIDE 17

Adaptive to Change at All Scales

Population (months):

  • Law of Large Numbers → slow change
  • Rely on efficient offline retraining → high-throughput systems

SLIDE 18

Adaptive to Change at All Scales

Session ("Shopping for Mom" vs. "Shopping for Me"):

  • Small data → rapidly changing
  • Low latency → online learning
  • Sensitive to feedback bias

SLIDE 19

The Feedback Loop

I once looked at cameras on Amazon …

Similar cameras and accessories

Opportunity for Bandit Algorithms

Bandits present new challenges:

  • computation overhead
  • complicates caching + indexing
SLIDE 20

Exploration / Exploitation Tradeoff

Systems that can take actions can adversely bias future data.

Opportunity for Bandits!

Bandits present new challenges:

  • Complicates caching + indexing
  • tuning + counterfactual reasoning
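One of the simplest bandit strategies is epsilon-greedy; the sketch below shows the explore/exploit tradeoff the slides refer to (the class, item set, and parameter names are illustrative, not part of the system described):

```python
import random
from collections import defaultdict

class EpsilonGreedy:
    """Minimal epsilon-greedy bandit over a set of items (arms)."""
    def __init__(self, items, epsilon=0.1):
        self.items = list(items)
        self.epsilon = epsilon
        self.counts = defaultdict(int)     # times each item was shown
        self.rewards = defaultdict(float)  # cumulative click/purchase feedback

    def select(self):
        # Explore with probability epsilon, otherwise exploit the best estimate;
        # unseen items get +inf so each arm is tried at least once.
        if random.random() < self.epsilon:
            return random.choice(self.items)
        return max(self.items, key=lambda i:
                   self.rewards[i] / self.counts[i] if self.counts[i] else float("inf"))

    def update(self, item, reward):
        self.counts[item] += 1
        self.rewards[item] += reward
```

Even this toy version exhibits the challenges listed above: `select` must run on the serving path (extra computation), and because the chosen item depends on mutable state, responses cannot simply be cached or pre-indexed.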
SLIDE 21

Management: Collaborative Development

Teams of data scientists working on similar tasks:

  • "Competing" features and models
  • Complex model dependencies:

Cat Photo → Animal Classifier → isAnimal → Cat Classifier → isCat → Cuteness Predictor → Cute!

SLIDE 22

Predictive Services UC Berkeley AMPLab

Daniel Crankshaw, Xin Wang, Joseph Gonzalez Peter Bailis, Haoyuan, Zhao Zhang, Michael J. Franklin, Ali Ghodsi, and Michael I. Jordan

SLIDE 23

Predictive Services UC Berkeley AMPLab

Daniel Crankshaw, Xin Wang, Joseph Gonzalez Peter Bailis, Haoyuan, Zhao Zhang, Michael J. Franklin, Ali Ghodsi, and Michael I. Jordan

Active Research Project

SLIDE 24

Velox Model Serving System

Focuses on the multi-task learning (MTL) domain

[CIDR’15, LearningSys’15]

Spam Classification: f1, f2

Content Rec. Scoring: Session 1 → f1; Session 2 → f2

Localized Anomaly Detection: f1, f2

SLIDE 25

Velox Model Serving System

Personalized Models (Multi-task Learning)

[CIDR’15, LearningSys’15]

Input Output

“Separate” model for each user/context.

SLIDE 26

Personalized Models (Multi-task Learning)

Split

Personalization Model Feature Model

Velox Model Serving System

[CIDR’15, LearningSys’15]

SLIDE 27

Hybrid Offline + Online Learning

Split

Personalization Model Feature Model

Update the user weights online:

  • Simple to train + more robust model
  • Address rapidly changing user statistics

Update feature functions offline using batch solvers

  • Leverage high-throughput systems (Apache Spark)
  • Exploit slow change in population statistics

Prediction: f(x; θ)ᵀ w_u
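A minimal sketch of the online half of this split, assuming squared loss and treating the offline-trained feature vector f(x; θ) as fixed (the function and variable names are illustrative):

```python
import numpy as np

def online_user_update(w_u, features, y, lr=0.1):
    """One SGD step on the user weights w_u for squared loss,
    holding the (offline-trained) feature model f(x; theta) fixed."""
    pred = features @ w_u
    grad = (pred - y) * features  # gradient of 0.5*(pred - y)^2 w.r.t. w_u
    return w_u - lr * grad

# Usage: repeated feedback drives the user weights toward the target rating.
w = np.zeros(3)
phi = np.array([1.0, 0.5, -0.5])  # f(x; theta) for some fixed input x
for _ in range(200):
    w = online_user_update(w, phi, y=1.0)
```

Because only the small per-user vector w_u changes online, each update is a cheap dot product and vector subtraction, while the expensive feature model is refreshed offline in batch.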

SLIDE 28

Hybrid Online + Offline Learning Results

Similar test error, substantially faster training, and fast recovery after user preference change (comparing Hybrid, Offline, and Full retraining).

SLIDE 29

Evaluating the Model

Split

Input → Feature Evaluation → Cache

SLIDE 30

Evaluating the Model

Split

Input → Feature Evaluation → Cache

  • Feature caching across users
  • Anytime feature evaluation
  • Approximate feature hashing

SLIDE 31

Feature Caching

New input x → hash input: h(x) → Feature Hash Table → on a miss, compute the feature f(x; θ)
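In Python terms, exact feature caching is just memoization keyed by a hash of the input; this hypothetical wrapper (the names are illustrative) is the h(x) lookup the slide depicts:

```python
# Cache of computed features, shared across users: the per-user weights w_u
# live outside the cache, so one cached entry serves many users.
feature_cache = {}

def cached_features(x, feature_fn):
    """Return f(x; theta), computing it only on a cache miss."""
    key = hash(tuple(x))  # h(x): exact hash of the input
    if key not in feature_cache:
        feature_cache[key] = feature_fn(x)  # expensive feature evaluation
    return feature_cache[key]
```

With an exact hash, only byte-identical inputs hit the cache; the next slides relax this with locality-sensitive hashing.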

SLIDE 32

LSH Cache Coarsening

New input z ≠ x → hash new input: h(z) → Feature Hash Table returns f(x; θ): using the wrong value! → motivates an LSH hash fn.

SLIDE 33

LSH Cache Coarsening

Hash new input: h(z) → Feature Hash Table → use the cached value f(x; θ) anyway! → requires LSH

Locality-Sensitive Hashing: x ≈ z ⇒ h(x) = h(z)

Locality-Sensitive Caching: f(x; θ) ≈ f(z; θ) ⇒ h(x) = h(z)
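A sketch of locality-sensitive caching using random-hyperplane (SimHash-style) keys; the input dimension, number of hyperplanes, and all names here are illustrative assumptions, not the system's actual hash:

```python
import numpy as np

rng = np.random.default_rng(0)
planes = rng.normal(size=(8, 4))  # 8 random hyperplanes over 4-d inputs

def lsh_key(x):
    """SimHash: the sign pattern of x against random hyperplanes.
    Nearby inputs are likely to share a key: x ~ z => h(x) == h(z)."""
    return tuple(bool(b) for b in (planes @ np.asarray(x)) > 0)

lsh_cache = {}

def coarsened_features(x, feature_fn):
    key = lsh_key(x)
    if key not in lsh_cache:
        lsh_cache[key] = feature_fn(x)
    # May return f(z; theta) computed for a nearby z instead of f(x; theta):
    # that is the deliberate approximation in locality-sensitive caching.
    return lsh_cache[key]
```

Fewer hyperplanes give a coarser hash (more cache hits, more approximation error); more hyperplanes approach exact caching, which is the tradeoff the later slides tune.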

SLIDE 34

Anytime Predictions

Compute features asynchronously: if a particular feature value does not arrive in time, use an estimator (its expected value) instead.

Always able to render a prediction by the latency deadline

f1(x; θ) wu1 + E [f2(x; θ)] wu2 + f3(x; θ) wu3

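The expression above can be sketched directly: any feature that missed the deadline is replaced by its precomputed expectation (the dict/list parameter names are illustrative):

```python
# Anytime prediction: always render a prediction by the latency deadline,
# substituting a precomputed expectation E[f_i(x; theta)] for any feature
# whose asynchronous evaluation has not finished in time.

def anytime_predict(arrived, expected_features, user_weights):
    """arrived: dict {i: f_i(x; theta)} for features that finished in time.
    expected_features: precomputed E[f_i(x; theta)] fallback per feature."""
    total = 0.0
    for i, w in enumerate(user_weights):
        value = arrived.get(i, expected_features[i])  # fall back to E[f_i]
        total += value * w
    return total
```

The prediction degrades gracefully: the more features arrive before the deadline, the closer the sum is to the exact score f(x; θ)ᵀ w_u.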

SLIDE 35

Coarsening + anytime predictions beats both extremes: no coarsening (more features to evaluate) and an overly coarsened cache.

  • Coarser hash: fi(x; θ) ≈ fi(z; θ)
  • Approximate expectation: fi(x; θ) ≈ E[fi(x; θ)]

Check out our poster!

SLIDE 36

Part of the Berkeley Data Analytics Stack:

  • Training: Spark (Spark Streaming, Spark SQL, GraphX, MLlib, BlinkDB, MLbase)
  • Management + Serving: Velox (Model Manager, Prediction Service)
  • Storage: HDFS, S3, …, Tachyon
  • Resource management: Mesos

SLIDE 37

Predictive Services UC Berkeley AMPLab

Daniel Crankshaw, Xin Wang, Peter Bailis, Haoyuan, Zhao Zhang, Michael J. Franklin, Ali Ghodsi, and Michael I. Jordan

SLIDE 38

Dato Predictive Services

  • Elastic scaling and load balancing of docker.io containers
  • AWS CloudWatch metrics and reporting
  • Serves Dato Create models, scikit-learn, and custom Python
  • Distributed shared caching: scale out to address latency
  • REST management API: Demo?

Production-ready model serving and management system

SLIDE 39

Predictive Services UC Berkeley AMPLab

Daniel Crankshaw, Xin Wang, Joseph Gonzalez Peter Bailis, Haoyuan, Zhao Zhang, Michael J. Franklin, Ali Ghodsi, and Michael I. Jordan

Key insights:

  • Responsive: latency vs. accuracy
  • Adaptive: online/offline learning
  • Manageable: caching, bandits, & management

SLIDE 40

Data → Train → Model → Actions

Future of Learning Systems

SLIDE 41

Thank You

Joseph E. Gonzalez

jegonzal@cs.berkeley.edu, Assistant Professor @ UC Berkeley

joseph@dato.com, Co-Founder @ Dato