Putting Deep Learning Models in Production
Sahil Dua (@sahildua2305)


SLIDE 1

Putting Deep Learning Models in Production Sahil Dua

SLIDE 2

Let’s imagine!

SLIDE 3

But ...

SLIDE 4

whoami

➔ Software Developer @ Booking.com
➔ Previously - Deep Learning Infrastructure
➔ Open Source Contributor (Git, Pandas, Kinto, go-github, etc.)
➔ Tech Speaker

SLIDE 5

Agenda

➔ Deep Learning at Booking.com
➔ Life-cycle of a model
➔ Training Models
➔ Serving Predictions

SLIDE 6

Deep Learning at Booking.com

SLIDE 7

Scale highlights

➔ 1.4 million+ active properties in 220+ countries
➔ 1,500,000+ room nights booked every 24 hours

SLIDE 8

Deep Learning

➔ Image understanding
➔ Translations
➔ Ads bidding
➔ ...

SLIDE 9

Image Tagging

SLIDE 10

Image Tagging

SLIDE 11

Image Tagging

Sea view: 6.38
Balcony/Terrace: 4.82
Photo of the whole room: 4.21
Bed: 3.47
Decorative details: 3.15
Seating area: 2.70

SLIDE 12

SLIDE 13

Image Tagging

Using the image tag information in the right context: Swimming pool, Breakfast Buffet, etc.

SLIDE 14

Lifecycle of a model

SLIDE 15

Lifecycle of a model

[Diagram: Data Analysis -> Train -> Deploy]

SLIDE 16

Training a Model - on laptop

SLIDE 17

Training a Model - on laptop

SLIDE 18

Machine Learning workload

➔ Computationally intensive workload
➔ Often not highly parallelizable algorithms
➔ 10 to 100 GBs of data

SLIDE 19

Why Kubernetes (k8s)?

➔ Isolation
➔ Elasticity
➔ Flexibility

SLIDE 20

Why k8s – GPUs?

➔ GPU support in alpha since Kubernetes 1.3
➔ Speedup of 20x-50x

resources:
  limits:
    alpha.kubernetes.io/nvidia-gpu: 1
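
As a rough illustration of where that `resources.limits` fragment sits, here is a minimal sketch that builds a training Pod manifest as a plain Python dictionary. The resource name is the one shown on the slide. The function name, pod name, container name and image are hypothetical placeholders, not details from the talk.

```python
import json

# Resource name exactly as shown on the slide (alpha-era GPU support).
GPU_RESOURCE = "alpha.kubernetes.io/nvidia-gpu"


def training_pod_manifest(name, image, gpus=1):
    """Return a Pod manifest (as a dict) whose container is limited to `gpus` GPUs.

    `name` and `image` are supplied by the caller; nothing here is taken
    from the talk except the GPU resource key.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",  # assumption: a training run is not restarted
            "containers": [
                {
                    "name": "trainer",  # hypothetical container name
                    "image": image,
                    "resources": {"limits": {GPU_RESOURCE: gpus}},
                }
            ],
        },
    }


def manifest_as_json(manifest):
    """Serialize the manifest so it could be saved to a file or inspected."""
    return json.dumps(manifest, indent=2)
```

Calling `manifest_as_json(training_pod_manifest("train-demo", "example.registry/ml-base:latest"))` yields a JSON document with the GPU limit nested under the single container.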

SLIDE 21

Training with k8s

➔ Base images with ML frameworks
  ◆ TensorFlow, Torch, Vowpal Wabbit, etc.
➔ Training code is installed at start time
➔ Data access - Hadoop (or PVs)

SLIDE 22

Startup

[Diagram: Code (start.sh, train.py, evaluate.py, ..) -> Training pod]

SLIDE 23

Startup

[Diagram: Data -> Training pod (start.sh, train.py, evaluate.py, .., PV)]

SLIDE 24

Streaming logs back

[Diagram: Training pod (start.sh, train.py, evaluate.py, .., PV) -> Logs]

SLIDE 25

Exports the model

[Diagram: Training pod (start.sh, train.py, evaluate.py, .., PV) -> model]

SLIDE 26

Serving predictions

SLIDE 27

Serving Predictions

[Diagram: Client sends Input Features to the Model and receives a Prediction]

SLIDE 28

Serving Predictions

[Diagram: Model 1 ... Model X, each with its own Client sending Input Features and receiving a Prediction]

SLIDE 29

Serving Predictions

[Diagram: Model 1 ... Model X, each with its own Client sending Input Features and receiving a Prediction]

SLIDE 30

Serving Predictions

➔ Stateless app with common code
➔ Containerized
➔ No model in image
➔ REST API for predictions

SLIDE 31

Serving Predictions

[Diagram: Client sends Input Features to the App (which holds the model) and receives a Prediction]

SLIDE 32

Serving Predictions

➔ Get trained model from Hadoop
➔ Load model in memory
➔ Warm it up
➔ Expose HTTP API
➔ Respond to the probes
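
The startup steps listed on this slide can be sketched with the Python standard library alone. This is an illustrative outline under stated assumptions, not the serving app from the talk: `fetch_model` is a hypothetical callable standing in for "get the trained model from Hadoop and load it", and the `/healthz`, `/ready` and `/predict` paths and the JSON shapes are invented for the example.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class PredictionApp:
    """Holds one model in memory and answers prediction and probe requests."""

    def __init__(self, fetch_model):
        # Hypothetical hook: returns a callable model (features -> prediction).
        self._fetch_model = fetch_model
        self.model = None
        self.ready = False

    def start(self, warmup_input):
        self.model = self._fetch_model()  # get the trained model and load it in memory
        self.model(warmup_input)          # warm it up with one throwaway prediction
        self.ready = True                 # from now on the readiness probe succeeds

    def handle(self, method, path, body=b""):
        """Return (status_code, response_dict) for one request."""
        if method == "GET" and path == "/healthz":   # liveness probe
            return 200, {"status": "alive"}
        if method == "GET" and path == "/ready":     # readiness probe
            if self.ready:
                return 200, {"status": "ready"}
            return 503, {"status": "loading"}
        if method == "POST" and path == "/predict":
            if not self.ready:
                return 503, {"error": "model not loaded"}
            features = json.loads(body)["features"]
            return 200, {"prediction": self.model(features)}
        return 404, {"error": "not found"}


def make_handler(app):
    """Wrap a PredictionApp in an http.server request handler class."""

    class Handler(BaseHTTPRequestHandler):
        def _reply(self, status, payload):
            data = json.dumps(payload).encode()
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(data)))
            self.end_headers()
            self.wfile.write(data)

        def do_GET(self):
            self._reply(*app.handle("GET", self.path))

        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            self._reply(*app.handle("POST", self.path, self.rfile.read(length)))

    return Handler


def serve(app, port=8080):
    """Expose the HTTP API; blocks until the process is stopped."""
    HTTPServer(("", port), make_handler(app)).serve_forever()
```

With a toy model such as `lambda xs: sum(xs)`, calling `app.start([0.0])` and then `serve(app)` would answer `POST /predict` with the sum of the submitted features. Keeping the request logic in `handle` makes it testable without opening a socket.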

SLIDE 33

Serving Predictions

[Diagram: Client sends Input Features and receives a Prediction]

SLIDE 34

Serving Predictions

[Diagram: two Clients, each sending Input Features and receiving a Prediction]

SLIDE 35

Deploying a new model

➔ Create new Deployment
➔ Create new HTTP Route
➔ Wait for liveness/readiness probe
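
One way to picture the three steps is a small orchestration function that creates the resources and then polls a readiness check. The sketch below is hypothetical throughout: `create_deployment`, `create_route` and `probe` are caller-supplied callables, so no Kubernetes client API is assumed, and the timeout and interval values are arbitrary.

```python
import time


def wait_until_ready(probe, timeout_s=60.0, interval_s=1.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll `probe()` until it returns True or `timeout_s` has elapsed.

    Returns True if the probe succeeded, False on timeout.
    `sleep` and `clock` are injectable so the loop can be tested quickly.
    """
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


def deploy_model(create_deployment, create_route, probe, **wait_kwargs):
    """Run the steps in the order listed on the slide."""
    create_deployment()   # step 1: create new Deployment
    create_route()        # step 2: create new HTTP Route
    return wait_until_ready(probe, **wait_kwargs)  # step 3: wait for the probe
```

A caller would treat a `False` result as a failed rollout and leave the previous model in place.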

SLIDE 36

Performance

PredictionTime = RequestOverhead + N*ComputationTime

N is the number of instances to predict on
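
To make the formula concrete, the following sketch plugs in made-up numbers (5 ms of request overhead, 2 ms of computation per instance). The values are illustrative only and do not come from the talk.

```python
def prediction_time_ms(request_overhead_ms, computation_time_ms, n_instances):
    """PredictionTime = RequestOverhead + N * ComputationTime (all in ms)."""
    return request_overhead_ms + n_instances * computation_time_ms


# Illustrative numbers, not measurements.
OVERHEAD_MS = 5.0
COMPUTE_MS = 2.0

single = prediction_time_ms(OVERHEAD_MS, COMPUTE_MS, 1)    # one instance per request
batch = prediction_time_ms(OVERHEAD_MS, COMPUTE_MS, 100)   # 100 instances per request
per_instance_in_batch = batch / 100                        # overhead amortized over the batch
```

With these numbers a single-instance request takes 7 ms, while a 100-instance request takes 205 ms in total, or 2.05 ms per instance. A small N keeps each response fast; a large N spreads the fixed overhead over more predictions.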

SLIDE 37

Optimizing for Latency

➔ Do not predict if you can precompute
➔ Reduce Request Overhead
➔ Predict for one instance
➔ Quantization (32-bit float => 8-bit fixed point)
➔ TensorFlow specific: freeze network & optimize for inference
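
As an illustration of the quantization bullet, here is a minimal sketch of linear (min/max) quantization of a list of weights to 8-bit integer codes. This is one common scheme chosen for the example. The slide does not say which scheme was used, and the function names are hypothetical.

```python
def quantize(values, num_bits=8):
    """Map floats to integer codes in [0, 2**num_bits - 1] using a linear scale.

    Returns (codes, scale, offset) so the values can be approximately restored.
    """
    lo, hi = min(values), max(values)
    levels = (1 << num_bits) - 1                      # 255 for 8 bits
    scale = (hi - lo) / levels if hi > lo else 1.0    # avoid division by zero
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo


def dequantize(codes, scale, offset):
    """Approximate inverse of `quantize`."""
    return [offset + c * scale for c in codes]
```

Each restored value differs from the original by at most half a quantization step (`scale / 2`), which is the precision traded for smaller weights.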

SLIDE 38

Optimizing for Throughput

➔ Do not predict if you can precompute
➔ Batch requests
➔ Parallelize requests
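
The batching idea can be sketched as follows: instead of one model call per instance, instances are grouped so that each call handles several at once. The helper names and the assumption that the model accepts a list of instances are illustrative, not taken from the talk.

```python
def batched(items, batch_size):
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def predict_all(model, instances, batch_size):
    """Run `model` once per batch; return (predictions, number_of_model_calls).

    `model` is assumed to take a list of instances and return one
    prediction per instance, in the same order.
    """
    predictions = []
    calls = 0
    for batch in batched(instances, batch_size):
        predictions.extend(model(batch))
        calls += 1
    return predictions, calls
```

For ten instances and a batch size of four, the model is called three times rather than ten, so any fixed per-call overhead is paid three times.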

SLIDE 39

Summary

➔ Training models in pods
➔ Serving models
➔ Optimizing serving for latency/throughput

SLIDE 40

Next steps

➔ Tooling to control hundreds of deployments
➔ Autoscale prediction service
➔ Hyperparameter tuning for training

SLIDE 41

Want to get in touch?

LinkedIn / Twitter / GitHub: @sahildua2305
Website: www.sahildua.com

SLIDE 42

THANK YOU

@sahildua2305