Putting Deep Learning Models in Production - Sahil Dua (@sahildua2305)


  1. Putting Deep Learning Models in Production - Sahil Dua @sahildua2305

  2. Let's imagine!

  3. But ...

  4. whoami ➔ Software Developer @ Booking.com ➔ Previously - Deep Learning Infrastructure ➔ Open Source Contributor (Git, Pandas, Kinto, go-github, etc.) ➔ Tech Speaker

  5. Agenda ➔ Deep Learning at Booking.com ➔ Life-cycle of a model ➔ Training Models ➔ Serving Predictions

  6. Deep Learning at Booking.com

  7. Scale highlights ➔ 1,500,000+ room nights booked every 24 hours ➔ 1.4 million+ active properties in 220+ countries

  8. Deep Learning ➔ Image understanding ➔ Translations ➔ Ads bidding ➔ ...

  9. Image Tagging

  10. Image Tagging

  11. Image Tagging ➔ Sea view: 6.38 ➔ Balcony/Terrace: 4.82 ➔ Photo of the whole room: 4.21 ➔ Bed: 3.47 ➔ Decorative details: 3.15 ➔ Seating area: 2.70

  12. (image slide)

  13. Image Tagging ➔ Using the image tag information in the right context: Swimming pool, Breakfast Buffet, etc.

  14. Lifecycle of a model

  15. Lifecycle of a model ➔ Data → Train → Deploy → Analysis

  16. Training a Model - on laptop

  17. Training a Model - on laptop

  18. Machine Learning workload ➔ Computationally intensive workload ➔ Often not highly parallelizable algorithms ➔ 10 to 100 GBs of data

  19. Why Kubernetes (k8s)? ➔ Isolation ➔ Elasticity ➔ Flexibility

  20. Why k8s - GPUs? ➔ In alpha since 1.3 ➔ 20x-50x speed-up

      resources:
        limits:
          alpha.kubernetes.io/nvidia-gpu: 1

  21. Training with k8s ➔ Base images with ML frameworks ◆ TensorFlow, Torch, Vowpal Wabbit, etc. ➔ Training code is installed at start time ➔ Data access - Hadoop (or PVs)

  22. Startup ➔ The training pod pulls the training code at start time (start.sh, train.py, evaluate.py)

  23. Startup ➔ The training pod mounts the data (PV)

  24. Streaming logs back ➔ The training pod streams its logs out while it runs

  25. Exports the model ➔ The training pod writes the trained model out (to the PV)
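The startup flow on slides 22-25 (pull code, read data from a mounted volume, stream logs, export the model) can be sketched as a minimal train.py. The paths, file format, and the averaging "training" step are hypothetical stand-ins, not the actual Booking.com code:

```python
import json
import os

def train(samples):
    """Hypothetical 'training': average the targets (stand-in for a real fit)."""
    mean = sum(y for _, y in samples) / len(samples)
    return {"bias": mean}

def run(data_dir, model_dir):
    # Data arrives on a mounted PersistentVolume (this path is an assumption).
    with open(os.path.join(data_dir, "train.json")) as f:
        samples = json.load(f)
    print(f"loaded {len(samples)} samples")  # stdout is streamed back as pod logs
    model = train(samples)
    os.makedirs(model_dir, exist_ok=True)
    # Export the model artifact for the serving app to pick up later.
    with open(os.path.join(model_dir, "model.json"), "w") as f:
        json.dump(model, f)
    return model
```

The key property is that the container image stays generic: everything specific to one model arrives at start time (code) or via the volume (data), matching the "no custom image per model" setup described above.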

  26. Serving predictions

  27. Serving Predictions ➔ Client → Input Features → Model → Prediction

  28. Serving Predictions ➔ Client → Input Features → Model 1 → Prediction ➔ Client → Input Features → Model X → Prediction

  29. Serving Predictions ➔ Client → Input Features → Model 1 → Prediction ➔ Client → Input Features → Model X → Prediction

  30. Serving Predictions ➔ Stateless app with common code ➔ Containerized ➔ No model in the image ➔ REST API for predictions

  31. Serving Predictions ➔ Client → Input Features → App (model) → Prediction

  32. Serving Predictions ➔ Get the trained model from Hadoop ➔ Load the model in memory ➔ Warm it up ➔ Expose the HTTP API ➔ Respond to the probes
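The steps on slide 32 can be sketched as a stdlib-only serving app. The model format, endpoint paths, and the toy predict function are illustrative assumptions; a real deployment would fetch the artifact from Hadoop and serve a real framework model:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

MODEL = None

def load_model(path="model.json"):
    """Load the exported model artifact into memory (format is hypothetical)."""
    global MODEL
    with open(path) as f:
        MODEL = json.load(f)

def predict(features):
    # Stand-in for real inference: bias plus the sum of the inputs.
    return MODEL["bias"] + sum(features)

def warm_up():
    # One dummy prediction so any lazy initialization happens before real traffic.
    predict([0.0])

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Liveness/readiness probe endpoint for Kubernetes.
        if self.path == "/healthz":
            self._reply(200, {"status": "ok"})

    def do_POST(self):
        if self.path == "/predict":
            body = self.rfile.read(int(self.headers["Content-Length"]))
            features = json.loads(body)["features"]
            self._reply(200, {"prediction": predict(features)})

    def _reply(self, code, payload):
        self.send_response(code)
        self.end_headers()
        self.wfile.write(json.dumps(payload).encode())

if __name__ == "__main__":
    load_model()
    warm_up()
    HTTPServer(("", 8080), Handler).serve_forever()
```

Because the model is loaded at startup rather than baked into the image (slide 30), the same container can serve any exported model, and the probe endpoint only reports ready once loading and warm-up have finished.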

  33. Serving Predictions ➔ Client → Input Features → Prediction

  34. Serving Predictions ➔ Client → Input Features → Prediction (multiple clients)

  35. Deploying a new model ➔ Create a new Deployment ➔ Create a new HTTP Route ➔ Wait for the liveness/readiness probes

  36. Performance ➔ PredictionTime = RequestOverhead + N * ComputationTime, where N is the number of instances to predict on
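The formula above makes the per-request overhead concrete. The numbers below are illustrative, not measured:

```python
def prediction_time(overhead_ms, per_instance_ms, n):
    # PredictionTime = RequestOverhead + N * ComputationTime
    return overhead_ms + n * per_instance_ms

# One request carrying 100 instances vs. 100 single-instance requests,
# with a hypothetical 5 ms overhead and 1 ms of computation per instance:
batched = prediction_time(5.0, 1.0, 100)         # 105 ms total
one_by_one = 100 * prediction_time(5.0, 1.0, 1)  # 600 ms total
```

The overhead term is paid once per request, which is why the latency and throughput slides that follow pull in opposite directions: small N keeps each request fast, large N amortizes the overhead.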

  37. Optimizing for Latency ➔ Do not predict if you can precompute ➔ Reduce request overhead ➔ Predict for one instance at a time ➔ Quantization (float32 => 8-bit fixed point) ➔ TensorFlow-specific: freeze the network & optimize it for inference
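The quantization idea can be sketched in pure Python: map float weights onto 8-bit integers plus a shared scale factor. This is a simplified symmetric scheme for illustration, not TensorFlow's actual quantization implementation:

```python
def quantize(weights):
    """Map floats to int8 range [-127, 127] with one shared scale factor.
    Assumes at least one nonzero weight."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

w = [0.5, -1.27, 0.0, 1.27]
q, scale = quantize(w)        # each weight now fits in 1 byte instead of 4
approx = dequantize(q, scale)
```

The win is memory traffic and cache footprint (4x smaller weights) at the cost of a small, bounded rounding error per weight.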

  38. Optimizing for Throughput ➔ Do not predict if you can precompute ➔ Batch requests ➔ Parallelize requests
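The "batch" and "parallelize" points combine naturally: group instances into batches so the request overhead is paid once per batch, then issue the batch requests concurrently. predict_batch below is a hypothetical stand-in for one HTTP call to the prediction service:

```python
from concurrent.futures import ThreadPoolExecutor

def predict_batch(batch):
    # Stand-in for a single request to the prediction service;
    # the request overhead is paid once for the whole batch.
    return [x * 2.0 for x in batch]

def predict_all(instances, batch_size=32, workers=4):
    # Split the instances into batches, then run the batch requests in parallel.
    batches = [instances[i:i + batch_size]
               for i in range(0, len(instances), batch_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(predict_batch, batches)
    # pool.map preserves input order, so the flattened output lines up
    # with the original instances.
    return [p for batch in results for p in batch]
```

Threads are appropriate here because the work is network-bound HTTP calls; batch_size trades per-request latency against amortized overhead, as in the performance formula above.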

  39. Summary ➔ Training models in pods ➔ Serving models ➔ Optimizing serving for latency/throughput

  40. Next steps ➔ Tooling to control hundreds of deployments ➔ Autoscale the prediction service ➔ Hyperparameter tuning for training

  41. Want to get in touch? ➔ LinkedIn / Twitter / GitHub: @sahildua2305 ➔ Website: www.sahildua.com

  42. THANK YOU
