
CS 744: CLIPPER Shivaram Venkataraman Fall 2019

  1. CS 744: CLIPPER Shivaram Venkataraman Fall 2019

  2. ADMINISTRIVIA - Assignment 2 grading - Midterm details - Course Project template

  3. MACHINE LEARNING SO FAR

  4. MACHINE LEARNING: INFERENCE

  5. GOALS - Interactive latencies (tail latency < 100ms) - High throughput to handle load - Improved prediction accuracy - Generality (?)

  6. ARCHITECTURE

  7. MODEL CONTAINERS - Run using Docker containers - Can be replicated across machines

  8. MODEL ABSTRACTION LAYER Caching - Improve performance for frequent queries - LRU eviction policy - Important for feedback
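As an illustration of the caching bullet above, here is a minimal sketch of a prediction cache with LRU eviction. The class name `PredictionCache`, the `(model_id, query)` key, and the `capacity` parameter are assumptions made for this example; they are not taken from the slides or from Clipper's code.

```python
from collections import OrderedDict


class PredictionCache:
    """Hypothetical LRU cache mapping (model_id, query) to a prediction."""

    def __init__(self, capacity):
        self.capacity = capacity
        # OrderedDict keeps insertion order; the oldest entry sits at the front.
        self._entries = OrderedDict()

    def get(self, model_id, query):
        key = (model_id, query)
        if key not in self._entries:
            return None  # cache miss: the caller would query the model
        # A hit makes this entry the most recently used one.
        self._entries.move_to_end(key)
        return self._entries[key]

    def put(self, model_id, query, prediction):
        key = (model_id, query)
        self._entries[key] = prediction
        self._entries.move_to_end(key)
        if len(self._entries) > self.capacity:
            # Evict the least recently used entry (front of the dict).
            self._entries.popitem(last=False)


cache = PredictionCache(capacity=2)
cache.put("model-a", "query-1", 0.9)
cache.put("model-a", "query-2", 0.1)
cache.get("model-a", "query-1")       # touch query-1 so query-2 becomes oldest
cache.put("model-a", "query-3", 0.5)  # evicts query-2
```

Frequent queries stay resident because every hit moves the entry to the most-recently-used end.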

  9. BATCHING, QUEUING Goals, Insight - Increase latency (within SLO) for improved throughput - Reduce RPC overheads - GPU / BLAS acceleration Approach - Per container queues. - Maximum batch size. - Why?
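The "per container queues" and "maximum batch size" bullets can be sketched as follows. This is a simplified, single-threaded illustration; `ContainerQueue`, `enqueue`, and `next_batch` are hypothetical names, and a real serving system would add locking and an RPC call to the container.

```python
from collections import deque


class ContainerQueue:
    """Hypothetical queue of pending queries for one model container."""

    def __init__(self, max_batch_size):
        self.max_batch_size = max_batch_size
        self._pending = deque()

    def enqueue(self, query):
        self._pending.append(query)

    def next_batch(self):
        # Take at most max_batch_size queries, oldest first, so that one
        # RPC to the container carries many queries instead of one.
        batch = []
        while self._pending and len(batch) < self.max_batch_size:
            batch.append(self._pending.popleft())
        return batch


queue = ContainerQueue(max_batch_size=2)
for q in range(5):
    queue.enqueue(q)
batches = [queue.next_batch() for _ in range(3)]
```

Capping the batch size bounds how long any single batch takes to evaluate, which is one way to keep the added latency within the SLO.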

  10. ADAPTIVE BATCHING AIMD: Additive Increase, Multiplicative Decrease - Why? Delayed: Wait until batch exists - Why? [Figure: batch size (0 to 5) plotted against time (0 to 10)]
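A minimal sketch of the AIMD idea named on this slide: grow the batch size by a fixed step while batches finish within the latency SLO, and shrink it multiplicatively when one does not. The function names, the `step` and `backoff` defaults, and the linear latency model in the usage example are all assumptions for illustration, not values taken from the slides.

```python
def aimd_update(batch_size, observed_latency_ms, slo_ms, step=1, backoff=0.9):
    """Return the next batch size under an additive-increase,
    multiplicative-decrease rule (hypothetical parameters)."""
    if observed_latency_ms > slo_ms:
        # SLO missed: back off multiplicatively, never below one query.
        return max(1, int(batch_size * backoff))
    # SLO met: probe for more throughput with a small additive step.
    return batch_size + step


def simulate(steps, slo_ms, latency_fn, start=1):
    """Record the batch size chosen at each step for a given latency model."""
    size = start
    history = []
    for _ in range(steps):
        history.append(size)
        size = aimd_update(size, latency_fn(size), slo_ms)
    return history


# Assumed latency model: each query in the batch adds 2 ms.
history = simulate(steps=20, slo_ms=20, latency_fn=lambda b: 2.0 * b)
```

Under this assumed latency model the batch size climbs one query at a time, overshoots the SLO once, backs off, and then oscillates just below the largest batch that still meets the SLO.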

  11. MODEL SELECTION

  12. SINGLE MODEL SELECTION Multi-Arm Bandit formulation - Explore vs Exploit - Regret: Loss by not picking optimal action - Goal: Minimize regret Clipper - Exp3 algorithm - Single evaluation - Scales to more models
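To make the bandit formulation concrete, here is a sketch of the textbook reward-based form of Exp3, where each "arm" is a model. It is not necessarily the exact update rule Clipper uses; the class name `Exp3`, the `gamma` default, and the toy reward function are assumptions for this example.

```python
import math
import random


class Exp3:
    """Textbook Exp3 over num_models arms (illustrative sketch)."""

    def __init__(self, num_models, gamma=0.1, rng=None):
        self.k = num_models
        self.gamma = gamma  # exploration rate in (0, 1]
        self.weights = [1.0] * num_models
        self.rng = rng or random.Random()

    def probabilities(self):
        total = sum(self.weights)
        # Mix the weight-proportional distribution (exploit)
        # with a uniform distribution (explore).
        return [(1 - self.gamma) * w / total + self.gamma / self.k
                for w in self.weights]

    def select(self):
        return self.rng.choices(range(self.k), weights=self.probabilities(), k=1)[0]

    def update(self, model, reward):
        """reward must lie in [0, 1]; only the selected model is updated."""
        p = self.probabilities()[model]
        estimate = reward / p  # importance-weighted reward estimate
        self.weights[model] *= math.exp(self.gamma * estimate / self.k)
        # Rescale so weights stay bounded; probabilities are unchanged.
        top = max(self.weights)
        self.weights = [w / top for w in self.weights]


# Toy setting (assumed): model 1 is always right, model 0 always wrong.
bandit = Exp3(num_models=2, rng=random.Random(0))
for _ in range(500):
    chosen = bandit.select()
    bandit.update(chosen, reward=1.0 if chosen == 1 else 0.0)
probs = bandit.probabilities()
```

Only the selected model is evaluated per query, which matches the "single evaluation" bullet: the cost per query does not grow with the number of candidate models.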

  13. MULTI MODELS Ensemble - Combine output from models (weighted average) - How do we get the weights ? Robust Prediction - React to model changes - Output confidence score
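One simple way to read the "weighted average" and "confidence score" bullets for classification is a weighted vote, with confidence defined as the share of total weight that agrees with the winning label. This is an illustrative sketch only: the function name `combine` and this definition of confidence are assumptions, and the weights are taken as given inputs rather than learned.

```python
def combine(predictions, weights):
    """Weighted vote over model outputs.

    predictions: one label per model
    weights:     one non-negative weight per model (assumed given)
    Returns (winning_label, confidence), where confidence is the
    fraction of total weight behind the winning label.
    """
    totals = {}
    for label, weight in zip(predictions, weights):
        totals[label] = totals.get(label, 0.0) + weight
    winner = max(totals, key=totals.get)
    return winner, totals[winner] / sum(weights)


label, confidence = combine(["cat", "dog", "cat"], [0.5, 0.3, 0.2])
```

When the models disagree, the confidence drops, which gives the application a signal for falling back to a default action.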

  14. STRAGGLER MITIGATION Why do stragglers occur? Approach

  15. TAKEAWAYS • ML inference: Workloads + Requirements • Layered architecture provides generality • Caching, Batching, Replication to improve latency, throughput • Multi-Arm bandits to improve accuracy

  16. DISCUSSION https://forms.gle/pZMuhCWcap2q3LQJ9

  17. (Discussion question from last week) Considering AllReduce using MPI as the baseline parallel programming task. Discuss the improvements made by MapReduce, Spark over MPI and discuss if/how Ray further contributes to the comparison.

  18. Consider a scenario where you run a model serving service that hosts a number of different models. The traffic for some models is sporadic (e.g. only a few hours where they are used). What are some advantages / disadvantages of using Clipper for such a service?
