Salus Fine-grained GPU Sharing Primitives for Deep Learning - - PowerPoint PPT Presentation



SLIDE 1

Salus

Fine-grained GPU Sharing Primitives for Deep Learning Applications

Advisor: Mosharaf Chowdhury 2020-03-03 By Peifeng Yu

SLIDE 2

Deep Learning Becomes Ubiquitous

  • Computer vision
  • Natural language processing
  • Speech
  • Robotics

Applications

  • Intelligent assistant: Google Now, Siri, Cortana
  • Face recognition
  • Video content understanding

SLIDE 3

A Brief Introduction to Deep Learning

[Figure: a network classifies images as Dog / Cat / Raccoon; prediction errors are propagated backward through the network]

  • Training:
      • Forward & backward pass
      • Iterative
SLIDE 4

A Brief Introduction to Deep Learning

[Figure: inference on an input image produces the label "Cat"]

  • Inference:
      • Forward pass
  • Training:
      • Forward & backward pass
      • Iterative
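
The inference/training distinction above can be sketched as a toy training loop. This is an illustrative NumPy example (a single linear layer on synthetic data), not any framework's actual API:

```python
import numpy as np

# Iterative training: forward pass, backward pass, parameter update.
# One linear layer on synthetic data; all names are illustrative.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))          # inputs
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w                        # targets

w = np.zeros(3)                       # model parameters
lr = 0.1
for step in range(200):               # iterative: repeat many times
    pred = X @ w                      # forward pass (inference is only this)
    err = pred - y                    # error at the output
    grad = X.T @ err / len(X)         # backward pass: gradient of MSE/2
    w -= lr * grad                    # update

print(np.round(w, 2))                 # w approaches true_w
```

Inference runs only the forward pass once per input; training repeats forward and backward passes for many iterations, which is what makes GPU sharing at iteration granularity possible later.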
SLIDE 5

Accelerate Deep Learning with GPUs

Neural networks are built on inherently parallel matrix operations; GPUs supply the FLOPS to execute them efficiently.

SLIDE 6

Exclusive Access to GPU

An application can have multiple GPUs, but each GPU usually belongs to exactly one application at a time.

Advantages

  • Simplifies hardware design
  • Efficiency

Disadvantages


  • Lack of flexibility
SLIDE 7

Exclusive Access: Lack of Flexibility

  • Hinders the scheduling ability of GPU cluster managers
  • Underutilization
  • Hyper-parameter tuning (AutoML)
  • Model serving (inference)

SLIDE 8

Exclusive Access: Lack of Flexibility

  • Hinders the scheduling ability of GPU cluster managers
  • Starting or suspending a job is expensive
  • Often easier to just do non-preemptive scheduling → FIFO
  • Head-of-line blocking

SLIDE 9

Exclusive Access: Lack of Flexibility

  • Underutilization
      • Variance in memory usage → overprovisioning

Model             Peak Memory Usage
VAE               28 MB
Super Resolution  529 MB
Deep Speech       3993 MB
Inception4        11355 MB
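
A quick back-of-the-envelope check of the table (assuming the peak figures are in MB and a 16 GB GPU, as on a Tesla P100) shows why exclusive access overprovisions:

```python
# Peak memory per model (MB), from the table above.
peak_mb = {"VAE": 28, "Super Resolution": 529,
           "Deep Speech": 3993, "Inception4": 11355}
gpu_mb = 16 * 1024  # assumed: a 16 GB GPU

total = sum(peak_mb.values())
print(total)                      # 15905 MB: all four peaks fit together
print(total <= gpu_mb)            # True

# Under exclusive access each job holds a whole GPU, so memory
# utilization is peak / capacity; VAE uses well under 1%.
print(round(100 * peak_mb["VAE"] / gpu_mb, 2))
```

Even at simultaneous peak usage, all four models fit on one GPU, yet exclusive access dedicates a full GPU to each.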

SLIDE 10

How Can We Efficiently Share a GPU for Deep Learning Applications?

SLIDE 11

GPU Sharing

  • Existing sharing solutions:

Approach                     Efficiency  Dynamic Memory  Flexible Scheduling
Static Partitioning (SP)     No          No              Yes
Multi-Process Service (MPS)  Yes         No              No

SLIDE 12

Design Goals

Approach                     Efficiency  Dynamic Memory  Flexible Scheduling
Static Partitioning (SP)     No          No              Yes
Multi-Process Service (MPS)  Yes         No              No
Ideal                        Yes         Yes             Yes

Minimize deployment overhead

  • No new hardware
  • No modification from user side
SLIDE 13

Fine-grained GPU Sharing Primitives for Deep Learning

Salus

A consolidated execution service enabling sharing primitives

  • Fast job switching,
  • Memory sharing

without modifying any

  • User scripts,
  • Operating systems, or
  • Hardware

with the goals to

  • Support new schedulers for the GPU, and
  • Improve GPU utilization

SLIDE 14

Salus in the DL Stack

[Stack diagram: User scripts → Deep Learning Frameworks (TensorFlow, PyTorch, CNTK, others) → Salus Adaptor → Salus Execution Service → hardware (CPU, GPU, FPGA, ASIC, …)]

SLIDE 15

Salus Components

  • 1. Salus Adaptor: transfers the computation graph
  • 2. Salus Execution Service: consolidates all GPU accesses
SLIDE 16

Salus in One Slide

[Diagram: each user script runs a DL framework with a Salus Adaptor; all adaptors talk to Salus (Memory Manager, Session Scheduler), which owns the GPU]

  • Create session
  • Send computation graph
  • For each iteration:
      • Send input
      • Check memory
      • Queue in scheduler
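
The per-session flow above can be sketched as follows; the class and method names here are illustrative stand-ins, not the real Salus API:

```python
from collections import deque

# Sketch of the per-session flow. All names are illustrative.
class FakeService:
    def __init__(self, capacity_mb):
        self.capacity_mb = capacity_mb
        self.scheduler = deque()          # session scheduler: a simple queue
        self.graphs = {}                  # memory manager bookkeeping

    def create_session(self, sess):       # 1. create session
        self.graphs[sess] = None

    def register_graph(self, sess, graph):  # 2. send computation graph
        self.graphs[sess] = graph

    def run_iteration(self, sess, mem_needed_mb):
        # 3. each iteration: send input, check memory, queue in scheduler
        if mem_needed_mb > self.capacity_mb:
            return False                  # does not fit: reject
        self.scheduler.append(sess)       # queued; scheduler picks when to run
        return True

svc = FakeService(capacity_mb=16 * 1024)
svc.create_session("job-1")
svc.register_graph("job-1", "<computation graph>")
print(svc.run_iteration("job-1", 4000))   # True: iteration queued
print(len(svc.scheduler))                 # 1
```

The key design point this mirrors: the expensive steps (session creation, graph transfer) happen once, while the per-iteration path is just a memory check plus a queue operation.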
SLIDE 17

Sharing Primitives

  • Efficient job switching
  • Memory sharing: GPU lane abstraction

SLIDE 18

Sharing Primitives: Efficient Job Switching

Existing Approaches                Time Scale
Stop and restart (checkpointing)   10~100 s
Generate snapshot [1]              ~1 s

[1]: W. Xiao et al. “Gandiva: Introspective Cluster Scheduling for Deep Learning”. In: OSDI. 2018.

Bottleneck: data (memory) transfer

SLIDE 19

Understand DL Job Memory

  • 3 types of memory:
      • Model
      • Ephemeral
      • Framework-internal

SLIDE 20

Understand DL Job Memory

  • 3 types of memory:
      • Model
      • Ephemeral
      • Framework-internal
  • Data transfer time is non-negligible:
      • Can be over 2x the corresponding inference latency
  • Model memory << GPU memory capacity

Why not keep multiple jobs' models in memory for fast switching?

SLIDE 21

Sharing Primitives: Efficient Job Switching

Job switching is done by determining which job's iteration to run next.

  • Minimal switching overhead
  • Flexible scheduling policies

A trade-off between maximum utilization and execution performance
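
Iteration-granularity switching can be sketched as a scheduler loop. This is an illustrative Python toy, assuming all jobs' models are already resident in GPU memory; the `fifo` and `srtf` policies are simplified stand-ins for the pluggable policies Salus enables:

```python
# Sketch: with every job's model resident on the GPU, "switching" reduces
# to deciding whose iteration runs next. Names are illustrative.
def run_schedule(iters_left, pick_next):
    """Run jobs to completion, one iteration at a time."""
    order = []
    iters_left = dict(iters_left)
    while iters_left:
        job = pick_next(iters_left)       # the scheduling decision
        order.append(job)                 # run one iteration of `job`
        iters_left[job] -= 1
        if iters_left[job] == 0:
            del iters_left[job]
    return order

fifo = lambda left: min(left)                # job names sorted by arrival
srtf = lambda left: min(left, key=left.get)  # shortest remaining first

jobs = {"A": 3, "B": 1}                   # "A" arrived first, "B" is short
print(run_schedule(jobs, fifo))           # ['A', 'A', 'A', 'B']
print(run_schedule(jobs, srtf))           # ['B', 'A', 'A', 'A']
```

Because the decision is made once per iteration (milliseconds), rather than once per job (minutes to hours), the policy can preempt, pack, or reorder jobs with minimal switching overhead.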

SLIDE 22

Sharing Primitives

  • Efficient job switching

[Figure: GPU memory over time, with Job 1 and Job 2 sharing the GPU via fast switching]

SLIDE 23

Sharing Primitives

  • Efficient job switching

  • Memory sharing: GPU lane

[Figure: GPU memory over time, partitioned into Lane 0 and Lane 1, shared by Job 1, Job 2, and Job 3]
SLIDE 24

Sharing Primitives: Memory Sharing

  • Efficient job switching
  • Memory sharing: GPU lane

GPU lane = continuous physical memory + GPU stream

  • Time-slicing within lane, parallel across lanes
  • Dynamic re-partitioning (lane assignment)
  • Avoid in-lane fragmentation

SLIDE 25

GPU Lane: Best Fit & Safety Condition

  • A lane cannot accept an arbitrary number of jobs
  • The Safety Condition determines whether a job can go in a lane

Σ_j Q_j + max_j U_j ≤ D_m

Q_j: model and framework-internal memory for job j
U_j: ephemeral memory for job j
D_m: memory capacity of lane m
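
The condition can be checked directly: because jobs within a lane are time-sliced, only one job's ephemeral memory is ever live at a time, hence the max term. A sketch with illustrative numbers (MB):

```python
# Safety condition: a job set fits a lane iff the sum of persistent
# memory (model + framework-internal, Q_j) plus the MAX of ephemeral
# memory (U_j) stays within lane capacity D_m. Numbers are illustrative.
def is_safe(jobs, capacity):
    """jobs: list of (Q_j, U_j) pairs in MB; capacity: D_m in MB."""
    if not jobs:
        return True
    persistent = sum(q for q, _ in jobs)       # all Q_j stay resident
    peak_ephemeral = max(u for _, u in jobs)   # only one U_j live at a time
    return persistent + peak_ephemeral <= capacity

lane = [(500, 2000), (300, 1500)]           # two jobs already in the lane
print(is_safe(lane, capacity=4096))         # 800 + 2000 = 2800 <= 4096: True
print(is_safe(lane + [(900, 2500)], 4096))  # 1700 + 2500 = 4200 > 4096: False
```

The max (rather than a sum) over ephemeral memory is what lets a lane admit more jobs than static partitioning would allow.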

SLIDE 26

GPU Lane: Best Fit & Safety Condition

  • A lane cannot accept an arbitrary number of jobs
  • The Safety Condition determines whether a job can go in a lane

Static Partitioning: Σ_j Q_j + Σ_j U_j ≤ D_m

Q_j: model and framework-internal memory for job j
U_j: ephemeral memory for job j
D_m: memory capacity of lane m

SLIDE 27

Salus Scheduling Policies

FIFO is suboptimal

  • HOL blocking
  • Underutilization

With Salus

  • Packing: achieves higher utilization
  • Preemption: enables prioritization
  • Fairness: equalizes the resource usage
  • What’s more? Still a huge design space!
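
A toy computation (with made-up durations) of why FIFO's head-of-line blocking hurts average job completion time (JCT), and why a preemptive shortest-remaining-time-first policy helps:

```python
# Average JCT under FIFO vs SRTF for jobs arriving together.
# Durations are made up for illustration.
def avg_jct(durations, order):
    t, total = 0, 0
    for job in order:
        t += durations[job]               # job finishes at time t
        total += t
    return total / len(order)

durations = {"long": 10, "short1": 1, "short2": 1}
fifo_order = ["long", "short1", "short2"]          # long job arrived first
srtf_order = sorted(durations, key=durations.get)  # shortest first

print(avg_jct(durations, fifo_order))   # (10 + 11 + 12) / 3 = 11.0
print(avg_jct(durations, srtf_order))   # (1 + 2 + 12) / 3 = 5.0
```

The long job blocks the head of the FIFO queue; letting short jobs run first cuts average JCT without delaying the long job much, which is the effect behind the SRTF numbers in the evaluation.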

SLIDE 28
Evaluation

  • 1. Flexible scheduler
  • 2. Faster hyper-parameter tuning
  • 3. High GPU utilization for inference

Deployed and evaluated on an Intel E5-2670 machine with 2x NVIDIA Tesla P100 GPUs, using 15 workloads

SLIDE 29

A Production Trace

  • 100 jobs from a production trace[1]
  • 4 schedulers implemented as demo
  • SRTF vs FIFO: 3.19x improvement in Avg. JCT


[1]: G. Juncheng et al. “Tiresias: A GPU Cluster Manager for Distributed Deep Learning”. In: NSDI. 2019.

SLIDE 30

Sub-second Level Switching

  • Slice of the 100 job trace, time is normalized
  • Sub-second switching

SLIDE 31

Hyper-parameter Exploration

  • 2 sets of hyper-parameter exploration
  • 300 exploration jobs in each set
  • Makespan is important

SLIDE 32

Pack Inference Applications

  • 42 DL inference applications on 1 GPU
  • User-facing services: latency matters

SLIDE 33

Fine-grained GPU Sharing Primitives for Deep Learning

Salus

Open sourced at: https://github.com/SymbioticLab/Salus

  • Prebuilt Docker image available
