AlloX: Compute Allocation in Hybrid Clusters



  1. EuroSys '20. AlloX: Compute Allocation in Hybrid Clusters. Tan N. Le, Xiao Sun, Mosharaf Chowdhury, Zhenhua Liu. tnle@cs.stonybrook.edu

  2. Resource Allocation in Clusters: Fairness, Performance, Utilization. (The slide also shows the figure "99 %".)

  3. Resource Allocation Design Space. Single resource: time (CPU sharing) and space (memory sharing). Multiple resources, traditional: DRF (NSDI '11), Carbyne (OSDI '16) and HUG (NSDI '16) for performance & fairness; TetriSched (EuroSys '16) for deadlines. Multiple resources, interchangeable: AlloX.

  4. Interchangeability in Resources. The same applications run on different resource types. Frameworks shown: TensorFlow, Caffe, PyTorch, Matlab, CNNLab, PaddlePaddle. Resource types shown: CPU, GPU, TPU, FPGA. Modern frameworks support interchangeability. https://github.com/PaddlePaddle/Paddle https://github.com/cnnlabs

  5. Heterogeneity in Hybrid CPU/GPU Clusters. Traditional nodes alongside expensive GPUs. Speed-up rates are distinct across jobs (Intel E5 2.4 GHz CPU vs. Nvidia K80 GPU).

  6. Overload if most users prefer GPUs. Expensive GPUs are overloaded while CPUs are under-utilized (Microsoft Azure). Let's explore some solutions.

  7. Join the Shortest Queue (JSQ). Machines: GPU1, GPU2, CPU1, CPU2. Processing times (GPU, CPU): J1 (40, 50), J2 (30, 40), J3 (35, 150), J4 (50, 160); the slide highlights the CPU times of J1 and J2 and the GPU times of J3 and J4. Compared with JSQ, the optimal schedule has -69% makespan and -54% avg. completion time. JSQ does not consider processing times.

  8. Shortest Job First (SJF). Machines: GPU1, GPU2, CPU1, CPU2. Processing times (GPU, CPU): J1 (10, 20), J2 (15, 25), J3 (20, 100), J4 (20, 90); the slide highlights the CPU times of J1 and J2 and the GPU times of J3 and J4. Compared with SJF, the optimal schedule has -75% makespan and -60% avg. completion time. SJF does not consider speed-up rates.

  9. AlloX: Minimize Avg. Completion Time. Convert the scheduling & placement into min-cost bipartite matching, solved in polynomial time. Diagram: jobs J1, J2, J3 on one side; GPU slots J?(G1), J?(G2), J?(G3) and CPU slots J?(C1), J?(C2), J?(C3) on the other.

  10. AlloX: Maintains Fairness for interchangeable resources. A user may not be happy if we keep putting them on CPU. Idea: prioritize users with low fairness scores F who run jobs on the unfavorable resources. Diagram: User 1 (score F-1) and User 2 (score F-2) are assigned GPU or CPU depending on whether F-1 < F-2 or F-1 > F-2.

  11. AlloX System. Estimation Tool: sample the jobs; estimate the processing times (CPU configuration, GPU configuration). Scheduler: Fairness: pick the set of users with the lowest fairness scores; Scheduling: decide to place jobs on CPUs or GPUs. Resource Placer: configure a job to run on CPU or GPU. Other diagram labels: kubectl, Jobs, processing times, placement constraints, Kubernetes, kubelet, GPUs, CPUs.

  12. Performance of AlloX. Workload: TensorFlow CNN benchmarks. Baselines: DRF: Dominant Resource Fairness + FIFO (resource configurations are fixed); ES: Equal Share + SJF (keep filling the available resources); SRPT: Shortest Remaining Processing Time (impractical switching between CPU & GPU). [Bar chart: avg. completion time (mins); bar labels ES, AlloX, SRPT, DRF; values shown 2968, 1149, 139, 97.] AlloX reduces avg. completion time by up to 95%.

  13. EuroSys '20. AlloX: Compute Allocation in Hybrid Clusters. Tan N. Le, Xiao Sun, Mosharaf Chowdhury, Zhenhua Liu. tnle@cs.stonybrook.edu
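
The JSQ and SJF slides give concrete processing times, and the AlloX slide says placement can be cast as min-cost bipartite matching. The sketch below is one way to check those numbers, not the authors' implementation. It assumes the classic slot formulation: each job is matched to a (machine, position-from-the-end) slot, and a job with processing time p in the k-th slot from the end costs k * p. It also assumes the naive schedule puts the first two jobs on the GPUs and the last two on the CPUs. The function names and data layout are hypothetical, and exhaustive search stands in for a polynomial-time matching solver.

```python
from itertools import permutations

# Processing times (GPU, CPU) copied from the JSQ and SJF slides.
JSQ_SLIDE = [(40, 50), (30, 40), (35, 150), (50, 160)]
SJF_SLIDE = [(10, 20), (15, 25), (20, 100), (20, 90)]


def expand(pairs, n_gpu=2, n_cpu=2):
    """Per-machine times, columns ordered GPU1, GPU2, CPU1, CPU2 (assumed layout)."""
    return [[gpu] * n_gpu + [cpu] * n_cpu for gpu, cpu in pairs]


def min_avg_completion(times):
    """Exhaustive min-cost matching of jobs to (machine, position-from-end) slots."""
    n_jobs, n_machines = len(times), len(times[0])
    slots = [(m, k) for m in range(n_machines) for k in range(1, n_jobs + 1)]
    best_cost, best_match = None, None
    for match in permutations(slots, n_jobs):  # job i takes slot match[i]
        # A job in the k-th slot from the end delays itself and the k-1 jobs
        # behind it, so it adds k * p to the total completion time.
        cost = sum(k * times[i][m] for i, (m, k) in enumerate(match))
        if best_cost is None or cost < best_cost:
            best_cost, best_match = cost, match
    return best_cost / n_jobs, best_match


def naive_avg_completion(pairs):
    """Assumed naive schedule: jobs 1-2 on the GPUs, jobs 3-4 on the CPUs."""
    finish = [gpu for gpu, _ in pairs[:2]] + [cpu for _, cpu in pairs[2:]]
    return sum(finish) / len(finish)


for name, pairs in [("JSQ slide", JSQ_SLIDE), ("SJF slide", SJF_SLIDE)]:
    best_avg, _ = min_avg_completion(expand(pairs))
    naive_avg = naive_avg_completion(pairs)
    saving = round(100 * (1 - best_avg / naive_avg))
    print(f"{name}: naive {naive_avg}, matched {best_avg}, saving {saving}%")
```

Under these assumptions the matched averages are 43.75 and 21.25, against naive averages of 95.0 and 53.75, which reproduces the -54% and -60% avg. completion time figures on the two slides. For clusters of realistic size the exhaustive loop would be replaced by a proper assignment solver.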
