AlloX: Compute Allocation in Hybrid Clusters - Tan N. Le, Xiao Sun - PowerPoint PPT Presentation



Slide 1. AlloX: Compute Allocation in Hybrid Clusters (EuroSys '20)
Tan N. Le, Xiao Sun, Mosharaf Chowdhury, Zhenhua Liu
tnle@cs.stonybrook.edu
Slide 2. Resource Allocation in Clusters
Goals shown: fairness, performance, utilization (the figure also carries a "99%" label).
Slide 3. Resource Allocation Design Space
Single resource: traditional CPU sharing (time), memory sharing (space)
Multiple resources: DRF (NSDI '11), Carbyne (OSDI '16), HUG (NSDI '16)
Interchangeable resources: TetriSched (EuroSys '16) for deadlines; AlloX for performance and fairness
Slide 4. Interchangeability in Resources
The same applications run on different resource types.
Frameworks shown: TensorFlow, PyTorch, Matlab, Caffe, PaddlePaddle, CNNLab
Resource types shown: CPU, GPU, TPU, FPGA
Modern frameworks support interchangeability.
https://github.com/PaddlePaddle/Paddle
https://github.com/cnnlabs
Slide 5. Heterogeneity in Hybrid CPU/GPU Clusters
Hybrid clusters mix traditional nodes with expensive GPUs.
Speed-up rates are distinct across jobs (Intel E5 2.4 GHz CPU vs. Nvidia K80 GPU).
Slide 6. Overload if most users prefer GPUs
Expensive GPUs are overloaded while CPUs are under-utilized (example: Microsoft Azure).
Let's explore some solutions.
Slide 7. Join the Shortest Queue (JSQ)
Machines: GPU1, GPU2, CPU1, CPU2
Processing times (GPU, CPU): J1 (40, 50), J2 (30, 40), J3 (35, 150), J4 (50, 160)
Optimal vs. JSQ: -69% makespan, -54% avg. completion time
JSQ does not consider processing times.
Slide 8. Shortest Job First (SJF)
Machines: GPU1, GPU2, CPU1, CPU2
Processing times (GPU, CPU): J1 (10, 20), J2 (15, 25), J3 (20, 100), J4 (20, 90)
Optimal vs. SJF: -75% makespan, -60% avg. completion time
SJF does not consider speed-up rates.
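The percentages on the JSQ and SJF slides can be reproduced with a few lines of arithmetic. The sketch below is not from the slides. It assumes that both heuristics end up placing J1..J4 on GPU1, GPU2, CPU1, CPU2 in listed order, and that the optimum uses one job per machine; the function names are hypothetical.

```python
from itertools import permutations

MACHINES = ("GPU1", "GPU2", "CPU1", "CPU2")

def metrics(times, placement):
    """Return (makespan, average completion time) with one job per machine.

    times[j] is the (GPU, CPU) processing time of job j;
    placement[j] is the machine that job j runs on.
    """
    done = [times[j][0] if m.startswith("GPU") else times[j][1]
            for j, m in enumerate(placement)]
    return max(done), sum(done) / len(done)

def reduction_vs_baseline(times):
    """Percent reduction (makespan, avg. completion) of the best placement."""
    # Assumption: the heuristic fills GPU1, GPU2, CPU1, CPU2 in job order.
    baseline = metrics(times, MACHINES)
    # Brute force over all one-job-per-machine placements.
    best = min((metrics(times, p) for p in permutations(MACHINES)),
               key=lambda r: (r[1], r[0]))
    return tuple(round(100 * (1 - o / b)) for o, b in zip(best, baseline))

jsq_slide = [(40, 50), (30, 40), (35, 150), (50, 160)]
sjf_slide = [(10, 20), (15, 25), (20, 100), (20, 90)]
print(reduction_vs_baseline(jsq_slide))  # (69, 54)
print(reduction_vs_baseline(sjf_slide))  # (75, 60)
```

In both examples the best placement sends the jobs with the largest GPU speed-up (J3 and J4) to the GPUs, which is the property neither heuristic looks at.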
Slide 9. AlloX: Minimize Avg. Completion Time
Convert the scheduling and placement problem into min-cost bipartite matching, which is solved in polynomial time.
Diagram: jobs J1, J2, J3 on one side; GPU slots G1, G2, G3 and CPU slots C1, C2, C3 on the other. Each job is matched to one slot.
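A minimal sketch of the matching idea follows. The slide does not give the edge costs, so this uses the classic reduction for minimizing total completion time as an assumption: a job placed k-th from the end of a machine's queue contributes k times its processing time on that machine. For brevity it brute-forces the matching over a tiny instance instead of using a polynomial-time solver such as the Hungarian algorithm. The job times and the function name are hypothetical.

```python
from itertools import permutations

def min_total_completion(proc):
    """Match each job to a (machine, position-from-end) slot at minimum cost.

    proc[j] maps a machine name to job j's processing time on it.
    Assumed edge cost: k * processing time, where k counts from the end
    of the machine's queue (the last job has k = 1).
    """
    n = len(proc)
    machines = sorted(proc[0])
    slots = [(m, k) for m in machines for k in range(1, n + 1)]
    best_cost, best_match = None, None
    # Brute force stands in for a polynomial-time matching solver.
    for chosen in permutations(slots, n):
        cost = sum(k * proc[j][m] for j, (m, k) in enumerate(chosen))
        if best_cost is None or cost < best_cost:
            best_cost, best_match = cost, chosen
    return best_cost, best_match

# Hypothetical processing times for three jobs on one GPU and one CPU.
jobs = [{"GPU": 10, "CPU": 20},
        {"GPU": 15, "CPU": 25},
        {"GPU": 20, "CPU": 100}]
cost, match = min_total_completion(jobs)
print(cost)   # 65
print(match)
```

Here the cheapest matching runs the second job on the CPU and the other two on the GPU, shortest first, for a total completion time of 10 + 30 + 25 = 65.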
Slide 10. AlloX: Maintains Fairness for Interchangeable Resources
A user may not be happy if we keep placing their jobs on CPUs.
Idea: prioritize users with low fairness scores F who run jobs on the unfavorable resources.
Diagram: User 1 (score F-1) and User 2 (score F-2) are placed on the GPU or the CPU depending on whether F-1 < F-2 or F-1 > F-2.
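The prioritization step can be sketched as picking the users with the lowest scores before scheduling their jobs. The slides do not define how F is computed or how many users are picked, so the scores and the `fraction` parameter below are purely hypothetical.

```python
import math

def pick_least_fair(scores, fraction=0.5):
    """Return the users with the lowest fairness scores.

    scores maps a user name to a fairness score F (lower = less fairly served).
    fraction is a hypothetical knob for how many users to prioritize.
    """
    count = max(1, math.ceil(fraction * len(scores)))
    return sorted(scores, key=scores.get)[:count]

# Hypothetical fairness scores.
scores = {"user1": 0.2, "user2": 0.9, "user3": 0.5, "user4": 0.4}
print(pick_least_fair(scores))  # ['user1', 'user4']
```

Only the jobs of the selected users would then be handed to the placement step, so a user who has been stuck on the unfavorable resource moves up the queue.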
Slide 11. AlloX System
Jobs are submitted through kubectl and run on Kubernetes.
Estimation Tool: samples the jobs and estimates their processing times under the CPU configuration and the GPU configuration.
Scheduler, fairness: picks the set of users with the least fair scores.
Scheduler, scheduling: decides whether to place jobs on CPUs or GPUs.
Resource Placer: configures a job to run on CPU or GPU via placement constraints; the kubelet runs it on the GPUs or CPUs.
Slide 12. Performance of AlloX
Bar chart of avg. completion time (mins), axis scaled ×10^3. Bar values: 2968, 1149, 139, 97. Bar labels: ES, AlloX, SRPT, DRF.
DRF: Dominant Resource Fairness + FIFO. Resource configurations are fixed.
ES: Equal Share + SJF. Keeps filling the available resources.
SRPT: Shortest Remaining Processing Time. Impractical switching between CPU and GPU.
AlloX reduces avg. completion time by up to 95% (TensorFlow CNN benchmarks).
