SLIDE 1

Pegasus: Coordinated Scheduling for Virtualized Accelerator-based Systems

Vishakha Gupta, Karsten Schwan @ Georgia Tech; Niraj Tolia @ Maginatics; Vanish Talwar, Parthasarathy Ranganathan @ HP Labs

USENIX ATC 2011 – Portland, OR, USA

SLIDE 2

Increasing Popularity of Accelerators

  • 2007: IBM Cell-based PlayStation
  • 2008: IBM Cell-based RoadRunner; CUDA-programmable GPUs for developers
  • 2009: Increasing popularity of NVIDIA GPU-powered desktops and laptops
  • 2010: Amazon EC2 adopts GPUs; Tianhe-1A and Nebulae supercomputers in the Top500
  • 2011: Tegras in cellphones; Keeneland
SLIDES 3–7

Example x86-GPU System

  • The host connects to the GPU over PCIe
  • The proprietary NVIDIA driver and CUDA runtime handle:
    − Memory management
    − Communication with the device
    − Scheduling logic
    − Binary translation
  • C-like CUDA-based applications (host portion) launch CUDA kernels on the GPU (a minimal host-portion example follows below)

Design flaw: the bulk of the logic sits in drivers, which were meant for simple operations like read, write, and interrupt handling
Shortcoming: inaccessibility, and one scheduling policy fits all
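To make this concrete, here is a minimal CUDA host portion of the kind the slide describes; every call below passes through the CUDA runtime and the closed NVIDIA driver. The scale kernel and all names are a toy illustration, not one of the paper's benchmarks.

```c
/* Minimal CUDA host portion: memory management, device communication,
 * and a kernel launch, all funneled through the runtime and driver.
 * Build with: nvcc example.cu */
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

__global__ void scale(float *v, float s, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) v[i] *= s;
}

int main(void)
{
    const int n = 1 << 20;
    float *host = (float *)malloc(n * sizeof(float));
    float *dev;
    for (int i = 0; i < n; i++) host[i] = 1.0f;

    cudaMalloc(&dev, n * sizeof(float));                /* memory mgmt  */
    cudaMemcpy(dev, host, n * sizeof(float),
               cudaMemcpyHostToDevice);                 /* device comm  */
    scale<<<(n + 255) / 256, 256>>>(dev, 2.0f, n);      /* CUDA kernel  */
    cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);

    printf("host[0] = %f\n", host[0]);                  /* prints 2.0   */
    cudaFree(dev);
    free(host);
    return 0;
}
```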

SLIDES 8–11

Sharing Accelerators

  • 2010: Amazon EC2 adopts GPUs; other cloud offerings by AMD, NVIDIA
  • 2011: Tegras in cellphones; HPC GPU cluster (Keeneland)
  • Most applications fail to occupy GPUs completely
    − With the exception of extensively tuned (e.g. supercomputing) applications
  • Expected utilization of GPUs across applications in some domains "may" follow patterns that allow sharing

Need for accelerator sharing: resource sharing is now supported in NVIDIA's Fermi architecture
Concern: can driver scheduling do a good job?

SLIDES 12–13

NVIDIA GPU Sharing – Driver Default

  • Quad-core Xeon with 2 NVIDIA 8800GTX GPUs, driver 169.09, CUDA SDK 1.1
  • Coulomb Potential [CP] benchmark from the Parboil benchmark suite
  • Result of sharing two GPUs among four instances of the application

[Box plot: max, min, median, and 50% spread]

Driver can: efficiently implement computation and data interactions between host and accelerator
Limitations: call ordering suffers when sharing; any scheme used is static and cannot adapt to different system expectations

SLIDES 14–17

Re-thinking Accelerator-based Systems

  • Accelerators as first class citizens
    − Why treat such powerful processing resources as devices?
    − How can such heterogeneous resources be managed, especially with evolving programming models, evolving hardware, and proprietary software?
  • Sharing of accelerators
    − Are there efficient methods to utilize a heterogeneous pool of resources?
    − Can applications share accelerators without a big hit in efficiency?
  • Coordination across different processor types
    − How do you deal with multiple scheduling domains?
    − Does coordination obtain any performance gains?

SLIDES 18–21

Pegasus addresses the urgent need for systems support to smartly manage accelerators (demonstrated through x86–NVIDIA GPU-based systems). It leverages new opportunities presented by the increased adoption of virtualization technology in commercial, cloud computing, and even high performance infrastructures (virtualization is provided by the Xen hypervisor and the Dom0 management domain).

SLIDE 22

ACCELERATORS AS FIRST CLASS CITIZENS


slide-23
SLIDE 23

Manageability

Extending Xen for Closed NVIDIA GPUs

Management Domain (Dom0) Management Domain (Dom0)

Hypervisor (Xen) Hypervisor (Xen)

Traditional Device Drivers

General purpose multicores General purpose multicores Traditional Devices Traditional Devices VM

Linux

23

slide-24
SLIDE 24

Manageability

Extending Xen for Closed NVIDIA GPUs

Management Domain (Dom0) Management Domain (Dom0)

Hypervisor (Xen) Hypervisor (Xen)

Traditional Device Drivers

General purpose multicores General purpose multicores Compute Accelerators (NVIDIA GPUs) Compute Accelerators (NVIDIA GPUs) Traditional Devices Traditional Devices VM

Linux

24

slide-25
SLIDE 25

Manageability

Extending Xen for Closed NVIDIA GPUs

Management Domain (Dom0) Management Domain (Dom0)

Hypervisor (Xen) Hypervisor (Xen)

Traditional Device Drivers

General purpose multicores General purpose multicores Compute Accelerators (NVIDIA GPUs) Compute Accelerators (NVIDIA GPUs) Traditional Devices Traditional Devices VM

Linux Runtime + GPU Driver

25

slide-26
SLIDE 26

Manageability

Extending Xen for Closed NVIDIA GPUs

Management Domain (Dom0) Management Domain (Dom0)

Hypervisor (Xen) Hypervisor (Xen)

Traditional Device Drivers

General purpose multicores General purpose multicores Compute Accelerators (NVIDIA GPUs) Compute Accelerators (NVIDIA GPUs) Traditional Devices Traditional Devices VM

Linux

NVIDIA’s CUDA – Compute Unified Device Architecture for managing GPUs

Runtime + GPU Driver CUDA API GPU Application

26

slide-27
SLIDE 27

Manageability

Extending Xen for Closed NVIDIA GPUs

Management Domain (Dom0) Management Domain (Dom0)

Hypervisor (Xen) Hypervisor (Xen)

GPU Backend Traditional Device Drivers

General purpose multicores General purpose multicores Compute Accelerators (NVIDIA GPUs) Compute Accelerators (NVIDIA GPUs) Traditional Devices Traditional Devices VM

GPU Frontend Linux

NVIDIA’s CUDA – Compute Unified Device Architecture for managing GPUs

Runtime + GPU Driver CUDA API GPU Application

27

slide-28
SLIDE 28

Manageability

Extending Xen for Closed NVIDIA GPUs

Management Domain (Dom0) Management Domain (Dom0)

Hypervisor (Xen) Hypervisor (Xen)

GPU Backend Traditional Device Drivers

General purpose multicores General purpose multicores Compute Accelerators (NVIDIA GPUs) Compute Accelerators (NVIDIA GPUs) Traditional Devices Traditional Devices VM

GPU Frontend Linux

NVIDIA’s CUDA – Compute Unified Device Architecture for managing GPUs

Runtime + GPU Driver CUDA API GPU Application

VM

GPU Frontend Linux CUDA API GPU Application

28

slide-29
SLIDE 29

Manageability

Extending Xen for Closed NVIDIA GPUs

Management Domain (Dom0) Management Domain (Dom0)

Hypervisor (Xen) Hypervisor (Xen)

Mgmt Extension GPU Backend Traditional Device Drivers

General purpose multicores General purpose multicores Compute Accelerators (NVIDIA GPUs) Compute Accelerators (NVIDIA GPUs) Traditional Devices Traditional Devices VM

GPU Frontend Linux

NVIDIA’s CUDA – Compute Unified Device Architecture for managing GPUs

Runtime + GPU Driver CUDA API GPU Application

VM

GPU Frontend Linux CUDA API GPU Application

29
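The GPU frontend exists because the closed NVIDIA stack leaves only one place to intervene: guest CUDA calls must be captured at the API boundary and shipped to Dom0. Below is a minimal sketch of such interposition, assuming a hypothetical fe_forward() frontend hook (stubbed out here); the actual Pegasus interposer and its marshaling format are not shown on the slides.

```c
/* Sketch of a guest-side CUDA interposer: the library provides the
 * cudaMalloc entry point itself, so an application linked against it
 * (or via LD_PRELOAD) never reaches a local NVIDIA runtime.
 * fe_forward() is a hypothetical frontend hook, not a Pegasus symbol. */
#include <stdio.h>
#include <stddef.h>

typedef int cudaError_t;            /* stand-in for the real enum */
enum { CALL_CUDA_MALLOC = 1 };

struct malloc_args { void **dev_ptr; size_t size; };

/* Would marshal the call onto the per-VM Xen shared ring and wait for
 * the response written back by the Dom0 backend; stubbed here. */
static cudaError_t fe_forward(int call_id, void *args)
{
    (void)args;
    printf("forwarding call %d to the GPU frontend driver\n", call_id);
    return 0;                        /* cudaSuccess */
}

/* The application's cudaMalloc resolves to this interposed version. */
cudaError_t cudaMalloc(void **dev_ptr, size_t size)
{
    struct malloc_args a = { dev_ptr, size };
    return fe_forward(CALL_CUDA_MALLOC, &a);
}

int main(void)                       /* demo caller */
{
    void *p = NULL;
    cudaError_t rc = cudaMalloc(&p, 1 << 20);
    printf("cudaMalloc returned %d\n", rc);
    return 0;
}
```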

SLIDES 30–33

Accelerator Virtual CPU (aVCPU) Abstraction

  • Pegasus frontend (in the VM): the interposer library captures application CUDA calls; the frontend driver exchanges calls and responses over a per-VM Xen shared ring (the call buffer), with shared pages for application data
  • Pegasus backend (in Dom0): a polling thread picks up the calls and issues them to the CUDA runtime + driver (a sketch of this polling loop follows below)
  • The polling thread is the VM's representative for call execution
  • It can be queued or scheduled to pick calls and issue them for any amount of time, so the accelerator portion of the VM can be scheduled
  • Hence, we define an "accelerator" virtual CPU, or aVCPU
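A minimal sketch of what makes the polling thread schedulable, assuming an illustrative ring layout and call record (the slides do not specify the Pegasus wire format): each invocation of poll_vm_calls() below corresponds to one quantum in which the aVCPU drains and issues its VM's pending calls.

```c
/* Hypothetical sketch of the per-VM call buffer and the Dom0 polling
 * thread; struct layouts and names are illustrative assumptions.
 * Build with: nvcc poll.cu */
#include <stddef.h>
#include <stdio.h>
#include <cuda_runtime.h>

typedef enum { CALL_MALLOC, CALL_MEMCPY_H2D } call_kind_t;

typedef struct {                 /* one marshaled CUDA call */
    call_kind_t kind;
    size_t      bytes;
    void       *src;             /* guest data via shared pages */
    void       *dev_ptr;         /* filled in by the backend    */
} cuda_call_t;

#define RING_SLOTS 64
typedef struct {                 /* stand-in for the Xen shared ring */
    cuda_call_t slots[RING_SLOTS];
    unsigned    prod, cons;      /* producer = guest, consumer = Dom0 */
} ring_t;

static int ring_pop(ring_t *r, cuda_call_t *c)
{
    if (r->cons == r->prod) return 0;          /* ring empty */
    *c = r->slots[r->cons++ % RING_SLOTS];
    return 1;
}

/* One scheduling quantum of the aVCPU: drain and issue guest calls. */
static void poll_vm_calls(ring_t *r)
{
    cuda_call_t call;
    while (ring_pop(r, &call)) {
        switch (call.kind) {
        case CALL_MALLOC:
            cudaMalloc(&call.dev_ptr, call.bytes);
            break;
        case CALL_MEMCPY_H2D:
            cudaMemcpy(call.dev_ptr, call.src, call.bytes,
                       cudaMemcpyHostToDevice);
            break;
        }
        /* the response (status, dev_ptr) would be pushed back here */
    }
}

int main(void)
{
    static ring_t ring;          /* zero-initialized demo ring */
    cuda_call_t c = { CALL_MALLOC, 4096, NULL, NULL };
    ring.slots[ring.prod++] = c;
    poll_vm_calls(&ring);        /* what the scheduled thread does */
    puts("drained one quantum of guest CUDA calls");
    return 0;
}
```

Because the closed driver itself cannot be preempted, control is exercised here, at the point where calls are released to the runtime; this is what lets Pegasus apply the time slot based policies of the following slides.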

SLIDES 34–35

First Class Citizens

  • The aVCPU has execution context on both the CPU (polling thread, runtime, and driver context) and the GPU (CUDA kernel)
  • It holds the data used by these calls

VCPU: first class schedulable entity on a physical CPU
aVCPU: first class schedulable entity on a GPU (with a CPU component due to the execution model)
Together, VCPUs and aVCPUs form a manageable pool of heterogeneous resources

SLIDE 36

SHARING OF ACCELERATORS


SLIDES 37–45

Scheduling aVCPUs

Per call granularity is too fine; per application granularity is too coarse. Pegasus therefore uses time slot based methods in between (a sketch of the credit-to-ticks idea follows this list):

  • RR: fair share – aVCPUs are given equal time slices and scheduled in a circular fashion
  • XC: proportional fair share – adopts Xen credit scheduling for aVCPU scheduling; e.g. if VMs 1, 2, and 3 have 256, 512, and 1024 credits, they get 1, 2, and 4 time ticks respectively, every scheduling cycle
  • AccC: proportional fair share – instead of using the assigned VCPU credits for scheduling aVCPUs, defines new accelerator credits, which could be some fraction of the CPU credits
  • SLAF: feedback-based proportional fair share – periodic scanning adjusts the timer ticks assigned to aVCPUs if they fall short of, or exceed, their assigned/expected time quota
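A minimal sketch of the credit-to-ticks computation behind XC, using the slide's own example values; the 256-credits-per-tick base is inferred from that example, and all names are illustrative rather than Pegasus internals.

```c
/* Credit-proportional time-slot assignment (the XC policy as described
 * on the slide): ticks per scheduling cycle scale with a domain's
 * credits. The 256-credit base reproduces the slide's 1/2/4 example. */
#include <stdio.h>

#define BASE_CREDITS 256   /* 256 credits -> 1 tick per cycle */

struct domain { const char *name; int credits; int ticks; };

static void assign_ticks(struct domain *doms, int n)
{
    for (int i = 0; i < n; i++)
        doms[i].ticks = doms[i].credits / BASE_CREDITS;
}

int main(void)
{
    struct domain doms[] = {
        { "VM1",  256, 0 }, { "VM2",  512, 0 }, { "VM3", 1024, 0 },
    };
    assign_ticks(doms, 3);
    /* The scheduler would then cycle over domains, letting each run
     * for its tick budget before moving on. */
    for (int i = 0; i < 3; i++)
        printf("%s: %d tick(s) per scheduling cycle\n",
               doms[i].name, doms[i].ticks);
    return 0;
}
```

Running the sketch prints 1, 2, and 4 ticks for VM1, VM2, and VM3, matching the slide's example.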

SLIDES 46–47

Performance Improves but Still High Variation

  • BlackScholes <2mi,128>
  • Xen 3.2.1 with the 2.6.18 Linux kernel in all domains
  • NVIDIA driver 169.09 + SDK 1.1
  • Dom1, Dom4 = 256, Dom2 = 512, Dom3 = 1024 credits

[Box plot: max, min, median, and 50% spread per policy]

Still high variation, due to the hidden driver and runtime
Coordination: can we do better?

SLIDE 48

COORDINATION ACROSS SCHEDULING DOMAINS


SLIDES 49–50

Coordinating CPU-GPU Scheduling

  • Hypervisor co-schedule [CoSched]
    − Hypervisor scheduling determines which domain should run on a GPU, depending on the CPU schedule
    − Latency reduction at the cost of occasional unfairness
    − Possible waste of resources, e.g. if the domain picked for the GPU has no work to do
  • Augmented credit [AugC] (see the sketch after this list)
    − Scans the hypervisor CPU schedule to temporarily boost the credits of domains selected for CPUs
    − Picks domain(s) for GPU(s) based on GPU credits + remaining CPU credits from the hypervisor (augmenting)
    − Throughput improvement via a temporary credit boost
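A minimal sketch of the AugC selection rule described above: GPU credits are augmented with the remaining CPU credits of domains the hypervisor just scheduled, so the GPU tends to pick a domain whose CPU portion is already running. All names and fields are illustrative.

```c
/* Hypothetical sketch of AugC: a domain's GPU priority is its GPU
 * credits, plus its remaining CPU credits if the hypervisor currently
 * has it scheduled on a CPU. Field names are illustrative. */
#include <stdio.h>

struct domain {
    const char *name;
    int gpu_credits;
    int cpu_credits_left;   /* remaining credits in the CPU scheduler */
    int on_cpu;             /* 1 if currently in the CPU schedule */
};

/* Augment, then choose the highest-priority domain for the GPU. */
static int pick_for_gpu(struct domain *d, int n)
{
    int best = 0, best_prio = -1;
    for (int i = 0; i < n; i++) {
        int prio = d[i].gpu_credits
                 + (d[i].on_cpu ? d[i].cpu_credits_left : 0);
        if (prio > best_prio) { best_prio = prio; best = i; }
    }
    return best;
}

int main(void)
{
    struct domain doms[] = {
        { "Dom1", 256, 100, 1 },   /* on a CPU now: gets the boost */
        { "Dom2", 300,  50, 0 },
    };
    /* Dom1 wins (256 + 100 > 300): its CPU and GPU portions align. */
    printf("GPU goes to %s\n", doms[pick_for_gpu(doms, 2)].name);
    return 0;
}
```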

SLIDES 51–52

Coordination Further Improves Performance

  • BlackScholes <2mi,128>
  • Xen 3.2.1 with the 2.6.18 Linux kernel in all domains
  • NVIDIA driver 169.09 + SDK 1.1
  • Dom1, Dom4 = 256, Dom2 = 512, Dom3 = 1024 credits

Coordination: aligning the CPU and GPU portions of an application to run almost simultaneously reduces variation and improves performance

SLIDES 53–55

Pegasus Scheduling Policies

  • No coordination:
    − Default – GPU driver based – the base case (None)
    − Round Robin (RR)
    − AccCredit (AccC) – credits based on static profiling
  • Coordination based:
    − XenCredit (XC) – uses Xen CPU credits
    − SLA feedback based (SLAF)
    − Augmented Credit based (AugC) – temporarily augments credits for co-scheduling
  • Controlled:
    − Hypervisor controlled, or co-scheduled (CoSched)

SLIDES 56–64

Logical View of the Pegasus Resource Management Framework

[Architecture diagram, built up across these slides: the physical platform has CPUs C1–C4 and compute accelerators Acc1 and Acc2 (compute); guest VMs run applications and accelerator applications (a host part plus an accelerator part) over an accelerator frontend in the guest OS; in the hypervisor, the CPU scheduler picks VCPUs from CPU ready queues of domains ordered by credits; in the management domain, an accelerator selection module routes work to per-accelerator DomA schedulers, which pick aVCPUs from accelerator ready queues of domains ordered by accelerator credits; monitoring/feedback flows back into the DomA schedulers, and accelerator data accompanies the schedule; more VCPUs and aVCPUs are added as more guests and accelerator applications appear]

SLIDE 65

Testbed Details

  • Xeon 4-core @ 3GHz, 3GB RAM, 2 NVIDIA GPUs (G92-450)
  • Xen 3.2.1 (stable), Fedora 8 Dom0 and DomU running Linux kernel 2.6.18, NVIDIA driver 169.09, SDK 1.1
  • Guest domains mostly given 512MB of memory and 1 core
  • Pinned to different physical cores
  • Launched almost simultaneously: a worst case measurement due to maximum load
  • Data sampled over 50 runs for statistical significance, despite driver/runtime variation
  • Scheduling plots report the h-spread with min-max over 85% of readings, or the total work done over all runs in an experiment

SLIDES 66–69

Benchmarks

Category          Source       Benchmarks
Financial         SDK          Binomial (BOp), BlackScholes (BS), MonteCarlo (MC)
Media processing  SDK/Parboil  ProcessImage (PI) = matrix multiply + DXTC, MRIQ, FastWalshTransform (FWT)
Scientific        Parboil      CP, TPACF, RPES

  • Diverse benchmarks from different application domains show (a) different throughput and latency constraints, (b) varying data and CUDA kernel sizes, and (c) different numbers of CUDA calls
  • BlackScholes is the worst case in the set: throughput and latency sensitive, due to a large number of CUDA calls (depending on the iteration count)
  • FastWalshTransform is latency sensitive: multiple computation kernel launches and large data transfers

SLIDE 70

Ability to Achieve Low Virtualization Overhead

  • Speed improvement for most benchmarks, despite an increased number of CUDA calls
  • CUDA Time: time spent within the application executing CUDA calls
  • Total Time: total execution time of the benchmark from the command line

SLIDES 71–73

Appropriate Scheduling is Important

[Plot: per-call delays under the RR scheduler]

Without resource management, calls can be variably delayed due to interference from other application(s)/domain(s), even in the absence of virtualization

SLIDE 74

Pegasus Scheduling

BlackScholes – latency and throughput sensitive

  • Equal credits for all domains
  • Work done $= \sum_{\text{all runs}} \frac{\text{options}}{\text{time}}$

SLIDE 75

Pegasus Scheduling

FWT – latency sensitive

  • Dom1, Dom4 = 256, Dom2 = 1024, Dom3 = 2048 credits

SLIDES 76–79

Insights

  • The Pegasus approach efficiently virtualizes GPUs
  • Coordinated scheduling is effective
    − Even basic accelerator request scheduling can improve sharing performance
    − While co-scheduling is really useful [CoSched], other methods can come close [AugC], keep up utilization, and give desirable properties
  • Scheduling lowers the degree of variability caused by uncoordinated use of the NVIDIA driver

There is no single 'best' scheduling policy: there is a clear need for diverse policies geared to match different system goals and to account for different application characteristics

SLIDES 80–82

Conclusion

  • We successfully virtualize GPUs to convert them into first class citizens
  • The Pegasus approach abstracts accelerator interfaces through CUDA-level virtualization
    − It devises scheduling methods that coordinate accelerator use with that of general purpose host cores
    − Performance is evaluated on an x86-GPU Xen-based prototype
  • Evaluation with a variety of benchmarks shows
    − The need for coordination when sharing accelerator resources, especially for applications with high CPU-GPU coupling
    − The need for diverse policies when coordinating resource management decisions made for general purpose vs. accelerator cores

SLIDES 83–86

Future Work: Generalizing Pegasus

  • Applicability: the concepts apply to open as well as closed accelerators, due to the lack of integration with runtimes
    − Past experience with the IBM Cell accelerator [Cellule]
    − An open architecture allows finer grained control of resources
  • Toolchains: sophistication through integration
    − Instrumentation support from Ocelot [GTOcelot]
    − Improved admission control, load balancing, and scheduling
  • Heterogeneous platforms: scheduling different personalities for a virtual machine [Poster session]
    − A more generic problem, where even processing resources on the same chip can be asymmetric
  • Scale: extensions to cluster-based systems with Shadowfax [VTDC'11]

SLIDE 87

Related Work

  • Heterogeneous and larger-scale systems: [Helios], [MultiKernel]
  • Scheduling extensions: [Cypress], [Xen Credit Scheduling], [QoS Adaptive Communication], [Intel Shared ISA Heterogeneity], [Cellular Disco]
  • GPU virtualization: [OpenGL], [VMWare DirectX], [VMGL], [vCUDA], [gVirtuS]
  • Other related work
    − Accelerator frontend and multi-core programming models: [CUDA], [Georgia Tech Harmony], [Georgia Tech Cellule], [OpenCL]
    − Some examples: [Intel Tolapai], [AMD Fusion], [LANL Roadrunner]
    − Application domains: [NSF Keeneland], [Amazon Cloud]
    − Interaction with higher levels: [PerformancePointsOSR]
    − Cluster level: [rCUDA], [Shadowfax]

SLIDE 88

Thank you!