Full Virtualization for GPUs Reconsidered Revisit -- Suzuki, Yusuke, - PowerPoint PPT Presentation

Full Virtualization for GPUs Reconsidered Revisit -- Suzuki, Yusuke, et al. “ GPUvm: Why not virtualizing GPUs at the hypervisor?.” USENIX ATC’ 14. Hangchen Yu 1 , Christopher J. Rossbach 1,2 1 The University of Texas at Austin 2 VMware Research Group

Overview • Demands, introductions, challenges of virtual GPUs • Distinctive features of GPUvm • Re-evaluate GPUvm with additional benchmarks – Hard to set up the testbed – Some functionalities do not work – Over 200x overheads on average – Unfairness issue – Over 40% throughput loss #2

Do we still need GPU virtualizations? • Share GPUs in datacenter • Different end-user demands • Hidden scenarios #3

GPU Virtualization Challenges • Diverse hardware • Undocumented APIs • Closed-source GPUs and drivers • Deep graphics stack • Coupled layers • Significant overheads • Limited flexibility #6

GPU Virtualization Comparisons Front-end Device emulation Synthesizes host graphics operations API remoting Forwards graphics API calls To external graphics stack Mediated-passthrough Dedicates a set of contexts Back-end Passthrough Provides exclusive access #7

GPU Virtualization Comparisons Performance Fidelity Multiplexing Interposition Complexity Front-end Device emulation Synthesizes host graphics operations API remoting Forwards graphics API calls To external graphics stack Mediated-passthrough Dedicates a set of contexts Back-end Passthrough Provides exclusive access #8

GPU Virtualization Examples Front-end M. Dowty, VMware SVGA , SIGOPS- OSR’09 Device emulation J. Duato, rCUDA , HiPC’10,11 API remoting G. Giunta, gVirtuS , European Conference on Parallel Processing’10 AMD MxGPU ( FirePro ), VMworld’15 Mediated-passthrough NVIDIA GRID vGPU, 15 KVMGT (Intel GVT-g ), 14 Back-end Passthrough Amazon Elastic Compute Cloud (AWS EC2 ) #14

GPUvm Features Similar approaches when virtualizing at hypervisor-level Front-end Device emulation Exposes a native device model to VMs API remoting Forwards commands to GPU virtual aggregator Back-end Mediated-passthrough Passes-through some operations (I/O requests) to hardware #15

Full-virtualization vs. Para-virtualization Para-virtualization Split device model Back End Apps GPU driver API vGPU driver API GPU driver Front End GPU Full-virtualization Trap-and-emulate Apps Device model GPU driver API Hypervisor GPU driver GPU #16

Full-virtualization vs. Para-virtualization Performance Interposition Fidelity Multiplexing Para-virtualization Split device model Back End Apps GPU driver API vGPU driver API GPU driver Front End GPU Full-virtualization Trap-and-emulate Apps Device model GPU driver API Hypervisor GPU driver GPU #17

Full-virtualization vs. Para-virtualization Performance Interposition Fidelity Multiplexing Para-virtualization Split device model Back End Apps vGPU driver GPU driver API vGPU driver API Back End API GPU driver Front End GPU Full-virtualization Trap-and-emulate Apps Device model GPU driver Hypervisor GPU driver API Hypervisor API GPU driver GPU #18

Full Virtualization: A Reasonable Goal? Full-featured vGPU Strong isolation (3D acceleration) Full-virtualization Trap-and-emulate Apps Device model Device Model GPU driver API Hypervisor Slow performance Hard to map GPU driver GPU different GPUs #19

Full Virtualization: A Reasonable Goal? Full-featured vGPU Strong isolation (3D acceleration) Full-virtualization Trap-and-emulate Apps Device model Device Model GPU driver API Hypervisor Slow performance Hard to map GPU driver GPU different GPUs #20

GPUvm Overview • Access aggregator #21

GPUvm Overview • Access aggregator VM • Shadow channel Virtual Context Driver – Mapped by a virtual channel Virtual Virtual Channel Channel • Shadow page table Shadow Channel Shadow Channel Shadow Page Shadow Page Table Table Shadowing Mechanism #24

GPUvm Overview • Access aggregator • Shadow channel – Mapped by a virtual channel • Shadow page table • Virtual scheduler – FIFO – CREDIT – BAND (bandwidth-aware non-preemptive device) #28

Why GPUvm? • Open-source • Overheads – FV (36x) PV (1.9x) Easier to Easier to analyze upgrade/swap/optimize • performance/mechanism Good open architecture components – Decoupled components – Native device model, virtual MMIO, shadow channels, shadow page tables, virtual schedulers Significant • Not-so-good aspects performance impact – Interposes guest access to memory-mapped resources – Shadows expensive resources • Trade-off of hypervisor-level full-virtualization #29

MMIO through PCIe GPUvm Optimizations base address register • Sync virtual & shadow channels – Intercept data accesses – BAR3 remapping • BAR3 accesses are passed-through • Sync guest & shadow page tables – GPU-side page faults – Lazy shadowing • Updates shadow page tables only when referenced #30

MMIO through PCIe GPUvm Optimizations base address register • Sync virtual & shadow channels – Intercept data accesses – BAR3 remapping • BAR3 accesses are passed-through • Sync guest & shadow page tables – GPU-side page faults – Lazy shadowing • Updates shadow page tables only when referenced #31

Testbed • Specific hardware – NVIDIA Quadro 6000 NVC0 – GF100GL vs. GF100 (GTX 480) (different region addresses) • Specific software – Fedora 16 (Kernel 3.6.5) – Xen HVM (4.2.0) – Gdev (commit 605e69e7) – GCC 4.6.3 – NVCC 4.2 – Boost 1.4.7 #32

Performance • BAR3 remapping – Relative execution time 1.6x speed-up – Fails for some benchmarks • Lazy shadowing – 1.2x speed-up – Fails for some benchmarks • Overhead – up to 737x, 232x on average • 7.4x Boot slowdown hotspot lud srad mmul WRITE bytes 659,664 662,544 666,784 660,832 Original WRITE bytes 6,736 7,240 6,352 6,672 #33

Performance • BAR3 remapping – Relative execution time 1.6x speed-up – Fails for some benchmarks • Lazy shadowing – 1.2x speed-up – Fails for some benchmarks • Overhead – up to 737x, 232x on average • 7.4x Boot slowdown hotspot lud srad mmul WRITE bytes 659,664 662,544 666,784 660,832 Original WRITE bytes 6,736 7,240 6,352 6,672 #34

Full Virtualization for GPUs Reconsidered Revisit -- Suzuki, Yusuke, - PowerPoint PPT Presentation

Full Virtualization for GPUs Reconsidered Revisit -- Suzuki, Yusuke, et al. GPUvm: Why not virtualizing GPUs at the hypervisor?. USENIX ATC 14. Hangchen Yu 1 , Christopher J. Rossbach 1,2 1 The University of Texas at Austin 2 VMware

full year results full year results full year results full full year results full year results full

Virtualization Virtualization Memory virtualization Process feels like it has its own

Introduction to Virtual Machines Carl Waldspurger (SB SM 89 PhD 95) VMware R&D Overview

AMD Pacifica Virtualization Technology AMD Unveils Virtualization Platform AMD Pacifica

KVM MMU Virtualization Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Index What is MMU

Virtualization and SDN Applications 2 Virtualization Network Virtualization Sharing

Virtualization What is Virtualization? Virtualization is the simulation of the software and/

EUROPA: Efficient User-Mode Packet Forwarding in Network Virtualization Virtualization Yong

Virtualization. A dream within a dream Type 1 Virtualization Hypervisor run on bare

Shareholder Value Reconsidered Simon Deakin and Ajit Singh Centre for Business Research

Why use GPUs for graph processing? FOSDEM 2020 2 GPUs and Graphs Graphs GPUs Found

IO Virtualization Kedar & Ozzie Overview Benefits Challenges Full Virtualization

FULL YEAR RESULTS FULL YEAR RESULTS. 2017 FULL YEAR RESULTS FULL YEAR RESULTS . 2017 . 2017 .

Linux Virtualization Kir Kolyshkin <kir@openvz.org> OpenVZ project manager What is

Virtualization and High Availability Mika Karlstedt AMICT'08 May 2008 Faculty of Science

Virtualization in Fedora Virtualization in Fedora (KVM based) (KVM based) Kashyap Chamarthy

On Overlapping Communication and File I/O in Collective Write Operation Raafat Feki and Edgar

Prio: Private, Robust, and Efficient Computation of Aggregate Statistics Henry Corrigan-Gibbs and

Collective Rationality in Graph Aggregation Ulle Endriss Institute for Logic, Language and

The dCacheBillingAggregator Gregory J. Sharp Daniel S. Riley Overview The dCache file system

Datadog: A Real-Time Metrics Database for Trillions of Points/Day Ian NOWLAND

Demand Management from an Aggregator's Perspective David Brewster, President May 21, 2009

A Convenient Framework for Efficient Parallel Multipass Algorithms Markus Weimer Joint Work with

Address Subcommittee Meeting May 10, 2017 1:00 2:30 PM Eastern U.S. Dept. of Transportation

Full Virtualization for GPUs Reconsidered Revisit -- Suzuki, Yusuke, - PowerPoint PPT Presentation

Full Virtualization for GPUs Reconsidered Revisit -- Suzuki, Yusuke, et al. GPUvm: Why not virtualizing GPUs at the hypervisor?. USENIX ATC 14. Hangchen Yu 1 , Christopher J. Rossbach 1,2 1 The University of Texas at Austin 2 VMware

full year results full year results full year results full full year results full year results full

Virtualization Virtualization Memory virtualization Process feels like it has its own

Introduction to Virtual Machines Carl Waldspurger (SB SM 89 PhD 95) VMware R&amp;D Overview

AMD Pacifica Virtualization Technology AMD Unveils Virtualization Platform AMD Pacifica

KVM MMU Virtualization Xiao Guangrong &lt;xiaoguangrong@cn.fujitsu.com&gt; Index What is MMU

Virtualization and SDN Applications 2 Virtualization Network Virtualization Sharing

Virtualization What is Virtualization? Virtualization is the simulation of the software and/

EUROPA: Efficient User-Mode Packet Forwarding in Network Virtualization Virtualization Yong

Virtualization. A dream within a dream Type 1 Virtualization Hypervisor run on bare

Shareholder Value Reconsidered Simon Deakin and Ajit Singh Centre for Business Research

Why use GPUs for graph processing? FOSDEM 2020 2 GPUs and Graphs Graphs GPUs Found

IO Virtualization Kedar &amp; Ozzie Overview Benefits Challenges Full Virtualization

FULL YEAR RESULTS FULL YEAR RESULTS. 2017 FULL YEAR RESULTS FULL YEAR RESULTS . 2017 . 2017 .

Linux Virtualization Kir Kolyshkin &lt;kir@openvz.org&gt; OpenVZ project manager What is

Virtualization and High Availability Mika Karlstedt AMICT'08 May 2008 Faculty of Science

Virtualization in Fedora Virtualization in Fedora (KVM based) (KVM based) Kashyap Chamarthy

On Overlapping Communication and File I/O in Collective Write Operation Raafat Feki and Edgar

Prio: Private, Robust, and Efficient Computation of Aggregate Statistics Henry Corrigan-Gibbs and

Collective Rationality in Graph Aggregation Ulle Endriss Institute for Logic, Language and

The dCacheBillingAggregator Gregory J. Sharp Daniel S. Riley Overview The dCache file system

Datadog: A Real-Time Metrics Database for Trillions of Points/Day Ian NOWLAND

Demand Management from an Aggregator's Perspective David Brewster, President May 21, 2009

A Convenient Framework for Efficient Parallel Multipass Algorithms Markus Weimer Joint Work with

Address Subcommittee Meeting May 10, 2017 1:00 2:30 PM Eastern U.S. Dept. of Transportation

Introduction to Virtual Machines Carl Waldspurger (SB SM 89 PhD 95) VMware R&D Overview

KVM MMU Virtualization Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Index What is MMU

IO Virtualization Kedar & Ozzie Overview Benefits Challenges Full Virtualization

Linux Virtualization Kir Kolyshkin <kir@openvz.org> OpenVZ project manager What is