
Full Virtualization for GPUs Reconsidered

Full Virtualization for GPUs Reconsidered. Revisit: Suzuki, Yusuke, et al. "GPUvm: Why not virtualizing GPUs at the hypervisor?" USENIX ATC '14. Hangchen Yu (1), Christopher J. Rossbach (1, 2). (1) The University of Texas at Austin; (2) VMware


  1. Full Virtualization for GPUs Reconsidered
     Revisit: Suzuki, Yusuke, et al. "GPUvm: Why not virtualizing GPUs at the hypervisor?" USENIX ATC '14.
     Hangchen Yu (1), Christopher J. Rossbach (1, 2)
     (1) The University of Texas at Austin; (2) VMware Research Group

  2. Overview
     • Demands, introductions, challenges of virtual GPUs
     • Distinctive features of GPUvm
     • Re-evaluate GPUvm with additional benchmarks
       – Hard to set up the testbed
       – Some functionalities do not work
       – Over 200x overheads on average
       – Unfairness issue
       – Over 40% throughput loss

  3. Do we still need GPU virtualization?
     • Share GPUs in the datacenter
     • Different end-user demands
     • Hidden scenarios

  6. GPU Virtualization Challenges
     • Diverse hardware
     • Undocumented APIs
     • Closed-source GPUs and drivers
     • Deep graphics stack
     • Coupled layers
     • Significant overheads
     • Limited flexibility

  7. GPU Virtualization Comparisons
     Front-end
       • Device emulation: synthesizes host graphics operations
       • API remoting: forwards graphics API calls to an external graphics stack
     Back-end
       • Mediated-passthrough: dedicates a set of contexts
       • Passthrough: provides exclusive access

  8. GPU Virtualization Comparisons
     Comparison criteria: Performance, Fidelity, Multiplexing, Interposition, Complexity
     Front-end
       • Device emulation: synthesizes host graphics operations
       • API remoting: forwards graphics API calls to an external graphics stack
     Back-end
       • Mediated-passthrough: dedicates a set of contexts
       • Passthrough: provides exclusive access

  14. GPU Virtualization Examples
      Front-end
        • Device emulation: M. Dowty, VMware SVGA, SIGOPS-OSR '09
        • API remoting: J. Duato, rCUDA, HiPC '10, '11; G. Giunta, gVirtuS, European Conference on Parallel Processing '10
      Back-end
        • Mediated-passthrough: AMD MxGPU (FirePro), VMworld '15; NVIDIA GRID vGPU, '15; KVMGT (Intel GVT-g), '14
        • Passthrough: Amazon Elastic Compute Cloud (AWS EC2)

  15. GPUvm Features
      Similar approaches when virtualizing at hypervisor level
      Front-end
        • Device emulation: exposes a native device model to VMs
        • API remoting: forwards commands to the GPU virtual aggregator
      Back-end
        • Mediated-passthrough: passes through some operations (I/O requests) to hardware

  16. Full-virtualization vs. Para-virtualization
      • Para-virtualization: split device model
        [Diagram labels: Apps, GPU driver API, vGPU driver API, GPU driver, GPU; Front End, Back End]
      • Full-virtualization: trap-and-emulate
        [Diagram labels: Apps, GPU driver API, Device model, Hypervisor, GPU driver, GPU]

  17. Full-virtualization vs. Para-virtualization
      Comparison criteria: Performance, Interposition, Fidelity, Multiplexing
      • Para-virtualization: split device model
        [Diagram labels: Apps, GPU driver API, vGPU driver API, GPU driver, GPU; Front End, Back End]
      • Full-virtualization: trap-and-emulate
        [Diagram labels: Apps, GPU driver API, Device model, Hypervisor, GPU driver, GPU]

  18. Full-virtualization vs. Para-virtualization
      Comparison criteria: Performance, Interposition, Fidelity, Multiplexing
      • Para-virtualization: split device model
        [Diagram labels: Apps, vGPU driver, GPU driver API, vGPU driver API, Back End API, GPU driver, GPU; Front End, Back End]
      • Full-virtualization: trap-and-emulate
        [Diagram labels: Apps, Device model, GPU driver, GPU driver API, Hypervisor, Hypervisor API, GPU]
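
The trap-and-emulate side of this comparison can be illustrated with a toy model. This is a minimal sketch, not GPUvm code: `PhysicalGPU`, `DeviceModel`, `unmodified_driver`, the register offset `0x10`, and the per-VM window size `0x1000` are all hypothetical names and values chosen for illustration. It shows the property the slides call fidelity: the guest driver is the same code whether it talks to hardware or to the hypervisor's device model, and every access goes through the model.

```python
class PhysicalGPU:
    """Stand-in for real hardware: a flat register file."""

    def __init__(self):
        self.regs = {}

    def write(self, offset, value):
        self.regs[offset] = value

    def read(self, offset):
        return self.regs.get(offset, 0)


class DeviceModel:
    """Hypothetical hypervisor-side device model (trap-and-emulate)."""

    def __init__(self, gpu, vm_id):
        self.gpu = gpu
        self.vm_id = vm_id
        self.trapped = 0  # number of intercepted guest accesses

    def _translate(self, offset):
        # Assumption: each VM owns a private 0x1000-sized register window.
        return self.vm_id * 0x1000 + offset

    def write(self, offset, value):
        self.trapped += 1  # the access traps into the hypervisor
        self.gpu.write(self._translate(offset), value)

    def read(self, offset):
        self.trapped += 1
        return self.gpu.read(self._translate(offset))


def unmodified_driver(dev):
    """Guest driver code: identical for a real or an emulated device."""
    dev.write(0x10, 0xABCD)
    return dev.read(0x10)


gpu = PhysicalGPU()
vm0 = DeviceModel(gpu, vm_id=0)
vm1 = DeviceModel(gpu, vm_id=1)
result = unmodified_driver(vm0)  # same driver, now running on the device model
```

Each trapped access costs a transition into the hypervisor, which is where the overhead discussed on the later slides would come from in this simplified picture; a para-virtualized design would instead change the driver to batch requests to a back end.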

  19. Full Virtualization: A Reasonable Goal?
      • Full-featured vGPU (3D acceleration)
      • Strong isolation
      • Slow performance
      • Hard to map different GPUs
      [Diagram: full-virtualization, trap-and-emulate; labels: Apps, GPU driver API, Device model, Hypervisor, GPU driver, GPU]

  21. GPUvm Overview
      • Access aggregator

  24. GPUvm Overview
      • Access aggregator
      • Shadow channel
        – Mapped by a virtual channel
      • Shadow page table
      [Diagram: shadowing mechanism; labels: VM, Driver, Virtual Context, Virtual Channel (x2), Shadow Channel (x2), Shadow Page Table (x2)]

  28. GPUvm Overview
      • Access aggregator
      • Shadow channel
        – Mapped by a virtual channel
      • Shadow page table
      • Virtual scheduler
        – FIFO
        – CREDIT
        – BAND (bandwidth-aware non-preemptive device)
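
To make the difference between the first two scheduling policies concrete, here is a minimal sketch. It is not GPUvm's scheduler: `fifo_schedule`, `credit_schedule`, the `(vm, cost)` request format, and the starting credit of 100 are assumptions made for illustration, and credit replenishment is left out. BAND is omitted because the slides do not describe its mechanics. Requests are non-preemptive in both policies: once dispatched, a request runs to completion.

```python
from collections import deque


def fifo_schedule(requests):
    """Dispatch GPU requests strictly in arrival order."""
    return [vm for vm, _cost in requests]


def credit_schedule(requests, credit_per_vm=100):
    """Dispatch from the runnable VM with the most remaining credit.

    Each dispatched request deducts its cost from the owning VM's credit.
    Ties go to the VM whose first request arrived earliest.
    """
    queues = {}
    for vm, cost in requests:
        queues.setdefault(vm, deque()).append(cost)
    credits = {vm: credit_per_vm for vm in queues}

    order = []
    while any(queues.values()):
        runnable = [vm for vm in queues if queues[vm]]
        vm = max(runnable, key=lambda v: credits[v])
        credits[vm] -= queues[vm].popleft()
        order.append(vm)
    return order


# VM "A" submits a burst of three requests before VM "B" submits one.
requests = [("A", 30), ("A", 30), ("A", 30), ("B", 30)]
fifo_order = fifo_schedule(requests)      # B waits behind A's whole burst
credit_order = credit_schedule(requests)  # B runs after A's first request
```

Under FIFO a bursty VM can delay everyone behind it, which is one way the unfairness issue listed in the overview could arise; the credit policy interleaves VMs according to how much GPU time each has already consumed.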

  29. Why GPUvm?
      • Open-source (easier to analyze performance/mechanism)
      • Overheads
        – FV (36x), PV (1.9x)
      • Good open architecture (easier to upgrade/swap/optimize components)
        – Decoupled components
        – Native device model, virtual MMIO, shadow channels, shadow page tables, virtual schedulers
      • Not-so-good aspects (significant performance impact)
        – Interposes guest access to memory-mapped resources
        – Shadows expensive resources
      • Trade-off of hypervisor-level full-virtualization

  30. GPUvm Optimizations
      (BAR: MMIO through PCIe base address register)
      • Sync virtual & shadow channels
        – Intercept data accesses
        – BAR3 remapping
          • BAR3 accesses are passed-through
      • Sync guest & shadow page tables
        – GPU-side page faults
        – Lazy shadowing
          • Updates shadow page tables only when referenced
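
The lazy shadowing idea ("updates shadow page tables only when referenced") can be sketched as follows. This is an illustrative toy, not GPUvm's implementation: the class `ShadowedPageTable`, the fixed `0x1000` guest-to-host offset, and the assumption that resynchronizing means rescanning the whole guest table are all introduced here for the example.

```python
class ShadowedPageTable:
    """Toy guest page table with a hypervisor-maintained shadow copy."""

    def __init__(self, lazy):
        self.lazy = lazy
        self.guest = {}    # guest virtual page -> guest physical page
        self.shadow = {}   # guest virtual page -> host physical page
        self.stale = False
        self.scans = 0     # number of full guest-table scans performed

    def _translate(self, guest_phys):
        # Assumption: a fixed offset stands in for guest-to-host translation.
        return guest_phys + 0x1000

    def _rescan(self):
        # Rebuild the whole shadow table from the guest table.
        self.shadow = {v: self._translate(p) for v, p in self.guest.items()}
        self.scans += 1
        self.stale = False

    def guest_update(self, vpage, guest_phys):
        """The guest driver edits its page table."""
        self.guest[vpage] = guest_phys
        if self.lazy:
            self.stale = True  # defer the rescan until the table is used
        else:
            self._rescan()     # eager: resynchronize on every update

    def gpu_access(self, vpage):
        """The GPU references memory through the shadow table."""
        if self.stale:
            self._rescan()
        return self.shadow[vpage]


eager = ShadowedPageTable(lazy=False)
lazy = ShadowedPageTable(lazy=True)
for page in range(10):
    eager.guest_update(page, page * 16)
    lazy.guest_update(page, page * 16)

eager_result = eager.gpu_access(3)  # shadow is already current
lazy_result = lazy.gpu_access(3)    # first reference triggers one rescan
```

Both variants return the same translation, but in this run the eager table performs ten scans and the lazy one performs a single scan, at the moment of first reference.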

  32. Testbed
      • Specific hardware
        – NVIDIA Quadro 6000 NVC0
        – GF100GL vs. GF100 (GTX 480) (different region addresses)
      • Specific software
        – Fedora 16 (Kernel 3.6.5)
        – Xen HVM (4.2.0)
        – Gdev (commit 605e69e7)
        – GCC 4.6.3
        – NVCC 4.2
        – Boost 1.4.7

  33. Performance
      • BAR3 remapping
        – 1.6x speed-up
        – Fails for some benchmarks
      • Lazy shadowing
        – 1.2x speed-up
        – Fails for some benchmarks
      • Overhead
        – Up to 737x, 232x on average
      • 7.4x boot slowdown
      [Chart axis: relative execution time]

                             hotspot   lud       srad      mmul
      WRITE bytes            659,664   662,544   666,784   660,832
      Original WRITE bytes   6,736     7,240     6,352     6,672
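
The gap between the two WRITE-bytes rows is easier to read as a ratio. The short calculation below uses only the figures from the slide; it assumes the two rows are directly comparable byte counts for the same benchmark runs, which the slide does not state explicitly. The variable names are chosen for this example.

```python
benchmarks = ["hotspot", "lud", "srad", "mmul"]
write_bytes = [659_664, 662_544, 666_784, 660_832]
original_write_bytes = [6_736, 7_240, 6_352, 6_672]

# Ratio of the "WRITE bytes" row to the "Original WRITE bytes" row.
ratios = {
    name: written / original
    for name, written, original in zip(benchmarks, write_bytes, original_write_bytes)
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x")
```

For all four benchmarks the first row is roughly two orders of magnitude larger than the second.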
