Trillium: The Code is the IR



slide-1
SLIDE 1

Trillium: The Code is the IR

Amogh Akshintala, Hangchen Yu, Arthur M. Peters, Christopher J. Rossbach

slide-2
SLIDE 2


  • A. Akshintala, H. Yu, A. M. Peters and C. J. Rossbach, Trillium, Virt’19

Brief Overview

  • GPGPU Virtualization
  • End-to-end comparison of prior approaches
  • Lessons learnt:

○ Virtual ISA unnecessary
○ Para-virtual API remoting only feasible option

slide-3
SLIDE 3


General purpose computing on GPUs

[Figure: performance gain of GPU over CPU (↑↑↑); application areas: Scientific Computing, Machine Learning]

slide-4
SLIDE 4


The need to virtualize GPUs

[Figure: performance vs. cost]

  • p3.2xlarge: 1x V100, $2,200/mo
  • p2.xlarge: 1x K80, $650/mo
  • g3s.xlarge: 1x M60, $540/mo

Credits to BitFusion Inc.

slide-5
SLIDE 5


The need to virtualize GPUs

[Figure: performance vs. cost]

  • p3.2xlarge: 1x V100, $2,200/mo
  • p2.xlarge: 1x K80, $650/mo
  • g3s.xlarge: 1x M60, $540/mo
  • Virtual V100: 0.75x V100, $1,500
  • Virtual V100: 0.5x V100, $1,000

Credits to BitFusion Inc.
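
The pricing on this slide can be normalized to a cost per full-V100-equivalent, which makes the fractional offers easier to compare with the dedicated instance. The short sketch below does only that arithmetic. It assumes the virtual V100 prices are monthly like the others (the slide omits the "/mo" on those two entries), and the dictionary keys are labels chosen here for illustration.

```python
# (fraction of a V100, price in USD); virtual prices assumed to be per month.
offers = {
    "p3.2xlarge (1x V100)": (1.00, 2200),
    "Virtual V100 (0.75x)": (0.75, 1500),
    "Virtual V100 (0.5x)": (0.50, 1000),
}

# Price divided by the fraction of the GPU gives the cost of one full V100.
cost_per_full_v100 = {
    name: price / fraction for name, (fraction, price) in offers.items()
}

for name, cost in cost_per_full_v100.items():
    print(f"{name}: ${cost:,.0f} per full-V100-equivalent")
```

Under these assumptions both fractional offers work out to $2,000 per full-V100-equivalent, against $2,200 for the dedicated instance, while letting a user pay only for the fraction they need.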

slide-6
SLIDE 6


Existing techniques are impractical

Everybody sacrifices something

Properties compared: Sharing, Interposition, Isolation, Compatibility, Slowdown

  • Pass Through: ❌, 1x slowdown
  • Full-virtualization
  • User-mode API Remoting
  • Para-virtualization

slide-7
SLIDE 7


Full-virtualization

[Diagram labels: Native stack, Hypervisor, Virtual GPU, Vendor Library, Vendor Driver, GPU, VM, Application, Vendor Library, Vendor Driver]

Sharing  Interposition  Isolation  Compatibility  Slowdown
✔        ✔              ✔          ❌             100x

slide-8
SLIDE 8


User-mode API Remoting

[Diagram labels: Native stack, Hypervisor, Custom API Server, Vendor Library, Vendor Driver, GPU, VM, Application, Custom User-mode Library]

Sharing  Interposition  Isolation  Compatibility  Slowdown
✔        ❌             ✔/❌       ❌             1.5x
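
As a rough illustration of user-mode API remoting, the sketch below has a guest-side library that exposes the same call as the vendor library but forwards each call, serialized, to a host-side API server. Everything here is hypothetical: `FakeVendorLibrary`, `ApiServer`, `GuestLibrary`, and the JSON wire format are stand-ins invented for this example, not part of any system named in the talk, and the transport is a plain in-process function call.

```python
import json


class FakeVendorLibrary:
    """Stand-in for the vendor library on the host (not a real GPU API)."""

    def vector_add(self, a, b):
        return [x + y for x, y in zip(a, b)]


class ApiServer:
    """Host side: decode a request and call into the vendor library."""

    def __init__(self, library):
        self._library = library

    def handle(self, request_bytes):
        request = json.loads(request_bytes)
        func = getattr(self._library, request["func"])
        result = func(*request["args"])
        return json.dumps({"result": result}).encode()


class GuestLibrary:
    """Guest side: same call surface as the vendor library, but forwarded."""

    def __init__(self, transport):
        self._transport = transport

    def vector_add(self, a, b):
        request = json.dumps({"func": "vector_add", "args": [a, b]}).encode()
        reply = self._transport(request)
        return json.loads(reply)["result"]


server = ApiServer(FakeVendorLibrary())
guest = GuestLibrary(transport=server.handle)
print(guest.vector_add([1, 2, 3], [10, 20, 30]))  # [11, 22, 33]
```

Note that the guest library talks to the server directly in this sketch. Nothing sits between the two that a hypervisor controls, which is one way to read the ❌ under Interposition for this technique.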

slide-9
SLIDE 9


Para-virtualization

[Diagram labels: Native stack, Hypervisor, Vendor Library, Vendor Driver, GPU, VM, Application, Custom User-mode Library, Interface Translator, Custom Guest Driver, Custom Virtual GPU]

Sharing  Interposition  Isolation  Compatibility  Slowdown
✔        ✔              ✔          ❌             10x

slide-10
SLIDE 10


End-to-end performance comparison

  • Y. Suzuki, S. Kato, H. Yamada, K. Kono, “GPUvm: Why Not Virtualizing GPUs at the Hypervisor?”, ATC’14
slide-11
SLIDE 11


More details in the paper...

slide-12
SLIDE 12


Everybody sacrifices something

Technique                Sharing  Interposition  Isolation  Compatibility  Slowdown
Full-virtualization      ✔        ✔              ✔          ❌             100x
User-mode API Remoting   ✔        ❌             ✔/❌       ❌             1.5x
Para-virtualization      ✔        ✔              ✔          ❌             10x

slide-13
SLIDE 13


Para-virtual API-remoting...

Technique                  Sharing  Interposition  Isolation  Compatibility  Slowdown
Full-virtualization        ✔        ✔              ✔          ❌             100x
User-mode API Remoting     ✔        ❌             ✔/❌       ❌             1.5x
Para-virtualization        ✔        ✔              ✔          ❌             10x
Para-virtual API remoting  ✔        ✔              ✔          (not shown)    1.5x

slide-14
SLIDE 14


SVGA2: TGSI as IR and vISA

TGSI, Tungsten Graphics Shader Infrastructure, is an intermediate language for describing shaders.

GLSL Code → TGSI → Native ISA

[Diagram labels: Native stack, Hypervisor, Vendor Library, Vendor Driver, GPU, VM, Application, Translator, OpenGL/DX11 Libraries, vmwgfx.ko, SVGA Device]

slide-15
SLIDE 15


Xen-SVGA: computing support

[Diagram labels: Native stack, Hypervisor, Vendor Library, GPU, VM, Application, Mesa3D OpenCL Library, Nouveau, OpenCL Code, LLVM Compiler, TGSI, Native ISA, Xen-SVGA Device]

slide-17
SLIDE 17


vISA - Boon or bane?

PTX is NVIDIA's low-level parallel thread execution virtual machine and instruction set architecture (ISA).

slide-18
SLIDE 18


Trillium: Eliminates vISA

[Diagram labels: Native stack, Hypervisor, Vendor Library, Nouveau, GPU, VM, Application, Mesa3D OpenCL Library, OpenCL Code, Native ISA, Trillium Device]
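
One way to picture what eliminating the vISA changes is to compare what crosses the VM boundary in each design. The toy sketch below assumes, as the talk title suggests, that without a vISA the kernel source itself is shipped to the host and compiled there in one step. The "compilers" are trivial token taggers and every function name is hypothetical; none of this reflects the actual Mesa3D, TGSI, or Nouveau code paths.

```python
def compile_to_visa(source):
    """Toy guest-side front end: source -> virtual ISA tokens."""
    return [("visa", token) for token in source.split()]


def visa_to_native(visa_program):
    """Toy host-side back end: virtual ISA tokens -> native ISA tokens."""
    return [("native", token) for _, token in visa_program]


def compile_to_native(source):
    """Toy host-side compiler: source -> native ISA tokens in one step."""
    return [("native", token) for token in source.split()]


def visa_path(source):
    shipped = compile_to_visa(source)  # crosses the VM boundary as vISA
    return visa_to_native(shipped)     # second translation on the host


def source_path(source):
    shipped = source                   # crosses the VM boundary unchanged
    return compile_to_native(shipped)  # single translation on the host


kernel = "load add store"
print(visa_path(kernel) == source_path(kernel))  # True
```

In this toy model both paths produce the same native program, but the second one needs no intermediate format that the guest and host must agree on.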

slide-19
SLIDE 19


So are we done?

slide-20
SLIDE 20


Checkpoint

✓ Virtual ISA is unnecessary
✓ Para-virtual API remoting system performs better

Technique                Sharing  Interposition  Isolation  Compatibility  Slowdown
User-mode API Remoting   ✔        ❌             ✔/❌       ❌             1.5x
Para-virtualization      ✔        ✔              ✔          ❌             10x
Trillium                 ✔        ✔              ✔          ❌             2.4x

slide-21
SLIDE 21


Para-virtual API-remoting...

Technique                Sharing  Interposition  Isolation  Compatibility  Slowdown
User-mode API Remoting   ✔        ❌             ✔/❌       ❌             1.5x
Para-virtualization      ✔        ✔              ✔          ❌             1-10x
Trillium                 ✔        ✔              ✔          ❌             2.4x

✕ Interposing too low in the stack
✕ API-specific

slide-22
SLIDE 22


Para-virtual API-remoting...

Technique                Sharing  Interposition  Isolation  Compatibility  Slowdown
User-mode API Remoting   ✔        ❌             ✔/❌       ❌             1.5x
Para-virtualization      ✔        ✔              ✔          ❌             1-10x
Trillium                 ✔        ✔              ✔          ❌             2.4x
AvA                      ✔        ✔              ✔          ✔              <1.5x

✕ Interposing too low in the stack
✕ API-specific

slide-23
SLIDE 23


Para-virtual API Stack

Automatic virtualization of accelerators

[Diagram labels: Native stack, Hypervisor, CL.h + Annotations, Generated API Server, Generated User-mode Library, Vendor Library, Vendor Driver, Accelerators, VM, Application, AvA Guest Driver, AvA Virtual Device]

  • H. Yu, A. M. Peters, A. Akshintala, C. J. Rossbach, Automatic Virtualization of Accelerators, HotOS’19
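
The idea of generating both ends of the remoting stack from an annotated API description can be sketched in a few lines. The example below is only an illustration under loud assumptions: `SPEC` is a made-up stand-in for "CL.h + Annotations" (the real specification format is not shown in the slides), `VENDOR_LIBRARY` is a fake vendor library, and `virtual_device` is a hypothetical hop standing in for the hypervisor-managed path between guest and host.

```python
# Hypothetical annotated API description; the real format is not shown here.
SPEC = {
    "scale": {"params": ["values", "factor"]},
    "total": {"params": ["values"]},
}

# Stand-in for the vendor library on the host (not a real accelerator API).
VENDOR_LIBRARY = {
    "scale": lambda values, factor: [v * factor for v in values],
    "total": lambda values: sum(values),
}


def generate_api_server(spec, vendor_library):
    """Build a host-side dispatcher that validates calls against the spec."""
    def serve(name, args):
        if name not in spec:
            raise KeyError(f"unknown API function: {name}")
        expected = len(spec[name]["params"])
        if len(args) != expected:
            raise TypeError(f"{name} expects {expected} arguments, got {len(args)}")
        return vendor_library[name](*args)
    return serve


def generate_guest_library(spec, send):
    """Build one guest-side stub per function in the spec."""
    stubs = {}
    for name in spec:
        # Bind the current name as a default so each stub forwards its own call.
        def stub(*args, _name=name):
            return send(_name, list(args))
        stubs[name] = stub
    return stubs


api_server = generate_api_server(SPEC, VENDOR_LIBRARY)
call_log = []


def virtual_device(name, args):
    """Hypervisor-side hop: every forwarded call is visible here."""
    call_log.append(name)
    return api_server(name, args)


guest = generate_guest_library(SPEC, send=virtual_device)
print(guest["scale"]([1, 2], 3))  # [3, 6]
print(guest["total"]([1, 2, 3]))  # 6
print(call_log)                   # ['scale', 'total']
```

Because both the stubs and the dispatcher come from the same description, supporting another API in this sketch means writing another `SPEC`, not another hand-built stack. Routing every call through `virtual_device` is what keeps the calls observable at a single point.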

slide-24
SLIDE 24


Preliminary development effort

System  Type                                  APIs  LoC          Time   Difficulty
GPUvm   Full-virt                             1     20 000       Years  ★★★★
SVGA2   Para-virt                             2     MANY!        Years  ★★★★
AvA     Automatic Para-virtual API Remoting   9     OpenCL: 835  Days   ★

slide-25
SLIDE 25


Conclusion

Lessons

  • Virtual ISA is unnecessary

○ Decouple device virtualization from GPU ISA virtualization

  • Para-virtual API remoting can lead to better performance and properties

New para-virtual API remoting system: AvA

  • “No IR” enables interposition at user-mode APIs
  • Compatibility is regained