Automatic Virtualization of Accelerators
Hangchen Yu, Arthur Michener Peters, Amogh Akshintala, Christopher J. Rossbach
HotOS’19 13 May 2019
Accelerators are not virtualized
Cloud computing relies on virtualization
– Consolidation, …
#2
– CPUs, Memory, I/O devices
– Dedicated to VMs
– Underutilized
#3
– TensorFlow, TF Lite, QuickAssist, ...
– Proprietary
– Highly specialized
#6
[Figure: Software stack for a conventional device. Application → Public API (Stream, Socket, …, DNS) → User-mode Library → Standard OS Interfaces (File, Socket, …, syscall) → Driver → Hardware Interface (MMIO, INTR, …) → Hardware]
#9
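[Figure: Software stack for an accelerator. Application → Public API → User-mode Library → Driver → Hardware Interface → Hardware; below the public API, the layers communicate through proprietary interfaces (ioctl, MMIO, DMA, INTR)]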
#11
[Figure: Silo. The same accelerator stack (Application, Public API, User-mode Library, Driver, Hardware Interface, Hardware), with its proprietary interfaces (ioctl, MMIO, DMA, INTR), drawn as a single vertically integrated silo]
#14
[Figure: API remoting. Inside the VM: Application → Custom User-mode Library. Across the Hypervisor: Custom API Server → Vendor Library → Vendor Driver → Accelerator (silo)]
#15
[Figure: Para-virtualization. Inside the VM: Application → Custom User-mode Library → Custom Guest Driver → Custom Virtual GPU. Across the Hypervisor: Custom API Server → Vendor Library → Vendor Driver → Hardware (silo)]
#16
[Figure: AvA. Inside the VM: Application → Generated User-mode Library → AvA Guest Driver → AvA Virtual Device. Across the Hypervisor: Generated API Server → Vendor Library → Vendor Driver → Accelerators (silo). The generated components are produced by CAvA]
#17
[Figure: CAvA workflow. Accelerator API.h → CAvA → Skeletal API Spec. → Developer → API Specification → CAvA → Para-virtual API Stack]
#18
ncStatus_t ncGlobalGetOption(int option, void *data, int *dataLength);
#19
Skeletal API spec. generated by CAvA:
ava_argument(data) { ava_input; ava_output; ava_buffer(1); }
#20
API specification refined by the developer:
ava_argument(data) { ava_output; ava_buffer(*dataLength); }
#21
Generated guest-side stub (excerpt):
ncStatus_t ncGlobalGetOption(...) {
    ...
    cmd = new_command(...);
    cmd->api_id = MVNC_API;
    cmd->command_id = NC_GLOBAL_GET_OPTION;
    ...
    send_command(cmd);
    wait_for_reply(cmd);
    ...
}
#23
– OpenCL, CUDA, HIP, TensorFlow C, NCSDK, QAT, …
– GPUs, Intel Movidius NCS, Intel QAT, FPGA applications, …
– Overhead: 5.6% for CUDA; 7% for OpenCL, excluding myocyte (call-intensive)
– Compared to 100× for GPUvm
#24
| System | Type | APIs | LoC (#Funcs) | Time | Difficulty |
|---|---|---|---|---|---|
| GPUvm | Full virtualization | 1 | 20,000 | Years | ★★★★ |
| SVGA2 | Para-virtualization | 2 | MANY! | Years | ★★★★ |
| GvirtuS | API remoting | 7 | CUDA: 3,011 (71); OpenCL: 2,531 (22); ~60 per function | Months | ★★ |
| AvA | Automatic para-virtual API remoting | 9 | CUDA: 221 (16); OpenCL: 835 (37); ~20 per function | Days | ★ |
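Assuming the "~60 / Function" and "~20 / Function" figures are the combined CUDA and OpenCL line counts divided by the combined function counts, the table's numbers reproduce them:

```latex
\begin{aligned}
\text{GvirtuS:}\quad \frac{3011 + 2531}{71 + 22} &= \frac{5542}{93} \approx 59.6 \approx 60 \ \text{LoC per function} \\
\text{AvA:}\quad \frac{221 + 835}{16 + 37} &= \frac{1056}{53} \approx 19.9 \approx 20 \ \text{LoC per function}
\end{aligned}
```

Under that reading, AvA needs roughly 3× fewer lines per virtualized function than GvirtuS.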
#25
#26
Compensates for:
– A single developer can virtualize a new API/device in days