SLIDE 1

Automatic Virtualization of Accelerators

Hangchen Yu, Arthur Michener Peters, Amogh Akshintala, Christopher J. Rossbach

HotOS’19 13 May 2019

SLIDE 2

  • H. Yu, A. M. Peters, A. Akshintala and C. J. Rossbach, AvA, HotOS’19

Accelerators are not virtualized

  • Cloud computing relies on virtualization

– Consolidation, Elasticity, …

  • Most resources virtualized

– CPUs, Memory, I/O devices

  • Except Accelerators

– Dedicated to VMs

– Underutilized

[Figure labels: VM1, VM2]

SLIDE 3

Explosion of Accelerators and APIs

  • 15 new accelerators in 3.5 years

– Google TPUs, Intel QuickAssist, …

  • Many important APIs

– TensorFlow, TF Lite, QuickAssist, ...

  • Accelerator stacks

– Proprietary

– Highly specialized

SLIDE 4

Traditional Technology Stacks

[Diagram: layered stack, top to bottom]

  • Application

  • Public API: Stream, Socket, …, DNS

  • User-mode Library

  • Standard OS Interfaces: File, Socket, …, syscall

  • Driver

  • Hardware Interface: MMIO, INTR, …

  • Hardware

SLIDE 5

Accelerator Silos

[Diagram: accelerator stack, top to bottom]

  • Application

  • Public API

  • User-mode Library

  • Driver

  • Hardware Interface

  • Hardware

Proprietary Interfaces: ioctl, MMIO, DMA, INTR

SLIDE 6

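Accelerator Silos

[Diagram: accelerator stack, top to bottom, with layers grouped under the label "Silo"]

  • Application

  • Public API

  • User-mode Library

  • Driver

  • Hardware Interface

  • Hardware

Proprietary Interfaces: ioctl, MMIO, DMA, INTR

Additional labels: API, Silo, Hardware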

SLIDE 7

API Remoting

[Diagram: call path]

VM: Application → Custom User-mode Library

Custom API Server → Vendor Library → Vendor Driver → Accelerator

Other labels: Silo, Hypervisor
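The API remoting path on this slide can be sketched in C. This is only an illustration: `vendor_add` and `vendor_mul` stand in for a real vendor library, and the `request` layout, the command IDs, and every function name are hypothetical. A guest-side stub packs its arguments into a request, and the API server looks up the matching vendor function in a dispatch table.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical vendor functions standing in for a real accelerator library. */
static int vendor_add(int a, int b) { return a + b; }
static int vendor_mul(int a, int b) { return a * b; }

typedef int (*vendor_fn)(int, int);

/* Hypothetical command IDs shared by the guest library and the API server. */
enum { CMD_ADD = 0, CMD_MUL = 1, CMD_COUNT };

/* API server dispatch table: command ID -> vendor function. */
static const vendor_fn dispatch_table[CMD_COUNT] = {
    [CMD_ADD] = vendor_add,
    [CMD_MUL] = vendor_mul,
};

/* Hypothetical request format carried between guest and server. */
typedef struct {
    int command_id;
    int a;
    int b;
    int result;
    int status;   /* 0 on success, -1 on unknown command */
} request;

/* API server side: validate the command, call the vendor function. */
void api_server_handle(request *req) {
    assert(req != NULL);
    if (req->command_id < 0 || req->command_id >= CMD_COUNT) {
        req->status = -1;
        return;
    }
    req->result = dispatch_table[req->command_id](req->a, req->b);
    req->status = 0;
}

/* Guest side: stubs with the same signatures as the vendor functions.
 * A real system would ship the request out of the VM instead of
 * calling the server directly. */
int remote_add(int a, int b) {
    request req = { CMD_ADD, a, b, 0, 0 };
    api_server_handle(&req);
    return req.result;
}

int remote_mul(int a, int b) {
    request req = { CMD_MUL, a, b, 0, 0 };
    api_server_handle(&req);
    return req.result;
}
```

An application linked against `remote_add` instead of `vendor_add` needs no source changes, which is the compatibility property API remoting aims for.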

SLIDE 8

SVGA2: Para-virtual GPU

[Diagram: call path]

VM: Application → Custom User-mode Library → Custom Guest Driver

Hypervisor: Custom Virtual GPU

Custom API Server → Vendor Library → Vendor Driver → Hardware

Other label: Silo

SLIDE 9

Para-virtual API Stack

CAvA: Compiler for Automatic Virtualization of Accelerators

[Diagram: CAvA takes CUDA.h as input and generates the para-virtual API stack]

VM: Application → Generated User-mode Library → AvA Guest Driver

Hypervisor: AvA Virtual Device

Generated API Server → Vendor Library → Vendor Driver → Accelerators

Other label: Silo
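In this stack, calls leave the VM through the AvA guest driver and virtual device, so the hypervisor sits on the path of every API call. The C sketch below shows one thing such a position could allow, a per-VM call quota. The quota policy, the `hypervisor` and `command` structures, and `hypervisor_route` are all hypothetical and are not taken from AvA.

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_VMS = 4 };

/* Hypothetical command header: which VM issued which API call. */
typedef struct {
    int vm_id;
    int command_id;
} command;

/* Hypothetical hypervisor state: remaining call budget per VM. */
typedef struct {
    int calls_left[MAX_VMS];
    int forwarded;   /* number of commands passed on to the API server */
} hypervisor;

/* Returns 1 if the command is forwarded, 0 if it is rejected.
 * Because all traffic crosses this function, the policy cannot be
 * bypassed by the guest. */
int hypervisor_route(hypervisor *hv, const command *cmd) {
    assert(hv != NULL && cmd != NULL);
    if (cmd->vm_id < 0 || cmd->vm_id >= MAX_VMS) {
        return 0;   /* unknown VM */
    }
    if (hv->calls_left[cmd->vm_id] <= 0) {
        return 0;   /* quota exhausted */
    }
    hv->calls_left[cmd->vm_id]--;
    hv->forwarded++;
    return 1;
}
```

A quota is only one possible policy; the same hook could hold a scheduler or an accounting layer.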

SLIDE 10

AvA Toolchain

[Diagram: toolchain flow] Accelerator API.h → CAvA → Skeletal API Spec. → Developer → API Specification → CAvA → Para-virtual API Stack

SLIDE 11

AvA Toolchain

[Diagram: toolchain flow] Accelerator API.h → CAvA → Skeletal API Spec. → Developer → API Specification → CAvA → Para-virtual API Stack

    ncStatus_t ncGlobalGetOption(
        int option, void *data, int *dataLength);

SLIDE 12

AvA Toolchain

[Diagram: toolchain flow] Accelerator API.h → CAvA → Skeletal API Spec. → Developer → API Specification → CAvA → Para-virtual API Stack

    ncStatus_t ncGlobalGetOption(
        int option, void *data, int *dataLength);

    ava_argument(data) {
        ava_input; ava_output; ava_buffer(1);
    }

SLIDE 13

AvA Toolchain

[Diagram: toolchain flow] Accelerator API.h → CAvA → Skeletal API Spec. → Developer → API Specification → CAvA → Para-virtual API Stack

    ncStatus_t ncGlobalGetOption(
        int option, void *data, int *dataLength);

    ava_argument(data) {
        ava_input; ava_output; ava_buffer(1);
    }

    ava_argument(data) {
        ava_output; ava_buffer(*dataLength);
    }
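One way to read these annotations is as a description of how many bytes must travel in each direction for each pointer argument. The C sketch below encodes that reading. The `arg_desc` type and the helper names are hypothetical, and it assumes that the one-element input/output annotation is meant for `dataLength` while the output buffer of `*dataLength` bytes is `data`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor mirroring ava_input / ava_output / ava_buffer(n). */
typedef struct {
    int    is_input;     /* copied guest -> API server before the call */
    int    is_output;    /* copied API server -> guest after the call  */
    size_t elem_size;    /* size of one element in bytes               */
    size_t elem_count;   /* the n in ava_buffer(n)                     */
} arg_desc;

/* Bytes that must be sent to the API server for this argument. */
size_t bytes_to_server(const arg_desc *d) {
    return d->is_input ? d->elem_size * d->elem_count : 0;
}

/* Bytes that must be copied back to the guest for this argument. */
size_t bytes_to_guest(const arg_desc *d) {
    return d->is_output ? d->elem_size * d->elem_count : 0;
}

/* Assumed reading: dataLength is a single int, both input and output. */
arg_desc describe_data_length(void) {
    return (arg_desc){ 1, 1, sizeof(int), 1 };
}

/* Assumed reading: data is an output-only buffer of *dataLength bytes. */
arg_desc describe_data(const int *dataLength) {
    assert(dataLength != NULL && *dataLength >= 0);
    return (arg_desc){ 0, 1, 1, (size_t)*dataLength };
}
```

With descriptors like these, a generator has what it needs to emit copy code for a `void *` argument whose size the C header alone does not reveal.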

SLIDE 14

AvA Toolchain

[Diagram: toolchain flow] Accelerator API.h → CAvA → Skeletal API Spec. → Developer → API Specification → CAvA → Para-virtual API Stack

    ncStatus_t ncGlobalGetOption(
        int option, void *data, int *dataLength);

    ava_argument(data) {
        ava_input; ava_output; ava_buffer(1);
    }

    ava_argument(data) {
        ava_output; ava_buffer(*dataLength);
    }

    ncStatus_t ncGlobalGetOption(...) {
        ...
        cmd = new_command(...);
        cmd->api_id = MVNC_API;
        cmd->command_id = NC_GLOBAL_GET_OPTION;
        ...
        send_command(cmd);
        wait_for_reply(cmd);
        ...
    }
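The elided parts of the stub can be filled in many ways. The C sketch below is one self-contained guess, not AvA's generated code. It keeps the names `new_command`, `send_command`, `wait_for_reply`, `MVNC_API`, and `NC_GLOBAL_GET_OPTION` from the slide, but their bodies and values, the `command` layout, and `fake_api_server` are assumptions. The stub is renamed `stub_ncGlobalGetOption` to mark it as a stand-in.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumed values; the slide names these identifiers but not their values. */
enum { MVNC_API = 1, NC_GLOBAL_GET_OPTION = 7, PAYLOAD_MAX = 64 };
typedef int ncStatus_t;   /* stand-in for the NCSDK status type */

/* Hypothetical command layout. */
typedef struct {
    int        api_id;
    int        command_id;
    int        option;
    int        data_length;          /* in: capacity, out: bytes written */
    char       payload[PAYLOAD_MAX]; /* carries the output buffer        */
    ncStatus_t status;
    int        replied;
} command;

static command *new_command(void) {
    return calloc(1, sizeof(command));
}

/* Stand-in for the API server: answers with a fixed 4-byte value. */
static void fake_api_server(command *cmd) {
    static const char reply[4] = { 1, 2, 3, 4 };
    if (cmd->api_id != MVNC_API ||
        cmd->command_id != NC_GLOBAL_GET_OPTION ||
        cmd->data_length < 4) {
        cmd->status = -1;
    } else {
        memcpy(cmd->payload, reply, sizeof reply);
        cmd->data_length = 4;
        cmd->status = 0;
    }
    cmd->replied = 1;
}

/* In-process transport; a real stack would leave the VM here. */
static void send_command(command *cmd)   { fake_api_server(cmd); }
static void wait_for_reply(command *cmd) { assert(cmd->replied); }

ncStatus_t stub_ncGlobalGetOption(int option, void *data, int *dataLength) {
    if (data == NULL || dataLength == NULL || *dataLength < 0) return -1;
    command *cmd = new_command();
    if (cmd == NULL) return -1;

    int capacity = *dataLength;
    if (capacity > PAYLOAD_MAX) capacity = PAYLOAD_MAX;

    cmd->api_id = MVNC_API;
    cmd->command_id = NC_GLOBAL_GET_OPTION;
    cmd->option = option;
    cmd->data_length = capacity;

    send_command(cmd);
    wait_for_reply(cmd);

    ncStatus_t status = cmd->status;
    if (status == 0) {
        /* Copy back only the bytes the server reported writing. */
        memcpy(data, cmd->payload, (size_t)cmd->data_length);
        *dataLength = cmd->data_length;
    }
    free(cmd);
    return status;
}
```

The copy-back step is where the buffer annotations matter: without a stated length for `data`, the stub could not know how many bytes to return to the caller.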

SLIDE 15

Preliminary Experiences

  • APIs

– OpenCL, CUDA, HIP, TensorFlow C, NCSDK, QAT, …

  • Devices

– GPUs, Intel Movidius NCS, Intel QAT, FPGA applications, …

  • Overhead measurements

– 5.6% for CUDA; 7% for OpenCL, excluding myocyte (call-intensive)

– Compare to 100× for GPUvm

SLIDE 16

Preliminary Development Effort

System  | Type                                | APIs | LoC (#Funcs)                                         | Time   | Difficulty
GPUvm   | Full-virt                           | 1    | 20 000                                               | Years  | ★★★★
SVGA2   | Para-virt                           | 2    | MANY!                                                | Years  | ★★★★
GvirtuS | API Remoting                        | 7    | CUDA: 3 011 (71); OpenCL: 2 531 (22); ~60 / Function | Months | ★★
AvA     | Automatic Para-virtual API Remoting | 9    | CUDA: 221 (16); OpenCL: 835 (37); ~20 / Function     | Days   | ★
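The "~60 / Function" and "~20 / Function" figures appear consistent with dividing the summed CUDA and OpenCL line counts by the summed function counts. That reading is an assumption, and the helper below (a hypothetical `loc_per_function`) simply carries out that arithmetic on the numbers from the table.

```c
#include <assert.h>

/* Average lines of code per function over n API specifications:
 * (sum of line counts) / (sum of function counts). */
double loc_per_function(const int *loc, const int *funcs, int n) {
    assert(loc != 0 && funcs != 0 && n > 0);
    int total_loc = 0;
    int total_funcs = 0;
    for (int i = 0; i < n; i++) {
        total_loc += loc[i];
        total_funcs += funcs[i];
    }
    assert(total_funcs > 0);
    return (double)total_loc / total_funcs;
}
```

For GvirtuS this gives (3011 + 2531) / (71 + 22) = 5542 / 93, a little under 60. For AvA it gives (221 + 835) / (16 + 37) = 1056 / 53, a little under 20.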

SLIDE 17

Continuing Work

  • Inferring more information about the API

– API documentation

– Programs which use the API

– API implementations

SLIDE 18

AvA: Automatic Virtualization of Accelerators

  • Silos → API remoting is the only viable accelerator virtualization technique

  • AvA compensates for the drawbacks of API remoting:

– Compatibility, with automation

– Interposition, with hypervisor-mediated transport

  • A single developer can virtualize a new API/device in days

Thank you && Debate