Confidential Accelerators
Stavros Volos, Microsoft Research


SLIDE 1

Confidential Accelerators

Stavros Volos Microsoft Research

SLIDE 2

Accelerators Play Pivotal Role in Cloud

GPU, FPGA, AI accelerators (e.g., Brainwave, TPU)

Increasing compute performance and bandwidth

  • E.g., Nvidia V100 offers 14 TFLOPS & ~1 TB/s of Mem. BW
  • 100x faster than CPU

But accelerators are not designed with confidential computing primitives

Undesirable trade-off between security and performance

SLIDE 3

Confidential Accelerators

[Diagram: application code/data offloaded from the CPU side, past an untrusted OS and hypervisor, to the accelerator via a trusted runtime]

  • Securely offload code and data
  • Isolation from privileged attackers
  • Remote attestation

How do we design trusted execution environments (TEE) for accelerators?

SLIDE 4

Confidential GPUs‡

‡ Stavros Volos, Kapil Vaswani, and Rodrigo Bruno. Graviton: Trusted Execution Environments on GPUs. OSDI '18.

SLIDE 5


GPU 101 – Overview

[Diagram: CPU and host memory connect over the system bus, PCIe bridge, and PCIe bus to the GPU's PCIe host interface, command processor, DMA engines, L2 cache, and memory controller; the high-bandwidth in-package memory is trusted]

SLIDE 6

GPU 101 – Execution Environment

Execution is context-based

  • Context is a collection of resources akin to a CPU process
  • Resources include command queues and device memory

Kernels operate on data in device memory

  • Kernel is application code executed on GPU cores via API call
  • Device memory management (allocation, transfer) is explicit via API calls

GPU cores and DMA engines controlled via command channels

  • Kernel’s code/data transfer to device memory
  • Kernel dispatch to GPU
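The context/command-channel flow above can be sketched as a toy model. This is not the CUDA driver API; all names (`memcpy_h2d`, `launch_kernel`, `command_processor`) are illustrative stand-ins for the mechanisms the slide describes:

```python
from dataclasses import dataclass, field

@dataclass
class Command:
    op: str       # e.g. "memcpy_h2d", "launch"
    args: dict

@dataclass
class Context:
    """A context bundles the resources a CPU process owns on the GPU:
    device memory and a command queue (channel)."""
    device_memory: dict = field(default_factory=dict)   # addr -> data
    command_queue: list = field(default_factory=list)

def memcpy_h2d(ctx, addr, data):
    """Explicit host-to-device transfer: enqueue a command for a DMA engine."""
    ctx.command_queue.append(Command("memcpy_h2d", {"addr": addr, "data": data}))

def launch_kernel(ctx, kernel, addr):
    """Kernel dispatch: enqueue a command for the command processor."""
    ctx.command_queue.append(Command("launch", {"kernel": kernel, "addr": addr}))

def command_processor(ctx):
    """The GPU's command processor drains the queue and executes commands."""
    while ctx.command_queue:
        cmd = ctx.command_queue.pop(0)
        if cmd.op == "memcpy_h2d":
            ctx.device_memory[cmd.args["addr"]] = cmd.args["data"]
        elif cmd.op == "launch":
            data = ctx.device_memory[cmd.args["addr"]]
            ctx.device_memory[cmd.args["addr"]] = cmd.args["kernel"](data)

ctx = Context()
memcpy_h2d(ctx, 0x1000, [1, 2, 3])                       # explicit transfer
launch_kernel(ctx, lambda xs: [x * 2 for x in xs], 0x1000)  # kernel on device data
command_processor(ctx)
print(ctx.device_memory[0x1000])   # [2, 4, 6]
```

Note how the host never computes on the data directly: it only enqueues commands, and the command processor and DMA engines act on them.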
SLIDE 7

GPU 101 – Execution Environment

[Diagram: CPU-side context with command buffer and DMA buffer; GPU-side device memory]

  • 1. Device driver manages command & DMA buffers
  • Commands fetched by command processor
  • Data fetched by DMA engines
SLIDE 8

GPU 101 – Execution Environment

[Diagram: CPU-side context with command buffer; GPU-side device memory with channels, page directory, page tables, and DMA buffer]

  • 1. Device driver manages command & DMA buffers
  • Commands fetched by command processor
  • Data fetched by DMA engines
  • 2. Device driver controls memory management
  • GPUs rely on virtual memory and the channel abstraction for context isolation


Large attack vector if host OS and device driver are compromised
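Why driver-managed page tables are such a large attack vector can be shown with a toy model: whoever writes the page tables decides which physical pages a context can reach. All structures here are illustrative, not real GPU data structures:

```python
# Physical device memory: page number -> contents
device_memory = {0: b"ctxA secret", 1: b"ctxB data"}

# Per-context page tables (virtual page -> physical page),
# managed by the (untrusted) device driver
page_tables = {
    "ctxA": {0: 0},
    "ctxB": {0: 1},
}

def gpu_read(ctx: str, vpage: int) -> bytes:
    """Device memory access goes through the context's page table."""
    return device_memory[page_tables[ctx][vpage]]

assert gpu_read("ctxB", 0) == b"ctxB data"   # normal operation

# A compromised driver simply remaps ctxB's virtual page onto ctxA's
# physical page -- no exploit needed, just a page-table write:
page_tables["ctxB"][0] = 0
assert gpu_read("ctxB", 0) == b"ctxA secret"   # cross-context leak
```

Because isolation rests entirely on structures the driver controls, a compromised host OS or driver can read or tamper with any context's memory.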

SLIDE 9

TEE on GPUs (1/2)

Key idea: TEE takes the form of secure context

  • Cryptographically bound to user’s trusted GPU runtime
  • Isolated from all software on host and other contexts
  • Commands and DMA buffers receive ciphertext

Modest hardware complexity: extensions limited to command processor

  • New commands for context and memory management
  • Provision context’s key during context creation
  • Enforce strict context-level ownership of device memory
  • Keyed MACs passed from/to GPU runtime for authorization and validation
  • Crypto engine for authenticated encryption

Hardware changes NOT on the critical path of the GPU
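The keyed-MAC authorization step above can be sketched with HMAC-SHA256 from the Python standard library. Graviton uses authenticated encryption over command submissions; this toy shows only the integrity/authorization half, and the key-provisioning and function names are illustrative:

```python
import hashlib
import hmac

# Shared secret provisioned into the GPU during secure context creation
CONTEXT_KEY = b"provisioned-during-secure-context-creation"

def runtime_submit(command: bytes) -> tuple:
    """The trusted GPU runtime tags a command before handing it to the
    (untrusted) driver, which places it in the command buffer."""
    tag = hmac.new(CONTEXT_KEY, command, hashlib.sha256).digest()
    return command, tag

def command_processor_verify(command: bytes, tag: bytes) -> bool:
    """The GPU command processor recomputes the MAC with the context's
    key; a command tampered with by the host OS or driver fails."""
    expected = hmac.new(CONTEXT_KEY, command, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)

cmd, tag = runtime_submit(b"launch kernel @0x1000")
assert command_processor_verify(cmd, tag)                            # accepted
assert not command_processor_verify(b"launch kernel @0xdead", tag)   # rejected
```

The host software path sees the command and tag but, lacking the context key, cannot forge an authorized command.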

SLIDE 10

TEE on GPUs (2/2)

Software extensions limited to GPU runtime and GPU driver

  • Secure CUDA driver API implements all required security protocols
  • GPU driver uses bootstrap channel for context/memory management
  • GPU runtime validates device driver’s actions as part of CUDA driver API

Complexity hidden behind CUDA programming model

  • CUDA programming model well defined for abstracting device management
  • No/minimal modifications to user application codebase

Almost all changes are transparent to application developers
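Transparency to the application can be sketched as follows: the app keeps calling the same memcpy-style API, and the secure runtime encrypts buffers before they cross the untrusted host path. The SHA-256 counter keystream below is a deliberately toy stand-in for real authenticated encryption (e.g. AES-GCM in a hardware crypto engine); do not use it as a cipher. All names are illustrative:

```python
import hashlib
from itertools import count

KEY = b"context-key"   # illustrative; provisioned per secure context

def keystream(key: bytes, n: int) -> bytes:
    """Toy counter-mode keystream (stand-in for a real cipher)."""
    out = b""
    for i in count():
        if len(out) >= n:
            return out[:n]
        out += hashlib.sha256(key + i.to_bytes(8, "big")).digest()

def xor(data: bytes, ks: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, ks))

def secure_memcpy_h2d(plaintext: bytes) -> bytes:
    """What the secure runtime places in the (untrusted) DMA buffer:
    ciphertext. The application-facing call signature is unchanged."""
    return xor(plaintext, keystream(KEY, len(plaintext)))

def gpu_decrypt(ciphertext: bytes) -> bytes:
    """The GPU's crypto engine decrypts inside the secure context."""
    return xor(ciphertext, keystream(KEY, len(ciphertext)))

buf = b"model weights"
wire = secure_memcpy_h2d(buf)      # host OS/driver see only ciphertext
assert wire != buf
assert gpu_decrypt(wire) == buf    # round-trips inside the TEE
```

The application code is untouched; only the runtime beneath the API call changes, which is what makes the scheme (nearly) transparent to developers.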

SLIDE 11


TEE on GPUs – Summary

[Diagram: GPU architecture with a new security module in the command processor — PUF, ECDSA key generation and signing; attestation, isolation, and secure command submission; blocks untrusted MMIO accesses]

SLIDE 12

Summary

Current cloud trends

  • Accelerators are the future of the cloud
  • No confidential computing support by accelerators

Trusted execution environment for accelerators

  • Allow CPU TEEs to securely offload code/data to the device
  • Can be simpler than CPU TEEs