S7281: Device Lending: Dynamic Sharing of GPUs in a PCIe Cluster



SLIDE 1

S7281: Device Lending: Dynamic Sharing of GPUs in a PCIe Cluster

Jonas Markussen PhD student Simula Research Laboratory

SLIDE 2
Outline

  • Motivation
  • PCIe Overview
  • Non-Transparent Bridges
  • Device Lending

SLIDE 3

Distributed applications may need to access and use IO resources that are physically located inside remote hosts

[Diagram: a front-end connected through an interconnect (control + signaling + data) to multiple compute nodes]

SLIDE 4
  • rCUDA
  • CUDA-aware Open MPI
  • Custom GPUDirect RDMA implementation
  • . . .

Software abstractions simplify the use and allocation of resources in a cluster and facilitate development of distributed applications

[Diagram: a front-end presents a logical view of the cluster's resources; control, signaling, and data are handled in software]

SLIDE 5

[Diagram: software stacks compared. Local resource: application → CUDA library + driver → PCIe IO bus. Remote resource using middleware: application → CUDA–middleware integration → interconnect transport (RDMA) → middleware service/daemon → CUDA driver → PCIe IO bus]

SLIDE 6

In PCIe clusters, the same fabric is used both as local IO bus within a single node and as the interconnect between separate nodes

[Diagram: two hosts, each with a CPU and chipset, RAM on a memory bus, and PCIe IO devices, connected by PCIe interconnect host adapters and external PCIe cables to a PCIe interconnect switch]

SLIDE 7

[Diagram: remote resource over the native fabric. Application → CUDA library + driver → local PCIe IO bus → PCIe-based interconnect → remote PCIe IO bus → remote resource]

SLIDE 8

PCIe Overview

SLIDE 9

PCIe is the dominant IO bus technology in computers today, and can also be used as a high-bandwidth low-latency interconnect

PCI-SIG. PCI Express 3.1 Base Specification, 2010. http://www.eetimes.com/document.asp?doc_id=1259778

[Chart: per-direction bandwidth in gigabytes per second (GB/s), 5–35 GB/s axis, for PCIe x4, x8, and x16 links across Gen 2, Gen 3, and Gen 4]

SLIDE 10

Memory reads and writes are handled by PCIe as transactions that are packet-switched through the fabric depending on the address

[Diagram: CPU and chipset with RAM and three PCIe devices, showing the possible transaction routing paths]

  • Upstream
  • Downstream
  • Peer-to-peer (shortest path)
SLIDE 11

IO devices and the CPU share the same physical address space, allowing devices to access system memory and other devices

[Diagram: CPU, RAM, and three PCIe devices sharing one physical address space from 0x00000… to 0xFFFFF…, with per-device MMIO regions and the interrupt vector area at 0xfee00xxx]

  • Memory-mapped IO (MMIO / PIO)
  • Direct Memory Access (DMA)
  • Message-Signaled Interrupts (MSI-X)


SLIDE 12

Non-Transparent Bridges

SLIDE 13

Remote address space can be mapped into local address space by using PCIe Non-Transparent Bridges (NTBs)

[Diagram: two hosts connected back-to-back through PCIe NTB adapters; an NTB address mapping translates local address 0xf000 to remote address 0x9000, making the remote host's RAM accessible from the local host]

SLIDE 14

Using NTBs, each node in the cluster takes part in a shared address space and has its own “window” into the global address space

[Diagram: nodes A, B, and C on an NTB-based interconnect; each node's local address space contains its own RAM and IO devices plus exported address ranges that map into the global address space]

SLIDE 15

Device Lending

SLIDE 16

A remote IO device can be “borrowed” by mapping it into local address space, making it appear locally installed in the system

[Diagram: the owner's physical device at 0xb000 is mapped through the NTB (remote 0xb000 → local 0x2000) and hot-plugged into the borrower as an inserted device at 0x2000; the borrower's device driver attaches to it via PCIe hot-plug]

SLIDE 17

By intercepting DMA API calls to set up IOMMU mappings and inject reverse NTB mappings, the device's physical location becomes completely transparent

[Diagram: the borrower's driver calls dma_map_page(0x9000); the intercepted call sets up an IOMMU mapping from IOV 0x5000 to physical 0x9000 and a reverse NTB mapping (owner 0xf000 → borrower 0x5000), so the device is told to use address 0xf000 for its DMA]

SLIDE 18

[Diagram: borrowed remote resource. Application → CUDA library + driver → local PCIe IO bus → PCIe NTB interconnect → remote PCIe IO bus → borrowed resource]

  • Unmodified local driver (with hot-plug support)
  • Resource appears local to OS, driver, and app
  • Hardware mappings ensure fast data path
  • Works with any PCIe device (even individual SR-IOV functions)

SLIDE 19

[Diagram: side-by-side comparison. Borrowed remote resource: application → CUDA library + driver → PCIe IO bus → PCIe NTB interconnect → remote PCIe IO bus. Remote resource using middleware: application → CUDA–middleware integration → interconnect transport (RDMA) → middleware service/daemon → CUDA driver → PCIe IO bus]

SLIDE 20

[Diagram: a borrowed remote resource and a genuinely local resource present identical software stacks: application → CUDA library + driver → PCIe IO bus; only the borrowed path additionally traverses the PCIe NTB interconnect]

SLIDE 21

[Chart: bandwidth in gigabytes per second (GB/s), 2–14 GB/s axis, versus transfer size from 4 KB to 16 MB, for three series: bandwidthTest (Local), bandwidthTest (Borrowed), and PXH830 DMA (GPUDirect RDMA)]

  • 1. Nvidia CUDA 8.0 Samples bandwidthTest
  • 2. GPUDirect RDMA benchmark using Dolphin NTB DMA

https://github.com/Dolphinics/cuda-rdma-bench

Device-to-host memory transfer

GPU: Quadro P400, Nvidia driver version 375.26 (CentOS 7); CPU: Xeon E5-1630 3.7 GHz; Memory: DDR4 2133 MHz

SLIDE 22

Device pool

Using Device Lending, nodes in a PCIe cluster can share resources through a process of borrowing and giving back devices

[Diagram: a pool of GPUs, SSDs, NICs, and FPGAs spread across NTB-connected hosts, each with its own CPU, chipset, and RAM; tasks A, B, and C are each assigned a different subset of borrowed devices]

SLIDE 23

EIR – Efficient computer-aided diagnosis framework for gastrointestinal examination

[Diagram: two examination rooms sharing GPUs hosted in a server room]

http://mlab.no/blog/2016/12/eir/

SLIDE 24

Moving forward

  • Strategy-based management
  • Fail-over mechanisms
  • VFIO and other API integration (“SmartIO”)
  • Borrowing vGPU functions
SLIDE 25

Thank you!

Selected publications
  • “Device Lending in PCI Express Networks”, ACM NOSSDAV 2016
  • “Efficient Processing of Video in a Multi Auditory Environment using Device Lending of GPUs”, ACM Multimedia Systems 2016 (MMSys’16)
  • “PCIe Device Lending”, University of Oslo, 2015

Device Lending demo and more: visit Dolphin in the exhibition area (booth 625)

Email: jonassm@simula.no