Achieving Near-Native GPU Performance in the Cloud

John Paul Walters, Project Leader, USC Information Sciences Institute
jwalters@isi.edu
Outline
- Motivation
- ISI’s HPC Cloud Effort
- Background: PCI Passthrough, SR-IOV
- Results
- Conclusion
Motivation
- Scientific workloads demand increasing performance with greater power efficiency
– Architectures have been driven towards specialization and heterogeneity
- Infrastructure-as-a-Service (IaaS) clouds can democratize access to the latest, most powerful accelerators
– If performance goals are met
- Can we provide HPC-class performance in the cloud?
ISI’s HPC Cloud Work
- Cloud computing is traditionally seen as a resource for IT
– Web servers, databases
- More recently, researchers have begun to leverage the public cloud as an HPC resource
– An AWS virtual cluster ranked 101 on the Top500 list
- Major difference between HPC and IT in the cloud:
– Types of resources, heterogeneity
- Our contribution: we're developing heterogeneous HPC extensions for the OpenStack cloud computing platform
OpenStack Background
- OpenStack founded by Rackspace and NASA
- In use by Rackspace, HP, and others for their public clouds
- Open source with hundreds of participating companies
- In use for both public and private clouds
- Current stable release: OpenStack Juno
– OpenStack Kilo to be released in April
[Figure: Google Trends search interest for common open source IaaS projects: OpenStack, CloudStack, OpenNebula, and Eucalyptus]
Accessing GPUs from Virtual Hosts Using API Remoting
[Figure: host-to-device bandwidth (pageable, MB/sec) vs. transfer size, and single-precision real matrix multiply performance (GFlops/sec) for increasing NxM, comparing host, LXC, and gVirtus]

I/O performance is low for gVirtus/KVM, while LXC is much closer to native performance. A larger matrix multiply amortizes the I/O transfer cost, making LXC and native performance indistinguishable.
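To make the trade-off concrete, here is a minimal sketch of the forwarding idea behind API remoting. It illustrates the technique only; it is not gVirtuS's actual wire protocol, and the daemon address is a hypothetical placeholder.

    # Guest-side stub for API remoting: serialize each accelerator API call
    # and forward it to a host-side daemon that replays it on the real GPU.
    # Illustration only -- not gVirtuS's actual protocol.
    import pickle
    import socket

    HOST, PORT = "192.168.122.1", 9999   # hypothetical host-side daemon

    def remote_call(func_name, *args):
        """Forward one API call to the host and return the unpickled result."""
        with socket.create_connection((HOST, PORT)) as sock:
            sock.sendall(pickle.dumps((func_name, args)))
            sock.shutdown(socket.SHUT_WR)        # signal end of request
            data = sock.makefile("rb").read()    # wait for the reply
        return pickle.loads(data)

    # Every call pays a round trip, so small transfer-heavy operations
    # (e.g., host-to-device copies) suffer, while large compute-bound
    # kernels amortize the overhead -- matching the measurements above.
    result = remote_call("cudaMemcpy", b"payload", "HostToDevice")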
Accelerators and Virtualization
- Combine non-virtualized accelerators with virtual hosts
- Results in > 99% efficiency

[Figure: SHOC relative performance for common signal processing kernels (single- and double-precision FFT/IFFT, SGEMM, and DGEMM variants, with and without PCIe transfer) under KVM, Xen, LXC, and VMware]
PCI Passthrough Background
- 1:1 mapping of physical device to virtual machine (see the libvirt sketch below)
- Device remains non-virtualized
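A minimal sketch of how such a device is handed to a guest in practice, assuming KVM managed through the libvirt Python bindings; the PCI address and guest name are hypothetical placeholders.

    # Hedged sketch: attach a physical GPU to a KVM guest via PCI passthrough
    # using the libvirt Python bindings. The PCI address (0000:82:00.0) and
    # guest name ("hpc-vm") are hypothetical.
    import libvirt

    HOSTDEV_XML = """
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x82' slot='0x00' function='0x0'/>
      </source>
    </hostdev>
    """

    conn = libvirt.open("qemu:///system")
    dom = conn.lookupByName("hpc-vm")
    # Persist the device in the guest's configuration; the GPU is now owned
    # 1:1 by this VM and is unavailable to the host and to other guests.
    dom.attachDeviceFlags(HOSTDEV_XML, libvirt.VIR_DOMAIN_AFFECT_CONFIG)
    conn.close()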
SR-IOV Background
Image from: http://docs.oracle.com/cd/E23824_01/html/819-3196/figures/sriov-intro.png
- SR-IOV partitions a single physical device into multiple virtual functions
- Virtual functions are almost indistinguishable from physical functions
- Virtual functions are passed to virtual machines using PCI passthrough (see the sysfs sketch below)
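A minimal sketch of creating virtual functions on a Linux host; the PCI address and VF count are hypothetical, and sriov_numvfs is the kernel's generic sysfs knob for this (Mellanox drivers of this era also exposed module parameters for the same purpose).

    # Hedged sketch: create SR-IOV virtual functions through the standard
    # Linux sysfs interface (requires root). The device address and VF
    # count below are hypothetical.
    NUM_VFS = 4
    PCI_ADDR = "0000:06:00.0"   # e.g., a ConnectX-3 HCA (assumed address)

    with open(f"/sys/bus/pci/devices/{PCI_ADDR}/sriov_numvfs", "w") as f:
        f.write(str(NUM_VFS))

    # Each VF now appears as its own PCI function and can be handed to a
    # guest with the same <hostdev> passthrough mechanism shown earlier.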
Multi-GPU with SR-IOV and GPUDirect
- Many real applications extend beyond a single node's capabilities
- Test multi-node performance with InfiniBand SR-IOV and GPUDirect
- 4 Sandy Bridge nodes equipped with K20/K40 GPUs
– ConnectX-3 IB with SR-IOV enabled
– Ported Mellanox OFED 2.1-1 to the 3.13 kernel
– KVM hypervisor
- Test with LAMMPS, OSU Microbenchmarks, and HOOMD (see the launch sketch below)
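As one example of the test workflow, a hedged sketch of launching a GPU-accelerated LAMMPS run across the virtual cluster follows. The hostfile, rank count, binary name, and input deck are assumptions; -sf gpu and -pk gpu are LAMMPS's standard GPU-package flags.

    # Hedged sketch: launch LAMMPS with its GPU package across the 4-node
    # virtual cluster. The hostfile, rank count, binary name ("lmp"), and
    # input deck are hypothetical.
    import subprocess

    subprocess.run([
        "mpirun", "-np", "16", "-hostfile", "vm_hosts",  # 4 VMs x 4 ranks (assumed)
        "lmp", "-sf", "gpu",          # switch styles to their GPU variants
        "-pk", "gpu", "1",            # one GPU per node (assumed)
        "-in", "in.rhodo",            # e.g., the Rhodopsin benchmark input
    ], check=True)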
LAMMPS Rhodopsin with SR-IOV Performance

[Figure: LAMMPS Rhodopsin performance, millions of atom-timesteps per second vs. problem size (32k-512k atoms), for VM and bare-metal (Base) runs in 32c/4g and 4c/4g configurations]
LAMMPS Lennard-Jones with SR-IOV Performance

[Figure: LAMMPS Lennard-Jones performance, millions of atom-timesteps per second vs. problem size (2k-2048k atoms), for VM and bare-metal (Base) runs in 32c/4g and 4c/4g configurations]
LAMMPS Virtualized Performance
- Achieve 96%-99% efficiency (see the efficiency calculation below)
– Performance gap decreases with increasing problem size
- Future work is needed to validate these results across much larger systems
– This work is in the early stages
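For clarity, the efficiency figures are the ratio of virtualized to bare-metal throughput; the sample numbers below are illustrative, not measured values from this work.

    # Hedged sketch: virtualization efficiency as VM throughput divided by
    # bare-metal throughput. The sample rates are illustrative only.
    vm_rate   = 3.32   # millions of atom-timesteps/sec in the VM (assumed)
    base_rate = 3.41   # millions of atom-timesteps/sec on bare metal (assumed)

    efficiency = vm_rate / base_rate
    print(f"virtualization efficiency: {efficiency:.1%}")   # ~97.4%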
GPUDirect Advantage
Image source: http://old.mellanox.com/content/pages.php?pg=products_dyn&product_family=116

- Validate GPUDirect over SR-IOV (see the benchmark sketch below)
– Uses the nvidia_peer_memory-1.0-0 kernel module
- OSU GDR Microbenchmarks
- HOOMD MD
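A hedged sketch of how such a GPUDirect-over-SR-IOV measurement is typically launched; the host names are hypothetical, and the "D D" arguments are the CUDA-aware OSU suite's convention for placing both send and receive buffers in GPU device memory.

    # Hedged sketch: run the CUDA-aware OSU latency benchmark between two
    # GPUDirect-enabled VMs. Host names are hypothetical.
    import subprocess

    subprocess.run([
        "mpirun", "-np", "2", "-host", "vm1,vm2",
        "osu_latency", "D", "D",   # both buffers in GPU device memory
    ], check=True)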
OSU GDR Microbenchmarks: Latency
[Figure: OSU GDR average latency (µs) vs. message size (1 B to 1 MB), native vs. virtualized, with an inset detailing small message sizes]
OSU GDR Microbenchmarks: Bandwidth
[Figure: OSU GDR bandwidth (MB/s) vs. message size (1 B to 4 MB), native vs. virtualized]
GPUDirect-enabled VM Performance
[Figure: HOOMD GPUDirect performance, 256K-particle Lennard-Jones simulation: average timesteps per second vs. number of nodes (1-4), for VM and bare-metal (Base) runs with and without GPUDirect]
Discussion
- Take-away: GDR (GPUDirect RDMA) provides nearly a 10% improvement
- SR-IOV interconnect results in < 2% overhead
- Further work is needed to validate these results on larger systems
– Small-scale results are promising
Future Work
- For full results see:
– J.P. Walters et al., "GPU Passthrough Performance: A Comparison of KVM, Xen, VMWare ESXi, and LXC for CUDA and OpenCL Applications," IEEE Cloud 2014
– A.J. Younge et al., "Supporting High Performance Molecular Dynamics in Virtualized Clusters using IOMMU, SR-IOV, and GPUDirect," to appear in VEE 2015
- Next steps:
– Extend scalability results
– OpenStack integration
- Code: https://github.com/usc-isi/nova
Questions and Comments
- Contact me: jwalters@isi.edu