Boosting Performance and Earnings of Cloud Computing Deployments with rCUDA - PowerPoint PPT Presentation


Boosting Performance and Earnings of Cloud Computing Deployments with rCUDA. Federico Silla, Universitat Politècnica de València, Spain.


SLIDE 1

Boosting Performance and Earnings of Cloud Computing Deployments with rCUDA

Federico Silla

Universitat Politècnica de València, Spain

SLIDE 2

GPU Technology Conference 2017

2/33

Outline

  • 1. Using CUDA GPUs from virtual machines
  • 2. rCUDA: GPU virtualization
  • 3. Performance of rCUDA with one virtual machine
  • 4. Performance of rCUDA with several virtual machines
  • 5. Conclusions
SLIDE 3

Outline

  • 1. Using CUDA GPUs from virtual machines
  • 2. rCUDA: GPU virtualization
  • 3. Performance of rCUDA with one virtual machine
  • 4. Performance of rCUDA with several virtual machines
  • 5. Conclusions
SLIDE 4

Using CUDA GPUs from virtual machines

  • How can the GPU in the native domain be accessed from inside a virtual machine?

SLIDE 5

  • The PCI passthrough technique can be used to assign the GPU to a virtual machine
  • However, the GPU is assigned in an exclusive way
  • Concurrent usage of the GPU by several virtual machines is not possible

Using CUDA GPUs from virtual machines

SLIDE 6

  • As a consequence, the number of virtual machines using CUDA acceleration cannot be larger than the number of GPUs present in the host

Using CUDA GPUs from virtual machines

virtual machines ≤ GPUs
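The constraint above can be made concrete with a toy sketch (hypothetical helper names, not any real hypervisor API): with PCI passthrough each GPU is handed to at most one VM, so once the GPUs run out, the remaining VMs fall back to CPU-only execution.

```python
# Toy model of PCI passthrough: each GPU is assigned exclusively
# to a single VM; VMs beyond the GPU count get no accelerator.
def assign_passthrough(vms, gpus):
    assignment = {}
    for i, vm in enumerate(vms):
        # Exclusive assignment: one GPU per VM, until GPUs run out.
        assignment[vm] = gpus[i] if i < len(gpus) else None
    return assignment

vms = ["vm1", "vm2", "vm3", "vm4"]
gpus = ["gpu0", "gpu1"]
result = assign_passthrough(vms, gpus)
print(result)  # vm3 and vm4 get None: they must run on CPU only
```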

SLIDE 7

  • GPU virtualization allows as many virtual machines as required to share the GPU in the host

Using CUDA GPUs from virtual machines

SLIDE 8

Outline

  • 1. Using CUDA GPUs from virtual machines
  • 2. rCUDA: GPU virtualization for CUDA
  • 3. Performance of rCUDA with one virtual machine
  • 4. Performance of rCUDA with several virtual machines
  • 5. Conclusions
SLIDE 9

rCUDA … CUDA … they sound similar

SLIDE 10

Basics of GPU computing


Basic behavior of CUDA

SLIDE 11


Basics of GPU computing

SLIDE 12

rCUDA … remote CUDA

A software technology that enables a more flexible use of GPUs in computing facilities


rCUDA is a development by Universitat Politècnica de València, Spain
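Conceptually, rCUDA splits the CUDA API across the network: a client library on the GPU-less node intercepts CUDA calls and forwards them to a server running next to the GPU. A minimal sketch of that request/response pattern follows; the class names, operations, and JSON wire format are invented for illustration and are not rCUDA's actual protocol.

```python
import json

# Toy stand-in for the remote-CUDA idea: the client serializes API
# calls, the server executes them next to the "GPU" and replies.
class GpuServer:
    def __init__(self):
        self.memory = {}      # handle -> list of floats ("device memory")
        self.next_handle = 0

    def handle(self, request):
        msg = json.loads(request)
        if msg["op"] == "malloc":
            h = self.next_handle
            self.next_handle += 1
            self.memory[h] = [0.0] * msg["n"]
            return json.dumps({"handle": h})
        if msg["op"] == "memcpy_h2d":
            self.memory[msg["handle"]] = msg["data"]
            return json.dumps({"ok": True})
        if msg["op"] == "scale":  # stands in for a kernel launch
            buf = self.memory[msg["handle"]]
            self.memory[msg["handle"]] = [x * msg["factor"] for x in buf]
            return json.dumps({"ok": True})
        if msg["op"] == "memcpy_d2h":
            return json.dumps({"data": self.memory[msg["handle"]]})

class GpuClient:
    """Runs on the node (or VM) without a GPU; ships every call away."""
    def __init__(self, server):
        self.server = server  # a network socket in reality; direct call here

    def call(self, **msg):
        return json.loads(self.server.handle(json.dumps(msg)))

server = GpuServer()
client = GpuClient(server)
h = client.call(op="malloc", n=3)["handle"]
client.call(op="memcpy_h2d", handle=h, data=[1.0, 2.0, 3.0])
client.call(op="scale", handle=h, factor=2.0)
out = client.call(op="memcpy_d2h", handle=h)["data"]
print(out)  # [2.0, 4.0, 6.0]
```

Because the application only sees the client-side API, it needs no modification, which mirrors the transparency rCUDA claims later in this deck.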

SLIDE 13

Basics of rCUDA


SLIDE 14

Basics of rCUDA


SLIDE 15

rCUDA GPU virtualization vision

  • rCUDA allows a new vision of a GPU deployment, moving from the usual cluster configuration to the following one:

(diagram: in the physical configuration, nodes 1 to n each have CPUs, RAM, a network interface, and a PCIe-attached GPU, linked by an interconnection network; in the logical configuration, GPUs are decoupled from the nodes and reached through logical connections over the interconnection network)

SLIDE 16

Performance of applications using rCUDA

  • Several applications executed with CUDA and rCUDA
  • K20 GPU and FDR InfiniBand
  • K40 GPU and EDR InfiniBand

(plot: execution time of the applications with rCUDA relative to CUDA; lower is better)

SLIDE 17

Performance of applications using rCUDA

(plots: CUDA-MEME and BarraCUDA with EDR InfiniBand and a P100 GPU; lower is better)

SLIDE 18

Why the good performance of rCUDA?

The low overhead of applications using rCUDA is due to:

  • Data copies with rCUDA attaining higher bandwidth to the remote GPU than CUDA does to the local GPU
  • Some internal synchronization mechanisms being faster in rCUDA than in CUDA
  • … and a very careful implementation of the rCUDA framework

“Ideas Are Easy, Implementation Is Hard”

Guy Kawasaki, marketing specialist and Silicon Valley venture capitalist
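The first bullet above hints at transfer bandwidth. One general way a two-hop copy (network plus PCIe) can approach or beat a one-hop local copy is pipelining: split the buffer into chunks so that, in steady state, only the slower hop is paid per chunk. The toy timing model below illustrates that technique only; the numbers are invented and say nothing about rCUDA's internals or measurements.

```python
# Toy timing model of a pipelined two-hop copy (network + PCIe).
# Costs are in illustrative milliseconds per MB, not measurements.
def two_hop_time(total_mb, chunk_mb, net_ms_per_mb, pcie_ms_per_mb):
    chunks = total_mb / chunk_mb
    net = chunk_mb * net_ms_per_mb    # time per chunk on the network hop
    pcie = chunk_mb * pcie_ms_per_mb  # time per chunk on the PCIe hop
    # The first chunk pays both hops; afterwards the hops overlap,
    # so each remaining chunk only costs the slower of the two.
    return net + pcie + (chunks - 1) * max(net, pcie)

# 64 MB buffer, hop after hop, with no overlap at all:
unpipelined = 64 * 0.09 + 64 * 0.08
# Same buffer split into 1 MB chunks with the two hops overlapped:
pipelined = two_hop_time(64, 1, 0.09, 0.08)
print(round(unpipelined, 2), round(pipelined, 2))  # pipelining roughly halves the copy time here
```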

SLIDE 19

Example of performance with P2P copies

(diagrams: the CUDA model vs the rCUDA model; rCUDA scenarios 1 and 2 for peer-to-peer copies between GPUs)

rCUDA provides the same semantics as CUDA

SLIDE 20

Example of performance with P2P copies

(plot: bandwidth of P2P copies in rCUDA scenario 2; higher is better)

SLIDE 21

Outline

  • 1. Using CUDA GPUs from virtual machines
  • 2. rCUDA: GPU virtualization for CUDA
  • 3. Performance of rCUDA with one virtual machine
  • 4. Performance of rCUDA with several virtual machines
  • 5. Conclusions
SLIDE 22

Using rCUDA to access the GPU

Low performance network fabric available

  • In clusters where InfiniBand is not available, the rCUDA server may be placed in the native domain and the rCUDA client would be placed inside the VMs
  • The virtual network provided by the hypervisor would be used to exchange data between the rCUDA clients and the rCUDA server
  • This configuration allows the use of more than one GPU at the host

(diagram: KVM virtual machines reaching the host GPU through the hypervisor's virtual network)

SLIDE 23

Using rCUDA to access the GPU

High performance network fabric available

  • If InfiniBand is available, the rCUDA server can be placed in another node
  • Several GPUs can be provided to the VMs, either in a single remote node or in several remote nodes

(diagram: KVM virtual machines reaching remote GPUs over InfiniBand)

SLIDE 24

Application performance with KVM

(plot: LAMMPS, CUDA-MEME, CUDASW++ and GPU-BLAST running under KVM with FDR InfiniBand and a K20 GPU)

SLIDE 25

Outline

  • 1. Using CUDA GPUs from virtual machines
  • 2. rCUDA: GPU virtualization for CUDA
  • 3. Performance of rCUDA with one virtual machine
  • 4. Performance of rCUDA with several virtual machines
  • 5. Conclusions
SLIDE 26

CUDA approach

  • Let’s use a computer with two GPUs and four virtual machines:
  • Two virtual machines use one GPU each (PCI passthrough)
  • Two virtual machines must run applications on CPU
SLIDE 27

rCUDA approach

  • With rCUDA, the four virtual machines can share both GPUs. The two GPUs can be either in the same host or in another computer
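The contrast between the two approaches on this host (two GPUs, four VMs) can be sketched with a toy count; the function names are invented for illustration and do not correspond to any real scheduler API.

```python
# Toy comparison: exclusive passthrough vs shared virtual GPUs.
def passthrough_accelerated(vms, gpus):
    # One GPU per VM, exclusively; the remaining VMs run on CPU.
    return min(len(vms), len(gpus))

def shared_accelerated(vms, gpus):
    # With GPU virtualization, every VM can reach every GPU,
    # so all VMs can run accelerated jobs as long as any GPU exists.
    return len(vms) if gpus else 0

vms, gpus = ["vm1", "vm2", "vm3", "vm4"], ["gpu0", "gpu1"]
print(passthrough_accelerated(vms, gpus), shared_accelerated(vms, gpus))  # 2 4
```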

SLIDE 28

Performance comparison

  • Each of the 4 virtual machines executes as many instances as possible of one of the 4 following applications:
  • LAMMPS (red color in the plot below)
  • NAMD (green)
  • GPU-Blast (blue)
  • Fluidsim (yellow)
  • For each experiment, applications are shifted across virtual machines

Sharing GPUs among applications increases the overall amount of executed jobs

SLIDE 29

Outline

  • 1. Using CUDA GPUs from virtual machines
  • 2. rCUDA: GPU virtualization for CUDA
  • 3. Performance of rCUDA with one virtual machine
  • 4. Performance of rCUDA with several virtual machines
  • 5. Conclusions
SLIDE 30

Conclusions

  • rCUDA allows GPUs to be shared among several virtual machines
  • Applications do not need to be modified in order to use rCUDA
  • Performance with rCUDA when GPUs are not shared is not significantly reduced
  • Overall performance is increased when GPUs are shared among virtual machines

SLIDE 31

Get a free copy of rCUDA at

http://www.rcuda.net

@rcuda_

More than 800 requests worldwide


SLIDE 32


Jaime Sierra Pablo Higueras Carlos Reaño Javier Prades Tony Díaz


SLIDE 33

Thanks! Questions?
