Boosting Performance and Earnings of Cloud Computing Deployments with rCUDA - PowerPoint PPT Presentation


Boosting Performance and Earnings of Cloud Computing Deployments with rCUDA. Federico Silla, Universitat Politècnica de València, Spain.


SLIDE 1

Boosting Performance and Earnings of Cloud Computing Deployments with rCUDA

Federico Silla

Universitat Politècnica de València, Spain

SLIDE 2

GPU Technology Conference 2017

2/33

Outline

  • 1. Using CUDA GPUs from virtual machines
  • 2. rCUDA: GPU virtualization
  • 3. Performance of rCUDA with one virtual machine
  • 4. Performance of rCUDA with several virtual machines
  • 5. Conclusions
SLIDE 3

Outline

  • 1. Using CUDA GPUs from virtual machines
  • 2. rCUDA: GPU virtualization
  • 3. Performance of rCUDA with one virtual machine
  • 4. Performance of rCUDA with several virtual machines
  • 5. Conclusions
SLIDE 4

Using CUDA GPUs from virtual machines

  • How can the GPU in the native domain be accessed from inside a virtual machine?

SLIDE 5

  • The PCI passthrough technique can be used to assign the GPU to a virtual machine
  • However, the GPU is assigned in an exclusive way
  • Concurrent usage of the GPU by several virtual machines is not possible

Using CUDA GPUs from virtual machines

SLIDE 6

  • As a consequence, the number of virtual machines using CUDA acceleration cannot be larger than the number of GPUs present in the host

Using CUDA GPUs from virtual machines

virtual machines ≤ GPUs
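The constraint above can be made concrete with a toy sketch (hypothetical helper names, not any real hypervisor API): with PCI passthrough each GPU is handed to at most one VM, so once the GPUs run out, the remaining VMs fall back to CPU-only execution.

```python
# Toy model of PCI passthrough: each GPU is assigned exclusively
# to a single VM; VMs beyond the GPU count get no accelerator.
def assign_passthrough(vms, gpus):
    assignment = {}
    for i, vm in enumerate(vms):
        # Exclusive assignment: one GPU per VM, until GPUs run out.
        assignment[vm] = gpus[i] if i < len(gpus) else None
    return assignment

vms = ["vm1", "vm2", "vm3", "vm4"]
gpus = ["gpu0", "gpu1"]
result = assign_passthrough(vms, gpus)
print(result)  # vm3 and vm4 get None: they must run on CPU only
```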

SLIDE 7

  • GPU virtualization allows as many virtual machines as required to share the GPU in the host

Using CUDA GPUs from virtual machines

SLIDE 8

Outline

  • 1. Using CUDA GPUs from virtual machines
  • 2. rCUDA: GPU virtualization for CUDA
  • 3. Performance of rCUDA with one virtual machine
  • 4. Performance of rCUDA with several virtual machines
  • 5. Conclusions
SLIDE 9

rCUDA … CUDA … they sound similar

SLIDE 10

Basics of GPU computing


Basic behavior of CUDA

SLIDE 11


Basics of GPU computing

SLIDE 12

rCUDA … remote CUDA

A software technology that enables a more flexible use of GPUs in computing facilities


rCUDA is a development by Universitat Politècnica de València, Spain
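Conceptually, rCUDA splits the CUDA API across the network: a client library on the GPU-less node intercepts CUDA calls and forwards them to a server running next to the GPU. A minimal sketch of that request/response pattern follows; the class names, operations, and JSON wire format are invented for illustration and are not rCUDA's actual protocol.

```python
import json

# Toy stand-in for the remote-CUDA idea: the client serializes API
# calls, the server executes them next to the "GPU" and replies.
class GpuServer:
    def __init__(self):
        self.memory = {}      # handle -> list of floats ("device memory")
        self.next_handle = 0

    def handle(self, request):
        msg = json.loads(request)
        if msg["op"] == "malloc":
            h = self.next_handle
            self.next_handle += 1
            self.memory[h] = [0.0] * msg["n"]
            return json.dumps({"handle": h})
        if msg["op"] == "memcpy_h2d":
            self.memory[msg["handle"]] = msg["data"]
            return json.dumps({"ok": True})
        if msg["op"] == "scale":  # stands in for a kernel launch
            buf = self.memory[msg["handle"]]
            self.memory[msg["handle"]] = [x * msg["factor"] for x in buf]
            return json.dumps({"ok": True})
        if msg["op"] == "memcpy_d2h":
            return json.dumps({"data": self.memory[msg["handle"]]})

class GpuClient:
    """Runs on the node (or VM) without a GPU; ships every call away."""
    def __init__(self, server):
        self.server = server  # a network socket in reality; direct call here

    def call(self, **msg):
        return json.loads(self.server.handle(json.dumps(msg)))

server = GpuServer()
client = GpuClient(server)
h = client.call(op="malloc", n=3)["handle"]
client.call(op="memcpy_h2d", handle=h, data=[1.0, 2.0, 3.0])
client.call(op="scale", handle=h, factor=2.0)
out = client.call(op="memcpy_d2h", handle=h)["data"]
print(out)  # [2.0, 4.0, 6.0]
```

Because the application only sees the client-side API, it needs no modification, which mirrors the transparency rCUDA claims later in this deck.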

SLIDE 13

Basics of rCUDA


SLIDE 14

Basics of rCUDA


SLIDE 15

rCUDA GPU virtualization vision

  • rCUDA allows a new vision of a GPU deployment, moving from the usual cluster configuration to the following one:

(diagram: in the physical configuration, nodes 1 to n each have CPUs, RAM, a network interface, and a PCIe-attached GPU, linked by an interconnection network; in the logical configuration, GPUs are decoupled from the nodes and reached through logical connections over the interconnection network)

SLIDE 16

Performance of applications using rCUDA

  • Several applications executed with CUDA and rCUDA
  • K20 GPU and FDR InfiniBand
  • K40 GPU and EDR InfiniBand

(plot: execution time of the applications with rCUDA relative to CUDA; lower is better)

SLIDE 17

Performance of applications using rCUDA

(plots: CUDA-MEME and BarraCUDA with EDR InfiniBand and a P100 GPU; lower is better)

SLIDE 18

Why the good performance of rCUDA?

The low overhead of applications using rCUDA is due to:

  • Data copies with rCUDA attaining higher bandwidth to the remote GPU than CUDA does to the local GPU
  • Some internal synchronization mechanisms being faster in rCUDA than in CUDA
  • … and a very careful implementation of the rCUDA framework

“Ideas Are Easy, Implementation Is Hard”

Guy Kawasaki, marketing specialist and Silicon Valley venture capitalist
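The first bullet above hints at transfer bandwidth. One general way a two-hop copy (network plus PCIe) can approach or beat a one-hop local copy is pipelining: split the buffer into chunks so that, in steady state, only the slower hop is paid per chunk. The toy timing model below illustrates that technique only; the numbers are invented and say nothing about rCUDA's internals or measurements.

```python
# Toy timing model of a pipelined two-hop copy (network + PCIe).
# Costs are in illustrative milliseconds per MB, not measurements.
def two_hop_time(total_mb, chunk_mb, net_ms_per_mb, pcie_ms_per_mb):
    chunks = total_mb / chunk_mb
    net = chunk_mb * net_ms_per_mb    # time per chunk on the network hop
    pcie = chunk_mb * pcie_ms_per_mb  # time per chunk on the PCIe hop
    # The first chunk pays both hops; afterwards the hops overlap,
    # so each remaining chunk only costs the slower of the two.
    return net + pcie + (chunks - 1) * max(net, pcie)

# 64 MB buffer, hop after hop, with no overlap at all:
unpipelined = 64 * 0.09 + 64 * 0.08
# Same buffer split into 1 MB chunks with the two hops overlapped:
pipelined = two_hop_time(64, 1, 0.09, 0.08)
print(round(unpipelined, 2), round(pipelined, 2))  # pipelining roughly halves the copy time here
```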

SLIDE 19

Example of performance with P2P copies

(diagrams: the CUDA model vs the rCUDA model; rCUDA scenarios 1 and 2 for peer-to-peer copies between GPUs)

rCUDA provides the same semantics as CUDA

SLIDE 20

Example of performance with P2P copies

(plot: bandwidth of P2P copies in rCUDA scenario 2; higher is better)

SLIDE 21

Outline

  • 1. Using CUDA GPUs from virtual machines
  • 2. rCUDA: GPU virtualization for CUDA
  • 3. Performance of rCUDA with one virtual machine
  • 4. Performance of rCUDA with several virtual machines
  • 5. Conclusions
SLIDE 22

Using rCUDA to access the GPU

Low performance network fabric available

  • In clusters where InfiniBand is not available, the rCUDA server may be placed in the native domain and the rCUDA client would be placed inside the VMs
  • The virtual network provided by the hypervisor would be used to exchange data between the rCUDA clients and the rCUDA server
  • This configuration allows the use of more than one GPU at the host

(diagram: KVM virtual machines reaching the host GPU through the hypervisor's virtual network)

SLIDE 23

Using rCUDA to access the GPU

High performance network fabric available

  • If InfiniBand is available, the rCUDA server can be placed in another node
  • Several GPUs can be provided to the VMs, either in a single remote node or in several remote nodes

(diagram: KVM virtual machines reaching remote GPUs over InfiniBand)

SLIDE 24

Application performance with KVM

(plot: LAMMPS, CUDA-MEME, CUDASW++ and GPU-BLAST running under KVM with FDR InfiniBand and a K20 GPU)

SLIDE 25

Outline

  • 1. Using CUDA GPUs from virtual machines
  • 2. rCUDA: GPU virtualization for CUDA
  • 3. Performance of rCUDA with one virtual machine
  • 4. Performance of rCUDA with several virtual machines
  • 5. Conclusions
SLIDE 26

CUDA approach

  • Let’s use a computer with two GPUs and four virtual machines:
  • Two virtual machines use one GPU each (PCI passthrough)
  • Two virtual machines must run applications on CPU
SLIDE 27

rCUDA approach

  • With rCUDA, the four virtual machines can share both GPUs. The two GPUs can be either in the same host or in another computer
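The contrast between the two approaches on this host (two GPUs, four VMs) can be sketched with a toy count; the function names are invented for illustration and do not correspond to any real scheduler API.

```python
# Toy comparison: exclusive passthrough vs shared virtual GPUs.
def passthrough_accelerated(vms, gpus):
    # One GPU per VM, exclusively; the remaining VMs run on CPU.
    return min(len(vms), len(gpus))

def shared_accelerated(vms, gpus):
    # With GPU virtualization, every VM can reach every GPU,
    # so all VMs can run accelerated jobs as long as any GPU exists.
    return len(vms) if gpus else 0

vms, gpus = ["vm1", "vm2", "vm3", "vm4"], ["gpu0", "gpu1"]
print(passthrough_accelerated(vms, gpus), shared_accelerated(vms, gpus))  # 2 4
```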

SLIDE 28

Performance comparison

  • Each of the 4 virtual machines executes as many instances as possible of one of the 4 following applications:
  • LAMMPS (red color in the plot below)
  • NAMD (green)
  • GPU-Blast (blue)
  • Fluidsim (yellow)
  • For each experiment, applications are shifted across virtual machines

Sharing GPUs among applications increases the overall amount of executed jobs

SLIDE 29

Outline

  • 1. Using CUDA GPUs from virtual machines
  • 2. rCUDA: GPU virtualization for CUDA
  • 3. Performance of rCUDA with one virtual machine
  • 4. Performance of rCUDA with several virtual machines
  • 5. Conclusions
SLIDE 30

Conclusions

  • rCUDA allows GPUs to be shared among several virtual machines
  • Applications do not need to be modified in order to use rCUDA
  • Performance with rCUDA when GPUs are not shared is not significantly reduced
  • Overall performance is increased when GPUs are shared among virtual machines

SLIDE 31

Get a free copy of rCUDA at

http://www.rcuda.net

@rcuda_

More than 800 requests worldwide


SLIDE 32


Jaime Sierra Pablo Higueras Carlos Reaño Javier Prades Tony Díaz


SLIDE 33

Thanks! Questions?
