Overview on GPU Accelerators and Programming Paradigms

Ivan Girotto, igirotto@ictp.it
Information & Communication Technology Section (ICTS)
International Centre for Theoretical Physics (ICTP)

Multiple Socket CPUs

Multiple Socket CPUs + Accelerators

The General Concept of Accelerated Computing

[Figure: CPU with host memory (~30/40 GBytes) and GPU with device memory (~110/120 GBytes). Workflow: 1. Copy data, 2. Launch kernel, 3. Execute GPU kernel, 4. Copy result]
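Steps 1 and 4 in the workflow only move data. Whether offloading pays off therefore depends on the whole round trip, not just on the kernel. Below is a back-of-the-envelope model in C. It assumes transfer time is simply bytes divided by link bandwidth, with 1 GB = 1e9 bytes and no latency or overlap of copies with compute. The function names are hypothetical, and every input is a value you would measure on your own system.

```c
#include <stdbool.h>
#include <stddef.h>

/* Seconds to move `bytes` over a link of `gbytes_per_s` GB/s.
   Assumption: 1 GB = 1e9 bytes; latency and copy/compute overlap ignored. */
double transfer_seconds(size_t bytes, double gbytes_per_s)
{
    return (double)bytes / (gbytes_per_s * 1e9);
}

/* Total time for the four steps: 1. copy inputs in,
   2./3. launch and execute the kernel, 4. copy the result out. */
double offload_seconds(size_t bytes_in, size_t bytes_out,
                       double link_gbytes_per_s, double kernel_seconds)
{
    return transfer_seconds(bytes_in, link_gbytes_per_s)
         + kernel_seconds
         + transfer_seconds(bytes_out, link_gbytes_per_s);
}

/* Offloading only helps if the whole round trip beats the CPU-only run. */
bool offload_pays_off(size_t bytes_in, size_t bytes_out,
                      double link_gbytes_per_s, double kernel_seconds,
                      double cpu_seconds)
{
    return offload_seconds(bytes_in, bytes_out, link_gbytes_per_s,
                           kernel_seconds) < cpu_seconds;
}
```

Consider a kernel that is ten times faster than the CPU but must copy large arrays both ways. It can still lose once the transfers dominate, which is the "PCI bus bottleneck" listed later in the deck.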

Why Does GPU Accelerate Computing?
• Highly scalable design
• Higher aggregate memory bandwidth
• Huge number of low-frequency cores
• Higher aggregate computational power
• Massively parallel processors for data processing

SMX Processor & Warp Scheduler & Core

Why Does GPU Not Accelerate Computing?
• PCI bus bottleneck
• Synchronization weakness
• Extremely slow serialized execution
• High complexity: SPMD(T) + SIMD & memory model
• People forget about Amdahl's law: if only 50% of the original code is accelerated, the overall speedup can be at most 2!!
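Amdahl's law makes the last point concrete. If a fraction p of the runtime is accelerated by a factor s, the overall speedup is S = 1 / ((1 − p) + p/s). As s grows, S tends to 1 / (1 − p). With p = 0.5 the bound is 2, no matter how fast the GPU is. Here is a minimal C sketch (the function names are illustrative):

```c
/* Amdahl's law: overall speedup when a fraction p (0 <= p < 1) of the
   original runtime is accelerated by a factor s (s > 0). */
double amdahl_speedup(double p, double s)
{
    return 1.0 / ((1.0 - p) + p / s);
}

/* Upper bound for an infinitely fast accelerator: 1 / (1 - p). */
double amdahl_limit(double p)
{
    return 1.0 / (1.0 - p);
}
```

For example, with p = 0.5 and s = 10, amdahl_speedup gives 1 / 0.55, roughly 1.82, which is already close to the limit of 2.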

What is CUDA?
• NVIDIA compute architecture
• Software development capability provided free of charge by NVIDIA
• C and C++ programming language extension that simplifies the creation of efficient applications for CUDA-enabled GPGPUs
• Available for Linux, Windows and Mac OS X

#include <stdlib.h>

#define N (2048 * 2048)
#define THREADS_PER_BLOCK 512

// add() kernel: each thread adds one pair of elements.
// N is a multiple of THREADS_PER_BLOCK, so every thread maps to a valid element.
__global__ void add( int *a, int *b, int *c ) {
    int index = threadIdx.x + blockIdx.x * blockDim.x;
    c[index] = a[index] + b[index];
}

// random_ints() is not shown on the slide; this simple version is an assumption
void random_ints( int *x, int n ) {
    for ( int i = 0; i < n; i++ )
        x[i] = rand() % 100;
}

int main( void ) {
    int *a, *b, *c;                // host copies of a, b, c
    int *dev_a, *dev_b, *dev_c;    // device copies of a, b, c
    int size = N * sizeof( int );  // we need space for N integers

    // allocate device copies of a, b, c
    cudaMalloc( (void**)&dev_a, size );
    cudaMalloc( (void**)&dev_b, size );
    cudaMalloc( (void**)&dev_c, size );

    // allocate and initialize host copies of a, b, c
    a = (int*)malloc( size );
    b = (int*)malloc( size );
    c = (int*)malloc( size );
    random_ints( a, N );
    random_ints( b, N );

    // copy inputs to device
    cudaMemcpy( dev_a, a, size, cudaMemcpyHostToDevice );
    cudaMemcpy( dev_b, b, size, cudaMemcpyHostToDevice );

    // launch add() kernel with blocks and threads
    add<<< N/THREADS_PER_BLOCK, THREADS_PER_BLOCK >>>( dev_a, dev_b, dev_c );

    // copy device result back to host copy of c
    cudaMemcpy( c, dev_c, size, cudaMemcpyDeviceToHost );

    free( a ); free( b ); free( c );
    cudaFree( dev_a ); cudaFree( dev_b ); cudaFree( dev_c );
    return 0;
}
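The launch uses N/THREADS_PER_BLOCK blocks, which covers every element only because N = 2048 * 2048 is an exact multiple of 512. For a general n, the usual pattern is a ceiling division for the grid size plus an index guard inside the kernel. The sketch below emulates that mapping serially in plain C so it runs without a GPU. The names grid_size and emulate_add are hypothetical, and the two nested loops stand in for blockIdx.x and threadIdx.x.

```c
/* Number of blocks needed to cover n elements with tpb threads per
   block (ceiling division). */
int grid_size(int n, int tpb)
{
    return (n + tpb - 1) / tpb;
}

/* Serial emulation of add<<<grid_size(n, tpb), tpb>>>(a, b, c):
   the outer loop plays blockIdx.x, the inner loop threadIdx.x. */
void emulate_add(const int *a, const int *b, int *c, int n, int tpb)
{
    int blocks = grid_size(n, tpb);
    for (int block = 0; block < blocks; block++) {
        for (int thread = 0; thread < tpb; thread++) {
            /* same formula as threadIdx.x + blockIdx.x * blockDim.x */
            int index = thread + block * tpb;
            if (index < n)  /* surplus threads in the last block do nothing */
                c[index] = a[index] + b[index];
        }
    }
}
```

On a real GPU, the same guard (if (index < n)) goes inside the __global__ function, and n is passed as an extra kernel argument.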

Directive-Based Approaches: OpenACC
• Implementations available now from PGI, Cray, and GCC
• The same source can be used to generate code for CPU and GPU
• Easier development
• Less flexibility

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

int main( int argc, char* argv[] ) {
    int n = 10000;
    double *restrict a;
    double *restrict b;
    double *restrict c;
    size_t bytes = n * sizeof(double);

    a = (double*)malloc(bytes);
    b = (double*)malloc(bytes);
    c = (double*)malloc(bytes);

    // initialize the input vectors: a[i] = sin(i)^2, b[i] = cos(i)^2
    int i;
    for ( i = 0; i < n; i++ ) {
        a[i] = sin(i) * sin(i);
        b[i] = cos(i) * cos(i);
    }

    // sum component-wise and save the result into vector c
    #pragma acc kernels copyin(a[0:n], b[0:n]) copyout(c[0:n])
    for ( i = 0; i < n; i++ ) {
        c[i] = a[i] + b[i];
    }

    free(a); free(b); free(c);
    return 0;
}
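Because sin²(i) + cos²(i) = 1, every element of c should equal 1 up to rounding. That gives a cheap check that the accelerated loop produced correct results. A small helper, assuming it is called on the host after the kernels region has copied c back (the name max_deviation_from_one is hypothetical):

```c
/* Largest |c[i] - 1.0| over the vector; for c[i] = sin(i)^2 + cos(i)^2
   this should be on the order of rounding error. */
double max_deviation_from_one(const double *c, int n)
{
    double worst = 0.0;
    for (int i = 0; i < n; i++) {
        double d = c[i] > 1.0 ? c[i] - 1.0 : 1.0 - c[i];
        if (d > worst)
            worst = d;
    }
    return worst;
}
```

For example, placing printf("max deviation: %g\n", max_deviation_from_one(c, n)); just before free(c) reports the largest error in the result.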

Directive-Based Approaches: OpenMP
• The API v4.5 describes OpenMP pragmas for GPUs
• At the moment IBM is the only major compiler supporting it (see http://www.openmp.org/resources/openmp-compilers/)
• Ideally works with the same model as OpenACC
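For comparison with the OpenACC example, the same vector sum could be written with OpenMP 4.5 target directives, as in the sketch below. It assumes a compiler with OpenMP device-offload support. Without a device, or when compiled without OpenMP, the loop simply runs on the host. The function name vec_add_omp is hypothetical.

```c
/* c = a + b, offloaded with OpenMP 4.5 target directives.
   map(to:) copies inputs host -> device; map(from:) copies the result back. */
void vec_add_omp(const double *a, const double *b, double *c, int n)
{
    #pragma omp target teams distribute parallel for \
        map(to: a[0:n], b[0:n]) map(from: c[0:n])
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}
```

Here the map(to:) and map(from:) clauses play roughly the role of OpenACC's copyin and copyout.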

CUDA Fortran
• PGI / NVIDIA collaboration
• Same CUDA programming model as CUDA C, with Fortran syntax
• Variables with the device attribute reside in GPU memory
• Use standard allocate, deallocate
• Copy between CPU and GPU with assignment statements: GPU_array = CPU_array
• Kernel loop directives (CUF kernels) to parallelize loops with device data

CPU & GPU
[Figure: Intel Xeon E5-2665 Sandy Bridge-EP 2.4 GHz CPU connected to a GPU; label "~8 GBytes"]

GPU platforms for HPC
• Deploying balanced and cost-effective GPU-based platforms is still really hard these days
• Management, usage and SW development for add-on accelerated platforms require skills and expertise
• NVLink promises high bandwidth between GPUs, but only IBM supports NVLink connections between GPU and CPU
• General-purpose, high-density GPU-based solutions are limited to specific cases

GPU SW Development and Applications
• GPU-based technology platforms evolve rapidly
• New features are often disruptive and require effort for software optimization
• Efficient GPU code requires constant updates and maintenance (nowadays very much true for CPU SW too)
• A few remarks on GPU-based SW for scientific computing
