OpenACC 2.0 and Beyond: PGI Accelerator Compilers and Tools


One Slide Intro to OpenACC Directives

  Manage Data Movement (acc data)
  Initiate Parallel Execution (acc parallel)
  Optimize Loop Mappings (acc loop)

    #pragma acc data copyin(x[0:n],y[0:n]) copyout(z[0:n])
    {
      ...
      #pragma acc parallel
      {
        #pragma acc loop gang vector
        for (i = 0; i < n; ++i) {
          z[i] = x[i] + y[i];
          ...
        }
      }
      ...
    }

  Incremental
  Single source
  Interoperable
  Performance portable
  CPU, GPU, MIC

OpenACC 2.0 Highlights

  acc routine
  acc loop tile
  acc enter / exit data
  default(none)
  acc atomic
  device_type(...)
  acc wait async
  new API routines
  acc parallel wait()
  many clarifications
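Two of the highlights above, acc loop tile and default(none), have no slide of their own. The following is a minimal sketch of how they might be combined. The function name transpose_tiled and the 16x16 tile size are illustrative assumptions, not taken from the presentation.

```c
/* Transpose an n x m row-major matrix `in` into the m x n matrix `out`.
 * Hypothetical example: name and tile size are assumptions. */
void transpose_tiled(const float *in, float *out, int n, int m)
{
    /* default(none): every variable used in the region must be listed
     * explicitly, so a forgotten array becomes a compile-time error.
     * tile(16, 16): the two tightly nested loops are split into
     * 16x16 tiles. */
    #pragma acc parallel loop tile(16, 16) default(none) \
        copyin(in[0:n*m]) copyout(out[0:n*m]) firstprivate(n, m)
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < m; ++j) {
            out[j * n + i] = in[i * m + j];
        }
    }
}
```

A compiler without OpenACC support ignores the pragmas and runs the loops serially, so the result can be checked on the host before tuning the tile size.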

OpenACC 2.0 acc routine

    #pragma acc routine vector
    float dotprod( float* a, float* b, int n ){
        float sum = 0.0f;
        #pragma acc loop vector reduction(+:sum)
        for( int i = 0; i < n; ++i )
            sum += a[i]*b[i];
        return sum;
    }
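The slide shows only the routine itself. As a sketch of how a vector routine might be called, the example below invokes dotprod once per row from a gang loop. The caller matvec and its parameters are hypothetical additions for illustration.

```c
/* Vector-level device routine, as on the slide (const added). */
#pragma acc routine vector
float dotprod(const float *a, const float *b, int n)
{
    float sum = 0.0f;
    #pragma acc loop vector reduction(+:sum)
    for (int i = 0; i < n; ++i)
        sum += a[i] * b[i];
    return sum;
}

/* Hypothetical caller: y = A * x for a row-major rows x cols matrix A.
 * Each gang handles one row; the vector lanes of that gang cooperate
 * inside dotprod. */
void matvec(const float *A, const float *x, float *y, int rows, int cols)
{
    #pragma acc parallel loop gang \
        copyin(A[0:rows*cols], x[0:cols]) copyout(y[0:rows])
    for (int r = 0; r < rows; ++r)
        y[r] = dotprod(&A[r * cols], x, cols);
}
```

Because the routine is declared vector, the call site uses only gang parallelism and leaves the vector level to the callee.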

OpenACC 2.0 acc enter data and acc exit data

    template<typename T> class v {
        T* _data;
        size_t _size;
        ...
        void move_to_device(){
            #pragma acc enter data copyin(this, \
                                          _data[0:_size])
        }
        void update_host(){
            #pragma acc update self(_data[0:_size])
        }
        ...

OpenACC 2.0 acc atomic

    #pragma acc parallel loop
    for( i = 0; i < n; ++i ){
        x = index[i];
        #pragma acc atomic update
        hist[x]++;
    }
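For context, the atomic update above could be wrapped in a complete function along the following lines. This is a sketch: the function name histogram, the nbins parameter, and the data clauses are assumptions, and the caller is assumed to guarantee that every index value lies in [0, nbins).

```c
/* Hypothetical wrapper around the slide's loop.
 * Assumes 0 <= index[i] < nbins for all i. */
void histogram(const int *index, int n, int *hist, int nbins)
{
    #pragma acc data copyin(index[0:n]) copy(hist[0:nbins])
    {
        #pragma acc parallel loop
        for (int i = 0; i < n; ++i) {
            /* Declared inside the loop so each iteration has its own x. */
            int x = index[i];
            /* Several iterations may hit the same bin, so the
             * increment must be atomic. */
            #pragma acc atomic update
            hist[x]++;
        }
    }
}
```

The copy clause on hist keeps any counts already accumulated on the host, so the function can be called repeatedly on successive batches.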

OpenACC 2.0 acc wait async

    #pragma acc parallel loop async(1)
    for(...){...}
    #pragma acc parallel loop async(2)
    for(...){...}

    #pragma acc wait(1) async(2)

    #pragma acc parallel loop async(2) wait(1)
    for(...){...}
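The elided loops above can be filled in with a small producer/consumer pattern. The sketch below is one possible reading, with hypothetical array names and loop bodies: two independent kernels run on queues 1 and 2, and a third kernel on queue 2 uses the wait clause to depend on queue 1.

```c
/* Hypothetical example: names and loop bodies are assumptions. */
void two_queues(float *a, float *b, float *c, int n)
{
    #pragma acc data copy(a[0:n], b[0:n]) copyout(c[0:n])
    {
        /* Independent work on queue 1. */
        #pragma acc parallel loop async(1)
        for (int i = 0; i < n; ++i)
            a[i] = 2.0f * a[i];

        /* Independent work on queue 2, may overlap with queue 1. */
        #pragma acc parallel loop async(2)
        for (int i = 0; i < n; ++i)
            b[i] = b[i] + 1.0f;

        /* Runs on queue 2 (so after the b update) and also waits for
         * queue 1 (the a update) without blocking the host. */
        #pragma acc parallel loop async(2) wait(1)
        for (int i = 0; i < n; ++i)
            c[i] = a[i] + b[i];

        /* Block the host until all queues drain, before the data
         * region copies results back. */
        #pragma acc wait
    }
}
```

The wait clause on the last kernel plays the same role as a standalone wait(1) async(2) directive placed before it; only one of the two forms is needed.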

PGI 2015 Additions
C++ class data member in OpenACC data clauses

    template<typename T> class myvect {
        T* _data;
        size_t _size;
    public:
        // ...
        void dev_create(){
            #pragma acc enter data copyin(this)
            #pragma acc enter data copyin(_data[0:_size])
        }
        void host_update(){
            #pragma acc update self(_data[0:_size])
        }
    };

PGI 2015 Additions
Managed Memory Support (beta feature)

    pgc++ -ta=tesla:managed

  malloc, calloc, free, new, delete, allocatable: replaced with managed allocate/free
  limitations
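As a rough illustration of what the managed option is meant to allow, the sketch below allocates with plain malloc and uses the buffer in compute regions without any data clauses. It assumes compilation with the -ta=tesla:managed flag shown above; without that flag, explicit data clauses for v would normally be needed. The function name managed_sum is hypothetical.

```c
#include <stdlib.h>

/* Hypothetical example: fills v[i] = i and returns the sum,
 * or -1.0 if allocation fails. Relies on managed memory so that the
 * malloc'ed buffer is reachable from the device without data clauses. */
double managed_sum(int n)
{
    float *v = malloc((size_t)n * sizeof *v);
    if (v == NULL)
        return -1.0;

    double sum = 0.0;

    /* No copyin/copyout for v: under managed memory the allocation
     * is assumed to be visible to both host and device. */
    #pragma acc parallel loop
    for (int i = 0; i < n; ++i)
        v[i] = (float)i;

    #pragma acc parallel loop reduction(+:sum)
    for (int i = 0; i < n; ++i)
        sum += v[i];

    free(v);
    return sum;
}
```

Only dynamically allocated data is covered by this approach, which is one of the limitations the slide alludes to; stack and static arrays would still need data clauses.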

PGI 2015 OpenACC Performance – NIM (NOAA)

All times measured on a K20x, not including data transfers from host memory to device memory.

[Figure: three bar charts of kernel time in microseconds for VDMINTV, VDMINTS, and FLUX, each comparing F2C-ACC, PGI 2014, and PGI 2015.]

PGI 2015 OpenACC Performance Enhancements:
  !$ACC CACHE directive
  Variable length VECTOR support
  Scalar replacement optimizations
  Short loop optimizations

OpenACC 2.5 (in design) acc data

  copy(x) == present_or_copy(x)

    #pragma acc data present_or_copy(x[0:n]) copy(b[0:n])
    {
      ....
    }
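The practical effect of treating copy as present_or_copy shows up when a routine with its own copy clause is called from inside an enclosing data region. The sketch below assumes the proposed 2.5 semantics; the function names scale and scale_twice are hypothetical.

```c
/* Hypothetical library-style routine with its own data clause. */
void scale(float *x, int n, float s)
{
    /* Under the proposed semantics, copy(x) acts as present_or_copy(x):
     * if x is already on the device, no allocation or transfer happens
     * here; otherwise x is copied in and back out. */
    #pragma acc parallel loop copy(x[0:n])
    for (int i = 0; i < n; ++i)
        x[i] *= s;
}

/* Hypothetical caller that keeps x resident across both calls. */
void scale_twice(float *x, int n, float s)
{
    #pragma acc data copy(x[0:n])
    {
        scale(x, n, s);   /* x is present: no transfer */
        scale(x, n, s);   /* x is present: no transfer */
    }
}
```

The same scale function then works both standalone and nested, without a separate present_or_copy spelling.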

OpenACC 2.5 (in design) acc declare(allocatable)

    module m
      real, allocatable :: a(:,:)
      !$acc declare create(a)
    end module

    subroutine init(n)
      use m
      allocate(a(n,n))
      ...

OpenACC 2.5 (in design) default(present)

    #pragma acc parallel loop default(present)
    for( i = 0; i < n; ++i )
        a[i] = expf(b[i]) * cosf(c[i]);
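A default(present) region only makes sense when some outer scope has already placed the arrays on the device. The sketch below shows one such arrangement under the proposed semantics. The function names apply and run are hypothetical, and a simple product stands in for the slide's math expression so the example needs no math library.

```c
/* Hypothetical kernel: assumes a, b and c are already on the device. */
void apply(float *a, const float *b, const float *c, int n)
{
    /* default(present): arrays without an explicit clause are treated
     * as present, so a missing outer data region is reported at run
     * time instead of triggering an implicit copy. */
    #pragma acc parallel loop default(present)
    for (int i = 0; i < n; ++i)
        a[i] = b[i] * c[i];
}

/* Hypothetical driver that establishes the device data. */
void run(float *a, const float *b, const float *c, int n)
{
    #pragma acc data copyin(b[0:n], c[0:n]) copyout(a[0:n])
    {
        apply(a, b, c, n);
    }
}
```

This keeps all data movement decisions in run, while apply stays free of data clauses.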

OpenACC 3.0 (in design) Deep Copy – Data Structure Management

    template<typename T> class reactor {
        class magnet* m;
        class laser* l;
        class coolant* c;
        class steampipe* s;
    };
    ...
    class reactor R;
    #pragma acc enter data copyin(R)
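Until a deep copy feature exists, nested data is typically moved by hand, one level at a time. The sketch below shows that manual pattern for a small C struct with one pointer member. The field type and function names are hypothetical, and the sketch assumes a compiler that accepts struct members in data clauses, in the spirit of the PGI 2015 class-member support described earlier.

```c
/* Hypothetical structure with one level of indirection. */
typedef struct {
    int n;
    float *data;
} field;

/* Manual deep copy: first the struct itself (a shallow copy), then the
 * array it points to, so the device-side pointer can be attached. */
void field_to_device(field *f)
{
    #pragma acc enter data copyin(f[0:1])
    #pragma acc enter data copyin(f->data[0:f->n])
}

/* Reverse order on the way out: array first, then the struct. */
void field_from_device(field *f)
{
    #pragma acc exit data copyout(f->data[0:f->n])
    #pragma acc exit data delete(f[0:1])
}

/* Kernel operating on the device-resident structure. */
void field_scale(field *f, float s)
{
    #pragma acc parallel loop present(f[0:1])
    for (int i = 0; i < f->n; ++i)
        f->data[i] *= s;
}
```

Every additional pointer member needs its own pair of enter/exit directives, which is the bookkeeping a single copyin(R) with deep copy would remove.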

Future of OpenACC

  On Future Supercomputers
  High Performance
  Highly Parallel
  Descriptive
  Performance Portable
  Data Management
  Parallelism Management
  PGI Commitment

[Diagram labels: Shared Cache, High Capacity Memory, High Bandwidth Memory]

