OpenMP 5.0 for Accelerators and What Comes Next
Tom Scogland and Bronis de Supinski, LLNL

LLNL-PRES-767542
This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under contract DE-AC52-07NA27344. Lawrence Livermore National Security, LLC
- One possible issue: nonmonotonic is now the default ordering for dynamic and guided schedules
- Significant extensions to improve usability and offload flexibility
  - OpenMP contexts, metadirective and declare variant
  - Addition of requires directive, including support for unified shared memory
  - Memory allocators and support for deep memory hierarchies
  - Descriptive loop construct
  - Release/acquire semantics added to memory model
- Host extensions that sometimes help
  - Ability to quiesce OpenMP threads
  - Support to print/inspect affinity state
  - Support for C/C++ array shaping
- First-party (OMPT) and third-party (OMPD) tool support
- Some significant extensions to existing functionality
  - Verbosity-reducing changes such as implicit declare target directives
  - User-defined mappers provide deep copy support for map clauses
  - Support for reverse offload
  - Support for task reductions (including on the taskloop construct), task affinity, new dependence types, depend objects and detachable tasks
  - Allows teams construct outside of target (i.e., on host)
  - Supports collapse of non-rectangular loops
  - Scan extension of reductions
- Major advances for base language normative references
  - Completed support for Fortran 2003
  - Added Fortran 2008, C11, C++11, C++14 and C++17
#pragma omp declare target
int do_something_with_p(mypoints_t &p_ref);
#pragma omp end declare target

mypoints_t *p = new_array_of_mypoints_t(N);
#pragma omp target enter data map(p[0:N])
for (int i = 0; i < N; ++i) {
  #pragma omp target enter data \
      map(p[i].needed_data[0:p[i].len])
}
#pragma omp target // can't express map here
{
  do_something_with_p(*p);
}

typedef struct mypoints {
  int len;
  double *needed_data;
  double useless_data[500000];
} mypoints_t;
#pragma omp declare target
int do_something_with_p(mypoints_t &p_ref);
#pragma omp end declare target

#pragma omp requires unified_shared_memory

mypoints_t *p = new_array_of_mypoints_t(N);
#pragma omp target // no map clauses needed
{
  do_something_with_p(*p);
}

typedef struct mypoints {
  int len;
  double *needed_data;
  double useless_data[500000];
} mypoints_t;
- Heterogeneous programming requires the compiler to generate versions of functions for the devices
- Generally requires the programmer to inform the compiler of the devices
- OpenMP 5.0 requires the compiler to assume device versions exist and to generate them when it can "see" the definition and a use on the device
// no declare target needed
int do_something_with_p(mypoints_t &p_ref);

#pragma omp requires unified_shared_memory

mypoints_t *p = new_array_of_mypoints_t(N);
#pragma omp target // no map clauses needed
{
  do_something_with_p(*p);
}

typedef struct mypoints {
  int len;
  double *needed_data;
  double useless_data[500000];
} mypoints_t;
- Not all devices support unified shared memory
- Painstaking care was required to map nested data structures by hand
- OpenMP 5.0 adds deep copy support through user-defined mappers
// no declare target needed
int do_something_with_p(mypoints_t *p);

#pragma omp declare mapper(mypoints_t v) \
    map(v.len, v.needed_data, \
        v.needed_data[0:v.len])

mypoints_t *p = new_array_of_mypoints_t(N);
#pragma omp target map(p[:N])
{
  do_something_with_p(p);
}

typedef struct mypoints {
  int len;
  double *needed_data;
  double useless_data[500000];
} mypoints_t;
#pragma omp requires reverse_offload

#pragma omp target map(inout: data[0:N])
{
  do_something_offloaded(data);

  #pragma omp target device(ancestor: 1)
  printf("back on the host right now\n");

  do_something_after_print_completes();

  #pragma omp target device(ancestor: 1) \
      map(inout: data[0:N])
  MPI_Isend(... data ...);

  do_more_work_after_MPI();
}
#pragma omp requires reverse_offload

#pragma omp target teams parallel num_teams(T) num_threads(N)
{
  #pragma omp target device(ancestor: 1)
  printf("back on the host right now\n");
  // called N*T times on the host, probably serially!
}
// context = {}
#pragma omp target teams
{
  // context = {target, teams}
  #pragma omp parallel
  {
    // context = {target, teams, parallel}
    #pragma omp simd aligned(a:64)
    for (...) {
      // context = {target, teams, parallel,
      //            simd(aligned(a:64), simdlen(8), notinbranch)}
      foo(a);
    }
  }
}
#pragma omp metadirective \
    when(device={kind(gpu)}: parallel for) \
    default(target teams distribute parallel for)
for (i = lb; i < ub; i++)
  v3[i] = v1[i] * v2[i];
...
- Started life many years (at least 5) ago as the super_if
- Especially important now that we have target constructs
- A metadirective is a directive that can specify multiple directive variants, of which one may be conditionally selected to replace the metadirective based on the enclosing OpenMP context.
When compiling to be called on a GPU:

#pragma omp parallel for
for (i = lb; i < ub; i++)
  v3[i] = v1[i] * v2[i];
...
When compiling for anything that is not a GPU:

#pragma omp target teams distribute parallel for
for (i = lb; i < ub; i++)
  v3[i] = v1[i] * v2[i];
...
When compiling for both:

#pragma omp target teams distribute parallel for
for (i = lb; i < ub; i++)
  v3[i] = v1[i] * v2[i];
...

#pragma omp parallel for
for (i = lb; i < ub; i++)
  v3[i] = v1[i] * v2[i];
...
#pragma omp declare variant( int important_stuff(int x) ) \
    match( context={target, simd}, device={arch(nvptx)} )
int important_stuff_nvidia(int x) {
  /* Specialized code for NVIDIA target */
}

#pragma omp declare variant( int important_stuff(int x) ) \
    match( context={target, simd(simdlen(4))}, device={isa(avx2)} )
__m256i _mm256_epi32_important_stuff(__m256i x) {
  /* Specialized code for simd loop called on an AVX2 processor */
}

...
int y = important_stuff(x);
When compiling for NVIDIA GPUs, the compiler translates the call to:

important_stuff_nvidia(x);

This may not be the supported name!
When compiling for AVX2, the compiler translates the call to:

_mm256_epi32_important_stuff(x);
- Introduced as #pragma omp concurrent in TR6
  - A loop construct specifies that the iterations of the associated loops may execute concurrently and permits the encountering thread(s) to execute the loop accordingly.
- Why?
  - It's descriptive!
  - Enables the compiler to make certain complex optimizations that would otherwise require dependency analysis
- Limitations
  - Not a complete replacement for do/for, yet!
  - User is responsible for the binding (teams, parallel, thread) of orphaned constructs.
int main(int argc, const char* argv[]) {
  float *x = (float*) malloc(n * sizeof(float));
  float *y = (float*) malloc(n * sizeof(float));
  // Define scalars n, a, b & initialize x, y

  #pragma omp target map(to: x[0:n]) map(tofrom: y[0:n])
  {
    #pragma omp teams  // generate parallelism
    #pragma omp loop   // assert to the compiler that it is safe
                       // to parallelize the next loop
    for (int i = 0; i < n; ++i) {
      y[i] = a*x[i] + y[i];
    }
  }
}
OpenMP:

while (error > tol && iter < iter_max) {
  error = 0.0;
  #pragma omp parallel for reduction(max: error)
  for (int j = 1; j < n - 1; j++) {
    #pragma omp simd
    for (int i = 1; i < m - 1; i++) {
      Anew[j][i] = 0.25 * (A[j][i+1] + A[j][i-1] + A[j-1][i] + A[j+1][i]);
      error = fmax(error, fabs(Anew[j][i] - A[j][i]));
    }
  }
  #pragma omp parallel for
  for (int j = 1; j < n - 1; j++) {
    #pragma omp simd
    for (int i = 1; i < m - 1; i++) {
      A[j][i] = Anew[j][i];
    }
  }
  if (iter++ % 100 == 0) printf("%5d, %0.6f\n", iter, error);
}

OpenACC:

while (error > tol && iter < iter_max) {
  error = 0.0;
  #pragma acc parallel loop reduction(max: error)
  for (int j = 1; j < n - 1; j++) {
    #pragma acc loop reduction(max: error)
    for (int i = 1; i < m - 1; i++) {
      Anew[j][i] = 0.25 * (A[j][i+1] + A[j][i-1] + A[j-1][i] + A[j+1][i]);
      error = fmax(error, fabs(Anew[j][i] - A[j][i]));
    }
  }
  #pragma acc parallel loop
  for (int j = 1; j < n - 1; j++) {
    #pragma acc loop
    for (int i = 1; i < m - 1; i++) {
      A[j][i] = Anew[j][i];
    }
  }
  if (iter++ % 100 == 0) printf("%5d, %0.6f\n", iter, error);
}

Jeff Larkin, "Performance Portability Through Descriptive Parallelism", 2016
OpenMP 5.0 with the loop construct:

while (error > tol && iter < iter_max) {
  error = 0.0;
  #pragma omp target teams
  #pragma omp loop reduction(max: error)
  for (int j = 1; j < n - 1; j++) {
    #pragma omp loop reduction(max: error)
    for (int i = 1; i < m - 1; i++) {
      Anew[j][i] = 0.25 * (A[j][i+1] + A[j][i-1] + A[j-1][i] + A[j+1][i]);
      error = fmax(error, fabs(Anew[j][i] - A[j][i]));
    }
  }
  #pragma omp target teams
  #pragma omp loop
  for (int j = 1; j < n - 1; j++) {
    #pragma omp loop
    for (int i = 1; i < m - 1; i++) {
      A[j][i] = Anew[j][i];
    }
  }
  if (iter++ % 100 == 0) printf("%5d, %0.6f\n", iter, error);
}

OpenACC:

while (error > tol && iter < iter_max) {
  error = 0.0;
  #pragma acc parallel loop reduction(max: error)
  for (int j = 1; j < n - 1; j++) {
    #pragma acc loop reduction(max: error)
    for (int i = 1; i < m - 1; i++) {
      Anew[j][i] = 0.25 * (A[j][i+1] + A[j][i-1] + A[j-1][i] + A[j+1][i]);
      error = fmax(error, fabs(Anew[j][i] - A[j][i]));
    }
  }
  #pragma acc parallel loop
  for (int j = 1; j < n - 1; j++) {
    #pragma acc loop
    for (int i = 1; i < m - 1; i++) {
      A[j][i] = Anew[j][i];
    }
  }
  if (iter++ % 100 == 0) printf("%5d, %0.6f\n", iter, error);
}

Jeff Larkin, "Performance Portability Through Descriptive Parallelism", 2016
- New allocate clause on all constructs with data-sharing clauses
- Allocation: omp_alloc()
- Deallocation: omp_free()
- allocate directive
void allocator_example(omp_allocator_t *my_allocator) {
  int a[M], b[N], c;
  #pragma omp allocate(a) allocator(omp_high_bw_mem_alloc)
  #pragma omp allocate(b) // controlled by OMP_ALLOCATOR
                          // and/or omp_set_default_allocator

  double *p = (double *) omp_alloc(N*M*sizeof(*p), my_allocator);

  #pragma omp parallel private(a) allocate(my_allocator: a)
  {
    some_parallel_code();
  }

  #pragma omp target firstprivate(c) allocate(omp_const_mem_alloc: c)
  {
    #pragma omp parallel private(a) allocate(omp_high_bw_mem_alloc: a)
    some_other_parallel_code();
  }
}
// CUDA
__global__ void staticReverse(int *d, int n) {
  __shared__ int s[64];
  int t = threadIdx.x;
  int tr = n - t - 1;
  s[t] = d[t];
  __syncthreads();
  d[t] = s[tr];
}

// OpenMP 5
#pragma omp target parallel private(s)
{
  int s[64];
  #pragma omp allocate(s) allocator(omp_pteam_mem_alloc)
  int t = omp_get_thread_num();
  int tr = n - t - 1;
  s[t] = d[t];
  #pragma omp barrier
  d[t] = s[tr];
}
A             1  2  3   4   5   6  ...
B (inclusive) 1  3  6  10  15  21  ...
B (exclusive) 0  1  3   6  10  15  ...
- Proceedings of the IEEE article on vision: "The Ongoing Evolution of OpenMP"
  - Broadly support on-node performant, portable parallelism
  - OpenMP 5.0 fits within that vision
  - OpenMP 5.1 will refine how OpenMP 5.0 realizes it
  - OpenMP 6.0 will be a major step to further realizing it
- Expect issues from detailed implementation and use of OpenMP 5.0, which is big
- Guarantee: OpenMP 5.1 will not break existing code
- Clarifications, corrections and maybe some small extensions
  - Improved native device support (e.g., CUDA streams)
  - May add taskloop dependences
  - Other small extensions, which must entail a small implementation burden
- Deeper support for descriptive and prescriptive control
- More support for memory affinity and complex hierarchies
- Support for pipelining and other computation/data associations
- Continued improvements to device support
  - Extensions of deep copy support (serialize/deserialize functions)
- Task-only, unshackled or free-agent threads
- Re-usable tasks or task graphs
- Event-driven parallelism
- Completing support for new normative references
- 38 5.1 tickets already; 2 tickets already deferred to 6.0
Visit www.openmp.org for more information