Directive-Based Programming with OpenMP Shared Memory Programming - PowerPoint PPT Presentation

Directive-Based Programming with OpenMP

Shared Memory Programming
• Explicit thread creation (pthreads):
    pthread_t thread;
    pthread_create(&thread, &attr, some_function, (void *) &result);
    …
    pthread_join(thread, NULL);
• Tasks (C++11):
    auto handle = std::async(std::launch::async, some_code, ...);
    auto result = handle.get();
• Kernels (CUDA):
    __global__ void someKernel(...) {
        int idx = blockIdx.x*blockDim.x + threadIdx.x;
        // execute code for given index
    }
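The explicit-thread bullet can be made concrete with a minimal pthreads sketch. It is a point of comparison for the OpenMP versions later in the deck. Several details are illustrative assumptions: the names pthread_sum, sum_range and range_sum_t, the block-wise split of 0..n-1, and the use of default attributes (NULL where the slide passes &attr). Build with -pthread.

```c
#include <pthread.h>

#define MAX_WORKERS 16

/* Hypothetical work item: one thread sums the integers in [lo, hi). */
typedef struct {
    long lo, hi;
    long result;            /* written by the worker, read after the join */
} range_sum_t;

/* Thread entry point with the signature pthread_create expects. */
static void *sum_range(void *arg)
{
    range_sum_t *r = (range_sum_t *) arg;
    long s = 0;
    for (long i = r->lo; i < r->hi; i++)
        s += i;
    r->result = s;
    return NULL;
}

/* Sums 0..n-1 with explicitly created threads (at most MAX_WORKERS). */
long pthread_sum(long n, int nthreads)
{
    pthread_t threads[MAX_WORKERS];
    range_sum_t work[MAX_WORKERS];
    int created[MAX_WORKERS];

    if (nthreads < 1) nthreads = 1;
    if (nthreads > MAX_WORKERS) nthreads = MAX_WORKERS;

    for (int t = 0; t < nthreads; t++) {
        work[t].lo = n * t / nthreads;           /* contiguous block per thread */
        work[t].hi = n * (t + 1) / nthreads;
        /* NULL requests default thread attributes. */
        created[t] = pthread_create(&threads[t], NULL, sum_range, &work[t]) == 0;
        if (!created[t])
            sum_range(&work[t]);                 /* fall back to the caller */
    }

    long total = 0;
    for (int t = 0; t < nthreads; t++) {
        if (created[t])
            pthread_join(threads[t], NULL);      /* wait before reading result */
        total += work[t].result;
    }
    return total;
}
```

For example, pthread_sum(1000, 4) returns 499500. The thread management and the combining of partial results are all manual here. These are the steps OpenMP directives hide.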

OpenMP
• API for shared memory parallel programming targeting Fortran, C and C++
• specifications maintained by the OpenMP Architecture Review Board (ARB)
  – members include AMD, ARM, Cray, Intel, Fujitsu, IBM, NVIDIA
  – versions 1.0 (Fortran 1997, C/C++ 1998) to 3.1 (2011): shared memory
  – 4.0 (2013): accelerators, NUMA
  – 4.5 (2015): improved memory mapping, SIMD
  – 5.0 (2018): improved accelerator support
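A compiler with OpenMP enabled defines the macro _OPENMP as the yyyymm release date of the specification it supports. This lets code check which of the versions above it is built against. The sketch below maps the dates of the releases listed from 3.1 onward. The function names and the major*10 + minor encoding are illustrative assumptions.

```c
/* Converts a value of the _OPENMP macro (yyyymm of the supported
   specification) into major*10 + minor, e.g. 45 for 4.5.
   Only releases 3.1 to 5.0 are known here; other values yield 0. */
int openmp_version_from_macro(long yyyymm)
{
    switch (yyyymm) {
    case 201107: return 31;   /* July 2011 */
    case 201307: return 40;   /* July 2013 */
    case 201511: return 45;   /* November 2015 */
    case 201811: return 50;   /* November 2018 */
    default:     return 0;
    }
}

/* Version advertised by the compiler building this file,
   or -1 when OpenMP support is not enabled (no -fopenmp). */
int compiled_openmp_version(void)
{
#ifdef _OPENMP
    return openmp_version_from_macro(_OPENMP);
#else
    return -1;
#endif
}
```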

OpenMP
• comprises compiler directives, library routines and environment variables
  – C directives (case sensitive):
      #pragma omp directive-name [clause-list]
  – library calls begin with omp_:
      void omp_set_num_threads(int nthreads);
  – environment variables begin with OMP_:
      export OMP_NUM_THREADS=4
• requires compiler support
  – activated via the -fopenmp (gcc/clang) or -qopenmp (icc; older versions used -openmp) compiler flag
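The following sketch combines the three ingredients in one function: a directive, a library call and (at run time) an environment variable. It assumes a gcc or clang build with -fopenmp. The #ifndef fallbacks are an addition so that the file also compiles serially. The name team_size_for is hypothetical.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-ins so the sketch also builds without -fopenmp. */
static void omp_set_num_threads(int n) { (void) n; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* Requests nthreads via the library routine (this overrides the
   OMP_NUM_THREADS setting for later parallel regions) and reports
   the team size the runtime actually used, which may be smaller. */
int team_size_for(int nthreads)
{
    int team = 0;                 /* shared: declared outside the region */
    omp_set_num_threads(nthreads);

    #pragma omp parallel
    {
        #pragma omp single        /* one thread records the team size */
        team = omp_get_num_threads();
    }
    return team;
}
```

Typical usage: compile with `gcc -fopenmp prog.c`. Without the omp_set_num_threads call, `OMP_NUM_THREADS=4 ./a.out` would select the team size instead.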

The parallel Directive
• OpenMP uses a fork/join model: programs execute serially until they encounter a parallel directive:
  – this creates a group of threads
  – the number of threads depends on the OMP_NUM_THREADS environment variable, or can be set via a function call, e.g. omp_set_num_threads(nthreads)
  – the main thread becomes the master thread, with thread id 0
    #pragma omp parallel [clause-list]
    { /* structured block */ }
• each thread executes the structured block
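This sketch shows that every thread in the team runs the structured block once. Each thread marks its own slot in a shared array, and the master thread (id 0) records the team size. The function name and the MAX_THREADS bound are assumptions made for illustration.

```c
#ifdef _OPENMP
#include <omp.h>
#else
static int omp_get_thread_num(void) { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

#define MAX_THREADS 256

/* Returns the number of threads that executed the structured block,
   or -1 if that does not match the team size seen by the master. */
int count_participants(void)
{
    int seen[MAX_THREADS] = {0};   /* shared; each thread writes its own slot */
    int team = 1;

    #pragma omp parallel           /* fork */
    {
        int id = omp_get_thread_num();   /* private: declared in the block */
        if (id < MAX_THREADS)
            seen[id] = 1;
        if (id == 0)                     /* master thread */
            team = omp_get_num_threads();
    }                              /* join: implicit barrier */

    int marked = 0;
    for (int i = 0; i < MAX_THREADS; i++)
        marked += seen[i];
    return marked == team ? marked : -1;
}
```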

Fork/Join in OpenMP
• Conceptually, threads are created and destroyed for each parallel region; in practice, this is usually implemented as a thread pool
[Figure: fork/join diagram. A1, Fork join, CC BY 3.0]

Parallel Directive: Clauses
Clauses are used to specify:
• conditional parallelization: whether the parallel construct results in the creation/use of threads
    if (scalar-expression)
• degree of concurrency: explicit specification of the number of threads created/used
    num_threads(integer-expression)
• data handling: whether specific variables are local to each thread (allocated on the thread's stack), global (shared), or 'special'
    private(variable-list)
    shared(variable-list)
    firstprivate(variable-list)
    default(shared | none)
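The sketch below puts several of these clauses on one directive. The if clause decides whether a team is created at all, num_threads caps it at 4, default(none) forces every variable used in the block to be listed, and firstprivate gives each thread a copy of c initialized from the original. The atomic directive, which is not on this slide, keeps the shared counter update race-free. The function name is hypothetical.

```c
#ifdef _OPENMP
#include <omp.h>
#else
static int omp_get_num_threads(void) { return 1; }
#endif

/* Returns how many threads saw c == 42 in their private copy,
   or -1 if that differs from the team size. */
int firstprivate_check(int use_parallel)
{
    int c = 42;          /* firstprivate: each copy starts at 42 */
    int ok = 0;          /* shared: one copy, updated atomically */
    int team = 1;        /* shared: written by one thread in single */

    #pragma omp parallel if (use_parallel) num_threads(4) \
            default(none) firstprivate(c) shared(ok, team)
    {
        if (c == 42) {
            #pragma omp atomic
            ok++;
        }
        #pragma omp single
        team = omp_get_num_threads();
    }
    return ok == team ? ok : -1;
}
```

With use_parallel set to 0 the region runs on a single thread, so the result is 1. Otherwise, it is the team size (at most 4).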

Compiler Translation: OpenMP to Pthreads
• OpenMP code
    int main() {
        int a, b;
        // serial segment
        #pragma omp parallel num_threads(8) private(a) shared(b)
        {
            /* parallel segment */
        }
        // rest of serial segment
    }
• Pthreads equivalent (the structured block is outlined into a separate function)
    int main() {
        int a, b;
        // serial segment
        for (int i = 0; i < 8; i++)
            pthread_create(....., internal_thunk, ...);
        for (int i = 0; i < 8; i++)
            pthread_join(........);
        // rest of serial segment
    }
    void *internal_thunk(void *packaged_argument) {
        int a;
        /* parallel segment */
        return NULL;
    }
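The elided arguments in the translation can be filled in with a hand-outlined sketch of the same region. This is not what any particular compiler emits. It assumes a packaged argument struct that carries shared(b) as a pointer, while private(a) becomes a local in the thunk. The mutex is an added assumption so that the updates to b do not race. A real runtime usually reuses pooled threads and runs the master as thread 0 instead of creating all 8. Build with -pthread.

```c
#include <pthread.h>

#define NTHREADS 8   /* matches num_threads(8) */

/* Hypothetical packaged argument: shared variables travel by pointer. */
typedef struct {
    int *b;                   /* shared(b): one copy for all threads */
    pthread_mutex_t *lock;    /* protects updates to *b */
} packaged_args_t;

static void *internal_thunk(void *packaged_argument)
{
    packaged_args_t *args = (packaged_args_t *) packaged_argument;
    int a = 1;                /* private(a): a fresh local per thread */

    /* parallel segment: add the private a into the shared b */
    pthread_mutex_lock(args->lock);
    *args->b += a;
    pthread_mutex_unlock(args->lock);
    return NULL;
}

/* Returns b after the join, i.e. the number of threads that ran. */
int outlined_region(void)
{
    int b = 0;
    pthread_t threads[NTHREADS];
    pthread_mutex_t lock;
    packaged_args_t args = { &b, &lock };
    int created = 0;

    pthread_mutex_init(&lock, NULL);
    for (int i = 0; i < NTHREADS; i++)                 /* fork */
        if (pthread_create(&threads[created], NULL, internal_thunk, &args) == 0)
            created++;
    for (int i = 0; i < created; i++)                  /* join */
        pthread_join(threads[i], NULL);
    pthread_mutex_destroy(&lock);
    return b;
}
```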

Parallel Directive Examples
    #pragma omp parallel if (is_parallel == 1) num_threads(8) \
        private(a) shared(b) firstprivate(c)
• if the value of variable is_parallel is one, eight threads are used; otherwise the region runs on a single thread
• each thread has a private copy of a and c, but all share one copy of b
• the value of each private copy of c is initialized to the value of c before the parallel region (the private copies of a are uninitialized)
    #pragma omp parallel reduction(+ : sum) num_threads(8) \
        default(private)
• each of the eight threads gets a private copy of the variable sum, initialized to 0
• when the threads exit, the values of these local copies are accumulated into the sum variable of the master thread
  – other reduction operations include *, -, &, |, ^, &&, ||
• all variables are private unless otherwise specified (in C/C++, default(private) is accepted only from OpenMP 5.1; earlier versions allow only default(shared) or default(none) there)
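Here is a runnable version of the reduction example, limited to the parallel directive itself. Each thread takes a cyclic share of 1..n using its thread id. Its private sum starts at 0, and the private copies are added into the original sum when the region ends. The function name and the cyclic split are illustrative assumptions.

```c
#ifdef _OPENMP
#include <omp.h>
#else
static int omp_get_thread_num(void) { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* Sums 1..n; runs on 8 threads only when is_parallel == 1. */
long reduce_sum(long n, int is_parallel)
{
    long sum = 0;
    #pragma omp parallel if (is_parallel == 1) num_threads(8) reduction(+ : sum)
    {
        /* inside the region, sum is this thread's private copy (starts at 0) */
        long id = omp_get_thread_num();
        long nt = omp_get_num_threads();
        for (long i = 1 + id; i <= n; i += nt)   /* cyclic split of 1..n */
            sum += i;
    }   /* private copies are combined into the original sum here */
    return sum;
}
```

The result is the same on one thread or eight: reduce_sum(100, 1) and reduce_sum(100, 0) both return 5050.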

Example: Computing Pi
• compute π by generating random points in a square of side length 2 centred at (0,0), and counting the points falling within the circle of radius 1
  – area of square = 4, area of circle = πr² = π
  – the ratio of points inside the circle to the total number of points approaches π/4, so π ≈ 4 · sum / npoints
[Figure: Monte Carlo estimation of π. Jirah, Monte-Carlo01, CC BY-SA 3.0]
    #pragma omp parallel private(i) shared(npoints) \
        reduction(+ : sum) num_threads(8)
    {
        int seed = omp_get_thread_num();   // private
        int num_threads = omp_get_num_threads();
        sum = 0;
        for (i = 0; i < npoints / num_threads; i++) {
            double rand_x = (double) rand_range(&seed, -1, 1);
            double rand_y = (double) rand_range(&seed, -1, 1);
            if ((rand_x * rand_x + rand_y * rand_y) <= 1.0)
                sum++;
        }
    }
• variables declared inside the block (seed, num_threads, rand_x, rand_y) are private to each thread
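The slide leaves rand_range undefined, so this self-contained sketch supplies a hypothetical one. It is a splitmix64 generator whose state is passed by pointer, so each thread can own its seed. The per-thread seed offset (12345 + thread id) is also an assumption. A second reduction variable counts the points actually sampled, because npoints / num_threads may round down.

```c
#include <stdint.h>

#ifdef _OPENMP
#include <omp.h>
#else
static int omp_get_thread_num(void) { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* Hypothetical rand_range: splitmix64 step mapped to [lo, hi). */
static double rand_range(uint64_t *state, double lo, double hi)
{
    uint64_t z = (*state += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    z ^= z >> 31;
    double u = (double) (z >> 11) / 9007199254740992.0;  /* 53 bits -> [0, 1) */
    return lo + (hi - lo) * u;
}

/* Monte Carlo estimate of pi: 4 * (points in circle) / (points sampled). */
double estimate_pi(long npoints)
{
    long sum = 0, total = 0;
    #pragma omp parallel reduction(+ : sum, total) num_threads(8)
    {
        uint64_t seed = 12345u + (uint64_t) omp_get_thread_num(); /* private */
        long share = npoints / omp_get_num_threads();
        for (long i = 0; i < share; i++) {
            double x = rand_range(&seed, -1.0, 1.0);
            double y = rand_range(&seed, -1.0, 1.0);
            if (x * x + y * y <= 1.0)
                sum++;
        }
        total += share;
    }
    return total > 0 ? 4.0 * (double) sum / (double) total : 0.0;
}
```

With a million points, the statistical error of the estimate is on the order of 0.002. The exact digits depend on the team size, because each thread draws from its own stream.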

The for Work-Sharing Directive
• used inside a parallel region to partition the iterations of the subsequent for loop among the threads
    #pragma omp parallel shared(npoints) \
        reduction(+ : sum) num_threads(8)
    {
        int seed = omp_get_thread_num();
        sum = 0;
        #pragma omp for
        for (i = 0; i < npoints; i++) {
            double rand_x = (double) rand_range(&seed, -1, 1);
            double rand_y = (double) rand_range(&seed, -1, 1);
            if ((rand_x * rand_x + rand_y * rand_y) <= 1.0)
                sum++;
        }
    }
  – the loop index (i) is made private automatically
  – only two directives plus sequential code (the code is easy to read/maintain)
• implicit synchronization (a barrier) at the end of the loop
  – a nowait clause removes this barrier
• it is common to merge the directives: #pragma omp parallel for ...
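Two short loops illustrate the combined form and the nowait clause. In saxpy, each iteration writes a distinct y[i], so the loop needs no synchronization. In scale_two, the first loop's barrier is removed because the second loop does not read u. The region still ends with an implicit barrier. Both function names are hypothetical.

```c
#include <stddef.h>

/* y = a*x + y using the merged "parallel for" directive;
   the loop index i is private automatically. */
void saxpy(size_t n, float a, const float *x, float *y)
{
    #pragma omp parallel for
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Two independent loops in one region: nowait skips the barrier
   after the first loop, since the second loop does not touch u. */
void scale_two(long n, double *u, double *v)
{
    #pragma omp parallel
    {
        #pragma omp for nowait
        for (long i = 0; i < n; i++)
            u[i] *= 2.0;

        #pragma omp for
        for (long i = 0; i < n; i++)
            v[i] *= 3.0;
    }   /* implicit barrier at the end of the parallel region */
}
```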

Assigning Iterations to Threads
• the schedule clause of the for directive assigns iterations to threads
• schedule(static[, chunk-size])
  – splits the iteration space into chunks of size chunk-size and assigns them to threads in round-robin fashion
  – if chunk-size is unspecified, the iteration space is split into roughly equal chunks, one per thread
• schedule(dynamic[, chunk-size])
  – the iteration space is split into blocks of chunk-size iterations (default 1); each thread requests the next block when it finishes its current one
• schedule(guided[, chunk-size])
  – chunk size starts large and decreases roughly exponentially as iterations are assigned, down to a minimum of chunk-size
• schedule(runtime)
  – the schedule is determined by the setting of the OMP_SCHEDULE environment variable
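Of these schedules, only static with a chunk size gives a fully predictable assignment. The sketch records which thread ran each iteration under schedule(static, 2) and checks the round-robin rule: iteration i belongs to chunk i / 2, and chunk k goes to thread k % nthreads. Dynamic and guided assignments vary from run to run, so they are not checked here. The function name and the 20-iteration loop are arbitrary choices.

```c
#ifdef _OPENMP
#include <omp.h>
#else
static int omp_get_thread_num(void) { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

#define N_ITERS 20

/* Returns 1 if schedule(static, 2) followed the round-robin rule. */
int static_chunks_round_robin(void)
{
    int owner[N_ITERS];          /* shared; each iteration writes its own slot */
    int nthreads = 1;

    #pragma omp parallel
    {
        #pragma omp single       /* implicit barrier after single */
        nthreads = omp_get_num_threads();

        #pragma omp for schedule(static, 2)
        for (int i = 0; i < N_ITERS; i++)
            owner[i] = omp_get_thread_num();
    }

    for (int i = 0; i < N_ITERS; i++)
        if (owner[i] != (i / 2) % nthreads)
            return 0;
    return 1;
}
```

Replacing the clause with schedule(runtime) would let a command such as `OMP_SCHEDULE="dynamic,4" ./a.out` choose the schedule without recompiling. The check above would then no longer hold in general.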

Synchronization in OpenMP
• barrier: each thread waits until all others arrive (nowait: skip the implicit barrier at the end of a construct)
• single: the block is executed by one thread only; it ends with an implicit barrier unless nowait is given, so the explicit barrier below is redundant as written
    #pragma omp parallel
    {
        // my part of computation
        #pragma omp single
        { /* executed by one thread */ }
        #pragma omp barrier
        #pragma omp for nowait
        for (i = 0; i < N; i++) {
            // data parallel part
        }
        // threads continue here without waiting for each other
    }
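Here is a runnable variant of this pattern. One thread sets a shared scale factor inside single. Because that single carries nowait, the explicit barrier is what keeps the other threads from reading scale too early. The loop then has nowait, and the end of the parallel region still joins all threads. The names single_then_loop and LOOP_N are hypothetical.

```c
#define LOOP_N 100

/* Returns the sum of out[i] = scale * i after the parallel region. */
long single_then_loop(void)
{
    int scale = 0;
    int out[LOOP_N];

    #pragma omp parallel shared(scale, out)
    {
        #pragma omp single nowait
        scale = 3;                   /* executed by exactly one thread */

        #pragma omp barrier          /* everyone waits until scale is set */

        #pragma omp for nowait
        for (int i = 0; i < LOOP_N; i++)
            out[i] = scale * i;
    }   /* implicit barrier at the end of the parallel region */

    long total = 0;
    for (int i = 0; i < LOOP_N; i++)
        total += out[i];
    return total;
}
```

The result is 3 × (0 + 1 + … + 99) = 14850 regardless of the number of threads.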
