Parallel programming 03
Walter Boscheri (walter.boscheri@unife.it)
University of Ferrara - Department of Mathematics and Computer Science
A.Y. 2018/2019 - Semester I
Outline
1. Introduction to OpenMP
2. OpenMP directives
3. OpenMP synchronization
4. OpenMP syntax and main commands
5. OpenMP optimization
6. OpenMP SIMD
7. Exercise
1. Introduction to OpenMP
OpenMP overview
OpenMP is a standard programming model for shared memory parallel programming:
- portable across shared memory architectures
- FORTRAN binding
- OpenMP is a standard
- OpenMP is the easiest approach to multi-threaded programming
Walter Boscheri Parallel programming 03 2 / 27
Where to use OpenMP
[Figure: choice of paradigm as a function of the number of CPUs and the problem size - scalar code for small problems, OpenMP for medium ones, MPI for large ones; for very small problems the computation is dominated by overhead.]
OpenMP programming model
- OpenMP is a shared memory model
- workload is distributed among threads
- variables can be: shared among all threads, or duplicated for each thread
- threads communicate by sharing variables
- unintended sharing of data can lead to race conditions: the program's outcome changes as the threads are scheduled differently
- use synchronization to protect data conflicts and control race conditions
OpenMP execution model
- execution begins as a single process (master thread)
- start of a parallel construct: the master thread creates a team of threads
- completion of a parallel construct: the threads in the team synchronize (implicit barrier)
- only the master thread continues execution

Block of code to be executed by multiple threads in parallel:

    !$OMP PARALLEL
    ! block of work
    !$OMP END PARALLEL
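A minimal sketch of this execution model (program name is hypothetical; assumes a compiler with OpenMP support, e.g. gfortran -fopenmp):

```fortran
PROGRAM hello_omp
  USE omp_lib
  IMPLICIT NONE
  ! Serial part: only the master thread runs here
  PRINT *, 'Before the parallel region'
  !$OMP PARALLEL
  ! Parallel part: executed by every thread in the team
  PRINT *, 'Hello from thread ', OMP_GET_THREAD_NUM(), &
           ' of ', OMP_GET_NUM_THREADS()
  !$OMP END PARALLEL
  ! Implicit barrier above: only the master thread continues
  PRINT *, 'After the parallel region'
END PROGRAM hello_omp
```

Each thread prints its own ID; the order of the "Hello" lines varies from run to run because thread scheduling is nondeterministic.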
2. OpenMP directives
OpenMP directive format: FORTRAN
- include file for library routines: USE omp_lib, or INCLUDE 'omp_lib.h'
- OpenMP sentinel: !$OMP
- conditional compilation: !$
- integration in Visual Studio 2019: [Figure: enabling OpenMP support in the project settings]
OpenMP runtime library routines
- get the maximum number of threads NCPU which are available: NPRCS = OMP_GET_NUM_PROCS()
- set the number of threads NCPU that have to be used: CALL OMP_SET_NUM_THREADS(NCPU)
- wall clock timers which provide elapsed time:

    t_start = OMP_GET_WTIME()
    ! work to be measured
    t_end = OMP_GET_WTIME()
    PRINT *, ' Work took ', t_end - t_start, ' seconds '

- the function OMP_GET_WTICK() returns the number of seconds between two successive clock ticks
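A runnable sketch combining these routines (program name and the measured loop are hypothetical):

```fortran
PROGRAM timing_omp
  USE omp_lib
  IMPLICIT NONE
  INTEGER :: i, nprcs
  REAL(8) :: t_start, t_end, s
  nprcs = OMP_GET_NUM_PROCS()        ! number of available processors
  CALL OMP_SET_NUM_THREADS(nprcs)    ! use all of them
  s = 0.0d0
  t_start = OMP_GET_WTIME()
  !$OMP PARALLEL DO REDUCTION(+:s)
  DO i = 1, 1000000
     s = s + 1.0d0/DBLE(i)           ! work to be measured
  ENDDO
  !$OMP END PARALLEL DO
  t_end = OMP_GET_WTIME()
  PRINT *, ' Work took ', t_end - t_start, ' seconds'
  PRINT *, ' Timer resolution: ', OMP_GET_WTICK(), ' seconds'
END PROGRAM timing_omp
```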
OpenMP variables
Private and shared variables:
- private (list): declares the variables in list to be private to each thread in a team
- shared (list): makes the variables that appear in list shared among all the threads in a team

N.B. if not specified, the default is shared. Exceptions:
- local variables in called sub-programs are private
- the loop control variable of a parallel !$OMP DO loop is private
Worksharing and Synchronization
Which thread executes which statement or operation? And when?
⇓
Worksharing constructs, master and synchronization constructs.
This is how the parallel work is organized!
OpenMP worksharing constructs
- divide the execution of the enclosed code region among the members of the team
- must be enclosed dynamically within a parallel region
- they do not launch new threads
- no implied barrier on entry
OpenMP SECTIONS directive
    !$OMP PARALLEL
    !$OMP SECTIONS
    a = ...
    b = ...
    !$OMP SECTION
    c = ...
    d = ...
    !$OMP SECTION
    e = ...
    f = ...
    !$OMP SECTION
    g = ...
    h = ...
    !$OMP END SECTIONS
    !$OMP END PARALLEL

[Figure: each SECTION block is assigned to one thread of the team and the blocks are executed concurrently.]
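A runnable sketch of the SECTIONS construct (names and values are hypothetical; compile with e.g. gfortran -fopenmp):

```fortran
PROGRAM sections_omp
  IMPLICIT NONE
  REAL :: a, b, c, d
  !$OMP PARALLEL
  !$OMP SECTIONS
  !$OMP SECTION
  a = 1.0   ! first independent unit of work
  b = 2.0
  !$OMP SECTION
  c = 3.0   ! second independent unit, possibly on another thread
  d = 4.0
  !$OMP END SECTIONS
  !$OMP END PARALLEL
  PRINT *, a, b, c, d
END PROGRAM sections_omp
```

The two sections carry no data dependence on each other, so the result is the same regardless of which thread executes which section.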
OpenMP DO directive
    !$OMP PARALLEL
    a = 5
    !$OMP DO
    DO i = 1, 20
       c(i) = b(i) + a*i
    ENDDO
    !$OMP END DO
    !$OMP END PARALLEL

[Figure: a = 5 is executed by every thread; the 20 iterations are split among 4 threads (i=1-5, 6-10, 11-15, 16-20).]
Clauses for !$OMP DO:
- private (list): declares the variables in list to be private to each thread in a team
- shared (list): makes the variables that appear in list shared among all the threads in a team
- collapse (n), with constant integer n: the iterations of the following n nested loops are collapsed into one larger iteration space

    !$OMP PARALLEL DO PRIVATE(j) COLLAPSE(2)
    DO i = 1, 4
       DO j = 1, 100
          ! each collapsed iteration writes a distinct element: no race
          a(i,j) = b(j) + 4
       ENDDO
    ENDDO
    !$OMP END PARALLEL DO
Clauses for !$OMP DO:
- reduction (operator: list): performs a reduction on the variables that appear in list with the operator operator. It can be one of the following: +, -, *, .AND., .OR., MAX, MIN
- the variables must be shared. At the end of the reduction, the shared variable is updated to reflect the result of combining its original value with the final value of each of the private copies using the specified operator.

    sm = 0
    !$OMP PARALLEL DO PRIVATE(r) REDUCTION(+:sm)
    DO i = 1, 20
       r = work(i)
       sm = sm + r
    ENDDO
    !$OMP END PARALLEL DO
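A self-contained version of the reduction above (work(i) is replaced by the hypothetical stand-in DBLE(i) so the result can be checked):

```fortran
PROGRAM reduction_omp
  IMPLICIT NONE
  INTEGER :: i
  REAL(8) :: sm, r
  sm = 0.0d0
  !$OMP PARALLEL DO PRIVATE(r) REDUCTION(+:sm)
  DO i = 1, 20
     r = DBLE(i)      ! stands in for work(i) on the slide
     sm = sm + r      ! each thread accumulates into its private copy of sm
  ENDDO
  !$OMP END PARALLEL DO
  ! The private partial sums are combined with + into the shared sm
  PRINT *, sm          ! 210 = 1+2+...+20, independent of the thread count
END PROGRAM reduction_omp
```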
Clauses for !$OMP DO:
- nowait: there is an implicit barrier at the end of !$OMP DO unless nowait is specified. If nowait is specified, threads do not synchronize at the end of the parallel loop.
- schedule (type [,chunk]), where type can be:
  - static: iterations are divided into pieces of a size specified by chunk
  - dynamic: iterations are broken into pieces of a size specified by chunk. As each thread finishes a piece of the iteration space, it dynamically obtains the next set of iterations.
  - guided: the chunk size is reduced in an exponentially decreasing manner with each dispatched piece of the iteration space; chunk specifies the smallest piece
  - auto: scheduling is delegated to the compiler and/or runtime system
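A sketch of dynamic scheduling on a loop with strongly varying iteration cost (program name and the artificial workload are hypothetical):

```fortran
PROGRAM schedule_omp
  IMPLICIT NONE
  INTEGER, PARAMETER :: n = 100
  INTEGER :: i, j
  REAL(8) :: c(n), s
  ! The inner loop length grows with i, so iterations have very different
  ! cost: DYNAMIC scheduling with chunks of 4 balances the load better
  ! than a STATIC split would.
  !$OMP PARALLEL DO PRIVATE(j,s) SCHEDULE(DYNAMIC,4)
  DO i = 1, n
     s = 0.0d0
     DO j = 1, i*1000
        s = s + 1.0d0
     ENDDO
     c(i) = s
  ENDDO
  !$OMP END PARALLEL DO
  PRINT *, c(n)   ! 100000.0 regardless of the schedule chosen
END PROGRAM schedule_omp
```

The schedule clause changes only how iterations are handed out, never the result, as long as the iterations are independent.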
OpenMP WORKSHARE directive
WORKSHARE directive allows parallelization of array expressions and FORALL statements.
    !$OMP WORKSHARE
    A = B
    FORALL(i=1:N, j=1:N, A(i,j).NE.0.0) B(i,j) = 1.0/A(i,j)
    !$OMP END WORKSHARE

- work inside the block is divided into separate units of work
- each unit of work is executed only once
- the units of work are assigned to the threads in any manner
- similar to PARALLEL DO without explicit loops
OpenMP TASK directive
When a thread encounters a TASK construct, a task is generated from the code of the associated structured block. TASK defines an explicit task.

    !$OMP TASK [clause]
    ! block of work
    !$OMP END TASK

Clauses:
- untied
- default(shared | none | private)
- private (list)
- shared (list)
- the encountering thread may immediately execute the task, or may defer its execution
- the number of tasks is not limited to the number of threads!
- completion of a task can be guaranteed using task synchronization constructs ⇓ !$OMP TASKWAIT
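A sketch of task generation and TASKWAIT (the FIRSTPRIVATE clause, which captures the value of i at task creation, is an assumption not listed on the slide):

```fortran
PROGRAM task_omp
  USE omp_lib
  IMPLICIT NONE
  INTEGER :: i
  !$OMP PARALLEL
  !$OMP SINGLE
  ! One thread generates 8 tasks; any thread of the team may run them,
  ! immediately or deferred.
  DO i = 1, 8
     !$OMP TASK FIRSTPRIVATE(i)
     PRINT *, 'task ', i, ' run by thread ', OMP_GET_THREAD_NUM()
     !$OMP END TASK
  ENDDO
  !$OMP TASKWAIT   ! wait here until all generated tasks have completed
  !$OMP END SINGLE
  !$OMP END PARALLEL
END PROGRAM task_omp
```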
OpenMP SINGLE directive
The block is executed by only one thread in the team, which need not necessarily be the master thread.

    !$OMP SINGLE
    ! block of work
    !$OMP END SINGLE

- implicit barrier at the end of the SINGLE construct (unless nowait is specified)

To reduce the overhead, one can combine several parallel parts (DO, WORKSHARE, SECTIONS) and sequential parts (SINGLE) in one parallel region (!$OMP PARALLEL ... !$OMP END PARALLEL).
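A sketch of one parallel region combining worksharing and a sequential part (program name, array sizes and values are hypothetical):

```fortran
PROGRAM single_omp
  IMPLICIT NONE
  INTEGER, PARAMETER :: n = 8
  INTEGER :: i
  REAL(8) :: a(n), b(n)
  b = 1.0d0
  !$OMP PARALLEL
  !$OMP DO
  DO i = 1, n
     a(i) = b(i)**2
  ENDDO
  !$OMP END DO               ! implicit barrier: a is complete
  !$OMP SINGLE
  PRINT *, 'first loop done' ! executed by one thread only
  !$OMP END SINGLE           ! implicit barrier before the next loop
  !$OMP DO
  DO i = 1, n
     b(i) = a(i) + 1.0d0
  ENDDO
  !$OMP END DO
  !$OMP END PARALLEL
  PRINT *, b(1)              ! 2.0
END PROGRAM single_omp
```

The team of threads is created once, instead of once per construct, which saves the fork/join overhead.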
3. OpenMP synchronization
OpenMP synchronization
implicit barrier:
- at the beginning and end of parallel constructs
- at the end of all other constructs
- implicit synchronization can be removed by means of the nowait clause

explicit: CRITICAL directive

    !$OMP CRITICAL
    ! block of work
    !$OMP END CRITICAL

A thread waits at the beginning of a critical region until no other thread in the team is executing a critical region with the same name. All unnamed CRITICAL directives map to the same unspecified name.
OpenMP CRITICAL directive
    cnt = 0
    a = 5
    !$OMP PARALLEL
    !$OMP DO
    DO i = 1, 20
       IF(b(i).EQ.0) THEN
          !$OMP CRITICAL
          cnt = cnt + 1
          !$OMP END CRITICAL
       ENDIF
       c(i) = b(i) + a*i
    ENDDO
    !$OMP END DO
    !$OMP END PARALLEL

[Figure: iterations split among 4 threads (i=1-5, 6-10, 11-15, 16-20); every thread evaluates the IF, but at most one thread at a time executes cnt = cnt + 1.]
Race conditions
    !$OMP PARALLEL SECTIONS
    A = A + B
    !$OMP SECTION
    B = A + C
    !$OMP SECTION
    C = B + A
    !$OMP END PARALLEL SECTIONS

- the result varies unpredictably, depending on the specific order in which the sections are executed
- wrong answers are produced without warning
    !$OMP PARALLEL SHARED(X) PRIVATE(tmp,ID)
    ID = OMP_GET_THREAD_NUM()
    !$OMP DO REDUCTION(+:X)
    DO i = 1, 100
       tmp = work1(i)
       X = X + tmp
    ENDDO
    !$OMP END DO NOWAIT
    Y(ID) = work2(X,ID)
    !$OMP END PARALLEL

- the result varies unpredictably because the value of X is not uniquely fixed until the barrier at the end of the DO loop, which NOWAIT removes
- be careful when using NOWAIT
OpenMP overheads
Overheads of some OpenMP directives given in microseconds.
4. OpenMP syntax and main commands
Main OpenMP directives and routines
- !$OMP PARALLEL: forms a team of threads and starts parallel execution
- !$OMP DO: the iterations of the associated loops are executed in parallel by the threads in the team
- !$OMP SINGLE: the associated structured block is executed by only one of the threads in the team
- !$OMP WORKSHARE: divides the execution of the enclosed structured block into separate units of work, each executed only once by one thread [nowait]
- !$OMP BARRIER: synchronizes all threads
- !$OMP SECTIONS: a noniterative worksharing construct that contains a set of structured blocks to be distributed among and executed by the threads in a team
- timing routines: OMP_GET_WTIME() provides elapsed time; OMP_GET_WTICK() returns the number of seconds between two consecutive clock ticks