Parallel programming 03 | Walter Boscheri | walter.boscheri@unife.it | PowerPoint PPT Presentation



SLIDE 1

Parallel programming 03

Walter Boscheri walter.boscheri@unife.it

University of Ferrara - Department of Mathematics and Computer Science

A.Y. 2018/2019 - Semester I

SLIDE 2

Outline

1. Introduction to OpenMP
2. OpenMP directives
3. OpenMP synchronization
4. OpenMP syntax and main commands
5. OpenMP optimization
6. OpenMP SIMD
7. Exercise

SLIDE 3
  • 1. Introduction to OpenMP

OpenMP overview

OpenMP is a standard programming model for shared-memory parallel programming:
- portable across shared-memory architectures
- FORTRAN binding
- OpenMP is a standard
- OpenMP is the easiest approach to multi-threaded programming

Walter Boscheri Parallel programming 03 2 / 27

SLIDE 4
  • 1. Introduction to OpenMP

Where to use OpenMP

[Figure: axes # CPUs vs. problem size, with regions where scalar execution, OpenMP, and MPI are appropriate, and a region where the problem is dominated by overhead]

SLIDE 5
  • 1. Introduction to OpenMP

OpenMP programming model

OpenMP is a shared-memory model:
- the workload is distributed among threads
- variables can be either shared among all threads or duplicated (private) for each thread
- threads communicate by sharing variables

Unintended sharing of data can lead to race conditions: the program's outcome changes as the threads are scheduled differently. Use synchronization to protect data conflicts and control race conditions.

SLIDE 6
  • 1. Introduction to OpenMP

OpenMP execution model

- execution begins as a single process (the master thread)
- at the start of a parallel construct, the master thread creates a team of threads
- at the completion of a parallel construct, the threads in the team synchronize (implicit barrier)
- only the master thread continues execution

Block of code to be executed by multiple threads in parallel:

!$OMP PARALLEL
  ! block of work
!$OMP END PARALLEL
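The execution model above can be sketched as a minimal program (a sketch, assuming a compiler with OpenMP enabled, e.g. gfortran -fopenmp):

```fortran
PROGRAM hello_omp
  USE omp_lib
  IMPLICIT NONE
  PRINT *, 'Master thread before the parallel region'
  !$OMP PARALLEL
    ! every thread in the team executes this block
    PRINT *, 'Hello from thread', OMP_GET_THREAD_NUM()
  !$OMP END PARALLEL   ! implicit barrier: threads synchronize here
  PRINT *, 'Only the master thread continues'
END PROGRAM hello_omp
```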

SLIDE 7
  • 2. OpenMP directives

OpenMP directive format: FORTRAN

include file for library routines:

USE omp_lib

or

INCLUDE 'omp_lib.h'

OpenMP sentinel: !$OMP
conditional compilation: !$
integration in Visual Studio 2019
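A short sketch of the sentinel and of conditional compilation (the !$ lines are compiled only when OpenMP is enabled, so the program also builds serially):

```fortran
PROGRAM sentinel_demo
  !$ USE omp_lib                          ! only compiled with OpenMP enabled
  IMPLICIT NONE
  INTEGER :: nthreads
  nthreads = 1                            ! serial fallback value
  !$ nthreads = OMP_GET_MAX_THREADS()     ! conditional compilation: !$ sentinel
  PRINT *, 'running with up to', nthreads, 'threads'
  !$OMP PARALLEL                          ! directive: !$OMP sentinel
    PRINT *, 'inside the parallel region'
  !$OMP END PARALLEL
END PROGRAM sentinel_demo
```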

SLIDE 8
  • 2. OpenMP directives

OpenMP library routines

get the number of processors NCPU which are available:

NPRCS = OMP_GET_NUM_PROCS()

set the number of threads NCPU that has to be used:

CALL OMP_SET_NUM_THREADS(NCPU)

wall-clock timers which provide elapsed time:

t_start = OMP_GET_WTIME()
! work to be measured
t_end = OMP_GET_WTIME()
PRINT *, 'Work took', t_end - t_start, 'seconds'

the function OMP_GET_WTICK() returns the number of seconds between two successive clock ticks

SLIDE 9
  • 2. OpenMP directives

OpenMP variables

Private and shared variables:
- private (list): declares the variables in list to be private to each thread in a team
- shared (list): makes the variables that appear in list shared among all the threads in a team

N.B. if not specified, the default is shared. Exceptions:
- local variables in called sub-programs are private
- the loop control variable of a parallel !$OMP DO loop is private
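A minimal sketch of these clauses (variable names are illustrative):

```fortran
PROGRAM private_shared
  IMPLICIT NONE
  INTEGER :: i, tmp
  INTEGER :: a(100)
  !$OMP PARALLEL DO PRIVATE(tmp) SHARED(a)
  DO i = 1, 100            ! i is private by default (loop control variable)
    tmp = 2*i              ! each thread works on its own copy of tmp
    a(i) = tmp             ! a is shared: each iteration writes a distinct element
  ENDDO
  !$OMP END PARALLEL DO
END PROGRAM private_shared
```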

SLIDE 10
  • 2. OpenMP directives

Worksharing and Synchronization

Which thread executes which statement or operation, and when?
⇓
Worksharing constructs, master and synchronization constructs.
This is how the parallel work is organized!

SLIDE 11
  • 2. OpenMP directives

OpenMP worksharing constructs

- divide the execution of the enclosed code region among the members of the team
- must be enclosed dynamically within a parallel region
- they do not launch new threads
- no implied barrier on entry

SLIDE 12
  • 2. OpenMP directives

OpenMP SECTIONS directive

!$OMP PARALLEL
!$OMP SECTIONS
  a = ...
  b = ...
!$OMP SECTION
  c = ...
  d = ...
!$OMP SECTION
  e = ...
  f = ...
!$OMP SECTION
  g = ...
  h = ...
!$OMP END SECTIONS
!$OMP END PARALLEL

[Diagram: the four section blocks (a,b / c,d / e,f / g,h) are distributed among the threads of the team and executed concurrently]

SLIDE 13
  • 2. OpenMP directives

OpenMP DO directive

!$OMP PARALLEL
a = 5
!$OMP DO
DO i = 1, 20
  c(i) = b(i) + a*i
ENDDO
!$OMP END DO
!$OMP END PARALLEL

[Diagram: a = 5 is executed by every thread; the iterations are split among four threads as i=1,5; i=6,10; i=11,15; i=16,20]

SLIDE 14
  • 2. OpenMP directives

OpenMP DO directive

Clauses for !$OMP DO:
- private (list): declares the variables in list to be private to each thread in a team
- shared (list): makes the variables that appear in list shared among all the threads in a team
- collapse (n), with constant integer n: the iterations of the following n nested loops are collapsed into one larger iteration space

!$OMP PARALLEL DO PRIVATE(j) COLLAPSE(2)
DO i = 1, 4
  DO j = 1, 100
    a(i) = b(j) + 4
  ENDDO
ENDDO
!$OMP END PARALLEL DO

SLIDE 15
  • 2. OpenMP directives

OpenMP DO directive

Clauses for !$OMP DO:
- reduction (operator : list): performs a reduction on the variables that appear in list with the operator operator. It can be one of the following: +, -, *, .AND., .OR., MAX, MIN

Variables must be shared. At the end of the reduction, the shared variable is updated to reflect the result of combining the original value of the shared reduction variable with the final value of each of the private copies, using the specified operator.

sm = 0
!$OMP PARALLEL DO PRIVATE(r) REDUCTION(+:sm)
DO i = 1, 20
  r = work(i)
  sm = sm + r
ENDDO
!$OMP END PARALLEL DO

SLIDE 16
  • 2. OpenMP directives

OpenMP DO directive

Clauses for !$OMP DO:
- nowait: there is an implicit barrier at the end of !$OMP DO unless nowait is specified. If nowait is specified, threads do not synchronize at the end of the parallel loop.
- schedule (type [, chunk]). type can be:
  - static: iterations are divided into pieces of a size specified by chunk
  - dynamic: iterations are broken into pieces of a size specified by chunk. As each thread finishes a piece of the iteration space, it dynamically obtains the next set of iterations.
  - guided: the chunk size is reduced in an exponentially decreasing manner with each dispatched piece of the iteration space. chunk specifies the smallest piece.
  - auto: scheduling is delegated to the compiler and/or runtime system.
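A minimal sketch of the schedule clause (the chunk size of 4 is illustrative; with dynamic scheduling, faster threads simply grab the next chunk as they finish):

```fortran
PROGRAM schedule_demo
  IMPLICIT NONE
  INTEGER :: i
  REAL :: c(100)
  ! iterations are handed out in chunks of 4 as threads become free
  !$OMP PARALLEL DO SCHEDULE(DYNAMIC, 4)
  DO i = 1, 100
    c(i) = SQRT(REAL(i))   ! stand-in for work of uneven cost per iteration
  ENDDO
  !$OMP END PARALLEL DO
END PROGRAM schedule_demo
```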

SLIDE 17
  • 2. OpenMP directives

OpenMP WORKSHARE directive

WORKSHARE directive allows parallelization of array expressions and FORALL statements.

!$OMP WORKSHARE
A = B
FORALL(i=1:N, j=1:N, A(i,j).NE.0.0) B(i,j) = 1.0/A(i,j)
!$OMP END WORKSHARE

- work inside the block is divided into separate units of work
- each unit of work is executed only once
- the units of work are assigned to threads in any manner
- similar to PARALLEL DO without explicit loops

SLIDE 18
  • 2. OpenMP directives

OpenMP TASK directive

When a thread encounters a TASK construct, a task is generated from the code for the associated structured block. TASK defines an explicit task.

!$OMP TASK [clause]
  ! block of work
!$OMP END TASK

Clauses:
- untied
- default (shared | none | private)
- private (list)
- shared (list)

SLIDE 19
  • 2. OpenMP directives

OpenMP TASK directive

The encountering thread:
- may immediately execute the task
- may defer its execution

The number of tasks is limited to the number of threads! Completion of a task can be guaranteed using task synchronization constructs:
⇓
!$OMP TASKWAIT
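A minimal sketch of tasks completed with a TASKWAIT (the SINGLE construct, covered next, makes one thread generate the tasks):

```fortran
PROGRAM task_demo
  IMPLICIT NONE
  INTEGER :: x, y
  !$OMP PARALLEL
  !$OMP SINGLE
    !$OMP TASK SHARED(x)
      x = 1                   ! first task
    !$OMP END TASK
    !$OMP TASK SHARED(y)
      y = 2                   ! second task
    !$OMP END TASK
    !$OMP TASKWAIT            ! wait until both tasks have completed
    PRINT *, 'sum =', x + y   ! safe: the tasks are guaranteed to be done
  !$OMP END SINGLE
  !$OMP END PARALLEL
END PROGRAM task_demo
```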

SLIDE 20
  • 2. OpenMP directives

OpenMP SINGLE directive

The block is executed by only one thread in the team, which is not necessarily the master thread.

!$OMP SINGLE
  ! block of work
!$OMP END SINGLE

implicit barrier at the end of the SINGLE construct (unless nowait is specified)

To reduce the overhead, one can combine several parallel parts (DO, WORKSHARE, SECTIONS) and sequential parts (SINGLE) in one parallel region (!$OMP PARALLEL ... !$OMP END PARALLEL).
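A sketch of that pattern, keeping a single parallel region alive across parallel and sequential parts (array names are illustrative):

```fortran
PROGRAM combined_region
  IMPLICIT NONE
  INTEGER :: i
  REAL :: a(100), b(100), c(100)
  b = 1.0
  !$OMP PARALLEL
    !$OMP DO
    DO i = 1, 100
      a(i) = b(i) + 1.0          ! parallel loop
    ENDDO
    !$OMP END DO
    !$OMP SINGLE
      PRINT *, 'loop finished'   ! sequential part, one thread only
    !$OMP END SINGLE
    !$OMP WORKSHARE
    c = 2.0*a                    ! parallel array expression
    !$OMP END WORKSHARE
  !$OMP END PARALLEL
END PROGRAM combined_region
```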

SLIDE 21
  • 3. OpenMP synchronization

OpenMP synchronization

implicit barrier:
- at the beginning and end of parallel constructs
- at the end of all other constructs
- implicit synchronization can be removed by means of the nowait clause

explicit: CRITICAL directive

!$OMP CRITICAL
  ! block of work
!$OMP END CRITICAL

A thread waits at the beginning of a critical region until no other thread in the team is executing a critical region with the same name. All unnamed CRITICAL directives map to the same unspecified name.

SLIDE 22
  • 3. OpenMP synchronization

OpenMP CRITICAL directive

cnt = 0
a = 5
!$OMP PARALLEL
!$OMP DO
DO i = 1, 20
  IF(b(i).EQ.0) THEN
    !$OMP CRITICAL
    cnt = cnt + 1
    !$OMP END CRITICAL
  ENDIF
  c(i) = b(i) + a*i
ENDDO
!$OMP END DO
!$OMP END PARALLEL

[Diagram: iterations i=1,5; i=6,10; i=11,15; i=16,20 run on four threads; the IF tests run concurrently, while the cnt = cnt + 1 updates are serialized by the CRITICAL region before each thread proceeds to its assignments]

SLIDE 23
  • 3. OpenMP synchronization

Race conditions

!$OMP PARALLEL SECTIONS
  A = A + B
!$OMP SECTION
  B = A + C
!$OMP SECTION
  C = B + A
!$OMP END PARALLEL SECTIONS

- the result varies unpredictably, based on the specific order of execution of each section
- wrong answers are produced without warning

SLIDE 24
  • 3. OpenMP synchronization

Race conditions

!$OMP PARALLEL SHARED(X) PRIVATE(tmp, ID)
ID = OMP_GET_THREAD_NUM()
!$OMP DO REDUCTION(+:X)
DO i = 1, 100
  tmp = work1(i)
  X = X + tmp
ENDDO
!$OMP END DO NOWAIT
Y(ID) = work2(X, ID)
!$OMP END PARALLEL

- the result varies unpredictably because the value of X is not uniquely fixed until the barrier at the end of the DO loop
- be careful when using NOWAIT

SLIDE 25
  • 3. OpenMP synchronization

OpenMP overheads

Overheads of some OpenMP directives given in microseconds.

SLIDE 26
  • 4. OpenMP syntax and main commands

OpenMP main commands

- !$OMP PARALLEL: forms a team of threads and starts parallel execution
- !$OMP DO: specifies that the iterations of the associated loops will be executed in parallel by the threads in the team
- !$OMP SINGLE: specifies that the associated structured block is executed by only one of the threads in the team
- !$OMP WORKSHARE: divides the execution of the enclosed structured block into separate units of work, each executed only once by one thread [nowait]
- !$OMP BARRIER: synchronizes all threads
- !$OMP SECTIONS: a noniterative worksharing construct that contains a set of structured blocks to be distributed among and executed by the threads in a team

timing routines:
- OMP_GET_WTIME(): provides elapsed time
- OMP_GET_WTICK(): returns the seconds between two consecutive clock ticks
