
SLIDE 1

Lecture 12: OpenMP

Abhinav Bhatele, Department of Computer Science

Introduction to Parallel Computing (CMSC498X / CMSC818X)

SLIDE 2

Abhinav Bhatele (CMSC498X/CMSC818X) LIVE RECORDING

Announcements

  • Use office hours
  • If you foresee not being able to complete assignments for a valid reason, email me as soon as possible instead of after the deadline

SLIDE 3

saxpy (single precision a*x+y) example

    for (int i = 0; i < n; i++) {
        z[i] = a * x[i] + y[i];
    }

SLIDE 4

saxpy (single precision a*x+y) example

    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        z[i] = a * x[i] + y[i];
    }
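A minimal sketch of the parallel saxpy loop as a self-contained C function. The function name `saxpy` and its parameter order are illustrative choices, not taken from the slides. Compile with OpenMP enabled (for example `-fopenmp` with GCC or Clang); without it the pragma is ignored and the loop runs serially with the same result.

```c
/* z[i] = a * x[i] + y[i]; the parallel for directive splits the
 * loop iterations among the threads of the team. Each iteration
 * writes a different z[i], so no synchronization is needed. */
void saxpy(int n, float a, const float *x, const float *y, float *z) {
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        z[i] = a * x[i] + y[i];
    }
}
```

For example, with `a = 2`, `x = {1, 2, 3}` and `y = {1, 1, 1}`, `z` becomes `{3, 5, 7}`.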

SLIDE 5

Overriding defaults using clauses

  • Specify how data is shared between threads executing a parallel region
  • private(list)
  • shared(list)
  • default(shared | none)
  • reduction(operator: list)
  • firstprivate(list)
  • lastprivate(list)


https://www.openmp.org/spec-html/5.0/openmpsu106.html#x139-5540002.19.4

SLIDE 6

private clause

  • Each thread has its own copy of the variables in the list
  • Private variables are uninitialized when a thread starts
  • The value of a private variable is unavailable to the master thread after the parallel region has been executed
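A short sketch of a typical correct use of `private`: a scratch variable declared outside the loop that every iteration writes before reading. The function name `square_plus_one` and the variable `tmp` are hypothetical, chosen only for illustration. Without `private(tmp)`, all threads would share a single `tmp` and race on it.

```c
/* out[i] = (in[i] + 1)^2, using a scratch variable declared
 * outside the loop. private(tmp) gives each thread its own
 * uninitialized copy, so every iteration assigns tmp before
 * reading it. */
void square_plus_one(const double *in, double *out, int n) {
    double tmp; /* hypothetical scratch variable */
    #pragma omp parallel for private(tmp)
    for (int i = 0; i < n; i++) {
        tmp = in[i] + 1.0;  /* write first: the private copy starts uninitialized */
        out[i] = tmp * tmp;
    }
}
```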

SLIDE 7

default clause

  • Determines the data-sharing attributes of variables whose attributes would otherwise be determined implicitly
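One way to see the `default` clause in action is `default(none)`. With it, every variable referenced in the construct must appear in an explicit data-sharing clause. The loop index of a parallel for is an exception because it is predetermined private. This is a sketch; the function `scale` and its parameters are hypothetical.

```c
/* Multiplies every element of v by factor. default(none) forces the
 * programmer to state the sharing of v, n and factor explicitly;
 * omitting any of them is a compile-time error. The loop index i is
 * predetermined private and need not be listed. */
void scale(double *v, int n, double factor) {
    #pragma omp parallel for default(none) shared(v, n, factor)
    for (int i = 0; i < n; i++) {
        v[i] *= factor;
    }
}
```

`default(none)` is a common defensive choice: it turns an accidentally shared variable into a compile error instead of a silent data race.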

SLIDE 8

Anything wrong with this example?

    val = 5;
    #pragma omp parallel for private(val)
    for (int i = 0; i < n; i++) {
        ... = val + 1;
    }

SLIDE 9

Anything wrong with this example?

    val = 5;
    #pragma omp parallel for private(val)
    for (int i = 0; i < n; i++) {
        ... = val + 1;
    }

Each thread's private copy of val is uninitialized, so the value 5 assigned before the loop is not available to threads inside the loop

SLIDE 10

Anything wrong with this example?

    #pragma omp parallel for private(val)
    for (int i = 0; i < n; i++) {
        val = i + 1;
    }
    printf("%d\n", val);

SLIDE 11

Anything wrong with this example?

    #pragma omp parallel for private(val)
    for (int i = 0; i < n; i++) {
        val = i + 1;
    }
    printf("%d\n", val);

The values assigned to the private copies of val are not copied back, so they are not available to the master thread outside the loop

SLIDE 12

firstprivate clause

  • Initializes each thread’s private copy to the value of the master thread’s copy

    val = 5;
    #pragma omp parallel for firstprivate(val)
    for (int i = 0; i < n; i++) {
        ... = val + 1;
    }
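A runnable sketch of the firstprivate example. The slide leaves the left-hand side as `...`; here it is replaced by a hypothetical output array `out`, and the function name `fill_val_plus_one` is likewise illustrative.

```c
/* Each thread's private copy of val is initialized from the master
 * thread's val (5), so val + 1 is well defined in every iteration. */
void fill_val_plus_one(int *out, int n) {
    int val = 5;
    #pragma omp parallel for firstprivate(val)
    for (int i = 0; i < n; i++) {
        out[i] = val + 1; /* out is a hypothetical stand-in for the slide's "..." */
    }
}
```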

SLIDE 13

lastprivate clause

  • Writes the value belonging to the thread that executed the last iteration of the loop to the master’s copy
  • The last iteration is the one that would run last in sequential order, regardless of which thread finishes last

SLIDE 14

lastprivate clause

  • Writes the value belonging to the thread that executed the last iteration of the loop to the master’s copy
  • The last iteration is the one that would run last in sequential order, regardless of which thread finishes last

    #pragma omp parallel for lastprivate(val)
    for (int i = 0; i < n; i++) {
        val = i + 1;
    }
    printf("%d\n", val);  /* prints n (for n >= 1) */
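The same lastprivate loop wrapped in a testable function (the name `last_val` is hypothetical). Because the sequentially last iteration is `i = n - 1`, the value copied back is always `n`, no matter how iterations were distributed among threads.

```c
/* Returns val after the loop. lastprivate copies the private val
 * from the sequentially last iteration (i = n - 1) back to the
 * original variable, so the result is n for any n >= 1. */
int last_val(int n) {
    int val = 0;
    #pragma omp parallel for lastprivate(val)
    for (int i = 0; i < n; i++) {
        val = i + 1;
    }
    return val;
}
```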

SLIDE 15

reduction(operator: list) clause

  • Reduce values across private copies of a variable
  • Operators: +, -, *, &, |, ^, &&, ||, max, min

    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        val += i;
    }
    printf("%d\n", val);

https://www.openmp.org/spec-html/5.0/openmpsu107.html#x140-5800002.19.5

SLIDE 16

reduction(operator: list) clause

  • Reduce values across private copies of a variable
  • Operators: +, -, *, &, |, ^, &&, ||, max, min

    #pragma omp parallel for reduction(+: val)
    for (int i = 0; i < n; i++) {
        val += i;
    }
    printf("%d\n", val);

https://www.openmp.org/spec-html/5.0/openmpsu107.html#x140-5800002.19.5
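A self-contained version of the reduction loop, assuming `val` starts at 0 (the slide does not show its initialization). The function name `sum_indices` is hypothetical. Each thread accumulates into a private copy of `val` initialized to 0, the identity for `+`, and the copies are combined into the original `val` at the end of the loop.

```c
/* Returns 0 + 1 + ... + (n - 1) = n(n - 1)/2, computed with a
 * parallel sum reduction. */
long long sum_indices(int n) {
    long long val = 0;
    #pragma omp parallel for reduction(+: val)
    for (int i = 0; i < n; i++) {
        val += i;
    }
    return val;
}
```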

SLIDE 17

User-specified loop scheduling

  • Schedule clause: schedule(type[, chunk])
  • type: static, dynamic, guided, runtime
  • static: iterations divided as evenly as possible (#iterations/#threads per thread)
  • static with chunk < #iterations/#threads: chunks are assigned to threads in round-robin order, interleaving the threads
  • dynamic: assign a block of chunk iterations to each thread; when a thread is finished, it retrieves the next block from an internal work queue
  • Default chunk size for dynamic = 1
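A sketch of where `schedule(dynamic, chunk)` helps: iteration `i` below does `i + 1` units of inner work, so a plain static split would give the threads holding the last iterations far more work. The function name `triangular_sum` and the chunk size 4 are arbitrary choices for illustration.

```c
/* Sums i*(i+1)/2 over i = 0 .. n-1, computing each term with an
 * inner loop so that iteration cost grows with i. dynamic scheduling
 * hands out blocks of 4 iterations to whichever thread is idle. */
long long triangular_sum(int n) {
    long long total = 0;
    #pragma omp parallel for schedule(dynamic, 4) reduction(+: total)
    for (int i = 0; i < n; i++) {
        long long row = 0;
        for (int j = 0; j <= i; j++) {
            row += j;
        }
        total += row;
    }
    return total;
}
```

The result equals (n - 1)n(n + 1)/6, for example 166650 for n = 100, independent of the schedule chosen.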

SLIDE 18

Other schedules

  • guided: similar to dynamic, but starts with a large chunk size and gradually decreases it, for handling load imbalance between iterations
  • auto: scheduling decision delegated to the compiler and/or runtime system
  • runtime: the schedule is read at run time from the OMP_SCHEDULE environment variable

https://software.intel.com/content/www/us/en/develop/articles/openmp-loop-scheduling.html
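A sketch of `schedule(runtime)` on a loop whose iterations get more expensive as `k` grows (trial-division primality testing). The function names are hypothetical. Because the schedule is chosen at run time, the same binary can be tried with different policies without recompiling.

```c
/* Returns 1 if k is prime; trial division, so cost grows with k. */
static int is_prime(int k) {
    if (k < 2) return 0;
    for (int d = 2; d * d <= k; d++) {
        if (k % d == 0) return 0;
    }
    return 1;
}

/* Counts primes below n. schedule(runtime) defers the scheduling
 * policy to the OMP_SCHEDULE environment variable. */
int count_primes_below(int n) {
    int count = 0;
    #pragma omp parallel for schedule(runtime) reduction(+: count)
    for (int k = 0; k < n; k++) {
        count += is_prime(k);
    }
    return count;
}
```

For example, running a program built from this code with `OMP_SCHEDULE="guided,8"` or `OMP_SCHEDULE="static"` in the environment selects guided or static scheduling; the count is the same either way.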

SLIDE 19

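Calculate the value of π

$\pi = \int_0^1 \frac{4}{1+x^2}\,dx \approx h \sum_{i=1}^{n} \frac{4}{1+x_i^2}$, where $h = 1/n$ and $x_i = h\,(i - 0.5)$ (midpoint rule)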

    int main(int argc, char *argv[]) {
        ...
        n = 10000;
        h = 1.0 / (double) n;
        sum = 0.0;
        for (i = 1; i <= n; i += 1) {
            x = h * ((double)i - 0.5);
            sum += (4.0 / (1.0 + x * x));
        }
        pi = h * sum;
        ...
    }

SLIDE 20

Calculate the value of π

$\pi = \int_0^1 \frac{4}{1+x^2}\,dx$

    int main(int argc, char *argv[]) {
        ...
        n = 10000;
        h = 1.0 / (double) n;
        sum = 0.0;
        #pragma omp parallel for firstprivate(h) private(x) reduction(+: sum)
        for (i = 1; i <= n; i += 1) {
            x = h * ((double)i - 0.5);
            sum += (4.0 / (1.0 + x * x));
        }
        pi = h * sum;
        ...
    }
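A self-contained version of the parallel π loop, pulled out of `main` into a function so it can be called and checked. The function name `compute_pi` is hypothetical, and the declarations the slide elides with `...` are assumed to be `int` for `n` and `i` and `double` for `h`, `x` and `sum`.

```c
/* Midpoint-rule approximation of pi = integral_0^1 4/(1+x^2) dx,
 * parallelized as on the slide: h is copied into each thread,
 * x is per-thread scratch, and the partial sums are reduced. */
double compute_pi(int n) {
    double h = 1.0 / (double) n;
    double sum = 0.0;
    double x;
    int i;
    #pragma omp parallel for firstprivate(h) private(x) reduction(+: sum)
    for (i = 1; i <= n; i += 1) {
        x = h * ((double) i - 0.5);
        sum += 4.0 / (1.0 + x * x);
    }
    return h * sum;
}
```

Since `h` is only read inside the loop, leaving it shared would also be correct; `firstprivate(h)` simply makes each thread's copy explicit. The loop index `i` is private automatically.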

SLIDE 21

Parallel region

  • All threads execute the structured block
  • The number of threads can be specified just as with the parallel for directive (e.g., with the num_threads clause)

    #pragma omp parallel [clause [clause] ...]
        structured block
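A small sketch showing that every thread in the team executes the structured block once. The function name `threads_in_region` is hypothetical. Each thread adds 1 to a `reduction(+: count)` variable, so the result is the team size. That size is at most `requested`, and it is 1 when the code is compiled without OpenMP.

```c
/* Runs a parallel region with up to `requested` threads (requested
 * must be positive) and returns how many threads executed the block. */
int threads_in_region(int requested) {
    int count = 0;
    #pragma omp parallel num_threads(requested) reduction(+: count)
    {
        count += 1; /* executed once by every thread in the team */
    }
    return count;
}
```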

SLIDE 22

Synchronization

  • Concurrent access to shared data may result in inconsistencies
  • Use mutual exclusion to avoid that
  • critical directive
  • atomic directive
  • Library lock routines


https://software.intel.com/content/www/us/en/develop/documentation/advisor-user-guide/top/appendix/adding-parallelism-to-your-program/replacing-annotations-with-openmp-code/adding-openmp-code-to-synchronize-the-shared-resources.html
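A sketch of the `atomic` and `critical` directives protecting shared data. The function `count_neg_and_max` and its parameters are hypothetical. `atomic` suits a single update of a scalar such as `count++`. `critical` guards a multi-statement compare-and-update that `atomic` cannot express here. The OpenMP library lock routines are not shown.

```c
/* Counts negative entries of v (atomic update of a shared counter)
 * and finds the maximum entry (critical section around a
 * read-compare-write on a shared variable). Assumes n >= 1. */
void count_neg_and_max(const int *v, int n, int *neg_count, int *max_val) {
    int count = 0;
    int best = v[0];
    #pragma omp parallel for shared(count, best)
    for (int i = 0; i < n; i++) {
        if (v[i] < 0) {
            #pragma omp atomic
            count++;
        }
        #pragma omp critical
        {
            if (v[i] > best)
                best = v[i];
        }
    }
    *neg_count = count;
    *max_val = best;
}
```

Entering a critical section on every iteration serializes much of the loop. For these two particular results, `reduction(+: count)` and `reduction(max: best)` would avoid that cost. The explicit mutual exclusion here is only for illustration.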

SLIDE 23

Abhinav Bhatele 5218 Brendan Iribe Center (IRB) / College Park, MD 20742 phone: 301.405.4507 / e-mail: bhatele@cs.umd.edu