SLIDE 1

Lecture 11: OpenMP

Abhinav Bhatele, Department of Computer Science

Introduction to Parallel Computing (CMSC498X / CMSC818X)

SLIDE 2

Abhinav Bhatele (CMSC498X/CMSC818X)

Announcements

  • Assignment 2 has been posted
  • Deadline: October 19, 11:59 pm AoE

SLIDE 3

Shared memory programming

  • All entities (threads) have access to the entire address space
  • Threads “communicate” or exchange data by sharing variables
  • User has to manage data conflicts

SLIDE 4

OpenMP

  • OpenMP is an example of a shared memory programming model
  • Provides on-node parallelization
  • Meant for certain kinds of programs/computational kernels: those that use arrays and loops
  • Hopefully easy to implement in parallel with small code changes

SLIDE 5

OpenMP

  • OpenMP is a language extension that enables parallelizing C/C++/Fortran code
  • Programmer uses compiler directives and library routines to indicate parallel regions in the code

  • Compiler converts code to multi-threaded code
  • Fork/join model of parallelism

SLIDE 6

Fork-join parallelism

  • Single flow of control
  • Master thread spawns worker threads
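
A minimal sketch of the fork-join pattern, assuming nothing beyond the `parallel` construct: the function runs serially, forks a team of threads, and joins back to one thread. The name `count_team_members` is hypothetical, and the `atomic` directive (not covered on this slide) is used only so that the shared counter is updated safely.

```c
/* Hypothetical helper: returns how many threads executed the parallel region. */
int count_team_members(void) {
    int members = 0;          /* serial part: only the master thread runs here */

    #pragma omp parallel      /* fork: a team of threads executes the block below */
    {
        #pragma omp atomic    /* each thread increments the shared counter once */
        members++;
    }                         /* join: implicit barrier, then only the master continues */

    return members;           /* serial part again */
}
```

Built without OpenMP support the pragmas are skipped and the function returns 1; with OpenMP it should return the size of the thread team.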

https://en.wikipedia.org/wiki/OpenMP

SLIDE 8

Race conditions when threads interact

  • Unintended sharing of variables can lead to race conditions
  • Race condition: program outcome depends on the scheduling order of threads
  • How can we prevent data races?
      • Use synchronization
      • Change how data is stored
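
The two remedies above can be sketched as follows. Both function names are hypothetical. The first version keeps one shared accumulator and synchronizes each update with `atomic`; the second changes how the data is stored by giving each thread a private partial sum through the `reduction` clause, which is one possible way to do this and is not named on the slide.

```c
/* Remedy 1 (synchronization): every update to the shared total is atomic. */
long sum_with_atomic(int n) {
    long total = 0;
    #pragma omp parallel for
    for (int i = 1; i <= n; i++) {
        #pragma omp atomic
        total += i;
    }
    return total;
}

/* Remedy 2 (change data storage): each thread accumulates a private copy of
   total, and the copies are combined when the loop ends. */
long sum_with_reduction(int n) {
    long total = 0;
    #pragma omp parallel for reduction(+:total)
    for (int i = 1; i <= n; i++) {
        total += i;
    }
    return total;
}
```

Removing the `atomic` line from the first version would leave `total += i` as an unprotected update to a shared variable, which is the kind of race described above.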

SLIDE 9

OpenMP pragmas

  • Pragma: a compiler directive in C or C++
  • Mechanism to communicate with the compiler
  • Compiler may ignore pragmas it does not support, so the code still builds as a serial program

#pragma omp construct [clause [clause] ... ]
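
Since a compiler without OpenMP enabled skips `#pragma omp` lines, it can be useful to check which case applies. A small sketch, assuming the standard `_OPENMP` macro that OpenMP-enabled compilers define; the function name `compiled_with_openmp` is hypothetical.

```c
/* Returns 1 when the compiler had OpenMP enabled, 0 when the pragmas were ignored. */
int compiled_with_openmp(void) {
#ifdef _OPENMP
    return 1;
#else
    return 0;
#endif
}
```

With gcc, for example, the result would be expected to differ depending on whether `-fopenmp` was passed.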

SLIDE 10

Hello World in OpenMP

#include <stdio.h>
#include <omp.h>

int main(void) {
    #pragma omp parallel
    printf("Hello, world.\n");
    return 0;
}

  • Compiling: gcc -fopenmp hello.c -o hello
  • Setting number of threads: export OMP_NUM_THREADS=2

SLIDE 11

Parallel for

  • Directs the compiler that the immediately following for loop should be executed in parallel

#pragma omp parallel for [clause [clause] ... ]
for (i = init; test_expression; increment_expression) {
    ... do work ...
}

SLIDE 12

Parallel for example

int main(int argc, char **argv) {
    int a[100000];

    #pragma omp parallel for
    for (int i = 0; i < 100000; i++) {
        a[i] = 2 * i;
    }

    return 0;
}

SLIDE 13

Parallel for execution

  • Master thread creates worker threads
  • All threads divide iterations of the loop among themselves

[Figure: execution timeline (Time axis) with a master thread and worker threads 1, 2, and 3, labeled "parallel for" and "synchronize"]
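
One way to observe how iterations are divided is to record which thread ran each one. This is a sketch only: `record_owners` is a hypothetical name, and `omp_get_thread_num()` is a standard OpenMP library routine that does not appear on the slide. The `_OPENMP` guards let the same code build as a serial program.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: owner[i] receives the id of the thread that ran iteration i. */
void record_owners(int *owner, int n) {
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
#ifdef _OPENMP
        owner[i] = omp_get_thread_num();   /* 0 is the master thread */
#else
        owner[i] = 0;                      /* serial build: a single thread with id 0 */
#endif
    }
}
```

Printing `owner` after the call shows the assignment chosen by the runtime; the exact pattern depends on the thread count and the schedule in effect.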

SLIDE 14

Number of threads

  • Use environment variable: export OMP_NUM_THREADS=X
  • Use the library routine: void omp_set_num_threads(int num_threads);
      • Sets the number of OpenMP threads to be used in parallel regions
  • int omp_get_num_procs(void);
      • Returns the number of available processors
      • Can be used to decide the number of threads to create
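
The two routines above might be combined as in the sketch below, which requests one thread per available processor and reports the team size actually obtained. The function name is hypothetical; `omp_get_num_threads()` and the `single` construct are standard OpenMP but are not introduced on this slide.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: request one thread per processor and return the
   team size observed inside a parallel region (1 in a serial build). */
int threads_with_one_per_processor(void) {
    int team_size = 1;
#ifdef _OPENMP
    omp_set_num_threads(omp_get_num_procs());
    #pragma omp parallel
    {
        #pragma omp single                   /* only one thread writes the result */
        team_size = omp_get_num_threads();
    }
#endif
    return team_size;
}
```

The runtime is allowed to provide fewer threads than requested, so the returned value should be treated as an observation rather than a guarantee.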

SLIDE 15

Loop scheduling

  • Assignment of loop iterations to different worker threads
  • Default schedule tries to balance iterations among threads
  • User-specified schedules are also available
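
As an illustration of a user-specified schedule, the sketch below uses the `schedule` clause on a loop whose iterations get more expensive as `i` grows. The function name, the choice of `dynamic`, and the chunk size of 4 are all assumptions made for the example; the `reduction` clause is used only to keep the shared counter correct.

```c
/* Iteration i does i + 1 units of work, so equal-sized contiguous blocks
   would leave the threads holding the late iterations with more to do. */
long triangular_work(int n) {
    long total = 0;
    /* dynamic: idle threads grab the next chunk of 4 iterations at run time */
    #pragma omp parallel for schedule(dynamic, 4) reduction(+:total)
    for (int i = 0; i < n; i++) {
        for (int j = 0; j <= i; j++) {
            total += 1;
        }
    }
    return total;   /* n * (n + 1) / 2 for any schedule */
}
```

Replacing `schedule(dynamic, 4)` with `schedule(static)` changes only how iterations are assigned to threads, not the result.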

SLIDE 16

Data sharing defaults

  • Most variables are shared by default
  • Global variables are shared
  • Exception: loop index variables are private by default
  • Stack variables in function calls from parallel regions are also private to each thread (thread-private)
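
The defaults listed above can be seen in a short sketch. The names `square` and `fill_squares` are hypothetical; no data-sharing clauses are used, so every variable takes its default attribute.

```c
/* Called from inside the parallel loop: tmp lives on the calling thread's
   own stack, so each thread has a separate copy. */
static int square(int x) {
    int tmp = x * x;
    return tmp;
}

void fill_squares(int *out, int n) {
    /* out and n are declared outside the loop, so they are shared by default;
       the loop index i is private to each thread. */
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        out[i] = square(i);   /* threads write distinct elements of the shared array */
    }
}
```

No synchronization is needed here because each iteration writes a different element of the shared array.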

SLIDE 17

Abhinav Bhatele 5218 Brendan Iribe Center (IRB) / College Park, MD 20742 phone: 301.405.4507 / e-mail: bhatele@cs.umd.edu