

SLIDE 1

CS 4230: Parallel Programming Lecture 4: OpenMP Open Multi-Processing

January 23, 2017

01/23/2017 CS4230 1

SLIDE 2

Outline

  • OpenMP – another approach for thread-parallel programming
  • Fork-Join execution model
  • OpenMP constructs – syntax and semantics
    – Work sharing
    – Thread scheduling
    – Data sharing
    – Reduction
    – Synchronization
  • ‘count_primes’ hands-on!

SLIDE 3

OpenMP: Common Thread-Level Programming Approach in HPC

  • Portable across shared-memory architectures
  • Incremental parallelization
    – Parallelize individual computations in a program while leaving the rest of the program sequential
  • Compiler based
    – Compiler generates thread program and synchronization
  • Extensions to existing programming languages (Fortran, C and C++)
    – mainly by directives
    – a few library routines

See http://www.openmp.org

SLIDE 4

Fork-Join Model

SLIDE 5

OpenMP HelloWorld

#include <omp.h>
#include <stdio.h>

int main (int argc, char *argv[]) {
  #pragma omp parallel
  {
    printf("Hello World from Thread %d!\n",
           omp_get_thread_num());
  }
  return 0;
}

Compiling for OpenMP:
  gcc: -fopenmp, icc: -openmp, pgcc: -mp, …

SLIDE 6

Number of threads

  • if clause
  • num_threads clause
  • omp_set_num_threads() library routine
  • OMP_NUM_THREADS environment variable
  • Implementation default

SLIDE 7

OpenMP constructs

  • Compiler directives (44)

    #pragma omp parallel [clause]

  • Runtime library routines (35)

    #include <omp.h>
    int omp_get_num_threads(void);
    int omp_get_thread_num(void);

  • Environment variables (13)

    export OMP_NUM_THREADS=x

SLIDE 8

Work sharing

  • Divides the execution of the enclosed code region among multiple threads
    – for shares iterations of a loop across the team of threads
      #pragma omp parallel for [clause]
    – Also sections and single (see [1])

SLIDE 9

Work sharing - for

#include <omp.h>
#include <stdio.h>

int main (int argc, char *argv[]) {
  int i, n = 10;
  #pragma omp parallel for
  for (i = 0; i < n; i++)
    printf("Hello World!\n");
  return 0;
}

SLIDE 10

Thread scheduling

  • Static: loop iterations are divided into pieces of size chunk and then statically assigned to threads.
    – schedule(static [,chunk])
  • Dynamic: loop iterations are divided into pieces of size chunk, and dynamically scheduled among the threads.
    – schedule(dynamic [,chunk])
  • More options:
    – guided, runtime, auto

SLIDE 11

Data sharing / Data scope

  • shared variables are shared among threads
  • private variables are private to each thread
  • The default is shared
  • Loop index variables are private, including in nested loops

#pragma omp parallel for private(list) shared(list)

  • Can be used with any work-sharing construct
  • Also firstprivate, lastprivate, default, copyin, … (see [1])

SLIDE 12

Reduction

  • The reduction clause performs a reduction on the variables that appear in its list.
  • A private copy of each list variable is created for each thread.
  • At the end of the region, the reduction operator is applied to all private copies, and the final result is written to the global shared variable.

reduction (operator: list)

SLIDE 13

Reduction

#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

int main (int argc, char *argv[]) {
  int i, n = 1000;
  float a[1000], b[1000], sum;

  for (i = 0; i < n; i++)
    a[i] = b[i] = i * 1.0;
  sum = 0.0;

  #pragma omp parallel for reduction(+:sum)
  for (i = 0; i < n; i++)
    sum = sum + (a[i] * b[i]);

  printf("Sum = %f\n", sum);
}


Source: http://computing.llnl.gov/tutorials/openMP/samples/C/omp_reduction.c

SLIDE 14

OpenMP Synchronization

  • Recall ‘barrier’ from pthreads
    – int pthread_barrier_wait(pthread_barrier_t *barrier);
  • Implicit barrier
    – At the end of parallel regions and work-sharing constructs
    – The work-sharing barrier can be removed with the nowait clause (on a for inside a parallel region):
      #pragma omp for nowait
  • Explicit synchronization
    – single, critical, atomic, ordered, flush

SLIDE 15

Exercise

  • See prime_sequential.c
  • How to improve? Write a thread-parallel version using what we discussed
  • Observe scalability with the number of threads


Threads | Time (s) | Speedup
--------|----------|--------

SLIDE 16

Summary

  • What’s good?
    – Small changes are required to produce a parallel program from a sequential one (parallel formulation)
    – Avoids having to express low-level mapping details
    – Portable and scalable; correct on 1 processor
  • What is missing?
    – Not completely natural if you want to write a parallel code from scratch
    – Not always possible to express certain common parallel constructs
    – Locality management
    – Control of performance

SLIDE 17

References

[1] Blaise Barney, Lawrence Livermore National Laboratory, OpenMP Tutorial. https://computing.llnl.gov/tutorials/openMP
[2] XSEDE HPC Workshop: OpenMP. https://www.psc.edu/index.php/136-users/training/2496-xsede-hpc-workshop-january-17-2017-openmp
