CS240A, T. Yang, 2013 Modified from Demmel/Yelick’s and Mary Hall’s Slides
Parallel Programming with OpenMP

Introduction to OpenMP
What is OpenMP?
- Open specification for Multi-Processing
- Standard API for defining multi-threaded, shared-memory programs
Parallelism is expressed in terms of parallel regions, rather than T concurrently-executing threads.
int main() {
  // Do this part in parallel
  printf( "Hello, World!\n" );
  return 0;
}
int main() {
  // Do this part in parallel
  #pragma omp parallel
  {
    printf( "Hello, World!\n" );
  }
  return 0;
}
[Diagram: each of the threads in a team executes the printf]
#pragma omp parallel [ clause [ clause ] ... ] new-line
    structured-block
The loop must have no data dependences (read/write or write/write pairs) between iterations! The runtime then computes the loop bounds and divides iterations among parallel threads.
#pragma omp parallel for
for( i=0; i < 25; i++ ) {
  printf( "Foo" );
}
sum = 0;
#pragma omp parallel for reduction(+:sum)
for (i=0; i < 100; i++) {
  sum += array[i];
}
Dot product: a · b = Σ_{i=1}^{n} a_i × b_i = a_1 × b_1 + a_2 × b_2 + … + a_n × b_n
Loop scheduling options:
- schedule(static[,chunk]): iterations are divided into blocks and statically assigned to threads; the partition must account for all iterations
- schedule(dynamic[,chunk]): each thread is allocated an additional [chunk] iterations when it finishes its current block
- schedule(guided[,chunk]): like dynamic, but the chunk size is exponentially reduced with each allocation
Scheduling trade-off: better load balance comes at the cost of more thread coordination, and that overhead grows with the complexity and frequency of scheduling decisions.
OMP_NUM_THREADS
The value of this environment variable sets the maximum number of threads to use.
setenv OMP_NUM_THREADS 16    [csh, tcsh]
export OMP_NUM_THREADS=16    [sh, ksh, bash]

OMP_SCHEDULE
Applies to loops with schedule type RUNTIME; its value sets the schedule type and (optionally) the chunk size.
setenv OMP_SCHEDULE GUIDED,4 [csh, tcsh]
export OMP_SCHEDULE=GUIDED,4    [sh, ksh, bash]
OpenMP programs use two types of data:
- shared: a single instance, similarly named and visible to all threads
- private: a separate copy per thread (often stack-allocated)
// shared, globals
int bigdata[1024];

void* foo(void* bar) {
  // private, stack
  int tid;

  /* Calculation goes here */
}

int bigdata[1024];

void* foo(void* bar) {
  int tid;

  #pragma omp parallel \
    shared ( bigdata ) \
    private ( tid )
  {
    /* Calc. here */
  }
}
These constructs synchronize threads within parallel regions; making updates to shared data visible to other threads may require a flush directive.
#pragma omp critical
{
  /* Critical code here */
}

#pragma omp barrier
/* Code goes here */

#pragma omp single
{
  /* Only executed once */
}
CS267 Lecture 6
for( t=0; t < t_steps; t++) {
  #pragma omp parallel for \
    shared(grid,x_dim,y_dim) private(x,y)
  for( x=0; x < x_dim; x++) {
    for( y=0; y < y_dim; y++) {
      grid[x][y] = /* avg of neighbors */
    }
  }
  // Implicit Barrier Synchronization
  temp_grid = grid;
  grid = other_grid;
}