SLIDE 1

OpenMP dynamic loops

Paolo Burgio

paolo.burgio@unimore.it

SLIDE 2

Outline

› Expressing parallelism

– Understanding parallel threads

› Memory / data management

– Data clauses

› Synchronization

– Barriers, locks, critical sections

› Work partitioning

– Loops, sections, single work, tasks…

› Execution devices

– Target

SLIDE 3

Let's talk about performance

› We already saw how parallelism ≠> performance

– Example: a loop
– If one thread is delayed, it prevents the other threads from doing useful work!!


#pragma omp parallel num_threads(4)
{
  #pragma omp for
  for(int i=0; i<N; i++)
  {
    ...
  } // (implicit) barrier

  // USEFUL WORK!!
} // (implicit) barrier

SLIDE 5

Unbalanced loop partitioning

› Iterations are statically assigned before entering the loop

– Might be neither effective nor efficient

#pragma omp parallel for num_threads(4)
for(int i=0; i<16; i++)
{
  /* UNBALANCED LOOP CODE */
}
/* (implicit) Barrier */

[Figure: four threads; with an unbalanced loop, three threads sit IDLE at the barrier while the slowest one finishes]

SLIDE 6

Dynamic loops

› Assign iterations to threads in a dynamic manner

– At runtime!!

› Static semantics

– "Partition the loop into Nthreads parts and assign one to each thread in the team"
– Naive and passive

› Dynamic semantics

– "Each thread in the team fetches an iteration (or a block of iterations) when it is idle"
– Proactive
– Work-conserving

SLIDE 7

Dynamic loops

› Activated using the schedule clause

#pragma omp parallel for num_threads(4) \
                         schedule(dynamic)
for(int i=0; i<16; i++)
{
  /* UNBALANCED LOOP CODE */
}
/* (implicit) Barrier */

[Figure: four threads; iterations (up to i = 15) are fetched on demand, so no thread sits idle while work remains]

SLIDE 8

The schedule clause

› The iteration space is divided according to the schedule clause

– kind can be: { static | dynamic | guided | auto | runtime }


#pragma omp for [clause[[,] clause]...] new-line
  for-loops

Where clauses can be:
  private(list)
  firstprivate(list)
  lastprivate(list)
  linear(list[ : linear-step])
  reduction(reduction-identifier : list)
  schedule([modifier [, modifier]:] kind[, chunk_size])
  collapse(n)
  ordered[(n)]
  nowait

SLIDE 9

OMP loop schedule policies

› schedule(static[, chunk_size])

– Iterations are divided into chunks of chunk_size, and chunks are assigned to threads before entering the loop
– If chunk_size is unspecified, it is ≈ NITER/NTHREADS (with some adjustment…)

› schedule(dynamic[, chunk_size])

– Iterations are divided into chunks of chunk_size
– At runtime, each thread requests a new chunk after finishing its previous one
– If chunk_size is unspecified, it defaults to 1

SLIDE 10

Static vs. Dynamic

#pragma omp parallel for num_threads(2) \
                         schedule( ... )
for(int i=0; i<8; i++)
{
  // ...
}
/* (implicit) Barrier */

[Figure: 8 iterations on 2 threads (ID 0, ID 1). With static, ID 0 runs iterations 0–3 and ID 1 runs 4–7, fixed before the loop. With dynamic, each thread fetches the next free iteration as soon as it finishes one, so the split depends on timing.]

SLIDE 11

OMP loop schedule policies (cont'd)

› schedule(guided[, chunk_size])

– A mix of static and dynamic
– Chunks start large and shrink as the loop progresses (chunk_size sets the minimum); assignment is done dynamically

› schedule(auto)

– The programmer lets the compiler and/or runtime decide
– Chunk size, thread mapping…
– "I wash my hands"

› schedule(runtime)

– Only the runtime decides, according to the run-sched-var ICV
– If run-sched-var = auto, the schedule is implementation defined

SLIDE 12

Loop chunking

[Figure: 8 iterations on 2 threads (ID 0, ID 1), grouped in chunks:
  schedule(static)               – one contiguous half per thread
  schedule(dynamic, NITER/NTHRD) – static-sized chunks, but assigned at runtime
  schedule(dynamic, 2)           – chunks of 2 iterations, fetched on demand
  schedule(dynamic, 1)           – chunks of 1 iteration
  schedule(dynamic)              – same as schedule(dynamic, 1)]

SLIDE 13

Modifiers, collapsed and ordered

› These we won't see

– E.g., modifier can be: { monotonic | nonmonotonic | simd }
– They let you tune the loop and give more information to the OMP stack
– To maximize performance


#pragma omp for [clause[[,] clause]...] new-line
  for-loops

Where clauses can be:
  private(list)
  firstprivate(list)
  lastprivate(list)
  linear(list[ : linear-step])
  reduction(reduction-identifier : list)
  schedule([modifier [, modifier]:] kind[, chunk_size])
  collapse(n)
  ordered[(n)]
  nowait

SLIDE 14

Static vs. dynamic loops

› So, why not always dynamic?

– For unbalanced workloads, they are more flexible
– "For balanced workloads, in the worst case, they behave like static loops!"

Not always true!

› Static loops have a (light) cost, paid only before the loop

– Actually, the lightest way you can distribute work in OpenMP!!
– Often used as a performance reference…

› Dynamic loops have a cost:

– For initializing the loop
– For fetching a(nother) chunk of work
– At the end of the loop

SLIDE 15

OpenMP loops overhead

[Figure: the schedules of SLIDE 12 annotated with their overhead. Every chunk fetch is a call into the OpenMP runtime, so the smaller the chunks (down to schedule(dynamic, 1)), the more overhead; schedule(static) pays only a fixed cost before the loop.]

SLIDE 16

Exercise

› Create an array of N elements

– Put inside each array element its index, multiplied by 2
– arr[0] = 0; arr[1] = 2; arr[2] = 4; …and so on

› Now, simulate unbalanced workload

– Use both static and dynamic loops
– Each thread prints the iteration index i
– What do you see? What should you see?


Let's code!

#pragma omp parallel for schedule(...)
for(int i=0; i<NUM; i++)
{
  // ...
  // Simulate iteration-dependent work
  volatile long a = i * 1000000L;
  while(a--) ;
}

SLIDE 17

How to run the examples

› Download the Code/ folder from the course website
› Compile
  $ gcc -fopenmp code.c -o code
› Run (Unix/Linux)
  $ ./code
› Run (Win/Cygwin)
  $ ./code.exe


Let's code!

SLIDE 18

References

› "Calcolo parallelo" website

– http://hipert.unimore.it/people/paolob/pub/PhD/index.html

› My contacts

– paolo.burgio@unimore.it – http://hipert.mat.unimore.it/people/paolob/

› Useful links

– http://www.openmp.org – http://www.google.com – http://gcc.gnu.org
