  1. Barriers in OpenMP – Paolo Burgio, paolo.burgio@unimore.it

  2. Outline › Expressing parallelism – Understanding parallel threads › Memory – Data management – Data clauses › Synchronization – Barriers, locks, critical sections › Work partitioning – Loops, sections, single work, tasks… › Execution devices – Target

  3. OpenMP synchronization › OpenMP provides the following synchronization constructs: – barrier – flush – master – critical – atomic – taskwait – taskgroup – ordered – ..and OpenMP locks

  4. Creating a parreg › Master-slave, fork-join execution model – Master thread spawns a team of Slave threads – They all perform computation in parallel – At the end of the parallel region, implicit barrier int main() { /* Sequential code */ #pragma omp parallel num_threads(4) { /* Parallel code */ } // Parreg end: (implicit) barrier /* (More) sequential code */ }

  5. Creating a parreg (animation) › Before the parallel region, only the master thread T executes the sequential code

  6. Creating a parreg (animation) › Inside the parallel region, the team of four threads T T T T executes the parallel code

  7. Creating a parreg (animation) › After the implicit barrier at the end of the parreg, only the master thread T continues with the sequential code
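A compilable version of the slides' sketch, assuming GCC or Clang with -fopenmp (the printf messages are illustrative, not from the slides):

#include <stdio.h>
#include <omp.h>

int main(void)
{
    /* Sequential code: only the master thread runs here */
    printf("Before the parreg: %d thread(s)\n", omp_get_num_threads());

    #pragma omp parallel num_threads(4)
    {
        /* Parallel code: executed by every thread of the team */
        printf("Inside the parreg: thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
    } /* Parreg end: (implicit) barrier, then only the master goes on */

    /* (More) sequential code */
    printf("After the parreg: back to %d thread(s)\n", omp_get_num_threads());
    return 0;
}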

  8. OpenMP explicit barriers #pragma omp barrier new-line (a standalone directive) › All threads in a team must wait for all the other threads before going on – "Each barrier region must be encountered by all threads in a team or by none at all" – "The sequence of barrier regions encountered must be the same for every thread in a team" – Why? › The binding thread set is the team of the innermost enclosing parreg – i.e., the threads it applies to › Also, it enforces a consistent view of the shared memory – We'll see this..
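A hedged sketch of a typical use: an explicit barrier separating two phases, so that no thread starts phase 2 before every thread has finished phase 1 (the array and the two phases are made up for illustration):

#include <stdio.h>
#include <omp.h>

#define NTHREADS 4
int partial[NTHREADS];               /* shared: one slot per thread */

int main(void)
{
    #pragma omp parallel num_threads(NTHREADS)
    {
        int id = omp_get_thread_num();

        /* Phase 1: each thread fills its own slot */
        partial[id] = id * 10;

        /* No thread passes this point until all have written their slot;
           the barrier also makes those writes visible to the whole team */
        #pragma omp barrier

        /* Phase 2: each thread safely reads a slot written by another thread */
        printf("thread %d sees partial[%d] = %d\n",
               id, (id + 1) % NTHREADS, partial[(id + 1) % NTHREADS]);
    }
    return 0;
}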

  9. Exercise: let's code! › Spawn a team of (many) parallel threads – Print "Hello World" – Put a #pragma omp barrier – Print "Hello World" again after the barrier › What do you see? – Now, remove the barrier construct › Now, put the barrier inside an if – E.g., if(omp_get_thread_num() == 0) { ... } – What do you see? – Error!!!!
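A sketch of the exercise (illustrative, not an official solution); the second parallel region shows the invalid variant, which breaks the rule that a barrier must be encountered by all threads of the team or by none:

#include <stdio.h>
#include <omp.h>

int main(void)
{
    #pragma omp parallel num_threads(8)
    {
        printf("Hello World (1) from thread %d\n", omp_get_thread_num());

        #pragma omp barrier   /* every "(1)" line prints before any "(2)" line */

        printf("Hello World (2) from thread %d\n", omp_get_thread_num());
    }

    /* INVALID: only thread 0 encounters the barrier, the others never do.
       The behavior is undefined: typically the program hangs or the
       runtime aborts with an error. */
    #pragma omp parallel num_threads(8)
    {
        if (omp_get_thread_num() == 0) {
            #pragma omp barrier
        }
    }
    return 0;
}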

  10. Effects on memory › Besides synchronization, a barrier has the effect of making threads' temporary view of the shared memory consistent – You cannot trust any (potentially modified) shared vars before a barrier – Of course, there are no problems with private vars › ..what???

  11. The OpenMP memory model › Shared memory with relaxed consistency – Threads have access to "a place to store and to retrieve variables, called the memory" – Threads can have a temporary view of the memory › Caches, registers, scratchpads… › Can still be accessed by other threads [Diagram: each thread T in the process has its own private memory and a temporary view of the shared memory; a variable with shared(a) lives in shared memory, while firstprivate(a)/private(a) give each thread its own copy]

  12. A bit of architecture…

  13. Caches in a nutshell › A small, fast memory connected to the processor core – ..and to the main memory – Few KB of data › (If any,) caches are a pure hardware mechanism – Used to store a copy of the most frequently accessed data – To speed up execution, even by 10-20 times – Instruction caches / data caches › They perform their work automatically – And transparently – Poor or no control at all at application level – Extremely dangerous in multi- and many-cores

  14. Caches – en.wikipedia.org: "A cache is a hardware or software component that stores data so future requests for that data can be served faster; the data stored in a cache might be the result of an earlier computation, or the duplicate of data stored elsewhere." [Diagram: four CPUs (0-3), each running a thread T and each with its own I$ and D$, connected to a shared Level-2 cache and to off-chip memory (main memory, or L3 cache)]

  15. The catch(es) › Caches are power hungry – Some embedded architectures do not have a D$ › They are not suitable for critical systems – E.g., BOSCH removed I$s › Hardware mechanism, poor control over them – Flush command (typically, the whole cache) – Cache coloring (assign cache portions to threads) – Prefetch (move data before it's actually needed) Coherency problem in multi/many-cores!!

  16.–19. An example: read stale data (animation) › One thread executes a = 5; b = a; while another thread later executes c = a; [Diagram: four CPUs (0-3), each with its own D$, above a shared main memory where a initially holds 11; the new value 5 is not visible in the copy the reading CPU uses, so c = a returns the stale 11 instead of 5]

  20.–24. An example: read stale data, with an explicit cache flush (animation) › Same code, but a dcache_flush() is executed before c = a; [Diagram: the flush synchronizes the D$ with main memory, so when c = a is executed the reading CPU obtains the up-to-date value 5]
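The same scenario written as an OpenMP program (a minimal sketch, assuming two threads and deliberately containing the race the slides describe; variable names mirror the slides):

#include <stdio.h>
#include <omp.h>

int a = 11;                      /* shared: main memory initially holds 11 */

int main(void)
{
    #pragma omp parallel num_threads(2)
    {
        if (omp_get_thread_num() == 0) {
            a = 5;               /* the store may sit in this core's D$    */
            int b = a;           /* this thread certainly sees its own 5   */
            printf("thread 0 read b = %d\n", b);
        } else {
            int c = a;           /* without a flush/barrier, this thread may
                                    observe either the stale 11 or the new 5 */
            printf("thread 1 read c = %d\n", c);
        }
    }
    return 0;
}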

  25.–28. An(other) example: $ writing policies – write-through (animation) › One thread executes a = 5; another executes b = a; [Diagram: with a write-through D$, the store a = 5 updates both the writer's D$ and main memory]

  29.–32. An(other) example: $ writing policies – write-back (animation) › Same code: a = 5; on one thread, b = a; on another [Diagram: with a write-back D$, the store a = 5 stays in the writer's D$ while main memory still holds 11; it reaches main memory only later, when the cache line is written back, so in the meantime other CPUs can read the stale 11]

  33.–36. An(other) example: $ writing policies – write-back with cache flush (animation) › a = 5; dcache_flush(); on one thread, then b = a; on another [Diagram: dcache_flush() forces the dirty value 5 out of the writer's D$ into main memory, so the subsequent b = a observes the up-to-date value 5]

  37. The flush directive #pragma omp flush [ ( list ) ] new-line › Binding thread set is the encountering thread – More "relaxed" than barrier › "It executes the OpenMP flush operation" – Makes its temporary view of the shared memory consistent with the other threads – Conceptually, a call to dcache_flush() › Enforces an order on the memory operations on the variables specified in list
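A classic (simplified) producer/consumer sketch of how flush is used; it assumes two threads and glosses over the atomicity of the flag itself, so take it as an illustration rather than production code:

#include <stdio.h>
#include <omp.h>

int data = 0;
int flag = 0;                         /* 0 = not ready, 1 = data is ready */

int main(void)
{
    #pragma omp parallel num_threads(2)
    {
        if (omp_get_thread_num() == 0) {
            data = 42;                /* produce the value                */
            #pragma omp flush(data)   /* push data out before the flag    */
            flag = 1;
            #pragma omp flush(flag)   /* make the flag visible            */
        } else {
            int ready = 0;
            while (!ready) {          /* spin until the flag is observed  */
                #pragma omp flush(flag)
                ready = flag;
            }
            #pragma omp flush(data)   /* refresh our view of data         */
            printf("consumer read data = %d\n", data);
        }
    }
    return 0;
}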

  38. Semantics: barrier vs flush #pragma omp barrier › Joins the threads of a team › Applies to all threads of a team › Forces consistency of the threads' temporary view of the shared memory #pragma omp flush › Applies to one thread › Forces consistency of its temporary view of the shared memory › Much lighter!

  39. OpenMP software stack › Multi-layer stack – Engineered for portability [Diagram, top to bottom: User code: a = 5; #pragma omp flush → OpenMP runtime: void GOMP_flush() { dcache_flush(); } → Operating System: void dcache_flush() { asm("mov r15, #1"); } → Hardware: D$]
