  1. Tasking in OpenMP – Paolo Burgio, paolo.burgio@unimore.it

  2. Outline
     › Expressing parallelism
       – Understanding parallel threads
     › Memory / data management
       – Data clauses
     › Synchronization
       – Barriers, locks, critical sections
     › Work partitioning
       – Loops, sections, single, tasks…
     › Execution devices
       – Target

  3. A history of OpenMP
     › 1997 – OpenMP for Fortran 1.0
     › 1998 – OpenMP for C/C++ 1.0
     › 2000 – OpenMP for Fortran 2.0
     › 2002 – OpenMP for C/C++ 2.5
       Thread-centric: regular, loop-based parallelism
     › 2008 – OpenMP 3.0
     › 2011 – OpenMP 3.1
       Task-centric: irregular parallelism ➔ tasking
     › 2014 – OpenMP 4.5
       Devices: heterogeneous parallelism, à la GP-GPU

  4. OpenMP programming patterns
     › "Traditional" OpenMP has a thread-centric execution model
       – Fork/join
       – Master-slave
     › Create a team of threads…
       – …then partition the work among them
       – Using work-sharing constructs

  5. OpenMP programming patterns

     #pragma omp sections
     {
       #pragma omp section
       { A(); }
       #pragma omp section
       { B(); }
       #pragma omp section
       { C(); }
       #pragma omp section
       { D(); }
     }

     #pragma omp for
     for (int i=0; i<8; i++)
     {
       // ...
     }

     #pragma omp single
     {
       work();
     }

     [Figure: threads executing the four sections A–D, the eight loop iterations 0–7, and the single region's work]

  6. Exercise: let's code!
     › Traverse a tree
       – Perform the same operation on all elements
       – Download the sample code
     › Recursive

     [Figure: a binary tree rooted at node r, with a generic node x]
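     The sample code itself is not reproduced on these slides; for the snippets that follow, a
     minimal node_t along these lines is assumed (the field names are a guess, not taken from
     the slides):

     typedef struct node {
         int          value;   /* payload processed by doYourWork() */
         struct node *left;    /* left child, or NULL */
         struct node *right;   /* right child, or NULL */
     } node_t;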

  7. Exercise: let's code!
     › Now, parallelize it!
       – From the example

     void traverse_tree(node_t *n)
     {
       doYourWork(n);
       if(n->left)  traverse_tree(n->left);
       if(n->right) traverse_tree(n->right);
     }

     ...
     traverse_tree(root);

     [Figure: the tree's nodes, numbered 0–6, all visited by the traversal]

  8. Solved: traversing a tree in parallel
     › Recursive
       – Parreg + section for each call
       – Nested parallelism
     › Assume the very first time we call traverse_tree
       – Root node

     void traverse_tree(node_t *n)
     {
       #pragma omp parallel sections
       {
         #pragma omp section
         doYourWork(n);
         #pragma omp section
         if(n->left) traverse_tree(n->left);
         #pragma omp section
         if(n->right) traverse_tree(n->right);
       }
     }

     ...
     traverse_tree(root);

     [Figure: the tree's nodes 0–6, each call spawning a nested parallel region]

  9. Catches (1)
     › Cannot nest worksharing constructs without an intervening parreg
       – And its barrier…
       – Costly

     void traverse_tree(node_t *n)
     {
       doYourWork(n);
       #pragma omp parallel sections
       {
         #pragma omp section
         if(n->left) traverse_tree(n->left);
         #pragma omp section
         if(n->right) traverse_tree(n->right);
       } // Barrier
     } // Parreg barrier

     ...
     traverse_tree(root);

     [Figure: tree rooted at r, each recursive call paying for a nested parallel region and its barriers]

  10. Catches (2)
     › #threads grows exponentially
       – Harder to manage

     void traverse_tree(node_t *n)
     {
       doYourWork(n);
       #pragma omp parallel sections
       {
         #pragma omp section
         if(n->left) traverse_tree(n->left);
         #pragma omp section
         if(n->right) traverse_tree(n->right);
       } // Barrier
     } // Parreg barrier

     ...
     traverse_tree(root);

     [Figure: the tree again, now with a new thread team created at every level of the recursion]

  11. Catches (3)
     › Code is not easy to understand
     › Even harder to modify
       – What if I add a third child node?

     void traverse_tree(node_t *n)
     {
       doYourWork(n);
       #pragma omp parallel sections
       {
         #pragma omp section
         if(n->left) traverse_tree(n->left);
         #pragma omp section
         if(n->right) traverse_tree(n->right);
       } // Barrier
     } // Parreg barrier

     ...
     traverse_tree(root);

     [Figure: the tree, with thread teams at each recursion level]

  12. Limitations of "traditional" WS
     Cannot nest worksharing constructs without an intervening parreg
     › Parregs are traditionally costly
       – A lot of operations to create a team of threads
       – Barrier…

         Parreg prologue    ~30k cycles
         Static loops       10-150 cycles
         Dyn loops start    5-6k cycles

     › The number of threads explodes and it's harder to manage
       – Parreg => create new threads

  13. Limitations of "traditional" WS
     It is cumbersome to create parallelism dynamically
     › In loops, sections
       – Work is statically determined!
       – Before entering the construct
       – Even in dynamic loops
     › "if <condition>, then create work"

     #pragma omp for
     for (int i=0; i<8; i++)
     {
       // ...
     }

     [Figure: two threads statically sharing the eight loop iterations 0–7]

  14. Limitations of "traditional" WS
     Poor semantics for irregular workloads
     › Sections-based parallelism, which is anyway cumbersome to write
       – OpenMP was born for loop-based parallelism
     › Code not scalable
       – Even a small modification causes you to re-think the strategy

     #pragma omp sections
     {
       #pragma omp section
       { A(); }
       #pragma omp section
       { B(); }
       #pragma omp section
       { C(); }
       #pragma omp section
       { D(); }
     }

     [Figure: two threads splitting the four sections A–D unevenly]

  15. A different parallel paradigm
     A work-oriented paradigm for partitioning workloads
     › Implements a producer-consumer paradigm
       – As opposed to OpenMP's thread-centric model
     › Introduces the task pool
       – Where units of work (OpenMP tasks)
       – are pushed by threads
       – and pulled and executed by threads
     › E.g., implemented as a FIFO queue (aka task queue)

     [Figure: producer threads pushing tasks into the task queue, consumer threads pulling and executing them]

  16. The task directive

     #pragma omp task [clause [[,] clause]...] new-line
         structured-block

     Where clauses can be:
       if([ task : ] scalar-expression)
       final(scalar-expression)
       untied
       default(shared | none)
       mergeable
       private(list)
       firstprivate(list)
       shared(list)
       depend(dependence-type : list)
       priority(priority-value)

     › We will see only the data-sharing clauses
       – Same as parallel, but… DEFAULT IS NOT SHARED!!!!
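     To make the last point concrete: inside a task, a variable that is not shared in every
     enclosing construct is firstprivate by default, i.e. the task gets a copy of its value at
     creation time. A minimal sketch (my own example, not from the slides):

     #include <stdio.h>
     #include <omp.h>

     int main(void)
     {
         #pragma omp parallel num_threads(2)
         #pragma omp single
         {
             int x = 1;              /* local to the producing thread */

             #pragma omp task        /* x defaults to firstprivate here... */
             printf("task sees x = %d\n", x);   /* ...so this prints 1 */

             x = 2;                  /* does not affect the task's copy */
         }   /* implicit barrier: the task has completed by this point */
         return 0;
     }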

  17. Two sides
     › Tasks are produced
     › Tasks are consumed

     /* Create threads */
     #pragma omp parallel num_threads(2)
     {
       /* Push a task in the q */
       #pragma omp task
       { t0(); }

       /* Push another task in the q */
       #pragma omp task
       t1();
     } // Implicit barrier

     › Let's code! Try this!
       – t0 and t1 are printfs
       – Also, print who produces

     [Figure: producer threads pushing t0 and t1 into the queue, consumer threads pulling them]
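     One possible solution to the "try this" exercise (a sketch, not the official sample code):
     make t0 and t1 print which thread runs them, and capture who created each task. Note that
     there is no single here, so both threads of the team push both tasks.

     #include <stdio.h>
     #include <omp.h>

     int main(void)
     {
         #pragma omp parallel num_threads(2)
         {
             int producer = omp_get_thread_num();   /* who pushes the tasks */

             #pragma omp task   /* producer is captured firstprivate by default */
             printf("t0: produced by thread %d, consumed by thread %d\n",
                    producer, omp_get_thread_num());

             #pragma omp task
             printf("t1: produced by thread %d, consumed by thread %d\n",
                    producer, omp_get_thread_num());
         }   /* implicit barrier: all tasks have run by here */
         return 0;
     }

     With two producers, four lines are printed in total, which is exactly the point of the
     next slide.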

  18. I cheated a bit
     › How many producers?
       – So, how many tasks?

     /* Create threads */
     #pragma omp parallel num_threads(2)
     {
       /* Push a task in the q */
       #pragma omp task
       { t0(); }

       /* Push another task in the q */
       #pragma omp task
       t1();
     } // Implicit barrier

     [Figure: the task queue so far, holding one t0 and one t1]

  19. I cheated a bit
     › How many producers?
       – So, how many tasks?

     /* Create threads */
     #pragma omp parallel num_threads(2)
     {
       /* Push a task in the q */
       #pragma omp task
       { t0(); }

       /* Push another task in the q */
       #pragma omp task
       t1();
     } // Implicit barrier

     [Figure: both threads act as producers, so the queue ends up holding two copies each of t0 and t1]

  20. Let's make it simpler
     › Work is produced in parallel by threads
     › Work is consumed in parallel by threads
     › A lot of confusion!
       – The number of tasks grows
       – Hard to control the producers
     › How to make this simpler?

  21. Single-producer, multiple consumers
     › A paradigm! Typically preferred by programmers
       – Code more understandable
       – Simple
       – More manageable
     › How to do this?

     Every thread produces:

     /* Create threads */
     #pragma omp parallel num_threads(2)
     {
       #pragma omp task
       t0();
       #pragma omp task
       t1();
     } // Implicit barrier

     Only one thread produces (single):

     /* Create threads */
     #pragma omp parallel num_threads(2)
     {
       #pragma omp single
       {
         #pragma omp task
         t0();
         #pragma omp task
         t1();
       }
     } // Implicit barrier

     [Figure: one producer thread pushing t0 and t1, all threads acting as consumers]

  22. The task directive
     Can be used
     › in a nested manner
       – Before doing work, produce two other tasks
       – Only need one parreg "outside"
     › in an irregular manner
       – See cond?
       – Barriers are not involved!
       – Unlike parregs'

     /* Create threads */
     #pragma omp parallel num_threads(2)
     {
       #pragma omp single
       {
         /* Push a task in the q */
         #pragma omp task
         {
           /* Push a (child) task in the q */
           #pragma omp task
           t1();

           /* Conditionally push a task in the q */
           if(cond)
             #pragma omp task
             t2();

           /* After producing t1 and t2, do some work */
           t0();
         }
       }
     } // Implicit barrier

  23. The task directive
     › A task graph
     › Edges are "father-son" relationships
     › Not timing/precedence!!!

     /* Create threads */
     #pragma omp parallel num_threads(2)
     {
       #pragma omp single
       {
         /* Push a task in the q */
         #pragma omp task
         {
           /* Push a (child) task in the q */
           #pragma omp task
           t1();

           /* Conditionally push a task in the q */
           if(cond)
             #pragma omp task
             t2();

           /* After producing t1 and t2, do some work */
           t0();
         }
       }
     } // Implicit barrier

     [Figure: the task graph, with t0 as the father of t1 and, if cond holds, t2]
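     As a side note, the same nested-task pattern maps naturally onto the tree-traversal
     exercise from the earlier slides. The sketch below is a reconstruction, not code shown on
     these slides; it reuses node_t and doYourWork() from before, with a single parreg outside
     and tasks created recursively inside it:

     void traverse_tree(node_t *n)
     {
         doYourWork(n);                   /* visit the current node */

         if (n->left) {
             #pragma omp task             /* n is firstprivate by default */
             traverse_tree(n->left);
         }
         if (n->right) {
             #pragma omp task
             traverse_tree(n->right);
         }
     }

     ...
     #pragma omp parallel
     #pragma omp single                   /* one producer kicks off the root */
     traverse_tree(root);                 /* all tasks are done at the implicit barrier */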

  24. It's a matter of time
     › The task directive represents the push into the WQ
       – And the pull???
     › Not "where" it is in the code
       – But, when!
     › In OpenMP tasks, we separate the moments in time
       – when we produce work (push - #pragma omp task)
       – when we consume the work (pull - ????)

  25. Timing de-coupling
     › One thread produces
     › All of the threads consume
     › ..but, when????

     /* Create threads */
     #pragma omp parallel num_threads(2)
     {
       #pragma omp single
       {
         #pragma omp task
         t0();
         #pragma omp task
         t1();
       } // Implicit barrier
     } // Implicit barrier

     [Figure: the producer has pushed t0 into the queue; the consumers have not pulled anything yet]

  26. Timing de-coupling
     › One thread produces
     › All of the threads consume
     › ..but, when????

     /* Create threads */
     #pragma omp parallel num_threads(2)
     {
       #pragma omp single
       {
         #pragma omp task
         t0();
         #pragma omp task
         t1();
       } // Implicit barrier
     } // Implicit barrier

     [Figure: both t0 and t1 are now in the queue, waiting to be pulled by the consumer threads]
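     The slides leave the "when" question open at this point. For reference, OpenMP guarantees
     that pending tasks are executed at task scheduling points: the implicit barriers marked in
     the code above are such points, and #pragma omp taskwait is an explicit one. A minimal
     sketch of the explicit form (my own, not from the slides; t0 and t1 are the slides'
     placeholder functions):

     #pragma omp parallel num_threads(2)
     #pragma omp single
     {
         #pragma omp task
         t0();
         #pragma omp task
         t1();

         #pragma omp taskwait   /* explicit pull point: wait for t0 and t1 here */

         /* ... code that needs t0 and t1 to have finished ... */
     }   /* implicit barrier: any remaining tasks complete before threads leave */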
