

SLIDE 1

Tasking in OpenMP

Paolo Burgio

paolo.burgio@unimore.it

SLIDE 2

Outline

› Expressing parallelism

– Understanding parallel threads

› Memory and data management

– Data clauses

› Synchronization

– Barriers, locks, critical sections

› Work partitioning

– Loops, sections, single work, tasks…

› Execution devices

– Target


SLIDE 3

A history of OpenMP

› 1997

– OpenMP for Fortran 1.0

› 1998

– OpenMP for C/C++ 1.0

› 2000

– OpenMP for Fortran 2.0

› 2005

– OpenMP for C/C++ 2.5

› 2008

– OpenMP 3.0

› 2011

– OpenMP 3.1

› 2015

– OpenMP 4.5


Thread-centric ➔ regular, loop-based parallelism. Task-centric ➔ irregular parallelism (tasking). Devices ➔ heterogeneous parallelism, à la GP-GPU.

SLIDE 4

OpenMP programming patterns

› "Traditional" OpenMP has a thread-centric execution model

– Fork/join
– Master-slave

› Create a team of threads…

– …then partition the work among them
– Using work-sharing constructs

SLIDE 5

OpenMP programming patterns

#pragma omp for
for (int i = 0; i < 8; i++) {
  // ...
}

#pragma omp sections
{
  #pragma omp section
  { A(); }
  #pragma omp section
  { B(); }
  #pragma omp section
  { C(); }
  #pragma omp section
  { D(); }
}

#pragma omp single
{
  work();
}

SLIDE 6

Exercise

› Traverse a tree

– Perform the same operation on all elements
– Download sample code

› Recursive

Let's code!

SLIDE 7

Exercise

› Now, parallelize it!

– From the example


Let's code!

void traverse_tree(node_t *n)
{
  doYourWork(n);
  if (n->left)  traverse_tree(n->left);
  if (n->right) traverse_tree(n->right);
}
...
traverse_tree(root);

SLIDE 8

Solved: traversing a tree in parallel

› Recursive

– Parreg + section for each call
– Nested parallelism

› Assume the very first time we call traverse_tree

– Root node

void traverse_tree(node_t *n)
{
  #pragma omp parallel sections
  {
    #pragma omp section
      doYourWork(n);
    #pragma omp section
      if (n->left) traverse_tree(n->left);
    #pragma omp section
      if (n->right) traverse_tree(n->right);
  }
}
...
traverse_tree(root);

SLIDE 9

Catches (1)

› Cannot nest worksharing constructs without an intervening parreg

– And its barrier…
– Costly

void traverse_tree(node_t *n)
{
  doYourWork(n);
  #pragma omp parallel sections
  {
    #pragma omp section
      if (n->left) traverse_tree(n->left);
    #pragma omp section
      if (n->right) traverse_tree(n->right);
  } // Barrier
} // Parreg barrier
...
traverse_tree(root);

SLIDE 10

Catches (2)

› #threads grows exponentially

– Harder to manage

void traverse_tree(node_t *n)
{
  doYourWork(n);
  #pragma omp parallel sections
  {
    #pragma omp section
      if (n->left) traverse_tree(n->left);
    #pragma omp section
      if (n->right) traverse_tree(n->right);
  } // Barrier
} // Parreg barrier
...
traverse_tree(root);

SLIDE 11

Catches (3)

› Code is not easy to understand
› Even harder to modify

– What if I add a third child node?

void traverse_tree(node_t *n)
{
  doYourWork(n);
  #pragma omp parallel sections
  {
    #pragma omp section
      if (n->left) traverse_tree(n->left);
    #pragma omp section
      if (n->right) traverse_tree(n->right);
  } // Barrier
} // Parreg barrier
...
traverse_tree(root);

SLIDE 12

Limitations of "traditional" WS

› Cannot nest worksharing constructs without an intervening parreg
› Parregs are traditionally costly

– A lot of operations to create a team of threads
– Barrier…

› The number of threads explodes and it's harder to manage

– Parreg => create new threads

Typical costs:
  Parreg:                ~30k cycles
  Static loop prologue:  10-150 cycles
  Dynamic loop start:    5-6k cycles

SLIDE 13

Limitations of "traditional" WS

› It is cumbersome to create parallelism dynamically
› In loops, sections

– Work is statically determined!
– Before entering the construct
– Even in dynamic loops

› "if <condition>, then create work"

#pragma omp for
for (int i = 0; i < 8; i++) {
  // ...
}

SLIDE 14

Limitations of "traditional" WS

› Poor semantics for irregular workloads
› Sections-based parallelism is cumbersome to write

– OpenMP was born for loop-based parallelism

› Code not scalable

– Even a small modification causes you to re-think the strategy

#pragma omp sections
{
  #pragma omp section
  { A(); }
  #pragma omp section
  { B(); }
  #pragma omp section
  { C(); }
  #pragma omp section
  { D(); }
}

SLIDE 15

A different parallel paradigm

› A work-oriented paradigm for partitioning workloads
› Implements a producer-consumer paradigm

– As opposed to OpenMP's thread-centric model

› Introduces the task pool

– Where units of work (OpenMP tasks)
– are pushed by threads
– and pulled and executed by threads

› E.g., implemented as a FIFO queue (aka task queue)

(Figure: producer threads push tasks into the pool; consumer threads pull and execute them.)

SLIDE 16

The task directive

› We will see only data sharing clauses

– Same as parallel, but… THE DEFAULT IS NOT SHARED!

#pragma omp task [clause [[,] clause]...] new-line
  structured-block

Where clauses can be:
  if([ task : ] scalar-expression)
  final(scalar-expression)
  untied
  default(shared | none)
  mergeable
  private(list)
  firstprivate(list)
  shared(list)
  depend(dependence-type : list)
  priority(priority-value)
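A minimal sketch of how these data-sharing clauses behave (the function and variable names here are mine, not from the slides): a firstprivate variable is copied when the task is created, so later writes by the producer do not affect the task's copy.

```c
/* Sketch: firstprivate captures the value of x at task creation. */
int capture_demo(void)
{
    int x = 10, result = 0;

    #pragma omp parallel num_threads(2) shared(result)
    {
        #pragma omp single
        {
            /* x is copied NOW, with value 10 */
            #pragma omp task firstprivate(x) shared(result)
            result = x;

            x = 20; /* does not affect the task's private copy */
        } /* implicit barrier: the task has completed here */
    }
    return result;
}
```

Note that without -fopenmp the pragmas are ignored and the code runs serially, with the same result.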

SLIDE 17

Two sides

› Tasks are produced
› Tasks are consumed
› Try this!

– t0 and t1 are printf
– Also, print who produces

/* Create threads */
#pragma omp parallel num_threads(2)
{
  /* Push a task in the q */
  #pragma omp task
  { t0(); }

  /* Push another task in the q */
  #pragma omp task
  t1();
} // Implicit barrier

Let's code!

SLIDE 18

I cheated a bit

› How many producers?

– So, how many tasks?

/* Create threads */
#pragma omp parallel num_threads(2)
{
  /* Push a task in the q */
  #pragma omp task
  { t0(); }

  /* Push another task in the q */
  #pragma omp task
  t1();
} // Implicit barrier

SLIDE 19

I cheated a bit

› How many producers?

– So, how many tasks?

(Every thread in the parreg executes both task constructs: with 2 threads there are 2 producers and 4 tasks in total.)

SLIDE 20

Let's make it simpler

› Work is produced in parallel by threads
› Work is consumed in parallel by threads
› A lot of confusion!

– Number of tasks grows
– Hard to control producers

› How to make this simpler?

SLIDE 21

Single-producer, multiple consumers

› A paradigm! Typically preferred by programmers

– Code more understandable
– Simple
– More manageable

› How to do this?

Without single (every thread produces):

/* Create threads */
#pragma omp parallel num_threads(2)
{
  #pragma omp task
  t0();
  #pragma omp task
  t1();
} // Implicit barrier

With single (one producer):

/* Create threads */
#pragma omp parallel num_threads(2)
{
  #pragma omp single
  {
    #pragma omp task
    t0();
    #pragma omp task
    t1();
  }
} // Implicit barrier

SLIDE 22

The task directive

› Can be used in a nested manner

– Before doing work, produce two other tasks
– Only need one parreg "outside"

› Can be used in an irregular manner

– See cond?
– Barriers are not involved!
– Unlike parregs'

/* Create threads */
#pragma omp parallel num_threads(2)
{
  #pragma omp single
  {
    /* Push a task in the q */
    #pragma omp task
    {
      /* Push a (child) task in the q */
      #pragma omp task
      t1();

      /* Conditionally push a task in the q */
      if (cond) {
        #pragma omp task
        t2();
      }

      /* After producing t1 and t2, do some work */
      t0();
    }
  }
} // Implicit barrier
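Nested tasking is exactly what the earlier tree-traversal exercise needs. Below is only a sketch under my own names (node_t layout, count_nodes), not the course's reference solution: one task per subtree, with the parreg's implicit barrier guaranteeing that all tasks have finished.

```c
#include <stddef.h>

typedef struct node {
    struct node *left, *right;
} node_t;

/* Each call counts its node, then spawns one task per child subtree.
 * n and count are function parameters, hence firstprivate in the tasks. */
static void traverse(node_t *n, int *count)
{
    if (!n) return;

    #pragma omp atomic
    (*count)++;

    #pragma omp task firstprivate(n, count)
    traverse(n->left, count);

    #pragma omp task firstprivate(n, count)
    traverse(n->right, count);
}

int count_nodes(node_t *root)
{
    int count = 0;

    #pragma omp parallel shared(count, root)
    {
        #pragma omp single
        traverse(root, &count);
    } /* implicit barrier: all spawned tasks have completed */

    return count;
}
```

Unlike the sections-based version, no nested parregs are created, and adding a third child is just one more task construct.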

SLIDE 23

The task directive

› A task graph
› Edges are "father-son" relationships
› Not timing/precedence!!!

(Figure: task graph with father t0 and children t1, t2; the edge to t2 is conditional on cond.)

/* Create threads */
#pragma omp parallel num_threads(2)
{
  #pragma omp single
  {
    /* Push a task in the q */
    #pragma omp task
    {
      /* Push a (child) task in the q */
      #pragma omp task
      t1();

      /* Conditionally push a task in the q */
      if (cond) {
        #pragma omp task
        t2();
      }

      /* After producing t1 and t2, do some work */
      t0();
    }
  }
} // Implicit barrier

SLIDE 24

It's a matter of time

› The task directive represents the push in the WQ

– And the pull???

› It's not "where" it is in the code

– But when!

› In OpenMP tasks, we separate the moments in time

– when we produce work (push: #pragma omp task)
– when we consume the work (pull: ????)

SLIDE 25

Timing de-coupling

› One thread produces
› All of the threads consume
› …but when????

/* Create threads */
#pragma omp parallel num_threads(2)
{
  #pragma omp single
  {
    #pragma omp task
    t0();
    #pragma omp task
    t1();
  } // Implicit barrier
} // Implicit barrier

SLIDE 27

Task Scheduling Points

› The points where the executing thread can pull a task from the queue

Task scheduling points (OMP specs):

a. the point immediately following the generation of an explicit task;
b. after the point of completion of a task region;
c. in a taskyield region;
d. in a taskwait region;
e. at the end of a taskgroup region;
f. in an implicit and explicit barrier region.

/* Create threads */
#pragma omp parallel num_threads(2)
{
  #pragma omp single
  {
    #pragma omp task
    t0();
    #pragma omp task
    {
      #pragma omp task
      t2();
      t1();
    } /* I just finished a task (TSP b) */
    // I just pushed a task (TSP a)
  } // Implicit barrier (TSP f)
} // Implicit barrier (TSP f)

SLIDE 28

Timing de-coupling

› One thread produces
› All of the threads consume

/* Create threads */
#pragma omp parallel num_threads(2)
{
  #pragma omp single
  {
    #pragma omp task
    t0();
    #pragma omp task
    t1();
  } // Implicit barrier
} // Implicit barrier


SLIDE 32

Exercise

› Create an array of N elements

– Put inside each array element its index, multiplied by 2
– arr[0] = 0; arr[1] = 2; arr[2] = 4; …and so on

› Now, do it in parallel with a team of T threads

– Using the task construct instead of for
– Remember: if not specified, data sharing is unknown! (NOT SHARED)

› Mimic dynamic loop semantics (chunk = 1 ➔ 1 iteration per task)

– "Tasks made of 1 iteration"

Let's code!
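One possible solution sketch (fill_array is my own name, not from the sample code): arr is explicitly shared, while each task gets its own firstprivate copy of i, captured at task creation.

```c
/* One task per iteration, mimicking a dynamic loop with chunk = 1. */
void fill_array(int *arr, int n)
{
    #pragma omp parallel shared(arr, n)
    {
        #pragma omp single
        {
            for (int i = 0; i < n; i++) {
                /* i is captured firstprivate at task creation */
                #pragma omp task shared(arr) firstprivate(i)
                arr[i] = 2 * i;
            }
        } /* implicit barrier: all tasks are complete here */
    }
}
```

The implicit barrier at the end of single (TSP f) is what guarantees every task has run before the parreg ends.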

SLIDE 33

Exercise

› Create an array of N elements

– Put inside each array element its index, multiplied by 2
– arr[0] = 0; arr[1] = 2; arr[2] = 4; …and so on

› Mimic dynamic loops semantic

– Now, find a way to increase chunking
– Tasks made of CHUNK = 1..2..4..5 iterations
– (simple: N = 20)

Let's code!
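A sketch of the chunked variant (again under my own names): each task now covers CHUNK consecutive iterations, clamping the last chunk at n.

```c
/* One task per chunk of consecutive iterations. */
void fill_array_chunked(int *arr, int n, int chunk)
{
    #pragma omp parallel shared(arr, n, chunk)
    {
        #pragma omp single
        {
            for (int start = 0; start < n; start += chunk) {
                #pragma omp task shared(arr, n) firstprivate(start, chunk)
                {
                    int end = start + chunk;
                    if (end > n) end = n; /* last chunk may be shorter */
                    for (int i = start; i < end; i++)
                        arr[i] = 2 * i;
                }
            }
        } /* implicit barrier: all tasks are complete here */
    }
}
```

Larger chunks mean fewer tasks and less scheduling overhead, at the cost of coarser load balancing.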

SLIDE 34

Implicit task

› In parregs, threads perform work

– Called implicit task
– One for each thread in the parreg

#pragma omp parallel num_threads(2)
{
  #pragma omp single
  {
    for (int i = 0; i < 10000; i++) {
      #pragma omp task
      t4_i();
    }
    #pragma omp task
    {
      #pragma omp task
      t0();
      #pragma omp task
      t1();
      #pragma omp task
      t2();
      work();
    } // end of task
  } // end of single (bar)
} // parreg end

(Figure: explicit tasks t0..t4 plus one implicit task per thread, it0 and it1.)

SLIDE 35

Task synchronization

› Implicit or explicit barriers

– Join all threads in a parreg

› Need something lighter

– That involves only tasks
– That does not involve all tasks!

SLIDE 36

Wait for them all?

Sometimes you don't need to…

› t3 needs output from

– t0
– t1
– t2

› t3 doesn't need output from the t4s

#pragma omp parallel
{
  #pragma omp single
  {
    for (int i = 0; i < 10000; i++) {
      #pragma omp task
      t4i_work();
    }
    #pragma omp task
    {
      #pragma omp task
      t0_work();
      #pragma omp task
      t1_work();
      #pragma omp task
      t2_work();

      // Requires the output of t0, t1, t2, but not of the t4s
      t3_work();
    } // end of task t3
  } // bar
} // parreg end

The tool for this: #pragma omp taskgroup

SLIDE 37

The taskgroup directive

› Wait on the completion of children tasks, and their descendants
› Implicit TSP

#pragma omp taskgroup new-line
  structured-block

(Note: unlike taskwait, taskgroup is not a standalone directive: it applies to a structured block, and waits at the end of that block. This is TSP e in the OMP specs: "at the end of a taskgroup region".)
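A minimal sketch (the names are mine) showing that taskgroup also waits for descendants, i.e. children of children:

```c
/* taskgroup waits for the child task AND its nested child. */
int taskgroup_demo(void)
{
    int done = 0;

    #pragma omp parallel shared(done)
    {
        #pragma omp single
        {
            #pragma omp taskgroup
            {
                #pragma omp task shared(done)
                {
                    /* grandchild: taskgroup waits for this one too */
                    #pragma omp task shared(done)
                    {
                        #pragma omp atomic
                        done += 1;
                    }

                    #pragma omp atomic
                    done += 1;
                }
            } /* both increments are guaranteed to have happened */
        }
    }
    return done;
}
```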

SLIDE 38

The taskwait directive

› Wait on the completion of children tasks

– Not of children of children!

› Implicit TSP
› Strangely…

– Older than taskgroup

#pragma omp taskwait
Standalone directive

(This is TSP d in the OMP specs: "in a taskwait region".)
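A sketch of taskwait (names are mine): the producing task blocks until its direct children finish, so it can safely read their results afterwards.

```c
/* taskwait blocks the current task until its child tasks complete. */
int taskwait_demo(void)
{
    int x = 0;

    #pragma omp parallel shared(x)
    {
        #pragma omp single
        {
            #pragma omp task shared(x)
            x = 42;

            #pragma omp taskwait /* wait for the child task above */
            /* x == 42 is guaranteed here */
        }
    }
    return x;
}
```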

SLIDE 39

The taskyield directive

› Explicit TSP

– The thread may extract (and execute) one task from the queue

#pragma omp taskyield
Standalone directive

(This is TSP c in the OMP specs: "in a taskyield region".)

SLIDE 40

How to run the examples

› Download the Code/ folder from the course website
› Compile:
  $ gcc -fopenmp code.c -o code
› Run (Unix/Linux):
  $ ./code
› Run (Win/Cygwin):
  $ ./code.exe

Let's code!

SLIDE 41

References

› "Calcolo parallelo" website

– http://hipert.unimore.it/people/paolob/pub/PhD/index.html

› My contacts

– paolo.burgio@unimore.it
– http://hipert.mat.unimore.it/people/paolob/

› Useful links

– http://www.openmp.org
– http://www.google.com