OpenMP dynamic loops
Paolo Burgio
paolo.burgio@unimore.it
› Expressing parallelism
– Understanding parallel threads
› Memory Data management
– Data clauses
› Synchronization
– Barriers, locks, critical sections
› Work partitioning
– Loops, sections, single work, tasks…
› Execution devices
– Target
– Example: a loop
– If one thread is delayed, it prevents the other threads from doing useful work!

#pragma omp parallel num_threads(4)
{
  #pragma omp for
  for(int i=0; i<N; i++)
  {
    ...
  } // (implicit) barrier

  // USEFUL WORK!!
} // (implicit) barrier
– Might be neither effective nor efficient

#pragma omp parallel for num_threads(4)
for (int i=0; i<16; i++)
{
  /* UNBALANCED LOOP CODE */
}
/* (implicit) Barrier */

[Figure: with a static partitioning, threads that finish their share early sit idle at the barrier]
– At runtime!
– Static: "Partition the loop in NTHREADS parts and assign them to the threads of the team"
  – Naive and passive
– Dynamic: "Each thread in the team fetches an iteration (or a block of them) when it is idle"
  – Proactive
  – Work-conserving
#pragma omp parallel for num_threads(4) \
                         schedule(dynamic)
for (int i=0; i<16; i++)
{
  /* UNBALANCED LOOP CODE */
}
/* (implicit) Barrier */
› The iteration space is divided according to the schedule clause
– kind can be : { static | dynamic | guided | auto | runtime }
#pragma omp for [clause [[,] clause]...] new-line
  for-loops

Where clauses can be:
  private(list)
  firstprivate(list)
  lastprivate(list)
  linear(list[ : linear-step])
  reduction(reduction-identifier : list)
  schedule([modifier [, modifier] : ] kind[, chunk_size])
  collapse(n)
  nowait
– static: iterations are divided into chunks of chunk_size, and the chunks are assigned to the threads before entering the loop
  – If chunk_size is unspecified, chunk_size = NITER/NTHREADS (with some adjustment…)
– dynamic: iterations are divided into chunks of chunk_size
  – At runtime, each thread requests a new chunk after finishing the previous one
  – If chunk_size is unspecified, chunk_size = 1
8
#pragma omp parallel for num_threads(2) \
                         schedule( ... )
for (int i=0; i<8; i++)
{
  // ...
}
/* (implicit) Barrier */

[Figure: the 8 iterations distributed between threads ID 0 and ID 1, depending on the chosen schedule]
– guided: a mix of static and dynamic
  – chunk_size determined statically, assignment done dynamically
– auto: the programmer lets the compiler and/or the runtime decide
  – Chunk size, thread mapping…
  – "I wash my hands"
– runtime: only the runtime decides, according to the run-sched-var ICV
  – If run-sched-var = auto, the schedule is implementation defined
[Figure: iterations 0–7 distributed between threads ID 0 and ID 1 under schedule(static), schedule(dynamic, NITER/NTHRD), schedule(dynamic, 2), schedule(dynamic, 1) and schedule(dynamic)]
– E.g., modifier can be: { monotonic | nonmonotonic | simd }
– Lets you tune the loop and give more information to the OpenMP stack
  – To maximize performance
› So, why not always dynamic?
– For unbalanced workloads, they are more flexible
– "For balanced workloads, in the worst case, they behave like static loops!"
Not always true!
› Static loops have a (light) cost, paid only before the loop
– Actually, the lightest way you can distribute work in OpenMP!!
– Often used as a performance reference…
› Dynamic loops have a cost:
– For initializing the loop – For fetching a(nother) chunk of work – At the end of the loop
[Figure (repeated): iterations 0–7 distributed between threads ID 0 and ID 1 under schedule(static), schedule(dynamic, NITER/NTHRD), schedule(dynamic, 2), schedule(dynamic, 1) and schedule(dynamic)]
– Put inside each array element its index, multiplied by 2
  – arr[0] = 0; arr[1] = 2; arr[2] = 4; …and so on
– Use both static and dynamic loops
– Each thread prints its iteration index i
– What do you see (and what should you see)?
Let's code!
#pragma omp parallel for schedule(...)
for (int i=0; i<NUM; i++)
{
  // ...
  // Simulate iteration-dependent work
  volatile long a = i * 1000000L;
  while(a--) ;
}
› "Calcolo parallelo" (Parallel Computing) course website
– http://hipert.unimore.it/people/paolob/pub/PhD/index.html
› My contacts
– paolo.burgio@unimore.it – http://hipert.mat.unimore.it/people/paolob/
› Useful links
– http://www.openmp.org – http://www.google.com – http://gcc.gnu.org