Scheduling Strategies for Optimistic Parallel Execution of Irregular Programs

Milind Kulkarni, Patrick Carribault, Keshav Pingali, Ganesh Ramanarayanan, Bruce Walter, Kavita Bala and L. Paul Chew
University of Texas at Austin
Cornell University

Amorphous Data Parallelism
• Many irregular programs implement iterative algorithms over worklists
  ‣ Mesh refinement, agglomerative clustering, maxflow algorithms, compiler analyses, ...
• Complex dependences between iterations
• But many iterations can be executed in parallel
• New elements can be added to the worklist

Delaunay Mesh Refinement (DMR)

    Worklist wl;
    wl.add(mesh.badTriangles());
    while (wl.size() != 0) {
        Triangle t = wl.get();
        if (t no longer in mesh) continue;
        Cavity c = new Cavity(t);
        c.expand();
        c.retriangulate();
        mesh.update(c);
        wl.add(c.badTriangles());
    }

• No ordering constraints on processing of worklist items

Parallelism in DMR
• Can process bad triangles concurrently
  ‣ As long as cavities do not overlap
  ‣ Cannot determine this until run time
• Example of amorphous data parallelism
• Our approach: Galois system for optimistic parallelization [PLDI’07, ASPLOS’08]
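The overlap condition above can be illustrated with a small run-time check. This is a minimal sketch, not code from the Galois system: the class name `CavityConflict` and the representation of a cavity as a set of integer triangle ids are assumptions made for illustration only.

```java
import java.util.Collections;
import java.util.Set;

/** Hypothetical helper (not part of Galois): run-time overlap test for two cavities. */
final class CavityConflict {
    private CavityConflict() {}

    /**
     * Returns true when the two cavities share at least one triangle.
     * Triangles are identified by integer ids, which is an assumption of this sketch.
     */
    static boolean overlaps(Set<Integer> cavityA, Set<Integer> cavityB) {
        // Collections.disjoint is true when the sets have no element in common.
        return !Collections.disjoint(cavityA, cavityB);
    }
}
```

Because a cavity is only known after `expand()` has run on the current mesh, a check of this kind can only be made while the program executes, which is why the slides treat the parallelism as something to be discovered optimistically.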

Galois System
• User code
  ‣ Optimistic iterators: foreach e in Set s do B(e)
  ‣ Sequential semantics
• Class libraries
  ‣ Data structures
  ‣ Conflict conditions
• Runtime system
  ‣ Optimistic parallelization
  ‣ Conflict detection & handling

DMR User Code

    Worklist wl;
    wl.add(mesh.badTriangles());
    while (wl.size() != 0) {
        Triangle t = wl.get();
        if (t no longer in mesh) continue;
        Cavity c = new Cavity(t);
        c.expand();
        c.retriangulate();
        mesh.update(c);
        wl.add(c.badTriangles());
    }

DMR User Code

    Worklist wl;
    wl.add(mesh.badTriangles());
    foreach Triangle t in wl {
        if (t no longer in mesh) continue;
        Cavity c = new Cavity(t);
        c.expand();
        c.retriangulate();
        mesh.update(c);
        wl.add(c.badTriangles());
    }
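To make the idea of optimistic execution with conflict detection concrete, here is a deterministic, single-threaded simulation. It is a sketch under stated assumptions and not the Galois runtime: the class `OptimisticRounds`, the round-based model, and the `neighborhood` function (the set of element ids an iteration touches) are all hypothetical. Each round speculatively starts up to `cores` iterations; an iteration commits if its neighborhood is disjoint from everything already claimed in that round, otherwise it aborts and is put back on the worklist. Work created during an iteration is not modeled.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical round-based simulation of optimistic execution (not the Galois runtime). */
final class OptimisticRounds {
    /** Commit order of the iterations plus the number of aborted attempts. */
    static final class Result {
        final List<Integer> committed = new ArrayList<>();
        int aborts = 0;
    }

    static Result run(List<Integer> work, int cores,
                      Function<Integer, Set<Integer>> neighborhood) {
        if (cores < 1) {
            throw new IllegalArgumentException("cores must be at least 1");
        }
        Deque<Integer> worklist = new ArrayDeque<>(work);
        Result result = new Result();
        while (!worklist.isEmpty()) {
            // Speculatively start up to `cores` iterations in this round.
            List<Integer> round = new ArrayList<>();
            while (round.size() < cores && !worklist.isEmpty()) {
                round.add(worklist.pollFirst());
            }
            // Elements claimed by iterations that have committed in this round.
            Set<Integer> claimed = new HashSet<>();
            for (Integer item : round) {
                Set<Integer> touched = neighborhood.apply(item);
                if (Collections.disjoint(claimed, touched)) {
                    claimed.addAll(touched);       // no conflict: commit
                    result.committed.add(item);
                } else {
                    result.aborts++;               // conflict: abort and retry later
                    worklist.addLast(item);
                }
            }
        }
        return result;
    }
}
```

With work items 0 to 3, two cores, and a neighborhood of `{i, i + 1}` for item `i`, this simulation commits in the order 0, 2, 1, 3 and records two aborts. The first iteration of every round always commits, so the loop makes progress in each round.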

Scheduling Impact: DMR
[Figure: speedup (y-axis, 0.8 to 2.2) versus # of cores (x-axis, 1 to 4) for the stack and random schedules]
Evaluation platform: 4-core Xeon system, running Java 1.6 HotSpot JVM
Input mesh: 100K triangles, ~40K bad triangles

Scheduling in OpenMP
• OpenMP provides parallel DO-ALL loops for regular programs
• Major scheduling concerns are load-balancing and overhead
• OpenMP scheduling policies address these issues
  ‣ static, dynamic, guided

Amorphous Data Parallelism Issues
• Algorithmic – The efficiency of the algorithm or data structures
• Conflicts – The likelihood that two iterations executed in parallel will conflict
• Locality – The temporal or spatial locality exhibited in the data structures
• Dynamically created work
• Load-balancing and contention are still issues

Scheduling Basics
• Each iteration is executed by a single core
• Each core executes a set of iterations in a linear order
• Scheduling maps work from an “iteration space” to positions in an “execution schedule”
  ‣ Each iteration is mapped to a core, and a position in that core’s execution schedule

Scheduling Functions
• Clustering – Groups iterations into clusters; each cluster is executed on a single core
• Labeling – Maps clusters to cores; each core can have multiple clusters
• Ordering – Specifies a serial execution order for each core
• Functions can be defined “online”
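The three functions compose into a schedule as sketched below. This is an illustrative, offline version under assumptions of this sketch: iterations are the integers `0..iterations-1`, clusters and cores are integer ids, and the name `ScheduleSketch` is hypothetical. It does not cover functions defined online, where decisions are made while the program runs.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.IntUnaryOperator;

/** Hypothetical offline composition of clustering, labeling, and ordering. */
final class ScheduleSketch {
    private ScheduleSketch() {}

    /**
     * @param clustering maps an iteration id to a cluster id
     * @param labeling   maps a cluster id to a core id
     * @param ordering   serial execution order used on every core
     * @return for each core id, the iterations it executes, in execution order
     */
    static Map<Integer, List<Integer>> schedule(int iterations,
                                                IntUnaryOperator clustering,
                                                IntUnaryOperator labeling,
                                                Comparator<Integer> ordering) {
        Map<Integer, List<Integer>> perCore = new TreeMap<>();
        for (int i = 0; i < iterations; i++) {
            int cluster = clustering.applyAsInt(i);   // clustering: iteration -> cluster
            int core = labeling.applyAsInt(cluster);  // labeling: cluster -> core
            perCore.computeIfAbsent(core, k -> new ArrayList<>()).add(i);
        }
        for (List<Integer> order : perCore.values()) {
            order.sort(ordering);                     // ordering: position on that core
        }
        return perCore;
    }
}
```

For example, with 8 iterations, clusters of two consecutive iterations (`i -> i / 2`), clusters dealt to two cores round-robin (`c -> c % 2`), and natural ordering, core 0 executes 0, 1, 4, 5 and core 1 executes 2, 3, 6, 7.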

Example Instantiations
• OpenMP’s chunked self-scheduling
  ‣ Clustering: chunked
  ‣ Labeling: dynamic
  ‣ Ordering: cluster-major
• DMR’s “generator-computes”
  ‣ Clustering: chunked + generator-computes
  ‣ Labeling: dynamic
  ‣ Ordering: LIFO
The Galois system provides a number of built-in scheduling policies
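Chunked self-scheduling can be sketched with a shared counter from which cores claim fixed-size chunks. The sketch below is an assumption-laden illustration, not OpenMP's or Galois's implementation: the class `ChunkedSelfScheduler` and its `nextChunk` method are hypothetical names. Clustering is the division into chunks, labeling is dynamic because whichever core asks next receives the next chunk, and ordering is cluster-major when a core finishes its chunk in order before asking again.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical chunked self-scheduler over the iteration range [0, total). */
final class ChunkedSelfScheduler {
    private final AtomicInteger next = new AtomicInteger(0);
    private final int total;
    private final int chunkSize;

    ChunkedSelfScheduler(int total, int chunkSize) {
        if (total < 0 || chunkSize < 1) {
            throw new IllegalArgumentException("total must be >= 0 and chunkSize >= 1");
        }
        this.total = total;
        this.chunkSize = chunkSize;
    }

    /**
     * Claims the next chunk for the calling core.
     * Returns {start, end} with end exclusive, or null when no iterations remain.
     */
    int[] nextChunk() {
        while (true) {
            int start = next.get();
            if (start >= total) {
                return null;
            }
            // Compute the end in long arithmetic so start + chunkSize cannot overflow.
            int end = (int) Math.min((long) start + chunkSize, (long) total);
            // Only one core wins the compare-and-set for a given start; losers retry.
            if (next.compareAndSet(start, end)) {
                return new int[] { start, end };
            }
        }
    }
}
```

A worker thread would typically loop on `nextChunk()` until it returns null, executing iterations `start` to `end - 1` of each chunk it receives. With 10 iterations and a chunk size of 4, successive calls yield [0, 4), [4, 8), [8, 10), and then null.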

Evaluated Applications
• Delaunay mesh refinement
• Delaunay triangulation
• Augmenting-paths maxflow
• Preflow-push maxflow
• Agglomerative clustering

Sample Schedules for DMR
• random – default Galois schedule
• stack – LIFO schedule
• partitioned – data-centric schedule, based on partitioning of mesh
• generator-computes – random schedule, new work immediately processed by core that created it
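The difference between the random and stack schedules comes down to which item the worklist hands out next. The following sketch shows two interchangeable worklist policies; the `Worklist` interface and both classes are hypothetical and are not the Galois API. The partitioned and generator-computes schedules are not modeled here, and neither class is thread-safe.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical worklist abstraction; callers must check isEmpty() before get(). */
interface Worklist<T> {
    void add(T item);
    T get();
    boolean isEmpty();
}

/** stack schedule: the most recently added item is processed first (LIFO). */
final class StackWorklist<T> implements Worklist<T> {
    private final List<T> items = new ArrayList<>();

    public void add(T item) { items.add(item); }

    public T get() { return items.remove(items.size() - 1); }

    public boolean isEmpty() { return items.isEmpty(); }
}

/** random schedule: each get() removes a uniformly chosen pending item. */
final class RandomWorklist<T> implements Worklist<T> {
    private final List<T> items = new ArrayList<>();
    private final Random random;

    RandomWorklist(long seed) { this.random = new Random(seed); }

    public void add(T item) { items.add(item); }

    public T get() {
        int chosenIndex = random.nextInt(items.size());
        int lastIndex = items.size() - 1;
        T chosen = items.get(chosenIndex);
        // Move the last item into the chosen slot so removal stays O(1).
        items.set(chosenIndex, items.get(lastIndex));
        items.remove(lastIndex);
        return chosen;
    }

    public boolean isEmpty() { return items.isEmpty(); }
}
```

Because both classes implement the same interface, a worklist loop like the DMR one can switch policy by changing only the constructor call. With a LIFO policy, newly created bad triangles are processed right after the cavity that produced them.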

DMR Results
[Figure: speedup (y-axis, 1 to 3) versus # of cores (x-axis, 1 to 4) for the generator-computes, partitioned, stack, and random schedules]

Summary of Results
• Best combination of policies for each application

Application                Clustering              Labeling              Ordering
Delaunay Mesh Refinement   random/inherited        dynamic/random        —/LIFO
Delaunay Triangulation     data-centric/—          static/data-centric   cluster-major/random
Augmenting Paths Maxflow   data-centric/inherited  static/data-centric   cluster-major/LIFO
Preflow Push Maxflow       data-centric/inherited  static/data-centric   cluster-major/LIFO
Agglomerative Clustering   unit/custom             dynamic/custom        —/—

Conclusions
• Developed a general framework for scheduling programs with amorphous data parallelism
  ‣ Subsumes OpenMP scheduling policies
• Implemented framework in Galois system
  ‣ Provides several default scheduling policies
  ‣ Allows programmers to specify their own scheduling policies when needed
