Optimistic Parallelism Requires Abstractions
Milind Kulkarni, Keshav Pingali – The University of Texas at Austin
Bruce Walter, Ganesh Ramanarayanan, Kavita Bala and L. Paul Chew – Cornell University

Motivation
✦ Parallel programming is very important
  ✦ Multicore processors
✦ Parallel programming is hard!
✦ Success has been limited to domains which deal with structured data
  ✦ Array programs
  ✦ Database applications
✦ What about irregular applications which deal with unstructured data?
  ✦ Compile-time techniques have failed
PLDI 2007, June 11th, 2007

Galois System: Core Beliefs
✦ Irregular applications have worklist-style data parallelism
✦ Optimistic parallelization is crucial
✦ Parallelism should be hidden within natural syntactic constructs
✦ High-level application semantics are critical for parallelization

Outline
✦ Two challenge problems
✦ Galois programming model and implementation
✦ Evaluation
✦ Related Work
✦ Conclusions

Delaunay Mesh Refinement
✦ Iterative refinement procedure to produce guaranteed-quality meshes

Delaunay Pseudo-code
    Mesh mesh = /* read in mesh */;
    WorkList wl;
    wl.add(mesh.badTriangles());
    while (wl.size() != 0) {            // worklist idiom
        Element e = wl.get();
        if (e no longer in mesh) continue;
        Cavity c = new Cavity(e);
        c.expand();
        c.retriangulate();
        mesh.update(c);
        wl.add(c.badTriangles());
    }

Finding Parallelism
✦ Can expand multiple cavities in parallel
  ✦ Provided cavities do not overlap
✦ Determining this statically is impossible
✦ Solution: Optimistic parallel execution
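
The optimistic approach above can be sketched as follows. This is a minimal, sequential Python simulation under stated assumptions, not the Galois implementation: cavities are modelled as sets of triangle ids, and `run_round`, `owner` and the sample data are hypothetical names chosen for illustration. Each iteration speculatively claims the triangles of its cavity; an iteration that touches a triangle already claimed by another cavity is aborted and would be retried later.

```python
def run_round(cavities):
    """Speculatively process cavities in one round.

    Each cavity is a set of triangle ids (an assumed representation).
    Returns the indices of committed and aborted iterations.
    """
    owner = {}                    # triangle id -> index of the cavity that claimed it
    committed, aborted = [], []
    for i, cavity in enumerate(cavities):
        if any(t in owner for t in cavity):
            aborted.append(i)     # overlap detected: roll back, retry later
        else:
            for t in cavity:
                owner[t] = i      # claim every triangle in the cavity
            committed.append(i)
    return committed, aborted


cavities = [{1, 2, 3}, {3, 4}, {5, 6}]
committed, aborted = run_round(cavities)
print(committed, aborted)         # prints [0, 2] [1]
```

The second cavity shares triangle 3 with the first, so it is the only one that aborts; the two disjoint cavities commit.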

Agglomerative Clustering
✦ Create binary tree of points in a space in bottom-up fashion
✦ Always choose two closest points to cluster
[Figure: (a) Data points a to e, (b) Hierarchical clusters, (c) Dendrogram]

Agglomerative Clustering
✦ Two key data structures
  ✦ Priority Queue – Keeps pairs of points <p, n> where n is the nearest neighbor of p
    ✦ Ordered by distance
  ✦ KD-tree – Spatial structure to find nearest neighbors
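
How the two data structures might cooperate can be sketched in Python. Several things here are assumptions rather than content from the slides: points are 1-D numbers, a merged cluster is represented by its centroid, a brute-force `nearest` search stands in for the KD-tree, and the names `Cluster`, `nearest` and `agglomerate` are hypothetical. The priority queue holds `<p, n>` pairs ordered by distance, and stale pairs are skipped or refreshed when they are popped.

```python
import heapq
from itertools import count


class Cluster:
    """A point or a merged group of points on a line (1-D for brevity)."""

    def __init__(self, members):
        self.members = tuple(sorted(members))
        self.pos = sum(self.members) / len(self.members)   # centroid (assumption)


def nearest(p, active):
    """Brute-force nearest neighbour; stands in for the KD-tree query."""
    others = [q for q in active if q is not p]
    return min(others, key=lambda q: abs(q.pos - p.pos)) if others else None


def agglomerate(points):
    initial = [Cluster([x]) for x in points]
    active = set(initial)
    pq, tie = [], count()         # tie-breaker so the heap never compares clusters

    def push(p):
        n = nearest(p, active)
        if n is not None:
            heapq.heappush(pq, (abs(p.pos - n.pos), next(tie), p, n))

    for p in initial:
        push(p)

    merges = []
    while pq:
        _, _, p, n = heapq.heappop(pq)    # closest pair first
        if p not in active:               # p was already clustered
            continue
        if n not in active:               # stale neighbour: look it up again
            push(p)
            continue
        active -= {p, n}
        c = Cluster(p.members + n.members)
        active.add(c)
        merges.append(c.members)
        push(c)
    return merges


merges = agglomerate([0, 1, 5, 6, 20])
print(merges)
```

For this input the two tight pairs merge first, then the pairs merge with each other, and the outlier 20 joins last.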

Finding Parallelism
✦ Priority queue functions as a worklist
✦ Seems to be completely sequential
✦ If clusters are independent, can be done in parallel

Lessons Learned
✦ Worklist-style data parallelism
  ✦ May be dependences between iterations
✦ However, worklist abstractions are missing from the code
✦ Concurrent access to shared objects is a must
  ✦ worklist, priority queue, kd-tree

Galois Programming Model and Implementation

Programming Model
✦ Object-based shared memory model
✦ Client code must invoke methods to access object state
✦ Client code has sequential semantics
  ✦ But runtime system may execute code in parallel
[Figure: Client Code and Galois Objects]

Worklist Abstractions
✦ Iterators over collections
  ✦ foreach e in set S do B(e)
    ✦ Iterations can execute in any order
    ✦ As in Delaunay mesh refinement
  ✦ foreach e in poSet S do B(e)
    ✦ Iterations must respect ordering of S
    ✦ As in agglomerative clustering
✦ May be dependences between iterations
✦ Sets can change during execution
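
The sequential meaning of the two iterators can be sketched in Python. This is an illustration only: `foreach_unordered` and `foreach_ordered` are hypothetical names, the loop body receives an `add` callback so it can grow the set while the iterator runs, and plain integers (a total order) stand in for the elements of a poSet.

```python
import heapq


def foreach_unordered(items, body):
    """Apply body to every element of an unordered set; any order is allowed."""
    pending = list(items)
    while pending:
        e = pending.pop()
        body(e, pending.append)              # body may add new work


def foreach_ordered(items, body):
    """Apply body in increasing order, as for an ordered set."""
    pending = list(items)
    heapq.heapify(pending)
    while pending:
        e = heapq.heappop(pending)
        body(e, lambda x: heapq.heappush(pending, x))


# Unordered: split each number until only 1s remain.
ones = []
def split(n, add):
    if n == 1:
        ones.append(n)
    else:
        add(n // 2)
        add(n - n // 2)
foreach_unordered([5], split)

# Ordered: smaller elements are always processed first.
seen = []
def visit(n, add):
    seen.append(n)
    if n < 3:
        add(n + 10)
foreach_ordered([3, 1, 2], visit)

print(len(ones), seen)                       # prints 5 [1, 2, 3, 11, 12]
```

In both cases the set changes during execution, and the iterator keeps going until no work remains.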

Delaunay Example
    Mesh mesh = /* read in mesh */;
    WorkList wl;
    wl.add(mesh.badTriangles());
    foreach Element e in wl {
        if (e no longer in mesh) continue;
        Cavity c = new Cavity(e);
        c.expand();
        c.retriangulate();
        mesh.update(c);
        wl.add(c.badTriangles());
    }
✦ The loop header while (wl.size() != 0) { Element e = wl.get(); ... } becomes a foreach iterator; the rest of the code is unchanged
✦ Iterators expose worklist abstraction to runtime system

Execution Model
✦ Master thread begins execution
✦ When it encounters an iterator, it uses helper threads to aid in execution of iterations
  ✦ Iterations are assigned to threads according to a scheduling policy (for now, dynamic to ensure load balance)
✦ Parallel execution of iterator must respect sequential semantics of iterator
  ✦ Concurrent access control
  ✦ Serializability of iterations
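
Dynamic assignment of iterations to helper threads can be sketched with Python's standard `queue` and `threading` modules. This is only a scheduling illustration under simplifying assumptions: the iterations are independent, so the conflict detection and rollback that the runtime needs are left out, the master thread merely waits for the helpers, and `run_iterator` is a hypothetical name.

```python
import queue
import threading


def run_iterator(items, body, num_threads=4):
    """Run body(item) for every item, handing out work dynamically."""
    work = queue.Queue()
    for item in items:
        work.put(item)
    results = []
    lock = threading.Lock()       # protects the shared results list

    def helper():
        while True:
            try:
                item = work.get_nowait()   # take the next iteration when free
            except queue.Empty:
                return                     # no work left
            value = body(item)
            with lock:
                results.append(value)

    threads = [threading.Thread(target=helper) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


squares = run_iterator(range(10), lambda x: x * x)
print(sorted(squares))
```

Because each helper pulls a new iteration as soon as it finishes the previous one, the order of `squares` can vary between runs, which is why the example sorts before printing.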

Concurrent Access
✦ Concurrent invocations to a shared object must not interfere
  ✦ Our current implementation uses locks
  ✦ Can use other techniques such as transactional memory (TM)
[Figure: two concurrent invocations, S.add(x) and S.add(y), on a shared set S]

Serializability
[Figure, two panels]
(a) Set S: S.add(x), S.contains?(x), S.remove(x). Interleaving is illegal.
(b) Workset: ... = S.get(), ... = S.get(), S.add(), S.add(). Interleaving is legal (and necessary).

Semantic Commutativity
✦ Method calls which commute can be interleaved
  ✦ Otherwise, a commutativity violation occurs
✦ Property of abstract data type
  ✦ Implementation independent
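
One way to make the notion concrete is to run two set method calls in both orders and compare the return values and the final abstract state. The Python sketch below does this for a single starting state; it is a test of commutativity in that state, not a proof of the general condition. The names `invoke` and `commutes` are hypothetical, and treating `add` and `remove` as returning nothing is an assumption.

```python
def invoke(state, op):
    """Apply (name, x) to a frozenset; return (new_state, return_value)."""
    name, x = op
    if name == "add":
        return state | {x}, None
    if name == "remove":
        return state - {x}, None
    if name == "contains":
        return state, x in state
    raise ValueError(name)


def commutes(state, op1, op2):
    """True if both orders give the same return values and final state."""
    mid, r1_first = invoke(state, op1)
    end_a, r2_second = invoke(mid, op2)
    mid, r2_first = invoke(state, op2)
    end_b, r1_second = invoke(mid, op1)
    return end_a == end_b and r1_first == r1_second and r2_first == r2_second


empty = frozenset()
print(commutes(empty, ("add", 1), ("add", 2)))        # prints True
print(commutes(empty, ("add", 1), ("contains", 1)))   # prints False
print(commutes(empty, ("add", 1), ("remove", 1)))     # prints False
```

The check looks only at the abstract set, never at how the set is stored, which is what makes the property implementation independent.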

Galois Classes
    class SetInterface {
        void add(T x);
        [commutes] add(y) {y != x}
                   remove(y) {y != x}
                   contains(y) {y != x}
        [inverse] remove(x)
        bool contains(T x);
        [commutes] add(y) {y != x}
                   remove(y) {y != x}
        ...
    }
✦ Inverse methods
  ✦ Allow for rollback when commutativity violated
✦ Commutativity and inverse specified through interface annotation
✦ Galois Classes expose abstractions to the runtime system
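
Rollback through inverse methods can be sketched in Python as an undo log. This is an illustration, not the Galois class mechanism: `UndoableSet`, `commit` and `abort` are hypothetical names, and logging an inverse only when a call actually changed the set is a design choice made here so that undoing `add(x)` never removes an element that was present beforehand.

```python
class UndoableSet:
    """A set that records the inverse of each mutation for rollback."""

    def __init__(self, items=()):
        self._items = set(items)
        self._undo = []                   # inverse calls for the current iteration

    def add(self, x):
        if x not in self._items:
            self._items.add(x)
            self._undo.append((self._items.discard, x))   # inverse: remove(x)

    def remove(self, x):
        if x in self._items:
            self._items.remove(x)
            self._undo.append((self._items.add, x))       # inverse: add(x)

    def contains(self, x):
        return x in self._items

    def commit(self):
        self._undo.clear()                # changes become permanent

    def abort(self):
        while self._undo:                 # undo in reverse order
            inverse, x = self._undo.pop()
            inverse(x)


s = UndoableSet({1})
s.add(2)
s.remove(1)
s.abort()                                 # roll back both calls
print(s.contains(1), s.contains(2))       # prints True False
```

After `commit`, a later `abort` has nothing to undo, so committed changes survive.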

Runtime System
✦ Two main components:
  ✦ Global commit pool
    ✦ Manages iterations
    ✦ Similar to reorder buffer in out-of-order execution (OOE) processors
  ✦ Per-object conflict logs
    ✦ Detects commutativity violations
    ✦ Triggers aborts if commutativity violated
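
A per-object conflict log might look roughly like the Python sketch below. It is a deliberately conservative simplification and not the actual runtime: `ConflictLog`, `check_and_record` and `release` are hypothetical names, and the only rule applied is that invocations from different iterations commute when their arguments differ, so any two calls on the same element by different iterations are treated as a violation.

```python
class ConflictLog:
    """Records outstanding invocations on one shared object."""

    def __init__(self):
        self._entries = []                # (iteration id, method name, argument)

    def check_and_record(self, iteration, method, x):
        """Return False on a commutativity violation, else log the call."""
        for other, _, y in self._entries:
            if other != iteration and y == x:
                return False              # caller should abort this iteration
        self._entries.append((iteration, method, x))
        return True

    def release(self, iteration):
        """Drop an iteration's entries once it commits or aborts."""
        self._entries = [e for e in self._entries if e[0] != iteration]


log = ConflictLog()
ok_a = log.check_and_record(1, "add", "a")
ok_b = log.check_and_record(2, "add", "b")
clash = log.check_and_record(2, "contains", "a")   # iteration 1 still holds "a"
log.release(1)
retry = log.check_and_record(2, "contains", "a")
print(ok_a, ok_b, clash, retry)                    # prints True True False True
```

Once iteration 1 is released, the same call from iteration 2 no longer conflicts, which mirrors an aborted iteration being retried later.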

Evaluation
✦ Evaluation platform:
  ✦ Implementation in C++
  ✦ gcc compiler on Red Hat Linux
  ✦ 4-processor shared-memory system
    ✦ Itanium 2 @ 1.5 GHz

Evaluation – Delaunay
✦ Three different versions of benchmark
  ✦ reference – purely sequential code
  ✦ FGL – hand-written, optimistic parallel code using fine-grained locking
  ✦ meshgen – Galois version of code
✦ Input mesh generated using Triangle
  ✦ ~10K triangles
  ✦ ~4K bad triangles

Abort Ratios
✦ Optimism must be warranted
  ✦ Conflicts lead to rollbacks, which waste work
✦ FGL and meshgen have abort ratios <1% on 4 processors
✦ Closely tied to scheduling policy
  ✦ Choice of proper scheduling policy is crucial for good performance

Evaluation – Delaunay
[Figure: two plots for reference, FGL and meshgen on 1 to 4 processors: execution time (s) and speedup. Annotation: ~3x speedup]

Performance Breakdown
[Figure: cycles (billions) and instructions (billions) on 1 and 4 processors, broken down into Client, Object and Runtime. Bar labels: 18.8501, 17.4675, 16.8889, 13.8951]

Related Work
✦ Weihl, 1988 – Concurrency control using commutativity properties of ADTs
✦ Rinard & Diniz, 1996 – Static commutativity analysis for parallelization
✦ Wu & Padua, 1998 – Exploiting semantic properties of containers in compilation
✦ Ni et al., 2007 – Open nesting using abstract locks

Conclusions
✦ Optimistic parallelism necessary to parallelize irregular, worklist-based applications
✦ Need to exploit high-level semantics
  ✦ Iterators to expose parallelism
  ✦ Galois classes to expose semantics of objects

Thank You! Email: milind@cs.utexas.edu
