Optimistic Parallelism Requires Abstractions
Milind Kulkarni, Keshav Pingali – The University of Texas at Austin
Bruce Walter, Ganesh Ramanarayanan, Kavita Bala and L. Paul Chew – Cornell University
PLDI 2007 June 11th, 2007
✦ Parallel programming very important
✦ Multicore processors
✦ Parallel programming is hard!
✦ Limited success in domains which deal with
✦ Array programs
✦ Database applications
✦ What about irregular applications which deal
✦ Compile time techniques have failed
✦ Irregular applications have worklist-style data
✦ Optimistic parallelization is crucial
✦ Parallelism should be hidden within natural
✦ High level application semantics are critical
✦ Two challenge problems
✦ Galois programming model and
✦ Evaluation
✦ Related Work
✦ Conclusions
✦ Iterative refinement procedure to produce
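Mesh mesh = /* read in mesh */;
WorkList wl;
wl.add(mesh.badTriangles());        // seed the worklist with all bad triangles
while (wl.size() != 0) {
  Element e = wl.get();             // take one bad triangle
  if (e no longer in mesh) continue; // it may have been removed by an earlier cavity
  Cavity c = new Cavity(e);
  c.expand();                       // grow the cavity around e
  c.retriangulate();                // build the replacement triangles
  mesh.update(c);                   // splice them into the mesh
  wl.add(c.badTriangles());         // retriangulation may create new bad triangles
}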
Worklist idiom: the loop above repeatedly takes an element from the worklist, processes it, and may add new elements to the worklist.
✦ Can expand multiple cavities in parallel
✦ Provided cavities do not overlap
✦ Determining this statically is impossible
✦ Solution: Optimistic parallel execution
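The non-overlap condition can be made concrete with a small sketch. It assumes, purely for illustration, that a cavity is represented as the set of integer ids of the triangles it covers; `Cavity` and `cavitiesOverlap` are hypothetical names and are not taken from the Galois system. The check itself is cheap, but its inputs exist only once the cavities have been expanded at run time, which is why the decision cannot be made statically.

```cpp
#include <cassert>
#include <set>

// Hypothetical representation: a cavity is the set of ids of the
// triangles it covers.
using Cavity = std::set<int>;

// Returns true when the two cavities share at least one triangle,
// i.e. when refining them in parallel would conflict.
bool cavitiesOverlap(const Cavity& a, const Cavity& b) {
  auto ia = a.begin();
  auto ib = b.begin();
  // Both sets iterate in sorted order, so one merge-style pass suffices.
  while (ia != a.end() && ib != b.end()) {
    if (*ia == *ib) return true;
    if (*ia < *ib) {
      ++ia;
    } else {
      ++ib;
    }
  }
  return false;
}
```

An optimistic scheme would run both refinements speculatively and fall back to a rollback only when a check like this one reports an overlap.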
✦ Create binary tree of points in a space in
✦ Always choose two closest points to cluster
[Figure: agglomerative clustering of points a to e. (a) Data points (b) Hierarchical clusters (c) Dendrogram]
✦ Two key data structures
✦ Priority Queue – Keeps pairs of points
✦ Ordered by distance
✦ KD-tree – Spatial structure to find nearest
✦ Priority queue functions as a worklist
✦ Seems to be completely sequential
✦ If clusters are independent, can be done in
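To show how a priority queue can act as the worklist, here is a minimal sequential sketch of agglomerative clustering. It makes several simplifying assumptions that are not in the slides: points are one-dimensional, a merged cluster is represented by the midpoint of its two children, and candidate pairs are found by brute force rather than with a kd-tree. The names `Merge` and `agglomerate` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <queue>
#include <tuple>
#include <vector>

// One merge step: clusters `left` and `right` were joined at `distance`.
struct Merge {
  int left;
  int right;
  double distance;
};

// Clusters 1-D points bottom-up, always merging the two closest clusters.
// Input points get ids 0..n-1; each new cluster gets the next free id.
std::vector<Merge> agglomerate(const std::vector<double>& points) {
  std::vector<double> pos(points);            // representative position per id
  std::vector<bool> alive(points.size(), true);
  using Entry = std::tuple<double, int, int>; // (distance, id, id)
  std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> pq;

  // Seed the worklist with every pair (brute force stands in for the kd-tree).
  for (std::size_t i = 0; i < pos.size(); ++i)
    for (std::size_t j = i + 1; j < pos.size(); ++j)
      pq.push(Entry(std::fabs(pos[i] - pos[j]), static_cast<int>(i),
                    static_cast<int>(j)));

  std::vector<Merge> merges;
  while (!pq.empty()) {
    Entry top = pq.top();
    pq.pop();
    int a = std::get<1>(top);
    int b = std::get<2>(top);
    if (!alive[a] || !alive[b]) continue;     // stale pair: already merged

    alive[a] = false;
    alive[b] = false;
    int c = static_cast<int>(pos.size());
    pos.push_back((pos[a] + pos[b]) / 2.0);   // assumed representative: midpoint
    alive.push_back(true);
    merges.push_back(Merge{a, b, std::get<0>(top)});

    // The new cluster creates new work: pairs with every live cluster.
    for (int k = 0; k < c; ++k)
      if (alive[k]) pq.push(Entry(std::fabs(pos[k] - pos[c]), k, c));
  }
  return merges;
}
```

In this sketch the first two merges for the points 0, 1, 5, 6, 20 touch disjoint clusters, which is the kind of independence the slide refers to, even though the queue hands them out one at a time.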
✦ Worklist-style data parallelism
✦ May be dependences between iterations
✦ However, worklist abstractions are missing
✦ Concurrent access to shared objects a must
✦ worklist, priority queue, kd-tree
✦ Object-based shared
✦ Client code must
✦ Client code has
✦ But runtime system
[Figure: client code and Galois objects]
✦ Iterators over collections
✦ foreach e in set S do B(e)
  ✦ Iterations can execute in any order
  ✦ As in Delaunay mesh refinement
✦ foreach e in poSet S do B(e)
  ✦ Iterations must respect ordering of S
  ✦ As in agglomerative clustering
✦ May be dependences between iterations
✦ Sets can change during execution
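The unordered form, including the fact that the set can grow while it is being iterated, can be illustrated with a sequential sketch. This is only one legal execution order of such an iterator, not the Galois runtime; `WorkSet` and `foreachUnordered` are hypothetical names introduced for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical unordered work set: get() may return any element.
template <typename T>
class WorkSet {
 public:
  void add(const T& x) { items_.push_back(x); }
  bool empty() const { return items_.empty(); }
  // Removes and returns some element; here simply the most recent one.
  T get() {
    assert(!items_.empty());
    T x = items_.back();
    items_.pop_back();
    return x;
  }

 private:
  std::vector<T> items_;
};

// Sequential reading of "foreach e in set S do B(e)": runs the body on
// every element, including elements the body itself adds to the set.
// Returns the number of iterations executed.
template <typename T, typename Body>
std::size_t foreachUnordered(WorkSet<T>& s, Body body) {
  std::size_t iterations = 0;
  while (!s.empty()) {
    T e = s.get();
    body(e, s);  // the body may call s.add(...) to create new work
    ++iterations;
  }
  return iterations;
}
```

A body that halves each element greater than 1 and re-adds the result, started on the single element 8, runs four iterations (8, 4, 2, 1). An ordered `poSet` variant would additionally have to hand out elements in the order defined by the set.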
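Mesh mesh = /* read in mesh */;
WorkList wl;
wl.add(mesh.badTriangles());
foreach Element e in wl {            // the while loop becomes an unordered iterator
  if (e no longer in mesh) continue;
  Cavity c = new Cavity(e);
  c.expand();
  c.retriangulate();
  mesh.update(c);
  wl.add(c.badTriangles());          // the worklist can grow during iteration
}
Only the loop header changes; the rest of the code is unchanged.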
✦ Master thread begins execution
✦ When it encounters an iterator, it uses helper
✦ Iterations assigned to thread according to
✦ Parallel execution of iterator must respect
✦ Concurrent access control
✦ Serializability of iterations
✦ Concurrent invocations
✦ Our current
✦ Can use other
[Figure: concurrent invocations S.add(y) and S.add(x) on a shared object S]
[Figure: (a) Interleaving is illegal: S.contains?(x), S.remove(x) and S.add(x) on a set S. (b) Interleaving is legal (and necessary): S.add() and ... = S.get() calls on a workset S.]
✦ Method calls which commute can be
✦ Else, commutativity violation
✦ Property of abstract data type
✦ Implementation independent
✦ Inverse methods
✦ Allow for rollback
✦ Commutativity and
class SetInterface {
  void add(T x);
    [commutes]
      add(y)      {y != x}
      remove(y)   {y != x}
      contains(y) {y != x}
    [inverse] remove(x)
  bool contains(T x);
    [commutes]
      add(y)      {y != x}
      remove(y)   {y != x}
  ...
}
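The role of the [inverse] annotation can be sketched with an undo log: every mutating call that takes effect records its inverse, and a rollback replays the recorded inverses in reverse order. This is a minimal single-threaded illustration, not the Galois implementation; the class `UndoableSet`, its `commit` and `rollback` methods, and the restriction to `int` elements are assumptions made for the example.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical set that can undo the calls made since the last commit.
class UndoableSet {
 public:
  void add(int x) {
    // Log an inverse only if the call changed the set.
    if (items_.insert(x).second) undoLog_.push_back(Undo{Undo::Remove, x});
  }

  void remove(int x) {
    if (items_.erase(x) > 0) undoLog_.push_back(Undo{Undo::Add, x});
  }

  bool contains(int x) const { return items_.count(x) > 0; }

  // Make the effects since the last commit permanent.
  void commit() { undoLog_.clear(); }

  // Apply the logged inverse methods, newest first.
  void rollback() {
    for (auto it = undoLog_.rbegin(); it != undoLog_.rend(); ++it) {
      if (it->kind == Undo::Remove) {
        items_.erase(it->arg);   // inverse of add(x) is remove(x)
      } else {
        items_.insert(it->arg);  // inverse of remove(x) is add(x)
      }
    }
    undoLog_.clear();
  }

 private:
  struct Undo {
    enum Kind { Add, Remove } kind;  // which inverse method to call
    int arg;
  };
  std::set<int> items_;
  std::vector<Undo> undoLog_;
};
```

Because the inverses are expressed in terms of the abstract methods, the same log would work for any concrete representation of the set.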
✦ Two main components:
✦ Global commit pool
  ✦ Manages iterations
  ✦ Similar to the reorder buffer in out-of-order execution (OOE)
✦ Per object conflict logs
  ✦ Detects commutativity violations
  ✦ Triggers aborts if commutativity violated
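A per-object conflict log can be sketched as a list of outstanding method calls, each tagged with the iteration that made it. A new call is admitted only if it commutes with every outstanding call from other iterations. The sketch below is a single-threaded illustration under stated assumptions: the commutativity rule is the conservative one that calls on different elements commute, plus the assumption that two `contains` calls on the same element also commute; `Call`, `commutes` and `ConflictLog` are hypothetical names, and which iteration gets aborted is left to the caller.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

enum class Method { Add, Remove, Contains };

// One method invocation on the set, tagged with its iteration id.
struct Call {
  int iteration;
  Method method;
  int arg;
};

// Assumed rule: calls on different elements always commute; on the same
// element only two read-only contains calls commute.
bool commutes(const Call& a, const Call& b) {
  if (a.arg != b.arg) return true;
  return a.method == Method::Contains && b.method == Method::Contains;
}

class ConflictLog {
 public:
  // Records the call and returns true if it commutes with every
  // outstanding call of other iterations; returns false on a
  // commutativity violation, in which case the caller should abort.
  bool tryInvoke(const Call& c) {
    for (const Call& other : outstanding_) {
      if (other.iteration != c.iteration && !commutes(c, other)) return false;
    }
    outstanding_.push_back(c);
    return true;
  }

  // Forget the calls of an iteration once it has committed or aborted.
  void release(int iteration) {
    outstanding_.erase(
        std::remove_if(outstanding_.begin(), outstanding_.end(),
                       [iteration](const Call& c) {
                         return c.iteration == iteration;
                       }),
        outstanding_.end());
  }

 private:
  std::vector<Call> outstanding_;
};
```

In a complete system a `false` result would trigger a rollback of the aborted iteration, and `release` would be called by whatever component plays the role of the commit pool.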
✦ Evaluation platform:
✦ Implementation in C++
✦ gcc compiler on Red Hat Linux
✦ 4 processor, shared memory system
✦ Itanium 2 @ 1.5 GHz
✦ Three different versions of benchmark
✦ reference – purely sequential code
✦ FGL – hand-written, optimistic parallel code
✦ meshgen – Galois version of code
✦ Input mesh generated using Triangle
✦ ~10K triangles
✦ ~4K bad triangles
✦ Optimism must be warranted
✦ Conflicts lead to rollbacks, which waste
✦ FGL and meshgen have abort ratios <1% on 4
✦ Closely tied to scheduling policy
✦ Choice of proper scheduling policy is
[Figure: two plots against # of processors (1 to 4) for reference, FGL and meshgen: Speedup and Execution Time (s)]
~3x speedup
[Figure: two bar charts broken down into Client, Object and Runtime components, for 1 proc and 4 proc. Cycle (billions): 13.8951 and 18.8501. Instructions (billions): 16.8889 and 17.4675.]
✦ Weihl, 1988 – Concurrency control using
✦ Rinard & Diniz, 1996 – Static commutativity
✦ Wu & Padua, 1998 – Exploiting semantic
✦ Ni et al, 2007 – Open nesting using abstract
✦ Optimistic parallelism necessary to parallelize
✦ Need to exploit high-level semantics
✦ Iterators to expose parallelism
✦ Galois classes to expose semantics of
Email: milind@cs.utexas.edu