Asynchronous Parallel DLA in Concurrent Collections
Aparna Chandramowlishwaran, Richard Vuduc (Georgia Tech); Kathleen Knobe (Intel)
May 14, 2009 Workshop on Scheduling for Large-Scale Systems @ UTK
Motivation and goals

Motivating recent work for multicore systems:
Tile algorithms for DLA, e.g., Buttari et al. (2007); Chan et al. (2007)
General parallel programming models suited to this algorithmic style, e.g., Concurrent Collections (CnC) by Knobe & Offner (2004)

Goals:
Study: apply and evaluate CnC using PDLA examples
Talk: a CnC tutorial crash course; a platform for your work?
To download CnC, see: whatif.intel.com
Overview of the Concurrent Collections (CnC) language
Asynchronous parallel Cholesky & symmetric eigensolver in CnC
Experimental results (preliminary)
Separates computation semantics from expression of parallelism
Program = components + scheduling constraints
Components: computation, control, data
Constraints: relations among components
No overwriting of data, no arbitrary serialization, and no side effects
Combines tuple-space, streaming, and dataflow models
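For reference, the bracket conventions of CnC's textual notation, as used in the graph specification later in this talk (a summary of the notation, not new syntax):

(step: tag)   step collection: computation, one unit of execution per tag
<tag: i,j>    tag collection: control, prescribes step instances
[item: i]     item collection: data, dynamic single assignment
::            prescription relation (tag prescribes step)
→             producer/consumer relation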
The running example is the outer product, Z_{i,j} = x_i * y_j. It is an example only; coarser grain may be more realistic in practice.
Collections: static representations of dynamic instances.

Step: unit of execution. A step collection is the set of all dynamic instances, e.g., the set of all (dynamic) multiplications.
Tag: control. <a, b, …> denotes a tuple of tag components. A tag says whether, not when, a step executes: tags prescribe steps.
Item: data.

In the graph, → shows producer/consumer relations, and the "environment" may produce and consume collections at the graph's boundary.

[Diagram: outer-product graph with tag collections <i>, <j>, <i,j>, item collections [x: i], [y: j], [Z: i,j], and step (mult: i,j).]
Written in terms of values, without overwriting ⇒ race-free (dynamic single assignment).
No arbitrary serialization: only the stated constraints order execution (avoids analysis).
Steps are side-effect free (functional).
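A minimal fragment illustrating dynamic single assignment, using the C++ Put API shown later in this talk (the tag values here are arbitrary):

// Each (collection, tag) pair names exactly one value; re-Put is an error.
G.Z.Put (Tag_t(2, 5), 10.0);      // defines Z:<2,5>
// G.Z.Put (Tag_t(2, 5), 11.0);   // illegal: would overwrite Z:<2,5>
G.Z.Put (Tag_t(2, 6), 10.0);      // fine: a different tag, a different value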
Another example: match ← find (value x in tree T).

[Diagram: tree-search graph with tag collections <node> and <root> and result <match>.]
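The slides give only the graph; here is a hedged sketch of a find step in the style of the C++ step code shown later. The collection names (tree, nodeTags, match), the Node_t record, and putting tags from within a step are all assumptions, not details from the talk:

// Hypothetical node record, stored as items [tree: n]:
struct Node_t { double value; int left, right; };   // child ids, -1 if absent

Return_t find (Graph_t& G, const Tag_t& t)
{
    int n = t[0];                              // tag <node> = node id
    double x = G.x.Get (Tag_t(0));             // the search value, item [x: 0]
    Node_t node = G.tree.Get (Tag_t(n));       // the node itself, item [tree: n]
    if (node.value == x)
        G.match.Put (Tag_t(0), n);             // produce the result item [match]
    else {
        if (node.left  >= 0) G.nodeTags.Put (Tag_t(node.left));   // prescribe find on children,
        if (node.right >= 0) G.nodeTags.Put (Tag_t(node.right));  // extending control dynamically
    }
    return CNC_Success;
}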
Recall: Outer product example
Tag <i=2, j=5> available ⇒ step prescribed.
Items x:<2>, y:<5> available ⇒ step inputs-available.
Prescribed + inputs-available ⇒ step enabled.
Enabled step executes ⇒ Z:<2,5> available.
[1] Write the specification (graph).
[2] Implement steps in a "base" language (C/C++).
[3] Build using the CnC translator + compiler.
[4] The run-time system maintains collections and schedules step execution.
Recall: Outer product example (the complete graph specification):

// Input:
env → <*: i,j>, [x: i], [y: j];
// Prescription relations:
<*: i,j> :: (*: i,j);
// Producer/consumer relations:
[x: i], [y: j] → (*: i, j);
(*: i, j) → [Z: i, j];
// Output:
[Z: i, j] → env;
The step implementation in the base language:

Return_t mult (Graph_t& G, const Tag_t& t)
{
    int i = t[0], j = t[1];               // unpack the tag components <i,j>
    double x_i = G.x.Get (Tag_t(i));      // consume items [x: i] and [y: j]
    double y_j = G.y.Get (Tag_t(j));
    G.Z.Put (Tag_t(i, j), x_i * y_j);     // produce item [Z: i,j]
    return CNC_Success;
}

Intel's implementation uses C++; Rice University's uses Java (Habanero).
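The environment side is not shown on the slides; here is a hedged sketch of a driver, assuming a tag collection T and a run call (both names are hypothetical) alongside the Get/Put API above:

void outer_product (Graph_t& G, const double* x, const double* y, int n)
{
    for (int i = 0; i < n; ++i) G.x.Put (Tag_t(i), x[i]);   // env → [x: i]
    for (int j = 0; j < n; ++j) G.y.Put (Tag_t(j), y[j]);   // env → [y: j]
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            G.T.Put (Tag_t(i, j));        // env → <i,j>: prescribe every mult step
    G.run ();                             // assumed: execute until quiescent
    double z00 = G.Z.Get (Tag_t(0, 0));   // [Z: i,j] → env
    (void) z00;
}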
Built on top of Intel Threading Building Blocks (TBB)
Implements a Cilk-style work-stealing scheduler. Work queues currently use LIFO order; FIFO and other strategies are in development (a minimal mock of the stealing discipline appears below).
Other run-times possible
DEC/HP TStreams runs on top of MPI; Rice's Habanero uses Java threads. There are Intel-specific issues with queuing (more later).
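A minimal, serial mock of the Cilk-style stealing discipline mentioned above (illustrative only; TBB's actual scheduler is more sophisticated):

#include <cstdlib>
#include <deque>
#include <functional>
#include <vector>

using Task  = std::function<void()>;
using WorkQ = std::deque<Task>;

// One scheduling step for a single worker: pop the LIFO end of its own
// queue (depth-first, cache-friendly); when empty, steal from the FIFO
// end of a random victim (takes the oldest, typically largest, work).
void worker_step (WorkQ& mine, std::vector<WorkQ*>& all)
{
    if (!mine.empty ()) {
        Task t = mine.back (); mine.pop_back ();              // own work: LIFO
        t ();
    } else {
        WorkQ* victim = all[std::rand () % all.size ()];
        if (victim != &mine && !victim->empty ()) {
            Task t = victim->front (); victim->pop_front ();  // stolen work: FIFO
            t ();
        }
    }
}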
Tiled Cholesky, iteration k (over diagonal tiles):

SeqCholesky (L_{k,k} ← A_{k,k})
Trisolve (L_{k+1:p,k} ← A_{k+1:p,k}, L_{k,k})
Update (A_{k+1:p,k+1:p} ← L_{k+1:p,k}, A_{k+1:p,k+1:p})
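For reference, a hedged sequential sketch of this tile loop nest (the Tile type and kernel signatures are assumptions; the kernels would wrap, e.g., LAPACK dpotrf and BLAS dtrsm/dsyrk/dgemm):

#include <vector>

struct Tile { /* b x b block of doubles; storage details omitted */ };

void seq_cholesky (Tile& Akk);                              // L_{k,k} ← chol(A_{k,k})
void trisolve (Tile& Aik, const Tile& Lkk);                 // L_{i,k} ← A_{i,k} L_{k,k}^{-T}
void update (Tile& Aij, const Tile& Lik, const Tile& Ljk);  // A_{i,j} ← A_{i,j} - L_{i,k} L_{j,k}^T

void tiled_cholesky (std::vector<std::vector<Tile>>& A, int p)
{
    for (int k = 0; k < p; ++k) {
        seq_cholesky (A[k][k]);                  // factor the diagonal tile
        for (int i = k + 1; i < p; ++i)
            trisolve (A[i][k], A[k][k]);         // solve the tiles below it
        for (int i = k + 1; i < p; ++i)
            for (int j = k + 1; j <= i; ++j)     // lower triangle of trailing submatrix
                update (A[i][j], A[i][k], A[j][k]);
    }
}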
Mapping to CnC (items omitted from the graph):

<k>: the iteration index is a natural tag for the sequential Cholesky step.
<i,k>: given k, multiple Trisolve steps could go, hence a 2-D tag.
<i,j,k>: given k, a 2-D iteration space of Update steps could go, hence a 3-D tag.

The sequential Cholesky step enables the Trisolve steps; similarly, a Trisolve step enables Update steps. Other arrangements are possible, e.g., pre-generate all tags; a sketch of that arrangement follows below.
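A hedged sketch of the "pre-generate all tags" arrangement (the collection names cholTag/triTag/updTag and tag-collection Put are assumptions in the style of the earlier code). Every step instance is prescribed up front; data availability alone then drives the asynchronous execution order:

void prescribe_all (Graph_t& G, int p)
{
    for (int k = 0; k < p; ++k) {
        G.cholTag.Put (Tag_t(k));               // <k>: SeqCholesky steps
        for (int i = k + 1; i < p; ++i) {
            G.triTag.Put (Tag_t(i, k));         // <i,k>: Trisolve steps
            for (int j = k + 1; j <= i; ++j)
                G.updTag.Put (Tag_t(i, j, k));  // <i,j,k>: Update steps
        }
    }
}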
“Straightforward” translation of LAPACK’s _sygvx for Az = λBz
Pieces: Cholesky / reduction to standard form; tridiagonal reduction.
Only partly "asynchronous," but a useful proof of concept.
Performance is limited by the tridiagonal reduction step (BLAS-2).
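For context, the standard LAPACK call chain underneath _sygvx (this decomposition is my summary of LAPACK's structure, not a slide from the talk):

// A z = λ B z, itype = 1:
//   dpotrf (B)       factor B = L L^T
//   dsygst (A, L)    reduce to standard form C = L^{-1} A L^{-T}
//   dsyevx (C)       tridiagonalize (dsytrd, the BLAS-2 bottleneck noted above),
//                    then compute selected eigenpairs of C
//   dtrsm  (L, y)    back-transform eigenvectors: z = L^{-T} y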
[Figure: Cholesky performance on Intel 2-socket x 4-core Harpertown @ 2 GHz + Intel MKL 10.1. X-axis: matrix size, 1000-10000; y-axes: performance (GFlop/s) and percentage of theoretical peak, with DGEMM peak and theoretical peak marked. Series: Baseline, ScaLAPACK+MPICH2/nemesis, OpenMP+MKL(seq), Cilk++ rec+MKL(seq), MKL(multithreaded BLAS), CnC+MKL(seq).]
[Figure: CnC-based Cholesky timeline (n=1000) on Intel 2-socket x 4-core Harpertown @ 2 GHz, with Intel MKL 10.1 for the sequential components. Per-thread (1-8) normalized execution time, broken down into unblocked Cholesky, triangular solve, symmetric rank-k update, idle, and requeue; the critical path and a lower bound on execution time are marked.]
[Figure: Cholesky performance on AMD 4-socket x 4-core Barcelona @ 2 GHz. X-axis: matrix size, 1000-10000; y-axes: performance (GFlop/s) and percentage of theoretical peak, with DGEMM peak and theoretical peak marked. Series: Baseline, ScaLAPACK+MPICH2/nemesis, OpenMP+MKL(seq), Cilk++ rec+MKL(seq), MKL(multithreaded BLAS), CnC+MKL(seq).]
[Figure: Eigensolver (dsygvx) performance, GFlop/s vs. matrix size (1000-10000), on Intel Harpertown (2x4 = 8 cores) and AMD Barcelona (4x4 = 16 cores). Series: Baseline, MKL(multithreaded BLAS), CnC+MKL(seq).]
CnC’s key ideas
Decompose computation into steps + (data) items + (control) tags, with constraint relations among these components (dataflow-like).
Goal: separate computation semantics (orderings) from parallelism.
Ongoing work:
"Finish" the proof-of-concept example by adding, e.g., blocked data layouts
New language primitives to simplify tag management and improve modularity and performance
Extending the run-time scheduling infrastructure
Other applications and architectures
Current limitations:
Tag types: integers only
Cannot handle continuous (streaming) input
Lacks more natural support for in-place algorithms
Tools needed, e.g., for debugging