Asynchronous Parallel DLA in Concurrent Collections

Aparna Chandramowlishwaran, Richard Vuduc (Georgia Tech); Kathleen Knobe (Intel). May 14, 2009, Workshop on Scheduling for Large-Scale Systems @ UTK.


  1. Asynchronous Parallel DLA in Concurrent Collections. Aparna Chandramowlishwaran and Richard Vuduc (Georgia Tech); Kathleen Knobe (Intel). May 14, 2009, Workshop on Scheduling for Large-Scale Systems @ UTK.

  2. Motivation and goals. Motivating recent work for multicore systems: tile algorithms for DLA, e.g., Buttari et al. (2007) and Chan et al. (2007); general parallel programming models suited to this algorithmic style, e.g., Concurrent Collections (CnC) by Knobe & Offner (2004). Goals: (study) apply and evaluate CnC using PDLA examples; (talk) a CnC tutorial crash course, and a possible platform for your work? To download CnC, see: whatif.intel.com

  3. Outline: overview of the Concurrent Collections (CnC) language; asynchronous parallel Cholesky and symmetric eigensolver in CnC; experimental results (preliminary).

  4. Concurrent Collections (CnC) programming model. Separates computation semantics from the expression of parallelism. Program = components + scheduling constraints. Components: computation, control, data. Constraints: relations among components. No overwriting of data, no arbitrary serialization, and no side effects. Combines tuple-space, streaming, and dataflow models.

  5. CnC example: Outer product. Z ← x · y^T

  6. CnC example: Outer product. Z ← x · y^T, i.e., z_{i,j} ← x_i · y_j. Example only; a coarser grain may be more realistic in practice.

  7. CnC example: Outer product. z_{i,j} ← x_i · y_j. Collections: static representation of dynamic instances.

  8. CnC example: Outer product. z_{i,j} ← x_i · y_j. Collections: static representation of dynamic instances. Step (*): unit of execution; here, the set of all (dynamic) multiplications.

  9. CnC example: Outer product. z_{i,j} ← x_i · y_j. Collections: static representation of dynamic instances. Step (*): unit of execution. Tag <i,j>: control; <a, b, …> = tuple of tag components.

  10. CnC example: Outer product. z_{i,j} ← x_i · y_j. Collections: static representation of dynamic instances. Step (*): unit of execution. Tag <i,j>: control; says whether, not when, a step executes.

  11. CnC example: Outer product. z_{i,j} ← x_i · y_j. Collections: static representation of dynamic instances. Step (*): unit of execution. Tag <i,j>: control; tags prescribe steps.

  12. CnC example: Outer product. z_{i,j} ← x_i · y_j. Collections: static representation of dynamic instances. Step (*): unit of execution. Tag <i,j>: control. Items x:<i>, y:<j>, Z:<i,j>: data.

  13. CnC example: Outer product. z_{i,j} ← x_i · y_j. Collections: static representation of dynamic instances. Step (*): unit of execution. Tag <i,j>: control. Items x:<i>, y:<j>, Z:<i,j>: data. An arrow (→) shows a producer/consumer relation.

  14. CnC example: Outer product. z_{i,j} ← x_i · y_j. Collections: static representation of dynamic instances. Step (*): unit of execution. Tag <i,j>: control. Items x:<i>, y:<j>, Z:<i,j>: data. The "environment" may also produce and consume.

  15. Essential properties of a CnC program (example: z_{i,j} ← x_i · y_j). Written in terms of values, without overwriting ⇒ race-free (dynamic single assignment). No arbitrary serialization, only explicit ordering constraints (avoids analysis). Steps are side-effect free (functional).
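The "without overwriting" property can be illustrated with a put-once container. The following is a minimal sketch only: `SingleAssignmentItems` and `overwrite_is_rejected` are hypothetical names invented for this illustration, not part of any CnC distribution, and the choice to throw on a second `Put` is an assumption about how a violation might be reported.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <utility>

// Hypothetical put-once item collection (illustration only, not the CnC API).
template <typename Tag, typename Value>
class SingleAssignmentItems {
public:
    // Bind a value to a tag. A second Put on the same tag is rejected,
    // so a reader can never observe two different values for one tag.
    void Put(const Tag& tag, const Value& value) {
        bool inserted = items_.emplace(tag, value).second;
        if (!inserted) throw std::logic_error("item already assigned");
    }

    // Read the value bound to a tag; throws std::out_of_range if absent.
    const Value& Get(const Tag& tag) const { return items_.at(tag); }

    bool Has(const Tag& tag) const { return items_.count(tag) != 0; }

private:
    std::map<Tag, Value> items_;
};

// Returns true if a second Put to the same tag is rejected and the
// first value is still the one that readers see.
bool overwrite_is_rejected() {
    SingleAssignmentItems<std::pair<int, int>, double> Z;
    Z.Put({2, 5}, 6.0);
    assert(Z.Has({2, 5}));
    try {
        Z.Put({2, 5}, 7.0);  // attempted overwrite
    } catch (const std::logic_error&) {
        return Z.Get({2, 5}) == 6.0;
    }
    return false;
}
```

Because each tag is bound at most once, the order in which consumers read an item cannot change the value they see, which is the sense in which single assignment makes the program race-free.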

  16. CnC example: Tree search. match ← find(value x in tree T). Collections: static representation of dynamic instances. Step: unit of execution. Tag: control. Item: data.

  17. CnC example: Tree search. match ← find(value x in tree T). Controller/controlee relations. Graph elements: tags <root>, <node>, and <match>; items T and x; comparison step (=). Collections: static representation of dynamic instances. Step: unit of execution. Tag: control. Item: data.
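A controller/controlee relation means a step can itself emit the tags that prescribe further steps. The slide's diagram does not survive in this transcript, so the wiring below is an assumption: the comparison step for tag <node> reads T and x, then emits either a <match> tag or <node> tags for the children. The node layout, the names `compare_step`, `find_matches`, and `example_tree`, and the sequential worklist standing in for the runtime are all hypothetical.

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <vector>

// Hypothetical node layout: children are node ids, -1 means "no child".
struct Node {
    int value;
    int left;
    int right;
};

// Item collection T: node id -> node contents (written once by the environment).
using Tree = std::map<int, Node>;

// The comparison step for tag <node>. It reads item T[node] and the value x,
// then acts as a controller: it either emits a <match> tag or emits new
// <node> tags that prescribe further comparison steps for the children.
void compare_step(int node, const Tree& T, int x,
                  std::deque<int>& node_tags, std::vector<int>& match_tags) {
    const Node& n = T.at(node);
    if (n.value == x) {
        match_tags.push_back(node);
        return;
    }
    if (n.left != -1) node_tags.push_back(n.left);
    if (n.right != -1) node_tags.push_back(n.right);
}

// Sequential stand-in for the runtime: run one step per available <node> tag.
// Returns the ids of all nodes emitted as <match> tags.
std::vector<int> find_matches(const Tree& T, int root, int x) {
    assert(T.count(root) != 0);       // the root item must have been put
    std::deque<int> node_tags{root};  // the environment puts tag <root>
    std::vector<int> match_tags;
    while (!node_tags.empty()) {
        int node = node_tags.front();
        node_tags.pop_front();
        compare_step(node, T, x, node_tags, match_tags);
    }
    return match_tags;
}

// Small example tree: node 0 holds 8, node 1 holds 3, node 2 holds 10,
// node 3 holds 6 and is the right child of node 1.
Tree example_tree() {
    return Tree{{0, {8, 1, 2}}, {1, {3, -1, 3}}, {2, {10, -1, -1}}, {3, {6, -1, -1}}};
}
```

Calling `find_matches(example_tree(), 0, 6)` visits nodes 0, 1, 2, and 3 and reports node 3. Unlike the outer product, the set of step instances here is not known up front; it is discovered as steps emit tags.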

  18. Execution model. Recall the outer product example: z_{i,j} ← x_i · y_j, where tags <i,j> prescribe step (*), which consumes items x:<i> and y:<j> and produces item Z:<i,j>.

  19. Execution model. z_{i,j} ← x_i · y_j. Tag <i=2, j=5> available.

  20. Execution model. z_{i,j} ← x_i · y_j. Tag <i=2, j=5> available ⇒ step (*):<2,5> prescribed.

  21. Execution model. z_{i,j} ← x_i · y_j. Tag <2,5> available ⇒ step prescribed. Items x:<2>, y:<5> available ⇒ step inputs-available.

  22. Execution model. z_{i,j} ← x_i · y_j. Tag <2,5> available ⇒ step prescribed. Items x:<2>, y:<5> available ⇒ step inputs-available. Prescribed + inputs-available ⇒ enabled.

  23. Execution model. z_{i,j} ← x_i · y_j. Tag <2,5> available ⇒ step prescribed. Items x:<2>, y:<5> available ⇒ step inputs-available. Prescribed + inputs-available ⇒ enabled. Step executes ⇒ item Z:<2,5> available.
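The state transitions for tag <2,5> can be replayed with a toy, single-threaded runtime. This is a sketch under stated assumptions, not how any actual CnC scheduler is implemented: `ToyRuntime`, `Replay`, and `replay_tag_2_5` are hypothetical names, and the input values 3.0 and 4.0 are made up for the illustration.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <utility>

// Toy, single-threaded stand-in for a CnC runtime (all names hypothetical).
struct ToyRuntime {
    std::set<std::pair<int, int>> tags;       // control tags <i,j>
    std::map<int, double> x, y;               // item collections [x: i], [y: j]
    std::map<std::pair<int, int>, double> Z;  // item collection [Z: i,j]

    // A step instance is prescribed once its tag is available.
    bool prescribed(int i, int j) const { return tags.count({i, j}) != 0; }

    // Its inputs are available once both items it reads have been put.
    bool inputs_available(int i, int j) const {
        return x.count(i) != 0 && y.count(j) != 0;
    }

    // Enabled = prescribed + inputs-available.
    bool enabled(int i, int j) const {
        return prescribed(i, j) && inputs_available(i, j);
    }

    // Execute every enabled step that has not yet produced its output.
    // Returns the number of step instances executed in this pass.
    int run_enabled_steps() {
        int executed = 0;
        for (const auto& t : tags) {
            if (enabled(t.first, t.second) && Z.count(t) == 0) {
                Z[t] = x.at(t.first) * y.at(t.second);
                ++executed;
            }
        }
        return executed;
    }
};

// Steps executed after each arrival, plus the final value of Z:<2,5>.
struct Replay {
    int after_tag;
    int after_x;
    int after_y;
    double z;
};

// Replays the scenario from the slides for tag <2,5>.
Replay replay_tag_2_5() {
    ToyRuntime rt;
    Replay r{};
    rt.tags.insert({2, 5});            // prescribed, but no inputs yet
    assert(!rt.enabled(2, 5));
    r.after_tag = rt.run_enabled_steps();
    rt.x[2] = 3.0;                     // only one input available
    r.after_x = rt.run_enabled_steps();
    rt.y[5] = 4.0;                     // both inputs available: enabled
    r.after_y = rt.run_enabled_steps();
    r.z = rt.Z.at({2, 5});             // Z:<2,5> is now available
    return r;
}
```

The tag and the two items could arrive in any order with the same outcome; the step runs only once all three are present, which is why the tag says whether, not when, the step executes.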

  24. Coding and execution. [1] Write the specification (graph). [2] Implement steps in a "base" language (C/C++). [3] Build using the CnC translator + compiler. [4] The run-time system maintains collections and schedules step execution.

  25. Textual notation. Recall the outer product example: z_{i,j} ← x_i · y_j.

  27. Textual notation.
      // Input:
      env → <*: i,j>;

  28. Textual notation.
      // Input:
      env → <*: i,j>, [x: i], [y: j];

  30. Textual notation.
      // Input:
      env → <*: i,j>, [x: i], [y: j];
      // Prescription relations:
      <*: i,j> :: (*: i,j);

  31. Textual notation.
      // Input:
      env → <*: i,j>, [x: i], [y: j];
      // Prescription relations:
      <*: i,j> :: (*: i,j);
      // Producer/consumer relations:
      [x: i], [y: j] → (*: i, j);
      (*: i, j) → [Z: i, j];

  32. Textual notation.
      // Input:
      env → <*: i,j>, [x: i], [y: j];
      // Prescription relations:
      <*: i,j> :: (*: i,j);
      // Producer/consumer relations:
      [x: i], [y: j] → (*: i, j);
      (*: i, j) → [Z: i, j];
      // Output:
      [Z: i, j] → env;

  34. Step code written in a sequential base language (for z_{i,j} ← x_i · y_j).

      Return_t mult (Graph_t& G, const Tag_t& t) {
          int i = t[0], j = t[1];
          double x_i = G.x.Get (Tag_t(i));
          double y_j = G.y.Get (Tag_t(j));
          G.Z.Put (Tag_t(i, j), x_i*y_j);
          return CNC_Success;
      }

      Intel's implementation uses C++; Rice University's uses Java (Habanero).
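To try the step outside a CnC installation, the types it relies on can be mocked. The sketch below is an assumption-laden stand-in: `Return_t`, `Tag_t`, and `Graph_t` reuse the names from the slide, but their definitions here are invented for illustration and are not Intel's headers; `Items_t` and the sequential driver `outer_product` are hypothetical additions. Only the body of `mult` comes from the slide.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Mock definitions written for this sketch (not Intel's actual CnC headers).
enum Return_t { CNC_Success };

class Tag_t {
public:
    explicit Tag_t(int a) : c_{a} {}
    Tag_t(int a, int b) : c_{a, b} {}
    int operator[](std::size_t k) const { return c_.at(k); }
    bool operator<(const Tag_t& o) const { return c_ < o.c_; }
private:
    std::vector<int> c_;  // tag components
};

// Hypothetical item collection of doubles keyed by tag.
class Items_t {
public:
    // emplace never overwrites an existing entry, so the first Put wins.
    void Put(const Tag_t& t, double v) { data_.emplace(t, v); }
    // Throws std::out_of_range if the item has not been put.
    double Get(const Tag_t& t) const { return data_.at(t); }
private:
    std::map<Tag_t, double> data_;
};

struct Graph_t {
    Items_t x, y, Z;
};

// The step from the slide, unchanged apart from layout.
Return_t mult(Graph_t& G, const Tag_t& t) {
    int i = t[0], j = t[1];
    double x_i = G.x.Get(Tag_t(i));
    double y_j = G.y.Get(Tag_t(j));
    G.Z.Put(Tag_t(i, j), x_i * y_j);
    return CNC_Success;
}

// Sequential driver standing in for the environment and the runtime:
// put the input items, then run one step instance per tag <i,j>.
Graph_t outer_product(const std::vector<double>& xs,
                      const std::vector<double>& ys) {
    Graph_t G;
    const int m = static_cast<int>(xs.size());
    const int n = static_cast<int>(ys.size());
    for (int i = 0; i < m; ++i) G.x.Put(Tag_t(i), xs[i]);
    for (int j = 0; j < n; ++j) G.y.Put(Tag_t(j), ys[j]);
    for (int i = 0; i < m; ++i) {
        for (int j = 0; j < n; ++j) {
            Return_t status = mult(G, Tag_t(i, j));
            assert(status == CNC_Success);
            (void)status;  // silence unused-variable warnings under NDEBUG
        }
    }
    return G;
}
```

For example, `outer_product({1.0, 2.0, 3.0}, {4.0, 5.0})` fills Z with six items, one per tag. The driver runs the steps in a fixed loop order purely for simplicity; since each step instance reads only its own inputs and writes only its own output, any order would produce the same Z.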

