Harmonizing Speculative and Non-Speculative Execution in Architectures for Ordered Parallelism

MARK C. JEFFREY, VICTOR A. YING, SUVINAY SUBRAMANIAN, HYUN RYONG LEE, JOEL EMER, DANIEL SANCHEZ
MICRO 2018

There is a (false)


  1. Parallelism in Dijkstra’s algorithm?
     Figure: task graph over nodes A to E, in which tasks for E, B, and D appear more than once. Tasks are ordered by distance from the source node (axis 0 to 8), with data dependences marked between tasks.
     Figure: a valid out-of-order schedule of the same tasks over time.
     Plot: Dijkstra on USA-E, speedup (axis up to 512) at 1c, 128c, and 256c, showing non-speculative Dijkstra performance.
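One way to read "valid out-of-order schedule" is that tasks may run in any order as long as every task runs after the task that created it and tasks touching the same data still run in timestamp order. The C++ sketch below checks a serial execution order under that simplifying assumption. The `Task` struct, `isValidSchedule`, and the rule that each task touches only its own vertex are illustrative assumptions, not taken from the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

struct Task {
    int timestamp;  // order = tentative distance from the source node
    char vertex;    // the only data this task reads and writes (assumption)
    int parent;     // index of the task that created it, -1 for the root
};

// `schedule` lists task indices in the order they execute.
// The check does not require that every task appears in the schedule.
bool isValidSchedule(const std::vector<Task>& tasks,
                     const std::vector<int>& schedule) {
    std::vector<bool> done(tasks.size(), false);
    std::map<char, int> lastTimestamp;  // latest timestamp run so far, per vertex
    for (int id : schedule) {
        assert(id >= 0 && static_cast<std::size_t>(id) < tasks.size());
        const Task& t = tasks[id];
        // A task cannot run before the task that created it.
        if (t.parent >= 0 && !done[t.parent]) return false;
        // Tasks on the same vertex must run in timestamp order.
        std::map<char, int>::const_iterator it = lastTimestamp.find(t.vertex);
        if (it != lastTimestamp.end() && it->second > t.timestamp) return false;
        lastTimestamp[t.vertex] = t.timestamp;
        done[id] = true;
    }
    return true;
}
```

With hypothetical tasks A(0), C(2), B(3), and a second B(5) created by C, running B(3) before C(2) passes the check because the two tasks touch different vertices, while running B(5) before B(3) fails it.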

  12. Dijkstra as a Swarm program [MICRO’15]

        void dijkstraTask(Timestamp dist, Vertex* v) {
          if (v->distance == UNSET) {
            v->distance = dist;
            for (Vertex* n : v->neighbors) {
              Timestamp nDist = dist + weight(v, n);
              swarm::enqueue(dijkstraTask, nDist, n);
            }
          }
        }

        swarm::enqueue(dijkstraTask, 0, sourceVertex);
        swarm::run();

      The arguments to swarm::enqueue are a function pointer, a timestamp, and the task arguments.
      Implicit parallelism, with no explicit synchronization.
      Conveys new work to hardware as soon as possible.
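The programming model on this slide can be tried out without Swarm hardware by emulating it in software. The sketch below is a sequential stand-in only: the `toy` namespace, its `enqueue` and `run` functions, and the `edges` field are hypothetical, and the loop runs one task at a time in timestamp order, with no speculation and no parallelism.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>
#include <utility>
#include <vector>

using Timestamp = std::uint64_t;
const Timestamp UNSET = ~Timestamp(0);

struct Vertex {
    Timestamp distance = UNSET;
    std::vector<std::pair<Vertex*, Timestamp> > edges;  // (neighbor, weight)
};

namespace toy {  // hypothetical stand-in for the swarm:: calls on the slide
typedef void (*TaskFn)(Timestamp, Vertex*);

struct Task {
    Timestamp ts;
    TaskFn fn;
    Vertex* arg;
};

struct Later {  // orders the queue so the smallest timestamp is on top
    bool operator()(const Task& a, const Task& b) const { return a.ts > b.ts; }
};

static std::priority_queue<Task, std::vector<Task>, Later> pending;

void enqueue(TaskFn fn, Timestamp ts, Vertex* arg) {
    Task t = {ts, fn, arg};
    pending.push(t);
}

// Runs tasks strictly in timestamp order, one at a time.
void run() {
    while (!pending.empty()) {
        Task t = pending.top();
        pending.pop();
        t.fn(t.ts, t.arg);
    }
}
}  // namespace toy

void dijkstraTask(Timestamp dist, Vertex* v) {
    if (v->distance == UNSET) {  // later tasks for a settled vertex are no-ops
        v->distance = dist;
        for (const auto& edge : v->edges)
            toy::enqueue(dijkstraTask, dist + edge.second, edge.first);
    }
}
```

Seeding the queue with `toy::enqueue(dijkstraTask, 0, &source)` and calling `toy::run()` mirrors the two calls at the bottom of the slide.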

  18. Swarm microarchitecture [MICRO’15]
      Swarm executes all tasks speculatively and out of order:
      ◦ Large hardware task queues
      ◦ Scalable ordered speculation
      ◦ Scalable ordered commits
      Efficiently supports thousands of tiny speculative tasks.
      Figure: a 64-tile, 256-core chip with four Mem / IO blocks. Tile organization: four cores, each with L1I/D caches, plus an L2, an L3 slice, a router, and a task unit.

  19. Dijkstra’s algorithm has speculative parallelism
      Figure: the task graph ordered by distance from the source node (axis 0 to 8), with an out-of-order schedule over time.
      Plots: speedup at 1c, 128c, and 256c for Dijkstra on USA-E (axis up to 512) and Dijkstra on cage14 (axis up to 256), comparing non-speculative and all-speculative [MICRO’15] Dijkstra performance. A 20% difference is annotated on the cage14 plot.
      All-or-nothing speculation unduly burdens programmers.

  32. Dijkstra’s algorithm needs a hybrid strategy
      Figure: task graph ordered by distance from the source node (axis 0 to 7), with tasks marked as finished, running non-speculatively, running speculatively, or to be processed.
      ◦ Run tasks non-speculatively when possible
      ◦ Keep cores busy with speculative ordered parallelism
      ◦ Each task must be runnable in either mode
      ◦ Tasks in both modes must coordinate on shared data

  43. Espresso reaps the benefits of non-speculative and speculative parallelism
      Plots: speedup at 1c, 128c, and 256c for Dijkstra on USA (axis up to 512) and Dijkstra on cage14 (axis up to 256), comparing Espresso, all-speculative, and non-speculative execution.
      Espresso avoids pathologies and scales best.

  44. Espresso: coordinating speculative and non-speculative parallelism

  45. Espresso execution model
      Programs consist of tasks that run speculatively or non-speculatively.

                     Non-Spec.   Spec.
        Timestamp    barrier     ordered commits
        Locale       mutex       reduce conflicts

        void dijkstraTask(Timestamp dist, Vertex* v) {
          if (v->distance == UNSET) {
            v->distance = dist;
            for (Vertex* n : v->neighbors)
              espresso::create(
                dijkstraTask,         // function pointer
                dist + weight(v, n),  // timestamp
                n->id,                // locale
                n);                   // arguments
          }
        }

      Tasks in either mode can coordinate access to shared data.
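The table on this slide says that, for non-speculative tasks, a timestamp acts as a barrier and a locale acts as a mutex. The sketch below turns that reading into a predicate. It is an interpretation of the slide, not the hardware's actual rule, which the paper defines; `SystemState`, its fields, and `canRunNonSpeculatively` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

using Timestamp = std::uint64_t;
using Locale = std::uint64_t;

struct SystemState {               // hypothetical summary of scheduler state
    Timestamp earliestUnfinished;  // smallest timestamp among unfinished tasks
    std::set<Locale> heldLocales;  // locales of running non-speculative tasks
};

// Timestamp as barrier: all tasks with smaller timestamps must have finished.
// Locale as mutex: no running non-speculative task may hold the same locale.
bool canRunNonSpeculatively(const SystemState& s, Timestamp ts, Locale locale) {
    // The candidate is itself unfinished, so it cannot precede the earliest one.
    assert(ts >= s.earliestUnfinished);
    bool pastBarrier = (ts == s.earliestUnfinished);
    bool localeFree = (s.heldLocales.count(locale) == 0);
    return pastBarrier && localeFree;
}
```

Under this reading, a task whose timestamp is later than the earliest unfinished one, or whose locale is already held, has to either wait or run speculatively.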

  54. Espresso task dispatch
      Espresso supports three task types that control speculation: SPEC, NONSPEC, and MAYSPEC.

        void dijkstraTask(Timestamp dist, Vertex* v) {
          if (v->distance == UNSET) {
            v->distance = dist;
            for (Vertex* n : v->neighbors)
              espresso::create<type>(
                dijkstraTask,
                dist + weight(v, n),
                n->id,
                n);
          }
        }

      Figure: a tile with two cores and a list of dispatch candidates with timestamps 7, 9, 10, … The candidates are shown with three different type assignments: (7 SPEC, 9 SPEC, 10 SPEC), (7 NONSPEC, 9 SPEC, 10 NONSPEC), and (7 MAYSPEC, 9 SPEC, 10 NONSPEC).
      MAYSPEC lets the system decide whether to speculate.
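The three task types can be summarized as a dispatch decision. The sketch below assumes one plausible policy consistent with the earlier slide advice to run tasks non-speculatively when possible: SPEC always speculates, NONSPEC waits until it can run non-speculatively, and MAYSPEC runs non-speculatively if it can and speculates otherwise. The `Dispatch` enum, `decide`, and the `nonSpecReady` input are hypothetical, and the real hardware policy may differ.

```cpp
#include <cassert>

enum class TaskType { SPEC, NONSPEC, MAYSPEC };
enum class Dispatch { Speculative, NonSpeculative, Wait };

// `nonSpecReady` is a hypothetical input: true when the task could start
// non-speculatively right now (for example, no earlier task is unfinished).
Dispatch decide(TaskType type, bool nonSpecReady) {
    switch (type) {
        case TaskType::SPEC:
            return Dispatch::Speculative;  // always speculates
        case TaskType::NONSPEC:
            return nonSpecReady ? Dispatch::NonSpeculative : Dispatch::Wait;
        case TaskType::MAYSPEC:
            // Assumed policy: avoid speculation when it is not needed.
            return nonSpecReady ? Dispatch::NonSpeculative : Dispatch::Speculative;
    }
    assert(false && "unknown task type");
    return Dispatch::Wait;
}
```

For candidates (7 MAYSPEC, 9 SPEC, 10 NONSPEC) with 7 as the earliest timestamp, this policy runs 7 non-speculatively, runs 9 speculatively, and holds 10 back.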

  67. Espresso improves efficiency and programmability
      MAYSPEC allows programmers to exploit the best of speculative and non-speculative parallelism.
      Plots: speedup at 1c, 128c, and 256c for MAYSPEC, Swarm, and NONSPEC on sssp-cage, sssp-usa, cf, triangle, genome, kmeans, color, bfs, mis, astar, and des. Annotations on the plots read 2.5x, 22%, and 6.9x.
      Gmean speedup: MAYSPEC 198x, Swarm 162x, NONSPEC 29x.
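The gmean figures summarize the per-benchmark speedups with a geometric mean. The sketch below shows that calculation and the ratio between two summaries. The function names are illustrative, and the per-benchmark values are not given in the slide text, so only the three rounded gmean figures can be plugged in.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Geometric mean of per-benchmark speedups; every value must be positive.
double gmean(const std::vector<double>& speedups) {
    assert(!speedups.empty());
    double logSum = 0.0;
    for (double s : speedups) {
        assert(s > 0.0);
        logSum += std::log(s);
    }
    return std::exp(logSum / static_cast<double>(speedups.size()));
}

// Ratio of two gmean speedups, for example MAYSPEC over Swarm.
double relativeGain(double a, double b) {
    assert(b > 0.0);
    return a / b;
}
```

With the rounded slide figures, 198 / 162 is about 1.22, which appears to correspond to the 22% annotation, and 198 / 29 is about 6.8, slightly below the 6.9x annotation, which was presumably computed from unrounded values.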

  70. Please see the paper for more details!
      Microarchitectural details
      Interactions between speculative and non-speculative tasks:
      ◦ How are conflicts detected and resolved?
      ◦ How do timestamps-as-barriers affect the ordered commit protocol?
      Espresso exception model
      Additional results analysis

  71. Capsules: enabling software-managed speculation with ordered parallelism

  72. Some actions should bypass HW speculation
      Discrete event simulation (DES) needs speculation to scale.
      DES also allocates memory within tasks.
      Figure: two cores that both read and write locations in a memory holding A, B, C, and D.
      Plot: DES speedup (axis up to 256) at 1c, 128c, and 256c, comparing an ideal allocator with a free list.
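A plausible reading of the free-list comparison is that a shared free list funnels every allocation through one memory location, so speculative tasks that allocate end up reading and writing the same data and conflict with each other. The sketch below is a minimal single-threaded free list that counts accesses to its head to make that hot spot visible. `FreeListPool` and its members are hypothetical and are not the allocator evaluated in the talk.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Index-based free list over a fixed pool; -1 marks the end of the list.
class FreeListPool {
public:
    explicit FreeListPool(std::size_t n) : next_(n), head_(n ? 0 : -1) {
        for (std::size_t i = 0; i < n; ++i)
            next_[i] = (i + 1 < n) ? static_cast<int>(i + 1) : -1;
    }

    // Every call reads and writes head_, the single shared hot spot.
    int allocate() {
        ++headAccesses_;
        if (head_ == -1) return -1;  // pool exhausted
        int slot = head_;
        head_ = next_[slot];
        return slot;
    }

    // Returning a slot also goes through head_.
    void release(int slot) {
        assert(slot >= 0 && static_cast<std::size_t>(slot) < next_.size());
        ++headAccesses_;
        next_[slot] = head_;
        head_ = slot;
    }

    std::size_t headAccesses() const { return headAccesses_; }

private:
    std::vector<int> next_;  // next_[i] = index of the next free slot after i
    int head_;               // first free slot, or -1
    std::size_t headAccesses_ = 0;
};
```

Every `allocate` and `release` touches `head_`, so any two tasks that allocate would be seen as conflicting by hardware that tracks reads and writes, even when the tasks are otherwise independent.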
