slide-1
SLIDE 1

Harmonizing Speculative and Non-Speculative Execution in Architectures for Ordered Parallelism

MARK C. JEFFREY, VICTOR A. YING, SUVINAY SUBRAMANIAN, HYUN RYONG LEE, JOEL EMER, DANIEL SANCHEZ

MICRO 2018

slide-2
SLIDE 2

There is a (false) dichotomy in parallelization

SPECULATIVE PARALLELIZATION

  • Simplifies parallel programming
  • Uncovers abundant parallelism

NON-SPECULATIVE PARALLELIZATION

  • Lower overheads
  • Parallel irrevocable actions

Current systems offer all-or-nothing speculation

slide-3
SLIDE 3

Goal: Bring non-speculative execution to systems that support ordered parallelism

Espresso

  • Expressive task-based execution model
  • Coordinates concurrent speculative and non-speculative ordered tasks
  • 256-core speedups up to 2.5x vs. all-speculative

Capsules

  • Let speculative tasks safely invoke software-managed speculation
  • Enable important system services, e.g., a memory allocator that improves performance up to 69x

slide-4
SLIDE 4

Espresso in action

THE NEED FOR SPECULATIVE AND NON-SPECULATIVE PARALLELISM

slide-5
SLIDE 5

Example: Dijkstra’s algorithm

Finds shortest path tree on a graph with weighted edges

[Figure: input graph with vertices A, B, C, D, E, where A is the source; edge weights 3, 2, 2, 4, 1, 3, 3]

slide-24
SLIDE 24

Example: Dijkstra’s algorithm

Finds shortest path tree on a graph with weighted edges

Order = Distance from source node

[Figure: input graph (vertices A to E, source A) next to the task graph, whose tasks are laid out along an order axis labeled 1 to 8. Legend: first to visit vertex, vertex already visited, to be processed]
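As a sequential baseline for the walk-through above, here is a minimal sketch of Dijkstra's algorithm with a priority queue ordered by distance from the source. The `Graph` alias, the `UNSET` sentinel value, and the function name `dijkstra` are assumptions made for this illustration; they are not taken from the slides.

```cpp
#include <cstdint>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical adjacency list: adj[u] holds (neighbor, weight) pairs.
using Graph = std::vector<std::vector<std::pair<int, std::uint32_t>>>;

constexpr std::uint32_t UNSET = std::numeric_limits<std::uint32_t>::max();

std::vector<std::uint32_t> dijkstra(const Graph& adj, int source) {
    std::vector<std::uint32_t> distance(adj.size(), UNSET);
    using Entry = std::pair<std::uint32_t, int>;  // (tentative distance, vertex)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> pq;
    pq.push({0, source});
    while (!pq.empty()) {
        auto [dist, v] = pq.top();  // smallest tentative distance first
        pq.pop();
        if (distance[v] != UNSET) continue;  // vertex already visited
        distance[v] = dist;                  // first to visit vertex
        for (auto [n, w] : adj[v]) pq.push({dist + w, n});
    }
    return distance;
}
```

Each queue entry plays the role of one task in the task graph: the first entry to reach a vertex sets its distance, and later entries for the same vertex do nothing.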

slide-36
SLIDE 36

Parallelism in Dijkstra’s algorithm?

Order = Distance from source node

[Figure: task graph with tasks A, C, B, B, D, D, E, E along an order axis labeled 1 to 8, annotated with data dependences]

[Figure: valid out-of-order schedule of tasks A, C, B, B, D, D, E, E over time]

[Plot: Dijkstra performance on USA-E, non-speculative; speedup (axis 1 to 512) versus cores (1 to 256)]

slide-42
SLIDE 42

Dijkstra as a Swarm program [MICRO’15]

    void dijkstraTask(Timestamp dist, Vertex* v) {
        if (v->distance == UNSET) {
            v->distance = dist;
            for (Vertex* n : v->neighbors) {
                Timestamp nDist = dist + weight(v, n);
                swarm::enqueue(dijkstraTask, nDist, n);
            }
        }
    }

    swarm::enqueue(dijkstraTask, 0, sourceVertex);
    swarm::run();

swarm::enqueue takes a function pointer, a timestamp, and arguments.

  • Implicit parallelism
  • No explicit synchronization
  • Conveys new work to hardware as soon as possible
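To make the ordering semantics of the program above concrete, here is a minimal software sketch that runs timestamped tasks one at a time in timestamp order. It only mirrors the shape of the calls on the slide: the `toy` namespace, its `enqueue` and `run` functions, and the `Vertex` layout are assumptions for illustration and are not the Swarm runtime, which executes tasks speculatively, in parallel, and in hardware.

```cpp
#include <cstdint>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

namespace toy {  // hypothetical stand-in, not the Swarm runtime

using Timestamp = std::uint64_t;

struct Task {
    Timestamp ts;
    std::uint64_t seq;  // creation order, used only to break timestamp ties
    std::function<void()> body;
};

struct Later {
    bool operator()(const Task& a, const Task& b) const {
        return a.ts != b.ts ? a.ts > b.ts : a.seq > b.seq;
    }
};

inline std::priority_queue<Task, std::vector<Task>, Later> pending;
inline std::uint64_t nextSeq = 0;

// Same argument order as on the slide: function pointer, timestamp, arguments.
template <typename Fn, typename... Args>
void enqueue(Fn fn, Timestamp ts, Args... args) {
    pending.push(Task{ts, nextSeq++, [=] { fn(ts, args...); }});
}

// Runs queued tasks sequentially, lowest timestamp first.
inline void run() {
    while (!pending.empty()) {
        Task t = pending.top();
        pending.pop();
        t.body();  // a task may enqueue further tasks
    }
}

}  // namespace toy

constexpr toy::Timestamp UNSET = std::numeric_limits<toy::Timestamp>::max();

struct Vertex {  // assumed layout: each edge stores (neighbor, weight)
    toy::Timestamp distance = UNSET;
    std::vector<std::pair<Vertex*, toy::Timestamp>> edges;
};

void dijkstraTask(toy::Timestamp dist, Vertex* v) {
    if (v->distance == UNSET) {
        v->distance = dist;
        for (auto [n, w] : v->edges) toy::enqueue(dijkstraTask, dist + w, n);
    }
}
```

Calling `toy::enqueue(dijkstraTask, 0, &source)` followed by `toy::run()` produces the result a sequential Dijkstra would. A parallel system has to preserve that outcome while running many of these tasks at once.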

slide-43
SLIDE 43

Swarm microarchitecture [MICRO’15]

  • Large hardware task queues
  • Scalable ordered speculation
  • Scalable ordered commits

[Figure: 64-tile, 256-core chip with four Mem / IO blocks. Tile organization: four cores, each with L1I/D caches; an L2; an L3 slice; a router; and a task unit]

Swarm executes all tasks speculatively and out of order

Efficiently supports thousands of tiny speculative tasks

slide-56
SLIDE 56

Dijkstra’s algorithm has speculative parallelism

Order = Distance from source node

[Figure: task graph with tasks A, C, B, B, D, D, E, E along an order axis labeled 1 to 8]

[Plots: Dijkstra performance on cage14 (speedup axis 1 to 256) and on USA-E (speedup axis 1 to 512), each from 1 to 256 cores, comparing Non-speculative and All-speculative [MICRO’15]; one annotation reads 20%]

All-or-nothing speculation unduly burdens programmers

slide-67
SLIDE 67

Dijkstra’s algorithm needs a hybrid strategy

Order = Distance from source node

[Figure: task graph along an order axis labeled 1 to 7. Legend: running non-speculatively, running speculatively, to be processed, finished]

  • Run tasks non-speculatively when possible
  • Keep cores busy with speculative ordered parallelism
  • Each task must be runnable in either mode
  • Tasks in both modes must coordinate on shared data

slide-68
SLIDE 68

Espresso reaps the benefits of non-speculative and speculative parallelism

[Plots: Dijkstra on USA (speedup axis 1 to 512) and Dijkstra on cage14 (speedup axis 1 to 256), each from 1 to 256 cores, comparing Non-speculative, All-speculative, and Espresso]

Espresso avoids pathologies and scales best

slide-69
SLIDE 69

Espresso

COORDINATING SPECULATIVE AND NON-SPECULATIVE PARALLELISM

slide-78
SLIDE 78

Espresso execution model

Programs consist of tasks that run speculatively or non-speculatively

    void dijkstraTask(Timestamp dist, Vertex* v) {
        if (v->distance == UNSET) {
            v->distance = dist;
            for (Vertex* n : v->neighbors)
                espresso::create(dijkstraTask, dist + weight(v, n), n->id, n);
        }
    }

espresso::create takes a function pointer, a timestamp, a locale, and arguments.

              Non-Spec.   Spec.
  Timestamp   barrier     ordered commits
  Locale      mutex       reduce conflicts

Tasks in either mode can coordinate access to shared data
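The non-speculative column of the table above can be read as a dispatch rule. The sketch below is one plausible reading, not the hardware's actual policy. It assumes a non-speculative task may start only when no pending task has a lower timestamp (timestamp as barrier), and that two tasks with the same locale never run together (locale as mutex). `PendingTask` and `nonSpecBatch` are hypothetical names introduced for this illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical task descriptor used only in this sketch.
struct PendingTask {
    std::uint64_t timestamp;
    std::uint64_t locale;
};

// Returns the indices of tasks that could run together non-speculatively
// under the two assumed rules:
//   barrier: only tasks with the minimum pending timestamp are eligible
//   mutex:   at most one eligible task per locale
std::vector<std::size_t> nonSpecBatch(const std::vector<PendingTask>& pending) {
    std::vector<std::size_t> batch;
    if (pending.empty()) return batch;
    const std::uint64_t minTs =
        std::min_element(pending.begin(), pending.end(),
                         [](const PendingTask& a, const PendingTask& b) {
                             return a.timestamp < b.timestamp;
                         })->timestamp;
    std::unordered_set<std::uint64_t> heldLocales;
    for (std::size_t i = 0; i < pending.size(); ++i) {
        const PendingTask& t = pending[i];
        if (t.timestamp != minTs) continue;                  // held back by the barrier
        if (!heldLocales.insert(t.locale).second) continue;  // locale already taken
        batch.push_back(i);
    }
    return batch;
}
```

Under these assumptions, cores would sit idle whenever the batch is small, which is the gap the slides propose to fill with speculative tasks.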

slide-85
SLIDE 85

Espresso supports three task types that control speculation

void dijkstraTask(Timestamp dist, Vertex* v) { if (v->distance == UNSET) { v->distance = dist; for (Vertex* n : v->neighbors) espresso::create< type >( dijkstraTask, dist + weight(v, n), n->id, n); } }

Espr Espresso esso task dispatch

SPEC NONSPEC Dispatch Candidates

Tile

7 9 10 …

Core

7 SPEC 9 SPEC 10 SPEC … 7 NONSPEC 9 SPEC 10 NONSPEC …

HARMONIZING SPECULATIVE AND NON-SPECULATIVE EXECUTION IN ARCHITECTURES FOR ORDERED PARALLELISM

14

Core

slide-86
SLIDE 86

Espresso supports three task types that control speculation

void dijkstraTask(Timestamp dist, Vertex* v) { if (v->distance == UNSET) { v->distance = dist; for (Vertex* n : v->neighbors) espresso::create< type >( dijkstraTask, dist + weight(v, n), n->id, n); } }

Espr Espresso esso task dispatch

SPEC NONSPEC Dispatch Candidates

Tile

7 9 10 …

Core

7 SPEC 9 SPEC 10 SPEC … 7 NONSPEC 9 SPEC 10 NONSPEC …

HARMONIZING SPECULATIVE AND NON-SPECULATIVE EXECUTION IN ARCHITECTURES FOR ORDERED PARALLELISM

14

Core

slide-87
SLIDE 87

Espresso supports three task types that control speculation

void dijkstraTask(Timestamp dist, Vertex* v) { if (v->distance == UNSET) { v->distance = dist; for (Vertex* n : v->neighbors) espresso::create< type >( dijkstraTask, dist + weight(v, n), n->id, n); } }

Espr Espresso esso task dispatch

SPEC NONSPEC MAYSPEC Dispatch Candidates

Tile

7 9 10 …

Core

7 SPEC 9 SPEC 10 SPEC … 7 NONSPEC 9 SPEC 10 NONSPEC … 7 MAYSPEC 9 SPEC 10 NONSPEC …

HARMONIZING SPECULATIVE AND NON-SPECULATIVE EXECUTION IN ARCHITECTURES FOR ORDERED PARALLELISM

14

Core

slide-88
SLIDE 88

Espresso supports three task types that control speculation

void dijkstraTask(Timestamp dist, Vertex* v) { if (v->distance == UNSET) { v->distance = dist; for (Vertex* n : v->neighbors) espresso::create< type >( dijkstraTask, dist + weight(v, n), n->id, n); } }

Espr Espresso esso task dispatch

SPEC NONSPEC MAYSPEC Dispatch Candidates

Tile

7 9 10 …

Core

7 SPEC 9 SPEC 10 SPEC … 7 NONSPEC 9 SPEC 10 NONSPEC … 7 MAYSPEC 9 SPEC 10 NONSPEC …

HARMONIZING SPECULATIVE AND NON-SPECULATIVE EXECUTION IN ARCHITECTURES FOR ORDERED PARALLELISM

14

Core

slide-89
SLIDE 89

Espresso supports three task types that control speculation

void dijkstraTask(Timestamp dist, Vertex* v) { if (v->distance == UNSET) { v->distance = dist; for (Vertex* n : v->neighbors) espresso::create< type >( dijkstraTask, dist + weight(v, n), n->id, n); } }

Espr Espresso esso task dispatch

SPEC NONSPEC MAYSPEC Dispatch Candidates

Tile

7 9 10 …

Core Core

7 SPEC 9 SPEC 10 SPEC … 7 NONSPEC 9 SPEC 10 NONSPEC … 7 MAYSPEC 9 SPEC 10 NONSPEC …

HARMONIZING SPECULATIVE AND NON-SPECULATIVE EXECUTION IN ARCHITECTURES FOR ORDERED PARALLELISM

14

slide-90
SLIDE 90

Espresso supports three task types that control speculation

void dijkstraTask(Timestamp dist, Vertex* v) { if (v->distance == UNSET) { v->distance = dist; for (Vertex* n : v->neighbors) espresso::create< type >( dijkstraTask, dist + weight(v, n), n->id, n); } }

Espr Espresso esso task dispatch

SPEC NONSPEC MAYSPEC Dispatch Candidates

Tile

7 9 10 …

Core Core

7 SPEC 9 SPEC 10 SPEC … 7 NONSPEC 9 SPEC 10 NONSPEC … 7 MAYSPEC 9 SPEC 10 NONSPEC …

HARMONIZING SPECULATIVE AND NON-SPECULATIVE EXECUTION IN ARCHITECTURES FOR ORDERED PARALLELISM

14

slide-91
SLIDE 91

Espresso task dispatch

Espresso supports three task types that control speculation

void dijkstraTask(Timestamp dist, Vertex* v) {
  if (v->distance == UNSET) {
    v->distance = dist;
    for (Vertex* n : v->neighbors)
      espresso::create<type>(  // type: SPEC, NONSPEC, or MAYSPEC
          dijkstraTask, dist + weight(v, n), n->id, n);
  }
}

Figure: a tile with two cores and a queue of dispatch candidates with timestamps 7, 9, 10, … The task types SPEC, NONSPEC, and MAYSPEC are shown with three example candidate queues:

  • 7 SPEC, 9 SPEC, 10 SPEC, …
  • 7 NONSPEC, 9 SPEC, 10 NONSPEC, …
  • 7 MAYSPEC, 9 SPEC, 10 NONSPEC, …

MAYSPEC lets the system decide whether to speculate
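The slides do not spell out the dispatch rule. The following is a sketch of one policy consistent with the execution-model slide, where timestamps act as barriers and locales as mutexes for non-speculative tasks. It assumes a MAYSPEC task runs non-speculatively whenever a NONSPEC task could, and speculatively otherwise. `DispatchState`, `canRunNonSpec`, and `chooseMode` are hypothetical names, and the code needs C++17 for `std::optional`.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <set>

enum class TaskType { SPEC, NONSPEC, MAYSPEC };
enum class RunMode { Speculative, NonSpeculative };

// Hypothetical snapshot of what the dispatcher knows about a candidate's surroundings.
struct DispatchState {
  std::uint64_t earliestOtherTs;  // lowest timestamp among the other unfinished tasks
  std::set<int> heldLocales;      // locales held by running non-speculative tasks
};

// Assumed rule: a task may run non-speculatively only if no other unfinished
// task has an earlier timestamp (barrier) and its locale is free (mutex).
bool canRunNonSpec(const DispatchState& s, std::uint64_t ts, int locale) {
  return ts <= s.earliestOtherTs && s.heldLocales.count(locale) == 0;
}

// Returns the mode to run in, or std::nullopt if the candidate must wait.
std::optional<RunMode> chooseMode(const DispatchState& s, TaskType type,
                                  std::uint64_t ts, int locale) {
  switch (type) {
    case TaskType::SPEC:
      return RunMode::Speculative;
    case TaskType::NONSPEC:
      if (canRunNonSpec(s, ts, locale)) return RunMode::NonSpeculative;
      return std::nullopt;
    case TaskType::MAYSPEC:
      return canRunNonSpec(s, ts, locale) ? RunMode::NonSpeculative
                                          : RunMode::Speculative;
  }
  return std::nullopt;  // not reached for valid enum values
}
```

Under this assumed policy a MAYSPEC task never waits: it takes the cheaper non-speculative path when that is safe and falls back to speculation when it is not.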

slide-94
SLIDE 94

Espresso improves efficiency and programmability

Figure: speedup vs. core count (1c to 256c) for sssp-cage, sssp-usa, cf, triangle, genome, kmeans, color, bfs, mis, astar, and des, comparing NONSPEC, Swarm, and MAYSPEC. One plot is annotated 2.5x.

Gmean speedup: NONSPEC 29x, Swarm 162x, MAYSPEC 198x (22% over Swarm, 6.9x over NONSPEC)

MAYSPEC allows programmers to exploit the best of speculative and non-speculative parallelism

slide-95
SLIDE 95

Please see the paper for more details!

Microarchitectural details

Interactions between speculative and non-speculative tasks:

  • How are conflicts detected and resolved?
  • How do timestamps-as-barriers affect the ordered commit protocol?

Espresso exception model

Additional results analysis

slide-96
SLIDE 96

Capsules

ENABLING SOFTWARE-MANAGED SPECULATION WITH ORDERED PARALLELISM

slide-105
SLIDE 105

Some actions should bypass HW speculation

Discrete event simulation (DES) needs speculation to scale

DES also allocates memory within tasks

Figure: memory holding a free list; two cores running tasks A and D (of tasks A B C D) read and write it.

Figure: DES speedup vs. core count (1c to 256c), TCMalloc vs. an ideal allocator.

Dependences on allocator metadata cause aborts among otherwise independent tasks
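To make the conflict on allocator metadata concrete, here is a toy model. It assumes conflict detection compares per-task read and write sets of addresses; `AccessSet`, `conflicts`, and `popFreeList` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Hypothetical per-task record of the addresses a speculative task touched.
struct AccessSet {
  std::set<std::uintptr_t> reads;
  std::set<std::uintptr_t> writes;
};

static bool intersects(const std::set<std::uintptr_t>& a,
                       const std::set<std::uintptr_t>& b) {
  for (std::uintptr_t x : a)
    if (b.count(x) != 0) return true;
  return false;
}

// Two tasks conflict if one writes an address the other reads or writes.
bool conflicts(const AccessSet& a, const AccessSet& b) {
  return intersects(a.writes, b.reads) || intersects(a.writes, b.writes) ||
         intersects(b.writes, a.reads);
}

struct Block { Block* next; };
struct FreeList { Block* head; };

// Pops one block and logs the accesses to the shared head pointer.
Block* popFreeList(FreeList& fl, AccessSet& log) {
  auto headAddr = reinterpret_cast<std::uintptr_t>(&fl.head);
  log.reads.insert(headAddr);
  Block* b = fl.head;
  if (b != nullptr) {
    log.writes.insert(headAddr);
    fl.head = b->next;
  }
  return b;
}
```

Two tasks that each pop a block receive different blocks and may touch disjoint simulation state, yet `conflicts` reports a dependence because both read and wrote the list head.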

slide-117
SLIDE 117

Speculative data forwarding creates challenges

Disable hardware speculation [Moravan, ASPLOS’06]?

Speculative tasks can access data written by earlier, uncommitted tasks

  • Critical for ordered parallelism
  • Can cause tasks to lose integrity!

Figure: DES speedup vs. core count (1c to 256c), no forwarding vs. with forwarding; annotated 5x.

Figure: memory holding a free list; two cores running tasks A and D (of tasks A B C D); an access marked Unchecked is flagged with a warning (!).

Simply disabling hardware speculation is unsafe with speculative forwarding
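Forwarding and its hazard can be sketched with a toy versioned memory. This is a simplified model under stated assumptions, not the hardware mechanism: task ids double as timestamps, and `SpecMemory`, `readersOf`, and `abort` are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

struct SpecMemory {
  struct Write { int task; int addr; int value; };

  std::map<int, int> committed;            // addr -> committed value
  std::vector<Write> pending;              // uncommitted speculative writes
  std::map<int, std::set<int>> readersOf;  // writer task -> tasks it forwarded data to

  void write(int task, int addr, int value) {
    pending.push_back({task, addr, value});
  }

  // With forwarding, a read returns the pending write to addr from the latest
  // task that is not later than the reader, if any; otherwise the committed value.
  int read(int task, int addr) {
    const Write* best = nullptr;
    for (const Write& w : pending)
      if (w.addr == addr && w.task <= task && (!best || w.task >= best->task))
        best = &w;
    if (best == nullptr) {
      auto it = committed.find(addr);
      return it == committed.end() ? 0 : it->second;
    }
    if (best->task != task) readersOf[best->task].insert(task);
    return best->value;
  }

  // Discards the task's writes and returns the tasks that consumed them.
  // The caller is expected to abort those tasks as well.
  std::set<int> abort(int task) {
    std::vector<Write> kept;
    for (const Write& w : pending)
      if (w.task != task) kept.push_back(w);
    pending = kept;
    std::set<int> victims = readersOf[task];
    readersOf.erase(task);
    return victims;
  }
};
```

Between reading a forwarded value and being aborted, the later task computes on data that never becomes committed state. If that task could also perform unchecked writes, their effects would not be rolled back.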

slide-127
SLIDE 127

Capsules ensure safety through OS-like protections

Untracked memory: protected from tasks that lose integrity

Unversioned, no conflict checks

Only accessible by

  • non-speculative tasks
  • speculative capsules

Holds the capsule call vector

Vectored call interface: guarantees control-flow integrity in a capsule

Figure: tracked memory and untracked memory; the untracked memory holds the free list and the capsule call vector (&malloc, &calloc, …); two cores running tasks A and D (of tasks A B C D).
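The idea behind a vectored call interface can be sketched in software. This is an analogy under stated assumptions, not the hardware interface: `CapsuleFn`, `kCapsuleVector`, and `capsuleCall` are hypothetical names, and a `const` array stands in for the call vector that the slides place in untracked memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical capsule signature: one size argument, returns a pointer.
using CapsuleFn = void* (*)(std::size_t);

static void* capsuleMalloc(std::size_t n) { return std::malloc(n); }
static void* capsuleCalloc(std::size_t n) { return std::calloc(1, n); }

// Stand-in for the capsule call vector. It is const so that ordinary task
// code in this sketch cannot overwrite an entry.
static const CapsuleFn kCapsuleVector[] = {capsuleMalloc, capsuleCalloc};
constexpr std::size_t kNumCapsules =
    sizeof(kCapsuleVector) / sizeof(kCapsuleVector[0]);

enum CapsuleId : std::size_t { CAPSULE_MALLOC = 0, CAPSULE_CALLOC = 1 };

// Enters a capsule by index. A caller can pass a bad index, but it cannot
// make this function jump to an arbitrary address: unknown indices are rejected.
void* capsuleCall(std::size_t id, std::size_t arg) {
  if (id >= kNumCapsules) return nullptr;
  return kCapsuleVector[id](arg);
}
```

Because entry happens only through a fixed table, a task that has computed a garbage function pointer from forwarded data still lands at the start of a known capsule function or is refused.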

slide-129
SLIDE 129

Capsules enable important system services

Capsule-based allocator (capalloc)

  • malloc, etc. are capsule functions
  • metadata resides in untracked memory
  • Only gmean 30% slower than ideal

Figure: speedup vs. core count (1c to 256c) for genome, des, nocsim, and silo, comparing TCMalloc, an ideal allocator, and capalloc.

capalloc retains the scalability of an ideal allocator
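The slides say only that malloc and related functions become capsule functions whose metadata lives in untracked memory. One way software-managed speculation could work for allocation is sketched below, assuming the capsule applies its effect immediately and records an undo action that runs if the calling task aborts. This mechanism is an assumption for illustration; `ToyPool`, `SpecTask`, and `capMalloc` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Toy fixed-size-block allocator; its free list plays the role of the
// allocator metadata kept in untracked memory.
class ToyPool {
 public:
  explicit ToyPool(std::size_t blocks) {
    for (std::size_t i = 0; i < blocks; ++i) free_.push_back(i);
  }
  // Returns a block index, or -1 if the pool is exhausted.
  long allocate() {
    if (free_.empty()) return -1;
    long b = static_cast<long>(free_.back());
    free_.pop_back();
    return b;
  }
  void release(long b) { free_.push_back(static_cast<std::size_t>(b)); }
  std::size_t available() const { return free_.size(); }

 private:
  std::vector<std::size_t> free_;
};

// Hypothetical per-task undo log maintained by software.
struct SpecTask {
  std::vector<std::function<void()>> undo;
  // Runs the undo actions in reverse order, then forgets them.
  void abort() {
    for (auto it = undo.rbegin(); it != undo.rend(); ++it) (*it)();
    undo.clear();
  }
  // On commit the allocations become permanent, so the log is dropped.
  void commit() { undo.clear(); }
};

// Capsule-style allocation: takes effect at once and registers its own rollback.
long capMalloc(ToyPool& pool, SpecTask& task) {
  long b = pool.allocate();
  if (b >= 0) task.undo.push_back([&pool, b] { pool.release(b); });
  return b;
}
```

In this sketch two tasks that allocate concurrently never appear to conflict on the free list, because the pool is outside any tracked read or write set; correctness on abort comes from the undo log instead.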

slide-130
SLIDE 130

Conclusion

Speculative systems should support non-speculative execution to improve efficiency, ease programmability, and enable new capabilities

Espresso: an execution model for speculative and non-speculative tasks

  • Provides shared synchronization mechanisms to all tasks
  • Lets the system adaptively run tasks speculatively or non-speculatively

Capsules: speculative tasks safely invoke software-managed speculation

  • Enable important speculation-friendly services like scalable memory allocation