FRACTAL: AN EXECUTION MODEL FOR FINE-GRAIN NESTED SPECULATIVE PARALLELISM. SUVINAY SUBRAMANIAN, MARK C. JEFFREY, MALEEN ABEYDEERA, HYUN RYONG LEE, VICTOR A. YING, JOEL EMER, DANIEL SANCHEZ. ISCA 2017.


slide-1
SLIDE 1

FRACTAL: AN EXECUTION MODEL FOR FINE-GRAIN NESTED SPECULATIVE PARALLELISM

SUVINAY SUBRAMANIAN, MARK C. JEFFREY, MALEEN ABEYDEERA, HYUN RYONG LEE, VICTOR A. YING, JOEL EMER, DANIEL SANCHEZ

ISCA 2017

slide-2
SLIDE 2

Current speculative systems scale poorly

Speculative parallelization, e.g. transactional memory (TM), simplifies parallel programming

Performs poorly on real-world applications…

…because applications comprise large atomic tasks

slide-3
SLIDE 3

Large atomic tasks limit performance

Database Transaction

query X … … update Z … … query U … … update V

  • Millions of cycles
  • Prone to aborts
  • Challenging to track
  • Serial (misses parallelism)

slide-4
SLIDE 4

Large atomic tasks have abundant nested parallelism!

[Figure: the transaction broken into many small tasks: qry X, qry K, upd Z, qry Y, upd J, qry S, qry M, upd L, …, qry U, upd V]

How to

  • extract parallelism?
  • maintain atomicity?
  • achieve high performance?

slide-5
SLIDE 5

Prior TMs fail to exploit nested parallelism

  • 1. Merging of “nested” speculative state with parent
  • Large speculative state, prone to aborts
  • 2. Cyclic dependence between parent and nested children
  • Deadlock and livelock issues

[Figure: timeline over Core 1 to Core 4 showing tasks X and Y and nested tasks A and B]

See the paper for more details!

slide-6
SLIDE 6

Ordering tasks to guarantee atomicity

[Figure: timeline over Core 1 to Core 4. Tasks X and Y are ordered 1 and 2; nested tasks A and B are ordered 1.1 and 1.2]

slide-7
SLIDE 7

Fractal decouples atomicity from parallelism

  • 1. Decouples unit of atomicity from unit of parallelism
  • Domain: All tasks belonging to a domain appear to execute atomically
  • 2. Implementation guarantees atomicity by ordering tasks
  • No merging of speculative state

Benefits of Fractal

  • Tiny tasks
  • Easy to track
  • Composable speculative parallelism

slide-8
SLIDE 8

Fractal Execution Model

DECOUPLING ATOMICITY FROM PARALLELISM

slide-9
SLIDE 9

Domains to group tasks into atomic units

  • Fractal programs consist of atomic tasks
  • Tasks may access arbitrary data
  • Tasks may create child tasks
  • Tasks belong to a hierarchy of nested domains

slide-10
SLIDE 10

Semantics across domains

Each task:

  • can create a single subdomain
  • can enqueue child tasks to its subdomain or to its current domain

(All tasks in a domain + the creator of the domain) appear to execute as a single atomic unit

[Figure: tasks A, B, C, D, E, X, L, M, N, O, P, Y arranged in nested domains under the root domain]

slide-11
SLIDE 11

Semantics within a domain

Unordered

  • Arbitrary order while respecting parent-child dependences

Timestamp-ordered

  • Tasks appear to execute in increasing timestamp order
  • Children appear to execute after parent

[Figure: tasks A, B, C, D, E, X, L, M, N, O, P, Y in nested domains under the root domain; timestamp labels 1, 10, 2, 3, 12]

slide-12
SLIDE 12

Fractal software API

Creating and enqueuing tasks:

fractal::enqueue(function_pointer, timestamp, arguments...);

Creating sub-domains:

fractal::create_subdomain(<domain_type>);

High-level programming interface, e.g. forall(), callcc(), parallel_reduce()
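A minimal single-threaded sketch of how calls shaped like the two above might be used. Everything in the `mock` namespace is a stand-in written for illustration, not the Fractal runtime: the signatures are assumptions modeled on the slide, `enqueue_sub` is a hypothetical name for "enqueue into my subdomain", `create_subdomain` takes no domain type, and tasks run serially in order instead of speculatively.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical serial stand-in for the API on the slide; NOT the real runtime.
namespace mock {

using VT = std::vector<int>;          // one timestamp per nesting level
using Task = std::function<void()>;

// Pending tasks keyed by VT; multimap iterates keys in lexicographic order.
std::multimap<VT, Task>& pending() { static std::multimap<VT, Task> q; return q; }
VT& current_vt() { static VT vt; return vt; }              // VT of the running task
bool& has_subdomain() { static bool f = false; return f; } // set by create_subdomain()

// Enqueue a child into the caller's own domain: same prefix, new last level.
template <typename F, typename... Args>
void enqueue(F fn, int timestamp, Args... args) {
    VT vt = current_vt();
    if (vt.empty()) vt.push_back(timestamp);  // called from outside any task
    else vt.back() = timestamp;
    pending().emplace(vt, [=] { fn(args...); });
}

// Create the caller's single subdomain.
void create_subdomain() { has_subdomain() = true; }

// Enqueue a child into the caller's subdomain: one extra nesting level.
template <typename F, typename... Args>
void enqueue_sub(F fn, int timestamp, Args... args) {
    assert(has_subdomain());
    VT vt = current_vt();
    vt.push_back(timestamp);
    pending().emplace(vt, [=] { fn(args...); });
}

// Serial reference execution: always run the pending task with the smallest VT.
void run() {
    while (!pending().empty()) {
        auto it = pending().begin();
        VT vt = it->first;
        Task task = std::move(it->second);
        pending().erase(it);
        current_vt() = vt;
        has_subdomain() = false;
        task();
    }
    current_vt().clear();
}

}  // namespace mock

// Example program: two root tasks, each opening a subdomain with two children.
std::vector<std::string>& trace() { static std::vector<std::string> t; return t; }
void leaf(std::string name) { trace().push_back(name); }
void txn(std::string name) {
    trace().push_back(name);
    mock::create_subdomain();
    mock::enqueue_sub(leaf, 2, name + ".b");  // enqueued first, runs second
    mock::enqueue_sub(leaf, 1, name + ".a");
}
std::vector<std::string> demo() {
    trace().clear();
    mock::enqueue(txn, 1, std::string("T1"));
    mock::enqueue(txn, 2, std::string("T2"));
    mock::run();
    return trace();
}
```

Calling `demo()` returns the tasks in the order T1, T1.a, T1.b, T2, T2.a, T2.b: each root task and its subdomain behave as one unit, which is the property a speculative implementation has to preserve. The sketch does not check that a child's timestamp places it after its parent.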

slide-13
SLIDE 13

Example: Database transactions in Fractal

Root domain: T1 (TXN 1), T2 (TXN 2)

TXN 1, timestamps 1 to 5: qry X, qry Z, upd Z, qry U, upd V

TXN 2, timestamps 1 to 5: qry A, qry B, upd C, upd Z, upd K

slide-14
SLIDE 14

Fractal Implementation

ATOMICITY THROUGH ORDERING

slide-15
SLIDE 15

Fractal Virtual Time (VT)

Fractal assigns a fractal virtual time (VT) to each task

Captures the ordering of tasks both across domains and within a domain

Fractal VT = 45 | 23 | 108 | … | 9 (each field is a domain VT)
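A small sketch of how such a VT could be built and compared. This is a simplification under stated assumptions: each domain VT is a single integer (a real domain VT would presumably also need a tiebreaker and a fixed bit width), and the helper names are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Simplified fractal VT: one integer domain VT per nesting level (assumption).
using FractalVT = std::vector<int>;

// VT of a child enqueued into the parent's subdomain: append one level.
FractalVT child_in_subdomain(const FractalVT& parent, int timestamp) {
    FractalVT vt = parent;
    vt.push_back(timestamp);
    return vt;
}

// VT of a child enqueued into the parent's own domain:
// keep the enclosing prefix, replace the last level.
FractalVT child_in_same_domain(const FractalVT& parent, int timestamp) {
    assert(!parent.empty());
    FractalVT vt = parent;
    vt.back() = timestamp;
    return vt;
}

// Tasks appear to execute in lexicographic VT order.
bool precedes(const FractalVT& a, const FractalVT& b) {
    return a < b;  // std::vector's operator< is lexicographic; a prefix sorts first
}
```

With X = {1} and Y = {2}, the children `child_in_subdomain(X, 1)` and `child_in_subdomain(X, 2)` sort after X and before Y, so X and its subdomain appear to execute as one unit ahead of Y.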

slide-16
SLIDE 16

Example: Database transactions in Fractal

Root domain: T1 (VT 1), T2 (VT 2)

TXN 1 task   Fractal VT      TXN 2 task   Fractal VT
qry X        1 1             qry A        2 1
qry Z        1 2             qry B        2 2
upd Z        1 3             upd C        2 3
qry U        1 4             upd Z        2 4
upd V        1 5             upd K        2 5

slide-17
SLIDE 17

Example: Database transactions in Fractal

Fractal VT captures all ordering information
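To make this concrete, the sketch below sorts the twelve tasks of the example by VT. The VT values are the ones on the slide; the task labels and the representation of a VT as a vector of integers are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

using VT = std::vector<int>;
using NamedTask = std::pair<VT, std::string>;

// The example's tasks, deliberately listed out of order.
std::vector<NamedTask> example_tasks() {
    return {
        {{2, 4}, "TXN 2: upd Z"}, {{1, 4}, "TXN 1: qry U"}, {{2}, "T2"},
        {{1, 1}, "TXN 1: qry X"}, {{2, 1}, "TXN 2: qry A"}, {{1, 5}, "TXN 1: upd V"},
        {{2, 5}, "TXN 2: upd K"}, {{1}, "T1"},              {{1, 3}, "TXN 1: upd Z"},
        {{2, 2}, "TXN 2: qry B"}, {{1, 2}, "TXN 1: qry Z"}, {{2, 3}, "TXN 2: upd C"},
    };
}

// The order in which tasks must appear to execute: ascending VT.
std::vector<std::string> serial_order() {
    std::vector<NamedTask> tasks = example_tasks();
    std::sort(tasks.begin(), tasks.end());  // pair compares the VT first, lexicographically
    std::vector<std::string> names;
    for (const NamedTask& t : tasks) names.push_back(t.second);
    return names;
}
```

`serial_order()` places T1 and all five TXN 1 tasks ahead of T2 and all five TXN 2 tasks, so each transaction is atomic with respect to the other even though its tasks are separate units of execution.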

slide-18
SLIDE 18

Swarm [MICRO’15]: An efficient substrate for ordered speculation

Swarm executes tasks speculatively and out of order

  • Large hardware task queues
  • Scalable ordered commits
  • Scalable ordered speculation

Efficiently supports tiny speculative tasks

[Figure: 64-tile, 256-core chip with Mem / IO on four sides. Tile organization: four cores with L1I/D caches, L2, L3 slice, router, task unit]

slide-19
SLIDE 19

Fractal features

Fractal VT construction requires no centralized structures

Fractal VT assigns order dynamically

Hardware supports a small number of concurrent depths

  • “Zooming” operations allow for unbounded nesting
  • Spill tasks from shallower domains to memory
  • Parallelism compounds quickly with depth

See the paper for more details!

slide-20
SLIDE 20

[Figure, built up step by step: execution timeline over Core 1 to Core 4]

TXN 1: query X, query Z, update Z, query U, update V

TXN 2: query A, query B, update C, update Z, update K

Tasks in the order they appear on the timeline, with their fractal VTs:

  • T1 (1)
  • qry X (1 1), qry U (1 4)
  • qry Z (1 2), upd V (1 5)
  • T2 (2)
  • upd Z (2 4), qry A (2 1)
  • upd K (2 5), qry B (2 2)
  • upd Z (1 3)
  • Abort task
  • upd C (2 3)
  • upd Z (2 4) runs again
  • upd K (2 5) runs again

Tracking, conflict detection (CD) at level of fine-grain tasks

Selective aborts waste less work
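The abort step can be sketched as a task-level conflict check. This is an illustrative model, not the hardware mechanism: the read and write sets per task are assumptions inferred from the task names, and aborting the dependents of an aborted task is not modeled.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

using VT = std::vector<int>;

struct SpecTask {
    std::string name;
    VT vt;
    std::set<std::string> reads;   // data read so far (assumed)
    std::set<std::string> writes;  // data written so far (assumed)
};

bool touches(const SpecTask& t, const std::string& key) {
    return t.reads.count(key) > 0 || t.writes.count(key) > 0;
}

// When `writer` writes `key`, every in-flight task with a LATER VT that has
// already read or written `key` ran too early and must abort. Tasks with an
// earlier VT, and tasks that never touched `key`, keep their work.
std::vector<std::string> aborted_by_write(const SpecTask& writer,
                                          const std::string& key,
                                          const std::vector<SpecTask>& in_flight) {
    std::vector<std::string> victims;
    for (const SpecTask& t : in_flight) {
        if (writer.vt < t.vt && touches(t, key)) victims.push_back(t.name);
    }
    return victims;
}

// The situation on the slide: upd Z (1 3) writes Z after upd Z (2 4) already ran.
std::vector<std::string> slide_example() {
    SpecTask upd_z_txn1{"upd Z (1 3)", {1, 3}, {}, {"Z"}};
    std::vector<SpecTask> in_flight = {
        {"qry Z (1 2)", {1, 2}, {"Z"}, {}},  // earlier VT: unaffected
        {"qry A (2 1)", {2, 1}, {"A"}, {}},  // different data: unaffected
        {"upd Z (2 4)", {2, 4}, {}, {"Z"}},  // later VT, same data: aborted
    };
    return aborted_by_write(upd_z_txn1, "Z", in_flight);
}
```

`slide_example()` returns only upd Z (2 4). With transaction-sized tasks the same conflict would discard all of TXN 2; with task-level tracking, qry A and the rest of TXN 2's independent work survive.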

slide-36
SLIDE 36

[Figure, built up step by step: execution timeline over Core 1 to Core 4]

TXN 1: query X, query Z, update Z, query U, update V

TXN 2: query A, query B, update C, update Z, update K

Tasks shown without a VT (committed): T1 first, then qry X and qry Z

Tasks still carrying VTs: upd Z (1 3), qry U (1 4), upd V (1 5), T2 (2), qry A (2 1), qry B (2 2), upd C (2 3), upd Z (2 4), upd K (2 5)

  • Task-level tracking
  • Task-level conflict detection (CD)
  • Selective aborts
  • Commit parent before child completes

slide-40
SLIDE 40

TXN 1 = X, TXN 2 = Y

[Figure: final execution timeline over Core 1 to Core 4 for TXN 1 (query X, query Z, update Z, query U, update V) and TXN 2 (query A, query B, update C, update Z, update K)]

  • Task-level tracking
  • Task-level conflict detection (CD)
  • Selective aborts
  • Commit parent before child completes

Fractal unlocks the benefits of fine-grain parallelism

slide-41
SLIDE 41

Evaluation


slide-42
SLIDE 42

Methodology

Event-driven, Pin-based simulator

Target system: 256-core, 64-tile chip

  • 64 MB shared L3 (1 MB/tile)
  • 256 KB per-tile L2s
  • 16 KB per-core L1s
  • 16K task queue entries (64/core)
  • 4K commit queue entries (16/core)
  • In-order, single-issue, scoreboarded cores

Scalability experiments from 1 to 256 cores

  • Scaled-down systems have fewer tiles

Applications

  • Unordered (STAMP): labyrinth, bayes
  • Ordered: color, msf, silo, maxflow, mis
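The per-tile and per-core figures above are consistent with the chip-wide totals, which the following compile-time check spells out. The constant names are hypothetical, and four cores per tile is inferred from the tile diagram and from 256 cores over 64 tiles.

```cpp
#include <cassert>

// Parameters taken from the methodology slide; names are illustrative.
constexpr int kTiles = 64;
constexpr int kCoresPerTile = 4;        // inferred: 256 cores / 64 tiles
constexpr int kL3PerTileMB = 1;
constexpr int kTaskQueuePerCore = 64;
constexpr int kCommitQueuePerCore = 16;

constexpr int total_cores() { return kTiles * kCoresPerTile; }
constexpr int l3_total_mb() { return kTiles * kL3PerTileMB; }
constexpr int task_queue_entries() { return total_cores() * kTaskQueuePerCore; }
constexpr int commit_queue_entries() { return total_cores() * kCommitQueuePerCore; }

// Each total matches the figure quoted on the slide.
static_assert(total_cores() == 256, "256-core chip");
static_assert(l3_total_mb() == 64, "64 MB shared L3");
static_assert(task_queue_entries() == 16 * 1024, "16K task queue entries");
static_assert(commit_queue_entries() == 4 * 1024, "4K commit queue entries");
```

Because scaled-down systems have fewer tiles, changing `kTiles` alone scales the core count, L3 capacity, and queue capacity together in this model.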

slide-43
SLIDE 43

Fractal uncovers abundant nested parallelism

Flat: Large atomic tasks

Fractal: Nested parallelism exposed through fine-grained tasks

slide-44
SLIDE 44

Fractal uncovers abundant nested parallelism

[Figure: speedup vs. core count (1c, 128c, 256c) for maxflow, bayes, and labyrinth, comparing Flat and Fractal; annotated peak of 322x]

Speedup: Flat 1x to 4.9x, Fractal 88x to 322x

Average task length (cycles)

           maxflow   bayes    labyrinth
Flat       3260      1.8 M    16 M
Fractal    373       3590     220

slide-45
SLIDE 45

Fractal avoids over-serialization

[Figure: speedup vs. core count (1c, 128c, 256c) for mis, color, and msf, comparing Flat, Swarm, and Fractal; annotated peak of 145x]

Speedup: Flat 26x to 98x, Swarm 21x to 119x, Fractal 40x to 145x

Average task length (cycles)

           mis    color   msf
Flat       162    633     113
Fractal    115    96      49

slide-46
SLIDE 46

Conclusion

Speculative systems must extract nested parallelism in order to scale large, complex, real-world applications

Fractal: An execution model for fine-grain nested speculative parallelism

  • Decouple atomicity from parallelism
  • Guarantee atomicity by ordering tasks

Fractal unlocks the benefits of fine-grain speculative parallelism

  • Parallelizes many challenging workloads
  • Enables composition of speculative parallel algorithms

slide-47
SLIDE 47

Thank You! Questions?
