SLIDE 1

Executing a Program on the MIT Tagged Token Dataflow Architecture

Arvind and Nikhil

SLIDE 2

Notes on the paper

  • This is a “Big A” Architecture paper
  • It’s a PL, an ISA, an execution model, and a dash of hardware
SLIDE 3

Execution Models: Von Neumann

  • Von Neumann (CMP)
      • Program counter: centralized, sequential
  • Two serialization points
      • Instruction fetch
      • Memory access

SLIDE 4

Execution Model: Dataflow

  • Not a new idea [Dennis, ISCA ’75]
      • Programs are dataflow graphs
      • Instructions fire when data arrives (see the sketch below)
  • Instructions act independently
      • All ready instructions can fire at once
      • Massive parallelism

[Figure: a small dataflow graph of two add nodes awaiting input tokens.]
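The firing rule is concrete enough to sketch. Below is a minimal token-driven interpreter in Python; the Node class, the graph encoding, and the scheduling loop are invented for this sketch and are not the TTDA's actual mechanism.

    from collections import defaultdict

    # Minimal sketch of the dataflow firing rule: a node fires as soon as
    # all of its operands have arrived; there is no program counter.
    class Node:
        def __init__(self, name, op, arity, dests):
            self.name, self.op, self.arity = name, op, arity
            self.dests = dests            # list of (node_name, input_slot)
            self.waiting = {}             # input_slot -> value received so far

    def run(nodes, initial_tokens):
        ready = list(initial_tokens)      # tokens are (node_name, slot, value)
        while ready:
            name, slot, value = ready.pop()
            node = nodes[name]
            node.waiting[slot] = value
            if len(node.waiting) == node.arity:        # firing rule
                result = node.op(*[node.waiting[i] for i in range(node.arity)])
                node.waiting.clear()
                for dest, dslot in node.dests:
                    if dest == "out":
                        print(f"{name} -> output {result}")
                    else:
                        ready.append((dest, dslot, result))

    # A two-adder graph like the figure's: add1's result feeds add2.
    nodes = {
        "add1": Node("add1", lambda a, b: a + b, 2, [("add2", 0)]),
        "add2": Node("add2", lambda a, b: a + b, 2, [("out", 0)]),
    }
    run(nodes, [("add1", 0, 2), ("add1", 1, 2), ("add2", 1, 3)])

With the two 2-tokens present, add1 fires, and its 4 then lets add2 fire; nothing in the loop consults an instruction pointer.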

SLIDES 5-7

[Animation frames of Slide 4's figure: tokens carrying the value 2 arrive at the add nodes, an add fires, and a result token (4) is produced.]
SLIDE 8

Von Neumann example

A[j + i*i] = i;  b = A[i*j];

    Mul   t1 ← i, j
    Mul   t2 ← i, i
    Add   t3 ← A, t1
    Add   t4 ← j, t2
    Add   t5 ← A, t4
    Store (t5) ← i
    Load  b ← (t3)

SLIDE 9

Dataflow example

A[j + i*i] = i; b = A[i*j];

[Figure: the same program as a dataflow graph. i and j feed two multiply nodes; add nodes compute the two addresses from A; one address feeds the Store of i, the other feeds the Load that produces b.]
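To make the available parallelism visible, here is a hedged sketch of the example's dependences (node names and the edge list are invented; the store-to-load arc is a conservative assumption that keeps the sequential program's memory order when aliasing cannot be ruled out). Walking the graph wave by wave shows which instructions could fire together.

    deps = {                      # instruction -> instructions it must wait for
        "mul_ij":  [],            # t1 = i * j
        "mul_ii":  [],            # t2 = i * i
        "add_At1": ["mul_ij"],    # t3 = A + t1
        "add_jt2": ["mul_ii"],    # t4 = j + t2
        "add_At4": ["add_jt2"],   # t5 = A + t4
        "store":   ["add_At4"],   # A[t5] = i
        "load":    ["add_At1", "store"],   # b = A[t3], conservatively after the store
    }

    done = set()
    wave = 1
    while len(done) < len(deps):
        ready = [n for n in deps if n not in done
                 and all(d in done for d in deps[n])]
        print(f"wave {wave}: {ready} fire in parallel")
        done.update(ready)
        wave += 1

The two multiplies fire in the first wave and the two address adds in the second, so the seven instructions finish in five waves instead of seven sequential steps.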

SLIDES 10-14

[Animation frames of Slide 9's figure: tokens flow step by step through the dataflow graph.]
SLIDE 15

Conditionals

  • Use a switch operator (see the sketch below)
      • No wasted work
      • Natural correspondence to if-then
      • Can build loops
  • Use a gated phi function (à la SSA)
      • More parallelism: the predicate computation can be deferred
      • Not suitable for loops
      • Computing the predicate is tricky (but solved)

[Figure: a phi node selecting between T and F inputs under predicate P, and a switch node steering its input to a T or F output under predicate P.]
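A hedged sketch of the two operators as plain functions (names and token representation invented): switch steers a token onto one of two output arcs, while phi selects between two already-produced values.

    def switch(value, predicate):
        """Steer the input token to the T or F output arc; the other arc
        receives nothing, so the untaken branch does no work."""
        return ("T", value) if predicate else ("F", value)

    def phi(t_value, f_value, predicate):
        """Select one of two already-computed values.  Both branches may have
        executed speculatively, so the predicate can be computed late."""
        return t_value if predicate else f_value

    # abs(x) with a switch: only one branch ever receives the token.
    x = -3
    arc, v = switch(x, x >= 0)
    result = v if arc == "T" else -v
    print(result)   # 3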

SLIDE 16

Conditionals

  • Use a “steering” operator

SLIDE 17

Loops


SLIDE 18

Managing parallelism: Static dataflow

  • Exactly one input on each dataflow arc at one time (see the sketch below)
  • Finite state (~ the size of the dataflow graph)
  • Scheduling is easy
  • Parallelism limited by dataflow graph size (i.e., static instruction count)
  • No loop parallelism

[Figure: an add node with inputs A and B; each arc may hold at most one token at a time.]
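A hedged sketch of the one-token-per-arc constraint (the class and its error behaviour are invented for illustration): the producer of the next iteration must wait until the consumer drains the arc, which is why static dataflow cannot overlap loop iterations.

    class Arc:
        def __init__(self):
            self.slot = None                       # at most one token in flight

        def send(self, value):
            if self.slot is not None:
                raise RuntimeError("arc occupied: next iteration must wait")
            self.slot = value

        def receive(self):
            value, self.slot = self.slot, None
            return value

    arc = Arc()
    arc.send(1)          # iteration 1's token
    try:
        arc.send(2)      # iteration 2 cannot overlap: no loop parallelism
    except RuntimeError as e:
        print(e)
    print(arc.receive()) # consumer drains iteration 1's token -> 1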

SLIDE 19

Managing Parallelism: Dynamic dataflow

  • Dynamic dataflow
      • Multiple inputs on an arc at one time
      • Parallelism is possible: pipeline iterations through the loop’s graph
      • Unbounded state
      • Circulation speed mismatch: inputs can be mis-matched
      • Tags are required (see the sketch below)

[Figure: an add node with tokens from three loop iterations (A, B, S) piled up on its arcs; with tags, each token is labeled by iteration (1:A, 2:A, 3:A, ...) so only inputs from the same iteration are matched.]
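A hedged sketch of tag matching (the dictionary-based waiting store and the token layout are invented; here the tag is just an iteration number): two operands pair up only when their tags agree, so out-of-order arrival is harmless.

    waiting = {}   # (tag, instruction) -> first operand to arrive

    def arrive(tag, instruction, port, value, fire):
        key = (tag, instruction)
        if key in waiting:
            other_port, other_value = waiting.pop(key)
            operands = {port: value, other_port: other_value}
            fire(tag, instruction, operands[0], operands[1])
        else:
            waiting[key] = (port, value)

    def fire(tag, instr, a, b):
        print(f"iteration {tag}: {instr}({a}, {b}) = {a + b}")

    # Tokens from different iterations arrive out of order, but the tag
    # keeps iteration 2's operands from pairing with iteration 1's.
    arrive(1, "add", 0, 10, fire)
    arrive(2, "add", 0, 30, fire)
    arrive(2, "add", 1, 4, fire)   # iteration 2: add(30, 4) = 34
    arrive(1, "add", 1, 7, fire)   # iteration 1: add(10, 7) = 17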

SLIDE 20

Dataflow tags

  • Tags distinguish between different dynamic instances of the same value
  • Tag management in TTDA
      • Tags are the address of an activation record (aka stack frame)
      • A dynamic instance of an “instruction block” has a tag
      • A central manager allocates/reclaims them

SLIDE 21

Dataflow Granularity

  • How big should the threads that “fire” be?
  • Fine-grain
      • In the limit, each instruction is a thread
      • Maximum parallelism
      • Lots of synchronization overhead
      • Bounded # of inputs
  • Coarse-grain
      • Potentially less parallelism (in practice?)
      • Less synchronization overhead and variable inputs
  • It’s hard to beat straight-line code on a pipelined machine
      • 5 stages == 5-way parallelism
      • Pretty good for short threads

SLIDE 22

Challenges in Dataflow Execution

  • Building well-formed graphs
      • In von Neumann ISAs any sequence of instructions is valid
      • Complex rules for well-formed dataflow graphs
  • Detecting completion
      • It is hard to tell when a fully distributed system is “finished”
  • Preventing tag explosion
      • k-loop bounding et al.
  • Executing “normal” languages

SLIDE 23

  • j will probably run ahead of s
      • Tokens pile up!
  • But it might not
      • Tokens out of order!

SLIDE 24

Id

  • Elegant
      • Determinate
      • Functional
      • Non-strict
      • Implicit parallelism
      • I-structures
  • Non-strictness is the least intuitive property
      • Exposes enormous parallelism
      • Leads to mind-bending code

SLIDE 25

I-structures

  • A sort of dataflow-enabled storage element
  • Simple rules
      • Write/initialize once
      • A read from an uninitialized I-structure blocks
      • A read from an initialized I-structure returns
      • A write to an uninitialized I-structure unblocks reads
      • A write to an initialized I-structure is an error
  • Implementation is tricky: you need a queue for blocked reads (see the sketch below)
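A hedged sketch of a single I-structure cell following the rules above (the class name and the threading-based blocking are choices made for this sketch, not the TTDA's hardware mechanism; the Event stands in for the queue of deferred reads).

    import threading

    class IStructureCell:
        def __init__(self):
            self._event = threading.Event()   # stands in for the deferred-read queue
            self._value = None

        def read(self):
            self._event.wait()                # blocks until the cell is written
            return self._value

        def write(self, value):
            if self._event.is_set():
                raise RuntimeError("I-structure written twice")
            self._value = value
            self._event.set()                 # unblocks all deferred reads

    cell = IStructureCell()
    reader = threading.Thread(target=lambda: print("read:", cell.read()))
    reader.start()          # blocks: the cell is still empty
    cell.write(42)          # releases the blocked read
    reader.join()           # prints "read: 42"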


SLIDE 26

In context

  • Id never really went anywhere
  • This paper is a good snapshot of late-’80s dataflow thinking
  • Eventually gives rise to OoO execution (à la HPS)
  • Excellent example of vertical co-design
      • They rethought the whole system
      • Almost always impractical
      • Often yields great ideas

SLIDE 27

Bits from your summaries

  • How do you execute normal languages?
  • How do you multitask?
  • How does function linking work?
  • Top-to-bottom design
  • Where’s the data?
  • Would I-structures be useful today?