Executing a Program on the MIT Tagged Token Dataflow Architecture
Arvind and Nikhil
Notes on the paper: this is a Big "A" Architecture paper. It's a PL, an ISA, an execution model, and a dash of hardware.

Execution Models:
3
Von Neumann (CMP)
Program counter: centralized, sequential
Two serialization points: instruction fetch, memory access
4
Not a new idea [Dennis, ISCA'75]. Programs are dataflow graphs; instructions fire when data arrives.
Instructions act independently. All ready instructions can fire at once: massive parallelism.
[Animation: two tokens carrying the value 2 arrive on a + node's input arcs; when both are present, the node fires and emits a token carrying 4.]
5
A[j + i*i] = i; b = A[i*j];

Mul   t1 ← i, j
Mul   t2 ← i, i
Add   t3 ← A, t1
Add   t4 ← j, t2
Add   t5 ← A, t4
Store (t5) ← i
Load  b ← (t3)
6–11

A[j + i*i] = i; b = A[i*j];

[Animated dataflow graph: i and j feed two * nodes; + nodes compute j + i*i and the addresses A + (j + i*i) and A + i*j; a Store writes i and a Load produces b. Successive slides step tokens through the graph.]
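The firing rule the slides animate can be sketched as a tiny interpreter (a minimal sketch with invented names, not the TTDA pipeline): a node fires as soon as tokens are present on all of its input arcs.

```python
# Minimal static-dataflow interpreter sketch (hypothetical; not the TTDA).
# A node fires when tokens are present on all of its input arcs.

def run(graph, tokens):
    """graph: {node: (op, [input names], output name)}; tokens: {name: value}."""
    fired = set()
    progress = True
    while progress:
        progress = False
        for node, (op, ins, out) in graph.items():
            if node not in fired and all(i in tokens for i in ins):
                tokens[out] = op(*[tokens[i] for i in ins])
                fired.add(node)
                progress = True
    return tokens

# Dataflow graph for: b = A[i*j]  (A is a base address, M a toy memory)
graph = {
    "mul":  (lambda a, b: a * b,        ["i", "j"],  "t1"),
    "add":  (lambda a, b: a + b,        ["A", "t1"], "t3"),
    "load": (lambda addr, mem: mem[addr], ["t3", "M"], "b"),
}
mem = {10 + k: k * 100 for k in range(10)}      # toy memory, A = 10
result = run(graph, {"i": 2, "j": 3, "A": 10, "M": mem})
print(result["b"])   # loads mem[10 + 2*3]
```

Note that the `while` loop repeatedly scans for ready nodes; a real machine would instead enqueue a node the moment its last operand arrives.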
Use a switch operator
No wasted work. Natural correspondence to if-then. Can build loops.

Use a gated phi function (a la SSA)
More parallelism -- defer predicate computation. Not suitable for loops. Computing the predicate is tricky (but solved).
12
[Figure: a phi node and a Switch node, each with T and F arcs and a predicate input P.]
Use a "steering" operator.
13
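The two conditional operators can be sketched as token-level rules (a hedged sketch; the helper names are mine, not the paper's): a switch steers its input token onto exactly one output arc, so the untaken branch never fires, while a gated phi lets both branches compute and selects one result once the predicate arrives.

```python
# Sketch of the two conditional operators (hypothetical helper names).

def switch(pred, value):
    """Steer `value` to the T or F output arc; the other arc gets nothing,
    so the untaken branch never receives a token (no wasted work)."""
    return ("T", value) if pred else ("F", value)

def gated_phi(pred, t_val, f_val):
    """Both branches may have computed speculatively in parallel;
    the predicate picks which result flows on."""
    return t_val if pred else f_val

arc, v = switch(True, 42)
print(arc, v)                    # token appears only on the T arc
print(gated_phi(False, 1, 2))    # selects the F branch's result
```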
14
Static dataflow: exactly one token on each dataflow arc at one time.
Finite state (~ the size of the dataflow graph). Scheduling is easy.
Parallelism limited by dataflow graph size (i.e., static instruction count).
No loop parallelism.
15
Dynamic dataflow: multiple tokens on an arc at one time.
Parallelism is possible -- pipeline iterations through the loop's graph.
Unbounded state. Circulation speed mismatch -- mismatched inputs. Tags are required.
16
[Figure: untagged tokens A, B, S on an arc become tagged tokens 1:A, 2:A, 3:A, 1:B, 2:B, 3:B, 1:S, 2:S, 3:S.]
Tags distinguish between different dynamic instances.

Tag management in TTDA:
Tags are the address of an activation record (aka stack frame).
A dynamic instance of an "instruction block" has a tag.
A central manager allocates/reclaims them.
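Tag matching can be sketched as a waiting-matching store keyed by (tag, destination instruction) (a minimal sketch with invented names; the real TTDA uses a hardware waiting-matching section): the first operand of a two-input instruction waits until its partner carrying the same tag arrives, so tokens from different loop iterations never pair up.

```python
# Sketch of tagged-token matching (hypothetical; names are mine).
# Key = (tag, dest instruction). A two-input instruction fires only when
# both operands carrying the SAME tag have arrived.

waiting = {}

def arrive(tag, dest, port, value, fire):
    key = (tag, dest)
    if key in waiting:
        other_port, other_value = waiting.pop(key)   # partner found: match
        operands = {port: value, other_port: other_value}
        fire(tag, dest, operands["L"], operands["R"])
    else:
        waiting[key] = (port, value)                 # first operand: wait

results = []
def fire(tag, dest, left, right):
    results.append((tag, dest, left + right))        # pretend dest is an Add

# Tokens from iterations 1 and 2 interleave but still match correctly.
arrive(1, "add", "L", 10, fire)
arrive(2, "add", "L", 30, fire)
arrive(2, "add", "R", 4, fire)
arrive(1, "add", "R", 7, fire)
print(results)   # [(2, 'add', 34), (1, 'add', 17)]
```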
17
How big should the threads that "fire" be?

Fine-grain: in the limit, each instruction is a thread. Maximum parallelism. Lots of synchronization overhead. Bounded # of inputs.

Coarse-grain: potentially less parallelism (in practice?), less synchronization overhead, and variable inputs.

It's hard to beat straight-line code on a pipelined machine: 5 stages == 5-way parallelism. Pretty good for short threads.
18
Building well-formed graphs: in von Neumann ISAs any sequence of instructions is valid; dataflow graphs have complex rules for well-formedness.
Detecting completion: it is hard to tell when a fully distributed system is "finished".
Preventing tag explosion: k-loop bounding et al.
Executing "normal" languages.
19
[Figure: loop example; without tags, tokens for j can arrive out of order and pile up on an arc.]
20
Id: elegant.
Determinate. Functional. Non-strict. Implicit parallelism. I-structures.
Non-strictness is the least intuitive property: it exposes enormous parallelism, and leads to mind-bending code.
21
I-structures: a sort of dataflow-enabled storage element. Simple rules:
Write/initialize once. A read from an uninitialized I-structure blocks. A read from an initialized I-structure returns. A write to an uninitialized I-structure unblocks reads. A write to an initialized I-structure is an error.
Implementation is tricky: you need a queue for deferred reads.
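The rules above can be sketched directly (a minimal single-threaded sketch with invented names; a real machine blocks hardware reads rather than queuing Python callbacks): each cell is empty or full, and a read of an empty cell is queued as a continuation that runs when the write arrives.

```python
# I-structure sketch (hypothetical names). Each cell: write-once storage
# plus a queue of deferred reads that unblock when the write arrives.

EMPTY = object()

class IStructure:
    def __init__(self, n):
        self.cells = [EMPTY] * n
        self.deferred = [[] for _ in range(n)]   # queued read continuations

    def read(self, i, k):
        """Read cell i; call continuation k(value) now or when written."""
        if self.cells[i] is EMPTY:
            self.deferred[i].append(k)           # block: queue the reader
        else:
            k(self.cells[i])                     # already full: return now

    def write(self, i, v):
        if self.cells[i] is not EMPTY:
            raise RuntimeError("write to initialized I-structure")
        self.cells[i] = v
        for k in self.deferred[i]:               # unblock queued readers
            k(v)
        self.deferred[i].clear()

out = []
s = IStructure(4)
s.read(0, out.append)    # read-before-write: deferred
s.write(0, 99)           # write unblocks the queued read
s.read(0, out.append)    # read-after-write: returns immediately
print(out)               # [99, 99]
```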
22
Id never really went anywhere. This paper is a good snapshot of the late '80s.
Eventually gives rise to OOO execution (a la ...).
Excellent example of vertical co-design: they rethought the whole system. Almost always impractical; often yields great ideas.
23
How do you execute normal languages?
How do you multitask?
How does function linking work?
Top-to-bottom design: where's the data?
Would I-structures be useful today?
24