Compositional Dataflow Circuits. Stephen A. Edwards, Richard Townsend. (Slide transcript.)



SLIDE 1

Compositional Dataflow Circuits

Stephen A. Edwards Richard Townsend Martha A. Kim

Columbia University

MEMOCODE, Vienna, Austria, October 1, 2017

SLIDE 2

gcd(a, b) = if a = b then a else if a < b then gcd(a, b − a) else gcd(a − b, b)
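The recursive definition can be transcribed directly as software. A minimal Python sketch of the same subtraction-only algorithm the talk compiles to hardware:

```python
def gcd(a: int, b: int) -> int:
    """Subtraction-only GCD, mirroring the slide's recursive definition."""
    if a == b:
        return a
    elif a < b:
        return gcd(a, b - a)      # reduce the larger operand, b
    else:
        return gcd(a - b, b)      # reduce the larger operand, a

print(gcd(100, 2))  # -> 2
```

The rest of the talk turns exactly this control and data flow into a circuit of dataflow blocks.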

SLIDE 3

gcd(a, b) = if a = b then a else if a < b then gcd(a, b − a) else gcd(a − b, b)

[Diagram: input streams a and b]

SLIDE 4

gcd(a, b) = if a = b then a else if a < b then gcd(a, b − a) else gcd(a − b, b)

[Diagram: a and b each enter through a mux (ports 1/0); an initial token on each mux select admits the first inputs; an = block compares the values]
SLIDE 5

gcd(a, b) = if a = b then a else if a < b then gcd(a, b − a) else gcd(a − b, b)

[Diagram: a fork duplicates the compared values; the = result steers a pair of demuxes (ports 1/0): on equality one value leaves as gcd(a, b) and the other is discarded]
SLIDE 6

gcd(a, b) = if a = b then a else if a < b then gcd(a, b − a) else gcd(a − b, b)

[Diagram: a < block is added to compare a and b on the not-equal path]
SLIDE 7

gcd(a, b) = if a = b then a else if a < b then gcd(a, b − a) else gcd(a − b, b)

[Diagram: the < result steers a second pair of demuxes (ports 1/0) to select which operand will be reduced]
SLIDE 8

gcd(a, b) = if a = b then a else if a < b then gcd(a, b − a) else gcd(a − b, b)

[Diagram: a pair of muxes (ports 1/0) routes the demux outputs back toward the input muxes, closing the loops]
SLIDE 9

gcd(a, b) = if a = b then a else if a < b then gcd(a, b − a) else gcd(a − b, b)

Townsend et al., CC 2017

[Diagram: the complete gcd circuit: input muxes with initial tokens, fork, = and < comparators, demuxes, discard, two subtractors computing b − a and a − b on the feedback paths, and the gcd(a, b) output]
SLIDE 10

Patience Through Handshaking

Want patient blocks to handle delays from:

  • Memory systems
  • Data-dependent computations
  • Full buffers
  • Shared resources
  • Busy computational units

SLIDE 11

Patience Through Handshaking

Want patient blocks to handle delays from:

  • Memory systems
  • Data-dependent computations
  • Full buffers
  • Shared resources
  • Busy computational units

[Diagram: data and valid flow from the upstream block to the downstream block; ready flows back upstream]

valid  ready  Meaning
  1      1    Token transferred
  1      0    Token valid; held
  0      -    No token to transfer

Related: latency-insensitive design (Carloni et al.), elastic circuits (Cortadella et al.), FIFOs with backpressure
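The table's three cases can be captured in a few lines. A behavioral sketch (not from the paper; the function name is mine):

```python
def handshake(valid: bool, ready: bool) -> str:
    """Classify one cycle of a valid/ready handshake, per the table above."""
    if valid and ready:
        return "token transferred"
    if valid:
        return "token valid; held"    # upstream must hold the data stable
    return "no token to transfer"     # ready is a don't-care here
```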

SLIDE 12

Combinational Function Block

Strict/unit rate: all input tokens are required to produce an output

[Diagram: block computing f, with inputs in0 and in1 and output out]

Datapath: the combinational function ignores flow control

SLIDE 13

Combinational Function Block

Strict/unit rate: all input tokens are required to produce an output

[Diagram: block computing f, with inputs in0 and in1 and output out]

Valid network: output valid if both inputs are valid

SLIDE 14

Combinational Function Block

Strict/unit rate: all input tokens are required to produce an output

[Diagram: block computing f, with inputs in0 and in1 and output out]

Ready network: input tokens are consumed if the output token is consumed (output is valid and ready)
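Combining the three slides, the flow control of a two-input unit-rate block reduces to two equations. A Python sketch (signal names are mine):

```python
def unit_rate_handshake(in0_valid, in1_valid, out_ready):
    """Flow control for a strict two-input combinational block.

    Valid network: the output is valid only when both inputs are.
    Ready network: inputs are consumed only when the output token is
    consumed, i.e., the output is both valid and ready.
    """
    out_valid = in0_valid and in1_valid
    consumed = out_valid and out_ready
    return out_valid, consumed, consumed   # (out_valid, in0_ready, in1_ready)
```

Note that out_valid never looks at out_ready, a discipline the talk makes explicit later.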

SLIDE 15

Multiplexer Block

[Diagram: mux with data inputs in0, in1, in2, a select input, and output out; a decoder on select drives the per-input handshakes]

SLIDE 16

Demultiplexer Block

[Diagram: demux with input in, a select input, and outputs out0, out1, out2; a decoder on select steers the token to one output]
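A behavioral sketch of the mux's handshaking (my own Python model, not the paper's circuit; the demux is the dual, steering its single input to out[sel]):

```python
def mux_handshake(sel_valid, sel, ins_valid, out_ready):
    """Dataflow mux: a valid select token admits a token from input in[sel].

    The decoder asserts ready only on the selected input, and only when
    the output token is actually transferred; the select token is
    consumed in the same cycle.
    """
    out_valid = sel_valid and ins_valid[sel]
    fire = out_valid and out_ready                        # token moves now
    ins_ready = [fire and i == sel for i in range(len(ins_valid))]
    sel_ready = fire
    return out_valid, ins_ready, sel_ready
```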

SLIDE 17

Buffering a Linear Pipeline (Point 1/4)

Combinational block

SLIDE 18

Buffering a Linear Pipeline (Point 1/4)

Long Combinational Path (Data + Valid)

SLIDE 19

Buffering a Linear Pipeline (Point 1/4)

Data buffer: Pipeline register with valid, enable


SLIDE 20

Buffering a Linear Pipeline (Point 1/4)


Long Combinational Path (Ready)

SLIDE 21

Buffering a Linear Pipeline (Point 1/4)

Control buffer: a register diverts the in-flight token when downstream suddenly stops

Cao et al., MEMOCODE 2015. Inspired by Carloni's latency-insensitive design (e.g., MEMOCODE 2007)
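Behaviorally, a data buffer plus a control buffer acts like a two-entry FIFO whose input ready depends only on registered occupancy, so neither the forward (data + valid) nor the backward (ready) combinational path crosses it. A cycle-level Python sketch of that behavior (my own model, not the paper's RTL):

```python
class BufferPair:
    """Two-entry elastic buffer: breaks both the valid and ready paths."""

    def __init__(self):
        self.slots = []                        # at most two buffered tokens

    def in_ready(self):
        return len(self.slots) < 2             # function of registered state only

    def cycle(self, in_valid, in_data, out_ready):
        """Advance one clock cycle; returns (out_valid, out_data)."""
        out_valid = bool(self.slots)
        out_data = self.slots[0] if out_valid else None
        accept = in_valid and self.in_ready()  # sampled at cycle start
        if out_valid and out_ready:
            self.slots.pop(0)                  # downstream took a token
        if accept:
            self.slots.append(in_data)         # latch the incoming token
        return out_valid, out_data
```

The second slot plays the control buffer's role: it catches the token already in flight when downstream suddenly stops.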

SLIDE 22

The Problem with Fork

Combinational block: inputs ready when both valid and output ready
SLIDE 23

The Problem with Fork

Combinational block: inputs ready when both valid and output ready
SLIDE 24

The Problem with Fork

Fork: outputs valid only when all are ready
SLIDE 25

The Problem with Fork

Fork: outputs valid only when all are ready
SLIDE 26

The Problem with Fork

Fork: outputs valid only when all are ready. Oops: combinational cycle! This is not compositional.
SLIDE 27

The Solution to Combinational Loops (Point 2/4)

[Diagram: blocks connected by valid and ready handshake wires]
SLIDE 28

The Solution to Combinational Loops (Point 2/4)

[Diagram: blocks connected by valid and ready handshake wires]
SLIDE 29

The Solution to Combinational Loops (Point 2/4)

Allowed: combinational paths from valid to ready
SLIDE 30

The Solution to Combinational Loops (Point 2/4)

Allowed: combinational paths from valid to ready

Prohibited: combinational paths from ready to valid
SLIDE 31

The Solution to Fork: A Little State (Point 3/4)

[Diagram: fork with input in and outputs out0, out1, out2]

Valid out ignores ready of other outputs
SLIDE 32

The Solution to Fork: A Little State (Point 3/4)

[Diagram: fork with input in and outputs out0, out1, out2]

Valid out ignores ready of other outputs

Flip-flop, set after a token is sent, suppresses duplicates
SLIDE 33

The Solution to Fork: A Little State (Point 3/4)

[Diagram: fork with input in and outputs out0, out1, out2]

Valid out ignores ready of other outputs

Flip-flop, set after a token is sent, suppresses duplicates

Input consumed once one token has been sent on every output
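The three slides together describe a fork whose outputs fire independently. A Python sketch of that state machine (my own model; one "sent" flip-flop per output):

```python
class Fork:
    """Fork whose outputs race ahead: each output's valid ignores the
    other outputs' ready, and a per-output flip-flop suppresses
    duplicate tokens until every consumer has taken its copy."""

    def __init__(self, n_outputs):
        self.sent = [False] * n_outputs        # flip-flop per output

    def cycle(self, in_valid, outs_ready):
        """Returns (outs_valid, in_ready) for this cycle."""
        outs_valid = [in_valid and not s for s in self.sent]
        fired = [v and r for v, r in zip(outs_valid, outs_ready)]
        done = [s or f for s, f in zip(self.sent, fired)]
        in_ready = all(done)                   # consume input once all sent
        self.sent = [False] * len(done) if in_ready else done
        return outs_valid, in_ready
```

No output's valid depends on any ready, so composing this fork with combinational blocks cannot create the cycle of the previous slides.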

SLIDE 34

Nondeterministic Merge (Point 4/4)

Share one f with a merge/demux pair

[Diagram: three instances of f collapse to a single f; a merge funnels requests into f, and its select output steers a demux that returns each result to its requester]
SLIDE 35

Two-Way Nondeterministic Merge Block w/ Select

[Diagram: data inputs in0 and in1, data output out, 1-bit select output sel, arbiter]

“Two-way fork with multiplexed output selected by an arbiter”
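A behavioral sketch of this merge (my own model): it uses a fixed-priority arbiter in place of whatever arbitration the hardware uses, and for simplicity requires both outputs ready in the same cycle rather than the fork-style independent firing the quote describes.

```python
def merge_handshake(ins_valid, ins_data, out_ready, sel_ready):
    """Two-way nondeterministic merge with a select output.

    The arbiter (here: fixed priority, input 0 first) picks one valid
    input, forwards its token on out, and reports the winner on sel so
    a downstream demux can route the result back to its requester.
    """
    winner = next((i for i, v in enumerate(ins_valid) if v), None)
    out_valid = winner is not None
    fire = out_valid and out_ready and sel_ready    # both outputs accepted
    ins_ready = [fire and i == winner for i in range(len(ins_valid))]
    out_data = ins_data[winner] if out_valid else None
    return out_valid, out_data, winner, ins_ready
```

As with the other blocks, out_valid depends only on input valids, never on any ready.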

SLIDE 36

Experiments: Random Buffer Placement

[Plots: completion time (µs) vs. number of buffer pairs (2 to 10) for GCD(100,2) (7 buffers), 21-way Conveyor (80 buffers), and BSN (96 buffers)]

SLIDE 37

Best Buffering for GCD (Manually Obtained)

Each loop has one of each buffer type: a data buffer and a control buffer

SLIDE 38

Summary

Compositional dataflow networks as an IR: patient dataflow blocks with valid/ready handshaking

  • 1. Break downstream and upstream paths with two buffer types
  • 2. Avoid combinational cycles: prohibit ready-to-valid paths
  • 3. Add one state bit per output so forks may “race ahead”
  • 4. Tame nondeterministic merge with a select output

Random buffer placement experiments show it works