Automatic Compilation of Data-Driven Circuits

SLIDE 1

Automatic Compilation of Data-Driven Circuits

Sam Taylor, Doug Edwards, Luis Plana University of Manchester smtaylor|doug|lplana@cs.manchester.ac.uk

SLIDE 2

Summary

  • Handshake Circuit paradigm is nice
  • Control-driven style is flexible but slow
  • Data-driven approaches provide better performance
  • Combine data-driven approach with handshake circuit paradigm

  • An alternative option for designers?
SLIDE 3

Balsa Design Flow

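[Figure: Balsa design flow. Balsa code is compiled by the Balsa compiler into a handshake circuit (Breeze netlist); balsa-netlist turns this into a gate-level netlist, and commercial layout tools produce the layout. Behavioural simulation (breeze-sim), gate-level simulation and layout simulation check behaviour, function and timing respectively. Other labels: design refinement (manual process), re-use.]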

SLIDE 4

Handshake Circuits

  • Intermediate representation independent of implementation styles
  • Networks of small components communicating by handshakes
  • Each component (relatively) straightforward to implement in isolation
  • Successful method of implementing large circuits

  • Syntax-directed translation
SLIDE 5

Balsa one-place buffer

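variable v
loop
  i -> v ;
  o <- v
end

[Figure: handshake circuit for the buffer, showing the loop (#), sequence (;) and variable (V) components, the activate channel, and the data channels i and o. Legend: sync (activation) channel, data channel, request, acknowledge.]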
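
As a rough behavioural illustration of the one-place buffer, the Python sketch below models the control-driven order of events: the loop reads a token from i into v, and only then writes v to o. The function name `one_place_buffer` and the trace format are assumptions made for this sketch; handshakes are modelled as plain events, not as request/acknowledge signalling.

```python
from typing import Iterable, List, Tuple


def one_place_buffer(inputs: Iterable[int]) -> Tuple[List[int], List[str]]:
    """Behavioural model of `loop i -> v ; o <- v end` (illustrative only)."""
    outputs: List[int] = []
    trace: List[str] = []
    for token in inputs:
        v = token                      # i -> v : the input handshake fills the variable
        trace.append(f"read i={token}")
        outputs.append(v)              # o <- v : sequenced strictly after the read
        trace.append(f"write o={v}")
    return outputs, trace


outputs, trace = one_place_buffer([3, 1, 4])
print(outputs)
print(trace)
```

The trace alternates strictly between reads and writes, which is the sequencing that the `;` operator imposes in the Balsa description.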

SLIDE 6

Advantages of control-driven structure

  • Passive-ported variable is very flexible. Read and write in any order like a sequential programming language
  • Familiar control structures: loops etc.
  • Low power: nothing gets done that does not need doing.

SLIDE 7

Why does the structure of Balsa circuits make them slow?

  • Control-driven compilation
  • Monolithic control
  • Lots of sequencers
  • Frequent synchronisation between control and data
  • Control overhead. Data is always waiting for control.
  • Data-driven style attempts to avoid all of these problems

SLIDE 8

Control-driven structure

[Figure: control-driven structure. Labels: activate, input control (three blocks), variables V0 and V1, conditional processing, output processing, write control, output control; component symbols ;, FV and @; ports A and O.]

SLIDE 9

Three main issues

  • All inputs are synchronised
  • Sequential activation of ‘reads’ and ‘writes’
  • Data processing operations occur sequentially after control instead of in parallel

So look at the main structures of Balsa handshake circuits and replace them with data-driven alternatives.

SLIDE 10

Input control

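[Figure: input control, two versions. One uses FV components on inputs a and b together with an activate channel; the other uses a dup component on inputs a and b. Both feed a processing block.]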

SLIDE 11

Localised sequencing

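Data-driven:

input i
output v
during
  v <- i
end

input v
output o
during
  o <- v
end

Balsa:

loop
  i -> v ;
  o <- v
end

[Figure: the corresponding circuits, showing the loop (#), sequence (;) and variable (V) components and the channels i and o.]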
SLIDE 12

Data processing

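a, b -> then
  o1 <- a + b ||
  o2 <- b
end

[Figure: control-driven circuit for this code. Labels: activate, FV components on inputs a and b, adder (+), parallel composition (||), outputs o1 and o2.]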

SLIDE 13

Data processing

input a, b
output o1, o2
during
  o1 <- a + b
  o2 <- b
end

[Figure: data-driven circuit for this code. Labels: inputs a and b, dup, adder (+), outputs o1 and o2.]
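
The data-driven block can be read as a small dataflow graph: b is duplicated, one copy goes to the adder and the other straight to o2. The Python sketch below models one firing of that graph per pair of input tokens. The names `dup`, `add_block` and `run_block` are assumptions for illustration; the model captures only the data movement, not the handshake timing.

```python
from typing import Iterable, List, Tuple


def dup(token: int) -> Tuple[int, int]:
    """Copy one token to two consumers, as the dup component does."""
    return token, token


def add_block(a: int, b: int) -> Tuple[int, int]:
    """One firing of the block: o1 <- a + b, o2 <- b."""
    b_to_adder, b_to_o2 = dup(b)   # b is needed in two places
    o1 = a + b_to_adder
    o2 = b_to_o2
    return o1, o2


def run_block(a_tokens: Iterable[int], b_tokens: Iterable[int]) -> List[Tuple[int, int]]:
    """Fire the block once for every pair of tokens available on a and b."""
    return [add_block(a, b) for a, b in zip(a_tokens, b_tokens)]


results = run_block([1, 2, 3], [10, 20, 30])
print(results)
```

There is no activate input in this model: the block fires whenever a token is present on both a and b.
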
SLIDE 14

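[Figure: two gate-level control circuits built from gates labelled T and C. The first has signals activate.req, activate.ack, a.req, a.ack, b.req, b.ack, o1.req, o1.ack, o2.req and o2.ack. The second has no activate signals: a.req, a.ack, b.req, b.ack, o1.req, o1.ack, o2.req and o2.ack.]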
SLIDE 15

Data-driven structure

[Figure: data-driven structure. Labels: variables V0 and V1, conditional processing, output processing, write control, output control; component symbol @; ports A and O. There are no activate or input control blocks.]

SLIDE 16

Code

Balsa:

a, b -> then
  o1 <- a + b ||
  o2 <- b
end

Data-driven:

input a, b
output o1, o2
during
  o1 <- a + b
  o2 <- b
end

Each block in data-driven code is basically the description of a pipeline stage.
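
One way to picture a block as a pipeline stage is a step-by-step model in which each stage owns one output latch and a token advances whenever the next latch is free. The Python sketch below is only an illustration under that assumption: `first_stage` is the block from this slide, while `second_stage` and `run_pipeline` are hypothetical and do not come from the slides.

```python
from typing import Any, Callable, List, Sequence, Tuple


def first_stage(token: Tuple[int, int]) -> Tuple[int, int]:
    """The block from the slide: o1 <- a + b, o2 <- b."""
    a, b = token
    return a + b, b


def second_stage(token: Tuple[int, int]) -> int:
    """Hypothetical follow-on block (not from the slides): r <- o1 - o2."""
    o1, o2 = token
    return o1 - o2


def run_pipeline(stages: Sequence[Callable[[Any], Any]],
                 tokens: Sequence[Any]) -> Tuple[List[Any], int]:
    """Push tokens through the stages; tokens and results must not be None."""
    latches: List[Any] = [None] * len(stages)   # one output latch per stage
    pending = list(tokens)
    outputs: List[Any] = []
    steps = 0
    while pending or any(latch is not None for latch in latches):
        if latches[-1] is not None:             # the environment takes the result
            outputs.append(latches[-1])
            latches[-1] = None
        for k in range(len(stages) - 1, 0, -1):  # advance tokens, back to front
            if latches[k] is None and latches[k - 1] is not None:
                latches[k] = stages[k](latches[k - 1])
                latches[k - 1] = None
        if pending and latches[0] is None:       # accept a new input token
            latches[0] = stages[0](pending.pop(0))
        steps += 1
    return outputs, steps


outputs, steps = run_pipeline([first_stage, second_stage], [(1, 2), (3, 4), (5, 6)])
print(outputs, steps)
```

Once the pipeline has filled, both latches hold a token at the same time, so the first stage is already working on the next inputs while the second stage finishes the previous ones.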

SLIDE 17

Balsa vs. data-driven philosophy

Balsa (control-driven):

  • Collect all inputs
  • Decide what operation to do
  • Do the operation
  • Release the inputs

Data-driven:

  • List of operations
  • Do all of these operations as soon as you can (speculate)
  • Don't synchronise until you absolutely must
  • Throw away the results of operations you don't need
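
The two philosophies can be contrasted in a few lines of Python. This is a sketch under stated assumptions: the operation table `OPS` with its "add" and "sub" entries and the two function names are invented for illustration, and the code models only the order of decisions, not circuit timing.

```python
from typing import Callable, Dict

# Hypothetical operation table, used only for this illustration.
OPS: Dict[str, Callable[[int, int], int]] = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
}


def control_driven(op: str, a: int, b: int) -> int:
    """Collect inputs, decide, do the one chosen operation, release."""
    operands = (a, b)              # 1. collect all inputs
    chosen = OPS[op]               # 2. decide what operation to do
    result = chosen(*operands)     # 3. do the operation
    return result                  # 4. the inputs are released on return


def data_driven(op: str, a: int, b: int) -> int:
    """Do every operation as soon as the operands arrive, then select."""
    speculative = {name: fn(a, b) for name, fn in OPS.items()}  # speculate
    return speculative[op]         # synchronise last; unused results are discarded


print(control_driven("add", 2, 3), data_driven("add", 2, 3))
```

Both functions return the same value; the difference is that the data-driven version does not wait for the decision before starting work, at the cost of computing results it then throws away.
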
SLIDE 18

Design Flow

[Figure: combined design flow. Balsa code goes through the Balsa compiler and data-driven code through the data-driven compiler; both produce a handshake circuit (Breeze netlist). balsa-netlist turns this into a gate-level netlist, and commercial layout tools produce the layout. New components need behaviour descriptions and gate-level descriptions. Behavioural simulation (breeze-sim), gate-level simulation and layout simulation check behaviour, function and timing respectively. Other labels: design refinement (manual process), re-use.]

SLIDE 19

nanoSpa

  • Cut-down ARM processor
  • Balsa design intended for maximum performance
  • Data-driven equivalent with same architecture and handshake component implementation style (try to look just at improvement from structure)
  • Data-driven bundled data and dual-rail implementations both about 1.5x improvement over Balsa version
SLIDE 20

Syntax-directed translation?

  • To use syntax-directed translation I restricted the input language so that one could only write what I wanted to produce!
  • This is probably fine for an experienced designer: it gives them what they want.
  • Probably not fine for others: they don’t know how to think ‘asynchronous’.
  • But the same thinking is needed to write fast Balsa.

SLIDE 21

Conclusion

  • The structure of control-driven handshake circuits is familiar and flexible but contributes to their poor performance
  • Data-driven circuits perform better but are not as familiar and flexible
  • Both styles can be combined in the same flow
  • Future work could include automatic transformation from control-driven to data-driven, or at least more structures to assist data-driven design

SLIDE 22
SLIDE 23

[Figure: gate-level circuit with gates labelled C and T, two CD blocks and an adder. Signals: activate.req, activate.ack, a.req, a.ack, b.req, b.ack, o1.req, o1.ack, o2.req, o2.ack.]
SLIDE 24

[Figure: gate-level circuit with gates labelled T and C, one CD block and an adder. Signals: a.req, a.ack, b.req, b.ack, o1.req, o1.ack, o2.req, o2.ack.]
SLIDE 25

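[Figure: two versions of the decode stage between fetch and execute. First version labels: @, ||, LDM/STM decode, Iterative, Regular decode, from fetch, to execute. Second version labels: @, ||, LDM/STM decode, Regular decode, ctrl, from fetch, to execute.]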

SLIDE 26

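[Figure: register write control, two versions. First version labels: one Write Control block, registers r0, r1, r3, r4, control, data. Second version labels: four Write Control blocks, registers r0, r1, r2, r3, control, data.]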
