slide-1
SLIDE 1

Logic Decomposition of Asynchronous Circuits in WORKCRAFT

Victor Khomenko, Danil Sokolov, Alex Yakovlev

slide-2
SLIDE 2

Motivation

  • Logic decomposition is one of the most difficult tasks in the design flow
  • Much more difficult than for synchronous circuits: there is no guarantee of success
  • The quality of the resulting circuit (in terms of area and latency) depends to a large extent on the way logic decomposition was performed

slide-3
SLIDE 3

Speed-independence assumptions

  • Gates are atomic (so no internal hazards)
  • Gates’ delays are positive and unbounded (and perhaps variable)
  • Wire delays are negligible (SI) or, alternatively, wire forks are isochronic (QDI)

[Figure: gate model with an instant evaluator of the function F followed by a delay]

slide-4
SLIDE 4

Speed-independent decomposition

[Figure: the gate F (instant evaluator followed by a delay) is decomposed into gates H1, …, Hk feeding a gate G, each with its own delay]

slide-5
SLIDE 5

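VME Bus Controller

[Figure: block diagram of the VME Bus Controller between the Device, the Data Transceiver and the Bus, with signals dsr, dtack, lds, ldtack and d; STG with transitions dsr+, dsr-, dtack+, dtack-, lds+, lds-, ldtack+, ldtack-, d+, d-, csc+, csc-]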

slide-6
SLIDE 6

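Complex-gate implementation

[Figure: complex-gate circuit connecting the Device, the Data Transceiver and the Bus, with signals d, dsr, dtack, lds, ldtack and csc]

A complex gate may not be in the gate library, in which case it has to be decomposed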

slide-7
SLIDE 7

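Naïve decomposition is hazardous

[Figure: circuit with signals d, dsr, dtack, lds, ldtack, csc and a new internal signal x; STG with transitions lds-, d-, ldtack-, ldtack+, dsr-, dtack+, d+, dtack-, dsr+, lds+, csc+, csc-; two events are marked "Unexpected!"]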

slide-8
SLIDE 8

Decompose at the level of STG

[Figure: circuit with signals d, dsr, dtack, lds, ldtack, csc and a new signal dec; STG with the new transitions dec+ and dec- inserted]

Insert a new signal dec whose implementation is [dec] = ldtack + csc

Multiway acknowledgement

slide-9
SLIDE 9

Latch utilisation

[Figure: two implementations over the signals d, dsr, dtack, lds, ldtack, csc; the second one uses a C-element]

Only possible because there is no globally reachable state at which dsr=ldtack=0 and csc=1

slide-10
SLIDE 10

Logic decomposition algorithm

  • Synthesise the circuit from the STG (several complex-gate and standard-C implementations are considered for each signal)
  • Heuristically select a non-mappable gate, and a decomposition of this gate
  • Insert a new signal into the STG for the sub-function in the selected decomposition
  • Repeat the above steps until all gates are mappable or no further progress is possible
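
The loop above can be sketched roughly as follows. This is only an outline under stated assumptions: `synthesise`, `is_mappable`, `pick_decomposition` and `insert_signal` are hypothetical callbacks standing in for the real synthesis, library matching, heuristic and signal-insertion steps; they are not Workcraft APIs, and the iteration bound is an added safeguard.

```python
def decompose(stg, synthesise, is_mappable, pick_decomposition,
              insert_signal, max_iters=100):
    """Iterative logic decomposition loop (illustrative outline only).

    All four callbacks are hypothetical placeholders:
      synthesise(stg)          -> dict: signal name -> gate description
      is_mappable(gate)        -> True if the gate is in the library
      pick_decomposition(bad)  -> chosen decomposition, or None
      insert_signal(stg, dec)  -> new STG, or None if insertion fails
    Returns (final_stg, success_flag).
    """
    for _ in range(max_iters):
        gates = synthesise(stg)
        # Collect the gates that cannot be mapped to the library.
        bad = {s: g for s, g in gates.items() if not is_mappable(g)}
        if not bad:
            return stg, True          # every gate is mappable
        choice = pick_decomposition(bad)
        if choice is None:
            return stg, False         # heuristic has nothing left to try
        new_stg = insert_signal(stg, choice)
        if new_stg is None:
            return stg, False         # no valid insertion: no progress
        stg = new_stg
    return stg, False                 # iteration bound reached
```

The success flag matters because, as noted earlier, decomposition carries no guarantee of success; a caller would fall back to manual techniques when it is `False`.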

slide-11
SLIDE 11

Function-guided signal insertion

Problem: given a Boolean function F, insert a new signal dec (i.e. a set of new transitions labelled dec+ or dec-) with the implementation [dec]=F into the STG
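
To make the requirement [dec]=F concrete, the sketch below checks it on an explicitly enumerated state graph: in every reachable state, F must equal the next-state value of dec (its current value, flipped if dec+ or dec- is enabled). The state representation, the helper names and the tiny two-signal example are assumptions made for illustration only.

```python
def next_value(code, enabled, signal):
    """Next-state value of `signal` in a state.

    code    : dict signal -> 0/1 (binary encoding of the state)
    enabled : set of transition labels enabled in the state, e.g. {"dec+"}
    """
    value = code[signal]
    if value == 0 and signal + "+" in enabled:
        return 1
    if value == 1 and signal + "-" in enabled:
        return 0
    return value


def implements(states, signal, func):
    """True if func(code) equals the next-state value of `signal`
    in every listed state."""
    return all(func(code) == next_value(code, enabled, signal)
               for code, enabled in states)


# Hypothetical example: the cycle a+ -> dec+ -> a- -> dec-,
# in which dec simply follows the input a.
states = [
    ({"a": 0, "dec": 0}, {"a+"}),
    ({"a": 1, "dec": 0}, {"dec+"}),
    ({"a": 1, "dec": 1}, {"a-"}),
    ({"a": 0, "dec": 1}, {"dec-"}),
]

print(implements(states, "dec", lambda c: c["a"]))
```

The insertion problem runs in the opposite direction: F is given, and the transitions of dec have to be placed so that this check holds.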

slide-12
SLIDE 12

Transition insertions

  • Sequential pre-insertion
  • Sequential post-insertion
  • Concurrent insertion
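
As a rough illustration of the two sequential insertions, the sketch below rewires a Petri net stored as a dictionary from each transition to its (preset, postset) of places. The data structure, the function names and the simplification that the new transition takes over the whole preset or postset are assumptions; initial markings are ignored.

```python
def pre_insert(net, t, u):
    """Insert new transition u immediately before t.

    net: dict transition -> (set of input places, set of output places).
    u takes over the whole preset of t; a fresh place links u to t.
    """
    pre, post = net[t]
    place = f"p_{u}_{t}"              # fresh place between u and t
    new = dict(net)
    new[u] = (set(pre), {place})
    new[t] = ({place}, set(post))
    return new


def post_insert(net, t, u):
    """Insert new transition u immediately after t.

    u takes over the whole postset of t; a fresh place links t to u.
    """
    pre, post = net[t]
    place = f"p_{t}_{u}"              # fresh place between t and u
    new = dict(net)
    new[t] = (set(pre), {place})
    new[u] = ({place}, set(post))
    return new


# Hypothetical fragment: one transition with two input places.
net = {"lds+": ({"p1", "p2"}, {"p3"})}
print(pre_insert(net, "lds+", "dec+"))
```

Both functions return a new dictionary and leave the original net untouched, which makes it easy to try several candidate insertions and compare them.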

slide-13
SLIDE 13

Example: imec-sbuf-ram-write

[Figure: STG of imec-sbuf-ram-write with signals req, precharged, done, wsldin, wenin, prbar, wen, wsen, ack, wsld, and the inserted transitions dec+ and dec- of a new signal dec]

Implementation of prbar: (csc2  req)  csc1  wsldin

slide-14
SLIDE 14

Generalised transition insertion

[Figure: a new transition inserted between the sources s1, s2, s3 and the destinations d1, d2]

Sources and destinations are locked

slide-15
SLIDE 15

Cost function

Parameterised by the user; takes into account:

  • the delay introduced by the insertion
  • the number of syntactic triggers of all non-input signals
  • the number of inserted transitions of a signal
  • the number of signals which are not locked with the newly inserted signal
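
One simple way to combine the four factors is a weighted sum with user-supplied weights, sketched below. The slides list only the factors, so the weighted-sum form, the class and field names and the default weights are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class CostWeights:
    """User-tunable weights (hypothetical names and defaults)."""
    delay: float = 1.0        # delay introduced by the insertion
    triggers: float = 1.0     # syntactic triggers of non-input signals
    transitions: float = 1.0  # inserted transitions of the new signal
    unlocked: float = 1.0     # signals not locked with the new signal


def insertion_cost(w, delay, triggers, transitions, unlocked):
    """Weighted sum of the four factors; lower is better."""
    return (w.delay * delay
            + w.triggers * triggers
            + w.transitions * transitions
            + w.unlocked * unlocked)


# Compare two hypothetical candidate insertions under latency-biased weights.
latency_biased = CostWeights(delay=5.0)
candidate_a = insertion_cost(latency_biased, delay=2, triggers=6, transitions=2, unlocked=1)
candidate_b = insertion_cost(latency_biased, delay=1, triggers=8, transitions=2, unlocked=3)
print(candidate_a, candidate_b)
```

Raising one weight relative to the others biases the choice towards that criterion, for example favouring low latency over small gates.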

slide-16
SLIDE 16

Overcoming mapping failure

  • Logic decomposition is not guaranteed to succeed, so tools occasionally fail
  • May need to help the tools:
      ▪ methods & tricks
      ▪ “think outside the box”: knowledge of the environment, capacity to redesign the system and its environment
      ▪ “high-level understanding of the design”: knowing the causal dependencies between the signals, which environment signals are fast/slow (useful for concurrency reduction), etc.
      ▪ relative timing assumptions

slide-17
SLIDE 17

0 Prevention is better than cure

  • Large monolithic STGs are difficult, both for humans and for tools
  • Hierarchical design:
      ▪ architectural decomposition into modules
      ▪ … until each module is small, say ~10 signals (this size is about right for humans and tools)
      ▪ Advantages: human- and tool-friendly, more predictable, module re-use (within and between designs), easy to document and maintain, etc.

  • Workcraft has support for hierarchical designs
slide-18
SLIDE 18

Example: stage of multiphase buck

slide-19
SLIDE 19

1 Expanding gate library

  • Add a missing gate to the library
  • Usually not an option
slide-20
SLIDE 20

2 Inserting a useful signal

  • Tools often fail because:
      ▪ some heuristic selects a bad sub-function
      ▪ there is no structural signal insertion to implement a useful sub-function
  • One can help the tool by inserting an internal signal implementing a useful sub-function

slide-21
SLIDE 21

Example: OR5

slide-22
SLIDE 22

3.1 Simplifying the STG structure

  • If the STG has a complicated structure, it may be impossible to insert a signal structurally (e.g. one would have to merge and then split some choice branches for that)
  • Try to simplify the STG structure by reducing the number of choice and merge (i.e. explicit) places; in particular, controlled choices can often be removed
slide-23
SLIDE 23

Example: OR5

slide-24
SLIDE 24

3.2 STG re-synthesis

  • Re-synthesis builds the state graph and then derives an equivalent STG from it, often with simpler structure
  • Fully automatic, so easy to try if technology mapping fails
  • Try various command-line options
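
The first half of re-synthesis, building the state graph, amounts to exploring the reachable markings of the underlying Petri net. The sketch below does this by breadth-first search, assuming a safe net (at most one token per place) stored as a dictionary from transitions to (preset, postset); the representation and names are illustrative assumptions. Deriving an STG back from the state graph is the harder half and is not shown.

```python
from collections import deque


def state_graph(net, initial):
    """Explore the reachable markings of a safe Petri net.

    net     : dict transition -> (set of input places, set of output places)
    initial : iterable of initially marked places
    Returns (set of markings, list of (marking, transition, marking) edges).
    """
    start = frozenset(initial)
    seen = {start}
    edges = []
    queue = deque([start])
    while queue:
        marking = queue.popleft()
        for t, (pre, post) in net.items():
            if pre <= marking:                    # t is enabled
                succ = frozenset((marking - pre) | post)
                edges.append((marking, t, succ))
                if succ not in seen:
                    seen.add(succ)
                    queue.append(succ)
    return seen, edges


# Hypothetical four-phase handshake: a+ -> b+ -> a- -> b- -> a+ ...
handshake = {
    "a+": ({"p0"}, {"p1"}),
    "b+": ({"p1"}, {"p2"}),
    "a-": ({"p2"}, {"p3"}),
    "b-": ({"p3"}, {"p0"}),
}
states, edges = state_graph(handshake, {"p0"})
print(len(states), len(edges))
```

The number of markings can grow exponentially with the amount of concurrency, which is one reason why small modules are easier for tools.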
slide-25
SLIDE 25

4 Concurrency reduction

  • Concurrency reduction (CR) does not necessarily decrease performance: though events are less concurrent, the gates become smaller and some internal signals may become unnecessary
  • CR may change the contract with the environment and introduce a deadlock or a global deterioration of performance that is difficult to debug

slide-26
SLIDE 26

Example: xyz

slide-27
SLIDE 27

Example: xyz with CR

slide-28
SLIDE 28

Example: xyz with more CR

slide-29
SLIDE 29

5 Relative timing assumptions

  • Occasionally, the described techniques still fail to yield a solution
  • Breaking up a large gate yields a non-speed-independent decomposition
  • The correct operation can then be ensured by relative timing assumptions
  • This has implications for place & route
  • Easy to make a mistake, need tool support
slide-30
SLIDE 30

Example: VME read phase

  MaxDelay(x-) < MinDelay(d- → lds-)
  MaxDelay(x-) < MinDelay(d- → dtack- → dsr+)
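
Constraints of this shape can be checked mechanically once delay bounds are known. The sketch below assumes that the minimum delay of a path is the sum of the minimum delays of the events that follow its first event, and uses made-up delay figures; the function name and all numbers are hypothetical and only illustrate the form of the check.

```python
def assumption_holds(max_delay, min_delay, fast_event, slow_path):
    """Check MaxDelay(fast_event) < MinDelay(slow_path).

    max_delay, min_delay : dict event -> delay bound (same time unit)
    slow_path            : list of events; the path delay is measured from
                           the firing of the first event, so only the
                           events after it contribute (an assumption).
    """
    slow = sum(min_delay[event] for event in slow_path[1:])
    return max_delay[fast_event] < slow


# Made-up delay bounds, for illustration only.
MAX_DELAY = {"x-": 2.0}
MIN_DELAY = {"lds-": 3.0, "dtack-": 1.0, "dsr+": 2.5}

print(assumption_holds(MAX_DELAY, MIN_DELAY, "x-", ["d-", "lds-"]))
print(assumption_holds(MAX_DELAY, MIN_DELAY, "x-", ["d-", "dtack-", "dsr+"]))
```

In practice the bounds would come from post-layout timing data, which is why such assumptions have implications for place & route.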

slide-31
SLIDE 31

Thank you! Any questions?