SLIDE 1

TUBS 1 July 2009

1

David Kinniment University of Newcastle, UK

Based on contributions from: Alex Bystrov, Keith Heron, Nikolaos Minas, Gordon Russell, Alex Yakovlev, and Jun Zhou

Synchronizers, Arbiters, GALS and Metastability

SLIDE 2

Outline

• What’s the problem? Why does it matter?
• Time and distributions
• Noise – decisions are not deterministic.
• Synchronizer latency, can it be avoided?
• Measuring and some interesting effects

SLIDE 3

What’s the problem: The digital IP world and the rest of the world

Your system

The synchronizer is the guy that allows timing flexibility

Everything else, or Reality

SLIDE 4

Synchronizers and arbiters

• Synchronizer
  – Decides which clock cycle to use for the input data
• Asynchronous arbiter
  – Decides the order of inputs

[Figure labels: Your system, Input; Your system, Input 1, Input 2]

SLIDE 5

Time Comparison Hardware

• Digital comparison hardware (which compares integers) is easy
  – Fast
  – Bounded time
• Analog comparison hardware (which compares reals like time) is hard
  – Normally fast, but takes longer as the difference becomes smaller
  – Can take forever (Buridan’s Ass, ~1340)
• Synchronization and arbitration involve comparison of time
• Known to early computer designers:
  – Lubkin 1952, Catt 1966
  – Chaney and Littlefield 1966/72

SLIDE 6

Asynchronous Network (Sparsø, ASYNC 2005)

[Figure labels: Synchronization required; Multiple Clocks; Asynchronous Arbitration required]

SLIDE 7

Synchronization of data

• Non-pausible clocks require data synchronization
• You have a limited time to synchronize.
• Synchronizer circuits may fail to work in that time
• System sometimes fails (you fly into a mountain)
• Synchronization time = latency

[Figure: two D flip-flops, #1 and #2, with VALID, CLK a and CLK b]

SLIDE 8

Synchronization of clocks

• Clock paused to prevent contention
• Wait for MUTEX output to resolve
• Can take forever (with decreasing probability)
• This may not be acceptable (you fly into a mountain)

[Figure labels: C, MUTEX, Request, Grant, Clock, Ack, Run, Delay]

SLIDE 9

Trends

• Communications are limiting performance
• Difficulty (impossibility?) in maintaining one clock per chip
• Need to reduce wasted power in clocks
• Timing closure problems
  – Interconnect delay times long
  – Variability increases
• Move towards asynchronous networks

SLIDE 10

Metastability is....

Not being able to decide…

[Figure: flip-flop with D, Clock and Q; D and Clock waveforms with ∆tin -> 0; Request against Processor Clock, set-up time violated]

SLIDE 11

Metastability in a Latch

[Figure: latch formed by inverters I1 and I2 with voltages V1 and V2; plot of V1 against V2 marking the stable points and the metastable point]

SLIDE 12

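Simple Linear Model

• Simple linear model leads to two exponentials
• τa is convergent, τb is divergent

$$V_1 = K_a e^{-t/\tau_a} + K_b e^{t/\tau_b}$$

$$0 = \tau_1 \tau_2 \frac{d^2 V_1}{dt^2} + \frac{(\tau_1 + \tau_2)}{A} \frac{dV_1}{dt} + \left(\frac{1}{A^2} - 1\right) V_1$$

$$\tau_1 = \frac{C_1 R_1}{A}, \quad \tau_2 = \frac{C_2 R_2}{A}, \quad g_m = \frac{A}{R}$$

[Figure: small-signal model of the cross-coupled pair Q1, Q2; each stage is a source A*V1 with R1, C1 driving V2, and vice versa]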
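
As a minimal sketch of how the two exponentials arise: substituting V1 = K·exp(s·t) into the second-order equation on this slide gives a quadratic in s with one negative root (the convergent term, τa) and one positive root (the divergent term, τb). The function name and the example values (10 ps stages, gain of 10) are assumptions for illustration only, not figures from the talk.

```python
import math

def time_constants(tau1, tau2, gain):
    """Return (tau_a, tau_b) for the linear latch model.

    Characteristic equation obtained from V1 = K*exp(s*t):
        tau1*tau2*s**2 + ((tau1 + tau2)/gain)*s + (1/gain**2 - 1) = 0
    """
    if gain <= 1.0:
        raise ValueError("a divergent mode needs gain > 1")
    a = tau1 * tau2
    b = (tau1 + tau2) / gain
    c = 1.0 / gain**2 - 1.0            # negative when gain > 1
    root = math.sqrt(b * b - 4.0 * a * c)
    s_neg = (-b - root) / (2.0 * a)    # decaying mode: exp(-t/tau_a)
    s_pos = (-b + root) / (2.0 * a)    # growing mode:  exp(+t/tau_b)
    return -1.0 / s_neg, 1.0 / s_pos

# Hypothetical values: identical 10 ps stages, gain A = 10.
tau_a, tau_b = time_constants(tau1=10e-12, tau2=10e-12, gain=10.0)
print(f"tau_a = {tau_a * 1e12:.2f} ps, tau_b = {tau_b * 1e12:.2f} ps")
```

For identical stages the roots reduce to τa = τ·A/(A+1) and τb = τ·A/(A−1), so both approach τ as the gain becomes large.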

SLIDE 13

Synchronizer

• t is the time allowed for Q to change between CLK a and CLK b
• τ is the recovery time constant, usually set by the gain-bandwidth of the circuit
• Tw is the “metastability window”
• τ and Tw depend on the circuit
• We assume that all values of ∆tin are equally probable

$$MTBF = \frac{e^{t/\tau}}{T_w \cdot f_c \cdot f_d}$$

[Figure: two D flip-flops, #1 and #2, with VALID, CLK a and CLK b]
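
A short sketch of the MTBF formula on this slide, reading fc and fd as the clock and data rates (the usual interpretation). All numeric values below are hypothetical and only exercise the formula; they are not measurements from the talk.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600

def synchronizer_mtbf(t, tau, t_w, f_c, f_d):
    """MTBF in seconds: exp(t/tau) / (t_w * f_c * f_d)."""
    return math.exp(t / tau) / (t_w * f_c * f_d)

# Hypothetical parameters, chosen only for illustration.
tau = 50e-12   # recovery time constant, s
t_w = 50e-12   # metastability window, s
f_c = 100e6    # clock frequency, Hz
f_d = 100e6    # data rate, Hz

for n_tau in (10, 20, 30):
    mtbf = synchronizer_mtbf(n_tau * tau, tau, t_w, f_c, f_d)
    print(f"t = {n_tau} tau: MTBF = {mtbf:.3g} s "
          f"({mtbf / SECONDS_PER_YEAR:.3g} years)")
```

Each extra τ of settling time multiplies the MTBF by e, which is why a few more gate delays of waiting change the result by orders of magnitude.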

SLIDE 14

Typical responses

• All starting points are equally probable
• Most are a long way from the “balance point”
• A few are very close and take a long time to resolve

[Figure: Clock and Q Output waveforms]

SLIDE 15

Event Histogram

• Log of events plotted against propagation delay
• Probability of event depends on ∆ time
• Most events occur at the normal delay
• The slope is -1/τ
• The intercept is ~Tw
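
Since the slope of the log histogram is -1/τ, τ can be read off with a straight-line fit to the tail. The sketch below uses an ordinary least-squares fit of ln(count) against delay; the function name is hypothetical and the data are synthetic, generated from an assumed τ of 120 ps, not measured values.

```python
import math

def estimate_tau(delays, counts):
    """Fit ln(count) = intercept + slope * delay and return -1/slope."""
    pts = [(t, math.log(c)) for t, c in zip(delays, counts) if c > 0]
    n = len(pts)
    mean_t = sum(t for t, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in pts)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in pts)
    slope = sxy / sxx
    return -1.0 / slope

# Synthetic histogram tail (not measured data): counts fall as exp(-t / tau).
true_tau = 120e-12
delays = [5e-9 + i * 100e-12 for i in range(20)]
counts = [1e6 * math.exp(-(t - 5e-9) / true_tau) for t in delays]

tau_hat = estimate_tau(delays, counts)
print(f"estimated tau = {tau_hat * 1e12:.1f} ps")
```

With real data only the straight part of the tail should be passed in, because the bulk of events at the normal delay does not follow the exponential.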

SLIDE 16

Synchronizer state of the art

• You require about 35τ in order to get the MTBF out to about 1 century. (That’s for 1 synchronizer)
• There is nothing else you can do while synchronizing
• Each typical static gate delay is equivalent to about 5τ
• Synchronizers are analog devices, so worse affected by scaling
• Bigger SoCs in future systems, so more synchronizers, worse reliability
• Inputs can be ‘malicious’, i.e. always causing metastability.
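
To see where a figure of about 35τ can come from, the MTBF formula can be solved for t/τ. This is a sketch under assumed parameters: the 50 ps window and 100 MHz clock and data rates are hypothetical, picked because they give a result near the number quoted above.

```python
import math

SECONDS_PER_CENTURY = 100 * 365.25 * 24 * 3600

def settling_time_in_tau(target_mtbf, t_w, f_c, f_d):
    """Solve MTBF = exp(t/tau) / (t_w * f_c * f_d) for t/tau."""
    return math.log(target_mtbf * t_w * f_c * f_d)

# Hypothetical parameters (not from the slides).
n_tau = settling_time_in_tau(SECONDS_PER_CENTURY, t_w=50e-12, f_c=100e6, f_d=100e6)
print(f"t/tau needed for a 1-century MTBF: {n_tau:.1f}")

# Every further factor of 10 in MTBF costs ln(10), about 2.3 tau.
extra_per_decade = math.log(10.0)
```

Because the dependence is logarithmic, the answer moves only slowly with the assumed rates, which is why a single rule-of-thumb number is usable at all.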

SLIDE 17

Outline

• What’s the problem? Why does it matter?
• Time and distributions
• Noise – decisions are not deterministic.
• Synchronizer latency, can it be avoided?
• Measuring and some interesting effects

SLIDE 18

The arbiter (MUTEX)

Asynchronous arbitration, no time bound
Seitz metastability filter
Grants cannot occur until after the latch resolves any metastability

[Figure labels: Request 1, Request 2, Grant 1, Grant 2, Gnd]

SLIDE 19

Arbitration time

• Unlike a synchronizer, an arbiter may take forever.
• It usually doesn’t; long responses are rare.
• On average the time should only be τ longer than the normal response, so DEAD time ought to be low
• Outputs are always monotonic

[Figure: waveforms of Request 1, Request 2, Grant 1 and Grant 2, with time tm marked]

SLIDE 20

Gate metastability filter

• Half levels due to metastability need to be removed
  – Low (or high) threshold inverters
  – Measure divergence
• Filters define the time to reach a stable state

[Figure labels: Vt = Vdd/4; Vdd/2]

SLIDE 21

Resolution time can be affected

The start/finish points make a difference
Grant appears when trajectory crosses the filter threshold
Gate metastability filter shows more variation
Seitz filter shows more delay

[Figure: trajectories, Volts (1.1 to 2.1) against time (50 to 500 ps), for Low Start, 1.55V and High Start]

SLIDE 22

Event distribution

• Assumption: Time between two events (R1 and R2) is evenly distributed
• Typically it is not.
• Can be as much as 7:1 variation.
• Here there are more cases where R1 is just after R2
• Arbitration may not be fair

[Figure: Distribution of event times]

SLIDE 23

Non uniform probabilities and time

• Time for metastability averaged over a distribution T in a MUTEX is:

$$\text{Average\_time} = \int_{T} \tau \cdot \ln\left(\frac{T_w}{\Delta t_{in}}\right) d\Delta t_{in}$$

• If T = ∞, Average_time = τ

BUT for a malicious input, the distribution is limited by noise

• If T = 4ps, Tw = 25ps, Average_time = 4τ
• Noise can sometimes make average time faster

SLIDE 24

Predicting performance

• MUTEX circuits are fast
• BUT delay times may not be easy to predict
  – Cannot rely on ALL times being fast
• Synchronizers are known to be unreliable

SLIDE 25

Future processes

• Synchronizers and arbiters don’t work well in nanometre technologies, especially at low Vdd
• Worse than gates! Why?
• A gate input is either HIGH
  – Output pulled down
• Or LOW
  – Output pulled up
• A metastable gate is neither
  – Both transistors can be off
• Vdd decreases with process shrink,
  – gm very low
• Synchronization time constant τ = C/gm

[Figure: inverter between Vdd and Ground showing Ids for each input case, including input at Vdd/2]

SLIDE 26

Robust synchronizer

• Jamb latch synchronizer is slow for low Vdd, low temp
• In metastability, both outputs are the same
• Extra p-types are switched fully on, so gm increases, and τ improves

[Figure labels: Data, Reset, Clock, Vdd, 0V]

SLIDE 27

Results

τ (metastability time constant) vs Vdd

Measurement results, τ (ps)

Vdd (V)   Jamb Latch B   Robust Synchronizer
1.8       35.55          34.92
1.7       37.29          35.76
1.6       40.93          38.25
1.5       52.36          43.07
1.4       66.17          50.36
1.35      75.35          58.19

SLIDE 28

Outline

• What’s the problem? Why does it matter?
• Time and distributions
• Noise – decisions are not deterministic.
• Synchronizer latency, can it be avoided?
• Measuring and some interesting effects

SLIDE 29

Does noise affect τ ?

• Probability of escape from metastability does not change with Gaussian noise (Couranz and Wann 1975)

[Figure: Trajectories, Volts against Time]

SLIDE 31

Can noise change the failure rate?

Or maybe not…

[Figure: flip-flop with D, Clock and Q; D and Clock waveforms with ∆tin -> 0]

SLIDE 32

The normal case

• P0(v): probability of initial difference due to input clock data overlap
• P1(v): probability of initial difference due to noise component
• P(v): result of convolution of P0(v) and P1(v)
• T >> tn

[Figure: Probability against Time for P0(v), P1(v) and P(v)]

SLIDE 33

The malicious input

• P0(v): probability of initial difference with zero input clock data overlap
• P1(v): probability of initial difference due to noise component
• P(v): result of convolution of P0(v) and P1(v)
• T << tn

[Figure: Probability against Time for P0(v), P1(v) and P(v)]
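
The contrast between the normal and the malicious case can be sketched numerically. The block below assumes, for illustration only, that the input distribution is uniform with total width T centred on the balance point and that the noise is Gaussian with standard deviation tn; under those assumptions the convolved density at the balance point has a closed form. The function name and the unit-free numbers are hypothetical.

```python
import math

def density_at_balance_point(width_t, noise_tn):
    """Value at 0 of a uniform(-T/2, T/2) density convolved with N(0, tn^2).

    Closed form: erf(T / (2 * sqrt(2) * tn)) / T.
    """
    return math.erf(width_t / (2.0 * math.sqrt(2.0) * noise_tn)) / width_t

tn = 1.0                                               # noise spread, arbitrary units
normal = density_at_balance_point(100.0 * tn, tn)      # T >> tn
malicious = density_at_balance_point(0.01 * tn, tn)    # T << tn

print(f"T >> tn: {normal:.4f}   (about 1/T, noise makes no difference)")
print(f"T << tn: {malicious:.4f} (about 1/(sqrt(2*pi)*tn), noise sets the limit)")
```

In the first case the result is set by the input distribution alone; in the second, however tightly the input is aligned, the density cannot exceed the peak of the noise distribution.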

SLIDE 34

Non-determinism

• Synchronizers and arbiters can produce non-deterministic outcomes
• Some noise is deterministic (5-50ps)
  – Power supply, crosstalk
• Some noise is non-deterministic (typically 0.1ps - 0.5ps).
  – Thermal noise, which increases as dimensions reduce.
• Sequence and time of individual computation paths are unpredictable
• System performance can only be predicted probabilistically

SLIDE 35

Outline

• What’s the problem? Why does it matter?
• Time and distributions
• Noise – decisions are not deterministic.
• Synchronizer latency, can it be avoided?
• Measuring and some interesting effects

SLIDE 36

Request and Acknowledge

[Figure: DATA transfer with REQ (Data Available) passing through two D flip-flops on the Read Clocks, and ACK (Read done) passing through two D flip-flops on the Write Clocks]

SLIDE 37

Latency

• It takes one – two receive clocks to synchronise the request
• Then one – two write clocks to acknowledge it
• Significant latency (1 – 3 clocks)
• Poor data rate (2 – 6 clocks)

SLIDE 38

FIFO

• Can improve data rate by using a FIFO
• But not latency (which gets worse)
• FIFO is asynchronous (usually RAM + read and write pointers)

[Figure: FIFO between WRITE and READ sides. Write side: Write Data, Free to write, Full, Write clock 1, Write clock 2. Read side: Data Available, Read done, Not Empty, Read Clock 1, Read Clock 2. DATA in, DATA out]

SLIDE 39

Timing regions can have predictable relationships

• Phase locked (mesochronous)
  – Timing of input data is constant, therefore predictable
• Same or related frequencies, but phase difference can drift in an unbounded manner (plesiochronous)
  – Timing is not constant but is still predictable
• Unrelated frequencies (heterochronous)
  – No assumptions about timing can be made
  – Need a synchronizer

SLIDE 40

Don’t synchronise when you don’t need to

• If the two clocks are locked together, you don’t need a synchroniser, just an asynchronous FIFO big enough to accommodate any jitter/skew
• FIFO must never overflow/underflow, so there is latency

[Figure labels: FIFO; REQ IN, ACK IN, Write Data; REQ OUT, ACK OUT, Data Available, Read done; DATA in, DATA out]

SLIDE 41

Mesochronous data exchange

• Intermediate X register used to retime data
• Need to find a place where write data is stable, and read register available. There is always a place which can be found at start up
  – Chakraborty and Greenstreet, ASYNC 2003

[Figure labels: Controller; registers W, X, R; DATA In, DATA Out; Write Clock, Read Clock]

SLIDE 42

Clock delay synchronizer (Ginosar AINT 2000)

[Figure labels: conflict region; conflict detector; DATA REG; WCLK; RCLK; delays d; tKO; SYNC]

SLIDE 43

Delay synchronizer latency

• Nominally 0 – 1 clock cycle
• Relies on accurately predicting conflicts
• Clocks must remain stable over synchronisation time.
• Always lose tKO of next computation stage
• Alternative: shift all conflicts to next read cycle
  – On average this loses 2d
  – 2d must be big enough to cover any clock drift/jitter over synchronization time

SLIDE 44

Speculation

• Mostly, the synchronizer does not need 30τ to settle
• Only e^-13 (0.00023%) need more than 13τ
• Why not go ahead anyway, and try again if more time was needed?
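
The percentage quoted above follows directly from the exponential tail: the fraction of events still unresolved after n time constants is taken to be e^-n. A minimal check, with a hypothetical function name:

```python
import math

def fraction_needing_more_than(n_tau):
    """Fraction of events not yet resolved after n_tau time constants."""
    return math.exp(-n_tau)

frac = fraction_needing_more_than(13)
print(f"{frac:.3g} of events, i.e. {frac * 100:.5f}%")   # 0.00023%

# For comparison, the tail left after the full 30 tau wait.
frac_30 = fraction_needing_more_than(30)
```

The gap between the two tails is the argument for speculation: almost every event has settled long before the worst-case wait expires.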

SLIDE 45

Low latency synchronization

• Data Available, or Free to write are produced early.
• If they prove to be in error, synchronization failed.
• Read Fail or Write Fail flag is then raised and the action can be repeated.

[Figure: FIFO with a speculative synchronizer on each side. WRITE side: Write Data, Free to write, Full, Write Fail, Write clock. READ side: Data Available, Read done, Not Empty, Read Fail, Read Clock. DATA in, DATA out]

SLIDE 46

When to recover

• Early Data Available is set after a half cycle – 2 inverter delays
• Speculative Data Available is set after a half cycle
• If these two are different at end of cycle, set Fail

1. Early            2. Speculative   Final            Fail (1 & 2 different)   Comment
(Half Cycle - 2τ)   (Half Cycle)     (End of Cycle)   (End of Cycle)
?                   ?                metastable?      metastable?              Unrecoverable error, probability low
0                   0                0                0                        No data was available
0                   1                1                1                        Stable at the end of the cycle, but the speculative output may have been metastable. Fail
1                   1                1                0                        Normal data transfer

SLIDE 47

Synchronizer latency reduction

1. Only have one clock for the whole system
2. Use clocks with a predictable relationship
3. Speculate
4. Synchronise at the start of a burst transfer; the data rate is predictable during the burst

SLIDE 48

Outline

• What’s the problem? Why does it matter?
• Time and distributions
• Noise – decisions are not deterministic.
• Synchronizer latency, can it be avoided?
• Measuring and some interesting effects

SLIDE 49

What we know about metastability

• Things we know
  – Synchronizers are unreliable; the more there are, the more unreliable the system
  – How to measure reliability up to a few hours
• Things we know we don’t know
  – What reliability is at 3 years
  – How to measure it
  – Complex circuits give complex results; the simple MTBF formula may not apply
• Things we don’t know we don’t know
  – What happens on the back edge of the clock

SLIDE 50

74F5074 Histogram

• τ, from the slope, is about 120ps (in fast region)
• Typical delay time (most events) is 4ns
• 99.9% of clock cycles do not cause useful events
• To get 1 event at 7ns requires hours

[Figure: event histogram with 4ns and 7ns marked]

SLIDE 51

Increasing the number of events

• Test FF is driven to metastability
• Every clock produces a metastable response
• Integrator ensures half outputs high, half low

[Figure labels: 10 MHz; Variable Delay; Test FF (D, Q); Slave FF (D, Q); Integrator]

SLIDE 52

What you get

• Clock to D (Input) histogram
• Q to Clock (Output) histogram

[Figure: input and output histograms, with 200ps and 3ns marked]

SLIDE 53

Interpreting results

• Input time distribution is not flat
• Proportion of total inputs causing events vs input time
• Mapping output times to input times
• Proportion of total output events vs output time

[Figure: Events against D to Clock time (ps); total input events (normalized) against input time (ps); total output events (normalized) against output time (ns); balance point between 0 and 1]

SLIDE 54

Results

• ∆t is the time from the “balance point” of ~200ps
• Similar to original graph BUT not events
• Orders of magnitude quicker to gather data
• Reliability for days not minutes
• Only one oscillator, so no distribution issues
• ∆t does not depend on fc and fd or measurement time. Events do

$$MTBF = \frac{1}{f_c \cdot f_d \cdot \Delta t}$$

$$\Delta t = T_w e^{-t/\tau}$$

[Figure: Delta t (1.00E-17 to 1.00E-09) against Q time (3 to 7 ns)]
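
The two relations on this slide combine into the usual MTBF formula, and a measured ∆t converts directly into an MTBF once rates are chosen. The sketch below assumes fc = fd = 10 MHz purely for illustration; the function names are hypothetical.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600

def delta_t_from_settling(t, tau, t_w):
    """Input window that still fails after settling time t: Tw * exp(-t/tau)."""
    return t_w * math.exp(-t / tau)

def mtbf_from_delta_t(delta_t, f_c, f_d):
    """MTBF in seconds: 1 / (fc * fd * delta_t)."""
    return 1.0 / (f_c * f_d * delta_t)

# Assumed rates: clock and data both at 10 MHz.
mtbf = mtbf_from_delta_t(1e-22, f_c=10e6, f_d=10e6)
years = mtbf / SECONDS_PER_YEAR
print(f"delta_t = 1e-22 s gives MTBF = {mtbf:.3g} s, about {years:.1f} years")
```

Because ∆t is a property of the flip-flop alone, the same measured curve can be reused for any pair of rates by changing only `f_c` and `f_d`.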

SLIDE 55

When the clock goes low

• Clock goes high, master goes metastable
• Master output arrives at slave
  – Before slave clock high: transparent gate delay td
  – As slave clock goes high: metastable, slightly longer delay
• Back edge of clock causes increased delay

[Figure: Master Latch (M) and Slave latch (S) driven by Clock and Inverse Clock; plot of Delta t (1E-18 to 1E-13) against time (5 to 10 ns) for No Back Edge, 4.5 Back Edge and 5.5 Back Edge]

SLIDE 56

Effect of clock low on 74F5074

• 1 – 3 ns additional delay

[Figure: Input time (1.00E-21 to 1.00E-11) against Output time (5 to 12 ns) for 6 ns pulse, 5ns pulse, 4ns pulse and No back edge]

SLIDE 57

Measurement results

• Reliability measurements extended from
  – 10^-15 s or MTBF = 16 min at 10MHz, to
  – 10^-22 s or MTBF = 3 years
• We can see variations in τ not previously seen
• Measurement is statistical, not affected by noise
• Not affected by oscillator linking
• Back edge of clock pulse is seen to be an important effect, can be 0 – 15τ

SLIDE 58

Conclusions

• Synchronization/arbitration requires special circuit elements
• They’re not digital!
• Well known models of synchronizers and arbiters exist
• Design gets more difficult with small dimensions
• Synchronizers and arbiters are not deterministic.
• Both circuits and systems may not conform to idealized models
• Differences can lead to poor reliability, unfair