TUBS 1 July 2009
David Kinniment University of Newcastle, UK
Based on contributions from: Alex Bystrov, Keith Heron, Nikolaos Minas, Gordon Russell, Alex Yakovlev, and Jun Zhou
Synchronizers, Arbiters, GALS and Metastability
Outline
What’s the problem? Why does it
Time and distributions Noise – decisions are not
Synchronizer latency, can it be
Measuring and some interesting
Synchronizer
– Decides which clock cycle to use for the input data
Asynchronous arbiter
– Decides the order of inputs
[Figure: "Your system" with a single Input (synchronizer) and with Input 1 and Input 2 (arbiter)]
Digital comparison hardware (which compares integers) is easy
– Fast
– Bounded time
Analog comparison hardware (which compares reals like time) is hard
– Normally fast, but takes longer as the difference becomes smaller
– Can take forever (Buridan’s Ass ~1340)
Synchronization and arbitration involve comparison of time
Known to early computer designers:
– Lubkin 1952, Catt 1966
– Chaney and Littlefield 1966/72
[Figure (Sparsø): Multiple Clocks, synchronization required; Asynchronous, arbitration required]
Non pausible clocks require data synchronization
You have a limited time to synchronize
Synchronizer circuits may fail to work in that time
System sometimes fails (you fly into a mountain)
Synchronization time = latency
[Figure: two flip-flops (#1, #2) in series, input VALID, clocks CLK a and CLK b]
Clock paused to prevent contention
Wait for MUTEX output to resolve
Can take forever (with decreasing probability)
This may not be acceptable (you fly into a mountain)
[Figure: pausible clock circuit with blocks C, MUTEX and Delay; signals Request, Grant, Ack, Run, Clock]
Communications are limiting performance
Difficulty (Impossibility?) in maintaining one
Need to reduce wasted power in clocks
Timing closure problems
Move towards asynchronous networks
[Figure: flip-flop (D, Clock, Q) and its waveforms; as ∆tin -> 0 between the Request on D and the Processor Clock, the set-up time is violated]
[Figure: cross-coupled inverters I1 and I2 with node voltages V1 and V2]
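Simple linear model
[Figure: small-signal model of the cross-coupled pair Q1, Q2 with load R1, C1 and node voltages V1, V2]
$$0 = \tau_1 \tau_2 \frac{d^2 V_1}{dt^2} + \frac{\tau_1 + \tau_2}{A} \frac{dV_1}{dt} + \left(\frac{1}{A^2} - 1\right) V_1$$
$$\tau_1 = \frac{C_1 R_1}{A}, \quad \tau_2 = \frac{C_2 R_2}{A}, \quad A = g_m R$$
$$V_1 = K_a e^{-t/\tau_a} + K_b e^{t/\tau_b}$$
τa is convergent, τb is divergent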
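Once the convergent term has died away, the divergent exponential alone sets how long the latch takes to reach a valid level. A minimal sketch of that relationship, assuming the convergent term is neglected; the time constant and exit voltage are illustrative values, not taken from the slides:

```python
import math

def resolution_time(k_b, tau_b, v_exit):
    """Time for |k_b| * exp(t / tau_b) to reach v_exit (convergent term ignored)."""
    return tau_b * math.log(v_exit / abs(k_b))

TAU_B = 50e-12   # assumed divergent time constant, seconds
V_EXIT = 0.5     # assumed voltage difference that counts as resolved, volts

for k_b in (1e-1, 1e-3, 1e-6, 1e-9):
    t = resolution_time(k_b, TAU_B, V_EXIT)
    print(f"K_b = {k_b:.0e} V -> {t * 1e12:.0f} ps")
```

Each halving of the initial offset K_b adds τb·ln 2 to the resolution time, which is why starts very close to the balance point take a long time to resolve.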
t is the time allowed for Q to change between CLK a and CLK b
τ is the recovery time constant, usually set by the gain-bandwidth product of the circuit
Tw is the “metastability window”
τ and Tw depend on the circuit
We assume that all values of ∆tin are equally probable
[Figure: two flip-flops (#1, #2) in series, input VALID, clocks CLK a and CLK b]
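The quantities t, τ and Tw are normally combined into a mean time between failures. The sketch below assumes the standard expression MTBF = e^(t/τ) / (Tw · fc · fd), with fc the clock frequency and fd the data event rate; the formula is not present in the extracted slide text, and all numeric values are illustrative:

```python
import math

def mtbf(t, tau, t_w, f_c, f_d):
    """Mean time between synchronizer failures, in seconds."""
    return math.exp(t / tau) / (t_w * f_c * f_d)

# Illustrative values, not taken from the slides.
TAU = 50e-12    # recovery time constant, s
T_W = 20e-12    # metastability window, s
F_C = 200e6     # clock frequency, Hz
F_D = 20e6      # data event rate, Hz

for t_ns in (1.0, 2.0, 3.0):
    value = mtbf(t_ns * 1e-9, TAU, T_W, F_C, F_D)
    print(f"t = {t_ns} ns -> MTBF = {value:.3g} s")
```

Because t appears in the exponent, each extra τ of resolution time multiplies the MTBF by e, while Tw, fc and fd only scale it linearly.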
All starting points are equally probable
Most are a long way from the “balance point”
A few are very close and take a long time to resolve
[Figure: log of events vs propagation delay. The probability of an event depends on ∆ time; beyond the normal delay the slope is -1/τ and the intercept is ~Tw]
You require about 35 τ s in order to get the MTBF
There is nothing else you can do while
Each typical static gate delay is equivalent to
Bigger SoCs, in future systems so more
Inputs can be ‘malicious’ i.e. always causing
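The number of time constants needed for a target MTBF can be estimated by inverting the usual MTBF expression. This is a sketch only: it assumes MTBF = e^(t/τ) / (Tw · fc · fd), and the target lifetime, Tw and the two rates are hypothetical values chosen for illustration:

```python
import math

def taus_required(mtbf_s, t_w, f_c, f_d):
    """Number of time constants t/tau needed to reach a target MTBF."""
    return math.log(mtbf_s * t_w * f_c * f_d)

SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Assumed: 30 year MTBF, 10 ps window, 1 GHz clock, 100 MHz data rate.
n = taus_required(30 * SECONDS_PER_YEAR, t_w=10e-12, f_c=1e9, f_d=100e6)
print(f"about {n:.1f} time constants")
```

With these assumed values the result lands in the mid-30s, in line with the figure of about 35 τ quoted on the slide.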
[Figure: arbiter circuit with inputs Request 1 and Request 2, outputs Grant 1 and Grant 2, and Gnd]
Unlike a synchronizer, an arbiter may take forever.
It usually doesn’t; long responses are rare.
On average the time should only be τ longer than the
Outputs are always monotonic
[Figure: waveforms of Request 1, Request 2, Grant 1 and Grant 2, with metastability time tm]
Half levels due to metastability need to be removed
– Low (or high) threshold inverters
– Measure divergence
Filters define the time to reach a stable state
[Figure: filter circuits with thresholds Vt = Vdd/4 and Vdd/2]
[Figure: Volts (1.1 to 2.1) vs time (50 to 500 ps) for Low Start and High Start, with 1.55V marked]
Assumption: Time
Typically it is not. Can be as much
Here there are
Arbitration may
Distribution of event times
Time for metastability averaged over a distribution T in a MUTEX:
[Equation: average time in terms of T, ∆tin and Tw]
If T = ∞, Average_time = τ
If T = 4ps, Tw = 25ps, Average_time = 4τ
Noise can sometimes make average time faster
MUTEX circuits are fast BUT delay times may not be easy to predict
Synchronizers are known to be unreliable
Synchronizers and arbiters don’t work well in nanometre technologies, especially at low Vdd
Worse than gates! Why?
A gate input is either HIGH
– Output pulled down
Or LOW
– Output pulled up
A metastable gate is neither
– Both transistors can be off
Vdd decreases with process shrink,
– gm very low
Synchronization time constant τ = C/gm
[Figure: inverter currents Ids with the input at Vdd, at Ground, and at Vdd/2]
[Figure: synchronizer circuit; Data, Reset, Clock, Vdd, 0V]
Measurement Results (ps)
Vdd (V)   Jamb Latch B   Robust Synchronizer
1.8       35.55          34.92
1.7       37.29          35.76
1.6       40.93          38.25
1.5       52.36          43.07
1.4       66.17          50.36
1.35      75.35          58.19
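To see why these differences matter, the measurements can be turned into the number of time constants that fit in a fixed resolution time. The sketch assumes the tabulated values are the time constant τ in ps, that both circuits have the same Tw and clock rates, and that 1000 ps is allowed for resolution; that allowance is an illustrative value, not from the slides:

```python
import math

# Measured values in ps from the table, keyed by Vdd in volts.
JAMB_LATCH_B = {1.8: 35.55, 1.7: 37.29, 1.6: 40.93,
                1.5: 52.36, 1.4: 66.17, 1.35: 75.35}
ROBUST_SYNC = {1.8: 34.92, 1.7: 35.76, 1.6: 38.25,
               1.5: 43.07, 1.4: 50.36, 1.35: 58.19}

T_RESOLVE_PS = 1000.0  # assumed resolution time allowance, ps

def time_constants_available(tau_ps, t_ps=T_RESOLVE_PS):
    """How many time constants fit in the allowed resolution time."""
    return t_ps / tau_ps

for vdd in sorted(JAMB_LATCH_B, reverse=True):
    n_jamb = time_constants_available(JAMB_LATCH_B[vdd])
    n_robust = time_constants_available(ROBUST_SYNC[vdd])
    # exp(n_robust - n_jamb) is the MTBF ratio if Tw and clock rates are equal.
    ratio = math.exp(n_robust - n_jamb)
    print(f"Vdd {vdd:>4} V: jamb {n_jamb:5.1f} tau, "
          f"robust {n_robust:5.1f} tau, MTBF ratio {ratio:.3g}")
```

Since the MTBF depends exponentially on t/τ, a modest increase in τ at low Vdd removes several time constants from the same clock period.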
Probability of escape from metastability does not
[Figure: Trajectories, Volts vs Time]
Case T >> tn
– P1(v): probability of initial difference due to noise component
– P0(v): probability of initial difference due to input clock data overlap
– P(v): result of convolution of P0(v) and P1(v)
[Figure: Probability vs Time for T >> tn]
Case T << tn
– P1(v): probability of initial difference due to noise component
– P0(v): probability of initial difference with zero input clock data overlap
– P(v): result of convolution
[Figure: Probability vs Time for T << tn]
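The two cases can be checked numerically by convolving a flat distribution with a noise distribution. The sketch below assumes P0 is uniform with width T and P1 is a zero-mean Gaussian with standard deviation tn; both shapes, the function names and the arbitrary units are assumptions made for illustration:

```python
import math

def uniform_pdf(x, width):
    """P0: flat distribution of the given width, centred on zero."""
    return 1.0 / width if abs(x) <= width / 2 else 0.0

def gaussian_pdf(x, sigma):
    """P1: zero-mean Gaussian noise with standard deviation sigma."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def convolve_at(x, width, sigma, steps=2000):
    """Numerically evaluate P(x) = integral of P0(u) * P1(x - u) du."""
    du = width / steps
    total = 0.0
    for i in range(steps):
        u = -width / 2 + (i + 0.5) * du  # midpoint rule over the support of P0
        total += uniform_pdf(u, width) * gaussian_pdf(x - u, sigma) * du
    return total

T_N = 1.0  # noise spread, arbitrary units (assumed)
for width in (100.0 * T_N, 0.01 * T_N):
    centre = convolve_at(0.0, width, T_N)
    print(f"T = {width:g} tn: P(0) = {centre:.4g}, "
          f"uniform height = {1 / width:.4g}, "
          f"Gaussian peak = {gaussian_pdf(0.0, T_N):.4g}")
```

When T is much larger than tn the result near the centre follows the flat input distribution; when T is much smaller than tn it follows the noise distribution.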
Synchronizers and arbiters can produce non-
Some noise is deterministic (5-50ps)
– Power supply, crosstalk
Some noise is non-deterministic (typically 0.1ps -
– Thermal noise, which increases as dimensions reduce.
Sequence and time of an individual computation
System performance can only be predicted
[Figure: handshake between clock domains. REQ passes through two flip-flops on the Read Clocks to give Data Available; Read done passes through two flip-flops on the Write Clocks to give ACK; DATA is transferred directly]
It takes one - two receive clocks to
Then one – two write clocks to acknowledge it
Significant latency (1-3 clocks)
Poor data rate (2 – 6 Clocks)
Can improve data rate by using a FIFO
But not latency (which gets worse)
FIFO is asynchronous (usually RAM + read and write
[Figure: FIFO between the write side (Write Data, Free to write, Full; Write clock 1, Write clock 2) and the read side (Not Empty, Data Available, Read done; Read Clock 1, Read Clock 2), with two flip-flops on each side]
Phase locked (mesochronous)
Same or related frequencies but phase
Unrelated frequencies (Heterochronous)
If the two clocks are locked together, you don’t need
FIFO must never overflow/underflow, so there is
[Figure: FIFO with REQ IN and ACK IN on the write side, REQ OUT and ACK OUT on the read side; signals Write Data, Data Available, Read done]
Intermediate X register used to retime data
Need to find a place where write data is stable, and read register
– Chakraborty and Greenstreet ASYNC 2003
[Figure: registers W, X, R between DATA In and DATA Out, with a Controller, Write Clock and Read Clock]
[Figure: conflict detector circuit; DATA REG, RCLK, WCLK, delays d and tKO, SYNC]
Nominally 0 – 1 clock cycle
Relies on accurately predicting conflicts
Clocks must remain stable over
Always lose tKO of next computation stage
Alternative: shift all conflicts to next read
Mostly, the synchronizer does not need 30τ to
Only e^-13 (0.00023%) need more than 13τ
Why not go ahead anyway, and try again if
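The percentage quoted for 13τ follows from exponential escape from metastability. A minimal sketch, assuming the fraction of events still unresolved after n time constants is e^(-n):

```python
import math

def fraction_unresolved(n_tau):
    """Fraction of metastable events still unresolved after n_tau time constants."""
    return math.exp(-n_tau)

for n in (13, 30):
    p = fraction_unresolved(n)
    print(f"{n} tau: {p:.3g} ({p * 100:.2g}%)")
```

This is the basis of the speculative approach: waiting only 13τ is right for all but a tiny fraction of events, so the full wait is needed only to catch the rare failures.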
Data Available, or Free to write are produced early.
If they prove to be in error, synchronization failed.
Read Fail or Write Fail flag is then raised and the
[Figure: FIFO between the Write clock and Read Clock domains with a speculative synchronizer on each side; signals Write Data, Free to write, Full, Write Fail, Not Empty, Data Available, Read done, Read Fail]
1. Early Half Cycle - 2τ 2. Speculative Half Cycle Final End of Cycle Fail 1 & 2 different End of Cycle Comment
? ? metastable? metastable? Unrecoverable error, Probability low. No data was available 1 1 1 Stable at the end of the cycle, but the speculative output may have been metastable. Fail 1 1 1 Normal data Transfer
Things we know
Things we know we don’t know
Things we don’t know we don’t know
Slope, τ, is about 120ps (in fast region)
Typical delay time (most events) is 4ns
99.9% of clock cycles do not cause useful events
To get 1 event at 7ns requires hours
[Figure: measurement circuit; 10 MHz source, Variable Delay, Test FF (D, Q), Slave FF (D, Q)]
Test FF is driven to metastability
Every clock produces a metastable response
Integrator ensures half outputs high, half low
Clock to D (Input)
Q to Clock (Output)
Input time distribution is not flat
[Figure panels:
– Events vs D to Clock, ps
– Proportion of total inputs causing events vs input time (total input events normalized vs input time, ps)
– Proportion of total output events vs output time (total output events normalized vs output time, ns)
– Mapping output times to input times, with the balance point between 0 and 1]
∆t is the time from the “balance point” of ~200ps
Similar to original graph BUT not events
Orders of magnitude quicker to gather data
Reliability for days not minutes
Only one oscillator, so no distribution issues
∆t does not depend on fc and fd or measurement time. Events do
$$\Delta t = T_w e^{-t/\tau}$$
[Figure: Delta t (1.00E-17 to 1.00E-09) vs Q time (3.00 to 7.00 ns)]
Clock goes high, master goes
[Figure: Delta t (1E-18 to 1E-13) vs time (5.0 to 10.0 ns) for No Back Edge, 4.5 Back Edge and 5.5 Back Edge; Master Latch (M) and Slave latch (S) driven by Clock and Inverse Clock]
Back edge of clock causes increased delay
Master output arrives at slave
– Before slave clock high: transparent gate delay td
– As slave clock goes high: metastable, slightly longer delay
1 – 3 ns additional delay
[Figure: Input time (1.00E-21 to 1.00E-11) vs Output time (5.00E-09 to 1.20E-08) for 6 ns pulse, 5ns pulse, 4ns pulse and No back edge]
Reliability measurements extended from
– 10^-15 s or MTBF = 16 min at 10MHz, to
– 10^-22 s or MTBF = 3 years
We can see variations in τ not previously seen
Measurement is statistical, not affected by noise
Not affected by oscillator linking
Back edge of clock pulse is seen to be an important
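An input time resolution ∆t can be related to an MTBF by asking how often a data edge lands within ∆t of the balance point. The sketch below assumes MTBF = 1 / (∆t · fc · fd) and that both the clock and the data run at 10 MHz; the slide gives only "at 10MHz", so the pair of rates is an assumption:

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def mtbf_from_window(delta_t, f_c, f_d):
    """MTBF in seconds when inputs within delta_t of the balance point fail."""
    return 1.0 / (delta_t * f_c * f_d)

# Assumption: both the clock and the data run at 10 MHz.
F_C = F_D = 10e6
years = mtbf_from_window(1e-22, F_C, F_D) / SECONDS_PER_YEAR
print(f"delta t = 1e-22 s -> MTBF = {years:.1f} years")
```

Under these assumptions a resolution of 10^-22 s corresponds to an MTBF of roughly three years, matching the figure quoted for the extended measurements.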
Synchronization/arbitration requires special circuit
They’re not digital!
Well known models of synchronizers and arbiters
Design gets more difficult with small dimensions
Synchronizers and arbiters are not deterministic.
Both circuits and systems may not conform to
Differences can lead to poor reliability, unfair