Clocking & Timing Asynchronous Self Timed Design Self Timed - - PowerPoint PPT Presentation



SLIDE 1

Advanced Digital IC-Design

Clocking & Timing

Overview

  • Synchronous
  • Asynchronous
  • Self-Timed Design

Synchronous Circuit

[Figure: registers (REG, D/Q) separated by logic blocks between IN and OUT, all driven by a common CLK]

Global synchronization: clock period > max delay (t_logic + t_R)

Asynchronous Circuit

[Figure: registers (REG, D/Q) and logic blocks between IN and OUT; each stage has a handshake block that exchanges Req and Ack with its neighbours and Go/Done with its logic]

Local synchronization (handshaking): Request / Acknowledge

SLIDE 2

Globally Async Locally Sync (GALS)

[Figure: clocked domains, each with its own local clock, registers, and logic, connected through Req/Ack interfaces to an asynchronous environment]

Synchronous Design

The purpose of the clock is to:

  • Synchronize the registers on the chip with each other
  • Synchronize the registers on the chip with the external world

Clock skew is a large problem

Sequential Logic

Registers: latches and flip-flops

[Figure: combinational logic with its state held in registers]

Latch versus Register

  • Latch: level sensitive; stores data when the clock is low (or high)
  • Flip-flop (or register): edge triggered; stores data when the clock rises (or falls)

[Figure: latch and register symbols (D, Clk, Q) with waveforms; the latch output Q changes on data, the register output Q changes on the clock edge]

SLIDE 3

Clock Non-Idealities

Clock skew

Spatial variation in temporally equivalent clock edges

Clock jitter

Temporal variations in consecutive edges of the clock signal

Clock Non-Idealities

  • Both skew and jitter affect the cycle time
  • Skew might lead to races through the registers

[Figure: the same clock at two different locations on the chip, showing t_skew and t_jitter]

Clock Non-Idealities - Feedthrough

[Figure: dynamic circuit clocked by Φ with nodes A, B, C and output Q, plus waveforms of Φ and Q versus time (ns) showing clock feedthrough]

Clock feedthrough: coupling in dynamic devices can lift the output

Example – Clock System

[Figure: a phase-locked loop generates the global clock from the system clock; de-skew and enable signals produce the local clock signals for Modules 1 to 3; VDD is always on]

$$f_{CLK} = \frac{N}{M} \times f_{SYS}$$

On-chip clock generation, clock gating, clocked modules

SLIDE 4

Synchronous Pipelined Datapath

[Figure: input In feeds registers R1 to R4 separated by Logic Blocks #1 to #3 (delays t_pd1, t_pd2, t_pd3; register delay t_pd,reg); CLK reaches each register through a delay element]

The delays in the clock line give rise to clock skew

Clock Skew

10x10 mm chip: absolute skew versus relative skew

Example: 15 mm wire (ΔL = 15 mm), C = 300 fF, R = 4 kΩ

$$t_{pHL} = 0.69\,RC \approx 0.8\ \text{ns}$$

"Max frequency":

$$f_{max} = \frac{1}{2\,t_{pHL}} = \frac{1}{2 \times 0.8\ \text{ns}} \approx 600\ \text{MHz}$$
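The wire example can be re-checked numerically. The following is a minimal sketch, assuming the lumped-RC estimate t_pHL = 0.69RC and the "max frequency" rule 1/(2 t_pHL) quoted on the slide; the function names are illustrative and not part of the lecture material.

```python
def wire_delay_ns(r_ohm: float, c_farad: float) -> float:
    """Lumped-RC propagation delay t_pHL = 0.69 * R * C, returned in ns."""
    return 0.69 * r_ohm * c_farad * 1e9


def max_frequency_mhz(t_phl_ns: float) -> float:
    """Rule of thumb from the slide: f_max = 1 / (2 * t_pHL), returned in MHz."""
    return 1e3 / (2.0 * t_phl_ns)


# Slide values: R = 4 kOhm, C = 300 fF for the 15 mm clock wire.
t_phl = wire_delay_ns(4e3, 300e-15)
f_max = max_frequency_mhz(t_phl)
print(f"t_pHL = {t_phl:.3f} ns, f_max = {f_max:.0f} MHz")
```

Without intermediate rounding the delay is about 0.83 ns and the frequency limit is a little over 600 MHz, consistent with the rounded 0.8 ns and 600 MHz on the slide.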

Clock Skew

[Figure: clock and data routing through a chain of registers (REG) from In to Out, illustrating positive and negative skew]

Setup- and Hold-times

[Figure: data-bus and clock-line waveforms showing t_setup, t_hold, and t_jitter]

SLIDE 5

Clock Skew

[Figure: external clock (Ext. CLK) and the two-phase clocks φ1 and φ2 derived from it]

Large skew requires a large non-overlap between φ1 and φ2

General Clock Distribution Tree

[Figure: clock source feeding the root, trunk, branches, and leaves of the tree]

General trees have a large relative skew

Balanced Clock Net

  • Distributed buffers
  • All wires and buffers are carefully balanced

Clock Distribution: H-Tree

  • Small relative skew
  • Absolute skew of less importance

SLIDE 6

Clock Distribution: H-Tree

Realistic H-Tree

IBM G4 Processor

A balanced H-tree structure achieves a skew control of ±25 ps

Symmetric Clock Distribution Networks

  • H-tree and X-tree
  • Distributed buffers
  • Small relative skew
  • Absolute skew of less importance

SLIDE 7

Clock Grid

Low impedance interconnect

Clock

Power Hungry

Clock Deskewing

[Figure: clock deskewing with delay lines, a phase detector, and deskew control]

Clock Ring

[Figure: clock ring with averaging (AVG) elements producing the local clocks]

Example: Alpha 21164 (0.55um)

  • Clock frequency: 300 MHz
  • Transistors: 10 million
  • Total clock load: 3.75 nF
  • Clock power: 20 W (out of 50 W)
  • Clock levels: 2
  • Driver size: 58 cm
  • Clock grid, TSPC

SLIDE 8

Example: Alpha 21164

Clock Drivers

Example: Alpha 21164

600 MHz Alpha “Hybrid” Four clock grids under a balanced clock net

Clock

Relative Skew 72ps

600 MHz Alpha

SLIDE 9

Skew Analysis - Example

[Figure: datapath with registers R1, R2, and R3, logic blocks L, and a MUX; the clock clk is distributed with positive "clock skew"]

  • a. Determine the minimum clock period if clock skew is disregarded.
  • b. Determine the minimum clock period if there is 1 ns positive clock skew between adjacent registers.
  • c. Determine the minimum clock period if there is 3 ns positive clock skew between adjacent registers.
  • d. Calculate the maximum "clock skew" for the datapath, both positive and negative, if the clock signal has a period of 16 ns.

Register R setup time   t_S = 0.5 ns
Register R delay time   t_R = 0.5 ns
Logic L delay time      t_L = 3.0 ns
Mux delay time          t_M = 1.0 ns

a. Clock skew disregarded

R2 to R3: t_R + 3t_L + t_S = 0.5 + 3*3.0 + 0.5 = 10 ns
R2 to R2: t_R + 2t_L + t_M + t_S = 0.5 + 2*3.0 + 1.0 + 0.5 = 8 ns
R1 to R2: t_R + 2t_L + t_M + t_S = 0.5 + 2*3.0 + 1.0 + 0.5 = 8 ns
Answer: The minimum clock period is 10 ns

b. 1 ns positive clock skew between adjacent registers

R2 to R3: t_R + 3t_L + t_S - t_SKEW = 0.5 + 3*3.0 + 0.5 - 1 = 9 ns
Answer: The minimum clock period is 9 ns

c. 3 ns positive clock skew between adjacent registers

R2 to R3: t_R + 3t_L + t_S - t_SKEW = 0.5 + 3*3.0 + 0.5 - 3 = 7 ns
R2 to R2: t_R + 2t_L + t_M + t_S = 0.5 + 2*3.0 + 1.0 + 0.5 = 8 ns (no skew in feedback)
Answer: The minimum clock period is 8 ns, set by the feedback path

SLIDE 10

Skew Analysis - Example

d. Maximum "clock skew" for the datapath, both positive and negative, if the clock signal has a period of 16 ns

Negative skew, R2 to R3: 16 - (t_R + 3t_L + t_S) = 16 - 0.5 - 3*3.0 - 0.5 = 6 ns (6 ns for clk to reach R2 plus 10 ns for the signal through the logic)
Positive skew, R1 to R2: t_R + t_L + t_M + t_S = 0.5 + 3.0 + 1.0 + 0.5 = 5 ns (R2 must close before the signal arrives)
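The path sums in this example can be re-checked with a few lines of Python. This is a minimal sketch that hard-codes the path expressions exactly as they appear in the worked answers; the dictionary layout and the names `paths`, `min_period`, `max_negative_skew`, and `max_positive_skew` are illustrative assumptions, not part of the original exercise.

```python
# Timing parameters from the exercise, in ns.
T_S, T_R, T_L, T_M = 0.5, 0.5, 3.0, 1.0

# Longest register-to-register paths used in parts a-c.
# Each entry: (path delay including register delay and setup, skew applies?)
paths = {
    "R1->R2": (T_R + 2 * T_L + T_M + T_S, True),
    "R2->R2": (T_R + 2 * T_L + T_M + T_S, False),  # feedback: no skew
    "R2->R3": (T_R + 3 * T_L + T_S, True),
}


def min_period(skew: float) -> float:
    """Minimum clock period for a given positive skew between adjacent registers."""
    return max(delay - skew if skewed else delay for delay, skewed in paths.values())


PERIOD = 16.0
# Part d, as computed on the slide.
max_negative_skew = PERIOD - paths["R2->R3"][0]
max_positive_skew = T_R + T_L + T_M + T_S  # shortest R1->R2 path used on the slide

for skew in (0.0, 1.0, 3.0):
    print(f"skew = {skew} ns -> minimum period = {min_period(skew)} ns")
print(f"max negative skew = {max_negative_skew} ns, max positive skew = {max_positive_skew} ns")
```

With 3 ns of skew the forward paths shrink below the 8 ns feedback path, which is why the feedback loop sets the period in part c.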

Synchronizing Signals (Metastability)

Signals may come from asynchronous domains, or from synchronous domains with different clock periods

[Figure: an asynchronous system connected to a synchronous system through a synchronization stage]

Metastable state: a possible output from a flip-flop

Synchronizing Signals (Metastability)

  • Many designers are not aware of metastability
  • It can occur if the setup time t_SU, the hold time t_H, or the clock pulse width t_PW of a flip-flop is not met
  • t_res is important for MTBF

[Figure: aperture window and resolution time t_res]

Synchronizing Signals (Metastability)

[Figure: flip-flop with DATA IN, CLK, and output Q1; waveforms show t_W, t_SU, t_res, and t_CO, with Q1 resolving to "1" or "0"]

  • t_W = time window where an input transition may cause a metastable condition
  • t_SU = actual clock setup time for the flip-flop
  • t_CO = actual flip-flop propagation delay
  • t_res = metastability resolution time

SLIDE 11

Metastability

$$\text{MTBF} = \frac{e^{K_2 \times t_{res}}}{K_1 \times f_{CLK} \times f_{DATA}}$$

  • Mean Time Between Failures (MTBF) is exponential in t_res
  • t_res is the slack time available for settling
  • K_1 and K_2 are constants that are characteristics of the flip-flop
  • f_CLK and f_DATA are the frequencies of the synchronizing clock and the asynchronous data

MTBF variations due to the metastability resolution time t_res

[Figure: MTBF in seconds (logarithmic axis from 10^1 to 10^11, with 1 hour, 1 day, 1 month, 1 year, and 1000 years marked) versus t_res = available slack time (2 to 10 ns), for ACTEL ACT 1 devices with f_DATA = 1 MHz and f_CLOCK = 10 MHz]
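The exponential dependence of MTBF on t_res can be illustrated numerically. The sketch below assumes the usual form of the relation, MTBF = e^(K2 t_res) / (K1 f_CLK f_DATA); the constants `K1` and `K2` are hypothetical placeholders chosen only for illustration and are not ACT 1 device data.

```python
import math


def mtbf_seconds(t_res: float, f_clk: float, f_data: float, k1: float, k2: float) -> float:
    """MTBF = exp(K2 * t_res) / (K1 * f_CLK * f_DATA); SI units give seconds."""
    return math.exp(k2 * t_res) / (k1 * f_clk * f_data)


# Hypothetical flip-flop constants (assumptions, not measured values).
K1 = 1e-9  # seconds
K2 = 5e9   # 1/seconds

# Frequencies from the plot: f_CLOCK = 10 MHz, f_DATA = 1 MHz.
for t_res_ns in (2, 4, 6, 8, 10):
    mtbf = mtbf_seconds(t_res_ns * 1e-9, 10e6, 1e6, K1, K2)
    print(f"t_res = {t_res_ns:2d} ns -> MTBF = {mtbf:.2e} s")
```

Whatever the actual constants are, each additional nanosecond of slack multiplies the MTBF by e^(K2 x 1 ns), which is why a modest amount of extra settling time changes the failure rate by orders of magnitude.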

Synchronizer

[Figure: two flip-flops FF1 and FF2 in series, clocked by a global low-skew clock CLK; the asynchronous input Da enters FF1, Q1 feeds FF2, and Q2 is the synchronized signal Ds]

  • If D changes within the aperture time (setup + hold) of the flip-flop, Q1 is uncertain
  • However, FF2 may still register proper data
  • Much higher probability for a stable Q2 than for a stable Q1

Synchronizer

[Figure: waveforms of CLK, D, Q1, and Q2 for an asynchronous input D that causes a timing violation at FF1]

  • A timing violation leads to metastability
  • The value is correct in the next register if Q1 has become stable

SLIDE 12

Synchronous - Asynchronous

Synchronous
  • Clock skew
  • Worst-case delay sets the speed

Asynchronous
  • Non-trivial design task due to races

Solution
  • Self-timed design?

Why Asynchronous Circuits?

Common arguments:

  • Low power - Maybe
  • High speed - Sometimes
  • Low emission - Yes
  • Low sensitivity to Process, Voltage, and Temperature variations - Yes
  • No clock distribution and timing problems - Yes
  • No clock skew problems - Yes
  • Less interference to analog domain - Yes

Drawbacks - Asynchronous Design

  • Increased complexity and design time
  • Poor support from design tools
  • Circuit overhead compared to synchronous design; 100% is not unusual
  • Metastability, deadlock, and race hazards

Motivation - Asynchronous Design

[Figure: supply current in two designs, one synchronous and one asynchronous]

Asynchronous designs are more noise robust

SLIDE 13

Noise in Supply Plane

[Figure: supply-plane noise for a synchronous DSP and an asynchronous DSP]

Source: James Awad, Octasic Semiconductor

Asynchronous Modules

[Figure: asynchronous modules, each with a logic block (go/done) and a handshake block, connected by req, ack, and data]

The most Basic Protocol

1. The sender issues a request
2. The receiver replies with an acknowledge
3. Then the sender sends the data

[Figure: Module 1 and Module 2 exchanging 1. Req, 2. Ack, and 3. Data (n bits)]

  • If the sender initiates the data transfer, the transfer channel is a push-channel
  • If the receiver initiates the transfer, the channel is a pull-channel

The Two-Phase Protocol

  • 1. The sender establishes stable data
  • 2. The sender produces a request
  • 3. The receiver absorbs the data and produces an acknowledge

Events are signalled on both rising and falling edges (no return-to-zero transitions)

[Figure: Data, Req, and Ack waveforms over Cycle 1 and Cycle 2, with events 1 to 3 marked]

SLIDE 14

The Four-Phase Protocol

1. The sender issues data and sets Req to high
2. The receiver absorbs the data and sets Ack to high
3. The sender responds by setting Req to low
4. The receiver acknowledges by setting Ack to low

Return-to-zero transitions

[Figure: Data, Req, and Ack waveforms over Cycle 1 and Cycle 2, with events 1 to 4 marked]
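The difference between the four-phase protocol and the two-phase protocol described earlier can be made concrete by listing the Req/Ack transitions each one produces. This is a minimal behavioural sketch, not a circuit model; the function names and the trace format are illustrative assumptions.

```python
def four_phase(transfers: int) -> list[tuple[str, int]]:
    """Req/Ack transitions of a four-phase (return-to-zero) handshake."""
    trace = []
    for _ in range(transfers):
        trace.append(("Req", 1))  # 1. sender issues data and sets Req high
        trace.append(("Ack", 1))  # 2. receiver absorbs data and sets Ack high
        trace.append(("Req", 0))  # 3. sender sets Req low
        trace.append(("Ack", 0))  # 4. receiver sets Ack low
    return trace


def two_phase(transfers: int) -> list[tuple[str, int]]:
    """Req/Ack transitions of a two-phase handshake (no return-to-zero)."""
    req = ack = 0
    trace = []
    for _ in range(transfers):
        req ^= 1                    # any Req edge, rising or falling, is a request
        trace.append(("Req", req))
        ack ^= 1                    # any Ack edge is an acknowledge
        trace.append(("Ack", ack))
    return trace


print(len(four_phase(2)), len(two_phase(2)))  # prints "8 4"
```

Four-phase signalling spends four transitions per transfer but always returns Req and Ack to zero; two-phase signalling needs only two transitions per transfer, at the cost of logic that reacts to both edges.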

The Muller-C Element

[Figure: Muller C-element symbol with inputs A and B and output Q, with static and dynamic CMOS implementations]

A  B  Q
0  0  0
0  1  Q (unchanged)
1  0  Q (unchanged)
1  1  1
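The behaviour of the Muller C-element is easy to state in code: the output follows the inputs when they agree and holds its previous value when they differ. The class below is a minimal behavioural sketch; the class and method names are illustrative and say nothing about the transistor-level static or dynamic implementations.

```python
class MullerC:
    """Behavioural model of a two-input Muller C-element."""

    def __init__(self, q: int = 0) -> None:
        self.q = q  # stored output value

    def update(self, a: int, b: int) -> int:
        """Apply inputs A and B and return the output Q."""
        if a == b:
            self.q = a  # both inputs agree: output follows them
        return self.q   # inputs differ: output keeps its old value


c = MullerC()
for a, b in [(1, 0), (1, 1), (0, 1), (0, 0)]:
    print(f"A={a} B={b} -> Q={c.update(a, b)}")
```

Starting from Q = 0, the sequence prints Q = 0, 1, 1, 0: the output rises only after both inputs are high and falls only after both are low, which is what makes the element useful for handshaking.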

Two-Phase Handshake Protocol

Implementation using Muller-C elements

[Figure: sender logic and receiver logic exchanging n-bit Data, Req, and Ack through a Muller C-element (C), with "Data ready" and "Data accepted" signals; the C-element truth table is repeated alongside]

Four-Phase Handshake Protocol

Implementation using Muller-C elements

[Figure: sender logic and receiver logic joined by handshake logic built from Muller C-elements; signals Data, Data ready, Data accepted, Req, and Ack]

SLIDE 15

Clocking & Timing

Advanced Digital IC-Design

Clocking & Timing Cont.

Student Lectures

  • Send your slides to me no later than the night before your presentation
  • Preferred format: .ppt
  • You will be evaluated by your friends
  • Please look at the template: http://www.eit.lth.se/course/eti135 -> Presentations

Home Exercises

Solutions to 4 hand-in assignments are required, see http://www.eit.lth.se/course/eti135 -> Home Exercises

Deadline: March 8

Invited Lecture

Advanced Digital IC Design

  • Static timing analysis, 11/02, 15.15-17.00
  • Design for test is canceled

SLIDE 16

Circuit Implementation Styles

  • Four-phase bundled-data – which most closely resembles the design of synchronous circuits and which normally leads to the most efficient circuits, due to the extensive use of timing assumptions (example: Amulet 2 processor).
  • Two-phase bundled-data – also known as micropipelines and introduced by Ivan Sutherland in his 1988 Turing Award lecture (example: Amulet 1 processor).
  • Four-phase dual-rail – the classic approach introduced by Muller's pioneering work in the 1950s.
  • Two-phase dual-rail – such as the Level-Encoded two-phase Dual-Rail scheme (LEDR).

2-Phase Protocol Example

From [Horowitz]

Example

SLIDE 17

Example: Completion Signal Generation

[Figure: dual-rail gate with complementary pull-down networks (PDN) controlled by Start, producing outputs B0 and B1]

Phase        B0  B1  B  Comment
Precharge    0   0   0  Not done
Evaluation   0   1   1  Done
Evaluation   1   0   1  Done
Illegal      1   1   -  Illegal

Dual rail is used in self-timed modules
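Completion detection for the dual-rail encoding above can be sketched in a few lines. The sketch assumes the usual convention that both rails are low during precharge, exactly one rail is high once a bit has evaluated, and both rails high is illegal; the function names are illustrative, and the word-level `all` stands in for the tree of C-elements a real circuit would use.

```python
def bit_done(b0: int, b1: int) -> int:
    """Completion signal B for one dual-rail bit (B0, B1)."""
    if b0 and b1:
        raise ValueError("illegal dual-rail code: B0 = B1 = 1")
    return b0 | b1  # 0 during precharge, 1 once either rail has fired


def word_done(bits: list[tuple[int, int]]) -> int:
    """A word is done only when every dual-rail bit has completed."""
    return int(all(bit_done(b0, b1) for b0, b1 in bits))


print(word_done([(0, 1), (1, 0)]))  # both bits evaluated
print(word_done([(0, 1), (0, 0)]))  # second bit still precharged
```

The first call reports completion and the second does not, matching the idea that the Done signal waits for all parts to be ready.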

Self-Timed Pipelining

[Figure: pipeline In, R1, F1, R2, F2, R3, F3, Out with propagation delays tp1, tp2, and tp3; each function block exchanges Start/Done with a handshake block, and neighbouring handshake blocks exchange Req and Ack]

SLIDE 18

Delay Matched Completion Detection

[Figure: pipeline In, R1, F1, R2, F2, Out; for each stage a delay model (e.g. of the critical path) turns Start into Done for the handshake block, and the handshake blocks exchange Req and Ack]

  • Delay replicas matched to critical paths
  • Worst-case delay
  • Sensitive to process variations
  • Small circuit overhead

Combined Methods

[Figure: pipeline In, R1, R2, R3, Out with delays tp1, tp2, and tp3; two self-timed stages and one delay-model stage each exchange Start/Done with a handshake block, and the handshake blocks exchange Req and Ack]

Completion Detection

[Figure: Done signal generated by combining partial completion signals with Muller C-elements]

Dual-rail logic: waits for all parts to be ready

SLIDE 19

Other Asynchronous Modules

  • Linear pipelines (only one input and one output)
  • Non-linear pipelines: fork, join, conditional split, conditional join

[Figure: function blocks F arranged as linear and non-linear pipelines]

Synchronous – Asynchronous

[Figure: design styles classified by global and local timing, each synchronous or asynchronous, with the regions labelled Traditional, GALS, and Practiced]

Globally Asynchronous Locally Synchronous

  • Divide into smaller synchronous blocks
  • Clocking becomes less troublesome for small clock domains

Globally Asynchronous Locally Synchronous to Avoid Skew

Synchronous - Asynchronous

Local synchronous clock generation

[Figure: digitally controlled oscillator with a delay control counter and an up/down cycle counter; signals are the input reference, input delay state, multiplication factor, and output clock]

SLIDE 20

Higher On-Chip Frequency Clock Generation (PLL)

[Figure: PLL with phase detector, loop filter, voltage-controlled oscillator (VCO), and divider, generating the on-chip clock from the off-chip clock]

PLL (AXIS)