The Design of Low-Latency Interfaces for Mixed-Timing Systems

Tiberiu Chelcea and Steven M. Nowick, Department of Computer Science, Columbia University. IEEE Workshop on Complexity-Effective Design (ISCA), May 26, 2002.


slide-1
SLIDE 1

The Design of Low-Latency Interfaces for Mixed-Timing Systems

Tiberiu Chelcea and Steven M. Nowick

Department of Computer Science, Columbia University

IEEE Workshop on Complexity-Effective Design (ISCA), May 26, 2002

slide-2
SLIDE 2

Trends and Challenges

Trends in Chip Design: next decade

  • “Semiconductor Industry Association (SIA) Roadmap” (97-8)

Unprecedented Challenges:

  • complexity and scale (= size of systems)
  • clock speeds
  • power management
  • reusability & scalability
  • “time-to-market”

Design becoming unmanageable using a centralized single clock (synchronous) approach…

slide-3
SLIDE 3

Trends and Challenges (cont.)

  • 1. Clock Rate:

      • 1980: several MegaHertz
      • 2001: ~750 MegaHertz - 1+ GigaHertz
      • 2004: several GigaHertz

Design Challenge:

  • “clock skew”: clock must be near-simultaneous across entire chip

slide-4
SLIDE 4

Trends and Challenges (cont.)

  • 2. Chip Size and Density:

Total # Transistors per Chip: 60-80% increase/year

      • ~1970: 4 thousand (Intel 4004)
      • today: 10-100+ million
      • 2004 and beyond: 100 million-1 billion

Design Challenges:

  • system complexity, design time, clock distribution
  • clock will not reach across chip in 1 cycle

slide-5
SLIDE 5

Trends and Challenges (cont.)

  • 3. Power Consumption

  • Low power: ever-increasing demand
      • consumer electronics: battery-powered
      • high-end processors: avoid expensive fans, packaging

Design Challenge:

  • clock inherently consumes power continuously
  • “power-down” techniques: only partly effective

slide-6
SLIDE 6

Trends and Challenges (cont.)

  • 4. Time-to-Market, Design Re-Use, Scalability

Increasing pressure for faster “time-to-market”. Need:

  • reusable components: “plug-and-play” design
  • scalable design: easy system upgrades

Design Challenge: mismatch w/ central fixed-rate clock

slide-7
SLIDE 7

Trends and Challenges (cont.)

  • 5. Future Trends: “Mixed Timing” Domains

Chips themselves becoming distributed systems….

  • contain many sub-regions, operating at different speeds

Design Challenge: breakdown of single centralized clock control

slide-8
SLIDE 8

Introduction

Example: System-on-a-Chip (SoC) Design

  • Building an entire large-scale system on a single chip
  • Benefit: higher level of integration
      • Improved performance, cost, area
  • Challenges:
      • Mixed-timing: moving to multiple timing domains
      • Performance degradation: synchronization overhead
      • Complexity, scale, integration
      • Designing and incorporating asynchronous subsystems

slide-9
SLIDE 9

Future Chips

[Figure: a chip partitioned into an Asynchronous Domain, Synchronous Domain 1, and Synchronous Domain 2]

slide-10
SLIDE 10

Research Areas

[Figure: a chip partitioned into an Asynchronous Domain, Synchronous Domain 1, and Synchronous Domain 2]

Goal #1: interface mixed-timing domains with low latency

Goal #2: synthesis + optimization of asynchronous systems

slide-11
SLIDE 11

Summary: Key Challenges in System Design

Two key issues not yet completely addressed:

  • 1. Communication between mixed-timing domains:

      • Goals: performance and scalability

  • 2. Synthesis of large-scale asynchronous systems:

      • Goals: develop powerful optimizing CAD tools, facilitating “design-space exploration”

slide-12
SLIDE 12

Asynchronous Design: Motivation

Need for large-scale asynchronous systems:

  • Future chips: likely a mix of async and sync domains

Asynchronous Systems: offer a number of advantages

GALS: “globally-asynchronous, locally-synchronous”

  • Hybrid style: introduced by Chapiro [84]
      • synchronous “processing elements” (“satellites”)
      • asynchronous communication
  • Recent interest: “Communication-Based Design”
      • UC Berkeley/Stanford: W. Dally, K. Keutzer, A. Sangiovanni
      • orthogonalization of concerns: function vs. communication

slide-13
SLIDE 13

Asynchronous Design: Potential Advantages

  • Modularity:
      • Interface easily with sync domains & environment
  • Reusability and scalability:
      • Handle wide range of interface speeds ⇒ reuse
      • Scalability: easily add new subsystems
  • Average-case performance:
      • Intel RAPPID instruction-length decoder: 3-4x faster than sync design
      • Differential equation solver: 1.5x faster than sync design
  • Lower power consumption:
      • Avoids clock distribution power
      • Provides automatic “clock gating” … at arbitrary granularity
      • Digital hearing aid chip: 4-5.5x less power
  • Low electromagnetic interference (EMI): no regular clock spikes
      • Philips, commercial 80c51 microcontrollers: in cell phones, pagers

Industrial interest: Intel, Sun, IBM, Philips, Theseus, Fulcrum

slide-14
SLIDE 14

Related Work #1: Interfacing in Single Clock Domain

Handling Timing Discrepancies:

Clock Skew:

  • STARI Chip [M. Greenstreet, ICCD-95]
      • Use async buffer to smooth out discrepancies between sender and receiver
  • Skew-Tolerant Domino [M. Horowitz]
  • Clock-Skew Scheduling [E. Friedman]
  • Long interconnect delays [Carloni99]: limited to single clock

Long Interconnect Delays:

  • “Relay Stations” [Carloni, Sangiovanni-Vincentelli, DAC-00]
      • Break up overlong wires by pipelining communication

slide-15
SLIDE 15

Related Work: Interfacing Mixed-Timing Domains

Two common approaches:

  • Modify Receiver’s Clock:
      • “stretchable” and “pausible” clocks
      • Chapiro84, Yun96, Bormann97, Sjogren/Myers97, Moore02
      • drawbacks:
          • Penalties in restarting clock
          • Does not support design reuse
  • Use Synchronization Components:
      • data/control synchronization
      • Seitz80, Seizovic94, Intel97, Sarmenta95, Kol98
      • drawbacks: overheads in throughput, latency, area
slide-16
SLIDE 16

Contribution: Mixed-Timing Interfaces

A complete family of mixed-timing FIFO’s

Characteristics:

  • Low-latency
  • Modular and scalable:
      • Define interfaces for each combination of synchronous or asynchronous domains
      • Combine interfaces to design new async/sync FIFO’s
  • High throughput:
      • In steady state: no synchronization overhead, no failure probability
      • Enqueue/Dequeue data items: one/cycle
  • Low area overheads

Also, solve issue of long interconnect delays between domains

slide-17
SLIDE 17

Contribution: Mixed-Timing Interfaces

Publications

Latest Solution:

IEEE/ACM Design Automation Conference (DAC, June 2001)

  • T. Chelcea and S.M. Nowick, “Robust Interfaces for Mixed-Timing Systems with Application to Latency-Insensitive Protocols”

Initial Solution:

IEEE Computer Society Workshop on VLSI (WVLSI, April 2000)

  • T. Chelcea and S.M. Nowick, “A Low-Latency FIFO for Mixed-Clock Systems”

See also:

  • A. Iyer and D. Marculescu, ISCA-02.
slide-18
SLIDE 18

Outline

I. Mixed-Timing Interface Circuits

  • Sync/Sync
  • Async/Async
  • Async/Sync

II. Handling Long Interconnect Delays

Experimental Results

Conclusions

slide-19
SLIDE 19

Part I: Mixed-Timing Interface Circuits

slide-20
SLIDE 20

Mixed-Timing Interfaces: Overview

[Figure: Asynchronous Domain, Synchronous Domain 1, and Synchronous Domain 2 exchanging data]

Problem: potential data synchronization errors

slide-21
SLIDE 21

Mixed-Timing Interfaces: Overview

[Figure: Asynchronous Domain, Synchronous Domain 1, and Synchronous Domain 2, connected through Async-Sync FIFO’s, a Sync-Async FIFO, and Mixed-Clock FIFO’s]

Problem: potential data synchronization errors

Solution: insert mixed-timing FIFO’s ⇒ safe data transfer

slide-22
SLIDE 22

Mixed-Clock FIFO: Block Level

[Figure: Mixed-Clock FIFO block diagram. Synchronous put interface: full, req_put, data_put, CLK_put. Synchronous get interface: req_get, valid_get, empty, data_get, CLK_get.]

slide-23
SLIDE 23

Mixed-Clock FIFO: Block Level

[Figure: Mixed-Clock FIFO block diagram with a synchronous put interface and a synchronous get interface]

Synchronous put interface:

  • req_put: initiates put operations
  • data_put: bus for data items
  • CLK_put: controls put operations

Synchronous get interface:

  • req_get: initiates get operations
  • data_get: bus for data items
  • CLK_get: controls get operations

slide-24
SLIDE 24

Mixed-Clock FIFO: Block Level

[Figure: Mixed-Clock FIFO block diagram with a synchronous put interface and a synchronous get interface]

  • full: indicates when FIFO full
  • empty: indicates when FIFO empty
  • valid_get: indicates data items validity (always 1 in this design)

slide-25
SLIDE 25

Mixed-Clock FIFO: Architecture

[Figure: Mixed-Clock FIFO architecture: a row of identical cells; Put Controller and Full Detector on the put side (full, req_put, data_put, CLK_put); Get Controller and Empty Detector on the get side (CLK_get, data_get, req_get, valid_get, empty)]

slide-26
SLIDE 26

Mixed-Clock FIFO: Architecture

[Figure: Mixed-Clock FIFO architecture, highlighting the cells]

  • Array of identical cells
  • Token Ring Architecture

slide-27
SLIDE 27

Mixed-Clock FIFO: Architecture

[Figure: Mixed-Clock FIFO architecture, highlighting the Put Interface]

Put Interface: common data/control buses for put interface

slide-28
SLIDE 28

Mixed-Clock FIFO: Architecture

[Figure: Mixed-Clock FIFO architecture, highlighting the Put Token Ring]

Put Token Ring:

  • Put Token: used to enqueue data items
  • Cell with put token = tail of queue

slide-29
SLIDE 29

Mixed-Clock FIFO: Architecture

[Figure: Mixed-Clock FIFO architecture, highlighting the Full Detector and Put Controller]

Full Detector: detects when FIFO full

Put Controller:

  • enables & disables put operations
  • stalls put interface when FIFO full
slide-30
SLIDE 30

Mixed-Clock FIFO: Architecture

[Figure: Mixed-Clock FIFO architecture, highlighting the Get Interface and the Get Token Ring]

Get Token Ring:

  • Get Token: used to dequeue data items
  • Cell with get token = head of queue

slide-31
SLIDE 31

Mixed-Clock FIFO: Architecture

[Figure: Mixed-Clock FIFO architecture, highlighting the Empty Detector and Get Controller]

Empty Detector: detects when FIFO empty

Get Controller:

  • enables & disables get operations
  • stalls get interface when FIFO empty
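
The token-ring organization above can be sketched as a small behavioural model. This is a sketch only: the class and method names are hypothetical, both interfaces are collapsed into ordinary method calls, and full/empty are computed exactly (no clocks, no synchronizer latency).

```python
class TokenRingFifo:
    """Behavioural sketch of a FIFO built as a ring of identical cells.

    Hypothetical model: no clock domains, exact full/empty detection.
    """

    def __init__(self, n_cells):
        self.n = n_cells
        self.reg = [None] * n_cells     # one data register per cell
        self.f = [False] * n_cells      # per-cell "full" status bit
        self.ptok = 0                   # cell holding the put token = tail of queue
        self.gtok = 0                   # cell holding the get token = head of queue

    def full(self):
        return all(self.f)

    def empty(self):
        return not any(self.f)

    def put(self, item):
        """Enqueue into the cell with the put token; stall (return False) when full."""
        if self.full():
            return False
        self.reg[self.ptok] = item
        self.f[self.ptok] = True
        self.ptok = (self.ptok + 1) % self.n   # pass the put token to the next cell
        return True

    def get(self):
        """Dequeue from the cell with the get token; returns (valid, item)."""
        if self.empty():
            return (False, None)
        item = self.reg[self.gtok]
        self.f[self.gtok] = False
        self.gtok = (self.gtok + 1) % self.n   # pass the get token to the next cell
        return (True, item)
```

Data items never move between cells; only the two tokens circulate, which is why every cell can share common put and get buses.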
slide-32
SLIDE 32

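Mixed-Clock FIFO: Cell Implementation

[Figure: FIFO cell with a data register (REG), an SR latch, and two latches with enables (En). Put side: CLK_put, en_put, req_put, data_put, ptok_in, ptok_out. Get side: CLK_get, en_get, valid, data_get, gtok_in, gtok_out. Status outputs: f_i, e_i.]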

slide-33
SLIDE 33

Mixed-Clock FIFO: Cell Implementation

[Figure: FIFO cell, highlighting the GET INTERFACE (CLK_get, en_get, valid, data_get) and the PUT INTERFACE (CLK_put, en_put, req_put, data_put)]

slide-34
SLIDE 34

Mixed-Clock FIFO: Cell Implementation

[Figure: FIFO cell, highlighting the enable signals and data buses]

GET INTERFACE:

  • en_get: enables get operation
  • valid, data_get: data bus, item out

PUT INTERFACE:

  • en_put: enables put operation
  • valid, data_put: data bus, item in

slide-35
SLIDE 35

Mixed-Clock FIFO: Cell Implementation

[Figure: FIFO cell, highlighting the status bits]

Status Bits:

  • f_i: cell FULL
  • e_i: cell EMPTY

slide-36
SLIDE 36

Mixed-Clock FIFO: Cell Implementation

[Figure: FIFO cell, highlighting the two latches with enables (En)]

Token Passing: ptok_in, ptok_out, gtok_in, gtok_out

slide-37
SLIDE 37

Mixed-Clock FIFO Cell: Put Operation

Simulation #1: Put Operation

[Figure: FIFO cell]

Cell Has Put Token: ptok_in = 1

slide-38
SLIDE 38

Mixed-Clock FIFO Cell: Put Operation

[Figure: FIFO cell, highlighting en_put, valid, and data_put]

Put Request Arrives

slide-39
SLIDE 39

Mixed-Clock FIFO Cell: Put Operation

[Figure: FIFO cell, highlighting en_put, valid, data_put, and f_i]

Data Latch Enabled

“FULL CELL” Asserted: f_i = 1

slide-40
SLIDE 40

Mixed-Clock FIFO Cell: Put Operation

[Figure: FIFO cell]

NEXT CLK: Data Latched

NEXT CLK: Token Passed (ptok_in = 0, ptok_out = 1)
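
The put sequence on the preceding slides (token held, request arrives, latch enabled and cell marked full, then data latched and token passed on the next clock) can be sketched as below. This is a sketch under assumptions: the class, its fields `latch_enabled` and `pending`, and the method names are hypothetical and model only the put side of one cell.

```python
from dataclasses import dataclass


@dataclass
class CellPutSide:
    """Hypothetical model of the put side of one FIFO cell."""
    ptok: bool = False           # cell currently holds the put token
    f_i: bool = False            # "cell full" status bit
    reg: object = None           # contents of the data register (REG)
    latch_enabled: bool = False  # data latch enable, set by a put request
    pending: object = None       # value currently on the data_put bus

    def put_request(self, en_put, data_put):
        """Put request arrives: only the cell holding the token reacts."""
        if self.ptok and en_put:
            self.latch_enabled = True   # data latch enabled
            self.pending = data_put
            self.f_i = True             # "FULL CELL" asserted

    def clk_put_edge(self):
        """Next CLK_put edge: latch the data and pass the token.

        Returns ptok_out (True when the token leaves this cell).
        """
        if not self.latch_enabled:
            return False
        self.reg = self.pending         # data latched
        self.latch_enabled = False
        self.ptok = False               # token passed to the next cell
        return True
```

A cell without the token ignores the request entirely, which is what lets all cells share one put bus.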

slide-41
SLIDE 41

Mixed-Clock FIFO Cell: Get Operation

Simulation #2: Get Operation

[Figure: FIFO cell]

slide-42
SLIDE 42

Mixed-Clock FIFO Cell: Get Operation

[Figure: FIFO cell]

Cell Has Get Token: gtok_in = 1

slide-43
SLIDE 43

Mixed-Clock FIFO Cell: Get Operation

[Figure: FIFO cell, gtok_in = 1]

Get Request Arrives

slide-44
SLIDE 44

Mixed-Clock FIFO Cell: Get Operation

[Figure: FIFO cell, gtok_in = 1]

Tri-State Buffers Enabled

“EMPTY CELL” Asserted: f_i = 0, e_i = 1

slide-45
SLIDE 45

Mixed-Clock FIFO Cell: Get Operation

[Figure: FIFO cell, gtok_in = 1, f_i = 0, e_i = 1]

Data Broadcast on Get Bus
slide-46
SLIDE 46

Mixed-Clock FIFO Cell: Get Operation

[Figure: FIFO cell, f_i = 0, e_i = 1]

NEXT CLK: Token Passed (gtok_in = 0, gtok_out = 1)
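
The get sequence can be sketched the same way. In this hypothetical model each cell is a plain dictionary, the shared get bus is the function's return value, and one call stands for one CLK_get cycle; none of these names come from the original design.

```python
def get_cycle(cells, en_get):
    """One CLK_get cycle over a ring of cells (hypothetical sketch).

    Each cell is a dict with keys "gtok", "f_i" and "reg".
    Returns the value driven on the shared get bus, or None if no cell drives it.
    """
    n = len(cells)
    for i, cell in enumerate(cells):
        if cell["gtok"] and en_get:
            bus = cell["reg"]                   # tri-state buffers enabled: data broadcast
            cell["f_i"] = False                 # "EMPTY CELL" asserted (f_i = 0, e_i = 1)
            cell["gtok"] = False                # next CLK: token passed ...
            cells[(i + 1) % n]["gtok"] = True   # ... to the next cell in the ring
            return bus
    return None                                 # bus left undriven
```

Only the cell holding the get token drives the bus, so the tri-state outputs of all cells can be tied together safely.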

slide-47
SLIDE 47

Synchronization Issues: Overview

Challenge: highly concurrent behavior

  • Global FIFO state controlled by two different clocks

Problem #1: Metastability

  • Each FIFO interface needs clean state signals

Solution #1: Synchronize “full” & “empty” signals

  • “full” with CLK_put
  • “empty” with CLK_get

Add 2 synchronizing latches each

slide-48
SLIDE 48

Mixed-Clock FIFO: Full/Empty Detectors

Problem #2: FIFO now may underflow/overflow!

  • synchronizing latches add extra latency

Solution #2: Change Full/Empty definitions

  • New FULL: 0 or 1 empty cells left
  • New EMPTY: 0 or 1 full cells left

[Figure: New Full Detector: the cell-empty bits e_0, e_1, e_2, e_3 are combined and passed through synchronizing latches clocked by CLK_put to produce “full”]

Observable full/empty safely approximate FIFO’s state

slide-49
SLIDE 49

Mixed-Clock FIFO: Full/Empty Detectors

[Figure: New Full Detector, highlighting the detection of two consecutive empty cells among e_0, e_1, e_2, e_3]

slide-50
SLIDE 50

Mixed-Clock FIFO: Full/Empty Detectors

[Figure: New Full Detector]

FIFO “not full” = ≥ two consecutive empty cells

slide-51
SLIDE 51

Mixed-Clock FIFO: Full/Empty Detectors

[Figure: New Full Detector]

FIFO “full” = NO two consecutive empty cells
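
A sketch of the redefined full detector follows. Because the occupied cells of a token-ring FIFO form one contiguous run, "0 or 1 empty cells left" is the same condition as "no two consecutive empty cells". The function and class names, the wrap-around pairing of the last and first cell, and the reset state of the latches are assumptions for illustration.

```python
def not_full_raw(e):
    """True when two consecutive cells are empty (ring order, wrap-around assumed)."""
    n = len(e)
    return any(e[i] and e[(i + 1) % n] for i in range(n))


class FullDetector:
    """Hypothetical model: raw 'full' passed through synchronizing latches on CLK_put."""

    def __init__(self, depth=2):
        self.sync = [False] * depth   # latches assumed to reset to "not full"

    def clk_put_edge(self, e):
        """Shift the raw value in; return the synchronized 'full' seen by the put side."""
        raw_full = not not_full_raw(e)
        self.sync = [raw_full] + self.sync[:-1]
        return self.sync[-1]
```

The two latches delay the observed value, which is why "full" has to be asserted while one empty cell still remains.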

slide-52
SLIDE 52

Deadlock Avoidance

Problem #3: potential for deadlock

Scenario: only 1 data item in FIFO

  • FIFO still considered “empty” (new definition)
  • Get interface: cannot dequeue item!

Solution #3: bi-modal empty detector

  • “New empty” detector (0 or 1 data items)
  • “True empty” detector (0 data items)

Combine two results into single global “empty”

slide-53
SLIDE 53

Mixed-Clock FIFO: Deadlock Avoidance

[Figure: bi-modal empty detector: two circuits over the cell-full bits f_0, f_1, f_2, f_3, each with latches clocked by CLK_get, producing ne and oe; further signals: req_get, en_get, empty]

slide-54
SLIDE 54

Mixed-Clock FIFO: Deadlock Avoidance

[Figure: bi-modal empty detector over f_0, f_1, f_2, f_3, clocked by CLK_get, with signals req_get, en_get, empty]

  • ne: detects “new empty” (0 or 1 full cells)
  • oe: detects “true empty” (0 full cells)
  • Combine into global “empty”

slide-55
SLIDE 55

Mixed-Clock FIFO: Deadlock Avoidance

[Figure: bi-modal empty detector over f_0, f_1, f_2, f_3, clocked by CLK_get, with signals req_get, en_get, empty]

Bi-modal empty detection: select either ne or oe

Reconfigure whenever active get interface

slide-56
SLIDE 56

Mixed-Clock FIFO: Deadlock Avoidance

[Figure: bi-modal empty detector over f_0, f_1, f_2, f_3, clocked by CLK_get, with signals req_get, en_get, empty]

Bi-modal empty detection:

  • Reconfigure whenever active get interface
  • When reconfigured, use “ne”: FIFO active ⇒ avoids underflow

slide-57
SLIDE 57

Mixed-Clock FIFO: Deadlock Avoidance

[Figure: bi-modal empty detector over f_0, f_1, f_2, f_3, clocked by CLK_get, with signals req_get, en_get, empty]

Bi-modal empty detection:

  • When NOT reconfigured, use “oe”: FIFO quiescent ⇒ avoids deadlock
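
The selection between the two empty detectors can be sketched as follows. This is an illustration of the rule stated on the slides, not the actual gate-level circuit: the function names and the `get_active` flag are hypothetical, synchronizer latency is omitted, and the ring wrap-around pairing is an assumption.

```python
def new_empty(f):
    """ne: 0 or 1 full cells, i.e. no two consecutive full cells (ring order)."""
    n = len(f)
    return not any(f[i] and f[(i + 1) % n] for i in range(n))


def true_empty(f):
    """oe: no full cells at all."""
    return not any(f)


def global_empty(f, get_active):
    """Bi-modal selection (sketch).

    get_active=True  -> use ne: conservative, avoids underflow while gets are in flight.
    get_active=False -> use oe: exact, lets a single remaining item be dequeued.
    """
    return new_empty(f) if get_active else true_empty(f)
```

With exactly one item stored, the conservative detector reports empty but the quiescent one does not, so the last item can still be dequeued.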

slide-58
SLIDE 58

Related Work: Intel Mixed-Clock Synchronizer

Intel Patent [1997]: J. Jex, C. Dike, K. Self (5,598,113)

  • Similar FIFO structure
  • Similar notion of “almost full”/“almost empty”

Differences/Limitations: N-stage FIFO

# synchronizers required:

  • INTEL: N+1
  • US: 3

Interface types:

  • INTEL: only sync-sync
  • US: introduce a complete family (sync+async combinations)

slide-59
SLIDE 59

Async-Async FIFO: Architecture

[Figure: Async-Async FIFO architecture: a row of identical cells with put signals put_req, put_ack, put_data and get signals get_req, get_ack, data_get]

slide-60
SLIDE 60

Async-Async FIFO: Architecture

[Figure: Async-Async FIFO architecture, highlighting the Asynchronous Put Part (put_req, put_ack, put_data)]

slide-61
SLIDE 61

Async-Async FIFO: Architecture

[Figure: Async-Async FIFO architecture, highlighting the Asynchronous Get Part (get_req, get_ack, data_get)]

slide-62
SLIDE 62

Async-Async FIFO: Architecture

[Figure: Async-Async FIFO architecture]

  • Put Interface: 4-phase bundled data channel
  • Get Interface: 4-phase bundled data channel

slide-63
SLIDE 63

Async-Async FIFO: Architecture

[Figure: Async-Async FIFO architecture]

No Detectors or External Controllers

slide-64
SLIDE 64

Async-Async FIFO: Architecture

[Figure: Async-Async FIFO architecture]

When FIFO full, acknowledgment withheld until safe to perform the put operation
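
The withheld acknowledgment can be illustrated with a small model of a 4-phase (return-to-zero) put handshake: request rises, acknowledge rises, request falls, acknowledge falls. The class below is a sketch under assumptions: its name, the `evaluate` method, and the list used as storage are hypothetical, and signal levels are sampled by explicit calls rather than by real asynchronous events.

```python
class AsyncPutPort:
    """Hypothetical 4-phase bundled-data put port in front of a bounded buffer."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = []
        self.put_ack = False

    def evaluate(self, put_req, put_data=None):
        """Re-evaluate put_ack for the current put_req level and return it."""
        if put_req and not self.put_ack:
            if len(self.items) < self.capacity:
                self.items.append(put_data)   # data is valid while put_req is high
                self.put_ack = True           # acknowledge the put
            # else: FIFO full, acknowledgment withheld; sender keeps put_req high
        elif not put_req and self.put_ack:
            self.put_ack = False              # return-to-zero phase
        return self.put_ack

    def get(self):
        """Remove the oldest item (stands in for the get side freeing a cell)."""
        return self.items.pop(0) if self.items else None
```

No full flag is exported: the sender simply waits with put_req asserted until the acknowledgment arrives.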

slide-65
SLIDE 65

Async-Async FIFO Cell

[Figure: Async-Async FIFO cell. Blocks: REG, PC, GC, OPT, OGT, DV, C, C+. Signals: put_req, put_data, put_ack, get_req, get_ack, get_data, we, we1, re, re1.]

slide-66
SLIDE 66

Async-Async FIFO Cell

[Figure: Async-Async FIFO cell. Blocks: REG, PC, GC, OPT, OGT, DV, C, C+. Signals: put_req, put_data, put_ack, get_req, get_ack, get_data, we, we1, re, re1.]

  • Asynchronous Put Part: reusable
  • Asynchronous Get Part: reusable
  • Data Validity Controller

slide-67
SLIDE 67

Reusability: Async-Sync FIFO Architecture

[Figure: Async-Sync FIFO architecture: a row of identical cells with Get Controller and Empty Detector; put signals put_ack, put_req, put_data; get signals CLK_get, data_get, req_get, valid_get, empty]

  • Asynchronous Put Interface: exactly as in Async-Async FIFO
  • Synchronous Get Interface: exactly as in Mixed-Clock FIFO

slide-68
SLIDE 68

Reusability: Async-Sync FIFO Cell

[Figure: Async-Sync FIFO cell. Blocks: REG, C+, OPT, DV, En. Signals: put_req, put_data, put_ack, we, we1, f_i, e_i, gtok_in, gtok_out, CLK_get, en_get, get_data.]

  • Asynchronous Put Part: reused from async-async FIFO
  • Synchronous Get Part: reused from mixed-clock FIFO
  • Data Validity Controller: new

slide-69
SLIDE 69

Part II: Handling Long Interconnect Delays

slide-70
SLIDE 70

Issues in Handling Long Interconnect

[Figure: System 1 connected to System 2]

Relay Stations: Background [Carloni, Sangiovanni-Vincentelli ’99]

System 1 sends “data items” to System 2

slide-71
SLIDE 71

Issues in Handling Long Interconnect

[Figure: System 1 connected to System 2]

Relay Stations: Background [Carloni’99]

Delay > 1 cycle

slide-72
SLIDE 72

Issues in Handling Long Interconnect

[Figure: System 1 connected to System 2 through a chain of four relay stations (RS), all clocked by CLK]

Relay Stations: Background [Carloni’99]

System 1 now sends “data packets” to System 2

slide-73
SLIDE 73

Issues in Handling Long Interconnect

[Figure: System 1 connected to System 2 through a chain of four relay stations (RS), all clocked by CLK]

Relay Stations: Background [Carloni’99]

Data Packet = data item + validity bit

Delay = 1 cycle

slide-74
SLIDE 74

Issues in Handling Long Interconnect

[Figure: System 1 connected to System 2 through a chain of four relay stations (RS), all clocked by CLK]

Relay Stations: Background [Carloni’99]

Steady State: pass data on every cycle (either valid or invalid)
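
The steady-state behaviour of a relay-station chain can be sketched as a shift register of packets, where each packet is a data item plus a validity bit and each station adds one clock cycle. The names below are hypothetical, and the stop/back-pressure signals are deliberately left out of this sketch.

```python
INVALID = (None, False)   # packet = (data item, validity bit)


class RelayChain:
    """Hypothetical steady-state model of a chain of relay stations (no stop signals)."""

    def __init__(self, n_stations):
        self.regs = [INVALID] * n_stations   # one packet register per station

    def clock(self, packet_in=INVALID):
        """One CLK edge: every station forwards its packet, valid or not."""
        packet_out = self.regs[-1]
        self.regs = [packet_in] + self.regs[:-1]
        return packet_out
```

A packet entering a chain of N stations appears at the output N clock edges later; when the sender has nothing to send, invalid packets fill the gaps.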

slide-75
SLIDE 75

Issues in Handling Long Interconnect

[Figure: System 1 connected to System 2 through a chain of four relay stations (RS), all clocked by CLK]

“stop” control = stopIn + stopOut

  • apply counter-pressure
  • result: stall communication

Relay Stations: Background [Carloni’99]

Problem: Works only for single-clock systems!

slide-76
SLIDE 76

Relay Station vs. Mixed-Clock FIFO

Relay Station:

  • Steady state: always pass data
  • Data items: both valid & invalid
  • Stopping mechanism: stopIn & stopOut
  • Signals: validIn, dataIn, stopOut, validOut, dataOut, stopIn

Mixed-Clock FIFO:

  • Steady state: only pass data when requested
  • Data items: only valid data
  • Stopping mechanism: none (only full/empty)
  • Signals: full, req_put, empty, req_get, valid_get, data_get

slide-77
SLIDE 77

Mixed-Clock Relay Stations (MCRS)

[Figure: chain of relay stations (RS) between System 1 and System 2, with a Mixed-Clock Relay Station (MCRS) between the CLK1 and CLK2 domains. MCRS signals: packetIn (valid_put, data_put), stopOut, packetOut (valid_get, data_get), stopIn. Mixed-Clock FIFO signals, for comparison: full, req_put, data_put, CLK_put, empty, req_get, valid_get, data_get, CLK_get.]

Mixed-Clock Relay Station: derived from Mixed-Clock FIFO

Change ONLY Put and Get Controllers

slide-78
SLIDE 78

Part III: Experimental Results

slide-79
SLIDE 79

Preliminary Results

Each new Mixed-Timing FIFO designed:

  • using both academic and industry tools
      • MINIMALIST: Burst-Mode controllers [Nowick et al. ’99]
      • PETRIFY: Petri-Net controllers [Cortadella et al. ’97]

Pre-layout simulations in 0.6µm HP CMOS technology

Experiments:

  • various FIFO capacities (4/16 cells)
  • 8-bit data items

slide-80
SLIDE 80

Preliminary Results: Latency

Experimental setup: 8-bit data items + various FIFO capacities (4, 16)

Latency = time from enqueuing a data item into an empty FIFO to dequeuing it

Version             4-place          16-place
                    Min    Max       Min    Max
Mixed-Clock FIFO    5.43   6.34      6.14   7.17
Async-Async FIFO    1.73             2.29
Async-Sync FIFO     5.53   6.45      6.47   7.51
Sync-Async FIFO     1.95             2.44
Mixed-Clock RS      5.48   6.41      6.23   7.28
Async-Sync RS       5.61   6.35      6.57   7.62
Sync-Async RS       1.86             2.43

Sync receiver ⇒ latency not uniquely defined: Min/Max

slide-81
SLIDE 81

Preliminary Results: Latency

[Table: same latency results as on the previous slide]

Async receiver ⇒ lower, unique latency, no synchronization

slide-82
SLIDE 82

Preliminary Results: Maximum Operating Rate

Design              4-place          16-place
                    Put    Get       Put    Get
Mixed-Clock FIFO    565    549       505    484
Async-Async FIFO    423    454       359    357
Async-Sync FIFO     421    549       357    484
Sync-Async FIFO     565    454       505    360
Mixed-Clock RS      580    539       509    475
Async-Sync RS       421    539       357    475
Sync-Async RS       580    454       509    360

Synchronous interfaces: MegaHertz

Asynchronous interfaces: MegaOps/sec

Put vs. Get rates:

  • sync put faster than sync get
  • async put slower than async get

Async vs. Sync rates:

  • async slower than sync
slide-83
SLIDE 83

Conclusions

Introduced complete family of mixed-timing FIFO’s:

  • sync-sync, async-async, async-sync, sync-async
  • create FIFO’s from reusable parts
  • extend to handle issue of long interconnect delays

Characteristics:

  • Low-latency
  • Modular and scalable: distributed token-ring architecture
  • High throughput:
      • steady state: no synchronization overhead, no failure probability
      • enqueue/dequeue data items: one/cycle
  • Low area overheads: simple design

Extensions:

  • Deeper synchronizers (more latches) ⇒ arbitrary robustness
  • powering down of inactive cells