A Level-Encoded Transition Signaling Protocol for High-Throughput - - PowerPoint PPT Presentation



slide-1
SLIDE 1

A Level-Encoded Transition Signaling Protocol for High-Throughput Asynchronous Global Communication

Peggy B. McGee, Melinda Y. Agyekum, Moustafa M. Mohamed and Steven M. Nowick

{pmcgee, melinda, mmohamed, nowick}@cs.columbia.edu

Department of Computer Science Columbia University

April 10, 2008

1/48

slide-2
SLIDE 2

Trends in Digital Systems Design

◮ Increased design complexity

  • More functionality on a single chip

→ Smaller transistor size

→ Larger die size

  • Multiple clock domains

◮ High-performance computing

  • Multi-gigahertz clock rates
  • Multiple independent computation nodes

→ Processor cores, memories, etc.

◮ Plug-&-play components

  • For re-usability

System-on-Chip (SoC)

2/48

slide-3
SLIDE 3

System-on-Chip (SoC): Challenges

◮ Heterogeneity

  • Multiple clock domains
  • Mixed asynchronous/synchronous components

◮ Wires do not scale at the same rate as transistors

  • Increasing proportion of delay in interconnects
  • Challenges for global routing in physical design

◮ Deep submicron effects

  • Handling dynamic timing variability, crosstalk, EMI, noise, etc.
  • Clock jittering and/or drifting effects

◮ Power dissipation

  • Interconnects are a significant source of power dissipation

Need for new approaches for interconnect design

3/48

slide-4
SLIDE 4

SoC Communication Fabric: Ideal Requirements

◮ Speed

  • High throughput, low latency

◮ Low power

  • Low switching activity

◮ Robustness

  • Against timing variation
  • Handling dynamic voltage scaling
  • Handling single-event upset effects (soft errors)

◮ Flexibility

  • Easy integration of modular Intellectual Properties (IPs)

4/48

slide-5
SLIDE 5

Asynchronous Design for SoC Communication

◮ Potential benefits of asynchronous design

  • Significant power advantage

→ No clock routing

→ “Compute-on-demand” approach

  • Timing robustness using delay-insensitive (DI) encoding

→ Eliminates global timing constraints

→ Accommodates uncertainties in routing delay

→ Accommodates skew between bits

  • Supports modular design methodologies

→ e.g. GALS (globally-asynchronous, locally-synchronous)

→ Mixed synchronous/asynchronous components

Asynchronous design well-suited for ideal requirements of SoC communication

5/48

slide-6
SLIDE 6

Application Model: Target SoC Architecture

[Figure: two computation nodes (each asynchronous or synchronous), each attached to a data encoder/decoder and linked by an asynchronous communication channel; a callout is labeled "Our focus".]

6/48

slide-7
SLIDE 7

Application Model: Target SoC Architecture

[Figure: two computation nodes (each asynchronous or synchronous), each attached to a data encoder/decoder and linked by an asynchronous communication channel.]

Our focus

  • 1. Timing-robust, high-throughput asynchronous encoding scheme

6/48

slide-8
SLIDE 8

Application Model: Target SoC Architecture

[Figure: two computation nodes (each asynchronous or synchronous), each attached to a data encoder/decoder and linked by an asynchronous communication channel.]

Our focus

  • 1. Timing-robust, high-throughput asynchronous encoding scheme
  • 2. Protocol conversion interface

→ Allows separation of computation and communication

  • Some codes are better for computation
  • Some codes are better for communication

6/48

slide-9
SLIDE 9

Application Model: Target SoC Architecture

[Figure: two computation nodes (each asynchronous or synchronous), each attached to a data encoder/decoder and linked by an asynchronous communication channel; a callout is labeled "Our focus".]

Current focus is on asynchronous computation nodes

→ Expandable to synchronous

6/48

slide-10
SLIDE 10

Key Contributions: Theoretical

◮ A new class of delay-insensitive codes for global communication: “Level-Encoded Transition Signaling (LETS)”

  • Delay-insensitive

→ Timing-robust

  • Uses two-phase (transition) signaling

→ High throughput: no return-to-zero phase

→ Most existing schemes use four-phase signaling, which has a spacer phase

→ Low switching activity

  • Level-encoded data

→ Data values easily extracted from encoding

  • Supports 1-of-N encoding

→ Lower switching activity than the existing level-encoded transition signaling code

→ Main focus: 1-of-4 codes

7/48

slide-11
SLIDE 11

Key Contributions: Practical

◮ Practical 1-of-4 LETS codes

  • Two example codes shown

→ “Quasi-1-hot/cold”

→ “Quasi-binary”

◮ Generalization to 1-of-N LETS codes

  • First to demonstrate 1-of-N level-encoded codes
  • Systematic procedure to generate LETS codes for all N = 2^n

◮ Hardware support

  • Efficient conversion circuit for 1-of-4 LETS proposed

→ To/from 4-phase dual-rail signaling

  • Pipeline design for global communication proposed

→ Improves throughput

8/48

slide-12
SLIDE 12

Outline

◮ Introduction

◮ Background

  • Handshake protocol control signaling
  • Handshake protocol: control signaling + data
  • Asynchronous data encoding

◮ 1-of-4 LETS codes

◮ 1-of-N LETS codes

◮ Hardware support

◮ Analytical evaluation

◮ Conclusions

9/48

slide-13
SLIDE 13

Handshake Protocol Control Signaling: 4-Phase

[Figure: timing diagram of REQ and ACK for one transaction, with four numbered wire events split into an evaluate phase and a reset phase.]

◮ Four wire transition events per transaction

◮ All wires must return to zero

→ Before next transaction

10/48

slide-14
SLIDE 14

Handshake Protocol Control Signaling: 2-Phase

[Figure: timing diagram of REQ and ACK for two transactions (#1 and #2), each with two numbered wire events.]

◮ Two wire transition events per transaction

◮ No return-to-zero phase
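The difference between the two handshakes can be made concrete by listing the wire events each one generates. The following is a minimal sketch, not the authors' model: the function names and the `(wire, level)` event format are assumptions introduced here for illustration.

```python
def four_phase_events(transactions):
    """Wire events for a 4-phase (return-to-zero) handshake."""
    events = []
    for _ in range(transactions):
        events.append(("REQ", 1))  # 1: sender raises request (evaluate)
        events.append(("ACK", 1))  # 2: receiver raises acknowledge
        events.append(("REQ", 0))  # 3: sender resets request (reset)
        events.append(("ACK", 0))  # 4: receiver resets acknowledge
    return events


def two_phase_events(transactions):
    """Wire events for a 2-phase (transition signaling) handshake."""
    req = ack = 0
    events = []
    for _ in range(transactions):
        req ^= 1                   # 1: any REQ transition announces new data
        events.append(("REQ", req))
        ack ^= 1                   # 2: any ACK transition completes the transaction
        events.append(("ACK", ack))
    return events


if __name__ == "__main__":
    n = 2
    print(len(four_phase_events(n)) // n)  # wire events per transaction, 4-phase
    print(len(two_phase_events(n)) // n)   # wire events per transaction, 2-phase
```

In the 2-phase version the wires do not return to zero between transactions, so the second transaction is signaled by falling edges.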

11/48

slide-15
SLIDE 15

Handshake Protocol: Control Signaling + Data

[Figure: a sender and a receiver connected by data wires and a control wire carrying Ack.]

12/48

slide-16
SLIDE 16

Handshake Protocol: Control Signaling + Data

[Figure: sender and receiver; data is sent toward the receiver.]

12/48

slide-17
SLIDE 17

Handshake Protocol: Control Signaling + Data

[Figure: sender and receiver; the entire data wave arrives at the receiver.]

12/48

slide-18
SLIDE 18

Handshake Protocol: Control Signaling + Data

[Figure: sender and receiver; the entire data wave arrives, then the receiver sends Ack.]

12/48

slide-19
SLIDE 19

Handshake Protocol: Control Signaling + Data

[Figure: sender and receiver; the entire data wave arrives, then the receiver sends Ack.]

2-phase transition signaling protocol completes

→ Transition signaling = non-return-to-zero (NRZ)

12/48

slide-20
SLIDE 20

Handshake Protocol: Control Signaling + Data

[Figure: sender and receiver; spacer tokens are sent (spacer = data reset to zero).]

Round trip for 4-phase (return-to-zero) protocol

12/48

slide-21
SLIDE 21

Handshake Protocol: Control Signaling + Data

[Figure: sender and receiver; all wires reset to zero, then the receiver sends Ack.]

4-phase (return-to-zero) protocol completes

12/48

slide-22
SLIDE 22

Asynchronous Data Encoding: DI Codes

◮ Properties of delay-insensitive (DI) codes

  • Timing-robust

→ Insensitive to input arrival time

  • Completion of data transaction encoded into data itself

→ Unambiguous recognition of code

→ No valid codeword seen when transitioning between codewords

13/48

slide-23
SLIDE 23

DI Return-to-Zero (RZ) Code #1: Dual-Rail

◮ Two wires (a1, a0) to encode a single bit a (1 bit of data)

Encoding      Symbolic value
a1  a0        a
0   0         “reset” value
0   1         0
1   0         1
1   1         illegal

◮ Each dual-rail pair provides

  • Data value: whether 1 or 0 is being transmitted
  • Data validity: whether data is a value, illegal or reset

◮ Main benefit: allows simple hardware for computation blocks

◮ Main disadvantage: low throughput and high power

→ Needs reset phase: all bits always reset to zero
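The cost of the reset phase can be seen by writing out the wire states of a dual-rail RZ transfer. This is a small sketch assuming the usual convention that (a1, a0) = 10 encodes 1 and 01 encodes 0; the helper names are hypothetical and not taken from the slides.

```python
SPACER = (0, 0)  # "reset" value on (a1, a0)


def dual_rail_encode(bit):
    """Return (a1, a0) for one data bit: 10 encodes 1, 01 encodes 0."""
    return (1, 0) if bit else (0, 1)


def dual_rail_decode(a1, a0):
    """Return 0 or 1 for valid data, None for the spacer; reject 11."""
    if (a1, a0) == SPACER:
        return None
    if (a1, a0) == (1, 1):
        raise ValueError("illegal dual-rail codeword 11")
    return a1


def rz_wire_states(bits):
    """Wire states of a 4-phase RZ transfer: each data value is followed by a spacer."""
    states = [SPACER]
    for bit in bits:
        states.append(dual_rail_encode(bit))
        states.append(SPACER)
    return states


def count_flips(states):
    """Total number of single-wire transitions across consecutive states."""
    return sum(
        x != y
        for prev, cur in zip(states, states[1:])
        for x, y in zip(prev, cur)
    )
```

For example, `count_flips(rz_wire_states([1, 0, 1]))` counts one rising and one falling transition per transmitted bit.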

14/48

slide-24
SLIDE 24

DI Return-to-Zero (RZ) Code #2: 1-of-N

◮ N wires (aN−1, ..., a1, a0) to encode log N bits of data a (one-hot encoding)

Example: 1-of-4 code

Encoding            Symbolic value
a3  a2  a1  a0      a
0   0   0   0       “reset” value
0   0   0   1       00
0   0   1   0       01
0   1   0   0       10
1   0   0   0       11
All other codewords illegal

◮ Main benefit: uses lower power than dual-rail

→ 1 out of N rails changes value per data transaction

◮ Main disadvantage: gets expensive beyond 1-of-4

→ Coding density decreases

→ Complicated to concatenate irregularly-sized data streams
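The power claim can be checked by counting wire flips for one 2-bit RZ transaction in each code. The sketch below assumes that rail i of the 1-of-4 code encodes the value i, and that each dual-rail pair is ordered (true rail, false rail); both conventions and all function names are illustrative assumptions.

```python
def one_hot_encode(value, n=4):
    """Return rails (a_{n-1}, ..., a_0) with only rail `value` high."""
    if not 0 <= value < n:
        raise ValueError("value out of range")
    return tuple(1 if rail == value else 0 for rail in range(n - 1, -1, -1))


def one_hot_decode(rails):
    """Return the encoded value, None for the all-zero spacer; reject other codewords."""
    if sum(rails) == 0:
        return None
    if sum(rails) != 1:
        raise ValueError("illegal codeword")
    return len(rails) - 1 - rails.index(1)


def flips(a, b):
    """Number of wires that differ between two wire states."""
    return sum(x != y for x, y in zip(a, b))


def rz_flips_one_of_4(value):
    """Wire flips for spacer -> data -> spacer using a 1-of-4 code."""
    spacer = (0, 0, 0, 0)
    word = one_hot_encode(value)
    return flips(spacer, word) + flips(word, spacer)


def rz_flips_dual_rail(value):
    """Wire flips for the same 2-bit value sent on two dual-rail pairs."""
    spacer = (0, 0, 0, 0)
    b1, b0 = (value >> 1) & 1, value & 1
    word = (b1, 1 - b1, b0, 1 - b0)  # (true, false) rails for each bit
    return flips(spacer, word) + flips(word, spacer)
```

Both encodings use four wires for two bits, but the 1-of-4 code raises and lowers a single rail while dual-rail raises and lowers one rail per bit.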

15/48

slide-25
SLIDE 25

DI Non-Return-to-Zero (NRZ) Code #1: LEDR

LEDR = Level-Encoded Dual-Rail

◮ Two wires to encode a single bit a (1 bit of data): a parity rail and a data rail

Encoding                           Symbolic value
Phase   Parity rail   Data rail    a
Even    0             0            0
Even    1             1            1
Odd     1             0            0
Odd     0             1            1

◮ Properties of LEDR codes:

  • Level encoded: can retrieve data value directly from wires
  • Alternating phase protocol: between odd and even phases
  • Only 1 rail changes value: per bit per data transaction

Dean et al., “Efficient Self-Timing with Level-Encoded 2-Phase Dual-Rail (LEDR)”, Proc. of UCSC Conf. on Adv. Research in VLSI, ’91
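The three LEDR properties above can be illustrated with a few lines of code. This sketch assumes the standard LEDR relation in which the phase is the XOR of the parity and data rails (even phase: rails equal; odd phase: rails differ); the function names are introduced here for illustration.

```python
def ledr_encode(bit, phase):
    """Return (parity, data) for `bit` in the given phase (0 = even, 1 = odd)."""
    return (bit ^ phase, bit)


def ledr_decode(parity, data):
    """Return (value, phase) read directly from the wire levels."""
    return data, parity ^ data


def ledr_send(bits, start=(0, 0)):
    """Wire states for a bit stream, alternating phase on every transaction."""
    states = [start]
    _, phase = ledr_decode(*start)
    for bit in bits:
        phase ^= 1  # odd and even phases must alternate
        states.append(ledr_encode(bit, phase))
    return states


if __name__ == "__main__":
    for parity, data in ledr_send([1, 1, 0, 0]):
        print(parity, data, ledr_decode(parity, data))
```

Because the phase toggles on every transaction, exactly one of the two rails changes each time: the parity rail when the data value repeats, the data rail when it changes.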

16/48

slide-26
SLIDE 26

DI Non-Return-to-Zero (NRZ) Code #1: LEDR (cont’d)

◮ Main benefits

  • No return-to-zero phase

→ High throughput, low power

  • Easy to extract data

◮ Main disadvantages

  • Significantly more complicated function blocks

→ No practical solutions have been proposed

→ Potential solution strategy:

→ LEDR for global communication

→ 4-phase RZ (dual-rail or single-rail) for computation

→ Need efficient hardware for conversion between protocols:

Mitra, McLaughlin and Nowick, “Efficient asynchronous protocol converters for two-phase delay-insensitive global communication”, ASYNC’07

  • Uses more power than synchronous communication

→ Uses less power than RZ

17/48

slide-27
SLIDE 27

Outline

◮ Introduction

◮ Background

◮ 1-of-4 LETS codes

◮ 1-of-N LETS codes

◮ Hardware support

◮ Analytical evaluation

◮ Conclusions

18/48

slide-28
SLIDE 28

LETS Codes: Motivation & Contributions

“LETS = Level-Encoded Transition Signaling”

◮ A new class of delay-insensitive codes

  • Extension of LEDR = 1-of-2 LETS

→ Uses fewer wire transitions per data transaction

→ Analogous to 1-of-N extension to dual-rail in RZ

  • Goal:

→ Generate and evaluate entire family of 1-of-N codes

◮ Key benefits

  • Maintains benefits of LEDR

→ High throughput

→ Delay-insensitive

→ Efficient hardware conversion to 4-phase protocols

  • Additional benefit

→ Lower power consumption than LEDR

19/48

slide-29
SLIDE 29

1-of-4 LETS Code Derivation: Overview

[Figure: 4-D hypercube drawn as two cubes (w=0 and w=1) with axes x, y, z.]

Starting point: 4-bit code space

Code space represented by 4-D hypercube

16 codewords in code space

20/48

slide-30
SLIDE 30

1-of-4 LETS Code Derivation: Overview

[Figure: 4-D hypercube drawn as two cubes (w=0 and w=1) with axes x, y, z.]

Goal: assign symbols to codewords such that all LETS properties are observed

→ Symbols to assign = {S0, S1, S2, S3}

→ Codewords = {0000, 0001, ..., 1111}

20/48

slide-31
SLIDE 31

1-of-4 LETS Code Derivation: Overview

[Figure: 4-D hypercube drawn as two cubes (w=0 and w=1) with axes x, y, z.]

Goal: assign symbols to codewords

→ Symbols to assign = {S0, S1, S2, S3}

→ Codewords = {0000, 0001, ..., 1111}

Rule 1 (Alternating phases):

→ Odd and even phases must alternate

Rule 2 (Reachability):

→ Each symbol Sx must reach all symbols S0 − S3 in opposite phase

20/48

slide-32
SLIDE 32

1-of-4 LETS Code Derivation: Details

[Figure: hypercube with S0 placed at codeword 0000.]

Step 1: assign arbitrary symbol to arbitrary codeword (here S0 at 0000)

EVEN phase

21/48

slide-33
SLIDE 33

1-of-4 LETS Code Derivation: Details

[Figure: hypercube with S0 at 0000 and its four neighbors labeled S0, S1, S2 and S3.]

Step 2: assign symbols to all neighbors of S0 at 0000 in ODD phase

Rule 2 (Reachability):

→ Each symbol Sx must reach all symbols S0 − S3 in opposite phase

ODD phase

21/48

slide-34
SLIDE 34

1-of-4 LETS Code Derivation: Details

[Figure: hypercube with S0 at 0000 and its four ODD-phase neighbors labeled; the neighbors of S1 at 1000 are highlighted.]

EVEN phase

Step 3: assign symbols to all neighbors of S1 at 1000 in EVEN phase

Assign neighbors to S1

21/48

slide-35
SLIDE 35

1-of-4 LETS Code Derivation: Details

[Figure: hypercube with S0 at 0000 and its four ODD-phase neighbors labeled; the neighbors of S1 at 1000 are highlighted.]

EVEN phase

Step 3: assign symbols to all neighbors of S1 at 1000 in EVEN phase

S0 already assigned to 0000

21/48

slide-36
SLIDE 36

1-of-4 LETS Code Derivation: Details

[Figure: hypercube in which the remaining EVEN-phase neighbors of S1 at 1000 are labeled S1’, S2’ and S3’.]

EVEN phase

Step 3: assign symbols to all neighbors of S1 at 1000 in EVEN phase

Assign S1, S2 and S3 to remaining neighbors

21/48

slide-37
SLIDE 37

1-of-4 LETS Code Derivation: Details

[Figure: hypercube with all 16 codewords labeled by symbols S0–S3 and S0’–S3’.]

Final steps: complete symbol assignment

Follow same reasoning as in previous steps

21/48

slide-38
SLIDE 38

1-of-4 LETS Code Derivation: Summary

[Figure: hypercube with all 16 codewords labeled by symbols S0–S3 and S0’–S3’; a legend distinguishes codewords in even phase from codewords in odd phase.]

Code space divided into EVEN and ODD phases

Entire code space filled up
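The two rules used in the derivation can be checked mechanically for any candidate assignment. The sketch below uses one illustrative assignment, the XOR of the indices of the rails that are 1; it satisfies both rules but is an assumption chosen for the example and is not claimed to be one of the codes on these slides. The phase is taken to be the parity of the number of 1s in the codeword.

```python
from collections import Counter

RAILS = 4


def phase(codeword):
    """0 = even phase, 1 = odd phase (parity of the number of 1s)."""
    return bin(codeword).count("1") % 2


def example_symbol(codeword):
    """Illustrative assignment: XOR of the indices of the rails that are 1."""
    symbol = 0
    for rail in range(RAILS):
        if (codeword >> rail) & 1:
            symbol ^= rail
    return symbol


def satisfies_lets_rules(symbol_of):
    """Check alternating phases and reachability on the 4-D hypercube."""
    for codeword in range(1 << RAILS):
        neighbours = [codeword ^ (1 << rail) for rail in range(RAILS)]
        # Rule 1: flipping any single rail moves to the opposite phase.
        if any(phase(n) == phase(codeword) for n in neighbours):
            return False
        # Rule 2: the four neighbours carry all four symbols.
        if sorted(symbol_of(n) for n in neighbours) != list(range(RAILS)):
            return False
    return True


def codewords_per_symbol_and_phase(symbol_of):
    """Count codewords for every (symbol, phase) pair."""
    return Counter((symbol_of(c), phase(c)) for c in range(1 << RAILS))
```

With this assignment every symbol has two codewords in each phase, which fills the 16-codeword space.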

22/48

slide-39
SLIDE 39

1-of-4 LETS Codes: Code Space

◮ Many valid 1-of-4 codes possible

  • 1152 unique codes derivable from method shown

→ Complete enumeration derived in paper

◮ Some codes more “practical” than others

  • All data values easily extracted from codeword

◮ Our focus: Two “Practical” codes

  • “Quasi-1-hot/cold”
  • “Quasi-binary”

23/48

slide-40
SLIDE 40

A Practical 1-of-4 LETS Code: “Quasi-1-Hot/Cold"

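[Table: codewords on rails r3 r2 r1 r0 for each symbol. S0–S3: one codeword each with a single 1, plus 1111 for S0 and one codeword with two 1s each for S1–S3. S0’–S3’: one codeword each with three 1s, plus 0000 for S0’ and one codeword with two 1s each for S1’–S3’.]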

16 codewords for 4 symbols

24/48

slide-41
SLIDE 41

A Practical 1-of-4 LETS Code: “Quasi-1-Hot/Cold"

[Table: the 16 codeword assignments on rails r3 r2 r1 r0 for symbols S0–S3 and S0’–S3’, with rows grouped into ODD codewords and EVEN codewords.]

Code space divided into ODD and EVEN phases

24/48

slide-42
SLIDE 42

A Practical 1-of-4 LETS Code: “Quasi-1-Hot/Cold"

[Table: the 16 codeword assignments on rails r3 r2 r1 r0 for symbols S0–S3 and S0’–S3’, with rows grouped into ODD codewords and EVEN codewords.]

Multicode: 2 codewords for each symbol in each phase

24/48

slide-43
SLIDE 43

A Practical 1-of-4 LETS Code: “Quasi-1-Hot/Cold"

[Table: the 16 codeword assignments on rails r3 r2 r1 r0 for symbols S0–S3 and S0’–S3’, with row groups labeled 1-hot and 1-cold.]

Quasi-1-hot/1-cold data value easily extracted from codeword

24/48

slide-44
SLIDE 44

Outline

◮ Introduction

◮ Background

◮ 1-of-4 LETS codes

◮ 1-of-N LETS codes

◮ Hardware support

◮ Analytical evaluation

◮ Conclusions

25/48

slide-45
SLIDE 45

1-of-N LETS Codes

◮ Goal

  • To extend solution for 1-of-4 LETS codes to 1-of-N

◮ Challenge:

  • Solution is not obvious for arbitrary N
  • Must satisfy several properties

→ Level-encoding: data can be extracted directly from codeword

→ Transition signaling: each symbol must reach all others via 1 flip

→ Alternating phase

◮ Contributions

  • Proof: existence of legal LETS codes for every N = 2^n
  • Systematic procedure to generate LETS codes

→ LETS properties formulated as set of constraints

→ Constraints captured in code generator matrix

→ Many different LETS codes exist for each N

See paper for details
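To make the 1-of-N properties concrete, here is a sketch of one simple construction that works for every N = 2^n: the symbol is the XOR of the indices of the rails that are 1. This is an illustrative assumption and not necessarily the generator-matrix procedure from the paper. Under it, sending symbol t from a codeword whose current symbol is s only requires flipping rail s XOR t.

```python
def lets_symbol(codeword, n_rails):
    """Level decoding: XOR of the indices of the rails that are 1."""
    symbol = 0
    for rail in range(n_rails):
        if (codeword >> rail) & 1:
            symbol ^= rail
    return symbol


def lets_send(symbols, n_rails, start=0):
    """Return the sequence of wire states (as integers) that transmits `symbols`."""
    if n_rails < 2 or n_rails & (n_rails - 1):
        raise ValueError("n_rails must be a power of two")
    states = [start]
    for target in symbols:
        if not 0 <= target < n_rails:
            raise ValueError("symbol out of range")
        current = states[-1]
        # Flipping rail r changes the decoded symbol from s to s XOR r.
        rail = lets_symbol(current, n_rails) ^ target
        states.append(current ^ (1 << rail))
    return states


if __name__ == "__main__":
    for state in lets_send([3, 3, 0, 2], n_rails=4):
        print(format(state, "04b"), lets_symbol(state, 4))
```

Every transaction flips exactly one rail, including when the same symbol is sent twice in a row (rail 0 flips), and for N = 2 the construction reduces to a data rail plus a rail that toggles on repeated values, as in LEDR.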

26/48

slide-46
SLIDE 46

Outline

◮ Introduction

◮ Background

◮ 1-of-4 LETS codes

◮ 1-of-N LETS codes

◮ Hardware support

  • Conversion circuit: interfacing channels to nodes
  • LETS pipeline circuit: improving channel throughput

◮ Analytical evaluation

◮ Conclusions

27/48

slide-47
SLIDE 47

LETS Hardware Support: Protocol Conversion

[Figure: two asynchronous 4-phase RZ computation nodes, each attached to a data encoder/decoder and linked by an asynchronous communication channel (LETS).]

First, focus on protocol conversion circuits

28/48

slide-48
SLIDE 48

LEDR Converter: Prior Architecture Overview

[Figure: LEDR converter architecture: a four-phase encode block, a four-phase function block and a four-phase decode block, with LEDR completion detectors (CD) on the data and parity rails, control logic, and 2-phase communication channels on both sides.]

LEDR converter from Mitra et al., "Efficient Asynchronous Protocol Converters for Two-Phase Delay-Insensitive Global Communication", ASYNC’07

29/48

slide-49
SLIDE 49

LEDR Converter: Prior Architecture Overview

[Figure: LEDR converter architecture: a four-phase encode block, a four-phase function block and a four-phase decode block, with LEDR completion detectors (CD) on the data and parity rails, control logic, and 2-phase communication channels on both sides. Labeled parts: 2/4-phase conversion circuit and two 2-phase completion detectors.]

29/48

slide-50
SLIDE 50

LEDR Converter: Control Signals

[Figure: LEDR converter block diagram distinguishing two-phase signals from four-phase signals. Signals shown: LEDR input, LEDR output, ack_left, ack_right, phase (input and output side), enb, comp.]

30/48

slide-51
SLIDE 51

New contribution: 1-of-4 LETS Converter

◮ Based on existing LEDR (1-of-2 LETS) converter

  • Only minor modifications needed

→ Same overall architecture

→ Most pieces identical

→ Internal logic of some blocks has minimal changes

31/48

slide-52
SLIDE 52

1-of-4 LETS Converter

[Figure: converter block diagram (four-phase encode, function block and decode; completion detectors; control logic) with signals LEDR input, LEDR output, ack_left, ack_right, phase, enb and comp. Highlighting marks the changed logic blocks.]

32/48

slide-53
SLIDE 53

Completion Detector: LEDR vs. 1-of-4 LETS

[Figure: LEDR completion detector and 1-of-4 LETS completion detector, both built from C-elements.]

One layer of C-elements replaced by XNOR gates
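A behavioral sketch may help show what a 1-of-4 LETS completion detector has to compute. It assumes that the phase of each 4-rail group is the parity of its rails and that a C-element merges the per-group phases. The class and function names are hypothetical, the sketch uses XOR where the slide mentions XNOR gates (a polarity difference only), and it is not a gate-level copy of the circuit in the figure.

```python
class CElement:
    """Muller C-element: output follows the inputs only when they all agree."""

    def __init__(self, initial=0):
        self.out = initial

    def update(self, *inputs):
        if all(v == 1 for v in inputs):
            self.out = 1
        elif all(v == 0 for v in inputs):
            self.out = 0
        return self.out  # otherwise hold the previous value


def group_phase(rails):
    """Phase of one 1-of-4 LETS group: parity (XOR) of its four rails."""
    parity = 0
    for rail in rails:
        parity ^= rail
    return parity


def completion(c_element, groups):
    """Completion signal toggles once every group has reached the new phase."""
    return c_element.update(*(group_phase(g) for g in groups))


if __name__ == "__main__":
    cd = CElement()
    print(completion(cd, [(0, 0, 0, 0), (0, 0, 0, 0)]))  # both groups in even phase
    print(completion(cd, [(0, 1, 0, 0), (0, 0, 0, 0)]))  # only one group has arrived
    print(completion(cd, [(0, 1, 0, 0), (1, 0, 0, 0)]))  # whole data wave has arrived
```

The detector output changes only after every group has made its single transition, which is what lets the receiver tolerate skew between groups.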

33/48

slide-54
SLIDE 54

Left Encoder: LEDR vs. 1-of-4 LETS

[Figure: LEDR left encoder (inputs: Enable and LEDR data bits b0 and b1; outputs: 4-phase true and false rails for b0 and b1) next to the 1-of-4 LETS left encoder (inputs: Enable and LETS rails data_r0, data_r1, data_r2; outputs: 4-phase true and false rails for b0 and b1).]

Extra layer of XNOR gates

◮ Not on critical path!

34/48

slide-55
SLIDE 55

Right Encoder: LEDR vs. 1-of-4 LETS

[Figure: LEDR right encoder (inputs: input phase, complete, and 4-phase true and false rails for b0 and b1; outputs: LEDR parity and data rails for b0 and b1) next to the 1-of-4 LETS right encoder (storage, comparator and select blocks; inputs: complete, enable, and 4-phase true and false rails for b0 and b1; LETS outputs z3 z2 z1 z0 on rails r3 r2 r1 r0).]

Extra storage logic and select block

◮ Not on critical path!

35/48

slide-56
SLIDE 56

1-of-4 LETS Converter Performance Evaluation

◮ Layout performed for LEDR (1-of-2 LETS) conversion circuits

Mitra et al., "Efficient Asynchronous Protocol Converters for Two-Phase Delay-Insensitive Global Communication", ASYNC’07

  • With a 4-phase multiplier function block
  • 0.18µm TSMC CMOS process
  • Summary of simulation results:

Forward latency (input arrival → output data available): 6.8 ns
Stabilization time (input arrival → reset complete): 10.5 ns
Pipelined cycle time (min processing time / data item, steady state): 8.3 ns

◮ 1-of-4 LETS expected to add 15-20% overhead

◮ Design is delay-insensitive

→ Except for two simple one-sided timing constraints

36/48

slide-57
SLIDE 57

LETS Hardware Support: Pipelining Channels

[Figure: two asynchronous 4-phase RZ computation nodes, each attached to a data encoder/decoder and linked by an asynchronous communication channel (LETS).]

Completed: hardware for interfacing with computation nodes

37/48

slide-58
SLIDE 58

LETS Hardware Support: Pipelining Channels

[Figure: two asynchronous 4-phase RZ computation nodes, each attached to a data encoder/decoder and linked by an asynchronous communication channel (LETS).]

Completed: hardware for interfacing with computation nodes

Now focus on: improving performance of global communication

→ through pipelining

37/48

slide-59
SLIDE 59

LETS Pipeline: Improving Channel Throughput

◮ Support #1: MOUSETRAP-based design

Singh & Nowick, “MOUSETRAP: High-Speed Transition Signaling Asynchronous Pipelines”, TVLSI’07

  • Original MOUSETRAP pipeline

→ High-speed pipeline scheme for bundled-data encoding

  • Proposed design

→ Pipelines DI communication channel based on MOUSETRAP

→ Eliminates MOUSETRAP bundled-data timing requirements

→ Only retains one simple 1-sided timing constraint

  • Simple hardware design

◮ Support #2: LEDR-based design

Dean et al., “Efficient Self-Timing with Level-Encoded 2-Phase Dual-Rail (LEDR)”, Proc. of UCSC Conf. on Adv. Research in VLSI, ’91

  • Timing-robust approach, see paper for details

38/48

slide-60
SLIDE 60

1-of-4 LETS Pipeline: MOUSETRAP-based design

[Figure: three pipeline stages (N−1, N, N+1). Each stage has a register bank of D latches between the 1-of-4 LETS data inputs and data outputs, a latch control, and a 1-of-4 LETS completion detector (CD).]

39/48

slide-61
SLIDE 61

1-of-4 LETS Pipeline: MOUSETRAP-based design

[Figure: three pipeline stages (N−1, N, N+1). Each stage has a register bank of D latches between the 1-of-4 LETS data inputs and data outputs, a latch control, and a 1-of-4 LETS completion detector (CD).]

Latch control:

→ same as MOUSETRAP

Completion detector:

→ replaced with 1-of-4 LETS CD

39/48

slide-62
SLIDE 62

Outline

◮ Introduction

◮ Background

◮ 1-of-4 LETS codes

◮ 1-of-N LETS codes

◮ Hardware support

◮ Analytical evaluation

  • Coding efficiency and transition power metric

◮ Conclusions

40/48

slide-63
SLIDE 63

Analytical Evaluation: Coding Efficiency (LETS vs. RZ)

[Plot: coding efficiency (bits/rails) versus # of rails for 1-of-N LETS and 1-of-N RZ.]

1-of-N LETS vs. RZ codes

◮ Same coding efficiency

41/48

slide-64
SLIDE 64

Analytical Evaluation: Coding Efficiency (LETS vs. RZ)

[Plot: coding efficiency (bits/rails) versus # of rails for 1-of-N LETS and 1-of-N RZ.]

1-of-N LETS vs. RZ codes

◮ Same coding efficiency

Coding efficiency drops off for N > 4

41/48

slide-65
SLIDE 65

Analytical Evaluation: Transition Power (LETS vs. RZ)

[Plot: transition power (wire flips/transaction) versus # of rails for 1-of-N LETS and 1-of-N RZ.]

1-of-N LETS vs. RZ codes

◮ LETS uses less power

42/48

slide-66
SLIDE 66

Analytical Evaluation: Interpreting LETS Scaling

[Plot: transition power (wire flips/transaction) and coding efficiency (bits/rails) versus # of rails for 1-of-N LETS.]

43/48

slide-67
SLIDE 67

Analytical Evaluation: Interpreting LETS Scaling

[Plot: transition power (wire flips/transaction) and coding efficiency (bits/rails) versus # of rails for 1-of-N LETS.]

Trend: Power decreases as # of rails increases

→ but coding efficiency also decreases

43/48

slide-68
SLIDE 68

Analytical Evaluation: Interpreting LETS Scaling

[Plot: transition power (wire flips/transaction) and coding efficiency (bits/rails) versus # of rails for 1-of-N LETS.]

Trend: Power decreases as # of rails increases

→ but coding efficiency also decreases

Sweet spot: going from LEDR to 1-of-4 LETS

→ halves the power, same coding efficiency
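The trend and the sweet spot can be reproduced with a few exact fractions. This sketch assumes that coding efficiency is log2(N)/N bits per rail, that a 1-of-N LETS transaction flips one wire and a 1-of-N RZ transaction flips two (set and reset), and that transition power is normalised per encoded bit; the normalisation is an assumption chosen to match the stated trend, and the function names are illustrative.

```python
from fractions import Fraction


def bits_per_symbol(n_rails):
    """log2(N) for N a power of two."""
    if n_rails < 2 or n_rails & (n_rails - 1):
        raise ValueError("n_rails must be a power of two")
    return n_rails.bit_length() - 1


def coding_efficiency(n_rails):
    """Bits encoded per rail (same for 1-of-N LETS and 1-of-N RZ)."""
    return Fraction(bits_per_symbol(n_rails), n_rails)


def lets_flips_per_bit(n_rails):
    """One wire flip per transaction, shared by log2(N) bits."""
    return Fraction(1, bits_per_symbol(n_rails))


def rz_flips_per_bit(n_rails):
    """Two wire flips per transaction (data, then return to zero)."""
    return Fraction(2, bits_per_symbol(n_rails))


if __name__ == "__main__":
    print("N  efficiency  LETS flips/bit  RZ flips/bit")
    for n in (2, 4, 8, 16, 32, 64, 128, 256):
        print(n, coding_efficiency(n), lets_flips_per_bit(n), rz_flips_per_bit(n))
```

Under these assumptions N = 2 (LEDR) and N = 4 share the same coding efficiency while the per-bit flip count is halved, and beyond N = 4 the efficiency falls faster than the power metric improves.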

43/48

slide-69
SLIDE 69

Analytical Evaluation: LETS vs. Synchronous

◮ Coding efficiency (# bits encoded/wire)

  • Synchronous better than 1-of-N LETS

→ Synchronous: N bits for N wires

→ 1-of-N LETS: log N bits for N wires

◮ Transition power metric (# transitions/wire/data transaction)

  • 1-of-N LETS better than synchronous as N increases

→ Synchronous: constant

→ assumes equal probability of wire transition

→ 1-of-N LETS: decreases as N grows

→ = 1 / log N

→ Transition power metric same for N = 4
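The comparison with synchronous signaling can be checked by brute force. The sketch below averages the number of wire flips per wire over all pairs of successive words on a k-wire synchronous bus, which is the equal-probability assumption stated above, and compares it with the 1 / log N figure for 1-of-N LETS (taking log base 2). The function names are assumptions made for this example.

```python
from fractions import Fraction
from itertools import product


def sync_flips_per_wire(k):
    """Average flips per wire per transaction on a k-wire synchronous bus,
    with every (previous word, next word) pair equally likely."""
    words = range(1 << k)
    total = sum(bin(a ^ b).count("1") for a, b in product(words, repeat=2))
    return Fraction(total, k * (1 << k) ** 2)


def lets_flips_per_bit(n_rails):
    """1 / log2(N): one wire flip per transaction carrying log2(N) bits."""
    if n_rails < 2 or n_rails & (n_rails - 1):
        raise ValueError("n_rails must be a power of two")
    return Fraction(1, n_rails.bit_length() - 1)


if __name__ == "__main__":
    print("synchronous:", sync_flips_per_wire(4))
    for n in (2, 4, 8, 16):
        print("1-of-%d LETS:" % n, lets_flips_per_bit(n))
```

The synchronous figure stays constant regardless of bus width, while the LETS figure matches it at N = 4 and drops below it for larger N.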

44/48

slide-70
SLIDE 70

Conclusions

◮ A new class of delay-insensitive codes “Level-Encoded Transition Signaling (LETS)”

  • High throughput, low power for global communication
  • Two example 1-of-4 LETS codes shown
  • Generalization to 1-of-N LETS

→ First 1-of-N level-encoded transition signaling scheme

◮ Efficient hardware

  • For protocol conversion to/from four-phase dual-rail signaling
  • For pipelining global communication channel

◮ Power and throughput improvements over existing codes

  • Demonstrated via analytical evaluation

45/48

slide-71
SLIDE 71

Future Work

◮ Better evaluation of performance/power metrics

  • Layout of proposed circuits
  • Evaluation of second-order effects

→ e.g. cross-coupling, noise, etc.

◮ Extend conversion circuits to support other encoding styles

  • e.g. 1-of-4 RZ, single-rail bundled

46/48

slide-72
SLIDE 72

Appendix

47/48

slide-73
SLIDE 73

LEDR Converter: System Simulation

[Figure: LEDR converter block diagram with control signals ack_left, ack_right, phase, enb and comp; the input completion detection is highlighted.]

LEDR Inputs arrive

Step 1: Two-phase inputs arrive

LEDR inputs begin arriving at quiescent system

48/48

slide-74
SLIDE 74

LEDR Converter: System Simulation

[Figure: LEDR converter block diagram with control signals ack_left, ack_right, phase, enb and comp.]

Phase signal changes

Step 2: Two-to-four phase conversion

Input completion detection sent to control

48/48

slide-75
SLIDE 75

LEDR Converter: System Simulation

[Figure: LEDR converter block diagram with control signals ack_left, ack_right, phase, enb and comp.]

Enable rises

Step 2: Two-to-four phase conversion

Control enables four-phase evaluate phase

48/48

slide-76
SLIDE 76

LEDR Converter: System Simulation

[Figure: LEDR converter block diagram with control signals ack_left, ack_right, phase, enb and comp.]

Enable now high

Step 2: Two-to-four phase conversion

LEDR input converted to four-phase

48/48

slide-77
SLIDE 77

LEDR Converter: System Simulation

[Figure: LEDR converter block diagram with control signals ack_left, ack_right, phase, enb and comp.]

Step 3: Four-phase evaluate

Four-phase function evaluation

48/48

slide-78
SLIDE 78

LEDR Converter: System Simulation

[Figure: LEDR converter block diagram with control signals ack_left, ack_right, phase, enb and comp.]

LEDR output generated

Step 4: Four-to-two phase conversion

Four-phase bits decoded to LEDR

48/48

slide-79
SLIDE 79

LEDR Converter: System Simulation

[Figure: LEDR converter block diagram with control signals ack_left, ack_right, phase, enb and comp.]

Ack from right may arrive at any time after all pairs are sent

Step 4: Four-to-two phase conversion

LEDR output completion detection

48/48

slide-80
SLIDE 80

LEDR Converter: System Simulation

[Figure: LEDR converter block diagram with control signals ack_left, ack_right, phase, enb and comp.]

Enable falls

Step 5: Four-phase reset

Control enables four-phase reset phase

48/48

slide-81
SLIDE 81

LEDR Converter: System Simulation

[Figure: LEDR converter block diagram with control signals ack_left, ack_right, phase, enb and comp.]

Enable now low

Pipeline concurrency: Request new data during reset

Step 5: Four-phase reset

Function block inputs return to zero

48/48

slide-82
SLIDE 82

LEDR Converter: System Simulation

[Figure: LEDR converter block diagram with control signals ack_left, ack_right, phase, enb and comp.]

Complete falls

Step 5: Four-phase reset

Four-phase reset propagates through logic block

48/48

slide-83
SLIDE 83

LEDR Converter: System Simulation

[Figure: LEDR converter block diagram with control signals ack_left, ack_right, phase, enb and comp.]

Ready to evaluate again

New evaluate phase begins when enable rises again

48/48