SLIDE 1

Adding Temporal Redundancy to Delay Insensitive Codes to Mitigate Single Event Effects

Julian Pontes (FACIN-PUCRS) Pascal Vivet (CEA-LETI) Ney Calazans (FACIN-PUCRS)

FACIN-PUCRS (Brazil) & LETI-CEA (France)

SLIDE 2

Motivation

  • Advanced Tech Nodes Constraints
    – Signal Integrity and Process Variation
      • Solved at design time
    – Soft Errors
      • Not treated in the standard flow
  • Soft Errors in Asynchronous Circuits
    – Timing Deviations
      • Almost immune, except for isochronic forks
    – A bit flip in control may stall the handshake

ASYNC’12 Lyngby 2

SLIDE 3

Our Objective

“Take advantage of m-of-n DI codes to add temporal redundancy, making it possible to detect and, in some cases, correct soft errors”

SLIDE 4

Outline

  • Related Work
  • SEE in QDI Pipelines Analysis
  • TRDIC Proposal
  • SEE Validation - Flow and Environment
  • Results
  • Conclusions and Ongoing Work

SLIDE 5

Related Work

Asynchronous Design Hardening Techniques

  • Asynchronous vs. synchronous (asynchronous circuits are more robust)
    – Bastos et al. (Microelectronics Reliability, 2010)
    – Rahbaran and Steininger (IEEE Transactions on Dependable and Secure Computing, 2009)
  • Standard-cell level (resizing to improve robustness)
    – Bastos et al. (IOLTS 2010)
  • Logic-level redundancy
    – Jang and Martin (ASYNC 2005) (double-check, spatial redundancy)
    – Monnet, Renaudin and Leveugle (IOLTS 2005) (high area overhead or improved filtering capability)
  • Pipeline level (various design techniques against glitches)
    – Bainbridge and Salisbury (ASYNC 2009) (no error correction, though)
  • New delay-insensitive codes (hard to make DI, due to validity detection)
    – Agyekum and Nowick (DATE 2011)

SLIDE 6

Outline

  • Related Work
  • SEE in QDI Pipelines Analysis
  • TRDIC Proposal
  • SEE Validation - Flow and Environment
  • Results
  • Conclusions and Ongoing Work

SLIDE 7

SEE Physical Impact

$$I(t) = I_0\left(e^{-t/\tau_\alpha} - e^{-t/\tau_\beta}\right) \quad (*)$$

  • $\tau_\alpha$: Collection Time Constant of the Junction
  • $\tau_\beta$: Time Constant for Initially Establishing the Ion Track

(*) IBM experiments in soft fails in computer electronics (1978–1994), 1996
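The double-exponential pulse model $I(t) = I_0(e^{-t/\tau_\alpha} - e^{-t/\tau_\beta})$ can be checked numerically. Since $\int_0^\infty e^{-t/\tau}\,dt = \tau$, the charge collected by the struck node is $Q = I_0(\tau_\alpha - \tau_\beta)$. The minimal Python sketch below compares that closed form with a trapezoidal integration. The values of $I_0$, $\tau_\alpha$ and $\tau_\beta$ are hypothetical and are not taken from the slides.

```python
import math


def see_current(t: float, i0: float, tau_alpha: float, tau_beta: float) -> float:
    """Double-exponential SEE current pulse I(t) = I0 * (exp(-t/tau_alpha) - exp(-t/tau_beta)).

    tau_alpha: collection time constant of the junction (assumed > tau_beta)
    tau_beta:  time constant for initially establishing the ion track
    """
    if t < 0:
        return 0.0
    return i0 * (math.exp(-t / tau_alpha) - math.exp(-t / tau_beta))


def collected_charge(i0: float, tau_alpha: float, tau_beta: float) -> float:
    """Closed form of the integral of I(t) over [0, inf): Q = I0 * (tau_alpha - tau_beta)."""
    return i0 * (tau_alpha - tau_beta)


def collected_charge_numeric(i0: float, tau_alpha: float, tau_beta: float,
                             steps: int = 200_000) -> float:
    """Trapezoidal integration of I(t) over 50 * tau_alpha, used as a cross-check."""
    t_end = 50 * tau_alpha  # the slow exponential has decayed by e^-50 at this point
    dt = t_end / steps
    total = 0.5 * (see_current(0.0, i0, tau_alpha, tau_beta)
                   + see_current(t_end, i0, tau_alpha, tau_beta))
    for k in range(1, steps):
        total += see_current(k * dt, i0, tau_alpha, tau_beta)
    return total * dt


# Hypothetical parameters: I0 = 1 mA, tau_alpha = 200 ps, tau_beta = 50 ps.
I0, TAU_A, TAU_B = 1e-3, 200e-12, 50e-12
q_closed = collected_charge(I0, TAU_A, TAU_B)
q_numeric = collected_charge_numeric(I0, TAU_A, TAU_B)
print(f"closed form: {q_closed * 1e15:.1f} fC, numeric: {q_numeric * 1e15:.1f} fC")
```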

SLIDE 8

SEE in QDI Pipelines

  • QDI logic is almost immune to delay variations
    – Except for isochronic forks
  • A bit flip may cause
    – A stall in the handshake protocol
    – Erroneous or invalid data
  • The final effect depends on
    – The victim cell
      • Mostly C-elements
    – The 4-phase protocol step affected
  • To understand this more deeply, look into C-element behavior

[Figure: QDI pipeline stage built from C-elements, with Input Data (DI0–DI3), Ack In, Output Data (DO0–DO3) and Ack Out]
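The minimal Python sketch below makes the C-element behavior concrete. It is a purely logical model, not a circuit-level simulation. The output follows the inputs when they agree and holds its previous value otherwise. Flipping the output node and letting the gate re-evaluate once shows why the same bit flip is harmless in one state and persistent in another. The helper `flip_and_settle` is hypothetical and is introduced only for illustration.

```python
def c_element(a: int, b: int, q: int) -> int:
    """Next output of a 2-input Muller C-element: follow the inputs when they agree, else hold q."""
    return a if a == b else q


def flip_and_settle(a: int, b: int, q: int) -> int:
    """Flip the output node (a bit flip) and let the C-element re-evaluate once.

    Purely logical model: real recovery also depends on timing and on the downstream logic.
    """
    return c_element(a, b, 1 - q)


# Inputs agree: the gate actively drives the output and restores the correct value.
print(flip_and_settle(1, 1, 1))  # 1 -> the flip is only a transient
# Inputs disagree: the output is only held, so the flipped value persists.
print(flip_and_settle(1, 0, 1))  # 0 -> the flip is stored
```

A flipped value that persists in the holding state is what lets a single bit flip on a control C-element corrupt data or stall the 4-phase handshake, as listed above.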

SLIDE 9

SEE in C-elements

Charge to cause an SEE (normalized to state 111)

  State   Charge
  000     0.720
  010     0.088
  011     0.120
  100     0.097
  101     0.100
  111     1.000

  • C-element driving a capacitance of 8.1 fF
  • Single Event Transients (SET)
    – States 000 and 111 are driving nodes
  • Single Event Upsets (SEU)
    – Floating nodes (all other states): much less charge required

[Figure: C-element state graph over states 000, 100, 010, 110, 111, 101, 011, 001, with the SET and SEU regions marked]
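The state classification and the charge table can be written as a small lookup, sketched in Python below. Two assumptions apply. First, each state is read as inputs A and B followed by output Q; the slide does not state the bit order. Second, a simple threshold model is used, in which an injected charge at or above a state's normalized charge causes an SEE. The function names are hypothetical.

```python
# Normalized charge needed to cause an SEE in each C-element state (values from the table above,
# normalized to state 111). Assumption: a state string is "ABQ" (input A, input B, output Q).
CHARGE = {"000": 0.720, "010": 0.088, "011": 0.120, "100": 0.097, "101": 0.100, "111": 1.000}


def node_kind(state: str) -> str:
    """Classify a C-element state by how its output node is held."""
    a, b, q = (int(bit) for bit in state)
    if a != b:
        return "floating"   # output kept only by the keeper: sensitive to SEUs
    if q == a:
        return "driven"     # output actively driven by the inputs: SETs only
    return "transient"      # output about to switch (110, 001); not in the table


def upset_states(normalized_charge: float) -> list[str]:
    """States whose threshold is at or below the injected charge (simple threshold model)."""
    return sorted(s for s, threshold in CHARGE.items() if threshold <= normalized_charge)


for s in sorted(CHARGE):
    print(s, node_kind(s), CHARGE[s])
print(upset_states(0.1))  # ['010', '100', '101']
```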

SLIDE 10

SEE in QDI Pipelines

  • Detection based on C-element trees is almost immune to soft errors
    – The last C-element in the tree is the vulnerable one
  • The protocol SEE and timing analysis considers
    – data link errors only

[Figure: QDI pipeline stage built from C-elements, with Input Data (DI0–DI3), Ack In, Output Data (DO0–DO3) and Ack Out]

[Figure: completion detection for digits A0/A1, B0/B1, C0/C1, D0/D1: Individual Detection per digit, followed by a C-element Detection Tree producing Valid]
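The detection scheme is sketched behaviorally in Python below. The sketch assumes dual-rail (1-of-2) digits whose Individual Detection is the OR of the two rails, and a binary tree of 2-input C-elements as the Detection Tree. The class name and the tree shape for an odd number of digits are illustrative assumptions.

```python
def c_element(a: int, b: int, q: int) -> int:
    """2-input Muller C-element: follow the inputs when they agree, else hold q."""
    return a if a == b else q


class CompletionTree:
    """Completion detector: per-digit OR of the rails, then a tree of C-elements."""

    def __init__(self) -> None:
        self.state: dict[tuple[int, int], int] = {}  # (level, index) -> held C-element output

    def evaluate(self, rails: list[tuple[int, int]]) -> int:
        values = [r0 | r1 for r0, r1 in rails]  # individual detection, one bit per digit
        level = 0
        while len(values) > 1:
            nxt = []
            for i in range(0, len(values) - 1, 2):
                key = (level, i // 2)
                q = c_element(values[i], values[i + 1], self.state.get(key, 0))
                self.state[key] = q
                nxt.append(q)
            if len(values) % 2:  # an odd leftover input passes straight to the next level
                nxt.append(values[-1])
            values = nxt
            level += 1
        return values[0]  # "Valid" output of the last C-element


tree = CompletionTree()
spacer = [(0, 0)] * 4
print(tree.evaluate(spacer))                            # 0
print(tree.evaluate([(1, 0), (0, 1), (0, 0), (0, 0)]))  # 0: only half of the digits are valid
print(tree.evaluate([(1, 0), (0, 1), (1, 0), (0, 1)]))  # 1: all digits valid
print(tree.evaluate([(0, 0), (0, 0), (1, 0), (0, 1)]))  # 1: held during the return to spacer
print(tree.evaluate(spacer))                            # 0
```

In this model every inner node feeds another C-element, while the root's output is the `Valid` signal itself. This is consistent with the slide singling out the last C-element of the tree.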

SLIDE 11

1-of-n Pipeline

[Figure: 1-of-n QDI pipeline stage built from C-elements, with Input Data (DI0–DI3), Ack In, Output Data (DO0–DO3) and Ack Out]

Data link always in an excited state

  • 1-bit distance between data and spacer

[Figure: timing diagram of Input Data, Ack In and Output Data over the Spacer and Data phases (spacer, data and ack delays), marking the possible SEEs (SEU↑, SEU↓, SET↑, SET↓) in each window and their outcomes (VCD, ICD, ES, US, UD)]

VCD = Valid Corrupted Data, ICD = Invalid Corrupted Data, ES = Early Spacer, US = Unexpected Spacer, UD = Unexpected Data
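The 1-bit distance between data and spacer can be checked exhaustively for a 1-of-4 link, as in the Python sketch below. The comments map each outcome to the slide's acronyms. That mapping is an interpretation: which acronym applies depends on the protocol phase in which the flip occurs.

```python
def classify_1_of_n(word: int) -> str:
    """Classify a word seen on a 1-of-n data link."""
    hot = bin(word).count("1")
    if hot == 0:
        return "spacer"
    if hot == 1:
        return "valid data"
    return "invalid"


def single_flip_outcomes(word: int, n: int) -> list[str]:
    """What the receiver sees after each possible single-bit flip on the n wires."""
    return [classify_1_of_n(word ^ (1 << k)) for k in range(n)]


N = 4
# Any flip of the spacer produces a valid codeword (UD, or VCD if taken as data).
print(single_flip_outcomes(0b0000, N))  # ['valid data', 'valid data', 'valid data', 'valid data']
# A flip of a codeword gives either a spacer (ES/US) or a second hot wire (ICD).
print(single_flip_outcomes(0b0010, N))  # ['invalid', 'spacer', 'invalid', 'invalid']
```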

SLIDE 12

m-of-n QDI Pipeline (m>1)

  • The encoding has SEE filtering properties
  • Completion detection is more complex (2-of-3 example below)
  • Higher code density (for 1 < m < n-1)

[Figure: 2-of-3 completion detection: C-elements over rails A0, A1, A2 (Individual Detection) produce the valid signal]

[Figure: timing diagram of Input Data, Ack and Output Data over the Spacer and Data phases, with best- and worst-case data and spacer skews and delays; each window is marked with the possible SEEs (SEU↑, SEU↓, SET↑, SET↓) and their outcomes (ID, VCD, ICD, ES)]
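Both the filtering and the density claims can be checked by counting codewords, as in the Python sketch below (`math.comb` needs Python 3.8+). An m-of-n code has C(n, m) codewords, and the all-zero spacer sits m bit flips away from every codeword. For m > 1, therefore, no single flip of the spacer yields valid data. The helper names are hypothetical.

```python
from math import comb, log2


def hot(word: int) -> int:
    """Number of wires at 1."""
    return bin(word).count("1")


def spacer_flip_is_valid(n: int, m: int) -> bool:
    """True if some single bit flip of the all-zero spacer is a valid m-of-n codeword."""
    return any(hot(1 << k) == m for k in range(n))


for n, m in [(4, 1), (3, 2), (5, 2)]:
    codes = comb(n, m)
    print(f"{m}-of-{n}: {codes} codewords, {log2(codes) / n:.2f} bits/wire, "
          f"spacer flip gives valid data: {spacer_flip_is_valid(n, m)}")

# Higher code density: for 1 < m < n-1, C(n, m) exceeds the n codewords of 1-of-n.
assert all(comb(n, m) > n for n in range(4, 12) for m in range(2, n - 1))
```

The 2-of-3 code filters single flips but has the same three codewords as 1-of-3, since m = n-1 lies outside the range where density improves.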

SLIDE 13

SEE QDI Timing Analysis

  • The effect depends on the time window in which the SEE happens
  • Adding timing constraints may eliminate the possibility of Valid Corrupted Data (VCD); these constraints can be verified by static timing analysis (STA)
  • The stall probability depends on the relative performance of sender and receiver

SLIDE 14

Outline

  • Related Work
  • SEE in QDI Pipelines Analysis
  • TRDIC Proposal
  • SEE Validation - Flow and Environment
  • Results
  • Conclusions and Ongoing Work

SLIDE 15

TRDIC: Temporal Redundancy in DI Codes

  • Principle
    – Convert a 1-of-n code into a 2-of-(n+1) code
      • A more robust code
    – Encode the current data with the previous data
      • It is as if we sent every datum twice
    – Double check & correction at the receiver side
  • Advantages
    – Increase SEE robustness by adding redundancy
    – Preserve performance by keeping the token throughput
    – Good for intra-chip communication architectures

[Figure: Data Sender (1-of-n) → TRDIC Encoder → QDI Data Link (2-of-(n+1)) → TRDIC Decoder → Data Receiver (1-of-n)]
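A short counting check, added here for illustration, shows that a 2-of-(n+1) code is exactly large enough. Under the TRDIC encoding rule, consecutive 1-of-n codewords are ORed, and an extra wire is set when they are equal. Unordered pairs of distinct data give $\binom{n}{2}$ two-hot words on the n original wires, and equal pairs give n further words that use the extra wire:

$$\binom{n}{2} + n = \frac{n(n-1)}{2} + n = \frac{n(n+1)}{2} = \binom{n+1}{2}, \qquad n = 4:\quad 6 + 4 = 10 = \binom{5}{2}$$

Every 2-of-(n+1) word is therefore used. Because OR is symmetric, a word alone does not tell which datum came first; the receiver resolves this with the datum it decoded previously.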

SLIDE 16

TRDIC: Encoding Method

2-of-5 TRDIC Encoding (rows: Data[i-1], columns: Data[i])

  Data[i-1] \ Data[i]   0001    0010    0100    1000
  0001                  10001   00011   00101   01001
  0010                  00011   10010   00110   01010
  0100                  00101   00110   10100   01100
  1000                  01001   01010   01100   11000

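  • When consecutive codewords differ, the conversion is simply the OR of the two codewords (MSB = 0)
  • When they are equal, the OR has a single hot wire, so the MSB of the TRDIC word is set to 1 to keep two hot wires; the MSB thus indicates whether consecutive codewords are equal (1) or not (0)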
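The Python sketch below implements this encoding rule. Data are held as n-bit integers with one hot bit, and the extra flag wire is bit n. The function names and the `first_prev` initial value are assumptions; the slides do not say how the first word of a stream is formed.

```python
def trdic_encode(prev: int, cur: int, n: int = 4) -> int:
    """Encode 1-of-n datum `cur`, given the previous datum `prev`, as a 2-of-(n+1) TRDIC word."""
    for d in (prev, cur):
        if bin(d).count("1") != 1 or d >> n:
            raise ValueError(f"{d:0{n}b} is not a 1-of-{n} codeword")
    if prev == cur:
        return (1 << n) | cur  # OR would be one-hot: set the MSB to keep two hot wires
    return prev | cur          # different data: two hot wires, MSB = 0


def trdic_encode_stream(data: list[int], first_prev: int, n: int = 4) -> list[int]:
    """Encode a stream; `first_prev` stands in for the datum sent before the stream (assumed)."""
    words, prev = [], first_prev
    for d in data:
        words.append(trdic_encode(prev, d, n))
        prev = d
    return words


words = trdic_encode_stream([0b0001, 0b0010], first_prev=0b0001)
print([f"{w:05b}" for w in words])  # ['10001', '00011']
```

The output reproduces the double-check decoder example: 0001 followed by 0010 is sent as 10001, then 00011.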

SLIDE 17

TRDIC Converters

[Figure: TRDIC Encoder and TRDIC Decoder converters placed between the 1-of-n Data Sender/Data Receiver and the 2-of-(n+1) QDI Data Link]
SLIDE 18

TRDIC Double-Check Decoder

  • The 2-stage trellis can resolve only Invalid Corrupted Data (ICD) errors
    – The more common situation in 2-of-n codes
  • A more complex trellis-based decoder increases the error detection and correction capabilities

[Figure: double-check decoder: C-elements combine Data D0–D4 with the Expected Data to produce Decoded Data]


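  • Assume 0001 followed by 0010
  • The encoder outputs 10001 (assuming the previous datum was also 0001), then 00011
  • The decoder obtains 0001, so the next word must contain the 00001 wire (00011 does, and decodes to 0010); the word after it must then contain 00010. If not, an error is detected or corrected!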
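The Python sketch below implements this double check for detection only, using the same bit conventions as the encoding rule: data are one-hot n-bit integers and the MSB is bit n. It flags inconsistent words but does not attempt correction, which the slides attribute to the trellis decoders. The function name is hypothetical.

```python
from typing import Optional


def trdic_decode(prev: int, word: int, n: int = 4) -> Optional[int]:
    """Double-check decode of one 2-of-(n+1) TRDIC word, given the last decoded 1-of-n datum.

    Returns the decoded datum, or None when an error is detected.
    """
    if word >> (n + 1) or bin(word).count("1") != 2:
        return None                       # not a 2-of-(n+1) codeword (e.g. ICD)
    low = word & ((1 << n) - 1)
    if word >> n:                         # MSB set: current datum must equal the previous one
        return low if low == prev else None
    if (low & prev) == 0:
        return None                       # the expected (previous) datum's wire is missing
    return low ^ prev                     # drop the previous datum's wire, keep the new one


print(trdic_decode(0b0001, 0b10001))  # 1 (0001, repeated datum)
print(trdic_decode(0b0001, 0b00011))  # 2 (0010)
print(trdic_decode(0b0001, 0b00110))  # None: valid 2-of-5 word, but it lacks the 00001 wire
print(trdic_decode(0b0001, 0b00111))  # None: three hot wires
```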

SLIDE 19

TRDIC 3-stage Trellis Decoding

[Figure: 3-stage trellis with the ten 2-of-5 TRDIC codewords (00011, 00101, 00110, 01001, 01010, 01100, 10001, 10010, 10100, 11000) at each stage; an example path through 00110, 01100 and 00011 is marked]
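To illustrate why a longer window helps, the Python sketch below uses a brute-force stand-in for a trellis search; it is not the authors' decoder. Starting from the last correctly decoded datum, it re-encodes every candidate data sequence for a three-word window and keeps the sequence closest in Hamming distance to what was received. A trellis decoder would explore the same paths stage by stage instead of enumerating them. All names are hypothetical.

```python
from itertools import product
from typing import Optional


def trdic_encode(prev: int, cur: int, n: int = 4) -> int:
    """TRDIC rule: OR of consecutive 1-of-n data, MSB (bit n) set when they are equal."""
    return (1 << n) | cur if prev == cur else prev | cur


def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


def window_decode(prev: int, received: list[int], n: int = 4) -> Optional[list[int]]:
    """Minimum-Hamming-distance decoding of a window of TRDIC words.

    Returns the unique closest data sequence, or None if several sequences tie.
    """
    symbols = [1 << k for k in range(n)]
    best: list[tuple[int, ...]] = []
    best_dist = None
    for seq in product(symbols, repeat=len(received)):
        p, dist = prev, 0
        for d, r in zip(seq, received):
            dist += hamming(trdic_encode(p, d, n), r)
            p = d
        if best_dist is None or dist < best_dist:
            best, best_dist = [seq], dist
        elif dist == best_dist:
            best.append(seq)
    return list(best[0]) if len(best) == 1 else None


# Data 0010, 0100, 1000 after 0001 is sent as 00011, 00110, 01100.
# An SEU sets bit 0 of the middle word, giving the invalid word 00111.
received = [0b00011, 0b00111, 0b01100]
print([f"{d:04b}" for d in window_decode(0b0001, received)])  # ['0010', '0100', '1000']
```

A single-word double check would only flag the middle word. The window recovers it because the neighboring words pin down the data on both sides.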

SLIDE 20

Outline

  • Related Work
  • SEE in QDI Pipelines Analysis
  • TRDIC Proposal
  • SEE Validation - Flow and Environment
  • Results
  • Conclusions and Ongoing Work

SLIDE 21

SEE Validation Flow

[Pontes, Vivet, Calazans, DATE’12] An accurate SEE digital flow

  • Based on SEE standard-cell characterization
    – For all cells, including C-elements
  • Pipeline timing annotation includes SEE glitches & delays
  • Pipeline attack using a fault simulator
  • Uses standard tools & formats (Verilog netlist, SDF back-annotation, Liberty .lib format)

SLIDE 22

SEE Validation Environment

  • Design Flow
    – Implementation using the pseudo-synchronous technique (dummy rst clk) [Thonnart, Beigné, Vivet ASYNC’12]
    – SEE Characterization & Simulation Environment
    – Attack on pipeline components

[Figure: SEE Characterization Environment, with Fault Generator, SEE BUS[n:0], TestCase and Data Checker blocks around a C-element pipeline]

  • Study of various QDI pipelines
    – 1-of-4
    – 2-of-5
    – 2-of-5 TRDIC (without encoder/decoder)
  • Technology
    – STMicroelectronics, LP CMOS, 32nm

SLIDE 23

Outline

  • Related Work
  • SEE in QDI Pipelines Analysis
  • TRDIC Proposal
  • SEE Validation - Flow and Environment
  • Results
  • Conclusions and Ongoing Work

SLIDE 24

SEE Fault Simulation Results (1/2)

  • Failures vs. SEE Injection Rate
    – SEE Injection Charge = 175 fC

[Chart: Failures in Time (×1000 failures/second) versus Single Event Effect Interval (ns), for the 1-of-4, 2-of-5 and TRDIC 2-of-5 pipelines]

SLIDE 25

SEE Fault Simulation Results (2/2)

  • Failures vs. SEE Injection Charge
    – SEE Rate = 5×10^6 SEEs/second

[Chart: Failures in Time (×1000 failures/second) versus Injected Charge (fC), for the 1-of-4, 2-of-5 and TRDIC 2-of-5 pipelines]
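The injection rate and the interval between SEEs (the x-axis of the previous chart) are reciprocal. The trivial Python helpers below, whose names are hypothetical, convert between the two. The fixed rate of 5×10^6 SEEs/second used here corresponds to one SEE every 200 ns.

```python
def see_interval_ns(rate_per_second: float) -> float:
    """Time between injected SEEs, in ns, for an injection rate in SEEs/second."""
    return 1e9 / rate_per_second


def see_rate_per_second(interval_ns: float) -> float:
    """Injection rate in SEEs/second for a time between SEEs given in ns."""
    return 1e9 / interval_ns


print(see_interval_ns(5e6))      # 200.0
print(see_rate_per_second(100))  # 10000000.0
```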

SLIDE 26

Results (16 stages, 32-bit WCHB pipeline)

Area
  Code     Asynchronous Cells   Combinational Cells   Total
  1-of-4   1264/1919.7          482/1007.7            1746/2927.4
  2-of-5   4080/6120.3          1280/2142.0           5226/8338.0

Power
  Code     Leakage (μW)   Dynamic (μW)   Total (μW)
  1-of-4   134.7          2578.8         2713.5
  2-of-5   317.4          5335.4         5652.8

Performance
  Code     Maximum Throughput (Gbits/sec)   Latency (ns)
  1-of-4   40.80                            1.2125
  2-of-5   32.52                            1.3685
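The relative overheads of 2-of-5 over 1-of-4 follow directly from the table, as computed in the Python sketch below. The area entries keep the two figures printed on the slide, whose units are not given there.

```python
# Values copied from the table above (16-stage, 32-bit WCHB pipeline).
AREA_TOTAL = {"1-of-4": (1746, 2927.4), "2-of-5": (5226, 8338.0)}  # units not stated
POWER_TOTAL_UW = {"1-of-4": 2713.5, "2-of-5": 5652.8}
THROUGHPUT_GBPS = {"1-of-4": 40.80, "2-of-5": 32.52}
LATENCY_NS = {"1-of-4": 1.2125, "2-of-5": 1.3685}


def ratio(table: dict, code: str = "2-of-5", ref: str = "1-of-4") -> float:
    """Value for `code` divided by the value for the reference code."""
    return table[code] / table[ref]


area_ratios = [a / b for a, b in zip(AREA_TOTAL["2-of-5"], AREA_TOTAL["1-of-4"])]
print("area:       " + ", ".join(f"x{r:.2f}" for r in area_ratios))  # x2.99, x2.85
print(f"power:      x{ratio(POWER_TOTAL_UW):.2f}")   # x2.08
print(f"throughput: x{ratio(THROUGHPUT_GBPS):.2f}")  # x0.80
print(f"latency:    x{ratio(LATENCY_NS):.2f}")       # x1.13
```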

SLIDE 27

Completion Detection complexity

[Figure: completion detection for 2-of-5 built from C-elements and an OR gate over rails A0–A4, next to a single threshold-2 gate over A0–A4]

Completion detection for 2-of-5: what if we used complex gates (like an NCL gate)?
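The Python sketch below models an NCL-style threshold gate with hysteresis behaviorally. It assumes that the gate labeled "2" in the figure is a TH25-style gate over A0–A4. Such a gate sets its output once at least 2 of its 5 inputs are high and resets it only when all inputs have returned to 0, which is the role the C-element/OR network plays for 2-of-5 completion. The class name is hypothetical.

```python
class ThresholdGate:
    """Behavioral model of an NCL-style THmn threshold gate with hysteresis.

    The output rises once at least m of the n inputs are 1, falls only when all inputs
    are back to 0, and otherwise holds its previous value.
    """

    def __init__(self, m: int, n: int) -> None:
        self.m, self.n, self.out = m, n, 0

    def update(self, inputs: list[int]) -> int:
        if len(inputs) != self.n:
            raise ValueError(f"expected {self.n} inputs")
        ones = sum(inputs)
        if ones >= self.m:
            self.out = 1
        elif ones == 0:
            self.out = 0
        return self.out


th25 = ThresholdGate(2, 5)  # 2-of-5 completion detection on rails A0..A4
for rails in ([0, 0, 0, 0, 0], [1, 0, 0, 0, 0], [1, 0, 1, 0, 0], [0, 0, 1, 0, 0], [0, 0, 0, 0, 0]):
    print(rails, th25.update(rails))  # 0, 0, 1, 1 (held during return to spacer), 0
```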

SLIDE 28

Outline

  • Related Work
  • SEE in QDI Pipelines Analysis
  • TRDIC Proposal
  • SEE Validation - Flow and Environment
  • Results
  • Conclusions and Ongoing Work

SLIDE 29

Conclusions and Ongoing Work

  • TRDIC: Temporal Redundancy for DI Codes
    – SEE filtering is provided by the 2-of-n encoding
    – Temporal redundancy allows multi-bit correction
    – Well adapted to QDI pipelines & communication architectures
    – Fully evaluated with an SEE fault simulator
  • Ongoing Work
    – Complete the design of the TRDIC encoder/decoder
    – Evaluation of different TRDIC pipeline implementations
    – Integration of TRDIC in asynchronous Networks-on-Chip
      • Hermes-A and Hermes-AA (PUCRS) [PATMOS’10], [SOCC’10]
      • ANoC (CEA-LETI) [ASYNC’05]
    – Impact on area and power
      • Design of specific cells for 2-of-5 completion detection

SLIDE 30

Thanks to:

  • CNPq (Brazilian Research Funding Agency)
