Using Timing-Error Detection and Correction for Transient-Error - - PowerPoint PPT Presentation




SLIDE 1

A Power-Efficient 32b ARM ISA Processor Using Timing-Error Detection and Correction for Transient-Error Tolerance and Adaptation to PVT Variation

David Bull¹, Shidhartha Das¹, Karthik Shivashankar¹, Ganesh Dasika², Krisztian Flautner¹, David Blaauw²

¹ARM Ltd., U.K. ²University of Michigan

SLIDE 2

Design Margins

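[Figure: CLK period with design-margin components: voltage, process, temp, coupling, jitter, ageing, safety]

Sources of variation, arranged in the figure by scope (GLOBAL or LOCAL) and by rate of change (STATIC, SLOW-CHANGING or FAST-CHANGING):

  • Inter-die process variation
  • Wear-out (BTI, TDDB, EM)
  • Regulator ripple
  • Ambient temperature variation
  • PLL jitter
  • IR drop
  • Ldi/dt
  • Intra-die process variation
  • Hot-spots
  • Coupling noise
  • Clock-tree jitter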

SLIDE 3

Razor principles

Key idea: Exploit the dynamic nature of variations

  • Speculatively operate without full setup margin
  • Explicitly check for late-arriving signals
  • In the event of a timing error, invoke system recovery mechanism
  • Adapt VDD/CLK to target near-zero error-rate operation

Survive fast moving and transient changes

  • Capacitive coupling
  • Critical-path sensitization
  • Ldi/dt
  • Localized IR drop
  • PLL Jitter

Adapt to slower moving or static conditions

  • Ageing
  • Process variation
  • Global or long-term IR drop
  • Low-frequency supply ripple
  • Temperature

SLIDE 4

Razor-enabled energy-efficient ARM processor

UMC 65SP (High Performance) Process

  • 1V nominal VDD and 1.1V Overdrive

Implements a sub-set of ARM ISA

  • Critical-paths representative of ARM industrial processor designs

87 die from split lots

  • 30FF/37TT/20SS

724MHz sign-off frequency

  • 0.9V/SS/125C

Adaptive Control Experiments

  • Adaptive Frequency Control - DFS
  • Adaptive Voltage Control - DVS

[Block diagram: Processor Core, IRAM, DRAM, External I/O and Adaptive F/V Control]

SLIDE 5

Outline

  • Motivation and Razor background
  • Transition-Detector circuit design
  • Micro-architecture design
  • Adaptive voltage and frequency scaling
  • Parametric yield improvement with Razor
  • Conclusion

SLIDE 6

Transition-Detector Circuit Design

[Figure: transition-detector schematic around the main flip-flop; signals CK, nCK, D, Q, DP, ERN, HRN and ERROR]

  • Pulse-generators generate pulses out of transitions on D
  • Sticky error history bit identifies the failing FF for off-line diagnostics
  • Delay on CK defines the CK pulse width

SLIDE 7

Transition-Detector Circuit Design

[Timing diagram, earliest detection: waveforms CK, nCK, D, DP and ERROR, with intervals TD, TCK and TOV marked]

SLIDE 8

Transition-Detector Circuit Design

[Timing diagram, latest detection: waveforms CK, nCK, D, DP and ERROR, with intervals TD, TCK and TOV marked; the latest detection point is annotated TD + TCK – 2TOV and the Tsu pessimism is indicated]

Error Detection Window = TD + TCK – 2TOV

SLIDE 9

Transition-Detector Circuit Design

[Timing diagram, minimum delay: waveforms CK, nCK, D, DP and ERROR, with intervals TD, TCK and TOV marked]

Min Delay Constraint = TCK – TOV
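The error detection window and the min-delay constraint quoted on these slides both follow from the three intervals in the timing diagrams. A minimal Python sketch, assuming all intervals share one unit (picoseconds here); the function name, the reading of each interval in the docstring and the example values are assumptions, not figures from the slides:

```python
def td_timing(t_d: float, t_ck: float, t_ov: float) -> dict:
    """Return the transition-detector timing figures quoted on the slides.

    Assumed reading of the symbols:
      t_d  : width of the pulse generated from a transition on D (TD)
      t_ck : width of the CK pulse (TCK)
      t_ov : overlap between the two pulses (TOV)
    """
    if min(t_d, t_ck, t_ov) < 0:
        raise ValueError("intervals must be non-negative")
    return {
        # Error Detection Window = TD + TCK - 2*TOV
        "detection_window": t_d + t_ck - 2 * t_ov,
        # Min Delay Constraint = TCK - TOV
        "min_delay": t_ck - t_ov,
    }


# Hypothetical values, for illustration only.
timing = td_timing(t_d=60.0, t_ck=80.0, t_ov=20.0)
print(timing)
```

Under these assumed values a wider CK pulse widens the detection window but raises the min-delay constraint by the same amount, which is the trade-off the comparison slide describes.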

SLIDE 10

Transition-Detector Comparison

Advantages

  • Reduced min-delay constraint
  • Operates with conventional 50% clocking
  • Simplifies integration with a conventional ASIC flow

Disadvantages

  • Flagging errors before actual failure occurs incurs a performance penalty

  • Additional transistors on the clock network

Trade-off setup pessimism for reduced min-delay

SLIDE 11

Micro-architecture Design

Balanced pipe-stages with critical-endpoints at clock-gating, IRAM and DRAM inputs protected by Transition-Detectors

SLIDE 12

Micro-architecture Design

Stabilization stages allow sufficient time for Razor validation of critical signals and synchronization overhead of ERROR

SLIDE 13

Micro-architecture Design

Recovery occurs by replaying the pipeline from the last uncommitted instruction at half-frequency

SLIDE 14

Implementation Details

  • Flip-flops: 2976
  • Flip-flops with TD: 503 (17%)
  • ICGs: 149
  • ICGs with TD: 27
  • TD for RAMs: 20
  • TD Power Overhead: 5.7%
  • Power Overhead of Min-delay Buffers: 1.3%
  • Stabilization Stages Power Overhead: 2.4%
  • Total Power Overhead: 8.4%
  • Total Area Overhead @ 70% utilization: 6.9%
  • Measured Setup Pessimism of TD: 5% @ 1GHz/1V
  • IRAM and DRAM size: 2KB

SLIDE 15

Map of Failing Endpoints - #TT9 1V VDD

  • 4 TDs fail at 1.1GHz compared to 122 at 1.2GHz

[Figure: maps of TDs with and without errors for the typical workload at 1.1GHz and at 1.2GHz; labelled endpoints: BBusEx[7], InstrDe[8], InstrDe[25], FlagsMe[2]]

SLIDE 16

Comparing Different Workloads - #TT9 1V VDD

  • Significant variation in PoFF across workloads

[Figure: maps of TDs with and without errors at 1.1GHz for the typical workload and for the power virus; labelled endpoints: BBusEx[7], InstrDe[8], InstrDe[25], FlagsMe[2]]

SLIDE 17

Frequency Tuning – Fixed 1V VDD #TT9

  • Slow-down on every error
  • Speedup on 1024 cycles without error

[Plot: frequency (MHz) versus time for the NOP, Power Virus and Typical workloads; annotated values: 1228MHz, 1003MHz, 1143MHz, 1068MHz and 14%]
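The control policy on this slide (slow down on every error, speed up after 1024 cycles without error) can be sketched as a small state machine. This is an illustrative Python model only: the class name, the step size and the frequency bounds are assumptions, and the controller on the chip is programmable and not specified in the slides.

```python
class AdaptiveFrequencyController:
    """Toy model of 'slow down on error, speed up after N clean cycles'."""

    def __init__(self, freq_mhz=1000.0, step_mhz=5.0,
                 f_min_mhz=700.0, f_max_mhz=1300.0, clean_cycles=1024):
        # step_mhz and the frequency bounds are hypothetical values.
        self.freq_mhz = freq_mhz
        self.step_mhz = step_mhz
        self.f_min_mhz = f_min_mhz
        self.f_max_mhz = f_max_mhz
        self.clean_cycles = clean_cycles  # 1024 is the figure on the slide
        self._since_error = 0

    def tick(self, error: bool) -> float:
        """Advance one clock cycle and return the updated frequency in MHz."""
        if error:
            # Slow down on every error and restart the clean-cycle count.
            self.freq_mhz = max(self.f_min_mhz, self.freq_mhz - self.step_mhz)
            self._since_error = 0
        else:
            self._since_error += 1
            if self._since_error >= self.clean_cycles:
                # Speed up after a full window without errors.
                self.freq_mhz = min(self.f_max_mhz, self.freq_mhz + self.step_mhz)
                self._since_error = 0
        return self.freq_mhz


ctrl = AdaptiveFrequencyController()
for _ in range(1024):
    ctrl.tick(error=False)
after_clean_window = ctrl.freq_mhz   # one step above the starting frequency
after_error = ctrl.tick(error=True)  # one step back down
```

With an asymmetric policy of this kind the frequency settles where errors are rare but not absent, so each workload ends up at a different operating point.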

SLIDE 18

Voltage Tuning – Fixed 1GHz Frequency #TT9

[Plot: voltage (V) and TT9 errors versus time for the NOP, Power Virus and Typical workloads; annotated voltages: 0.97V and 1.07V]

SLIDE 19

Voltage Tuning – Fixed 1GHz Frequency #TT9

[Plot: voltage (V) and TT9 errors versus time for the NOP, Power Virus and Typical workloads; annotated voltages: 0.97V, 1.07V and 1.1V (3% margin); annotation: 30% power saving]

SLIDE 20

SS/TT/FF Comparison – 1GHz Frequency

[Plot: voltage (V) versus time for the NOP, Power Virus and Typical workloads on parts SS6, TT9 and FF5; annotated voltages: 1.17V, 1.07V, 1.08V, 0.97V, 1.03V, 0.92V]

SLIDE 21

Minimum Voltage – 1GHz Operation

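[Plot: voltage (V) versus time for the NOP, Power Virus and Typical workloads on parts SS6, TT9 and FF5, with the 1.2V level and a 3% margin marked; annotated voltages: 1.17V, 1.07V, 1.08V, 0.97V, 1.03V, 0.92V]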

SLIDE 22

1.2V vs Razor – Typical Workload

  • Tune voltage to zero margin point using Razor
  • SS6 part now consumes maximum power
  • Power outlier for distribution reduces from 100mW to 48mW with Razor for typical code

52% power saving

[Figure: power of three parts at 1.2V and at Razor tuned VDD; annotated values: 71mW, 100mW and 64mW at 1.2V; 42mW, 40mW and 48mW with Razor; tuned voltages 906mV, 1.063V and 964mV]
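The 52% figure follows from the two worst-case power values quoted on this slide. A quick Python check, in which the helper name is hypothetical:

```python
def power_saving_percent(baseline_mw: float, tuned_mw: float) -> float:
    """Relative power saving of the tuned part over the baseline, in percent."""
    if baseline_mw <= 0:
        raise ValueError("baseline power must be positive")
    return 100.0 * (baseline_mw - tuned_mw) / baseline_mw


# Worst-case part: 100mW at 1.2V versus 48mW at the Razor tuned voltage.
saving = power_saving_percent(100.0, 48.0)
print(f"{saving:.0f}% power saving")
```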

SLIDE 23

Power distribution at 1.2V vs Razor

  • Power distribution without Razor is wide
  • Razor improves both the μ and the σ of the distribution

[Figure: power distribution, Razor versus OD (1.2V); annotation: 30mW (40%)]

SLIDE 24

Parametric Yield

With Razor 1GHz operation is possible at 1.1V

  • All code except pathological power virus runs below 1.1V

Without Razor 1GHz operation is only possible at 1.2V

  • Power virus code requires 1.2V for SS6
  • 1.2V exceeds 1.1V overdrive limit of the process
  • Excessive leakage and wear-out implications

Discarding fast/leaky parts and slow parts might be the correct trade-off without Razor

  • Limit overdrive to 1.1V with parametric screening

SLIDE 25

Parametric Yield – Native Distribution

[Figure: native distribution of the 87 devices at 1.1V, FF (30), TT (37) and SS (20), with FF5 and SS6 marked; axes: number of chips, power at 1GHz (mW), maximum frequency at 1.1V (MHz)]

SLIDE 26

Parametric Yield – Power vs Frequency

[Scatter plot: power at 1GHz (mW) versus maximum frequency at 1.1V (MHz) for the 87 devices at 1.1V OD, FF (30), TT (37) and SS (20), with FF5 and SS6 marked]

SLIDE 27

Parametric Yield – Power vs Frequency

[Scatter plot: power at 1GHz (mW) versus maximum frequency at 1.1V (MHz) for the 87 devices at 1.1V OD, FF (30), TT (37) and SS (20), with FF5 and SS6 marked and the power limit and frequency limit drawn]

SLIDE 28

Parametric Yield – Prune Distribution

[Scatter plot: power at 1GHz (mW) versus maximum frequency at 1.1V (MHz) at 1.1V OD, with the power limit and frequency limit bounding the yielding parts]

SLIDE 29

Parametric Yield – Prune Distribution

  • >60mW: 21 parts
  • <1GHz: 38 parts
  • Yielding parts = 28 out of 87

[Scatter plot: power at 1GHz (mW) versus maximum frequency at 1.1V (MHz) at 1.1V OD]
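The pruning on this slide is a two-sided screen: parts above the 60mW power limit and parts below the 1GHz frequency limit are discarded. A minimal Python sketch of such a screen, in which the type, the function and the demo parts are hypothetical and only the two limits and the final counts come from the slide:

```python
from typing import Iterable, NamedTuple


class Part(NamedTuple):
    name: str
    power_mw_at_1ghz: float
    fmax_mhz_at_1v1: float


def screen(parts: Iterable[Part], power_limit_mw: float = 60.0,
           freq_limit_mhz: float = 1000.0) -> dict:
    """Split parts into yielding, over-power and too-slow bins."""
    bins = {"yield": [], "over_power": [], "too_slow": []}
    for p in parts:
        if p.fmax_mhz_at_1v1 < freq_limit_mhz:
            bins["too_slow"].append(p.name)
        elif p.power_mw_at_1ghz > power_limit_mw:
            bins["over_power"].append(p.name)
        else:
            bins["yield"].append(p.name)
    return bins


# Hypothetical parts, NOT measured data from the slides.
demo = [Part("fast_leaky", 72.0, 1150.0),
        Part("typical", 50.0, 1040.0),
        Part("slow", 41.0, 930.0)]
bins = screen(demo)

# Counts quoted on the slide: 87 parts, 21 over 60mW, 38 below 1GHz.
# Assumes no part is counted in both failing groups.
yielding = 87 - 21 - 38
```

The subtraction reproduces the 28 yielding parts stated on the slide.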

SLIDE 30

Parametric Yield – Razor

  • >60mW: 0 parts
  • <1GHz: 0 parts
  • Yielding parts = 87 out of 87

[Scatter plot: power at 1GHz (mW) versus maximum frequency at 1.1V (MHz), 1.1V OD versus Razor]

SLIDE 31

Parametric Yield – Razor

  • >60mW: 0 parts
  • <1GHz: 0 parts
  • Yielding parts = 87 out of 87
  • 20% power saving

[Scatter plot: power at 1GHz (mW) versus maximum frequency at 1.1V (MHz), 1.1V OD versus Razor]

SLIDE 32

Parametric Yield – 100% yield at 1.1V vs Razor

  • Yielding parts = 87 out of 87
  • 38% power saving
  • 14% Fmax gain

[Scatter plot: power at 1GHz (mW) versus maximum frequency at 1.1V (MHz), 1.1V OD versus Razor; annotated values: 78mW and 890MHz]

SLIDE 33

Summary and Conclusion

  • Reclaim margins for gains in energy-efficiency and parametric yield
  • Obtained 52% power saving at 1GHz operation on an ARM prototype through Razor
  • Developed a new Transition-Detector design with reduced min-delay impact
  • Demonstrated run-time adaptation to PVT variations and tolerance to fast transients
  • Demonstrated potential for parametric yield improvements using Razor

SLIDE 34

Backup Slides

SLIDE 35

Tracking Circuits

Multiple worst-case paths converge to the same end-point

  • 100 paths within 70ps (3%) of the critical-path to same endpoint
  • 377 unique instances and 119 unique cell masters covered by the paths
  • Extracted critical-path SPICE netlist has 9120 resistors, 2413 coupling and ground capacitors and 1442 instances including aggressors

[Figure: critical paths highlighted]

Requires multiple tracking circuits for reasonable approximation

Alternatively, just 1 Razor flop at the end-point is sufficient

SLIDE 36

Transition-Detector Timing Diagram

[Timing diagram: waveforms D, DP, CK pulse and ERROR, with intervals TD, TCK and TOV marked]

Min Delay Constraint = TCK - TOV

Error Detection Window = TCK + TD - 2TOV

Advantages

  • Reduced min-delay constraint
  • 50% duty-cycle clocking

Disadvantages

  • Setup pessimism
  • Extra clock transistors

SLIDE 37

Parametric Yield – Yield Loss TT Lot

PASS

TT Chip   PV PoFF
TT51      1.026
TT56      1.035
TT52      1.054
TT54      1.054
TT55      1.060
TT5       1.061
TT7       1.062
TT14      1.063
TT19      1.065
TT57      1.066
TT58      1.066
TT17      1.068
TT8       1.068
TT60      1.069
TT9       1.071
TT31      1.071
TT47      1.072
TT53      1.075
TT34      1.079
TT3       1.08
TT18      1.08
TT32      1.084
TT16      1.084
TT10      1.087
TT33      1.09
TT45      1.09
TT11      1.094
TT12      1.097

FAIL

TT Chip   PV PoFF
TT30      1.102
TT40      1.107
TT15      1.11
TT26      1.110
TT26      1.114
TT2       1.122
TT27      1.126
TT59      1.128
TT28      1.144
TT13      1.168
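The PASS and FAIL groups on this slide are consistent with a single threshold on the power-virus PoFF. A minimal Python sketch, assuming the PoFF values are voltages and that the boundary is the 1.1V overdrive limit mentioned earlier in the deck; the constant and function names are hypothetical:

```python
OVERDRIVE_LIMIT_V = 1.1  # assumed pass/fail boundary (process overdrive limit)


def classify(poff_by_chip: dict) -> dict:
    """Label a chip PASS if its power-virus PoFF is below the limit, else FAIL."""
    return {chip: ("PASS" if poff < OVERDRIVE_LIMIT_V else "FAIL")
            for chip, poff in poff_by_chip.items()}


# A few rows copied from the lists on this slide.
sample = {"TT51": 1.026, "TT9": 1.071, "TT12": 1.097,
          "TT30": 1.102, "TT13": 1.168}
result = classify(sample)
```

Under this assumption TT12 (1.097) is the last passing part and TT30 (1.102) the first failing one, matching the grouping on the slide.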

SLIDE 38

Voltage Controller Transition Response

[Plots: voltage controller output (V) and TT9 errors in 100 samples, for the NOP to Power Virus transition and for the Power Virus to Typical transition]

SLIDE 39

Transition-Detector Timing Diagram

  • Cover setup time with sufficient margin

[Timing diagram: waveforms CK, nCK, D and ERROR, with intervals TCK, TD, TOV, TSU and TMARGIN marked; earliest detection and latest detection (TD + TCK – 2TOV) indicated]

Min Delay Constraint = TCK - TOV

SLIDE 40

Throughput versus Frequency

[Plot: normalized throughput and number of failing TDs versus frequency (MHz) for the typical workload, with the signoff frequency and PoFF marked]

SLIDE 41

Adaptive Frequency Control

  • 31-tap Ring Oscillator used as the clock source in adaptive mode
  • Coarse selection through changing tap setting
  • Switched cap network for fine-grained frequency setting
  • Programmable control algorithm