Circuit Reliability: Mechanisms, Monitors, and Effects in Near-Threshold Processors

slide-1
SLIDE 1

Circuit Reliability: Mechanisms, Monitors, and Effects in Near-Threshold Processors

Chris H. Kim

University of Minnesota, Minneapolis, MN

chriskim@umn.edu www.umn.edu/~chriskim/

slide-2
SLIDE 2

Scaling Challenges

[Figure: Power (W) vs. Year, marking the power wall (2000), variability wall (2010), and reliability wall (2020)]

slide-3
SLIDE 3

Overcoming the Power Wall

  • Proven solutions: Multi-core chips, dynamic voltage frequency scaling, clock gating, power gating, …

[Figure: computing Y=AxB on one unit vs. two parallel units]
  One unit:  Freq=1, Vdd=1, Throughput=1, Area=1, Power=1, Pwr Den=1
  Two units: Freq=0.5, Vdd=0.5, Throughput=1, Area=2, Power=0.25, Pwr Den=0.125 (87%↓)
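The numbers in the figure above can be reproduced with a short calculation. This is a minimal sketch that assumes dynamic power scales as (number of units) x Vdd^2 x Freq and that area scales with the number of units; the function name `relative_power` is illustrative and not from the slides.

```python
def relative_power(units: int, vdd: float, freq: float) -> float:
    """Dynamic power relative to one unit at Vdd=1, Freq=1.

    Assumes P is proportional to units * Vdd^2 * Freq (switched capacitance
    grows with the number of units).
    """
    return units * vdd ** 2 * freq


# Baseline: one unit at nominal voltage and frequency.
power_single = relative_power(units=1, vdd=1.0, freq=1.0)
density_single = power_single / 1.0  # area = 1

# Parallel version: two units, each at half voltage and half frequency.
power_dual = relative_power(units=2, vdd=0.5, freq=0.5)
throughput_dual = 2 * 0.5            # two units, each at half speed
density_dual = power_dual / 2.0      # area = 2

density_reduction = 1.0 - density_dual / density_single
print(power_dual, density_dual, density_reduction)
```

Under these assumptions the power density drops by 87.5%, which the slide rounds to 87%.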

slide-4
SLIDE 4

Overcoming the Variability Wall

[Figure: Intel Foxton Technology block diagram. Labels: VID, DAC, Power Supply, Package/Die, RPackage, VConnector, VDie, A/D, IIR, Micro-Controller, PCalc, PLimit]

  • Proven solutions: Variation aware design, memory assist/repair, lithography techniques, adaptive systems

slide-5
SLIDE 5

Overcoming the Reliability Wall

  • Possible solutions: Guardbanding, sensing and compensation, wear-leveling, failure resistant systems, …

slide-6
SLIDE 6


Outline

  • Device Reliability Issues
  • Reliability Monitors and Measurements
  • Reliability Effects in NTV Processors
  • Summary
slide-7
SLIDE 7

Aging in CMOS Transistors


slide-8
SLIDE 8

HCI, BTI, and TDDB in Digital Logic

  • Transistors are exposed to different stress conditions during normal digital circuit operation

[Figure: transistor stress conditions. Labels: D, ID, Inverted Channel]

slide-9
SLIDE 9

Practical Solutions for Preventing Aging Related Failures

  • BTI and HCI
    – Gradual decline in performance
    – Guard banding (static or dynamic), adjust Vmax
    – CAD, firmware & architecture level support essential
  • TDDB
    – Single incident may lead to outright system failure
    – Can happen anywhere inside a chip
    – Improve fabrication procedure, adjust Vmax
  • Bottom line: Precise measurement and understanding of circuit degradation is a key aspect of robust design

slide-10
SLIDE 10

Transistor Lifetime Estimation

  • Extrapolate stress results with respect to:
    – Op. conditions based on acceleration models
    – Larger chip areas (e.g., Poisson scaling for TDDB)
    – Lower percentiles based on chosen distribution

[Figure label: real supply voltage]
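The area extrapolation step can be illustrated with the usual weakest-link argument: a large chip survives only if every test-structure-sized piece of it survives. The sketch below assumes independent, identically distributed breakdown sites; the function name and the example numbers are hypothetical and not taken from the slides.

```python
def scale_failure_fraction(f_test: float, area_ratio: float) -> float:
    """Scale a cumulative failure fraction from a test area to a larger area.

    Weakest-link (Poisson) area scaling, assuming independent breakdown sites:
        F_chip = 1 - (1 - F_test) ** (A_chip / A_test)
    """
    return 1.0 - (1.0 - f_test) ** area_ratio


# Hypothetical example: 1% of test structures have failed at some stress time,
# and the product has 100x the gate area of one test structure.
chip_fail = scale_failure_fraction(f_test=0.01, area_ratio=100)
print(f"chip failure fraction: {chip_fail:.3f}")
```

The example shows why a failure fraction that looks small on a test structure can become unacceptable once it is projected onto full chip area.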

slide-11
SLIDE 11

Benefits of In-Situ Reliability Monitors over Device Probing

  • Information from actual circuits (test circuit must be representative)
  • High (timing) precision + short measurement interrupt
  • No expensive equipment
  • Short test time and reduced test area
  • Measurements at use condition allow realistic lifetime projection
  • Complements traditional probing methods

slide-12
SLIDE 12

Usage Scenarios and Design Issues of In-situ Reliability Monitors

  • Usage scenario 1: Process characterization and yield improvement
    – Early technology characterization is often performed before many metallization layers are fabricated
    – Library cells may not be available (flip-flops, scan)
    – Device probing would still be a competitive solution for extracting analog parameters such as I–V or C–V
  • Usage scenario 2: In-field monitoring and data collection
    – Workload unknown
    – Simple circuits are practical but they have limited capabilities
    – Firmware and architecture support needed

slide-13
SLIDE 13

Usage Scenarios and Design Issues of In-situ Reliability Monitors

  • Usage scenario 3: Sensor for real time aging compensation
    – Effectiveness versus overhead
    – Measurements are from a proxy circuit
    – Practical issues: type of sensor, temporal granularity, spatial granularity, communication with sensors, interface and protocol
    – Personally not a big fan

slide-14
SLIDE 14


Outline

  • Device Reliability Issues
  • Monitors and Measurements
  • Effects in NTV Processors
  • Summary
slide-15
SLIDE 15

Circuit Based Reliability Monitors (or Silicon Odometers)

Odometer projects by year and process, with the reliability issue each one focused on (die photos omitted):

  • 2007, 130nm: Original Silicon Odometer (NBTI induced frequency degradation)
  • 2008, 65nm: All-In-One Odometer (separately monitoring NBTI, HCI and TDDB)
  • 2009, 65nm: Statistical, Duty-Cycle, and RTN Odometer (statistical behavior of NBTI; RTN on logic circuit)
  • 2010, 65nm: Interconnect Odometer (impact of interconnect on BTI and HCI aging)
  • 2011, 32nm SOI: PBTI and SRAM Odometer (monitoring PBTI in HKMG process; BTI impact on SRAM read/write)
  • 2012, 32nm SOI: SRAM and RTN Odometer (SRAM timing issues due to BTI; RTN impact on ring oscillator)

slide-16
SLIDE 16

Beat Frequency Silicon Odometer

  • Beat frequency of two free running ROSCs measured by DFF and edge detector
  • Benefits of beat frequency detection system
    – Achieve ps resolution with μs measurement interrupt
    – Insensitive to common mode noise such as temperature drifts
    – Fully digital, scan based interface, easy to implement

slide-17
SLIDE 17

Beat Frequency Silicon Odometer

[Figure: reference and stressed ROSC waveforms and the resulting beat signal]

  • Sample stressed ROSC output with reference ROSC
    – 1% frequency difference before stress → N=100
    – 2% frequency difference after stress → N=50
    – Δf or ΔT sensing resolution is >0.01%
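The relation between the frequency difference and the count N can be checked numerically. This is a first-order sketch that assumes N is the number of reference ROSC periods in one beat period, N = f_ref / |f_ref - f_stress|; the function name and the normalized frequencies are illustrative.

```python
def beat_count(f_ref: float, f_stress: float) -> float:
    """Reference periods per beat period: N = f_ref / |f_ref - f_stress|."""
    return f_ref / abs(f_ref - f_stress)


# Frequencies normalized to the reference ROSC.
n_fresh = beat_count(f_ref=1.00, f_stress=0.99)  # 1% slower before stress
n_aged = beat_count(f_ref=1.00, f_stress=0.98)   # 2% slower after stress

# Smallest detectable change near N=100: the extra frequency difference
# that moves the count from 100 to 99.
step = 1 / 99 - 1 / 100

print(round(n_fresh), round(n_aged), f"{step:.4%}")
```

A one-count change near N=100 corresponds to roughly a 0.01% change in frequency, which is consistent with the resolution quoted on the slide.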

slide-18
SLIDE 18

ROSC Based Aging Sensor Comparison

System: Single ROSC
  Function: Count Stress ROSC periods during externally controlled meas. time
  Features: Simple; compact
  Issues: Voltage and temp. variations; meas. time vs. resolution tradeoff; requires absolute timing reference (e.g. oscilloscope)
  Meas. time for 0.01% max res.*: 30 μs
  Meas. error wrt. common mode variations**: +10.18% / -8.57%

System: 2 ROSC, simple
  Function: Count Stress ROSC periods during N1 periods of Ref. ROSC
  Features: Simple; immune to common mode variations
  Issues: Meas. time vs. resolution tradeoff
  Meas. time for 0.01% max res.*: 30 μs
  Meas. error wrt. common mode variations**: +0.06% / -0.07%

System: 2 ROSC, beat freq.
  Function: Count Ref. ROSC periods during one period of PC_OUT
  Features: High resolution w/ short meas. time; immune to common mode variations
  Issues: Requires extra circuits (e.g., Phase Comp., edge detector, etc...)
  Meas. time for 0.01% max res.*: 0.3 μs
  Meas. error wrt. common mode variations**: +0.26% / -0.38%

* ROSC period = 3 ns
** Simulated with +/- 4% ∆VCC
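The measurement times in the comparison follow from the 3 ns ROSC period. The sketch below assumes that direct period counting needs about 1/resolution periods (one count of quantization error), and that the beat-frequency system needs about one beat period at a 1% frequency offset between the two ROSCs; both function names are illustrative.

```python
ROSC_PERIOD_S = 3e-9  # from the table footnote: ROSC period = 3 ns


def direct_count_time(resolution: float, period_s: float = ROSC_PERIOD_S) -> float:
    """Time to count enough periods that one count equals `resolution`."""
    return period_s / resolution


def beat_period_time(freq_offset: float, period_s: float = ROSC_PERIOD_S) -> float:
    """Duration of one beat period for a fractional frequency offset."""
    return period_s / freq_offset


t_direct = direct_count_time(resolution=1e-4)  # 0.01% resolution
t_beat = beat_period_time(freq_offset=0.01)    # assumed 1% offset, N = 100

print(f"direct counting: {t_direct * 1e6:.1f} us, beat frequency: {t_beat * 1e6:.1f} us")
```

With these assumptions the two results come out at about 30 μs and 0.3 μs, matching the table entries.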

slide-19
SLIDE 19

Separately Monitoring NBTI and PBTI

  • PBTI becoming an important concern in high-k metal-gate
  • Conventional Ring Oscillator (ROSC) can only provide overall frequency degradation information due to combined NBTI and PBTI effects
  • New RO structure separates NBTI and PBTI effects

[Figure: ring oscillator configurations for PBTI stress, NBTI stress, and N/PBTI stress]

  • J. Kim, et al., IBM, IRPS 2011
slide-20
SLIDE 20

Separately Monitoring BTI and HCI

[Figure: block diagram. Stressed oscillators: BTI_ROSC (BTI Stress Only) and DRIVE_ROSC (BTI & HCI Stress). Unstressed references: BTI_REF_ROSC and DRIVE_REF_ROSC. Beat Frequency Detection Circuits 1 and 2, each with a SCAN OUT]

slide-21
SLIDE 21

Separately Monitoring BTI and HCI

  • Backdriving action equalizes BTI in both BTI_ROSC and DRIVE_ROSC
  • Negligible HCI in BTI_ROSC: only 3-5% of the switching current in the DRIVE_ROSC
  • Fresh power gates are used for frequency measurements

slide-22
SLIDE 22
Temp. and Voltage Dependencies

  • HCI slightly reduced with temperature
    – Due to reduced drain current
  • Degradation from both mechanisms increases with stress voltage
    – Point when HCI begins to dominate pushed out in time by >1 order of magnitude at 1.8V vs. 2.4V

[Plot: Frequency Shift (%) vs. Stress Time (s); 250MHz stress freq., 2.0V stress; HCI and BTI degradation at 30°C and 120°C]
[Plot: Frequency Shift (%) vs. Stress Time (s); 26°C, 470MHz stress freq.; 2.4V stress vs. 1.8V stress]

slide-23
SLIDE 23

Aging Issues in Interconnects

  • Interconnect affects the voltage and current shapes
    – Increased transition time (decreased slew rate)
    – Increased current pulse width; decreased current peak value
  • BTI and HCI have different sensitivities to bias conditions

slide-24
SLIDE 24

Interconnect Aging Monitor


  • Serpentine wires for a dense chip implementation
  • Ground shielding on both sides for reducing noise
  • X. Wang, et al., IRPS 2012, TVLSI 2014
slide-25
SLIDE 25
BTI and HCI Aging: With Interconnect

  • BTI aging decreases with interconnect length
  • HCI degradation peaks at L=500µm

slide-26
SLIDE 26

BTI Aging vs. Interconnect Length

  • BTI induced frequency degradation decreases with longer interconnect
  • Longer transition time → shorter PMOS stress duration → Less BTI aging

slide-27
SLIDE 27

HCI Aging vs. Interconnect Length

  • HCI aging exhibits a non-monotonic behavior with respect to interconnect length
    – Current pulse width increases
    – Current peak decreases

slide-28
SLIDE 28

Statistical Behavior of Aging

  • Finite number and random spatial distribution of discrete charges → NBTI & HCI variation
  • Inversely proportional to gate area AGATE → worse with scaling
  • Small number of aging measurements not sufficient to characterize aging

[Figures: Spread in ∆Vt increases with scaling; CDF of ∆Vt at different stress times]

  • S. Pae, et al., TDMR '08; S. Rauch, TDMR, Dec. '07
slide-29
SLIDE 29

Statistical Reliability Monitor

  • Need stressed & reference ROSC frequencies to be close
  • Difficult, costly to tune each stressed ROSC
  • Use multiple ref. ROSCs with different frequencies
  • Cover the frequency distribution of the stressed array

[Figure: block diagram. Labels: FSM + Scan Chain, Column Peripherals, Ref ROSC 1, Ref ROSC 2, Ref ROSC 3, 3 Silicon Odometer Beat Frequency Detection Systems, SCANOUT RESULTS]

  • J. Keane, et al., IEDM 2010, JSSC 2011

slide-30
SLIDE 30

65nm Test Chip Data

  • Fresh and post-stress ROSC frequency PDFs
  • No significant correlation of the frequency shift with fresh frequency

[Plot: Percentage of ROSCs vs. DUT Frequency (MHz); fresh and 3.1hr stress distributions at 1.8V, 2.0V, and 2.2V]
[Plot annotations: 20°C, DC; 2.2V, 2.0V, & 1.8V; 120 ROSCs each @ 11200s; Correl. Coef.: 0.011, 0.126, 0.112]
slide-31
SLIDE 31

SRAM Memory Design Challenges at Low Supply Voltages

  • Ratio-ed operation leads to poor noise margin at low voltages for 6T SRAM cells
  • Conflicting requirements: a stronger access transistor improves write margin but worsens read margin

[Figure: SRAM cell with bitlines BL and BLB]

slide-32
SLIDE 32
Impact of BTI on SRAM Read

  • With BTI: Read stability degrades
  • Cell recovers on a fail

[Plot: Bit Failure Rate (BFR, %)]

slide-33
SLIDE 33

Impact of BTI on SRAM Write

  • With BTI: Write stability improves or remains unchanged
  • Cell recovers on a pass

[Plot: Bit Failure Rate (BFR, %)]

slide-34
SLIDE 34

Representative SRAM Reliability Macro

[Figure: macro block diagram. Labels: 256x128b SRAM array, Decoder, FSM, VMEAS, VSTRESS, 128b scan reg., Col. peripheral, VCO, Peripheral supply, Off-chip DAQ, CLK, WL, BL, Supply switches, Array supply domain, Peripheral supply domain]

  • Represents a product SRAM sub-array
  • BIST function done by on-chip FSM with supply switches
  • P. Jain, et al., IEDM, 2012

[Plot: Read Failure Rate (%) vs. TSTRESS (s); 32nm, 0.52V, 85°C; TMEAS increased from 3µs to 2ms; 10x annotation]

slide-35
SLIDE 35

Aging Monitor in IBM Microprocessors

  • Implemented on IBM's z196 Enterprise systems to monitor long-term degradation under real-use conditions
  • Over 500 days worth of ring oscillator degradation data from customer systems
  • Other companies have aging monitors too, but they tend not to publish their work

Pong-Fei Lu, Keith Jenkins, IBM, IRPS 2013

slide-36
SLIDE 36

Aging Monitor in IBM Microprocessors

  • Time-zero problem: Some time will elapse between applying voltage (burn-in, test, operation) and making the first measurement → time-zero frequency is completely unknown → incorrect time slope of 0.42
  • Use fitting parameters assuming Δf = A(t-t0)^n - At^n → time slope of 0.172

Pong-Fei Lu, Keith Jenkins, IBM, IRPS 2013
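The time-zero problem can be illustrated with synthetic numbers. This sketch assumes the true degradation follows a power law in total stress time, and that the observed shift is measured relative to a first sample taken after an unknown stress time T0, so that observed(t) = A(t+T0)^n - A*T0^n. That sign convention, the value of T0, and the prefactor A are assumptions of the sketch and may differ from the notation used in the IBM work; only the exponent 0.172 is taken from the slide.

```python
import math


def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den


N_TRUE = 0.172  # time exponent quoted on the slide
A = 1.0         # hypothetical prefactor
T0 = 1.0e5      # hypothetical stress time (s) before the first sample

# Times since the first sample, and the shift observed relative to it.
times = [10.0 ** k for k in range(4, 8)]
observed = [A * (t + T0) ** N_TRUE - A * T0 ** N_TRUE for t in times]

# Naive fit: treat the first sample as time zero.
apparent_slope = loglog_slope(times, observed)

# Corrected fit: restore the elapsed time and the missing initial shift.
corrected_slope = loglog_slope(
    [t + T0 for t in times],
    [d + A * T0 ** N_TRUE for d in observed],
)

print(f"apparent: {apparent_slope:.3f}, corrected: {corrected_slope:.3f}")
```

The naive fit overestimates the exponent because the missing initial shift makes early points look too small; accounting for the elapsed time recovers the underlying slope. The sketch is not meant to reproduce the 0.42 value from the slide.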

slide-37
SLIDE 37

Aging Monitor in IBM Microprocessors

Design considerations, examples of practical issues, and the aging sensor implementation in the IBM z196 server [3]:

  • Type of Sensor
    – Practical issues: BTI, HCI, TDDB, RTN, transient errors, memory bit failures, etc.
    – IBM z196: Ring oscillator based BTI monitor for long-term frequency degradation measurement
  • Temporal Granularity
    – Practical issues: Sensing period, threshold setting, dynamic range, etc.
    – IBM z196: Sampling period: once a week
  • Spatial Granularity
    – Practical issues: Per CPU/GPU/memory, per functional unit, per sub-block, etc.
    – IBM z196: Total: 5 sensors per chip; one sensor per core (x4 cores) plus one sensor in L2 cache
  • Stress and Measurement Condition
    – Practical issues: AC vs. DC, accelerated vs. usage condition, fast measurement
    – IBM z196: AC stress, usage condition, 0.5ms measurement time
  • Communication
    – Practical issues: Between data gathering sensor, across sensors, between sensors and processor
    – IBM z196: Sensors are integrated with IBM z196 pervasive infrastructure with firmware support
  • Interface and Protocol
    – Practical issues: Interrupt based, polling, event alarms, performance counter based, etc.
    – IBM z196: Interrupt based in-field frequency degradation measurement
  • Testing and Calibration
    – Practical issues: Similar to any other on-chip monitor circuit
    – IBM z196: Time 0 frequency shift unknown since first sample is taken after some stress

Pong-Fei Lu, Keith Jenkins, IBM, IRPS 2013

slide-38
SLIDE 38

Outline

  • Device Reliability Issues
  • Monitors and Measurements
  • Effects in NTV Processors
  • Summary


slide-39
SLIDE 39

DVFS Systems in ISSCC 2014

22nm Intel Haswell processor

  • N. Kurd, et al., ISSCC, 2014
  • Latest trends: On-chip distributed VRM (fast transients, supply noise suppression), per-core DVS, NTV/Turbo

22nm IBM POWER8 processor

  • Z. Toprak-Deniz, et al., ISSCC, 2014

<1% area overhead

slide-40
SLIDE 40

Frequency Fluctuation in DVFS (BTI Example)

[Figure: three panels of VDD and Frequency vs. Time, annotated with "Freq. dip" and "Freq. peak"]

  • Constant VDD: Frequency degrades with stress
  • High VDD to low VDD: Freq. dips due to lower VDD followed by recovery
  • Low VDD to high VDD: Freq. jumps and then degrades
slide-41
SLIDE 41

Frequency Fluctuation in DVFS (BTI Example)

[Figure: the same three panels of VDD and Frequency vs. Time, each with the fCLK level and its guardband marked]

  • Constant VDD: Frequency degrades with stress
  • High VDD to low VDD: Freq. dips due to lower VDD followed by recovery
  • Low VDD to high VDD: Freq. jumps and then degrades

slide-42
SLIDE 42

Modeling Approach using Superposition

  • Rationale for empirical superposition method
    – Complicated VDD trace can be broken down into multiple pulses
    – Suitable for long- and short-term, DC and AC
    – Computation is more efficient, short runtime

[Figure: VDD trace with levels V1, V2, V3 and the corresponding ΔVT1, ΔVT2, ΔVT3 vs. Time (a.u.); ΔVT=ΔVT1+ΔVT2+ΔVT3; Δf(%) vs. Time (a.u.), with a region marked "Not captured in previous work"]

  • C. Zhou, et al., IRPS 2014
slide-43
SLIDE 43

BTI Recovery Model using Superposition

  • Stress model: t^n (power law)
  • Recovery model derived from superposition property: ΔVT,recovery(t) = t^n - (t-t0)^n
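The stress and recovery expressions above can be combined into one function of time. This is a minimal sketch that assumes a single stress pulse applied from time 0 to t0, a normalized prefactor of 1 (as in the slide's formula), and a hypothetical exponent n=0.2; the function name is illustrative.

```python
def delta_vt(t: float, t0: float, n: float = 0.2) -> float:
    """Normalized threshold shift for stress applied on [0, t0], then removed.

    During stress:  t**n                  (power law)
    After stress:   t**n - (t - t0)**n    (superposition of a stress term and
                                           a delayed term of opposite sign)
    The exponent n=0.2 is a hypothetical value, not taken from the slides.
    """
    if t <= t0:
        return t ** n
    return t ** n - (t - t0) ** n


t0 = 100.0
shift_at_end_of_stress = delta_vt(100.0, t0)
shift_after_recovery = [delta_vt(t, t0) for t in (200.0, 1000.0, 10000.0)]

print(shift_at_end_of_stress, shift_after_recovery)
```

In this sketch the shift keeps decreasing after the stress is removed but never reaches zero in finite time, which is the qualitative recovery behavior the superposition model is meant to capture.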

slide-44
SLIDE 44

Translating VT Shift to Delay Shift

  • ROSC mimics logic path
  • Translate ΔVT to pull-up, pull-down delay

[Plots: Pull-down Delay and Pull-up Delay; Delay (normalized) vs. ΔVT (normalized) and VDD (normalized); HSPICE, TT, 60°C]

slide-45
SLIDE 45

Android Development Board for Collecting DVFS Traces

  • VDD and operating frequency collected in real time
  • Navigating websites, running benchmark applications

Linux kernel: v 3.4.5
Operating system: Android v 4.2.2
Processor: ARM Cortex A15
System: Samsung Exynos 5410 SoC
Frequency: 0.8 – 1.8 GHz
Voltage: 0.9 – 1.25 V
DVFS meas.: National Instr. DAQ
Sampling frequency: 1000 samples per second
Process: 28nm

slide-46
SLIDE 46

Sample Waveform and Estimated Frequency Shift

[Plot: Frequency Shift (%) and VDD (normalized) vs. Time (s) for the Amazon.com trace; high VDD stress and low VDD recovery phases; 0.90% annotation]

  • High VDD duration: Freq. degrades with time
  • Low VDD duration: Freq. shift dips and then recovers
slide-47
SLIDE 47

Applying Model to Other DVFS Traces

  • Worst case frequency dip
    – 3D-raytrace: Δf=1.0% at t=6s when VDD drops by 29% after staying in high VDD mode for 5.8s

[Plots: Sina.com, Google.com, NYTimes.com, Amazon.com]

slide-48
SLIDE 48

Summary

  • Power wall (2000) → Variability wall (2010) → Reliability wall (2020)
  • Example: NTV + RDF + BTI
  • Aging sensor deployed for the first time in a commercial processor (IBM z systems)
  • Per-Core DVFS with sub-microsecond ramp time becoming a standard feature in new processors
  • Turbo boost + NTV: Best of both worlds in terms of power and performance, but presents new reliability challenges