Circuit Reliability: Mechanisms, Monitors, and Effects in Near-Threshold Processors
Chris H. Kim
University of Minnesota, Minneapolis, MN
chriskim@umn.edu
www.umn.edu/~chriskim/
Scaling Challenges
[Figure: power (W) vs. year, with the power wall (2000), variability wall (2010), and reliability wall (2020)]
Overcoming the Power Wall
- Proven solutions: Multi-core chips, dynamic voltage
frequency scaling, clock gating, power gating, …
Example (Y=AxB), one unit vs. two parallel units:
              One unit    Two units
Freq          1           0.5
Vdd           1           0.5
Throughput    1           1
Area          1           2
Power         1           0.25
Pwr Den       1           0.125 (87%↓)
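The numbers in this example can be reproduced with a short calculation. The sketch below assumes that dynamic power per unit scales as Vdd^2 x Freq with normalized capacitance, and it ignores leakage and parallelization overhead; the function name `scaled_design` is illustrative only.

```python
def scaled_design(n_units, freq, vdd):
    """Relative metrics for n_units parallel copies running at (freq, vdd).

    Assumption: dynamic power per unit ~ C * Vdd^2 * f with C normalized to 1.
    Leakage and the overhead of splitting/merging work are ignored.
    """
    throughput = n_units * freq
    area = n_units
    power = n_units * vdd ** 2 * freq
    return {
        "throughput": throughput,
        "area": area,
        "power": power,
        "power_density": power / area,
    }


baseline = scaled_design(n_units=1, freq=1.0, vdd=1.0)
parallel = scaled_design(n_units=2, freq=0.5, vdd=0.5)

# Fractional reduction in power density relative to the single-unit baseline.
reduction = 1.0 - parallel["power_density"] / baseline["power_density"]
print(parallel, f"power density reduction = {reduction:.1%}")
```

Under these assumptions the two-unit design keeps throughput at 1 while power drops to 0.25 and power density to 0.125, which is the roughly 87% reduction shown on the slide.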
Overcoming the Variability Wall
- Proven solutions: Variation aware design, memory assist/repair, lithography techniques, adaptive systems
[Figure: Intel Foxton Technology block diagram with VID, DAC, power supply, A/D converters, micro-controller (VDie, IIR, PCalc, PLimit), and package/die (VConnector, VDie, RPackage)]
Overcoming the Reliability Wall
- Possible solutions: Guardbanding, sensing and compensation, wear-leveling, failure resistant systems, …
Outline
- Device Reliability Issues
- Reliability Monitors and Measurements
- Reliability Effects in NTV Processors
- Summary
Aging in CMOS Transistors
HCI, BTI, and TDDB in Digital Logic
- Transistors are exposed to different stress conditions during normal digital circuit operation
[Figure: transistor cross sections showing the inverted channel and drain current ID]
Practical Solutions for Preventing Aging Related Failures
- BTI and HCI
– Gradual decline in performance
– Guard banding (static or dynamic), adjust Vmax
– CAD, firmware & architecture level support essential
- TDDB
– Single incident may lead to outright system failure
– Can happen anywhere inside a chip
– Improve fabrication procedure, adjust Vmax
- Bottom line: Precise measurement and understanding of circuit degradation is a key aspect of robust design
Transistor Lifetime Estimation
- Extrapolate stress results with respect to:
– Op. conditions based on acceleration models
– Larger chip areas (e.g., Poisson scaling for TDDB)
– Lower percentiles based on chosen distribution
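The three extrapolation steps can be sketched in a few lines. The slide names only "acceleration models", "Poisson scaling", and a "chosen distribution"; the sketch below assumes a Weibull failure distribution and a power-law voltage acceleration model, which are common choices but are assumptions here. All function names and numbers are hypothetical.

```python
import math


def voltage_scaled_t63(t63_stress, v_stress, v_use, n_exp):
    """Operating-condition step, assuming a power-law model t ~ V^(-n_exp)."""
    return t63_stress * (v_stress / v_use) ** n_exp


def area_scaled_t63(t63_test, beta, area_ratio):
    """Area step (Poisson / weakest-link scaling for a Weibull shape beta).

    area_ratio = product area / test structure area; larger area fails earlier.
    """
    return t63_test * area_ratio ** (-1.0 / beta)


def weibull_percentile_time(t63, beta, fraction):
    """Percentile step: time at which `fraction` of parts have failed."""
    return t63 * (-math.log(1.0 - fraction)) ** (1.0 / beta)


# Hypothetical inputs, for illustration only.
t63 = voltage_scaled_t63(t63_stress=100.0, v_stress=2.0, v_use=1.0, n_exp=3.0)
t63 = area_scaled_t63(t63, beta=2.0, area_ratio=100.0)
t_low = weibull_percentile_time(t63, beta=2.0, fraction=1e-4)
print(f"t63 at use condition and product area: {t63:.1f} s")
print(f"time to 0.01% failures: {t_low:.3f} s")
```

The order of the steps does not matter in this simple form because each one is a multiplicative factor on the characteristic lifetime.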
[Figure: lifetime extrapolation from stress conditions to the real supply voltage]
Benefits of In-Situ Reliability Monitors over Device Probing
- Information from actual circuits (test circuit must be representative)
- High (timing) precision + short measurement interrupt
- No expensive equipment
- Short test time and reduced test area
- Measurements at use condition allow realistic lifetime projection
- Complements traditional probing methods
Usage Scenarios and Design Issues of In-situ Reliability Monitors
- Usage scenario 1: Process characterization and yield improvement
– Early technology characterization is often performed before many metallization layers are fabricated
– Library cells may not be available (flip-flops, scan)
– Device probing would still be a competitive solution for extracting analog parameters such as I–V or C–V
- Usage scenario 2: In-field monitoring and data collection
– Workload unknown
– Simple circuits are practical but they have limited capabilities
– Firmware and architecture support needed
Usage Scenarios and Design Issues of In-situ Reliability Monitors
- Usage scenario 3: Sensor for real time aging compensation
– Effectiveness versus overhead
– Measurements are from a proxy circuit
– Practical issues: type of sensor, temporal granularity, spatial granularity, communication with sensors, interface and protocol
– Personally not a big fan
Outline
- Device Reliability Issues
- Monitors and Measurements
- Effects in NTV Processors
- Summary
Circuit Based Reliability Monitors (or Silicon Odometers)
- 2007, 130nm: Original Silicon Odometer (NBTI induced frequency degradation)
- 2008, 65nm: All-In-One Odometer (separately monitoring NBTI, HCI and TDDB)
- 2009, 65nm: Statistical, Duty-Cycle, and RTN Odometer (statistical behavior of NBTI; RTN on logic circuit)
- 2010, 65nm: Interconnect Odometer (impact of interconnect on BTI and HCI aging)
- 2011, 32nm SOI: PBTI and SRAM Odometer (monitoring PBTI in HKMG process; BTI impact on SRAM read/write)
- 2012, 32nm SOI: SRAM and RTN Odometer (SRAM timing issues due to BTI; RTN impact on ring oscillator)
Beat Frequency Silicon Odometer
- Beat frequency of two free running ROSCs measured by DFF and edge detector
- Benefits of beat frequency detection system
– Achieve ps resolution with μs measurement interrupt
– Insensitive to common mode noise such as temperature drifts
– Fully digital, scan based interface, easy to implement
Beat Frequency Silicon Odometer
[Figure: reference and stressed ROSC waveforms and the resulting beat signal]
- Sample stressed ROSC output with reference ROSC
– 1% frequency difference before stress: N=100
– 2% frequency difference after stress: N=50
– Δf or ΔT sensing resolution is >0.01%
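The relation between the frequency difference and the count N can be sketched as follows. This assumes N is the number of reference ROSC periods in one beat period, so that N = f_ref / |f_ref - f_stress|; the function names and the normalized frequencies are illustrative.

```python
def beat_count(f_ref, f_stress):
    """Reference periods per beat period: N = f_ref / |f_ref - f_stress|."""
    return f_ref / abs(f_ref - f_stress)


def freq_diff_from_count(n):
    """Fractional frequency difference implied by a measured count N."""
    return 1.0 / n


n_fresh = beat_count(f_ref=1.00, f_stress=0.99)   # 1% difference before stress
n_aged = beat_count(f_ref=1.00, f_stress=0.98)    # 2% difference after stress

# Smallest detectable change near N=100: one count step (N=100 -> N=99).
step = freq_diff_from_count(99) - freq_diff_from_count(100)
print(round(n_fresh), round(n_aged), f"one-count step = {step:.4%}")
```

A 1% shift in the stressed ROSC moves the count from about 100 to about 50, while a single count step near N=100 corresponds to a frequency change of about 0.01%, consistent with the resolution quoted above.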
ROSC Based Aging Sensor Comparison
- Single ROSC
– Function: Count stress ROSC periods during externally controlled meas. time
– Features: Simple; compact
– Issues: Voltage and temp. variations; meas. time vs. resolution tradeoff; requires absolute timing reference (e.g., oscilloscope)
– Meas. time for 0.01% max res.*: 30 μs
– Meas. error wrt. common mode variations**: +10.18% / -8.57%
- 2 ROSC, simple
– Function: Count stress ROSC periods during N1 periods of ref. ROSC
– Features: Simple; immune to common mode variations
– Issues: Meas. time vs. resolution tradeoff
– Meas. time for 0.01% max res.*: 30 μs
– Meas. error wrt. common mode variations**: +0.06% / -0.07%
- 2 ROSC, beat freq.
– Function: Count ref. ROSC periods during one period of PC_OUT
– Features: High resolution w/ short meas. time; immune to common mode variations
– Issues: Requires extra circuits (e.g., phase comp., edge detector, etc.)
– Meas. time for 0.01% max res.*: 0.3 μs
– Meas. error wrt. common mode variations**: +0.26% / -0.38%
*ROSC period = 3 ns
**Simulated with +/- 4% ∆VCC
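The measurement times in the comparison follow from simple counting arguments. The sketch below is one possible reading, not a description of the actual circuits: it assumes that direct counting resolves one part in the number of counted periods, and that the beat frequency scheme starts from a fractional offset d between the two ROSCs so that one count step is roughly d^2. The function names are hypothetical.

```python
import math

ROSC_PERIOD_S = 3e-9  # ROSC period stated in the comparison footnote


def direct_count_time(resolution, period=ROSC_PERIOD_S):
    """Direct counting: resolving `resolution` needs 1/resolution periods."""
    return period / resolution


def beat_count_time(resolution, period=ROSC_PERIOD_S):
    """Beat frequency counting (assumed model).

    With offset d, N = 1/d and one count step is about d^2, so the offset
    that gives the target resolution is d = sqrt(resolution).
    """
    n_periods = 1.0 / math.sqrt(resolution)
    return n_periods * period


target = 1e-4  # 0.01% resolution
print(f"direct counting: {direct_count_time(target) * 1e6:.1f} us")
print(f"beat frequency:  {beat_count_time(target) * 1e6:.1f} us")
```

With a 3 ns period this gives about 30 μs for direct counting and about 0.3 μs for the beat frequency scheme, matching the figures listed for the three systems.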
Separately Monitoring NBTI and PBTI
- PBTI becoming an important concern in high-k metal-gate
- Conventional Ring Oscillator (ROSC) can only provide overall frequency degradation information due to combined NBTI and PBTI effects
- New RO structure separates NBTI and PBTI effects
[Figure: RO configurations for PBTI stress, NBTI stress, and N/PBTI stress]
- J. Kim, et al., IBM, IRPS 2011
Separately Monitoring BTI and HCI
[Figure: BTI_ROSC (BTI stress only) and DRIVE_ROSC (BTI & HCI stress) are the stressed ROSCs; BTI_REF_ROSC and DRIVE_REF_ROSC are the unstressed references; beat frequency detection circuits 1 and 2 compare each pair and scan out the results]
- Backdriving action equalizes BTI in both BTI_ROSC and DRIVE_ROSC
- Negligible HCI in BTI_ROSC: only 3-5% of the switching current in the DRIVE_ROSC
- Fresh power gates are used for frequency measurements
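Because BTI is equalized in the two oscillators and HCI in BTI_ROSC is negligible, the two readings can be combined to estimate each contribution. The sketch below assumes the BTI and HCI contributions to the frequency shift simply add; the function name and the readings are hypothetical.

```python
def separate_bti_hci(bti_rosc_shift_pct, drive_rosc_shift_pct):
    """Split measured frequency shifts into BTI and HCI contributions.

    Assumptions: BTI_ROSC sees BTI only, DRIVE_ROSC sees the same BTI plus
    HCI, and the two contributions are additive.
    """
    bti = bti_rosc_shift_pct
    hci = drive_rosc_shift_pct - bti_rosc_shift_pct
    return bti, hci


# Hypothetical readings from the two beat frequency detection circuits.
bti_pct, hci_pct = separate_bti_hci(bti_rosc_shift_pct=0.4, drive_rosc_shift_pct=1.0)
print(f"BTI: {bti_pct:.2f}%  HCI: {hci_pct:.2f}%")
```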
Temp. and Voltage Dependencies
- HCI slightly reduced with temperature
– Due to reduced drain current
- Degradation from both mechanisms increases with stress voltage
– Point when HCI begins to dominate pushed out in time by >1 order of magnitude at 1.8V vs. 2.4V
[Figure: frequency shift (%) vs. stress time (s) for HCI and BTI degradation at 30°C and 120°C; 2.0V stress, 250MHz stress freq.]
[Figure: frequency shift (%) vs. stress time (s) at 2.4V and 1.8V stress; 26°C, 470MHz stress freq.]
Aging Issues in Interconnects
- Interconnect affects the voltage and current shapes
– Increased transition time (decreased slew rate)
– Increased current pulse width; decreased current peak value
- BTI and HCI have different sensitivities to bias conditions
Interconnect Aging Monitor
- Serpentine wires for a dense chip implementation
- Ground shielding on both sides for reducing noise
- X. Wang, et al., IRPS 2012, TVLSI 2014
BTI and HCI Aging: With Interconnect
- BTI aging decreases with interconnect length
- HCI degradation peaks at L=500µm
BTI Aging vs. Interconnect Length
- BTI induced frequency degradation decreases with longer interconnect
- Longer transition time → shorter PMOS stress duration → less BTI aging
HCI Aging vs. Interconnect Length
- HCI aging exhibits a non-monotonic behavior with respect to interconnect length
– Current pulse width increases
– Current peak decreases
Statistical Behavior of Aging
- Finite number and random spatial distribution of discrete charges → NBTI & HCI variation
- Inversely proportional to gate area (AGATE) → worse with scaling
- Small number of aging measurements not sufficient to characterize aging
[Figures: spread in ∆Vt increases with scaling; CDF of ∆Vt at different stress times]
- S. Pae, et al., TDMR '08
- S. Rauch, TDMR, Dec. '07
Statistical Reliability Monitor
- Need stressed & reference ROSC frequencies to be close
- Difficult, costly to tune each stressed ROSC
- Use multiple ref. ROSCs with different frequencies
- Cover the frequency distribution of the stressed array
[Figure: stressed array with column peripherals, FSM + scan chain, Ref ROSC 1-3, and 3 silicon odometer beat frequency detection systems; results are scanned out]
- J. Keane, et al., IEDM 2010, JSSC 2011
65nm Test Chip Data
- Fresh and post-stress ROSC frequency PDFs
- No significant correlation of the frequency shift with fresh frequency
[Figure: percentage of ROSCs vs. DUT frequency (MHz), fresh and after 3.1hr stress at 1.8V, 2.0V, and 2.2V]
[Figure: 20°C, DC stress at 2.2V, 2.0V, & 1.8V, 120 ROSCs each @ 11200s; correl. coef. values shown: 0.011, -0.126, -0.112]
SRAM Memory Design Challenges at Low Supply Voltages
- Ratioed operation leads to poor noise margin at low voltages for 6T SRAM cells
- Conflicting requirements: a stronger access transistor improves write margin but worsens read margin
[Figure: 6T SRAM cell with bit lines BL and BLB]
Impact of BTI on SRAM Read
- With BTI: Read stability degrades
- Cell recovers on a fail
[Figure: bit failure rate (BFR, %)]
Impact of BTI on SRAM Write
- With BTI: Write stability improves or remains unchanged
- Cell recovers on a pass
[Figure: bit failure rate (BFR, %)]
Representative SRAM Reliability Macro
[Figure: macro block diagram with 256x128b SRAM array, decoder, FSM, column peripherals, 128b scan register, VCO, supply switches (VMEAS, VSTRESS), separate array and peripheral supply domains, and off-chip DAQ]
- Represents a product SRAM sub-array
- BIST function done by on-chip FSM with supply switches
- P. Jain, et al., IEDM, 2012
[Figure: read failure rate (%) vs. TSTRESS (s) at 32nm, 0.52V, 85°C; TMEAS increased from 3µs to 2ms; a 10x difference is annotated]
Aging Monitor in IBM Microprocessors
- Implemented on IBM's z196 Enterprise systems to track long term degradation under real-use conditions
- Over 500 days' worth of ring oscillator degradation data from customer systems
- Other companies have aging monitors too, but they tend not to publish their work
Pong-Fei Lu, Keith Jenkins, IBM, IRPS 2013
Aging Monitor in IBM Microprocessors
- Time-zero problem: Some time will elapse between applying voltage (burn-in, test, operation) and making the first measurement → time-zero frequency is completely unknown → incorrect time slope of 0.42
- Use fitting parameters assuming Δf = A(t-t0)^n - At^n → time slope of 0.172
Pong-Fei Lu, Keith Jenkins, IBM, IRPS 2013
Aging Monitor in IBM Microprocessors
Design considerations, examples of practical issues, and the aging sensor implementation in the IBM z196 server [3]:
- Type of Sensor
– Practical issues: BTI, HCI, TDDB, RTN, transient errors, memory bit failures, etc.
– IBM z196: Ring oscillator based BTI monitor for long-term frequency degradation measurement
- Temporal Granularity
– Practical issues: Sensing period, threshold setting, dynamic range, etc.
– IBM z196: Sampling period: once a week
- Spatial Granularity
– Practical issues: Per CPU/GPU/memory, per functional unit, per sub-block, etc.
– IBM z196: Total: 5 sensors per chip; one sensor per core (x4 cores) plus one sensor in L2 cache
- Stress and Measurement Condition
– Practical issues: AC vs. DC, accelerated vs. usage condition, fast measurement
– IBM z196: AC stress, usage condition, 0.5ms measurement time
- Communication
– Practical issues: Between data gathering sensor, across sensors, between sensors and processor
– IBM z196: Sensors are integrated with IBM z196 pervasive infrastructure with firmware support
- Interface and Protocol
– Practical issues: Interrupt based, polling, event alarms, performance counter based, etc.
– IBM z196: Interrupt based in-field frequency degradation measurement
- Testing and Calibration
– Practical issues: Similar to any other on-chip monitor circuit
– IBM z196: Time 0 frequency shift unknown since first sample is taken after some stress
Pong-Fei Lu, Keith Jenkins, IBM, IRPS 2013
Outline
- Device Reliability Issues
- Monitors and Measurements
- Effects in NTV Processors
- Summary
DVFS Systems in ISSCC 2014
22nm Intel Haswell processor
- N. Kurd, et al., ISSCC, 2014
- Latest trends: On-chip distributed VRM (fast transients,
supply noise suppression), per-core DVS, NTV/Turbo
22nm IBM POWER8 processor
- Z. Toprak-Deniz, et al., ISSCC, 2014
<1% area overhead
Frequency Fluctuation in DVFS (BTI Example)
[Figure: VDD and frequency vs. time for three cases: constant VDD, high VDD to low VDD (freq. dip), and low VDD to high VDD (freq. peak)]
- Constant VDD: Frequency degrades with stress
- High VDD to low VDD: Freq. dips due to lower VDD, followed by recovery
- Low VDD to high VDD: Freq. jumps and then degrades
[Figure: the same three VDD and frequency cases, with fCLK and the guardband marked on each frequency plot]
Modeling Approach using Superposition
- Rationale for empirical superposition method
– Complicated VDD trace can be broken down into multiple pulses
– Suitable for long- and short-term, DC and AC
– Computation is more efficient, short runtime
[Figure: VDD trace decomposed into pulses V1, V2, V3 with contributions ΔVT1, ΔVT2, ΔVT3, where ΔVT = ΔVT1 + ΔVT2 + ΔVT3; Δf (%) vs. time (a.u.), with a region annotated "Not captured in previous work"]
- C. Zhou, et al., IRPS 2014
BTI Recovery Model using Superposition
- Stress model: t^n (power law)
- Recovery model derived from superposition property: ΔVT,recovery(t) = t^n - (t - t0)^n
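A minimal sketch of how the stress and recovery expressions combine for a multi-pulse VDD trace is given below. It assumes each pulse contributes an amplitude-scaled copy of the power-law stress term minus the same term restarted at the end of the pulse. The exponent n=0.2 and the per-pulse amplitudes are hypothetical placeholders, not fitted values from the cited work.

```python
def stress_shift(t, n=0.2, a=1.0):
    """Power-law stress term a * t^n, defined as 0 before the stress starts."""
    return a * t ** n if t > 0 else 0.0


def pulse_shift(t, t_on, t_off, n=0.2, a=1.0):
    """ΔVT at time t from one stress pulse applied over [t_on, t_off].

    During the pulse this is a*(t - t_on)^n. After the pulse it becomes
    a*[(t - t_on)^n - (t - t_off)^n], the recovery form t^n - (t - t0)^n.
    """
    return stress_shift(t - t_on, n, a) - stress_shift(t - t_off, n, a)


def total_shift(t, pulses, n=0.2):
    """Superposition: sum the contributions of all (t_on, t_off, a) pulses."""
    return sum(pulse_shift(t, t_on, t_off, n, a) for t_on, t_off, a in pulses)


# Hypothetical trace: high VDD (a=1.0), then low VDD (a=0.3), then high again.
trace = [(0.0, 10.0, 1.0), (10.0, 20.0, 0.3), (20.0, 40.0, 1.0)]
for t in (5.0, 15.0, 30.0):
    print(t, round(total_shift(t, trace), 4))
```

One sanity check on this form: if every pulse has the same amplitude, the terms telescope and the sum reduces to continuous stress, a*t^n.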
Translating VT Shift to Delay Shift
- ROSC mimics logic path
- Translate ΔVT to pull-up, pull-down delay
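The slides obtain the ΔVT-to-delay mapping from HSPICE. As a rough stand-in for illustration only, the sketch below uses the alpha-power law delay model, delay ~ VDD / (VDD - VT)^alpha, which is a swapped-in approximation and loses accuracy close to threshold. The values of alpha, VT, and ΔVT are hypothetical.

```python
def alpha_power_delay(vdd, vt, alpha=1.3):
    """Relative gate delay from the alpha-power law (valid only for vdd > vt)."""
    return vdd / (vdd - vt) ** alpha


def delay_increase_pct(vdd, vt, dvt, alpha=1.3):
    """Percent delay increase caused by a threshold shift dvt at supply vdd."""
    fresh = alpha_power_delay(vdd, vt, alpha)
    aged = alpha_power_delay(vdd, vt + dvt, alpha)
    return 100.0 * (aged - fresh) / fresh


# Same hypothetical 10 mV shift evaluated at nominal and near-threshold VDD.
nominal = delay_increase_pct(vdd=1.0, vt=0.3, dvt=0.01)
near_threshold = delay_increase_pct(vdd=0.5, vt=0.3, dvt=0.01)
print(f"nominal VDD: +{nominal:.2f}%  near-threshold VDD: +{near_threshold:.2f}%")
```

In this model the same ΔVT costs more delay at low VDD, which is why the translation step matters for near-threshold operation.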
[Figure: pull-down delay and pull-up delay (normalized) vs. ΔVT (normalized) and VDD (normalized); HSPICE, TT, 60°C]
Android Development Board for Collecting DVFS Traces
- VDD and operating frequency collected in real time
- Navigating websites, running benchmark applications
Setup:
– System: Samsung Exynos 5410 SoC
– Processor: ARM Cortex A15
– Process: 28nm
– Operating system: Android v 4.2.2
– Linux kernel: v 3.4.5
– Frequency: 0.8 – 1.8 GHz
– Voltage: 0.9 – 1.25 V
– DVFS meas.: National Instr. DAQ
– Sampling frequency: 1000 samples per second
Sample Waveform and Estimated Frequency Shift
[Figure: Amazon.com trace: VDD (normalized) and frequency shift (%) vs. time (s), showing high VDD stress and low VDD recovery phases; annotated frequency shift of -0.90%]
- High VDD duration: Freq. degrades with time
- Low VDD duration: Freq. shift dips and then recovers
Applying Model to Other DVFS Traces
- Worst case frequency dip
– 3D-raytrace: Δf=1.0% at t=6s when VDD drops by 29% after staying in high VDD mode for 5.8s
[Figure: model applied to the Sina.com, Google.com, NYTimes.com, and Amazon.com traces]
Summary
- Power wall (2000) → Variability wall (2010) → Reliability wall (2020)
- Example: NTV + RDF + BTI
- Aging sensor deployed for the first time in a
commercial processor (IBM z systems)
- Per-Core DVFS with sub-microsecond ramp time
becoming a standard feature in new processors
- Turbo boost + NTV: Best of both worlds in terms of