Statistical Analysis and Optimization of Asynchronous Digital Circuits
Tsung-Te Liu and Jan M. Rabaey
University of California, Berkeley
Outline
- Motivation
- Variability model of CMOS digital circuit
- Performance model for different timing schemes
- Performance comparison
- Conclusion
Variability Continues to Increase as Technology and Voltage Scale Down
[Figure: device variability vs. technology node]
[Figure: delay spread due to process variations; histograms of count vs. normalized delay]
Delay spread: -80% to +110% at 0.3 V; -40% to +30% at 1 V
- Higher variability with finer design rules and larger wafers
- Higher variability with lower supply voltages
[Cao, ASU]
Circuit Performance Characteristics with Different Timing Schemes
[Figure: probability vs. computation delay for the original circuit, the self-timed circuit, and the conventional synchronous circuit]
- Self-timed circuit is a variation-monitoring circuit by itself
- Becomes advantageous when the variation is large (B>A)
- Statistical analysis framework is necessary
A: protocol circuit delay; B: 3σ delay variation
Statistical Analysis Framework
Circuit Variability Model
- Supply voltage
- Logic depth
- Width and length
- Body bias
Performance Model
- Computation overhead
- Communication overhead
- Delay and energy performance
[Figure: energy vs. delay, with labels Processors, Communications, and Sensors]
Determine the optimal timing strategy in the presence of variability
Outline
- Motivation
- Variability model of CMOS digital circuit
- Performance model for different timing schemes
- Performance comparison
- Conclusion
Delay Model of CMOS Digital Circuit
- One unified current model across different operating regions
- Model error <2% from 0.3V to 1V
[Figure: delay, in units of FO4 delay at VDD = 1 V, vs. supply voltage [V] for a 4-stage FO4 inverter chain; simulation data and model]
Above threshold:
$$ I \propto \frac{(V_{DD} - V_{th})^2}{1 + \dfrac{V_{DD} - V_{th}}{E_{sat}L}} $$
Subthreshold:
$$ I \propto \exp\left(\frac{V_{DD} - V_{th}}{S}\right) $$
Unified model:
$$ I \propto \frac{\left[\ln\left(1 + \exp\left(\dfrac{V_{DD} - V_{th}}{2S}\right)\right)\right]^2}{1 + \ln\left(1 + \exp\left(\dfrac{V_{DD} - V_{th}}{E_{sat}L}\right)\right)} $$
[Figure: model error [%] vs. supply voltage [V]]
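A minimal Python sketch of the unified current expression may help show how a single formula spans both operating regions. The parameter values (`vth`, `s`, `esat_l`) and the relation delay ∝ VDD / I are illustrative assumptions, not values taken from the slides.

```python
import math

def unified_current(vdd, vth=0.3, s=0.04, esat_l=1.0):
    """Relative drive current from the unified expression (arbitrary units).

    vth, s and esat_l are hypothetical values chosen only for illustration.
    """
    x = vdd - vth
    # Tends to exp(x / s) deep in subthreshold and to (x / 2s)^2 above threshold.
    numerator = math.log1p(math.exp(x / (2.0 * s))) ** 2
    # Velocity-saturation term as written in the unified expression.
    denominator = 1.0 + math.log1p(math.exp(x / esat_l))
    return numerator / denominator

def relative_delay(vdd, **params):
    """Delay up to a constant factor, assuming delay ~ VDD / I."""
    return vdd / unified_current(vdd, **params)

def normalized_delay(vdd, vref=1.0, **params):
    """Delay normalized to its value at vref (1 V by default)."""
    return relative_delay(vdd, **params) / relative_delay(vref, **params)

if __name__ == "__main__":
    for vdd in (0.3, 0.5, 0.7, 1.0):
        print(f"VDD = {vdd:.1f} V  normalized delay = {normalized_delay(vdd):.1f}")
```

With these assumed parameters the normalized delay rises steeply as the supply approaches and drops below the threshold voltage, which is the qualitative behavior the delay plot describes.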
Delay Variability Model
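Within-die variation (WID): “Local mismatch”
Die-to-die variation (DTD): “Global variation”
$$ \frac{\sigma_{T_d}}{\mu_{T_d}} = \sqrt{\left(S^{T_d}_{V_{th}}\right)^2 \left(\frac{\sigma_{V_{th}}}{\mu_{V_{th}}}\right)^2 + \left(S^{T_d}_{K}\right)^2 \left(\frac{\sigma_{K}}{\mu_{K}}\right)^2} $$
The first term is the threshold-voltage contribution and the second is the geometry contribution, with sensitivities
$$ S^{T_d}_{V_{th}} = \frac{\partial T_d / T_d}{\partial V_{th} / V_{th}}, \qquad S^{T_d}_{K} = \frac{\partial T_d / T_d}{\partial K / K} $$
[Figure: σ/μ [%] vs. supply voltage [V] for WID; simulation data, Model (WID), Model (Threshold voltage), Model (Geometry)]
[Figure: σ/μ [%] vs. supply voltage [V] for DTD; simulation data, Model (DTD), Model (Threshold voltage), Model (Geometry)]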
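As a sketch of how the sensitivity-based variability formula on this slide could be evaluated, the Python below estimates a normalized sensitivity by central difference and combines the threshold-voltage and geometry terms in quadrature. The toy delay function and every numeric value are hypothetical and serve only to exercise the formula.

```python
import math

def sensitivity(f, p0, rel_step=1e-4):
    """Normalized sensitivity (df/f) / (dp/p) at p0, by central difference."""
    dp = p0 * rel_step
    slope = (f(p0 + dp) - f(p0 - dp)) / (2.0 * dp)
    return slope * p0 / f(p0)

def delay_variability(s_vth, cv_vth, s_k, cv_k):
    """sigma/mu of delay from independent Vth and geometry (K) variations."""
    return math.hypot(s_vth * cv_vth, s_k * cv_k)

def toy_delay(vth, vdd=0.5):
    """Hypothetical delay model, used only to demonstrate sensitivity()."""
    return vdd / (vdd - vth) ** 2

if __name__ == "__main__":
    s_vth = sensitivity(toy_delay, 0.3)
    # cv_vth, s_k and cv_k below are made-up inputs.
    cv = delay_variability(s_vth, cv_vth=0.04, s_k=1.0, cv_k=0.03)
    print(f"S_Vth = {s_vth:.2f}, sigma/mu of delay = {100 * cv:.1f}%")
```

The sensitivity grows as VDD approaches Vth in this toy model, which is one way to see why the threshold-voltage term gains weight at low supply voltages.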
Delay Variability Model
[Figure: σ/μ [%] vs. supply voltage [V]; simulation data, Model (total), Model (DTD), Model (WID)]
[Figure: model error [%] vs. supply voltage [V]]
$$ \frac{\sigma_{T_d,\mathrm{total}}}{\mu_{T_d,\mathrm{total}}} = \sqrt{\left(\frac{\sigma_{T_d,\mathrm{DTD}}}{\mu_{T_d,\mathrm{DTD}}}\right)^2 + \left(\frac{\sigma_{T_d,\mathrm{WID}}}{\mu_{T_d,\mathrm{WID}}}\right)^2} $$
- Model error <8% from 0.3V to 1V
- Local mismatch dominates at low supply voltages
Delay Variability Model with Different Logic Depths
[Figure: σ/μ [%] vs. supply voltage [V]; simulation data and model for n = 4, 8, and 24]
$$ \frac{\sigma_{T_d,\mathrm{total},n}}{\mu_{T_d,\mathrm{total},n}} = \sqrt{\left(\frac{\sigma_{T_d,\mathrm{DTD},4}}{\mu_{T_d,\mathrm{DTD},4}}\right)^2 + \left(\frac{4}{n}\right) \cdot \left(\frac{\sigma_{T_d,\mathrm{WID},4}}{\mu_{T_d,\mathrm{WID},4}}\right)^2} $$
[Figure: model error [%] vs. supply voltage [V] for n = 4, 8, and 24]
- Use 4-stage inverter chain model as baseline model
- Model error <13% for n=8 and <15% for n=24
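The logic-depth scaling can be sketched in a few lines of Python. The function takes the 4-stage baseline values as inputs; the numbers in the example are hypothetical, not simulation results from the slides.

```python
import math

def total_variability(cv_dtd_4, cv_wid_4, n=4):
    """Total sigma/mu of an n-stage chain from the 4-stage baseline.

    The DTD term is independent of depth; the WID variance scales as 4/n.
    """
    return math.sqrt(cv_dtd_4 ** 2 + (4.0 / n) * cv_wid_4 ** 2)

if __name__ == "__main__":
    # Hypothetical 4-stage baseline: 10% DTD and 20% WID.
    for n in (4, 8, 24):
        cv = total_variability(0.10, 0.20, n)
        print(f"n = {n:2d}: sigma/mu = {100 * cv:.1f}%")
```

As n grows the WID term averages out and the result approaches the DTD value, so deeper logic is less exposed to local mismatch.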
Outline
- Motivation
- Variability model of CMOS digital circuit
- Performance model for different timing schemes
- Performance comparison
- Conclusion
Delay Overhead Evaluation
[Figure: probability vs. computation delay for the original circuit, dual-rail timing, and synchronous timing]
- Assumption: process variation follows a Gaussian distribution
- Dual-rail approach: has only protocol overhead, no delay overhead
- Synchronous approach: has only delay overhead
A: protocol circuit delay; B: 3σ delay variation
For 99.7% yield:
$$ D_{sync} = \frac{3\sigma_{logic,total}}{\mu_{logic,total}} $$
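A small Python sketch of the synchronous margin under the Gaussian assumption follows. The function names and the 20% variability input are hypothetical. The helper `fraction_within` reports the share of a Gaussian population lying within ±k sigma of the mean, which is about 99.7% for k = 3.

```python
from statistics import NormalDist

def sync_delay_overhead(cv_total, k_sigma=3.0):
    """Synchronous delay overhead D_sync = k * sigma / mu."""
    return k_sigma * cv_total

def fraction_within(k_sigma=3.0):
    """Fraction of a Gaussian population within +/- k sigma of the mean."""
    cdf = NormalDist().cdf
    return cdf(k_sigma) - cdf(-k_sigma)

if __name__ == "__main__":
    cv_total = 0.20  # hypothetical total sigma/mu of the logic delay
    print(f"D_sync = {100 * sync_delay_overhead(cv_total):.0f}%")
    print(f"fraction within 3 sigma = {100 * fraction_within(3.0):.2f}%")
```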
Bundled-Data Self-Timed Approach
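[Figure: probability vs. computation delay for the main data path and the replica delay line]
Main data path:
$$ f_{logic}(t) = N\left(\mu_{logic}, \sigma_{logic}^2\right) $$
Replica delay line:
$$ f_{delay\text{-}line}(t) = N\left(\mu_{delay\text{-}line}, \sigma_{delay\text{-}line}^2\right) $$
Goal:
$$ P\left(t_{logic} \le t_{delay\text{-}line}\right) \approx 1 $$
Delay overhead:
$$ D_{bundled\text{-}data} = \frac{\mu_{delay\text{-}line} - \mu_{logic}}{\mu_{logic}} $$
Assume main data path and replica delay line exhibit similar statistics. For 99.7% yield:
$$ D_{bundled\text{-}data} = D_{variation}^2 \cdot \left(0.5 + \sqrt{0.25 + \frac{2}{D_{variation}^2}}\right) $$
where
$$ D_{variation} = \frac{3\sigma_{logic,WID}}{\mu_{logic,WID}} $$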
Bundled-Data Delay Overhead
[Figure: delay overhead [%] vs. process variability [%], with O(n) and O(n²) regions marked]
$$ D_{bundled\text{-}data} \approx \begin{cases} \sqrt{2} \cdot D_{variation}, & \text{when } D_{variation} \to 0 \\ D_{variation}^2, & \text{when } D_{variation} \to \infty \end{cases} $$
- Delay overhead becomes much larger as process variability increases!
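The closed-form bundled-data overhead and its two limits can be checked numerically with a short Python sketch. Algebraically, the closed form is the positive root of D² - Dv²·D - 2·Dv² = 0, where Dv stands for D_variation. The function names here are hypothetical.

```python
import math

def bundled_data_overhead(d_var):
    """Bundled-data delay overhead for a given D_variation (both as fractions)."""
    if d_var <= 0.0:
        return 0.0
    return d_var ** 2 * (0.5 + math.sqrt(0.25 + 2.0 / d_var ** 2))

def quadratic_residual(d_var):
    """Residual of D^2 - Dv^2 * D - 2 * Dv^2 at the closed-form overhead."""
    d = bundled_data_overhead(d_var)
    return d ** 2 - d_var ** 2 * d - 2.0 * d_var ** 2

if __name__ == "__main__":
    for d_var in (0.01, 0.5, 1.0, 2.0, 100.0):
        d = bundled_data_overhead(d_var)
        print(f"Dv = {d_var:7.2f}  D = {d:10.4f}  "
              f"D/(sqrt(2)*Dv) = {d / (math.sqrt(2) * d_var):8.3f}  "
              f"D/Dv^2 = {d / d_var ** 2:8.3f}")
```

For small Dv the overhead tracks √2·Dv (the linear region), and for large Dv it tracks Dv² (the quadratic region).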
Performance Model under Variations
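Original delay and energy model:
$$ T_{comp} = T_{delay} $$
$$ E_{dynamic} = \alpha C_{switch} V^2 $$
$$ E_{leakage} = V I_{leakage} T_{delay} $$
$$ E_{total} = \alpha C_{switch} V^2 + V I_{leakage} T_{delay} $$
Statistical delay and energy model:
$$ T_{comp} = T_{delay} (1 + P + D) $$
$$ E_{dynamic} = \alpha C_{switch} (1 + P) V^2 $$
$$ E_{leakage} = V I_{leakage} (1 + P) T_{delay} (1 + P + D) $$
$$ E_{total} = \alpha C_{switch} (1 + P) V^2 + V I_{leakage} (1 + P) T_{delay} (1 + P + D) $$
Timing scheme | Delay Overhead (D) | Protocol Overhead (P)
Synchronous | D_sync | none
Bundled-Data | D_bundled-data | P_bundled-data
Dual-Rail | none | P_dual-rail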
- Evaluate computation delay and energy under variations
- Overhead changes with supply voltage and logic depth
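The statistical delay and energy model can be sketched as a small Python function. The `Circuit` fields, the scheme overheads, and all numeric values below are hypothetical placeholders; setting P = D = 0 recovers the original model.

```python
from dataclasses import dataclass

@dataclass
class Circuit:
    alpha: float     # switching activity factor
    c_switch: float  # switched capacitance [F]
    vdd: float       # supply voltage [V]
    i_leak: float    # leakage current [A]
    t_delay: float   # nominal computation delay [s]

def performance(c, p=0.0, d=0.0):
    """Return (T_comp, E_total) for protocol overhead p and delay overhead d."""
    t_comp = c.t_delay * (1.0 + p + d)
    e_dynamic = c.alpha * c.c_switch * (1.0 + p) * c.vdd ** 2
    e_leakage = c.vdd * c.i_leak * (1.0 + p) * t_comp
    return t_comp, e_dynamic + e_leakage

if __name__ == "__main__":
    # Made-up operating point and overheads, for illustration only.
    circuit = Circuit(alpha=0.1, c_switch=100e-15, vdd=0.5,
                      i_leak=1e-6, t_delay=10e-9)
    schemes = {
        "synchronous":  {"p": 0.00, "d": 0.60},
        "bundled-data": {"p": 0.05, "d": 0.30},
        "dual-rail":    {"p": 0.10, "d": 0.00},
    }
    for name, overhead in schemes.items():
        t, e = performance(circuit, **overhead)
        print(f"{name:12s}  T_comp = {t * 1e9:6.2f} ns  E_total = {e * 1e15:6.2f} fJ")
```

Because leakage energy is multiplied by the computation time, a delay overhead raises energy as well as latency, and the effect is strongest when leakage is a large share of the total.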
Outline
- Motivation
- Variability model of CMOS digital circuit
- Performance model for different timing schemes
- Performance comparison
- Conclusion
Delay Overhead Comparison
[Figure: delay overhead [%] vs. supply voltage [V] for a 4-stage and a 24-stage FO4 inverter chain; synchronous timing and bundled-data self-timing]
- Global variation affects only synchronous approach
- Local mismatch dominates at low supply voltages
- Local mismatch has less impact on longer critical path
Speed Performance Comparison
[Figure: normalized delay vs. supply voltage [V] for a 4-stage and a 24-stage FO4 inverter chain; dual-rail self-timing and bundled-data self-timing]
- Assumption: P_bundled-data = 1 T_FO4; P_dual-rail = 2 T_FO4
- Synchronous scheme is better for small critical path at high supply voltages
- Dual-rail scheme is better for large critical path at low supply voltages
Energy Performance Comparison
[Figure: energy [fJ] vs. supply voltage [V] for a 24-stage FO4 inverter chain, with panels for α = 0.1 and α = 0.01; synchronous timing, dual-rail self-timing, and bundled-data self-timing]
- Synchronous scheme is better for high activity at high supply voltages
- Dual-rail scheme is better for low activity at low supply voltages
- Leakage dominates for low activity at low supply voltages
Conclusion
- A statistical analysis framework is proposed to evaluate the performance of CMOS digital circuits in the presence of process variations.
- Designers can efficiently determine the optimal timing strategy, pipeline depth, and supply voltage based on the proposed variability and statistical performance models.
- Asynchronous design exhibits better energy and delay characteristics for circuits with low activity and larger critical path delay under process variations.
Acknowledgement
- Berkeley Wireless Research Center
- NSF Infrastructure Grant
- STMicroelectronics
- Multiscale System Center