SLIDE 1


Statistical Analysis and Optimization of Asynchronous Digital Circuits

Tsung-Te Liu and Jan M. Rabaey University of California, Berkeley

SLIDE 2

Outline

  • Motivation
  • Variability model of CMOS digital circuit
  • Performance model for different timing schemes
  • Performance comparison
  • Conclusion

SLIDE 3

Variability Continues to Increase as Technology and Voltage Scale Down

[Figure. Left: device variability vs. technology node. Right: delay spread due to process variations (count vs. normalized delay): -80% to +110% at 0.3 V; -40% to +30% at 1 V.]

  • Higher variability with finer design rules and larger wafers
  • Higher variability with lower supply voltages

[Cao, ASU]

SLIDE 4

Circuit Performance Characteristics with Different Timing Schemes

[Figure: probability vs. computation delay for the original circuit, a self-timed circuit, and a conventional synchronous circuit. A: protocol circuit delay; B: 3σ delay variation.]

  • A self-timed circuit is a variation-monitoring circuit by itself
  • Self-timing becomes advantageous when the variation is large (B > A)
  • A statistical analysis framework is needed to determine when this is the case

SLIDE 5

Statistical Analysis Framework


Circuit Variability Model

  • Supply voltage
  • Logic depth
  • Width and length
  • Body bias

Performance Model

  • Computation overhead
  • Communication overhead
  • Delay and energy

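[Diagram: the circuit variability model feeds the performance model, which yields delay and energy; target applications: processors, communications, sensors.]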

Determine the optimal timing strategy in the presence of variability

SLIDE 6

Outline

  • Motivation
  • Variability model of CMOS digital circuit
  • Performance model for different timing schemes
  • Performance comparison
  • Conclusion

SLIDE 7

Delay Model of CMOS Digital Circuit

  • One unified current model across different operating regions
  • Model error <2% from 0.3V to 1V

Strong-inversion (velocity-saturated) and subthreshold current:

$$ I \propto \frac{(V_{DD}-V_{th})^2}{1+\dfrac{V_{DD}-V_{th}}{E_{sat}L}}, \qquad I \propto \exp\left(\frac{V_{DD}-V_{th}}{S}\right) $$

Unified model:

$$ I \propto \frac{\left[\ln\left(1+\exp\left(\dfrac{V_{DD}-V_{th}}{2S}\right)\right)\right]^2}{1+\ln\left(1+\exp\left(\dfrac{V_{DD}-V_{th}}{E_{sat}L}\right)\right)} $$

[Figure (4-stage FO4 INV chain). Left: delay in FO4 delays at VDD = 1V (log scale) vs. supply voltage [V], simulation data and model. Right: model error [%] vs. supply voltage [V].]
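To make the limiting behavior of the unified model concrete, the sketch below evaluates it numerically. It is a minimal sketch, not the authors' implementation: the proportionality constant is set to 1, the parameter values (V_th = 0.4 V, S = 0.04 V, E_sat·L = 1 V) are hypothetical, and `relative_delay` assumes T_d ∝ V_DD/I with a fixed load, which the slide does not state.

```python
import math

def unified_current(vdd, vth, s, esat_l):
    """Relative drain current from the unified model (proportionality constant = 1).

    vdd, vth : supply and threshold voltage [V]
    s        : subthreshold parameter S [V]
    esat_l   : velocity-saturation term Esat*L [V]
    """
    x = vdd - vth
    # ln(1 + e^u) behaves like e^u for u << 0 and like u for u >> 0, so the
    # numerator blends exp((VDD-Vth)/S) with ((VDD-Vth)/2S)^2.
    num = math.log1p(math.exp(x / (2 * s))) ** 2
    den = 1 + math.log1p(math.exp(x / esat_l))
    return num / den

def relative_delay(vdd, vth, s, esat_l):
    # Hypothetical assumption: delay ~ C * VDD / I with a fixed load C (set to 1).
    return vdd / unified_current(vdd, vth, s, esat_l)

# Hypothetical device parameters, for illustration only.
VTH, S, ESAT_L = 0.4, 0.04, 1.0

# Below threshold, a 100 mV step should scale the current by roughly exp(0.1 / S).
ratio = unified_current(0.2, VTH, S, ESAT_L) / unified_current(0.1, VTH, S, ESAT_L)
print(f"current ratio {ratio:.2f} vs exp(0.1/S) = {math.exp(0.1 / S):.2f}")

for v in (0.3, 0.5, 1.0):
    print(f"VDD={v:.1f} V: relative delay {relative_delay(v, VTH, S, ESAT_L):.4g}")
```

The softplus form is what lets a single expression cover subthreshold and strong inversion without a piecewise switch at V_th.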

SLIDE 8

Delay Variability Model

Within-die variation (WID), “Local mismatch”; die-to-die variation (DTD), “Global variation”

$$ \frac{\sigma_{T_d}}{\mu_{T_d}} = \sqrt{\left(S_{T_d}^{V_{th}}\right)^2 \left(\frac{\sigma_{V_{th}}}{\mu_{V_{th}}}\right)^2 + \left(S_{T_d}^{K}\right)^2 \left(\frac{\sigma_{K}}{\mu_{K}}\right)^2} $$

with the threshold-voltage and geometry sensitivities

$$ S_{T_d}^{V_{th}} = \frac{\partial T_d / T_d}{\partial V_{th} / V_{th}}, \qquad S_{T_d}^{K} = \frac{\partial T_d / T_d}{\partial K / K} $$

[Figure: σ/μ [%] vs. supply voltage [V] for WID variation (left) and DTD variation (right); simulation data compared with the model and its threshold-voltage and geometry components.]

SLIDE 9

Delay Variability Model

[Figure. Left: σ/μ [%] vs. supply voltage [V], simulation data compared with the total, DTD, and WID models. Right: model error [%] vs. supply voltage [V].]

$$ \frac{\sigma_{T_d,\text{total}}}{\mu_{T_d,\text{total}}} = \sqrt{\left(\frac{\sigma_{T_d,\text{DTD}}}{\mu_{T_d,\text{DTD}}}\right)^2 + \left(\frac{\sigma_{T_d,\text{WID}}}{\mu_{T_d,\text{WID}}}\right)^2} $$

  • Model error <8% from 0.3V to 1V
  • Local mismatch dominates at low supply voltages
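The two variability slides combine first-order normalized sensitivities with root-sum-square addition of independent components. The sketch below shows that arithmetic. The toy delay function T_d = exp(V_th/S)/K (a pure subthreshold form with a hypothetical geometry factor K) and all numeric values are illustrative assumptions, not the authors' fitted model.

```python
import math

def normalized_sensitivity(delay, x, rel_step=1e-4):
    """S = (dTd/Td) / (dx/x), estimated with a central difference around x."""
    h = x * rel_step
    return (delay(x + h) - delay(x - h)) / (2 * h) * x / delay(x)

def delay_cv(s_vth, cv_vth, s_k, cv_k):
    """sigma_Td/mu_Td from Vth and geometry (K) variation, assumed independent."""
    return math.hypot(s_vth * cv_vth, s_k * cv_k)

def total_cv(cv_dtd, cv_wid):
    """Independent die-to-die and within-die components add in quadrature."""
    return math.hypot(cv_dtd, cv_wid)

# Hypothetical toy model: subthreshold delay Td = exp(Vth / S) / K.
S, VTH, K = 0.04, 0.4, 1.0
s_vth = normalized_sensitivity(lambda v: math.exp(v / S) / K, VTH)  # analytically Vth/S = 10
s_k = normalized_sensitivity(lambda k: math.exp(VTH / S) / k, K)    # analytically -1
cv_wid = delay_cv(s_vth, 0.05, s_k, 0.02)  # hypothetical sigma/mu of Vth and K
print(f"S_Vth={s_vth:.3f}, S_K={s_k:.3f}, WID sigma/mu={cv_wid:.3f}")
print(f"total sigma/mu with a hypothetical DTD term of 0.1: {total_cv(0.1, cv_wid):.3f}")
```

A large sensitivity (here V_th/S) is why threshold-voltage variation is amplified so strongly once the supply approaches the subthreshold region.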
SLIDE 10

Delay Variability Model with Different Logic Depths

$$ \frac{\sigma_{T_d,\text{total},n}}{\mu_{T_d,\text{total},n}} = \sqrt{\left(\frac{\sigma_{T_d,\text{DTD},4}}{\mu_{T_d,\text{DTD},4}}\right)^2 + \frac{4}{n}\left(\frac{\sigma_{T_d,\text{WID},4}}{\mu_{T_d,\text{WID},4}}\right)^2} $$

[Figure. Left: σ/μ [%] vs. supply voltage [V], simulation data and model for n = 4, 8, and 24. Right: model error [%] vs. supply voltage [V] for n = 4, 8, and 24.]

  • The 4-stage inverter chain model is used as the baseline
  • Global (DTD) variation does not depend on logic depth, while independent local (WID) mismatch partially averages out along the path, so its variance term scales as 4/n
  • Model error <13% for n=8 and <15% for n=24
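The 4/n factor follows if per-stage local mismatch is independent: summing n independent stage delays grows the mean as n but the standard deviation only as √n. The sketch below checks this with a Monte Carlo run using Python's standard `random` and `statistics` modules (the per-stage σ/μ of 0.3 and the baseline values are hypothetical) and evaluates the depth-scaling formula.

```python
import math
import random
import statistics

def total_cv(n, cv_dtd_4, cv_wid_4):
    """sigma/mu of an n-stage path from the 4-stage baseline values."""
    # DTD shifts all stages together and does not average out; independent
    # WID mismatch does, so its squared term scales as 4/n.
    return math.sqrt(cv_dtd_4 ** 2 + (4 / n) * cv_wid_4 ** 2)

def mc_wid_cv(n, stage_cv, trials=10000, seed=1):
    """Monte Carlo sigma/mu of an n-stage chain with independent per-stage mismatch."""
    rng = random.Random(seed)
    totals = [sum(rng.gauss(1.0, stage_cv) for _ in range(n)) for _ in range(trials)]
    return statistics.stdev(totals) / statistics.fmean(totals)

cv4, cv24 = mc_wid_cv(4, 0.3), mc_wid_cv(24, 0.3)
print(f"(cv24/cv4)^2 = {(cv24 / cv4) ** 2:.3f}, expected 4/24 = {4 / 24:.3f}")
for n in (4, 8, 24):
    print(f"n={n:2d}: total sigma/mu = {total_cv(n, 0.05, 0.20):.3f}")
```

As n grows, the total σ/μ approaches the DTD term alone, which is the reason longer critical paths are less sensitive to local mismatch.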
SLIDE 11

Outline

  • Motivation
  • Variability model of CMOS digital circuit
  • Performance model for different timing schemes
  • Performance comparison
  • Conclusion

SLIDE 12

Delay Overhead Evaluation

[Figure: probability vs. computation delay for the original circuit, dual-rail timing, and synchronous timing. A: protocol circuit delay; B: 3σ delay variation.]

  • Assumption: process variation follows a Gaussian distribution
  • Dual-rail approach: only protocol overhead, no delay overhead
  • Synchronous approach: only delay overhead

For 99.7% yield:

$$ D_{\text{sync}} = \frac{3\sigma_{\text{logic,total}}}{\mu_{\text{logic,total}}} $$
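A minimal sketch of the synchronous overhead: the clock must cover μ + 3σ of the total (DTD + WID) path delay, so the overhead relative to the mean is 3σ/μ. The σ/μ value below is hypothetical. The 99.7% figure is the two-sided ±3σ coverage of a Gaussian; because only slow dies fail timing, the one-sided fraction Φ(3), about 99.87%, is what a 3σ clock margin guarantees.

```python
import math

def d_sync(cv_total, k_sigma=3.0):
    """Delay overhead of synchronous timing: a k-sigma margin on the total path delay."""
    return k_sigma * cv_total

def timing_yield(k_sigma=3.0):
    """One-sided Gaussian fraction of dies with delay below mu + k_sigma * sigma."""
    return 0.5 * (1 + math.erf(k_sigma / math.sqrt(2)))

cv_total = 0.2  # hypothetical sigma/mu of the total path delay
print(f"D_sync = {d_sync(cv_total):.2f}, clock period = {1 + d_sync(cv_total):.2f} x mean delay")
print(f"one-sided yield at 3 sigma = {timing_yield():.5f}")
```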

SLIDE 13

Bundled-Data Self-Timed Approach

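[Figure: main data path with a replica delay line, and probability vs. computation delay for the main data path and the replica delay line.]

Assume the main data path and the replica delay line exhibit similar statistics:

$$ f_{\text{logic}}(t) = N\left(\mu_{\text{logic}}, \sigma_{\text{logic}}^2\right), \qquad f_{\text{delay-line}}(t) = N\left(\mu_{\text{delay-line}}, \sigma_{\text{delay-line}}^2\right) $$

Goal:

$$ P\left(t_{\text{logic}} \le t_{\text{delay-line}}\right) \approx 1 $$

where the delay overhead is

$$ D_{\text{bundled-data}} = \frac{\mu_{\text{delay-line}} - \mu_{\text{logic}}}{\mu_{\text{logic}}} $$

For 99.7% yield:

$$ D_{\text{bundled-data}} = D_{\text{variation}}^2\left(0.5 + \sqrt{0.25 + \frac{2}{D_{\text{variation}}^2}}\right), \qquad D_{\text{variation}} = \frac{3\sigma_{\text{logic,WID}}}{\mu_{\text{logic,WID}}} $$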
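The slide gives the closed form without its derivation. One set of assumptions that reproduces it exactly (our reconstruction, not stated on the slide): t_logic ~ N(μ, σ²) with σ/μ from local mismatch only; the replica delay line is an independent Gaussian built from the same kind of stages, so its variance grows in proportion to its mean delay, σ_dl² = σ²(1 + D); and the margin is three standard deviations of t_dl − t_logic. The condition Dμ = 3σ√(2 + D) gives D² − D_var²·D − 2D_var² = 0, whose positive root is the slide's D_bundled-data formula. The sketch checks the closed form against that condition and by Monte Carlo; σ/μ = 0.1 is a hypothetical value.

```python
import math
import random

def d_bundled(d_var):
    """Closed-form bundled-data delay overhead for D_variation = 3*sigma/mu (WID)."""
    return d_var ** 2 * (0.5 + math.sqrt(0.25 + 2 / d_var ** 2))

def mc_yield(cv_wid, trials=100000, seed=7):
    """Fraction of trials in which the replica delay line is not faster than the logic.

    Hypothetical model: independent Gaussians, with the delay-line variance
    proportional to its mean delay (same per-stage mismatch as the logic path).
    """
    rng = random.Random(seed)
    mu_logic, sigma_logic = 1.0, cv_wid
    mu_line = mu_logic * (1 + d_bundled(3 * cv_wid))
    sigma_line = sigma_logic * math.sqrt(mu_line / mu_logic)
    ok = sum(rng.gauss(mu_logic, sigma_logic) <= rng.gauss(mu_line, sigma_line)
             for _ in range(trials))
    return ok / trials

d = d_bundled(0.3)  # D_variation = 3 * 0.1
print(f"D_bundled = {d:.4f}, check D_var*sqrt(2+D) = {0.3 * math.sqrt(2 + d):.4f}")
print(f"Monte Carlo yield = {mc_yield(0.1):.4f}")
```

Under these assumptions the margin is exactly 3σ of the difference, so the simulated yield should sit near Φ(3) ≈ 0.9987.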

SLIDE 14

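Bundled-Data Delay Overhead

$$ D_{\text{bundled-data}} \approx \begin{cases} \sqrt{2}\,D_{\text{variation}}, & D_{\text{variation}} \to 0 \\ D_{\text{variation}}^2, & D_{\text{variation}} \to \infty \end{cases} $$

[Figure: delay overhead [%] vs. process variability [%], growing as O(D_variation) for small variability and as O(D_variation²) for large variability.]

  • Delay overhead grows much faster than process variability: linearly at first, then quadratically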
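Both limits follow directly from the closed form for D_bundled-data, writing x = D_variation:

$$
\begin{aligned}
x \to 0:&\quad x^2\left(0.5 + \sqrt{0.25 + \frac{2}{x^2}}\right) \approx x^2\sqrt{\frac{2}{x^2}} = \sqrt{2}\,x \\
x \to \infty:&\quad \sqrt{0.25 + \frac{2}{x^2}} \to 0.5 \;\Rightarrow\; D_{\text{bundled-data}} \approx x^2(0.5 + 0.5) = x^2
\end{aligned}
$$

The √2 factor at small variability is consistent with the margin having to cover the spread of two independent paths, the logic and its replica.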

SLIDE 15

Performance Model under Variations

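Original delay and energy model:

$$
\begin{aligned}
T_{\text{comp}} &= T_{\text{delay}} \\
E_{\text{dynamic}} &= \alpha C_{\text{switch}} V^2 \\
E_{\text{leakage}} &= V I_{\text{leakage}} T_{\text{delay}} \\
E_{\text{total}} &= \alpha C_{\text{switch}} V^2 + V I_{\text{leakage}} T_{\text{delay}}
\end{aligned}
$$

Statistical delay and energy model:

$$
\begin{aligned}
T_{\text{comp}} &= T_{\text{delay}}(1+P+D) \\
E_{\text{dynamic}} &= \alpha C_{\text{switch}} (1+P) V^2 \\
E_{\text{leakage}} &= V I_{\text{leakage}} (1+P)\, T_{\text{delay}} (1+P+D) \\
E_{\text{total}} &= \alpha C_{\text{switch}} (1+P) V^2 + V I_{\text{leakage}} (1+P)\, T_{\text{delay}} (1+P+D)
\end{aligned}
$$

| Timing scheme | Delay overhead (D) | Protocol overhead (P) |
|---|---|---|
| Synchronous | D_sync | 0 |
| Bundled-data | D_bundled-data | P_bundled-data |
| Dual-rail | 0 | P_dual-rail |

  • Evaluate computation delay and energy under variations
  • Overheads change with supply voltage and logic depth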
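A minimal sketch of the statistical delay and energy model as plain functions, assuming SI units and hypothetical circuit values; the `Circuit` class and `statistical_model` function are illustrative names, not from the source. Setting P = D = 0 recovers the original model, and each timing scheme plugs in its (D, P) pair from the table.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class Circuit:
    """Hypothetical circuit parameters (SI units)."""
    alpha: float     # switching activity factor
    c_switch: float  # switched capacitance [F]
    vdd: float       # supply voltage [V]
    i_leak: float    # leakage current [A]
    t_delay: float   # variation-free critical-path delay [s]

def statistical_model(c: Circuit, p: float, d: float) -> tuple[float, float]:
    """Return (T_comp [s], E_total [J]) for protocol overhead p and delay overhead d."""
    t_comp = c.t_delay * (1 + p + d)
    e_dynamic = c.alpha * c.c_switch * (1 + p) * c.vdd ** 2
    # (1 + p) scales leakage with the added protocol circuitry (our reading),
    # integrated over the full, longer computation time T_comp.
    e_leakage = c.vdd * c.i_leak * (1 + p) * t_comp
    return t_comp, e_dynamic + e_leakage

circuit = Circuit(alpha=0.1, c_switch=1e-15, vdd=0.5, i_leak=1e-9, t_delay=1e-9)
# Hypothetical (P, D) pairs for two schemes, for illustration only.
for name, p, d in [("synchronous", 0.0, 0.6), ("dual-rail", 0.5, 0.0)]:
    t, e = statistical_model(circuit, p, d)
    print(f"{name:12s} T_comp = {t * 1e9:.2f} ns, E_total = {e * 1e15:.4f} fJ")
```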
SLIDE 16

Outline

  • Motivation
  • Variability model of CMOS digital circuit
  • Performance model for different timing schemes
  • Performance comparison
  • Conclusion

SLIDE 17

Delay Overhead Comparison

[Figure: delay overhead [%] vs. supply voltage [V] for synchronous timing and bundled-data self-timing; left: 4-stage FO4 INV chain, right: 24-stage FO4 INV chain.]

  • Global variation affects only the synchronous approach, since the bundled-data overhead depends only on local (WID) mismatch
  • Local mismatch dominates at low supply voltages
  • Local mismatch has less impact on longer critical paths
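Chaining the earlier formulas shows why global variation penalizes only synchronous timing: D_sync uses the total σ/μ, while D_bundled-data depends only on the WID term. This sketch reuses the slides' expressions (depth scaling from the 4-stage baseline, root-sum-square combination, 3σ synchronous margin, bundled-data closed form); the baseline σ/μ values are hypothetical, not read from the plots.

```python
import math

def d_bundled(d_var):
    """Closed-form bundled-data overhead for D_variation = 3*sigma/mu (WID)."""
    return d_var ** 2 * (0.5 + math.sqrt(0.25 + 2 / d_var ** 2))

def overheads(n, cv_dtd_4, cv_wid_4):
    """Return (D_sync, D_bundled) for an n-stage chain from 4-stage sigma/mu values."""
    cv_wid = cv_wid_4 * math.sqrt(4 / n)       # WID averages out with depth
    d_sync = 3 * math.hypot(cv_dtd_4, cv_wid)  # 3-sigma margin on total variation
    return d_sync, d_bundled(3 * cv_wid)       # bundled data sees WID only

for n in (4, 24):
    for cv_dtd in (0.0, 0.1):                  # hypothetical global variation
        ds, db = overheads(n, cv_dtd, cv_wid_4=0.1)
        print(f"n={n:2d} DTD={cv_dtd:.1f}: D_sync={ds:.3f}, D_bundled={db:.3f}")
```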

SLIDE 18

Speed Performance Comparison

[Figure: normalized delay vs. supply voltage [V] for dual-rail self-timing and bundled-data self-timing; left: 4-stage FO4 INV chain, right: 24-stage FO4 INV chain.]

  • Assumption: P_bundled-data = 1 T_FO4; P_dual-rail = 2 T_FO4
  • The synchronous scheme is better for short critical paths at high supply voltages
  • The dual-rail scheme is better for long critical paths at low supply voltages
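A rough sketch of how such a speed comparison can be computed from the statistical model with T_comp ∝ 1 + P + D. Assumptions not stated on the slides: the n-stage FO4 chain has T_delay ≈ n·T_FO4, so P_bundled-data = 1/n and P_dual-rail = 2/n in units of T_delay; "normalized delay" is taken as T_comp divided by the synchronous T_comp; the σ/μ inputs are hypothetical.

```python
import math

def d_bundled(d_var):
    """Closed-form bundled-data overhead for D_variation = 3*sigma/mu (WID)."""
    return d_var ** 2 * (0.5 + math.sqrt(0.25 + 2 / d_var ** 2))

def normalized_delays(n, cv_dtd_4, cv_wid_4):
    """(dual-rail, bundled-data) T_comp relative to synchronous, for an n-stage chain."""
    cv_wid = cv_wid_4 * math.sqrt(4 / n)
    t_sync = 1 + 3 * math.hypot(cv_dtd_4, cv_wid)   # P = 0, D = D_sync
    t_dual = 1 + 2 / n                              # P = 2 T_FO4, D = 0
    t_bundled = 1 + 1 / n + d_bundled(3 * cv_wid)   # P = 1 T_FO4, D = D_bundled
    return t_dual / t_sync, t_bundled / t_sync

# Hypothetical corners: small variability (high VDD) vs. large variability (low VDD).
print(normalized_delays(4, 0.01, 0.02))
print(normalized_delays(24, 0.10, 0.20))
```

With these inputs, dual-rail is slower than synchronous on the short chain with small variability and faster on the long chain with large variability, in line with the bullets; the actual crossover depends on the fitted σ/μ curves.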

SLIDE 19

Energy Performance Comparison

[Figure (24-stage FO4 INV chain): energy [fJ] vs. supply voltage [V] for synchronous timing, dual-rail self-timing, and bundled-data self-timing, at activity factor α = 0.1 (left) and α = 0.01 (right, energy-delay plot).]

  • The synchronous scheme is better for high activity at high supply voltages
  • The dual-rail scheme is better for low activity at low supply voltages
  • Leakage dominates for low activity at low supply voltages
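The last bullet can be seen from the statistical energy model: leakage energy scales with T_comp = T_delay(1 + P + D), so when the activity α is low it dominates E_total, and any delay overhead D costs energy directly. A minimal sketch with hypothetical operating-point values:

```python
def leakage_fraction(alpha, c_switch, vdd, i_leak, t_delay, p=0.0, d=0.0):
    """Leakage share of E_total in the statistical energy model (SI units)."""
    e_dynamic = alpha * c_switch * (1 + p) * vdd ** 2
    e_leakage = vdd * i_leak * (1 + p) * t_delay * (1 + p + d)
    return e_leakage / (e_dynamic + e_leakage)

# Hypothetical low-voltage operating point: 10 fF switched, 5 nA leakage, 50 ns delay.
params = dict(c_switch=10e-15, vdd=0.3, i_leak=5e-9, t_delay=50e-9)
for alpha in (0.1, 0.01):
    print(f"alpha={alpha}: leakage share {leakage_fraction(alpha, **params):.2f}")
```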
SLIDE 20

Conclusion

  • A statistical analysis framework is proposed to evaluate the performance of CMOS digital circuits in the presence of process variations.
  • Designers can efficiently determine the optimal timing strategy, pipeline depth, and supply voltage based on the proposed variability and statistical performance models.
  • Under process variations, asynchronous design exhibits better energy and delay characteristics for circuits with low activity and longer critical-path delays.

SLIDE 21


Acknowledgement

  • Berkeley Wireless Research Center
  • NSF Infrastructure Grant
  • STMicroelectronics
  • Multiscale System Center

Thank you!