VARIUS: A Model of Process Variation and Resulting Timing Errors for Microarchitects
Smruti R. Sarangi, Brian Greskamp, Radu Teodorescu, Jun Nakano, Abhishek Tiwari and Josep Torrellas
University of Illinois
Parameter Variation
- P: Process
– Threshold voltage: Vt
– Effective gate length: Leff
- V: Supply voltage
- T: Temperature
Process Variation is a Problem
- Variation of Vt and Leff affects:
– Chip leakage power
– Chip frequency
Chip Frequency Decreases
[Figure: distribution of path delays in a pipe stage (number of logic paths vs. delay). Panel "No variation": Tnom is marked. Panel "With variation": Tnom and Tvar are marked, and the region beyond Tnom is labeled "Timing errors".]
Implications on Design Decisions
- It is unlikely that chips will be designed for worst-case parameter values
– Chips too slow or too costly to design
– Performance of a generation lost
- Alternative: design closer to average parameter values
– Some parts of the chip will be too slow: can we live with timing errors?
– Some parts of the chip will dissipate too much power: can we push it to other parts of the chip?
– Multi-tiered solution required: circuits, CAD, micro-architecture, software; this talk focuses on μarch.
Variation components
- Die-to-die
- Within-die
– Systematic (spatial correlation)
– Random
Modeling Process Variation
[Figure: Process Variation (Not to Scale)]
- Systematic variation
– Lens aberrations
– Mask deformities
– Thickness variation in CMP
– Photo-lithographic effects
- Random variation
– Variable dopant density
– Line edge roughness
Systematic Variation
- We divide the chip into a grid of points
- Multivariate normal distribution (μsys, σsys)
- Each point has one random value of ΔPsys
- Characterized by a correlation function ρ(r), where r is the distance between two points Px and Py
- Correlation is position independent and isotropic
- For ρ(r) we choose the spherical model
- Random: modeled analytically at transistor granularity
Spherical Model
[Figure: two points Px and Py separated by distance r. Two cases are shown, one labeled "Stronger correlation" and one labeled "Weaker correlation".]
- Matches measured data [Friedberg et al. 05]
Modeling Systematic Variation
- Break the chip into a million cells (1000 × 1000 grid)
- Multivariate normal distribution with spherical spatial correlation (μ, σ, φ)
[Figure: example variation map]
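The sampling step on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the VARIUS implementation: the closed form used for ρ(r) is the standard geostatistics spherical function (the slides name the model but do not print the formula), φ is taken to be the distance at which correlation reaches zero, expressed as a fraction of the chip width, and the function names `spherical_rho`, `cholesky` and `variation_map` are hypothetical. The grid is kept tiny because a dense Cholesky factorization would not scale to a million cells.

```python
import math
import random


def spherical_rho(r, phi):
    """Spherical correlation (standard form): 1 at r = 0, 0 for r >= phi."""
    if r >= phi:
        return 0.0
    x = r / phi
    return 1.0 - 1.5 * x + 0.5 * x ** 3


def cholesky(a):
    """Return lower-triangular L with L * L^T = a (a symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def variation_map(n, mu, sigma, phi, rng):
    """Sample one n x n map (n >= 2) of the systematic component on a unit-square chip."""
    pts = [(ix / (n - 1), iy / (n - 1)) for iy in range(n) for ix in range(n)]
    # Correlation depends only on distance: position independent and isotropic.
    corr = [[spherical_rho(math.dist(p, q), phi) for q in pts] for p in pts]
    low = cholesky(corr)
    z = [rng.gauss(0.0, 1.0) for _ in pts]  # independent standard normals
    vals = [mu + sigma * sum(low[i][k] * z[k] for k in range(i + 1))
            for i in range(len(pts))]
    return [vals[row * n:(row + 1) * n] for row in range(n)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
vmap = variation_map(n=6, mu=0.0, sigma=1.0, phi=0.5, rng=rng)
for row in vmap:
    print(" ".join(f"{v:+.2f}" for v in row))
```

Multiplying independent standard normals by the Cholesky factor of the correlation matrix is one common way to draw from a multivariate normal; the slides do not say which sampling method the authors used.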
Paths in a Pipeline Stage
[Figure: four panels. (1) Probability density function pdf(t) of path delay, with Tnom and Tvar marked and a region labeled "Timing errors". (2) Cumulative distribution function cdf(t) of path delay, with Tnom and Tvar marked. (3) 1 − cdf(t) vs. path delay. (4) Error rate (PE) vs. frequency, with fvar and fnom marked.]
- Error rate: PE(t) = 1 – cdf(t)
Error-rate vs Frequency
[Figure: error rate (PE) vs. frequency, with fvar and fnom marked]
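The relation PE(t) = 1 – cdf(t) turns into an error-rate-vs-frequency curve once t is read as the clock period 1/f. The sketch below assumes, purely for illustration, that the path delays of one stage follow a normal distribution with made-up parameters; in the talk the distribution comes from a timing analysis tool plus the variation model. The function name `error_rate` is hypothetical.

```python
from statistics import NormalDist


def error_rate(freq_ghz, delay_dist):
    """P_E(f) = 1 - cdf(t), evaluated at the clock period t = 1 / f (in ns)."""
    return 1.0 - delay_dist.cdf(1.0 / freq_ghz)


# Hypothetical stage: path delays ~ Normal(mean 0.8 ns, sigma 0.05 ns).
stage = NormalDist(mu=0.8, sigma=0.05)
for f_ghz in (1.0, 1.1, 1.2, 1.25, 1.3):
    print(f"f = {f_ghz:.2f} GHz  P_E = {error_rate(f_ghz, stage):.3e}")
```

Raising the frequency shortens the period, so more of the delay distribution falls beyond it and the error rate grows monotonically.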
Basic Kinds of Structures
- Logic
– ALUs, comparators, sense-amps
– Path delays: heterogeneous
- Memory
– SRAMs, CAMs
– Path delays: homogeneous
- Mixed
– Renamer, wakeup/select
– x% memory and (100-x)% logic
Logic
- Sample path: 35% wiring, 65% logic
- Wiring: Elmore delay model
- Logic: alpha-power law
$$T_g \propto \frac{L_{eff}\, V_{DD}}{\mu(T)\,(V_{DD} - V_{th})^{\alpha}}$$
Logic Delay
- Dlogic: distribution of path delays with no variation
- dwire + dgate = 1
- Dvarlogic = (dwire + η * dgate) * Dlogic + dgate * Dextra
– Dvarlogic: distribution of path delays with variation
– η: relative gate delay due to systematic variation in P, V, T
– Dextra: delay due to random variation
- Obtain Dlogic using a timing analysis tool
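As a worked illustration of the logic-delay model, the sketch below computes the relative gate delay η from the alpha-power law and plugs it into the path-delay expression, read here as Dvarlogic = (dwire + η * dgate) * Dlogic + dgate * Dextra. The 35% / 65% wire/gate split is the sample path from the talk; everything else is an assumption made for the example: α = 1.3, mobility held constant, the voltage and length values, and the function names.

```python
def gate_delay(leff, vdd, vth, mobility=1.0, alpha=1.3):
    """Alpha-power law, up to a constant factor: Leff * Vdd / (mu * (Vdd - Vth)^alpha)."""
    return leff * vdd / (mobility * (vdd - vth) ** alpha)


def relative_gate_delay(leff, vth, leff_nom, vth_nom, vdd, alpha=1.3):
    """eta: gate delay at the varied (Leff, Vth) relative to the nominal point."""
    return (gate_delay(leff, vdd, vth, alpha=alpha)
            / gate_delay(leff_nom, vdd, vth_nom, alpha=alpha))


def varied_path_delay(d_logic, eta, d_extra, d_wire=0.35, d_gate=0.65):
    """Dvarlogic = (dwire + eta * dgate) * Dlogic + dgate * Dextra."""
    return (d_wire + eta * d_gate) * d_logic + d_gate * d_extra


# Hypothetical operating point: Vdd = 1.0 V, nominal Vth = 0.25 V, Leff normalized to 1.
eta = relative_gate_delay(leff=1.05, vth=0.275, leff_nom=1.0, vth_nom=0.25, vdd=1.0)
print(f"eta = {eta:.4f}")
print(f"path delay with variation = {varied_path_delay(1.0, eta, d_extra=0.02):.4f}")
```

Only the gate fraction of the path is scaled by η; the wire fraction is left unchanged, which matches the split between the Elmore and alpha-power models.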
Memory Delay
WL
Memory Delay
VDD Y
cell mem
I T 1 ∝
- Solve for Icell using long
channel eqns.
I Y X
cell
I
- Icell = f(VtX,VtY,LX,LY)
- VtX,VtY,LX and LY are
i i bl
Icell X
gaussian variables
BL BR
- μ
μ μ μ are the systematic components
- μvtx, μvty, μlx, μly are the systematic components
- σvtx, σvty, σlx, σly are the random components
16
Memory Delay - II
- Find a distribution for Tmem
– Tmem is a function of four gaussian variables
– Model Tmem as a normal distribution
– Find the μ and σ for Tmem using multi-variable Taylor expansion
– This is the access time dist. for 1 bit
- A typical entry has 32-128 bits
– Find the max distribution of 32-128 normal variables
- Error probability = 1 – cdf(tmem)
Memory Delay
- Memory Cell
– Use Kirchhoff's equations
– Long channel trans. equations
– Multi-variable Taylor expansion
– Delay dist.
- Memory Line
– max. distribution
– Delayline = max(Delaycell)
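The step from one cell to a whole line, Delayline = max(Delaycell), can be illustrated as follows. The sketch assumes the cell delays within a line are independent and identically distributed normals, in which case the cdf of the maximum is the cell cdf raised to the number of bits. That independence assumption, the numeric parameters and the function name `line_error_probability` are choices made for this example; the talk does not spell out how the max distribution is computed.

```python
from statistics import NormalDist


def line_error_probability(t, cell_dist, bits):
    """P(max of `bits` i.i.d. cell delays > t) = 1 - cdf_cell(t) ** bits."""
    return 1.0 - cell_dist.cdf(t) ** bits


# Hypothetical 1-bit access time: Normal(mean 1.0 ns, sigma 0.1 ns).
cell = NormalDist(mu=1.0, sigma=0.1)
for bits in (1, 32, 64, 128):
    p = line_error_probability(1.3, cell, bits)
    print(f"{bits:4d} bits  P(error at t = 1.3 ns) = {p:.4f}")
```

Wider entries are slower on average because the slowest of more cells sets the access time, so the error probability at a fixed period grows with the number of bits.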
Combined Error Model
- We have the delay distributions
- For each structure
– per access, P(E) = 1 – cdf(t)
– P(E) per inst = αP(E), α = accesses/inst.
- Combined error rate per instruction
– P(E)total = Σ αP(E)
- CPI penalty per instruction
– recovery penalty * P(E)total
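The arithmetic of the combined model is small enough to show directly. In this sketch the per-structure access rates, error probabilities and the recovery penalty are invented numbers, and the function names are hypothetical; only the two formulas, P(E)total = Σ αP(E) and CPI penalty = recovery penalty * P(E)total, come from the talk.

```python
def combined_error_rate(structures):
    """Sum of alpha * P(E) over structures; each item is (accesses_per_inst, p_error_per_access)."""
    return sum(alpha * p_err for alpha, p_err in structures)


def cpi_penalty(structures, recovery_penalty):
    """Extra cycles per instruction spent recovering from timing errors."""
    return recovery_penalty * combined_error_rate(structures)


# Hypothetical structures: (accesses per instruction, error probability per access).
structures = [
    (0.5, 1e-4),  # e.g. a cache-like structure
    (1.0, 2e-5),  # e.g. an ALU-like structure
]
print(f"P(E)total   = {combined_error_rate(structures):.2e}")
print(f"CPI penalty = {cpi_penalty(structures, recovery_penalty=10):.2e}")
```

Summing the per-structure terms treats errors as rare enough that two structures failing on the same instruction can be ignored.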
Validation – Logic
Validation – Memory
Finding out the Distributions
[Figure: probability density function (pdf) of path delay in two steps. Step 1, "Using a timing analysis tool": Tnom is marked. Step 2, "Adding our variation model": Tnom and Tvar are marked.]
Adding Up all Pipe Stages
[Figure: error rate (PE) vs. frequency for pipe stages a and b, with fvar and fnom marked; the curve a + b is labeled "Whole processor"]
$$P_E(f) = \sum_i \alpha_i \times P_{E_i}(f)$$
Overview
- Model for Process Variation
- Model for Timing Errors due to Process Variation
- Techniques to Tolerate Timing Errors
Variation Aware Timing Speculation (VATS)
[Figure: multicore chip. Labels: Processor, Diva Checker, Processor Core, Checker, L0 Cache, L1 Cache, Razor Latches]
Performance vs Frequency
$$Perf(f) = \frac{f}{CPI_{comp} + CPI_{mem\_stall} + CPI_{rec}}$$
$$CPI_{rec} = P_E(f) \times recovery\_penalty$$
[Figure: error rate (PE) and performance (Perf) vs. frequency, with fopt marked]
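The trade-off behind fopt can be sketched numerically: performance rises with frequency until the recovery term, PE(f) × recovery_penalty, starts to dominate the CPI. The example below reads the performance expression as Perf(f) = f / (CPIcomp + CPImem_stall + CPIrec) and makes several simplifying assumptions that are not from the talk: a single normal path-delay distribution stands in for the whole processor, CPIcomp and CPImem_stall are held constant across frequency, all numeric values are invented, and the names `perf` and `find_f_opt` are hypothetical.

```python
from statistics import NormalDist


def perf(freq_ghz, delay_dist, cpi_comp, cpi_mem_stall, recovery_penalty):
    """Perf(f) = f / (CPI_comp + CPI_mem_stall + P_E(f) * recovery_penalty)."""
    p_e = 1.0 - delay_dist.cdf(1.0 / freq_ghz)  # error rate at period 1 / f (ns)
    return freq_ghz / (cpi_comp + cpi_mem_stall + p_e * recovery_penalty)


def find_f_opt(freqs, **model):
    """Return the candidate frequency with the highest modeled performance."""
    return max(freqs, key=lambda f: perf(f, **model))


# Hypothetical model parameters.
model = dict(
    delay_dist=NormalDist(mu=0.8, sigma=0.05),  # path delays in ns
    cpi_comp=1.0,
    cpi_mem_stall=0.5,
    recovery_penalty=20.0,  # cycles per timing error
)
freqs = [0.80 + 0.01 * i for i in range(61)]  # 0.80 to 1.40 GHz
f_opt = find_f_opt(freqs, **model)
print(f"f_opt = {f_opt:.2f} GHz, Perf = {perf(f_opt, **model):.4f}")
```

A coarse scan is enough here because the curve has a single peak for these parameters; a real study would also let the memory stall term grow with frequency.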
AMD Athlon-like Processor
Conclusion
- Micro-architects can help solve parameter variation
– Cores that assume faults occur all the time
– Frequency / Power / Error rate are tradeable
– Techniques to mitigate variation-induced errors
– Develop models that give insights
– Work with circuits, CAD, and software folks