Helping Moore’s Law: Architectural Techniques to Address Parameter Variation
Radu Teodorescu Computer Science Department University of Illinois at Urbana-Champaign http://iacoma.cs.uiuc.edu/~teodores
Technology scaling continues
Radu Teodorescu Architectural Techniques to Address Parameter Variation
Figure: transistor size and transistor number across processor generations (Pentium 3, Pentium 4, Core 2 Duo, Quad Opteron)
Manufacturing process:
- Sub-wavelength lithography: 193nm light, 45nm features
- Dopant density fluctuations
Environmental:
- Temperature variation
- Supply voltage fluctuations
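Variation affects frequency, power, and reliability
Figure: pdf (probability density function) of switching speed and of leakage power, with the nominal value marked
Chips pictured: AMD Quad-core Opteron, Intel 80-core Polaris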
One generation of process technology is lost to process variation.
Shekhar Borkar et al, Intel, DAC 2003
130nm
Figure: die-to-die variation and within-die variation on a four-core chip (C1 to C4), with regions ranging from slower, less leaky transistors to fast, leaky transistors
Computing stack: circuits, microarchitecture, runtime system
Variation reduction: dynamic fine-grain body biasing (reduce power, speed up slow cells)
Variation tolerance: variation-aware application scheduling and power management
[ISCA’08]
DVFS (dynamic voltage and frequency scaling): trades frequency against dynamic power
BB (body biasing): trades frequency against leakage power
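RBB (reverse body bias) reduces static power
FBB (forward body bias) speeds up slow cells
FGBB (fine-grain body biasing): a separate body bias per cell, trading frequency against leakage power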
[Tschanz et al, Intel]
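The frequency/leakage trade-off of body biasing can be illustrated with a toy first-order model. This is a sketch under stated assumptions, not the model used in the talk: delay follows the alpha-power law, leakage falls exponentially with threshold voltage, and body bias shifts the threshold linearly. All constants (VDD, VTH0, ALPHA, K_BODY, N_VT) are hypothetical values picked for illustration.

```python
import math

# Hypothetical constants chosen only for illustration.
VDD = 1.0      # supply voltage (V)
VTH0 = 0.30    # threshold voltage at zero body bias (V)
ALPHA = 1.3    # alpha-power-law exponent
K_BODY = 0.1   # assumed threshold shift per volt of body bias
N_VT = 0.036   # subthreshold slope factor times thermal voltage (V)

def vth(body_bias):
    """Threshold voltage; positive bias = FBB (lower Vth), negative = RBB."""
    return VTH0 - K_BODY * body_bias

def relative_frequency(body_bias):
    """Frequency relative to zero bias, from delay ~ Vdd / (Vdd - Vth)^alpha."""
    return ((VDD - vth(body_bias)) / (VDD - VTH0)) ** ALPHA

def relative_leakage(body_bias):
    """Leakage relative to zero bias, from leakage ~ exp(-Vth / N_VT)."""
    return math.exp((VTH0 - vth(body_bias)) / N_VT)

for label, bias in [("RBB", -0.4), ("none", 0.0), ("FBB", 0.4)]:
    print(f"{label:>4}: freq x{relative_frequency(bias):.2f}, "
          f"leakage x{relative_leakage(bias):.2f}")
```

In this model FBB buys a modest frequency gain at a large leakage cost, while RBB does the opposite, which is why a per-cell choice of bias is attractive.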
S-FGBB (static FGBB):
- BB values are fixed for the lifetime of the chip
- Worst-case conditions (temperature, power) are assumed
- S-FGBB has to be conservative
Figure: leakage versus frequency with Bin 1 to Bin 4, Fmax, and the leakage power limit (high power beyond it)
Figure: delay changes as temperature (T) changes, for cells ranging from fast to slow
Figure: body bias applied to fast and slow cells to reach the target Fmax, at average T and at max T. FBB means higher power consumption; RBB means lower power consumption.
S-FGBB: BB is fixed
D-FGBB: BB is variable
The goal of D-FGBB is to keep the body bias
Figure: delay sampling circuit, built from a critical path replica and a phase detector; signals shown: CLK, FBB, RBB
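One way to picture how a delay sampling circuit could drive the body bias is a simple feedback loop. The sketch below is an assumption about the control policy, not the circuit from the talk: `replica_delay` is a hypothetical stand-in for the critical path replica, the phase detector is modeled as a comparison against the clock period, and every coefficient, step size, and limit is made up.

```python
def replica_delay(temperature_c, body_bias):
    """Hypothetical replica delay (ns): slower when hot, faster with FBB.
    Positive body_bias means FBB, negative means RBB."""
    return 1.0 + 0.002 * (temperature_c - 60.0) - 0.15 * body_bias

def phase_detect(delay_ns, clock_period_ns, guard_ns=0.01):
    """Return +1 if the replica misses the clock edge, -1 if it has slack
    beyond the guard band, and 0 if it sits inside the guard band."""
    if delay_ns > clock_period_ns:
        return +1
    if delay_ns < clock_period_ns - guard_ns:
        return -1
    return 0

def adjust_bias(body_bias, temperature_c, clock_period_ns,
                step=0.03125, bias_min=-0.5, bias_max=0.5, max_steps=64):
    """Step toward FBB when the replica is too slow, toward RBB when it has slack."""
    for _ in range(max_steps):
        decision = phase_detect(replica_delay(temperature_c, body_bias),
                                clock_period_ns)
        if decision == 0:
            break
        body_bias = min(bias_max, max(bias_min, body_bias + decision * step))
    return body_bias

period = 1.0
for temp in (50.0, 60.0, 90.0):
    bias = adjust_bias(0.0, temp, period)
    print(f"T={temp:.0f}C -> bias {bias:+.3f} V, "
          f"delay {replica_delay(temp, bias):.3f} ns")
```

The guard band is deliberately wider than the delay change produced by one bias step, so the loop settles instead of oscillating between FBB and RBB.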
Environment: goal
- Standard: improve frequency and power
- High performance: maximize frequency
- Low power: minimize leakage power
Figure: leakage versus frequency, showing the power limit, the original chip at Forig, and S-FGBB and D-FGBB at Tavg running at Fmax
S-FGBB finds and sets Fmax
Under average conditions (Tavg), D-FGBB saves leakage power compared to S-FGBB at Fmax
D-FGBB is very effective at reducing WID variation:
Figure: leakage power versus frequency (frequency axis 0.812 to 1.037, leakage axis 0.5 to 1.0); panels: NoBB, S-FGBB (S-FGBB64), D-FGBB
Variation tolerance: variation-aware application scheduling and power management [ISCA’08]
Figure: 20-core chip (C1 to C20) with two L2 caches; cores C2 and C20 highlighted
Fastest vs. slowest core: 30% in frequency, 40% in total power, 2X in leakage power
Design-identical cores will have significantly different properties
frequency it can achieve
Applications
Variation-aware scheduling algorithms:
- Assign applications with high dynamic power to low power cores (VarPower)
- Assign high IPC applications to high frequency cores (VarPerf)
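The two policies can be sketched as sort-and-pair mappings. This is a minimal illustration, not the scheduler from the talk: the dictionary fields (`ipc`, `dyn_power_w`, `freq_ghz`, `static_power_w`) and all numbers are hypothetical, and "low power cores" is assumed here to mean cores with low static power.

```python
def var_perf(apps, cores):
    """Pair the highest-IPC applications with the highest-frequency cores."""
    by_ipc = sorted(apps, key=lambda a: a["ipc"], reverse=True)
    by_freq = sorted(cores, key=lambda c: c["freq_ghz"], reverse=True)
    return {a["name"]: c["name"] for a, c in zip(by_ipc, by_freq)}

def var_power(apps, cores):
    """Pair the highest-dynamic-power applications with the lowest-static-power cores."""
    by_dyn = sorted(apps, key=lambda a: a["dyn_power_w"], reverse=True)
    by_static = sorted(cores, key=lambda c: c["static_power_w"])
    return {a["name"]: c["name"] for a, c in zip(by_dyn, by_static)}

# Hypothetical profiling data for three applications and four cores.
apps = [
    {"name": "A", "ipc": 1.8, "dyn_power_w": 4.0},
    {"name": "B", "ipc": 0.6, "dyn_power_w": 6.5},
    {"name": "C", "ipc": 1.1, "dyn_power_w": 2.5},
]
cores = [
    {"name": "C1", "freq_ghz": 3.6, "static_power_w": 2.4},
    {"name": "C2", "freq_ghz": 4.1, "static_power_w": 3.1},
    {"name": "C3", "freq_ghz": 3.2, "static_power_w": 1.7},
    {"name": "C4", "freq_ghz": 3.9, "static_power_w": 2.8},
]

print(var_perf(apps, cores))   # {'A': 'C2', 'C': 'C4', 'B': 'C1'}
print(var_power(apps, cores))  # {'B': 'C3', 'A': 'C1', 'C': 'C4'}
```

With fewer applications than cores, `zip` leaves the least attractive cores idle under each policy.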
Figure: 20-core chip (C1 to C20, two L2 caches) in which each core has its own voltage and frequency (V,F) setting
Figure: relationship between frequency and total power (both axes 0.4 to 1.0) as Vdd varies from 0.6V to 1V
Given a mapping of threads to cores (VarPerf):
FIND the best (Vi,Fi) of each core
Power targets shown: 50W, 75W, 100W
Figure: 20-core chip (C1 to C20, two L2 caches) with a PMU
Inputs to LinOpt:
- Post-manufacturing profiling, for each core: frequency, static power
- Dynamic profiling, for each app: dynamic power, IPC
- Power target
- Goal
Output of LinOpt: the best (Vi,Fi) of each core
Figure: timeline showing LinOpt (10ms) and the OS scheduling interval
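The kind of optimization LinOpt performs can be sketched as a small linear program. Everything here is an assumption made for illustration: throughput and power are taken to be linear in each core's voltage, the per-core fields (`gain` in MIPS per volt, `cost` in watts per volt, `p_min_w` as power at the lowest voltage) and their values are hypothetical, and the solver is not the one from the talk. Because this simplified problem has a single budget constraint plus per-core voltage bounds, it reduces to a fractional knapsack, which a greedy pass solves exactly.

```python
V_MIN, V_MAX = 0.6, 1.0  # voltage range, matching the Vdd range shown in the slides

def allocate_voltages(cores, budget_w):
    """Maximize sum(gain_i * v_i) subject to sum(power_i(v_i)) <= budget_w and
    V_MIN <= v_i <= V_MAX, where power_i(v) = p_min_w_i + cost_i * (v - V_MIN).

    Greedy by gain/cost is optimal for this single-constraint linear program.
    """
    volts = {c["name"]: V_MIN for c in cores}
    spare = budget_w - sum(c["p_min_w"] for c in cores)
    if spare < 0:
        raise ValueError("budget cannot be met even at V_MIN")
    for c in sorted(cores, key=lambda c: c["gain"] / c["cost"], reverse=True):
        dv = min(V_MAX - V_MIN, spare / c["cost"])  # raise as far as budget allows
        volts[c["name"]] += dv
        spare -= dv * c["cost"]
    return volts

# Hypothetical per-core coefficients, as if derived from the profiling inputs.
cores = [
    {"name": "C1", "gain": 4000.0, "cost": 10.0, "p_min_w": 2.0},
    {"name": "C2", "gain": 2500.0, "cost": 5.0, "p_min_w": 2.5},
    {"name": "C3", "gain": 1500.0, "cost": 7.5, "p_min_w": 1.5},
]

for name, v in allocate_voltages(cores, budget_w=10.0).items():
    print(f"{name}: {v:.2f} V")
```

Each core's frequency Fi would then follow from its own voltage-to-frequency curve, which is where the post-manufacturing profile enters. A real formulation with more constraints would need a general linear programming solver rather than this greedy shortcut.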
Figure: four-core chip (C1 to C4) divided into body-bias cells (1 to 144 cells); configurations shown: FGBB16, FGBB64, FGBB144
Figure: leakage versus frequency for NoBB, S-FGBB1, S-FGBB16, S-FGBB64, S-FGBB144, D-FGBB1, D-FGBB16, D-FGBB64, and D-FGBB144
More BB cells result in higher frequency and lower leakage
Leakage reduction: 28%, 42%
compared to S-FGBB
Goal:
- maximize throughput
Constraint:
- keep power below budget (75W)
Schemes compared:
- Foxton+: baseline
- VarPerf+LinOpt: proposed scheme
- VarPerf+SAnn: approximate upper bound
Figure: throughput (MIPS, axis 0.5 to 1.2) of Foxton+, VarPerf+LinOpt, and VarPerf+SAnn for 4, 8, 16, and 20 threads; annotated values: 12%, 17%, 13%, 16%
How much of the performance/power have we recovered?
Dynamic fine-grain body biasing:
- Figure: frequency for No Variation, WID Variation, D-FGBB Standard, D-FGBB HiPerf
- Figure: leakage power for No Variation, WID Variation, D-FGBB Standard, D-FGBB LowPower
Variation-aware scheduling and power management:
- Figure: throughput for No Variation, WID Variation, VarPerf+LinOpt
Intel 80-core Polaris
Integrated approach to system reliability
Figure: computing stack (circuits, microarchitecture, operating system, compiler, software) annotated with environment sensing; detection, correction; migration, adaptation; application hardening; and timing errors
checkpointing and rollback, in FPGA [FCCM’05][WCED’05] [BUGS’05][Micro Magazine’06]
[HPCA’07]
code [ASID’06] Hardware support for on-line software debugging