SLIDE 1 Vishal Gupta* (Georgia Tech) Ripal Nathuji (Microsoft Research)
* Work done during summer internship at Microsoft Research
SLIDE 2 Symmetric multicore processor (SMP) vs. asymmetric multicore processor (AMP)
[Figure: a multicore processor with CPU cores (P); the SMP has identical cores, the AMP has different types of cores]
SLIDE 3
[Figure: execution timeline of an application on an SMP and on an AMP; labels A, B, C; time axis marked T, 2T, 3T]
Speedup!
SLIDE 4
- How good are AMPs as compared to SMPs?
- Can datacenter applications save power using AMPs?
SLIDE 5
[Figure: a datacenter made up of servers (S); each server contains a processor (SMP or AMP) and other components; datacenter throughput λ_datacenter]
SLIDE 6
- Constant work
- Meet latency SLA
$P_{\text{datacenter}}^{\text{AMP}} < P_{\text{datacenter}}^{\text{SMP}}$ ?
SLIDE 7
- Energy Scaling
- Parallel Speedup
[Figure: sequential execution vs. parallel execution]
SLIDE 8
Sequential application
[Figure: SMP and AMP, area equivalent]
SLIDE 9
[Figure: timelines for SMP and AMP showing T_SLA, T_SMP, T_AMP, and slack; core times t_large and t_small]
Smaller core = lower power
SLIDE 10
Parallel application
[Figure: SMP and AMP core layouts]
SLIDE 11
[Figure: SMP and AMP core layouts]
Sequential phase
- Small cores: bottleneck
- Run on the fast core
Speedup = higher throughput
SLIDE 12
[Figure: server with a request queue; arrival rate λ, service rate µ, latency SLA]
M/M/1 queuing model, average response time: $E[T] = \frac{1}{\mu - \lambda}$
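The M/M/1 response-time formula on this slide can be sketched in a few lines of Python. The function name and the numbers below are illustrative assumptions, not values from the talk; the point is only that latency grows sharply as the arrival rate λ approaches the service rate µ.

```python
def mm1_response_time(mu: float, lam: float) -> float:
    """Average response time E[T] = 1 / (mu - lam) for an M/M/1 queue."""
    if lam >= mu:
        # The formula only holds for a stable queue (lam < mu).
        raise ValueError("arrival rate must be below service rate")
    return 1.0 / (mu - lam)


if __name__ == "__main__":
    mu = 100.0  # hypothetical service rate, requests per second
    for lam in (50.0, 90.0, 99.0):  # hypothetical arrival rates
        print(f"lambda={lam:5.1f}  E[T]={mm1_response_time(mu, lam):.3f} s")
```

A faster server (higher µ) can therefore absorb a higher arrival rate before its average latency reaches the SLA.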
SLIDE 13
Parallel application
[Figure: SMP and AMP core layouts]
Amdahl’s Law for Multicores
Parallel Speedup (PS) (refer to paper for ES)
SLIDE 14
- n = chip area
- r = area of the big core
- f = fraction of computation that can be parallelized
- Small core: Area = 1; big core: Area = r, Perf = perf(r)
[Figure: example layouts: SMP n=16, r=1; AMP n=16, r=4; SMP n=16, r=4]
SLIDE 15
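$\mu_{\text{AMP}}(f, n, r) = \frac{1}{\frac{1-f}{\text{perf}(r)} + \frac{f}{n-r}}$
$\mu_{\text{SMP}}(f, n, r) = \frac{1}{\frac{1-f}{\text{perf}(r)} + \frac{f}{\frac{n}{r} \cdot \text{perf}(r)}}$
Ref: Hill and Marty, Amdahl's law in the multicore era (IEEE Computer ’08)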
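The two service-rate expressions on this slide can be evaluated with a short Python sketch. The slides leave perf(r) unspecified; the sketch assumes perf(r) = sqrt(r), the example function used by Hill and Marty, so treat it as an assumption rather than the talk's choice. The AMP expression follows the slide as written, with the parallel phase running on the n − r small cores.

```python
import math


def perf(r: float) -> float:
    # Assumption: perf(r) = sqrt(r), as in Hill and Marty's examples.
    return math.sqrt(r)


def mu_smp(f: float, n: float, r: float) -> float:
    """SMP of n/r cores, each of area r: sequential part on one core,
    parallel part on all n/r cores."""
    return 1.0 / ((1.0 - f) / perf(r) + f / ((n / r) * perf(r)))


def mu_amp(f: float, n: float, r: float) -> float:
    """AMP with one big core of area r and n - r small cores:
    sequential part on the big core, parallel part on the small cores."""
    if n <= r:
        raise ValueError("AMP needs n > r so that small cores exist")
    return 1.0 / ((1.0 - f) / perf(r) + f / (n - r))


if __name__ == "__main__":
    n = 16
    for f in (0.5, 0.9, 0.99):
        print(f"f={f}: SMP(r=1)={mu_smp(f, n, 1):.2f}  AMP(r=4)={mu_amp(f, n, 4):.2f}")
```

Both functions return throughput relative to a single area-1 core, so their ratio is what matters for the SMP vs. AMP comparison.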
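SLIDE 16
Datacenter capacity =
- No. of servers * Server throughput
Constant work:
$\lambda_{\text{datacenter}} = N_{\text{server}}^{\text{SMP}} \cdot \lambda_{\text{server}}^{\text{SMP}}$
$\lambda_{\text{datacenter}} = N_{\text{server}}^{\text{AMP}} \cdot \lambda_{\text{server}}^{\text{AMP}}$
Peak server throughput: $\lambda_{\text{server}}^{\text{peak}} = \mu - \frac{1}{T_{\text{SLA}}}$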
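The peak server throughput follows from requiring the M/M/1 average response time 1/(µ − λ) to stay within T_SLA, which gives λ ≤ µ − 1/T_SLA. A minimal Python sketch of the server-count calculation is below; the function names and numbers are hypothetical, and rounding up to a whole number of servers is an added assumption.

```python
import math


def peak_arrival_rate(mu: float, t_sla: float) -> float:
    """Largest per-server arrival rate that keeps E[T] <= T_SLA."""
    lam = mu - 1.0 / t_sla
    if lam <= 0:
        raise ValueError("server cannot meet the SLA even when idle")
    return lam


def servers_needed(lam_datacenter: float, mu: float, t_sla: float) -> int:
    """Servers required to carry a fixed datacenter load (constant work).
    Rounding up to an integer is an assumption of this sketch."""
    return math.ceil(lam_datacenter / peak_arrival_rate(mu, t_sla))


if __name__ == "__main__":
    # Hypothetical numbers: 1000 req/s total, mu = 100 req/s, SLA = 0.1 s.
    print(peak_arrival_rate(100.0, 0.1))       # per-server peak throughput
    print(servers_needed(1000.0, 100.0, 0.1))  # servers for the same work
```

With constant work, a processor with a higher service rate µ needs fewer servers to carry the same datacenter throughput.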
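SLIDE 17
Datacenter power (P) =
- No. of servers * Server power
$P_{\text{datacenter}}^{\text{SMP}} = N_{\text{server}}^{\text{SMP}} \cdot P_{\text{server}}^{\text{SMP}}$
$P_{\text{datacenter}}^{\text{AMP}} = N_{\text{server}}^{\text{AMP}} \cdot P_{\text{server}}^{\text{AMP}}$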
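As a rough illustration of how the two datacenter power figures compare, the sketch below computes the fractional saving of AMP over SMP. The function names and input values are hypothetical and are not results from the talk.

```python
def datacenter_power(n_servers: int, p_server: float) -> float:
    """Datacenter power = number of servers * per-server power."""
    return n_servers * p_server


def amp_power_savings(n_smp: int, p_smp: float, n_amp: int, p_amp: float) -> float:
    """Fractional power saving of the AMP datacenter relative to the SMP one."""
    p_dc_smp = datacenter_power(n_smp, p_smp)
    p_dc_amp = datacenter_power(n_amp, p_amp)
    return 1.0 - p_dc_amp / p_dc_smp


if __name__ == "__main__":
    # Hypothetical: AMP servers draw the same power but fewer are needed.
    print(f"{amp_power_savings(100, 200.0, 80, 200.0):.0%}")
```

A positive result means the AMP datacenter draws less power for the same work.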
SLIDE 18
[Figure: server power consumption P(U) vs. CPU utilization (U), with idle power and peak power marked]
Ref: The Case for Energy-Proportional Computing, Barroso & Hölzle, IEEE Computer 2007
SLIDE 19
[Figure: server load distribution (W_load): fraction of time vs. CPU utilization (U)]
$P_{\text{server}} = \sum_{U} W_{\text{load}}(U) \cdot P_{\text{server}}(U)$
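The weighted sum for average server power can be sketched as follows. The slides do not give the shape of P(U); the linear interpolation between idle and peak power used here is an assumption for illustration only, as are the load distribution and wattages.

```python
def linear_power(u: float, p_idle: float, p_peak: float) -> float:
    # Assumption: power grows linearly from idle (U=0) to peak (U=1).
    return p_idle + (p_peak - p_idle) * u


def average_server_power(w_load: dict, p_idle: float, p_peak: float) -> float:
    """P_server = sum over U of W_load(U) * P_server(U).
    w_load maps a utilization level U to the fraction of time spent there."""
    if abs(sum(w_load.values()) - 1.0) > 1e-9:
        raise ValueError("load distribution must sum to 1")
    return sum(w * linear_power(u, p_idle, p_peak) for u, w in w_load.items())


if __name__ == "__main__":
    # Hypothetical distribution: 25% idle, 50% half loaded, 25% fully loaded.
    w_load = {0.0: 0.25, 0.5: 0.5, 1.0: 0.25}
    print(average_server_power(w_load, p_idle=100.0, p_peak=200.0))
```

Any other P(U) curve can be substituted for `linear_power` without changing the weighted sum.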
SLIDE 20
$P_{\text{datacenter}}^{\text{AMP}} < P_{\text{datacenter}}^{\text{SMP}}$ ?
SLIDE 21 Up to 52% power savings
n = 64
[Figure: power savings of AMP (0% to 60%) vs. fraction of work that can be parallelized (f, 0.2 to 1); series: r=32, r=16, r=8, r=4]
SLIDE 22 Up to 14% power savings
[Figure: power savings of AMP over SMP (0% to 20%) vs. fraction of area sacrificed for small core (5% to 45%); series: small core bias, uniform bias, large core bias; Applications A, B, C]
SLIDE 23
- PS looks more promising than ES
- Can we achieve these savings in reality?
SLIDE 24
- High r (realistic r = 3)
- High (but not too high!) f
[Figure: power savings of AMP (0% to 60%) vs. fraction of work that can be parallelized (f, 0.5 to 1); series: r=32, r=16, r=8, r=4]
SLIDE 25
- Scalability: Amdahl’s law assumes unbounded scalability
- Migration overhead: the model assumes zero migration overhead
- Perfect scheduling: the model assumes an oracle scheduler
Actual savings will be lower
SLIDE 26
- Potential for power savings in datacenters using AMPs
- Parallel Speedup more promising than Energy Scaling
- Practical considerations to realize full benefits
Future work: Extend our analysis to functional asymmetry