Vishal Gupta* (Georgia Tech), Ripal Nathuji (Microsoft Research)

Jan 04, 2024


SLIDE 1

Vishal Gupta* (Georgia Tech) Ripal Nathuji (Microsoft Research)

* Work done during summer internship at Microsoft Research

SLIDE 2

Different types of CPU cores

Symmetric multicore processor (SMP) vs. asymmetric multicore processor (AMP)

[Figure: core layouts of an SMP and an AMP; each P is a CPU core]

SLIDE 3

[Figure: an application with phases A, B and C run on an SMP and on an AMP, plotted against time (T, 2T, 3T)]

Speedup!

SLIDE 4

How good are AMPs compared to SMPs?
Can datacenter applications save power using AMPs?

SLIDE 5

[Figure: a datacenter made of servers (S); each server contains a processor (SMP or AMP) and other components]

$\lambda_{datacenter}$ (throughput)

SLIDE 6

Constant work
Meet latency SLA

$P^{AMP}_{datacenter} < P^{SMP}_{datacenter}$ ?

SLIDE 7

Energy Scaling
Parallel Speedup

[Figure: sequential execution vs. parallel execution]

SLIDE 8

[Figure: a sequential application on an SMP and on an area-equivalent AMP]

SLIDE 9

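[Figure: execution timeline showing the latency target $T_{SLA}$, the completion times $T_{SMP}$ and $T_{AMP}$, the slack, and the times $t_{large}$ and $t_{small}$ on the large and small cores]

Smaller core = less power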

SLIDE 10

[Figure: a parallel application on an SMP and on an AMP]

SLIDE 11

[Figure: a parallel application on an SMP and on an AMP]

Sequential phase: small cores are the bottleneck, so run it on the fast core
Speedup = higher throughput

SLIDE 12

[Figure: a server with a request queue; arrival rate λ, service rate µ, latency SLA]

M/M/1 queuing model, average response time: $E[T] = \dfrac{1}{\mu - \lambda}$
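A minimal Python sketch of the M/M/1 average response time above. The function name and the example rates are hypothetical, chosen only to illustrate the formula; rates are in requests per second, so E[T] comes out in seconds. The formula only holds for a stable queue (λ < µ).

```python
def mm1_mean_response_time(arrival_rate: float, service_rate: float) -> float:
    """Average response time E[T] = 1 / (mu - lambda) of an M/M/1 queue."""
    if arrival_rate < 0 or service_rate <= 0:
        raise ValueError("need arrival_rate >= 0 and service_rate > 0")
    # The queue grows without bound unless requests arrive slower than they are served
    if arrival_rate >= service_rate:
        raise ValueError("unstable queue: need arrival_rate < service_rate")
    return 1.0 / (service_rate - arrival_rate)


# Hypothetical server: mu = 10 requests/s, lambda = 8 requests/s
print(mm1_mean_response_time(8.0, 10.0))
```

Response time grows sharply as λ approaches µ, which is why a latency SLA caps how much load a single server can take.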

SLIDE 13

[Figure: a parallel application on an SMP and on an AMP]

Amdahl’s Law for Multicores

Parallel Speedup (PS) (refer to the paper for Energy Scaling, ES)

SLIDE 14

Base core: area = 1; bigger core: area = r, performance = perf(r)

n = chip area (in base-core units)

[Figure: chip layouts for SMP n=16, r=1; AMP n=16, r=4; SMP n=16, r=4]

r = area of the big core
f = fraction of computation that can be parallelized

SLIDE 15

$\mu_{AMP}(f,n,r) = \dfrac{1}{\dfrac{1-f}{\mathrm{perf}(r)} + \dfrac{f}{n-r}}$

$\mu_{SMP}(f,n,r) = \dfrac{1}{\dfrac{1-f}{\mathrm{perf}(r)} + \dfrac{f}{\frac{n}{r}\,\mathrm{perf}(r)}}$

Ref: Hill and Marty, "Amdahl's Law in the Multicore Era" (IEEE Computer '08)
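The two service-rate formulas above can be evaluated directly. This sketch assumes perf(r) = sqrt(r), the example performance model used by Hill and Marty; the slides do not specify perf(r), so treat that choice as an assumption. Function names are hypothetical.

```python
import math


def perf(r: float) -> float:
    # ASSUMPTION: perf(r) = sqrt(r) (Hill and Marty's example); not given on the slides
    return math.sqrt(r)


def mu_amp(f: float, n: float, r: float) -> float:
    # Sequential fraction (1 - f) runs on the big core of area r;
    # parallel fraction f runs on the n - r base cores, as in the slide's f / (n - r) term
    return 1.0 / ((1.0 - f) / perf(r) + f / (n - r))


def mu_smp(f: float, n: float, r: float) -> float:
    # n / r identical cores, each of area r and performance perf(r);
    # f / ((n / r) * perf(r)) is written as f * r / (n * perf(r))
    return 1.0 / ((1.0 - f) / perf(r) + f * r / (n * perf(r)))


# The chip configurations from the previous slide, with a hypothetical f = 0.5
for r in (1, 4):
    print(f"r={r}: SMP {mu_smp(0.5, 16, r):.3f}, AMP {mu_amp(0.5, 16, r):.3f}")
```

Hill and Marty's original asymmetric formula also lets the big core help in the parallel phase (denominator perf(r) + n − r); the sketch follows the slide's n − r term.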

SLIDE 16

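$\lambda^{SMP}_{datacenter} = N^{SMP}_{server} \cdot \lambda^{SMP}_{server}$

$\lambda^{AMP}_{datacenter} = N^{AMP}_{server} \cdot \lambda^{AMP}_{server}$

Datacenter capacity = No. of servers × Server throughput

Constant work

$\lambda^{peak}_{server} = \mu - \dfrac{1}{T_{SLA}}$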
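The peak throughput follows from the M/M/1 model: requiring $E[T] = 1/(\mu - \lambda) \le T_{SLA}$ gives $\lambda \le \mu - 1/T_{SLA}$. With constant work, the number of servers is the datacenter load divided by that per-server peak. A sketch, assuming the server count is rounded up to a whole number (the slides do not say); names and numbers are hypothetical.

```python
import math


def peak_server_throughput(mu: float, t_sla: float) -> float:
    # From 1 / (mu - lambda) <= T_SLA  =>  lambda <= mu - 1 / T_SLA
    peak = mu - 1.0 / t_sla
    if peak <= 0:
        raise ValueError("server cannot meet the SLA at any load")
    return peak


def servers_needed(lambda_datacenter: float, mu: float, t_sla: float) -> int:
    # Constant work: N_server = lambda_datacenter / lambda_server^peak
    # ASSUMPTION: round up, since servers come in whole units
    return math.ceil(lambda_datacenter / peak_server_throughput(mu, t_sla))


# Hypothetical: mu = 10 requests/s per server, T_SLA = 0.5 s, 1000 requests/s in total
print(servers_needed(1000.0, 10.0, 0.5))
```

A faster server (higher µ, e.g. from a larger µ_AMP or µ_SMP) raises the per-server peak and so reduces the number of servers needed for the same work.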

SLIDE 17

$P^{SMP}_{datacenter} = N^{SMP}_{server} \cdot P^{SMP}_{server}$

$P^{AMP}_{datacenter} = N^{AMP}_{server} \cdot P^{AMP}_{server}$

Datacenter power (P) = No. of servers × Server power
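Datacenter power is a product of server count and per-server power, so comparing the two designs is a ratio. This sketch assumes "power savings of AMP over SMP" means 1 − P_AMP / P_SMP; the server counts and wattages are hypothetical.

```python
def datacenter_power(n_servers: int, server_power_watts: float) -> float:
    # Datacenter power = number of servers * per-server power
    return n_servers * server_power_watts


def amp_power_savings(n_smp: int, p_smp: float, n_amp: int, p_amp: float) -> float:
    # ASSUMPTION: savings of AMP over SMP defined as 1 - P_AMP / P_SMP
    p_dc_smp = datacenter_power(n_smp, p_smp)
    p_dc_amp = datacenter_power(n_amp, p_amp)
    return 1.0 - p_dc_amp / p_dc_smp


# Hypothetical: 125 SMP servers vs. 100 AMP servers, 200 W each
print(amp_power_savings(125, 200.0, 100, 200.0))
```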

SLIDE 18

[Figure: server power consumption P(U) vs. CPU utilization (U), ranging from idle power to peak power]

Ref: The Case for Energy-Proportional Computing, Barroso & Hölzle, IEEE Computer 2007
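To plug the power curve into the model, one needs P(U) as a function. This sketch assumes power rises linearly from idle power to peak power; that linear shape is an assumption made here for illustration, and the idle and peak wattages are hypothetical.

```python
def server_power(u: float, p_idle: float, p_peak: float) -> float:
    """Server power at CPU utilization u (0 = idle, 1 = fully busy).

    ASSUMPTION: power grows linearly from p_idle (u = 0) to p_peak (u = 1).
    """
    if not 0.0 <= u <= 1.0:
        raise ValueError("utilization must be in [0, 1]")
    return p_idle + (p_peak - p_idle) * u


# Hypothetical server: 100 W idle, 200 W peak, half loaded
print(server_power(0.5, 100.0, 200.0))
```

The large idle term is what makes a lightly loaded server inefficient: even at low utilization it draws a substantial share of its peak power.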

SLIDE 19

[Figure: server load distribution ($W_{load}$): fraction of time spent at each CPU utilization (U)]

$P_{server} = \sum_{U} W_{load}(U) \cdot P_{server}(U)$
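The weighted sum above averages the power curve over the load distribution. A sketch, assuming $W_{load}$ is given as a mapping from utilization levels to fractions of time; the distribution and the linear 100 W to 200 W curve in the example are hypothetical.

```python
from typing import Callable, Mapping


def average_server_power(
    w_load: Mapping[float, float], power_at: Callable[[float], float]
) -> float:
    # P_server = sum over U of W_load(U) * P_server(U)
    # w_load maps a CPU utilization U to the fraction of time spent at that U
    if abs(sum(w_load.values()) - 1.0) > 1e-9:
        raise ValueError("W_load fractions of time must sum to 1")
    return sum(weight * power_at(u) for u, weight in w_load.items())


# Hypothetical load distribution and a hypothetical linear power curve
w_load = {0.1: 0.5, 0.5: 0.3, 0.9: 0.2}
print(average_server_power(w_load, lambda u: 100.0 + 100.0 * u))
```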

SLIDE 20

$P^{AMP}_{datacenter} < P^{SMP}_{datacenter}$ ?

SLIDE 21

Up to 52% power savings

[Figure: power savings of AMP over SMP (0 to 60%) vs. fraction of work that can be parallelized (f, 0.2 to 1), for n = 64 and r = 4, 8, 16, 32]

SLIDE 22

Up to 14% power savings

[Figure: power savings of AMP over SMP (0 to 20%) vs. fraction of area sacrificed for the small core (5 to 45%), under small-core bias, uniform bias and large-core bias; panels: Application A, Application B, Application C]

SLIDE 23

PS looks more promising than ES
Can we achieve these savings in reality?

SLIDE 24

High r (realistic r = 3)
High (but not too high!) f

[Figure: power savings of AMP over SMP (0 to 60%) vs. fraction of work that can be parallelized (f, 0.5 to 1), for r = 4, 8, 16, 32]

SLIDE 25

Scalability: Amdahl’s law assumes unbounded scalability
Migration overhead: the model assumes zero migration overhead
Perfect scheduling: the model assumes an oracle scheduler

Actual savings will be lower

SLIDE 26

Potential for power savings in datacenters using AMPs
Parallel Speedup more promising than Energy Scaling
Practical considerations to realize full benefits