Vishal Gupta* (Georgia Tech) Ripal Nathuji (Microsoft Research) * - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Vishal Gupta* (Georgia Tech) Ripal Nathuji (Microsoft Research)

* Work done during summer internship at Microsoft Research

slide-2
SLIDE 2

Different types of CPU cores

Symmetric multicore processor (SMP)
Asymmetric multicore processor (AMP)

[Figure: CPU cores (P) of an SMP and of an AMP]

slide-3
SLIDE 3

[Figure: execution of an application (labels A, B, C) on SMP and AMP cores over time; time axis ticks: T, 2T, 3T]

Speedup!

slide-4
SLIDE 4
  • How good are AMPs compared to SMPs?
  • Can datacenter applications save power using AMPs?

slide-5
SLIDE 5

[Figure: a datacenter made up of servers (S); each server contains a processor (SMP or AMP) and other components]

λdatacenter (throughput)

slide-6
SLIDE 6
  • Constant work
  • Meet latency SLA

$P_{datacenter}^{AMP} < P_{datacenter}^{SMP}$ ?

slide-7
SLIDE 7
  • Energy Scaling (sequential execution)
  • Parallel Speedup (parallel execution)

slide-8
SLIDE 8

Sequential application

[Figure: area-equivalent SMP and AMP chips]

slide-9
SLIDE 9

[Figure: timeline for SMP and AMP showing t_large, t_small, T_SMP, T_AMP, T_SLA, and the slack]

Smaller core = lower power

slide-10
SLIDE 10

Parallel application

[Figure: SMP and AMP chips]

slide-11
SLIDE 11

[Figure: SMP and AMP chips running a parallel application]

Sequential phase
  • Small cores: bottleneck
  • Run on the fast core

Speedup = higher throughput

slide-12
SLIDE 12

M/M/1 Queuing Model

[Figure: server with a request queue; arrival rate λ, service rate µ, latency SLA]

Avg. response time: $E[T] = \frac{1}{\mu - \lambda}$
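A minimal Python sketch of the M/M/1 relation on this slide: it computes the average response time E[T] = 1/(µ − λ) and checks it against a latency SLA. The function names and the sample rates are illustrative assumptions, not values from the talk.

```python
def mm1_response_time(service_rate: float, arrival_rate: float) -> float:
    """Average response time E[T] = 1 / (mu - lambda) of an M/M/1 queue."""
    if service_rate <= 0 or arrival_rate < 0:
        raise ValueError("service rate must be positive, arrival rate non-negative")
    if arrival_rate >= service_rate:
        raise ValueError("unstable queue: arrival rate must be below service rate")
    return 1.0 / (service_rate - arrival_rate)


def meets_sla(service_rate: float, arrival_rate: float, t_sla: float) -> bool:
    """True if the average response time stays within the latency SLA."""
    return mm1_response_time(service_rate, arrival_rate) <= t_sla


# Hypothetical rates (requests per second), chosen only for illustration.
print(mm1_response_time(100.0, 80.0))
print(meets_sla(100.0, 80.0, t_sla=0.1))
```

As the arrival rate approaches the service rate, the response time grows without bound, which is why the SLA caps the load a server can accept.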

slide-13
SLIDE 13

Parallel application

[Figure: SMP and AMP chips]

Amdahl’s Law for Multicores

Parallel Speedup (PS) (refer to the paper for ES)

slide-14
SLIDE 14

n = chip area
r = area of the big core
f = fraction of computation that can be parallelized

Core of area 1; core of area r with Perf = perf(r)

[Figure: three example chips]
  • SMP: n=16, r=1
  • AMP: n=16, r=4
  • SMP: n=16, r=4

slide-15
SLIDE 15

$\mu_{AMP}(f, n, r) = \frac{1}{\frac{1-f}{perf(r)} + \frac{f}{n-r}}$

$\mu_{SMP}(f, n, r) = \frac{1}{\frac{1-f}{perf(r)} + \frac{f}{\frac{n}{r} \cdot perf(r)}}$

Ref: Hill and Marty, Amdahl's Law in the Multicore Era (IEEE Computer ’08)
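The two service-rate expressions on this slide can be evaluated with a short Python sketch. The slide does not define perf(r); the code assumes perf(r) = sqrt(r), the model Hill and Marty use. The AMP expression follows the slide, where the parallel phase runs on the n − r small cores only. Hill and Marty's original asymmetric formula also counts the big core, using perf(r) + n − r. Function names are illustrative.

```python
import math


def perf(r: float) -> float:
    """Assumed performance of a core of area r: sqrt(r) (Hill and Marty's model)."""
    return math.sqrt(r)


def mu_smp(f: float, n: float, r: float) -> float:
    """Speedup of a symmetric chip: n/r cores, each of area r."""
    return 1.0 / ((1.0 - f) / perf(r) + f / ((n / r) * perf(r)))


def mu_amp(f: float, n: float, r: float) -> float:
    """Speedup of an asymmetric chip as written on the slide:
    one big core of area r runs the sequential part,
    the n - r unit-area small cores run the parallel part."""
    if r >= n:
        raise ValueError("the big core must leave area for small cores (r < n)")
    return 1.0 / ((1.0 - f) / perf(r) + f / (n - r))


# The n=16 configurations from the previous slide, with a hypothetical f = 0.9.
for label, value in [
    ("SMP n=16, r=1", mu_smp(0.9, 16, 1)),
    ("AMP n=16, r=4", mu_amp(0.9, 16, 4)),
    ("SMP n=16, r=4", mu_smp(0.9, 16, 4)),
]:
    print(f"{label}: {value:.2f}")
```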

slide-16
SLIDE 16

Datacenter capacity =
  • No. of servers * Server throughput

$\lambda_{datacenter} = N_{server}^{SMP} \cdot \lambda_{server}^{SMP}$

$\lambda_{datacenter} = N_{server}^{AMP} \cdot \lambda_{server}^{AMP}$

Constant Work

$\lambda_{server}^{peak} = \mu - \frac{1}{T_{SLA}}$
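A short Python sketch of the capacity relations on this slide: the peak per-server throughput under the latency SLA, and the number of servers needed for a fixed datacenter throughput (constant work). Rounding the server count up to a whole number is an assumption of this sketch, and the names and sample values are illustrative.

```python
import math


def peak_server_throughput(mu: float, t_sla: float) -> float:
    """Largest arrival rate that keeps E[T] = 1/(mu - lambda) within T_SLA."""
    lam = mu - 1.0 / t_sla
    if lam <= 0:
        raise ValueError("server cannot meet the SLA at any load")
    return lam


def servers_needed(datacenter_throughput: float, mu: float, t_sla: float) -> int:
    """Servers required to carry a fixed datacenter throughput (rounded up)."""
    return math.ceil(datacenter_throughput / peak_server_throughput(mu, t_sla))


# Hypothetical values: service rate of 100 requests/s, SLA of 0.1 s,
# datacenter throughput of 1000 requests/s.
print(peak_server_throughput(100.0, 0.1))
print(servers_needed(1000.0, 100.0, 0.1))
```

A higher service rate µ raises the peak throughput per server, so the same total work fits on fewer servers.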

slide-17
SLIDE 17

Datacenter power (P) =
  • No. of servers * Server power

$P_{datacenter}^{SMP} = N_{server}^{SMP} \cdot P_{server}^{SMP}$

$P_{datacenter}^{AMP} = N_{server}^{AMP} \cdot P_{server}^{AMP}$
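The power relation on this slide, and the relative savings it is used to compare, can be sketched in Python. The server counts and per-server wattages below are made-up inputs for illustration, not results from the talk.

```python
def datacenter_power(num_servers: int, server_power_watts: float) -> float:
    """Datacenter power = number of servers * power per server."""
    return num_servers * server_power_watts


def power_savings(p_amp: float, p_smp: float) -> float:
    """Fractional power savings of the AMP datacenter over the SMP datacenter."""
    return 1.0 - p_amp / p_smp


# Hypothetical datacenters: the AMP one needs fewer servers for the same work.
p_smp = datacenter_power(100, 200.0)
p_amp = datacenter_power(80, 210.0)
print(f"savings: {power_savings(p_amp, p_smp):.0%}")
```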
slide-18
SLIDE 18

[Figure: server power consumption P(U) versus CPU utilization (U), from idle power to peak power]

Ref: The Case for Energy-Proportional Computing, Barroso & Hölzle, IEEE Computer 2007

slide-19
SLIDE 19

[Figure: server load distribution (W_load): fraction of time versus CPU utilization (U)]

$P_{server} = \sum_{U} W_{load}(U) \cdot P_{server}(U)$
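A minimal Python sketch of the load-weighted server power on this slide. It assumes a linear power curve between idle and peak power, which the slide does not state, and it represents the load distribution as a mapping from utilization to fraction of time. The distribution and wattages are made-up inputs.

```python
def server_power(u: float, p_idle: float, p_peak: float) -> float:
    """Assumed linear model: power rises from p_idle at U=0 to p_peak at U=1."""
    return p_idle + (p_peak - p_idle) * u


def average_server_power(load: dict[float, float], p_idle: float, p_peak: float) -> float:
    """Sum of W_load(U) * P_server(U) over all utilization levels U."""
    if abs(sum(load.values()) - 1.0) > 1e-9:
        raise ValueError("fractions of time must sum to 1")
    return sum(w * server_power(u, p_idle, p_peak) for u, w in load.items())


# Hypothetical load distribution: utilization -> fraction of time.
load = {0.0: 0.2, 0.5: 0.5, 1.0: 0.3}
print(average_server_power(load, p_idle=100.0, p_peak=200.0))
```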

slide-20
SLIDE 20

$P_{datacenter}^{AMP} < P_{datacenter}^{SMP}$ ?

slide-21
SLIDE 21

Up to 52% power savings

n = 64

[Figure: power savings of AMP over SMP (0% to 60%) versus fraction of work that can be parallelized (f, 0.2 to 1); curves for r=32, r=16, r=8, r=4]

slide-22
SLIDE 22

Up to 14% power savings

[Figure: power savings of AMP over SMP versus fraction of area sacrificed for small core (5% to 45%); series: small core bias, uniform bias, large core bias; labels: Application A, Application B, Application C]

slide-23
SLIDE 23
  • PS looks more promising than ES
  • Can we achieve these savings in reality?
slide-24
SLIDE 24

High r (realistic r = 3)
High (but not too high!) f

[Figure: power savings of AMP over SMP (0% to 60%) versus fraction of work that can be parallelized (f, 0.5 to 1); curves for r=32, r=16, r=8, r=4]

slide-25
SLIDE 25
  • Scalability: Amdahl’s law assumes unbounded scalability
  • Migration overhead: the model assumes zero migration overhead
  • Perfect scheduling: the model assumes an oracle scheduler

Actual savings are going to be lower

slide-26
SLIDE 26
  • Potential for power savings in datacenters using AMPs
  • Parallel Speedup more promising than Energy Scaling
  • Practical considerations to realize full benefits

Future work: Extend our analysis to functional asymmetry