Energy Proportionality and Workload Consolidation for Latency-critical Applications


SLIDE 1

Energy Proportionality and Workload Consolidation for Latency-critical Applications

George Prekas, Mia Primorac, Adam Belay, Christos Kozyrakis, Edouard Bugnion

SLIDE 2

Latency-critical applications [OSDI14]

  • Memcached, Facebook USR workload, 2752 connections
  • Server: Xeon E5-2665 @ 2.4 GHz, 8 cores and 16 HTs, Intel x520

[Figure: 99th % latency (µs) vs. throughput (RPS x 10^6); curves: Linux, IX]

SLIDE 3

Latency-critical applications [OSDI14]

  • Memcached, Facebook USR workload, 2752 connections
  • Server: Xeon E5-2665 @ 2.4 GHz, 8 cores and 16 HTs, Intel x520

[Figure: 99th % latency (µs) vs. throughput (RPS x 10^6); curves: Linux, IX; SLO marked]

SLIDE 4

What about energy efficiency?

  • Memcached, Facebook USR workload, 2752 connections
  • Server: Xeon E5-2665 @ 2.4 GHz, 8 cores and 16 HTs, Intel x520

[Figure: 99th % latency (µs) and power (W) vs. throughput (RPS x 10^6); curves: Linux, IX]

Over-configurations drain a lot of power

SLIDE 5

Static configurations trade-off

[Figure: power (W) vs. Memcached MRPS at SLO on Linux; curve: nominal (2.4 GHz, 8 cores)]

SLIDE 6

Static configurations trade-off

[Figure: power (W) vs. Memcached MRPS at SLO on Linux; curves: min (1.2 GHz, 1 core), nominal (2.4 GHz, 8 cores)]

SLIDE 7

Static configurations trade-off

[Figure: power (W) vs. Memcached MRPS at SLO on Linux; curves: min (1.2 GHz, 2 cores), nominal (2.4 GHz, 8 cores), max (TurboBoost, 8 cores)]

SLIDE 8

Static configurations trade-off

224 static configurations

[Figure: power (W) for all 224 static configurations; highlighted: min (1.2 GHz, 2 cores), nominal (2.4 GHz, 8 cores), max (TurboBoost, 8 cores)]
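
The slide does not say how the 224 configurations break down. One decomposition that is consistent with the numbers shown (8 cores, 16 hyperthreads, 1.2 GHz minimum, 2.4 GHz nominal, TurboBoost) is sketched below. The 0.1 GHz frequency step and the on/off hyperthreading knob are assumptions, not something the slides state.

```python
from itertools import product

# Assumed knobs (not stated on the slide): 1-8 cores, hyperthreading on/off,
# and 13 DVFS steps from 1.2 to 2.4 GHz in 0.1 GHz increments plus TurboBoost.
CORES = range(1, 9)
HYPERTHREADS = (False, True)
FREQUENCIES = [round(1.2 + 0.1 * i, 1) for i in range(13)] + ["turbo"]


def static_configurations():
    """Return every (cores, hyperthreading, frequency) combination."""
    return list(product(CORES, HYPERTHREADS, FREQUENCIES))


configs = static_configurations()
print(len(configs))  # 8 * 2 * 14 = 224
```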

SLIDE 9

Pareto Frontier

[Figure: power (W) for the 224 static configurations on Linux; highlighted: min (1.2 GHz, 2 cores), nominal (2.4 GHz, 8 cores), max (TurboBoost, 8 cores); Pareto frontier drawn]

Theoretical optimum: pick the best static configuration for any given load level
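
A minimal sketch of how such a frontier could be computed from measurements, assuming each static configuration is reduced to a single point: the highest load it serves within the SLO and the power it draws there. The `Config` type, the field names, and the sample values are hypothetical; the slides plot full power curves rather than single points.

```python
from typing import NamedTuple


class Config(NamedTuple):
    name: str
    max_mrps_at_slo: float  # highest load served within the SLO
    power_w: float          # power drawn at that load


def pareto_frontier(configs):
    """Keep configurations that no other configuration dominates.

    A configuration is dominated if another one serves at least as much
    load for no more power, and is strictly better on one of the two.
    """
    frontier = []
    # Scan by ascending power; on ties, higher throughput comes first.
    for c in sorted(configs, key=lambda c: (c.power_w, -c.max_mrps_at_slo)):
        if not frontier or c.max_mrps_at_slo > frontier[-1].max_mrps_at_slo:
            frontier.append(c)
    return frontier


def best_static_config(frontier, load_mrps):
    """Cheapest frontier configuration that can still serve load_mrps."""
    for c in frontier:  # frontier is ordered by ascending power
        if c.max_mrps_at_slo >= load_mrps:
            return c
    return None  # load exceeds what any configuration can serve


# Made-up sample points, for illustration only.
samples = [
    Config("a", 0.5, 20.0),
    Config("b", 1.0, 35.0),
    Config("c", 0.8, 40.0),  # dominated by "b": less load for more power
    Config("d", 2.0, 60.0),
]
frontier = pareto_frontier(samples)
print([c.name for c in frontier])              # ['a', 'b', 'd']
print(best_static_config(frontier, 0.9).name)  # b
```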

SLIDE 10

Potential energy savings

[Figure: power (W) vs. Memcached MRPS at SLO; curves: IX Pareto, Linux Pareto, IX max, Linux max]

SLIDE 11

Contributions

  • Dynamic resource controls
    – For low-latency, high-throughput dataplanes
    – Supports energy proportionality and workload consolidation policies
  • Evaluation of dynamic resource controls vs.
    – Maximum configuration (static)
    – Pareto-optimal behavior (theoretical bound)


SLIDE 12

Dynamic resource control

[Diagram labels: host kernel; userspace; IX dataplane running the latency-critical workload; background task; CP with policy; cores (C) with RX/TX queue pairs (2 pairs shown); annotation: Add a core]

SLIDE 13

Dynamic resource control

[Diagram labels: host kernel; userspace; IX dataplane running the latency-critical workload; background task; CP with policy; cores (C) with RX/TX queue pairs (3 pairs shown)]

SLIDE 14

Dynamic resource control

[Diagram labels: host kernel; userspace; IX dataplane running the latency-critical workload; CP with policy; cores (C) with RX/TX queue pairs (4 pairs shown)]

SLIDE 15

Dynamic resource control

[Diagram labels: host kernel; userspace; IX dataplane running the latency-critical workload; CP with policy; five cores (C) with RX/TX queue pairs]

SLIDE 16

Key challenges

1. Which resources to add/remove?
    – Inferred by Pareto analysis
    – Different for energy proportionality and workload consolidation
2. When to add/remove resources?
    – Need to design a stable control loop
    – Different triggers to add/remove resources
3. How to add/remove cores?
    – Fast, TCP-friendly rebalancing mechanism


SLIDE 17

#1: Resource Adjustment Policies

[Figure: power (W) plot with point A and policy steps labeled +core, +ht, +dvfs, +Turbo]

SLIDE 18

#2: Detection

  • Centralized queues provide a single point of load detection in sub-second timescales

[Diagram labels: RX FIFO, RX, TCP/IP, event-driven app, TCP/IP]

SLIDE 19

#2: Detection (add)

  • Centralized queues provide a single point of load detection in sub-second timescales

[Diagram labels: RX FIFO, RX, TCP/IP, event-driven app, TCP/IP; queue Q annotated: queuing delay < 300 µs]
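
A toy control loop illustrating the idea of separate add and remove triggers. The slide gives only the label "queuing delay < 300 µs"; treating 300 µs as the bound whose violation triggers an added core is an assumption, and the remove threshold, the hold time, and the `CoreController` class are hypothetical.

```python
class CoreController:
    """Toy add/remove policy driven by measured queuing delay."""

    def __init__(self, min_cores=1, max_cores=8,
                 add_delay_us=300.0,    # from the slide label; its role is assumed
                 remove_delay_us=50.0,  # hypothetical low-water mark
                 calm_intervals=5):     # hypothetical hold time before removing
        self.cores = min_cores
        self.min_cores = min_cores
        self.max_cores = max_cores
        self.add_delay_us = add_delay_us
        self.remove_delay_us = remove_delay_us
        self.calm_intervals = calm_intervals
        self._calm = 0  # consecutive intervals spent below the low-water mark

    def step(self, queuing_delay_us):
        """Consume one measurement and return the new core count."""
        if queuing_delay_us >= self.add_delay_us:
            # Add trigger: react immediately to protect the SLO.
            self._calm = 0
            self.cores = min(self.cores + 1, self.max_cores)
        elif queuing_delay_us <= self.remove_delay_us:
            # Remove trigger: require sustained calm to avoid oscillation.
            self._calm += 1
            if self._calm >= self.calm_intervals and self.cores > self.min_cores:
                self.cores -= 1
                self._calm = 0
        else:
            self._calm = 0
        return self.cores
```

Adding eagerly and removing only after a sustained calm period is one common way to keep such a loop stable; the asymmetry mirrors the "different triggers" point on the Key challenges slide.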

SLIDE 20

#3: TCP-friendly add/remove core

  • Challenge: maintain coherence-free IX design
    – Queues dedicated to cores
    – Lock-free TCP stack

[Diagram labels: NIC; RSS hash; flow groups; four cores, each with its own TCP/IP stack and App]

Challenge: inherent race condition in NIC HW update
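
A simplified model of the mapping in the diagram: a hash assigns each connection to a flow group, and a table assigns each flow group to the core that owns its queue. This is a sketch under assumptions: `zlib.crc32` stands in for the NIC's RSS hash, and the table size of 128 and the round-robin assignment are illustrative choices, not taken from the slides.

```python
import zlib

NUM_FLOW_GROUPS = 128  # assumed size of the flow-group table


def flow_group(src_ip, src_port, dst_ip, dst_port):
    """Stand-in for the NIC's RSS hash: map a connection to a flow group."""
    key = f"{src_ip}:{src_port}-{dst_ip}:{dst_port}".encode()
    return zlib.crc32(key) % NUM_FLOW_GROUPS


def assign_round_robin(num_cores):
    """Table indexed by flow group, giving the core that owns its queue."""
    return [fg % num_cores for fg in range(NUM_FLOW_GROUPS)]


def groups_to_migrate(old_table, new_table):
    """Flow groups whose owning core changes between two tables."""
    return [fg for fg, (a, b) in enumerate(zip(old_table, new_table)) if a != b]


table = assign_round_robin(4)
fg = flow_group("10.0.0.1", 40000, "10.0.0.2", 11211)
print(fg, "-> core", table[fg])
```

Because every packet of a connection hashes to the same flow group, and each flow group belongs to exactly one core, the per-core TCP stacks never share connection state. Round-robin reassignment moves many groups when the core count changes; a real rebalancer would likely try to move as few as possible.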

SLIDE 21

#3: Flow-group Migration Algorithm

  • TCP-friendly
  • Without dropping or reordering packets

Completes in less than 2 ms 95% of the time
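
The slides do not spell out the migration algorithm. The toy simulation below shows one way the "no drops, no reordering" property could be kept despite the race with the NIC update: the destination core parks packets of the migrating flow group until the source core has processed everything it already received. All names here (`Core`, `begin_migration`, `finish_migration`) are hypothetical, and this is not presented as the authors' mechanism.

```python
from collections import deque


class Core:
    """Toy core with a private receive queue (no shared state, no locks)."""

    def __init__(self, name, log):
        self.name = name
        self.rx = deque()  # packets ready to be processed
        self.parked = {}   # flow group -> packets held back during migration
        self.log = log     # shared record of processing order, for checking

    def receive(self, fg, seq):
        if fg in self.parked:
            self.parked[fg].append((fg, seq))  # hold until the source is done
        else:
            self.rx.append((fg, seq))

    def drain(self):
        while self.rx:
            fg, seq = self.rx.popleft()
            self.log.append((self.name, fg, seq))


def begin_migration(fg, dst, table):
    dst.parked[fg] = deque()  # destination parks packets of fg from now on
    table[fg] = dst           # simulated NIC update; old core may still get packets


def finish_migration(fg, src, dst):
    src.drain()                        # source processes everything it received
    dst.rx.extend(dst.parked.pop(fg))  # parked packets follow, in arrival order


log = []
a, b = Core("a", log), Core("b", log)
table = {7: a}
table[7].receive(7, 1)
begin_migration(7, b, table)
a.receive(7, 2)         # in-flight packet that raced with the table update
table[7].receive(7, 3)  # now lands on core b and is parked
finish_migration(7, a, b)
b.drain()
print([seq for _, _, seq in log])  # [1, 2, 3]
```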

SLIDE 22

Experimental setup

  • Latency-sensitive workload: memcached
    – Energy proportionality; workload consolidation
  • 3 demanding synthetic load patterns:
    – Slope, step, sine+noise
    – 4 min cycle time
  • 10 load-generating clients + 1 latency-measuring client at 1-second intervals
  • 2752 connections; Poisson distribution
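
A sketch of what such load generators could look like. Only the pattern names, the 4-minute cycle, and the word "Poisson" come from the slide; the exact shapes (triangle ramp, four-level staircase, uniform noise) and the reading of "Poisson" as exponentially distributed inter-arrival gaps are assumptions.

```python
import math
import random

CYCLE_S = 240.0  # 4-minute cycle from the slide


def slope(t, peak):
    """Triangle wave: ramp from 0 up to peak and back down over one cycle."""
    phase = (t % CYCLE_S) / CYCLE_S
    return peak * (2 * phase if phase < 0.5 else 2 * (1 - phase))


def step(t, peak, levels=4):
    """Staircase with `levels` plateaus per cycle (level count is assumed)."""
    phase = (t % CYCLE_S) / CYCLE_S
    return peak * (int(phase * levels) + 1) / levels


def sine_noise(t, peak, noise=0.1, rng=random):
    """Sine wave plus uniform noise, clamped to [0, peak]."""
    base = 0.5 * peak * (1 + math.sin(2 * math.pi * t / CYCLE_S))
    jitter = peak * noise * rng.uniform(-1, 1)
    return min(peak, max(0.0, base + jitter))


def poisson_arrivals(rate_rps, duration_s, rng=random):
    """Arrival timestamps with exponentially distributed gaps (rate_rps > 0)."""
    t, arrivals = 0.0, []
    while True:
        t += rng.expovariate(rate_rps)
        if t >= duration_s:
            return arrivals
        arrivals.append(t)
```

A client would sample one of the pattern functions once per second to get the target rate, then draw arrivals for that second with `poisson_arrivals`.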


SLIDE 23

Evaluation – step pattern

[Figure: achieved MRPS and 99th pct latency (µs) vs. time (s); annotation: Add a core]

Adequate compliance: violations ~ 1 second

SLIDE 24

Evaluation – step pattern

[Figure: power (W) vs. time (s); curves: max, dynamic, Pareto]

Average: max = 91 W, dynamic = 48 W, Pareto = 41 W

SLIDE 25

Evaluation

Energy Proportionality (W)
                    Slope   Step   Sine+noise
Max. power            91     92       94
Measured              42     48       53
Pareto-optimal        39     41       45

Server Consolidation Opportunity (% of peak)
                    Slope   Step   Sine+noise
Measured             46%    39%      32%
Pareto-optimal       50%    47%      39%

Saving 44%-54% of processor energy → 85%-93% of Pareto-optimal bound
Running background jobs at 32%-46% of their standalone throughput → 82%-92% of the Pareto-optimal bound
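
The summary percentages can be recomputed from the table. The sketch below uses one reading that reproduces the slide's ranges: savings are measured against max power, and closeness to the bound is the ratio of the Pareto-optimal value to the measured one (power) or measured to Pareto-optimal (consolidation). That interpretation is an assumption; the slide does not define the ratios.

```python
# Rows copied from the slide's table: (pattern, max W, measured W, Pareto W).
POWER = [("slope", 91, 42, 39), ("step", 92, 48, 41), ("sine+noise", 94, 53, 45)]
# (pattern, measured %, Pareto-optimal %) for the consolidation opportunity.
CONSOLIDATION = [("slope", 46, 50), ("step", 39, 47), ("sine+noise", 32, 39)]


def energy_summary(rows):
    """Per pattern: (% of max power saved, % of Pareto-optimal bound reached)."""
    out = {}
    for name, max_w, measured_w, pareto_w in rows:
        saving = 1 - measured_w / max_w    # fraction of max power saved
        vs_pareto = pareto_w / measured_w  # 1.0 would mean matching the bound
        out[name] = (round(100 * saving), round(100 * vs_pareto))
    return out


def consolidation_summary(rows):
    """Per pattern: % of the Pareto-optimal consolidation opportunity reached."""
    return {name: round(100 * measured / pareto) for name, measured, pareto in rows}


print(energy_summary(POWER))
# {'slope': (54, 93), 'step': (48, 85), 'sine+noise': (44, 85)}
print(consolidation_summary(CONSOLIDATION))
# {'slope': 92, 'step': 83, 'sine+noise': 82}
```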

SLIDE 26

Conclusion

  • Real challenges to latency-sensitive applications
    – Maintain service-level objectives while
    – Minimize energy consumption or
    – Maximize workload consolidation
  • Design using Pareto methodology to determine theoretical bound and derive control policies
  • Implement dynamic resource controls in the IX dataplane operating system


SLIDE 27

Thank you