Energy Proportionality and Workload Consolidation for Latency-critical Applications
George Prekas, Mia Primorac, Adam Belay, Christos Kozyrakis, Edouard Bugnion

Latency-critical applications [OSDI14]
- Memcached, Facebook USR workload, 2752 connections
- Server: Xeon E5-2665 @ 2.4 GHz, 8 cores and 16 HTs, Intel x520
[Figure: 99th percentile latency (µs) vs. throughput (RPS × 10^6) for Linux and IX]
[Figure: 99th percentile latency (µs) vs. throughput (RPS × 10^6) for Linux and IX, with the latency service-level objective (SLO) marked]
[Figure: power (W) and 99th percentile latency (µs) vs. throughput (RPS × 10^6) for Linux and IX]
Over-configurations drain a lot of power.
What about energy efficiency?

Static configurations trade-off
[Figure: power (W) vs. Linux memcached MRPS at SLO for the nominal configuration (2.4 GHz, 8 cores)]
[Figure: power (W) vs. Linux memcached MRPS at SLO, adding the min configuration (1.2 GHz, 1 core) to the nominal one (2.4 GHz, 8 cores)]
[Figure: power (W) vs. Linux memcached MRPS at SLO for the min (1.2 GHz, 2 cores), nominal (2.4 GHz, 8 cores), and max (TurboBoost, 8 cores) configurations]
[Figure: power (W) vs. Linux memcached MRPS at SLO for all 224 static configurations, with min (1.2 GHz, 2 cores), nominal (2.4 GHz, 8 cores), and max (TurboBoost, 8 cores) marked]
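The slides do not break the 224 configurations down. One factorization that gives exactly 224, assumed here only for illustration, is 8 core counts × hyperthreading on/off × 14 frequency settings (1.2 GHz to 2.4 GHz in 0.1 GHz steps, plus TurboBoost). A minimal sketch that enumerates this hypothetical space:

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Union

# Hypothetical configuration space: these dimensions are an assumption,
# chosen only because their product matches the 224 configurations shown.
CORE_COUNTS = range(1, 9)                                   # 1..8 cores
HYPERTHREADING = (False, True)                              # HT off / on
FREQS_GHZ = [round(1.2 + 0.1 * i, 1) for i in range(13)]    # 1.2..2.4 GHz
FREQ_SETTINGS: List[Union[float, str]] = FREQS_GHZ + ["turbo"]  # 13 DVFS steps + TurboBoost


@dataclass(frozen=True)
class StaticConfig:
    cores: int
    ht: bool
    freq: Union[float, str]  # GHz, or "turbo"


def enumerate_configs() -> List[StaticConfig]:
    """Every combination of core count, HT setting, and frequency setting."""
    return [StaticConfig(c, h, f)
            for c, h, f in product(CORE_COUNTS, HYPERTHREADING, FREQ_SETTINGS)]


configs = enumerate_configs()
print(len(configs))  # 224
```

Each configuration would then be measured once (throughput sustainable at the SLO, power) to place it as one point on the plot.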

Pareto frontier
[Figure: power (W) vs. Linux memcached MRPS at SLO for the 224 static configurations, with the Pareto frontier and the min (1.2 GHz, 2 cores), nominal (2.4 GHz, 8 cores), and max (TurboBoost, 8 cores) configurations marked]
Theoretical optimum: pick the best static configuration for any given load level.
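The frontier and the "theoretical optimum" can be computed offline from per-configuration measurements. A minimal sketch, assuming each static configuration has been measured as a (throughput sustainable at the SLO, power) pair; the measurements below are made up for illustration and are not taken from the slides:

```python
from typing import List, Optional, Tuple

# (configuration name, MRPS sustained at the latency SLO, power in W)
Measurement = Tuple[str, float, float]


def pareto_frontier(points: List[Measurement]) -> List[Measurement]:
    """Configurations not dominated by any other.

    A configuration is dominated if another one sustains at least as much
    load at the SLO for no more power. Sweeping in order of increasing power
    (ties: higher throughput first), a point is on the frontier exactly when
    it beats the best throughput seen so far.
    """
    frontier: List[Measurement] = []
    best_mrps = float("-inf")
    for name, mrps, watts in sorted(points, key=lambda p: (p[2], -p[1])):
        if mrps > best_mrps:
            frontier.append((name, mrps, watts))
            best_mrps = mrps
    return frontier


def best_static_config(frontier: List[Measurement],
                       load_mrps: float) -> Optional[Measurement]:
    """Cheapest frontier configuration that still meets the SLO at this load."""
    for cfg in frontier:  # ordered by increasing power and throughput
        if cfg[1] >= load_mrps:
            return cfg
    return None  # load exceeds what any static configuration can serve


# Made-up measurements, for illustration only.
points: List[Measurement] = [
    ("1 core @ 1.2 GHz", 0.3, 30.0),
    ("2 cores @ 1.2 GHz", 0.6, 36.0),
    ("2 cores @ 2.4 GHz", 0.9, 48.0),   # dominated by 4 cores @ 1.2 GHz
    ("4 cores @ 1.2 GHz", 1.0, 45.0),
    ("8 cores @ 1.2 GHz", 1.5, 82.0),   # dominated by 8 cores @ 2.4 GHz
    ("8 cores @ 2.4 GHz", 2.0, 80.0),
]
frontier = pareto_frontier(points)
print([name for name, _, _ in frontier])
print(best_static_config(frontier, 0.8))  # ('4 cores @ 1.2 GHz', 1.0, 45.0)
```

Picking `best_static_config` at every instant is the bound a dynamic controller is compared against; it ignores the cost and delay of switching configurations.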

Potential energy savings
[Figure: power (W) vs. memcached MRPS at SLO for the IX Pareto frontier, the Linux Pareto frontier, IX max, and Linux max]

Contributions
- Dynamic resource controls
  – for low-latency, high-throughput dataplanes
  – supporting energy proportionality and workload consolidation policies
- Evaluation of dynamic resource controls vs.
  – the maximum configuration (static)
  – Pareto-optimal behavior (theoretical bound)

Dynamic resource control
[Figure: host kernel and userspace view of an IX dataplane serving the latency-critical workload next to a background task. The control-plane (CP) policy adds cores (C) to the dataplane one at a time, each with its own RX/TX queue pair; in the final build steps the background task is no longer shown.]

Key challenges
1. Which resources to add/remove?
  – inferred by Pareto analysis
  – different for energy proportionality and workload consolidation
2. When to add/remove resources?
  – need to design a stable control loop
  – different triggers to add/remove resources
3. How to add/remove cores?
  – fast, TCP-friendly rebalancing mechanism
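Challenge 2 asks for a stable control loop with different triggers for adding and removing resources. The slides do not give the rule, so the sketch below is hypothetical: it adds a core as soon as a load signal rises above a high threshold, but removes one only after the signal has stayed below a lower threshold for several consecutive intervals. The signal, both thresholds, and `remove_after` are assumptions; the gap between the thresholds (hysteresis) is what keeps the loop from oscillating.

```python
from dataclasses import dataclass


@dataclass
class HysteresisController:
    """Hypothetical add/remove-core controller with hysteresis.

    The load signal, both thresholds, and remove_after are assumptions,
    not values from IX.
    """
    min_cores: int
    max_cores: int
    add_above: float          # add a core when the signal exceeds this
    remove_below: float       # consider removing when the signal is below this
    remove_after: int = 3     # consecutive quiet intervals required to remove
    cores: int = 1
    quiet_intervals: int = 0

    def __post_init__(self) -> None:
        if not self.remove_below < self.add_above:
            raise ValueError("remove_below must be below add_above")
        self.cores = max(self.min_cores, min(self.cores, self.max_cores))

    def step(self, signal: float) -> int:
        """Process one measurement interval and return the new core count."""
        if signal > self.add_above:
            # Add trigger: react immediately to protect the SLO.
            self.quiet_intervals = 0
            if self.cores < self.max_cores:
                self.cores += 1
        elif signal < self.remove_below:
            # Remove trigger: only after a sustained quiet period.
            self.quiet_intervals += 1
            if self.quiet_intervals >= self.remove_after and self.cores > self.min_cores:
                self.cores -= 1
                self.quiet_intervals = 0
        else:
            # Dead band between the thresholds: hold steady.
            self.quiet_intervals = 0
        return self.cores


# Hypothetical signal: queuing delay in microseconds, one sample per interval.
ctl = HysteresisController(min_cores=1, max_cores=8, add_above=300.0, remove_below=50.0)
history = [ctl.step(s) for s in [400, 400, 100, 20, 20, 20, 20]]
print(history)  # [2, 3, 3, 3, 3, 2, 2]
```

Reacting fast on the way up and slowly on the way down trades a little extra energy for fewer SLO violations.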

#1: Resource adjustment policies
[Figure: power (W) along the Pareto frontier, annotated with the resource adjustments +core, +ht, +dvfs, and +Turbo]
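The figure annotates the frontier with four kinds of adjustment: +core, +ht, +dvfs, and +Turbo. The sketch below assumes, for illustration only, that an energy-proportionality policy walks them as an ordered ladder (cores first, then hyperthreads, then frequency steps, then Turbo) and steps back down in reverse. The order, the frequency steps, and the names `build_ladder` and `LadderPolicy` are hypothetical; per the key challenges, a workload-consolidation policy would use a different ladder.

```python
from typing import List, NamedTuple, Tuple


class Config(NamedTuple):
    cores: int
    ht: bool
    freq_ghz: float
    turbo: bool


def build_ladder(max_cores: int = 8,
                 freqs_ghz: Tuple[float, ...] = (1.2, 1.6, 2.0, 2.4)) -> List[Config]:
    """Hypothetical ordered ladder: +core, then +ht, then +dvfs, then +Turbo."""
    low = freqs_ghz[0]
    ladder = [Config(c, False, low, False) for c in range(1, max_cores + 1)]  # +core
    ladder.append(Config(max_cores, True, low, False))                        # +ht
    ladder += [Config(max_cores, True, f, False) for f in freqs_ghz[1:]]      # +dvfs
    ladder.append(Config(max_cores, True, freqs_ghz[-1], True))               # +Turbo
    return ladder


class LadderPolicy:
    """Move one rung up when more capacity is needed, one rung down otherwise."""

    def __init__(self, ladder: List[Config]) -> None:
        self.ladder = ladder
        self.pos = 0  # start at the cheapest configuration

    @property
    def current(self) -> Config:
        return self.ladder[self.pos]

    def scale_up(self) -> Config:
        self.pos = min(self.pos + 1, len(self.ladder) - 1)
        return self.current

    def scale_down(self) -> Config:
        self.pos = max(self.pos - 1, 0)
        return self.current


policy = LadderPolicy(build_ladder())
for _ in range(8):
    policy.scale_up()
print(policy.current)  # Config(cores=8, ht=True, freq_ghz=1.2, turbo=False)
```

Deriving the ladder offline from the Pareto analysis keeps the runtime decision trivial: the controller only has to decide "up" or "down".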

#2: Detection
- Centralized queues provide a single point of load detection at sub-second timescales
[Figure: per-core RX queues, a FIFO, TCP/IP processing, and the event-driven app]

#2: Detection (add)
[Figure: RX queues, FIFO, TCP/IP processing, and the event-driven app, with the queue's delay Q annotated: queuing delay < 300 µs]
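Because every packet passes through a centralized FIFO, load can be read at a single point: how long the oldest queued entry has waited. A minimal sketch, assuming each entry is timestamped on enqueue and that another core is requested when this head-of-line delay exceeds a threshold. The 300 µs default echoes the figure's "queuing delay < 300 µs" annotation; using it as the add trigger, and the names `TimestampedFifo` and `should_add_core`, are assumptions.

```python
from collections import deque
from typing import Any, Deque, Optional, Tuple


class TimestampedFifo:
    """Centralized FIFO whose entries carry their enqueue time in microseconds."""

    def __init__(self) -> None:
        self._q: Deque[Tuple[float, Any]] = deque()

    def enqueue(self, pkt: Any, now_us: float) -> None:
        self._q.append((now_us, pkt))

    def dequeue(self) -> Optional[Any]:
        return self._q.popleft()[1] if self._q else None

    def queuing_delay_us(self, now_us: float) -> float:
        """How long the oldest waiting entry has been queued (0 when empty)."""
        return now_us - self._q[0][0] if self._q else 0.0


def should_add_core(fifo: TimestampedFifo, now_us: float,
                    threshold_us: float = 300.0) -> bool:
    """Hypothetical add trigger: head-of-line delay above the threshold."""
    return fifo.queuing_delay_us(now_us) > threshold_us


fifo = TimestampedFifo()
fifo.enqueue("a", now_us=0.0)
fifo.enqueue("b", now_us=100.0)
print(should_add_core(fifo, now_us=250.0))  # False: "a" has waited 250 us
print(should_add_core(fifo, now_us=350.0))  # True: "a" has waited 350 us
fifo.dequeue()
print(should_add_core(fifo, now_us=350.0))  # False: "b" has waited 250 us
```

Head-of-line delay reacts within one packet's wait, which is why a single queue allows detection at sub-second timescales instead of averaging utilization over long windows.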

#3: TCP-friendly add/remove core
- Challenge: maintain the coherence-free IX design
  – queues dedicated to cores
  – lock-free TCP stack
[Figure: the NIC's RSS hash maps flow groups to cores; each core runs its own TCP/IP stack and app]
Challenge: inherent race condition in the NIC hardware update
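RSS-capable NICs hash header fields of each packet and use part of the hash to index a small indirection table whose entries select a receive queue; the flow groups in the figure can be pictured as entries of that table, each owned by one core. The sketch models only that steady-state mapping and a rebalance that moves as few flow groups as possible when a core is added or removed. It uses `zlib.crc32` as a stand-in for the NIC's Toeplitz hash and assumes a 128-entry table; it does not model the race during the hardware update.

```python
import zlib
from collections import Counter
from typing import Dict, List, Tuple

NUM_FLOW_GROUPS = 128  # assumed indirection-table size

FlowTuple = Tuple[str, int, str, int]  # (src_ip, src_port, dst_ip, dst_port)


def flow_group(flow: FlowTuple) -> int:
    """Map a flow to a flow group. CRC32 stands in for the NIC's Toeplitz hash."""
    key = "|".join(map(str, flow)).encode()
    return zlib.crc32(key) % NUM_FLOW_GROUPS


def rebalance(table: List[int], cores: List[int]) -> List[int]:
    """Reassign flow groups (table index -> core) over `cores`, moving as few as possible."""
    base, extra = divmod(len(table), len(cores))
    quota = {c: base + (1 if i < extra else 0) for i, c in enumerate(cores)}
    load = {c: 0 for c in cores}
    new = list(table)
    orphans = []
    for group, owner in enumerate(table):
        if owner in load and load[owner] < quota[owner]:
            load[owner] += 1        # group stays on its current core
        else:
            orphans.append(group)   # owner was removed or is over quota
    for group in orphans:
        core = next(c for c in cores if load[c] < quota[c])
        new[group] = core
        load[core] += 1
    return new


def migrations(old: List[int], new: List[int]) -> Dict[int, Tuple[int, int]]:
    """Flow groups whose core changes: group -> (from_core, to_core)."""
    return {g: (o, n) for g, (o, n) in enumerate(zip(old, new)) if o != n}


two_cores = rebalance([0] * NUM_FLOW_GROUPS, [0, 1])
three_cores = rebalance(two_cores, [0, 1, 2])
print(Counter(two_cores), Counter(three_cores))
print(len(migrations(two_cores, three_cores)))
```

Keeping migrations to the minimum matters because every moved flow group has to go through the migration step, during which its packets must be neither dropped nor reordered.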

#3: Flow-group migration algorithm
- TCP-friendly
- Without dropping or reordering packets
Completes in less than 2 ms 95% of the time
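The slide states the goals (TCP-friendly, no dropped or reordered packets) but not the steps. The sketch below is a simplified, single-threaded model of one way to meet them, not the IX implementation: once the NIC steers the flow group to the destination core, the source core drains its queue and forwards the group's already-queued packets instead of processing them, and the destination processes those forwarded packets before anything from its own hardware queue. `Core` and `migrate_flow_group` are hypothetical names; a real implementation also has to synchronize the two cores and move the per-flow TCP state.

```python
from collections import deque
from typing import Deque, List, Tuple

Packet = Tuple[int, int]  # (flow_group, sequence number within that group)


class Core:
    def __init__(self, name: str) -> None:
        self.name = name
        self.hw_queue: Deque[Packet] = deque()  # filled by the NIC
        self.delivered: List[Packet] = []       # handed to this core's TCP/IP stack


def migrate_flow_group(group: int, src: Core, dst: Core) -> None:
    """Move `group` from src to dst without dropping or reordering its packets.

    Precondition (assumed): the NIC table already steers `group` to dst, so the
    group's packets in src.hw_queue arrived before the update and those in
    dst.hw_queue arrived after it.
    """
    # 1. src drains its queue; packets of the migrating group are forwarded
    #    rather than processed, so from now on only dst processes the group.
    handoff: Deque[Packet] = deque()
    while src.hw_queue:
        pkt = src.hw_queue.popleft()
        if pkt[0] == group:
            handoff.append(pkt)
        else:
            src.delivered.append(pkt)
    # 2. dst processes the older, forwarded packets first, then its own queue,
    #    so the group's packets reach TCP in arrival order.
    while handoff:
        dst.delivered.append(handoff.popleft())
    while dst.hw_queue:
        dst.delivered.append(dst.hw_queue.popleft())


src, dst = Core("src"), Core("dst")
src.hw_queue.extend([(7, 1), (3, 1), (7, 2)])  # queued before the NIC update
dst.hw_queue.extend([(7, 3), (5, 1), (7, 4)])  # arrived after the NIC update
migrate_flow_group(7, src, dst)
print(src.delivered)  # [(3, 1)]
print(dst.delivered)  # [(7, 1), (7, 2), (7, 3), (5, 1), (7, 4)]
```

Processing forwarded packets first is what closes the window opened by the NIC race: packets that landed on the old core before the table update are still delivered ahead of the newer ones.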

Experimental setup
- Latency-sensitive workload: memcached
  – energy proportionality; workload consolidation
- 3 demanding synthetic load patterns:
  – slope, step, sine+noise
  – 4-min cycle time
- 10 load-generating clients + 1 latency-measuring client sampling at 1-second intervals
- 2752 connections; Poisson distribution
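A load generator for the three patterns can be sketched as a target-rate function of time plus Poisson arrivals (exponentially distributed inter-arrival times) at that rate. Only the 4-minute cycle comes from the slide; the pattern shapes, the scaled-down peak rate, and the noise level below are hypothetical.

```python
import math
import random
from typing import List

CYCLE_S = 240.0       # 4-minute cycle (from the slide)
PEAK_RPS = 10_000.0   # hypothetical peak, scaled down for a quick simulation


def target_rate(pattern: str, t: float, rng: random.Random) -> float:
    """Requested load (RPS) at time t seconds for one of the three patterns."""
    phase = (t % CYCLE_S) / CYCLE_S            # position within the cycle, 0..1
    if pattern == "slope":                      # ramp up, then back down
        frac = 2 * phase if phase < 0.5 else 2 * (1 - phase)
    elif pattern == "step":                     # low half, then high half
        frac = 0.2 if phase < 0.5 else 0.8
    elif pattern == "sine+noise":               # sine wave plus Gaussian noise
        frac = 0.5 + 0.4 * math.sin(2 * math.pi * phase) + rng.gauss(0.0, 0.05)
    else:
        raise ValueError(f"unknown pattern: {pattern}")
    return max(0.0, min(1.0, frac)) * PEAK_RPS


def poisson_arrivals(rate_rps: float, duration_s: float,
                     rng: random.Random) -> List[float]:
    """Arrival times of a Poisson process with the given rate over duration_s."""
    if rate_rps <= 0:
        return []
    t, times = 0.0, []
    while True:
        t += rng.expovariate(rate_rps)  # exponential gap, mean 1 / rate
        if t >= duration_s:
            return times
        times.append(t)


rng = random.Random(1)
for t in (0.0, 60.0, 120.0, 180.0):
    rate = target_rate("step", t, rng)
    print(t, rate, len(poisson_arrivals(rate, 1.0, rng)))
```

Holding the rate constant within each 1-second window, as in the loop above, is a simplification; a generator could also recompute the target rate per request.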

Evaluation – step pattern
[Figure: achieved MRPS and 99th percentile latency (µs) over time (s), with the point where a core is added marked]
Adequate compliance: violations last ~1 second

Evaluation – step pattern
[Figure: power (W) over time (s) for the max, dynamic, and Pareto configurations]
Average: max = 91 W, dynamic = 48 W, Pareto = 41 W

Evaluation
                                              Slope   Step   Sine+noise
Energy proportionality (W)
  Max. power                                  91      92     94
  Measured                                    42      48     53
  Pareto-optimal                              39      41     45
Server consolidation opportunity (% of peak)
  Measured                                    46%     39%    32%
  Pareto-optimal                              50%     47%    39%

Saving 44%-54% of processor energy → 85%-93% of the Pareto-optimal bound
Running the background job at 32%-46% of its standalone throughput → 82%-92% of the Pareto-optimal bound
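The summary percentages can be reproduced from the table. Judging from the numbers, energy savings are 1 − measured/max power, the fraction of the energy bound is Pareto-optimal power / measured power, and the consolidation fraction is measured / Pareto-optimal background throughput; these definitions are inferred from the figures, not stated on the slide.

```python
# Values from the evaluation table.
MAX_W = {"slope": 91, "step": 92, "sine+noise": 94}
MEASURED_W = {"slope": 42, "step": 48, "sine+noise": 53}
PARETO_W = {"slope": 39, "step": 41, "sine+noise": 45}
MEASURED_BG = {"slope": 46, "step": 39, "sine+noise": 32}  # % of peak
PARETO_BG = {"slope": 50, "step": 47, "sine+noise": 39}    # % of peak


def pct(x: float) -> int:
    """Fraction to a whole percentage."""
    return round(100 * x)


savings = {p: pct(1 - MEASURED_W[p] / MAX_W[p]) for p in MAX_W}
energy_vs_bound = {p: pct(PARETO_W[p] / MEASURED_W[p]) for p in MAX_W}
consolidation_vs_bound = {p: pct(MEASURED_BG[p] / PARETO_BG[p]) for p in MAX_W}

print(savings)                 # {'slope': 54, 'step': 48, 'sine+noise': 44}
print(energy_vs_bound)         # {'slope': 93, 'step': 85, 'sine+noise': 85}
print(consolidation_vs_bound)  # {'slope': 92, 'step': 83, 'sine+noise': 82}
```

The minima and maxima of these three dictionaries give the 44%-54%, 85%-93%, and 82%-92% ranges quoted above.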

Conclusion
- Real challenges for latency-sensitive applications:
  – maintain service-level objectives while
  – minimizing energy consumption, or
  – maximizing workload consolidation
- Design using Pareto methodology to determine the theoretical bound and derive control policies
- Implement dynamic resource controls in the IX dataplane operating system