Energy Proportionality and Workload Consolidation for Latency-critical Applications




  1. Energy Proportionality and Workload Consolidation for Latency-critical Applications
     George Prekas, Mia Primorac, Adam Belay, Christos Kozyrakis, Edouard Bugnion

  2. Latency-critical applications [OSDI14]
     [Figure: 99th-percentile latency (µs) vs. throughput (RPS × 10^6); curves: Linux, IX]
     • Memcached, Facebook USR workload, 2752 connections
     • Server: Xeon E5-2665 @ 2.4 GHz, 8 cores and 16 HTs, Intel x520

  3. Latency-critical applications [OSDI14]
     [Figure: 99th-percentile latency (µs) vs. throughput (RPS × 10^6); curves: Linux, IX; the SLO is marked]
     • Memcached, Facebook USR workload, 2752 connections
     • Server: Xeon E5-2665 @ 2.4 GHz, 8 cores and 16 HTs, Intel x520

  4. What about energy efficiency?
     [Figure: 99th-percentile latency (µs) and power (W) vs. throughput (RPS × 10^6); curves: Linux, IX]
     Over-configurations drain a lot of power
     • Memcached, Facebook USR workload, 2752 connections
     • Server: Xeon E5-2665 @ 2.4 GHz, 8 cores and 16 HTs, Intel x520

  5. Static configurations trade-off
     [Figure: power (W) vs. Memcached MRPS at SLO, on Linux; curve: nominal (2.4 GHz, 8 cores)]

  6. Static configurations trade-off
     [Figure: power (W) vs. Memcached MRPS at SLO, on Linux; curves: nominal (2.4 GHz, 8 cores), min (1.2 GHz, 1 core)]

  7. Static configurations trade-off
     [Figure: power (W) vs. Memcached MRPS at SLO, on Linux; curves: max (TurboBoost, 8 cores), nominal (2.4 GHz, 8 cores), min (1.2 GHz, 2 cores)]

  8. Static configurations trade-off
     [Figure: power (W) for 224 static configurations; highlighted: max (TurboBoost, 8 cores), nominal (2.4 GHz, 8 cores), min (1.2 GHz, 2 cores)]

  9. Pareto Frontier
     [Figure: power (W) for 224 static configurations on Linux, with the Pareto frontier drawn; highlighted: max (TurboBoost, 8 cores), nominal (2.4 GHz, 8 cores), min (1.2 GHz, 2 cores)]
     Theoretical optimum: pick the best static configuration for any given load level
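
The Pareto-frontier idea on this slide can be illustrated with a minimal Python sketch. It is a simplification, not the authors' tooling: each static configuration is reduced to a single point (the highest load it sustains at SLO, and the power it draws there), whereas the plot shows power across load levels. The `Config` type, the function names, and every number in `EXAMPLE_CONFIGS` are made up for illustration.

```python
from typing import NamedTuple


class Config(NamedTuple):
    name: str        # hypothetical label for a static configuration
    max_mrps: float  # highest load (MRPS) the configuration sustains at SLO
    power_w: float   # power drawn at that load, in watts


def pareto_frontier(configs):
    """Keep only configurations that no other configuration dominates.

    A configuration is dominated when another one sustains at least as
    much load for no more power.
    """
    frontier = []
    best_power = float("inf")
    # Walk from highest to lowest capacity; keep a configuration only if it
    # is strictly cheaper than everything with equal or greater capacity.
    for c in sorted(configs, key=lambda c: (-c.max_mrps, c.power_w)):
        if c.power_w < best_power:
            frontier.append(c)
            best_power = c.power_w
    return sorted(frontier, key=lambda c: c.max_mrps)


def best_config(frontier, load_mrps):
    """Cheapest frontier configuration that still sustains the load."""
    for c in frontier:  # ascending capacity, and therefore ascending power
        if c.max_mrps >= load_mrps:
            return c
    return None  # no static configuration can carry this load


# Made-up data points, NOT measurements from the talk.
EXAMPLE_CONFIGS = [
    Config("1.2GHz/2c", 1.0, 40.0),
    Config("1.2GHz/4c", 1.5, 55.0),
    Config("2.4GHz/2c", 1.4, 60.0),  # dominated by 1.2GHz/4c
    Config("2.4GHz/8c", 3.0, 85.0),
    Config("turbo/8c", 3.3, 95.0),
]
```

With this made-up data, `best_config(pareto_frontier(EXAMPLE_CONFIGS), 1.2)` selects the `1.2GHz/4c` entry, the cheapest point that still covers 1.2 MRPS.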

  10. Potential energy savings
      [Figure: power (W) vs. Memcached MRPS at SLO; curves: Linux max, IX max, Linux Pareto, IX Pareto]

  11. Contributions
      • Dynamic resource controls
        – for low-latency, high-throughput dataplanes
        – Supports energy proportionality and workload consolidation policies
      • Evaluation of dynamic resource controls vs.
        – Maximum configuration (static)
        – Pareto-optimal behavior (theoretical bound)

  12. Dynamic resource control
      [Diagram labels: dataplane, latency-critical workload, CP, background task, policy, userspace, IX, host kernel, 2 RX/TX queue pairs, 5 cores (C); annotation: "Add a core"]

  13. Dynamic resource control
      [Diagram labels: dataplane, latency-critical workload, CP, background task, policy, userspace, IX, host kernel, 3 RX/TX queue pairs, 5 cores (C)]

  14. Dynamic resource control
      [Diagram labels: dataplane, latency-critical workload, CP, policy, userspace, IX, host kernel, 4 RX/TX queue pairs, 5 cores (C); the background task is no longer shown]

  15. Dynamic resource control
      [Diagram labels: dataplane, latency-critical workload, CP, policy, userspace, IX, host kernel, 4 RX/TX queue pairs, 5 cores (C)]

  16. Key challenges
      1. Which resources to add/remove?
         – inferred by Pareto analysis
         – different for energy proportionality and workload consolidation
      2. When to add/remove resources?
         – Need to design a stable control loop
         – Different triggers to add/remove resources
      3. How to add/remove cores
         – Fast, TCP-friendly rebalancing mechanism

  17. #1: Resource Adjustment Policies
      [Figure: power (W); annotated adjustment steps: +Turbo, +core, +ht, +dvfs]

  18. #2: Detection
      • Centralized queues provide a single point of load detection in sub-second timescales
      [Diagram labels: RX, TCP/IP, event-driven app, FIFO]

  19. #2: Detection (add)
      • Centralized queues provide a single point of load detection in sub-second timescales
      [Diagram labels: RX, TCP/IP, event-driven app, FIFO, Q; annotation: queuing delay < 300 µs]
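
A control loop with different triggers for adding and removing resources, as slides 16 and 19 call for, might look like the Python sketch below. It assumes the 300 µs figure is the queuing-delay bound whose violation triggers an add. The remove trigger (`remove_threshold_us`, `calm_samples`) and the class itself are hypothetical and only show how asymmetric triggers keep the loop from oscillating.

```python
class CoreController:
    """Toy add/remove policy driven by one queuing-delay sample per tick."""

    def __init__(self, min_cores=1, max_cores=8,
                 add_threshold_us=300.0,    # assumed meaning of the slide's 300 µs
                 remove_threshold_us=50.0,  # hypothetical value
                 calm_samples=5):           # hypothetical value
        self.min_cores = min_cores
        self.max_cores = max_cores
        self.add_threshold_us = add_threshold_us
        self.remove_threshold_us = remove_threshold_us
        self.calm_samples = calm_samples
        self.cores = min_cores
        self.calm = 0  # consecutive samples below the remove threshold

    def update(self, queuing_delay_us):
        """Feed one sample; return the core count to use next."""
        if queuing_delay_us > self.add_threshold_us:
            # React immediately to overload: one more core per sample.
            self.calm = 0
            if self.cores < self.max_cores:
                self.cores += 1
        elif queuing_delay_us < self.remove_threshold_us:
            # Release a core only after a sustained calm period.
            self.calm += 1
            if self.calm >= self.calm_samples and self.cores > self.min_cores:
                self.cores -= 1
                self.calm = 0
        else:
            # In the dead band between the thresholds: hold steady.
            self.calm = 0
        return self.cores
```

Adding on a single bad sample but removing only after several calm ones trades a little energy for fewer SLO violations; the gap between the two thresholds is what keeps the loop from flapping.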

  20. #3: TCP-friendly add/remove core
      • Challenge: maintain coherence-free IX design
        – Queues dedicated to cores
        – Lock-free TCP stack
      • Challenge: inherent race condition in NIC HW update
      [Diagram: the NIC's RSS hash maps flow groups to per-core TCP/IP and App instances]

  21. #3: Flow-group Migration Algorithm
      • TCP-friendly
      • Without dropping or reordering packets
      Completes in less than 2 ms 95% of the time
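
The bookkeeping behind flow groups can be sketched in Python: a hash of the connection 4-tuple picks a flow group, and a table maps each flow group to one core, so a connection always lands on a single core. This covers only the table update. It does not model the NIC hardware race or the packet-ordering guarantees named on the slides. `NUM_FLOW_GROUPS`, the CRC32 stand-in for the NIC's RSS hash, and both function names are assumptions for illustration.

```python
import zlib

NUM_FLOW_GROUPS = 128  # assumption for illustration, not taken from the slides


def flow_group(src_ip, src_port, dst_ip, dst_port):
    """Map a connection 4-tuple to a flow group.

    CRC32 is a stand-in here; a real NIC computes its own RSS hash.
    """
    key = f"{src_ip}:{src_port}-{dst_ip}:{dst_port}".encode()
    return zlib.crc32(key) % NUM_FLOW_GROUPS


def add_core(table, new_core):
    """Reassign just enough flow groups to `new_core` to even out the table.

    `table[g]` is the core that owns flow group `g`; `new_core` must not
    own any group yet. Returns the list of migrated flow groups.
    """
    cores = set(table) | {new_core}
    target = len(table) // len(cores)  # fair share per core, rounded down
    counts = {c: 0 for c in cores}
    for owner in table:
        counts[owner] += 1
    moved = []
    for group, owner in enumerate(table):
        if len(moved) == target:
            break
        if counts[owner] > target:  # take only from cores above fair share
            table[group] = new_core
            counts[owner] -= 1
            moved.append(group)
    return moved
```

Moving whole flow groups, rather than individual connections, keeps the per-core queues and TCP state private to one core, in line with the coherence-free design goal on slide 20.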

  22. Experimental setup
      • Latency-sensitive workload: memcached
        – Energy proportionality; workload consolidation
      • 3 demanding synthetic load patterns:
        – slope, step, sine+noise
        – 4 min cycle time
      • 10 load-generating clients + 1 latency-measuring client at 1-second intervals
      • 2752 connections; Poisson distribution
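
For readers who want to reproduce loads of a similar shape, here is a hedged Python sketch of the three pattern families with the slide's 4-minute cycle. The exact waveforms, the number of steps, and the noise amplitude are not given on the slide, so those are guesses. "Poisson distribution" is read here as Poisson request arrivals, which is also an assumption.

```python
import math
import random

CYCLE_S = 240.0  # the slide's 4-minute cycle time


def _triangle(t):
    """0 -> 1 over the first half-cycle, 1 -> 0 over the second."""
    phase = (t % CYCLE_S) / CYCLE_S
    return 2 * phase if phase < 0.5 else 2 * (1 - phase)


def slope(t, peak):
    """Linear ramp up to `peak` and back down (assumed shape)."""
    return peak * _triangle(t)


def step(t, peak, levels=4):
    """Staircase up to `peak` and back down; `levels` is a guess."""
    k = min(levels, math.floor(_triangle(t) * levels) + 1)
    return peak * k / levels


def sine_noise(t, peak, rng, noise=0.1):
    """Sine-shaped load with uniform jitter, clamped to [0, peak]."""
    base = 0.5 * (1 - math.cos(2 * math.pi * t / CYCLE_S))
    return peak * min(1.0, max(0.0, base + rng.uniform(-noise, noise)))


def poisson_arrivals(rate_rps, duration_s, rng):
    """Timestamps of a Poisson process: exponential inter-arrival gaps."""
    t, arrivals = 0.0, []
    while True:
        t += rng.expovariate(rate_rps)
        if t >= duration_s:
            return arrivals
        arrivals.append(t)
```

Passing an explicit `random.Random(seed)` as `rng` makes a run repeatable, which helps when comparing a dynamic policy against a static configuration under the same load.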

  23. Evaluation – step pattern
      [Figure: achieved MRPS and 99th-percentile latency (µs) vs. time (s); annotation: "Add a core"]
      Adequate compliance: violations ~ 1 second

  24. Evaluation – step pattern
      [Figure: power (W) vs. time (s); curves: max, dynamic, Pareto]
      Average: max = 91 W, dynamic = 48 W, Pareto = 41 W

  25. Evaluation
      Energy proportionality (W):
                          Slope   Step   Sine+noise
        Max. power           91     92           94
        Measured             42     48           53
        Pareto optimal       39     41           45
      • Saving 44%-54% of processor energy
      • 85%-93% of Pareto-optimal bound
      Server consolidation opportunity (% of peak):
                          Slope   Step   Sine+noise
        Measured            46%    39%          32%
        Pareto optimal      50%    47%          39%
      • Running background job at 32%-46% of their standalone throughput
      • 82%-92% of the Pareto-optimal bound
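
The percentage ranges on this slide can be rechecked from its table with a few lines of Python. The formulas are an interpretation that happens to reproduce the slide's ranges: saving = 1 - measured/max; share of the energy bound = Pareto/measured; share of the consolidation bound = measured/Pareto. All constant and function names are illustrative.

```python
# Values copied from the slide's table, ordered (slope, step, sine+noise).
MAX_W = (91, 92, 94)
MEASURED_W = (42, 48, 53)
PARETO_W = (39, 41, 45)
CONSOLIDATION_MEASURED = (46, 39, 32)  # % of peak
CONSOLIDATION_PARETO = (50, 47, 39)    # % of peak


def energy_saving_pct(max_w, measured_w):
    """Energy saved relative to the static maximum configuration."""
    return 100.0 * (1 - measured_w / max_w)


def ratio_pct(numerator, denominator):
    """Plain ratio expressed as a percentage."""
    return 100.0 * numerator / denominator


savings = [round(energy_saving_pct(m, d))
           for m, d in zip(MAX_W, MEASURED_W)]
energy_vs_bound = [round(ratio_pct(p, d))
                   for p, d in zip(PARETO_W, MEASURED_W)]
consolidation_vs_bound = [round(ratio_pct(d, p))
                          for d, p in zip(CONSOLIDATION_MEASURED,
                                          CONSOLIDATION_PARETO)]

print(savings)                 # [54, 48, 44]
print(energy_vs_bound)         # [93, 85, 85]
print(consolidation_vs_bound)  # [92, 83, 82]
```

The extremes of each list match the slide's 44%-54%, 85%-93%, and 82%-92% ranges.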

  26. Conclusion
      • Real challenges to latency-sensitive applications
        – Maintain service-level objectives while
        – Minimize energy consumption or
        – Maximize workload consolidation
      • Design using Pareto methodology to determine theoretical bound and derive control policies
      • Implement dynamic resource controls to IX dataplane operating system

  27. Thank you
