The Computational Sprinting Game (Songchun Fan, Seyed Majid Zahedi, Benjamin C. Lee)


  1. The Computational Sprinting Game. Songchun Fan∗, Seyed Majid Zahedi∗, Benjamin C. Lee. {songchun.fan, seyedmajid.zahedi, benjamin.c.lee}@duke.edu [∗ Co-First Authors]

  2. Computational Sprinting • Supply extra power to enhance performance for short durations • Activate more cores, boost voltage/frequency

  3. Computational Sprinting • Supply extra power to enhance performance for short durations • Activate more cores, boost voltage/frequency [Figure: normalized speedup, normalized power, and average temperature (°C) per benchmark, non-sprinting vs. sprinting]

  4. Sprinting Architecture • Power for sprints supplied by shared rack • Heat from sprints absorbed by thermal packages. Fig. www.fortlax.se and Raghavan, Arun, et al. "Computational sprinting on a hardware/software testbed."

  5. Power Emergencies • Sprints may trip breaker • Current ↑ with sprinters • Time ↑ with sprint duration • Risk ↑ with current, time [Figure: circuit-breaker trip curve, duration of current draw (sec) vs. current normalized to rated current; regions: not tripped ($P_{\text{trip}} = 0$), non-deterministic tolerance band, tripped ($P_{\text{trip}} = 1$); labels: long-delay, conventional tripping, short circuit]. Fig. Fu, Wang, and Lefurgy. "How much power oversubscription is safe and allowed in data centers."

  6. Uninterruptible Power Supplies • When sprints trip breaker, draw on batteries • When sprints complete, recharge batteries. Fig. www.amper-ecuador.com

  7. Example – Private Clouds • Applications compute on servers that share power • Processors sprint independently • Processors sprint selfishly for performance. Fig. Google, www.lasknet.net

  8. Sprinting Management. When should processors sprint? • Phases with higher performance from sprints • But sprints prohibited as chip cools. Which processors should sprint? • Processors that benefit most from sprints • But sprints prohibited as batteries recover

  9. Management Desiderata. Individual Performance: • Sprints account for phase behavior • Sprints now constrain future sprints. System Stability: • Sprints account for others’ sprinting strategies • Sprints risk power emergencies

  10. Sprinting Strategy • Optimize sprints given constraints • Sprint, wait $\Delta_{\text{cooling}}$ for chip cooling • Sprint, wait $\Delta_{\text{recovery}}$ for rack recovery if breaker trips [Figure: utility from sprint vs. epoch (0 to 30)]

  11. Sprinting Strategy • Optimize sprints given constraints • Sprint, wait $\Delta_{\text{cooling}}$ for chip cooling • Sprint, wait $\Delta_{\text{recovery}}$ for rack recovery if breaker trips [Figure: utility from sprint vs. epoch (0 to 30), with some epochs marked ×]

  12. Game Theory. Study strategic agents: • Agents selfishly maximize individual utility. Optimize responses: • Response maximizes utility, given others’ strategies. Find equilibrium: • State where all agents play their best responses

  13. Sprinting Game. States: • Active – can sprint • Cooling – cannot sprint, chip cooling • Recovery – cannot sprint, batteries recharging. Actions: • Sprint or not, when active. Strategies: • Agent’s state, app’s phase, history, ... • Others’ strategies, utilities, and states, ...
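The three states and the single action on this slide can be read as a small per-agent state machine. The following is a minimal sketch under stated assumptions, not the authors' implementation: the names `State` and `next_state` are hypothetical, cooling and recovery are assumed to end with fixed per-epoch probabilities (`p_cool` and `p_recover` are the probabilities of staying put), and a breaker trip is assumed to send every agent to recovery.

```python
import random
from enum import Enum


class State(Enum):
    ACTIVE = "active"      # may sprint
    COOLING = "cooling"    # chip cooling, cannot sprint
    RECOVERY = "recovery"  # rack batteries recharging, cannot sprint


def next_state(state, sprint, breaker_tripped, p_cool, p_recover, rng):
    """Advance one agent by one epoch (hypothetical transition model)."""
    if breaker_tripped:
        # Assumption: a trip forces every agent into recovery.
        return State.RECOVERY
    if state is State.ACTIVE:
        # Sprinting heats the chip, so the agent must cool afterwards.
        return State.COOLING if sprint else State.ACTIVE
    if state is State.COOLING:
        # Stay in cooling with probability p_cool, else become active.
        return State.COOLING if rng.random() < p_cool else State.ACTIVE
    # Remaining case: RECOVERY. Stay with probability p_recover.
    return State.RECOVERY if rng.random() < p_recover else State.ACTIVE
```

Passing a seeded `random.Random` as `rng` makes a simulated trace reproducible. The `sprint` flag only matters in the active state, which matches the slide's "Sprint or not, when active".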

  14. Mean Field Equilibrium (MFE). Challenges: • Large system with many agents • Complex strategies and many competitors • Intractable optimization for best response. Solution: • Abstract many agents with statistical distributions • Optimize agents’ strategies against expectations

  15. Equilibrium Strategy. Agents maximize expected value of (not) sprinting: • Current state • Utility from sprinting, $u$ • Probability of tripping, $P_{\text{trip}}$. Agents employ threshold strategy: • If active and $u \ge u_T$, then sprint
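One way to see why a threshold rule emerges is to compare the expected value of the two actions in the active state. The sketch below is an illustration under assumptions the slide does not state: a discount factor $\delta$, and long-run values $V_A$, $V_C$, $V_R$ for the active, cooling, and recovery states, with a breaker trip sending the agent to recovery regardless of its action.

```latex
\begin{align*}
\text{sprint:} \quad & u + \delta\,\bigl[(1 - P_{\text{trip}})\,V_C + P_{\text{trip}}\,V_R\bigr] \\
\text{do not sprint:} \quad & \delta\,\bigl[(1 - P_{\text{trip}})\,V_A + P_{\text{trip}}\,V_R\bigr] \\
\text{sprint iff} \quad & u \ge u_T = \delta\,(1 - P_{\text{trip}})\,(V_A - V_C)
\end{align*}
```

Under these assumptions the recovery terms cancel, so the threshold depends only on how much more an active epoch is worth than a cooling one, scaled by the chance that the breaker does not trip.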

  16. Find Equilibrium – Offline • Initialize probability of breaker trip $P_{\text{trip}}$ • Given $P_{\text{trip}}$, optimize threshold strategy $u_T$ • Given $u_T$, estimate number of sprinters $N$ • Given $N$, update probability $P'_{\text{trip}}$ • Iterate if $P'_{\text{trip}} \neq P_{\text{trip}}$
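The loop above is a fixed-point iteration on $P_{\text{trip}}$, and a rough sketch may help make the steps concrete. Everything beyond the four steps on the slide is an assumption made for illustration: the function names, the linear breaker model between `n_min` and `n_max` sprinters, the three-state value iteration with discount factor `discount`, the stationary estimate of sprinters (which ignores time spent in recovery and requires `p_cool < 1`), and the damped update controlled by `step`.

```python
def trip_probability(n_sprinters, n_min, n_max):
    """Assumed breaker model: 0 below n_min, 1 above n_max, linear between."""
    frac = (n_sprinters - n_min) / (n_max - n_min)
    return min(1.0, max(0.0, frac))


def solve_threshold(p_trip, utilities, p_cool, p_recover, discount, sweeps=200):
    """Value iteration over active/cooling/recovery; returns threshold u_T."""
    v_active = v_cool = v_recover = 0.0
    for _ in range(sweeps):
        # Discounted continuation values for the two actions when active.
        stay_active = discount * ((1 - p_trip) * v_active + p_trip * v_recover)
        go_cooling = discount * ((1 - p_trip) * v_cool + p_trip * v_recover)
        # Average the better action over the sampled sprint utilities.
        new_active = sum(max(u + go_cooling, stay_active)
                         for u in utilities) / len(utilities)
        new_cool = discount * (
            (1 - p_trip) * (p_cool * v_cool + (1 - p_cool) * v_active)
            + p_trip * v_recover)
        new_recover = discount * (p_recover * v_recover
                                  + (1 - p_recover) * v_active)
        v_active, v_cool, v_recover = new_active, new_cool, new_recover
    # Sprinting beats waiting exactly when u is at least this value.
    return discount * (1 - p_trip) * (v_active - v_cool)


def expected_sprinters(u_t, utilities, n_agents, p_cool):
    """Stationary estimate of sprinters per epoch, ignoring recovery."""
    p_sprint = sum(u >= u_t for u in utilities) / len(utilities)
    frac_active = (1 - p_cool) / ((1 - p_cool) + p_sprint)
    return n_agents * frac_active * p_sprint


def find_equilibrium(utilities, n_agents, n_min, n_max, p_cool=0.5,
                     p_recover=0.9, discount=0.9, step=0.5, tol=1e-6,
                     max_iters=200):
    """Iterate P_trip -> u_T -> N -> P'_trip until P_trip stops changing."""
    p_trip, u_t = 0.0, 0.0
    for _ in range(max_iters):
        u_t = solve_threshold(p_trip, utilities, p_cool, p_recover, discount)
        n = expected_sprinters(u_t, utilities, n_agents, p_cool)
        p_new = trip_probability(n, n_min, n_max)
        if abs(p_new - p_trip) < tol:
            break
        p_trip += step * (p_new - p_trip)  # damped update (an assumption)
    return u_t, p_trip
```

A call such as `find_equilibrium([1.0, 2.0, 3.0, 4.0], n_agents=1000, n_min=200, n_max=300)` returns a threshold and a trip probability. With a small discrete sample of utilities the number of sprinters changes in jumps, so the loop may stop at `max_iters` rather than at an exact fixed point.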

  17. Execute Strategy – Online: If active and $u \ge u_T$, then sprint

  18. Sprinting Thresholds [Figure: density of utility from sprint for Linear Regression and PageRank] • Thresholds are optimal and diverse • Agents behave strategically to maximize performance

  19. Management Architecture [Figure: a coordinator (Alg. 1) exchanging profiles and strategies with users, each with an agent, predictor, executor, engine, and task] • Offline: coordinator profiles utility, optimizes thresholds • Online: predictors estimate sprint utility • Online: agents apply threshold strategy • Online: executor adapts computation

  20. Experimental Methodology. Sprinting: • 3 cores @ 1.2 GHz → 12 cores @ 2.7 GHz. Workloads: • Apache Spark • Spark engine dynamically schedules tasks on active cores. Performance Metric: • Tasks completed per second (TPS). Simulation Method: • R-based simulator using traces of Spark computation

  21. Management Policies. Greedy: • Sprint if neither cooling nor recovering. Exponential Back-off: • Sprint if neither cooling nor recovering • Wait randomly for $U[0, 2^k]$ epochs after the $k$-th trip. Cooperative Threshold: • Enforce globally optimized threshold. Equilibrium Threshold: • Announce decentralized, strategic threshold
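The decision rules behind these policies are simple enough to sketch side by side. The code below is illustrative only: the function and class names are hypothetical, `can_sprint` stands for "neither cooling nor recovering", the back-off wait is assumed to be an integer number of epochs drawn uniformly from 0 to $2^k$ inclusive, and the trip counter is assumed never to reset. The two threshold policies share one rule and differ only in where `u_threshold` comes from (enforced globally vs. announced as an equilibrium).

```python
import random


def greedy_should_sprint(can_sprint):
    """Greedy: sprint whenever the agent is neither cooling nor recovering."""
    return can_sprint


class ExponentialBackoff:
    """Back off for a random number of epochs after each breaker trip."""

    def __init__(self, rng=None):
        self.rng = rng or random.Random()
        self.trips = 0  # k, the number of trips seen so far
        self.wait = 0   # epochs left to wait before sprinting again

    def on_trip(self):
        self.trips += 1
        # Assumption: integer wait drawn uniformly from [0, 2^k], inclusive.
        self.wait = self.rng.randint(0, 2 ** self.trips)

    def should_sprint(self, can_sprint):
        if self.wait > 0:
            self.wait -= 1
            return False
        return can_sprint


def threshold_should_sprint(can_sprint, utility, u_threshold):
    """Threshold policies: sprint only if utility reaches the threshold."""
    return can_sprint and utility >= u_threshold
```

A simulator would call `on_trip()` on every back-off agent when the breaker trips and then query `should_sprint(...)` once per epoch.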

  22. Case for Equilibria [Table: Performance: Cooperative +; Stability: Equilibrium +] • Cooperative (+): maximize global performance • Equilibrium (+): remove incentives to deviate

  23. Case for Equilibria [Table: Performance: Cooperative +; Stability: Equilibrium +, Cooperative –] • Cooperative (+): maximize global performance • Equilibrium (+): remove incentives to deviate • Cooperative (–): enforce strategies globally

  24. Case for Equilibria [Table: Performance: Equilibrium +, Cooperative +; Stability: Equilibrium +, Cooperative –] • Cooperative (+): maximize global performance • Equilibrium (+): remove incentives to deviate • Cooperative (–): enforce strategies globally • Equilibrium (+): maximize individual performance

  25. Sprinting Behavior [Figure: number of sprinting users (0 to 600) vs. epoch index (0 to 1000), one panel each for Greedy, Exponential Backoff, Cooperative Threshold, and Equilibrium Threshold]

  26. Sprinting Performance [Figure: performance normalized to Greedy, per benchmark, for Greedy, Exponential Backoff, Equilibrium Threshold, and Cooperative Threshold] • Greedy – aggressive, incurs emergencies • Exponential – conservative, untimely sprints • Equilibrium – strategic, produces equilibrium • Cooperative – optimal, requires enforcement

  27. Game States [Figure: share of time (0% to 100%) in each state (active but not sprinting, local cooling, global recovery, sprinting) under Greedy, Exponential, Equilibrium, and Cooperative] • Greedy – time in recovery • Exponential – untimely sprints • Equilibrium – timely sprints • Cooperative – timely sprints

  28. Conclusion. Management with game theory: • Agents sprint according to threshold – inexpensive • Agents have no incentives to deviate – stable • Agents optimize response – high performance. Future directions: • Use game theory to manage scarce resources • E.g., big/small processors, accelerators

  29. Thank you. Questions?
