ThermOS: System Support for Dynamic Thermal Management of Chip Multi-Processors

  1. ThermOS: System Support for Dynamic Thermal Management of Chip Multi-Processors
      Filippo Sironi (sironi@elet.polimi.it), Martina Maggio, Riccardo Cattaneo, Giovanni F. Del Nero, Donatella Sciuto, Marco D. Santambrogio
      22nd International Conference on Parallel Architectures and Compilation Techniques (PACT-22), September 9, 2013, Edinburgh, Scotland, UK

  5. DVFS is dangerous! (I know this is scary)
      [Figure: temperature increase (°C) vs. time (s), 0 to 600 s; curves: swaptions @ 2.80 GHz, ab @ 2.80 GHz, swaptions w/ DVFS, ab w/ DVFS; annotated with "DVFS from 2.80 to 2.13 GHz", Δ1 and Δ2]
      • It may impair multi-programmed workloads...
      • Think about multi-tenant virtualization infrastructures!

  8. Idle cycle injection improves!
      [Figure: temperature increase (°C) vs. time (s), 0 to 600 s; curves: swaptions @ 2.80 GHz, ab @ 2.80 GHz, swaptions w/ ThermOS, ab w/ ThermOS; annotated with Δ1]

  9. Outline
      • Why DTM
      • DTM in commodity CMPs
      • ThermOS
      • Related work
      • Conclusions and future work

  10. Why DTM (dynamic thermal management)?
      • Transistors per unit of area are still increasing (Moore’s law)
      • Power density gets worse as lithography advances (breakdown of Dennard scaling)
      • High temperature impairs performance, energy efficiency, and reliability (Srinivasan et al. in ISCA’04 [3])

  12. DTM in commodity CMPs
      • Commodity CMPs exploit DVFS (dynamic voltage and frequency scaling)
      • DVFS has chip-wide side effects
      • Per-core DVFS, whose side effects stay within a single core, becomes costly as soon as the core count exceeds 2 (Kim et al. in HPCA’08 [8])
      • Intel Haswell supports per-core DVFS, but its integrated voltage regulators may cause high temperature
      • Side effects are especially bad in shared environments (e.g., multi-tenant virtualized infrastructures)
      Hence: software-driven DTM of CMPs

  13. ThermOS
      • Linear discrete-time model of the temperature dynamics
      • Commodity means of measuring temperature (i.e., digital thermal sensors (DTSs) read through model-specific registers (MSRs))
      • Formal feedback control to determine the idle cycles
      • Idle cycle injection via operating system scheduling
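
      The slide reads temperature from each core's DTS through MSRs. As a minimal user-space sketch (not ThermOS's in-kernel code), assuming an Intel core where IA32_THERM_STATUS (0x19C) reports how many degrees the core sits below TjMax in bits 22:16 (bit 31 flags a valid reading) and MSR_TEMPERATURE_TARGET (0x1A2) holds TjMax in bits 23:16, the decoding looks like this. The function names are hypothetical; read_msr needs the Linux msr driver and root privileges, and MSR_TEMPERATURE_TARGET availability varies by CPU model.

```python
import os
import struct

IA32_THERM_STATUS = 0x19C        # per-core thermal status MSR (Intel)
MSR_TEMPERATURE_TARGET = 0x1A2   # TjMax in bits 23:16 (model-dependent)


def tjmax_from_target(raw: int) -> int:
    """Extract TjMax (degrees C) from a raw MSR_TEMPERATURE_TARGET value."""
    return (raw >> 16) & 0xFF


def core_temperature(therm_status: int, tjmax: int) -> int:
    """Convert a raw IA32_THERM_STATUS value into degrees C.

    The DTS is relative: bits 22:16 give the distance below TjMax,
    and bit 31 says whether that readout is valid.
    """
    if not (therm_status >> 31) & 1:
        raise ValueError("DTS reading not valid")
    below_tjmax = (therm_status >> 16) & 0x7F
    return tjmax - below_tjmax


def read_msr(cpu: int, register: int) -> int:
    """Read one 64-bit MSR via /dev/cpu/<cpu>/msr (Linux msr driver, root)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        # The msr driver uses the file offset as the MSR address.
        return struct.unpack("<Q", os.pread(fd, 8, register))[0]
    finally:
        os.close(fd)
```

      On a machine with the driver loaded, core_temperature(read_msr(0, IA32_THERM_STATUS), tjmax_from_target(read_msr(0, MSR_TEMPERATURE_TARGET))) would give core 0's temperature; the decode functions themselves need no hardware.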

  14. Modeling of temperature dynamics
      • Existing modeling approaches either have shortcomings (Wattch, Brooks and Martonosi in HPCA’01 [14]) or require too much information to be practical (HotSpot, Skadron et al. in TACO’04 [1])
      • No need to capture the full temperature dynamics: we only need the dynamics near the temperature threshold

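  19. Modeling of temperature dynamics
      [Figure: temperature increase (°C) vs. time (ms), 0 to 200 ms, without (w/o) and with (w/) idle cycle injection (ICI)]
      Linear discrete-time model: $T(k+1) = a\,T(k) + b\,I(k)$, where $T(k)$ is the temperature and $I(k)$ the injected idle at control step $k$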
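
      Two consequences follow directly from this first-order model (a worked derivation, not taken from the slides): its free response, and the fixed point reached under a constant idle level.

```latex
% Free response (I(k) = 0): the temperature decays geometrically iff |a| < 1
T(k) = a^{k}\,T(0)
% Fixed point under a constant idle level I^*, assuming |a| < 1
T^{*} = a\,T^{*} + b\,I^{*} \;\Longrightarrow\; T^{*} = \frac{b}{1-a}\,I^{*}
\qquad\Longleftrightarrow\qquad I^{*} = \frac{(1-a)\,T^{*}}{b}
```

      Read right to left, the fixed point tells how much idle holds a given temperature. Since the slides only claim validity near the threshold, treating T and I as deviations around that operating point is a reasonable reading, but it is an assumption here.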

  20. Linear discrete-time thermal model: offline estimation
      • Low overhead, but requires the model to be conservative
      • Linear regression over 70% of a dataset of over 1.5 million {temperature_next, temperature, idle} tuples; different regressions yield 95% prediction accuracy over the remaining 30% of the dataset
      • The estimated variances of the a and b parameters are almost negligible
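
      The offline estimation can be sketched as ordinary least squares without an intercept, solved in closed form from the 2x2 normal equations, with a reproducible 70/30 split. This is an illustration under assumptions: the helper names, the synthetic data generator and its parameters are hypothetical, and mean absolute error stands in for the slide's unspecified "prediction accuracy" metric.

```python
import random


def fit_thermal_model(samples):
    """Least squares for T(k+1) = a*T(k) + b*I(k), no intercept.

    `samples` holds (temperature_next, temperature, idle) tuples.
    Returns (a, b) from the closed-form 2x2 normal equations.
    """
    s_tt = s_ti = s_ii = s_tn = s_in = 0.0
    for t_next, t, idle in samples:
        s_tt += t * t
        s_ti += t * idle
        s_ii += idle * idle
        s_tn += t * t_next
        s_in += idle * t_next
    det = s_tt * s_ii - s_ti * s_ti
    if det == 0.0:
        raise ValueError("degenerate dataset: temperature and idle are collinear")
    a = (s_tn * s_ii - s_ti * s_in) / det
    b = (s_tt * s_in - s_ti * s_tn) / det
    return a, b


def split_70_30(samples, seed=0):
    """Shuffle reproducibly, then split 70% training / 30% validation."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def mean_absolute_error(samples, a, b):
    """Average one-step prediction error on held-out tuples."""
    total = sum(abs(t_next - (a * t + b * idle)) for t_next, t, idle in samples)
    return total / len(samples)


def synthetic_dataset(n, a, b, noise=0.0, seed=1):
    """Hypothetical tuples drawn from a known model, to check the fit."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        t = rng.uniform(30.0, 60.0)
        idle = rng.uniform(0.0, 0.8)
        data.append((a * t + b * idle + rng.gauss(0.0, noise), t, idle))
    return data


if __name__ == "__main__":
    train, held_out = split_70_30(synthetic_dataset(10_000, a=0.95, b=-3.0, noise=0.05))
    a, b = fit_thermal_model(train)
    print(f"a={a:.3f} b={b:.3f} held-out MAE={mean_absolute_error(held_out, a, b):.3f}")
```

      With a dataset of 1.5 million tuples the same accumulation runs in a single pass and constant memory, which fits the slide's "low overhead" point.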

  23. Formal feedback control
      • Proportional-Integral (PI) controller
        - proportional term to capture the dependency on the current error (i.e., expected minus current temperature)
        - integral term to capture the dependency on past errors
      • Synthesis of a “stable by definition” controller
      • Robust to estimation errors of the b parameter
      Control law (new idle = previous idle + A · current error − B · previous error, with A = (1 − p)/b and B = a(1 − p)/b):
      $I(k) = I(k-1) + \frac{1-p}{b}\,e(k) - \frac{a(1-p)}{b}\,e(k-1)$
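
      Why this law is "stable by definition" can be checked with a short derivation under the slide's model (ours, not the authors'): with plant b/(z − a) and controller ((1 − p)/b)(z − a)/(z − 1), the open loop collapses to (1 − p)/(z − 1) and the closed loop to (1 − p)/(z − p), so the only closed-loop pole is the chosen p. The Python sketch simulates that loop. The plant values, the constant "heat" input that the integral action must reject, and p = 0.5 are all hypothetical; the clamp to [0, 0.8] mirrors the 80% idle cap on slide 28.

```python
from dataclasses import dataclass


@dataclass
class IncrementalPI:
    """Incremental PI controller for T(k+1) = a*T(k) + b*I(k).

    I(k) = I(k-1) + (1-p)/b * e(k) - a*(1-p)/b * e(k-1), e = target - T.
    """
    a: float
    b: float
    p: float                # chosen closed-loop pole, 0 <= p < 1 (assumption)
    i_max: float = 0.8      # idle cap: at most 80% of the control period
    i_prev: float = 0.0
    e_prev: float = 0.0

    def step(self, target: float, temperature: float) -> float:
        e = target - temperature
        gain = (1.0 - self.p) / self.b
        idle = self.i_prev + gain * e - self.a * gain * self.e_prev
        # Clamp; storing the clamped value as I(k-1) also limits windup.
        idle = min(max(idle, 0.0), self.i_max)
        self.i_prev, self.e_prev = idle, e
        return idle


def simulate(a=0.9, b=-20.0, heat=7.0, p=0.5, target=60.0, t0=70.0,
             steps=50, b_hat=None):
    """Closed loop on a hypothetical plant T(k+1) = a*T(k) + b*I(k) + heat.

    `b_hat` lets the controller use a wrong estimate of b.
    Returns (final temperature, last idle fraction).
    """
    ctrl = IncrementalPI(a, b if b_hat is None else b_hat, p)
    t, idle = t0, 0.0
    for _ in range(steps):
        idle = ctrl.step(target, t)
        t = a * t + b * idle + heat
    return t, idle
```

      Under the same model, if the controller uses an estimate b̂ instead of b (with a exact), the closed-loop pole moves to 1 − (b/b̂)(1 − p), so the loop stays stable for 0 < b/b̂ < 2/(1 − p); this is one way to read the slide's robustness claim. Calling simulate(b_hat=-30.0, steps=200) exercises that case.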

  24. Idle cycle injection
      • Does not affect the scheduling of high-priority and vital tasks (e.g., real-time tasks and kernel tasks)
      • Exploits task scheduling and cpuidle (Pallipadi et al. in Linux Symposium’07 [10]), and is not invasive thanks to the dynamic tick code
      • Alternative solutions are suboptimal from either a software-engineering or an effectiveness standpoint

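  28. ThermOS
      [Figure: control loop. The error e between the target temperature T̄ and the temperature T measured by the sensor S feeds the controller C, which outputs an idle percentage I%; the actuator A injects idle I into the plant P, whose temperature T closes the loop]
      • One feedback controller per core
      • 10 ms control period
      • At most 80% of the control period is idle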
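
      The policy on slides 24 and 28 can be pictured with a toy, single-core model: each 10 ms period gets an idle budget capped at 80%, and injected idle never delays real-time or kernel tasks. This is a hypothetical user-space sketch (Task, Kind, idle_budget_us and pick_next are illustrative names, and picking the first runnable task is a simplification); ThermOS itself works inside the Linux scheduler and cpuidle.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Union

CONTROL_PERIOD_US = 10_000   # 10 ms control period (from the slides)
MAX_IDLE_FRACTION = 0.8      # idle capped at 80% of the period (from the slides)
IDLE = "idle"                # sentinel: enter the idle loop (cpuidle) instead of a task


class Kind(Enum):
    REALTIME = "realtime"
    KERNEL = "kernel"
    NORMAL = "normal"


@dataclass
class Task:
    name: str
    kind: Kind


def idle_budget_us(idle_fraction: float) -> int:
    """Idle time to inject in one control period, clamped to [0, 80%]."""
    clamped = min(max(idle_fraction, 0.0), MAX_IDLE_FRACTION)
    return round(clamped * CONTROL_PERIOD_US)


def pick_next(runqueue: List[Task], idle_left_us: int) -> Union[Task, str]:
    """Toy pick-next for one core.

    Real-time and kernel tasks always run; other tasks yield to the
    idle loop while this period's idle budget lasts.
    """
    for task in runqueue:
        if task.kind in (Kind.REALTIME, Kind.KERNEL):
            return task
    if idle_left_us > 0 or not runqueue:
        return IDLE
    return runqueue[0]
```

      With one controller per core, each core would compute its own budget from its own temperature, so a hot core can idle while a cooler neighbour keeps running.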

  29. Evaluation platform
      • 4-core Intel Xeon (Nehalem)
      • From 1.60 GHz to 2.80 GHz
      • C-states: C0, C1E (3 µs latency), C3 (20 µs latency plus other overheads), and C6 (200 µs latency plus many other overheads)
      • Ambient temperature about 20 ± 1 °C
      • Idle temperature about 28 to 32 ± 1 °C, depending on the core
      • Modified Linux kernel 3.4
      • PARSEC 2.1 benchmark suite

  31. Thermal profile
      [Figure: temperature increase (°C) vs. time (s), 470 to 530 s; curves: swaptions @ core 0/2, swaptions @ core 1, swaptions @ core 3]
      Temperature is not symmetric across the cores of a CMP

  32. Research questions
      • Can ThermOS constrain the temperature and selectively affect applications in a multi-programmed workload?
      • How efficient is ThermOS with respect to state-of-the-art solutions?

  34. Management of multi-programmed workloads
      [Figure: temperature increase (°C) vs. time (s), 510 to 570 s; curves: swaptions @ core 0/2, swaptions @ core 1, swaptions @ core 3]
      C-state residency:
      • core 0: 91% in C0, 7% in C1E, 2% in C3
      • core 1: 92% in C0, 8% in C1E, 0% in C3
      • core 2: 91% in C0, 6% in C1E, 3% in C3
      • core 3: 91% in C0, 5% in C1E, 4% in C3

  35. State-of-the-art solutions
      • Dimetrodon (Bailis et al. in DAC’11 [4])
        - probabilistic feedforward control inside the FreeBSD 7.2 task scheduler
        - we sweep the idle quantum/probability configuration space
      • VFS
        - we statically select the following frequencies (and the associated voltages): 2.79, 2.66, 2.53, 2.39, 2.26, 2.13 GHz
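
      To make "probabilistic feedforward" concrete, here is a simplified model of that style of idle injection, assumed for illustration and not taken from Dimetrodon's code: at each scheduling decision, with probability p the scheduler runs an idle quantum of length q instead of a task quantum of length s (s is a hypothetical parameter). No temperature is measured, so the long-run idle share is fixed by the configuration alone: p·q / (p·q + (1 − p)·s).

```python
import random


def expected_idle_fraction(p: float, q: float, s: float) -> float:
    """Long-run idle share when each decision idles for q with probability p
    and otherwise runs a task for s (ratio of expected idle to expected total)."""
    idle = p * q
    return idle / (idle + (1.0 - p) * s)


def simulate_idle_fraction(p: float, q: float, s: float,
                           decisions: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same share; open loop, no temperature input."""
    rng = random.Random(seed)
    idle_time = busy_time = 0.0
    for _ in range(decisions):
        if rng.random() < p:
            idle_time += q
        else:
            busy_time += s
    return idle_time / (idle_time + busy_time)


def sweep(probabilities, quanta, s=10.0):
    """Grid over the (idle quantum, probability) configuration space."""
    return {(q, p): expected_idle_fraction(p, q, s)
            for q in quanta for p in probabilities}
```

      Because the loop is open, a given (q, p) point yields the same idle share whether the core is hot or cool, which is the contrast with ThermOS's per-core feedback.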

  37. Efficiency with multi-programmed workloads
      [Figure: performance (%) vs. temperature decrease (%) for Dimetrodon, ThermOS, and VFS]
      • Dynamic power is proportional to C·V²·f
      • As the supply voltage approaches its threshold, DVFS loses most of its efficiency
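
      The C·V²·f relation explains the last bullet. A back-of-the-envelope sketch, assuming C stays fixed, compares an idealized case where V scales in proportion to f against one where V is already at its floor, for the 2.80 to 2.13 GHz DVFS step from slide 5:

```python
def relative_dynamic_power(f_ratio: float, v_ratio: float) -> float:
    """Dynamic power scales as C * V**2 * f; with C fixed the ratio is v**2 * f."""
    return v_ratio ** 2 * f_ratio


F_HIGH, F_LOW = 2.80, 2.13   # GHz, the DVFS step shown on slide 5

f_ratio = F_LOW / F_HIGH
# Idealized case (assumption): voltage scales in proportion to frequency.
ideal = relative_dynamic_power(f_ratio, v_ratio=f_ratio)
# Near-threshold case: no voltage headroom left, so only f drops.
pinned = relative_dynamic_power(f_ratio, v_ratio=1.0)
print(f"f ratio {f_ratio:.2f}: power x{ideal:.2f} if V scales, x{pinned:.2f} if V is pinned")
```

      In the idealized case dynamic power falls to about 0.44 of its original value, whereas with V pinned it falls only to about 0.76, roughly the same factor by which a CPU-bound task slows down; that is the sense in which DVFS loses its efficiency near threshold.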
