Towards Energy-Efficient Reactive Thermal Management in Instrumented Datacenters

Energy Efficient Grids, Clouds and Clusters, Brussels, October 26, 2010

Ivan Rodero¹, Eun Kyung Lee¹, Dario Pompili¹, Manish Parashar¹, Marc Gamell², Renato J. Figueiredo³

¹ NSF Center for Autonomic Computing, Rutgers University, NJ, USA
² Open University of Catalonia, Barcelona, Spain
³ NSF Center for Autonomic Computing, University of Florida, FL, USA


- Context and Motivation
- Datacenter Thermal Management
- Energy Efficiency and Tradeoffs
- Evaluation Methodology
- Results
- Next Steps
- Conclusions

Agenda

2


Energy-Efficient Autonomic Management for High Performance Computing Workloads

3

[Diagram: each layer (Physical Environment, Resources, Virtualization, Application/Workload) is instrumented with its own sensor, observer, controller, and actuator loop; cross-layer and cross-infrastructure power management spans clouds (private, public, hybrid, etc.) over a virtualized, instrumented, application-aware infrastructure.]

- Goal: autonomic (self-monitored and self-managed) computing systems:
  - Optimizing energy efficiency while ensuring the Quality of Service delivered (performance)


Cross-Layer Architecture

4

[Architecture diagram: sensors 1-4 at the Application/Workload, Virtualization, Resources, and Physical Environment layers feed per-layer observers (application requirement profiles, VM efficiency, resource performance, environment prediction) and local controllers driving actuators 1-3; a global controller with a correlation observer coordinates the local controllers. Legend: managed environment, actuator, sensor, controller's command, observer's sensing port, information flow, request flow, resource flow.]
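To make the control loop concrete, here is a minimal sketch (with hypothetical class names and an illustrative policy, not the authors' implementation) of how per-layer observers could feed a global controller that correlates them and commands the local controllers:

```python
# Sketch of the observer/controller hierarchy (hypothetical names and policy).
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Observation:
    layer: str    # "application", "virtualization", "resources", "environment"
    metric: str   # e.g., "vm_efficiency", "node_temperature"
    value: float

class LocalController:
    """Per-layer controller: turns global decisions into actuator commands."""
    def __init__(self, layer: str, actuator: Callable[[str], None]):
        self.layer, self.actuator = layer, actuator

    def apply(self, command: str) -> None:
        self.actuator(command)

class GlobalController:
    """Correlates observations across layers and picks a cross-layer action."""
    def __init__(self, local_controllers: Dict[str, LocalController]):
        self.local_controllers = local_controllers

    def step(self, observations: List[Observation]) -> None:
        temps = [o.value for o in observations
                 if o.layer == "environment" and o.metric == "node_temperature"]
        # Illustrative policy: a thermal anomaly triggers a virtualization-layer action.
        if temps and max(temps) > 50.0:   # threshold is an assumption
            self.local_controllers["virtualization"].apply("migrate_vm")
```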


- Abnormal operational state detection
  - Distributed Online Clustering (e.g., of the workload)
  - Physical sensing at the physical layer (e.g., thermal hotspots)
- Reactive and proactive approaches
  - Reacting to anomalies to return to the steady state
  - Predicting anomalies in order to avoid them

Cross-Layer Energy-Efficient Autonomic Management

5

[Diagram: from an abnormal state, different cross-layer (autonomic) actions offer different paths back to the steady operational state, each with its own QoS, energy-efficiency, and thermal-efficiency profile.]


Interactions between Autonomic Components

6

[Diagram of interactions: workload characterization (e.g., DOC), pinning, VM migration, and component-level power management act on the HPC workload; environment monitoring (temperature, etc.) and resource monitoring (load, power, etc.) feed the global controller and the correlation observer, which cooperate on proactive configuration (provisioning and mapping) and reactive configuration (reactive designs for aggressive power management), and depend on scheduling and trading with 3rd parties.]


Datacenter’s Thermal Behavior

8

[Two surface plots: Temperature [°C] as a function of Time [min] and Node Number, one per workload distribution.]

Temporal correlation of the measured temperature under different workload distributions


[Plot: Power (W) vs. Time (s).]

Reacting to Thermal Hotspots

9

[Plot: Temperature (°C) vs. Time (s) for the internal server temperature and the environment (hotspot).]

Correlation between the server's temperature and power

Reaction: VM migration, returning the server to the steady state
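A minimal sketch of triggering this reaction, assuming the Xen `xm` toolstack used in this setup; the threshold, domain name, and destination host are illustrative, not the authors' exact procedure:

```python
# Reactive live migration of a VM when an external (hotspot) temperature
# threshold is exceeded; uses Xen's xm toolstack (assumed available).
import subprocess

TEMP_THRESHOLD_C = 50.0   # illustrative trigger, not the paper's exact value

def react_to_hotspot(external_temp_c: float, domain: str, dest_host: str) -> bool:
    """Live-migrate `domain` to `dest_host` if the hotspot threshold is crossed."""
    if external_temp_c <= TEMP_THRESHOLD_C:
        return False
    subprocess.check_call(["xm", "migrate", "--live", domain, dest_host])
    return True
```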


- Assumption: the lower the power dissipated, the lower the heat generated
- Reducing the activity factor (α)
  - VM migration: move a VM to another server
    - May reduce not only CPU activity but also memory activity, etc.
    - May result in a lower CPU frequency if the OS supports it
    - Overhead (suspend, transfer data, resume, etc.)
    - Requires available capacity on another server (impact on the target server)

Thermal Management Approaches

10

P_cpu ≈ C × α × V² × f
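For intuition about the model above, a short sketch of how dynamic power scales with frequency and voltage under DVFS; the capacitance, activity factor, and voltage values are assumed for illustration, not measurements from this work:

```python
# Relative dynamic CPU power from P_cpu ≈ C * alpha * V^2 * f.
def dynamic_power(c: float, alpha: float, voltage: float, freq_hz: float) -> float:
    return c * alpha * voltage ** 2 * freq_hz

# Illustrative operating points (voltages are assumed, not from the paper).
p_high = dynamic_power(c=1e-9, alpha=0.9, voltage=1.25, freq_hz=2.4e9)
p_low  = dynamic_power(c=1e-9, alpha=0.9, voltage=1.10, freq_hz=1.6e9)
print(f"scaling 2.4 GHz -> 1.6 GHz cuts dynamic power to "
      f"{100 * p_low / p_high:.0f}% of the original")
```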


- Reducing the activity factor (α)
  - Pinning (in the Xen platform): affinity in the VCPU-to-PCPU mapping
    - Leaves some CPUs without VMs running on them
    - OS power management may then apply DVFS automatically
    - Performance penalty (resource sharing)
- Reducing the frequency/voltage of the CPUs (V² × f)
  - Processor DVFS
    - Performance penalty (in general, a higher response time)
    - Different possibilities: different frequencies/voltages, applied to all CPUs/cores or to a subset

(A minimal sketch of both knobs follows this slide.)

Thermal Management Approaches (2)

11
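A minimal sketch of both knobs described above, assuming the Xen 3.x `xm` toolstack and the Linux cpufreq sysfs interface; the domain name, PCPU list, and target frequency are illustrative:

```python
# Sketch: restrict a domain's VCPUs to a subset of PCPUs (pinning) and
# lower the frequency of a core via cpufreq (DVFS). Values are illustrative.
import subprocess

def pin_domain(domain: str, vcpu: int, pcpus: str) -> None:
    """Pin one VCPU of `domain` to the given PCPU list, e.g. pcpus="0-1"."""
    subprocess.check_call(["xm", "vcpu-pin", domain, str(vcpu), pcpus])

def set_cpu_freq_khz(cpu: int, freq_khz: int) -> None:
    """Set a fixed frequency through the userspace cpufreq governor."""
    base = f"/sys/devices/system/cpu/cpu{cpu}/cpufreq"
    with open(f"{base}/scaling_governor", "w") as f:
        f.write("userspace")
    with open(f"{base}/scaling_setspeed", "w") as f:
        f.write(str(freq_khz))

# Example: keep the 4 VMs' VCPUs on PCPUs 0-1 and run those cores at 1.6 GHz.
for vcpu in range(4):
    pin_domain("hpl-vm", vcpu, "0-1")
for cpu in (0, 1):
    set_cpu_freq_khz(cpu, 1_600_000)
```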


- Goal: selection of the appropriate technique to mitigate the effects of thermal hotspots
- Energy-efficient:
  - Lower energy consumption
  - Lower maximum/average power dissipation
- Driven by optimization requirements, for example:
  - Reduce the temperature by 5 °C (based on a threshold)
  - A penalty of up to 10% on the response time is acceptable
- There are well-known tradeoffs between performance and energy efficiency
  - But we also need to consider other dimensions, such as thermal efficiency (temperature)

(A sketch of this requirement-driven selection follows this slide.)

Goals and Tradeoffs

12
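A sketch of how such requirement-driven selection could be encoded; the candidate techniques and their estimated effects are made-up numbers used only to illustrate the decision logic:

```python
# Pick the cheapest technique that meets the temperature target without
# exceeding the acceptable performance penalty. Estimates are illustrative.
CANDIDATES = [
    # (name, estimated temperature drop [C], response-time penalty, relative energy cost)
    ("pin_vms_to_2_cpus", 6.0, 0.08, 0.70),
    ("dvfs_1.6GHz",       4.0, 0.12, 0.75),
    ("migrate_1_vm",      7.0, 0.05, 0.90),
]

def select_technique(target_drop_c: float = 5.0, max_penalty: float = 0.10) -> str:
    feasible = [c for c in CANDIDATES
                if c[1] >= target_drop_c and c[2] <= max_penalty]
    if not feasible:
        raise RuntimeError("no technique satisfies the requirements")
    # Among feasible options, prefer the lowest relative energy cost.
    return min(feasible, key=lambda c: c[3])[0]

print(select_technique())   # -> "pin_vms_to_2_cpus" with these example numbers
```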


- Example: tradeoff between temperature and performance when pinning 4 VMs onto different PCPUs

Goals and Tradeoffs (2)

13


- Server configuration:
  - Two servers based on Intel quad-core Xeon processors
    - Operate at four frequencies ranging from 1.6 GHz to 2.4 GHz (but only 3 are available under Xen 3.1)
  - CentOS Linux operating system with a patched 2.6.18 kernel and Xen version 3.1
- Additional hardware:
  - A "Watts Up? .NET" power meter to empirically measure "instantaneous" power
    - Accuracy of ±1.5% of the measured power, with a sampling rate of 1 Hz
  - TelosB motes to measure both internal (not the sensors embedded in the CPU) and external temperatures
  - A Sunbeam SFH111 heater (directed at the servers) to emulate a thermal hotspot
- Workload: HPL Linpack

Evaluation Methodology


Energy Consumption “Estimation”

15

- Use case: no VMs running in the target server before the migration
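A sketch of how the energy consumption could be estimated from the 1 Hz power-meter samples described in the methodology; the trapezoidal integration and the sample values are assumptions, not the authors' exact procedure:

```python
# Estimate energy (J and Wh) from 1 Hz "instantaneous" power samples.
from typing import Sequence

def energy_joules(power_w: Sequence[float], sample_period_s: float = 1.0) -> float:
    """Trapezoidal integration of power over time."""
    if len(power_w) < 2:
        return 0.0
    return sum((a + b) / 2.0 * sample_period_s
               for a, b in zip(power_w, power_w[1:]))

samples = [180.0, 182.0, 240.0, 238.0, 190.0]   # illustrative Watts Up readings
joules = energy_joules(samples)
print(f"{joules:.0f} J ({joules / 3600:.3f} Wh)")
```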


Results

16

[Nine plots, one row each for DVFS, VM migration, and pinning: internal temperature (°C), external temperature (°C), and power (W) vs. time (s). DVFS series: 4 CPUs at 2.40 GHz, 2 CPUs at 1.60 GHz, 4 CPUs at 2.13 GHz, 4 CPUs at 1.60 GHz. Migration series: reference, migrate 1 VM, migrate 2 VMs, migrate 3 VMs. Pinning series: reference, pinning VMs to 3 CPUs, to 2 CPUs, to 1 CPU.]

- Correlation between internal and external temperature
- Correlation between temperature and power
- DVFS: using 2 CPUs at 1.6 GHz yields results similar to using 4 CPUs at 2.13 GHz
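A quick way to quantify the correlations noted above from the logged traces (NumPy's corrcoef); the arrays here are placeholders for the measured series:

```python
# Pearson correlation between two logged time series (e.g., internal vs.
# external temperature, or temperature vs. power), sampled at the same rate.
import numpy as np

internal_temp = np.array([41.0, 43.5, 47.0, 50.2, 52.8])   # placeholder trace
external_temp = np.array([33.0, 34.1, 36.0, 38.4, 40.1])   # placeholder trace

r = np.corrcoef(internal_temp, external_temp)[0, 1]
print(f"Pearson correlation: {r:.2f}")
```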


Results (2)

17


- Autonomic VM allocation and reactive technique decision
- Cross-layer design approach
  - Examples: component-level power management, workload clustering, etc.
  - Application-aware (workload characterization into CPU-intensive, I/O-intensive, etc.)
- Optimization targets based on self-monitoring
- Models are required
  - VM migration, DVFS (the work presented in this talk)
  - VM allocation (number of VMs, workload characteristics, combinations, etc.)
    - Preliminary results based on a brute-force algorithm (a sketch follows this slide)
  - Models at the server and datacenter level

Next Steps

18
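For the VM-allocation step mentioned above, a sketch of what a brute-force search over placements could look like; the cost model and the server/VM names are illustrative assumptions, not the authors' algorithm:

```python
# Brute-force search over VM-to-server placements, scoring each with a
# user-supplied cost model (e.g., predicted power or peak temperature).
from itertools import product
from typing import Callable, Dict, Sequence, Tuple

def best_allocation(vms: Sequence[str], servers: Sequence[str],
                    cost: Callable[[Dict[str, str]], float]) -> Tuple[Dict[str, str], float]:
    best, best_cost = None, float("inf")
    for placement in product(servers, repeat=len(vms)):   # |servers|^|vms| options
        alloc = dict(zip(vms, placement))
        c = cost(alloc)
        if c < best_cost:
            best, best_cost = alloc, c
    return best, best_cost

# Illustrative cost: penalize load imbalance between the two servers.
def imbalance(alloc: Dict[str, str]) -> float:
    counts = [list(alloc.values()).count(s) for s in ("server1", "server2")]
    return abs(counts[0] - counts[1])

alloc, c = best_allocation(["vm1", "vm2", "vm3", "vm4"], ["server1", "server2"], imbalance)
print(alloc, c)
```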


- Tradeoffs exist between the performance, energy efficiency, and thermal efficiency of reactive thermal management techniques for HPC workloads
- Pinning is an effective mechanism to react to thermal anomalies under certain conditions
  - In addition to VM migration
  - In contrast to DVFS
- Different mechanisms' behaviors were observed depending on the system characteristics and optimization goals
  - Autonomic decision making is required
  - Cross-layer designs should improve datacenter management

Conclusions

20


Thank you!

21

Energy Efficient High Performance Computing Initiative
Center for Autonomic Computing, Rutgers University

http://nsfcac.rutgers.edu/GreenHPC/