A POWER CAPPING APPROACH FOR HPC SYSTEM DEMAND RESPONSE

Kishwar Ahmed, Research Aide, MCS Division, Argonne National Laboratory, IL, USA
Mentor: Kazutomo Yoshii, MCS Division, Argonne National Laboratory, IL, USA

Outline
- Motivation
Energy reduction target at PJM in January 2014
- … migration, resource allocation)
- … communication phases
- … balancing
- … center workload
- … intolerant
CORAL benchmarks:
- Scalable science benchmarks: expected to run at full scale of the CORAL systems. Applications: HACC, Nekbone (compute intensity, small messages, allreduce), etc.
- Throughput benchmarks: represent large ensemble runs. Applications: UMT2013, AMG2013, SNAP, LULESH (shock hydrodynamics for unstructured meshes), etc.
- Data-centric benchmarks: represent emerging data-intensive workloads. Applications: Graph500 (integer operations, indirect addressing), Hash (parallel hash benchmark), etc.
- Skeleton benchmarks: investigate various platform characteristics, including network performance and threading. Applications: CLOMP, XSBench (stresses the system through memory capacity), etc.
Additional benchmarks:
- NAS Parallel Benchmarks: a small set of programs designed to help evaluate the performance of parallel supercomputers. Applications: IS, EP, FT, CG (conjugate gradient method).
- Dense-matrix multiply benchmarks: a simple, multi-threaded, dense-matrix multiply designed to measure the sustained floating-point computational rate of a single node. Applications: MT-DGEMM (source code from NERSC, the National Energy Research Scientific Computing Center) and Intel MKL DGEMM (source code from Intel for matrix multiplication).
- Processor stress test utility: FIRESTARTER, which maximizes the energy consumption of 64-bit x86 processors by generating heavy load on the execution units and transferring data between the cores and multiple levels of the memory hierarchy.
../tools/pycoolr/clr_rapl.py --limitp=140
etrace2 mpirun -n 32 bin/cg.D.32
../tools/pycoolr/clr_rapl.py --limitp=120
etrace2 mpirun -n 32 bin/cg.D.32
Output:
p0 140.0
p1 140.0
NAS Parallel Benchmarks 3.3 -- CG Benchmark
Size: 1500000  Iterations: 100
Number of active processes: 32
Number of nonzeroes per row: 21
Eigenvalue shift: .500E+03
iteration  ||r||  zeta
1  0.73652606305295E-12  499.9996989885352
...
# ETRACE2_VERSION=0.1
# ELAPSED=1652.960293
# ENERGY=91937.964940
# ENERGY_SOCKET0=21333.227051
# ENERGY_DRAM0=30015.779454
# ENERGY_SOCKET1=15409.632036
# ENERGY_DRAM1=25180.102634
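As a minimal sketch of how such a sweep can be automated, the script below sets a package power cap with pycoolr and runs the benchmark under etrace2 at each cap. Only the two commands themselves come from the slide; the cap list, the paths, and the assumption that the "# KEY=value" trailer can be captured from the combined output streams are illustrative.

#!/usr/bin/env python3
"""Sweep RAPL package power caps and record elapsed time and energy.

A sketch around the two commands shown above; the cap list is an
assumed sweep range, and the "# KEY=value" trailer format is copied
from the sample etrace2 output on this slide.
"""
import re
import subprocess

CAPS_W = [140, 120, 100, 80, 60]              # assumed sweep range
RAPL = "../tools/pycoolr/clr_rapl.py"         # path from the slide
BENCH = ["etrace2", "mpirun", "-n", "32", "bin/cg.D.32"]

for cap in CAPS_W:
    # Apply the package power cap via pycoolr.
    subprocess.run([RAPL, f"--limitp={cap}"], check=True)
    # Run the benchmark under etrace2; capture both streams, since the
    # slide does not show whether the trailer goes to stdout or stderr.
    run = subprocess.run(BENCH, stdout=subprocess.PIPE,
                         stderr=subprocess.STDOUT, text=True, check=True)
    stats = dict(re.findall(r"^# (\w+)=([\d.]+)", run.stdout, re.MULTILINE))
    print(cap, stats.get("ELAPSED"), stats.get("ENERGY"))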
../tools/pycoolr/clr_rapl.py --limitp=140
mpirun -n 32 ./nekbone ex1
./coolrs.py > nekbone.out
{"sample":"temp","time": 1499822397.016,"node":"protos","p0":{"mean": 34.89 ,"std":1.20 ,"min":33.00 ,"max":36.00 ,"0": 33,"1":33,"2":35,"3":36,"4":35,"5":36,"6":36,"7": 34,"pkg":36}} {"sample":"energy","time": 1499822397.017,"node":"protos","label":"run","energ y":{"p0":57706365709,"p0/core":4262338717,"p0/ dram":62433931283,"p1":15467688771,"p1/core": 18329000806,"p1/dram":55726072673},"power": {"p0":16.3,"p0/core":4.6,"p0/dram":1.4,"p1": 16.7,"p1/core":4.8,"p1/dram":0.9,"total": 35.3},"powercap":{"p0":140.0,"p0/core":0.0,"p0/ dram":0.0,"p1":140.0,"p1/core":0.0,"p1/dram":0.0}}
- … Computing Center
- … Dell server’s BIOS
- … temperature data
Figures: average power (W) and execution time (s) vs. power capping (W):
- NPB CG (Size C): Processor 0, Processor 1, execution time
- NPB CG (Size D): Processor 0, Processor 1, execution time
- Nekbone: Processor 0, Processor 1, execution time
- XSBench: Processor 0, Processor 1, execution time
- DGEMM: Processor 0, Processor 1, execution time
- AMG: Processor 0, Processor 1, execution time
Figures: effect of running the Graph500 application: power (W) and temperature (C) over time for Processor 0 and Processor 1, and for nodes n#6 and n#60.
Figures: average power (W) vs. power cap limit (W), model data vs. experiment data, for AMG, NAS Parallel Benchmark CG, DGEMM, and XSBench.
Figures: execution time (s) vs. power cap limit (W), model data vs. experiment data, for AMG, NAS Parallel Benchmark CG, DGEMM, and XSBench.
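For illustration of how model curves like those above can be produced, the sketch below fits a simple saturating power model, P(c) = min(c, P_max), and an inverse-power time model, T(c) = a + b / min(c, P_max), to a cap sweep. Both functional forms and the sample measurements are assumptions chosen to resemble the AMG plots; they are not necessarily the models used in this work.

import numpy as np
from scipy.optimize import curve_fit

# Illustrative sweep in the spirit of the AMG plots above
# (caps and power in W, time in s); not measured data.
caps  = np.array([40.0, 60, 80, 100, 120, 140])
power = np.array([40.0, 57, 66, 69, 70, 70])
time_ = np.array([148.0, 126, 116, 112, 111, 111])

def p_model(c, p_max):
    # Average power follows the cap until the uncapped draw saturates it.
    return np.minimum(c, p_max)

def t_model(c, a, b, p_max):
    # Runtime grows as the effective power budget min(c, p_max) shrinks.
    return a + b / np.minimum(c, p_max)

(p_max,), _ = curve_fit(p_model, caps, power, p0=[70.0])
(a, b, pm), _ = curve_fit(t_model, caps, time_, p0=[100.0, 2000.0, 70.0])

print(f"fitted uncapped power: {p_max:.1f} W")
print(f"time model: T(c) = {a:.1f} + {b:.0f} / min(c, {pm:.1f})")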
- … such as C or C++
- … al.)
Simulator architecture:
- Job Dispatcher and Job Executioner, with Execution Policies
- Job queues: Waiting Jobs, Running Jobs
- Events: Job Arrival, Job Departure, Job Eviction
- Models: Application Models, Power Models, Performance Models
- Resource Manager: Processor Allocation, Power Allocation (see the sketch after this list)
- Power Demand Change
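As a sketch of what the Resource Manager's power allocation might look like when a power demand change arrives, the snippet below scales per-job power caps to fit a reduced cluster-wide budget. All names and the uniform scaling policy are illustrative assumptions, not the simulator's actual code.

from dataclasses import dataclass

@dataclass
class Job:
    name: str
    nodes: int
    p_max: float  # uncapped per-node power draw (W)
    p_min: float  # lowest per-node cap the job tolerates (W)

def allocate_power(jobs, budget_w):
    """Uniformly scale per-node caps so total draw fits the budget.

    A toy policy: if the clamped minimums still exceed the budget,
    a real resource manager would also have to evict or delay jobs.
    """
    demand = sum(j.nodes * j.p_max for j in jobs)
    scale = min(1.0, budget_w / demand)
    return {j.name: max(j.p_min, scale * j.p_max) for j in jobs}

# Demand-response event: the cluster budget drops to 1000 W.
jobs = [Job("cg.D.32", 4, 140.0, 60.0), Job("nekbone", 8, 140.0, 60.0)]
print(allocate_power(jobs, 1000.0))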
Figures: simulator validation against PYSS: available processors vs. time (s) and queue length vs. time (s), PYSS vs. our simulator.
- … runtime
Figures: average energy consumption (J) and average execution time (s) vs. demand response events ratio (%), demand-response mode vs. performance mode.
Figures: fan model: fan speed (RPM) vs. temperature (C), model data vs. empirical data; fan power (W) vs. fan speed (RPM), with fitted curve P_fan(RPM) = 0.26 - 2e-4 * RPM + 5e-8 * RPM^2.
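A one-line evaluation of the fitted fan power curve shown above (coefficients read off the plot label):

def fan_power_w(rpm: float) -> float:
    """Fan power (W) from the fitted quadratic above:
    P = 0.26 - 2e-4 * RPM + 5e-8 * RPM^2."""
    return 0.26 - 2e-4 * rpm + 5e-8 * rpm ** 2

# Power drawn across the plotted fan speed range.
for rpm in (3000, 5000, 7000, 9000):
    print(rpm, round(fan_power_w(rpm), 2))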
Figure: average fan power per job (W) vs. number of processors, thermal-aware vs. thermal-unaware allocation.
- … allocation
- … issue
- … characteristics (e.g., power usage)