WattWatcher: Fine-Grained Power Estimation for Emerging Workloads
SBAC-PAD 2015
Michael LeBeane, Jee Ho Ryoo, Reena Panda, Lizy K. John
The University of Texas at Austin
jr45842@utexas.edu
Motivation
▪ Understanding power at a fine granularity is still a challenge
– Thermal effects, DVFS policies
▪ Not always easy for researchers
▪ Simple yet detailed power estimation is extremely useful
▪ Some methods are currently available...
[Figure: Core 0 power (Watts) over time, with functional-unit power breakdowns at two sample points. First sample: OTHER 3%, MC 6%, L3 16%, L2 31%, LS 13%, Fetch 9%, ALU 9%, OoO 13%. Second sample: OTHER 4%, MC 8%, L3 12%, L2 17%, LS 17%, Fetch 12%, ALU 13%, OoO 17%.]
Michael LeBeane 10/20/2015
Currently Available Methods
▪ Direct Measurements [1]
– Hardware probes
▪ Curve Fitting [2,3,4,5,6]
– Machine learning models
▪ Power PMCs [7,8]
– E.g., Intel RAPL
▪ Simulators [9,10]
– E.g., McPAT plugins to simulation environments
Design Space
▪ Diverse design space to explore (subjective taxonomy)
Method | Accuracy | Detail | Frequency | Cost ($) | Speed
Direct Measurements | ++ | - | ~µs-ms | - | Fast
Power PMCs | + | - | ~ms | = | Fast
Curve Fitting | = | = | ~µs-s | + | Fast/Offline Training
Simulators | + | + | ~ns | + | Slow
WattWatcher | + | + | ~ms | + | Fast
WattWatcher Overview
▪ WattWatcher offers functional-unit power breakdowns in real time, on real hardware
[Figure: WattWatcher overview. A workload runs on the SUT; its performance counters feed an access estimator, which drives a configurable power model (ALU, registers, L1$, L2$, L3$, OoO, fetch logic, and package, among other units) parameterized by the system configuration. Outputs: per-core power over time and a functional-unit breakdown (OTHER 4%, MEM 8%, L3 12%, L2 17%, LS 17%, FE 12%, ALU 13%, OoO 17%).]
▪ Online / real-time
▪ McPAT-based
▪ Configurable
▪ Low overhead
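The pipeline above (performance counters → access estimator → configurable power model) can be illustrated with a minimal sketch in the spirit of McPAT's energy-per-access modeling: each functional unit gets an energy per access and a leakage power, and the access counts observed in one sampling interval become per-unit watts. The unit names, energies, and leakage values are hypothetical placeholders, not WattWatcher's actual McPAT-derived parameters.

```python
from dataclasses import dataclass


@dataclass
class UnitModel:
    """Hypothetical per-unit power parameters (illustrative values only)."""
    energy_per_access_nj: float  # dynamic energy per access, in nanojoules
    leakage_w: float             # static (leakage) power, in watts


def unit_power(accesses: dict[str, int], interval_s: float,
               models: dict[str, UnitModel]) -> dict[str, float]:
    """Convert one interval's access counts into watts for each unit.

    dynamic power = accesses * energy per access / interval length,
    and each unit's leakage power is added on top.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    power = {}
    for unit, model in models.items():
        count = accesses.get(unit, 0)  # an idle unit still leaks
        dynamic_w = count * model.energy_per_access_nj * 1e-9 / interval_s
        power[unit] = dynamic_w + model.leakage_w
    return power


# Hypothetical parameters; a McPAT-based model would derive these from the
# processor configuration rather than hard-coding them.
MODELS = {
    "ALU": UnitModel(energy_per_access_nj=0.05, leakage_w=0.2),
    "L2": UnitModel(energy_per_access_nj=0.5, leakage_w=0.4),
    "Fetch": UnitModel(energy_per_access_nj=0.1, leakage_w=0.1),
}

if __name__ == "__main__":
    # Made-up access counts for a 10 ms sampling interval.
    sample = {"ALU": 20_000_000, "L2": 1_000_000, "Fetch": 10_000_000}
    for unit, watts in unit_power(sample, 0.01, MODELS).items():
        print(f"{unit}: {watts:.2f} W")
```

Separating dynamic and leakage terms keeps the breakdown meaningful even for idle units, which contribute only leakage.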
WattWatcher Hardware Events
Category | Hardware Events
General | Context Switches, Frequency, Voltage, Cycles
Frontend | Branch Mispredictions, IC Misses, iTLB Misses, uops Issued
LS/Caches | L1 Misses/Hits, L2 Misses, LLC Misses, dTLB Misses
Execution | FP Scalar, FP Packed, FP Width
Retirement | uops Retired
▪ Hardware performance counters feed McPAT
▪ Some low-level McPAT events are unavailable from counters
▪ Unavailable statistics are estimated from available counters
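The last bullet can be made concrete with a sketch that maps raw counter readings onto McPAT-style activity statistics and fills in two statistics no counter provides directly. The counter names, stat names, and both estimation rules (treating non-FP retired uops as integer work, and splitting L1 data-cache accesses with a fixed read fraction) are hypothetical illustrations; the slides do not specify WattWatcher's actual rules.

```python
def derive_stats(counters: dict[str, int],
                 read_fraction: float = 0.7) -> dict[str, int]:
    """Map raw counter readings onto McPAT-style activity statistics.

    Counter names, stat names, and both estimation rules are hypothetical.
    """
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be in [0, 1]")
    fp_ops = counters["fp_scalar"] + counters["fp_packed"]
    stats = {
        # Statistics with a directly matching counter pass straight through.
        "total_cycles": counters["cycles"],
        "branch_mispredictions": counters["branch_mispredictions"],
        "fp_instructions": fp_ops,
        # Estimated: retired uops that are not FP are treated as integer work
        # (uops stand in for instructions, itself an approximation).
        "int_instructions": max(counters["uops_retired"] - fp_ops, 0),
    }
    # Estimated: total L1 data-cache accesses (hits + misses) are known, but in
    # this sketch there is no read/write split, so a fixed fraction is assumed.
    l1_accesses = counters["l1d_hits"] + counters["l1d_misses"]
    reads = round(l1_accesses * read_fraction)
    stats["dcache_read_accesses"] = reads
    stats["dcache_write_accesses"] = l1_accesses - reads
    return stats


if __name__ == "__main__":
    raw = {"cycles": 1_000_000, "uops_retired": 800_000, "fp_scalar": 100_000,
           "fp_packed": 50_000, "branch_mispredictions": 7_000,
           "l1d_hits": 290_000, "l1d_misses": 10_000}
    print(derive_stats(raw))
```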
WattWatcher Toolkit Overview
[Figure: WattWatcher toolkit. SUT side: application, operating system, HW event interface, and the WattWatcher Collector, configured by a counter descriptor. Analyzer side: the WattWatcher Analyzer feeds hardware events and a processor descriptor through an input formatter, a customized McPAT, and an output formatter to produce the power analysis. The Collector sends hardware events to the Analyzer over a network connection; a WattWatcher Controller provides admin control.]
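The Analyzer's input formatter combines a processor descriptor with one interval of hardware events to produce McPAT input. The sketch below assumes McPAT's XML layout of nested `component` elements carrying `param` and `stat` children with `name`/`value` attributes; the component ids and the specific parameter and stat names are illustrative assumptions, and a real McPAT input file needs many more entries.

```python
import xml.etree.ElementTree as ET


def format_mcpat_input(descriptor: dict[str, object],
                       stats: dict[str, object]) -> str:
    """Build a minimal McPAT-style XML document for one core.

    Only an illustrative subset is emitted; a real McPAT input needs many
    more components, params, and stats.
    """
    root = ET.Element("component", id="root", name="root")
    system = ET.SubElement(root, "component", id="system", name="system")
    core = ET.SubElement(system, "component", id="system.core0", name="core0")
    # Static configuration from a (hypothetical) processor descriptor.
    for name, value in descriptor.items():
        ET.SubElement(core, "param", name=name, value=str(value))
    # Dynamic activity from one sampling interval of hardware events.
    for name, value in stats.items():
        ET.SubElement(core, "stat", name=name, value=str(value))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(format_mcpat_input({"clock_rate": 2200},
                             {"total_cycles": 22_000_000,
                              "int_instructions": 15_000_000}))
```

Keeping the descriptor (static) and the per-interval stats (dynamic) separate means only the stats need to be regenerated for each sample.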
WattWatcher Calibration
[Figure: WattWatcher and RAPL power (Watts) and percentage error versus frequency (MHz, 800-2200).]
▪ McPAT typically underestimates power [10]
▪ A small amount of coarse-grained calibration is required
▪ More sophisticated corrections are available [13]
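One simple form of coarse-grained calibration, assumed here purely for illustration, is a single least-squares scale factor and offset that map raw McPAT estimates onto RAPL readings collected during a calibration run. WattWatcher's exact procedure may differ; PowerTrain [13] is an example of a more sophisticated, learning-based correction.

```python
def fit_linear_calibration(estimated: list[float],
                           measured: list[float]) -> tuple[float, float]:
    """Least-squares fit of measured ~= scale * estimated + offset."""
    n = len(estimated)
    if n != len(measured) or n < 2:
        raise ValueError("need at least two paired samples")
    mean_e = sum(estimated) / n
    mean_m = sum(measured) / n
    cov = sum((e - mean_e) * (m - mean_m) for e, m in zip(estimated, measured))
    var = sum((e - mean_e) ** 2 for e in estimated)
    if var == 0:
        raise ValueError("estimates must vary across samples")
    scale = cov / var
    offset = mean_m - scale * mean_e
    return scale, offset


def apply_calibration(estimate_w: float, scale: float, offset: float) -> float:
    """Correct a raw model estimate using the fitted coefficients."""
    return scale * estimate_w + offset


if __name__ == "__main__":
    raw_mcpat = [10.0, 12.0, 14.0]   # made-up uncalibrated estimates (W)
    rapl = [13.0, 15.4, 17.8]        # made-up RAPL readings (W)
    scale, offset = fit_linear_calibration(raw_mcpat, rapl)
    print(scale, offset, apply_calibration(11.0, scale, offset))
```

A scale factor above 1 would correspond to the underestimation noted in the first bullet.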
Verification
▪ Intel Sandy Bridge laptop
– Intel Core i7-2720QM
– 32 nm process
– 45 W TDP
▪ Workloads: SPECfp + SPECint [11], PARSEC [12]
▪ Compared against RAPL counters
▪ Other SUTs: Intel Haswell and AMD Piledriver
▪ All results use the coarse-grained calibration described previously
Verification
[Figure: Power (Watts) and percentage error over time for bwaves (SPECfp) and xalancbmk (SPECint).]
▪ Pearson correlation coefficients (0.982, 0.995)
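The Pearson correlation coefficient measures how closely the shape of the estimated power trace follows the measured (RAPL) trace, independent of a constant offset or scale. A minimal pure-Python computation over two equally sampled traces is sketched below; the sample values are made up.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient r between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("series must not be constant")
    return cov / (sx * sy)


if __name__ == "__main__":
    wattwatcher = [12.1, 14.0, 15.2, 13.3]  # made-up estimates (W)
    rapl = [12.5, 14.6, 15.9, 13.6]         # made-up measurements (W)
    print(f"r = {pearson(wattwatcher, rapl):.3f}")
```

Because r ignores scale and offset, it complements absolute-error metrics rather than replacing them.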
Verification
[Figure: Per-workload MAE and RMSE (Watts) and MAPE (percentage), shown in two panels.]
▪ MAPE over all workloads is 2.67%
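For paired estimates and RAPL measurements, MAE and RMSE are expressed in watts, while MAPE is the mean absolute error relative to the measured value, in percent. A minimal sketch, assuming RAPL is the reference and every sample is weighted equally (the slides do not state the averaging scheme):

```python
import math


def error_metrics(estimated: list[float],
                  measured: list[float]) -> dict[str, float]:
    """MAE and RMSE (watts) and MAPE (percent), with `measured` as reference."""
    n = len(estimated)
    if n == 0 or n != len(measured):
        raise ValueError("need two non-empty series of equal length")
    if any(m == 0 for m in measured):
        raise ValueError("MAPE is undefined when a measured value is zero")
    errors = [e - m for e, m in zip(estimated, measured)]
    mae = sum(abs(d) for d in errors) / n
    rmse = math.sqrt(sum(d * d for d in errors) / n)
    mape = 100.0 * sum(abs(d) / abs(m) for d, m in zip(errors, measured)) / n
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape}


if __name__ == "__main__":
    print(error_metrics([9.0, 22.0], [10.0, 20.0]))
```

RMSE weights large errors more heavily than MAE, so reporting both shows whether error comes from a few outliers or is spread evenly.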
Case Study 1: Per-Core Power Measurements
[Figure: canneal power (Watts) over runtime (s): all-cores aggregate and per-core traces for Core 0-Core 3, with Phases 1-3 marked.]
▪ canneal workload (PARSEC)
▪ Per-core and aggregate breakdown
▪ RAPL cannot provide a core-level breakdown
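Because hardware counters are read per core, a counter-driven model can be evaluated separately for each core, whereas RAPL reports energy only for coarser domains such as the whole package or all cores together. The sketch below assumes per-core estimates are already computed and that shared components (labeled L3 and MC here, following the breakdown legends) are modeled once per package; this split and the numbers are illustrative assumptions.

```python
def package_power(per_core_w: dict[str, float],
                  shared_w: dict[str, float]) -> dict[str, float]:
    """Combine per-core estimates with shared components into a package view.

    per_core_w: hypothetical per-core model outputs, e.g. {"core0": 3.0}
    shared_w:   hypothetical shared (uncore) components, e.g. {"L3": 2.0}
    """
    cores_total = sum(per_core_w.values())
    shared_total = sum(shared_w.values())
    return {
        **per_core_w,  # per-core detail that a package-level reading lacks
        "cores_total": cores_total,
        "shared_total": shared_total,
        "package": cores_total + shared_total,
    }


if __name__ == "__main__":
    view = package_power({"core0": 3.0, "core1": 4.5, "core2": 2.5, "core3": 2.0},
                         {"L3": 2.0, "MC": 1.0})
    for key, watts in view.items():
        print(f"{key}: {watts:.1f} W")
```

The aggregate can then be compared against a package-level RAPL reading, while the per-core entries carry information RAPL cannot provide.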
Case Study 2: Big Data Workloads
[Figure: Leakage power and dynamic power (Watts) alongside CPU utilization (%).]
▪ Big data workloads: Hadoop
Case Study 3: Functional Unit Breakdowns
[Figure: Total power (Watts) and each functional unit's percentage of total power over runtime (s), for Fetch, IC, ALU, LS, L2, DC, and OoO.]
▪ Word count power breakdown by functional unit
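The percentage view in this breakdown normalizes each functional unit's power by the total at every sample. A minimal sketch of that normalization for a single sample follows; the unit labels mirror the plot legend, while the wattages are made-up placeholders.

```python
def breakdown_percent(unit_watts: dict[str, float]) -> dict[str, float]:
    """Each unit's share of total power at one sample, in percent."""
    total = sum(unit_watts.values())
    if total <= 0:
        raise ValueError("total power must be positive")
    return {unit: 100.0 * w / total for unit, w in unit_watts.items()}


if __name__ == "__main__":
    sample = {"Fetch": 6.0, "IC": 3.0, "ALU": 9.0, "LS": 6.0,
              "L2": 9.0, "DC": 3.0, "OoO": 9.0}  # placeholder watts
    for unit, pct in breakdown_percent(sample).items():
        print(f"{unit}: {pct:.1f}%")
```

Applying this per sample over the run yields a stacked percentage plot, with the total wattage kept as a separate series.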
Conclusion
▪ WattWatcher fills an important role among power estimation techniques
– Real-time results on real hardware
– Highly configurable models
– Minimal calibration required
– Verified across different processors and vendors
  - MAPE = 2.67% averaged over all benchmarks
– Illustrated through a number of interesting case studies
Thank you!
References
[1] R. Ge et al., "PowerPack: Energy profiling and analysis of high-performance systems and applications," IEEE Transactions on Parallel and Distributed Systems, vol. 21, no. 5, pp. 658–671, May 2010.
[2] W. Bircher and L. John, "Complete system power estimation: A trickle-down approach based on performance events," in ISPASS, Apr. 2007, pp. 158–168.
[3] G. Contreras and M. Martonosi, "Power prediction for Intel XScale processors using performance monitoring unit events," in ISLPED, 2005.
[4] S. Gurumurthi et al., "Using complete machine simulation for software power estimation: The SoftWatt approach," in HPCA, 2002.
[5] C. Isci and M. Martonosi, "Runtime power monitoring in high-end processors: Methodology and empirical data," in MICRO, Dec. 2003, pp. 93–104.
[6] R. Joseph and M. Martonosi, "Run-time power estimation in high performance microprocessors," in ISLPED, 2001.
[7] "AMD BIOS and Kernel Developer's Guide for AMD Family 15h Models 00h-0Fh Processors," http://support.amd.com/TechDocs/.
[8] J. Dongarra et al., "Energy footprint of advanced dense numerical linear algebra using tile algorithms on multicore architectures," in CGC, Nov. 2012, pp. 274–281.
[9] D. Brooks, V. Tiwari, and M. Martonosi, "Wattch: A framework for architectural-level power analysis and optimizations," in ISCA, June 2000, pp. 83–94.
[10] S. Li et al., "McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures," in MICRO, Dec. 2009, pp. 469–480.
[11] J. L. Henning, "SPEC CPU2006 benchmark descriptions," SIGARCH Comput. Archit. News.
[12] C. Bienia, "Benchmarking modern multiprocessors," Ph.D. dissertation, Princeton University, Jan. 2011.
[13] W. Lee et al., "PowerTrain: A learning-based calibration of McPAT power models," in ISLPED, 2015.