WattWatcher: Fine-Grained Power Estimation for Emerging Workloads



SLIDE 1

SBAC-PAD 2015

WattWatcher: Fine-Grained Power Estimation for Emerging Workloads

Michael LeBeane, Jee Ho Ryoo, Reena Panda, Lizy K. John
The University of Texas at Austin
jr45842@utexas.edu

SLIDE 2

Motivation

▪ Understanding power at a fine granularity is still a challenge
– Thermal effects, DVFS policies
▪ Not always easy for researchers
▪ Simple and detailed power estimation is extremely useful
▪ Some methods currently available...

[Figure: Watts over Time for Core 0, with functional-unit power breakdowns at two sample points. First breakdown: OTHER 3%, MC 6%, L3 16%, L2 31%, LS 13%, Fetch 9%, ALU 9%, OoO 13%. Second breakdown: OTHER 4%, MC 8%, L3 12%, L2 17%, LS 17%, Fetch 12%, ALU 13%, OoO 17%.]

Michael LeBeane, 10/20/2015

SLIDE 3

Currently Available Methods

▪ Direct Measurements [1]
– Hardware probes
▪ Curve Fitting [2,3,4,5,6]
– Machine learning models
▪ Power PMCs [6,7]
– E.g. Intel RAPL
▪ Simulators [9,10,11,12]
– E.g. McPAT plugins to simulation environment

SLIDE 4

Design Space

▪ Diverse design space to explore (subjective taxonomy)

Method               Accuracy  Detail  Frequency  Cost ($)  Speed
Direct Measurements  ++        -       ~us-ms     -         Fast
Power PMCs           +         -       ~ms        =         Fast
Curve Fitting        =         =       ~us-s      +         Fast/Offline Training
Simulators           +         +       ~ns        +         Slow

SLIDE 5

Design Space

▪ Diverse design space to explore (subjective taxonomy)

Method               Accuracy  Detail  Frequency  Cost ($)  Speed
Direct Measurements  ++        -       ~us-ms     -         Fast
Power PMCs           +         -       ~ms        =         Fast
Curve Fitting        =         =       ~us-s      +         Fast/Offline Training
Simulators           +         +       ~ns        +         Slow
WattWatcher          +         +       ~ms        +         Fast

▪ WattWatcher offers functional-unit power breakdowns in real-time, on real hardware

SLIDE 6

WattWatcher Overview

▪ Online / Real-Time
▪ McPAT-based
▪ Configurable
▪ Low Overhead

[Figure: overview block diagram. A Workload runs on the SUT; Performance Counters, an Access Estimator, and the System Configuration feed a Configurable Power Model covering ALU, Registers, L1$, L2$, L3$, OoO, Fetch Logic, and Package. Outputs shown are Power over Time per core (legend: Core 0, Core 2, Core 3, Core 4) and a functional-unit breakdown: OTHER 4%, MEM 8%, L3 12%, L2 17%, LS 17%, FE 12%, ALU 13%, OoO 17%.]


SLIDE 8

WattWatcher Hardware Events

Category    Hardware Events
General     Context Switches, Frequency, Voltage, Cycles
Frontend    Branch Mispredictions, IC Misses, iTLB Misses, Uops Issued
LS/Caches   L1 Misses/Hits, L2 Misses, LLC Misses, dTLB Misses
Execution   FP Scalar, FP Packed, FP Width
Retirement  Uops Retired

▪ Hardware performance counters feed McPAT
▪ Some low-level McPAT events unavailable from counters
▪ Unavailable statistics estimated from available counters
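
The slide does not say which statistics are estimated or how. As a minimal sketch of the idea only, the Python below derives access counts that have no direct counter from the miss and hit events listed in the table. The field names, the one-miss-equals-one-access rule, and the treatment of L3 as the last-level cache are all assumptions made for illustration, not the toolkit's actual estimator.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    """Raw counts for one core over one sampling interval (hypothetical field names)."""
    cycles: int
    uops_issued: int
    uops_retired: int
    l1_hits: int
    l1_misses: int
    l2_misses: int
    llc_misses: int


def estimate_accesses(s: CounterSample) -> dict:
    """Derive statistics that have no direct counter.

    Assumption: every miss at one cache level becomes exactly one access
    at the next level (prefetches and writebacks are ignored), and the
    L3 is the last-level cache.
    """
    l1_accesses = s.l1_hits + s.l1_misses
    l2_accesses = s.l1_misses
    l3_accesses = s.l2_misses
    return {
        "l1_accesses": l1_accesses,
        "l2_accesses": l2_accesses,
        "l2_hits": max(l2_accesses - s.l2_misses, 0),
        "l3_accesses": l3_accesses,
        "l3_hits": max(l3_accesses - s.llc_misses, 0),
        "memory_accesses": s.llc_misses,
        # Uops issued but never retired approximate wrong-path work.
        "squashed_uops": max(s.uops_issued - s.uops_retired, 0),
    }
```

A real estimator would need per-microarchitecture rules, since counter semantics differ between vendors and generations.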

SLIDE 9

WattWatcher Toolkit Overview

[Figure: toolkit block diagram. On the SUT: Application, Operating System, HW Event Interface, and the WattWatcher Collector, configured by a Counter Descriptor. Hardware Events travel over a Network Connection to the Analyzer. On the Analyzer: the WattWatcher Analyzer takes Hardware Events and a Processor Descriptor through an Input Formatter, a Customized McPAT, and an Output Formatter to produce the Power Analysis. A WattWatcher Controller provides Admin Control.]
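
To make the Input Formatter stage concrete, here is a hedged Python sketch that turns one core's event counts into a McPAT-style XML fragment. The `<component>` and `<stat name="..." value="..."/>` layout follows McPAT's XML input format; the function name and the way events are passed in are hypothetical, and the stat names in the usage note are examples rather than the toolkit's actual mapping.

```python
import xml.etree.ElementTree as ET


def format_mcpat_stats(core_id: int, events: dict) -> str:
    """Render one core's event counts as a McPAT-style <component> element.

    `events` maps a stat name to a count for the current sampling
    interval. Which names a given McPAT build expects is outside the
    scope of this sketch.
    """
    component = ET.Element(
        "component", id=f"system.core{core_id}", name=f"core{core_id}"
    )
    # Sort for deterministic output across runs.
    for name, value in sorted(events.items()):
        ET.SubElement(component, "stat", name=name, value=str(int(value)))
    return ET.tostring(component, encoding="unicode")
```

For example, `format_mcpat_stats(0, {"total_cycles": 1000, "branch_mispredictions": 12})` yields a `system.core0` component with two `stat` children, which could then be spliced into a full processor description.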


SLIDE 13

WattWatcher Calibration

[Figure: WattWatcher and RAPL power (Watts) and the Percentage Error between them, plotted against Frequency from 800 to 2200, with a second row of axis labels running from 0.7855 to 1.121.]

▪ McPAT typically underestimates power [10]
▪ Small amount of coarse-grained calibration required
▪ More sophisticated corrections are available [13]
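
The slides do not specify the form of the coarse-grained calibration. One simple possibility, shown here purely as an assumption, is a single linear correction (scale and offset) fitted by least squares between raw model estimates and reference measurements such as RAPL readings. The function names are hypothetical.

```python
def fit_linear_calibration(estimated, measured):
    """Least-squares fit of measured ~= scale * estimated + offset.

    `estimated` holds raw model outputs in watts; `measured` holds the
    reference readings for the same intervals.
    """
    n = len(estimated)
    if n < 2 or n != len(measured):
        raise ValueError("need at least two paired samples")
    mean_x = sum(estimated) / n
    mean_y = sum(measured) / n
    var_x = sum((x - mean_x) ** 2 for x in estimated)
    if var_x == 0:
        raise ValueError("estimated values must not all be equal")
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(estimated, measured))
    scale = cov_xy / var_x
    offset = mean_y - scale * mean_x
    return scale, offset


def apply_calibration(estimate, scale, offset):
    """Correct a single raw estimate with the fitted parameters."""
    return scale * estimate + offset
```

Fitting once on a small calibration run and reusing the two parameters afterwards keeps the runtime cost negligible, at the price of ignoring workload-dependent error.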

SLIDE 14

Verification

▪ Intel Sandy Bridge Laptop
– Intel i7 2720QM
– 32nm Process
– 45W TDP
▪ Workloads: SPECfp + SPECint [11], PARSEC [12]
▪ Compared against RAPL counters
▪ Other SUTs: Intel Haswell and AMD Piledriver
▪ All results use previous coarse-grained calibration

SLIDE 15

Verification

[Figure: two panels, bwaves (SPECfp) and xalancbmk (SPECint), each with Watts and Percentage Error axes.]

▪ Pearson correlation coefficient (0.982, 0.995)
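
For reference, the Pearson correlation coefficient between two power traces can be computed as in the sketch below. It assumes the two traces (for example, estimated and reference watts) are already aligned sample by sample; the function name is illustrative.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (norm_x * norm_y)
```

A coefficient near 1 means the estimate tracks the shape of the reference trace closely, but it says nothing about a constant offset or scale error, which is why absolute error metrics are reported separately.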

SLIDE 16

Verification

[Figure: two bar charts showing MAE and RMSE (Watts) and MAPE (Percentage).]

▪ MAPE over all workloads is 2.67%
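
The three error metrics named in the charts have standard definitions, sketched below in Python. The function name and the dictionary return shape are choices made for this example; reference values are assumed to be nonzero so that MAPE is defined.

```python
import math


def error_metrics(estimated, reference):
    """Return MAE and RMSE (same units as the inputs) and MAPE (percent)."""
    if not estimated or len(estimated) != len(reference):
        raise ValueError("need non-empty, equal-length sequences")
    errors = [e - r for e, r in zip(estimated, reference)]
    n = len(errors)
    mae = sum(abs(d) for d in errors) / n
    rmse = math.sqrt(sum(d * d for d in errors) / n)
    # MAPE divides each error by its reference value, so zeros are not allowed.
    mape = 100.0 * sum(abs(d) / abs(r) for d, r in zip(errors, reference)) / n
    return {"mae": mae, "rmse": rmse, "mape": mape}
```

RMSE weights large errors more heavily than MAE, so a gap between the two hints at occasional large mispredictions rather than a uniform bias.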

SLIDE 17

Case Studies 1: Per Core Power Measurements

[Figure: Watts over Runtime (s) for All Cores Aggregate and for Core 0, Core 1, Core 2, and Core 3 individually, with Phase 1, Phase 2, and Phase 3 marked.]

▪ canneal workload (PARSEC)
▪ Per core and aggregate breakdown
▪ RAPL cannot provide core level breakdown

SLIDE 18

Case Studies 2: Big Data Workloads

[Figure: Leakage Power and Dynamic Power (Watts) plotted with % CPU Utilization.]

▪ Big Data Workloads: Hadoop

SLIDE 19

Case Studies 3: Functional Unit Breakdowns

[Figure: Percentage of Total Watts by unit (Fetch, IC, ALU, LS, L2, DC, OoO) and Total Watts, over Runtime (s).]

▪ Word count power breakdown
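
A breakdown like this one reduces to normalizing per-unit power by the total at each sample. The sketch below shows that arithmetic; the function name and the example unit labels are illustrative, not taken from the toolkit.

```python
def breakdown_percentages(unit_watts):
    """Convert per-unit power in watts to each unit's share of the total, in percent."""
    total = sum(unit_watts.values())
    if total <= 0:
        raise ValueError("total power must be positive")
    return {unit: 100.0 * watts / total for unit, watts in unit_watts.items()}
```

For instance, `breakdown_percentages({"ALU": 1.0, "L2": 3.0})` gives 25% for ALU and 75% for L2.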

SLIDE 20

Conclusion

▪ WattWatcher fills an important role in power estimation techniques
– Real time results on real hardware
– Highly configurable models
– Minimal calibration required
– Verified over different processors and vendors
  • MAPE = 2.67% averaged over all benchmarks
– Illustrated over a number of interesting case studies

Thank you!

SLIDE 21

References

▪ [1] R. Ge et al., “PowerPack: Energy profiling and analysis of high performance systems and applications,” IEEE Transactions on Parallel and Distributed Systems, vol. 21, no. 5, pp. 658–671, May 2010.
▪ [2] W. Bircher and L. John, “Complete system power estimation: A trickledown approach based on performance events,” in ISPASS, April 2007, pp. 158–168.
▪ [3] G. Contreras and M. Martonosi, “Power prediction for Intel XScale processors using performance monitoring unit events,” in ISLPED ’05, 2005.
▪ [4] S. Gurumurthi et al., “Using complete machine simulation for software power estimation: The SoftWatt approach,” in HPCA, 2002.
▪ [5] C. Isci and M. Martonosi, “Runtime power monitoring in high-end processors: Methodology and empirical data,” in MICRO, Dec 2003, pp. 93–104.
▪ [6] R. Joseph and M. Martonosi, “Run-time power estimation in high performance microprocessors,” in ISLPED, 2001.
▪ [7] “AMD BIOS and Kernel Developer’s Guide for AMD Family 15h Models 00h-0Fh Processors,” http://support.amd.com/TechDocs/.
▪ [8] J. Dongarra et al., “Energy footprint of advanced dense numerical linear algebra using tile algorithms on multicore architectures,” in CGC, Nov 2012, pp. 274–281.
▪ [9] D. Brooks, V. Tiwari, and M. Martonosi, “Wattch: A framework for architectural-level power analysis and optimizations,” in ISCA, June 2000, pp. 83–94.
▪ [10] S. Li et al., “McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures,” in MICRO, Dec 2009, pp. 469–480.
▪ [11] J. L. Henning, “SPEC CPU2006 benchmark descriptions,” SIGARCH Comput. Archit. News.
▪ [12] C. Bienia, “Benchmarking modern multiprocessors,” Ph.D. dissertation, Princeton University, January 2011.
▪ [13] W. Lee et al., “PowerTrain: A learning-based calibration of McPAT power models,” in ISLPED, 2015.