Power-performance modeling, analyses and challenges Kirk W. Cameron - - PowerPoint PPT Presentation
Power-performance modeling, analyses and challenges
Kirk W. Cameron Computer Science Virginia Tech
This material is based upon work supported by the National Science Foundation under Grant Nos. 0910784 and 0905187.
11th Charm++ Workshop
My Green HPC Upbringings
- Over $6M related federal funding (since ‘04)
(NSF, DOE, SBIR, IBM, Intel, and others)
- EPA Energy Star for servers (since ‘05)
- SPECPower Founding Member (since ‘05)
- Co-founder Green500 (since ‘06)
- Green IT Columnist (IEEE Computer)
- CEO and Founder, MiserWare Inc. (since ‘07)
The way we were (circa 2003)
Source: CAREER: High-performance, Power-aware Computing (K. Cameron, NSF CCF-0347683, 3/1/04-2/28/09)
Getting there…
From 2007-2012…
- 6x ↑ Flops/watt
- ~2.5x ↑ power consumption
Projections for 2012-2019…
- 2100 to ~15,000 MFlops/Watt
- 66 kW for 1 Petaflop system
- 66 MW for 1 Exaflop system
- Need 50,000 MFlops/Watt for 1 Exaflop @ 20 MW by 2019!!!
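The power figures in these projections follow from dividing sustained performance by energy efficiency. A minimal sketch of that arithmetic is below; the function names are illustrative and not taken from the talk.

```python
def system_power_watts(flops: float, mflops_per_watt: float) -> float:
    """Power needed to sustain `flops` at an efficiency given in MFlops/Watt."""
    return flops / (mflops_per_watt * 1e6)


def required_mflops_per_watt(flops: float, power_budget_watts: float) -> float:
    """Efficiency needed to sustain `flops` within a fixed power budget."""
    return flops / power_budget_watts / 1e6


PETAFLOP = 1e15
EXAFLOP = 1e18

# Power at the projected ~15,000 MFlops/Watt
print(system_power_watts(PETAFLOP, 15_000) / 1e3, "kW for 1 Petaflop")
print(system_power_watts(EXAFLOP, 15_000) / 1e6, "MW for 1 Exaflop")

# Efficiency needed to fit 1 Exaflop into a 20 MW budget
print(required_mflops_per_watt(EXAFLOP, 20e6), "MFlops/Watt")
```

At 15,000 MFlops/Watt the results round to the 66 kW and 66 MW quoted on the slide, and a 20 MW budget works out to the 50,000 MFlops/Watt target.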
Conclusion: We need help.
What do we need…?
Insight
Where does energy go?
Understanding
How does energy scale?
Action
What can we do?
Power-Performance Efficiency
- Model & Optimize Performance
- Improve Power-Performance Efficiency
- Model Effects of Power
- Profile & Evaluate Power
- Optimize Effects of Power
[SC04], [SC05], [IPDPS 2005], [IJHPCA 2009], [TPDS 2010]
How can we…help you…help us…
Virginia Tech
“You can only manage what you can measure.”
Peter Drucker, writer
Measuring power is “tough”
What is PowerPack?
- Modularized measurement software
- HW sensors (component, room, etc.)
- Fine-grain API (function-level)
- Analytics
[IEEE Computer 38(11) 2005, TPDS 21(5) 2010, http://scape.cs.vt.edu/software/]
SystemG Supercomputer
Power Profiles – Single Node
PowerPack Function-level Profiling
[IEEE Computer 38(11) 2005, TPDS 21(5) 2010, http://scape.cs.vt.edu/software/]
Who uses PowerPack? SystemG?
- Texas A&M (Taylor et al.)
- UTenn-Knoxville (Moore, Dongarra, et al.)
- Oxford University
- Lawrence Livermore National Lab
- Pacific Northwest National Lab
- Oak Ridge National Lab
- University of Florida
- KAUST (Saudi Arabia)
- University of Madrid (Spain)
- UC Berkeley
...and many others
February 15, 2012 SIAM PP, Savannah, GA 17 / 19
[Figure: Power consumption over time for matrix inverse; series: LAPACK, MKL, PLASMA. Sources: Piotr Luszczek, Hatem Ltaief]
Bidiagonal Reduction: CPU Power
[Figure: CPU power traces for PLASMA and LAPACK]
PowerPack 4.0 (accelerator support)
[Figure: Power (watts) vs. time (0.02-second samples) for CPU, GPU, MEM, and MB while running convolutionTexture_15360_32; annotated phases: CudaMalloc (Data Movement), convolutionRow, convolutionColumn]
PowerPack 4.0 (API+accelerator)
[Figure: two panels plotting watts over time, “PDU Power Measurements” and “Granola Enterprise Power Estimates”; series: System, CPU, Monitor]
Granola software gives more detail… with the same accuracy as expensive hardware
Commercial grade measurement…
Granola Enterprise (Freeware)
“To know is to understand.” Aristotle
Power-Performance Efficiency
- Model & Optimize Performance
- Improve Power-Performance Efficiency
- Model Effects of Power
- Profile & Evaluate Power
- Optimize Effects of Power
[SC 2004], [SC 2005], [IPDPS 2011], [IPDPS 2013]
Early Green HPC questions…
- What happens to energy at scale?
- How can we scale energy/perf efficiently?
Amdahl’s Law (for energy?)
$$S_N(w) = \frac{T_1(w)}{T_N(w)} = \frac{1}{(1 - FE) + \frac{FE}{SE}}$$
- Classical speedup
– Amdahl’s law for 1 enhancement (parallelism)
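Classical Amdahl speedup for a single enhancement is 1 / ((1 - FE) + FE / SE), where FE is the fraction of the work that benefits and SE is the speedup of that fraction. A minimal sketch follows, treating parallelism on N nodes as the enhancement (SE = N); the function and parameter names are illustrative.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when `fraction_enhanced` of the work runs
    `speedup_enhanced` times faster and the rest is unchanged."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be in [0, 1]")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# With 90% of the work parallelizable, speedup saturates as N grows.
for nodes in (1, 2, 4, 8, 16, 1024):
    print(nodes, round(amdahl_speedup(0.9, nodes), 2))
```

Note that this model has no overhead term: the serial fraction simply stays constant as N grows.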
[Figure: time vs. degree of parallelism]
Time ~ energy. Right?
So we only get energy savings by reducing time. Right? Then why does power management (PM, e.g. DVFS) save energy? And sometimes without affecting time?
Amdahl = no overhead
But overhead is the key to saving energy without loss!
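One way to see why overhead enables savings is a toy model in which only CPU-bound phases slow down when the clock drops, while waiting phases (communication, memory, I/O) take the same time at any frequency. The phase split, the cubic dynamic-power term, and every number below are illustrative assumptions, not measurements from the talk.

```python
F0 = 2400.0  # hypothetical base frequency in MHz


def power_watts(f_mhz: float, p_static: float = 50.0, p_dynamic: float = 50.0) -> float:
    """Assumed power model: constant static part plus dynamic part scaling as f^3."""
    return p_static + p_dynamic * (f_mhz / F0) ** 3


def phase(seconds_at_f0: float, cpu_bound: bool, f_mhz: float):
    """Return (time, energy) for one phase executed at frequency f_mhz."""
    # Only CPU-bound work stretches when the clock is lowered.
    t = seconds_at_f0 * (F0 / f_mhz) if cpu_bound else seconds_at_f0
    return t, power_watts(f_mhz) * t


def total(phases):
    """Sum time and energy over a list of (seconds_at_f0, cpu_bound, f_mhz)."""
    results = [phase(*p) for p in phases]
    return sum(t for t, _ in results), sum(e for _, e in results)


# 2 s of computation and 8 s of waiting, all at the base frequency.
baseline = total([(2.0, True, F0), (8.0, False, F0)])
# Same run, but the clock is halved during the waiting phase only.
scaled = total([(2.0, True, F0), (8.0, False, 1200.0)])

print("baseline (s, J):", baseline)
print("DVFS during waits (s, J):", scaled)
```

In this sketch the execution time is identical in both runs, yet the scaled run uses less energy, because the frequency is lowered only while the CPU is not on the critical path.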
Energy
Power-Aware Speedup
- Definition
– Speedup:
$$S_N(w, f) = \frac{T_1(w, f_0)}{T_N(w, f) + O(w, f)}$$
– w: workload
– N: number of nodes
– f: the clock frequency; f0 is the base value
– T1(w, f0): sequential execution time at base frequency f0
– TN(w, f): parallel execution time on N processors at frequency f
[IPDPS 2007]
Bounding Efficiency at Scale
- Energy/performance optimal system configuration
– # processors: 256 – CPU frequency: 1200MHz
[Figure: EDP values for LU, in 10^4 joules × seconds, plotted against number of processors (8 to 1024) and CPU frequency (600 to 1400 MHz)]
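The energy-delay product (EDP) is energy multiplied by execution time, so the optimal configuration is the (processors, frequency) pair that minimizes it. A minimal sketch of that selection is below. The sample table holds placeholder values chosen for illustration; they are not the LU measurements from the figure.

```python
from typing import Dict, Tuple

Config = Tuple[int, int]  # (number of processors, CPU frequency in MHz)


def edp(energy_joules: float, time_seconds: float) -> float:
    """Energy-delay product in joule-seconds."""
    return energy_joules * time_seconds


def best_config(measurements: Dict[Config, Tuple[float, float]]) -> Config:
    """Return the configuration with the smallest EDP.

    `measurements` maps each configuration to (energy in joules, time in seconds).
    """
    return min(measurements, key=lambda cfg: edp(*measurements[cfg]))


# Placeholder (energy, time) samples, NOT data from the talk.
samples = {
    (128, 1400): (9000.0, 12.0),
    (256, 1200): (8000.0, 10.0),
    (512, 1000): (11000.0, 9.0),
}

print(best_config(samples))
```

Because EDP weights time and energy equally, the fastest configuration in the table is not necessarily the one selected.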
Early Green HPC questions…
- What happens to energy at scale?
- How can we scale efficiently?
Iso-energy-efficiency
Grama et al.: performance efficiency can be held constant if we increase both the number of processors and the problem size simultaneously.
Algorithm + Scale → fixed performance
Iso-energy-efficiency
Algorithm + Scale + Power Modes → (power, performance)
– Requires accurate performance model
– Requires accurate power model
– Must be accurate, useful, usable
General form of our Iso-energy-efficiency model:
$$EE = \frac{E_1}{E_p} = \frac{E_1}{E_1 + E_o} = \frac{1}{1 + E_o / E_1}$$
– EE: system-wide energy efficiency
– E_1 (baseline): total energy consumption of sequential execution on one processor
– E_p: the total energy consumption of parallel execution for a given application on p parallel processors
– E_o: the additional energy overhead required for parallel execution and running extra system components
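Reading the definitions above as "efficiency is sequential energy divided by parallel energy, and parallel energy is sequential energy plus overhead", the model can be sketched in a few lines. The function and variable names are assumptions made for illustration, and the example values are arbitrary.

```python
def energy_efficiency(e_sequential: float, e_overhead: float) -> float:
    """System-wide energy efficiency under the reading stated above.

    e_sequential: energy of the sequential run on one processor (joules)
    e_overhead:   additional energy spent because the run is parallel (joules)
    """
    if e_sequential <= 0.0:
        raise ValueError("e_sequential must be positive")
    if e_overhead < 0.0:
        raise ValueError("e_overhead must be non-negative")
    e_parallel = e_sequential + e_overhead
    return e_sequential / e_parallel


# Arbitrary example: 100 J of useful work plus 25 J of parallel overhead.
print(energy_efficiency(100.0, 25.0))
```

Holding this ratio constant while the processor count grows means the overhead energy may grow only as fast as the baseline energy, which is why problem size is scaled together with system size.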
Iso-energy-efficiency Derivation
[IPDPS 2011],[IPDPS 2013]
[Figure, two panels (vertical axis: energy efficiency): FT’s system-wide energy efficiency with p and n as variables; FT’s system-wide energy efficiency with p and f as variables]
Maintaining Efficiency in 3-D FFT
- Problem size scaling is effective in maintaining overall system energy efficiency
- CPU frequency scaling only slightly improves EE
- But the effect of CPU clock frequency on on-chip workload diminishes as system size scales up
Commercial grade management…
Granola (http://grano.la)
- Launched Earth Day 2010
- Free home version
- 350K+ Downloads so far…
- 165+ Countries
- Uses: laptops, PCs, servers
- Performance Guarantees
Patents: [USPTO: #13/061,565] [UK: #GB2476606B]
Fatbatt (http://fatbatt.com)
- Launched March 2013
- Free ad-version
Where do we go from here?
We need lots of help.
Disruptive vs. Incremental.
Silver bullet is unlikely.
Commodity matters. Markets matter. Tools matter.
Wanted: Major catastrophe.
Custom system is likely the only answer by 2019.
Energy wall?
“Victory” is inevitable when you change the game.