Power-performance modeling, analyses and challenges, Kirk W. Cameron


11th Charm++ Workshop: Power-performance modeling, analyses and challenges. Kirk W. Cameron, Computer Science, Virginia Tech. This material is based upon work supported by the National Science Foundation under Grant No. 0910784 and 0905187.


slide-1
SLIDE 1
slide-2
SLIDE 2

11th Charm++ Workshop:

Power-performance modeling, analyses and challenges

Kirk W. Cameron, Computer Science, Virginia Tech

This material is based upon work supported by the National Science Foundation under Grant No. 0910784 and 0905187.

slide-3
SLIDE 3

My Green HPC Upbringings

  • Over $6M in related federal funding (since ’04) (NSF, DOE, SBIR, IBM, Intel, and others)
  • EPA Energy Star for servers (since ’05)
  • SPECPower Founding Member (since ’05)
  • Co-founder Green500 (since ’06)
  • Green IT Columnist (IEEE Computer)
  • CEO and Founder, MiserWare Inc. (since ’07)
slide-4
SLIDE 4

The way we were (circa 2003)

Source: CAREER: High-performance, Power-aware Computing (K. Cameron, NSF CCF-0347683, 3/1/04-2/28/09)
slide-5
SLIDE 5

Getting there…

From 2007-2012…

  • 6x ↑ Flops/watt
  • ~2.5x ↑ power consumption

Projections for 2012-2019…

  • 2,100 to ~15,000 MFlops/Watt
  • 66 kW for 1 Petaflop system
  • 66 MW for 1 Exaflop system
  • Need 50,000 MFlops/Watt for 1 Exaflop @ 20 MW by 2019!!!
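The projection figures on this slide follow from dividing sustained performance by efficiency. A minimal Python sketch of that arithmetic is below; the function and constant names are illustrative and do not come from the talk.

```python
# Back-of-the-envelope check of the projection arithmetic.
# All names here are illustrative, not taken from the slides.

PETAFLOP = 1e15  # floating-point operations per second
EXAFLOP = 1e18


def required_power_watts(flops: float, mflops_per_watt: float) -> float:
    """Power needed to sustain `flops` at an efficiency given in MFlops/Watt."""
    return flops / (mflops_per_watt * 1e6)


def required_efficiency(flops: float, power_watts: float) -> float:
    """Efficiency in MFlops/Watt needed to sustain `flops` within `power_watts`."""
    return flops / power_watts / 1e6


# At ~15,000 MFlops/Watt, a petaflop system draws tens of kilowatts
# and an exaflop system tens of megawatts.
print(required_power_watts(PETAFLOP, 15_000) / 1e3, "kW")
print(required_power_watts(EXAFLOP, 15_000) / 1e6, "MW")

# Efficiency needed to fit one exaflop into a 20 MW budget.
print(required_efficiency(EXAFLOP, 20e6), "MFlops/Watt")
```

The gap between the projected ~15,000 MFlops/Watt and the 50,000 MFlops/Watt needed for 20 MW is the shortfall the next slide reacts to.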

slide-6
SLIDE 6

Conclusion: We need help.

slide-7
SLIDE 7

What do we need…?

Insight

Where does energy go?

Understanding

How does energy scale?

Action

What can we do?

slide-8
SLIDE 8

Power-Performance Efficiency

[Diagram: Model & Optimize Performance; Improve Power-Performance Efficiency; Model Effects of Power; Profile & Evaluate Power; Optimize Effects of Power]

[SC04], [SC05], [IPDPS 2005], [IJHPCA 2009], [TPDS 2010]

slide-9
SLIDE 9

How can we…help you…help us…

Virginia Tech

slide-10
SLIDE 10

“You can only manage what you can measure.”

Peter Drucker, writer

slide-11
SLIDE 11

Measuring power is “tough”

slide-12
SLIDE 12

What is PowerPack?

  • Modularized measurement software
  • HW sensors (component, room, etc.)
  • Fine-grain API (function-level)
  • Analytics

[IEEE Computer 38(11) 2005, TPDS 21(5) 2010, http://scape.cs.vt.edu/software/]

slide-13
SLIDE 13

SystemG Supercomputer

slide-14
SLIDE 14

Power Profiles – Single Node


slide-15
SLIDE 15

PowerPack Function-level Profiling

[IEEE Computer 38(11) 2005, TPDS 21(5) 2010, http://scape.cs.vt.edu/software/]
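Function-level profiling can be pictured as attributing sampled power to the time interval in which each function ran. The sketch below is not the PowerPack API; it is a hypothetical stand-alone illustration that assumes a list of `(time_s, watts)` samples with strictly increasing timestamps and a dictionary of function start/end times, and it integrates power with the trapezoidal rule. The sample values and function names are made up.

```python
from bisect import bisect_left

# Hypothetical illustration only; this is NOT the PowerPack API.


def power_at(samples, t):
    """Linearly interpolate power at time t, clamping outside the sampled range."""
    times = [ts for ts, _ in samples]
    i = bisect_left(times, t)
    if i == 0:
        return samples[0][1]
    if i == len(samples):
        return samples[-1][1]
    (t0, p0), (t1, p1) = samples[i - 1], samples[i]
    return p0 + (p1 - p0) * (t - t0) / (t1 - t0)


def energy_joules(samples, start, end):
    """Trapezoidal integral of power (watts) over [start, end] seconds."""
    points = [(start, power_at(samples, start))]
    points += [(t, p) for t, p in samples if start < t < end]
    points.append((end, power_at(samples, end)))
    return sum((tb - ta) * (pa + pb) / 2
               for (ta, pa), (tb, pb) in zip(points, points[1:]))


def profile_functions(samples, intervals):
    """Map each function name to the energy consumed during its interval."""
    return {name: energy_joules(samples, s, e)
            for name, (s, e) in intervals.items()}


# Made-up samples and function intervals, for illustration only.
samples = [(0.0, 100.0), (1.0, 100.0), (2.0, 200.0)]
intervals = {"setup": (0.0, 1.0), "compute": (1.0, 2.0)}
print(profile_functions(samples, intervals))
```

A real tool also has to synchronize the meter clock with the application clock; that step is omitted here.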

slide-16
SLIDE 16

Who uses PowerPack? SystemG?

  • Texas A&M (Taylor et al)
  • UTenn-Knoxville (Moore, Dongarra, et al)
  • Oxford University
  • Lawrence Livermore National Lab
  • Pacific Northwest National Lab
  • Oak Ridge National Lab
  • University of Florida
  • KAUST (Saudi Arabia)
  • University of Madrid (Spain)
  • UC Berkeley

...and many others


slide-17
SLIDE 17

February 15, 2012, SIAM PP, Savannah, GA

[Figure: Power consumption over time for matrix inverse: LAPACK, MKL, PLASMA. Sources: Piotr Luszczek, Hatem Ltaief]

slide-18
SLIDE 18


Bidiagonal Reduction: CPU Power

[Figure: CPU power for PLASMA and LAPACK]

slide-19
SLIDE 19

PowerPack 4.0 (accelerator support)

[Figure: Power (watts) vs. time (0.02-second samples) for CPU, GPU, MEM, and MB, running convolutionTexture_15360_32. Annotated phases: CudaMalloc (Data Movement), convolutionRow, convolutionColumn]

slide-20
SLIDE 20

PowerPack 4.0 (API+accelerator)

slide-21
SLIDE 21

Commercial grade measurement…

[Figure: PDU Power Measurements. Watts vs. time; legend: System +CPU Monitor]

[Figure: Granola Enterprise Power Estimates. Watts vs. time; legend: CPU System Monitor]

Granola software gives more detail…

…same accuracy as expensive hardware

slide-22
SLIDE 22

Granola Enterprise (Freeware)


slide-23
SLIDE 23

“To know is to understand.” Aristotle

slide-24
SLIDE 24

Power-Performance Efficiency

[Diagram: Model & Optimize Performance; Improve Power-Performance Efficiency; Model Effects of Power; Profile & Evaluate Power; Optimize Effects of Power]

[SC 2004], [SC 2005], [IPDPS 2011], [IPDPS 2013]

slide-25
SLIDE 25

Early Green HPC questions…

  • What happens to energy at scale?
  • How can we scale energy/perf efficiently?
slide-26
SLIDE 26

Amdahl’s Law (for energy?)

  • Classical speedup

– Amdahl’s law for 1 enhancement (parallelism)

$$S_N(w) = \frac{T_1(w)}{T_N(w)} = \frac{1}{(1 - FE) + \frac{FE}{SE}}$$

Here FE is the fraction of the workload that is enhanced and SE is the speedup of that enhanced fraction.

[Figure: time and energy vs. degree of parallelism]

Time ~ energy. Right?

So we only get energy savings by reducing time. Right? Then why does power management (PM, e.g. DVFS) save energy? And sometimes without affecting time?

Amdahl = no overhead

But overhead is the key to saving energy without loss!
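To make the "time ~ energy" question concrete, here is a small Python sketch of Amdahl's law for a single enhancement. It adds a deliberately naive energy estimate, which assumes every node draws the same constant power for the whole run. That constant-power assumption and the function names are illustrative, not from the talk.

```python
def amdahl_speedup(fe: float, se: float) -> float:
    """Speedup when a fraction `fe` of the work is sped up by a factor `se`."""
    if not 0.0 <= fe <= 1.0:
        raise ValueError("fe must lie in [0, 1]")
    if se <= 0.0:
        raise ValueError("se must be positive")
    return 1.0 / ((1.0 - fe) + fe / se)


def naive_energy_ratio(fe: float, n: int) -> float:
    """E_N / E_1 under the assumption that all n nodes draw identical,
    constant power for the entire parallel run (no power management)."""
    return n / amdahl_speedup(fe, float(n))


# With 90% of the work parallelizable, 10 nodes run faster than 1,
# yet the naive estimate says they consume more total energy.
print(amdahl_speedup(0.9, 10.0))
print(naive_energy_ratio(0.9, 10))
```

Under this naive model, energy falls only when speedup is superlinear. The savings seen from DVFS therefore have to come from terms the model leaves out, such as idle or overhead phases where frequency can drop without lengthening the run.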

slide-27
SLIDE 27
Power-Aware Speedup

  • Definition

– S_N(w, f): speedup
– w: workload
– N: number of nodes
– f: the clock frequency; f0 is the base value
– T1(w, f0): sequential execution time at base frequency f0
– TN(w, f): parallel execution time on N processors at frequency f

$$S_N(w, f) = \frac{T_1(w, f_0)}{T_N(w, f) + O(w, f)}$$

[IPDPS 2007]

slide-28
SLIDE 28

Bounding Efficiency at Scale

  • Energy/performance optimal system configuration

– # processors: 256
– CPU frequency: 1200 MHz

[Figure: EDP values for LU. EDP (10^4 Joules × seconds) vs. processors (8 to 1024) and frequency (600 to 1400 MHz)]
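Finding an energy-delay-product (EDP) optimal configuration can be sketched as a grid search over processor count and CPU frequency. The time and power models below are hypothetical placeholders, not the models from the talk. On-chip work scales with frequency, off-chip work does not, overhead grows with log2 of the processor count, and dynamic power grows with the cube of frequency. The intermediate frequency steps are also assumed.

```python
import math
from itertools import product

# Grid loosely matching the axes of the EDP figure; the exact frequency
# steps are an assumption.
PROCS = [8, 16, 32, 64, 128, 256, 512, 1024]
FREQS_MHZ = [600.0, 800.0, 1000.0, 1200.0, 1400.0]


def exec_time(n, f_mhz, f0_mhz=1400.0, t_on=800.0, t_off=200.0, t_ovh=1.0):
    """Hypothetical execution time in seconds on n processors at f_mhz."""
    on_chip = t_on * (f0_mhz / f_mhz) / n   # slows down at lower frequency
    off_chip = t_off / n                    # insensitive to CPU frequency
    overhead = t_ovh * math.log2(n)         # parallel overhead grows with n
    return on_chip + off_chip + overhead


def node_power(f_mhz, f0_mhz=1400.0, p_static=60.0, p_dyn_max=40.0):
    """Hypothetical per-node power in watts: static plus cubic dynamic term."""
    return p_static + p_dyn_max * (f_mhz / f0_mhz) ** 3


def edp(n, f_mhz):
    """Energy-delay product in joule-seconds for one configuration."""
    t = exec_time(n, f_mhz)
    energy = n * node_power(f_mhz) * t
    return energy * t


def best_config(procs, freqs):
    """Return the (processors, frequency) pair with the smallest EDP."""
    return min(product(procs, freqs), key=lambda cfg: edp(*cfg))


n_best, f_best = best_config(PROCS, FREQS_MHZ)
print(n_best, f_best, edp(n_best, f_best))
```

With measured models in place of these placeholders, the same search would be one way to locate an interior optimum like the one this slide reports for LU.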

slide-29
SLIDE 29

Early Green HPC questions…

  • What happens to energy at scale?
  • How can we scale efficiently?
slide-30
SLIDE 30

Iso-energy-efficiency

Grama et al: performance efficiency can be held constant if we increase both number of processors and problem size simultaneously.

Algorithm + Scale → fixed performance

Iso-energy-efficiency

Algorithm + Scale + Power Modes → (power, performance)

– Requires accurate performance model
– Requires accurate power model
– Must be accurate, useful, usable

slide-31
SLIDE 31

Iso-energy-efficiency Derivation

General form of our Iso-energy-efficiency model:

$$EE = \frac{E_1}{E_p} = \frac{E_1}{E_1 + E_o}$$

– $EE$: system-wide energy efficiency
– $E_1$ (baseline): total energy consumption of sequential execution on one processor
– $E_p$: the total energy consumption of parallel execution for a given application on p parallel processors
– $E_o$: the additional energy overhead required for parallel execution and running extra system components

[IPDPS 2011],[IPDPS 2013]
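One reading of the definitions on this slide is that energy efficiency is sequential energy divided by parallel energy, with parallel energy equal to sequential energy plus overhead. The Python sketch below takes that reading as an assumption and shows what holding efficiency constant requires. The function names and numbers are illustrative only.

```python
import math


def energy_efficiency(e1: float, eo: float) -> float:
    """Efficiency assuming parallel energy = sequential energy e1 + overhead eo."""
    return e1 / (e1 + eo)


def overhead_budget(e1: float, target_ee: float) -> float:
    """Largest overhead energy that still meets `target_ee` for a given e1."""
    return e1 * (1.0 / target_ee - 1.0)


def required_sequential_energy(eo: float, target_ee: float) -> float:
    """Useful (sequential) energy needed to keep `target_ee` as overhead grows.
    Growing the problem size is one way to raise this quantity."""
    return eo * target_ee / (1.0 - target_ee)


# Illustrative joule values, not measurements.
print(energy_efficiency(100.0, 25.0))
print(overhead_budget(100.0, 0.8))
print(required_sequential_energy(50.0, 0.8))
```

If overhead energy doubles as processors are added, the useful work must double as well to keep efficiency fixed. This is the energy analogue of scaling problem size with processor count.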

slide-32
SLIDE 32

Maintaining Efficiency in 3-D FFT

[Figure: FT’s system-wide energy efficiency with p and n as variables]

[Figure: FT’s system-wide energy efficiency with p and f as variables]

  • Problem size scaling is effective in maintaining overall system energy efficiency
  • CPU frequency scaling only slightly improves EE
  • But the effects of CPU clock frequency on on-chip workload diminish while scaling up system size.

slide-33
SLIDE 33

Commercial grade management…

Granola (http://grano.la)

  • Launched Earth Day 2010
  • Free home version
  • 350K+ Downloads so far…
  • 165+ Countries
  • Uses: laptops, PCs, servers
  • Performance Guarantees

Patents: [USPTO: #13/061,565] [UK: #GB2476606B]

Fatbatt (http://fatbatt.com)

  • Launched March 2013
  • Free ad-version


slide-34
SLIDE 34

Where do we go from here?

  • We need lots of help.
  • Disruptive vs. Incremental.
  • Silver bullet is unlikely.
  • Commodity matters. Markets matter. Tools matter.
  • Wanted: Major catastrophe.
  • Custom system is likely the only answer by 2019.
  • Energy wall?
  • “Victory” is inevitable when you change the game.

slide-35
SLIDE 35

Thank you.