Energy-Efficient HPC: A Tools Perspective (February 2nd, 2016)

SLIDE 1

Member of the Helmholtz Association (Mitglied der Helmholtz-Gemeinschaft)

Energy-Efficient HPC

A Tools Perspective

February 2nd, 2016 Michael Knobloch

SLIDE 2

Short Intro to Performance Tools

Triggering performance data
  • Sampling
  • Code instrumentation

Recording performance data
  • Profiling
  • Tracing

February 2nd, 2016 Michael Knobloch Slide 2
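To make the profiling/tracing distinction concrete, here is a minimal Python sketch of code instrumentation. The hypothetical `instrument` decorator records every enter and exit with a timestamp (a trace), and the hypothetical `profile_from_trace` collapses those events into call counts and inclusive time per function (a profile). It only illustrates the concepts; production tools instrument compiled code and store events far more compactly.

```python
import time
from collections import defaultdict
from functools import wraps

# Hypothetical in-memory event buffer: one (timestamp, kind, region) tuple per event.
TRACE = []

def instrument(func):
    """Wrap a function so that every call records an ENTER and an EXIT event."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        TRACE.append((time.perf_counter(), "ENTER", func.__name__))
        try:
            return func(*args, **kwargs)
        finally:
            TRACE.append((time.perf_counter(), "EXIT", func.__name__))
    return wrapper

def profile_from_trace(trace):
    """Aggregate a trace into a profile: (call count, inclusive seconds) per region."""
    calls = defaultdict(int)
    total = defaultdict(float)
    stack = []  # open regions as (name, enter timestamp)
    for ts, kind, region in trace:
        if kind == "ENTER":
            stack.append((region, ts))
        else:
            entered, start = stack.pop()
            if entered != region:
                raise ValueError(f"unbalanced trace: {entered} vs {region}")
            calls[region] += 1
            total[region] += ts - start
    return {r: (calls[r], total[r]) for r in calls}

@instrument
def compute(n):
    return sum(i * i for i in range(n))

@instrument
def solver():
    return compute(10_000) + compute(20_000)

solver()
print(len(TRACE))  # 6: ENTER/EXIT for solver and for both compute calls
print(profile_from_trace(TRACE))
```

The trace keeps the full time-ordered event sequence, so it can answer "when" questions; the profile is much smaller but only answers "how much".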

SLIDE 3

Sampling

SLIDE 4

Instrumentation

SLIDE 5

Instrumentation Techniques

SLIDE 6

Instrumentation Critical Issues

SLIDE 7

Profiling

SLIDE 8

Tracing

SLIDE 9

Tracing details

SLIDE 10

Tracing vs. Profiling

SLIDE 11

Energy-Efficiency Tools Projects at JSC

Score-E

SLIDE 12

eeClust – 2009 - 2012

eeClust - Energy-Efficient Cluster Computing

Project partners: Uni Hamburg, TU Dresden, JSC, ParTec (www.eeclust.de)

Goals:
  • Identify phases of low resource utilization and switch the hardware to lower power states during such phases
  • Integral point: extend performance analysis tools to analyse application power consumption and hardware utilization
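The first goal, spotting phases of low resource utilization, can be illustrated with a simple threshold rule over a utilization time series. A minimal sketch, assuming hypothetical per-interval utilization samples in [0, 1], a hypothetical threshold of 0.2, and a minimum phase length of 3 samples; it is not the detection method used in eeClust.

```python
def low_utilization_phases(samples, threshold=0.2, min_len=3):
    """Return (start, end) index pairs (end exclusive) where utilization stays
    below `threshold` for at least `min_len` consecutive samples."""
    phases = []
    start = None  # index where the current low phase began, if any
    for i, u in enumerate(samples):
        if u < threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_len:
                phases.append((start, i))
            start = None
    # a low phase may still be open at the end of the series
    if start is not None and len(samples) - start >= min_len:
        phases.append((start, len(samples)))
    return phases

util = [0.9, 0.95, 0.1, 0.05, 0.1, 0.8, 0.1, 0.9, 0.05, 0.05, 0.05, 0.02]
print(low_utilization_phases(util))  # [(2, 5), (8, 12)]
```

The minimum length matters: a phase shorter than the hardware's transition time cannot pay off, so very short dips (like index 6 above) are ignored.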

SLIDE 13

MPI Busy-Waiting

Power consumption during busy-waiting remains very high because the CPU stays fully active, constantly polling for the awaited message.
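The difference between busy- and idle-waiting can be reproduced outside MPI. In this Python analogy (not MPI code), `busy_wait` spins on a flag the way a polling receive does, while `idle_wait` blocks in the operating system; `time.process_time()` shows that only the spinning variant burns CPU time while nothing useful happens. The function names and the 0.1 s delay standing in for a late message are illustrative assumptions.

```python
import threading
import time

def busy_wait(flag: threading.Event) -> int:
    """Spin until the flag is set, like a polling receive; returns the poll count."""
    polls = 0
    while not flag.is_set():
        polls += 1  # the core never rests: every iteration costs cycles and power
    return polls

def idle_wait(flag: threading.Event) -> None:
    """Block in the OS until the flag is set; the core is free to idle meanwhile."""
    flag.wait()

def demo(wait_fn, delay=0.1):
    """Run wait_fn until a 'late message' arrives after `delay` seconds.
    Returns (wait_fn result, CPU seconds consumed by the process)."""
    flag = threading.Event()
    timer = threading.Timer(delay, flag.set)  # simulates the late sender
    timer.start()
    start = time.process_time()
    result = wait_fn(flag)
    cpu_used = time.process_time() - start
    timer.join()
    return result, cpu_used

polls, busy_cpu = demo(busy_wait)
_, idle_cpu = demo(idle_wait)
print(f"busy-wait: {polls} polls, {busy_cpu:.3f} s CPU; idle-wait: {idle_cpu:.3f} s CPU")
```

Both variants wait the same wall-clock time; only the CPU time, and with it the power draw, differs.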

SLIDE 14

Scalasca Workflow

Instrumented target application → measurement library → distributed trace → parallel wait-state search → wait-state report → report browser

SLIDE 15

Scalasca Wait-State Detection

[Figure: time-line of processes A, B, C and D exchanging Send/Recv messages, with a wait state highlighted]
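In Scalasca's terminology, a receiver that enters its blocking receive before the matching send has started is in a Late Sender wait state, and the waiting time is the gap between the two enter timestamps. The sketch below computes it serially for already-matched message pairs; the `P2PPair` record and its timestamps are hypothetical, and Scalasca itself performs this search in parallel over the distributed trace.

```python
from dataclasses import dataclass

@dataclass
class P2PPair:
    """Hypothetical record of one matched message: when each side entered its call."""
    sender: str
    receiver: str
    send_enter: float  # time the sender entered Send
    recv_enter: float  # time the receiver entered Recv

def late_sender_wait(pair: P2PPair) -> float:
    """Time the receiver spends in Recv before the matching Send even starts."""
    return max(0.0, pair.send_enter - pair.recv_enter)

def wait_per_process(pairs):
    """Sum Late Sender waiting time per receiving process."""
    waits = {}
    for p in pairs:
        waits[p.receiver] = waits.get(p.receiver, 0.0) + late_sender_wait(p)
    return waits

messages = [
    P2PPair("A", "B", send_enter=2.0, recv_enter=1.0),  # B waits 1.0 for A
    P2PPair("C", "D", send_enter=3.0, recv_enter=3.5),  # Send already running: no wait
]
print(wait_per_process(messages))  # {'B': 1.0, 'D': 0.0}
```

The resulting waiting times per process are exactly the input the energy-saving calculation needs.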

SLIDE 16

Calculating Energy-Saving Potential

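Idle-Waiting

$$ESP = \max_{p \in PS} \Big( (t_w \cdot A_{p_1}) - \big( (t_w - t_{T_{p,p_1}}) \cdot I_p + E_{T_{p,p_1}} \big) \Big)$$

Busy-Waiting

$$ESP_{BW} = \max_{p \in PS} \Big( (t_w \cdot A_{p_1}) - \big( (t_w - t_{T_{p,p_1}}) \cdot A_p + E_{T_{p,p_1}} \big) \Big)$$

$PS$ – set of power states
$t_w$ – waiting time
$A_p$ – active power in P-state $p$
$I_p$ – idle power in P-state $p$
$t_{T_{p_1,p_2}}$ – transition time between P-states $p_1$ and $p_2$
$E_{T_{p,p_1}}$ – transition energy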
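The two formulas can be evaluated per wait state with a few lines of code. A minimal sketch, assuming hypothetical power and transition figures for two P-states, reading $A_p$ and $I_p$ as powers in watts, treating the transition energy as a cost that reduces the saving, and skipping states whose transition takes longer than the wait itself. The state that maximizes the saving is also the optimal P-state for that wait state.

```python
from dataclasses import dataclass

@dataclass
class PState:
    name: str
    active_w: float        # A_p: active power in this P-state (W)
    idle_w: float          # I_p: idle power in this P-state (W)
    trans_time_s: float    # t_T(p, p1): transition time between p1 and p (s)
    trans_energy_j: float  # E_T(p, p1): transition energy (J)

def energy_saving_potential(t_w, ref_active_w, states, busy=False):
    """Return (best P-state name, saved joules) for a wait of t_w seconds.
    Baseline: spending the whole wait at the active power A_p1 of the reference state."""
    baseline = t_w * ref_active_w
    best_name, best_saving = None, float("-inf")
    for s in states:
        if s.trans_time_s > t_w:
            continue  # assumption: a transition longer than the wait is unusable
        # busy-waiting keeps the core active (A_p); idle-waiting lets it idle (I_p)
        power = s.active_w if busy else s.idle_w
        cost = (t_w - s.trans_time_s) * power + s.trans_energy_j
        saving = baseline - cost
        if saving > best_saving:
            best_name, best_saving = s.name, saving
    return best_name, best_saving

# Hypothetical figures, for illustration only
states = [
    PState("P1", active_w=100.0, idle_w=60.0, trans_time_s=0.0, trans_energy_j=0.0),
    PState("P2", active_w=80.0, idle_w=50.0, trans_time_s=0.001, trans_energy_j=0.05),
]
print(energy_saving_potential(0.1, 100.0, states))             # idle-waiting
print(energy_saving_potential(0.1, 100.0, states, busy=True))  # busy-waiting
```

With these numbers P2 wins in both cases, but the busy-waiting saving is much smaller because the core keeps drawing active power.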

SLIDE 17

Example: Energy Saving Potential

SLIDE 18

Example: Optimal P-State Detection

SLIDE 19

EIC – 2010 - . . .

EIC - Exascale Innovation Center

Project partners: IBM Germany R&D and JSC

Goal: co-design for the next generation of supercomputers

One work package on energy efficiency:
  • Investigation of power consumption on Blue Gene
  • Fine-grained power measurements on POWER7

Now in transition to the PADC (POWER Acceleration and Design Center)

SLIDE 20

Power Consumption Analysis on POWER7

Amester

IBM Automated Measurement of Systems for Temperature and Energy Reporting software. Results were published at EnA-HPC 2013.

| Sensor name | Units | Time scale | Description |
|-------------|-------|------------|-------------|
| PWR1MS | W | Instantaneous | Node power consumption |
| PWR1MSP0 | W | Instantaneous | Processor power consumption |
| PWR1MSMEM0 | W | Instantaneous | Memory power consumption |
| PWR32MS | W | Avg. over last 32 ms | Node power consumption |
| PWR32MSP0 | W | Avg. over last 32 ms | Processor power consumption |
| PWR32MSMEM0 | W | Avg. over last 32 ms | Memory power consumption |
| IPS32MS | MIPS | Every 32 ms | Instructions per second rate |
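Instantaneous and averaged sensors relate in a simple way, and energy follows from integrating power over time. A minimal sketch, assuming (hypothetically) that instantaneous readings are taken once per millisecond; `trailing_average` then mimics a 32 ms average such as PWR32MS, and `energy_joules` integrates the samples with the trapezoidal rule. The power trace is invented, and the sketch does not claim to reproduce Amester's internal computation.

```python
def trailing_average(samples_w, window=32):
    """Average over the last `window` samples at every point (fewer at the start)."""
    out = []
    running = 0.0
    for i, p in enumerate(samples_w):
        running += p
        if i >= window:
            running -= samples_w[i - window]  # drop the sample leaving the window
        out.append(running / min(i + 1, window))
    return out

def energy_joules(samples_w, dt_s=0.001):
    """Trapezoidal integration of power samples (W) spaced dt_s seconds apart."""
    return sum((a + b) / 2 * dt_s for a, b in zip(samples_w, samples_w[1:]))

# Hypothetical trace: 100 ms at 150 W followed by 100 ms at 250 W
power = [150.0] * 100 + [250.0] * 100
avg = trailing_average(power)
print(avg[99], avg[115], avg[131])  # 150.0 200.0 250.0
print(energy_joules(power))
```

The averaged value lags the step in power by up to one window, which is why short phases are blurred in 32 ms sensors compared with instantaneous ones.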

SLIDE 21

Example: Component Level Power Measurement

SLIDE 22

Example: Counter Resolution Comparison

SLIDE 23

Amester on POWER8

[Figure: power (W) over time (s), broken down by component: disk I/O, memory, GPU, fan, CPU]

SLIDE 24

Score-E – 2013 - 2016

Score-E

Main tools partners: JSC, TU Dresden, TU Munich
  • Successor of SILC and LMAC
  • Extension of the Score-P measurement system (www.score-p.org), the common measurement system for Scalasca, Vampir, and Periscope
  • Power and energy measurements from different sources, e.g. RAPL, Xeon Phi/GPU power sensors
  • Energy modelling from power consumption data
  • Enable auto-tuning for energy efficiency
  • New visualization based on application geometries
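On Linux, one of the sources listed, Intel RAPL, is exposed through the powercap sysfs interface as a cumulative energy counter in microjoules (`energy_uj`) that wraps around at `max_energy_range_uj`. A minimal sketch of measuring a code region with it, assuming the package-0 zone lives at `/sys/class/powercap/intel-rapl:0` (the path varies between systems, and reading it may require elevated privileges on recent kernels) and that at most one wrap-around occurs between readings. This is a stand-alone illustration, not the Score-P interface.

```python
from pathlib import Path

# Assumed powercap zone for RAPL package 0; adjust for the target system.
RAPL_DIR = Path("/sys/class/powercap/intel-rapl:0")

def read_uj(path: Path) -> int:
    """Read an integer microjoule value from a sysfs file."""
    return int(path.read_text().strip())

def energy_delta_uj(start_uj: int, end_uj: int, max_range_uj: int) -> int:
    """Energy consumed between two counter readings, allowing for one wrap-around."""
    if end_uj >= start_uj:
        return end_uj - start_uj
    return end_uj + max_range_uj - start_uj

def measure_joules(func, rapl_dir: Path = RAPL_DIR):
    """Run func() and return (its result, package energy in joules)."""
    max_range = read_uj(rapl_dir / "max_energy_range_uj")
    start = read_uj(rapl_dir / "energy_uj")
    result = func()
    end = read_uj(rapl_dir / "energy_uj")
    return result, energy_delta_uj(start, end, max_range) / 1e6
```

Usage on a machine with RAPL access: `result, joules = measure_joules(lambda: sum(range(10**7)))`. The counter covers the whole package, so concurrent activity from other processes is included in the reading.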

SLIDE 25

Lessons learned

Metric discussion
  • Still on the quest for the right metric
  • Power vs. energy might require different analyses

Motivating users is (still) hard

Tools need
  • Sensors that provide relevant information (power, energy, temperature, etc.) at all relevant system levels
  • Scalable APIs
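To see why power and energy can call for different analyses, compare two hypothetical runs of the same job: the run with the lower average power is not necessarily the one that consumes less energy, since energy is average power times runtime. The numbers are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Run:
    label: str
    avg_power_w: float  # average power draw (W)
    runtime_s: float    # time to solution (s)

    @property
    def energy_j(self) -> float:
        """Energy to solution = average power x runtime."""
        return self.avg_power_w * self.runtime_s

runs = [Run("high clock", 200.0, 10.0), Run("low clock", 150.0, 15.0)]
lowest_power = min(runs, key=lambda r: r.avg_power_w)
lowest_energy = min(runs, key=lambda r: r.energy_j)
print(lowest_power.label, "|", lowest_energy.label)  # low clock | high clock
```

A power-capped site would prefer the low-clock run, while an energy budget favours the high-clock run (2000 J vs. 2250 J).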

SLIDE 26

Discussion: Road to Dark Silicon

Challenges for tools

  • How reliable is performance data from dynamic applications?
  • How reliable is performance data on dynamic hardware?
  • How to analyse dynamic applications?

Requirements on tools

?
