From Autotune to READEX, Michael Gerndt, Technische Universität München



slide-1
SLIDE 1

Energy Efficiency Tuning: From Autotune to READEX

Michael Gerndt Technische Universität München

slide-2
SLIDE 2

Project overview

  • READEX: Runtime Exploitation of Application Dynamism for Energy-efficient eXascale Computing
  • Starting date: 1 September 2015
  • Duration: 3 years
  • Funding: European Commission Horizon 2020, grant agreement 671657


slide-3
SLIDE 3

Project partners


slide-4
SLIDE 4

Motivation

Challenges

  • Energy consumption
  • Extreme scale
  • Dynamism

Problems

  • Awareness
  • Ability
  • Effort

Solution

  • Dynamism
  • Automatic tuning
  • Design-/Run-time


slide-5
SLIDE 5

General idea

HPC

  • Automatic Tuning

Embedded

  • System Scenarios
slide-6
SLIDE 6

System scenarios

  • System scenario based methodology
      • Formalism for dynamic auto-tuning in the embedded systems world
      • Detect and analyze dynamism in applications at design time
      • Switch parameters at run time based on detected scenarios


slide-7
SLIDE 7

Periscope Tuning Framework

  • Automatic application analysis & tuning
  • Tune performance and energy (statically)
  • Plug-in-based architecture
  • Evaluate alternatives online
  • Scalable and distributed framework
  • Support for a variety of parallel paradigms: MPI, OpenMP, OpenCL, parallel patterns
  • Developed in the Autotune EU-FP7 project


slide-8
SLIDE 8

Score-P

  • Scalable Performance Measurement Infrastructure for Parallel Codes
  • Common instrumentation and measurement infrastructure


slide-9
SLIDE 9

ENOPT library implemented by LRZ

slide-10
SLIDE 10

Tuning Plugin Interface

Figure labels: Plugin; Periscope Frontend; Application with Monitor; Scenario execution; Tuning actions; Analysis strategies; Search Space Exploration; Tuning Step

slide-11
SLIDE 11

Tuning Plugins

  • MPI parameters
      • Eager limit, buffer space, collective algorithms
      • Application restart or MPI_T tools interface
  • DVFS
      • Frequency tuning for energy delay product
      • Model-based prediction of frequency
      • Region-level tuning
  • Parallelism capping
      • Thread number tuning for energy delay product
      • Exhaustive and curve-fitting-based prediction
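
The energy delay product (EDP) objective used by the DVFS and parallelism capping plugins can be illustrated with a minimal sketch. It assumes EDP is computed as energy multiplied by runtime and that every candidate setting has been measured once (exhaustive search). The measurement values and the names MEASUREMENTS, energy_delay_product and best_frequency are hypothetical and are not taken from the Periscope Tuning Framework.

```python
from typing import Dict, Tuple

# Hypothetical measurements for one code region: CPU frequency in GHz
# mapped to (runtime in seconds, energy in joules). Illustrative values only.
MEASUREMENTS: Dict[float, Tuple[float, float]] = {
    1.2: (10.0, 600.0),
    1.6: (8.0, 560.0),
    2.0: (7.0, 595.0),
    2.4: (6.5, 682.5),
}


def energy_delay_product(time_s: float, energy_j: float) -> float:
    """EDP multiplies energy by runtime, so slow low-energy runs are penalised."""
    return energy_j * time_s


def best_frequency(measurements: Dict[float, Tuple[float, float]]) -> float:
    """Exhaustive search: evaluate every measured setting, keep the lowest EDP."""
    return min(
        measurements,
        key=lambda freq: energy_delay_product(*measurements[freq]),
    )


if __name__ == "__main__":
    for freq, (t, e) in sorted(MEASUREMENTS.items()):
        print(f"{freq:.1f} GHz: EDP = {energy_delay_product(t, e):.2f} J*s")
    print("selected frequency:", best_frequency(MEASUREMENTS), "GHz")
```

In this toy data set the lowest-energy setting (1.6 GHz) and the fastest setting (2.4 GHz) both lose to 2.0 GHz once energy and time are combined. The same selection logic would apply to thread counts instead of frequencies.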
slide-12
SLIDE 12

Tuning Plugins

  • Master/worker
      • Partition factor and number of workers
      • Prediction through performance model based on data measured in pre-analysis
  • Parallel Pattern
      • Tuning replication and buffers between pipeline stages
      • Based on component distribution via StarPU
  • OpenCL tuning
      • Compiler flags for offline compilation
      • NDRange tuning
slide-13
SLIDE 13

Tuning Plugins

  • MPI IO
      • Tuning data sieving and number of aggregators
      • Exhaustive and model-based
  • Compiler Flag Selection
      • Automatic recompilation and execution
      • Selective recompilation based on pre-analysis
      • Exhaustive and individual search
      • Scenario analysis for significant routines
      • Combination with Pathway
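
The difference between exhaustive and individual search can be sketched as follows. This is an illustration under stated assumptions, not the plugin's implementation: "individual search" is taken here to mean tuning one parameter at a time while keeping the best values found so far for the others. The flag names in SPACE and the toy_cost function, which stands in for "recompile, run and measure", are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Config = Dict[str, str]

# Hypothetical search space; flag names and values are illustrative only.
SPACE: Dict[str, List[str]] = {
    "opt": ["-O1", "-O2", "-O3"],
    "unroll": ["", "-funroll-loops"],
    "vec": ["", "-ftree-vectorize"],
}


def toy_cost(cfg: Config) -> float:
    """Stand-in for a measured runtime. These are not real measurements."""
    cost = {"-O1": 10.0, "-O2": 8.0, "-O3": 7.5}[cfg["opt"]]
    if cfg["unroll"]:
        cost -= 0.5
    if cfg["vec"]:
        cost -= 1.0
    return cost


def exhaustive_search(
    space: Dict[str, List[str]], cost: Callable[[Config], float]
) -> Tuple[Config, int]:
    """Evaluate the full cross product of all parameter values."""
    names = list(space)
    best_cfg: Config = {}
    best_cost = float("inf")
    evaluations = 0
    for values in product(*(space[name] for name in names)):
        cfg = dict(zip(names, values))
        evaluations += 1
        current = cost(cfg)
        if current < best_cost:
            best_cfg, best_cost = cfg, current
    return best_cfg, evaluations


def individual_search(
    space: Dict[str, List[str]], cost: Callable[[Config], float]
) -> Tuple[Config, int]:
    """Tune one parameter at a time, fixing the best value before moving on."""
    current = {name: values[0] for name, values in space.items()}
    evaluations = 0
    for name, values in space.items():
        best_value, best_cost = values[0], float("inf")
        for value in values:
            trial = {**current, name: value}
            evaluations += 1
            trial_cost = cost(trial)
            if trial_cost < best_cost:
                best_value, best_cost = value, trial_cost
        current[name] = best_value
    return current, evaluations


if __name__ == "__main__":
    print("exhaustive:", exhaustive_search(SPACE, toy_cost))
    print("individual:", individual_search(SPACE, toy_cost))
```

With this search space the exhaustive strategy needs 3 × 2 × 2 = 12 evaluations and the individual strategy needs 3 + 2 + 2 = 7. Both find the same configuration here only because the toy cost is additive; when flags interact, individual search can miss the optimum.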
slide-14
SLIDE 14

Plugin Evaluation Status

slide-15
SLIDE 15

Variation of Measurements


slide-16
SLIDE 16

Predicted vs Measured Time for Seissol


slide-17
SLIDE 17

Tuning with Periscope Tuning Framework


slide-18
SLIDE 18

Dynamism

  • Intra-phase
  • Inter-phase


slide-19
SLIDE 19

PEPC Benchmark of the DEISA Benchmark Suite

Figure: all-to-all performance over 2048 phases

slide-20
SLIDE 20

Inter-phase Dynamism

  • INDEED application from GNS
  • Identifiers for adaptation strategy
  • Valleys vs. hills


slide-21
SLIDE 21

Scenario-Based Tuning

Figure labels: Design Time Analysis, performed by the Periscope Tuning Framework (PTF); Runtime Scenarios with Tuning Model; RunTime Tuning, performed by the READEX Runtime Library (RRL)

slide-22
SLIDE 22

Terminology

  • Significant Regions: Coarse-granular code regions
  • Runtime Situations (rts's): Instances of significant regions
  • Identifiers: Distinguish rts's with different characteristics
      • Region ID, call path, region parameters, phase identifiers, input identifiers
  • Scenarios: rts's with the same characteristics
  • Tuning Model:
      • Set of scenarios
      • Classifier based on the identifiers
      • Selector for each scenario
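
The terminology above can be pictured as a small data structure. The sketch below is only an assumption about how the pieces fit together: the classifier maps the identifiers of a runtime situation to a scenario, and the selector maps each scenario to a configuration. The class names, the choice of identifiers (region ID plus one region parameter) and the configuration fields are hypothetical and do not reflect the actual format used by PTF or the READEX Runtime Library.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Identifiers of a runtime situation (rts). This sketch uses only the
# region ID and one region parameter; the slides list more kinds.
Identifiers = Tuple[str, int]


@dataclass(frozen=True)
class Configuration:
    """Hypothetical tuning parameters applied when a scenario is selected."""
    cpu_freq_ghz: float
    threads: int


@dataclass
class TuningModel:
    classifier: Dict[Identifiers, str]   # identifiers -> scenario name
    selector: Dict[str, Configuration]   # scenario name -> configuration
    default: Configuration               # used for rts's not seen at design time

    def lookup(self, rts: Identifiers) -> Configuration:
        """Classify the rts into a scenario, then select its configuration."""
        scenario = self.classifier.get(rts)
        if scenario is None:
            return self.default
        return self.selector[scenario]


# Illustrative model: two rts's of "solver" share one scenario.
MODEL = TuningModel(
    classifier={
        ("solver", 0): "compute_bound",
        ("solver", 1): "compute_bound",
        ("exchange", 0): "memory_bound",
    },
    selector={
        "compute_bound": Configuration(cpu_freq_ghz=2.4, threads=16),
        "memory_bound": Configuration(cpu_freq_ghz=1.6, threads=8),
    },
    default=Configuration(cpu_freq_ghz=2.0, threads=16),
)

if __name__ == "__main__":
    print(MODEL.lookup(("solver", 1)))
    print(MODEL.lookup(("exchange", 0)))
    print(MODEL.lookup(("io", 0)))  # unknown rts falls back to the default
```

Grouping several rts's into one scenario keeps the model small: the table built at design time grows with the number of distinct behaviours rather than with the number of region instances.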

Yury Oleynik | oleynik@in.tum.de 22

slide-23
SLIDE 23

Design Time Analysis with Periscope


slide-24
SLIDE 24

Runtime Tuning with the READEX Runtime Library

slide-25
SLIDE 25

Validation and project goals

  • Goal: Validate the effect of READEX using real-world applications
  • Co-design process:
  • Hand-tune selected applications
  • Compare results with automatic static and dynamic tuning
  • Energy measurements using HDEEM infrastructure


slide-26
SLIDE 26

Conclusion

  • Energy efficiency at exascale
      • Application developers and users will have to care
  • Lack of capabilities
      • Awareness
      • Expertise
      • Resources
  • Proposed solution: READEX
      • Exploit dynamism
      • Detect at design time, exploit at run time
      • Tool-aided auto-tuning methodology


slide-27
SLIDE 27

Thank you! Questions?
