Towards system-scale optimisation of HPC applications

SLIDE 1

Towards system-scale optimisation of HPC applications

TADaaM : Topology-Aware System-Scale Data Management for High-Performance Computing Applications

Emmanuel Jeannot October 2016

SLIDE 2

INTRODUCTION

Optimize application execution at system-scale

[Figure labels: Topology, Applications, Data, ?]

SLIDE 3

Outline

  • 1. Context and problematic
  • 2. Scientific challenges
  • 3. Software and use-cases
  • 4. Conclusion

SLIDE 4

1. Context and Problematic

SLIDE 5

Computing is easy, accessing data is difficult

There is plenty of computing power. Bringing data to the right place at the right time is the challenge. Flops are free, but bytes are expensive!

SLIDE 6

Stacking Optimized Library and Runtime Systems

  • Multithreaded application: scientific app
  • Multithreaded com. library: MPI (progress threads)
  • Multithreaded runtime system: OpenMP
  • Multithreaded comp. library: parallel BLAS
  • Hardware: multicore + parallel

Problem: each thread ignores the existence of the other threads! Mapping? Priority? Scheduling?

SLIDE 7

Platform partitioning

[Figure: First Year Accumulated Curie Utilization, cumulated node hours (y-axis, node hours) against job size (x-axis)]

  • BW median case: 2048 nodes
  • Curie median case (install time): 256 nodes

Problem: message transfers are not aware of other applications! Contention, routing, message scheduling.

Cf.: A. Gentile, J. Brandt, K. Devine, K. Pedretti, Demonstrating Improved Application Performance Using Dynamic Monitoring and Task Mapping.

SLIDE 8

What is missing?

A “thing” that allows for managing data by doing:

  • Cross-layer optimizations
  • System-wide optimizations

SLIDE 9

How can applications make the best possible use of the available resources?

[Figure labels: Topology, Applications, Data, ?]

Problems to address:

  • Allocate data
  • Partition data
  • Reserve resources
  • Control affinity
  • Map computation
  • Manage contention
  • Optimize communication
  • Access storage
  • Perform visualization

SLIDE 10

Our approach: An intermediate service layer for optimizing execution

[Figure: a stateful, system-wide service layer between the hardware and the applications (Application a, Application b). Other labels: memory hierarchy, cache size, network topology, allocated resources, other applications, storage, application needs, programming model.]

SLIDE 11

Application needs

An application can express its varying needs for:

  • Memory usage
  • Computation
  • Network access
  • Storage
  • Affinity
  • Model/data refinement
  • etc.
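
The slides do not define an interface for expressing these needs. As a purely illustrative sketch, an application might declare them to a stateful registry as below; the names `Needs`, `ServiceLayer`, `declare`, and every field and unit are hypothetical and are not part of any TADaaM software.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Needs:
    """Hypothetical description of what one application phase requires."""
    memory_gib: float = 0.0      # memory usage
    cores: int = 0               # computation
    network_gbps: float = 0.0    # network access
    storage_gib: float = 0.0     # storage
    affinity: List[str] = field(default_factory=list)  # applications to stay close to


class ServiceLayer:
    """Toy stateful registry: remembers the latest needs of every application."""

    def __init__(self) -> None:
        self.state: Dict[str, Needs] = {}

    def declare(self, app: str, needs: Needs) -> None:
        # Needs vary over time, so a new declaration replaces the previous one.
        self.state[app] = needs

    def total_memory_gib(self) -> float:
        # System-wide view: aggregate over all registered applications.
        return sum(n.memory_gib for n in self.state.values())


layer = ServiceLayer()
layer.declare("app_a", Needs(memory_gib=64, cores=128))
layer.declare("app_b", Needs(memory_gib=32, cores=64, network_gbps=10))
# app_a enters a new phase (for instance after refinement) and asks for more memory.
layer.declare("app_a", Needs(memory_gib=96, cores=128, affinity=["app_b"]))
```

Because the registry keeps state across declarations, it can answer system-wide questions (here, total requested memory) that no single application could answer on its own.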

SLIDE 12

2. Scientific challenges

SLIDE 13

The application within its ecosystem

[Figure, software stack: applications, programming models, libraries, compilers, runtime systems, operating systems, batch scheduler, storage, network, hardware]

[Figure, optimization flow: environment model, application need and model, optimization algorithm, optimized execution]

SLIDE 14

Challenges

We need:

  • A layer based on models and abstractions (application and environment)
  • System-wide services that take into account the whole ecosystem at scale
  • A stateful optimization engine

[Figure: stateful system-wide service layer. Labels: hardware, memory hierarchy, cache size, network topology, allocated resources, other applications, storage, app. needs, Application a, Application b]

SLIDE 15

3. Software and use-cases

SLIDE 16

Mesh-based High-performance computing applications

Most large-scale applications (at least 2/3 in the last PRACE call) use meshes:

  • domain decomposition
  • stencil
  • unstructured
  • hierarchical
  • etc.

Examples: aerodynamics, climate, electromagnetism, seismology, plasma, etc.
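
To make the link between meshes and data movement concrete, here is a minimal sketch of a block domain decomposition of a structured grid with a 5-point stencil. It counts how many neighbouring cell pairs straddle each block boundary, which is the data the owning ranks would have to exchange. The function names, the square grid, and the assumption that the block count divides the grid size are all illustrative choices, not taken from the slides.

```python
def block_owner(i, j, n, p):
    """Rank owning cell (i, j) when an n x n grid is split into p x p blocks."""
    side = n // p  # block side length; assumes p divides n
    return (i // side) * p + (j // side)


def halo_volume(n, p):
    """Count cell pairs that a 5-point stencil couples across block boundaries.

    Returns {(rank_a, rank_b): count} with rank_a < rank_b.
    """
    volume = {}
    for i in range(n):
        for j in range(n):
            owner = block_owner(i, j, n, p)
            # Looking only right and down visits every neighbour pair once.
            for di, dj in ((0, 1), (1, 0)):
                ni, nj = i + di, j + dj
                if ni < n and nj < n:
                    other = block_owner(ni, nj, n, p)
                    if owner != other:
                        key = (min(owner, other), max(owner, other))
                        volume[key] = volume.get(key, 0) + 1
    return volume


# 8 x 8 grid split into 2 x 2 blocks of 4 x 4 cells each.
comm = halo_volume(n=8, p=2)
```

The resulting dictionary is a small communication graph: each block talks only to the blocks it shares an edge with, and diagonal blocks never exchange data under a 5-point stencil.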

SLIDE 17

Software suite: use-case example

  • Mesh/graph partitioning (Scotch)
  • Platform model (hwloc)
  • Topology-aware locality mechanisms (TreeMatch)
  • Parallel mesh adaptation (PaMPA)
  • Communication optimization (NewMadeleine)
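
As a rough illustration of why topology-aware placement matters, the sketch below scores two placements of four processes on a two-level tree (nodes containing cores). It is not the TreeMatch algorithm and uses none of the listed libraries; it only evaluates the kind of objective such a tool might try to reduce. The traffic volumes, the distance values, and the function names are assumptions made for the example.

```python
def tree_distance(core_a, core_b, cores_per_node):
    """Hops in a two-level tree: 0 same core, 1 same node, 2 different nodes."""
    if core_a == core_b:
        return 0
    same_node = core_a // cores_per_node == core_b // cores_per_node
    return 1 if same_node else 2


def mapping_cost(comm, placement, cores_per_node):
    """Sum of traffic volume times tree distance over all process pairs."""
    return sum(
        volume * tree_distance(placement[a], placement[b], cores_per_node)
        for (a, b), volume in comm.items()
    )


# Assumed traffic: processes 0-1 and 2-3 exchange a lot, the other pairs little.
comm = {(0, 1): 100, (2, 3): 100, (0, 2): 1, (1, 3): 1}

# Two nodes with two cores each: cores 0 and 1 on node 0, cores 2 and 3 on node 1.
round_robin = {0: 0, 1: 2, 2: 1, 3: 3}  # heavy pairs split across nodes
packed = {0: 0, 1: 1, 2: 2, 3: 3}       # heavy pairs share a node

cost_rr = mapping_cost(comm, round_robin, cores_per_node=2)
cost_packed = mapping_cost(comm, packed, cores_per_node=2)
```

With these assumed numbers the packed placement costs 204 against 402 for round-robin, because the two heavy exchanges stay inside a node instead of crossing the network.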

SLIDE 18

4. Conclusion

SLIDE 19

System-wide topology-aware data management

Machines are becoming more complex, and applications need to be executed at large scale.

Need for cross-layer and system-wide optimizations.

Target mesh-based applications.

Design, implement, and deploy a stateful, system-wide service layer to:

  • Optimize application execution
  • According to its needs

SLIDE 20

The TADaaM Team

  • Emmanuel Jeannot, senior research scientist (DR2), Inria, team leader
  • Guillaume Aupy, research scientist (CR2), Inria
  • Alexandre Denis, experienced research scientist (CR1), Inria
  • Brice Goglin, experienced research scientist (CR1), Inria
  • Guillaume Mercier, assistant professor, Bordeaux Institute of Technology
  • François Pellegrini, professor, University of Bordeaux
  • Raphaël Blanchard, PhD student, CIFRE Onera
  • Cyril Bordage, postdoc, COLOC, Inria
  • Remi Barat, PhD student, CIFRE, CEA
  • Nicolas Denoyelle, research engineer, COLOC, Inria
  • Clément Foyer, engineer, ELCI, Inria
  • Cédric Lachat, postdoc, ELCI, Inria
  • Benjamin Lorendeau, PhD student, CIFRE, EDF
  • Farouk Mansouri, postdoc, Inria
  • Adèle Villiermet, PhD student, COLOC, Inria
  • Hugo Taboada, PhD student, CEA
  • Cécile Boutors, team assistant

SLIDE 21

Thanks!

Inria Bordeaux Sud-Ouest

www.inria.fr