High Performance Systems EuroMPI 2015 Objectives Yet another - - PowerPoint PPT Presentation

high performance systems
SMART_READER_LITE
LIVE PREVIEW

High Performance Systems EuroMPI 2015 Objectives Yet another - - PowerPoint PPT Presentation

Tutorial 1: Performance analysis for High Performance Systems EuroMPI 2015 Objectives Yet another performance analysis tool Developping performance analysis features for your application/library 2 EuroMPI 2015 Performance analysis for


slide-1
SLIDE 1

EuroMPI 2015

Tutorial 1: Performance analysis for High Performance Systems

slide-2
SLIDE 2

EuroMPI 2015

฀ Yet another performance analysis tool ฀ Developping performance analysis features for your application/library

2 Performance analysis for High Performance Systems

Objectives

slide-3
SLIDE 3

EuroMPI 2015

฀ Introduction ฀ Overview of EZTrace workflow ฀ Analyzing an MPI application ฀ Analyzing an MPI + OpenMP application ฀ Developping a plugin

3 Performance analysis for High Performance Systems

Contents

slide-4
SLIDE 4

EuroMPI 2015 4 Performance analysis for High Performance Systems

Who are we ?

François Trahay

EZTrace project leader

Associate professor Télécom SudParis

François Rue

Research Engineer INRIA Bordeaux

Mathias Hastaran

Research Engineer INRIA Bordeaux

slide-5
SLIDE 5

EuroMPI 2015

The materials for this tutorial are available here: http://eztrace.gforge.inria.fr/eurompi2015 You should have received an email with information on your temporary account on the Plafrim cluster

5 Performance analysis for High Performance Systems

Before we start

slide-6
SLIDE 6

EuroMPI 2015

฀ Modern HPC applications are complex

  • Complex hardware

– NUMA architecture, hierarchical caches, accelerators

  • Hybrid programming models

– MPI + [OpenMP | Pthread | CUDA]

Understanding the performance of such applications is difficult  Need for performance analysis tools

6 Performance analysis for High Performance Systems

Introduction

slide-7
SLIDE 7

EuroMPI 2015

฀ Gather statistical information on the application

– Allinea MAP, gprof, mpiP, …

7 Performance analysis for High Performance Systems

Performance analysis tools

Profiling tools

$ gprof ./sgefa_openmp % cumulative self self total time seconds seconds calls s/call s/call name 49.68 4.21 4.21 3283 0.00 0.00 sswap 31.51 6.89 2.67 1107 0.00 0.00 msaxpy2 17.47 8.37 1.48 511146 0.00 0.00 saxpy 0.94 8.45 0.08 9 0.01 0.01 matgen 0.47 8.49 0.04 3 0.01 0.50 sgefa 0.00 8.49 0.00 3321 0.00 0.00 isamax [...]

slide-8
SLIDE 8

EuroMPI 2015

฀ Gather statistical information on the application

– Allinea MAP, gprof, mpiP, …

8 Performance analysis for High Performance Systems

Performance analysis tools

Profiling tools

slide-9
SLIDE 9

EuroMPI 2015

฀ Collect a list of timestamped events

  • Tau, VampirTrace, Scalatrace, Intel Trace Analyzer and Collector, EZTrace, …

9 Performance analysis for High Performance Systems

Performance analysis tools

Tracing applications

#timestamp #ThreadId #Event 0.00175s 1 Enter function Foo(arg1=17) 0.20573s 1 Enter function Bar(n=42.23) 0.21248s 2 Enter function Baz(a=21, b=40) 0.31054s 2 Leave function Baz(a=21, b=40) return value=91 0.61057s 1 Leave function Bar(n=42.23) return value=124.89 [...]

slide-10
SLIDE 10

EuroMPI 2015

฀ Collect a list of timestamped events

  • Tau, VampirTrace, Scalatrace, Intel Trace Analyzer and Collector, EZTrace, …

10 Performance analysis for High Performance Systems

Performance analysis tools

Tracing applications

slide-11
SLIDE 11

EuroMPI 2015

฀ Framework for performance analysis

  • Provides tracing facilities
  • Provides pre-defined modules (MPI, OpenMP, CUDA, etc.)
  • Allows external modules

– Develop your own module – Use a module shipped with a library (eg. PLASMA)

  • Uses standard file formats (OTF, Pajé)
  • Open source (~BSD license)

http://eztrace.gforge.inria.fr/

11 Performance analysis for High Performance Systems

EZTrace

slide-12
SLIDE 12

EuroMPI 2015

฀ Introduction ฀ Overview of EZTrace workflow ฀ Analyzing an MPI application ฀ Analyzing an MPI + OpenMP application ฀ Developping a plugin

12 Performance analysis for High Performance Systems

Contents

slide-13
SLIDE 13

EuroMPI 2015 13 Performance analysis for High Performance Systems

Overview of EZTrace workflow

slide-14
SLIDE 14

EuroMPI 2015 14 Performance analysis for High Performance Systems

Overview of EZTrace workflow

slide-15
SLIDE 15

EuroMPI 2015

฀ Select the modules to load

15 Performance analysis for High Performance Systems

Running an application with EZTrace

$ eztrace_avail 3 stdio Module for stdio functions (read, write, select, poll, etc.) 2 pthread Module for PThread synchronization functions (mutex, semaphore, spinlock, etc.) 1 omp Module for OpenMP parallel regions 4 mpi Module for MPI functions 5 memory Module for memory functions (malloc, free, etc.) 6 papi Module for PAPI Performance counters 7 cuda Module for cuda functions (cuMemAlloc, cuMemcopy, etc.) 10 starpu Module for the StarPU framework $ export EZTRACE_TRACE="pthread" $ eztrace_loaded 2 pthread Module for PThread synchronization functions (mutex, semaphore, spinlock, etc.)

slide-16
SLIDE 16

EuroMPI 2015

฀ Run the application

  • Intercept the calls to a set of functions

– Intercept calls to shared libraries (using LD_PRELOAD) – Modify the binary to insert hooks (only with eztrace)

  • Record timestamped events in trace files
  • Create one file per process

16 Performance analysis for High Performance Systems

Running an application with EZTrace

$ eztrace ./heat_pthread 100 100 50 1 Starting EZTrace... Done [...] Stopping EZTrace... saving trace /tmp/trahay_eztrace_log_rank_1 $ eztrace.preload ./heat_pthread 100 100 50 1 Starting EZTrace... Done [...] Stopping EZTrace... saving trace /tmp/trahay_eztrace_log_rank_1

slide-17
SLIDE 17

EuroMPI 2015

฀ Visualizing the trace

17 Performance analysis for High Performance Systems

Post-mortem analysis

$ eztrace_convert /tmp/trahay_eztrace_log_rank_1 module pthread loaded 1 modules loaded no more block for trace #0 833 events handled $ vite eztrace_output.trace

  • Read the traces and interpret events
  • Creates the output file:

eztrace_output.[trace|otf]

  • Visualize the trace with standard tools

(Vampir, ViTE, etc.)

slide-18
SLIDE 18

EuroMPI 2015

฀ Getting statistics

18 Performance analysis for High Performance Systems

Post-mortem analysis

$ eztrace_stats /tmp/trahay_eztrace_log_rank_1 PThread:

  • CT_Process #0:

semaphore 0x0x601f40 was acquired 4 times. total time spent waiting: 0.089913 ms. barrier 0x0x601f00 was acquired 400 times. total time spent waiting: 4.499698 ms. Total: 2 locks acquired 404 times Thread P#0_T#3711915776 time spent waiting on a semaphore: 0.089913 ms Thread P#0_T#3665626880 time spent waiting on a barrier: 1.159355 ms Thread P#0_T#3514812160 time spent waiting on a barrier: 1.159498 ms Total for CT_Process #0 time spent waiting on a semaphore: 0.089913 ms time spent waiting on a barrier: 4.499698 ms PTHREAD_CORE

  • Thread P#0_T#3711915776:

time spent in pthread_join : 9.158800 ms time spent in pthread_create: 0.044299 ms Total for CT_Process #0 time spent in pthread_join : 9.158800 ms time spent in pthread_create: 0.044299 ms 812 events handled

slide-19
SLIDE 19

EuroMPI 2015

฀ Connection to plafrim ฀ Accessing a node of the cluster ฀ http://eztrace.gforge.inria.fr/eurompi2015

  • Exercice 1: Introduction to EZTrace

19 Performance analysis for High Performance Systems

Hands-on

$ emacs ~/.ssh/config Host formation ForwardAgent yes ForwardX11 yes User eurompi2015-trahay ProxyCommand ssh -A -l login@formation.plafrim.fr -W plafrim:22 $ ssh formation

(plafrim) $ module load slurm (plafrim) $ salloc –-share –N 4 (plafrim) $ echo $SLURM_JOB_NODELIST miriel[078-081] (plafrim) $ ssh miriel078

slide-20
SLIDE 20

EuroMPI 2015

฀ Run the application with eztrace

  • Generates one trace per process
  • Each MPI process write in its /tmp directory

 export EZTRACE_TRACE_DIR=$PWD

20 Performance analysis for High Performance Systems

Analyzing an MPI application with EZTrace

$ export EZTRACE_TRACE=mpi $ mpirun –np 4 eztrace ./application arg1 arg2

  • r

$ mpirun –np 4 eztrace –t mpi ./application arg1 arg2

  • r

$ mpirun –np 4 $(eztrace.preload –t mpi ./application arg1 arg2)

slide-21
SLIDE 21

EuroMPI 2015 21 Performance analysis for High Performance Systems

MPI statistics

฀ eztrace_stats dumps information on MPI messages

฀ Communication matrix ฀ Distribution of message sizes ฀ List of *all* the messages  export EZTRACE_MPI_DUMP_MESSAGES=1

slide-22
SLIDE 22

EuroMPI 2015

฀ OpenMP relies on compiler directives

  • Need to recompile the application with eztrace_cc

22 Performance analysis for High Performance Systems

Analyzing an OpenMP application with EZTrace

$ make CC=’’eztrace_cc gcc’’ [...] $ eztrace –t omp ./application

slide-23
SLIDE 23

EuroMPI 2015

฀ Simply select the mpi and omp modules

23 Performance analysis for High Performance Systems

Analyzing an MPI+OpenMP application

$ make MPICC=’’eztrace_cc mpicc’’ [...] $ mpirun –np 4 eztrace –t ’’mpi omp’’ ./application

slide-24
SLIDE 24

EuroMPI 2015

฀ Connection to plafrim ฀ Accessing a node of the cluster ฀ http://eztrace.gforge.inria.fr/eurompi2015

  • Exercice 2: Using EZTrace for MPI applications

24 Performance analysis for High Performance Systems

Hands-on part 2: MPI

(plafrim) $ module load slurm (plafrim) $ salloc –-share –N 4 (plafrim) $ echo $SLURM_JOB_NODELIST miriel[078-081] (plafrim) $ ssh miriel078

$ emacs ~/.ssh/config Host formation ForwardAgent yes ForwardX11 yes User eurompi2015-trahay ProxyCommand ssh -A -l login@formation.plafrim.fr -W plafrim:22 $ ssh formation

slide-25
SLIDE 25

EuroMPI 2015

฀ EZTrace is a framework for performance analysis

  • Allow third-party modules

– Analyze your application/library – Ship an EZTrace module with your library (eg. PLASMA)

฀ An EZTrace module consists of

  • A library that intercepts a set of functions and record events
  • A library that interprets events

25 Performance analysis for High Performance Systems

EZTrace thrid-party modules

slide-26
SLIDE 26

EuroMPI 2015

฀ eztrace_plugin_generator

  • Search for symbols in a binary application (C / Fortran)
  • Search for the prototypes of the functions
  • Generates a .tpl file for these functions

26 Performance analysis for High Performance Systems

Module generator

$ eztrace_plugin_generator heat_mpi Creating the plugin script heat_mpi.tpl Found 'void ghosts_swap (MPI_Comm comm, MPI_Datatype col, const int *neighbours, int size_x, int size_y, double *u)' Found 'void print_mat (int size_x, int size_y, const double *u)' Found 'void save_mat (const char *filename, int size_x, int size_y, const double *u)' Found 'void set_bounds (const int *coo, int nc_x, int nc_y, int size_x, int size_y, double *u)' Found 'void usage (char *argv[])' 5 symbols found Generating the plugin... $ eztrace_create_plugin -o plugin_heat_mpi heat_mpi.tpl Compiling the plugin... $ make -C plugin_heat_mpi

slide-27
SLIDE 27

EuroMPI 2015

฀ Describe the module

  • Name/description
  • List of functions to intercept
  • Actions to perform for each

function

– EVENT("Do function foo“) – PUSH_STATE("doing function foo") – POP_STATE() – SET_VAR("var_name", value) – ADD_VAR("var_name", value) – SUB_VAR("var_name", value)

27 Performance analysis for High Performance Systems

Creating a module from a .tpl file

BEGIN_MODULE NAME heat_mpi DESC "Module for the heat_mpi program" void print_mat (const double *u) void save_mat (const char *f, const double *u) BEGIN ADD_VAR("variable name", 1) END int foo(int a, int b) BEGIN PUSH_STATE("Doing function foo") CALL_FUNC POP_STATE() END END_MODULE

$ eztrace_create_plugin -o plugin_heat_mpi heat_mpi.tpl $ make –C plugin_heat_mpi $ export EZTRACE_LIBRARY_PATH=$PWD/plugin_heat_mpi

slide-28
SLIDE 28

EuroMPI 2015

฀ Objective: collect the exact information you’re looking for

  • eg. average duration of function void foo(int a, int b)

when b>a

฀ Edit the eztrace_convert_*.c file generated by

eztrace_create_plugin

฀ Per-thread/per-process statistics

28 Performance analysis for High Performance Systems

Tuning a module

slide-29
SLIDE 29

EuroMPI 2015

฀ Connection to plafrim ฀ Accessing a node of the cluster ฀ http://eztrace.gforge.inria.fr/eurompi2015

  • Exercice 3: Creating an EZTrace module

29 Performance analysis for High Performance Systems

Hands-on part 3: creating EZTrace modules

(plafrim) $ module load slurm (plafrim) $ salloc –-share –N 4 (plafrim) $ echo $SLURM_JOB_NODELIST miriel[078-081] (plafrim) $ ssh miriel078

$ emacs ~/.ssh/config Host formation ForwardAgent yes ForwardX11 yes User eurompi2015-trahay ProxyCommand ssh -A -l login@formation.plafrim.fr -W plafrim:22 $ ssh formation

slide-30
SLIDE 30

EuroMPI 2015

฀ EZTrace is open-source

  • CeCILL-B (~BSD) license
  • Contribution / collaboration are welcome !

http://eztrace.gforge.inria.fr/ eztrace-devel@lists.gforge.inria.fr

30 Performance analysis for High Performance Systems

Thank you !