MPI and Scalable Parallel Performance Analysis - 25 Years of MPI - PowerPoint PPT Presentation



SLIDE 1: MPI and Scalable Parallel Performance Analysis

25 Years of MPI Workshop, ANL, September 25, 2017

SLIDE 2: Fair Warning/Disclaimer

  • The views expressed in this presentation are mine and not necessarily those of Intel
  • Lots of the graphical material shown comes from parallel tools projects (ANL/Jumpshot, BSC Extrae/Paraver/Dimemas, JSC/Scalasca, Score-P, TU-D/Vampir, …)
  • I’m ignoring most of the more lightweight, pure profiling tools
  • Getting involved with MPI can lead to this:


SLIDE 3: The Performance Analysis/Optimization Cycle

[Diagram of the analysis cycle; stage labels: “Run application on parallel system”, “Potentially huge data volumes”, “Message matching, statistics”, “Profiling or full event data”, “Text or 2D graphics”]

SLIDE 4: Scalable Parallel Performance Analysis Today

Several well-established portable tool families

  • Europe: BSC Extrae/Paraver, JSC Scalasca, TU-Dresden Vampir
  • US: TAU

Very few commercial tools

  • Know only of Intel Cluster Tools and Allinea/ARM MAP

Scalability to the 100,000s of processes

  • Both Scalasca and Vampir have demonstrated this
  • Key is to use massive parallelism for analysis and (graphical) presentation

On-and-off progress in tools integration/interoperability

  • Score-P as latest effort covers TAU and two European tool families
  • Generally, data file formats can be converted

Limited progress in correctness checking & modeling

  • Tools tend to report the status quo, can’t extrapolate or answer “what if” questions
  • MPI semantics are a harsh mistress – most mistakes keep your code from working (debugger time)


SLIDE 5: A few Colourful Graphics


SLIDE 6: Integration and Interoperability

Data interoperability is key here - want to be able to look at the same trace data through different lenses/tools

  • Not rocket science, yet trace formats have become quite complex

US/European Score-P initiative


SLIDE 7: MPI Advancing the Tools Field – the Good

Profiling interface

  • Provides a transparent & reliable way to intercept calls & record data
  • Gives access to all application-visible MPI-related data

Communicator concept

  • Enables clear separation of application and tools communication
  • Critical to achieve reliable tool operation
  • Tools immediately started using MPI internally (a minimal interception sketch follows at the end of this slide)

MPI debugging I/F

  • Could build tools that attach to running applications

Market effects

  • Portability → tools are (reasonably easily) portable to all systems supporting MPI
  • Users → tools reach a vastly larger user community due to only one “message passing model” for all systems & applications
  • Clear & orthogonal semantics → reduced effort required for analysis code
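To make the first two points concrete, here is a minimal sketch (mine, not from the talk) of a PMPI-based interception library that also duplicates MPI_COMM_WORLD for its own traffic; the names tool_comm and send_time, and the choice to time only MPI_Send, are illustrative assumptions.

```c
/* Minimal sketch of a PMPI interception layer: the linker resolves the
 * application's MPI_Send to this wrapper, which records data and then
 * forwards to the name-shifted PMPI_Send. */
#include <mpi.h>
#include <stdio.h>

static MPI_Comm tool_comm = MPI_COMM_NULL;  /* tool-private communicator */
static double   send_time = 0.0;            /* time spent in MPI_Send    */

int MPI_Init(int *argc, char ***argv)
{
    int ret = PMPI_Init(argc, argv);
    /* Duplicate MPI_COMM_WORLD so tool-internal traffic can never be
     * matched against application messages. */
    PMPI_Comm_dup(MPI_COMM_WORLD, &tool_comm);
    return ret;
}

int MPI_Send(const void *buf, int count, MPI_Datatype type,
             int dest, int tag, MPI_Comm comm)
{
    double t0 = PMPI_Wtime();
    int ret = PMPI_Send(buf, count, type, dest, tag, comm);
    send_time += PMPI_Wtime() - t0;         /* record the observed cost  */
    return ret;
}

int MPI_Finalize(void)
{
    /* Aggregate tool data over the tool communicator, not the application's. */
    double max_send = 0.0;
    int rank;
    PMPI_Reduce(&send_time, &max_send, 1, MPI_DOUBLE, MPI_MAX, 0, tool_comm);
    PMPI_Comm_rank(tool_comm, &rank);
    if (rank == 0)
        printf("max time in MPI_Send across ranks: %.3f s\n", max_send);
    PMPI_Comm_free(&tool_comm);
    return PMPI_Finalize();
}
```

Everything the tool itself sends or reduces travels over tool_comm, which is why the communicator concept is what makes reliable tool operation possible in the first place.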


SLIDE 8: MPI Advancing the Tools Field – the (Slightly) Bad

Limited MPI introspection

  • Can’t see “inside” the MPI calls or the progress engine
  • Some analysis questions are hard to answer

– Why is MPI call XYZZY taking so long?
– How much time is taken up by MPI SW stack vs. network stack & transmission?

  • This is a bigger problem for MPI-2 “one-sided” operations, for instance

No way to record message matching

  • To match sends and receives, all tools replay the MPI message matching rules
  • This can break down when watching only parts of application runs
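To illustrate what “replaying the matching rules” means in practice, here is a small analyser-side sketch (my own, not taken from any particular tool); the event record layout is an assumption, and the final comment marks exactly where partial traces break the scheme.

```c
/* Minimal sketch of trace-side message matching: for each receive event,
 * find the earliest unmatched send in the same communicator with matching
 * (source, tag), honouring MPI_ANY_SOURCE / MPI_ANY_TAG wildcards. */
#include <mpi.h>      /* only for the MPI_ANY_SOURCE / MPI_ANY_TAG constants */
#include <stddef.h>

typedef struct {
    int comm_id, src, dst, tag;   /* as recorded in the trace */
    int matched;                  /* set once paired          */
} SendEvent;

typedef struct {
    int comm_id, src, dst, tag;   /* src/tag may be wildcards */
} RecvEvent;

/* Sends must be kept in the order they were posted per sender, because MPI
 * guarantees non-overtaking between a given sender/receiver pair.
 * Returns the index of the matching send, or -1 if none is found. */
static ptrdiff_t match_recv(const RecvEvent *r, SendEvent *sends, size_t nsends)
{
    for (size_t i = 0; i < nsends; ++i) {
        SendEvent *s = &sends[i];
        if (s->matched || s->comm_id != r->comm_id || s->dst != r->dst)
            continue;
        if (r->src != MPI_ANY_SOURCE && r->src != s->src)
            continue;
        if (r->tag != MPI_ANY_TAG && r->tag != s->tag)
            continue;
        s->matched = 1;
        return (ptrdiff_t)i;
    }
    return -1;   /* happens when the matching send lies outside the recorded window */
}
```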


SLIDE 9: Are we Done Yet?

Couple of hard problems do remain

  • Trace data deluge
  • End-user information overload & required expertise
  • Answering the *real* end-user questions

– How good is my code, and what could optimizations achieve?
– How well does it scale?
– How will it run on a different system?

Couple of ideas/references in the following slides


SLIDE 10: Data Deluge – On-Demand Trace Collection

Tracefiles are always too big (recent example: 3 TB for an NLP ML application on 64 processes)

  • Want to be able to safely enable/disable tracing without screwing up message matching

  • Want to be able to safely cut recorded tracefiles
  • Prefer automatic triggers to assist

Before you ask

  • End-users often unable/unwilling to cut down workloads

Ideal world

  • Traces are collected when a performance metric indicates a problem
  • Lightweight monitoring produces the underlying data, ML techniques as trigger
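One possible shape of such a trigger, sketched from the application side: the tool is assumed to interpret MPI_Pcontrol(1)/MPI_Pcontrol(0) as “tracing on/off” (the MPI standard leaves the semantics to the tool), and the thresholds, window and timestep() stand-in are made up for illustration.

```c
/* Minimal sketch of on-demand tracing: watch the per-iteration time and
 * switch tracing on when an iteration is much slower than the running
 * average, off again once things look normal. */
#include <mpi.h>

static void timestep(void) { MPI_Barrier(MPI_COMM_WORLD); }  /* stand-in for real work */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    double avg = 0.0;
    int tracing = 0;

    for (int step = 0; step < 1000; ++step) {
        double t0 = MPI_Wtime();
        timestep();
        double dt = MPI_Wtime() - t0;

        if (step > 10 && !tracing && dt > 2.0 * avg) {
            MPI_Pcontrol(1);                 /* ask the tool to start tracing */
            tracing = 1;
        } else if (tracing && dt < 1.2 * avg) {
            MPI_Pcontrol(0);                 /* slow phase over, stop tracing */
            tracing = 0;
        }
        avg = (step == 0) ? dt : 0.9 * avg + 0.1 * dt;   /* running average */
    }

    MPI_Finalize();
    return 0;
}
```

A production version would put this logic into the monitoring layer rather than the application, but the division of labour is the same: cheap metrics decide, the expensive tracer only runs when asked.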


SLIDE 11: Identifying Performance Problems

Current tools largely assume a “black belt” expert in the driver’s seat

  • This seriously limits their take-up

Some tools try to identify MPI bottlenecks and link them to root causes in the program (Scalasca as an example)

  • This pretty much requires replay of an application

We need more of this

  • Hard-coding rules does not scale at all
  • ML techniques could have a role here


SLIDE 12: Modeling & What-If Scenarios (1)

BSC Dimemas replay tool

  • Replay application run with CPU scaling and communication model
  • Assess impact of MPI implementation and interconnect: replay with BW=∞, Latency=0 & no contention → this has proven to be very useful

BSC multiplicative performance model

  • Partition parallel efficiency into three factors (a numerical sketch follows at the end of this slide)

η∥ = LB × µLB × T

– Transfer (T): effect of the interconnect network
– Load balance (LB): difference in work between processes
– Serialisation (µLB): process dependencies and transient load imbalances

Division of responsibilities

  • µLB and LB are the application developer’s problem
  • T can be addressed by MPI and system developers
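For concreteness, a tiny numerical sketch of the factorisation as I read it (the exact BSC/POP definitions may differ in detail; the per-rank useful times, the ideal-network runtime from a Dimemas-style replay and the real runtime are all made-up numbers):

```c
/* Sketch of the multiplicative efficiency model:
 *   LB  = avg(useful) / max(useful)    load balance
 *   µLB = max(useful) / T_ideal        serialisation, T_ideal from a replay
 *                                       with BW=inf, latency=0, no contention
 *   T   = T_ideal / T_real             transfer (interconnect effect)
 * so that parallel efficiency = LB * µLB * T = avg(useful) / T_real. */
#include <stdio.h>

static void efficiency_factors(const double *useful, int nprocs,
                               double t_ideal, double t_real)
{
    double sum = 0.0, max = 0.0;
    for (int i = 0; i < nprocs; ++i) {
        sum += useful[i];
        if (useful[i] > max) max = useful[i];
    }
    double avg = sum / nprocs;

    double lb  = avg / max;         /* load balance  */
    double ulb = max / t_ideal;     /* serialisation */
    double tr  = t_ideal / t_real;  /* transfer      */

    printf("LB=%.2f  uLB=%.2f  T=%.2f  parallel efficiency=%.2f\n",
           lb, ulb, tr, lb * ulb * tr);
}

int main(void)
{
    /* illustrative: 4 ranks, per-rank useful compute time in seconds */
    double useful[4] = { 9.0, 8.5, 10.0, 7.5 };
    efficiency_factors(useful, 4, 10.5 /* T_ideal */, 12.0 /* T_real */);
    return 0;
}
```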


SLIDE 13: Modeling & What-If Scenarios (2)

Fit & extrapolate efficiency factors, usually resulting in depressing predictions

Scalability analysis with Extra-P

  • Measure and fit the scaling behaviour of code components (block, MPI calls) – a toy fitting sketch follows at the end of this slide

  • Scaling model is the sum of all component models
  • Integrated with the Scalasca infrastructure
  • Give it a try!
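To illustrate the idea only (this is emphatically not Extra-P's algorithm, which searches a much richer space of p^i · log(p)^j terms), a toy sketch that fits a single power-law term to made-up measurements of one code region and extrapolates:

```c
/* Toy scaling-model fit: t(p) ~ c * p^a via linear least squares in
 * log-log space, then extrapolation to a larger process count. */
#include <math.h>
#include <stdio.h>

static void fit_power_law(const double *p, const double *t, int n,
                          double *c, double *a)
{
    double sx = 0, sy = 0, sxx = 0, sxy = 0;
    for (int i = 0; i < n; ++i) {
        double x = log(p[i]), y = log(t[i]);
        sx += x; sy += y; sxx += x * x; sxy += x * y;
    }
    *a = (n * sxy - sx * sy) / (n * sxx - sx * sx);   /* exponent  */
    *c = exp((sy - *a * sx) / n);                     /* prefactor */
}

int main(void)
{
    /* illustrative: runtime of one code region at four process counts */
    double procs[] = {  64, 128, 256, 512 };
    double times[] = { 1.9, 2.6, 3.8, 5.5 };
    double c, a;
    fit_power_law(procs, times, 4, &c, &a);
    printf("model: t(p) = %.3f * p^%.2f, predicted t(4096) = %.1f s\n",
           c, a, c * pow(4096.0, a));
    return 0;
}
```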


SLIDE 14: Where MPI Could Help in the Future

Wish #1 – Message matching

  • Avoid need to replay communication and re-match messages
  • MPI-internal mechanism or ways to extend the message header
  • Fundamental to address the data deluge

Wish #2 – Additional Introspection (MPI_T?)

  • Collect data on separate (logical) phases in MPI operations
  • Examples

– Data type processing vs. transmission of serial byte stream
– Completion of one-sided operations
– Data volumes in and out for collectives

  • Callback method preferred
  • Prescribed, strict semantics??
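For reference, the MPI_T tool information interface (MPI 3.0) is the existing hook this wish would extend; here is a minimal sketch that just lists whatever performance variables the MPI library exposes today (what is exported, if anything, is entirely implementation-defined).

```c
/* Enumerate the MPI_T performance variables of the MPI implementation. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, num_pvar;
    MPI_T_init_thread(MPI_THREAD_SINGLE, &provided);
    MPI_Init(&argc, &argv);

    MPI_T_pvar_get_num(&num_pvar);
    printf("MPI library exposes %d performance variables\n", num_pvar);

    for (int i = 0; i < num_pvar; ++i) {
        char name[256], desc[256];
        int name_len = sizeof(name), desc_len = sizeof(desc);
        int verbosity, var_class, bind, readonly, continuous, atomic;
        MPI_Datatype dt;
        MPI_T_enum   et;
        MPI_T_pvar_get_info(i, name, &name_len, &verbosity, &var_class,
                            &dt, &et, desc, &desc_len, &bind,
                            &readonly, &continuous, &atomic);
        printf("  [%d] %s: %s\n", i, name, desc);
    }

    MPI_Finalize();
    MPI_T_finalize();
    return 0;
}
```

None of this yet gives the per-phase breakdowns the slide asks for, which is exactly the gap the wish describes.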


SLIDE 15: The POP Project

POP – Performance Optimisation and Productivity

  • European govt.-funded project (term 2015-2018)
  • Partners include BSC and JSC as tools providers

Objectives

  • Promote best practices in parallel programming
  • Offer services to

– Gain detailed understanding of application and system behavior
– Propose how to refactor applications in the most productive way

  • Cover academic as well as industrial users
  • Support MPI and/or OpenMP

Success so far

  • 72 performance audits, 5 completed PoCs (36 and 8 are WIP)
  • Very favourable feedback from customers …


SLIDE 16: Semi-Useful Links …

Argonne MPI performance tools

– https://www.mcs.anl.gov/research/projects/perfvis

BSC performance tool suite

– https://tools.bsc.es/

Vampir tool

– https://www.vampir.eu/

Scalasca tool

– https://www.scalasca.org/

Extra-P tool

– https://www.scalasca.org/software/extra-p/

Score-P effort

– https://www.vi-hps.org/projects/score-p/

POP project

– https://pop-coe.eu/
