http://www.open-mpi.org/ Graham Fagg, University of Tennessee - PDF document

Open MPI: Join the Revolution
Supercomputing, November 2005
http://www.open-mpi.org/

Open MPI Mini-Talks
• Introduction and Overview
  - Jeff Squyres, Indiana University
• Advanced Point-to-Point Architecture
  - Tim Woodall, Los Alamos National Lab
• Datatypes, Fault Tolerance and Other Cool Stuff
  - George Bosilca, University of Tennessee
• Tuning Collective Communications
  - Graham Fagg, University of Tennessee

Open MPI: Introduction and Overview
Jeff Squyres, Indiana University
http://www.open-mpi.org/

Technical Contributors
• Indiana University
• The University of Tennessee
• Los Alamos National Laboratory
• High Performance Computing Center, Stuttgart
• Sandia National Laboratory - Livermore

MPI From Scratch!
• Developers of FT-MPI, LA-MPI, LAM/MPI
  - Kept meeting at conferences in 2003
  - Culminated at SC 2003: Let’s start over
  - Open MPI was born
• Timeline
  - Jan 2004: Started work
  - SC 2004: Demonstrated
  - Today: Released v1.0
  - Tomorrow: World peace

MPI From Scratch: Why?
• Each prior project had different strong points
  - Could not easily combine into one code base
• New concepts could not easily be accommodated in old code bases
• Easier to start over
  - Start with a blank sheet of paper
  - Decades of combined MPI implementation experience

MPI From Scratch: Why?
• Merger of ideas from
  - FT-MPI (U. of Tennessee)
  - LA-MPI (Los Alamos)
  - LAM/MPI (Indiana U.)
  - PACX-MPI (HLRS, U. Stuttgart)
[Diagram: PACX-MPI, LAM/MPI, LA-MPI, and FT-MPI merging into Open MPI]

Open MPI Project Goals
• All of MPI-2
• Open source
  - Vendor-friendly license (modified BSD)
• Prevent “forking” problem
  - Community / 3rd party involvement
  - Production-quality research platform (targeted)
  - Rapid deployment for new platforms
• Shared development effort

Open MPI Project Goals
• Actively engage the HPC community
  - Users
  - Researchers
  - System administrators
  - Vendors
• Solicit feedback and contributions
  - True open source
[Diagram: Researchers, Users, Sys. Admins, Developers, and Vendors surrounding Open MPI]

Design Goals
• Extend / enhance previous ideas
  - Component architecture
  - Message fragmentation / reassembly
  - Design for heterogeneous environments
    • Multiple networks (run-time selection and striping)
    • Node architecture (data type representation)
  - Automatic error detection / retransmission
  - Process fault tolerance
  - Thread safety / concurrency model

Design Goals
• Design for a changing environment
  - Hardware failure
  - Resource changes
  - Application demand (dynamic processes)
• Portable efficiency on any parallel resource
  - Small cluster
  - “Big iron” hardware
  - “Grid” (everyone a different definition)
  - …

Plugins for HPC (!)
• Run-time plugins for combinatorial functionality
  - Underlying point-to-point network support
  - Different MPI collective algorithms
  - Back-end run-time environment / scheduler support
• Extensive run-time tuning capabilities
  - Allow power user or system administrator to tweak performance for a given platform

Plugins for HPC (!)
[Diagram sequence: your MPI application picks its plugins at run time. Networks: Shmem, TCP, OpenIB, mVAPI, GM, MX. Run-time environments: rsh/ssh, SLURM, PBS, BProc, Xgrid. Successive slides show different network and run-time environment plugins being combined under the same application.]
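
The run-time plugin idea in the diagrams above can be illustrated with a small sketch. This is not Open MPI's actual component code: the `Component` and `Framework` classes, the priority numbers, and the availability checks are all hypothetical, chosen only to show how one application could end up with different network plugins on different machines without being rebuilt.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set


@dataclass(frozen=True)
class Component:
    """One plugin, e.g. a network or a run-time environment (hypothetical model)."""
    name: str
    priority: int                      # hypothetical ranking; higher is preferred
    is_available: Callable[[], bool]   # run-time probe for hardware / software


class Framework:
    """A named group of interchangeable components."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._components: Dict[str, Component] = {}

    def register(self, component: Component) -> None:
        self._components[component.name] = component

    def select(self, include: Optional[Set[str]] = None) -> List[Component]:
        """Return usable components, best first.

        `include` models a user or administrator restricting the choice.
        """
        usable = [
            c for c in self._components.values()
            if (include is None or c.name in include) and c.is_available()
        ]
        return sorted(usable, key=lambda c: c.priority, reverse=True)


networks = Framework("network")
networks.register(Component("shmem", 40, lambda: True))
networks.register(Component("tcp", 20, lambda: True))
networks.register(Component("gm", 30, lambda: False))  # pretend: no Myrinet here

print([c.name for c in networks.select()])
print([c.name for c in networks.select(include={"tcp", "gm"})])
```

On this pretend machine the first call keeps `shmem` and `tcp`, and the second, restricted to `tcp` and `gm`, keeps only `tcp`, because the `gm` probe reports no hardware.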

Current Status
• v1.0 released (see web site)
• Much work still to be done
  - More point-to-point optimizations
  - Data and process fault tolerance
  - New collective framework / algorithms
  - Support more run-time environments
  - New Fortran MPI bindings
  - …
• Come join the revolution!

Open MPI: Advanced Point-to-Point Architecture
Tim Woodall, Los Alamos National Laboratory
http://www.open-mpi.org/

Advanced Point-to-Point Architecture
• Component-based
• High performance
• Scalable
• Multi-NIC capable
• Optional capabilities
  - Asynchronous progress
  - Data validation / reliability

Component Based Architecture
• Uses Modular Component Architecture (MCA)
  - Plugins for capabilities (e.g., different networks)
  - Tunable run-time parameters

Point-to-Point Component Frameworks
• Byte Transfer Layer (BTL)
  - Abstracts lowest native network interfaces
• Point-to-Point Messaging Layer (PML)
  - Implements MPI semantics, message fragmentation, and striping across BTLs

Point-to-Point Component Frameworks
• BTL Management Layer (BML)
  - Multiplexes access to BTLs
• Memory Pool
  - Provides for memory management / registration
• Registration Cache
  - Maintains cache of most recently used memory registrations
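
The PML's job of fragmenting a message and striping it across BTLs can be sketched as follows. This is an illustration only: the function names are hypothetical, byte strings stand in for network buffers, and the plain round-robin schedule is an assumption (the slides do not say how fragments are assigned to BTLs).

```python
from typing import Dict, List, Tuple

Fragment = Tuple[int, bytes]  # (offset into the message, payload)


def fragment(message: bytes, max_fragment_size: int) -> List[Fragment]:
    """Cut a message into fragments no larger than max_fragment_size."""
    if max_fragment_size <= 0:
        raise ValueError("max_fragment_size must be positive")
    return [
        (offset, message[offset:offset + max_fragment_size])
        for offset in range(0, len(message), max_fragment_size)
    ]


def stripe(fragments: List[Fragment], btl_names: List[str]) -> Dict[str, List[Fragment]]:
    """Assign fragments to transports round-robin (an assumed policy)."""
    queues: Dict[str, List[Fragment]] = {name: [] for name in btl_names}
    for index, frag in enumerate(fragments):
        queues[btl_names[index % len(btl_names)]].append(frag)
    return queues


def reassemble(queues: Dict[str, List[Fragment]], total_length: int) -> bytes:
    """Rebuild the message; offsets make arrival order irrelevant."""
    buffer = bytearray(total_length)
    for frags in queues.values():
        for offset, payload in frags:
            buffer[offset:offset + len(payload)] = payload
    return bytes(buffer)


message = bytes(range(10))
queues = stripe(fragment(message, max_fragment_size=4), ["tcp", "gm"])
restored = reassemble(queues, len(message))
print({name: [off for off, _ in frags] for name, frags in queues.items()})
print(restored == message)
```

Because every fragment carries its own offset, the receiver does not care which transport delivered it or in what order, which is what lets several networks carry one message at once.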

Network Support
• Native support for:
  - Infiniband: Mellanox Verbs
  - Infiniband: OpenIB Gen2
  - Myrinet: GM
  - Myrinet: MX
  - Portals
  - Shared memory
  - TCP
• Planned support for:
  - IBM LAPI
  - DAPL
  - Quadrics Elan4
Third party contributions welcome!

High Performance
• Component-based architecture does not impact performance
• Abstractions leverage network capabilities
  - RDMA read / write
  - Scatter / gather operations
  - Zero copy data transfers
• Performance on par with (and exceeding) vendor implementations

Performance Results: Infiniband
[Figure not recoverable from the extracted text]

Performance Results: Myrinet
[Figure not recoverable from the extracted text]

Memory Usage Scalability
[Figure not recoverable from the extracted text]

Scalability
• On-demand connection establishment
  - TCP
  - Infiniband (RC based)
• Resource management
  - Infiniband Shared Receive Queue (SRQ) support
  - RDMA pipelined protocol (dynamic memory registration / deregistration)
  - Extensive run-time tuneable parameters:
    • Maximum fragment size
    • Number of pre-posted buffers
    • ...
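
On-demand connection establishment, listed under Scalability above, means a process opens a connection to a peer only when it first sends to that peer, so resource use follows the communication pattern rather than the job size. A minimal sketch, assuming a hypothetical `ConnectionTable` class and a fake `connect` function that returns a list in place of a real socket or queue pair:

```python
from typing import Callable, Dict, List


class ConnectionTable:
    """Opens a connection to a peer lazily, on the first send to that peer."""

    def __init__(self, connect: Callable[[int], List[bytes]]) -> None:
        self._connect = connect                   # hypothetical connection factory
        self._open: Dict[int, List[bytes]] = {}   # peer rank -> live connection

    def send(self, peer: int, payload: bytes) -> None:
        connection = self._open.get(peer)
        if connection is None:
            # First message to this peer: pay the setup cost now, not at startup.
            connection = self._connect(peer)
            self._open[peer] = connection
        connection.append(payload)  # stand-in for writing to the network

    @property
    def open_count(self) -> int:
        return len(self._open)


connect_calls: List[int] = []


def fake_connect(peer: int) -> List[bytes]:
    """Record the setup and return a list that stands in for a connection."""
    connect_calls.append(peer)
    return []


table = ConnectionTable(fake_connect)
table.send(3, b"a")
table.send(7, b"b")
table.send(3, b"c")  # reuses the connection opened by the first send
print(connect_calls, table.open_count)
```

However many ranks the job has, this process holds only two connections, one for each peer it actually talked to.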

Latency Scalability
[Figure not recoverable from the extracted text]

Multi-NIC Support
• Low-latency interconnects used for short messages / rendezvous protocol
• Message striping across high bandwidth interconnects
• Supports concurrent use of heterogeneous network architectures
• Fail-over to alternate NIC in the event of network failure (work in progress)

Multi-NIC Performance
[Figure not recoverable from the extracted text]

Optional Capabilities (Work in Progress)
• Asynchronous Progress
  - Event based (non-polling)
  - Allows for overlap of computation with communication
  - Potentially decreases power consumption
  - Leverages thread safe implementation
• Data Reliability
  - Memory to memory validity check (CRC/checksum)
  - Lightweight ACK / retransmission protocol
  - Addresses noisy environments / transient faults
  - Supports running over connectionless services (Infiniband UD) to improve scalability

Open MPI: Datatypes, Fault Tolerance, and Other Cool Stuff
George Bosilca, University of Tennessee
http://www.open-mpi.org/

User Defined Data-type
• MPI provides many functions allowing users to describe non-contiguous memory layouts
  - MPI_Type_contiguous, MPI_Type_vector, MPI_Type_indexed, MPI_Type_struct
• The send and receive types must have the same signature, but not necessarily the same memory layout
• The simplest way to handle such data is to …
[Timeline: Pack, Network transfer, Unpack]
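
The pack, network transfer, unpack timeline can be made concrete with a small simulation. It does not call MPI: Python lists stand in for memory, `vector_indices` is a hypothetical helper that mirrors the (count, blocklength, stride) description taken by MPI_Type_vector, and the `offset` argument is added here only for the example. The sender describes one column of a 4x4 row-major matrix and the receiver describes four contiguous elements, so both sides have the same signature (four elements of one type) but different memory layouts.

```python
from typing import List, Optional


def vector_indices(count: int, blocklength: int, stride: int, offset: int = 0) -> List[int]:
    """Element indices of `count` blocks of `blocklength`, `stride` apart."""
    return [
        offset + block * stride + i
        for block in range(count)
        for i in range(blocklength)
    ]


def pack(buffer: List[int], indices: List[int]) -> List[int]:
    """Gather the described elements into one contiguous message."""
    return [buffer[i] for i in indices]


def unpack(packed: List[int], indices: List[int], out: List[Optional[int]]) -> None:
    """Scatter a contiguous message into the receiver's layout."""
    if len(packed) != len(indices):
        raise ValueError("send and receive signatures differ in length")
    for value, i in zip(packed, indices):
        out[i] = value


matrix = list(range(16))  # 4x4 matrix stored row by row
send_layout = vector_indices(count=4, blocklength=1, stride=4, offset=1)  # column 1
recv_layout = vector_indices(count=1, blocklength=4, stride=4)            # contiguous

wire = pack(matrix, send_layout)       # pack
received: List[Optional[int]] = [None] * 4
unpack(wire, recv_layout, received)    # unpack (the network transfer is elided)
print(send_layout, wire, received)
```

The three steps run strictly one after another here, which matches the timeline on the slide: nothing is sent until packing finishes, and nothing is unpacked until the whole message has arrived.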
