SLIDE 1

MPI & MPICH

Presenter: Naznin Fauzia, CSE 788.08, Winter 2012

SLIDE 2

Outline

  • MPI-1 standards
  • MPICH-1
  • MPI-2
  • MPICH-2
  • MPI-3
SLIDE 3

Overview

  • MPI (Message Passing Interface)
  • Specification for a standard library for message passing
  • Defined by the MPI Forum
  • Designed for high performance on both massively parallel machines and workstation clusters
  • Widely available, with both freely available and vendor-supplied implementations

SLIDE 4

Goals

  • To develop a widely used standard for writing message-passing programs.
  • Establish a practical, portable, efficient, and flexible standard for message passing.
  • Design an application programming interface (not necessarily for compilers or a system implementation library).
  • Allow efficient communication: avoid memory-to-memory copying, allow overlap of computation and communication, and offload to a communication co-processor where available.
  • Allow for implementations that can be used in a heterogeneous environment.
  • Allow convenient C and Fortran 77 bindings for the interface.
  • Assume a reliable communication interface: the user need not cope with communication failures, which are dealt with by the underlying communication subsystem.

SLIDE 5

Example

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    /* Initialize MPI */
    MPI_Init(&argc, &argv);

    /* Find out my identity in the default communicator */
    int my_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    int number;
    if (my_rank == 0) {
        number = -1;
        MPI_Send(&number, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (my_rank == 1) {
        MPI_Recv(&number, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("Process 1 received number %d from process 0\n", number);
    }

    /* Shut down MPI */
    MPI_Finalize();
    return 0;
}

SLIDE 6

MPI-1

  • Point-to-point communication
  • basic, pairwise communication (i.e., send and receive)
  • Collective operations
  • process-group collective communication operations (e.g., barrier, broadcast, scatter, gather, reduce); see the sketch after this list
  • Process groups & communication contexts
  • how groups of processes are formed and manipulated, how unique communication contexts are obtained, and how the two are bound together into a communicator (e.g., MPI_COMM_WORLD)
  • Process topologies
  • a set of utility functions meant to assist in mapping process groups (a linearly ordered set) to richer topological structures such as multi-dimensional grids
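
As a brief illustration of the collective operations above, this sketch broadcasts a value from rank 0 and then sums per-process contributions with a reduction (standard MPI-1 calls; the values are illustrative):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Rank 0 chooses a value; MPI_Bcast delivers it to all processes. */
    int value = (rank == 0) ? 42 : 0;
    MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);

    /* Each process contributes its rank; the sum arrives at rank 0. */
    int sum = 0;
    MPI_Reduce(&rank, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("value = %d, sum of ranks = %d\n", value, sum);

    MPI_Finalize();
    return 0;
}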

SLIDE 7

MPI-1 contd.

  • Bindings for Fortran 77 and C
  • gives specific syntax in Fortran 77 and C for all MPI functions, constants, and types
  • Environmental management and inquiry
  • explains how the programmer can manage and make inquiries of the current MPI environment
  • Profiling interface
  • the ability to put performance-profiling calls into MPI without needing access to the MPI source code (see the sketch below)
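
The profiling interface works by also exposing every routine under the shifted prefix PMPI_; a tool intercepts a call by redefining the MPI_ name and forwarding to the PMPI_ version. A minimal sketch that times MPI_Send (the printf reporting is illustrative):

#include <stdio.h>
#include <mpi.h>

/* User calls to MPI_Send link against this wrapper, which times the
   call and forwards to the real implementation via PMPI_Send. */
int MPI_Send(void *buf, int count, MPI_Datatype datatype,
             int dest, int tag, MPI_Comm comm) {
    double t0 = MPI_Wtime();
    int err = PMPI_Send(buf, count, datatype, dest, tag, comm);
    printf("MPI_Send took %g seconds\n", MPI_Wtime() - t0);
    return err;
}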

SLIDE 8

MPICH

  • Freely available implementation of the MPI specification
  • Developed at Argonne National Laboratory and Mississippi State University
  • Goals: portability and high performance
  • "CH" => "Chameleon", a symbol of adaptability
  • Other implementations: LAM, CHIMP-MPI, Unify, etc.
  • Focus on the workstation environment
SLIDE 9

Portability of MPICH

  • Distributed-memory parallel supercomputers
  • Intel Paragon, IBM SP2, Meiko CS-2, Thinking Machines CM-5, nCUBE 2, Cray T3D
  • Shared-memory architectures
  • SGI Onyx, Challenge, Power Challenge, IBM SMPs, the Convex Exemplar, the Sequent Symmetry
  • Networks of workstations
  • Ethernet-connected Unix workstations (possibly from multiple vendors)
  • Sun, DEC, HP, SGI, IBM, Intel
SLIDE 10

MPICH Architecture

  • ADI (Abstract Device Interface)
  • Central mechanism for portability
  • Many implementations of the ADI exist
  • MPI functions are implemented in terms of ADI macros and functions
  • Not specific to MPI: can be used for any high-level message-passing library

SLIDE 11

ADI

  • A set of function definitions
  • Four sets of functions (see the sketch after this list):
  • specifying a message to be sent or received
  • moving data between the API and the message-passing hardware
  • managing lists of pending messages (both sent and received)
  • providing basic information about the execution environment (e.g., how many tasks there are)
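
To make the four sets concrete, here is a purely hypothetical C sketch of what an ADI-shaped interface could look like; every name and signature below is an illustrative invention, not MPICH's actual ADI:

/* Hypothetical ADI-shaped interface; all names are illustrative. */
typedef struct adi_request adi_request;   /* pending-message handle */

/* 1. Specifying a message to be sent or received */
adi_request *adi_post_send(const void *buf, int len, int dest, int tag);
adi_request *adi_post_recv(void *buf, int len, int src, int tag);

/* 2. Moving data between the API and the message-passing hardware */
void adi_progress(void);                  /* advance pending transfers */

/* 3. Managing lists of pending messages (sent and received) */
int  adi_test(adi_request *req);          /* non-zero when complete */
void adi_free(adi_request *req);

/* 4. Basic information about the execution environment */
int adi_num_tasks(void);
int adi_task_id(void);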

SLIDE 12

Upper Layer

SLIDE 13

Lower Layer

SLIDE 14

Features of MPICH

  • Groups
  • An ordered list of process identifiers, stored as an integer array
  • A process's rank in a group is its index in the list
  • Communicators (see the sketch after this list)
  • MPICH intracommunicators and intercommunicators use the same structure
  • Both have a local group and a remote group, which are identical (intra) or disjoint (inter)
  • Send and receive contexts are equal (intra) or different (inter)
  • Contexts are integers
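
At the MPI level, the group/communicator machinery above looks like this: extract the group behind MPI_COMM_WORLD, select a subset of ranks, and bind the subgroup to a new context (standard MPI-1 calls; run with at least two processes):

#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    /* Get the group underlying the default communicator. */
    MPI_Group world_group;
    MPI_Comm_group(MPI_COMM_WORLD, &world_group);

    /* Form a subgroup containing ranks 0 and 1. */
    int ranks[2] = {0, 1};
    MPI_Group subgroup;
    MPI_Group_incl(world_group, 2, ranks, &subgroup);

    /* Bind the subgroup to a fresh context, yielding a new communicator;
       processes outside the subgroup receive MPI_COMM_NULL. */
    MPI_Comm subcomm;
    MPI_Comm_create(MPI_COMM_WORLD, subgroup, &subcomm);

    if (subcomm != MPI_COMM_NULL) MPI_Comm_free(&subcomm);
    MPI_Group_free(&subgroup);
    MPI_Group_free(&world_group);
    MPI_Finalize();
    return 0;
}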
SLIDE 15

Features of MPICH

  • Collective operations
  • Implemented on top of point-to-point operations
  • Some vendor-specific collective operations (Meiko, Intel, and Convex)
  • Job startup
  • The MPI Forum did not standardize the mechanism for starting jobs
  • mpirun:

mpirun -np 12 myprog

SLIDE 16

Features of MPICH

  • Command-line arguments and standard I/O

mpirun -np 64 myprog -myarg 13 < data.in > results.out
mpirun -np 64 -stdin data.in myprog -myarg 13 > results.out

  • Useful commands

mpicc -c myprog.c

SLIDE 17

MPE (Multi-Processing Environment) Extension Library

  • Parallel X graphics: routines to provide all processes with access to a shared X display
  • Logging: time-stamped event trace file (see the sketch after this list)
  • Sequential sections: one process at a time, in rank order
  • Error handling: MPI_Errhandler_set
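
A minimal sketch of MPE's logging facility, assuming the classic MPE logging API (MPE_Init_log, MPE_Log_event, MPE_Finish_log); exact signatures vary across MPE versions:

#include <mpi.h>
#include <mpe.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    MPE_Init_log();

    /* Pick two event numbers and describe the interval between them. */
    int ev_start = MPE_Log_get_event_number();
    int ev_end   = MPE_Log_get_event_number();
    MPE_Describe_state(ev_start, ev_end, "compute", "red");

    MPE_Log_event(ev_start, 0, NULL);
    /* ... computation to be traced ... */
    MPE_Log_event(ev_end, 0, NULL);

    MPE_Finish_log("myprog");  /* writes a time-stamped trace file */
    MPI_Finalize();
    return 0;
}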
SLIDE 18
Contributions of MPICH

  • MPICH has succeeded in popularizing the MPI standard
  • Encouraging vendors to provide MPI to their customers
  • By helping to create demand
  • By offering them a convenient starting point

SLIDE 19

MPI-2

  • Parallel I/O
  • Dynamic process management
  • One-sided communication
  • New language bindings – C++ & F90
SLIDE 20

Sequential I/O

  • Good for small process counts (~100) and small datasets (~MB)
  • Not good for large process counts (~100K) and large datasets (~TB)


SLIDE 21

Parallel I/O

  • Multiple processes of a parallel program access data from a common file
  • Each process accesses a chunk of data using an individual file pointer
  • MPI_File_open, MPI_File_seek, MPI_File_read, MPI_File_close (see the sketch below)

(Diagram: processes P0 through P(n-1) each accessing a portion of a shared FILE)
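
A sketch of the individual-file-pointer pattern above: each rank seeks to its own offset in a shared file and reads its chunk (standard MPI-2 I/O calls; the file name and chunk size are illustrative):

#include <mpi.h>

#define CHUNK 1024  /* illustrative per-process chunk size, in ints */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "data.bin",
                  MPI_MODE_RDONLY, MPI_INFO_NULL, &fh);

    /* Each process positions its individual file pointer at its own chunk. */
    int buf[CHUNK];
    MPI_File_seek(fh, (MPI_Offset)rank * CHUNK * sizeof(int), MPI_SEEK_SET);
    MPI_File_read(fh, buf, CHUNK, MPI_INT, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}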

SLIDE 22

One-Sided Communication

  • Remote Memory Access (RMA)
  • Window: a specific region of process memory made available for RMA by other processes
  • MPI_Win_create: called by all processes within a communicator
  • Origin: the process that performs the call
  • Target: the process whose memory is accessed
  • Communication calls (see the sketch after this list):
  • MPI_Get: remote read
  • MPI_Put: remote write
  • MPI_Accumulate: remote update
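
A minimal sketch of the RMA calls above, using MPI_Win_fence for synchronization (an MPI-2 pattern; run with at least two processes):

#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Every process exposes one int as a window (collective call). */
    int local = rank;
    MPI_Win win;
    MPI_Win_create(&local, sizeof(int), sizeof(int),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    /* Fences open and close an RMA access epoch. */
    int remote = -1;
    MPI_Win_fence(0, win);
    if (rank == 0)
        /* Origin rank 0 reads one int from target rank 1's window. */
        MPI_Get(&remote, 1, MPI_INT, 1, 0, 1, MPI_INT, win);
    MPI_Win_fence(0, win);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}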
SLIDE 23

One-sided communication

(Diagram: two-sided MPI_Send/MPI_Recv vs. one-sided MPI_Get/MPI_Put)

SLIDE 24

Dynamic process mgt.

  • MPI-1
  • Does not specify how processes are created
  • Does not allow processes to enter or leave a running parallel application
  • MPI-2
  • Start new processes, send them signals, find out when they die, and establish communication between two processes (see the sketch below)
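
A sketch of MPI-2 dynamic process creation: a parent spawns workers with MPI_Comm_spawn, and the children locate the parent with MPI_Comm_get_parent (the "worker" executable name is illustrative):

#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    /* Spawn 4 copies of the (illustrative) "worker" executable.
       The result is an intercommunicator linking parents and children. */
    MPI_Comm children;
    MPI_Comm_spawn("worker", MPI_ARGV_NULL, 4, MPI_INFO_NULL,
                   0, MPI_COMM_WORLD, &children, MPI_ERRCODES_IGNORE);

    /* Inside "worker", the children would find the parent with:
       MPI_Comm parent; MPI_Comm_get_parent(&parent); */

    MPI_Comm_free(&children);
    MPI_Finalize();
    return 0;
}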

SLIDE 25

MPICH-2

  • ADI-3: provides routines to support MPI-1 and MPI-2
  • Two types of RMA operations (see the sketch after this list):
  • Active target: the target process must call an MPI routine
  • Origin calls MPI_Win_start / MPI_Win_complete
  • Target calls MPI_Win_post / MPI_Win_wait
  • Passive target: the target process is not required to call any MPI routine
  • Origin calls MPI_Win_lock / MPI_Win_unlock
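
A condensed sketch contrasting the two synchronization modes (MPI-2 calls; assumes a window win already created collectively as in the earlier example, with rank 0 as origin and rank 1 as target, and groups built with MPI_Comm_group / MPI_Group_incl):

#include <mpi.h>

/* Active target: both sides make synchronization calls. */
void active_target_origin(MPI_Win win, MPI_Group target_group, int value) {
    MPI_Win_start(target_group, 0, win);               /* open access epoch   */
    MPI_Put(&value, 1, MPI_INT, 1, 0, 1, MPI_INT, win);
    MPI_Win_complete(win);                             /* close access epoch  */
}

void active_target_target(MPI_Win win, MPI_Group origin_group) {
    MPI_Win_post(origin_group, 0, win);                /* open exposure epoch */
    MPI_Win_wait(win);                                 /* wait for completion */
}

/* Passive target: only the origin makes MPI calls. */
void passive_target_origin(MPI_Win win, int value) {
    MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 1, 0, win);
    MPI_Put(&value, 1, MPI_INT, 1, 0, 1, MPI_INT, win);
    MPI_Win_unlock(1, win);
}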
SLIDE 26

MPICH-2

  • Dynamic processes
  • There are no absolute, global process IDs
  • No data structure maps a process rank to a "global rank" (i.e., a rank in MPI_COMM_WORLD)
  • All communication is considered locally, in terms of possible virtual connections to processes
  • Arrays of virtual connections are indexed by rank
SLIDE 27

MPI-3

  • Improved scalability
  • Better support for multi-core machines, clusters, and applications
  • Proposed: MPI_Count (a type wider than int)
  • Extension of collective operations (see the sketch after this list)
  • Including non-blocking collectives
  • Sparse collective operations, e.g., the proposed MPI_Sparse_gather
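
Non-blocking collectives did make it into the final MPI-3 standard; a brief sketch of the pattern with MPI_Ibcast (the overlap region is illustrative):

#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Start the broadcast, overlap it with local work, then complete it. */
    int value = (rank == 0) ? 42 : 0;
    MPI_Request req;
    MPI_Ibcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD, &req);

    /* ... independent computation can proceed here ... */

    MPI_Wait(&req, MPI_STATUS_IGNORE);
    /* value is now 42 on every rank */

    MPI_Finalize();
    return 0;
}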
SLIDE 28

MPI-3

  • Extension of one-sided communication
  • Support RMA to arbitrary locations, with no constraints (symmetric allocation or collective window creation) on memory
  • RMA operations that are imprecise (such as access to overlapping storage) must be permitted, even if the behavior is undefined
  • The required level of consistency, atomicity, and completeness should be flexible
  • Read-modify-write and compare-and-swap operations are needed for efficient algorithms
  • MPI_Get_accumulate, MPI_Compare_and_swap (see the sketch after this list)
  • Backward compatibility
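
Both calls were eventually standardized in MPI-3; a sketch of an atomic fetch-and-increment built on MPI_Get_accumulate (assumes a window win exposing one int per rank, created as in the earlier one-sided example):

#include <mpi.h>

/* Atomically fetch the old value of a shared counter on rank 0
   and add one to it. */
int fetch_and_increment(MPI_Win win) {
    int one = 1, old = 0;
    MPI_Win_lock(MPI_LOCK_SHARED, 0, 0, win);
    MPI_Get_accumulate(&one, 1, MPI_INT,   /* value to combine      */
                       &old, 1, MPI_INT,   /* receives the old value */
                       0, 0, 1, MPI_INT,   /* target rank 0, disp 0  */
                       MPI_SUM, win);
    MPI_Win_unlock(0, win);
    return old;
}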
SLIDE 29

References

  • http://www.mcs.anl.gov/research/projects/mpi/
  • http://www.mpi-forum.org
  • W. Gropp et al., "A High-Performance, Portable Implementation of the MPI Message Passing Interface Standard"
  • Al Geist et al., "MPI-2: Extending the Message Passing Interface"
  • MPICH Abstract Device Interface, Version 3.3 Reference Manual
  • http://meetings.mpi-forum.org/presentations/MPI_Forum_SC10.ppt.pdf
  • http://wissrech.ins.uni-bonn.de/teaching/seminare/technum/pdfs/iseringhausen_mpi2.pdf
  • www.sdsc.edu/us/training/workshops/docs