Lightweight MPI Communicators with Applications to Perfectly Balanced Quicksort
Michael Axtmann, Peter Sanders, Armin Wiebigke
IPDPS, May 22, 2018
Institute of Theoretical Informatics, Karlsruhe Institute of Technology


SLIDE 1

Michael Axtmann: Lightweight MPI Communicators with Applications to Perfectly Balanced Quicksort Institute of Theoretical Informatics Karlsruhe Institute of Technology


www.kit.edu

KIT – The Research University in the Helmholtz Association


SLIDE 2

Overview


- Communicators and communication
- Disadvantages of communicator construction
- Solutions for MPI
- RBC communicators
- Case study on sorting

SLIDE 3

Communicators in MPI

[Diagram: MPI_COMM_WORLD with PEs 0–5 and a subcommunicator mapping 0 → 3, 1 → 4, 2 → 1. Examples of nonblocking point-to-point (ISend, IReceive and Test), blocking point-to-point (Send, Receive), nonblocking collective (IScan, Compute and Test), and blocking collective (Scan).]

SLIDE 4

MPI Examples


Usage of communicators:
- Divide tasks into fine-grained subproblems
- Elegant algorithms and comfortable programming

Communicators make life easier at no cost!?

[Diagram: divide and conquer over PE groups; communication over rows and columns of a PE grid.]

SLIDE 5

Current Implementations


Open MPI and MPICH

[Diagram: a subcommunicator mapping ranks 0 → 3, 1 → 4, 2 → 1 of the parent communicator.]

PE group:
- Mapping from PE ID to process ID required
- Explicit representation as a table

Context ID:
- Separates communication between communicators; part of each message
- Unique for all PEs of the PE group
- Blocking Allgather operation on the context-ID mask

[Diagram: two overlapping subcommunicators distinguished by context IDs 0 and 1.]

SLIDE 6

Current Implementations (cont.)

Communicator creation takes time linear in the communicator size.

SLIDE 7

Current Implementations (cont.)

Communicator creation is a blocking collective operation.

SLIDE 8

Blocking Communicator Creation


A collective operation is invoked by all PEs of a communicator. BUT: communicator creation breaks the nonblocking idea.

“... nonblocking collective operations can mitigate possible synchronizing effects ...”
“... enabling communication-computation overlap ...”
“... perform collective operations on overlapping communicators, which would lead to deadlocks with blocking operations.” – MPI Standard

[Diagram: a plain nonblocking collective (IScan, then Compute and Test) versus a nonblocking collective preceded by communicator creation, where the blocking creation step serializes both PEs.]

SLIDE 9

Communicator Construction

[Plot: running time / comm size (µs) vs. comm size (2^10 to 2^15 cores) for MPI_Comm_create_group and MPI_Comm_split, on IBM MPI and Intel MPI.]

SuperMUC. Splitting a communicator into two communicators of half the size. Communicator construction time is linear in the PE group size.

[Diagram: PEs 0–9 split into two halves.]

SLIDE 10

Communicator Construction


SuperMUC – 32 768 cores – IBM MPI

[Plot: running time (ms) vs. message length (B, 2^3 to 2^21) for MPI_Reduce, MPI_Exscan, and MPI_Comm_split.]

Collective operations on 2^14 cores; splitting 2^15 PEs into two communicators of size 2^14. Communicator construction is expensive compared to collectives.

[Diagram: PEs 0–9, splitting followed by a collective operation.]

SLIDE 11

Communicator Construction

[Plot: running time (ms) vs. comm size (2^9 to 2^13 cores) for alternating and cascading splits.]

SuperMUC, Intel MPI. Splitting a communicator into overlapping communicators of size four. Blocking communicator creation causes delays.

[Diagram: each PE group invokes MPI_Comm_create_group; cascading vs. alternating invocation order over PEs 0–10 and time.]

SLIDE 12

Proposals for MPI


PE group:
- Sparse representations, e.g. MPI_Group_range_incl

Context ID:
- User-defined tag
- Calculated by MPI: concatenation of counters

[Diagram: MPI_COMM_WORLD with nested subcommunicators whose context IDs are concatenated counters ({0}, {1}, {2}, {3}; {0,0}, {0,1}, {2,0}, {2,1}, {2,2}). Subcommunicator 1 is described by the range first=1, last=4, stride=2.]

SLIDE 13

Our RBC library


- Range-based communicator in O(1) time
- Local construction
- Select MPI or RBC operations
- Local splitting:

Split_RBC_Comm(Comm&, Comm&,
               int first, int last, int stride)

Only adjusts the range.


[Diagram: rbc::Comm holds a parent MPI communicator plus first, last, and stride (all int). Example: on an 8-PE MPI communicator, first=1, last=7, stride=3 selects PEs {1, 4, 7}.]

Blocking ops: rbc::Bcast, rbc::Reduce, rbc::Allreduce, rbc::Scan, rbc::Gather, rbc::Gatherv, rbc::Barrier, rbc::Send, rbc::Recv, rbc::Probe
Nonblocking ops: rbc::Ibcast, rbc::Ireduce, rbc::Iallreduce, rbc::Iscan, rbc::Igather, rbc::Igatherv, rbc::Ibarrier, rbc::Isend, rbc::Irecv, rbc::Iprobe, rbc::Wait, rbc::Test, rbc::Waitall
Classes: rbc::Request, rbc::Comm
Local ops: rbc::Create_RBC_Comm, rbc::Split_RBC_Comm, rbc::Comm_rank, rbc::Comm_size

SLIDE 14

Our RBC library


Implementation Details

(Non)blocking point-to-point communication:
- Maps rank to rank of the MPI communicator
- Calls the MPI counterpart

(Non)blocking collective operations:
- Call point-to-point operations of RBC
- One globally reserved tag

Nonblocking details:
- Optional user-defined tag
- Round-based schedule

rbc::Ibcast(void *buff, int cnt, MPI_Datatype datatype, int root,
            rbc::Comm comm, rbc::Request *request, int tag = RBC_IBCAST_TAG)

[Diagram: round-based broadcast schedule on PEs 1–3 over time.]

SLIDE 15

RBC vs. MPI

[Plot: running time / comm size (µs) vs. comm size (2^10 to 2^15 cores) for MPI_Comm_create_group and MPI_Comm_split (IBM and Intel MPI) and rbc::Split_RBC_Comm.]

SuperMUC. Splitting a communicator into two communicators of half the size. RBC splitting comes at almost no cost.

[Diagram: PEs 0–9 split into two halves.]

SLIDE 16

RBC vs. MPI

[Plot: running time (ms) vs. message length (B, 2^3 to 2^21) for MPI_Ibcast vs. rbc::Ibcast, IBM MPI.]

SuperMUC, 32 768 cores, IBM MPI. Splitting a communicator with 2^15 PEs into two communicators of size 2^14 and performing a broadcast on both. RBC splitting comes at almost no cost.

[Diagram: PEs 0–9, splitting followed by a collective operation.]

SLIDE 17

Cost of Communicators

[Plot: running time (ms) vs. comm size (2^9 to 2^13 cores) for alternating and cascading splits with MPI and RBC.]

SuperMUC, Intel MPI. Splitting a communicator into overlapping communicators of size four. Cascades have no effect on RBC.

[Diagram: each PE group invokes MPI_Comm_create_group; cascading vs. alternating invocation order over PEs 0–10 and time.]

SLIDE 18

Quicksort on Distributed Systems

- Single-ported message passing
- Sending l machine words costs α + βl
- Analyze the critical path
- Small inputs, minimal latency
- Running time O(α log² p + β(n/p) log p + (n/p) log n)

[Diagram: hypercube quicksort on PEs 1–4: splitter selection, split, redistribution, merge, then recursion on subcube 1 and subcube 2; local sort at the base.]

Hypercube Quicksort:
- Static communication pattern
- Precomputable communicators
- Bad for skewed inputs
- Only works for p = 2^k

SLIDE 19

Janus Sort

[Diagram: Janus Sort on PEs 0–5: local sort, pivot selection, partitioning, communicator creation, base case, recursion; the PEs split into group 1 and group 2.]

Running time O(α log² p + β(n/p) log p + (n/p) log n)

- Arbitrary p
- Calculates communicators on the fly
- Perfectly balanced data

[Image: the two-faced Roman god Janus; Wikipedia, image by Fubar Obfusco.]

SLIDE 20

Janus Sort


[Diagram: execution over PEs i to i+4; a Janus PE sits on the boundary between group j and group j+1.]

SLIDE 21

Janus Sort


[Diagram: execution timeline with the pivot selection phase in both groups.]

SLIDE 22

Janus Sort


[Diagram: execution timeline with pivot selection and partitioning.]

SLIDE 23

Janus Sort


[Diagram: execution timeline with pivot selection, partitioning, and data exchange.]

SLIDE 24

Janus Sort


[Diagram: full execution timeline: pivot selection, partitioning, data exchange, communicator creation, base case; recursion creates new Janus PEs for groups k and k+1.]

SLIDE 25

Janus Sort

Pivot selection: O(α log p)

[Diagram: a regular PE issues IBcast then Wait; a Janus PE issues IBcast for both the left and the right group and a single Waitall.]

SLIDE 26

Janus Sort

Pivot selection: O(α log p)
Partitioning: O(log(n/p))

[Diagram: regular and Janus PEs partition their sorted local data by binary search; a Janus PE searches once for the left group and once for the right group.]

SLIDE 27

Janus Sort

Pivot selection: O(α log p)
Partitioning: O(log(n/p))
Data exchange: O(α + β(n/p))

[Diagram: the data exchange uses IScan, IBcast, ISend, and IRecv; a regular PE finishes with Waitall, while a Janus PE runs the sequence for both groups before its Waitall.]

SLIDE 28

Janus Sort

Pivot selection: O(α log p)
Partitioning: O(log(n/p))
Data exchange: O(α + β(n/p))
Comm creation: O(1)

[Diagram: creation of the even and the odd subcommunicator; after recursion, the boundary PEs become new Janus PEs and the rest remain regular PEs.]

SLIDE 29

Janus Sort: Experimental Results

[Plots: run time (s) with RBC Split vs. MPI Split; left (Juqueen): run time vs. elements per core (2^0 to 2^20); right (SuperMUC): run time vs. number of cores (2^9 to 2^16). IBM MPI.]

Weak scaling on the SuperMUC with 16 384 double values per core; sorting double values on up to 32 768 cores.

SLIDE 30

Conclusion

- Communicator creation is expensive
- Blocking communicator creation breaks the idea of nonblocking communication
- Extension proposals for MPI
- Range-based communicators
- Case study on sorting
- Code published at https://github.com/MichaelAxtmann/RBC
