Implementation and Analysis of Nonblocking Collective Operations on SCI Networks
Christian Kaiser, Torsten Hoefler, Boris Bierbaum, Thomas Bemmerl
Chair for Operating Systems
Scalable Coherent Interface (SCI)
[Figures: SCI ringlet and 2D torus topologies]
- IEEE Std 1596-1992
- Memory-coupled clusters
- Data transfer: PIO and DMA
- SISCI user-level interface
Test cluster:
- 16 x Intel Pentium D, 2.8 GHz
- SCI: D352 (IB: Mellanox DDR x4)
Collective Operations: GATHER
[Figure: GATHER data movement. Each of processes 0-3 contributes one block (A, B, C, D); the root (process 1) collects all blocks.]
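A minimal MPI sketch of the data movement shown above; the root rank and the fixed-size receive buffer (at most 64 processes) are chosen only for illustration:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* each process contributes one block (here: a single int) */
        int block = 'A' + rank;
        int gathered[64];          /* only significant at the root; assumes size <= 64 */
        const int root = 1;        /* root rank as in the figure */

        MPI_Gather(&block, 1, MPI_INT, gathered, 1, MPI_INT, root, MPI_COMM_WORLD);

        if (rank == root)
            for (int i = 0; i < size; i++)
                printf("block from rank %d: %c\n", i, (char)gathered[i]);

        MPI_Finalize();
        return 0;
    }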
Collective Operations: GATHERV
[Figure: GATHERV data movement. As GATHER, but with per-process block sizes and explicit placement (displacements) of the blocks at the root (process 1).]
Collective Operations: ALLTOALL
[Figure: ALLTOALL data movement. Every process sends one distinct block to each process and receives one block from each (blocks A0..A3, B0..B3, C0..C3, D0..D3).]
Collective Operations: ALLTOALLV
[Figure: ALLTOALLV data movement. As ALLTOALL, but with per-pair block sizes and displacements (blocks A0..D3).]
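A hedged sketch of how the vector variant is driven: each process sets up per-peer counts and displacements (here process r sends r+1 ints to every peer, purely for illustration) before calling MPI_Alltoallv:

    #include <mpi.h>
    #include <stdlib.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* process r sends r+1 ints to every peer, so block sizes differ per sender */
        int *sendcounts = malloc(size * sizeof(int));
        int *recvcounts = malloc(size * sizeof(int));
        int *sdispls    = malloc(size * sizeof(int));
        int *rdispls    = malloc(size * sizeof(int));

        int stotal = 0, rtotal = 0;
        for (int j = 0; j < size; j++) {
            sendcounts[j] = rank + 1;  sdispls[j] = stotal;  stotal += sendcounts[j];
            recvcounts[j] = j + 1;     rdispls[j] = rtotal;  rtotal += recvcounts[j];
        }

        int *sendbuf = malloc(stotal * sizeof(int));
        int *recvbuf = malloc(rtotal * sizeof(int));
        for (int i = 0; i < stotal; i++) sendbuf[i] = rank;

        MPI_Alltoallv(sendbuf, sendcounts, sdispls, MPI_INT,
                      recvbuf, recvcounts, rdispls, MPI_INT, MPI_COMM_WORLD);

        free(sendbuf); free(recvbuf);
        free(sendcounts); free(recvcounts); free(sdispls); free(rdispls);
        MPI_Finalize();
        return 0;
    }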
The SCI Collectives Library
Purpose:
- Study collective communication algorithms for SCI clusters
- Support multiple MPI libraries: Open MPI, NMPI
- Support arbitrary communication libraries: LibNBC
Nonblocking Collectives (NBC)
Purpose: Overlap of Computation and Communication
NBC in MPI
MPI-2.0 JoD: split collectives
  MPI_BCAST_BEGIN(buffer, count, datatype, root, comm)
  MPI_BCAST_END(buffer, comm)
MPI-3 draft: nonblocking collectives
  MPI_IBCAST(buffer, count, datatype, root, comm, request)
  MPI_WAIT(request, status)
MPI-2.1: no native nonblocking collectives, so either
- implement them with nonblocking point-to-point operations, or
- run blocking collectives in a separate thread
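A short sketch of the MPI-3 draft interface in use, with the broadcast overlapped by independent work; compute_something() is a hypothetical placeholder for work that does not touch the buffer:

    #include <mpi.h>

    /* placeholder for application work that does not depend on buf */
    static void compute_something(void) { }

    void overlapped_bcast(double *buf, int count, int root, MPI_Comm comm) {
        MPI_Request req;

        /* start the broadcast, but do not wait for it yet */
        MPI_Ibcast(buf, count, MPI_DOUBLE, root, comm, &req);

        /* overlap: do work that does not depend on buf */
        compute_something();

        /* complete the collective before buf is used */
        MPI_Wait(&req, MPI_STATUS_IGNORE);
    }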
LibNBC
[Figure: software stack. The FFT, CG, and PC kernels run on top of LibNBC; LibNBC uses its InfiniBand support, the scicoll adapter, or plain MPI; scicoll itself builds on SISCI, pthreads, and MPI.]
Rationale: NBC for SCI
So far:
- Promising results with NBC via LibNBC
- Research done on InfiniBand clusters
Therefore: what about a very different network architecture?
Implementation considerations:
- Use algorithms different from the blocking versions?
- PIO vs. DMA?
- Use a background thread?
Available Benchmarks for LibNBC API
Synthetic:
- NBCBench: measures the communication overhead / overlap potential
Application kernels:
- CG (Alltoallv): 3D grid, overlaps computation with the halo zone exchange (see the sketch after this list)
- PC (Gatherv): overlaps compression with gathering of previous results
- FFT (Alltoall): parallel matrix transpose, overlaps the data exchange for the z transpose with the computation for x and y
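A hedged sketch of the overlap pattern the CG kernel relies on: start the nonblocking Alltoallv, update the points that need no remote data, then complete the exchange. The LibNBC names (nbc.h, NBC_Handle, NBC_Ialltoallv, NBC_Wait) are assumed from the LibNBC API and may differ slightly between releases; the compute_* helpers are hypothetical stand-ins for the solver's work:

    #include <mpi.h>
    #include <nbc.h>                 /* LibNBC header; name assumed */

    /* hypothetical stand-ins for the solver's work */
    static void compute_inner_points(void) { }
    static void compute_halo_points(void)  { }

    void halo_exchange_overlapped(void *sbuf, int *scnts, int *sdsp,
                                  void *rbuf, int *rcnts, int *rdsp,
                                  MPI_Comm comm) {
        NBC_Handle handle;

        /* start the halo exchange (nonblocking Alltoallv) */
        NBC_Ialltoallv(sbuf, scnts, sdsp, MPI_DOUBLE,
                       rbuf, rcnts, rdsp, MPI_DOUBLE, comm, &handle);

        /* overlap: update all grid points that need no remote data */
        compute_inner_points();

        /* complete the exchange, then update the boundary points */
        NBC_Wait(&handle);
        compute_halo_points();
    }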
Gather(v)
- Underlying concept: Hamiltonian path in a 2D torus
- Algorithms: Binary Tree, Binomial Tree, Flat Tree, Sequential Transmission (binomial tree schedule sketched below)
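A minimal sketch of the binomial tree gather schedule (who sends to whom in which step), independent of the SCI transport; the relative rank numbering and the function name are only illustrative:

    /* Binomial tree gather schedule: for the calling rank, compute the step in
     * which it forwards its (partially gathered) data and the relative rank it
     * sends to. Ranks are relative to the root: rel = (rank - root + size) % size;
     * convert the result back with (send_to + root) % size. */
    void binomial_gather_schedule(int rel, int size, int *send_step, int *send_to) {
        *send_step = -1;            /* the root (rel == 0) never sends */
        *send_to   = -1;
        for (int mask = 1, step = 0; mask < size; mask <<= 1, step++) {
            if (rel & mask) {       /* the lowest set bit decides when to send */
                *send_step = step;
                *send_to   = rel - mask;   /* parent in the binomial tree */
                break;
            }
            /* otherwise: in this step the rank receives from rel + mask
             * (if that rank exists) and keeps accumulating data */
        }
    }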
Gather(v)/Alltoall(v)
Gather(v):
- Additional progress thread: Binary Tree (PIO), Binomial Tree (PIO), Flat Tree (PIO), Sequential Transmission (PIO, DMA)
- Single thread with manual progress: Sequential Transmission
- Vector variant: Flat Tree and Sequential Transmission
Alltoall(v):
- Additional progress thread: Bruck (PIO), Pairwise Exchange (PIO), Ring (PIO), Flat Tree (PIO)
- Single thread with manual progress: Pairwise Exchange (DMA) (manual-progress pattern sketched after this list)
- Vector variant: Pairwise Exchange, Flat Tree
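A hedged sketch of the "single thread with manual progress" alternative: instead of a background thread, the application calls back into the library from time to time so the pending DMA-based schedule can advance. NBC_Test, NBC_Wait, NBC_Handle, and the nbc.h header are assumed LibNBC names; compute_one_chunk() is a hypothetical slice of application work:

    #include <nbc.h>                 /* LibNBC header; name assumed */

    /* hypothetical slice of application work */
    static void compute_one_chunk(int i) { (void)i; }

    void compute_with_manual_progress(NBC_Handle *handle, int nchunks) {
        /* interleave application work with progress calls; each NBC_Test
         * lets the pending collective's schedule advance without blocking */
        for (int i = 0; i < nchunks; i++) {
            compute_one_chunk(i);
            NBC_Test(handle);
        }
        NBC_Wait(handle);            /* ensure the collective has completed */
    }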
Application Kernels: Algorithms
[Figures: communication/computation overlap schemes of the application kernels: PC (Gatherv), CG (Alltoallv), FFT (Alltoall)]
Communication Overhead (NBCBench)
[Plot: NBCBench communication overhead for Gather]
Communication Overhead (NBCBench)
[Plot: NBCBench communication overhead for Alltoall]
Application Kernels: Performance
[Plots: application kernel performance: CG (Alltoallv), FFT (Alltoall), PC (Gatherv)]
Conclusion
What we've done:
Implemented nonblocking Gather(v) and Alltoall(v) collective operations on SCI clusters with different algorithms and implementation alternatives
What we found out:
- Applications can benefit from nonblocking collectives on SCI clusters despite the inferior DMA performance
- The best implementation method is DMA in a single thread, although PIO is usually used for blocking collectives
- Using multiple threads causes issues
The End