SLIDE 1

Single-sided PGAS Communications Libraries

Overview of PGAS approaches
David Henty, Alan Simpson (EPCC)
Harvey Richardson, Bill Long (Cray)

SLIDE 2

Shared-memory directives and OpenMP

[Diagram: multiple threads operating on a single shared memory.]

SLIDE 3

OpenMP: work distribution

!$OMP PARALLEL DO
do i=1,32
   a(i)=a(i)*2
end do

[Diagram: the 32 iterations are divided among four threads sharing one memory: iterations 1-8, 9-16, 17-24 and 25-32.]

SLIDE 4

OpenMP implementation

[Diagram: OpenMP threads map onto the cpus of a single process, all sharing that process's memory.]

SLIDE 5

Shared Memory Directives

  • Multiple threads share global memory
  • Most common variant: OpenMP
  • Program loop iterations distributed to threads; more recently, task features
    – Each thread has a means to refer to private objects within a parallel context
  • Terminology
    – Thread, thread team
  • Implementation
    – Threads map to user threads running on one SMP node
    – Extensions to distributed memory not so successful
  • OpenMP is a good model to use within a node (see the sketch below)
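For reference, a minimal self-contained Fortran sketch of the loop distribution shown on slide 3; the array size and the doubling operation come from that slide, while everything else (program name, printed output) is illustrative:

program omp_sketch
  use omp_lib
  implicit none
  integer :: i
  real    :: a(32)

  a = 1.0

  ! The 32 iterations are shared out across the thread team,
  ! e.g. 8 iterations each when run with 4 threads.
  !$OMP PARALLEL DO
  do i = 1, 32
     a(i) = a(i)*2
  end do
  !$OMP END PARALLEL DO

  print *, 'max threads:', omp_get_max_threads(), ' a(1) =', a(1)
end program omp_sketch

Built with OpenMP enabled (e.g. gfortran -fopenmp), the loop is executed by the threads of a single process, matching the picture on slide 4.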

SLIDE 6

Cooperating Processes Models

[Diagram: a group of cooperating processes working together on a single problem.]

SLIDE 7

Message Passing, MPI

[Diagram: several processes, each with its own cpu and its own private memory.]

SLIDE 8

MPI

[Diagram: process 0 calls MPI_Send(a,...,1,…) while process 1 calls the matching MPI_Recv(b,...,0,…); each process has its own memory and cpu.]

SLIDE 9

Message Passing

  • Participating processes communicate using a message-passing API
  • Remote data can only be communicated (sent or received) via the API
  • MPI (the Message Passing Interface) is the standard
  • Implementation: MPI processes map to processes within one SMP node or across multiple networked nodes
  • API provides process numbering, point-to-point and collective messaging operations
  • Mostly used in a two-sided way: each endpoint coordinates in sending and receiving (see the sketch below)
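As an illustration of the two-sided model, a minimal Fortran sketch of the exchange pictured on slide 8; the message length, tag and datatype are arbitrary choices, not part of the original slide:

program mpi_sketch
  use mpi
  implicit none
  integer :: ierr, rank, status(MPI_STATUS_SIZE)
  real    :: a(8), b(8)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

  if (rank == 0) then
     a = 3.14
     ! rank 0 sends 8 reals to rank 1 with tag 99
     call MPI_Send(a, 8, MPI_REAL, 1, 99, MPI_COMM_WORLD, ierr)
  else if (rank == 1) then
     ! rank 1 must post the matching receive for the transfer to complete
     call MPI_Recv(b, 8, MPI_REAL, 0, 99, MPI_COMM_WORLD, status, ierr)
     print *, 'rank 1 received b(1) =', b(1)
  end if

  call MPI_Finalize(ierr)
end program mpi_sketch

Both endpoints take part: the data only moves once the send on rank 0 is matched by the receive on rank 1.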

SLIDE 10

SHMEM

[Diagram: process 0 calls shmem_put(a, b, 1, …) to write directly into the memory of process 1; process 1 makes no call.]

SLIDE 11

SHMEM

  • Participating processes communicate using an API
  • Fundamental operations are based on one-sided PUT and GET (see the sketch below)
  • Need to use symmetric memory locations
  • Remote side of the communication does not participate
  • Can test for completion
  • Barriers and collectives
  • Popular on Cray and SGI hardware; also a Blue Gene version
  • To make sense, needs hardware support for low-latency RDMA-type operations
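A minimal sketch of a one-sided PUT, assuming an OpenSHMEM-style Fortran interface (the exact initialisation and routine names vary between SHMEM implementations); the values and the choice of PEs 0 and 1 are illustrative:

program shmem_sketch
  implicit none
  include 'shmem.fh'
  integer :: shmem_my_pe, shmem_n_pes   ! library query functions
  integer :: me, npes
  integer, save :: src, dest            ! SAVE makes these symmetric objects

  call shmem_init()
  me   = shmem_my_pe()
  npes = shmem_n_pes()

  src  = 100 + me
  dest = -1
  call shmem_barrier_all()

  if (me == 0 .and. npes > 1) then
     ! One-sided PUT: write src from PE 0 straight into dest on PE 1;
     ! PE 1 takes no part in the transfer.
     call shmem_integer_put(dest, src, 1, 1)
  end if

  call shmem_barrier_all()              ! ensure the PUT is complete and visible
  if (me == 1) print *, 'PE 1 sees dest =', dest
end program shmem_sketch

Because dest has the SAVE attribute it lives at the same (symmetric) address on every PE, which is what allows PE 0 to name it remotely.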

SLIDE 12

Fortran 2008 coarray model

  • Example of a Partitioned Global Address Space (PGAS) model
  • Set of participating processes, like MPI
  • Participating processes have access to local memory via standard program mechanisms
  • Access to remote memory is directly supported by the language

SLIDE 13

Fortran coarray model

[Diagram: the coarray model: several processes (images), each with its own memory and cpu.]

SLIDE 14

Fortran coarray model

a = b[3]

[Diagram: as on slide 13; one image executes a = b[3], reading b directly from the memory of image 3.]

SLIDE 15

Fortran coarrays

  • Remote access is a full feature of the language:
    – Type checking
    – Opportunity to optimize communication
  • No penalty for local memory access
  • Single-sided programming model is more natural for some algorithms
    – and a good match for modern networks with RDMA (see the sketch below)
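Putting the a = b[3] example from slide 14 into a complete program, here is a minimal Fortran 2008 coarray sketch (the values are illustrative):

program caf_sketch
  implicit none
  real :: a
  real :: b[*]                   ! coarray: one copy of b on every image

  b = 10.0 * this_image()        ! ordinary local access, no extra cost
  sync all                       ! make sure every image has set its b

  if (this_image() == 1 .and. num_images() >= 3) then
     a = b[3]                    ! one-sided read of b from image 3
     print *, 'image 1 read', a, 'from image 3'
  end if
end program caf_sketch

Because b[3] is an ordinary typed expression, the compiler can type-check the remote reference and schedule the communication itself, which is the point made above.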

SLIDE 16

High Performance Fortran (HPF)

  • Data-parallel programming model
  • Single thread of control
  • Arrays can be distributed and operated on in parallel (see the sketch below)
  • Loosely synchronous
  • Parallelism mainly from Fortran 90 array syntax, FORALL and intrinsics
  • This model was popular on SIMD hardware (AMT DAP, Connection Machines) but was extended to clusters, where the control thread is replicated
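A minimal HPF-flavoured sketch (illustrative; HPF compilers are now rare, and to a standard Fortran compiler the directives are simply comments, so the code below runs serially): the DISTRIBUTE directive block-partitions the array, and the array assignment is then executed in parallel, as on slide 18.

program hpf_sketch
  implicit none
  integer, parameter :: n = 32
  real :: a(n)
!HPF$ PROCESSORS p(4)
!HPF$ DISTRIBUTE a(BLOCK) ONTO p

  a = 2.0
  ! Data-parallel update: under HPF each processor applies SQRT
  ! to its own BLOCK of a, while there is a single thread of control.
  a(1:n) = sqrt(a(1:n))

  print *, 'a(1) =', a(1)
end program hpf_sketch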

SLIDE 17

HPF

[Diagram: a control cpu and several processing elements (pe), each pe with its own memory.]

SLIDE 18

HPF

A(1:N) = SQRT(A(1:N))

[Diagram: as on slide 17, with the distributed array A(N) spread across the processing elements; each pe applies SQRT to its own section of A.]

SLIDE 19

UPC

[Diagram: UPC threads, each with its own cpu and affinity to its own portion of the memory.]

SLIDE 20

UPC

upc_forall(i=0; i<32; i++; affinity) {
   a[i] = a[i]*2;
}

[Diagram: as on slide 19; each thread executes the iterations assigned to it by the affinity expression.]

SLIDE 21

UPC

  • Extension to ISO C99
  • Participating “threads”
  • New shared data structures
    – shared pointers to distributed data (block or cyclic)
    – pointers to shared data local to a thread
    – Synchronization
  • Language constructs to divide up work on shared data
    – upc_forall() to distribute iterations of a for() loop
  • Extensions for collectives
  • Both commercial and open-source compilers available
    – Cray, HP, IBM
    – Berkeley UPC (from LBL), GCC UPC

SLIDE 22