Migrating a Scientific Application from MPI to Coarrays - PowerPoint PPT Presentation

Migrating a Scientific Application from MPI to Coarrays. John Ashby and John Reid, HPCx Consortium, Rutherford Appleton Laboratory, STFC, UK. CUG 2008: Crossing the Boundaries.


  1. Migrating a Scientific Application from MPI to Coarrays
  John Ashby and John Reid, HPCx Consortium, Rutherford Appleton Laboratory, STFC, UK
  CUG 2008: Crossing the Boundaries

  2. Why and Why Not?
  + MPI programming is arcane
  + New paradigms for parallelism are emerging
  + Coarrays are part of the next Fortran standard
  + Gain experience, make informed recommendations
  - Established MPI expertise
  - MPI is widely available; coarrays are only available on (some) Crays

  3. Coarray Fortran in a nutshell
  • SPMD paradigm: instances of the program are called images; each has its own local data and runs asynchronously.
  • Data can be addressed directly across images: A(j,k)[i], where i is the image index.
  • Subroutine calls synchronize execution.
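To make the model concrete, here is a minimal, self-contained sketch (not taken from SBLI). It uses Fortran 2008 syntax, in which synchronisation is the sync all statement rather than the call sync_all() subroutine form used on these slides:

   program caf_sum_sketch
      implicit none
      integer :: total[*]            ! one copy of total on every image
      integer :: i

      total = this_image()           ! each image writes its own copy

      sync all                       ! make every image's write visible

      if (this_image() == 1) then
         ! image 1 reads the other images' copies directly
         do i = 2, num_images()
            total = total + total[i]
         end do
         print *, 'sum of image indices =', total
      end if
   end program caf_sum_sketch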

  4. Coarray Fortran in a nutshell (2)
  • Intrinsics for information: num_images(), this_image() and image_index().
  • Coarrays have the same cobounds on all images, but can have allocatable components:
  type co_double_2
     double precision, allocatable :: array(:,:)
  end type co_double_2
  integer :: nx, ny
  type(co_double_2) :: vel[*]
  allocate(vel%array(nx,ny))
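As a hedged illustration (not from SBLI) of how such a coarray with an allocatable component can be referenced across images; the fixed nx and ny are an assumption made only to keep the fragment short:

   program co_component_sketch
      implicit none
      type co_double_2
         double precision, allocatable :: array(:,:)
      end type co_double_2
      integer, parameter :: nx = 4, ny = 4   ! same extents on every image, for brevity
      type(co_double_2) :: vel[*]
      integer :: left

      allocate(vel%array(nx, ny))            ! each image allocates its own component
      vel%array = dble(this_image())
      sync all                               ! remote allocations are now complete

      left = this_image() - 1
      if (left >= 1) then
         ! read a neighbour's data directly through the allocatable component
         print *, 'image', this_image(), 'sees', vel[left]%array(1, 1)
      end if
   end program co_component_sketch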

  5. The Application
  • SBLI: a three-dimensional, time-dependent finite-difference Navier-Stokes solver
  • Grid transformation for complex geometries
  • Parallelisation by domain decomposition and halo exchange

  6. Parallel sections in SBLI
  • Initial data read in by the “master” process and broadcast to all others.
  • Grid read in by the “master” process and distributed to the others.
  • Exchange of halo data.
  • Solution gathered onto the master process for output, or written in parallel (MPI-IO).

  7. Parameter Broadcast
  • SBLI reads in data such as the number of grid points, the Reynolds number and which turbulence model to use. Only one process reads the data.
  • MPI: these are packed into real, integer and logical arrays and sent to the other processes using MPI_BCAST.
  • The receiving processes unpack the arrays:

  8. Parameter Broadcast (2)
  if (ioproc) then
     r(1) = reynolds
     ...
     r(18) = viscosity
     call mpi_bcast(r, 18, real_mp_type, ioid, MPI_comm_world, ierr)
  else
     call mpi_bcast(r, 18, real_mp_type, ioid, MPI_comm_world, ierr)
     reynolds = r(1)
     ...
     viscosity = r(18)
  endif

  9. Parameter Broadcast (3)
  • CAF version: each image fetches the data from the I/O image.
  call sync_all()
  if (.not. ioproc) then
     reynolds = reynolds[ioid]
     ...
     viscosity = viscosity[ioid]
  end if
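For the coindexed fetches above to compile, the parameters themselves must be coarrays. A plausible form of the declarations (hypothetical, not quoted from SBLI) would be:

   double precision :: reynolds[*], viscosity[*]   ! every image holds a copy;
                                                   ! the I/O image's copy is read remotely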

  10. Mesh Distribution
  • Here the I/O processor has the global data and needs to send a different portion of it to each image.
  • Added complication: the local bounds of the data may differ between images.
  • In the current version the mesh is 2-D, projected.

  11. Mesh Distribution (2)
  • MPI version:
  if (ioproc)
     Find start and end indices of mesh for process j
     Pack global mesh(start:end) into buffer
     Send buffer to process j
  else
     Receive buffer
     Unpack to local mesh
  endif
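Spelled out, this pattern corresponds roughly to the sketch below. The routine name, the mesh_bounds helper and the sending of an array section instead of an explicitly packed buffer are illustrative assumptions, not the actual SBLI code, and error handling is omitted:

   subroutine distribute_mesh(global_mesh, local_mesh, ioid, ioproc)
      use mpi
      implicit none
      double precision, intent(in)  :: global_mesh(:)   ! meaningful on the I/O rank only
      double precision, intent(out) :: local_mesh(:)
      integer, intent(in)           :: ioid
      logical, intent(in)           :: ioproc
      integer :: j, nprocs, istart, iend, n, ierr

      call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)
      if (ioproc) then
         do j = 0, nprocs - 1
            call mesh_bounds(j, istart, iend)       ! hypothetical helper: bounds for rank j
            n = iend - istart + 1
            if (j == ioid) then
               local_mesh(1:n) = global_mesh(istart:iend)   ! keep own portion
            else
               call MPI_Send(global_mesh(istart:iend), n, MPI_DOUBLE_PRECISION, &
                             j, 0, MPI_COMM_WORLD, ierr)
            end if
         end do
      else
         call MPI_Recv(local_mesh, size(local_mesh), MPI_DOUBLE_PRECISION, &
                       ioid, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE, ierr)
      end if
   end subroutine distribute_mesh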

  12. Mesh Distribution (3)
  • CAF version:
  do j = 1, num_images()
     find start and end for image j
     local(:)[j] = global(start:end)
  end do

  13. Mesh Distribution (4)
  • Better:
  find start and end for this_image()
  local(:) = global(start:end)[ioid]
  • Advantage: possible parallelism if multiple access to global is supported.

  14. Halo Exchange
  • The MPI version again packs the data to be exchanged into a buffer and sends it to the appropriate neighbour, which unpacks it.
  • The coarray version uses simple co-addressing.
  • Example: sending data to the image one x-step lower (procmx):

  15. Halo Exchange (2)
  type(co_double_3) :: a[nxim, nyim, *]
  integer :: nx[nxim, nyim, *], d(3), nxp
  d = this_image(a)
  if (d(1) .gt. 1) then
     nxp = nx[d(1)-1, d(2), d(3)]
     a[d(1)-1, d(2), d(3)]%array(nxp+1:nxp+xhalo, :, :) &
        = a%array(1:xhalo, :, :)
  end if
  • Separate routines cover the x, y and z exchanges; each handles both directions.

  16. Caveat
  • Synchronization is important.
  • MPI communication often implies synchronization.
  • With coarrays it must be made explicit (though for some algorithms it can be left out or reduced).
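A minimal, self-contained sketch (hypothetical, and again in Fortran 2008 syntax rather than the call sync_all() form used on the slides) of where the explicit synchronisation sits around a halo put:

   program halo_sync_sketch
      implicit none
      integer, parameter :: n = 8
      double precision :: u(0:n+1)[*]      ! interior cells 1..n plus one halo cell each side
      integer :: me, np

      me = this_image()
      np = num_images()
      u  = dble(me)                        ! fill local data

      sync all                             ! every image has finished writing its own u
      if (me < np) u(0)[me+1] = u(n)       ! push my last interior cell into the right
                                           ! neighbour's lower halo
      sync all                             ! no image reads its halo before all puts complete

      if (me > 1) print *, 'image', me, 'received halo value', u(0)
   end program halo_sync_sketch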

  17. Code Comparison Summary
  + Simple assignment statements replace MPI calls
  + No need to pack and unpack data (a common source of programming errors)
  + Simpler, shorter, more maintainable code
  - Added indirection through allocatable components

  18. BUT…
  • How does the code perform?
  • Have we gained clarity and lost speed?
  • SBLI is a mature code, and a lot of work has gone into making its MPI communication as efficient as possible.

  19. Experiments
  • Small mesh (120 cubed)
  • Small Cray X1E
  • Run for 100 timesteps, so the overall time is dominated by the exchange time (realistic for how this code would work in production).

  20. Speedup
  [Figure: speedup relative to one processor versus number of processors, on log-log axes from 1 to 100, comparing linear scaling with the MPI and Co-Array versions.]

  21. Performance
  • Comparable with MPI (a few percent lower at most, and within the run-to-run variability of individual runs).
  • Scaling behaviour unaffected, but note that this problem scales strangely from 4 to 8 images, probably for memory reasons.

  22. Optimization
  • MPI is powerful and offers many ways of communicating, which the programmer can use to optimize a code.
  • Coarrays are simple and leave plenty of scope for compiler optimization.
  • But…

  23. Optimization (2)
  • There are a few things one can do:
  – The order of memory accesses, just as in serial Fortran, can have an impact
  – “push” vs “pull”: which side of the assignment statement should carry the coarray reference?
  Push: a[k] = a
  Pull: a = a[k]
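In the mesh-distribution setting of slides 12 and 13, the two alternatives look roughly like the sketch below (the names global, local, ioid and the mesh_bounds helper are illustrative; for the pull form, global must itself be declared as a coarray):

   ! Push: the I/O image writes each image's portion into that image's coarray.
   if (this_image() == ioid) then
      do j = 1, num_images()
         call mesh_bounds(j, istart, iend)
         local(1:iend-istart+1)[j] = global(istart:iend)
      end do
   end if
   sync all

   ! Pull (the alternative): every image reads its own portion from the I/O image.
   call mesh_bounds(this_image(), istart, iend)
   local(1:iend-istart+1) = global(istart:iend)[ioid]
   sync all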

  24. “push” vs “pull”
  • Experiments: distribute a 240-cubed mesh. Timings by number of processors:
  Processors   push    pull
      8        2.289   1.492
     16        2.154   1.406
     32        1.427   0.593
     64        1.018   0.644
  • Pulling data is more efficient, especially at high processor counts.

  25. “push” vs “pull” (2)
  • These experiments are indicative only
  • Low impact on the current code
  • If your code does a lot of scatter/gather, this is an area to optimize

  26. Conclusions
  • Coarray Fortran provides a language which:
  – Expresses parallelism in a “natural”, Fortran-like manner
  – Produces transparent, maintainable code
  – Is easy to learn by extending existing language skills
  – Provides performance comparable with mature MPI code, in this case

  27. Acknowledgements
  • Thanks to:
  – Bill Long of Cray for access to their X1 and X1E machines
  – Mike Ashworth and Roderick Johnstone of STFC Daresbury Laboratory for access to the SBLI code.
