Parallel Zero-Copy Algorithms for Fast Fourier Transform and Conjugate Gradient using MPI Datatypes
Torsten Hoefler, Steven Gottlieb
EuroMPI 2010, Stuttgart, Germany, Sep. 13th 2010
Quick MPI Datatype Introduction
(de)serialize arbitrary
implementation (many unexplored possibilities!)
– Size of DDT signature (total occupied bytes)
– Important for matching (signatures must match)
– Where does the DDT start
– Size of the DDT
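The three quantities above (size, start, and extent of a derived datatype) can be illustrated with a small model. The sketch below is plain Python, not the MPI API: it represents a datatype as a list of `(byte_offset, byte_length)` blocks, and the function names and the strided example layout are hypothetical, chosen only for illustration. In MPI itself, the corresponding queries are `MPI_Type_size` and `MPI_Type_get_extent`.

```python
# Plain-Python model of a datatype's typemap; this is NOT the MPI API.
# A typemap here is a list of (byte_offset, byte_length) blocks.

def type_size(typemap):
    """Bytes actually occupied by data (cf. MPI_Type_size)."""
    return sum(length for _, length in typemap)

def type_lower_bound(typemap):
    """Lowest byte offset: where the datatype starts."""
    return min(offset for offset, _ in typemap)

def type_extent(typemap):
    """Span from lower bound to upper bound (alignment padding ignored)."""
    upper = max(offset + length for offset, length in typemap)
    return upper - type_lower_bound(typemap)

# Hypothetical layout: three 8-byte doubles, taking every 4th double.
strided = [(i * 4 * 8, 8) for i in range(3)]

print("size:", type_size(strided))
print("lower bound:", type_lower_bound(strided))
print("extent:", type_extent(strided))
```

In this model the size counts only the selected bytes, while the extent also covers the gaps between them, which is why the two values differ for any strided layout.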
1. Type_struct for complex numbers
2. Type_contiguous for blocks
3. Type_vector for stride
– Type_struct (complex)
– Type_vector (no contiguous, local transpose)
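The idea of using a strided vector type to perform a local transpose can be sketched as follows. This is a plain-Python model under stated assumptions, not the authors' implementation and not MPI code: `vector_indices`, `gather`, and `transpose_with_vector` are hypothetical helpers, Python's `complex` stands in for the struct type, and the model assumes consecutive columns start one element apart.

```python
# Plain-Python model of a strided "vector" selection; NOT the MPI API.

def vector_indices(count, blocklength, stride):
    """Element indices picked by a vector layout (cf. MPI_Type_vector):
    `count` blocks of `blocklength` elements, `stride` elements apart."""
    return [b * stride + k for b in range(count) for k in range(blocklength)]

def gather(buffer, indices, start=0):
    """Read the selected elements, beginning at element `start`."""
    return [buffer[start + i] for i in indices]

def transpose_with_vector(matrix_flat, n):
    """Transpose an n x n row-major matrix by reading one column per step."""
    column = vector_indices(count=n, blocklength=1, stride=n)
    out = []
    for c in range(n):  # assumption: columns start one element apart
        out.extend(gather(matrix_flat, column, start=c))
    return out

n = 3
# Hypothetical test data: element (r, c) holds the complex number r + c*j.
a = [complex(r, c) for r in range(n) for c in range(n)]
t = transpose_with_vector(a, n)
print(t[1])  # element (0, 1) of the transpose, i.e. element (1, 0) of a
```

The point of the model is that the strided selection reorders the data while it is being read, so no separate pack or transpose pass over a temporary buffer is needed.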
Reproducible peak at P=192
Scaling stops without datatypes
DDTs increase scalability
fundamental laws of physics
lattice field theories (QCD & Beyond Standard Model)
Performance Modeling and Simulation on Blue Waters
– su3_vector, half_wilson_vector, and su3_matrix
– Even and odd (checkerboard layout)
– Eight directions
– 48 contiguous/hvector DDTs total (stored in a 3D array)
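The count of 48 follows from 3 field types × 2 parities × 8 directions. A minimal sketch of such a 3D lookup table is shown below; the index order `[field][parity][direction]` and the string placeholders are assumptions made for illustration, since the slides do not specify the layout. A real code would store a committed datatype handle in each slot instead of a label.

```python
# Sketch of a 3D table of datatypes; the entries are placeholder labels,
# NOT real MPI datatype handles.

FIELD_TYPES = ["su3_vector", "half_wilson_vector", "su3_matrix"]
PARITIES = ["even", "odd"]  # checkerboard layout
NUM_DIRECTIONS = 8

def build_ddt_table():
    """Return table[field][parity][direction] with one entry per combination."""
    return [
        [
            [f"{field}/{parity}/dir{d}" for d in range(NUM_DIRECTIONS)]
            for parity in PARITIES
        ]
        for field in FIELD_TYPES
    ]

table = build_ddt_table()
total = sum(len(directions) for per_field in table for directions in per_field)
print("datatypes in table:", total)
```

Indexing by the three attributes makes the lookup at communication time a constant-time array access.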
– Up to a factor of 3.8 or 18% speedup!
– Requires some implementation effort
– Declaration and extent tricks make it hard to debug