  1. An Evaluation of UPC in the Ludwig Application
     Alan Gray, EPCC, The University of Edinburgh
     CUG 2009, Atlanta

  2. Introduction
     • Modern HPC architectures comprise multiple nodes connected via an interconnect.
     • Applications must utilise these multiple nodes to solve a single problem:
       – a mechanism is needed for each process to acquire remote data.
     • Message passing (MPI) has become the de facto standard:
       – complex coding is needed to manage the message passing;
       – performance overheads arise from the underlying two-way communication.
     • Novel PGAS languages offer intuitive access to remote data:
       – potentially increasing productivity and performance in HPC.
     • UPC is (arguably) the most mature and portable PGAS language available today.

  3. Introduction (cont.)
     • Aim: evaluate UPC as a replacement for MPI within a real application (LUDWIG) and measure the resulting performance.
     • A full conversion is beyond the scope of this work:
       – but UPC and MPI can co-exist, so the area of interest can be targeted.
     • UPC is fully supported at the hardware level on the Cray X2:
       – this study uses the X2 component of HECToR (112 processors);
       – UPC will be fully supported on the XT after the upgrade to the Gemini interconnect.

  4. UPC
     • Consider a simplistic case: 8 elements distributed between 2 processes, where updates require neighbouring values.
     • Regular C array (local; 4 owned elements plus 2 halo cells): int p[6];
     • UPC shared array (global): shared [8/THREADS] int s[8];
     • A minimal sketch of this example follows below.
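To make the slide's example concrete, here is a minimal, hedged UPC sketch (not code from the talk). It assumes a static THREADS environment, i.e. the thread count is fixed at compile time, which the 8/THREADS block size requires; the result array t and the printing are illustrative additions that keep the neighbour update race-free.

    #include <upc.h>
    #include <stdio.h>

    #define N 8

    /* The slide's global array: blocks of N/THREADS elements, so with
       2 threads, thread 0 owns s[0..3] and thread 1 owns s[4..7]. */
    shared [N/THREADS] int s[N];
    shared [N/THREADS] int t[N];   /* illustrative result array */

    int main(void) {
        int i;

        /* Each thread initialises the elements it has affinity to. */
        upc_forall (i = 0; i < N; i++; &s[i])
            s[i] = i;

        upc_barrier;

        /* A neighbour update: s[i-1] may live on another thread, yet
           the access syntax is identical to a local array read. */
        upc_forall (i = 1; i < N; i++; &t[i])
            t[i] = s[i] + s[i - 1];

        upc_barrier;
        if (MYTHREAD == 0)
            for (i = 1; i < N; i++)
                printf("t[%d] = %d\n", i, t[i]);
        return 0;
    }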

  5. LUDWIG
     • LUDWIG uses lattice-Boltzmann models to simulate the hydrodynamics of complex fluids (mixtures of fluids, and solid/fluid systems) in 3D.
       – Jean-Christophe Desplat, Dublin Institute for Advanced Studies
       – Kevin Stratford, Mike Cates, The University of Edinburgh
     • Applications include personal care products, e.g. shampoo.

  6. LUDWIG
     • Original code: halo cells are only accessed in the propagation stage. (figure)

  7. LUDWIG Conversion
     • The main data structure is the array site[], where:
       – each element corresponds to a lattice site;
       – each element is a struct containing the physical variables.
     • Original code, propagation section: updates require values from neighbouring sites.
         Loop over index
         ...
         site[index].f[0] = site[index-1].f[0] + ...;
         ...
     • Halo cells plus message-passing halo-swap routines are required; a sketch of this layout follows below.
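A hedged sketch of this original layout in plain C: the struct members, NVEL, the 1D indexing and the function shape are illustrative assumptions, not LUDWIG's actual code.

    #define NVEL 19                    /* e.g. a D3Q19 lattice (assumed) */

    typedef struct {
        double f[NVEL];                /* distribution values at one site */
    } Site;

    /* site[] holds nlocal interior sites plus one halo cell at each end;
       the halo cells are filled by MPI halo-swap routines beforehand. */
    void propagate(Site *site, int nlocal) {
        for (int index = 1; index <= nlocal; index++) {
            /* Each update pulls values from a neighbouring site; at
               index == 1 the neighbour is a halo cell holding remote data. */
            site[index].f[0] = site[index - 1].f[0] /* + ... */;
        }
    }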

  8. LUDWIG Conversion
     • Strategy: mirror site with a UPC shared structure s_site.
     • New functionality:
       – sindex[index]: mapping between the local (site) and global (s_site) index;
       – put_site_in_shared(): copy data from local to shared;
       – get_site_from_shared(): copy data from shared to local.
     • This allows a specific area of the application to be targeted.
     • The propagation section is adapted to work with the shared array:
         Loop over index
         ...
         s_site[sindex[index]].f[0] = s_site[sindex[index-1]].f[0] + ...;
         ...
     • No halo cells or swaps are needed; remote accesses are performed directly (see the sketch below).
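A hedged sketch of this strategy: only the names s_site, sindex, put_site_in_shared and get_site_from_shared come from the slides; NVEL, NSITES, the blocking and all function bodies are assumptions (a static THREADS environment is again assumed).

    #include <upc.h>

    #define NVEL   19
    #define NSITES 1024
    #define NLOCAL (NSITES / THREADS)

    typedef struct { double f[NVEL]; } Site;

    shared [NLOCAL] Site s_site[NSITES];  /* shared mirror of site[] */

    Site site[NLOCAL];                    /* existing local array      */
    int  sindex[NLOCAL];                  /* local -> global index map */

    void put_site_in_shared(void) {       /* copy local -> shared */
        for (int i = 0; i < NLOCAL; i++)
            s_site[sindex[i]] = site[i];
    }

    void get_site_from_shared(void) {     /* copy shared -> local */
        for (int i = 0; i < NLOCAL; i++)
            site[i] = s_site[sindex[i]];
    }

    /* Propagation on the shared mirror: a neighbour owned by another
       thread is read directly, so no halo cells or swaps are needed. */
    void propagate(void) {
        for (int i = 0; i < NLOCAL; i++) {
            if (sindex[i] == 0) continue;  /* skip the global boundary */
            s_site[sindex[i]].f[0] = s_site[sindex[i] - 1].f[0] /* + ... */;
        }
    }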

  9. LUDWIG Conversion
     • Modified LUDWIG code: (figure)

  10. Performance results (figure)

  11. Performance results (figure)

  12. Performance results
      • The naïve adaptation has a substantial negative impact on performance.
      • The underlying communication is not the cause of this.
      • Dereferencing shared pointers is more costly than dereferencing regular pointers.
      • Optimised version: access memory through regular C pointers where possible:
        – these are obtained by casting from shared pointers;
        – boundary updates must still use shared-array accesses to get remote data (see the sketch below).
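Continuing the sketch from slide 8 (same assumed declarations, and simplifying by assuming sindex maps each thread's sites contiguously), the optimisation might look like this: a pointer-to-shared whose target has affinity to the calling thread may legally be cast to an ordinary C pointer, after which dereferences cost the same as any local access.

    void propagate_optimised(void) {
        /* Regular C pointer to this thread's own block of s_site. */
        Site *local = (Site *) &s_site[MYTHREAD * NLOCAL];

        /* Interior updates go through the cheap local pointer... */
        for (int i = 1; i < NLOCAL; i++)
            local[i].f[0] = local[i - 1].f[0] /* + ... */;

        /* ...but the first site of each thread's block still needs its
           neighbour's data, which may be remote, so that boundary update
           keeps the shared-array access. */
        if (MYTHREAD > 0)
            s_site[MYTHREAD * NLOCAL].f[0] =
                s_site[MYTHREAD * NLOCAL - 1].f[0] /* + ... */;
    }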

  13. Performance results (figure)

  14. Conclusions
      • UPC allows intuitive access to remote data, potentially increasing performance and productivity in HPC.
      • LUDWIG was adapted to utilise UPC functionality:
        – focusing on a key section of the code;
        – shared structures remove the need for complicated halo swaps.
      • The naïve adaptation showed significant performance degradation:
        – due to its sensitivity to costly shared-pointer operations.
      • The optimised version uses regular C pointers to access data where possible:
        – it performs similarly to (but slightly worse than) the MPI version;
        – the remaining degradation is likely due to the remaining shared-pointer operations.
      • It would be interesting to test on a larger system (including a future Cray XT).
