SLIDE 1

Multi-scale Application Software Development Ecosystem on ARM

  • Dr. Xiaohu Guo

STFC Hartree Centre, UK

SLIDE 2

STFC's Sites

  • Joint Astronomy Centre, Hawaii
  • Isaac Newton Group of Telescopes, La Palma
  • UK Astronomy Technology Centre, Edinburgh, Scotland
  • Polaris House, Swindon, Wiltshire
  • Chilbolton Observatory, Stockbridge, Hampshire
  • Daresbury Laboratory, Daresbury Science and Innovation Campus, Warrington, Cheshire
  • Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire

SLIDE 3

Overview

  • Multiscale simulation framework
  • Our early porting experience on the Isambard Arm ThunderX2 system
  • Discussion and future work
SLIDE 4

Multiple Scales of Materials Modelling

  • MS & MD via DL_POLY
  • DPD & LB via DL_MESO
  • KMC via DL_AKMC
  • FF mapping via DL_FIELD
  • MC via DL_MONTE
  • Coarse graining via DL_CGMAP
  • QM/MM bridging via ChemShell

SLIDE 5

Multi-scale Simulation Software Ecosystem

SLIDE 6

User Community

2016 web-registration downloads:

  • UK – 19.2%
  • EU (excluding UK) – 18.7%
  • USA – 11.4%
  • India – 10.3%
  • China – 9.4%
  • France – 5.9%
  • London – 5.5%
  • Sofia – 2.0%
  • Beijing – 1.8%

Annual downloads & valid e-mail list size:

  • 2010 :: DL_POLY (2+3+MULTI) – 1,000 (list end)
  • 2017 :: DL_POLY_4 – 4,200 (list started 2011)

[Chart: annual downloads by version – DL_POLY_2, DL_POLY_3, DL_POLY_4, DL_POLY_C]

SLIDE 7

DL_POLY: MD code

  • Protein solvation & binding
  • DNA strand dynamics
  • Membrane processes
  • Drug polymorphs & discovery
  • Crystalline & amorphous solids – damage and recovery
  • Dynamic processes in metal-organic & organic frameworks
  • Dynamics at interfaces & of phase transformations

Thanks to Dr. Ilian Todorov

SLIDE 8

DL_MESO: Mesoscale Simulation Toolkit

  • General-purpose, highly scalable mesoscopic simulation software (developed for CCP5/UKCOMES)
    – Lattice Boltzmann Equation (LBE)
    – Dissipative Particle Dynamics (DPD)
  • >800 academic registrations (science and engineering)
  • Extensively used for the Computer Aided Formulation (CAF) project with a TSB-funded industrial consortium

Thanks to Dr. Michael Seaton

SLIDE 9

CFD software in the macro-scale region

IMPORTANCE: Hartree Centre key technologies, aligned with SCD missions and STFC global challenge schemes.

[Application examples: FEM, SPH/ISPH, nuclear, Schlumberger oil reservoir, NERC ocean roadmap, EPSRC MAGIC, wave impact on a BP oil rig, Manchester Bob tsunami, CCP-WSI]

SLIDE 10

Concurrent Coupling Toolkit: MUI

[Figure: data points, data exchange interface, DPD–SPH coupling]

Yu-Hang Tang et al., "Multiscale Universal Interface: A concurrent framework for coupling heterogeneous solvers", Journal of Computational Physics, Vol. 297, 2015, pp. 13–31.
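As a rough illustration of how a solver talks to MUI, the sketch below pushes and fetches one field through a shared interface. It loosely follows the style of the demo codes in the MUI repository; the interface URI, field name and sampler parameters are illustrative assumptions, not taken from this talk.

```cpp
// One side of a concurrent coupling through MUI (header-only C++ library,
// github.com/MxUI/MUI). A sketch in the style of MUI's demo codes; the
// interface URI, field name and sampler parameters are illustrative only.
#include "mui.h"

int main() {
    // Split MPI_COMM_WORLD between the two coupled applications (MPMD launch);
    // a real solver would use this communicator for its own parallelism.
    MPI_Comm domain_comm = mui::mpi_split_by_app();
    (void)domain_comm;

    // Coupling interface shared with the peer solver.
    mui::uniface3d interface("mpi://dpd_side/interface");

    mui::point3d loc(0.1, 0.2, 0.3);                  // a point on the interface
    mui::sampler_gauss3d<double> spatial(0.5, 0.25);  // Gaussian spatial interpolation
    mui::chrono_sampler_exact3d temporal;             // exact time matching

    for (int step = 0; step < 10; ++step) {
        double t = step * 0.01;

        interface.push("velocity", loc, 1.0);  // send local field values
        interface.commit(t);                   // publish this time frame

        // Receive the peer solver's field at the same location and time.
        double v = interface.fetch("velocity", loc, t, spatial, temporal);
        (void)v;
    }
    // MPI finalisation is left to the host solver's normal shutdown path.
    return 0;
}
```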

SLIDE 11

Algorithms Abstraction and Programming Implementation

  • Numerical methods: FEM, FDM, FVM; MD, DPD, SPH/ISPH
  • Sparse/dense linear solvers; FEM matrix assembly
  • Basic math operators; basic particle math operators
  • Unstructured mesh pre/post-processing; particle pre/post-processing
  • Mesh topology management; mesh adaptivity
  • Nearest neighbour list search (sketched below); mesh/particle reordering; particle refinement
  • DDM/DLB (domain decomposition / dynamic load balancing)
  • Programming models and languages: MPI, OpenMP, CUDA, OpenCL, OpenACC; C/C++, Fortran, Python
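To make one of these building blocks concrete, the following is a cell-list based nearest-neighbour search of the kind used in MD/DPD/SPH kernels. It is a minimal sketch with an assumed spatial-hash layout, not the framework's actual implementation.

```cpp
// Illustrative cell-list nearest-neighbour search, one of the particle-method
// building blocks named above. A sketch only, not the framework's actual code.
#include <cmath>
#include <cstdint>
#include <unordered_map>
#include <vector>

struct Particle { double x, y, z; };

// Bin particles into cells of edge length `cutoff`; neighbours then only need
// to be searched in a particle's own cell and the 26 adjacent cells.
std::vector<std::vector<int>>
neighbour_lists(const std::vector<Particle>& p, double cutoff) {
    auto cell = [cutoff](double v) { return (long)std::floor(v / cutoff); };
    auto key  = [](long cx, long cy, long cz) {
        // Spatial hash; collisions are harmless because of the distance test below.
        return (std::uint64_t)cx * 73856093ULL ^
               (std::uint64_t)cy * 19349663ULL ^
               (std::uint64_t)cz * 83492791ULL;
    };

    std::unordered_map<std::uint64_t, std::vector<int>> cells;
    for (int i = 0; i < (int)p.size(); ++i)
        cells[key(cell(p[i].x), cell(p[i].y), cell(p[i].z))].push_back(i);

    const double r2 = cutoff * cutoff;
    std::vector<std::vector<int>> nbr(p.size());
    for (int i = 0; i < (int)p.size(); ++i) {
        long cx = cell(p[i].x), cy = cell(p[i].y), cz = cell(p[i].z);
        for (long dx = -1; dx <= 1; ++dx)
        for (long dy = -1; dy <= 1; ++dy)
        for (long dz = -1; dz <= 1; ++dz) {
            auto it = cells.find(key(cx + dx, cy + dy, cz + dz));
            if (it == cells.end()) continue;
            for (int j : it->second) {
                double ddx = p[i].x - p[j].x, ddy = p[i].y - p[j].y,
                       ddz = p[i].z - p[j].z;
                if (j != i && ddx * ddx + ddy * ddy + ddz * ddz < r2)
                    nbr[i].push_back(j);
            }
        }
    }
    return nbr;
}
```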

SLIDE 12

Porting the Software Framework to the Arm Platform

SLIDE 13

Isambard system specification

  • 10,752 Armv8 cores (168 nodes × 2 sockets × 32 cores)
  • Cavium ThunderX2, 32 cores at 2.1 GHz
  • Cray XC50 'Scout' form factor
  • High-speed Aries interconnect
  • Cray HPC-optimised software stack: CCE, Cray MPI, math libraries, CrayPAT, etc.
  • Phase 2 (the Arm part):
    – Delivered Oct 22nd
    – Handed over Oct 29th
    – Accepted Nov 9th!

Isambard PI: Prof Simon McIntosh-Smith, University of Bristol / GW4 Alliance

SLIDE 14

Performance on mini-apps (node level comparisons)

Thanks to Prof. Simon McIntosh-Smith

SLIDE 15

Single node performance results

https://github.com/UoB-HPC/benchmarks

Thanks to Prof. Simon McIntosh-Smith

SLIDE 16

Early DL_POLY Performance Results

SLIDE 17

Early DL_MESO Performance Results

SLIDE 18

Early ISPH Performance Results

SLIDE 19

Performance comparison with our Scafell Pike system

SLIDE 20

Current Arm software ecosystem

Three mature compiler suites:

  • GNU (gcc, g++, gfortran)
  • Arm HPC Compilers, based on LLVM (armclang, armclang++, armflang)
  • Cray Compiling Environment (CCE)

Three mature sets of math libraries:

  • OpenBLAS + FFTW
  • Arm Performance Libraries (BLAS, LAPACK, FFT)
  • Cray LibSci + Cray FFTW

Multiple performance analysis and debugging tools:

  • Arm Forge (MAP + DDT, formerly Allinea)
  • CrayPAT / perftools, CCDB, gdb4hpc, etc.
  • TAU, Scalasca, Score-P, PAPI, MPE
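All three math library stacks expose the standard BLAS/LAPACK/FFTW interfaces, so application code can stay the same and only the link line changes. Below is a minimal sketch through CBLAS, assuming the chosen library provides a cblas.h (as OpenBLAS and the Arm Performance Libraries do).

```cpp
// Dense matrix multiply through the standard CBLAS interface. The same source
// builds against OpenBLAS, the Arm Performance Libraries or Cray LibSci by
// changing only the link line / loaded module (assuming the chosen library
// provides a cblas.h, as OpenBLAS and ArmPL do).
#include <cblas.h>
#include <vector>

int main() {
    const int n = 512;
    std::vector<double> A(n * n, 1.0), B(n * n, 2.0), C(n * n, 0.0);

    // C = 1.0 * A * B + 0.0 * C, row-major storage.
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                n, n, n,
                1.0, A.data(), n,
                B.data(), n,
                0.0, C.data(), n);
    return 0;
}
```

Typically this links with -lopenblas under GNU and with -armpl under the Arm compilers, while the Cray compiler wrappers pull in LibSci automatically; treat the exact flags as assumptions to be checked against the local module environment.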

SLIDE 21

More ARM productivity features needed!

  • The Arm processor does not trap integer division by zero
    – Architectural decision – no signal is thrown
    – The result is simply zero (1/0 == 0)
    – Floating-point division by zero, in contrast, does trap (SIGFPE)
  • Need the latest autoconf and automake; update your config.guess and config.sub
  • Weak memory model: your lock-free threading implementation may not work here! (See the sketch after this list, which also demonstrates the division behaviour.)
  • How can we use NVIDIA GPUs?
  • More math libraries?
  • DD/DLB libraries?
  • Sparse linear solvers? Particular threaded libraries?
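The division and memory-model points can be seen with a small test program; the sketch below is illustrative, not code from this talk, and the division behaviour described is the AArch64 one noted above.

```cpp
// Small demo of two of the points above (a sketch, not code from this talk).
// Build natively on an AArch64 node, e.g.: g++ -O2 -pthread demo.cpp
#include <atomic>
#include <cstdio>
#include <thread>

int main() {
    // 1) Integer division by zero: the AArch64 SDIV/UDIV instructions do not
    //    trap, so a runtime 1/0 quietly yields 0 instead of raising SIGFPE as
    //    on x86. (It is still undefined behaviour in C++; 'volatile' stops the
    //    compiler folding the division so the hardware behaviour is visible.)
    volatile int one = 1, zero = 0;
    std::printf("1/0 on this machine = %d\n", one / zero);

    // 2) Weak memory model: plain loads/stores are not enough to publish data
    //    between threads. Release/acquire atomics (or locks) make the code
    //    correct on Arm as well as on the stronger x86 model.
    int payload = 0;
    std::atomic<bool> ready{false};

    std::thread producer([&] {
        payload = 42;                                  // write the data first
        ready.store(true, std::memory_order_release);  // then publish the flag
    });
    std::thread consumer([&] {
        while (!ready.load(std::memory_order_acquire)) { }  // wait for the flag
        std::printf("payload = %d\n", payload);              // guaranteed to see 42
    });

    producer.join();
    consumer.join();
    return 0;
}
```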
SLIDE 22

Software Ecosystem on Isambard

SLIDE 23

Motivation: Performance Optimization Space

SLIDE 24

Summary and conclusion

  • These are early results, generated quickly in the first few days, with no time yet to tune scaling, etc.
  • We expect the results to improve further as we continue to work on them.
  • The software stack has been robust, reliable and high quality (both the commercial and open-source parts).

SLIDE 25
SLIDE 26

Thanks. Questions?

SLIDE 27

GROMACS scalability, up to 8,192 cores

http://gw4.ac.uk/isambard/

Thanks to Prof. Simon McIntosh-Smith