System Software for Armv8-A with SVE
Yutaka Ishikawa, Leader of FLAGSHIP2020 Project RIKEN Center for Computational Science
9:00–9:25, 14th of January, 2019, Open Source HPC Collaboration on Arm Architecture, Linaro workshop, Guangzhou, China
2019/1/14
K, and
in order to solve social and science issues in Japan
BSC, INRIA, RIKEN)
and their R&D organizations.
RIKEN Center for Computational Science
Target Applications
Program: Brief description
① GENESIS: MD for proteins
② Genomon: Genome processing (genome alignment)
③ GAMERA: Earthquake simulator (FEM in unstructured & structured grid)
④ NICAM+LETK: Weather prediction system using big data (structured grid stencil & ensemble Kalman filter)
⑤ NTChem: Molecular electronic structure calculation
⑥ FFB: Large eddy simulation (unstructured grid)
⑦ RSDFT: An ab initio program (density functional theory)
⑧ Adventure: Computational mechanics system for large-scale analysis and design (unstructured grid)
⑨ CCS-QCD: Lattice QCD simulation (structured grid Monte Carlo)
Courtesy of FUJITSU LIMITED
Architecture: Armv8.2-A SVE (512-bit SIMD)
Cores: 48 cores for compute and 2/4 for OS activities
Peak performance: DP: 2.7+ TF, SP: 5.4+ TF, HP: 10.8 TF
Cache: L1D: 64 KiB, 4-way, 230 GB/s (load), 115 GB/s (store); L2: 8 MiB, 16-way, 115 GB/s (load), 57 GB/s (store)
Memory: HBM2 32 GiB, 1024 GB/s
Interconnect: TofuD (28 Gbps × 2 lanes × 10 ports)
I/O: PCIe Gen3 × 16 lanes
Technology: 7nm FinFET
Symposium on High Performance Chips, San Jose, August 21, 2018.
CMG: CPU Memory Group; NOC: Network on Chip
by 6D mesh/torus Interconnect
Providing a wide range of applications, tools, libraries, and compilers
Parallel programming environments: XMP, FDPS, …
Multi-kernel system: Linux and light-weight kernel (McKernel)
Batch job system
Application-oriented file I/O
Communication: MPI, low-level communication
File systems: parallel file system, hierarchical file system, file I/O for hierarchical storage (LLIO)
Tuning and debugging tools
Languages and libraries: Fortran, C/C++, OpenMP, Java, …; math libraries
Process/thread: PIP
Hardware: Armv8 + SVE
provided by Fujitsu
Clang extensions
GCC, LLVM, and the Arm compiler will also be available
Specific library provided by RIKEN
Simulator)
distributor
Scalable also runs on Oakforest-PACS, operated by the University of Tsukuba and the University of Tokyo.
memory)
HPC applications
CPU cores, physical memory, …
load, boot, destroy, etc.)
and notification
calls, e.g., process and memory management, and the rest are offloaded to Linux
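[Figure: multi-kernel node layout. Linux partition: process/thread management, general scheduler, complex memory management, TCP stack, VFS, file system drivers, interrupt handling, system daemons, and in-situ non-HPC applications. McKernel partition (thin LWK): process/thread management and very simple memory management, running HPC applications. Both partitions expose the Linux API (glibc, /sys/, /proc/), and each partition runs on its own set of CPU cores.]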
(Experiments)
Interface for Heterogeneous Kernels
Linux without any recompilation
– daemons for the job scheduler, etc., run on Linux
App A, requiring an LWK without a scheduler, is invoked → Finish
App B, requiring an LWK with a scheduler, is invoked → Finish
App C, using full Linux capability, is invoked → Finish
Oakforest-PACS supercomputer, 25 PF peak, operated by JCAHPC, which is organized by the University of Tsukuba and the University of Tokyo
Balazs Gerofi, Rolf Riesen, Robert W. Wisniewski and Yutaka Ishikawa: “Toward Full Specialization of the HPC System Software Stack: Reconciling Application Containers and Lightweight Multi-kernels”, International Workshop on Runtime and Operating Systems for Supercomputers (ROSS), 2017
[Figure: development schedule, CY2017 to CY2021, with a "NOW" marker. Rows: Design and Implementation; Manufacturing; Installation and Tuning; Operation; Specification (Armv8-A + SVE overview, detailed hardware info.); Optimization Guidebook (published incrementally); RIKEN Performance Evaluation Environment (performance estimation tool using FX100, RIKEN Simulator); Early Access Program.]
a command queue, the Tofu network interface processes posted commands.
Normal Mode and Session Mode. In Session Mode, a special register called the Scheduling Pointer plays an important role.
the command queue are processed until reaching the entry pointed to by the Scheduling
packet sent by remote node
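MPI_Request req;   /* one persistent request, reused every iteration */
MPI_Status  stat;
/* Set up the neighborhood exchange once (MPI-4 signature: recvcount and info are required) */
MPI_Neighbor_alltoall_init(sbuf, count, MPI_DOUBLE, rbuf, count, MPI_DOUBLE, comm, MPI_INFO_NULL, &req);
for (i = 0; …) {
    /* Computation */
    MPI_Start(&req);        /* start this iteration's exchange */
    /* Computation, overlapped with communication */
    MPI_Wait(&req, &stat);  /* complete the exchange */
}
MPI_Request_free(&req);     /* release the persistent request */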
Tofu2 Offload
Direct transfers between user buffers
Completely asynchronous progression
Persistent point-to-point (≈ non-blocking point-to-point)
[Figure: latency (µs) vs. message size (bytes)]
for communication progress. Thus computation and communication
version.
interface,” Proceedings of the 24th European MPI Users' Group Meeting (EuroMPI2017), ACM, 2017.
Interconnect,” SC17, 2017 (poster)
Communication using NIC Offloading," SWOPP'18, HPC165, 2018. (In Japanese)
NetCDF, NetCDF-fortran, PnetCDF, scalasca, SCOTCH, Zoltan, openmpi1.8,
BLAT, TopHat, TopHat2, MapSplice2, MPDyn2, ELPA, Trilinos, Eigen3, mesa, MesaGLUT, libxml2, C-LIME, EigenExa
Biobambam, Picard, GMT, GrADS, HDF-EOS, wgrib, GRIB API, Climate Data Operators
FrontFlow/Red, FrontISTR, GAMES, GENESIS, gromacs, GROMACS, HIVE, LAMMPS, MapSplice2, MODYLAS, NEURON, octa, OpenFOAM, PBVR, Picard, PIMD, Quantum ESPRESSO, rDock, Samtools, SCALE, Star, TopHat, TopHat 2, WHEEL, xTAPP,
NuSDAS1.3, octa, fdps, Zoltan, cgns, Polylib, libsim
VPS solver (PAM-CRASH), Helyx, HEETAH, iconCFD, LaBS, JMAG, MIZUHO, NuFD, VASP, VSOP