MPAS on GPUs Using OpenACC
Supreeth Suresh Software Engineer II Special Technical Projects (STP) Group National Center for Atmospheric Research
26th September, 2019
Outline
Team Introduction
System and Software Specs
Horizontal Vertical
There are hundreds of halo exchanges per timestep!
Execution time: Physics 45-50%, DyCore 50-55%
Lines of code: Physics 110,000, DyCore 10,000
Dynamic Core: MPAS
Physics schemes:
Microphysics: WSM6 (9.62%)
Boundary Layer: YSU (1.55%)
Gravity Wave Drag: GWDO (0.71%)
Radiation Short Wave: RRTMG_SW (18.83%)
Radiation Long Wave: RRTMG_LW (16.43%)
Convection: New Tiedtke (4.19%)
Flow Diagram by KISTI
resolution (655k grid points, dt=150s), 15 km resolution (2.6M grid points, dt=90s), 10 km resolution (5.8M grid points, dt=60s), 5 km resolution (23M grid points, dt=30s)
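To make the cost implied by these mesh sizes and timesteps concrete, the following minimal sketch counts timesteps per simulated day and the resulting number of grid-point updates. The grid-point counts are the rounded figures from the slide; the function names and dictionary keys are illustrative only and do not come from MPAS.

```python
SECONDS_PER_DAY = 86_400

# (grid points, timestep in seconds), rounded values as listed on the slide.
# The first entry is keyed by point count because its resolution is not given.
meshes = {
    "655k points": (655_000, 150),
    "15 km": (2_600_000, 90),
    "10 km": (5_800_000, 60),
    "5 km": (23_000_000, 30),
}

def steps_per_day(dt_seconds):
    """Number of model timesteps needed to advance one simulated day."""
    return SECONDS_PER_DAY // dt_seconds

def point_updates_per_day(points, dt_seconds):
    """Grid-point updates per simulated day (points times timesteps)."""
    return points * steps_per_day(dt_seconds)

for name, (points, dt) in meshes.items():
    print(f"{name}: {steps_per_day(dt)} steps/day, "
          f"{point_updates_per_day(points, dt):.3e} point updates/day")
```

Going from dt=150s to dt=30s multiplies the number of steps per simulated day by five, on top of the growth in grid points.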
Boundary Layer, Monin-Obukhov Surface Layer, RRTMG radiation, Xu-Randall Cloud Fraction
MPI & NOAH control path
CPU: SW/LW radiation & NOAH
GPU: everything else
[Figure labels: Proc 0, Proc 1, Node, Asynchronous I/O process, Idle processor]
– Bottleneck was send/recv buffer allocations on CPU
Coalescing these 9 kernels dropped MPI overhead by 50%
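The slide text does not say which nine kernels were merged. As a generic illustration of the idea, and assuming they were per-field halo pack routines, the sketch below contrasts packing each field's halo cells into its own buffer with packing all fields into one contiguous buffer in a single pass. All function names and the toy data are hypothetical; this is plain Python, not the OpenACC code used in MPAS.

```python
def pack_per_field(fields, halo_idx):
    """One buffer per field: one pack pass (and one message) for each field."""
    return [[field[i] for i in halo_idx] for field in fields]

def pack_coalesced(fields, halo_idx):
    """A single buffer holding the halo cells of every field back to back."""
    buf = []
    for field in fields:
        buf.extend(field[i] for i in halo_idx)
    return buf

def unpack_coalesced(buf, n_fields, halo_idx):
    """Split the single buffer back into one halo list per field."""
    n = len(halo_idx)
    return [buf[k * n:(k + 1) * n] for k in range(n_fields)]

# Toy example: two fields of four cells each, halo cells at indices 0 and 3.
fields = [[0, 1, 2, 3], [10, 11, 12, 13]]
halo_idx = [0, 3]

single_buffer = pack_coalesced(fields, halo_idx)
print(single_buffer)
print(unpack_coalesced(single_buffer, len(fields), halo_idx))
```

Both approaches move the same data; the coalesced form replaces several small pack steps and messages with one larger one, which is the kind of change that reduces per-launch and per-message overhead.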
[Figure: MPAS Initialization Scaling on Summit for 15 & 10 km. Init time (sec) vs. number of AC922 nodes, both axes spanning 1 to 1000; series: MPAS 15 km, MPAS 10 km.]
[Figure: Moist Dynamics Strong Scaling on Summit and Cheyenne at 10 km. Time per timestep (sec) vs. number of GPUs or dual-socket CPU nodes; series: strong scaling with 5.8M points on GPU, strong scaling with 5.8M points on CPU.]
[Figure: time per timestep (sec) vs. number of GPUs.]
[Figure: time per timestep (sec) vs. number of GPUs/MPI ranks; series: 40k points per GPU, 80k points per GPU.]
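As a rough guide to what these per-GPU loads mean in practice, the sketch below estimates how many GPUs a mesh needs at a fixed number of points per GPU. The mesh sizes are the rounded values quoted earlier in the slides; the function name is illustrative, and real decompositions will not divide this evenly.

```python
def gpus_needed(total_points, points_per_gpu):
    """Smallest number of GPUs so that no GPU holds more than points_per_gpu."""
    # Integer ceiling division avoids floating-point rounding.
    return -(-total_points // points_per_gpu)

# Rounded mesh sizes from the slides.
mesh_points = {"15 km": 2_600_000, "10 km": 5_800_000, "5 km": 23_000_000}

for name, points in mesh_points.items():
    for load in (40_000, 80_000):
        print(f"{name} at {load} points/GPU: {gpus_needed(points, load)} GPUs")
```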
Grep search help string
Preprocessor directive to flip GPU/CPU based on requirement
MPAS-A estimated timestep budget for 40k pts per GPU
Chart labels: dynamics (dry), dynamics (moist), physics, radiation comms, halo comms, H<->D data transfer
Chart values: 0.139 sec, 0.03 sec, 0.085 sec, 0.003 sec, 0.06 sec, 0.018 sec
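The budget can be summed to get an overall time per timestep. The sketch below does so using only the six values, because the extracted text does not preserve which value belongs to which component; the variable names are illustrative.

```python
# Per-step timings in seconds as listed on the slide. The label-to-value
# pairing is not recoverable from the text, so only the values are used.
budget_seconds = [0.139, 0.03, 0.085, 0.003, 0.06, 0.018]

total_seconds = sum(budget_seconds)
shares = [t / total_seconds for t in budget_seconds]

print(f"total per timestep: {total_seconds:.3f} s")
for t, share in zip(budget_seconds, shares):
    print(f"{t:.3f} s  {share:6.1%}")
```

The six values add up to about 0.335 s per timestep, and the largest single entry (0.139 s) accounts for roughly 41% of that total.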
[Figure: simulated days per hour vs. number of GPUs (8 to 512); series: 15 km, 10 km; reference line: AVEC forecast threshold.]
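The days-per-hour metric on this chart follows from the model timestep and the wall-clock time per step. A minimal sketch of that conversion is shown below; the function name and the 0.25 s example timing are hypothetical and are not taken from the measurements.

```python
SECONDS_PER_HOUR = 3_600
SECONDS_PER_DAY = 86_400

def simulated_days_per_hour(dt_seconds, wall_seconds_per_step):
    """Simulated days completed in one wall-clock hour."""
    steps_per_wall_hour = SECONDS_PER_HOUR / wall_seconds_per_step
    simulated_seconds = steps_per_wall_hour * dt_seconds
    return simulated_seconds / SECONDS_PER_DAY

# Hypothetical example: dt=60s (the 10 km timestep) at 0.25 s of wall time per step.
print(f"{simulated_days_per_hour(60, 0.25):.1f} simulated days per hour")
```

Halving the wall-clock time per step doubles the throughput, which is why time-per-timestep scaling plots translate directly into this metric.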
[Figure: time per timestep (sec) vs. number of GPUs; series: moist dynamics with 6 tracers, dry dynamics.]