GPU Benefits for Earth System Science
Stan Posey, Program Manager, ESS Domain, NVIDIA (HQ), Santa Clara, CA
Raghu Kumar, PhD, Software Engineer, Developer Technology, NVIDIA, Boulder, CO
TOPICS
- ACHIEVEMENTS IN HPC AND AI
- NUMERICAL MODEL DEVELOPMENTS
- GPU UPDATE ON MPAS-A
World-Leading HPC Systems Deploy NVIDIA GPUs
- ORNL Summit, #1 Top 500: 27,648 GPUs | 144 PF
- LLNL Sierra, #2 Top 500: 17,280 GPUs | 95 PF
- Piz Daint, Europe’s fastest: 5,704 GPUs | 21 PF
- ABCI, Japan’s fastest: 4,352 GPUs | 20 PF
- ENI HPC4, fastest industrial: 3,200 GPUs | 12 PF
NERSC-9 HPC system based on “Volta-Next” GPU during 2020
SC18 Gordon Bell Award: NERSC and NVIDIA Team
Segmentation of tropical storms and atmospheric rivers on Summit using convolutional neural networks:
- Nearly perfect weak scaling up to 25k GPUs
- 1 exaflop of performance
- 100 years of climate model data processed in hours
- Demonstrates the power of this approach for large-scale data analysis
SC18 NVIDIA Announcements on NWP Models
*Speedups comparing 2 x Skylake CPU vs. 4 x V100 GPU
New NVIDIA AI Tech Centre at Reading University
https://blogs.nvidia.com/blog/2019/06/19/ai-technology-center-uk/
The Advanced Computing for Environmental Science (ACES) research group conducts cutting-edge research in computer science to accelerate environmental science. Environmental science depends on the analysis of large volumes of observational data and on sophisticated simulation schemes that couple different physics on multiple time and spatial scales, demanding both supercomputing and specialised data analysis systems. ACES research themes address the future of the relevant computing and data systems. ACES is based in the Computer Science Department at the University of Reading.
NUMERICAL MODEL DEVELOPMENTS
NVIDIA Collaborations With Atmospheric Models (Global and Regional)

Model | Organizations | Funding Source
E3SM-EAM, SAM | US DOE: ORNL, SNL | E3SM, ECP
MPAS-A | NCAR, UWyo, KISTI, IBM | WACA II
FV3/UFS | NOAA | SENA
NUMA/NEPTUNE | US Naval Res Lab, NPS | ONR
IFS | ECMWF | ESCAPE
GungHo/LFRic | MetOffice, STFC | PSyclone
ICON | DWD, MPI-M, CSCS, MCH | PASC ENIAC
KIM | KIAPS | KMA
CLIMA | CLiMA (NASA JPL, MIT, NPS) | Private, US NSF
FV3 | Vulcan, UW/Bretherton | Private
COSMO | MCH, CSCS, DWD | PASC GridTools
AceCAST-WRF | TempoQuest | Venture backed
WRFg Collaboration with TempoQuest
WRFg is based on ARW release 3.8.1, with several science-ready features:
- Full WRF on GPU; 21 physics options
- Complete nesting functionality
Request download: https://wrfg.net
WRFg Physics Options (21)

Category | Schemes (WRF namelist option number)
Microphysics | Kessler (1), WSM6 (6), Thompson (8), Morrison (10), Aerosol-aware Thompson (28)
Radiation | Dudhia sw (1), RRTMG lw + sw (4)
Planetary boundary layer | YSU (1), MYNN (5)
Surface layer | Revised MM5 (1), MYNN (5)
Land surface | 5-layer TDS (1), Unified Noah (2), RUC (3)
Cumulus | Kain-Fritsch (1; 11; 99), BMJ (2), Grell-Devenyi (93), GRIMS shallow cumulus (shcu_physics = 3)
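These option numbers correspond to standard WRF &physics namelist settings. As a minimal sketch (assuming WRFg keeps WRF's stock namelist keys), one supported combination from the table above would be configured as:

```fortran
&physics
 mp_physics         = 8,   ! Thompson microphysics
 ra_lw_physics      = 4,   ! RRTMG longwave radiation
 ra_sw_physics      = 4,   ! RRTMG shortwave radiation
 bl_pbl_physics     = 1,   ! YSU planetary boundary layer
 sf_sfclay_physics  = 1,   ! revised MM5 surface layer
 sf_surface_physics = 2,   ! unified Noah land surface
 cu_physics         = 1,   ! Kain-Fritsch cumulus
/
```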
GPU Performance Study for the WRF Model on Summit
- Jeff Adie, NVIDIA; Gökhan Sever and Rajeev Jain, DOE Argonne NL; Stan Posey, NVIDIA
Joint WRF and MPAS Users' Workshop 2019, NCAR, Boulder, USA

NV-WRFg Summit scaling on 512 nodes / 3,072 GPUs (lower is better):
- ORNL Summit node: 2 x P9 + 6 x V100
- OpenACC, PGI 19.1; based on NCAR WRF 3.7.1
- WRF model configuration: 3.7B total cells; Thompson MP; RRTM / Dudhia; YSU PBL; Revised MM5 + TDS4
[Chart: timings for full model and full model + radiation]
Multiscale Coupled Urban Systems – PI, C. Catlett
http://www2.mmm.ucar.edu/wrf/users/workshops/WS2019/oral_presentations/3.3.pdf
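As a point of reference, an OpenACC build with PGI 19.1 targeting Summit's V100 GPUs would typically use compiler flags along these lines (an illustrative compile line with a hypothetical source file name, not the study's actual build configuration):

```
pgf90 -O3 -acc -ta=tesla:cc70 -Minfo=accel -o wrf_kernel wrf_kernel.F90
```

Here -ta=tesla:cc70 selects the Volta (compute capability 7.0) target, and -Minfo=accel reports which loops the compiler offloaded.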
MeteoSwiss Operational COSMO NWP on GPUs

COSMO NWP configurations during 2020, with V100 GPUs:
- COSMO-1E (1 km): 8 per day, 33 hr forecast, 11-member ensemble
- COSMO-2E (2 km): 4 per day, 5 day forecast, 21-member ensemble
- Driving model: IFS from ECMWF, 4 per day, 18 km / 9 km (?)

MeteoSwiss roadmap:
- New V100 system in 2019: 18 nodes x 8 x V100 = 144 total GPUs
- New EPS configurations operational in 2020
- New ICON-LAM in ~2022 (pre-operational in 2020)
COSMO 1km Near-Global Atmosphere on GPUs
Large-scale COSMO HPC demonstration using ~5,000 GPUs
Source: https://www.geosci-model-dev-discuss.net/gmd-2017-230/
COSMO 1km Near-Global Atmosphere on GPUs
- Oliver Fuhrer, et al., MeteoSwiss
Strong scaling to 4,888 x P100 GPUs on Piz Daint (#6 Top500, 25.3 PetaFLOPS, 5,320 x P100 GPUs)
[Chart: strong scaling, higher is better; 19 km, 3.7 km, 1.9 km, and 0.93 km configurations on GPU vs. CPU, with GPU speedups of > 10x and ~5x shown]
Source: https://www.geosci-model-dev-discuss.net/gmd-2017-230/
NOAA FV3 and GPU Developments (Chen – 2019)
NOAA FV3 GPU strategy includes OpenACC and GridTools
- From presentation by Dr. Xi Chen, NOAA GFDL, PASC 19, June 2019, Zurich, CH
[Timeline annotations: X. Chen and others; C. Bretherton, O. Fuhrer; W. Putman and others]
2012: Early GPU development by NASA GSFC GMAO
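To illustrate the directive-based half of that strategy, here is a minimal OpenACC sketch of the common pattern for offloading a horizontal stencil loop (hypothetical subroutine, array, and loop names for illustration only; this is not FV3 source code):

```fortran
! Minimal OpenACC offload sketch of a horizontal stencil loop.
! Hypothetical names; not FV3 source code.
subroutine smooth_field(q, ni, nj)
  implicit none
  integer, intent(in)    :: ni, nj
  real(8), intent(inout) :: q(ni, nj)
  real(8) :: qnew(ni, nj)
  integer :: i, j
  !$acc data copy(q) create(qnew)
  !$acc parallel loop collapse(2)
  do j = 2, nj - 1
    do i = 2, ni - 1
      ! 5-point averaging stencil over interior points
      qnew(i, j) = 0.25d0 * (q(i-1, j) + q(i+1, j) + q(i, j-1) + q(i, j+1))
    end do
  end do
  !$acc parallel loop collapse(2)
  do j = 2, nj - 1
    do i = 2, ni - 1
      q(i, j) = qnew(i, j)   ! write back interior; halo untouched
    end do
  end do
  !$acc end data
end subroutine smooth_field
```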
2019 ORNL Hackathons and GPU Model Progress
https://www.olcf.ornl.gov/for-users/training/gpu-hackathons/

Location - Date | Organizations | Model(s) | Hackathon Project
KISTI (KR) - Feb | KISTI | MPAS | Physics (WSM6)
CAS (CN) - May | CMA | GRAPES | PRM advection
ETH Zurich - Jun | MCH, MPI-M, CSCS | ICON | Physics, radiation
MIT - Jun | MIT, CliMA | CliMA Ocean | Subgrid-scale LES
Princeton - Jun | NOAA GFDL | FV3/UFS | SWE mini-app kernels, UFS radiation package
NERSC - Jul | DOE LBNL | E3SM | MMF (ECP)
Met Office - Sep | Met Office, STFC | NEMOVAR, WW III | Miniapp (?)
ORNL - Oct | DOE ORNL, SNL | E3SM | SCREAM (Kokkos)
CliMA: New Climate Model Development
https://blogs.nvidia.com/blog/2019/07/17/clima-climate-model/
https://clima.caltech.edu/
[Figure: global model informed by observations and super-parameterization, with atmosphere and ocean components]
CliMA: New Climate Model Development
“Pushing the Envelope in Ocean Modeling”
- From keynote presentation by Dr. John Marshall, MIT, at Oxford AI Workshop, Sep 2019, Oxford, UK
[Figure: ocean LES super-parameterization]
https://clima.caltech.edu/
OpenACC GPU Development for LFRic Model

OpenACC collaboration with Met Office and STFC on the LFRic model:
- GungHo-MV (matrix-vector operations) OpenACC kernel developed by the Met Office
- NVIDIA optimizations applied to the OpenACC kernel achieved a 30x improvement over the original Met Office code
- Improved OpenACC code provided to STFC as a ‘target’ for PSyclone auto-generation

“Optimization of an OpenACC Weather Simulation Kernel” - A. Gray, NVIDIA
https://www.openacc.org/blog/optimizing-openacc-weather-simulation-kernel
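For a sense of the kernel's shape, here is a minimal OpenACC matrix-vector sketch in the spirit of GungHo-MV (hypothetical names and data layout; the actual LFRic kernel operates on its own finite-element data structures and is described in the blog post above):

```fortran
! Minimal OpenACC batched matrix-vector sketch in the spirit of
! GungHo-MV. Hypothetical names/layout; not the LFRic kernel itself.
subroutine matvec_batch(a, x, y, n, nbatch)
  implicit none
  integer, intent(in) :: n, nbatch
  real(8), intent(in)  :: a(n, n, nbatch), x(n, nbatch)
  real(8), intent(out) :: y(n, nbatch)
  integer :: b, i, j
  real(8) :: acc
  !$acc parallel loop collapse(2) copyin(a, x) copyout(y)
  do b = 1, nbatch
    do i = 1, n
      acc = 0.0d0
      !$acc loop seq
      do j = 1, n
        acc = acc + a(i, j, b) * x(j, b)   ! dot product for row i
      end do
      y(i, b) = acc
    end do
  end do
end subroutine matvec_batch
```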
ESCAPE Development of Weather & Climate Dwarfs
ESCAPE = Energy efficient SCalable Algorithms for weather Prediction at Exascale

NVIDIA-developed dwarf: spectral transform - spherical harmonics
- Batched Legendre transform (GEMM), variable length
- Batched 1D FFT, variable length
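To make the "batched, variable length" point concrete, here is a schematic Fortran/OpenACC sketch of the Legendre-transform batching pattern (hypothetical shapes and names; the production dwarf calls batched cuBLAS GEMM and batched cuFFT rather than hand-written loops):

```fortran
! Schematic of the batched, variable-length Legendre transform:
! one small matrix product per zonal wavenumber m, with the inner
! extent shrinking as m grows. Hypothetical names; illustrative only.
subroutine legendre_batches(p, spec, four, nlat, nspec, nfld, mmax)
  implicit none
  integer, intent(in) :: nlat, nspec, nfld, mmax
  real(8), intent(in)  :: p(nlat, nspec, 0:mmax)     ! Legendre polynomials
  real(8), intent(in)  :: spec(nspec, nfld, 0:mmax)  ! spectral coefficients
  real(8), intent(out) :: four(nlat, nfld, 0:mmax)   ! Fourier coefficients
  integer :: m, i, j, k, nm
  real(8) :: acc
  !$acc parallel loop gang copyin(p, spec) copyout(four)
  do m = 0, mmax
    nm = nspec - m          ! batch m has a variable inner length
    !$acc loop vector collapse(2)
    do k = 1, nfld
      do i = 1, nlat
        acc = 0.0d0
        do j = 1, nm
          acc = acc + p(i, j, m) * spec(j, k, m)
        end do
        four(i, k, m) = acc
      end do
    end do
  end do
end subroutine legendre_batches
```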
ECMWF IFS Dwarf Optimizations – Single-GPU
From “ECMWF Scalability Programme” - Dr. Peter Bauer, UM User Workshop, MetOffice, Exeter, UK, 15 June 2018
- Single V100 GPU improved the SH dwarf by 14x vs. original
- Single V100 GPU improved the MPDATA advection dwarf by 57x vs. original
[Chart: unoptimized vs. optimized dwarf timings; SH dwarf = 14x, advection dwarf = 57x]
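For background on what the MPDATA dwarf computes: its core building block is a donor-cell (upwind) advection pass, applied iteratively with antidiffusive corrections. A hedged one-dimensional sketch of that basic pass (illustrative only, not the ESCAPE dwarf source):

```fortran
! Hedged 1D sketch of the donor-cell (upwind) pass at the core of
! MPDATA-style advection; illustrative only, not ESCAPE dwarf source.
subroutine upwind_pass(psi, psinew, c, n)
  implicit none
  integer, intent(in) :: n
  real(8), intent(in)  :: psi(0:n+1)   ! advected field with halo cells
  real(8), intent(in)  :: c(0:n)       ! Courant numbers at cell faces
  real(8), intent(out) :: psinew(1:n)
  integer :: i
  real(8) :: fl, fr
  !$acc parallel loop copyin(psi, c) copyout(psinew)
  do i = 1, n
    ! donor-cell flux: take from the upwind side of each face
    fl = max(c(i-1), 0.0d0) * psi(i-1) + min(c(i-1), 0.0d0) * psi(i)
    fr = max(c(i),   0.0d0) * psi(i)   + min(c(i),   0.0d0) * psi(i+1)
    psinew(i) = psi(i) - (fr - fl)
  end do
end subroutine upwind_pass
```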
ECMWF IFS SH Dwarf Optimization – Multi-GPU
From “ECMWF Scalability Programme” - Dr. Peter Bauer, UM User Workshop, MetOffice, Exeter, UK, 15 June 2018
- Results of the spherical harmonics dwarf on NVIDIA DGX systems, 16 GPUs per single node
- Additional 2.4x gain from DGX-2 NVSwitch for 16-GPU systems
GPU UPDATE ON MPAS-A