SLIDE 1 MBE --- A GPU-based, fast, robust and precise solver for chemical ODEs
GTC 2016
San Jose, USA, 04-07 Apr. 2016
Fan Feng, Zifa Wang
NVIDIA Technology Center
SLIDE 2 Contents
- Motivation of MBE solver
- Introduction of MBE solver
- Parallelization and GPU implementation
- CPU vs. GPU numerical results
- Performance of the CPU vs. GPU
- Conclusions
SLIDE 3 Air Quality Simulations
Motivation of MBE solver
Statistical model
Kinetic model
- Operator splitting techniques
Reaction-Diffusion-Advection PDEs: chemical reactions (ODEs), diffusion, advection, etc.
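The operator splitting idea on this slide can be illustrated with a minimal sketch. It is not NAQPMS code: the 1-D periodic grid, the upwind transport step, the single decaying species and all function names are assumptions chosen for illustration. The point it shows is that after the transport sub-step, the chemistry sub-step is an independent ODE at every grid point, which is what makes the chemistry a natural target for one-thread-per-point parallelism.

```python
import math


def advect_upwind(c, courant):
    """One first-order upwind advection step on a periodic 1-D grid.

    courant = u * dt / dx, assumed to satisfy 0 <= courant <= 1.
    c[i - 1] with i == 0 wraps around, which gives the periodic boundary.
    """
    return [c[i] - courant * (c[i] - c[i - 1]) for i in range(len(c))]


def react_decay(c, k, dt):
    """Chemistry sub-step: dc/dt = -k * c solved independently at each point."""
    factor = math.exp(-k * dt)
    return [ci * factor for ci in c]


def split_step(c, courant, k, dt):
    """First-order (Lie) splitting: transport sub-step, then chemistry sub-step."""
    return react_decay(advect_upwind(c, courant), k, dt)
```

In this toy setting the transport step conserves total mass and the chemistry step scales it by exp(-k * dt), so the total after n steps is the initial total times exp(-k * n * dt).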
SLIDE 4 Motivation of MBE solver
Institute of Atmospheric Physics, Chinese Academy of Sciences (IAP/CAS) Nested Air Quality Prediction Modeling System (NAQPMS)
Old Chemical Solver --- LSODE
- Slow (>70% of NAQPMS run time is spent on the chemical ODEs)
- Simulation errors (e.g. the computation may fail because of an unsuccessful iteration procedure in LSODE)
Need a faster, more robust and more precise solver
SLIDE 5
Chemical equations ( m species ): each species has a production rate term and a loss rate term
Introduction of MBE solver
SLIDE 6
Numerical difficulty: the algorithm must conquer stiffness & maintain nonnegativity
Introduction of MBE solver
SLIDE 7
Modified-Backward-Euler Method (MBE):
Introduction of MBE solver
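The MBE formula itself did not survive in this transcript, so the following is not the authors' scheme. It is a minimal sketch of a textbook semi-implicit backward Euler step, assuming the chemistry can be written in production-loss form dc_i/dt = P_i - L_i * c_i with P_i, L_i >= 0 (an assumption based on the "Production rate" and "Loss rate" labels). It is shown only because it has the two properties the deck asks for: it needs no iteration, and it keeps concentrations nonnegative for any step size. All names are hypothetical.

```python
def semi_implicit_euler_step(c, production, loss, dt):
    """One step of c_new_i = (c_i + dt * P_i) / (1 + dt * L_i).

    production and loss are per-species rates evaluated at the old state.
    With c_i, P_i, L_i >= 0 the result is nonnegative for every dt > 0.
    """
    return [(ci + dt * pi) / (1.0 + dt * li)
            for ci, pi, li in zip(c, production, loss)]


def explicit_euler_step(c, production, loss, dt):
    """Forward Euler on the same system, for comparison only."""
    return [ci + dt * (pi - li * ci)
            for ci, pi, li in zip(c, production, loss)]
```

For a stiff toy case such as c = [1.0, 1.0], production = [2.0, 0.0], loss = [1000.0, 50.0] and dt = 0.1, the explicit step drives the first species negative, while the semi-implicit step stays positive and relaxes toward the steady state P/L.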
SLIDE 8
CBM-Z: a set of given chemical ODEs including 67 species (67 given functions)
Introduction of MBE solver
SLIDE 9
Introduction of MBE solver
Species of CBM-Z:
SLIDE 10
Introduction of MBE solver
MBE --- A fast, robust and precise solver for chemical ODEs
SLIDE 11
Parallelization and GPU Implementation
Spatial discretization: each thread handles one spatial point on an nx x ny x nz grid
Total spatial points = nx * ny * nz
SLIDE 12
- No iteration in MBE
- Almost the same amount of calculation for each spatial point
- Load balance
MBE --- A GPU-based solver
Parallelization and GPU Implementation
SLIDE 13
Parallelization and GPU Implementation
GPU Implementation
256 threads per block
Total number of blocks = (nx * ny * nz + 256 - 1) / 256 (integer division)
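The block count on this slide is a ceiling division of the number of spatial points by the 256 threads in a block. The sketch below works through that arithmetic in plain Python rather than in kernel code. The mapping from a global thread index back to (ix, iy, iz), with x varying fastest, is an assumption for illustration; the slides do not state the memory layout, and the function names are hypothetical.

```python
THREADS_PER_BLOCK = 256


def num_blocks(nx, ny, nz, threads_per_block=THREADS_PER_BLOCK):
    """Smallest number of blocks whose threads cover all nx * ny * nz points."""
    total_points = nx * ny * nz
    return (total_points + threads_per_block - 1) // threads_per_block


def point_for_thread(block_idx, thread_idx, nx, ny, nz,
                     threads_per_block=THREADS_PER_BLOCK):
    """Map a (block, thread) pair to a grid point, or None for padding threads.

    Assumes x varies fastest, then y, then z.
    """
    gid = block_idx * threads_per_block + thread_idx
    if gid >= nx * ny * nz:
        return None  # the last block may contain threads with no point to solve
    return gid % nx, (gid // nx) % ny, gid // (nx * ny)
```

For a 10 x 10 x 10 grid this gives 4 blocks (1024 threads for 1000 points), so the last 24 threads must do nothing, which is why the bounds check is needed. The 473600 nodes reported in the performance table divide evenly into 1850 blocks.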
SLIDE 14
Validation Check of GPU Implementation
[Figure: O3 concentration (PPB) vs. time (hours), CPU and GPU curves]
The two lines almost coincide with each other
CPU vs. GPU numerical results
SLIDE 15
[Figure panels: NO, NO2, SO2, H2O2]
CPU vs. GPU numerical results
SLIDE 16
[Figure panels: CO, O(1D), O(3P), H2SO4]
CPU vs. GPU numerical results
SLIDE 17
Device                                      Nodes Num.   Run Time (Sec)   Speedup (X)
CPU (1): Intel(R) Xeon E5-2690 @ 3.0 GHz    473600       24295.7          -
K40                                         473600       375.9            64.6
K80 (2)                                     473600       376.5            64.5
(1) In the test, only one core is used. We did not parallelize the CPU code.
(2) K80 has two GPU chips, and only one chip is used in this test.
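The speedup column can be reproduced from the run times as the single-core CPU time divided by the GPU time. The helper below is a hypothetical convenience function, not part of the benchmark code; the run times are the ones reported on this slide.

```python
def speedup(baseline_seconds, accelerated_seconds):
    """Speedup factor: baseline run time divided by accelerated run time."""
    return baseline_seconds / accelerated_seconds


CPU_SECONDS = 24295.7          # single CPU core
GPU_SECONDS = (375.9, 376.5)   # the two GPU runs in the table

# Rounded to one decimal, as in the table's Speedup (X) column.
speedups = [round(speedup(CPU_SECONDS, t), 1) for t in GPU_SECONDS]
```

Because the baseline uses only one core of the CPU, these factors compare a GPU against a serial code, not against a fully parallelized CPU run.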
Performance of the CPU vs. GPU
SLIDE 18
- MBE is a GPU-based, fast, robust and precise solver for chemical ODEs
- The GPU implementation of MBE is of high accuracy and computational efficiency
  – The numerical results of the GPU code are nearly the same as those of the CPU code
  – On K40, 64x speedup against the CPU code
  – The same speedup is also achieved with a single K80 chip
  – We expect to double the performance on K80 if both chips are used
- Better performance is expected with further optimization
Conclusions
SLIDE 19