Unified CPU+GPU Programming for the Production Weather Model ASUCA
Michel Müller
Research Assistant, Aoki Laboratory
michel@sim.gsic.titech.ac.jp
Supervised by
- Prof. Dr. Takayuki Aoki
Tokyo Institute of Technology
Image: Creative Commons, NASA Goddard Space Flight Center, 2010
Motivation Hybrid Fortran Results Outlook ➤ ➤ ➤
GPU unfriendly storage order
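The storage-order problem can be made concrete with a small index calculation. The sketch below is an illustration only: it assumes that the "KIJ" and "IJK" orders named in the test lists of this deck refer to Fortran-style (first index fastest) array layouts, and the function names and domain sizes are hypothetical. With KIJ the vertical index k is contiguous, which suits a CPU thread walking one column; with IJK neighbouring threads along i touch neighbouring memory, which is what a GPU prefers.

```python
# Hypothetical helpers: linear memory offsets for two storage orders.
# Assumption: first-listed index varies fastest (Fortran column-major).

def offset_kij(i, j, k, ni, nj, nk):
    """Offset of element (i, j, k) when k is the contiguous index."""
    return k + nk * (i + ni * j)

def offset_ijk(i, j, k, ni, nj, nk):
    """Offset of element (i, j, k) when i is the contiguous index."""
    return i + ni * (j + nj * k)

def stride_along_i(offset, ni, nj, nk):
    """Distance in memory between points (0,0,0) and (1,0,0)."""
    return offset(1, 0, 0, ni, nj, nk) - offset(0, 0, 0, ni, nj, nk)

def stride_along_k(offset, ni, nj, nk):
    """Distance in memory between points (0,0,0) and (0,0,1)."""
    return offset(0, 0, 1, ni, nj, nk) - offset(0, 0, 0, ni, nj, nk)

if __name__ == "__main__":
    ni, nj, nk = 4, 3, 2  # made-up domain size
    for name, fn in (("KIJ", offset_kij), ("IJK", offset_ijk)):
        print(name,
              "stride along i:", stride_along_i(fn, ni, nj, nk),
              "stride along k:", stride_along_k(fn, ni, nj, nk))
```

A stride of 1 along i means threads with consecutive i read consecutive addresses; a stride of 1 along k means a single thread looping over a column reads consecutive addresses.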
Figure: Hybrid Fortran build process (diagram)
- Files: h90 Fortran source + directives; xml callgraph + parsed directives; xml callgraph + parsed directives + loop analysis; F90 Fortran; storage_order.F90; MakeSettings; buildtools/Makefile; [project-dir]/Makefile; executable
- Steps: python 1, python 2, python 3, make
- Legend: hybrid file; python program; GNU Make; user defined input; file with CPU + GPU version
calculate_all_columns
sum_column
calculate_all_columns
sum_column
Porting status, mid 2014 (figure: ASUCA call graph, legend: ported / not ported)
Routines shown: diag long, physics long, physics diag long, Max/Min/Ave, rungekutta long, diag adjust long, physics adjust long, physics rk long, dynamics rk long, sediment, diagnose rk short, dynamics rk short, monitflux, radiation, convection, pbl/surface, microphys., RKshort, dtshort, makegrid_ideal, ideal, makegrid, prep
Tests passed:
- Rad on CPU, KIJ Order
- Rad on CPU, IJK Order
- Gabls3 on CPU, KIJ Order
- Gabls3 on CPU, IJK Order
- Warmbubble on CPU, KIJ Order
- Rad on GPU, KIJ Order
- Rad on GPU, IJK Order
- Gabls3 on GPU, KIJ Order
- Gabls3 on GPU, IJK Order
- Warmbubble on GPU, KIJ Order
Porting status, now (figure: ASUCA call graph, legend: ported / not ported)
Routines shown: diag long, physics long, physics diag long, Max/Min/Ave, rungekutta long, diag adjust long, physics adjust long, physics rk long, dynamics rk long, sediment, diagnose rk short, dynamics rk short, monitflux, radiation, convection, pbl/surface, microphys., RKshort, dtshort, makegrid_ideal, ideal, makegrid, prep
Tests passed:
- Rad on CPU, KIJ Order
- Rad on CPU, IJK Order
- Gabls3 on CPU, KIJ Order
- Gabls3 on CPU, IJK Order
- Warmbubble on CPU, KIJ Order
- Warmbubble on CPU, IJK Order
- Rad on GPU, KIJ Order
- Rad on GPU, IJK Order
- Gabls3 on GPU, KIJ Order
- Gabls3 on GPU, IJK Order
- Warmbubble on GPU, KIJ Order
- Warmbubble on GPU, IJK Order
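The passed-test list on the "Now" slide is the full cross product of three test cases, two devices and two storage orders. A minimal sketch of enumerating that matrix is shown here; the constant and function names are hypothetical and nothing about how ASUCA's tests are actually driven is implied.

```python
from itertools import product

# Labels taken from the slide; the names of these constants are illustrative.
CASES = ["Rad", "Gabls3", "Warmbubble"]
DEVICES = ["CPU", "GPU"]
ORDERS = ["KIJ", "IJK"]

def test_matrix():
    """Return every (case, device, storage order) combination as a label."""
    return [
        f"{case} on {device}, {order} Order"
        for device, case, order in product(DEVICES, CASES, ORDERS)
    ]

if __name__ == "__main__":
    for label in test_matrix():
        print(label)
```

Compared with this full matrix, the mid-2014 list lacks the two Warmbubble runs in IJK order.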
Hybrid ASUCA Dynamics
- OpenMP ASUCA Dynamics / OpenACC ASUCA Dynamics
- nRMS < 1E-9
- 112 kernels, ~10k LOC, OpenACC
- advection, HEVI, diagnose, Rayleigh damping
Hybrid ASUCA Physics
- OpenMP ASUCA Physics / CUDA Fortran ASUCA Physics
- nRMS < 1E-9
- 121 kernels, ~21k LOC, CUDA Fortran
- shortwave radiation, longwave radiation, planetary boundary layer, surface
Figure: Performance compared to reference code
Legend: kernel(s); inside of kernel; not affected by kernel
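The slides quote an nRMS bound of 1E-9 between the hybrid code and the reference but do not define nRMS. The sketch below assumes one common definition, the RMS of the difference divided by the RMS of the reference field; the actual normalisation used for ASUCA may differ, and the function names are hypothetical.

```python
import math

TOLERANCE = 1e-9  # bound quoted on the slide

def nrms(result, reference):
    """Normalised RMS error: RMS(result - reference) / RMS(reference).

    Assumed definition; the slides do not state the normalisation.
    """
    if len(result) != len(reference) or not reference:
        raise ValueError("fields must be non-empty and of equal length")
    diff_sq = sum((r - ref) ** 2 for r, ref in zip(result, reference))
    ref_sq = sum(ref ** 2 for ref in reference)
    if ref_sq == 0.0:
        raise ValueError("reference field is identically zero")
    return math.sqrt(diff_sq / ref_sq)

def passes(result, reference, tolerance=TOLERANCE):
    """True if the result agrees with the reference within the tolerance."""
    return nrms(result, reference) < tolerance

if __name__ == "__main__":
    reference = [1.0, 2.0, 3.0]
    print(passes([1.0, 2.0, 3.0 + 1e-12], reference))
    print(passes([1.0, 2.0, 3.1], reference))
```

Normalising by the reference makes the threshold independent of the physical units of each output field.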
Table: examples vs. features
Column headers: # kernels, analytical validation, validation directive, privatisation, hybrid parallelisation, reduction, stencil access, halo, direct kernel call, array declared in kernel, scalar param. from device arr., multi kernel routine, strides, array accessor func., || region in branch, early return, impl. scheme, local module data, foreign module data, pointer swap
Examples (number of kernels, number of check marks):
- getting started: 3 kernels, 4 ✓
- 5D vector: 1 kernel, 3 ✓
- simple stencil: 1 kernel, 4 ✓
- stencil w/ local array: 1 kernel, 5 ✓
- scalar passed in: 1 kernel, 5 ✓
- multi kernel routines: 4 kernels, 4 ✓
- strides: 2 kernels, 4 ✓
- accessor functions: 1 kernel, 4 ✓
- || branches: 2 kernels, 4 ✓
- early returns: 3 kernels, 5 ✓
- schemes: 4 kernels, 5 ✓
- module data: 10 kernels, 4 ✓
- 3D diffusion: 4 kernels, 3 ✓
- particle push: 1 kernel, 2 ✓
- midaco solver: 1 kernel, 3 ✓
- poisson FEM solver: 2 kernels, 4 ✓
Comparison: OpenACC, Hybrid Fortran now, Hybrid Fortran 2017
- coarse grained parallelism
  - OpenACC: manual conversion
  - Hybrid Fortran now: directive based conversion
  - Hybrid Fortran 2017: automatic conversion, centralized config
- GPU unfriendly storage order
  - OpenACC: manual conversion
  - Hybrid Fortran now: directive based conversion
  - Hybrid Fortran 2017: automatic conversion, centralized config
- data movement to/from device memory
  - OpenACC: 1 directive per data object and routine
  - Hybrid Fortran now: 1 directive per data object and routine
  - Hybrid Fortran 2017: 1 directive per data region
- separation device/host code
  - OpenACC: kernel / host code in same routine; reduced to single directive per kernel
  - Hybrid Fortran now: kernel / host code in same routine; reduced to single directive per kernel (an earlier build of this slide reads: kernels / host code must reside in separate routines)
  - Hybrid Fortran 2017: kernel / host code in same routine; reduced to single directive per kernel
michel@sim.gsic.titech.ac.jp