Petascale Computational Fluid Dynamics with Python on GPUs

slide-1
SLIDE 1

Petascale Computational Fluid Dynamics with Python on GPUs

F.D. Witherden, P.E. Vincent Department of Aeronautics Imperial College London

slide-2
SLIDE 2

Introduction

  • Computational fluid dynamics (CFD) is the bedrock of several high-tech industries.

  • Desire amongst practitioners to perform unsteady, scale-resolving simulations within the vicinity of complex geometries.

slide-3
SLIDE 3

Image courtesy of A.S. Ayer

slide-4
SLIDE 4

The Need for FLOP/s

  • From The Opportunities and Challenges of Exascale Computing, US DOE, fall 2010.
slide-5
SLIDE 5

RMAX != RPEAK

  • FLOP/s are great…
  • if you can get them.
  • Most commercial codes struggle to get ~10% of peak on CPUs.

slide-6
SLIDE 6

PyFR

  • A high-order compressible Navier-Stokes solver for unstructured grids.

  • Designed from the ground up to run on NVIDIA GPUs.

  • Written entirely in Python!
slide-7
SLIDE 7

The Py in PyFR

  • Leverages PyCUDA and mpi4py.
  • Makes extensive use of run-time code generation.
  • All compute performed on device.
  • Overhead from the Python interpreter < 1%.
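
As a rough illustration of run-time code generation, the sketch below builds the source of a small kernel as a string, with a constant operator baked in, and compiles it while the program is running. It is plain Python compiled with `exec` so that it runs anywhere; the slides indicate that PyFR generates GPU kernels and goes through PyCUDA instead. The names `make_kernel_source` and `apply_op` and the operator values are hypothetical and are not taken from PyFR.

```python
def make_kernel_source(op):
    """Return Python source for a function applying the constant matrix `op`.

    Zero entries are skipped at generation time, so the generated code only
    contains the multiplications that are actually needed.
    """
    lines = ["def apply_op(b):", "    return ["]
    for row in op:
        terms = [f"{a!r}*b[{j}]" for j, a in enumerate(row) if a != 0]
        lines.append("        " + (" + ".join(terms) or "0.0") + ",")
    lines.append("    ]")
    return "\n".join(lines)


# A small constant operator known only at run time (illustrative values)
op = [[1.0, 0.0, 0.5],
      [0.0, 2.0, 0.0]]

src = make_kernel_source(op)
namespace = {}
exec(src, namespace)             # compile the generated source
apply_op = namespace["apply_op"]

print(src)
print(apply_op([2.0, 3.0, 4.0]))
```

Because the operator is fixed when the source is generated, the specialised function contains no loops and no multiplications by zero; the cost of generating and compiling it is paid once, at start-up.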
slide-9
SLIDE 9

The FR in PyFR

  • Uses the flux reconstruction (FR) approach;
  • can recover well-known schemes including nodal Discontinuous Galerkin (DG) methods.

  • Lots of element-local structured compute.
slide-10
SLIDE 10

The FR in PyFR

  • Majority of operations are block-by-panel type matrix multiplications:

[Figure: schematic of the product C = A B, with dimensions labelled M, K and N]

  • where N ~ 10^5 and N ≫ (M, K).
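
To make these shapes concrete, the sketch below counts the arithmetic and the memory traffic of one product C = A B. It assumes that A is the small M × K operator, B is the K × N panel and C is M × N; that values are double precision (8 bytes); and that every entry of A and B is read once and every entry of C is written once. The values of M and K are illustrative only; N ~ 10^5 is the figure quoted on the slide.

```python
def matmul_cost(m, k, n, bytes_per_value=8):
    """Return (flops, bytes moved, flops per byte) for C = A B."""
    flops = 2 * m * k * n  # one multiply and one add per (i, j, l) triple
    traffic = bytes_per_value * (m * k + k * n + m * n)  # read A and B, write C
    return flops, traffic, flops / traffic


M, K, N = 24, 27, 10**5  # M and K are assumptions; N ~ 10^5 as on the slide
flops, traffic, intensity = matmul_cost(M, K, N)
print(f"{flops:.3e} FLOPs, {traffic:.3e} bytes, {intensity:.2f} FLOP/byte")
```

Since N ≫ (M, K), the contribution of A to the traffic is negligible and, under these assumptions, the FLOP-per-byte ratio tends to MK / (4 (M + K)), independent of N.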

slide-11
SLIDE 11

The FR in PyFR

  • In parallel, only simple halo exchanges are required between MPI ranks.
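
For readers unfamiliar with the term, a halo exchange can be sketched as follows. This is a serial, one-dimensional stand-in written in plain Python: each list plays the role of the data owned by one rank, and the two assignments in the loop stand where a paired send and receive between neighbouring ranks would be in an MPI code. The function name and data layout are hypothetical; the slides do not show how PyFR organises its exchanges.

```python
def exchange_halos(subdomains):
    """Fill the ghost cells of each subdomain from its neighbours.

    Each subdomain is a list [left ghost, interior..., right ghost]. In an
    MPI code each list would live on a different rank and the two copies
    below would be a paired send and receive.
    """
    for left, right in zip(subdomains, subdomains[1:]):
        left[-1] = right[1]   # right ghost <- neighbour's first interior value
        right[0] = left[-2]   # left ghost  <- neighbour's last interior value


# Three "ranks", each owning two interior values; None marks an unfilled ghost
ranks = [[None, 1.0, 2.0, None],
         [None, 3.0, 4.0, None],
         [None, 5.0, 6.0, None]]
exchange_halos(ranks)
print(ranks)
```

Only the values on the interface between neighbours are copied; the interior data never moves, which is why the communication stays small relative to the element-local compute.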

slide-12
SLIDE 12

The FR in PyFR

  • FR is a great fit for modern hardware.
  • Previous GTC talks have outlined the key tenets of an efficient multi-GPU capable implementation:

  • GTC 2014 - PyFR: Technical Challenges of Bringing Next Generation Fluid Dynamics to GPUs

  • GTC 2015 - GiMMiK: Generating Bespoke Matrix Multiplication Kernels
slide-13
SLIDE 13

PyFR Scaling

  • Evaluated on the Piz Daint cluster at CSCS.
  • Test case is a NACA 0021 aerofoil at a high angle of attack.

Animation courtesy of J.S. Park
slide-14
SLIDE 14

PyFR Strong Scaling

[Figure: percentage of peak FLOP/s against number of K20X GPUs, from 50 to 400]

slide-15
SLIDE 15

PyFR Weak Scaling

[Figure: percentage of peak FLOP/s against number of K20X GPUs, up to 2000]

1.31 PFLOP/s
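
The headline figure can be put on a per-device footing with a little arithmetic. The sketch below assumes that the 1.31 PFLOP/s annotation belongs to the largest run on the axis, 2000 K20X GPUs; the transcript does not state this explicitly. The helper `percent_of_peak` is hypothetical, and the theoretical peak per device is deliberately left as a parameter because the slides do not say which precision the figure refers to.

```python
sustained = 1.31e15   # FLOP/s, from the weak scaling slide
n_gpus = 2000         # assumed: the largest GPU count on the weak scaling axis

per_gpu = sustained / n_gpus
print(f"{per_gpu / 1e9:.0f} GFLOP/s per GPU")


def percent_of_peak(sustained, n_devices, peak_per_device):
    """Sustained rate as a percentage of the aggregate theoretical peak."""
    return 100.0 * sustained / (n_devices * peak_per_device)
```

Calling `percent_of_peak(sustained, n_gpus, peak)` with the vendor's peak figure for the relevant precision gives the quantity plotted on the vertical axis.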

slide-16
SLIDE 16

So The Solver Scales

  • There’s a lot more to a code than just the solver…
  • and it all needs to scale.
slide-17
SLIDE 17

Traditional Visualisation

  • Traditional visualisation pipeline with PyFR:
slide-19
SLIDE 19

Traditional Visualisation

  • Disk I/O…
  • like device↔host transfers, only slower
  • …much slower!

[Figure: bar chart of bandwidth in MiB/s (axis from 0 to 7000) for device↔host transfers and disk]

slide-21
SLIDE 21

In-situ Visualisation

  • Cut out the middle men…
  • Using ParaView Catalyst it is possible to avoid disk I/O…
slide-22
SLIDE 22

In-situ Visualisation

  • Pipeline with Catalyst…
  • majority of processing performed on the host with VTK.

[Figure: pipeline from solution to triangle list]

slide-23
SLIDE 23

In-situ Visualisation

  • Can we do better?
  • Yes!
slide-24
SLIDE 24

In-situ Visualisation

  • Interface with PyFR using the plugin infrastructure.

[Figure: C++ shared library, CUDA pointer, PyFR plugin]
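
One way to picture this interface is a plugin object that is called during the simulation and forwards the address of the solution to a function in a shared library. The sketch below is an assumption-laden stand-in, not PyFR's plugin API: the class `InSituPlugin`, its call signature and the function `fake_process` are all hypothetical, a host buffer takes the place of device memory, and a Python function takes the place of the C++ library so that the example runs without a GPU.

```python
import ctypes


class InSituPlugin:
    """Hypothetical plugin that hands a solution pointer to a library call."""

    def __init__(self, process, nsteps):
        self.process = process   # stand-in for a function from a shared library
        self.nsteps = nsteps     # invoke the library every nsteps time steps

    def __call__(self, step, addr, nbytes):
        if step % self.nsteps == 0:
            # Wrap the raw address so it crosses the boundary as a void*
            self.process(ctypes.c_void_p(addr), ctypes.c_size_t(nbytes))


calls = []


def fake_process(ptr, nbytes):
    # A real library would read the solution through the pointer
    calls.append((ptr.value, nbytes.value))


buf = ctypes.create_string_buffer(64)   # host memory standing in for device memory
plugin = InSituPlugin(fake_process, nsteps=10)
for step in range(25):
    plugin(step, ctypes.addressof(buf), len(buf))
print(len(calls))
```

Only an address and a size cross the language boundary, so no solution data is copied. With a real library, `fake_process` would be replaced by a function obtained through `ctypes.CDLL`.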

slide-25
SLIDE 25

In-situ Visualisation

  • Pipeline with Catalyst and VTK-m…
  • all compute performed on the device.

[Figure: pipeline from solution to triangle list]

slide-27
SLIDE 27

In-situ Visualisation

  • Kitware: Utkarsh Ayachit, T.J. Corona, David DeMarle, Berk Geveci, Robert Maynard, Robert O’Bara, Patrick O’Leary
  • NVIDIA: Bhushan Desam, Tom Fogal, Peter Messmer, Jeremy Purches
  • Imperial College: Arvind Iyer, Jin Seok Park, Brian Vermeire
  • ORNL: Jack Wells
  • Zenotech: Mark Allan, Jamil Appa, Andrei Cimpoeru, David Standingford
slide-28
SLIDE 28

In-situ Visualisation

Animation courtesy of A.S. Ayer

slide-30
SLIDE 30

Summary

  • Funded and supported by
  • Any questions?
  • E-mail: freddie.witherden08@imperial.ac.uk
  • Website: http://pyfr.org