Burning on the GPU: Fast and Accurate Chemical Kinetics


SLIDE 1

LLNL-PRES-687782

This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. Lawrence Livermore National Security, LLC

Burning on the GPU: Fast and Accurate Chemical Kinetics

GPU Technology Conference
Russell Whitesides

April 7, 2016

Session 6195

Funded by: U.S. Department of Energy Vehicle Technologies Program Program Manager: Gurpreet Singh & Leo Breton

SLIDE 2

Why?

To make it go faster?

SLIDE 3

Why?

We burn a lot of gasoline.

  • Transportation efficiency
  • Chemistry is vital to predictive simulations
  • Chemistry can be > 90% of simulation time.
SLIDE 4

National lab compute power and industry need.

  • Supercomputing @ DOE labs: strong investment in GPUs with an eye towards exascale
  • OEM engine designers: require fast turnaround on desktop-class hardware

Why?

SLIDE 5

“Colorful Fluid Dynamics”

[Figure: engine simulation snapshots of O2 mass fraction (Y_O2) and temperature]

“Typical” engine simulation w/ detailed chemistry

SLIDE 6

Detailed Chemistry in Reacting Flow CFD:

Operator splitting technique: each cell is treated as an isolated system for chemistry. Solve an independent set of ordinary differential equations (ODEs) in each cell to calculate chemical source terms for the species and energy advection/diffusion equations.

[Figure: each cell advanced independently from t to t + ∆t]
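The operator-splitting loop described on this slide can be sketched as follows. This is a toy Python sketch, not the talk's code: `integrate_chemistry` and `advect_diffuse` are hypothetical stand-ins for the real stiff-ODE and transport solvers.

```python
def advance_cfd_step(cells, dt, integrate_chemistry, advect_diffuse):
    """One CFD step with operator splitting: chemistry first, then transport.

    `integrate_chemistry` integrates the stiff ODE system of one isolated
    cell over dt; `advect_diffuse` couples the cells through species and
    energy advection/diffusion. Both are stand-ins for the real solvers.
    """
    # Chemistry: each cell is an independent ODE system over [t, t + dt].
    cells = [integrate_chemistry(cell, dt) for cell in cells]
    # Transport: advection/diffusion couples the cells back together.
    return advect_diffuse(cells, dt)
```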

SLIDE 7

CPU (un-coupled) chemistry integration

Each cell is treated as an isolated system for chemistry.

[Figure: cells integrated one at a time from t to t + ∆t]

SLIDE 8

GPU (batched) chemistry integration

On the GPU we solve chemistry in batches of cells simultaneously.

[Figure: batches of cells integrated together from t to t + ∆t]

SLIDE 9

See also Whitesides & McNenly, GTC 2015; McNenly & Whitesides, GTC 2014

Previously at GTC:

SLIDE 10

n_gpu = 0;

Note: most CFD simulations are done on distributed memory systems

[Figure: eight MPI ranks (rank0–rank7), one CPU core each]

SLIDE 11

++n_gpu; //now what?

Note: most CFD simulations are done on distributed memory systems

[Figure: eight MPI ranks (rank0–rank7), one CPU core each]

SLIDE 12

Here CPU is a single core.

Ideal CPU-GPU Work-sharing

S_GPU = walltime(CPU) / walltime(GPU)

SLIDE 13

Let’s make use of the whole machine.

Ideal CPU-GPU Work-sharing

§ # CPU cores = N_CPU
§ # GPU devices = N_GPU

S_GPU = walltime(CPU) / walltime(GPU)

S_total = (N_CPU + N_GPU (S_GPU − 1)) / N_CPU

[Figure: S_total vs. N_GPU (1–4) for S_GPU = 8 and N_CPU = 4, 8, 16, 32; marked points: TITAN (1.4375) and surface (1.8750)]
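The ideal work-sharing speedup formula above can be checked numerically against the points marked on the plot. A minimal sketch (function name is mine; the reading of the −1 term as the CPU core spent driving each GPU is my interpretation):

```python
def ideal_total_speedup(n_cpu, n_gpu, s_gpu):
    """Ideal whole-machine speedup over n_cpu CPU cores alone.

    S_total = (N_CPU + N_GPU * (S_GPU - 1)) / N_CPU, where s_gpu is the
    speedup of one GPU over one CPU core. The (s_gpu - 1) term reflects
    (my reading) that one core per GPU is occupied driving it.
    """
    return (n_cpu + n_gpu * (s_gpu - 1)) / n_cpu

# Points from the slide's plot, both with S_GPU = 8 and 16 CPU cores:
titan = ideal_total_speedup(16, 1, 8.0)    # 1 GPU per node  -> 1.4375
surface = ideal_total_speedup(16, 2, 8.0)  # 2 GPUs per node -> 1.8750
```

The same formula reproduces the first engine attempt later in the talk: 16 cores, 2 K40m GPUs, and S_GPU = 2.6 give S_total ≈ 1.20.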

SLIDE 14

Distribute based on number of cells and give more to GPU.

Good performance in simple case with both CPU and GPU doing work

[Figure: chemistry time (seconds, log scale, 100–10000) vs. number of processors (1–16); curves: CPU chemistry, GPU chemistry (std work sharing)]

SLIDE 15

Distribute based on number of cells and give more to GPU.

Good performance in simple case with both CPU and GPU doing work

[Figure: chemistry time (seconds, log scale, 100–10000) vs. number of processors (1–16); curves: CPU chemistry, GPU chemistry (std work sharing), GPU chemistry (custom work sharing)]

S_GPU = 7; S_total = 1.7 (S_GPU = 6.6)

SLIDE 16

Let’s go!

First attempt @ engine calculation on GPU+CPU

SLIDE 17

What happened?

First attempt @ engine calculation on GPU+CPU

§ 2x Xeon E5-2670 (16 cores) => 21.2 hours
§ 2x Xeon E5-2670 + 2 Tesla K40m => 17.6 hours (S_GPU = 2.6)
§ S_total = 21.2/17.6 = 1.20

SLIDE 18

Integrator performance when doing batch solution

If the systems are not similar, how much extra work needs to be done?


SLIDE 19

Batches of dissimilar reactors will suffer from excessive extra steps

What penalty do we pay when batching?

SLIDE 20

Batches of dissimilar reactors will suffer from excessive extra steps

What penalty do we pay when batching?

SLIDE 21

Batches of dissimilar reactors will suffer from excessive extra steps

Possibly a lot of extra steps.
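One way to see that penalty is a toy model (mine, not the talk's) in which a lock-step batch must keep stepping until its slowest reactor converges:

```python
def batched_step_count(steps_per_reactor):
    """Steps the whole batch takes: every reactor rides along until the
    slowest one converges (toy model of lock-step batching)."""
    return max(steps_per_reactor)

def wasted_steps(steps_per_reactor):
    """Extra reactor-steps paid versus solving each reactor alone."""
    worst = batched_step_count(steps_per_reactor)
    return sum(worst - s for s in steps_per_reactor)

# A dissimilar batch: one igniting cell forcing 100 steps makes three
# quiescent cells take 100 steps each as well.
# wasted_steps([1, 2, 3, 100]) -> 294 extra reactor-steps
```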

SLIDE 22

Sort reactors by how many steps they took to solve on the last CFD step

Easy as pie?

[Figure: reactors sorted by n_steps (1 to >100) and split into batch0–batch3]
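That sorting step can be sketched as below. This is my own minimal version (names hypothetical); the real code must also handle the distributed-memory bookkeeping the next slide describes.

```python
def sort_into_batches(n_steps_last, n_batches):
    """Group reactor indices by the integrator step counts they needed on
    the last CFD step, so each batch holds reactors of similar stiffness.

    Returns n_batches lists of reactor indices, cheapest reactors first.
    """
    order = sorted(range(len(n_steps_last)), key=lambda i: n_steps_last[i])
    size = -(-len(order) // n_batches)  # ceiling division
    return [order[b * size:(b + 1) * size] for b in range(n_batches)]

# Reactors that took [3, 120, 7, 5, 90, 2] steps last time, 3 batches:
# -> [[5, 0], [3, 2], [4, 1]]  (cheap, medium, expensive)
```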

SLIDE 23

Have to manage the sorting and load-balancing in a distributed memory system

Not so fast.

[Figure: eight MPI ranks (rank0–rank7) exchanging reactors]

SLIDE 24

Load balance based on expected cost and expected performance.

MPI communication to re-balance for chemistry.

[Figure: reactors redistributed across the eight MPI ranks]
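Balancing on expected cost and expected performance can be sketched as a greedy assignment (my own illustration with hypothetical names; the real implementation redistributes the work with MPI communication):

```python
import heapq

def balance_reactors(costs, rank_speeds):
    """Assign reactors (expected costs) to ranks so predicted finish
    times stay even; faster ranks (e.g. GPU-backed) receive more work.

    Greedy heuristic: always hand the next-most-expensive reactor to the
    rank with the lowest predicted finish time so far.
    """
    heap = [(0.0, r) for r in range(len(rank_speeds))]  # (finish time, rank)
    heapq.heapify(heap)
    assignment = [[] for _ in rank_speeds]
    for i in sorted(range(len(costs)), key=lambda i: -costs[i]):
        t, r = heapq.heappop(heap)
        assignment[r].append(i)
        heapq.heappush(heap, (t + costs[i] / rank_speeds[r], r))
    return assignment
```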

SLIDE 25

Let’s go again!

Second attempt @ engine calculation on GPU+CPU

SLIDE 26

How much difference does it make?

Total steps significantly reduced by batching appropriately

SLIDE 27

Engine results with improved work-sharing and reactor sorting

[Figure: runtimes of 13.0 hrs, 9.1 hrs, and 7.6 hrs]

~40% reduction in chemistry time; ~36% reduction in overall time (S_total = 1.7, S_GPU = 6.6)

SLIDE 28

§ Improve S_GPU

  • Derivative kernels
  • Matrix operations

§ Extrapolative integration methods

  • Less “startup” cost when re-initializing
  • Potentially well suited for GPU

§ Non-chemistry calc’s on GPU

  • Multi-species transport
  • Particle spray

Future directions

Possibilities for significant further improvements.

SLIDE 29

§ Much improved CFD chemistry work-sharing with GPU

§ ~40% reduction in chemistry time for real engine case (~36% total time)

§ Working on further improvement

Summary

Thank you!
