Practical Combustion Kinetics with CUDA - GPU Technology Conference - PowerPoint PPT Presentation



SLIDE 1

LLNL-PRES-668639

This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. Lawrence Livermore National Security, LLC

Practical Combustion Kinetics with CUDA

GPU Technology Conference Russell Whitesides & Matthew McNenly

March 20, 2015

Session S5468

Funded by: U.S. Department of Energy Vehicle Technologies Program Program Manager: Gurpreet Singh & Leo Breton

SLIDE 2

Collaborators

Good guys to work with.

  • Cummins Inc.
  • Convergent Science
  • NVIDIA
  • Indiana University

SLIDE 3

The big question.

(Figure) Does ___ plus ___ equal ___?

SLIDE 4

Lots of smaller questions:

There won’t be a quiz at the end.

  • What has already been done in this area?
  • How are we approaching the problem?
  • What have we accomplished?
  • What’s left to do?


SLIDE 5

Why? NVIDIA GPUs / CUDA Toolkit

More FLOP/s, more GB/s, faster growth in both.

(Figure) Data from NVIDIA's CUDA C Programming Guide, Version 6.0, 2014.

SLIDE 6

Approach also used to simulate gas turbines, burners, flames, etc.

  • Reacting flow simulation
  • Computational Fluid Dynamics (CFD)
  • Detailed chemical kinetics
  • Tracking 10s to 1000s of species
  • ConvergeCFD (internal combustion engines)
SLIDE 7

What has been done already in combustion kinetics on GPUs?

A few groups are working (publicly) on this, and some progress has been made. Recent review by Niemeyer & Sung [1]:

  • Spafford, Sankaran & co-workers (ORNL) (first published 2010)
  • Shi, Green & co-workers (MIT)
  • Stone (CS&E LLC)
  • Niemeyer & Sung (CWRU/OSU, UConn)

Most approaches use explicit or semi-implicit Runge-Kutta techniques. Some only use the GPU for the derivative calculation. From [1]: "Furthermore, no practical demonstration of a GPU chemistry solver capable of handling stiff chemistry has yet been made. This is one area where efforts need to be focused."

[1] K.E. Niemeyer, C.-J. Sung, Recent progress and challenges in exploiting graphics processors in computational fluid dynamics, J. Supercomput. 67 (2014) 528–564. doi:10.1007/s11227-013-1015-7.
SLIDE 8

Problem: Can’t directly port CPU chemistry algorithms to GPU

For chemistry it’s not as simple as adding new hardware.

  • GPUs need dense data and lots of it.
  • Large chemical mechanisms are sparse.
  • Small chemical mechanisms don’t have enough data.

(Even large mechanisms aren’t large in a GPU context.)

Solution: Re-frame many uncoupled reactor calculations into a single system of coupled reactors.

SLIDE 9

How do we solve chemistry on the CPU?

Example: engine simulation in Converge CFD. (Figure panels: Y_O2, temperature.)

SLIDE 10


SLIDE 11

Detailed Chemistry in Reacting Flow CFD:

Each cell is treated as an isolated system for chemistry. Operator splitting technique: solve an independent initial value problem in each cell (or zone) to calculate chemical source terms for the species and energy advection/diffusion equations.
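The split step can be sketched in a few lines. This is a toy illustration, not the actual solver: the "chemistry" is a hypothetical single-species decay solved with backward Euler (implicit, as stiffness demands), followed by an explicit diffusion step standing in for transport.

```python
import numpy as np

K_DECAY = 50.0  # hypothetical rate constant; stands in for a stiff mechanism

def advance_chemistry(conc, dt):
    """Solve the per-cell chemistry IVP over dt with backward Euler.

    Each cell is independent, so this maps over the whole array at once.
    For dC/dt = -k*C, backward Euler gives C_new = C / (1 + k*dt).
    """
    return conc / (1.0 + K_DECAY * dt)

def diffuse(conc, dt, d=0.1):
    """Explicit diffusion step on a periodic 1-D grid of cells."""
    lap = np.roll(conc, 1) + np.roll(conc, -1) - 2.0 * conc
    return conc + d * dt * lap

def operator_split_step(conc, dt):
    """One CFD step: chemistry first, then transport, with the
    chemical source frozen during the transport sub-step."""
    return diffuse(advance_chemistry(conc, dt), dt)

c = np.ones(8)                          # uniform initial concentrations
c = operator_split_step(c, dt=0.02)     # one split step
```

The key point the slide makes is that `advance_chemistry` sees each cell as a standalone initial value problem; only the transport step couples neighbors.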

SLIDE 12


SLIDE 13


SLIDE 14

CPU (un-coupled) chemistry integration

Each cell is treated as an isolated system for chemistry. (Figure: independent integrations from t to t+Δt.)

SLIDE 15

GPU (coupled) chemistry integration

For the GPU we solve chemistry simultaneously in large groups of cells. (Figure: one coupled integration from t to t+Δt.)

SLIDE 16

What about variations in practical engine CFD?

If the systems are not similar, how much extra work needs to be done?

SLIDE 17

What are the equations we’re trying to solve?

Significant effort to transform the fastest CPU algorithms into GPU-appropriate versions.

Derivative equations (vector calculations), for a perfectly stirred reactor:

$$\frac{dy_i}{dt} = \frac{w_i}{\rho}\,\frac{dC_i}{dt}$$

$$\frac{dT}{dt} = -\frac{RT}{\rho c_v}\sum_{i\,\in\,\text{species}} u_i\,\frac{dC_i}{dt}$$

The derivative represents the system of equations to be solved.

Jacobian matrix solution ($A = LU$):

  • Matrix solution required due to stiffness
  • Matrix storage in dense or sparse formats

(Figure: dense vs. sparse matrix structure.)

SLIDE 18

We want to solve many of these simultaneously

Not as easy as copy and paste.

SLIDE 19

Example: Species production rates

Major component of the derivative; lots of sparse operations.

Chemical reaction step rate coefficients:

Arrhenius rates:
$$k_i = A_i T^{n_i} e^{-E_{A,i}/RT}$$

Equilibrium reverse rates:
$$k_i = \frac{k_{i,f}}{K_{eq}} = k_{i,f}\,\exp\!\left(\sum_{j\,\in\,\text{prod}} \frac{G_j}{RT} - \sum_{j\,\in\,\text{reac}} \frac{G_j}{RT}\right)$$

Third-body enhanced rates:
$$k_i' = k_i \sum_{j\,\in\,\text{species}} \alpha_j C_j$$

Fall-off rates:
$$k_i' = k_i\,(\ldots)$$

Chemical reaction rates of progress:
$$R_i = k_i \prod_{j\,\in\,\text{species}} C_j^{\nu_{ij}}$$

Net rates of production:
$$\frac{dC_i}{dt} = \sum_{j\,\in\,\text{create}} R_j - \sum_{j\,\in\,\text{destroy}} R_j$$
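The rate expressions compose straightforwardly. A minimal sketch, assuming a hypothetical two-reaction, three-species mechanism (A → B, B → C) with made-up rate parameters; the stoichiometric matrices carry the sparse species connectivity discussed on the next slide:

```python
import numpy as np

R = 8.314  # gas constant, J/(mol*K)

def arrhenius(a, n, ea, temp):
    """k_i = A_i * T^n_i * exp(-E_A,i / (R*T))."""
    return a * temp**n * np.exp(-ea / (R * temp))

# Hypothetical toy mechanism: reaction 0 is A -> B, reaction 1 is B -> C.
nu_reac = np.array([[1, 0, 0],   # reactant stoichiometry (reaction x species)
                    [0, 1, 0]])
nu_prod = np.array([[0, 1, 0],   # product stoichiometry
                    [0, 0, 1]])

def net_production(conc, k):
    """R_i = k_i * prod_j C_j^nu_ij; dC/dt = creation minus destruction."""
    rop = k * np.prod(conc**nu_reac, axis=1)   # rates of progress per reaction
    return (nu_prod - nu_reac).T @ rop          # net production per species

# Made-up Arrhenius parameters (zero activation energy for simplicity):
k = arrhenius(a=np.array([1e3, 1e2]), n=np.array([0.0, 0.0]),
              ea=np.array([0.0, 0.0]), temp=1000.0)
dcdt = net_production(np.array([1.0, 2.0, 0.5]), k)   # [-1000, 800, 200]
```

Note that `nu_reac`/`nu_prod` are mostly zeros even for two reactions; for real mechanisms with hundreds of species this sparsity is exactly what hurts memory locality on the GPU.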

SLIDE 20

Example: Species production rates

Major component of derivative; Lots of sparse operations.


  • Chemical species connectivity
  • Generally sparsely connected
  • Leads to poor memory locality
  • Bad for GPU performance
SLIDE 21

Example: Species production rates

Approach: couple together reactors (or cells) and make smart use of GPU memory. Each column holds the data for a single reactor (cell); each row holds one data element across all reactors. The data are then arranged for coalesced access.
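The column-per-reactor layout is the classic array-of-structures to structure-of-arrays transpose. A NumPy sketch with toy sizes (the GPU analogue is that adjacent threads, one per reactor, then touch adjacent memory):

```python
import numpy as np

n_species, n_reactors = 4, 6   # toy sizes for illustration

# Reactor-major ("array of structures"): each ROW is one reactor's state.
aos = np.arange(n_species * n_reactors, dtype=float).reshape(n_reactors, n_species)

# Species-major ("structure of arrays"): each COLUMN is one reactor,
# each row is one data element for all reactors -- the slide's layout.
soa = aos.T.copy()

# One row operation now updates the same species across ALL reactors at
# once, which is the coalesced access pattern the slide describes.
soa[0, :] *= 2.0

assert soa.shape == (n_species, n_reactors)
```

On the GPU the same transpose means a warp of 32 threads reading species `i` for reactors `0..31` issues one contiguous memory transaction instead of 32 strided ones.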

SLIDE 22

Benchmarking Platforms

The CPU and GPU used both matter.

  • Big Red 2:
    • AMD Opteron Interlagos (16 core)
    • 1x Tesla K20
  • Surface (not pictured):
    • Intel Xeon E5-2670 (16 core)
    • 2x Tesla K40m

SLIDE 23

dC_i/dt: Significant speedup achieved for species production rates. (Big Red 2)

(Figure: speedup for 128, 256, 512, 1024, and 2048 simultaneous net production rate calculations.)

SLIDE 24

dC_i/dt: Less speedup than Big Red 2 because the CPU is faster. (Surface)

(Figure: speedup for 128, 256, 512, 1024, and 2048 simultaneous net production rate calculations.)

SLIDE 25

Need to put the rest of the calculations on the GPU.

We have implemented or borrowed algorithms for the rest of the chemistry integration: the derivative equations (vector calculations) and the Jacobian matrix solution (LU factorization, dense or sparse).

SLIDE 26

Need to put the rest of the calculations on the GPU.

Apart from dC_i/dt, the derivative is straightforward on the GPU.

SLIDE 27

Need to put the rest of the calculations on the GPU.

We are able to use NVIDIA-developed algorithms to perform matrix operations on the GPU.

SLIDE 28

Matrix Solution Methods

Dense:

  • CPU: LAPACK
    • dgetrf
    • dgetrs
  • GPU: CUBLAS
    • dgetrfBatched
    • dgetriBatched
    • batched matrix-vector multiplication

Sparse:

  • CPU: SuperLU
    • dgstrf
    • dgstrs
  • GPU: GLU (soon cusolverSP (CUDA 7.0))
    • LU refactorization (SuperLU for first factor)
    • LU solve
    • Conglomerate matrix (CUDA < 6.5)
    • Batched matrices (CUDA >= 6.5) (2-4x faster)
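The batched routines exist because each chemistry Jacobian is small; the win comes from factoring thousands of them at once. A NumPy stand-in for the batched dense solve, with toy well-conditioned random "Jacobians" rather than real chemistry matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
n_reactors, n = 2048, 10   # many small, Jacobian-sized systems (toy sizes)

# One (n x n) matrix per reactor, stacked along a batch axis -- the same
# layout the cuBLAS batched routines (e.g. cublasDgetrfBatched) operate on.
jac = rng.standard_normal((n_reactors, n, n)) + n * np.eye(n)
rhs = rng.standard_normal((n_reactors, n, 1))

# NumPy broadcasts the solve over the leading batch dimension, factoring
# each small matrix independently -- a CPU analogue of a batched GPU LU.
x = np.linalg.solve(jac, rhs)

residual = jac @ x - rhs   # should be near machine epsilon for every reactor
```

One big batched call amortizes launch overhead and keeps the GPU busy, which a loop of 2048 tiny individual factorizations cannot do.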

SLIDE 29

Test case for full chemistry integration

Ignition delay time calculation (i.e. shock tube simulation):

  • 256-2048 constant volume reactor calculations
  • No coupling to CFD
  • Comparing CPU and GPU with both dense and sparse matrix operations

This provides a gauge of what the ideal speedup will be in CFD simulations.

SLIDE 30

0D, Uncoupled, Ideal Case: Max speedup (Big Red 2)

As with dC_i/dt, the best speedup is for a large number of reactors.

(Figure: speedup (CPU time/GPU time) vs. number of species (10, 32, 48, 79, 94, 111, 160) for 256-2048 reactors; comparing CPU dense vs. GPU dense, CPU sparse vs. GPU dense, and CPU sparse vs. GPU sparse.)

SLIDE 31

Synchronization Penalty Test Case: testing the effect of non-identical reactors.

  • Converge CFD
  • Rectilinear volume (16x8x8 mm)
  • Initial conditions:
    • Variable gradients in temperature (T) and equivalence ratio (φ)
    • Uniform zero velocity
    • Uniform pressure (20 bar)
  • Boundary conditions:
    • No flux for all variables
  • ~50 CFD steps capturing complete fuel conversion
  • Every-cell chemistry (2048 cells, 1 CPU core, 1 GPU device)
  • 7 kinetic mechanisms from 10-160 species
  • Solved with both sparse and dense matrix algorithms

SLIDE 32

We compared the total chemistry cost for sequential auto-ignition in a constant volume chamber.

Initial conditions: increasing temperature, increasing equivalence ratio.

(Figure: pressure (MPa, 1.5-5.5) vs. time (µs) traces showing sequential ignition.)

SLIDE 33

We compared the total chemistry cost for sequential auto-ignition in a constant volume chamber.

Initial conditions: increasing temperature, increasing equivalence ratio.

Condition | T spread (K) | φ spread
Grad0     | 1450         | 1.0
Grad1     | 1400-1450    | 0.95-1.05
Grad2     | 1350-1450    | 0.90-1.10
Grad3     | 1250-1450    | 0.80-1.20

SLIDE 34

Converge GPU: Sequential Auto-ignition (Big Red 2)

Even in the non-ideal case we find significant speedup.

(Figure: speedup (CPU time/GPU time) vs. number of species (10-160) for grad0-grad3; comparing CPU sparse vs. GPU dense, CPU dense vs. GPU dense, and CPU sparse vs. GPU sparse.)

SLIDE 35

What’s the speedup on a “real” problem?

Finally ready to run an engine simulation on the GPU. (Figure panels: Y_O2, temperature.)

Compared the cost of every-cell chemistry from -20 to 15 CAD. 24 nodes of Big Red 2: 24 CPU cores vs. 24 GPU devices. Should be close to the worst-case scenario w.r.t. the synchronization penalty.

SLIDE 36

Engine calculation on GPU (Big Red 2)

Good speedup, with caveats.

  • 24 CPU cores = 53.8 hours
  • 24 GPU devices = 14.5 hours
  • Speedup = 53.8/14.5 = 3.7

SLIDE 37

CPU-GPU Work-sharing: let’s make use of the whole machine.

  • GPU speedup = S
  • Number of CPU cores = N_CPU
  • Number of GPU devices = N_GPU

$$S_{total} = \frac{N_{CPU} + N_{GPU}(S-1)}{N_{CPU}}$$

(Figure: ideal-case S_total vs. N_GPU (1-4) for S = 8 and N_CPU = 4, 8, 16, 32; measured points: Big Red 2 (1.4375), Surface (1.8750).)
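The work-sharing estimate, S_total = (N_CPU + N_GPU(S-1)) / N_CPU, can be checked numerically; assuming each GPU is fed by one of the CPU cores, it reproduces the Big Red 2 and Surface figures quoted on the slide:

```python
def total_speedup(s_gpu, n_cpu, n_gpu):
    """S_total = (N_CPU + N_GPU*(S-1)) / N_CPU.

    Each GPU is driven by one CPU core, so it contributes S-1 cores'
    worth of extra throughput on top of the core that feeds it.
    """
    return (n_cpu + n_gpu * (s_gpu - 1)) / n_cpu

# Slide values for S = 8 and 16 cores:
big_red_2 = total_speedup(8, n_cpu=16, n_gpu=1)  # 1 GPU  -> 1.4375
surface   = total_speedup(8, n_cpu=16, n_gpu=2)  # 2 GPUs -> 1.8750
```

The formula makes the later result plausible too: with S = 2.6, 16 cores, and 2 GPUs it predicts (16 + 2·1.6)/16 = 1.20, matching the measured engine-case speedup on Surface.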

SLIDE 38

CPU-GPU Work-sharing: Strong scaling

Strong scaling is good for this problem on the CPU.

Sequential auto-ignition case, grad0, 53 species, ~10,000 cells (Surface). (Figure: CPU chemistry time (seconds, log scale) vs. number of processors, 1-16.)

SLIDE 39

CPU-GPU Work-sharing: Strong scaling

Poor scaling with GPUs if all processors get the same amount of work.

Sequential auto-ignition case, grad0, 53 species, ~10,000 cells (Surface). (Figure: CPU chemistry vs. GPU chemistry with standard work sharing, ~7x apart.)

SLIDE 40

CPU-GPU Work-sharing: Strong scaling

Better scaling if the GPU processors are given an appropriate work load.

Sequential auto-ignition case, grad0, 53 species, ~10,000 cells (Surface). (Figure: CPU chemistry, GPU chemistry with standard work sharing (~7x), and GPU chemistry with custom work sharing (~1.7x S_total, with S = 6.6).)
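"Custom work sharing" means weighting each rank's share of cells by its relative throughput instead of splitting evenly. A hypothetical `partition_cells` helper sketching the idea (not Converge's actual scheme):

```python
def partition_cells(n_cells, n_ranks, gpu_ranks, s_gpu):
    """Split cells so a GPU-equipped rank gets S times a CPU rank's share.

    gpu_ranks: set of rank indices that drive a GPU (assumed mapping).
    s_gpu: measured GPU speedup relative to one CPU core.
    """
    weights = [s_gpu if r in gpu_ranks else 1.0 for r in range(n_ranks)]
    total = sum(weights)
    counts = [int(round(n_cells * w / total)) for w in weights]
    counts[-1] += n_cells - sum(counts)   # absorb rounding in the last rank
    return counts

# 10,000 cells over 4 ranks, rank 0 driving a GPU with S = 6.6:
counts = partition_cells(10_000, 4, gpu_ranks={0}, s_gpu=6.6)
```

With equal shares, every CPU rank would finish long before the GPU rank's batch is large enough to pay off; the weighted split keeps all ranks busy for roughly the same wall time, which is where the improved scaling comes from.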

SLIDE 41

Proof of Principle: Engine calculation on GPU+CPU cores (Surface)

In line with expectations.

  • 16 CPU cores = 21.2 hours
  • 16 CPU cores + 2 GPU devices = 17.6 hours
  • Speedup = 21.2/17.6 = 1.20 (S_total, with S = 2.6)

SLIDE 42

Sort of.

(Figure) Does ___ plus ___ equal ___?

SLIDE 43

Future directions

Possibilities for significant further improvements.

  • Improve CPU/GPU parallel task management
    • Minimize synchronization penalty
    • Work stealing
  • Improvements to derivative calculation
    • Custom code generation
    • Reframe parts as matrix multiplication
  • Improvements to matrix calculations
    • Analytical Jacobian
    • Mixed precision calculations

SLIDE 44

Conclusions

  • GPU chemistry for stiff integration implemented.
  • Implemented as a Converge CFD UDF but flexible for incorporation in other CFD codes.
  • Continuing development:
    • Further speedup envisioned
    • More work can improve applicability

Thank you!

SLIDE 45

Supplemental Slides

Just in case.


SLIDE 46

0D, Uncoupled, Ideal Case: Cost Breakdown (Surface)

Costs are evenly distributed on both CPU and GPU.

(Figure: normalized computation time (0.1-1.0) vs. # of species (10, 32, 48, 79, 94, 111, 160), CPU and GPU, dense and sparse, broken down into matrix formation, matrix factor, matrix solve, derivatives, and other.)

SLIDE 47

0D, Uncoupled, Ideal Case: Cost Breakdown (Surface)

Costs are evenly distributed on both CPU and GPU.

(Figure: normalized computation time (0.02-0.2) vs. # of species (10-160), dense and sparse, broken down into matrix formation, matrix factor, matrix solve, derivatives, and other.)

SLIDE 48

0D, Uncoupled, Ideal Case: Max speedup (Surface)

As with dC_i/dt, the best speedup is for a large number of reactors.

(Figure: speedup (CPU time/GPU time) vs. number of species (10, 32, 48, 79, 94, 111, 160) for 256-2048 reactors; comparing CPU dense vs. GPU dense and CPU sparse vs. GPU sparse.)