

1. Faster Kinetics: Accelerate Your Finite-Rate Combustion Simulation with GPUs
   Christopher P. Stone, Ph.D., Computational Science and Engineering, LLC
   Kyle Niemeyer, Ph.D., Oregon State University
   GPU Technology Conference 2014

2. Outline
   - Combustion Kinetics: Operator Splitting
   - Stiff ODE Integration: Algorithms
   - CUDA Implementations
   - Benchmarks: VODE vs. RKx
   - Performance Analysis
   - Summary

3. Combustion Kinetics: Operator Splitting
   - Must solve the Navier-Stokes and species conservation equations: a massive coupled PDE system.
   - This coupled PDE system is too expensive to solve directly with finite-rate kinetics, large mechanisms, and fine meshes.
   - Instead, decouple the reaction terms spatially and solve them over the CFD time-step (operator splitting).

4. Combustion Kinetics
   - Integrate the convection-diffusion and reaction components separately over the CFD time-step: still 2nd-order accurate in time.
   - Valid when reaction time-scales are much faster than convection-diffusion time-scales.
   - Transforms the massive PDE into thousands or millions of independent ODE systems that can be solved by a stiff solver in parallel (a host-side sketch follows below).
   - (Equation on slide: the reaction sub-step is an N_s + 1 coupled ODE system with a non-linear RHS function.)
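For concreteness, here is a minimal host-side sketch of the reaction sub-step described above, assuming a hypothetical `integrate_cell` stiff-ODE routine and an illustrative `CellState` struct; both names and the species count are assumptions, not code from the presentation.

```cuda
// Illustrative sizes and helpers (assumptions, not from the presentation):
#define NS 19   // number of species in the mechanism

// Per-cell thermochemical state advanced by the reaction sub-step.
struct CellState {
    double Y[NS];   // species mass fractions
    double T;       // temperature
};

// Hypothetical stiff integrator for a single cell: advances the N_s + 1
// system dY/dt = f(Y, T) from t to t + dt_cfd with its own adaptive
// internal steps (VODE, RKF, RKC, ...).
void integrate_cell(CellState* cell, double dt_cfd);

// Operator-splitting reaction sub-step: after the convection-diffusion
// update, every cell's reaction terms form an independent ODE system,
// so the loop is embarrassingly parallel and the per-cell work varies.
void react_substep(CellState* cells, int ncells, double dt_cfd)
{
    #pragma omp parallel for schedule(dynamic)
    for (int i = 0; i < ncells; ++i)
        integrate_cell(&cells[i], dt_cfd);
}
```

The dynamic schedule hints at the variable-workload point made on the next slide: each cell's integration cost differs, so static work partitioning would load-balance poorly.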

5. Combustion Kinetics
   - ODE integration is still expensive due to:
     1. Numerical stiffness
     2. Costly RHS reaction function: scales as ~N_s
     3. Jacobian matrix factorization: scales as ~N_s^3
   - Targeting "modest" mechanisms: 10s < N_s < 100s
   - Embarrassingly parallel on the host, assuming thousands of grid points per core
   - But must be careful with the variable workload:
     1. Each ODE is unique: different initial conditions.
     2. Adaptive step-size: different number of steps per ODE.

6. Keys to GPU Performance
   - Stream data from global memory: optimal bandwidth if threads within a warp read contiguous data from global memory (see the layout sketch below).
   - Oversubscribe threads to hide latency: run many warps per SM to hide slow global memory.
   - Warp-level vector processing: optimal throughput if each thread within a warp executes the same instruction on different data (SIMD).
   Must use all three for best performance. SIMD processing is challenging in the ODE setting.
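One way to satisfy the first point above (streaming, coalesced reads) is a structure-of-arrays layout for the ODE state. A small sketch follows; the layout, kernel name, and the trivial reduction it performs are illustrative assumptions.

```cuda
#include <cuda_runtime.h>

// Structure-of-arrays layout (assumption for illustration): species k of
// problem i is stored at y[k * nproblems + i], so at a fixed k the 32
// threads of a warp read 32 consecutive doubles and the loads coalesce.
// An array-of-structs layout, y[i * nspecies + k], would stride the warp's
// accesses and waste global-memory bandwidth.
__global__ void sum_species_soa(const double* __restrict__ y,
                                double* __restrict__ ysum,
                                int nspecies, int nproblems)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // one thread per ODE
    if (i >= nproblems) return;

    double s = 0.0;
    for (int k = 0; k < nspecies; ++k)
        s += y[k * nproblems + i];   // coalesced read across the warp
    ysum[i] = s;                     // coalesced write
}
```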

7. Mapping ODEs to the GPU
   A. ODE solver logic runs on the device:
      1. One-thread-per-ODE: 10,000+ ODEs concurrently
      2. One-block-per-ODE: 100+ ODEs concurrently
   B. ODE solver logic runs on the host:
      1. One-kernel-per-ODE: 1 ODE at a time
      2. Offload expensive components: RHS, Jacobian, etc.
      Not considered in this study: would need very large mechanisms to expose enough parallelism.

8. Mapping ODEs to the GPU (diagram)
   One-thread-per-ODE: each thread index maps to one problem index, and each ODE carries its own dependent variables and parameters (y_1, y_2, y_3, ..., y_n, T, p, dt). Threads 1-32 form warp 1, threads 33-64 form warp 2, and so on.
   One-block-per-ODE: each block index maps to one problem index, so block 1 handles one ODE system, block 2 the next, and so on. (A kernel sketch of the one-thread-per-ODE mapping follows below.)
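A minimal sketch of the one-thread-per-ODE mapping in the diagram above, with a hypothetical `ode_integrate` device routine standing in for the per-thread RKF/RKC/VODE logic; the routine name, species count, and SoA layout are assumptions.

```cuda
#include <cuda_runtime.h>

#define NSP 19   // illustrative species count (assumption)

// Hypothetical per-thread integrator: advances one ODE system (species plus
// temperature) from t to t + dt with its own adaptive internal steps.
__device__ void ode_integrate(double* y, double* T, double p, double dt);

// One-thread-per-ODE mapping: global thread index == problem index, so the
// GPU integrates ~10,000+ independent ODE systems concurrently. Species are
// stored SoA (Y[k * nproblems + i]) so the gather/scatter loops coalesce.
__global__ void integrate_all(double* __restrict__ Y,
                              double* __restrict__ T,
                              const double* __restrict__ p,
                              int nproblems, double dt)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= nproblems) return;

    double y[NSP];                                   // this thread's ODE state
    for (int k = 0; k < NSP; ++k) y[k] = Y[k * nproblems + i];
    double Ti = T[i];

    // Divergence point: each ODE may take a different number of steps.
    ode_integrate(y, &Ti, p[i], dt);

    for (int k = 0; k < NSP; ++k) Y[k * nproblems + i] = y[k];
    T[i] = Ti;
}
```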

9. Mapping ODEs to the GPU
   One-thread-per-ODE:
     1. Straightforward: replicate the serial code
     2. Warp-level conflict between ODEs
     3. Only global memory
     4. O(10k) problems for full occupancy
   One-block-per-ODE:
     1. Complex: exploit parallelism within the solver and across ODEs
     2. No warp-level conflict between ODEs
     3. Only shared memory: limits problem size
     4. O(100) problems for full occupancy

10. Stiff ODE Solvers
   - VODE: variable-coefficient ODE solver
   - Backward differentiation formula (BDF)
   - Maintained by LLNL since the early 80s
   - Flavors: DVODE (Fortran); CVODE (C/C++)
   - 1st-5th order implicit BDF with non-linear Newton-Raphson iteration and h/p-adaption
   - Solves a generic system of ODEs with a user-defined RHS function

11. VODE Stiff Algorithm
   1. Estimate initial step-size: h_0 = h_0(f(y, t_0), ε)
   2. Initialize order p = 1
   3. While (t < t_1):
      A. Predict y(t+h) using (p-1)-th order polynomial extrapolation
      B. Correct y(t+h) using p-th order (1 ≤ p ≤ 5) BDF:
         i.   Compute (approximate) Jacobian matrix (J) ... if necessary
         ii.  Factorize iteration matrix M = (I − γJ) ... if necessary
         iii. While (not converged): iterate the Newton-Raphson solver
              a) Compute the RHS function at y_m
              b) Solve the correction δy_{m+1} = M^-1 f(y_m)
      C. If solution accepted: t = t + h
      D. Adjust h and p
   4. Interpolate the solution backwards if overshoot (t > t_1)
   The "if necessary" steps are a heuristic optimization: they only execute if doing so will help run-time. (A sketch of the corrector step appears below.)
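To make the corrector structure concrete, here is a hedged host-side sketch of step 3.B. The `compute_rhs`, `compute_jacobian`, `lu_factor`, and `lu_solve` helpers are hypothetical placeholders, and the BDF residual bookkeeping is omitted; this sketches the control flow only, not VODE's implementation.

```cuda
#include <cmath>
#include <vector>

// Hypothetical dense-algebra and chemistry helpers (not VODE's internals):
void compute_rhs(const double* y, double* f, int n);        // f = f(y)
void compute_jacobian(const double* y, double* J, int n);   // J ~ df/dy
void lu_factor(double* M, int* piv, int n);                 // in-place LU
void lu_solve(const double* M, const int* piv, double* b, int n);  // b <- M^-1 b

// Structure of one corrector solve (step 3.B): rebuild and factorize
// M = I - gamma*J only "if necessary", then iterate Newton-Raphson.
// Schematically each iteration solves dy = M^-1 f(y_m), as in step 3.B.iii.b;
// the full BDF residual terms are left out of this sketch.
bool bdf_correct(double* y, int n, double gamma,
                 double* J, double* M, int* piv, bool rebuild,
                 int max_iters, double newton_tol)
{
    if (rebuild) {   // heuristic: only executed when it should help run-time
        compute_jacobian(y, J, n);
        for (int r = 0; r < n; ++r)
            for (int c = 0; c < n; ++c)
                M[r * n + c] = (r == c ? 1.0 : 0.0) - gamma * J[r * n + c];
        lu_factor(M, piv, n);
    }

    std::vector<double> f(n), dy(n);
    for (int m = 0; m < max_iters; ++m) {
        compute_rhs(y, f.data(), n);             // RHS at the current iterate
        for (int k = 0; k < n; ++k) dy[k] = f[k];
        lu_solve(M, piv, dy.data(), n);          // correction dy = M^-1 f(y_m)

        double nrm2 = 0.0;
        for (int k = 0; k < n; ++k) { y[k] += dy[k]; nrm2 += dy[k] * dy[k]; }
        if (std::sqrt(nrm2) < newton_tol) return true;   // converged
    }
    return false;   // not converged: caller rebuilds J/M or reduces h
}
```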

12. Stiff ODE Solvers
   - The VODE implicit BDF algorithm is highly optimized to reduce sequential execution time for stiff problems ... but ...
   - Complex implicit logic makes parallel (SIMD) execution less efficient: Newton iterations, p-adaption, Jacobian recycling, selective factorization.
   - Fixed-order, explicit schemes should have greater parallel efficiency in SIMD environments ...
   - ... but they usually perform poorly on stiff problems.
   - Sacrifice numerical efficiency for higher computational throughput.

13. RKF Algorithm
   1. Estimate initial step-size: h_0 = h_0(f(y, t_0), ε)
   2. While (t < t_1):
      A. Compute trial y(t+h) and ||err||
      B. If solution accepted: t = t + h
      C. Adjust h
   Note: Cash-Karp (RKCK) and Dormand-Prince (DOPRI) are other RK variants similar to Fehlberg (RKF).
   - Explicit 4th-order with 5th-order error estimation
   - Adaptive step-size for error control
   - 6 RHS function evaluations per step
   - Minimal branching within the time-loop: high SIMD efficiency (see the controller sketch below)
   - A variable number of steps is still likely between ODEs
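A hedged sketch of the adaptive loop (steps 2.A-2.C above) around a hypothetical `rkf45_step` device routine; the safety factor, exponent, and clamp values are conventional controller choices, not values from the presentation.

```cuda
#include <cuda_runtime.h>
#include <math.h>

#define NSP_MAX 64   // illustrative upper bound on system size (assumption)

// Hypothetical embedded Fehlberg step: advances y in place by h using six
// RHS evaluations and returns the weighted 4th/5th-order error norm ||err||.
__device__ double rkf45_step(double* y, double t, double h, int n);

// Per-thread adaptive time loop. Branching is minimal, so warps stay largely
// converged even though each ODE ends up taking a different number of steps.
// The 0.9 safety factor, 1/5 exponent, and 0.2x-5x clamps are conventional
// choices (assumptions), not the presentation's exact controller.
__device__ void rkf45_integrate(double* y, int n,
                                double t0, double t1, double h0, double tol)
{
    double t = t0, h = h0;
    while (t < t1) {
        if (t + h > t1) h = t1 - t;              // land exactly on t1

        double ytrial[NSP_MAX];
        for (int k = 0; k < n; ++k) ytrial[k] = y[k];

        double err = rkf45_step(ytrial, t, h, n);   // 6 RHS evaluations

        if (err <= tol) {                        // accept the step
            for (int k = 0; k < n; ++k) y[k] = ytrial[k];
            t += h;
        }
        double scale = 0.9 * pow(tol / fmax(err, 1e-300), 0.2);
        h *= fmin(fmax(scale, 0.2), 5.0);        // damped step-size update
    }
}
```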

14. RKC Algorithm
   1. Estimate initial step-size: h_0 = h_0(f(y, t_0), ε)
   2. While (t < t_1):
      A. Estimate the spectral radius → s
      B. Compute trial y(t+h) with s stages and ||err||
      C. If solution accepted: t = t + h
      D. Adjust h
   - "Stabilized" Runge-Kutta-Chebyshev (RKC) is still explicit but can handle more stiffness than RKF.
   - 2nd-order scheme with adaptive h for error control.
   - Number of stages s ≥ 2 estimated every 25 steps, or after a rejection, based on the spectral radius (see the stage-count sketch below).
   - Adaptive s introduces additional variability.
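To illustrate how s in step 2.A can follow from the spectral radius estimate, here is a small sketch; the growth constant and caps are typical of RKC codes and should be read as assumptions rather than the presentation's exact formula.

```cuda
#include <cmath>

// Stage-count selection for step 2.A. Damped 2nd-order RKC schemes have a
// real stability interval that grows roughly like beta(s) ~ 0.65 * s^2, so s
// is chosen as (about) the smallest integer with h * rho <= beta(s). The
// 1.54 ~ 1/0.65 constant and the caps are typical values (assumptions).
int rkc_num_stages(double h, double spectral_radius, int s_max = 200)
{
    int s = 1 + static_cast<int>(std::sqrt(1.0 + 1.54 * h * spectral_radius));
    if (s < 2) s = 2;          // RKC needs at least two stages
    if (s > s_max) s = s_max;  // beyond this cap, shrink h instead of growing s
    return s;
}
```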

15. Benchmark Details
   - Host: Opteron 6134, 2.3 GHz ... baseline DVODE is SERIAL!
   - GPU: Fermi M2050
   - ODE parameters:
     - Relative tolerance = 10^-10
     - Absolute tolerance = 10^-13
     - dt ~ 10^-7 sec
   - 19-species C2H4 mechanism:
     - 167 reversible reactions
     - 10 quasi-steady species
     - Low stiffness
   - Simulation database constructed from stochastic Counter-Flow Linear-Eddy Model (LEM-CF) simulations with many different fuel-oxidizer conditions:
     - 19 million samples available
     - ~300 cells per simulation

16. RKF Solver Performance
   - RKF is ~2x slower than VODE on the host.
   - RKF is ~2-3x faster than VODE on the GPU.
   - Overhead is less than 0.05%.

17. RKF Solver Performance
   - One-thread RKF: 20.2x max speed-up
   - One-block RKF: 10.7x max speed-up
   - One-thread VODE: 7.7x max speed-up
   - One-block VODE: 7.3x max speed-up
   - One-block breaks even at M ~ 10; one-thread breaks even at M ~ 1000.

18. RKC Solver Performance
   - RKC vs. VODE with a moderately stiff natural gas mechanism (GRI-Mech 3.0)
   - 53 species, 634 reactions
   - Reltol = 10^-6; Abstol = 10^-10
   - dt = 10^-6 sec
   - Homogeneous ignition test problem
   - Host: Xeon X5650, 2.67 GHz
   - GPU: C2075
   - 57x speed-up over CPU VODE

19. RKC Solver Performance
   - C2H4-air mechanism: 111 species, 1566 reactions
   - Reltol = 10^-6; Abstol = 10^-10
   - dt = 10^-6 sec
   - 27x speed-up over CPU VODE ... still very effective.

20. RKC Solver Performance
   - Increase the stiffness by increasing the global time-step: dt = 10^-4 sec
   - RKC is 2.5x slower than VODE for this very stiff problem.
