CONVERGENCE ACCELERATION TECHNIQUES FOR DUAL TIME STEPPING

slide-1
SLIDE 1

CONVERGENCE ACCELERATION TECHNIQUES FOR DUAL TIME STEPPING

Niki A. Loppi, AI & HPC Solution Architect, NVIDIA
Brian C. Vermeire, Aerospace Engineering, Concordia University
Peter E. Vincent, Department of Aeronautics, Imperial College London

slide-2
SLIDE 2

OVERVIEW

  • Incompressible flows require a divergence-free velocity field
  • The Artificial Compressibility Method (ACM) is a suitable approach
  • A range of novel convergence acceleration techniques:
      • Locally Adaptive Pseudo-Timestepping (LAPTS)
      • Polynomial Multigrid (P-MG)
      • Optimal explicit Runge-Kutta Methods
slide-3
SLIDE 3

ARTIFICIAL COMPRESSIBILITY

  • An alternative to pressure projection in steady state
  • ACM uses a pseudo-time problem to enforce incompressibility
  • Dual time-stepping can extend the ACM to unsteady flows
  • This introduces a global hyperbolic problem in pseudo-time
  • Leverage the explicit solver technology already in PyFR
slide-4
SLIDE 4

ARTIFICIAL COMPRESSIBILITY

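Conservation law:
\[ \frac{\partial u}{\partial \tau} + I_c \frac{\partial u}{\partial t} + \frac{\partial F}{\partial x} + \frac{\partial G}{\partial y} + \frac{\partial H}{\partial z} = 0 \]

Physical time:
\[ \frac{\partial u}{\partial \tau} = R^{n+1,m} - \frac{I_c}{2\Delta t}\left(3u^{n+1,m} - 4u^{n} + u^{n-1}\right) \]

Pseudo time algorithm:
\[ u^{(k)} = u^{(0)} - \alpha_m \Delta\tau \left( R^{(k-1)} - \frac{I_c}{2\Delta t}\left(3u^{(k-1)} - 4u^{n} + u^{n-1}\right) \right) \]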

(1)
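
The dual time-stepping loop on this slide can be illustrated on a scalar model problem. This is a minimal sketch, not PyFR code: it assumes a single ODE du/dt = R(u) with I_c = 1, a second-order backward difference in physical time, and a four-stage explicit scheme with assumed stage coefficients (1/4, 1/3, 1/2, 1). The sketch takes R as the right-hand side and therefore adds the pseudo-time residual at each stage; the slide writes the update with the opposite sign, which presumably reflects a different convention for R. The names `dual_time_step` and `pseudo_residual` are hypothetical.

```python
import math


def dual_time_step(rhs, u_n, u_nm1, dt, dtau,
                   alphas=(0.25, 1.0 / 3.0, 0.5, 1.0),
                   tol=1e-12, max_iters=10_000):
    """Advance one physical step by marching to steady state in pseudo-time."""

    def pseudo_residual(u):
        # Right-hand side minus the backward-difference estimate of du/dt
        # (I_c = 1 in this scalar model problem).
        return rhs(u) - (3.0 * u - 4.0 * u_n + u_nm1) / (2.0 * dt)

    u = u_n  # initial guess: the previous physical level
    for it in range(1, max_iters + 1):
        u0 = u
        for alpha in alphas:
            # Each stage restarts from u0, as in u^(k) = u^(0) + alpha * dtau * (...)
            u = u0 + alpha * dtau * pseudo_residual(u)
        if abs(pseudo_residual(u)) < tol:
            return u, it
    raise RuntimeError("pseudo-time iteration did not converge")


# Model problem: du/dt = -lam * u with u(0) = 1 (assumed for illustration).
lam, dt, dtau = 1.0, 0.1, 0.1
rhs = lambda u: -lam * u
u_prev, u_curr = 1.0, math.exp(-lam * dt)  # exact values at t = 0 and t = dt

# One physical step, keeping the pseudo-iteration count.
u_first, iters_first = dual_time_step(rhs, u_curr, u_prev, dt, dtau)

# March from t = dt to t = 1.0.
u_nm1, u_n = u_prev, u_curr
for step in range(9):
    u_np1, n_pseudo = dual_time_step(rhs, u_n, u_nm1, dt, dtau)
    u_nm1, u_n = u_n, u_np1
```

Once the pseudo-time residual vanishes, `u` satisfies the implicit backward-difference equation for the new physical level, so the cost of each physical step is set by how quickly the inner loop converges.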

slide-5
SLIDE 5

OVERVIEW

  • ACM performance relies on rapid convergence in pseudo-time
  • A range of novel convergence acceleration techniques in PyFR
      • Polynomial Multigrid (P-MG)
      • Locally Adaptive Pseudo-Timestepping (LAPTS)
      • Optimal explicit Runge-Kutta Methods
slide-6
SLIDE 6

POLYNOMIAL MULTIGRID

  • Leverage lower polynomial degrees to accelerate convergence
  • Less strict CFL limits on the coarser levels
  • Less expensive per iteration on the coarser levels
  • Low-frequency error is converged faster on coarse levels
  • Correction from coarse levels is then prolongated to fine levels
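
The restriction and prolongation between polynomial degrees mentioned above can be sketched for a single 1D element. This is an illustrative assumption, not the operators used in PyFR: the solution is stored at Gauss-Legendre points, restriction truncates the Legendre modal expansion, and prolongation pads it with zeros. The helper names `nodes`, `restrict` and `prolongate` are hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre as leg


def nodes(p):
    """Gauss-Legendre points for a degree-p element on [-1, 1]."""
    return leg.leggauss(p + 1)[0]


def to_modes(u, p):
    """Nodal values at the degree-p points -> Legendre modal coefficients."""
    return np.linalg.solve(leg.legvander(nodes(p), p), u)


def restrict(u_fine, p_fine, p_coarse):
    """Drop the modes above p_coarse and sample at the coarse points."""
    modes = to_modes(u_fine, p_fine)[: p_coarse + 1]
    return leg.legvander(nodes(p_coarse), p_coarse) @ modes


def prolongate(u_coarse, p_coarse, p_fine):
    """Zero-pad the coarse modes and sample at the fine points."""
    modes = np.zeros(p_fine + 1)
    modes[: p_coarse + 1] = to_modes(u_coarse, p_coarse)
    return leg.legvander(nodes(p_fine), p_fine) @ modes


# A quadratic is represented exactly on both the p = 4 and p = 2 levels.
x_fine, x_coarse = nodes(4), nodes(2)
u_fine = 1.0 + 2.0 * x_fine - 3.0 * x_fine**2
u_coarse = restrict(u_fine, 4, 2)
u_back = prolongate(u_coarse, 2, 4)

# A pure degree-4 Legendre mode is removed entirely by restriction to p = 2.
high_mode = leg.legval(x_fine, [0, 0, 0, 0, 1])
high_restricted = restrict(high_mode, 4, 2)
```

Because the Legendre modes are orthogonal, truncating them is an L2 projection onto the lower-degree space, so content that the coarse level can represent survives a restrict and prolongate round trip unchanged.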
slide-7
SLIDE 7

POLYNOMIAL MULTIGRID

[Figure: V-cycle] Iterate → Restrict → Iterate → Restrict → Iterate → Prolongate → Iterate → Prolongate → Iterate

slide-8
SLIDE 8

POLYNOMIAL MULTIGRID

  • Unsteady Circular Cylinder

~ 6.2x Speedup

slide-9
SLIDE 9

POLYNOMIAL MULTIGRID

  • Incompressible Taylor Green Vortex

~ 3.5x Speedup

slide-10
SLIDE 10

LAPTS

  • Convergence is accelerated by using local pseudo-time steps
  • Maximum permissible step size is limited by local CFL criteria
      • Element size
      • Polynomial degree
      • Local wave speeds and viscous effects
      • Runge-Kutta scheme properties
  • This limit is estimated via embedded pair Runge-Kutta schemes
slide-11
SLIDE 11

LAPTS

  • Embedded pair gives an estimate of the truncation error
  • Pseudo-time step size is then adapted using a PI-controller
      • For each element
      • For each field variable
  • Scaled up on coarser grid levels when combined with P-MG
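
A PI-controller update of the local pseudo-time step might look like the sketch below. It is an assumption-laden illustration, not the PyFR controller: the error estimate from the embedded pair is taken as given, the exponents 0.7/order and 0.4/order are commonly quoted textbook values, and the tolerance, safety factor and limits are made-up defaults. The function name `pi_update` is hypothetical.

```python
import numpy as np


def pi_update(dtau, err, err_prev, tol=1e-4, order=2,
              safety=0.8, fac_min=0.5, fac_max=2.0,
              dtau_min=1e-8, dtau_max=1.0):
    """Return new pseudo-time steps, one per (field variable, element) entry."""
    alpha, beta = 0.7 / order, 0.4 / order
    # Normalised errors; the floor avoids division by zero for converged entries.
    e = np.maximum(err / tol, 1e-16)
    e_prev = np.maximum(err_prev / tol, 1e-16)
    # Proportional term on the current error, integral-like term on the previous one.
    fac = safety * e ** (-alpha) * e_prev ** beta
    # Limit how fast the step may change, then keep it inside absolute bounds.
    fac = np.clip(fac, fac_min, fac_max)
    return np.clip(dtau * fac, dtau_min, dtau_max)


# Two field variables, three elements (illustrative sizes).
dtau = np.full((2, 3), 0.01)
err = np.array([[1e-8, 1e-4, 1e-2],
                [1e-8, 1e-4, 1e-2]])
dtau_new = pi_update(dtau, err, err_prev=err)
```

Entries whose error estimate is well below the tolerance grow by the maximum factor, while entries well above it shrink by the minimum factor, so each element and field variable settles on its own step size.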
slide-12
SLIDE 12

LAPTS

  • Unsteady Circular Cylinder

~ 4.1x Speedup

slide-13
SLIDE 13

LAPTS

  • SD7003 Airfoil

~ 2.4x Speedup

slide-14
SLIDE 14

OPTIMAL RUNGE-KUTTA SCHEMES

  • Properties of Runge-Kutta scheme limit pseudo-time step size
  • Each Runge-Kutta scheme has a stability polynomial
  • Each stability polynomial has a region of absolute stability
  • Pseudo-time step is limited by the size of this region
  • For the ACM, first-order accuracy in pseudo-time is sufficient
slide-15
SLIDE 15

OPTIMAL RUNGE-KUTTA SCHEMES

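Stability polynomial:
\[ P_{s,1}(z) = 1 + z + \sum_{j=2}^{s} \gamma_j z^j, \qquad z = \Delta\tau\,\omega_\delta \]

Optimise \{\gamma_2, \gamma_3, \ldots, \gamma_s\} to yield maximum \Delta\tau subject to
\[ |P_{s,1}(\Delta\tau\,\omega_\delta)| - 1 \le 0, \quad \forall\, \omega_\delta \]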
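
The constraint above can be checked numerically for a given set of coefficients. The sketch below does not perform the optimisation over the coefficients; it only evaluates the stability polynomial and bisects for the largest pseudo-time step that keeps every eigenvalue inside the stability region. It assumes the stable range of step sizes is a single interval starting at zero, and the function names are hypothetical.

```python
def stability_poly(z, gammas):
    """P_{s,1}(z) = 1 + z + sum_{j>=2} gamma_j z^j, with gammas = (gamma_2, ...)."""
    p = 1.0 + z
    for j, gamma in enumerate(gammas, start=2):
        p += gamma * z ** j
    return p


def is_stable(dtau, omegas, gammas, slack=1e-12):
    """True if |P(dtau * omega)| - 1 <= 0 (up to round-off) for every omega."""
    return all(abs(stability_poly(dtau * w, gammas)) - 1.0 <= slack
               for w in omegas)


def max_stable_dtau(omegas, gammas, n_bisect=60):
    """Largest stable dtau, assuming stability holds on one interval [0, dtau_max]."""
    lo, hi = 0.0, 1.0
    while is_stable(hi, omegas, gammas):  # grow the bracket until it is unstable
        lo, hi = hi, 2.0 * hi
        if hi > 1e12:
            raise RuntimeError("no stability limit found")
    for _ in range(n_bisect):
        mid = 0.5 * (lo + hi)
        if is_stable(mid, omegas, gammas):
            lo = mid
        else:
            hi = mid
    return lo


# Single real eigenvalue omega = -1 (illustrative spectrum).
dtau_euler = max_stable_dtau([-1.0], gammas=())                  # forward Euler
dtau_rk4 = max_stable_dtau([-1.0], gammas=(1 / 2, 1 / 6, 1 / 24))  # classical RK4
```

In an optimisation loop, a routine like `max_stable_dtau` would serve as the objective evaluated for each candidate set of coefficients, with the eigenvalues taken from the spatial discretisation.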

slide-16
SLIDE 16

OPTIMAL RUNGE-KUTTA SCHEMES

  • Optimal stability polynomials can be used for embedded pairs
  • Divergence of a “test” scheme controls pseudo-time step
  • Allows automatic pseudo-time step size selection
slide-17
SLIDE 17

OPTIMAL RUNGE-KUTTA SCHEMES

  • Unsteady Circular Cylinder

~ 2.1x Speedup

slide-18
SLIDE 18

OPTIMAL RUNGE-KUTTA SCHEMES

  • Turbulent Jet

~ 2x Speedup

slide-19
SLIDE 19

PERFORMANCE

  • Advancements in numerical methods (2015 - 2020)

[Figure: bar chart of speed-up for the cylinder benchmark (axis ticks 5 to 25); bars: RK4, RK-Opt, LTS, PMG, RK-Opt+LTS+PMG]

~ 21x Speedup

slide-20
SLIDE 20

PERFORMANCE

  • Advancements in hardware (2015 - 2020)

[Figure: bar chart of peak DP TFLOP/s (axis ticks 5 to 20); bars: K20, P100, V100, A100]

~ 16x Speedup

slide-21
SLIDE 21

PERFORMANCE

  • Combined ~350x speedup (2015 - 2020)

[Figure: bar chart of peak DP TFLOP/s (axis ticks 5 to 20); bars: K20, P100, V100, A100]

[Figure: bar chart of speed-up for the cylinder benchmark (axis ticks 5 to 25); bars: RK4, LTS, RK-Opt+LTS+PMG]

slide-22
SLIDE 22

RESULTS

  • DARPA SUBOFF at Re = 1.2×10^6
slide-23
SLIDE 23

RESULTS

slide-24
SLIDE 24
slide-25
SLIDE 25
slide-26
SLIDE 26
slide-27
SLIDE 27

RESULTS

slide-28
SLIDE 28

CONFIGURATION

P-MG, Optimal Runge-Kutta, LAPTS

slide-29
SLIDE 29

REFERENCES

  • NA Loppi, FD Witherden, A Jameson, PE Vincent, A high-order cross-platform incompressible Navier–Stokes solver via artificial compressibility with application to a turbulent jet, Computer Physics Communications 233, 193-205, 2018.

  • BC Vermeire, NA Loppi, PE Vincent, Optimal Runge–Kutta schemes for pseudo time-stepping with high-order unstructured methods, Journal of Computational Physics 383, 55-71, 2019.

  • NA Loppi, FD Witherden, A Jameson, PE Vincent, Locally adaptive pseudo-time stepping for high-order Flux Reconstruction, Journal of Computational Physics 399, 2019.

  • BC Vermeire, NA Loppi, PE Vincent, Optimal embedded pair Runge–Kutta schemes for pseudo-time stepping, Journal of Computational Physics 415, 2020.

slide-30
SLIDE 30

QUESTIONS