Simulations on GPU Mohamed H. Aissa 1,2 1 Turbomachinery Department , - - PowerPoint PPT Presentation



SLIDE 1

Unconventional HPC Workshop, Euro-Par 2016, Grenoble, 23.08.2016

Acceleration of Turbomachinery steady CFD Simulations on GPU

Mohamed H. Aissa (1,2)

  • Dr. Tom Verstraete (1)
  • Prof. Cornelis Vuik (2)

(1) Turbomachinery Department, Von Karman Institute, Belgium

(2) Delft Institute of Applied Mathematics, TU Delft, the Netherlands

SLIDE 2

Topic of Interest: Reduce Fuel Consumption and CO2 Emissions

Source: Wikipedia.org

SLIDE 3

Topic of Interest

Turbomachinery is about Performance and Efficiency

SLIDE 4

Topic of Interest

Axial Jet Engine

Source: Wikipedia.org

SLIDE 5

Content

  • Multidisciplinary Optimization
  • CFD simulations on GPU
  • Literature review
  • Implicit RANS Implementation
  • Benchmark
  • Optimization Case
SLIDE 6

Topic of Interest

Optimization algorithm

Derivative-based optimization:

  • Fast convergence, but ...
  • Derivative evaluation can be complicated and problem specific (adjoint, automatic differentiation)

Derivative-free methods, e.g. population based:

  • Simplicity
  • Black-box approach to the evaluation, but ...
  • Large number of evaluations

$\min_{\mathbf{x}} \; f(\mathbf{x}) \quad \text{subject to} \quad g(\mathbf{x}) \le 0$
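The population-based idea can be sketched with a basic Differential Evolution loop (DE/rand/1/bin), the method named in the ICNAAM 2015 paper cited in this deck. This is a generic textbook variant, not the VKI optimizer CADO: the test functions `f` and `g`, the quadratic penalty used to handle g(x) ≤ 0, and all parameter values are illustrative assumptions. The evaluation counter makes the "large number of evaluations" point concrete: every trial vector costs one call of the objective, which in the real setting is a full CFD simulation.

```python
import random


def differential_evolution(f, bounds, pop_size=20, generations=100,
                           F=0.8, CR=0.9, seed=0):
    """Minimize f over a box with DE/rand/1/bin; returns (x, f(x), evaluations)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    cost = [f(x) for x in pop]
    evaluations = pop_size
    for _ in range(generations):
        for i in range(pop_size):
            # three distinct partners, all different from the target i
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated component
            trial = []
            for j, (lo, hi) in enumerate(bounds):
                if j == j_rand or rng.random() < CR:
                    v = pop[a][j] + F * (pop[b][j] - pop[c][j])
                else:
                    v = pop[i][j]
                trial.append(min(max(v, lo), hi))  # clip to the box
            trial_cost = f(trial)  # one evaluation (a CFD run in practice)
            evaluations += 1
            if trial_cost <= cost[i]:  # greedy one-to-one selection
                pop[i], cost[i] = trial, trial_cost
    best = min(range(pop_size), key=lambda k: cost[k])
    return pop[best], cost[best], evaluations


# Hypothetical test problem: min f(x) subject to g(x) <= 0, via a quadratic penalty.
def f(x):
    return (x[0] - 1.0) ** 2 + (x[1] - 2.0) ** 2


def g(x):
    return x[0] + x[1] - 2.0


def penalized(x, mu=100.0):
    return f(x) + mu * max(0.0, g(x)) ** 2


x_best, f_best, n_evals = differential_evolution(
    penalized, [(-5.0, 5.0), (-5.0, 5.0)], generations=200, seed=1)
print(x_best, f_best, n_evals)  # n_evals is always pop_size * (generations + 1)
```

With 20 individuals and 200 generations the sketch spends 4,020 evaluations on a two-variable problem, which is why the cost of each CFD evaluation dominates the optimization time.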

SLIDE 7


SLIDE 8

CFD: Core of the Optimization

CFD is much slower than CSM (computational structural mechanics), hence the need for acceleration -> GPU

CADO: the VKI in-house optimizer

SLIDE 9

Steady CFD Simulations

  • A steady simulation has a unique solution for given boundary conditions.
  • An initial solution is advanced iteratively in (pseudo-)time until convergence.
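A minimal sketch of the idea, assuming a 1D steady diffusion problem (u'' = 0 with Dirichlet boundary values) as a stand-in for the flow equations: the residual R(u) = -u'' is marched explicitly in pseudo-time, du/dt = -R(u), until it falls below a tolerance, and two different start solutions reach the same steady state. The function names and the `cfl` factor are illustrative, not taken from the solver.

```python
def residual(u, h):
    """R(u) = -u'' at interior nodes; zero at the two Dirichlet boundary nodes."""
    r = [0.0] * len(u)
    for i in range(1, len(u) - 1):
        r[i] = (2.0 * u[i] - u[i - 1] - u[i + 1]) / h**2
    return r


def march_to_steady_state(u0, left, right, cfl=0.4, tol=1e-10, max_steps=100_000):
    """Explicit pseudo-time marching du/dt = -R(u) until max|R| < tol."""
    u = list(u0)
    u[0], u[-1] = left, right            # impose the boundary conditions
    h = 1.0 / (len(u) - 1)
    dt = cfl * h * h                     # explicit stability needs dt <= h^2 / 2
    for step in range(max_steps):
        r = residual(u, h)
        if max(abs(v) for v in r) < tol:
            return u, step
        u = [ui - dt * ri for ui, ri in zip(u, r)]
    raise RuntimeError("no convergence within max_steps")


# Two different start solutions, same boundary conditions.
u_a, steps_a = march_to_steady_state([0.0] * 11, left=0.0, right=1.0)
u_b, steps_b = march_to_steady_state([5.0] * 11, left=0.0, right=1.0)
print(steps_a, steps_b)
print([round(v, 6) for v in u_a])      # should approach the linear profile u(x) = x
```

The converged state depends only on the boundary values, not on the start solution or on `dt`; the time step only controls how many iterations are needed.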
SLIDE 10

SLIDE 11

Numerical Scheme:

Explicit Time Stepping (β=0):

Aissa, M.H., Verstraete, T., Vuik, C. "Aerodynamic Optimization of Supersonic Compressor Cascade using Differential Evolution on GPU". 13th Int. Conf. of Numerical Analysis and Applied Mathematics (ICNAAM 2015)

Implicit Time Stepping (β=1):
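The scheme equations on this slide did not survive extraction. One common way to write both variants with a single parameter β, assumed here rather than copied from the slide, starts from the semi-discrete system dU/dt = -R(U) (cell volumes absorbed into R), blends the residual at the old and new time levels, and linearizes R(U^{n+1}) about U^n:

```latex
\frac{U^{n+1}-U^{n}}{\Delta t} = -\left[(1-\beta)\,R(U^{n}) + \beta\,R(U^{n+1})\right],
\qquad
R(U^{n+1}) \approx R(U^{n}) + \frac{\partial R}{\partial U}\,\Delta U,
\qquad
\Delta U = U^{n+1}-U^{n}
\;\Longrightarrow\;
\left(\frac{I}{\Delta t} + \beta\,\frac{\partial R}{\partial U}\right)\Delta U = -R(U^{n})
```

With β = 0 this reduces to forward Euler, ΔU = -Δt R(U^n), which needs no linear solve; with β = 1 it is a linearized backward Euler step, which requires solving a sparse linear system with the Jacobian ∂R/∂U at every step.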

SLIDE 12

Implicit Time Stepping is more Stable but …
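The trade-off can be seen on a scalar model, assuming a stiff linear residual R(u) = λ(u - u*) with large λ (a hypothetical stand-in for a stiff discretization). Forward Euler is only stable for 0 < Δt·λ < 2, while the linearized implicit step (1/Δt + ∂R/∂u)Δu = -R(u) damps the error for any Δt. The "but" is the cost: in a real flow solver ∂R/∂u is a large sparse Jacobian, so every implicit step needs its assembly and a linear solve. The function names below are illustrative.

```python
def explicit_step(u, dt, lam, u_star):
    """Forward Euler on du/dt = -R(u) with R(u) = lam * (u - u_star)."""
    return u - dt * lam * (u - u_star)


def implicit_step(u, dt, lam, u_star):
    """Linearized backward Euler: (1/dt + dR/du) * du = -R(u)."""
    residual = lam * (u - u_star)
    jacobian = lam                      # dR/du for this linear model
    du = -residual / (1.0 / dt + jacobian)
    return u + du


def march(step, u0=0.0, dt=0.01, lam=1000.0, u_star=1.0, n_steps=20):
    u = u0
    for _ in range(n_steps):
        u = step(u, dt, lam, u_star)
    return u


print(march(explicit_step))   # error multiplied by (1 - dt*lam) = -9 each step
print(march(implicit_step))   # error multiplied by 1 / (1 + dt*lam) = 1/11 each step
```

For a scalar the "linear solve" is a single division; for a discretized flow field it is a sparse system with one block row per cell, which is where the linear solver and its storage format become the main cost.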


SLIDE 13

Literature Review

  • What to port
      • Only the linear solver, when it dominates the run time
      • Both assembly and solve: optimal (no CPU-GPU communication)
  • Linear solver
      • Library: code maturity, but restrictive (petsc-dev, Paralution, AmgX, ViennaCL …)
      • Own code: flexibility
  • Storage format
      • Standard (CSR, DIA …)
      • New (hybrid)
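For reference, a minimal sketch of the CSR (compressed sparse row) format named on the slide: three arrays hold the nonzero values, their column indices and the offset where each row starts, and a sparse matrix-vector product walks each row's slice. This is plain Python for readability; on a GPU a common mapping assigns one thread or one warp per row. The function names are illustrative.

```python
def dense_to_csr(dense):
    """Convert a dense row-major matrix to CSR arrays (values, col_idx, row_ptr)."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, a in enumerate(row):
            if a != 0.0:
                values.append(a)
                col_idx.append(j)
        row_ptr.append(len(values))    # row i occupies values[row_ptr[i]:row_ptr[i+1]]
    return values, col_idx, row_ptr


def csr_matvec(values, col_idx, row_ptr, x):
    """y = A x using only the stored nonzeros of each row."""
    y = []
    for i in range(len(row_ptr) - 1):
        s = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            s += values[k] * x[col_idx[k]]
        y.append(s)
    return y


A = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
vals, cols, ptr = dense_to_csr(A)
print(vals, cols, ptr)
print(csr_matvec(vals, cols, ptr, [1.0, 2.0, 3.0]))
```

DIA (diagonal) storage instead keeps whole diagonals with a fixed offset each, which suits banded matrices but wastes memory on the unstructured sparsity of general meshes.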
SLIDE 14

http://mhaissa.blogspot.be/2015/10/for-paralution-gpu-conversion-and.html

CFD Solver (Standard)

Implicit Runge-Kutta scheme

Xu et al., JCP 2015

SLIDE 15

CFD Solver (Standard)

Implicit Runge-Kutta scheme

SLIDE 16

CFD Solver (On-demand Factorization)

Implicit Runge-Kutta scheme
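The deck does not spell out what the on-demand factorization does. One plausible reading, suggested by the "ILU" versus "ILU OD" labels in the benchmark and by "On-demand (incomplete) Factorization" in the conclusion, is that a stored factorization is reused as a preconditioner across Runge-Kutta stages and pseudo-time steps as long as the iterative linear solve converges within a budget, and is recomputed only when it does not. The sketch illustrates that reading only, under loud assumptions: an exact LU of a small tridiagonal model matrix stands in for an ILU of the flow Jacobian, preconditioned Richardson iteration stands in for the actual linear solver, and `OnDemandSolver`, `model_matrix` and `max_iters` are hypothetical names.

```python
import math


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def model_matrix(n, shift):
    """Hypothetical stand-in for a Jacobian: tridiagonal (-1, 2 + shift, -1)."""
    return [-1.0] * n, [2.0 + shift] * n, [-1.0] * n   # (sub, diag, sup)


def tridiag_matvec(sub, diag, sup, x):
    n = len(diag)
    y = [diag[i] * x[i] for i in range(n)]
    for i in range(1, n):
        y[i] += sub[i] * x[i - 1]        # entry A[i][i-1]
        y[i - 1] += sup[i - 1] * x[i]    # entry A[i-1][i]
    return y


def factorize(sub, diag, sup):
    """LU factorization without pivoting (stand-in for an incomplete LU)."""
    n = len(diag)
    lower, upper = [0.0] * n, [0.0] * n
    upper[0] = diag[0]
    for i in range(1, n):
        lower[i] = sub[i] / upper[i - 1]
        upper[i] = diag[i] - lower[i] * sup[i - 1]
    return lower, upper, list(sup)


def lu_solve(factors, rhs):
    lower, upper, sup = factors
    n = len(upper)
    y = list(rhs)
    for i in range(1, n):                # forward substitution with L
        y[i] -= lower[i] * y[i - 1]
    x = [0.0] * n
    x[-1] = y[-1] / upper[-1]
    for i in range(n - 2, -1, -1):       # back substitution with U
        x[i] = (y[i] - sup[i] * x[i + 1]) / upper[i]
    return x


class OnDemandSolver:
    """Reuse a stored factorization until the iteration budget is exceeded."""

    def __init__(self, rtol=1e-10, max_iters=20):
        self.rtol, self.max_iters = rtol, max_iters
        self.factors = None
        self.factorizations = 0

    def _refactorize(self, A):
        self.factors = factorize(*A)
        self.factorizations += 1

    def solve(self, A, b):
        if self.factors is None:
            self._refactorize(A)
        x = [0.0] * len(b)
        for it in range(self.max_iters + 1):
            # preconditioned Richardson: x <- x + P^{-1} (b - A x)
            r = [bi - ai for bi, ai in zip(b, tridiag_matvec(*A, x))]
            if norm(r) <= self.rtol * norm(b):
                return x
            if it == self.max_iters:
                break
            x = [xi + di for xi, di in zip(x, lu_solve(self.factors, r))]
        self._refactorize(A)             # stale factorization: rebuild for this matrix
        return lu_solve(self.factors, b)


solver = OnDemandSolver()
b = [1.0] * 20
for k in range(12):                      # slowly drifting matrices
    x = solver.solve(model_matrix(20, 1.0 + 0.05 * k), b)
print("factorizations:", solver.factorizations, "for 12 solves")
```

Because the model matrices drift slowly, a stale factorization still converges within the budget for a while, so fewer factorizations than solves are needed; an abrupt change in the matrix exhausts the budget and triggers a rebuild.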

SLIDE 17

CFD Solver (On-demand Factorization)

Stop condition: relative, absolute, or a combination of both
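The three stopping rules can be written as one test on the residual norm of the iterative linear solver. A minimal sketch, assuming "relative" means reduction with respect to the initial residual and "combined" means stopping as soon as either threshold is met (similar to the default test of PETSc's KSP solvers, ‖r‖ < max(rtol·‖r0‖, abstol)). The function names, tolerances and the Jacobi iteration used to exercise the test are illustrative.

```python
import math


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def should_stop(res_norm, res0_norm, mode="combined", rtol=1e-3, atol=1e-8):
    """Stopping test for an iterative linear solver (illustrative names)."""
    if mode == "relative":      # residual reduced by the factor rtol
        return res_norm <= rtol * res0_norm
    if mode == "absolute":      # residual below a fixed threshold
        return res_norm <= atol
    if mode == "combined":      # whichever threshold is reached first
        return res_norm <= max(rtol * res0_norm, atol)
    raise ValueError(f"unknown mode: {mode!r}")


def jacobi(A, b, mode, rtol=1e-3, atol=1e-8, max_iters=1000):
    """Jacobi iteration on a dense system; returns (x, iterations used)."""
    n = len(b)

    def residual(x):
        return [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]

    x = [0.0] * n
    r = residual(x)
    r0 = norm(r)
    for it in range(max_iters):
        if should_stop(norm(r), r0, mode, rtol, atol):
            return x, it
        x = [x[i] + r[i] / A[i][i] for i in range(n)]   # x + D^{-1} r
        r = residual(x)
    return x, max_iters


A = [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]]
b = [1.0, 2.0, 3.0]
for mode in ("relative", "absolute", "combined"):
    _, iters = jacobi(A, b, mode, rtol=1e-3, atol=1e-12)
    print(mode, iters)
```

Inside a pseudo-time step a loose relative tolerance is often enough, since the outer iteration only needs an approximate update; the absolute threshold avoids over-solving once the residual is already tiny.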

SLIDE 18


SLIDE 19

Benchmark: Flow around LS89, 2-Stage Runge-Kutta

SLIDE 20

Assembly Acceleration

Figure: assembly, linear solve and global speedup (series: 2, 3 and 4 CPU cores; ILU and ILU OD; CPU and GPU)

Assembly speedups:

  • On coarse mesh: x 7.8
  • On fine mesh: x 12.2

Pie charts: CPU 70% / 30%, GPU 10% / 90%

SLIDE 21

Linear Solver Acceleration

Figure: assembly, linear solve and global speedup (series: 2, 3 and 4 CPU cores; ILU and ILU OD; CPU and GPU)

Linear solve speedups:

  • On coarse mesh: x 0.7, x 1.2
  • On fine mesh: x 1.8, x 5.7

SLIDE 22

Global Acceleration

Figure: assembly, linear solve and global speedup (series: 2, 3 and 4 CPU cores; ILU and ILU OD; CPU and GPU)

Global speedups on the coarse and fine meshes: x 2.0, x 3.2, x 4.8, x 9.6

Suggestions for better performance assessment are very welcome!
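The global speedup is not the average of the assembly and linear-solve speedups; it follows from how the CPU run time splits between the two parts (an Amdahl-style argument). A minimal sketch, assuming the run time consists only of assembly plus linear solve and ignoring transfers and other overheads; the inputs in the usage lines are hypothetical, not the measured LS89 values, and the function names are illustrative.

```python
def global_speedup(assembly_share, assembly_speedup, solve_speedup):
    """Overall speedup when assembly takes `assembly_share` of the CPU run time."""
    solve_share = 1.0 - assembly_share
    accelerated_time = assembly_share / assembly_speedup + solve_share / solve_speedup
    return 1.0 / accelerated_time


def accelerated_time_shares(assembly_share, assembly_speedup, solve_speedup):
    """Fractions of the accelerated run time spent in assembly and in the solve."""
    t_assembly = assembly_share / assembly_speedup
    t_solve = (1.0 - assembly_share) / solve_speedup
    total = t_assembly + t_solve
    return t_assembly / total, t_solve / total


# Hypothetical inputs: assembly is 60 % of the CPU time and runs 12x faster,
# the linear solve runs 2x faster.
print(global_speedup(0.6, 12.0, 2.0))            # about 4x overall
print(accelerated_time_shares(0.6, 12.0, 2.0))   # about (0.2, 0.8)
```

In this hypothetical case the less-accelerated linear solve grows from 40 % of the CPU time to 80 % of the accelerated time, which is why accelerating the solver matters for the global speedup.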

SLIDE 23

Figure: speedup vs. number of Runge-Kutta stages (2 to 6); series: assembly, solve, global

Increase of the speedup for higher numbers of Runge-Kutta stages on the fine mesh

SLIDE 24

Content

  • Multidisciplinary Optimization
  • CFD simulations on GPU
  • Literature review
  • Implicit RANS Implementation
  • Benchmark
  • Optimization Case
SLIDE 25

Topic of Interest

Test Case 3: TU Berlin TurboLab Stator, Optimization Requirements

Objectives:

  • Decrease outflow axial deviation
  • Decrease total pressure loss

Considering 3 operating points

SLIDE 26

Topic of Interest

TurboLab Manufacturing Constraints

  • Number of blades: 15
  • Chord length fixed
  • Casing fixture

Figure dimensions: 60 mm, d = 10 mm, h = 20 mm, d = 2 mm

SLIDE 27

TurboLab: Boundary conditions and summary

  • Inlet total pressure P0: 102713.0 Pa
  • Inlet total temperature T0: 294.314 K
  • Inlet whirl angle: 42°
  • Inlet pitch angle: 0°
  • Mass flow imposed, outlet pressure P2 adapted

Objectives:

  • Decrease outflow axial deviation
  • Decrease total pressure loss

Considering 3 operating points: 9 kg/s +/- 0.1

SLIDE 28

Parametrization: 21 Design Variables

Figure axis: Span [-]

SLIDE 29

TurboLab Parametrization

SLIDE 30

Optimization Results

Figure annotations: 1.7 %, IT074IND6, 60 %, 0.17 %

SLIDE 31

Optimized Blade

SLIDE 32

Baseline Vs Optimized

SLIDE 33

Baseline Vs Optimized

SLIDE 34

Isentropic Mach Number at mid-span

SLIDE 35

Conclusion

  • Optimization
  • GPU Solver with implicit time stepping
  • On-demand (incomplete) Factorization
  • 10x speedup
  • Aerodynamic shape optimization
SLIDE 36

Future Work

Benchmark Case: Transonic Turbine Stator T106c

Figure: speedup relative to the explicit CPU solver vs. mesh size (50k, 450k, 900k); series: CPU explicit, GPU explicit, CPU implicit, GPU implicit

SLIDE 37

Thanks for your attention

Mohamed Hassanine Aissa
Turbomachinery & Propulsion Department
72, Chaussée de Waterloo, B-1640 Rhode-Saint-Genèse, Belgium
Email: aissa@vki.ac.be

Acknowledgements: Support, Hardware