Acceleration of Turbomachinery steady CFD Simulations on GPU
Mohamed H. Aissa 1,2, Dr. Tom Verstraete 1, Prof. Cornelis Vuik 2
1 Turbomachinery Department, Von Karman Institute, Belgium
2 Delft Institute of Applied Mathematics, TU Delft, the Netherlands
Unconventional HPC, EuroPar 2016 WS, Grenoble, 23.08.2016

Topic of Interest: Reduce Fuel Consumption and CO2 Emission (Source: Wikipedia.org)

Turbomachinery is about Performance and Efficiency

Axial Jet Engine (Source: Wikipedia.org)

Content • Multidisciplinary Optimization • CFD simulations on GPU • Literature review • Implicit RANS Implementation • Benchmark • Optimization Case

Optimization algorithm
Derivative-based optimization • fast convergence but .. • derivative evaluation could be complicated and problem specific (adjoint, automatic differentiation)
Derivative-free methods, e.g. population based • Simplicity • Black box approach of the evaluation but .. • Large number of evaluations
min f(x) subject to g(x) ≤ 0
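A population-based, derivative-free search for min f(x) subject to g(x) ≤ 0 can be sketched as below. This is a minimal differential evolution loop and not the CADO optimizer: the test functions `objective` and `constraint`, the feasibility-first comparison in `better`, and the parameter values (`pop_size`, `F`, `CR`, `seed`) are all assumptions chosen for illustration.

```python
import random

def objective(x):
    # Hypothetical test function; its unconstrained minimum is at (1, 2).
    return (x[0] - 1.0) ** 2 + (x[1] - 2.0) ** 2

def constraint(x):
    # Hypothetical constraint; a point is feasible when g(x) <= 0.
    return x[0] + x[1] - 2.0

def better(a, b):
    """True if candidate a should replace b (feasible points win first)."""
    va, vb = max(0.0, constraint(a)), max(0.0, constraint(b))
    if va == 0.0 and vb == 0.0:
        return objective(a) <= objective(b)   # both feasible: lower f wins
    if va == 0.0 or vb == 0.0:
        return va == 0.0                      # exactly one feasible: it wins
    return va <= vb                           # both infeasible: smaller violation wins

def differential_evolution(bounds, pop_size=20, generations=200,
                           F=0.7, CR=0.9, seed=0):
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct members other than the current one.
            a, b, c = rng.sample([p for j, p in enumerate(pop) if j != i], 3)
            forced = rng.randrange(dim)  # at least one mutated component
            trial = []
            for d in range(dim):
                if d == forced or rng.random() < CR:
                    lo, hi = bounds[d]
                    v = a[d] + F * (b[d] - c[d])
                    trial.append(min(max(v, lo), hi))  # clip to the bounds
                else:
                    trial.append(pop[i][d])
            if better(trial, pop[i]):
                pop[i] = trial
    best = pop[0]
    for p in pop[1:]:
        if better(p, best):
            best = p
    return best

best = differential_evolution([(-5.0, 5.0), (-5.0, 5.0)])
print(best, objective(best), constraint(best))
```

Only evaluations of f and g are needed, which is the black-box property listed above; the price is the large number of evaluations (here 20 x 200), which is why each evaluation must be fast.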

CFD: Core of the Optimization • CADO: the VKI in-house optimizer • CFD much slower than CSM • Need for acceleration -> GPU

Steady CFD Simulations • Simulation with a unique solution for given boundary conditions • A start solution is advanced iteratively in time until convergence
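The idea of advancing a start solution in time until it stops changing can be sketched on a scalar model problem. The residual function, time step and tolerance below are hypothetical stand-ins, not the RANS residual of the solver.

```python
def residual(u):
    # Hypothetical scalar residual; the steady state R(u) = 0 is at u = 1.
    return u ** 3 + u - 2.0

def march_to_steady_state(u0, dt=0.1, tol=1e-10, max_iter=10_000):
    """Advance du/dt = -R(u) in pseudo-time until |R(u)| < tol."""
    u = u0
    for it in range(1, max_iter + 1):
        r = residual(u)
        if abs(r) < tol:
            return u, it
        u = u - dt * r  # explicit pseudo-time step
    raise RuntimeError("no convergence within max_iter")

u, iterations = march_to_steady_state(0.0)
print(u, iterations)
```

The time history has no physical meaning here; only the converged state matters, so the time step can be chosen purely for fast and stable convergence.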

Numerical Scheme • Explicit Time Stepping (β = 0) • Implicit Time Stepping (β = 1)
Reference: Aissa, M.H., Verstraete, T., Vuik, C. "Aerodynamic Optimization of Supersonic Compressor Cascade using Differential Evolution on GPU". 13th Int. Conf. of Numerical Analysis and Applied Mathematics (ICNAAM 2015)

Implicit Time Stepping is more Stable but … [Figure: unknowns X1, X2, …, Xn]
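The stability difference can be illustrated on the linear model problem du/dt = -λu. The update form below, (1 + β Δt λ) Δu = -Δt λ u, is an assumed generic β-scheme (β = 0 explicit, β = 1 implicit) and not necessarily the exact scheme of the slides; `lam`, `dt` and `n_steps` are illustrative values.

```python
def step(u, lam, dt, beta):
    # Solve (1 + beta*dt*lam) * du = -dt*lam*u for the update du.
    du = -dt * lam * u / (1.0 + beta * dt * lam)
    return u + du

def run(beta, lam=100.0, dt=0.1, n_steps=20, u0=1.0):
    u = u0
    for _ in range(n_steps):
        u = step(u, lam, dt, beta)
    return u

print("explicit:", run(0.0))  # amplification factor 1 - dt*lam = -9 per step
print("implicit:", run(1.0))  # amplification factor 1 / (1 + dt*lam) = 1/11 per step
```

With the same large time step the explicit update blows up while the implicit one decays. The "but" is the division by (1 + β Δt λ): for a flow solver it becomes a large sparse linear system in the unknowns X1 … Xn that has to be assembled and solved at every step.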

Literature Review
• What to port: only the linear solver when it is dominant; porting both assembly and solve is optimal (no communication)
• Linear solver: library, with code maturity but restrictive (petsc-dev, Paralution, AmgX, ViennaCL …), or own code, with flexibility
• Storage format: standard (CSR, DIA …) or new (hybrid)
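As a reminder of what the standard CSR format stores, here is a minimal pure-Python sketch of a dense-to-CSR conversion and a CSR matrix-vector product. The function names are hypothetical, and a GPU code would run the row loop in parallel rather than sequentially.

```python
def dense_to_csr(dense):
    """Return (values, col_idx, row_ptr) for a dense list-of-lists matrix."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))  # where the next row starts
    return values, col_idx, row_ptr

def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = A x; each row is independent, hence easy to parallelize."""
    y = []
    for i in range(len(row_ptr) - 1):
        s = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            s += values[k] * x[col_idx[k]]
        y.append(s)
    return y

A = [[4, -1, 0],
     [-1, 4, -1],
     [0, -1, 4]]
values, col_idx, row_ptr = dense_to_csr(A)
print(row_ptr)                                           # [0, 2, 5, 7]
print(csr_matvec(values, col_idx, row_ptr, [1.0, 2.0, 3.0]))  # [2.0, 4.0, 10.0]
```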

CFD Solver (Standard) • Implicit Runge-Kutta scheme (Xu et al., JCP 2015) • http://mhaissa.blogspot.be/2015/10/for-paralution-gpu-conversion-and.html

CFD Solver (Standard) Implicit Runge-Kutta scheme

CFD Solver (On-demand Factorization) Implicit Runge-Kutta scheme

CFD Solver (On-demand Factorization) Stop condition: relative, absolute or a combination
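One way to read "on-demand factorization" is that an existing factorization is reused across iterations and recomputed only when convergence degrades. The slides do not spell out the trigger, so the sketch below is an assumption throughout: it refactors when the residual drops by less than a chosen ratio (`max_ratio`), uses an explicit 2x2 inverse as a stand-in for an incomplete LU factorization, and solves a hypothetical 2x2 nonlinear system. The `converged` helper shows a combined relative/absolute stop condition.

```python
import math

def residual(u):
    # Hypothetical 2x2 nonlinear system with root u = (1, 2).
    return [u[0] ** 2 + u[1] - 3.0, u[0] + u[1] ** 2 - 5.0]

def factorize(u):
    # Stand-in for an (incomplete) factorization: inverse of the 2x2 Jacobian.
    a, b, c, d = 2.0 * u[0], 1.0, 1.0, 2.0 * u[1]
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def converged(res, res0, rtol=1e-10, atol=1e-12):
    # Combined stop condition: relative or absolute tolerance reached.
    return res <= max(rtol * res0, atol)

def solve_on_demand(u, max_ratio=0.3, max_iter=100):
    res0 = res = norm(residual(u))
    inv, n_factor = None, 0
    for it in range(1, max_iter + 1):
        if inv is None:               # factorize only when requested
            inv = factorize(u)
            n_factor += 1
        r = residual(u)
        u = [u[0] - (inv[0][0] * r[0] + inv[0][1] * r[1]),
             u[1] - (inv[1][0] * r[0] + inv[1][1] * r[1])]
        new_res = norm(residual(u))
        if converged(new_res, res0):
            return u, it, n_factor
        if new_res > max_ratio * res:
            inv = None                # convergence too slow: refresh next time
        res = new_res
    raise RuntimeError("no convergence within max_iter")

u, n_iter, n_factor = solve_on_demand([1.5, 1.5])
print(u, n_iter, n_factor)
```

The point of the design is that `n_factor` stays below `n_iter`: the expensive factorization is paid for only when the cheap reuse stops converging well.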

Benchmark: Flow around LS89 2-Stages Runge-Kutta 1/12

Assembly Acceleration • Speedups on Coarse Mesh: x 7.8 • Speedups on Fine Mesh: x 12.2 [Figure: bar charts of assembly, linear solve and global speedup; axis labels 2xCores, 3xCores, 4xCores, ILU, ILU OD; group labels Standard, On-demand, CPU, GPU; percentage labels 10%, 70%, 90%, 30%]

Linear Solver Acceleration • Speedups on Coarse Mesh: x 1.2 and x 0.7 • Speedups on Fine Mesh: x 5.7 and x 1.8 [Figure: bar charts of assembly, linear solve and global speedup; axis labels 2xCores, 3xCores, 4xCores, ILU, ILU OD; group labels Standard, On-demand, CPU, GPU]

Global Acceleration • Speedups on Coarse Mesh: x 3.2 and x 2.0 • Speedups on Fine Mesh: x 9.6 and x 4.8 • Suggestions for better performance assessment are very welcome! [Figure: bar charts of assembly, linear solve and global speedup; axis labels 2xCores, 3xCores, 4xCores, ILU, ILU OD; group labels Standard, On-demand, CPU, GPU]
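How assembly and linear-solve speedups combine into a global speedup can be estimated with an Amdahl-style formula. This is a generic sketch, not the authors' measurement method: the assembly time share (`assembly_fraction`) is an assumed input, and pairing the x 12.2 and x 5.7 labels from the charts is also an assumption.

```python
def global_speedup(assembly_fraction, assembly_speedup, solve_speedup):
    """Combine per-phase speedups, weighted by each phase's share of the original run time."""
    solve_fraction = 1.0 - assembly_fraction
    new_time = assembly_fraction / assembly_speedup + solve_fraction / solve_speedup
    return 1.0 / new_time

# Assumed 70% assembly share, with fine-mesh chart values x 12.2 (assembly) and x 5.7 (solve).
print(global_speedup(0.7, 12.2, 5.7))
```

The global value always lies between the two phase speedups and is pulled toward the phase that dominates the run time, which is why a slow linear solve limits the overall gain even when assembly accelerates well.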

Increase of the Speedup for higher Numbers of Runge-Kutta Stages on Fine Mesh [Figure: speedup (0 to 16) versus N stages (2 to 6); series: Assembly, Solve, Global]

Content • Multidisciplinary Optimization • CFD simulations on GPU • Literature review • Implicit RANS Implementation • Benchmark • Optimization Case

Test Case 3: TU Berlin TurboLab Stator • Optimization requirements • Objectives: decrease outflow axial deviation, decrease total pressure loss • Considering 3 operating points

TurboLab Manufacturing Constraints • N blades = 15 • Chord length fixed • Casing fixture [Figure dimensions: 60 mm, d = 10 mm, h = 20 mm, d = 2 mm]

TurboLab: Boundary conditions and summary • Massflow imposed: 9 kg/s +/- 0.1 (P2 adapted) • Inlet P0: 102713.0 Pa • Inlet T0: 294.314 K • Inlet whirl angle: 42° • Inlet pitch angle: 0° • Objectives: decrease outflow axial deviation, decrease total pressure loss • Considering 3 operating points

Parametrization: 21 design variables [Figure: axis label Span [-]]

TurboLab Parameterization

Optimization Results [Figure labels: 0.17%, 1.7%, 60%; individual IT074IND6]

Optimized Blade

Baseline Vs Optimized

Isentropic Mach Number at mid-span
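For reference, the isentropic Mach number is computed from the local static pressure and a reference total pressure with the standard isentropic relation. Taking the inlet total pressure P0 = 102713.0 Pa as reference and γ = 1.4 for air are assumptions here; the slides do not state which reference was used.

```python
import math

def isentropic_mach(p_static, p_total, gamma=1.4):
    """M_is = sqrt( 2/(gamma-1) * ((p_total/p_static)^((gamma-1)/gamma) - 1) )."""
    ratio = p_total / p_static
    return math.sqrt(2.0 / (gamma - 1.0) * (ratio ** ((gamma - 1.0) / gamma) - 1.0))

P0_INLET = 102713.0  # Pa, inlet total pressure from the boundary-condition slide

# Hypothetical static pressure on the blade surface, for illustration only.
print(isentropic_mach(95000.0, P0_INLET))
```

A static pressure equal to the total pressure gives M_is = 0, and the critical pressure ratio (2/(γ+1))^(γ/(γ-1)) gives M_is = 1.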

Conclusion • Optimization • GPU Solver with implicit time stepping • On-demand (incomplete) Factorization • 10x speedup • Aerodynamic shape optimization

Future Work • Benchmark Case: Transonic Turbine Stator T106c [Figure: speedup based on CPU explicit (0 to 80) versus mesh size (50K, 450k, 900k); series: CPU Exp., CPU Imp., GPU Exp., GPU Imp.]

Thanks for your attention
Mohamed Hassanine Aissa, Turbomachinery & Propulsion Department, 72, chaussee de Waterloo, B1640 - Rhode Saint Genese - Belgium
Email: aissa@vki.ac.be
Acknowledgements: Support, Hardware
