  1. Challenges in fluid flow simulations using Exa-scale computing Mahendra Verma IIT Kanpur http://turbulencehub.org mkv@iitk.ac.in

  2. Hardware

  3. From Karniadakis's course slides: "A Growth-Factor of a Billion", roughly 200 PF of performance growth over a career. [Chart: peak performance in Flop/s versus year, from 1 Flop/s in 1941 through 1 KFlop/s (1949), 1 MFlop/s (1964), 1 GFlop/s (1987), 1 TFlop/s (1997), and 131 TFlop/s (2005), continuing toward the PFlop/s range by 2018. Representative machines: EDSAC 1, UNIVAC 1, IBM 7090, CDC 6600, CDC 7600, IBM 360/195, Cray 1, Cray X-MP, Cray 2, TMC CM-2, TMC CM-5, Cray T3D, ASCI Red, ASCI White, IBM BG/L, TITAN; architectures evolve from scalar to super-scalar, vector, and parallel, with transistors per chip doubling every 1.5 years.]

  4. https://www.amd.com/en/products/cpu/amd-epyc-7551 NODE: 2 processors per node; focus on a single node. Flop rating for 2 processors: 2 × 32 cores × 24 GFlop/s = 1536 GFlop/s. To feed this, the node wants data at roughly 8 TB/s from the memory hierarchy (cache, RAM, hard disk).

  5. Data transfer: FLOPS are free, data transfer is expensive (Saday). Memory BW = 341 GB/s; SSD transfer rate = 6 Gbit/s peak; IB switch speed per port = 200 Gb/s.
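A back-of-the-envelope sketch (not from the slides) of why "FLOPS are free, data transfer is expensive" for the node above; it only reuses the numbers quoted on slides 4 and 5, and the per-core GFlop/s figure is the slide's own assumption.

```python
# Compute vs. data-movement balance for the 2-socket AMD EPYC 7551 node
# described above, using only the numbers quoted on the slides.

procs_per_node = 2
cores_per_proc = 32
gflops_per_core = 24.0                  # per-core rate assumed on slide 4

peak_gflops = procs_per_node * cores_per_proc * gflops_per_core
print(f"Peak compute        : {peak_gflops:.0f} GFlop/s")        # 1536 GFlop/s

demanded_bw = 8e12                      # ~8 TB/s wanted by the cores (slide 4)
memory_bw = 341e9                       # 341 GB/s memory bandwidth (slide 5)

print(f"Data demanded       : {demanded_bw/1e12:.0f} TB/s")
print(f"Memory bandwidth    : {memory_bw/1e9:.0f} GB/s")
print(f"Gap (demand/supply) : {demanded_bw/memory_bw:.0f}x")     # roughly 23x
```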

  6. Software challenges

  7. For beginners • Abundance of tools (MPI, OpenMP, CUDA, ML) • leads to confusion and often a non-start • Structured programming • Pressure to do the science • Sometimes CS tools are too complex to be practical.

  8. For advanced users • Optimised use of hardware • Structured, modular, usable code with documentation • Keeping up with upgrades and abundance (MPI-3, ML, C++11, vector processors, GPU, Xeon Phi, Raspberry Pi) • Optimization • Interactions with users and programmers

  9. Now CFD (Computational fluid dynamics)

  10. Applications • Weather prediction and climate modelling • Aeroplanes and cars (transport) • Defence/offence • Turbines, dams, water management • Astrophysical flows • Theoretical understanding

  11. Field reversal with Mani Chandra

  12. Geomagnetism (Glatzmaier & Roberts, Nature, 1995): polarity reversals occur after random time intervals (from about 50,000 years to tens of millions of years). The last reversal took place around 780,000 years ago.

  13. Nek5000 (spectral-element) simulation: flow states (1,1) ➞ (2,2) ➞ (1,1). Chandra & Verma, PRE 2011; PRL 2013.

  14. Methods • Finite difference • Finite volume • Finite element • Spectral • Spectral element

  15. Spectral method

  16. Example: Fluid solver

  17. Incompressible Navier-Stokes equations (velocity $\mathbf{u}$, pressure $p$, external force field $\mathbf{F}$, kinematic viscosity $\nu$):
$$\partial_t \mathbf{u} + (\mathbf{u} \cdot \nabla)\mathbf{u} = -\nabla p + \nu \nabla^2 \mathbf{u} + \mathbf{F}$$
$$\nabla \cdot \mathbf{u} = 0 \quad \text{(incompressibility)}$$
Reynolds number $\mathrm{Re} = UL/\nu$.

  18. Procedure

  19. Fourier representation:
$$f(x) = \sum_{k_x} \hat{f}(k_x)\, \exp(i k_x x)$$
$$\frac{df(x)}{dx} = \sum_{k_x} [\,i k_x \hat{f}(k_x)\,] \exp(i k_x x)$$
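A minimal NumPy sketch (an illustration, not Tarang's code) of the idea on slide 19: on a periodic domain, differentiation becomes multiplication by i·k in Fourier space.

```python
import numpy as np

# Spectral differentiation on a periodic 2*pi domain:
# transform, multiply by i*k, transform back.
N = 64
x = 2 * np.pi * np.arange(N) / N
f = np.sin(3 * x)                        # test function with a known derivative

f_hat = np.fft.fft(f)                    # f(k_x)
k = np.fft.fftfreq(N, d=1.0 / N)         # integer wavenumbers for a 2*pi box
df = np.fft.ifft(1j * k * f_hat).real    # d/dx  ->  i*k in spectral space

print(np.allclose(df, 3 * np.cos(3 * x)))   # True: spectrally accurate
```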

  20. Set of ODEs:
$$\frac{d u_i(\mathbf{k})}{dt} = -i k_i\, p(\mathbf{k}) - \nu k^2 u_i(\mathbf{k}) - i k_m \widehat{u_m u_i}(\mathbf{k})$$
Time advance (e.g., Euler's scheme):
$$u_i(\mathbf{k}, t+dt) = u_i(\mathbf{k}, t) + dt \times \mathrm{RHS}_i(\mathbf{k}, t)$$
Stiff equation for small viscosity $\nu$ (use the exponential trick).
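A hedged sketch of the "exponential trick" (an integrating factor for the viscous term) mentioned on slide 20, written for a single Fourier mode obeying du/dt = -nu k² u + N(u); the helper name and the toy numbers are illustrative, not Tarang's.

```python
import numpy as np

def euler_exponential_step(u_k, nlin_k, nu, k2, dt):
    """One Euler step for du/dt = -nu*k2*u + N using an integrating factor:
    u(t+dt) = exp(-nu*k2*dt) * (u(t) + dt*N(t)).
    The stiff viscous decay is handled exactly, so large nu*k2 no longer
    forces a tiny dt."""
    return np.exp(-nu * k2 * dt) * (u_k + dt * nlin_k)

# Toy usage: a single strongly damped mode with a placeholder nonlinear term.
nu, k2, dt = 0.1, 100.0, 0.01
u_k = 1.0 + 0.0j
nlin_k = 0.05 + 0.0j                     # stand-in for RHS_i(k, t)
print(euler_exponential_step(u_k, nlin_k, nu, k2, dt))
```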

  21. Nonlinear term computation (pseudo-spectral): the Fourier transforms take around 80% of the total time.
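A one-dimensional sketch of the pseudo-spectral evaluation of a nonlinear term such as u·du/dx (an illustration of the method, not Tarang's implementation): derivatives are taken in spectral space, the product is formed in real space, and the result is transformed back, which is why each nonlinear term costs several FFTs. The 2/3-rule dealiasing shown is a common choice and an assumption here.

```python
import numpy as np

def nonlinear_term_pseudospectral(u_hat):
    """Pseudo-spectral u*du/dx for a periodic 1D field, given its Fourier
    coefficients u_hat (length N, 2*pi box)."""
    N = u_hat.size
    k = np.fft.fftfreq(N, d=1.0 / N)           # integer wavenumbers

    u = np.fft.ifft(u_hat).real                # back to real space
    dudx = np.fft.ifft(1j * k * u_hat).real    # derivative via i*k
    prod_hat = np.fft.fft(u * dudx)            # product formed in real space

    # 2/3-rule dealiasing: keep only |k| < N/3 (a common convention).
    return prod_hat * (np.abs(k) < N / 3.0)

# Usage: for u = sin(x), u*du/dx = 0.5*sin(2x), so only |k| = 2 survives.
N = 64
x = 2 * np.pi * np.arange(N) / N
u_hat = np.fft.fft(np.sin(x))
print(np.round(np.abs(nonlinear_term_pseudospectral(u_hat)))[:4])   # [ 0.  0. 16.  0.]
```

Three transforms per nonlinear term (two inverse, one forward) is what makes the FFT the dominant cost quoted on this slide.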

  22. Tarang = wave (Sanskrit). Spectral code (Orszag); one code for many turbulence and instability problems. Very high resolution (6144³) on 196,608 cores of Shaheen II at KAUST. Open source, download from http://turbulencehub.org. Chatterjee et al., JPDC 2018.

  23. Solvers and problems: fluid, MHD/dynamo, scalar, Rayleigh-Bénard convection, stratified flows, Rayleigh-Taylor flow, liquid-metal flows, rotating flow, rotating convection, instabilities, chaos, turbulence. Boundary conditions and geometries: periodic, free-slip, and no-slip BCs; cylinder, sphere, toroid (in progress).

  24. Rich libraries to compute new quantities: spectrum, Fourier modes, fluxes, real-space probes, shell-to-shell transfer, ring spectrum, ring-to-ring transfer, structure functions. Tested up to 6144³ grids.
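As an example of one such diagnostic, here is a minimal NumPy sketch of a shell-summed kinetic energy spectrum E(k) for a periodic 3D velocity field; it illustrates the idea only and is not Tarang's library code.

```python
import numpy as np

def shell_spectrum(ux, uy, uz):
    """Shell-summed kinetic energy spectrum E(k) for a periodic velocity
    field sampled on an N^3 grid (2*pi box assumed)."""
    N = ux.shape[0]
    uxk = np.fft.fftn(ux) / N**3                 # normalised Fourier coefficients
    uyk = np.fft.fftn(uy) / N**3
    uzk = np.fft.fftn(uz) / N**3

    k1d = np.fft.fftfreq(N, d=1.0 / N)           # integer wavenumbers
    kx, ky, kz = np.meshgrid(k1d, k1d, k1d, indexing="ij")
    kmag = np.sqrt(kx**2 + ky**2 + kz**2)

    energy = 0.5 * (np.abs(uxk)**2 + np.abs(uyk)**2 + np.abs(uzk)**2)
    shells = np.rint(kmag).astype(int)           # bin each mode into a shell
    return np.bincount(shells.ravel(), weights=energy.ravel())

# Usage: u = (sin z, 0, 0) puts all the energy into shell k = 1.
N = 32
z = 2 * np.pi * np.arange(N) / N
ux = np.tile(np.sin(z), (N, N, 1))
E = shell_spectrum(ux, np.zeros_like(ux), np.zeros_like(ux))
print(np.argmax(E))                              # 1
```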

  25. Object-oriented design

  26. Basis functions (FFF, SFF, SSF, SSS, ChFF). Basis-independent universal functions (via function overloading), e.g., compute_nlin for (u·∇)u, (b·∇)u, (b·∇)b, (u·∇)T. This gives a general PDE solver: the same functions simulate MHD, convection, etc. (see the sketch below).
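A schematic Python sketch of this basis-independent design (Tarang itself is C++ and uses function overloading; the class and method names below are hypothetical): the solver calls one compute_nlin, and each basis supplies its own transforms and derivatives.

```python
from abc import ABC, abstractmethod

class Basis(ABC):
    """Interface that hides the basis choice (FFF, SFF, SSF, SSS, ChFF, ...)."""

    @abstractmethod
    def forward_transform(self, field_r):
        """Real space -> spectral space."""

    @abstractmethod
    def inverse_transform(self, field_k):
        """Spectral space -> real space."""

    @abstractmethod
    def derivative(self, field_k, axis):
        """Spectral derivative along one axis."""

def compute_nlin(basis, adv_k, field_k):
    """Basis-independent (A . grad)F, written once and reused for
    (u.grad)u, (b.grad)u, (b.grad)b and (u.grad)T."""
    adv_r = [basis.inverse_transform(a) for a in adv_k]       # advecting field A
    nlin_k = []
    for f_k in field_k:                                       # each component of F
        grad_r = [basis.inverse_transform(basis.derivative(f_k, ax))
                  for ax in range(len(adv_k))]
        prod_r = sum(a * g for a, g in zip(adv_r, grad_r))    # A . grad F
        nlin_k.append(basis.forward_transform(prod_r))
    return nlin_k
```

A Fourier basis subclass would wrap the FFT-based operations from the earlier sketches, a Chebyshev basis would supply its own derivative rule, and the solver code above stays unchanged.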

  27. Generated by Doxygen

  28. Parallelization

  29. Parallelized components: spectral transforms (FFT, SFT, Chebyshev), multiplication in real space, and input/output via the HDF5 library.

  30. FFT parallelization:
$$f(x, y, z) = \sum_{k_x} \sum_{k_y} \sum_{k_z} \hat{f}(k_x, k_y, k_z)\, \exp[i(k_x x + k_y y + k_z z)]$$

  31. Slab decomposition: data divided among 4 procs.

  32. Transpose-free FFT: MPI vector datatypes handle the non-consecutive data transfers between processes. [Diagram: an Nx × Ny array split between processes p0 and p1, with blocks exchanged via inter-process communication.] 12-15% faster compared to FFTW.
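A serial NumPy sketch of the slab-decomposition idea on slides 31-32: each "process" transforms its local slab along the contiguous axis, the slabs are exchanged (here an in-memory transpose stands in for the MPI vector-type or all-to-all communication of the real code), and the remaining axis is transformed.

```python
import numpy as np

def slab_fft2(data, nprocs=4):
    """2D FFT computed slab by slab, mimicking the parallel algorithm:
    FFT along the locally contiguous axis, exchange slabs (stand-in for
    MPI communication), then FFT along the other axis."""
    slabs = np.split(data, nprocs, axis=0)          # each rank owns N/nprocs rows

    # Stage 1: every rank transforms along its local (second) axis.
    stage1 = [np.fft.fft(s, axis=1) for s in slabs]

    # "Communication": gather, transpose and redistribute the slabs.
    transposed = np.concatenate(stage1, axis=0).T
    slabs2 = np.split(transposed, nprocs, axis=0)

    # Stage 2: transform along the axis that is now local.
    stage2 = [np.fft.fft(s, axis=1) for s in slabs2]
    return np.concatenate(stage2, axis=0).T         # undo the transpose

# Check against the library 2D FFT.
a = np.random.rand(16, 16)
print(np.allclose(slab_fft2(a), np.fft.fft2(a)))    # True
```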

  33. Pencil decomposition

  34. FFT scaling on Shaheen II at KAUST (Cray XC40, ranked 9th in the Top500), with Anando Chatterjee, Abhishek Kumar, Ravi Samtaney, Bilel Hadri, and Rooh Khurram. Chatterjee et al., JPDC 2018.

  35. [Plot: FFT scaling for 768³, 1536³, and 3072³ grids versus core count, with reference slopes p¹ and n^0.7.]

  36. Tarang scaling on Shaheen at KAUST.

  37. • Weak scaling: when we increase the size of the problem together with the number of procs, the time to solution should stay the same (see the sketch below).
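A small sketch of how weak-scaling efficiency is usually reported; the timing numbers below are invented for illustration and are not Shaheen measurements.

```python
# Weak scaling: the work per process is fixed, so ideally the runtime stays
# constant as the process count grows.  Efficiency = T(1 unit) / T(p units).

timings = {                 # hypothetical seconds per time step, NOT measured data
    1: 10.0,
    64: 10.8,
    512: 11.9,
    4096: 14.2,
}

t_ref = timings[1]
for p, t_p in sorted(timings.items()):
    eff = t_ref / t_p
    print(f"procs = {p:5d}   time/step = {t_p:5.1f} s   weak-scaling efficiency = {eff:6.1%}")
```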

  38. Average flop rating per core: ~1.5% of peak (compare with ~8% on BlueGene/P). Open questions: overlap communication and computation? GPUs? Xeon Phi?

  39. To Petascale & then Exascale

  40. Finite-difference code: a general code, easy to port to GPU and MIC. Collaborators: Roshan Samuel, Fahad Anwer (AMU), Ravi Samtaney (KAUST).

  41. Summary ★ Code development ★ Module development ★ Optimization ★ Porting to large number of processors ★ GPU Porting ★ Testing

  42. Acknowledgements. Students: Anando Chatterjee, Abhishek Kumar, Roshan Samuel, Sandeep Reddy, Mani Chandra, Sumit Kumar & Vijay. Faculty: Ravi Samtaney, Fahad Anwer. Ported to: PARAM (CDAC), Shaheen (KAUST), HPC system at IITK. Funding: Dept. of Science and Technology, India; Dept. of Atomic Energy, India; KAUST (computer time).

  43. Thank you!
