  1. Petascale Computing and Similarity Scaling in Turbulence
     P. K. Yeung, Schools of AE, CSE, ME, Georgia Tech (pk.yeung@ae.gatech.edu)
     NIA CFD Futures Conference, Hampton, VA; August 2012
     Supported by: NSF and NSF/DOE Supercomputer Centers, USA

  2. Petascale and Beyond: Some Remarks
     - The "supercomputer arms race": Earth Simulator (Japan) was No. 1 in 2002 at 40 Teraflops; in 2011 the same speed did not make it into the top 500.
     - Massive parallelism has been the dominant trend, but because of communication and memory-cache issues most actual user codes run at only a few percent of theoretical peak.
     - Multi-core processors provide on-node shared memory.
     - The path to Exascale may require new modes of programming.
     - Tremendous demand for resources: both CPU hours and storage.
     - Advanced cyberinfrastructure is having a transformative impact on research in turbulence and other fields of science and engineering.

  3. Direct Numerical Simulations (DNS)
     - For science discovery: instantaneous flow fields (at all scales) via equations expressing fundamental conservation laws.
     - Navier-Stokes equations with constant density (∇·u = 0):
       ∂u/∂t + u·∇u = −∇(p/ρ) + ν∇²u + f
     - Fourier pseudo-spectral methods, for accuracy and efficiency (a minimal sketch follows below).
     - In our work: homogeneous turbulence (no boundaries); local isotropy means the results are relevant to high-Re turbulent flows.
     - Wide range of scales ⇒ computationally intensive.
     - Tremendous detail, surpassing most laboratory experiments: fundamental understanding, "thought experiments", and help in advancing modeling (both input and output).
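To make the pseudo-spectral formulation concrete, here is a minimal serial sketch in Python/numpy; it is illustrative only (grid size N = 64 and viscosity ν = 0.01 are assumed values), not the author's production Fortran code. It evaluates the dealiased Fourier-space right-hand side of the constant-density Navier-Stokes equations for a divergence-free velocity field, handling the pressure term by projection:

    import numpy as np

    N, nu = 64, 0.01                      # illustrative resolution and viscosity (assumed values)
    k1 = np.fft.fftfreq(N, 1.0 / N)       # integer wavenumbers
    KX, KY, KZ = np.meshgrid(k1, k1, k1, indexing="ij")
    K2 = KX**2 + KY**2 + KZ**2
    K2[0, 0, 0] = 1.0                     # avoid division by zero at the mean mode
    dealias = (np.abs(KX) < N/3) & (np.abs(KY) < N/3) & (np.abs(KZ) < N/3)   # 2/3 rule

    def rhs(uh):
        """du_hat/dt for a divergence-free velocity uh of shape (3, N, N, N) in Fourier space."""
        u = np.array([np.fft.ifftn(uh[i]).real for i in range(3)])   # back to physical space
        conv = np.zeros_like(uh)
        K = (KX, KY, KZ)
        for i in range(3):
            for j in range(3):
                # pseudo-spectral product: form u_j u_i in physical space, differentiate in Fourier space
                conv[i] += 1j * K[j] * np.fft.fftn(u[j] * u[i])
        conv *= dealias                                              # remove aliased modes
        # pressure projection: subtract k_i (k_j conv_j) / k^2 so the result stays divergence-free
        div = (KX*conv[0] + KY*conv[1] + KZ*conv[2]) / K2
        proj = conv - np.array([KX*div, KY*div, KZ*div])
        return -proj - nu * K2 * uh                                  # advection + viscous terms

In the production setting this right-hand side would be advanced with the fourth-order Runge-Kutta scheme quoted on the next slide, and each fftn/ifftn would become the distributed pencil transform of slide 5.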

  4. NSF: Petascale Turbulence Benchmark
     (One of a few benchmarks for acceptance testing of the 11-PF Blue Waters)
     - "A 12288³ simulation of fully developed homogeneous turbulence in a periodic domain for 1 eddy turnover time at a value of R_λ of O(2000)."
     - "The model problem should be solved using a dealiased, pseudospectral algorithm, a fourth-order explicit Runge-Kutta time-stepping scheme, 64-bit floating point (or similar) arithmetic, and a time-step of 0.0001 eddy turnaround times."
     - "Full resolution snapshots of the three-dimensional vorticity, velocity and pressure fields should be saved to disk every 0.02 eddy turnaround times. The target wall-clock time is 40 hours."
     (PRAC grant from NSF, working with the BW Project Team)
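The benchmark statement implies some sobering step and storage counts. A quick back-of-the-envelope check, assuming 64-bit reals and seven saved components per grid point (3 velocity, 3 vorticity, 1 pressure); the actual file layout is not specified here:

    N = 12288                        # grid points per direction
    n_steps = round(1.0 / 1e-4)      # 1 eddy-turnover time at dt = 1e-4  -> 10000 RK4 steps
    n_snaps = round(1.0 / 0.02)      # output every 0.02 turnover times   -> 50 snapshots
    snap_bytes = N**3 * 8 * 7        # 64-bit reals, 7 components per point (assumed layout)
    print(n_steps, n_snaps)                 # 10000 steps, 50 snapshots
    print(snap_bytes / 1e12)                # roughly 104 TB per snapshot
    print(n_snaps * snap_bytes / 1e15)      # roughly 5.2 PB of snapshot output within 40 wall-clock hours

Rough as they are, these numbers show why the I/O and data-management issues on later slides matter as much as the floating-point rate.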

  5. 2D Domain Decomposition
     - Partition a cube along two directions, into "pencils" of data.
     - 3D FFT from physical space to wavenumber space, starting with pencils in x:
       transform in x, transpose to pencils in z, transform in z, transpose to pencils in y, transform in y
       (a serial check of this ordering is sketched below).
     - Up to N² cores for an N³ grid.
     - Transposes by message-passing (MPI): 2D processor grid, collective communication, M1 (rows) × M2 (cols).
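The transform order above can be verified with a few lines of serial numpy standing in for the parallel code; each "transpose" here is just a choice of axis, whereas in the MPI implementation it is a collective all-to-all across one row or column of the M1 × M2 processor grid. This is an illustrative sketch, not the production transpose:

    import numpy as np

    N = 32
    u = np.random.rand(N, N, N)             # physical-space field, axes ordered (x, y, z)

    uh = np.fft.fft(u,  axis=0)             # 1D FFTs along x (data held as x-pencils)
    uh = np.fft.fft(uh, axis=2)             # after "transpose to z-pencils", 1D FFTs along z
    uh = np.fft.fft(uh, axis=1)             # after "transpose to y-pencils", 1D FFTs along y

    assert np.allclose(uh, np.fft.fftn(u))  # same result as a direct 3D FFT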

  6. Factors Affecting Performance
     Much more than the number of operations...
     - Domain decomposition: the "processor grid geometry".
     - Load balancing: are all CPU cores equally busy?
     - Software libraries, compiler optimizations.
     - Computation: cache size and memory bandwidth, per core.
     - Communication: bandwidth and latency, per MPI task.
     - Memory copies due to non-contiguous messages.
     - I/O: filesystem speed and capacity; control of traffic jams.
     - Environment variables, network topology.
     - In practice: job turnaround, scheduler policies, and CPU-hour economics.

  7. Current Petascale Implementations
     - Pure MPI: performance dominated by collective communication; usually 85-90% strong scaling with every doubling of core count.
     - Hybrid MPI + OpenMP (multithreaded): shared memory on node, distributed memory across nodes; less communication overhead, so it may scale better than pure MPI at large problem size and large core count; memory-affinity issues are system-dependent.
     - Co-Array Fortran (a Partitioned Global Address Space language): remote-memory addressing in place of MPI communication; key routines written by a Cray expert (R. A. Fiedler) on the Blue Waters project; significantly faster on the Cray XK6 (using 131072 cores).

  8. DNS Code: Parallel Performance
     - Largest tests on the 2+ Petaflop Cray XK6 (Jaguarpf at ORNL): 4096³ (circles) and 8192³ (triangles), 4th-order RK.
     - [Figure: CPU time per step versus number of cores, for MPI-OpenMP and for MPI + CAF]
     - Pure MPI, best processor grid, stride-1 arithmetic.
     - Dealiasing: can skip some (high-k) modes in Fourier space.
     - Better scaling when scalars are added (blue, more work per core).

  9. Future Optimization Strategies
     - Advanced MPI: one-sided communication; let the sending task write directly into memory on the receiving task.
     - Overlap between computation and communication (a sketch follows below): not a new idea, but tricky to do, with little hardware support; not very effective if there is not much to overlap.
     - Serialized threads: let some OpenMP threads communicate while others compute.
     - GPUs and accelerators: speed up computation and support very large thread counts, but data must be copied between GPU and CPU.
     - Or, shall we change the numerical method? (Consider the degree of need for communication.)
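As one concrete illustration of the overlap idea, here is a hedged sketch written with mpi4py rather than the author's Fortran, assuming an MPI-3 library with non-blocking collectives: start a non-blocking all-to-all for one block of pencils, do local FFT work on another block while the messages are in flight, then wait.

    from mpi4py import MPI
    import numpy as np

    comm  = MPI.COMM_WORLD
    nproc = comm.Get_size()
    n     = 8                                  # illustrative chunk size per rank (assumed)

    send_a = np.random.rand(nproc, n)          # block "a": ready to be transposed
    recv_a = np.empty_like(send_a)
    work_b = np.random.rand(n, n)              # block "b": local work available for overlap

    req = comm.Ialltoall(send_a, recv_a)       # non-blocking collective starts the "transpose"
    fft_b = np.fft.fft(work_b, axis=0)         # compute on block "b" while messages are in flight
    req.Wait()                                 # block "a" has now been exchanged

Whether this actually hides communication depends on having enough independent work per block and on asynchronous progress in the MPI library, which is the "little hardware support" caveat above.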

  10. Turbulence: Uses of High-End HPC
     - A wider range of scales (in space and/or time): higher Reynolds number (always!); mixing at high Schmidt number (Sc = ν/D), which brings smaller scales; very low Sc, which requires small time steps (fast molecular diffusion).
     - Improved accuracy at the small scales: fine-scale intermittency, thin reaction zones.
     - Longer simulations for better sampling or temporal evolution; the amount of data is also a challenge.
     - More complex physics, coupled with other phenomena: e.g. stratification, rotation, MHD.
     - More complex boundary conditions: channel, boundary layer, mixing layer, etc. (still canonical).

  11. Extreme Events and Intermittency
     - Dissipation: ε = 2ν s_ij s_ij (strain rates squared). Enstrophy: Ω = ν ω_i ω_i (rotation rates squared). (A sketch of forming both fields from velocity gradients follows below.)
     - Same mean values in homogeneous turbulence, but moments and PDFs can be different.
     - Both represent the small scales, but most data sources suggest enstrophy is more intermittent, contrary to the expectation of similar scaling at high Reynolds number (Nelkin 1999).
     - Strong dissipation/straining can pull flame surfaces apart, while strong rotation leads to preferential particle concentration in multiphase flows.
     - Difficulties in resolution and sampling: the inherent nature of infrequent but extreme events.
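For reference, both fields can be formed directly from velocity gradients. Below is a minimal Python/numpy sketch, assuming a periodic field so that spectral differentiation applies; N, ν, and the random placeholder velocity are illustrative only (a random field is not solenoidal, so the two means will not coincide here as they do in homogeneous turbulence):

    import numpy as np

    N, nu = 64, 0.01
    k = np.fft.fftfreq(N, 1.0 / N)
    KX, KY, KZ = np.meshgrid(k, k, k, indexing="ij")
    K = (KX, KY, KZ)

    u = np.random.rand(3, N, N, N)              # placeholder velocity field (not solenoidal)

    # velocity-gradient tensor A[i, j] = du_i/dx_j via spectral differentiation
    A = np.empty((3, 3, N, N, N))
    for i in range(3):
        uh = np.fft.fftn(u[i])
        for j in range(3):
            A[i, j] = np.fft.ifftn(1j * K[j] * uh).real

    S = 0.5 * (A + A.transpose(1, 0, 2, 3, 4))  # strain-rate tensor s_ij
    eps = 2.0 * nu * np.sum(S * S, axis=(0, 1)) # dissipation field, 2*nu*s_ij*s_ij
    omega = np.array([A[2, 1] - A[1, 2],        # vorticity omega = curl(u)
                      A[0, 2] - A[2, 0],
                      A[1, 0] - A[0, 1]])
    Omega = nu * np.sum(omega**2, axis=0)       # enstrophy, scaled by nu to share the mean of eps
    # for a solenoidal homogeneous field, eps.mean() and Omega.mean() coincide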

  12. 3D Visualization [TACC visualization staff]
      2048³, R_λ ≈ 650: intense enstrophy (red) has worm-like structure, while dissipation (blue) is more diffuse.

  13. PDFs of Dissipation and Enstrophy
      From Yeung et al., J. Fluid Mech. 2012 (Vol. 700; Focus on Fluids)
     - Highest Re, and best-resolved at moderate Re (both 4096³).
     - [Figure: PDFs of ε/⟨ε⟩ and Ω/⟨Ω⟩ at R_λ 240 and R_λ 1000]
     - High Re: the most intense events in both are found to scale similarly.
     - Higher-order moments also become closer.

  14. JPDF of Dissipation and Enstrophy
     - Do intense ε and intense Ω tend to occur together?
     - [Figure: joint PDFs of ε/⟨ε⟩ and Ω/⟨Ω⟩ at R_λ 240 and R_λ 1000; contours in the first quadrant, logarithmic intervals]
     - Yes, for the most intense fluctuations, at R_λ 1000 (and 650). (A sketch of forming such a joint PDF is given below.)
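The joint statistics behind such contour plots reduce to a two-dimensional histogram on logarithmically spaced bins. A minimal sketch, using independent lognormal surrogate samples in place of actual DNS fields (bin range, sample count, and the surrogate distributions are arbitrary choices for illustration):

    import numpy as np

    rng   = np.random.default_rng(0)
    eps   = rng.lognormal(0.0, 1.0, 1_000_000)   # surrogate for dissipation samples
    Omega = rng.lognormal(0.0, 1.2, 1_000_000)   # surrogate for enstrophy samples

    x = eps / eps.mean()                         # normalize by the mean, as on the plot axes
    y = Omega / Omega.mean()

    edges = np.logspace(-3, 3, 121)              # logarithmically spaced bin edges
    H, xe, ye = np.histogram2d(x, y, bins=[edges, edges], density=True)
    # H[i, j] approximates the joint PDF in the cell bounded by xe[i:i+2] and ye[j:j+2]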

  15. Database and Data Management
     - Three 4096³ simulations have been performed, aimed at: Lagrangian statistics at the highest Re feasible; improved resolution of the smallest scales; and higher Schmidt number for turbulent mixing. (A fourth is planned, for mixing at very low Schmidt number.)
     - Several hundred Terabytes of data, mostly restart files that can be analyzed to answer various physical questions.
     - How best to keep and organize the data at national centers; how best to share the data with other researchers (and/or work with them to extract the statistics they need).
     - Cyber challenges, e.g. data management, are non-trivial.

  16. Concluding Remarks
      Successful extreme-scale DNS will require:
     - Deep engagement with top HPC experts and vendors' staff.
     - Attention to communication, memory, and data, rather than raw speed.
     - Insights about the science: what will be most useful to compute, that cannot be obtained otherwise?
     - Coping with competition for hours, which are in high demand from other disciplines.
     Q.: Will we be ready for Exascale in 2018?
