
Quinoa, April 18, 2017 - PowerPoint PPT Presentation

Quinoa: Adaptive Computational Fluid Dynamics. J. Bakosi, R. Bird, C. Junghans, R. Pavel, J. Waltz (Los Alamos National Laboratory); F. Gonzalez (University of Illinois Urbana-Champaign); B. Rogers (University of Tennessee).


  1. Quinoa: Adaptive Computational Fluid Dynamics
     J. Bakosi, R. Bird, C. Junghans, R. Pavel, J. Waltz (Los Alamos National Laboratory)
     F. Gonzalez (University of Illinois Urbana-Champaign)
     B. Rogers (University of Tennessee)
     April 18, 2017, LA-UR-17-22931
     https://github.com/quinoacomputing/quinoa
     Goal: hardware-adaptive large-scale multiphysics
     ◮ Fluid dynamics, turbulence, particle transport, chemistry, plasma physics of non-ideal multiple mixing materials
     ◮ Automatic dynamic computational load redistribution for real-world problems
     ◮ Preserving the domain scientist’s sanity
     Agenda:
     ◮ Philosophy
     ◮ Infrastructure
     ◮ Two tools: particle solver, unstructured-grid PDE solver
     ◮ Future plan

  2. Philosophy
     ◮ Partition everything
     ◮ Be asynchronous everywhere
     ◮ Automate everything
     ◮ Remember that everything fails
     Strategy
     ◮ Most physics codes start with capability; software engineering is an afterthought
     ◮ We start with a state-of-the-art production code, then put in physics
     ◮ From scratch: not based on existing code
     ◮ C++11 & Charm++ (fully asynchronous, distributed-memory parallel)
     Funding & history
     ◮ Started as a hobby project in 2013 (weekends and nights)
     ◮ First funding: Oct 2016
     Work in progress

  3. Infrastructure
     ◮ 46K lines of code
     ◮ 20+ third-party libraries, 3 compilers
     ◮ Unit and regression tests
     ◮ Open source: https://github.com/quinoacomputing/quinoa
     ◮ Continuous integration (build & test matrix) with Travis & TeamCity
     ◮ Continuous quantified test code coverage with Gcov & CodeCov.io
     ◮ Continuous quantified documentation coverage with CodeCov.io
     ◮ Continuous static analysis with CppCheck & SonarQube
     ◮ Continuous deployment (of binary releases) to DockerHub
     Ported to Linux, Mac, Cray (LANL, NERSC), Blue Gene/Q (ANL)

  4. Current tools
     1. walker – Random walker for stochastic differential equations
     2. inciter – Partial differential equations solver on 3D unstructured grids
     3. rngtest – Random number generator test suite
     4. unittest – Unit test suite
     5. meshconv – Mesh file converter

  5. Quinoa::Walker
     ◮ Particle solver
     ◮ Numerical integrator for stochastic differential equations
     ◮ Used to analyze and design the evolution of fluctuating variables and their statistics
     ◮ Used in production for the design of statistical moment approximations required for modeling mixing materials in turbulence
     ◮ Future plan: Predict the probability density function in turbulent flows

     $$\frac{\partial}{\partial t} F(Y,t) = -\sum_{\alpha=1}^{N-1} \frac{\partial}{\partial Y_\alpha} \left[ A_\alpha(Y,t) F(Y,t) \right] + \frac{1}{2} \sum_{\alpha=1}^{N-1} \sum_{\beta=1}^{N-1} \frac{\partial^2}{\partial Y_\alpha \partial Y_\beta} \left[ B_{\alpha\beta}(Y,t) F(Y,t) \right]$$

     $$\mathrm{d}Y_\alpha(t) = A_\alpha(Y,t)\,\mathrm{d}t + \sum_{\beta=1}^{N} b_{\alpha\beta}(Y,t)\,\mathrm{d}W_\beta(t), \quad \alpha = 1,\dots,N, \quad B_{\alpha\beta} = b_{\alpha\gamma} b_{\gamma\beta}$$
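
As a rough illustration of what "numerical integrator for stochastic differential equations" means, the sketch below advances a particle ensemble for a scalar equation of the form dY = A dt + b dW with the Euler-Maruyama method. It is written in Python for brevity (Quinoa itself is C++11 and Charm++) and does not reproduce Walker's code. The choice of an Ornstein-Uhlenbeck process (A = -theta*Y, b = sigma), the function names, and all parameter values are assumptions made for this example.

```python
import math
import random


def euler_maruyama(drift, diffusion, y0, dt, nsteps, rng):
    """Advance one particle of dY = A(Y,t) dt + b(Y,t) dW by nsteps steps."""
    y, t = y0, 0.0
    sqrt_dt = math.sqrt(dt)
    for _ in range(nsteps):
        dw = rng.gauss(0.0, 1.0) * sqrt_dt  # Wiener increment with variance dt
        y += drift(y, t) * dt + diffusion(y, t) * dw
        t += dt
    return y


def integrate_ensemble(npar, theta=1.0, sigma=1.0, y0=1.0,
                       dt=0.01, nsteps=400, seed=0):
    """Integrate npar independent particles of an assumed test SDE."""
    rng = random.Random(seed)
    drift = lambda y, t: -theta * y   # A(Y,t), assumed for illustration
    diffusion = lambda y, t: sigma    # b(Y,t), assumed for illustration
    return [euler_maruyama(drift, diffusion, y0, dt, nsteps, rng)
            for _ in range(npar)]


ensemble = integrate_ensemble(npar=2000)
mean = sum(ensemble) / len(ensemble)
var = sum((y - mean) ** 2 for y in ensemble) / len(ensemble)
print(f"mean = {mean:.3f}, variance = {var:.3f}")
```

For this test case the exact mean decays as y0*exp(-theta*t) and the variance approaches sigma^2/(2*theta), so the ensemble statistics at t = 4 should be close to 0.018 and 0.5, up to sampling and time-discretization error.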

  6. Walker SDAG for each PE
     [Figure: task graph with nodes AdvP, OrdM, CenM, OutS, EvT, OrdP, CenP, OutP, NoSt]
     AdvP – advance particles
     OrdM – estimate ordinary moments
     CenM – estimate central moments, e.g., ⟨(y − ⟨Y⟩)²⟩
     OutS – output statistical moments
     EvT – evaluate time step
     OrdP – estimate ordinary PDFs
     CenP – estimate central PDFs, e.g., F(y − ⟨Y⟩)
     OutP – output PDFs
     NoSt – no stats, nor PDFs
     src/Walker/distributor.ci
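
The statistics named in this task graph can be illustrated with a small Python sketch: ordinary moments, central moments, and a histogram-based discrete PDF estimated from a list of particle values. The function names and the fixed-width binning are assumptions for this example; they are not taken from Walker's source.

```python
import math


def ordinary_moment(samples, order=1):
    """Ensemble average of y**order."""
    return sum(y ** order for y in samples) / len(samples)


def central_moment(samples, order=2):
    """Ensemble average of (y - <Y>)**order."""
    mean = ordinary_moment(samples, 1)
    return sum((y - mean) ** order for y in samples) / len(samples)


def discrete_pdf(samples, binsize, center=0.0):
    """Histogram estimate of the PDF of (y - center).

    Returns {bin midpoint: density}; pass center=<Y> for a central PDF.
    """
    counts = {}
    for y in samples:
        k = math.floor((y - center) / binsize)
        counts[k] = counts.get(k, 0) + 1
    n = len(samples)
    return {(k + 0.5) * binsize: c / (n * binsize)
            for k, c in sorted(counts.items())}


samples = [1.0, 2.0, 3.0, 4.0]
mean = ordinary_moment(samples)          # <Y>
variance = central_moment(samples, 2)    # <(y - <Y>)^2>
pdf = discrete_pdf(samples, binsize=1.0)
central_pdf = discrete_pdf(samples, binsize=1.0, center=mean)
```

Each density is the bin count divided by the sample count times the bin width, so the estimated PDF integrates to one over the occupied bins.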

  7. Walker weak scaling with up to 3x10^9 particles
     [Figure: wall clock time (sec, 0 to 1000) versus number of CPU cores (24/node, 10^2 to 10^5), with an ideal-scaling reference; points labeled 240, 1200, 2400, 12000, 24000]

  8. Quinoa::Walker future plan
     ◮ Goal: Predict the probability density function in turbulent flows
     ◮ Why: Because it requires fewer approximations
     ◮ How: Integrate a large particle ensemble governed by stochastic differential equations
     ◮ The ensemble represents the fluid itself
     ◮ Statistics and the discrete PDF extracted from the ensemble in cells
     [Figure: turbulent kinetic energy versus time, PDF and DNS results for A=0.05, A=0.25, A=0.5; annotations: "Equilibrium flow: fully developed turbulence (models exist)" and "Non-equilibrium flow: laminar-turbulent transition (no models, very difficult to predict)"; inset labels: light, heavy, g, A]
     [Figure: probability versus density for A = 0.5, PDF and DNS results at t=0, 1.7, 2.4, 2.5, 3.0, 3.8]

  9. Quinoa::Inciter
     ◮ PDE solver for 3D unstructured (tet-only) grids
     ◮ Native Charm++ code using MPI-only libs: hypre, Zoltan2
     ◮ Simple Navier-Stokes solver for compressible flows
     ◮ Finite elements
     ◮ Flux-corrected transport
     ◮ Asynchronous linear system assembly
     ◮ File/PE I/O
     ◮ Current work: adaptive mesh refinement, V&V
     ◮ Future plan: use AMR to explore scalability with large load-imbalances

  10. Flux-corrected transport
      ◮ Used when stuff (e.g., energy) moves from A to B (i.e., all the time)
      ◮ Godunov theorem: No linear scheme of order greater than one will yield monotonic (wiggle-free) numerical solutions.
      ◮ A solution: Use a nonlinear scheme
      ◮ Combine a low-order scheme (guaranteed to be monotonic) with a high-order scheme (more accurate) in a nonlinear fashion
      [Figure: comparison of exact, low-order, high-order, and FCT solutions]
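
The idea of combining a low-order and a high-order scheme nonlinearly can be sketched in one dimension. The Python example below solves linear advection on a periodic grid with an upwind scheme as the low-order method, Lax-Wendroff as the high-order method, and a Zalesak-type limiter on the antidiffusive flux. All three choices are assumptions made for this illustration: the slides do not say which limiter Inciter uses, and Inciter works with finite elements on 3D tetrahedral grids rather than a 1D finite-difference grid.

```python
def fct_step(u, c, limit=True):
    """One step of u_t + a u_x = 0 on a periodic grid; c = a*dt/dx in [0, 1]."""
    n = len(u)
    # Low-order (upwind) update, monotone for 0 <= c <= 1.
    low = [u[i] - c * (u[i] - u[i - 1]) for i in range(n)]
    # Antidiffusive flux at face i+1/2: Lax-Wendroff flux minus upwind flux.
    anti = [0.5 * c * (1.0 - c) * (u[(i + 1) % n] - u[i]) for i in range(n)]
    if limit:
        r_plus, r_minus = [], []
        for i in range(n):
            nbrs = (low[i - 1], low[i], low[(i + 1) % n])
            p_plus = max(0.0, anti[i - 1]) - min(0.0, anti[i])   # into cell i
            p_minus = max(0.0, anti[i]) - min(0.0, anti[i - 1])  # out of cell i
            q_plus = max(nbrs) - low[i]    # allowed increase
            q_minus = low[i] - min(nbrs)   # allowed decrease
            r_plus.append(min(1.0, q_plus / p_plus) if p_plus > 0.0 else 0.0)
            r_minus.append(min(1.0, q_minus / p_minus) if p_minus > 0.0 else 0.0)
        for i in range(n):
            j = (i + 1) % n
            # A positive flux at face i+1/2 leaves cell i and enters cell j.
            if anti[i] >= 0.0:
                anti[i] *= min(r_plus[j], r_minus[i])
            else:
                anti[i] *= min(r_plus[i], r_minus[j])
    # Add the (limited) antidiffusive correction to the low-order solution.
    return [low[i] - (anti[i] - anti[i - 1]) for i in range(n)]


def advect(u, c, nsteps, limit=True):
    """Apply fct_step nsteps times; limit=False gives plain Lax-Wendroff."""
    for _ in range(nsteps):
        u = fct_step(u, c, limit)
    return u


u0 = [1.0 if 10 <= i < 20 else 0.0 for i in range(50)]  # square pulse
u_fct = advect(u0, 0.5, 40)
u_lw = advect(u0, 0.5, 1, limit=False)
print("FCT range:", min(u_fct), max(u_fct))
print("Lax-Wendroff range after one step:", min(u_lw), max(u_lw))
```

With the limiter switched off, the high-order scheme under- and overshoots the initial range [0, 1] after a single step, which is the behavior the Godunov theorem predicts for linear schemes above first order. With the limiter on, the solution stays within that range while the total is conserved.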

  11. Matrix assembly
      Matrix distributed across PEs (Charm++ group)
      [Figure: group elements L1, L2, L3 and worker array elements C1 to C9 distributed across PEs]
      L1, L2, ... – LinSysMerger Charm++ group elements
        - interact with MPI-only linear system solver lib
        - do not migrate
      C1, C2, ... – Carrier worker Charm++ array elements
        - perform heavy-lifting of physics
        - migrate (not yet but will)
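
A serial Python sketch can convey the general pattern of workers contributing matrix entries to per-PE mergers. It is only a schematic under stated assumptions: the `Merger` class, the contiguous block-row ownership rule in `owner_of`, and the triplet format are hypothetical, and no Charm++ or hypre calls are shown. The point is that entries for shared mesh nodes are summed by the owning merger, and because addition commutes, the order in which contributions arrive does not change the result.

```python
from collections import defaultdict


def owner_of(row, nrows, npes):
    """Hypothetical ownership rule: contiguous blocks of rows per PE."""
    chunk = -(-nrows // npes)  # ceiling division
    return row // chunk


class Merger:
    """Stand-in for one per-PE merger: accumulates the rows it owns."""

    def __init__(self):
        self.rows = defaultdict(lambda: defaultdict(float))

    def contribute(self, triplets):
        for i, j, value in triplets:
            self.rows[i][j] += value  # entries for shared nodes add up

    def as_dict(self):
        return {i: dict(cols) for i, cols in self.rows.items()}


def assemble(worker_contribs, nrows, npes):
    """Route each worker's (row, col, value) triplets to the owning merger."""
    mergers = [Merger() for _ in range(npes)]
    for triplets in worker_contribs:
        outbox = defaultdict(list)
        for i, j, value in triplets:
            outbox[owner_of(i, nrows, npes)].append((i, j, value))
        for pe, batch in outbox.items():
            mergers[pe].contribute(batch)
    return mergers


# Two workers holding one 1D element each; node 1 is shared between them.
worker_a = [(0, 0, 1.0), (0, 1, -1.0), (1, 0, -1.0), (1, 1, 1.0)]
worker_b = [(1, 1, 1.0), (1, 2, -1.0), (2, 1, -1.0), (2, 2, 1.0)]
merged = assemble([worker_a, worker_b], nrows=3, npes=2)
merged_reversed = assemble([worker_b, worker_a], nrows=3, npes=2)
```

Here rows 0 and 1 land on the first merger and row 2 on the second, and the diagonal entry of the shared row 1 receives one contribution from each worker.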

  12. Inciter SDAG for each PE
      ChRow – chares contribute their global row IDs
      ChBC – chares contribute their BC node IDs
      RowComplete – all groups have finished their row IDs
      Init – chares initialize
      dt – chares compute their next ∆t
      Aux – Low order solution
      Solve – Call hypre to solve linear system
      Asm* – Assemble RHS/LHS/UNK
      Hypre* – Convert RHS/LHS/UNK to hypre data structure
      src/LinSys/linsysmerger.ci

  13. Compressible Navier-Stokes, 794M (setup, 100 time steps, no I/O)
      [Figure: wall clock time (sec, 10^1 to 10^4) versus number of CPU cores (36/node, 10^2 to 10^5); curves: Navier-Stokes RCB, Navier-Stokes MJ, ideal; points labeled 900, 1800, 2520, 3600, 7200, 14400, 21600, 36000; annotation: ~50Kel/PE]

  14. Quinoa::Inciter future plan
      ◮ Now: Distributed-memory-parallel asynchronous AMR
      ◮ Next: Explore scalability with large load-imbalances (migration)
      ◮ Future:
        ◮ Asynchronous I/O
        ◮ Explore various threading and SIMD abstractions
        ◮ Explore CERN’s ROOT framework for data storage, statistical analysis, and visualization
        ◮ Fault tolerance
      Waltz, Int. J. Numer. Meth. Fluids, 2004.

  15. Acknowledgments
      TPLs: Charm++, Parsing Expression Grammar Template Library, C++ Template Unit Test Framework, Boost, Cartesian product, PStreams, HDF5, NetCDF, Trilinos: SEACAS, Zoltan2, Hypre, RNGSSE2, TestU01, PugiXML, BLAS, LAPACK, Adaptive Entropy Coding library, libc++, libstdc++, MUSL libc, OpenMPI, Intel Math Kernel Library, H5Part, Random123
      Compilers: Clang, GCC, Intel
      Tools: Git, CMake, Doxygen, Ninja, Gold, Gcov, Lcov, NumDiff
