Kokkos Implementation of Albany: Towards a Performance Portable Finite Element Code


  1. Los Alamos National Laboratory, LA-UR-16-22225. Kokkos Implementation of Albany: Towards a Performance Portable Finite Element Code. I. Demeshko, O. Guba, R. P. Pawlowski, A. G. Salinger, W. F. Spotz, I. K. Tezaur, and M. A. Heroux. 04/07/2016. Operated by Los Alamos National Security, LLC for the U.S. Department of Energy's NNSA.

  2. Performance Portability

  3. Performance Portability

  4. Performance Portability: an exascale system brings a new architecture, new programming models, and new libraries.

  5. (figure-only slide)

  6. (figure-only slide)

  7. Albany: an agile component-based parallel unstructured mesh application. A finite-element-based application development environment containing the "typical" building blocks needed for rapid deployment and prototyping of analysis capabilities; a Trilinos demonstration application, built almost exclusively from reusable libraries (Albany leverages 100+ packages/libraries). Albany structure: software quality tools (version control, main build system, input parser, regression testing), libraries behind an open-source interface (analysis tools: optimization, UQ; nonlinear model; discretization: mesh database, mesh tools, mesh I/O, load balancing; solvers: nonlinear, transient, iterative linear, multi-level; many-core node: node kernels, multi-core, accelerators; PDE assembly: field manager, PDE terms, discretization), and demo apps. Strategic goal: to enable the rapid development of new production codes embedded with transformational capabilities.

  8. Albany supports a wide variety of application physics areas: heat transfer, fluid dynamics, quantum device modeling, structural mechanics, and climate modeling.

  9. The Albany team is rapidly developing several new component-based applications:
     1. Turbulent CFD for nuclear energy [NE]
     2. Computational mechanics R&D [ASC]
     3. Quantum device design [LDRD]
     4. Extended MHD [ASCR]
     5. CRADA partner's in-house code [CRADA]
     6. Peridynamics solver [ASC]
     7. Biogeochemical element cycling: climate [SciDAC]
     8. Fuel rod degradation modeling [NE]
     9. Ice sheet dynamics [SciDAC]
     10. Atmospheric dynamics [LDRD]
     + impacting many others.
     Codes are born parallel, scalable, and robust, with sensitivities, optimization, and UQ, and are ready to adopt embedded UQ, multi-core kernels, adaptivity, code coupling, and ROM.

  10. Our goal: to create an architecture-portable version of Albany by using the Kokkos library.

  11. Albany-to-Kokkos refactoring. Albany builds on Trilinos packages: Phalanx manages dependencies between the different components of Albany; Intrepid provides compatible discretizations of partial differential equations; Tpetra implements linear algebra objects, including sparse graphs, sparse matrices, and dense vectors; Kokkos manages data in the code; Piro provides a library of interoperable solver and analysis tools; MueLu provides multigrid preconditioning.

  12. A new Albany-Kokkos implementation:
      • has Kokkos::Views at the base layer
      • has Kokkos::View-like temporary data
      • has Kokkos kernels replacing the original nested loops
      • is a single code base that runs and is performant on diverse HPC architectures

  13. FELIX: the Albany Greenland Ice Sheet model

  14. The Albany FELIX project: an unstructured-grid finite element ice sheet code for land-ice modeling (Greenland, Antarctica). Project objectives: provide sea level rise prediction; run on new-architecture machines (hybrid systems). Roughly 50% of the time is spent in FE assembly and 50% in linear solves. Funding source: SciDAC. Collaborators: SNL, ORNL, LANL, LBNL, UT, FSU, SC, MIT, NCAR. Sandia staff: A. Salinger, I. Kalashnikova, M. Perego, R. Tuminaro, J. Jakeman, M. Eldred.

  15. Phalanx evaluation graph for the Greenland Ice-Sheet model (figure): Gather Solution, Gather Coordinate Vector, Compute Basis Functions, and Load State Field feed the interpolation evaluators (VecInterpolation, VecGradInterpolation, GradInterpolation), which feed ViscosityFO and Stokes BodyForce, then Stokes Resid, and finally Scatter Stokes.

  16. Kokkos implementation (Greenland Ice-Sheet model): loop over the number of worksets; copy the solution vector to the device; evaluate the Phalanx graph (Gather Solution through Scatter Stokes) on the device; copy the residual vector back to the host.

  17. Kokkos functor example in Albany.

  18. FELIX performance results. Evaluation environment: Shannon, 32 nodes; two 8-core Sandy Bridge Xeon E5-2670 @ 2.6 GHz per node (HT deactivated); 128 GB DDR3 memory per node; 2x NVIDIA K20x/K40 per node. Configurations: Serial = 2 MPI processes; OpenMP = 16 OpenMP threads; CUDA = 1 NVIDIA K80 GPU; UVM for CPU-GPU data management.

  19. FELIX performance results. Evaluation environment: Titan, 18,688 AMD Opteron nodes; 16 cores per node; 1 K20X Kepler GPU per node; 32 GB + 6 GB memory per node.

  20. (figure-only slide)

  21. Aeras: a next-generation global atmosphere model. Numerics are similar to the Community Atmosphere Model - Spectral Elements (CAM-SE). Model development: shallow water, X-Z hydrostatic, 3D hydrostatic, clouds, 3D non-hydrostatic.

  22. Aeras performance results. Evaluation environment: Shannon, 32 nodes; two 8-core Sandy Bridge Xeon E5-2670 @ 2.6 GHz per node (HT deactivated); 128 GB DDR3 memory per node; 2x NVIDIA K20x/K40 per node. Plots show Aeras total time and Aeras compute time (total time minus gather/scatter), in seconds, versus number of elements per workset (100 to 100,000), for Serial (1 MPI thread per node), OpenMP (16 OpenMP threads per node), and CUDA (1 NVIDIA K80 GPU per node).

  23. Aeras performance results. Evaluation environment: Titan, 18,688 AMD Opteron nodes; 16 cores per node; 1 K20X Kepler GPU per node; 32 GB + 6 GB memory per node.

  24. Conclusion. The new version of Albany provides architecture portability. Our numerical experiments on two climate applications implemented in Albany show that (1) a single code base executes correctly in several evaluation environments (MPI, OpenMP, CUDA with UVM), and (2) reasonable performance is achieved across the different architectures without explicit data management: speed-ups over an MPI-only run can be achieved using OpenMP and GPUs.

  25. Acknowledgments. I would like to thank: C. R. Trott and H. C. Edwards for their help with Kokkos; Adam V. Delora for his work on Intrepid; Eric T. Phipps, Eric C. Cyr, and Andrew Bradley for their help with Trilinos and Albany; and Steve Price, Matt Hoffman, and Mauro Perego for providing the data used in the FELIX land-ice runs.

  26. Thank you! irina@lanl.gov
