  1. Using Charm++ to Support Multiscale Multiphysics on the Trinity Supercomputer. Robert Pavel, Christoph Junghans, Susan M. Mniszewski, Timothy C. Germann. April 18, 2017. Los Alamos National Laboratory, LA-UR-17-23218. Operated by Los Alamos National Security, LLC for the U.S. Department of Energy's NNSA.

  2. Exascale Co-Design Center for Materials in Extreme Environments (ExMatEx)
  • ExMatEx was one of three DOE Office of Science application co-design centers (2011-16); the others were CESAR (ANL, reactors) and ExaCT (SNL-CA, combustion)
  • Large-scale collaborations between national labs, academia, and vendors
  • Coordinated with related DOE NNSA co-design efforts
  • Goal: to establish the relationships between algorithms, software stack, and architectures needed to enable exascale-ready science applications
  • Two ultimate objectives:
    • Identify the requirements for the exascale ecosystem that are necessary to perform computational materials science simulations (both single- and multi-scale)
    • Demonstrate and deliver a prototype scale-bridging materials science application based upon adaptive physics refinement

  3. TaBaSCo Test Problem: Modeling a Taylor Cylinder Impact Test
  • The simple Taylor model cannot account for the twinning and anisotropy of the tantalum sample used in LANL experiments (MST-8), and thus the final shapes do not match.
  • The physics goal of this demonstration is to show that the more accurate VPSC fine-scale model, with an appropriate reduced-dimensionality (~60 degrees of freedom) model of texture, can (qualitatively or quantitatively?) reproduce the experimental shape.
  • P.J. Maudlin, J.F. Bingert, J.W. House, and S.R. Chen, "On the modeling of the Taylor cylinder impact test for orthotropic textured materials: experiments and simulations," Int. J. Plasticity 15 (2), 139–166 (1999).
  • (Figure: Taylor cylinder impact test simulated with CoEVP.)

  4. Workflow Overview
  • (Diagram: coarse-scale subdomains 1..N are distributed across nodes; each node hosts an adaptive sampler backed by an eventually consistent distributed database (DB), which spawns on-demand fine-scale models (FSM).)

  5. Task-Based Scale-Bridging Code (TaBaSCo)
  • Demonstrates the feasibility of at-scale heterogeneous computations composed of:
    • Coarse-scale Lagrangian hydrodynamics
    • Dynamically launched constitutive model calculations
    • Results of fine-scale evaluations stored for reuse in databases
  • Supports both Taylor and VPSC fine-scale model evaluations
  • Adaptive sampling queries the database, interpolates results, and decides when to spawn fine-scale evaluations (see the sketch below)
  • Combines an asynchronous task-based runtime environment with persistent database storage
  • Provides load balancing and checkpointing for fault-tolerant modes
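  The adaptive-sampling decision above is the core reuse mechanism. Below is a minimal, self-contained C++ sketch of that loop, assuming a scalar strain/stress state for brevity; all names here (ResultDB, fineScaleEvaluate, the tolerance) are illustrative, not TaBaSCo's actual API, and the linear nearest-neighbor scan stands in for the FLANN/M-tree queries the real code uses.

```cpp
#include <cmath>
#include <iostream>
#include <vector>

struct Sample { double strain; double stress; };

// Stand-in for the persistent database of fine-scale results.
class ResultDB {
    std::vector<Sample> store_;
public:
    // Nearest stored evaluation (the real code uses FLANN or an M-tree).
    const Sample* nearest(double strain) const {
        const Sample* best = nullptr;
        double bestDist = 1e300;
        for (const auto& s : store_) {
            double d = std::fabs(s.strain - strain);
            if (d < bestDist) { bestDist = d; best = &s; }
        }
        return best;
    }
    void insert(const Sample& s) { store_.push_back(s); }
};

// Stand-in for an expensive Taylor/VPSC constitutive evaluation.
double fineScaleEvaluate(double strain) { return 2.0 * strain; }

// Reuse a nearby stored result when it is close enough; otherwise spawn
// a new fine-scale evaluation and store it for later reuse.
double adaptiveSample(ResultDB& db, double strain, double tol) {
    if (const Sample* s = db.nearest(strain)) {
        if (std::fabs(s->strain - strain) < tol)
            return s->stress;                  // reuse/interpolation path
    }
    double stress = fineScaleEvaluate(strain); // expensive path
    db.insert({strain, stress});
    return stress;
}

int main() {
    ResultDB db;
    // The second and fourth queries fall within tolerance of the first
    // and are served from the database without a fine-scale evaluation.
    for (double e : {0.10, 0.101, 0.30, 0.102})
        std::cout << adaptiveSample(db, e, 0.005) << "\n";
}
```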

  6. TaBaSCo Framework
  • Asynchronous task-based runtimes explored:
    • Charm++ (built and ran on Trinity)
    • libCircle (built and ran on Darwin, but not Trinity)
    • MPI task pool (dual-binary version built and ran on Darwin; single-binary version ran for small examples on Trinity)
  • Nearest-neighbor search: M-tree vs. FLANN (both worked on Trinity)
  • Database storage (see the sketch below):
    • In-memory HashMap (limited for long runs)
    • POSIX (became our reliable database option for Trinity)
    • POSIX/DataWarp (only worked for short runs on Trinity)
    • REDIS (never ran on Trinity)
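  To make the storage options concrete, here is a hedged sketch of the kind of backend-agnostic key/value interface this design implies, showing only a POSIX-file backend (one file per key). Class and method names are hypothetical; TaBaSCo's actual database interface differs.

```cpp
#include <fstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical backend-agnostic key/value interface; HashMap, Redis, or
// DataWarp backends would implement the same two methods.
class KeyValueDB {
public:
    virtual ~KeyValueDB() = default;
    virtual void put(const std::string& key, const std::vector<double>& val) = 0;
    virtual bool get(const std::string& key, std::vector<double>& val) = 0;
};

// POSIX backend: one binary file per key in an existing directory.
// Simplicity is why the plain-filesystem option proved most dependable.
class PosixDB : public KeyValueDB {
    std::string dir_;
public:
    explicit PosixDB(std::string dir) : dir_(std::move(dir)) {}
    void put(const std::string& key, const std::vector<double>& val) override {
        std::ofstream f(dir_ + "/" + key, std::ios::binary);
        f.write(reinterpret_cast<const char*>(val.data()),
                val.size() * sizeof(double));
    }
    bool get(const std::string& key, std::vector<double>& val) override {
        std::ifstream f(dir_ + "/" + key, std::ios::binary | std::ios::ate);
        if (!f) return false;
        std::streamsize n = f.tellg();
        val.resize(n / sizeof(double));
        f.seekg(0);
        f.read(reinterpret_cast<char*>(val.data()), n);
        return true;
    }
};

int main() {
    PosixDB db(".");                       // store keys in the current dir
    db.put("fs_result_000123", {1.0, 2.0, 3.0});
    std::vector<double> v;
    db.get("fs_result_000123", v);         // reuse a cached result
}
```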

  7. Chare Wrapper Mapping

  Chare                            Dim  Class                          Lib                    Resolution  Migrate
  CoarseScaleModel                 1D   Lulesh                         MPI                    Rank        N
  FineScaleModel                   2D   Constitutive/ElastoPlasticity  CM                     Element     Y
  Evaluate                         1D   Taylor/VPSC                    CM                     Element     Y
  NearestNeighborSearch (Service)  1D   ApproxNearestNeighbors         CM/FLANN/Mtree         Request     N
  DBInterface (Service)            1D   KrigingDataBase/SingletonDB    CM/Redis/libhio/POSIX  Read/Write  N
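  For readers new to Charm++, each chare in the table would be declared as a chare array in a Charm++ interface (.ci) file. The sketch below shows only the general pattern: the entry methods are hypothetical, and only the array dimensionalities follow the table.

```cpp
// Sketch of a Charm++ interface (.ci) file for the chares above.
// Entry-method names are assumptions, not TaBaSCo's actual interface.
mainmodule tabasco {
  mainchare Main {
    entry Main(CkArgMsg* m);
  };
  // One chare per Lulesh subdomain; pinned to its rank (not migratable).
  array [1D] CoarseScaleModel {
    entry CoarseScaleModel();
    entry void advanceTimestep();
  };
  // One chare per coarse element needing a constitutive update; migratable.
  array [2D] FineScaleModel {
    entry FineScaleModel();
    entry void requestEvaluation();
  };
  // Taylor/VPSC evaluations, spawned on demand; migratable.
  array [1D] Evaluate {
    entry Evaluate();
  };
  // Query service in front of FLANN/M-tree; resolved per request.
  array [1D] NearestNeighborSearch {
    entry NearestNeighborSearch();
  };
  // Read/write service in front of the database backend.
  array [1D] DBInterface {
    entry DBInterface();
  };
};
```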

  8. Trinity: Advanced Technology System (a mixture of Intel Haswell and Knights Landing (KNL) processors)

  9. Open Science Trinity Port of TaBaSCo
  • Coarse Scale (LULESH): Haswell nodes
  • NoSQL Database (prior fine-scale results): Burst Buffer nodes
  • Neighbor Search: Haswell nodes
  • Neighbor Interpolation: Haswell nodes
  • Fine Scale Evaluate (Taylor/VPSC): Haswell (Phase 1) or KNL (Phase 2) nodes

  10. TaBaSCo Weak and Strong Scaling
  • Weak scaling: brute force (w/o AS) and adaptive sampling (w/ AS)
    • Edge = 64, height = 26-13,312 (46,592-23,855,104 elements)
    • Good up to 128 nodes; communication overhead for >= 256 nodes
  • Strong scaling: brute force (w/o AS) and adaptive sampling (w/ AS)
    • Edge = 128, height = 208 (1,490,944 elements)
  • (Plots: runtime (s) vs. number of nodes, 1-256, for runs with and without adaptive sampling.)

  11. TaBaSCo Brute Force (w/o Adaptive Sampling) on Trinity (512 nodes) – step 0

  12. TaBaSCo Brute Force (w/o Adaptive Sampling) on Trinity (512 nodes) – step 500

  13. TaBaSCo Brute Force (w/o Adaptive Sampling) on Trinity (512 nodes) – step 5000

  14. TaBaSCo Brute Force (w/o Adaptive Sampling) on Trinity (512 nodes) – step 10,000

  15. TaBaSCo Brute Force (w/o Adaptive Sampling) on Trinity (512 nodes) – step 20,000

  16. TaBaSCo Brute Force (w/o Adaptive Sampling) on Trinity (512 nodes) – step 22,000

  17. Early Work on Hybrid Runs on Trinity
  • Trinity is a machine that was installed in two stages:
    • 9,408 compute nodes with Intel Haswell processors (32 CPU cores per node, each with 2 hyperthreads)
    • 9,500 compute nodes with Intel Knights Landing (KNL) processors (68 cores per node)
  • While similar, the two node types have different strengths
  • The goal for TaBaSCo is to perform hybrid runs in which both stages are utilized:
    • Coarse-scale solver and less compute-intensive work on Haswell nodes
    • Fine-scale solver on KNL nodes
  • Used Charm++'s Logical Machine Entities to identify KNL and Haswell nodes, then used a custom mapper to assign chares based on physical node type (sketched below)
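  Custom chare placement in Charm++ is typically done by subclassing CkArrayMap and overriding procNum(). The sketch below illustrates the pattern described above; the PE classification (isKnlPe) and the 1D-vs-2D placement convention are assumptions for illustration, not the actual TaBaSCo mapper.

```cpp
#include <vector>
#include "charm++.h"

// Hypothetical PE classifier. The real code identifies node types through
// Charm++'s Logical Machine Entities; here we fake it by node index,
// assuming the KNL nodes occupy the upper half of the allocation.
static bool isKnlPe(int pe) { return CkNodeOf(pe) >= CkNumNodes() / 2; }

// Custom array map: 1D (coarse-scale) chares go to Haswell PEs,
// 2D (fine-scale) chares go to KNL PEs.
class NodeTypeMap : public CkArrayMap {
    std::vector<int> knlPes_, haswellPes_;
public:
    NodeTypeMap() {
        // Partition the PEs by node type once, at construction.
        for (int pe = 0; pe < CkNumPes(); ++pe)
            (isKnlPe(pe) ? knlPes_ : haswellPes_).push_back(pe);
    }
    // Charm++ calls this to pick the home PE of each array element.
    int procNum(int /*arrayHdl*/, const CkArrayIndex& idx) override {
        const std::vector<int>& pool =
            (idx.dimension == 2) ? knlPes_ : haswellPes_;
        return pool[idx.data()[0] % pool.size()];  // round-robin in pool
    }
};
```

  The map would be declared as a group (`group NodeTypeMap : CkArrayMap`) in the .ci file and attached to an array at creation time, e.g. via `CkArrayOptions::setMap()`.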

  18. Open Science Trinity Port of TaBaSCo (architecture diagram repeated from slide 9)

  19. Proof-of-Concept Hybrid Run: Host Platform
  • Current proof-of-concept implementation runs on Trinitite
  • Run with three types of node:
    • Dual-socket Haswell "Solver" node: 32 MPI ranks per node
    • KNL "Solver" node: 64 MPI ranks per node
    • Dedicated Haswell "Organizer" node: 4 MPI ranks per node
  • Run on a subset of Trinitite; the goal was to work with the system stack and get initial performance results
  • Larger runs planned following the unification of the Trinity phases

  20. Proof-of-Concept Hybrid Run: Simulation Configuration
  • Restricted to the "Taylor" solver
  • No adaptive sampling: maximize work, minimize communication
  • Re-used a run from Open Science Phase 1:
    • 48 edge elements, 104 height elements
    • 4 domains (coarse-scale chares)
    • 34,944 fine-scale evaluations
    • 100 time steps

  21. Proof-of-Concept Hybrid Run Results: Raw Execution Time

  22. Approximation of Energy Savings
  • Used very rough approximations to estimate the energy savings of the hybrid run:
    • Execution times of the proof-of-concept runs
    • TDP from the spec sheets for each processor
  • E = TDP × ExecutionTime
  • Don't do this: it assumes all else is the same
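  As a worked illustration of this back-of-the-envelope estimate (the wattages are approximate spec-sheet TDPs for the processor generations involved; the execution times are invented for illustration):

```latex
% Illustration only: TDPs approximate, execution times invented.
\begin{align*}
E &\approx \mathrm{TDP} \times t_{\mathrm{exec}} \\
E_{\text{Haswell}} &\approx (2 \times 135\,\mathrm{W}) \times 1000\,\mathrm{s} = 270\,\mathrm{kJ} \\
E_{\text{KNL}} &\approx 215\,\mathrm{W} \times 1200\,\mathrm{s} = 258\,\mathrm{kJ}
\end{align*}
```

  On such numbers a KNL node can come out ahead on energy even while being slower per run, which is the kind of trade-off the following slides quantify.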

  23. Very Early TDP-Based Energy Results

  24. Energy Savings Through Use of KNL Solvers

  25. Questions?
