SLIDE 1

Los Alamos National Laboratory

Using Charm++ to Support Multiscale Multiphysics on the Trinity Supercomputer

Robert Pavel, Christoph Junghans, Susan M. Mniszewski, Timothy C. Germann
April 18, 2017

LA-UR-17-23218

Operated by Los Alamos National Security, LLC for the U.S. Department of Energy's NNSA

SLIDE 2

Exascale Co-Design Center for Materials in Extreme Environments (ExMatEx)

  • ExMatEx was one of three* DOE Office of Science application co-design centers (2011-16)
    *Others are: CESAR (ANL/reactors), ExaCT (SNL-CA/combustion)
  • Large-scale collaborations between national labs, academia, and vendors
  • Coordinated with related DOE NNSA co-design efforts
  • Goal: to establish the relationships between algorithms, software stack, and architectures needed to enable exascale-ready science applications
  • Two ultimate objectives:
    • Identify the requirements for the exascale ecosystem that are necessary to perform computational materials science simulations (both single- and multi-scale)
    • Demonstrate and deliver a prototype scale-bridging materials science application based upon adaptive physics refinement

SLIDE 3

Tabasco Test Problem: Modeling a Taylor cylinder impact test

  • The simple Taylor model cannot account for the twinning and anisotropy of the tantalum sample used in LANL experiments (MST-8), and thus the final shapes do not match.
  • The physics goal of this demonstration is to show that the more accurate VPSC fine-scale model, with an appropriate reduced-dimensionality (~60 degrees of freedom) model of texture, can (qualitatively or quantitatively?) reproduce the experimental shape.

P.J. Maudlin, J.F. Bingert, J.W. House, and S.R. Chen, "On the modeling of the Taylor cylinder impact test for orthotropic textured materials: experiments and simulations," Int. J. Plasticity 15(2), 139–166 (1999).

[Figure: Ta Taylor cylinder, CoEVP simulation]

SLIDE 4

Workflow Overview

[Diagram: the coarse-scale model is decomposed into Subdomain 1 … Subdomain N across Node 1 … Node N/2; each node runs an Adaptive Sampler that calls on-demand fine-scale models (FSM) and an eventually consistent distributed database (DB).]

SLIDE 5

Task-Based Scale-bridging Code (TaBaSCo)

  • Demonstrate the feasibility of at-scale heterogeneous computations composed of:
    • Coarse-scale Lagrangian hydrodynamics
    • Dynamically launched constitutive model calculations
    • Results of fine-scale evaluations stored for reuse in databases
    • Taylor fine-scale model evaluation
    • VPSC fine-scale model evaluation
    • Adaptive sampling, which queries the database, interpolates results, and decides when to spawn fine-scale evaluations (see the sketch after this list)
  • Combines an asynchronous task-based runtime environment with persistent database storage
  • Provides load balancing and checkpointing for fault-tolerant modes
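To make the adaptive-sampling step concrete, here is a minimal, self-contained C++ sketch of the accept-or-spawn decision. It is illustrative only: TaBaSCo uses kriging interpolation and an approximate nearest-neighbor index (FLANN or M-tree, see the later slides), whereas this sketch does a linear scan and simply reuses the nearest stored result, and the names (Point, FineScaleDB, evaluateFineScale, tolerance) are hypothetical.

    // Minimal sketch of the adaptive-sampling decision (hypothetical names).
    #include <cmath>
    #include <cstddef>
    #include <functional>
    #include <limits>
    #include <vector>

    struct Point { std::vector<double> in, out; };   // fine-scale input -> response
    using FineScaleDB = std::vector<Point>;          // stand-in for the results database

    static double dist(const std::vector<double>& a, const std::vector<double>& b) {
      double s = 0;
      for (std::size_t i = 0; i < a.size(); ++i) s += (a[i] - b[i]) * (a[i] - b[i]);
      return std::sqrt(s);
    }

    // Reuse a nearby stored evaluation when the nearest point is within
    // `tolerance`; otherwise spawn a fine-scale evaluation and store it for reuse.
    std::vector<double> adaptiveSample(
        const std::vector<double>& query, FineScaleDB& db, double tolerance,
        const std::function<std::vector<double>(const std::vector<double>&)>& evaluateFineScale) {
      const Point* nearest = nullptr;
      double dmin = std::numeric_limits<double>::max();
      for (const Point& p : db) {                    // nearest-neighbor query (linear scan here)
        double d = dist(query, p.in);
        if (d < dmin) { dmin = d; nearest = &p; }
      }
      if (nearest && dmin <= tolerance)
        return nearest->out;                         // interpolation accepted (kriging would blend neighbors)
      Point p{query, evaluateFineScale(query)};      // spawn the fine-scale model
      db.push_back(p);                               // store the new result for later reuse
      return p.out;
    }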

SLIDE 6

Tabasco Framework

  • Asynchronous task-based runtimes explored
    • Charm++ (built/ran on Trinity)
    • libCircle (built/ran on Darwin, but not Trinity)
    • MPI Task Pool (dual-binary version built/ran on Darwin; single-binary version ran for small examples on Trinity)
  • Nearest neighbor search
    • Mtree vs. FLANN (both worked on Trinity; a minimal FLANN query sketch follows this list)
  • Database storage
    • In-memory HashMap (was limited for long runs)
    • POSIX (became our reliable database option for Trinity)
    • POSIX/DataWarp (only worked for short runs on Trinity)
    • REDIS (never ran on Trinity)
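As a concrete illustration of the FLANN option mentioned above, a minimal k-nearest-neighbor query with FLANN's C++ interface could look like the following. The dimensionality, index parameters, and data are placeholders; TaBaSCo's own ApproxNearestNeighbors wrapper is not reproduced here, only the underlying FLANN calls.

    // Minimal FLANN k-nearest-neighbor query (placeholder data and parameters).
    #include <flann/flann.hpp>
    #include <vector>

    int main() {
      const int dim = 6, npoints = 1000, knn = 4;
      std::vector<float> data(npoints * dim, 0.0f);   // stored fine-scale input points
      std::vector<float> q(dim, 0.0f);                // query point

      flann::Matrix<float> dataset(data.data(), npoints, dim);
      flann::Matrix<float> query(q.data(), 1, dim);

      // Build a randomized kd-tree index, then run a k-NN search against it.
      flann::Index<flann::L2<float> > index(dataset, flann::KDTreeIndexParams(4));
      index.buildIndex();

      std::vector<int> idxBuf(knn);
      std::vector<float> distBuf(knn);
      flann::Matrix<int> indices(idxBuf.data(), 1, knn);
      flann::Matrix<float> dists(distBuf.data(), 1, knn);
      index.knnSearch(query, indices, dists, knn, flann::SearchParams(128));
      return 0;
    }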

SLIDE 7

Chare Wrapper Mapping

Chare                  Dim  Class                          Lib                     Resolution            Migrate
CoarseScaleModel       1D   Lulesh                         MPI                     Rank                  N
FineScaleModel         2D   Constitutive/ElastoPlasticity  CM                      Element               Y
Evaluate               1D   Taylor/VPSC                    CM                      Element               Y
NearestNeighborSearch  1D   ApproxNearestNeighbors         CM/FLANN/Mtree          Request (Service)     N
DBInterface            1D   KrigingDataBase/SingletonDB    CM/Redis/libhio/POSIX   Read/Write (Service)  N
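For orientation, the chares in the table above might be declared in a Charm++ interface (.ci) file roughly as follows. This is a hypothetical sketch: only the chare names and array dimensions follow the table; the module name, main chare, and entry methods are placeholders rather than TaBaSCo's actual interface.

    // Hypothetical Charm++ interface sketch (placeholder entry methods).
    mainmodule tabasco {
      mainchare Main {
        entry Main(CkArgMsg* m);
      };
      array [1D] CoarseScaleModel      { entry CoarseScaleModel(); };
      array [2D] FineScaleModel        { entry FineScaleModel(); };
      array [1D] Evaluate              { entry Evaluate(); };
      array [1D] NearestNeighborSearch { entry NearestNeighborSearch(); };
      array [1D] DBInterface           { entry DBInterface(); };
    }

With the generated tabasco.decl.h included, the 2D fine-scale array could then be created with, for example, CProxy_FineScaleModel::ckNew(numDomains, elementsPerDomain); those constructor arguments are again placeholders.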

SLIDE 8

Trinity: Advanced Technology System

(a mixture of Intel Haswell and Knights Landing (KNL) processors)

SLIDE 9

Open Science Trinity Port of TaBaSCo

[Diagram: TaBaSCo components (Coarse Scale (LULESH), Neighbor Search, Neighbor Interpolation, Fine Scale and Evaluate (Taylor/VPSC), and a NoSQL Database of prior fine-scale results) mapped onto Haswell nodes, Burst Buffer nodes, and Haswell (Phase 1) or KNL (Phase 2) nodes.]

SLIDE 10

Tabasco Weak and Strong Scaling

  • Weak scaling
    • Brute force (w/o AS) and Adaptive Sampling (w AS)
    • Edge = 64, height = 26 to 13,312 (46,592 to 23,855,104 elements)
    • Good scaling up to 128 nodes
    • Communication overhead for >= 256 nodes
  • Strong scaling
    • Brute force (w/o AS) and Adaptive Sampling (w AS)
    • Edge = 128, height = 208 (1,490,944 elements)

[Plot: Tabasco Weak Scaling, runtime (s) vs. number of nodes (1 to 256), with and without adaptive sampling (w AS, w/o AS).]

[Plot: Tabasco Strong Scaling (128x208), runtime (s, log scale) vs. number of nodes (1 to 256), with and without adaptive sampling (w AS, w/o AS).]

SLIDE 11

Tabasco Brute Force (w/o Adaptive Sampling) on Trinity (512 nodes) – step 0

SLIDE 12

Tabasco Brute Force (w/o Adaptive Sampling) on Trinity (512 nodes) – step 500

SLIDE 13

Tabasco Brute Force (w/o Adaptive Sampling) on Trinity (512 nodes) – step 5000

SLIDE 14

Tabasco Brute Force (w/o Adaptive Sampling) on Trinity (512 nodes) – step 10,000

SLIDE 15

Tabasco Brute Force (w/o Adaptive Samp.) on Trinity (512 nodes) – step 20,000

SLIDE 16

Tabasco Brute Force (w/o Adaptive Samp.) on Trinity (512 nodes) – step 22,000


SLIDE 17

Early Work in Hybrid Runs on Trinity

  • Trinity is a machine that was installed in two stages
    • 9408 compute nodes with Intel Haswell processors
      • 32 CPU cores per node, each with 2 hyperthreads
    • 9500 compute nodes with Intel Knights Landing (KNL) processors
      • 68 cores per node
    • While similar, the node types have different strengths
  • Goal for Tabasco is to perform a hybrid run in which both stages are utilized
    • Coarse-scale solver and less compute-intensive work on Haswell nodes
    • Fine-grain solver on KNLs
  • Used Charm++'s Logical Machine Entities to identify KNLs and Haswells
    • Then used a custom mapper to assign chares based on physical node type (a minimal sketch follows this list)
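To make that mapping concrete, here is a minimal, hypothetical sketch of a custom Charm++ array map that selects PEs by physical node type. It is not TaBaSCo's mapper: the class name, the constructor argument, and the "more PEs per physical node means KNL" heuristic are assumptions for illustration, and the usual Charm++ generated headers are assumed to exist (from a .ci file declaring, e.g., group NodeTypeMap : CkArrayMap { entry NodeTypeMap(bool); };).

    // Hypothetical map group: places array elements only on PEs of one node type.
    #include <algorithm>
    #include <vector>
    #include "charm++.h"
    #include "nodetypemap.decl.h"   // generated from the hypothetical .ci above

    class NodeTypeMap : public CkArrayMap {
      std::vector<int> pes;   // PEs belonging to the selected node type
    public:
      // wantManyCores = true  -> prefer the "big" (KNL-like) nodes,
      // wantManyCores = false -> prefer the "small" (Haswell-like) nodes.
      explicit NodeTypeMap(bool wantManyCores) {
        // Heuristic (an assumption of this sketch): in a hybrid job, KNL nodes
        // expose more PEs per physical node than Haswell nodes.
        int maxPes = 0;
        for (int node = 0; node < CmiNumPhysicalNodes(); ++node)
          maxPes = std::max(maxPes, CmiNumPesOnPhysicalNode(node));
        for (int node = 0; node < CmiNumPhysicalNodes(); ++node) {
          bool manyCores = (CmiNumPesOnPhysicalNode(node) == maxPes);
          if (manyCores != wantManyCores) continue;
          int *pelist, n;
          CmiGetPesOnPhysicalNode(node, &pelist, &n);
          pes.insert(pes.end(), pelist, pelist + n);
        }
        if (pes.empty()) pes.push_back(0);   // fall back to PE 0
      }
      // Overrides CkArrayMap::procNum: round-robin elements over the selected PEs.
      int procNum(int /*arrayHdl*/, const CkArrayIndex &idx) {
        return pes[idx.data()[0] % pes.size()];
      }
    };
    #include "nodetypemap.def.h"

An array would then be bound to such a map at creation time, roughly via CkArrayOptions opts(n); opts.setMap(CProxy_NodeTypeMap::ckNew(true)); before calling ckNew on the fine-scale chare array.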

SLIDE 18

Open Science Trinity Port of TaBaSCo

[Diagram repeated from Slide 9: mapping of TaBaSCo components onto Haswell, Burst Buffer, and Haswell (Phase 1) or KNL (Phase 2) nodes.]

SLIDE 19

Proof of Concept Hybrid Run: Host Platform

  • Current Proof of Concept implementation running on Trinitite
  • Run with three types of node
    • Dual-socket Haswell "Solver" node
      • 32 MPI ranks per node
    • KNL "Solver" node
      • 64 MPI ranks per node
    • Dedicated Haswell "Organizer" node
      • 4 MPI ranks per node
  • Run on a subset of Trinitite
  • Goal was to work with the system stack and get initial performance results
  • Larger runs planned following unification of the Trinity phases

SLIDE 20

Proof of Concept Hybrid Run: Simulation Configuration

  • Restricted to "Taylor" solver
  • No Adaptive Sampling
  • Maximize work, minimize communication
  • Re-used a run from Open Science Phase 1
    • 48 edge elements
    • 104 height elements
    • 4 domains (coarse-scale chares)
    • 34,944 fine-scale evaluations
    • 100 time steps

SLIDE 21

Proof of Concept Hybrid Run Results: Raw Execution Time

SLIDE 22

Approximation of Energy Savings

  • Used very rough approximations to estimate the energy savings of the hybrid run
    • Execution times of the Proof of Concept runs
    • TDP from the spec sheets for each processor
    • Don't do this
    • Assumed all else the same

  Energy ≈ TDP × Execution Time
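As a purely illustrative worked example of this approximation (the numbers are hypothetical and are not the measured results on the following slides): a solver phase that runs for 600 s on processors with a combined TDP of 270 W would be charged roughly 270 W × 600 s = 162 kJ, or about 0.045 kWh. Under this crude model, halving the execution time on a processor with 1.5x the TDP would still count as a net energy saving (0.5 × 1.5 = 0.75 of the original estimate).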

SLIDE 23

Very Early TDP-Based Energy Results

SLIDE 24

Energy Savings Through Use of KNL Solvers

SLIDE 25

Questions?