

slide-1
SLIDE 1

LAMMPS

Latin American Introductory School on Parallel Programming and Parallel Architecture for High-Performance Computing

  • Dr. Richard Berger

High-Performance Computing Group, College of Science and Technology, Temple University, Philadelphia, USA

richard.berger@temple.edu

slide-2
SLIDE 2

Outline

Introduction
Core Algorithms
Geometric/Spatial Domain Decomposition
Hybrid MPI+OpenMP Parallelization

slide-3
SLIDE 3

Outline

Introduction
Core Algorithms
Geometric/Spatial Domain Decomposition
Hybrid MPI+OpenMP Parallelization

slide-4
SLIDE 4

What is LAMMPS?

◮ Classical molecular dynamics code
◮ Open-source, highly portable C++
◮ Freely available for download under the GPL
◮ Easy to download, install, and run
◮ Well documented
◮ Easy to modify and extend with new features and functionality
◮ Active users’ mailing list with over 650 subscribers
◮ More than 1000 citations/year
◮ Atomistic, mesoscale, and coarse-grain simulations
◮ Variety of potentials (including many-body and coarse-grain)
◮ Variety of boundary conditions, constraints, etc.
◮ Developed by Sandia National Laboratories and many collaborators, such as Temple University

slide-5
SLIDE 5

LAMMPS Development Pyramid

“The big boss”: Steve Plimpton
Core developers: 2x @Sandia, 2x @Temple (core functionality, maintenance, integration)
Package maintainers: > 30, mostly user packages, some core
Single/few style contributors: > 100, user-misc and others
Feedback from the mailing list and GitHub Issues

slide-6
SLIDE 6

LAMMPS Use Cases

(a) Solid Mechanics (b) Material Science (c) Chemistry (d) Biophysics (e) Granular Flow

slide-7
SLIDE 7

What is Molecular Dynamics?

Initial positions and velocities → MD engine (interatomic potential) → positions and velocities at many later times

Mathematical Formulation

◮ Classical mechanics
◮ Atoms are point masses m_i
◮ Positions, velocities, forces: r_i, v_i, f_i
◮ Potential energy function: V(r^N)
◮ 6N coupled ODEs:

\frac{dr_i}{dt} = v_i, \qquad \frac{dv_i}{dt} = \frac{F_i}{m_i}, \qquad F_i = -\frac{d}{dr_i} V(r^N)
slide-8
SLIDE 8

Simulation of Liquid Argon with Periodic Boundary Conditions

slide-9
SLIDE 9

Outline

Introduction
Core Algorithms
Geometric/Spatial Domain Decomposition
Hybrid MPI+OpenMP Parallelization

slide-10
SLIDE 10

Basic Structure

Setup → Run

◮ Setup the domain & read in parameters and initial conditions
◮ Propagate the system state over multiple time steps

slide-11
SLIDE 11

Basic Structure

Setup → Update Forces → Integrate EOMs → Output

Each time step consists of:

◮ Computing forces on all atoms
◮ Integrating the equations of motion (EOMs)
◮ Outputting data to disk and/or screen

slide-12
SLIDE 12

Velocity-Verlet Integration

Setup → Integration Step 1 → Update Forces → Integration Step 2 → Output

◮ By default, the velocity-Verlet integration scheme is used in LAMMPS to propagate the positions of atoms:
  1. Propagate all velocities for half a time step and all positions for a full time step
  2. Compute forces on all atoms to get accelerations
  3. Propagate all velocities for half a time step
  4. Output intermediate results if needed
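A minimal sketch of one such time step in plain C++ (this is not LAMMPS's internal code; the flat arrays x, v, f, the per-atom masses m, the timestep dt, and the force callback are assumptions for illustration):

// One velocity-Verlet time step for N atoms in 3D.
// x, v, f are flat arrays of length 3*N; m holds per-atom masses.
void velocity_verlet_step(int N, double *x, double *v, double *f, const double *m,
                          double dt, void (*compute_forces)(int, const double *, double *))
{
    // Step 1: half-step velocity update, full-step position update
    for (int i = 0; i < N; i++)
        for (int d = 0; d < 3; d++) {
            v[3*i+d] += 0.5 * dt * f[3*i+d] / m[i];
            x[3*i+d] += dt * v[3*i+d];
        }
    // Step 2: recompute forces at the new positions
    compute_forces(N, x, f);
    // Step 3: second half-step velocity update
    for (int i = 0; i < N; i++)
        for (int d = 0; d < 3; d++)
            v[3*i+d] += 0.5 * dt * f[3*i+d] / m[i];
    // Step 4: output of intermediate results would go here
}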
slide-13
SLIDE 13

Force Computation

Setup → Integration Step 1 → Update Forces → Integration Step 2 → Output

Pairwise Interactions

The total force acting on each atom i is the sum of all pairwise interactions with atoms j:

F_i = \sum_{j \neq i} F_{ij}

Cost

With n atoms, the total cost of computing all forces F_ij would be O(n²)
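To make the quadratic cost concrete, here is a naive all-pairs sketch (Vec3, pair_force, and the array names are hypothetical placeholders, not LAMMPS code):

#include <array>
#include <vector>

using Vec3 = std::array<double, 3>;

Vec3 pair_force(const Vec3 &ri, const Vec3 &rj);  // placeholder pairwise force F_ij

// Naive all-pairs force summation: n*(n-1) pair evaluations, i.e. O(n^2).
void compute_forces_naive(const std::vector<Vec3> &r, std::vector<Vec3> &F)
{
    const std::size_t n = r.size();
    for (std::size_t i = 0; i < n; i++) {
        F[i] = {0.0, 0.0, 0.0};
        for (std::size_t j = 0; j < n; j++) {
            if (j == i) continue;
            Vec3 fij = pair_force(r[i], r[j]);
            for (int d = 0; d < 3; d++)
                F[i][d] += fij[d];
        }
    }
}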

slide-14
SLIDE 14

Force Computation

◮ The cost of each individual force computation depends on the selected interaction model
◮ Many models operate using a cutoff distance r_c, beyond which the force contribution is zero

Lennard-Jones pairwise additive interaction:

F_{ij} = \begin{cases} -\dfrac{4\varepsilon}{\sigma}\left[-12\left(\dfrac{\sigma}{r_{ij}}\right)^{13} + 6\left(\dfrac{\sigma}{r_{ij}}\right)^{7}\right] & r_{ij} < r_c \\ 0 & r_{ij} \geq r_c \end{cases}
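As a sketch of how a cutoff pair interaction like this might be coded (plain C++, not the actual LAMMPS pair style; epsilon and sigma are the usual LJ parameters):

#include <cmath>

// Magnitude of the Lennard-Jones pair force F(r) = -dV/dr, zero beyond the cutoff rc.
double lj_force(double rij, double epsilon, double sigma, double rc)
{
    if (rij >= rc) return 0.0;               // no contribution beyond the cutoff
    double sr6  = std::pow(sigma / rij, 6);  // (sigma/r)^6
    double sr12 = sr6 * sr6;                 // (sigma/r)^12
    return 24.0 * epsilon * (2.0 * sr12 - sr6) / rij;
}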

slide-15
SLIDE 15

Reducing the number of forces to compute

Verlet-Lists (aka. Neighbor Lists)

◮ Each atom stores a list of neighboring atoms within a cutoff radius (larger than the force cutoff)
◮ This list is valid for multiple time steps
◮ Only forces between an atom and its neighbors are computed

Using Newton’s Third Law of Motion

◮ Whenever a first body exerts a force F on a second body, the second body exerts a force −F on the first body.
◮ If we compute F_ij, we already know F_ji: F_ji = −F_ij
◮ ⇒ We can cut our force computations in half!
◮ Neighbor lists only need to be half the size

slide-16
SLIDE 16

Reducing the number of forces to compute

Verlet-Lists (aka. Neighbor Lists)

◮ Each atom stores a list of neighboring atoms within a cutoff radius (larger than the force cutoff)
◮ This list is valid for multiple time steps
◮ Only forces between an atom and its neighbors are computed

Note:

Finding neighbors is still an O(n²) operation! But we can do better...

Using Newton’s Third Law of Motion

◮ Whenever a first body exerts a force F on a second body, the second body exerts a force −F on the first body.
◮ If we compute F_ij, we already know F_ji: F_ji = −F_ij
◮ ⇒ We can cut our force computations in half!
◮ Neighbor lists only need to be half the size
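A sketch of a force loop over half neighbor lists that exploits F_ji = −F_ij (the data layout, the lj_force helper from the earlier sketch, and the neighbor-list container are illustrative assumptions, not LAMMPS's data structures):

#include <cmath>
#include <vector>

double lj_force(double rij, double epsilon, double sigma, double rc);  // see earlier sketch

// Half neighbor lists: each pair (i, j) is stored only once (e.g. with j > i),
// and the reaction force F_ji = -F_ij is applied in the same iteration.
void compute_forces_half(const std::vector<std::vector<int>> &neigh,
                         const std::vector<double> &x, const std::vector<double> &y,
                         const std::vector<double> &z,
                         std::vector<double> &fx, std::vector<double> &fy, std::vector<double> &fz,
                         double epsilon, double sigma, double rc)
{
    for (std::size_t i = 0; i < neigh.size(); i++) {
        for (int j : neigh[i]) {
            double dx = x[i] - x[j], dy = y[i] - y[j], dz = z[i] - z[j];
            double r  = std::sqrt(dx*dx + dy*dy + dz*dz);
            if (r >= rc) continue;                          // neighbor list cutoff > force cutoff
            double fmag = lj_force(r, epsilon, sigma, rc) / r;
            fx[i] += fmag * dx; fy[i] += fmag * dy; fz[i] += fmag * dz;
            fx[j] -= fmag * dx; fy[j] -= fmag * dy; fz[j] -= fmag * dz;  // Newton's third law
        }
    }
}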

slide-17
SLIDE 17

Cell List Algorithm

◮ We want to compute the forces acting on the red atom

slide-18
SLIDE 18

Cell List Algorithm

◮ Without any optimization, we would have to look at all the atoms in the domain

slide-19
SLIDE 19

Cell List Algorithm

◮ When using cell lists, we divide our domain into equal-size cells
◮ The cell size is proportional to the force cutoff

slide-20
SLIDE 20

Cell List Algorithm

◮ Each atom is part of one cell

slide-21
SLIDE 21

Cell List Algorithm

◮ Because of the size of each cell, we can assume any neighbor must be within the surrounding cells of an atom’s parent cell
slide-22
SLIDE 22

Cell List Algorithm

◮ Only a stencil of neighboring cells is searched when building an atom’s neighbor list:
  ◮ 9 cells in 2D
  ◮ 27 cells in 3D
◮ To avoid corner cases, additional cells are added to the data structure, which allows using the same stencil for all cells.

Figure legend: cell of atom, stencil of surrounding cells, domain cells, additional cells.
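A 2D sketch of the binning and stencil search described above (the function names, the box dimensions lx/ly, and the absence of the extra padding cells are simplifications, not LAMMPS's internal layout; LAMMPS adds the extra cells so the boundary test below is unnecessary):

#include <algorithm>
#include <vector>

// Bin atoms into cells whose edge length is at least the cutoff rc.
// Returns one vector of atom indices per cell; nx, ny report the grid size.
std::vector<std::vector<int>> build_cells(const std::vector<double> &x, const std::vector<double> &y,
                                          double lx, double ly, double rc, int &nx, int &ny)
{
    nx = std::max(1, (int)(lx / rc));
    ny = std::max(1, (int)(ly / rc));
    std::vector<std::vector<int>> cells(nx * ny);
    for (std::size_t i = 0; i < x.size(); i++) {
        int cx = std::min(nx - 1, (int)(x[i] / lx * nx));
        int cy = std::min(ny - 1, (int)(y[i] / ly * ny));
        cells[cy * nx + cx].push_back((int)i);
    }
    return cells;
}

// Candidate neighbors of an atom in cell (cx, cy): the 3x3 stencil (9 cells in 2D, 27 in 3D).
std::vector<int> stencil_candidates(const std::vector<std::vector<int>> &cells,
                                    int cx, int cy, int nx, int ny)
{
    std::vector<int> cand;
    for (int dy = -1; dy <= 1; dy++)
        for (int dx = -1; dx <= 1; dx++) {
            int ncx = cx + dx, ncy = cy + dy;
            if (ncx < 0 || ncx >= nx || ncy < 0 || ncy >= ny) continue;  // boundary test (avoided by padding cells)
            for (int j : cells[ncy * nx + ncx])
                cand.push_back(j);
        }
    return cand;
}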

slide-23
SLIDE 23

Finding Neighbors

Setup → Integration Step 1 → Neighbor List Building → Update Forces → Integration Step 2 → Output

◮ Combination of the cell-list and Verlet-list algorithms
◮ Reduces the number of atom pairs which have to be traversed

slide-24
SLIDE 24

Improving caching efficiency

Setup → Integration Step 1 → Spatial Sorting → Neighbor List Building → Update Forces → Integration Step 2 → Output

◮ Atom data is periodically sorted
◮ Atoms close to each other are placed in nearby memory blocks
◮ This can be efficiently implemented by sorting by cells
◮ This improves cache efficiency during traversal
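One possible way to implement this cell-based reordering (a sketch; LAMMPS's actual sorting differs in the details):

#include <algorithm>
#include <numeric>
#include <vector>

// Reorder the atom arrays so that atoms belonging to the same cell become
// contiguous in memory. cell_of[i] is the cell index of atom i.
void sort_atoms_by_cell(std::vector<double> &x, std::vector<double> &y, std::vector<double> &z,
                        const std::vector<int> &cell_of)
{
    std::vector<int> perm(x.size());
    std::iota(perm.begin(), perm.end(), 0);
    std::stable_sort(perm.begin(), perm.end(),
                     [&](int a, int b) { return cell_of[a] < cell_of[b]; });

    std::vector<double> xs(x.size()), ys(y.size()), zs(z.size());
    for (std::size_t k = 0; k < perm.size(); k++) {
        xs[k] = x[perm[k]]; ys[k] = y[perm[k]]; zs[k] = z[perm[k]];
    }
    x.swap(xs); y.swap(ys); z.swap(zs);
}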

slide-25
SLIDE 25

Outline

Introduction
Core Algorithms
Geometric/Spatial Domain Decomposition
Hybrid MPI+OpenMP Parallelization

slide-26
SLIDE 26

Geometric/Spatial Domain Decomposition

◮ LAMMPS uses spatial decomposition to scale over many thousands of cores

slide-27
SLIDE 27

Geometric/Spatial Domain Decomposition

A B

◮ The simulation box is split into multiple parts across the available dimensions

slide-28
SLIDE 28

Geometric/Spatial Domain Decomposition

A B

◮ Each MPI process is responsible for computations on atoms within its subdomain
◮ Each subdomain is extended with halo regions, which duplicate information from adjacent subdomains

slide-29
SLIDE 29

Geometric/Spatial Domain Decomposition

A B

◮ Each MPI process is responsible for computations on atoms within its subdomain
◮ Each subdomain is extended with halo regions, which duplicate information from adjacent subdomains

slide-30
SLIDE 30

Geometric/Spatial Domain Decomposition

A B ghost

◮ Each process only stores owned atoms and ghost atoms
◮ Owned atom: the process is responsible for computation and updates of the atom’s properties
◮ Ghost atom: the atom’s information comes from another process and is synchronized before each time step

slide-31
SLIDE 31

Geometric/Spatial Domain Decomposition

A B

◮ Each process only stores owned atoms and ghost atoms
◮ Owned atom: the process is responsible for computation and updates of the atom’s properties
◮ Ghost atom: the atom’s information comes from another process and is synchronized before each time step

slide-32
SLIDE 32

Geometric/Spatial Domain Decomposition

A B

◮ Cell lists are used to determine which atoms need to be communicated

slide-33
SLIDE 33

MPI Communication

Setup → Integration Step 1 → Communication → Spatial Sorting → Neighbor List Building → Update Forces → Integration Step 2 → Output

slide-34
SLIDE 34

MPI Communication

◮ Communication happens after the first integration step
◮ This is when atom positions have been updated
◮ Atoms are migrated to another process if necessary
◮ Positions (and other properties) of ghosts are updated
◮ Each process can have up to 6 communication partners in 3D
◮ With periodic boundary conditions, a process can also be its own communication partner (in this case it will simply do a copy)
◮ Both send and receive happen at the same time (MPI_Irecv & MPI_Send)
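A sketch of one such exchange in a single direction (buffer packing, atom migration, and periodic self-communication are omitted; the function and buffer names are assumptions, not LAMMPS's comm classes):

#include <mpi.h>
#include <vector>

// Exchange halo data with the neighbors in one direction:
// post the receive first, then send, so both transfers can overlap.
void exchange_halo(std::vector<double> &send_buf, std::vector<double> &recv_buf,
                   int send_to, int recv_from, MPI_Comm comm)
{
    MPI_Request req;
    MPI_Irecv(recv_buf.data(), (int)recv_buf.size(), MPI_DOUBLE, recv_from, 0, comm, &req);
    MPI_Send(send_buf.data(), (int)send_buf.size(), MPI_DOUBLE, send_to, 0, comm);
    MPI_Wait(&req, MPI_STATUS_IGNORE);
}

In 3D this would be repeated for each of the up to six directions; recv_buf has to be sized beforehand (e.g. after exchanging the counts).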

slide-35
SLIDE 35

Decompositions

(a) P = 2 (b) P = 4 Figure: Possible domain decompositions with 2 and 4 processes

slide-36
SLIDE 36

Communication volume

◮ The intersection of two adjacent halo regions determines the communication volume in that direction
◮ If you let LAMMPS determine your decomposition, it will try to minimize this volume

(a) xz halo region (b) xy halo region Figure: Halo regions of two different decompositions of a domain with an extent of 1x1x2.

slide-37
SLIDE 37

Influence of Process Mapping

◮ The mapping of processes to physical hardware determines the amount of intra-node and inter-node communication
◮ In (a), four processes must communicate with another node
◮ In (b), two processes must communicate with another node

(a) (b) Figure: Two process mappings of a 1x2x4 decomposed domain.

slide-38
SLIDE 38

Static and Dynamic load-balancing

◮ With the default geometric decomposition, balancing happens by shifting boundaries along each axis, creating a non-uniform grid
◮ Recursive Coordinate Bisection (RCB) gradually partitions space by inserting cut planes
◮ Both methods try to balance the load across processes, based either on the number of atoms or on a weight function

(a) Uniform grid (b) Non-Uniform grid (c) RCB

slide-39
SLIDE 39

Dynamic Load-Balancing in action

http://lammps.sandia.gov/movies/balance.mov

slide-40
SLIDE 40

Outline

Introduction
Core Algorithms
Geometric/Spatial Domain Decomposition
Hybrid MPI+OpenMP Parallelization

slide-41
SLIDE 41

Combining MPI with OpenMP

◮ LAMMPS has a variety of accelerator packages (USER-OMP, KOKKOS, GPU, INTEL)
◮ The USER-OMP package enables OpenMP threading within many simulation steps
◮ Multiple parts of the simulation loop can be parallelized within a node
◮ Why would you want to do this?
  ◮ Better utilization of node resources in some cases
  ◮ Reduced MPI communication overhead, using less bandwidth
◮ For best performance we have to minimize synchronization points inside steps
◮ Parallelization of integration steps is trivial (simple loops); see the sketch below
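For example, the first velocity-Verlet half-step has independent iterations, so a single worksharing pragma is enough (a sketch with assumed array layout and names, not the USER-OMP source):

// Threaded first integration half-step: each atom is updated independently.
void initial_integrate_omp(int nlocal, double (*x)[3], double (*v)[3], const double (*f)[3],
                           const double *mass, double dt)
{
    const double dtf = 0.5 * dt;
    #pragma omp parallel for
    for (int i = 0; i < nlocal; i++) {
        for (int d = 0; d < 3; d++) {
            v[i][d] += dtf * f[i][d] / mass[i];
            x[i][d] += dt * v[i][d];
        }
    }
}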

slide-42
SLIDE 42

Combining MPI with OpenMP

Setup → Integration Step 1 → Communication → Spatial Sorting → Neighbor List Building → Update Forces → Integration Step 2 → Output

slide-43
SLIDE 43

Newton’s 3rd Law: Data Conflict

◮ Using Newton’s 3rd law introduces a conflict
◮ Each force computation updates the forces of two atoms: F_ij and F_ji

#pragma omp parallel for
for (int i = 0; i < nlocal; i++) {
    // for each neighbor j {
        sys->fx[i] += fx;
        sys->fy[i] += fy;
        sys->fz[i] += fz;
        sys->fx[j] -= fx;  // multiple threads could
        sys->fy[j] -= fy;  // be writing j!
        sys->fz[j] -= fz;
    // }
}

slide-44
SLIDE 44

Avoiding conflict in force computation

Solutions:

◮ Disable Newton’s 3rd law (gives up the factor-2 saving!)
◮ Critical sections (bad)
◮ Atomics (better)
◮ Array reduction: each thread works on its own force array, which is later combined into the global force array (limited scalability per node)

Note

Array reduction is what is currently used in the USER-OMP package in LAMMPS. If multiple force computations are active, only the last one will perform the final array reduction.
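A sketch of the array-reduction pattern (illustrative names and layout, not the actual USER-OMP implementation):

#include <omp.h>
#include <vector>

// Each thread accumulates pair forces into its own copy of the force array;
// afterwards the copies are summed into the global per-atom forces f (length 3*nlocal).
void forces_with_array_reduction(int nlocal, std::vector<double> &f)
{
    const int nthreads = omp_get_max_threads();
    std::vector<double> scratch((std::size_t)nthreads * 3 * nlocal, 0.0);

    #pragma omp parallel
    {
        double *my_f = &scratch[(std::size_t)omp_get_thread_num() * 3 * nlocal];

        #pragma omp for
        for (int i = 0; i < nlocal; i++) {
            // ... compute pair forces and accumulate into my_f[3*i + d] and
            //     my_f[3*j + d]; no other thread ever writes to this copy.
        }
        // implicit barrier here, then reduce all per-thread copies into f
        #pragma omp for
        for (int k = 0; k < 3 * nlocal; k++)
            for (int t = 0; t < nthreads; t++)
                f[k] += scratch[(std::size_t)t * 3 * nlocal + k];
    }
}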

slide-45
SLIDE 45

Serial Neighbor List generation

Figure: Memory allocation during neighbor list building using memory paging. (a) Each atom gets enough memory space to store the maximum number of neighbors. (b) The actual neighbors are stored and the used space is reported back to the allocator. (c) The next neighbor list is then allocated starting right after the used portion of the previous one. (d) This is repeated until all neighbor lists are generated.

If multiple threads work on this neighbor list generation, allocation of memory needs to be a critical section and has to be serialized.
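A simplified sketch of this paged allocation pattern (a single fixed-size page with assumed names; LAMMPS uses a templated multi-page class, and the per-thread variant shown two slides later removes the serialization):

#include <vector>

// Paged neighbor-list storage: reserve room for the worst case, then report
// back how much was actually used so the next list starts right after it.
struct NeighborPage {
    std::vector<int> data;   // one large page holding neighbor indices
    std::size_t used = 0;    // portion of the page already committed

    explicit NeighborPage(std::size_t capacity) : data(capacity) {}

    // (a) hand out space for up to maxneigh neighbors of one atom
    int *reserve(std::size_t maxneigh) {
        if (used + maxneigh > data.size()) return nullptr;  // page full; real code grabs another page
        return &data[used];
    }
    // (b)/(c) commit the actually used space; the next list begins right after it
    void commit(std::size_t n_actual) { used += n_actual; }
};

If several threads shared one such page, reserve()/commit() would have to sit in a critical section, which is exactly the serialization mentioned above; one page structure per thread avoids it.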

slide-46
SLIDE 46

First touch policy

◮ Another problem is introduced by the first-touch policy used by Linux
◮ malloc reserves a memory block from the OS
◮ However, the actual mapping of allocated memory to physical memory only happens when you access that memory block for the first time
◮ The kernel will place this memory block in physical memory near the CPU core executing the accessing thread
◮ That means the thread that first accesses a memory location determines where a block of memory is placed
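A sketch of how codes typically cooperate with this policy: touch the data in parallel, with the same thread decomposition that the compute loops will use later (the array name and sizes are arbitrary for illustration):

#include <cstdlib>

int main()
{
    const long n = 1 << 24;
    double *forces = (double *)std::malloc(n * sizeof(double));  // reserved, but not yet mapped

    // First touch: each thread initializes the slice it will later work on, so the
    // kernel maps those pages to memory close to that thread's core.
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < n; i++)
        forces[i] = 0.0;

    // Compute loops should use the same static schedule, so every thread keeps
    // accessing the pages that are local to it.
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < n; i++)
        forces[i] += 1.0;

    std::free(forces);
    return 0;
}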

slide-47
SLIDE 47

Memory contention

[Diagram: four cores C0–C3 with private L1 caches, one L2 per core pair, a shared L3, and two memory channels A and B]

(a) global memory access  (b) thread-local memory access

Figure: Memory contention can limit scalability of threaded code. If all threads access memory which is located closer to a single core, the effective available bandwidth is reduced.

slide-48
SLIDE 48

Multi-threaded neighbor list generation


Figure: Multi-threaded generation of neighbor list using a page data structure for each thread. Each thread works on a sub-sequence of the atom list using its own memory pages to allocate neighbor lists.

slide-49
SLIDE 49

General performance recommendations for hybrid codes

◮ Use one MPI process per socket / memory channel
◮ Maximize utilization per MPI process by using OpenMP threads
◮ Bind threads to cores and ensure data locality
◮ Minimize load imbalance and synchronization