
SLIDE 1

Operated by Triad National Security, LLC for the U.S. Department of Energy's NNSA

SLIDE 2

Understanding Storage System Challenges for Parallel Scientific Simulations

Brad Settlemyer
Los Alamos National Laboratory

LA-UR-19-24811 | April 23, 2019

SLIDE 3

Outline

  • Intro to Computational Science
  • VPIC Overview
  • PIC Introduction
  • VPIC Scientific Workflow
  • VPIC I/O Workloads
  • Real VPIC I/O Challenges

SLIDE 4

A Brief Introduction to Computational Science

SLIDE 5

The Traditional Scientific Method

  • A method for understanding the physical world
  • Begins with observation
  • Some parts of the physical world are not well suited to observation:
  • Galaxy formations/collisions
  • Climate models
  • Asteroid collisions
  • Fluid dynamics

SLIDE 6

Incorporating Simulation into The Scientific Method

  • Computer-based simulation enables new scientific inquiry
  • Long time-scales
  • Complex interactions
  • Dangerous interactions
  • Computational Challenges

SLIDE 7

Incorporating Simulation into The Scientific Method

  • Computer-based simulation enables new scientific inquiry
  • Long time-scales
  • Complex interactions
  • Dangerous interactions
  • Computational Challenges
  • Tightly-coupled simulations imply bulk-synchronous I/O (see the sketch below)
  • A single job may require months of compute time
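A minimal sketch of the bulk-synchronous pattern described above (illustrative only, not VPIC source; the state size, step count, and checkpoint interval are made-up values): every rank must reach the same time step before the burst of checkpoint writes can begin, and computation resumes only after all ranks finish their I/O.

```cpp
// Illustrative bulk-synchronous checkpoint loop (not VPIC source).
// Every rank advances in lock-step; a checkpoint is a barrier followed
// by a burst of writes, and computation resumes only after all I/O completes.
#include <mpi.h>
#include <cstdio>
#include <vector>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    std::vector<double> local_state(1 << 20, 0.0);    // stand-in for particles/fields
    const int steps = 100, checkpoint_interval = 25;  // made-up values

    for (int step = 0; step < steps; ++step) {
        // ... advance local physics, exchange boundary conditions with neighbors ...

        if (step % checkpoint_interval == 0) {
            MPI_Barrier(MPI_COMM_WORLD);              // all ranks reach the same step
            char path[64];
            std::snprintf(path, sizeof(path), "ckpt.%d.%d", step, rank);
            if (std::FILE *f = std::fopen(path, "wb")) {
                std::fwrite(local_state.data(), sizeof(double), local_state.size(), f);
                std::fclose(f);
            }
            MPI_Barrier(MPI_COMM_WORLD);              // resume only when everyone is done
        }
    }
    MPI_Finalize();
    return 0;
}
```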

SLIDE 8

1. Create Mesh (Computational Science Workflow)

  • Fixed mesh (valves, cylinders)
  • Adaptive mesh (turbulent combustion)
  • Mesh deformation (shock propagating in fluid)

SLIDE 9

2. Calculate Physics (Computational Science Workflow)

  • Often takes weeks or months
  • Figure shows particle-in-cell (PIC) method
  • Many other methods:
  • Finite Element Methods
  • Finite Difference Methods
  • Monte Carlo Methods
  • The actual scientific question being answered typically favors one method or another (a toy PIC step is sketched below)
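For orientation, a deliberately minimal 1D particle-in-cell step (a toy, nowhere near VPIC's relativistic electromagnetic solver; nearest-cell weighting and unit charge are simplifications): gather the field at each particle's cell, push the particle, then deposit charge back onto the mesh.

```cpp
// Toy 1D particle-in-cell step (vastly simpler than VPIC): gather the field
// at each particle's cell, push the particle, then deposit charge onto the mesh.
// Assumes positions start in [0, nc*dx) and uses nearest-cell weighting.
#include <algorithm>
#include <cstddef>
#include <vector>

struct Particle { double x, v; };

void pic_step(std::vector<Particle> &particles, const std::vector<double> &E,
              std::vector<double> &rho, double dx, double dt, double qm) {
    std::fill(rho.begin(), rho.end(), 0.0);
    const std::size_t nc = E.size();
    const double L = nc * dx;
    for (auto &p : particles) {
        std::size_t cell = static_cast<std::size_t>(p.x / dx) % nc;  // gather
        p.v += qm * E[cell] * dt;                                    // accelerate
        p.x += p.v * dt;                                             // move
        while (p.x < 0.0) p.x += L;                                  // periodic wrap
        while (p.x >= L)  p.x -= L;
        cell = static_cast<std::size_t>(p.x / dx) % nc;
        rho[cell] += 1.0;                                            // deposit unit charge
    }
}
```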

SLIDE 10

3. Generate Data (Computational Science Workflow)

  • Simulation pauses when all processes reach some interesting point in the simulation
  • Save state to protect against a failure (checkpoint/restart)
  • Save state for later analysis
  • Machine failures and scientific insight occur at different frequencies (a standard rule of thumb for picking the checkpoint interval is given below)
  • Once I/O is complete, simulation resumes

[Figure: compute clients connected through PFS routers and an I/O burst buffer (InfiniBand) to Lustre OSS, MDS, and OST servers]
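A common rule of thumb for balancing those two frequencies, not something stated in this talk, is Young's approximation for the checkpoint interval: with checkpoint write time $\delta$ and system mean time between failures $M$,

```latex
\tau_{\text{opt}} \approx \sqrt{2\,\delta\,M}
% e.g. a 10-minute dump on a system with a 24-hour MTBF:
% \tau_{\text{opt}} \approx \sqrt{2 \cdot 10\,\text{min} \cdot 1440\,\text{min}} \approx 170\,\text{min}
```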

SLIDE 12

4. Analyze Data (Computational Science Workflow)

  • Scientists analyze/visualize simulation output
  • Test and validate hypotheses
  • Source of new phenomena observations!
  • Automatic and in-situ analysis emerging as relevant to some scientific fields

SLIDE 13

What makes HPC computing unique and difficult?

  • Simulation Scale
  • Frequently billions or trillions of mesh cells (1.5PB simulations on Trinity)
  • Simulations run for weeks or months
  • Longest simulation on Trinity: 7 months
  • Longest I’ve heard of: 18 months
  • Universe tends toward disorder (entropy increases)
  • As simulation progresses, high % of memory is frequently modified
  • Tight coupling, frequent communication due to boundary condition exchanges and load balancing over time

  • Large storage system requirements
  • Checkpoint/restart bursts to support long running jobs
  • Capacity to store large quantities of restart dumps and analysis data

SLIDE 14

An Overview of VPIC

SLIDE 15

Quick Particle-In-Cell (PIC) Overview

Particles model material

  • Millions of particles per process
  • Trillions of particles per simulation

Fixed Mesh

  • Method extends to 3D well
  • Each process maintains a contiguous chunk of the mesh (see the decomposition sketch below)
  • Updates fields and materials
  • Solves the Maxwell-Boltzmann kinetic equations
  • Applications in astrophysics, fusion, plasma interactions

[Figure: Node 0 through Node 3 each own one block of the fixed mesh, shown at time steps t0 and t1]

PIC Introduction: https://www.youtube.com/watch?v=CmhSWPpa_6w
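A sketch of how a rank can be mapped to a contiguous brick of the fixed mesh (illustrative only; ranks_per_dim, global_cells, and the even-division assumption are mine, and VPIC's actual topology code may differ).

```cpp
// Illustrative block decomposition: each MPI rank owns one contiguous brick of
// the global fixed mesh. ranks_per_dim and global_cells are assumed inputs, and
// the mesh is assumed to divide evenly across ranks.
#include <array>

struct Brick { std::array<int, 3> lo, hi; };  // owned cell range per axis: [lo, hi)

Brick local_brick(int rank, const std::array<int, 3> &ranks_per_dim,
                  const std::array<int, 3> &global_cells) {
    // Unflatten the rank into 3D process coordinates (x varies fastest).
    std::array<int, 3> coord;
    coord[0] = rank % ranks_per_dim[0];
    coord[1] = (rank / ranks_per_dim[0]) % ranks_per_dim[1];
    coord[2] = rank / (ranks_per_dim[0] * ranks_per_dim[1]);

    Brick b;
    for (int d = 0; d < 3; ++d) {
        const int cells_per_rank = global_cells[d] / ranks_per_dim[d];
        b.lo[d] = coord[d] * cells_per_rank;
        b.hi[d] = b.lo[d] + cells_per_rank;
    }
    return b;
}
```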

SLIDE 16

Why do I/O researchers use VPIC?

  • Excellent scaling
  • Demonstrated across 4096 Trinity nodes (32k processes)

  • Flexible code
  • Popular CS languages (engine is 16k sloc C/C++)
  • Supports MPI, OpenMP, and Pthreads
  • Can be field dominant or particle dominant
  • Can be compute/comm/memory intensive

SLIDE 17

VPIC’s Simulation Science Workflow

[Workflow figure: the simulation science pipeline runs as phases S1 through S5: setup/parameterize/create mesh (initial mesh setup, sim input deck), simulate physics, down-sample, post-process, and viz. Outputs fall along a data-retention axis from temporary to forever: checkpoint dumps (4–8x per week, 5–15x per pipeline), time-step data sets (5–10x per week), sampled data sets, and analysis data sets. A campaign spans job begin to job end.]

SLIDE 18

VPIC Checkpoint/Restart

  • Essential for simulations running for long durations over thousands of nodes
  • Basic paradigms: N-N, N-M, N-1 (N-N and N-1 are sketched below)
  • Typically the largest consumer of bandwidth/capacity
  • In general must store both the particles and the fields
  • Why?! Performance!
  • Approximately 80% of system memory
  • VPIC uses N-N file organization for checkpoint/restart
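A hedged sketch of the two simplest paradigms named above (illustrative, not VPIC source; file names and the equal-buffer-size assumption are mine): N-N writes one file per rank, while N-1 has every rank write into a single shared file at its own offset via MPI-IO. One file per process avoids shared-file contention, which is part of the "Why?! Performance!" answer, but it also creates per-process files and metadata, which is how a large run can exhaust a file system's inodes (see the conclusions).

```cpp
// Sketch of N-N (one file per rank) vs. N-1 (one shared file) checkpointing.
// Illustrative only; buffer sizes are assumed equal across ranks.
#include <mpi.h>
#include <cstdio>
#include <vector>

void checkpoint_n_to_n(int rank, const std::vector<char> &state) {
    char path[64];
    std::snprintf(path, sizeof(path), "restart.%06d", rank);
    if (std::FILE *f = std::fopen(path, "wb")) {      // one file (and inode) per rank
        std::fwrite(state.data(), 1, state.size(), f);
        std::fclose(f);
    }
}

void checkpoint_n_to_1(const std::vector<char> &state, int rank) {
    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "restart.shared",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    // Each rank writes at its own offset in the single shared file.
    MPI_Offset offset = static_cast<MPI_Offset>(rank) * state.size();
    MPI_File_write_at_all(fh, offset, state.data(), static_cast<int>(state.size()),
                          MPI_BYTE, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);
}
```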

SLIDE 19

HPC Checkpoint Workload

[Figure: LANL's Trinity storage hierarchy: platform memory (2 PiB, PiB/s), burst buffer (3.5 PB, 2.5 TB/s), Lustre PFS (78 PB, 1.1 TB/s), campaign storage/archive]
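A back-of-the-envelope dump-time estimate from the figures on this slide and the roughly 80%-of-memory checkpoint size on the previous slide (simple arithmetic, not a measured result):

```latex
% Checkpoint ~80% of platform memory: 0.8 \times 2\,\text{PiB} \approx 1.8\,\text{PB}
t_{\text{burst buffer}} \approx \frac{1.8\,\text{PB}}{2.5\,\text{TB/s}} \approx 720\,\text{s} \approx 12\,\text{min},
\qquad
t_{\text{Lustre}} \approx \frac{1.8\,\text{PB}}{1.1\,\text{TB/s}} \approx 1640\,\text{s} \approx 27\,\text{min}
```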

SLIDE 20

VPIC Time Step Data Sets

  • Types of data
  • Particles (32 – 48 bytes each)
  • Fields (typically <1k, but could be much more)
  • Cell Materials (often 0 bytes)
  • 2 primary methods for data reduction (decimation is sketched below)
  • Sampling (mean, spatial average, etc.)
  • Decimation
  • Scientist typically determines the processing methods needed
  • Frequently not well optimized
  • Bound on bandwidth and performed on front ends
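A sketch of what a particle record and the simpler of the two reductions (decimation) might look like; the field names, the 32-byte layout, and the stride parameter are illustrative assumptions, not VPIC's actual dump format.

```cpp
// Illustrative 32-byte particle record and a trivial decimation pass (keep one
// of every `stride` particles). Layout is an assumption, not VPIC's dump format.
#include <cstddef>
#include <cstdint>
#include <vector>

struct ParticleRecord {            // 32 bytes: position, momentum, weight, id
    float x, y, z;
    float ux, uy, uz;
    float w;
    std::uint32_t id;
};
static_assert(sizeof(ParticleRecord) == 32, "expected a 32-byte record");

std::vector<ParticleRecord> decimate(const std::vector<ParticleRecord> &in,
                                     std::size_t stride) {
    std::vector<ParticleRecord> out;
    out.reserve(in.size() / stride + 1);
    for (std::size_t i = 0; i < in.size(); i += stride)
        out.push_back(in[i]);      // discard (stride - 1) of every stride particles
    return out;
}
```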

SLIDE 21

VPIC Visualization

  • Format the data into a parallel visualization format
  • ParaView, EnSight, VisIt, etc.
  • Visualization workflows are typically bound on read performance
  • Interactivity defeats pre-fetching algorithms
  • Viewing doesn't always occur along the contiguous dimension (see the sketch below)
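A small sketch of why reads against the contiguous dimension hurt. Assuming a row-major field dump (a layout assumption on my part; the actual VPIC file layout may differ), a slice along the slowest-varying dimension is one large sequential read, while a slice across the fastest-varying dimension decomposes into many tiny strided reads that defeat prefetching.

```cpp
// For a row-major nx*ny*nz float field (layout assumption), a fixed-z slice
// touches one value per nz-long contiguous run, so it becomes nx*ny tiny
// seek+read pairs, while a fixed-x slice is a single contiguous read.
// Real viz tools use MPI-IO/HDF5 selections, but the access pattern is the same.
#include <cstddef>
#include <cstdio>
#include <vector>

std::vector<float> read_z_slice(std::FILE *f, int nx, int ny, int nz, int z_fixed) {
    std::vector<float> slice(static_cast<std::size_t>(nx) * ny);
    for (int x = 0; x < nx; ++x) {
        for (int y = 0; y < ny; ++y) {
            // Row-major offset of element (x, y, z_fixed).
            const long offset =
                ((static_cast<long>(x) * ny + y) * nz + z_fixed) * static_cast<long>(sizeof(float));
            std::fseek(f, offset, SEEK_SET);
            std::fread(&slice[static_cast<std::size_t>(x) * ny + y], sizeof(float), 1, f);
        }
    }
    return slice;  // nx*ny single-element reads scattered across the file
}
```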

SLIDE 22

Real VPIC I/O Challenges

SLIDE 23

Tracking the Trajectory of High-Energy Particles

Assumptions:

  • Simulation has trillions of particles
  • Highest-energy particles only known at simulation end
  • Insufficient memory to track the history of each particle

Goal:

  • Determine if the trajectory of the high-energy particles follows Fermi acceleration between magnetic islands
  • Highly selective queries (a scan sketch follows below)
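A sketch of what that selective query costs without an index (the 32-byte record layout mirrors the illustrative one sketched earlier; dump_path and the id-set interface are assumptions): because the interesting ids are only known at the end, every per-time-step dump must be scanned in full just to recover a handful of trajectories.

```cpp
// Selective trajectory query without an index: scan a whole particle dump and
// keep only the requested ids. Record layout and file naming are illustrative.
#include <cstdint>
#include <cstdio>
#include <unordered_set>
#include <vector>

struct ParticleRecord { float x, y, z, ux, uy, uz, w; std::uint32_t id; };

std::vector<ParticleRecord> extract_ids(const char *dump_path,
                                        const std::unordered_set<std::uint32_t> &wanted) {
    std::vector<ParticleRecord> hits;
    std::FILE *f = std::fopen(dump_path, "rb");
    if (!f) return hits;
    ParticleRecord rec;
    while (std::fread(&rec, sizeof(rec), 1, f) == 1)  // full sequential scan
        if (wanted.count(rec.id))                     // keep only the selected particles
            hits.push_back(rec);
    std::fclose(f);
    return hits;
}
// Called once per time-step dump; with trillions of records per dump, the
// query is entirely storage-bandwidth bound.
```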

SLIDE 24

Spatial distribution of particles within an energy band

Assumptions:

  • Simulation has trillions of particles
  • Energy distribution changing over time

Goals:

  • Filter particles by energy band to examine the spatial location of energy bands
  • Scan intensive workload (see the sketch below)

Image and problem from "Parallel I/O, Analysis, and Visualization of a Trillion Particle Simulation," Byna et al.
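A sketch of the energy-band filter (illustrative; the cited work uses indexed, parallel analysis rather than a serial pass, and the unit-mass, c = 1 energy formula and x-only binning are simplifications): every particle is read once, its energy computed from momentum, and in-band particles are binned by position.

```cpp
// Scan-intensive energy-band filter: touch every particle once, compute its
// energy from momentum, and bin the in-band particles by x position.
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

struct ParticleRecord { float x, y, z, ux, uy, uz, w; std::uint32_t id; };

std::vector<std::size_t> band_histogram(const std::vector<ParticleRecord> &particles,
                                        double e_lo, double e_hi,
                                        double x_min, double x_max, std::size_t bins) {
    std::vector<std::size_t> hist(bins, 0);
    for (const auto &p : particles) {
        // Relativistic kinetic energy (per unit mass, c = 1): gamma - 1.
        const double u2 = double(p.ux) * p.ux + double(p.uy) * p.uy + double(p.uz) * p.uz;
        const double energy = std::sqrt(1.0 + u2) - 1.0;
        if (energy < e_lo || energy >= e_hi) continue;   // outside the band
        const double frac = (p.x - x_min) / (x_max - x_min);
        if (frac < 0.0 || frac >= 1.0) continue;         // outside the domain
        ++hist[static_cast<std::size_t>(frac * bins)];   // spatial distribution
    }
    return hist;  // every particle is touched once: scan intensive by construction
}
```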

SLIDE 25

The tip of the iceberg …

  • Where are the largest clusters of similarly charged particles (i.e. magnetic islands)?
  • Which particles have most recently moved between magnetic islands?
  • Which particles are moving as groups and how are they moving?
  • Is it possible to develop a taxonomy of formations that occur during a magnetic reconnection?
  • And more …

SLIDE 26

Conclusions

  • VPIC is an excellent resource for I/O researchers
  • Open source
  • Popular programming languages (subsets)
  • Doesn’t require exotic compilers
  • Highly scalable
  • Important scientific problems
  • VPIC scientists have real I/O problems
  • A VPIC researcher has consumed all of the Trinity storage system's inodes
  • Extremely small writes are an unsolved problem
  • Data analysis performance severely limits current insight

SLIDE 27

Thanks!