Efficient Scientific Data Management on Supercomputers
Suren Byna, Scientific Data Management Group, LBNL
▪ Simulations
▪ Experiments
▪ Observations
2
Scientific Data - Where is it coming from?
3
Life of scientific data
Generation → In situ analysis → Processing → Storage → Analysis → Preservation → Sharing → Refinement
4
Supercomputing systems
5
Typical supercomputer architecture
Cori system
[Figure: Cori system architecture: compute nodes on the Aries high-speed network; burst buffer nodes (2x SSD each); I/O nodes (2x InfiniBand HCA); an InfiniBand storage fabric; and Lustre storage servers (OSSs/OSTs).]
▪ Data representation
– Metadata, data structures, data models
▪ Data storage
– Storing and retrieving data and metadata on file systems quickly
▪ Data access
– Improving the performance of the data access patterns scientists need
▪ Facilitating analysis
– Strategies for helping scientists find meaning in the data
▪ Data transfers
– Transferring data within a supercomputing system and between different systems
6
Scientific Data Management in supercomputers
▪ Storing and retrieving data – Parallel I/O
– Software stack
– Modes of parallel I/O
– Tuning parallel I/O performance
▪ Autonomous data management system
– Proactive Data Containers (PDC) system
– Metadata management service
– Data management service
8
Focus of this presentation
Trends – Storage system transformation
9
[Figure: Storage system transformation]
- Conventional: Memory → (I/O gap) → Parallel file system (Lustre, GPFS) → Archival storage (HPSS tape)
- Current, e.g. Cori @ NERSC: Memory → Shared burst buffer → Parallel file system (Lustre, GPFS) → Archival storage (HPSS tape)
- Upcoming, e.g. Aurora @ ALCF: Memory → Node-local storage → Shared burst buffer → Parallel file system → Campaign storage → Archival storage (HPSS tape)
- The I/O performance gap in HPC storage is a significant bottleneck because of slow disk-based storage
- SSDs and new memory technologies are trying to fill the gap, but they increase the depth of the storage hierarchy
Applications → High-Level I/O Libraries → I/O Middleware → I/O Forwarding → Parallel File System → I/O Hardware
10
Contemporary Parallel I/O Software Stack
Applications → High-Level I/O Library (HDF5, NetCDF, ADIOS) → I/O Middleware (MPI-IO) → I/O Forwarding → Parallel File System (Lustre, GPFS, ...) → I/O Hardware
11
Parallel I/O software stack
▪ I/O Libraries (a parallel HDF5 example follows this list)
– HDF5 (The HDF Group) [LBL, ANL]
– ADIOS (ORNL)
– PnetCDF (Northwestern, ANL)
– NetCDF-4 (UCAR)
▪ Middleware – POSIX-IO, MPI-IO (ANL)
▪ I/O Forwarding
▪ File systems: Lustre (Intel), GPFS (IBM), DataWarp (Cray), ...
▪ I/O Hardware (disk-based, SSD-based, ...)
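To make the stack concrete, here is a minimal sketch of a collective parallel HDF5 write: the application calls HDF5, HDF5 drives MPI-IO, and MPI-IO talks to the parallel file system. The file name, dataset name, and sizes are illustrative, not taken from the slides.

    /* Minimal sketch: one HDF5 dataset written collectively by all MPI ranks. */
    #include <mpi.h>
    #include <hdf5.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank, nprocs;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        /* Open the file through the MPI-IO driver */
        hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
        H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
        hid_t file = H5Fcreate("particles.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

        /* One row of 1M doubles per rank in a shared 2-D dataset */
        hsize_t dims[2] = {(hsize_t)nprocs, 1048576};
        hid_t filespace = H5Screate_simple(2, dims, NULL);
        hid_t dset = H5Dcreate(file, "x", H5T_NATIVE_DOUBLE, filespace,
                               H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

        /* Each rank selects its own row (hyperslab) of the dataset */
        hsize_t start[2] = {(hsize_t)rank, 0}, count[2] = {1, 1048576};
        H5Sselect_hyperslab(filespace, H5S_SELECT_SET, start, NULL, count, NULL);
        hid_t memspace = H5Screate_simple(2, count, NULL);

        static double buf[1048576];                      /* application data */
        hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
        H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);    /* collective MPI-IO */
        H5Dwrite(dset, H5T_NATIVE_DOUBLE, memspace, filespace, dxpl, buf);

        H5Pclose(dxpl); H5Sclose(memspace); H5Sclose(filespace);
        H5Dclose(dset); H5Pclose(fapl); H5Fclose(file);
        MPI_Finalize();
        return 0;
    }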
▪ Types of parallel I/O
– 1 writer/reader, 1 file
– N writers/readers, N files (file-per-process)
– N writers/readers, 1 file (see the MPI-IO sketch after this list)
– M writers/readers, 1 file (aggregators, two-phase I/O)
– M aggregators, M files (file-per-aggregator)
– Variations of these modes
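A minimal sketch of the "N writers, 1 shared file" mode expressed directly at the MPI-IO level: each rank writes its block at a rank-derived offset with a collective call, so that aggregators (two-phase I/O) can combine the requests. The file name and element count are illustrative.

    #include <mpi.h>
    #include <stdlib.h>

    #define NELEM 1048576                 /* doubles per rank; illustrative size */

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        double *buf = malloc(NELEM * sizeof(double));
        for (int i = 0; i < NELEM; i++) buf[i] = rank;    /* fill with rank id */

        MPI_File fh;
        MPI_File_open(MPI_COMM_WORLD, "shared.dat",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

        /* Each rank's block lands at a rank-based offset in the shared file */
        MPI_Offset offset = (MPI_Offset)rank * NELEM * sizeof(double);
        MPI_File_write_at_all(fh, offset, buf, NELEM, MPI_DOUBLE,
                              MPI_STATUS_IGNORE);         /* collective write */

        MPI_File_close(&fh);
        free(buf);
        MPI_Finalize();
        return 0;
    }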
12
Parallel I/O – Application view
[Figure: Application views of the parallel I/O modes: 1 writer/reader to 1 file; n writers/readers to n files (file-per-process); n writers/readers to 1 shared file; M writers/readers to M files; and M writers/readers to 1 shared file.]
▪ Parallel file systems
– Lustre and Spectrum Scale (GPFS)
▪ Typical building blocks of parallel file systems
– Storage hardware: HDD or SSD RAID
– Storage servers (in Lustre, Object Storage Servers [OSS] and Object Storage Targets [OST])
– Metadata servers
– Client-side processes and interfaces
▪ Management
– Stripe files for parallelism (see the striping sketch after this list)
– Tolerate failures
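Striping is what spreads a file across OSTs for parallelism. Below is a small sketch of requesting Lustre striping through MPI-IO hints when creating a file; "striping_factor" and "striping_unit" are standard ROMIO hint names, but whether they are honored depends on the MPI library and file system, and the values are illustrative rather than recommendations. The same effect can usually be had from the shell with `lfs setstripe -c 8 -S 16m output.dat` before the job runs.

    #include <mpi.h>

    /* Open a file for writing with requested Lustre striping (illustrative). */
    void open_striped(MPI_Comm comm, const char *path, MPI_File *fh)
    {
        MPI_Info info;
        MPI_Info_create(&info);
        MPI_Info_set(info, "striping_factor", "8");        /* stripe over 8 OSTs */
        MPI_Info_set(info, "striping_unit", "16777216");   /* 16 MB stripes */
        MPI_File_open(comm, (char *)path,                  /* cast for old MPI prototypes */
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, info, fh);
        MPI_Info_free(&info);
    }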
13
Parallel I/O – System view
[Figure: A file's logical view versus its physical view on a parallel file system: the file is striped across OST 0 through OST 3, reached over the communication network.]
14
How to achieve peak parallel I/O performance?
▪ The parallel I/O software stack provides options for performance optimization at each layer (a tuning sketch follows this list):
– Application
– HDF5: alignment, chunking, etc.
– MPI-IO: enabling collective buffering, sieving buffer size, collective buffer size, collective buffer nodes, etc.
– Parallel file system: number of I/O nodes, stripe size, enabling prefetching buffer, etc.
– Storage hardware
▪ Challenge: complex inter-dependencies among the software and hardware layers
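A minimal sketch of setting a few of these knobs programmatically, assuming an MPI + HDF5 application; the specific values are illustrative, since good settings depend on the system and the access pattern.

    #include <mpi.h>
    #include <hdf5.h>

    /* File access property list with MPI-IO collective buffering hints and
     * HDF5 alignment tuned (illustrative values). */
    hid_t make_tuned_fapl(void)
    {
        MPI_Info info;
        MPI_Info_create(&info);
        MPI_Info_set(info, "romio_cb_write", "enable");    /* collective buffering */
        MPI_Info_set(info, "cb_nodes", "8");               /* aggregator count */
        MPI_Info_set(info, "cb_buffer_size", "16777216");  /* 16 MB per aggregator */

        hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
        H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, info);
        H5Pset_alignment(fapl, 1048576, 16777216);  /* align objects >= 1 MB to 16 MB */
        MPI_Info_free(&info);
        return fapl;
    }

    /* Dataset creation property list with a chunked layout (illustrative shape). */
    hid_t make_chunked_dcpl(void)
    {
        hsize_t chunk[2] = {1, 1048576};
        hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
        H5Pset_chunk(dcpl, 2, chunk);
        return dcpl;
    }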
Tuning parameter space
[Figure: The whole tuning parameter space visualized: Lustre stripe_count and stripe_size (MB), MPI-IO cb_nodes, cb_buffer_size (MB), and sieve buffer size (KB), and HDF5 alignment combine into roughly 23,040 configurations.]
15
▪ Simulation of magnetic reconnection (a space weather phenomenon) with the VPIC code
– 120,000 cores
– 8 arrays (HDF5 datasets)
– 32 TB to 42 TB files at 10 time steps
▪ Extracted I/O kernel
▪ M aggregators to 1 shared file
▪ Trial-and-error selection of Lustre file system parameters while scaling the problem size
▪ Reached peak performance in many instances in a real simulation
16
Tuning for writing trillion particle datasets
More details: SC12 and CUG 2013 papers
Tuning combinations are abundant
- Searching through all combinations manually is impractical
- Users, typically domain scientists, should not be burdened with tuning
- Performance auto-tuning has been explored heavily for optimizing matrix operations
- Auto-tuning for parallel I/O is challenging due to the shared I/O subsystem and slow I/O
- Need a strategy to reduce the search space with some knowledge
17
Our solution: I/O Auto-tuning
- Auto-tuning framework to search the parameter space with a reduced number of combinations
- The HDF5 I/O library sets the optimization parameters
- H5Tuner: dynamic interception of HDF5 calls
- H5Evolve:
– Genetic algorithm based selection
– Model-based selection
18
Dynamic Model-driven Auto-tuning
- Auto-tuning using empirical performance models of I/O
- Steps
– Training phase to develop an I/O model
– Pruning phase to select the top-k configurations
– Exploration phase to select the best configuration
– Refitting step to refine the performance model
[Figure: Overview of dynamic model-driven I/O tuning: a training set of I/O kernel runs on the HPC system feeds model generation; pruning selects the top k configurations from all possible values; exploration selects the best-performing configuration; and refitting (controlled by the user) refines the I/O model with new performance results.]
19
Empirical Performance Model
- Non-linear regression model
- Linear combination of $n_b$ non-linear, low-order polynomial basis functions $\phi_k$ and hyper-parameters $\beta$ (selected with a standard regression approach) for a parameter configuration $x$
- For example:
- f: file size; a: number of aggregators; c: stripe count; s: stripe size
$m(x;\beta) = \sum_{k=1}^{n_b} \beta_k \phi_k(x)$

$m(x) = \beta_1 + \beta_2 \frac{1}{s} + \beta_3 \frac{1}{a} + \beta_4 \frac{c}{s} + \beta_5 \frac{f}{c} + \beta_6 \frac{f}{s} + \beta_7 \frac{c f}{a}$, with a fit to the data yielding $\beta = [10.59, 68.99, 59.83, -1.23, 2.26, 0.18, 0.01]$
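A small sketch that evaluates this example model for one configuration. The coefficients are the fitted values listed above; the sample inputs and the units of f and s are illustrative assumptions, since the slide does not spell them out.

    #include <stdio.h>

    /* Evaluate m(x) = b1 + b2/s + b3/a + b4*c/s + b5*f/c + b6*f/s + b7*c*f/a
     * for file size f, aggregators a, stripe count c, stripe size s. */
    double model_cost(double f, double a, double c, double s)
    {
        const double b[7] = {10.59, 68.99, 59.83, -1.23, 2.26, 0.18, 0.01};
        return b[0] + b[1] / s + b[2] / a + b[3] * c / s
                    + b[4] * f / c + b[5] * f / s + b[6] * c * f / a;
    }

    int main(void)
    {
        /* Hypothetical configuration: f=512, a=64 aggregators, c=32 OSTs, s=16 */
        printf("predicted I/O cost: %.2f\n", model_cost(512.0, 64.0, 32.0, 16.0));
        return 0;
    }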
20
Performance Improvement: 4K cores
[Figure: I/O bandwidth (GB/s) for VPIC-IO, VORPAL-IO, and GCRM-IO at 4K cores on Edison, Hopper, and Stampede, with tuned configurations compared against the default settings on Hopper.]
21
Performance Improvement: 8K cores
[Figure: I/O bandwidth (GB/s) for VPIC-IO, VORPAL-IO, and GCRM-IO at 8K cores on Edison and Hopper, compared against the Hopper defaults; a 94x improvement is highlighted.]
22
23
Autonomous data management
Storage Systems and I/O: Current status
24
[Figure: Current usage: applications hold data in memory and go through the I/O software stack (high-level libraries such as HDF5, POSIX / MPI-IO middleware, I/O forwarding, parallel file systems) down to the storage hardware, ending up as files in the file system.]
- Challenges
– The multi-level hierarchy complicates data movement, especially if the user has to be involved
– POSIX-IO semantics hinder the scalability and performance of file systems and I/O software
– The burden of tuning the middleware and the file systems falls on the user
[Figure: Deepening storage hierarchy: memory, node-local storage, shared burst buffer, parallel file system, campaign storage, archival storage (HPSS tape).]
HPC data management requirements
Use case | Domain | Sim / EOD / Analysis | Data size | I/O requirements
FLASH | High-energy density physics | Simulation | ~1PB | Data transformations, scalable I/O interfaces, correlation among simulation and experimental data
CMB / Planck | Cosmology | Simulation, EOD/Analysis | 10PB | Automatic data movement optimizations
DECam & LSST | Cosmology | EOD/Analysis | ~10TB | Easy interfaces, data transformations
ACME | Climate | Simulation | ~10PB | Async I/O, derived variables, automatic data movement
TECA | Climate | Analysis | ~10PB | Data organization and efficient data movement
HipMer | Genomics | EOD/Analysis | ~100TB | Scalable I/O interfaces, efficient and automatic data movement
25
Easy interfaces and superior performance
Autonomous data management
Information capture and management
25
Storage Systems and I/O: Next Generation
[Figure: Next generation: applications hold data in memory and use a single high-level API, with the data management software mapping the data onto the storage hardware.]
- Next generation I/O software
– Autonomous, proactive data management system beyond POSIX restrictions
- Transparent data movement
- Proactive analysis
– Object-centric storage interface
- Rich metadata
- Data and metadata accessible through queries
– Transparent data object placement and organization across storage hardware layers
26
[Figure: Storage hierarchy spanned by the system: memory, node-local storage, shared burst buffer, parallel file system, campaign storage, archival storage (HPSS tape).]
What is an object store?
– Simple POSIX file system interface: chmod, open, read, lseek, write, close, stat, unlink
– Object store interface: get, put, delete
Slide from Glenn Lockwood
27
What is an object?
- Chunks of a file
- Files (images, videos, etc.)
- Array
- Key-value pairs
- File + Metadata
(Examples: current parallel file systems; cloud services such as S3; HDF5, DAOS, etc.; OpenStack Swift, MarFS, Ceph, etc.)
PDC Interpretation of Objects
Data + Metadata + Provenance + Analysis operations + Information (data products)
Proactive Data Containers (PDC)
[Figure: PDC constructs: containers, collections, and PDC loci.]
30
▪ Interface
– Programming and client-level interfaces
▪ Services
– Metadata management
– Autonomous data movement
– Analysis and transformation task execution
▪ PDC locus services
– Object mapping
– Local metadata management
– Locus task execution
PDC System – High-level Architecture
31
Persistent Storage API
(burst buffer file systems, Lustre, DAOS, ...)
PDC System – High-level Architecture
32
Data Management Using the PDC System
33
[Figure: Application processes exchanging data and metadata with PDC system processes.]
- Storing data
– The application declares persistent data objects → PDC creates metadata objects
– The application adds 'tags' / properties to identify objects in the future → PDC adds these as metadata
– Application processes map memory buffers to regions of objects
– When the data in the objects are ready, the PDC system moves the data to storage and updates the metadata → asynchronous and autonomous
- Retrieving data
– The application queries metadata to find the desired objects ← the PDC system returns handles to those objects
– The application maps a region of the object or gives a query condition ← the PDC system brings the desired data to memory
- Create & open objects
– Create sets object properties (metadata): name, lifetime, user info, provenance, tags, dimensions, data type, transformations, etc.
- Create an object region
– Similar to HDF5 hyperslab selections
- Map / unmap an object region
– Object region <=> memory region
- Lock / unlock a mapped region
– Read / write locks
– Transparently update the memory buffer / object, asynchronously
– Transforms occur "outside" of lock time, managed by the PDC system
- Close & release (delete) objects
PDC API – Object Manipulation
34
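A hypothetical sketch of the write path described above: declare a persistent object with tags, map a memory buffer to an object region, and update it under a write lock so the PDC service can move the data to storage asynchronously. The function names and signatures below are illustrative placeholders only; they are not the actual PDC client API, and they simply mirror the flow on this slide.

    #include <stdint.h>

    /* Illustrative placeholder API; the real PDC calls differ (see the PDC docs). */
    typedef int64_t pdc_id_t;
    pdc_id_t pdc_prop_create(void);
    void     pdc_prop_set_dims(pdc_id_t prop, int ndim, const uint64_t *dims);
    void     pdc_prop_add_tag(pdc_id_t prop, const char *key, const char *value);
    pdc_id_t pdc_obj_create(const char *name, pdc_id_t prop);
    pdc_id_t pdc_region_create(int ndim, const uint64_t *offset, const uint64_t *size);
    void     pdc_buf_map(void *buf, pdc_id_t obj, pdc_id_t region);
    void     pdc_lock(pdc_id_t obj, pdc_id_t region, int write);
    void     pdc_unlock(pdc_id_t obj, pdc_id_t region);
    void     pdc_buf_unmap(void *buf, pdc_id_t obj, pdc_id_t region);
    void     pdc_obj_close(pdc_id_t obj);

    /* Write one array of particle coordinates as a persistent object. */
    void write_particle_x(double *x, uint64_t n)
    {
        pdc_id_t prop = pdc_prop_create();
        pdc_prop_set_dims(prop, 1, &n);                /* object dimensions */
        pdc_prop_add_tag(prop, "app", "VPIC");         /* searchable metadata */
        pdc_prop_add_tag(prop, "timestep", "0");

        pdc_id_t obj = pdc_obj_create("particle_x", prop);

        /* Map the in-memory buffer to a region of the object */
        uint64_t offset = 0;
        pdc_id_t region = pdc_region_create(1, &offset, &n);
        pdc_buf_map(x, obj, region);

        /* Update under a write lock; after release, the PDC service moves
         * the data to storage asynchronously and updates the metadata. */
        pdc_lock(obj, region, 1 /* write */);
        /* ... fill or update x ... */
        pdc_unlock(obj, region);

        pdc_buf_unmap(x, obj, region);
        pdc_obj_close(obj);
    }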
PDC API – I/O
35
PDC operation
- Create query with conditions
– Sets up query execution and, in the future, invokes a query optimization framework
– Allows application developers to search for named objects, as well as objects with particular characteristics
- Execute query
– Query execution can occur at multiple tiers and execute locally on sharded / striped objects
- Iterate_start / Iterate_next
– Iterate over objects from query results, as well as generic actions
- Get_object_handle / Get_object_info
– Retrieve metadata for an object
PDC API – Object Access
40
Metadata Object Management
Capabilities
- Create, update, search, and delete metadata objects.
- All tags are searchable.
- Maintain extended attributes and object relationships.
A collection of tags (key-value pairs)
PDC Namespace Management
PDC Metadata Management
Metadata Search
- Exact match search
○ Similar to stat.
○ Requires all ID attributes.
○ Retrieves a single metadata object, directly from one target server.
- Partial match search
○ Similar to find or grep.
○ Any tag can be specified.
○ Retrieves multiple metadata objects; needs to scan all servers.
■ Done in parallel.
■ Indexing is being implemented.
Performance: Metadata Creation
SoMeta 1: all objects have the same name but different values in other ID attributes (timestep).
SoMeta 4: four unique names are used and each name is used by a quarter of the metadata objects; objects with an identical name have different ID attributes.
SoMeta Unique: each metadata object has a unique name.
Performance of scaling SoMeta by creating 10,000 to 100 million metadata objects with 512 servers and 2560 clients on Cori.
Performance: Metadata Search
[Figures: exact match and partial match search times.]
Searching for up to 20% of 1 million objects takes a fraction of a second with 128 servers. Network transfer time dominates the total time. Exact match search requires many more small network transfers.
Searching for BOSS objects
Total elapsed time to group objects by adding tags (SoMeta), attributes (SciDB), or symlinks (Lustre) with different selectivity, and total elapsed time for searching and retrieving the metadata of previously assigned tags / attributes with different selectivity.
SoMeta is 10 to 90X faster for metadata grouping (tagging), and 2 to 16X faster in searching attributes (tags) than SciDB and MongoDB, up to 800X faster with 80 clients searching in parallel.
PDC System - High-level Architecture
Persistent Storage API
(burst buffer file systems, Lustre, DAOS, ...)
Asynchronous I/O
- PDC supports asynchronous I/O through its client-server architecture
○ Client sends an I/O request
○ Server confirms receipt of the request
○ Client continues to the next computation
[Figures: asynchronous Write and Read timelines.]
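An illustrative sketch of that request / acknowledge / continue pattern, using an ordinary worker thread in place of the PDC server. This is not PDC code, just the overlap idea: the client blocks only until the request is accepted, and the write completes in the background while computation continues.

    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* One outstanding I/O request handed from the "client" to the "server". */
    typedef struct {
        const double *buf;
        size_t n;
        const char *path;
        pthread_mutex_t lock;
        pthread_cond_t accepted_cv;
        int accepted;                  /* set once the server owns the request */
    } io_request_t;

    static void *io_server(void *arg)
    {
        io_request_t *req = arg;

        /* Confirm receipt so the client can continue immediately */
        pthread_mutex_lock(&req->lock);
        req->accepted = 1;
        pthread_cond_signal(&req->accepted_cv);
        pthread_mutex_unlock(&req->lock);

        /* Perform the actual write in the background */
        FILE *f = fopen(req->path, "wb");
        if (f) {
            fwrite(req->buf, sizeof(double), req->n, f);
            fclose(f);
        }
        return NULL;
    }

    int main(void)                     /* build with: cc -pthread async_sketch.c */
    {
        size_t n = 1 << 20;
        double *data = malloc(n * sizeof(double));
        for (size_t i = 0; i < n; i++) data[i] = (double)i;

        io_request_t req = {data, n, "timestep0.bin",
                            PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0};

        pthread_t server;
        pthread_create(&server, NULL, io_server, &req);

        /* Block only until the request is acknowledged, not until it completes */
        pthread_mutex_lock(&req.lock);
        while (!req.accepted)
            pthread_cond_wait(&req.accepted_cv, &req.lock);
        pthread_mutex_unlock(&req.lock);

        /* ... next computation phase overlaps with the background write ... */

        pthread_join(server, NULL);    /* wait before reusing the buffer or exiting */
        free(data);
        return 0;
    }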
VPIC-IO (Weak Scaling) Multi-timestep Write
Total time to write 5 timesteps from the VPIC-IO kernel to Lustre and to the burst buffer on Cori. PDC is 5x faster than HDF5 and 23x faster than PLFS.
BD-CATS-IO (Weak scaling) Multi-timestep Read
Total time to read 5 timesteps of data with the BD-CATS-IO kernel from Lustre and from the burst buffer. PDC is 11x faster than PLFS and HDF5.
Conclusions
- Easy interfaces and superior performance
- Autonomous data management
- Information capture and management
54
- Simpler object interface
- Applications produce data objects and declare which ones to keep persistent
- Applications request the desired data
- Asynchronous and autonomous data movement
- Bring interesting data to apps
- Manage rich metadata and enhance search capabilities
- Perform analysis and transformations in the data path
▪ Contact:
- Suren Byna (sdm.lbl.gov/~sbyna/) [SByna@lbl.gov]
▪ Contributions to this presentation
- ExaHDF5 project team (sdm.lbl.gov/exahdf5)
- Proactive Data Containers (PDC) team (sdm.lbl.gov/pdc)
- SDM group: sdm.lbl.gov
55