

  1. Integration of Burst Buffer in High-level Parallel I/O Library for Exa-scale Computing Era
     SC 2018 PDSW-DISCS Workshop
     Kaiyuan Hou, Reda Al-Bahrani, Esteban Rangel, Ankit Agrawal, Robert Latham, Robert Ross, Alok Choudhary, and Wei-keng Liao

  2. Overview
     • Background & motivation
     • Our idea: aggregation on the burst buffer
        - Benefits
        - Challenges
     • Summary of results

  3. I/O in the Exa-scale Era
     • Huge data size
        - >10 PB of system memory
        - Data generated by applications are of similar magnitude
     • I/O speed cannot keep up with the growth of data size
        - Parallel file system (PFS) architecture is not scalable
     • Burst buffers have been introduced into the I/O hierarchy
        - Built from new hardware such as SSDs, non-volatile RAM, etc.
        - Aim to bridge the performance gap between computing and I/O
     • The role and potential of the burst buffer have not been fully explored
        - How can the burst buffer help improve I/O performance?

  4. I/O Aggregation Using the Burst Buffer
     • PFSs are built from rotating hard disks
        - High capacity, low speed
        - Usually serve as the main storage on a supercomputer
        - Sequential access is fast while random access is slow
        - Handling a few large requests is more efficient than handling many small ones
     • Burst buffers are built from SSDs or NVMs
        - Higher speed, lower capacity
     • I/O aggregation on the burst buffer
        - Gather write requests on the burst buffer
        - Reorder the requests into sequential order
        - Combine all requests into one large request (a rough sketch of these steps follows below)
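As a rough illustration of the aggregation step (not code from the paper), the sketch below gathers buffered write requests, sorts them by file offset, and packs them into one contiguous buffer that can then be written sequentially to the PFS. The wreq_t struct and aggregate() function are hypothetical names.

```c
/* Minimal sketch of offset-ordered aggregation; names are illustrative. */
#include <stdlib.h>
#include <string.h>

typedef struct {
    long long offset;   /* file offset of the request            */
    size_t    len;      /* number of bytes                       */
    char     *buf;      /* data currently held on the burst buffer */
} wreq_t;

static int cmp_offset(const void *a, const void *b) {
    long long d = ((const wreq_t *)a)->offset - ((const wreq_t *)b)->offset;
    return (d > 0) - (d < 0);
}

/* Sort requests into sequential order and pack them into one buffer;
 * the caller then issues a single large write to the PFS. */
size_t aggregate(wreq_t *reqs, int n, char *out) {
    size_t used = 0;
    qsort(reqs, n, sizeof(wreq_t), cmp_offset);
    for (int i = 0; i < n; i++) {
        memcpy(out + used, reqs[i].buf, reqs[i].len);
        used += reqs[i].len;
    }
    return used;  /* total bytes to write sequentially */
}
```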

  5. Related Work
     • LogFS [1]
        - I/O aggregation library using a low-level offset-and-length data representation
           - Simpler implementation
           - Does not preserve the structure of the data
        - Log-based data structure for recording write operations
     • Data Elevator [2]
        - A user-level library that moves buffered files from the burst buffer to the PFS
        - The file is written to the burst buffer as is and copied to the PFS later
           - Does not alter the I/O pattern on the burst buffer
        - Works only on shared burst buffers
        - Faster than moving the file using system functions at large scale
           - When the number of compute nodes exceeds the number of burst buffer servers
     [1] D. Kimpe, R. Ross, S. Vandewalle, and S. Poedts, "Transparent Log-based Data Storage in MPI-IO Applications," in Proceedings of the 14th European Conference on Recent Advances in Parallel Virtual Machine and Message Passing Interface, Paris, 2007.
     [2] B. Dong, et al., "Data Elevator: Low-Contention Data Movement in Hierarchical Storage System," in 2016 IEEE 23rd International Conference on High Performance Computing (HiPC), IEEE, 2016.

  6. About PnetCDF
     • High-level I/O library
        - Built on top of MPI-IO
        - Abstract data description
     • Enables parallel access to NetCDF-formatted files (a minimal usage example follows below)
     • Consists of I/O modules, called drivers, that interface with lower-level libraries
     • https://github.com/Parallel-NetCDF/PnetCDF
     Picture courtesy of: Li, Jianwei, et al. "Parallel netCDF: A High-Performance Scientific I/O Interface." Proceedings of the 2003 ACM/IEEE Conference on Supercomputing (SC'03), 2003.
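For readers unfamiliar with the library, here is a minimal PnetCDF write example in the spirit of this slide: each process defines and writes its own row of a shared 2-D variable in parallel. The file name and array sizes are illustrative, and error checking is omitted for brevity.

```c
/* Minimal PnetCDF parallel write sketch. */
#include <mpi.h>
#include <pnetcdf.h>

int main(int argc, char **argv) {
    int rank, nprocs, ncid, dimids[2], varid;
    MPI_Offset start[2], count[2];
    float buf[100];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* create a NetCDF file collectively */
    ncmpi_create(MPI_COMM_WORLD, "out.nc", NC_CLOBBER, MPI_INFO_NULL, &ncid);
    ncmpi_def_dim(ncid, "y", nprocs, &dimids[0]);
    ncmpi_def_dim(ncid, "x", 100, &dimids[1]);
    ncmpi_def_var(ncid, "data", NC_FLOAT, 2, dimids, &varid);
    ncmpi_enddef(ncid);

    /* each rank writes one row: a sub-array described by start/count */
    for (int i = 0; i < 100; i++) buf[i] = (float)rank;
    start[0] = rank; start[1] = 0;
    count[0] = 1;    count[1] = 100;
    ncmpi_put_vara_float_all(ncid, varid, start, count, buf);

    ncmpi_close(ncid);
    MPI_Finalize();
    return 0;
}
```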

  7. I/O Aggregation in PnetCDF
     [Architecture diagram: the user application calls PnetCDF, whose dispatcher selects among the I/O drivers; the MPI-IO driver writes through MPI-IO to the parallel file system, while the burst buffer driver writes through POSIX I/O to the burst buffer.]
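PnetCDF selects the burst buffer driver through MPI info hints. The hint names used in the sketch below ("nc_burst_buf" and "nc_burst_buf_dirname") are quoted from the PnetCDF burst-buffering documentation as best recalled and should be verified against the installed release; the mount path is an assumption.

```c
/* Sketch: enabling the burst buffer driver through MPI info hints.
 * Hint names should be verified against the PnetCDF release in use. */
#include <mpi.h>
#include <pnetcdf.h>

int create_with_burst_buffer(const char *path, int *ncidp) {
    MPI_Info info;
    MPI_Info_create(&info);
    /* route writes through the burst buffer driver ... */
    MPI_Info_set(info, "nc_burst_buf", "enable");
    /* ... and point it at the burst buffer mount (illustrative path) */
    MPI_Info_set(info, "nc_burst_buf_dirname", "/local/burst_buffer");

    int err = ncmpi_create(MPI_COMM_WORLD, path, NC_CLOBBER, info, ncidp);
    MPI_Info_free(&info);
    return err;
}
```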

  8. Recording Write Requests
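The figure on this slide is not reproduced here; as a stand-in, the struct below sketches the kind of information a high-level log entry would need to record: the variable, its start/count sub-array selection, and where the raw data landed in a separate data log on the burst buffer. The field names and layout are assumptions, not PnetCDF's actual log format.

```c
/* Hypothetical metadata-log entry for one high-level write request. */
#include <mpi.h>

#define MAX_NDIMS 8

typedef struct {
    int        varid;             /* which NetCDF variable         */
    int        ndims;             /* dimensionality of the access  */
    MPI_Offset start[MAX_NDIMS];  /* starting corner of sub-array  */
    MPI_Offset count[MAX_NDIMS];  /* extent along each dimension   */
    MPI_Offset data_off;          /* offset into the data log      */
    MPI_Offset data_len;          /* bytes appended to the data log */
} log_entry_t;
```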

  9. Compared to the Lower-level Approach
     • Retains the structure of the original data
        - Most scientific data are sub-arrays of high-dimensional arrays
        - Enables performance optimization
        - Can support other operations such as in-situ analysis
     • Lower memory footprint
        - One high-level request can translate into many offsets and lengths (see the illustration below)
     • More complex operations to record
        - Not as simple as offset and length
     • Must follow the constraints of the lower-level library
        - Less freedom to manipulate raw data
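A small worked example of the memory-footprint point, with illustrative sizes: a 1024 x 1024 sub-array written in row-major order is contiguous only along its last dimension, so a low-level representation needs one offset/length pair per row, while a single high-level (start, count) record covers the whole request.

```c
/* Illustration of the footprint argument; sizes are illustrative. */
#include <stdio.h>

int main(void) {
    long long count[2] = {1024, 1024};  /* sub-array written by one process */
    /* a row-major 2-D sub-array is contiguous only along its last dimension,
       so the low-level form needs count[0] offset/length pairs */
    long long pairs = count[0];
    printf("high-level records: 1, offset/length pairs: %lld\n", pairs);
    return 0;
}
```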

  10. Generating the Aggregated Request
     • Limitation of MPI-IO
        - The flattened offsets of an MPI write call must be monotonically non-decreasing
     • Cannot simply stack high-level requests together
        - Doing so may violate this requirement
     • Offsets must be sorted into order (a sketch follows below)
        - Performance issue on large data
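To make the constraint concrete, the sketch below builds an MPI-IO file type from offset/length segments with MPI_Type_create_hindexed after sorting them into non-decreasing order; in a real driver the write buffer would have to be permuted to match. The seg_t type and build_filetype() helper are hypothetical.

```c
/* Sketch: building a legal MPI-IO file type from aggregated segments. */
#include <mpi.h>
#include <stdlib.h>

typedef struct { MPI_Aint disp; int len; } seg_t;  /* one flattened piece */

static int cmp_seg(const void *a, const void *b) {
    MPI_Aint d = ((const seg_t *)a)->disp - ((const seg_t *)b)->disp;
    return (d > 0) - (d < 0);
}

/* Build a single file type from n offset/length segments.  The segments
 * are sorted first so the flattened displacements are monotonically
 * non-decreasing, as MPI-IO requires. */
void build_filetype(int n, seg_t *segs, MPI_Datatype *ftype) {
    MPI_Aint *disps = malloc(n * sizeof(MPI_Aint));
    int      *lens  = malloc(n * sizeof(int));

    qsort(segs, n, sizeof(seg_t), cmp_seg);
    for (int i = 0; i < n; i++) {
        disps[i] = segs[i].disp;
        lens[i]  = segs[i].len;
    }
    /* in a real driver the user data must be reordered to match this permutation */
    MPI_Type_create_hindexed(n, lens, disps, MPI_BYTE, ftype);
    MPI_Type_commit(ftype);

    free(disps);
    free(lens);
}
```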

  11. 2-stage Reordering Strategy
     • Group the requests
        - Requests from different groups never interleave with each other
        - Requests within a group may interleave with each other
     • Sort at the group level
        - Without breaking requests up into offsets
     • Sort within each group
        - Requests are broken up into offsets (a simplified sketch follows below)
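A simplified sketch of the two-stage idea, under the assumption that group boundaries have already been identified: stage 1 orders whole groups by their lowest offset without flattening anything, and stage 2 flattens and sorts offsets only inside each group. All types and the two_stage_sort() helper are hypothetical.

```c
/* Simplified two-stage reordering; the scratch buffer must be large
 * enough to hold all segments of the largest group. */
#include <stdlib.h>

typedef struct { long long off; long long len; } seg_t;               /* flattened piece    */
typedef struct { long long lo, hi; seg_t *segs; int nsegs; } req_t;   /* one write request  */
typedef struct { req_t *reqs; int nreqs; long long lo; } group_t;     /* interleaving group */

static int cmp_group(const void *a, const void *b) {
    long long d = ((const group_t *)a)->lo - ((const group_t *)b)->lo;
    return (d > 0) - (d < 0);
}
static int cmp_seg(const void *a, const void *b) {
    long long d = ((const seg_t *)a)->off - ((const seg_t *)b)->off;
    return (d > 0) - (d < 0);
}

/* Stage 1: order groups as whole units (cheap, no flattening needed).
 * Stage 2: only inside each group, flatten the requests and sort the
 * resulting offset/length segments. */
void two_stage_sort(group_t *groups, int ngroups, seg_t *scratch) {
    qsort(groups, ngroups, sizeof(group_t), cmp_group);
    for (int g = 0; g < ngroups; g++) {
        int n = 0;
        for (int r = 0; r < groups[g].nreqs; r++)
            for (int s = 0; s < groups[g].reqs[r].nsegs; s++)
                scratch[n++] = groups[g].reqs[r].segs[s];
        qsort(scratch, n, sizeof(seg_t), cmp_seg);
        /* segments of this group are now in non-decreasing offset order */
    }
}
```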

  12. Experiments
     • Cori at NERSC
        - Cray DataWarp, a shared burst buffer
     • Theta at ALCF
        - Local burst buffer made of SSDs
     • Comparison with other approaches
        - PnetCDF collective I/O without aggregation
        - Data Elevator
        - Cray DataWarp stage-out functions
        - LogFS
     • Comparison of different log-to-process mappings

  13. Benchmarks
     [Figure: access patterns of the benchmarks (IOR contiguous, IOR strided, and FLASH), showing how file blocks are assigned to processes P0-P2 across write rounds.]
     Picture courtesy of: Liao, Wei-keng, et al. "Using MPI File Caching to Improve Parallel Write Performance for Large-Scale Scientific Applications." Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC'07), IEEE, 2007.

  14. Cori – Shared Burst Buffer
     [Figure: I/O bandwidth (GiB/s) on Cori for four cases: IOR contiguous with 512 processes vs. transfer size (1/4 to 4 MiB), IOR strided with 8 MiB transfers vs. number of processes (256 to 4 K), FLASH-I/O checkpoint file vs. number of processes, and BTIO strong scaling vs. number of processes. Compared methods: Burst Buffer Driver, LogFS, PnetCDF Raw, DataWarp Stage Out, and LogFS Approximate.]

  15. Cori – Shared Burst Buffer
     [Figure: execution time (sec.) on Cori for the same four cases (IOR contiguous with 512 processes, IOR strided with 8 MiB transfers, FLASH-I/O checkpoint file, and BTIO strong scaling), comparing the Burst Buffer Driver with Data Elevator.]

  16. Theta – Local Burst Buffer
     [Figure: I/O bandwidth (GiB/s) on Theta for IOR strided with 8 MiB transfers, IOR contiguous with 1 K processes, FLASH-I/O checkpoint file, and BTIO strong scaling, comparing Burst Buffer Driver, LogFS, PnetCDF Raw, and LogFS Approximate.]

  17. Impact of Log-to-Process Mapping
     • Use one log per node on a shared burst buffer
        - The metadata server becomes a bottleneck when a large number of files are created
     • Use one log per process on a local burst buffer
        - Reduces file-sharing overhead
     • Use the local burst buffer if available
        - Configure DataWarp in private mode
     [Figure: FLASH-I/O time (sec.) on Cori and Theta, broken down into log file init, write, and read, for 256 to 4 K processes; Cori compares A: log per node, B: log per process (private DataWarp), and C: log per process, while Theta compares A: log per node and B: log per process.]
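The sketch below illustrates the two mappings discussed on this slide by constructing a per-node or per-process log file name; it uses MPI_Comm_split_type to group the ranks that share a node. The directory and file-name patterns are illustrative, not the driver's actual naming scheme.

```c
/* Sketch: per-node vs. per-process log file naming. */
#include <mpi.h>
#include <stdio.h>

void log_file_name(int per_node, const char *bb_dir, char *name, size_t len) {
    int rank, node_rank, node_id;
    MPI_Comm node_comm;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* group ranks that share a node (and hence a node-local burst buffer) */
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, rank,
                        MPI_INFO_NULL, &node_comm);
    MPI_Comm_rank(node_comm, &node_rank);

    /* identify the node by the lowest global rank it contains */
    node_id = (node_rank == 0) ? rank : 0;
    MPI_Bcast(&node_id, 1, MPI_INT, 0, node_comm);

    if (per_node)
        snprintf(name, len, "%s/log_node_%d", bb_dir, node_id);  /* shared per node     */
    else
        snprintf(name, len, "%s/log_rank_%d", bb_dir, rank);     /* private per process */

    MPI_Comm_free(&node_comm);
}
```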

  18. Conclusion and Future Work
     • The burst buffer opens up new opportunities for I/O aggregation
     • Aggregation in a high-level I/O library effectively improves performance
        - The concept can be applied to other high-level I/O libraries, e.g. HDF5, NetCDF-4
     • Future performance improvements
        - Overlap burst buffer and PFS I/O
           - Reading from the burst buffer and writing to the PFS can be pipelined (a sketch follows below)
        - Support reading from the log without flushing
           - Reduces the number of flush operations
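A rough sketch of the pipelining idea listed under future work, not an implementation from the paper: while a chunk just read from the burst buffer is being written to the PFS with a non-blocking MPI-IO call, the next chunk is read. The chunk size, file handles, and double-buffering scheme are assumptions.

```c
/* Sketch: overlapping burst buffer reads with PFS writes. */
#include <mpi.h>
#include <stdlib.h>
#include <unistd.h>

#define CHUNK (4 * 1024 * 1024)   /* illustrative chunk size: 4 MiB */

/* bb_fd:  POSIX descriptor of the log file on the burst buffer
 * pfs_fh: MPI-IO handle of the destination file on the PFS
 * total:  number of bytes to move */
void pipelined_flush(int bb_fd, MPI_File pfs_fh, MPI_Offset total) {
    char *cur = malloc(CHUNK), *prev = malloc(CHUNK);
    MPI_Request req = MPI_REQUEST_NULL;
    MPI_Offset off = 0;

    while (off < total) {
        MPI_Offset n = (total - off < CHUNK) ? total - off : CHUNK;
        ssize_t rd = pread(bb_fd, cur, (size_t)n, (off_t)off);  /* read next chunk from the BB */
        if (rd <= 0) break;                                     /* sketch: no error recovery   */
        MPI_Wait(&req, MPI_STATUS_IGNORE);                      /* finish the previous PFS write */
        MPI_File_iwrite_at(pfs_fh, off, cur, (int)rd, MPI_BYTE, &req);  /* start writing this chunk */
        char *t = cur; cur = prev; prev = t;                    /* double-buffer swap */
        off += rd;
    }
    MPI_Wait(&req, MPI_STATUS_IGNORE);
    free(cur);
    free(prev);
}
```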

  19. Thank You
     This research was supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration. This research used resources of the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. This research used resources of the Argonne Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357.
