Integration of Burst Buffer in High-level Parallel I/O Library for Exa-scale Computing Era
PDSW-DISCS Workshop, SC 2018
Kaiyuan Hou, Reda Al-Bahrani, Esteban Rangel, Ankit Agrawal, Robert Latham, Robert Ross, Alok Choudhary, and Wei-keng Liao
[1] D. Kimpe, R. Ross, S. Vandewalle, and S. Poedts, "Transparent Log-Based Data Storage in MPI-IO Applications," in Proceedings of the 14th European PVM/MPI Users' Group Meeting (EuroPVM/MPI), Paris, 2007.
[2] B. Dong, et al., "Data Elevator: Low-Contention Data Movement in Hierarchical Storage System," in Proceedings of the 23rd IEEE International Conference on High Performance Computing (HiPC), 2016.
Figure courtesy of: Li, Jianwei, et al., "Parallel netCDF: A High-Performance Scientific I/O Interface," in Proceedings of the ACM/IEEE SC2003 Conference, 2003.
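The burst buffer driver sits below the PnetCDF dispatch layer, so an application keeps its existing PnetCDF calls and opts in through MPI Info hints. The sketch below is a minimal illustration of that usage, not code from the slides; the hint names (nc_burst_buf, nc_burst_buf_dirname), the burst-buffer mount path, and the one-variable file layout are assumptions based on the released PnetCDF burst-buffer documentation.

    /* Minimal sketch: enabling the PnetCDF burst buffer driver via MPI Info
     * hints.  Hint names and the burst-buffer path are assumptions taken from
     * the released PnetCDF burst-buffer documentation, not from the slides. */
    #include <mpi.h>
    #include <pnetcdf.h>

    int main(int argc, char **argv) {
        int rank, nprocs, ncid, dimid, varid;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        MPI_Info info;
        MPI_Info_create(&info);
        MPI_Info_set(info, "nc_burst_buf", "enable");              /* select the driver */
        MPI_Info_set(info, "nc_burst_buf_dirname", "/bb/scratch"); /* burst-buffer mount (illustrative) */

        /* The rest is unchanged PnetCDF code: define one variable and write it. */
        ncmpi_create(MPI_COMM_WORLD, "out.nc", NC_CLOBBER | NC_64BIT_DATA, info, &ncid);
        ncmpi_def_dim(ncid, "x", (MPI_Offset)nprocs, &dimid);
        ncmpi_def_var(ncid, "v", NC_FLOAT, 1, &dimid, &varid);
        ncmpi_enddef(ncid);

        float val = (float)rank;
        MPI_Offset start = rank, count = 1;
        /* The driver appends this write to a log on the burst buffer; the log is
         * replayed to the parallel file system when the file is flushed or closed. */
        ncmpi_put_vara_float_all(ncid, varid, &start, &count, &val);
        ncmpi_close(ncid);

        MPI_Info_free(&info);
        MPI_Finalize();
        return 0;
    }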
[Figure: file blocks 1-8 (and, in the extended case, blocks 1-11) assigned to processes P0-P2 across I/O rounds 1-3. Adapted from Liao, Wei-keng, et al., "Using MPI File Caching to Improve Parallel Write Performance for Large-Scale Scientific Applications," in Proceedings of the ACM/IEEE SC 2007 Conference, 2007.]
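To make the aggregation pattern concrete, here is a minimal MPI-IO sketch (an illustration, not code from the slides): each process writes one contiguous block through a collective call, and the MPI-IO layer ships the blocks to a small set of aggregator processes round by round. The file name and the cb_nodes hint value are illustrative assumptions.

    /* Minimal MPI-IO sketch of the collective write pattern in the figure above:
     * each process contributes one contiguous block, and the collective call lets
     * the MPI-IO layer redistribute blocks to aggregator processes in rounds.
     * File name and hint value are illustrative. */
    #include <mpi.h>
    #include <stdlib.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        const int block = 1 << 20;              /* 1 MiB per process */
        char *buf = calloc(block, 1);

        MPI_Info info;
        MPI_Info_create(&info);
        MPI_Info_set(info, "cb_nodes", "2");    /* number of aggregators (illustrative) */

        MPI_File fh;
        MPI_File_open(MPI_COMM_WORLD, "blocks.dat",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);
        /* Collective write: data moves to the aggregators round by round,
         * each round covering a contiguous range of file blocks. */
        MPI_File_write_at_all(fh, (MPI_Offset)rank * block, buf, block,
                              MPI_BYTE, MPI_STATUS_IGNORE);
        MPI_File_close(&fh);

        MPI_Info_free(&info);
        free(buf);
        MPI_Finalize();
        return 0;
    }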
[Figure: I/O bandwidth (GiB/s) for IOR contiguous (512 processes, varying transfer size), IOR strided (8 MiB transfers, varying process counts), FLASH-I/O checkpoint file, and BTIO strong scaling; series compared: Burst Buffer Driver, LogFS, PnetCDF Raw, DataWarp Stage Out, and LogFS Approximate.]
[Figure: execution time (sec.) of the Burst Buffer Driver versus Data Elevator for IOR contiguous (512 processes, varying transfer size), IOR strided (8 MiB transfers, varying process counts), FLASH-I/O checkpoint file, and BTIO strong scaling.]
[Figure: I/O bandwidth (GiB/s) for IOR contiguous (1 K processes, varying transfer size), IOR strided (8 MiB transfers, varying process counts), FLASH-I/O checkpoint file, and BTIO strong scaling; series compared: Burst Buffer Driver, LogFS, PnetCDF Raw, and LogFS Approximate.]
[Figure: FLASH-I/O time breakdown (log file init, write, and read) on Cori and Theta at 256 to 4 K processes, comparing log-file placement strategies: log per node, log per process, and log per process (private).]
This research was supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration. This research used resources of the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. This research used resources of the Argonne Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357.