Understanding Data Motion in the Modern HPC Data Center
Glenn K. Lockwood, Shane Snyder, Suren Byna, Philip Carns, Nicholas J. Wright
Scientific computing is more than compute!
[Figure: HPC data center architecture — compute nodes (CN), I/O nodes (ION), storage nodes (SN), science gateways (GW), data transfer nodes, and tape archive, connected by a storage fabric, a center-wide network, and the WAN (ESnet)]
[Figure: Map of daily data motion among the outside world, HPSS, gateway DTNs, Cori, cscratch, home, project, the burst buffer, science gateways, and archive; edge weights binned from ≤ 8 TiB/day up to > 128 TiB/day]
[Figure: Total data transferred by vector — Compute–Storage, Storage–Storage, and Storage–External (WAN) — among external facilities, storage systems, and compute systems; x-axis: Data Transferred, 512 GiB to 512 TiB]
[Figure: Cumulative fraction of total transfers versus size of transfer (1 byte to 1 PiB) for Globus transfers, Darshan transfers, HPSS transfers, and files at rest]
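A distribution like the one above can be rebuilt from raw transfer logs. A minimal sketch, assuming we already have a flat list of per-transfer byte counts (the sample sizes below are made up, not real log data):

```python
import numpy as np

def transfer_size_cdf(sizes_bytes):
    """Empirical CDF over transfer count: for each sorted size,
    the fraction of transfers at or below that size."""
    sizes = np.sort(np.asarray(sizes_bytes, dtype=float))
    frac = np.arange(1, len(sizes) + 1) / len(sizes)
    return sizes, frac

# Hypothetical mix: small POSIX I/O alongside bulk Globus/HPSS transfers
sizes, frac = transfer_size_cdf([4096, 32 * 1024, 1 << 20, 1 << 30, 1 << 40])
print(frac[-1])  # 1.0 — the CDF reaches 1 at the largest transfer
```

The same routine applied per source (Globus, Darshan, HPSS) yields one curve per log type.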
– Job I/O: optimized
– Other transfers: fire-and-forget
[Figure: Cumulative fraction of total volume transferred versus size of transfer (1 MiB to 1 PiB) for top users — Amy, Bob, Carol, Dan (Darshan); Eve (Darshan, Globus, HPSS); Frank, Gail (Darshan, Globus); Henry (HPSS)]
[Figure: Read and write volumes (0 bytes to 768 TiB) broken down by user (Amy, Bob, Carol, Dan, other users) and by file system (tmpfs, burst buffer, cscratch, homes, project): writes by user, writes by FS, reads by user, reads by FS]
[Figure: Daily Compute–Storage, Storage–Compute, and Storage–Storage traffic (TiB), Apr 29 – Jul 29, 2019]
[Figure: Percent of true data volume captured by observed transfers (in/write and out/read) for cscratch, project, burst buffer, archive, and the outside world]
[Figure: Incongruency by system — Cori: 1.27, HPSS: 0.137, Gateway DTNs: 0.018, Science Gateways: 0.613]
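The incongruency values above quantify how far a system's observed inflows and outflows are from balancing. The slide does not reproduce the exact formula, so the definition below — mismatch normalized by the smaller flow, which allows values above 1.0 as seen for Cori — is an illustrative assumption, not the paper's definition:

```python
def incongruency(flow_in_tib: float, flow_out_tib: float) -> float:
    """Relative mismatch between data observed flowing into and out of a
    system. ASSUMPTION: normalizing by the smaller of the two flows is a
    guess at the metric; it permits values greater than 1.0."""
    smaller = min(flow_in_tib, flow_out_tib)
    if smaller == 0:
        # No flow on one side: perfectly balanced only if both are zero
        return 0.0 if flow_in_tib == flow_out_tib else float("inf")
    return abs(flow_in_tib - flow_out_tib) / smaller

# A perfectly balanced system has incongruency 0
print(incongruency(100.0, 100.0))  # 0.0
```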
– New profiling tools to capture I/O from transfer tools (bbcp, scp, etc.)
– Better insight into what is happening inside Docker containers
– More robust collection of …-aware I/O data (LDMS)
– Improve the analysis process to handle complex transfers
This material is based upon work supported by the U.S. Department of Energy, Office of Science, under contracts DE-AC02-05CH11231 and DE-AC02-06CH11357. This research used resources and data generated from resources of the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231, and the Argonne Leadership Computing Facility, a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357.
[Figure: Number of users (10 to 350) by percent of total volume transferred (0.0001% to 10%)]
– how correlatable a user's I/O is across all vectors
– how easily we can guess what a user's workflow is doing
[Figure: Fraction of users (0.0 to 0.6) by mean user correlation coefficient (0.00 to 1.00), for all vectors (1,123 users) and excluding Compute–Storage/Storage–Compute vectors (486 users)]
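Correlatability can be estimated by treating each data-motion vector as a daily time series per user and averaging the pairwise Pearson correlations. A sketch under that assumption (the synthetic series below are not real data):

```python
import numpy as np

def mean_vector_correlation(daily_volumes):
    """Mean pairwise Pearson correlation between the columns of a
    days x vectors array of one user's daily transfer volumes."""
    r = np.corrcoef(np.asarray(daily_volumes, dtype=float), rowvar=False)
    upper = r[np.triu_indices_from(r, k=1)]  # one entry per vector pair
    return float(np.nanmean(upper))

# Synthetic user whose traffic on two vectors moves in lockstep
series = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
print(mean_vector_correlation(series))  # close to 1.0
```

Dropping the Compute–Storage/Storage–Compute columns before calling the function reproduces the "excluding C-S/S-C vectors" variant.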