Burst Buffer Simulation in Dragonfly Network

  1. Burst Buffer Simulation in Dragonfly Network
     Jian Peng, Michael Lang
     Illinois Institute of Technology, Los Alamos National Laboratory

  2. Purpose:
     • Residing in the compute node network and using solid-state drives (SSDs), burst buffers bring a significant I/O performance boost compared with a traditional external hard disk drive (HDD) storage system.
     • The bottleneck that prevents fully leveraging burst buffers remains unknown:
       • Shared network
       • Limited number of burst buffers
     • Because of the large system scale, it is usually too expensive to change the system setup and configuration.

  3. Trinity System Overview:
     • System Level
     • Cabinet Level
     • Chassis Level

  4. Inside a Group:

  5. Trinity Phase II Network Configuration:
     • Global link:
       • 37.6 GB/s
     • Local link:
       • Intra-chassis: 5.25 GB/s
       • Inter-chassis: 15.75 GB/s (3 tiles)
     • Intra-blade link:
       • PCIe 3.0: 16 GB/s
     * All bi-directional

  6. General:
     • All-to-all interconnection pattern among routers.
     • Optical cable links among groups. In Trinity, each group link is made of 2 cables; each cable provides 4.7 GB/s of bandwidth.
     • At the cabinet level, each group has 2 cabinets, which are connected by backplane electrical links. Each cabinet contains 3 chassis. Each inter-chassis link has a bandwidth of 15.75 GB/s.
     • Inside a chassis, each router-to-router link has a bandwidth of 5.25 GB/s.

  7. Connections-Router
     • All nodes connected by routers
     • 10 inter-group ports
     • 15 inter-chassis ports
     • 15 inter-blade ports

  8. Connections-Chassis
     • 16 blades
     • 40 connectors to other groups
     • 5 connectors to other chassis per blade
     • Backplane connections among blades
     • PCIe 3.0 x16 between a node and blade
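
The per-router port counts and the group size quoted on the slides can be cross-checked with simple arithmetic. The sketch below is only a consistency check, not part of the authors' simulator. It assumes one router per blade and one 3-tile link from each router to each of the other chassis in its group; neither assumption is stated on the slides.

```python
# Figures taken from the slides.
CABINETS_PER_GROUP = 2
CHASSIS_PER_CABINET = 3
BLADES_PER_CHASSIS = 16
TILES_PER_INTER_CHASSIS_LINK = 3
INTER_CHASSIS_LINK_GB_PER_S = 15.75

# Assumption (not stated on the slides): one router per blade.
ROUTERS_PER_BLADE = 1

chassis_per_group = CABINETS_PER_GROUP * CHASSIS_PER_CABINET
routers_per_group = chassis_per_group * BLADES_PER_CHASSIS * ROUTERS_PER_BLADE

# All-to-all inside a chassis: one port to every other blade.
inter_blade_ports = BLADES_PER_CHASSIS - 1

# Assumption: one 3-tile link to each of the other chassis in the group.
inter_chassis_ports = (chassis_per_group - 1) * TILES_PER_INTER_CHASSIS_LINK

# Bandwidth contributed by each tile of an inter-chassis link.
per_tile_gb_per_s = INTER_CHASSIS_LINK_GB_PER_S / TILES_PER_INTER_CHASSIS_LINK

print(routers_per_group, inter_blade_ports, inter_chassis_ports, per_tile_gb_per_s)
```

Under these assumptions the script gives 96 routers per group, 15 inter-blade ports, 15 inter-chassis ports, and 5.25 GB/s per tile, which agrees with the router count on slide 11, the port counts on slide 7, and the intra-chassis link bandwidth on slide 5.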

  9. Connections-Inter-group
     • Each connection joins 2 group ports of 2 routers in 2 different groups.
     • One link between each pair of groups.
     • Uses the Absolute (Direct) pattern.
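
To illustrate what a one-link-per-group-pair wiring can look like, the sketch below uses the "absolute" global-link arrangement as it is commonly defined in the dragonfly literature. Whether the slide's "Absolute (Direct)" pattern matches this definition exactly is an assumption, and the function name and the group-level port indices are hypothetical; they do not come from the simulator.

```python
from typing import Tuple


def absolute_peer(group: int, port: int, num_groups: int) -> Tuple[int, int]:
    """Return (peer_group, peer_port) for a group-level global port.

    Each group has num_groups - 1 global ports, numbered from 0.
    Port p of group g leads to group p when p < g, otherwise to group p + 1,
    so a group never links to itself.
    """
    if not 0 <= group < num_groups:
        raise ValueError("group out of range")
    if not 0 <= port < num_groups - 1:
        raise ValueError("port out of range")
    if port < group:
        return port, group - 1
    return port + 1, group


NUM_GROUPS = 23  # Trinity Phase II group count from the slides

# Collect every link as an unordered pair of groups.
links = set()
for g in range(NUM_GROUPS):
    for p in range(NUM_GROUPS - 1):
        peer_group, _ = absolute_peer(g, p, NUM_GROUPS)
        links.add(frozenset((g, peer_group)))

print(len(links))  # one link per pair of groups
```

With 23 groups this yields 23 × 22 / 2 = 253 distinct links, and following a link from either end leads back to the port it started from.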

  10. DataWarp
      • Burst buffers are implemented as DataWarp nodes in the Cray XC40.

  11. Simulation Detail
      • 96 routers
      • 10 burst buffer nodes
      • 2 LNET nodes
      • Final phase: 224 LNET nodes
      • 360 compute nodes
      • 384 nodes in total, 372 in simulation
      • Trinity Phase II: 23 groups
        • 230 burst buffers
        • 8,280 compute nodes
      • Adaptive routing
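
The node counts above are mutually consistent if the first figures are read as per-group values, which is an interpretation rather than something the slide states. The sketch below checks that reading; the 4-nodes-per-router figure is likewise inferred from 384 / 96 and is not given on the slide.

```python
GROUPS = 23  # Trinity Phase II

# Assumption: these counts describe a single group.
ROUTERS_PER_GROUP = 96
NODES_PER_ROUTER = 4  # inferred from 384 / 96, not stated on the slide
per_group = {"burst_buffer": 10, "lnet": 2, "compute": 360}

node_slots_per_group = ROUTERS_PER_GROUP * NODES_PER_ROUTER
simulated_per_group = sum(per_group.values())
unmodeled_per_group = node_slots_per_group - simulated_per_group

# Scale every node type up to the full system.
system_totals = {kind: count * GROUPS for kind, count in per_group.items()}

print(node_slots_per_group, simulated_per_group, unmodeled_per_group)
print(system_totals)
```

Read this way, each group has 384 node slots of which 372 are simulated, and 23 groups give the 230 burst buffers and 8,280 compute nodes listed for Trinity Phase II.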

  12. Simulation Framework
      • Application layer
        • IOR workload
        • Darshan 3.1 workload
      • Model-net layer
      • Burst buffer process
      • CODES 0.5.2

  13. Results
      • N-N Write
        • 4 procs per node
        • 8 MB stripe size on BB
        • <1024: 32 GB per proc
        • >=1024: 2 GB per proc
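
The N-N write parameters determine how much data each run moves. The sketch below is a rough sizing aid under two assumptions that the slide does not confirm: the 1024 threshold is taken to be a compute-node count, and GB and MB are treated as binary units (GiB and MiB). The function names are hypothetical.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

PROCS_PER_NODE = 4
STRIPE_BYTES = 8 * MIB


def bytes_per_proc(compute_nodes: int) -> int:
    """Per-process write size; assumes the 1024 threshold counts compute nodes."""
    return 32 * GIB if compute_nodes < 1024 else 2 * GIB


def n_to_n_write_volume(compute_nodes: int) -> dict:
    """Summarize one N-N write run: process count, stripes per process, total bytes."""
    procs = compute_nodes * PROCS_PER_NODE
    per_proc = bytes_per_proc(compute_nodes)
    return {
        "procs": procs,
        "stripes_per_proc": per_proc // STRIPE_BYTES,
        "total_bytes": procs * per_proc,
    }


for nodes in (512, 1024):
    summary = n_to_n_write_volume(nodes)
    print(nodes, summary["procs"], summary["stripes_per_proc"],
          summary["total_bytes"] // GIB, "GiB")
```

Under these assumptions a 512-node run writes 64 TiB in total (4,096 stripes per process), while a 1,024-node run writes 8 TiB (256 stripes per process), so the smaller per-process size keeps the large-scale runs from growing in total volume.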

  14. Results
      • N-1 Write

  15. Results
      • N-N Read

  16. Problems
      • Darshan traces of applications.
      • Mostly checkpointing at LANL.
      • Lustre is fast enough.
      • Must use DataWarp APIs.
      • DataWarp software is still being updated.
      • Currently modeling Trinity Phase II. The final phase is in progress:
        • 576 burst buffer nodes
        • ~20,000 compute nodes
        • More modeling details need to be confirmed.
      • READ bug:
        • The simulation ends when compute nodes scale up to 2048 in N-N Read.
        • It ends sooner in the N-1 Read simulation.
        • Messages appear to have already been freed when a reverse event is received.
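
The READ bug description refers to reverse events. CODES runs on the ROSS simulator, whose optimistic mode undoes speculative events by calling a reverse handler with the original message. The toy sketch below shows why such a handler fails if the message has already been freed. It is a generic illustration with hypothetical class and function names; it does not use the ROSS or CODES APIs and does not reproduce the actual bug.

```python
class Message:
    """Event payload that also carries state saved by the forward handler."""

    def __init__(self, nbytes: int):
        self.nbytes = nbytes
        self.saved_bytes_stored = None  # filled in by the forward handler
        self.freed = False


class BurstBufferLP:
    """Hypothetical logical process that tracks bytes held by a burst buffer."""

    def __init__(self):
        self.bytes_stored = 0

    def forward(self, msg: Message) -> None:
        # Save the old state in the message so the event can be undone later.
        msg.saved_bytes_stored = self.bytes_stored
        self.bytes_stored += msg.nbytes

    def reverse(self, msg: Message) -> None:
        # Rollback needs the saved state, so the message must still be alive.
        if msg.freed:
            raise RuntimeError("reverse event received for a freed message")
        self.bytes_stored = msg.saved_bytes_stored


def free_message(msg: Message) -> None:
    """Release a message; after this, its saved state is gone."""
    msg.freed = True
    msg.saved_bytes_stored = None


lp = BurstBufferLP()

ok = Message(8)
lp.forward(ok)
lp.reverse(ok)  # rollback succeeds: state returns to 0

bad = Message(8)
lp.forward(bad)
free_message(bad)  # freed too early
try:
    lp.reverse(bad)
    rollback_failed = False
except RuntimeError:
    rollback_failed = True

print(lp.bytes_stored, rollback_failed)
```

In this toy model the first rollback restores the state, while the second cannot because the message was released before the rollback arrived. A message must therefore outlive every rollback that might still reference it.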
