Burst Buffer Simulation In Dragonfly Network, Jian Peng, Michael Lang - PowerPoint PPT Presentation



SLIDE 1

Burst Buffer Simulation In Dragonfly Network

Jian Peng, Michael Lang

Illinois Institute of Technology, Los Alamos National Laboratory

SLIDE 2

Purpose:

  • Residing in the compute node network and using solid state drives (SSDs), burst buffers bring a significant I/O performance boost compared with a traditional external hard disk drive (HDD) storage system.
  • The bottleneck to fully leveraging burst buffers still remains unknown.
  • Shared network
  • Limited number of burst buffers
  • Due to large system scales, it is usually too expensive to change system setup and configurations.

SLIDE 3

Trinity System Overview:

  • System Level
  • Cabinet Level
  • Chassis Level
SLIDE 4

Inside a Group:

SLIDE 5

Trinity Phase II Network Configuration:

  • Global link: 37.6 GB/s
  • Local link: intra-chassis 5.25 GB/s; inter-chassis 15.75 GB/s (3 tiles)
  • Intra-blade link: PCIe 3.0, 16 GB/s

*All bi-directional
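
The link figures above allow a quick back-of-the-envelope check. The Python sketch below stores the listed bandwidths and computes an idealized lower bound on the time needed to push a given volume over a single link. The dictionary keys, the function name, and the 32 GB example volume are illustrative assumptions, and contention, routing, and protocol overhead are ignored.

```python
# Link bandwidths from the slide, in GB/s (all links bi-directional).
# The key names are illustrative labels, not identifiers from any tool.
LINK_BW_GBPS = {
    "global": 37.6,
    "local_intra_chassis": 5.25,
    "local_inter_chassis": 15.75,  # listed on the slide as 3 tiles
    "intra_blade_pcie3": 16.0,
}


def min_transfer_seconds(volume_gb: float, link: str) -> float:
    """Idealized lower bound: volume divided by the link's peak bandwidth."""
    return volume_gb / LINK_BW_GBPS[link]


# The inter-chassis figure is three times the intra-chassis figure,
# which is consistent with the "(3 tiles)" note on the slide.
tiles_ratio = (
    LINK_BW_GBPS["local_inter_chassis"] / LINK_BW_GBPS["local_intra_chassis"]
)

if __name__ == "__main__":
    print(f"inter/intra-chassis ratio: {tiles_ratio:.2f}")
    for name in LINK_BW_GBPS:
        print(f"{name}: {min_transfer_seconds(32.0, name):.2f} s for 32 GB")
```

These numbers are only lower bounds for a single uncontended link; the simulation exists precisely because shared links behave differently.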

SLIDE 6

General:

  • All-to-all pattern interconnections among routers.
  • Optical cable links among groups. In Trinity, each group link is made with 2 cables. Each cable provides 4.7 GB/s bandwidth.
  • On the cabinet level, there are 2 cabinets in each group, which are connected by backplane electrical links. Each cabinet contains 3 chassis. The bandwidth of each inter-chassis link is 15.75 GB/s.
  • Inside a chassis, the bandwidth of the link between each pair of routers is 5.25 GB/s.

SLIDE 7

Connections-Router

  • All nodes connected by routers
  • 10 inter-group ports
  • 15 inter-chassis ports
  • 15 inter-blade ports
SLIDE 8

Connections-Chassis

  • 16 blades
  • 40 connectors to other groups
  • 5 connectors to other chassis per blade
  • Backplane connections among blades
  • PCIe 3.0 x16 between a node and blade
SLIDE 9

Connections-Inter-group

  • Connection between 2 group ports of 2 routers in 2 groups.
  • One link between each pair of groups
  • Uses the Absolute (Direct) pattern.
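
The inter-group wiring above can be sketched in a few lines of Python. The sketch assumes one common definition of the absolute arrangement (global slot k of a group leads to group k, skipping the group itself); whether this matches the exact port assignment used in the simulation is an assumption, and all function names are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple


def absolute_peer(group: int, slot: int) -> int:
    """Peer group reached from `group` through global slot `slot`.

    Slots are numbered 0..G-2; slot k leads to group k, skipping the
    group itself (assumed definition of the absolute arrangement).
    """
    return slot if slot < group else slot + 1


def slot_towards(group: int, peer: int) -> int:
    """Inverse mapping: the slot in `group` that leads to `peer`."""
    if peer == group:
        raise ValueError("a group has no global link to itself")
    return peer if peer < group else peer - 1


def global_links(num_groups: int) -> List[Tuple[int, int]]:
    """One link per unordered pair of groups (all-to-all among groups)."""
    return list(combinations(range(num_groups), 2))


if __name__ == "__main__":
    num_groups = 23  # group count quoted for Trinity Phase II
    links = global_links(num_groups)
    print(len(links), "inter-group links")
    a, b = links[0]
    print(f"group {a} slot {slot_towards(a, b)} <-> "
          f"group {b} slot {slot_towards(b, a)}")
```

With one link per pair of groups, the link count grows as G(G-1)/2, so every group needs G-1 global slots spread over its routers.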
SLIDE 10

Datawarp

  • Burst buffers are implemented as Datawarp nodes in Cray XC40.

SLIDE 11

Simulation Detail

  • 96 routers
  • 10 Burst Buffer nodes
  • 2 LNET nodes
  • Final phase 224 LNET nodes
  • 360 Compute Nodes
  • 384 Nodes in total, 372 in simulation.
  • Trinity Phase II, 23 Groups
  • 230 Burst Buffers
  • 8280 Compute Nodes
  • Adaptive routing
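
The counts above can be cross-checked with a short Python sketch. It assumes the first bullets (96 routers, 10 burst buffer nodes, 2 LNET nodes, 360 compute nodes, 384 nodes) describe a single group, an interpretation the slide does not state explicitly but which agrees with the Phase II totals. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class GroupConfig:
    """Node counts taken from the slide, assumed to be per group."""
    routers: int = 96
    burst_buffer_nodes: int = 10
    lnet_nodes: int = 2
    compute_nodes: int = 360
    nodes_total: int = 384

    @property
    def simulated_nodes(self) -> int:
        # Nodes that actually take part in the simulation.
        return self.burst_buffer_nodes + self.lnet_nodes + self.compute_nodes

    @property
    def nodes_per_router(self) -> int:
        return self.nodes_total // self.routers


def system_totals(cfg: GroupConfig, groups: int) -> Dict[str, int]:
    """Scale the per-group counts to a system with `groups` groups."""
    return {
        "burst_buffers": cfg.burst_buffer_nodes * groups,
        "compute_nodes": cfg.compute_nodes * groups,
    }


if __name__ == "__main__":
    cfg = GroupConfig()
    print(cfg.simulated_nodes, cfg.nodes_per_router)
    print(system_totals(cfg, groups=23))
```

Under this reading, 10 + 2 + 360 gives the 372 simulated nodes, and 23 groups give the 230 burst buffers and 8280 compute nodes quoted for Phase II.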
SLIDE 12

Simulation Framework

  • Application Layer
  • IOR workload
  • Darshan 3.1 workload
  • Model-net Layer
  • Burst Buffer process
  • Codes-0.5.2
SLIDE 13

Results

  • N-N Write
  • 4 procs per node
  • 8MB stripe size on BB
  • <1024: 32GB per proc
  • >=1024: 2GB per proc
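
The workload parameters above translate into data volumes as in the following Python sketch. It assumes binary units (1 GB = 1024^3 bytes, 1 MB = 1024^2 bytes) and treats the 1024 threshold as a generic scale value, because the slide does not say whether it counts processes or nodes. All names are hypothetical.

```python
MIB = 1024 ** 2  # assumed binary megabyte
GIB = 1024 ** 3  # assumed binary gigabyte

PROCS_PER_NODE = 4
STRIPE_BYTES = 8 * MIB  # burst buffer stripe size from the slide


def bytes_per_proc(scale: int) -> int:
    """32 GB per process below a scale of 1024, 2 GB at 1024 and above."""
    return 32 * GIB if scale < 1024 else 2 * GIB


def stripes_per_proc(scale: int) -> int:
    """Number of 8 MB stripes each process writes."""
    return bytes_per_proc(scale) // STRIPE_BYTES


def bytes_per_node(scale: int) -> int:
    """Volume written by the 4 processes of one compute node."""
    return PROCS_PER_NODE * bytes_per_proc(scale)


if __name__ == "__main__":
    for scale in (512, 1024):
        print(scale, stripes_per_proc(scale), bytes_per_node(scale) // GIB, "GB/node")
```

Reducing the per-process volume at larger scales keeps the total written data, and hence the simulation time, within bounds.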
SLIDE 14

Results

  • N-1 Write
SLIDE 15

Results

  • N-N Read
SLIDE 16

Problems

  • Darshan traces of applications.
  • Mostly checkpointing at LANL
  • Lustre is fast enough.
  • Must use Datawarp APIs.
  • Datawarp software is still being updated.
  • Currently modeling Trinity Phase II. The final phase is still in progress:
  • 576 Burst Buffer Nodes
  • ~20,000 Compute nodes
  • More modeling details need to be confirmed.
  • READ bug:
  • The simulation ends when compute nodes scale up to 2048 in N-N Read.
  • It ends sooner in the N-1 Read simulation.
  • It seems that messages have already been freed when a reverse event is received.
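
The suspected READ bug can be illustrated with a toy Python model of optimistic simulation, where a rollback calls a reverse handler with the same message that the forward handler processed. This is not the CODES or ROSS API; the class names, the `free_early` flag, and the use of `None` to stand in for a freed buffer are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    """Event payload; `size` is set to None to emulate a freed buffer."""
    size: Optional[int]
    committed: bool = False


class BufferLP:
    """Toy logical process that tracks the bytes it has received."""

    def __init__(self) -> None:
        self.bytes_received = 0

    def forward(self, msg: Message, free_early: bool = False) -> None:
        # Forward handler: apply the event to the state.
        self.bytes_received += msg.size
        if free_early:
            msg.size = None  # emulates freeing before the event is committed

    def reverse(self, msg: Message) -> None:
        # Reverse handler: undo the forward handler using the same message.
        if msg.size is None:
            raise RuntimeError("reverse event received for a freed message")
        self.bytes_received -= msg.size

    def commit(self, msg: Message) -> None:
        # Safe point: the event can no longer be rolled back.
        msg.committed = True
        msg.size = None


if __name__ == "__main__":
    lp = BufferLP()
    ok = Message(size=8)
    lp.forward(ok)
    lp.reverse(ok)  # rollback succeeds, state is restored
    print(lp.bytes_received)

    bad = Message(size=8)
    lp.forward(bad, free_early=True)
    try:
        lp.reverse(bad)
    except RuntimeError as err:
        print("rollback failed:", err)
```

In this sketch the fix is to release the payload only in `commit`; whether the actual bug has the same cause remains to be confirmed.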