SLIDE 1

Extending Scalability of Collective IO Through Nessie and Staging

Parallel Data Storage Workshop November 13, 2011 Jay Lofstead (SNL) Ron Oldfield (SNL) Todd Kordenbrock (HP) Charles Reiss (UCB)

Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.

SLIDE 2

PDSW 2011 2

Jay Lofstead www.lofstead.org gflofst@sandia.gov

Motivation

  • Collective and two-phase IO have proven beneficial

– For relatively modest data volumes, 1-D, 2-D

  • But trading inter-node communication for data reorganization to save IO is not always beneficial

– Large datasets
– 3-D domain decompositions are particularly bad
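
To illustrate why 3-D domain decompositions are particularly costly, the sketch below counts how many separate contiguous file extents one process's block occupies. It is not taken from the talk: it assumes a row-major global array stored as a single contiguous region of the file, and the function name `contiguous_runs` and the 1024 x 1024 x 1024 example with 64 processes are hypothetical.

```python
from math import prod


def contiguous_runs(global_shape, block_shape):
    """Return (number of contiguous runs, elements per run) for one block.

    Assumption: the global array is stored in row-major (C) order as one
    contiguous region of the file. Real file formats may chunk or pad data.
    """
    k = len(global_shape) - 1
    # Trailing dimensions that the block spans completely merge with the
    # next slower dimension into a single, longer contiguous run.
    while k > 0 and block_shape[k] == global_shape[k]:
        k -= 1
    runs = prod(block_shape[:k])      # each slower index starts a new run
    run_length = prod(block_shape[k:])
    return runs, run_length


if __name__ == "__main__":
    global_shape = (1024, 1024, 1024)
    # Hypothetical decompositions of the same array over 64 processes.
    blocks = {
        "1-D (64 x 1 x 1)": (16, 1024, 1024),
        "2-D (8 x 8 x 1)": (128, 128, 1024),
        "3-D (4 x 4 x 4)": (256, 256, 256),
    }
    for name, block in blocks.items():
        runs, length = contiguous_runs(global_shape, block)
        print(f"{name}: {runs} runs of {length} elements")
```

Under these assumptions the 1-D block is a single run of 16777216 elements, the 2-D block is 128 runs of 131072 elements, and the 3-D block is 65536 runs of 256 elements. Each process holds the same amount of data, but the 3-D case has to either issue many small writes or exchange many small pieces between nodes before writing.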

SLIDE 3

Motivation

  • Problem: technique is central to some IO APIs

– netCDF4, HDF5

  • Problem: changing IO techniques or file formats may not be an option for some applications

– The CESM climate model is committed to the netCDF file format

  • Problem: continued scaling of problem sizes and resolution is making things worse

SLIDE 4

Solution

  • Use an efficient transport layer and data staging transparently in the IO stack

– Re-implement the native IO API (link-time compatible)
– Ensure the format on disk is identical

  • Requirements

– Efficient, portable transport layer
– Staging area functionality to reduce time to completion

SLIDE 5

Solution Architecture

[Figure: two IO stacks, labeled Native and Nessie. Both show Science Application, PnetCDF library (PnetCDF API, Native Implementation), and Storage. Additional labels: NSSI Redirector, PnetCDF API, Staging Functionality, PnetCDF Staging library.]

SLIDE 6

Nessie Transport Layer

  • Network Scalable Services Interface (Nessie)

– Originally developed for the Lightweight File Systems project
– RPC-like asynchronous API layer supporting RDMA
– Physical layer support

  • InfiniBand
  • Portals
  • Cray Gemini

– Server-directed for bulk data

  • Writes: pull from client
  • Reads: push to client

SLIDE 7

Staging Functionality

  • Collect data packets prior to writing to storage

– Cache data chunks to afford other optimizations

  • Perform data rearrangement

– Perform partial data rearrangement like two-phase IO

  • Use different techniques for writing to storage

– Direct, Caching, Aggregation

SLIDE 8

Nessie Performance (Portals)

[Figure: NSSI scaling performance on Red Storm (SeaStar network). Performance of xfer_write_rdma: throughput (MB/s, 512 to 2048) and percentage of peak (25 to 100) versus bytes per transfer (32 to 33554432), for 1, 4, 16, and 64 clients.]

SLIDE 9

Nessie Performance (InfiniBand)

[Figure: NSSI scaling performance on Thunderbird (InfiniBand network). Performance of xfer_write_rdma: throughput (MB/s, 256 to 1024) and percentage of peak (25 to 100) versus bytes per transfer (32 to 33554432), for 1, 4, 16, and 64 clients.]

SLIDE 10

NetCDF Staging Operation

  • 1. Initiate request
  • 2. Start data retrieval
  • 3. Move data
  • 4. Put completion/result
  • 5. Process in staging area

[Figure: Compute Area and Staging Area, with the steps numbered 1 to 5.]
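
The five steps above can be sketched as a small in-memory model of a server-directed write. This is an illustration only, not the Nessie/NSSI API: the class names `ComputeClient` and `StagingServer`, the method names, and the chunk size are all hypothetical, and the `expose` call merely stands in for a one-sided RDMA read issued by the staging side.

```python
class ComputeClient:
    """Compute-area process holding data to write (hypothetical model)."""

    def __init__(self, rank, payload):
        self.rank = rank
        self.payload = payload   # bytes the staging server is allowed to pull
        self.result = None       # filled in when the server reports completion

    def expose(self, offset, length):
        # Stand-in for a one-sided read of client memory by the server.
        return self.payload[offset:offset + length]


class StagingServer:
    """Staging-area process that pulls bulk data instead of receiving it inline."""

    def __init__(self, chunk_size=4):
        self.chunk_size = chunk_size
        self.trace = []          # record of the steps, in order
        self.store = {}          # data held in the staging area

    def handle_write(self, client, var_name):
        # Step 1: a small request describing the data has arrived.
        self.trace.append("1 initiate request")
        # Step 2: the server, not the client, decides when retrieval starts.
        self.trace.append("2 start data retrieval")
        # Step 3: the server pulls the data chunk by chunk at its own pace.
        buf = bytearray()
        for offset in range(0, len(client.payload), self.chunk_size):
            buf += client.expose(offset, self.chunk_size)
        self.trace.append("3 move data")
        # Step 4: the completion/result is put back to the client.
        client.result = ("ok", len(buf))
        self.trace.append("4 put completion/result")
        # Step 5: further processing happens in the staging area.
        self.store[(var_name, client.rank)] = bytes(buf)
        self.trace.append("5 process in staging area")
        return self.trace


if __name__ == "__main__":
    server = StagingServer()
    client = ComputeClient(rank=0, payload=b"0123456789")
    for step in server.handle_write(client, "temperature"):
        print(step)
```

Because the server initiates the bulk movement, it can delay step 2 until it has buffer space, which is one reason a server-directed design suits a staging area that serves many clients.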

SLIDE 11

NetCDF Staging Functionality

  • NetCDF4 and PnetCDF APIs supported
  • Direct – synchronous with client calls
  • Cache Independent – asynchronous with client calls
  • Aggregate Independent – asynchronous with client calls, but aggregates data prior to writing (on node only)

[Figure labels: request, response]
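
The difference between the three modes can be sketched with a toy staging writer. This is a minimal model, not the staging library's implementation: the class `StagingWriter`, the mode strings, and the `coalesce` helper are hypothetical, and merging adjacent byte ranges is only one assumed way to "aggregate data prior to writing".

```python
def coalesce(chunks):
    """Merge (offset, data) chunks whose byte ranges are adjacent."""
    merged = []
    for offset, data in sorted(chunks):
        if merged and merged[-1][0] + len(merged[-1][1]) == offset:
            prev_offset, prev_data = merged[-1]
            merged[-1] = (prev_offset, prev_data + data)
        else:
            merged.append((offset, data))
    return merged


class StagingWriter:
    """Toy model of one staging node handling client write calls."""

    MODES = ("direct", "cache_independent", "aggregate_independent")

    def __init__(self, mode):
        if mode not in self.MODES:
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode
        self.pending = []       # chunks held in staging memory
        self.file_writes = []   # one entry per write issued to storage

    def put(self, offset, data):
        """Handle one client call; True means storage was written before returning."""
        if self.mode == "direct":
            self.file_writes.append((offset, data))
            return True
        # Cached modes return to the client before storage is touched.
        self.pending.append((offset, data))
        return False

    def flush(self):
        """Write out everything held in staging (for example at file close)."""
        if self.mode == "aggregate_independent":
            chunks = coalesce(self.pending)   # fewer, larger writes
        else:
            chunks = list(self.pending)       # one write per cached chunk
        self.file_writes.extend(chunks)
        self.pending = []


if __name__ == "__main__":
    for mode in StagingWriter.MODES:
        writer = StagingWriter(mode)
        writer.put(4, b"efgh")
        writer.put(0, b"abcd")
        writer.put(12, b"mnop")
        writer.flush()
        print(mode, len(writer.file_writes), "writes to storage")
```

In this model the aggregate mode turns three client calls into two storage writes, because the chunks at offsets 0 and 4 are adjacent and only data held on the same staging node is merged.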

SLIDE 12

NetCDF Staging Functionality

  • Functionality not tested for this paper

– Collective IO versions of cache and aggregate: like the independent versions, but a maximal number of collective IO calls are made for writing
– Other data manipulation

  • Different implementation using Nessie for data analysis hosting

SLIDE 13

NetCDF Staging Performance

Testing on JaguarPF using S3D IO kernel

[Figure: Time (seconds), 0 to 6000, versus Process Count, 1024 to 8192, for four configurations: native PnetCDF, direct via staging, aggregate independent, and caching independent.]

SLIDE 14

Current Status

  • Nessie and NetCDF staging now part of Trilinos

– Trios capability area

  • Port to BlueGene underway
  • Integration of accelerators for staging processing

SLIDE 15

Future Work

  • Finish tests on RedSky to isolate Lustre issues
  • Test collective IO routines
  • Examine impact on reading performance
  • For Nessie, other applications

– ‘In flight’ data analysis routines
– Transactions for resilience in data staging (see our poster!)
– Hybrid, high level IO routine complications

  • Exodus + NetCDF