

SLIDE 1

PETTT

DISTRIBUTION STATEMENT A. Approved for public release. Distribution is unlimited.

I/O Mini-apps, Compression, and I/O Libraries for Physics-based Simulations

Presented by Sean Ziegeler (Engility PETTT) November 13, 2017

User Productivity Enhancement, Technology Transfer, and Training (PETTT)

SLIDE 2

MiniIO: I/O Mini-apps

The four mini-apps: "Unstruct", "Cartiso", "Struct", "AMR"


SLIDE 4

Struct Mini-app

— Struct: structured grids with masks/blanking

• Masks mark missing or invalid data (e.g., land in an ocean model)
• 2D simplex noise generates synthetic mask maps
• The percentage of blanked data points is selectable
• Noise frequency governs the sizes of blanked areas (continents vs. islands)
• 4D simplex noise fills the time-varying variables

— Option to load-balance non-masked points evenly across ranks

• But this creates a load imbalance for I/O, because blanked data is still written
• Compression should, in theory, rebalance the I/O, since blanked constants compress well (a mask-generation sketch follows below)
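To make the mask idea concrete, here is a minimal sketch of generating a synthetic mask with a chosen blanked fraction and feature size. The mini-app itself uses simplex noise; smoothed Gaussian noise stands in for it here, and all names and sizes are illustrative.

import numpy as np
from scipy.ndimage import gaussian_filter

def make_mask(shape=(512, 512), blank_fraction=0.4, feature_size=32.0, seed=0):
    rng = np.random.default_rng(seed)
    # Smooth random field; larger feature_size -> continents, smaller -> islands.
    noise = gaussian_filter(rng.standard_normal(shape), sigma=feature_size)
    # Threshold at the requested quantile so ~blank_fraction of points blank out.
    threshold = np.quantile(noise, blank_fraction)
    return noise >= threshold  # False = blanked (e.g., land), True = valid

mask = make_mask()
print(f"blanked points: {100.0 * (~mask).mean():.1f}%")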

SLIDE 5

Results

[Charts: throughput (GB/s) vs. core count for ADIOS POSIX on Broadwell (528-21,912 cores, up to ~200 GB/s) and KNL (512-8,192 cores, up to ~160 GB/s). Eight series: Unbal./No Compr., Unbal./zlib, Unbal./szip, Unbal./zfp, Bal./No Compr., Bal./zlib, Bal./szip, Bal./zfp. "Unbal." = computationally unbalanced; "Bal." = computationally balanced (and thus I/O unbalanced!).]

ADIOS POSIX: one file per rank

Series colors: red = no compression; blue = zlib deflate compression (think gzip); green = szip compression; purple = zfp (error-bounded lossy, tolerance 0.0001), ~9:1 on average. A small compression-comparison sketch follows below.
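For intuition on the zlib-vs-zfp trade-off above, this sketch compresses a smooth synthetic field both ways. It assumes the zfpy Python bindings for zfp (the slides used the C library via ADIOS), and the array shape and seed are arbitrary.

import zlib
import numpy as np
import zfpy  # Python bindings for zfp; assumed installed (pip install zfpy)

rng = np.random.default_rng(1)
data = np.cumsum(rng.standard_normal((128, 128, 64)), axis=2)  # smooth-ish field

lossless = zlib.compress(data.tobytes(), 6)         # like the "zlib" runs
lossy = zfpy.compress_numpy(data, tolerance=1e-4)   # absolute error bound 0.0001

print(f"zlib ratio: {data.nbytes / len(lossless):5.1f}:1")
print(f"zfp  ratio: {data.nbytes / len(lossy):5.1f}:1")
# zfp guarantees the error bound; verify on the round trip.
assert np.allclose(zfpy.decompress_numpy(lossy), data, atol=1e-4)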


SLIDE 6

Results

[Charts: throughput (GB/s) vs. cores for ADIOS POSIX on Broadwell and KNL; same eight series as above.]

ADIOS POSIX: one file per rank

— Initial scalability with core count
— Computational balancing hurts performance a little, but compression sometimes helps
— Zfp is the fastest compression
— KNL is slower
— ADIOS POSIX is the fastest method without compression (a file-per-rank sketch follows below)
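The pattern behind the POSIX method is one output file per MPI rank. A minimal mpi4py/numpy sketch of that pattern, not the ADIOS API itself; file names and sizes are placeholders.

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
field = np.random.default_rng(comm.rank).standard_normal((64, 64, 64))

# Each rank writes its own file: no inter-rank coordination, but the
# file count grows with the job size.
field.tofile(f"struct_out.{comm.rank:05d}.bin")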


SLIDE 7

Results

[Charts: throughput (GB/s) vs. cores for ADIOS MPI on Broadwell (up to ~400 GB/s) and KNL (up to ~100 GB/s); same eight series.]

ADIOS MPI: one file for all ranks

— Good scalability with core count, especially with compression
— Computational balancing hurts performance a little, but compression mostly helps
— Zfp is by far the fastest compression
— KNL is much slower, especially with compression (a shared-file sketch follows below)
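The MPI method's single shared file maps onto collective MPI-IO writes at rank-dependent offsets. A sketch with mpi4py, not ADIOS itself; the block size is arbitrary.

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
field = np.random.default_rng(comm.rank).standard_normal(64 * 64 * 64)

fh = MPI.File.Open(comm, "struct_out.bin", MPI.MODE_CREATE | MPI.MODE_WRONLY)
# Collective write: every rank deposits its block at a rank-dependent offset.
fh.Write_at_all(comm.rank * field.nbytes, field)
fh.Close()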


SLIDE 8

Results

[Charts: throughput (GB/s) vs. cores for ADIOS MPI-Lustre on Broadwell (up to ~450 GB/s) and KNL (up to ~100 GB/s); same eight series.]

ADIOS MPI-Lustre: one file for all ranks, tuned for the Lustre file system on that machine

— Good scalability with core count, especially with compression
— Computational balancing hurts performance a little, but compression mostly helps
— Zfp is by far the fastest compression
— KNL is much slower, especially with compression
— MPI-Lustre is the fastest method with compression (a striping-hints sketch follows below)
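The kind of Lustre tuning the MPI-Lustre method performs can be expressed through MPI-IO Info hints. The hint names below are ROMIO's standard Lustre hints, but whether they take effect depends on the MPI build; the values are illustrative.

from mpi4py import MPI

info = MPI.Info.Create()
info.Set("striping_factor", "64")     # stripe across 64 OSTs (assumed value)
info.Set("striping_unit", "1048576")  # 1 MiB stripe size (assumed value)

fh = MPI.File.Open(MPI.COMM_WORLD, "struct_out.bin",
                   MPI.MODE_CREATE | MPI.MODE_WRONLY, info)
fh.Close()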


SLIDE 9

Results

[Charts: throughput (GB/s) vs. cores for ADIOS MPI-Aggregate on Broadwell (roughly 20-370 GB/s) and KNL (up to ~100 GB/s); same eight series.]

ADIOS MPI-Aggregate: m files, with m < number of ranks; on Lustre, m = number of OSTs

— Good scalability with core count, especially with compression
— Computational balancing hurts performance very little; compression helps, but not as much
— Zfp is by far the fastest compression
— KNL is much slower, especially with compression (an aggregation sketch follows below)
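The aggregation idea is to funnel many ranks' data through m writers, one file each. A minimal mpi4py sketch under assumed sizes; real aggregators would set m to the OST count and use large contiguous writes.

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
m = 4                       # number of aggregators/files (assumed; m = #OSTs on Lustre)
color = comm.rank % m
group = comm.Split(color)   # one subcommunicator per output file

local = np.random.default_rng(comm.rank).standard_normal(64 * 64)
blocks = group.gather(local, root=0)   # collect the group's blocks

if group.rank == 0:                    # only aggregators touch the file system
    np.concatenate(blocks).tofile(f"struct_out.{color}.bin")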


SLIDE 10

Results

[Charts: throughput (GB/s) vs. cores for HDF5 on Broadwell (roughly 10-80 GB/s) and KNL (up to ~25 GB/s). Series: Unbal./No Compr., Unbal./zlib, Unbal./szip, Unbal./shuffle+zlib, Bal./No Compr., Bal./zlib, Bal./szip, Bal./shuffle+zlib.]

HDF5: one file for all ranks

— Starts slower, but scales with core count, especially with compression
— Computational balancing hurts performance a lot, but compression helps somewhat
— Shuffle+zlib is the fastest compression (zfp was not available for HDF5 at the time)
— KNL is much slower, especially with compression (an HDF5 filter sketch follows below)
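Shuffle+zlib is a built-in HDF5 filter pipeline. A serial h5py sketch of the combination (the slides used parallel HDF5 in C); dataset name, chunking, and level are illustrative.

import h5py
import numpy as np

data = np.cumsum(np.random.default_rng(2).standard_normal((256, 256, 64)), axis=2)

with h5py.File("struct_out.h5", "w") as f:
    # Shuffle reorders bytes so deflate sees longer runs; on float data this
    # usually improves both the ratio and the speed of zlib.
    f.create_dataset("field", data=data, chunks=(64, 64, 64),
                     shuffle=True, compression="gzip", compression_opts=6)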


SLIDE 11

Conclusions

— Compression can "fix" I/O performance issues introduced by computational load balancing

• With the right output method, it is faster than unbalanced, uncompressed output

— Compression can be faster than uncompressed I/O

• This has always been theoretically possible, but it is rare in practice
• Compression is partly computation, so it can scale with the simulation

— Zfp compression is very fast even at a modest compression ratio (~9:1)

• At scale, it produces "virtual" throughput faster than the file system (a rough worked example follows below)
• Shuffle+zlib in HDF5 is also good

— KNL is slower, with and without compression

• More cores per node → fewer nodes doing parallel I/O
• Much weaker integer processing means slower compression
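As a rough worked example of the "virtual" throughput claim, using the deck's own numbers: if the file system sustains on the order of 40 GB/s of physical writes and zfp compresses ~9:1, the application sees roughly 9 × 40 ≈ 360 GB/s of effective output bandwidth, provided the compression itself keeps pace; this is consistent with the ~350-450 GB/s peaks of the compressed Broadwell MPI and MPI-Lustre runs.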


SLIDE 12

Next Steps

— Tests on Intel Broadwell cores at larger scales

• Complete the 20k-core runs; begin 40-60k-core runs

— Zfp with HDF5

— Quilting (setting aside a few cores dedicated to I/O)

• Works very well for Struct [separate study by SDSC] and similar apps
• We hypothesize that quilting would pair very poorly with compression
• E.g., for zfp at scale, we expect not to want quilting at all, or at least to compress on all cores and quilt only for the actual I/O

— Tests on Intel Skylake cores

• Google Compute Engine, Gluster file system
• 512-4096 cores
• We hypothesize performance between Broadwell and KNL


SLIDE 13

This material is based upon work supported by, or in part by, the Department of Defense High Performance Computing Modernization Program (HPCMP) under User Productivity, Technology Transfer and Training (PETTT) contract number GS04T09DBC0017.


SLIDE 14

Work-in-Progress Abstract: Compiler-Assisted Scientific Workflow Optimization

Hadia Ahmed1, Peter Pirkelbauer2, Purushotham Bangalore2, Anthony Skjellum3

1 Lawrence Berkeley National Laboratory; 2 University of Alabama at Birmingham; 3 University of Tennessee at Chattanooga

puri@uab.edu · November 13, 2017

SLIDE 15

Introduction

Exascale Systems

• Data analytics will face tremendous challenges on exascale systems
• Many compute nodes communicate with analytics nodes
• Simulations produce vast amounts of data
• In-situ (in-transit) analytics is necessary to deal with limited bandwidth
• Simulation and analytics code needs to be re-organized


SLIDE 16

Idea

Describe Re-organization

• Users specify the re-organization with an annotation language
• A tool generates the optimized version
• Move code from the analytics node to the simulation (or vice versa)
• Describe reductions, etc. (a conceptual sketch follows below)
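To illustrate the kind of re-organization meant here, a purely conceptual Python sketch: the analysis step is moved into the simulation loop instead of running on a separate node after file output. All names are invented; the actual tool operates on C/C++ sources via ROSE, not on Python.

import numpy as np

def analyze(state):
    # Hypothetical analytics kernel; in the file-based workflow this would
    # run on an analytics node after reading the simulator's output files.
    return float(state.mean())

def simulate(steps=10):
    state = np.zeros((128, 128))
    rng = np.random.default_rng(0)
    for t in range(steps):
        state += rng.standard_normal(state.shape)
        # After re-organization the analysis runs in situ, inside the loop,
        # so no data crosses the file system between the two phases.
        print(f"step {t}: mean = {analyze(state):+.4f}")

simulate()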


SLIDE 17

Approach

Compiler-based: use ROSE to read, analyze, and re-organize source files.


SLIDE 18

Early Results

Restructured Bonds-CSym

• On a single system, we achieved speedups between 4% and 12%
• Restructured Bonds-CSym in a 1:1 configuration
• The re-organized code:
  • eliminates storage to the file system
  • eliminates data-container conversion
  • enables further compile-time optimizations
• Bonds-CSym is quadratic; smaller input sizes exhibit larger speedups
• Reduced need for network communication


SLIDE 19

Thank you

Contact: Peter Pirkelbauer (UAB), e-mail: pirkelbauer@uab.edu


SLIDE 20

Micro-Storage Services for Open Ethernet Drive

Hariharan Devarajan, hdevarajan@hawk.iit.edu; Anthony Kougkas, akougkas@hawk.iit.edu; Xian-He Sun, sun@iit.edu

SLIDE 21

Introduction


• High cost from storage:
  • Purchase
  • Real estate (physical space)
  • Maintenance
  • Energy: up to 40% of the entire energy footprint
• A very long and complex storage software stack
• Exascale will exacerbate this problem

# of storage nodes per supercomputer: K: 2000; KAUST: 400; Tianhe-2: 1000; Trinity: 400

SLIDE 22

Open Ethernet Drive


• Intelligent drive:
  • ARM-powered
  • Fixed-size RAM
  • Network card
  • Runs a full-fledged Linux OS
• Prototype devices by:
  • Seagate (Kinetic)
  • Western Digital (HGST)
• Presented in enclosures of multiple such drives (JBOD)
• Enclosures have an embedded switched fabric (60 Gbit/s)

SLIDE 23

Open Ethernet Drive - Initial results

Pros

• OEDs are capable Parallel FS and Object Store servers, as well as I/O accelerators (i.e., burst buffers)
• OEDs proved to be 2.2x to 15x more energy-efficient than a typical server
• They can achieve great parallelism for the same power cap

Cons

• Computation power is not on par with server nodes
• No API to use the JBOD
• Running a full-fledged Linux OS on OEDs is extremely heavy and poses unnecessary overheads

Published Work

• H. Devarajan, A. Kougkas, and X.-H. Sun, "Open Ethernet Drive: Evolution of Energy-Efficient Storage Technology," in Proceedings of DataCloud'17, Denver, CO.
• A. Kougkas, A. Fleck, and X.-H. Sun, "Towards Energy-Efficient Data Management in HPC: The Open Ethernet Drive Approach," in Proceedings of PDSW-DISCS'16, pp. 43-48.

SLIDE 24

Proposal – Design Objectives

• Micro storage kernel:
  • Minimize unnecessary OS overheads
  • Modules that are not crucial to storage nodes would be removed
• Maximize performance:
  • Fine-tune the kernel to better suit the needs of the OED technology
• Lightweight API
• Maximize utilization of the JBOD:
  • Parallelization of I/O tasks
  • Offload small computations to the JBOD
• JBOD services:
  • Manager, I/O scheduler, load balancer
  • Provide a mount point for the application


SLIDE 25

Our first steps


• BusyBox 1.27.2 Linux:
  • Used as a building block
  • Very small size (~5 MB)
  • Added the XFS file system
• Results:
  • Reduced boot time by 1300% (i.e., ~13x faster)
  • Smaller memory footprint, leaving more memory available to applications (from 350 MB down to only 15 MB)
• Next steps:
  • Investigate other lightweight Linux distributions for embedded and mobile platforms (e.g., ToyBox)
  • Develop a lightweight parallel file system within the JBOD

SLIDE 26

Micro-Storage Services for Open Ethernet Drive


Hariharan Devarajan, hdevarajan@hawk.iit.edu

SLIDE 27

Comprehensive Burst Buffer Evaluation

Eugen Betke, Julian Kunkel

Research Group, German Climate Computing Center, 2017-11-12


SLIDE 28

Objectives

• Understanding how burst buffers can be used in an alternative way
  • Burst buffers are mainly used for catching I/O peaks
• Improving the runtime of I/O-intensive applications through better workflows
• Reducing procurement costs through intelligent usage of burst buffers


SLIDE 29

Test systems and evaluation tools

Test systems

• Kove XPD [3]: in-memory storage
• DDN IME [5]: SSD-based
• Cray DataWarp [2]: SSD-based

Parallel I/O benchmark tools

• NetCDF-Bench [4]:
  • a parallel NetCDF benchmark
  • generates I/O load to a shared NetCDF file
  • mimics scientific data
  • Many climate scientists favor NetCDF over other formats, because it offers powerful features and has a simple interface (a minimal shared-file sketch follows below)
• IOR:
  • uses the MPI-IO interface in our tests
  • generates I/O load to individual files in order to get the best I/O performance
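In miniature, the shared-NetCDF-file load that NetCDF-Bench generates looks like every rank writing its slice of one variable in one file. A netCDF4-python sketch, assuming parallel builds of HDF5/netCDF; dimensions and names are illustrative.

from mpi4py import MPI
import numpy as np
from netCDF4 import Dataset

comm = MPI.COMM_WORLD
nc = Dataset("bench.nc", "w", parallel=True, comm=comm, info=MPI.INFO_NULL)
nc.createDimension("x", 64 * comm.size)
var = nc.createVariable("field", "f8", ("x",))

# Every rank writes its own slice of the one shared variable/file.
var[comm.rank * 64:(comm.rank + 1) * 64] = np.arange(64, dtype="f8")
nc.close()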


SLIDE 30

Short-term campaign storage space

Purpose: reduction of the I/O load on the main storage.

Basic idea: storing temporary data on the main storage may be inefficient; instead,
• temporary data is stored on the burst buffer
• results are stored on the main storage

Expectation: speed-up of I/O-intensive applications.

Evaluation methodology: gathering of burst buffer characteristics.

Goal: intelligent and efficient workflows.

[Diagram: an I/O-intensive application sends temporary data to the burst buffer and final results to the main storage; a sketch of the split follows below.]
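A minimal sketch of that split. DW_JOB_STRIPED is the path Cray DataWarp exposes for a job's burst-buffer allocation; the fallback, the results path, and the file names are placeholders.

import os
import numpy as np

burst = os.environ.get("DW_JOB_STRIPED", "/tmp/burst")  # burst-buffer mount
main = "results"                                        # main-storage path (assumed)
os.makedirs(burst, exist_ok=True)
os.makedirs(main, exist_ok=True)

state = np.zeros((1024, 1024))
np.save(os.path.join(burst, "checkpoint_0001.npy"), state)  # temporary: fast tier
# ... more iterations, re-reading temporaries from the burst buffer ...
np.save(os.path.join(main, "final_result.npy"), state)      # persistent result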


SLIDE 31

Reducing procurement costs of HPCs [1]

[Diagram: compute nodes CN0 ... CNX, each with 64 GB RAM, attached to 52 PB of storage.]

Observations made on Mistral [1] (the HPC system of DKRZ):

• Most applications use only a fraction of the available memory
• A few memory-intensive applications have high memory requirements


SLIDE 32

Reducing procurement costs of HPCs [2]

[Diagram: compute nodes CN0 ... CNX, each reduced to 32 GB RAM, attached to 52 PB of storage plus a remote swap tier (how large?).]

Purpose: reducing total HPC costs.

Basic idea: equip compute nodes with less memory; memory-intensive applications use a remote swap file system.

Expectation:
• Most programs are not affected
• Memory-intensive applications are affected by swap overhead

Evaluation methodology: tracing of swap-in/out with kprobes.

Goal: a cost model (a toy sketch follows below).
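A toy version of the cost model being proposed, with every number invented purely for illustration: halve the per-node RAM, add a shared remote-swap tier, and charge a slowdown only to the memory-intensive fraction of jobs.

nodes = 1000
ram_cost_64gb = 600.0          # $/node, invented
ram_cost_32gb = 300.0          # $/node, invented
swap_tier_cost = 150_000.0     # shared remote-swap hardware, invented
mem_hungry_fraction = 0.10     # share of jobs that would actually swap
swap_slowdown = 1.5            # runtime penalty for those jobs

capex_saving = nodes * (ram_cost_64gb - ram_cost_32gb) - swap_tier_cost
extra_load = mem_hungry_fraction * (swap_slowdown - 1.0)
print(f"capex saving:      ${capex_saving:,.0f}")
print(f"extra system load: {extra_load:.1%}")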


SLIDE 33

References

[1] HLRE-3 "Mistral". https://www.dkrz.de/Klimarechner/hpc. Accessed on 2017-03-22.
[2] Cray Inc. Cray XC40 DataWarp Applications I/O Accelerator. Cray Inc., 901 Fifth Avenue, Suite 1000, Seattle, WA 98164, Oct. 2015.
[3] Kove. Kove XPD. http://kove.net/downloads/Kove-XPD-L3-datasheet.pdf. Accessed on 2017-08-24.
[4] NetCDF-Bench. https://github.com/joobog/netcdf-bench. Accessed on 2017-08-25.
[5] DDN Storage. Burst Buffer & Beyond: I/O & Application Acceleration Technology. DDN Storage, Sept. 2015.


SLIDE 34

spcl.inf.ethz.ch @spcl_eth

S. Di Girolamo, P. Schmid, T. Schulthess, T. Hoefler

SimFS: A Simulation Data Virtualizing File System

SLIDE 35

Disk-Backed Solution

[Diagram: (1) the simulator runs the simulation; (2) the results are stored on disk; (3) analysis/visualization tools T1-T4 analyze the results. Trade-off axes: elasticity, persistent data, I/O capacity, I/O bandwidth.]

Data growth: 1995: megabytes; 2015: petabytes; 2025: exabytes? At a maintenance cost of $100/TB/year, an exabyte costs $100,000,000 per year.

SLIDE 36

SimFS: Virtualizing Simulation Data

[Diagram: (1) COSMO runs the simulation; (2) checkpoints are stored; (3) analysis tools T1-T4 analyze the results, requesting data (4) through a Data Virtualization Layer; along the simulation timeline, a requested restart triggers re-simulation of the missing interval.]

How to cache? Where to cache? How to prefetch? When to prefetch?

SLIDE 37

Backup

SLIDE 38

In Situ Solution

[Diagram: (1) the simulator runs the simulation; (2) analysis/visualization tools T1-T4 get simulation data directly, in situ. Same trade-off axes: elasticity, persistent data, I/O capacity, I/O bandwidth.]

SLIDE 39

SDaVI Framework

[Diagram: each node holds per-process local caches beneath an intra-node cache; the intra-node caches of all nodes sit beneath an inter-node virtualizer/cache, the DVL.]

SLIDE 40

DVL-C

[Sequence: the analysis tool's nc_open(x) is intercepted; the DVL client issues i.query(x), sends query(x) to the DVL, and waits for an ACK. On a hit the DVL notifies the analysis tool, which then calls the real nc_open(x). Hit = the data already exists from an offline simulation.]

SLIDE 41

DVL-C

[Sequence (miss path): nc_open(x) triggers query(x); the DVL notifies DVL-S, which computes r = restart(x) and s = simblock(x) and runs simulate(r, s); the simulator re-creates x via nc_open(x), nc_put calls, and nc_close(x), then i.insert(x) registers it; the DVL finally notifies the waiting analysis tool, which calls the real nc_open(x).]

SLIDE 42

DVL-C

[Sequence (in-situ miss): as above, but while the simulator regenerates x the analysis tool waits for data and can fetch it directly, e.g., nc_get(x, t1) served over RDMA before the file is even closed. Miss = in-situ simulation. A conceptual sketch of the open path follows below.]
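A conceptual Python sketch of the hit/miss open path shown in these sequences. The function names are illustrative stand-ins, not SimFS's actual API, and the "simulator" here merely writes a placeholder file.

import os

def resimulate(path):
    # Stand-in for the DVL-S side: restart(x), simblock(x), simulate(r, s),
    # then writing x and inserting it into the index.
    with open(path, "wb") as f:
        f.write(b"\x00" * 1024)

def dvl_open(path):
    if os.path.exists(path):   # hit: the data was produced offline
        return open(path, "rb")
    resimulate(path)           # miss: re-create the file, then open it
    return open(path, "rb")

fh = dvl_open("cosmo_output_0042.dat")  # hypothetical checkpoint-derived file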

SLIDE 43

Does intra/local node caching make sense?

M. Besta, T. Hoefler, "Fault Tolerance for Remote Memory Access Programming Models," HPDC'14:

RMA read of 10 MB:
• Intra-node: 1.08 ms
• Inter-node: 3.47 ms
• Intra-cabinet: 7.74 ms
• Inter-cabinet: 11.36 ms


SLIDE 44

Establishing the IO-500 Benchmark

Julian M. Kunkel, John Bent, Jay Lofstead, George S. Markomanolis · 2017-11-13 · http://www.io500.org

SLIDE 45

The IO-500

Goals

• Tracking storage performance
• Sharing best practices

Benchmarking approach

• Community-driven effort
• Patterns: metadata, data, search
• "Easy" runs for optimized patterns, "hard" runs for naive patterns
• Relies on community benchmarks

[Diagram: the workloads span data-pattern complexity (IOR Easy vs. IOR Hard) and namespace complexity (MD Easy vs. MD Hard), plus Find.]


SLIDE 46

[List of results from BeeGFS, DataWarp, IME, Spectrum Scale, and Lustre systems.]


SLIDE 47

Challenges of Establishing the Benchmark

This is a short summary of experience gained from:

• Feedback from discussions (SC/ISC BoFs, peers)
• Feedback from people executing the IO-500 on different systems

Thanks to everybody contributing!


SLIDE 48

Challenges & Approach

Representative of applications and user requirements

• Supply workloads providing an upper bound for optimized applications and a performance expectation for non-optimized applications
• More workloads and concurrent execution remain to be integrated

Understandable and comprehensible results

• Report meaningful metrics
• Aim for low variability across repeated measurements
• Compute an overall score for ranking, but retain the individual values (a scoring sketch follows below)
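One way to fold phase results into a single ranking number while retaining the individual values. IO-500 itself combines geometric means of bandwidth and metadata results; treat the exact phase names and numbers below as illustrative, not the official formula.

from math import prod

bandwidth_gib_s = {"ior-easy-write": 12.0, "ior-easy-read": 14.0,
                   "ior-hard-write": 0.5, "ior-hard-read": 0.8}
metadata_kiops = {"mdtest-easy": 50.0, "mdtest-hard": 6.0, "find": 120.0}

def gmean(values):
    # Geometric mean: punishes a benchmark that only shines on one phase.
    vals = list(values)
    return prod(vals) ** (1.0 / len(vals))

bw = gmean(bandwidth_gib_s.values())   # one bandwidth score
md = gmean(metadata_kiops.values())    # one metadata score
print(f"bw={bw:.2f} GiB/s  md={md:.2f} kIOPS  score={(bw * md) ** 0.5:.2f}")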


SLIDE 49

Challenges & Approach

Portable

• Ran into Python (shell) portability issues
• C APIs: readdir() does not return the file type on DataWarp; a non-POSIX stat() call on one system

Inclusive: cover various storage technologies and non-POSIX APIs

• Allow vendors to use specific optimizations (for the "easy" runs)
• Enable replacement of find (IBM Spectrum Scale has optimizations here)
• Relying on IOR's AIOR interface (thanks to Nathan for porting mdtest)
• We are still in the process of supporting more storage APIs


SLIDE 50

Challenges & Approach

Scalable: runs on large-scale computers and relevant storage systems

• IOR and mdtest are MPI-parallelized
• Supply a parallel find version

Lightweight: easy to set up and cheap to run

• 5-minute write/creation phases to limit runtime
• Extended IOR/mdtest with phase-out stonewalling options

Trustworthy: prevent (unintended) cheating

• Reveal all tunings made (which also shares best practice)
• Require a sufficiently large working set


SLIDE 51

Visit our Birds of a Feather at SC
