SLIDE 1

Bringsel: A Tool for Measuring Storage System Reliability, Uniformity, Performance and Scalability

John Kaitschuck Cray Federal jkaitsch@cray.com 5/2007

CUG2007

SLIDE 2

Overview

  • Challenges in File Systems Testing and Technology
  • Points for Consideration
  • A Generalized Requirement Framework
  • Bringsel, Yet Another File System Benchmark?
  • Features
  • Examples
  • Sample Output
  • Testing/Taxonomy
  • Some Results
  • Possible Future Directions for Bringsel
  • Questions

SLIDE 3

Challenges in File System Testing and Technology

  • Primary focus within community, users and suppliers.
  • Rarely consider reliability (implied/assumed).
  • Pace of hardware technology vs. system software.
  • Limits on testing, temporal and hardware wise.
  • Focus derived from RFP/SOW/Facility breakdown.
  • Scaling, doing end to end testing.
  • Historical context, past vs. present.
  • Differing customer/user requirements.
  • Sometimes ideas ignore operational context.

"If seven maids with seven mops
Swept it for half a year,
Do you suppose," the Walrus said,
"That they could get it clear?"
- Lewis Carroll

SLIDE 4

Points for Consideration

[1] Service Specifics - APIs, Documentation, Security...
[2] Reliability - Given N bits, reflect N bits of content...
[3] Uniformity - Under load X, for period T...
[4] Performance - Provide high bandwidth, low latency...
[5] Scalability - Provide [1] through [4] at the sizes required...

[Slide graphic legend: Full, Partial]

SLIDE 5

A Generalized Requirement Framework

$$\sum Se_a + \sum Re_b + \sum Un_c + \sum Pe_d + \sum Sc_e$$

  • Where these elements take on a series of unique values, which are...
  • Defined by the facility.
  • Defined by the application(s).
  • Constrained by the technology/architecture (fs, dfs, pfs).

SLIDE 6

A Generalized Requirement Framework: Ideally

$$\sum Se_a + \sum Re_b + \sum Un_c + \sum Pe_d + \sum Sc_e$$

[Diagram labels: Benchmarks, Technology]

SLIDE 7

Bringsel, Yet Another File System Benchmark?

  • Plenty of existing benchmarks/utilities: bonnie++, iozone, filebench, perf, pdvt, ior, xdd, explode trace, etc.
  • Not all are "operational inclusive" (mixed ops and blocks).
  • Most focus on separated MD/Data testing.
  • Need a known context: bringsel development started in ~1998, focused on HPTC, as a strictly part-time project.
  • Need a code that is easy to modify, comment, extend and maintain, and that balances simplicity and complexity.
  • Need a code with a known utilization history (industry, NSF, other Federal sites).
  • Need to focus on a central point within user space for "nd" I/O.
  • Unique tools enable unique discoveries.
  • Diversification of available test programs.

SLIDE 8

Features

  • Symmetric tree creation and population.
  • MultiAPI support: POSIX, STREAM, MMAP, MPI_IO.
  • POSIX threads support (AD).
  • File checksums via haval.
  • Directory walks, across created structures.
  • Metadata loop measurements.
  • MSI support via MPI (MPP/Clusters).
  • Mixed access types (RW, SR, etc.).
  • Mixed block sizes (16K, 1024K, etc.).
  • Remedial configuration file parsing.
  • Coordinated looping/iteration support.
  • Misc functionality: truncation, async I/O, appending, etc.
  • Numerous reliability checks.
  • Of course, bandwidth and IOPS performance measurement as well.

SLIDE 9

Examples: Simple CLI Invocation

General File Operation:

bringsel -T 4 -D /snarf/foo:1,2,2 -M -L -c -b 32 -S 100M alpha

Directory Walk:

bringsel -T 4 -a sx -D /snarf/foo:1,2,2 -L

SLIDE 10

Examples: Configuration File Utilization

Configuration file:

#
# Comments begin with "#"
#
-T 4 -D /snarf/foo:1,2,2 -M -L -c -b 32 -S 100M alpha
-T 4 -a sx -D /snarf/foo:1,2,2 -L

Invocation:

bringsel -C ./sample.cnf
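
A configuration file of this shape can be read in a few lines of Python. The sketch below is not bringsel's own parser. It assumes that every non-comment line holds one set of command-line options and that blank lines are ignored; `parse_config` is a hypothetical helper name.

```python
import shlex

def parse_config(text):
    """Return one argument list per non-comment, non-blank line."""
    runs = []
    for line in text.splitlines():
        stripped = line.strip()
        # Lines starting with "#" are comments, as in the sample file.
        if not stripped or stripped.startswith("#"):
            continue
        runs.append(shlex.split(stripped))
    return runs

sample = """\
#
# Comments begin with "#"
#
-T 4 -D /snarf/foo:1,2,2 -M -L -c -b 32 -S 100M alpha
-T 4 -a sx -D /snarf/foo:1,2,2 -L
"""

for args in parse_config(sample):
    print(["bringsel"] + args)
```

Each returned list corresponds to one invocation, so a driver could loop over the runs in file order.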

SLIDE 11

Example: Parallel Directory Creation

bringsel -T 4 -D /snarf/foo:1,2,2 -M -L -c -b 32 -S 100M alpha

Directory tree (levels 1, 2, 2; MAXD = 20, MAXB = 100), with the creating thread as labelled on the slide:

/snarf/foo
  A0001 (T1)
    B0001 (T1)
      C0001 (T1)
      C0002 (T2)
    B0002 (T2)
      C0001 (T3)
      C0002 (T4)

[Diagram: one mkdir + stat pair per directory, seven in total, with two barriers along the time axis. Axis labels: Time, N.]
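
The tree specification can be expanded into concrete paths as follows. This is a minimal sketch based on a reading of the diagram, not on bringsel's source: it assumes that each number after the colon is the count of subdirectories created in every directory of the previous level, and that level k is named with the k-th letter plus a four-digit index. `expand_tree` is a hypothetical name.

```python
def expand_tree(spec):
    """Expand 'root:n1,n2,...' into the list of directory paths it implies.

    Assumption: each number is a per-directory fan-out for one level, and
    level k uses the k-th letter (A, B, C, ...) plus a four-digit index.
    """
    root, _, counts = spec.partition(":")
    fanout = [int(n) for n in counts.split(",")]
    levels = [[root]]
    for depth, n in enumerate(fanout):
        letter = chr(ord("A") + depth)
        levels.append([f"{parent}/{letter}{i:04d}"
                       for parent in levels[-1]
                       for i in range(1, n + 1)])
    # Skip the root itself; return the created directories level by level.
    return [path for level in levels[1:] for path in level]

for path in expand_tree("/snarf/foo:1,2,2"):
    print(path)
```

Under these assumptions `1,2,2` yields seven directories, which matches the seven mkdir/stat pairs on the slide.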

SLIDE 12

Example: Metadata Loop Operations

bringsel -T 4 -D /snarf/foo:1,2,2 -M -L -c -b 32 -S 100M alpha

Threads T1 to T4 each use one temporary file in A0001 (tmp_file1, tmp_file2, tmp_file3, tmp_file4).

Operation sequence: open, close, stat, rename, mkdir, chmod, utime, Error?
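
One pass of such a metadata loop can be sketched with Python's `os` module. This is an illustration of the operation sequence only, not bringsel's implementation; the file names, permission bits and the choice to time open and close together are assumptions.

```python
import os
import tempfile
import time

def metadata_loop(directory, name="tmp_file1"):
    """Run one pass of metadata operations and return (op, seconds) pairs."""
    path = os.path.join(directory, name)
    renamed = path + ".renamed"
    subdir = os.path.join(directory, name + "_dir")
    steps = [
        ("open+close", lambda: os.close(os.open(path, os.O_CREAT | os.O_WRONLY, 0o600))),
        ("stat", lambda: os.stat(path)),
        ("rename", lambda: os.rename(path, renamed)),
        ("mkdir", lambda: os.mkdir(subdir)),
        ("chmod", lambda: os.chmod(renamed, 0o644)),
        ("utime", lambda: os.utime(renamed, None)),
    ]
    timings = []
    for op, call in steps:
        start = time.perf_counter()
        call()  # a failure raises OSError, which plays the role of "Error?"
        timings.append((op, time.perf_counter() - start))
    return timings

with tempfile.TemporaryDirectory() as scratch:
    for op, seconds in metadata_loop(scratch):
        print(f"{op:10s} {seconds * 1e6:8.1f} us")
```

Running one such loop per thread, each on its own file, would approximate the four-thread layout shown on the slide.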

SLIDE 13

Example: File Operations

bringsel -T 4 -D /snarf/foo:1,2,2 -M -L -c -b 32 -S 100M alpha

Threads T1 to T4 each handle one 100 MB file in A0001 (alpha_0001, alpha_0002, alpha_0003, alpha_0004), using 32 KB blocks through the POSIX interface.

Operation sequence: open, write, close, chksum, Error?
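
The write-then-checksum pattern can be sketched as below. Several things here are assumptions made for the demo: SHA-256 from Python's `hashlib` stands in for the haval checksum that bringsel uses, the file size is reduced from 100 MB to 1 MB, and random data is written because the slide does not say what the file contents are.

```python
import hashlib
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

BLOCK = 32 * 1024          # 32 KB blocks, as with -b 32
FILE_SIZE = 1024 * 1024    # 1 MB instead of 100 MB, to keep the demo fast

def write_and_verify(path):
    """Write FILE_SIZE bytes in BLOCK-sized writes, re-read, compare digests."""
    written = hashlib.sha256()
    with open(path, "wb") as f:
        for _ in range(FILE_SIZE // BLOCK):
            block = os.urandom(BLOCK)
            f.write(block)
            written.update(block)
    read_back = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(BLOCK), b""):
            read_back.update(block)
    return written.hexdigest() == read_back.hexdigest()

def run(directory, threads=4):
    """Write one alpha_NNNN file per thread and report whether each verified."""
    paths = [os.path.join(directory, f"alpha_{i:04d}") for i in range(1, threads + 1)]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return dict(zip(paths, pool.map(write_and_verify, paths)))

with tempfile.TemporaryDirectory() as scratch:
    for path, ok in run(scratch).items():
        print(os.path.basename(path), "ok" if ok else "CHECKSUM MISMATCH")
```

A read that follows the write this closely is likely served from the page cache, so a real reliability test would need to defeat caching or re-read later.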

SLIDE 14

Example: Sequence of Operations

bringsel -T 4 -D /snarf/foo:1,2,2 -M -L -c -b 32 -S 100M alpha

[Diagram: the tree under /snarf/foo (A0001; B0001, B0002; C0001 and C0002 under each B directory; levels 1, 2, 2). Labels: Complete 4x 100 MB, Next, sequence numbers 1 to 6, barriers between stages, End Results.]

SLIDE 15

Example: Directory Walk

bringsel -T 4 -a sx -D /snarf/foo:1,2,2 -L

Threads T1 to T4 walk the tree (A0001; B0001, B0002; C0001 and C0002 under each B directory). Diagram label: 4x 100 MB.

Operation sequence: opendir, readdir, stat, rewinddir, closedir, Error?
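
A directory walk that gathers symlink, file and directory counts can be sketched with `os.scandir`, which wraps the opendir/readdir/closedir sequence. This is an illustrative single-threaded version, not bringsel's walker, and it has no counterpart to rewinddir.

```python
import os
import tempfile

def walk(top):
    """Recursively count symlinks, files and directories below top."""
    counts = {"sym": 0, "file": 0, "dir": 0}
    with os.scandir(top) as entries:            # opendir ... closedir
        for entry in entries:                   # readdir
            if entry.is_symlink():
                counts["sym"] += 1
            elif entry.is_dir(follow_symlinks=False):
                counts["dir"] += 1
                for key, value in walk(entry.path).items():
                    counts[key] += value
            else:
                entry.stat(follow_symlinks=False)   # stat each file
                counts["file"] += 1
    return counts

with tempfile.TemporaryDirectory() as scratch:
    os.makedirs(os.path.join(scratch, "A0001", "B0001"))
    open(os.path.join(scratch, "A0001", "alpha_0001"), "w").close()
    print(walk(scratch))
```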

SLIDE 16

Example: Hash Trees

[Diagram: two copies of the same tree (A0001; B0001, B0002; C0001 and C0002 under each B directory), one under "/snarf/foo" and one labelled "/snarf /widget". Labels: Repository, Backup, Restore.]

SLIDE 17

Example: Hash Tree Formulation

bringsel -T 4 -a ds -D /snarf/foo:1,2,2

Each directory is shown with a .bringsel_sd01 file:

B0002 (2): A = H(f1, f2, f3, f4, B, C)
C0001 (5): B = H(f1, f2, f3, f4)
C0002 (6): C = H(f1, f2, f3, f4)

V = H(f1 → fn, d1 → dn)

H() → SHA-256

[Diagram: threads T2, T3, T4 and a barrier along the time axis. Axis labels: Time, N.]
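
The formulation V = H(f1 → fn, d1 → dn) can be sketched as a recursive directory hash. This is a minimal illustration under stated assumptions, not bringsel's code: entries are processed in sorted name order so the result does not depend on readdir order, only contents (not names) feed the hash, and nothing is written to a `.bringsel_sd01` file.

```python
import hashlib
import os
import tempfile

def file_hash(path):
    """SHA-256 digest of one file's contents."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(65536), b""):
            h.update(block)
    return h.digest()

def dir_hash(path):
    """V = H(f1..fn, d1..dn): file digests first, then child directory digests."""
    files, dirs = [], []
    with os.scandir(path) as entries:
        for entry in entries:
            (dirs if entry.is_dir(follow_symlinks=False) else files).append(entry.name)
    h = hashlib.sha256()
    for name in sorted(files):
        h.update(file_hash(os.path.join(path, name)))
    for name in sorted(dirs):
        h.update(dir_hash(os.path.join(path, name)))
    return h.digest()

def make_tree(root, payload):
    """Build a tiny B0002/C0001 tree holding one file with the given payload."""
    os.makedirs(os.path.join(root, "B0002", "C0001"))
    with open(os.path.join(root, "B0002", "C0001", "alpha_0001"), "wb") as f:
        f.write(payload)

with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
    make_tree(a, b"same")
    make_tree(b, b"same")
    print(dir_hash(a) == dir_hash(b))
```

Comparing the top-level digest of an original tree with that of a restored copy gives a single yes/no answer on whether any file content changed.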

SLIDE 18

Sample Raw Output

Standard File Operations, output columns:
Op/Size | Date/Time | Thread/Iter | MD Time | Opn Lat | IOPs | MBps | Error? | Etime

Directory Walk, output columns:
Op/Dir | Date/Time | Thread/Iter | MD Time | Sym Cnt | File Cnt | Dir Cnt | Etime | Error?

SLIDE 19

Testing/Taxonomy

[ 24 : 1 : 1 : 1,0 : 0,0 ] POS : CR : 64K : 310M

Fields, in order:
  • Nodes
  • Threads per Node
  • Directory
  • Serial Access (# files, str/seg)
  • Parallel Access (# files, str/seg)
  • Interface
  • Operation
  • Block Size
  • File Size
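
A taxonomy string of this form can be split into named fields. The sketch below is a hypothetical helper, not part of bringsel. It assumes five colon-separated fields inside the brackets and four after them, in the order the slide labels them, and keeps every value as a string because entries such as VAR and * also appear in later slides.

```python
import re
from dataclasses import dataclass

@dataclass
class CaseSpec:
    nodes: str
    threads_per_node: str
    directory: str
    serial_access: tuple      # (# files, str/seg)
    parallel_access: tuple    # (# files, str/seg)
    interface: str
    operation: str
    block_size: str
    file_size: str

def parse_taxonomy(text):
    """Split '[ a : b : c : d,e : f,g ] IF : OP : BS : FS' into a CaseSpec."""
    match = re.fullmatch(r"\s*\[(.*?)\]\s*(.*?)\s*", text)
    if match is None:
        raise ValueError(f"not a taxonomy string: {text!r}")
    head = [field.strip() for field in match.group(1).split(":")]
    tail = [field.strip() for field in match.group(2).split(":")]
    if len(head) != 5 or len(tail) != 4:
        raise ValueError(f"expected 5 + 4 fields: {text!r}")

    def pair(field):
        return tuple(part.strip() for part in field.split(","))

    return CaseSpec(head[0], head[1], head[2], pair(head[3]), pair(head[4]), *tail)

print(parse_taxonomy("[ 24 : 1 : 1 : 1,0 : 0,0 ] POS : CR : 64K : 310M"))
```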

SLIDE 20

Sample Results: Reliability

  • Of 25 Tests...
  • ~350 TB of data written without corruption or access failures.
  • No major hardware failures in ~90 days of operation.
  • All checksums valid.
  • Early SLES9 NFS client problems under load, detected and corrected via patch (735130).

  • 1 FC DDU failure, without data loss.
  • Spatial use from 0% to 100%+ during various test cases.
  • Test case durations of several minutes to several days.

SLIDE 21

Sample Results: Uniformity

~10% variation across a 12.5 hour run.

[ 24 : 1 : * : 1,0 : 0,0 ] POS:CR:64K:310M - SLES9 2.6.5-7.244 with 6x 802.3ad

SLIDE 22

Sample Results: Scalability

[Two plots. Axes: Aggregate MBps versus K Blocks (16 to 1024), and Aggregate MBps versus Number of Nodes.]

[ VAR : 1 : 1 : 1,0 : 0,0 ] POS:RW:VAR:500M - SLES10 2.6.16.21-0.8 with 6x Dedicated @ 0% Spatial Utilization

SLIDE 23

Some Possible Future Directions for Bringsel

  • Code refinement, documentation.
  • Tree discovery/tree limit.
  • UPC support.
  • Adding and pruning directories in CF.
  • Selectable horiz/vert barriers.
  • Fault injection.
  • Parser refinements.
  • Modules to support tracing output, either VFS or library level.

  • Better visualization methods (external).
  • Long term, automated style driver (external).

SLIDE 24

Questions?
