A Generic Framework for Testing Parallel File Systems


SLIDE 1

A Generic Framework for Testing Parallel File Systems

Jinrui Cao†, Simeng Wang†, Dong Dai‡, Mai Zheng†, and Yong Chen‡
† Computer Science Department, New Mexico State University
‡ Computer Science Department, Texas Tech University

Presented by Simeng Wang
SC'16 PDSW-DISCS • 11.14.2016
SLIDE 2

Motivation


January 2016 @HPCC: a power outage led to immeasurable data loss

SLIDE 3

Motivation

• Existing methods for testing storage systems are not good enough for large-scale parallel file systems (PFS)
  - Model checking [e.g., EXPLODE @OSDI'06]
    * difficult to build a controllable model for PFS
    * state explosion problem
  - Formal methods [e.g., FSCQ @SOSP'15]
    * challenging to write correct specifications for PFS
  - Automatic testing [e.g., Torturing DB, Crash Consistency @OSDI'14]
    * closely tied to the local storage stack: intrusive for PFS
    * only works for single-node systems

SLIDE 4

Our Contributions

• A generic framework for testing the failure handling of parallel file systems
  - Minimal interference & high portability
    * decouples the PFS from the testing framework through a remote storage protocol (iSCSI)
  - Systematically generates failure events with high fidelity
    * fine-grained, controllable failure emulation
    * emulates realistic failure modes
• An initial prototype for the Lustre file system
  - Uncovers internal I/O behaviors of Lustre under different workloads and failure conditions

SLIDE 5

Outline

• Introduction
• Design
  - Virtual Device Manager
  - Failure State Emulator
  - Data-Intensive Workloads
  - Post-Failure Checker
• Preliminary Experiments
• Conclusion and Future Work

SLIDE 6

Outline

• Introduction
• Design
  - Virtual Device Manager
  - Failure State Emulator
  - Data-Intensive Workloads
  - Post-Failure Checker
• Preliminary Experiments
• Conclusion and Future Work

SLIDE 7

Overview

[Architecture diagram: Lustre nodes (MGS+MGT, MDS+MDT, and multiple OSS+OST pairs) mount virtual devices backed by device files; the Virtual Device Manager, Failure State Emulator, Data-Intensive Workload, and Post-Failure Checker form the testing framework]

MGS: Management Server; MGT: Management Target; MDS: Metadata Server; MDT: Metadata Target; OSS: Object Storage Server; OST: Object Storage Target

SLIDE 8

Overview

[Architecture diagram repeated from SLIDE 7]

SLIDE 9

Virtual Device Manager

• Creates and maintains device files for storage devices
• Mounted to Lustre nodes as virtual devices via iSCSI
• I/O operations are translated into disk I/O commands
• Logs commands into a command history log (a sketch of one record follows)
  - Includes node IDs, command details, and the actual data transferred
  - Used by the Failure State Emulator
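As a concrete illustration, here is a minimal Python sketch of what one entry in such a command history log might look like. The schema below (field names, the WRITE(10) opcode, a SHA-1 digest of the payload) is our assumption for illustration, not the prototype's actual log format:

```python
from dataclasses import dataclass, field
import hashlib
import time

@dataclass
class CommandRecord:
    """One entry in the command history log (hypothetical schema)."""
    node_id: str        # Lustre node the virtual device is mounted on (e.g., "MDS")
    opcode: int         # SCSI opcode, e.g., 0x2A for WRITE(10)
    lba: int            # starting logical block address
    blocks: int         # transfer length in blocks
    data_sha1: str      # digest of the data actually transferred
    timestamp: float = field(default_factory=time.time)

def log_write(history: list, node_id: str, lba: int, data: bytes,
              block_size: int = 512) -> None:
    """Append a WRITE(10) command and a digest of its payload to the log."""
    history.append(CommandRecord(
        node_id=node_id,
        opcode=0x2A,
        lba=lba,
        blocks=len(data) // block_size,
        data_sha1=hashlib.sha1(data).hexdigest(),
    ))
```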

SLIDE 10

Overview

[Architecture diagram repeated from SLIDE 7]

SLIDE 11

Failure State Emulator

• Generates failure events in a systematic and controllable way
  - Manipulates I/O commands and emulates the failure state of each individual device
  - Emulates four realistic failure modes based on previous studies [e.g., FAST'13, OSDI'14, TOCS'16, FAST'16]


1. Whole Device Failure: the device becomes invisible to the host
2. Clean Termination of Writes: emulates the simplest power outage
3. Reordering of the Writes: commits writes in an order different from the issuing order
4. Corruption of the Device Block: changes the content of writes

(Each mode is sketched in code below.)
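A minimal Python sketch of how each mode could be emulated over a logged write sequence. This is our illustration of the idea, not the prototype's implementation; `writes` is assumed to be the list of (lba, data) pairs recorded in the command history log:

```python
import random

def whole_device_failure(device) -> None:
    """Mode 1: the device disappears from the host; all subsequent
    commands fail. `device` is a hypothetical handle with a flag."""
    device.visible = False

def clean_termination(writes: list, cut: int) -> list:
    """Mode 2 (simplest power outage): only a prefix of the issued
    writes is committed, in issuing order."""
    return writes[:cut]

def reordered_termination(writes: list, cut: int) -> list:
    """Mode 3: the same prefix is committed, but possibly in an order
    different from the issuing order."""
    committed = list(writes[:cut])
    random.shuffle(committed)
    return committed

def block_corruption(data: bytes, nbits: int = 1) -> bytes:
    """Mode 4: change the content of a committed write by flipping bits."""
    buf = bytearray(data)
    for _ in range(nbits):
        i = random.randrange(len(buf))
        buf[i] ^= 1 << random.randrange(8)
    return bytes(buf)
```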

SLIDE 12

Overview

[Architecture diagram repeated from SLIDE 7]

SLIDE 13

Co-design Workloads and Checkers

• Data-Intensive Workloads
  - Stress Lustre and generate I/O operations to age the system and bring it to a state that may be difficult to recover from
  - May use existing data-intensive workloads
  - May include self-identification/verification information (see the sketch below)
• Post-Failure Checkers
  - Examine the post-failure behavior and check whether the system can recover without data loss
  - May use existing checkers (e.g., LFSCK for Lustre)
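For the self-verification idea, a workload can tag every block it writes with the block's own identity and a checksum of its payload, so the checker can later tell exactly which blocks were lost, misplaced, or corrupted. A minimal Python sketch; the 4 KB block size and header layout are our assumptions:

```python
import hashlib
import os
import struct

BLOCK = 4096
HEADER = struct.Struct("<Q20s")   # block number + SHA-1 digest of the payload

def write_self_verifying(path: str, nblocks: int) -> None:
    """Write blocks that each carry their own ID and payload checksum."""
    with open(path, "wb") as f:
        for i in range(nblocks):
            payload = os.urandom(BLOCK - HEADER.size)
            f.write(HEADER.pack(i, hashlib.sha1(payload).digest()) + payload)

def check_self_verifying(path: str) -> list:
    """Post-failure: return indices of truncated, misplaced, or corrupted blocks."""
    bad, i = [], 0
    with open(path, "rb") as f:
        while chunk := f.read(BLOCK):
            if len(chunk) < BLOCK:          # truncated tail (e.g., torn write)
                bad.append(i)
                break
            block_id, digest = HEADER.unpack_from(chunk)
            payload = chunk[HEADER.size:]
            if block_id != i or hashlib.sha1(payload).digest() != digest:
                bad.append(i)
            i += 1
    return bad
```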

SLIDE 14

Outline

• Introduction
• Design
  - Virtual Device Manager
  - Failure State Emulator
  - Data-Intensive Workloads
  - Post-Failure Checker
• Preliminary Experiments
• Conclusion and Future Work

SLIDE 15

Preliminary Experiment

• Experiment setup
  - Cluster of seven VMs, installed with CentOS 7
  - Lustre file system (version 2.8) on five VMs
    * one MGS/MGT node, one MDS/MDT node, and three OSS/OST nodes
  - Sixth VM hosts the Virtual Device Manager and the Failure State Emulator
    * the Virtual Device Manager is built on top of the Linux SCSI target framework (see the sketch below)
  - Last VM is used as the client for launching the workloads and LFSCK
    * runs the Data-Intensive Workload and the Post-Failure Checker
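For reference, exporting a device file over iSCSI with the Linux SCSI target framework (tgt) boils down to a few tgtadm calls. A minimal Python wrapper; the target ID, IQN, and backing-file path below are made up for illustration:

```python
import subprocess

def export_backing_file(tid: int, iqn: str, backing_file: str) -> None:
    """Export a device file as an iSCSI target via tgtadm (tgt's admin tool)."""
    def tgtadm(*args: str) -> None:
        subprocess.run(["tgtadm", "--lld", "iscsi", *args], check=True)

    tgtadm("--op", "new", "--mode", "target", "--tid", str(tid), "-T", iqn)
    tgtadm("--op", "new", "--mode", "logicalunit",
           "--tid", str(tid), "--lun", "1", "-b", backing_file)
    tgtadm("--op", "bind", "--mode", "target", "--tid", str(tid), "-I", "ALL")

# Hypothetical usage: one backing file per Lustre target (e.g., an OST)
# export_backing_file(1, "iqn.2016-11.test:ost1", "/var/lib/pfs-test/ost1.img")
```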

SLIDE 16

Preliminary Experiment

• Workloads
  - Normal workloads run on Lustre:

    Workload       Description
    Montage/m101   astronomical image mosaic engine
    cp             copy a file into Lustre
    tar            decompress a file on Lustre
    rm             delete a file from Lustre

  - Post-failure workloads run on Lustre:

    Operation      Description
    lfs setstripe  set striping pattern
    dd-nosync      create & extend a Lustre file
    dd-sync        create & extend a Lustre file
    LFSCK          check & repair Lustre

SLIDE 17

Preliminary Results

• Internal Pattern of Writes without Failure
  - Number of bytes (MB) written to different Lustre nodes under different workloads
  - Montage/m101 is split into twelve steps (i.e., s1 to s12) to show the fine-grained write pattern

[Table omitted: bytes written (MB) per Lustre node (MGS/MGT, MDS/MDT, OSS/OST#1–#3) for cp, tar, rm, and Montage/m101 steps s1–s12]

SLIDE 18

Preliminary Results

• Internal Pattern of Writes without Failure
  - Accumulated number of bytes (KB) written to different nodes during the workloads

SLIDE 19

Preliminary Results

• Post-Failure Behavior
  - Emulate a whole device failure on the MDS/MDT node
  - Run operations on Lustre after the emulated device failure

    Operation      Description                    Report Error?
    lfs setstripe  set striping pattern           No
    dd-nosync      create & extend a Lustre file  No
    dd-sync        create & extend a Lustre file  Yes
    LFSCK          check & repair Lustre          No

  - dd-nosync means using dd to create and extend a Lustre file
  - dd-sync means enforcing synchronous writes on the dd command (see the sketch below)
  - The last column shows whether the operation reported an error
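That dd-sync alone reports an error is what synchronous I/O would suggest: forcing writes to stable storage surfaces the failure at the client, while buffered writes can be absorbed by the cache. A minimal Python analogue of the two modes (the path is hypothetical; the experiments used dd itself):

```python
import os

PATH = "/mnt/lustre/testfile"   # hypothetical Lustre mount point
DATA = b"x" * (1 << 20)         # 1 MB of payload

# dd-nosync analogue: buffered write; data may only reach the page
# cache, so a failed backing device need not surface an error here.
fd = os.open(PATH, os.O_WRONLY | os.O_CREAT, 0o644)
os.write(fd, DATA)
os.close(fd)

# dd-sync analogue: O_SYNC forces each write to stable storage, so a
# failed MDT/OST shows up as an I/O error at write time.
fd = os.open(PATH, os.O_WRONLY | os.O_CREAT | os.O_SYNC, 0o644)
os.write(fd, DATA)
os.close(fd)
```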

SLIDE 20

Outline

• Introduction
• Design
  - Virtual Device Manager
  - Failure State Emulator
  - Data-Intensive Workloads
  - Post-Failure Checker
• Preliminary Experiments
• Conclusion and Future Work

SLIDE 21

Conclusion and Future Work

• Proposed and prototyped a generic framework for testing the failure handling of large-scale parallel file systems
• Uncovered internal I/O behaviors of Lustre under different workloads, in both normal and failure conditions
• Future work:
  - More effective post-failure checking operations
  - More file systems (e.g., PVFS, Ceph)
  - Novel mechanisms to enhance the resilience of large-scale parallel file systems
