DataMods Programmable File System Services Noah Watkins*, Carlos - - PowerPoint PPT Presentation



slide-1
SLIDE 1

DataMods

Programmable File System Services

Noah Watkins*, Carlos Maltzahn, Scott Brandt UC Santa Cruz, *Inktank Adam Manzanares California State University, Chico

1

slide-2
SLIDE 2

Talk Agenda

  • 1. Middleware and modern IO stacks
  • 2. Services in middleware and parallel file systems
  • 3. Avoid duplicating work with DataMods
  • 4. Case study: Checkpoint/restart

2

slide-3
SLIDE 3

Why DataMods?

  • Applications struggle to scale on POSIX I/O
  • Parallel FS rarely provide other interfaces

– POSIX I/O designed to prevent lock-in

  • Open-source PFS are now available

– Ability to avoid lock-in

  • Can we generalize PFS services to provide new behavior to new users?

3

slide-4
SLIDE 4

Application Middleware

4

  • Complex data models and interfaces
  • Difficult to work directly with simple byte stream
  • Middleware maps the complex onto the simple
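
As a minimal sketch of how middleware maps a complex data model onto a simple byte stream, the Python below translates a 2-D array slice into byte extents. It assumes a row-major array stored with no header; the function name `slice_to_extents` and its parameters are hypothetical and do not come from any library named in the talk.

```python
def slice_to_extents(n_cols, itemsize, row_start, row_stop, col_start, col_stop):
    """Map a 2-D slice of a row-major array onto (byte_offset, byte_length) extents.

    Assumption: the array is stored contiguously from byte 0, with no header.
    """
    extents = []
    for row in range(row_start, row_stop):
        # First byte of the requested columns within this row.
        offset = (row * n_cols + col_start) * itemsize
        length = (col_stop - col_start) * itemsize
        extents.append((offset, length))
    return extents


# Rows 1..2 and columns 1..2 of an array with 4 columns of 8-byte elements.
print(slice_to_extents(n_cols=4, itemsize=8, row_start=1, row_stop=3,
                       col_start=1, col_stop=3))
# prints [(40, 16), (72, 16)]
```

One logical request becomes one byte-range read per row, which is the kind of detail a byte-stream interface cannot see.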
slide-5
SLIDE 5

Middleware Complexity Bloat

5

  • Hadoop and “Big Data” data models

– Ordered key/value pairs stored in file
– Dictionary for random key-oriented access
– Common table abstractions

slide-6
SLIDE 6

Middleware Complexity Bloat

6

  • Scientific data

– Multi-dimensional arrays
– Imaging
– Genomics

slide-7
SLIDE 7

Middleware Complexity Bloat

7

  • IO Middleware

– Low-level data models and I/O optimization
– Transformative I/O avoids POSIX limitations

slide-8
SLIDE 8

Middleware Scalability Challenges

8

  • Scalable storage system
  • Exposes one data model
  • Must find ‘magic’ alignment
slide-9
SLIDE 9

Data Model Modules

  • Plugin new “file” interfaces and behavior
  • Native support; atop existing scalable services

9

[Figure labels: New behavior; Generalized storage services; Pluggable customization (new programmer role)]

slide-10
SLIDE 10

What does middleware do?

  • Metadata Management
  • Data Placement
  • Intelligent Access
  • Asynchronous Services

10

slide-11
SLIDE 11

Middleware: Metadata Management

  • Byte stream layout
  • Data type information
  • Data model attributes
  • Example: Mesh Data Model

– How is the mesh represented?
– What does it represent?

11

Header File

slide-12
SLIDE 12

Middleware: Data Placement

  • Serialization
  • Placement index
  • Physical alignment

– Including the metadata

  • Example: Mesh Data Model

– Vertex lists
– Mesh elements
– Metadata

12

[Figure: file layout with Header, Data, and Meta blocks]

slide-13
SLIDE 13

Middleware: Intelligent Access

  • Data model specific interfaces
  • Rich access methods

– Views, subsetting, filtering

  • Write-time optimizations
  • Locality and data movement

13

[Figure: Array-based Application issues read(array-slice) to the HDF5 Library, above a file layout of Header, Data, and Meta blocks]

slide-14
SLIDE 14

Middleware: Asynchronous Services

  • Workflows

– Regridding

  • Compression
  • Indexing
  • Layout optimization
  • Performed online

14

[Figure: Workflow Driver and HDF5 Library, above a file layout of Header, Data, and Meta blocks]

slide-15
SLIDE 15

Middleware Challenges

  • Inflexible byte stream abstraction
  • Scalability rules are simple

– But middleware is complex

  • Applying ‘magic number’

– Unnatural and difficult to propagate

  • Loss of detail at lower-levels

– Difficult for in-transit / co-located compute

15

slide-16
SLIDE 16

Storage System Services

  • Scalable metadata

– Clustered service
– Scalability invariants

  • Distributed object store

– Local compute resources
– Define new behavior

  • File operations

– POSIX

  • Fault-tolerance

– Scrubbing and replication

16

slide-17
SLIDE 17

DataMods Abstraction

File Manifold

(Metadata and Data Placement)

Typed and Active Storage

Asynchronous Services

17

slide-18
SLIDE 18

DataMods Architecture

  • Generalized file system services
  • Exposed through programming model

18

slide-19
SLIDE 19

File Manifold

  • Metadata management and data placement

– Flexible, custom layouts

  • Extensible interfaces
  • Object namespace managed by manifold
  • Placement rules evaluated by system

19

slide-20
SLIDE 20

Typed and Active Storage

  • Active storage adoption has been slow

– Code injection is scary
– Security and QoS

  • Reading, writing, and checksums are not free
  • Exposing scalable services is tractable

– Well-defined data models support optimization
– Programming model supports data model creation
– Indexing and filtering

20

slide-21
SLIDE 21

Asynchronous Services

  • Re-use of active / typed storage components
  • Temporal relationship to file manifold

– Incremental processing
– After file is closed
– Object update trigger

  • Scheduling

– Exploit idle time
– Integrate with larger ecosystem
– Preempted or aborted

21

slide-22
SLIDE 22

Case Study: PLFS Checkpoint/Restart

  • Long-running simulations need fault-tolerance

– Checkpoint simulation state

  • Simulations run on expensive machines

– Very expensive machines. Really, very expensive.

  • Decrease cost (time) of checkpoint/restart
  • Translation: increase bulk I/O bandwidth

22

slide-23
SLIDE 23

Overview of PLFS

  • Middleware layer

– Transforms I/O pattern

  • IO Pattern: N-1

– Most common

  • IO Pattern: N-N

– File system friendly

  • Convert N-1 into N-N
  • Applications see the same logical file
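
The N-1 to N-N conversion can be sketched in a few lines of Python. This is an in-memory illustration under stated assumptions, not the PLFS implementation: the class `LogStructuredFile` and its index tuple layout are hypothetical, each writer appends to its own log, and an index records where every logical extent landed so that readers still see one logical file.

```python
class LogStructuredFile:
    """Hypothetical in-memory stand-in for one logical file shared by N writers."""

    def __init__(self):
        self.logs = {}   # writer id -> bytearray holding that writer's appended data
        self.index = []  # (logical_offset, length, writer, physical_offset), in write order

    def write(self, writer, logical_offset, data):
        log = self.logs.setdefault(writer, bytearray())
        # Record where this logical extent lands inside the writer's own log.
        self.index.append((logical_offset, len(data), writer, len(log)))
        log.extend(data)  # always an append, regardless of the logical offset

    def read(self, logical_offset, length):
        out = bytearray(length)  # bytes never written read back as zeros
        # Replaying the index in write order lets later writes win on overlap.
        for entry_off, entry_len, writer, phys in self.index:
            lo = max(entry_off, logical_offset)
            hi = min(entry_off + entry_len, logical_offset + length)
            if lo < hi:
                src = phys + (lo - entry_off)
                out[lo - logical_offset:hi - logical_offset] = \
                    self.logs[writer][src:src + (hi - lo)]
        return bytes(out)


# Three writers issue a strided N-1 pattern into one logical file.
f = LogStructuredFile()
for i in range(2):
    for rank in range(3):
        f.write(rank, (i * 3 + rank) * 4, bytes([65 + rank]) * 4)

print(f.read(0, 24))      # prints b'AAAABBBBCCCCAAAABBBBCCCC'
print(bytes(f.logs[0]))   # prints b'AAAAAAAA'
```

Each writer's log is written sequentially, while the index grows by one entry per write; that index growth is the cost the later slides set out to reduce.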

23

slide-24
SLIDE 24

Simplified PLFS I/O Behavior

[Figure: Client 1, Client 2, and Client 3, each labeled "Log-structured" and "Index"]

24

Parallel Log-structured File System

slide-25
SLIDE 25

PLFS Scalability Challenges

  • Index maintenance and volume
  • Optimization above file system

– Compression and reorganization

25

[Figure: timeline (Time axis) showing Application, PLFS, File System, and Optimization Process, with the Compute phase marked]

slide-26
SLIDE 26

Moving Overhead to Storage System

  • Checkpoints are not read immediately (if at all)

– Index maintenance and optimization in storage

26

[Figure: timeline (Time axis) showing Application, PLFS, File System, and Optimization Process, with the Compute phase marked; annotation: Return to compute sooner]

slide-27
SLIDE 27

DataMods Module for PLFS

  • File Manifold

– Logical file view
– Per-process log-structured files
– Index

  • Hierarchical Solution

– Top-level manifold routes to logs
– Inner manifold implements log-structured file
– Automatic namespace management (metadata)
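
The two-level routing can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the talk does not show the manifold API, so the function names `outer_route` and `inner_place`, the object naming scheme, and the fixed `object_size` are all hypothetical.

```python
def outer_route(logical_file, rank):
    """Outer manifold (hypothetical): route a writer to its own per-process log."""
    return f"{logical_file}.log.{rank}"


def inner_place(log_name, append_offset, length, object_size):
    """Inner manifold (hypothetical): stripe an append across fixed-size objects.

    Returns a list of (object_name, offset_in_object, length) pieces.
    """
    pieces = []
    while length > 0:
        obj_index = append_offset // object_size
        obj_offset = append_offset % object_size
        # Take only as much as still fits in the current object.
        take = min(length, object_size - obj_offset)
        pieces.append((f"{log_name}.obj.{obj_index}", obj_offset, take))
        append_offset += take
        length -= take
    return pieces


log = outer_route("ckpt", 2)
print(log)                          # prints ckpt.log.2
print(inner_place(log, 6, 10, 8))
# prints [('ckpt.log.2.obj.0', 6, 2), ('ckpt.log.2.obj.1', 0, 8)]
```

Because the object names are derived from the logical file name and the writer's rank, the namespace needs no bookkeeping by the application, which is one way to read "automatic namespace management".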

27

slide-28
SLIDE 28

PLFS Outer File Manifold

28

Logical top-half file is not materialized

slide-29
SLIDE 29

PLFS Outer File Manifold

29

Logical top-half file is not materialized
Routes to per-process log file

slide-30
SLIDE 30

PLFS Inner File Manifold

30

Logical top-half file is not materialized
Routes to per-process log file
Append striping within object namespace

slide-31
SLIDE 31

PLFS Inner File Manifold

31

Logical top-half file is not materialized
Routes to per-process log file
Append striping within object namespace
Index-enabled objects record logical-to-phy

slide-32
SLIDE 32

PLFS Inner File Manifold

32

Logical top-half file is not materialized
Routes to per-process log file
Append striping within object namespace
Index-enabled objects record logical-to-phy
Interface to index maintenance routines

slide-33
SLIDE 33

Active and Typed Objects

  • Append-only object
  • Automatic indexing
  • Managed layout
  • Built on existing services
  • Logical view at lowest level
  • Index maintenance interface
slide-34
SLIDE 34

Offline Index Optimization

  • Extreme index fragmentation (per-object)
  • Exploit opportunities for optimization

– Storage system idle time
– Re-use of analysis I/O
– Piggy-backed on scrubbing / healing

  • Index Compression

– Merging contiguous entries
– Pattern discovery and replacement
– Consolidation

34

slide-35
SLIDE 35

Offline Index Optimization

  • Three-stage pipeline

– Incremental compression and consolidation

  • Incremental compression
  • 1. Merging physically contiguous entries (in PLFS)
  • Not subject to buffer size limits
  • Applied technique to 92 PLFS indexes published by LANL
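
A minimal sketch of the merging step, in Python. It assumes each index entry is a `(logical_offset, length, physical_offset)` tuple for a single log and that two neighboring entries may be merged only when they are contiguous both logically and physically; the tuple layout and the name `merge_contiguous` are illustrative and are not taken from PLFS.

```python
def merge_contiguous(entries):
    """Merge neighboring index entries that continue each other.

    entries: list of (logical_offset, length, physical_offset) in log order.
    """
    merged = []
    for logical, length, physical in entries:
        if merged:
            last_logical, last_length, last_physical = merged[-1]
            # The new entry must start exactly where the previous one ends,
            # in the logical file and in the log.
            if (last_logical + last_length == logical
                    and last_physical + last_length == physical):
                merged[-1] = (last_logical, last_length + length, last_physical)
                continue
        merged.append((logical, length, physical))
    return merged


index = [(0, 4, 0), (4, 4, 4), (16, 4, 8), (20, 4, 12)]
print(merge_contiguous(index))
# prints [(0, 8, 0), (16, 8, 8)]
```

The pass is a single linear scan and keeps only one pending entry, so it can run incrementally as entries arrive.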

35

slide-36
SLIDE 36

Merging Reduces PLFS Index Size

[Figure: number of index entries (log scale, 1 to 10,000,000) for each PLFS map file (1 to 91); series: Raw Trace (Baseline), Merge Compress; annotated regions: Contiguous Writes; Large, Strided]

36

slide-37
SLIDE 37

Index Compression: Pattern

  • Compactly represent extents using patterns
  • Example pattern template

– offset + stride * i, low < i < high

  • Fit data to this pattern to reduce index size
  • Linear algorithm; parallel across logs
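
The pattern template can be made concrete with a short Python sketch. It is a greedy, single-pass fit under stated assumptions: extents are `(offset, length)` pairs in log order, a pattern is stored as `(offset, stride, length, count)` with `i` running from `0` to `count - 1` (a simplification of the `low < i < high` bounds on the slide), and the function names are hypothetical.

```python
def compress_patterns(extents):
    """Greedily fit runs of equal-length extents to offset + stride * i.

    extents: list of (offset, length) in log order.
    Returns a list of (offset, stride, length, count); a lone extent
    is kept as a pattern with count 1 and stride 0.
    """
    patterns = []
    i, n = 0, len(extents)
    while i < n:
        offset, length = extents[i]
        count, stride = 1, 0
        if i + 1 < n and extents[i + 1][1] == length:
            # Two equal-length extents always define a candidate stride.
            stride = extents[i + 1][0] - offset
            count = 2
            # Extend the run while the next extent lands where the pattern predicts.
            while (i + count < n
                   and extents[i + count][1] == length
                   and extents[i + count][0] == offset + stride * count):
                count += 1
        patterns.append((offset, stride, length, count))
        i += count
    return patterns


def expand(patterns):
    """Inverse of compress_patterns: regenerate the original extents."""
    return [(offset + stride * k, length)
            for offset, stride, length, count in patterns
            for k in range(count)]


extents = [(0, 4), (12, 4), (24, 4), (36, 4), (100, 8)]
patterns = compress_patterns(extents)
print(patterns)                      # prints [(0, 12, 4, 4), (100, 0, 8, 1)]
print(expand(patterns) == extents)   # prints True
```

Each extent is visited once, which matches the linear cost noted above, and each log can be compressed independently of the others.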

37

slide-38
SLIDE 38

Pattern Compression Improves Over Merging

[Figure: number of index entries (log scale, 1 to 10,000,000) for each PLFS map file (1 to 91); series: Raw Trace (Baseline), Merge Compress, Pattern Compress; annotation: Strided pattern identified]

38

slide-39
SLIDE 39

Index Consolidation

  • Combines all logs together (in PLFS)
  • Increases index read efficiency
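
Consolidation can be sketched as folding the per-log indexes into one list sorted by logical offset, so that a read needs a single binary search instead of a scan over every log. The Python below is an illustration only: the tuple layouts and the names `consolidate` and `lookup` are hypothetical, and it assumes extents do not overlap (no overwrites).

```python
import bisect


def consolidate(per_log_indexes):
    """Combine per-log indexes into one list sorted by logical offset.

    per_log_indexes: dict mapping log id -> list of
    (logical_offset, length, physical_offset).
    Returns a list of (logical_offset, length, log_id, physical_offset).
    """
    combined = [
        (logical, length, log_id, physical)
        for log_id, entries in per_log_indexes.items()
        for logical, length, physical in entries
    ]
    combined.sort(key=lambda entry: entry[0])
    return combined


def lookup(combined, offset):
    """Find (log_id, physical_offset) for a logical byte, or None if unwritten."""
    starts = [entry[0] for entry in combined]
    pos = bisect.bisect_right(starts, offset) - 1
    if pos >= 0:
        logical, length, log_id, physical = combined[pos]
        if offset < logical + length:
            return log_id, physical + (offset - logical)
    return None


per_log = {0: [(0, 4, 0), (8, 4, 4)], 1: [(4, 4, 0), (12, 4, 4)]}
combined = consolidate(per_log)
print(combined)              # prints [(0, 4, 0, 0), (4, 4, 1, 0), (8, 4, 0, 4), (12, 4, 1, 4)]
print(lookup(combined, 9))   # prints (0, 5)
print(lookup(combined, 16))  # prints None
```

A real reader would build the `starts` list once rather than on every lookup; it is rebuilt here only to keep the sketch short.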

39

Index Consolidation

Index Pack

slide-40
SLIDE 40

Wrapping Up

  • Implementing new data model plugins

– Hadoop and Visualization
– Refining API with more use cases
– Constructing specification language

  • Thank you to supporters

– DOE funding (DE-SC0005428)
– Gary Grider, John Bent, James Nunez

  • Questions? --- jayhawk@cs.ucsc.edu
  • Poster session

40

slide-41
SLIDE 41

Extra Slides

41

slide-42
SLIDE 42

Index Reduction Improvements

42

[Figure: reduction factor over baseline (0.1 to 1) for each PLFS index trace (PatternIO, LANL_App, chombo, flash, strided, and nonstrided workloads); series: Reduction from Merging, Reduction from Pattern]

Global reduction HDF5 Indexing, Data reorganization