DataMods: Programmable File System Services
Noah Watkins*, Carlos Maltzahn, Scott Brandt (UC Santa Cruz, *Inktank)
Adam Manzanares (California State University, Chico)

Talk Agenda
1. Middleware and modern IO stacks
2. Services in middleware and parallel file systems
3. Avoid duplicating work with DataMods
4. Case study: Checkpoint/restart

Why DataMods?
• Applications struggle to scale on POSIX I/O
• Parallel file systems rarely provide other interfaces
  – POSIX I/O designed to prevent lock-in
• Open-source parallel file systems are now available
  – Ability to avoid lock-in
• Can we generalize PFS services to provide new behavior to new users?

Application Middleware
• Complex data models and interfaces
• Difficult to work directly with a simple byte stream
• Middleware maps the complex onto the simple

Middleware Complexity Bloat
• Hadoop and “Big Data” data models
  – Ordered key/value pairs stored in file
  – Dictionary for random key-oriented access
  – Common table abstractions

Middleware Complexity Bloat
• Scientific data
  – Multi-dimensional arrays
  – Imaging
  – Genomics

Middleware Complexity Bloat
• IO Middleware
  – Low-level data models and I/O optimization
  – Transformative I/O avoids POSIX limitations

Middleware Scalability Challenges
• Scalable storage system
• Exposes one data model
• Must find ‘magic’ alignment

Data Model Modules
• Plug in new “file” interfaces and behavior
• Native support; atop existing scalable services
[Figure labels: New behavior; Generalized storage services; Pluggable customization (new programmer role)]

What does middleware do?
• Metadata Management
• Data Placement
• Intelligent Access
• Asynchronous Services

Middleware: Metadata Management
• Byte stream layout
• Data type information
• Data model attributes
• Example: Mesh Data Model
  – How is the mesh represented?
  – What does it represent?
[Figure: a file with a header]

Middleware: Data Placement
• Serialization
• Placement index
• Physical alignment
  – Including the metadata
• Example: Mesh Data Model
  – Vertex lists
  – Mesh elements
  – Metadata
[Figure: file layout with a header followed by alternating Meta and Data regions]

Middleware: Intelligent Access
• Data model specific interfaces
• Rich access methods
  – Views, subsetting, filtering
• Write-time optimizations
• Locality and data movement
[Figure: an array-based application calls read(array-slice) through the HDF5 library onto a file of Header, Meta, and Data regions]

Middleware: Asynchronous Services
• Workflows
  – Regridding
• Compression
• Indexing
• Layout optimization
• Performed online
[Figure: HDF5 library and workflow driver operating on a file of Header, Meta, and Data regions]

Middleware Challenges
• Inflexible byte stream abstraction
• Scalability rules are simple
  – But middleware is complex
• Applying ‘magic number’
  – Unnatural and difficult to propagate
• Loss of detail at lower levels
  – Difficult for in-transit / co-located compute

Storage System Services
• Scalable metadata
  – Clustered service
  – Scalability invariants
• Distributed object store
  – Local compute resources
  – Define new behavior
• File operations
  – POSIX
• Fault-tolerance
  – Scrubbing and replication

DataMods Abstraction
• File Manifold (Metadata and Data Placement)
• Typed and Active Storage
• Asynchronous Services

DataMods Architecture
• Generalized file system services
• Exposed through programming model

File Manifold
• Metadata management and data placement
  – Flexible, custom layouts
• Extensible interfaces
• Object namespace managed by manifold
• Placement rules evaluated by system

Typed and Active Storage
• Active storage adoption has been slow
  – Code injection is scary
  – Security and QoS
• Reading, writing, and checksums are not free
• Exposing scalable services is tractable
  – Well-defined data models support optimization
  – Programming model supports data model creation
  – Indexing and filtering

Asynchronous Services
• Re-use of active / typed storage components
• Temporal relationship to file manifold
  – Incremental processing
  – After file is closed
  – Object update trigger
• Scheduling
  – Exploit idle time
  – Integrate with larger ecosystem
  – Preempted or aborted

Case Study: PLFS Checkpoint/Restart
• Long-running simulations need fault-tolerance
  – Checkpoint simulation state
• Simulations run on expensive machines
  – Very expensive machines. Really, very expensive.
• Decrease cost (time) of checkpoint/restart
• Translation: increase bulk I/O bandwidth

Overview of PLFS
• Middleware layer
  – Transforms I/O pattern
• IO Pattern: N-1
  – Most common
• IO Pattern: N-N
  – File system friendly
• Convert N-1 into N-N
• Applications see the same logical file
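The N-1 to N-N conversion can be illustrated with a toy model. This is a minimal sketch, not PLFS code: the class name `ToyLogFile`, its in-memory logs, and the sequence counter used to order overlapping writes are all assumptions made for illustration. Each writer appends to its own log and records where each logical extent landed, while readers still see one logical file.

```python
from collections import defaultdict


class ToyLogFile:
    """Hypothetical stand-in for an N-1 file stored as N per-writer logs."""

    def __init__(self):
        self.logs = defaultdict(bytearray)  # writer id -> append-only log
        self.index = defaultdict(list)      # writer id -> index entries
        self.seq = 0                        # global write order (assumption)

    def write(self, writer, logical_off, data):
        # Append to the writer's own log; remember the logical placement.
        log = self.logs[writer]
        self.index[writer].append((logical_off, len(data), len(log), self.seq))
        self.seq += 1
        log.extend(data)

    def read(self, logical_off, length):
        # Rebuild the logical view by replaying every index entry in order.
        out = bytearray(length)  # unwritten holes read back as zero bytes
        entries = sorted(
            ((off, n, phys, seq, w)
             for w, es in self.index.items()
             for off, n, phys, seq in es),
            key=lambda e: e[3],
        )
        for off, n, phys, _seq, w in entries:
            lo = max(off, logical_off)
            hi = min(off + n, logical_off + length)
            if lo < hi:
                src = self.logs[w][phys + (lo - off):phys + (hi - off)]
                out[lo - logical_off:hi - logical_off] = src
        return bytes(out)


f = ToyLogFile()
for i in range(6):
    # Three writers interleave 4-byte blocks into one logical file (N-1).
    f.write(i % 3, i * 4, bytes([65 + i]) * 4)
```

Writer 0 ends up with a compact log holding its two blocks back to back, while `f.read(0, 24)` returns the blocks in logical order. The price is the index, which grows by one entry per write.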

Simplified PLFS I/O Behavior
[Figure: Clients 1, 2, and 3 write through the Parallel Log-structured File System; each client has its own log-structured file and index]

PLFS Scalability Challenges
• Index maintenance and volume
• Optimization above file system
  – Compression and reorganization
[Figure: timeline of Compute, Application, PLFS, File System, and Optimization Process]

Moving Overhead to Storage System
• Checkpoints are not read immediately (if at all)
  – Index maintenance and optimization in storage
[Figure: timeline of Compute, Application, PLFS, File System, and Optimization Process; annotation: Return to compute sooner]

DataMods Module for PLFS
• File Manifold
  – Logical file view
  – Per-process log-structured files
  – Index
• Hierarchical Solution
  – Top-level manifold routes to logs
  – Inner manifold implements log-structured file
  – Automatic namespace management (metadata)

PLFS Outer File Manifold
• Logical top-half file is not materialized
• Routes to per-process log file

PLFS Inner File Manifold
• Logical top-half file is not materialized
• Routes to per-process log file
• Append striping within object namespace
• Index-enabled objects record logical-to-phy
• Interface to index maintenance routines
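The two-level routing can be sketched as a pair of pure functions. This is an illustration under stated assumptions, not the DataMods API: the function names `outer_route` and `inner_place`, the object naming scheme, and the 4 MiB object size are all hypothetical.

```python
OBJECT_SIZE = 4 * 1024 * 1024  # assumed stripe unit; not given in the talk


def outer_route(file_name, rank):
    """Outer manifold (hypothetical): route a writer to its own log file."""
    return f"{file_name}.log.{rank}"


def inner_place(log_name, log_offset, object_size=OBJECT_SIZE):
    """Inner manifold (hypothetical): stripe appended bytes across objects.

    Returns the object name and the offset inside that object.
    """
    obj_no, obj_off = divmod(log_offset, object_size)
    return f"{log_name}.{obj_no:08d}", obj_off


log = outer_route("ckpt", 3)
placement = inner_place(log, OBJECT_SIZE + 10)
```

Because both rules are deterministic functions of names and offsets, the system can evaluate them itself, and the logical top-level file never has to be materialized.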

Active and Typed Objects
• Append-only object
• Automatic indexing
• Managed layout
• Built on existing services
• Logical view at lowest level
• Index maintenance interface

Offline Index Optimization
• Extreme index fragmentation (per-object)
• Exploit opportunities for optimization
  – Storage system idle time
  – Re-use of analysis I/O
  – Piggy-backed on scrubbing / healing
• Index Compression
  – Merging contiguous entries
  – Pattern discovery and replacement
  – Consolidation

Offline Index Optimization
• Three stage pipeline
  – Incremental compression and consolidation
• Incremental compression
  1. Merging physically contiguous entries (in PLFS)
     – Not subject to buffer size limits
• Applied technique to 92 PLFS indexes published by LANL
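The merging stage can be sketched in a few lines. This is a minimal sketch assuming each index entry is a `(logical_offset, length, physical_offset)` tuple listed in log order; that layout is an assumption for illustration, not the PLFS on-disk format. Two neighbors are merged only when they are contiguous both logically and physically, so the merged entry describes exactly the same bytes.

```python
def merge_contiguous(entries):
    """Merge neighboring index entries that form one contiguous extent."""
    merged = []
    for logical, length, physical in entries:
        if merged:
            p_logical, p_length, p_physical = merged[-1]
            if (p_logical + p_length == logical
                    and p_physical + p_length == physical):
                # Extend the previous extent instead of adding a new entry.
                merged[-1] = (p_logical, p_length + length, p_physical)
                continue
        merged.append((logical, length, physical))
    return merged


sequential = [(0, 4, 0), (4, 4, 4), (8, 4, 8), (100, 4, 12)]
strided = [(0, 4, 0), (12, 4, 4), (24, 4, 8)]
merged_sequential = merge_contiguous(sequential)
merged_strided = merge_contiguous(strided)
```

Sequential writes collapse into a single extent, whereas the strided list is returned unchanged.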

Merging Reduces PLFS Index Size
[Figure: number of index entries (log scale, 1 to 10,000,000) for each PLFS map file (1 to 91). Series: Raw Trace (Baseline), Merge Compress. Annotations: "Large, Strided" and "Contiguous Writes".]

Index Compression: Pattern
• Compactly represent extents using patterns
• Example pattern template
  – offset + stride * i, low < i < high
• Fit data to this pattern to reduce index size
• Linear algorithm; parallel across logs
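A single-pass fit to the stride template might look like the sketch below. It is an assumption-laden illustration rather than the authors' algorithm: the `Pattern` record, the `(logical_offset, length, physical_offset)` entry layout, and the use of `0 <= i < count` in place of the slide's `low < i < high` bounds are all choices made here. A run is extended only while entry lengths match, the log stays physically contiguous, and the logical step equals the stride.

```python
from dataclasses import dataclass


@dataclass
class Pattern:
    offset: int    # logical offset of the first extent
    stride: int    # logical distance between consecutive extents
    count: int     # number of extents covered (i = 0 .. count - 1)
    length: int    # bytes per extent
    physical: int  # physical offset of the first extent in the log


def compress_strided(entries):
    """Greedy linear scan that folds strided runs into Pattern records."""
    patterns = []
    for logical, length, physical in entries:
        if patterns:
            p = patterns[-1]
            fits_log = (length == p.length
                        and physical == p.physical + p.count * p.length)
            step = logical - (p.offset + (p.count - 1) * p.stride)
            if fits_log and p.count == 1:
                p.stride, p.count = step, 2  # second extent fixes the stride
                continue
            if fits_log and step == p.stride:
                p.count += 1
                continue
        patterns.append(Pattern(logical, 0, 1, length, physical))
    return patterns


def expand(patterns):
    """Inverse of compress_strided, used to check the fit is lossless."""
    return [(p.offset + p.stride * i, p.length, p.physical + p.length * i)
            for p in patterns for i in range(p.count)]


entries = [(i * 12, 4, i * 4) for i in range(100)] + [(5000, 8, 400)]
patterns = compress_strided(entries)
```

Here 100 strided entries collapse into one pattern and the trailing irregular write stays as a single-extent pattern. Each log is scanned independently, which is what allows the work to run in parallel across logs.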

Pattern Compression Improves Over Merging
[Figure: number of index entries (log scale, 1 to 10,000,000) for each PLFS map file (1 to 91). Series: Raw Trace (Baseline), Merge Compress, Pattern Compress. Annotation: "Strided pattern identified".]

Index Consolidation
• Combines all logs together (in PLFS)
• Increases index read efficiency
[Figure labels: Index Consolidation; Index Pack]
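Consolidation can be sketched as a merge of the per-log indexes into one sorted index that supports binary-search lookups. The entry layout, the function names `consolidate` and `lookup`, and the assumption that extents do not overlap are all hypothetical simplifications; overlapping writes would need an ordering rule that this sketch omits.

```python
import bisect
import heapq


def consolidate(per_log_indexes):
    """Merge {log_id: [(logical, length, physical), ...]} into one index.

    Result entries are (logical, length, log_id, physical), sorted by
    logical offset.
    """
    tagged = [
        [(logical, length, log_id, physical)
         for logical, length, physical in sorted(index)]
        for log_id, index in per_log_indexes.items()
    ]
    return list(heapq.merge(*tagged))


def lookup(index, offset):
    """Find which log holds a logical byte; returns (log_id, physical)."""
    i = bisect.bisect_right(index, (offset, float("inf"))) - 1
    if i >= 0:
        logical, length, log_id, physical = index[i]
        if logical <= offset < logical + length:
            return log_id, physical + (offset - logical)
    return None  # offset falls in a hole or past the end


per_log = {
    0: [(0, 4, 0), (12, 4, 4)],
    1: [(4, 4, 0), (16, 4, 4)],
    2: [(8, 4, 0), (20, 4, 4)],
}
combined = consolidate(per_log)
```

With a single sorted index, a reader resolves an offset with one binary search instead of opening and scanning every per-log index, which is the read efficiency gain the slide refers to.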

Wrapping Up
• Implementing new data model plugins
  – Hadoop and Visualization
  – Refining API with more use cases
  – Constructing specification language
• Thank you to supporters
  – DOE funding (DE-SC0005428), Gary Grider, John Bent, James Nunez
• Questions? jayhawk@cs.ucsc.edu
• Poster session

Extra Slides
