
SLIDE 1

Abstract Storage 

Moving file format-specific abstractions into petabyte-scale storage systems

Joe Buck, Noah Watkins, Carlos Maltzahn & Scott Brandt

SLIDE 2

Introduction

Current HPC environment separates computation from storage

– Traditional focus on computation, not I/O
– Applications require I/O architecture independence

Many scientific applications are data intensive
Performance increasingly limited by data movement

SLIDE 3

Diagram courtesy of Rob Ross, Argonne National Laboratory

HPC Architecture 

SLIDE 4

Diagram courtesy of Rob Ross, Argonne National Laboratory

HPC Architecture

HW bottleneck

Current bottleneck in the controllers

SLIDE 5

Diagram courtesy of Rob Ross, Argonne National Laboratory

HPC Architecture

HW bottleneck

Future bottleneck: I/O nodes / storage nodes network

SLIDE 6

Approach: Move functions closer to data

Use spare CPU cycles at intelligent storage nodes

– Replace communication with CPU cycles

Provide storage interfaces with higher abstractions
Enable file system optimizations due to knowledge of data structure
Do this for small selection of data structures

– This is not another object-oriented database!

SLIDE 7

Why Now?

Parallel file systems move more intelligence into storage nodes anyway
Advances in performance management and virtualization
Moving bytes slated to be a dominant cost in exascale systems
Scientific file formats and operators increasingly standard

– NetCDF, HDF

Structured abstractions have seen recent success

– BigTable, MapReduce
– CouchDB

SLIDE 8

Abstract Storage: Storage as an Abstract Data Type

ADT decouples interface from implementation
Only few ADTs necessary, e.g.:

– Dictionary (Key/value pairs)
– Hypercube (Coordinate Systems)
– Queue

Optimize each one for each parallel architecture

– Data placement
– Performance management
– Buffer cache management (incl. pre-fetching)
– Coherence
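To illustrate what "ADT decouples interface from implementation" can mean for the Dictionary case, here is a minimal sketch. The class and method names (`DictionaryADT`, `put`, `get`, `store_readings`) are assumptions made for this example and are not taken from the slides or from Ceph.

```python
import bisect
from abc import ABC, abstractmethod


class DictionaryADT(ABC):
    """Hypothetical key/value interface that client code programs against."""

    @abstractmethod
    def put(self, key, value): ...

    @abstractmethod
    def get(self, key): ...


class HashDictionary(DictionaryADT):
    """One possible implementation: an in-memory hash table."""

    def __init__(self):
        self._table = {}

    def put(self, key, value):
        self._table[key] = value

    def get(self, key):
        return self._table[key]


class SortedListDictionary(DictionaryADT):
    """Another implementation: parallel sorted lists searched with bisect."""

    def __init__(self):
        self._keys = []
        self._values = []

    def put(self, key, value):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = value  # overwrite an existing key
        else:
            self._keys.insert(i, key)
            self._values.insert(i, value)

    def get(self, key):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        raise KeyError(key)


def store_readings(d: DictionaryADT):
    """Client code: works with any implementation of the interface."""
    d.put(23, "b")
    d.put(9, "a")
    return d.get(9), d.get(23)
```

The client function never names a concrete class, so the storage system is free to pick whichever implementation suits a given architecture.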

SLIDE 9

ADTs and Scientific Data

Scientific data is normally multi-dimensional, lending itself well to this approach

– Multi-dimensional and hierarchical structures are readily mapped onto data types

Multiple structures mapped onto (portions of) the same data for more efficient access

– Operate on the appropriate structure (matrix, row, element, etc.)
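As a small illustration of several structures mapped onto the same data, the sketch below exposes element, row, and column access over one shared flat buffer. The `MatrixView` class and the row-major layout are assumptions chosen for the example; the slides do not specify a layout.

```python
class MatrixView:
    """Hypothetical row-major 2-D view over a shared flat buffer."""

    def __init__(self, buffer, rows, cols):
        if len(buffer) != rows * cols:
            raise ValueError("buffer size does not match the given shape")
        self.buffer = buffer
        self.rows = rows
        self.cols = cols

    def element(self, r, c):
        # Row-major addressing: row r starts at offset r * cols.
        return self.buffer[r * self.cols + c]

    def row(self, r):
        start = r * self.cols
        return self.buffer[start:start + self.cols]

    def column(self, c):
        # Every cols-th item, starting at offset c.
        return self.buffer[c::self.cols]


data = list(range(6))          # the underlying data, stored once
matrix = MatrixView(data, 2, 3)
```

Each accessor returns only the structure the caller asked for, which is the property a storage node would need in order to avoid shipping the whole matrix.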

SLIDE 10

Implementation Challenges

Programming model for implementing ADTs
Everything based on byte streams

– Current storage APIs (e.g. POSIX)
– Current file system subsystems

Buffer cache
Striping strategies
Storage node interfaces
Need awareness of structured data

– New interfaces at various storage layers

SLIDE 11

Prototype: Ceph Doodle

Focus: Programming model for implementing ADTs
Construction and test framework for:

– Storage abstractions
– ADT implementations
– Programming models (flexibility, ease-of-use)

Based on object-based parallel file system architecture (e.g. Ceph).

SLIDE 12

Ceph Doodle Features

Rapid prototyping:

– Uses RPC mechanism
– Written in Python

Support for plugins for different ADTs

– Byte stream (implemented as storage objects)
– Dictionary (implemented as skip lists)
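One way plugin support for different ADTs could look in Python is a simple registry, sketched below. This is not Ceph Doodle's actual plugin mechanism, which the slides do not show; `ADT_PLUGINS`, `register_adt`, and `create` are hypothetical names, and the Dictionary plugin uses a plain `dict` as a stand-in rather than a skip list.

```python
ADT_PLUGINS = {}  # hypothetical registry: plugin name -> class


def register_adt(name):
    """Class decorator that records an ADT implementation under a name."""
    def decorator(cls):
        ADT_PLUGINS[name] = cls
        return cls
    return decorator


@register_adt("bytestream")
class ByteStream:
    """Byte-stream ADT backed by a single growable buffer."""

    def __init__(self):
        self.data = bytearray()

    def write(self, offset, payload):
        end = offset + len(payload)
        if len(self.data) < end:
            # Zero-fill any gap between the current end and the new data.
            self.data.extend(b"\x00" * (end - len(self.data)))
        self.data[offset:end] = payload

    def read(self, offset, length):
        return bytes(self.data[offset:offset + length])


@register_adt("dictionary")
class Dictionary:
    """Dictionary ADT; a dict stands in for the skip list here."""

    def __init__(self):
        self.items = {}

    def put(self, key, value):
        self.items[key] = value

    def get(self, key):
        return self.items[key]


def create(name):
    """Instantiate a registered ADT by name."""
    return ADT_PLUGINS[name]()
```

Adding a new ADT then means defining one more decorated class, without touching the code that looks plugins up.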

SLIDE 13

Ceph Doodle Overview

[Diagram: on the client, the Client Application calls ADT-Operation(…) on a Data Type. The Data Type applies a Striping & Caching Strategy and issues RPC_X(Op, ObjID, Context), RPC_Y(Op, ObjID, Context), RPC_Z(Op, ObjID, Context), … Each RPC goes to the OSD with the object, where ADT Operation(Object, Context) runs.]

Clients use application-specific interfaces
Data types are cross-cutting system modules
Mappings route ADT RPCs to storage nodes
Striping and caching are optimized per data type
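The routing idea on this slide can be sketched without a real RPC layer. In the example below a direct method call stands in for the RPC, and the CRC-based mapping from keys to objects and from objects to OSDs is an assumption made for illustration; it is not Ceph's placement algorithm. The names `OSD`, `DictionaryType`, and `stripes` are likewise hypothetical.

```python
import zlib


class OSD:
    """Stand-in for a storage node: holds objects and runs ADT operations."""

    def __init__(self, name):
        self.name = name
        self.objects = {}  # obj_id -> dict holding that object's entries

    def handle(self, op, obj_id, context):
        """Plays the role of the RPC handler: (Op, ObjID, Context)."""
        obj = self.objects.setdefault(obj_id, {})
        if op == "put":
            obj[context["key"]] = context["value"]
            return None
        if op == "get":
            return obj.get(context["key"])
        raise ValueError(f"unknown op {op!r}")


class DictionaryType:
    """Client-side data type that maps each key to an object and an OSD."""

    def __init__(self, osds, stripes=4):
        self.osds = osds
        self.stripes = stripes  # number of objects the dictionary spans

    def _locate(self, key):
        # Assumed mapping: hash the key to an object, the object to an OSD.
        obj_id = f"dict.{zlib.crc32(repr(key).encode()) % self.stripes}"
        osd = self.osds[zlib.crc32(obj_id.encode()) % len(self.osds)]
        return osd, obj_id

    def put(self, key, value):
        osd, obj_id = self._locate(key)
        return osd.handle("put", obj_id, {"key": key, "value": value})

    def get(self, key):
        osd, obj_id = self._locate(key)
        return osd.handle("get", obj_id, {"key": key})
```

The application only sees `put` and `get`; which OSD serves a request is decided inside the data type, which is where a per-type striping strategy would plug in.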

SLIDE 14

Dictionary Implementation: Skip lists

[Figure: skip list with levels 4, 3, 2, 1 and nodes .head, 9, 23, 1024, 1025, .tail]
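For readers unfamiliar with the structure, here is a minimal textbook skip list with four levels, used as a dictionary. It is a generic sketch, not the Ceph Doodle code: the class names are assumptions, `None` plays the role of the `.tail` sentinel, and node heights are chosen by coin flips.

```python
import random


class _Node:
    def __init__(self, key, value, level):
        self.key = key
        self.value = value
        self.forward = [None] * level  # one next-pointer per level


class SkipList:
    MAX_LEVEL = 4  # four levels, as drawn on the slide

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self.head = _Node(None, None, self.MAX_LEVEL)  # the .head sentinel

    def _random_level(self):
        level = 1
        while level < self.MAX_LEVEL and self._rng.random() < 0.5:
            level += 1
        return level

    def _predecessors(self, key):
        """For each level, find the last node whose key is less than key."""
        update = [self.head] * self.MAX_LEVEL
        node = self.head
        for lvl in reversed(range(self.MAX_LEVEL)):
            while node.forward[lvl] is not None and node.forward[lvl].key < key:
                node = node.forward[lvl]
            update[lvl] = node
        return update

    def put(self, key, value):
        update = self._predecessors(key)
        nxt = update[0].forward[0]
        if nxt is not None and nxt.key == key:
            nxt.value = value  # key already present: overwrite
            return
        node = _Node(key, value, self._random_level())
        for lvl in range(len(node.forward)):
            # Splice the new node in after its predecessor at each level.
            node.forward[lvl] = update[lvl].forward[lvl]
            update[lvl].forward[lvl] = node

    def get(self, key):
        nxt = self._predecessors(key)[0].forward[0]
        if nxt is not None and nxt.key == key:
            return nxt.value
        raise KeyError(key)

    def keys(self):
        """Walk level 1, which links every node in key order."""
        out, node = [], self.head.forward[0]
        while node is not None:
            out.append(node.key)
            node = node.forward[0]
        return out
```

Searches start at the top level and drop down a level whenever the next key would overshoot, so most lookups skip over large parts of the bottom-level list.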

SLIDE 15

Splitting skip lists across nodes

[Figure: the same skip list (levels 4, 3, 2, 1; nodes .head, 9, 23, 1024, 1025, .tail) split across nodes]
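The slides do not say how the skip list is split, so the following sketch assumes one plausible scheme: range partitioning by key, with each node holding a contiguous segment. `PartitionedDictionary` and `boundaries` are hypothetical names, and a plain `dict` per node stands in for that node's local skip list segment.

```python
import bisect


class PartitionedDictionary:
    """Range-partitions keys across nodes; each node keeps one segment."""

    def __init__(self, boundaries):
        # boundaries=[100] means: node 0 holds keys < 100,
        # node 1 holds keys >= 100.
        self.boundaries = sorted(boundaries)
        self.segments = [{} for _ in range(len(self.boundaries) + 1)]

    def node_for(self, key):
        """Index of the node responsible for key."""
        return bisect.bisect_right(self.boundaries, key)

    def put(self, key, value):
        self.segments[self.node_for(key)][key] = value

    def get(self, key):
        return self.segments[self.node_for(key)][key]

    def keys(self):
        # Segments are ordered by key range, so concatenating the
        # per-node sorted keys preserves global key order.
        return [k for seg in self.segments for k in sorted(seg)]


split = PartitionedDictionary(boundaries=[100])
for k in [9, 23, 1024, 1025]:
    split.put(k, str(k))
```

With the keys from the slide and a boundary of 100, keys 9 and 23 land on the first node and 1024 and 1025 on the second, while an ordered scan still sees one sorted sequence.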

SLIDE 16

Future Work

Building on top of Ceph

– New dynamically loadable object libraries

Redesigning caching

– Data structure boundary aware vs. pages
– Pre-fetching = access patterns = ADT parameters

Rethinking striping strategies
Unified views supported by virtual ADT layer
Embedding versioning and provenance capturing into file system

SLIDE 17

Thank you

buck@cs.ucsc.edu
