Abstract Storage: Moving file-format-specific abstractions into petabyte-scale storage systems
Joe Buck, Noah Watkins, Carlos Maltzahn & Scott Brandt


SLIDE 1

Abstract Storage: Moving file-format-specific abstractions into petabyte-scale storage systems

Joe Buck, Noah Watkins, Carlos Maltzahn & Scott Brandt



SLIDE 2

Introduction

• The current HPC environment separates computation from storage
  – Traditional focus on computation, not I/O
  – Applications require I/O architecture independence
• Many scientific applications are data intensive
• Performance is increasingly limited by data movement

SLIDE 3

HPC Architecture

Diagram courtesy of Rob Ross, Argonne National Laboratory


SLIDE 4

HPC Architecture

Diagram courtesy of Rob Ross, Argonne National Laboratory

HW bottleneck: the current bottleneck is in the controllers

SLIDE 5

HPC Architecture

Diagram courtesy of Rob Ross, Argonne National Laboratory

HW bottleneck: the future bottleneck is the network between the I/O nodes and the storage nodes

SLIDE 6

Approach: Move functions closer to data

• Use spare CPU cycles at intelligent storage nodes
  – Replace communication with CPU cycles
• Provide storage interfaces with higher abstractions
• Enable file system optimizations due to knowledge of data structure
• Do this for a small selection of data structures
  – This is not another object-oriented database!


SLIDE 7

Why Now?

• Parallel file systems are moving more intelligence into storage nodes anyway
• Advances in performance management and virtualization
• Moving bytes is slated to be a dominant cost in exascale systems
• Scientific file formats and operators are increasingly standard
  – NetCDF, HDF
• Structured abstractions have seen recent success
  – BigTable, MapReduce
  – CouchDB

SLIDE 8

Abstract Storage: Storage as an Abstract Data Type

• An ADT decouples interface from implementation
• Only a few ADTs are necessary, e.g.:
  – Dictionary (key/value pairs)
  – Hypercube (coordinate systems)
  – Queue
• Optimize each one for each parallel architecture
  – Data placement
  – Performance management
  – Buffer cache management (incl. pre-fetching)
  – Coherence

SLIDE 9

ADTs and Scientific Data

• Scientific data is normally multi-dimensional, lending itself well to this approach
  – Multi-dimensional and hierarchical structures are readily mapped onto data types
• Multiple structures mapped onto (portions of) the same data for more efficient access
  – Operate on the appropriate structure (matrix, row, element, etc.)


SLIDE 10

Implementation Challenges

• Programming model for implementing ADTs
• Everything is based on byte streams
  – Current storage APIs (e.g. POSIX)
  – Current file system subsystems
    • Buffer cache
    • Striping strategies
    • Storage node interfaces
• Need awareness of structured data
  – New interfaces at various storage layers

SLIDE 11

Prototype: Ceph Doodle

• Focus: a programming model for implementing ADTs
• Construction and test framework for:
  – Storage abstractions
  – ADT implementations
  – Programming models (flexibility, ease of use)
• Based on an object-based parallel file system architecture (e.g. Ceph)


SLIDE 12

Ceph Doodle Features

• Rapid prototyping:
  – Uses an RPC mechanism
  – Written in Python
• Support for plugins for different ADTs
  – Byte stream (implemented as storage objects)
  – Dictionary (implemented as skip lists)
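A plugin mechanism of this kind might look like the following registry sketch. The registry, decorator, and class names are hypothetical illustrations, not the actual Ceph Doodle plugin API:

```python
# Hypothetical plugin registry: each ADT implementation registers
# under a type name, and the system instantiates it on demand.
ADT_PLUGINS = {}

def register_adt(name):
    def decorator(cls):
        ADT_PLUGINS[name] = cls
        return cls
    return decorator

@register_adt("bytestream")
class ByteStream:
    """Byte-stream ADT backed by a flat buffer (stands in for
    storage objects in this sketch)."""
    def __init__(self):
        self._buf = bytearray()
    def append(self, data):
        self._buf.extend(data)
    def read(self, offset, length):
        return bytes(self._buf[offset:offset + length])

@register_adt("dictionary")
class Dictionary:
    """Dictionary ADT; the prototype backs this with skip lists."""
    def __init__(self):
        self._data = {}
    def put(self, key, value):
        self._data[key] = value
    def get(self, key):
        return self._data.get(key)

def make_adt(name):
    """Look up a registered plugin and instantiate it."""
    return ADT_PLUGINS[name]()
```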


SLIDE 13

Ceph Doodle Overview

[Diagram: a client application issues ADT-Operation(…) calls against a data-type module on the client; the data type translates each call into RPCs — RPC_X(Op, ObjID, Context), RPC_Y(Op, ObjID, Context), RPC_Z(Op, ObjID, Context), … — which are sent to the OSDs, where a striping & caching strategy is applied.]

• Clients use application-specific interfaces
• Data types are cross-cutting system modules
• Mappings route ADT RPCs to storage nodes
• Striping and caching are optimized per data type

SLIDE 14

Dictionary Implementation: Skip Lists

[Diagram: a skip list with four levels (4, 3, 2, 1) over the keys .head, 9, 23, 1024, 1025, .tail]
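The structure in the diagram can be sketched in Python (the prototype's language). This is a minimal illustration of a skip list, not the Ceph Doodle implementation:

```python
import random

class Node:
    def __init__(self, key, height):
        self.key = key
        self.next = [None] * height  # forward pointers, one per level

class SkipList:
    """Minimal skip list: sorted keys under a probabilistic
    multi-level index, so search descends from sparse to dense
    levels in expected O(log n) steps."""

    MAX_HEIGHT = 4

    def __init__(self):
        # Sentinel node, playing the role of .head in the diagram.
        self.head = Node(None, self.MAX_HEIGHT)

    def _random_height(self):
        h = 1
        while h < self.MAX_HEIGHT and random.random() < 0.5:
            h += 1
        return h

    def insert(self, key):
        # Find the rightmost node before `key` on every level.
        update = [self.head] * self.MAX_HEIGHT
        node = self.head
        for level in range(self.MAX_HEIGHT - 1, -1, -1):
            while node.next[level] and node.next[level].key < key:
                node = node.next[level]
            update[level] = node
        # Splice the new node into each of its levels.
        new = Node(key, self._random_height())
        for level in range(len(new.next)):
            new.next[level] = update[level].next[level]
            update[level].next[level] = new

    def contains(self, key):
        node = self.head
        for level in range(self.MAX_HEIGHT - 1, -1, -1):
            while node.next[level] and node.next[level].key < key:
                node = node.next[level]
        node = node.next[0]
        return node is not None and node.key == key
```

Because the bottom level is a plain sorted linked list, a skip list supports range scans as well as point lookups, which suits a dictionary ADT.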

SLIDE 15

Splitting skip lists across nodes

[Diagram: the same four-level skip list over .head, 9, 23, 1024, 1025, .tail, split across storage nodes]
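One way to split a sorted structure like this across storage nodes is by key range, so each node holds one contiguous sub-list. The split points below are assumptions chosen to match the keys in the diagram, not values from the prototype:

```python
import bisect

# Hypothetical key-range partition: split points divide the sorted
# key space among storage nodes, so each node owns one contiguous
# segment of the skip list's bottom level.
SPLIT_POINTS = [100, 1024]  # node 0: keys < 100; node 1: 100..1023; node 2: >= 1024

def node_for_key(key):
    """Return the storage node responsible for a key."""
    return bisect.bisect_right(SPLIT_POINTS, key)
```

Range partitioning keeps neighboring keys on the same node, so range scans touch few nodes; a hash partition would balance load better but scatter scans.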

SLIDE 16

Future Work

• Building on top of Ceph
  – New dynamically loadable object libraries
• Redesigning caching
  – Data-structure-boundary aware vs. pages
  – Pre-fetching = access patterns = ADT parameters
• Rethinking striping strategies
• Unified views supported by a virtual ADT layer
• Embedding versioning and provenance capture into the file system

SLIDE 17

Thank you

buck@cs.ucsc.edu