PASS: Provenance-Aware Storage System

SLIDE 1

PASS: Provenance-Aware Storage System

Margo Seltzer, David Holland, Kiran-Kumar Muniswamy-Reddy, Uri Braun, and Jonathan Ledlie

Harvard University

SLIDE 2

What is Provenance?

  What did the President know and when did he know it?
  What the President knew – data
  When he knew it – provenance
  Provenance is metadata about the history of an object

Systems Research At Harvard

SLIDE 3

What is Provenance? (contd)

  For computer objects, provenance is the complete history or lineage of an object
  On what is this object based?
  How was this object created?
  How can it be re-created?

SLIDE 4

Example

Provenance of C

  Input Files A, B
  Application P
  Command line Args
  Environment
  Processor type, OS, etc.

[Figure: application P reads input files A and B and writes file C]
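
The items listed for the provenance of C can be pictured as one record per output object. The following is only a minimal sketch in Python: the class name `ProvenanceRecord`, its field names, and all placeholder values (the flag, the environment variable, the platform strings) are assumptions made for illustration and do not come from the PASS prototype.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ProvenanceRecord:
    """Hypothetical record of how one object was produced."""
    output: str                                   # the object being described
    inputs: List[str]                             # files that were read
    application: str                              # program that did the write
    args: List[str] = field(default_factory=list)             # command line args
    environment: Dict[str, str] = field(default_factory=dict)  # environment
    platform: Dict[str, str] = field(default_factory=dict)     # processor, OS, etc.


def describe(record: ProvenanceRecord) -> str:
    """Render a one-line, human-readable summary of a record."""
    sources = ", ".join(record.inputs)
    return f"{record.output} was written by {record.application} after reading {sources}"


# The example from the slide: P reads A and B and writes C.
# Everything except the names A, B, C and P is a placeholder.
record_c = ProvenanceRecord(
    output="C",
    inputs=["A", "B"],
    application="P",
    args=["--example-flag"],
    environment={"LANG": "C"},
    platform={"processor": "x86", "os": "Linux"},
)

print(describe(record_c))  # C was written by P after reading A, B
```

Keeping the inputs as an explicit list is what later allows lineage questions ("on what is this object based?") to be answered by following records from one object to the next.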

SLIDE 5

Sample Applications

  Science: how did I (or they) get this result?
  ILM: tweak ILM policies for data belonging to a particular application
  Homeland Security: from what sources did I derive this conclusion?

SLIDE 6

The State of Provenance Today

  Many provenance systems are domain-specific.
  Most provenance is entered manually.
  In many fields, provenance support is simply lacking.

SLIDE 7

Provenance-Aware Storage Systems (PASS)

  Storage systems (e.g., file systems) in which provenance is a first-class entity.
  Provenance:
    is generated and maintained as transparently as possible.
    can be indexed and queried.
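
To make "indexed and queried" concrete, here is a small sketch of one kind of query a provenance index might answer: which objects a file was derived from, and which objects were derived from it. This is an in-memory illustration in Python under stated assumptions; `ProvenanceIndex` and its methods are hypothetical names, the file D is invented for the example, and nothing here reflects how the PASS prototype stores or queries provenance.

```python
from collections import defaultdict


class ProvenanceIndex:
    """Hypothetical in-memory index over 'derived from' relationships."""

    def __init__(self):
        self._parents = defaultdict(set)   # object -> objects it was derived from
        self._children = defaultdict(set)  # object -> objects derived from it

    def add(self, output, inputs):
        """Record that `output` was derived from each object in `inputs`."""
        for source in inputs:
            self._parents[output].add(source)
            self._children[source].add(output)

    @staticmethod
    def _reachable(edges, start):
        """Return every object reachable from `start` by following `edges`."""
        seen = set()
        stack = [start]
        while stack:
            current = stack.pop()
            for neighbor in edges.get(current, ()):
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        return seen

    def ancestors(self, obj):
        """Everything `obj` is based on, directly or indirectly."""
        return self._reachable(self._parents, obj)

    def descendants(self, obj):
        """Everything that was derived from `obj`, directly or indirectly."""
        return self._reachable(self._children, obj)


index = ProvenanceIndex()
index.add("C", ["A", "B"])   # from the example slide: P read A and B, wrote C
index.add("D", ["C"])        # D is a made-up downstream file for illustration

print(sorted(index.ancestors("D")))    # ['A', 'B', 'C']
print(sorted(index.descendants("A")))  # ['C', 'D']
```

Maintaining both directions of the relationship costs extra space but lets the same traversal serve "where did this come from?" and "what depends on this?" queries.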

SLIDE 8

Research Questions:

  Storing provenance: what is the most appropriate way to represent provenance?
  Security: what is the right security model for provenance?
  The wire: how do we implement a distributed PASS?
  Evaluation: how do we evaluate PASS?

SLIDE 9

Research Questions (contd):

  What is the most appropriate query interface?
  Search: can we do better than general-purpose search?
  Pruning: when do you delete provenance (or change history)?

SLIDE 10

PASS Prototype

  Linux 2.4.29, RedHat 7.3
  In-kernel transactional data store
    Port of Berkeley DB into the kernel
    Provided by SUNY Stony Brook
  Provenance And Storage Layer: PASTA
    Stacked file system
    Constructed using FiST

SLIDE 11

PASS Architecture

[Figure: PASS architecture. USER: User process. KERNEL: Syscall Layer, VFS Layer, Pasta, Native FS, KBDB, Collector. Flow labels: Intercept Syscalls, Data, Provenance.]
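
The architecture figure names a collector fed by intercepted system calls. As a rough user-level illustration of that idea (not the in-kernel implementation, whose details the slides do not give), the sketch below turns a stream of read and write events into provenance records. The `Collector` class, the event fields, and the rule "a written file depends on everything the process has read so far" are all assumptions chosen for simplicity.

```python
from collections import defaultdict


class Collector:
    """Hypothetical collector that derives provenance from read/write events."""

    def __init__(self):
        self._reads = defaultdict(list)  # pid -> files read so far, in order
        self.records = []                # provenance records emitted on writes

    def on_syscall(self, pid, program, op, path):
        """Handle one intercepted event; ops other than read/write are ignored."""
        if op == "read":
            if path not in self._reads[pid]:
                self._reads[pid].append(path)
        elif op == "write":
            # Assumed rule: the output depends on every file this process
            # has read up to the moment of the write.
            self.records.append({
                "output": path,
                "application": program,
                "inputs": list(self._reads[pid]),
            })


collector = Collector()
events = [
    (1, "P", "read", "A"),
    (1, "P", "read", "B"),
    (1, "P", "write", "C"),
]
for pid, program, op, path in events:
    collector.on_syscall(pid, program, op, path)

print(collector.records)
# [{'output': 'C', 'application': 'P', 'inputs': ['A', 'B']}]
```

Because the records fall out of ordinary reads and writes, the application P needs no changes, which is one way to read the goal of generating provenance as transparently as possible.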

SLIDE 12

Questions?

Contact: pass@eecs.harvard.edu
www.eecs.harvard.edu/syrah/pass
Prototype Available in January
Thanks to our Sponsors: