OceanStore Status and Directions, ROC/OceanStore Retreat 6/10/02


SLIDE 1

OceanStore Status and Directions

ROC/OceanStore Retreat 6/10/02

John Kubiatowicz University of California at Berkeley

SLIDE 2

OceanStore:2 ROC/OceanStore Jan’02

Everyone’s Data, One Utility

  • Millions of servers, billions of clients ….
  • 1000-YEAR durability (excepting fall of society)
  • Maintains Privacy, Access Control, Authenticity
  • Incrementally Scalable (“Evolvable”)
  • Self Maintaining!
  • Not quite peer-to-peer:
    – Utilizing servers in infrastructure
    – Some computational nodes more equal than others
SLIDE 3

The Path of an OceanStore Update

[Figure: the path of an update through clients, inner-ring servers, multicast trees, and second-tier caches]

SLIDE 4

Big Push: OSDI

  • We analyzed and tuned the write path

    – Many different bottlenecks and bugs found
    – Currently committing data and archiving it at about 3-5 Mb/sec

SLIDE 5

Big Push: OSDI

  • Stabilized basic OceanStore code base
  • Interesting issues:
    – Cryptography in critical path
        • Fragment generation/SHA-1 limiting archival throughput at the moment
        • Signatures are a problem for the inner ring (although Sean will tell you about a cute batching trick)
    – Second tier can shield inner ring
        • Actually shown this with a flash-crowd-like benchmark
    – Berkeley DB has a max limit of approx. 10mb/sec
        • Buffer cache layer can’t meet that
SLIDE 6

OceanStore Goes Global!

  • OceanStore components running “globally”:
    – Australia, Georgia, Washington, Texas, Boston
    – Able to run the Andrew File System benchmark with the inner ring spread throughout the US
    – Interface: NFS on OceanStore
  • Word on the street: it was easy to do
    – The components were debugged locally
    – Easily set up remotely
  • I am currently talking with people in:
    – England, Maryland, Minnesota, ….
    – Intel P2P testbed will give us access to much more

SLIDE 7

Inner Ring

  • Running Byzantine ring from Castro-Liskov
    – Elected “general” serializes requests
  • Proactive threshold signatures
    – Permits the generation of a single signature from the Byzantine agreement process
  • Highly tuned cryptography (in C)
    – Batching of requests yields higher throughput
  • Delayed updates to archive
    – Batches archival ops for somewhat quiet periods
  • Currently getting approximately 5 Mb/sec
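
Why batching raises throughput can be seen with a small cost model. The sketch below is not the inner ring's implementation. It assumes that one fixed signing cost is paid per batch and a small per-update cost is paid for each request in the batch. The function name and all numbers are hypothetical and chosen only for illustration.

```python
def batched_throughput_mbps(batch_size, update_bits, sign_seconds, per_update_seconds):
    """Throughput in Mb/sec when one signature covers a whole batch.

    Hypothetical cost model: each batch pays sign_seconds once,
    plus per_update_seconds for every update it contains.
    """
    total_bits = batch_size * update_bits
    total_seconds = sign_seconds + batch_size * per_update_seconds
    return total_bits / total_seconds / 1e6


if __name__ == "__main__":
    # Illustrative numbers only: 1 kB updates, 50 ms per signature,
    # 1 ms of other work per update.
    for size in (1, 10, 100):
        rate = batched_throughput_mbps(size, 8000, 0.05, 0.001)
        print(size, round(rate, 2))
```

Under this model throughput grows with batch size and approaches update_bits / per_update_seconds, the rate that would be reached if signing were free.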
SLIDE 8

We have Throughput Graphs! (Sean will discuss)

SLIDE 9

Self-Organizing second-tier

  • Have simple algorithms for placing replicas on nodes in the interior
    – Intuition: locality properties of Tapestry help select positions for replicas
    – Tapestry helps associate parents and children to build multicast tree
  • Preliminary results show that this is effective
  • We have tentative writes!
    – Allows local clients to see data quickly

SLIDE 10

Effectiveness of second tier

SLIDE 11

Archival Layer

  • Initial implementation needed lots of tuning
    – Was getting 1 Mb/sec coding throughput
    – Still lots of room to go:
        • A “C” version of fragmentation could get 26 MB/s
        • SHA-1 evaluation expensive
  • Beginnings of online analysis of servers
    – Collection facility similar to web crawler
    – Exploring failure correlations for global web sites
    – Eventually used to help distribute fragments
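
To make the SHA-1 cost concrete, here is a minimal sketch of the hashing side of fragment generation. It assumes each fragment is hashed with SHA-1 so it can be verified later. It splits a block into fixed-size pieces and omits the coding step entirely. The function name and the fragment size are hypothetical and not taken from the OceanStore code.

```python
import hashlib


def make_fragments(block, fragment_size):
    """Split a block into fixed-size pieces and pair each with its SHA-1 digest.

    Simplified illustration: a real archival layer would encode the block
    before hashing; here the pieces are plain slices of the input.
    """
    fragments = []
    for offset in range(0, len(block), fragment_size):
        piece = block[offset:offset + fragment_size]
        digest = hashlib.sha1(piece).hexdigest()  # one SHA-1 evaluation per fragment
        fragments.append((digest, piece))
    return fragments


if __name__ == "__main__":
    block = bytes(range(256)) * 16  # 4 kB of sample data
    for digest, piece in make_fragments(block, 1024):
        print(digest, len(piece))
```

Every fragment costs one pass of SHA-1 over its bytes, so hashing time scales with the total amount of fragment data produced.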

SLIDE 12

New Metric: FBLPY (Fraction of Blocks Lost Per Year)

  • No more discussion of 10^34 years MTTF
  • Easier to understand?
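
One way to get a feel for a per-year loss fraction is the toy model below. It is a sketch under strong assumptions that are not from the slides: a block is stored as n fragments, any m of them suffice to rebuild it, each fragment is lost independently with probability p during a year, and no repair happens within that year. The function name and the parameter values are hypothetical.

```python
from math import comb


def block_loss_probability(n, m, p):
    """Probability that fewer than m of n fragments survive the year.

    Assumes independent fragment loss with probability p and no repair.
    """
    q = 1.0 - p  # probability that a single fragment survives
    return sum(comb(n, k) * q**k * p**(n - k) for k in range(m))


if __name__ == "__main__":
    # Illustrative comparison at the same 2x storage overhead.
    print(block_loss_probability(2, 1, 0.1))    # two full replicas
    print(block_loss_probability(32, 16, 0.1))  # 32 fragments, any 16 rebuild
```

Under these assumptions the result can be read directly as the expected fraction of blocks lost in a year, which is the kind of number the metric reports.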
SLIDE 13

Basic Tapestry Mesh

Incremental suffix-based routing

[Figure: Tapestry routing mesh. Nodes carry hexadecimal NodeIDs (0x43FE, 0x13FE, 0xABFE, 0x1290, 0x239E, 0x73FE, 0x423E, 0x79FE, 0x23FE, 0x73FF, 0x555E, 0x035E, 0x44FE, 0x9990, 0xF990, 0x993E, 0x04FE); links are labeled with routing levels 1 to 4.]
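
Incremental suffix-based routing can be sketched as follows: each hop moves to a node whose ID shares at least one more trailing digit with the destination than the current node does. This is a simplified illustration, not Tapestry's implementation. It assumes every node can see the full node list, where a real node keeps a per-level neighbor table, and it breaks ties alphabetically, where a real node would prefer nearby neighbors. The NodeIDs are the ones shown on the slide; the function names are hypothetical.

```python
from typing import List

# NodeIDs from the slide, without the 0x prefix.
NODES = ["43FE", "13FE", "ABFE", "1290", "239E", "73FE", "423E", "79FE", "23FE",
         "73FF", "555E", "035E", "44FE", "9990", "F990", "993E", "04FE"]


def shared_suffix_len(a: str, b: str) -> int:
    """Count how many trailing digits two IDs have in common."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


def route(source: str, dest: str, nodes: List[str]) -> List[str]:
    """Return the hops from source to dest, matching more of the suffix each hop."""
    path = [source]
    current = source
    while current != dest:
        have = shared_suffix_len(current, dest)
        better = [n for n in nodes if shared_suffix_len(n, dest) > have]
        if not better:
            break  # no node matches a longer suffix; routing stops here
        # Take the smallest improvement so the suffix grows incrementally.
        current = min(better, key=lambda n: (shared_suffix_len(n, dest), n))
        path.append(current)
    return path


if __name__ == "__main__":
    print(route("1290", "43FE", NODES))
```

With four-digit IDs a route needs at most four hops, since every hop fixes at least one more digit of the destination.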

SLIDE 14

Dynamic Adaptation in Tapestry

  • New algorithms for nearest-neighbor acquisition [SPAA ’02]
  • Massive parallel inserts with objects staying continuously available [SPAA ’02]
  • Deletes (voluntary and involuntary) [SPAA ’02]
  • Hierarchical object search for mobility [MOBICOM submission]
  • Continuous adjustment of neighbor links to adapt to failure [ICNP]
  • Hierarchical routing (Brocade) [IPTPS’01]
SLIDE 15

Reality: Web Caching through OceanStore

SLIDE 16

Other Apps

  • This summer: Email through OceanStore
    – IMAP and POP proxies
    – Let normal mail clients access mailboxes in OS
  • Palm Pilot synchronization
    – Palm database as an OceanStore DB
  • Better file system support
    – Windows IFS (Really!)

SLIDE 17

Summer Work

  • Big push to get privacy aspects of OceanStore up and running
  • Big push for more apps
  • Big push for Introspective computing aspects
    – Continuous adaptation of network
    – Replica placement
    – Management/Recovery
    – Continuous Archival Repair
  • Big push for stability
    – Getting stable OceanStore running continuously
    – Over big distances
    – …

SLIDE 18

For more info:

  • OceanStore vision paper for ASPLOS 2000:
    “OceanStore: An Architecture for Global-Scale Persistent Storage”
  • OceanStore paper on Maintenance (IEEE IC):
    “Maintenance-Free Global Data Storage”
  • SPAA paper on dynamic integration:
    “Distributed Object Location in a Dynamic Network”
  • Both available on OceanStore web site:
    http://oceanstore.cs.berkeley.edu/