PADS: Policy Architecture for Distributed Storage Systems
Nalini Belaramani, Jiandan Zheng, Amol Nayate, Robert Soulé, Mike Dahlin and Robert Grimm.
University of Texas at Austin, Amazon Inc., IBM T.J. Watson, New York University
Lots of data
Timeline, 1985 to 2005: NFS, AFS, Coda, Deceit, Ficus, Zebra, Bayou, XFS, OceanStore, Farsite, Pangaea, Ivy, Google File System, Chain Replication, BlueFS, Segank, Ceph, OmniStore, WheelFS, Dynamo, TierStore, WinFS, Cimbiosys
Diagram: System 1 Policy, System 2 Policy, and System 3 Policy over shared PRACTI mechanisms [*]
[*] “PRACTI Replication”, Nalini Belaramani, Mike Dahlin, Lei Gao, Amol Nayate, Arun Venkataramani, Praveen Yalagandula, and Jiandan Zheng. NSDI 2006.
Topology Independence Any Consistency
Bayou Pangaea Coda TRIP* Chain Replication TierStore FCS
Partial Replication
Bayou* Coda* TierStore* TRIP SCS
10 ‐ 100 LOC
When and where to send an update? Who to contact on a local read miss?
Bayou, Coda, Chain Replication, TierStore
Source Node Destination Node
PADS: Routing Policy and Blocking Policy
Triggers: Operation block, Write, Delete, Inval arrived, Send body succ, Send body failed, Subscription start, Subscription caught-up, Subscription end
Actions: Add inval sub, Add body sub, Remove inval sub, Remove body sub, Send body, Assign seq, B_action
[*] “Implementing Declarative Overlays”. Boon Thau Loo, Tyson Condie, Joseph M. Hellerstein, Petros Maniatis, Timothy Roscoe, Ion Stoica. SOSP 2005.
PADS: Routing Policy and Blocking Policy
Read “/foo”
Read Operation Block: “/foo”
Add Subscription: Server to me, “/foo”
addInvalSubscription(@C, S, C, Obj, Catchup) :-
    serverId(@C, S), BPoint == "ReadNowBlock", Catchup := "CP".
Rule anatomy labels: Resulting Action, Triggering event, Table lookup, Conditions, Assignment
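As a rough illustration of how such a rule might be evaluated, the sketch below walks the same steps in Python: a table lookup, a condition, an assignment, and a resulting action. The `OperationBlock` class, the `on_operation_block` function, and the tuple encoding of actions are hypothetical names chosen for this example; they are not part of PADS, and the triggering event is assumed to be a blocked read operation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperationBlock:
    """Hypothetical trigger: a local operation blocked at a blocking point."""
    node: str    # C, the node where the operation blocked
    obj: str     # object identifier, e.g. "/foo"
    bpoint: str  # blocking point name, e.g. "ReadNowBlock"


def on_operation_block(event, server_id_table):
    """Evaluate the rule body and return the list of resulting actions."""
    # Table lookup: serverId(@C, S)
    server = server_id_table.get(event.node)
    if server is None:
        return []  # no matching tuple, so the rule does not fire
    # Condition: BPoint == "ReadNowBlock"
    if event.bpoint != "ReadNowBlock":
        return []
    # Assignment: Catchup := "CP"
    catchup = "CP"
    # Resulting action: addInvalSubscription(@C, S, C, Obj, Catchup)
    return [("addInvalSubscription", event.node, server,
             event.node, event.obj, catchup)]


table = {"client1": "server"}
actions = on_operation_block(
    OperationBlock("client1", "/foo", "ReadNowBlock"), table)
```

Here `actions` holds one subscription request from the server to the client for "/foo"; an event at any other blocking point yields an empty list.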
in0   TRIG readEvent(@X, ObjId) :- EVT initialize(@X), ObjId := "/.parent".
pp0   TBL parent(@X, P) :- RCV parent(@X, P).
pp1   TRIG readAndWatchEvent(@X, ObjId) :- RCV initialize(@X), ObjId := "/.subList".
pSb0  TBL subscription(@X, SS) :- RCV subscription(@X, SS).
pSb1  ACT addInvalSub(@X, P, X, SS, CTP) :- RCV subscription(@X, SS), TBL parent(@X, P), CTP=="LOG".
pSb2  ACT addBodySub(@X, P, X, SS) :- RCV subscription(@X, SS), TBL parent(@X, P).
f1    ACT addInvalSub(@X, P, X, SS, CTP) :- TRIG subEnd(@X, P, X, SS, _, Type), TBL parent(@X, P), Type=="Inval", CTP:="LOG".
f2    ACT addBodySub(@X, P, X, SS) :- TRIG subEnd(@X, P, X, SS, _, Type), TBL parent(@X, P), Type=="Body", CTP:="LOG".
cSb1  ACT addInvalSub(@X, C, X, SS, CTP) :- TRIG subStart(@X, X, C, _, Type), C != P, Type == "Inval", SS := "/*", CTP := "LOG".
cSb2  ACT addBodySub(@X, C, X, SS, CTP) :- TRIG subStart(@X, X, C, _, Type), C != P, Type == "Body", SS := "/*".
dtn1  ACT addInvalSub(@X, R, X, SS, CTP) :- EVT relayNodeArrives(@X, R), TBL subscription(@X, SS), CTP=="LOG".
dtn2  ACT addBodySub(@X, R, X, SS) :- EVT relayNodeArrives(@X, R), TBL subscription(@X, SS), CTP=="LOG".
dtn3  ACT addInvalSub(@X, X, R, SS, CTP) :- EVT relayNodeArrives(@X, R), SS:="/*", CTP=="LOG".
dtn4  ACT addBodySub(@X, X, R, SS) :- EVT relayNodeArrives(@X, R), SS:="/*", CTP=="LOG".
Rule groups: Parent Config, Publications Config, Subscriptions from parent, Subscriptions from child, DTN support
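To make the tree-shaped flow of subscriptions easier to follow, here is a minimal Python sketch of the behavior that rules pp0, pSb1/pSb2, f1/f2, and cSb1/cSb2 appear to describe. The `TreeNodePolicy` class, its method names, and the tuple encoding of actions are assumptions made for illustration, and actions are only recorded, not executed.

```python
class TreeNodePolicy:
    """Hypothetical per-node routing policy for a tree of replicas."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.parent = None   # stands in for TBL parent(@X, P)
        self.actions = []    # emitted actions, recorded instead of executed

    def on_parent(self, parent):
        # pp0: remember the configured parent.
        self.parent = parent

    def on_subscription(self, subset):
        # pSb1/pSb2: pull invalidations and bodies for `subset` from the parent.
        if self.parent is not None:
            self.actions.append(
                ("addInvalSub", self.parent, self.node_id, subset, "LOG"))
            self.actions.append(
                ("addBodySub", self.parent, self.node_id, subset))

    def on_sub_end(self, sender, subset, kind):
        # f1/f2: re-establish a subscription from the parent when it ends.
        if sender != self.parent:
            return
        if kind == "Inval":
            self.actions.append(
                ("addInvalSub", sender, self.node_id, subset, "LOG"))
        elif kind == "Body":
            self.actions.append(("addBodySub", sender, self.node_id, subset))

    def on_sub_start(self, receiver, kind):
        # cSb1/cSb2: when a non-parent node starts receiving from this node,
        # subscribe to everything ("/*") from it in the reverse direction.
        if receiver == self.parent:
            return
        if kind == "Inval":
            self.actions.append(
                ("addInvalSub", receiver, self.node_id, "/*", "LOG"))
        elif kind == "Body":
            self.actions.append(("addBodySub", receiver, self.node_id, "/*"))


node = TreeNodePolicy("X")
node.on_parent("P")
node.on_subscription("/news/*")           # configured subscription set
node.on_sub_end("P", "/news/*", "Inval")  # parent's inval stream dropped
node.on_sub_start("C", "Body")            # child C began receiving bodies
```

The DTN rules (dtn1 to dtn4) follow the same pattern with a relay node in place of the parent or child and are omitted here.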
[*] “TierStore: A Distributed Storage System for Challenged Networks”. M. Demmer, B. Du, and E. Brewer. FAST 2008.
PADS
Read, Write, Update
Is valid, Is causal, Is sequenced, Max staleness, R_Msg
Read at block: Is_causal
Write after block: R_Msg (ackFromServer)
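One way to picture a blocking policy is as a table from blocking points to predicates that must all hold before an operation proceeds. The Python sketch below encodes the two entries above under that reading; `ObjectState`, `BLOCKING_POLICY`, `may_proceed`, and the point names "read_at" and "write_after" are hypothetical and only approximate how PADS might represent them.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectState:
    """Hypothetical local bookkeeping consulted by blocking predicates."""
    causal: bool = False                        # local state causally consistent?
    received: set = field(default_factory=set)  # names of messages received


def is_causal(state):
    # Stands in for the Is_causal predicate.
    return state.causal


def r_msg(name):
    # Stands in for R_Msg(name): true once the named message has arrived.
    def predicate(state):
        return name in state.received
    return predicate


# Blocking point -> predicates that must all be true to unblock.
BLOCKING_POLICY = {
    "read_at": [is_causal],
    "write_after": [r_msg("ackFromServer")],
}


def may_proceed(point, state):
    """Return True when every predicate at this blocking point holds."""
    return all(pred(state) for pred in BLOCKING_POLICY.get(point, []))


state = ObjectState()
blocked_read = may_proceed("read_at", state)   # nothing satisfied yet
state.causal = True
state.received.add("ackFromServer")
```

A blocking point with no listed predicates never blocks in this sketch, which keeps the default behavior permissive.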
                             Pangaea     Bayou    Chain Repl    TierStore   TRIP           Coda           FCS            SCS
Topology                     Ad-Hoc      Ad-Hoc   Chains        Tree        Client/Server  Client/Server  Client/Server  Client/Server
Replication                  Partial     Full     Full          Partial     Full           Partial        Partial        Partial
Consistency                  Mono-Reads  Causal   Linearizable  Mono-Reads  Sequential     Open/Close     Sequential     Sequential
Inval v. update propagation  Update      Update   Update        Update      Inval          Inval          Inval          Inval
Other feature rows (per-system marks not preserved): Demand caching, Prefetching, Cooperative caching, Callbacks, Leases, Disconnected
System        Routing Rules  Blocking Conditions
P-Pangaea     75             1
P-Bayou       9              3
P-TierStore   14             1
P-Coda        31             5
P-TRIP        6              3
P-Chain Rep   75             5
P-FCS         43             6
P-Bayou*      9              3
P-Coda*       44             5
P-TierStore*  29             1
P-TRIP*       6              3
[Evaluation plots; axis labels: Read Latency (ms), Kilobytes transferred, Number of updates, Time (ms)]