

slide-1
SLIDE 1

Depot

Maciej Smolenski

19 January 2011

slide-2
SLIDE 2

Introduction

2 / 36

Introduction

slide-3
SLIDE 3

Depot

Introduction 3 / 36

Cloud storage system.

  • Cloud storage: in the spirit of S3, Azure, Google Storage.
  • Depot clients do not have to trust that servers operate correctly.
slide-4
SLIDE 4

Untrusted Storage Service Providers

Introduction 4 / 36

  • Software bugs.
  • Misconfigured servers or operator errors.
  • Malicious insiders.
  • Acts of God or Man (e.g. fires).

Providing trust guarantees will help both Clients and Storage Service Providers (SSPs).

slide-5
SLIDE 5

Depot Consistency

Introduction 5 / 36

Depot ensures that the updates observed by correct nodes are consistently ordered under Fork-Join-Causal consistency (FJC). FJC is a slight weakening of causal consistency that can be both safe and live despite faulty nodes.

slide-6
SLIDE 6

Depot Guarantees

Introduction 6 / 36

Depot's protocols are based on FJC update ordering. Depot provides guarantees for:

  • Consistency
  • Availability
  • Durability
  • Staleness
  • Latency

Depot provides these guarantees with low overhead.

slide-7
SLIDE 7

Consistency

7 / 36

Consistency

slide-8
SLIDE 8

Consistency Types

Consistency 8 / 36

  • Sequential Consistency
  • Causal Consistency
  • Fork Consistency
  • Fork Join Consistency
  • Fork Join Causal Consistency
slide-9
SLIDE 9

Sequential Consistency

Consistency 9 / 36

updating (events):

n1: ------------------------<n1_u1>--------
n2: ------------------------<n2_u1>--------
n3: ---<n3_u1>-----------------------------

  • Ordering:

n1,n2,n3: (n3_u1) (n2_u1) (n1_u1)

or

n1,n2,n3: (n3_u1) (n1_u1) (n2_u1)

  • Events are ordered linearly (totally).
  • Conflict resolution: no conflicts.
slide-10
SLIDE 10

Causal Consistency

Consistency 10 / 36

updating (events):

n1: ------------------------<n1_u1>--------
n2: ------------------------<n2_u1>--------
n3: ---<n3_u1>-----------------------------

  • Ordering:

n1: (n3_u1) (n1_u1) (n2_u1)
n2: (n3_u1) (n2_u1) (n1_u1)
n3: (n3_u1) (n1_u1) (n2_u1)

or:

(n3_u1) (n2_u1) (n1_u1)

  • Events are ordered causally.
  • Conflict resolution: merge (application-specific).
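
Causal ordering of this kind is commonly implemented with version vectors. The following is a minimal Python sketch, assuming a dict-of-counters representation; the names (happens_before, concurrent) are illustrative, not Depot's actual API.

def happens_before(vv_a, vv_b):
    """True iff the update carrying vv_a causally precedes vv_b."""
    nodes = set(vv_a) | set(vv_b)
    return (all(vv_a.get(n, 0) <= vv_b.get(n, 0) for n in nodes)
            and any(vv_a.get(n, 0) < vv_b.get(n, 0) for n in nodes))

def concurrent(vv_a, vv_b):
    """Concurrent updates are exactly the ones that need a merge."""
    return not happens_before(vv_a, vv_b) and not happens_before(vv_b, vv_a)

# Vectors matching the timeline above: n3_u1 precedes both later
# updates, while n1_u1 and n2_u1 are concurrent.
n3_u1 = {"n3": 1}
n1_u1 = {"n1": 1, "n3": 1}
n2_u1 = {"n2": 1, "n3": 1}
assert happens_before(n3_u1, n1_u1)
assert concurrent(n1_u1, n2_u1)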
slide-11
SLIDE 11

Fork Consistency

Consistency 11 / 36

Server orders events sequentially (correct server).

updating (events):

c1: -<c1_u1>----------------------------------<c1_u2>-----------------------------------
c2: ---------------------<c2_u1>--------------------------------------<c2_u2>-----------

  • Ordering:

s:  (c1_u1) (c2_u1) (c1_u2) (c2_u2)
c1: (c1_u1) (c2_u1) (c1_u2) (c2_u2)
c2: (c1_u1) (c2_u1) (c1_u2) (c2_u2)

  • Events are ordered totally.
  • Conflict resolution: no conflicts.
slide-12
SLIDE 12

Fork Consistency (Continued)

Consistency 12 / 36

Faulty server (it can't modify data; it can only fork).

updating:

c1: -<c1_u1>----------------------------------<c1_u2>-----------------------------------
c2: --------------------<c2_u1>---------------------------------------<c2_u2>-----------

  • Ordering:

                                 /--------(c1_u2)-------------------------  version for c1
s: -(c1_u1)-------------(c2_u1)-<
                                 \--------------------------------(c2_u2)-  version for c2
c1: -(c1_u1)-------------(c2_u1)---------------(c1_u2)-----------------------------------
c2: -(c1_u1)-------------(c2_u1)---------------------------------------(c2_u2)-----------

  • Server may show a different version to different clients.
  • Clients can be sure that they will be partitioned forever after a fork (server misbehaviour detection).
  • Conflict resolution: detection is enough.
  • Untrusted server (a fork is the only way to lie).
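
Why is a fork detectable? A sketch in Python, assuming a SHA-256 hash-chained history (the name chain_head is illustrative): if the server shows c1 and c2 histories that share a prefix and then diverge, the two chain heads differ, so exchanging them exposes the lie.

import hashlib

def chain_head(updates):
    """Fold a history of updates into a single chained hash."""
    h = b""
    for u in updates:
        h = hashlib.sha256(h + u.encode()).digest()
    return h

common = ["c1_u1", "c2_u1"]
view_c1 = chain_head(common + ["c1_u2"])  # version shown to c1
view_c2 = chain_head(common + ["c2_u2"])  # version shown to c2
assert view_c1 != view_c2  # mismatch = evidence that the server forked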
slide-13
SLIDE 13

Fork Join Consistency

Consistency 13 / 36

Same as Fork Consistency when the server is correct. Fork resolution (join) when server misbehaviour is detected.

updating:

c1: -<c1_u1>----------------------------------<c1_u2>----------------------------------
c2: ---------------------<c2_u1>--------------------------------------<c2_u2>----------

  • Ordering:

                                  /--------(c1_u2)----------------------------------
s: -(c1_u1)--------------(c2_u1)-<
                                  \--------------------------------(c2_u2)----------

fork seen as concurrent updates to different servers:

s1: -(c1_u1)-------------(c2_u1)-------------------------(c1_u2)-------------------------
s2: -(c1_u1)-------------(c2_u1)-------------------------(c2_u2)-------------------------

  • Events after a fork can be seen as concurrent (on many servers).
  • Conflict resolution: merge (application-specific).
slide-14
SLIDE 14

Fork Join Causal Consistency (FJC)

Consistency 14 / 36

  • Fork Join Consistency with causal event ordering.
  • Forks are seen as concurrent operations (conflict resolution when needed).
  • Conflict resolution: merge (application-specific).

Weaker than Sequential Consistency, but still useful for many applications. Provides session guarantees for each client (see the monotonic-reads sketch after the next slide):

  • Monotonic reads
  • Monotonic writes
  • Read-your-writes
  • Writes-follow-reads

Because of the weaker consistency, more can be achieved in other areas:

  • High availability.
slide-15
SLIDE 15

Fork Join Causal Consistency (FJC) (Continued)

Consistency 15 / 36

  • Untrusted servers.
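
For concreteness, a minimal sketch of checking one of the session guarantees above, monotonic reads: within a session, no read may return state older than an earlier read. Version vectors are dicts as in the earlier sketch; the function name is an illustrative assumption.

def monotonic_reads_ok(read_vvs):
    """Each successive read's version vector must not regress."""
    for prev, cur in zip(read_vvs, read_vvs[1:]):
        nodes = set(prev) | set(cur)
        if any(cur.get(n, 0) < prev.get(n, 0) for n in nodes):
            return False
    return True

assert monotonic_reads_ok([{"n1": 1}, {"n1": 2, "n2": 1}])
assert not monotonic_reads_ok([{"n1": 2}, {"n1": 1}])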
slide-16
SLIDE 16

Depot

16 / 36

Depot

slide-17
SLIDE 17

Plan

Depot 17 / 36

Protocol to achieve FJC. Extended protocol to create a storage system with the best guarantees (possible under FJC) for:

  • Trust.
  • Availability.
  • Staleness.
  • Durability.
  • Integrity and authorization.
  • Fault tolerance.
  • Data recovery.

Depot - storage system.

slide-18
SLIDE 18

Architecture

Depot 18 / 36

Nodes

  • Clients.
  • Servers.

All nodes run the same protocol; they are just configured differently. Servers can't issue valid updates (standard cryptographic techniques). Key-value store (GET/PUT interface).

slide-19
SLIDE 19

Node

Depot 19 / 36

Each node maintains a Log and a Checkpoint:

  • Log - current updates.
  • Checkpoint - stable system state.
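
A minimal sketch of this per-node state, assuming a dict-based checkpoint; the class and method names are illustrative, not Depot's actual code.

class Node:
    def __init__(self):
        self.log = []          # recent updates, exchanged during gossip
        self.checkpoint = {}   # stable key -> value state

    def apply(self, key, value):
        """Record an update in the log and fold it into the checkpoint."""
        self.log.append((key, value))
        self.checkpoint[key] = value

n = Node()
n.apply("photo-1", b"...")
assert n.checkpoint["photo-1"] == b"..."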
slide-20
SLIDE 20

Updates Propagation

Depot 20 / 36

Gossiping (Log exchange)

  • Server gossips with other servers regularly (configuration parameter).
  • Client gossips with its primary server regularly (configuration parameter). Client switches to client-client mode when all servers are unavailable.
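
An illustrative gossip step: two nodes exchange the log entries the other has not yet seen. The timers (server/server every second, client/primary server every 5 seconds) are configuration and omitted here.

def gossip(log_a, log_b):
    """Symmetric log exchange; each side appends what it is missing."""
    missing_in_a = [u for u in log_b if u not in log_a]
    missing_in_b = [u for u in log_a if u not in log_b]
    log_a.extend(missing_in_a)
    log_b.extend(missing_in_b)

server_1, server_2 = ["u1", "u2"], ["u3"]
gossip(server_1, server_2)
assert set(server_1) == set(server_2) == {"u1", "u2", "u3"}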

slide-21
SLIDE 21

Update Message

Depot 21 / 36

Extra data:

  • Dependency Version Vector (causal ordering).
  • History hash (ensures that the only possible server misbehaviour is a fork).
  • Signature.
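
A hedged sketch of the shape of such an update message: the dependency version vector (dVV) carries causal ordering, the history hash chains the writer's past updates, and a signature binds everything to the client. Depot uses public-key signatures; the HMAC below is only a self-contained stand-in, and all field names are assumptions.

import hashlib, hmac, json

def make_update(key, value, dep_vv, prev_history_hash, signing_key):
    body = {"key": key, "value": value, "dVV": dep_vv}
    history = hashlib.sha256(
        prev_history_hash + json.dumps(body, sort_keys=True).encode()
    ).hexdigest()
    sig = hmac.new(signing_key, history.encode(), hashlib.sha256).hexdigest()
    return {**body, "history_hash": history, "sig": sig}

u = make_update("photo-1", "v1", {"c1": 1}, b"", b"client-secret")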

slide-22
SLIDE 22

Conflicts Resolution

Depot 22 / 36

Concurrent updates: application-specific merge. Forks are handled the same way as concurrent updates.

  • Application-specific merge.
  • Proof Of Misbehaviour (POM).
  • I-vouch-for-this certificates.
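
A sketch of one possible application-specific merge: concurrent writes to the same key are kept as siblings and merged by the application, here by set union. This particular policy is an assumption; Depot leaves the choice of merge function to the application.

def merge_siblings(values):
    """Merge concurrent values of one key by set union."""
    merged = set()
    for v in values:
        merged |= v
    return merged

# Two concurrent PUTs to the same shopping-cart key:
assert merge_siblings([{"milk"}, {"eggs"}]) == {"milk", "eggs"}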
slide-23
SLIDE 23

Replication

Depot 23 / 36

Storing values:

  • Owner (client) keeps its value.
  • Each value should be replicated on K servers.
  • Receipt (proof that the value is replicated on K servers).

slide-24
SLIDE 24

High Availability

Depot 24 / 36

Writes: always. Reads: when available.

  • Owner's copy (client).
  • Replicated on K servers.
  • Client accepts updates only when associated with a receipt or value.

slide-25
SLIDE 25

Staleness

Depot 25 / 36

Depot guarantees bounded staleness. Each client regularly generates a special update (logical time). When client A sees this special update from client B, it can be sure that it has seen all preceding updates from client B (Fork Consistency). In FC (and in FJC) staleness is separated from consistency (flexibility).
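
A sketch of the resulting staleness check, assuming each client signs such a periodic no-op "beacon" update; the period and threshold below are illustrative parameters, not Depot's actual configuration.

BEACON_PERIOD = 5.0  # seconds between a client's beacons (assumed)

def is_fresh(last_beacon_time, now, max_staleness=3 * BEACON_PERIOD):
    """Treat B's data as fresh while B's beacons keep arriving at A."""
    return now - last_beacon_time <= max_staleness

assert is_fresh(last_beacon_time=100.0, now=110.0)
assert not is_fresh(last_beacon_time=100.0, now=200.0)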

slide-26
SLIDE 26

Eventual Consistency

Depot 26 / 36

Depot guarantees eventual consistency.

  • Safety - successful reads of an object at correct nodes that observe the same set of updates return the same values.
  • Liveness - any update issued or observed by a correct node is eventually observable by all correct nodes.

slide-27
SLIDE 27

Evaluation

27 / 36

Evaluation

slide-28
SLIDE 28

Method

Evaluation 28 / 36

  • 8 clients, 4 servers
  • Server/server gossiping every second
  • Client/primary-server gossiping every 5 seconds
  • Key size: 32 B
  • Value sizes: 3 B, 10 KB, 1 MB
  • Read/write ratios: 0/100, 10/90, 50/50, 90/10, 100/0
  • Non-overlapping key ranges for clients (no conflicts)

slide-29
SLIDE 29

Method (Continued)

Evaluation 29 / 36

Versions compared (Depot only, in different configurations):

  • B(Baseline)
  • B+H(Hash)
  • B+H+S(Sign)
  • B+H+S+St(Store)
  • B+H+S+St(Store)+FJC=Depot

+FJC (fork handling):

  • Extra data (history: storing, checking, hashing, signing, verifying, transferring).

slide-30
SLIDE 30

Latency

Evaluation 30 / 36

Overhead

  • GET: none
  • PUT: extra 40%
slide-31
SLIDE 31

Cost

Evaluation 31 / 36

Factors:

  • CPU
  • Network
  • Storage

Overhead

  • GET: none
  • PUT: extra 40%
slide-32
SLIDE 32

Cost (Continued)

Evaluation 32 / 36

slide-33
SLIDE 33

Fault: Cloud Disaster

Evaluation 33 / 36

300 seconds into the experiment, all servers were stopped. Clients switched to client-client mode. Factors:

  • Latency
  • Staleness
slide-34
SLIDE 34

Fault: Cloud Disaster (Continued)

Evaluation 34 / 36

Latency: better than before (on a GET a client does not gossip with another client, as it would with a server).

slide-35
SLIDE 35

Fault: Cloud Disaster (Continued)

Evaluation 35 / 36

Staleness: worse than before (for the same reason: clients do not gossip with each other on GET, as they would with a server).

slide-36
SLIDE 36

Fault: Fork

Evaluation 36 / 36

Depot handles forks the same way as concurrent updates.

  • No overhead in communication (just POMs and I-vouch-for-this certificates).
  • No overhead in CPU (the system only ever contains correct data, so there is no need to hurry).