SLIDE 1

Atomicity

Bailu Ding Oct 18, 2012

Bailu Ding Atomicity Oct 18, 2012 1 / 38

SLIDE 2

Outline

1 Introduction
2 State Machine
3 Sinfonia
4 Dangers of Replication

SLIDE 3

Introduction

Introduction

Implementing Fault-Tolerant Services Using the State Machine Approach
Sinfonia: A New Paradigm for Building Scalable Distributed Systems
The Dangers of Replication and a Solution

SLIDE 4

State Machine

Outline

1 Introduction
2 State Machine
3 Sinfonia
4 Dangers of Replication

SLIDE 5

State Machine

State Machine

Server: state variables and commands (example: a memory, with reads and writes as its commands)

The outputs of a state machine are completely determined by the sequence of requests it processes

Client: issues requests and receives outputs
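The server description above can be sketched as a tiny state machine in Python (a minimal illustration, not code from the paper; the class and request format are my own):

```python
# A memory state machine: the state variables are memory cells, the
# commands are reads and writes, and the outputs are fully determined
# by the sequence of requests processed.
class MemoryStateMachine:
    def __init__(self):
        self.memory = {}  # state variables

    def apply(self, request):
        op, *args = request
        if op == "write":
            loc, value = args
            self.memory[loc] = value
            return "ok"
        elif op == "read":
            (loc,) = args
            return self.memory.get(loc)

# Two replicas fed the same request sequence produce identical outputs.
requests = [("write", "x", 1), ("read", "x"), ("write", "x", 2), ("read", "x")]
r1, r2 = MemoryStateMachine(), MemoryStateMachine()
out1 = [r1.apply(r) for r in requests]
out2 = [r2.apply(r) for r in requests]
assert out1 == out2 == ["ok", 1, "ok", 2]
```

This determinism is what makes replication work: any replica that processes the same requests in the same order ends in the same state.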

SLIDE 6

State Machine

Causality

Requests issued by a single client to a given state machine are processed in the order they were issued
If a request r made to state machine sm caused a request r′ to be made to sm, then sm processes r before r′

SLIDE 7

State Machine

Fault Tolerance

Byzantine failures: faulty components can behave arbitrarily
Fail-stop failures: faulty components halt, and the failure is detectable
t fault tolerant: the system satisfies its specification as long as no more than t components fail

SLIDE 8

State Machine

Fault-Tolerant State Machine

Replicate the state machine to make it t fault tolerant
Byzantine failures: 2t + 1 replicas
Fail-stop failures: t + 1 replicas

SLIDE 9

State Machine

Replica Coordination

Requirements
Agreement: every replica receives the same sequence of requests
Order: every replica processes the requests in the same relative order

SLIDE 10

State Machine

Agreement

Transmitter: disseminates a value to the other processors
All nonfaulty processors agree on the same value
If the transmitter is nonfaulty, then all nonfaulty processors use its value as the one on which they agree

SLIDE 11

State Machine

Order

Each request has a unique identifier
The state machine processes requests in the order of their unique identifiers
Stable: no request with a lower unique identifier can still arrive
Challenges
Unique identifier assignment that satisfies causality
A stability test

SLIDE 12

State Machine

Order Implementation

Logical Clocks
Each event e has a timestamp T(e)
Each processor p has a counter T(p)
Each message sent by p carries the timestamp T(p)
T(p) is updated when sending or receiving a message
This assignment satisfies causality
Stability test for fail-stop failures:
Sending a request r to processor p ensures T(p) > T(r)
A request r is stable once T(p) > T(r) for all processors p
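The logical-clock rules above can be sketched as follows (an illustrative Lamport-clock fragment; the `Processor` class and its method names are my own, not the paper's):

```python
# Each processor keeps a counter, stamps outgoing messages with it,
# and advances past any timestamp it receives, preserving causality.
class Processor:
    def __init__(self):
        self.clock = 0

    def send(self):
        self.clock += 1          # tick before sending
        return self.clock        # timestamp carried by the message

    def receive(self, timestamp):
        # jump past the sender's timestamp so T(p) > T(r) afterward
        self.clock = max(self.clock, timestamp) + 1

p, q = Processor(), Processor()
t1 = p.send()        # p issues request r with timestamp T(r) = t1
q.receive(t1)        # now T(q) > T(r)
t2 = q.send()        # a request caused by r gets a larger timestamp
assert t2 > t1       # causality: r is ordered before r'
```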

SLIDE 13

State Machine

Order Implementation

Synchronized Real-Time Clocks
Clocks are approximately synchronized; real time is used as the timestamp
Causality is satisfied if:
No client makes two or more requests between successive clock ticks
The degree of clock synchronization is better than the minimum message delivery time
Stability test I: wait for a bounded delay delta after the request's timestamp
Stability test II: a request is stable once a larger identifier has been received from every client

SLIDE 14

State Machine

Order Implementation

Replica-Generated Identifiers
Two phases:
State machine replicas propose candidate unique identifiers
One of the candidates is selected
Communication among all processors is not necessary
Requirements for the stability test:
The selected candidate is the maximum of all the candidates
A candidate proposed by a replica is larger than the unique identifier of any accepted request
Causality: a client waits until all replicas have accepted its previous request before issuing the next one
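A rough simulation of the two-phase identifier scheme (the class, and the rid/n fraction used to make candidates from different replicas distinct, are my own illustration, not the paper's protocol details):

```python
# Each replica proposes a candidate larger than any identifier it has
# seen or accepted; the maximum of the candidates is selected.
class Replica:
    def __init__(self, rid, n_replicas):
        self.rid = rid
        self.n = n_replicas
        self.seen = 0.0   # largest candidate or accepted id seen so far

    def propose(self):
        # candidate exceeds every previously accepted identifier; the
        # rid/n fraction keeps candidates from different replicas unique
        candidate = int(self.seen) + 1 + self.rid / self.n
        self.seen = candidate
        return candidate

    def accept(self, uid):
        self.seen = max(self.seen, uid)

replicas = [Replica(i, 3) for i in range(3)]
candidates = [r.propose() for r in replicas]   # phase 1: propose
uid = max(candidates)                          # phase 2: select maximum
for r in replicas:
    r.accept(uid)
# Every later candidate exceeds the accepted identifier, so an accepted
# request is stable once all replicas have accepted it.
assert all(r.propose() > uid for r in replicas)
```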

SLIDE 15

State Machine

Faulty Clients

Replicate the client
Challenges
Replicated clients may issue requests with different unique identifiers
Replicated clients may issue requests with different content

SLIDE 16

State Machine

Reconfiguration

Remove faulty state machine Add new state machine

SLIDE 17

Sinfonia

Outline

1 Introduction
2 State Machine
3 Sinfonia
4 Dangers of Replication

SLIDE 18

Sinfonia

Sinfonia

Two-Phase Commit
Sinfonia

SLIDE 20

Sinfonia

Two Phase Commit

Problem
All participants in a distributed transaction must atomically commit or abort the transaction
Challenge
A transaction can commit its updates on one participant, but a second participant can fail before the transaction commits there. When the failed participant recovers, it must still be able to commit the transaction.

SLIDE 21

Sinfonia

Two Phase Commit

Idea
Each participant must durably store its portion of the updates before the transaction commits anywhere.
Prepare (voting) phase: the coordinator sends updates to all participants, and each participant votes
Commit phase: the coordinator sends commit requests to all participants
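The idea can be sketched as follows (a minimal, in-memory illustration; real 2PC adds durable logs on the coordinator and a recovery protocol, and all names here are my own):

```python
# Two-phase commit: the coordinator commits only if every participant
# durably prepares; a single "no" vote aborts the transaction.
class Participant:
    def __init__(self, fail_prepare=False):
        self.fail_prepare = fail_prepare
        self.log = []            # stands in for stable storage
        self.state = None

    def prepare(self, updates):
        if self.fail_prepare:
            return False         # vote "no"
        self.log.append(("prepared", updates))  # durably store updates first
        return True              # vote "yes"

    def commit(self):
        self.log.append("commit")
        self.state = "committed"

    def abort(self):
        self.log.append("abort")
        self.state = "aborted"

def two_phase_commit(updates, participants):
    # Phase 1 (prepare/voting): send updates, collect votes.
    votes = [p.prepare(updates) for p in participants]
    # Phase 2 (commit): commit only on a unanimous "yes".
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"

ok = two_phase_commit({"x": 1}, [Participant(), Participant()])
bad = two_phase_commit({"x": 1}, [Participant(), Participant(fail_prepare=True)])
assert (ok, bad) == ("committed", "aborted")
```

Because each participant logs its updates before voting yes, a participant that crashes after preparing can replay the log and still commit on recovery.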

SLIDE 22

Sinfonia

Motivation

Problem
Data centers are growing quickly
Distributed applications need to scale well
Current protocols are often too complex
Idea
A new building block

SLIDE 23

Sinfonia

Scope

System within a data center

Network latency is low Nodes can fail Stable storage can fail

Infrastructure applications

Fault-tolerant and consistent Cluster file systems, distributed lock managers, group communication services, distributed name services

SLIDE 26

Sinfonia

Approach

Idea
What can we squeeze out of 2PC?
Observation
For a pre-defined read set, an entire transaction can be piggybacked on 2PC.
Solution
Minitransaction: compare-read-write

SLIDE 28

Sinfonia

Minitransaction

Minitransaction
Compare items, read items, write items
Prepare phase: check the compare items
Commit phase: if all comparisons succeed, return the read items and update the write items; otherwise, abort
Applications
Compare-and-swap
Atomic reads of multiple items
Acquiring multiple leases
The Sinfonia file system
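The compare-read-write shape can be sketched against a single in-memory store (a simplification of my own; in Sinfonia the items live on multiple memory nodes and the comparison runs inside 2PC's prepare phase):

```python
# Minitransaction: compare items guard the transaction, read items are
# returned, write items are applied atomically on success.
def minitransaction(store, compare_items, read_items, write_items):
    # Prepare phase: evaluate all comparisons.
    if all(store.get(loc) == expected for loc, expected in compare_items):
        # Commit phase: return the reads, then apply the writes.
        result = {loc: store.get(loc) for loc in read_items}
        store.update(write_items)
        return True, result
    return False, None  # any failed comparison aborts

store = {"lock": 0, "data": "old"}
# Compare-and-swap: take the lock only if it is free, reading "data".
ok, reads = minitransaction(store,
                            compare_items=[("lock", 0)],
                            read_items=["data"],
                            write_items={"lock": 1, "data": "new"})
assert ok and reads == {"data": "old"} and store["lock"] == 1
# A second attempt fails the comparison and aborts.
ok2, _ = minitransaction(store, [("lock", 0)], [], {"lock": 1})
assert not ok2
```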

SLIDE 29

Sinfonia

Architecture

SLIDE 30

Sinfonia

Fault Tolerance

Failure modes: application node crash, memory node crash, stable storage crash
Mechanisms: disk images, logging, replication, backup

SLIDE 31

Dangers of Replication

Outline

1 Introduction
2 State Machine
3 Sinfonia
4 Dangers of Replication

SLIDE 32

Dangers of Replication

Contribution

Dangers of Replication
A ten-fold increase in nodes and traffic gives a thousand-fold increase in deadlocks or reconciliations.
Solution
Two-tier replication algorithm
Commutative transactions

SLIDE 33

Dangers of Replication

Existing Replication Algorithms

Replication Propagation
Eager replication: updates are propagated as part of the transaction
Lazy replication: updates are propagated as separate, later transactions
Replication Regulation
Group: update anywhere
Master: update the primary copy

SLIDE 34

Dangers of Replication

Analytic Model

Parameters
Number of nodes (Nodes)
Number of transactions per second per node (TPS)
Number of items updated per transaction (Actions)
Duration of each action (Action_Time)
Database size (DB_Size)
Assumption: replication is serial

SLIDE 35

Dangers of Replication

Analysis of Eager Replication

Single Node
Concurrent transactions: Transactions = TPS × Actions × Action_Time
Resources locked by the concurrent transactions (on average): Transactions × Actions / 2
Fraction of the database locked: Transactions × Actions / 2 / DB_Size
Probability of a wait per transaction: PW = 1 − (1 − Transactions × Actions / 2 / DB_Size)^Actions ≈ Transactions × Actions^2 / 2 / DB_Size
Probability of a deadlock per transaction: PD ≈ PW^2 / Transactions = TPS × Action_Time × Actions^5 / 4 / DB_Size^2
Deadlock rate per transaction: DR = PD / (Actions × Action_Time) ≈ TPS × Actions^4 / 4 / DB_Size^2
Deadlock rate per node: DT = DR × Transactions = TPS^2 × Actions^5 × Action_Time / 4 / DB_Size^2
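Plugging illustrative numbers of my own (not the paper's) into the single-node formulas shows how steeply the deadlock rate grows with transaction size:

```python
# DT = TPS^2 * Actions^5 * Action_Time / (4 * DB_Size^2)
def deadlock_rate_per_node(tps, actions, action_time, db_size):
    return tps**2 * actions**5 * action_time / (4 * db_size**2)

base = deadlock_rate_per_node(tps=100, actions=10, action_time=0.01, db_size=10**6)
doubled = deadlock_rate_per_node(tps=100, actions=20, action_time=0.01, db_size=10**6)
# The rate grows as Actions^5: doubling transaction size multiplies
# the deadlock rate by 2^5 = 32.
assert abs(doubled / base - 32) < 1e-9
```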

SLIDE 36

Dangers of Replication

Analysis of Eager Replication

Multiple Nodes
Transaction duration: Actions × Nodes × Action_Time
Concurrent transactions: Transactions_m = TPS × Actions × Action_Time × Nodes^2
Probability of a wait per transaction: PW_m ≈ PW × Nodes^2
Probability of a deadlock per transaction: PD_m ≈ PW_m^2 / Transactions_m = PD × Nodes^2
Deadlock rate per transaction: DR_m ≈ DR × Nodes
Total deadlock rate: DT_m ≈ DT × Nodes^3
If the database grows linearly with Nodes (unlikely): DT × Nodes
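The ten-fold/thousand-fold claim earlier in the deck follows directly from DT_m ≈ DT × Nodes^3; a quick numeric check (a sketch with the scaling factor only, since the constants cancel in the ratio):

```python
# Total deadlock rate under eager group replication: DT_m ≈ DT * Nodes^3
def total_deadlock_rate(single_node_rate, nodes):
    return single_node_rate * nodes**3

# A ten-fold increase in nodes gives a thousand-fold increase in deadlocks.
ratio = total_deadlock_rate(1.0, 10) / total_deadlock_rate(1.0, 1)
assert ratio == 1000.0
```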

SLIDE 37

Dangers of Replication

Analysis of Eager Replication

Master
Updates are serialized at the master
No deadlocks if each transaction updates a single replica
Deadlocks can occur with multiple masters

SLIDE 38

Dangers of Replication

Lazy Replication

Lazy Group Replication
No waits or deadlocks, but reconciliations
Reconciliation rate: TPS^2 × Action_Time × (Actions × Nodes)^3 / 2 / DB_Size
Lazy Master Replication
The reconciliation rate grows quadratically with Nodes

SLIDE 39

Dangers of Replication

Sinfonia Revisit

Analysis of Scalability
Number of application nodes: App_Nodes
Number of memory nodes: Mem_Nodes
Total TPS: TPS′ = TPS × App_Nodes
Total DB size: DB_Size′ = DB_Size × Mem_Nodes
Single app/memory node: Rate = TPS^2 × Action_Time × Actions^5 / 4 / DB_Size^2
Multiple app/memory nodes: Rate′ = TPS′^2 × Action_Time × Actions^5 / 4 / DB_Size′^2 = (App_Nodes / Mem_Nodes)^2 × Rate
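The conclusion can be sketched numerically (the helper function is my own, assuming the Rate′ formula above): contention depends only on the ratio of application nodes to memory nodes, so growing both in proportion keeps the rate flat.

```python
# Rate' = (App_Nodes / Mem_Nodes)^2 * Rate
def sinfonia_rate(rate_single, app_nodes, mem_nodes):
    return (app_nodes / mem_nodes) ** 2 * rate_single

# Scaling app and memory nodes together leaves the rate unchanged...
assert sinfonia_rate(2.5, 10, 10) == 2.5
# ...while adding only application nodes grows it quadratically.
assert sinfonia_rate(2.5, 10, 1) == 250.0
```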

SLIDE 40

Dangers of Replication

Sinfonia Revisit

Analysis

SLIDE 41

Dangers of Replication

Sinfonia Revisit

Analysis

SLIDE 42

Dangers of Replication

Discussion

Parallel Eager Replication
Transaction duration: Actions × Action_Time
Concurrent transactions: Transactions_p = TPS × Actions × Action_Time × Nodes
Probability of a wait per transaction: PW_p ≈ PW × Nodes
Probability of a deadlock per transaction: PD_p ≈ PW_p^2 / Transactions_p = PD × Nodes
Deadlock rate per transaction: DR_p ≈ DR
Total deadlock rate: DT_p ≈ DT × Nodes
If the database grows linearly with Nodes: DT / Nodes
Any problem?

SLIDE 43

Dangers of Replication

Discussion

Fault Tolerance
Logging vs. replication?
Ordering
Timestamping in recent systems, e.g., Percolator?
