Flease - Lease Coordination Without a Lock Server

SLIDE 1

File and Metadata Replication in XtreemFS · Björn Kolbeck 1

Flease - Lease Coordination Without a Lock Server

Björn Kolbeck, Mikael Högqvist, Jan Stender, Felix Hupfeld*

Zuse Institute Berlin, *Google Switzerland GmbH

SLIDE 2

Problem: Data Replication

– Data replication with strong consistency

– Apply updates in the same order

~ total order broadcast

Destination Agreement: (Multi)Paxos

Fixed Sequencer: Primary/Backup

SLIDE 3

Data Replication: Primary/Backup

– "Easy" to implement

Single process takes all decisions

Widely used: Google GFS, many RDBMS (Oracle, DB2, MySQL)

– Primary is a single point of failure (SPOF)

Primary role must be revoked when the process has failed or is disconnected

➔ Leases for Primary election

Lease: Exclusive access for limited period of time

Exclusive access = primary role

Timeout = revocation
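The lease definition on this slide can be sketched in a few lines. This is only an illustration: the names `Lease`, `is_valid` and `is_primary` are hypothetical and do not come from the talk, and time is passed in explicitly instead of being read from a clock.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Lease:
    owner: str         # process that currently holds the primary role
    expires_at: float  # point in time at which the lease times out

    def is_valid(self, now: float) -> bool:
        # Exclusive access is granted only until the timeout.
        return now < self.expires_at


def is_primary(lease: Optional[Lease], me: str, now: float) -> bool:
    # A process acts as primary only while it owns an unexpired lease.
    # Once the timeout passes, the role is revoked without any message.
    return lease is not None and lease.owner == me and lease.is_valid(now)


lease = Lease(owner="replica-1", expires_at=15.0)
```

With this lease, `is_primary(lease, "replica-1", 10.0)` holds, while the same call at time `15.0` or for `"replica-2"` does not.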

SLIDE 4

Outline

1. Distributed Lease Coordination
2. The Flease Algorithm
3. Decentralized Lease Coordination
4. Evaluation

SLIDE 5

Distributed Lease Coordination

– Lease = exclusive access

– Lease Invariant:

At most one valid lease at any point in time.

– Distributed System

Many processes concurrently trying to get a lease

All processes must agree on the same lease

– Distributed Consensus (?)

(Multi)Paxos

SLIDE 6

Distributed Lease Coordination: Agreement

– Agreement (Consensus):

If process p decides v then all processes will decide v.

– Agreement (Leases):

If process p decides l then all processes will decide l until l has timed out.

– Leases have a timeout.

We don't care about leases that have timed out

SLIDE 7

Deconstructing Paxos: Round Based Register

– Round-based register

Atomic read-modify-write

read(version)

write(version, new value)

– Register on each process

– Majority-based (Quorum Intersection Property)

[Figure: read and write(X) on the registers of processes 1, 2, 3]
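A round-based register of this kind can be simulated in memory as follows. This is a minimal sketch, not the authors' implementation: the classes `LocalRegister` and `RoundBasedRegister`, the `up` flag that simulates an unreachable process, and the exact rejection rules are assumptions, and message passing is replaced by direct method calls.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple


@dataclass
class LocalRegister:
    """State held by one process (kept in memory in this sketch)."""
    read_round: int = 0
    write_round: int = 0
    value: Any = None
    up: bool = True  # False simulates a crashed or unreachable process

    def on_read(self, k: int) -> Optional[Tuple[int, Any]]:
        # Reject rounds that are not newer than everything seen so far.
        if not self.up or k <= self.read_round or k <= self.write_round:
            return None
        self.read_round = k
        return (self.write_round, self.value)

    def on_write(self, k: int, value: Any) -> bool:
        # Reject if a newer round has already read or written here.
        if not self.up or k < self.read_round or k < self.write_round:
            return False
        self.write_round = k
        self.value = value
        return True


class RoundBasedRegister:
    def __init__(self, registers: List[LocalRegister]) -> None:
        self.registers = registers
        self.majority = len(registers) // 2 + 1

    def read(self, k: int) -> Tuple[bool, Any]:
        replies = [r for r in (reg.on_read(k) for reg in self.registers)
                   if r is not None]
        if len(replies) < self.majority:
            return (False, None)  # abort: no majority answered
        # Any two majorities intersect, so the value written in the
        # highest round is among the replies.
        _, value = max(replies, key=lambda reply: reply[0])
        return (True, value)

    def write(self, k: int, value: Any) -> bool:
        acks = sum(reg.on_write(k, value) for reg in self.registers)
        return acks >= self.majority


regs = [LocalRegister() for _ in range(3)]
rbr = RoundBasedRegister(regs)
ok, value = rbr.read(1)        # register is still empty
committed = rbr.write(1, "X")
regs[0].up = False             # one of three processes becomes unreachable
ok2, value2 = rbr.read(2)      # the remaining majority still knows "X"
```

The last read succeeds with only two of three processes because its quorum overlaps with the quorum of the earlier write.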

SLIDE 8

Paxos vs. Flease

– Consensus with RBR

value = read(version)
IF value = empty THEN
    value := proposed value
END IF
IF write(version, value) THEN
    "decide" value
END IF

– Lease Agreement with RBR

lease = read(version)
IF lease = empty OR timed_out(lease) THEN
    lease := (me, tnow + tmax)
END IF
IF write(version, lease) THEN
    "decide" lease
END IF
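The difference between the two variants can be made concrete with a small sketch. It assumes a hypothetical `SingleCopyRegister` that stands in for the majority-based register (one copy, no quorum), so only the decision logic is shown. The function names and the explicit `t_now` and `t_max` parameters are illustrative.

```python
from typing import Any, Optional, Tuple


class SingleCopyRegister:
    """Hypothetical stand-in for the round-based register (no quorum)."""

    def __init__(self) -> None:
        self.round = 0
        self.value: Any = None

    def read(self, version: int) -> Tuple[bool, Any]:
        if version <= self.round:
            return (False, None)  # a newer round is already in progress
        self.round = version
        return (True, self.value)

    def write(self, version: int, value: Any) -> bool:
        if version < self.round:
            return False
        self.round, self.value = version, value
        return True


def consensus(reg: SingleCopyRegister, version: int, proposed: Any) -> Any:
    ok, value = reg.read(version)
    if not ok:
        return None
    if value is None:              # empty: propose our own value
        value = proposed
    return value if reg.write(version, value) else None


def lease_agreement(reg: SingleCopyRegister, version: int, me: str,
                    t_now: float, t_max: float) -> Optional[Tuple[str, float]]:
    ok, lease = reg.read(version)
    if not ok:
        return None
    if lease is None or lease[1] < t_now:   # empty or timed out
        lease = (me, t_now + t_max)
    return lease if reg.write(version, lease) else None


c = SingleCopyRegister()
v1 = consensus(c, 1, "a")
v2 = consensus(c, 2, "b")                   # still decides "a"

r = SingleCopyRegister()
l1 = lease_agreement(r, 1, "p1", t_now=0.0, t_max=10.0)
l2 = lease_agreement(r, 2, "p2", t_now=5.0, t_max=10.0)   # p1 still holds it
l3 = lease_agreement(r, 3, "p2", t_now=12.0, t_max=10.0)  # expired, p2 takes over
```

Consensus keeps the first decided value forever, whereas the lease variant treats a timed-out lease like an empty register and lets a new owner take over.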

SLIDE 9

Flease: No persistent state

– Process crashes

Register contents are lost

– Lease has timed out = empty register

IF lease = empty OR timed_out(lease) THEN

– Flease: wait for tmax before recovering

Lease in register has timed out

[Figure: registers on processes 1, 2, 3]
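The recovery rule can be sketched as a register that stays silent for tmax after a restart. The class `VolatileRegister` and its method names are hypothetical, and clock skew is ignored here.

```python
class VolatileRegister:
    """In-memory register of one process; contents are lost on a crash."""

    def __init__(self, t_max: float) -> None:
        self.t_max = t_max
        self.value = None
        self.ready_at = 0.0  # time from which requests are answered again

    def restart(self, t_now: float) -> None:
        # The state is gone. Stay silent for t_max so that any lease this
        # process acknowledged before the crash has timed out by then;
        # an empty register is then equivalent to the lost contents.
        self.value = None
        self.ready_at = t_now + self.t_max

    def can_answer(self, t_now: float) -> bool:
        return t_now >= self.ready_at


reg = VolatileRegister(t_max=10.0)
reg.restart(t_now=100.0)
```

After this restart the register refuses requests at time `105.0` and answers again from time `110.0` on.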

SLIDE 10

Advantages of Flease

– Smaller state

Multipaxos: one Paxos instance per lease

Flease: only a single register

▪ easier to implement

– No disk access

(Multi)Paxos: two writes per lease (on all nodes)

Flease: no disk writes

▪ lower latency

▪ throughput limited only by bandwidth of RAM

▪ share server with I/O intensive applications

SLIDE 11

Throughput under heavy IO load

[Figure: throughput (leases/second) vs. batch size (leases per node); series: zookeeper (IOZone), flease (IOZone), zookeeper (alone), flease (alone)]

SLIDE 12

Decentralized Lease coordination

– No separate lock service

– Central Lock Service vs. Decentralized Leases

No extra service (saves hardware, maintenance)

Availability of replicas depends only on replica machines

Automatically scales with the system size

SLIDE 13

Evaluation: Scalability

– Zookeeper: 3 servers

– Flease: 3 nodes (2 randomly selected)

SLIDE 14

Evaluation: Max. number of open files/server

[Figure: max. number of open files per server vs. lease timeout (s); series: Flease, Zookeeper]

Data labels in the figure:

Zookeeper: 245, 489, 1223, 2445, 3668, 7336, 14672

Flease: 1701, 3402, 8505, 17010, 25515, 51029, 102058

30 nodes, LAN:

lease timeout    Flease    Zookeeper
1 sec            1700      245
5 sec            8500      1223
10 sec           17010     2445
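The table values on this slide grow roughly linearly with the lease timeout. One way to read that, offered here as an interpretation and not as a statement from the talk, is that each system sustains a fixed number of lease operations per second and every open file needs one renewal per timeout period. The short calculation below checks the table against that assumed model.

```python
# lease timeout (s) -> (Flease, Zookeeper), values from the table on this slide
table = {
    1: (1700, 245),
    5: (8500, 1223),
    10: (17010, 2445),
}


def per_second(files: int, timeout_s: int) -> float:
    # Assumed model: one lease renewal per open file per timeout period.
    return files / timeout_s


# Implied lease operations per second for each timeout.
rates = {t: (per_second(f, t), per_second(z, t)) for t, (f, z) in table.items()}

# Ratio of open files supported by Flease vs. Zookeeper.
ratios = {t: f / z for t, (f, z) in table.items()}
```

Under this assumption the implied rate stays nearly constant across the three timeouts for both systems, and the Flease column is a little under seven times the Zookeeper column in every row.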

SLIDE 15

Thank You

– Conclusion

If you need a primary/exclusive access you can do better without a central lock service

– Open Source implementation

– www.xtreemfs.org

– www.contrail-project.eu

The Contrail project is supported by funding under the Seventh Framework Programme of the European Commission: ICT, Internet of Services, Software and Virtualization. GA nr.: FP7-ICT-257438.

SLIDE 16

SLIDE 17

Flease: Renewing Leases

– Modified Lease Invariant:

If process p decides l=(p',t) then all processes will decide l'=(p',t') with t' >= t until l has timed out.

lease = read(version)
IF lease = empty OR timed_out(lease) OR owner(lease) = me THEN
    lease := (me, tnow + tmax)
END IF
IF write(version, lease) THEN
    "decide" lease
END IF
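Renewal only adds the owner check to the acquisition logic. The sketch below assumes the register is reachable through two hypothetical callables, `read` and `write`, which are backed here by a plain dictionary instead of a quorum. Aborted reads are not modelled.

```python
from typing import Callable, Dict, Optional, Tuple

Lease = Tuple[str, float]  # (owner, expiry time)


def acquire_or_renew(
    read: Callable[[int], Optional[Lease]],
    write: Callable[[int, Lease], bool],
    me: str, version: int, t_now: float, t_max: float,
) -> Optional[Lease]:
    lease = read(version)
    if lease is None or lease[1] < t_now or lease[0] == me:
        # Empty, timed out, or already mine: (re)issue with a later expiry.
        lease = (me, t_now + t_max)
    return lease if write(version, lease) else None


# Hypothetical single-copy store standing in for the register.
store: Dict[str, Lease] = {}


def read(version: int) -> Optional[Lease]:
    return store.get("lease")


def write(version: int, lease: Lease) -> bool:
    store["lease"] = lease
    return True


first = acquire_or_renew(read, write, "p1", 1, t_now=0.0, t_max=10.0)
renewed = acquire_or_renew(read, write, "p1", 2, t_now=6.0, t_max=10.0)
other = acquire_or_renew(read, write, "p2", 3, t_now=8.0, t_max=10.0)
```

The owner extends its lease from expiry `10.0` to `16.0`, and the competing process then decides that same renewed lease, which matches the modified invariant (same owner, later timeout).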

SLIDE 18

Flease: The other half of the truth.

– Assumed perfectly synchronized clocks

– Instead: Loosely synchronized clocks

c(t) < c(t') if t < t'

At any time t for any two processes p, q: | c_p(t) - c_q(t) | < ε

ε system-wide constant, e.g. 1 sec

lease = read(version)
IF lease.t < tnow AND lease.t + ε > tnow THEN
    wait ε
    retry
END IF
...
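The clock-skew guard can be written as a small predicate. It assumes the intended condition is "the lease has expired on the local clock, but by less than ε", so the owner's clock may still consider it valid. The function name `must_wait` is hypothetical.

```python
def must_wait(lease_expiry: float, t_now: float, epsilon: float) -> bool:
    # Expired locally, but by less than the maximum clock skew: another
    # process may still see the lease as valid, so wait and retry.
    return lease_expiry < t_now and lease_expiry + epsilon > t_now


just_expired = must_wait(lease_expiry=10.0, t_now=10.5, epsilon=1.0)
long_expired = must_wait(lease_expiry=10.0, t_now=11.5, epsilon=1.0)
still_valid = must_wait(lease_expiry=10.0, t_now=9.0, epsilon=1.0)
```

Only the first case requires waiting. A lease that expired more than ε ago can be replaced right away, and a lease that is still valid locally is simply returned to the caller.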

SLIDE 19

Throughput vs. Messages

SLIDE 20

XtreemFS: Flease for file replication

– One lease per file = one primary per file

better load balancing

arbitrary replica placement

– When a file is opened

Elect a primary with Flease

Execute Replica Reset

Read locally, write quorum
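"Read locally, write quorum" can be illustrated with a toy primary that serves reads from its own copy and reports a write as successful only once a majority of replicas has applied it. This is a sketch under that assumption and not the XtreemFS protocol: the classes `Replica` and `Primary` are hypothetical, and lease checks and the Replica Reset step are left out.

```python
from typing import Dict, List, Optional


class Replica:
    def __init__(self, name: str, up: bool = True) -> None:
        self.name = name
        self.up = up  # False simulates an unreachable replica
        self.data: Dict[int, bytes] = {}

    def apply(self, offset: int, payload: bytes) -> bool:
        if not self.up:
            return False
        self.data[offset] = payload
        return True


class Primary:
    """Replica that currently holds the lease for one file."""

    def __init__(self, local: Replica, backups: List[Replica]) -> None:
        self.local = local
        self.backups = backups
        self.majority = (1 + len(backups)) // 2 + 1

    def write(self, offset: int, payload: bytes) -> bool:
        # Apply locally, forward to the backups, and count acknowledgements.
        acks = int(self.local.apply(offset, payload))
        acks += sum(b.apply(offset, payload) for b in self.backups)
        return acks >= self.majority

    def read(self, offset: int) -> Optional[bytes]:
        # The primary has seen every successful write, so it reads locally.
        return self.local.data.get(offset)


primary = Primary(Replica("a"), [Replica("b"), Replica("c", up=False)])
written = primary.write(0, b"x")   # two of three replicas acknowledge
content = primary.read(0)
```

With one backup unreachable the write still reaches a majority. With both backups unreachable, `write` would return `False`.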