Flease - Lease Coordination Without a Lock Server



  1. Flease - Lease Coordination Without a Lock Server
     Björn Kolbeck, Mikael Högqvist, Jan Stender, Felix Hupfeld
     Zuse Institute Berlin, Google Switzerland GmbH
     File and Metadata Replication in XtreemFS · Björn Kolbeck

  2. Problem: Data Replication
     - Data replication with strong consistency
     - Apply updates in the same order ~ total order broadcast
       - Destination Agreement: (Multi)Paxos
       - Fixed Sequencer: Primary/Backup

  3. Data Replication: Primary/Backup
     - "Easy" to implement
       - Single process takes all decisions
       - Widely used: Google GFS, many RDBMS (Oracle, DB2, MySQL)
     - Primary is a single point of failure (SPOF)
       - Primary role must be revoked when the process has failed or is disconnected
     - ➔ Leases for primary election
       - Lease: exclusive access for a limited period of time
       - Exclusive access = primary role
       - Timeout = revocation

  4. Outline
     1. Distributed Lease Coordination
     2. The Flease Algorithm
     3. Decentralized Lease Coordination
     4. Evaluation

  5. Distributed Lease Coordination
     - Lease = exclusive access
       - Lease invariant: at most one valid lease at any point in time.
     - Distributed system
       - Many processes concurrently trying to get a lease
       - All processes must agree on the same lease
     - Distributed consensus (?)
       - (Multi)Paxos

  6. Distributed Lease Coordination: Agreement
     - Agreement (Consensus): If process p decides v, then all processes will decide v.
     - Agreement (Leases): If process p decides l, then all processes will decide l until l has timed out.
     - Leases have a timeout. We don't care about leases that have timed out.

  7. Deconstructing Paxos: Round-Based Register
     - Round-based register
       - Atomic read-modify-write
       - read(version)
       - write(version, new value)
     - Register on each process
     - Majority-based (Quorum Intersection Property)
     [Figure: read and write(X) against the registers of processes 1, 2 and 3]
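The register described on this slide can be illustrated with a small in-memory simulation. This is a minimal sketch, not the Flease implementation: the class names `LocalRegister` and `RoundBasedRegister`, the `None` entry used to model an unreachable process, and the exact rejection rules are assumptions that follow the usual round-based register construction, where a read or write succeeds only if a majority of per-process registers accept its version.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple


@dataclass
class LocalRegister:
    """Per-process register state (hypothetical names, memory only)."""
    read_version: int = 0    # highest version accepted by a read
    write_version: int = 0   # version of the last accepted write
    value: Any = None        # last accepted value; None means empty

    def on_read(self, version: int) -> Optional[Tuple[int, Any]]:
        # Reject reads that are not newer than what was already seen.
        if version <= self.read_version or version <= self.write_version:
            return None
        self.read_version = version
        return (self.write_version, self.value)

    def on_write(self, version: int, value: Any) -> bool:
        # Reject writes overtaken by a newer read or write.
        if version < self.read_version or version < self.write_version:
            return False
        self.write_version = version
        self.value = value
        return True


class RoundBasedRegister:
    """Majority-based register over the local registers of all processes."""

    def __init__(self, replicas: List[Optional[LocalRegister]]):
        self.replicas = replicas              # None = unreachable process
        self.majority = len(replicas) // 2 + 1

    def read(self, version: int) -> Tuple[bool, Any]:
        replies = [r.on_read(version) for r in self.replicas if r is not None]
        acks = [rep for rep in replies if rep is not None]
        if len(acks) < self.majority:
            return (False, None)
        # Quorum intersection: the newest write is in at least one reply.
        newest = max(acks, key=lambda rep: rep[0])
        return (True, newest[1])

    def write(self, version: int, value: Any) -> bool:
        acks = sum(1 for r in self.replicas
                   if r is not None and r.on_write(version, value))
        return acks >= self.majority
```

Because any two majorities overlap in at least one process, a successful read always sees the most recent successful write, even when a minority of processes is unreachable.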

  8. Paxos vs. Flease
     - Consensus with RBR

         value = read(version)
         IF value = empty THEN
           value := proposed value
         END IF
         IF write(version, value) THEN
           "decide" value
         END IF

     - Lease agreement with RBR

         lease = read(version)
         IF lease = empty OR timed_out(lease) THEN
           lease := (me, t_now + t_max)
         END IF
         IF write(version, lease) THEN
           "decide" lease
         END IF
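The lease-agreement pseudocode on this slide translates almost line by line into Python. The sketch below is illustrative only: `SimpleRegister` is a hypothetical single-copy stand-in for the majority-based register (it never aborts a read), and `Lease`, `get_lease` and the explicit `t_now` parameter are names chosen here, not taken from the talk.

```python
from typing import NamedTuple, Optional


class Lease(NamedTuple):
    owner: str
    expires: float  # t_now + t_max at the time the lease was created


class SimpleRegister:
    """Hypothetical single-copy stand-in for the round-based register."""

    def __init__(self) -> None:
        self.version = 0
        self.value: Optional[Lease] = None

    def read(self, version: int) -> Optional[Lease]:
        # Remember the newest round that has read the register.
        self.version = max(self.version, version)
        return self.value

    def write(self, version: int, value: Lease) -> bool:
        if version < self.version:
            return False  # overtaken by a newer round
        self.version, self.value = version, value
        return True


def get_lease(reg: SimpleRegister, me: str, version: int,
              t_now: float, t_max: float) -> Optional[Lease]:
    """Return the agreed lease, or None if this round was overtaken."""
    lease = reg.read(version)
    if lease is None or lease.expires < t_now:  # empty OR timed out
        lease = Lease(me, t_now + t_max)
    if reg.write(version, lease):
        return lease  # "decide" lease
    return None
```

A caller that gets `None` back would retry with a higher version; a caller that gets a lease owned by another process learns who the current primary is.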

  9. Flease: No persistent state
     - Process crashes
       - Register contents are lost
     [Figure: registers of processes 1, 2 and 3 holding X]
     - Lease has timed out = empty register

         IF lease = empty OR timed_out(lease) THEN

     - Flease: wait for t_max before recovering
       - Lease in register has timed out
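The recovery rule on this slide can be sketched as a guard on a restarted process. This is a minimal illustration under stated assumptions: `RecoveringProcess`, `may_answer` and `on_read` are hypothetical names, and clocks are treated as perfectly synchronized. The idea is that any lease the process could have acknowledged before the crash has expired after t_max, so an empty register is then indistinguishable from one holding a timed-out lease.

```python
from typing import Any


class RecoveringProcess:
    """Process that lost its register in a crash (hypothetical sketch)."""

    def __init__(self, t_max: float, restarted_at: float) -> None:
        self.register: Any = None               # contents lost in the crash
        self.ready_at = restarted_at + t_max    # earliest safe time to answer

    def may_answer(self, t_now: float) -> bool:
        # Safe only once every lease seen before the crash has timed out.
        return t_now >= self.ready_at

    def on_read(self, t_now: float) -> Any:
        if not self.may_answer(t_now):
            raise RuntimeError("still recovering; register state unknown")
        return self.register
```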

  10. Advantages of Flease
      - Smaller state
        - MultiPaxos: one Paxos instance per lease
        - Flease: only a single register
          ▪ easier to implement
      - No disk access
        - (Multi)Paxos: two writes per lease (on all nodes)
        - Flease: no disk writes
          ▪ lower latency
          ▪ throughput limited only by bandwidth of RAM
          ▪ share server with I/O intensive applications

  11. Throughput under heavy IO load
      [Plot: throughput (leases/second) vs. batch size (leases per node); series: zookeeper (IOZone), flease (IOZone), zookeeper (alone), flease (alone)]

  12. Decentralized Lease Coordination
      - No separate lock service
      - Central lock service vs. decentralized leases
        - No extra service (saves hardware, maintenance)
        - Availability of replicas depends only on replica machines
        - Automatically scales with the system size

  13. Evaluation: Scalability
      - Zookeeper: 3 servers
      - Flease: 3 nodes (2 randomly selected)

  14. Evaluation: Max. number of open files/server
      [Plot: maximum number of open files per server vs. lease timeout (s), Flease and Zookeeper; 30 nodes, LAN]

  15. Thank You
      - Conclusion
        - If you need a primary or exclusive access, you can do better without a central lock service.
      - Open source implementation
        - www.xtreemfs.org
      - www.contrail-project.eu
        - The Contrail project is supported by funding under the Seventh Framework Programme of the European Commission: ICT, Internet of Services, Software and Virtualization. GA nr.: FP7-ICT-257438.

  16. [Blank slide]

  17. Flease: Renewing Leases
      - Modified lease invariant: If process p decides l = (p', t), then all processes will decide l' = (p', t') with t' >= t until l has timed out.

          lease = read(version)
          IF lease = empty OR timed_out(lease) OR owner(lease) = me THEN
            lease := (me, t_now + t_max)
          END IF
          IF write(version, lease) THEN
            "decide" lease
          END IF
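The only change needed for renewal is the extra `owner(lease) = me` condition, which can be isolated in a small pure function. This is a sketch with hypothetical names (`Lease`, `propose`); it covers only the choice of the value to write, not the register read and write around it.

```python
from typing import NamedTuple, Optional


class Lease(NamedTuple):
    owner: str
    expires: float


def propose(current: Optional[Lease], me: str,
            t_now: float, t_max: float) -> Lease:
    """Pick the lease to write back after reading `current`."""
    if current is None or current.expires < t_now or current.owner == me:
        # Empty, timed out, or our own lease: take it or extend it.
        return Lease(me, t_now + t_max)
    # Someone else holds a valid lease: confirm it unchanged.
    return current
```

For example, an owner calling `propose(Lease("A", 10.0), "A", 5.0, 10.0)` gets `Lease("A", 15.0)`, which keeps the owner and moves the timeout forward, as the modified invariant requires.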

  18. Flease: The other half of the truth
      - Assumed perfectly synchronized clocks
      - Instead: loosely synchronized clocks
        - c(t) < c(t') if t < t'
        - At any time t, for any two processes p, q: |c_p(t) - c_q(t)| < ε
        - ε is a system-wide constant, e.g. 1 sec

          lease = read(version)
          IF lease.t < t_now AND lease.t + ε > t_now THEN
            wait ε
            retry
          END IF
          ...
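The clock-skew check can be sketched as a three-way classification of a lease's timeout. This is an illustration under an assumption: the waiting condition is read here as "the lease has expired on the local clock, but by less than ε", so it may still be valid on another process's clock. The function name `classify` and its string results are hypothetical.

```python
def classify(lease_expiry: float, t_now: float, eps: float) -> str:
    """Classify a lease timeout against the local clock and max skew eps."""
    if t_now <= lease_expiry:
        return "valid"    # not expired locally
    if t_now < lease_expiry + eps:
        return "wait"     # expired here, maybe not elsewhere: wait eps, retry
    return "expired"      # expired on every clock within the skew bound
```

With ε = 1 s and a lease that expires at t = 10, a process whose clock reads 10.5 has to wait and retry, while a process whose clock reads 11 can safely treat the lease as timed out.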

  19. Throughput vs. Messages

  20. XtreemFS: Flease for file replication
      - One lease per file = one primary per file
        - better load balancing
        - arbitrary replica placement
      - When a file is opened
        - Elect a primary with Flease
        - Execute Replica Reset
        - Read locally, write quorum
