  1. COPS: Scalable Causal Consistency for Wide-Area Storage A presentation by Maxymilian Śmiech. But here are those who did the work: Wyatt Lloyd, Michael J. Freedman, Michael Kaminsky, David G. Andersen

  2. What it will be about 1. Problem definition 2. Idea of the solution 3. Implementation overview 4. Performance analysis 5. Previous work 6. Summary

  3. The ultimate goal ● A distributed storage system should: – Give a consistent view of data – Be always available – Keep working when the network partitions

  4. Unfortunately: CAP Theorem ● It is not possible to have a strongly consistent (linearizable), always-available system that tolerates partitions ● In practice we sacrifice consistency

  5. Over the years... ● Weak consistency was sufficient in the past (early search engines) – synchronization was not critical ● Now we have distributed systems with complex dependencies (modern social networks) – inconsistent data leads to user frustration

  6. What is worth fighting for ● Availability – "always on" ● low Latency experience ● Partition tolerance ● high Scalability We can't have all of C, A, P. Instead we trade strong consistency for the ability to easily achieve low latency and high scalability: CAP → ALPS. But we don't want to give up C entirely – a single view of data helps in writing simple software

  7. Solution – COPS data store ● Clusters of Order-Preserving Servers ● It has causal+ consistency: – Causal consistency – Convergent conflict handling ● Causal+ is the strongest consistency model achievable under ALPS constraints

  8. Causal consistency ● Ensures dependencies between data are respected – no need to handle them at the application level ● Case study (see the sketch below): Alice adds a photo to an album: 1. Save the uploaded photo (and its metadata) 2. Add the photo (its reference) to an album Now Bob opens the album page: 1. Read the album data (a list of photo references) 2. For each photo reference, put a link on the page
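
  The scenario above, as a minimal runnable sketch. The Store class is a toy in-memory stand-in for a causally consistent data store, not the COPS API; the point is only that Alice's album update causally depends on her photo upload, so Bob can never follow a dangling reference.

      # Toy in-memory stand-in for a causally consistent store (illustration only).
      class Store:
          def __init__(self):
              self.data = {}
          def put(self, key, value):
              self.data[key] = value
          def get(self, key):
              return self.data.get(key)

      store = Store()

      # Alice's execution thread: the album update causally depends on the photo
      # upload, so no replica may expose the new album without also having the photo.
      store.put("photo:123", b"...jpeg bytes...")
      album = store.get("album:alice") or []
      store.put("album:alice", album + ["photo:123"])

      # Bob's execution thread (possibly in another datacenter): under causal
      # consistency every referenced photo is guaranteed to be readable.
      for ref in store.get("album:alice") or []:
          assert store.get(ref) is not None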

  9. Causal vs Eventual ● In an eventual data store, a cluster can return updates "out of order". Therefore the application server must ensure that Bob won't be affected by references to pictures not yet present in "his" cluster; otherwise he may get a "404" error! At the application level we must check whether the data store has all photos referenced from the album; if not, we don't render the bad links on the page. Each time the album is viewed, we check which photos are available. We shouldn't have to think that way!

  10. Causal vs Eventual ● We switch to causal consistency. Now each cluster checks whether it has received the photo; if not, it returns the old album info, without the dangling photo reference. The old album contents are returned even if an updated version is available elsewhere. A cluster delays updates received from a remote cluster until all their dependencies are satisfied. Result: when the data store returns the updated album, the application can be sure that the new photo is also available.

  11. Convergent conflict handling ● Every cluster uses the same handler function to resolve conflicts between two values assigned to the same key. ● We require that the handler is associative and commutative. ● That ensures convergence to the same final value, independent of the conflict resolution order. ● The handler can be provided by the application. It can execute some processing or just "add" both possibilities, store them as a new value and let the application handle it later (see the sketch below).
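
  A minimal sketch of what application-supplied convergent handlers could look like. The function names and the (version, value) tagging are assumptions made for illustration, not the COPS API; the only property that matters is that each handler is associative and commutative, so every cluster converges on the same value regardless of the order in which conflicts are resolved.

      # Illustrative conflict handlers; names and signatures are assumptions,
      # not the actual COPS API.

      def merge_sets(v1, v2):
          # "Add both possibilities": keep every element either side has seen.
          return frozenset(v1) | frozenset(v2)

      def last_writer_wins(tagged1, tagged2):
          # Default handler: values are tagged with (lamport_time, node_id)
          # versions; the globally larger version wins on every cluster.
          (ver1, _), (ver2, _) = tagged1, tagged2
          return tagged1 if ver1 > ver2 else tagged2

      # The same two conflicting writes resolve identically in either order.
      assert merge_sets({"a"}, {"b"}) == merge_sets({"b"}, {"a"})
      v1 = ((3, "node-A"), "red")
      v2 = ((5, "node-B"), "blue")
      assert last_writer_wins(v1, v2) == last_writer_wins(v2, v1) == v2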

  12. Design details ● Two versions: – COPS Read/write single pieces of data. Reads always return values consistent with causal+. – COPS-GT Get transactions – the ability to retrieve a consistent set of values. ● They differ in the stored metadata – a single system must consist of clusters of the same type.

  13. Assumptions ● A small number of big datacenters ● Each datacenter contains application (front-end) servers talking to the local storage cluster ● Each cluster keeps a copy of all data and is contained entirely in a single datacenter ● Datacenters are good enough to provide low latency for local operations and resistance to partitioning (the local cluster is linearizable)

  14. Expectations ● COPS requires powerful datacenters, so what does it give in return? Asynchronous replication in the background: – Data is constantly exchanged with other datacenters without blocking current operations – Data always respects the causal+ properties... ...and even if any of the datacenters fails, dependencies are preserved

  15. COPS (abstract) interface ● Nothing more than a simple key-value store: value = get(key) put(key, value) ● Execution thread – a stateful "session" used by a client (application server) when performing operations on the data store. All communication between threads happens through COPS (so dependencies can be tracked)

  16. Causality relation ● If a and b happen in a single execution thread and a happens before b, then a→b. If a is put(k,v) and b is a get(k) that returns the value put by a, then a→b. a→b and b→c implies a→c ● If a→b and both are puts, we say that b depends on a ● If a ↛ b and b ↛ a, then a and b are concurrent. They are unrelated and can be replicated independently. But if such an a is put(k,v) and b is put(k,w), then a and b are in conflict, and the conflict must be resolved.

  17. Causality relation: example [Figure: example graph of causally related operations. There should be more arrows, but they are implied by transitivity of those shown.]

  18. Architecture ● Node – part of a linearizable key-value store with additional extensions to support replication in a causal+ way ● Application context – tracks dependencies in the execution thread

  19. Dividing the keyspace ● Each cluster has a full copy of the key-value set. A cluster can use consistent hashing or other methods of dividing the keyspace between its nodes (see the sketch below) ● A cluster can use chain replication for fault tolerance. For each key there is a single primary node per cluster. Only the primary nodes of corresponding keys exchange messages between clusters
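
  A minimal consistent-hashing sketch for picking a key's primary node within a cluster, under the assumption of a plain hash ring; a real deployment would add virtual nodes and chain replication. The Ring class and helper names are illustrative only.

      # Minimal consistent-hashing sketch (no virtual nodes, no chain replication).
      import bisect, hashlib

      def _h(s):
          return int(hashlib.md5(s.encode()).hexdigest(), 16)

      class Ring:
          def __init__(self, nodes):
              self.ring = sorted((_h(n), n) for n in nodes)
          def primary_for(self, key):
              # First node clockwise from the key's position on the hash ring.
              i = bisect.bisect(self.ring, (_h(key), "")) % len(self.ring)
              return self.ring[i][1]

      ring = Ring(["node-A", "node-B", "node-C"])
      print(ring.primary_for("album:alice"))  # same key -> same primary node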

  20. Library interface ● ctx_id = createContext() ● bool = deleteContext(ctx_id) ● bool = put(key, value, ctx_id) ● value = get(key, ctx_id) [in COPS] values = get_trans(keys, ctx_id) [in COPS-GT] ● ctx_id is used to track a specific context when a single COPS client (an application server) handles multiple user sessions (see the sketch below)
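
  A usage sketch of this interface. The CopsClient class below is a toy in-memory stand-in (the real library talks to the local cluster); it only shows the call shapes and how one application server keeps a separate ctx_id per user session. Dependency bookkeeping inside a context is sketched under slide 23.

      # Toy stand-in for the client library above (illustration only).
      class CopsClient:
          def __init__(self):
              self._data, self._contexts, self._next_id = {}, {}, 0

          def createContext(self):
              self._next_id += 1
              self._contexts[self._next_id] = []   # per-session dependency list
              return self._next_id

          def deleteContext(self, ctx_id):
              return self._contexts.pop(ctx_id, None) is not None

          def put(self, key, value, ctx_id):
              self._data[key] = value
              return True

          def get(self, key, ctx_id):
              return self._data.get(key)

      client = CopsClient()
      alice_ctx = client.createContext()   # one context per user session
      bob_ctx = client.createContext()
      client.put("album:alice", ["photo:123"], alice_ctx)
      print(client.get("album:alice", bob_ctx))   # ['photo:123']
      client.deleteContext(alice_ctx)
      client.deleteContext(bob_ctx)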

  21. Lamport timestamp ● Used to assign a version to <key, value> after each put(key,val). It respects causal dependencies (a larger timestamp means a later update) ● Basically: the counter is incremented before each local update and is sent with replication messages. The receiver takes the maximum of the received value and its own counter, plus one, as the time of the message arrival and as its new current counter. ● Combined with a unique node identifier, this allows implementing the default convergent conflict handler (we get a global order on updates to the same key, so just let the last write win – see the sketch below)
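
  A sketch of the Lamport clock described above, assuming a simple (counter, node_id) pair as the version; encoding details in the real system may differ.

      # Lamport clock sketch: local events increment the counter; on receiving a
      # replicated update the clock jumps to max(local, received) + 1. Pairing the
      # counter with a unique node id gives a total order usable for
      # last-writer-wins conflict handling.
      class LamportClock:
          def __init__(self, node_id):
              self.node_id, self.counter = node_id, 0
          def tick(self):                      # before each local put
              self.counter += 1
              return (self.counter, self.node_id)
          def receive(self, remote_counter):   # on each replication message
              self.counter = max(self.counter, remote_counter) + 1
              return (self.counter, self.node_id)

      a, b = LamportClock("node-A"), LamportClock("node-B")
      va = a.tick()             # version assigned by node A
      vb = b.receive(va[0])     # node B learns about A's update
      assert vb > va            # tuple comparison: the later update wins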

  22. Nearest dependencies ● Used to limit the size of the metadata kept by the client library and the number of checks done by nodes ● COPS-GT must keep all dependencies

  23. Dependencies ● The context keeps <key, version, [deps]> entries; version increases with causally related puts to key ● val = get(key) adds <key, version, [deps]> to the context (the application saw val, so its next actions may somehow be based on it) ● put(key,val) uses the current context as the set of dependencies for key. COPS: afterwards it clears the current context and adds a single <key, ver> for that put. This is possible in COPS because of the transitivity of dependencies – only the nearest are needed, and this put is nearer than anything before. COPS-GT cannot remove anything, because it must be able to support get transactions (see the sketch below)
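
  A sketch of this bookkeeping for the COPS variant, where a put collapses the context down to its single nearest dependency; the Context class and its method names are assumptions made for illustration.

      # Client-library context bookkeeping, COPS variant (illustration only).
      # After get(key) the returned <key, version> joins the context; put(key)
      # uses the whole current context as its dependency set and then, because
      # dependencies are transitive, replaces the context with just that put.
      class Context:
          def __init__(self):
              self.deps = []                      # list of (key, version)

          def on_get(self, key, version):
              self.deps.append((key, version))

          def on_put(self, key, version):
              nearest = list(self.deps)           # dependencies sent with the put
              self.deps = [(key, version)]        # COPS: this put subsumes the rest
              return nearest

      ctx = Context()
      ctx.on_get("photo:123", 7)
      deps_for_album_put = ctx.on_put("album:alice", 8)
      print(deps_for_album_put)   # [('photo:123', 7)] -> the album depends on the photo
      print(ctx.deps)             # [('album:alice', 8)] -> context collapsed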

  24. Replication: sender's cluster ● <bool,ver> = put_after(key, val, [deps,] nearest, ver) ● Write to the local cluster: ver = null. The primary node is responsible for assigning ver and returning it to the client library. In the local cluster all dependencies are already satisfied. ● Remote replication: the primary node asynchronously issues the same put_after to the remote primary nodes, but including the previously assigned ver (see the sketch below)
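
  A sketch of the sender-side flow under stated assumptions: put_after, version assignment and the asynchronous fan-out are modeled with plain Python calls and a thread pool instead of real inter-datacenter RPCs, and the node names are made up. Dependency checking on the receiving side is sketched under the next slide.

      # Sender-side put_after sketch (illustration only).
      from concurrent.futures import ThreadPoolExecutor

      pool = ThreadPoolExecutor()   # stands in for the async replication machinery

      class PrimaryNode:
          def __init__(self, name):
              self.name, self.store, self.clock = name, {}, 0
              self.remote_peers = []                 # primaries for the same keys elsewhere

          def put_after(self, key, val, nearest, ver=None):
              if ver is None:                        # write coming from the local cluster
                  self.clock += 1
                  ver = (self.clock, self.name)      # the local primary assigns the version
              self.store[key] = (val, ver)           # local dependencies are already satisfied
              for peer in self.remote_peers:         # async fan-out with the assigned ver
                  pool.submit(peer.put_after, key, val, nearest, ver)
              return True, ver

      local, remote = PrimaryNode("dc1-n3"), PrimaryNode("dc2-n3")
      local.remote_peers = [remote]
      ok, ver = local.put_after("album:alice", ["photo:123"],
                                nearest=[("photo:123", (7, "dc1-n1"))])
      pool.shutdown(wait=True)                       # let the toy replication finish
      assert remote.store["album:alice"][1] == ver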

  25. Replication: receiver's cluster ● bool = dep_check(key, ver) It is issued by the node that received a replicated put_after, once for each of the nearest dependencies, to determine whether that dependency is satisfied in the receiver's cluster. Remember that each key is assigned to a single node – that node will not return from the above call until it has written the required dependency. That dependency will be asynchronously replicated between that node and its corresponding one in the sender's cluster. ● dep_check can time out, possibly because of a node failure. It is then called again, probably on another node responsible for the key (see the sketch below).
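
  A sketch of the receiving side under the same toy assumptions; here dep_check simply polls the local store with a timeout, whereas the real call blocks inside the node responsible for the dependency's key and is an inter-node RPC.

      # Receiver-side sketch: before committing a replicated put, the node checks
      # every nearest dependency against the local cluster (illustration only).
      import time

      class ReceiverNode:
          def __init__(self):
              self.store = {}                       # key -> (value, version)

          def dep_check(self, key, ver, timeout=5.0):
              deadline = time.time() + timeout
              while time.time() < deadline:
                  written = self.store.get(key)
                  if written is not None and written[1] >= ver:
                      return True                   # dependency is satisfied locally
                  time.sleep(0.01)                  # wait for async replication to catch up
              return False                          # caller retries, maybe on another node

          def put_after(self, key, val, nearest, ver):
              for dep_key, dep_ver in nearest:      # hold the write until all deps commit
                  while not self.dep_check(dep_key, dep_ver):
                      pass
              self.store[key] = (val, ver)
              return True, ver

      recv = ReceiverNode()
      recv.store["photo:123"] = (b"...", (7, "dc1-n1"))   # dependency already replicated
      recv.put_after("album:alice", ["photo:123"],
                     [("photo:123", (7, "dc1-n1"))], (8, "dc1-n3"))
      assert "album:alice" in recv.store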

  26. COPS: Retrieving data ● <val, ver> = get_by_version(key) The latest version is always returned (and stored internally). ● The client library will update the context accordingly: <key, ver> will be added

  27. COPS-GT: Retrieving data ● <val, ver, deps> = get_by_version(key, ver) The default behavior is to get the latest version, but older versions can also be retrieved, so that get_trans works properly (see the sketch below). ● The client library will update the context accordingly: <key, ver, deps> will be added
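
  A sketch covering both retrieval variants, under the assumption that a node keeps old versions of each key; COPS clients always ask for the latest version, while COPS-GT's get_trans may re-fetch an explicitly older one so that the whole result set is causally consistent. The VersionedNode class is illustrative only.

      # get_by_version sketch (illustration only): a node keeps multiple versions
      # per key; COPS ignores the returned deps, COPS-GT uses them.
      class VersionedNode:
          def __init__(self):
              self.versions = {}                # key -> {ver: (value, deps)}

          def put(self, key, ver, value, deps=()):
              self.versions.setdefault(key, {})[ver] = (value, tuple(deps))

          def get_by_version(self, key, ver=None):
              history = self.versions[key]
              if ver is None:                   # default: latest visible version
                  ver = max(history)
              value, deps = history[ver]
              return value, ver, deps

      node = VersionedNode()
      node.put("album:alice", 7, ["photo:122"])
      node.put("album:alice", 8, ["photo:122", "photo:123"], deps=[("photo:123", 5)])
      print(node.get_by_version("album:alice"))         # latest: version 8
      print(node.get_by_version("album:alice", ver=7))  # explicit older version (COPS-GT)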
