
GlobalFS: A Strongly Consistent Multi-Site Filesystem



  1. GlobalFS: A Strongly Consistent Multi-Site Filesystem. Leandro Pacheco, Raluca Halalai, Valerio Schiavoni, Fernando Pedone, Etienne Rivière, Pascal Felber. RainbowFS Workshop, May 3rd, 2017

  2. Distributed applications rely on distributed storage: SQL databases, NoSQL databases, key-value storage, caches, coordination systems and file systems. File systems offer easy interoperability for existing applications.

  9. Global infrastructure: Amazon’s AWS global infrastructure.

  10. CAP theorem. Weak consistency: lower latency, higher availability, possibly incorrect or unexpected results. Strong consistency: clear semantics and guarantees, easier to reason about, blocks instead of providing incorrect results.

  11. What is GlobalFS? A geographically distributed filesystem with a familiar interface (POSIX), strong consistency, fault tolerance through replication, and flexible performance through locality.

  12. Overall design: separate data and metadata; partial replication; a metadata protocol exploiting atomic multicast; causal reads.

  13. Separate data and metadata. Metadata controls file contents, properties and filesystem structure, and refers to data blobs. Data is immutable and stored as variable-sized blobs (1 | 2 | 3 | 4 | …).
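A minimal Python sketch of the data/metadata split. It assumes (our assumption, not stated on the slide) that blob ids are SHA-256 hashes of the blob contents; `BlobStore`, `FileMetadata`, `write_file` and `read_file` are hypothetical names. The only point illustrated is that metadata holds file properties plus an ordered list of ids pointing at immutable, variable-sized blobs.

```python
import hashlib
from dataclasses import dataclass, field


class BlobStore:
    """Hypothetical immutable blob store keyed by a hash of each blob's contents."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        # A content-derived id can never refer to different bytes, so a blob
        # can be copied to any site without coordination.
        blob_id = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(blob_id, data)
        return blob_id

    def get(self, blob_id: str) -> bytes:
        return self._blobs[blob_id]


@dataclass
class FileMetadata:
    """Hypothetical metadata record: file properties plus ordered blob ids."""
    path: str
    mode: int = 0o644
    blob_ids: list[str] = field(default_factory=list)

    def size(self, store: BlobStore) -> int:
        return sum(len(store.get(b)) for b in self.blob_ids)


def write_file(store: BlobStore, meta: FileMetadata, chunks: list[bytes]) -> None:
    # Store each variable-sized chunk as a new blob, then point the metadata at them.
    meta.blob_ids = [store.put(chunk) for chunk in chunks]


def read_file(store: BlobStore, meta: FileMetadata) -> bytes:
    # Follow the metadata's blob ids and concatenate the blobs in order.
    return b"".join(store.get(b) for b in meta.blob_ids)


store = BlobStore()
meta = FileMetadata("/home/alice/notes.txt")
write_file(store, meta, [b"hello ", b"world"])
print(read_file(store, meta), meta.size(store))  # b'hello world' 11
```

Overwriting a file in this sketch only replaces `blob_ids`; existing blobs are never modified, which is what keeps the data side easy to replicate.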

  14. Partial replication: immutable data is simple to replicate consistently, while metadata is partitioned between replica groups (i.e., partitions).

  15. Partial replication across three sites: EU, US and SA. Example tree: / contains www, bin, etc and home; home contains alice, bob and mark. The upper levels of the tree (/, www, bin, etc, home) use global replication, while the user directories alice, bob and mark are mapped to partitions in US, SA and EU. Global multicast (global replication): costly updates, fast local reads. Local multicast: fast updates, local or remote reads.
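A sketch of how a path could be mapped to a partition, and therefore to a local or a global multicast. The assignment of alice, bob and mark to US, SA and EU below is hypothetical (the slide shows the three users and the three regions, not an exact mapping), `partition_of` and `update_destinations` are illustrative names, and sending every unassigned path to the globally replicated partition is a choice made for this sketch only.

```python
GLOBAL = "global"
REGIONS = ("US", "SA", "EU")

# Hypothetical placement of user directories, for illustration only.
PARTITION_OF_PREFIX = {
    "/home/alice": "US",
    "/home/bob": "SA",
    "/home/mark": "EU",
}


def partition_of(path: str) -> str:
    """Return the partition owning `path`, using the longest matching directory prefix."""
    best, best_len = GLOBAL, -1
    for prefix, part in PARTITION_OF_PREFIX.items():
        # Match whole path components only, so "/home/alicex" is not under "/home/alice".
        matches = path == prefix or path.startswith(prefix + "/")
        if matches and len(prefix) > best_len:
            best, best_len = part, len(prefix)
    return best


def update_destinations(path: str) -> tuple[str, ...]:
    """Regions whose replicas must order an update to `path`."""
    part = partition_of(path)
    if part == GLOBAL:
        # Globally replicated metadata: every region orders the update (costly),
        # but every region can then serve reads locally.
        return REGIONS
    # Local partition: one region orders the update (fast); other regions read remotely.
    return (part,)


for p in ("/www/index.html", "/home/bob/todo.txt"):
    print(p, "->", update_destinations(p))
```

The trade-off on the slide falls out directly: the longer the destination tuple, the more sites take part in ordering each update, and the more sites can answer reads locally.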

  21. Partial ordering: GlobalFS exploits atomic multicast, i.e., atomic delivery to groups of processes. With partial ordering, messages addressed to different groups don't have to be ordered among themselves, which is critical for scalability.
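The prototype uses URingPaxos for atomic multicast. The toy below swaps in a different, classic technique, Skeen's timestamp-based algorithm, purely to show what partial ordering means. Each group is modelled as a single entity (no replication, no failures), and `SkeenMulticast`, `send`, `propose` and `decide` are hypothetical names. Messages that share a destination group are delivered in the same relative order by every group; messages for disjoint groups are never ordered against each other.

```python
from collections import defaultdict


class SkeenMulticast:
    """Toy single-process simulation of Skeen's atomic multicast (one entity per group)."""

    def __init__(self):
        self.clock = defaultdict(int)        # group -> logical clock
        self.dest = {}                       # msg id -> destination groups
        self.proposals = defaultdict(dict)   # msg id -> {group: proposed timestamp}
        self.final = {}                      # msg id -> final timestamp
        self.delivered = defaultdict(list)   # group -> msg ids in delivery order

    def send(self, msg_id, groups):
        self.dest[msg_id] = frozenset(groups)

    def propose(self, msg_id, group):
        """Phase 1: a destination group receives the message and proposes a timestamp."""
        self.clock[group] += 1
        self.proposals[msg_id][group] = self.clock[group]

    def decide(self, msg_id):
        """Phase 2: once every destination has proposed, the final timestamp is the maximum."""
        if set(self.proposals[msg_id]) != self.dest[msg_id]:
            raise ValueError(f"missing proposals for {msg_id}")
        ts = max(self.proposals[msg_id].values())
        self.final[msg_id] = ts
        for g in self.dest[msg_id]:
            self.clock[g] = max(self.clock[g], ts)  # later proposals at g will exceed ts
        self._deliver_ready()

    def _deliver_ready(self):
        for g in list(self.clock):
            # Messages g has received (proposed) but not yet delivered.
            pending = [m for m in self.proposals
                       if g in self.proposals[m] and m not in self.delivered[g]]
            # Order by final (or tentative) timestamp; ties broken by id everywhere.
            pending.sort(key=lambda m: (self.final.get(m, self.proposals[m][g]), m))
            for m in pending:
                if m not in self.final:
                    break  # an undecided message could still end up ordered first
                self.delivered[g].append(m)


mc = SkeenMulticast()
mc.send("x", ["G1", "G2"])
mc.send("y", ["G1", "G2"])
mc.send("z", ["G3"])  # disjoint destination: never ordered against x or y
# G1 and G2 receive x and y in opposite orders.
mc.propose("x", "G1")
mc.propose("y", "G2")
mc.propose("y", "G1")
mc.propose("x", "G2")
mc.propose("z", "G3")
for m in ("x", "y", "z"):
    mc.decide(m)
print(dict(mc.delivered))  # {'G1': ['x', 'y'], 'G2': ['x', 'y'], 'G3': ['z']}
```

Only groups that share a message ever wait on each other's timestamps, which is the property the slide calls critical for scalability.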

  22. Architecture: the application runs on top of the client (FUSE). The client sends read or update commands to the metadata replicas (atomic multicast) and inserts or fetches immutable data in the data store.

  23. Consistent update operations. Step 1: write data blobs to the data store. Step 2: issue a metadata update, which is atomically multicast to the partitions involved and executed before the reply is sent. Three cases are shown: single-partition (write to a file in G1), uncoordinated multi-partition (write to a file in {G1, G2}) and coordinated multi-partition (move a file from G1 to G2).
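A single-process sketch of the two steps and the three update cases. Atomic multicast is elided: each function runs the command as if it had already been delivered, in order, to every destination partition. `Partition`, `store_blobs`, `single_partition_write`, `uncoordinated_write` and `coordinated_move` are hypothetical names, content-hash blob ids are an assumption, and the information exchanged in the coordinated case is our guess rather than the GlobalFS protocol. Writing blobs before the metadata is presumably what guarantees that metadata never points at a blob the data store has not accepted.

```python
import hashlib


class Partition:
    """Hypothetical metadata partition: path -> list of blob ids."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.files: dict[str, list[str]] = {}


def store_blobs(data_store: dict[str, bytes], chunks: list[bytes]) -> list[str]:
    """Step 1: put immutable blobs in the data store (ids are content hashes here)."""
    ids = []
    for chunk in chunks:
        blob_id = hashlib.sha256(chunk).hexdigest()
        data_store[blob_id] = chunk
        ids.append(blob_id)
    return ids


def single_partition_write(part: Partition, path: str, ids: list[str]) -> None:
    # Step 2, one partition: only that partition's replicas order and apply the update.
    part.files[path] = ids


def uncoordinated_write(parts: list[Partition], path: str, ids: list[str]) -> None:
    # Step 2, several partitions: each applies the update independently,
    # with no messages exchanged between partitions.
    for part in parts:
        part.files[path] = ids


def coordinated_move(src: Partition, dst: Partition, old: str, new: str) -> bool:
    """Step 2, coordinated: both partitions must reach the same outcome."""
    # Assumed coordination: src tells dst whether `old` exists, so dst
    # never creates `new` for a move that src rejects.
    ok = old in src.files
    if ok:
        dst.files[new] = src.files.pop(old)
    return ok


data_store: dict[str, bytes] = {}
g1, g2 = Partition("G1"), Partition("G2")
ids = store_blobs(data_store, [b"meow"])          # blobs first...
single_partition_write(g1, "/pets/cat.jpg", ids)  # ...then metadata
print(coordinated_move(g1, g2, "/pets/cat.jpg", "/img/cat.jpg"))  # True
```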

  25. Causal read operations: causally related updates are seen in the same order, e.g., operations done by the same client. Example: client A creates an image cat.jpg, then modifies a page pets.html to include cat.jpg. Without causal reads, client B could open pets.html and find a broken image reference. Where is the cat?

  28. Causal read operations. Step 1: contact a metadata replica for a list of blob ids. Step 2: get the data from the data store. The approach is inspired by vector clocks: the vector is composed of one counter per replica group.
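The slide only states that reads are inspired by vector clocks, with one counter per replica group. The sketch below is our own simplified guess at such a check, not the GlobalFS protocol: each update records the writer's vector as its dependencies, readers merge the vectors of what they read, and a replica whose counter for its own group is behind the reader's vector is treated as stale. All names (`merge`, `Group`, `Replica`, `causal_read`) are hypothetical. The scenario replays the cat.jpg / pets.html example.

```python
from dataclasses import dataclass, field
from typing import Optional


def merge(a: dict[str, int], b: dict[str, int]) -> dict[str, int]:
    """Component-wise maximum of two vectors (one counter per replica group)."""
    return {g: max(a.get(g, 0), b.get(g, 0)) for g in a.keys() | b.keys()}


@dataclass
class Client:
    vector: dict[str, int] = field(default_factory=dict)


@dataclass
class Group:
    """A replica group's totally ordered update log."""
    name: str
    log: list = field(default_factory=list)

    def update(self, client: Client, path: str, blob_ids: list[str]) -> None:
        seq = len(self.log) + 1
        # The update depends on everything the client has observed, plus itself.
        deps = merge(client.vector, {self.name: seq})
        self.log.append((path, blob_ids, deps))
        client.vector = deps


@dataclass
class Replica:
    """One replica of a group, possibly lagging behind the group's log."""
    group: str
    applied: int = 0
    files: dict = field(default_factory=dict)  # path -> (blob ids, deps)

    def catch_up(self, log: list, upto: int) -> None:
        for path, blob_ids, deps in log[self.applied:upto]:
            self.files[path] = (blob_ids, deps)
        self.applied = max(self.applied, upto)


def causal_read(client: Client, replica: Replica, path: str) -> Optional[list]:
    """Step 1 of a read: ask a single replica for blob ids, if it is fresh enough."""
    if replica.applied < client.vector.get(replica.group, 0):
        return None  # stale replica: wait for it or try another one
    if path not in replica.files:
        raise FileNotFoundError(path)
    blob_ids, deps = replica.files[path]
    client.vector = merge(client.vector, deps)
    return blob_ids  # step 2 would fetch these blobs from the data store


x, y = Group("X"), Group("Y")  # X holds cat.jpg, Y holds pets.html
client_a, client_b = Client(), Client()
x.update(client_a, "/img/cat.jpg", ["c1"])
y.update(client_a, "/pets.html", ["p1"])  # depends on {X: 1, Y: 1}

rx, ry = Replica("X"), Replica("Y")
ry.catch_up(y.log, 1)  # rx has not applied cat.jpg yet
print(causal_read(client_b, ry, "/pets.html"))    # ['p1']
print(causal_read(client_b, rx, "/img/cat.jpg"))  # None: stale, instead of a broken image
rx.catch_up(x.log, 1)
print(causal_read(client_b, rx, "/img/cat.jpg"))  # ['c1']
```

A read in this sketch touches one replica and needs no multicast, which is consistent with the summary's claim of cheap causal reads.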

  29. Evaluation: a complete prototype in Java (https://github.com/pacheco/GlobalFS), built on Filesystem in Userspace (FUSE) and URingPaxos for atomic multicast, with a global deployment on Amazon EC2.

  30. Maximum throughput by operation. [Chart: GlobalFS vs. CalvinFS throughput in operations/sec for read 1KB, local create 1KB, local write 1KB, global create 1KB and global write 1KB, with locality highlighted.] 3-region deployment: US west, US east and Europe.

  31. Geographical scalability. [Chart: normalized throughput per region as more regions are added (1, 3, 6 and 9 regions) for read 1KB, create and write 1KB, compared to ideal.] The 9-region deployment uses all EC2 regions available at the time.

  32. GlobalFS summary: strong consistency at global scale; a simple and familiar API (POSIX); flexible performance through partial replication and locality; cheap causal read operations. Thank you! Leandro Pacheco, pachecol@usi.ch
