GlobalFS: A Strongly Consistent Multi-Site Filesystem
Leandro Pacheco, Raluca Halalai, Valerio Schiavoni, Fernando Pedone, Etienne Rivière, Pascal Felber
RainbowFS Workshop, May 3rd, 2017
Distributed applications
Distributed Storage
- SQL databases
- Key-value storage
- NoSQL databases
- Coordination systems
- Caches
- File systems: easy interoperability for existing applications
GlobalFS: A Strongly Consistent Multi-Site Filesystem - Leandro Pacheco 2
Global infrastructure
Amazon’s AWS global infrastructure
CAP theorem
Weak consistency
- Lower latency
- Higher availability
- Possibly incorrect/unexpected results
Strong consistency
- Clear semantics and guarantees
- Easier to reason about
- Block instead of providing incorrect results
What is GlobalFS?
- Geographically distributed filesystem
- Familiar interface (POSIX)
- Strong consistency
- Fault tolerance through replication
- Flexible performance through locality
Overall design
- Separate data and metadata
- Partial replication
- Metadata protocol exploiting atomic multicast
- Causal reads
Separate data and metadata
Immutable data
- Variable-sized blobs
Metadata
- Controls file contents, properties, and filesystem structure
- Metadata refers to data blobs
1 | 2 | 3 | 4 | …
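The split between immutable blobs and the metadata that refers to them can be sketched as follows. This is a minimal in-memory illustration, not GlobalFS code: the names `BlobStore`, `FileMetadata`, and `read_file`, and the choice of a SHA-256 digest as the blob id, are assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass, field


class BlobStore:
    """Hypothetical in-memory store for immutable, variable-sized blobs."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        # Assumption: the blob id is the SHA-256 digest of the content,
        # so an id can never refer to different bytes later on.
        blob_id = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(blob_id, data)
        return blob_id

    def get(self, blob_id: str) -> bytes:
        return self._blobs[blob_id]


@dataclass
class FileMetadata:
    """Hypothetical metadata record: properties plus an ordered list of blob ids."""
    name: str
    mode: int = 0o644
    blob_ids: list = field(default_factory=list)


def read_file(meta: FileMetadata, store: BlobStore) -> bytes:
    # File contents are the concatenation of the blobs the metadata refers to.
    return b"".join(store.get(blob_id) for blob_id in meta.blob_ids)


store = BlobStore()
meta = FileMetadata(name="pets.html")
meta.blob_ids.append(store.put(b"<html>"))
meta.blob_ids.append(store.put(b"<img src='cat.jpg'></html>"))
```

Because blobs never change once written, only the small metadata record has to go through the consistency protocol; replicas of the data store cannot disagree about what a given blob id contains.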
Partial replication
- Immutable data is simple to replicate consistently
- Metadata is partitioned between replica groups (i.e., partitions)
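[Figure: a directory tree (/, bin, etc, home, www, mark, bob, alice) whose metadata is assigned either to a replica group in one region (US, SA, or EU) or to a globally replicated group]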
Local multicast
- fast updates
- local or remote reads
Global multicast (global replication)
- costly updates
- fast local reads
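The trade-off between local and global multicast can be illustrated with a small routing sketch. The assignment of directories to groups in `PARTITION_MAP` is hypothetical (the slides do not state which subtree belongs to which region), and all function names are invented for the example.

```python
# Hypothetical assignment of directory subtrees to replica groups.
# "GLOBAL" stands for metadata that is replicated in every region.
PARTITION_MAP = {
    "/": "GLOBAL",
    "/home/mark": "US",
    "/home/bob": "SA",
    "/home/alice": "EU",
}
REGIONS = {"US", "SA", "EU"}


def group_for(path: str) -> str:
    """Return the group owning `path`, using the longest matching prefix."""
    best = "/"
    for prefix in PARTITION_MAP:
        is_match = path == prefix or path.startswith(prefix.rstrip("/") + "/")
        if is_match and len(prefix) > len(best):
            best = prefix
    return PARTITION_MAP[best]


def update_destinations(path: str) -> set:
    """Regions whose replicas must receive an update to `path`."""
    group = group_for(path)
    return set(REGIONS) if group == "GLOBAL" else {group}


def read_is_local(path: str, client_region: str) -> bool:
    """A read is local when the owning group has a replica in the client's region."""
    return client_region in update_destinations(path)
```

In this sketch an update under `/home/alice` is multicast to the EU group only (fast update, remote read for a US client), while an update under `/etc` reaches every region (costly update, local read everywhere).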
Partial ordering
GlobalFS exploits atomic multicast
- Atomic delivery to groups of processes
- Partial ordering: messages for different groups do not have to be ordered among themselves
Partial ordering is critical for scalability
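The ordering requirement can be made concrete with a small checker. This is a simplified sketch, not the protocol used by GlobalFS: it only verifies that any two groups deliver the messages they have in common in the same relative order, and it does not detect ordering cycles that span three or more groups. The function names and the log format are assumptions.

```python
from itertools import combinations


def needs_ordering(dests_a: set, dests_b: set) -> bool:
    """Two messages only need a relative order if some group receives both."""
    return bool(dests_a & dests_b)


def consistent_delivery(logs: dict) -> bool:
    """Check that every pair of groups delivers its common messages in the same order.

    `logs` maps a group name to the list of message ids it delivered, in order.
    """
    for g1, g2 in combinations(logs, 2):
        common = set(logs[g1]) & set(logs[g2])
        order1 = [m for m in logs[g1] if m in common]
        order2 = [m for m in logs[g2] if m in common]
        if order1 != order2:
            return False
    return True


# m1 goes to US only, m2 to EU only, m3 and m4 to both groups.
logs_ok = {"US": ["m1", "m3", "m4"], "EU": ["m3", "m2", "m4"]}
logs_bad = {"US": ["m3", "m4"], "EU": ["m4", "m3"]}
```

In `logs_ok`, m1 and m2 are never ordered against each other, which is exactly the freedom that lets single-group updates proceed without involving other groups.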
Architecture
Application
Client (FUSE)
- Sends read or update commands to the metadata replicas through atomic multicast
- Inserts or fetches immutable data in the data store
Consistent update operations
Step 1: Write data blobs to the data store
Step 2: Issue a metadata update
- Single-partition, e.g., write to a file in G1
- Uncoordinated multi-partition, e.g., write to a file in {G1, G2}
- Coordinated multi-partition, e.g., move a file from G1 to G2
[Figure: request (Req), atomic multicast, execution, and reply across groups G1 and G2 for each of the three cases]
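The two steps and the three kinds of update can be sketched as below. The rule that only a move requires coordination is an assumption that mirrors the examples on the slide; the slides do not give the full criterion. The `multicast` callback, the dict-based data store, and the blob id format are hypothetical stand-ins.

```python
def classify_update(op: str, groups: set) -> str:
    """Classify a metadata update by the replica groups it touches.

    Assumption for this sketch: among multi-group operations, only "move"
    needs the groups to coordinate, as in the example on the slide.
    """
    if not groups:
        raise ValueError("an update must involve at least one group")
    if len(groups) == 1:
        return "single-partition"
    if op == "move":
        return "coordinated multi-partition"
    return "uncoordinated multi-partition"


def update_file(data: bytes, path: str, groups: set, store: dict, multicast) -> str:
    # Step 1: write the data blob to the data store (here a plain dict).
    blob_id = f"blob-{len(store)}"
    store[blob_id] = data
    # Step 2: issue the metadata update to the groups involved.
    multicast(groups, {"op": "write", "path": path, "blob": blob_id})
    return classify_update("write", groups)


sent = []
store = {}
kind = update_file(b"x", "/a", {"G1"}, store, lambda g, m: sent.append((g, m)))
```

Writing the blob before issuing the metadata update means that metadata never refers to a blob that has not yet been stored.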
Causal read operations
Causally related updates are seen in the same order, e.g., operations done by the same client
Client A
- Creates an image cat.jpg
- Modifies a page pets.html to include the image cat.jpg
Client B
- Opens the pets.html page and finds a broken image reference
- Where is the cat?
Step 1: Contact a metadata replica for a list of blob ids
Step 2: Get the data from the data store
Approach inspired by vector clocks
- Vector is composed of one counter per replica group
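One way to picture the per-group vector is the generic check below. It is a sketch in the spirit of vector clocks, not the actual GlobalFS read protocol: the function names, the idea that a read returns a vector to the client, and the counter values are all assumptions.

```python
def can_serve(replica_group: str, applied: int, seen: dict) -> bool:
    """A replica of `replica_group` can answer once it has applied at least
    as many updates of its group as the client has already observed."""
    return applied >= seen.get(replica_group, 0)


def merge(seen: dict, returned: dict) -> dict:
    """Element-wise maximum: what the client has observed after a read."""
    groups = set(seen) | set(returned)
    return {g: max(seen.get(g, 0), returned.get(g, 0)) for g in groups}


# Client B reads pets.html (group G2); the reply carries a vector saying the
# page depends on update 5 of group G1, where cat.jpg was created.
seen_b = merge({}, {"G1": 5, "G2": 3})

stale_ok = can_serve("G1", 4, seen_b)   # G1 replica that has not applied update 5
fresh_ok = can_serve("G1", 5, seen_b)   # G1 replica that has
```

Under these assumptions, a G1 replica that has not yet applied the creation of cat.jpg would not answer client B, so the broken image reference from the example cannot be observed.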
Evaluation
- Complete prototype in Java: https://github.com/pacheco/GlobalFS
- Filesystem in Userspace (FUSE)
- URingPaxos for atomic multicast
- Global deployment using Amazon EC2
Maximum throughput by operation
[Figure: GlobalFS throughput in operations/sec, GlobalFS vs. CalvinFS, for read 1KB, local create 1KB, local write 1KB, global create 1KB, and global write 1KB]
Locality
3-region deployment: US west, US east, and Europe
Geographical scalability
[Figure: geographical scalability for read 1KB, create, and write 1KB with 1, 3, 6, and 9 regions, compared against the ideal]
Normalized throughput per region as more regions are added
9 regions uses all EC2 regions available at the time
GlobalFS: Summary
- Strong consistency at global scale
- Simple and familiar API (POSIX)
- Flexible performance through partial replication and locality
- Cheap causal read operations
Thank you!
Leandro Pacheco pachecol@usi.ch