Distributed Systems: Principles and Paradigms
Maarten van Steen
VU Amsterdam, Dept. Computer Science, Room R4.20, steen@cs.vu.nl
Chapter 11: Distributed File Systems
Version: December 2, 2009
Contents
Chapter 01: Introduction
02: Architectures
03: Processes
04: Communication
05: Naming
06: Synchronization
07: Consistency & Replication
08: Fault Tolerance
09: Security
10: Distributed Object-Based Systems
11: Distributed File Systems
12: Distributed Web-Based Systems
13: Distributed Coordination-Based Systems
Distributed File Systems 11.1 Architecture
Distributed File Systems
General goal: try to make a file system transparently available to remote clients.
[Figure: two file-access models. (a) Remote access model: the file stays on the server, and every request from the client to access the remote file is sent to the server. (b) Upload/download model: 1. the file is moved to the client; 2. accesses are done on the client; 3. when the client is done, the (possibly updated) file is returned to the server.]
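The contrast between the two models can be sketched in a few lines. This is a minimal illustration, not any real protocol; the `FileServer` class and its method names are made up for the example.

```python
class FileServer:
    def __init__(self):
        self.files = {"f": bytearray(b"abc")}

    # Remote access model: every operation is performed at the server.
    def read(self, name, offset, size):
        return bytes(self.files[name][offset:offset + size])

    def write(self, name, offset, data):
        self.files[name][offset:offset + len(data)] = data

    # Upload/download model: move the whole file to the client ...
    def download(self, name):
        return bytes(self.files[name])

    # ... and return it when the client is done.
    def upload(self, name, data):
        self.files[name] = bytearray(data)


server = FileServer()

# Remote access: each read/write crosses the network.
server.write("f", 3, b"d")
assert server.read("f", 0, 4) == b"abcd"

# Upload/download: all accesses happen on the client's local copy.
local = bytearray(server.download("f"))
local += b"e"                      # local accesses, no network traffic
server.upload("f", bytes(local))   # file returned on completion
assert server.read("f", 0, 5) == b"abcde"
```

The trade-off: remote access pays a network round trip per operation but always sees one copy; upload/download makes accesses cheap but introduces the consistency questions discussed later in this chapter.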
Example: NFS Architecture
NFS is implemented on top of the Virtual File System (VFS) abstraction, which by now has been adopted by many different operating systems.
[Figure: NFS architecture. On the client, the system call layer sits on top of the virtual file system (VFS) layer, which dispatches either to a local file system interface or to the NFS client; the NFS client talks through an RPC client stub across the network to the RPC server stub of the NFS server, which in turn accesses the server's local file system interface.]
Essence: VFS provides a standard file-system interface and hides the difference between accessing a local and a remote file system. Question: is NFS actually a file system?
NFS File Operations
Oper.     v3   v4   Description
Create    Yes  No   Create a regular file
Create    No   Yes  Create a nonregular file
Link      Yes  Yes  Create a hard link to a file
Symlink   Yes  No   Create a symbolic link to a file
Mkdir     Yes  No   Create a subdirectory
Mknod     Yes  No   Create a special file
Rename    Yes  Yes  Change the name of a file
Remove    Yes  Yes  Remove a file from a file system
Rmdir     Yes  No   Remove an empty subdirectory
Open      No   Yes  Open a file
Close     No   Yes  Close a file
Lookup    Yes  Yes  Look up a file by means of a name
Readdir   Yes  Yes  Read the entries in a directory
Readlink  Yes  Yes  Read the path name in a symbolic link
Getattr   Yes  Yes  Get the attribute values for a file
Setattr   Yes  Yes  Set one or more file-attribute values
Read      Yes  Yes  Read the data contained in a file
Write     Yes  Yes  Write data to a file
Cluster-Based File Systems
Observation: when dealing with very large data collections, a simple client-server approach is not going to work ⇒ to speed up file accesses, apply striping techniques by which files can be fetched in parallel.
[Figure: (a) a file-striped system, in which the blocks of each file (a through e) are spread across different servers so that a file can be fetched in parallel, versus (b) whole-file distribution, in which each server stores complete files.]
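A minimal sketch of the striped case: blocks are placed round-robin across servers and fetched concurrently. The server layout and function names here are illustrative, not taken from any real system.

```python
from concurrent.futures import ThreadPoolExecutor

# Blocks of one file, stored round-robin across three servers:
servers = [
    {0: b"AA", 3: b"DD"},   # server 0 holds blocks 0 and 3
    {1: b"BB", 4: b"EE"},   # server 1 holds blocks 1 and 4
    {2: b"CC"},             # server 2 holds block 2
]

def fetch_block(i):
    # Round-robin placement: block i lives on server i mod len(servers).
    return servers[i % len(servers)][i]

def read_striped(num_blocks):
    # Request all blocks in parallel, one per worker thread;
    # map() returns the results in block order.
    with ThreadPoolExecutor(max_workers=num_blocks) as pool:
        return b"".join(pool.map(fetch_block, range(num_blocks)))

assert read_striped(5) == b"AABBCCDDEE"
```

With b servers, the transfer time for a large file drops roughly by a factor b, at the price of needing all servers to be reachable.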
Example: Google File System
[Figure: GFS architecture. A GFS client sends (file name, chunk index) to the master and receives a contact address in return; the client then requests (chunk ID, byte range) directly from a chunk server, which returns the chunk data. The master exchanges only chunk-server state and instructions with the chunk servers, each of which stores its chunks in a local Linux file system.]
The Google solution: divide files into large 64 MB chunks, and distribute/replicate the chunks across many servers:
- The master maintains only a (file name, chunk server) table in main memory ⇒ minimal I/O
- Files are replicated using a primary-backup scheme; the master is kept out of the loop
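The lookup path can be sketched as follows. This is a toy model under stated assumptions: the class names and server addresses are hypothetical, and real GFS identifies chunks by 64-bit handles rather than by a Python dictionary key.

```python
CHUNK_SIZE = 64 * 2**20  # 64 MB chunks

class Master:
    """Keeps only a (file name, chunk index) -> chunk-server table in memory."""
    def __init__(self):
        self.table = {("/data/log", 0): ["cs1", "cs2", "cs3"],
                      ("/data/log", 1): ["cs2", "cs4", "cs5"]}

    def lookup(self, name, chunk_index):
        # The master returns contact addresses; it never ships file data.
        return self.table[(name, chunk_index)]

def client_read(master, name, offset):
    # The client computes the chunk index locally from the byte offset.
    chunk_index = offset // CHUNK_SIZE
    replicas = master.lookup(name, chunk_index)
    # The actual data transfer then goes directly to a chunk server,
    # keeping the master out of the loop.
    return replicas[0]

# Byte offset 70 MB falls in chunk 1 of the (hypothetical) file.
assert client_read(Master(), "/data/log", 70 * 2**20) == "cs2"
```

Because the table is small (one entry per chunk, not per byte), it fits in the master's main memory, which is what makes the single-master design workable.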
P2P-based File Systems
[Figure: layered organization of the Ivy file system. On each node, the Ivy file system layer sits on top of DHash, a block-oriented storage layer, which in turn runs on the Chord DHT layer; the nodes are connected through the network, and one of them is the node where a file system is rooted.]
Basic idea: store data blocks in the underlying P2P system:
- Every data block with content D is stored on the node responsible for the hash h(D); this allows for an integrity check.
- Public-key blocks are signed with the associated private key and looked up with the public key.
- A local log of file operations keeps track of (blockID, h(D)) pairs.
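The content-hash idea and its integrity check can be sketched as follows. SHA-256 stands in for the hash h, and a plain dictionary abstracts the Chord/DHash layers; the function names are made up for the example.

```python
import hashlib

dht = {}  # key -> block, abstracting the DHT's put/get

def h(data):
    return hashlib.sha256(data).hexdigest()

def put_block(data):
    # The block is stored under the hash of its own content.
    key = h(data)
    dht[key] = data
    return key

def get_block(key):
    data = dht[key]
    # Integrity check: recomputing the hash must yield the lookup key,
    # so a tampered block is detected immediately.
    assert h(data) == key, "block content does not match its key"
    return data

key = put_block(b"file data block")
assert get_block(key) == b"file data block"
```

Note the consequence: a content-hash block is immutable (changing the content changes the key), which is why mutable state lives in the signed public-key blocks and the per-user log instead.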
Distributed File Systems 11.3 Communication
RPCs in File Systems
Observation: many (traditional) distributed file systems deploy remote procedure calls to access files. When wide-area networks need to be crossed, alternatives are needed.
[Figure: (a) reading data from a file takes separate RPCs, one LOOKUP to look up the name and one READ to read the file data, each with its own client-server round trip. (b) The LOOKUP, OPEN, and READ operations are combined in a single compound procedure, saving round trips.]
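The compound idea, as found in NFSv4, can be sketched as a list of operations executed in order at the server in one round trip. The API below is hypothetical (real NFSv4 COMPOUND works on file handles and XDR-encoded arguments), and the file path is purely illustrative.

```python
class Server:
    def __init__(self):
        self.fs = {"/home/steen/mbox": b"mail data"}

    def compound(self, ops):
        """Execute a list of (op, arg) pairs in a single round trip."""
        results, current = [], None
        for op, arg in ops:
            if op == "LOOKUP":
                # Resolve the name; later ops apply to the current file.
                current = arg if arg in self.fs else None
            elif op == "OPEN":
                results.append(("OPEN", current is not None))
            elif op == "READ":
                results.append(("READ", self.fs[current]))
        return results

# One network round trip instead of three separate RPCs.
reply = Server().compound([("LOOKUP", "/home/steen/mbox"),
                           ("OPEN", None),
                           ("READ", None)])
assert reply == [("OPEN", True), ("READ", b"mail data")]
```

Over a wide-area link with high latency, collapsing k round trips into one is exactly what makes the difference.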
Example: RPCs in Coda
Observation: when dealing with replicated files, sending information to the replicas sequentially is not the way to go.
[Figure: invalidating cached copies at four clients. (a) Sequentially: the server sends an invalidate to one client and waits for the reply before contacting the next. (b) In parallel: the server sends all invalidations at once and collects the replies, taking roughly a single round-trip time in total.]
Note: in Coda, clients can cache files, but are informed when an update has been performed.
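The parallel variant can be sketched with a thread pool; the `invalidate` function stands in for an RPC to a caching client, and the names here are made up, not Coda's actual API.

```python
from concurrent.futures import ThreadPoolExecutor

def invalidate(client):
    # In reality this is an RPC round trip; here the client
    # simply acknowledges the invalidation.
    return f"{client}: invalidated"

clients = ["A", "B", "C", "D"]

# Sequential sending costs the sum of all round trips;
# issuing the calls in parallel costs roughly one round trip.
with ThreadPoolExecutor(max_workers=len(clients)) as pool:
    replies = list(pool.map(invalidate, clients))

assert replies == ["A: invalidated", "B: invalidated",
                   "C: invalidated", "D: invalidated"]
```

The server still waits for every reply before considering the update stable; only the waiting is overlapped.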
Distributed File Systems 11.5 Synchronization
File sharing semantics
Problem: when dealing with distributed file systems, we need to take into account the ordering of concurrent read/write operations and the semantics (i.e., consistency) that clients can expect.

[Figure: (a) on a single machine, when process A writes "c" to a file originally containing "ab", a subsequent read by process B returns "abc". (b) In a distributed file system, client machine #1 reads "ab" from the file server and then writes "c" to its local copy; a later read by client machine #2 may still return only "ab".]
Semantics:
- UNIX semantics: a read operation returns the effect of the last write operation ⇒ can only be implemented for remote access models in which there is only a single copy of the file.
- Transaction semantics: the file system supports transactions on a single file ⇒ the issue is how to allow concurrent access to a physically distributed file.
- Session semantics: the effects of read and write operations are seen only by the client that has opened (a local copy of) the file ⇒ what happens when the file is closed (only one client may actually win)?
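Session semantics, including the "only one client wins" effect, can be sketched in a few lines. The class and method names are illustrative only.

```python
class SessionFile:
    """Session semantics: work on a local copy; propagate it at close()."""
    def __init__(self, server, name):
        self.server, self.name = server, name
        self.local = server[name]        # open: take a local copy

    def append(self, data):
        self.local += data               # writes stay local to the session

    def close(self):
        # Propagate the whole local copy back; the last close wins.
        self.server[self.name] = self.local

server = {"f": b"ab"}
s1 = SessionFile(server, "f")            # client 1 opens f
s2 = SessionFile(server, "f")            # client 2 opens f concurrently
s1.append(b"c")
s2.append(b"d")
s1.close()
assert server["f"] == b"abc"             # s2 still works on its own copy
s2.close()
assert server["f"] == b"abd"             # last close wins; "c" is lost
```

The lost update at the end is not a bug in the sketch but exactly the behavior session semantics permits.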
Example: File sharing in Coda
Essence: Coda assumes transactional semantics, but without the full-fledged capabilities of real transactions. Note: transactional issues reappear in the form of "this ordering could have taken place."
[Figure: transactional behavior in Coda. Client A opens file f for reading (session S_A) while client B opens f for writing (session S_B); the server transfers f to both. When B closes its session, the server sends an invalidate to A.]
Distributed File Systems 11.6 Consistency and Replication
Consistency and replication
Observation: in modern distributed file systems, client-side caching is the preferred technique for attaining performance; server-side replication is done for fault tolerance.
Observation: clients are allowed to keep (large parts of) a file and are notified when control is withdrawn ⇒ servers are now generally stateful.
[Figure: file delegation. 1. The client asks for a file; 2. the server delegates the file by transferring a local copy to the client; 3. when another client needs the (old) file, the server recalls the delegation; 4. the client returns the updated file to the server.]
Example: Client-side caching in Coda
[Figure: client-side caching in Coda. Client A opens f for reading (session S_A), receives file f, and closes it. Client B then opens f for writing (session S_B), closes it, and the server sends A an invalidate (callback break). When A opens f for reading a second time before any such break, the server replies OK and no file transfer is needed.]
Note: by making use of transactional semantics, it becomes possible to further improve performance.
Example: Server-side replication in Coda
[Figure: server-side replication in Coda. Servers S1, S2, and S3 replicate a volume; a broken network partitions {S1, S2}, reachable by client A, from {S3}, reachable by client B.]
Main issue: ensure that concurrent updates are detected:
- Each client has an Accessible Volume Storage Group (AVSG): the subset of the actual VSG that the client can reach.
- Version vector: CVV_i(f)[j] = k ⇒ server S_i knows that S_j has seen version k of f.
- Example: A updates f ⇒ CVV at S1 = CVV at S2 = [1, 1, 0]; B updates f ⇒ CVV at S3 = [0, 0, 1].
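The conflict detection can be sketched directly from this example. Server names follow the scenario above; the helper functions are illustrative, not Coda's implementation.

```python
def dominates(v, w):
    """True if version vector v has seen at least everything w has."""
    return all(vi >= wi for vi, wi in zip(v, w))

def conflict(v, w):
    # Neither vector dominates the other: the updates were concurrent.
    return not dominates(v, w) and not dominates(w, v)

# Initially every server has seen version 0 of f.
cvv = {"S1": [0, 0, 0], "S2": [0, 0, 0], "S3": [0, 0, 0]}

# During the partition: A updates f through {S1, S2},
# while B updates f through {S3}.
for s in ("S1", "S2"):
    cvv[s] = [1, 1, 0]
cvv["S3"] = [0, 0, 1]

# When the partition heals, [1,1,0] and [0,0,1] are incomparable,
# so the concurrent updates are detected as a conflict.
assert conflict(cvv["S1"], cvv["S3"])
assert not conflict(cvv["S1"], cvv["S2"])
```

Had B made its update after seeing A's (e.g., [1, 1, 1] versus [1, 1, 0]), one vector would dominate and the versions could simply be reconciled by taking the newer one.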
Distributed File Systems 11.7 Fault Tolerance
Fault tolerance
Observation: fault tolerance is handled by simply replicating file servers, generally using a standard primary-backup protocol:
[Figure: primary-backup replication for item x, with one primary server and several backup servers, each holding its own data store.]

Write path:
W1. The client issues a write request.
W2. The request is forwarded to the primary for x.
W3. The primary tells the backups to update.
W4. The backups acknowledge the update.
W5. The primary acknowledges that the write has completed.

Read path:
R1. The client issues a read request (to any server).
R2. That server responds to the read.
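The W1-W5 write path can be sketched as follows; all class names and the string protocol are made up for illustration, and real protocols would of course use network RPCs rather than direct method calls.

```python
class Backup:
    def __init__(self):
        self.store = {}

    def update(self, key, value):          # W3: primary tells this backup
        self.store[key] = value
        return "ack"                       # W4: backup acknowledges

class Primary:
    def __init__(self, backups):
        self.store, self.backups = {}, backups

    def write(self, key, value):           # W2: request forwarded here
        self.store[key] = value
        # W3/W4: push the update to every backup and wait for all acks,
        # so the write is durable on all replicas before returning.
        acks = [b.update(key, value) for b in self.backups]
        assert all(a == "ack" for a in acks)
        return "write completed"           # W5: acknowledge to the client

backups = [Backup(), Backup()]
primary = Primary(backups)
assert primary.write("x", 42) == "write completed"   # W1 ... W5

# R1/R2: a read may go to any server, since all replicas now agree on x.
assert primary.store["x"] == 42
assert all(b.store["x"] == 42 for b in backups)
```

The blocking in W3/W4 is the price of this scheme: a write is as slow as the slowest backup, which is acceptable for file servers where reads dominate.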
High availability in P2P systems
Problem: there are many fully decentralized file-sharing systems, but because churn is high (i.e., nodes come and go all the time), we may face an availability problem ⇒ replicate files all over the place (replication factor: r_rep).
Alternative: apply erasure coding: partition a file F into m fragments, and recode them into a collection F* of n > m fragments.
Property: any m fragments from F* are sufficient to reconstruct F. Replication factor: r_ec = n/m.
Replication vs. erasure coding
Comparison: with an average node availability a, and required file unavailability ε, we have for erasure coding

    1 − ε = Σ_{i=m}^{r_ec·m} C(r_ec·m, i) · a^i · (1−a)^{r_ec·m − i}

and for file replication

    1 − ε = 1 − (1−a)^{r_rep}
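Both expressions are easy to evaluate numerically. The sketch below uses illustrative parameters (m = 5 fragments, node availability a = 0.9, storage overhead factor 2 for both schemes); the file becomes available iff at least m of the n = r_ec·m fragments are reachable.

```python
from math import comb

def avail_erasure(a, m, r_ec):
    # Erasure coding: n fragments, any m of which reconstruct the file.
    n = round(r_ec * m)
    # Binomial sum: probability that at least m of n fragments are up.
    return sum(comb(n, i) * a**i * (1 - a)**(n - i)
               for i in range(m, n + 1))

def avail_replication(a, r_rep):
    # Whole-file replication: the file is unavailable only if
    # every one of the r_rep replicas is down.
    return 1 - (1 - a)**r_rep

a = 0.9
# For the same storage overhead (factor 2), erasure coding gives the
# higher availability at this node availability.
assert avail_erasure(a, m=5, r_ec=2) > avail_replication(a, r_rep=2)
```

Intuitively, spreading the redundancy over many small fragments makes the scheme robust against losing any particular node, whereas replication wastes its budget on a few all-or-nothing copies.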
[Figure: replication factor required for a given availability, plotted against node availability a (horizontal axis, 0.2 to 1.0): r_rep for whole-file replication versus r_ec for erasure coding (vertical axis, roughly 1.4 to 2.2).]