4/4/2014

Distributed Computing Systems

Distributed File Systems
• Early networking and files
  – Had FTP to transfer files
  – Telnet to remote login to other systems with files
• But want more transparency!
  – Local computing with remote file system
• Distributed file systems: one of the earliest distributed system components

Distributed File Systems
• Enable programs to access remote files as if local
  – Transparency
• Allow sharing of data and programs
• Performance and reliability comparable to local disk

Outline
• Overview (done)
• Basic principles (next)
  – Concepts
  – Models
• Network File System (NFS)
• Andrew File System (AFS)
• Dropbox

Concepts of Distributed File Systems
• Transparency
• Concurrent Updates
• Replication
• Fault Tolerance
• Consistency
• Platform Independence
• Security
• Efficiency
Transparency
Illusion that all files are similar. Includes:
• Access transparency – a single set of operations; clients that work on local files can work with remote files
• Location transparency – clients see a uniform name space; files can be relocated without changing path names
• Mobility transparency – files can be moved without modifying programs or changing system tables
• Performance transparency – within limits, local and remote file access meet performance standards
• Scaling transparency – increased loads do not degrade performance significantly; capacity can be expanded

Concurrent Updates
• Changes to a file from one client should not interfere with changes from other clients
  – Even if the changes happen at the same time
• Solutions often include:
  – File- or record-level locking

Replication
• A file may have several copies of its data at different locations
  – Often for performance reasons
  – Requires updating the other copies when one copy is changed
• Simple solution
  – Change the master copy and periodically refresh the other copies
• More complicated solution
  – Multiple copies can be updated independently at the same time; needs finer-grained refresh and/or merge

Fault Tolerance
• Function when clients or servers fail
• Detect, report, and correct faults that occur
• Solutions often include:
  – Redundant copies of data, redundant hardware, backups, transaction logs, and other measures
  – Stateless servers
  – Idempotent operations
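The file-level locking mentioned under Concurrent Updates can be sketched with POSIX advisory locks. This is a local illustration, not any particular DFS protocol; the function name and file path are invented for the example.

```python
import fcntl
import os
import tempfile

def locked_append(path, data):
    """Append to a shared file while holding an exclusive advisory lock,
    so concurrent writers cannot interleave their updates."""
    with open(path, "a") as f:
        fcntl.flock(f, fcntl.LOCK_EX)      # block until we own the lock
        try:
            f.write(data)
            f.flush()
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)  # release for other clients

# Two "clients" updating the same file in turn.
path = os.path.join(tempfile.mkdtemp(), "shared.log")
locked_append(path, "client-A\n")
locked_append(path, "client-B\n")
print(open(path).read())
```

Record-level locking works the same way in principle, but locks a byte range (e.g. via `fcntl.lockf` with a length) instead of the whole file.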
Consistency
• Data must always be complete, current, and correct
• A file seen by one process looks the same to all processes accessing it
• Consistency is a special concern whenever data is duplicated
• Solutions often include:
  – Timestamps and ownership information

Platform Independence
• Access even though hardware and OS are completely different in design, architecture, and functioning, from different vendors
• Solutions often include:
  – A well-defined way for clients to communicate with servers

Security
• File systems must be protected against unauthorized access, data corruption, loss, and other threats
• Solutions include:
  – Access control mechanisms (ownership, permissions)
  – Encryption of commands or data to prevent "sniffing"

Efficiency
• Overall, want the same power and generality as local file systems
• In the early days, the goal was to share an "expensive" resource: the disk
• Now, allow convenient access to remotely stored files
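The ownership-and-permissions style of access control mentioned above can be illustrated with the standard `os`/`stat` modules; this is a local Unix sketch of the idea, not a remote server check.

```python
import os
import stat
import tempfile

# Create a file and restrict it to owner read/write only (mode 0o600),
# so other users (including DFS clients mapped to them) are denied.
fd, path = tempfile.mkstemp()
os.close(fd)
os.chmod(path, 0o600)

mode = os.stat(path).st_mode
print(stat.filemode(mode))        # rendered permission string
print(bool(mode & stat.S_IRGRP))  # can the group read it?
```

A DFS server performs the same kind of bit check on each request, using the identity the client's request carries.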
Outline
• Overview (done)
• Basic principles (next)
  – Concepts
  – Models
• Network File System (NFS)
• Andrew File System (AFS)
• Dropbox

File Service Models
Upload/Download Model
• Read file: copy the file from server to client
• Write file: copy the file from client to server
• Good
  – Simple
• Bad
  – Wasteful – what if the client only needs a small piece?
  – Problematic – what if the client doesn't have enough space?
  – Consistency – what if others need to modify the file?

Remote Access Model
• File service provides a functional interface
  – Create, delete, read bytes, write bytes, …
• Good
  – Client only gets what's needed
  – Server can manage a coherent view of the file system
• Bad
  – Possible server and network congestion
  – Servers used for the duration of access
  – Same data may be requested repeatedly

Semantics of File Service
Sequential Semantics
• Read returns the result of the last write
• Easily achieved if
  – Only one server
  – Clients do not cache data
• But
  – Performance problems if no cache
  – Can instead write-through
    • Must notify clients holding copies
    • Requires extra state, generates extra traffic

Session Semantics
• Relax the sequential rules
• Changes to an open file are initially visible only to the process that modified it
• Last process to modify the file "wins"
• Can hide or lock a file under modification from other clients

Accessing Remote Files (1 of 2)
• For transparency, implement the client as a module under VFS
(Additional picture next slide)
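The remote access model's functional interface (create, delete, read bytes, write bytes) can be sketched as a toy in-memory server; the class and method names here are invented for illustration, not any real protocol.

```python
class FileServer:
    """Toy remote-access file service: clients operate on byte ranges
    instead of transferring whole files (the upload/download model)."""

    def __init__(self):
        self.files = {}  # name -> bytearray

    def create(self, name):
        self.files[name] = bytearray()

    def delete(self, name):
        del self.files[name]

    def write_bytes(self, name, offset, data):
        buf = self.files[name]
        if len(buf) < offset + len(data):          # grow file if needed
            buf.extend(b"\x00" * (offset + len(data) - len(buf)))
        buf[offset:offset + len(data)] = data

    def read_bytes(self, name, offset, count):
        return bytes(self.files[name][offset:offset + count])

server = FileServer()
server.create("notes.txt")
server.write_bytes("notes.txt", 0, b"hello world")
print(server.read_bytes("notes.txt", 6, 5))  # fetch only the piece needed
```

Note how a read names its range explicitly: the client never needs space for the whole file, which is exactly the trade the remote access model makes against upload/download.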
Accessing Remote Files (2 of 2)
• Virtual file system allows for transparency

Stateful or Stateless Design
Stateful – server maintains client-specific state
• Shorter requests
• Better performance in processing requests
• Cache coherence possible
  – Server can know who's accessing what
• File locking possible

Stateless – server maintains no information on client accesses
• Each request must identify the file and offsets
• Server can crash and recover
  – No state to lose
• No open/close needed
  – They only establish state
• No server space used for state
  – Don't worry about supporting many clients
• Problems if a file is deleted on the server
• File locking not possible

Caching
• Hide latency to improve performance for repeated accesses
• Four places:
  – Server's disk
  – Server's buffer cache (memory)
  – Client's buffer cache (memory)
  – Client's disk
• Client caches risk cache consistency problems

Concepts of Caching (1 of 2)
Centralized control
• Keep track of who has what open and cached on each node
• Stateful file system with signaling traffic

Read-ahead (pre-fetch)
• Request chunks of data before needed
• Minimize wait when actually needed
• But what if the pre-fetched data is out of date?
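A stateless request carries everything the server needs (file identity, offset, count), which also makes reads idempotent: a retried request returns the same bytes. `os.pread` is a reasonable local analogue, since it takes an explicit offset rather than relying on shared file-position state.

```python
import os
import tempfile

# A file standing in for server-side data.
path = os.path.join(tempfile.mkdtemp(), "data.bin")
with open(path, "wb") as f:
    f.write(b"abcdefghij")

fd = os.open(path, os.O_RDONLY)
try:
    # Each request names the offset and length explicitly, so repeating
    # it (say, after a lost reply) returns exactly the same bytes.
    first = os.pread(fd, 4, 2)   # 4 bytes starting at offset 2
    again = os.pread(fd, 4, 2)   # the client's retry
    print(first, first == again)
finally:
    os.close(fd)
```

Contrast this with `os.read`, which advances a per-descriptor position: that position is exactly the kind of server-side state a stateless design avoids.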
Concepts of Caching (2 of 2)
Write-through
• All writes to a file are sent to the server
  – What if another client reads its own (out-of-date) cached copy?
• All accesses require checking with the server
• Or … the server maintains state and sends invalidations

Delayed writes (write-behind)
• Only send writes to files in batch mode (i.e., buffer locally)
• One bulk write is more efficient than lots of little writes
• Problem: semantics become ambiguous
  – Watch out for consistency – others won't see updates!

Write on close
• Only allows session semantics
• If locking, must lock the whole file

Outline
• Overview (done)
• Basic principles (done)
• Network File System (NFS) (next)
• Andrew File System (AFS)
• Dropbox

Network File System (NFS)
• Introduced in 1984 (by Sun Microsystems)
• Not the first made, but the first to be used as a product
• Put its interfaces in the public domain
  – Allowed other vendors to produce implementations
• Internet standard is the NFS protocol (version 3)
  – RFC 1813
• Still widely deployed; up to v4, but maybe too bloated, so v3 is widely used

NFS Overview
• Provides transparent access to remote files
  – Independent of OS (e.g., Mac, Linux, Windows) or hardware
• Symmetric – any computer can be both server and client
  – But many institutions have a dedicated server
• Export some or all files
• Must support diskless clients
• Recovery from failure
  – Stateless, UDP, client retries
• High performance
  – Caching and read-ahead
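The write-through versus write-behind trade-off above can be sketched with a toy client cache; the class is invented for illustration, and the count of server round-trips is the point of the sketch.

```python
class CachingClient:
    """Toy client cache contrasting write-through with write-behind.
    'server' is just a dict standing in for the remote file server."""

    def __init__(self, server, write_through=True):
        self.server = server
        self.write_through = write_through
        self.dirty = {}       # locally buffered (delayed) writes
        self.round_trips = 0  # messages sent to the server

    def write(self, name, data):
        if self.write_through:
            self.server[name] = data  # every write goes to the server
            self.round_trips += 1
        else:
            self.dirty[name] = data   # buffer locally; others can't see it!

    def close(self):
        if self.dirty:                # one bulk flush on close
            self.server.update(self.dirty)
            self.round_trips += 1
        self.dirty.clear()

server = {}
wt = CachingClient(server, write_through=True)
for i in range(5):
    wt.write("f", f"v{i}")            # 5 round-trips, always consistent
wb = CachingClient(server, write_through=False)
for i in range(5):
    wb.write("g", f"v{i}")            # 0 round-trips so far
wb.close()                            # 1 round-trip, updates visible late
print(wt.round_trips, wb.round_trips)
```

Write-behind is cheaper on the wire but, until `close()`, other clients read stale data, which is exactly the ambiguous-semantics problem the slide warns about.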
Underlying Transport Protocol
• Initially NFS ran over UDP using Sun RPC
• Why UDP?
  – Slightly faster than TCP
  – No connection to maintain (or lose)
  – NFS is designed for an Ethernet LAN
    • Relatively reliable
    • Error detection but no correction
• NFS retries requests

NFS Protocols
• Since clients and servers can be implemented for different platforms, they need a well-defined way to communicate: a protocol
  – Protocol – an agreed-upon set of requests and responses between clients and servers
  – Once agreed upon, an Apple-implemented Mac NFS client can talk to a Sun-implemented Solaris NFS server
• NFS has two main protocols
  – Mounting Protocol: request access to an exported directory tree
  – Directory and File Access Protocol: access files and directories (read, write, mkdir, readdir, …)

NFS Architecture
• In many cases client and server are on the same LAN, but this is not required
  – Can even have client and server on the same machine
• Directories are made available on the server through /etc/exports
  – When a client mounts one, it becomes part of the client's directory hierarchy
[Figure: directory trees of Server 1, the Client, and Server 2, with the client's /usr/students and /usr/staff remote-mounted from the two servers]
• The file system mounted at /usr/students is the sub-tree located at /export/people on Server 1, and the file system mounted at /usr/staff is the sub-tree located at /nfs/users on Server 2

NFS Mounting Protocol
• Request permission to access contents at a pathname
• Client
  – Parses the pathname
  – Contacts the server for a file handle
• Server
  – Returns the file handle: file device #, i-node #, instance #
• Client
  – Creates an in-memory VFS i-node at the mount point
  – Internally points to an r-node for remote files
• The client keeps state, not the server
• Soft-mounted – if client access fails, throw an error to processes; but many processes do not handle file errors well
• Hard-mounted – the client blocks processes and retries until the server is up (can cause problems when the NFS server is down)
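The file handle returned by the mounting protocol identifies a file by server-side numbers rather than by pathname. `os.stat` exposes local analogues of the device and i-node numbers; the instance (generation) number is not visible from user space, so it is faked here as an assumption.

```python
import os
import tempfile

def make_handle(path, generation=0):
    """Build a toy NFS-style file handle: (device #, i-node #, instance #).
    The generation argument is illustrative; a real server tracks it so a
    stale handle to a deleted-and-recreated file can be detected."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino, generation)

fd, path = tempfile.mkstemp()
os.close(fd)
h1 = make_handle(path)
h2 = make_handle(path)
print(h1 == h2)  # same file -> same handle, with no server state kept
```

Because the handle is self-describing, a stateless server can satisfy any later request that presents it, without remembering which client opened what.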