Distributed File Systems - PowerPoint PPT Presentation


  1. Distributed File Systems (Distributed Computing Systems, 4/4/2014)

     Distributed File Systems
     • Early networking and files
       – Had FTP to transfer files
       – Telnet to remote login to other systems with files
     • But want more transparency!
       – Local computing with remote file system
     • Distributed file systems: one of earliest distributed system components
     • Enables programs to access remote files as if local
       – Transparency
     • Allows sharing of data and programs
     • Performance and reliability comparable to local disk

     Outline
     • Overview (done)
     • Basic principles (next)
       – Concepts
       – Models
     • Network File System (NFS)
     • Andrew File System (AFS)
     • Dropbox

     Concepts of Distributed File System
     • Transparency
     • Concurrent Updates
     • Replication
     • Fault Tolerance
     • Consistency
     • Platform Independence
     • Security
     • Efficiency

  2. Transparency
     Illusion that all files are similar. Includes:
     • Access transparency: a single set of operations. Clients that work on local files can work with remote files.
     • Location transparency: clients see a uniform name space. Relocate without changing path names.
     • Mobility transparency: files can be moved without modifying programs or changing system tables
     • Performance transparency: within limits, local and remote file access meet performance standards
     • Scaling transparency: increased loads do not degrade performance significantly. Capacity can be expanded.

     Concurrent Updates
     • Changes to file from one client should not interfere with changes from other clients
       – Even if changes at same time
     • Solutions often include:
       – File or record-level locking

     Replication
     • File may have several copies of its data at different locations
       – Often for performance reasons
       – Requires updating other copies when one copy is changed
     • Simple solution
       – Change master copy and periodically refresh the other copies
     • More complicated solution
       – Multiple copies can be updated independently at same time; needs finer grained refresh and/or merge

     Fault Tolerance
     • Function when clients or servers fail
     • Detect, report, and correct faults that occur
     • Solutions often include:
       – Redundant copies of data, redundant hardware, backups, transaction logs and other measures
       – Stateless servers
       – Idempotent operations
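
The "idempotent operations" item under Fault Tolerance can be illustrated with a minimal Python sketch. The `ToyFile` class and `send_twice` helper are hypothetical names invented for this example, not part of any real file system API. The sketch contrasts a write at an explicit offset, which can be retried safely after a lost reply, with an append, which cannot.

```python
class ToyFile:
    """Hypothetical in-memory file, used only for illustration."""

    def __init__(self):
        self.data = bytearray()

    def write_at(self, offset, payload):
        # Idempotent: repeating the same request leaves the same bytes.
        end = offset + len(payload)
        if len(self.data) < end:
            self.data.extend(b"\0" * (end - len(self.data)))
        self.data[offset:end] = payload

    def append(self, payload):
        # Not idempotent: a retried request duplicates the payload.
        self.data.extend(payload)


def send_twice(operation, *args):
    """Simulate a client that retries because the first reply was lost."""
    operation(*args)
    operation(*args)


idempotent = ToyFile()
send_twice(idempotent.write_at, 0, b"hello")

duplicated = ToyFile()
send_twice(duplicated.append, b"hello")

print(bytes(idempotent.data))  # b'hello'
print(bytes(duplicated.data))  # b'hellohello'
```

This is one reason a server that only accepts offset-based requests can tolerate blind client retries.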

  3. Consistency
     • Data must always be complete, current, and correct
     • File seen by one process looks the same for all processes accessing
     • Consistency special concern whenever data is duplicated
     • Solutions often include:
       – Timestamps and ownership information

     Platform Independence
     • Access even though hardware and OS completely different in design, architecture and functioning, from different vendors
     • Solutions often include:
       – Well-defined way for clients to communicate with servers

     Security
     • File systems must be protected against unauthorized access, data corruption, loss and other threats
     • Solutions include:
       – Access control mechanisms (ownership, permissions)
       – Encryption of commands or data to prevent “sniffing”

     Efficiency
     • Overall, want same power and generality as local file systems
     • Early days, goal was to share “expensive” resource: the disk
     • Now, allow convenient access to remotely stored files
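
One way the "timestamps" solution under Consistency might work is sketched below, assuming a logical modification counter stands in for a real timestamp. `Server`, `CachingClient`, and the file name are hypothetical and exist only for this example. The client trusts its cached copy only while the server reports the same modification time.

```python
class Server:
    """Hypothetical server storing contents plus a logical modification time."""

    def __init__(self):
        self.files = {}  # name -> (mtime, data)
        self.clock = 0

    def write(self, name, data):
        self.clock += 1  # logical timestamp, assumed monotonic
        self.files[name] = (self.clock, data)

    def get_mtime(self, name):
        return self.files[name][0]

    def read(self, name):
        return self.files[name]


class CachingClient:
    def __init__(self, server):
        self.server = server
        self.cache = {}  # name -> (mtime, data)

    def read(self, name):
        cached = self.cache.get(name)
        # Cheap check: compare timestamps before trusting the cached copy.
        if cached is not None and cached[0] == self.server.get_mtime(name):
            return cached[1]
        self.cache[name] = self.server.read(name)
        return self.cache[name][1]


server = Server()
server.write("notes.txt", b"v1")
client = CachingClient(server)
first = client.read("notes.txt")    # fetched from the server
server.write("notes.txt", b"v2")    # some other client updates the file
second = client.read("notes.txt")   # stale timestamp forces a refetch
```

The check still costs one round trip per read, so it trades latency for freshness.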

  4. Outline
     • Overview (done)
     • Basic principles (next)
       – Concepts
       – Models
     • Network File System (NFS)
     • Andrew File System (AFS)
     • Dropbox

     File Service Models

     Upload/Download Model
     • Read file: copy file from server to client
     • Write file: copy file from client to server
     • Good
       – Simple
     • Bad
       – Wasteful: what if client only needs small piece?
       – Problematic: what if client doesn’t have enough space?
       – Consistency: what if others need to modify file?

     Remote Access Model
     • File service provides functional interface
       – Create, delete, read bytes, write bytes, …
     • Good
       – Client only gets what’s needed
       – Server can manage coherent view of file system
     • Bad
       – Possible server and network congestion
         • Servers used for duration of access
         • Same data may be requested repeatedly

     Semantics of File Service

     Sequential Semantics
     Read returns result of last write
     • Easily achieved if
       – Only one server
       – Clients do not cache data
     • But
       – Performance problems if no cache
       – Can instead write-through
         • Must notify clients holding copies
         • Requires extra state, generates extra traffic

     Session Semantics
     Relax sequential rules
     • Changes to open file are initially visible only to process that modified it
     • Last process to modify file “wins”
     • Can hide or lock file under modification from other clients

     Accessing Remote Files (1 of 2)
     • For transparency, implement client as module under VFS
     (Additional picture next slide)
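
The session semantics described above, combined with the upload/download model, can be sketched as follows. This is a toy illustration under stated assumptions: `SessionServer` and `Session` are hypothetical names, open is modeled as a download of a private copy, and close as an upload of that copy.

```python
class SessionServer:
    """Hypothetical server holding the authoritative copy of each file."""

    def __init__(self):
        self.files = {}

    def download(self, name):
        return self.files.get(name, b"")

    def upload(self, name, data):
        self.files[name] = data


class Session:
    """Open downloads a private copy; close uploads it."""

    def __init__(self, server, name):
        self.server = server
        self.name = name
        self.local = bytearray(server.download(name))

    def write(self, payload):
        self.local.extend(payload)  # visible only to this session

    def close(self):
        self.server.upload(self.name, bytes(self.local))


server = SessionServer()
server.upload("log", b"base;")

alice = Session(server, "log")
bob = Session(server, "log")
alice.write(b"from-alice;")
bob.write(b"from-bob;")

visible_before_close = server.download("log")  # neither change is visible yet
alice.close()
bob.close()  # last close wins; alice's change is overwritten
final = server.download("log")
```

Here `final` holds only Bob's change, which is the "last process to modify file wins" behavior.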

  5. Accessing Remote Files (2 of 2)
     Virtual file system allows for transparency

     Stateful or Stateless Design

     Stateful
     Server maintains client-specific state
     • Shorter requests
     • Better performance in processing requests
     • Cache coherence possible
       – Server can know who’s accessing what
     • File locking possible

     Stateless
     Server maintains no information on client accesses
     • Each request must identify file and offsets
     • Server can crash and recover
       – No state to lose
     • No open/close needed
       – They only establish state
     • No server space used for state
       – Don’t worry about supporting many clients
     • Problems if file is deleted on server
     • File locking not possible

     Caching
     • Hide latency to improve performance for repeated accesses
     • Four places:
       – Server’s disk
       – Server’s buffer cache (memory)
       – Client’s buffer cache (memory)
       – Client’s disk
     • Client caches risk cache consistency problems

     Concepts of Caching (1 of 2)

     Centralized control
     • Keep track of who has what open and cached on each node
     • Stateful file system with signaling traffic

     Read-ahead (pre-fetch)
     • Request chunks of data before needed
     • Minimize wait when actually needed
     • But what if data pre-fetched is out of date?
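
The stateful versus stateless trade-off above can be sketched in a few lines of Python. Both server classes are hypothetical toys, and `crash()` is an assumed stand-in for a restart that loses in-memory state. The stateless request carries the file name and offset every time; the stateful one relies on a server-side file position.

```python
class StatelessServer:
    """Hypothetical server: every request names the file and the offset."""

    def __init__(self, files):
        self.files = files

    def read(self, name, offset, count):
        return self.files[name][offset:offset + count]


class StatefulServer:
    """Hypothetical server: open() creates per-client state (a file position)."""

    def __init__(self, files):
        self.files = files
        self.open_files = {}  # descriptor -> [name, position]
        self.next_fd = 0

    def open(self, name):
        fd = self.next_fd
        self.next_fd += 1
        self.open_files[fd] = [name, 0]
        return fd

    def read(self, fd, count):
        name, pos = self.open_files[fd]  # KeyError if the state is gone
        chunk = self.files[name][pos:pos + count]
        self.open_files[fd][1] = pos + len(chunk)
        return chunk  # shorter request: no name or offset needed

    def crash(self):
        self.open_files.clear()  # in-memory state is lost on restart


files = {"a.txt": b"abcdefgh"}

stateless = StatelessServer(files)
part1 = stateless.read("a.txt", 0, 4)
stateless = StatelessServer(files)  # restart: nothing to recover
part2 = stateless.read("a.txt", 4, 4)

stateful = StatefulServer(files)
fd = stateful.open("a.txt")
first = stateful.read(fd, 4)
stateful.crash()
try:
    stateful.read(fd, 4)
    survived = True
except KeyError:
    survived = False
```

After the simulated crash the stateful client holds a descriptor the server no longer knows, while the stateless client simply repeats a self-describing request.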

  6. Concepts of Caching (2 of 2)

     Write-through
     • All writes to file sent to server
       – What if another client reads its own (out-of-date) cached copy?
     • All accesses require checking with server
     • Or … server maintains state and sends invalidations

     Delayed writes (write-behind)
     • Only send writes to files in batch mode (i.e., buffer locally)
     • One bulk write is more efficient than lots of little writes
     • Problem: semantics become ambiguous
       – Watch out for consistency: others won’t see updates!

     Write on close
     • Only allows session semantics
     • If lock, must lock whole file

     Outline
     • Overview (done)
     • Basic principles (done)
     • Network File System (NFS) (next)
     • Andrew File System (AFS)
     • Dropbox

     Network File System (NFS)
     • Introduced in 1984 (by Sun Microsystems)
     • Not first made, but first to be used as product
     • Made interfaces in public domain
       – Allowed other vendors to produce implementations
     • NFS protocol (version 3) is published as an Internet RFC
       – RFC 1813
     • Still widely deployed; versions go up to v4, but v4 is maybe too bloated, so v3 is widely used

     NFS Overview
     • Provides transparent access to remote files
       – Independent of OS (e.g., Mac, Linux, Windows) or hardware
     • Symmetric: any computer can be server and client
       – But many institutions have dedicated server
     • Export some or all files
     • Must support diskless clients
     • Recovery from failure
       – Stateless, UDP, client retries
     • High performance
       – Caching and read-ahead
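
The difference between write-through and delayed writes can be sketched as below. All class names are hypothetical, and the "server" is just an object that counts the write messages it receives; this is an illustration of the trade-off, not how any particular file system implements it.

```python
class CountingServer:
    """Hypothetical server that counts how many write messages it receives."""

    def __init__(self):
        self.blocks = {}
        self.messages = 0

    def write_many(self, updates):
        self.messages += 1
        self.blocks.update(updates)


class WriteThroughCache:
    def __init__(self, server):
        self.server = server
        self.blocks = {}

    def write(self, block_no, data):
        self.blocks[block_no] = data
        self.server.write_many({block_no: data})  # every write goes out now


class WriteBehindCache:
    def __init__(self, server):
        self.server = server
        self.blocks = {}
        self.dirty = {}

    def write(self, block_no, data):
        self.blocks[block_no] = data
        self.dirty[block_no] = data  # buffered locally; server is stale

    def flush(self):
        if self.dirty:
            self.server.write_many(self.dirty)  # one bulk message
            self.dirty = {}


through_server = CountingServer()
through = WriteThroughCache(through_server)
behind_server = CountingServer()
behind = WriteBehindCache(behind_server)

for i in range(3):
    through.write(i, b"x")
    behind.write(i, b"x")

stale_view = dict(behind_server.blocks)  # what other clients see before flush
behind.flush()
```

Three writes cost three messages with write-through and one with write-behind, but until `flush()` runs the server (and therefore every other client) sees none of the delayed updates.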

  7. Underlying Transport Protocol
     • Initially NFS ran over UDP using Sun RPC
     • Why UDP?
       – Slightly faster than TCP
       – No connection to maintain (or lose)
       – NFS is designed for Ethernet LAN
         • Relatively reliable
       – Error detection but no correction
         • NFS retries requests

     NFS Protocols
     • Since clients and servers can be implemented for different platforms, need well-defined way to communicate: a protocol
       – Protocol: agreed upon set of requests and responses between client and servers
     • Once agreed upon, an Apple-implemented Mac NFS client can talk to a Sun-implemented Solaris NFS server
     • NFS has two main protocols
       – Mounting Protocol: request access to exported directory tree
       – Directory and File Access Protocol: access files and directories (read, write, mkdir, readdir, …)

     NFS Architecture
     • In many cases, on same LAN, but not required
       – Can even have client-server on same machine
     • Directories available on server through /etc/exports
       – When client mounts, becomes part of directory hierarchy
     [Figure: directory trees of Server 1 (root, export, people), Client (root, vmunix, usr, students, staff), and Server 2 (root, nfs, users), with two remote mounts]
     File system mounted at /usr/students is sub-tree located at /export/people in Server 1, and file system mounted at /usr/staff is sub-tree located at /nfs/users in Server 2

     NFS Mounting Protocol
     • Request permission to access contents at pathname
     • Client
       – Parses pathname
       – Contacts server for file handle
     • Server
       – Returns file handle: file device #, i-node #, instance #
     • Client
       – Create in-memory VFS i-node at mount point
       – Internally point to r-node for remote files
         • Client keeps state, not server
     • Soft-mounted: if client access fails, throw error to processes. But many do not handle file errors well
     • Hard-mounted: client blocks processes, retries until server up (can cause problems when NFS server down)
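
The file handle and the soft versus hard mount behavior can be sketched roughly as follows. This is a toy model, not the real NFS mount protocol: `FlakyServer`, `soft_call`, `hard_call`, the `max_tries` parameter, and the handle values are all assumptions made for illustration. Only the three handle fields and the `/export/people` path come from the slides.

```python
from collections import namedtuple

# Fields follow the slide: file device #, i-node #, instance #.
FileHandle = namedtuple("FileHandle", ["device", "inode", "instance"])


class FlakyServer:
    """Hypothetical server that is unreachable for the first `down_for` calls."""

    def __init__(self, down_for):
        self.down_for = down_for

    def mount(self, path):
        if self.down_for > 0:
            self.down_for -= 1
            raise TimeoutError("server not responding")
        return FileHandle(device=1, inode=2, instance=0)  # made-up values


def soft_call(server, path, max_tries):
    """Soft mount: give up after a bounded number of tries and raise."""
    for _ in range(max_tries):
        try:
            return server.mount(path)
        except TimeoutError:
            continue
    raise OSError("soft mount: giving up on " + path)


def hard_call(server, path):
    """Hard mount: keep retrying until the server answers."""
    while True:
        try:
            return server.mount(path)
        except TimeoutError:
            continue  # a real client would wait between retries


handle = hard_call(FlakyServer(down_for=3), "/export/people")

try:
    soft_call(FlakyServer(down_for=3), "/export/people", max_tries=2)
    soft_failed = False
except OSError:
    soft_failed = True
```

With the server down for three calls, the hard-mounted caller eventually gets a handle, while the soft-mounted caller surfaces an error after two tries. If the server never came back, `hard_call` would loop forever, which is the "can cause problems when NFS server down" case.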
