Silberschatz and Galvin
Chapter 17 Distributed File Systems
CPSC 410--Richard Furuta 4/15/99

Distributed File Systems
• Naming and Transparency
• Remote File Access
• Stateful versus Stateless Service
• File Replication

Terminology
• Distributed file system (DFS): a distributed implementation of the classical time-sharing model of a file system, where multiple users share files and storage resources.
  – A DFS manages sets of dispersed storage devices.
  – Overall storage space managed by a DFS is composed of different, remotely located, smaller storage spaces.
  – There is usually a correspondence between constituent storage spaces and sets of files.

Terminology
• Service – software entity running on one or more machines and providing a particular type of function to a priori unknown clients.
• Server – service software running on a single machine.
• Client – process that can invoke a service using a set of operations that forms its client interface.
  – A client interface for a file service is formed by a set of primitive file operations (create, delete, read, write).
  – Client interface of a DFS should be transparent, i.e., not distinguish between local and remote files.
• Key performance measure: time to satisfy service requests

Naming and Transparency
• Naming – mapping between logical and physical objects.
  – Example: file names versus physical blocks of data stored on disk tracks
• Multilevel mapping – abstraction of a file that hides the details of how and where on the disk the file is actually stored.
• A transparent DFS hides the location where in the network the file is stored.
  – For a file being replicated in several sites, the mapping returns a set of the locations of this file’s replicas; both the existence of multiple copies and their location are hidden.

Naming Structures
• Location transparency – file name does not reveal the file’s physical storage location.
  – File name still denotes a specific, although hidden, set of physical disk blocks.
  – Convenient way to share data.
  – Can expose correspondence between component units and machines.

Naming Structures
• Location independence – file name does not need to be changed when the file’s physical storage location changes.
  – Better file abstraction.
  – Promotes sharing the storage space itself.
  – Separates the naming hierarchy from the storage-devices hierarchy.

Naming Structures
• Location independence can map the same file name to different locations at different times
• Location independence is a stronger property than location transparency
• However, most current DFSs provide location transparency but not file migration; hence location independence is not relevant to them
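
The name-to-location mapping described above can be sketched in a few lines. This is only an illustration under assumed names: the NameService class, the host names, and the paths are hypothetical and do not come from any particular DFS. It shows a name resolving to a set of replica locations, and a migration that changes the mapping while the file name stays the same.

```python
class NameService:
    """Hypothetical name-to-location map (illustrative only)."""

    def __init__(self):
        # file name -> set of (host, local_path) replica locations
        self._locations = {}

    def add_replica(self, name, host, path):
        self._locations.setdefault(name, set()).add((host, path))

    def migrate(self, name, old, new):
        """Move one replica; the user-visible name does not change."""
        replicas = self._locations[name]
        replicas.remove(old)
        replicas.add(new)

    def resolve(self, name):
        """Return the current replica set (normally hidden from users)."""
        return frozenset(self._locations[name])


ns = NameService()
name = "/projects/report.txt"  # assumed example name
ns.add_replica(name, "serverA", "/disk0/r1")
ns.add_replica(name, "serverB", "/disk3/r1")
before = ns.resolve(name)

# Location independence: same name, different location at a later time.
ns.migrate(name, ("serverA", "/disk0/r1"), ("serverC", "/disk7/r1"))
after = ns.resolve(name)
```

A system that offers only location transparency would stop at resolve(); the migrate() step is what location independence adds.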

Naming Structures
• Separation of name and location enables diskless clients
  – rely on servers to provide all files, including the operating system kernel
  – booting requires boot protocol, stored in ROM, and the kernel or boot code stored in a fixed location
  – diskless client advantages: lower cost (diminishing return with lower cost disks), less noise, easier to upgrade OS (update server copy)
  – diskless client disadvantages: added complexity of local protocols; performance loss resulting from use of network, rather than disk.

Naming Schemes
• Three main approaches to naming
  – host name, local name combination
  – attaching remote directories to local directories
  – single global name structure

Naming Schemes: host name/local name
• Files named by a combination of their host name and local name
• Guarantees a unique system-wide name
• Example (as in rcp): host:localname
  – dilbert:myfile.txt
  – dilbert:/etc/hosts

Naming Schemes: attach remote directory to local
• Gives the appearance of a coherent directory tree
• Automount feature
  – mounts occur on demand based on a table of mount points and file structure names
  – previously, remote directories had to be mounted in advance
  – examples include NFS
  – issues: what to do if remote directory is (or becomes) inaccessible? Which machines are allowed to mount directory?
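
The first two naming schemes can be sketched as follows. This is a simplified illustration, not how rcp or NFS actually implement name resolution: parse_host_local and MountTable are hypothetical helpers, and the mount points and export paths are assumed values. The mount table picks the longest matching mount point and rewrites the local path into a path on the remote host.

```python
def parse_host_local(name):
    """Split a 'host:localname' name into its two parts."""
    host, _, local = name.partition(":")
    return host, local


class MountTable:
    """Hypothetical table mapping local mount points to remote directories."""

    def __init__(self):
        # local mount point -> (host, remote directory)
        self._mounts = {}

    def mount(self, mount_point, host, remote_dir):
        self._mounts[mount_point.rstrip("/")] = (host, remote_dir.rstrip("/"))

    def resolve(self, path):
        """Return (host, remote_path), or (None, path) for a local file."""
        best = None
        for mp in self._mounts:
            if path == mp or path.startswith(mp + "/"):
                # Prefer the longest (most specific) matching mount point.
                if best is None or len(mp) > len(best):
                    best = mp
        if best is None:
            return None, path
        host, remote_dir = self._mounts[best]
        return host, remote_dir + path[len(best):]


table = MountTable()
table.mount("/usr/shared", "dilbert", "/export/shared")      # assumed mounts
table.mount("/usr/shared/docs", "wally", "/export/docs")

remote = table.resolve("/usr/shared/docs/a.txt")
local = table.resolve("/home/user/notes.txt")
named = parse_host_local("dilbert:/etc/hosts")
```

The sketch leaves out the issues raised on the slide: it does nothing when a remote host is unreachable and performs no check on which machines may mount a directory.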

Naming Schemes: total integration
• A single global name structure spans all the files in the system.
• If a server is unavailable, some arbitrary set of directories on different machines also becomes unavailable.
• Special files (e.g., device files and other machine-specific files) make true isomorphism difficult

Remote File Access
• Remote-service mechanism to satisfy user requests for access to remote files.
• Analogy between remote service in a DFS (perhaps implemented by RPC) and local service
  – remote service method analogous to performing a disk access for each access request
• Caching: improve performance by reducing both network traffic and disk I/O

Remote File Access: Caching
• Reduce network traffic by retaining recently accessed disk blocks in a cache, so that repeated accesses to the same information can be handled locally.
  – If needed data not already cached, a copy of data is brought from the server to the user.
  – Accesses are performed on the cached copy.
  – Replacement policy keeps cache size bounded.
  – Files identified with one master copy residing at the server machine, but copies of (parts of) the file are scattered in different caches.

Remote File Access: Caching
• Cache-consistency problem – keeping the cached copies consistent with the master file.
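
The caching steps above can be sketched as a small client-side block cache. The slides do not name a replacement policy, so least-recently-used (LRU) is an assumption here; BlockCache and fake_server are likewise hypothetical names, with fake_server standing in for a network fetch.

```python
from collections import OrderedDict


class BlockCache:
    """Hypothetical client cache of (file, block) -> data, LRU replacement."""

    def __init__(self, fetch_from_server, capacity):
        self._fetch = fetch_from_server
        self._capacity = capacity
        self._blocks = OrderedDict()  # least recently used entry first
        self.server_fetches = 0

    def read(self, file_name, block_no):
        key = (file_name, block_no)
        if key in self._blocks:
            # Hit: handled locally, mark the block as recently used.
            self._blocks.move_to_end(key)
            return self._blocks[key]
        # Miss: bring a copy from the server, then access the cached copy.
        data = self._fetch(file_name, block_no)
        self.server_fetches += 1
        self._blocks[key] = data
        if len(self._blocks) > self._capacity:
            # Replacement policy keeps the cache size bounded.
            self._blocks.popitem(last=False)
        return data


def fake_server(file_name, block_no):
    """Stand-in for the server holding the master copy (assumption)."""
    return f"{file_name}#{block_no}".encode()


cache = BlockCache(fake_server, capacity=2)
for block in (0, 0, 1, 2, 0):
    cache.read("a", block)
```

With a capacity of two blocks, the second read of block 0 is a hit, but block 0 is later evicted to make room for block 2 and must be fetched again.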

Remote File Access: Cache Location
• Cached data can be stored on disk or in memory.
• In practice, though, many are hybrids.
• Advantages of disk caches
  – More reliable.
  – Cached data kept on disk are still there during recovery and don’t need to be fetched again.

Remote File Access: Cache Location
• Advantages of main-memory caches:
  – Permit workstations to be diskless.
  – Data can be accessed more quickly.
  – Performance speedup in bigger memories.
  – Server caches (used to speed up disk I/O) are in main memory regardless of where user caches are located; using main-memory caches on the user machine as well permits a single caching mechanism for servers and users.

Remote File Access: Cache Update Policy
• Write-through – write data through to disk as soon as they are placed in any cache. Reliable, but poor performance.
• Delayed-write – modifications written to the cache and then written through to the server later. Write accesses complete quickly; some data may be overwritten before they are written back, and so need never be written at all.
  – Poor reliability; unwritten data will be lost if a user machine crashes
  – Variation – write modified data blocks when ejecting from client’s cache. However, some blocks may reside in cache a long time.
  – Variation – scan cache at regular intervals and flush blocks that have been modified since the last scan.
  – Variation – write-on-close, writes data back to the server when the file is closed. Best for files that are open for long periods and frequently modified.

Remote File Access: Consistency
• Is locally cached copy of the data consistent with the master copy?
• Client-initiated approach
  – Client initiates a validity check.
  – Server checks whether the local data are consistent with the master copy.
  – May load network and server.
• Server-initiated approach
  – Server records, for each client, the (parts of) files it caches.
  – When server detects a potential inconsistency, it must react (for example, notification)
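
The difference between the two update policies can be illustrated by counting writes that reach the server. This is a minimal sketch with hypothetical class names (Server, WriteThroughCache, DelayedWriteCache); the explicit flush() call stands in for any of the delayed-write variations (eviction, periodic scan, or write-on-close).

```python
class Server:
    """Stand-in for the server holding the master copy (assumption)."""

    def __init__(self):
        self.blocks = {}
        self.writes = 0

    def write(self, key, data):
        self.blocks[key] = data
        self.writes += 1


class WriteThroughCache:
    def __init__(self, server):
        self.server = server
        self.blocks = {}

    def write(self, key, data):
        self.blocks[key] = data
        self.server.write(key, data)  # every write goes to the server


class DelayedWriteCache:
    def __init__(self, server):
        self.server = server
        self.blocks = {}
        self.dirty = set()  # modified blocks not yet written back

    def write(self, key, data):
        self.blocks[key] = data  # completes locally, nothing sent yet
        self.dirty.add(key)

    def flush(self):
        """Write back only the latest contents of each dirty block."""
        for key in sorted(self.dirty):
            self.server.write(key, self.blocks[key])
        self.dirty.clear()


wt_server, dw_server = Server(), Server()
wt, dw = WriteThroughCache(wt_server), DelayedWriteCache(dw_server)
for contents in (b"v1", b"v2", b"v3"):
    wt.write(("f", 0), contents)
    dw.write(("f", 0), contents)

writes_before_flush = dw_server.writes
dw.flush()
```

The intermediate versions v1 and v2 never reach the delayed-write server. Had the client crashed before flush(), v3 would have been lost as well, which is the reliability cost noted on the slide.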

Remote File Access: Comparing Caching and Remote Service
• In caching, many remote accesses handled efficiently by the local cache; most remote accesses will be served as fast as local ones.
• Servers are contacted only occasionally in caching (rather than for each access).
  – Reduces server load and network traffic.
  – Enhances potential for scalability.
• Remote-service method handles every remote access across the network; penalty in network traffic, server load, and performance.

Remote File Access: Comparing Caching and Remote Service
• Total network overhead in transmitting big chunks of data (caching) is lower than a series of responses to specific requests (remote-service).
• Caching is superior in access patterns with infrequent writes.
• With frequent writes, substantial overhead incurred to overcome cache-consistency problem.
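
The consistency overhead mentioned in the last bullet can be made concrete with a sketch of a client-initiated validity check. Everything here is an assumption made for illustration: the version-number scheme, the VersionedServer and ValidatingClient classes, and the choice to validate on every read. Real systems choose their own validation frequency and protocol.

```python
class VersionedServer:
    """Hypothetical server that keeps a version number per file."""

    def __init__(self):
        self.data = {}      # name -> (version, contents)
        self.messages = 0   # each client request counts as one message

    def write(self, name, contents):
        self.messages += 1
        version = self.data.get(name, (0, None))[0] + 1
        self.data[name] = (version, contents)

    def version_of(self, name):
        self.messages += 1
        return self.data[name][0]

    def fetch(self, name):
        self.messages += 1
        return self.data[name]


class ValidatingClient:
    """Client that checks its cached copy against the server on each read."""

    def __init__(self, server):
        self.server = server
        self.cache = {}     # name -> (version, contents)
        self.refetches = 0

    def read(self, name):
        cached = self.cache.get(name)
        if cached is not None and cached[0] == self.server.version_of(name):
            return cached[1]  # cached copy is still valid
        self.cache[name] = self.server.fetch(name)
        self.refetches += 1
        return self.cache[name][1]


server = VersionedServer()
client = ValidatingClient(server)
server.write("f", "v1")
first = client.read("f")    # miss: fetched from the server
second = client.read("f")   # validated, served from the cache
server.write("f", "v2")     # another writer updates the master copy
third = client.read("f")    # validation fails, copy is fetched again
```

Each write by another client forces a refetch on the next read, so with frequent writes the cache saves little and the validity checks themselves add load on the network and server.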
