Hard State Revisited: Network Filesystems
Jeff Chase
CPS 212, Fall 2000

Network File System (NFS)
[Figure: NFS architecture. Client side: user programs, syscall layer, VFS, NFS client, *FS. Server side: syscall layer, VFS, NFS server, *FS. Client and server communicate by RPC over UDP or TCP.]

NFS Vnodes
The NFS protocol has an operation type for (almost) every vnode operation, with similar arguments/results.
    struct nfsnode* np = VTONFS(vp);
[Figure labels: syscall layer, VFS, nfs_vnodeops, NFS client stubs, nfsnode, RPC, network, NFS server, *FS]
The nfsnode holds client state needed to interact with the server to operate on the file.

File Handles
Question: how does the client tell the server which file or directory the operation applies to?
• Similarly, how does the server return the result of a lookup?
  More generally, how to pass a pointer or an object reference as an argument/result of an RPC call?
In NFS, the reference is a file handle or fhandle, a token/ticket whose value is determined by the server.
• Includes all information needed to identify the file/object on the server, and find it quickly.
  [Figure: fhandle fields: volume ID | inode # | generation #]
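The three fhandle fields can be illustrated with a small sketch. It assumes a toy in-memory server; the names FileHandle, Volume, allocate, free, resolve, and StaleHandle are hypothetical and not part of NFS. The point shown is one common use of the generation number: when an inode number is reused for a new file, handles to the old file are rejected rather than silently resolving to the new one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileHandle:
    """Opaque to the client; only the server interprets the fields."""
    volume_id: int
    inode: int
    generation: int


class StaleHandle(Exception):
    pass


class Volume:
    """Toy server-side volume (hypothetical, for illustration only)."""

    def __init__(self, volume_id):
        self.volume_id = volume_id
        self.generation = {}  # inode number -> current generation
        self.data = {}        # inode number -> contents of live files

    def allocate(self, inode, contents):
        # Bump the generation each time the inode number is (re)used.
        self.generation[inode] = self.generation.get(inode, 0) + 1
        self.data[inode] = contents
        return FileHandle(self.volume_id, inode, self.generation[inode])

    def free(self, inode):
        del self.data[inode]

    def resolve(self, fh):
        # The handle alone is enough to find the file, or to detect staleness.
        if (fh.volume_id != self.volume_id
                or fh.inode not in self.data
                or self.generation[fh.inode] != fh.generation):
            raise StaleHandle(fh)
        return self.data[fh.inode]
```

Because the handle carries everything needed to locate the file, the server keeps no per-client table mapping handles to files.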

NFS: From Concept to Implementation
Now that we understand the basics, how do we make it fast?
• caching
  data blocks
  file attributes
  lookup cache (dnlc): name->fhandle mappings
  directory contents?
• read-ahead and write-behind
  file I/O at wire speed
And of course we want the full range of other desirable “*ility” properties....

NFS as a “Stateless” Service
A classical NFS server maintains no in-memory hard state.
The only hard state is the stable file system image on disk.
• no record of clients or open files
• no implicit arguments to requests
  E.g., no server-maintained file offsets: read and write requests must explicitly transmit the byte offset for each operation.
• no write-back caching on the server
• no record of recently processed requests
• etc., etc....
Statelessness makes failure recovery simple and efficient.
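A minimal sketch of "no implicit arguments", assuming a toy in-memory server. StatelessServer and ClientFile are hypothetical names, and the dictionary stands in for the on-disk file system image. Every request names the file handle and the byte offset, so the current file position lives only on the client.

```python
class StatelessServer:
    """Keeps only the 'disk image'; no open-file table, no offsets."""

    def __init__(self):
        self.files = {}  # fhandle -> bytearray (stand-in for stable storage)

    def write(self, fh, offset, data):
        buf = self.files.setdefault(fh, bytearray())
        if len(buf) < offset:
            buf.extend(b"\0" * (offset - len(buf)))  # zero-fill any hole
        buf[offset:offset + len(data)] = data
        return len(data)

    def read(self, fh, offset, count):
        return bytes(self.files.get(fh, b"")[offset:offset + count])


class ClientFile:
    """Client-side open file: the offset is client state."""

    def __init__(self, server, fh):
        self.server, self.fh, self.offset = server, fh, 0

    def read(self, count):
        data = self.server.read(self.fh, self.offset, count)
        self.offset += len(data)
        return data
```

Since a read carries its own offset, repeating the same request returns the same bytes, which is what lets the client simply resend after a server restart.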

Recovery in Stateless NFS
If the server fails and restarts, there is no need to rebuild in-memory state on the server.
• Client reestablishes contact (e.g., TCP connection).
• Client retransmits pending requests.
Classical NFS uses a connectionless transport (UDP).
• Server failure is transparent to the client; no connection to break or reestablish.
  A crashed server is indistinguishable from a slow server.
• Sun/ONC RPC masks network errors by retransmitting a request after an adaptive timeout.
  A dropped packet is indistinguishable from a crashed server.
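Recovery by retransmission can be sketched as a client loop. This is an illustration, not the ONC RPC implementation: the doubling backoff is an assumed policy standing in for the "adaptive timeout", and call_with_retransmit and FlakyTransport are hypothetical names. The transport is simulated, so nothing actually sleeps.

```python
def call_with_retransmit(send, request, timeout_ms=100,
                         max_timeout_ms=3200, max_tries=8):
    """Resend `request` until a reply arrives; returns (reply, tries).

    `send(request, timeout_ms)` returns the reply, or None if nothing
    arrived within the timeout. The client cannot tell why: a dropped
    packet, a slow server, and a crashed server all look the same.
    """
    for attempt in range(1, max_tries + 1):
        reply = send(request, timeout_ms)
        if reply is not None:
            return reply, attempt
        # Assumed backoff policy: double the timeout up to a cap.
        timeout_ms = min(timeout_ms * 2, max_timeout_ms)
    raise TimeoutError(f"no reply to {request!r} after {max_tries} tries")


class FlakyTransport:
    """Simulated transport that loses the first `drops` requests."""

    def __init__(self, drops):
        self.drops = drops
        self.timeouts_used = []

    def __call__(self, request, timeout_ms):
        self.timeouts_used.append(timeout_ms)
        if self.drops > 0:
            self.drops -= 1
            return None
        return ("OK", request)
```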

Drawbacks of a Stateless Service
The stateless nature of classical NFS has compelling design advantages (simplicity), but also some key drawbacks:
• Recovery-by-retransmission constrains the server interface.
  ONC RPC/UDP has execute-at-least-once semantics (“send and pray”), which compromises performance and correctness.
• Update operations are disk-limited.
  Updates must commit synchronously at the server.
• NFS cannot (quite) preserve local single-copy semantics.
  Files may be removed while they are open on the client.
  Server cannot help in client cache consistency.
Let’s explore these problems and their solutions...

Problem 1: Retransmissions and Idempotency
For a connectionless RPC transport, retransmissions can saturate an overloaded server.
Clients “kick ’em while they’re down”, causing a steep hockey-stick curve.
Execute-at-least-once constrains the server interface.
• Service operations should/must be idempotent.
  Multiple executions should/must have the same effect.
• Idempotent operations cannot capture the full semantics we expect from our file system.
  E.g., remove, append-mode writes, exclusive create.
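The difference between idempotent and non-idempotent operations shows up as soon as a request is executed twice. The sketch below assumes a toy in-memory file system; TinyFS and its method names are hypothetical. Each call made twice stands for a request plus its retransmission.

```python
import errno


class TinyFS:
    """Toy file system used to replay duplicated requests."""

    def __init__(self):
        self.files = {}  # name -> bytearray

    def write_at(self, name, offset, data):
        # Idempotent: executing it again leaves the same bytes in place.
        buf = self.files.setdefault(name, bytearray())
        buf.extend(b"\0" * max(0, offset - len(buf)))
        buf[offset:offset + len(data)] = data
        return 0

    def append(self, name, data):
        # Not idempotent: a retransmission appends the data a second time.
        self.files.setdefault(name, bytearray()).extend(data)
        return 0

    def remove(self, name):
        # Not idempotent: the retransmission sees the file already gone.
        if name not in self.files:
            return errno.ENOENT
        del self.files[name]
        return 0

    def create_exclusive(self, name):
        # Not idempotent: the retransmission finds the file it just made.
        if name in self.files:
            return errno.EEXIST
        self.files[name] = bytearray()
        return 0
```

In each non-idempotent case the first execution succeeded, but a client that only sees the second reply is told the operation failed.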

Solutions to the Retransmission Problem
1. Hope for the best and smooth over non-idempotent requests.
   E.g., map ENOENT and EEXIST to ESUCCESS.
2. Use TCP or some other transport protocol that produces reliable, in-order delivery.
   Higher overhead...and we still need sessions.
3. Implement an execute-at-most-once RPC transport.
   TCP-like features (sequence numbers)...and sessions.
4. Keep a retransmission cache on the server [Juszczak90].
   Remember the most recent request IDs and their results, and just resend the result....does this violate statelessness?
   DAFS persistent session cache.
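Solution 4 can be sketched as a small bounded cache keyed by request ID. This is an illustration under assumptions, not the [Juszczak90] design: the (client_id, xid) key, the capacity, and oldest-first eviction are choices made for the example, and DuplicateRequestCache is a hypothetical name.

```python
from collections import OrderedDict


class DuplicateRequestCache:
    """Remembers recent request IDs and replays their replies."""

    def __init__(self, capacity=128):
        self.capacity = capacity
        self.replies = OrderedDict()  # (client_id, xid) -> reply

    def execute(self, client_id, xid, operation):
        key = (client_id, xid)
        if key in self.replies:
            # Retransmission: resend the saved result, do not re-execute.
            return self.replies[key]
        reply = operation()
        self.replies[key] = reply
        if len(self.replies) > self.capacity:
            self.replies.popitem(last=False)  # evict the oldest entry
        return reply


# Usage: a non-idempotent remove, retransmitted with the same xid.
names = {"report.txt"}


def remove_report():
    if "report.txt" not in names:
        return "ENOENT"
    names.remove("report.txt")
    return "OK"


cache = DuplicateRequestCache()
first = cache.execute("clientA", 41, remove_report)
again = cache.execute("clientA", 41, remove_report)  # retransmission
```

The cache is soft state: if it is lost in a crash, the server falls back to re-executing, which is why it is arguably compatible with a stateless design.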

Problem 2: Synchronous Writes
Stateless NFS servers must commit each operation to stable storage before responding to the client.
• Interferes with FS optimizations, e.g., clustering, LFS, and disk write ordering (seek scheduling).
  Damages bandwidth and scalability.
• Imposes disk access latency for each request.
  Not so bad for a logged write; much worse for a complex operation like an FFS file write.
The synchronous update problem occurs for any storage service with reliable update (commit).

Speeding Up Synchronous NFS Writes
Interesting solutions to the synchronous write problem, used in high-performance NFS servers:
• Delay the response until convenient for the server.
  E.g., NFS write-gathering optimizations for clustered writes (similar to group commit in databases).
  Relies on write-behind from NFS I/O daemons (iods).
• Throw hardware at it: non-volatile memory (NVRAM)
  Battery-backed RAM or UPS (uninterruptible power supply).
  Use as an operation log (Network Appliance WAFL)...
  ...or as a non-volatile disk write buffer (Legato).
• Replicate server and buffer in memory (e.g., MIT Harp).
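The first idea, delaying the response, can be sketched as follows. This is a simplified illustration of write gathering, assuming a toy server; GatheringServer, write, flush, and the syncs counter are hypothetical names. Replies are withheld until one disk sync covers all queued writes, so no client is told "done" before its data is stable.

```python
class GatheringServer:
    """Queues writes and acknowledges them only after one shared sync."""

    def __init__(self):
        self.queue = []  # (client, offset, data) awaiting commit
        self.disk = {}   # offset -> data (stand-in for stable storage)
        self.syncs = 0   # number of (expensive) synchronous disk commits

    def write(self, client, offset, data):
        # Accept the request but send no reply yet.
        self.queue.append((client, offset, data))

    def flush(self):
        # One clustered commit covers every queued write.
        for _client, offset, data in self.queue:
            self.disk[offset] = data
        self.syncs += 1
        replies = [(client, "OK") for client, _offset, _data in self.queue]
        self.queue.clear()
        return replies
```

With three queued writes the server pays for one sync instead of three, at the cost of holding the earliest reply until the flush.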

NFS V3 Asynchronous Writes
NFS V3 sidesteps the synchronous write problem by adding a new asynchronous write operation.
• Server may reply to client as soon as it accepts the write, before executing/committing it.
  If the server fails, it may discard any subset of the accepted but uncommitted writes.
• Client holds asynchronously written data in its cache, and reissues the writes if the server fails and restarts.
  When is it safe for the client to discard its buffered writes?
  How can the client tell if the server has failed?

NFS V3 Commit
NFS V3 adds a new commit operation to go with async-write.
• Client may issue a commit for a file byte range at any time.
• Server must execute all covered uncommitted writes before replying to the commit.
• When the client receives the reply, it may safely discard any buffered writes covered by the commit.
• Server returns a verifier with every reply to an async write or commit request.
  The verifier is just an integer that is guaranteed to change if the server restarts, and to never change back.
• What if the client crashes?
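The interplay of async write, commit, and verifier can be sketched with a toy client and server. This is a simplification under stated assumptions: commit covers the whole file rather than a byte range, the verifier is a counter bumped on restart, and V3Server, V3Client, and crash_and_restart are hypothetical names rather than protocol elements.

```python
class V3Server:
    def __init__(self):
        self.verifier = 1
        self.stable = {}   # offset -> data on "disk"
        self.pending = {}  # accepted but uncommitted writes

    def write_async(self, offset, data):
        self.pending[offset] = data
        return self.verifier           # reply before committing

    def commit(self):
        self.stable.update(self.pending)
        self.pending.clear()
        return self.verifier

    def crash_and_restart(self):
        self.pending.clear()           # uncommitted writes are lost
        self.verifier += 1             # changes on restart, never changes back


class V3Client:
    def __init__(self, server):
        self.server = server
        self.buffered = {}             # offset -> (data, verifier from reply)

    def write(self, offset, data):
        verifier = self.server.write_async(offset, data)
        self.buffered[offset] = (data, verifier)

    def commit(self):
        while self.buffered:
            current = self.server.commit()
            # A write acknowledged under an older verifier may have been
            # dropped by a restart, so it must be reissued.
            stale = {off: d for off, (d, v) in self.buffered.items()
                     if v != current}
            if not stale:
                self.buffered.clear()  # now safe to discard
                return
            for off, d in stale.items():
                self.write(off, d)
```

A verifier mismatch answers both questions from the previous slide: it tells the client that the server restarted, and a matching commit reply tells it when buffered data may be discarded.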

Problem 3: File Cache Consistency
Problem: Concurrent write sharing of files.
Contrast with read sharing or sequential write sharing.
Solutions:
• Timestamp invalidation (NFS).
  Timestamp each cache entry, and periodically query the server: “has this file changed since time t?”; invalidate cache if stale.
• Callback invalidation (AFS, Sprite, Spritely NFS).
  Request notification (callback) from the server if the file changes; invalidate cache and/or disable caching on callback.
• Leases (NQ-NFS) [Gray&Cheriton89,Macklem93,NFS V4]
• Later: distributed shared memory
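Timestamp invalidation can be sketched as a client cache that revalidates an entry only after a fixed interval. The interval (ttl), the logical clock used for modification times, and the names Server, CachingClient, and getattr are assumptions made for this illustration. The clock is injected so the behavior is deterministic.

```python
class Server:
    def __init__(self):
        self.files = {}  # name -> (mtime, data)
        self.clock = 0   # logical modification clock

    def write(self, name, data):
        self.clock += 1
        self.files[name] = (self.clock, data)

    def getattr(self, name):
        return self.files[name][0]  # "has this file changed?"

    def read(self, name):
        return self.files[name]


class CachingClient:
    def __init__(self, server, ttl, now):
        self.server, self.ttl, self.now = server, ttl, now
        self.cache = {}  # name -> [mtime, data, time last validated]

    def read(self, name):
        t = self.now()
        entry = self.cache.get(name)
        if entry is not None:
            if t - entry[2] < self.ttl:
                return entry[1]      # trusted without asking the server
            if self.server.getattr(name) == entry[0]:
                entry[2] = t         # unchanged: revalidate and reuse
                return entry[1]
        mtime, data = self.server.read(name)  # missing or stale: refetch
        self.cache[name] = [mtime, data, t]
        return data
```

Between validations the client may return stale data, which is why this scheme cannot give single-copy semantics under concurrent write sharing.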

File Cache Example: NQ-NFS Leases
In NQ-NFS, a client obtains a lease on the file that permits the client’s desired read/write activity.
“A lease is a ticket permitting an activity; the lease is valid until some expiration time.”
• A read-caching lease allows the client to cache clean data.
  Guarantee: no other client is modifying the file.
• A write-caching lease allows the client to buffer modified data for the file.
  Guarantee: no other client has the file cached.
  Allows delayed writes: client may delay issuing writes to improve write performance (i.e., client has a writeback cache).

Using NQ-NFS Leases
1. Client NFS piggybacks lease requests for a given file on I/O operation requests (e.g., read/write).
   NQ-NFS leases are implicit and distinct from file locking.
2. The server determines if it can safely grant the request, i.e., does it conflict with a lease held by another client.
   Read leases may be granted simultaneously to multiple clients.
   Write leases are granted exclusively to a single client.
3. If a conflict exists, the server may send an eviction notice to the holder of the conflicting lease.
   If a client is evicted from a write lease, it must write back.
   Grace period: server grants extensions while the client writes.
   Client sends vacated notice when all writes are complete.
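Steps 2 and 3 amount to a compatibility check over the unexpired leases on a file. The sketch below is an illustration, not the NQ-NFS implementation: LeaseServer, request, vacate, and the ("granted", ...) / ("evict", ...) return values are hypothetical, time is passed in explicitly, and the grace period and write-back are left out.

```python
class LeaseServer:
    def __init__(self, term):
        self.term = term
        self.leases = {}  # file -> {client: (mode, expiry)}

    def request(self, file, client, mode, now):
        holders = self.leases.setdefault(file, {})
        # Expired leases no longer constrain anyone.
        for c in [c for c, (_m, exp) in holders.items() if exp <= now]:
            del holders[c]
        # Read/read is compatible; anything involving a write conflicts.
        conflicts = sorted(c for c, (m, _exp) in holders.items()
                           if c != client and (mode == "write" or m == "write"))
        if conflicts:
            return ("evict", conflicts)  # holders to send eviction notices to
        holders[client] = (mode, now + self.term)
        return ("granted", now + self.term)

    def vacate(self, file, client):
        # Client reports that it has written back and given up the lease.
        self.leases.get(file, {}).pop(client, None)
```

Renewal is just another request by the current holder, which the conflict check skips via the `c != client` test.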

NQ-NFS Lease Recovery
Key point: the bounded lease term simplifies recovery.
• Before a lease expires, the client must renew the lease.
• What if a client fails while holding a lease?
  Server waits until the lease expires, then unilaterally reclaims the lease; client forgets all about it.
  If a client fails while writing on an eviction, server waits for write slack time before granting conflicting lease.
• What if the server fails while there are outstanding leases?
  Wait for lease period + clock skew before issuing new leases.
• Recovering server must absorb lease renewal requests and/or writes for vacated leases.
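The server-restart rule is simple arithmetic: any lease granted before the crash has expired, even on the slowest client clock, once lease period + clock skew has elapsed. A minimal sketch, assuming times in seconds and a known bound on clock skew; RecoveringLeaseServer and its fields are hypothetical names.

```python
class RecoveringLeaseServer:
    """Refuses new leases until all pre-crash leases must have expired."""

    def __init__(self, restart_time, lease_term, max_clock_skew):
        # The server has forgotten who held leases, so it assumes the
        # worst case: a full-term lease granted just before the crash.
        self.safe_after = restart_time + lease_term + max_clock_skew

    def may_grant_new_lease(self, now):
        return now >= self.safe_after


# Example: restart at t=1000 s, 30 s leases, at most 2 s of skew.
server = RecoveringLeaseServer(restart_time=1000, lease_term=30,
                               max_clock_skew=2)
```

No lease table has to be rebuilt after the crash; waiting out the bounded term is the whole recovery procedure for leases.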
