
WAFL: A Closer Look: Snapshots

FS Consistency, Block Allocation, and WAFL

Summary of Issues for File Systems
1. Buffering disk data for access from the processor.
   • Block I/O (DMA) must use aligned, physically resident buffers.
   • A block update is a read-modify-write.
2. Creating/representing/destroying independent files.
   • Disk block allocation and file block map structures.
   • Directories and symbolic naming.
3. Masking the high seek/rotational latency of disk access.
   • Smart block allocation on disk.
   • Block caching, read-ahead (prefetching), and write-behind.
4. Reliability and the handling of updates.

Rotational Media
[Figure: disk geometry, labeling platter, track, sector, cylinder, arm, and head]
Access time = seek time + rotational delay + transfer time
• Seek time: 5-15 milliseconds to move the disk arm and settle on a cylinder.
• Rotational delay: 8 milliseconds for a full rotation at 7200 RPM, so the average delay is 4 ms.
• Transfer time: 1 millisecond for an 8KB block at 8 MB/s.
Bandwidth utilization is less than 50% for any noncontiguous access at a block grain. (A worked example appears at the end of this section.)

The Problem of Disk Layout
The level of indirection in the file block maps allows flexibility in file layout.
“File system design is 99% block allocation.” [McVoy]
Competing goals for block allocation:
• allocation cost
• bandwidth for high-volume transfers
• stamina
• efficient directory operations
Goal: reduce disk arm movement and seek overhead.
Metric of merit: bandwidth utilization (or effective bandwidth).

FFS and LFS
We will study two different approaches to block allocation:
• Cylinder groups in the Fast File System (FFS) [McKusick81], with clustering enhancements [McVoy91] and improved cluster allocation [Smith/Seltzer96]. FFS can also be extended with metadata logging (e.g., Episode).
• The Log-Structured File System (LFS): proposed in [Douglis/Ousterhout90], implemented and studied in [Rosenblum91], ported to BSD (sort of, maybe) in [Seltzer93], and extended with self-tuning methods in [Neefe/Anderson97].
• Another approach: extent-based file systems.

WAFL: High-Level View
The whole on-disk file system layout is a tree of blocks. Only the root of the tree sits at a fixed location; everything else is “write anywhere”.
[Figure: tree of blocks with a fixed-location root, pointing to allocation maps and user data]
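To make the bandwidth-utilization claim on the Rotational Media slide concrete, here is a small worked example in C using the slide’s own numbers; the 10 ms seek (the midpoint of the 5-15 ms range) is an assumption.

```c
/* Effective bandwidth for one noncontiguous 8KB read, using the
 * slide's figures: 5-15 ms seek, 4 ms average rotational delay,
 * and 1 ms transfer at 8 MB/s. */
#include <stdio.h>

int main(void) {
    double seek_ms     = 10.0;  /* assumed midpoint of the 5-15 ms range */
    double rotation_ms = 4.0;   /* half of an 8 ms full rotation */
    double transfer_ms = 1.0;   /* 8 KB at 8 MB/s */

    double access_ms   = seek_ms + rotation_ms + transfer_ms;
    double utilization = transfer_ms / access_ms;

    printf("access time: %.1f ms\n", access_ms);           /* 15.0 ms */
    printf("utilization: %.1f%%\n", 100.0 * utilization);  /* 6.7%    */
    printf("effective bandwidth: %.2f MB/s\n", 8.0 * utilization);
    return 0;
}
```

Even with the best-case 5 ms seek, utilization is only 1/10 = 10%: positioning dominates the transfer itself, which is why allocation policy, not raw media rate, determines file system bandwidth.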

WAFL: A Closer Look

Snapshots
“WAFL’s primary distinguishing characteristic is Snapshots, which are read-only copies of the entire file system.”
This was really the origin of the idea of a point-in-time copy for the file server market. What is this idea good for?

Shadowing
Shadowing is the basic technique for doing an atomic force; it is reminiscent of copy-on-write. (A sketch in code appears at the end of this section.)
1. Starting point: modify the purple/grey blocks (the colors refer to the original figure).
2. Write the new blocks to disk, and prepare the new block map.
3. Overwrite the block map (atomic commit), and free the old blocks.
The snapshot mechanism is used for user-accessible snapshots and for transient consistency points.
Frequent problems: nonsequential disk writes, and damage to clustered allocation on disk. How does WAFL deal with this? How is this like a fork?

The Problem of Metadata Updates
Metadata updates are a second source of FFS seek overhead.
• Metadata writes are poorly localized. E.g., extending a file requires writes to the inode, the direct and indirect blocks, the cylinder group bit maps and summaries, and the file block itself.
• Metadata writes can be delayed, but this incurs a higher risk of file system corruption in a crash. If you lose your metadata, you are dead in the water.
• FFS schedules metadata block writes carefully to limit the kinds of inconsistencies that can occur. Some metadata updates must be synchronous on controllers that don’t respect the order of writes.

WAFL Consistency Points
“WAFL uses Snapshots internally so that it can restart quickly even after an unclean system shutdown.”
“A consistency point is a completely self-consistent image of the entire file system. When WAFL restarts, it simply reverts to the most recent consistency point.”
• Buffer dirty data in memory (delayed writes) and write new consistency points as an atomic batch (force).
• A consistency point transitions the FS from one self-consistent state to another.
• Combine with an NFS operation log in NVRAM. What if NVRAM fails?
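The three shadowing steps above fit in a few lines of C. This is a toy model, not WAFL’s code: block_map, prepare_update, and commit are invented names, and the “atomic commit” is modeled as the single root-pointer overwrite in the last step.

```c
/* Toy model of shadowing (copy-on-write commit). A volume is reached
 * through one root pointer to a block map. An update never overwrites
 * live blocks: it writes new copies, builds a new map, and commits by
 * overwriting the root pointer -- on disk, a single sector write. */
#include <stdlib.h>
#include <string.h>

#define NBLOCKS 8
#define BLKSIZE 512

typedef struct {
    char *blocks[NBLOCKS];      /* block map: pointers to data blocks */
} block_map;

static block_map *root;         /* the one fixed location */

/* Steps 1-2: copy the map and write a shadow of the modified block;
 * every unchanged block is shared with the old image. */
static block_map *prepare_update(int blkno, const char *data) {
    block_map *newmap = malloc(sizeof *newmap);
    memcpy(newmap, root, sizeof *newmap);
    newmap->blocks[blkno] = malloc(BLKSIZE);
    memcpy(newmap->blocks[blkno], data, BLKSIZE);
    return newmap;
}

/* Step 3: the atomic commit. A crash before this line leaves the old,
 * still-consistent image in place; the old blocks may be freed only if
 * no snapshot still references them. */
static void commit(block_map *newmap) {
    root = newmap;
}

int main(void) {
    static char zero[BLKSIZE], data[BLKSIZE] = "new contents";
    static block_map initial;
    for (int i = 0; i < NBLOCKS; i++) initial.blocks[i] = zero;
    root = &initial;
    commit(prepare_update(3, data));    /* update block 3 atomically */
    return root->blocks[3][0] == 'n' ? 0 : 1;
}
```

Keeping the old map around instead of freeing it is exactly a snapshot, which is why the slide asks how this resembles a fork: the old and new images share every unchanged block.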

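“Reverts to the most recent consistency point” implies that the root write itself must be crash-safe. The slides do not show WAFL’s mechanism, so the sketch below uses a standard technique that fits the description: two alternating root slots with generation numbers and checksums, where recovery picks the newest slot that checks out. All names and the checksum are invented.

```c
/* Sketch: crash-safe root update via two alternating root slots.
 * Each consistency point writes the new root into the older slot with
 * a higher generation number and a checksum; recovery takes the valid
 * slot with the highest generation. */
#include <stdint.h>

typedef struct {
    uint64_t generation;    /* consistency point number */
    uint64_t root_blkno;    /* block number of the tree root */
    uint64_t checksum;      /* detects a torn or partial write */
} root_slot;

static root_slot slots[2];  /* stand-in for two fixed disk sectors */

static uint64_t cksum(const root_slot *s) {
    return s->generation ^ s->root_blkno ^ 0x5741464cULL;  /* "WAFL" */
}

/* Write the next consistency point into the older slot; on disk this
 * is one sector write, issued after all the CP's blocks are on disk. */
void write_consistency_point(uint64_t new_root) {
    int older = slots[0].generation < slots[1].generation ? 0 : 1;
    root_slot s = { slots[!older].generation + 1, new_root, 0 };
    s.checksum = cksum(&s);
    slots[older] = s;
}

/* On restart, revert to the most recent valid consistency point. */
uint64_t recover_root(void) {
    int best = -1;
    for (int i = 0; i < 2; i++)
        if (slots[i].checksum == cksum(&slots[i]) &&
            (best < 0 || slots[i].generation > slots[best].generation))
            best = i;
    return best < 0 ? 0 : slots[best].root_blkno;
}

int main(void) {
    write_consistency_point(1234);  /* CP 1 */
    write_consistency_point(5678);  /* CP 2; an interrupted CP would
                                       simply fail the checksum */
    return recover_root() == 5678 ? 0 : 1;
}
```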
FFS Failure Recovery
FFS uses a two-pronged approach to handling failures:
1. Carefully order metadata updates to ensure that no dangling references can exist on disk after a failure.
   • Never recycle a resource (block or inode) before zeroing all pointers to it (truncate, unlink, rmdir).
   • Never point to a structure before it has been initialized. E.g., sync the inode on creat before filling the directory entry, and sync a new block before writing the block map.
2. Run a file system scavenger (fsck) to fix other problems: free the blocks and inodes that are not referenced. Fsck will never encounter a dangling reference or a double allocation.

Alternative: Logging and Journaling
Logging can be used to localize synchronous metadata writes and to reduce the work that must be done on recovery. It is universally used in database systems, and used for metadata writes in journaling file systems (e.g., Episode).
Key idea: group each set of related updates into a single log record that can be written to disk atomically (“all-or-nothing”).
• Log records are written to the log file or log disk sequentially: no seeks, and temporal ordering is preserved.
• Each log record is trailed by a marker (e.g., a checksum) that says “this log record is complete”. (See the sketch at the end of this section.)
• To recover, scan the log and reapply updates.

Metadata Logging
Here’s one approach to building a fast file system:
1. Start with FFS with clustering.
2. Make all metadata writes asynchronous.
But that approach cannot survive a failure, so:
3. Add a supplementary log for modified metadata.
4. When metadata changes, write the new versions immediately to the log, in addition to the asynchronous writes to “home”.
5. If the system crashes, recover by scanning the log. This is much faster than scavenging (fsck) for large volumes.
6. If the system does not crash, then discard the log.

The Nub of WAFL
WAFL’s consistency points allow it to buffer writes and push them out in a batch:
• deferred, clustered allocation
• batched writes
• localized writes
Indirection through the metadata “tree” allows it to write data wherever convenient: the tree can point anywhere.
• Maximize the benefits from batching writes in consistency points.
• Also allow multiple copies of a given piece of metadata, for snapshots.

SnapMirror
• Is it research? What makes it interesting/elegant?
• What are the tech trends that motivate SnapMirror, and WAFL before it? Why is disaster recovery so important now?
• How does WAFL make mirroring easier?

Mirroring
Structural issue: mirroring support can be built at:
• the application level
• the FS level
• the block storage level (e.g., a RAID unit)
Who has the information?
• What has changed?
• What has been deallocated?
If a mirror fails, what is lost? Can both mirrors operate at the same time?
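The “complete record” marker from the journaling slide is easy to make concrete. Below is a minimal sketch; the framing (length header, payload, trailing checksum) and the FNV-style checksum are invented for illustration, and real journals differ in detail.

```c
/* Sketch of an "all-or-nothing" log record: a length header, the
 * payload, and a trailing checksum marker. A record truncated by a
 * crash fails the trailer check, so the recovery scan stops there. */
#include <stdint.h>
#include <stdio.h>

static uint32_t checksum(const unsigned char *p, uint32_t n) {
    uint32_t s = 2166136261u;           /* FNV-1a, for illustration */
    while (n--) { s ^= *p++; s *= 16777619u; }
    return s;
}

/* Append one record. The marker is written last, so it can be on disk
 * only if the whole record made it out. Writes are purely sequential:
 * no seeks, and temporal ordering is preserved. */
void log_append(FILE *log, const unsigned char *upd, uint32_t len) {
    uint32_t sum = checksum(upd, len);
    fwrite(&len, sizeof len, 1, log);
    fwrite(upd, 1, len, log);
    fwrite(&sum, sizeof sum, 1, log);
    fflush(log);                        /* plus fsync() on a real system */
}

/* Recovery: scan forward, reapply each complete record, and stop at
 * the first torn one. reapply() is whatever redoes a metadata update. */
void log_recover(FILE *log,
                 void (*reapply)(const unsigned char *, uint32_t)) {
    static unsigned char buf[65536];
    uint32_t len, sum;
    while (fread(&len, sizeof len, 1, log) == 1 && len <= sizeof buf &&
           fread(buf, 1, len, log) == len &&
           fread(&sum, sizeof sum, 1, log) == 1 &&
           sum == checksum(buf, len))
        reapply(buf, len);
}
```

Recovery replays every complete record from the start of the log, so the logged updates must be safe to reapply; one common choice is to log whole metadata block images rather than deltas, so replaying twice is harmless.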

What Has Changed?
Given a snapshot X, WAFL can ask: is block B allocated in snapshot X?
Given a snapshot X and a later snapshot Y, WAFL can ask: what blocks of Y should be sent to the mirror?

              Y = 1        Y = 0
    X = 0     added        unused
    X = 1     unchanged    deleted

(A sketch of this classification appears at the end of this section.)

Details
• SnapMirror names disk blocks: why? What are the implications?
• What if a mirror fails? What is lost? How do we keep the mirror self-consistent?
• How does the no-overwrite policy of WAFL help SnapMirror?
• What are the strengths/weaknesses of implementing this functionality above or below the file system? Does this conclusion depend on other details of WAFL?
• What can we conclude from the experiments?

FFS Cylinder Groups
FFS defines cylinder groups as the unit of disk locality, and it factors locality into allocation choices.
• Typical: thousands of cylinders, dozens of groups.
• Strategy: place “related” data blocks in the same cylinder group whenever possible, since seek latency is proportional to seek distance.
• Place adjacent logical blocks in the same cylinder group: logical block n+1 goes in the same group as block n.
• Reserve inode blocks in each cylinder group. This allows inodes to be allocated close to their directory entries and close to their data blocks (for small files).

FFS Allocation Policies
1. Allocate file inodes close to their containing directories.
   • For mkdir, select a cylinder group with a more-than-average number of free inodes.
   • For creat, place the inode in the same group as the parent.
2. Concentrate related file data blocks in cylinder groups. Most files are read and written sequentially.
   • Place the initial blocks of a file in the same group as its inode. How should we handle directory blocks?
   • Smear large files across groups: place a run of contiguous blocks in each group, and switch to a different group for each indirect block.
(A sketch of the inode-placement rules appears at the end of this section.)

What to Know
We did not cover the LFS material in class, though it was in the Tanenbaum reading. I just want you to know what LFS is and how it compares to WAFL.
• LFS is log-structured: all writes go to the end of the log.
• WAFL can write anywhere.
• Both use no-overwrite and indirect access to metadata.
• LFS requires a cleaner to find log segments with few allocated blocks and rewrite those blocks at the end of the log, so that it can free the segment.

Disk Hardware (4)
[Figure: RAID levels 3 through 5; backup and parity drives are shaded]
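The 2x2 classification in “What Has Changed?” turns directly into the incremental-transfer test: only “added” blocks need to be sent. Here is a sketch that assumes SnapMirror can read a per-snapshot allocation bitmap (WAFL’s block map records which snapshots hold each block); the function names and bitmap layout are invented.

```c
/* Sketch: choose the blocks an incremental mirror update must send,
 * given allocation bitmaps for base snapshot X and newer snapshot Y. */
#include <stdint.h>

enum state { UNUSED, ADDED, DELETED, UNCHANGED };

/* The four cells of the X/Y table. */
static enum state classify(int in_x, int in_y) {
    if (!in_x && !in_y) return UNUSED;     /* in neither snapshot   */
    if (!in_x &&  in_y) return ADDED;      /* new in Y: send it     */
    if ( in_x && !in_y) return DELETED;    /* freed since X         */
    return UNCHANGED;                      /* already on the mirror */
}

/* Bit b of x[] / y[] says whether block b is allocated in that
 * snapshot. Returns how many blocks were sent. */
uint64_t blocks_to_send(const uint8_t *x, const uint8_t *y,
                        uint64_t nblocks, void (*send)(uint64_t blkno)) {
    uint64_t sent = 0;
    for (uint64_t b = 0; b < nblocks; b++) {
        int in_x = (x[b / 8] >> (b % 8)) & 1;
        int in_y = (y[b / 8] >> (b % 8)) & 1;
        if (classify(in_x, in_y) == ADDED) {
            send(b);                       /* block contents to mirror */
            sent++;
        }
    }
    return sent;
}
```

This is where WAFL’s no-overwrite policy earns its keep: a live block is never rewritten in place, so any block whose contents changed between X and Y necessarily appears at a new block number, and the “added” set alone captures every difference. “Deleted” blocks cost only a free-list update on the mirror.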

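For the two inode-placement rules in “FFS Allocation Policies”, here is a sketch with invented types; real FFS also weighs factors such as the number of directories already in each group.

```c
/* Sketch of the FFS inode-placement rules: mkdir spreads directories
 * across groups with above-average free inodes; creat keeps a file's
 * inode in its parent directory's cylinder group. */
#include <stdint.h>

struct cg { uint32_t free_inodes; };

struct fs {
    struct cg *groups;
    uint32_t   ngroups;
    uint32_t   total_free_inodes;
};

/* mkdir: pick a cylinder group with a more-than-average number of
 * free inodes, so directories (and their files) spread out. */
uint32_t pick_group_mkdir(const struct fs *fs) {
    uint32_t avg = fs->total_free_inodes / fs->ngroups;
    for (uint32_t g = 0; g < fs->ngroups; g++)
        if (fs->groups[g].free_inodes > avg)
            return g;
    return 0;                       /* degenerate fallback */
}

/* creat: place the inode in the parent's group, so the directory
 * entry, the inode, and (small) file data stay close; fall back to a
 * rotational scan if the group has no free inodes. */
uint32_t pick_group_creat(const struct fs *fs, uint32_t parent_group) {
    for (uint32_t i = 0; i < fs->ngroups; i++) {
        uint32_t g = (parent_group + i) % fs->ngroups;
        if (fs->groups[g].free_inodes > 0)
            return g;
    }
    return parent_group;            /* volume is out of inodes */
}
```

The same locality logic then drives data-block placement: a file’s initial blocks go to its inode’s group, while large files are deliberately smeared across groups so that no single group fills up.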