FS Consistency, Block Allocation, and WAFL


  1. FS Consistency, Block Allocation, and WAFL

  2. Summary of Issues for File Systems
     1. Buffering disk data for access from the processor: block I/O (DMA) must use aligned, physically resident buffers, and a block update is a read-modify-write (sketched below).
     2. Creating/representing/destroying independent files: disk block allocation, file block map structures, directories and symbolic naming.
     3. Masking the high seek/rotational latency of disk access: smart block allocation on disk; block caching, read-ahead (prefetching), and write-behind.
     4. Reliability and the handling of updates.
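
     A minimal sketch of the read-modify-write in issue 1, assuming the caller has opened a raw device (e.g., with O_DIRECT) and a 4 KB block size; the function name and constants are illustrative, not from the slides:

        /* Update `len` bytes at `offset`: read the containing block into an
         * aligned buffer, modify it in memory, and write the whole block back.
         * Assumes the update does not cross a block boundary. */
        #include <stdlib.h>
        #include <string.h>
        #include <unistd.h>

        #define BLOCK_SIZE 4096                 /* assumed device block size */

        int update_bytes(int fd, off_t offset, const void *data, size_t len)
        {
            void *buf;
            off_t blk = (offset / BLOCK_SIZE) * BLOCK_SIZE;     /* containing block */

            if (posix_memalign(&buf, BLOCK_SIZE, BLOCK_SIZE))   /* aligned, resident buffer */
                return -1;
            if (pread(fd, buf, BLOCK_SIZE, blk) != BLOCK_SIZE)  /* read   */
                goto fail;
            memcpy((char *)buf + (offset - blk), data, len);    /* modify */
            if (pwrite(fd, buf, BLOCK_SIZE, blk) != BLOCK_SIZE) /* write  */
                goto fail;
            free(buf);
            return 0;
        fail:
            free(buf);
            return -1;
        }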

  3. Rotational Media
     (Figure: platter, track, sector, cylinder, arm, head.)
     Access time = seek time + rotational delay + transfer time.
     Seek time: 5-15 milliseconds to move the disk arm and settle on a cylinder.
     Rotational delay: a full rotation takes 8 milliseconds at 7200 RPM, so the average delay is 4 ms.
     Transfer time: 1 millisecond for an 8 KB block at 8 MB/s.
     Bandwidth utilization is less than 50% for any noncontiguous access at a block grain (worked example below).
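
     Plugging in the slide's numbers with a 5 ms seek, for example: access time = 5 + 4 + 1 = 10 ms per 8 KB block, of which only 1 ms actually transfers data, so effective bandwidth is about 0.8 MB/s, roughly 10% of the 8 MB/s peak.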

  4. The Problem of Disk Layout
     The level of indirection in the file block maps allows flexibility in file layout. "File system design is 99% block allocation." [McVoy]
     Competing goals for block allocation:
     • allocation cost
     • bandwidth for high-volume transfers
     • stamina
     • efficient directory operations
     Goal: reduce disk arm movement and seek overhead. Metric of merit: bandwidth utilization (or effective bandwidth).

  5. FFS and LFS
     We will study two different approaches to block allocation:
     • Cylinder groups in the Fast File System (FFS) [McKusick81], clustering enhancements [McVoy91], and improved cluster allocation [McKusick: Smith/Seltzer96]. FFS can also be extended with metadata logging [e.g., Episode].
     • The Log-Structured File System (LFS): proposed in [Douglis/Ousterhout90], implemented/studied in [Rosenblum91], BSD port (sort of, maybe) [Seltzer93], extended with self-tuning methods [Neefe/Anderson97].
     • Another approach: extent-based file systems.

  6. WAFL: High-Level View
     The whole on-disk file system layout is a tree of blocks. Only the root of the tree lives at a fixed location; everything else (user data and allocation maps alike) is written anywhere.

  7. WAFL: A Closer Look

  8. Snapshots
     "WAFL's primary distinguishing characteristic is Snapshots, which are read-only copies of the entire file system." This was really the origin of the idea of a point-in-time copy for the file server market. What is this idea good for?

  9. Snapshots
     The snapshot mechanism is used for user-accessible snapshots and for transient consistency points. How is this like a fork?

  10. Shadowing
      Shadowing is the basic technique for doing an atomic force, reminiscent of copy-on-write:
      1. starting point
      2. write new blocks to disk: modify the (purple/grey) target blocks by writing new copies elsewhere, and prepare a new block map
      3. overwrite the block map (atomic commit) and free the old blocks
      Frequent problems: nonsequential disk writes, and damage to clustered allocation on disk. How does WAFL deal with this? (A shadow-update sketch follows.)
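
      A minimal sketch of the shadow-update sequence over an in-memory "disk"; all names, sizes, and structures are illustrative assumptions, not WAFL's actual layout:

         #include <string.h>

         #define NBLOCKS   64                  /* size of the "disk" in blocks      */
         #define BLOCKSZ   512                 /* bytes per block                   */
         #define NFILEBLKS 8                   /* logical blocks covered by the map */

         static char disk[NBLOCKS][BLOCKSZ];          /* write-anywhere block pool        */
         static int  in_use[NBLOCKS];                 /* trivial allocation map           */

         struct block_map { int addr[NFILEBLKS]; };   /* logical -> disk block, -1 = hole */
         static struct block_map maps[2];
         static struct block_map *current = &maps[0]; /* the fixed-location root pointer  */

         void fs_init(void) {                         /* start with an empty block map */
             for (int i = 0; i < NFILEBLKS; i++)
                 maps[0].addr[i] = -1;
         }

         static int alloc_block(void) {               /* first-fit block allocator */
             for (int i = 0; i < NBLOCKS; i++)
                 if (!in_use[i]) { in_use[i] = 1; return i; }
             return -1;
         }

         /* Shadow update of logical block `lbn`: live blocks are never overwritten. */
         void shadow_update(int lbn, const char data[BLOCKSZ])
         {
             struct block_map *shadow = (current == &maps[0]) ? &maps[1] : &maps[0];
             *shadow = *current;                      /* 1. start from the current map    */

             int old = shadow->addr[lbn];
             int neu = alloc_block();                 /* 2. write the new block elsewhere */
             if (neu < 0) return;                     /*    (disk full in this toy model) */
             memcpy(disk[neu], data, BLOCKSZ);
             shadow->addr[lbn] = neu;

             current = shadow;                        /* 3. atomic commit: flip the root, */
             if (old >= 0)                            /*    then free the old block       */
                 in_use[old] = 0;
         }

      On a real disk, step 3 is a single small write of the map root, which is what makes the update atomic; the price is that the new blocks may land far from the ones they replace, which is the nonsequential-write and clustering problem noted above.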

  11. WAFL Consistency Points
      "WAFL uses Snapshots internally so that it can restart quickly even after an unclean system shutdown." "A consistency point is a completely self-consistent image of the entire file system. When WAFL restarts, it simply reverts to the most recent consistency point."
      • Buffer dirty data in memory (delayed writes) and write new consistency points as an atomic batch (force).
      • A consistency point transitions the FS from one self-consistent state to another.
      • Combine with an NFS operation log in NVRAM (restart sketch below). What if the NVRAM fails?
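
      A sketch of the restart path described here, reverting to the latest consistency point and then replaying the NVRAM log; all structures and names are simplified assumptions:

         struct fs_image { int root_block;  /* root of a completely self-consistent tree */ };
         struct nfs_op   { int opcode;      /* NFS request arguments elided               */ };

         struct nvram {
             struct fs_image latest_cp;     /* most recent consistency point              */
             struct nfs_op   log[128];      /* operations acknowledged since that point   */
             int             nlogged;
         };

         static void reapply(struct fs_image *fs, const struct nfs_op *op)
         {
             (void)fs; (void)op;            /* re-execute the logged request against fs   */
         }

         /* No fsck-style scan is needed: the consistency point is self-consistent by
          * construction, so recovery is "revert, then replay".  If the NVRAM is lost,
          * the file system is still consistent, but the operations logged after the
          * last consistency point are gone. */
         struct fs_image recover(const struct nvram *nv)
         {
             struct fs_image fs = nv->latest_cp;
             for (int i = 0; i < nv->nlogged; i++)
                 reapply(&fs, &nv->log[i]);
             return fs;
         }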

  12. The Problem of Metadata Updates
      Metadata updates are a second source of FFS seek overhead.
      • Metadata writes are poorly localized. E.g., extending a file requires writes to the inode, the direct and indirect blocks, the cylinder group bit maps and summaries, and the file block itself.
      Metadata writes can be delayed, but this incurs a higher risk of file system corruption in a crash.
      • If you lose your metadata, you are dead in the water.
      • FFS schedules metadata block writes carefully to limit the kinds of inconsistencies that can occur. Some metadata updates must be synchronous on controllers that don't respect the order of writes.

  13. FFS Failure Recovery
      FFS uses a two-pronged approach to handling failures:
      1. Carefully order metadata updates to ensure that no dangling references can exist on disk after a failure.
         • Never recycle a resource (block or inode) before zeroing all pointers to it (truncate, unlink, rmdir).
         • Never point to a structure before it has been initialized. E.g., sync the inode on creat before filling the directory entry (sketched below), and sync a new block before writing the block map.
      2. Run a file system scavenger (fsck) to fix other problems, e.g., free blocks and inodes that are not referenced. Fsck will never encounter a dangling reference or a double allocation.
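
      A minimal sketch of the creat ordering in rule 1, assuming the inode table and the directory are updated with ordinary pwrite calls; offsets, sizes, and names are illustrative:

         #include <sys/types.h>
         #include <unistd.h>

         int create_ordered(int itable_fd, off_t inode_off, const void *inode, size_t isz,
                            int dir_fd, off_t dirent_off, const void *dirent, size_t dsz)
         {
             /* 1. Initialize and sync the new inode on disk first... */
             if (pwrite(itable_fd, inode, isz, inode_off) != (ssize_t)isz) return -1;
             if (fsync(itable_fd) != 0) return -1;        /* synchronous metadata write */

             /* 2. ...and only then write the directory entry that references it.
              *    After a crash we may find an allocated-but-unreferenced inode,
              *    which fsck can free, but never a directory entry pointing at an
              *    uninitialized inode. */
             if (pwrite(dir_fd, dirent, dsz, dirent_off) != (ssize_t)dsz) return -1;
             return 0;
         }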

  14. Alternative: Logging and Journaling
      Logging can be used to localize synchronous metadata writes and to reduce the work that must be done on recovery. It is universally used in database systems, and it is used for metadata writes in journaling file systems (e.g., Episode).
      Key idea: group each set of related updates into a single log record that can be written to disk atomically ("all-or-nothing").
      • Log records are written to the log file or log disk sequentially: no seeks, and temporal ordering is preserved.
      • Each log record is trailed by a marker (e.g., a checksum) that says "this log record is complete".
      • To recover, scan the log and reapply the updates (sketched below).
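
      A sketch of the record format and recovery scan just described; the fixed-size record and the simple checksum are illustrative assumptions:

         #include <stdint.h>
         #include <stddef.h>
         #include <string.h>

         struct log_record {
             uint32_t len;                   /* payload bytes in this record          */
             unsigned char payload[256];     /* the grouped, related metadata updates */
             uint32_t checksum;              /* trailer: "this record is complete"    */
         };

         static uint32_t checksum(const unsigned char *p, size_t n) {
             uint32_t sum = 0;
             while (n--) sum = sum * 31 + *p++;
             return sum;
         }

         /* Append: fill the record and seal it with the checksum trailer
          * (assumes len <= sizeof payload). */
         void log_seal(struct log_record *r, const void *updates, uint32_t len) {
             r->len = len;
             memcpy(r->payload, updates, len);
             r->checksum = checksum(r->payload, len);
         }

         /* Recovery: scan records in order, reapplying each complete one, and stop
          * at the first record whose trailer does not match (torn by the crash). */
         void log_replay(const struct log_record *log, int nrecords,
                         void (*reapply)(const unsigned char *, uint32_t)) {
             for (int i = 0; i < nrecords; i++) {
                 if (log[i].len > sizeof log[i].payload ||
                     checksum(log[i].payload, log[i].len) != log[i].checksum)
                     break;                  /* incomplete tail record */
                 reapply(log[i].payload, log[i].len);
             }
         }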

  15. Metadata Logging
      Here is one approach to building a fast file system:
      1. Start with FFS with clustering.
      2. Make all metadata writes asynchronous. But that approach cannot survive a failure, so:
      3. Add a supplementary log for modified metadata.
      4. When metadata changes, write the new versions immediately to the log, in addition to the asynchronous writes to "home".
      5. If the system crashes, recover by scanning the log. This is much faster than scavenging (fsck) for large volumes.
      6. If the system does not crash, discard the log.

  16. The Nub of WAFL
      WAFL's consistency points allow it to buffer writes and push them out in a batch:
      • deferred, clustered allocation
      • batched writes
      • localized writes
      Indirection through the metadata "tree" allows it to write data wherever convenient: the tree can point anywhere.
      • Maximize the benefit of batching writes in consistency points.
      • Also allow multiple copies of a given piece of metadata, for snapshots.

  17. SnapMirror
      Is it research? What makes it interesting/elegant? What are the technology trends that motivate SnapMirror, and WAFL before it? Why is disaster recovery so important now? How does WAFL make mirroring easier? If a mirror fails, what is lost? Can both mirrors operate at the same time?

  18. Mirroring
      Structural issue: mirroring support can be built at
      • the application level,
      • the FS level, or
      • the block storage level (e.g., a RAID unit).
      Who has the information? What has changed? What has been deallocated?

  19. What Has Changed?
      Given a snapshot X, WAFL can ask: is block B allocated in snapshot X? Given a snapshot X and a later snapshot Y, WAFL can ask: which blocks of Y should be sent to the mirror? Comparing the per-block allocation bits of X and Y (the 2x2 grid on the slide):
      • in X = 0, in Y = 1: added (send to the mirror)
      • in X = 1, in Y = 0: deleted
      • in X = 1, in Y = 1: unchanged
      • in X = 0, in Y = 0: unused
      (A bitmap-diff sketch follows.)
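
      A sketch of that comparison, assuming each snapshot keeps one allocation bit per block packed into words (the representation is an assumption, not WAFL's on-disk format):

         #include <stdint.h>

         #define MAP_WORDS 1024              /* 32 * 1024 blocks in this toy example */

         /* A block allocated in Y but not in X was written after X was taken, so it
          * must be sent.  A block allocated in both is unchanged, because WAFL never
          * overwrites a block that is still referenced by any snapshot. */
         void blocks_to_send(const uint32_t x[MAP_WORDS],
                             const uint32_t y[MAP_WORDS],
                             uint32_t send[MAP_WORDS])
         {
             for (int i = 0; i < MAP_WORDS; i++)
                 send[i] = y[i] & ~x[i];     /* the "added" cells of the grid above */
         }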

  20. Details
      SnapMirror names disk blocks: why? What are the implications? What if a mirror fails? What is lost? How do we keep the mirror self-consistent? How does the no-overwrite policy of WAFL help SnapMirror? What are the strengths and weaknesses of implementing this functionality above or below the file system? Does this conclusion depend on other details of WAFL? What can we conclude from the experiments?

  21. FFS Cylinder Groups
      FFS defines cylinder groups as the unit of disk locality, and it factors locality into allocation choices.
      • Typical: thousands of cylinders, dozens of groups.
      • Strategy: place "related" data blocks in the same cylinder group whenever possible, since seek latency is roughly proportional to seek distance.
      • Smear large files across groups: place a run of contiguous blocks in each group.
      • Reserve inode blocks in each cylinder group. This allows inodes to be allocated close to their directory entries and close to their data blocks (for small files).

  22. FFS Allocation Policies
      1. Allocate file inodes close to their containing directories (sketched below).
         • For mkdir, select a cylinder group with a more-than-average number of free inodes.
         • For creat, place the inode in the same group as the parent.
      2. Concentrate related file data blocks in cylinder groups. Most files are read and written sequentially.
         • Place the initial blocks of a file in the same group as its inode. How should we handle directory blocks?
         • Place adjacent logical blocks in the same cylinder group: logical block n+1 goes in the same group as block n, but switch to a different group for each indirect block.
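
      A sketch of the two inode-placement rules in item 1; the cylinder-group summary structure is a simplified assumption, not the real FFS layout:

         struct cg_summary { int free_inodes; int free_blocks; };

         /* mkdir: spread directories out by picking a group with a more-than-average
          * number of free inodes (ties broken by the most free inodes here). */
         int pick_group_for_mkdir(const struct cg_summary *cg, int ngroups)
         {
             long total = 0;
             for (int i = 0; i < ngroups; i++) total += cg[i].free_inodes;
             int avg = (int)(total / ngroups), best = -1;

             for (int i = 0; i < ngroups; i++)
                 if (cg[i].free_inodes >= avg &&
                     (best < 0 || cg[i].free_inodes > cg[best].free_inodes))
                     best = i;
             return best;
         }

         /* creat: keep a file near its directory by using the parent's group if it
          * still has free inodes; otherwise fall back to the mkdir policy. */
         int pick_group_for_creat(const struct cg_summary *cg, int ngroups, int parent)
         {
             if (cg[parent].free_inodes > 0)
                 return parent;
             return pick_group_for_mkdir(cg, ngroups);
         }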

  23. Disk Hardware (4)
      (Figure: RAID levels 3 through 5; backup and parity drives are shaded.)
