
CS5460: Operating Systems, Lecture 20: File System Reliability



  1. CS5460: Operating Systems, Lecture 20: File System Reliability

  2. File System Optimizations
     • Goal: Reduce or hide expensive disk operations

     Technique             Effect                                       Era
     Disk buffer cache     Eliminates the disk access (on a cache hit)  Modern
     Aggregated disk I/O   Reduces seeks                                Modern
     Prefetching           Overlaps/hides disk access                   Modern
     Disk head scheduling  Reduces seeks                                Historic
     Disk interleaving     Reduces rotational latency                   Historic

  3. Buffer/Page Cache
     • Idea: Keep recently used disk blocks in kernel memory
     • Process reads from a file:
       – If blocks are not in the buffer cache
         » Allocate space in the buffer cache (Q: What do we purge and how?)
         » Initiate a disk read
         » Block the process until disk operations complete
       – Copy data from buffer cache to process memory
       – Finally, system call returns
     • Usually, a process does not see the buffer cache directly
     • mmap() maps buffer cache pages into the process's address space
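The read path above, and one possible answer to "what do we purge", can be sketched as a toy least-recently-used (LRU) cache. This is only an illustration: the `BufferCache` class, its `read_block_from_disk` callback, and the choice of LRU are assumptions, not something the slides prescribe.

```python
from collections import OrderedDict

class BufferCache:
    """Toy LRU cache of disk blocks (hypothetical; not a real kernel API)."""

    def __init__(self, capacity, read_block_from_disk):
        self.capacity = capacity
        self.read_block_from_disk = read_block_from_disk  # stands in for the slow disk read
        self.blocks = OrderedDict()  # block number -> data, least recently used first
        self.misses = 0

    def read(self, block_no):
        if block_no in self.blocks:
            self.blocks.move_to_end(block_no)       # hit: mark as most recently used
            return self.blocks[block_no]
        self.misses += 1
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)         # purge the least recently used block
        data = self.read_block_from_disk(block_no)  # a real process would block here
        self.blocks[block_no] = data
        return data

# A dictionary plays the role of the disk in this sketch.
disk = {n: f"block-{n}".encode() for n in range(8)}
cache = BufferCache(capacity=2, read_block_from_disk=disk.__getitem__)
for n in (0, 1, 0, 2, 1):
    cache.read(n)
```

With a capacity of two blocks, the access pattern 0, 1, 0, 2, 1 misses four times: the re-read of block 0 hits, but block 1 has been purged by the time it is read again.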

  4. Buffer/Page Cache
     • Process writes to a file:
       – If blocks are not in the buffer cache
         » Allocate pages
         » Initiate disk read
         » Block process until disk operations complete
       – Copy written data from process RAM to buffer cache
     • Default: writes create dirty pages in the cache, then the system call returns
       – Data gets written to device in the background
       – What if the file is unlinked before it goes to disk?
     • Optional: Synchronous writes, which go to disk before the system call returns
       – Really slow!
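From user space, the difference between the default write-back behavior and a synchronous write can be sketched with Python's `os.write` and `os.fsync`. The helper name `durable_write` and the file name are made up for this example, and the sketch assumes the write is small enough to complete in one call.

```python
import os
import tempfile

def durable_write(path, data):
    """Write data, returning only after the OS reports it flushed to the device."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, data)  # by default this only dirties pages in the cache
        os.fsync(fd)        # block until those dirty pages reach the device
    finally:
        os.close(fd)

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "example.txt")
    durable_write(path, b"hello")
    with open(path, "rb") as f:
        contents = f.read()
```

Dropping the `os.fsync` call gives the default behavior: the call returns as soon as the data is in the cache, and a crash before the background write-out can lose it.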

  5. Performing Large File I/Os
     • Idea: Try to allocate contiguous chunks of a file in large contiguous regions of the disk
       – Disks have excellent bandwidth, but lousy latency!
       – Amortize expensive seeks over many block reads/writes
     • Question: How?
       – Maintain free block bitmap (cache parts in memory)
       – When you allocate blocks, use a modified “best fit” algorithm, rather than allocating a block at a time (pre-allocate even)
     • Problem: Hard to do this when the disk is full/fragmented
       – Solution A: Keep a reserve (e.g., 10%) available at all times
       – Solution B: Run a disk “defragger” occasionally
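A plain best-fit search over a free-block bitmap might look like the sketch below. The slide only says "modified best fit", so this unmodified version, along with the function names `free_runs` and `allocate_best_fit`, is an assumption for illustration.

```python
def free_runs(bitmap):
    """Yield (start, length) for each run of free (False) blocks."""
    start = None
    for i, used in enumerate(bitmap):
        if not used and start is None:
            start = i
        elif used and start is not None:
            yield start, i - start
            start = None
    if start is not None:
        yield start, len(bitmap) - start

def allocate_best_fit(bitmap, n):
    """Mark n contiguous blocks used in the smallest free run that fits.

    Returns the starting block number, or None if no run is long enough.
    """
    candidates = [(length, start) for start, length in free_runs(bitmap) if length >= n]
    if not candidates:
        return None
    _, start = min(candidates)
    for i in range(start, start + n):
        bitmap[i] = True
    return start

# True = used, False = free. Free runs are (start 1, len 2), (start 4, len 4), (start 9, len 3).
bitmap = [True, False, False, True, False, False, False, False, True, False, False, False]
start = allocate_best_fit(bitmap, 3)
```

The three-block request lands in the run at block 9 rather than splitting the four-block run at block 4, which keeps the larger run available for a later large file. A later five-block request fails outright, which is the fragmentation problem the slide mentions.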

  6. Prefetching
     • Idea: Read blocks from disk ahead of user request
     • Goal: Reduce number of seeks visible to user
       – If a block is read before it is requested → the request hits in the file buffer cache
     [Figure: File System and User timelines, each showing Read 0, Read 1, Read 2]
     • Problem: What blocks should we prefetch?
       – Easy: Detect sequential access and prefetch ahead N blocks
       – Harder: Detect periodic/predictable “random” accesses
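The "easy" case can be sketched as a small detector that waits for a short run of sequential reads before prefetching. The `Prefetcher` class and its `n_ahead` and `threshold` parameters are hypothetical names chosen for this example.

```python
class Prefetcher:
    """Toy read-ahead policy: after `threshold` sequential reads, prefetch n_ahead blocks."""

    def __init__(self, n_ahead=4, threshold=2):
        self.n_ahead = n_ahead
        self.threshold = threshold
        self.last_block = None
        self.run_length = 0

    def on_read(self, block_no):
        """Return the block numbers to prefetch after this read (possibly empty)."""
        if self.last_block is not None and block_no == self.last_block + 1:
            self.run_length += 1
        else:
            self.run_length = 1  # pattern broken: start counting a new run
        self.last_block = block_no
        if self.run_length >= self.threshold:
            return list(range(block_no + 1, block_no + 1 + self.n_ahead))
        return []

p = Prefetcher(n_ahead=2, threshold=2)
decisions = [p.on_read(b) for b in (0, 1, 2, 9)]
```

Reads of blocks 0, 1, 2 trigger read-ahead from the second read onward, and the jump to block 9 resets the detector so no blocks are prefetched for it.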

  7. Fault Tolerance and Reliability

  8. Fault Tolerance
     • What kinds of failures do we need to consider?
       – OS crash, power failure
         » Data not on disk is lost; rarely, partial writes
       – Disk media failure
         » Data on disk corrupted or unavailable
       – Disk controller failure
         » Large swaths of data unavailable temporarily or permanently
       – Network failure
         » Clients and servers cannot communicate (transient failure)
         » Only have access to stale data (if any)
       – … (what else?)

  9. Techniques to Tolerate Failure
     • Careful disk writes and “fsck”
       – Leave disk in recoverable state even if not all writes finish
       – Run “disk check” program to identify/fix inconsistent disk state
     • RAID:
       – Redundant Array of Inexpensive (or Independent) Disks
       – Write each block on more than one independent disk
       – If a disk fails, can recover block contents from non-failed disks
     • Logging
       – Rather than overwrite-in-place, write changes to log file
       – Use two-phase commit to make log updates transactional
     • Clusters
       – Replicate data at the server level
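The logging idea can be sketched with a simplified commit-record scheme: updates are appended to a log, and recovery applies only the updates of transactions whose commit record made it to the log. This is not a full two-phase commit, and the log record layout and the `recover` function are assumptions made for this example.

```python
def recover(log):
    """Replay a log, applying only the writes of transactions that committed."""
    committed = {txn for op, txn, *_ in log if op == "commit"}
    disk = {}
    for op, txn, *rest in log:
        if op == "write" and txn in committed:
            block, data = rest
            disk[block] = data
    return disk

log = [
    ("write", 1, "inode-7", "new file"),
    ("write", 1, "dir-2", "entry -> inode-7"),
    ("commit", 1),
    ("write", 2, "inode-8", "half-created"),  # crash happened before transaction 2 committed
]
state = recover(log)
```

Transaction 1 is applied in full and transaction 2 is dropped in full, which is the all-or-nothing property the log is meant to provide.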

  10. Careful Writes
     • Order writes so that disk state is recoverable
       – Accept that disk contents may be inconsistent or stale
       – Run sanity check program to detect and fix problems
     • Properties that should hold at all times
       – All blocks pointed to are not marked free
       – All blocks not pointed to are marked free
       – No block belongs to more than one file
     • Goal: Avoid major inconsistency
     • Not a goal: Never lose data
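The three properties above are the kind of thing a sanity-check program verifies. A minimal sketch, assuming a toy in-memory model (a `files` dictionary and a `free_bitmap` list, both hypothetical) rather than a real on-disk layout:

```python
def check(files, free_bitmap):
    """Return a list of violations of the three careful-write invariants.

    files: dict mapping file name -> list of block numbers it points to
    free_bitmap: list where free_bitmap[b] is True when block b is marked free
    """
    problems = []
    owners = {}
    for name, blocks in files.items():
        for b in blocks:
            if free_bitmap[b]:
                problems.append(f"block {b} is in use by {name} but marked free")
            if b in owners:
                problems.append(f"block {b} belongs to both {owners[b]} and {name}")
            owners[b] = name
    for b, is_free in enumerate(free_bitmap):
        if not is_free and b not in owners:
            problems.append(f"block {b} is unreferenced but marked used")
    return problems

files = {"a": [0, 1], "b": [1, 2]}
free_bitmap = [False, False, True, False]
problems = check(files, free_bitmap)
```

This example violates each property once: block 1 is shared by two files, block 2 is in use but marked free, and block 3 is marked used although nothing points to it. Only the last case is the harmless kind of inconsistency (a leaked block) that careful writes are willing to tolerate.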

  11. Careful Writes Example
     • To create a file, you must:
       – Allocate and initialize an inode
       – Allocate and initialize some data blocks
       – Modify the directory file of the directory containing the file
       – Modify the directory file’s inode (last modified time, size)
     • In what order should we do these writes?
     • How to add transactional (all or nothing) semantics?
     • How do careful writes interact with optimizations?

  12. Careful Writes Exercise
     • To delete a file, you must:
       – Deallocate the file’s inode
       – Deallocate the file’s disk blocks
       – Modify the directory file of the directory containing the file
       – Update the directory file’s inode
     • In what order should we do these operations?
       – Consider what intermediate states are recoverable via fsck

  13. Soft Update Rules
     • Never point to a block before initializing it
     • Never reuse a block before nullifying pointers to it
     • Never reset the last pointer to a live block before setting a new one
     • Always mark a block's free-bitmap entry as used before making the directory entry point to the block
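One way to see why these orderings matter is to simulate a crash after every individual write and check what is left on disk. The sketch below models file creation with three hypothetical state fields (`inode_marked_used`, `inode_initialized`, `dir_entry`); it illustrates the first and last rules only and is not a real file system layout.

```python
def consistent(state):
    """A directory entry may only point at an inode that is initialized and marked used."""
    if state["dir_entry"] is None:
        return True
    return state["inode_initialized"] and state["inode_marked_used"]

# Ordered writes for creating a file; each step stands for one disk write.
safe_order = [
    ("inode_marked_used", True),  # mark the bitmap entry as used first
    ("inode_initialized", True),  # initialize before anything points to it
    ("dir_entry", "inode-7"),     # publish the pointer last
]

def crash_states(order):
    """Yield the on-disk state after each prefix of the writes (a crash at every point)."""
    state = {"inode_marked_used": False, "inode_initialized": False, "dir_entry": None}
    yield dict(state)
    for key, value in order:
        state[key] = value
        yield dict(state)

safe = all(consistent(s) for s in crash_states(safe_order))
unsafe = all(consistent(s) for s in crash_states(list(reversed(safe_order))))
```

With the safe order, a crash at any point leaves at worst an allocated but unreferenced inode, which a disk check can reclaim. With the reversed order, a crash right after the first write leaves a directory entry pointing at an uninitialized inode.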

  14. Careful Writes: More Exercises
     • To write a file, you must:
       – Modify (and perhaps allocate) the file’s disk blocks
       – Modify the file’s inode (size and last modified time)
       – Maybe, modify indirect block(s)
     • To move a file between directories, you must:
       – Modify the source directory
       – Modify the destination directory
       – Modify the inodes of both directories

  15. RAID
     • Goal: Organize multiple physical disks into a single high-performance, high-reliability logical disk
     [Figure: CPU, I/O bus, RAID controller]
     • Issues to consider:
       – Multiple disks → higher aggregate throughput (more spindles)
       – Multiple disks → (hopefully) independent failure modes
       – Multiple disks → vulnerable to individual disk failures (MTTF)
       – Writing to multiple disks for replication → higher write overhead
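The MTTF concern can be made concrete with a back-of-the-envelope calculation. Assuming disks fail independently with a constant failure rate, the expected time until the first failure among N disks is the single-disk MTTF divided by N. The one-million-hour figure below is a made-up input, not a measurement.

```python
def array_mttf_hours(disk_mttf_hours, n_disks):
    """Expected time to the first disk failure, assuming independent, constant failure rates."""
    return disk_mttf_hours / n_disks

single_disk_mttf = 1_000_000  # hypothetical per-disk MTTF in hours
first_failure = array_mttf_hours(single_disk_mttf, 10)
```

Under these assumptions, a ten-disk array sees its first failure ten times sooner than a single disk would, which is why striping without redundancy lowers reliability.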

  16. Possible Uses of Multiple Disks
     • Striping
       – Spread pieces of a single file across multiple disks
       – Advantages:
         » Can service multiple independent requests in parallel
         » Can service single “large” requests in parallel
       – Issues:
         » Interleave factor
         » How the data is striped across disks
     • Redundancy (replication)
       – Store multiple copies of blocks on independent disks
       – Advantages:
         » Can tolerate partial system failure → How much?
       – Issues:
         » How widely do you want to spread the data?

  17. Types of RAID

     RAID level  Description
     0           Data striping w/o redundancy
     1           Disk mirroring
     2           Parallel array of disks w/ error correcting disk (checksum)
     3           Bit-interleaved parity
     4           Block-interleaved parity
     5           Block-interleaved, distributed parity

  18. RAID Level 0
     [Figure: CPU, I/O bus, RAID controller]
     • Striping
       – Spread contiguous blocks of a file across multiple spindles
       – Simple round-robin distribution
     • Non-redundant
       – No fault tolerance
     • Advantages
       – Higher throughput
       – Larger storage
     • Disadvantages
       – Lower reliability: any drive failure destroys the file system
       – Added cost
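Round-robin striping amounts to a simple address calculation. The sketch below assumes a stripe unit of one block and a hypothetical helper named `raid0_locate`; real arrays usually stripe in larger units.

```python
def raid0_locate(logical_block, n_disks):
    """Map a logical block to (disk index, block offset on that disk), round-robin."""
    return logical_block % n_disks, logical_block // n_disks

layout = [raid0_locate(b, 3) for b in range(6)]
```

On three disks, logical blocks 0, 1, 2 land on disks 0, 1, 2 at offset 0, and blocks 3, 4, 5 land on the same disks at offset 1, so a large sequential request is spread across all spindles.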

  19. RAID Level 1
     [Figure: CPU, I/O bus, RAID controller]
     • Mirroring
       – Write complete copies of all blocks to multiple disks
       – How many copies → how much reliability
     • No striping
       – No added write bandwidth
       – Potential for pipelined reads
     • Advantage:
       – Can tolerate disk failures (“availability”)
     • Disadvantage:
       – High cost (extra disks and RAID controller)
     • Q: How to recover from drive failure?

  20. RAID Level 5
     [Figure: CPU, I/O bus, RAID controller]
     • Striping + distributed parity (no mirroring)
       – Spread contiguous blocks of a file across multiple spindles
       – Adds parity information
         » Example: XOR of other blocks
     • Combines the striping of RAID 0 with the redundancy goal of RAID 1, using parity instead of full copies
     • Advantages
       – Higher throughput
       – Lower cost (than level 1)
       – Any single disk can fail
     • Disadvantages
       – More complexity in RAID controller
       – Slower recovery time than RAID 1
     • RAID 6: two parity blocks per stripe, so any two disks can fail
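The XOR parity example can be worked through directly: the parity block is the XOR of the data blocks in a stripe, and any one missing block is the XOR of everything that survives. The helper name `xor_blocks` and the two-byte block contents are made up for this sketch.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe: three data blocks plus their parity block.
data = [b"\x0f\xf0", b"\x33\x33", b"\x55\xaa"]
parity = xor_blocks(data)

# The disk holding data[1] fails: rebuild its block from the survivors plus parity.
rebuilt = xor_blocks([data[0], data[2], parity])
```

Rebuilding requires reading every surviving disk in the stripe, which is one reason recovery is slower than copying from a mirror in RAID 1.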

  21. RAID Tradeoffs
     • Space efficiency
     • Minimum number of disks
     • Number of simultaneous failures tolerated
     • Read performance
     • Write performance
     • Time to recover from a failed disk
     • Complexity of controller
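Space efficiency is the easiest of these tradeoffs to quantify. The sketch below assumes equally sized disks, treats RAID 1 as an n-way mirror (every disk holds a full copy), and uses a hypothetical function name `usable_fraction`.

```python
def usable_fraction(level, n_disks):
    """Fraction of raw capacity available for data, for equally sized disks."""
    if level == 0:
        return 1.0                      # no redundancy at all
    if level == 1:
        return 1 / n_disks              # assumption: every disk holds a full copy
    if level == 5:
        return (n_disks - 1) / n_disks  # one disk's worth of parity
    if level == 6:
        return (n_disks - 2) / n_disks  # two disks' worth of parity
    raise ValueError(f"unsupported RAID level: {level}")

table = {level: usable_fraction(level, 4) for level in (0, 1, 5, 6)}
```

For four disks this gives 100% usable space for RAID 0, 25% for a four-way mirror, 75% for RAID 5, and 50% for RAID 6, against 0, 3, 1, and 2 tolerated disk failures respectively.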

  22. RAID Discussion
     • RAID can be implemented by hardware or software
       – Hardware RAID implemented by RAID controller
         » Often supports hot swapping using hot spare disks
         » Not totally clear that cheap RAID HW is worth it
       – Software RAID implemented by OS kernel (device driver)
     • Multiple parity disks can handle multiple errors
     • Nested RAID
       – Can use a RAID array as a “disk” in a higher level RAID
         » RAID 1+0: RAID 0 (striping) run across RAID 1 (mirrored) arrays
         » RAID 0+1: RAID 1 (mirroring) run across RAID 0 (striped) arrays
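The two nestings differ in how many double failures they survive, which can be checked by enumeration. The sketch assumes four disks numbered 0 to 3, grouped as (0, 1) and (2, 3) at the inner level in both layouts; the function names are hypothetical.

```python
from itertools import combinations

def survives_raid10(failed):
    """RAID 1+0: mirrored pairs (0,1) and (2,3), striped together.

    Data is lost only if both disks of some mirrored pair fail.
    """
    return not ({0, 1} <= failed or {2, 3} <= failed)

def survives_raid01(failed):
    """RAID 0+1: striped sets (0,1) and (2,3), mirrored.

    Data survives only if at least one striped set has no failed disk.
    """
    return not (failed & {0, 1}) or not (failed & {2, 3})

pairs = [set(p) for p in combinations(range(4), 2)]
raid10_ok = sum(survives_raid10(f) for f in pairs)
raid01_ok = sum(survives_raid01(f) for f in pairs)
```

Of the six possible two-disk failures, RAID 1+0 survives four and RAID 0+1 survives two under this model.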

  23. RAID Discussion
     • What are the risks due to purchasing a large number of disks at the same time for use in a RAID?
     • Hot spares can be useful
     • What does a RAID look like to the file system code?
     • RAID summary
       – Tolerates failed disks
       – May not deal well with correlated failure modes
       – Can improve sustained transfer rate
       – Does not improve individual seek latencies
