File System Reliability Main Points Problem posed by - PowerPoint PPT Presentation

File System Reliability

Main Points
• Problem posed by machine/disk failures
• Transaction concept
• Reliability
  – Careful sequencing of file system operations
  – Copy-on-write (WAFL, ZFS)
  – Journalling (NTFS, Linux ext4)
  – Log structure (flash storage)
• Availability
  – RAID

File System Reliability
• What can happen if the disk loses power or the machine's software crashes?
  – Some operations in progress may complete
  – Some operations in progress may be lost
  – Overwrite of a block may only partially complete
• File system wants durability (as a minimum!)
  – Data previously stored can be retrieved (maybe after some recovery step), regardless of failure

Storage Reliability Problem
• A single logical file operation can involve updates to multiple physical disk blocks
  – inode, indirect block, data block, bitmap, …
  – With remapping, a single update to a physical disk block can require multiple (even lower-level) updates
• At a physical level, operations complete one at a time
  – Want concurrent operations for performance
• How do we guarantee consistency regardless of when a crash occurs?
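
To make the problem concrete, here is a minimal sketch (a toy model, not how any real file system lays out its blocks) of one logical append that becomes three separate physical writes: data block, free-block bitmap, inode. The names `Disk`, `append_writes`, `check`, and `crash_after` are hypothetical. Crashing after k writes shows that the state left behind depends on the order chosen: bitmap-before-inode can only leak a block, while inode-before-bitmap leaves a block in use but marked free, where it could be handed out twice.

```python
from dataclasses import dataclass, field

@dataclass
class Disk:
    """Toy on-disk state (hypothetical layout): bitmap, one file's inode, data blocks."""
    bitmap: set = field(default_factory=set)   # allocated block numbers
    inode: list = field(default_factory=list)  # the file's block pointers
    data: dict = field(default_factory=dict)   # block number -> contents

def append_writes(disk, blk, payload):
    """The separate physical writes behind one logical 'append a block'."""
    return [
        lambda: disk.data.__setitem__(blk, payload),  # 0: write data block
        lambda: disk.bitmap.add(blk),                 # 1: mark block allocated
        lambda: disk.inode.append(blk),               # 2: point inode at block
    ]

def check(disk):
    """Report the inconsistencies an fsck-like scan would find."""
    problems = []
    for b in disk.inode:
        if b not in disk.bitmap:
            problems.append(f"block {b} in use but marked free")
    for b in sorted(disk.bitmap):
        if b not in disk.inode:
            problems.append(f"block {b} allocated but unreferenced (leak)")
    return problems

def crash_after(k, order):
    """Run the writes in `order`, stop after k of them (simulated crash), then check."""
    disk = Disk()
    writes = append_writes(disk, 7, b"hello")
    for i in order[:k]:
        writes[i]()
    return check(disk)

if __name__ == "__main__":
    for order in ([0, 1, 2], [0, 2, 1]):
        for k in range(4):
            print(order, k, crash_after(k, order))
```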

Transaction Concept
• A transaction is a group of operations
  – Atomic: operations appear to happen as a group, or not at all (at the logical level)
    • At the physical level, only a single disk/flash write is atomic
    • (a write to an empty disk/flash block, with a consistency check)
  – Durable: operations that complete stay completed
    • Future failures do not corrupt previously stored data
  – Isolation: other transactions do not see the results of earlier transactions until they are committed
  – Consistency: sequential memory model
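
One way to build group atomicity from the single-write atomicity noted above is a redo log with a commit record, the journalling idea from the Main Points list. The sketch below is a toy under that assumption (the classes `Storage`, `Transaction`, and the function `recover` are hypothetical): updates are logged first, one commit record is the single atomic step, and recovery replays only transactions whose commit record made it to the log.

```python
class Storage:
    """Toy persistent state: home-location blocks plus an append-only log."""
    def __init__(self):
        self.blocks = {}   # block number -> value (home locations)
        self.log = []      # log records that survived on "disk"

class Transaction:
    def __init__(self, storage, txid):
        self.storage, self.txid, self.writes = storage, txid, {}

    def write(self, block, value):
        self.writes[block] = value              # buffered in memory only

    def commit(self, crash_before_commit_record=False):
        for block, value in self.writes.items():
            self.storage.log.append(("update", self.txid, block, value))
        if crash_before_commit_record:
            return                              # simulated crash: no commit record
        # The single atomic write that makes the whole group take effect.
        self.storage.log.append(("commit", self.txid))
        for block, value in self.writes.items():
            self.storage.blocks[block] = value  # write-back; may be lost in a crash

def recover(storage):
    """Redo updates of committed transactions; ignore uncommitted ones."""
    committed = {rec[1] for rec in storage.log if rec[0] == "commit"}
    for rec in storage.log:
        if rec[0] == "update" and rec[1] in committed:
            _, _, block, value = rec
            storage.blocks[block] = value
```

Because replaying an update just rewrites the same value, recovery can safely run more than once if it is itself interrupted.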

Reliability Approach #1: Careful Ordering
• Sequence operations in a specific order
  – Careful design to allow the sequence to be interrupted safely
• Post-crash recovery
  – Read data structures to see if there were any operations in progress
  – Clean up/finish as needed
• Approach taken in FAT, FFS (fsck), and many application-level recovery schemes (e.g., Word)

FAT: Append Data to File
• Allocate data block
• Write data
• Write new MFT entry to point to data block
• Update file tail to point to new MFT entry
• Update access time at head of file
[Figure: MFT entries 0-20 mapped to data blocks; file 9 occupies blocks 9, 10, 11, 3, and 18 (its blocks 0-4); file 12 occupies blocks 12 and 16 (its blocks 0-1).]

FAT: Append Data to File
Normal operation:
• Allocate data block
• Write data
• Write new MFT entry to point to data block
• Update file tail to point to new MFT entry
• Update access time at head of file
Recovery:
• Scan MFT
• If entry is unlinked, mark as unused
• If access time is incorrect, update
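
A minimal sketch of the append ordering and its recovery scan, assuming a toy table where each MFT entry holds the index of the next entry in the file, `EOF`, or `FREE` (so the table doubles as the allocation map). The encodings and the functions `tail`, `append_block`, and `recover` are hypothetical, and the access-time update is left out. A crash after the new entry is written but before the old tail is linked leaves an allocated, unreachable entry, which the scan returns to the free pool.

```python
FREE, EOF = None, -1   # hypothetical encodings: free entry, end of chain

def tail(fat, head):
    """Follow the chain from `head` to the file's last MFT entry."""
    i = head
    while fat[i] != EOF:
        i = fat[i]
    return i

def append_block(fat, data, head, payload, crash_after=3):
    """Append one block in the slide's order; only the first `crash_after` writes land."""
    new = fat.index(FREE)              # allocate: pick a free entry (in memory only)
    writes = [
        (data, new, payload),          # 1. write the data
        (fat, new, EOF),               # 2. write new MFT entry (end of chain)
        (fat, tail(fat, head), new),   # 3. link the old tail to the new entry
    ]
    for table, index, value in writes[:crash_after]:
        table[index] = value

def recover(fat, heads):
    """Scan the MFT; mark allocated entries that no file reaches as unused."""
    reachable = set()
    for head in heads:
        node = head
        while node != EOF:
            reachable.add(node)
            node = fat[node]
    for i, entry in enumerate(fat):
        if entry is not FREE and i not in reachable:
            fat[i] = FREE
```

Writing the data and the new entry before linking means every crash point leaves either the old file intact or the new file complete, plus at most one leaked entry for recovery to reclaim.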

FAT: Create New File
Normal operation:
• Allocate data block
• Write MFT entry to point to data block
• Update directory with file name -> file number
  – What if directory spans multiple disk blocks?
• Update modify time for directory
Recovery:
• Scan MFT
• If any unlinked files (not in any directory), delete
• Scan directories for missing update times

FFS: Create a File
Normal operation:
• Allocate data block
• Write data block
• Allocate inode
• Write inode block
• Update bitmap of free blocks
• Update directory with file name -> file number
• Update modify time for directory
Recovery:
• Scan inode table
• If any unlinked files (not in any directory), delete
• Compare free block bitmap against inode trees
• Scan directories for missing update/access times
Time proportional to size of disk
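
The recovery column can be sketched as a toy fsck pass, assuming a single directory mapping names to inode numbers, inodes stored as lists of block numbers, and the bitmap as a set of allocated blocks (all hypothetical simplifications, with no indirect blocks or timestamps). It deletes inodes no directory names, rebuilds the bitmap from the surviving inode trees, and reports each mismatch. Every inode and every block pointer is visited, which is why recovery time grows with the size of the disk.

```python
def fsck(inodes, directory, bitmap):
    """Toy FFS-style check. Returns (kept_inodes, rebuilt_bitmap, report)."""
    report = []
    live = set(directory.values())             # inodes named by some directory entry
    for inum in sorted(inodes):
        if inum not in live:
            report.append(f"inode {inum} unlinked: deleted")
    kept = {inum: blocks for inum, blocks in inodes.items() if inum in live}
    # Compare the free-block bitmap against the blocks the inode trees reference.
    in_use = {b for blocks in kept.values() for b in blocks}
    for b in sorted(in_use - bitmap):
        report.append(f"block {b} in use but marked free: fixed")
    for b in sorted(bitmap - in_use):
        report.append(f"block {b} marked used but unreferenced: freed")
    return kept, in_use, report

if __name__ == "__main__":
    # Crash during create: inode 5 and its bitmap bit were written,
    # but the directory entry was not.
    print(fsck({2: [10, 11], 5: [40]}, {"a.txt": 2}, {10, 11, 40}))
```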

FFS: Move a File
Normal operation:
• Remove filename from old directory
• Add filename to new directory
Recovery:
• Scan all directories to determine set of live files
• Consider files with valid inodes and not in any directory
  – New file being created?
  – File move?
  – File deletion?
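
A small sketch of why the recovery scan has to guess, assuming directories are dicts from name to inode number and a move is the slide's two separate directory writes (the functions `move` and `scan` are hypothetical). Crashing between the writes leaves an inode with no directory entry if the remove goes first, or one named by two directories if the add goes first. From the orphan alone, recovery cannot tell a half-done move from a half-done create or delete.

```python
def move(dirs, name, src, dst, remove_first, crash_after=2):
    """Perform the two directory writes of a move; only `crash_after` of them land."""
    inum = dirs[src][name]
    remove = lambda: dirs[src].pop(name)             # remove name from old directory
    add = lambda: dirs[dst].__setitem__(name, inum)  # add name to new directory
    steps = [remove, add] if remove_first else [add, remove]
    for step in steps[:crash_after]:
        step()

def scan(dirs, valid_inodes):
    """Recovery scan: count directory entries per valid inode."""
    links = {inum: 0 for inum in valid_inodes}
    for entries in dirs.values():
        for inum in entries.values():
            links[inum] += 1
    orphans = sorted(i for i, n in links.items() if n == 0)  # in no directory
    doubles = sorted(i for i, n in links.items() if n > 1)   # in two directories
    return orphans, doubles
```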

FFS: Move and Grep
Process A: move file from x to y
  mv x/file y/
Process B: grep across x and y
  grep x/* y/*
Will grep always see the contents of the file?
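
The question can be explored with a deterministic interleaving sketch. It assumes the move is the two-step remove-then-add from the previous slide and treats each of grep's directory scans (glob plus read) as one indivisible step; both are simplifications, and `run` is a hypothetical helper. A schedule is the order in which the two processes take their steps.

```python
def run(schedule):
    """Interleave mv and grep steps per `schedule`; return the contents grep saw."""
    dirs = {"x": {"file": "hello"}, "y": {}}
    seen = []
    mv_steps = iter([
        lambda: dirs["x"].pop("file"),                   # remove name from x
        lambda: dirs["y"].__setitem__("file", "hello"),  # add name to y
    ])
    grep_steps = iter([
        lambda: seen.extend(dirs["x"].values()),         # grep x/*
        lambda: seen.extend(dirs["y"].values()),         # grep y/*
    ])
    for who in schedule:
        next(mv_steps if who == "mv" else grep_steps)()
    return seen

if __name__ == "__main__":
    for s in (["grep", "grep", "mv", "mv"],
              ["mv", "grep", "grep", "mv"],
              ["grep", "mv", "mv", "grep"]):
        print(s, run(s))
```

Under these assumptions the answer is no: with mv's remove before both scans and its add after them, grep sees nothing, and with both mv steps between the two scans it sees the file twice.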

Application Save File
Normal operation:
• Write name of each open file to app folder
• Write changes to backup file
• Rename backup file to be file (atomic operation provided by file system)
• Delete list in app folder on clean shutdown
Recovery:
• On startup, see if any files were left open
• If so, look for backup file
• If so, ask user to compare versions
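
The write-backup-then-rename step can be sketched with Python's standard library: `os.fsync` forces the backup's contents to disk, and `os.replace` renames it over the original, which POSIX file systems perform atomically. The `.bak` naming and the helpers `save_atomically` and `interrupted_save_exists` are assumptions for illustration, and the list of open files in the app folder is omitted. The directory `fsync` is POSIX-only, hence the `O_DIRECTORY` guard.

```python
import os
import tempfile

def save_atomically(path, text):
    """Save so that a crash leaves either the old or the new version of `path`."""
    backup = path + ".bak"                   # hypothetical backup-file name
    with open(backup, "w", encoding="utf-8") as f:
        f.write(text)
        f.flush()
        os.fsync(f.fileno())                 # backup contents durable before rename
    os.replace(backup, path)                 # rename backup to be the file
    if hasattr(os, "O_DIRECTORY"):           # POSIX: make the rename itself durable
        fd = os.open(os.path.dirname(os.path.abspath(path)),
                     os.O_RDONLY | os.O_DIRECTORY)
        try:
            os.fsync(fd)
        finally:
            os.close(fd)

def interrupted_save_exists(path):
    """On startup: a leftover backup means a save stopped before the rename."""
    return os.path.exists(path + ".bak")

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        p = os.path.join(d, "doc.txt")
        save_atomically(p, "draft 1")
        save_atomically(p, "draft 2")
        with open(p, encoding="utf-8") as f:
            print(f.read(), interrupted_save_exists(p))
```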

Careful Ordering
• Pros
  – Works with minimal support in the disk drive
  – Works for most multi-step operations
• Cons
  – Can require time-consuming recovery after a failure
  – Difficult to reduce every operation to a safely interruptible sequence of writes
  – Difficult to achieve consistency when multiple operations occur concurrently

Reliability Approach #2: Copy on Write/Write Anywhere
• To update the file system, write a new version of the file system containing the update
  – Never update in place
  – Reuse existing unchanged disk blocks
• Seems expensive! But
  – Updates can be batched
  – Almost all disk writes can occur in parallel
• Approach taken in network file server appliances (WAFL, ZFS)
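
A minimal copy-on-write sketch, assuming a flattened hierarchy (root block mapping file names to inodes, inodes listing data-block addresses) instead of WAFL's inode file and indirect blocks; `CowStore` and its methods are hypothetical. Blocks are written once at fresh addresses, unchanged blocks are reused, and the only in-place write is the final swap of the root pointer, so the previous root remains a consistent snapshot.

```python
import itertools

class CowStore:
    """Toy write-anywhere store: blocks are never overwritten; only `root` changes."""
    def __init__(self):
        self.blocks = {}                     # address -> contents
        self._next = itertools.count()       # next free address
        self.root = self.write({})           # root pointer (the fixed location)

    def write(self, contents):
        addr = next(self._next)              # always a new location
        self.blocks[addr] = contents
        return addr

    def _commit(self, name, inode_addr):
        new_root = dict(self.blocks[self.root])   # copy root, change one slot
        new_root[name] = inode_addr
        self.root = self.write(new_root)          # single in-place update: commit

    def create(self, name, chunks):
        inode = self.write(tuple(self.write(c) for c in chunks))
        self._commit(name, inode)

    def update_block(self, name, index, chunk):
        ptrs = list(self.blocks[self.blocks[self.root][name]])
        ptrs[index] = self.write(chunk)           # new data block
        self._commit(name, self.write(tuple(ptrs)))  # new inode, then new root

    def read(self, name, root=None):
        root = self.root if root is None else root
        inode = self.blocks[self.blocks[root][name]]
        return b"".join(self.blocks[a] for a in inode)

if __name__ == "__main__":
    s = CowStore()
    s.create("f", [b"aa", b"bb"])
    snapshot = s.root
    s.update_block("f", 1, b"BB")
    print(s.read("f"), s.read("f", root=snapshot))
```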

Copy on Write/Write Anywhere
[Figure: root inode at a fixed location; the inode file's indirect blocks, the inode array slots (in the inode file), each file's indirect blocks, and the data blocks can be written anywhere.]

Copy on Write/Write Anywhere
[Figure: the same hierarchy (root inode, inode file's indirect blocks, inode array slots, file's indirect blocks, data blocks), highlighting an update to the last block of a file.]

Copy on Write Batch Update
[Figure: a batch of updates writes a new root inode, new indirect nodes of the inode file, a new block of the inode file, a new indirect node of the data file, and new data blocks alongside the old ones.]
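
Why batching pays off can be worked out by counting copied blocks. The sketch assumes a hypothetical tree (root inode, inode-file block, file inode, indirect block, data block) with a made-up fanout of 4; `dirty_path` and `new_blocks` are illustrative names. Flushed one at a time, every update copies its whole path to the root; in a batch, each dirty ancestor is copied once and shared.

```python
def dirty_path(file, index, fanout=4):
    """Blocks that must be copied when data block `index` of inode `file` changes."""
    return [
        ("root",),                             # root inode
        ("inode-file block", file // fanout),  # block of the inode file holding the inode
        ("inode", file),                       # the file's inode
        ("indirect", file, index // fanout),   # indirect block pointing at the data
        ("data", file, index),                 # the data block itself
    ]

def new_blocks(updates, batched):
    """Number of new blocks written for a list of (file, block_index) updates."""
    if batched:
        # One new copy of each dirty block, shared by all updates in the batch.
        return len({blk for f, i in updates for blk in dirty_path(f, i)})
    # One version per update: each copies its full path again.
    return sum(len(dirty_path(f, i)) for f, i in updates)

if __name__ == "__main__":
    ups = [(9, 0), (9, 1), (9, 2), (12, 0)]
    print(new_blocks(ups, batched=False), new_blocks(ups, batched=True))
```

For these four updates the count drops from 20 to 11 (1 root, 2 inode-file blocks, 2 inodes, 2 indirect blocks, 4 data blocks), and since none of the new blocks overwrite live data, they can all be written in parallel.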

FFS Update in Place
[Figure: appending a new data block updates the indirect block, the inode, and the free-block bitmap in place.]

WAFL Write Location
[Figure: the new data block and updated copies of the indirect block, inode, and bitmap are written to new locations; the old inode, old indirect block, and old bitmap remain on disk.]
