CS 5460: Operating Systems
Lecture 17: Intro to File Systems (Ch. 10)
Important From Last Time
Page replacement algorithms
– Optimal page replacement strategy evicts the page used farthest in the future
– LRU is a decent approximation of optimal
– Clock / second chance algorithm is a single-bit approximation of LRU; works well for most workloads
Thrashing happens when working sets do not fit into RAM
– Response: Swap out entire processes
– Last resort: Start killing processes
Copy on write optimizations
Memory-mapped file optimizations
Filesystem Layers
User’s viewpoint:
– Objects: Files, directories, bytes
– Operations: Create, read, write, delete, rename, move, seek, set attributes
Physical viewpoint:
– Objects: Sectors, tracks, disks
– Operations: Seek, read block, write block
User ↔ OS layer
– User library hides many details
– OS can directly read/write user data
OS ↔ Hardware layer
– IO registers
– Interrupts
– DMA
[Figure: layer diagram. User apps and the user library call Open() | Close() | Read() | Write() and enter the OS via a trap; the OS issues Seek() | ReadBlk() | WriteBlk() to the disk hardware through I/O regs, DMA, and interrupts.]
Typical Disk Organization
Coated with magnetic material that encodes bits
– Capacity increases come from improvements in bit density
Logically divided into:
– Spindles: individual disks
– Tracks: rings on a disk
– Sectors: portions of a track
– Cylinders: stacks of tracks
Read/write data (overview):
– Position disk head over track
– Wait for sector to rotate under head
– Read/write data from/to sector
[Figure: disk diagram labeling block/sector, cylinder/track, R/W head, disk arm, rotation, and spindle]
Disk Organization (cont’d)
Disk physics:
– Modern disks spin at 5400, 7200, 10000, and 15000 rpm
– Outside edge of a 3.5” disk moves at over 150 mph (at 15000 rpm)
– Disk head “floats” on very thin cushion of air above platter
» Bernoulli effect used to “fly” as close as possible
» Head crash is exactly that → disk head contacts the surface
Disks organized as stacks of platters:
– Disk heads mounted on “combs” → often heads on both sides
– Separate disk heads moved independently
Disk controller
– Manages all the independent head movements
– Contains RAM to cache data read from or written to disk
– Accepts commands from CPU → responds using DMA/interrupts
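The edge-speed figure above can be checked with a little arithmetic. The sketch below assumes the platter diameter is exactly 3.5 inches (the slide's number) and adds the common rule of thumb, not stated on the slide, that the average rotational wait is half a revolution. The function names are illustrative.

```python
import math

INCHES_PER_MILE = 63_360

def edge_speed_mph(diameter_in, rpm):
    """Linear speed of the platter's outer edge, in miles per hour."""
    inches_per_minute = math.pi * diameter_in * rpm
    return inches_per_minute * 60 / INCHES_PER_MILE

def avg_rotational_latency_ms(rpm):
    """Average wait for a sector, assumed to be half of one rotation (ms)."""
    return 0.5 * 60_000 / rpm

if __name__ == "__main__":
    for rpm in (5400, 7200, 10000, 15000):
        print(rpm, round(edge_speed_mph(3.5, rpm), 1),
              round(avg_rotational_latency_ms(rpm), 2))
```

Under these assumptions only the 15000 rpm drive exceeds 150 mph at the edge; a 7200 rpm drive comes in at roughly half that.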
Disk Hardware Trends
Eliminating seeks is critical to performance!
Year  Model        Size     Interface  Seek   RPM     Price
2001  ST320011A    20 GB    ATA/IDE    9.0ms  7,200   $92
2001  ST318437LC   18 GB    U2-SCSI    3.6ms  15,000  $329
2005  ST3120814A   120 GB   ATA-100    8.5ms  7,200   $80
2005  ST373453LC   73 GB    U-SCSI     3.6ms  15,000  $263
2007  ST3320820A   320 GB   PATA       4ms    7,200   $99
2007  ST3146855LW  147 GB   Ultra320   2ms    15,000  $313
2012  ST2000DM001  2000 GB  SATA       4.1ms  7,200   $130
2012  ST3600057SS  600 GB   SAS        2ms    15,000  $500
Data: Seagate, NewEgg, dirtcheapdrives.com
How to avoid seeks?
Design file system carefully
Use RAM as a cache for disk
– Once a block is read, cache it as long as possible
– When a block is written to, delay the actual write
Combine hard disk with solid state disk (SSD)
Replace hard disk with SSD
RAM cloud
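The two caching bullets above (keep read blocks around, delay writes) can be sketched as a tiny write-back cache. This is only an illustration: the class name `BlockCache`, the LRU eviction policy, and the dictionary standing in for the disk are all assumptions, not something the slides specify.

```python
from collections import OrderedDict

class BlockCache:
    """Toy write-back block cache with LRU eviction (illustrative only)."""

    def __init__(self, disk, capacity):
        self.disk = disk             # stand-in for the disk: block number -> bytes
        self.capacity = capacity     # maximum number of cached blocks
        self.blocks = OrderedDict()  # block number -> (data, dirty), oldest first
        self.disk_reads = 0
        self.disk_writes = 0

    def read(self, blockno):
        if blockno in self.blocks:             # hit: no disk access needed
            self.blocks.move_to_end(blockno)
            return self.blocks[blockno][0]
        self.disk_reads += 1                   # miss: go to the "disk"
        data = self.disk[blockno]
        self._insert(blockno, data, dirty=False)
        return data

    def write(self, blockno, data):
        # Delay the actual write: only update the cached copy and mark it dirty.
        self._insert(blockno, data, dirty=True)

    def _insert(self, blockno, data, dirty):
        self.blocks[blockno] = (data, dirty)
        self.blocks.move_to_end(blockno)
        while len(self.blocks) > self.capacity:
            victim, (vdata, vdirty) = self.blocks.popitem(last=False)
            if vdirty:                         # dirty blocks must reach disk
                self._write_back(victim, vdata)

    def _write_back(self, blockno, data):
        self.disk[blockno] = data
        self.disk_writes += 1

    def flush(self):
        """Write every dirty block to disk, e.g. before shutdown."""
        for blockno, (data, dirty) in list(self.blocks.items()):
            if dirty:
                self._write_back(blockno, data)
                self.blocks[blockno] = (data, False)
```

The delayed write is what makes the cache fast and also what makes crashes dangerous: anything still marked dirty is lost unless `flush()` runs first.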
What Do File System Users Need?
Persistence: Data persists beyond jobs, crash, …
– Disk provides basic non-volatile storage
– OS can enhance persistence via redundancy
Speed: Fast access to data
– Random access handled efficiently
– OS can enhance performance via file caching
Size: Can store lots of data
Sharing/protection:
– Users can control who/what has access to their data
Ease of use:
– Basic file abstraction (names, offsets, byte streams, …)
– Directories simplify naming and lookup
File System Abstractions
File: Basic container of persistent data
– Unix: flat byte stream
– IBM mainframes: series of records or objects
Directory system: Hierarchical naming relationships
– Directories are special “files” that index other files
– OS exports operations to manage directories indirectly
Common file access patterns:
– Sequential: data processed in order, byte/record at a time
» Example: Compiler reading a source file
– Random access: address blocks of data based on file offset
» Example: Demand paging reads, database searches
– Keyed access: address blocks based on “key” values
» Typically implemented using key-file (hash) -- data-file pairs
Common File System Operations
Data operations:
– Create()
– Delete()
– Open()
– Close()
– Read()
– Write()
– Seek()
Naming operations:
– HardLink()
– SoftLink()
– Rename()
Attribute operations:
– SetAttribute()
– GetAttribute()
Attributes include owner, protection, last accessed
File System Data Structures
Kernel (in-mem) Structures
– Global open file table
– Per-process open file table
– Free (disk) block list
– Free inode list
– File buffer cache: Cached disk blocks
– Inode cache
– Name cache
On-Disk Structures
– Superblock: File system format info
– File: Collection of blocks/bytes
– File descriptor (inode): File metadata
– Directory: Special kind of file
– Free block/inode maps
[Figure: file inode → file contents → disk contents]
Key: Provide this mapping efficiently and safely.
Key In-Memory Data Structures
Open file table: shared by all processes w/ open file
– Open count and “deleted” flag
– Copy of (or pointer to) file’s inode
– Location of file blocks in file buffer cache (see below)
Per-process file table: private for each process
– Pointer to entry in global open file table
– Current position in the file (“seek” pointer)
– Access mode (read, write, read-write)
File buffer cache: cache of file data blocks
– Indexed by file-blocknum pairs (hash structure)
– Used to reduce effective access time of disk operations
– Can hold blocks from user files, directories, file system metadata
Key In-Memory Data Structures
Name cache: cache of recent name lookup results
– Indexed by full filename (hash structure)
– Used to eliminate directory traversals (disk ops) for name lookups
Free space “bitmap”:
– Used to track which blocks on disk are available
Free inode “bitmap”:
– Used to track which file index nodes on disk are available
Superblock: holds key metadata that describes disk
– Physical characteristics: size of disk, size of blocks, …
– Location of free space and free inode “bitmaps”
– Location of inodes
– Multiple copies stored in known location → redundancy
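A free-space or free-inode bitmap can be sketched in a few lines. The version below is a minimal illustration under assumed conventions: bit i set means entry i is in use, and allocation is first-fit from index 0. The class and method names are made up for the example.

```python
class Bitmap:
    """Free-space bitmap: bit i is 1 when block (or inode) i is in use."""

    def __init__(self, nbits):
        self.nbits = nbits
        self.bits = bytearray((nbits + 7) // 8)   # one bit per entry, all free

    def is_used(self, i):
        return bool(self.bits[i // 8] & (1 << (i % 8)))

    def allocate(self):
        """Mark the lowest-numbered free entry as used and return its index."""
        for i in range(self.nbits):
            if not self.is_used(i):
                self.bits[i // 8] |= 1 << (i % 8)
                return i
        raise OSError("no free entries")

    def free(self, i):
        """Mark entry i as available again."""
        if not self.is_used(i):
            raise ValueError(f"entry {i} is already free")
        self.bits[i // 8] &= ~(1 << (i % 8)) & 0xFF
```

One bit per block keeps the structure small: a disk with a million blocks needs only about 128 KB of bitmap, so it is cheap to keep in memory.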
Key On-Disk Data Structures
File descriptor (aka “inode”)
– Link count
– Security attributes: UID, GID, …
– Size
– Access/modified times
– “Pointers” to blocks
– …
Directory file: array of…
– File name (fixed/variable size)
– Inode number
– Length of directory entry
Free block bitmap
Free inode bitmap
Superblock
File descriptor (inode):
ulong links;
uid_t uid;
gid_t gid;
ulong size;
time_t access_time;
time_t modified_time;
addr_t blocklist…;
Directory file:
Filename            inode#
Filename            inode#
REALLYLONGFILENAME  inode#
Filename            inode#
Short               inode#
Naming and Directories
Need a method to “name” files on disk:
– OS wants to use numbers or indices
– Users prefer textual/visual names and hierarchical organization
– Solution: Directories
Naming schemes:
– Simple: One name space for entire disk w/ unique names
– User-based: Each user has a single separate directory (TOPS-10)
– Hierarchical: Tree-structured name space (modern OSes)
» Store directories as special files flagged as “directory file”
» User programs can read directory like normal files
» Only special system calls can modify directory files
» Directory files contain <name, filedesc> pairs
» Special “root” directory
Traversing Directories (Simplified)
How do we locate file descriptor for “/foo/bar”?
– Divide file name into components (e.g., “/”, “foo”, and “bar”).
– Recursively descend directory hierarchy, at each step:
» Load file descriptor of “next” directory file
» Use file descriptor info to locate and load directory file contents
» Scan directory file for matching filename of next component
» If match found → extract file descriptor number from (name, filedesc)
» If no match → lookup failure
How can we speed up this process?
– Name cache
» Probe name cache for longest prefix contained in cache (e.g., “/foo”)
» Start recursive descent using longest prefix as starting point
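The traversal and the name-cache shortcut above can be sketched as follows. Everything here is a simplification for illustration: the "disk" is a dictionary from inode number to either a directory (`{name: inode number}`) or file data, the descent is written as a loop rather than recursion, permission checks are omitted, and the class name `NameResolver` and its `dir_loads` counter are hypothetical.

```python
class NameResolver:
    """Resolve absolute paths to inode numbers, with a prefix name cache."""

    def __init__(self, disk, root_ino):
        self.disk = disk            # inode number -> dict (directory) or bytes (file)
        self.root_ino = root_ino    # root directory lives at a known location
        self.name_cache = {}        # full path -> inode number
        self.dir_loads = 0          # counts simulated directory reads

    def lookup(self, path):
        parts = [p for p in path.split("/") if p]
        # Probe the name cache for the longest prefix it already contains.
        ino, start = self.root_ino, 0
        for n in range(len(parts), 0, -1):
            prefix = "/" + "/".join(parts[:n])
            if prefix in self.name_cache:
                ino, start = self.name_cache[prefix], n
                break
        # Descend one component at a time from that starting point.
        for n in range(start, len(parts)):
            directory = self.disk[ino]      # "load" the directory contents
            self.dir_loads += 1
            if not isinstance(directory, dict) or parts[n] not in directory:
                raise FileNotFoundError(path)   # no match: lookup failure
            ino = directory[parts[n]]
            self.name_cache["/" + "/".join(parts[:n + 1])] = ino
        return ino
```

For example, after one lookup of “/foo/bar”, a second lookup of the same path or of “/foo” is answered from the cache without touching any directory.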
Finding a File’s Inode on Disk
Locate inode for /foo/bar:
1. Find inode for “/”
– Always in known location
2. Read “/” directory into memory
3. Find “foo” entry
» If no match, fail lookup
4. Load “foo” inode from disk
5. Check permissions
» If no permission, fail lookup
6. Load “foo” directory blocks
7. Find “bar” entry
» If no match, fail lookup
8. Load “bar” inode from disk
9. Check permissions
» If no permission, fail lookup
[Figure: “/” inode → “/” directory (foo inode#) → “foo” inode → “foo” directory (bar inode#) → “bar” inode]
Note: Pointers are block/inode numbers, not addresses!
Finding a File’s Blocks on Disk
Conceptually, inode contains table:
– One entry per block in file
– Entry contains physical block address (e.g., platter 3, cylinder 1, sector 26)
– To locate data at offset X, read the block at table index X / block_size (integer division)
Issues → How do we physically implement this table?
– Most files are small
– Most of the disk space is consumed by (relatively few) large files
– Need to efficiently support both sequential and random access
– Want simple inode lookup and management mechanisms
[Figure: table with entries Block Address 0, Block Address 1, …, Block Address N]
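The conceptual table lookup can be written out directly. This is a minimal sketch: the 4096-byte block size, the flat Python list standing in for the inode's table, and the function name `locate` are all assumptions made for the example.

```python
BLOCK_SIZE = 4096  # assumed block size in bytes

def locate(block_table, offset):
    """Map a byte offset in a file to (physical block address, offset in block)."""
    if offset < 0:
        raise ValueError("negative offset")
    index, within = divmod(offset, BLOCK_SIZE)   # index = offset // BLOCK_SIZE
    if index >= len(block_table):
        raise ValueError("offset past end of file")
    return block_table[index], within
```

With a table of `[730, 12, 55]`, offset 8200 falls in the third entry: it maps to physical block 55, 8 bytes in. A flat table like this makes random access a single index operation, but it grows with file size, which is exactly the implementation issue the slide raises.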
File System Operation Details
Create(name)
– Check permissions / quota
– Allocate disk space
– Create file descriptor w/ name, location on disk, attributes
– Add index to file descriptor in directory
– Optional: file type (e.g., Word doc)
» Richer interface
» More complicated implementation
Delete(name)
– Find directory containing file
– Remove filedesc from directory
– Free disk blocks used by file
– Note: Wait until last user closes
File System Operation Details
fid = Open(name, mode)
– Check if file already open. If not:
» Find the file (via a name lookup)
» Copy file descriptor into open file table
– Check protection → abort operation if access not allowed
– Increment open count in global open file table
– Create per-process file table entry
» Add pointer to corresponding entry in system open file table
» Initialize seek pointer to start of file
– Return per-process file table index
Close(fid)
– Remove entry in per-process file table
– Decrement open count in global open file table
» If 0, remove from open file table
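The interplay between the global and per-process tables in Open() and Close() can be sketched like this. It is a toy model under stated assumptions: a dictionary stands in for name lookup, each inode carries a hypothetical `allowed` string of permitted mode letters, and the protection check runs before the entry is installed in the global table so that a denied open leaves nothing behind. The class name `OpenFileTables` is made up.

```python
class OpenFileTables:
    """Toy model of the system-wide and per-process open file tables."""

    def __init__(self, files):
        self.files = files       # name -> inode dict; stand-in for name lookup
        self.system = {}         # inode number -> {"inode": ..., "open_count": n}
        self.per_process = {}    # pid -> list of entries; the list index is the fid

    def open(self, pid, name, mode):
        if name not in self.files:
            raise FileNotFoundError(name)
        inode = self.files[name]
        # Check protection; abort if any requested mode letter is not allowed.
        if not set(mode) <= set(inode["allowed"]):
            raise PermissionError(name)
        # Reuse the global entry if the file is already open, else create it.
        entry = self.system.setdefault(
            inode["ino"], {"inode": inode, "open_count": 0})
        entry["open_count"] += 1
        table = self.per_process.setdefault(pid, [])
        table.append({"system_entry": entry, "offset": 0, "mode": mode})
        return len(table) - 1    # per-process file table index

    def close(self, pid, fid):
        table = self.per_process[pid]
        slot = table[fid]
        if slot is None:
            raise ValueError("fid is not open")
        table[fid] = None        # remove the per-process entry
        entry = slot["system_entry"]
        entry["open_count"] -= 1
        if entry["open_count"] == 0:
            del self.system[entry["inode"]["ino"]]
```

Two processes opening the same file share one global entry (open count 2) but each gets its own seek offset, which is why the offset lives in the per-process table.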
File System Operation Details
Read(fid, offset, size, buffer)
– Random access
– Reads “size” bytes from the file, starting at “offset”, into “buffer”
Read(fid, size, buffer)
– Sequential access
– Reads “size” bytes from current seek offset into “buffer”
– Increments current seek offset by number of bytes read
– May read fewer bytes than were requested
Write(…)
– Analogous to Read()
Seek(fid, offset)
– Sets “seek offset” to specified offset
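The difference between the two Read() forms and the role of the seek offset can be shown with an in-memory stand-in for an open file. This is a sketch, not the slides' interface: the methods return bytes instead of filling a caller-supplied buffer, and the names `OpenFile` and `pread` are assumptions.

```python
class OpenFile:
    """In-memory stand-in for an open file with a seek offset."""

    def __init__(self, data):
        self.data = data
        self.offset = 0                      # current seek offset

    def pread(self, offset, size):
        """Random access: read from an explicit offset; seek offset unchanged."""
        if offset < 0:
            raise ValueError("negative offset")
        return self.data[offset:offset + size]

    def read(self, size):
        """Sequential access: read at the seek offset, then advance it."""
        chunk = self.pread(self.offset, size)
        self.offset += len(chunk)            # advance by bytes actually read
        return chunk

    def seek(self, offset):
        if offset < 0:
            raise ValueError("negative offset")
        self.offset = offset
```

On a 10-byte file, seeking to offset 8 and asking for 4 bytes returns only 2, and the next read returns an empty result; this is the "fewer bytes than requested" case from the list above.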
Important from Today
Key idea: Build hierarchical filesystem abstraction on top of a flat array of blocks
Filesystem goals
– Reads, writes, file management operations must be fast
– Efficient use of storage
– Data is durable in face of OS crashes (and maybe disk crashes)
– Implements OS’s security policy
Next time: How to implement a filesystem