

SLIDE 1

CS 5460: Operating Systems

Lecture 17: Intro to File Systems (Ch. 10)

SLIDE 2

Important From Last Time

• Page replacement algorithms
  – Optimal page replacement strategy evicts the page that will be used farthest in the future
  – LRU is a decent approximation of optimal
  – Clock / second chance algorithm is a single-bit approximation of LRU; works well for most workloads
• Thrashing happens when working sets do not fit into RAM
  – Response: Swap out entire processes
  – Last resort: Start killing processes
• Copy-on-write optimizations
• Memory-mapped file optimizations

SLIDE 3

Filesystem Layers

User’s viewpoint:
  – Objects: Files, directories, bytes
  – Operations: Create, read, write, delete, rename, move, seek, set attributes

Physical viewpoint:
  – Objects: Sectors, tracks, disks
  – Operations: Seek, read block, write block

User ↔ OS layer:
  – User library hides many details
  – OS can directly read/write user data

OS ↔ Hardware layer:
  – I/O registers
  – Interrupts
  – DMA

[Figure: layer diagram with labels User Apps, User Library, Trap, Open() | Close() | Read() | Write(), Seek() | ReadBlk() | WriteBlk(), I/O regs, DMA, Interrupts, Disk Hardware]

SLIDE 4

Typical Disk Organization

• Coated with magnetic material that encodes bits
  – Capacity increases come from improvements in bit density
• Logically divided into:
  – Spindles: individual disks
  – Tracks: rings on a disk
  – Sectors: portions of a track
  – Cylinders: stacks of tracks
• Read/write data (overview):
  – Position disk head over track
  – Wait for sector to rotate under head
  – Read/write data from/to sector

[Figure: disk diagram with labels Block/sector, Cylinder/Track, R/W head, Disk arm, Rotation, Spindle]

SLIDE 5

Disk Organization (cont’d)

• Disk physics:
  – Modern disks spin at 5400, 7200, 10000, and 15000 rpm
  – Outside edge of 3.5” disk spins at over 150 mph
  – Disk head “floats” on very thin cushion of air above platter
    » Bernoulli effect used to “fly” as close as possible
    » Head crash is exactly that → disk head contacts the surface
• Disks organized as stacks of platters:
  – Disk heads mounted on “combs” → often heads on both sides
  – Separate disk heads moved independently
• Disk controller
  – Manages all the independent head movements
  – Contains RAM to cache data moving from/to disk
  – Accepts commands from CPU → responds using DMA/interrupts
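The disk-physics figures above can be checked with a little arithmetic. The sketch below is only illustrative: it assumes the platter itself is a full 3.5 inches across (real platters are somewhat smaller) and that, on average, a request waits half a revolution for its sector. The function names are made up for this example.

```python
import math

INCHES_PER_MILE = 63_360


def edge_speed_mph(diameter_in, rpm):
    """Linear speed of the platter's outer edge, in miles per hour."""
    circumference_in = math.pi * diameter_in
    inches_per_hour = circumference_in * rpm * 60
    return inches_per_hour / INCHES_PER_MILE


def avg_rotational_latency_ms(rpm):
    """Average wait for a sector: half of one full revolution."""
    ms_per_revolution = 60_000 / rpm
    return ms_per_revolution / 2


for rpm in (5400, 7200, 10000, 15000):
    print(rpm, round(edge_speed_mph(3.5, rpm), 1), round(avg_rotational_latency_ms(rpm), 2))
```

Under these assumptions the "over 150 mph" figure corresponds to the 15000 rpm case; slower spindles have proportionally slower edges and longer rotational waits.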

SLIDE 6

Disk Hardware Trends

• Eliminating seeks is critical to performance!

Year  Model        Size     Interface  Seek    RPM     Price
2001  ST320011A    20 GB    ATA/IDE    9.0 ms  7,200   $92
2001  ST318437LC   18 GB    U2-SCSI    3.6 ms  15,000  $329
2005  ST3120814A   120 GB   ATA-100    8.5 ms  7,200   $80
2005  ST373453LC   73 GB    U-SCSI     3.6 ms  15,000  $263
2007  ST3320820A   320 GB   PATA       4 ms    7,200   $99
2007  ST3146855LW  147 GB   Ultra320   2 ms    15,000  $313
2012  ST2000DM001  2000 GB  SATA       4.1 ms  7,200   $130
2012  ST3600057SS  600 GB   SAS        2 ms    15,000  $500

Data: Seagate, NewEgg, dirtcheapdrives.com
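To see why seeks dominate, here is a rough estimate of how many random requests per second a drive can serve. It is a sketch under simplifying assumptions that are not on the slide: every request pays one average seek plus half a rotation, and transfer time and controller caching are ignored. The seek and RPM values are taken from the two 2001 rows of the table; the function name is hypothetical.

```python
def random_ops_per_second(seek_ms, rpm):
    """Estimated random requests per second: 1 / (seek + half a rotation)."""
    half_rotation_ms = 60_000 / rpm / 2
    return 1000 / (seek_ms + half_rotation_ms)


ata_2001 = random_ops_per_second(9.0, 7200)     # ST320011A
scsi_2001 = random_ops_per_second(3.6, 15000)   # ST318437LC
print(round(ata_2001), round(scsi_2001))
```

Even the faster drive manages only a couple of hundred random requests per second in this model, which is why the rest of the lecture looks for ways to avoid moving the head at all.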
SLIDE 7

How to avoid seeks?

• Design file system carefully
• Use RAM as a cache for disk
  – Once a block is read, cache it as long as possible
  – When a block is written to, delay the actual write
• Combine hard disk with solid state disk (SSD)
• Replace hard disk with SSD
• RAM cloud
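The "use RAM as a cache" idea can be sketched as a small write-back block cache. This is a toy model, not how any particular kernel does it: the disk is a plain dict, eviction is LRU (an assumption; the slide does not name a policy), and the class and method names are invented for illustration.

```python
from collections import OrderedDict


class WriteBackCache:
    """Toy block cache: LRU eviction, writes delayed until eviction or flush."""

    def __init__(self, disk, capacity):
        self.disk = disk              # dict: block number -> bytes (stand-in for a disk)
        self.capacity = capacity
        self.blocks = OrderedDict()   # cached blocks, least recently used first
        self.dirty = set()            # cached blocks not yet written to disk
        self.disk_reads = 0
        self.disk_writes = 0

    def _write_to_disk(self, blockno, data):
        self.disk[blockno] = data
        self.dirty.discard(blockno)
        self.disk_writes += 1

    def _touch(self, blockno, data):
        # Mark the block most recently used, then evict if over capacity.
        self.blocks[blockno] = data
        self.blocks.move_to_end(blockno)
        while len(self.blocks) > self.capacity:
            victim, victim_data = self.blocks.popitem(last=False)
            if victim in self.dirty:
                self._write_to_disk(victim, victim_data)

    def read(self, blockno):
        if blockno in self.blocks:
            data = self.blocks[blockno]      # hit: no disk access
        else:
            data = self.disk[blockno]        # miss: one disk read
            self.disk_reads += 1
        self._touch(blockno, data)
        return data

    def write(self, blockno, data):
        self.dirty.add(blockno)              # delay the actual disk write
        self._touch(blockno, data)

    def flush(self):
        for blockno in sorted(self.dirty):
            self._write_to_disk(blockno, self.blocks[blockno])
```

Delaying writes trades durability for speed: anything still dirty is lost if the machine crashes before eviction or flush().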

SLIDE 8

What Do File System Users Need?

• Persistence: Data persists beyond jobs, crash, …
  – Disk provides basic non-volatile storage
  – OS can enhance persistence via redundancy
• Speed: Fast access to data
  – Random access handled efficiently
  – OS can enhance performance via file caching
• Size: Can store lots of data
• Sharing/protection:
  – Users can control who/what has access to their data
• Ease of use:
  – Basic file abstraction (names, offsets, byte streams, …)
  – Directories simplify naming and lookup

SLIDE 9

File System Abstractions

• File: Basic container of persistent data
  – Unix: flat byte stream
  – IBM mainframes: series of records or objects
• Directory system: Hierarchical naming relationships
  – Directories are special “files” that index other files
  – OS exports operations to manage directories indirectly
• Common file access patterns:
  – Sequential: data processed in order, byte/record at a time
    » Example: Compiler reading a source file
  – Random access: address blocks of data based on file offset
    » Example: Demand paging reads, database searches
  – Keyed access: address blocks based on “key” values
    » Typically implemented using key-file (hash) and data-file pairs
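The three access patterns can be contrasted on one small data file. The sketch below is illustrative only: the "file" is an in-memory io.BytesIO, records are a fixed 16 bytes (an arbitrary choice), and the key index is a Python dict standing in for the hash-based key file.

```python
import io

RECORD_SIZE = 16   # fixed-size records, chosen arbitrarily for the example


def pack(text):
    """Pad a short ASCII string out to one fixed-size record."""
    return text.encode("ascii").ljust(RECORD_SIZE, b" ")


data_file = io.BytesIO()   # stand-in for the data file
key_index = {}             # stand-in for the key file: key -> byte offset
for key, value in [("alice", "engineer"), ("bob", "designer"), ("carol", "manager")]:
    key_index[key] = data_file.tell()
    data_file.write(pack(value))


def read_sequential(f):
    """Sequential: start at the beginning and consume records in order."""
    f.seek(0)
    records = []
    while True:
        chunk = f.read(RECORD_SIZE)
        if not chunk:
            break
        records.append(chunk.decode("ascii").rstrip())
    return records


def read_random(f, record_number):
    """Random access: compute the offset directly from the record number."""
    f.seek(record_number * RECORD_SIZE)
    return f.read(RECORD_SIZE).decode("ascii").rstrip()


def read_keyed(f, index, key):
    """Keyed access: look the key up in the index, then read at that offset."""
    f.seek(index[key])
    return f.read(RECORD_SIZE).decode("ascii").rstrip()
```

Keyed access is really random access with one extra lookup in front of it, which is why it is usually layered on top of an ordinary byte-stream file.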

SLIDE 10

Common File System Operations

• Data operations:
  – Create()
  – Delete()
  – Open()
  – Close()
  – Read()
  – Write()
  – Seek()
• Naming operations:
  – HardLink()
  – SoftLink()
  – Rename()
• Attribute operations:
  – SetAttribute()
  – GetAttribute()
  – Attributes include owner, protection, last accessed

SLIDE 11

File System Data Structures

• Kernel (in-mem) Structures
  – Global open file table
  – Per-process open file table
  – Free (disk) block list
  – Free inode list
  – File buffer cache: Cached disk blocks
  – Inode cache
  – Name cache
• On-Disk Structures
  – Superblock: File system format info
  – File: Collection of blocks/bytes
  – File descriptor (inode): File metadata
  – Directory: Special kind of file
  – Free block/inode maps

[Figure labels: File inode, File contents, Disk contents]
Key: Provide this mapping efficiently and safely.
SLIDE 12

Key In-Memory Data Structures

• Open file table: shared by all processes w/ open file
  – Open count and “deleted” flag
  – Copy of (or pointer to) file’s inode
  – Location of file blocks in file buffer cache (see below)
• Per-process file table: private for each process
  – Pointer to entry in global open file table
  – Current position in the file (“seek” pointer)
  – Access mode (read, write, read-write)
• File buffer cache: cache of file data blocks
  – Indexed by file-blocknum pairs (hash structure)
  – Used to reduce effective access time of disk operations
  – Can hold blocks from user files, directories, file system metadata

SLIDE 13

Key In-Memory Data Structures

• Name cache: cache of recent name lookup results
  – Indexed by full filename (hash structure)
  – Used to eliminate directory traversals (disk ops) for name lookups
• Free space “bitmap”:
  – Used to track which blocks on disk are available
• Free inode “bitmap”:
  – Used to track which file index nodes on disk are available
• Superblock: holds key metadata that describes disk
  – Physical characteristics: size of disk, size of blocks, …
  – Location of free space and free inode “bitmaps”
  – Location of inodes
  – Multiple copies stored in known location → redundancy
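A free-space bitmap keeps one bit per disk block. The sketch below is a minimal illustration, assuming a set bit means "in use" and a first-fit scan for allocation; real file systems pick their own bit polarity and use smarter placement. The class name is hypothetical.

```python
class BlockBitmap:
    """One bit per block: 1 = in use, 0 = free (polarity is an assumption)."""

    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.bits = bytearray((num_blocks + 7) // 8)

    def is_used(self, blockno):
        return bool(self.bits[blockno // 8] & (1 << (blockno % 8)))

    def allocate(self):
        """First-fit: return the lowest-numbered free block and mark it used."""
        for blockno in range(self.num_blocks):
            if not self.is_used(blockno):
                self.bits[blockno // 8] |= 1 << (blockno % 8)
                return blockno
        raise OSError("no free blocks")

    def free(self, blockno):
        # Clear the block's bit; mask to 8 bits so the value fits in a byte.
        self.bits[blockno // 8] &= ~(1 << (blockno % 8)) & 0xFF
```

The same structure works for the free inode bitmap; only the meaning of the index changes.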

SLIDE 14

Key On-Disk Data Structures

• File descriptor (aka “inode”)
  – Link count
  – Security attributes: UID, GID, …
  – Size
  – Access/modified times
  – “Pointers” to blocks
  – …
• Directory file: array of…
  – File name (fixed/variable size)
  – Inode number
  – Length of directory entry
• Free block bitmap
• Free inode bitmap
• Superblock

File descriptor (inode):
  ulong links;
  uid_t uid;
  gid_t gid;
  ulong size;
  time_t access_time;
  time_t modified_time;
  addr_t blocklist…;

Directory file:
  Filename            inode#
  Filename            inode#
  REALLYLONGFILENAME  inode#
  Filename            inode#
  Short               inode#
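A directory file with variable-size names can be pictured as a run of packed entries, each carrying its own length so a reader can hop to the next one. The byte layout below (4-byte inode number, 2-byte entry length, 2-byte name length, then the name) is a hypothetical format invented for this sketch, not the layout of any real file system.

```python
import struct

# Hypothetical entry header: inode number, total entry length, name length.
HEADER = struct.Struct("<IHH")


def pack_entry(inode, name):
    """Serialize one directory entry: fixed header followed by the name bytes."""
    raw = name.encode("utf-8")
    return HEADER.pack(inode, HEADER.size + len(raw), len(raw)) + raw


def parse_directory(data):
    """Walk the packed entries, using each entry's length to find the next."""
    entries = []
    pos = 0
    while pos < len(data):
        inode, entry_len, name_len = HEADER.unpack_from(data, pos)
        start = pos + HEADER.size
        name = data[start:start + name_len].decode("utf-8")
        entries.append((name, inode))
        pos += entry_len
    return entries


directory = (pack_entry(12, "Filename")
             + pack_entry(7, "REALLYLONGFILENAME")
             + pack_entry(3, "Short"))
print(parse_directory(directory))
```

Storing the entry length separately from the name length leaves room for padding or for reusing the slot of a deleted entry.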

SLIDE 15

Naming and Directories

• Need a method to “name” files on disk:
  – OS wants to use numbers or indices
  – Users prefer textual/visual names and hierarchical organization
  – Solution: Directories
• Naming schemes:
  – Simple: One name space for entire disk w/ unique names
  – User-based: Each user has a single separate directory (TOPS-10)
  – Hierarchical: Tree-structured name space (modern OSes)
    » Store directories as special files flagged as “directory file”
    » User programs can read directory like normal files
    » Only special system calls can modify directory files
    » Directory files contain <name, filedesc> pairs
    » Special “root” directory

SLIDE 16

Traversing Directories (Simplified)

• How do we locate file descriptor for “/foo/bar”?
  – Divide file name into components (e.g., “/”, “foo”, and “bar”).
  – Recursively descend directory hierarchy, at each step:
    » Load file descriptor of “next” directory file
    » Use file descriptor info to locate and load directory file contents
    » Scan directory file for matching filename of next component
    » If match found → extract file descriptor number from (name, filedesc)
    » If no match → lookup failure
• How can we speed up this process?
  – Name cache
    » Probe name cache for longest prefix contained in cache (e.g., “/foo”)
    » Start recursive descent using longest prefix as starting point
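The longest-prefix probe of the name cache might look like the sketch below. It assumes the cache is a dict from absolute path to inode number and that the root "/" is always cached; both are simplifications made for illustration, and the function name is hypothetical.

```python
def longest_cached_prefix(path, name_cache):
    """Return (prefix, inode number, remaining components) for the longest cached prefix."""
    components = [c for c in path.split("/") if c]
    # Try the full path first, then drop one trailing component at a time.
    for end in range(len(components), 0, -1):
        prefix = "/" + "/".join(components[:end])
        if prefix in name_cache:
            return prefix, name_cache[prefix], components[end:]
    # Nothing cached below the root: start the descent from "/".
    return "/", name_cache["/"], components


cache = {"/": 2, "/foo": 11}
print(longest_cached_prefix("/foo/bar", cache))
```

The remaining components are what the recursive descent still has to resolve from disk; an empty list means the lookup was answered entirely from the cache.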

SLIDE 17

Finding a File’s Inode on Disk

Locate inode for /foo/bar:

1. Find inode for “/”
   – Always in known location
2. Read “/” directory into memory
3. Find “foo” entry
   » If no match, fail lookup
4. Load “foo” inode from disk
5. Check permissions
   » If no permission, fail lookup
6. Load “foo” directory blocks
7. Find “bar” entry
   » If no match, fail lookup
8. Load “bar” inode from disk
9. Check permissions
   » If no permission, fail lookup

[Figure labels: “/” inode, “/” directory (foo inode#), “foo” inode, “foo” directory (bar inode#), “bar” inode]
Note: Pointers are block/inode numbers, not addresses!
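The lookup steps above can be sketched with a toy inode table. Everything here is a stand-in: the table is a dict keyed by inode number, a directory's contents are a dict from name to inode number, the root inode number 2 is just an assumed "known location", and permission is reduced to a single boolean per inode.

```python
ROOT_INODE = 2   # assumption: the root inode lives at a fixed, known number


class Inode:
    def __init__(self, is_dir, readable=True, entries=None):
        self.is_dir = is_dir
        self.readable = readable          # simplified permission check
        self.entries = entries or {}      # directories only: name -> inode number


def lookup(inode_table, path):
    """Resolve an absolute path to an inode number, one component at a time."""
    inode_no = ROOT_INODE
    for name in [c for c in path.split("/") if c]:
        directory = inode_table[inode_no]        # load the directory's inode
        if not directory.is_dir:
            raise NotADirectoryError(name)
        if not directory.readable:
            raise PermissionError(name)          # no permission: fail lookup
        if name not in directory.entries:
            raise FileNotFoundError(name)        # no match: fail lookup
        inode_no = directory.entries[name]       # an inode number, not an address
    if not inode_table[inode_no].readable:
        raise PermissionError(path)
    return inode_no


table = {
    2: Inode(is_dir=True, entries={"foo": 11}),
    11: Inode(is_dir=True, entries={"bar": 12}),
    12: Inode(is_dir=False),
}
print(lookup(table, "/foo/bar"))
```

Each dictionary access on the table corresponds to a potential disk read in a real system, which is what the name cache and inode cache are there to avoid.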

SLIDE 18

Finding a File’s Blocks on Disk

• Conceptually, inode contains table:
  – One entry per block in file
  – Entry contains physical block address (e.g., platter 3, cylinder 1, sector 26)
  – To locate data at offset X, read block (X / block_size)
• Issues → How do we physically implement this table?
  – Most files are small
  – Most of the disk’s space is occupied by (relatively few) large files
  – Need to efficiently support both sequential and random access
  – Want simple inode lookup and management mechanisms

[Figure: table with rows Block Address 0, Block Address 1, …, Block Address N]
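The offset-to-block rule is integer division, with the remainder giving the position inside the block. A minimal sketch, assuming a 4096-byte block size and a flat Python list as the conceptual table (the block addresses are made-up numbers):

```python
BLOCK_SIZE = 4096   # assumed block size for the example


def locate(block_table, offset):
    """Map a byte offset in the file to (physical block address, offset within block)."""
    index = offset // BLOCK_SIZE          # which entry of the inode's table
    return block_table[index], offset % BLOCK_SIZE


block_table = [907, 12, 388]              # hypothetical physical block addresses
print(locate(block_table, 10000))
```

With a flat table, random access costs one table lookup, but the table grows with the file, which is exactly the implementation issue the slide raises.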

SLIDE 19

File System Operation Details

• Create(name)
  – Check permissions / quota
  – Allocate disk space
  – Create file descriptor w/ name, location on disk, attributes
  – Add index to file descriptor in directory
  – Optional: file type (e.g., Word doc)
    » Richer interface
    » More complicated implementation
• Delete(name)
  – Find directory containing file
  – Remove filedesc from directory
  – Free disk blocks used by file
  – Note: Wait until last user closes
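The "wait until last user closes" rule can be observed on a POSIX system: removing a file's name does not take its data away from a process that still has it open. The sketch below uses Python's os and tempfile modules and assumes Unix-like semantics; on other platforms removing an open file may simply fail. The function name is made up for this example.

```python
import os
import tempfile


def read_after_unlink(payload):
    """Delete a file's name while it is open, then read it through the open descriptor."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)
        os.unlink(path)                        # remove the directory entry
        name_gone = not os.path.exists(path)
        os.lseek(fd, 0, os.SEEK_SET)
        data = os.read(fd, len(payload))       # the blocks are still allocated
    finally:
        os.close(fd)                           # last close: space can be reclaimed
    return name_gone, data


print(read_after_unlink(b"still here"))
```

This is the behavior the open count and "deleted" flag in the global open file table exist to support.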

SLIDE 20

File System Operation Details

• fid = Open(name, mode)
  – Check if file already open. If not:
    » Find the file (via a name lookup)
    » Copy file descriptor into open file table
  – Check protection → abort operation if access not allowed
  – Increment open count in global open file table
  – Create per-process file table entry
    » Add pointer to corresponding entry in system open file table
    » Initialize seek pointer to start of file
  – Return per-process file table index
• Close(fid)
  – Remove entry in per-process file table
  – Decrement open count in global open file table
    » If 0, remove from open file table
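The Open() and Close() bookkeeping above can be sketched with two tables. This is a simplified model with invented names: the name lookup is a dict, protection is a set of allowed modes per inode, and the global table stores only an open count. Unlike the slide's ordering, the protection check here runs before any table is touched, so an aborted open leaves no state behind.

```python
class Process:
    def __init__(self):
        self.fd_table = {}     # per-process table: fid -> {"inode", "offset", "mode"}
        self.next_fid = 0


class FileSystem:
    def __init__(self, names, allowed_modes):
        self.names = names                  # name -> inode number (stand-in for name lookup)
        self.allowed_modes = allowed_modes  # inode number -> set of permitted modes
        self.open_files = {}                # global open file table: inode number -> open count

    def open(self, proc, name, mode):
        inode_no = self.names[name]                       # find the file
        if mode not in self.allowed_modes[inode_no]:      # check protection
            raise PermissionError(name)
        self.open_files[inode_no] = self.open_files.get(inode_no, 0) + 1
        fid = proc.next_fid
        proc.next_fid += 1
        # The seek pointer starts at the beginning of the file.
        proc.fd_table[fid] = {"inode": inode_no, "offset": 0, "mode": mode}
        return fid                                        # per-process table index

    def close(self, proc, fid):
        inode_no = proc.fd_table.pop(fid)["inode"]
        self.open_files[inode_no] -= 1
        if self.open_files[inode_no] == 0:
            del self.open_files[inode_no]                 # last close removes the entry
```

Keeping the seek offset in the per-process entry is what lets two processes read the same file at different positions while sharing one global entry.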

SLIDE 21

File System Operation Details

• Read(fid, offset, size, buffer)
  – Random access
  – Reads “size” bytes starting at “offset” in the file into “buffer”
• Read(fid, size, buffer)
  – Sequential access
  – Reads “size” bytes from current seek offset into “buffer”
  – Increments current seek offset by number of bytes read
  – May read fewer bytes than were requested
• Write(…)
  – Analogous to Read()
• Seek(fid, offset)
  – Sets “seek offset” to specified offset
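On a POSIX system the two Read() variants correspond roughly to pread() (explicit offset) and read() (current seek offset), and Seek() to lseek(). The sketch below exercises them through Python's os module on a temporary file; it assumes a Unix-like platform, since os.pread is not available everywhere, and the helper name is made up.

```python
import os
import tempfile


def demo_reads():
    """Show sequential reads, a positioned read, and a short read near end of file."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"0123456789")
        results = {}
        os.lseek(fd, 0, os.SEEK_SET)              # Seek(fid, 0)
        results["seq1"] = os.read(fd, 4)          # sequential: advances the offset
        results["seq2"] = os.read(fd, 4)          # continues where seq1 stopped
        results["random"] = os.pread(fd, 3, 6)    # random access: 3 bytes at offset 6
        results["offset_after_pread"] = os.lseek(fd, 0, os.SEEK_CUR)
        os.lseek(fd, 8, os.SEEK_SET)
        results["short"] = os.read(fd, 100)       # asks for 100 bytes, only 2 remain
        return results
    finally:
        os.close(fd)
        os.unlink(path)


print(demo_reads())
```

The positioned read leaves the seek offset alone, so sequential and random readers of the same descriptor do not disturb each other.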

SLIDE 22

Important from Today

• Key idea: Build hierarchical filesystem abstraction on top of a flat array of blocks
• Filesystem goals
  – Reads, writes, file management operations must be fast
  – Efficient use of storage
  – Data is durable in face of OS crashes (and maybe disk crashes)
  – Implements OS’s security policy
• Next time: How to implement a filesystem
