Fall 2014:: CSE 506:: Section 2 (PhD)
Virtual File System (VFS)
Nima Honarmand (Based on slides by Don Porter and Mike Ferdman)
History
– Early OSes provided a single file system
– In general, system was tailored to target hardware
– Any guesses why?
– Networked file systems
– Allows new features and designs transparent to apps
– Interoperability with removable media and other OSes
– In-memory file systems (ramdisks)
– Pseudo file systems used for configuration
[Figure: VFS architecture. User/Kernel boundary; VFS above ext4, btrfs, fat32, and nfs; Page Cache; Block Device; IO Scheduler; Driver; Disk; Network]
– (POSIX file system calls – open, read, write, etc.)
– Remote FS can be transparently mounted (e.g., at /home)
– Much more trouble for the programmer
– not just an API wrapper
attributes)
– Coordinates data caching with the page cache
– path lookup
– opening files
– file handle management
– Implementing standard objects/functions called by the VFS
– And page cache functions
– And some VFS helpers
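The split of responsibilities above is typically realized with tables of function pointers: the VFS calls an operation through a per-file-system table and never needs to know which implementation sits behind it. The following is a minimal userspace sketch of that dispatch pattern; `struct fs_ops`, `ramfs_ops`, `procfs_ops`, and `vfs_read()` are hypothetical names loosely inspired by Linux's `struct file_operations`, whose real members and signatures differ.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified operations table. Linux's real
 * struct file_operations has many more members and other signatures. */
struct fs_ops {
    const char *name;
    long (*read)(const char *path, char *buf, size_t len);
};

/* Toy in-memory FS: every file contains the same string. */
static long ramfs_read(const char *path, char *buf, size_t len)
{
    (void)path;
    const char *data = "hello from ramfs";
    size_t n = strlen(data);
    if (n > len)
        n = len;
    memcpy(buf, data, n);
    return (long)n;
}

/* Toy pseudo FS: "reading" a file returns the path that was asked for. */
static long procfs_read(const char *path, char *buf, size_t len)
{
    size_t n = strlen(path);
    if (n > len)
        n = len;
    memcpy(buf, path, n);
    return (long)n;
}

static const struct fs_ops ramfs_ops  = { "ramfs",  ramfs_read };
static const struct fs_ops procfs_ops = { "procfs", procfs_read };

/* Generic layer: it only knows the table, not the implementation. */
long vfs_read(const struct fs_ops *ops, const char *path, char *buf, size_t len)
{
    if (ops == NULL || ops->read == NULL)
        return -1; /* operation not supported by this FS */
    return ops->read(path, buf, len);
}
```

`vfs_read(&ramfs_ops, "/a", buf, sizeof buf)` reaches `ramfs_read()` without the caller naming it; supporting another file system means adding a table, not changing the generic layer.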
(whether device, remote system, or other/none)
– Potentially includes requesting I/O
– More of a lowest common denominator
– More optimal media usage/scheduling
– Varying on-disk consistency guarantees
– Features (e.g., encryption, virus scanning, snapshotting)
– Early/many file systems put this as first block of partition
From Understanding the Linux Kernel, 3rd Ed.
– Opaque pointer (s_fs_info) for FS-specific data
– Tasks such as creating or destroying inodes
– When there are multiple FSes (in today’s systems: always)
– Huge – more fields than we can talk about
– File attributes: permissions, size, modification time, etc.
– File contents:
– Flags, including dirty inode and dirty data
– If you knew the file’s index number
– Think of a portion of the disk as a big array of metadata
– virtual node (perhaps more appropriately) – Linux uses the name inode
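struct myfs_inode {
    int ondisk_blocks[MYFS_NBLOCKS]; /* MYFS_NBLOCKS: hypothetical fixed block-map size */
    /* other stuff */
    struct inode vfs_inode;          /* generic VFS inode embedded in the FS-specific one */
};
– Finding the low-level (FS-specific) inode from the VFS inode is simple (e.g., with container_of())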
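The embedding trick works because the VFS inode sits at a fixed offset inside the FS-specific structure, so stepping back by that offset recovers the wrapper. A minimal userspace sketch, assuming a hypothetical `MYFS_NBLOCKS` size and a stand-in `struct inode` with a single field; the `container_of()` macro below re-creates the idea of the Linux kernel macro of the same name, and `MYFS_I()` is a hypothetical helper in the style of ext2's `EXT2_I()`.

```c
#include <stddef.h>

/* Userspace re-creation of the container_of() idea: given a pointer to a
 * member, subtract the member's offset to get the enclosing structure. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

#define MYFS_NBLOCKS 12 /* hypothetical size of the on-disk block map */

struct inode {          /* stand-in for the (much larger) VFS inode */
    unsigned long i_ino;
};

struct myfs_inode {
    int ondisk_blocks[MYFS_NBLOCKS]; /* FS-private data */
    struct inode vfs_inode;          /* embedded generic inode */
};

/* The VFS hands the FS a struct inode *; the FS recovers its own wrapper. */
static struct myfs_inode *MYFS_I(struct inode *inode)
{
    return container_of(inode, struct myfs_inode, vfs_inode);
}
```

No lookup table or extra pointer is needed: the conversion is a single subtraction computed at compile time from `offsetof()`.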
– Does not change when renamed
– Count “1” for every reference on disk
– Created by file names in a directory that point to the inode
– There is no ‘delete’ system call, only ‘unlink’
– Creates a new name for the same inode
– This is not a copy
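– If an open file is unlinked, the directory entry is deleted, but the inode and its data remain until the last open reference is closed
– Famous feature: rm on a large open file when out of quota (no space is freed until the file is closed)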
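The link-count behavior can be observed from userspace with `link()`, `unlink()`, and the `st_nlink` field filled in by `stat()`. A minimal sketch, assuming `/tmp` is writable and supports hard links; `demo_links()` and `link_count()` are hypothetical helpers.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns the link count of path, or -1 on error. */
static long link_count(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return (long)st.st_nlink;
}

/* Records st_nlink after create, after link(), and after unlink().
 * Returns 0 on success, -1 on error. */
int demo_links(long counts[3])
{
    char dir[] = "/tmp/vfs-links-XXXXXX";
    char a[4096], b[4096];
    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(a, sizeof a, "%s/a", dir);
    snprintf(b, sizeof b, "%s/b", dir);

    int fd = open(a, O_CREAT | O_WRONLY | O_TRUNC, 0644); /* first name */
    if (fd < 0)
        return -1;
    close(fd);
    counts[0] = link_count(a);

    if (link(a, b) != 0)     /* second name for the same inode: not a copy */
        return -1;
    counts[1] = link_count(a);

    if (unlink(a) != 0)      /* removes a name; the inode survives via b */
        return -1;
    counts[2] = link_count(b);

    unlink(b);               /* last name gone: the inode can be freed */
    rmdir(dir);
    return 0;
}
```

On a typical Linux system the recorded counts are 1, 2, and 1: `link()` adds a name rather than copying data, and `unlink()` only removes a name.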
– create (1 link)
– open (1 link, 1 ref)
– unlink (0 link)
– File gets cleaned up when program dies
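The create/open/unlink sequence is a common way to get a scratch file that cleans itself up. A minimal sketch, assuming a writable `/tmp`; `make_anonymous_file()` is a hypothetical name, and `mkstemp()` stands in for the create and open steps.

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Creates a file, keeps it open, and immediately unlinks it.
 * Stores the resulting st_nlink in *nlink and returns the open fd,
 * or -1 on error. The inode is reclaimed at close() or process exit. */
int make_anonymous_file(long *nlink)
{
    char path[] = "/tmp/vfs-anon-XXXXXX";
    int fd = mkstemp(path);          /* create (1 link) + open (1 ref) */
    if (fd < 0)
        return -1;
    if (unlink(path) != 0) {         /* unlink (0 links, still 1 ref) */
        close(fd);
        return -1;
    }
    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    *nlink = (long)st.st_nlink;
    return fd;
}
```

On Linux, `fstat()` on the returned descriptor reports a link count of 0, yet reads and writes keep working until the descriptor is closed.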
– String usually assumed to be a filename
– Created with symlink() system call
– Completely
– Doesn’t raise the link count of the file
– Can be “broken,” or point to a missing file (just a string)
[myself@newcastle ~/tmp]% ln -s "silly example" mydata
[myself@newcastle ~/tmp]% ls -l
lrwxrwxrwx 1 myself mygroup 23 Oct 24 02:42 mydata -> silly example
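The transcript above can be reproduced programmatically with `symlink()`, `readlink()`, `lstat()`, and `stat()`. A minimal sketch, assuming a writable `/tmp`; `dangling_symlink()` is a hypothetical helper. It shows that the link stores only a string, that `lstat()` describes the link itself, and that `stat()`, which follows the link, fails because nothing named `silly example` exists.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Creates a symlink to a missing target and inspects it. On success returns 0:
 *   target     - the stored string, read back with readlink() (cap >= 1)
 *   lsize      - st_size from lstat(), i.e. the length of that string
 *   stat_errno - errno from stat(), which follows the link */
int dangling_symlink(char *target, size_t cap, long *lsize, int *stat_errno)
{
    char dir[] = "/tmp/vfs-symlink-XXXXXX";
    char link_path[4096];
    struct stat st;

    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(link_path, sizeof link_path, "%s/mydata", dir);

    /* The target is just a string; nothing checks that it exists. */
    if (symlink("silly example", link_path) != 0)
        return -1;

    ssize_t n = readlink(link_path, target, cap - 1); /* not NUL-terminated */
    if (n < 0)
        return -1;
    target[n] = '\0';

    if (lstat(link_path, &st) != 0)                   /* the link itself */
        return -1;
    *lsize = (long)st.st_size;

    *stat_errno = (stat(link_path, &st) == 0) ? 0 : errno; /* follows link */

    unlink(link_path);
    rmdir(dir);
    return 0;
}
```

Because the relative target is resolved against the link's directory and no such file exists there, `stat()` reports `ENOENT` while `lstat()` and `readlink()` still succeed.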
– regular file, directory, pipe, device, socket, etc…
– Unix: Everything’s a file! VFS involved even with sockets!
– 3 bits for each of User, Group, Other + 3 special bits
– Bit positions within each group: 2 = read (value 4), 1 = write (value 2), 0 = execute (value 1)
– Ex: 750
– User RWX, Group RX, Other nothing
– chmod has more pleasant syntax [ugo][+-][rwx]
– X-only (execute without read) on a directory allows reaching readable subdirectories or files whose names are known, but not listing them
– Program executes with owner’s UID
– Crude form of permission delegation
– Any examples?
– When I create a file, it is owned by my default group
– When I create in a ‘g+s’ directory, directory group owns file
– Prevents non-owners from deleting or renaming files
– Each process has a table of pointers to them
– The int fd returned by open is an index into this table
– FS doesn’t track which process has a reference to a file
– Fork also copies the file handles
– If child reads from the handle, it advances (shared) cursor
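The shared cursor can be demonstrated with `fork()`: the child's `read()` advances the offset stored in the shared file struct, and the parent observes the change. A minimal sketch, assuming a POSIX system with a writable `/tmp`; `cursor_after_child_read()` is a hypothetical name.

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Writes "abcdef", rewinds, lets a forked child read 3 bytes, then returns
 * the parent's cursor position (or -1 on error). */
long cursor_after_child_read(void)
{
    char path[] = "/tmp/vfs-fork-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                       /* name not needed; inode stays open */

    if (write(fd, "abcdef", 6) != 6 || lseek(fd, 0, SEEK_SET) != 0) {
        close(fd);
        return -1;
    }

    pid_t pid = fork();                 /* child gets a copy of the fd table */
    if (pid < 0) {
        close(fd);
        return -1;
    }
    if (pid == 0) {                     /* child: same file struct, same cursor */
        char buf[3];
        _exit(read(fd, buf, sizeof buf) == 3 ? 0 : 1);
    }

    int status = 0;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status) ||
        WEXITSTATUS(status) != 0) {
        close(fd);
        return -1;
    }
    long pos = (long)lseek(fd, 0, SEEK_CUR); /* parent sees child's progress */
    close(fd);
    return pos;
}
```

The function returns 3 rather than 0, because the fork copied the descriptor table entries, not the file struct they point to.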
– Creates 2 table entries for same file struct
– Back when files were on tape...
– E.g., the close-on-exec flag (FD_CLOEXEC, or O_CLOEXEC at open time) prevents inheritance on exec()
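`dup()` and the close-on-exec flag illustrate the difference between a per-process table entry and the shared file struct it points to. A minimal sketch, assuming a writable `/tmp`; `dup_demo()` is hypothetical, while `dup()`, `fcntl()`, `F_GETFD`, `F_SETFD`, and `FD_CLOEXEC` are standard POSIX.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* dup() makes a second table entry for the same file struct, so both
 * descriptors share one cursor. FD_CLOEXEC belongs to the table entry,
 * so setting it on one descriptor does not affect the other.
 * Returns 0 on success, filling the outputs; -1 on error. */
int dup_demo(long *pos_via_copy, int *orig_cloexec, int *copy_cloexec)
{
    char path[] = "/tmp/vfs-dup-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);

    int copy = dup(fd);                        /* new entry, same file struct */
    if (copy < 0) {
        close(fd);
        return -1;
    }
    if (write(fd, "hello", 5) != 5)            /* moves the shared cursor */
        goto fail;
    *pos_via_copy = (long)lseek(copy, 0, SEEK_CUR);

    if (fcntl(copy, F_SETFD, FD_CLOEXEC) != 0) /* per-descriptor flag */
        goto fail;
    *orig_cloexec = (fcntl(fd, F_GETFD) & FD_CLOEXEC) != 0;
    *copy_cloexec = (fcntl(copy, F_GETFD) & FD_CLOEXEC) != 0;

    close(copy);
    close(fd);
    return 0;
fail:
    close(copy);
    close(fd);
    return -1;
}
```

A write through one descriptor moves the cursor seen through the other (position 5 after writing `hello`), while `FD_CLOEXEC` set on the copy leaves the original descriptor's flag clear.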
– These store:
– /, home, myuser, and vfs.pptx
– Although inode hooks on directories can populate them
– FS directory tree traversal very common
– Only “recently” accessed parts of dir are in memory
– dentries can be freed to reclaim memory (like pages)
– A hash table (for quick lookup)
– A LRU list (for freeing cache space wisely)
– A child list of subdirectories (mainly for freeing)
– An alias list (to do reverse mapping of inode -> dentries)
– Map a human-readable path name to an inode
– Possibly create or truncate the file (O_CREAT, O_TRUNC)
– Create a file struct
– Return descriptor
int open(char *path, int flags, int mode);
– Or -errno (0 - errno) on failure; the C library wrapper turns this into -1 and sets errno
– Stored in current->fs->root and current->fs->pwd
– Specifically, these are dentry pointers (not strings)
– Some programs are ‘chroot jailed’ and should not be able to access anything outside of the directory
– Based on the path, the kernel chooses which dentry to use to start searching (root or pwd)
– An absolute path starts with the ‘/’ character (e.g., /lib/libc.so)
– A relative path starts with anything else (e.g., ../vfs.pptx)
– Treat ‘/’ character as component delimiter
– Each iteration looks up part of the path
– ‘home’, ‘myself’, then ‘foo’, starting at ‘/’
– Remember: dentry for / is stored in current->fs->root
– Use permission() function pointer on inode
– Compute a hash value to find bucket in dentry hash table
– Search the hash bucket to find entry for /home
– Call lookup() method on parent inode (provided by FS)
– If it is a symlink, call inode->readlink() (also provided by FS)
– Then continue next iteration
– If not a directory and not last element, we have a bad path
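The lookup loop described above can be modeled in a few dozen lines. This is a toy sketch, not the Linux dcache: `struct toy_dentry`, `dcache_add()`, `dcache_find()`, and `toy_path_walk()` are hypothetical, the hash function is arbitrary, and permission checks, symlinks, `..`, and the FS `lookup()` fallback on a cache miss are omitted (a miss simply returns `-ENOENT`).

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy "dentry": a (parent, name) pair plus a directory flag. */
struct toy_dentry {
    const char *name;
    struct toy_dentry *parent;
    int is_dir;
    struct toy_dentry *hash_next;   /* chain within a hash bucket */
};

#define NBUCKETS 64
static struct toy_dentry *dcache[NBUCKETS];

/* Hash the (parent pointer, component name) pair to pick a bucket. */
static unsigned bucket_of(const struct toy_dentry *parent, const char *name, size_t len)
{
    unsigned long h = (unsigned long)(uintptr_t)parent;
    for (size_t i = 0; i < len; i++)
        h = h * 31u + (unsigned char)name[i];
    return (unsigned)(h % NBUCKETS);
}

void dcache_add(struct toy_dentry *d)
{
    unsigned b = bucket_of(d->parent, d->name, strlen(d->name));
    d->hash_next = dcache[b];
    dcache[b] = d;
}

static struct toy_dentry *dcache_find(struct toy_dentry *parent, const char *name, size_t len)
{
    for (struct toy_dentry *d = dcache[bucket_of(parent, name, len)]; d; d = d->hash_next)
        if (d->parent == parent && strlen(d->name) == len && memcmp(d->name, name, len) == 0)
            return d;
    return NULL;   /* a real kernel would now call the FS's lookup() */
}

/* Walks path one '/'-separated component at a time, starting at root for
 * absolute paths and at cwd otherwise. Returns 0 and sets *out, or -errno. */
int toy_path_walk(struct toy_dentry *root, struct toy_dentry *cwd,
                  const char *path, struct toy_dentry **out)
{
    struct toy_dentry *cur = (path[0] == '/') ? root : cwd;
    const char *p = path;

    while (*p) {
        while (*p == '/')                 /* skip delimiters */
            p++;
        if (*p == '\0')
            break;
        const char *end = strchr(p, '/');
        size_t len = end ? (size_t)(end - p) : strlen(p);

        if (!cur->is_dir)
            return -ENOTDIR;              /* non-directory before last element */
        struct toy_dentry *next = dcache_find(cur, p, len);
        if (next == NULL)
            return -ENOENT;
        cur = next;
        p += len;
    }
    *out = cur;
    return 0;
}
```

With entries for `home`, `myself`, and `foo` registered, `/home/myself/foo` resolves one component per iteration starting at the root, the relative path `myself/foo` starts from the supplied cwd, and `/home/myself/foo/bar` fails with `-ENOTDIR` because `foo` is not a directory.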
– Search for foo
– Kernel could get into an infinite loop
– foo -> bar
– bar -> baz
– baz -> foo
– more than 40 symlinks resolved, or
– more than 6 symlinks in a row without non-symlink
– Better than an infinite loop
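A counter is enough to turn a symlink cycle into an error; on Linux the failing call reports `ELOOP`. A toy sketch over a flat, hypothetical namespace (`struct toy_entry`, `toy_resolve()`); it models only the overall limit of 40, not the separate limit on consecutive links.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_SYMLINKS 40   /* mirrors the "more than 40" limit above */

/* Hypothetical flat namespace: a symlink has target != NULL,
 * a regular file has target == NULL. */
struct toy_entry {
    const char *name;
    const char *target;
};

static const struct toy_entry *find(const struct toy_entry *tab, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(tab[i].name, name) == 0)
            return &tab[i];
    return NULL;
}

/* Follows symlinks from name. Returns 0 and sets *result to the final
 * non-symlink entry, -ENOENT for a broken link, or -ELOOP once more than
 * TOY_MAX_SYMLINKS links would be followed. */
int toy_resolve(const struct toy_entry *tab, size_t n, const char *name,
                const struct toy_entry **result)
{
    int followed = 0;
    const struct toy_entry *e = find(tab, n, name);

    while (e != NULL && e->target != NULL) {
        if (++followed > TOY_MAX_SYMLINKS)
            return -ELOOP;    /* give up instead of looping forever */
        e = find(tab, n, e->target);
    }
    if (e == NULL)
        return -ENOENT;
    *result = e;
    return 0;
}
```

For the `foo -> bar -> baz -> foo` cycle, `toy_resolve()` returns `-ELOOP` rather than spinning forever, while an acyclic chain ending at a regular file resolves normally.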
– Map a human-readable path name to an inode
– Possibly create or truncate the file (O_CREAT, O_TRUNC)
– Create a file descriptor
– Usually, if an item isn’t found, search returns an error
– If O_EXCL is not set, return existing dentry
– Make a new inode and dentry
– Avoid races in “if (!exist()) create(); open();”
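The race-free create is visible from userspace: with `O_CREAT | O_EXCL`, the existence check and the creation happen inside one `open()` call, so a second attempt fails with `EEXIST`. A minimal sketch, assuming a writable `/tmp`; `exclusive_create_twice()` is a hypothetical helper.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Creates the same file twice with O_CREAT | O_EXCL. Only the first can
 * succeed. Returns 0 and stores the second attempt's errno (0 if it
 * unexpectedly succeeded) in *second_errno; -1 on setup errors. */
int exclusive_create_twice(int *second_errno)
{
    char dir[] = "/tmp/vfs-excl-XXXXXX";
    char path[4096];
    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(path, sizeof path, "%s/lockfile", dir);

    int fd1 = open(path, O_CREAT | O_EXCL | O_WRONLY, 0600); /* check + create, atomically */
    if (fd1 < 0)
        return -1;

    int fd2 = open(path, O_CREAT | O_EXCL | O_WRONLY, 0600); /* must lose */
    *second_errno = (fd2 < 0) ? errno : 0;
    if (fd2 >= 0)
        close(fd2);

    close(fd1);
    unlink(path);
    rmdir(dir);
    return 0;
}
```

This is why lock files are usually created with `O_EXCL` rather than with a separate existence check followed by `open()`.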
– dentry pointer
– cursor into the file
– permissions (cache of inode’s value)
– reference count (of the struct file object)
– If full, create a new table 2x the size and copy the old one
– Allocate a new file struct and put a pointer in table
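The table-growth policy above can be sketched as follows. This is a hypothetical userspace model (`struct fd_table`, `struct open_file`, `alloc_fd()`, and the initial size of 4 are all assumptions); it hands out the lowest free index, matching the POSIX rule that `open()` returns the lowest-numbered unused descriptor, and doubles the array when every slot is taken.

```c
#include <stdlib.h>
#include <string.h>

struct open_file { int id; };   /* hypothetical stand-in for struct file */

/* Per-process descriptor table: pointers to open files, indexed by fd. */
struct fd_table {
    struct open_file **slots;
    int size;
};

/* Stores f in the lowest free slot and returns its index, doubling the
 * table (and copying the old one) when it is full. Returns -1 on OOM. */
int alloc_fd(struct fd_table *t, struct open_file *f)
{
    for (int fd = 0; fd < t->size; fd++) {
        if (t->slots[fd] == NULL) {
            t->slots[fd] = f;
            return fd;
        }
    }
    int new_size = t->size ? t->size * 2 : 4;
    struct open_file **bigger = calloc((size_t)new_size, sizeof *bigger);
    if (bigger == NULL)
        return -1;
    if (t->size)
        memcpy(bigger, t->slots, (size_t)t->size * sizeof *bigger);
    free(t->slots);
    int fd = t->size;            /* first slot of the newly added half */
    t->slots = bigger;
    t->size = new_size;
    t->slots[fd] = f;
    return fd;
}
```

Doubling keeps the amortized cost of growth constant per allocation, and scanning from index 0 is what makes a freed low descriptor the next one reused.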
ssize_t read(int fd, void *buf, size_t bytes);
How to…
– creat() system call
– More commonly, open() with the O_CREAT flag
– What does O_EXCL do?
– mkdir()
How to…
– rmdir()
– unlink()
– read()
– Use lseek() to move the cursor, or pread() to read at a given offset without moving it
– readdir() or getdents()
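The difference between moving the cursor and reading at an explicit offset can be seen with `lseek()` and `pread()`. A minimal sketch, assuming a writable `/tmp`; `cursor_demo()` is a hypothetical helper.

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>
#include <unistd.h>

/* lseek() moves the cursor; pread() reads at an explicit offset and leaves
 * the cursor where it was. Returns 0 on success, filling the outputs. */
int cursor_demo(char *via_read, char *via_pread, long *cursor_after_pread)
{
    char path[] = "/tmp/vfs-cursor-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);

    if (write(fd, "0123456789", 10) != 10)
        goto fail;
    if (lseek(fd, 2, SEEK_SET) != 2)         /* cursor = 2 */
        goto fail;
    if (read(fd, via_read, 1) != 1)          /* reads '2', cursor = 3 */
        goto fail;
    if (pread(fd, via_pread, 1, 7) != 1)     /* reads '7', cursor untouched */
        goto fail;
    *cursor_after_pread = (long)lseek(fd, 0, SEEK_CUR);
    close(fd);
    return 0;
fail:
    close(fd);
    return -1;
}
```

After seeking to 2 and reading one byte, the cursor sits at 3; the `pread()` at offset 7 returns `7` and the cursor is still 3 afterwards.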
– Create a temp file (using open)
– Copy old to temp (using read old / write temp)
– Apply writes to temp
– Close both old and temp
– Do a rename(temp, old) to atomically replace
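The temp-file-plus-`rename()` recipe relies on `rename()` replacing the destination's directory entry atomically, so readers see either the old contents or the new ones. A simplified sketch: `atomic_replace()` is hypothetical, it writes the complete new contents instead of copying and patching the old file, and it assumes a fixed `.tmp` suffix in the same directory (real code would pick a unique name). The `fsync()` before `rename()` is an extra precaution for crash durability that the recipe above does not mention.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Replaces the contents of path with data via a temp file and rename().
 * Returns 0 on success, -1 on error (the original file is left intact). */
int atomic_replace(const char *path, const char *data)
{
    char tmp[4096];
    if (snprintf(tmp, sizeof tmp, "%s.tmp", path) >= (int)sizeof tmp)
        return -1;

    int fd = open(tmp, O_CREAT | O_TRUNC | O_WRONLY, 0644);
    if (fd < 0)
        return -1;
    size_t len = strlen(data);
    if (write(fd, data, len) != (ssize_t)len || fsync(fd) != 0) {
        close(fd);
        unlink(tmp);
        return -1;
    }
    if (close(fd) != 0 || rename(tmp, path) != 0) { /* atomic swap of the name */
        unlink(tmp);
        return -1;
    }
    return 0;
}
```

Because the temp file lives in the same directory, `rename()` stays within one file system, which is required for it to succeed atomically.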
– stat(file, &stat_buf)
– if (stat_buf.st_mode & execute bit) color = green
– else if …
– Print file name
– Reset color
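A C version of the `ls` coloring pseudocode, assuming ANSI escape codes and a hypothetical `color_for()` helper. The blue color for directories, and testing for directories before the execute bit (directories usually carry execute bits too), are assumptions that fill the elided `else if …` branch.

```c
#define _XOPEN_SOURCE 700
#include <string.h>
#include <sys/stat.h>

/* Hypothetical ANSI color codes, in the spirit of ls --color. */
#define COLOR_GREEN "\033[32m"
#define COLOR_BLUE  "\033[34m"
#define COLOR_RESET "\033[0m"

/* Picks a color from the file's mode: directories blue, anything with an
 * execute bit green, otherwise no color. Returns NULL for "no color" or
 * if stat() fails. */
const char *color_for(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return NULL;
    if (S_ISDIR(st.st_mode))
        return COLOR_BLUE;
    if (st.st_mode & (S_IXUSR | S_IXGRP | S_IXOTH))
        return COLOR_GREEN;
    return NULL;
}
```

A caller would print the returned escape code (if any), then the file name, then `COLOR_RESET` whenever a color was used.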