File System Project Seminar On-Disk Layout Prof. Andreas Polze

SLIDE 1

File System Project Seminar On-Disk Layout

  • Prof. Andreas Polze

Andreas Grapentin, Sven Köhler, Max Plauth, Jossekin Beilharz, Felix Eberhardt
Hasso Plattner Institute

SLIDE 2

File System Seminar Overview

Polze, Grapentin, Köhler Plauth, Beilharz, Eberhardt 14.11.2017 File System Seminar On-Disk Layout Chart 2

Figure: layer stack program → open / readdir → Virtual File System → ext4 | proc fs | btrfs → Block Buffer → disk (the label "today" marks the part covered in this session)

SLIDE 3

Tasks of a File System (Simplified)

A file system needs to be …
  ■ Searchable
    □ resolve a filename to its metadata
    □ resolve a filename to its data (streams, forks)
    □ find the block that holds a given file position
  ■ Modifiable
    □ find space to add new data
    □ find space to add new metadata
    □ mark bad blocks
    □ query the existing free space
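Finding the block for a file position starts with plain arithmetic. A minimal Python sketch, assuming a fixed block size (4096 bytes is only an example value): the byte position splits into a block index within the file and an offset inside that block. How that block index is then mapped to a device block is exactly where the storage methods in section 2 differ.

```python
def locate(position: int, block_size: int = 4096) -> tuple[int, int]:
    """Split a byte position within a file into (file block index, offset in block).

    The default block size is an assumed example value; file systems
    typically record their block size in the superblock.
    """
    if position < 0:
        raise ValueError("negative file position")
    return divmod(position, block_size)


if __name__ == "__main__":
    print(locate(10_000))   # (2, 1808): byte 10000 lies in the third block
```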

SLIDE 4

1. Block Devices And Physics

SLIDE 5
  • 1. Block Devices

Cylinder-Head-Sector

For several decades, continuously spinning magnetic disks were the gold standard for secondary storage. Data was originally addressed block-wise by Cylinder-Head-Sector (CHS) tuples. To reduce movements of the head (arm), data is laid out along cylinders first. Modern buses allow Logical Block Addressing (LBA) with linear block numbers.
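The classic translation between the two addressing schemes numbers sectors cylinder by cylinder, which matches the "cylinders first" layout. A Python sketch, assuming a fixed example geometry of 16 heads and 63 sectors per track and 1-based sector numbers as in CHS:

```python
from typing import NamedTuple


class Geometry(NamedTuple):
    heads_per_cylinder: int   # number of heads (surfaces)
    sectors_per_track: int    # sectors on one track, numbered from 1


def chs_to_lba(c: int, h: int, s: int, geo: Geometry) -> int:
    """Map a (cylinder, head, sector) tuple to a linear block address.

    All tracks of one cylinder come before the next cylinder.
    """
    if not 1 <= s <= geo.sectors_per_track or not 0 <= h < geo.heads_per_cylinder:
        raise ValueError("sector or head out of range")
    return (c * geo.heads_per_cylinder + h) * geo.sectors_per_track + (s - 1)


def lba_to_chs(lba: int, geo: Geometry) -> tuple[int, int, int]:
    """Inverse mapping: linear block address back to (cylinder, head, sector)."""
    c, rest = divmod(lba, geo.heads_per_cylinder * geo.sectors_per_track)
    h, s0 = divmod(rest, geo.sectors_per_track)
    return c, h, s0 + 1


if __name__ == "__main__":
    geo = Geometry(heads_per_cylinder=16, sectors_per_track=63)   # example geometry
    print(chs_to_lba(1, 0, 1, geo))   # first sector of cylinder 1: 1008
    print(lba_to_chs(1008, geo))      # (1, 0, 1)
```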

SLIDE 6
  • 1. Block Devices

Sector Vs. Block Vs. Cluster

Main overhead factors when accessing data:
  ■ latency (seek + rotation): how long to wait for the first byte?
  ■ throughput (transfer rate): how many bytes per second once the transfer has started?

In the latency time for one byte, many thousands of others could be transferred.
Insight: group bytes into blocks, and group blocks into even bigger units at the file system level.

Multiple names: sector (within a track), physical block, device block; cluster, logical block, file system block
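The effect can be quantified with a simple model in which every request pays the full latency once and then streams at the sustained rate. The Python sketch below uses assumed example figures of 10 ms latency and 100 MB/s, not measurements of any particular drive:

```python
def effective_throughput(request_bytes: int, latency_s: float, rate_bytes_per_s: float) -> float:
    """Bytes per second actually achieved for one request of request_bytes.

    Model: pay latency (seek + rotation) once, then transfer at the sustained rate.
    """
    total_time = latency_s + request_bytes / rate_bytes_per_s
    return request_bytes / total_time


if __name__ == "__main__":
    LATENCY = 0.010   # assumed: 10 ms seek + rotational delay
    RATE = 100e6      # assumed: 100 MB/s sustained transfer rate
    for size in (512, 4096, 64 * 1024, 1024 * 1024):
        eff = effective_throughput(size, LATENCY, RATE)
        print(f"{size:>8} B per request -> {eff / 1e6:7.3f} MB/s ({eff / RATE:.2%} of peak)")
```

With these assumed figures the drive could move about 1 MB in the time it spends positioning, so 512-byte requests reach only about 0.05% of the peak rate, while 1 MiB requests get roughly half of it.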

SLIDE 7
  • 1. Block Devices

Cylinder Groups

Many file systems (UFS, ext, NTFS) use block groups to keep semantically connected data within one cylinder. This reduces head seeks and limits fragmentation within a partition.

Figure: a partition divided into block groups, each laid out as SB | Meta | Files; the repeated SB copies are redundant superblock backups.

SLIDE 8
  • 1. Block Devices

Tracking Available (Free) Space

  ■ Blocks can be occupied.
  ■ Blocks can become bad (bit rot).
  ■ Likewise, the available free inodes need to be tracked.
  ■ Use a linked list: list nodes hold free block numbers and a pointer to the next node (figure: 0x001 0x003 0x040 → next → 0xa00 0xab0 0xab1 → next).
  ■ Use a bitmap: one bit per block, 1 = occupied, 0 = free (figure: 1110001010001010 0100010111010011 0101111011101001 0011100101110010, where block #0 is occupied and block #54 is free).
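The bitmap variant fits in a few lines of Python. This sketch assumes that bit i describes block i with 1 = occupied, and packs bits most-significant-bit first so the bytes read left to right like the figure; real file systems may use a different bit order. allocate() is a simple first-fit search, an illustrative choice rather than any particular file system's policy:

```python
class BlockBitmap:
    """Free-space bitmap: bit i describes block i (1 = occupied, 0 = free)."""

    def __init__(self, n_blocks: int) -> None:
        self.n_blocks = n_blocks
        self.bits = bytearray((n_blocks + 7) // 8)   # all blocks start free

    @classmethod
    def from_string(cls, pattern: str) -> "BlockBitmap":
        """Build a bitmap from a string of 0/1 digits (spaces are ignored)."""
        digits = pattern.replace(" ", "")
        bm = cls(len(digits))
        for i, ch in enumerate(digits):
            if ch == "1":
                bm.set_used(i)
        return bm

    def is_used(self, block: int) -> bool:
        return bool(self.bits[block // 8] & (0x80 >> (block % 8)))

    def set_used(self, block: int) -> None:
        self.bits[block // 8] |= 0x80 >> (block % 8)

    def set_free(self, block: int) -> None:
        self.bits[block // 8] &= ~(0x80 >> (block % 8)) & 0xFF

    def allocate(self) -> int:
        """First fit: return the lowest free block and mark it occupied."""
        for byte_index, byte in enumerate(self.bits):
            if byte == 0xFF:
                continue                  # eight occupied blocks, skip at once
            for bit in range(8):
                block = byte_index * 8 + bit
                if block < self.n_blocks and not self.is_used(block):
                    self.set_used(block)
                    return block
        raise RuntimeError("no free block left")

    def free_count(self) -> int:
        """Query the existing free space, counted in blocks."""
        return sum(not self.is_used(b) for b in range(self.n_blocks))


if __name__ == "__main__":
    slide = "1110001010001010 0100010111010011 0101111011101001 0011100101110010"
    bm = BlockBitmap.from_string(slide)
    print(bm.is_used(0), bm.is_used(54), bm.allocate())
```

Loaded with the figure's 64 bits, block 0 reports as occupied, block 54 as free, and the first allocation returns block 3, the first 0 bit.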

SLIDE 9

2. Files And Directories

SLIDE 10
  • 2. Files And Directories

Storing Files – Overview

How to find a file's data once its inode/file descriptor is at hand? Several methods exist, each suited to different use cases:
  □ contiguous allocation
  □ linked list
      – separated linked list (FAT)
  □ indexed block references
      – direct data
  □ extents

SLIDE 11
  • 2. Files And Directories

Storing Files – Contiguous Allocation

The file system keeps track of free blocks. For a new file, the necessary number of blocks is reserved and only a (start_block, size) tuple is stored.
Advantages
  ■ simple implementation
  ■ no file fragmentation
  ■ very few seeks
Disadvantages
  ■ growing files need an expensive move
  ■ high external fragmentation

Figure: File 1, File 3 and File 2 stored as contiguous runs, with free space between them.
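Because only a (start_block, size) tuple is stored, translating a file block into a device block is pure addition; external fragmentation shows up when a new file needs a single run of free blocks. A Python sketch; the first-fit allocator over (start, length) free runs is an assumption for illustration:

```python
from typing import NamedTuple


class ContiguousFile(NamedTuple):
    start_block: int   # first device block of the file
    size: int          # number of reserved blocks


def device_block(f: ContiguousFile, file_block: int) -> int:
    """Translate a block index within the file into a device block number."""
    if not 0 <= file_block < f.size:
        raise IndexError("block beyond end of file")
    return f.start_block + file_block        # pure arithmetic, no metadata reads


def first_fit(free_runs: list[tuple[int, int]], n_blocks: int) -> int:
    """Return the start of the first free run (start, length) with room for n_blocks."""
    for start, length in free_runs:
        if length >= n_blocks:
            return start
    # Enough free blocks may exist in total, just not in one piece.
    raise RuntimeError("no free run large enough (external fragmentation)")
```

For example, with free runs of 4 blocks at 10 and 4 blocks at 40, a 6-block file cannot be placed although 8 blocks are free in total.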

SLIDE 12
  • 2. Files And Directories

Storing Files – Linked List

The inode/file descriptor/metadata points to the first block. Each block contains data and a pointer to the next block of this file.
Advantages
  ■ data can be distributed across the device
  ■ files can be resized
  ■ no external fragmentation
Disadvantages
  ■ “odd” wasted space per data block: the pointer takes part of every block, so the payload is no longer a power of two
  ■ high file fragmentation risk
  ■ no random access
  ■ high seek times

Figure: Inode → [data | next] → [data | next] → [data | end]
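The missing random access is easy to see in code: reaching block n means following n pointers, each of which is a separate block read. A Python sketch that models the disk as a dictionary from block numbers to blocks, a stand-in for real device reads:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Block:
    data: bytes
    next_block: Optional[int]   # device block number of the successor, None = end of file


def read_file_block(disk: dict[int, Block], first: int, n: int) -> bytes:
    """Return the data of the n-th block (0-based) of a linked-list file.

    The loop reads n blocks just to learn where block n is,
    so the cost grows linearly with n.
    """
    current: Optional[int] = first
    for _ in range(n):
        if current is None:
            raise IndexError("block beyond end of file")
        current = disk[current].next_block    # one more block read per step
    if current is None:
        raise IndexError("block beyond end of file")
    return disk[current].data


if __name__ == "__main__":
    # Hypothetical disk contents: the file occupies blocks 7 -> 3 -> 12.
    disk = {7: Block(b"AAAA", 3), 3: Block(b"BBBB", 12), 12: Block(b"CCCC", None)}
    print(read_file_block(disk, 7, 2))
```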

SLIDE 13
  • 2. Files And Directories

Storing Files – Separated Linked List

The inode/file descriptor/metadata points to the first block. Data fills entire blocks. A separate table records, for each block, either its successor or that it is the last block of its file. Free and bad blocks can be tracked in the same table. Example: FAT
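A sketch of following such a chain in Python, using the FAT16 marker values (0x0000 for a free cluster, 0xFFF7 for a bad cluster, 0xFFF8 to 0xFFFF for the end of a chain; entries 0 and 1 are reserved, so data clusters start at 2). The table in the example is made up:

```python
FREE = 0x0000      # FAT16: cluster is free
BAD = 0xFFF7       # FAT16: cluster is marked bad
EOC_MIN = 0xFFF8   # FAT16: 0xFFF8..0xFFFF mark the end of a chain
FIRST_DATA = 2     # entries 0 and 1 are reserved


def cluster_chain(fat: list[int], first: int) -> list[int]:
    """Follow a file's chain of clusters through the allocation table.

    Once the table is in memory, walking a chain needs no extra seeks,
    unlike following next pointers stored inside the data blocks.
    """
    chain: list[int] = []
    cluster = first
    while True:
        # A successor of FREE (0) or BAD also ends up here and is rejected.
        if not FIRST_DATA <= cluster < len(fat) or cluster == BAD:
            raise ValueError(f"invalid cluster {cluster:#06x} in chain")
        if len(chain) >= len(fat):
            raise ValueError("cycle in cluster chain")
        chain.append(cluster)
        successor = fat[cluster]
        if successor >= EOC_MIN:
            return chain
        cluster = successor


def free_clusters(fat: list[int]) -> list[int]:
    """The same table also answers which clusters are free."""
    return [c for c in range(FIRST_DATA, len(fat)) if fat[c] == FREE]


if __name__ == "__main__":
    # Made-up table: one file occupies 3 -> 5 -> 7, cluster 6 is bad.
    fat = [0xFFF8, 0xFFFF, FREE, 5, FREE, 7, BAD, 0xFFFF, FREE]
    print(cluster_chain(fat, 3), free_clusters(fat))
```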

SLIDE 14


  • 2. Files And Directories

Storing Files – Indexed Block References

The inode contains a fixed number of block references. For files exceeding that number, an additional block may be pointed to that contains further block references, and so forth (single, double, triple indirection).
Advantages
  ■ small files require little overhead
  ■ random access is efficient for small files
  ■ sparse files are possible
Disadvantages
  ■ limits the file size
  ■ high file fragmentation risk
  ■ many seeks for random access on large files
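The lookup path for a logical block follows from counting how many blocks each level covers. A Python sketch, assuming three indirect levels after n_direct direct references (the next slide's example uses 11 direct references; ext2 uses 12) and refs_per_block references per indirect block:

```python
def locate_block(index: int, n_direct: int, refs_per_block: int) -> tuple:
    """Return the reference path that leads to logical block `index`.

    ("direct", i)        -> slot i of the inode's direct references
    ("single", i)        -> slot i of the single indirect block
    ("double", i, j)     -> slot i of the double indirect block, then slot j
    ("triple", i, j, k)  -> three levels of indirect blocks
    """
    p = refs_per_block
    if index < 0:
        raise IndexError("negative block index")
    if index < n_direct:
        return ("direct", index)
    index -= n_direct
    if index < p:
        return ("single", index)
    index -= p
    if index < p ** 2:
        return ("double", index // p, index % p)
    index -= p ** 2
    if index < p ** 3:
        return ("triple", index // p ** 2, (index // p) % p, index % p)
    raise IndexError("file too large for this inode layout")


def max_file_size(n_direct: int, block_size: int, ref_bytes: int = 4) -> int:
    """Bytes addressable via n_direct direct plus single, double and triple indirect references."""
    p = block_size // ref_bytes
    return (n_direct + p + p ** 2 + p ** 3) * block_size


if __name__ == "__main__":
    print(locate_block(5000, n_direct=12, refs_per_block=1024))
    print(max_file_size(12, 4096))
```

With 12 direct references, 4 KiB blocks and 4-byte references (1024 per block), the pointer structure alone addresses 4,402,345,721,856 bytes, about 4 TiB; the single, double and triple indirect levels cost one, two and three extra block reads per lookup.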

SLIDE 15


  • 2. Files And Directories

Storing Files – Indexed Block References (Direct Data)

The inode repurposes the block reference area to store the file's data directly. This is commonly used for lock files, PID files and symbolic links, and no additional seeks are required. E.g., for 32-bit block references: (11 + 3) · 32 bit = 14 · 4 byte = 56 bytes available.

SLIDE 16
  • 2. Files And Directories

Storing Files – Extents

Multiple large, contiguous areas are reserved for ranges of a file. Each extent is addressed by only a (first block, length) tuple. A file's extents can be stored as (linked) lists or as trees.
Advantages
  ■ very little metadata overhead
  ■ limits file fragmentation
Disadvantages
  ■ difficult to add extents on fragmented file systems
  ■ copy-on-write for small changes is very expensive
  ■ requires a block buffer to allocate large areas on flush

Figure: Inode → Extent 1, Extent 2, Extent 3
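To find the extent that covers a given file block, each extent also needs the logical block at which it starts; ext4's on-disk extent entries, for instance, store it alongside the length and the physical start. A Python sketch with a sorted extent list and binary search (class and field names are illustrative):

```python
import bisect
from typing import NamedTuple, Optional


class Extent(NamedTuple):
    logical: int    # first file block covered (assumed field, see lead-in)
    physical: int   # first device block of the contiguous area
    length: int     # number of blocks in the area


class ExtentMap:
    """Sorted extent list with O(log e) lookup for e extents."""

    def __init__(self, extents: list[Extent]) -> None:
        self.extents = sorted(extents)                  # sorted by logical start
        self.starts = [e.logical for e in self.extents]

    def lookup(self, file_block: int) -> Optional[int]:
        """Return the device block for file_block, or None for a hole."""
        i = bisect.bisect_right(self.starts, file_block) - 1
        if i < 0:
            return None                                 # before the first extent
        ext = self.extents[i]
        if file_block < ext.logical + ext.length:
            return ext.physical + (file_block - ext.logical)
        return None                                     # gap between extents


if __name__ == "__main__":
    m = ExtentMap([Extent(0, 1000, 8), Extent(8, 5000, 4), Extent(20, 200, 10)])
    for fb in (3, 9, 15, 25):
        print(fb, "->", m.lookup(fb))
```

For these example extents, file block 9 maps to device block 5001 and file block 15 falls into a hole, which is how sparse files come for free.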

SLIDE 17
  • 2. Files And Directories

Directory Entry Structure

Figure, left: attributes in directory entry (FAT): each entry (bin, home, usr, var) stores the name together with its attributes.
Figure, right: separated attributes (Unix): each entry (bin, home, usr, var) stores the name and an inode number; the attributes are kept in the inode.

SLIDE 18
  • 2. Files And Directories

Directory Structure (Canonical)

  ■ Table of fixed-length entries (Minix 1, FAT16): every entry is ino | name[MAX]
  ■ Linked list of variable-sized entries (ext2, VFAT): every entry is ino | rec_len | name, where rec_len is the distance to the next entry and the last rec_len also spans the unused rest of the block
    □ can still be contiguous on disk
    □ fast unlink (ext2): marks ino as 0

<complexity find()?> O(n)
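A Python sketch of walking such a variable-sized directory block. It assumes the ext2 entry header layout (32-bit inode, 16-bit rec_len, 8-bit name_len, 8-bit file type, little-endian, entries padded to 4 bytes) and follows the slide's simplification that an unlinked entry keeps its place with ino set to 0; make_entry is a hypothetical helper for building test data:

```python
import struct
from typing import Iterator, Optional

# ext2 directory entry header: inode (u32), rec_len (u16), name_len (u8), file_type (u8)
HEADER = struct.Struct("<IHBB")


def iter_entries(block: bytes) -> Iterator[tuple[int, str]]:
    """Yield (ino, name) for every used entry of one directory block."""
    offset = 0
    while offset < len(block):
        ino, rec_len, name_len, _file_type = HEADER.unpack_from(block, offset)
        if rec_len < HEADER.size:
            raise ValueError(f"corrupt entry at offset {offset}")
        if ino != 0:                                    # ino 0 marks an unused entry
            start = offset + HEADER.size
            yield ino, block[start:start + name_len].decode()
        offset += rec_len                               # rec_len links to the next entry


def find(block: bytes, name: str) -> Optional[int]:
    """Linear scan over all entries: O(n) in the number of entries."""
    for ino, entry_name in iter_entries(block):
        if entry_name == name:
            return ino
    return None


def make_entry(ino: int, name: str, rec_len: Optional[int] = None) -> bytes:
    """Hypothetical test helper: one entry, padded to a multiple of 4 bytes."""
    raw = name.encode()
    needed = (HEADER.size + len(raw) + 3) & ~3
    rec_len = needed if rec_len is None else rec_len
    padding = bytes(rec_len - HEADER.size - len(raw))
    return HEADER.pack(ino, rec_len, len(raw), 0) + raw + padding


if __name__ == "__main__":
    entries = make_entry(12, "bin") + make_entry(0, "old") + make_entry(13, "usr")
    # The last entry's rec_len stretches to the end of a (tiny) 64-byte block.
    block = entries + make_entry(14, "var", rec_len=64 - len(entries))
    print(list(iter_entries(block)), find(block, "usr"))
```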

SLIDE 19

3. B-Trees

SLIDE 20
  • 3. B-Trees

Recap: Glossary Search Trees

Figure: binary search tree with keys 7, 10, 14, 19, 21, 23

  ■ node, root, leaf
  ■ parent, child, sibling
  ■ degree, depth, height
  ■ pre-order, in-order, post-order, level-order
  ■ full, complete, balanced
  ■ value vs. key, value tuple
  ■ AVL trees, red-black trees

<any term unfamiliar?>

SLIDE 21
  • 3. B-Trees

Definition

Figure (labels B = 2, B = 3): example B-tree with keys 7, 10, 14, 19, 21, 23, 25

  ■ A B-Tree is a search tree for keys.
  ■ All leaves have the same depth.
  ■ Classified by a parameter B; every node except the root satisfies:
    □ B ≤ #children < 2 · B (inner nodes)
    □ B – 1 ≤ #keys < 2 · B – 1
  ■ Keys within a node are sorted.

<valid #keys for B=4?>
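The rules can be checked mechanically. A Python sketch of a validator (the Node class and function names are illustrative); it also checks the search-tree ordering, i.e. that each child's keys lie between the neighboring keys of its parent. The children bound follows from the key bound, because an inner node with k keys has k + 1 children:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    keys: list[int]
    children: list["Node"] = field(default_factory=list)   # empty for a leaf


def key_bounds(B: int) -> range:
    """Allowed key counts of a non-root node: B - 1 <= #keys < 2 * B - 1."""
    return range(B - 1, 2 * B - 1)


def _require(condition: bool, message: str) -> None:
    if not condition:
        raise ValueError(message)


def check(node: Node, B: int, lo: Optional[int] = None, hi: Optional[int] = None,
          is_root: bool = True) -> int:
    """Validate the B-tree rules for the subtree at node; return its height."""
    keys = node.keys
    _require(keys == sorted(keys), "keys within a node must be sorted")
    _require(all((lo is None or k > lo) and (hi is None or k < hi) for k in keys),
             "key outside the range allowed by the parent")
    _require(len(keys) < 2 * B - 1, "too many keys")
    if not is_root:
        _require(len(keys) in key_bounds(B), "too few keys")
    if not node.children:
        return 0                                          # leaf
    _require(len(keys) >= 1, "inner node without keys")
    _require(len(node.children) == len(keys) + 1, "k keys need k + 1 children")
    bounds = [lo] + keys + [hi]
    heights = {check(child, B, bounds[i], bounds[i + 1], is_root=False)
               for i, child in enumerate(node.children)}
    _require(len(heights) == 1, "all leaves must have the same depth")
    return heights.pop() + 1


if __name__ == "__main__":
    tree = Node([14, 21], [Node([7, 10]), Node([19]), Node([23, 25])])
    print(check(tree, B=2))        # height 1
    print(list(key_bounds(4)))
```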

SLIDE 22
  • 3. B-Trees

Theory Vs. Reality: Complexity

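Figure: binary search tree (keys 7, 10, 14, 19, 21, 23) and B-tree with B = 2 (keys 7, 10, 14, 19, 21, 23, 25).

Which relation holds: $O(\log_3 n) \;?\; O(\log_2 n)$ with $? \in \{\subseteq, =, \supseteq\}$

$$O(f) = \{\, g : \mathbb{N} \to \mathbb{R} \mid \exists c > 0 \; \exists n_0 \in \mathbb{N} \; \forall n \ge n_0 : g(n) \le c \cdot f(n) \,\}$$

$$\log_3 n = \frac{\log_2 n}{\log_2 3} = c \cdot \log_2 n \quad \text{with } c = \frac{1}{\log_2 3}$$

Since c is a constant, each function bounds the other up to a constant factor, so $O(\log_3 n) = O(\log_2 n)$.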

SLIDE 23
  • 3. B-Trees

Theory Vs. Reality: Number of Block Seek Operations

Figure: the binary search tree ($O(\log_2 n)$) and the B-tree ($O(\log_3 n)$) from the previous slide, each stored in level-order.

Number of block seeks for $n = 2^{30}$ keys with k keys per node, one seek per visited node:
  ■ k = 1: $\log_2(2^{30}) = 30$ seeks
  ■ k = 2: $\log_3(2^{30}) \approx 18.9$, i.e. about 19 seeks
  ■ k = 1024: $\log_{1024}(2^{30}) = 3$ seeks (fan-out approximated as 1024)
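The seek counts can be reproduced by counting how many levels a perfectly full tree with a given fan-out needs for n keys. A Python sketch; it uses integer arithmetic so that exact powers such as 1024³ = 2³⁰ are not subject to floating-point rounding:

```python
def levels_needed(n_keys: int, fanout: int) -> int:
    """Smallest height h with fanout ** h >= n_keys.

    With one block seek per visited node this is the seek count of a
    lookup in a perfectly full tree.
    """
    if fanout < 2:
        raise ValueError("fan-out must be at least 2")
    height, reach = 0, 1
    while reach < n_keys:
        reach *= fanout
        height += 1
    return height


if __name__ == "__main__":
    n = 2 ** 30
    for keys_per_node, fanout in ((1, 2), (2, 3), (1024, 1024)):
        print(f"k = {keys_per_node}: {levels_needed(n, fanout)} seeks")
```

levels_needed(2**30, 3) returns 19, the rounded-up value of log₃(2³⁰) ≈ 18.9.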

SLIDE 24
  • 3. B-Trees

Operations: Search

Figure (B = 2): B-tree with keys 3, 7, 10, 14, 17, 20, 24, 30, 32, 38, 42, 48

<search 20>
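Search descends from the root and, inside each node, picks the child between the two keys that enclose the search key. A Python sketch; the example tree (root 30, inner nodes 10 17 and 38) is an assumed arrangement of the figure's keys that satisfies the B = 2 rules, not necessarily the exact tree on the slide:

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    keys: list[int]                                         # sorted
    children: list["Node"] = field(default_factory=list)    # empty for a leaf


def search(root: Node, key: int, visited: Optional[list[list[int]]] = None) -> bool:
    """Return True if key is in the tree; optionally record the visited nodes.

    Every visited node corresponds to one block read.
    """
    node = root
    while True:
        if visited is not None:
            visited.append(node.keys)
        i = bisect_left(node.keys, key)              # first index with keys[i] >= key
        if i < len(node.keys) and node.keys[i] == key:
            return True
        if not node.children:                        # leaf reached without a match
            return False
        node = node.children[i]                      # keys between keys[i-1] and keys[i]


# Assumed arrangement of the figure's keys (see lead-in).
EXAMPLE = Node([30], [
    Node([10, 17], [Node([3, 7]), Node([14]), Node([20, 24])]),
    Node([38], [Node([32]), Node([42, 48])]),
])

if __name__ == "__main__":
    path: list[list[int]] = []
    print(search(EXAMPLE, 20, path), path)
```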

SLIDE 25
  • 3. B-Trees

Operations: Insert

B = 4, so a node holds at most 2 · B – 2 = 6 keys.

<insert 49>: 49 is added to the leaf 10, 17, 21, 30, 38 below the parent keys 9 … 56, which is then full.

<insert 42>: the leaf would need seven keys, one more than allowed, so it is split (recursively up to and beyond the root).
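Insertion always adds the key to a leaf; a node that reaches 2 · B – 1 keys is split into two halves of B – 1 keys each, and the middle key moves up into the parent, which may overflow in turn. If the root splits, the tree grows by one level. A Python sketch of this insert-only variant (class and method names are illustrative):

```python
from bisect import bisect_left, insort
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    keys: list[int]
    children: list["Node"] = field(default_factory=list)   # empty for a leaf


class BTree:
    """Insert-only B-tree sketch following the slide: B - 1 <= #keys < 2B - 1."""

    def __init__(self, B: int) -> None:
        self.B = B
        self.root = Node([])

    def insert(self, key: int) -> None:
        split = self._insert(self.root, key)
        if split is not None:                      # root split: tree grows one level
            median, right = split
            self.root = Node([median], [self.root, right])

    def _insert(self, node: Node, key: int) -> Optional[tuple[int, Node]]:
        """Insert below node; return (median, new right sibling) if node split."""
        if not node.children:
            insort(node.keys, key)                 # keys always land in a leaf
        else:
            i = bisect_left(node.keys, key)
            split = self._insert(node.children[i], key)
            if split is not None:                  # child split: take its median
                median, right = split
                node.keys.insert(i, median)
                node.children.insert(i + 1, right)
        if len(node.keys) < 2 * self.B - 1:
            return None                            # still within the limit
        return self._split(node)

    def _split(self, node: Node) -> tuple[int, Node]:
        """Split a node with 2B - 1 keys into two halves of B - 1 keys plus median."""
        mid = self.B - 1
        median = node.keys[mid]
        right = Node(node.keys[mid + 1:], node.children[mid + 1:])
        node.keys = node.keys[:mid]
        node.children = node.children[:mid + 1]
        return median, right


if __name__ == "__main__":
    tree = BTree(B=4)
    for k in (10, 17, 21, 30, 38, 49, 42):
        tree.insert(k)
    print(tree.root.keys, [c.keys for c in tree.root.children])
```

Inserting 10, 17, 21, 30, 38 and 49 into an empty tree with B = 4 fills a single leaf with six keys; inserting 42 then splits it into 10 17 21 and 38 42 49, with 30 moving up as the new root key.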

SLIDE 26
  • 3. B-Trees

Operations: Search

Figure (B = 2): B-tree with keys 3, 7, 10, 14, 17, 20, 24, 30, 32, 38, 42, 48

SLIDE 27
  • 3. B-Trees

Operations: Deletion I

B = 4

<delete 42>

Figure: B-tree fragment with keys 5, 10, 19, 23, 27, 42, 51, 62, 77, 83

SLIDE 28
  • 3. B-Trees

Operations: Deletion II

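B = 4

Before: parent keys 10, 42, 99 (…); leaves 19, 23, 27 and 51, 62, 77, 83.

<delete 19>: the leaf keeps only 23, 27, one key fewer than the minimum of B – 1 = 3.

Rotation (if filled siblings): the right sibling has a key to spare, so 42 moves down from the parent into the leaf and 51 moves up from the sibling into the parent.

After: parent keys 10, 51, 99 (…); leaves 23, 27, 42 and 62, 77, 83.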

SLIDE 29
  • 3. B-Trees

Operations: Deletion III

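B = 4

Before: parent keys 10, 51, 99 (…); leaves 23, 27, 42 and 62, 77, 83.

<delete 42>: the leaf again keeps only 23, 27.

Merge (underflowing siblings): the right sibling has no key to spare, so both leaves are merged together with the separating parent key 51.

After: parent keys 10, 99 (…); merged leaf 23, 27, 51, 62, 77, 83.

<why still valid?>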
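Both repairs work on a parent and its underflowing child. A Python sketch (names illustrative) that first tries a rotation through the parent from a sibling with more than B – 1 keys and otherwise merges with a sibling. It repairs a single level only; after a merge the parent may underflow itself, and the same repair would then be applied one level up. The merged node holds (B – 2) + 1 + (B – 1) = 2 · B – 2 keys, exactly the maximum allowed:

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list[int]
    children: list["Node"] = field(default_factory=list)   # empty for a leaf


def fix_underflow(parent: Node, i: int, B: int) -> None:
    """Repair parent.children[i] after it dropped to B - 2 keys (one level only)."""
    child = parent.children[i]
    left = parent.children[i - 1] if i > 0 else None
    right = parent.children[i + 1] if i + 1 < len(parent.children) else None

    if right is not None and len(right.keys) > B - 1:
        # Rotation from the right: separator moves down, sibling's smallest key moves up.
        child.keys.append(parent.keys[i])
        parent.keys[i] = right.keys.pop(0)
        if right.children:
            child.children.append(right.children.pop(0))
    elif left is not None and len(left.keys) > B - 1:
        # Rotation from the left: mirror image of the case above.
        child.keys.insert(0, parent.keys[i - 1])
        parent.keys[i - 1] = left.keys.pop()
        if left.children:
            child.children.insert(0, left.children.pop())
    elif right is not None:
        # Merge: child + separator + right sibling become one node.
        child.keys += [parent.keys.pop(i)] + right.keys
        child.children += right.children
        parent.children.pop(i + 1)
    else:
        if left is None:
            raise ValueError("parent must have at least two children")
        # Merge with the left sibling instead (child is the last one).
        left.keys += [parent.keys.pop(i - 1)] + child.keys
        left.children += child.children
        parent.children.pop(i)


def delete_from_leaf(parent: Node, i: int, key: int, B: int) -> None:
    """Remove key from the leaf parent.children[i] and repair an underflow."""
    leaf = parent.children[i]
    leaf.keys.remove(key)
    if len(leaf.keys) < B - 1:
        fix_underflow(parent, i, B)


if __name__ == "__main__":
    B = 4
    # Leaves 1 2 3 and 100 101 102 are hypothetical stand-ins for the elided parts.
    parent = Node([10, 42, 99], [Node([1, 2, 3]), Node([19, 23, 27]),
                                 Node([51, 62, 77, 83]), Node([100, 101, 102])])
    delete_from_leaf(parent, 1, 19, B)   # rotation
    print(parent.keys, [c.keys for c in parent.children])
    delete_from_leaf(parent, 1, 42, B)   # merge
    print(parent.keys, [c.keys for c in parent.children])
```

Replaying the two slides with B = 4, deleting 19 rotates 42 down and 51 up, and deleting 42 afterwards merges 23 27 with 51 and 62 77 83.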

SLIDE 30

^D


end