Goals for Today Learning Objective: Inspect Linux Disk Scheduling - - PowerPoint PPT Presentation



slide-1
SLIDE 1

CS 423: Operating Systems Design 1

Goals for Today

Reminder: Please put away devices at the start of class

  • Learning Objective:
      • Inspect Linux Disk Scheduling Algorithms
      • Survey concepts in File Systems Design
  • Announcements, etc:
      • MP3 is now available for download on Compass!
      • DUE APRIL 15th (15 days from now)
      • MP2.5 (optional) available! Details on Piazza. Contact me by April 8th if you would like to participate.

slide-2
SLIDE 2

CS 423: Operating Systems Design

Professor Adam Bates

CS 423
 Operating System Design: File Systems in Practice

slide-3
SLIDE 3

CS 423: Operating Systems Design

Symbolic Links

■ Symbolic links are different from regular links (often called hard links). Created with ln -s

■ Can be thought of as a directory entry that points to the name of another file.

■ Does not change link count for file

    When the original is deleted, the symbolic link remains (but no longer resolves)

■ They exist because:

    Hard links don’t work across file systems

    Hard links only work for regular files, not directories

[Figure: two panels. Hard link(s): directory entries point directly to the contents of the file. Symbolic link: a direct entry points to the contents of the file, and a symlink entry points to that entry's name.]
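The link-count behavior described on this slide can be checked from user space. The following is a minimal sketch, assuming a POSIX file system that supports both link types; the file names and the `link_demo` helper are made up for illustration.

```python
import os
import tempfile


def link_demo():
    """Create a file, a hard link, and a symlink, then delete the original name."""
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "original.txt")  # hypothetical file names
        hard = os.path.join(d, "hard.txt")
        soft = os.path.join(d, "soft.txt")

        with open(original, "w") as f:
            f.write("hello")

        os.link(original, hard)  # hard link: a second name for the same inode
        count_after_hard = os.stat(original).st_nlink

        os.symlink(original, soft)  # symlink: stores the *name* of the target
        count_after_soft = os.stat(original).st_nlink

        same_inode = os.stat(original).st_ino == os.stat(hard).st_ino

        os.remove(original)  # drop the original name
        symlink_remains = os.path.islink(soft)   # the link itself still exists
        symlink_resolves = os.path.exists(soft)  # exists() follows the link
        with open(hard) as f:
            hard_content = f.read()  # data is still reachable via the hard link

    return {
        "count_after_hard": count_after_hard,
        "count_after_soft": count_after_soft,
        "same_inode": same_inode,
        "symlink_remains": symlink_remains,
        "symlink_resolves": symlink_resolves,
        "hard_content": hard_content,
    }


if __name__ == "__main__":
    for key, value in link_demo().items():
        print(key, value)
```

The hard link raises the link count to 2 and the symlink leaves it unchanged; after the original name is removed, the symlink is still present but dangles, while the hard link still reads the data.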

slide-4
SLIDE 4

CS 423: Operating Systems Design

Linked Files

File header points to 1st block on disk

Each block points to next

Pros

    Can grow files dynamically

    Free list is similar to a file

Cons

    Random access: horrible

    Unreliable: losing a block means losing the rest

[Figure: file header pointing to a chain of blocks; the last block's pointer is null.]
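Why random access is "horrible" for linked files can be seen in a small simulation. This is a sketch only: the `next_ptr` dictionary stands in for the pointer stored inside each disk block, and `locate_block` is a hypothetical helper, not part of any real file system API.

```python
def locate_block(next_ptr, first_block, k):
    """Find the disk block holding logical block k of a linked file.

    next_ptr maps each disk block to the next block in the file (None at the end).
    Returns (disk_block, hops), where hops is the number of pointers followed.
    """
    block = first_block
    hops = 0
    while hops < k:
        block = next_ptr[block]  # on a real disk, each hop is a block read
        if block is None:
            raise IndexError("file has fewer than k + 1 blocks")
        hops += 1
    return block, hops


if __name__ == "__main__":
    # A four-block file stored in disk blocks 7 -> 2 -> 9 -> 4.
    chain = {7: 2, 2: 9, 9: 4, 4: None}
    print(locate_block(chain, 7, 0))
    print(locate_block(chain, 7, 3))
```

Reaching logical block k costs k pointer hops, so access time grows linearly with the offset, and a single lost block cuts off everything after it.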

slide-5
SLIDE 5

CS 423: Operating Systems Design

Linked Allocation

slide-6
SLIDE 6

CS 423: Operating Systems Design

Indexed File Allocation

Link full index blocks together using last entry.

slide-7
SLIDE 7

CS 423: Operating Systems Design

Multilevel Indexed Files

Multiple levels of index blocks

slide-8
SLIDE 8

CS 423: Operating Systems Design

File Systems In Practice

                        FAT               Berkeley FFS (Unix FS)         NTFS
Index structure         Linked list       Tree (fixed, asym.)            Tree (dynamic)
Granularity             block             block                          extent
Free space allocation   FAT array         Bitmap (fixed location)        Bitmap (file)
Locality                defragmentation   Block groups + reserve space   Extents, best fit, defrag

slide-9
SLIDE 9

CS 423: Operating Systems Design

MS File Allocation Table (FAT)

■ Linked list index structure
    ■ Simple, easy to implement
    ■ Still widely used (e.g., thumb drives)
■ File table:
    ■ Linear map of all blocks on disk
    ■ Each file a linked list of blocks
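A toy model may help picture the file table as a linear map in which each file is a linked list. The sketch below is not the on-disk FAT format: the `EOF` marker, the use of `None` for free entries, and the helper names are all assumptions made for illustration.

```python
EOF = -1  # hypothetical end-of-chain marker (real FAT variants use reserved values)


def file_blocks(fat, start):
    """Follow a file's chain through the table, starting at its first block."""
    chain = []
    block = start
    while block != EOF:
        chain.append(block)
        block = fat[block]  # the table entry for a block names the next block
    return chain


def free_blocks(fat):
    """Free blocks are found by scanning the table for unused entries."""
    return [i for i, entry in enumerate(fat) if entry is None]


def append_block(fat, start):
    """Append one block to a file: take the first free entry and link it in."""
    new_block = free_blocks(fat)[0]
    last = file_blocks(fat, start)[-1]
    fat[last] = new_block
    fat[new_block] = EOF
    return new_block


if __name__ == "__main__":
    fat = [None] * 10  # one entry per disk block; None means free
    fat[3], fat[7], fat[2] = 7, 2, EOF  # a file stored in blocks 3 -> 7 -> 2
    print(file_blocks(fat, 3))
    print(free_blocks(fat))
    append_block(fat, 3)
    print(file_blocks(fat, 3))
```

Finding free space and appending are simple table operations, but reading the n-th block of a file still means walking the chain entry by entry.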

slide-10
SLIDE 10

CS 423: Operating Systems Design

MS File Allocation Table (FAT)

[Figure: a 20-entry table (labeled MFT on the slide; presumably the FAT) next to the data blocks. The data blocks hold file 9 blocks 0-4 and file 12 blocks 0-1, stored out of order across the disk.]

slide-11
SLIDE 11

CS 423: Operating Systems Design

MS File Allocation Table (FAT)

■ Pros:
    ■ Easy to find free block
    ■ Easy to append to a file
    ■ Easy to delete a file
■ Cons:
    ■ Small file access is slow
    ■ Random access is very slow
    ■ Fragmentation
        ■ File blocks for a given file may be scattered
        ■ Files in the same directory may be scattered
        ■ Problem becomes worse as disk fills

slide-12
SLIDE 12

CS 423: Operating Systems Design

Berkeley FFS / UNIX FS

■ “Fast File System”
■ inode table
    ■ Analogous to FAT table
■ inode
    ■ Metadata
        ■ File owner, access permissions, access times, …
    ■ Set of 12 data pointers
        ■ With 4KB blocks => max size of 48KB files
    ■ Indirect block pointers
        ■ pointer to disk block of data pointers
        ■ w/ indirect blocks, we can point to 1K data blocks => 4MB (+48KB)
        ■ … but why stop there??

slide-13
SLIDE 13

CS 423: Operating Systems Design

Berkeley FFS / UNIX FS

■ Doubly indirect block pointer
    ■ w/ doubly indirect blocks, we can point to 1K indirect blocks
    ■ => 4GB (+ 4MB + 48KB)
■ Triply indirect block pointer
    ■ w/ triply indirect blocks, we can point to 1K doubly indirect blocks
    ■ => 4TB (+ 4GB + 4MB + 48KB)
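The sizes quoted on these two slides follow from simple arithmetic. The sketch below reproduces them, assuming 4-byte block pointers (which is what yields 1K pointers per 4KB block); the pointer size is an assumption, since the slides state only the 1K figure.

```python
BLOCK_SIZE = 4 * 1024    # 4KB blocks, as on the slides
POINTER_SIZE = 4         # assumed: 4-byte pointers, giving 1K pointers per block
DIRECT_POINTERS = 12     # direct data pointers in the inode

PTRS_PER_BLOCK = BLOCK_SIZE // POINTER_SIZE  # 1024


def capacity(level):
    """Bytes reachable through one pointer with `level` levels of indirection."""
    return PTRS_PER_BLOCK ** level * BLOCK_SIZE


def max_file_size():
    """Direct blocks plus one single, one double, and one triple indirect pointer."""
    direct = DIRECT_POINTERS * BLOCK_SIZE
    return direct + capacity(1) + capacity(2) + capacity(3)


if __name__ == "__main__":
    KB, MB, GB, TB = 2 ** 10, 2 ** 20, 2 ** 30, 2 ** 40
    print(DIRECT_POINTERS * BLOCK_SIZE // KB, "KB via direct pointers")
    print(capacity(1) // MB, "MB via the single indirect pointer")
    print(capacity(2) // GB, "GB via the double indirect pointer")
    print(capacity(3) // TB, "TB via the triple indirect pointer")
    print(max_file_size(), "bytes in total")
```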

slide-14
SLIDE 14

CS 423: Operating Systems Design

Berkeley FFS / UNIX FS

[Figure: file descriptor tables (parent, child, unrelated process) point to open file descriptions (file position, R/W, pointer to inode), which point to the inode (mode, link count, UID, GID, file size, times, address of first 10 disk blocks, single indirect, double indirect, triple indirect).]

slide-15
SLIDE 15

CS 423: Operating Systems Design

Berkeley FFS / UNIX FS

Alternate figure, same basic idea

[Figure: an inode in the inode array holds file metadata, direct pointers (DP), an indirect pointer, a doubly indirect pointer, and a triply indirect pointer; these lead to data blocks directly or through indirect, double indirect, and triple indirect blocks.]

slide-16
SLIDE 16

CS 423: Operating Systems Design

Berkeley FFS Asym. Trees

■ Indirection has a cost. Only use if needed!
■ Small files: shallow tree
    ■ Efficient storage for small files
■ Large files: deep tree
    ■ Efficient lookup for random access in large files
■ Sparse files: only fill pointers if needed
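To illustrate the asymmetric tree, the sketch below maps a logical block number to the pointer level that covers it. It assumes the layout from the earlier slides (12 direct pointers, 1K pointers per indirect block); `pointer_level` is a hypothetical helper, not an FFS routine.

```python
DIRECT_POINTERS = 12    # direct pointers in the inode
PTRS_PER_BLOCK = 1024   # pointers per indirect block (4KB blocks, assumed 4-byte pointers)


def pointer_level(logical_block):
    """Return (level, offset) for a logical block number.

    level 0 means a direct pointer; levels 1-3 mean single, double, and triple
    indirection. offset is the block's position within that level's range.
    Reaching the data takes `level` extra block reads beyond the inode.
    """
    n = logical_block
    if n < DIRECT_POINTERS:
        return 0, n
    n -= DIRECT_POINTERS
    for level in (1, 2, 3):
        span = PTRS_PER_BLOCK ** level  # data blocks covered at this level
        if n < span:
            return level, n
        n -= span
    raise ValueError("block number exceeds the maximum file size")


if __name__ == "__main__":
    for block in (0, 11, 12, 1035, 1036, 12 + 1024 + 1024 ** 2):
        print(block, pointer_level(block))
```

Blocks near the start of a file resolve with no indirection at all, which is why small files stay cheap while large files pay only a few extra reads per access.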

slide-17
SLIDE 17

CS 423: Operating Systems Design

Berkeley FFS Locality

■ How does FFS provide locality?
■ Block group allocation
    ■ Block group is a set of nearby cylinders
    ■ Files in same directory located in same group
    ■ Subdirectories located in different block groups
■ inode table spread throughout disk
    ■ inodes, bitmap near file blocks
■ First fit allocation
    ■ Property: Small files may be a little fragmented, but large files will be contiguous

slide-18
SLIDE 18

CS 423: Operating Systems Design

Berkeley FFS Locality

[Figure: three block groups (Block Group 0, Block Group 1, Block Group 2). Each group contains a free space bitmap, inodes, and data blocks for the files in a set of directories. The three directory sets shown are: /a, /d, and /b/c; /b, /a/g, and /z; /d/q, /c, and /a/p.]

slide-19
SLIDE 19

CS 423: Operating Systems Design

Berkeley FFS Locality

■ How does FFS provide locality?
■ Block group allocation
    ■ Block group is a set of nearby cylinders
    ■ Files in same directory located in same group
    ■ Subdirectories located in different block groups
■ inode table spread throughout disk
    ■ inodes, bitmap near file blocks
■ First fit allocation
    ■ Property: Small files may be a little fragmented, but large files will be contiguous

slide-20
SLIDE 20

CS 423: Operating Systems Design

Berkeley FFS Locality

“First Fit” Block Allocation:

[Figure: the blocks of a block group, laid out from the start of the block group. Legend: in-use block, free block.]

slide-21
SLIDE 21

CS 423: Operating Systems Design

Berkeley FFS Locality

“First Fit” Block Allocation:

[Figure: the same block group, from the start of the block group, after writing a two-block file.]

slide-22
SLIDE 22

CS 423: Operating Systems Design

Berkeley FFS Locality

“First Fit” Block Allocation:

[Figure: the same block group, from the start of the block group, after writing a large file.]
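The first-fit behavior in these three figures can be approximated with a bitmap scan. This is a simplified sketch: it models one block group as a Python list of booleans and ignores everything else FFS does when placing blocks; `first_fit` and the sample bitmap are made up for illustration.

```python
def first_fit(bitmap, n_blocks):
    """Allocate the first n free blocks, scanning from the start of the group.

    bitmap[i] is True when block i is in use. Returns the allocated block
    numbers and marks them in use.
    """
    allocated = []
    for i, in_use in enumerate(bitmap):
        if len(allocated) == n_blocks:
            break
        if not in_use:
            allocated.append(i)
    if len(allocated) < n_blocks:
        raise RuntimeError("not enough free blocks in this block group")
    for i in allocated:
        bitmap[i] = True
    return allocated


if __name__ == "__main__":
    # A group whose first blocks are partly used, followed by a long free run.
    group = [True, False, True, True, False, False, True] + [False] * 9
    print(first_fit(group, 2))  # small file: lands in the early holes
    print(first_fit(group, 6))  # large file: one hole, then the free run
```

In this model a small file lands in the scattered holes near the start of the group, while a large file quickly exhausts the holes and spends most of its blocks in the contiguous free run, matching the property stated on the locality slide.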

slide-23
SLIDE 23

CS 423: Operating Systems Design

Berkeley FFS / UNIX FS

■ Pros
    ■ Efficient storage for both small and large files
    ■ Locality for both small and large files
    ■ Locality for metadata and data
■ Cons
    ■ Inefficient for tiny files (a 1 byte file requires both an inode and a data block)
    ■ Inefficient encoding when file is mostly contiguous on disk (no equivalent to superpages)
    ■ Need to reserve 10-20% of free space to prevent fragmentation

slide-24
SLIDE 24

CS 423: Operating Systems Design

NTFS

■ “New Technology File System” for Windows NT
    ■ Released in ’93
    ■ Incidentally, a big step forward for security in commodity OSes
■ Master File Table
    ■ Flexible 1KB storage for metadata and data
■ Extents
    ■ Block pointers cover runs of blocks
    ■ Similar approach in Linux (ext4)
    ■ File create can provide hint as to size of file
■ Journaling for reliability
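The idea that one pointer covers a run of blocks can be sketched as a lookup over an extent list. The tuple layout below (first logical block, first physical block, length) and the `lookup` helper are assumptions for illustration; they are not the NTFS or ext4 on-disk format.

```python
from bisect import bisect_right


def lookup(extents, logical_block):
    """Translate a logical block number to a physical block using extents.

    extents is a list of (first_logical, first_physical, length) tuples,
    sorted by first_logical (hypothetical layout).
    """
    starts = [first_logical for first_logical, _, _ in extents]
    i = bisect_right(starts, logical_block) - 1  # last extent starting at or before it
    if i < 0:
        raise IndexError("block precedes the first extent")
    first_logical, first_physical, length = extents[i]
    if logical_block >= first_logical + length:
        raise IndexError("block is not covered by any extent")
    return first_physical + (logical_block - first_logical)


if __name__ == "__main__":
    # A 32-block file described by three pointers instead of 32.
    extents = [(0, 100, 8), (8, 500, 4), (12, 40, 20)]
    for block in (0, 7, 8, 11, 12, 31):
        print(block, "->", lookup(extents, block))
```

A mostly contiguous file needs only a handful of entries, which is the encoding advantage over per-block pointers noted in the FFS cons.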

slide-25
SLIDE 25

CS 423: Operating Systems Design

NTFS: Small File

Tiny file? Store in MFT record!

[Figure: the Master File Table with one MFT record (small file) containing Std. Info., File Name, Data (resident), and free space.]

slide-26
SLIDE 26

CS 423: Operating Systems Design

NTFS: ‘Normal’ File

Normal-sized file? Store pointers to data extents!

[Figure: an MFT record containing Std. Info., File Name, Data (nonresident), and free space; the nonresident data attribute holds (start, length) pairs, each pointing to a data extent.]

slide-27
SLIDE 27

CS 423: Operating Systems Design

NTFS: Indirect Blocks

Bigger file? Store pointer to additional MFT records!

[Figure: MFT record (part 1) containing Std. Info., Attr. list, File Name, and Data (nonresident); MFT record (part 2) containing Std. Info., Data (nonresident), and free space; together the records point to five data extents.]

slide-28
SLIDE 28

CS 423: Operating Systems Design

NTFS

■ Problems?
    ■ This looks like indexed file allocation (plus extents)
    ■ Linked MFT records may lack locality on disk!

slide-29
SLIDE 29

CS 423: Operating Systems Design

NTFS

Defragmentation still required…

[Figure: four kinds of MFT record.
  Small file: Std. Info., Data (resident).
  Normal file: Std. Info., Data (nonresident).
  Big/fragmented file: Std. Info., Attr. list, several Data (nonresident) attributes.
  Huge/badly-fragmented file: Std. Info., Attr. list (nonresident), several Data (nonresident) attributes; parts of the attribute list are themselves stored in extents.]

slide-30
SLIDE 30

CS 423: Operating Systems Design

Meanwhile, in Linux land…

■ The ext family of filesystems leverages many of the same concepts.
■ ext (’92): introduces VFS support, 2 GB max FS size
■ ext2 (’93): introduces attributes and symbolic links, max file size of 2 GB and max FS size of 2 TB, reserved disk space for root
■ ext3 (’01): introduces journaling, supports 2^32 blocks (up to max file of 2 TB, FS of 32 TB)
■ ext4 (’08): 2^48 block addressing, extent support
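How block-address width translates into a maximum file system size is plain multiplication. The sketch below works it out for a few block sizes; the block sizes are assumptions, since the slide gives only the address widths.

```python
KIB = 2 ** 10
TIB = 2 ** 40
EIB = 2 ** 60


def max_fs_bytes(address_bits, block_size):
    """Largest volume addressable with the given block-address width."""
    return (2 ** address_bits) * block_size


if __name__ == "__main__":
    # Block sizes here are assumed for illustration, not taken from the slide.
    print(max_fs_bytes(32, 4 * KIB) // TIB, "TiB with 2^32 blocks of 4 KiB")
    print(max_fs_bytes(32, 8 * KIB) // TIB, "TiB with 2^32 blocks of 8 KiB")
    print(max_fs_bytes(48, 4 * KIB) // EIB, "EiB with 2^48 blocks of 4 KiB")
```

Under this arithmetic, the 32 TB figure quoted for ext3 corresponds to 8 KiB blocks, while the more common 4 KiB block size gives 16 TiB.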