SLIDE 1

FAT cont'd / HDDs / SSDs / inodes

SLIDE 2

Changelog

Changes made in this version not seen in first lecture:

28 March 2019: SSD block remapping: fix some animation issues
28 March 2019: xv6 disk layout: add note re: specialness of some block numbers
earlier 28 March 2019: xv6 inode: direct and indirect blocks: fix label on indirect block
8 May 2019: xv6 file sizes: correct calculation

SLIDE 3

last time

kernel-level device driver interface
  devices as magic memory
  top and bottom half of device drivers
    part from syscall/etc. ('top')
    part from interrupt handler ('bottom')
  programmed I/O versus direct memory access (DMA)
    DMA = device talks to main memory directly
    programmed I/O = OS reads/writes buffer on controller

FAT filesystem
  disk as series of clusters (1+ sectors)
  files: linked list of clusters
  file allocation table: next pointers for list
  directories = file w/ list of name + start cluster number

SLIDE 4

paging/protection checkpoint grading

initially, grading didn't detect some guard-page issues / not handling fork
  now corrected
1-point adjustments (downward, sorry) from first posting

SLIDE 5

start locations?

really want filenames stored in directories!
in FAT: directory is a file, but its data is a list of:
  (name, starting location, other data about file)

SLIDE 6

finding files with directory

[diagram: the disk as numbered clusters 1-35]

directory's data (dir pt 0, dir pt 1):
  file "index.html" starting at cluster 10, 12792 bytes
  file "assignments.html" starting at cluster 17, 4312 bytes
  …
  directory "examples" starting at cluster 20
  unused entry
  …
  file "info.html" starting at cluster 50, 23789 bytes

index.html stored in four clusters:
  pt 0 (bytes 0-4095), pt 1 (bytes 4096-8191), pt 2 (bytes 8192-12287),
  pt 3 (bytes 12288-12791; bytes 12792-16383 of the cluster unused)


SLIDE 10

FAT directory entry

(each box in the diagram = 1 byte; entry for README.TXT, a 342-byte file starting at cluster 0x104F4)

'R' 'E' 'A' 'D' 'M' 'E' ' ' ' ' 'T' 'X' 'T' 0x00
  filename + extension (README.TXT), then attrs (directory? read-only? hidden? …)
0x9C 0xA1 0x20 0x7D 0x3C 0x7D 0x3C 0x01 0x00 0xEC 0x62 0x76
  creation date + time (2010-03-29 04:05:03.56)
  last access (2010-03-29)
  cluster # (high bits)
  last write (2010-03-22 12:23:12)
0x3C 0xF4 0x04 0x56 0x01 0x00 0x00 'F' 'O' 'O' …
  last write con't
  cluster # (low bits)
  file size (0x156 bytes)
  next directory entry…

32-bit first cluster number split into two parts
  (history: used to only be 16 bits)
8 character filename + 3 character extension
  (history: used to be all that was supported)
  longer filenames? encoded using extra directory entries
  (special attrs values to distinguish from normal entries)
attributes: is a subdirectory, read-only, …
  also marks directory entries used to hold extra filename data
convention: if first character is 0x00 or 0xE5, the entry is unused
  0x00: for filling empty space at end of directory
  0xE5: 'hole', e.g. from file deletion
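The split first-cluster number can be reassembled with a shift and an OR. A minimal sketch (the helper name is mine, not from the slides); for the README.TXT entry above, high word 0x0001 and low word 0x04F4 give cluster 0x104F4:

```c
#include <stdint.h>

/* Reassemble FAT32's first cluster number from its two 16-bit halves.
   (Helper name is illustrative, not from the FAT specification.) */
uint32_t first_cluster(uint16_t fstClusHI, uint16_t fstClusLO) {
    return ((uint32_t)fstClusHI << 16) | fstClusLO;
}
```

Note the byte dump is consistent with this: the "high bits" bytes 0x01 0x00 are little-endian 0x0001, and the "low bits" bytes 0xF4 0x04 are 0x04F4.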


SLIDE 16

aside: FAT date encoding

separate date and time fields (16-bit, little-endian integers)
  time: bits 0-4: seconds (divided by 2), 5-10: minute, 11-15: hour
  date: bits 0-4: day, 5-8: month, 9-15: year (minus 1980)
sometimes an extra field for 100ths(?) of a second
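Decoding these bitfields is just masks and shifts; a sketch following the bit layout above (function name is illustrative):

```c
#include <stdint.h>

/* Unpack FAT's 16-bit date and time fields, per the bit layout on the
   slide: date = day(0-4)/month(5-8)/year-1980(9-15),
   time = seconds/2(0-4)/minute(5-10)/hour(11-15). */
void fat_decode(uint16_t date, uint16_t time,
                int *year, int *month, int *day,
                int *hour, int *minute, int *second) {
    *day    = date & 0x1F;                  /* bits 0-4  */
    *month  = (date >> 5) & 0x0F;           /* bits 5-8  */
    *year   = ((date >> 9) & 0x7F) + 1980;  /* bits 9-15 */
    *second = (time & 0x1F) * 2;            /* stored as seconds/2 */
    *minute = (time >> 5) & 0x3F;           /* bits 5-10 */
    *hour   = (time >> 11) & 0x1F;          /* bits 11-15 */
}
```

The divide-by-2 trick is why FAT timestamps have only 2-second resolution (hence the extra sub-second field).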

SLIDE 17

FAT directory entries (from C)

struct __attribute__((packed)) DirEntry {
    uint8_t  DIR_Name[11];     // short name
    uint8_t  DIR_Attr;         // file attributes
    uint8_t  DIR_NTRes;        // set value to 0, never change this
    uint8_t  DIR_CrtTimeTenth; // millisecond timestamp for file creation time
    uint16_t DIR_CrtTime;      // time file was created
    uint16_t DIR_CrtDate;      // date file was created
    uint16_t DIR_LstAccDate;   // last access date
    uint16_t DIR_FstClusHI;    // high word of this entry's first cluster number
    uint16_t DIR_WrtTime;      // time of last write
    uint16_t DIR_WrtDate;      // date of last write
    uint16_t DIR_FstClusLO;    // low word of this entry's first cluster number
    uint32_t DIR_FileSize;     // file size in bytes
};

__attribute__((packed)): GCC/Clang extension to disable padding
  normally compilers add padding to structs
  (to avoid splitting values across cache blocks or pages)
uint8_t/uint16_t/uint32_t: 8/16/32-bit unsigned integer
  use exact size that's on disk
  just copy byte-by-byte from disk to memory
  (and everything happens to be little-endian)
why are the names so bad ("FstClusHI", etc.)?
  comes from Microsoft's documentation this way
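As a sketch of scanning a directory's data with this struct, using the unused-entry convention from the earlier slide (first name byte 0x00 = end-of-directory filler, 0xE5 = deleted hole); the counting helper is made up for illustration:

```c
#include <stdint.h>
#include <stddef.h>

/* DirEntry as above: 32 bytes, no padding. */
struct __attribute__((packed)) DirEntry {
    uint8_t  DIR_Name[11];
    uint8_t  DIR_Attr;
    uint8_t  DIR_NTRes;
    uint8_t  DIR_CrtTimeTenth;
    uint16_t DIR_CrtTime;
    uint16_t DIR_CrtDate;
    uint16_t DIR_LstAccDate;
    uint16_t DIR_FstClusHI;
    uint16_t DIR_WrtTime;
    uint16_t DIR_WrtDate;
    uint16_t DIR_FstClusLO;
    uint32_t DIR_FileSize;
};

/* Count in-use entries in one cluster's worth of directory data,
   skipping deleted holes and stopping at the 0x00 end filler. */
int count_used_entries(const uint8_t *cluster, size_t cluster_size) {
    int used = 0;
    for (size_t off = 0; off + sizeof(struct DirEntry) <= cluster_size;
         off += sizeof(struct DirEntry)) {
        const struct DirEntry *e = (const struct DirEntry *)(cluster + off);
        if (e->DIR_Name[0] == 0x00) break;    /* end-of-directory filler */
        if (e->DIR_Name[0] == 0xE5) continue; /* deleted 'hole' */
        used++;
    }
    return used;
}
```

Because the struct is packed and matches the on-disk layout, casting a pointer into the raw cluster buffer is all the "parsing" needed (on a little-endian machine).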


SLIDE 21

nested directories

foo/bar/baz/file.txt:
  read root directory entries to find foo
  read foo's directory entries to find bar
  read bar's directory entries to find baz
  read baz's directory entries to find file.txt
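The lookup loop can be sketched as follows, with a toy in-memory table standing in for on-disk directory data (all names and cluster numbers here are invented for illustration):

```c
#include <string.h>
#include <stddef.h>

/* Toy stand-in for reading a directory's entries off disk. */
struct ToyEntry { unsigned dir; const char *name; unsigned start; };
static const struct ToyEntry table[] = {
    {2, "foo", 10}, {10, "bar", 20}, {20, "baz", 30}, {30, "file.txt", 40},
};

static unsigned lookup_in_dir(unsigned dir_cluster, const char *name) {
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (table[i].dir == dir_cluster && strcmp(table[i].name, name) == 0)
            return table[i].start;
    return 0; /* not found */
}

/* Resolve a path one component at a time, as on the slide:
   each step reads one directory to find the next start cluster. */
unsigned resolve_path(unsigned root_cluster, char *path) {
    unsigned cur = root_cluster;
    for (char *part = strtok(path, "/"); part && cur;
         part = strtok(NULL, "/"))
        cur = lookup_in_dir(cur, part);
    return cur; /* starting cluster of the file, or 0 if not found */
}
```

The point of the sketch: lookup cost is one directory read per path component, which is why deep paths mean more disk accesses.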

SLIDE 22

the root directory?

but where is the first directory?

SLIDE 23

FAT disk header

[diagram: the disk as numbered clusters 1-35; filesystem header at the start, then reserved sectors, the FAT, a backup FAT, then data clusters; root directory starts at its listed cluster]

filesystem header contents (example values):
  (OS startup data)
  …
  bytes per sector: 512
  reserved sectors: 5
  sectors per cluster: 4
  …
  total sectors: 4096
  FAT size: 11
  number of FATs: 2
  root directory cluster: 10
  …


SLIDE 29

filesystem header

fixed location near beginning of disk
determines size of clusters, etc.
tells where to find FAT, root directory, etc.
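"Tells where to find the FAT" is arithmetic over the header fields; an illustrative calculation using the example values from the FAT disk header slide (this sketch assumes FAT size is given in sectors, and ignores FAT-variant details like cluster numbering offsets):

```c
#include <stdint.h>

/* Byte offsets of the FAT and of the data region, computed from header
   fields: reserved sectors come first, then num_fats copies of the FAT. */
struct FatLayout { uint32_t fat_start, data_start; };

struct FatLayout fat_layout(uint16_t bytes_per_sec, uint16_t rsvd_secs,
                            uint8_t num_fats, uint32_t fat_size_secs) {
    struct FatLayout l;
    l.fat_start  = (uint32_t)rsvd_secs * bytes_per_sec;
    l.data_start = l.fat_start
                 + (uint32_t)num_fats * fat_size_secs * bytes_per_sec;
    return l;
}
```

With the slide's example values (512-byte sectors, 5 reserved sectors, 2 FATs of 11 sectors each), the FAT begins at byte 2560 and data begins at byte 13824.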

SLIDE 30

FAT header (C)

struct __attribute__((packed)) Fat32BPB {
    uint8_t  BS_jmpBoot[3];    // jmp instr to boot code
    uint8_t  BS_oemName[8];    // indicates what system formatted this field, default=MSWIN4.1
    uint16_t BPB_BytsPerSec;   // count of bytes per sector
    uint8_t  BPB_SecPerClus;   // no. of sectors per allocation unit
    uint16_t BPB_RsvdSecCnt;   // no. of reserved sectors in the reserved region
                               // of the volume starting at 1st sector
    uint8_t  BPB_NumFATs;      // count of FAT data structures on the volume
    uint16_t BPB_rootEntCnt;   // count of 32-byte entries in root dir; for FAT32, set to 0
    uint16_t BPB_totSec16;     // total sectors on the volume
    uint8_t  BPB_media;        // value of fixed media
    ....
    uint16_t BPB_ExtFlags;     // flags indicating which FATs are active

notes:
  BPB_BytsPerSec, BPB_SecPerClus: size of sector (in bytes) and of cluster (in sectors)
  BPB_RsvdSecCnt: space before file allocation table
  BPB_NumFATs: number of copies of file allocation table
    extra copies in case disk is damaged
    typically two, with writes made to both


SLIDE 34

FAT: creating a file

add a directory entry
choose clusters to store file data (how???)
update FAT to link clusters together


SLIDE 36

FAT: free clusters

[diagram: the disk as numbered clusters 1-35]

file allocation table (index: entry value):
  …
  18: 20
  19: 0 (free)
  20: -1 (end mark)
  21: 0 (free)
  22: 0 (free)
  23: -1 (end)
  24: 0 (free)
  25: 35
  26: 48
  27: 0 (free)
  …

SLIDE 37

FAT: writing file data

[diagram: the disk as numbered clusters 1-35]

file allocation table (index: entry value), updated to link the new file's clusters:
  …
  18: 20
  19: 0 (free)
  20: -1 (end mark)
  21: 0 (free) → 22
  22: 0 (free) → 24
  23: -1 (end)
  24: 0 (free) → -1 (end)
  25: 35
  26: 48
  27: 0 (free)
  …

SLIDE 38

FAT: replacing unused directory entry

[diagram: the disk as numbered clusters 1-35; FAT updated as on the previous slide]

directory's data, with the new file's entry written over an unused one:
  "foo.txt", cluster 11, size …, created …
  …
  unused entry → "new.txt", cluster 21, size …
  …

SLIDE 39

FAT: extending directory

[diagram: the disk as numbered clusters 1-35; FAT updated as before]

directory's data (first cluster):
  "foo.txt", cluster 11, size …, created …
  …
  "quux.txt", cluster 104, size …, created …
directory's data (new second cluster):
  "new.txt", cluster 21, size …, created …
  unused entry
  unused entry
  unused entry
  …
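The "choose clusters (how???)" step can be sketched as a scan of an in-memory FAT for free (0) entries, linking each newly claimed cluster to the previous one. Constants and helper are illustrative (real FAT32 end marks are special large values, not -1; -1 matches the slides' notation):

```c
#define FAT_FREE 0
#define FAT_END  (-1)

/* Find `need` free FAT entries, link them into a chain, and mark the
   last one with the end value. Returns 1 on success, 0 if the disk is
   full; *first_out gets the chain's start cluster for the dir entry. */
int allocate_chain(int *fat, int fat_len, int need, int *first_out) {
    int prev = -1;
    *first_out = -1;
    for (int i = 2; i < fat_len && need > 0; i++) { /* clusters 0,1 reserved */
        if (fat[i] != FAT_FREE) continue;
        if (prev == -1) *first_out = i; /* start of chain */
        else fat[prev] = i;             /* link previous cluster forward */
        fat[i] = FAT_END;               /* tentatively the last cluster */
        prev = i;
        need--;
    }
    return need == 0;
}
```

With the FAT from the slides (21, 22, and 24 free), a 3-cluster allocation produces exactly the chain shown: 21 → 22 → 24 (end).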

SLIDE 40

FAT: deleting files

reset FAT entries for file's clusters to free (0)
write "unused" character in filename for directory entry
  maybe rewrite directory if that'll save space?

SLIDE 41

FAT pros and cons?

SLIDE 42

hard drive operation/performance

SLIDE 43

why hard drives?

what filesystems were designed for
currently most cost-effective way to have a lot of online storage
solid state drives (SSDs) imitate hard drive interfaces

SLIDE 44

hard drives

[diagram: labeled photograph of an open hard drive]

platters
  stack of flat discs (only top visible)
  spins when operating
heads
  read/write magnetic signals on platter surfaces
arm
  rotates to position heads over spinning platters

hard drive image: Wikimedia Commons / Evan-Amos

SLIDE 45

sectors/cylinders/etc.

cylinder, track, sector?

seek time (5-10ms): move heads to cylinder
  faster for adjacent accesses
rotational latency (2-8ms): rotate platter to sector
  depends on rotation speed
  faster for adjacent reads
transfer time (50-100+MB/s): actually read/write data
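A back-of-the-envelope calculation with midpoints of the numbers above (7.5 ms seek, 5 ms rotational latency, 75 MB/s transfer; all values illustrative) shows why transfer time barely matters for small random reads:

```c
/* Expected latency of one random read: seek + rotation + transfer.
   `mb_per_s` is transfer rate in MB/s, `bytes` the request size. */
double random_read_ms(double seek_ms, double rot_ms,
                      double mb_per_s, double bytes) {
    double transfer_ms = bytes / (mb_per_s * 1e6) * 1000.0;
    return seek_ms + rot_ms + transfer_ms;
}
```

For a 4096-byte read, transfer is only ~0.05 ms of a ~12.6 ms total: seek plus rotational latency dominate, which is what motivates disk scheduling and contiguous layout.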


SLIDE 50

disk latency components

queue time: how long the read waits in line
  depends on number of reads at a time, scheduling strategy
disk controller/etc. processing time
seek time: head to cylinder
rotational latency: platter rotates to sector
transfer time

SLIDE 51

cylinders and latency

cylinders closer to edge of disk are faster (maybe)
  less rotational latency

SLIDE 52

sector numbers

historically: OS knew cylinder/head/track location
now: opaque sector numbers
  more flexible for hard drive makers
  same interface for SSDs, etc.
typical pattern: low sector numbers = closer to center
typical pattern: adjacent sector numbers = adjacent on disk
actual mapping: decided by disk controller

SLIDE 53

OS to disk interface

disk takes read/write requests
  sector number(s)
  location of data for sector
  modern disk controllers: typically direct memory access
can have queue of pending requests
  disk processes them in some order
  OS can say "write X before Y"

SLIDE 54

hard disks are unreliable

Google study (2007), heavily utilized cheap disks
  1.7% to 8.6% annualized failure rate
    varies with age
    ≈ chance a disk fails each year
    disk fails = needs to be replaced
  9% of working disks had reallocated sectors

SLIDE 55

bad sectors

modern disk controllers do sector remapping
  part of physical disk becomes bad: use a different one
  this is expected behavior
  maintain mapping (special part of disk, probably)

SLIDE 56

error correcting codes

disks store 0s/1s magnetically
  very, very, very small and fragile
  magnetic signals can fade over time / be damaged / interfere / etc.
  but: use error detecting + correcting codes
    (details? CS/ECE 4434 covers this)
error detecting: can tell OS "don't have data"
  result: data corruption is very rare; data loss much more common
error correcting codes: extra copies to fix problems
  only works if not too many bits damaged

SLIDE 57

queuing requests

recall: multiple active requests
  queue of reads/writes in disk controller and/or OS
disk is faster for adjacent/close-by reads/writes
  less seek time / rotational latency

SLIDE 58

disk scheduling

schedule I/O to the disk
  schedule = decide what read/write to do next
  by OS: what to request from disk next?
  by controller: which OS request to do next?
typical goals:
  minimize seek time
  don't starve requests


SLIDE 60

shortest seek time first

[diagram: disk I/O requests over time vs. head position, from inside to outside of disk]

some requests starved, potentially forever, if enough other reads
missing consideration: rotational latency
  modification called shortest positioning time first
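The policy itself is simple; a toy SSTF scheduler over cylinder numbers (capped at 64 pending requests for simplicity; not from the slides):

```c
static int abs_diff(int a, int b) { return a > b ? a - b : b - a; }

/* Shortest-seek-time-first: repeatedly service the pending request
   closest to the current head position. Fills `order` with indices
   into `reqs` in service order; returns total head movement.
   Assumes n <= 64. */
int sstf(int head, const int *reqs, int n, int *order) {
    int done[64] = {0}, moved = 0;
    for (int k = 0; k < n; k++) {
        int best = -1;
        for (int i = 0; i < n; i++)
            if (!done[i] && (best == -1 ||
                abs_diff(reqs[i], head) < abs_diff(reqs[best], head)))
                best = i;
        done[best] = 1;
        moved += abs_diff(reqs[best], head);
        head = reqs[best];
        order[k] = best;
    }
    return moved;
}
```

Running it on a cluster of nearby requests plus one far-away one makes the starvation problem visible: the distant request is always serviced last, and with a steady stream of nearby arrivals it would never be serviced at all.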



SLIDE 65

one idea: SCAN

[diagram: disk I/O requests over time vs. head position, sweeping inside to outside of disk]

SLIDE 66

another idea: C-SCAN (C=circular)

[diagram: disk I/O requests over time vs. head position]

scan in single direction
  maybe more fair than SCAN (doesn't favor middle of disk)
  maybe disk has fast way of 'resetting' head to outside?


slide-69
SLIDE 69

some disk scheduling algorithms (text)

SSTF: take request with shortest seek time next

subject to starvation — stuck on one side of disk
could also take into account rotational latency — yields SPTF

shortest positioning time first

SCAN/elevator: move disk head towards center, then away

let requests pile up between passes
limits starvation; good overall throughput

C-SCAN: take next request closer to center of disk (if any)

take requests when moving from outside of disk to inside
let requests pile up between passes
limits starvation; good overall throughput

39
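The SSTF and C-SCAN policies above can be sketched as small selection functions. This is an illustrative toy (pending requests as a plain array of cylinder numbers), not code from any real kernel or disk controller:

```c
#include <stdlib.h>

/* SSTF: pick the pending request with the shortest seek distance
 * from the current head position. Subject to starvation. */
int sstf_next(const int *requests, int n, int head) {
    int best = -1, best_dist = 0;
    for (int i = 0; i < n; i++) {
        int dist = abs(requests[i] - head);
        if (best < 0 || dist < best_dist) {
            best = i;
            best_dist = dist;
        }
    }
    return best;                 /* index of chosen request, -1 if none */
}

/* C-SCAN: take the nearest request at or past the head in one fixed
 * direction; if none remain, "reset" and wrap to the lowest cylinder. */
int cscan_next(const int *requests, int n, int head) {
    int best = -1;
    for (int i = 0; i < n; i++)  /* nearest request ahead of the head */
        if (requests[i] >= head &&
            (best < 0 || requests[i] < requests[best]))
            best = i;
    if (best >= 0) return best;
    for (int i = 0; i < n; i++)  /* wrap around to start of the disk */
        if (best < 0 || requests[i] < requests[best])
            best = i;
    return best;
}

/* tiny demo with a fixed queue {98, 183, 37, 122} */
static const int demo_reqs[] = {98, 183, 37, 122};
int demo_sstf(int head)  { return sstf_next(demo_reqs, 4, head); }
int demo_cscan(int head) { return cscan_next(demo_reqs, 4, head); }
```

With the head at cylinder 53, SSTF jumps to 37 (closest), while C-SCAN continues outward to 98; past the last request, C-SCAN wraps back to 37.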

slide-70
SLIDE 70

caching in the controller

controller often has a DRAM cache
can hold things controller thinks OS might read

e.g. sectors ‘near’ recently read sectors
helps hide sector remapping costs?

can hold data waiting to be written

makes writes a lot faster; problem for reliability

40

slide-71
SLIDE 71

disk performance and filesystems

filesystem can… do contiguous or nearby reads/writes

bunch of consecutive sectors much faster to read
nearby sectors have lower seek/rotational delay

start a lot of reads/writes at once

avoid reading something to find out what to read next
array of sectors better than linked list

41

slide-72
SLIDE 72

solid state disk architecture

[diagram: SSD controller (includes CPU) with RAM, connected to an array of many NAND flash chips]

42

slide-73
SLIDE 73

flash

no moving parts

no seek time, rotational latency

can read in sector-like sizes (“pages”) (e.g. 4KB or 16KB)
write once between erasures
erasure only in large erasure blocks (often 256KB to megabytes!)
can only rewrite blocks on the order of tens of thousands of times

after that, flash starts failing

43

slide-74
SLIDE 74

SSDs: flash as disk

SSDs: implement hard disk interface for NAND flash

read/write sectors at a time
sectors much smaller than erasure blocks
sectors sometimes smaller than flash ‘pages’
reads/writes use sector numbers, not addresses
queue of reads/writes

need to hide erasure blocks

trick: block remapping — move where sectors are in flash

need to hide limit on number of erases

trick: wear leveling — spread writes out

44

slide-75
SLIDE 75

block remapping

[diagram: Flash Translation Layer. A remapping table maps OS sector numbers (logical) to flash locations (physical), e.g. 1→93, 31→74, 32→75. Flash is drawn as erasure blocks of 64 pages each (pages 0–63, 64–127, …); only a whole “erasure block” can be erased. “Garbage collection” frees up new space: active data is copied out of a block so the block can be erased and made ready to write; the old copies become unused (rewritten elsewhere)]

45


slide-79
SLIDE 79

block remapping

being written

Flash Translation Layer

logical physical 93 1 260 187 … … 31 74 32 75 163 … …

remapping table

OS sector numbers fmash locations

pages 0–63 pages 64–127 pages 128–191 pages 192-255 pages 256-319 pages 320-383

pages 128–191 pages 192–255 pages 256–319

erased block can only erase whole “erasure block”

“garbage collection” (free up new space)

copied from erased

active data erased + ready-to-write unused (rewritten elsewhere)

read sector write sector

45

slide-80
SLIDE 80

block remapping

controller contains mapping: sector → location in flash

on write: write sector to new location

eventually do garbage collection of sectors

if erasure block contains some replaced sectors and some current sectors…
copy current sectors to a new location to reclaim space from replaced sectors

doing this efficiently is very complicated
SSDs sometimes have a ‘real’ processor for this purpose

46
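A minimal sketch of the remapping idea, with made-up sizes and no real garbage collection: every write of a logical sector goes to a fresh flash page and updates the table, leaving the old page behind as garbage for a real controller to collect later.

```c
#define NSECTORS 128                  /* logical sectors exposed to the OS (toy size) */

static int sector_to_page[NSECTORS]; /* remapping table: -1 = never written */
static int next_free_page = 0;       /* next erased page (toy: never recycled) */

void ftl_init(void) {
    for (int i = 0; i < NSECTORS; i++)
        sector_to_page[i] = -1;
    next_free_page = 0;
}

/* write: never overwrite in place; take a fresh page and remap */
int ftl_write(int sector) {
    int page = next_free_page++;      /* a real FTL also GCs old pages here */
    sector_to_page[sector] = page;
    return page;                      /* flash page actually written */
}

/* read: consult the remapping table to find the current page */
int ftl_read(int sector) {
    return sector_to_page[sector];
}
```

Writing the same logical sector twice lands in two different flash pages; only the table knows which copy is current.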

slide-81
SLIDE 81

SSD performance

reads/writes: sub-millisecond
contiguous blocks don’t really matter
can depend a lot on the controller

faster/slower ways to handle block remapping

writing can be slower, especially when almost full

controller may need to move data around to free up erasure blocks
erasing an erasure block is pretty slow (milliseconds?)

47

slide-82
SLIDE 82

extra SSD operations

SSDs sometimes implement non-HDD operations

one operation: TRIM

way for OS to mark sectors as unused/erase them
SSD can remove sectors from block map

more efficient than zeroing blocks
frees up more space for writing new blocks

48

slide-83
SLIDE 83

aside: future storage

emerging non-volatile memories…
slower than DRAM (“normal memory”), faster than SSDs
read/write interface like DRAM but persistent
capacities similar to/larger than DRAM

49

slide-84
SLIDE 84

xv6 filesystem

xv6’s filesystem similar to modern Unix filesystems
better at doing contiguous reads than FAT
better at handling crashes
supports hard links (more on these later)
divides disk into blocks instead of clusters
file block numbers, free blocks, etc. in different tables

50

slide-85
SLIDE 85

xv6 disk layout

[diagram: the disk as a sequence of numbered blocks: (boot block) | super block | log | inode array | free block map | data blocks]

superblock — “header”

struct superblock {
  uint size;         // Size of file system image (blocks)
  uint nblocks;      // # of data blocks
  uint ninodes;      // # of inodes
  uint nlog;         // # of log blocks
  uint logstart;     // block # of first log block
  uint inodestart;   // block # of first inode block
  uint bmapstart;    // block # of first free map block
};

inode — fjle information

struct dinode {
  short type;               // File type: T_DIR, T_FILE, T_DEV
  short major; short minor; // T_DEV only
  short nlink;              // Number of links to inode in file system
  uint size;                // Size of file (bytes)
  uint addrs[NDIRECT+1];    // Data block addresses
};

location of data as block numbers: e.g. addrs[0] = 11; addrs[1] = 14
special case for larger files
free block map — 1 bit per data block: 1 if available, 0 if used
allocating blocks: scan for 1 bits; contiguous 1s — contiguous blocks
what about finding free inodes?
xv6 solution: scan for type = 0
typical Unix solution: separate free inode map

51
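Given the superblock fields above, finding which disk block holds inode number i is simple arithmetic (xv6’s IBLOCK macro does exactly this). The sizes below are xv6’s: 512-byte blocks and a 64-byte struct dinode, so 8 inodes per block.

```c
#define BSIZE 512                      /* xv6 block size */
#define DINODE_SIZE 64                 /* sizeof(struct dinode) above */
#define IPB (BSIZE / DINODE_SIZE)      /* inodes per block = 8 */

/* block number containing inode number inum */
unsigned iblock(unsigned inum, unsigned inodestart) {
    return inum / IPB + inodestart;
}
```

So with the inode array starting at block 32, inodes 0–7 live in block 32, inodes 8–15 in block 33, and so on.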


slide-91
SLIDE 91

xv6 directory entries

struct dirent {
  ushort inum;
  char name[DIRSIZ];
};

inum — index into inode array on disk
name — name of file or directory
each directory reference to an inode is called a hard link

multiple hard links to a file allowed!

52
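Lookup over these entries is a linear scan of the directory’s data. A sketch over a toy in-memory array (xv6’s real dirlookup reads the directory file block by block; the demo names and inode numbers here are made up):

```c
#include <string.h>

#define DIRSIZ 14

struct dirent {
    unsigned short inum;        /* 0 marks a free entry */
    char name[DIRSIZ];
};

/* return inum for name, or -1 if not present */
int dir_lookup(const struct dirent *entries, int n, const char *name) {
    for (int i = 0; i < n; i++)
        if (entries[i].inum != 0 &&
            strncmp(entries[i].name, name, DIRSIZ) == 0)
            return entries[i].inum;
    return -1;
}

/* tiny demo directory: "." -> inode 1, "README" -> inode 7 */
int demo_lookup(const char *name) {
    struct dirent dir[2] = { {1, "."}, {7, "README"} };
    return dir_lookup(dir, 2, name);
}
```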

slide-92
SLIDE 92

xv6 allocating inodes/blocks

need new inode or data block: linear search
simplest solution: xv6 always takes the first one that’s free

53
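The “first one that’s free” policy over the free block map can be sketched as a bit scan (following the earlier slides’ convention that a 1 bit means available):

```c
/* scan a free-block bitmap for the first available (1) bit */
int first_free(const unsigned char *bitmap, int nbits) {
    for (int i = 0; i < nbits; i++)
        if (bitmap[i / 8] & (1 << (i % 8)))
            return i;           /* first free block number */
    return -1;                  /* nothing free */
}

/* demo: blocks 0-9 in use, block 10 free (bit 2 of byte 1 set) */
int demo_first_free(void) {
    unsigned char bm[2] = { 0x00, 0x04 };
    return first_free(bm, 16);
}
```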

slide-93
SLIDE 93

xv6 inode: direct and indirect blocks

[diagram: addrs[0] through addrs[11] point directly to data blocks; addrs[12] points to an indirect block of direct block pointers, which in turn point to data blocks]

54

slide-94
SLIDE 94

xv6 file sizes

512-byte blocks
2-byte block pointers: 256 block pointers in the indirect block
256 blocks = 131072 bytes of data referenced
12 direct blocks @ 512 bytes each = 6144 bytes
1 indirect block referencing 131072 bytes
6144 + 131072 = 137216 bytes maximum file size

55
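The calculation above as a parameterized sketch. Note that with 4-byte (uint) pointers, as in the dinode’s addrs array, the indirect block would hold only 128 pointers, giving a smaller maximum:

```c
/* maximum file size given block size, block-pointer size, and
 * number of direct pointers (exactly one indirect block assumed) */
unsigned max_file_bytes(unsigned bsize, unsigned ptr_size, unsigned ndirect) {
    unsigned indirect_ptrs = bsize / ptr_size;   /* pointers per indirect block */
    return ndirect * bsize + indirect_ptrs * bsize;
}
```

The slide’s numbers (512, 2, 12) give 137216 bytes; 4-byte pointers would give 71680.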

slide-95
SLIDE 95

Linux ext2 inode

struct ext2_inode {
  __le16 i_mode;        /* File mode */
  __le16 i_uid;         /* Low 16 bits of Owner Uid */
  __le32 i_size;        /* Size in bytes */
  __le32 i_atime;       /* Access time */
  __le32 i_ctime;       /* Creation time */
  __le32 i_mtime;       /* Modification time */
  __le32 i_dtime;       /* Deletion Time */
  __le16 i_gid;         /* Low 16 bits of Group Id */
  __le16 i_links_count; /* Links count */
  __le32 i_blocks;      /* Blocks count */
  __le32 i_flags;       /* File flags */
  ...
  __le32 i_block[EXT2_N_BLOCKS]; /* Pointers to blocks */
  ...
};

type (regular, directory, device) and permissions (read/write/execute for owner/group/others)

owner and group

whole bunch of times
similar pointers like the xv6 FS — but more indirection

56


slide-100
SLIDE 100

double/triple indirect

[diagram: i_block[0] through i_block[11] are direct pointers to data blocks; i_block[12] is an indirect pointer (one block of block pointers); i_block[13] is a double-indirect pointer; i_block[14] is a triple-indirect pointer, each extra level adding another layer of blocks of block pointers]

57


slide-106
SLIDE 106

ext2 indirect blocks

12 direct block pointers 1 indirect block pointer

pointer to block containing more direct block pointers

1 double indirect block pointer

pointer to block containing more indirect block pointers

1 triple indirect block pointer

pointer to block containing more double indirect block pointers

exercise: if 1K blocks, 4-byte block pointers, how big can a file be?

58
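A sketch working the exercise: 1K blocks and 4-byte pointers give 1024/4 = 256 pointers per block, so the maximum is (12 + 256 + 256² + 256³) blocks of 1024 bytes, roughly 16 GB.

```c
/* maximum ext2-style file size: 12 direct pointers, plus one
 * indirect, one double-indirect, and one triple-indirect pointer */
unsigned long long ext2_max_file(unsigned long long bsize, unsigned ptr_size) {
    unsigned long long p = bsize / ptr_size;              /* pointers per block */
    unsigned long long blocks = 12 + p + p * p + p * p * p;
    return blocks * bsize;                                /* bytes */
}
```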

slide-107
SLIDE 107

ext2 indirect blocks

12 direct block pointers 1 indirect block pointer

pointer to block containing more direct block pointers

1 double indirect block pointer

pointer to block containing more indirect block pointers

1 triple indirect block pointer

pointer to block containing more double indirect block pointers

exercise: if 1K blocks, 4 byte block pointers, how big can a fjle be?

58

slide-108
SLIDE 108

indirect block advantages

small files: all direct blocks + no extra space beyond inode
larger files — more indirection

file should be large enough to hide extra indirection cost

(log N)-like time to find block for particular offset

no linear search like FAT

59
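The “(log N)-like” lookup shows up in a sketch that maps a byte offset to the level of indirection needed, using the exercise’s parameters (1K blocks, 256 pointers per block, 12 direct pointers):

```c
#define EXT2_BSIZE 1024ULL
#define EXT2_NPTR   256ULL          /* block pointers per 1 KB block */
#define EXT2_NDIR    12ULL          /* direct pointers in the inode */

/* 0 = direct, 1 = single, 2 = double, 3 = triple indirect */
int indirection_level(unsigned long long offset) {
    unsigned long long bn = offset / EXT2_BSIZE;   /* file block number */
    if (bn < EXT2_NDIR) return 0;
    bn -= EXT2_NDIR;
    if (bn < EXT2_NPTR) return 1;
    bn -= EXT2_NPTR;
    if (bn < EXT2_NPTR * EXT2_NPTR) return 2;
    return 3;
}
```

Small offsets need zero extra reads; even the largest offset needs only three pointer-block reads, versus one FAT entry per cluster in a FAT chain.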

slide-109
SLIDE 109

backup slides

60

slide-110
SLIDE 110

ways to talk to I/O devices

[diagram: user program → read/write/mmap/etc. file interface → regular files via filesystems, or device files via device drivers]

61

slide-111
SLIDE 111

devices as files

talking to device? open/read/write/close
typically similar interface within the kernel
device driver implements the file interface

62

slide-112
SLIDE 112

example device files from a Linux desktop

/dev/snd/pcmC0D0p — audio playback

configure, then write audio data

/dev/sda, /dev/sdb — SATA-based SSD and hard drive

usually access via filesystem, but can mmap/read/write directly

/dev/input/event3, /dev/input/event10 — mouse and keyboard

can read list of keypress/mouse movement/etc. events

/dev/dri/renderD128 — builtin graphics

DRI = direct rendering infrastructure

63

slide-113
SLIDE 113

devices: extra operations?

read/write/mmap not enough?

audio output device — set format of audio?
terminal — whether to echo back what user types?
CD/DVD — open the disk tray? is a disk present?
…

extra POSIX file descriptor operations:

ioctl (general I/O control)
tcgetattr/tcsetattr (for terminal settings)
fcntl
…

64
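As a concrete example of such an extra operation, TIOCGWINSZ is the terminal ioctl that reports the window size; on a descriptor that isn’t a terminal it fails with ENOTTY. A small sketch (POSIX/Linux, assuming the path is openable):

```c
#include <sys/ioctl.h>
#include <fcntl.h>
#include <unistd.h>

/* ask the device at path for its terminal width via ioctl;
 * returns -1 if it can't be opened or isn't a terminal */
int term_cols(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct winsize ws;
    int ok = ioctl(fd, TIOCGWINSZ, &ws);  /* only terminals support this */
    close(fd);
    return ok == 0 ? ws.ws_col : -1;
}
```

Calling it on /dev/null returns -1, since /dev/null is a device file but not a terminal.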

slide-114
SLIDE 114

FAT scattered data

file data and metadata scattered throughout disk

directory entry; many places in the file allocation table

slow to find location of kth cluster of file

first read FAT entries for clusters 0 to k − 1

need to scan FAT to allocate new blocks
all not good for contiguous reads/writes

65
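The O(k) cost is visible in a sketch of the chain walk: each step needs the previous FAT entry before the next can be read, so k lookups are unavoidable.

```c
/* follow a FAT chain k steps from the file's starting cluster;
 * each iteration depends on the previous FAT entry (O(k) lookups) */
int fat_kth_cluster(const int *fat, int start, int k) {
    int cluster = start;
    for (int i = 0; i < k; i++)
        cluster = fat[cluster];
    return cluster;
}

/* toy FAT: file starts at cluster 2, chain 2 -> 5 -> 7 (end) */
int demo_kth(int k) {
    int fat[8] = { 0, 0, 5, 0, 0, 7, 0, -1 };
    return fat_kth_cluster(fat, 2, k);
}
```

An inode’s addrs array avoids this: the kth block number is read directly (or through a fixed number of indirect blocks).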

slide-115
SLIDE 115

FAT in practice

typically keep entire file allocation table in memory
still pretty slow to find kth cluster of file

66