I/O 2 / Filesystems 1



slide-1
SLIDE 1

I/O 2 / Filesystems 1


slide-2
SLIDE 2

Changelog

Changes made in this version not seen in first lecture:

13 November: Correct cluster number on FAT directory entry slide.

slide-3
SLIDE 3

last time

page replacement modifications for scanning

  • Linux: guess file pages are used once until multiple references (but non-file pages do actual LRU approximation)

readahead: proactive replacement

  • detect sequential access patterns
  • try to keep slightly ahead of program scanning a file

device drivers

  • file (or block) interface: top half
  • interrupt handling: bottom half

devices as magic memory

  • connected to same bus as memory
  • often via bus adaptors, or chains of them

slide-4
SLIDE 4
on the homework (1)

yes, debugging was more challenging than I expected

what I did: lots of cprintfs

  • …including (virtual and physical) addresses and process IDs involved
  • should be able to track intended state of page tables/physical pages

try to make really simple test cases

  • minimize number of pages active

could also potentially use GDB

slide-5
SLIDE 5
on the homework (2)

anonymous feedback:

“This homework is the most frustrating homework we’ve had this semester. It’s like everything can go wrong and once it goes wrong it’s almost impossible to figure out what went wrong without knowing every possible interaction. This along with the number of things that can go wrong just makes it very frustrating. At least with previous homeworks, we were able to learn from most of our mistakes whereas here it’s like everything is guess and check. I think if we had a checkpoint where we implemented only allocate on demand and another checkpoint where we implemented copy on write this process could have been better.”

next time: will split into checkpoints (but too late now)

slide-6
SLIDE 6
on the homework (3)

slide-7
SLIDE 7

connecting devices

[diagram: the processor (with its interrupt controller), other processors, actual memory, and other devices all attach to the memory bus; a device controller on the bus contains control registers (status, read?, write?, …) at addresses 0x80004800, 0x80004808, 0x80004810, …, plus buffers/queues, and connects to external hardware]

control registers have memory addresses

  • looks like a write to memory
  • actually changes a value in the device controller

control registers might not really be registers

  • e.g. maybe writing to the write? “control register” actually just sends the value to the external hardware

buffers/queues will also have memory addresses

device controller has a way to send a “please interrupt” signal

  • the interrupt controller, a component of the processor, decides when to handle it
  • deals with ordering, interrupt disabling, which of several processors handles it, etc.
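A minimal sketch of what "control registers have memory addresses" can look like from C. The struct name `dev_regs`, the field names, and the helper functions are hypothetical; ordinary memory (`fake_device`) stands in for the device so the code runs anywhere, whereas a real driver would point at a fixed address such as 0x80004800.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register block: three registers 8 bytes apart, like the
   addresses 0x80004800, 0x80004808, 0x80004810 on the slide. */
struct dev_regs {
    volatile uint64_t status;    /* base + 0x00 */
    volatile uint64_t read_reg;  /* base + 0x08 */
    volatile uint64_t write_reg; /* base + 0x10 */
};

/* Ordinary memory standing in for the device (an assumption for this
   sketch). On real hardware the pointer would instead be a fixed,
   uncached address, e.g. (struct dev_regs *)0x80004800. */
struct dev_regs fake_device;

/* "Looks like a write to memory": a plain store through a volatile field. */
void dev_send(struct dev_regs *regs, uint64_t value) {
    regs->write_reg = value;
}

/* Reading a control register is likewise a plain load. */
uint64_t dev_status(struct dev_regs *regs) {
    return regs->status;
}
```

The `volatile` qualifier only stops the compiler from removing or merging the accesses; it does not by itself control hardware caching.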

slide-12
SLIDE 12

bus adaptors

[diagram: processor (with interrupt controller), other processors, and actual memory on the memory bus; the memory bus also connects to other devices or other bus adaptors; a bus adaptor bridges to a different bus, which holds other devices and a device controller (control registers: status, read?, write?, …; buffers/queues) with its external hardware]

slide-13
SLIDE 13

devices as magic memory (1)

devices expose memory locations to read/write

use read/write instructions to manipulate device

example: keyboard controller

  • read from magic memory location: get last keypress/release
  • reading location clears buffer for next keypress/release
  • get interrupt whenever there is a new keypress/release you haven’t read

slide-16
SLIDE 16

devices as magic memory (2)

example: display controller

  • write pixels to magic memory location: displayed on screen
  • other memory locations control format/screen size

example: network interface

  • write to buffers
  • write “send now” signal to magic memory location: send data
  • read from “status” location and buffers to receive

slide-17
SLIDE 17

what about caching?

caching “last keypress/release”?

  • I press ‘h’, OS reads ‘h’, does that get cached?
  • …I press ‘e’, OS reads what?

solution: OS can mark memory uncacheable

  • x86: bit in page table entry can say “no caching”

slide-20
SLIDE 20

aside: I/O space

x86 has “I/O addresses”

  • like memory addresses, but accessed with different instructions
  • in and out instructions

historically (and sometimes still): separate I/O bus

more recent processors/devices usually use memory addresses

  • no need for more instructions or buses
  • always have layers of bus adaptors to handle compatibility issues
  • other reasons to have devices and memory close (later)

slide-21
SLIDE 21

xv6 keyboard access

two control registers:

  • KBSTATP: status register (I/O address 0x64)
  • KBDATAP: data buffer (I/O address 0x60)

    st = inb(KBSTATP);        // in instruction: read from I/O address
    if ((st & KBS_DIB) == 0)  // bit KBS_DIB indicates data in buffer?
        return -1;
    data = inb(KBDATAP);      // read from data --- *clears* buffer
    /* interpret data to learn what kind of keypress/release */
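To try the polling logic without hardware, one option is to swap the real `inb` for a software stand-in. This is only a sketch: `fake_inb`, `fake_key_event`, and `kbd_poll` are hypothetical names, and the simulated controller holds a single pending scancode.

```c
#include <assert.h>
#include <stdint.h>

/* Constants as used by xv6's keyboard driver. */
#define KBSTATP 0x64  /* status register I/O address */
#define KBDATAP 0x60  /* data buffer I/O address */
#define KBS_DIB 0x01  /* status bit: data in buffer */

/* Hypothetical stand-in for the controller: one pending scancode. */
static int     fake_has_data = 0;
static uint8_t fake_data     = 0;

/* Test hook (not part of any real driver): pretend a key event arrived. */
void fake_key_event(uint8_t scancode) {
    fake_data = scancode;
    fake_has_data = 1;
}

/* Simulated replacement for inb(), which on real hardware executes the
   x86 `in` instruction. */
uint8_t fake_inb(uint16_t port) {
    if (port == KBSTATP)
        return fake_has_data ? KBS_DIB : 0;
    if (port == KBDATAP) {
        fake_has_data = 0;  /* reading the data port clears the buffer */
        return fake_data;
    }
    return 0;
}

/* Same control flow as the slide's snippet: -1 if nothing is waiting. */
int kbd_poll(void) {
    uint8_t st = fake_inb(KBSTATP);
    if ((st & KBS_DIB) == 0)
        return -1;
    return fake_inb(KBDATAP);
}
```

Polling twice after one key event returns the scancode once and then -1, which is the "reading clears the buffer" behavior.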

slide-22
SLIDE 22

programmed I/O

“programmed I/O”: write to or read from device controller buffers directly

OS runs loop to transfer data to or from device controller

might still be triggered by interrupt

  • new data in buffer to read?
  • device processed data previously written to buffer?

slide-23
SLIDE 23

direct memory access (DMA)

[diagram: processor (with interrupt controller), other processors, actual memory, and other devices on the memory bus; the device controller, attached to external hardware, reaches actual memory over the bus]

observation: devices can read/write memory

can have device copy data to/from memory

slide-28
SLIDE 28

direct memory access (DMA)

observation: devices can read/write memory

can have device copy data to/from memory

  • much faster, e.g., for disk or network I/O
  • avoids having processor run a loop
  • allows device to use memory as very large buffer space
  • allows device to read/write data as it needs/gets it

slide-29
SLIDE 29

direct memory access protocol

store address of buffer in memory

OS needs to keep buffer around until device indicates it’s done

end of transfer indicated via interrupt + control registers
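The three steps of the protocol can be sketched with a simulated device. Everything here is an assumption for illustration: the `dma_regs` layout, the function names, and the use of an ordinary pointer where real hardware would typically take a physical address and raise an interrupt on completion.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical control registers for a DMA-capable device. */
struct dma_regs {
    uint8_t *buf_addr;  /* where in memory the device should put data */
    uint32_t buf_len;   /* how many bytes it may write */
    uint32_t done;      /* set by the device when the transfer finishes */
};

/* Step 1 (OS): tell the device where the buffer is. */
void dma_start(struct dma_regs *regs, uint8_t *buf, uint32_t len) {
    regs->done = 0;
    regs->buf_addr = buf;
    regs->buf_len = len;
}

/* Step 2 (device, simulated in software): copy data straight into memory,
   never past the end of the buffer, then signal completion. Real hardware
   would also raise an interrupt here. */
void fake_device_transfer(struct dma_regs *regs, const uint8_t *src, uint32_t n) {
    uint32_t count = n < regs->buf_len ? n : regs->buf_len;
    memcpy(regs->buf_addr, src, count);
    regs->done = 1;
}

/* Step 3 (OS): only once this returns true may the buffer be reused or freed. */
int dma_finished(const struct dma_regs *regs) {
    return regs->done != 0;
}
```

The OS-side code never loops over the data itself; it only sets up the transfer and later checks for completion.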

slide-30
SLIDE 30

IOMMUs

typically, direct memory access requires using physical addresses

  • devices don’t have page tables
  • need contiguous physical addresses (multiple pages if buffer > page size)
  • a device that messes up can overwrite arbitrary memory

recent systems have an I/O Memory Management Unit (IOMMU)

  • page tables for devices
  • allows non-contiguous buffers
  • enforces protection: a broken device can’t write to the wrong memory location
  • helpful for virtual machines

slide-31
SLIDE 31

hard drive interfaces

hard drives and solid state disks are divided into sectors

  • historically 512 bytes (larger on recent disks)

disk commands:

  • read from sector i to sector j
  • write from sector i to sector j this data

typically want to read/write more than one sector: 4K+ at a time
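As a small worked example of the sector arithmetic, the sketch below maps a byte range onto an inclusive sector range, in the shape of a "read from sector i to sector j" command. The names `sector_range` and `sectors_for` are made up for illustration, and the 512-byte sector size is the historical value.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE 512u  /* historical sector size */

/* Inclusive range of sectors, matching "read from sector i to sector j". */
struct sector_range {
    uint64_t first;
    uint64_t last;
};

/* Which sectors hold bytes [offset, offset + length)? Requires length > 0. */
struct sector_range sectors_for(uint64_t offset, uint64_t length) {
    struct sector_range r;
    r.first = offset / SECTOR_SIZE;
    r.last  = (offset + length - 1) / SECTOR_SIZE;
    return r;
}
```

With these definitions a 4096-byte read at offset 0 covers sectors 0 through 7, i.e. eight 512-byte sectors.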

slide-32
SLIDE 32

filesystems

filesystems: store hierarchy of directories on disk

disk is a flat list of blocks of data

given a file (identified how?), where is its data?

  • which sectors? parts of sectors?

given a directory (identified how?), what files are in it?

metadata: names, owner, permissions, size, … of file

making a new file: where to put it?

making a file/directory bigger: where does new data go?

slide-33
SLIDE 33

the FAT filesystem

FAT: File Allocation Table

probably simplest widely used filesystem (family)

named for important data structure: file allocation table

slide-34
SLIDE 34

FAT and sectors

FAT divides disk into clusters

  • composed of one or more sectors
  • sector = minimum amount hardware can read

cluster: typically 512 to 4096 bytes

a file’s data is stored in clusters

reading a file: determine the list of clusters

slide-35
SLIDE 35

FAT: the file allocation table

big array on disk, one entry per cluster

each entry contains a number, usually “next cluster”

cluster num.   entry value
0              4
1              7
2              5
3              1434
…              …
1000           4503
1001           1523
…              …

slide-36
SLIDE 36

FAT: reading a file (1)

get (from elsewhere) first cluster of data

linked list of cluster numbers

next pointers? file allocation table entry for cluster

  • special value for NULL

cluster num.   entry value
…              …
10             14
11             23
12             54
13             -1 (end mark)
14             15
15             13
…              …

file starting at cluster 10 contains data in: cluster 10, then 14, then 15, then 13

slide-37
SLIDE 37

FAT: reading a file (2)

[diagram: the disk drawn as a row of numbered clusters (cluster numbers up to 35) next to the file allocation table; the clusters of two files are labeled block 0, block 1, block 2, block 3 and block 0, block 1, block 2]

file allocation table:

index   entry value
…       …
6       21
7       8
8       9
9       -1 (end mark)
10      14
11      23
12      54
13      -1 (end mark)
14      15
15      13
16      20
…       …

slide-40
SLIDE 40

FAT: reading files

to read a file given its start location:

  • read the starting cluster X
  • get the next cluster Y from FAT entry X
  • read the next cluster
  • get the next cluster from FAT entry Y
  • …
  • until you see an end marker
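The chain-following steps can be sketched over an in-memory copy of the table. The function name `fat_chain` and the use of -1 as the end mark follow the slides' simplified picture; real FAT volumes reserve other values for the end mark, and a real reader would fetch each cluster's data where the comment indicates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FAT_END (-1)  /* end mark as drawn on the slides */

/* Follow the chain starting at `start`, writing each cluster number into
   `out` (at most `max` of them). Returns how many clusters were recorded.
   `fat` is an in-memory copy of the table with `fat_len` entries. */
size_t fat_chain(const int32_t *fat, size_t fat_len, int32_t start,
                 int32_t *out, size_t max) {
    size_t n = 0;
    int32_t cluster = start;
    while (cluster != FAT_END && cluster >= 0 &&
           (size_t)cluster < fat_len && n < max) {
        out[n++] = cluster;      /* a real reader would read this cluster here */
        cluster = fat[cluster];  /* the FAT entry is the "next" pointer */
    }
    return n;
}
```

Using the example table (10 → 14 → 15 → 13 → end), the function records clusters 10, 14, 15, 13. The `max` bound also keeps a corrupted table containing a cycle from looping forever.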

slide-41
SLIDE 41

start locations?

really want filenames stored in directories!

in FAT: directory is a list of: (name, starting location, other data about file)

slide-42
SLIDE 42

finding files with directory

[diagram: the disk drawn as a row of numbered clusters (cluster numbers up to 35); the directory occupies two clusters (dir pt 0, dir pt 1) and index.html, starting at cluster 10, occupies four clusters (index.html pt 0 to pt 3)]

directory contents:

  • file “index.html” starting at cluster 10, 12792 bytes
  • file “assignments.html” starting at cluster 17, 4312 bytes
  • …
  • directory “examples” starting at cluster 20
  • unused entry
  • …
  • file “info.html” starting at cluster 50, 23789 bytes

clusters of index.html:

  • index.html pt 0: bytes 0-4095 of index.html
  • index.html pt 1: bytes 4096-8191 of index.html
  • index.html pt 2: bytes 8192-12287 of index.html
  • index.html pt 3: bytes 12288-12791 of index.html (bytes 12792-16383 of the clusters unused)
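The arithmetic behind the index.html example (12792 bytes in 4096-byte clusters) can be written out as two small helpers. The function names are hypothetical; the 4096-byte cluster size is inferred from the byte ranges in the example.

```c
#include <assert.h>
#include <stdint.h>

/* Number of clusters needed for a file of `size` bytes (rounding up). */
uint32_t clusters_needed(uint32_t size, uint32_t cluster_size) {
    return size / cluster_size + (size % cluster_size != 0 ? 1 : 0);
}

/* Bytes of the final cluster that actually hold file data (size > 0). */
uint32_t bytes_in_last_cluster(uint32_t size, uint32_t cluster_size) {
    uint32_t rem = size % cluster_size;
    return rem == 0 ? cluster_size : rem;
}
```

For index.html this gives 4 clusters with 504 bytes used in the last one, leaving the remaining 3592 bytes of that cluster unused.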

slide-45
SLIDE 45

FAT directory entry

entry for README.TXT, 342 byte file, starting at cluster 0x104F4 (box = 1 byte)

bytes                                          field
'R' 'E' 'A' 'D' 'M' 'E' '␣' '␣' 'T' 'X' 'T'    filename + extension (README.TXT)
0x00                                           attrs (directory? read-only? hidden? …)
0x9C 0xA1 0x20 0x7D 0x3C                       creation date + time (2010-03-29 04:05:03.56)
0x7D 0x3C                                      last access (2010-03-29)
0x01 0x00                                      cluster # (high bits)
0xEC 0x62 0x76 0x3C                            last write (2010-03-22 12:23:12)
0xF4 0x04                                      cluster # (low bits)
0x56 0x01 0x00 0x00                            file size (0x156 bytes)
'F' 'O' 'O' …                                  next directory entry…

32-bit first cluster number split into two parts

  • history: used to only be 16 bits

8 character filename + 3 character extension

  • history: used to be all that was supported
  • longer filenames? encoded using extra directory entries (special attrs values to distinguish from normal entries)

attributes: is a subdirectory, read-only, …

  • also marks directory entries used to hold extra filename data

convention: if first character is 0x00 or 0xE5, the entry is unused

  • 0x00: for filling empty space at end of directory
  • 0xE5: ‘hole’, e.g. from file deletion
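Two of the conventions on this slide are easy to check in code: the unused-entry markers in the first name byte, and the first cluster number split into high and low 16-bit halves. The names `entry_state`, `classify_entry`, `le16`, and `first_cluster` are hypothetical helpers, not part of any FAT library.

```c
#include <assert.h>
#include <stdint.h>

enum entry_state { ENTRY_IN_USE, ENTRY_END_FILL, ENTRY_HOLE };

/* Classify a directory entry by the first byte of its name field. */
enum entry_state classify_entry(const uint8_t *entry) {
    if (entry[0] == 0x00) return ENTRY_END_FILL; /* empty space at end of directory */
    if (entry[0] == 0xE5) return ENTRY_HOLE;     /* e.g. left by a deleted file */
    return ENTRY_IN_USE;
}

/* Read a little-endian 16-bit value from two bytes. */
uint16_t le16(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Combine the two halves of the first-cluster number. */
uint32_t first_cluster(uint16_t high, uint16_t low) {
    return ((uint32_t)high << 16) | low;
}
```

Applied to the README.TXT bytes, the high half 0x01 0x00 and the low half 0xF4 0x04 combine to 0x104F4, matching the slide.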

slide-51
SLIDE 51

aside: FAT date encoding

separate date and time fields (16 bits, little-endian integers)

  • time: bits 0-4: seconds (divided by 2), 5-10: minute, 11-15: hour
  • date: bits 0-4: day, 5-8: month, 9-15: year (minus 1980)

sometimes extra field for 100s(?) of a second
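A short sketch of decoding these bit fields. The struct and function names are made up for illustration; the inputs are assumed to be the 16-bit values already assembled from their little-endian bytes.

```c
#include <assert.h>
#include <stdint.h>

struct fat_date { int year, month, day; };
struct fat_time { int hour, minute, second; };

struct fat_date decode_fat_date(uint16_t raw) {
    struct fat_date d;
    d.day   = raw & 0x1F;          /* bits 0-4 */
    d.month = (raw >> 5) & 0x0F;   /* bits 5-8 */
    d.year  = 1980 + (raw >> 9);   /* bits 9-15, stored minus 1980 */
    return d;
}

struct fat_time decode_fat_time(uint16_t raw) {
    struct fat_time t;
    t.second = (raw & 0x1F) * 2;   /* bits 0-4 hold seconds divided by 2 */
    t.minute = (raw >> 5) & 0x3F;  /* bits 5-10 */
    t.hour   = raw >> 11;          /* bits 11-15 */
    return t;
}
```

The creation date bytes 0x7D 0x3C from the README.TXT entry form 0x3C7D, which decodes to 2010-03-29; the creation time bytes 0xA1 0x20 form 0x20A1, which decodes to 04:05:02 before any sub-second field is added.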

slide-52
SLIDE 52

FAT directory entries (from C)

    struct __attribute__((packed)) DirEntry {
        uint8_t  DIR_Name[11];      // short name
        uint8_t  DIR_Attr;          // file attribute
        uint8_t  DIR_NTRes;         // set value to 0, never change this
        uint8_t  DIR_CrtTimeTenth;  // millisecond timestamp for file creation time
        uint16_t DIR_CrtTime;       // time file was created
        uint16_t DIR_CrtDate;       // date file was created
        uint16_t DIR_LstAccDate;    // last access date
        uint16_t DIR_FstClusHI;     // high word of this entry's first cluster number
        uint16_t DIR_WrtTime;       // time of last write
        uint16_t DIR_WrtDate;       // date of last write
        uint16_t DIR_FstClusLO;     // low word of this entry's first cluster number
        uint32_t DIR_FileSize;      // 32-bit DWORD holding this file's size in bytes
    };

__attribute__((packed)): GCC/Clang extension to disable padding

  • normally compilers add padding to structs (to avoid splitting values across cache blocks or pages)

uint8_t/uint16_t/uint32_t: 8/16/32-bit unsigned integers

  • use exact size that’s on disk
  • just copy byte-by-byte from disk to memory (and everything happens to be little-endian)

why are the names so bad (“FstClusHI”, etc.)?

  • comes from Microsoft’s documentation this way
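A sketch of the "just copy byte-by-byte" approach, using the README.TXT bytes from the directory-entry slide. Assumptions are labeled in the comments: the `DIR_NTRes` byte is not visible in the slide text and is taken to be 0x00, the helper `parse_dir_entry` is a made-up name, and the copy only yields correct field values on a little-endian host.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct __attribute__((packed)) DirEntry {
    uint8_t  DIR_Name[11];
    uint8_t  DIR_Attr;
    uint8_t  DIR_NTRes;
    uint8_t  DIR_CrtTimeTenth;
    uint16_t DIR_CrtTime;
    uint16_t DIR_CrtDate;
    uint16_t DIR_LstAccDate;
    uint16_t DIR_FstClusHI;
    uint16_t DIR_WrtTime;
    uint16_t DIR_WrtDate;
    uint16_t DIR_FstClusLO;
    uint32_t DIR_FileSize;
};

/* With padding disabled the struct is exactly one 32-byte on-disk entry. */
_Static_assert(sizeof(struct DirEntry) == 32, "DirEntry must be 32 bytes");

/* Copy raw bytes into the struct. Only correct on a little-endian host
   (e.g. x86), since multi-byte fields are little-endian on disk. */
struct DirEntry parse_dir_entry(const uint8_t raw[32]) {
    struct DirEntry e;
    memcpy(&e, raw, sizeof e);
    return e;
}

/* README.TXT example bytes. The DIR_NTRes value is an assumption. */
const uint8_t readme_raw[32] = {
    'R','E','A','D','M','E',' ',' ','T','X','T',
    0x00,                   /* DIR_Attr */
    0x00,                   /* DIR_NTRes (assumed) */
    0x9C,                   /* DIR_CrtTimeTenth */
    0xA1, 0x20,             /* DIR_CrtTime */
    0x7D, 0x3C,             /* DIR_CrtDate */
    0x7D, 0x3C,             /* DIR_LstAccDate */
    0x01, 0x00,             /* DIR_FstClusHI */
    0xEC, 0x62,             /* DIR_WrtTime */
    0x76, 0x3C,             /* DIR_WrtDate */
    0xF4, 0x04,             /* DIR_FstClusLO */
    0x56, 0x01, 0x00, 0x00  /* DIR_FileSize */
};
```

After parsing, `DIR_FileSize` is 342 and the two cluster halves combine to 0x104F4. Code meant to run on big-endian machines would need to decode each field explicitly instead of relying on `memcpy`.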

slide-56
SLIDE 56

trees of directories

[diagram: directory tree with nodes root, home, ag8t, cr4bd, mst3k]

slide-57
SLIDE 57

nested directories

foo/bar/baz/file.txt

  • read root directory entries to find foo
  • read foo’s directory entries to find bar
  • read bar’s directory entries to find baz
  • read baz’s directory entries to find file.txt
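The component-by-component lookup can be sketched with a toy in-memory directory table standing in for on-disk entries. The `entry` struct, the `entries` table, the cluster numbers, and the function names are all made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory stand-in for on-disk directory entries. */
struct entry {
    int parent;        /* starting cluster of the containing directory */
    const char *name;
    int cluster;       /* starting cluster of this file or directory */
};

/* Toy directory contents; cluster numbers are invented for the example. */
static const struct entry entries[] = {
    {2,  "foo",      20},
    {20, "bar",      31},
    {31, "baz",      44},
    {44, "file.txt", 57},
};

/* Search one directory for a name of length `len`; -1 if absent. */
static int lookup(int dir, const char *name, size_t len) {
    for (size_t i = 0; i < sizeof entries / sizeof entries[0]; i++) {
        const struct entry *e = &entries[i];
        if (e->parent == dir && strlen(e->name) == len &&
            memcmp(e->name, name, len) == 0)
            return e->cluster;
    }
    return -1;
}

/* Resolve a '/'-separated path one component at a time from the root. */
int resolve(int root, const char *path) {
    int cur = root;
    while (*path != '\0' && cur != -1) {
        size_t len = strcspn(path, "/");  /* length of the next component */
        cur = lookup(cur, path, len);
        path += len;
        if (*path == '/')
            path++;                       /* skip the separator */
    }
    return cur;
}
```

Each loop iteration corresponds to one "read this directory's entries to find the next name" step, so resolving foo/bar/baz/file.txt performs four lookups.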

slide-58
SLIDE 58

the root directory?

but where is the first directory?

slide-59
SLIDE 59

FAT disk header

[diagram: the disk drawn as a row of numbered clusters (cluster numbers up to 35); labels mark the reserved sectors (which hold the filesystem header), the FAT, and where the root directory starts]

filesystem header contents:

(OS startup data)         …
bytes per sector          512
reserved sectors          5
sectors per cluster       4
…                         …
total sectors             4096
FAT size                  11
root directory cluster    10
…                         …

slide-64
SLIDE 64

filesystem header

fixed location near beginning of disk

determines size of clusters, etc.

tells where to find FAT, root directory, etc.
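A sketch of how a few layout facts follow from the header, using the example values from the header slide (512 bytes per sector, 5 reserved sectors, 4 sectors per cluster, FAT size 11). The struct and function names are hypothetical, and two assumptions are made: the FAT size is measured in sectors, and FAT entries are 4 bytes each, as in FAT32.

```c
#include <assert.h>
#include <stdint.h>

/* Subset of header fields, with the example values from the slide. */
struct fat_header {
    uint32_t bytes_per_sector;     /* 512 */
    uint32_t reserved_sectors;     /* 5 */
    uint32_t sectors_per_cluster;  /* 4 */
    uint32_t fat_size_sectors;     /* 11 (assumed to be in sectors) */
};

/* Size of one cluster in bytes. */
uint32_t cluster_bytes(const struct fat_header *h) {
    return h->bytes_per_sector * h->sectors_per_cluster;
}

/* Byte offset of the FAT: it begins right after the reserved sectors. */
uint32_t fat_offset(const struct fat_header *h) {
    return h->reserved_sectors * h->bytes_per_sector;
}

/* Number of FAT entries that fit, assuming 4-byte (FAT32-style) entries. */
uint32_t fat_entries(const struct fat_header *h) {
    return h->fat_size_sectors * h->bytes_per_sector / 4;
}
```

With the slide's values this gives 2048-byte clusters, a FAT starting at byte 2560, and room for 1408 entries.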

slide-65
SLIDE 65

FAT header (C)

struct __attribute__((packed)) Fat32BPB { uint8_t BS_jmpBoot[3]; // jmp instr to boot code uint8_t BS_oemName[8]; // indicates what system formatted this field, default=MSWIN4.1 uint16_t BPB_BytsPerSec; // Count of bytes per sector uint8_t BPB_SecPerClus; // no.of sectors per allocation unit uint16_t BPB_RsvdSecCnt; // no.of reserved sectors in the resercved region of the volume starting at 1st sector uint8_t BPB_NumFATs; // The count of FAT datastructures on the volume uint16_t BPB_rootEntCnt; // Count of 32-byte entries in root dir, for FAT32 set to 0 uint16_t BPB_totSec16; // total sectors on the volume uint8_t BPB_media; // value of fixed media ....

37
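The packed struct above fixes the byte offset of each field (3 + 8 = 11 bytes precede BPB_BytsPerSec). As a sketch of how the first sector could be decoded without relying on struct packing or host byte order, the helper below reads a few fields directly from a byte buffer. The names `bpb_summary` and `parse_bpb` are hypothetical; FAT stores multi-byte fields little-endian.

```c
#include <stdint.h>

/* A few decoded header fields; this summary struct is illustrative only. */
struct bpb_summary {
    uint16_t bytes_per_sector;
    uint8_t  sectors_per_cluster;
    uint16_t reserved_sectors;
    uint8_t  num_fats;
};

/* Read a little-endian 16-bit value from two bytes. */
static uint16_t le16(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Offsets follow the field order of Fat32BPB:
 * jmpBoot at 0..2, oemName at 3..10, BytsPerSec at 11..12,
 * SecPerClus at 13, RsvdSecCnt at 14..15, NumFATs at 16. */
struct bpb_summary parse_bpb(const uint8_t *sector0) {
    struct bpb_summary s;
    s.bytes_per_sector    = le16(sector0 + 11);
    s.sectors_per_cluster = sector0[13];
    s.reserved_sectors    = le16(sector0 + 14);
    s.num_fats            = sector0[16];
    return s;
}
```

Reading bytes explicitly like this is one design option; casting the buffer to the packed struct also works on little-endian machines with a compiler that supports the packed attribute.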

slide-66
SLIDE 66

FAT: creating a file

add a directory entry
choose clusters to store file data (how???)
update FAT to link clusters together

38

slide-68
SLIDE 68

FAT: free clusters

(figure: the disk drawn as a row of numbered clusters, axis labeled "cluster number")

file allocation table:

index   entry value
…       …
18      20
19      0 (free)
20      -1 (end mark)
21      0 (free)
22      0 (free)
23      -1 (end)
24      0 (free)
25      35
26      48
27      0 (free)
…       …

39
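One simple answer to "how do we choose clusters?" is to scan the FAT for entries whose value is 0 (free). The sketch below assumes the slide's simplified encoding (0 for free, -1 for end of chain) and an in-memory copy of the table; `find_free_cluster` is a hypothetical helper name, and real FAT32 uses different end-of-chain values.

```c
#include <stdint.h>
#include <stddef.h>

#define FAT_FREE 0     /* slide's encoding: entry value 0 means free */
#define FAT_END  (-1)  /* slide's encoding: -1 marks end of a chain */

/* Return the index of the first free entry in [start, count),
 * or -1 if no entry in that range is free. */
long find_free_cluster(const int32_t *fat, size_t count, size_t start) {
    for (size_t i = start; i < count; i++) {
        if (fat[i] == FAT_FREE)
            return (long)i;
    }
    return -1;
}
```

With the table on the slide, a scan starting at index 18 would find index 19 first. Passing a different `start` (for example, just past the last allocated cluster) is one way to steer allocation toward a particular region of the disk.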

slide-69
SLIDE 69

FAT: writing file data

(figure: the disk drawn as a row of numbered clusters, axis labeled "cluster number")

file allocation table (→ marks entries rewritten for the new file):

index   entry value
…       …
18      20
19      0 (free)
20      -1 (end mark)
21      0 (free) → 22
22      0 (free) → 24
23      -1 (end)
24      0 (free) → -1 (end)
25      35
26      48
27      0 (free)
…       …

40
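The table update on this slide (21 → 22 → 24 → end) amounts to writing each chosen cluster's successor into its FAT entry. A minimal sketch, again using the slide's simplified 0/-1 encoding and hypothetical helper names:

```c
#include <stdint.h>
#include <stddef.h>

#define FAT_FREE 0     /* slide's encoding: free entry */
#define FAT_END  (-1)  /* slide's encoding: end of chain */

/* Link the n chosen clusters into one chain, in the order given.
 * The last cluster gets the end-of-chain mark. */
void link_chain(int32_t *fat, const uint32_t *clusters, size_t n) {
    for (size_t i = 0; i < n; i++) {
        fat[clusters[i]] = (i + 1 < n) ? (int32_t)clusters[i + 1] : FAT_END;
    }
}

/* Count the clusters in a chain by following FAT entries.
 * Assumes a well-formed chain that ends in FAT_END. */
size_t chain_length(const int32_t *fat, uint32_t first) {
    size_t len = 0;
    for (int32_t c = (int32_t)first; c != FAT_END; c = fat[c])
        len++;
    return len;
}
```

Calling `link_chain` with the clusters 21, 22, 24 reproduces the three changed entries on the slide; the directory entry for the file then only needs to record the first cluster, 21.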

slide-70
SLIDE 70

FAT: replacing unused directory entry

(figure: the disk drawn as a row of numbered clusters, axis labeled "cluster number")

file allocation table (→ marks entries rewritten for the new file):

index   entry value
…       …
18      20
19      0 (free)
20      -1 (end mark)
21      0 (free) → 22
22      0 (free) → 24
23      -1 (end)
24      0 (free) → -1 (end)
25      35
26      48
27      0 (free)
…       …

directory of new file:

"foo.txt", cluster 11, size …, created …
…
unused entry → "new.txt", cluster 21, size …
…

41

slide-71
SLIDE 71

FAT: extending directory

(figure: the disk drawn as a row of numbered clusters, axis labeled "cluster number")

file allocation table (→ marks entries rewritten for the new file):

index   entry value
…       …
18      20
19      0 (free)
20      -1 (end mark)
21      0 (free) → 22
22      0 (free) → 24
23      -1 (end)
24      0 (free) → -1 (end)
25      35
26      48
27      0 (free)
…       …

directory of new file:

"foo.txt", cluster 11, size …, created …
…
"quux.txt", cluster 104, size …, created …
"new.txt", cluster 21, size …, created …
unused entry
unused entry
unused entry
…

42
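The two directory cases above (reuse an unused entry, or extend the directory when none is left) can be told apart with one scan. This is a sketch with a hypothetical helper name; it relies on two FAT conventions: directory entries are 32 bytes, and a first name byte of 0xE5 or 0x00 marks an entry as unused.

```c
#include <stdint.h>
#include <stddef.h>

#define DIRENT_SIZE  32    /* FAT directory entries are 32 bytes each */
#define NAME_DELETED 0xE5  /* first name byte: entry is unused (deleted) */
#define NAME_END     0x00  /* first name byte: unused, no entries follow */

/* Return the index of the first reusable entry among `count` entries,
 * or -1 if every entry is in use (the caller would then have to extend
 * the directory with a new cluster). */
long find_unused_dirent(const uint8_t *dir, size_t count) {
    for (size_t i = 0; i < count; i++) {
        uint8_t first = dir[i * DIRENT_SIZE];
        if (first == NAME_DELETED || first == NAME_END)
            return (long)i;
    }
    return -1;
}
```

A return value of -1 corresponds to the "extending directory" slide: a free cluster is allocated, linked onto the end of the directory's own chain, and the new entry goes at its start, followed by unused entries.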

slide-72
SLIDE 72

FAT: deleting files

reset FAT entries for file clusters to free (0)
write "unused" character in filename for directory entry

maybe rewrite directory if that'll save space?

43
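The two deletion steps can be sketched as follows, using the slide's simplified FAT encoding (0 free, -1 end). The helper names are hypothetical, and 0xE5 is the usual FAT "unused" marker for the first filename byte.

```c
#include <stdint.h>

#define FAT_FREE 0     /* slide's encoding: free entry */
#define FAT_END  (-1)  /* slide's encoding: end of chain */

/* Step 1: walk the file's chain and reset every entry to free.
 * Stops early if it reaches an already-free entry (damaged chain). */
void free_chain(int32_t *fat, uint32_t first) {
    int32_t c = (int32_t)first;
    while (c != FAT_END && c != FAT_FREE) {
        int32_t next = fat[c];  /* remember the successor before erasing */
        fat[c] = FAT_FREE;
        c = next;
    }
}

/* Step 2: mark the directory entry unused by overwriting the first
 * byte of its filename. */
void mark_dirent_unused(uint8_t *entry) {
    entry[0] = 0xE5;
}
```

Note that neither step touches the file's data clusters; only the bookkeeping changes, which is why deletion is cheap.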

slide-73
SLIDE 73

FAT pros and cons?

44

slide-74
SLIDE 74

why hard drives?

what filesystems were designed for
currently most cost-effective way to have a lot of online storage
solid state drives (SSDs) imitate hard drive interfaces

45

slide-75
SLIDE 75

hard drives

platters

stack of flat discs (only top visible)
spins when operating

heads

read/write magnetic signals on platter surfaces

arm

rotates to position heads over spinning platters

hard drive image: Wikimedia Commons / Evan-Amos

46

slide-76
SLIDE 76

sectors/cylinders/etc.

(figure labels: cylinder, track, sector?)

seek time: 5–10ms
move heads to cylinder
faster for adjacent accesses

rotational latency: 2–8ms
rotate platter to sector
depends on rotation speed
faster for adjacent reads

transfer time: 50–100+MB/s
actually read/write data

47
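The three components on this slide add up to the time for one disk access. The sketch below works through that arithmetic; the function names are hypothetical, the 7200 RPM figure used in the example is an assumption (the slide gives only the 2–8ms range), and 1 MB is taken as 10^6 bytes.

```c
/* Average rotational latency in milliseconds: on average the platter
 * must turn half a revolution before the wanted sector arrives. */
double avg_rotational_latency_ms(double rpm) {
    double ms_per_revolution = 60000.0 / rpm;
    return 0.5 * ms_per_revolution;
}

/* Estimated time in milliseconds for one random access:
 * seek + average rotational latency + transfer. */
double access_time_ms(double seek_ms, double rpm,
                      double bytes, double mb_per_s) {
    double transfer_ms = bytes / (mb_per_s * 1e6) * 1000.0;
    return seek_ms + avg_rotational_latency_ms(rpm) + transfer_ms;
}
```

For example, at an assumed 7200 RPM one revolution takes about 8.33ms, so the average rotational latency is about 4.17ms. With a 5ms seek and a 4096-byte read at 100MB/s, the transfer itself takes only about 0.04ms, so the total of roughly 9.2ms is dominated by seek and rotation. That is why adjacent accesses are so much cheaper than scattered ones.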

slide-81
SLIDE 81

POSIX: everything is a file

the file: one interface for

devices (terminals, printers, …)
regular files on disk
networking (sockets)
local interprocess communication (pipes, sockets)

basic operations: open(), read(), write(), close()

48
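Because everything shares one interface, the same loop can copy data between any two file descriptors, whether they refer to regular files, pipes, terminals, or sockets. A minimal sketch (the function name `copy_fd` is hypothetical; `read()` and `write()` are the POSIX calls named on the slide):

```c
#include <unistd.h>
#include <errno.h>

/* Copy everything from in_fd to out_fd.
 * Returns the number of bytes copied, or -1 on error. */
long copy_fd(int in_fd, int out_fd) {
    char buf[4096];
    long total = 0;
    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof buf);
        if (n == 0)
            return total;            /* end of file */
        if (n < 0) {
            if (errno == EINTR)
                continue;            /* interrupted by a signal: retry */
            return -1;
        }
        /* write() may accept fewer bytes than requested, so loop. */
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
        total += n;
    }
}
```

For example, `copy_fd(0, 1)` copies standard input to standard output regardless of what either is connected to.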

slide-82
SLIDE 82

the file interface

open before use

setup, access control happens here

byte-oriented

real device isn't? operating system needs to hide that

explicit close

49
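To illustrate how a byte-oriented interface can sit on top of a device that only transfers whole blocks, here is a sketch with a made-up, memory-backed block device. The names `block_dev`, `read_block`, and `read_bytes` and the 512-byte block size are all assumptions for the example, not part of any real kernel API.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 512

/* Hypothetical block device: it can only hand out whole blocks.
 * Backed by memory here so the example is self-contained. */
struct block_dev {
    const uint8_t *data;
    size_t num_blocks;
};

/* Read one whole block; returns 0 on success, -1 if out of range. */
static int read_block(const struct block_dev *d, size_t block, uint8_t *out) {
    if (block >= d->num_blocks)
        return -1;
    memcpy(out, d->data + block * BLOCK_SIZE, BLOCK_SIZE);
    return 0;
}

/* Byte-oriented read: copy up to len bytes starting at any byte offset.
 * Returns the number of bytes actually copied. */
size_t read_bytes(const struct block_dev *d, size_t offset,
                  uint8_t *dst, size_t len) {
    uint8_t block[BLOCK_SIZE];
    size_t done = 0;
    while (done < len) {
        size_t pos = offset + done;
        if (read_block(d, pos / BLOCK_SIZE, block) != 0)
            break;                              /* past end of device */
        size_t in_block = pos % BLOCK_SIZE;     /* where we are in this block */
        size_t chunk = BLOCK_SIZE - in_block;   /* bytes left in this block */
        if (chunk > len - done)
            chunk = len - done;
        memcpy(dst + done, block + in_block, chunk);
        done += chunk;
    }
    return done;
}
```

A request that straddles a block boundary, such as 10 bytes starting at offset 508, is served by reading two blocks and copying the relevant part of each.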

slide-84
SLIDE 84

kernel buffering (reads)

(figure: program, operating system, and two devices: keyboard and disk)

keyboard path:
keypress happens, read
buffer: keyboard input waiting for program
program: read char from terminal …via buffer

disk path:
program: read char from file …via buffer
read block of data from disk
buffer: recently read data from disk

50

slide-89
SLIDE 89

kernel buffering (writes)

(figure: program, operating system, and two devices: network and disk)

network path:
program: print char to remote machine
buffer: output waiting for network
(when ready) send data

disk path:
program: write char to file
buffer: data waiting to be written on disk
(when ready) write block of data to disk

51

slide-94
SLIDE 94

read/write operations

read/write: move data into/out of buffer
block (make process wait) if buffer is empty (read)/full (write)

(default behavior, possibly changeable)

actual I/O operations: wait for device to be ready

trigger process to stop waiting if needed

52
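The empty/full rule can be seen in a small bounded buffer. This is a sketch only: the names and the 4-byte capacity are made up, and where a real kernel would put the calling process to sleep, these functions return -1 (roughly what a non-blocking mode would report).

```c
#include <stddef.h>

#define BUF_CAP 4

/* A tiny ring buffer standing in for a kernel buffer. */
struct kbuf {
    char data[BUF_CAP];
    size_t head;   /* index of the oldest byte */
    size_t count;  /* number of bytes currently stored */
};

/* Write side: returns 0 on success, or -1 if the buffer is full
 * (the point where a blocking write would make the process wait). */
int kbuf_put(struct kbuf *b, char c) {
    if (b->count == BUF_CAP)
        return -1;
    b->data[(b->head + b->count) % BUF_CAP] = c;
    b->count++;
    return 0;
}

/* Read side: returns 0 on success, or -1 if the buffer is empty
 * (the point where a blocking read would make the process wait). */
int kbuf_get(struct kbuf *b, char *out) {
    if (b->count == 0)
        return -1;
    *out = b->data[b->head];
    b->head = (b->head + 1) % BUF_CAP;
    b->count--;
    return 0;
}
```

In a kernel, the device side (for example, an interrupt handler adding a keypress) would call the opposite operation and then wake any process that was waiting on the buffer.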

slide-95
SLIDE 95

layering

application
standard library (cout/printf, and their own buffers)
system calls (read/write)
kernel's file interface (kernel's buffers)
device drivers
hardware interfaces

53
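The standard library's own buffers sit above the kernel's: many small library-level writes are collected in user space and handed to the kernel in one `write()` call. The sketch below imitates that with a hand-rolled writer (the `ubuf_writer` type, its functions, and the 8-byte capacity are hypothetical; this is not how any particular libc is implemented).

```c
#include <unistd.h>
#include <stddef.h>

#define UBUF_CAP 8

/* User-space buffered writer layered over a file descriptor. */
struct ubuf_writer {
    int fd;
    char buf[UBUF_CAP];
    size_t used;           /* bytes waiting in the user-space buffer */
    unsigned write_calls;  /* how many write() system calls were made */
};

/* Hand all buffered bytes to the kernel. Returns 0 on success, -1 on error. */
int ubuf_flush(struct ubuf_writer *w) {
    size_t off = 0;
    while (off < w->used) {
        ssize_t n = write(w->fd, w->buf + off, w->used - off);
        w->write_calls++;
        if (n < 0)
            return -1;
        off += (size_t)n;
    }
    w->used = 0;
    return 0;
}

/* Append one byte; only touches the kernel when the buffer is full. */
int ubuf_putc(struct ubuf_writer *w, char c) {
    if (w->used == UBUF_CAP && ubuf_flush(w) != 0)
        return -1;
    w->buf[w->used++] = c;
    return 0;
}
```

Until `ubuf_flush` runs, the data exists only in the process's memory, which is the same reason output from printf can appear late or be lost if a program crashes before its buffers are flushed.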