1 last time page table review virtual to physical translation - - PowerPoint PPT Presentation



slide-1
SLIDE 1

1

slide-2
SLIDE 2

last time

page table review
    virtual to physical translation
    two-level page tables

how xv6 manages page tables
    walkpgdir, mappages, etc.
    x86-32 page table format

xv6 memory layout
    high memory for the kernel, mapping everything
    virtual-to-physical/physical-to-virtual utility functions
    sbrk to determine end of user memory

page fault handling

slide-3
SLIDE 3

xv6 page faults (now)

fault from accessing page table entry marked ‘not-present’
xv6: prints an error and kills process:

    *((int*) 0x800444) = 1;
    ...
    /* in trap.c: */
    cprintf("pid %d %s: trap %d err %d on cpu %d "
            "eip 0x%x addr 0x%x--kill proc\n",
            myproc()->pid, myproc()->name, tf->trapno,
            tf->err, cpuid(), tf->eip, rcr2());
    myproc()->killed = 1;

output:

    pid 4 processname: trap 14 err 6 on cpu 0 eip 0x1a addr 0x800444--kill proc

14 = T_PGFLT
special register CR2 contains faulting address
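The `err 6` in the sample output can be decoded by hand. The sketch below assumes the standard x86 page-fault error code layout (bit 0: the page was present, bit 1: the access was a write, bit 2: the access came from user mode). The struct and function names are made up for illustration and are not part of xv6.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Low three bits of the x86 page-fault error code (tf->err in xv6). */
#define PF_PRESENT 0x1u /* 1: protection violation, 0: page not present */
#define PF_WRITE   0x2u /* 1: write access, 0: read access */
#define PF_USER    0x4u /* 1: fault happened in user mode */

/* Hypothetical helper type, for illustration only. */
struct pf_info {
    bool present;
    bool write;
    bool user;
};

/* Split an error code into its three low flag bits. */
static struct pf_info decode_pf_error(uint32_t err)
{
    struct pf_info info = {
        .present = (err & PF_PRESENT) != 0,
        .write   = (err & PF_WRITE) != 0,
        .user    = (err & PF_USER) != 0,
    };
    return info;
}
```

For the output above, `decode_pf_error(6)` reports a user-mode write to a not-present page, which matches the `*((int*) 0x800444) = 1` store.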

slide-6
SLIDE 6

xv6: if one handled page faults

returning from page fault handler without killing process
…retries the failing instruction
can use to update the page table “just in time”

    if (tf->trapno == T_PGFLT) {
        void *address = (void *) rcr2();
        if (is_address_okay(myproc(), address)) {
            setup_page_table_entry_for(myproc(), address);
            // return from fault, retry access
        } else {
            // actual segfault, kill process
            cprintf("...");
            myproc()->killed = 1;
        }
    }

check process control block to see if access okay
if so, set up the page table so it works next time
i.e. immediately after returning from fault
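The retry behaviour can be imitated in user space. The sketch below is a toy model, not xv6 code: a tiny "page table" of pointers, a `load_byte` that stands in for the hardware and "faults" when an entry is missing, and a handler that either fixes the entry or reports a real segfault. All names (`toy_proc`, `load_byte`, `handle_fault`) and the sizes are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define PAGE_SIZE 4096u
#define NPAGES    8u

struct toy_proc {
    size_t sz;                    /* valid addresses are 0 .. sz-1 */
    unsigned char *page[NPAGES];  /* NULL means "not present" */
    int faults;                   /* number of faults handled so far */
};

/* Fault handler: returns true if the access may be retried. */
static bool handle_fault(struct toy_proc *p, size_t addr)
{
    if (addr >= p->sz || addr / PAGE_SIZE >= NPAGES)
        return false;                      /* actual segfault */
    unsigned char *mem = calloc(PAGE_SIZE, 1);
    if (mem == NULL)
        return false;                      /* out of memory */
    p->page[addr / PAGE_SIZE] = mem;       /* "set up the page table entry" */
    p->faults++;
    return true;
}

/* Stand-in for a load instruction.
 * Returns false if the process would be killed. */
static bool load_byte(struct toy_proc *p, size_t addr, unsigned char *out)
{
    for (;;) {
        size_t vpn = addr / PAGE_SIZE;
        if (vpn < NPAGES && p->page[vpn] != NULL) {
            *out = p->page[vpn][addr % PAGE_SIZE];
            return true;                   /* access succeeded */
        }
        if (!handle_fault(p, addr))        /* "page fault" */
            return false;
        /* handler returned normally: loop around and retry the access */
    }
}
```

The first load from a page takes one trip through the handler; later loads from the same page succeed without faulting, which is the "works next time" property described on the slide.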

slide-9
SLIDE 9

extra data structures needed

OSs can do all sorts of tricks with page tables
…but more bookkeeping is required

tracking what processes think they have in memory
    since page table won’t tell the whole story
    OS will change page table

tracking how physical pages are used in page tables
    multiple processes might want same data = same page

slide-10
SLIDE 10

space on demand

[Figure: program memory layout, from top: Used by OS, Stack, Heap / other dynamic, Writable data, Code + Constants. The stack region is annotated "used stack space (12 KB)" and "wasted space? (huge??)".]

OS would like to allocate space only if needed

slide-15
SLIDE 15

allocating space on demand

    ... // requires more stack space
    A: pushq %rbx
    B: movq 8(%rcx), %rbx
    C: addq %rbx, %rax
    ...

%rsp = 0x7FFFC000

page table (the entry for VPN 0x7FFFB is invalid until the fault is handled):

    VPN       valid?   physical page
    …         …        …
    0x7FFFB   1        0x200D8
    0x7FFFC   1        0x200DF
    0x7FFFD   1        0x12340
    0x7FFFE   1        0x12347
    0x7FFFF   1        0x12345
    …         …        …

pushq triggers exception
hardware says “accessing address 0x7FFFBFF8”
OS looks up what should be there: “stack”
page fault! in exception handler, OS allocates more stack space
OS updates the page table
then returns to retry the instruction (the pushq is restarted)
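The numbers in this example can be checked with a few lines of arithmetic. This is a sketch assuming 4 KB pages (12 offset bits), which is what the VPNs in the table imply; the helper names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12  /* 4 KB pages: the low 12 bits are the page offset */

/* Virtual page number of an address. */
static uint64_t vpn_of(uint64_t addr)
{
    return addr >> PAGE_SHIFT;
}

/* Address written by `pushq`: it stores 8 bytes just below %rsp. */
static uint64_t pushq_target(uint64_t rsp)
{
    return rsp - 8;
}
```

With `%rsp = 0x7FFFC000`, `pushq_target` gives `0x7FFFBFF8`, whose VPN is `0x7FFFB`, the entry the handler has to fill in. `%rsp` itself lies in page `0x7FFFC`, which was already mapped.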

slide-16
SLIDE 16

xv6: adding space on demand

    struct proc {
      uint sz;   // Size of process memory (bytes)
      ...
    };

adding allocate on demand logic:

on page fault: if address ≥ sz
    kill process (out of bounds)

on page fault: if address < sz
    find virtual page number of address
    allocate page of memory, add to page table
    return from interrupt
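The decision above fits in one small function. The sketch below only classifies the fault and computes the page to map; it is not xv6 code. `PAGE_ROUND_DOWN` is a local macro written in the spirit of xv6's `PGROUNDDOWN`, and `classify_fault` and the enum are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define PGSIZE 4096u
/* Round an address down to the start of its page. */
#define PAGE_ROUND_DOWN(a) ((a) & ~(PGSIZE - 1u))

enum fault_action { FAULT_KILL, FAULT_ALLOCATE };

/* Decide what to do with a fault at `addr` in a process of size `sz`.
 * On FAULT_ALLOCATE, *page_start is the page-aligned address to map. */
static enum fault_action classify_fault(uint32_t addr, uint32_t sz,
                                        uint32_t *page_start)
{
    if (addr >= sz)
        return FAULT_KILL;               /* out of bounds */
    *page_start = PAGE_ROUND_DOWN(addr); /* virtual page containing addr */
    /* A real handler would now allocate a zeroed physical page, add a
     * mapping for *page_start to the page table, and return from the
     * trap so the access is retried. */
    return FAULT_ALLOCATE;
}
```

For example, a fault at `0x5123` in a process with `sz = 0x8000` yields `FAULT_ALLOCATE` with `page_start == 0x5000`, while a fault at `0x8000` yields `FAULT_KILL`.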

slide-17
SLIDE 17

versus more complicated OSes

range of valid addresses is not just 0 to maximum
need some more complicated data structure to represent it
will get to that later

slide-18
SLIDE 18

fast copies

recall: fork() creates a copy of an entire program!
(usually, the copy then calls execve, which replaces itself with another program)
how isn’t this really slow?

slide-19
SLIDE 19

do we really need a complete copy?

[Figure: two memory layouts side by side, "bash" and "new copy of bash", each with: Used by OS, Stack, Heap / other dynamic, Writable data, Code + Constants. Annotations: "shared as read-only" and "can’t be shared?".]

slide-22
SLIDE 22

trick for extra sharing

sharing writeable data is fine, until either process modifies the copy
can we detect modifications?
trick: tell CPU (via page table) shared part is read-only
processor will trigger a fault when it’s written

slide-23
SLIDE 23

copy-on-write and page tables

page table before the copy:

    VPN       valid?  write?  physical page
    …         …       …       …
    0x00601   1       1       0x12345
    0x00602   1       1       0x12347
    0x00603   1       1       0x12340
    0x00604   1       1       0x200DF
    0x00605   1       1       0x200AF
    …         …       …       …

page table of each process after the copy (both tables are identical):

    VPN       valid?  write?  physical page
    …         …       …       …
    0x00601   1       0       0x12345
    0x00602   1       0       0x12347
    0x00603   1       0       0x12340
    0x00604   1       0       0x200DF
    0x00605   1       0       0x200AF
    …         …       …       …

copy operation actually duplicates page table
both processes share all physical pages
but marks pages in both copies as read-only
when either process tries to write read-only page
triggers a fault: OS actually copies the page
after allocating a copy, OS reruns the write instruction

slide-26
SLIDE 26

copy-on-write and page tables

after the second process writes to virtual page 0x00605:

first process (unchanged):

    VPN       valid?  write?  physical page
    …         …       …       …
    0x00601   1       0       0x12345
    0x00602   1       0       0x12347
    0x00603   1       0       0x12340
    0x00604   1       0       0x200DF
    0x00605   1       0       0x200AF
    …         …       …       …

second process (0x00605 now points to a private, writable copy):

    VPN       valid?  write?  physical page
    …         …       …       …
    0x00601   1       0       0x12345
    0x00602   1       0       0x12347
    0x00603   1       0       0x12340
    0x00604   1       0       0x200DF
    0x00605   1       1       0x300FD
    …         …       …       …

when either process tries to write read-only page
triggers a fault: OS actually copies the page
after allocating a copy, OS reruns the write instruction

slide-27
SLIDE 27

copy-on-write cases

trying to write forbidden page (e.g. kernel memory)
    kill program instead of making it writable

trying to write read-only page and…

only one page table entry refers to it
    make it writeable
    return from fault

multiple processes’ page table entries refer to it
    copy the page
    replace read-only page table entry to point to copy
    return from fault
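The three cases above map directly onto a reference count per physical page. The following is a user-space toy model under stated assumptions, not kernel code: `phys_page`, `pte`, and `cow_write_fault` are hypothetical names, and "physical pages" are ordinary heap allocations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 4096u

/* Toy physical page with a count of page table entries pointing at it. */
struct phys_page {
    int refs;
    unsigned char data[PAGE_SIZE];
};

/* Toy page table entry. */
struct pte {
    struct phys_page *page;
    bool writable;
    bool forbidden;   /* e.g. kernel memory: never writable by the program */
};

enum cow_result { COW_KILL, COW_MADE_WRITABLE, COW_COPIED };

/* Handle a write fault on a read-only entry. */
static enum cow_result cow_write_fault(struct pte *e)
{
    if (e->forbidden)
        return COW_KILL;
    if (e->page->refs == 1) {          /* only this entry refers to the page */
        e->writable = true;
        return COW_MADE_WRITABLE;
    }
    struct phys_page *copy = malloc(sizeof *copy);
    if (copy == NULL)
        return COW_KILL;               /* out of memory: simplest policy */
    memcpy(copy->data, e->page->data, PAGE_SIZE);
    copy->refs = 1;
    e->page->refs--;                   /* this entry stops sharing the old page */
    e->page = copy;
    e->writable = true;
    return COW_COPIED;
}
```

If two entries share a page, the first writer gets `COW_COPIED` and a private copy; when the other process later writes, the count has dropped to 1, so it gets `COW_MADE_WRITABLE` and keeps the original page without copying.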

slide-28
SLIDE 28

mmap

Linux/Unix has a function to “map” a file to memory

    int file = open("somefile.dat", O_RDWR);
    // data is region of memory that represents file
    char *data = mmap(..., file, 0);
    // read byte 6 from somefile.dat
    char seventh_char = data[6];
    // modifies byte 100 of somefile.dat
    data[100] = 'x';
    // can continue to use 'data' like an array
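The slide elides the first arguments to `mmap`. One plausible way to fill them in, shown as a sketch rather than as the slide's intended call, is to map the whole file read/write and shared. `map_whole_file` is a hypothetical helper; the POSIX calls it uses (`open`, `fstat`, `mmap`, `close`) are standard.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map the whole file at `path` read/write and shared.
 * On success returns the mapping and stores its length in *len;
 * on failure (including an empty file) returns NULL. */
static char *map_whole_file(const char *path, size_t *len)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return NULL;
    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }
    char *data = mmap(NULL, (size_t) st.st_size, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);
    close(fd);                 /* the mapping stays valid after close */
    if (data == MAP_FAILED)
        return NULL;
    *len = (size_t) st.st_size;
    return data;
}
```

After `char *data = map_whole_file("somefile.dat", &len);` the accesses from the slide (`data[6]`, `data[100] = 'x'`) work as long as the file is at least 101 bytes long; `munmap(data, len)` releases the mapping.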

slide-29
SLIDE 29

mmap options (1)

    #include <sys/mman.h>
    void *mmap(void *addr, size_t length, int prot, int flags,
               int fd, off_t offset);

length bytes from open file fd starting at byte offset

protection flags prot, bitwise or together 1 or more of:
    PROT_READ
    PROT_WRITE
    PROT_EXEC
    PROT_NONE (for forcing segfaults)

slide-32
SLIDE 32

mmap options (2)

    #include <sys/mman.h>
    void *mmap(void *addr, size_t length, int prot, int flags,
               int fd, off_t offset);

flags, choose one of:
    MAP_SHARED: changing memory changes file and vice-versa
    MAP_PRIVATE: make a copy of data in file (using copy-on-write)

…along with additional flags:
    MAP_ANONYMOUS (not POSIX): ignore fd, just allocate space
    … (and more not shown)

addr, suggestion about where to put mapping (may be ignored)
    can pass NULL: “choose for me”
    address chosen will be returned
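As a small illustration of these options together, the sketch below uses `MAP_PRIVATE | MAP_ANONYMOUS` with `addr = NULL` to get plain zero-filled memory with no file behind it. `alloc_anonymous` is a hypothetical wrapper name; the `_DEFAULT_SOURCE` define is an assumption about glibc, where it makes sure `MAP_ANONYMOUS` is declared.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Ask the kernel for `length` bytes of zero-filled, private memory.
 * Returns NULL on failure. */
static void *alloc_anonymous(size_t length)
{
    void *p = mmap(NULL,                    /* addr: let the kernel choose */
                   length,
                   PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS,
                   -1,                      /* fd is ignored; -1 is the portable choice */
                   0);
    return p == MAP_FAILED ? NULL : p;
}
```

The returned address is whatever the kernel chose; memory obtained this way is released with `munmap(p, length)` rather than `free`.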

slide-34
SLIDE 34

Linux maps

    $ cat /proc/self/maps
    00400000-0040b000 r-xp 00000000 08:01 48328831 /bin/cat
    0060a000-0060b000 r--p 0000a000 08:01 48328831 /bin/cat
    0060b000-0060c000 rw-p 0000b000 08:01 48328831 /bin/cat
    01974000-01995000 rw-p 00000000 00:00 0 [heap]
    7f60c718b000-7f60c7490000 r--p 00000000 08:01 77483660 /usr/lib/locale/locale-archive
    7f60c7490000-7f60c764e000 r-xp 00000000 08:01 96659129 /lib/x86_64-linux-gnu/libc-2.19.so
    7f60c764e000-7f60c784e000 ---p 001be000 08:01 96659129 /lib/x86_64-linux-gnu/libc-2.19.so
    7f60c784e000-7f60c7852000 r--p 001be000 08:01 96659129 /lib/x86_64-linux-gnu/libc-2.19.so
    7f60c7852000-7f60c7854000 rw-p 001c2000 08:01 96659129 /lib/x86_64-linux-gnu/libc-2.19.so
    7f60c7854000-7f60c7859000 rw-p 00000000 00:00 0
    7f60c7859000-7f60c787c000 r-xp 00000000 08:01 96659109 /lib/x86_64-linux-gnu/ld-2.19.so
    7f60c7a39000-7f60c7a3b000 rw-p 00000000 00:00 0
    7f60c7a7a000-7f60c7a7b000 rw-p 00000000 00:00 0
    7f60c7a7b000-7f60c7a7c000 r--p 00022000 08:01 96659109 /lib/x86_64-linux-gnu/ld-2.19.so
    7f60c7a7c000-7f60c7a7d000 rw-p 00023000 08:01 96659109 /lib/x86_64-linux-gnu/ld-2.19.so
    7f60c7a7d000-7f60c7a7e000 rw-p 00000000 00:00 0
    7ffc5d2b2000-7ffc5d2d3000 rw-p 00000000 00:00 0 [stack]
    7ffc5d3b0000-7ffc5d3b3000 r--p 00000000 00:00 0 [vvar]
    7ffc5d3b3000-7ffc5d3b5000 r-xp 00000000 00:00 0 [vdso]
    ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]

first line (r-xp, /bin/cat):
    at virtual addresses 0x400000–0x40b000
    read, not write, execute, private
    private = copy-on-write (if writeable)
    starting at offset 0 of the file /bin/cat
    device major number 8, device minor number 1, inode 48328831
    more on what this means when we talk about filesystems

as if:

    int fd = open("/bin/cat", O_RDONLY);
    mmap((void *) 0x400000, 0xb000, PROT_READ | PROT_EXEC, MAP_PRIVATE, fd, 0x0);

[heap] line:
    no corresponding file
    just read/write memory

third line (rw-p, /bin/cat):
    read/write, copy-on-write (private) mapping

as if:

    int fd = open("/bin/cat", O_RDONLY);
    mmap((void *) 0x60b000, 0x1000, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0xb000);
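Each line of the listing above has the same fields: address range, permissions, file offset, device major:minor, inode, and an optional path. A minimal parsing sketch follows; it assumes a 64-bit `unsigned long` (needed for the `7f60…` and `ffff…` addresses), and `map_entry` and `parse_maps_line` are illustrative names, not a Linux API.

```c
#include <assert.h>
#include <stdio.h>

/* One parsed line of /proc/PID/maps. */
struct map_entry {
    unsigned long start, end;   /* virtual address range [start, end) */
    char perms[5];              /* e.g. "r-xp" */
    unsigned long offset;       /* offset into the mapped file */
    unsigned int dev_major, dev_minor;
    unsigned long inode;
    char path[256];             /* empty for anonymous mappings */
};

/* Returns 1 on success, 0 if the line does not look like a maps entry. */
static int parse_maps_line(const char *line, struct map_entry *e)
{
    e->path[0] = '\0';
    int n = sscanf(line, "%lx-%lx %4s %lx %x:%x %lu %255[^\n]",
                   &e->start, &e->end, e->perms, &e->offset,
                   &e->dev_major, &e->dev_minor, &e->inode, e->path);
    return n >= 7;   /* the path field is optional */
}
```

Feeding it the first line of the listing gives `start == 0x400000`, `end == 0x40b000`, `perms == "r-xp"`, offset 0, device 8:1, and inode 48328831; `end - start` is the `0xb000` length of the mapping.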

slide-40
SLIDE 40

mapped pages (read-only)

[Figure: virtual pages mapped to file; page table (part); file data, cached in memory; file data on disk/SSD]

initially, all page table entries invalid? (could also prefill entries…)

read from second page? page fault
    PF handler: find cached page
    update page table, retry

read from first page? page fault
    PF handler: no cached page, so first read in page
    PF handler: after reading in page, now point to page

slide-47
SLIDE 47

shared mmap

    int fd = open("/tmp/somefile.dat", O_RDWR);
    mmap(0, 64 * 1024, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

from /proc/PID/maps for this program:

    7f93ad877000-7f93ad887000 rw-s 00000000 08:01 1839758 /tmp/somefile.dat

slide-48
SLIDE 48

mapped pages (read/write, shared)

[Figure: virtual pages mapped to file; page table (part); file data, cached in memory; file data on disk/SSD]

write to page? update cached file data
    data on disk out of date
eventually free memory… write update to disk

slide-49
SLIDE 49

mapped pages (read/write, shared)

virtual pages mapped to fjle page table (part) fjle data, cached in memory fjle data on disk/SSD write to page? update cached fjle data data on disk out of date eventually free memory… write update to disk

23

slide-50
SLIDE 50

mapped pages (read/write, shared)

virtual pages mapped to fjle page table (part) fjle data, cached in memory fjle data on disk/SSD write to page? update cached fjle data data on disk out of date eventually free memory… write update to disk

23

slide-51
SLIDE 51

mapped pages (read/write, shared)

virtual pages mapped to fjle page table (part) fjle data, cached in memory fjle data on disk/SSD write to page? update cached fjle data data on disk out of date eventually free memory… write update to disk

23

slide-52
SLIDE 52

Linux maps

$ cat /proc/self/maps 00400000−0040b000 r−xp 00000000 08:01 48328831 / bin / cat 0060a000−0060b000 r− −p 0000a000 08:01 48328831 /bin/cat 0060b000−0060c000 rw−p 0000b000 08:01 48328831 / bin / cat 01974000−01995000 rw−p 00000000 00:00 0 [ heap ] 7f60c718b000−7f60c7490000 r− −p 00000000 08:01 77483660 /usr/lib/locale/locale−archive 7f60c7490000−7f60c764e000 r−xp 00000000 08:01 96659129 /lib/x86_64−linux−gnu/libc−2.19.so 7f60c764e000−7f60c784e000 − − −p 001be000 08:01 96659129 /lib/x86_64−linux−gnu/libc−2.19.so 7f60c784e000−7f60c7852000 r− −p 001be000 08:01 96659129 /lib/x86_64−linux−gnu/libc−2.19.so 7f60c7852000−7f60c7854000 rw−p 001c2000 08:01 96659129 /lib/x86_64−linux−gnu/libc−2.19.so 7f60c7854000−7f60c7859000 rw−p 00000000 00:00 0 7f60c7859000−7f60c787c000 r−xp 00000000 08:01 96659109 /lib/x86_64−linux−gnu/ld−2.19.so 7f60c7a39000−7f60c7a3b000 rw−p 00000000 00:00 0 7f60c7a7a000−7f60c7a7b000 rw−p 00000000 00:00 0 7f60c7a7b000−7f60c7a7c000 r− −p 00022000 08:01 96659109 /lib/x86_64−linux−gnu/ld−2.19.so 7f60c7a7c000−7f60c7a7d000 rw−p 00023000 08:01 96659109 /lib/x86_64−linux−gnu/ld−2.19.so 7f60c7a7d000−7f60c7a7e000 rw−p 00000000 00:00 0 7ffc5d2b2000−7ffc5d2d3000 rw−p 00000000 00:00 0 [ stack ] 7ffc5d3b0000−7ffc5d3b3000 r− −p 00000000 00:00 0 [ vvar ] 7ffc5d3b3000−7ffc5d3b5000 r−xp 00000000 00:00 0 [ vdso ] ffffffffff600000−ffffffffff601000 r−xp 00000000 00:00 0 [ vsyscall ]

first line of the output:
  at virtual addresses 0x400000–0x40b000
  read, not write, execute, private
    private = copy-on-write (if writeable)
  starting at offset 0 of the file /bin/cat
  device major number 8, device minor number 1, inode 48328831
    more on what this means when we talk about filesystems

as if:

int fd = open("/bin/cat", O_RDONLY);
mmap((void *) 0x400000, 0xb000, PROT_READ | PROT_EXEC, MAP_PRIVATE, fd, 0x0);

third line of the output: read/write, copy-on-write (private) mapping

as if:

int fd = open("/bin/cat", O_RDONLY);
mmap((void *) 0x60b000, 0x1000, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0xb000);

[heap]: no corresponding file, just read/write memory

24
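The fields described above can be pulled out of one maps line with `sscanf`. This is a sketch, not a robust parser: the struct and function names are invented here, `unsigned long` is assumed to be 64 bits, and a path containing spaces would be cut at the first space.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One parsed line of /proc/<pid>/maps. Names are illustrative only. */
struct map_entry {
    unsigned long start, end;          /* virtual address range [start, end) */
    char perms[5];                     /* e.g. "r-xp" */
    unsigned long offset;              /* offset into the backing file */
    unsigned int dev_major, dev_minor; /* device numbers, printed in hex */
    unsigned long inode;               /* 0 when there is no backing file */
    char path[256];                    /* "", a file name, or e.g. "[heap]" */
};

/* Returns 1 on success, 0 if the line does not have the expected layout. */
int parse_maps_line(const char *line, struct map_entry *e) {
    assert(line != NULL && e != NULL);
    e->path[0] = '\0';
    int n = sscanf(line, "%lx-%lx %4s %lx %x:%x %lu %255s",
                   &e->start, &e->end, e->perms, &e->offset,
                   &e->dev_major, &e->dev_minor, &e->inode, e->path);
    return n >= 7;                     /* the path field is optional */
}

/* 'p' in the fourth permission column means private (copy-on-write). */
int is_private(const struct map_entry *e) {
    return e->perms[3] == 'p';
}
```

For the first line of the listing, this yields start `0x400000`, end `0x40b000`, perms `r-xp`, offset 0, device 8:1, inode 48328831, and path `/bin/cat`.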

slide-53
SLIDE 53

mapped pages (copy-on-write)

[diagram: virtual pages mapped to file; page table (part); file data, cached in memory; file data on disk/SSD; copies of file data, modified]

reads like before
write to second page? protection fault
  page table entry says read-only
  fault handler: make copy, update page table

25
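The effect of copy-on-write can be observed from user space. A sketch assuming Linux/POSIX `mmap` with `MAP_PRIVATE`; the function name and scratch-file template are invented for illustration. The write lands in a private copy of the page, so the file itself keeps its old contents.

```c
#define _DEFAULT_SOURCE   /* glibc: expose mkstemp, pread, pwrite */
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical demo: returns 1 if a store through a MAP_PRIVATE mapping
 * is visible through the mapping but not in the underlying file,
 * and -1 on a setup error. */
int private_write_demo(void) {
    char path[] = "/tmp/cow_demo_XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0) return -1;
    unlink(path);
    if (ftruncate(fd, 4096) != 0 || pwrite(fd, "A", 1, 0) != 1) {
        close(fd);
        return -1;
    }

    char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) { close(fd); return -1; }

    char before = p[0];   /* read: served from the cached file data */
    p[0] = 'B';           /* write: the OS gives this process its own copy */
    char after = p[0];

    char in_file = 0;
    ssize_t n = pread(fd, &in_file, 1, 0);   /* what the file holds now */
    munmap(p, 4096);
    close(fd);
    assert(n == 1);
    return before == 'A' && after == 'B' && in_file == 'A';
}
```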

slide-58
SLIDE 58

mapped pages (no backing fjle)

[diagram: virtual pages w/o backing file; page table (part); data in memory; swapped out data (if any)]

access new page: page fault handler allocates on demand
need more memory: save page to disk temporarily

27
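Allocation on demand can be modelled with a toy simulation. This is not kernel code: the "page table" is an array of pointers, a `NULL` entry stands for a not-present page, and all names and sizes are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NPAGES 8
#define PAGE_SIZE 4096

/* Toy page table for one region without a backing file.
 * frame[i] == NULL means page i is not present yet. */
struct region {
    unsigned char *frame[NPAGES];
    int faults;                      /* simulated page faults handled */
};

/* Simulated fault handler: allocate a zero-filled page on demand. */
static void handle_fault(struct region *r, size_t page) {
    r->frame[page] = calloc(1, PAGE_SIZE);
    assert(r->frame[page] != NULL);
    r->faults++;
}

/* Simulated memory access: returns a pointer to the byte at addr,
 * faulting the page in first if it is not present. */
unsigned char *access_byte(struct region *r, size_t addr) {
    size_t page = addr / PAGE_SIZE;
    assert(page < NPAGES);
    if (r->frame[page] == NULL)
        handle_fault(r, page);
    return &r->frame[page][addr % PAGE_SIZE];
}
```

Only the first access to each page goes through `handle_fault`; later accesses to the same page find it present, which mirrors the hardware using the page table without involving the OS.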

slide-65
SLIDE 65

swapping with copy-on-write

[diagram: virtual pages mapped to file; page table (part); file data, cached in memory; file data on disk/SSD; copies of file data, modified; ‘swapped out’ modified data]

free up space by removing cached copies of file
need to free up more space? can move copied data to disk (“swapped out” modified data)

29
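The two ways of freeing space differ in cost, which a small sketch can make explicit. The enum and function names are invented, and the "drop cached file data first" preference is only the order suggested by the slide; a real OS also weighs how recently each page was used.

```c
#include <assert.h>
#include <stddef.h>

enum page_kind {
    CACHED_FILE_DATA,       /* unmodified file data; still available in the file */
    MODIFIED_PRIVATE_COPY   /* copy-on-write copy; exists nowhere else */
};

enum reclaim_cost {
    DROP_ONLY,              /* forget the page, reread from the file later */
    WRITE_TO_SWAP_FIRST     /* must be saved to disk before the page is reused */
};

enum reclaim_cost cost_to_reclaim(enum page_kind kind) {
    return kind == CACHED_FILE_DATA ? DROP_ONLY : WRITE_TO_SWAP_FIRST;
}

/* Returns the index of the page to reclaim, preferring pages that can
 * simply be dropped; returns -1 when there are no pages. */
int pick_page_to_reclaim(const enum page_kind *pages, size_t n) {
    assert(pages != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (cost_to_reclaim(pages[i]) == DROP_ONLY)
            return (int)i;
    return n > 0 ? 0 : -1;   /* nothing droppable: swap out the first page */
}
```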

slide-70
SLIDE 70

swapping

historical major use of virtual memory is supporting “swapping”
  using disk (or SSD, …) as the next level of the memory hierarchy

process is allocated space on disk/SSD
memory is a cache for disk/SSD
  only need to keep ‘currently active’ pages in physical memory

swapping ≈ mmap with “default” files to use

30

slide-71
SLIDE 71

HDDs/SSDs are slow

HDD reads and writes: milliseconds to tens of milliseconds

minimum size: 512 bytes
writing tens of kilobytes basically as fast as writing 512 bytes

SSD reads and writes: hundreds of microseconds

designed for writes/reads of kilobytes (not much smaller)

page fault handler is going to switch to another program

31
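A rough calculation shows why switching to another program is worthwhile. The 3 GHz clock is an assumption, not a figure from the slides; the latencies used in the usage note (10 ms for an HDD, 100 µs for an SSD) fall within the ranges above.

```c
#include <assert.h>
#include <stdint.h>

/* Clock cycles that pass while waiting latency_us microseconds on a CPU
 * running at clock_mhz MHz: (1e-6 s) * (1e6 cycles/s) cancels out. */
uint64_t cycles_during_wait(uint64_t latency_us, uint64_t clock_mhz) {
    /* guard against overflow of the 64-bit product */
    assert(clock_mhz == 0 || latency_us <= UINT64_MAX / clock_mhz);
    return latency_us * clock_mhz;
}
```

With the assumed 3000 MHz clock, `cycles_during_wait(10000, 3000)` is 30,000,000 cycles for the HDD case and `cycles_during_wait(100, 3000)` is 300,000 cycles for the SSD case: far more than the cost of a context switch in either case.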

slide-74
SLIDE 74

the page cache

memory is a cache for disk
  files, program memory has a place on disk

running low on memory? always have room on disk
  assumption: disk space approximately infinite

physical memory pages: disk ‘temporarily’ kept in faster storage
  possibly being used by one or more processes?
  possibly part of a file on disk?
  possibly both

goal: manage this cache intelligently

32

slide-76
SLIDE 76

memory as a cache for disk

“cache block” ≈ physical page

fully associative
  any virtual address/file part can be stored in any physical page

replacement is managed by the OS

normal cache hits happen without OS
  common case that needs to be fast

33

slide-77
SLIDE 77

page cache components [text]

mapping: virtual address or file+offset → physical page
  handle cache hits

find backing location based on virtual address/file+offset
  handle cache misses

track information about each physical page
  handle page allocation
  handle cache eviction

34
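The file+offset → physical page mapping could be kept in a hash table. The sketch below is a toy with a fixed size, linear probing, and no removal; every name in it is invented, and `file_id` merely stands in for whatever identifies a file (such as an inode number).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_SLOTS 64                 /* toy size */

struct cache_entry {
    int used;
    uint32_t file_id;                  /* stand-in for an inode number */
    uint64_t page_index;               /* file offset / page size */
    uint32_t phys_page;                /* physical page holding the data */
};

struct page_cache { struct cache_entry slot[CACHE_SLOTS]; };

static size_t hash_key(uint32_t file_id, uint64_t page_index) {
    return (size_t)((file_id * 31u + page_index) % CACHE_SLOTS);
}

/* Cache hit: returns 1 and stores the physical page in *phys.
 * Cache miss: returns 0 (the caller would then find the disk location). */
int cache_lookup(const struct page_cache *c, uint32_t file_id,
                 uint64_t page_index, uint32_t *phys) {
    assert(c != NULL && phys != NULL);
    size_t h = hash_key(file_id, page_index);
    for (size_t i = 0; i < CACHE_SLOTS; i++) {
        const struct cache_entry *e = &c->slot[(h + i) % CACHE_SLOTS];
        if (!e->used) return 0;
        if (e->file_id == file_id && e->page_index == page_index) {
            *phys = e->phys_page;
            return 1;
        }
    }
    return 0;
}

/* Records that (file_id, page_index) is cached in phys_page.
 * Returns 1 on success, 0 if the table is full. */
int cache_insert(struct page_cache *c, uint32_t file_id,
                 uint64_t page_index, uint32_t phys_page) {
    assert(c != NULL);
    size_t h = hash_key(file_id, page_index);
    for (size_t i = 0; i < CACHE_SLOTS; i++) {
        struct cache_entry *e = &c->slot[(h + i) % CACHE_SLOTS];
        if (!e->used || (e->file_id == file_id && e->page_index == page_index)) {
            e->used = 1;
            e->file_id = file_id;
            e->page_index = page_index;
            e->phys_page = phys_page;
            return 1;
        }
    }
    return 0;
}
```

For accesses by virtual address, the page table itself plays the role of this lookup and the CPU walks it on a hit; a table like this one is only needed for the `read()`/`write()` path.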

slide-78
SLIDE 78

page cache components

[diagram: items linked by lookup structures]

virtual address (used by program)
file + offset (for read()/write())
physical page (if cached)
disk location
page usage (recently used? etc.)

link labels: OS data structure; page table; OS data structure; OS data structure?; OS data structure

cache hit
  OS lookup for read()/write()
  CPU lookup in page table

cache miss: OS looks up location on disk

allocating a physical page
  choose page that’s not being used much
  might need to evict used page
    requires removing pointers to it
    need reverse mappings to find pointers to remove

36
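Eviction with reverse mappings might look like the sketch below. It is a toy model: the names, the fixed-size reverse-mapping array, and the single `recently_used` flag are assumptions made for illustration, and a real OS would also flush stale TLB entries and write modified data back before reusing the page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PTE_PRESENT 0x1u
#define MAX_RMAP 4

/* Per-physical-page bookkeeping. */
struct phys_page_info {
    uint32_t *rmap[MAX_RMAP];   /* reverse mappings: PTEs pointing at this page */
    int nrmap;
    int recently_used;
};

/* Point a page table entry at the page and remember where that entry is. */
void add_mapping(struct phys_page_info *pg, uint32_t *pte, uint32_t phys_addr) {
    assert(pg->nrmap < MAX_RMAP);
    *pte = phys_addr | PTE_PRESENT;
    pg->rmap[pg->nrmap++] = pte;
}

/* Choose a page that is not being used much; returns -1 if there is none. */
int choose_victim(const struct phys_page_info *pages, int n) {
    for (int i = 0; i < n; i++)
        if (!pages[i].recently_used)
            return i;
    return -1;
}

/* Remove every pointer to the page by walking its reverse mappings.
 * Returns the number of page table entries that were cleared. */
int evict_page(struct phys_page_info *pg) {
    int cleared = pg->nrmap;
    for (int i = 0; i < pg->nrmap; i++)
        *pg->rmap[i] &= ~PTE_PRESENT;   /* next access will page-fault */
    pg->nrmap = 0;
    return cleared;
}
```

Without the `rmap` array, finding the entries to clear would mean scanning every page table of every process.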