Virtual Memory 2 1 Changelog Changes made in this version not seen - - PowerPoint PPT Presentation

SLIDE 1

Virtual Memory 2

SLIDE 2

Changelog

Changes made in this version not seen in first lecture:

23 October: mapped pages (no backing file): fix end of animation to have page on disk
23 October: separate out discussion of readahead from other reasons why hit rate is not performance

SLIDE 3

exam notes

exam graded; some things I regret on the semaphore and pipe questions:

semaphore queue: different shared buffer than the text specified
piping: should have used something other than 1 to initialize; no newline in printf

regrade requests available…

SLIDE 4

last time (1)

page tables

mapping from program (‘visible’) to physical (‘real’) addresses
memory divided into fixed-sized chunks called pages
table: for each program page, what’s its physical page?

page table entries: physical page and permission bits

accessible in user mode or not? writeable or not?

two-level page tables

too many virtual pages to store the entire list
table of tables
first bits of the address: index into the first table (= location of the second table)
next bits: index into the second table
last bits: offset within the actual physical page
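A minimal sketch of that split, assuming the 32-bit x86 layout xv6 uses (4096-byte pages: 10 bits of first-level index, 10 bits of second-level index, 12 bits of offset). The function names are made up for illustration; xv6 has comparable macros (PDX, PTX).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout (32-bit x86, 4 KiB pages):
   | 10 bits first-level index | 10 bits second-level index | 12 bits offset | */
static unsigned first_index(uint32_t va)  { return va >> 22; }
static unsigned second_index(uint32_t va) { return (va >> 12) & 0x3FF; }
static unsigned page_offset(uint32_t va)  { return va & 0xFFF; }

/* Rebuild an address from its three parts (inverse of the split). */
static uint32_t join_address(unsigned first, unsigned second, unsigned offset) {
    assert(first < 1024 && second < 1024 && offset < 4096);
    return ((uint32_t)first << 22) | ((uint32_t)second << 12) | offset;
}
```

For example, 0x00403ABC splits into first-level index 1, second-level index 3, and offset 0xABC.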

SLIDE 5

last time (2)

xv6 kernel memory layout

kernel-only space at the top (PTEs marked ‘fault if in user mode’)
mapping for kernel space: physical page 0, 1, 2, …, same in every process

xv6 page manipulation utility functions

finding the location of the last-level PTE: array lookups for each level of the page table

allocating new pages: actually allocate a physical page; if needed, create a second-level page table; set the page table entry to point to the physical page

exec’ing processes: allocate pages, copy from the executable
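The "find the last-level PTE, creating the second-level table if needed" step could be sketched like this. walk, pagedir_t and pte_t are hypothetical names, loosely modeled on xv6's walkpgdir; unlike real x86 tables, the first level here stores ordinary C pointers rather than physical addresses plus flag bits.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NENTRIES 1024        /* entries per table: 10 index bits per level (assumed) */

typedef uint32_t pte_t;      /* last-level entry: physical page number + permission bits */

/* First-level table. Simplification: real xv6 stores the physical address
   of each second-level table plus flag bits; this sketch stores a pointer. */
typedef struct {
    pte_t *second[NENTRIES]; /* NULL = no second-level table yet */
} pagedir_t;

/* Find the last-level PTE for va. If its second-level table does not exist,
   allocate a zeroed (all-invalid) one when alloc is nonzero, else return NULL. */
pte_t *walk(pagedir_t *pd, uint32_t va, int alloc) {
    assert(pd != NULL);
    unsigned i1 = va >> 22;            /* index into first-level table */
    unsigned i2 = (va >> 12) & 0x3FF;  /* index into second-level table */
    if (pd->second[i1] == NULL) {
        if (!alloc)
            return NULL;
        pd->second[i1] = calloc(NENTRIES, sizeof(pte_t));
        if (pd->second[i1] == NULL)
            return NULL;               /* out of memory */
    }
    return &pd->second[i1][i2];
}
```

Returning a pointer to the entry (rather than its value) lets the caller both read and update it, which is what "set page table entry to point to physical page" needs.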

SLIDE 6

last time (3)

page table says “not valid”? page fault

OS gets to run; returning from the fault handler reruns the instruction

allocate on demand

on page fault, allocate a new page
set page table entry, return; program runs as if it was allocated

copy-on-write

on fault for a read-only page, make a copy
set page table entry, return; program runs as if it had a copy all the time

SLIDE 7

toy virtual and physical memory

program memory (virtual addresses)
00 0000 0000 to 00 1111 1111
01 0000 0000 to 01 1111 1111
10 0000 0000 to 10 1111 1111
11 0000 0000 to 11 1111 1111

real memory (physical addresses)
physical page 0: 000 0000 0000 to 000 1111 1111
physical page 1: 001 0000 0000 to 001 1111 1111
…
physical page 7: 111 0000 0000 to 111 1111 1111

page table!
virtual page #   physical page #
00               010 (2)
01               111 (7)
10               none
11               000 (0)
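The toy page table can be exercised directly. This sketch hard-codes the four entries above; translate is a hypothetical name, and returning false stands in for a page fault.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy system: 10-bit virtual addresses (2-bit virtual page number + 8-bit offset),
   11-bit physical addresses (3-bit physical page number + 8-bit offset). */
#define OFFSET_BITS 8
#define INVALID (-1)

static const int page_table[4] = {
    2,        /* VPN 00 -> physical page 010 */
    7,        /* VPN 01 -> physical page 111 */
    INVALID,  /* VPN 10 -> none */
    0,        /* VPN 11 -> physical page 000 */
};

/* Translate va into *pa; returns false (a "page fault") if the entry is invalid. */
bool translate(unsigned va, unsigned *pa) {
    assert(va < (1u << 10));
    unsigned vpn = va >> OFFSET_BITS;
    unsigned offset = va & ((1u << OFFSET_BITS) - 1);
    if (page_table[vpn] == INVALID)
        return false;
    *pa = ((unsigned)page_table[vpn] << OFFSET_BITS) | offset;
    return true;
}
```

For instance, virtual 01 1010 1010 (0x1AA) is on virtual page 01, which maps to physical page 111, giving physical address 111 1010 1010 (0x7AA); any address on virtual page 10 faults.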

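SLIDE 12

xv6 memory layout

(figure: virtual memory from 0x0000 0000 to 0xFFFF FFFF, next to physical memory)

virtual addresses from 0x8000 0000 up: mapped to physical memory starting at physical page 0
only accessible in the kernel
same in every process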
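For 32-bit xv6, the boundary in this picture is KERNBASE (0x80000000), and kernel virtual address KERNBASE + x refers to physical address x; xv6's memlayout.h expresses this with its V2P/P2V macros. A small sketch of the same arithmetic, with illustrative function names (xv6 actually maps only part of that range):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define KERNBASE 0x80000000u  /* first kernel virtual address in 32-bit xv6 */

/* Is this virtual address in the kernel-only part of the address space? */
bool is_kernel_address(uint32_t va) { return va >= KERNBASE; }

/* Kernel virtual address -> physical address (same idea as xv6's V2P). */
uint32_t kernel_va_to_pa(uint32_t va) {
    assert(is_kernel_address(va));
    return va - KERNBASE;
}

/* Physical address -> kernel virtual address (same idea as xv6's P2V). */
uint32_t pa_to_kernel_va(uint32_t pa) {
    assert(pa < KERNBASE);
    return pa + KERNBASE;
}
```

Because this mapping is a fixed offset and identical in every process, the kernel can reach any physical page it manages without changing page tables.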

SLIDE 13

fast copies

recall: fork() creates a copy of an entire program!
(usually, the copy then calls execve, which replaces itself with another program)
how isn’t this really slow?

SLIDE 14

do we really need a complete copy?

(figure: bash and the new copy of bash, each with: Used by OS; Stack; Heap / other dynamic; Writable data; Code + Constants)

Code + Constants: shared as read-only
other parts: can’t be shared?

SLIDE 17

trick for extra sharing

sharing writeable data is fine, until either process modifies the copy
can we detect modifications?
trick: tell the CPU (via the page table) that the shared part is read-only
the processor will trigger a fault when it’s written
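The trick relies on the processor refusing writes through a read-only page table entry. From user space this can be observed with mprotect: the sketch below (hypothetical function name; assumes Linux or another POSIX system with MAP_ANONYMOUS) makes a page read-only, then writes to it in a forked child so the parent survives to see the child killed by SIGSEGV. A copy-on-write OS would instead handle that fault itself and copy the page.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <signal.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if a write to a page marked read-only killed the writer with
   SIGSEGV, 0 if it did not, -1 on setup error. */
int write_to_readonly_page_faults(void) {
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    char *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    p[0] = 'a';                                     /* fine: page is writable */
    if (mprotect(p, (size_t)page, PROT_READ) != 0)  /* mark the page read-only */
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        ((volatile char *)p)[0] = 'b';  /* CPU faults; no handler, so SIGSEGV */
        _exit(0);                       /* only reached if the write succeeded */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    munmap(p, (size_t)page);
    return WIFSIGNALED(status) && WTERMSIG(status) == SIGSEGV;
}
```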

SLIDE 18

copy-on-write and page tables

original page table:
VPN      valid? write? physical page
…        …      …      …
0x00601  1      1      0x12345
0x00602  1      1      0x12347
0x00603  1      1      0x12340
0x00604  1      1      0x200DF
0x00605  1      1      0x200AF
…        …      …      …

copy’s page table:
VPN      valid? write? physical page
…        …      …      …
0x00601  1      0      0x12345
0x00602  1      0      0x12347
0x00603  1      0      0x12340
0x00604  1      0      0x200DF
0x00605  1      0      0x200AF
…        …      …      …

the copy operation actually duplicates the page table: both processes share all the physical pages
but it marks the pages in both copies as read-only
when either process tries to write a read-only page, that triggers a fault, and the OS actually copies the page
after allocating a copy, the OS reruns the write instruction

SLIDE 19

copy-on-write and page tables

original page table:
VPN      valid? write? physical page
…        …      …      …
0x00601  1      0      0x12345
0x00602  1      0      0x12347
0x00603  1      0      0x12340
0x00604  1      0      0x200DF
0x00605  1      0      0x200AF
…        …      …      …

copy’s page table:
VPN      valid? write? physical page
…        …      …      …
0x00601  1      0      0x12345
0x00602  1      0      0x12347
0x00603  1      0      0x12340
0x00604  1      0      0x200DF
0x00605  1      0      0x200AF
…        …      …      …

SLIDE 21

copy-on-write and page tables

original page table:
VPN      valid? write? physical page
…        …      …      …
0x00601  1      0      0x12345
0x00602  1      0      0x12347
0x00603  1      0      0x12340
0x00604  1      0      0x200DF
0x00605  1      0      0x200AF
…        …      …      …

copy’s page table (after a write to VPN 0x00605):
VPN      valid? write? physical page
…        …      …      …
0x00601  1      0      0x12345
0x00602  1      0      0x12347
0x00603  1      0      0x12340
0x00604  1      0      0x200DF
0x00605  1      1      0x300FD
…        …      …      …

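A toy simulation of the copy-on-write steps, assuming tiny made-up page and table sizes; "physical pages" are malloc'd buffers and every name is hypothetical. It deliberately skips reference counts, so even the last process sharing a page makes a copy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define NPAGES 4       /* hypothetical: 4 virtual pages per process */
#define PAGE_SIZE 64   /* hypothetical: tiny "physical pages" */

typedef struct {
    char *frame;       /* simulated physical page; NULL = invalid */
    bool writable;
} pte_t;

typedef struct {
    pte_t entry[NPAGES];
} pagetable_t;

/* The copy operation: duplicate the page table so both processes share
   every physical page, and mark the pages read-only in BOTH copies. */
void cow_duplicate(pagetable_t *orig, pagetable_t *copy) {
    for (int i = 0; i < NPAGES; i++) {
        orig->entry[i].writable = false;
        copy->entry[i] = orig->entry[i];
    }
}

/* Simulated store instruction. A write to a read-only page "faults": the
   handler copies the page, points this PTE at the copy, makes it writable,
   and then the write is rerun. */
bool store_byte(pagetable_t *pt, int vpn, int offset, char value) {
    pte_t *e = &pt->entry[vpn];
    assert(e->frame != NULL && offset >= 0 && offset < PAGE_SIZE);
    if (!e->writable) {                  /* protection fault */
        char *fresh = malloc(PAGE_SIZE);
        if (fresh == NULL)
            return false;
        memcpy(fresh, e->frame, PAGE_SIZE);
        e->frame = fresh;
        e->writable = true;
    }
    e->frame[offset] = value;            /* rerun the write */
    return true;
}
```

After the child's write, the parent still points at the original page and sees the old data, as in the last table above.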

SLIDE 22

VM assignment

allocate heap on demand
on page fault: check the address
need to write a page fault handler: if the address is okay, allocate a page

copy-on-write
change fork to keep the same page, but make it read-only
on page fault for a write: allocate a new page + copy
track the number of references to each page

SLIDE 23

VM assignment restrictions

don’t handle page faults triggered by the kernel
means system calls will break if passed uninitialized/copy-on-write memory

copy-on-write is the only reason for read-only pages
so detect copy-on-write by checking for a write to a read-only page

SLIDE 24

homework operations

add page fault handler
change growuvm (heap allocation function) to only change sz
edit page fault handler to allocate on demand
add reference counts for each physical copy-on-write page
do marking read-only on fork (in copyuvm)
handle deallocating with reference counts (several places)

SLIDE 25

xv6: adding space on demand

struct proc {
    uint sz; // Size of process memory (bytes)
    ...
};

adding allocate on demand logic:

on page fault: if address ≥ sz
kill process (out of bounds)

on page fault: if address < sz
find virtual page number of address
allocate page of memory, add to page table
return from interrupt
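A minimal sketch of that check, assuming 4096-byte pages and that valid addresses run from 0 to sz - 1 (so an address equal to sz is already out of bounds). In x86 xv6 the faulting address would come from the CR2 register (via rcr2()); here it is simply a parameter, and the function and enum names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PGSIZE 4096u
#define PGROUNDDOWN(a) ((a) & ~(PGSIZE - 1))  /* start of the page containing a */

typedef enum { KILL_PROCESS, ALLOCATE_PAGE } demand_action;

/* Decide how to handle a page fault at fault_addr for a process whose valid
   addresses are [0, sz). For ALLOCATE_PAGE, *page_start receives the virtual
   address of the page to allocate and map before returning from the interrupt. */
demand_action on_demand_fault(uint32_t fault_addr, uint32_t sz, uint32_t *page_start) {
    assert(page_start != NULL);
    if (fault_addr >= sz)
        return KILL_PROCESS;                 /* out of bounds */
    *page_start = PGROUNDDOWN(fault_addr);
    return ALLOCATE_PAGE;
}
```

Rounding down matters because the handler maps a whole page, and that page's start is what goes into the page table.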

SLIDE 26

versus more complicated OSes

range of valid addresses is not just 0 to maximum
need some more complicated data structure to represent it
will get to that later

SLIDE 27

copy-on-write cases

trying to write a forbidden page (e.g. kernel memory)
kill the program instead of making it writable

trying to write a read-only page and…

only one page table entry refers to it
make it writeable, return from fault

multiple processes’ page table entries refer to it
copy the page, replace the read-only page table entry with one pointing to the copy, return from fault
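These cases, plus the reference counts from the assignment, might fit together as follows. This is a user-space simulation, not xv6 code: frames, refcount and handle_write_fault are hypothetical names, and physical memory is a small static array.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NFRAMES 8      /* hypothetical: tiny physical memory */
#define PAGE_SIZE 64   /* hypothetical: tiny pages */

static char frames[NFRAMES][PAGE_SIZE];  /* simulated physical pages */
static int refcount[NFRAMES];            /* page table entries referring to each frame */

typedef struct {
    int frame;       /* physical page number */
    bool writable;
    bool user;       /* accessible in user mode? (false = e.g. kernel memory) */
} pte_t;

/* Grab an unused frame (reference count 0); -1 if none is free. */
static int alloc_frame(void) {
    for (int f = 0; f < NFRAMES; f++) {
        if (refcount[f] == 0) {
            refcount[f] = 1;
            return f;
        }
    }
    return -1;
}

/* Handle a write to a read-only page. Returns 0 if the write can be retried,
   -1 if the process should be killed. */
int handle_write_fault(pte_t *pte) {
    assert(pte != NULL);
    if (!pte->user)
        return -1;                    /* forbidden page: kill, never make writable */
    if (refcount[pte->frame] == 1) {
        pte->writable = true;         /* only one PTE refers to it: just make it writable */
        return 0;
    }
    int copy = alloc_frame();
    if (copy < 0)
        return -1;                    /* out of memory (this sketch also kills) */
    memcpy(frames[copy], frames[pte->frame], PAGE_SIZE);
    refcount[pte->frame]--;           /* this PTE no longer shares the old frame */
    pte->frame = copy;                /* replace the read-only entry... */
    pte->writable = true;             /* ...with a writable one pointing to the copy */
    return 0;
}
```

Decrementing the old frame's count on each copy is what lets the last remaining sharer take the cheap "just make it writable" path.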

SLIDE 28

mmap

Linux/Unix has a function to “map” a file to memory

int file = open("somefile.dat", O_RDWR);
// data is region of memory that represents file
char *data = mmap(..., file, 0);
// read byte 6 from somefile.dat
char seventh_char = data[6];
// modifies byte 100 of somefile.dat
data[100] = 'x';
// can continue to use 'data' like an array
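A runnable version of that example, assuming a POSIX system. The mmap arguments are one plausible choice (NULL address, the file's length, read/write, MAP_SHARED); read6_write100 and demo_read6_write100 are hypothetical names, and the demo builds its own temporary file.

```c
#define _DEFAULT_SOURCE   /* expose mkstemp/pread on glibc even in strict ISO modes */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map a file, read byte 6, set byte 100 to 'x'. Assumes the file is at
   least len bytes long and len > 100 (touching a mapped page entirely past
   the end of the file raises SIGBUS). Returns the byte read, or -1 on error. */
int read6_write100(const char *path, size_t len) {
    assert(len > 100);
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;
    char *data = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                      /* the mapping stays valid after close */
    if (data == MAP_FAILED)
        return -1;
    char seventh_char = data[6];    /* read byte 6 from the file */
    data[100] = 'x';                /* modifies byte 100 of the file */
    munmap(data, len);
    return (unsigned char)seventh_char;
}

/* Usage: build a 200-byte temporary file, call the function above, then
   check the file itself with an ordinary read. Returns 1 on success. */
int demo_read6_write100(void) {
    char path[] = "/tmp/mmap-demo-XXXXXX";
    int fd = mkstemp(path);         /* creates and opens a unique file */
    if (fd < 0)
        return -1;
    char buf[200];
    memset(buf, '.', sizeof buf);
    memcpy(buf, "0123456789", 10);
    int ok = write(fd, buf, sizeof buf) == (ssize_t)sizeof buf;
    int seventh = ok ? read6_write100(path, sizeof buf) : -1;
    char c = 0;
    ok = ok && seventh == '6' && pread(fd, &c, 1, 100) == 1 && c == 'x';
    close(fd);
    unlink(path);
    return ok;
}
```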

SLIDE 29

mmap options (1)

#include <sys/mman.h>
void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset);

length bytes from open file fd starting at byte offset

protection flags prot, bitwise OR together 1 or more of:
PROT_READ
PROT_WRITE
PROT_EXEC
PROT_NONE (for forcing segfaults)

SLIDE 32

mmap options (2)

#include <sys/mman.h>
void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset);

flags: choose exactly one of
MAP_SHARED: changing memory changes the file and vice-versa
MAP_PRIVATE: make a copy of the data in the file (using copy-on-write)

…along with additional flags:
MAP_ANONYMOUS (not POSIX): ignore fd, just allocate space
… (and more not shown)

addr: suggestion about where to put the mapping (may be ignored)
can pass NULL: “choose for me”
the address chosen will be returned
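A sketch contrasting the flags, assuming Linux/glibc (MAP_ANONYMOUS needs _DEFAULT_SOURCE there): writing through a MAP_PRIVATE mapping leaves the file untouched, writing through MAP_SHARED changes it, and MAP_ANONYMOUS hands back zero-filled memory with no file behind it. All function names are hypothetical.

```c
#define _DEFAULT_SOURCE   /* MAP_ANONYMOUS, mkstemp, pread on glibc */
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map the first len bytes of fd read/write with the given flags, store 'Z'
   at byte 0 through the mapping, then return byte 0 as read from the file. */
int store_then_read_file(int fd, size_t len, int flags) {
    assert(len > 0);
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, flags, fd, 0);
    if (p == MAP_FAILED)
        return -1;
    p[0] = 'Z';
    munmap(p, len);
    char c;
    if (pread(fd, &c, 1, 0) != 1)
        return -1;
    return (unsigned char)c;
}

/* MAP_ANONYMOUS: no file, just zero-filled memory (fd is ignored; -1 is
   passed for portability). Returns NULL on failure. */
char *anonymous_pages(size_t len) {
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Usage: returns 1 if MAP_PRIVATE left the file alone, MAP_SHARED changed
   it, and the anonymous mapping started out zeroed. */
int demo_mmap_flags(void) {
    char path[] = "/tmp/mmap-flags-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    int ok = write(fd, "abcd", 4) == 4
          && store_then_read_file(fd, 4, MAP_PRIVATE) == 'a'  /* only the private copy changed */
          && store_then_read_file(fd, 4, MAP_SHARED) == 'Z';  /* the file changed */
    close(fd);
    unlink(path);
    char *anon = anonymous_pages(4096);
    ok = ok && anon != NULL && anon[0] == 0 && anon[4095] == 0;
    if (anon != NULL)
        munmap(anon, 4096);
    return ok;
}
```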

SLIDE 34

Linux maps

$ cat /proc/self/maps
00400000-0040b000 r-xp 00000000 08:01 48328831 /bin/cat
0060a000-0060b000 r--p 0000a000 08:01 48328831 /bin/cat
0060b000-0060c000 rw-p 0000b000 08:01 48328831 /bin/cat
01974000-01995000 rw-p 00000000 00:00 0 [heap]
7f60c718b000-7f60c7490000 r--p 00000000 08:01 77483660 /usr/lib/locale/locale-archive
7f60c7490000-7f60c764e000 r-xp 00000000 08:01 96659129 /lib/x86_64-linux-gnu/libc-2.19.so
7f60c764e000-7f60c784e000 ---p 001be000 08:01 96659129 /lib/x86_64-linux-gnu/libc-2.19.so
7f60c784e000-7f60c7852000 r--p 001be000 08:01 96659129 /lib/x86_64-linux-gnu/libc-2.19.so
7f60c7852000-7f60c7854000 rw-p 001c2000 08:01 96659129 /lib/x86_64-linux-gnu/libc-2.19.so
7f60c7854000-7f60c7859000 rw-p 00000000 00:00 0
7f60c7859000-7f60c787c000 r-xp 00000000 08:01 96659109 /lib/x86_64-linux-gnu/ld-2.19.so
7f60c7a39000-7f60c7a3b000 rw-p 00000000 00:00 0
7f60c7a7a000-7f60c7a7b000 rw-p 00000000 00:00 0
7f60c7a7b000-7f60c7a7c000 r--p 00022000 08:01 96659109 /lib/x86_64-linux-gnu/ld-2.19.so
7f60c7a7c000-7f60c7a7d000 rw-p 00023000 08:01 96659109 /lib/x86_64-linux-gnu/ld-2.19.so
7f60c7a7d000-7f60c7a7e000 rw-p 00000000 00:00 0
7ffc5d2b2000-7ffc5d2d3000 rw-p 00000000 00:00 0 [stack]
7ffc5d3b0000-7ffc5d3b3000 r--p 00000000 00:00 0 [vvar]
7ffc5d3b3000-7ffc5d3b5000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]

reading the first line:
at virtual addresses 0x400000 up to 0x40b000
read, not write, execute, private (private = copy-on-write, if writeable)
starting at offset 0 of the file /bin/cat
device major number 8, device minor number 1, inode 48328831 (more on what this means when we talk about filesystems)

[heap]: no corresponding file, just read/write memory

0060b000-0060c000 rw-p: read/write, copy-on-write (private) mapping

the 00400000-0040b000 r-xp line is as if:
int fd = open("/bin/cat", O_RDONLY);
mmap((void *) 0x400000, 0xb000, PROT_READ | PROT_EXEC, MAP_PRIVATE, fd, 0);

the 0060b000-0060c000 rw-p line is as if:
int fd = open("/bin/cat", O_RDONLY);
mmap((void *) 0x60b000, 0x1000, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0xb000);
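Each line has a fixed shape, so it can be parsed with sscanf. A sketch assuming Linux's format as shown (major:minor in hex, inode in decimal); map_entry and parse_maps_line are hypothetical names, and paths containing spaces would be cut short.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One line of /proc/PID/maps. */
typedef struct {
    unsigned long long start, end;  /* virtual addresses [start, end) */
    char perms[5];                  /* e.g. "r-xp": r/w/x, then p = private or s = shared */
    unsigned long long offset;      /* byte offset into the mapped file */
    unsigned major, minor;          /* device numbers (hex in the file) */
    unsigned long long inode;       /* 0 for anonymous mappings */
    char path[256];                 /* file or label like [heap]; empty if none */
} map_entry;

/* Returns 1 if the line parsed, else 0. Paths containing spaces are cut at
   the first space (simplification). */
int parse_maps_line(const char *line, map_entry *e) {
    assert(line != NULL && e != NULL);
    e->path[0] = '\0';
    int n = sscanf(line, "%llx-%llx %4s %llx %x:%x %llu %255s",
                   &e->start, &e->end, e->perms, &e->offset,
                   &e->major, &e->minor, &e->inode, e->path);
    return n >= 7;   /* the path field is optional */
}
```

For the first line above, this yields start 0x400000, end 0x40b000, perms "r-xp", offset 0, device 8:1, inode 48328831 and path "/bin/cat".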

SLIDE 40

mapped pages (read-only)

(figure: virtual pages mapped to file; page table (part); file data, cached in memory; file data on disk/SSD)

initially: all page table entries invalid? (could also prefill entries…)

read from second page? page fault
PF handler: find the cached page, update the page table, retry

read from first page? page fault
PF handler: no cached page, so first read in the page
PF handler: after reading in the page, now point to it
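A toy simulation of this sequence, with a made-up four-page "file" and tiny pages; all names are hypothetical. It counts simulated disk reads to show that a page already in the cache costs no disk access.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define FILE_PAGES 4   /* hypothetical: the mapped file is 4 pages long */
#define PAGE_SIZE 16   /* hypothetical: tiny pages */

static const char disk[FILE_PAGES][PAGE_SIZE] = {  /* file data on disk/SSD */
    "page zero", "page one", "page two", "page three"
};
static char cache[FILE_PAGES][PAGE_SIZE];   /* file data, cached in memory */
static bool cached[FILE_PAGES];
static int disk_reads;                      /* number of simulated disk reads */

static const char *page_table[FILE_PAGES];  /* NULL = invalid; virtual page i maps file page i */

/* Page fault handler for a read-only file mapping. */
static void mapped_page_fault(int vpn) {
    if (!cached[vpn]) {                     /* no cached page: read it in first */
        memcpy(cache[vpn], disk[vpn], PAGE_SIZE);
        cached[vpn] = true;
        disk_reads++;
    }
    page_table[vpn] = cache[vpn];           /* point the PTE at the cached page */
}

/* Simulated load instruction: fault if the entry is invalid, then retry. */
char load_byte(int vpn, int offset) {
    assert(vpn >= 0 && vpn < FILE_PAGES && offset >= 0 && offset < PAGE_SIZE);
    if (page_table[vpn] == NULL)
        mapped_page_fault(vpn);
    return page_table[vpn][offset];
}
```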

SLIDE 47

shared mmap

int fd = open("/tmp/somefile.dat", O_RDWR);
mmap(0, 64 * 1024, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

from /proc/PID/maps for this program:

7f93ad877000-7f93ad887000 rw-s 00000000 08:01 1839758 /tmp/somefile.dat

(rw-s: the s means shared, where a private mapping shows p)

SLIDE 48

mapped pages (read/write, shared)

(figure: virtual pages mapped to file; page table (part); file data, cached in memory; file data on disk/SSD)

write to page? update the cached file data
data on disk is now out of date
eventually (e.g. to free memory): write the update to disk

SLIDE 52

knowing when to write to disk?

need a dirty bit per page (“was the page modified?”)

the D bit on PTEs we’ve seen

x86: kept in the page table!

option 1 (most common): hardware sets the dirty bit in the page table entry (on write)
bit means “physical page was modified using this PTE”

option 2: OS sets the page read-only, flips read-only + dirty bit on fault

SLIDE 53

multiple dirty bits?

what if a page is in multiple page tables?
each page table has a dirty bit…
check all of them to decide if it was modified
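On x86 the dirty bit is bit 6 of a page table entry (0x40) and the accessed bit is bit 5 (0x20). A sketch of checking every mapping of one shared page, with constant names in the style of xv6's PTE_ flags and hypothetical function names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* x86 page table entry flag bits (32-bit layout). */
#define PTE_P 0x001   /* present */
#define PTE_W 0x002   /* writeable */
#define PTE_U 0x004   /* user-accessible */
#define PTE_A 0x020   /* accessed: set by hardware on any use */
#define PTE_D 0x040   /* dirty: set by hardware on a write through this PTE */

/* A shared page may be mapped by several page tables, each with its own
   dirty bit: the page was modified if ANY of those PTEs has D set. */
bool page_is_dirty(const uint32_t *ptes, int n) {
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        if (ptes[i] & PTE_D)
            return true;
    return false;
}

/* After writing the page back to disk, clear every mapping's dirty bit.
   (A real kernel must also flush these PTEs from the TLB, or later writes
   might not set D again.) */
void clear_dirty(uint32_t *ptes, int n) {
    for (int i = 0; i < n; i++)
        ptes[i] &= ~(uint32_t)PTE_D;
}
```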

SLIDE 55

mapped pages (copy-on-write)

(figure: virtual pages mapped to file; page table (part); file data, cached in memory; file data on disk/SSD)

reads: like before
write to second page? protection fault: the page table entry says read-only
fault handler: make a copy, update the page table to point to it
result: copies of the file data, modified (the file itself is unchanged)

slide-60
SLIDE 60

mapped pages (no backing file)

diagram: virtual pages w/o backing file; page table (part); data in memory; swapped out data (if any)

access new page? page fault handler allocates on demand

need more memory? save page to disk temporarily
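On Linux, a program can ask for this kind of memory directly with an anonymous mmap. The sketch below assumes Linux/glibc (`MAP_ANONYMOUS` is not in older POSIX), and `alloc_anonymous` is a hypothetical helper name.

```c
#define _DEFAULT_SOURCE      /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Ask for `len` bytes of memory with no backing file. mmap only records
   the region; physical pages are supplied by the page fault handler when
   the program first touches each page, and they start out zero-filled. */
unsigned char *alloc_anonymous(size_t len) {
    assert(len > 0);
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : (unsigned char *) p;
}
```

Such a region should appear in /proc/self/maps as an rw-p line with device 00:00 and inode 0 (the kernel may merge it with a neighboring region).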

32

slide-67
SLIDE 67

swapping with copy-on-write

diagram: virtual pages mapped to file; page table (part); file data, cached in memory; file data on disk/SSD; copies of file data, modified; “swapped out” modified data

free up space by removing cached copies of file

need to free up more space? can move copied data to disk (“swapped out” modified data)
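The order of choices above can be written as a small decision function. The struct, enum and function names are hypothetical, and the sketch assumes the OS already knows, for each physical page, whether it caches part of a file, whether it is dirty, and whether an up-to-date copy already sits in swap.

```c
#include <assert.h>
#include <stdbool.h>

enum evict_action { DROP, WRITE_TO_FILE, WRITE_TO_SWAP };

struct frame_info {
    bool file_cache;  /* caches part of a file (possibly shared-mapped) */
    bool dirty;       /* modified since last read from / written to disk */
    bool in_swap;     /* non-file page: an up-to-date copy is already in swap */
};

/* What must happen before this physical page can be reused? */
enum evict_action eviction_action(struct frame_info f) {
    if (f.file_cache)                /* clean cached copy: the file still has it */
        return f.dirty ? WRITE_TO_FILE : DROP;
    if (f.in_swap && !f.dirty)       /* swapped in earlier and never modified */
        return DROP;
    /* heap/stack page or modified copy-on-write copy: no file to write it to */
    return WRITE_TO_SWAP;
}
```

Dropping is cheapest, which is why clean cached copies of files are the first thing to go.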

34

slide-71
SLIDE 71

swapping

historically, a major use of virtual memory is supporting “swapping”
using disk (or SSD, …) as the next level of the memory hierarchy
process is allocated space on disk/SSD
memory is a cache for disk/SSD

only need to keep ‘currently active’ pages in physical memory

swapping ≈ mmap with “default” files to use

35

slide-73
SLIDE 73

HDDs/SSDs are slow

HDD reads and writes: milliseconds to tens of milliseconds

minimum size: 512 bytes; writing tens of kilobytes is basically as fast as writing 512 bytes

SSD reads and writes: hundreds of microseconds

designed for reads/writes of kilobytes (not much smaller)

page fault handler is going to switch to another program while it waits
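A rough calculation shows why the fault handler should run something else. The numbers are assumptions for illustration only: 100 ns per memory access, 10 ms to service a fault from an HDD, and a fault rate of p = 0.001 (one access in a thousand).

```latex
\begin{aligned}
t_{\text{eff}} &= (1 - p)\, t_{\text{mem}} + p\, t_{\text{fault}} \\
  &= 0.999 \times 100\,\text{ns} + 10^{-3} \times 10^{7}\,\text{ns} \\
  &= 99.9\,\text{ns} + 10^{4}\,\text{ns} \approx 10.1\,\mu\text{s}
\end{aligned}
```

Even a 0.1% fault rate makes the average access about 100 times slower, and each individual fault leaves the CPU idle for ten million nanoseconds, plenty of time to run another program.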

36

slide-76
SLIDE 76

the page cache

memory is a cache for disk: files and program memory both have a place on disk

running low on memory? we always have room on disk (assumption: disk space is approximately infinite)

physical memory pages: disk data ‘temporarily’ kept in faster storage

possibly being used by one or more processes? possibly part of a file on disk? possibly both

goal: manage this cache intelligently

37

slide-77
SLIDE 77

memory as a cache for disk

“cache block” ≈ physical page

fully associative

any virtual address/file part can be stored in any physical page

replacement is managed by the OS

normal cache hits happen without the OS

common case that needs to be fast

38

slide-78
SLIDE 78

page cache components

mapping: virtual address or file+offset → physical page

handle cache hits

find backing location based on virtual address/file+offset

handle cache misses

track information about each physical page

handle page allocation; handle cache eviction

39

slide-80
SLIDE 80

virtual address/file offset → physical page

“cache hits”:

page table: virtual address → physical page (if any)

mapping found? cache hit on program memory access
structure determined by hardware: involved in every memory access

kernel data structures: file offset → physical page (if any)

mapping found? cache hit on read/write system call (or cache hit on page fault for mmap’d memory)
multiple possible designs (software data structure)

one idea: balanced tree: offset → physical page
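As a simpler stand-in for the balanced tree, here is a C sketch that keeps one file's cached pages in an array sorted by file page number and looks them up with the standard `bsearch`. This is a swapped-in technique, not the design on the slide: lookups are O(log n) like a balanced tree, but inserting a newly cached page would cost O(n). All names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* One cached page of a file: which page-sized chunk, and where it lives. */
struct cached_page {
    unsigned long file_page;    /* file offset / page size */
    int phys_page;              /* physical page number holding the data */
};

static int cmp_file_page(const void *key, const void *elem) {
    unsigned long k = *(const unsigned long *) key;
    unsigned long e = ((const struct cached_page *) elem)->file_page;
    return (k > e) - (k < e);
}

/* `cache` must be sorted by file_page. Returns the physical page number,
   or -1 on a miss (the data must be read from disk). */
int lookup_cached(const struct cached_page *cache, size_t n,
                  unsigned long offset, unsigned long page_size) {
    assert(page_size > 0);
    unsigned long key = offset / page_size;
    const struct cached_page *hit =
        bsearch(&key, cache, n, sizeof cache[0], cmp_file_page);
    return hit ? hit->phys_page : -1;
}
```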

41

slide-81
SLIDE 81

Linux: tracking files in memory

struct file {
    ...
    struct inode *f_inode;
    ...
};
...
struct inode {
    ...
    struct address_space i_data;
    ...
};
...
struct address_space {
    ...
    struct radix_tree_root i_pages;  /* cached pages */
    atomic_t i_mmap_writable;        /* count VM_SHARED mappings */
    struct rb_root_cached i_mmap;    /* tree of private and shared mappings */
    ...
};

struct inode: represents a file on disk (versus a file that’s a pipe, terminal, etc.)
i_pages: way to find cached copies of parts of the file; copies can be shared between processes; Linux’s choice: tree of cached pages
i_mmap: every place this file is mmap’d (kept as a tree); needed when freeing up memory from cached pages
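Linux's i_pages is a radix tree: the page index within the file is split into chunks of bits, and each chunk selects a slot at one level of the tree. A toy two-level version in C, assuming 6 bits (64 slots) per level and at most 4096 pages per file; the kernel's structure has more levels and extra bookkeeping, and these names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define BITS 6                      /* 64 slots per node (an assumption) */
#define SLOTS (1u << BITS)
#define MAX_INDEX (SLOTS * SLOTS)   /* two levels cover page indexes 0..4095 */

/* Toy two-level radix tree: file page index -> physical page number.
   A slot stores phys + 1, so 0 means "not cached". */
struct radix_leaf { int slot[SLOTS]; };
struct radix_root { struct radix_leaf *child[SLOTS]; };

/* Store phys (>= 0) for page `index`; returns 0 on allocation failure. */
int radix_insert(struct radix_root *r, unsigned index, int phys) {
    assert(index < MAX_INDEX && phys >= 0);
    unsigned hi = index >> BITS, lo = index & (SLOTS - 1);
    if (!r->child[hi]) {            /* allocate second-level nodes on demand */
        r->child[hi] = calloc(1, sizeof *r->child[hi]);
        if (!r->child[hi])
            return 0;
    }
    r->child[hi]->slot[lo] = phys + 1;
    return 1;
}

/* Returns the physical page number, or -1 if that page isn't cached. */
int radix_lookup(const struct radix_root *r, unsigned index) {
    if (index >= MAX_INDEX)
        return -1;
    const struct radix_leaf *leaf = r->child[index >> BITS];
    return leaf ? leaf->slot[index & (SLOTS - 1)] - 1 : -1;
}
```

Like a page table, the tree only allocates second-level nodes for parts of the file that actually have cached pages.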

42

slide-84
SLIDE 84

mapped pages (read/write, shared)

diagram: file data, cached in memory; file data on disk/SSD

43

slide-86
SLIDE 86

virtual address/file offset → location on disk

“cache miss” for memory mapped to files:

need data structure saying where files are mapped (then rely on filesystem)
seen dump of this data structure in Linux: /proc/PID/maps

for “swapped out” data outside of files:

(heap memory, modified copy-on-write copies of files, etc.)
need some way to track swapped out, modified pages; hopefully not too big…

for data in files: depends on filesystem (topic for later)

45

slide-88
SLIDE 88

Linux: tracking memory regions

struct vm_area_struct {
    ...
    unsigned long vm_start;     /* Our start address within vm_mm. */
    unsigned long vm_end;       /* The first byte after our end address within vm_mm. */
    ...
    pgprot_t vm_page_prot;      /* Access permissions of this VMA. */
    unsigned long vm_flags;     /* Flags, see mm.h. */
    ...
    struct anon_vma *anon_vma;  /* Serialized by page_table_lock */
    ...
    unsigned long vm_pgoff;     /* Offset (within vm_file) in PAGE_SIZE units */
    struct file * vm_file;      /* File we map to (can be NULL). */
    ...
} __randomize_layout;

vm_start, vm_end: virtual addresses of the mapping
mappings are part of a sorted list/tree (fields elided above) to allow finding them by start/end address
vm_file, vm_pgoff: file to get data from and location within the file
vm_page_prot: permissions (read/write/execute)
vm_flags: private or shared? … (private = copy-on-write; shared = make changes to the underlying file)
anon_vma: for tracking pages that aren’t part of the file (from copy-on-write, or for non-file-backed memory)
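The lookup a page fault needs (find the region containing the faulting address, then the matching file offset) can be sketched in C. `struct region` is a hypothetical stand-in for vm_area_struct that keeps only vm_start, vm_end and vm_pgoff (in PAGE_SIZE units, as the kernel comment says), and a sorted array replaces the kernel's list/tree. The example values come from the /bin/cat lines of the maps listing.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096UL

/* Hypothetical, stripped-down stand-in for vm_area_struct. */
struct region {
    unsigned long vm_start, vm_end;  /* [vm_start, vm_end) */
    unsigned long vm_pgoff;          /* file offset of vm_start, in pages */
};

/* Binary search non-overlapping regions sorted by vm_start.
   Returns the region containing addr, or NULL (no mapping: an error). */
const struct region *find_region(const struct region *r, size_t n,
                                 unsigned long addr) {
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (addr < r[mid].vm_start)
            hi = mid;
        else if (addr >= r[mid].vm_end)
            lo = mid + 1;
        else
            return &r[mid];
    }
    return NULL;
}

/* Where in the file the byte at addr comes from. */
unsigned long file_offset(const struct region *r, unsigned long addr) {
    assert(addr >= r->vm_start && addr < r->vm_end);
    return r->vm_pgoff * PAGE_SIZE + (addr - r->vm_start);
}
```

For example, address 0x60b010 falls in the 0x60b000–0x60c000 region (vm_pgoff 0xb), so it comes from file offset 0xb010.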

47

slide-94
SLIDE 94

Linux: tracking swapped out pages

need to look up location on disk
potentially one location for every virtual page
trick: store location in page table entry

instead of physical page #, permission bits, etc., store offset on disk (the PTE is marked not present, so the hardware ignores its other bits)

on page fault: examine page table entry to find where to read from disk
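One way to pack a swap location into a non-present PTE, as a C sketch. The bit layout is hypothetical (x86 Linux's real swap-entry encoding differs); the only hardware fact it relies on is that when the present bit (bit 0) is clear, the CPU faults without interpreting the remaining bits. Locations are stored as page-sized slot numbers in the swap area.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 64-bit PTE layout:
   present:     bit 0 = 1, bits 1-11 = permissions, bits 12+ = physical page
   not present: bit 0 = 0, bits 1+ = swap slot + 1
   (the +1 keeps an all-zero PTE meaning "nothing mapped here") */
#define PTE_PRESENT 0x1ULL

uint64_t make_present_pte(uint64_t phys_page, uint64_t perm_bits) {
    return (phys_page << 12) | (perm_bits & 0xFFE) | PTE_PRESENT;
}

uint64_t make_swapped_pte(uint64_t swap_slot) {
    assert(swap_slot < (1ULL << 62));
    return (swap_slot + 1) << 1;          /* present bit stays 0 */
}

int pte_is_present(uint64_t pte) { return (pte & PTE_PRESENT) != 0; }

/* Page fault on a non-present, non-zero PTE: which swap slot to read? */
uint64_t pte_swap_slot(uint64_t pte) {
    assert(!pte_is_present(pte) && pte != 0);
    return (pte >> 1) - 1;
}
```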

49

slide-96
SLIDE 96

tracking physical pages: finding free pages

Linux has list of “least recently used” pages:

struct page {
    ...
    struct list_head lru; /* list_head ~ next/prev pointer */
    ...
};

how we’re going to find a page to allocate

(and evict from something else)

later: what this list actually looks like (how many lists, …)
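The list_head idea (a next/prev pair embedded inside each struct page) can be sketched in C as a circular doubly linked list with a sentinel head. These helpers are simplified re-implementations with hypothetical names, not the kernel's <linux/list.h> API; `page_of` recovers the enclosing struct page the way the kernel's container_of macro does.

```c
#include <assert.h>
#include <stddef.h>

/* Circular doubly linked list node, embedded in the objects on the list. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void list_del(struct list_head *e) {
    e->prev->next = e->next;
    e->next->prev = e->prev;
}

static void list_add_front(struct list_head *e, struct list_head *h) {
    e->next = h->next;
    e->prev = h;
    h->next->prev = e;
    h->next = e;
}

struct page {
    int number;               /* hypothetical: physical page number */
    struct list_head lru;
};

/* recover the struct page that contains a given lru node */
#define page_of(node) \
    ((struct page *) ((char *) (node) - offsetof(struct page, lru)))

/* page was used: move it to the front (most recently used end) */
void lru_touch(struct list_head *lru_list, struct page *p) {
    list_del(&p->lru);
    list_add_front(&p->lru, lru_list);
}

/* candidate to evict: the page at the back, or NULL if the list is empty */
struct page *lru_victim(struct list_head *lru_list) {
    if (lru_list->prev == lru_list)
        return NULL;
    return page_of(lru_list->prev);
}
```

Hardware does not tell the OS about every access, so a real kernel cannot call something like `lru_touch` each time; it only approximates this ordering.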

51

slide-98
SLIDE 98

tracking physical pages: finding mappings

want to evict a page? remove from page tables, etc.
need to track where every page is used!

53

slide-99
SLIDE 99

Linux: physical page → file → PTE

how Linux tracks where file pages are in page tables:

struct page {
    ...
    struct address_space *mapping;
    pgoff_t index;                   /* Our offset within mapping. */
    ...
};
struct address_space {
    ...
    struct rb_root_cached i_mmap;    /* tree of private and shared mappings */
    ...
};

tree of mappings lets us find vm_area_structs, and from them PTEs
rather complicated lookup (but writing to disk is already slow)
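Given a file page (the `mapping` + `index` pair above), finding every virtual address that maps it can be sketched in C. `struct file_mapping` is a hypothetical summary of what each vm_area_struct in i_mmap records, and a linear scan replaces the kernel's tree; each result would still need a page table walk to reach the PTE itself.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096UL

/* Hypothetical, simplified view of one mapping of a file. */
struct file_mapping {
    int process_id;                  /* which address space */
    unsigned long vm_start, vm_end;  /* virtual range [vm_start, vm_end) */
    unsigned long vm_pgoff;          /* file page mapped at vm_start */
};

/* For file page `index`, record every (process, virtual address) that
   maps it. Returns how many were found (at most max_out). */
size_t find_ptes_for_page(const struct file_mapping *maps, size_t n,
                          unsigned long index,
                          int *pids, unsigned long *vaddrs, size_t max_out) {
    size_t found = 0;
    for (size_t i = 0; i < n && found < max_out; i++) {
        unsigned long npages = (maps[i].vm_end - maps[i].vm_start) / PAGE_SIZE;
        if (index < maps[i].vm_pgoff || index >= maps[i].vm_pgoff + npages)
            continue;                /* this mapping doesn't cover the page */
        pids[found] = maps[i].process_id;
        vaddrs[found] = maps[i].vm_start + (index - maps[i].vm_pgoff) * PAGE_SIZE;
        found++;
    }
    return found;
}
```

In the test below, two processes share /bin/cat's code mapping and process 1 also has the writable data page, so file page 3 is found twice and file page 0xb once.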

54

slide-100
SLIDE 100

Linux: physical page → PTE w/o file

Linux also tracks the location of “anonymous” (non-file) pages: a mapping from each page to the list of vm_area_structs that contain it

recall: vm_area_struct: one memory allocation in one process

exercise: why a list?

what’s one case when non-file memory is shared between processes?

55

slide-101
SLIDE 101

list of allocations per page

naive solution: separate list for each page?

a lot of overhead (many tens of bytes per 4K page?)

but, trick: many pages are ‘copied’ at the same time (e.g. fork)
idea: share one list between all those pages

Linux represents each list as a struct anon_vma
initially: a list of one mmap region (vm_area_struct)

on fork: add the child’s region to the existing list; also create a new list (for pages the child later gets its own copies of)

56

slide-103
SLIDE 103

page replacement

step 1: evict a page to free a physical page
step 2: load the new, more important page in its place

58

slide-104
SLIDE 104

evicting a page

find a ‘victim’ page to evict
remove victim page from page table, etc.

from every page table it is referenced by
from every list of file pages
…

if needed, save victim page to disk

59

slide-105
SLIDE 105

page replacement goals

hit rate: minimize number of misses
throughput: minimize overhead/maximize performance
fairness: every process/user gets its ‘share’ of memory

will start with optimizing hit rate

60

slide-106
SLIDE 106

max hit rate ≈ max throughput

optimizing hit rate almost optimizes throughput, but…

cache miss costs are variable

creating zero page versus reading data from slow disk?
write back dirty page before reading a new one or not?
reading multiple pages at a time from disk (faster per page read)?
…

61

slide-108
SLIDE 108

being proactive?

can avoid misses by “reading ahead”

guess what’s needed and read it in ahead of time
wrong guesses can have costs besides more cache misses

we will get back to this later
for now: only access/evict on demand

62

slide-109
SLIDE 109

backup slides

63

slide-110
SLIDE 110

swapping timeline

timeline diagram: … program A pages … program B pages; program A runs; page fault; OS; start read; evicted; loaded; interrupt

OS needs to choose a page to replace
hopefully the copy on disk is already up-to-date?
first step of replacement: mark evicted page invalid in each page table (this example: only process B; real case: possibly many page tables)

other processes can run while the page is being read

OS will get an interrupt when the disk is done
process A’s page table is updated and A is restarted from the point of the fault

64

slide-115
SLIDE 115

swapping decisions

write policy
replacement policy

65

slide-117
SLIDE 117

swapping is writeback

implementing write-through is hard

when a fault happens: the physical page has not been written yet
when the OS resumes the process: the hardware performs the write, with no chance for the OS to forward it
HW itself doesn’t know how to write to disk

write-through would also be really slow

HDD/SSD perform best if one writes at least a whole page at a time

67

slide-118
SLIDE 118

implementing writeback

need a dirty bit per page (“was page modified”)
x86: kept in the page table!

option 1 (most common): hardware sets dirty bit in page table entry (on write)

bit means “physical page was modified using this PTE”

option 2: OS sets page read-only; on the resulting write fault, it makes the page writable and sets a dirty bit
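Both options, sketched in C. The flag values are the x86 page table entry bits (present = bit 0, writable = bit 1, user = bit 2, accessed = bit 5, dirty = bit 6); `needs_writeback`, `struct sw_page` and `sw_handle_write_fault` are hypothetical names, and a real kernel would also flush the TLB after changing a PTE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* x86 page table entry flag bits (low bits of a PTE) */
#define PTE_P (1ULL << 0)   /* present */
#define PTE_W (1ULL << 1)   /* writable */
#define PTE_U (1ULL << 2)   /* user-accessible */
#define PTE_A (1ULL << 5)   /* accessed: set by hardware on any access */
#define PTE_D (1ULL << 6)   /* dirty: set by hardware on write */

/* Option 1: hardware maintains PTE_D. Before evicting, check every PTE
   that maps the physical page; any dirty one means write back first. */
bool needs_writeback(const uint64_t *ptes, int n) {
    for (int i = 0; i < n; i++)
        if (ptes[i] & PTE_D)
            return true;
    return false;
}

/* Option 2: software dirty bit. The OS maps a writeable page read-only;
   the first write faults, and the handler records "dirty" and restores
   write permission so later writes don't fault. */
struct sw_page {
    uint64_t pte;
    bool really_writable;   /* what the mapping actually allows */
    bool sw_dirty;
};

bool sw_handle_write_fault(struct sw_page *p) {
    if (!p->really_writable)
        return false;       /* real protection violation */
    p->sw_dirty = true;
    p->pte |= PTE_W;
    return true;
}
```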

68

slide-120
SLIDE 120

replacement policies really matter

huge cost for “miss” on swapping (milliseconds!)
replacement policy implemented in software

a lot more room for fancy policies

usual goal: least-recently-used approximation

70

slide-121
SLIDE 121

LRU replacement?

problem: need to identify when pages are used

ideally every single time

not practical to do this exactly

HW would need to keep a list of when each page was accessed, or SW would need to force every access to trigger a fault
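Exact LRU is easy to simulate in software, which also shows why it is impractical for real memory: the bookkeeping (`last_used`) changes on every single access. A C sketch over a reference string of one-letter page names; the function name and the 16-frame limit are assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FRAMES 16

/* Exact LRU on a reference string, with a timestamp per physical page.
   Returns the number of misses. */
int lru_misses(const char *refs, int nframes) {
    assert(nframes > 0 && nframes <= MAX_FRAMES);
    char page[MAX_FRAMES];
    long last_used[MAX_FRAMES];
    int used = 0, misses = 0;
    for (long t = 0; refs[t] != '\0'; t++) {
        int hit = -1;
        for (int i = 0; i < used; i++)
            if (page[i] == refs[t])
                hit = i;
        if (hit < 0) {
            misses++;
            if (used < nframes) {
                hit = used++;                 /* free physical page */
            } else {
                hit = 0;                      /* evict least recently used */
                for (int i = 1; i < used; i++)
                    if (last_used[i] < last_used[hit])
                        hit = i;
            }
            page[hit] = refs[t];
        }
        last_used[hit] = t;                   /* updated on every access */
    }
    return misses;
}
```

For example, with 3 physical pages the string "ABCABDADBCB" gives 5 misses: D replaces C, then C replaces A.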

71

slide-122
SLIDE 122

second chance example

diagram: accesses to pages A, B, C, D, … held in physical pages 1, 2, 3; page list drawn from “last added” to “end of list”, each entry marked R (referenced) or NR (not referenced)

page 2 was at bottom of list and is not referenced: okay to use

page 1 was at bottom of list but referenced: give it a second chance (move it to top of list, clear its referenced bit)
eventually page 1 gets to bottom of list again, but now not referenced: use it

B referenced: hardware sets its referenced bit
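Second chance can be sketched in C in its equivalent “clock” form: instead of moving a referenced page from the bottom of the list to the top, a hand sweeps around a circular array, clearing referenced bits until it finds an unreferenced page. As in the example, a newly loaded page starts out referenced. The function name and frame limit are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_FRAMES 16

/* Second-chance replacement on a reference string. `hand` points at the
   bottom of the list (the oldest page). Returns the number of misses. */
int second_chance_misses(const char *refs, int nframes) {
    assert(nframes > 0 && nframes <= MAX_FRAMES);
    char page[MAX_FRAMES];
    bool referenced[MAX_FRAMES];
    int used = 0, hand = 0, misses = 0;
    for (int t = 0; refs[t] != '\0'; t++) {
        int i;
        for (i = 0; i < used && page[i] != refs[t]; i++)
            ;
        if (i < used) {                  /* hit: hardware sets referenced bit */
            referenced[i] = true;
            continue;
        }
        misses++;
        if (used < nframes) {            /* free physical page: fill in order */
            i = used++;
        } else {
            while (referenced[hand]) {   /* second chance: clear bit, move on */
                referenced[hand] = false;
                hand = (hand + 1) % nframes;
            }
            i = hand;                    /* not referenced: evict this one */
            hand = (hand + 1) % nframes;
        }
        page[i] = refs[t];
        referenced[i] = true;            /* the faulting access counts as a use */
    }
    return misses;
}
```

With 3 physical pages, "ABCDBEB" gives 5 misses: B's referenced bit saves it when E arrives, so the final B hits, whereas plain FIFO would have evicted B and missed again.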

72

slide-128
SLIDE 128

second chance example

A B C D — — — B A — C — 1 A D 2 B C 3 C C A page list

last added *1R *2R *3R 1NR 2NR 3NR *1R 1R 2NR *3R 1NR *2R — 3NR 1R 2R 3R 1NR 2NR 3NR 3NR 1R 2NR 3R 1NR end of list 2NR 3NR 1R 2R 3R 1NR 2NR *2R 3NR 1R 2NR 3R

page 2 was at bottom of list is not referenced

  • kay to use

page 1 was at bottom of list reference — give second chance moves to top of list clear referenced bit eventually page 1 gets to bottom of list again but now not referenced — use B referenced — fmips referenced bit

72

slide-129
SLIDE 129

toy program memory

virtual page # 0: 00 0000 0000 = 0x000 (code)
virtual page # 1: 01 0000 0000 = 0x100 (data/heap)
virtual page # 2: 10 0000 0000 = 0x200 (empty/more heap?)
virtual page # 3: 11 0000 0000 = 0x300 (stack), up to 11 1111 1111 = 0x3FF

divide memory into pages (2^8 = 256 bytes in this case)
“virtual” = addresses the program sees
page number is upper bits of address (because page size is power of two)
rest of address is called page offset
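A small C sketch of splitting a toy virtual address into page number and page offset, assuming the slide's 10-bit addresses and 2^8-byte pages. Because the page size is a power of two, the split is just a shift and a mask. The names `vpn_of`, `offset_of` and `PAGE_OFFSET_BITS` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Toy parameters from the slide: 10-bit virtual addresses (0x000-0x3FF)
   and 2^8 = 256-byte pages, so 4 virtual pages. */
#define PAGE_OFFSET_BITS 8
#define PAGE_SIZE (1u << PAGE_OFFSET_BITS)   /* 256 bytes */

/* Virtual page number: the upper bits of the address. */
static unsigned vpn_of(uint16_t vaddr) {
    assert(vaddr <= 0x3FF);   /* toy address space is 10 bits */
    return vaddr >> PAGE_OFFSET_BITS;
}

/* Page offset: the remaining low bits (valid because PAGE_SIZE is a power of two). */
static unsigned offset_of(uint16_t vaddr) {
    return vaddr & (PAGE_SIZE - 1);
}
```

For example, 01 1101 0010 (0x1D2) is virtual page 1, offset 0xD2.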

73

slide-134
SLIDE 134

toy physical memory

program memory (virtual addresses):
  virtual page 0: 00 0000 0000 to 00 1111 1111
  virtual page 1: 01 0000 0000 to 01 1111 1111
  virtual page 2: 10 0000 0000 to 10 1111 1111
  virtual page 3: 11 0000 0000 to 11 1111 1111

real memory (physical addresses):
  physical page 0: 000 0000 0000 to 000 1111 1111
  physical page 1: 001 0000 0000 to 001 1111 1111
  …
  physical page 7: 111 0000 0000 to 111 1111 1111

page table!
  virtual page #   physical page #
  00               010 (2)
  01               111 (7)
  10               none
  11               000 (0)
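To make the mapping concrete, here is a hedged C sketch that encodes the toy page table above as an array and computes which physical addresses back each virtual page, assuming 2^8-byte pages as in the toy example. The names `page_table`, `NO_PAGE` and `physical_range` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_OFFSET_BITS 8
#define NUM_VPAGES 4
#define NO_PAGE (-1)   /* marks "none" in the table */

/* The toy page table: index = virtual page number. */
static const int page_table[NUM_VPAGES] = {
    2,        /* 00 -> 010 */
    7,        /* 01 -> 111 */
    NO_PAGE,  /* 10 -> none */
    0,        /* 11 -> 000 */
};

/* Compute the first and last physical address backing virtual page vpn.
   Returns false if the virtual page has no physical page. */
static bool physical_range(unsigned vpn, unsigned *first, unsigned *last) {
    assert(vpn < NUM_VPAGES);
    int ppn = page_table[vpn];
    if (ppn == NO_PAGE)
        return false;
    *first = (unsigned)ppn << PAGE_OFFSET_BITS;
    *last = *first | ((1u << PAGE_OFFSET_BITS) - 1);
    return true;
}
```

So virtual page 01 occupies physical addresses 0x700 to 0x7FF (111 0000 0000 to 111 1111 1111).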

74

slide-139
SLIDE 139

toy page table lookup

virtual page #   valid?   physical page #    read OK?   write OK?
00               1        010 (2, code)      1          0
01               1        111 (7, data)      1          1
10               0        ??? (ignored)      -          -
11               1        000 (0, stack)     1          1

address from CPU: 01 1101 0010
  “virtual page number” = 01, “page offset” = 1101 0010
  look up the “page table entry” for 01: valid, “physical page number” 111
  trigger exception if valid? (or the needed permission bit) is 0
  physical address 111 1101 0010 (same “page offset”) goes to cache (data or instruction)
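The lookup above can be sketched in C as follows. This assumes the toy table's code page is read-only and treats "trigger exception" as a `false` return, since a user-level sketch cannot raise a real fault. The `struct pte` layout and the names `toy_table` and `translate` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_OFFSET_BITS 8
#define OFFSET_MASK ((1u << PAGE_OFFSET_BITS) - 1)

/* Hypothetical page table entry layout for the toy machine. */
struct pte {
    bool valid;
    unsigned ppn;      /* physical page number */
    bool read_ok;
    bool write_ok;
};

/* The toy table (index = 2-bit virtual page number). */
static const struct pte toy_table[4] = {
    { true,  2, true,  false },  /* 00: code (read-only) */
    { true,  7, true,  true  },  /* 01: data */
    { false, 0, false, false },  /* 10: invalid; ppn ignored */
    { true,  0, true,  true  },  /* 11: stack */
};

/* Translate a 10-bit virtual address. Returns false where real hardware
   would trigger an exception (invalid entry or permission bit 0). */
static bool translate(uint16_t vaddr, bool is_write, uint16_t *paddr) {
    assert(vaddr <= 0x3FF);
    unsigned vpn = vaddr >> PAGE_OFFSET_BITS;
    unsigned offset = vaddr & OFFSET_MASK;
    const struct pte *e = &toy_table[vpn];
    if (!e->valid || (is_write ? !e->write_ok : !e->read_ok))
        return false;
    /* physical page number replaces the VPN; page offset copied unchanged */
    *paddr = (uint16_t)((e->ppn << PAGE_OFFSET_BITS) | offset);
    return true;
}
```

Reading 0x1D2 (01 1101 0010) yields 0x7D2 (111 1101 0010), while any write to the code page or any access to page 10 is rejected.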

75

slide-145
SLIDE 145

two-level page tables

two-level page table; 2^20 pages total; 2^10 entries per table

first-level page table: one entry per range of 2^10 VPNs
  for VPN 0x0-0x3FF, for VPN 0x400-0x7FF, for VPN 0x800-0xBFF, for VPN 0xC00-0xFFF, …, for VPN 0xFF800-0xFFBFF, for VPN 0xFFC00-0xFFFFF

second-level page tables: one PTE per virtual page, pointing to actual data (if PTE valid)
  PTE for VPN 0x000, PTE for VPN 0x001, PTE for VPN 0x002, PTE for VPN 0x003, …, PTE for VPN 0x3FF
  PTE for VPN 0xC00, PTE for VPN 0xC01, PTE for VPN 0xC02, PTE for VPN 0xC03, …, PTE for VPN 0xFFF

invalid entries represent big holes

first-level page table
  VPN range         valid? user? write?   physical page # (of next page table)
  0x0-0x3FF         1 1 1                 0x22343
  0x400-0x7FF       1                     0x00000
  0x800-0xBFF                             0x00000
  0xC00-0xFFF       1 1                   0x33454
  0x1000-0x13FF     1 1                   0xFF043
  …                 …                     …
  0xFFC00-0xFFFFF   1 1                   0xFF045

a second-level page table
  VPN     valid? user? write?   physical page # (of data)
  0xC00   1 1                   0x42443
  0xC01   1 1                   0x4A9DE
  0xC02   1 1                   0x5C001
  0xC03                         0x00000
  0xC04   1 1                   0x6C223
  …       …                     …
  0xFFF                         0x00000
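A hedged C sketch of a two-level lookup on a 20-bit VPN with 2^10 entries per table: the first 10 bits index the first-level table, the last 10 bits index the second-level table. As a simplification (an assumption, not how hardware works), each first-level entry stores a C pointer to its second-level table rather than a physical page number, and user/write bits are omitted. The names `first_level`, `lookup` and `map_page` are hypothetical; `map_page` allocates a second-level table only when needed, so unmapped ranges stay as holes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define LEVEL_BITS 10
#define ENTRIES_PER_TABLE (1u << LEVEL_BITS)   /* 2^10 */

struct pte {
    bool valid;
    uint32_t ppn;   /* physical page number (of data) */
};

/* First-level entry; stores a pointer to the next table (simplification). */
struct pde {
    bool valid;
    struct pte *next;
};

static struct pde first_level[ENTRIES_PER_TABLE];

/* Look up a 20-bit virtual page number; returns false for a hole. */
static bool lookup(uint32_t vpn, uint32_t *ppn) {
    assert(vpn < (1u << 20));
    uint32_t i1 = vpn >> LEVEL_BITS;               /* first bits */
    uint32_t i2 = vpn & (ENTRIES_PER_TABLE - 1);   /* second bits */
    if (!first_level[i1].valid)
        return false;   /* whole 2^10-page range unmapped */
    const struct pte *e = &first_level[i1].next[i2];
    if (!e->valid)
        return false;
    *ppn = e->ppn;
    return true;
}

/* Map one page, allocating a second-level table on demand. */
static bool map_page(uint32_t vpn, uint32_t ppn) {
    assert(vpn < (1u << 20));
    uint32_t i1 = vpn >> LEVEL_BITS;
    uint32_t i2 = vpn & (ENTRIES_PER_TABLE - 1);
    if (!first_level[i1].valid) {
        struct pte *t = calloc(ENTRIES_PER_TABLE, sizeof *t);  /* all invalid */
        if (t == NULL)
            return false;
        first_level[i1].next = t;
        first_level[i1].valid = true;
    }
    first_level[i1].next[i2] = (struct pte){ .valid = true, .ppn = ppn };
    return true;
}
```

Mapping the second-level entries shown for VPNs 0xC00, 0xC01, 0xC02 and 0xC04 allocates one table (first-level index 0xC00 >> 10 = 3); 0xC03 then misses in that table, and 0x800 misses at the first level.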

76
