Virtual Memory 3 / I/O 1

Virtual Memory 3 / I/O 1. Last time: working set and Zipf usage models; LRU page replacement; approximating LRU by sampling accessed bits or marking pages invalid. Nit: said Linux marked pages invalid to test for accesses; probably not on x86, which instead uses periodic scanning of …


  1. accounting pages: shared pages make it difficult to count memory usage. Linux cgroups accounting is last touch: count shared file pages for the process that last 'used' them, as detected by a page fault for the page.

  2. Linux cgroup limits: Linux "control groups" of processes can set memory limits for a group of processes. Low limit: don't 'steal' pages when the group uses less than this; always take pages some other group is using instead (unless there is no choice). High limit: never let the group use more than this; replace pages from this group before anything else. …

  3. Linux cgroups: Linux mechanism to separate processes into groups, for example cgroup website (webserver, webapp, …) and cgroup login (bash (shell), ls, …). Can set memory and CPU and … shares for each group.

  4. Linux cgroup memory limits: memory usage runs from 0 GB up to max (memory capacity). Below the low limit: do not take pages from this group for other groups (even if the pages were not recently used). Between the low and high limits: if other processes need memory, take from this group. Above the high limit: actively deallocate pages the cgroup is using.
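
The three regions on this slide can be sketched as a small classification function. This is only an illustration of the policy as the slide states it, not kernel code: `reclaim_class`, `group_limits`, and `classify_group` are hypothetical names invented for the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reclaim classes; illustrative names, not kernel identifiers. */
enum reclaim_class {
    RECLAIM_LAST,    /* below low limit: do not take pages unless no choice */
    RECLAIM_NORMAL,  /* between limits: take pages if other groups need memory */
    RECLAIM_FIRST    /* above high limit: actively deallocate this group's pages */
};

struct group_limits {
    size_t low;   /* low limit, in bytes */
    size_t high;  /* high limit, in bytes */
};

/* Decide how eagerly pages should be taken from a group with this usage. */
enum reclaim_class classify_group(const struct group_limits *lim, size_t usage)
{
    assert(lim->low <= lim->high);
    if (usage < lim->low)
        return RECLAIM_LAST;
    if (usage > lim->high)
        return RECLAIM_FIRST;
    return RECLAIM_NORMAL;
}
```

A page-replacement loop under these assumptions would look for victims in `RECLAIM_FIRST` groups, then `RECLAIM_NORMAL` groups, and touch `RECLAIM_LAST` groups only when nothing else is left.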

  5. recall: kernel buffering (reads): a program reads a char from the terminal via a buffer of keyboard input: when a keypress happens, the operating system reads it into the buffer, where it waits for the program. A program reads from a file via a buffer of recently read data from disk: the operating system reads a block of data from disk into the buffer.

  10. recall: kernel buffering (writes): a program prints a char to a remote machine via a buffer of output waiting for the network; the operating system sends the data when ready. A program writes a char to a file via a buffer of data waiting to be written on disk; the operating system writes a block of data to disk when ready.

  15. recall: layering: application; standard library (cout/printf, and their own buffers); system calls (read/write); kernel's file interface and kernel's buffers; device drivers; hardware interfaces.

  16. ways to talk to I/O devices: the user program uses read/write/mmap/etc. on the file interface; behind it are regular files (via filesystems, then device drivers) and device files (directly via device drivers).

  17. devices as files: talking to a device? open/read/write/close. Typically a similar interface within the kernel: the device driver implements the file interface.
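
From user space, talking to a device through this interface looks exactly like reading a regular file. The following is a minimal sketch; `read_from_device` is a hypothetical helper, and `/dev/zero` in the usage note is my choice of an always-available Linux device file, not one named in the slides.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Read up to n bytes from a device file using the ordinary file interface.
   Returns the number of bytes read, or -1 on error. */
ssize_t read_from_device(const char *path, void *buf, size_t n)
{
    assert(path && buf);
    int fd = open(path, O_RDONLY);   /* setup and access control happen here */
    if (fd < 0)
        return -1;
    ssize_t got = read(fd, buf, n);  /* kernel dispatches to the device driver */
    close(fd);
    return got;
}
```

For example, `read_from_device("/dev/zero", buf, 8)` fills `buf` with zero bytes on Linux, even though no disk file backs that path.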

  18. example device files from a Linux desktop: /dev/snd/pcmC0D0p: audio playback (configure, then write audio data). /dev/sda, /dev/sdb: SATA-based SSD and hard drive (usually accessed via a filesystem, but can mmap/read/write directly). /dev/input/event3, /dev/input/event10: mouse and keyboard (can read list of keypress/mouse movement/etc. events). /dev/dri/renderD128: builtin graphics (DRI = direct rendering infrastructure).

  19. devices: extra operations? read/write/mmap are not enough. Audio output device: set format of audio? Terminal: whether to echo back what the user types? CD/DVD: open the disk tray? is a disk present? … POSIX: ioctl (general I/O control), tcgetattr/tcsetattr (for terminal settings), …
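
As one concrete case of an "extra operation", terminal echo can be toggled with the POSIX `tcgetattr`/`tcsetattr` calls. This is a minimal sketch; `set_terminal_echo` is a hypothetical helper name.

```c
#include <assert.h>
#include <termios.h>
#include <unistd.h>

/* Turn terminal echo on (enable != 0) or off (enable == 0) for fd.
   Returns 0 on success, -1 if fd is not a terminal or a call fails. */
int set_terminal_echo(int fd, int enable)
{
    assert(fd >= 0);
    struct termios settings;
    if (tcgetattr(fd, &settings) < 0)   /* fails for files that are not terminals */
        return -1;
    if (enable)
        settings.c_lflag |= ECHO;
    else
        settings.c_lflag &= ~(tcflag_t)ECHO;
    return tcsetattr(fd, TCSANOW, &settings) < 0 ? -1 : 0;
}
```

A password prompt might call `set_terminal_echo(STDIN_FILENO, 0)` before reading and restore echo afterwards. Neither read nor write could express this, which is why the extra calls exist.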

  20. Linux example: file operations (selected subset; a table of pointers to functions): `struct file_operations { ... ssize_t (*read)(struct file *, char __user *, size_t, loff_t *); ssize_t (*write)(struct file *, const char __user *, size_t, loff_t *); ... long (*unlocked_ioctl)(struct file *, unsigned int, unsigned long); ... int (*mmap)(struct file *, struct vm_area_struct *); unsigned long mmap_supported_flags; int (*open)(struct inode *, struct file *); ... int (*release)(struct inode *, struct file *); ... };`

  21. special case: block devices: devices like disks often have a different interface which, unlike the normal file interface, works in terms of 'blocks' instead of bytes. Used by filesystems, which store directories on devices; filesystems are specialized to know disks aren't byte-based. Want to work with the page cache, where bytes are not convenient: read/write a page at a time. Read/write are implemented to use the page cache, not the device directly, with common code to translate from working with bytes to blocks.
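
The byte-to-block translation mentioned on this slide is mostly integer arithmetic. A minimal sketch, with `block_range` and `bytes_to_blocks` as hypothetical names (this is not the kernel's code):

```c
#include <assert.h>
#include <stdint.h>

struct block_range {
    uint64_t first;  /* first block touched by the byte range */
    uint64_t count;  /* number of blocks touched */
};

/* Translate a byte-oriented request (offset, length) into the range of
   fixed-size blocks that must be read or written to satisfy it. */
struct block_range bytes_to_blocks(uint64_t offset, uint64_t length,
                                   uint64_t block_size)
{
    assert(block_size > 0);
    struct block_range r = { offset / block_size, 0 };
    if (length > 0) {
        uint64_t last = (offset + length - 1) / block_size;
        r.count = last - r.first + 1;
    }
    return r;
}
```

With 4096-byte blocks, a 4000-byte request at offset 5000 starts in block 1 and ends in block 2, so two whole blocks are transferred even though fewer than 4096 bytes were asked for.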

  22. Linux example: block device operations: `struct block_device_operations { int (*open)(struct block_device *, fmode_t); void (*release)(struct gendisk *, fmode_t); int (*rw_page)(struct block_device *, sector_t, struct page *, bool); int (*ioctl)(struct block_device *, fmode_t, unsigned, unsigned long); ... };` rw_page: read/write a page for a sector number (= block number).

  23. device driver flow: "top half": runs in the thread making read/write/etc.; entered via a read/write/… system call, or gets an I/O request (page cache miss/eviction…); checks if satisfied from buffers (e.g. previous keypresses to keyboard); sends or queues the I/O operation; puts the thread to sleep (if needed). "bottom half": runs in the trap handler; gets an interrupt from the device hardware; updates buffers; stores and returns the request result; sends more to the device (if needed); wakes up the thread (if needed).

  26. xv6: device files: `struct devsw { int (*read)(struct inode*, char*, int); int (*write)(struct inode*, char*, int); }; extern struct devsw devsw[];` devsw is the table of devices; a device file uses an entry in the devsw array; the filesystem stores the name to index lookup. A similar scheme is used on 'real' Unix/Linux: files reference a major/minor device number, with a table of device numbers in the kernel.

  27. xv6: console devsw: code run at boot: `devsw[CONSOLE].write = consolewrite; devsw[CONSOLE].read = consoleread;` CONSOLE is a constant; consoleread/consolewrite run when you read/write the console.
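
The table-of-function-pointers idea can be tried out in user space. The sketch below simplifies the xv6 signatures (no `struct inode*` argument) and replaces the real console with a stand-in that remembers what was written; `fake_console_read`, `fake_console_write`, `console_init`, `dev_read`, and `dev_write` are hypothetical names, and the values of `NDEV` and `CONSOLE` are chosen for the sketch.

```c
#include <assert.h>
#include <string.h>

#define NDEV 10
#define CONSOLE 1   /* device number; value chosen for this sketch */

struct devsw {
    int (*read)(char *dst, int n);
    int (*write)(const char *src, int n);
};

static struct devsw devsw[NDEV];   /* table of devices, indexed by number */

/* Stand-in "console": remembers the last write so a read can return it. */
static char console_data[128];
static int console_len;

static int fake_console_write(const char *src, int n)
{
    assert(n >= 0);
    if (n > (int)sizeof console_data)
        n = (int)sizeof console_data;
    memcpy(console_data, src, (size_t)n);
    console_len = n;
    return n;
}

static int fake_console_read(char *dst, int n)
{
    assert(n >= 0);
    if (n > console_len)
        n = console_len;
    memcpy(dst, console_data, (size_t)n);
    return n;
}

/* Done once at "boot": fill in the table entry for the console. */
void console_init(void)
{
    devsw[CONSOLE].read = fake_console_read;
    devsw[CONSOLE].write = fake_console_write;
}

/* What a write to a device file does: look up the device number and
   call through the table. Unknown devices report an error. */
int dev_write(int major, const char *src, int n)
{
    if (major < 0 || major >= NDEV || !devsw[major].write)
        return -1;
    return devsw[major].write(src, n);
}

int dev_read(int major, char *dst, int n)
{
    if (major < 0 || major >= NDEV || !devsw[major].read)
        return -1;
    return devsw[major].read(dst, n);
}
```

The generic code in `dev_read`/`dev_write` never names the console; adding another device only means filling in another table entry.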

  30. xv6: console top half (read): `int consoleread(struct inode *ip, char *dst, int n) { ... target = n; acquire(&cons.lock); while (n > 0) { while (input.r == input.w) { if (myproc()->killed) { ... return -1; } sleep(&input.r, &cons.lock); } ... } release(&cons.lock); ... }`

  32. xv6: console top half (read): `int consoleread(struct inode *ip, char *dst, int n) { ... target = n; acquire(&cons.lock); while (n > 0) { ... c = input.buf[input.r++ % INPUT_BUF]; ... *dst++ = c; --n; if (c == '\n') break; } release(&cons.lock); ... return target - n; }`

  34. xv6: console top half: wait for the buffer to fill (no special work to request data, since keyboard input is always sent); copy from the buffer; check if done (newline or enough chars), and if not, repeat.

  36. xv6: console interrupt (one case): `void trap(struct trapframe *tf) { ... switch (tf->trapno) { ... case T_IRQ0 + IRQ_KBD: kbdintr(); lapiceoi(); break; ... } ... }` kbdintr: actually read from the keyboard device. lapiceoi: tell the CPU "I'm done with this interrupt".

  39. xv6: console interrupt reading: the kbdintr function actually reads from the device, adds data to the buffer (if room), and wakes up the sleeping thread (if any).
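
The shared buffer between the two halves is a ring indexed by ever-increasing read and write counters, as in the `input.buf[input.r++ % INPUT_BUF]` expression from the top-half slides. The sketch below is a single-threaded user-space illustration: `input_put` stands in for the bottom half and `input_read` for the top half. Both names are hypothetical, and the locking and sleep/wakeup of the real driver are left out.

```c
#include <assert.h>

#define INPUT_BUF 128

struct input_buffer {
    char buf[INPUT_BUF];
    unsigned r;  /* read index: next char the top half will take */
    unsigned w;  /* write index: next free slot for the bottom half */
};

/* Bottom half: store one char from the device if there is room.
   Returns 1 if stored, 0 if the buffer was full and the char was dropped. */
int input_put(struct input_buffer *in, char c)
{
    assert(in);
    if (in->w - in->r >= INPUT_BUF)
        return 0;
    in->buf[in->w++ % INPUT_BUF] = c;
    return 1;
}

/* Top half: copy up to n chars, stopping after a newline.
   A real driver would sleep while the buffer is empty; this sketch
   returns whatever is available instead. */
int input_read(struct input_buffer *in, char *dst, int n)
{
    assert(in && dst);
    int target = n;
    while (n > 0 && in->r != in->w) {
        char c = in->buf[in->r++ % INPUT_BUF];
        *dst++ = c;
        --n;
        if (c == '\n')
            break;
    }
    return target - n;
}
```

Because the indices only ever increase and are reduced modulo `INPUT_BUF` on use, `w - r` is always the number of unread characters, so the "is there room" and "is it empty" checks are each a single comparison.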

  40. connecting devices: the processor, actual memory, other processors…, device controllers, and other devices sit on the memory bus. A device controller has control registers, a status register (read?), and buffers/queues (write?), in front of the external hardware. Control registers have memory addresses (e.g. 0x80004800, 0x80004808, 0x80004810, …), so accessing one looks like a write to memory but actually changes a value in the device controller. Control registers might not really be registers: e.g. maybe writing to a "control register" actually just sends the value to the external hardware. Buffers/queues will also have memory addresses. The device controller has a way to send a "please interrupt" signal to the interrupt controller, the component of the processor that decides when to handle it (deals with ordering, interrupt disabling, which of several processors handles it, …, etc.).

  45. bus adaptors: the processor, actual memory, other processors…, the interrupt controller, and other devices sit on the memory bus; a bus adaptor connects the memory bus to a different bus, which holds the device controller (control registers, status, buffers/queues, external hardware) and other bus adaptors or other devices.

  46. devices as magic memory (1): devices expose memory locations to read/write; use read/write instructions to manipulate the device. Example: keyboard controller: read from a magic memory location to get the last keypress/release; reading the location clears the buffer for the next keypress/release; get an interrupt whenever there is a new keypress/release you haven't read.

  49. devices as magic memory (2): example: display controller: write pixels to a magic memory location and they are displayed on screen; other memory locations control format/screen size. Example: network interface: write to buffers, then write a "send now" signal to a magic memory location to send data; read from a "status" location and buffers to receive.

  50. what about caching? Caching "last keypress/release"? I press 'h', OS reads 'h', does that get cached? …I press 'e', OS reads what? Solution: the OS can mark memory uncachable; x86: a bit in the page table entry can say "no caching".

  53. aside: I/O space: x86 has "I/O addresses": like memory addresses, but accessed with different instructions (the in and out instructions). Historically: separate I/O bus. More recent processors/devices would just use memory addresses: no need for more instructions or buses, and there are other reasons to have devices and memory close (later).

  54. xv6 keyboard access: two control registers: KBSTATP: status register (I/O address 0x64); KBDATAP: data buffer (I/O address 0x60). `st = inb(KBSTATP); /* in instruction: read from I/O address */ if ((st & KBS_DIB) == 0) /* bit KBS_DIB indicates data in buffer? */ return -1; data = inb(KBDATAP); /* read from data --- *clears* buffer */ /* interpret data to learn what kind of keypress/release */`

  55. programmed I/O: "programmed I/O": write to or read from device buffers directly; the OS runs a loop to transfer data to or from the device. Might still be triggered by an interrupt, to know/wait for "is device ready".
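
A programmed I/O transfer loop can be sketched against a simulated device. Everything device-specific here is an assumption made so the sketch runs anywhere: `fake_device`, its register layout, the `STATUS_READY` bit, and `fake_device_step` (which plays the role of the hardware) are all hypothetical. On real hardware the status and data fields would be memory-mapped or I/O-port registers, and the wait would be a plain poll or a sleep until an interrupt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STATUS_READY 0x1u  /* hypothetical "device can accept a byte" bit */

struct fake_device {
    volatile uint8_t status;   /* stand-in for a status register */
    volatile uint8_t data;     /* stand-in for a data register */
    uint8_t received[64];      /* what the simulated device has accepted */
    size_t nreceived;
};

/* Simulated hardware step: consume the data register, become ready again. */
static void fake_device_step(struct fake_device *dev)
{
    if (!(dev->status & STATUS_READY)) {
        if (dev->nreceived < sizeof dev->received)
            dev->received[dev->nreceived++] = dev->data;
        dev->status |= STATUS_READY;
    }
}

/* Programmed I/O: the CPU itself moves every byte, waiting for the
   device to be ready before each one. */
void pio_write(struct fake_device *dev, const uint8_t *buf, size_t n)
{
    assert(dev && (buf || n == 0));
    for (size_t i = 0; i < n; i++) {
        while (!(dev->status & STATUS_READY))
            fake_device_step(dev);   /* real code would only poll or sleep */
        dev->data = buf[i];
        dev->status &= (uint8_t)~STATUS_READY;   /* device is now busy */
    }
    fake_device_step(dev);   /* let the simulated device take the last byte */
}
```

The point of the sketch is the cost model: the processor executes one loop iteration per byte transferred, which is what distinguishes programmed I/O from having the device copy data on its own.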

  56. approximating LRU: SEQ: keep an active list and an inactive list. "New" pages start in the active list. Guess that the oldest active page is inactive and move it to the inactive list. Inactive page referenced? Then it was not really inactive: move it to the active list. Otherwise it is really inactive: we know it was not referenced 'recently'. Evict the page at the bottom of the inactive list. Detecting references? Scan reference bits, or mark invalid + get fault. Extra details needed: how big is the inactive list? This is the current Linux algorithm for non-file pages.
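
The two-list scheme can be simulated in a few dozen lines. This is a toy model of the slide's description, not Linux's implementation: `lru_state`, `add_page`, `deactivate_oldest`, and `evict_page` are hypothetical names, page numbers are assumed to be small unique integers, and the `referenced` array stands in for whatever mechanism detects references (scanned accessed bits or faults on invalid pages). When to call `deactivate_oldest`, that is, how big the inactive list should be, is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_PAGES 16

struct page_list {
    int pages[MAX_PAGES];  /* index 0 = top (newest), count-1 = bottom (oldest) */
    int count;
};

struct lru_state {
    struct page_list active;
    struct page_list inactive;
    bool referenced[MAX_PAGES];  /* per-page reference bit, indexed by page number */
};

static void push_top(struct page_list *l, int page)
{
    assert(l->count < MAX_PAGES);
    for (int i = l->count; i > 0; i--)
        l->pages[i] = l->pages[i - 1];
    l->pages[0] = page;
    l->count++;
}

static int pop_bottom(struct page_list *l)
{
    assert(l->count > 0);
    return l->pages[--l->count];
}

/* "New" pages start in the active list. */
void add_page(struct lru_state *s, int page)
{
    assert(page >= 0 && page < MAX_PAGES);
    s->referenced[page] = false;
    push_top(&s->active, page);
}

/* Guess that the oldest active page is inactive: move it over and
   clear its bit so that a later reference can be detected. */
void deactivate_oldest(struct lru_state *s)
{
    int page = pop_bottom(&s->active);
    s->referenced[page] = false;
    push_top(&s->inactive, page);
}

/* Evict from the bottom of the inactive list, rescuing pages that were
   referenced after being moved there. Returns the evicted page number,
   or -1 if no inactive page is left. */
int evict_page(struct lru_state *s)
{
    while (s->inactive.count > 0) {
        int page = pop_bottom(&s->inactive);
        if (!s->referenced[page])
            return page;               /* really inactive */
        s->referenced[page] = false;
        push_top(&s->active, page);    /* not really inactive */
    }
    return -1;
}
```

In this model a page is evicted only if it went unreferenced for its whole stay on the inactive list, which is the sense in which the scheme knows the page was not referenced 'recently'.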

  63. swapping timeline: program A has a page fault. The OS needs to choose a page to replace (in this example, one of program B's pages). First step of replacement: mark the evicted page invalid in each page table (this example: only process B; real case: possibly many page tables); hopefully the copy on disk is already up-to-date? The OS starts the read; other processes can run while reading the page. The OS will get an interrupt when the disk is done; then process A's page table is updated (program A's page is loaded, program B's page is evicted) and A is restarted from the point of fault.

  68. POSIX: everything is a file: the file is one interface for devices (terminals, printers, …), regular files on disk, networking (sockets), and local interprocess communication (pipes, sockets). Basic operations: open(), read(), write(), close().

  69. the file interface: open before use (setup and access control happen here); byte-oriented (real device isn't? the operating system needs to hide that); explicit close.

  71. kernel buffering (reads): a program reads a char from the terminal via a buffer of keyboard input: when a keypress happens, the operating system reads it into the buffer, where it waits for the program. A program reads from a file via a buffer of recently read data from disk: the operating system reads a block of data from disk into the buffer.
