CS 5460: Operating Systems
Lecture 16: Page Replacement (Ch. 9)
Last Time: Demand Paging
Key idea: RAM is used as a cache for disk
– Don't give a process a page of RAM until it is needed
– When running short on RAM, take pages away from processes
– This only works if accesses to memory pages have high temporal locality
  » Why don't we care about spatial locality?

Three basic kinds of page table entries
– Valid mapping – the OS is not involved; translation is performed entirely by the CPU
– Invalid mapping – trap, then the kernel does something special, such as kill the process
– Valid but not present – trap and do demand paging

Demand paging makes the exec() system call fast
Timeline of a Page Fault
1. Trap to operating system
2. Save state in PCB
3. Vector to page fault handler
4. If invalid, send SIGSEGV
5. If valid, find or create a free page
   a. Possibly involves a disk write
6. Issue disk read for page
   a. Wait until request is queued at the disk controller
   b. Wait for seek/rotational latency
   c. Wait for data transfer (DMA)
   d. Wait for completion interrupt
7. (Optional) Schedule another process while waiting
8. Take disk interrupt
9. Update page table
10. Add process to run queue
11. Wait for process to be scheduled next
12. Restore state from PCB
13. Return from OS
14. Re-execute faulting instruction
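The shape of this path is easier to see in code. Below is a toy user-space sketch, not kernel code: the disk read is faked with a printf, eviction is a trivial FIFO choice, and every name (pte_t, find_or_free_frame, touch, ...) is hypothetical.

    /* Toy user-space sketch of the page-fault path above; not kernel
     * code. Disk I/O is faked and eviction is a trivial FIFO choice. */
    #include <stdio.h>
    #include <stdbool.h>

    #define NPAGES  8                       /* virtual pages   */
    #define NFRAMES 3                       /* physical frames */

    typedef struct { bool valid; int frame; } pte_t;

    static pte_t page_table[NPAGES];
    static int frame_owner[NFRAMES] = {-1, -1, -1};  /* -1 = free */
    static int next_victim;                 /* trivial FIFO eviction */

    static int find_or_free_frame(void) {   /* step 5 */
        for (int f = 0; f < NFRAMES; f++)
            if (frame_owner[f] == -1)
                return f;
        int f = next_victim;                /* no free frame: evict one */
        next_victim = (next_victim + 1) % NFRAMES;
        page_table[frame_owner[f]].valid = false;
        return f;                           /* (5a: write back if dirty) */
    }

    static void touch(int page) {
        if (page_table[page].valid)         /* hit: OS never involved */
            return;
        printf("page fault on page %d\n", page);                /* steps 1-3 */
        int f = find_or_free_frame();
        printf("  disk read: page %d -> frame %d\n", page, f);  /* step 6 */
        page_table[page].frame = f;         /* step 9: update page table */
        page_table[page].valid = true;
        frame_owner[f] = page;              /* steps 10-14: resume process */
    }

    int main(void) {
        int refs[] = {0, 1, 2, 0, 3, 0, 4};
        for (int i = 0; i < 7; i++)
            touch(refs[i]);
        return 0;
    }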
Effective Access Times
What is the average access latency?
– L1 cache: 2 cycles
– L2 cache: 10 cycles
– Main memory: 150 cycles
– Disk: 10 ms → 30,000,000 cycles on a 3.0 GHz processor
– Assume accesses have the following characteristics:
  » 98% handled by L1 cache
  » 1% handled by L2 cache
  » 0.99% handled by DRAM
  » 0.01% cause a page fault
– Average access latency:
  » (0.98 × 2) + (0.01 × 10) + (0.0099 × 150) + (0.0001 × 30,000,000)
    = 1.96 + 0.1 + 1.485 + 3,000 ≈ 3,000 cycles per access
Moral: Need LOW fault rates to sustain performance!
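The average above is just a probability-weighted sum over the hierarchy levels; a minimal sketch computing it with the slide's numbers:

    /* Effective access time as a probability-weighted sum of the
     * per-level latencies, using the slide's numbers. */
    #include <stdio.h>

    int main(void) {
        double frac[]   = {0.98, 0.01, 0.0099, 0.0001};  /* L1, L2, DRAM, fault */
        double cycles[] = {2.0, 10.0, 150.0, 30e6};      /* 10 ms at 3.0 GHz */
        double eat = 0.0;
        for (int i = 0; i < 4; i++)
            eat += frac[i] * cycles[i];
        printf("effective access time: %.2f cycles\n", eat);  /* ~3003.5 */
        return 0;
    }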
Issues in Demand Paging
Page selection policy
– When do we load a page?

Page replacement policy
– Which page(s) do we swap out to disk to make room for new pages?
– When do we swap pages out to disk?

How do we handle thrashing?
Page Selection Policy
Demand paging:
– Load a page in response to an access (page fault)
– The predominant selection policy

Pre-paging (prefetching):
– Predict which pages will be accessed in the near future
– Prefetch pages in advance of access
– Problems:
  » Hard to predict accurately (trace cache)
  » Mispredictions can cause useful pages to be replaced

Overlays:
– Application controls when pages are loaded/replaced
– Only really relevant now for embedded/real-time systems
Page Replacement Policies
Optimal – throw out the page used farthest in the future
Random – works surprisingly well
FIFO (first in, first out) – throw out the oldest page
LRU (least recently used) – throw out the page not used for the longest time
NRU (not recently used) – approximation to LRU → do not throw out recently used pages

How should we evaluate page replacement policies?
FIFO Page Replacement
FIFO: replace the oldest page (the one loaded first)
Example:
– Memory system with three frames → all initially free
– Reference string: A B C A B D A D B C B

Ref:      A  B  C  A  B  D  A  D  B  C  B
Frame 1:  A  A  A  A  A  D  D  D  D  C  C
Frame 2:     B  B  B  B  B  A  A  A  A  A
Frame 3:        C  C  C  C  C  C  B  B  B
Fault?    A  B  C  √  √  D  A  √  B  C  √   (letter = page fault, √ = hit)

Result: 7 page faults
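A minimal FIFO simulator reproducing the trace above; fifo_faults() is a hypothetical helper name, not something from the lecture.

    /* Count FIFO page faults for a reference string of single-letter
     * page names, evicting in load order. */
    #include <stdio.h>
    #include <string.h>

    static int fifo_faults(const char *refs, int nframes) {
        char frames[16] = {0};
        int oldest = 0, faults = 0;
        for (size_t i = 0; i < strlen(refs); i++) {
            int hit = 0;
            for (int f = 0; f < nframes; f++)
                if (frames[f] == refs[i]) { hit = 1; break; }
            if (!hit) {                       /* fault: evict the oldest page */
                frames[oldest] = refs[i];
                oldest = (oldest + 1) % nframes;
                faults++;
            }
        }
        return faults;
    }

    int main(void) {
        printf("%d faults\n", fifo_faults("ABCABDADBCB", 3));  /* prints 7 */
        return 0;
    }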
Optimal Page Replacement
Optimal: replace the page used farthest in the future
Example:
– Memory system with three frames → all initially free
– Reference string: A B C A B D A D B C B

Ref:      A  B  C  A  B  D  A  D  B  C  B
Frame 1:  A  A  A  A  A  A  A  A  A  C  C
Frame 2:     B  B  B  B  B  B  B  B  B  B
Frame 3:        C  C  C  D  D  D  D  D  D
Fault?    A  B  C  √  √  D  √  √  √  C  √   (letter = page fault, √ = hit)

(When D arrives, C is evicted because its next use is farthest away; when C returns, A and D are never used again, so either may be evicted.)

Result: 5 page faults
LRU Page Replacement
LRU: replace the least recently used page
Example:
– Memory system with three frames → all initially free
– Reference string: A B C A B D A D B C B

Ref:      A  B  C  A  B  D  A  D  B  C  B
Frame 1:  A  A  A  A  A  A  A  A  A  C  C
Frame 2:     B  B  B  B  B  B  B  B  B  B
Frame 3:        C  C  C  D  D  D  D  D  D
Fault?    A  B  C  √  √  D  √  √  √  C  √   (letter = page fault, √ = hit)

Result: 5 page faults
How would you implement…
– Random?
– FIFO?
– Optimal?
– LRU?
– NRU?
Which ones are efficient? (For LRU, see the sketch below.)
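One answer for LRU: stamp each frame with a logical clock on every access and evict the smallest stamp. It is exact, but updating state on every single reference is precisely the overhead that makes true LRU impractical, which is the next slide's point. Names here are hypothetical.

    /* Exact LRU via per-frame timestamps: refresh on every hit, evict
     * the frame with the oldest stamp on a fault. */
    #include <stdio.h>

    #define NFRAMES 3

    static char frames[NFRAMES];
    static unsigned long last_used[NFRAMES];
    static unsigned long now;

    static int lru_access(char page) {        /* returns 1 on a fault */
        int victim = 0;
        now++;
        for (int f = 0; f < NFRAMES; f++) {
            if (frames[f] == page) {          /* hit: refresh timestamp */
                last_used[f] = now;
                return 0;
            }
            if (last_used[f] < last_used[victim])
                victim = f;                   /* track least recently used */
        }
        frames[victim] = page;                /* fault: evict LRU frame */
        last_used[victim] = now;
        return 1;
    }

    int main(void) {
        int faults = 0;
        for (const char *p = "ABCABDADBCB"; *p; p++)
            faults += lru_access(*p);
        printf("%d faults\n", faults);        /* prints 5 */
        return 0;
    }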
NRU Page Replacement
Observations
– LRU is a pretty good approximation of OPT
  » Past performance is often a reasonable predictor of future performance
  » Captures "phase" behavior in many (but not all) applications
– Implementing true LRU requires far too much overhead
  » Logically, we need to update a "sort order" on every memory access

How can we approximate LRU efficiently?
– Exploit the "referenced" bit in modern page tables
– Only replace pages that have not been recently referenced (NRU)
– Periodically clear the referenced bits → this enforces "recently"
  » Optionally: maintain a recent history of referenced bits per page
  » Example: 10010101 → records whether the page was referenced in each of the last 8 sweeps (see the sketch below)
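A sketch of that per-page history byte, assuming bit 7 holds the most recent sweep (the slide does not fix a bit order); the field and function names are hypothetical.

    /* Aging-style history: each sweep shifts the byte right and records
     * the current referenced bit at the top, then clears the bit so
     * "recently" means "since the last sweep". */
    #include <stdio.h>
    #include <stdint.h>

    struct page {
        uint8_t referenced;  /* set by hardware (or a soft fault) on access */
        uint8_t history;     /* one bit per sweep, bit 7 = most recent */
    };

    static void sweep_page(struct page *p) {
        p->history = (uint8_t)((p->history >> 1) | (p->referenced << 7));
        p->referenced = 0;
    }

    int main(void) {
        struct page p = {0, 0};
        int refs[8] = {1, 0, 0, 1, 0, 1, 0, 1};  /* referenced bit per sweep */
        for (int i = 0; i < 8; i++) {
            p.referenced = (uint8_t)refs[i];
            sweep_page(&p);
        }
        printf("history: %02x\n", p.history);  /* a9 = 10101001, newest first */
        return 0;
    }

Comparing history bytes as plain integers then gives an LRU-like ordering: smaller means referenced less recently.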
NRU Page Replacement
This is a modified version of FIFO: check whether the page at the head of the FIFO queue has its referenced bit set
– Yes? Clear the bit, move the page to the back of the queue, and look at the next page
– No? Select this page

Is this fast? What is the worst case?
This is called the "second chance" algorithm
Clock Algorithm
This is basically an optimized version of second chance

Maintains a "next" pointer
– Each sweep starts where the previous one stopped
– Persists across invocations

While (need more pages)
– Check the referenced bit
– If 0 → add the page to the free pool
– If 1 → reset the bit

Between sweeps
– If a process accesses a page, its referenced bit gets set
– The TLB helps here!
[Figure: pages arranged in a circle with their referenced bits; the "next" hand sweeps around, clearing 1-bits and freeing ("Free!") pages whose bit is already 0]
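A minimal sketch of the clock sweep. In a real kernel the referenced bit lives in the hardware page table entry; here it is just a struct field, and clock_pick_victim() and the other names are hypothetical.

    /* Clock (second chance): advance the hand until a frame with
     * referenced == 0 is found, clearing bits along the way. */
    #include <stdio.h>

    #define NFRAMES 8

    struct frame {
        char page;
        int  referenced;   /* set on access, cleared by the sweep */
    };

    static struct frame frames[NFRAMES];
    static int next_hand;  /* persists across invocations of the sweep */

    static int clock_pick_victim(void) {
        for (;;) {
            struct frame *f = &frames[next_hand];
            int victim = next_hand;
            next_hand = (next_hand + 1) % NFRAMES;
            if (!f->referenced)
                return victim;      /* not recently used: evict this one */
            f->referenced = 0;      /* recently used: clear bit, move on */
        }
    }

    int main(void) {
        for (int i = 0; i < NFRAMES; i++)
            frames[i] = (struct frame){ .page = 'A' + i, .referenced = i % 2 };
        int v = clock_pick_victim();
        printf("victim: frame %d (page %c)\n", v, frames[v].page);
        return 0;
    }

The worst case is one full lap: if every bit is set, the hand clears them all and then evicts the frame where it started, which is exactly FIFO behavior for that sweep.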
BSD Page Replacement (NRU)
Goal: maintain a pool of free pages at all times
– Avoid waiting for the replacement algorithm or a disk write during a page fault
– Typical goal: ~5% of main memory in the free page pool

Sweeper process
– Privileged (kernel) process
– Scheduled whenever the free page pool drops below a threshold
  » Low watermark (start sweeping) vs. high watermark (sweep target)
– Sweeps through the list of allocated pages doing second chance
Nth Chance
Like second chance, but… (sketched below)
– If the page is referenced, clear its counter and move on
– If the page is not referenced, increment its counter
  » If the new counter == N, select this page
  » Otherwise move on
– If N is big, we have a really good LRU approximation
  » But we spend a lot of time looking for pages
– If N == 1, we have second chance
– If N == 0, we have FIFO

Lots more work exists on page replacement…
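A sketch of Nth chance as a clock sweep with a per-page pass counter, following the rules above (shown for N ≥ 1; the names are hypothetical).

    /* Nth chance: evict a page only after the hand has found it
     * unreferenced N times in a row. */
    #include <stdio.h>

    #define NFRAMES 8

    struct nframe {
        char page;
        int  referenced;
        int  passes;     /* sweeps that found this page unreferenced */
    };

    static struct nframe frames[NFRAMES];
    static int hand;

    static int nth_chance_pick_victim(int N) {
        for (;;) {
            struct nframe *f = &frames[hand];
            int victim = hand;
            hand = (hand + 1) % NFRAMES;
            if (f->referenced) {
                f->referenced = 0;        /* referenced: clear and move on */
                f->passes = 0;
            } else if (++f->passes == N) {
                f->passes = 0;
                return victim;            /* passed over N times: evict */
            }
        }
    }

    int main(void) {
        for (int i = 0; i < NFRAMES; i++)
            frames[i] = (struct nframe){ 'A' + i, 1, 0 };
        printf("victim: page %c\n", frames[nth_chance_pick_victim(2)].page);
        return 0;
    }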
Belady’s Anomaly
For some replacement algorithms
– MORE frames of main memory can lead to…
– MORE page faults!

This phenomenon is known as "Belady's Anomaly"

Example:
– FIFO replacement policy
– Reference string: A B C D A B E A B C D E
– Three frames → 9 faults
– Four frames → 10 faults!

Interesting, since we would expect that adding more memory always helps (see the demonstration below)
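The anomaly is easy to reproduce with the same FIFO-counting logic sketched earlier, run on this slide's reference string with 3 and then 4 frames:

    /* Belady's anomaly demonstrated with FIFO fault counting. */
    #include <stdio.h>
    #include <string.h>

    static int fifo_faults(const char *refs, int nframes) {
        char frames[8] = {0};
        int oldest = 0, faults = 0;
        for (size_t i = 0; i < strlen(refs); i++) {
            int hit = 0;
            for (int f = 0; f < nframes; f++)
                if (frames[f] == refs[i]) { hit = 1; break; }
            if (!hit) {
                frames[oldest] = refs[i];
                oldest = (oldest + 1) % nframes;
                faults++;
            }
        }
        return faults;
    }

    int main(void) {
        const char *refs = "ABCDABEABCDE";
        printf("3 frames: %d faults\n", fifo_faults(refs, 3));  /* 9  */
        printf("4 frames: %d faults\n", fifo_faults(refs, 4));  /* 10 */
        return 0;
    }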
Thrashing
Working set: the collection of memory currently being used by a process

If all working sets do not fit in memory → thrashing
– One "hot" page replaces another
– The percentage of accesses that generate page faults skyrockets

Typical solution: "swap out" entire processes
– The scheduler needs to get involved
– Two-level scheduling policy → runnable vs. memory-available
– Need to be fair
– Invoked when the page fault rate exceeds some bound

When the swap devices are full, Linux invokes the "OOM killer"
Global vs. Per-Process Replacement

Who should we compete against for memory?

Global replacement:
– All pages for all processes come from a single shared pool
– Advantage: very flexible → can globally "optimize" memory usage
– Disadvantages: thrashing is more likely, and it can often do just the wrong thing (e.g., replace the pages of a process that is about to be scheduled)
– Many OSes, including Linux, do this

Per-process replacement:
– Each process has a private pool of pages → competes with itself
– Alleviates inter-process problems, but not every process is equal
– Need to know the working set size of each process
– The Windows kernel does this
  » There are Win32 API calls to set a process's minimum and maximum working set sizes (example below)
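For reference, a hedged example of the Win32 call the slide alludes to. SetProcessWorkingSetSize() and GetCurrentProcess() are the real APIs; the 16 MB / 64 MB bounds are arbitrary values chosen for illustration. Windows-only.

    /* Ask the Windows kernel to bound this process's working set. */
    #include <windows.h>
    #include <stdio.h>

    int main(void) {
        SIZE_T min_ws = 16 * 1024 * 1024;   /* minimum working set: 16 MB */
        SIZE_T max_ws = 64 * 1024 * 1024;   /* maximum working set: 64 MB */

        if (!SetProcessWorkingSetSize(GetCurrentProcess(), min_ws, max_ws)) {
            fprintf(stderr, "SetProcessWorkingSetSize failed: %lu\n",
                    GetLastError());
            return 1;
        }
        printf("working set bounded to [16 MB, 64 MB]\n");
        return 0;
    }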
Important From Today
Demand paging
– What is it? What is the "effective access time"?

Page replacement policies
– Random, FIFO, Optimal, LRU, NRU, …
– Belady's anomaly

Thrashing

Global vs. local allocation
– Concept of a process's "working set"