CS5460: Operating Systems Lecture 14: Memory Management (Chapter 8)


SLIDE 1

CS5460: Operating Systems

Lecture 14: Memory Management (Chapter 8)

SLIDE 2

Important from last time

• We’re trying to build efficient virtual address spaces – Why??
• Virtual / physical translation is done by HW and SW cooperating
– We want to make the common case fast in HW
• Key insight: Treat a virtual address as a collection of separate elements (see the sketch below)
– High bits identify a segment or page
– Low bits identify offset into the segment or page
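A minimal C sketch of that split, assuming 4 KB pages (the macro and function names are illustrative, not from the lecture):

#include <stdint.h>

#define PAGE_SHIFT 12                        /* assume 4 KB pages */
#define PAGE_MASK  ((1u << PAGE_SHIFT) - 1)

/* High bits name the page; low bits locate the byte within it. */
uint32_t vpn(uint32_t va)      { return va >> PAGE_SHIFT; }
uint32_t page_off(uint32_t va) { return va & PAGE_MASK; }

For a segmented machine the same idea applies, just with a segment number in the high bits instead of a page number.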


SLIDE 3

Quick Review

• Base and bounds (check sketch below)
– Maps a single contiguous chunk of virtual address space to a single contiguous chunk of physical memory
– Fast, small HW, and safe, but inflexible and lots of external fragmentation
• Segmentation
– Small number of segment registers (base-bounds per segment)
– Each segment maps contiguous VM to contiguous physical mem
– More flexible and less fragmentation than B+B, but same issues
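A hedged sketch of the base-and-bounds check described above; the register pair and function names are assumptions for illustration:

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-process registers, loaded on context switch. */
typedef struct { uint32_t base, bounds; } bb_regs;

/* Translate va, refusing (i.e., trapping) if it is out of range. */
bool bb_translate(bb_regs r, uint32_t va, uint32_t *pa)
{
    if (va >= r.bounds)
        return false;      /* protection fault: past the bounds     */
    *pa = r.base + va;     /* one contiguous chunk: just add a base */
    return true;
}

Segmentation repeats this per segment: the high bits of va pick which (base, bounds) pair to use.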

SLIDE 4

Paging

• Modern hardware and OSes use paging
• Pages are like segments, but fixed size
– So the bounds register goes away
– External fragmentation goes away
• Since pages are small (4 or 8 KB, often) there are a lot of them
– So page table has to go into RAM
– Page table might be huge
– Accessing the page table requires a memory reference by the hardware to perform the translation (sketch below)
» First, load the page table entry
» Second, access the data the user asked for

• Today we look at how these problems are fixed
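To see where the extra memory reference comes from, here is an illustrative software model of the flat-table walk the hardware performs (the PTE layout and page_fault stub are assumptions):

#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_MASK  ((1u << PAGE_SHIFT) - 1)
#define PTE_VALID  0x1u                  /* hypothetical PTE layout */

extern uint32_t *page_table;             /* flat table in RAM, indexed by VPN */
extern uint32_t page_fault(uint32_t va); /* stands in for a trap to the OS */

uint32_t translate(uint32_t va)
{
    uint32_t pte = page_table[va >> PAGE_SHIFT];   /* memory reference #1 */
    if (!(pte & PTE_VALID))
        return page_fault(va);
    return (pte & ~PAGE_MASK) | (va & PAGE_MASK);  /* the user's own load/store
                                                      is memory reference #2 */
}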


SLIDE 5

Translation Lookaside Buffer (TLB)

• Problems
– Page table too large to store on processor die
– Two memory references per load or store issued by the user program is unacceptable
• Solution: Cache recently-used page table entries
– TLB == “translation lookaside buffer”
– TLB is a fixed-size cache of recently used page table entries
– If TLB hit rate sufficiently large → translation as fast as segments
– If TLB hit rate poor → load/store performance suffers badly
– Key issue: Access locality
– What is effective access time with/without TLB? (worked example below)
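A worked effective-access-time (EAT) example; the latencies are illustrative, not from the slides. With hit ratio h, TLB latency t_TLB, and memory latency t_mem, a hit costs one memory reference and a miss costs two (one-level table assumed):

EAT = h (t_TLB + t_mem) + (1 - h)(t_TLB + 2 t_mem)

For t_TLB = 20 ns and t_mem = 100 ns:
h = 0.98:  EAT = 0.98 × 120 + 0.02 × 220 = 122 ns
h = 0.80:  EAT = 0.80 × 120 + 0.20 × 220 = 140 ns
No TLB:    every access costs 2 × 100 = 200 ns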

SLIDE 6

Paging Unit

• CPU issues load/store; the memory management unit (MMU) then:
• 1. Compares VPN to all TLB tags
• 2. If no match, needs TLB refill
» SW → trap to OS
» HW → HW table walker
• 3. Checks if VA is valid
» If not, generate trap
• 4. Concatenates PPN to Offset to form physical address
• This all needs to be very fast
– Why?

[Diagram: the CPU core presents a virtual address (Virtual Pg# | Offset) to the TLB, a table of (Tag, PPN, State) entries; a tag match (1) yields the Phys. Pg#, which is concatenated with Offset (4) to form the physical address; no match (2) causes a trap or HW refill, and an invalid VA (3) generates a trap.]
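A sequential C model of steps 1-4 (a real TLB compares all tags in parallel in hardware; the sizes and field names here are illustrative):

#include <stdint.h>

#define TLB_ENTRIES 64                     /* illustrative size */
#define STATE_VALID 0x1u

typedef struct { uint32_t tag, ppn, state; } tlb_entry;
static tlb_entry tlb[TLB_ENTRIES];

extern uint32_t tlb_refill(uint32_t vpn);  /* SW trap to OS or HW table walker */

uint32_t tlb_translate(uint32_t va)
{
    uint32_t vpn = va >> 12, off = va & 0xFFFu;
    for (int i = 0; i < TLB_ENTRIES; i++)           /* 1. compare VPN to tags  */
        if ((tlb[i].state & STATE_VALID) && tlb[i].tag == vpn)
            return (tlb[i].ppn << 12) | off;        /* 4. concatenate PPN|off  */
    return (tlb_refill(vpn) << 12) | off;           /* 2./3. miss: refill or trap */
}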
SLIDE 7

TLB Issues

• How large should the TLB be?
• What limits TLB size?
• How does a TLB interact with context switches?


SLIDE 8

TLB Discussion

• What are typical TLB sizes and organization?
– 16-1024 entries, fully associative, sub-cycle access latency
– TLB misses take 10-100 cycles
– Modern chips have multiple levels of TLB
– Perform TLB lookup in parallel to L1 cache tag load
» Requires virtually tagged L1 cache
– Why is TLB often fully associative?
• TLB “reach”
– Amount of address space that can be mapped by TLB entries
– What are architectural trends in terms of:
» TLB size and associativity
» Main memory size
– Example: 64-entry TLB w/ 4KB pages → 256 KB reach (arithmetic below)
– Good news: 90-10 rule (90% of accesses to 10% of address space)
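The reach arithmetic behind that example, with an illustrative projection of the trend:

reach = TLB entries × page size = 64 × 4 KB = 256 KB

Even a 1024-entry TLB with 4 KB pages reaches only 4 MB, a shrinking fraction of multi-GB main memories; this is the motivation for the superpages discussed later.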

SLIDE 9

More Paging Discussion

• What are typical state bits?
– Valid bit
– Protection bits (read-only, execute-only, read-write)
– Referenced bit
– Modified bit
» Referenced and modified bits are used to support “demand paging”
– … (real PTEs tend to have more; an illustrative layout follows)
• What happens during a context switch?
– PCB needs to be extended to contain (or point to) page table
– Save/restore page table base register (for hardware-filled TLB)
– Flush TLB (if PID not in tag field)
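One illustrative way to pack these state bits into a C struct; real PTE formats (x86, for example) differ in field order and width:

#include <stdint.h>

typedef struct {
    uint32_t valid      : 1;   /* is the mapping present?               */
    uint32_t prot       : 2;   /* read-only / execute-only / read-write */
    uint32_t referenced : 1;   /* set by HW on access (demand paging)   */
    uint32_t modified   : 1;   /* set by HW on write ("dirty")          */
    uint32_t ppn        : 20;  /* physical page number                  */
} pte_t;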

SLIDE 10

Page Table Organization

• Simple flat full table
– Simple, fast lookups
– Huge!!!
• Hierarchical
– Two or more levels of tables
» Leaf tables have PTEs
– Fairly simple, fairly fast
– Size roughly proportional to allocated memory size
• Inverted
– One entry per physical page
– Entry contains VPN and ASID
– Small table size
– Lookups more expensive
» Hashing helps (lookup sketch below)
» Poor memory locality

[Diagrams: hierarchical: a Root Page Table in main memory points to leaf tables whose PTEs carry Valid, R/W, … bits. Inverted: hash(PID | Page#) starts a search of the Inverted Page Table; the matching entry n yields PPN(n), which is concatenated with Offset to form the physical address.]
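A hedged sketch of the inverted-table lookup; the hash function and chaining scheme are assumptions:

#include <stdint.h>

#define IPT_SIZE 4096                    /* one entry per physical frame */

typedef struct { uint32_t pid, vpn; int next; } ipt_entry;  /* next: chain, -1 ends it */
static ipt_entry ipt[IPT_SIZE];
static int bucket[IPT_SIZE];             /* hash value -> first frame in chain */

static unsigned ipt_hash(uint32_t pid, uint32_t vpn)
{
    return (pid ^ (vpn * 2654435761u)) % IPT_SIZE;   /* any decent hash */
}

/* Returns the frame number (the entry's own index), or -1 to fault. */
int ipt_lookup(uint32_t pid, uint32_t vpn)
{
    for (int f = bucket[ipt_hash(pid, vpn)]; f >= 0; f = ipt[f].next)
        if (ipt[f].pid == pid && ipt[f].vpn == vpn)
            return f;                    /* the PPN is the entry's position */
    return -1;                           /* miss: page fault */
}

Chasing the chain is what makes lookups more expensive, and each hop may touch a different cache line, hence the poor memory locality.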

SLIDE 11

[Diagram: two-level paging. The virtual address splits into outer page number p1, inner page number p2, and page offset d; p1 indexes the outer page table, whose entry locates an inner page table; p2 indexes that table to find the desired page, and d selects the byte within it.]
SLIDE 12

Multi-level page tables

• Why does this save space? (see the walk sketch below)
• Early virtual memory systems used a single-level page table
• VAX, 386 used 2-level page table
• SPARC uses 3-level
• Alpha, IA-64, x86-64 have 4 levels
– In many OSes, page tables other than top-level can themselves be paged
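An illustrative two-level walk for a 32-bit address split 10/10/12; treating the outer entry's frame field as a directly usable pointer is a simplification:

#include <stdint.h>

#define P1(va)  ((va) >> 22)              /* 10-bit outer index */
#define P2(va)  (((va) >> 12) & 0x3FFu)   /* 10-bit inner index */
#define OFF(va) ((va) & 0xFFFu)           /* 12-bit page offset */
#define VALID   0x1u

extern uint32_t *outer_table;             /* root of the hierarchy */
extern uint32_t page_fault(uint32_t va);

uint32_t walk(uint32_t va)
{
    uint32_t e = outer_table[P1(va)];
    if (!(e & VALID))                     /* whole inner table absent: this is  */
        return page_fault(va);            /* where unallocated regions cost nothing */
    uint32_t *inner = (uint32_t *)(uintptr_t)(e & ~0xFFFu);
    uint32_t pte = inner[P2(va)];
    if (!(pte & VALID))
        return page_fault(va);
    return (pte & ~0xFFFu) | OFF(va);
}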


SLIDE 13

Sharing

• How do you share memory?
– Entries in different process page tables map to same PPN
• Questions:
– What does this imply about granularity of sharing?
– Can we share only a 4-byte int?
– Can we set it up so that only one of the sharers can write the region?
– How can we use this idea to implement copy on write? (sketch below)

[Diagram: the page tables for P1 and P2 each map virtual pages VP0-VP5; several entries in the two tables point to the same physical pages, so sharing happens at page granularity.]
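A hedged sketch of the copy-on-write fault path; the flag names and helper functions are hypothetical:

#include <stdint.h>
#include <string.h>

#define PROT_RW 0x2u
#define PTE_COW 0x4u                       /* hypothetical "copy on write" flag */

extern uint32_t alloc_frame(void);         /* returns a free PPN (stub)    */
extern void    *frame_addr(uint32_t ppn);  /* kernel mapping of that frame */

/* Both sharers map the frame read-only with PTE_COW set; the first
 * write traps here, and the writer gets a private, writable copy. */
void cow_fault(uint32_t *pte)
{
    if (!(*pte & PTE_COW))
        return;                            /* genuine protection violation */
    uint32_t old_ppn = *pte >> 12;
    uint32_t new_ppn = alloc_frame();
    memcpy(frame_addr(new_ppn), frame_addr(old_ppn), 4096);
    *pte = (new_ppn << 12) | PROT_RW;      /* writable now, COW flag cleared */
    /* a real OS would also drop a reference count on the old frame */
}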

SLIDE 14

Initializing an Address Space

• Determine number of pages needed
– Examine header information in executable file
• Determine virtual address layout
– Header determines size of various segments
– OS specifies positioning (e.g., base of code, location of stack)
• Allocate necessary pages and initialize contents
– Initialize page table entry to contain VPN→PPN, set valid, …
• Mark current TLB entries as invalid (i.e., TLB flush)
• Start process
– PC should point at valid address in code segment
– Allocate stack space as process touches new pages

SLIDE 15

Superpages

• Problem: TLB reach shrinking as % of memory size
• Solution: Superpages
– Permit pages to have variable size (back to segments?)
– For simplicity, restrict generality:
» Power of two region sizes
» Aligned to superpage size (e.g., 1MB superpage aligned on 1MB boundary)
» Contiguous
• Problem: Restrictions limit applicability. How?

[Diagram: TLB entry extended with a Size field alongside Tag, PPN, and State; the VPN/Offset split now depends on the entry's size. A matching sketch follows.]
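An illustrative match against a variable-size TLB entry; the trick is masking off the low size bits before comparing (the entry layout, with tag and ppn held as full aligned addresses, is an assumption):

#include <stdbool.h>
#include <stdint.h>

typedef struct { uint32_t tag, ppn, size; } sp_tlb_entry;  /* size = log2(bytes) */

bool sp_match(sp_tlb_entry e, uint32_t va, uint32_t *pa)
{
    uint32_t mask = ~((1u << e.size) - 1);   /* works because regions are
                                                power-of-two sized and aligned */
    if ((va & mask) != (e.tag & mask))
        return false;                        /* a 1 MB page compares fewer bits */
    *pa = (e.ppn & mask) | (va & ~mask);     /* bigger page, bigger offset */
    return true;
}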

SLIDE 16

Example: Superpage Usage

[Diagram: virtual pages 0x00004000-0x00007000 map to physical pages 0x80240000-0x80243000 through a single page table entry: virtual = 00004, physical = 80240, size = 002, valid = Y, dirty = Y, with read-only and supervisor access bits.]

Size: denotes number of base pages in superpage (2^size)
Virtual and physical ranges aligned on superpage boundary
Both virtual and physical ranges are contiguous

SLIDE 17

Superpage Discussion

• What are good candidates for superpages?
– Kernel – or at least the portions of kernel that are not “paged”
– Frame buffer
– Large “wired” data structures
» Scientific applications being run in “batch” mode
» In-core databases
• How might OS exploit superpages?
– Simple: Few hardwired regions (e.g., kernel and frame buffer)
– Improved: Provide system calls so applications can request it
– Holy grail: OS watches page access behavior and determines which pages are “hot” enough to warrant superpaging
• Why might you not want to use superpages?

SLIDE 18

Paged Segmentation

• Problem with paging:
– Large VA space → large page table
• Idea: Combine segmentation and paging
– Divide address space into large contiguous segments
– Divide segments into smaller fixed-size pages
– Divide main memory into pages
– Two-stage address translation (sketch below)
• Benefits:
– Reduced page table size
– Can share segments easily
• Problems:
– Extra complexity

[Diagram: virtual address = Seg# | Page# | Offset. Seg# selects (base, limit) from the segment table; a limit check (>?) traps on overflow; the segment's page table maps Page# to (frame#, prot), and frame# is concatenated with Offset to form the physical address.]
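An illustrative two-stage translation with a 4-bit Seg#, 16-bit Page#, and 12-bit Offset; the split and table layout are assumptions:

#include <stdint.h>

typedef struct { uint32_t *page_table; uint32_t limit; } seg_entry;
extern seg_entry seg_table[16];          /* small enough for MMU registers */
extern uint32_t  trap(uint32_t va);

uint32_t ps_translate(uint32_t va)
{
    uint32_t seg  = va >> 28;            /* stage 1: pick the segment */
    uint32_t page = (va >> 12) & 0xFFFFu;
    if (page >= seg_table[seg].limit)
        return trap(va);                 /* bounds check on the segment */
    uint32_t frame = seg_table[seg].page_table[page];  /* stage 2: page lookup */
    return (frame << 12) | (va & 0xFFFu);
}

Note that the page table indexed in stage 2 only needs entries up to the segment's limit, which is how this scheme shrinks page tables.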

SLIDE 19

Paged Segmentation

• Sharing:
– Share individual pages → copy page table entries
– Share entire segments → copy segment table entries
• Protection:
– Can be associated with either segment or page tables
• Implementation
– Segment tables in MMU
– Page tables in main memory
• Practice
– x86 supports pages & segments
– RISCs support only paging

[Diagram: sharing example with two processes. Each virtual memory contains Seg0-Seg3; the two common segments (0 & 3) map to the same regions of physical memory.]

SLIDE 20

Bringing It All Together

• Paging is major improvement over segmentation
– Eliminates external fragmentation problem (no compaction)
– Permits flexible sharing between processes
– Enables processes to run when only partially loaded (more soon!)
• Significant issues remain
– Page tables are much larger than segment tables
– TLB required → effectiveness seriously impacts performance
– More complex OS routines for managing page table
» But management of physical address space much easier
– Translation overheads tend to be higher
• Question: How can we let processes run when only some of their pages are present in main memory?