1
CSCI 350
- Ch. 8 – Address Translation
Mark Redekopp Michael Shindler & Ramesh Govindan
2
Abstracting Memory
– Thread = Abstraction of the processor
– What about abstracting memory?
– "All problems in computer science can be solved by another level of indirection"
[Figure: computer organization. Hardware: processor, memory, input devices, output devices. Software: program and data.]
3
4
[Figure: the processor core issues virtual addresses; the Translation Unit / MMU (Mem. Mgmt. Unit) converts them to physical addresses, which are sent to memory along with the data. Several fictitious virtual address spaces (0x00000000 to 0xffffffff, mostly unused) map onto frames of the physical memory and address space (0x00000000 to 0x3fffffff, plus an I/O and unused area), backed by secondary storage.]
5
6
7
8
– Virtual address space from VA: 0 to N-1
– Physical address space from PA: BASE to BASE+N-1
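[Figure: base-and-bound translation. The CPU (registers esp, ebx, eip, eax) issues VA 0x02000. The Translation Unit / MMU compares the VA against the bound register (0x05000), raising an exception if it is out of range, and adds the base register (0x14000) to form PA 0x16000. P1's physical address space runs from Base: 0x14000 to Base + Bound: 0x19000; P2's physical address space lies elsewhere in memory.]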
and checking hardware are termed the MMU (Mem. Mgmt. Unit) or Translation Unit
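The base-and-bound check can be sketched in a few lines of Python. This is only an illustration using the figure's values (base 0x14000, bound 0x05000); the function and exception names are made up for this sketch, and the comparison assumes valid virtual addresses run from 0 to bound - 1.

```python
class BoundsViolation(Exception):
    """Stands in for the hardware exception raised on an out-of-bounds access."""


def translate_base_bound(va, base=0x14000, bound=0x05000):
    """Translate a virtual address using the base and bound values from the figure."""
    # Valid virtual addresses run from 0 to bound - 1 (assumed).
    if va >= bound:
        raise BoundsViolation(f"VA {va:#x} is outside the bound {bound:#x}")
    # Relocation: the physical address is simply base + virtual address.
    return base + va


print(hex(translate_base_bound(0x02000)))  # 0x16000
```

Every access pays one compare and one add, which is why this scheme is simple enough to build directly into hardware.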
9
– Simple
– Provides isolation amongst processes
– No easy way to share data
– Can't enforce access rights within the process (e.g. code = read only)
10
– Code, data, stack, heap
[Figure: segmented translation. The upper bits of VA 0x102000 select segment 1 of the descriptor table (selector ss = 1:1:3); the offset 0x02000 is checked against that segment's bound (exception if exceeded) and added to its base to give PA 0x16000.]
Descriptor table (Base, Bound, Access):
– Data Seg.: 0x2a000, 0x03200, R/W (Base + Bound: 0x2d200)
– Stack Seg.: 0x14000, 0x05000, R/W (Base + Bound: 0x19000)
– Code Seg.: 0x08000, 0x0400, R (Base + Bound: 0x08400)
Selector format: 13 bits = Seg., 1 bit: 1 = LDT / 0 = GDT, 2 bits: 0-3 = RPL
http://ece-research.unm.edu/jimp/310/slides/micro_arch2.html
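A minimal sketch of the segmented lookup, assuming the segment number sits above a 20-bit offset (so VA 0x102000 is segment 1, offset 0x02000). The base / bound / access values come from the figure; which table index holds the code and data descriptors is an assumption, and all names are hypothetical.

```python
OFFSET_BITS = 20  # assumed split: VA 0x102000 = segment 0x1, offset 0x02000

# segment number -> (base, bound, access rights)
# Index 1 = stack matches the figure; indices 0 and 2 are assumed.
DESCRIPTOR_TABLE = {
    0: (0x08000, 0x0400, "r"),    # code segment, read only
    1: (0x14000, 0x05000, "rw"),  # stack segment
    2: (0x2A000, 0x03200, "rw"),  # data segment
}


class SegmentFault(Exception):
    """Stands in for the exception raised by the translation hardware."""


def translate_segmented(va, access="r"):
    segment = va >> OFFSET_BITS
    offset = va & ((1 << OFFSET_BITS) - 1)
    if segment not in DESCRIPTOR_TABLE:
        raise SegmentFault(f"no descriptor for segment {segment}")
    base, bound, rights = DESCRIPTOR_TABLE[segment]
    if offset >= bound:          # bounds check, per segment
        raise SegmentFault(f"offset {offset:#x} exceeds bound {bound:#x}")
    if access not in rights:     # access-rights check, per segment
        raise SegmentFault(f"{access!r} access not allowed on segment {segment}")
    return base + offset


print(hex(translate_segmented(0x102000, "w")))  # 0x16000
```

Unlike a single base and bound, each segment carries its own access rights, so a write to the code segment can be rejected while writes to the stack succeed.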
11
entries in the segment descriptor
– Shared code / libraries and/or data
– One code segment for multiple instances of a running program
– Translation gives us a chance to intervene
12
– As a system runs and segments are created and deleted, the physical address space may develop many small, unusable gaps (external fragmentation), and we effectively lose that memory for use
13
region
descriptors for each process
– Pointed to by GDTR / Interrupt descriptor table pointed to by IDTR
LDT for each segment register (CS, DS, SS, ES, FS, GS)
[Figure: x86 segment translation with descriptor tables in memory. GDTR, IDTR, and the LDT / TR registers locate the descriptor tables (LDT1 and the GDT, at 0xc0000 and 0xc4000 in the figure), each with entries 0 to 8191. Selectors cs = 0:1:3, ss = 1:1:3, and ds = 2:1:3 pick entries whose Base / Bound / Access values are held in an on-chip descriptor cache for CS, SS, and DS. As before, VA 0x102000 translates to PA 0x16000 in the stack segment.]
http://ece-research.unm.edu/jimp/310/slides/micro_arch2.html
14
[Figure: two processes (Process 1, Process 2) with separate local descriptor tables (LDT1, LDT2) alongside the GDT. One table holds descriptors 0x7a000 / 0x03200 / R/W and 0x14000 / 0x05000 / R/W; the other holds 0x7e000 / 0x01000 / R/W and 0x6e000 / 0x05000 / R/W. Both tables contain the same code descriptor, 0x08000 / 0x0400 / R, so P1 and P2 share one physical code segment (P1/P2 Code, Base: 0x08000) while keeping separate stack and data segments.]
http://ece-research.unm.edu/jimp/310/slides/micro_arch2.html
15
– Generally going to "exec" soon afterward
mark all entries read/only
copy of the segment for the child (aka Copy-On-Write)
avoiding the copy
– Start by doing minimal required work; do more work when required
[Figure: segment tables after a fork. The parent process table and the forked process table hold the same Base / Bound entries (0x2a000 / 0x03200, 0x14000 / 0x05000, 0x08000 / 0x0400), and every entry in both tables is marked read only (R), so both processes initially share the same stack, data, and code segments.]
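Copy-on-write at fork can be sketched as a toy model. Everything here is hypothetical (the `Segment` and `Process` classes, the `cow` flag, the dictionary layout); a real kernel would also track reference counts so the last sharer can skip the copy.

```python
class Segment:
    def __init__(self, data):
        self.data = bytearray(data)


class Process:
    def __init__(self, table):
        # name -> {"seg": Segment, "writable": bool, "cow": bool}
        self.table = table


def fork(parent):
    """Share every segment with the child; mark writable ones read-only + COW."""
    child_table = {}
    for name, entry in parent.table.items():
        if entry["writable"]:
            entry["writable"] = False
            entry["cow"] = True
        child_table[name] = dict(entry)  # same Segment object, same flags
    return Process(child_table)


def write(proc, name, index, value):
    entry = proc.table[name]
    if not entry["writable"]:
        if not entry["cow"]:
            raise PermissionError(f"segment {name!r} is read only")
        # The "fault": only now make a private copy, then allow the write.
        entry["seg"] = Segment(entry["seg"].data)
        entry["writable"], entry["cow"] = True, False
    entry["seg"].data[index] = value
```

A child that calls exec right after fork never writes, so in this model no segment is ever copied for it.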
16
library functions (think printf, strcpy, memset, etc.)
– Problem: Compiled progs. won't know where the library code will be when compiled
– Solution: Compiler generates code that looks up every call to a shared function in a table; the compiler generates the table, but the loader fills it in with the actual library locations
– Map a segment to describe that shared code area
[Figure: P1 and P2 have separate code, stack, and data segments, but the segment tables reached through LDT1 and LDT2 both map the same shared library segment (Shared Lib., Base: 0x08000, read only). Each process also has a lookup table of library entry points, e.g. printf 0x400 and strcpy 0x640.]
17
18
– No unused "gaps" in memory
A physical frame of memory can hold data from any virtual page. Since all pages are the same size, any page can go in any frame (and be swapped in or out as needed).
[Figure: pages of a (mostly unused) virtual address space mapped onto frames of the physical memory and address space (0x00000000 to 0x3fffffff, plus an I/O and unused area up to 0xffffffff).]
19
(several KB) to amortize the large access time
unit = MMU (Mem. Management Unit)
up translations from VPN to PPFN
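[Figure: translation process (MMU + page table). Virtual address: bits 31-12 = virtual page number (20 bits), bits 11-0 = offset within page (12 bits). Physical address: bits 31-30 = 00, bits 29-12 = physical page frame number (18 bits), bits 11-0 = offset within page, copied unchanged from the virtual address.]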
20
– Let us assume 1000 contacts represented by a 3-digit integer (0-999) by the cell phone (this ID can be used to look up their names)
– We want to use a simple Look-Up Table (LUT) to translate phone numbers to contact ID's. How shall we organize/index our LUT?
Three ways to organize the LUT (example number: 213-745-9823):
1. LUT indexed w/ contact ID (000 to 999, each entry holding a phone #): O(n) - Doesn't work. We are given the phone # and need to translate to an ID (up to 1000 accesses).
2. Sorted LUT indexed w/ used phone #'s: O(log n) - Could work. Since it's in sorted order we could use a binary search (log2(1000), about 10 accesses).
3. LUT indexed w/ all possible phone #'s (000-000-0000 to 999-999-9999, unused entries null): O(1) - Could work. Easy to index & find but LARGE (1 access).
21
– VPN (upper bits)
– Page offset: Based on page size (i.e. 0 to 4K-1 for 4KB page)
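[Figure: single-level page table in memory, located by the PTBR (Page Table Base Reg.). The 20-bit Virtual Page Number (VA bits 31-12) indexes the table; each entry holds a Valid / Present bit and an 18-bit Page Frame Number, which forms PA bits 29-12 (bits 31-30 = 00). The offset w/in page (bits 11-0) is copied unchanged from VA to PA.]
Page Table Size = 2^20 entries * 19 bits = approx. 2^20 * 4 bytes = 4MB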
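The size estimate can be reproduced with a few lines of arithmetic. This sketch assumes a 32-bit virtual address, 4KB pages, and entries rounded up to 4 bytes; the variable names are only illustrative.

```python
VA_BITS = 32           # width of a virtual address
PAGE_OFFSET_BITS = 12  # 4KB pages
ENTRY_BYTES = 4        # 19 bits of payload rounded up to a 4-byte entry

num_entries = 1 << (VA_BITS - PAGE_OFFSET_BITS)  # one entry per virtual page
table_bytes = num_entries * ENTRY_BYTES

print(num_entries, table_bytes // 2**20)  # 1048576 entries, 4 (MB)
```

Note that this is the cost per process, whether or not most of the virtual address space is actually used.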
22
address space and thus needs its own page table
reload the PTBR
[Figure: per-process page tables. The page tables PT1 (Process 1) and PT2 (Process 2) live in memory at 0xc4000 and 0xd0000; PTBR/CR3 holds 0xc4000 while Process 1 runs. Process 1 page table: VPN 0 -> 0x7e000 (R/W), VPN 1 -> 0x6e000 (R/W), VPN 2 -> 0x08000 (R). Example translations: VA 0x001040 = VPN 0x001, offset 0x040 -> PPFN 0x6e000 -> PA 0x6e040; VA 0x002eac = VPN 0x002, offset 0xeac -> PPFN 0x08000 -> PA 0x08eac. Physical memory holds the code, data, and stack pages of both processes (Code 1.1, Code 1.2, Data 1.1, Stack 1.1, Code 2.1, Data 2.1, Stack 2.1).]
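The two example translations can be checked with a small sketch. The table contents are read off the figure for Process 1; the dictionary representation, the exception class, and the function name are assumptions made for illustration.

```python
PAGE_OFFSET_BITS = 12
OFFSET_MASK = (1 << PAGE_OFFSET_BITS) - 1

# Process 1 page table: VPN -> (frame start address, access rights)
PAGE_TABLE_P1 = {
    0x000: (0x7E000, "rw"),
    0x001: (0x6E000, "rw"),
    0x002: (0x08000, "r"),
}


class PageFault(Exception):
    """Raised when no valid translation exists for the page."""


def translate(va, page_table, access="r"):
    vpn = va >> PAGE_OFFSET_BITS   # upper bits select the page table entry
    offset = va & OFFSET_MASK      # lower bits pass through unchanged
    entry = page_table.get(vpn)
    if entry is None:
        raise PageFault(f"no mapping for VPN {vpn:#x}")
    frame, rights = entry
    if access not in rights:
        raise PermissionError(f"{access!r} access denied for VPN {vpn:#x}")
    return frame | offset


print(hex(translate(0x001040, PAGE_TABLE_P1)))  # 0x6e040
print(hex(translate(0x002EAC, PAGE_TABLE_P1)))  # 0x8eac
```

A context switch in this model amounts to passing a different table to `translate`, which is the role of reloading the PTBR.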
23
requires 1 access) but somehow reduce the size especially since much of it is unused?
area codes.
– 1st level LUT is indexed on area code and contains pointers to 2nd level tables
– 2nd level LUT's are indexed on local phone numbers and contain contact ID entries
[Figure: the flat LUT indexed w/ all possible phone #'s (000-000-0000 to 999-999-9999) next to the two-level version. 1st level index = area code (000 to 999; unused entries null; 213 and 323 point to 2nd level tables). 2nd level index = local phone # (000-0000 to 999-9999) within the 213 table and the 323 table.]
If only 2 area codes are used then only 1000 + 2(10^7) entries are needed rather than 10^10 entries
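The entry counts for the flat and two-level phone-number tables can be checked directly. The constant and function names here are hypothetical.

```python
AREA_CODES = 10**3     # 1st level: one entry per possible area code
LOCAL_NUMBERS = 10**7  # 2nd level: one entry per possible 7-digit local number


def two_level_entries(used_area_codes):
    """Entries needed when only the used area codes get a 2nd level table."""
    return AREA_CODES + used_area_codes * LOCAL_NUMBERS


flat_entries = AREA_CODES * LOCAL_NUMBERS

print(flat_entries)          # 10000000000
print(two_level_entries(2))  # 20001000
```

The saving comes entirely from not allocating 2nd level tables for unused area codes; if every area code were in use, the two-level scheme would be slightly larger than the flat one.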
24
– 1st Level = Area code and pointers to 2nd level tables
– 2nd Level = First 3-digits of local phone and pointers to 3rd level tables
– 3rd Level = Contact ID's
[Figure: three-level LUT. The 1st level is indexed by area code (000 to 999; 213 and 323 point to 2nd level tables, other entries null). Each 2nd level table is indexed by the first 3 digits of the local phone # (e.g. 745 in the 213 table, 823 in the 323 table, other entries null). Each 3rd level table is indexed by the remaining digits (e.g. 9823 in the 213-745 table, 7104 in the 323-823 table) and holds contact ID's.]
25
third level table for just this entry
26
[Figure: two-level page table walk. PDBR/CR3 = 0xd0000 points to the page directory (PD), whose entries (e.g. 0x000c4, 0x000e8) are physical pointers to second-level page tables PTx and PTy (at 0xc4000 and 0xe8000), each with entries 0 to 1023 holding a PPFN and access rights. VA 0x001040 splits into PDIdx 0x000, PTIdx 0x001, and offset 0x040; the selected page table entry 0x0006e (R/W) gives PPFN 0x6e000, so PA = 0x6e040. Other page table entries shown: 0x0007e (R/W), 0x00008 (R), 0x0007a (R), 0x00041 (R/W).]
27
[Figure: two-level page table. Virtual address: bits 31-22 = level 1 index (10 bits), bits 21-12 = level 2 index (10 bits), bits 11-0 = offset w/in page. Level 1 entries hold pointers to the start of 2nd level tables (entries 0 to 1023 each); 2nd level entries hold PPFN's of frames in physical memory.]
Unused entries can store a NULL pointer (dark shaded entries in the figure).
"Swapping" can be performed if no physical frames are available when a new page needs to be allocated: we simply swap a page of data out to disk to free up its physical frame, and retrieve it when necessary. A secondary data structure (not shown) can be maintained to record where each swapped-out page's data resides on disk.
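A two-level walk with a 10 / 10 / 12 bit split might look like the following sketch. Physical memory is modeled as a dictionary from table start address to table contents; the specific entries are assumptions loosely based on the example values (page directory at 0xd0000, page tables at 0xc4000 and 0xe8000), and missing keys play the role of NULL entries.

```python
PDBR = 0xD0000  # physical address of the page directory (value of PDBR/CR3)

# Assumed contents; absent keys stand for NULL (unused) entries.
MEMORY = {
    0xD0000: {0x000: 0xC4000, 0x001: 0xE8000},                  # page directory
    0xC4000: {0x000: 0x7E000, 0x001: 0x6E000, 0x002: 0x08000},  # page table
    0xE8000: {0x000: 0x7A000, 0x3FF: 0x41000},                  # page table
}


class PageFault(Exception):
    pass


def walk(va):
    pd_idx = (va >> 22) & 0x3FF  # bits 31-22: level 1 index
    pt_idx = (va >> 12) & 0x3FF  # bits 21-12: level 2 index
    offset = va & 0xFFF          # bits 11-0: offset within page

    pt_base = MEMORY[PDBR].get(pd_idx)   # 1st memory access
    if pt_base is None:
        raise PageFault("no 2nd level table for this range")
    frame = MEMORY[pt_base].get(pt_idx)  # 2nd memory access
    if frame is None:
        raise PageFault("page not mapped")
    return frame | offset


print(hex(walk(0x001040)))  # 0x6e040
```

The price of the smaller table is visible in the code: each translation now needs two dependent memory accesses instead of one.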
28
[Figure: page table entry fields: Page Frame Number, Valid / Present, Modified / Dirty, Referenced, Protection, Cacheable.]
29
– Record the offending address and generate a page fault exception
– Pick an empty frame or select a page to evict
– Write back the evicted page if it has been modified
– Bring in the desired page and update the page table
– Restart the offending instruction
– Allocate a new page, zero it out, perform copy-on-write, etc.
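The page fault steps above can be sketched as a toy pager. This is a simulation under stated assumptions: the `Pager` class is hypothetical, eviction is FIFO (the slides do not fix a policy here), and "disk" is just a list recording which pages were written back.

```python
class Pager:
    def __init__(self, num_frames):
        self.free_frames = list(range(num_frames))
        self.page_table = {}   # vpn -> {"frame": int, "dirty": bool}
        self.load_order = []   # resident pages, oldest first (FIFO, assumed)
        self.disk_writes = []  # vpns written back; stands in for disk I/O

    def handle_page_fault(self, vpn):
        if self.free_frames:
            frame = self.free_frames.pop()        # pick an empty frame
        else:
            victim = self.load_order.pop(0)       # or select a page to evict
            entry = self.page_table.pop(victim)
            if entry["dirty"]:
                self.disk_writes.append(victim)   # write back if modified
            frame = entry["frame"]
        # Bring in the desired page and update the page table.
        self.page_table[vpn] = {"frame": frame, "dirty": False}
        self.load_order.append(vpn)

    def access(self, vpn, write=False):
        if vpn not in self.page_table:
            self.handle_page_fault(vpn)  # afterwards the access is restarted
        entry = self.page_table[vpn]
        entry["dirty"] = entry["dirty"] or write
        return entry["frame"]
```

With two frames, touching pages 0 (write), 1, 2 evicts page 0 and records one write-back; evicting the clean page 1 afterwards costs no disk write.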
30
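– Internal nodes contain pointers to other page tables
– Leaves hold actual translations
[Figure: three-level page table walk. The virtual address is split into Idx1 (7 bits), Idx2 (7 bits), Idx3 (6 bits), and a 12-bit page offset (0x040 in the example). PDBR/CR3 = 0xd0000 gives the level 1 (PD) start address; the selected level 1 entry gives the start address of a level 2 table (PT2[]), whose selected entry gives the start address of a level 3 table (PT3[]). Translations live in level 3.]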
31
[Figure: the same three-level walk, with used and unused pages scattered through both the virtual and the physical address space.]
Notes:
– Physical address space: pages can be anywhere.
– Virtual address space may be "sparse". In that case any PT entry can be null, indicating no translations (and no page tables are needed for those address ranges).
entries
32
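[Figure: the processor core's virtual address passes first through segment translation (bounds checks) and then through paging to produce the physical address sent to memory along with the data.]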
33
[Figure: two-level paging walk. PDBR/CR3 = 0xd0000 locates the level 1 table (page directory, PD, at 0xd0000). The VPN is split into a 10-bit PDIdx and a 10-bit PTIdx, followed by a 12-bit offset (0x040). The PD entry selects a level 2 page table (PTx or PTy), whose entry gives PPFN 0x6e000; PA = 0x6e000 + 0x040 = 0x6e040.]
34
[Figure: virtual address = Index 1 (8 bits), Index 2 (6 bits), Index 3 (6 bits), and offset w/in page, combined with a process ID. The MMU holds a 4096-entry context table (entries 0 to 4095, one entry per context/process; essentially a PTBR for each process). Context Table -> First Level (2^8 * 4 bytes) -> Second Level (2^6 * 4 bytes) -> Third Level (2^6 * 4 bytes) -> PPFN of the 4K page holding the desired word.]
35
[Figure: the page tables of P1 and P2 mapping their pages into physical memory.]
36
be combined
page table
– PT for Code, Data, Stack, etc.
VPN, and offset
[Figure: segmentation combined with paging. The virtual address is split into segment number, VPN, and offset (e.g. VA 0x102040 = segment 0x1, VPN 0x02, offset 0x040). Each descriptor table entry holds a PTBase, a Bound (# pages), and access rights (entries shown: 0xcc000 / 0x18 / R/W, 0xc4000 / 0x03 / R/W, 0xc8000 / 0x08 / R/W). The VPN is checked against the bound (exception if exceeded) and then indexes that segment's page table (PT1.0, PT1.1, ...) to obtain PPFN 0x6e000, giving PA 0x6e040.]
37
– Though they really aren't that big
– One entry per physical frame
– Hash the virtual address and whatever results is where that page must reside
– Becomes hard to maintain in hardware, but can be used by secondary software structures
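[Figure: phone-book analogy for an inverted table. The LUT is indexed w/ contact ID (000 to 999) and stores phone #'s; a hash func. applied to a phone # (e.g. 626-454-9985) selects the entry where that number must reside.]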
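A hashed inverted page table can be sketched as follows. This is a toy model: the hash function is arbitrary, collisions are resolved with linear probing (an assumption; the slides do not specify a scheme), and the class name is hypothetical.

```python
class InvertedPageTable:
    """One entry per physical frame; entry i records which page occupies frame i."""

    def __init__(self, num_frames):
        self.num_frames = num_frames
        self.frames = [None] * num_frames  # frame -> (pid, vpn) or None

    def _hash(self, pid, vpn):
        return (vpn * 31 + pid) % self.num_frames  # toy hash function

    def insert(self, pid, vpn):
        start = self._hash(pid, vpn)
        for step in range(self.num_frames):        # linear probing on collision
            frame = (start + step) % self.num_frames
            if self.frames[frame] in (None, (pid, vpn)):
                self.frames[frame] = (pid, vpn)
                return frame
        raise MemoryError("no free frame; a page must be evicted first")

    def lookup(self, pid, vpn):
        start = self._hash(pid, vpn)
        for step in range(self.num_frames):
            frame = (start + step) % self.num_frames
            if self.frames[frame] is None:
                return None                         # not resident
            if self.frames[frame] == (pid, vpn):
                return frame
        return None
```

The table size now scales with physical memory rather than with the virtual address space, at the cost of a hash computation and possible probing on each lookup.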
38
39
40
– Implies page size should be fairly large (i.e. once we've taken the time to find data on disk, make it worthwhile by accessing a reasonably large amount of data)
– Implies a "page fault" is going to take so much time to even access the data that we can handle them in software (via an exception) rather than using HW like typical cache misses
[Other implications you might not understand yet w/o caching knowledge]
– Implies eviction algorithms like LRU can be used since reducing page miss rates will pay off greatly
– Implies the placement of pages in main memory should be fully associative to reduce conflicts and maximize page hit rates
– Implies write-back (write-through would be too expensive)
41
corresponds to the given virtual address?
– That sounds BAD!
42
associativity (at least 4-way and many times fully associative) to avoid conflicts
– On hit, the PPFN is produced and concatenated with the offset
– On miss, a page table walk is needed
[Figure: processor chip containing the CPU, the Translation Unit / MMU with its TLB, and the cache. The CPU issues a VA (VPN + page offset); a TLB hit supplies the PPFN to form the PA, while a TLB miss goes to the page table in memory. The PA then accesses the cache (hit) or memory (miss). Two 10 ns latencies are marked in the figure.]
Cost of Translation = Cost(TLB lookup) + (1 - P(TLB hit)) * Cost(Page Table Walk)
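The translation cost formula is easy to evaluate for sample numbers. The values below (1 ns TLB lookup, 99% hit rate, 20 ns walk) are purely hypothetical and chosen only to show the shape of the result.

```python
def translation_cost(tlb_lookup_ns, tlb_hit_rate, walk_ns):
    """Average cost = TLB lookup + (1 - P(TLB hit)) * page table walk cost."""
    return tlb_lookup_ns + (1 - tlb_hit_rate) * walk_ns


# Hypothetical numbers: 1 ns lookup, 99% hit rate, 20 ns page table walk.
cost = translation_cost(1.0, 0.99, 20.0)
print(round(cost, 3))  # about 1.2 ns on average
```

With a high hit rate the average stays close to the TLB lookup time even though each individual miss is twenty times more expensive.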
43
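[Figure: fully associative TLB. The 20-bit virtual page number (VA bits 31-12, e.g. 0x7ffe4) is compared in parallel against the tag (= VPN) of every TLB entry; each entry also holds V and D bits and a page frame # (e.g. 0x308ac). On a match, the page frame # is joined with the 12-bit offset w/in page to form the physical address. An entry can be anywhere, and thus we must check all locations in the TLB for a hit.]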
44
– On hit, page frame # supplied quickly w/o page table access
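[Figure: 4-way set-associative TLB. The virtual page number (VA bits 31-12) is split into a 16-bit tag and a 4-bit set index. The selected set's four ways (Way 0 to Way 3) each hold a Tag and PF#, and all four tags are compared in parallel. The matching way supplies the page frame # that joins the offset w/in page (bits 11-0) to form the physical address.]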
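A set-associative TLB lookup with a 4-bit set index and 16-bit tag can be modeled like this. It is a behavioral sketch only: the class and function names are hypothetical, the ways are searched in a loop rather than in parallel, and FIFO replacement within a set is an assumption.

```python
class SetAssociativeTLB:
    def __init__(self, num_sets=16, ways=4):
        self.num_sets = num_sets
        self.ways = ways
        # Each set is a list of (tag, pfn) pairs, oldest first.
        self.sets = [[] for _ in range(num_sets)]

    def _split(self, vpn):
        # Low bits of the VPN pick the set; the remaining bits form the tag.
        return vpn % self.num_sets, vpn // self.num_sets

    def lookup(self, vpn):
        index, tag = self._split(vpn)
        for entry_tag, pfn in self.sets[index]:  # hardware compares all ways at once
            if entry_tag == tag:
                return pfn
        return None                              # TLB miss

    def insert(self, vpn, pfn):
        index, tag = self._split(vpn)
        entries = [e for e in self.sets[index] if e[0] != tag]
        if len(entries) == self.ways:
            entries.pop(0)                       # evict the oldest way (assumed FIFO)
        entries.append((tag, pfn))
        self.sets[index] = entries


def tlb_translate(tlb, va):
    vpn, offset = va >> 12, va & 0xFFF
    pfn = tlb.lookup(vpn)
    if pfn is None:
        return None  # miss: a page table walk would be needed
    return (pfn << 12) | offset


tlb = SetAssociativeTLB()
tlb.insert(0x7FFE4, 0x308AC)
print(hex(tlb_translate(tlb, 0x7FFE4ABC)))  # 0x308acabc
```

Only the four entries of one set are examined per lookup, which is the saving over the fully associative design.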
45
– MMU can perform page table walk if needed
– If page fault occurs, OS takes over to bring in the page
– If TLB miss, OS can perform both the page table walk and bring in page if necessary
– First flush out blocks belonging to that page from the data cache (writing back if necessary)
– Invalidate tags of those blocks
– Invalidate TLB entry (if any) corresponding to that page
– If page is dirty, copy page back to the disk
– Simple way to remember this…
46
– Can TLB share mappings from multiple processes?
translations
– Virtual addresses are not unique between processes
– Can choose to only hold translations for current process and thus invalidate all entries on context switch
– Can hold translations for multiple processes concurrently by concatenating a process or address space ID (PID or ASID) to the VPN tag
[Figure: TLB tagged with an ASID (unique ID for each process). Each entry holds ASID, Tag, V and M bits, and Page Frame #. A hit requires both the VPN (VA bits 31-12) and the current ASID to match an entry.]
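The two options can be contrasted in a short sketch. The `AsidTLB` class is hypothetical and ignores capacity and replacement; it only shows how including the ASID in the lookup key lets two processes keep translations for the same VPN at once.

```python
class AsidTLB:
    def __init__(self):
        self.entries = {}  # (asid, vpn) -> page frame number

    def insert(self, asid, vpn, pfn):
        self.entries[(asid, vpn)] = pfn

    def lookup(self, asid, vpn):
        # A hit requires both the ASID and the VPN to match.
        return self.entries.get((asid, vpn))

    def flush_all(self):
        """The alternative without ASIDs: invalidate everything on a context switch."""
        self.entries.clear()


tlb = AsidTLB()
tlb.insert(asid=1, vpn=0x00001, pfn=0x0006E)
tlb.insert(asid=2, vpn=0x00001, pfn=0x0007A)  # same VPN, different process
print(hex(tlb.lookup(1, 0x00001)), hex(tlb.lookup(2, 0x00001)))  # 0x6e 0x7a
```

The page frame numbers here are made-up example values.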
47
large contiguous physical chunk
– All the lower (2nd) level entries of the page table would point at a contiguous range
– Instead of the level 1 entry pointing at a level 2 page table, it has the physical address translation
– A bit in the level 1 entry indicates if it contains this phys. translation or a pointer to the 2nd level table
[Figure: super page mapping. A level 1 (page directory) entry maps a large contiguous region of the virtual address space directly onto a contiguous physical super page (at 0x3a000000 in the figure), bypassing the level 2 page table. PDBR/CR3 = 0xd0000; the VPN is normally split into a 10-bit PDIdx and a 10-bit PTIdx with a 12-bit offset. Example as read from the figure: VA 0x01001040 -> PA 0x3a001040, with 0x001040 used as the offset within the super page.]
x86 allows for 2MB or 1GB super pages by skipping 1 or 2 last levels of its 4-level page table scheme.
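The super page bit in a level 1 entry can be sketched as follows, assuming the 10 / 10 / 12 split (so a super page covers 4MB and uses a 22-bit offset). The dictionary layout, the `super` flag name, and the table contents are assumptions for illustration; the 0x3a000000 base and the example addresses follow the figure.

```python
# Level 1 entries: either a pointer to a level 2 table or a super page mapping.
PAGE_DIRECTORY = {
    0x000: {"super": False, "table": {0x001: 0x6E000}},  # normal 4KB pages
    0x004: {"super": True, "base": 0x3A000000},          # one 4MB super page
}


def translate(va):
    pd_idx = va >> 22
    entry = PAGE_DIRECTORY.get(pd_idx)
    if entry is None:
        raise LookupError("no level 1 entry")
    if entry["super"]:
        # Skip level 2: the low 22 bits are the offset within the super page.
        return entry["base"] | (va & 0x3FFFFF)
    frame = entry["table"].get((va >> 12) & 0x3FF)
    if frame is None:
        raise LookupError("page not mapped")
    return frame | (va & 0xFFF)


print(hex(translate(0x01001040)))  # 0x3a001040
print(hex(translate(0x00001040)))  # 0x6e040
```

Besides saving a level of the walk, one super page entry covers as much memory as 1024 ordinary entries, so it also stretches the reach of a TLB.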
48
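[Figure: processor chip with separate instruction and data TLBs (ITLB / DTLB) in the Translation Unit / MMU and separate L1 instruction and data caches (IL1 $ / DL1 $). As before, the VA's VPN is translated to a PPFN (TLB hit, or page table in memory on a miss) and the resulting PA accesses the cache (hit) or memory (miss).]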
49
– Conservatively need to invalidate TLB entry in the other processor
in the TLBs from P1?
– SHOOTDOWN: 1 proc. interrupts another and indicates a TLB entry should be invalidated
[Figure: processor chip with four cores, each with its own Translation Unit / MMU, TLB, and cache. Core1 and Core2 run P1; Core3 and Core4 run P2. Each process has its own two-level page table (Level 1 page directory, Level 2 page tables) in memory.]
50
51
– Store copies of data indexed based on the address they came from in MM
– Simplified view: 2 steps to determine hit
entries in set to determine hit
– Sequential connection between indexing these two steps (index + tag match)
then performing this two step hit process, can we
sequence?
– Yes if we choose page size, block size, and set/direct mapping carefully
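[Figure: cache lookup. The address is split into Tag, Index/Hash, and Offset fields; the index selects one of the cache entries (1, 2, 3, 4, …), each holding (addr, data).]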
52
PIPT (physically indexed, physically tagged)
– Wait for full address translation
– Then use physical address for both indexing and tag comparison
VIPT (virtually indexed, physically tagged)
– Use portion of the virtual address for indexing, then wait for address translation and use physical address for tag comparisons
– Easiest when the index portion of the virtual address lies w/in the page offset address bits; otherwise aliasing may occur
VIVT (virtually indexed, virtually tagged)
– Use virtual address for both indexing and tagging…No TLB access unless cache miss
– Requires invalidation of cache lines on context switch or use of process ID as part of tags
[Figure: the three schemes side by side. In each, VA = VPN (bits 31-12) + offset (bits 11-0) and PA = PFN (bits 31-12) + offset (bits 11-0). PIPT takes both the Set/Blk index and the Tag from the PA; VIPT takes the Set/Blk index from the VA and the Tag from the PA; VIVT takes both from the VA.]
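Whether a virtually indexed cache keeps its index bits inside the page offset can be checked with one comparison. This sketch assumes power-of-two sizes; the function name and the example cache configurations are hypothetical.

```python
def index_within_page_offset(cache_bytes, ways, page_bytes=4096):
    """True if the set index + block offset bits fit inside the page offset.

    Bytes per way equals (number of sets * block size), i.e. exactly the
    address range covered by the index and block offset bits together.
    """
    bytes_per_way = cache_bytes // ways
    return bytes_per_way <= page_bytes


print(index_within_page_offset(32 * 1024, ways=8))  # True: 4KB per way
print(index_within_page_offset(32 * 1024, ways=4))  # False: 8KB per way
```

When the check passes, the index bits are identical in the virtual and physical address, so indexing can start in parallel with the TLB lookup without any aliasing concern.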
53
[Figure: virtually addressed cache vs. physically addressed cache.]
In a modern system the L1 caches may be virtually addressed while L2 may be physically addressed.
54
55
– Most languages don't provide raw pointer features
– MS.NET - Many languages (C#, VB, etc.) compiled to intermediate byte-code and then run by an interpreter
– Java
– Insert code to perform checks that prove the program cannot violate certain properties
– If proven we can remove checks