SHARED ADDRESS TRANSLATION REVISITED
Xiaowan Dong University of Rochester Sandhya Dwarkadas University of Rochester Alan L. Cox Rice University
Limitations
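Each process keeps its own page table entries and its own translation lookaside buffer (TLB) entries for shared pages, so the same address translation information is duplicated
… (as much as 58% on Android)
Figure: Process 1 and Process 2 map the same physical memory, but each through its own page table entries and its own TLB entries.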
… amounts of contiguous data … intensively
On Android, all applications share the same physical and virtual addresses for the preloaded libraries, because every application process is forked from the zygote.
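Sharing address translation information (page table entries and TLB entries) for zygote-preloaded shared libraries, using existing hardware support
Figure: Process 1 & Process 2 share a single page table entry and a single TLB entry that map the same physical page.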
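The difference between per-process and shared TLB entries can be illustrated with a toy model. It assumes an ASID-tagged TLB in which an entry whose global bit is set matches every address space, as ARM TLBs do; the class and function names are hypothetical, and capacity, replacement, and permission checks are ignored.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TLBEntry:
    vpn: int         # virtual page number
    pfn: int         # physical frame number
    asid: int        # address space ID of the process that loaded the entry
    is_global: bool  # a global entry matches every ASID


def tlb_lookup(tlb: List[TLBEntry], vpn: int, asid: int) -> Optional[TLBEntry]:
    """Return an entry that translates vpn for this ASID, or None on a miss."""
    for entry in tlb:
        if entry.vpn == vpn and (entry.is_global or entry.asid == asid):
            return entry
    return None


def touch_shared_page(asids: List[int], vpn: int, pfn: int, use_global: bool) -> List[TLBEntry]:
    """Let each process access the same shared page once; return the TLB contents."""
    tlb: List[TLBEntry] = []
    for asid in asids:
        if tlb_lookup(tlb, vpn, asid) is None:
            # Miss: the page table walk loads an entry tagged with this ASID
            tlb.append(TLBEntry(vpn, pfn, asid, use_global))
    return tlb


# Two processes, same virtual page mapped to the same physical frame.
print(len(touch_shared_page([1, 2], 0x40123, 0x8765, use_global=False)))  # 2
print(len(touch_shared_page([1, 2], 0x40123, 0x8765, use_global=True)))   # 1
```

Without the global bit, every process that touches the page pays for its own miss and occupies its own entry; with it, the first miss serves all of them.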
Impact of Shared Libraries on Instruction Footprint
Figure: contribution of zygote-preloaded shared libraries to the instruction footprint, in two panels: % of instruction pages accessed and % of instructions fetched (labelled values: 93%, 98%, 68%, 72%)
… library code accessed across different applications
… common for each pair of applications
Figure: % of the instruction footprint overlapped between each pair of Laya Music Player, Adobe Reader, and MX Player (labelled values: 91%, 72%, 85%)
… two-level hierarchical page table
… time between the zygote and its child processes
… pages should both be managed in a copy-on-write (COW) manner
Figure: two-level page tables of the zygote and of an Android application, with L1 PTEs pointing to L2 PTEs
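Assuming the two-level hierarchy is the ARMv7 short-descriptor format with 4 KB small pages, a 32-bit virtual address splits as sketched below. It shows why one hardware L2 table holds 256 PTEs covering 1 MB of address space. Constant and function names are illustrative, and the sketch follows the hardware view only, not any software layout the kernel builds on top of it.

```python
from typing import Tuple

L1_SHIFT = 20                               # each L1 entry covers 1 MB
PAGE_SHIFT = 12                             # each L2 entry maps a 4 KB small page
L1_ENTRIES = 1 << (32 - L1_SHIFT)           # 4096 entries in the L1 table
L2_ENTRIES = 1 << (L1_SHIFT - PAGE_SHIFT)   # 256 entries in each L2 table


def split_va(va: int) -> Tuple[int, int, int]:
    """Split a 32-bit virtual address into (L1 index, L2 index, page offset)."""
    l1_index = (va >> L1_SHIFT) & (L1_ENTRIES - 1)    # bits [31:20]
    l2_index = (va >> PAGE_SHIFT) & (L2_ENTRIES - 1)  # bits [19:12]
    offset = va & ((1 << PAGE_SHIFT) - 1)             # bits [11:0]
    return l1_index, l2_index, offset


print([hex(x) for x in split_va(0x40123456)])  # ['0x401', '0x23', '0x456']
print(L2_ENTRIES * 4, "KB mapped by one L2 table")  # 1024 KB mapped by one L2 table
```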
… cases:
… shared page table page
… library into different page table pages
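The fragments above concern shared page table pages managed copy-on-write. A minimal sketch of that idea: before a process modifies a PTE in a page table page it shares, it gets a private copy. The L1 table is modeled as a dictionary, PTEs as plain integers, and all names are hypothetical; a real kernel would also have to take extra references on the data pages mapped by the copied PTEs and flush stale TLB entries, which this sketch omits.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

L2_ENTRIES = 256  # PTEs per L2 table (assumed ARM short-descriptor layout)


@dataclass
class PageTablePage:
    ptes: List[Optional[int]]  # one slot per PTE; None means "not mapped"
    refcount: int = 1          # number of address spaces using this page


def unshare_ptp(l1_table: Dict[int, PageTablePage], l1_index: int) -> PageTablePage:
    """Copy-on-write for a page table page: give this address space a private
    copy before it modifies a PTE, unless it is already the only user."""
    ptp = l1_table[l1_index]
    if ptp.refcount == 1:
        return ptp                           # private already: modify in place
    private = PageTablePage(list(ptp.ptes))  # copy every PTE
    ptp.refcount -= 1                        # drop our reference to the shared page
    l1_table[l1_index] = private             # repoint our L1 entry at the copy
    return private


def set_pte(l1_table: Dict[int, PageTablePage], l1_index: int, l2_index: int, pte: int) -> None:
    """Install a PTE, unsharing the page table page first if necessary."""
    unshare_ptp(l1_table, l1_index).ptes[l2_index] = pte
```

The other sharer keeps the original page untouched, and once it is the last user, its own writes go to that page in place.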
… libraries’ code segments
… TLB entries
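Leveraging the domain protection model
The zygote-preloaded shared libraries are placed in a domain of their own, Domain 3; user space and kernel space use Domains 1 and 2. A TLB entry carries the VPN, the ASID, the permission bits, the global bit, and the domain field; entries for the preloaded libraries have the global bit set and domain field 0011 (Domain 3).
The Domain Access Control Register (DACR), set per process, holds a 2-bit field per domain. For Domain 3:
00 for non-zygote processes: no access permission
01 for zygote-like processes: access based on the permission bits listed in the TLB entry
When a non-zygote process hits a global TLB entry in Domain 3, the access raises a domain fault. The memory abort handler traps into the kernel, checks the fault status register, and, if the fault is a domain fault, flushes all TLB entries with the faulting address.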
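A toy model of this mechanism, assuming the standard ARM DACR semantics (2 bits per domain, 00 = no access, 01 = client). The `walk` callback, the data class, and all names are hypothetical, and the model folds the abort handler's fault-status check and the retry of the faulting access into a single function.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

NO_ACCESS = 0b00  # DACR field 00: every access to the domain raises a domain fault
CLIENT = 0b01     # DACR field 01: access checked against the entry's permission bits
LIB_DOMAIN = 3    # domain of the zygote-preloaded shared libraries


@dataclass(frozen=True)
class TLBEntry:
    vpn: int
    pfn: int
    asid: int
    is_global: bool
    domain: int


def dacr_field(dacr: int, domain: int) -> int:
    """Return the 2-bit access field for `domain` in a DACR value."""
    return (dacr >> (2 * domain)) & 0b11


def translate(tlb: List[TLBEntry], vpn: int, asid: int, dacr: int,
              walk: Callable[[int, int], TLBEntry]) -> TLBEntry:
    """Translate vpn for the running process, identified by its ASID and DACR."""
    hit: Optional[TLBEntry] = next(
        (e for e in tlb if e.vpn == vpn and (e.is_global or e.asid == asid)), None)
    if hit is not None and dacr_field(dacr, hit.domain) == NO_ACCESS:
        # Domain fault: the abort handler flushes all entries for this address.
        tlb[:] = [e for e in tlb if e.vpn != vpn]
        hit = None
    if hit is None:
        # Miss, or retry after the flush: walk this process's own page table.
        hit = walk(vpn, asid)
        tlb.append(hit)
    return hit
```

In this model, a non-zygote process that hits the shared global entry evicts it, so the next zygote-like access to that page misses and reloads the global entry.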
… processes
Configuration   Kernel execution cycles (x 10^6)   # of PTPs allocated   # of PTEs copied
Stock Android   2.9                                38                    3,900
Copied PTEs     4.6                                51                    9,800
Shared PTPs     1.4                                1                     7
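The relative changes implied by the table can be checked with a few lines of arithmetic. The values are copied from the table; the surviving text does not say which operation they were measured over, so the percentages are only comparisons between the three configurations.

```python
# (kernel cycles x 10^6, PTPs allocated, PTEs copied), copied from the table
results = {
    "Stock Android": (2.9, 38, 3900),
    "Copied PTEs": (4.6, 51, 9800),
    "Shared PTPs": (1.4, 1, 7),
}


def reduction_pct(baseline: float, value: float) -> float:
    """Percentage reduction relative to the baseline (negative means an increase)."""
    return 100.0 * (baseline - value) / baseline


base = results["Stock Android"]
for name in ("Copied PTEs", "Shared PTPs"):
    cycles, ptps, ptes = (round(reduction_pct(b, v), 1) for b, v in zip(base, results[name]))
    print(f"{name}: cycles {cycles}%, PTPs {ptps}%, PTEs copied {ptes}%")
```

Against stock Android, shared PTPs cut kernel cycles by about 52% and nearly eliminate PTP allocation and PTE copying, while copying PTEs eagerly increases all three.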
… application-specific Java classes
On average, 38% fewer page faults for creating PTEs that map shared code and data (maximum 78%)
35% fewer page table pages allocated (maximum 58%)
Figure: PTP allocation normalized to stock Android
… common on Android
… Android IPC binder mechanism
… improved
… platforms
Large pages (64KB, for example) will waste physical memory compared to 4KB base pages:
… union set
… pages for code
… page table as 4KB page does
Figure: CDF of the number of 4KB pages left untouched within each 64KB large page of the zygote-preloaded shared libraries
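A minimal sketch of the metric in the CDF, assuming the input is the set of 4 KB page numbers touched in a library (for example, the union of the pages touched by several applications). The page numbers in the usage lines are hypothetical; the sketch counts, for each 64 KB large page that would have to be mapped, the 4 KB pages inside it that nobody used.

```python
from collections import Counter
from typing import Dict, Iterable

BASE_PAGE = 4 * 1024
LARGE_PAGE = 64 * 1024
PAGES_PER_LARGE = LARGE_PAGE // BASE_PAGE  # 16 base pages per large page


def untouched_per_large_page(touched: Iterable[int]) -> Dict[int, int]:
    """For each 64 KB large page holding at least one touched 4 KB page,
    count the 4 KB pages inside it that were never touched."""
    per_large = Counter(page // PAGES_PER_LARGE for page in set(touched))
    return {large: PAGES_PER_LARGE - n for large, n in per_large.items()}


def wasted_bytes(touched: Iterable[int]) -> int:
    """Extra memory brought in by mapping with 64 KB pages instead of 4 KB pages."""
    return BASE_PAGE * sum(untouched_per_large_page(touched).values())


# Hypothetical 4 KB page numbers touched by two applications in one library.
app_a = {0, 1, 2}
app_b = {2, 20}
union = app_a | app_b
print(sorted(untouched_per_large_page(union).items()))  # [(0, 13), (1, 15)]
print(wasted_bytes(union))                              # 114688
```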
When the zygote is started (exec), its task_struct is marked zygote = 1.
When the zygote mmaps the code segment of a shared library, the VMA is marked global = 1.
On fork, the child's task_struct is marked zygote_like = 1, and the child inherits the VMA with global = 1.
On a page fault on a zygote-preloaded shared library, the kernel sets the global bit in the PTE if the task has zygote = 1 or zygote_like = 1 and the VMA has global = 1.
Note: the global bit is used for kernel pages in stock Linux.
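The decision flow above can be sketched in a few functions. The flags mirror the names on the slide (zygote, zygote_like, global), but the classes and functions are hypothetical stand-ins rather than Linux kernel code; `is_global` is used because `global` is a Python keyword, and only the zygote's direct children are treated as zygote-like, since the slide does not show other cases.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass
class Task:  # hypothetical stand-in for the task_struct flags
    zygote: bool = False
    zygote_like: bool = False


@dataclass
class Vma:   # hypothetical stand-in for a VMA with a `global` flag
    start: int
    end: int
    is_global: bool = False


def mmap_library_code(task: Task, start: int, end: int) -> Vma:
    """Mapping a shared library's code segment: only the zygote marks it global."""
    return Vma(start, end, is_global=task.zygote)


def fork(parent: Task, vmas: List[Vma]) -> Tuple[Task, List[Vma]]:
    """A child of the zygote becomes zygote-like and inherits copies of its VMAs."""
    child = Task(zygote_like=parent.zygote)
    return child, [replace(vma) for vma in vmas]


def pte_gets_global_bit(task: Task, vma: Vma) -> bool:
    """On a page fault in `vma`, decide whether the new PTE gets the global bit."""
    return (task.zygote or task.zygote_like) and vma.is_global
```

Keeping the decision at page-fault time means no existing PTE has to be rewritten: the global bit is applied as each library page is first mapped.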
Figure: at fork, the parent's and the child's address spaces each have their own VMAs (vma1, vma2, vma3) and their own L1 PTP (L1 PTE1, L1 PTE2, L1 PTE3), while the L2 PTP (L2 PTE1, L2 PTE2, L2 PTE3) is shared.
If the L2 PTP is not already shared, every writable L2 PTE in it is write-protected first; the L2 PTP then becomes a shared PTP.
Virtual memory area (VMA): a memory region
If ARM supported write protection in L1 PTEs, as x86 does, we could avoid write-protecting every L2 PTE.
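The fork-time flow in the diagram can be sketched as follows. The structures and names are hypothetical, PTEs carry only a frame number and a writable flag, and steps a real kernel needs (flushing stale writable TLB entries, consulting the VMA to tell copy-on-write faults from genuine protection faults) are left out. The loop is the per-PTE work that an L1-level write-protect bit, like the one x86 offers at upper page table levels, would let the kernel skip.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Pte:
    pfn: int
    writable: bool


@dataclass
class L2Ptp:
    ptes: List[Optional[Pte]]  # None means "not mapped"
    refcount: int = 1          # address spaces whose L1 PTE points here


def share_ptp_on_fork(parent_l1: Dict[int, L2Ptp], child_l1: Dict[int, L2Ptp], index: int) -> None:
    """Point the child's L1 entry at the parent's L2 PTP instead of copying PTEs."""
    ptp = parent_l1[index]
    if ptp.refcount == 1:
        # Not shared yet: write-protect every writable PTE so that a later write
        # by either process faults and the data page can be copied on write.
        for pte in ptp.ptes:
            if pte is not None and pte.writable:
                pte.writable = False
    # An already-shared PTP was write-protected when it was first shared.
    ptp.refcount += 1
    child_l1[index] = ptp
```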