TRANSLATION REVISITED Xiaowan Dong - PowerPoint PPT Presentation

SHARED ADDRESS TRANSLATION REVISITED
Xiaowan Dong, University of Rochester
Sandhya Dwarkadas, University of Rochester
Alan L. Cox, Rice University

Limitations of Current Shared Memory Management
• Physical memory sharing is common
• However, address translation is private per process
  • Page tables and Translation Lookaside Buffer (TLB) entries
• Potential for duplicate translation information (as much as 58% on Android)
• Scalability problem: O(# of processes)
• Inefficient utilization of shared caches
[Figure: Process 1 and Process 2 each hold their own page table entries and TLB entries for the same physical memory]

Previous Work
• Previous work shares page tables for applications handling large amounts of contiguous data
  • E.g., PostgreSQL database systems
• Limitations:
  • Overlook code at smaller granularity (such as shared libraries)
  • Ignore duplication in the TLB
• New opportunities on Android, where shared libraries are used intensively

Android Process Creation Model
• All applications share the same physical and virtual addresses for the preloaded libraries

Goal: Shared Address Translation: Page Tables and TLB Entries
• Sharing address translation for the zygote-preloaded shared libraries
• Implemented at the OS level with existing hardware support
  • Mostly machine-independent
• Benefits
  • Reduce soft page faults
  • Improve cache and TLB performance
[Figure: Process 1 and Process 2 share one page table entry and one TLB entry for the same physical page]

Impact of Shared Libraries on Instruction Footprint
• Number of shared libraries per application:
  • Loaded: 88 to 107 (zygote-preloaded: 88)
  • Invoked: 24 to 68 (zygote-preloaded: 21 to 46)
[Figure: two bar charts, "% of inst pages accessed" and "% of inst fetched", each split into zygote-preloaded shared lib and other shared lib; data labels 93%, 98%, 72%, 68%]

Shared Library Instruction Footprint Intersection
• Considerable overlap in the shared library code accessed across different applications
• 46% of total inst pages accessed are in common for each pair of applications
  • Zygote-preloaded: 38%
[Figure: the % of inst footprint overlapped among Laya Music Player, MX Player, and Adobe Reader; data labels 91%, 85%, 72%]

SHARING ADDRESS TRANSLATION

Sharing Page Tables
• The ARM architecture defines a two-level hierarchical page table
• L2 page table pages are shared at fork time between the zygote and its child processes
• Supports private writable memory regions
  • Shared page table pages and physical pages should both be managed in a copy-on-write (COW) manner
[Figure: the zygote and an Android application each have their own L1 PTEs, which point to the same L2 PTEs]

Maintaining Shared Page Tables
• A shared page table page needs to be unshared (COWed) in the following cases:
  • Page fault with write access
  • A process creates, destroys, or modifies a memory region within the range of a shared page table page
  • A process tries to free a shared page table page
• Modifying any memory region within its range makes the process lose sharing of the entire page table page
  • Mitigation: map the page table entries of the code segment and the data segment of a shared library into different page table pages
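The unsharing rule can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not the kernel implementation from the talk: `PageTablePage`, `AddressSpace`, and the sharer count are hypothetical names, and an L2 page table page is reduced to a Python dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class PageTablePage:
    """Toy L2 page table page: virtual page number -> (frame, writable)."""
    entries: dict = field(default_factory=dict)
    sharers: int = 1  # number of address spaces pointing at this page


class AddressSpace:
    def __init__(self, l1=None):
        # L1 slot index -> PageTablePage (hypothetical layout)
        self.l1 = dict(l1 or {})

    def fork(self):
        """Share every L2 page table page with the child instead of copying it."""
        for ptp in self.l1.values():
            ptp.sharers += 1
        return AddressSpace(self.l1)

    def unshare(self, slot):
        """Give this address space a private copy of one shared L2 page."""
        ptp = self.l1[slot]
        if ptp.sharers == 1:
            return ptp  # already private, nothing to copy
        ptp.sharers -= 1
        private = PageTablePage(dict(ptp.entries), sharers=1)
        self.l1[slot] = private
        return private

    def write_fault(self, slot, vpn, new_frame):
        """Write fault: unshare the page table page, then remap the page writable."""
        ptp = self.unshare(slot)
        ptp.entries[vpn] = (new_frame, True)
```

In this model a single write fault costs the faulting process the whole shared page table page, which is why keeping code and data mappings in different page table pages helps.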

Sharing TLB Entries
• Global bit
  • We set the global bit in the page table entries of the zygote-preloaded shared libraries’ code segments
  • Overrides Address Space Identifier (ASID) in TLB
• Domain protection model of 32-bit ARM
  • Prevents processes not forked from the zygote from accessing the shared global TLB entries
  • E.g., system services and daemons

Leveraging the domain protection model
• TLB entry fields: VPN, ASID, global bit (1), domain field (0011), permission bits
• Domains: Domain 1 (user space), Domain 2 (kernel space), Domain 3 (zygote-preloaded shared libraries)
• Domain Access Control Register (DACR) value for Domain 3:
  • Non-zygote processes: 00 (no access permission)
  • Zygote-like processes: 01 (based on permission bits listed in the TLB entry)
• On a fault: trap into kernel; the memory abort handler checks the fault status register, and on a domain fault flushes all TLB entries with the faulting address
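The interaction between the global bit and the domain check can be sketched as a small software model. It is an illustration only: the real check is done by the MMU, the DACR is a hardware register rather than a dictionary, and permission bits are left out. `TlbEntry`, `tlb_lookup`, and `handle_domain_fault` are hypothetical names.

```python
from dataclasses import dataclass

NO_ACCESS = 0b00  # DACR field value: any access raises a domain fault
CLIENT = 0b01     # DACR field value: access is checked against permission bits


@dataclass(frozen=True)
class TlbEntry:
    vpn: int         # virtual page number
    asid: int        # address space identifier of the process that loaded it
    is_global: bool  # global entries match regardless of ASID
    domain: int      # domain field of the entry
    frame: int       # physical frame number


class DomainFault(Exception):
    """Raised when the current DACR grants no access to the entry's domain."""


def tlb_lookup(tlb, vpn, asid, dacr):
    """Return the frame for vpn, None on a TLB miss, or raise DomainFault.

    dacr maps a domain number to its 2-bit access field (modeled as a dict).
    """
    for entry in tlb:
        if entry.vpn == vpn and (entry.is_global or entry.asid == asid):
            if dacr.get(entry.domain, NO_ACCESS) == NO_ACCESS:
                raise DomainFault(vpn)
            return entry.frame
    return None


def handle_domain_fault(tlb, vpn):
    """Kernel side: flush all TLB entries with the faulting address."""
    return [entry for entry in tlb if entry.vpn != vpn]
```

A zygote-like process running with Domain 3 set to 01 hits the global entry even though its ASID differs, while a non-zygote process running with 00 takes a domain fault and, after the flush, misses in the TLB and walks its own page table.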

EVALUATION

Evaluation Platforms
• Nexus 7 (2012)
  • 1.2GHz Nvidia Tegra 3 processor with four ARM Cortex-A9 cores
  • A private 2-level TLB
    • I/D micro TLB (flushed on context switch)
    • 128-entry main TLB
  • 32KB/32KB L1 cache (I/D)
  • 1MB shared L2 cache
• Android KitKat 4.4.4 OS
  • New Android runtime (ART)
• Benchmarks:
  • Most popular application in each category on Google Play Store

Zygote Fork
• Sharing page table improves execution time of a zygote fork by 2.1x
• Trade-off between cost of fork and # of page faults experienced by child processes
  • Sharing page table is the best of both worlds

Kernel          Execution Cycles (x 10^6)   # of PTPs allocated   # of PTEs copied
Stock Android   2.9                         38                    3,900
Copied PTEs     4.6                         51                    9,800
Shared PTPs     1.4                         1                     7
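The speedup can be rechecked from the execution-cycle column. The sketch below assumes the 2.1x figure compares the stock Android kernel with the shared-PTP kernel; the slide does not state the baseline explicitly.

```python
# Execution cycles (x 10^6) of a zygote fork, taken from the table above.
fork_cycles = {
    "Stock Android": 2.9,
    "Copied PTEs": 4.6,
    "Shared PTPs": 1.4,
}


def speedup(baseline, improved):
    """Ratio of baseline cycles to improved cycles (higher is better)."""
    return fork_cycles[baseline] / fork_cycles[improved]


vs_stock = speedup("Stock Android", "Shared PTPs")
vs_copied = speedup("Copied PTEs", "Shared PTPs")
```

With the rounded table values the ratio against stock Android comes out slightly below 2.1, which is consistent with the headline number given that the table entries are themselves rounded.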

Application Launch Performance
• Every application follows the same launch procedure before it loads its application-specific Java classes
• Launch time improved by 7% (10% with 2MB alignment)
• 94% fewer page faults for creating PTEs that map shared code and data
• 15% reduction in L1 Icache stall cycles
• 68% fewer page table pages allocated

Over the Course of Execution
• 38% fewer page faults for creating PTEs that map shared code and data (maximum 78%)
• 35% fewer page table pages allocated on average (maximum 58%)
[Figure: PTP allocation normalized to stock Android]

Android IPC Performance
• Inter-process communication (IPC) is common on Android
• Developed a microbenchmark using the Android Binder IPC mechanism
• Instruction main TLB stall cycles are reduced by:
  • Client: 36%
  • Server: 19%

Conclusion
• Android presents opportunities for shared library address translation sharing
• We eliminated the duplication of address translation on Android
• Android’s application launch, steady-state, and context switch efficiency are improved
  • Speed up a zygote fork by 2.1x
  • Improve application launch by 10%
• Our shared address translation infrastructure should be portable to other platforms

Large Pages Are Inefficient for Zygote-preloaded Shared Libraries
• Using large pages (64KB page for example) will waste physical memory compared to 4KB base pages:
  • 2.6x memory consumption on average
  • 94% more memory consumption for the union set
• Linux does not support the use of large pages for code
• Our design can complement large pages
  • 64KB page on ARM also requires 2-level page table as 4KB page does
[Figure: CDF of # of 4KB pages untouched within a 64KB large page of zygote-preloaded shared libraries]
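The source of the waste can be shown with a small calculation. This sketch assumes that only touched pages consume physical memory and that touching any 4KB page inside a 64KB large page brings in the whole large page; the page numbers in any usage are hypothetical, not measurements from the talk.

```python
SMALL_PAGE = 4 * 1024
LARGE_PAGE = 64 * 1024
SMALL_PER_LARGE = LARGE_PAGE // SMALL_PAGE  # 16 base pages per large page


def memory_with_4k_pages(touched):
    """Bytes resident when each touched 4KB page is mapped on its own."""
    return len(set(touched)) * SMALL_PAGE


def memory_with_64k_pages(touched):
    """Bytes resident when each touched page pulls in its whole 64KB page."""
    large_pages = {page // SMALL_PER_LARGE for page in touched}
    return len(large_pages) * LARGE_PAGE


def untouched_per_large_page(touched):
    """For each used 64KB page, how many of its 4KB pages were never touched."""
    counts = {}
    for page in set(touched):
        large = page // SMALL_PER_LARGE
        counts[large] = counts.get(large, 0) + 1
    return {large: SMALL_PER_LARGE - n for large, n in counts.items()}
```

For a sparse footprint such as 4KB pages 0, 1, 2, 16, and 40, five base pages are resident with 4KB mappings but three full large pages with 64KB mappings, and most of each large page is never touched. The distribution returned by `untouched_per_large_page` is the quantity the CDF on this slide plots.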

Sharing TLB
• exec of the zygote: Task_struct.zygote = 1
• The zygote mmaps the code segment of a shared library: Vma.global = 1
• fork: the child gets Task_struct.zygote_like = 1 and inherits Vma.global = 1
• Page fault on a zygote-preloaded shared library: if Task_struct.zygote = 1 or zygote_like = 1, and Vma.global = 1, set the global bit in the PTE
• Note: the global bit is used for kernel pages in stock Linux
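The decision made on this slide can be written as a short sketch. The field names follow the slide (`zygote`, `zygote_like`, `global`), but the classes and helper functions are hypothetical stand-ins for kernel structures, and the rule that only the zygote's own mmap marks a region global is an assumption read off the flowchart.

```python
from dataclasses import dataclass


@dataclass
class Task:
    zygote: bool = False       # set when the zygote itself is exec'ed
    zygote_like: bool = False  # set on processes forked from the zygote


@dataclass
class Vma:
    is_global: bool = False    # stands in for Vma.global on the slide


def exec_zygote():
    """exec of the zygote marks the task."""
    return Task(zygote=True)


def mmap_library_code(task):
    """Assumption: only code mapped by the zygote itself is marked global."""
    return Vma(is_global=task.zygote)


def fork(parent):
    """Children of the zygote (or of a zygote-like task) become zygote-like."""
    return Task(zygote_like=parent.zygote or parent.zygote_like)


def should_set_global_bit(task, vma):
    """Page-fault-time check: set the global bit in the PTE only if both hold."""
    return (task.zygote or task.zygote_like) and vma.is_global
```

Because the child shares the parent's memory regions after fork, the same `Vma` object is passed for both parent and child here, which models the inherited Vma.global flag.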

Sharing Page Table at Fork
• Virtual memory area (VMA): a memory region
• Is the L2 PTP already shared? If not, write-protect every writable L2 PTE
• If ARM supported write protection in the L1 PTE as x86 does, we could avoid write-protecting every L2 PTE
[Figure: the parent's and the child's address spaces (vma1 to vma3), each with its own L1 PTP (L1 PTE1 to L1 PTE3) pointing to a shared L2 PTP (L2 PTE1 to L2 PTE3)]
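The fork-time step can be sketched as follows. This is a toy model with hypothetical names and dictionary-based page tables, not the kernel code: it only shows that the write-protect pass over the L2 PTEs is needed the first time a page table page becomes shared and can be skipped on later forks.

```python
def share_l2_at_fork(parent_l1):
    """Build the child's L1 table by pointing at the parent's L2 pages.

    parent_l1 maps an L1 slot to a dict of the form
    {"shared": bool, "ptes": {vpn: {"frame": int, "writable": bool}}}.
    Returns (child_l1, number of L2 PTEs that had to be write-protected).
    """
    child_l1 = {}
    write_protected = 0
    for slot, ptp in parent_l1.items():
        if not ptp["shared"]:
            # First time this page is shared: later writes must fault so
            # that the page table page can be copied on write.
            for pte in ptp["ptes"].values():
                if pte["writable"]:
                    pte["writable"] = False
                    write_protected += 1
            ptp["shared"] = True
        child_l1[slot] = ptp  # child L1 PTE points at the same L2 page
    return child_l1, write_protected
```

With write protection available at the L1 level, the inner loop over the L2 PTEs would be unnecessary, which is the point of the last bullet on this slide.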
