
A Framework for Application Guidance in Virtual Memory Systems
Presented by Michael Jantz
Contributions from Carl Strickland, Kshitij Doshi, Martin Dimitrov, and Karthik Kumar

Executive Summary
• Memory has become a significant factor in the power and performance of server systems
• Memory power management is challenging
• We propose a collaborative approach between applications, the operating system, and hardware:
  – Applications inform the OS about memory usage
  – Hardware power-manageable domains are exposed to the OS
• We implement our framework by re-architecting a recent Linux kernel
• We evaluate it by classifying memory usage in an industrial-grade Java virtual machine (Oracle/Sun’s HotSpot)

Why?
• CPU and memory are the most significant contributors to power and performance
  – In servers, memory accounts for 40% of total power
• Applications can direct CPU usage:
  – threads may be affinitized to individual cores or migrated between cores
  – threads may be prioritized to meet task deadlines (with nice)
  – individual cores may be turned off when unused
• Surprisingly, no such controls exist for memory

Example Scenario
• A system running a database workload with 256GB of DRAM
  – All memory is in use, but only 2% of pages are accessed frequently
  – CPU utilization is low
• How can power consumption be reduced?

Challenges in Managing Memory Power
• At least two levels of virtualization:
  – Virtual memory abstracts away application-level information
  – Physical memory is viewed as a single, contiguous array of storage
• No way for agents to cooperate with the OS and with each other
• Lack of a tuning methodology

A Collaborative Approach
• Our approach: enable applications to guide memory management
• Requires collaboration between the application, OS, and hardware:
  – An interface for communicating application intent to the OS
  – The ability to track which memory hardware units host which physical pages during memory management
• To achieve this, we propose the following abstractions:
  – Colors
  – Trays

Communicating Application Intent with Colors
• Color = a hint for how pages will be used
  – Colors are applied to sets of virtual pages that are alike
  – Attributes are associated with each color
• Attributes express different types of distinctions:
  – Hot and cold pages (frequency of access)
  – Pages belonging to data structures with different usage patterns
• Colors allow applications to remain agnostic to lower-level details of memory management
[Figure: layered diagram with Software Intent, Color, Tray, and Memory Allocation and Freeing]
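The following minimal Python sketch shows how an application-side library might define colors and attach them to ranges of virtual pages. It is meant only to make the color abstraction concrete. All names (`Frequency`, `ColorAttributes`, `ColorMap`, `color_range`) are hypothetical, and the 4 KiB page size is an assumption. The slides do not specify the framework's actual interface.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum

PAGE_SIZE = 4096  # assumption: 4 KiB pages


class Frequency(Enum):
    HOT = "hot"
    WARM = "warm"
    COLD = "cold"


@dataclass(frozen=True)
class ColorAttributes:
    """Hypothetical attributes an application associates with a color."""
    frequency: Frequency  # expected access frequency of the pages
    usage: str = ""       # free-form usage pattern, e.g. "index" or "log"


@dataclass
class ColorMap:
    """Hypothetical record of which color each virtual page carries."""
    attributes: dict[int, ColorAttributes] = field(default_factory=dict)
    page_color: dict[int, int] = field(default_factory=dict)  # vpn -> color

    def define(self, color: int, attrs: ColorAttributes) -> None:
        self.attributes[color] = attrs

    def color_range(self, vaddr: int, length: int, color: int) -> None:
        """Color every virtual page overlapping [vaddr, vaddr + length)."""
        if color not in self.attributes:
            raise KeyError(f"color {color} has no attributes defined")
        first_vpn = vaddr // PAGE_SIZE
        last_vpn = (vaddr + length - 1) // PAGE_SIZE
        for vpn in range(first_vpn, last_vpn + 1):
            self.page_color[vpn] = color

    def attributes_of(self, vaddr: int) -> ColorAttributes | None:
        """Return the attributes of the page containing vaddr, if colored."""
        color = self.page_color.get(vaddr // PAGE_SIZE)
        return None if color is None else self.attributes[color]
```

For example, `define(1, ColorAttributes(Frequency.HOT, "index"))` followed by `color_range(0x10000, 3 * PAGE_SIZE, 1)` marks virtual pages 16 to 18 as hot. The OS side would then consult that color when it backs those pages with physical memory.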

Power-Manageable Units Represented as Trays
• Tray = software structure containing sets of pages that constitute a power-manageable unit
• Requires a mapping from physical addresses to power-manageable units
  – ACPI 5.0 defines the memory power state table (MPST)
• We re-architect a recent Linux kernel to perform memory management over trays
[Figure: same layered diagram with Software Intent, Color, Tray, and Memory Allocation and Freeing]
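The mapping from physical addresses to power-manageable units can be sketched as a sorted range lookup. This sketch assumes the firmware information (for instance from the MPST) has already been parsed into a list of address ranges, one per unit. The `PowerUnitRange` and `TrayMap` names are hypothetical and do not reflect the MPST's actual layout or the kernel's tray code.

```python
from __future__ import annotations

import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class PowerUnitRange:
    """Hypothetical: one physical address range of a power-manageable unit."""
    start: int    # first physical address (inclusive)
    end: int      # end physical address (exclusive)
    unit_id: int  # identifier of the power-manageable unit (tray)


class TrayMap:
    def __init__(self, ranges: list[PowerUnitRange]) -> None:
        self.ranges = sorted(ranges, key=lambda r: r.start)
        self.starts = [r.start for r in self.ranges]
        for prev, nxt in zip(self.ranges, self.ranges[1:]):
            if prev.end > nxt.start:
                raise ValueError("power-manageable ranges overlap")

    def tray_of(self, paddr: int) -> int | None:
        """Return the unit containing paddr, or None if it is unmapped."""
        i = bisect.bisect_right(self.starts, paddr) - 1
        if i >= 0 and paddr < self.ranges[i].end:
            return self.ranges[i].unit_id
        return None

    def group_pages(self, pfns: list[int], page_size: int = 4096) -> dict[int, list[int]]:
        """Group physical page frame numbers into trays, one tray per unit."""
        trays: dict[int, list[int]] = {}
        for pfn in pfns:
            unit = self.tray_of(pfn * page_size)
            if unit is not None:
                trays.setdefault(unit, []).append(pfn)
        return trays
```

With two 1 GiB units, page frame 262143 (the last 4 KiB frame below 1 GiB) lands in tray 0, and frame 262144 starts tray 1.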

[Figure: how colors and trays link the application, operating system, and hardware layers]
• Application: virtual pages (V1 … VN) are grouped as hot, warm, or cold pages; the application uses a color to indicate, e.g., that a set of pages will be hot
• Operating system: looks up the memory management policy for pages with a particular color in a table of selectable policies (SOCKET_AFFINITY, EXCLUSIVE_MEM_UNIT, MEM_PRIORITY, …), then performs physical memory allocation and recycling of physical pages (P1 … PN) over trays T0–T7
• Hardware: the memory topology (Node 0 with ranks R0–R3, Node 1 with ranks R4–R7) is represented in the OS using trays
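The figure's OS-side lookup, which maps a color's policy to the trays a page may come from, might look like the following Python sketch. It assumes one tray per rank, as the figure suggests, so that Node 0 holds trays 0 to 3 and Node 1 holds trays 4 to 7. The meaning given to each policy's argument is an illustrative assumption, not the framework's definition.

```python
from __future__ import annotations

from enum import Enum, auto


class Policy(Enum):
    SOCKET_AFFINITY = auto()     # arg: node id; allocate only on that node's trays
    EXCLUSIVE_MEM_UNIT = auto()  # arg: tray id reserved for this color alone
    MEM_PRIORITY = auto()        # arg: number of trays this color may use


# Assumed topology: Node 0 holds trays 0-3, Node 1 holds trays 4-7.
TRAYS_BY_NODE = {0: [0, 1, 2, 3], 1: [4, 5, 6, 7]}
ALL_TRAYS = [t for trays in TRAYS_BY_NODE.values() for t in trays]


def eligible_trays(color: int, table: dict[int, tuple[Policy, int]]) -> list[int]:
    """Trays from which a page of `color` may be allocated."""
    # Trays reserved exclusively for *other* colors are off-limits.
    reserved = {arg for c, (policy, arg) in table.items()
                if policy is Policy.EXCLUSIVE_MEM_UNIT and c != color}
    entry = table.get(color)
    if entry is None:
        candidates = ALL_TRAYS  # uncolored pages: default placement
    else:
        policy, arg = entry
        if policy is Policy.SOCKET_AFFINITY:
            candidates = TRAYS_BY_NODE[arg]
        elif policy is Policy.EXCLUSIVE_MEM_UNIT:
            candidates = [arg]
        else:  # Policy.MEM_PRIORITY
            candidates = ALL_TRAYS[:arg]
    return [t for t in candidates if t not in reserved]
```

Consider a table such as `{1: (Policy.SOCKET_AFFINITY, 1), 2: (Policy.EXCLUSIVE_MEM_UNIT, 7)}`. Color 1 is limited to trays 4 to 6, because tray 7 is reserved for color 2. A NUMA-style node binding could be expressed as a color in the same way.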

Experimental Evaluation
• Emulating NUMA APIs
• Memory prioritization
• Enabling power consumption proportional to the active footprint
• Exploiting generational garbage collection to reduce DRAM power

Emulating NUMA APIs
• Modern server systems enable memory management at the level of the NUMA node
• APIs control memory placement on NUMA nodes
• Our framework is flexible enough to emulate NUMA API functionality
• Oracle/Sun’s HotSpot JVM uses the NUMA API to improve DRAM access locality for several workloads
• We modified HotSpot to control memory placement using the memory coloring framework

Memory Coloring Emulates the NUMA API
[Figure: performance of the NUMA optimization relative to default, per benchmark, for the NUMA API and mem. color API implementations]
• Performance of SciMark 2.0 benchmarks with a “NUMA-optimized” HotSpot implemented with (1) NUMA APIs and (2) the memory coloring framework
• Performance is similar for both implementations

Memory Coloring Emulates the NUMA API
[Figure: % of memory reads satisfied by local DRAM, per benchmark, for the default, NUMA API, and mem. color API configurations]
• % of memory reads satisfied by NUMA-local DRAM for SciMark 2.0 benchmarks with each HotSpot configuration
• Performance with each implementation is (again) roughly the same

Memory Prioritization
• Example scenario:
  – Several applications compete over the same memory pool
  – Allocations for low-priority apps contend with the high-priority app
• No way for systems to directly prioritize memory
• memnice: restrict low-priority apps from using the entire pool of memory
  – Use colors to set priorities for virtual pages
  – Low-priority allocations are restricted to a subset of trays
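A toy allocator can illustrate the memnice idea, which restricts low-priority colors to a subset of trays. This simplified Python model assumes a fixed number of pages per tray and gives each restricted color a prefix of the trays. `TrayAllocator` and its methods are hypothetical. Where the model returns `None`, a real kernel would instead reclaim pages within the color's own trays rather than fail.

```python
from __future__ import annotations


class TrayAllocator:
    """Toy page allocator over trays with memnice-style restrictions."""

    def __init__(self, num_trays: int, pages_per_tray: int) -> None:
        self.free = [pages_per_tray] * num_trays  # free pages left per tray
        self.limit: dict[int, int] = {}           # color -> trays it may use

    def restrict(self, color: int, num_trays: int) -> None:
        """Confine a (low-priority) color to the first `num_trays` trays."""
        self.limit[color] = num_trays

    def alloc(self, color: int) -> int | None:
        """Allocate one page; return its tray, or None if the color's trays are full."""
        allowed = self.limit.get(color, len(self.free))
        for tray in range(allowed):
            if self.free[tray] > 0:
                self.free[tray] -= 1
                return tray
        # A real kernel would reclaim (e.g. swap out) pages within the
        # allowed trays here instead of failing the allocation.
        return None
```

In this model the low-priority color's footprint stops growing once its single tray is full. An unrestricted color can still use the remaining trays.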

Memory Prioritization
[Figure: free memory on node (in GB) vs. time (s) during kernel compilation, for the 2GB, 4GB, 8GB, and 16GB configurations]
• Run kernel compilation with different memory priority configurations
• Compute free memory on the node during each compilation using /proc
• Restricted compilations stop expanding their memory footprints, but take longer to complete

Enabling Power Consumption Proportional to the Active Footprint
[Figure: average DRAM power consumption (in W) vs. memory activated by scale_mem (in GB), for the default kernel (interleaving enabled), the default kernel (interleaving disabled), and the power-efficient custom kernel]
• Even a small memory footprint can prevent memory hardware units from transitioning to low-power states
• A custom workload (scale_mem) incrementally increases memory usage in 2GB steps

Enabling Power Consumption Proportional to the Active Footprint
[Figure: same DRAM power vs. scale_mem footprint plot as the previous slide]
• The default kernel yields high power consumption even with a small footprint
• Custom kernel: tray-based allocation enables power consumption proportional to the active footprint
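A toy model shows why placement, and not footprint size alone, determines how many power-manageable units stay active. It compares hardware interleaving, where consecutive chunks rotate across all units, with tray-based fill-first placement. The unit count, unit size, and interleave granularity are assumptions. The model does not capture the default kernel without interleaving, whose page allocator is not tray-aware.

```python
from __future__ import annotations

import math


def active_units(footprint_bytes: int, num_units: int, unit_bytes: int,
                 interleave_bytes: int | None) -> int:
    """Count power-manageable units holding part of a contiguous footprint.

    interleave_bytes: granularity at which physical memory rotates across
    units; None models tray-based fill-first placement instead.
    """
    if footprint_bytes <= 0:
        return 0
    if interleave_bytes is None:
        # Fill-first: a unit is used only once the previous ones are full.
        return min(num_units, math.ceil(footprint_bytes / unit_bytes))
    # Interleaved: every chunk lands on the next unit in turn.
    return min(num_units, math.ceil(footprint_bytes / interleave_bytes))
```

Assume eight hypothetical 2 GB units, and step the footprint from 2 GB to 12 GB in 2 GB increments, as scale_mem does. Under interleaving, all eight units stay active. Under fill-first placement, only one to six units are active, so the idle remainder can enter low-power states.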

Coloring Generational Garbage Collection to Reduce DRAM Power
[Figure: DRAM power (W) vs. time (s), default vs. color-aware]
• Memory power optimization with generational garbage collection:
  – Isolate the older generation on its own power-manageable unit
  – The older generation powers down during young-generation GC

Coloring Generational Garbage Collection to Reduce DRAM Power
[Figure: same DRAM power vs. time plot as the previous slide]
• DRAM power consumption for the derby benchmark
• Dips correspond to garbage collection
• Isolating the older generation saves about 9% power over the entire run
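A small Python model of tray placement for the two heap generations can check the isolation argument. It assumes page-granularity interleaving for the default layout. For the color-aware layout, it assumes the young generation fills trays from the bottom while the old generation is colored onto trays from the top. The function names are hypothetical. The model ignores the card-table scans of the old generation that a real young collection typically performs.

```python
from __future__ import annotations

import math


def place_generations(young_pages: int, old_pages: int, num_trays: int,
                      pages_per_tray: int, color_aware: bool) -> tuple[set[int], set[int]]:
    """Return (young_trays, old_trays): trays holding each generation's pages."""
    if color_aware:
        young_n = math.ceil(young_pages / pages_per_tray)
        old_n = math.ceil(old_pages / pages_per_tray)
        if young_n + old_n > num_trays:
            raise ValueError("generations do not fit on disjoint trays")
        # Young gen fills trays from the bottom; old gen is colored onto the top.
        young = set(range(young_n))
        old = set(range(num_trays - old_n, num_trays))
    else:
        # Page-granularity interleaving: consecutive pages rotate across all
        # trays, with the old gen laid out right after the young gen.
        young = {p % num_trays for p in range(min(young_pages, num_trays))}
        old = {(young_pages + p) % num_trays
               for p in range(min(old_pages, num_trays))}
    return young, old


def idle_during_young_gc(young: set[int], num_trays: int) -> set[int]:
    """Trays a young collection never touches (old-gen card scans ignored)."""
    return set(range(num_trays)) - young
```

With interleaving, a young collection touches every tray, so none can drop into a low-power state. With the color-aware layout, the old generation's tray stays idle during young collections, which is the mechanism this slide credits for the savings.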

Conclusion
• A critical first step toward meeting the need for fine-grained, power-aware, flexible provisioning of memory
• The initial implementation demonstrates value
  – But there is much more to be done
• Questions?

Backup

Future Work
• MPST
• Select a server-scale software package to implement color awareness
• Other optimizations:
  – Maximize performance
  – Application-guided read-ahead and/or fault-ahead
• Page recycling policies
  – Minimum residency time, capacity allocation, etc.
• Development of tools for instrumentation, analysis, and control

Default Linux Kernel
[Figure: application pages of different types (infrequently and frequently referenced) spread across the ranks of a node’s memory]
• Problem: the operating system does not see a distinction between:
  – different types of pages from the application
  – different units of memory that can be independently power managed

Custom Kernel with Memory Containerization
[Figure: application pages of different types (infrequently and frequently referenced) contained in separate parts of the node’s memory; labels: less power management, more power management, self refresh (idle) state]
Note: not drawn to scale; 10^6 4kB pages can be contained in a 4GB DIMM
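The page count in the note follows directly from the sizes, reading GB and kB as binary units:

```latex
\frac{4\,\mathrm{GB}}{4\,\mathrm{kB}} = \frac{2^{32}\,\mathrm{B}}{2^{12}\,\mathrm{B}} = 2^{20} = 1{,}048{,}576 \approx 10^{6}\ \text{pages}
```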

Controlling Memory
• Difficult to control the distribution of memory bandwidth, capacity, or power:
  – Temporal and spatial variations in application memory usage
  – Depends on how virtual memory binds to physical memory
• The layout of each application’s hot pages affects DRAM power and performance:
  – Low activity: condense hot pages onto a small set of ranks (reduce power)
  – High activity: spread pages across as many ranks as possible (maximize performance)
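One way to interpolate between the two layouts on this slide is to size the set of ranks holding hot pages by both capacity and bandwidth demand. The Python sketch below is a hypothetical policy, not the framework's. Per-rank capacity and bandwidth are assumed inputs. At low demand the policy condenses hot pages onto the minimum number of ranks. As demand grows, it spreads them until every rank is in use.

```python
from __future__ import annotations

import math


def ranks_for_hot_pages(hot_bytes: int, demand_gbps: float, rank_bytes: int,
                        rank_gbps: float, total_ranks: int) -> int:
    """Hypothetical policy: how many ranks should hold the hot pages."""
    # Capacity floor: enough ranks to hold every hot page.
    for_capacity = math.ceil(hot_bytes / rank_bytes)
    # Bandwidth floor: enough ranks that their combined bandwidth covers demand.
    for_bandwidth = math.ceil(demand_gbps / rank_gbps)
    return min(total_ranks, max(1, for_capacity, for_bandwidth))


def place_round_robin(num_pages: int, ranks: int) -> list[int]:
    """Spread hot pages evenly over the first `ranks` ranks."""
    return [page % ranks for page in range(num_pages)]
```

At the extremes this reduces to the slide's two cases. Low activity yields the capacity minimum (condense). Very high activity yields all ranks (spread).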

Our Approach
• Our approach: enable applications to guide physical memory management
