A Framework for Application Guidance in Virtual Memory Systems
Presented by Michael Jantz
Contributions from Carl Strickland, Kshitij Doshi, Martin Dimitrov, and Karthik Kumar
Executive Summary
Memory has become a significant player in …
system, and hardware:
– Applications inform OS about memory usage
– Expose hardware power-manageable domains to the OS
virtual machine (Oracle/Sun’s HotSpot)
– In servers, memory power == 40% of total power
– Threads may be affinitized to individual cores or migrated between cores
– Threads may be prioritized for task deadlines (with nice)
– Individual cores may be turned off when unused
– Colors applied to sets of virtual pages that are alike
– Attributes associated with each color
– Hot and cold pages (frequency of access)
– Pages belonging to data structures with different usage patterns
Software Intent Color Tray Memory Allocation and Freeing
– ACPI 5.0 defines memory power state table (MPST)
[Figure labels: ranks R0 – R3 and R4 – R7; Node 0 and Node 1; trays T0 – T7; table of selectable mem. entries SOCKET_AFFINITY, EXCLUSIVE_MEM_UNIT, MEM_PRIORITY, …; trays and pages]
Physical memory allocation and recycling
Application uses color to indicate that this set of pages will be hot
Memory topology represented in the OS using trays
Lookup mem. mgmt. policy for pages with a particular color
[Figure labels: pages V1, V2, …, VN and P1, P2, …, PN; hot pages, warm pages, cold pages]
[Figure: performance of NUMA optim. relative to default, by benchmark; series: NUMA API]
implemented with (1) the NUMA API and (2) the memory coloring framework
[Figure: % memory reads satisfied by local DRAM, by benchmark; series: default, NUMA API]
benchmarks with each HotSpot configuration.
[Figure: free mem. on node (in GB) over time (s); series: 2GB, 4GB, 8GB, 16GB]
longer to complete
[Figure: … consumption (in W) vs. memory activated by scale_mem (in GB); series: default kernel (interleaving enabled), default kernel (interleaving disabled), power efficient custom kernel]
transitioning to low-power states
proportional to the active footprint
[Figure: DRAM power (W) over time (s); series: default, color-aware]
– Isolate older generation on its own power-manageable unit
– Older generation powers down during young generation GC
– But there is much more to be done
– Maximize performance
– Application-guided read-ahead and/or fault-ahead
– Minimum residency time, capacity allocation, etc.
Problem: Operating system does not see a distinction between:
[Figure labels: Application; pages of different types; frequently referenced; infrequently referenced; ranks; Node’s Memory]
[Figure labels: Application; pages of different types; frequently referenced; infrequently referenced; Node’s Memory; self refresh (idle) state; more power management; less power management]
Note: not drawn to scale. About 10^6 4kB pages can be contained in a 4GB DIMM.
– Temporal and spatial variations in application memory usage
– Depend on how virtual memory binds to physical memory
– Low activity: condense hot pages onto a small set of ranks (reduce power)
– High activity: spread pages across as many ranks as possible (maximize performance)
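The two placement goals above can be sketched as a rank-selection routine. This is a minimal illustration, not the framework's allocator: the names pick_rank, LOW_ACTIVITY, and HIGH_ACTIVITY are hypothetical. It assumes first-fit as the way to condense pages and least-loaded as the way to spread them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical activity levels, named after the two cases on the slide. */
enum activity { LOW_ACTIVITY, HIGH_ACTIVITY };

/* Pick the rank that should receive the next page.
 * used[r] is the number of pages already placed on rank r; capacity is the
 * number of pages one rank can hold. Returns -1 if every rank is full. */
int pick_rank(enum activity level, const size_t *used, size_t nranks,
              size_t capacity)
{
    int best = -1;

    assert(used != NULL || nranks == 0);
    for (size_t r = 0; r < nranks; r++) {
        if (used[r] >= capacity)
            continue;              /* rank is full, skip it */
        if (level == LOW_ACTIVITY)
            return (int)r;         /* first fit: fill low-numbered ranks first */
        if (best < 0 || used[r] < used[(size_t)best])
            best = (int)r;         /* remember the least-loaded rank so far */
    }
    return best;
}
```

Under low activity the higher-numbered ranks stay empty for as long as possible, which is what would let them remain in a low-power state. Under high activity each new page goes to the emptiest rank.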
– An application wants to manage different sets of pages differently
– An application wants to reduce the size of its memory footprint
– An application wants to control its page eviction
(equivalent thread level scenarios are currently possible for CPUs)
– No mechanism for the application to pass information about its memory references to the OS and hardware
– No mechanism for the OS to use this information to confine pages to different subsets of the total spatial capacity of memory
accessed (hot) and pages with relatively infrequent references (cold)
units than cold pages
    INTENT MEM-INTENSITY
    MEM-INTENSITY RED // hot pages
    MEM-INTENSITY BLUE 1 // cold pages
BLUE with the mcolor system call:
    addr = malloc(hot_object_size);
    mcolor(addr, hot_object_size, RED);
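The mcolor call is provided by the modified kernel described in this talk, so it cannot be exercised on a stock system. As a rough user-space sketch of the bookkeeping it implies, the code below records which color an address range was given and looks it up later. The names mcolor_stub, color_of, and MAX_RANGES are hypothetical. Tracking byte ranges rather than pages is a simplification made for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

enum color { NO_COLOR, RED, BLUE };   /* RED = hot, BLUE = cold, as on the slide */

#define MAX_RANGES 16                 /* hypothetical table size for the sketch */

struct colored_range {
    uintptr_t start;
    size_t len;
    enum color color;
};

static struct colored_range ranges[MAX_RANGES];
static size_t nranges;

/* User-space stand-in for mcolor(): remember the color of [addr, addr + len).
 * Returns 0 on success, -1 on bad arguments or a full table. */
int mcolor_stub(void *addr, size_t len, enum color c)
{
    assert(nranges <= MAX_RANGES);
    if (addr == NULL || len == 0 || nranges == MAX_RANGES)
        return -1;
    ranges[nranges].start = (uintptr_t)addr;
    ranges[nranges].len = len;
    ranges[nranges].color = c;
    nranges++;
    return 0;
}

/* Color of the most recently recorded range containing p, else NO_COLOR. */
enum color color_of(const void *p)
{
    uintptr_t a = (uintptr_t)p;

    for (size_t i = nranges; i-- > 0; ) {
        if (a >= ranges[i].start && a - ranges[i].start < ranges[i].len)
            return ranges[i].color;
    }
    return NO_COLOR;
}
```

In the real framework the kernel would consult the color when it chooses physical pages. Here the lookup only shows that the hint travels with the address range.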
– Systems include an API and toolkit for controlling memory placement on NUMA nodes
manageable units, but is flexible enough to emulate the functionality of the NUMA API.
– Intent: Restrict some virtual range to physical allocations from node 1, and some other virtual range to nodes 2 and 3
– Example mapping:
    SOCKET_AFFINITY_ABSOLUTE RED 1 /* allocate only from node 1 */
    SOCKET_AFFINITY_ABSOLUTE BLUE 2,3 /* allocate only from nodes 2&3 */
    SOCKET_AFFINITY_RELATIVE WHITE 1 /* allocate node local */
    SOCKET_AFFINITY_RELATIVE YELLOW /* allocate anywhere */
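One way to read the example mapping is as a per-color table that answers the question "may a page of this color come from this node?". The sketch below is an interpretation, not the framework's data structure. The type and function names are hypothetical. Because the slide does not say how the argument of SOCKET_AFFINITY_RELATIVE is encoded, the two relative colors are modeled simply as "node local" and "anywhere", following the slide's comments.

```c
#include <assert.h>
#include <stdbool.h>

enum color { RED, BLUE, WHITE, YELLOW, NCOLORS };

/* Hypothetical modes: an explicit node set, the allocating thread's own
 * node, or no restriction. */
enum mode { ABSOLUTE_NODES, NODE_LOCAL, ANY_NODE };

struct affinity {
    enum mode mode;
    unsigned node_mask;   /* bit n set = node n allowed (ABSOLUTE_NODES only) */
};

/* Table mirroring the four lines of the example mapping. */
static const struct affinity policy[NCOLORS] = {
    [RED]    = { ABSOLUTE_NODES, 1u << 1 },               /* node 1 only */
    [BLUE]   = { ABSOLUTE_NODES, (1u << 2) | (1u << 3) }, /* nodes 2 and 3 */
    [WHITE]  = { NODE_LOCAL, 0 },                         /* node local */
    [YELLOW] = { ANY_NODE, 0 },                           /* anywhere */
};

/* May a page with color c be allocated from `node` when the allocating
 * thread currently runs on `local_node`? */
bool node_allowed(enum color c, unsigned node, unsigned local_node)
{
    assert(c < NCOLORS && node < 32);
    switch (policy[c].mode) {
    case ABSOLUTE_NODES:
        return (policy[c].node_mask >> node) & 1u;
    case NODE_LOCAL:
        return node == local_node;
    case ANY_NODE:
        return true;
    }
    return false;
}
```

A table like this keeps the policy separate from the allocator: changing where RED pages may live means editing one row, not the allocation path.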
access locality.
use the object.
– “Eden” space is divided into different regions per NUMA node and the physical memory corresponding to each eden region is bound to a particular NUMA node (via the NUMA API)
– Application’s newly allocated objects are placed into the eden space local to the allocating thread.
region with the appropriate SOCKET_AFFINITIZATION color
designed a “power-efficient” memory management configuration
furnished a page for similar use.
– In this way, the total number of additional ranks that need to stay powered up is reduced
– Allocate increasing amounts of memory in stages
– Each stage allocates enough additional memory to fit in exactly one rank (2GB, in our case)
– Each stage continuously reads and writes the allocated memory and lasts for 100 seconds.
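The staged workload described above can be approximated with a short driver. This is a sketch written from the bullet points, not the authors' scale_mem source. The function names and the choice of one buffer per stage are assumptions. Elapsed time is checked once per pass over the allocated memory, at one-second resolution.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Read and write every byte of the first nbufs buffers once. The returned
 * checksum keeps the compiler from discarding the reads. */
static uint64_t touch_all(unsigned char **bufs, size_t nbufs, size_t stage_bytes)
{
    uint64_t sum = 0;

    for (size_t b = 0; b < nbufs; b++) {
        for (size_t i = 0; i < stage_bytes; i++) {
            bufs[b][i] = (unsigned char)(bufs[b][i] + 1);  /* read, then write */
            sum += bufs[b][i];
        }
    }
    return sum;
}

/* Hypothetical driver: each stage adds stage_bytes to the footprint, then
 * keeps touching everything allocated so far for about stage_seconds.
 * Returns the number of stages completed (stops early if allocation fails). */
size_t scale_mem_sketch(size_t stage_bytes, size_t nstages, double stage_seconds)
{
    unsigned char **bufs;
    volatile uint64_t sink = 0;
    size_t done = 0;

    assert(stage_bytes > 0);
    bufs = calloc(nstages, sizeof *bufs);
    if (bufs == NULL)
        return 0;

    for (size_t s = 0; s < nstages; s++) {
        bufs[s] = calloc(stage_bytes, 1);
        if (bufs[s] == NULL)
            break;
        time_t start = time(NULL);
        do {
            sink += touch_all(bufs, s + 1, stage_bytes);
        } while (difftime(time(NULL), start) < stage_seconds);
        done++;
    }

    for (size_t s = 0; s < done; s++)
        free(bufs[s]);
    free(bufs);
    (void)sink;
    return done;
}
```

With the settings from the bullets, stage_bytes would be (size_t)2 << 30 and stage_seconds would be 100.0. The number of stages is left to the caller.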