Managed Language Applications Forrest J. Robinson Michael R. Jantz - PowerPoint PPT Presentation

Cross-Layer Memory Management for Managed Language Applications
Forrest J. Robinson (University of Kansas), Michael R. Jantz (University of Tennessee), Kshitij A. Doshi (Intel Corporation), Prasad A. Kulkarni (University of Kansas)
mrjantz@utk.edu, kshtiji.a.doshi@intel.com, {fjrobinson,kulkarni}@ku.edu

Memory Power Management
• Memory has become a significant factor in power and performance
  – Memory power is a dominant factor in servers [1,2,3,4]
• Hardware can automatically power down individual memory modules
• Memory power management is challenging
  – Even a small memory footprint can be spread across multiple devices
  – Different memory regions can have different requirements

Example Scenario
• Server system running a database workload with 1TB of DRAM
  – All memory is in use, but only 2% of pages are accessed frequently
  – CPU utilization is low
• How can power consumption be reduced?

A Collaborative Approach to Memory Management
• Effective memory management is difficult due to virtualization of memory
• We propose a collaborative approach:
  – Applications: communicate memory usage intent to the OS
  – OS: interprets application intent and manages physical memory over hardware units
  – Hardware: communicates hardware layout to the OS to guide memory management decisions

Application Guidance in the Linux Kernel
• Implemented by re-architecting a recent Linux kernel
  – Applications pass guidance to the OS by coloring virtual address ranges through a system call interface
  – OS organizes physical memory into software structures (trays) that correspond to hardware memory devices
• Limitations of our Linux kernel-based framework:
  – Little understanding of what kind of guidance will be most useful for existing workloads
  – All hints must be manually inserted into source code
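The effect of organizing colored pages onto trays can be illustrated with a toy model. This is only a sketch under stated assumptions, not the kernel implementation: `place_pages`, `trays_with_hot_pages`, the "hot"/"cold" color strings, and the tray sizes are all hypothetical names and values chosen for illustration. It compares round-robin interleaving with a color-aware placement that packs hot pages into as few trays as possible.

```python
from collections import defaultdict


def place_pages(page_colors, num_trays, pages_per_tray, color_aware):
    """Assign each page index to a tray and return {tray: [page indices]}.

    color_aware=False interleaves pages round-robin across all trays.
    color_aware=True fills trays in order, hot pages first, then cold pages.
    (Hypothetical model; a real kernel tracks far more state.)
    """
    if len(page_colors) > num_trays * pages_per_tray:
        raise ValueError("not enough physical memory for all pages")
    trays = defaultdict(list)
    if not color_aware:
        for page in range(len(page_colors)):
            trays[page % num_trays].append(page)
        return dict(trays)
    # Stable sort: hot pages (key False) come before cold pages (key True).
    ordered = sorted(range(len(page_colors)),
                     key=lambda p: page_colors[p] != "hot")
    for slot, page in enumerate(ordered):
        trays[slot // pages_per_tray].append(page)
    return dict(trays)


def trays_with_hot_pages(trays, page_colors):
    """Count trays holding at least one hot page (these must stay powered up)."""
    return sum(1 for pages in trays.values()
               if any(page_colors[p] == "hot" for p in pages))


if __name__ == "__main__":
    colors = ["hot" if i % 7 == 0 else "cold" for i in range(400)]
    for aware in (False, True):
        layout = place_pages(colors, num_trays=4, pages_per_tray=100,
                             color_aware=aware)
        print(aware, trays_with_hot_pages(layout, colors))
```

In this model, trays without hot pages are the candidates for low-power states. Packing hot pages together also concentrates traffic on fewer devices, which is the trade-off the MemBench bandwidth results explore.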

Automatic Guidance in the Application Layer
• Our approach: integrate with an automated mechanism to generate guidance for the OS
  – No source code modifications or recompilations
• Implemented in the HotSpot JVM
  – Create separate heap regions for different usage patterns
  – Instrumentation and analysis to build a memory profile
  – Partition/allocate live objects into separate regions according to a partitioning strategy
  – Communicate heap region information to the OS

[Figure: JVM diagram. Execution engine: object profiling and analysis, JIT compiler, garbage collection. Application heap: young generation (hot eden, cold eden, hot survivors, cold survivors) and tenured generation (hot tenured, cold tenured).]
• Employ the default HotSpot configuration for server-class applications
• Divide survivor / tenured spaces into spaces for hot / cold objects

[Figure: execution engine and partitioned application heap, repeated from the previous slide]
• Partition allocation sites and objects into hot / cold sets
• Color spaces on creation or resize
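One way to picture partitioning allocation sites into hot and cold sets is a simple ranking over a memory profile. The sketch below is an assumption for illustration only, not the partitioning strategy used in the custom HotSpot JVM: the function name, the profile layout (`site -> (bytes_allocated, accesses)`), the site names, and the `access_coverage` threshold are all hypothetical.

```python
def partition_allocation_sites(profile, access_coverage=0.9):
    """Split allocation sites into (hot, cold) sets.

    profile maps a site name to (bytes_allocated, accesses).
    Sites are ranked by accesses per allocated byte, and the densest
    sites are marked hot until they cover `access_coverage` of all
    profiled accesses. Everything else is cold.
    """
    total_accesses = sum(accesses for _, accesses in profile.values())
    if total_accesses == 0:
        return set(), set(profile)
    ranked = sorted(
        profile,
        key=lambda site: profile[site][1] / max(profile[site][0], 1),
        reverse=True,
    )
    hot, covered = set(), 0
    for site in ranked:
        if covered >= access_coverage * total_accesses:
            break
        hot.add(site)
        covered += profile[site][1]
    return hot, set(profile) - hot


if __name__ == "__main__":
    # Made-up profile data for three hypothetical allocation sites.
    profile = {
        "Index.<init>:12": (1_000, 95_000),
        "Cache.put:40": (4_000, 4_000),
        "Log.append:7": (100_000, 1_000),
    }
    hot, cold = partition_allocation_sites(profile)
    print(sorted(hot), sorted(cold))
```

Ranking by access density rather than raw access count keeps the hot set small in bytes, which matters when the goal is to confine hot data to few memory devices.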

Potential of JVM Framework
• Our goal: evaluate power-saving potential when hot / cold objects are known statically
• MemBench: Java benchmark that uses different object types for hot / cold memory
• "HotObject" and "ColdObject"
  – Contain memory resources (array of integers)
  – Implement different functions for accessing memory

Experimental Platform
• Hardware
  – Single node of a 2-socket server machine
  – Processor: Intel Xeon E5-2620 (12 threads @ 2.1GHz)
  – Memory: 32GB DDR3 (four DIMMs, each connected to its own channel)
• Operating System
  – CentOS 6.5 with Linux 2.6.32
• HotSpot JVM
  – v. 1.6.0_24, 64-bit
  – Default configuration for server-class applications

The MemBench Benchmark
• Object allocation
  – Creates "HotObject" and "ColdObject" objects in a large in-memory array
  – Fewer hot objects than cold objects (hot objects are ~15% of all objects)
  – Object array occupies most (~90%) of system memory
• Multi-threaded object access
  – Object array divided into 12 separate parts, each passed to its own thread
  – Each thread iterates over its part of the object array, accessing only hot objects
• Optional delay parameter
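The access pattern described above can be sketched in a few lines. MemBench itself is a Java benchmark whose source is not shown here, so the following Python model is an assumption throughout: the class bodies, `build_objects`, `run`, the busy-wait delay, and the tiny sizes are hypothetical and only mirror the structure (hot and cold objects in one array, split into parts, one thread per part, only hot objects touched).

```python
import threading
import time


class HotObject:
    """Holds an integer array that the worker threads access."""

    def __init__(self, ints_per_object):
        self.data = [0] * ints_per_object

    def touch(self):
        for i in range(len(self.data)):
            self.data[i] += 1


class ColdObject:
    """Holds an integer array that is allocated but never accessed."""

    def __init__(self, ints_per_object):
        self.data = [0] * ints_per_object


def build_objects(count, ints_per_object):
    # 3 of every 20 objects are hot, i.e. 15% of all objects.
    return [HotObject(ints_per_object) if i % 20 < 3
            else ColdObject(ints_per_object) for i in range(count)]


def _spin(delay_ns):
    # Busy-wait; only a coarse stand-in for a nanosecond-scale delay.
    end = time.perf_counter_ns() + delay_ns
    while time.perf_counter_ns() < end:
        pass


def _worker(part, passes, delay_ns):
    for _ in range(passes):
        for obj in part:
            if isinstance(obj, HotObject):
                obj.touch()
                if delay_ns:
                    _spin(delay_ns)


def run(objects, num_threads=12, passes=1, delay_ns=0):
    """Split the array into contiguous parts and access each in its own thread."""
    chunk = (len(objects) + num_threads - 1) // num_threads
    threads = [
        threading.Thread(
            target=_worker,
            args=(objects[i * chunk:(i + 1) * chunk], passes, delay_ns),
        )
        for i in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


if __name__ == "__main__":
    objs = build_objects(count=200, ints_per_object=4)
    run(objs, num_threads=12, passes=2, delay_ns=1000)
    print(sum(isinstance(o, HotObject) for o in objs), "hot objects touched")
```

Each object belongs to exactly one part, so the threads never write to the same object and no locking is needed.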

MemBench Configurations
• Three configurations
  – Default
  – Tray-based kernel (custom kernel, default HotSpot)
  – Hot/cold organize (custom kernel, custom HotSpot)
• Delay varied from "no delay" to 1000ns
  – With no delay, 85ns between memory accesses

MemBench Performance
[Figure: runtime relative to default (P(X) / P(DEF)) and memory bandwidth (GB/s) versus time between memory accesses (85 to 1000 ns), for the default, tray-based kernel, and hot/cold organize configurations]
• Tray-based kernel has about the same performance as default
• Hot/cold organize exhibits poor performance with low delay

MemBench Bandwidth
[Figure: runtime relative to default (P(X) / P(DEF)) and memory bandwidth (GB/s) versus time between memory accesses (85 to 1000 ns), for the default, tray-based kernel, and hot/cold organize configurations]
• Default and tray-based kernel produce high memory bandwidth when delay is low
• Placement of hot objects across multiple channels enables higher bandwidth

MemBench Bandwidth
[Figure: runtime relative to default (P(X) / P(DEF)) and memory bandwidth (GB/s) versus time between memory accesses (85 to 1000 ns), for the default, tray-based kernel, and hot/cold organize configurations]
• Hot/cold organize: hot objects are co-located on a single channel
• Increased delay reduces the bandwidth requirements of the workload

MemBench Energy
[Figure: energy consumed relative to default (J(X) / J(DEF)) versus time between memory accesses (85 to 1000 ns), for tray-based kernel (DRAM only), tray-based kernel (CPU+DRAM), hot/cold organize (DRAM only), and hot/cold organize (CPU+DRAM)]
• Significant energy savings potential with custom JVM
• Max. DRAM energy savings of ~39%, max. CPU+DRAM energy savings of ~15%
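The energy plot reports each configuration relative to the default, J(X) / J(DEF), so a savings percentage is one minus that ratio. A minimal sketch of the arithmetic follows; the function names are hypothetical and the joule values in the usage example are made up to reproduce a 0.61 ratio, not measurements from the slides.

```python
def relative_energy(joules_x, joules_default):
    """Energy of configuration X relative to the default: J(X) / J(DEF)."""
    if joules_default <= 0:
        raise ValueError("default energy must be positive")
    return joules_x / joules_default


def savings_percent(joules_x, joules_default):
    """Percentage of energy saved by configuration X versus the default."""
    return (1.0 - relative_energy(joules_x, joules_default)) * 100.0


if __name__ == "__main__":
    # Illustrative numbers only: a ratio of 0.61 corresponds to ~39% savings.
    print(round(savings_percent(61.0, 100.0), 1))
```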

Results Summary
• Object partitioning strategies
  – Offline approach partitions allocation points
  – Online approach uses sampling to predict object access patterns
• Evaluate with standard sets of benchmarks
  – DaCapo, SciMark
• Achieve 10% average DRAM energy savings, 2.8% CPU+DRAM reduction
• Performance overhead
  – 2.2% for offline, 5% for online

Current and Future Projects in Cross-Layer Memory Management
• Improve performance and efficiency
  – Reduce overhead of online sampling
  – Automatic bandwidth management
• Applications for heterogeneous memory architectures
• Exploit data object placement within each page to improve efficiency

Conclusions
• Achieving power/performance efficiency in memory requires a cross-layer approach
• First framework to utilize usage patterns of application objects to steer low-level memory management
• Approach shows promise for reducing DRAM energy
• Opens several avenues for future research in collaborative memory management

Questions?

References
1. C. Lefurgy, K. Rajamani, F. Rawson, W. Felter, M. Kistler, and T. W. Keller. Energy management for commercial servers. Computer, 36(12):39-48, Dec. 2003.
2. Urs Hoelzle and Luiz Andre Barroso. The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines. Morgan and Claypool Publishers, 1st edition, 2009.
3. Kevin Lim, Jichuan Chang, Trevor Mudge, Parthasarathy Ranganathan, Steven K. Reinhardt, and Thomas F. Wenisch. Disaggregated memory for expansion and sharing in blade servers. In Proceedings of the 36th Annual International Symposium on Computer Architecture, ISCA '09, pages 267-278, New York, NY, USA, 2009. ACM.
4. Krishna T. Malladi, Benjamin C. Lee, Frank A. Nothaft, Christos Kozyrakis, Karthika Periyathambi, and Mark Horowitz. Towards energy-proportional datacenter memory with mobile DRAM. In Proceedings of the 39th Annual International Symposium on Computer Architecture, ISCA '12, pages 37-48, Washington, DC, USA, 2012. IEEE Computer Society.
