
On Automated Feedback-Driven Data Placement in Multi-tiered Memory


  1. On Automated Feedback-Driven Data Placement in Multi-tiered Memory • T. Chad Effler¹, Adam P. Howard¹, Tong Zhou¹, Michael R. Jantz¹, Kshitij A. Doshi², and Prasad A. Kulkarni³ • ¹ University of Tennessee, {teffler,ahoward,tzhou9,mrjantz}@utk.edu • ² Intel Corporation, kshitij.a.doshi@intel.com • ³ University of Kansas, kulkarni@ittc.ku.edu

  2. The Problem • Multi-Tiered Memory Hierarchies • Different Capacities • Different Performance • Cross Layer Data Management for Heterogeneous Tiers • Match Memory Needs Efficiently • Optimality • Transparency • Simplicity

  3. Current Solutions • Hardware Managed Caching • Non-Flexible • Large Architectural Costs • OS Managed Data Placement • Reactive • Relies on Non-Standard Hardware • Developer Managed Data Placement

  4. Feedback-Driven Data Placement • Allocation Site Partitioning • Knapsack • Hotset • Profile-Guided Management • Static Arena Allocation • Phase-based Arena Allocation

  5. Collecting Application Guidance • Track by allocation sites • Site == call path up to malloc, calloc, etc. • Intuition 1: pedigree is a good predictor • Intuition 2: profile transferability

  6. Collecting Application Guidance • Track by allocation sites • Site == call path up to malloc, calloc, etc. • Intuition 1: pedigree is a good predictor • Intuition 2: profile transferability • Profile = { ∀ S : < peak RSS, # post-cache accesses > }
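The profile on this slide can be pictured with a toy recorder. This is a minimal sketch, not the authors' tooling: the `Profiler` class and its methods are hypothetical, the site key is simply the tuple of Python function names on the call path, bytes allocated stand in for peak RSS, and every access is counted (the real profile counts only accesses that miss the cache).

```python
import traceback
from collections import defaultdict


class Profiler:
    """Toy per-site profiler; a site is the call path leading to alloc()."""

    def __init__(self):
        self.live = defaultdict(int)      # site -> bytes currently allocated
        self.peak = defaultdict(int)      # site -> peak bytes (stand-in for peak RSS)
        self.accesses = defaultdict(int)  # site -> access count (assumed post-cache)
        self.owner = {}                   # handle -> (site, size)

    def _site(self):
        # Drop the two innermost frames (_site and alloc); the remaining
        # function names form the path up to the allocation call.
        frames = traceback.extract_stack()[:-2]
        return tuple(f.name for f in frames)

    def alloc(self, handle, size):
        site = self._site()
        self.owner[handle] = (site, size)
        self.live[site] += size
        self.peak[site] = max(self.peak[site], self.live[site])

    def free(self, handle):
        site, size = self.owner.pop(handle)
        self.live[site] -= size

    def access(self, handle, count=1):
        self.accesses[self.owner[handle][0]] += count

    def profile(self):
        # Profile = { site : (peak bytes, access count) }
        return {s: (self.peak[s], self.accesses[s]) for s in self.peak}


profiler = Profiler()


def build_table():
    profiler.alloc("table", 100)


def build_index():
    profiler.alloc("idx_a", 50)
    profiler.alloc("idx_b", 30)


build_table()
build_index()
profiler.access("table", 10)
profiler.access("idx_a")
profiler.free("idx_b")
summary = sorted(profiler.profile().values())
```

Both allocations inside `build_index` share one site because only function names are compared; a tool working on return addresses would likely tell them apart.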

  7. Collecting Application Guidance • Track by allocation sites • Site == call path up to malloc, calloc, etc. • Intuition 1: pedigree is a good predictor • Intuition 2: profile transferability • Profile = { ∀ S : < peak RSS, # post-cache accesses > } • Partition sites into hot and cold sets • Knapsack, Hotset
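The two partitioning strategies named on this slide might look roughly like the sketch below. The slides give no details, so both functions are assumptions: `knapsack_hot_set` is a textbook 0/1 knapsack that maximizes accesses under a hard capacity, and `hotset_hot_set` is a greedy pass over sites ranked by accesses per unit of size that treats the capacity as a soft limit. The profile values and the capacity are made up for illustration.

```python
def knapsack_hot_set(profile, capacity):
    """0/1 knapsack: maximize total accesses with total size <= capacity."""
    # best[c] = (accesses, chosen sites) achievable within c units of capacity
    best = [(0, frozenset())] * (capacity + 1)
    for site, (size, accesses) in profile.items():
        # Iterate capacities downward so each site is used at most once.
        for c in range(capacity, size - 1, -1):
            candidate = best[c - size][0] + accesses
            if candidate > best[c][0]:
                best[c] = (candidate, best[c - size][1] | {site})
    return set(best[capacity][1])


def hotset_hot_set(profile, capacity):
    """Greedy: take the densest sites until the capacity is reached (soft limit)."""
    def density(site):
        size, accesses = profile[site]
        return accesses / max(size, 1)

    hot, used = set(), 0
    for site in sorted(profile, key=density, reverse=True):
        if used >= capacity:
            break
        hot.add(site)
        used += profile[site][0]  # may overshoot: the limit is soft by assumption
    return hot


# Hypothetical profile: site -> (peak size in arbitrary units, access count)
profile = {
    "S1": (4, 10), "S2": (3, 90), "S3": (5, 20),
    "S4": (2, 5), "S5": (6, 12), "S6": (2, 80),
}
knapsack_sites = knapsack_hot_set(profile, capacity=5)
hotset_sites = hotset_hot_set(profile, capacity=5)
```

With these invented numbers both strategies pick S2 and S6 as the hot set; on other profiles they can disagree, since the greedy pass may overshoot the capacity while the knapsack never does.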

  8. Applying Application Guidance • Profile-guided allocation into arenas • Static: one arena for each tier • [Figure: allocation guidance marks sites S1, S3, S4, and S5 as cold (arena A1) and sites S2 and S6 as hot (arena A0); the application address space holds arenas A0 and A1; hardware: CPU 0, MCDRAM, DDR.]
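The static scheme on this slide amounts to a table lookup at allocation time. The sketch below is illustrative only: the `arena_for` helper is hypothetical, the guidance table copies the hot/cold labels from the slide, and sending unprofiled sites to the cold arena is an assumption. Binding A0 to MCDRAM and A1 to DDR is the natural reading of the figure but is not stated in the transcript.

```python
HOT_ARENA = "A0"   # assumed to be backed by the fast tier (MCDRAM)
COLD_ARENA = "A1"  # assumed to be backed by the slow tier (DDR)

# Guidance as shown on the slide: site -> "hot" or "cold"
GUIDANCE = {
    "S1": "cold", "S2": "hot", "S3": "cold",
    "S4": "cold", "S5": "cold", "S6": "hot",
}


def arena_for(site, guidance=GUIDANCE):
    """Pick the arena for an allocation; unknown sites default to cold (assumption)."""
    return HOT_ARENA if guidance.get(site, "cold") == "hot" else COLD_ARENA


# Group the slide's sites by the arena they would allocate from.
arena_contents = {HOT_ARENA: [], COLD_ARENA: []}
for site in sorted(GUIDANCE):
    arena_contents[arena_for(site)].append(site)
```

The resulting grouping matches the figure: A0 holds sites 2 and 6, and A1 holds sites 1, 3, 4, and 5.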

  9. Applying Application Guidance • Profile-guided allocation into arenas • Per-phase: one arena for each phase signature • [Figure, phase 0: allocation guidance assigns each site a phase signature and arena: S1: 10001 (A0), S2: 01111 (A1), S3: 10001 (A0), S4: 11010 (A2), S5: 10100 (A3), S6: 11010 (A2); in the application address space, arenas A1: 01111 and A0: 10001 are grouped together, as are A3: 10100 and A2: 11010; hardware: CPU 0, MCDRAM, DDR.]

  10. Applying Application Guidance • Profile-guided allocation into arenas • Per-phase: one arena for each phase signature • [Figure, phase 1: same site signatures and arenas as in phase 0: S1: 10001 (A0), S2: 01111 (A1), S3: 10001 (A0), S4: 11010 (A2), S5: 10100 (A3), S6: 11010 (A2); in the application address space, arenas A1: 01111 and A2: 11010 are now grouped together, as are A3: 10100 and A0: 10001; hardware: CPU 0, MCDRAM, DDR.]
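One way to read the phase signatures on these two slides is as bit vectors with one bit per phase, where a set bit means the arena's data is hot in that phase. The sketch below rests on that assumption, and on the further assumption that phase 0 is the rightmost bit; the transcript does not state either. The function names are hypothetical; the signatures are copied from the slides.

```python
# Site signatures as shown on the slides.
SITE_SIGNATURE = {
    "S1": "10001", "S2": "01111", "S3": "10001",
    "S4": "11010", "S5": "10100", "S6": "11010",
}


def build_arenas(site_signature):
    """Create one arena per distinct signature, numbered in first-seen order."""
    arena_of_signature = {}
    for site in sorted(site_signature):
        sig = site_signature[site]
        if sig not in arena_of_signature:
            arena_of_signature[sig] = f"A{len(arena_of_signature)}"
    return arena_of_signature


def hot_arenas(arena_of_signature, phase):
    """Arenas whose signature has the bit for `phase` set.

    Assumption: bits are indexed from the right, so phase 0 is the last character.
    """
    return {
        arena
        for sig, arena in arena_of_signature.items()
        if sig[-1 - phase] == "1"
    }


arenas = build_arenas(SITE_SIGNATURE)
phase0_hot = hot_arenas(arenas, 0)
phase1_hot = hot_arenas(arenas, 1)
```

Under this reading the hot arenas are A0 and A1 in phase 0 and A1 and A2 in phase 1, which lines up with the arena groupings drawn on the two slides.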

  11. Simulation Framework • Marena – arena allocation library • Memtracer – Pin-based instrumentation tool • Ramulator – cycle-accurate DRAM simulator

  12. Marena • Arena-based allocator • Built on jemalloc • Allocation-site guidance

  13. Memtracer • Pin-based instrumentation tool • Profiling • Generates trace files

  14. Ramulator • Memory simulator • Trace-based, cycle-accurate modeling • Modified to support tiered memory simulation

  15. Framework • [Figure: framework diagram. Labeled components: executable, malloc() and free(), arena allocator, memory usage, allocation guidance, Pin, instruction trace, and Ramulator (CPU, memory controller, HBM, DDR).]

  16. Evaluation • Benchmarks: SPEC CPU 2006 • Multi-tier configuration: • HBM-DDR • HBM capacity = 12.5% of upper-level memory

  17. Evaluation • Baseline Modes • Cache mode • Static First Touch (FT) • HBM/DDR only • Feedback Guidance Directed Modes • Static Train • Static Ref • Adaptive Ref • Reactive Profiling • First Touch Hot Page (FTHP)
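As a point of comparison for the guided modes, the static first-touch baseline can be sketched as follows. This is a minimal sketch under an assumed reading of "first touch": pages go to HBM in the order they are first accessed until HBM is full, later pages go to DDR, and nothing migrates afterwards. The function name, trace format, and capacity are hypothetical.

```python
def first_touch_placement(page_trace, hbm_pages):
    """Assign each page a tier on its first access; placements never change."""
    placement = {}
    hbm_used = 0
    for page in page_trace:
        if page in placement:
            continue  # already placed on an earlier touch
        if hbm_used < hbm_pages:
            placement[page] = "HBM"
            hbm_used += 1
        else:
            placement[page] = "DDR"
    return placement


# Hypothetical trace of page numbers with room for two pages in HBM.
placement = first_touch_placement([7, 3, 7, 9, 3, 1], hbm_pages=2)
```

Here pages 7 and 3 land in HBM simply because they were touched first, regardless of how hot pages 9 and 1 turn out to be, which is the behavior the profile-guided modes aim to improve on.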

  18. HBM-DDR3: baseline performance • [Figure: bar chart of IPC relative to DDR3-only (y-axis, 0.0 to 4.0) for HBM-only, static-FT, and cache-mode. Panel "512 KB cache": bzip2, gcc, mcf, milc, cactusADM, leslie3d, gobmk, soplex, hmmer, GemsFDTD, libquantum, h264ref, lbm, sphinx3, average. Panel "8 MB cache": mcf, milc, cactusADM, leslie3d, soplex, GemsFDTD, libquantum, lbm, average.]

  19. Performance of static guidance strategies • [Figure: bar chart of IPC relative to DDR3-only (y-axis, 0.0 to 3.0) for static-ref, static-train, and static-FT. Panel "512 KB cache": bzip2, gcc, mcf, milc, cactusADM, leslie3d, gobmk, soplex, hmmer, GemsFDTD, libquantum, h264ref, lbm, sphinx3, average. Panel "8 MB cache": mcf, milc, cactusADM, leslie3d, soplex, GemsFDTD, libquantum, lbm, average.]

  20. Performance of static and adaptive policies • [Figure: bar chart of IPC relative to DDR3-only (y-axis, 0.0 to 3.0) for FTHP, adaptive-ref, and static-ref. Panel "512 KB cache": bzip2, gcc, mcf, milc, cactusADM, leslie3d, gobmk, soplex, hmmer, GemsFDTD, libquantum, h264ref, lbm, sphinx3, average. Panel "8 MB cache": mcf, milc, cactusADM, leslie3d, soplex, GemsFDTD, libquantum, lbm, average.]
