SLIDE 1
Managing Hybrid Memories by Predicting Object Write Intensity
Shoaib Akram, Kathryn S. McKinley, Jennifer B. Sartor, Lieven Eeckhout
Ghent University, Belgium
Shoaib.Akram@UGent.be
DRAM as main memory is facing multiple challenges
- High cost
SLIDE 2
SLIDE 3
Opportunity for new memory technologies to replace DRAM
Source: https://www.nextplatform.com/2015/07/29/scaling-the-growing-system-memory-hierarchy/
SLIDE 4
PCM cells have limited write endurance, which shortens the lifetime of PCM memory
[Figure: PCM cell operations, current (temperature) vs. time: Reset to amorphous (610°C), Set to crystalline (350°C), Read]
SLIDE 5
Hybrid memory combines the best of DRAM and PCM
       Speed  Endurance  Energy  Density
DRAM   ✔      ✔
PCM                      ✔       ✔
SLIDE 6
Future of main memory: limited DRAM, lots of PCM
This work uses DRAM for frequently written data
SLIDE 7
Garbage collection: a key advantage of using a managed language
- Memory is automatically reclaimed for reuse
- More than just reclamation: objects are also better organized in memory
SLIDE 8
Use GC to keep frequently written objects in DRAM
Reactive approach
- Monitors writes to objects
- More fine-grained than hardware and OS approaches
- No page migrations
Write-rationing garbage collection for hybrid memories, PLDI 2018
SLIDE 9
Use GC to keep frequently written objects in DRAM
Proactive approach
- Use a profile-guided predictor (this work)
SLIDE 10
Three offline steps in building a write intensity predictor
Application → Profiling → <Size, Type, Site, #writes> → Feature Selection → <Site, #writes> → Classification → <Site, advice>
SLIDE 11
Profiling methodology
- Java Virtual Machine: Jikes RVM (version 3.1.2), 4 MB nursery, 2 GB Mark-Sweep mature space
- Java applications: 9 from DaCapo, PseudoJBB 2005, default inputs
SLIDE 12
The outcome of profiling is a write intensity trace
For each unique object X, the trace records:
- 1. Size
- 2. Type
- 3. Allocation site <method-name, bytecode index>
- 4. # Writes
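A minimal sketch of the trace described above and of the feature-selection step that projects it onto a single feature. The record layout, the `project` helper, and the `Class.method@bci` site strings are hypothetical illustrations; the slides do not specify the trace format used with Jikes RVM.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, NamedTuple


class TraceEntry(NamedTuple):
    """One record of the write intensity trace (one unique object)."""
    size: int     # object size in bytes
    type: str     # object type (class name)
    site: str     # allocation site <method-name, bytecode index>, encoded as a string
    writes: int   # number of writes observed to the object


def project(trace: List[TraceEntry], feature: str) -> Dict[Hashable, List[int]]:
    """Group per-object write counts by one feature: 'size', 'type', or 'site'."""
    if feature not in ("size", "type", "site"):
        raise ValueError(f"unknown feature: {feature}")
    groups: Dict[Hashable, List[int]] = defaultdict(list)
    for entry in trace:
        groups[getattr(entry, feature)].append(entry.writes)
    return dict(groups)


# Hypothetical trace; type and site names are made up for illustration.
TRACE = [
    TraceEntry(12, "Node", "Foo.bar@4", 1000),
    TraceEntry(12, "Node", "Foo.bar@4", 1000),
    TraceEntry(65536, "byte[]", "Baz.qux@17", 1000),
    TraceEntry(32, "String", "Baz.qux@17", 0),
]

if __name__ == "__main__":
    # Keeping only the site yields the <Site, #writes> view used for classification.
    print(project(TRACE, "site"))
```

Grouping by `"size"` or `"type"` instead gives the inputs needed to compare the three candidate features.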
SLIDE 13
Measuring entropy of different features
Object  Size    # Writes
O1      12 B    1000
O2      12 B    1000
O3      64 KB   1000
O4      32 B
O5      32 B
Each size has an entropy of 0: all objects of a given size fall in the same class (write-intensive or not)
SLIDE 14
Measuring entropy of different features
Object  Size    # Writes
O1      12 B    1000
O2      12 B    1000
O3      64 KB   1000
O4      32 B    1000
O5      32 B
Size 32 B has an entropy of 1: one of its two objects is write-intensive and the other is not
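The per-value entropy in these two examples can be computed as the binary (Shannon) entropy of the write-intensive label among objects that share a feature value, H(v) = -Σ p·log2(p) over the two classes. This sketch assumes an object is write-intensive when it has at least 1 K writes (the threshold these slides use) and treats the blank write counts as 0.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Hashable, Iterable, Tuple

WRITE_THRESHOLD = 1000  # write intensity threshold = 1 K


def entropy_per_value(objects: Iterable[Tuple[Hashable, int]],
                      threshold: int = WRITE_THRESHOLD) -> Dict[Hashable, float]:
    """Binary entropy of the write-intensive label for each feature value.

    objects: (feature_value, writes) pairs, e.g. (size_in_bytes, writes).
    """
    labels: Dict[Hashable, Counter] = defaultdict(Counter)
    for value, writes in objects:
        labels[value][writes >= threshold] += 1  # True = write-intensive
    result: Dict[Hashable, float] = {}
    for value, counts in labels.items():
        total = sum(counts.values())
        h = 0.0
        for n in counts.values():  # every stored count is > 0
            p = n / total
            h -= p * math.log2(p)
        result[value] = h
    return result


# (size in bytes, # writes); blank write counts assumed to be 0.
SLIDE_13 = [(12, 1000), (12, 1000), (65536, 1000), (32, 0), (32, 0)]
SLIDE_14 = [(12, 1000), (12, 1000), (65536, 1000), (32, 1000), (32, 0)]

if __name__ == "__main__":
    print(entropy_per_value(SLIDE_13))
    print(entropy_per_value(SLIDE_14))
```

An entropy of 0 means the feature value alone determines write intensity; 1 is the worst case for a binary label.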
SLIDE 15
Homogeneity curves compare size vs. type vs. allocation site
[Figure: Homogeneity; % of heap volume (0% to 100%) vs. entropy (0 to 1) for size, type, and site]
Write intensity threshold = 1 K
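One plausible reading of these curves, stated as an assumption rather than the authors' exact definition: for each entropy value x, plot the percentage of heap volume (bytes) allocated by feature values whose entropy is at most x. A feature whose curve reaches close to 100% at low entropy is a homogeneous predictor of write intensity. The sketch reuses the 1 K write intensity threshold.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple

WRITE_THRESHOLD = 1000  # write intensity threshold = 1 K


def homogeneity_curve(
    objects: Iterable[Tuple[Hashable, int, int]],
    xs: Iterable[float] = tuple(i / 10 for i in range(11)),
    threshold: int = WRITE_THRESHOLD,
) -> List[Tuple[float, float]]:
    """Return (x, % of heap volume whose feature value has entropy <= x).

    objects: (feature_value, size_in_bytes, writes) triples.
    """
    counts: Dict[Hashable, List[int]] = defaultdict(lambda: [0, 0])  # [cold, hot]
    volume: Dict[Hashable, int] = defaultdict(int)
    for value, size, writes in objects:
        counts[value][1 if writes >= threshold else 0] += 1
        volume[value] += size

    # Binary entropy of the write-intensive label per feature value.
    entropy: Dict[Hashable, float] = {}
    for value, (cold, hot) in counts.items():
        n = cold + hot
        entropy[value] = -sum((k / n) * math.log2(k / n) for k in (cold, hot) if k)

    total = sum(volume.values())
    return [
        (x, 100.0 * sum(volume[v] for v, h in entropy.items() if h <= x) / total)
        for x in xs
    ]


# Slide 14 objects keyed by size: (feature_value=size, size_in_bytes, writes).
SLIDE_14_BY_SIZE = [(12, 12, 1000), (12, 12, 1000), (65536, 65536, 1000),
                    (32, 32, 1000), (32, 32, 0)]

if __name__ == "__main__":
    for x, pct in homogeneity_curve(SLIDE_14_BY_SIZE):
        print(f"entropy <= {x:.1f}: {pct:.3f}% of heap volume")
```

Running the same function with type or allocation site as the feature value gives one curve per feature, which is the comparison this slide draws.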
SLIDE 16
Heuristics to classify allocation sites as write-intensive or not
- Goals
- 1. Minimize DRAM utilization
- 2. Minimize PCM writes
- Parameters
- 1. Criterion for identifying write-intensive objects
- 2. Homogeneity threshold
SLIDE 17
Criterion #1: write frequency
Write frequency threshold = 1 K
Object  Site  Size (B)  # Writes  Meets criterion?
O1      A     12        1000      ✔
O2      A     12        1000      ✔
O3      A     65536     1000      ✔
O4      A     32                  ✗
O5      A     32                  ✗
SLIDE 18
Criterion #2: write density
Write density threshold = 1
Object  Site  Size (B)  # Writes  Meets criterion?
O1      A     12        1000      ✔
O2      A     12        1000      ✔
O3      A     65536     1000      ✗
O4      A     32                  ✗
O5      A     32                  ✗
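The two per-object criteria, applied to the table above. This sketch assumes write density means writes per byte of object size; that reading reproduces the check marks on both slides (O3 has 1000 writes but only about 0.015 writes per byte). Blank write counts are treated as 0.

```python
from typing import List, NamedTuple


class Obj(NamedTuple):
    name: str
    site: str
    size: int    # bytes
    writes: int


def by_write_frequency(o: Obj, threshold: int = 1000) -> bool:
    """Criterion #1: at least `threshold` writes (1 K on the slides)."""
    return o.writes >= threshold


def by_write_density(o: Obj, threshold: float = 1.0) -> bool:
    """Criterion #2 (assumed definition): writes per byte of object size >= threshold."""
    return o.writes / o.size >= threshold


OBJECTS: List[Obj] = [
    Obj("O1", "A", 12, 1000),
    Obj("O2", "A", 12, 1000),
    Obj("O3", "A", 65536, 1000),
    Obj("O4", "A", 32, 0),  # blank on the slides; assumed 0
    Obj("O5", "A", 32, 0),  # blank on the slides; assumed 0
]

if __name__ == "__main__":
    for o in OBJECTS:
        print(o.name, by_write_frequency(o), by_write_density(o))
```

Write density penalizes large objects such as O3, whose writes are spread thinly over many bytes; this matters because DRAM capacity, not just write count, is the scarce resource.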
SLIDE 19
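Criterion #1: write frequency
Write frequency threshold = 1 K
Homogeneity threshold = 50%
Object  Site  Size (B)  # Writes  Meets criterion?
O1      A     12        1000      ✔
O2      A     12        1000      ✔
O3      A     65536     1000      ✔
O4      A     32                  ✗
O5      A     32                  ✗
Site A is write-intensive: 3 of its 5 objects (60%) meet the criterion, at or above the 50% homogeneity threshold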
SLIDE 20
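Criterion #2: write density
Write density threshold = 1
Homogeneity threshold = 50%
Object  Site  Size (B)  # Writes  Meets criterion?
O1      A     12        1000      ✔
O2      A     12        1000      ✔
O3      A     65536     1000      ✗
O4      A     32                  ✗
O5      A     32                  ✗
Site A is NOT write-intensive: only 2 of its 5 objects (40%) meet the criterion, below the 50% homogeneity threshold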
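Combining a per-object criterion with the homogeneity threshold yields the site-level advice from these slides. This sketch assumes homogeneity is the fraction of a site's objects (by count) that meet the criterion; the slides do not say whether the fraction is by count or by bytes, and for site A both readings give the same answer. Density is again assumed to be writes per byte, and blank write counts are treated as 0.

```python
from collections import defaultdict
from typing import Callable, Dict, List, NamedTuple


class Obj(NamedTuple):
    site: str
    size: int    # bytes
    writes: int


def classify_sites(
    objects: List[Obj],
    is_write_intensive: Callable[[Obj], bool],
    homogeneity: float = 0.5,
) -> Dict[str, bool]:
    """Advise DRAM (True) for sites whose write-intensive fraction >= homogeneity."""
    hot: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for o in objects:
        total[o.site] += 1
        if is_write_intensive(o):
            hot[o.site] += 1
    return {site: hot[site] / total[site] >= homogeneity for site in total}


def write_frequency(o: Obj) -> bool:
    return o.writes >= 1000          # write frequency threshold = 1 K


def write_density(o: Obj) -> bool:
    return o.writes / o.size >= 1.0  # write density threshold = 1 (writes per byte, assumed)


SITE_A = [Obj("A", 12, 1000), Obj("A", 12, 1000), Obj("A", 65536, 1000),
          Obj("A", 32, 0), Obj("A", 32, 0)]

if __name__ == "__main__":
    print(classify_sites(SITE_A, write_frequency))  # criterion #1
    print(classify_sites(SITE_A, write_density))    # criterion #2
```

Lowering the homogeneity threshold (the later slides use 1%) sends a site to DRAM as soon as a small share of its objects is write-intensive, trading DRAM capacity for fewer PCM writes.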
SLIDE 21
Baseline generational heap organization
[Figure: nursery, mature, and large-object spaces, all in DRAM; accessed by the mutator and the GC]
SLIDE 22
Distribution of writes to objects
Empirical observations
- 1. Nursery is highly mutated
- 2. 2% of mature objects get 80% of writes
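The second observation is a skew measurement: sort objects by write count and find the smallest fraction of objects that together receive 80% of writes. A sketch on toy data (not the measured DaCapo or PseudoJBB profiles), using integer arithmetic so the 80% boundary is not affected by floating-point rounding:

```python
from typing import List


def min_fraction_for_share(writes: List[int], share_percent: int = 80) -> float:
    """Smallest fraction of objects that together receive >= share_percent of all writes."""
    total = sum(writes)
    if total == 0:
        return 0.0
    covered = 0
    # Take the most-written objects first.
    for k, w in enumerate(sorted(writes, reverse=True), start=1):
        covered += w
        if covered * 100 >= share_percent * total:
            return k / len(writes)
    return 1.0


# Toy data: 2 hot objects and 98 rarely written ones (1000 writes in total).
TOY_WRITES = [402, 402] + [2] * 98

if __name__ == "__main__":
    print(min_fraction_for_share(TOY_WRITES))
```

A small result (here 2% of objects) is what makes a selective DRAM placement worthwhile: keeping few objects in DRAM can absorb most writes.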
SLIDE 23
Generational heap organization in hybrid memory
[Figure: DRAM holds the nursery, a mature space, and a large-object space; PCM holds a second mature space and large-object space; accessed by the mutator and the GC]
SLIDE 24
PCM Writes vs. DRAM Utilization
[Figure: % writes to PCM vs. % heap in DRAM for the Write-Frequency and Write-Density criteria; labeled points include wf = 1, wf = 30K, wf = 50K, dcut = 1E-3, dcut = 0.2, dcut = 50]
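Each point on these curves pairs two numbers for one threshold setting (wf for the write-frequency threshold, dcut for the write-density cutoff). A sketch of how such a point could be computed from profiled objects and per-site advice, assuming each object stays in the memory its allocation site was assigned to and every write to it counts once; the objects and site names are toy data.

```python
from typing import Iterable, NamedTuple, Set, Tuple


class Obj(NamedTuple):
    site: str
    size: int    # bytes
    writes: int


def tradeoff_point(objects: Iterable[Obj], dram_sites: Set[str]) -> Tuple[float, float]:
    """Return (% of heap bytes placed in DRAM, % of writes that go to PCM)."""
    objs = list(objects)
    total_bytes = sum(o.size for o in objs)
    total_writes = sum(o.writes for o in objs)
    dram_bytes = sum(o.size for o in objs if o.site in dram_sites)
    pcm_writes = sum(o.writes for o in objs if o.site not in dram_sites)
    pct_dram = 100.0 * dram_bytes / total_bytes if total_bytes else 0.0
    pct_pcm_writes = 100.0 * pcm_writes / total_writes if total_writes else 0.0
    return pct_dram, pct_pcm_writes


TOY = [Obj("A", 100, 900), Obj("B", 300, 100)]

if __name__ == "__main__":
    print(tradeoff_point(TOY, {"A"}))
```

Advising more sites to DRAM raises DRAM utilization and lowers PCM writes; sweeping wf or dcut and recomputing the point traces out one curve per criterion.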
Homogeneity threshold = 1%
SLIDE 25
Allocation site predictor yields better tradeoffs than size and type
[Figure: PCM writes and DRAM utilization (% of mature) for the Size, Type, and Site predictors]
Homogeneity threshold = 1%, Write-Density (50)
SLIDE 26
The profile-guided predictor is more effective than existing work
[Figure: Normalized writes to PCM for Kingsguard-Writers vs. Write-Density, per benchmark: Lusearch, Pjbb, Lu.Fix, Avrora, Luindex, Hsqldb, Xalan, Sunflow, Pmd, Jython, Pmd.S, Fop, Antlr, Bloat]
SLIDE 27
What is missing in the workshop paper?
- Implementation details
- Compiler sets a bit in the object header
- GC chooses the correct allocator
- Big data benchmarks
- Emulation on a real NUMA machine
- Performance results
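The first implementation detail can be pictured with a toy model: the compiled allocation sequence stamps the site's advice into a header bit, and the collector reads that bit when copying nursery survivors to choose between the DRAM and PCM mature spaces. Everything here (the class names, the list-based spaces, and sending unprofiled sites to PCM) is a hypothetical illustration, not the Jikes RVM implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class HeapObject:
    site: str
    dram_bit: bool  # models the header bit set at the allocation site


@dataclass
class HybridHeap:
    advice: Dict[str, bool]  # offline <Site, advice> table; True = write-intensive
    nursery: List[HeapObject] = field(default_factory=list)
    dram_mature: List[HeapObject] = field(default_factory=list)
    pcm_mature: List[HeapObject] = field(default_factory=list)

    def allocate(self, site: str) -> HeapObject:
        # The site's advice is known when the allocation is compiled, so the
        # allocation sequence only writes the bit. Unprofiled sites default to PCM.
        obj = HeapObject(site, self.advice.get(site, False))
        self.nursery.append(obj)
        return obj

    def nursery_collection(self, survivors: Iterable[HeapObject]) -> None:
        # The GC picks the mature-space allocator from the header bit.
        for obj in survivors:
            (self.dram_mature if obj.dram_bit else self.pcm_mature).append(obj)
        self.nursery.clear()  # objects not listed as survivors are dead


if __name__ == "__main__":
    heap = HybridHeap({"A": True})
    a, b = heap.allocate("A"), heap.allocate("B")
    heap.nursery_collection([a, b])
    print(len(heap.dram_mature), len(heap.pcm_mature))
```

Because the decision is made once per allocation site, no per-object write monitoring is needed at run time.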
SLIDE 28
Conclusions
- Exploit GC to improve the lifetime of emerging memories
- Allocation sites correctly predict write intensity
- Use an allocation site predictor to eliminate a large number of writes to PCM
SLIDE 29
Challenge: limit # writes to PCM
Solution: Use DRAM for frequently written data
SLIDE 30
Online monitoring introduces mutator and GC overheads
[Figure: hybrid heap with nursery, mature, and large-object spaces in DRAM, mature and large-object spaces in PCM, and an observer space; accessed by the mutator]
SLIDE 31