Managing Hybrid Memories by Predicting Object Write Intensity



slide-1
SLIDE 1

Managing Hybrid Memories by Predicting Object Write Intensity

Shoaib Akram, Kathryn S. McKinley, Jennifer B. Sartor, Lieven Eeckhout
Ghent University, Belgium
Shoaib.Akram@UGent.be

slide-2
SLIDE 2

DRAM as main memory is facing multiple challenges

  • Cost is high when scaling to 100s of GB
  • Reliability is a concern because the stored charge is very small

slide-3
SLIDE 3

Opportunity for new memory technologies to replace DRAM

Source: https://www.nextplatform.com/2015/07/29/scaling-the-growing-system-memory-hierarchy/

slide-4
SLIDE 4

PCM cells have limited write endurance, which shortens PCM's lifetime

[Figure: PCM programming pulses, current (temperature) vs. time. Read; reset to amorphous (610°C); set to crystalline (350°C).]

slide-5
SLIDE 5

Hybrid memory is the best of DRAM and PCM

        Speed  Endurance  Energy  Density
DRAM    ✔      ✔
PCM                       ✔       ✔

slide-6
SLIDE 6

DRAM PCM

Future of main memory: limited DRAM, lots of PCM

This work uses DRAM for frequently written data

slide-7
SLIDE 7

Garbage collection: key advantage of using a managed language

  • Memory is automatically reclaimed for reuse
  • More than just reclamation: data is also better organized

slide-8
SLIDE 8

Use GC to keep frequently written objects in DRAM

Reactive approach

  • Monitors writes to objects
  • More fine-grained compared to hardware and OS approaches
  • No page migrations

Write-rationing garbage collection for hybrid memories, PLDI 2018

slide-9
SLIDE 9

Use GC to keep frequently written objects in DRAM

Proactive approach

  • Use a profile-guided predictor (this work)
slide-10
SLIDE 10

Three offline steps in building a write intensity predictor

Application → Profiling → <Size, Type, Site, #writes> → Feature Selection → <Site, #writes> → Classification → <Site, advice>

slide-11
SLIDE 11

Profiling methodology

  • Java Virtual Machine
    • Jikes RVM (version 3.1.2)
    • 4 MB nursery
    • 2 GB Mark Sweep mature
  • Java applications
    • 9 from DaCapo
    • PseudoJBB 2005
    • Default inputs
slide-12
SLIDE 12

The outcome of profiling is a write intensity trace

For each unique object X:

  1. Size
  2. Type
  3. Allocation site <method-name, bytecode index>
  4. # Writes
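
A minimal sketch of what one record in such a trace could look like, in Python. The slides list only the four fields, not a concrete format, so the class and field names (`TraceRecord`, `AllocationSite`, `bytecode_index`) and the example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AllocationSite:
    """Allocation site as <method-name, bytecode index> (illustrative names)."""
    method_name: str
    bytecode_index: int


@dataclass(frozen=True)
class TraceRecord:
    """One profiled object, holding the four fields listed on the slide."""
    size_bytes: int
    type_name: str
    site: AllocationSite
    num_writes: int


# Hypothetical record: the method name, type, and numbers are made up.
record = TraceRecord(
    size_bytes=12,
    type_name="Example$Node",
    site=AllocationSite("Example.run", 17),
    num_writes=1000,
)

# Frozen dataclasses are hashable, so a site can key a per-site summary.
writes_per_site = {record.site: record.num_writes}
```

Making the site hashable is a design choice for this sketch: it lets later steps group records by allocation site with an ordinary dictionary.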
slide-13
SLIDE 13

Measuring entropy of different features

Object  Size   # Writes
O1      12 B   1000
O2      12 B   1000
O3      64 KB  1000
O4      32 B
O5      32 B

Each size has an entropy of 0

slide-14
SLIDE 14

Measuring entropy of different features

Object  Size   # Writes
O1      12 B   1000
O2      12 B   1000
O3      64 KB  1000
O4      32 B   1000
O5      32 B

Size 32 has an entropy of 1
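
The entropy values in this example can be reproduced with a short sketch. It assumes that each object is labeled write-intensive when its write count reaches the 1 K write intensity threshold, and that the entropy of a feature value is the Shannon entropy (in bits) of those labels among the objects sharing that value. The function name and the write count used for O5, whose cell is blank on the slide, are assumptions.

```python
import math
from collections import defaultdict

WRITE_INTENSITY_THRESHOLD = 1000  # "1 K", read here as 1000 writes


def entropy_per_value(objects, threshold=WRITE_INTENSITY_THRESHOLD):
    """Return {feature value: entropy in bits} for (feature_value, num_writes) pairs."""
    # value -> [count of non-write-intensive objects, count of write-intensive objects]
    counts = defaultdict(lambda: [0, 0])
    for value, num_writes in objects:
        counts[value][int(num_writes >= threshold)] += 1

    entropies = {}
    for value, group in counts.items():
        total = sum(group)
        entropy = 0.0
        for n in group:
            if n:  # an empty class contributes nothing (0 * log2(0) taken as 0)
                p = n / total
                entropy -= p * math.log2(p)
        entropies[value] = entropy
    return entropies


# (size in bytes, # writes) for O1..O5. O5's write count is blank on the
# slide; 0 is an ASSUMED stand-in for "below the threshold".
objects_by_size = [(12, 1000), (12, 1000), (65536, 1000), (32, 1000), (32, 0)]
size_entropy = entropy_per_value(objects_by_size)
```

With these inputs, sizes 12 and 65536 get an entropy of 0 because all their objects carry the same label, while size 32 gets an entropy of 1 because its two objects are split evenly.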

slide-15
SLIDE 15

Homogeneity curves compare size vs. type vs. allocation site

[Figure: homogeneity curves for size, type, and site. Axes: entropy (0 to 1) and % of heap volume (0% to 100%). Annotation: "Homogeneity".]

Write intensity threshold = 1 K
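
One plausible reading of such a curve, sketched below, is cumulative: for each entropy cutoff, the percentage of heap volume that belongs to feature values whose entropy is at or below the cutoff. This reading, the function name, and the choice of cutoffs are assumptions; the slides do not state how the curves were computed. The inputs reuse the five-object size example from the entropy slides.

```python
def homogeneity_curve(groups, cutoffs):
    """Return, per cutoff, the % of heap volume in groups with entropy <= cutoff.

    `groups` holds one (entropy_bits, volume_bytes) pair per feature value.
    """
    total = sum(volume for _, volume in groups)
    return [
        100.0 * sum(volume for entropy, volume in groups if entropy <= cutoff) / total
        for cutoff in cutoffs
    ]


# Size groups from the five-object example: two 12 B objects (entropy 0),
# one 65536 B object (entropy 0), two 32 B objects (entropy 1).
size_groups = [(0.0, 2 * 12), (0.0, 65536), (1.0, 2 * 32)]
curve = homogeneity_curve(size_groups, cutoffs=[0.0, 0.5, 1.0])
```

Under this reading, a feature is a better predictor when its curve reaches a high percentage of heap volume at low entropy.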

slide-16
SLIDE 16

Heuristics to classify allocation sites as write-intensive or not

  • Goals
    1. Minimize DRAM utilization
    2. Minimize PCM writes
  • Parameters
    1. Criteria to determine write-intensive objects
    2. Homogeneity threshold
slide-17
SLIDE 17

Criterion 1: write frequency

Object  Site  Size (B)  # Writes  Write-intensive?
O1      A     12        1000      ✔
O2      A     12        1000      ✔
O3      A     65536     1000      ✔
O4      A     32                  ✗
O5      A     32

Write frequency threshold = 1 K

slide-18
SLIDE 18

Criterion 2: write density

Object  Site  Size (B)  # Writes  Write-intensive?
O1      A     12        1000      ✔
O2      A     12        1000      ✔
O3      A     65536     1000      ✗
O4      A     32                  ✗
O5      A     32

Write density threshold = 1
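
The two per-object criteria can be sketched as simple predicates. This assumes that write frequency is the raw write count, that write density is the number of writes divided by the object size in bytes, and that an object qualifies when it meets or exceeds the threshold. These definitions are inferred from the marks in the tables rather than stated on the slides, and the function names are hypothetical.

```python
def is_write_frequent(num_writes, threshold=1000):
    """Criterion 1: the object received at least `threshold` writes."""
    return num_writes >= threshold


def is_write_dense(num_writes, size_bytes, threshold=1.0):
    """Criterion 2: the object received at least `threshold` writes per byte."""
    return num_writes / size_bytes >= threshold


# (size in bytes, # writes) for O1..O4. O4's write count is blank on the
# slides; 0 is an ASSUMED stand-in for "below the threshold".
objects = [(12, 1000), (12, 1000), (65536, 1000), (32, 0)]

frequency_marks = [is_write_frequent(writes) for _, writes in objects]
density_marks = [is_write_dense(writes, size) for size, writes in objects]
```

O3 is where the criteria disagree: 1000 writes pass the frequency threshold, but spread over 65536 bytes they amount to far less than one write per byte.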

slide-19
SLIDE 19

Criterion 1: write frequency

Object  Site  Size (B)  # Writes  Write-intensive?
O1      A     12        1000      ✔
O2      A     12        1000      ✔
O3      A     65536     1000      ✔
O4      A     32                  ✗
O5      A     32                  ✗

Write frequency threshold = 1 K
Homogeneity threshold = 50%

Site A is write-intensive

slide-20
SLIDE 20

Criterion 2: write density

Object  Site  Size (B)  # Writes  Write-intensive?
O1      A     12        1000      ✔
O2      A     12        1000      ✗
O3      A     65536     1000      ✗
O4      A     32                  ✗
O5      A     32                  ✗

Write density threshold = 1
Homogeneity threshold = 50%

Site A is NOT write-intensive
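
The two outcomes for site A can be reproduced with a small sketch of the site-level decision. It assumes that a site is classified as write-intensive when the fraction of its objects marked write-intensive, counted per object rather than per byte, reaches the homogeneity threshold. That counting rule and the function name are assumptions; the per-object marks are the ones shown on the slides.

```python
def site_is_write_intensive(marks, homogeneity_threshold=0.5):
    """Classify a site from one boolean mark per object allocated there."""
    return sum(marks) / len(marks) >= homogeneity_threshold


# Marks for O1..O5 at site A, as shown for each criterion.
frequency_marks = [True, True, True, False, False]  # 3 of 5 objects
density_marks = [True, True, False, False, False]   # 2 of 5 objects

site_a_by_frequency = site_is_write_intensive(frequency_marks)
site_a_by_density = site_is_write_intensive(density_marks)
```

Three of five objects (60%) clear the 50% threshold under write frequency, while two of five (40%) do not under write density, which matches the two verdicts for site A.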

slide-21
SLIDE 21

Baseline generational heap organization

[Diagram: nursery, mature, and large object spaces, all in DRAM, with mutator and GC arrows.]

slide-22
SLIDE 22

Distribution of writes to objects

Empirical observations

  1. Nursery is highly mutated
  2. 2% of mature objects get 80% of writes
slide-23
SLIDE 23

Generational heap organization in hybrid memory

[Diagram: nursery, mature, and large object spaces in DRAM; additional mature and large object spaces in PCM; mutator and GC arrows.]

slide-24
SLIDE 24

PCM Writes vs. DRAM Utilization

[Figure: % writes to PCM plotted against % heap in DRAM for the Write-Frequency and Write-Density criteria. Labeled points: wf = 1, wf = 30K, wf = 50K; dcut = 1E-3, dcut = 0.2, dcut = 50.]

Homogeneity threshold = 1%

slide-25
SLIDE 25

Allocation site predictor yields better tradeoffs than size and type

[Figure: bar chart of PCM writes and DRAM utilization (% of mature) for the Size, Type, and Site predictors.]

Homogeneity threshold = 1%, Write-Density (50)

slide-26
SLIDE 26

Profile-guided predictor is more effective compared to existing work

[Figure: normalized writes to PCM for Kingsguard-Writers and Write-Density. Benchmarks: Lusearch, Pjbb, Lu.Fix, Avrora, Luindex, Hsqldb, Xalan, Sunflow, Pmd, Jython, Pmd.S, Fop, Antlr, Bloat.]

slide-27
SLIDE 27

What is missing in the workshop paper?

  • Implementation details
    • Compiler sets a bit in the object header
    • GC chooses the correct allocator
  • Big data benchmarks
  • Emulation on a real NUMA machine
  • Performance results
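
The two implementation details named above (a header bit set by the compiler, and an allocator chosen by the GC) can be illustrated with a toy model in Python. This is not Jikes RVM code. The bit position, the space names, the advice table format, and the default of sending unknown sites to PCM are all assumptions made for the sketch.

```python
from dataclasses import dataclass

WRITE_INTENSIVE_BIT = 0x1  # hypothetical bit position in the object header


@dataclass
class HeapObject:
    site: str
    header: int = 0


def allocate(site, advice):
    """Model the compiler-inserted step: tag the header from the <Site, advice> table."""
    obj = HeapObject(site)
    if advice.get(site, False):  # sites without advice are ASSUMED not write-intensive
        obj.header |= WRITE_INTENSIVE_BIT
    return obj


def choose_space(obj):
    """Model the GC step: pick the destination space for a surviving object."""
    if obj.header & WRITE_INTENSIVE_BIT:
        return "mature-DRAM"
    return "mature-PCM"


# Hypothetical advice table produced offline by the classifier.
advice = {"A": True, "B": False}
hot = allocate("A", advice)
cold = allocate("B", advice)
unknown = allocate("C", advice)
```

Recording the advice in the header at allocation time means the collector only needs to test one bit per object, instead of looking up the allocation site again during collection.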
slide-28
SLIDE 28

Conclusions

  • Exploit GC for improving the lifetime of emerging memories
  • Allocation sites correctly predict write intensity
  • Use an allocation site predictor to eliminate a large number of writes to PCM

slide-29
SLIDE 29

Challenge: limit # writes to PCM

Solution: Use DRAM for frequently written data

slide-30
SLIDE 30

Online monitoring introduces mutator and GC overheads

[Diagram: hybrid heap with nursery, mature, and large object spaces in DRAM, mature and large object spaces in PCM, and an observer space; mutator arrows.]