Kingsguard: Write-Rationing Garbage Collection for Hybrid Memories



SLIDE 1

Kingsguard: Write-Rationing Garbage Collection for Hybrid Memories

Shoaib Akram (Ghent), Jennifer B. Sartor (Ghent), Kathryn S. McKinley (Google), and Lieven Eeckhout (Ghent)
Shoaib.Akram@UGent.be

SLIDE 2

DRAM is facing challenges

Scalability
Reliability
Energy

SLIDE 3

Phase change memory is promising

GB/$: good

But …

Latency: poor
Endurance: poor

[Figure: temperature over time for PCM cell operations: set to crystalline, read, reset to amorphous]

SLIDE 4

Hybrid DRAM-PCM memory

Challenge: mitigate PCM wear-out and extend its lifetime

[Figure: DRAM and PCM, with labels Speed, Endurance, Energy, Capacity]

SLIDE 5

How to mitigate PCM wear-out?

Wear leveling
Operating system
Language runtime

Phase change memory as …

SLIDE 6

PCM only with leveling is not practical

[Figure: lifetime in years; 32 GB PCM memory, 32 cores]

SLIDE 7

OS to limit PCM writes

Drawbacks:
Coarse grained
Page migrations can be costly

[Figure: DRAM and PCM]

SLIDE 8

Managed runtime to limit PCM writes

Our work uses garbage collection to keep highly written objects in DRAM

[Figure: heap with nursery, observer, and mature spaces across DRAM and PCM]

SLIDE 9

Distribution of writes in GC runtime

70% of writes go to the nursery

[Figure: nursery and mature spaces, with GC]

SLIDE 10

Distribution of writes in GC runtime

70% of writes go to the nursery
22% of writes go to 2% of objects

[Figure: nursery and mature spaces, with GC]

SLIDE 11

Contribution: Write-Rationing Garbage Collectors

[Figure: mature space, GC, DRAM, and PCM]

SLIDE 12

Two write-rationing garbage collectors

Kingsguard-Nursery
Kingsguard-Writers

SLIDE 13

Heap organization in DRAM

[Figure: nursery, mature, and large object spaces in DRAM, managed by the GC]

SLIDE 14

KG-N: Kingsguard-Nursery

[Figure: nursery in DRAM; mature and large object spaces in PCM; GC]
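The KG-N placement on this slide can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Jikes RVM implementation: `Space`, `Obj`, and `minor_gc` are hypothetical names, and liveness is passed in as a flag instead of being computed by tracing.

```python
from dataclasses import dataclass
from enum import Enum


class Space(Enum):
    NURSERY_DRAM = "nursery (DRAM)"
    MATURE_PCM = "mature (PCM)"


@dataclass
class Obj:
    name: str
    live: bool                          # stand-in for reachability found by tracing
    space: Space = Space.NURSERY_DRAM   # every new object starts in the DRAM nursery


def minor_gc(nursery):
    """Move live nursery objects to the PCM mature space; dead ones are dropped."""
    survivors = [o for o in nursery if o.live]
    for o in survivors:
        o.space = Space.MATURE_PCM
    return survivors


nursery = [Obj("a", live=True), Obj("b", live=False), Obj("c", live=True)]
mature = minor_gc(nursery)
print([(o.name, o.space.value) for o in mature])
```

Writes to short-lived objects land in DRAM, so only nursery survivors ever occupy PCM in this sketch.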

SLIDE 15

KG-W: Kingsguard-Writers

[Figure: nursery, observer, mature, and large object spaces in DRAM; mature and large object spaces in PCM]
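One way to read the KG-W figure is as a two-step promotion: nursery survivors wait in a DRAM observer space, and the collector later sorts them by whether they were written. The sketch below assumes that reading; the names `Obj`, `write`, `promote_from_nursery`, and `collect_observer` are hypothetical, and spaces are plain strings for brevity.

```python
from dataclasses import dataclass


@dataclass
class Obj:
    name: str
    written: bool = False   # header bit set by the write barrier
    space: str = "nursery"  # nursery and observer both live in DRAM


def write(obj):
    """Hypothetical write barrier hook: record writes to observed objects."""
    if obj.space == "observer":
        obj.written = True


def promote_from_nursery(objs):
    """Nursery survivors go to the DRAM observer space first."""
    for o in objs:
        if o.space == "nursery":
            o.space = "observer"


def collect_observer(objs):
    """Written objects stay in DRAM; unwritten ones move to PCM."""
    for o in objs:
        if o.space == "observer":
            o.space = "mature-DRAM" if o.written else "mature-PCM"


heap = [Obj("hot"), Obj("cold")]
promote_from_nursery(heap)
write(heap[0])
collect_observer(heap)
placement = {o.name: o.space for o in heap}
print(placement)  # {'hot': 'mature-DRAM', 'cold': 'mature-PCM'}
```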

SLIDE 16

Observing writes

Write barrier sets a header bit on object writes

Write barrier configurations:
Observe references
Observe references and primitives

[Figure: object format with header, references, and primitives]
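The two barrier configurations can be illustrated with a small model. This is a sketch only: `WRITTEN_BIT`, the `Obj` layout, and the `store` function are hypothetical and do not reflect the actual object header or barrier code in the JVM.

```python
WRITTEN_BIT = 0x1  # hypothetical position of the "written" bit in the header word


class Obj:
    def __init__(self):
        self.header = 0
        self.fields = {}


def store(obj, field, value, is_reference, observe_primitives):
    """Store a field; the barrier sets the header bit for observed writes."""
    if is_reference or observe_primitives:
        obj.header |= WRITTEN_BIT
    obj.fields[field] = value


def was_written(obj):
    return bool(obj.header & WRITTEN_BIT)


a, b = Obj(), Obj()
# Same primitive store under the two configurations.
store(a, "count", 42, is_reference=False, observe_primitives=False)
store(b, "count", 42, is_reference=False, observe_primitives=True)
print(was_written(a), was_written(b))  # False True
```

Observing only references keeps the barrier off primitive stores, at the cost of missing objects that are written only through primitive fields.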

SLIDE 17

Additional optimizations in KG-W

Large object optimization: allocate selected large objects in DRAM
Metadata optimization: allocate PCM metadata in DRAM

SLIDE 18

Large object optimization

Monitor PCM write rate to turn the optimization on/off

[Figure: nursery and large object space, with the label "½ of remaining nursery"]
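A possible reading of this slide is that a large object is placed in the DRAM nursery when it fits within half of the remaining nursery space, and that the optimization is switched on only while PCM writes are high. Both the direction of the switch and the threshold are assumptions here, and the function names are hypothetical.

```python
def allocate_large_in_nursery(obj_size, nursery_free, loo_enabled):
    """Return True if a large object should go to the DRAM nursery.

    Assumption: "½ of remaining nursery" is an upper bound on object size.
    """
    return loo_enabled and obj_size <= nursery_free // 2


def update_loo(pcm_write_rate_mb_s, threshold_mb_s):
    """Assumed policy: enable the optimization while the PCM write rate is high."""
    return pcm_write_rate_mb_s > threshold_mb_s


enabled = update_loo(pcm_write_rate_mb_s=200, threshold_mb_s=100)
print(allocate_large_in_nursery(obj_size=100, nursery_free=400, loo_enabled=enabled))  # True
print(allocate_large_in_nursery(obj_size=300, nursery_free=400, loo_enabled=enabled))  # False
```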

SLIDE 19

Metadata optimization

Full-heap GC: mark live PCM objects
KG-W: keep mark bytes of PCM objects in DRAM

[Figure: mature space and its metadata]

SLIDE 20

Metadata optimization

Full-heap GC: mark live PCM objects
KG-W: keep mark bytes of PCM objects in DRAM

address_mark_bit = start_meta + idx_pcm_obj

[Figure: mature space and its metadata]
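The address formula on this slide amounts to a side table indexed by PCM object. The sketch below models it with a `bytearray` standing in for the DRAM metadata region; the `MarkTable` class and the example base address are hypothetical, and how `idx_pcm_obj` is derived from an object address is not stated on the slide.

```python
class MarkTable:
    """Side table of mark bytes, one per PCM object, held in DRAM."""

    def __init__(self, start_meta, num_pcm_objs):
        self.start_meta = start_meta
        self.table = bytearray(num_pcm_objs)  # stand-in for the DRAM metadata region

    def mark_address(self, idx_pcm_obj):
        # address_mark_bit = start_meta + idx_pcm_obj
        return self.start_meta + idx_pcm_obj

    def mark(self, idx_pcm_obj):
        # Marking writes to the DRAM table, not to the object in PCM.
        self.table[self.mark_address(idx_pcm_obj) - self.start_meta] = 1

    def is_marked(self, idx_pcm_obj):
        return self.table[idx_pcm_obj] == 1


meta = MarkTable(start_meta=0x1000, num_pcm_objs=8)
meta.mark(3)
print(hex(meta.mark_address(3)), meta.is_marked(3), meta.is_marked(4))  # 0x1003 True False
```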

SLIDE 21

Evaluation Methodology

Hardware: (1) Simulator, (2) Real
Software: Java applications, Jikes research virtual machine

SLIDE 22

Simulation with Sniper

7 DaCapo applications
4 cores, 1 MB per core LLC
Scale simulated rates to a 32-core machine using trends from real hardware

SLIDE 23

Memory systems

Homogeneous: 32 GB DRAM, or 32 GB PCM
Hybrid: 1 GB DRAM and 32 GB PCM

PCM parameters:
4X read latency
4X write energy
10 M writes/cell

SLIDE 24

PCM lifetimes

PCM alone is not practical
PCM lasts more than 10 years with KG-W

[Figure: lifetime in years for PCM-Only, KG-N, and KG-W; bar labels 9 and 17]
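For intuition on how write rate translates into lifetime, a common back-of-the-envelope estimate assumes ideal wear leveling, so every cell absorbs an equal share of writes. The sketch below uses that assumption with the capacity and endurance quoted in this deck; the 1 GB/s write rate is an illustrative input, not a measured result, and `pcm_lifetime_years` is a hypothetical helper.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600
GB = 2**30


def pcm_lifetime_years(capacity_bytes, writes_per_cell, write_rate_bytes_per_s):
    """Lifetime under ideal wear leveling: total writable bytes / write rate."""
    total_writable_bytes = capacity_bytes * writes_per_cell
    return total_writable_bytes / write_rate_bytes_per_s / SECONDS_PER_YEAR


# 32 GB PCM, 10 M writes per cell, hypothetical sustained write rate of 1 GB/s.
years = pcm_lifetime_years(32 * GB, 10_000_000, 1 * GB)
print(round(years, 1))  # 10.1
```

Under this model lifetime is inversely proportional to write rate, so halving the writes that reach PCM doubles the estimate.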

SLIDE 25

EDP reduction compared to DRAM

EDP: Energy Delay Product
KG-W has 35% better EDP than DRAM-Only

[Figure: % reduction in EDP, axis from -80 to 80, higher is better, for PCM-Only, KG-N, and KG-W; 4 cores]
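As a reminder of how the metric on this slide is computed, energy-delay product is energy multiplied by execution time, and the reduction is taken relative to a baseline. The energy and delay values below are made-up inputs chosen only to show the arithmetic; the function names are hypothetical.

```python
def edp(energy_joules, delay_seconds):
    """Energy-delay product: lower is better."""
    return energy_joules * delay_seconds


def edp_reduction_percent(baseline_edp, edp_value):
    """Positive when edp_value improves on the baseline, negative when worse."""
    return 100.0 * (baseline_edp - edp_value) / baseline_edp


baseline = edp(100.0, 10.0)   # hypothetical baseline run
candidate = edp(60.0, 12.0)   # hypothetical run: less energy, slightly slower
print(edp_reduction_percent(baseline, candidate))  # 28.0
```

A configuration can run slower and still reduce EDP if its energy savings outweigh the added delay.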

SLIDE 26

Emulation on NUMA hardware

DRAM: Socket 0
PCM: Socket 1

Modify the JVM to divide the heap between DRAM and PCM
Use Intel perf monitor to measure writes

[Figure: two CPU sockets, each with local DRAM]

SLIDE 27

PCM write rates on NUMA hardware

KG-N reduces write rate by 3.8X over PCM-Only
KG-W reduces write rate by 1.9X over KG-N

[Figure: write rate in GB/s, axis from 0.0 to 1.5, for PCM-Only, KG-N, and KG-W on DaCapo, Pjbb, GraphChi, and the average; one label reads 130 MB/s]
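The two reduction factors on this slide compose multiplicatively. The short calculation below derives the implied KG-W reduction over PCM-Only, assuming both factors are averages over the same set of workloads; the variable names are illustrative.

```python
kgn_over_pcm_only = 3.8   # from the slide: KG-N vs PCM-Only
kgw_over_kgn = 1.9        # from the slide: KG-W vs KG-N

# Implied reduction of KG-W relative to PCM-Only.
kgw_over_pcm_only = kgn_over_pcm_only * kgw_over_kgn
print(round(kgw_over_pcm_only, 2))  # 7.22
```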

SLIDE 28

Crystal Gazer: Profile-Driven Write-Rationing Garbage Collection for Hybrid Memories

SLIDE 29

Takeaways

Write-rationing GC makes PCM practical as main memory
Similar conclusion with different evaluation methods
Promising to monitor heaps at a fine granularity