

SLIDE 1

HMEH: write-optimal extendible hashing for hybrid DRAM-NVM memory

Xiaomin Zou1, Fang Wang1*, Dan Feng1, Janxi Chen1, Chaojie Liu1, Fan Li1, Nan Su2

Huazhong University of Science and Technology1, China; Shandong Massive Information Technology Research Institute2, China

SLIDE 2

Outline

  • Background and motivation
  • Our Work: HMEH
  • Performance Evaluation
  • Conclusion

SLIDE 3

Background: Non-Volatile Memory (NVM)

  • NVM (e.g., Intel Optane DC Persistent Memory) is expected to complement or replace DRAM as main memory

[Figure: cache hierarchy with the CPU caches, DRAM, and NVM serving as main memory]

 non-volatile  large capacity  high performance  low standby power
 limited write endurance  asymmetric properties

SLIDE 4

Background: NVM-based hash structures

  • Hashing structures are widely used in storage systems
 main memory databases  in-cache indexes  in-memory key-value stores

  • Previous work is insufficient for real NVM devices
 PFHT [INFLOW 2015]  Path hashing [MSST 2017]  Level hashing [OSDI 2018]  CCEH [FAST 2019]

SLIDE 5

Motivation: The design of hashing structures

  • Static hashing structure vs. dynamic hashing structure
 Static hashing: resizing the hash table is cost-inefficient, since all items must be rehashed
 Dynamic hashing: needs an extra directory access, and the read latency of Optane DCPMM is higher

[Figure: a static hashing structure that rehashes all items on resizing, vs. a dynamic hashing structure whose directory (entries 00, 01, 10, 11) points hash(key) to buckets]

SLIDE 6

Motivation: High overhead for data consistency

  • Data consistency guarantee
 The volatile/non-volatile boundary is between CPU cache and NVM
 Arbitrarily-evicted cache lines → memory writes reordering

[Figure: the program issues "St value; St key;"; both stores first land in the volatile CPU cache, above the non-volatile NVM]

SLIDE 7

Motivation: High overhead for data consistency

  • Data consistency guarantee
 The volatile/non-volatile boundary is between CPU cache and NVM
 Arbitrarily-evicted cache lines → memory writes reordering

[Figure: the two stores are reordered: the key reaches NVM while the value is still in the volatile CPU cache]

SLIDE 8

Motivation: High overhead for data consistency

  • Data consistency guarantee
 The volatile/non-volatile boundary is between CPU cache and NVM
 Arbitrarily-evicted cache lines → memory writes reordering

[Figure: a crash occurs after the reordered key reaches NVM but before the value does, leaving the persisted data inconsistent]

SLIDE 9

Motivation: High overhead for data consistency

  • Data consistency guarantee
 The volatile/non-volatile boundary is between CPU cache and NVM
 Arbitrarily-evicted cache lines → memory writes reordering

[Figure: Flush and Fence instructions are inserted between the stores of value and key so the value is persisted before the key]

 Flush: flush cache lines
 Fence: order CPU cache line flush

SLIDE 10

Motivation: High overhead for data consistency

  • Data consistency guarantee
 The volatile/non-volatile boundary is between CPU cache and NVM
 Arbitrarily-evicted cache lines → memory writes reordering

[Figure: the same store sequence with Flush and Fence instructions added between the stores of value and key]

 Flush: flush cache lines
 Fence: order CPU cache line flushes
 Expensive! (a minimal code sketch of these primitives follows below)
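To make the cost concrete, here is a minimal sketch, not HMEH's code, of the kind of Flush/Fence sequence slides 6-10 refer to, using the standard x86 cache-line flush and store fence. The helper names (flush, fence, persist_insert) and the Slot layout are illustrative assumptions.

```cpp
// Minimal sketch: persist a value before its key with cache-line flush + fence.
#include <cstddef>
#include <cstdint>
#include <immintrin.h>   // _mm_clflush, _mm_sfence

// Flush every cache line covering [addr, addr + len) toward memory/NVM.
static inline void flush(const void* addr, size_t len) {
    const char* p = static_cast<const char*>(addr);
    for (size_t off = 0; off < len; off += 64)       // 64-byte cache lines
        _mm_clflush(p + off);
}

// Order earlier flushes/stores before any later store.
static inline void fence() { _mm_sfence(); }

struct Slot { uint64_t value; uint64_t key; };       // illustrative layout

// Write and persist the value first, then write the key. If a crash happens
// before the key store, the slot still looks unused, so no "key without value"
// state can be observed after recovery.
void persist_insert(Slot* slot, uint64_t key, uint64_t value) {
    slot->value = value;
    flush(&slot->value, sizeof slot->value);
    fence();                 // the expensive part: a flush + fence per ordering point
    slot->key = key;
    flush(&slot->key, sizeof slot->key);
    fence();
}
```

Every ordered update pays at least one flush and one fence on this path, which is the overhead the measurement on the next slide isolates.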

SLIDE 11

Motivation: High overhead for data consistency

  • Data consistency guarantee
 Evaluation with/without Fence and Flush instructions on Optane DCPMM
 Schemes tested: CCEH [FAST 2019], LEVL [OSDI 2018], linear hashing, and cuckoo hashing
 Without Fence and Flush instructions, the throughputs of these hashing schemes improve by 20.3% to 29.1%

  • Our goal
 High-performance dynamic hashing with low data consistency overhead and fast recovery

SLIDE 12

Our Scheme: HMEH

  • HMEH: Extendible Hashing for Hybrid DRAM-NVM Memory
 Flat-structured directory (in DRAM) for fast access and radix-tree directory (in NVM) for recovery
 Directory → segment → cacheline-sized bucket (see the lookup sketch below)

[Figure: the flat-structured directory in DRAM and the radix-tree directory in NVM both point to segments; the hash key is split into a segment index and a bucket index, and each segment holds cacheline-sized buckets (Bucket 00-11) storing key/value-pointer pairs]
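As a concrete reading of the directory → segment → cacheline-sized bucket path in the figure, here is a hedged C++ sketch. The bit split between segment index and bucket index, the field widths, and the structure names are assumptions based on standard extendible hashing, not HMEH's exact layout.

```cpp
// Sketch of the lookup path: FS-directory (DRAM) -> segment -> bucket.
#include <cstdint>
#include <vector>

constexpr int kBucketBits = 2;                 // 4 buckets per segment, as drawn

struct Slot    { uint64_t key; uint64_t value_ptr; };
struct Bucket  { Slot slots[4]; };             // cacheline-sized in spirit
struct Segment { uint32_t local_depth; Bucket buckets[1 << kBucketBits]; };

struct FlatDirectory {
    uint32_t global_depth = 1;                 // assume >= 1
    std::vector<Segment*> entries;             // size = 2^global_depth, kept in DRAM

    // High-order hash bits select the directory entry (hence the segment),
    // the following bits select the bucket inside that segment.
    Segment* segment_of(uint64_t h) const {
        return entries[h >> (64 - global_depth)];
    }
    uint32_t bucket_of(uint64_t h) const {
        return (h >> (64 - global_depth - kBucketBits)) & ((1u << kBucketBits) - 1);
    }
};

// Lookup: two dereferences after hashing the key.
uint64_t* find(FlatDirectory& dir, uint64_t hash, uint64_t key) {
    Segment* seg = dir.segment_of(hash);
    Bucket&  b   = seg->buckets[dir.bucket_of(hash)];
    for (Slot& s : b.slots)
        if (s.key == key) return &s.value_ptr;
    return nullptr;                            // probing/stash handled on slide 21
}
```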

SLIDE 13

HMEH: Two directories

  • Flat-structured Directory vs. Radix-tree Directory
 The radix tree is friendly to NVM
 The RT-directory is used to rebuild the FS-directory upon recovery
 Every segment is pointed to by 2^(G-L) directory entries, where G is the global depth and L the segment's local depth (a rebuild sketch follows below)

[Figure: a flat-structured directory with global depth 3 (entries 000-111); segments with local depths 1, 2, and 3 are pointed to by 4, 2, and 1 entries respectively]
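The 2^(G-L) relationship is what makes the FS-directory cheap to rebuild from the RT-directory: a segment with local depth L fills 2^(G-L) consecutive directory slots. Below is a hedged sketch of such a rebuild; the segment enumeration interface (rt_segments) and the prefix encoding are assumptions for illustration.

```cpp
// Sketch: rebuild the DRAM flat-structured directory from the segments
// reachable via the NVM radix-tree directory.
#include <cstdint>
#include <utility>
#include <vector>

struct Segment { uint32_t local_depth; /* buckets ... */ };

void rebuild_fs_directory(std::vector<Segment*>& fs_dir,
                          uint32_t global_depth,
                          const std::vector<std::pair<uint64_t, Segment*>>& rt_segments) {
    fs_dir.assign(1ull << global_depth, nullptr);
    for (const auto& [first_index, seg] : rt_segments) {
        // 'first_index' is the segment's prefix left-aligned to G bits, i.e. the
        // first FS-directory slot that should point to this segment.
        uint64_t count = 1ull << (global_depth - seg->local_depth);   // 2^(G-L)
        for (uint64_t i = 0; i < count; ++i)
            fs_dir[first_index + i] = seg;
    }
}
```

Rebuilding is linear in the number of directory entries, which is consistent with the millisecond-scale FS-directory rebuild times reported on slide 32.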

SLIDE 14

HMEH: Low data consistency overhead

  • Cross-KV mechanism
 Split kv item into several pieces and alternately store key and value as several 8-byte atomic blocks
 Avoid lots of Flush and Fence instructions

[Figure: the baseline writes the value and key separately, ordered with Fence and Flush, through the volatile CPU cache to NVM]

SLIDE 16

HMEH: Low data consistency overhead

  • Cross-KV mechanism
 Split kv item into several pieces and alternately store key and value as several 8-byte atomic blocks
 Avoid lots of Flush and Fence instructions

[Figure: the KV item is split into 8-byte atomic blocks K1, K2, V1, V2 that are written through the CPU cache to NVM]

SLIDE 19

HMEH: Low data consistency overhead

  • Cross-KV mechanism
 Split kv item into several pieces and alternately store key and value as several 8-byte atomic blocks
 Avoid lots of Flush and Fence instructions

[Figure: even if a crash occurs while the 8-byte blocks K1, K2, V1, V2 are being written, the persisted state remains consistent (Crash √)]

SLIDE 20

HMEH: Low data consistency overhead

  • Cross-KV mechanism
 Split kv item into several pieces and alternately store key and value as several 8-byte atomic blocks
 Avoid lots of Flush and Fence instructions (see the layout sketch below)

[Figure: the per-field "St value; Fence(); St key; Flush()" sequence is replaced by a single "St cross-KVs" of 8-byte atomic blocks, which stays consistent across a crash (Crash √)]
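A minimal sketch of the Cross-KV idea follows, assuming a 16-byte key and a 16-byte value split into 8-byte pieces that are stored alternately. The slot format, piece order, and how a partially written item is detected on recovery are not spelled out on these slides, so everything below is illustrative.

```cpp
// Sketch of the Cross-KV layout: key and value pieces interleaved as 8-byte
// atomic blocks, so no per-piece Fence/Flush ordering is needed.
#include <atomic>
#include <cstdint>
#include <cstring>

struct CrossKVSlot {
    // Interleaved layout K1 V1 K2 V2; each block is an 8-byte atomic store.
    std::atomic<uint64_t> blocks[4];
};

void store_cross_kv(CrossKVSlot* slot,
                    const uint8_t key[16], const uint8_t value[16]) {
    uint64_t k[2], v[2];
    std::memcpy(k, key, 16);
    std::memcpy(v, value, 16);

    // Alternate key and value pieces; a crash can only leave whole 8-byte
    // pieces missing, never a torn piece.
    slot->blocks[0].store(k[0], std::memory_order_relaxed);   // K1
    slot->blocks[1].store(v[0], std::memory_order_relaxed);   // V1
    slot->blocks[2].store(k[1], std::memory_order_relaxed);   // K2
    slot->blocks[3].store(v[1], std::memory_order_relaxed);   // V2
    // A single flush of the cache line(s) holding the slot, plus one fence,
    // can then persist the whole item instead of fencing between key and value.
}
```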

SLIDE 21

HMEH: Improve load factor

  • Resolve hash collisions
 Linear probing: allow probing 4 buckets (256 bytes, the access granularity of Intel Optane DCPMM)
 Stash: non-addressable buckets used to store colliding items (an insert sketch follows below)

[Figure: a hash key indexes one of the buckets (00-11) in a segment; on collision, up to 4 consecutive buckets are probed, and overflow items go to a per-segment stash]
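A hedged sketch of this collision path: probe the home bucket plus up to three successors (256 bytes, one Optane access), then fall back to the segment's stash. Bucket counts, stash size, and the use of key 0 as the empty marker are illustrative assumptions.

```cpp
// Sketch of insertion with linear probing over 4 buckets, then the stash.
#include <cstdint>

constexpr int kSlotsPerBucket = 4;   // illustrative
constexpr int kProbeBuckets   = 4;   // 4 buckets = 256 B, Optane access granularity
constexpr int kNumBuckets     = 64;  // illustrative
constexpr int kStashBuckets   = 4;   // stash size used in the evaluation

struct Slot   { uint64_t key = 0; uint64_t value_ptr = 0; };  // key 0 = empty (assumption)
struct Bucket { Slot slots[kSlotsPerBucket]; };

struct Segment {
    Bucket buckets[kNumBuckets];
    Bucket stash[kStashBuckets];     // non-addressable overflow buckets

    bool insert(uint64_t key, uint64_t value_ptr, uint32_t bucket_idx) {
        // Probe the home bucket and its 3 successors (256 bytes in total).
        for (int p = 0; p < kProbeBuckets; ++p) {
            Bucket& b = buckets[(bucket_idx + p) % kNumBuckets];
            for (Slot& s : b.slots)
                if (s.key == 0) { s.value_ptr = value_ptr; s.key = key; return true; }
        }
        // Fall back to the stash for colliding items.
        for (Bucket& b : stash)
            for (Slot& s : b.slots)
                if (s.key == 0) { s.value_ptr = value_ptr; s.key = key; return true; }
        return false;                // caller must split the segment
    }
};
```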

SLIDE 25

HMEH: Optimistic Concurrency

 Compare-and-swap instructions for slots
 Fine-grained lock for segment split
 Lock-free read
 Mutex and version number for directories
 (a sketch of the CAS insert and lock-free read follows below)

[Figure: two segments with buckets 00-11 referenced by the directories]
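A hedged sketch of two of these building blocks: slots are claimed with compare-and-swap, and readers validate a directory version number instead of taking a lock. The reserved-key sentinel and the odd-version convention are assumptions for illustration; the fine-grained split lock is omitted.

```cpp
// Sketch of CAS-based slot insertion and version-validated lock-free reads.
#include <atomic>
#include <cstdint>

struct Slot { std::atomic<uint64_t> key{0}; uint64_t value_ptr = 0; };

constexpr uint64_t kEmpty    = 0;
constexpr uint64_t kReserved = ~0ull;   // slot claimed but not yet published

// Writer: reserve an empty slot with CAS, fill the value, then publish the key.
bool claim_slot(Slot& s, uint64_t key, uint64_t value_ptr) {
    uint64_t expected = kEmpty;
    if (!s.key.compare_exchange_strong(expected, kReserved,
                                       std::memory_order_acq_rel))
        return false;                                   // lost the race, try another slot
    s.value_ptr = value_ptr;
    s.key.store(key, std::memory_order_release);        // readers skip kReserved
    return true;
}

// Reader: lock-free read validated by a directory version number that a
// resizing writer bumps to an odd value while the directory is changing.
struct Directory {
    std::atomic<uint64_t> version{0};
    // ... directory entries ...
};

template <class ReadFn>
auto lock_free_read(const Directory& dir, ReadFn&& fn) {
    for (;;) {
        uint64_t v1 = dir.version.load(std::memory_order_acquire);
        if (v1 & 1) continue;                           // a resize is in progress
        auto result = fn();                             // read through the directory
        uint64_t v2 = dir.version.load(std::memory_order_acquire);
        if (v1 == v2) return result;                    // no concurrent directory change
    }
}
```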

SLIDE 28

Performance Evaluation

  • Experimental setup

  CPU:          2-socket, 36-core machine with 32 MB LLC
  Memory:       1.5 TB DCPMM, 192 GB DRAM
  Workload:     160 million random number dataset; YCSB
  Comparisons:  CCEH [FAST 2019], LEVL [OSDI 2018], P-CUCK (persistent cuckoo hashing), P-LINP (persistent linear probing)

SLIDE 29

Experiment - Sensitivity Analysis

  • Segment size
  • Stash size

 The reasonable segment size is in the range of 4 KB to 16 KB
 The optimal stash size is between 1 bucket and 8 buckets
 We set the segment size to 16 KB with a 4-bucket stash for the rest of the experiments

SLIDE 30

Experiment - Comparative Performance

  • Design gain
  • Insertion latency of different schemes

 Baseline: extendible hashing with persist barriers
 D1: the structural changes
 D2: Cross-KV
 All: the entire HMEH
 Compared with CCEH, P-CUCK, LEVL, and P-LINP, HMEH speeds up insertions by over 1.49×, 2.37×, 2.47×, and 1.91×, respectively

SLIDE 31

Experiment - Concurrent Performance

  • Three YCSB workload tests
 Concurrent HMEH also delivers superior performance and high scalability under YCSB workloads with different search/insertion ratios

SLIDE 32

Experiment – Other evaluations

  • Maximum load factor
 As the linear probing distance and stash size grow, the maximum load factors of HMEH increase steadily and all exceed 74%

  • Recovery time of directories
 The directories of HMEH can be recovered almost instantaneously

  Number of Indexed Records         1.6 million   16 million   160 million
  RT-directory Recovery Time (ms)   0.47          6.3          50.1
  FS-directory Rebuild Time (ms)    2.5           21.8         172.2

SLIDE 33

Conclusion

  • Problem
 The structures of previous work have shortcomings
 Existing data consistency mechanisms incur high overhead

  • A write-optimal extendible hashing scheme for hybrid memory
 Flat-structured directory in DRAM for fast access
 Radix-tree-structured directory in NVM for recovery
 Cross-KV mechanism
 Linear probing + stash
 Optimistic concurrency

  • Results
 Outperforms the state-of-the-art work by up to 2.47×
 High scalability and fast recovery

SLIDE 34

Thanks! Q&A
