Cuckoo Hashing for Storage Systems: Yuanyuan Sun, Yu Hua, Zhangyu Chen, Yuncheng Guo


SLIDE 1

Mitigating Asymmetric Read and Write Costs in Cuckoo Hashing for Storage Systems

Yuanyuan Sun, Yu Hua, Zhangyu Chen, Yuncheng Guo
Huazhong University of Science and Technology

USENIX ATC 2019

SLIDE 3

Query Services in Cloud Storage Systems

  • Large amounts of data
    • 300 new profiles and more than 208 thousand photos per minute [Facebook, September 2018]

Demanding the support of low-latency and high-throughput queries

SLIDE 5

Hash structures

  ✓ Constant-scale read performance
    • Widely used in key-value stores and relational databases
  ✗ High latency for handling hash collisions

SLIDE 15

Cuckoo Hashing

  • Multi-choice hashing
  • Handling hash collisions: kick-out operations
  • For reads, only limited positions are probed => O(1) time complexity
  • For writes, endless loops may occur! => slow-write performance

[Figure: two hash tables, T1 and T2, holding items a, n, k, b, f, m, c, next to a graph over a, n, f, m, c. Insert(x): an endless loop occurs!]

Bottleneck: Asymmetric reads and writes!
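
The read/write asymmetry described on this slide can be sketched in a few lines of Python. This is a toy illustration, not code from the talk: the class name `CuckooTable`, the two hash functions, the `max_kicks` cap, and the `homeless` attribute are all assumptions made for the example.

```python
class CuckooTable:
    """Two tables, one item per bucket; integer keys only (toy sketch)."""

    def __init__(self, size=8, max_kicks=16):
        self.size = size
        self.max_kicks = max_kicks          # hypothetical cap on kick-outs
        self.tables = [[None] * size, [None] * size]
        self.homeless = None                # item left over when an insert fails

    def _slot(self, which, key):
        # Toy hash functions (assumed), chosen so collisions are easy to reproduce.
        if which == 0:
            return key % self.size
        return (key // self.size) % self.size

    def find(self, key):
        # A read probes one bucket in each table: O(1).
        return any(self.tables[w][self._slot(w, key)] == key for w in (0, 1))

    def insert(self, key):
        if self.find(key):
            return True
        item, which = key, 0
        for _ in range(self.max_kicks):
            slot = self._slot(which, item)
            victim = self.tables[which][slot]
            self.tables[which][slot] = item
            if victim is None:
                return True
            # Kick-out: the evicted item retries in its bucket in the other table.
            item, which = victim, 1 - which
        # Without the cap this loop would never end; one item has no bucket.
        self.homeless = item
        return False


table = CuckooTable()
print(table.insert(0), table.insert(64), table.insert(128))  # prints: True True False
```

Keys 0, 64, and 128 share the same two buckets under these toy hash functions, so the third insert spends all 16 kick-outs before giving up, while every `find` still costs two probes.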

SLIDE 18

Concurrency in Multi-core Systems

  • Existing concurrency strategy for cuckoo hashing
    • Lock two buckets before each kick-out operation (libcuckoo@EuroSys’14)
  • Challenges:
    • Inefficient insertion performance
    • Limited scalability
  • Design goal:
    • A high-throughput and concurrency-friendly cuckoo hash table

SLIDE 19

Our Approach: CoCuckoo

  • Pseudoforests to predetermine endless loops
  • Efficient concurrency strategy
    • A graph-grained locking mechanism
    • Concurrency optimization to reduce the length of critical path
  • Higher throughput than state-of-the-art scheme, i.e., libcuckoo

SLIDE 25

Pseudoforest

  • Vertex: a bucket
  • Edge: an inserted item from the storage vertex to its backup vertex
  • Identify endless loops: #Vertices = #Edges (called maximal)

[Figure: tables T1 and T2 holding items a, n, k, b, f, m, c, y, drawn beside the corresponding pseudoforest; one subgraph is labeled Maximal.]
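
The "#Vertices = #Edges" test can be tried out with a small sketch. The union-find bookkeeping below is an assumption made for illustration; the slides do not say how CoCuckoo stores its subgraphs, and the names `Pseudoforest`, `add_item`, and `is_maximal` are hypothetical.

```python
class Pseudoforest:
    """Tracks, per connected subgraph, how many vertices and edges it has."""

    def __init__(self):
        self.parent = {}
        self.vertices = {}   # root -> number of buckets in the subgraph
        self.edges = {}      # root -> number of items in the subgraph

    def _find(self, v):
        if v not in self.parent:
            # A bucket seen for the first time forms its own edgeless subgraph.
            self.parent[v] = v
            self.vertices[v] = 1
            self.edges[v] = 0
        while self.parent[v] != v:
            self.parent[v] = self.parent[self.parent[v]]  # path halving
            v = self.parent[v]
        return v

    def is_maximal(self, bucket):
        root = self._find(bucket)
        return self.edges[root] == self.vertices[root]

    def add_item(self, b1, b2):
        """Add an item whose two candidate buckets are b1 and b2.

        Returns False when the item cannot be placed, i.e. when every
        subgraph it touches is already maximal.
        """
        r1, r2 = self._find(b1), self._find(b2)
        if r1 == r2:
            if self.edges[r1] == self.vertices[r1]:
                return False
            self.edges[r1] += 1
            return True
        if self.is_maximal(r1) and self.is_maximal(r2):
            return False
        # Merge the two subgraphs; the new item is the connecting edge.
        self.parent[r2] = r1
        self.vertices[r1] += self.vertices[r2]
        self.edges[r1] += self.edges[r2] + 1
        return True
```

With two buckets "A" and "B", the first two items that map to both of them are accepted and make the subgraph maximal; a third is rejected without any kick-out being attempted.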

SLIDE 30

Graph-grained Locking

  • EMPTY subgraph: buckets not represented in pseudoforest
  • Classify insertions into 3 cases, which include 6 subcases
    • Subgraph states: EMPTY / Non-maximal / Maximal
    • According to the number of corresponding EMPTY subgraphs: TwoEmpty, OneEmpty, ZeroEmpty
    • ZeroEmpty, according to the states and the number of subgraphs: Diff_non_non, Same_non, Diff_non_max, Max
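
The classification on this slide can be written as a small decision function. This is a sketch under assumptions: each candidate bucket is described by a hypothetical pair (subgraph number, state), and the `State` enum and `classify` function are illustrative names, not part of CoCuckoo's published interface.

```python
from enum import Enum


class State(Enum):
    EMPTY = 0
    NON_MAXIMAL = 1
    MAXIMAL = 2


def classify(bucket_a, bucket_b):
    """Return the insertion subcase for an item's two candidate buckets.

    Each argument is a (subgraph_number, State) pair; the subgraph number
    is ignored for EMPTY buckets.
    """
    (id_a, state_a), (id_b, state_b) = bucket_a, bucket_b
    states = [state_a, state_b]

    empties = states.count(State.EMPTY)
    if empties == 2:
        return "TwoEmpty"
    if empties == 1:
        return "OneEmpty"

    # ZeroEmpty: split by the states and by whether the subgraphs differ.
    maximals = states.count(State.MAXIMAL)
    if maximals == 2:
        return "Max"            # two maximal subgraphs, or the same one
    if maximals == 1:
        return "Diff_non_max"
    return "Same_non" if id_a == id_b else "Diff_non_non"


print(classify((None, State.EMPTY), (4, State.MAXIMAL)))  # prints: OneEmpty
```

Two cases by EMPTY count plus the four ZeroEmpty subcases give the six subcases named on the slide.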

SLIDE 33

TwoEmpty

  • Two EMPTY subgraphs
  • Insertion algorithm:
    • Atomically assign allocated subgraph number to two buckets
    • Insert item
    • Mark the subgraph as non-maximal

[Figure: tables T1 and T2 with items a, k, f, shown before and after insertion. Legend: critical path, with graph-grained lock(s); out of the critical path.]

SLIDE 35

OneEmpty

  • One EMPTY subgraph (the other is non-maximal/maximal)
  • Insertion algorithm:
    • Two atomic operations without locks
      • Assign the existing subgraph number to the new vertex
      • Insert the item into the new vertex

[Figure: tables T1 and T2 with items a, n, k, b, f, shown before and after insertion.]

SLIDE 40

ZeroEmpty (Diff_non_non)

  • Two different non-maximal subgraphs
  • Insertion algorithm:
    • Kick-out (with item insertion)
    • Merge two subgraphs

[Figure: tables T1 and T2 with items a, n, k, b, f before insertion and with c added after insertion; a subgraph is labeled Non-maximal.]
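
A merge touches two subgraphs, so a thread may need to hold more than one graph-grained lock at once. The sketch below shows one common way to do that without deadlock: acquire the locks in ascending subgraph-number order. The slides do not state how CoCuckoo orders its locks, so the ordering rule, the `SubgraphLocks` class, and the `hold` method are all assumptions for illustration.

```python
import threading
from contextlib import contextmanager


class SubgraphLocks:
    """One lock per subgraph number, created on first use (illustrative)."""

    def __init__(self):
        self._locks = {}
        self._guard = threading.Lock()   # protects the dictionary itself

    def _lock_for(self, number):
        with self._guard:
            return self._locks.setdefault(number, threading.Lock())

    @contextmanager
    def hold(self, *subgraph_numbers):
        # Sorting gives every thread the same acquisition order, and the
        # set() collapses the Same_non case where both buckets share a lock.
        ordered = sorted(set(subgraph_numbers))
        locks = [self._lock_for(n) for n in ordered]
        for lock in locks:
            lock.acquire()
        try:
            yield ordered
        finally:
            for lock in reversed(locks):
                lock.release()


locks = SubgraphLocks()
with locks.hold(7, 3) as held:
    print(held)  # prints: [3, 7]
```

The kick-out and the merge would run inside the `with` block; what happens to the two subgraph numbers after the merge is left out of this sketch.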

SLIDE 46

ZeroEmpty (Same_non)

  • The same non-maximal subgraph
  • Insertion algorithm:
    • Mark as maximal
    • Kick-out (with item insertion)

[Figure: tables T1 and T2 with items a, n, k, b, f, c before insertion and with m added after insertion; the subgraph is labeled Maximal.]

SLIDE 52

ZeroEmpty (Diff_non_max)

  • One non-maximal subgraph and one maximal subgraph
  • Insertion algorithm (similar to Same_non):
    • Mark as maximal
    • Kick-out (with item insertion)
    • Merge two subgraphs

[Figure: tables T1 and T2 holding items a, n, k, b, f, m, c, with y inserted; a subgraph is labeled Maximal.]

SLIDE 55

ZeroEmpty (Max)

  • Two maximal subgraphs or the same maximal subgraph
  • Always walking into a loop and predetermined to be a failure
  • Insertion algorithm:
    • Do nothing

[Figure: tables T1 and T2 holding items a, n, k, b, f, m, c; Insert(x) meets a subgraph labeled Maximal.]

SLIDE 56

Lock Granularity

  • Most subgraphs are small, so the granularity of graph-grained locks is acceptable
  • Only constraining a very small number of buckets
    • 3 vertices (44.25% of subgraphs)
    • No more than 10 vertices (99% of subgraphs)

SLIDE 57

Subgraph Management

  • Subgraph number allocation
    • Subgraph number: identifying a unique subgraph
      • Unique without the need of continuity
    • Subgraph number generator: a simple modular function
      • Modulus: the total number of threads p
      • Remainder: the number of each thread r
      • n = kp + r, e.g., 8-thread CoCuckoo, Thread 2, n = 2, 10, 18, …
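
The modular generator n = kp + r can be sketched as follows. The function name `subgraph_numbers` and its parameters are hypothetical; only the formula and the 8-thread example come from the slide.

```python
import itertools


def subgraph_numbers(thread_id, num_threads):
    """Yield thread_id, thread_id + p, thread_id + 2p, ... with p = num_threads."""
    if not 0 <= thread_id < num_threads:
        raise ValueError("thread_id must satisfy 0 <= thread_id < num_threads")
    return itertools.count(start=thread_id, step=num_threads)


gen = subgraph_numbers(thread_id=2, num_threads=8)
print([next(gen) for _ in range(3)])  # prints: [2, 10, 18]
```

Because each thread draws from its own residue class modulo p, two threads can never produce the same number, so no shared counter or lock is needed for allocation.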

SLIDE 58

Performance Evaluation

  • Comparison:
    • libcuckoo@EuroSys’14
    • Slot numbers: 1, 2, 4, 8, 16
  • Workloads:
    • YCSB: https://github.com/brianfrankcooper/YCSB @SoCC’11
    • 2 million key-value pairs per workload
    • Threads: 1, 4, 8, 12, 16
  • Metrics:
    • Throughput
    • Predetermination for insertion
    • Extra space overhead

SLIDE 59

Average Insertion Throughput

  • CoCuckoo significantly increases average throughputs.
    • 75%-150% improvements compared to 2-way libcuckoo.

SLIDE 62

Predetermination for Insertion

  • TwoEmpty and OneEmpty account for a large proportion
    • Short-term or no locks for the shared buckets
  • Max:
    • Predetermine insertion failures and release locks without any kick-out operations

SLIDE 65

Extra Space Overhead

  • The same space available for both libcuckoo and CoCuckoo
  • CoCuckoo increases the throughput over 2-way libcuckoo by 73%-159%.
  • CoCuckoo significantly decreases the average execution time per request.
  • The extra space overhead is small

SLIDE 66

Conclusion

  • CoCuckoo mitigates the asymmetric read and write costs in cuckoo hashing via
    • A pseudoforest to predetermine and avoid occurrence of endless loops
    • Graph-grained locking mechanism and concurrency optimization
  • CoCuckoo achieves 75%-150% write throughput improvements compared with 2-way libcuckoo.

SLIDE 67

Q&A

Homepage: https://csunyy.github.io/