SLIDE 1

An Efficient Wait-free Resizable Hash Table

Panagiota Fatourou (1,2), Nikolaos Kallimanis (1), Thomas Ropars (3)

(1) FORTH ICS, (2) University of Crete, (3) Univ. Grenoble Alpes

2018

SLIDE 2

A new combination of properties for a hash table

Context

Dictionary of Key-Value pairs
Important data structure in several domains (OS, etc.)

A resizable hash table

Provides the strongest progress guarantee (wait-freedom)
Targets the most common load for a hash table

◮ Large majority of Lookup operations

Outperforms existing non-blocking algorithms for such workloads

◮ By enforcing 2 important design rules

SLIDE 3

Hash tables

A hash function associates items to buckets

◮ Fixed-size buckets

3 operations:

◮ Insert(K, V) (If K already exists, V is updated)
◮ Delete(K)
◮ Lookup(K)

[Figure: a directory with entries 00, 01, 10, 11 pointing to buckets that hold the keys 000010, 010000, and 011110]
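
The three operations can be sketched on a static table as follows. This is only an illustration: the class name `FixedHashTable`, the 6-bit keys, the 2-bit directory, the bucket capacity of 2, and the choice of the leading key bits as bucket index are all assumptions made for the example, not details taken from the slides.

```python
KEY_BITS = 6       # assumption: 6-bit hash keys, as in the figure
DIR_BITS = 2       # assumption: 4 buckets, selected by the 2 leading key bits
BUCKET_SIZE = 2    # assumption: each bucket holds at most 2 items


class FixedHashTable:
    """Static hash table with fixed-size buckets (hypothetical sketch)."""

    def __init__(self):
        self.buckets = [{} for _ in range(1 << DIR_BITS)]

    def _bucket(self, key):
        # The leading DIR_BITS bits of the key select the bucket.
        return self.buckets[key >> (KEY_BITS - DIR_BITS)]

    def insert(self, key, value):
        b = self._bucket(key)
        if key not in b and len(b) >= BUCKET_SIZE:
            return False           # bucket full: a static table cannot grow
        b[key] = value             # if K already exists, V is updated
        return True

    def delete(self, key):
        return self._bucket(key).pop(key, None) is not None

    def lookup(self, key):
        return self._bucket(key).get(key)
```

With the keys of the figure, a fourth key that falls into an already full bucket is rejected, which is the limitation that dynamic hashing removes.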

SLIDE 4

Dynamic hashing

Adapts the number of buckets to the number of items
Ensures constant average time for operations

[Figure: Insert(010011) into a table holding the keys 000010, 010000, and 011110; a bucket is added and the directory afterwards has the entries 000 to 111]

SLIDE 7

Extendible hashing

Hash keys manipulated as bit strings

◮ A prefix of the key is used to find the appropriate bucket

Resizing actions are local

◮ Splitting and merging buckets
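
A minimal sequential sketch of these two points, assuming 6-bit keys and buckets of capacity 2 (both chosen to match the figures, not stated in the slides). The names `Bucket` and `ExtendibleHashTable` are hypothetical, merging is left out, and nothing here is concurrent.

```python
KEY_BITS = 6       # assumption: 6-bit hash keys, as in the figures
BUCKET_SIZE = 2    # assumption: each bucket holds at most 2 keys


class Bucket:
    def __init__(self, depth, prefix):
        self.depth = depth      # number of leading key bits this bucket covers
        self.prefix = prefix    # value of those bits
        self.items = {}


class ExtendibleHashTable:
    def __init__(self):
        self.depth = 1                                  # directory uses 1 bit
        self.directory = [Bucket(1, 0), Bucket(1, 1)]

    def _find(self, key):
        # A prefix of the key selects the directory entry.
        return self.directory[key >> (KEY_BITS - self.depth)]

    def lookup(self, key):
        return self._find(key).items.get(key)

    def insert(self, key, value):
        while True:
            b = self._find(key)
            if key in b.items or len(b.items) < BUCKET_SIZE:
                b.items[key] = value
                return
            self._split(b)                              # local resizing action

    def _split(self, b):
        if b.depth == self.depth:
            # The directory is too coarse to tell the halves apart: double it.
            self.directory = [e for e in self.directory for _ in range(2)]
            self.depth += 1
        d = b.depth + 1
        left = Bucket(d, b.prefix << 1)
        right = Bucket(d, (b.prefix << 1) | 1)
        for k, v in b.items.items():
            child = right if (k >> (KEY_BITS - d)) & 1 else left
            child.items[k] = v
        # Only the directory entries that pointed to b are redirected.
        span = 1 << (self.depth - d)
        for i in range(span):
            self.directory[left.prefix * span + i] = left
            self.directory[right.prefix * span + i] = right
```

Inserting 000010, 010000, 011110, and then 010011 into this sketch ends with a 3-bit directory in which entries 000 and 001 share one bucket, while the buckets for 010 and 011 come from the last split.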

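[Figure: Insert(010011) splits the bucket with prefix 01 into buckets with prefixes 010 (keys 010000, 010011) and 011 (key 011110); the buckets with prefixes 00 (key 000010) and 1 are untouched, and the directory has the entries 000 to 111]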

SLIDE 10

A wait-free concurrent hash table

Natural parallelism

Operations applying to different parts of the hash table can run in parallel
More complex with dynamic hashing

Non-blocking algorithm

Lock-freedom: at least one thread makes progress
Wait-freedom: every operation completes in a finite number of steps

[Figure: threads TA and TB run Insert(100000) and Insert(010011) concurrently on different buckets]

SLIDE 11

Towards an efficient resizable hash table: Insights

Most common load for a hash table

Large majority of Lookup() operations
Resizing actions are rare

Design rules to achieve best performance

Lookup() operations should always be allowed to proceed without any synchronization
When no resizing actions are executed, update operations applying to different buckets should be allowed to progress fully in parallel

SLIDE 12

Related work

The split-ordered list (LF-Split)

Shalev and Shavit [PODC’03]
LF-Split does not comply with our design rules

◮ During Lookup() operations, threads have to help remove items marked for deletion.
◮ A global counter is modified after every insertion/deletion.

LF/WF-Freeze

Liu, Zhang, and Spear [PODC’14]
WF-Freeze does not comply with our design rules

◮ A global sequence number is required to tag update operations

SLIDE 13

Contributions

The design of a wait-free extendible hash table

Follows our two design rules
First algorithm to use several instances of the PSim universal construction [SPAA’11]

◮ Appropriately synchronized to ensure wait-freedom

Experiments demonstrate the new performance trade-off

Outperforms all existing non-blocking resizable hash tables when resizing actions are rare
Slower resizing

SLIDE 14

Our Wait-Free Algorithm

SLIDE 21

The PSim algorithm

Fatourou and Kallimanis [SPAA’11]

Announce the operation to be executed
for k in 1..2:
    Make a local copy of the object to update
    Apply all pending operations on the local object
    Try making the object globally visible using CAS
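
The steps above can be sketched for a single object (a set of keys) as follows. This is a simplified illustration, not the PSim algorithm of [SPAA’11] itself: the names `CasCell`, `State`, and `PSimSketch` are hypothetical, the CAS is emulated with a lock because Python exposes no hardware CAS, and the per-thread sequence numbers used to recognise pending operations are an assumption of this sketch.

```python
import threading
from dataclasses import dataclass


class CasCell:
    """Emulated atomic reference (assumption: stands in for a hardware CAS)."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        return self._value

    def compare_and_set(self, expected, new):
        with self._lock:
            if self._value is expected:
                self._value = new
                return True
            return False


@dataclass(frozen=True)
class State:
    obj: frozenset     # the simulated object: a set of keys
    applied: tuple     # applied[i] = sequence number of thread i's last applied op
    results: tuple     # results[i] = result of that operation


class PSimSketch:
    def __init__(self, n_threads):
        self.announce = [(0, None)] * n_threads    # (seq, key) per thread
        init = State(frozenset(), (0,) * n_threads, (None,) * n_threads)
        self.state = CasCell(init)

    def insert(self, tid, key):
        seq = self.announce[tid][0] + 1
        self.announce[tid] = (seq, key)            # announce the operation
        for _ in range(2):                         # for k in 1..2
            old = self.state.load()
            if old.applied[tid] == seq:            # a helper already applied it
                break
            obj = set(old.obj)                     # local copy of the object
            applied, results = list(old.applied), list(old.results)
            for t, (s, k) in enumerate(self.announce):
                if s > applied[t]:                 # pending operation of thread t
                    results[t] = k not in obj      # True if the key is new
                    obj.add(k)
                    applied[t] = s
            new = State(frozenset(obj), tuple(applied), tuple(results))
            if self.state.compare_and_set(old, new):   # publish with CAS
                break
        return self.state.load().results[tid]
```

Two attempts are enough in this sketch because a thread whose CAS fails twice has been overtaken by a thread that read the state after the announcement, and that thread applied the announced operation on its behalf.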

[Figure: thread T2 announces Insert(001110) in the help array; the BState holding key 000010 is copied, the pending operations are applied to the copy, and the new BState holding 000010 and 001110 is published]

SLIDE 22

The hash table structure

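[Figure: ht points to the DState, a directory with the entries 00, 01, 10, 11; directory entries point to Bucket objects (prefix=0, prefix=1); each Bucket has a help array and points to its current BState (keys 000010 and 100000)]

Two levels of indirection
One instance of PSim for the DState and for each BState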

SLIDE 23

Insert (no resizing) and Lookup operations

[Figure: Ta runs Insert(111100) on the bucket with prefix=1, whose content goes from 100000 to 100000, 111100, while Tb runs Lookup(100010) on the same bucket]

Lookup operations are executed without any synchronization (BState objects are immutable)
Insert operations on different buckets do not synchronize
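
A minimal sketch of these two properties, assuming the ht, DState, Bucket, BState layout described in the deck. The class and method names, the 6-bit keys, and the lock-based CAS emulation are assumptions of this example. The retry loop in `insert` is only lock-free; the deck relies on one PSim instance per bucket to obtain wait-freedom, which is not reproduced here.

```python
import threading

KEY_BITS = 6   # assumption: 6-bit hash keys, as in the figures


class BState:
    """Immutable snapshot of a bucket's content."""

    def __init__(self, items=()):
        self.items = tuple(items)           # (key, value) pairs, never mutated


class Bucket:
    def __init__(self):
        self.state = BState()               # pointer swapped on every update
        self._cas_lock = threading.Lock()   # emulates a hardware CAS on `state`

    def cas_state(self, expected, new):
        with self._cas_lock:
            if self.state is expected:
                self.state = new
                return True
            return False


class DState:
    def __init__(self, depth, buckets):
        self.depth = depth
        self.buckets = tuple(buckets)       # 2**depth bucket references


class HashTable:
    def __init__(self):
        self.ht = DState(1, [Bucket(), Bucket()])

    def _bucket(self, key):
        d = self.ht
        return d.buckets[key >> (KEY_BITS - d.depth)]

    def lookup(self, key):
        # Plain reads only: ht -> DState -> Bucket -> BState.
        state = self._bucket(key).state
        return dict(state.items).get(key)

    def insert(self, key, value):
        # Copy-on-write: build a new BState, then swap this bucket's pointer.
        # Only threads updating the same bucket can make the CAS fail.
        bucket = self._bucket(key)
        while True:
            old = bucket.state
            items = dict(old.items)
            items[key] = value
            if bucket.cas_state(old, BState(items.items())):
                return
```

A Lookup that obtained a BState before an Insert keeps reading a consistent snapshot, since the Insert publishes a new object instead of modifying the old one.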

SLIDE 31

Splitting a bucket

[Figure: Ta runs Insert(110011) on the full bucket with prefix=1 (keys 100000, 111100); the bucket is replaced by buckets with prefix=10 (key 100000) and prefix=11 (keys 111100, 110011), and a new directory with the entries 00, 01, 10, 11 is published]

To avoid losing updates:

Only full buckets can be replaced during resizing
No update operation can be run on a full bucket
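
A sequential sketch of the split step under these two rules. Buckets are modelled as immutable tuples `(prefix_len, prefix, keys)` and the directory as a tuple of bucket references; this representation, the function name `split`, the 6-bit keys, and the capacity of 2 are assumptions made for the example. Publishing the new directory atomically and helping pending operations are not shown.

```python
KEY_BITS = 6       # assumption: 6-bit hash keys, as in the figures
BUCKET_SIZE = 2    # assumption: a bucket is full when it holds 2 keys


def split(depth, directory, bucket):
    """Return (new_depth, new_directory) with `bucket` replaced by two halves."""
    prefix_len, prefix, keys = bucket
    # Rule: only full buckets can be replaced during resizing.
    assert len(keys) == BUCKET_SIZE, "only full buckets may be replaced"
    if prefix_len == depth:
        # The directory cannot distinguish the two halves yet: double it.
        directory = tuple(b for b in directory for _ in range(2))
        depth += 1
    n = prefix_len + 1
    left = (n, prefix << 1,
            frozenset(k for k in keys if not (k >> (KEY_BITS - n)) & 1))
    right = (n, (prefix << 1) | 1,
             frozenset(k for k in keys if (k >> (KEY_BITS - n)) & 1))
    # Build a new directory; entries for other buckets are reused unchanged.
    new_directory = tuple(
        (right if (i >> (depth - n)) & 1 else left) if b is bucket else b
        for i, b in enumerate(directory)
    )
    return depth, new_directory
```

Because no update may run on a full bucket, the content copied into the two halves cannot change while the new directory is being built, so the old bucket can be replaced without missing an update.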

SLIDE 32

Increasing the directory size

[Figure: Ta runs Insert(110110) on the full bucket with prefix=11 (keys 111100, 110011); the directory grows from the entries 00 to 11 to the entries 000 to 111, and the bucket is replaced by buckets with prefix=110 (keys 110011, 110110) and prefix=111 (key 111100)]

SLIDE 36

Ensuring wait-freedom

The problem

Ensuring that updates on DState and BState objects are wait-free is not enough to ensure that the operations on the hash table are wait-free

Example

1. Thread Ta tries to insert in bucket B → full
2. Ta tries to update the directory → already done
3. Ta tries to insert in bucket B′ → full
4. Ta tries to update the directory . . .

Solution

When resizing the directory, all pending updates applying to full buckets should be run

SLIDE 38

Executing each operation exactly once

The problem

An Insert operation can be applied directly on a BState or through a resizing action

◮ How to ensure that an operation is never executed twice?

Example

1. Thread Ta wants to run an Insert operation

◮ It registers its operation in the help array

2. Thread Tb executes the operation of Ta during a resizing action
3. Ta accesses the bucket B where it should execute its operation

◮ Has its operation already been executed?

Solution

Per-thread sequence numbers are used to tag operations
The sequence number of the last applied operation is stored in each BState
Sequence numbers are evaluated before executing an update operation
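
The check can be sketched as below. The `BState` layout (a dictionary of items plus a per-thread `applied` map), the function name `apply_once`, and the string operation codes are assumptions of this example, chosen only to show how a stored sequence number prevents a second execution.

```python
class BState:
    """Bucket snapshot carrying, per thread, the last applied sequence number."""

    def __init__(self, items, applied):
        self.items = dict(items)        # key -> value
        self.applied = dict(applied)    # thread id -> seq of its last applied op


def apply_once(state, tid, seq, op, key, value=None):
    """Return the BState obtained by applying operation (tid, seq) at most once."""
    if state.applied.get(tid, 0) >= seq:
        return state                    # already applied, e.g. by a resizing thread
    items = dict(state.items)
    if op == "insert":
        items[key] = value
    else:                               # "delete"
        items.pop(key, None)
    applied = dict(state.applied)
    applied[tid] = seq
    return BState(items, applied)
```

In the scenario of the example slide, Tb applies Ta's Insert while resizing, another thread then deletes the same key, and Ta's own later attempt is recognised as already applied instead of re-inserting the key. For this to work across a split, the new buckets would need to inherit the `applied` map of the bucket they replace.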

SLIDE 40

Experimental Evaluation

SLIDE 41

Implementation

Our proposed algorithm (WF-Ext)

Implementation in C
Epoch-based memory reclamation
Efficient memory allocation of BState objects

State-of-the-art algorithms

Reference C implementations and modified versions:
LF-Split-M: Modified version to avoid the global counter
LF-Freeze-M:

◮ Implementation of our semantics for Insert operations
◮ Integration of our efficient memory allocator
◮ Recall: WF-Freeze is much slower than LF-Freeze

SLIDE 42

Evaluation setup

Hardware

64-core machine with 4 NUMA nodes (Intel Broadwell)

Software

System memory allocator: tests with the glibc allocator and TCMalloc
NUMA policy: tests with Local and Interleave policies

Methodology

Average over 10 runs
All combinations of parameters are tested for each algorithm

SLIDE 43

Throughput with 90% Lookups (directory stable)

Description of the experiment:

Initial state: half-full hash table
5% Insert ops; 5% Delete ops

[Figure: throughput (Mops/s) against number of threads (4 to 64) for LF-Split, LF-Split-M, LF-Freeze, LF-Freeze-M, and WF-Ext; two panels: 1K items and 256K items]

SLIDE 44

Conclusion

A wait-free extendible hash table

Follows two design rules to preserve the natural parallelism of such data structures
Synchronizes several instances of the PSim algorithm to achieve wait-freedom

A new performance trade-off

Outperforms existing lock-free algorithms when resizing actions are rare
Slower resizing actions

◮ Amortized over long runs

SLIDE 45

References

[1] Yujie Liu, Kunlong Zhang, and Michael Spear. “Dynamic-sized Nonblocking Hash Tables”. Proceedings of the 2014 ACM Symposium on Principles of Distributed Computing. PODC ’14. Paris, France, 2014.
[2] Panagiota Fatourou and Nikolaos D. Kallimanis. “Highly-Efficient Wait-Free Synchronization”. Theory of Computing Systems (2013), pp. 1–46.
[3] Ori Shalev and Nir Shavit. “Split-ordered lists: Lock-free extensible hash tables”. Journal of the ACM 53.3 (2006), pp. 379–405.

Thanks!

SLIDE 46

Throughput with 50% Lookups (directory stable)

[Figure: throughput (Mops/s) against number of threads (4 to 64) for LF-Split, LF-Split-M, LF-Freeze, LF-Freeze-M, and WF-Ext; two panels: 1K items and 256K items]

SLIDE 47

Performance with resizing

Description of the experiment

Initial state: Empty hash table with 2 buckets
90% Lookup ops; 10% Insert ops
Throughput with 1K items over a 5 second run

[Figure: throughput (Mops/s) against number of threads (4 to 64) for LF-Split, LF-Split-M, LF-Freeze, LF-Freeze-M, and WF-Ext]

SLIDE 48

Resizing efficiency

Description of the experiment

Initial state: empty hash table with 2 buckets
50% Lookup ops; 50% Insert ops
Measurement: Time to reach final size

[Figure: time in ms (logarithmic scale, lower is better) to reach final sizes from 1K to 256K items, for LF-Split, LF-Split-M, LF-Freeze, and WF-Ext]

SLIDE 49

Additional information

Merging buckets

Buckets to be merged have to be frozen
A merging action may fail

Compliance with our design rules

Lookup operations are executed without any synchronization
When no resizing is needed, an update operation is executed by the PSim instance of the bucket

SLIDE 50

Avoiding losing updates

The problem

Since an update operation on a bucket might be run in parallel with resizing the directory, how to avoid losing updates?

Example

1. Thread Ta updates bucket B during an update operation
2. Thread Tb changes the directory during a resizing action
3. Is the update made by Ta visible in the new directory published by Tb?

Solution

For non-full buckets:

◮ The two levels of indirection ensure that the update of Ta remains accessible

For full buckets:

◮ Updates on full buckets are not allowed
