SLIDE 1

AniFilter: Parallel and Failure-Atomic Cuckoo Filter for Non-Volatile Memories

Hyungjun Oh¹, Bongki Cho¹, Changdae Kim², Heejin Park¹, Jiwon Seo¹

SLIDE 2

Outline

  • NVM and AMQs
  • Cuckoo Filter
  • Optimizations in AniFilter
      • Spillable Buckets
      • Lookahead Eviction
      • Bucket Primacy
      • Logging and Recovery
  • Evaluation

SLIDE 3

Non-Volatile Memories

  • NVM characteristics
      • High performance
      • Persistency
      • Byte-addressability

→ Best of both DRAM and SSD (almost)

SLIDE 4

Approximate Membership Queries (AMQs)

  • Approximate set data structures
  • APIs
      • Insert(x) – inserts key x into the set
      • Lookup(x) – looks up key x and returns true or false
      • Delete(x) – removes key x from the set (optional)
  • Small false-positive rate
      • Lookup(x) may return true even when x is not in the set
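
A toy illustration of the Insert/Lookup/Delete API above, not AniFilter's code. It stores only 12-bit fingerprints of keys, so two different keys can share a fingerprint and Lookup can return a false positive. The fingerprint width and the BLAKE2b-based `fingerprint` helper are assumptions made for the example.

```python
import hashlib

FP_BITS = 12  # assumed fingerprint width, matching the 12-bit setting used later


def fingerprint(key: str) -> int:
    """Hypothetical helper: a 12-bit fingerprint derived from BLAKE2b."""
    digest = hashlib.blake2b(key.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little") & ((1 << FP_BITS) - 1)


class FingerprintSet:
    """Toy AMQ that keeps only fingerprints, never the keys themselves."""

    def __init__(self) -> None:
        self.fps = []  # a multiset, so Delete removes a single copy

    def insert(self, key: str) -> None:
        self.fps.append(fingerprint(key))

    def lookup(self, key: str) -> bool:
        return fingerprint(key) in self.fps  # True may be a false positive

    def delete(self, key: str) -> bool:
        fp = fingerprint(key)
        if fp in self.fps:
            self.fps.remove(fp)
            return True
        return False


amq = FingerprintSet()
amq.insert("apple")
# Search for a different key whose 12-bit fingerprint collides with "apple".
impostor = next(f"key{i}" for i in range(100_000)
                if fingerprint(f"key{i}") == fingerprint("apple"))
print(amq.lookup("apple"), amq.lookup(impostor))  # both True; the second is a false positive
```

The same collision is why Delete is optional: deleting the impostor would remove apple's fingerprint, so a filter can only safely delete keys that were actually inserted.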

SLIDE 5

NVM and AMQs

  • NVMs are fast, but not as fast as DRAM
      • Read latency is 2–3x higher than DRAM's

→ DRAM versions of AMQs run slowly on NVM

  • AMQ operations are cheap
      • Insert() and Lookup() need only a handful of computations

→ Complicated optimization techniques are not affordable

SLIDE 6

Cuckoo Filter

  • Fingerprint-based AMQ
  • Bucketized Cuckoo Filter
      • Each bucket has four slots
      • Two hash functions (H1, H2) give two candidate bucket indices

[Figure: a row of buckets, each with four slots holding 12-bit fingerprints (e.g. 05A, 0F2, 0E9); 000 marks an empty slot]

SLIDE 7

Cuckoo Filter

  • Fingerprint-based AMQ
  • Bucketized Cuckoo Filter
      • Each bucket has four slots
      • Two hash functions (H1, H2) give two candidate bucket indices

[Figure: the same buckets, with one bucket and its slots labeled]

SLIDE 8

Cuckoo Filter

  • Fingerprint-based AMQ
  • Bucketized Cuckoo Filter
      • Each bucket has four slots
      • Two hash functions (H1, H2) give two candidate bucket indices
  • Insertion (without eviction)

[Figure: key x is hashed with H1 to its bucket H1(x), which has an empty slot]

SLIDE 9

Cuckoo Filter

  • Fingerprint-based AMQ
  • Bucketized Cuckoo Filter
      • Each bucket has four slots
      • Two hash functions (H1, H2) give two candidate bucket indices
  • Insertion (without eviction)

[Figure: x's fingerprint AC0 is written into the empty slot of bucket H1(x)]

SLIDE 10

Cuckoo Filter

  • Fingerprint-based AMQ
  • Bucketized Cuckoo Filter
      • Each bucket has four slots
      • Two hash functions (H1, H2) give two candidate bucket indices
  • Insertion (with eviction)

[Figure: key x maps to buckets H1(x) and H2(x), and both are full]

SLIDE 11

Cuckoo Filter

  • Fingerprint-based AMQ
  • Bucketized Cuckoo Filter
      • Each bucket has four slots
      • Two hash functions (H1, H2) give two candidate bucket indices
  • Insertion (with eviction)

[Figure: to make room for key x, fingerprint AFE in bucket H2(x) is evicted]

SLIDE 12

Cuckoo Filter

  • Fingerprint-based AMQ
  • Bucketized Cuckoo Filter
      • Each bucket has four slots
      • Two hash functions (H1, H2) give two candidate bucket indices
  • Insertion (with eviction)

[Figure: x's fingerprint (FPX) takes AFE's slot; the evicted AFE must then be moved to its alternate bucket]

SLIDE 13

Cuckoo Filter

  • Fingerprint-based AMQ
  • Bucketized Cuckoo Filter
      • Each bucket has four slots
      • Two hash functions (H1, H2) give two candidate bucket indices
  • Lookup operation

[Figure: Lookup(x) checks both buckets H1(x) and H2(x) for x's fingerprint]
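
The insertion and lookup steps above can be sketched as a small in-memory bucketized cuckoo filter. This is a DRAM-style illustration, not AniFilter. It assumes partial-key cuckoo hashing (H2 = H1 XOR hash(fingerprint)), a power-of-two bucket count, BLAKE2b as the hash, fingerprint 0 reserved for empty slots, and an arbitrary `MAX_KICKS` bound on eviction chains.

```python
import hashlib
import random
from typing import List, Optional, Tuple

SLOTS = 4        # slots per bucket, as in the figure
FP_BITS = 12     # fingerprint width (assumed)
MAX_KICKS = 500  # assumed bound on the eviction chain


def _hash(data: bytes) -> int:
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "little")


class CuckooFilter:
    def __init__(self, num_buckets: int, seed: int = 0) -> None:
        if num_buckets <= 0 or num_buckets & (num_buckets - 1):
            raise ValueError("num_buckets must be a power of two")
        self.mask = num_buckets - 1
        self.buckets: List[List[int]] = [[] for _ in range(num_buckets)]
        self.victim: Optional[int] = None  # fingerprint left homeless by a failed insert
        self.rng = random.Random(seed)

    def _fingerprint(self, key: str) -> int:
        fp = _hash(b"fp:" + key.encode()) & ((1 << FP_BITS) - 1)
        return fp or 1  # 0 is reserved for "empty slot"

    def _alt(self, index: int, fp: int) -> int:
        # XOR with a hash of the fingerprint: applying _alt twice gives back index.
        return (index ^ _hash(fp.to_bytes(2, "little"))) & self.mask

    def _candidates(self, key: str) -> Tuple[int, int, int]:
        fp = self._fingerprint(key)
        i1 = _hash(key.encode()) & self.mask  # H1(x)
        return fp, i1, self._alt(i1, fp)      # H2(x)

    def insert(self, key: str) -> bool:
        if self.victim is not None:
            return False  # treat the filter as full
        fp, i1, i2 = self._candidates(key)
        for i in (i1, i2):  # insertion without eviction
            if len(self.buckets[i]) < SLOTS:
                self.buckets[i].append(fp)
                return True
        i = self.rng.choice((i1, i2))  # insertion with eviction
        for _ in range(MAX_KICKS):
            slot = self.rng.randrange(SLOTS)
            fp, self.buckets[i][slot] = self.buckets[i][slot], fp  # swap in, evict old fp
            i = self._alt(i, fp)  # the evicted fingerprint's other bucket
            if len(self.buckets[i]) < SLOTS:
                self.buckets[i].append(fp)
                return True
        self.victim = fp  # keep it so lookups never yield false negatives
        return False

    def lookup(self, key: str) -> bool:
        fp, i1, i2 = self._candidates(key)
        return fp in self.buckets[i1] or fp in self.buckets[i2] or fp == self.victim


cf = CuckooFilter(num_buckets=64)
keys = [f"key{i}" for i in range(150)]
inserted = [cf.insert(k) for k in keys]
print(all(inserted), all(cf.lookup(k) for k in keys))
```

Every eviction here is a random, blind choice, and each hop is another bucket read; on NVM each of those reads is slower than on DRAM.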

SLIDE 15

Cuckoo Filter Issues

1) Eviction overhead at high load factors (>75%)

  • Worse on NVM because of its higher latency

2) Failure-atomicity issue

  • Typical setting: 4 slots per bucket, 12-bit fingerprints
  • A bucket is 6 bytes (4 × 12 bits)
  • NVM’s atomic write unit: 8 bytes

[Figure: consecutive 6-byte buckets laid out back to back in memory]

SLIDE 16

Cuckoo Filter Issues

1) Eviction overhead at high load factors (>75%)

  • Worse on NVM because of its higher latency

2) Failure-atomicity issue

  • Typical setting: 4 slots per bucket, 12-bit fingerprints
  • A bucket is 6 bytes (4 × 12 bits)
  • NVM’s atomic write unit: 8 bytes

[Figure: the same buckets with 8-byte boundaries marked; some buckets straddle a boundary]
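
The arithmetic behind the failure-atomicity issue, as a sketch: assuming buckets are packed back to back starting at an 8-byte-aligned address, this reports which 8-byte words each 6-byte bucket occupies. A bucket that touches two words cannot be updated with a single atomic 8-byte store.

```python
from typing import List, Tuple

BUCKET_BYTES = 6  # 4 slots x 12-bit fingerprints = 48 bits
WORD_BYTES = 8    # NVM's atomic write unit


def word_split(bucket: int) -> List[Tuple[int, int]]:
    """(word index, bytes of the bucket inside that word) for each 8-byte word
    the bucket touches, assuming bucket 0 starts on an 8-byte boundary."""
    pos = bucket * BUCKET_BYTES
    end = pos + BUCKET_BYTES  # exclusive
    parts = []
    while pos < end:
        word = pos // WORD_BYTES
        stop = min(end, (word + 1) * WORD_BYTES)
        parts.append((word, stop - pos))
        pos = stop
    return parts


for b in range(4):
    print(b, word_split(b))
# 0 [(0, 6)]
# 1 [(0, 2), (1, 4)]
# 2 [(1, 4), (2, 2)]
# 3 [(2, 6)]
```

Buckets 1 and 2 each need two 8-byte stores, so a crash between the stores can leave a torn bucket. The pattern repeats every four buckets (24 bytes = 3 words), which is presumably what the Type-A, -B, and -C bucket labels on the logging slides refer to.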

SLIDE 17

AniFilter*

  • Cuckoo Filter optimized for NVM
  • Optimization techniques
      • Spillable Buckets
      • Lookahead Evictions
      • Bucket Primacy
      • Failure-atomicity with minimal logging

* Anis are in the cuckoo family and have communal nests.

SLIDE 18

Spillable Buckets

  • Spill a fingerprint into the next 2 buckets
  • Spill only into the first slot

[Figure: a row of buckets, each with four fingerprint slots]

SLIDE 19

Spillable Buckets

  • Spill a fingerprint into the next 2 buckets
  • Spill only into the first slot

[Figure: key x is hashed with H1 to bucket H1(x)]

SLIDE 20

Spillable Buckets

  • Spill a fingerprint into the next 2 buckets
  • Spill only into the first slot

[Figure: x's fingerprint FPx (080) spills into the first slot of a following bucket; that bucket's next two fingerprints are swapped (B8A > 5B0) to encode the spill]
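
The "swapped to encode spill" trick stores one bit for free in the order of two distinct fingerprints. A minimal sketch, assuming (as in the figure) that the pair sits in slots 1 and 2 of the bucket (0-based) and that "first > second" means the flag is set. `read_flag`, `write_flag`, and `PAIR` are hypothetical names.

```python
from typing import List

PAIR = (1, 2)  # slots whose relative order carries the flag (assumed from the figure)


def read_flag(bucket: List[int]) -> bool:
    a, b = bucket[PAIR[0]], bucket[PAIR[1]]
    return a > b  # descending order = flag set


def write_flag(bucket: List[int], flag: bool) -> None:
    """Reorder the pair in place so that its order encodes `flag`."""
    a, b = bucket[PAIR[0]], bucket[PAIR[1]]
    if a == b:
        raise ValueError("two equal fingerprints cannot encode a bit by their order")
    lo, hi = min(a, b), max(a, b)
    bucket[PAIR[0]], bucket[PAIR[1]] = (hi, lo) if flag else (lo, hi)


bucket = [0x080, 0x5B0, 0xB8A, 0xCAC]  # x's fingerprint 080 has just spilled into slot 0
write_flag(bucket, True)
print([f"{fp:03X}" for fp in bucket], read_flag(bucket))
# ['080', 'B8A', '5B0', 'CAC'] True
```

No extra metadata byte is needed, because the flag lives in the fingerprints' order. A real design must also handle buckets where the pair is empty or equal, which this sketch simply rejects.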

SLIDE 21

Spillable Buckets – Theoretical Analysis

  • Eviction probability with and without Spillable Buckets
  • Probabilistic model to compute Prob(X=k)

SLIDE 25

Spillable Buckets – Theoretical Analysis

  • Eviction probability with and without Spillable Buckets
  • Probabilistic model to compute Prob(X=k)

[Figure: eviction probability, Cuckoo Filter vs. AniFilter]

SLIDE 26

Lookahead Eviction

  • Evict a fingerprint that does not incur further eviction

[Figure: buckets with one occupancy flag per bucket; the three full buckets have their flag set to 1]
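
A sketch of the lookahead idea, assuming one occupancy flag per bucket that is set when the bucket is full (as in the figure). Before evicting, the inserter checks each candidate victim's alternate bucket via the flags and prefers one whose alternate bucket has room, so the eviction ends after a single move. `pick_victim`, `toy_alt`, and the XOR-based alternate index are hypothetical stand-ins.

```python
from typing import Callable, Optional, Sequence


def pick_victim(bucket_fps: Sequence[int], bucket_idx: int,
                alt_index: Callable[[int, int], int],
                is_full: Callable[[int], bool]) -> Optional[int]:
    """Return the first slot whose fingerprint could move to a non-full
    alternate bucket, so the eviction stops after one move. Returns None
    if every choice would cascade; the caller then falls back to an
    ordinary (e.g. random) cuckoo eviction."""
    for slot, fp in enumerate(bucket_fps):
        if not is_full(alt_index(bucket_idx, fp)):
            return slot
    return None


NUM_BUCKETS = 8


def toy_alt(i: int, fp: int) -> int:
    # Hypothetical alternate-index function; XOR keeps it an involution.
    return (i ^ fp) % NUM_BUCKETS


full = [True] * NUM_BUCKETS  # occupancy flags: True = bucket is full
full[6] = False              # only bucket 6 has a free slot
bucket3 = [0x0A1, 0x0B2, 0x0C5, 0x0D7]
print(pick_victim(bucket3, 3, toy_alt, lambda b: full[b]))  # 2 (0x0C5's alternate is bucket 6)
```

Checking a compact flag array is presumably much cheaper than reading each alternate bucket from NVM, which is what makes the lookahead affordable.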

SLIDE 27

Bucket Primacy

  • Primary bucket (H1) and secondary bucket (H2)

[Figure: two buckets that had not overflowed before now swap a pair of fingerprints (F8A/E49, F02/EC0) to encode that they have overflowed]
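
One way to read Bucket Primacy as code: each key prefers its primary bucket (H1), and a bucket's overflow flag is set once any key whose primary is that bucket had to be stored in its secondary bucket (H2). A lookup can then stop after the primary bucket when the flag is clear. This is a sketch under that assumption; here the flag is a plain list, whereas the figure suggests AniFilter encodes it by swapping two fingerprints.

```python
from typing import Sequence


def lookup(fp: int, primary: int, secondary: int,
           buckets: Sequence[Sequence[int]], overflowed: Sequence[bool]) -> bool:
    if fp in buckets[primary]:
        return True
    if not overflowed[primary]:
        return False  # no key with this primary bucket was ever placed in H2
    return fp in buckets[secondary]


buckets = [[0xA0F, 0xD1F, 0xF8A, 0xE49], [0x1A0, 0x27D, 0x2A0, 0xF09], [], []]
overflowed = [True, False, False, False]
print(lookup(0x1A0, 0, 1, buckets, overflowed))  # True: found in H2 of an overflowed bucket
print(lookup(0x123, 1, 0, buckets, overflowed))  # False after reading only one bucket
```

In this sketch, a lookup whose primary bucket has never overflowed reads a single bucket instead of two.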

SLIDE 28

Logging for Failure-Atomicity

  • Type-A, -B, and -C buckets
      • Each type requires a different number of log writes
  • Logging example for Type-B buckets

[Figure: four consecutive buckets labeled Type A, Type B, Type C, Type A]

SLIDE 29

Logging for Failure-Atomicity

  • Type-A, -B, and -C buckets
      • Each type requires a different number of log writes
  • Logging example for Type-B buckets
  • 8-byte log record
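
Because the whole log record fits in one 8-byte word, writing the record is itself failure-atomic. A sketch of packing the fields that recovery reads (bucket B, FPi, FPe, plus metadata) into 64 bits. The field order and widths (32-bit bucket index, two 12-bit fingerprints, 8 metadata bits) are assumptions, not AniFilter's actual layout; the values 17, FA1, and EA1 echo the example record, but treating 17 as the bucket index is a guess.

```python
from typing import Tuple

FP_BITS = 12
META_BITS = 8
BUCKET_BITS = 64 - 2 * FP_BITS - META_BITS  # 32 bits left for the bucket index


def _check(value: int, bits: int, name: str) -> None:
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{name} does not fit in {bits} bits")


def pack_log(bucket: int, fp_insert: int, fp_evict: int, meta: int) -> int:
    _check(bucket, BUCKET_BITS, "bucket")
    _check(fp_insert, FP_BITS, "fp_insert")
    _check(fp_evict, FP_BITS, "fp_evict")
    _check(meta, META_BITS, "meta")
    word = bucket
    word = (word << FP_BITS) | fp_insert
    word = (word << FP_BITS) | fp_evict
    return (word << META_BITS) | meta  # always < 2**64, so one atomic store


def unpack_log(word: int) -> Tuple[int, int, int, int]:
    meta = word & ((1 << META_BITS) - 1)
    word >>= META_BITS
    fp_evict = word & ((1 << FP_BITS) - 1)
    word >>= FP_BITS
    fp_insert = word & ((1 << FP_BITS) - 1)
    bucket = word >> FP_BITS
    return bucket, fp_insert, fp_evict, meta


record = pack_log(0x17, 0xFA1, 0xEA1, 0x00)
print(hex(record), unpack_log(record))  # 0x17fa1ea100 (23, 4001, 3745, 0)
```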

SLIDE 30

Logging for Failure-Atomicity

  • Type-A, -B, and -C buckets
      • Each type requires a different number of log writes
  • Logging example for Type-B buckets

[Figure: a Type-B bucket holding 008 EA1 F01 0F0, and the insertion input 17 FA1, labeled FP(insert)]

SLIDE 31

Logging for Failure-Atomicity

  • Type-A, -B, and -C buckets
      • Each type requires a different number of log writes
  • Logging example for Type-B buckets

[Figure: the same bucket and input; the log record is filled in field by field, so far: 17]

SLIDE 32

Logging for Failure-Atomicity

  • Type-A, -B, and -C buckets
      • Each type requires a different number of log writes
  • Logging example for Type-B buckets

[Figure: the same bucket and input; the log record so far: 17 FA1]

SLIDE 33

Logging for Failure-Atomicity

  • Type-A, -B, and -C buckets
      • Each type requires a different number of log writes
  • Logging example for Type-B buckets

[Figure: the same bucket and input; the log record so far: 17 FA1 EA1]

SLIDE 34

Logging for Failure-Atomicity

  • Type-A, -B, and -C buckets
      • Each type requires a different number of log writes
  • Logging example for Type-B buckets

[Figure: the same bucket and input; the log record so far: 17 FA1 EA1 E]

SLIDE 35

Logging for Failure-Atomicity

  • Type-A, -B, and -C buckets
      • Each type requires a different number of log writes
  • Logging example for Type-B buckets

[Figure: the same bucket and input; the complete log record: 17 FA1 EA1 E 00]

SLIDE 36

Recovery

  • 1. Check the log record and read FPi, FPe, and bucket B
  • 2. Test whether FPi and FPe are in B

a) FPe ∈ B and FPi ∈ B
b) FPe ∉ B and FPi ∉ B
c) FPe ∈ B and FPi ∉ B
d) FPe ∉ B and FPi ∈ B

  • 3. For (a), (b), and (c), the insertion is incomplete
      • For (d), examine the metadata in the log record

→ Example recovery process for (d)

SLIDE 37

Recovery – Example

[Figure: step 1, write a log record (17 FA1 EA1 E 00) for the Type-B bucket 008 EA1 F01 0F0]

SLIDE 38

Recovery

[Figure: step 1, write a log record (17 FA1 EA1 E 00); step 2, update the high 16 bits of the bucket, turning 008 EA1 F01 0F0 into 008 FA1 F01 0F0]

SLIDE 39

Recovery

[Figure: step 1, write a log record; step 2, update the high 16 bits (008 EA1 F01 0F0 → 008 FA1 F01 0F0); step 3, crash]

SLIDE 40

Recovery

[Figure: step 1, write a log record; step 2, update the high 16 bits (008 EA1 F01 0F0 → 008 FA1 F01 0F0); step 3, crash; step 4, recovery compares the bucket 008 FA1 F01 0F0 with FPi and FPe from the log record]

SLIDE 42

Recovery

[Figure: steps 1–4 as before; recovery examines the bucket 008 FA1 F01 0F0]

SLIDE 43

Recovery

[Figure: steps 1–4 as before; recovery examines the bucket 008 FA1 F01 0F0]

FP1 > FP2 ≠ spill status bit → Incomplete update detected

SLIDE 44

Recovery

[Figure: steps 1–4 as before; recovery examines the bucket 008 FA1 F01 0F0]

FP1 > FP2 ≠ spill status bit → Incomplete update detected

[Figure: the bucket is restored to 008 EA1 F01 0F0]
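
The recovery decision from the case analysis and the example, as a sketch. Assumptions not stated on the slides: FPi is the fingerprint being inserted and FPe the one it displaces; the metadata consulted in case (d) is the expected spill status bit; and, as in the example, that bit is encoded by whether slot 1 holds a larger fingerprint than slot 2 (FP1 > FP2). `classify` and `insertion_incomplete` are hypothetical names.

```python
from typing import Sequence


def classify(bucket: Sequence[int], fp_i: int, fp_e: int) -> str:
    has_e, has_i = fp_e in bucket, fp_i in bucket
    if has_e and has_i:
        return "a"
    if not has_e and not has_i:
        return "b"
    if has_e:
        return "c"  # FPe still present, FPi missing
    return "d"      # FPe gone, FPi present: looks complete, so check the metadata


def insertion_incomplete(bucket: Sequence[int], fp_i: int, fp_e: int,
                         logged_spill_bit: bool) -> bool:
    if classify(bucket, fp_i, fp_e) != "d":
        return True  # cases (a), (b), (c)
    encoded = bucket[1] > bucket[2]  # spill status bit as encoded by FP1 > FP2
    return encoded != logged_spill_bit


after_crash = [0x008, 0xFA1, 0xF01, 0x0F0]  # only the high 16 bits were updated
print(classify(after_crash, 0xFA1, 0xEA1),
      insertion_incomplete(after_crash, 0xFA1, 0xEA1, logged_spill_bit=False))  # d True
```

In the example, detecting the incomplete update leads recovery to restore the bucket to 008 EA1 F01 0F0. A finished update that also reordered the pair to keep the bit clear (008 F01 FA1 0F0) would pass the same check.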

SLIDE 45

Parallel Implementation

  • Synchronization
  • Logging

SLIDE 46

Parallel Implementation – Sync

  • Shared lock for buckets
  • Holding locks for both candidate buckets
  • Lock ordering – try-lock for spills

[Figure: locks lock0, lock1, and lock2 laid over the buckets, with 8-byte boundaries marked]

SLIDE 47

Parallel Implementation – Sync

  • Shared lock for buckets
  • Holding locks for both candidate buckets
  • Lock ordering – try-lock for spills

[Figure: key x, its buckets H1(x) and H2(x), and the locks lock0, lock1, lock2]

SLIDE 48

Parallel Implementation – Sync

  • Shared lock for buckets
  • Holding locks for both candidate buckets
  • Lock ordering – try-lock for spills

→ No deadlock, but livelock is possible
   1) Single insertion and multiple lookups – bounded wait time for lookups
   2) Multiple insertions – extremely unlikely (< 10⁻²⁸)
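
A sketch of the locking protocol under two assumptions: "shared lock" means lock striping (one mutex per group of adjacent buckets; the group size of 4 is arbitrary), and a spill target may wrap around past the end of the table and so fall outside the global lock order. Blocking acquisition of the two candidate buckets in ascending order cannot deadlock. The spill target is only try-locked; a caller that fails must release everything and retry, which is where livelock could come from. `StripedBucketLocks` and its methods are hypothetical.

```python
import threading
from typing import List


class StripedBucketLocks:
    """Hypothetical lock striping: one mutex per group of adjacent buckets."""

    def __init__(self, num_buckets: int, buckets_per_lock: int = 4) -> None:
        self.num_buckets = num_buckets
        self.buckets_per_lock = buckets_per_lock
        count = -(-num_buckets // buckets_per_lock)  # ceiling division
        self.locks = [threading.Lock() for _ in range(count)]

    def stripe(self, bucket: int) -> int:
        return (bucket % self.num_buckets) // self.buckets_per_lock

    def lock_both(self, primary: int, secondary: int) -> List[int]:
        # Always acquire in ascending stripe order, so blocking waits cannot form a cycle.
        held = sorted({self.stripe(primary), self.stripe(secondary)})
        for s in held:
            self.locks[s].acquire()
        return held

    def try_lock_spill(self, target: int, held: List[int]) -> bool:
        # The spill target may lie outside the lock order (e.g. after wrap-around),
        # so it is only try-locked; on failure the caller releases and retries.
        s = self.stripe(target)
        if s in held:
            return True
        if self.locks[s].acquire(blocking=False):
            held.append(s)
            return True
        return False

    def unlock(self, held: List[int]) -> None:
        for s in reversed(held):
            self.locks[s].release()


locks = StripedBucketLocks(num_buckets=16)
held = locks.lock_both(1, 9)         # stripes [0, 2]
ok = locks.try_lock_spill(13, held)  # stripe 3 is free, so held becomes [0, 2, 3]
locks.unlock(held)
print(ok, held)  # True [0, 2, 3]
```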

SLIDE 49

Parallel Implementation – Logging

  • Thread-local log entries
  • An additional log write for a commit mark
      – To determine the write order between 2 threads accessing the same bucket

SLIDE 50

Evaluation

  • System setting

                 Intel Optane DC Persistent Memory*     Quartz Emulation
  CPU            Xeon Gold 5215M 2.5GHz (10 cores)      Xeon E5-2620 2.4GHz
  Memory         384 GB DRAM + 1512 GB NVM              96 GB DRAM
  OS             Linux Kernel 4.18.0                    Linux Kernel 4.8.12

*Used in App Direct mode

SLIDE 51

Evaluation

  • Evaluated Filters

  Filter                            Notation   Description
  Cuckoo Filter                     CF         Bucketized Cuckoo Filter
  Morton Filter                     MF         DRAM-optimized Cuckoo Filter
  Rank-and-Select Quotient Filter   RSQF       SSD-optimized Quotient Filter
  Bloom Filter                      BF         Bitmap-based AMQ

→ All filters are configured to have the same false-positive rate

SLIDE 52

Parallel Throughput*

*Intel Optane DC PM, 10 threads

[Figure panels: Insertion, Successful Lookup, Random Lookup (AniFilter vs. Cuckoo Filter)]

AniFilter is up to 10.7x faster (2.6x faster on average) for insertion

SLIDE 53

Sequential Throughput*

*Intel Optane DC PM

[Figure panels: Insertion, Successful Lookup, Random Lookup]

SLIDE 54

Sequential Throughput*

*Quartz emulation, 75% load factor

[Figure panels: Insertion, Successful Lookup, Random Lookup; x-axis: read/write latency]

SLIDE 55

Effect of Lookahead Eviction – Occupancy Flags

SLIDE 56

Synergy between Optimizations

  • Impact of Spillable Buckets and Lookahead Eviction on Bucket Primacy

SLIDE 57

Conclusion

  • AniFilter – Optimized Cuckoo Filter for NVM
  • Optimizations
      • Spillable Buckets
      • Lookahead Evictions
      • Bucket Primacy
      • Logging for Failure-Atomicity
  • Evaluation on NVM

SLIDE 58

Q/A

seojiwon@gmail.com