SuRF: PRACTICAL RANGE FILTERING WITH FAST SUCCINCT TRIES




slide-1
SLIDE 1

SuRF: PRACTICAL RANGE FILTERING WITH FAST SUCCINCT TRIES

Huanchen Zhang

Hyeontaek Lim, Viktor Leis, David G. Andersen, Michael Kaminsky, Kimberly Keeton, Andrew Pavlo

slide-2
SLIDE 2

Filters answer approximate membership queries

slide-3
SLIDE 3

Filters answer approximate membership queries

Billionaire

slide-5
SLIDE 5

Filters answer approximate membership queries

Billionaire

YES, 100%

No False Negatives

slide-7
SLIDE 7

Filters answer approximate membership queries

Billionaire

NO, 99%

slide-8
SLIDE 8

Filters answer approximate membership queries

Billionaire

NO, 99%

YES, 1%

slide-10
SLIDE 10

Filters answer approximate membership queries

Billionaire

NO, 99%

YES, 1%: False Positive Rate
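
The behaviour on these slides (a member always gets YES, a non-member gets NO most of the time and YES at some small false positive rate) can be reproduced with a small Bloom filter. This is a minimal sketch, not taken from the talk: the class name, the SHA-256-based hashing, and the sizes (10 bits per key, 7 hash functions) are assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter; stores one byte per bit to keep the code readable."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # all zeros initially

    def _positions(self, key):
        # Derive num_hashes bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means "definitely absent"; True means "probably present".
        return all(self.bits[pos] for pos in self._positions(key))


members = [f"member-{i}" for i in range(1000)]
bloom = BloomFilter(num_bits=10_000, num_hashes=7)
for name in members:
    bloom.add(name)

# Every inserted key is reported as present: no false negatives.
false_negatives = sum(not bloom.might_contain(name) for name in members)

# Keys that were never inserted are occasionally reported as present.
non_members = [f"stranger-{i}" for i in range(10_000)]
false_positives = sum(bloom.might_contain(name) for name in non_members)
false_positive_rate = false_positives / len(non_members)

print("false negatives:", false_negatives)
print("measured false positive rate:", false_positive_rate)
```

The measured rate depends on the bits-per-key budget and the number of hash functions, so it will not necessarily match the 1% shown on the slide.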

slide-11
SLIDE 11

Filters pre-reject most negative queries

[Figure labels: Queries, Local Memory, Slow Devices]

slide-12
SLIDE 12

Filters pre-reject most negative queries

[Figure labels: Queries, Local Memory, Slow Devices]

NO

Probably YES

slide-13
SLIDE 13

Existing filters only support point filtering

Point Filtering

Bloom Filter (1970)

Quotient Filter (2012)

Cuckoo Filter (2014)

SELECT * FROM Billionaires WHERE LastName = 'Pavlo'

slide-14
SLIDE 14

Existing filters only support point filtering

Point Filtering

Bloom Filter (1970)

Quotient Filter (2012)

Cuckoo Filter (2014)

SELECT * FROM Billionaires WHERE LastName = 'Pavlo'

Range Filtering

SELECT * FROM Billionaires WHERE LastName LIKE 'Pav%'
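
A predicate such as LIKE 'Pav%' is a range query in disguise: it asks whether any key lies in the half-open interval ['Pav', 'Paw'). The sketch below shows that rewriting over an exact sorted list. The helper names and the sample last names are made up for illustration and are not part of the talk.

```python
from bisect import bisect_left


def prefix_to_range(prefix):
    """Return (lo, hi) such that a key starts with prefix iff lo <= key < hi.

    Assumes a non-empty prefix whose last character is not the largest
    possible code point.
    """
    return prefix, prefix[:-1] + chr(ord(prefix[-1]) + 1)


def range_has_key(sorted_keys, lo, hi):
    """Exact check: is there any key with lo <= key < hi?"""
    i = bisect_left(sorted_keys, lo)
    return i < len(sorted_keys) and sorted_keys[i] < hi


# Hypothetical sample data.
last_names = sorted(["Andersen", "Kaminsky", "Pavlo", "Zhang"])

# Point query: an exact-match lookup, the only kind a hash-based filter supports.
point_hit = "Pavlo" in set(last_names)

# Range query: needs key order, which hashing throws away.
lo, hi = prefix_to_range("Pav")
range_hit = range_has_key(last_names, lo, hi)
range_miss = range_has_key(last_names, *prefix_to_range("Lei"))

print(lo, hi, point_hit, range_hit, range_miss)
```

A range filter has to answer the same question approximately and in far less space than the sorted key list used here.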

slide-16
SLIDE 16

Our solution: Succinct Range Filters (SuRF)

First practical, general-purpose range filter

SMALL: close to the theoretical minimum

64-bit integer keys, 1% false positive rate: ≈ 12 bits per key

FAST: comparable to fastest trees

10 million 64-bit integer keys: ≈ 200 ns per query

USEFUL: evaluated in RocksDB

speed up range queries by up to 5x

slide-17
SLIDE 17

Starting point: a complete trie

[Figure: a complete trie with node labels S, I, G followed by the branches M-O-D, K-D-D, O-P-S]

slide-18
SLIDE 18

Starting point: a complete trie

[Figure: a complete trie with node labels S, I, G followed by the branches M-O-D, K-D-D, O-P-S]

TOO BIG

slide-19
SLIDE 19

Make it smaller: a truncated trie

[Figure: the complete trie (S, I, G, then M-O-D, K-D-D, O-P-S) next to a truncated trie (S, I, G, then M, K, O)]

slide-20
SLIDE 20

Make it smaller: a truncated trie

[Figure: the complete trie (S, I, G, then M-O-D, K-D-D, O-P-S) next to a truncated trie (S, I, G, then M, K, O)]

SIGMOD, SIGMETRICS
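
The truncation idea can be sketched as follows: keep, for each key, only the shortest prefix that separates it from its neighbours in sorted order. The key set SIGKDD, SIGMOD, SIGOPS is read off the figure labels and is therefore an assumption, as are the function names. The example shows why a query such as SIGMETRICS becomes a false positive.

```python
def common_prefix_len(a, b):
    """Length of the longest common prefix of two strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def truncate_keys(keys):
    """Shortest prefix of each key that differs from both sorted neighbours."""
    keys = sorted(set(keys))
    prefixes = []
    for i, key in enumerate(keys):
        shared = 0
        if i > 0:
            shared = max(shared, common_prefix_len(key, keys[i - 1]))
        if i + 1 < len(keys):
            shared = max(shared, common_prefix_len(key, keys[i + 1]))
        prefixes.append(key[: shared + 1])  # one character past the shared part
    return prefixes


def may_contain(prefixes, query):
    """True if the query matches some stored prefix (possible false positive)."""
    return any(query.startswith(p) for p in prefixes)


truncated = truncate_keys(["SIGMOD", "SIGKDD", "SIGOPS"])
print(truncated)                             # the three 4-character prefixes
print(may_contain(truncated, "SIGMOD"))      # stored key
print(may_contain(truncated, "SIGMETRICS"))  # not stored, but shares SIGM
print(may_contain(truncated, "SIGGRAPH"))    # rejected at the fourth character
```

Stored keys are never rejected, so truncation keeps the no-false-negatives property while giving up the ability to tell SIGMOD from SIGMETRICS.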

slide-21
SLIDE 21

Use suffix bits to reduce false positive rate

Hashed Suffix Bits

[Figure: truncated trie (S, I, G, then M, K, O) with hashed suffixes 0x20, 0xC8, 0x06]

Real Suffix Bits

[Figure: truncated trie (S, I, G, then M, K, O) with real suffixes O, D, P]

slide-22
SLIDE 22

Use suffix bits to reduce false positive rate

Hashed Suffix Bits

[Figure: truncated trie (S, I, G, then M, K, O) with hashed suffixes 0x20, 0xC8, 0x06]

Real Suffix Bits

[Figure: truncated trie (S, I, G, then M, K, O) with real suffixes O, D, P]

Query: SIGMETRICS

slide-23
SLIDE 23

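Use suffix bits to reduce false positive rate

Hashed Suffix Bits

[Figure: truncated trie (S, I, G, then M, K, O) with hashed suffixes 0x20, 0xC8, 0x06]

Real Suffix Bits

[Figure: truncated trie (S, I, G, then M, K, O) with real suffixes O, D, P]

Query: SIGMETRICS, hashed suffix 0x18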

slide-24
SLIDE 24

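Use suffix bits to reduce false positive rate

Hashed Suffix Bits

[Figure: truncated trie (S, I, G, then M, K, O) with hashed suffixes 0x20, 0xC8, 0x06]

Real Suffix Bits

[Figure: truncated trie (S, I, G, then M, K, O) with real suffixes O, D, P]

Query: SIGMETRICS, hashed suffix 0x18

Query: SIGMETRICS, real suffix E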

slide-26
SLIDE 26

Use suffix bits to reduce false positive rate

Hashed Suffix Bits

[Figure: truncated trie (S, I, G, then M, K, O) with hashed suffixes 0x20, 0xC8, 0x06]

Each bit reduces FPR by half

Real Suffix Bits

[Figure: truncated trie (S, I, G, then M, K, O) with real suffixes O, D, P]

slide-27
SLIDE 27

Use suffix bits to reduce false positive rate

Hashed Suffix Bits

[Figure: truncated trie (S, I, G, then M, K, O) with hashed suffixes 0x20, 0xC8, 0x06]

Each bit reduces FPR by half

Cannot help range queries

Real Suffix Bits

[Figure: truncated trie (S, I, G, then M, K, O) with real suffixes O, D, P]

slide-28
SLIDE 28

Use suffix bits to reduce false positive rate

Hashed Suffix Bits

[Figure: truncated trie (S, I, G, then M, K, O) with hashed suffixes 0x20, 0xC8, 0x06]

Each bit reduces FPR by half

Cannot help range queries

Real Suffix Bits

[Figure: truncated trie (S, I, G, then M, K, O) with real suffixes O, D, P]

Benefit point & range queries

slide-29
SLIDE 29

Use suffix bits to reduce false positive rate

Hashed Suffix Bits

[Figure: truncated trie (S, I, G, then M, K, O) with hashed suffixes 0x20, 0xC8, 0x06]

Each bit reduces FPR by half

Cannot help range queries

Real Suffix Bits

[Figure: truncated trie (S, I, G, then M, K, O) with real suffixes O, D, P]

Benefit point & range queries

Weaker distinguishability
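
The two suffix schemes can be compared in a small sketch. It assumes the key set SIGKDD, SIGMOD, SIGOPS, a fixed 4-character truncation, one character of real suffix, and SHA-256 as a stand-in hash, so the hashed values will not equal the 0x20, 0xC8, 0x06 shown on the slide. All function names are hypothetical.

```python
import hashlib

PREFIX_LEN = 4  # assumption: every key is truncated to 4 characters


def hashed_suffix(key, bits=8):
    """Top `bits` bits (1 to 32) of a hash of the whole key."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") >> (32 - bits)


def real_suffix(key, chars=1):
    """The next `chars` characters of the key after the truncation point."""
    return key[PREFIX_LEN:PREFIX_LEN + chars]


def build(keys, suffix_fn):
    """Map each truncated prefix to the suffix stored alongside it."""
    return {key[:PREFIX_LEN]: suffix_fn(key) for key in keys}


def may_contain(table, query, suffix_fn):
    """Point lookup: the prefix must match and the suffix must agree."""
    prefix = query[:PREFIX_LEN]
    return prefix in table and table[prefix] == suffix_fn(query)


def range_may_contain(table, lo, hi):
    """Closed-range lookup for tables whose suffixes are strings.

    A stored key k with lo <= k <= hi must satisfy the same inequalities
    after all three strings are cut to the stored length, so this check
    never rejects a range that holds a key.
    """
    for prefix, suffix in table.items():
        stored = prefix + suffix
        if lo[:len(stored)] <= stored <= hi[:len(stored)]:
            return True
    return False


keys = ["SIGKDD", "SIGMOD", "SIGOPS"]
hashed = build(keys, hashed_suffix)
real = build(keys, real_suffix)
bare = build(keys, lambda key: "")  # truncated trie with no suffix bits

# Point queries: both schemes keep stored keys; real suffixes reject SIGMETRICS
# because E differs from the stored O.
stored_ok = all(may_contain(hashed, k, hashed_suffix) for k in keys)
real_rejects = not may_contain(real, "SIGMETRICS", real_suffix)

# Range query [SIGMA, SIGME]: only order-preserving real suffixes can reject it.
bare_range = range_may_contain(bare, "SIGMA", "SIGME")
real_range = range_may_contain(real, "SIGMA", "SIGME")

print(real, stored_ok, real_rejects, bare_range, real_range)
```

With n hashed bits, two different keys under the same prefix collide with probability about 2^-n, which is the "each bit reduces FPR by half" effect. Hash values carry no order, though, so they cannot be compared against range bounds the way real suffixes are here.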

slide-30
SLIDE 30

Succinct Data Structure

… uses an amount of space that is “close” to the information-theoretic lower bound, but still allows efficient query operations. [wikipedia]

slide-31
SLIDE 31

SuRF’s encoding is small and fast

Small

≈10 + suffix bits per key for 64-bit integers

≈14 + suffix bits per key for emails

Fast

Matches state-of-the-art pointer-based trees

slide-32
SLIDE 32

Bloom filters speed up point queries in RocksDB

[Figure: levels LN-2, LN-1, LN with SSTable keys (…, 6, 20, …), (…, 12, 21, …), (…, 11, 19, …); each SSTable has a Bloom filter B; Cached Filters: B, B, B, …]

slide-33
SLIDE 33

Bloom filters speed up point queries in RocksDB

[Figure: levels LN-2, LN-1, LN with SSTable keys (…, 6, 20, …), (…, 12, 21, …), (…, 11, 19, …); each SSTable has a Bloom filter B; Cached Filters: B, B, B, …]

GET(16)

slide-34
SLIDE 34

Bloom filters speed up point queries in RocksDB

[Figure: levels LN-2, LN-1, LN with SSTable keys (…, 6, 20, …), (…, 12, 21, …), (…, 11, 19, …); each SSTable has a Bloom filter B; Cached Filters: B, B, B, …]

GET(16): NO

slide-35
SLIDE 35

Bloom filters can’t help range queries in RocksDB

[Figure: levels LN-2, LN-1, LN with SSTable keys (…, 6, 20, …), (…, 12, 21, …), (…, 11, 19, …); each SSTable has a Bloom filter B; Cached Filters: B, B, B, …]

SEEK(14, 18)

slide-37
SLIDE 37

SuRFs can benefit both point and range queries

[Figure: levels LN-2, LN-1, LN with SSTable keys (…, 6, 20, …), (…, 12, 21, …), (…, 11, 19, …); each SSTable has a SuRF S; Cached Filters: S, S, S, …]

slide-38
SLIDE 38

SuRFs can benefit both point and range queries

[Figure: levels LN-2, LN-1, LN with SSTable keys (…, 6, 20, …), (…, 12, 21, …), (…, 11, 19, …); each SSTable has a SuRF S; Cached Filters: S, S, S, …]

SEEK(14, 18)

GET(16)

NO
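
The effect of a cached range filter on SEEK can be sketched with the key values from the figure. This is an illustration under stated assumptions, not RocksDB code: the Level class is hypothetical, SEEK(lo, hi) is treated as a closed range, and an exact sorted list stands in for the SuRF (a real SuRF is approximate and may return false positives).

```python
from bisect import bisect_left


class Level:
    """One level: an SSTable on a slow device plus an in-memory range filter."""

    def __init__(self, keys):
        self.sstable = sorted(keys)      # pretend this lives on a slow device
        self.filter_keys = sorted(keys)  # exact stand-in for a cached SuRF
        self.device_reads = 0

    def filter_may_contain_range(self, lo, hi):
        i = bisect_left(self.filter_keys, lo)
        return i < len(self.filter_keys) and self.filter_keys[i] <= hi

    def read_range(self, lo, hi):
        self.device_reads += 1           # count every trip to the slow device
        return [k for k in self.sstable if lo <= k <= hi]


def seek(levels, lo, hi, use_filter):
    """Collect keys in [lo, hi], skipping levels whose filter says NO."""
    result = []
    for level in levels:
        if use_filter and not level.filter_may_contain_range(lo, hi):
            continue
        result.extend(level.read_range(lo, hi))
    return sorted(result)


def total_reads(levels):
    return sum(level.device_reads for level in levels)


def make_levels():
    return [Level([6, 20]), Level([12, 21]), Level([11, 19])]


unfiltered = make_levels()
empty_without = seek(unfiltered, 14, 18, use_filter=False)
reads_without = total_reads(unfiltered)

filtered = make_levels()
empty_with = seek(filtered, 14, 18, use_filter=True)
reads_with = total_reads(filtered)

hits = seek(filtered, 10, 12, use_filter=True)
reads_after_hits = total_reads(filtered)

print(empty_without, reads_without, empty_with, reads_with, hits, reads_after_hits)
```

For the empty range the filtered path never touches the device, which is the saving the slide attributes to SuRF; a point query GET(16) is the special case lo = hi = 16.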

slide-39
SLIDE 39

Evaluation setup: a time-series benchmark

[Figure: Time axis]

Key: 64-bit timestamp + 64-bit sensor ID

Value: 1KB payload

slide-40
SLIDE 40

Evaluation setup: a time-series benchmark

[Figure: Time axis with points t, t1, t2]

Key: 64-bit timestamp + 64-bit sensor ID

Value: 1KB payload

Queries: GET(t), SEEK(t1, t2)

slide-41
SLIDE 41

Evaluation setup: a time-series benchmark

[Figure: Time axis with points t, t1, t2]

Key: 64-bit timestamp + 64-bit sensor ID

Value: 1KB payload

Queries: GET(t), SEEK(t1, t2)

System Config

Dataset: ≈100 GB on SSD

DRAM: 32 GB

slide-42
SLIDE 42

Evaluation setup: a time-series benchmark

[Figure: Time axis with points t, t1, t2]

Key: 64-bit timestamp + 64-bit sensor ID

Value: 1KB payload

Queries: GET(t), SEEK(t1, t2)

Filter Config

Bloom filter: 14 bits per key

SuRF: 4-bit real suffix

System Config

Dataset: ≈100 GB on SSD

DRAM: 32 GB
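
One way to lay out such a key so that a time range maps to a contiguous key range is to concatenate the two 64-bit fields in big-endian order. The slide only gives the field widths; the byte order and the helper names below are assumptions made for this sketch.

```python
import struct

MAX_U64 = 2**64 - 1


def make_key(timestamp, sensor_id):
    """16-byte key: big-endian timestamp followed by big-endian sensor ID."""
    return struct.pack(">QQ", timestamp, sensor_id)


def time_range(t1, t2):
    """Smallest and largest possible keys whose timestamp lies in [t1, t2]."""
    return make_key(t1, 0), make_key(t2, MAX_U64)


# Byte-wise comparison of the keys follows (timestamp, sensor_id) order.
events = [(30, 7), (10, 2), (20, 9), (20, 1)]
keys = sorted(make_key(t, s) for t, s in events)
decoded = [struct.unpack(">QQ", k) for k in keys]

lo, hi = time_range(15, 25)
in_window = [struct.unpack(">QQ", k) for k in keys if lo <= k <= hi]

print(decoded)
print(in_window)
```

With this layout SEEK(t1, t2) becomes a single range scan over the key space, which is the kind of query a range filter can pre-reject.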

slide-43
SLIDE 43

SuRFs still benefit point queries in RocksDB

[Chart: Throughput (Kops/s), axis ticks 10 to 40, for No Filter, Bloom Filter, and SuRF; workload: all-false point queries; annotation: Worst-case Gap]

slide-44
SLIDE 44

SuRFs speed up range queries in RocksDB

[Chart: Throughput (Kops/s), axis ticks 2 to 10, versus Percent of queries with empty results, axis ticks 10 to 99; series: No Filter/Bloom Filter, SuRF]

slide-45
SLIDE 45

SuRFs speed up range queries in RocksDB

[Chart: Throughput (Kops/s), axis ticks 2 to 10, versus Percent of queries with empty results, axis ticks 10 to 99; series: No Filter/Bloom Filter, SuRF; annotation: 5x]

slide-46
SLIDE 46

Conclusion

SuRF is a fast and compact data structure optimized for range filtering

github.com/efficient/SuRF

rangefilter.io [Demo]

slide-47
SLIDE 47

Backup Slides

slide-48
SLIDE 48

Comparing ARF to SuRF

Experiment: insert 5M 64-bit integers, 10M Zipf-distributed range queries (ARF uses 2M queries for training)

                                   ARF     SuRF    Improvement
Bits per Key (held constant)       14      14      -
Range Query Throughput (Mops/s)    0.16    3.3     20x
False Positive Rate                25.7    2.2     12x
Build Time (s)                     118     1.2     98x
Build Memory (GB)                  26      0.02    1300x
Training Time (s)                  117     N/A     N/A
Training Throughput (Mops/s)       0.02    N/A     N/A
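
The improvement factors quoted in this comparison are ratios of the ARF and SuRF figures. The short check below recomputes them from the rounded values on the slide; the dictionary keys are labels chosen for this sketch. For throughput, higher is better, so the ratio is SuRF over ARF; for the other rows it is ARF over SuRF.

```python
arf = {"throughput_mops": 0.16, "fpr": 25.7, "build_time_s": 118, "build_mem_gb": 26}
surf = {"throughput_mops": 3.3, "fpr": 2.2, "build_time_s": 1.2, "build_mem_gb": 0.02}

improvement = {
    # Higher throughput is better: SuRF / ARF.
    "throughput_mops": surf["throughput_mops"] / arf["throughput_mops"],
    # Lower is better for the remaining metrics: ARF / SuRF.
    "fpr": arf["fpr"] / surf["fpr"],
    "build_time_s": arf["build_time_s"] / surf["build_time_s"],
    "build_mem_gb": arf["build_mem_gb"] / surf["build_mem_gb"],
}

for metric, factor in improvement.items():
    print(f"{metric}: {factor:.1f}x")
```

Because the inputs are already rounded, the recomputed factors agree with the slide only approximately.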

slide-49
SLIDE 49

LOUDS-Sparse encoding example

[Figure: an example trie with values v1 to v5 and its LOUDS-Sparse encoding as four parallel sequences: Label, Structure (S), Has-child (HC), Value]

LOUDS-Sparse: 10N bits

Theoretic Limit ≈ 9.4N bits

moveToChild(p) = select(S, rank(HC, p) + 1)
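
The navigation formula can be tried out on a tiny hand-encoded trie. The sketch below does not reproduce the slide's example: the key set (at, ha, if, it), the level-order encoding, and the linear-time rank and select helpers are all assumptions made for illustration, and values are omitted. S marks the first label of each node; HC marks labels that have a child node.

```python
# Trie for the keys "at", "ha", "if", "it", encoded in level order.
# Nodes: root {a, h, i}, then {t} under a, {a} under h, {f, t} under i.
LABELS = "ahitaft"
HC = [1, 1, 1, 0, 0, 0, 0]  # 1 if the label has a child node
S = [1, 0, 0, 1, 1, 1, 0]   # 1 if the label is the first one in its node


def rank1(bits, pos):
    """Number of 1s in bits[0..pos], inclusive."""
    return sum(bits[: pos + 1])


def select1(bits, k):
    """Position of the k-th 1 (k counts from 1)."""
    seen = 0
    for i, bit in enumerate(bits):
        seen += bit
        if bit and seen == k:
            return i
    raise ValueError("fewer than k ones")


def move_to_child(pos):
    """moveToChild(p) = select(S, rank(HC, p) + 1)."""
    return select1(S, rank1(HC, pos) + 1)


def node_end(start):
    """One past the last label of the node that starts at `start`."""
    end = start + 1
    while end < len(S) and S[end] == 0:
        end += 1
    return end


def lookup(key):
    """Exact lookup; keys that are prefixes of other keys are not modelled."""
    start = 0
    for depth, ch in enumerate(key):
        pos = LABELS.find(ch, start, node_end(start))
        if pos == -1:
            return False
        if HC[pos] == 0:
            return depth == len(key) - 1  # leaf: must be the last character
        start = move_to_child(pos)
    return False  # key ended at an inner node


print([move_to_child(p) for p in (0, 1, 2)])
print([lookup(k) for k in ("at", "ha", "if", "it", "hat", "ax", "i")])
```

The rank and select scans here are linear for readability; a practical encoding needs them to run quickly, which is what the "still allows efficient query operations" part of the succinct-structure definition refers to. Each label position carries exactly one HC bit and one S bit in addition to the label itself.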

slide-50
SLIDE 50

LOUDS-DS trades small space for performance

LOUDS-Dense (Hot), LOUDS-Sparse (Cold)

space overhead: < 1%

speed-up: 3x