SuRF: PRACTICAL RANGE FILTERING WITH FAST SUCCINCT TRIES
Huanchen Zhang
Hyeontaek Lim, Viktor Leis, David G. Andersen, Michael Kaminsky, Kimberly Keeton, Andrew Pavlo
Filters answer approximate
2
Billionaire
YES, 100% (No False Negatives)
NO, 99% / YES, 1% (False Positive Rate)
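The two guarantees on this slide (members always answer YES, non-members answer YES only at the false positive rate) can be illustrated with a textbook Bloom filter. This is a minimal sketch, not code from the talk: the class name, the choice of `blake2b`, the 10 bits per key, the 7 hash functions, and the generated key names are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Textbook Bloom filter; sizes and hash function are illustrative assumptions."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive all bit positions from one 16-byte digest (double hashing).
        digest = hashlib.blake2b(key.encode(), digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def may_contain(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


# Hypothetical key names; 10 bits per key and 7 hashes are assumed settings.
members = [f"member-{i}" for i in range(1000)]
bloom = BloomFilter(num_bits=10 * len(members), num_hashes=7)
for name in members:
    bloom.add(name)

# Every inserted key answers YES: no false negatives.
no_false_negatives = all(bloom.may_contain(name) for name in members)

# Keys that were never inserted mostly answer NO; the fraction that
# answers YES is the measured false positive rate.
outsiders = [f"outsider-{i}" for i in range(10000)]
false_positive_rate = sum(bloom.may_contain(n) for n in outsiders) / len(outsiders)
print(no_false_negatives, false_positive_rate)
```

The measured rate depends on the hash function and the keys, so it will only be close to, not equal to, the theoretical value for these settings.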
3
Local Memory, Slow Devices, Queries
NO / Probably YES
4
Point Filtering
Bloom Filter (1970)
Quotient Filter (2012)
Cuckoo Filter (2014)
SELECT * FROM Billionaires WHERE LastName = 'Pavlo'
Range Filtering
SELECT * FROM Billionaires WHERE LastName LIKE 'Pav%'
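A prefix predicate such as the LIKE query on this slide is a range query in disguise: every string starting with "Pav" lies in the half-open interval from "Pav" to "Paw". The sketch below shows that conversion and the exact answer a range filter would approximate. The helper names and the list of last names are hypothetical, and the conversion assumes a non-empty prefix compared by code point.

```python
import bisect


def prefix_to_range(prefix):
    """Return (low, high) such that s.startswith(prefix) iff low <= s < high."""
    # Assumes a non-empty prefix whose last character is not the largest code point.
    high = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    return prefix, high


def range_is_empty(sorted_names, low, high):
    """Exact emptiness check on a sorted list; a range filter approximates this."""
    start = bisect.bisect_left(sorted_names, low)
    return start == len(sorted_names) or sorted_names[start] >= high


# Hypothetical LastName values, for illustration only.
names = sorted(["Gates", "Musk", "Page", "Pavlo"])

low, high = prefix_to_range("Pav")
print(low, high, range_is_empty(names, low, high))
```

A hash-based point filter cannot answer this kind of question because hashing discards the key order that the interval test relies on.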
5
First practical, general-purpose range filter
SMALL: close to theoretic minimum
64-bit integer keys, 1% false positive rate: ≈ 12 bits per key
FAST: com
10 million 64-bit integer keys: ≈ 200 ns per query
USEFUL: evaluated in RocksDB
speed up range queries by up to 5x
6
Figure: trie with shared path S, I, G and branches M-O-D, K-D-D, O-P-S
7
Figure: the same trie next to its truncated form, which keeps S, I, G and only the first branch letters M, K, O
SIGMOD, SIGMETRICS
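The truncation on this slide can be sketched by keeping, for each key, only the shortest prefix that separates it from its neighbours in sorted order. This is an illustrative sketch using a plain list of prefixes instead of a succinct trie. The key set (SIGMOD, SIGKDD, SIGOPS) is an assumed reading of the letters in the figure, and all function names are hypothetical.

```python
def common_prefix_len(a, b):
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n


def truncate_keys(keys):
    """Keep the shortest prefix that separates each key from its sorted neighbours."""
    keys = sorted(set(keys))
    prefixes = []
    for i, key in enumerate(keys):
        shared = 0
        if i > 0:
            shared = max(shared, common_prefix_len(key, keys[i - 1]))
        if i + 1 < len(keys):
            shared = max(shared, common_prefix_len(key, keys[i + 1]))
        prefixes.append(key[:shared + 1])
    return prefixes


def may_contain(prefixes, key):
    """Point lookup: YES if the query extends any stored prefix."""
    return any(key.startswith(p) for p in prefixes)


def may_contain_range(prefixes, low, high):
    """Range lookup: YES if some stored prefix could belong to a key in [low, high]."""
    return any(low[:len(p)] <= p <= high[:len(p)] for p in prefixes)


KEYS = ["SIGMOD", "SIGKDD", "SIGOPS"]  # assumed reading of the slide's trie
prefixes = truncate_keys(KEYS)

print(prefixes)                                     # the truncated keys
print(may_contain(prefixes, "SIGMOD"))              # stored key
print(may_contain(prefixes, "SIGMETRICS"))          # shares the prefix SIGM
print(may_contain_range(prefixes, "SIGA", "SIGJ"))  # range below every prefix
```

Because SIGMETRICS shares the stored prefix SIGM, the truncated structure answers YES for it even though it was never inserted, which is the false positive the slide highlights. Stored keys always answer YES, since truncation only removes information.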
8
Figure: truncated trie with shared path S, I, G and branch letters M, K, O
Hashed Suffix Bits: 0x20, 0xC8, 0x06 (query SIGMETRICS: 0x18)
Each bit reduces FPR by half
Cannot help range queries
Real Suffix Bits: O, D, P (query SIGMETRICS: E)
Benefit point & range queries
Weaker distinguishability
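The two suffix options can be compared in a small sketch that stores one extra value next to each truncated key: either a few bits of a hash of the whole key, or the next real character after the truncation point. This is not the talk's implementation. The key set is an assumed reading of the figure, the 8-bit `blake2b` hash is an arbitrary choice, and the function names are hypothetical.

```python
import hashlib

KEYS = ["SIGKDD", "SIGMOD", "SIGOPS"]  # assumed reading of the slide's trie


def common_prefix_len(a, b):
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n


def hashed_suffix(key, prefix_len):
    # 8 bits of a hash of the whole key (hash function chosen arbitrarily).
    return hashlib.blake2b(key.encode(), digest_size=1).digest()[0]


def real_suffix(key, prefix_len):
    # The next character after the truncated prefix ("" if the key ends there).
    return key[prefix_len:prefix_len + 1]


def build(keys, suffix_fn):
    """Map each truncated key to its suffix value."""
    keys = sorted(set(keys))
    table = {}
    for i, key in enumerate(keys):
        shared = 0
        if i > 0:
            shared = max(shared, common_prefix_len(key, keys[i - 1]))
        if i + 1 < len(keys):
            shared = max(shared, common_prefix_len(key, keys[i + 1]))
        n = min(len(key), shared + 1)
        table[key[:n]] = suffix_fn(key, n)
    return table


def may_contain(table, key, suffix_fn):
    """YES only if a stored prefix matches and its suffix value agrees."""
    return any(table.get(key[:n]) == suffix_fn(key, n)
               for n in range(1, len(key) + 1))


real_table = build(KEYS, real_suffix)
hashed_table = build(KEYS, hashed_suffix)

print(real_table)
print(may_contain(real_table, "SIGMETRICS", real_suffix))   # E differs from O
print(may_contain(real_table, "SIGMOBILE", real_suffix))    # O matches O
print(may_contain(hashed_table, "SIGMOD", hashed_suffix))   # stored key
```

With real suffixes, SIGMETRICS is rejected because its next character E differs from the stored O, but any query that continues with O after SIGM still passes, which illustrates the weaker distinguishability. Hashed suffixes reject a mismatching key with probability that grows with each added bit, but a hash carries no ordering information, so it cannot tighten a range query.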
9
"… uses an amount of space that is "close" to the information-theoretic lower bound, but still allows efficient query operations" [wikipedia]
10
≈10 + suffix bits per key for 64-bit integers
≈14 + suffix bits per key for emails
Matches state-of-the-art pointer-based trees
11
Figure: LSM-tree levels LN-2, LN-1, LN, each with an SSTable
SSTable keys shown: (…, 6, 20, …), (…, 12, 21, …), (…, 11, 19, …)
Cached Filters: B, B, B, …
GET(16): NO
12
Figure: the same levels and cached filters (B, B, B, …)
SEEK(14, 18)
13
Figure: the same levels with Cached Filters: S, S, S, …
SEEK(14, 18), GET(16): NO
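The difference between the B and S filters on these slides can be sketched by counting simulated SSTable reads. This is a toy model, not RocksDB code: the classes and functions are hypothetical, and both filters are exact stand-ins (a set for the Bloom filter, a sorted list for SuRF), so they show which queries a filter can prune but not false positive behaviour. The keys are the ones visible in the figure.

```python
import bisect


class SSTable:
    """Sorted run on a slow device; `reads` counts simulated disk accesses."""

    def __init__(self, keys):
        self.keys = sorted(keys)
        self.reads = 0

    def get(self, key):
        self.reads += 1
        i = bisect.bisect_left(self.keys, key)
        return i < len(self.keys) and self.keys[i] == key

    def seek(self, low, high):
        self.reads += 1
        i = bisect.bisect_left(self.keys, low)
        if i < len(self.keys) and self.keys[i] <= high:
            return self.keys[i]
        return None


class PointFilter:
    """Stand-in for a Bloom filter (exact set, so no false positives here)."""

    def __init__(self, keys):
        self.keys = set(keys)

    def may_contain(self, key):
        return key in self.keys

    def may_contain_range(self, low, high):
        return True  # hashing loses key order, so the answer is always "maybe"


class RangeFilter(PointFilter):
    """Stand-in for SuRF (exact sorted keys instead of a truncated trie)."""

    def __init__(self, keys):
        super().__init__(keys)
        self.sorted_keys = sorted(keys)

    def may_contain_range(self, low, high):
        i = bisect.bisect_left(self.sorted_keys, low)
        return i < len(self.sorted_keys) and self.sorted_keys[i] <= high


def get(levels, key):
    return any(flt.may_contain(key) and table.get(key) for table, flt in levels)


def seek(levels, low, high):
    hits = [table.seek(low, high) for table, flt in levels
            if flt.may_contain_range(low, high)]
    hits = [k for k in hits if k is not None]
    return min(hits) if hits else None


def total_reads(levels):
    return sum(table.reads for table, _ in levels)


LEVEL_KEYS = [[6, 20], [12, 21], [11, 19]]  # keys visible on the slide
bloom_levels = [(SSTable(k), PointFilter(k)) for k in LEVEL_KEYS]
surf_levels = [(SSTable(k), RangeFilter(k)) for k in LEVEL_KEYS]

get_result = get(bloom_levels, 16)
reads_after_get = total_reads(bloom_levels)
seek_bloom = seek(bloom_levels, 14, 18)
bloom_reads = total_reads(bloom_levels)
seek_surf = seek(surf_levels, 14, 18)
surf_reads = total_reads(surf_levels)
print(get_result, reads_after_get, seek_bloom, bloom_reads, seek_surf, surf_reads)
```

In this model GET(16) is pruned by either filter, while SEEK(14, 18) touches all three SSTables behind point filters and none behind range filters.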
14
Figure: time axis with points t, t1, t2
Key: 64-bit timestamp + 64-bit sensor ID
Value: 1KB payload
Queries: GET(t), SEEK(t1, t2)
System Config
Dataset: ≈100 GB on SSD
DRAM: 32 GB
Filter Config
Bloom filter: 14 bits per key
SuRF: 4-bit real suffix
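One way to lay out the 128-bit key described on this slide is to concatenate the two 64-bit fields in big-endian order, so that comparing keys as byte strings matches comparing (timestamp, sensor ID) pairs. The slide does not state the byte layout, so the encoding below and the helper names are assumptions for illustration.

```python
import struct

MAX_U64 = 2**64 - 1


def encode_key(timestamp, sensor_id):
    """Pack two unsigned 64-bit integers big-endian (assumed layout)."""
    return struct.pack(">QQ", timestamp, sensor_id)


def decode_key(key):
    return struct.unpack(">QQ", key)


def seek_bounds(t1, t2):
    """Byte-string bounds covering every sensor ID with t1 <= timestamp <= t2."""
    return encode_key(t1, 0), encode_key(t2, MAX_U64)


key = encode_key(1_700_000_000, 42)  # hypothetical timestamp and sensor ID
low, high = seek_bounds(1_700_000_000, 1_700_000_060)
print(len(key), decode_key(key), low <= key <= high)
```

With an order-preserving layout like this, a time-window query such as SEEK(t1, t2) becomes a plain byte-range lookup, which is the kind of query a range filter can prune.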
15
Figure: All-false point queries. Throughput (Kops/s) for No Filter, Bloom Filter, and SuRF. Annotation: Worst-case Gap
16
Figure: Throughput (Kops/s) versus percent of queries with empty results (10 to 99), for No Filter/Bloom Filter and SuRF
17
B1
Experiment: insert 5M 64-bit integers, 10M Zipf-distributed range queries (ARF uses 2M queries for training)

Metric                              ARF     SuRF    Improvement
Bits per Key (held constant)        14      14      -
Range Query Throughput (Mops/s)     0.16    3.3     20x
False Positive Rate (%)             25.7    2.2     12x
Build Time (s)                      118     1.2     98x
Build Memory (GB)                   26      0.02    1300x
Training Time (s)                   117     N/A     N/A
Training Throughput (Mops/s)        0.02    N/A     N/A
B2
LOUDS-Sparse
Figure: example trie (labels a, d, f, h, i, t; values v1 to v5) encoded as four sequences: Label, Structure, Has-child, Value
10N bits
Theoretic Limit ≈ 9.4N bits
moveToChild(p) = select(S, rank(HC, p) + 1)
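The navigation formula on this slide can be tried out on a small trie. The sketch below builds the Label, Structure (S), Has-child (HC), and Value sequences in level order and follows moveToChild(p) = select(S, rank(HC, p) + 1) during lookup. It is illustrative only: the keys are hypothetical and assumed to be prefix-free, positions are 0-based with rank counting the bit at p itself, and rank and select are plain linear scans instead of the constant-time structures a succinct trie would use.

```python
from collections import deque


def build_louds_sparse(keys):
    """Encode a prefix-free key set level by level into four sequences."""
    root = {}
    for key in keys:
        node = root
        for ch in key:
            node = node.setdefault(ch, {})
    labels, has_child, louds, values = [], [], [], []
    queue = deque([(root, "")])
    while queue:
        node, prefix = queue.popleft()
        for i, ch in enumerate(sorted(node)):
            child = node[ch]
            labels.append(ch)
            has_child.append(1 if child else 0)
            louds.append(1 if i == 0 else 0)   # 1 marks the first label of a node
            if child:
                queue.append((child, prefix + ch))
            else:
                values.append(prefix + ch)     # leaf: store the key as its value
    return labels, has_child, louds, values


def rank1(bits, pos):
    """Number of 1 bits in bits[0..pos], inclusive."""
    return sum(bits[:pos + 1])


def select1(bits, i):
    """Position of the i-th 1 bit, counting from 1."""
    seen = 0
    for pos, bit in enumerate(bits):
        seen += bit
        if seen == i:
            return pos
    raise ValueError("not enough 1 bits")


def move_to_child(has_child, louds, pos):
    return select1(louds, rank1(has_child, pos) + 1)


def lookup(trie, key):
    labels, has_child, louds, values = trie
    pos = 0
    for depth, ch in enumerate(key):
        while labels[pos] != ch:               # scan the labels of the current node
            pos += 1
            if pos == len(labels) or louds[pos]:
                return None
        if not has_child[pos]:
            if depth == len(key) - 1:
                return values[pos - rank1(has_child, pos)]
            return None
        pos = move_to_child(has_child, louds, pos)
    return None


KEYS = ["fat", "fit", "hat", "hit", "id"]      # hypothetical, prefix-free keys
trie = build_louds_sparse(KEYS)
labels, has_child, louds, values = trie
print("".join(labels), has_child, louds, values)
print(lookup(trie, "hit"), lookup(trie, "hot"))
print(10 * len(labels), "bits at 10 bits per label")
```

Each label costs 8 bits plus one Has-child bit and one Structure bit, which is where the 10 bits per node on the slide comes from.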
B3
LOUDS-Dense: Hot
LOUDS-Sparse: Cold
Figure axes: space overhead, speed-up