SuRF: Practical Range Filtering with Fast Succinct Tries - PowerPoint PPT Presentation


  1. SuRF: Practical Range Filtering with Fast Succinct Tries. Huanchen Zhang, Hyeontaek Lim, Viktor Leis, David G. Andersen, Michael Kaminsky, Kimberly Keeton, Andrew Pavlo

  2. Filters answer approximate membership queries

  3. Filters answer approximate membership queries. [Figure label: Billionaire]

  5. Filters answer approximate membership queries. [Figure labels: Billionaire; YES, 100%; No False Negatives]

  6. Filters answer approximate membership queries. [Figure label: Billionaire]

  7. Filters answer approximate membership queries. [Figure labels: Billionaire; NO, 99%]

  8. Filters answer approximate membership queries. [Figure labels: Billionaire; NO, 99%; YES, 1%]

  10. Filters answer approximate membership queries. [Figure labels: Billionaire; NO, 99%; YES, 1% (False Positive Rate)]
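
The membership behavior on slides 2 to 10 can be illustrated with a toy Bloom filter. This is only a sketch: the class name `BloomFilter`, the sizing (10 bits per key, 7 hash probes), and the `member-*` / `outsider-*` keys are assumptions chosen for the example, not values taken from the slides. It shows the two properties the slides name: inserted keys are never rejected (no false negatives), and a small fraction of absent keys is wrongly accepted (the false positive rate).

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: k salted hash probes into an m-bit array."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str):
        # Derive k bit positions by salting a single hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def may_contain(self, key: str) -> bool:
        # True means "probably yes"; False means "definitely no".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


# Hypothetical data: 1,000 members, 10 bits per key, 7 probes.
members = [f"member-{i}" for i in range(1000)]
bf = BloomFilter(num_bits=10_000, num_hashes=7)
for name in members:
    bf.add(name)

# Measure how often keys that were never inserted are accepted anyway.
outsiders = [f"outsider-{i}" for i in range(10_000)]
fpr = sum(bf.may_contain(name) for name in outsiders) / len(outsiders)
print(f"measured false positive rate: {fpr:.4f}")
```

With this sizing the measured rate should land in the neighborhood of one percent, though the exact figure depends on the hash function and the keys.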

  11. Filters pre-reject most negative queries. [Figure labels: Queries; Local Memory; Slow Devices]

  12. Filters pre-reject most negative queries. [Figure labels: Queries; Local Memory; NO; Probably YES; Slow Devices]

  13. Existing filters only support point filtering. Point filtering: SELECT * FROM Billionaires WHERE LastName = ‘Pavlo’. Bloom Filter (1970), Quotient Filter (2012), Cuckoo Filter (2014)

  14. Existing filters only support point filtering. Point filtering: SELECT * FROM Billionaires WHERE LastName = ‘Pavlo’. Range filtering: SELECT * FROM Billionaires WHERE LastName LIKE ‘Pav%’. Bloom Filter (1970), Quotient Filter (2012), Cuckoo Filter (2014)

  16. Our solution: Succinct Range Filters (SuRF). First practical, general-purpose range filter. SMALL: close to theoretical minimum (64-bit integer keys, 1% false positive rate: ≈ 12 bits per key). FAST: comparable to fastest trees (10 million 64-bit integer keys: ≈ 200 ns per query). USEFUL: evaluated in RocksDB (speed up range queries by up to 5x)

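  17. Starting point: a complete trie. [Figure: trie with node labels S, I, G, K, O, M, D, O, P, D, D, S]

  18. Starting point: a complete trie. TOO BIG. [Figure: trie with node labels S, I, G, K, O, M, D, O, P, D, D, S]

  19. Make it smaller: a truncated trie. [Figure: complete trie (S, I, G, K, O, M, D, O, P, D, D, S) next to truncated trie (S, I, G, K, O, M)]

  20. Make it smaller: a truncated trie. [Figure: complete trie (S, I, G, K, O, M, D, O, P, D, D, S) next to truncated trie (S, I, G, K, O, M); labels: SIGMOD, SIGMETRICS]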
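
The truncation idea on slides 17 to 20 can be sketched with a flat list of prefixes standing in for the trie. Everything here is an assumption made for illustration: the key set `SIGKDD`, `SIGMOD`, `SIGOPS` is inferred from the node letters in the figure, the helper names are hypothetical, and the sketch assumes no key is a prefix of another. Each key is cut to the shortest prefix that separates it from its sorted neighbors, which is why `SIGMETRICS` is accepted even though only `SIGMOD` was stored.

```python
def common_prefix_len(a: str, b: str) -> int:
    """Length of the longest common prefix of a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def truncate_keys(keys):
    """Cut each key to the shortest prefix that is unique among the keys."""
    keys = sorted(set(keys))
    prefixes = []
    for i, key in enumerate(keys):
        shared = 0
        if i > 0:
            shared = max(shared, common_prefix_len(key, keys[i - 1]))
        if i + 1 < len(keys):
            shared = max(shared, common_prefix_len(key, keys[i + 1]))
        prefixes.append(key[: shared + 1])
    return prefixes


def may_contain_point(prefixes, query: str) -> bool:
    # Any key that extends a stored prefix is accepted.
    return any(query.startswith(p) for p in prefixes)


def may_contain_range(prefixes, lo: str, hi: str) -> bool:
    # Accept if some extension of a stored prefix can fall inside [lo, hi].
    return any(p <= hi and lo[: len(p)] <= p for p in prefixes)


prefixes = truncate_keys(["SIGKDD", "SIGMOD", "SIGOPS"])
print(prefixes)
print(may_contain_point(prefixes, "SIGMOD"))         # stored key
print(may_contain_point(prefixes, "SIGMETRICS"))     # false positive
print(may_contain_range(prefixes, "SIGA", "SIGJ"))   # empty range
```

The range check treats both bounds as inclusive, which is a simplification chosen for the sketch.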

  21. Use suffix bits to reduce false positive rate. Hashed Suffix Bits; Real Suffix Bits. [Figure: two truncated tries (S, I, G, K, O, M); hashed suffixes 0xC8, 0x20, 0x06; real suffixes D, O, P]

  22. Use suffix bits to reduce false positive rate. Hashed Suffix Bits; Real Suffix Bits. [Figure: two truncated tries (S, I, G, K, O, M); hashed suffixes 0xC8, 0x20, 0x06; real suffixes D, O, P; query SIGMETRICS]

  23. Use suffix bits to reduce false positive rate. Hashed Suffix Bits; Real Suffix Bits. [Figure: two truncated tries (S, I, G, K, O, M); hashed suffixes 0xC8, 0x20, 0x06; real suffixes D, O, P; query SIGMETRICS with hashed suffix 0x18]

  24. Use suffix bits to reduce false positive rate. Hashed Suffix Bits; Real Suffix Bits. [Figure: two truncated tries (S, I, G, K, O, M); hashed suffixes 0xC8, 0x20, 0x06; real suffixes D, O, P; query SIGMETRICS on both sides, with hashed suffix 0x18 and real suffix E]

  25. Use suffix bits to reduce false positive rate. Hashed Suffix Bits; Real Suffix Bits. [Figure: two truncated tries (S, I, G, K, O, M); hashed suffixes 0xC8, 0x20, 0x06; real suffixes D, O, P]

  26. Use suffix bits to reduce false positive rate. Hashed Suffix Bits: each bit reduces FPR by half. Real Suffix Bits. [Figure: two truncated tries (S, I, G, K, O, M); hashed suffixes 0xC8, 0x20, 0x06; real suffixes D, O, P]

  27. Use suffix bits to reduce false positive rate. Hashed Suffix Bits: each bit reduces FPR by half; cannot help range queries. Real Suffix Bits. [Figure: two truncated tries (S, I, G, K, O, M); hashed suffixes 0xC8, 0x20, 0x06; real suffixes D, O, P]

  28. Use suffix bits to reduce false positive rate. Hashed Suffix Bits: each bit reduces FPR by half; cannot help range queries. Real Suffix Bits: benefit point & range queries. [Figure: two truncated tries (S, I, G, K, O, M); hashed suffixes 0xC8, 0x20, 0x06; real suffixes D, O, P]

  29. Use suffix bits to reduce false positive rate. Hashed Suffix Bits: each bit reduces FPR by half; cannot help range queries. Real Suffix Bits: benefit point & range queries; weaker distinguishability. [Figure: two truncated tries (S, I, G, K, O, M); hashed suffixes 0xC8, 0x20, 0x06; real suffixes D, O, P]
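
The trade-off on slides 21 to 29 can be sketched by attaching one suffix byte to each truncated prefix. The assumptions are loud ones: the key set is inferred from the figure, the prefix length is hard-coded to 4, SHA-256 stands in for whatever hash the authors use (so the byte values will not match the 0xC8, 0x20, 0x06 shown on the slides), and all function names are hypothetical. A hashed suffix stores bits of a hash of the whole key; a real suffix stores the next key byte after the prefix.

```python
import hashlib

KEYS = ["SIGKDD", "SIGMOD", "SIGOPS"]  # assumed key set, inferred from the figure
PREFIX_LEN = 4                         # all three keys become unique at 4 chars


def hashed_suffix(key: str) -> int:
    """8 bits taken from a hash of the whole key."""
    return hashlib.sha256(key.encode()).digest()[0]


def real_suffix(key: str) -> int:
    """The 8 key bits that follow the truncated prefix."""
    return ord(key[PREFIX_LEN])


def build(keys, suffix_fn):
    return {key[:PREFIX_LEN]: suffix_fn(key) for key in keys}


def may_contain(table, suffix_fn, query: str) -> bool:
    if len(query) <= PREFIX_LEN:       # sketch assumes longer keys only
        return False
    stored = table.get(query[:PREFIX_LEN])
    return stored is not None and stored == suffix_fn(query)


def may_contain_range_real(table, lo: str, hi: str) -> bool:
    # Real suffix bits preserve key order, so they can tighten range checks.
    for prefix, suffix in table.items():
        candidate = prefix + chr(suffix)
        if candidate <= hi and lo[: len(candidate)] <= candidate:
            return True
    return False


hashed = build(KEYS, hashed_suffix)
real = build(KEYS, real_suffix)

print(may_contain(real, real_suffix, "SIGMETRICS"))    # 'E' != 'O': rejected
print(may_contain(real, real_suffix, "SIGMOBILE"))     # 'O' == 'O': false positive
print(may_contain_range_real(real, "SIGMA", "SIGME"))  # stored key starts "SIGMO"
```

`SIGMOBILE` slipping through is one way to read "weaker distinguishability": keys that share the stored suffix byte are indistinguishable, whereas a hashed suffix would separate most of them. The hashed byte has no order, which is why the sketch offers no hashed counterpart to `may_contain_range_real`.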

  30. Succinct Data Structure: “… uses an amount of space that is ‘close’ to the information-theoretic lower bound, but still allows efficient query operations.” [wikipedia]

  31. SuRF’s encoding is small and fast. Small: ≈ 10 + suffix bits per key for 64-bit integers; ≈ 14 + suffix bits per key for emails. Fast: matches state-of-the-art pointer-based trees

  32. Bloom filters speed up point queries in RocksDB. [Figure: cached filters B, B, B, …; SSTable levels L(N-2): …, 6, 20, …; L(N-1): …, 12, 21, …; L(N): …, 11, 19, …; each with a Bloom filter B]

  33. Bloom filters speed up point queries in RocksDB. GET(16). [Figure: cached filters B, B, B, …; SSTable levels L(N-2): …, 6, 20, …; L(N-1): …, 12, 21, …; L(N): …, 11, 19, …; each with a Bloom filter B]

  34. Bloom filters speed up point queries in RocksDB. GET(16): NO. [Figure: cached filters B, B, B, …; SSTable levels L(N-2): …, 6, 20, …; L(N-1): …, 12, 21, …; L(N): …, 11, 19, …; each with a Bloom filter B]

  35. Bloom filters can’t help range queries in RocksDB. SEEK(14, 18). [Figure: cached filters B, B, B, …; SSTable levels L(N-2): …, 6, 20, …; L(N-1): …, 12, 21, …; L(N): …, 11, 19, …; each with a Bloom filter B]

  37. SuRFs can benefit both point and range queries. [Figure: cached filters S, S, S, …; SSTable levels L(N-2): …, 6, 20, …; L(N-1): …, 12, 21, …; L(N): …, 11, 19, …; each with a SuRF S]

  38. SuRFs can benefit both point and range queries. GET(16), SEEK(14, 18): NO. [Figure: cached filters S, S, S, …; SSTable levels L(N-2): …, 6, 20, …; L(N-1): …, 12, 21, …; L(N): …, 11, 19, …; each with a SuRF S]
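
The effect shown on slides 32 to 38 can be sketched by counting simulated disk reads with and without a per-table range filter. This is not RocksDB's API: `SSTable`, `seek`, and `total_reads` are hypothetical names, the tables hold only the keys visible in the figure (6, 20 / 12, 21 / 11, 19), and an exact in-memory copy of the keys stands in for SuRF, which in reality is approximate and far smaller. SEEK(14, 18) is treated here as a closed-range scan.

```python
from bisect import bisect_left


class SSTable:
    """One sorted run 'on disk' plus an in-memory filter over its keys."""

    def __init__(self, keys):
        self.keys = sorted(keys)
        # Exact stand-in for a range filter; a real SuRF may return
        # false positives but never false negatives.
        self.filter_keys = list(self.keys)
        self.disk_reads = 0

    def filter_may_contain_range(self, lo, hi) -> bool:
        i = bisect_left(self.filter_keys, lo)
        return i < len(self.filter_keys) and self.filter_keys[i] <= hi

    def read_range(self, lo, hi):
        self.disk_reads += 1  # simulated I/O
        return [k for k in self.keys if lo <= k <= hi]


def seek(tables, lo, hi, use_filter: bool):
    found = []
    for table in tables:
        if use_filter and not table.filter_may_contain_range(lo, hi):
            continue  # skip the slow device entirely
        found.extend(table.read_range(lo, hi))
    return sorted(found)


def total_reads(tables) -> int:
    return sum(t.disk_reads for t in tables)


def make_levels():
    # Keys visible on the slides for levels N-2, N-1, and N.
    return [SSTable([6, 20]), SSTable([12, 21]), SSTable([11, 19])]


unfiltered = make_levels()
result_unfiltered = seek(unfiltered, 14, 18, use_filter=False)
filtered = make_levels()
result_filtered = seek(filtered, 14, 18, use_filter=True)

print(result_unfiltered, total_reads(unfiltered))
print(result_filtered, total_reads(filtered))
```

A point lookup such as GET(16) is the special case `seek(tables, 16, 16, use_filter=True)`, which is how one filter can serve both query types in this sketch.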

  39. Evaluation setup: a time-series benchmark. Key: 64-bit timestamp + 64-bit sensor ID. Value: 1 KB payload. [Figure axis: Time]

  40. Evaluation setup: a time-series benchmark. Key: 64-bit timestamp + 64-bit sensor ID. Value: 1 KB payload. Queries: SEEK(t1, t2), GET(t). [Figure: time axis marking t, t1, t2]

  41. Evaluation setup: a time-series benchmark. Key: 64-bit timestamp + 64-bit sensor ID. Value: 1 KB payload. Queries: SEEK(t1, t2), GET(t). System config: dataset ≈ 100 GB on SSD; DRAM: 32 GB. [Figure: time axis marking t, t1, t2]

  42. Evaluation setup: a time-series benchmark. Key: 64-bit timestamp + 64-bit sensor ID. Value: 1 KB payload. Queries: SEEK(t1, t2), GET(t). System config: dataset ≈ 100 GB on SSD; DRAM: 32 GB. Filter config: Bloom filter: 14 bits per key; SuRF: 4-bit real suffix. [Figure: time axis marking t, t1, t2]
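
The key layout on slides 39 to 42 can be sketched as a 16-byte string. The slides give only the two fields; the big-endian byte order, the helper names, and the sample readings below are assumptions. Big-endian is chosen so that byte-wise comparison, which is what a trie or an ordered key-value store uses, agrees with ordering by timestamp and then by sensor ID. That property is what lets a time window SEEK(t1, t2) map to one contiguous key range.

```python
import struct

MAX_U64 = 2**64 - 1


def encode_key(timestamp: int, sensor_id: int) -> bytes:
    """Pack (timestamp, sensor_id) as two big-endian unsigned 64-bit ints."""
    return struct.pack(">QQ", timestamp, sensor_id)


def decode_key(key: bytes):
    return struct.unpack(">QQ", key)


def seek_bounds(t1: int, t2: int):
    """Inclusive key range covering every sensor with t1 <= timestamp <= t2."""
    return encode_key(t1, 0), encode_key(t2, MAX_U64)


# Hypothetical readings as (timestamp, sensor_id) pairs.
readings = [(300, 1), (5, 9), (256, 2), (5, 3)]
encoded = sorted(encode_key(t, s) for t, s in readings)
decoded = [decode_key(k) for k in encoded]
print(decoded)

lo, hi = seek_bounds(5, 256)
in_window = [decode_key(k) for k in encoded if lo <= k <= hi]
print(in_window)
```

With a little-endian layout, the byte order and the numeric order would disagree (256 would sort before 5), so the choice of byte order matters for any range filter built over these keys.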

  43. SuRFs still benefit point queries in RocksDB. [Figure: bar chart, all-false point queries; y-axis: throughput (Kops/s), 0 to 40; bars: No Filter, Bloom Filter, SuRF; annotation: worst-case gap]

  44. SuRFs speed up range queries in RocksDB. [Figure: line chart; y-axis: throughput (Kops/s), 0 to 10; x-axis: percent of queries with empty results, 10 to 99; series: SuRF, No Filter/Bloom Filter]
