
Don't Thrash: How to Cache Your Hash in Flash



  1. Don’t Thrash: How to Cache Your Hash in Flash
     M.A. Bender, M. Farach-Colton, R. Johnson, B.C. Kuszmaul, D. Medjedovic, P. Montes, P. Shetty, R. P. Spillane, E. Zadok
     Stony Brook U., Rutgers U., MIT, Tokutek

  2. Bloom Filter
     [Figure: a cache (e.g., RAM) holds the bit-array 0 1 0 1 1 in front of the store. Elements A and B, stored in the Bloom filter, are each hashed to k positions in the bit-array; here k=2.]
     • A Bloom filter is a bit-array + k hash functions
     • Storing a few bits per element lets the BF stay in RAM, even when the elements themselves are too large to fit
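
The "bit-array + k hash functions" definition can be sketched in a few lines of Python. This is a minimal illustration, not the implementation from the talk: the class name `BloomFilter`, the method names, and the salted SHA-256 hashing are all assumptions chosen for readability.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: an m-bit array plus k hash functions."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = bytearray(m)  # one byte per bit, for readability only

    def _positions(self, item: str) -> list[int]:
        # Derive k positions by salting SHA-256 with the hash index i.
        # (An illustrative choice, not the hash family used by the authors.)
        positions = []
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            positions.append(int.from_bytes(digest[:8], "big") % self.m)
        return positions

    def insert(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def may_contain(self, item: str) -> bool:
        # False means "definitely absent"; True may be a false positive.
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter(m=64, k=2)
bf.insert("A")
bf.insert("B")
```

Each insert touches k unrelated positions, which is harmless in RAM but becomes k random page writes once the bit-array lives on Flash.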

  3. Bloom Filter Lookups & False Positives
     [Figure: lookups against the bit-array B = 0 1 0 1 1 holding A and B. Looking up D finds B[h_1(D)] = 0 and B[h_2(D)] = 1, so D is reported absent. Looking up C is a false positive (C was never inserted).]
     $p_{FP}(x) \approx \left(1 - e^{-kn/m}\right)^k$, for n stored elements, an m-bit array, and k hash functions
     • False positives are unlikely
     • No false negatives (no means no)
     • Allowing false positives is what keeps the BF small
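
To get a feel for the standard Bloom filter approximation (1 - e^(-kn/m))^k, the snippet below evaluates it for one sizing. The function name `bloom_fp_rate` and the chosen parameters (10 bits per element, k = 7) are assumptions for illustration and do not come from the slides.

```python
import math


def bloom_fp_rate(n: int, m: int, k: int) -> float:
    """Approximate false-positive probability (1 - e^(-k*n/m))^k."""
    return (1.0 - math.exp(-k * n / m)) ** k


# Hypothetical sizing: 10 bits per element and k = 7 hash functions.
p = bloom_fp_rate(n=1_000_000, m=10_000_000, k=7)
print(f"{p:.4f}")
```

With these assumed parameters the estimate comes out a little under 1%, which is why a handful of bits per element is enough for a useful filter.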

  4. Flash
     [Figure: the bit-array B = 0 1 0 1 1 moves from RAM to Flash, in front of the store holding A and B.]
     • Bigger & cheaper than RAM, faster than disk
     • 8TB of 512B keys needs 16GB of RAM for a ~1% BF
     • Flash is a good place to cheaply store large BFs

  5. Thrashing
     [Figure: inserting C sets random bits of the bit-array B in Flash, causing random writes.]
     • Setting random bits to 1 causes random writes
     • OK in RAM, not in Flash

  6. Summary of Our Results
     • Cascade Filter (CF), a BF replacement optimized for fast inserts on Flash
     • Our performance
       – We do 670,000 inserts/sec (40x that of other variants)
       – We do 530 lookups/sec (1/3x that of other variants)
     • We use Quotient Filters (QF) instead of Bloom Filters
       – They have better access locality
       – You can efficiently merge two QFs into a larger QF (with the same FP rate)
     • We use merging techniques to compose multiple QFs into a CF

  7. Thrashing is the Problem
     [Figure: inserting C causes K random writes to Flash.]
     • Every insert writes to K Flash pages
     • Writing to a Flash page is expensive
     • We can’t do fast insertions without working around this issue

  8. Shaving off K
     [Figure: inserting C now causes just one random write to Flash, not K.]
     • Now you only write one block for each insert instead of K blocks
     • Two-step hash [Canim et al., 2010]
     • This helps a little

  9. Queue Writes
     [Figure: the bit positions for A, B, and D are queued in RAM; we write 5 bits with only 2 flash writes.]
     • This helps a lot [Canim et al., 2010]
     • Buffering gives bit-flips a chance to piggy-back
     • This is how others have cached hashes in Flash
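
The piggy-backing idea can be sketched as follows: bit-sets are queued in RAM and flushed grouped by flash page, so several bits share one page write. The class `BufferedBitArray`, its methods, and the 4 KiB page size are assumptions for this sketch, not details from the talk or from Canim et al.

```python
from collections import defaultdict

PAGE_BITS = 4096 * 8  # assumed flash page size: 4 KiB


class BufferedBitArray:
    """Queue bit-sets in RAM, then flush them one flash page at a time."""

    def __init__(self, queue_limit: int) -> None:
        self.queue_limit = queue_limit
        self.pending: set[int] = set()         # bit positions waiting in RAM
        self.pages: dict[int, set[int]] = {}   # stand-in for pages on flash
        self.page_writes = 0

    def set_bit(self, pos: int) -> None:
        self.pending.add(pos)
        if len(self.pending) >= self.queue_limit:
            self.flush()

    def flush(self) -> None:
        by_page: dict[int, set[int]] = defaultdict(set)
        for pos in self.pending:
            by_page[pos // PAGE_BITS].add(pos)
        for page, positions in by_page.items():
            # One page write covers every queued bit that lands on that page.
            self.pages.setdefault(page, set()).update(positions)
            self.page_writes += 1
        self.pending.clear()

    def test_bit(self, pos: int) -> bool:
        return pos in self.pending or pos in self.pages.get(pos // PAGE_BITS, ())


buf = BufferedBitArray(queue_limit=5)
for pos in (3, 4, 1, PAGE_BITS + 3, PAGE_BITS + 1):
    buf.set_bit(pos)
```

Here five bits fall on two pages, so the flush costs two page writes. When the queue is small relative to the bit-array, most pages receive only one queued bit and the saving disappears.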

  10. We Need Help
      • Buffering works when the queue is large
      • Small queues insert ~1 element per flash write
      • We’re interested in large datasets and fast insertions (i.e., when buffering doesn’t work)

  11. An Important Problem
      • Many companies optimize their DBs for large datasets and fast inserts
        – Baidu Hypertable
        – Facebook Cassandra
        – Google BigTable
        – Tokutek TokuDB
        – Yahoo! HBase
        – … and more!
      • Scaling the trusty Bloom Filter to Flash would be a powerful tool for tackling these problems

  12. Several Data Structures Avoid Random Writes
      • A list of the most common methods
        – Buffered Repository Trees
        – Cassandra
        – Cache-Oblivious Lookahead Arrays
        – Log-Structured Merge Trees
        – … and more
      • We can try to adapt the general method many of these structures use

  13. The General Method
      [Figure: a RAM buffer sits above previously flushed, sorted buffers in the store. A lookup of 8 checks the buffers in turn (no, no, found). Buffers are merged to keep the total number of buffers low, at most ⌈log N⌉.]
      • Supports deletes
      • Composed of many sorted lists
      • We can use this technique to avoid random writes
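
A minimal sketch of this general method, assuming a binary-counter merge schedule (one common choice; the slides do not specify the schedule): keys collect in a RAM buffer, full buffers are flushed as sorted runs, and equal-level runs are merged so that the number of runs grows only logarithmically. The class name `MergingStore` and its methods are hypothetical.

```python
import bisect
import heapq
from typing import Optional


class MergingStore:
    """RAM buffer plus sorted runs in the store; run size doubles per level."""

    def __init__(self, buffer_size: int) -> None:
        self.buffer_size = buffer_size
        self.buffer: list[int] = []
        self.levels: list[Optional[list[int]]] = []  # None marks an empty level

    def insert(self, key: int) -> None:
        self.buffer.append(key)
        if len(self.buffer) >= self.buffer_size:
            self._flush()

    def _flush(self) -> None:
        run = sorted(self.buffer)
        self.buffer = []
        level = 0
        # Like carrying in a binary counter: merge until an empty level is found.
        while level < len(self.levels) and self.levels[level] is not None:
            run = list(heapq.merge(self.levels[level], run))  # sequential I/O only
            self.levels[level] = None
            level += 1
        if level == len(self.levels):
            self.levels.append(None)
        self.levels[level] = run

    def contains(self, key: int) -> bool:
        if key in self.buffer:
            return True
        for run in self.levels:
            if run is not None:
                i = bisect.bisect_left(run, key)
                if i < len(run) and run[i] == key:
                    return True
        return False


store = MergingStore(buffer_size=2)
for key in (7, 2, 4, 8, 1, 3):
    store.insert(key)
```

Every write to the store is a sequential merge, never an in-place update, which is the property that makes the approach Flash-friendly. The price is that a lookup may have to probe every level.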

  14. Problem: Elements not Bits
      • This method is used with sorted lists of elements, not Bloom filters
      • We need a data structure that
        – Supports insert + lookup
        – Is as space efficient as a Bloom filter
        – Can be merged on Flash like a sorted list of elements
        – Bonus: supports always-working deletes
        – Bonus: is faster than BFs

  15. Our Proposal: Quotient Filters
      • Supports insert + lookup
      • Compact like a Bloom filter
      • Two QFs can be merged into a larger QF
      • Supports always-working deletes
      • Faster
      • We can use this alternative to replace the sorted lists of elements in a write-optimized method

  16. A Quotient Filter
      [Figure: an r-bit array (r=3) holds 101 000 110 000 at addresses 00 01 10 11, with 2 or 3 metadata (MD) bits per element. Each hash is split as address:identity: h(A)=00:101 and h(B)=10:110, so Q[10]=110.]
      • Fingerprints + quotienting to save space
      • Fingerprint: p-bit hash (p=5)
      • Compact: only stores r+MD bits per element
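
Quotienting amounts to splitting each p-bit fingerprint into a high part that picks the slot and a low r-bit part that is stored there. The sketch below replays the slide's two fingerprints; the helper name `split_fingerprint` is hypothetical, and collisions and metadata bits are deliberately left out.

```python
def split_fingerprint(fingerprint: int, r: int) -> tuple[int, int]:
    """Split a fingerprint into (address, identity); identity is the low r bits."""
    return fingerprint >> r, fingerprint & ((1 << r) - 1)


# The slide's example: p = 5 bits and r = 3, so the address has 2 bits.
slots = [0] * 4                     # 2^2 slots, each holding an r-bit identity
for fp in (0b00101, 0b10110):       # h(A) = 00:101, h(B) = 10:110
    address, identity = split_fingerprint(fp, r=3)
    slots[address] = identity       # no collision handling in this sketch
```

Because the address is implied by the slot's position, only the r identity bits (plus metadata) need to be stored per element.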

  17. A Quotient Filter
      [Figure: the r-bit array (r=3) holds 101 000 110 111 at addresses 00 01 10 11, with h(A)=00:101, h(B)=10:110, h(C)=01:010, h(D)=10:111, and h(E)=10:110. Soft collision: D and B share address 10, so D is pushed to the side and a few MD bits remember the shift. False positive: E was never inserted, but its fingerprint equals B's.]
      • False positive: fingerprint collision
      • $p_{FP}(x) \le \alpha / 2^r$, size $= \alpha^{-1}(r + MD)\,2^q$, or ~1.2x a BF for a ~0.1% FP rate
      • Quotient Filters also remain small by allowing false positives

  18. But Will it Merge?
      [Figure: the r-bit array (r=3) 101 000 110 111 at addresses 00 01 10 11, holding A, B, and D, read as the integers 00:101 = 5, 10:110 = 22, and 10:111 = 23.]
      • A QF is actually a compact sorted list of integers

  19. Merge as Integers, Then Insert
      [Figure: two input QFs with r=3 and 2-bit addresses, holding 00:101 = 5, 10:110 = 22, and 10:111 = 23 for A, B, and C, are merged into one QF with r=2 and 3-bit addresses, holding 001:01 = 5, 101:10 = 22, and 101:11 = 23.]
      • QFs support plug-and-play with write-optimized data structures
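
The merge can be sketched by treating each QF as the sorted list of whole fingerprints it encodes, merging the two lists, and re-splitting every fingerprint with a wider address. This is a simplification: a real QF keeps the identities in slots with metadata bits. The function name `merge_quotient_filters` and the assignment of fingerprints to the two input filters are assumptions made for the example.

```python
import heapq


def merge_quotient_filters(
    a: list[int], b: list[int], p: int, new_q: int
) -> list[tuple[int, int]]:
    """Merge two sorted fingerprint lists and re-split each p-bit fingerprint."""
    new_r = p - new_q
    merged = heapq.merge(a, b)  # one linear, sequential pass over both inputs
    return [(fp >> new_r, fp & ((1 << new_r) - 1)) for fp in merged]


# Slide example with p = 5: inputs use 2-bit addresses (r = 3),
# the output uses 3-bit addresses (r = 2).
qf1 = [0b00101, 0b10111]   # 00:101 = 5 and 10:111 = 23
qf2 = [0b10110]            # 10:110 = 22
merged = merge_quotient_filters(qf1, qf2, p=5, new_q=3)
```

The full p-bit fingerprint is recoverable from address plus identity, so the merged filter holds the same fingerprints as its inputs; only the split point moves.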

  20. Cascade Filter
      [Figure: a QF in RAM above at most ⌈log N⌉ QFs in the store.]
      • Just substitute Quotient Filters for the sorted lists of elements
      • Now we have fast insertions and a compact representation in Flash

  21. Experimental Setup
      • Everything was the same (e.g., cache size)
      • Inserted 8.4 billion hashes
      • Randomly queried them

  22. Insertion Throughput
      [Plot: number of fingerprints inserted (0 to 9E+09) versus seconds (0 to 14000). Annotations: peak append throughput 8.4MB/s; large merges; throughput 40x higher than BBF and 3000x higher than BF.]

  23. Lookup Throughput
      [Bar chart, log scale from 1 to 10000 lookups/sec: CF 530 lookups/sec; Traditional BF 1600 lookups/sec; Elevator BF 1600 lookups/sec. The CF is at 1/3x.]

  24. Conclusions
      • Quotient Filters outperform BFs in RAM
        – 3x faster inserts, same lookups
        – Support deletes
        – Can be dynamically resized
      • Cascade Filters outperform BFs in Flash
        – All advantages of Quotient Filters (e.g., deletes)
        – 40x faster inserts, 1/3x lookups
        – CPU bound

  25. Future Work
      • Tweak the CF to handle buffering as well
      • Measure real index workloads
      • Can a CF help a write-optimized DB?
      • There are a lot of exciting avenues to explore

  26. And That is How…
      • …you Don’t Thrash, when you Cache Your Hash in Flash
      • Thank you for listening. Questions?
        – Pablo Montes: pmontes@cs.stonybrook.edu
        – Rick Spillane: rick@fsl.cs.sunysb.edu

  27. Insertion Throughput
      [Bar chart, log scale from 1 to 1000000 inserts/sec, with bars for CF, Traditional BF, and Elevator BF. The CF reaches 670,000 inserts/sec; the two BF variants are labeled 17,000 inserts/sec (40x lower) and 200 inserts/sec (3000x lower).]

  28. Experimental Setup
      • Controls:
        – ~Equal data structure cache size, BF given the benefit of the doubt
        – Equal RAM in all runs/tests
        – BF tests run in steady state for 4+ hours
        – CF tests run for 8.4 billion insertions (~16GB CF)
        – Flash partition: 60% of the Intel X25-M (second generation), 90GB
      • Machine:
        – Quad-core 2.4GHz Xeon E5530 with 8MB cache
        – 24GB of RAM (booted with 0.994GB)
        – 159.4GB Intel X25-M SSD (second generation)

  29. Future Work
      • Measure CF effectiveness for read-optimized workloads
      • Measure real index workloads
      • Can a CF help a write-optimized DB?
      • Better CPU/GPU optimization
      • There are a lot of exciting avenues to explore
