Don’t Thrash: How to Cache Your Hash in Flash
M.A. Bender, M. Farach-Colton, R. Johnson, B.C. Kuszmaul, D. Medjedovic, P. Montes, P. Shetty, R. P. Spillane, E. Zadok
Stony Brook U., Rutgers U., MIT, TokuTek

Bloom Filter
[Figure: a bit-array (0 1 0 1 1) held in cache (e.g., RAM) and the elements A and B, which are stored in the Bloom filter, held in the store. Each element is hashed to k positions in the bit-array; here k=2.]
• A Bloom filter is a bit-array + k hash functions
• Storing only a few bits per element lets the BF stay in RAM, even when the elements themselves are too large to fit
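
The definition above (a bit-array plus k hash functions) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the class name `BloomFilter`, the method names, and the use of salted SHA-256 as the k hash functions are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: an m-bit array plus k hash functions."""

    def __init__(self, m, k):
        self.m = m
        self.k = k
        self.bits = bytearray(m)  # one byte per bit, for readability only

    def _positions(self, item):
        # Derive k positions by salting SHA-256 with the hash index.
        # (Illustrative choice; the slides do not name the hash functions.)
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def insert(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def may_contain(self, item):
        # False means "definitely absent"; True means "probably present".
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter(m=64, k=2)
bf.insert("A")
bf.insert("B")
print(bf.may_contain("A"), bf.may_contain("B"))
```

Inserted elements are always reported as present. An element that was never inserted is usually rejected, but it can be reported as present if all k of its positions happen to be set already.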

Bloom Filter Lookups & False Positives
[Figure: lookups of C and D against the bit-array B = 0 1 0 1 1, which holds A and B. D is rejected because B[h1(D)] = 0, even though B[h2(D)] = 1. C is a false positive (C was never inserted).]
• False positives unlikely: $p_{FP}(x) \approx (1 - e^{-kn/m})^k$ for n elements stored in m bits with k hash functions
• No false negatives (no means no)
• Allowing false positives is what keeps the BF small
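
To get a feel for the false-positive estimate on this slide, here is a small worked evaluation of the standard approximation $(1 - e^{-kn/m})^k$. The function names and the sample sizes (1,000 elements in 10,000 bits) are assumptions chosen only for illustration.

```python
import math


def bloom_fp_rate(n, m, k):
    """Approximate false-positive rate for n elements, m bits, k hashes."""
    return (1.0 - math.exp(-k * n / m)) ** k


def optimal_k(n, m):
    """Number of hash functions that minimizes the rate: (m / n) * ln 2."""
    return max(1, round(m / n * math.log(2)))


n, m = 1_000, 10_000          # 10 bits per element (assumed sample values)
k = optimal_k(n, m)           # 7
fp = bloom_fp_rate(n, m, k)   # a little under 1%
print(k, round(fp, 4))
```

At 10 bits per element the estimate comes out just under 1%, which is why a Bloom filter can stay so much smaller than the elements it summarizes.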

Flash
[Figure: the bit-array B = 0 1 0 1 1 moved from RAM to Flash, with the elements A and B in the store.]
• Bigger & cheaper than RAM, faster than disk
• 8TB of 512B keys needs 16GB of RAM for a ~1% BF
• Flash is a good place to cheaply store large BFs
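
The sizing claim above can be checked with a little arithmetic. The sketch below assumes binary units (TiB, GiB) and an optimally chosen k; neither is stated on the slide. Under those assumptions, 16GB works out to 8 bits per key, which the standard formula puts at roughly a 2% false-positive rate, the same order as the slide's "~1%".

```python
import math

TIB = 2**40  # assumed: binary units
GIB = 2**30

data_bytes = 8 * TIB
key_bytes = 512
num_keys = data_bytes // key_bytes        # 2**34 keys

filter_bits = 16 * GIB * 8
bits_per_key = filter_bits / num_keys     # 8.0 bits per key

k = round(bits_per_key * math.log(2))     # 6 hash functions
fp_rate = (1 - math.exp(-k / bits_per_key)) ** k

print(num_keys, bits_per_key, k, round(fp_rate, 4))
```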

Thrashing
[Figure: inserting C sets bits at random positions of B in Flash, causing random writes.]
• Setting random bits to 1 causes random writes
• OK in RAM, not in Flash

Summary of Our Results
• Cascade Filter (CF), a BF replacement optimized for fast inserts on Flash
• Our performance
  – We do 670,000 inserts/sec (40x that of other variants)
  – We do 530 lookups/sec (1/3x that of other variants)
• We use Quotient Filters (QF) instead of Bloom Filters
  – They have better access locality
  – You can efficiently merge two QFs into a larger QF (w/ same FP rate)
• We use merging techniques to compose multiple QFs into a CF

Thrashing is the Problem
[Figure: inserting C causes k random writes to Flash.]
• Every insert writes to k Flash pages
• Writing to a Flash page is expensive
• We can’t do fast insertions without working around this issue

Shaving off K
[Figure: inserting C now causes just one random write to Flash, not k.]
• Now you only write one block for each insert instead of k blocks
• Two-step hash [Canim et al., 2010]
• This helps a little
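
One way to picture the two-step hash: a first hash picks a single Flash page, and the k bit positions are then chosen inside that page. The sketch below is an illustration of that idea under assumed parameters (a 4 KiB page, salted SHA-256 hashes, hypothetical function names). It is not the implementation from Canim et al.

```python
import hashlib

PAGE_BITS = 4096 * 8  # assumed Flash page size: 4 KiB


def _h(item, salt):
    digest = hashlib.sha256(f"{salt}:{item}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def plain_positions(item, m, k):
    """Classic Bloom filter: k positions anywhere in the m-bit array."""
    return [_h(item, i) % m for i in range(k)]


def two_step_positions(item, m, k):
    """Step 1 picks one page; step 2 picks k positions inside that page."""
    num_pages = m // PAGE_BITS
    base = (_h(item, "page") % num_pages) * PAGE_BITS
    return [base + _h(item, i) % PAGE_BITS for i in range(k)]


def pages_touched(positions):
    return {pos // PAGE_BITS for pos in positions}


m = 1024 * PAGE_BITS
print(len(pages_touched(plain_positions("C", m, 4))))     # usually several pages
print(len(pages_touched(two_step_positions("C", m, 4))))  # always one page
```

Each insert still costs one random page write, which is why the slide says this only helps a little.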

Queue Writes
[Figure: the bit-flips for A, B, and D are queued in RAM and then applied together to B in Flash: "We write 5 bits with only 2 flash writes."]
• This helps a lot [Canim et al., 2010]
• Buffering gives bit-flips a chance to piggy-back
• This is how others have cached hashes in Flash
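
The piggy-backing effect can be sketched by grouping queued bit positions by page before touching Flash. Everything here is assumed for illustration: the tiny 8-bit page, the dictionary standing in for Flash, and the function name `flush_queue`.

```python
from collections import defaultdict

PAGE_BITS = 8  # assumed, unrealistically small page so the example is readable


def flush_queue(queue, flash):
    """Apply queued bit positions to `flash`, one write per dirty page.

    `flash` maps page number -> list of PAGE_BITS bits (hypothetical layout).
    Returns the number of page writes performed.
    """
    by_page = defaultdict(set)
    for pos in queue:
        by_page[pos // PAGE_BITS].add(pos % PAGE_BITS)

    for page, offsets in by_page.items():
        bits = flash.setdefault(page, [0] * PAGE_BITS)  # read (or create) the page
        for off in offsets:
            bits[off] = 1
        flash[page] = bits  # write the whole page back once
    return len(by_page)


flash = {}
queue = [1, 3, 4, 9, 12]            # five queued bit-flips
writes = flush_queue(queue, flash)  # positions 1, 3, 4 share page 0; 9, 12 share page 1
print(writes)
```

Five bit-flips land with two page writes here because they cluster on two pages. With a small queue and a large filter, almost every queued bit falls on its own page and the saving disappears.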

We Need Help
[Figure: RAM and Flash.]
• Buffering works when the queue is large
• Small queues insert ~1 element per flash write
• We’re interested in large datasets, and fast insertions (i.e., when buffering doesn’t work)

An Important Problem
• Many companies optimize their DBs for large data-sets and fast inserts
  – Baidu Hypertable
  – Facebook Cassandra
  – Google BigTable
  – TokuTek TokuDB
  – Yahoo! HBase
  – … and more!
• Scaling the trusty Bloom Filter to Flash would be a powerful tool for tackling these problems

Several data structures avoid random writes (RWs)
• A list of the most common methods
  – Buffered Repository Trees
  – Cassandra
  – Cache Oblivious Look-ahead Arrays
  – Log-structured Merge Trees
  – …and more
• We can try to adapt the general method many of these structures use

The General Method
[Figure: a RAM buffer and previously flushed, sorted buffers in the store. A lookup of 8 probes the buffers in turn (No, No, Found). Buffers are merged to keep the total number of buffers low, at most $\lceil \log N \rceil$.]
• Supports deletes
• Composed of many sorted lists
• We can use this technique to avoid random writes
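
The general method can be sketched as a RAM buffer that is flushed as a sorted run and merged with existing runs of the same level, much like carrying in a binary counter. This is a simplified illustration (in-memory lists stand in for Flash, and the class name `MergingStore` is hypothetical), not any specific system from the previous slide.

```python
import bisect
from heapq import merge


class MergingStore:
    """Toy write-optimized set: RAM buffer plus sorted runs that double in size."""

    def __init__(self, buffer_capacity=4):
        self.buffer_capacity = buffer_capacity
        self.buffer = []   # unsorted, in RAM
        self.levels = []   # levels[i] is a sorted run or None (stand-in for Flash)

    def insert(self, key):
        self.buffer.append(key)
        if len(self.buffer) >= self.buffer_capacity:
            self._flush()

    def _flush(self):
        run = sorted(self.buffer)
        self.buffer = []
        level = 0
        while True:
            if level == len(self.levels):
                self.levels.append(None)
            if self.levels[level] is None:
                self.levels[level] = run   # sequential write of one sorted run
                return
            # Level occupied: merge the two runs and carry to the next level.
            run = list(merge(self.levels[level], run))
            self.levels[level] = None
            level += 1

    def contains(self, key):
        if key in self.buffer:
            return True
        for run in self.levels:
            if run:
                i = bisect.bisect_left(run, key)
                if i < len(run) and run[i] == key:
                    return True
        return False


store = MergingStore(buffer_capacity=4)
for key in [(i * 7) % 20 for i in range(20)]:  # the keys 0..19 in scrambled order
    store.insert(key)
print([len(run) if run else 0 for run in store.levels])
```

Every write to a level is one sequential pass over a sorted run, and the number of non-empty runs stays logarithmic in the number of flushed buffers. The price is that a lookup may have to probe every run.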

Problem: Elements not Bits
• This method is used with sorted lists of elements, not Bloom filters
• We need a data structure that
  – Supports insert + lookup
  – Is as space efficient as a Bloom filter
  – Can be merged on Flash like a sorted list of elements
  – Bonus: supports always-working deletes
  – Bonus: faster than BFs

Our Proposal: Quotient Filters
• Supports insert + lookup
• Compact like a Bloom filter
• Two QFs can be merged into a larger QF
• Supports always-working deletes
• Faster
• We can use this alternative to replace the sorted lists of elements in a write-optimized method

A Quotient Filter
[Figure: an r-bit array (r=3) with slots 00, 01, 10, 11 holding 101, 000, 110, 000, plus 2 or 3 MD (metadata) bits per element. Hashes are written address:identity: h(A) = 00:101 and h(B) = 10:110, so Q[10] = 110.]
• Fingerprints + quotienting to save space
• Fingerprint: p-bit hash (p=5)
• Compact, only stores r+MD bits per element
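
Quotienting can be sketched as splitting a p-bit fingerprint into a q-bit address and an r-bit identity, then storing only the identity at that address. The toy table below reproduces the slide's example (p=5, q=2, r=3). The function names are assumptions, and the sketch deliberately ignores collisions and metadata bits.

```python
def split_fingerprint(fp, q, r):
    """Split a (q + r)-bit fingerprint into (quotient, remainder)."""
    assert 0 <= fp < 1 << (q + r)
    return fp >> r, fp & ((1 << r) - 1)


def join_fingerprint(quotient, remainder, r):
    """Inverse of split_fingerprint: rebuild the full fingerprint."""
    return (quotient << r) | remainder


q, r = 2, 3
slots = [None] * (1 << q)       # toy table: no collision handling, no MD bits

h_A = 0b00101                   # h(A) = 00:101 on the slide
h_B = 0b10110                   # h(B) = 10:110 on the slide
for fp in (h_A, h_B):
    quotient, remainder = split_fingerprint(fp, q, r)
    slots[quotient] = remainder

print(slots)                    # slot 0b10 holds 0b110, matching Q[10] = 110
```

Because the address is implied by the slot position, each element costs only r stored bits plus metadata, yet the full p-bit fingerprint can still be recovered with `join_fingerprint`.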

A Quotient Filter
[Figure: an r-bit array (r=3) with slots 00, 01, 10, 11 holding 101, 000, 110, 111. Stored: h(A) = 00:101, h(B) = 10:110, h(D) = 10:111. Lookups: A, C with h(C) = 01:010, and E with h(E) = 10:110. E is a false positive (E was never inserted). D is a soft collision with B: push D to the side and use a few MD bits to remember.]
• False positive: fingerprint collision
• $p_{FP}(x) \le \alpha / 2^r$, size $= 2^q (r + MD)$ bits, i.e. $\alpha^{-1}(r + MD)$ bits per element at load factor $\alpha$, or ~1.2x the size of a BF at a ~0.1% FP rate
• Quotient Filters also remain small by allowing false positives

But Will it Merge?
[Figure: the r-bit array (r=3) with slots 00, 01, 10, 11 holding 101, 000, 110, 111 for A, B, and D, read out as 00:101 = 5, 10:110 = 22, 10:111 = 23.]
• A QF is actually a compact sorted list of integers

Merge as Integers, Then Insert
[Figure: two QFs with r=3 (slots 00, 01, 10, 11 holding 101 000 111 000 and 000 000 110 000) are read out as the integers 00:101 = 5, 10:110 = 22, 10:111 = 23, merged, and inserted into one larger QF with r=2 as 001:01 = 5, 101:10 = 22, 101:11 = 23.]
• QFs are plug-and-play with write-optimized data structures
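
The merge on this slide can be sketched by rebuilding each fingerprint as an integer, merging the sorted integer lists, and re-splitting with a longer address and a shorter identity. The representation of a QF as a list of (quotient, remainder) pairs and the function names are simplifying assumptions for illustration.

```python
from heapq import merge


def fingerprints(entries, r):
    """Rebuild sorted p-bit integers from (quotient, remainder) entries."""
    return sorted((quotient << r) | remainder for quotient, remainder in entries)


def merge_into_larger(qf_a, qf_b, r, new_r):
    """Merge two QFs with r-bit remainders into one with new_r-bit remainders."""
    merged = merge(fingerprints(qf_a, r), fingerprints(qf_b, r))
    mask = (1 << new_r) - 1
    return [(fp >> new_r, fp & mask) for fp in merged]


# The slide's example: two QFs with q=2, r=3.
qf_a = [(0b00, 0b101), (0b10, 0b111)]   # fingerprints 5 and 23
qf_b = [(0b10, 0b110)]                  # fingerprint 22

larger = merge_into_larger(qf_a, qf_b, r=3, new_r=2)   # now q=3, r=2
print(larger)
```

The p-bit fingerprints themselves never change, only where the address/identity boundary falls, which is why the merged filter keeps the same false-positive rate.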

Cascade Filter
[Figure: one QF in RAM and at most $\lceil \log N \rceil$ QFs in the store.]
• Just substitute Quotient Filters for the sorted lists of elements
• Now we have fast insertions and a compact representation in Flash

Experimental Setup
• All structures were given the same resources (e.g., cache size)
• Inserted 8.4 billion hashes
• Randomly queried them

Insertion Throughput
[Figure: number of fingerprints inserted (0 to 9E+09) versus time in seconds (0 to 14,000). Annotations: peak append throughput 8.4MB/s; large merges; throughput much higher: 40x higher than BBF, 3000x higher than BF.]

Lookup Throughput
[Figure: bar chart of lookup throughput on a log scale (1 to 10,000). CF: 530 lookups/sec; Traditional BF: 1600 lookups/sec; Elevator BF: 1600 lookups/sec. The CF is at 1/3x.]

Conclusions
• Quotient Filters outperform BFs in RAM
  – 3x faster inserts, same lookups
  – Support deletes
  – Can be dynamically resized
• Cascade Filters outperform BFs in Flash
  – All advantages of Quotient Filters (e.g., deletes)
  – 40x faster inserts, 1/3x lookups
  – CPU bound

Future Work
• Tweak the CF to handle buffering as well
• Measure real index workloads
• Can a CF help a write-optimized DB?
• There are a lot of exciting boulevards to explore

And That is How…
• …you Don’t Thrash, when you Cache Your Hash in Flash
• Thank you for listening. Questions?
  – Pablo Montes: pmontes@cs.stonybrook.edu
  – Rick Spillane: rick@fsl.cs.sunysb.edu

Insertion Throughput
[Figure: bar chart of insertion throughput on a log scale (1 to 1,000,000) for CF, Traditional BF, and Elevator BF. Bar labels: 670,000 ins/sec, 17,000 ins/sec, and 200 ins/sec. The 670,000 ins/sec bar is 40x the 17,000 ins/sec bar and 3000x the 200 ins/sec bar.]

Experimental Setup
• Controls:
  – ~Equal DS cache size, BF given benefit of doubt
  – Equal RAM in all runs/tests
  – BF tests run in steady-state for 4+ hours
  – CF tests run for 8.4 billion insertions (~16GB CF)
  – Flash partition 60% of Intel X25-M v2, 90GB
• Machine:
  – Quad-core 2.4GHz Xeon E5530 with 8MB cache
  – 24GB of RAM (booted with 0.994GB)
  – 159.4GB Intel X25-M SSD (second generation)

Future Work
• Measure CF effectiveness for read-optimized
• Measure real index workloads
• Can a CF help a write-optimized DB?
• Better CPU/GPU optimization
• There are a lot of exciting boulevards to explore
