Sampling and Reconstruction Using Bloom Filters

  1. Sampling and Reconstruction Using Bloom Filters
     Neha Sengupta (1), Amitabha Bagchi (1), Srikanta Bedathur (2), Maya Ramanath (1)
     (1) IIT Delhi, (2) IBM India Research Lab

  2. Bloom Filters
     ● Compact storage
     ● m bits, k hash functions
     ● Set membership query
     ● Union
     ● Intersection
     ● Sampling
     ● Reconstruction
     [Figure: S = { A, B, C } stored in the bit array 0 1 1 1 0 1 0 1 0 1 1 0, with an example membership query for W]
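
As a concrete reference for the operations listed on this slide, here is a minimal Bloom filter sketch in Python. The hash family (MD5 over a per-function salt string) and the class and method names are illustrative assumptions, not the authors' code. Union and intersection are bitwise OR and AND, which only make sense for filters that share m and the same k hash functions.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter with m bits and k hash functions."""

    def __init__(self, m: int, k: int) -> None:
        self.m, self.k = m, k
        self.bits = [0] * m

    def _positions(self, x) -> list[int]:
        # Hypothetical hash family: the i-th hash is MD5("i:x") reduced mod m.
        return [
            int(hashlib.md5(f"{i}:{x}".encode()).hexdigest(), 16) % self.m
            for i in range(self.k)
        ]

    def add(self, x) -> None:
        for p in self._positions(x):
            self.bits[p] = 1

    def __contains__(self, x) -> bool:
        # Always True for inserted items; may also be True for others (false positive).
        return all(self.bits[p] for p in self._positions(x))

    def _combine(self, other: "BloomFilter", op) -> "BloomFilter":
        if (self.m, self.k) != (other.m, other.k):
            raise ValueError("filters must share m and k (and hash functions)")
        out = BloomFilter(self.m, self.k)
        out.bits = [op(a, b) for a, b in zip(self.bits, other.bits)]
        return out

    def union(self, other: "BloomFilter") -> "BloomFilter":
        return self._combine(other, lambda a, b: a | b)  # bitwise OR

    def intersection(self, other: "BloomFilter") -> "BloomFilter":
        return self._combine(other, lambda a, b: a & b)  # bitwise AND

    def __str__(self) -> str:
        return "".join(map(str, self.bits))


b = BloomFilter(m=12, k=2)
for item in ("A", "B", "C"):
    b.add(item)
print(b, "A" in b, "W" in b)  # "W" in b is False unless W is a false positive
```

The OR of two filters is exactly the filter of the union of their sets. The AND, however, can have more bits set than the filter of the intersection, which is why an intersection of Bloom filters is rarely empty in practice.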

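  3. Sampling from a Bloom filter: two approaches
     ● Dictionary Attack: test every element of the namespace for membership in the Bloom filter, then sample from the elements that pass. Slow.
     ● Hash Sample: invert the hash function at each set bit of the Bloom filter to obtain a candidate set, check the candidates for membership, then sample. Requires an invertible hash function and gives a non-uniform sample.
     [Figure: both pipelines applied to the Bloom filter 0100010001011110]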
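
A minimal sketch of the Dictionary Attack approach, assuming a hypothetical MD5-based hash family and a namespace [0, M). It spends one membership query per namespace element (M queries per sample), which is why it is slow. In exchange, it picks uniformly among all elements that pass membership, including false positives. The function names are illustrative only.

```python
import hashlib
import random


def positions(x: int, m: int, k: int) -> list[int]:
    # Hypothetical hash family: MD5("i:x") reduced mod m, for i = 0..k-1.
    return [int(hashlib.md5(f"{i}:{x}".encode()).hexdigest(), 16) % m for i in range(k)]


def make_filter(items, m: int, k: int) -> list[int]:
    bits = [0] * m
    for x in items:
        for p in positions(x, m, k):
            bits[p] = 1
    return bits


def member(bits: list[int], x: int, k: int) -> bool:
    return all(bits[p] for p in positions(x, len(bits), k))


def dictionary_attack_sample(bits: list[int], k: int, M: int, rng: random.Random):
    """Query every x in [0, M) for membership, then pick one positive uniformly.

    Costs M membership queries per call. The result is uniform over all
    positives, which include false positives as well as true members.
    """
    positives = [x for x in range(M) if member(bits, x, k)]
    return rng.choice(positives) if positives else None


b1 = make_filter({1, 12}, m=10, k=2)
print(dictionary_attack_sample(b1, k=2, M=16, rng=random.Random(0)))
```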

  4. BloomSampleTree: sampling from Bloom filters
     Setting:
     ● Set S contains elements in the range [0, 15], so M = 16
     ● S is stored in a Bloom filter b with m = 10 bits
     ● Bloom filters use k = 2 hash functions
     Bloom filters b1, b2, b3 store sets S1, S2, S3 respectively:
     ● b1 = 0100111000 for S1 = {1, 12}
     ● b2 = 0111000000 for S2 = {4, 6}
     ● b3 = 1000010100 for S3 = {3, 7}
     BloomSampleTree bT, created using (M = 16, m = 10, k = 2); each node holds a Bloom filter of all values in its range:
     ● (0..15): root
     ● (0..7): 1111111110; (8..15): 1111111010
     ● (0..3): 1100110010; (4..7): 0111011100; (8..11): 1101100010; (12..15): 0110011000
     The same tree can be used to sample from all 3 sets.
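
A sketch of building a tree with the same shape as bT. It assumes that each node holds a Bloom filter of every value in its range and that an internal node's filter is the bitwise OR of its children's filters. The MD5 hash family and the names Node, build_tree and leaf_size are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field


def positions(x: int, m: int, k: int) -> list[int]:
    # Hypothetical hash family: MD5("i:x") reduced mod m, for i = 0..k-1.
    return [int(hashlib.md5(f"{i}:{x}".encode()).hexdigest(), 16) % m for i in range(k)]


@dataclass
class Node:
    lo: int  # node covers [lo, hi), written (lo..hi-1) on the slides
    hi: int
    bits: list[int]  # Bloom filter of every value in [lo, hi)
    children: list["Node"] = field(default_factory=list)


def build_tree(lo: int, hi: int, m: int, k: int, leaf_size: int) -> Node:
    if hi - lo <= leaf_size:
        bits = [0] * m
        for x in range(lo, hi):
            for p in positions(x, m, k):
                bits[p] = 1
        return Node(lo, hi, bits)
    mid = (lo + hi) // 2
    left = build_tree(lo, mid, m, k, leaf_size)
    right = build_tree(mid, hi, m, k, leaf_size)
    # With shared hash functions, the filter of a union is the OR of the filters.
    bits = [a | b for a, b in zip(left.bits, right.bits)]
    return Node(lo, hi, bits, [left, right])


# Same shape as the example: M = 16, m = 10, k = 2, leaves of four values.
bT = build_tree(0, 16, m=10, k=2, leaf_size=4)
for child in bT.children:
    print((child.lo, child.hi - 1), "".join(map(str, child.bits)))
```

Because the hash functions differ from the authors', the printed bit patterns will not match the slide's.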

  5. BloomSampleTree example: sampling from Bloom filter b1 using BloomSampleTree bT
     ● Original set S1 = {1, 12}; b1 = 0100111000
     ● Start at the root (0..15) and intersect b1 with the filters of both children:
       (0..7): 0100111000 AND 1111111110 = 0100111000
       (8..15): 0100111000 AND 1111111010 = 0100111000

  6. BloomSampleTree example: sampling from Bloom filter b1 using BloomSampleTree bT
     ● Original set S1 = {1, 12}; b1 = 0100111000
     ● Start at the root (0..15) and intersect b1 with the filters of both children:
       (0..7): 0100111000 AND 1111111110 = 0100111000
       (8..15): 0100111000 AND 1111111010 = 0100111000
     ● Both intersections are identical, so each subtree is chosen with equal probability: p_L = 0.5, p_R = 0.5

  7. BloomSampleTree example: sampling from Bloom filter b1 using BloomSampleTree bT
     ● Original set S1 = {1, 12}; b1 = 0100111000
     ● Chosen subtree: (0..7)
     ● Intersect b1 with the filters of its children:
       (0..3): 0100111000 AND 1100110010 = 0100110000
       (4..7): 0100111000 AND 0111011100 = 0100011000
     ● p_L = 0.52, p_R = 0.48
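
To make the child-selection step concrete, here is a sketch that weights each child by an estimate of how many elements the intersection (query AND child filter) holds. The estimator n ≈ -(m/k) ln(1 - X/m), with X the number of set bits, is a standard Bloom filter cardinality estimate assumed here; the slides do not show the authors' formula. The function names are hypothetical. Applied at the root, this estimator reproduces p_L = p_R = 0.5.

```python
import math


def estimate_size(bits: str, k: int) -> float:
    """Estimate how many items a Bloom filter holds from its count of set bits.

    Assumed estimator: n = -(m / k) * ln(1 - X / m), with X clamped to m - 1
    so that a saturated filter does not produce log(0).
    """
    m = len(bits)
    x = min(bits.count("1"), m - 1)
    return -(m / k) * math.log(1 - x / m)


def and_bits(a: str, b: str) -> str:
    return "".join("1" if p == q == "1" else "0" for p, q in zip(a, b))


def child_probabilities(query: str, left: str, right: str, k: int) -> tuple[float, float]:
    """Weight each child by the estimated size of (query AND child filter)."""
    wl = estimate_size(and_bits(query, left), k)
    wr = estimate_size(and_bits(query, right), k)
    if wl + wr <= 0:
        return 0.0, 0.0  # both intersections empty: prune this node
    return wl / (wl + wr), wr / (wl + wr)


b1 = "0100111000"
# Root of the example: children (0..7) = 1111111110 and (8..15) = 1111111010.
print(child_probabilities(b1, "1111111110", "1111111010", k=2))  # (0.5, 0.5)
```

At (0..7) the two intersections, 0100110000 and 0100011000, both have three set bits, so this estimator splits 0.5/0.5 rather than the 0.52/0.48 shown on the slide. The authors' estimate presumably uses more information than the popcount of the intersection.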

  8. BloomSampleTree example: sampling from Bloom filter b1 using BloomSampleTree bT
     ● Original set S1 = {1, 12}; b1 = 0100111000
     ● Chosen leaf: (4..7)
     ● Membership(4, b1) = false
     ● Membership(5, b1) = false
     ● Membership(6, b1) = false
     ● Membership(7, b1) = false
     ● This path was a false positive

  9. BloomSampleTree example: sampling from Bloom filter b1 using BloomSampleTree bT
     ● Original set S1 = {1, 12}; b1 = 0100111000
     ● Chosen leaf: (0..3)
     ● Membership(0, b1) = false
     ● Membership(1, b1) = true
     ● Membership(2, b1) = false
     ● Membership(3, b1) = false
     ● Sample = 1

  10. BloomSampleTree - Sampling
     [Figure: search over a BloomSampleTree for the namespace (1 ... 10M). Legend: false positive path; empty intersection; potential path; true path; subtree pruned from search; subtree not visited at all]
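
Putting the walk-through together, here is a self-contained sketch of sampling with the tree. It descends by choosing children at random in proportion to their estimated intersection size and prunes children whose intersection with the query is empty. At a leaf it runs a small dictionary attack, and when the leaf turns out to be a false positive path it backtracks to an untried sibling. The hash family, the size estimator, the backtracking order and all names are assumptions for illustration, so the bit patterns differ from the slides'.

```python
import hashlib
import math
import random
from dataclasses import dataclass, field


def positions(x: int, m: int, k: int) -> list[int]:
    # Hypothetical hash family: MD5("i:x") reduced mod m, for i = 0..k-1.
    return [int(hashlib.md5(f"{i}:{x}".encode()).hexdigest(), 16) % m for i in range(k)]


def make_filter(items, m: int, k: int) -> list[int]:
    bits = [0] * m
    for x in items:
        for p in positions(x, m, k):
            bits[p] = 1
    return bits


@dataclass
class Node:
    lo: int  # covers [lo, hi)
    hi: int
    bits: list[int]  # Bloom filter of every value in [lo, hi)
    children: list["Node"] = field(default_factory=list)


def build_tree(lo: int, hi: int, m: int, k: int, leaf_size: int) -> Node:
    if hi - lo <= leaf_size:
        return Node(lo, hi, make_filter(range(lo, hi), m, k))
    mid = (lo + hi) // 2
    kids = [build_tree(lo, mid, m, k, leaf_size), build_tree(mid, hi, m, k, leaf_size)]
    return Node(lo, hi, [a | b for a, b in zip(kids[0].bits, kids[1].bits)], kids)


def weight(query: list[int], node: Node, k: int) -> float:
    # Assumed estimate of |query AND node| (X clamped to m - 1 to avoid log(0)).
    m = len(query)
    x = min(sum(a & b for a, b in zip(query, node.bits)), m - 1)
    return -(m / k) * math.log(1 - x / m)


def sample(query: list[int], node: Node, k: int, rng: random.Random):
    """Return one value in [node.lo, node.hi) that passes membership, or None."""
    if not node.children:
        # Leaf: a dictionary attack restricted to this small range.
        hits = [x for x in range(node.lo, node.hi)
                if all(query[p] for p in positions(x, len(query), k))]
        return rng.choice(hits) if hits else None  # None: false positive path
    weighted = [(weight(query, c, k), c) for c in node.children]
    options = [(w, c) for w, c in weighted if w > 0]  # empty intersection: prune
    while options:
        i = rng.choices(range(len(options)), weights=[w for w, _ in options])[0]
        _, child = options.pop(i)
        found = sample(query, child, k, rng)
        if found is not None:
            return found
        # Otherwise backtrack and try a remaining sibling.
    return None


M, m, k = 16, 10, 2
bT = build_tree(0, M, m, k, leaf_size=4)
b1 = make_filter({1, 12}, m, k)
print(sample(b1, bT, k, random.Random(0)))
```

Backtracking guarantees a result whenever some value passes membership, because a child is pruned only when its intersection with the query is empty. It does not by itself make the sample exactly uniform.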

  11. BloomSampleTree - Sampling: Setting [Figure]

  12. BloomSampleTree - Sampling: Sample Quality [Figure]

  13. BloomSampleTree - Sampling: Running Time [Figure]

  14. BloomSampleTree - Sampling
     Algorithms:
     ● Dictionary Attack (DA)
     ● BloomSampleTree (BST)
     Setting:
     ● M = 10^7
     ● k = 3
     ● m increases with the desired accuracy
     ● Size of set S, |S|, varies from 100 to 50K
     [Plots: Simple Hash Functions; MD5/Murmur Hash Functions]
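
The slides only say that m grows with the desired accuracy. Assuming accuracy is governed by the false-positive rate p, the standard approximation p ≈ (1 - e^(-kn/m))^k can be solved for m with k = 3 fixed, giving m = -kn / ln(1 - p^(1/k)). A sketch of that sizing rule (function names are hypothetical):

```python
import math


def false_positive_rate(n: int, m: int, k: int) -> float:
    """Standard approximation of the false-positive rate: (1 - e^(-kn/m))^k."""
    return (1 - math.exp(-k * n / m)) ** k


def bits_needed(n: int, k: int, target_fpr: float) -> int:
    """Bits m needed so that the approximation above stays at most target_fpr."""
    return math.ceil(-k * n / math.log(1 - target_fpr ** (1 / k)))


for n in (100, 50_000):  # the range of |S| used in the experiments
    for p in (0.1, 0.01, 0.001):
        print(f"|S| = {n:>6}, target FPR = {p:<5}, m = {bits_needed(n, 3, p)}")
```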

  15. BloomSampleTree - Reconstruction
     ● Similar to sampling, but follow all positive paths
     ● Multi-threaded
     ● Challenge:
       − Bloom filter intersections are almost never empty
       − Every path looks like a "true path"
       − Follow a path only if the size of the intersection exceeds a threshold
     ● False positives of set S become part of the reconstructed set S'
     Example: BloomSampleTree bT, S = { 4, 6 } stored in Bloom filter b = 0111000000; reconstructed set S' = { 4, 6, 13 }
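
A sketch of threshold-based reconstruction, reusing the same hypothetical hash family, tree layout and size estimator as the sampling sketch (all assumptions, not the authors' code). The slides' implementation is multi-threaded. This one is sequential, although separate subtrees are independent and could be handed to separate workers. Here the threshold is an absolute cutoff on the estimated intersection size. How the authors choose theirs is not stated on the slides.

```python
import hashlib
import math
from dataclasses import dataclass, field


def positions(x: int, m: int, k: int) -> list[int]:
    # Hypothetical hash family: MD5("i:x") reduced mod m, for i = 0..k-1.
    return [int(hashlib.md5(f"{i}:{x}".encode()).hexdigest(), 16) % m for i in range(k)]


def make_filter(items, m: int, k: int) -> list[int]:
    bits = [0] * m
    for x in items:
        for p in positions(x, m, k):
            bits[p] = 1
    return bits


@dataclass
class Node:
    lo: int  # covers [lo, hi)
    hi: int
    bits: list[int]
    children: list["Node"] = field(default_factory=list)


def build_tree(lo: int, hi: int, m: int, k: int, leaf_size: int) -> Node:
    if hi - lo <= leaf_size:
        return Node(lo, hi, make_filter(range(lo, hi), m, k))
    mid = (lo + hi) // 2
    kids = [build_tree(lo, mid, m, k, leaf_size), build_tree(mid, hi, m, k, leaf_size)]
    return Node(lo, hi, [a | b for a, b in zip(kids[0].bits, kids[1].bits)], kids)


def weight(query: list[int], node: Node, k: int) -> float:
    # Assumed estimate of |query AND node| (X clamped to m - 1 to avoid log(0)).
    m = len(query)
    x = min(sum(a & b for a, b in zip(query, node.bits)), m - 1)
    return -(m / k) * math.log(1 - x / m)


def reconstruct(query: list[int], node: Node, k: int, threshold: float) -> set[int]:
    """Follow every child whose estimated intersection exceeds `threshold`;
    at the leaves, keep each value that passes membership."""
    if not node.children:
        return {x for x in range(node.lo, node.hi)
                if all(query[p] for p in positions(x, len(query), k))}
    found: set[int] = set()
    for child in node.children:
        if weight(query, child, k) > threshold:
            found |= reconstruct(query, child, k, threshold)
    return found


M, m, k = 16, 10, 2
bT = build_tree(0, M, m, k, leaf_size=4)
b = make_filter({4, 6}, m, k)
print(sorted(reconstruct(b, bT, k, threshold=0.0)))  # 4, 6 plus any false positives
```

With threshold 0 this returns exactly what a dictionary attack would. Raising the threshold prunes more subtrees but can also drop true members, which is presumably the trade-off the slide's threshold controls.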

  16. BloomSampleTree - Reconstruction
     Algorithms:
     ● Dictionary Attack (DA): test each element in the namespace for membership
     ● HashInvert (HI): invert each set bit in the Bloom filter and prune using membership
     ● BloomSampleTree (BST)
     Setting:
     ● M = 10^7
     ● k = 3 MD5 hash functions
     ● m increases with the desired precision
     ● Size of set S, |S|, varies from 100 to 50K
     [Plot: Simple Hash Functions]
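
A sketch of the HashInvert idea under an explicit assumption. The slides do not say which invertible hash functions were used, so this swaps in an affine family h_i(x) = (a·x + b) mod m with m prime, chosen only because it is easy to invert. The parameters and names are hypothetical. Every positive x has bit h_0(x) set, so inverting one hash function at each set bit already yields a complete candidate set, which membership then prunes.

```python
# Hypothetical parameters: namespace [0, M) and a filter of m bits.
M, m = 10_000, 101
# Hypothetical invertible hash family h_i(x) = (a * x + b) mod m. Because m is
# prime and every a is nonzero mod m, each h_i is invertible modulo m.
HASHES = [(3, 7), (5, 11), (17, 2)]  # k = 3


def h(i: int, x: int) -> int:
    a, b = HASHES[i]
    return (a * x + b) % m


def make_filter(items) -> list[int]:
    bits = [0] * m
    for x in items:
        for i in range(len(HASHES)):
            bits[h(i, x)] = 1
    return bits


def member(bits: list[int], x: int) -> bool:
    return all(bits[h(i, x)] for i in range(len(HASHES)))


def hash_invert(bits: list[int]) -> set[int]:
    """Invert h_0 at every set bit to get candidates, then prune by membership."""
    a, b = HASHES[0]
    a_inv = pow(a, -1, m)  # modular inverse, Python 3.8+
    out: set[int] = set()
    for j, bit in enumerate(bits):
        if bit:
            x0 = (a_inv * (j - b)) % m  # smallest x >= 0 with h_0(x) == j
            out.update(x for x in range(x0, M, m) if member(bits, x))
    return out


b = make_filter({42, 777, 9000})
print(sorted(hash_invert(b)))
```

Each set bit contributes about M/m candidates, so with few set bits this checks far fewer values than the M membership queries of a dictionary attack while returning the same set.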

  17. Pruned BloomSampleTree
     ● Large range of possible values, but the actually used namespace is much smaller
     ● Do not expand nodes corresponding to "unused" regions
     ● Smaller BloomSampleTree, faster sampling
     ● Add nodes to the BloomSampleTree as the namespace changes
     [Figure: the example tree over (0..15), with (0..7) = 1111111110 and (8..15) = 1111111010, expanded to leaf level only where the namespace is used]
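
A sketch of a pruned tree that grows as the namespace changes. It assumes that a created node still holds the Bloom filter of its full range, as in the unpruned tree, and that nodes are created only on the paths to values registered as used. The class, method names and MD5 hash family are hypothetical. Building the root's filter by hashing every value is a one-off O(M) cost in this sketch.

```python
import hashlib
from dataclasses import dataclass, field


def positions(x: int, m: int, k: int) -> list[int]:
    # Hypothetical hash family: MD5("i:x") reduced mod m, for i = 0..k-1.
    return [int(hashlib.md5(f"{i}:{x}".encode()).hexdigest(), 16) % m for i in range(k)]


def range_filter(lo: int, hi: int, m: int, k: int) -> list[int]:
    # Bloom filter of every value in [lo, hi), as in the full tree.
    bits = [0] * m
    for x in range(lo, hi):
        for p in positions(x, m, k):
            bits[p] = 1
    return bits


@dataclass
class Node:
    lo: int  # covers [lo, hi)
    hi: int
    bits: list[int]
    # 0 = left child, 1 = right child; a missing key means "not expanded".
    children: dict[int, "Node"] = field(default_factory=dict)


class PrunedBloomSampleTree:
    def __init__(self, M: int, m: int, k: int, leaf_size: int) -> None:
        self.m, self.k, self.leaf_size = m, k, leaf_size
        self.root = Node(0, M, range_filter(0, M, m, k))

    def add_used(self, x: int) -> None:
        """Expand the nodes on the path to x, e.g. when x enters the namespace."""
        node = self.root
        while node.hi - node.lo > self.leaf_size:
            mid = (node.lo + node.hi) // 2
            side = int(x >= mid)
            if side not in node.children:
                lo, hi = (node.lo, mid) if side == 0 else (mid, node.hi)
                node.children[side] = Node(lo, hi, range_filter(lo, hi, self.m, self.k))
            node = node.children[side]

    def expanded_leaves(self) -> list[tuple[int, int]]:
        """Inclusive ranges (lo..hi-1) of the leaves that have been created."""
        out, stack = [], [self.root]
        while stack:
            node = stack.pop()
            if node.hi - node.lo <= self.leaf_size:
                out.append((node.lo, node.hi - 1))
            stack.extend(node.children.values())
        return sorted(out)


tree = PrunedBloomSampleTree(M=16, m=10, k=2, leaf_size=4)
for x in (1, 12):  # the only values actually in use
    tree.add_used(x)
print(tree.expanded_leaves())  # [(0, 3), (12, 15)]
```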

  18. Pruned BloomSampleTree
     ● Sampling for a smaller namespace with simple hash functions
     ● Namespace fraction: how large is the actually used section of the range?
     ● How are the actually used parts of the namespace distributed within it?
       − Uniform or clustered
       − Affects BST size and sampling time
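
A small illustration of why the distribution matters, assuming the pruned tree expands exactly the nodes whose range contains a used value. Both inputs below have the same namespace fraction (64 of 1,024 values). Spreading the used values evenly expands every leaf, while packing them together expands only a handful of nodes. pruned_size and the parameters are hypothetical.

```python
import bisect


def pruned_size(used, M: int, leaf_size: int) -> int:
    """Count nodes of a pruned tree over [0, M) that expands only ranges holding used values."""
    used = sorted(used)

    def count(lo: int, hi: int) -> int:
        # Does [lo, hi) contain a used value? (binary search on the sorted list)
        i = bisect.bisect_left(used, lo)
        if i == len(used) or used[i] >= hi:
            return 0
        if hi - lo <= leaf_size:
            return 1
        mid = (lo + hi) // 2
        return 1 + count(lo, mid) + count(mid, hi)

    return count(0, M)


M, leaf = 1024, 16
uniform = range(0, M, 16)  # 64 values spread evenly, one per leaf
clustered = range(64)      # the same number of values packed into [0, 64)
print(pruned_size(uniform, M, leaf), pruned_size(clustered, M, leaf))  # 127 11
```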

  19. Thank you!
