Sampling and Reconstruction Using Bloom Filters
Neha Sengupta 1, Amitabha Bagchi 1, Srikanta Bedathur 2, Maya Ramanath 1
1 IIT Delhi 2 IBM India Research Lab
Bloom Filters
− Compact storage
− m bits, k hash functions
− Set membership
Example: S = { A, B, C }, Query: W
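A minimal sketch of the structure described above: m bits, k hash functions, and a membership test. The class name `BloomFilter` and the salted-MD5 hash functions are assumptions made for illustration; the slides do not say which hash functions produce their example bit strings.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter with m bits and k hash functions."""

    def __init__(self, m, k):
        self.m = m
        self.k = k
        self.bits = 0  # the m-bit array, packed into one Python int

    def _positions(self, item):
        # Assumption: the k hash functions are MD5 salted with the index i.
        for i in range(self.k):
            digest = hashlib.md5(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest, "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        # True can be a false positive; False is always correct.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


bf = BloomFilter(m=16, k=2)
for x in ("A", "B", "C"):
    bf.add(x)

print("A" in bf)  # stored elements always test positive
print("W" in bf)  # usually False, but may be a false positive
```

The filter stores only bits, never the elements themselves, which is why sampling and reconstruction need extra machinery.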
Dictionary Attack
− Namespace → Membership (Bloom filter 0100010001011110) → Candidate Set → Sample
− Slow
Invert Set Bit
− Bloom filter 0100010001011110 → Invert Set Bit → Candidate Set → Sample
− Needs an invertible hash function
− Non-uniform sample
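The dictionary attack can be sketched as follows: test every element of the namespace for membership, collect the candidate set, then pick one candidate uniformly. The helper names and the salted-MD5 hash functions are assumptions for illustration, not the authors' code.

```python
import hashlib
import random


def positions(x, m, k):
    """Bit positions of x under k salted-MD5 hash functions (an assumption)."""
    return [int.from_bytes(hashlib.md5(f"{i}:{x}".encode()).digest(), "big") % m
            for i in range(k)]


def make_filter(items, m, k):
    """Build an m-bit Bloom filter (as a Python int) from items."""
    bits = 0
    for x in items:
        for p in positions(x, m, k):
            bits |= 1 << p
    return bits


def is_member(bits, x, m, k):
    return all((bits >> p) & 1 for p in positions(x, m, k))


def dictionary_attack_sample(bits, M, m, k, rng=random):
    # One membership query per namespace element: O(M) work per sample.
    candidates = [x for x in range(M) if is_member(bits, x, m, k)]
    return rng.choice(candidates) if candidates else None


b = make_filter([1, 12], m=10, k=2)
sample = dictionary_attack_sample(b, M=16, m=10, k=2, rng=random.Random(0))
print(sample)  # an element that tests positive in b (possibly a false positive)
```

The cost is linear in the namespace size M, which is what makes this approach slow for large namespaces.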
Sampling from Bloom filters b1, b2, b3 storing sets S1, S2, S3 respectively.
− Set S contains elements in range [0,15]: M = 16
− S stored in Bloom filter b with 10 bits: m = 10
− Bloom filters use 2 hash functions: k = 2
− BloomSampleTree bT created using (M = 16, m = 10, k = 2)
− Can be used to sample from all 3 sets
Bloom filter b1: S1 = {1,12}, b1 = 0100111000
Bloom filter b2: S2 = {4,6}, b2 = 0111000000
Bloom filter b3: S3 = {3,7}, b3 = 1000010100
BloomSampleTree bT:
(0..15): 1111111110
(0..7): 1111111110
(8..15): 1111111010
(0..3): 1100110010
(4..7): 0111011100
(8..11): 1101100010
(12..15): 0110011000
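A sketch of how a tree like bT could be built for (M = 16, m = 10, k = 2): each leaf stores the Bloom filter of every element in its range, and each internal node stores the bitwise OR of its children. The dict layout, the `leaf_size` parameter, and the salted-MD5 hash functions are assumptions, so the bit strings produced here will differ from those on the slide.

```python
import hashlib


def positions(x, m, k):
    """Bit positions of x under k salted-MD5 hash functions (an assumption)."""
    return [int.from_bytes(hashlib.md5(f"{i}:{x}".encode()).digest(), "big") % m
            for i in range(k)]


def make_filter(items, m, k):
    bits = 0
    for x in items:
        for p in positions(x, m, k):
            bits |= 1 << p
    return bits


def build_tree(lo, hi, m, k, leaf_size):
    """Build the node covering the half-open range [lo, hi)."""
    if hi - lo <= leaf_size:
        # Leaf: filter over every element of the range.
        return {"range": (lo, hi - 1),
                "bits": make_filter(range(lo, hi), m, k),
                "children": []}
    mid = (lo + hi) // 2
    left = build_tree(lo, mid, m, k, leaf_size)
    right = build_tree(mid, hi, m, k, leaf_size)
    # Internal node: union of the children, i.e. bitwise OR of their filters.
    return {"range": (lo, hi - 1),
            "bits": left["bits"] | right["bits"],
            "children": [left, right]}


bT = build_tree(0, 16, m=10, k=2, leaf_size=4)
for child in bT["children"]:
    print(child["range"], format(child["bits"], "010b"))
```

The tree depends only on (M, m, k) and the hash functions, not on any stored set, so one tree serves every Bloom filter built with the same parameters.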
Example: Sampling from Bloom filter b1 using BloomSampleTree bT
Original Set S1 = {1, 12}, b1 = 0100111000
Start at the root
(0..7): 1111111110 ∩ 0100111000 = 0100111000
(8..15): 1111111010 ∩ 0100111000 = 0100111000
Both intersections equal 0100111000, so pL = 0.5 pR = 0.5
Chosen subtree: (0..7)
(0..3): 1100110010 ∩ 0100111000 = 0100110000
(4..7): 0111011100 ∩ 0100111000 = 0100011000
pL = 0.52 pR = 0.48
Chosen leaf: (4..7)
This path was a false positive
Chosen leaf: (0..3)
Sample = 1
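The walk in this example can be sketched as a recursive descent: prune a subtree when its intersection with the query filter is empty, pick a child at random with pL and pR, test each element of a chosen leaf for membership, and backtrack to the sibling when a path turns out to be a false positive. This sketch weights children by the number of set bits in the intersection, which is a simple stand-in and not the estimator behind the pL and pR values on the slides. Node layout, helper names, and hash functions are assumptions as well.

```python
import hashlib
import random


def positions(x, m, k):
    """Bit positions of x under k salted-MD5 hash functions (an assumption)."""
    return [int.from_bytes(hashlib.md5(f"{i}:{x}".encode()).digest(), "big") % m
            for i in range(k)]


def make_filter(items, m, k):
    bits = 0
    for x in items:
        for p in positions(x, m, k):
            bits |= 1 << p
    return bits


def is_member(bits, x, m, k):
    return all((bits >> p) & 1 for p in positions(x, m, k))


def build_tree(lo, hi, m, k, leaf_size):
    """Node = (lo, hi, bits, left, right) over the half-open range [lo, hi)."""
    if hi - lo <= leaf_size:
        return (lo, hi, make_filter(range(lo, hi), m, k), None, None)
    mid = (lo + hi) // 2
    left = build_tree(lo, mid, m, k, leaf_size)
    right = build_tree(mid, hi, m, k, leaf_size)
    return (lo, hi, left[2] | right[2], left, right)


def sample(node, query, m, k, rng):
    lo, hi, bits, left, right = node
    if query & bits == 0:
        return None  # empty intersection: subtree pruned from search
    if left is None:
        # Leaf: check each element of the range against the query filter.
        candidates = [x for x in range(lo, hi) if is_member(query, x, m, k)]
        return rng.choice(candidates) if candidates else None  # None: false positive path
    # Assumed weighting: set bits in each child's intersection with the query.
    wl = bin(query & left[2]).count("1")
    wr = bin(query & right[2]).count("1")
    p_left = wl / (wl + wr)
    first, second = (left, right) if rng.random() < p_left else (right, left)
    result = sample(first, query, m, k, rng)
    if result is None:
        result = sample(second, query, m, k, rng)  # backtrack to the sibling
    return result


bT = build_tree(0, 16, m=10, k=2, leaf_size=4)
b1 = make_filter([1, 12], m=10, k=2)
print(sample(bT, b1, 10, 2, random.Random(7)))
```

Because a failed path falls back to the sibling, the walk returns an element whenever any element of the namespace tests positive in the query filter.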
Legend: Subtree not visited at all; Subtree pruned from search; False Positive Path; Empty Intersection; Potential Path; True Path
(1 ... 10M)
Sample Quality:
Running Time:
− MD5/Murmur Hash Functions
− Simple Hash Functions
− Bloom filter intersections almost never empty
− Every path is a “true path”
− Follow path only if the size of intersection exceeds a threshold
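One way to read the threshold rule is sketched below. It uses the standard Bloom filter cardinality estimate n ≈ −(m/k) · ln(1 − t/m), where t is the number of set bits, applied to the bitwise AND of the query and node filters. The slides do not state which estimator or threshold value is used, so the function names, the estimator choice, and the threshold are all assumptions; the AND of two filters is only a rough proxy for the filter of the true intersection.

```python
import math


def estimate_cardinality(bits, m, k):
    """Estimate how many elements set `bits` in an m-bit, k-hash filter."""
    t = bin(bits).count("1")
    if t >= m:
        return float("inf")  # a saturated filter carries no size information
    return -(m / k) * math.log(1 - t / m)


def should_follow(query_bits, node_bits, m, k, threshold):
    # Follow the path only if the estimated intersection size exceeds the threshold.
    return estimate_cardinality(query_bits & node_bits, m, k) > threshold


b1 = int("0100111000", 2)    # Bloom filter b1 from the example
leaf = int("1100110010", 2)  # leaf (0..3) of the example tree
print(estimate_cardinality(b1 & leaf, m=10, k=2))
print(should_follow(b1, leaf, m=10, k=2, threshold=1.0))
```

With a threshold of 0 this reduces to the empty-intersection test; a larger threshold prunes more paths at the risk of skipping real elements.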
Reconstruction
S = { 4, 6 }, Bloom filter b = 0111000000
Bloom filter b and BloomSampleTree bT give the reconstructed set S’
Reconstructed Set: S’ = { 4, 6, 13 } (13 is a false positive)
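Reconstruction can be sketched as the same tree traversal without the random choice: visit every subtree whose intersection with the query filter is non-empty and collect all elements that test positive at the leaves. Node layout, helper names, and the salted-MD5 hash functions are assumptions, so the false positives produced here need not match the 13 on the slide.

```python
import hashlib


def positions(x, m, k):
    """Bit positions of x under k salted-MD5 hash functions (an assumption)."""
    return [int.from_bytes(hashlib.md5(f"{i}:{x}".encode()).digest(), "big") % m
            for i in range(k)]


def make_filter(items, m, k):
    bits = 0
    for x in items:
        for p in positions(x, m, k):
            bits |= 1 << p
    return bits


def is_member(bits, x, m, k):
    return all((bits >> p) & 1 for p in positions(x, m, k))


def build_tree(lo, hi, m, k, leaf_size):
    """Node = (lo, hi, bits, left, right) over the half-open range [lo, hi)."""
    if hi - lo <= leaf_size:
        return (lo, hi, make_filter(range(lo, hi), m, k), None, None)
    mid = (lo + hi) // 2
    left = build_tree(lo, mid, m, k, leaf_size)
    right = build_tree(mid, hi, m, k, leaf_size)
    return (lo, hi, left[2] | right[2], left, right)


def reconstruct(node, query, m, k):
    lo, hi, bits, left, right = node
    if query & bits == 0:
        return []  # empty intersection: nothing in this range can test positive
    if left is None:
        return [x for x in range(lo, hi) if is_member(query, x, m, k)]
    return reconstruct(left, query, m, k) + reconstruct(right, query, m, k)


bT = build_tree(0, 16, m=10, k=2, leaf_size=4)
b = make_filter([4, 6], m=10, k=2)
S_prime = reconstruct(bT, b, 10, 2)
print(S_prime)  # contains 4 and 6, plus any false positives
```

The result is always a superset of the stored set: pruning never drops an element that tests positive, and false positives of the filter are kept.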
Algorithms:
namespace for membership
bloom filter and prune using membership
Setting:
Simple Hash Functions MD5 Hash Functions
− Large range of possible values
− Actually used namespace much smaller
‘unused’ regions
namespace changes
(0..15) (0..7) (8..15) (0..3) (4..7) (8..11) (12..15)
1111111110 1111111010 1100110010 0110011000