Sampling and Reconstruction Using Bloom Filters



SLIDE 1

Sampling and Reconstruction Using Bloom Filters

Neha Sengupta 1, Amitabha Bagchi 1, Srikanta Bedathur 2, Maya Ramanath 1

1 IIT Delhi 2 IBM India Research Lab

SLIDE 2

Bloom Filters

  • Compact storage
  • m bits, k hash functions
  • Set Membership
  • Union
  • Intersection
  • Sampling
  • Reconstruction

[Figure: Bloom filter bit array storing S = { A, B, C }, with a membership Query: W]
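The operations listed on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the hash family `HASHES` (pairs `(a, b)` for `h(x) = (a*x + b) mod m`) and the helper names are assumptions made for the example, and the filter is stored as a Python integer used as a bit mask.

```python
from typing import Iterable, List

# Hypothetical hash family with k = 2: h_i(x) = (a_i * x + b_i) mod m.
# These constants are illustrative; the slides do not give the hash functions.
HASHES = [(3, 1), (7, 5)]


def positions(x: int, m: int) -> List[int]:
    """Bit positions that element x maps to in an m-bit filter."""
    return [(a * x + b) % m for a, b in HASHES]


def make_filter(items: Iterable[int], m: int) -> int:
    """Build an m-bit Bloom filter (as an int bit mask) for the given items."""
    bits = 0
    for x in items:
        for p in positions(x, m):
            bits |= 1 << p
    return bits


def member(x: int, bits: int, m: int) -> bool:
    """Set membership: true if every position of x is set (may be a false positive)."""
    return all((bits >> p) & 1 for p in positions(x, m))


m = 10
b1 = make_filter({1, 12}, m)
b2 = make_filter({4, 6}, m)
union = b1 | b2          # exactly the filter of S1 ∪ S2
intersection = b1 & b2   # a superset of the filter of S1 ∩ S2
```

With shared hash functions, the bitwise OR of two filters equals the filter of the union of the sets, while the bitwise AND only over-approximates the filter of the intersection.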

SLIDE 3

Sampling from a Bloom filter

Two Approaches:

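  • Dictionary Attack: test every element of the Namespace for Membership in the Bloom filter (0100010001011110) to obtain the Candidate Set, then Sample from it. Slow.
  • Hash inversion: Invert each Set Bit of the Bloom filter (0100010001011110) to obtain the Candidate Set, check Membership, then Sample. Non-uniform Sample; needs an Invertible hash function.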
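The Dictionary Attack approach can be sketched as follows. The hash family and all function names are assumptions for illustration only; the point is that the cost is one membership test per element of the namespace, which is why the slide labels it slow.

```python
import random
from typing import List, Optional

# Hypothetical hash family (k = 2); not the functions used in the slides.
HASHES = [(3, 1), (7, 5)]


def positions(x: int, m: int) -> List[int]:
    return [(a * x + b) % m for a, b in HASHES]


def make_filter(items, m: int) -> int:
    bits = 0
    for x in items:
        for p in positions(x, m):
            bits |= 1 << p
    return bits


def member(x: int, bits: int, m: int) -> bool:
    return all((bits >> p) & 1 for p in positions(x, m))


def candidate_set(bits: int, m: int, M: int) -> List[int]:
    """Test every element of the namespace [0, M) for membership."""
    return [x for x in range(M) if member(x, bits, m)]


def dictionary_attack_sample(bits: int, m: int, M: int,
                             rng: random.Random) -> Optional[int]:
    """Uniform sample from the candidate set (true elements plus false positives)."""
    candidates = candidate_set(bits, m, M)
    return rng.choice(candidates) if candidates else None


b1 = make_filter({1, 12}, m=10)
sample = dictionary_attack_sample(b1, m=10, M=16, rng=random.Random(0))
```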

SLIDE 4

BloomSampleTree

Sampling from Bloom filters b1, b2, b3 storing sets S1, S2, S3 respectively.

  • Set S contains elements in range [0,15]: M = 16
  • S stored in Bloom filter b with 10 bits: m = 10
  • Bloom filters use 2 hash functions: k = 2
  • BloomSampleTree bT created using (M = 16, m = 10, k = 2)
  • Can be used to sample from all 3 sets

Bloom filter b1: S1 = {1,12}, 0100111000
Bloom filter b2: S2 = {4,6}, 0111000000
Bloom filter b3: S3 = {3,7}, 1000010100

BloomSampleTree bT:

(0..15)
  (0..7) 1111111110
    (0..3) 1100110010
    (4..7) 0111011100
  (8..15) 1111111010
    (8..11) 1101100010
    (12..15) 0110011000
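A tree like bT can be sketched as a binary tree over the namespace in which every node holds the Bloom filter of all elements in its range. The construction below is an assumption-laden sketch: the `Node` class, `build`, the `leaf_size` parameter, and the hash constants are hypothetical, so the bit patterns it produces will not match the ones in the figure.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical hash family (k = 2); not the functions used in the slides.
HASHES = [(3, 1), (7, 5)]


def positions(x: int, m: int) -> List[int]:
    return [(a * x + b) % m for a, b in HASHES]


def make_filter(items, m: int) -> int:
    bits = 0
    for x in items:
        for p in positions(x, m):
            bits |= 1 << p
    return bits


@dataclass
class Node:
    lo: int                      # inclusive start of the range
    hi: int                      # exclusive end of the range
    bits: int                    # Bloom filter of every element in [lo, hi)
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(lo: int, hi: int, m: int, leaf_size: int) -> Node:
    """Build the tree bottom-up; an internal filter is the OR of its children."""
    if hi - lo <= leaf_size:
        return Node(lo, hi, make_filter(range(lo, hi), m))
    mid = (lo + hi) // 2
    left = build(lo, mid, m, leaf_size)
    right = build(mid, hi, m, leaf_size)
    return Node(lo, hi, left.bits | right.bits, left, right)


# Same shape as the example: M = 16, m = 10, leaves covering 4 elements each.
tree = build(0, 16, m=10, leaf_size=4)
```

Because the tree depends only on (M, m, k) and the hash functions, one tree can serve every Bloom filter built with the same parameters, which is what lets bT be reused for b1, b2 and b3.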

SLIDE 5

BloomSampleTree

Example: Sampling from Bloom filter b1 using BloomSampleTree bT
Original Set S1 = {1, 12}, b1 = 0100111000

Start at the root and intersect b1 with the Bloom filter of each child ("." denotes the bitwise AND of two filters):

  • (0..7): 0100111000 . 1111111110 = 0100111000
  • (8..15): 0100111000 . 1111111010 = 0100111000

SLIDE 6

BloomSampleTree

Example: Sampling from Bloom filter b1 using BloomSampleTree bT
Original Set S1 = {1, 12}, b1 = 0100111000

Start at the root. The two intersections are identical, and the children are chosen with probabilities pL and pR:

  • (0..7): 0100111000 . 1111111110 = 0100111000, pL = 0.5
  • (8..15): 0100111000 . 1111111010 = 0100111000, pR = 0.5
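One way to turn the two intersections into branch probabilities is to estimate how many elements each intersection holds and normalise. The slides do not say which estimator is used, so the sketch below is an assumption: it applies the common set-bit estimate n ≈ -(m/k) ln(1 - t/m), where t is the number of set bits, and the function names are hypothetical. It reproduces the 0.5 / 0.5 split at the root, but because it looks only at the set-bit count of the intersection it would not reproduce unequal values such as 0.52 / 0.48 when both intersections have the same number of set bits.

```python
import math
from typing import Tuple


def popcount(bits: int) -> int:
    return bin(bits).count("1")


def estimate_size(bits: int, m: int, k: int) -> float:
    """Estimated number of elements in a filter with the given set bits."""
    t = min(popcount(bits), m - 1)   # cap to avoid log(0) on a full filter
    return -(m / k) * math.log(1 - t / m)


def child_probabilities(query: int, left: int, right: int,
                        m: int, k: int) -> Tuple[float, float]:
    """Probabilities of descending left/right, proportional to estimated sizes."""
    wl = estimate_size(query & left, m, k)
    wr = estimate_size(query & right, m, k)
    total = wl + wr
    if total == 0:
        return 0.0, 0.0              # both intersections empty: nothing to sample
    return wl / total, wr / total


b1 = int("0100111000", 2)
left_filter = int("1111111110", 2)    # node (0..7) in the example
right_filter = int("1111111010", 2)   # node (8..15) in the example
p_left, p_right = child_probabilities(b1, left_filter, right_filter, m=10, k=2)
```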

SLIDE 7

BloomSampleTree

Example: Sampling from Bloom filter b1 using BloomSampleTree bT
Original Set S1 = {1, 12}, b1 = 0100111000

Chosen subtree: (0..7)

  • (0..3): 0100111000 . 1100110010 = 0100110000, pL = 0.52
  • (4..7): 0100111000 . 0111011100 = 0100011000, pR = 0.48

SLIDE 8

BloomSampleTree

Example: Sampling from Bloom filter b1 using BloomSampleTree bT
Original Set S1 = {1, 12}, b1 = 0100111000

Chosen leaf: (4..7)

  • Membership(4, b1) = false
  • Membership(5, b1) = false
  • Membership(6, b1) = false
  • Membership(7, b1) = false

This path was a false positive

SLIDE 9

BloomSampleTree

Example: Sampling from Bloom filter b1 using BloomSampleTree bT
Original Set S1 = {1, 12}, b1 = 0100111000

Chosen leaf: (0..3)

  • Membership(0, b1) = false
  • Membership(1, b1) = true
  • Membership(2, b1) = false
  • Membership(3, b1) = false

Sample = 1
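The walk-through above (intersect at each node, pick a child at random, test membership at the leaf, back off when the path turns out to be a false positive) can be sketched as a recursive descent. Everything here is an illustrative assumption rather than the authors' code: the hash constants, the `Node`/`build`/`sample` names, the set-bit size estimate used as a branch weight, and the choice to retry the sibling after a false-positive path.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional

M_BITS, K = 10, 2
HASHES = [(3, 1), (7, 5)]  # hypothetical hash family, not from the slides


def positions(x: int) -> List[int]:
    return [(a * x + b) % M_BITS for a, b in HASHES]


def make_filter(items) -> int:
    bits = 0
    for x in items:
        for p in positions(x):
            bits |= 1 << p
    return bits


def member(x: int, bits: int) -> bool:
    return all((bits >> p) & 1 for p in positions(x))


@dataclass
class Node:
    lo: int
    hi: int  # exclusive
    bits: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(lo: int, hi: int, leaf_size: int = 4) -> Node:
    if hi - lo <= leaf_size:
        return Node(lo, hi, make_filter(range(lo, hi)))
    mid = (lo + hi) // 2
    left, right = build(lo, mid, leaf_size), build(mid, hi, leaf_size)
    return Node(lo, hi, left.bits | right.bits, left, right)


def weight(bits: int) -> float:
    """Assumed branch weight: size estimate from the number of set bits."""
    t = min(bin(bits).count("1"), M_BITS - 1)
    return -(M_BITS / K) * math.log(1 - t / M_BITS)


def sample(node: Node, query: int, rng: random.Random) -> Optional[int]:
    if node.bits & query == 0:
        return None                      # empty intersection: prune this subtree
    if node.left is None:                # leaf: membership test on each element
        hits = [x for x in range(node.lo, node.hi) if member(x, query)]
        return rng.choice(hits) if hits else None   # None = false-positive path
    wl = weight(node.left.bits & query)
    wr = weight(node.right.bits & query)
    if wl + wr == 0:
        return None
    go_left = rng.random() < wl / (wl + wr)
    order = (node.left, node.right) if go_left else (node.right, node.left)
    for child in order:                  # retry the sibling after a dead end
        found = sample(child, query, rng)
        if found is not None:
            return found
    return None


tree = build(0, 16)
b1 = make_filter({1, 12})
drawn = sample(tree, b1, random.Random(0))
```

The returned element always passes the membership test for the query filter, so it is either a true element or one of the filter's false positives. This sketch makes no claim about how close the resulting distribution is to uniform.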

SLIDE 10

BloomSampleTree - Sampling

[Figure: sampling traversal of a BloomSampleTree over the namespace (1 ... 10M), with numbered nodes. Legend: True Path, Potential Path, False Positive Path, Empty Intersection, Subtree pruned from search, Subtree not visited at all]

SLIDE 11

BloomSampleTree - Sampling

Setting:

SLIDE 12

BloomSampleTree - Sampling

Setting:

Sample Quality:

SLIDE 13

BloomSampleTree - Sampling

Setting:

Running Time:

SLIDE 14

BloomSampleTree - Sampling

Algorithms:

  • Dictionary Attack (DA)
  • BloomSampleTree (BST)

Setting:

  • M = 10^7
  • k = 3
  • m increases with desired Accuracy
  • Size of set S, |S| varies from 100 to 50K

[Plot panels: Simple Hash Functions; MD5/Murmur Hash Functions]

SLIDE 15

BloomSampleTree - Reconstruction

  • Similar to Sampling
  • Follow all positive paths
  • Multi-threaded
  • Challenge:
      − Bloom filter intersections almost never empty
      − Every path is a “true path”
      − Follow path only if the size of intersection exceeds a threshold.
  • False positives of set S are part of the reconstructed set S’

Example: S = { 4, 6 } is stored in Bloom filter b = 0111000000. Reconstruction using BloomSampleTree bT gives the Reconstructed Set: S’ = { 4, 6, 13 }
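Reconstruction can be sketched as the same descent used for sampling, except that every path whose intersection passes the threshold is followed and all leaf members are collected. The code below is a hedged illustration: the hash constants and names are hypothetical, and the "size of intersection" is approximated by the number of set bits in the intersection, which may differ from the measure used by the authors. With these hash functions the result will not match the S’ shown on the slide.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

M_BITS = 10
HASHES = [(3, 1), (7, 5)]  # hypothetical hash family, not from the slides


def positions(x: int) -> List[int]:
    return [(a * x + b) % M_BITS for a, b in HASHES]


def make_filter(items) -> int:
    bits = 0
    for x in items:
        for p in positions(x):
            bits |= 1 << p
    return bits


def member(x: int, bits: int) -> bool:
    return all((bits >> p) & 1 for p in positions(x))


@dataclass
class Node:
    lo: int
    hi: int  # exclusive
    bits: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(lo: int, hi: int, leaf_size: int = 4) -> Node:
    if hi - lo <= leaf_size:
        return Node(lo, hi, make_filter(range(lo, hi)))
    mid = (lo + hi) // 2
    left, right = build(lo, mid, leaf_size), build(mid, hi, leaf_size)
    return Node(lo, hi, left.bits | right.bits, left, right)


def reconstruct(node: Node, query: int, threshold: int = 1) -> Set[int]:
    """Follow every path whose intersection has at least `threshold` set bits."""
    if bin(node.bits & query).count("1") < threshold:
        return set()                     # intersection too small: prune
    if node.left is None:                # leaf: keep every element that tests positive
        return {x for x in range(node.lo, node.hi) if member(x, query)}
    return (reconstruct(node.left, query, threshold)
            | reconstruct(node.right, query, threshold))


tree = build(0, 16)
b = make_filter({4, 6})
recovered = reconstruct(tree, b)
```

With `threshold=1` nothing that passes the membership test is lost, so the result equals what a Dictionary Attack would return: the stored set plus the filter's false positives. Raising the threshold prunes more aggressively and may drop true elements.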

SLIDE 16

BloomSampleTree - Reconstruction

Algorithms:

  • Dictionary Attack (DA): test each element in namespace for membership
  • HashInvert (HI): Invert each set bit in the Bloom filter and prune using membership
  • BloomSampleTree (BST)

Setting:

  • M = 10^7
  • k = 3
  • m increases with desired Precision
  • Size of set S, |S| varies from 100 to 50K

[Plot panels: Simple Hash Functions; MD5 Hash Functions]
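HashInvert only works when a hash function can be inverted cheaply. As an illustration, assume a hypothetical "simple" hash of the form h(x) = (a*x + b) mod m with gcd(a, m) = 1; the slides do not specify their simple hash functions. Each set bit p then has the preimages x ≡ a⁻¹(p - b) (mod m), which can be enumerated directly and pruned with a membership test.

```python
from typing import List, Set

# Hypothetical hash family (k = 2); each a must be coprime with m for inversion.
HASHES = [(3, 1), (7, 5)]


def positions(x: int, m: int) -> List[int]:
    return [(a * x + b) % m for a, b in HASHES]


def make_filter(items, m: int) -> int:
    bits = 0
    for x in items:
        for p in positions(x, m):
            bits |= 1 << p
    return bits


def member(x: int, bits: int, m: int) -> bool:
    return all((bits >> p) & 1 for p in positions(x, m))


def hash_invert(bits: int, m: int, M: int) -> Set[int]:
    """Invert every set bit under the first hash, then prune by membership."""
    a, b = HASHES[0]
    a_inv = pow(a, -1, m)                # modular inverse (Python 3.8+)
    candidates: Set[int] = set()
    for p in range(m):
        if (bits >> p) & 1:
            x0 = (a_inv * (p - b)) % m   # smallest preimage of bit p
            candidates.update(range(x0, M, m))
    return {x for x in candidates if member(x, bits, m)}


b = make_filter({4, 6}, m=10)
recovered = hash_invert(b, m=10, M=16)
```

Every stored element sets its first-hash bit, so it appears among the preimages of some set bit; the membership test then removes candidates whose other positions are not set.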

SLIDE 17

Pruned BloomSampleTree

Large range of possible values; actually used namespace much smaller

  • Do not expand nodes corresponding to ‘unused’ regions
  • Smaller BloomSampleTree, faster Sampling
  • Add nodes to BloomSampleTree as namespace changes

[Figure: pruned BloomSampleTree over (0..15) with node ranges (0..7), (8..15), (0..3), (4..7), (8..11), (12..15); Bloom filters shown: 1111111110, 1111111010, 1100110010, 0110011000]
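Pruning can be sketched by skipping any range that contains no used identifier while the tree is built. This is an assumption-based illustration: `build_pruned`, the sorted `used` list, and the choice to let a kept leaf cover its whole range (with internal filters as the OR of the kept children) are all hypothetical details not given in the slides.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import List, Optional

HASHES = [(3, 1), (7, 5)]  # hypothetical hash family, not from the slides


def make_filter(items, m: int) -> int:
    bits = 0
    for x in items:
        for a, b in HASHES:
            bits |= 1 << ((a * x + b) % m)
    return bits


@dataclass
class Node:
    lo: int
    hi: int  # exclusive
    bits: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_pruned(lo: int, hi: int, used: List[int], m: int,
                 leaf_size: int = 4) -> Optional[Node]:
    """Build nodes only for ranges containing at least one used id.

    `used` must be sorted in ascending order.
    """
    if bisect_left(used, lo) == bisect_left(used, hi):
        return None                      # no used id in [lo, hi): do not expand
    if hi - lo <= leaf_size:
        return Node(lo, hi, make_filter(range(lo, hi), m))
    mid = (lo + hi) // 2
    left = build_pruned(lo, mid, used, m, leaf_size)
    right = build_pruned(mid, hi, used, m, leaf_size)
    bits = (left.bits if left else 0) | (right.bits if right else 0)
    return Node(lo, hi, bits, left, right)


# Only ids 1, 2 and 13 are in use, so leaves (4..7) and (8..11) are never created.
tree = build_pruned(0, 16, used=[1, 2, 13], m=10)
```

Supporting a changing namespace would then amount to creating the missing nodes along the path to a newly used id and OR-ing the new leaf filter into its ancestors.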

SLIDE 18

Pruned BloomSampleTree

  • Sampling for a smaller Namespace with Simple Hash Functions
  • Namespace Fraction: how large is the actually used section of the range?
  • How are the actually used parts of the namespace distributed within it: Uniform or Clustered?
  • Both affect BST size and sampling time
SLIDE 19

Thank you!