Bloom Filter & Hashing Barna Saha Bloom Filter Checks for SET - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Bloom Filter & Hashing

Barna Saha

slide-2
SLIDE 2

Bloom Filter

  • Checks for SET MEMBERSHIP efficiently

Is element x in the set?

slide-3
SLIDE 3

Motivating Example

  • Spam Filtering

    – We have a set S of 1 billion email addresses that we consider to be non-spam.
    – Each stream element is of the form (email address, email).
    – Before accepting the email, a mail client needs to check whether this address belongs to S.
    – A typical email address requires 20 bytes of storage, whereas main memory holds only about 1 billion bytes (roughly 1 gigabyte), or 8 billion bits.
    – We cannot store all the valid email addresses in main memory.
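
The arithmetic behind this slide can be checked with a few lines of Python. This is only a sketch using the figures quoted above; the variable names are illustrative and not from the slides.

```python
# Figures taken from the slide; names are illustrative only.
num_addresses = 1_000_000_000      # 1 billion non-spam addresses
bytes_per_address = 20             # typical storage per address
memory_bytes = 1_000_000_000       # roughly 1 gigabyte of main memory

needed_bytes = num_addresses * bytes_per_address
memory_bits = memory_bytes * 8
bits_per_address = memory_bits / num_addresses

print(needed_bytes / memory_bytes)  # 20.0 -> storing the raw set needs 20x the memory
print(bits_per_address)             # 8.0  -> only 8 bits of memory per address
```

With only 8 bits available per address, any exact representation of 20-byte addresses is out of reach, which is why an approximate structure is considered.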

slide-4
SLIDE 4

Motivating Example

  • Spam Filtering

    – All valid emails must be delivered
    – Number of spam emails delivered should be as low as possible
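
A Bloom filter meets both requirements: it never rejects an inserted address (no false negatives), and it accepts a non-member only with small probability. The following is a minimal sketch, not the slides' own construction. The class name, method names, and the use of salted SHA-256 digests as a stand-in for k independent random hash functions are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter sketch: a bit array probed by k hash functions."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # all bits start at 0

    def _positions(self, item):
        # Assumption: salting SHA-256 with the index i approximates
        # k independent hash functions.
        data = item.encode("utf-8")
        for i in range(self.num_hashes):
            digest = hashlib.sha256(i.to_bytes(4, "big") + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        # Set every probed bit to 1.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # False means "definitely not in the set";
        # True means "in the set, or a false positive".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


valid = ["alice@example.com", "bob@example.com"]  # hypothetical addresses
bf = BloomFilter(num_bits=1024, num_hashes=5)
for address in valid:
    bf.add(address)

# Every inserted address is accepted, so all valid email is delivered.
print(all(bf.might_contain(address) for address in valid))
```

Because bits are only ever set and never cleared, an inserted address always finds all of its bits at 1; a spam address gets through only if every one of its probed bits happens to have been set by other addresses.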

slide-5
SLIDE 5

Bloom Filter

slide-6
SLIDE 6

Bloom Filter

slide-7
SLIDE 7

Analysis of Bloom Filter

slide-8
SLIDE 8

Analysis of Bloom Filter

slide-9
SLIDE 9

Spam Filtering Example

  • We have
slide-10
SLIDE 10

Optimum Value of k

  • As the number of hash functions increases, an element outside the set is probed at more cells, so the chance of finding a 0-bit cell is higher
  • Also, with an increasing number of hash functions, the number of cells with 0 bits decreases
  • Optimum value obtained by differentiation
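
The trade-off can be explored numerically. The sketch below assumes the standard approximation for the false positive rate, (1 - e^(-kn/m))^k, with m bits and n inserted elements, and the well-known minimizer k = (m/n) ln 2 obtained by differentiation. These symbols and the function name are assumptions for illustration, since the analysis slides did not survive extraction; the figures are those of the spam filtering example (8 billion bits, 1 billion addresses).

```python
import math


def false_positive_rate(m_bits, n_items, k):
    """Standard approximation: probability that all k probed bits are 1."""
    return (1.0 - math.exp(-k * n_items / m_bits)) ** k


m_bits = 8_000_000_000   # 8 billion bits of memory
n_items = 1_000_000_000  # 1 billion addresses

# Real-valued optimum from setting the derivative to zero.
k_real = (m_bits / n_items) * math.log(2)

# Best integer choice, found by direct comparison.
best_k = min(range(1, 17), key=lambda k: false_positive_rate(m_bits, n_items, k))

for k in range(1, 11):
    print(k, round(false_positive_rate(m_bits, n_items, k), 4))
print("real-valued optimum:", round(k_real, 3), "best integer k:", best_k)
```

Under this approximation the rate falls from about 12% at k = 1 to roughly 2% near the optimum, then rises again as extra hash functions fill the bit array.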
slide-11
SLIDE 11

Applications of Bloom Filter

  • Bloom Filter has found innumerable applications in networking and web technology

slide-12
SLIDE 12
slide-13
SLIDE 13
slide-14
SLIDE 14
slide-15
SLIDE 15
slide-16
SLIDE 16
slide-17
SLIDE 17

Analysis of Bloom Filter

Analysis uses fully random hash functions, which are difficult to obtain and have high space and computing requirements
slide-18
SLIDE 18

Strongly 2-wise Universal Hash Function

  • Mapping set of keys $U = \{0, 1, 2, \ldots, m-1\}$ to range $R = \{0, 1, 2, \ldots, n-1\}$
    – $H = \{ h_{a,b}(x) = [(ax + b) \bmod p] \bmod n \}$
  • $p \ge m$ is a prime, $1 \le a \le p-1$, $0 \le b \le p-1$
  • Easy to compute and store: O(1)
  • Satisfies (almost) $\Pr_{h \in H}[h(x) = u \wedge h(y) = v] = 1/n^2$ for all $x \ne y \in U$ and $u, v \in R$
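
A member of this family is cheap to sample and evaluate, as the sketch below suggests. The function names `hash_ab` and `draw_hash` and the example parameters (p = 101, n = 10) are hypothetical choices for illustration, not part of the slides.

```python
import random


def hash_ab(x, a, b, p, n):
    """Evaluate h_{a,b}(x) = ((a*x + b) mod p) mod n."""
    return ((a * x + b) % p) % n


def draw_hash(p, n, rng):
    """Pick a in [1, p-1] and b in [0, p-1] uniformly; return a closure."""
    a = rng.randint(1, p - 1)
    b = rng.randint(0, p - 1)
    return lambda x: hash_ab(x, a, b, p, n)


p, n = 101, 10            # p is a prime at least the universe size m
rng = random.Random(0)    # seeded only to make the sketch repeatable
h = draw_hash(p, n, rng)

print(hash_ab(5, 3, 7, p, n))      # (3*5 + 7) mod 101 mod 10 = 2
print([h(x) for x in range(10)])   # values always fall in 0..n-1
```

Only the pair (a, b) has to be stored, which is the O(1) space cost mentioned above.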
slide-19
SLIDE 19

Strongly 3-wise Universal Hash Function

  • Mapping set of keys $U = \{0, 1, 2, \ldots, m-1\}$ to range $R = \{0, 1, 2, \ldots, n-1\}$
    – $H = \{ h_{a,b,c}(x) = [(ax^2 + bx + c) \bmod p] \bmod n \}$
  • $p \ge m$ is a prime, $1 \le a \le p-1$, $0 \le b, c \le p-1$
  • Easy to compute and store: O(1)
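  • Satisfies (almost) $\Pr_{h \in H}[h(x) = u \wedge h(y) = v \wedge h(z) = w] = 1/n^3$ for all distinct $x, y, z \in U$ and $u, v, w \in R$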
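
The word "almost" can be made concrete with a small exhaustive count. The sketch below assumes p = 5 and n = p (so the outer mod n has no effect) and fixes the keys 0, 1, 2; these choices are illustrative and not from the slides. It enumerates every (a, b, c) allowed above and records which output triples occur.

```python
from collections import Counter
from itertools import product

p = 5            # small prime; assumption: n = p, so "mod n" is a no-op
x, y, z = 0, 1, 2


def h(a, b, c, key):
    """Evaluate (a*key^2 + b*key + c) mod p."""
    return (a * key * key + b * key + c) % p


# a ranges over 1..p-1 as on the slide; b and c range over 0..p-1.
counts = Counter(
    (h(a, b, c, x), h(a, b, c, y), h(a, b, c, z))
    for a, b, c in product(range(1, p), range(p), range(p))
)

print(len(counts), "of", p ** 3, "output triples occur")  # 100 of 125
print(set(counts.values()))                               # {1}
```

Each output triple is produced by at most one function, because three points determine a unique polynomial of degree at most 2 modulo p. The 25 missing triples are exactly those that would need a = 0, which is why the family is only almost strongly 3-wise universal under the stated range of a.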
slide-20
SLIDE 20
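
Strongly 2-Universal

  • Mapping set of keys $U = \{0, 1, 2, \ldots, p-1\}$ to range $R = \{0, 1, 2, \ldots, p-1\}$
    – $H = \{ h_{a,b}(x) = (ax + b) \bmod p \}$, $0 \le a, b \le p-1$
  • Fix $x \ne y \in U$ and $u, v \in R$.
    – What is $\Pr_{h \in H}[h(x) = u \wedge h(y) = v]$?
    – Number of hash functions: $p^2$
    – Number of solutions for "a" and "b": 1
    – Hence the probability is $1/p^2$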
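
The claim that exactly one pair (a, b) maps a fixed pair of distinct keys to a fixed pair of targets can be verified by brute force for a small prime. The sketch below assumes p = 7; the function name is hypothetical.

```python
from itertools import product

p = 7  # small prime chosen for illustration


def count_solutions(x, y, u, v):
    """Count pairs (a, b) in [0, p-1]^2 with h(x) = u and h(y) = v."""
    return sum(
        1
        for a, b in product(range(p), repeat=2)
        if (a * x + b) % p == u and (a * y + b) % p == v
    )


# Check every choice of distinct keys x, y and every target pair u, v.
all_one = all(
    count_solutions(x, y, u, v) == 1
    for x, y, u, v in product(range(p), repeat=4)
    if x != y
)
print(all_one)  # True: one solution out of p^2 functions, probability 1/p^2
```

The unique solution is a = (u - v)(x - y)^(-1) mod p and b = u - ax mod p, which exists because x - y is invertible modulo a prime.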