SLIDE 1

3/24/2020

Randomness in Computing

LECTURE 16

Last time

  • Hashing
  • Universal hash families

Today

  • Using universal hash families
  • Perfect hashing
  • Bloom filters

Sofya Raskhodnikova; Randomness in Computing

SLIDE 2

Static dictionary problem

Motivating example: a password checker that prevents people from using common passwords.

  • $S$ is the set of common passwords
  • Universe: set $U$
  • $S \subseteq U$ and $m = |S|$
  • $m \ll |U|$

Goal: A data structure for storing $S$ that supports the search query “Does $w \in S$?” for all words $w \in U$.

SLIDE 3

Solutions

Deterministic solutions

  • Store $S$ as a sorted array (or as a binary search tree)

Search time: $O(\log m)$, Space: $O(m)$

  • Store an array that for each $w \in U$ has 1 if $w \in S$ and 0 otherwise.

Search time: $O(1)$, Space: $O(|U|)$
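The two deterministic solutions can be sketched as follows. This is a minimal illustration, assuming universe elements are integers in the range 0 to |U| - 1; the function names are hypothetical.

```python
from bisect import bisect_left

def build_sorted(S):
    """Sorted-array dictionary: space proportional to |S|."""
    return sorted(S)

def search_sorted(arr, w):
    """Binary search: O(log |S|) comparisons."""
    i = bisect_left(arr, w)
    return i < len(arr) and arr[i] == w

def build_bitmap(S, universe_size):
    """Direct-address table: one flag per element of U (assumed 0..|U|-1)."""
    table = bytearray(universe_size)
    for s in S:
        table[s] = 1
    return table

def search_bitmap(table, w):
    """Constant-time lookup by index."""
    return table[w] == 1
```

The sorted array is compact but slower to query; the bitmap is fast but pays for every element of the universe, which is the gap hashing aims to close.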

A randomized solution

  • Hashing
SLIDE 4

Chain Hashing

  • Hash table: $n$ bins; words that fall in the same bin are chained into a linked list.
  • Hash function: $h : U \to [n]$

To construct the table, hash all elements of $S$.

To search for word $w$, check if $w$ is in bin $h(w)$.

Desiderata for $h$:

  • O(1) evaluation time.
  • O(1) space to store โ„Ž.

[Figure: hash table with bins $1, 2, \ldots, n$; the elements of $S$ are chained in their bins.]
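A minimal sketch of chain hashing, assuming Python lists stand in for the linked lists and that the hash function is passed in as a callable with values in 0..n-1. The class name and the example hash `x % n` are illustrative only; `x % n` is not drawn from a universal family.

```python
class ChainHashTable:
    """Chain hashing: n bins, each holding the elements that hash to it."""

    def __init__(self, S, n, h):
        self.h = h
        self.bins = [[] for _ in range(n)]
        # Construction: hash every element of S into its bin.
        for s in S:
            self.bins[h(s)].append(s)

    def contains(self, w):
        # Search: scan only the chain in bin h(w).
        return w in self.bins[self.h(w)]
```

Search time is proportional to the length of the chain in bin h(w), which is why the expected bin load matters.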

SLIDE 5

Universal hash family

  • A set $\mathcal{H}$ of hash functions is universal if for every pair of distinct elements $w_1, w_2 \in U$ and for $h$ chosen uniformly from $\mathcal{H}$,

$\Pr[h(w_1) = h(w_2)] \le \frac{1}{n}$

Constructing a universal hash family

  • Fix a prime $p \ge |U|$ and think of the range as $\{0, 1, \ldots, n-1\}$.
  • Define $h_{a,b}(x) = ((ax + b) \bmod p) \bmod n$ and

$\mathcal{H} = \{\, h_{a,b} \mid a \in [p-1],\ 0 \le b \le p-1 \,\}$

Theorem: $\mathcal{H}$ is universal.
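The construction can be sketched in a few lines, assuming universe elements are integers in 0..p-1. The function names are hypothetical. `collision_probability` enumerates the whole family, so it is only practical for small p, but it lets one check the universality bound exactly on a toy instance.

```python
import random
from fractions import Fraction

def sample_hash(p, n, rng=random):
    """Draw h_{a,b} uniformly: a in {1..p-1}, b in {0..p-1}."""
    a = rng.randrange(1, p)
    b = rng.randrange(0, p)

    def h(x):
        return ((a * x + b) % p) % n

    return h

def collision_probability(p, n, w1, w2):
    """Exact Pr[h(w1) == h(w2)] over all p(p-1) functions in the family."""
    hits = sum(
        ((a * w1 + b) % p) % n == ((a * w2 + b) % p) % n
        for a in range(1, p)
        for b in range(p)
    )
    return Fraction(hits, p * (p - 1))
```

For example, with p = 13 and n = 4, every pair of distinct elements should have collision probability at most 1/4.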

SLIDE 6

Using a universal family

As before:

  • If $w \notin S$, expected number of words in bin $h(w)$ is $\le \frac{m}{n}$
  • If $w \in S$, expected number of words in bin $h(w)$ is $\le 1 + \frac{m-1}{n}$

The previous guarantee on max load no longer holds!

Goal: Given $S$, find a hash function with no collisions for words in $S$.
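The two expected-load bounds for a universal family follow from linearity of expectation. A short derivation, offered as a sketch of the standard argument rather than the one given in lecture:

```latex
\begin{align*}
w \notin S:&\quad
\mathbb{E}\bigl[\#\{s \in S : h(s) = h(w)\}\bigr]
  = \sum_{s \in S} \Pr[h(s) = h(w)] \le \frac{m}{n}, \\
w \in S:&\quad
\mathbb{E}\bigl[\#\{s \in S : h(s) = h(w)\}\bigr]
  = 1 + \sum_{s \in S \setminus \{w\}} \Pr[h(s) = h(w)] \le 1 + \frac{m-1}{n}.
\end{align*}
```

Here $m = |S|$, $n$ is the number of bins, and each probability is at most $1/n$ by universality.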

SLIDE 7

Perfect hashing: no collisions

Theorem: If $h : U \to \{0, 1, \ldots, n-1\}$ is chosen uniformly at random from a universal hash family, then for every $S$ of size $m$ such that $n \ge m^2$, $\Pr[h \text{ is perfect}] \ge 1/2$.

Proof: Let $s_1, \ldots, s_m$ be the elements of $S$.
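The slide text stops after the first line of the proof. One standard way to complete it, not necessarily the argument given in lecture, counts colliding pairs and applies Markov's inequality. Write $s_1, \ldots, s_m$ for the elements of $S$ and let $X$ be the number of colliding pairs:

```latex
\begin{align*}
X &= \sum_{1 \le i < j \le m} \mathbf{1}\bigl[h(s_i) = h(s_j)\bigr], \\
\mathbb{E}[X] &= \sum_{i < j} \Pr[h(s_i) = h(s_j)]
  \le \binom{m}{2} \frac{1}{n} < \frac{m^2}{2n}, \\
\Pr[h \text{ is not perfect}] &= \Pr[X \ge 1]
  \le \Pr\!\left[X \ge \frac{m^2}{n}\right]
  \le \frac{\mathbb{E}[X]}{m^2 / n} < \frac{1}{2}.
\end{align*}
```

The first inequality in the last line uses $n \ge m^2$, so that $m^2/n \le 1$.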

SLIDE 8

Perfect hashing

  • Select $h \in \mathcal{H}$ until a perfect $h$ is found.
  • Expected number of tries is at most 2.
  • Each try takes $O(m)$ time.
  • Drawback: $\Omega(m^2)$ space.
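The retry loop can be sketched as below, assuming the family $h_{a,b}(x) = ((ax+b) \bmod p) \bmod n$ with a table of $m^2$ bins, and assuming S holds distinct integers in 0..p-1 for a prime p. The function name is hypothetical.

```python
import random

def find_perfect_hash(S, p, rng=random):
    """Resample (a, b) until no two elements of S share a bin.

    Assumes the elements of S are distinct integers in 0..p-1.
    Returns the parameters (a, b, n), the filled table, and the try count.
    """
    S = list(S)
    m = len(S)
    n = max(1, m * m)  # quadratic table size: the space drawback
    tries = 0
    while True:
        tries += 1
        a = rng.randrange(1, p)
        b = rng.randrange(p)
        table = [None] * n
        perfect = True
        for s in S:
            j = ((a * s + b) % p) % n
            if table[j] is not None:  # collision: reject this h
                perfect = False
                break
            table[j] = s
        if perfect:
            return (a, b, n), table, tries
```

Since each try succeeds with probability at least 1/2, the loop is expected to finish within two iterations.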

SLIDE 9

2-level scheme for perfect hashing

  • Set $m = n$.
  • Select $h \in \mathcal{H}$ until $h$ with at most $m$ collisions is found.
  • For each bin $i$ with collisions, that is, with $k > 1$ items:

– select a new hash function $h_i$ with $k^2$ bins from a universal family until $h_i$ has no collisions.
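A compact sketch of the two-level construction, under several assumptions: the family is $((ax+b) \bmod p) \bmod n$, elements are distinct integers in 0..p-1, a "collision" means a colliding pair, and a bin with a single item gets a one-slot second-level table. All names are hypothetical.

```python
import random

def _h(a, b, p, n, x):
    return ((a * x + b) % p) % n

def build_two_level(S, p, rng=random):
    """Two-level perfect hashing for distinct integers in 0..p-1."""
    S = list(S)
    m = len(S)
    n = max(1, m)  # first level: as many bins as elements
    # Stage 1: resample until there are at most m colliding pairs.
    while True:
        a, b = rng.randrange(1, p), rng.randrange(p)
        bins = [[] for _ in range(n)]
        for s in S:
            bins[_h(a, b, p, n, s)].append(s)
        pairs = sum(len(bn) * (len(bn) - 1) // 2 for bn in bins)
        if pairs <= m:
            break
    # Stage 2: a collision-free table of k^2 slots for each bin of k items.
    second = []
    for items in bins:
        size = len(items) ** 2
        while True:
            a2, b2 = rng.randrange(1, p), rng.randrange(p)
            slots = [None] * size
            ok = True
            for s in items:
                j = _h(a2, b2, p, size, s)
                if slots[j] is not None:
                    ok = False
                    break
                slots[j] = s
            if ok:
                break
        second.append((a2, b2, slots))
    return (a, b, n), second

def lookup(struct, p, w):
    """Two hash evaluations and one comparison: O(1) worst case."""
    (a, b, n), second = struct
    a2, b2, slots = second[_h(a, b, p, n, w)]
    return bool(slots) and slots[_h(a2, b2, p, len(slots), w)] == w
```

With at most m colliding pairs in Stage 1, the second-level tables occupy at most 3m slots in total, so the whole structure stays linear in m.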

[Figure: two-level hash table; first-level bins labeled $1, 2, \ldots, n-1$.]

SLIDE 10

2-level scheme for perfect hashing

A solution for the static dictionary problem with:

  • $O(1)$ worst-case guarantee on search time.
  • $O(m)$ space.
  • Expected $O(m)$ preprocessing time.

Theorem: The 2-level scheme achieves perfect hashing with $O(m)$ space.

SLIDE 11

Analysis of 2-level scheme

Proof:

  • Let $X$ = # of collisions in Stage 1.
  • We showed before: $\Pr\left[X > \frac{m^2}{n}\right] \le \frac{1}{2}$.
  • Now $m = n$: $\Pr[X > m] \le \frac{1}{2}$.
  • So at least half of $h \in \mathcal{H}$ have $\le m$ collisions.
  • Assume we found such $h$.

SLIDE 12

Analysis of 2-level scheme

Proof (continued):
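The continuation of the proof is not captured in the slide text. A standard way to finish the space bound, assuming a "collision" means a colliding pair and writing $k_i$ for the number of elements in first-level bin $i$:

```latex
\begin{align*}
X &= \sum_{i} \binom{k_i}{2} \le m
  && \text{(the chosen } h \text{ has at most } m \text{ collisions)}, \\
\sum_{i} k_i^2 &= \sum_{i} \left( 2\binom{k_i}{2} + k_i \right)
  = 2X + m \le 3m.
\end{align*}
```

The second-level tables therefore use at most $3m$ cells, and together with the $m$ first-level bins and $O(1)$ space per hash function the total is $O(m)$.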

SLIDE 13

Conclusion: 2-level hashing

A solution for the static dictionary problem with:

  • $O(1)$ worst-case guarantee on search time.
  • $O(m)$ space.
  • Expected $O(m)$ preprocessing time.

SLIDE 14

Approximate solutions for static dictionary problem

  • False positives: If $w \in S$, our data structure must answer correctly. If $w \notin S$, we may err with small probability.
  • E.g., we prevent all unsuitable passwords and some suitable ones, too.

Fingerprints

  • Use hash function $h$
  • Store sorted list $L$ of fingerprints $h(x)$, $x \in S$.
  • To see if $w \in S$, perform binary search for $h(w)$.
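A minimal sketch of the fingerprint approach, assuming words are strings and using a truncated SHA-256 digest as a stand-in for the hash function; the slides do not specify one. The function names and the 32-bit default are illustrative.

```python
import hashlib
from bisect import bisect_left

def fingerprint(x, bits=32):
    """Map a word to a `bits`-bit integer (bits <= 64 assumed)."""
    digest = hashlib.sha256(x.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") >> (64 - bits)

def build(S, bits=32):
    """Sorted list of the fingerprints of all words in S."""
    return sorted({fingerprint(x, bits) for x in S})

def maybe_contains(fingerprints, w, bits=32):
    """Binary search for the fingerprint of w.

    Never wrong when w is in S; may wrongly accept a word outside S
    whose fingerprint happens to match a stored one.
    """
    f = fingerprint(w, bits)
    i = bisect_left(fingerprints, f)
    return i < len(fingerprints) and fingerprints[i] == f
```

Shorter fingerprints save space but raise the chance that an unrelated word matches a stored fingerprint.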

SLIDE 15

Bloom filters

  • Trade-off between space and false positive probability
  • Parameters $k$, $n$
  • Bloom filter: array of $n$ bits $A[1], \ldots, A[n]$

– Initially: all bits are 0

  • $k$ independent random hash functions $h_1, \ldots, h_k$ with range $[n]$
  • To represent set $S$

– For each $x \in S$ and $i \in [k]$, set bit $A[h_i(x)]$ to 1.

  • To search for $w$:

– If for all $i \in [k]$, bits $A[h_i(w)] = 1$, accept; otherwise reject.
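A small Bloom filter sketch under stated assumptions: words are strings, the k hash functions are simulated by salting SHA-256 with the function index (a practical stand-in for independent random functions), positions are 0-indexed, and one byte is used per bit for readability. The class name is hypothetical.

```python
import hashlib

class BloomFilter:
    def __init__(self, n, k):
        self.n = n                # number of bits
        self.k = k                # number of hash functions
        self.bits = bytearray(n)  # all zero initially; one byte per bit

    def _positions(self, x):
        # Simulate h_1..h_k by prefixing the word with the index i.
        for i in range(self.k):
            d = hashlib.sha256(f"{i}:{x}".encode("utf-8")).digest()
            yield int.from_bytes(d[:8], "big") % self.n

    def add(self, x):
        for j in self._positions(x):
            self.bits[j] = 1

    def __contains__(self, w):
        # Accept only if every probed bit is set.
        return all(self.bits[j] for j in self._positions(w))
```

A usage example: after `bf = BloomFilter(n=256, k=3)` and `bf.add("password")`, the test `"password" in bf` is always true, while a word that was never added is rejected unless all k of its positions happen to be set.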
