
Randomness in Computing: Lecture 16


  1. Randomness in Computing: Lecture 16
     Last time
     • Hashing
     • Universal hash families
     Today
     • Using universal hash families
     • Perfect hashing
     • Bloom filters
     3/24/2020, Sofya Raskhodnikova, Randomness in Computing

  2. Static dictionary problem
     Motivating example: a password checker that prevents people from using common passwords.
     • 𝑇 is the set of common passwords
     • Universe: set 𝑉
     • 𝑇 ⊆ 𝑉 and 𝑛 = |𝑇|
     • 𝑛 ≪ |𝑉|
     Goal: a data structure for storing 𝑇 that supports the search query "Is 𝑥 ∈ 𝑇?" for all words 𝑥 ∈ 𝑉.

  3. Solutions
     Deterministic solutions
     • Store 𝑇 as a sorted array (or as a binary search tree). Search time: O(log 𝑛). Space: O(𝑛).
     • Store an array that, for each 𝑥 ∈ 𝑉, has 1 if 𝑥 ∈ 𝑇 and 0 otherwise. Search time: O(1). Space: O(|𝑉|).
     A randomized solution
     • Hashing
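
The two deterministic solutions can be sketched as follows. This is a minimal illustration, not code from the lecture: the function names are hypothetical, and the bitmap variant assumes the universe has been mapped to the integers 0, …, |V| − 1.

```python
from bisect import bisect_left

def build_sorted(words):
    """Store T as a sorted list: O(n) space."""
    return sorted(set(words))

def search_sorted(table, x):
    """Binary search: O(log n) comparisons."""
    i = bisect_left(table, x)
    return i < len(table) and table[i] == x

def build_bitmap(keys, universe_size):
    """One entry per element of the universe: O(|V|) space.

    Assumes keys are integers in range(universe_size).
    """
    bits = bytearray(universe_size)
    for k in keys:
        bits[k] = 1
    return bits

def search_bitmap(bits, x):
    """Single array access: O(1) time."""
    return bits[x] == 1
```

The sorted list is compact but slower to query; the bitmap answers in one step but its size depends on the universe rather than on the stored set.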

  4. Chain hashing
     • Hash table: 𝑜 bins; words that fall in the same bin are chained into a linked list.
     • Hash function: ℎ : 𝑉 → [𝑜]
     • To construct the table, hash all elements of 𝑇.
     • To search for a word 𝑥, check whether 𝑥 is in bin ℎ(𝑥).
     Desiderata for ℎ:
     • O(1) evaluation time.
     • O(1) space to store ℎ.
     [Figure: elements of 𝑇 hashed into bins 1, 2, …, 𝑜]
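
A minimal sketch of chain hashing, under stated assumptions: the class name `ChainHashTable` is hypothetical, Python lists stand in for the linked lists, and Python's built-in `hash` with a salt is used as a placeholder hash function (it is not a universal family).

```python
class ChainHashTable:
    """Chain hashing: each bin holds the words that hash to it."""

    def __init__(self, words, num_bins, salt=0):
        self.num_bins = num_bins
        self.salt = salt
        # One chain per bin (a Python list stands in for a linked list).
        self.bins = [[] for _ in range(num_bins)]
        # Construct the table by hashing every element of T.
        for w in words:
            self.bins[self._h(w)].append(w)

    def _h(self, x):
        # Placeholder hash function mapping a word to a bin index.
        return hash((self.salt, x)) % self.num_bins

    def contains(self, x):
        # Search only the chain in bin h(x).
        return x in self.bins[self._h(x)]
```

Search time is proportional to the length of the chain in bin ℎ(𝑥), which is why the load of individual bins matters.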

  5. Universal hash family
     • A set ℋ of hash functions is universal if, for every pair of distinct elements 𝑥₁, 𝑥₂ ∈ 𝑉 and for ℎ chosen uniformly from ℋ,
         Pr[ℎ(𝑥₁) = ℎ(𝑥₂)] ≤ 1/𝑜.
     Constructing a universal hash family
     • Fix a prime 𝑞 ≥ |𝑉| and think of the range as {0, 1, …, 𝑜 − 1}.
     • Define ℎ_{𝑏,𝑐}(𝑦) = ((𝑏𝑦 + 𝑐) mod 𝑞) mod 𝑜.
     • ℋ = {ℎ_{𝑏,𝑐} | 𝑏 ∈ [𝑞 − 1], 0 ≤ 𝑐 ≤ 𝑞 − 1}
     Theorem: ℋ is universal.
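
The construction can be tried out on a tiny universe. The sketch below assumes elements of 𝑉 are integers in {0, …, 𝑞 − 1}; `make_hash` and `collision_count` are hypothetical helper names, and 𝑞 = 17, 𝑜 = 5 are arbitrary illustrative values. It enumerates the whole family and checks the universality bound for one pair.

```python
import random

def make_hash(q, o, rng=random):
    """Draw h_{b,c} uniformly: b in {1, ..., q-1}, c in {0, ..., q-1}."""
    b = rng.randrange(1, q)
    c = rng.randrange(q)
    return lambda y: ((b * y + c) % q) % o

def collision_count(q, o, x1, x2):
    """Count the (b, c) pairs for which x1 and x2 land in the same bin."""
    return sum(
        1
        for b in range(1, q)
        for c in range(q)
        if ((b * x1 + c) % q) % o == ((b * x2 + c) % q) % o
    )

q, o = 17, 5                 # small illustrative values; q is prime
total = (q - 1) * q          # number of functions in the family
hits = collision_count(q, o, 3, 11)
# Universality requires hits / total <= 1 / o for every distinct pair.
print(hits * o <= total)
```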

  6. Using a universal family
     As before:
     • If 𝑥 ∉ 𝑇, the expected number of words in bin ℎ(𝑥) is ≤ 𝑛/𝑜.
     • If 𝑥 ∈ 𝑇, the expected number of words in bin ℎ(𝑥) is ≤ 1 + (𝑛 − 1)/𝑜.
     The previous guarantee on max load no longer holds!
     Goal: given 𝑇, find a hash function with no collisions for words in 𝑇.
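
Both bounds follow from linearity of expectation and the universality condition Pr[ℎ(𝑦) = ℎ(𝑥)] ≤ 1/𝑜 for 𝑦 ≠ 𝑥. A short derivation, written out here as a sketch rather than taken from the slides:

```latex
\begin{align*}
x \notin T:\quad \mathbb{E}\bigl[\#\{y \in T : h(y) = h(x)\}\bigr]
  &= \sum_{y \in T} \Pr[h(y) = h(x)] \le \frac{n}{o},\\
x \in T:\quad \mathbb{E}\bigl[\#\{y \in T : h(y) = h(x)\}\bigr]
  &= 1 + \sum_{y \in T,\ y \ne x} \Pr[h(y) = h(x)] \le 1 + \frac{n-1}{o}.
\end{align*}
```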

  7. Perfect hashing: no collisions
     Theorem: If ℎ : 𝑉 → {0, 1, …, 𝑜 − 1} is chosen uniformly at random from a universal hash family, then for every 𝑇 of size 𝑛 such that 𝑜 ≥ 𝑛², Pr[ℎ is perfect] ≥ 1/2.
     Proof: Let 𝑡₁, …, 𝑡ₙ be the elements of 𝑇.
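
One standard way to carry out the argument is to count colliding pairs and apply Markov's inequality. The following is a sketch under that assumption; the random variable 𝑌 is defined here for illustration.

```latex
\begin{align*}
Y &= \#\{(i,j) : i < j,\ h(t_i) = h(t_j)\},\\
\mathbb{E}[Y] &= \sum_{i<j} \Pr[h(t_i) = h(t_j)]
  \le \binom{n}{2}\cdot\frac{1}{o} < \frac{n^2}{2o} \le \frac{1}{2}
  \quad (\text{since } o \ge n^2),\\
\Pr[h \text{ is not perfect}] &= \Pr[Y \ge 1] \le \mathbb{E}[Y] < \frac{1}{2}.
\end{align*}
```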

  8. Perfect hashing
     Theorem: If ℎ : 𝑉 → {0, 1, …, 𝑜 − 1} is chosen uniformly at random from a universal hash family, then for every 𝑇 of size 𝑛 such that 𝑜 ≥ 𝑛², Pr[ℎ is perfect] ≥ 1/2.
     • Select ℎ ∈ ℋ until a perfect ℎ is found.
     • The expected number of tries is at most 2.
     • Each try takes O(𝑛) time.
     • Drawback: Ω(𝑛²) space.
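
The retry loop can be sketched as below. Assumptions: keys are distinct integers in {0, …, 𝑞 − 1}, 𝑞 is a prime at least the universe size, and the family is the (𝑏𝑦 + 𝑐) mod 𝑞 mod 𝑜 construction; `find_perfect_hash` is a hypothetical name.

```python
import random

def find_perfect_hash(keys, q, rng=None):
    """Draw hash functions into o = n^2 bins until none of the keys collide.

    Returns the parameters (b, c, o) and the number of tries used.
    """
    rng = rng or random.Random(0)
    n = len(keys)
    o = max(1, n * n)            # quadratic number of bins
    tries = 0
    while True:
        tries += 1
        b = rng.randrange(1, q)
        c = rng.randrange(q)
        # One pass over the keys: O(n) time per try.
        slots = {((b * y + c) % q) % o for y in keys}
        if len(slots) == n:      # all keys landed in distinct bins
            return (b, c, o), tries

params, tries = find_perfect_hash([3, 14, 15, 92, 65, 35], q=101)
```

Since each try succeeds with probability at least 1/2, the loop is expected to finish within two iterations; the cost is the table of 𝑛² bins.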

  9. 2-level scheme for perfect hashing
     • Set 𝑜 = 𝑛.
     • Select ℎ ∈ ℋ until an ℎ with at most 𝑛 collisions is found.
     • For each bin 𝑗 with collisions, that is, with 𝑙 > 1 items:
       – select a new hash function ℎ_𝑗 with 𝑙² bins from a universal family until ℎ_𝑗 has no collisions.
     [Figure: two columns of bins labeled 0, 1, 2, …, 𝑜 − 1]

  10. 2-level scheme for perfect hashing
     Theorem: The 2-level scheme achieves perfect hashing with O(𝑛) space.
     A solution for the static dictionary problem with:
     • O(1) worst-case guarantee on search time.
     • O(𝑛) space.
     • Expected O(𝑛) preprocessing time.
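
A minimal sketch of the 2-level construction and lookup. Assumptions, none of which come from the slides: keys are distinct non-negative integers below the prime `Q` = 2³¹ − 1, the set is non-empty, collisions are counted as colliding pairs, and the names `draw`, `build_two_level`, and `lookup` are hypothetical.

```python
import random

Q = 2_147_483_647  # the prime 2^31 - 1, assumed to be >= the universe size

def draw(o, rng):
    """Draw h(y) = ((b*y + c) mod Q) mod o uniformly from the family."""
    b, c = rng.randrange(1, Q), rng.randrange(Q)
    return lambda y: ((b * y + c) % Q) % o

def build_two_level(keys, rng=None):
    rng = rng or random.Random(0)
    n = len(keys)
    # Stage 1: o = n bins; retry until there are at most n colliding pairs.
    while True:
        h = draw(n, rng)
        bins = [[] for _ in range(n)]
        for y in keys:
            bins[h(y)].append(y)
        pairs = sum(len(b) * (len(b) - 1) // 2 for b in bins)
        if pairs <= n:
            break
    # Stage 2: a bin with size > 1 gets its own collision-free table
    # with size^2 slots; smaller bins are stored directly.
    second = []
    for items in bins:
        size = len(items)
        if size <= 1:
            second.append((None, list(items)))
            continue
        while True:
            hj = draw(size * size, rng)
            slots = [None] * (size * size)
            for y in items:
                if slots[hj(y)] is not None:
                    break            # collision: draw a new h_j
                slots[hj(y)] = y
            else:
                second.append((hj, slots))
                break
    return h, second

def lookup(table, x):
    """Two hash evaluations and one comparison: O(1) worst case."""
    h, second = table
    hj, slots = second[h(x)]
    if hj is None:
        return bool(slots) and slots[0] == x
    return slots[hj(x)] == x
```

Every second-level table is collision-free, so a query inspects exactly one slot regardless of how the keys were distributed in Stage 1.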

  11. Analysis of 2-level scheme
     Theorem: The 2-level scheme achieves perfect hashing with O(𝑛) space.
     Proof:
     • Let 𝑌 = number of collisions in Stage 1.
     • We showed before: Pr[𝑌 > 𝑛²/𝑜] ≤ 1/2.
     • Now 𝑜 = 𝑛, so Pr[𝑌 > 𝑛] ≤ 1/2.
     • So at least half of the ℎ ∈ ℋ have ≤ 𝑛 collisions.
     • Assume we found such an ℎ.

  12. Analysis of 2-level scheme
     Theorem: The 2-level scheme achieves perfect hashing with O(𝑛) space.
     Proof (continued):
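
A standard way to bound the second-level space, sketched here under the assumption that 𝑌 counts colliding pairs; the symbol 𝑙_𝑗 for the number of items in bin 𝑗 is introduced for illustration:

```latex
\begin{align*}
Y &= \sum_{j=0}^{o-1} \binom{l_j}{2} \le n,\\
\sum_{j} l_j^2 &= \sum_{j} \Bigl(2\binom{l_j}{2} + l_j\Bigr) = 2Y + n \le 3n.
\end{align*}
```

Since a bin with 𝑙_𝑗 items uses at most 𝑙_𝑗² second-level slots, the second level takes at most 3𝑛 slots in total, and with the 𝑛 first-level bins the whole structure uses O(𝑛) space.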

  13. Conclusion: 2-level hashing
     A solution for the static dictionary problem with:
     • O(1) worst-case guarantee on search time.
     • O(𝑛) space.
     • Expected O(𝑛) preprocessing time.

  14. Approximate solutions for static dictionary problem
     • False positives: if 𝑥 ∈ 𝑇, our data structure must answer correctly. If 𝑥 ∉ 𝑇, we may err with small probability.
     • E.g., we prevent all unsuitable passwords and some suitable ones, too.
     Fingerprints
     • Use a hash function ℎ.
     • Store a sorted list 𝑀 of fingerprints ℎ(𝑦), 𝑦 ∈ 𝑇.
     • To see if 𝑥 ∈ 𝑇, perform a binary search for ℎ(𝑥).
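
The fingerprint approach can be sketched as follows. Assumptions: a truncated SHA-256 digest stands in for the hash function ℎ (the slides do not specify one), fingerprints are at most 64 bits, and the function names are hypothetical.

```python
import hashlib
from bisect import bisect_left

def fingerprint(word, bits=32):
    """Stand-in for h: the top `bits` bits of a SHA-256 digest (bits <= 64)."""
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") >> (64 - bits)

def build(words, bits=32):
    """Store the sorted list of fingerprints h(y) for y in T."""
    return sorted({fingerprint(w, bits) for w in words})

def maybe_contains(fps, x, bits=32):
    """Binary search for h(x). Never wrong for x in T; may err for x not in T."""
    f = fingerprint(x, bits)
    i = bisect_left(fps, f)
    return i < len(fps) and fps[i] == f
```

Shorter fingerprints save space but raise the chance that a word outside 𝑇 shares a fingerprint with a stored one.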

  15. Bloom filters
     • Trade-off between space and false positive probability
     • Parameters: 𝑙, 𝑜
     • Bloom filter: array of 𝑜 bits 𝐵[1], …, 𝐵[𝑜]
       – Initially, all bits are 0.
       – 𝑙 independent random hash functions ℎ₁, …, ℎ_𝑙 with range [𝑜]
     • To represent set 𝑇:
       – For each 𝑦 ∈ 𝑇 and 𝑗 ∈ [𝑙], set bit 𝐵[ℎ_𝑗(𝑦)] to 1.
     • To search for 𝑥:
       – If 𝐵[ℎ_𝑗(𝑥)] = 1 for all 𝑗 ∈ [𝑙], accept; otherwise reject.
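
A minimal Bloom filter sketch. Assumptions: SHA-256 salted with the index 𝑗 stands in for the 𝑙 independent random hash functions, positions are 0-based, and a `bytearray` holds one bit per byte for simplicity; the class and method names are hypothetical.

```python
import hashlib

class BloomFilter:
    def __init__(self, num_bits, num_hashes):
        self.o = num_bits              # array size (parameter o)
        self.l = num_hashes            # number of hash functions (parameter l)
        self.bits = bytearray(num_bits)  # all bits start at 0

    def _positions(self, item):
        # Stand-in for h_1, ..., h_l: SHA-256 salted with the index j.
        for j in range(self.l):
            d = hashlib.sha256(f"{j}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(d[:8], "big") % self.o

    def add(self, item):
        # Set bit B[h_j(item)] to 1 for every j.
        for p in self._positions(item):
            self.bits[p] = 1

    def might_contain(self, item):
        # Accept only if all l bits are set; false positives are possible.
        return all(self.bits[p] for p in self._positions(item))
```

An added item is always accepted, because its bits are never cleared; an item that was not added is rejected unless all of its positions happen to have been set by other items.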
