
CS 473: Algorithms, Lecture 10: Universal Hashing. Chandra Chekuri and Ruta Mehta, University of Illinois, Urbana-Champaign



  1. CS 473: Algorithms
     Chandra Chekuri, Ruta Mehta
     University of Illinois, Urbana-Champaign, Fall 2016
     Slide footer: Chandra & Ruta (UIUC), CS473, Fall 2016 (32 slides)

  2. CS 473: Algorithms, Fall 2016
     Universal Hashing
     Lecture 10, September 23, 2016

  3. Part I: Hash Tables

  4. Dictionary Data Structure
     (1) U: universe of keys with a total order: numbers, strings, etc.
     (2) Data structure to store a subset S ⊆ U.
     (3) Operations:
         (a) Search/lookup: given x ∈ U, is x ∈ S?
         (b) Insert: given x ∉ S, add x to S.
         (c) Delete: given x ∈ S, delete x from S.
     (4) Static structure: S is given in advance or changes very infrequently; the main operations are lookups.
     (5) Dynamic structure: S changes rapidly, so inserts and deletes are as important as lookups.
     Can we do everything in O(1) time?

  5. Hashing and Hash Tables
     Hash table data structure:
     (1) A (hash) table/array T of size m (the table size).
     (2) A hash function h : U → {0, ..., m − 1}.
     (3) Item x ∈ U hashes to slot h(x) in T.
     Given S ⊆ U, how do we store S and how do we do lookups?
     Ideal situation:
     (1) Each element x ∈ S hashes to a distinct slot in T. Store x in slot h(x).
     (2) Lookup: given y ∈ U, check whether T[h(y)] = y. O(1) time!
     Collisions are unavoidable if |T| < |U|. Several techniques handle them.

  9. Handling Collisions: Chaining
     Collision: h(x) = h(y) for some x ≠ y.
     Chaining (open hashing) to handle collisions:
     (1) For each slot i, store all items hashed to slot i in a linked list. T[i] points to the linked list.
     (2) Lookup: to find whether y ∈ U is in T, check the linked list at T[h(y)]. Time is proportional to the size of the linked list.
     Does hashing give O(1) time per operation for dictionaries?
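The chaining scheme on this slide can be sketched as follows. This is a minimal illustration, not the lecture's code: the class name `ChainedHashTable` is hypothetical, Python lists stand in for the linked lists, and the hash function `x % 5` is an arbitrary choice for the demo.

```python
class ChainedHashTable:
    """Hash table with chaining: slot i holds every key hashed to i."""

    def __init__(self, m, h):
        self.m = m                        # table size
        self.h = h                        # hash function U -> {0, ..., m-1}
        self.T = [[] for _ in range(m)]   # T[i] is the chain for slot i

    def lookup(self, y):
        # Scans one chain, so time is proportional to that chain's length.
        return y in self.T[self.h(y)]

    def insert(self, x):
        chain = self.T[self.h(x)]
        if x not in chain:                # keep S a set: no duplicate keys
            chain.append(x)

    def delete(self, x):
        chain = self.T[self.h(x)]
        if x in chain:
            chain.remove(x)


# Demo with an arbitrary (assumed) hash function; 3, 8 and 13 collide in slot 3.
table = ChainedHashTable(m=5, h=lambda x: x % 5)
for key in (3, 8, 13, 7):
    table.insert(key)
```

A lookup of 18 also lands in slot 3 and has to scan the whole three-element chain before reporting that 18 is absent, which is the cost the slide's closing question is about.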

  10. Hash Functions
      Parameters: N = |U| (very large), m = |T|, n = |S|.
      Goal: O(1)-time lookup, insertion, deletion.
      Single hash function: if N ≥ m², then for any hash function h : U → T there exists i < m such that at least N/m ≥ m elements of U get hashed to slot i.
      Any S containing all of these is a very, very bad set for h!
      Such a bad set may lead to O(m) lookup time!
      Lesson: consider a family H of hash functions with good properties and choose h uniformly at random from H.
      Guarantees: a small number of collisions in expectation for a given S.
      H should allow efficient sampling.
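The pigeonhole argument can be made concrete with a small sketch. The helper `bad_set` and the particular fixed function `h` below are illustrative assumptions; any fixed h on a universe of size N ≥ m² would yield a bad set the same way.

```python
from collections import defaultdict


def bad_set(h, universe):
    """Return the largest group of keys in `universe` that h maps to one slot."""
    buckets = defaultdict(list)
    for x in universe:
        buckets[h(x)].append(x)
    return max(buckets.values(), key=len)


m = 10
N = m * m                            # universe {0, ..., N-1} with N = m^2
h = lambda x: (7 * x + 3) % m        # an arbitrary fixed hash function (assumed)
S = bad_set(h, range(N))             # all keys in S collide under h
```

Because some slot must receive at least N/m keys, `S` has at least m elements, and storing S with this h puts every key into a single chain.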

  14. Universal Hashing
      Question: what are good properties of H in distributing data?
      (1) Uniform: consider any element x ∈ U. If h ∈ H is picked randomly, then x should go into a random slot in T. In other words, Pr[h(x) = i] = 1/m for every 0 ≤ i < m.
      (2) Universal: consider any two distinct elements x, y ∈ U. If h ∈ H is picked randomly, then the probability of a collision between x and y should be at most 1/m. In other words, Pr[h(x) = h(y)] ≤ 1/m (this cannot be much smaller when N is large).
      (3) The second property is stronger than the first and is the crucial one.
      Definition: a family of hash functions H is (2-)universal if for all distinct x, y ∈ U, Pr_{h∼H}[h(x) = h(y)] ≤ 1/m, where m is the table size.

  18. Analyzing Universal Hashing
      Question: fixing a set S, what is the expected time to look up x ∈ S when h is picked uniformly at random from H?
      (1) ℓ(x): the size of the list at T[h(x)]. We want E[ℓ(x)].
      (2) For y ∈ S let D_y be one if h(y) = h(x), else zero. Then ℓ(x) = Σ_{y ∈ S} D_y.
      E[ℓ(x)] = Σ_{y ∈ S} E[D_y]
              = Σ_{y ∈ S} Pr[h(x) = h(y)]
              ≤ 1 + Σ_{y ∈ S, y ≠ x} 1/m    (since H is a universal hash family; the term for y = x is 1)
              = 1 + (|S| − 1)/m
              < 2 if |S| ≤ m.
      Ignoring the y = x term, this is the bound |S|/m ≤ 1 for |S| ≤ m.
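As a sanity check on the expectation, here is a small simulation sketch. It assumes h is a uniformly random function, for which two distinct keys collide with probability exactly 1/m, so the expected chain length seen by a key x ∈ S is 1 + (n − 1)/m once x itself is counted. The function name, parameters and seed are illustrative choices.

```python
import random


def average_chain_length(n, m, trials, seed=0):
    """Estimate E[l(x)] for a key x in S, with |S| = n and table size m."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        # Draw a fresh random h: one independent uniform slot per key in S.
        slots = [rng.randrange(m) for _ in range(n)]
        x_slot = slots[0]                               # x is the first key of S
        total += sum(1 for s in slots if s == x_slot)   # l(x), counting x itself
    return total / trials


estimate = average_chain_length(n=50, m=50, trials=20000)
exact = 1 + (50 - 1) / 50     # 1 + (n - 1)/m for a uniformly random function
```

With n = m the estimate should sit close to `exact`, that is, a constant independent of n.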

  20. Analyzing Universal Hashing
      Question: what is the expected time to look up x in T using h, assuming chaining is used to resolve collisions?
      Answer: O(1 + n/m), which is O(1) when n = O(m).
      Comments:
      (1) O(1) expected time also holds for insertion.
      (2) The analysis assumes a static set S, but it holds as long as S is a set formed with at most O(m) insertions and deletions.
      (3) Worst case: lookup time can be large! How large? Ω(log n / log log n).
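The gap between the expected and the worst-case lookup time can be observed with a short experiment. This sketch assumes a uniformly random hash function and n = m; the function name and the seed are illustrative, and the exact value printed depends on the random draw.

```python
import random
from collections import Counter


def max_chain_length(n, m, seed=0):
    """Length of the longest chain when n keys get independent uniform slots."""
    rng = random.Random(seed)
    loads = Counter(rng.randrange(m) for _ in range(n))
    return max(loads.values())


n = 10000
longest = max_chain_length(n, n)   # table size m = n, so the average load is 1
print("longest chain:", longest)
```

Even though the average chain length is 1 here, the longest chain is noticeably larger, in line with the slide's Ω(log n / log log n) remark.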

  22. Universal Hash Family
      Universal: H such that Pr[h(x) = h(y)] ≤ 1/m for all distinct x, y ∈ U.
      All functions: H is the set of all possible functions h : U → {0, ..., m − 1}.
      (1) Universal.
      (2) |H| = m^|U|, so representing h requires |U| log m bits. Not O(1)!
      We need a compactly representable universal family.
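For a tiny universe the "all functions" family can be enumerated outright, which lets both claims be checked exactly. The sizes |U| = 4 and m = 3 below are arbitrary choices for illustration.

```python
from fractions import Fraction
from itertools import combinations, product

U = range(4)    # tiny universe {0, 1, 2, 3}
m = 3           # table size

# A function h : U -> {0, ..., m-1} is represented by the tuple (h(0), ..., h(3)).
H = list(product(range(m), repeat=len(U)))


def collision_probability(x, y):
    """Exact Pr[h(x) = h(y)] when h is drawn uniformly from H."""
    hits = sum(1 for h in H if h[x] == h[y])
    return Fraction(hits, len(H))


probs = {(x, y): collision_probability(x, y) for x, y in combinations(U, 2)}
```

Every pair collides with probability exactly 1/m, but the family already has m^|U| = 81 members for this toy universe, which is why a compact family is needed.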

  25. Compact Universal Hash Family
      Parameters: N = |U|, m = |T|, n = |S|.
      (1) Choose a prime number p ≥ N. Z_p = {0, 1, ..., p − 1} is a field.
      (2) For a, b ∈ Z_p with a ≠ 0, define the hash function h_{a,b} as h_{a,b}(x) = ((ax + b) mod p) mod m.
      (3) Let H = {h_{a,b} | a, b ∈ Z_p, a ≠ 0}. Note that |H| = p(p − 1).
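A minimal sketch of this family, assuming the universe is {0, ..., N − 1}. The helper names (`make_hash`, `sample_hash`, `worst_collision_probability`) and the small parameters N = p = 17, m = 5 are hypothetical choices for illustration. The last helper checks the universal property by brute force over all p(p − 1) functions.

```python
import random
from fractions import Fraction
from itertools import combinations


def make_hash(a, b, p, m):
    """h_{a,b}(x) = ((a*x + b) mod p) mod m, with a != 0."""
    return lambda x: ((a * x + b) % p) % m


def sample_hash(p, m, rng=random):
    """Pick h uniformly from H: only the two numbers a and b are stored."""
    a = rng.randrange(1, p)    # a in {1, ..., p-1}
    b = rng.randrange(0, p)    # b in {0, ..., p-1}
    return make_hash(a, b, p, m)


def worst_collision_probability(N, p, m):
    """Exact max over distinct x, y in {0..N-1} of Pr[h(x) = h(y)], h ~ H."""
    family = [make_hash(a, b, p, m) for a in range(1, p) for b in range(p)]
    worst = Fraction(0)
    for x, y in combinations(range(N), 2):
        hits = sum(1 for h in family if h(x) == h(y))
        worst = max(worst, Fraction(hits, len(family)))
    return worst


worst = worst_collision_probability(N=17, p=17, m=5)
h = sample_hash(p=17, m=5)
```

For these parameters `worst` is at most 1/5, consistent with the family being universal, while sampling a function needs only two random numbers below p.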
