hash tables

Hash Tables: Direct-Address Tables, Hash Functions, Universal Hashing



  1. Hash Tables: Direct-Address Tables, Hash Functions, Universal Hashing, Chaining, Open Addressing. CS 5633 Analysis of Algorithms, Chapter 11.

  2. Direct-Address Tables
     Let $U = \{0, \ldots, m-1\}$ be the set of possible keys.
     Use array $T[0 \ldots m-1]$ as a direct-address table. This implies a 1-1 correspondence between keys and slots.
       Direct-Address-Search(T, k)
         return T[k]
       Direct-Address-Insert(T, x)
         T[x.key] ← x
       Direct-Address-Delete(T, x)
         T[x.key] ← nil
     Advantage: operations are $\Theta(1)$.
     Disadvantage: $\Theta(|U|)$ space required.
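
The three operations above can be sketched in Python as follows. This is a minimal illustration, not part of the slides: the `Item` record and the class and method names are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical record type: an element x with a key attribute."""
    key: int
    value: str


class DirectAddressTable:
    """Direct-address table over the key universe {0, ..., m-1}."""

    def __init__(self, m):
        # One slot per possible key, hence Theta(|U|) space.
        self.slots = [None] * m

    def search(self, k):
        # Direct-Address-Search: the key is the array index.
        return self.slots[k]

    def insert(self, x):
        # Direct-Address-Insert: T[x.key] <- x
        self.slots[x.key] = x

    def delete(self, x):
        # Direct-Address-Delete: T[x.key] <- nil
        self.slots[x.key] = None
```

Each method is a single array access, which is where the $\Theta(1)$ bound comes from.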

  3. Hash Tables
     Let $K$ be the set of keys to be stored.
     Goal: use $\Theta(|K|)$ space and $\Theta(1)$ time per operation.
     Idea: use array $T[0 \ldots m-1]$ as a hash table, and use a $\Theta(1)$ hash function $h$, where $h : U \to \{0, \ldots, m-1\}$ maps from keys to slots.
     A collision occurs when two keys map to the same slot.

  4. Good Hash Functions
     Division method: $h(k) = k \bmod m$, where $m$ is prime and not close to any $2^i$.
     Division variation: $h(k) = (k \bmod M) \bmod m$, where $M$ is prime, much smaller than $|U|$, and not close to any $2^i$, and $m$ is much smaller than $M$.
     Multiplication method: $h(k) = \lfloor m((kA) \bmod 1) \rfloor$, where $m$ is a power of 2 and $A = (\sqrt{5} - 1)/2$.
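
A short Python sketch of the three hash functions on this slide. The function names are assumptions made for illustration, and the multiplication method is written with floating-point arithmetic for brevity, which is only reasonable for moderately sized integer keys.

```python
import math

# Constant suggested on the slide: A = (sqrt(5) - 1) / 2
A = (math.sqrt(5) - 1) / 2


def hash_division(k, m):
    """Division method: h(k) = k mod m (m should be prime)."""
    return k % m


def hash_division_two_stage(k, M, m):
    """Division variation: h(k) = (k mod M) mod m, with M prime and m << M."""
    return (k % M) % m


def hash_multiplication(k, m):
    """Multiplication method: h(k) = floor(m * ((k * A) mod 1)).

    Assumes m is a power of 2, so scaling the fractional part by m is exact
    and the result stays in {0, ..., m-1}.
    """
    fractional = (k * A) % 1.0
    return math.floor(m * fractional)
```

A production version of the multiplication method would normally use fixed-point integer arithmetic instead of floats; that variant is not shown on the slide.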

  5. Horner's Method for Division Hash Function
     If $k = \langle k[1], \ldots, k[l] \rangle$ and $0 \le k[i] < r$ (that is, $k$ is written as a sequence of radix-$r$ digits), then compute the hash function by:
       h ← k[1] mod m
       for i ← 2 to l
         do h ← (rh + k[i]) mod m
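
The loop above translates almost line for line into Python. In this sketch the name `horner_hash` and the choice to pass the digits as a list (most significant digit first) are assumptions.

```python
def horner_hash(digits, r, m):
    """Compute (digits read as a radix-r number) mod m without building
    the full number, reducing mod m after every step."""
    h = digits[0] % m
    for d in digits[1:]:
        h = (r * h + d) % m
    return h


# Usage: hash a byte string as a radix-256 number with a prime table size.
key = list(b"abc")
slot = horner_hash(key, 256, 101)
```

Because the intermediate value is reduced at every step, it never exceeds $r \cdot m$, even for very long keys.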

  6. Universal Hashing
     Let $H$ be a set of hash functions.
     $H$ is universal if, for any two distinct keys $k \ne k'$, a function $h$ chosen at random from $H$ gives $h(k) = h(k')$ with probability at most $1/m$.
     Construction: let $m$ be a prime number and $k = \langle k[1], \ldots, k[l] \rangle$, where $0 \le k[i] < m$.
     Assign $a[i] \leftarrow$ Random$(0, m-1)$ for each $i$, and define
       $h(k) = \left( \sum_{i=1}^{l} a[i] \cdot k[i] \right) \bmod m$
     The set of possible functions $h$ is universal: for $k \ne k'$, $h(k) = h(k')$ with probability $1/m$.
     Reason: if $k[i] \ne k'[i]$, then $(a[i] \cdot (k[i] - k'[i])) \bmod m$ is equally likely to take each value in $\{0, \ldots, m-1\}$.
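
A small Python sketch of this construction, plus a brute-force check of the collision probability for a tiny prime $m$. All function names here are assumptions introduced for the example.

```python
import itertools
import random


def dot_hash(a, key, m):
    """h(k) = (sum of a[i] * k[i]) mod m for a fixed coefficient vector a."""
    return sum(ai * ki for ai, ki in zip(a, key)) % m


def make_universal_hash(length, m, rng=random):
    """Draw each a[i] uniformly from {0, ..., m-1} and return the resulting h."""
    a = [rng.randrange(m) for _ in range(length)]
    return lambda key: dot_hash(a, key, m)


def collision_count(k1, k2, m):
    """Count the coefficient vectors a under which k1 and k2 collide."""
    length = len(k1)
    return sum(
        dot_hash(a, k1, m) == dot_hash(a, k2, m)
        for a in itertools.product(range(m), repeat=length)
    )
```

For $m = 5$ and two distinct keys of length 2, `collision_count` finds that 5 of the 25 coefficient vectors cause a collision, which is the fraction $1/m$ claimed above.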

  7. Chaining
     In chaining, each slot is a linked list of the elements that hash to that slot, i.e., the collisions.
     Consider $m$ slots, $n$ elements, and load factor $\alpha = n/m$.
     Worst case: $\Theta(n)$, if all elements hash to the same slot.
     Best case: $\Theta(1 + \alpha)$, when each slot has $\lfloor \alpha \rfloor$ or $\lceil \alpha \rceil$ elements.
     Average case: assume each element is equally likely to hash to each slot.
     Unsuccessful search: $\Theta(1 + \alpha)$, because the average list length is $\alpha$.
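
A minimal Python sketch of a chained hash table. It uses Python lists as the chains and the built-in `hash` reduced mod $m$ as a stand-in hash function; both choices, and all names, are assumptions for illustration rather than anything prescribed by the slides.

```python
class ChainedHashTable:
    """Hash table with chaining: each slot holds a list of (key, value) pairs."""

    def __init__(self, m):
        self.m = m          # number of slots
        self.n = 0          # number of stored elements
        self.slots = [[] for _ in range(m)]

    def _h(self, key):
        # Stand-in hash function (assumption): built-in hash, reduced mod m.
        return hash(key) % self.m

    def insert(self, key, value):
        chain = self.slots[self._h(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:            # key already present: overwrite its value
                chain[i] = (key, value)
                return
        chain.append((key, value))  # new key: append to the end of the chain
        self.n += 1

    def search(self, key):
        # Scans one chain only; its expected length is the load factor.
        for k, v in self.slots[self._h(key)]:
            if k == key:
                return v
        return None

    def delete(self, key):
        chain = self.slots[self._h(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                self.n -= 1
                return True
        return False

    def load_factor(self):
        return self.n / self.m
```

Inserting keys that all land in one slot reproduces the $\Theta(n)$ worst case: every search then walks a single chain of length $n$.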

  8. Chaining, Part 2
     Successful search: $\Theta(1 + \alpha)$
     Before the $i$th element is inserted, the average list length is $(i-1)/m$.
     Expected position of the $i$th element, if it is appended to the end of its list: $1 + (i-1)/m$.
     Expected search length is the summation over:
       $n$ elements to search for: $\sum_{i=1}^{n}$
       probability $1/n$ that the search is for the $i$th element
       expected position $1 + (i-1)/m$ of the $i$th element
     $\frac{1}{n} \sum_{i=1}^{n} \left( 1 + \frac{i-1}{m} \right) = 1 + \frac{\alpha}{2} - \frac{1}{2m}$
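
As a quick numerical sanity check of the closed form, the sum can be evaluated directly and compared with $1 + \alpha/2 - 1/(2m)$. The function names below are assumptions for this sketch.

```python
def successful_search_sum(n, m):
    """Evaluate (1/n) * sum_{i=1..n} (1 + (i-1)/m) term by term."""
    return sum(1 + (i - 1) / m for i in range(1, n + 1)) / n


def successful_search_closed_form(n, m):
    """Closed form 1 + alpha/2 - 1/(2m), with alpha = n/m."""
    alpha = n / m
    return 1 + alpha / 2 - 1 / (2 * m)


# Usage: the two values agree up to floating-point rounding.
difference = abs(successful_search_sum(50, 20) - successful_search_closed_form(50, 20))
```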

  9. Open-Address Hashing
     In open addressing, when a collision occurs, probe for an empty slot and insert the new element there.
     The hash function becomes: $h : U \times \{0, \ldots, m-1\} \to \{0, \ldots, m-1\}$
     The probe sequence $\langle h(k, 0), \ldots, h(k, m-1) \rangle$ should include all the slots.

  10. Open-Address Hashing, Part 2
      Hash-Insert(T, x)
        for i ← 0 to m − 1
          do j ← h(x.key, i)
             if T[j] = nil
               then T[j] ← x
                    return j
        error "hash table overflow"
      Hash-Delete marks the slot as deleted.
      Hash-Search must continue past deleted slots.
      Hash-Insert can put new elements in deleted slots.
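
The following Python sketch puts the insert pseudocode and the three remarks about deletion together. Several details are assumptions made for the example: the `DELETED` sentinel object, the pluggable `probe(key, i, m)` function with a linear-probing default, storing bare integer keys instead of elements `x`, and the fact that `insert` reuses deleted slots (which the slide permits but the pseudocode does not show).

```python
DELETED = object()  # tombstone marking a slot whose element was deleted


class OpenAddressTable:
    """Open-address hash table storing keys directly in the slots."""

    def __init__(self, m, probe=None):
        self.m = m
        self.slots = [None] * m
        # Default probe sequence (assumption): linear probing on integer keys.
        self.probe = probe or (lambda key, i, m: (key + i) % m)

    def insert(self, key):
        for i in range(self.m):
            j = self.probe(key, i, self.m)
            # Empty and deleted slots are both available for new elements.
            if self.slots[j] is None or self.slots[j] is DELETED:
                self.slots[j] = key
                return j
        raise OverflowError("hash table overflow")

    def search(self, key):
        for i in range(self.m):
            j = self.probe(key, i, self.m)
            if self.slots[j] is None:
                return None          # a never-used slot ends the search
            if self.slots[j] is not DELETED and self.slots[j] == key:
                return j
            # Otherwise: occupied by another key or deleted, keep probing.
        return None

    def delete(self, key):
        j = self.search(key)
        if j is not None:
            self.slots[j] = DELETED  # mark, do not empty, so searches continue
        return j
```

Marking instead of emptying matters: if `delete` wrote `None`, a later search for a key stored further along the same probe sequence would stop early and miss it.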

  11. Uniform Hashing Analysis
      Uniform hashing assumes each open-address probe sequence is equally likely.
      Unsuccessful search: $\Theta\left(\frac{1}{1-\alpha}\right)$
      Let $p_i$ = probability that exactly $i$ probes find full slots.
      Let $q_i$ = probability that the first $i$ probes find full slots.
      $p_i = q_i - q_{i+1}$
      $q_1 = n/m = \alpha$ and $q_2 = \left(\frac{n}{m}\right)\left(\frac{n-1}{m-1}\right) < \alpha^2$
      $q_i = \prod_{k=0}^{i-1} \frac{n-k}{m-k} \le \left(\frac{n}{m}\right)^i = \alpha^i$

  12. Uniform Hashing Analysis, Part 2
      Average number of probes is:
      $1 + \sum_{i=1}^{n} i\,p_i = 1 + \sum_{i=1}^{n} q_i \le \sum_{i=0}^{\infty} \alpha^i = \frac{1}{1-\alpha}$
      Successful search: $\Theta\left(\frac{1}{\alpha} \ln \frac{1}{1-\alpha}\right)$
      Inserting the $i$th element = unsuccessful search with $i-1$ elements.
      Average number of probes is:
      $\frac{1}{n} \sum_{i=1}^{n} \frac{1}{1 - (i-1)/m} \le \frac{1}{\alpha} \ln \frac{1}{1-\alpha}$
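
Both bounds can be checked numerically for concrete $n$ and $m$. The sketch below evaluates the two sums from the analysis directly, under the uniform hashing assumption and with $n < m$; the function names are assumptions.

```python
import math


def unsuccessful_probes(n, m):
    """Evaluate 1 + sum_{i=1..n} q_i, where q_i = prod_{k<i} (n-k)/(m-k)."""
    total, q = 1.0, 1.0
    for i in range(1, n + 1):
        q *= (n - (i - 1)) / (m - (i - 1))  # q_i from q_{i-1}
        total += q
    return total


def successful_probes_sum(n, m):
    """Evaluate (1/n) * sum_{i=1..n} 1 / (1 - (i-1)/m)."""
    return sum(1 / (1 - (i - 1) / m) for i in range(1, n + 1)) / n


# Usage: compare against the closed-form bounds at load factor alpha = 1/2.
n, m = 5, 10
alpha = n / m
unsuccessful_bound = 1 / (1 - alpha)
successful_bound = (1 / alpha) * math.log(1 / (1 - alpha))
within_bounds = (
    unsuccessful_probes(n, m) <= unsuccessful_bound
    and successful_probes_sum(n, m) <= successful_bound
)
```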

  13. Performance of Practical Methods
      Linear probing: $h(k, i) = (h'(k) + i) \bmod m$
      Successful search: $\Theta\left(\frac{1}{1-\alpha}\right)$
      Unsuccessful search: $\Theta\left(\frac{1}{(1-\alpha)^2}\right)$
      Linear probing suffers from primary clustering, that is, from long runs of occupied slots. An empty slot preceded by $i$ full slots gets filled next with probability $(i+1)/m$.
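
To make the clustering effect concrete, the sketch below computes, for a given occupancy pattern, the probability that each empty slot is the next one filled. It assumes $h'(k)$ is uniform over the $m$ slots and that the table wraps around; the function name is an assumption.

```python
from fractions import Fraction


def next_fill_probabilities(occupied):
    """Map each empty slot j to (i + 1)/m, where i is the length of the run
    of full slots immediately preceding j (wrapping around the table)."""
    m = len(occupied)
    probs = {}
    for j in range(m):
        if occupied[j]:
            continue
        run = 0
        while run < m - 1 and occupied[(j - 1 - run) % m]:
            run += 1
        probs[j] = Fraction(run + 1, m)
    return probs


# Usage: slots 0-2 are full, slots 3-7 are empty.
probs = next_fill_probabilities([True, True, True, False, False, False, False, False])
```

In this example slot 3 sits behind a run of three full slots and so has probability 4/8 of being filled next, while each of the other empty slots has probability 1/8: the existing cluster is the most likely thing to grow.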

  14. Performance of Practical Methods
      Quadratic probing assumes $m$ is a power of 2:
      $h(k, i) = \left( h'(k) + \frac{i}{2} + \frac{i^2}{2} \right) \bmod m$
      Successful search: $\Theta\left(\frac{1}{\alpha} \ln \frac{1}{1-\alpha}\right)$
      Unsuccessful search: $\Theta\left(\frac{1}{1-\alpha}\right)$
      Double hashing, with $m$ prime and $1 \le h_2(k) \le m-1$:
      $h(k, i) = (h_1(k) + i\,h_2(k)) \bmod m$
      Successful search: $\Theta\left(\frac{1}{\alpha} \ln \frac{1}{1-\alpha}\right)$
      Unsuccessful search: $\Theta\left(\frac{1}{1-\alpha}\right)$
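
The two probe sequences can be sketched in Python and checked against the requirement that a probe sequence visit every slot. The function names are assumptions, and the probe functions take the already computed values $h'(k)$, $h_1(k)$ and $h_2(k)$ as plain integers.

```python
def quadratic_probe(h_prime, i, m):
    """(h'(k) + i/2 + i^2/2) mod m; i + i*i is always even, so the
    integer division below is exact."""
    return (h_prime + (i + i * i) // 2) % m


def double_hash_probe(h1, h2, i, m):
    """(h1(k) + i * h2(k)) mod m, with m prime and 1 <= h2 <= m - 1."""
    return (h1 + i * h2) % m


def visits_every_slot(sequence, m):
    """True if the probe sequence is a permutation of {0, ..., m-1}."""
    return sorted(sequence) == list(range(m))


# Usage: quadratic probing with m = 16 (a power of 2), double hashing with m = 7 (prime).
quadratic_ok = all(
    visits_every_slot([quadratic_probe(h, i, 16) for i in range(16)], 16)
    for h in range(16)
)
double_ok = all(
    visits_every_slot([double_hash_probe(h1, h2, i, 7) for i in range(7)], 7)
    for h1 in range(7)
    for h2 in range(1, 7)
)
```

The conditions on $m$ are what make these checks pass: with $m$ a power of 2 the offsets $i(i+1)/2$ are all distinct mod $m$, and with $m$ prime every nonzero $h_2(k)$ is relatively prime to $m$.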
