
CSE 326: Data Structures: Hash Tables



  1. CSE 326: Data Structures
     Hash Tables
     Hal Perkins, Spring 2007, Lecture 16

     Dictionary Implementations So Far
     • Operations compared: Insert, Find, Delete
     • Implementations compared: unsorted linked list, sorted array, BST, AVL, splay (amortized)

     Hash Tables
     • Constant time accesses!
     • A hash table is an array of some fixed size; the size is usually a prime number.
     • General idea: a hash function h(K) maps each key in the key space (e.g., integers, strings) to a slot of the hash table, numbered 0 … TableSize–1.

     Example
     • key space = integers
     • TableSize = 10
     • h(K) = K mod 10
     • Insert: 7, 18, 41, 94 (table slots 0 … 9)
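The Example slide can be traced with a few lines of code. This is only a sketch: the function name `h` follows the slide, while `TABLE_SIZE` and `table` are names chosen here for illustration, and no collision handling is included because these four keys happen to land in different slots.

```python
def h(key: int, table_size: int) -> int:
    """Hash function from the slide: h(K) = K mod TableSize."""
    return key % table_size


TABLE_SIZE = 10                      # TableSize from the slide
table = [None] * TABLE_SIZE          # empty slots are represented by None

for key in (7, 18, 41, 94):
    # These keys hash to distinct slots, so a plain store is enough here.
    table[h(key, TABLE_SIZE)] = key

for slot, key in enumerate(table):
    print(slot, key)
```

The keys end up in slots 7, 8, 1, and 4 respectively.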

  2. Another Example
     • key space = integers
     • TableSize = 6
     • h(K) = K mod 6
     • Insert: 7, 18, 41, 34 (table slots 0 … 5)

     Hash Functions
     A hash function should:
     1. be simple/fast to compute,
     2. avoid collisions,
     3. have keys distributed evenly among cells.
     Perfect Hash function:

     Sample Hash Functions
     • key space = strings
     • s = s_0 s_1 s_2 … s_{k-1}
     1. h(s) = s_0 mod TableSize
     2. $h(s) = \left(\sum_{i=0}^{k-1} s_i\right) \bmod \text{TableSize}$
     3. $h(s) = \left(\sum_{i=0}^{k-1} s_i \cdot 37^i\right) \bmod \text{TableSize}$

     Collision Resolution
     Collision: when two keys map to the same location in the hash table.
     Two ways to resolve collisions:
     1. Separate Chaining
     2. Open Addressing (linear probing, quadratic probing, double hashing)
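The three sample string hash functions can be written out as a rough sketch. The function names `h_first`, `h_sum`, and `h_poly` are made up for this illustration, and treating each character s_i as its code point via `ord` is an assumption; the slide does not say how characters become numbers. Strings are assumed to be non-empty.

```python
def h_first(s: str, table_size: int) -> int:
    """Sample 1: use only the first character, s_0 mod TableSize."""
    return ord(s[0]) % table_size


def h_sum(s: str, table_size: int) -> int:
    """Sample 2: sum of all characters, mod TableSize."""
    return sum(ord(c) for c in s) % table_size


def h_poly(s: str, table_size: int) -> int:
    """Sample 3: sum of s_i * 37**i, mod TableSize, via Horner's rule."""
    total = 0
    # Walking from the last character to the first makes s_0 the
    # coefficient of 37**0, s_1 the coefficient of 37**1, and so on.
    for c in reversed(s):
        total = total * 37 + ord(c)
    return total % table_size


print(h_first("apple", 11), h_first("avocado", 11))  # same first letter
print(h_sum("ab", 11), h_sum("ba", 11))              # anagrams
print(h_poly("ab", 11), h_poly("ba", 11))
```

The first function sends every string with the same first letter to one slot, and the second cannot tell anagrams apart; the third weights each position differently, which is why it spreads keys better.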

  3. Separate Chaining
     • Insert: 10, 22, 107, 12, 42 (table slots 0 … 9)
     • Separate chaining: all keys that map to the same hash value are kept in a list (or "bucket").

     Analysis of find
     • Defn: The load factor, λ, of a hash table is the ratio N / M, where N = no. of elements and M = table size.
       For separate chaining, λ = average # of elements in a bucket.
     • Unsuccessful find:
     • Successful find:

     How big should the hash table be?
     • For Separate Chaining:

     tableSize: Why Prime?
     • Suppose
       – data stored in hash table: 7160, 493, 60, 55, 321, 900, 810
       – tableSize = 10: data hashes to 0, 3, 0, 5, 1, 0, 0
       – tableSize = 11: data hashes to 10, 9, 5, 0, 2, 9, 7
     Real-life data tends to have a pattern.
     Being a multiple of 11 is usually not the pattern ☺
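A minimal separate-chaining sketch for the insert exercise above, assuming h(k) = k mod TableSize (the slide does not restate the hash function) and assuming new keys are appended to the end of a bucket. The class name `ChainedHashTable` and its methods are hypothetical names chosen for this example.

```python
class ChainedHashTable:
    """Separate chaining: each slot holds a list ("bucket") of keys."""

    def __init__(self, table_size: int = 10):
        self.buckets = [[] for _ in range(table_size)]
        self.count = 0  # N, the number of stored elements

    def _index(self, key: int) -> int:
        # Assumed hash function: key mod table size.
        return key % len(self.buckets)

    def insert(self, key: int) -> None:
        bucket = self.buckets[self._index(key)]
        if key not in bucket:        # ignore duplicates
            bucket.append(key)
            self.count += 1

    def find(self, key: int) -> bool:
        # Only the one bucket the key hashes to has to be searched.
        return key in self.buckets[self._index(key)]

    def load_factor(self) -> float:
        # lambda = N / M = average number of elements per bucket.
        return self.count / len(self.buckets)


chained = ChainedHashTable(10)
for key in (10, 22, 107, 12, 42):
    chained.insert(key)

print(chained.buckets)
print(chained.load_factor())
```

With these keys, 22, 12, and 42 all share bucket 2, and the load factor is 5 / 10 = 0.5.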

  4. Open Addressing
     • Insert: 38, 19, 8, 109, 10 (table slots 0 … 9)
     • Linear Probing: after checking spot h(k), try spot h(k)+1; if that is full, try h(k)+2, then h(k)+3, etc.

     Terminology Alert!
     • "Open Hashing" equals "Separate Chaining"
     • "Closed Hashing" equals "Open Addressing" (Weiss)

     Linear Probing
     f(i) = i
     • Probe sequence:
       0th probe = h(k) mod TableSize
       1st probe = (h(k) + 1) mod TableSize
       2nd probe = (h(k) + 2) mod TableSize
       . . .
       i-th probe = (h(k) + i) mod TableSize

     Linear Probing – Clustering
     [Figure: inserts with no collision, a collision in a small cluster, and a collision in a large cluster. Source: R. Sedgewick]
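The linear probing sequence can be sketched as follows for the Open Addressing exercise, assuming h(k) = k mod TableSize (not stated on that slide). The function name `linear_probe_insert` is hypothetical.

```python
def linear_probe_insert(table: list, key: int) -> int:
    """Insert key using the i-th probe (h(k) + i) mod TableSize.

    Returns the slot where the key was stored.
    """
    size = len(table)
    home = key % size                # assumed hash function h(k)
    for i in range(size):
        slot = (home + i) % size     # f(i) = i
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("table is full")


probe_table = [None] * 10
slots = [linear_probe_insert(probe_table, k) for k in (38, 19, 8, 109, 10)]
print(slots)
print(probe_table)
```

Keys 8, 109, and 10 all wrap around past slot 9 and pile up in slots 0, 1, and 2, a small instance of the clustering the next slide describes.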

  5. Load Factor in Linear Probing
     • For any λ < 1, linear probing will find an empty slot
     • Expected # of probes (for large table sizes)
       – successful search: $\frac{1}{2}\left(1 + \frac{1}{1-\lambda}\right)$
       – unsuccessful search: $\frac{1}{2}\left(1 + \frac{1}{(1-\lambda)^2}\right)$
     • Linear probing suffers from primary clustering
     • Performance quickly degrades for λ > 1/2

     Quadratic Probing
     f(i) = i² (less likely to encounter primary clustering)
     • Probe sequence:
       0th probe = h(k) mod TableSize
       1st probe = (h(k) + 1) mod TableSize
       2nd probe = (h(k) + 4) mod TableSize
       3rd probe = (h(k) + 9) mod TableSize
       . . .
       i-th probe = (h(k) + i²) mod TableSize

     Quadratic Probing
     • Insert: 89, 18, 49, 58, 79 (table slots 0 … 9)

     Quadratic Probing Example (table slots 0 … 6)
     • insert(76): 76 % 7 = 6
     • insert(40): 40 % 7 = 5
     • insert(48): 48 % 7 = 6
     • insert(5): 5 % 7 = 5
     • insert(55): 55 % 7 = 6
     • But… insert(47): 47 % 7 = 5
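Both quadratic probing exercises can be traced with a short sketch, again assuming h(k) = k mod TableSize. The function name `quadratic_probe_insert` is hypothetical. It stops after TableSize probes because i² mod TableSize repeats with period TableSize, so later probes cannot reach any new slot.

```python
def quadratic_probe_insert(table: list, key: int):
    """Insert key using the i-th probe (h(k) + i*i) mod TableSize.

    Returns the slot used, or None if no reachable slot is empty.
    """
    size = len(table)
    home = key % size                    # assumed hash function h(k)
    for i in range(size):
        slot = (home + i * i) % size     # f(i) = i^2
        if table[slot] is None:
            table[slot] = key
            return slot
    return None                          # every reachable slot is occupied


# First exercise: table of size 10.
table10 = [None] * 10
slots10 = [quadratic_probe_insert(table10, k) for k in (89, 18, 49, 58, 79)]

# Second example: table of size 7, ending with the "But..." insert of 47.
table7 = [None] * 7
slots7 = [quadratic_probe_insert(table7, k) for k in (76, 40, 48, 5, 55, 47)]

print(slots10)
print(slots7)
```

In the size-7 table the last insert returns None even though slots 1 and 4 are still free: probes for 47 only ever reach slots 5, 6, 2, and 0, and the table is already more than half full.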

  6. Quadratic Probing: Success guarantee for λ < ½
     • If size is prime and λ < ½, then quadratic probing will find an empty slot in size/2 probes or fewer.
       – show for all 0 ≤ i, j ≤ size/2 and i ≠ j:
         (h(x) + i²) mod size ≠ (h(x) + j²) mod size
       – by contradiction: suppose that for some i ≠ j:
         (h(x) + i²) mod size = (h(x) + j²) mod size
         ⇒ i² mod size = j² mod size
         ⇒ (i² - j²) mod size = 0
         ⇒ [(i + j)(i - j)] mod size = 0
         BUT size does not divide (i - j) or (i + j): size is prime, so it would have to divide one of the two factors, yet 0 < |i - j| < size and 0 < i + j < size.

     Quadratic Probing: Properties
     • For any λ < ½, quadratic probing will find an empty slot; for bigger λ, quadratic probing may find a slot
     • Quadratic probing does not suffer from primary clustering: keys hashing to the same area are not bad
     • But what about keys that hash to the same spot?
       – Secondary Clustering!

     Quadratic Probing Works for λ < 1/2
     • If HSize is prime then
       (h(x) + i²) mod HSize ≠ (h(x) + j²) mod HSize
       for i ≠ j and 0 < i, j < HSize/2.
     • Proof (by contradiction):
       (h(x) + i²) mod HSize = (h(x) + j²) mod HSize
       [(h(x) + i²) - (h(x) + j²)] mod HSize = 0
       (i² - j²) mod HSize = 0
       (i - j)(i + j) mod HSize = 0
       ⇒⇐ HSize does not divide (i - j) or (i + j)

     Double Hashing
     f(i) = i * g(k), where g is a second hash function
     • Probe sequence:
       0th probe = h(k) mod TableSize
       1st probe = (h(k) + g(k)) mod TableSize
       2nd probe = (h(k) + 2*g(k)) mod TableSize
       3rd probe = (h(k) + 3*g(k)) mod TableSize
       . . .
       i-th probe = (h(k) + i*g(k)) mod TableSize
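The distinct-probes claim behind the success guarantee can be spot-checked numerically. This is a sanity check for a handful of table sizes, not a substitute for the proof, and the function name `first_half_probes_distinct` is made up here. Since h(x) is added to every probe, it cancels out and only the offsets i² mod size need comparing.

```python
def first_half_probes_distinct(size: int) -> bool:
    """True if i*i mod size is different for every i in 0 .. size // 2."""
    offsets = [(i * i) % size for i in range(size // 2 + 1)]
    return len(set(offsets)) == len(offsets)


for size in (7, 11, 13, 8, 16):
    print(size, first_half_probes_distinct(size))
```

The prime sizes pass, while 8 and 16 do not (for size 8 the offsets are 0, 1, 4, 1, 0). A non-prime size is not guaranteed to fail; the slides only claim the guarantee when the size is prime.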

  7. Resolving Collisions with Double Hashing
     Hash Functions (M = table size):
       H(K) = K mod M
       H2(K) = 1 + ((K/M) mod (M-1))
     Insert these values into the hash table (slots 0 … 9) in this order. Resolve any collisions with double hashing:
       13, 28, 33, 147, 43

     Double Hashing Example
     h(k) = k mod 7 and g(k) = 5 – (k mod 5)
     Table contents after each insert (a dot marks an empty slot):

       Insert:   76   93   40   47   10   55
       slot 0     .    .    .    .    .    .
       slot 1     .    .    .   47   47   47
       slot 2     .   93   93   93   93   93
       slot 3     .    .    .    .   10   10
       slot 4     .    .    .    .    .   55
       slot 5     .    .   40   40   40   40
       slot 6    76   76   76   76   76   76
       Probes     1    1    1    2    1    2

     Rehashing
     Idea: When the table gets too full, create a bigger table (usually 2x as large) and hash all the items from the original table into the new table.
     • When to rehash?
       – half full (λ = 0.5)
       – when an insertion fails
       – some other threshold
     • Cost of rehashing?

     Java hashCode() Method
     • Class Object defines a hashCode method
       – Intent: returns a suitable hashcode for the object
       – Result is arbitrary int; must scale to fit a hash table (e.g. obj.hashCode() % nBuckets)
       – Used by collection classes like HashMap
     • Classes should override with calculation appropriate for instances of the class
       – Calculation should involve semantically "significant" fields of objects
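The Double Hashing Example can be reproduced with a short sketch that uses the slide's two functions, h(k) = k mod 7 and g(k) = 5 - (k mod 5). The name `double_hash_insert` and the choice to return the probe count are illustrative assumptions.

```python
def h(k: int) -> int:
    return k % 7                 # primary hash from the slide


def g(k: int) -> int:
    return 5 - (k % 5)           # second hash from the slide; always 1..5


def double_hash_insert(table: list, key: int) -> int:
    """Insert key using the i-th probe (h(k) + i*g(k)) mod TableSize.

    Returns the number of probes needed.
    """
    size = len(table)
    for i in range(size):
        slot = (h(key) + i * g(key)) % size
        if table[slot] is None:
            table[slot] = key
            return i + 1
    raise RuntimeError("no empty slot found")


dh_table = [None] * 7
probes = [double_hash_insert(dh_table, k) for k in (76, 93, 40, 47, 10, 55)]
print(probes)
print(dh_table)
```

Only 47 and 55 collide: 47 steps by g(47) = 3 from slot 5 to slot 1, and 55 steps by g(55) = 5 from slot 6 to slot 4, which matches the probe counts on the slide.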

  8. hashCode() and equals()
     • To work right, particularly with collection classes like HashMap, hashCode() and equals() must obey this rule:
       if a.equals(b) then it must be true that a.hashCode() == b.hashCode()
       – Why?
     • Reverse is not required

     Hashing Summary
     • Hashing is one of the most important data structures.
     • Hashing has many applications where operations are limited to find, insert, and delete.
     • Dynamic hash tables have good amortized complexity.
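The summary's point about dynamic hash tables rests on the rehashing idea from earlier in the lecture: grow the table when it gets too full and re-insert every item. A rough sketch follows, with several assumptions that are not from the slides: linear probing for collisions, h(k) = k mod TableSize, a 0.5 load-factor threshold, and plain doubling of the size (a real table might instead pick a prime near twice the old size). The class name `RehashingTable` is hypothetical.

```python
class RehashingTable:
    """Open-addressing table (linear probing) that doubles when too full."""

    def __init__(self, table_size: int = 4, max_load: float = 0.5):
        self.slots = [None] * table_size
        self.count = 0
        self.max_load = max_load     # assumed threshold: rehash above this

    @staticmethod
    def _place(slots: list, key: int) -> bool:
        """Linear-probe key into slots; False if it was already present."""
        size = len(slots)
        for i in range(size):
            slot = (key + i) % size
            if slots[slot] is None:
                slots[slot] = key
                return True
            if slots[slot] == key:
                return False
        raise RuntimeError("table is full")

    def _rehash(self) -> None:
        old = self.slots
        self.slots = [None] * (2 * len(old))   # "usually 2x as large"
        for key in old:
            if key is not None:
                # Every key is hashed again, because its slot depends
                # on the new table size.
                self._place(self.slots, key)

    def insert(self, key: int) -> None:
        if (self.count + 1) / len(self.slots) > self.max_load:
            self._rehash()
        if self._place(self.slots, key):
            self.count += 1


dyn = RehashingTable(table_size=4)
for key in (10, 22, 107, 12, 42):
    dyn.insert(key)

print(len(dyn.slots), dyn.count)
```

A single rehash costs time proportional to the table size, but because the size doubles each time, that cost is spread over the many cheap inserts that came before it.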
