

Searching
• A systematic method for locating a record with a key value k_j = K
  – successful search / unsuccessful search
  – exact match query / range query
• Chapter 9, sections 9.1-9.4.1

Maps
• A map models a searchable collection of key-value entries
• The main operations of a map are for searching, inserting, and deleting items
• Multiple entries with the same key are not allowed
• Applications:
  – address book
  – student-record database

A Simple List-Based Map
• We can efficiently implement a map using an unsorted list
  – We store the items of the map in a list S (based on a doubly-linked list), in arbitrary order
  [figure: an unsorted doubly-linked list S holding entries (5, c), (8, c), (9, c), (6, c)]

Performance of a List-Based Map
• put takes O(1) time, since we can insert the new item at the beginning or at the end of the sequence
• get and remove take O(n) time, since in the worst case (the item is not found) we traverse the entire sequence to look for an item with the given key
• The unsorted list implementation is effective only for maps of small size, or for maps in which puts are the most common operations while searches and removals are rarely performed (e.g., a historical record of logins to a workstation)

Hashing: Table Representations of Data
• Goal: a 1-to-1 mapping from keys onto table spaces
  – ex. 5000 employees, each key mapped to its own table space
• To provide a unique set of keys: YOU MUST HAVE A UNIQUE KEY!
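The list-based map above is simple enough to sketch in a few lines. This is a minimal illustration, not the textbook's implementation; the class and method names are my own. Following the slides' cost analysis, `put` appends blindly in O(1), which assumes the caller supplies unique keys (as the slides stress, you must have a unique key), while `get` and `remove` are O(n) because an unsuccessful search traverses the entire sequence.

```python
# Sketch of an unsorted list-based map: put is O(1), get/remove are O(n).

class ListMap:
    def __init__(self):
        self._items = []                    # (key, value) pairs, arbitrary order

    def put(self, key, value):
        self._items.append((key, value))    # O(1) insert at the end

    def get(self, key):
        for k, v in self._items:            # O(n) worst case: key absent
            if k == key:
                return v
        return None                         # unsuccessful search

    def remove(self, key):
        for i, (k, _) in enumerate(self._items):
            if k == key:
                del self._items[i]
                return True
        return False                        # unsuccessful search
```

This matches the slides' point that an unsorted list is only sensible for small maps or put-heavy workloads such as an append-only login log.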

Hashing
1. Need a hashing function, h(k), that maps key K onto an address in the table.
2. Must ensure that h(k1) ≠ h(k2).

• Compare binary search: ≈ f(n) = 2 log2 n comparisons, so 2 log2 5000 ≈ 24.6 comparisons, i.e., O(log2 n).
• How would you like O(1)?

Simple, naïve example: table size = M, h(k) = k mod M

Example: add 10, 2, 19, 14, 24, 23 with table size = 7, so h(k) = k mod 7
  10 mod 7 = 3
   2 mod 7 = 2
  19 mod 7 = 5

  0 |
  1 |
  2 | 2
  3 | 10
  4 |
  5 | 19
  6 |

  Where do 14, 24 and 23 go?

Choosing a Hash Function
• A good hash function maps keys uniformly and randomly.
  – A poor hash function maps keys non-uniformly, or maps contiguous clusters of keys into clusters of hash table locations.

Example Hash Functions
1. Division Method
  – choose a prime number as table size, M
  – interpret keys as integers
  – h(k) = k mod M
2. Shift Folding
  – the key is divided into sections, and the sections are added together
  – ex. 9-digit key k = 013402122: 013 + 402 + 122 = 537 = h(k)
  – can use multiplication, subtraction, addition (whatever), in some fashion, to combine the sections into a final value
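The mod-7 example above can be checked directly. This sketch assumes nothing beyond the slides' h(k) = k mod M; it shows that 14 lands in the empty cell 0, while 24 and 23 collide with the earlier keys 10 and 2.

```python
# Division-method sketch for the example above: table size M = 7,
# h(k) = k mod 7, keys added in the order 10, 2, 19, 14, 24, 23.

M = 7

def h(k):
    return k % M                      # division method

slots = {k: h(k) for k in (10, 2, 19, 14, 24, 23)}
# 10 -> 3, 2 -> 2, 19 -> 5 land in empty cells; 14 -> 0 is also fine,
# but 24 -> 3 collides with 10 and 23 -> 2 collides with 2, so a
# collision resolution policy (covered later) is required.
```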

3. Boundary Folding
  – like shift folding, but every other section is reversed before folding
  – ex. k = 013402122: 013 + 204 + 122 = 339 = h(k)
  – not much difference from shift folding; decide which method gives the better-scattered result based on experiments
  – used for character codes
4. Middle Squaring
  – take the middle portion of the key, square it, and adjust
  – ex. k = 013402122: h(k) = 402² = 161604
  – adjust: 1) could mod M, or 2) could take the middle 4 digits (6160)
5. Truncation
  – simply delete part of the key and use the remaining digits
  – ex. k = 013402122: h(k) = 122
  – easy to compute, but not very random or uniform
  – seldom used alone; commonly used in conjunction with another method
6. Digit or Character Extraction
  – similar to truncation
  – when part of the key has a predictable value, extract the remaining digits before applying another hash function
  – ex. a company coding scheme
7. Random Number Generator
  – use the key as the "seed"; the next number computed is the hash value
  – unlikely that 2 different seeds will give the same random number
  – can be computationally expensive

Collisions
• def. h(k1) = h(k2)
• Recall we need to map keys in a UNIFORM and RANDOM fashion. (For an arbitrary key, any possible table address, 0 to M-1, should be equally likely to be chosen by your hashing function.)
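The folding and squaring methods above are easy to sketch. The function names are mine, and I assume the slides' 9-digit key split into three 3-digit sections; for boundary folding I reverse the 2nd, 4th, ... sections, which reproduces the slides' 013 + 204 + 122 example.

```python
# Hedged sketches of shift folding, boundary folding, and middle squaring
# for a 9-digit key split into three 3-digit sections.

def shift_fold(key, width=3):
    sections = [key[i:i + width] for i in range(0, len(key), width)]
    return sum(int(s) for s in sections)            # 013 + 402 + 122 = 537

def boundary_fold(key, width=3):
    sections = [key[i:i + width] for i in range(0, len(key), width)]
    # every other section (here the 2nd, 4th, ...) is reversed before adding
    flipped = [s if i % 2 == 0 else s[::-1] for i, s in enumerate(sections)]
    return sum(int(s) for s in flipped)             # 013 + 204 + 122 = 339

def middle_square(key, width=3):
    squared = str(int(key[width:2 * width]) ** 2)   # 402**2 = 161604
    mid = len(squared) // 2
    return int(squared[mid - 2:mid + 2])            # middle 4 digits: 6160
```

Either result would still be reduced mod M before use as a table address.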

A BAD EXAMPLE
• keys: variable names (registers in assembly), up to 3 characters, using 8-bit ASCII chars (a 1- to 24-bit integer, divided into 3 equal 8-bit sections)
• K = C3 C2 C1; numerically K = C3·256² + C2·256¹ + C1·256⁰, which reduced mod 256 has value C1
• M = 2⁸ = 256, so h(k) = k mod M = k mod 256
• Hash 6 keys: RX1, RX2, RX3, RY1, RY2, RY3
  h(RX1) = h(RY1) = '1' = 49
  h(RX2) = h(RY2) = '2' = 50
  h(RX3) = h(RY3) = '3' = 51
• Problem with this policy: it has the effect of selecting the low-order character as the value of h(k)
  1) the 6 original keys map into only 3 unique addresses
  2) contiguous runs of keys, in general, result in contiguous runs of table space

Hashing: How often do collisions really happen?
• Von Mises Birthday Paradox: with 23+ people, there is a 50% probability of a match.
• Q(n) is the probability that if you randomly toss n balls into a table with 365 slots, there will be no collision; P(n) = 1 − Q(n) is the probability of a collision.
• Develop a recurrence relation:
  Q(1) = 1 // there will be no collisions
  Q(2) = Q(1) × (364/365)
  Q(3) = Q(2) × (363/365)
• As soon as the table is 12.9% full, there is greater than a 95% chance that 2 keys will collide.
• Moral of the story: even in a sparsely occupied hash table, collisions are relatively common.

Collision Resolution Policies
Open Addressing: inserting keys into other empty locations in the table
1) linear probing
  – go to the next open space
  – wrap around, if necessary
2) double hashing
  – calculate a probe decrement P(K) = max(1, K mod M)
3) rehashing
  – apply h(k); if a collision, apply h1(k); if still a collision, apply h2(k)
  – use an entire sequence of hash functions
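The Q(n) recurrence above can be evaluated directly. This sketch assumes only the 365-slot table from the slides; the function name is mine.

```python
# Birthday-paradox recurrence: Q(1) = 1, Q(n) = Q(n-1) * (365-(n-1))/365,
# and P(n) = 1 - Q(n) is the probability of at least one collision.

def collision_probability(n, slots=365):
    q = 1.0
    for i in range(1, n):
        q *= (slots - i) / slots      # Q(i+1) = Q(i) * (slots - i) / slots
    return 1.0 - q                    # P(n)

# Smallest n with P(n) > 0.5 -- the classic answer is 23.
n = 1
while collision_probability(n) <= 0.5:
    n += 1
# At 47 keys (about 12.9% of 365 slots) P(n) already exceeds 0.95,
# which is the "sparsely occupied table" point made above.
```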

Collision Resolution Policies (continued)
Chaining
• use linked lists: each table location holds the chain of keys that hash there

Quadratic Collision Processing
• examines locations whose distance from the initial collision point increases as the square of the distance from the previous location tried
• ex. if h(k) = A and we collide, try A+1², A+2², A+3², ... A+R²
• uses wraparound
• leaves increasingly larger gaps between successive relocation positions

Example of Double Hashing
• Consider a hash table storing integer keys that handles collision with double hashing
  – N = 13
  – h(k) = k mod 13
  – d(k) = 7 − k mod 7
• Insert keys 18, 41, 22, 44, 59, 32, 31, 73, in this order:

  k     h(k)  d(k)  probes
  18     5     3     5
  41     2     1     2
  22     9     6     9
  44     5     5     5, 10
  59     7     4     7
  32     6     3     6
  31     5     4     5, 9, 0
  73     8     4     8

  index : 0   1   2   3   4   5   6   7   8   9   10  11  12
  key   : 31      41              18  32  59  73  22  44

Clusters
• def. contiguous runs of occupied entries

Primary Clustering
• look at linear probing
• a collision causes a small "puddle" of keys to form at the collision location
• the small puddle grows larger, and the larger it grows, the faster it grows
• small puddles connect to form large puddles
• note: linear probing is subject to primary clustering; double hashing is not

Secondary Clustering
• when any 2 keys have a collision at a given location, they both subsequently examine the same sequence of alternative locations until the collision is resolved
• not as bad as primary clustering: secondary clusters do not form larger secondary clusters
• quadratic collision processing is subject to secondary clustering

Load Factors
• Suppose table T is of size M, and N entries are occupied (M − N are empty)
• α = N/M is the load factor of T
• ex. M = 100, N = 75, α = 0.75: we say T is 75% full

Performance
Hashing is a "Density-Dependent Search Technique"; performance is:
1) based on the uniformity and randomness of h(k)
2) based on the collision resolution policy
3) based on the load factor (density of the table)
// means you can achieve a highly efficient result if you are willing to waste enough vacant records
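The double-hashing example above (h(k) = k mod 13, d(k) = 7 − k mod 7) can be simulated in a few lines. This is a sketch under the slides' parameters, not a production hash table: probing steps by d(k) with wraparound until an empty cell is found, and deletion and table resizing are ignored.

```python
# Double-hashing insertion for the example above: a 13-cell table,
# primary hash h(k) = k mod 13, probe decrement d(k) = 7 - (k mod 7).

M = 13                          # table size (called N on the slides)

def h(k):
    return k % M

def d(k):
    return 7 - (k % 7)          # never 0, so every probe advances

def insert(table, k):
    i = h(k)
    while table[i] is not None:     # collision: step by d(k), wrapping around
        i = (i + d(k)) % M
    table[i] = k

table = [None] * M
for k in (18, 41, 22, 44, 59, 32, 31, 73):
    insert(table, k)
# 44 collides with 18 at cell 5 and lands at 10; 31 collides at cells 5
# and 9 and wraps around to cell 0; every other key takes its home cell.
```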
