week 9

Week 9: Hash tables (Oliver Kullmann)



  1. CS 270 Algorithms, Week 9 (Oliver Kullmann)
     Outline:
       1. Generalising arrays
       2. Direct addressing
       3. Hashing in general
       4. Hashing through chaining
       5. Hash functions
       6. Tutorial

  2. General remarks
     We continue data structures by discussing hash tables.
     Reading from CLRS for week 7:
       1. Chapter 11, Sections 11.1, 11.2, 11.3.

  3. Recall: Dictionaries
     Recall the three operations for a dictionary:
       1. INSERT(x) (input: pointer x to the element to be inserted)
       2. SEARCH(k) (input: key k; returns a pointer)
       3. DELETE(x) (input: pointer x to the element to be deleted)
     Via binary search trees (last week) we get such a dictionary: we actually get a full-fledged implementation of dynamic sets (including the four order-related operations).
     Hashing is a technique specialised for dictionaries (not supporting the four order-related operations). It is usually faster for dictionaries (only).

  4. Applications of dictionaries
     A standard application is, for example, in a compiler:
       1. We have many different "identifiers", for example for variables, functions and classes.
       2. For such an identifier, for example the class-name BreadthFirst, a lot of information needs to be stored.
       3. The dictionary now translates the character sequence "BreadthFirst" into a pointer to the data associated with this class.
     But dictionaries are everywhere: you need one whenever you have to associate data with some "keys"! Can you think of some examples?

  5. The fastest implementation: keys as array indices
     With binary search trees we can achieve worst-case logarithmic time for the three dictionary operations.
     We now want constant time for the three operations, on average, and provided we have enough space.
     The basic idea for this is to use arrays: if we can use the keys as array indices, we are done (mostly).
     Hashing is the process of handling arbitrary key-spaces K as if they were array indices.

  6. The basic idea of hashing
     We consider the key-space K (an arbitrary set), and we can use an array of length m. The basic idea is to use a hash function
       h : K → {0, ..., m − 1},
     which translates key-values into indices.
     The simplest case is when h is injective, i.e., maps different keys to different indices. Injective hash functions are called perfect.
     For that to be possible we need |K| ≤ m, i.e., there are at most m different keys. Otherwise we have to handle collisions.
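As a small illustration of the definitions on this slide, the following Python sketch checks whether a hash function is injective (perfect) on a given set of keys. The helper name `is_perfect`, the sample keys and the two modulo-based hash functions are assumptions made for this example; they do not come from the slides.

```python
def is_perfect(h, keys, m):
    """Return True if h maps the given keys to pairwise distinct indices in 0..m-1."""
    seen = set()
    for k in keys:
        i = h(k)
        if not 0 <= i < m:
            raise ValueError(f"h({k!r}) = {i} is outside 0..{m - 1}")
        if i in seen:
            return False  # two different keys share an index: a collision
        seen.add(i)
    return True


keys = [10, 21, 32, 43]
print(is_perfect(lambda k: k % 10, keys, 10))  # indices 0, 1, 2, 3 -> True
print(is_perfect(lambda k: k % 3, keys, 3))    # 10 and 43 both map to 1 -> False
```

The second call must fail for any choice of h, since four keys cannot be mapped injectively into m = 3 slots.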

  7. The simplest case of hashing
     The simplest case of hashing is when for h we can use the identity, that is,
       - the keys are natural numbers ≥ 0, i.e., K ⊂ {0, 1, 2, ...};
       - within a feasible range, i.e., m = max(K) + 1 is not too large (note that in general max(K), i.e., the maximum of possible indices, is much larger than |K|).
     The array is called a direct-address (hash) table; the book uses the letter T (for "table").

  8. Key or not
     A basic problem is how to show that an element is not there:
       1. Conceptually simplest is to use pointers, where the NIL-pointer then shows that the element is not there.
       2. Alternatively we can use a special "singular" key-value.
       3. Or, for example, an additional boolean array.

  9. Using pointers
     How to implement a dynamic set by a direct-address table T:
       - The keys of the key-space K = {0, 1, ..., 9} = U are used as indices in the table.
       - The empty slots in the table contain NIL.

  10. The basics of the implementation
     Search(T,k)
       return T[k]
     Insert(T,x)
       T[x.key] = x
     Delete(T,x)
       T[x.key] = NIL
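The three operations above could be sketched in Python roughly as follows. This is a minimal sketch, not the course's reference implementation: the class names `Element` and `DirectAddressTable` are hypothetical, and Python's `None` plays the role of NIL.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Element:
    key: int          # must lie in 0..m-1 for a table of size m
    data: Any = None  # satellite data associated with the key


class DirectAddressTable:
    def __init__(self, m: int):
        self.slots = [None] * m  # None plays the role of NIL

    def search(self, k: int) -> Optional[Element]:
        return self.slots[k]

    def insert(self, x: Element) -> None:
        self.slots[x.key] = x

    def delete(self, x: Element) -> None:
        self.slots[x.key] = None


T = DirectAddressTable(10)   # key-space {0, ..., 9}
x = Element(3, "three")
T.insert(x)
print(T.search(3) is x)      # True
T.delete(x)
print(T.search(3))           # None
```

Each operation is a single array access, so all three run in constant time.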

  11. Examples where some simple translation is needed
     If K is small, then we typically can find a nice injective (i.e., perfect) hash function h, for example:
       - The keys are integers in a known range: just move them.
       - The keys are (arbitrary) images with 20 pixels: use binary encoding.
     Do you know other examples?
     In principle we can always use an injective hash function: if we have enough memory, then this is very fast.
     However, in practice this is often not feasible, for example for strings, and so we need to handle "collisions", that is, cases where the hash function yields the same index for different keys.
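The two translations mentioned above might look like this in Python. The function names are hypothetical, and the second function assumes that each of the 20 pixels is either 0 or 1 (the slide only says "binary encoding"), which gives a table of size 2^20.

```python
def shift_hash(k, lo):
    """Keys are integers in a known range lo..hi: move them down to 0..hi-lo."""
    return k - lo


def image_hash(pixels):
    """Read 20 black-and-white pixels (each 0 or 1) as one binary number.

    Assumption: one bit per pixel, so indices lie in 0..2**20 - 1.
    """
    if len(pixels) != 20 or any(p not in (0, 1) for p in pixels):
        raise ValueError("expected exactly 20 pixels with values 0 or 1")
    index = 0
    for p in pixels:
        index = 2 * index + p  # append the next bit
    return index


print(shift_hash(1003, 1000))        # 3
print(image_hash([0] * 19 + [1]))    # 1
print(image_hash([1] * 20))          # 1048575, i.e. 2**20 - 1
```

Both functions are injective, so no collision handling is needed; the price is a table with one slot for every possible key.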

  12. General hash tables
     As said already, the general idea of "hashing" is to use a "hash function"
       h : K → {0, ..., m − 1}
     (here using "K" instead of "U" as in the book).
       - m is the size of the hash table.
       - An element with key k hashes to slot h(k).
       - h(k) is the hash value of key k.

  13. General hash tables (cont.)
     [Figure]
     We see that we have a collision for keys k2 and k5.

  14. How to handle collisions?
     In general, |K| is much bigger than m:
       - The hash function should be as "random" as possible.
       - That is, it should hash "unpredictably".
       - That is, it should be independent of our choices of keys.
       - That is, we should get as few collisions as possible.
     However, we have to handle collisions nevertheless!

  15. Using linked lists
     [Figure]

  16. Collision resolution by "chaining"
     Put all elements that hash to the same slot into a linked list.
       - Slot j contains a pointer to the head of the list of all stored elements that hash to j.
       - If there are no such elements, slot j contains NIL.
     Singly linked lists can be used if we do not want to delete elements.

  17. The basics of the implementation
     Search(T,k)
       return result of search for key k in list T[h(k)]
     Run time: linear in the length of the list of elements in slot h(k).
     Insert(T,x)
       insert x at head of list T[h(x.key)]
     Run time: constant
     Delete(T,x)
       delete x from list T[h(x.key)]
     Run time: constant (x is given by pointer and the lists are doubly linked)
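A minimal Python sketch of chaining with doubly linked lists is given below, so that deletion by pointer takes constant time. The class names `Node` and `ChainedHashTable`, and the default hash function built on Python's `hash`, are assumptions for this example rather than anything prescribed by the slides.

```python
class Node:
    def __init__(self, key, data=None):
        self.key = key
        self.data = data
        self.prev = None
        self.next = None


class ChainedHashTable:
    def __init__(self, m, h=None):
        self.m = m
        # Assumed default: Python's built-in hash reduced modulo m.
        self.h = h if h is not None else (lambda k: hash(k) % m)
        self.slots = [None] * m  # each slot holds the head of a list, or None (NIL)

    def search(self, k):
        """Walk the list in slot h(k); time linear in the length of that list."""
        x = self.slots[self.h(k)]
        while x is not None and x.key != k:
            x = x.next
        return x

    def insert(self, x):
        """Insert node x at the head of its list; constant time."""
        j = self.h(x.key)
        x.prev = None
        x.next = self.slots[j]
        if x.next is not None:
            x.next.prev = x
        self.slots[j] = x

    def delete(self, x):
        """Unlink node x (given by pointer) from its list; constant time."""
        if x.prev is not None:
            x.prev.next = x.next
        else:
            self.slots[self.h(x.key)] = x.next  # x was the head
        if x.next is not None:
            x.next.prev = x.prev
        x.prev = x.next = None


T = ChainedHashTable(4, h=lambda k: k % 4)
a, b, c = Node(1), Node(5), Node(2)   # keys 1 and 5 collide in slot 1
for node in (a, b, c):
    T.insert(node)
print(T.search(5) is b)   # True
T.delete(b)
print(T.search(5))        # None
print(T.search(1) is a)   # True
```

With singly linked lists, `delete` would first have to find the predecessor of x, which costs time linear in the length of the list.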

  18. Analysis
     What is the time-complexity of Search?
     Worst-case is when all n keys hash to the same slot, and we get just a single list of length n; so the worst-case time is Θ(n), plus time to compute the hash function.
     How can we treat average-case performance?
       - Assume simple uniform hashing: any given element is equally likely to hash into any of the m slots.
       - Analysis is in terms of the load factor α := n/m.
       - We assume computation of the hash function takes constant time.
     Theorem 1: Search takes expected time Θ(1 + α).
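To see where the bound in Theorem 1 comes from, here is a short sketch of the argument for an unsuccessful search, under the assumption of simple uniform hashing. The symbols n_j (length of the list in slot j) and k_1, ..., k_n (the stored keys) are introduced here for the derivation and do not appear on the slide.

```latex
% Expected length of the list in any fixed slot j:
% each of the n stored keys lands in slot j with probability 1/m.
\mathbb{E}[n_j] \;=\; \sum_{i=1}^{n} \Pr[\,h(k_i) = j\,]
             \;=\; \sum_{i=1}^{n} \frac{1}{m}
             \;=\; \frac{n}{m} \;=\; \alpha .

% Unsuccessful search for a key k: compute h(k) in constant time,
% then traverse the whole list in slot h(k).
\mathbb{E}[\text{time of Search}] \;=\; \Theta(1) + \Theta\bigl(\mathbb{E}[n_{h(k)}]\bigr)
                                  \;=\; \Theta(1 + \alpha) .
```

A successful search needs a slightly more careful count (on average only part of the list is traversed), but CLRS Section 11.2 shows that it obeys the same Θ(1 + α) bound.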

  19. What makes a good hash function?
     What are the conditions for a "good" hash function h : K → {0, ..., m − 1}?
       1. By definition, h must be a function, that is, for the same key it must always return the same hash value.
       2. Now, ideally, the hash function satisfies the assumption of simple uniform hashing. Is this possible?
     Actually, "simple uniform hashing" is an assumption on the interplay between the hash function and the probability distribution that keys are drawn from:
     In practice we might not know the probability distribution of the keys. Since we have a hash function, in the worst case we can always pick the keys to hash into the same slot!
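The last point can be made concrete with a small Python sketch. It assumes, purely for illustration, the hash function h(k) = k mod m; the slide does not fix any particular hash function. Once h is known, keys that all land in one slot are easy to construct.

```python
def h(k, m):
    """Illustrative hash function (an assumption for this example): k mod m."""
    return k % m


m = 8
# An adversary who knows h simply picks multiples of m as keys.
adversarial_keys = [j * m for j in range(5)]       # 0, 8, 16, 24, 32
slots = [h(k, m) for k in adversarial_keys]
print(slots)  # [0, 0, 0, 0, 0]
```

All five keys end up in slot 0, so with chaining the table degenerates into a single list and Search takes time linear in the number of stored keys.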
