uses of dictionaries


  1. Advanced Programming: Dictionaries, Hash Tables
     Topics: dictionaries (maps), hash tables.

     ADT Dictionary or Map
     Supports the following operations:
     - INSERT: inserts a new element, associated with a unique value of a field (the key).
     - SEARCH: looks for an element with a given value of the key and returns it if it exists.
     - DELETE: removes the element with the given key, if it exists.

  2. Uses of dictionaries
     - Symbol table in a compiler
       - Key: name of the identifier
       - Values: types, context
     - Citizens in a country
       - Key: social security number
       - Values: name, surname, age, address

     Associative array
     A dictionary is easily implemented with an associative array, where a value is indexed by its key instead of by its position.
     Example:
     - Citizens = {{"jr50", "john", "red"}, {"bg40", "bill", "green"}, ...}
     - Citizens["jr50"] = {"jr50", "john", "red"}
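
As a minimal sketch, the citizens example can be written with Python's built-in `dict`, which is itself an associative array. The third record ("ab30") is made up for illustration and does not appear in the slides.

```python
# Records keyed by citizen ID, mirroring the Citizens example.
citizens = {
    "jr50": ("jr50", "john", "red"),
    "bg40": ("bg40", "bill", "green"),
}

# SEARCH: index by key instead of by position; None means "not found".
record = citizens.get("jr50")
missing = citizens.get("zz99")

# INSERT: associate a new record with its unique key (hypothetical record).
citizens["ab30"] = ("ab30", "ann", "blue")

# DELETE: remove the record with the given key, if it exists.
citizens.pop("bg40", None)

print(record, missing, sorted(citizens))
```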

  3. Goal
     Complexity of insert/search/delete:
     - O(1) in the average case
     - Θ(n) in the worst case

     Hash tables
     An implementation of associative arrays: an array containing the elements, where the address of each element is computed by a hash function in O(1) time.
     Example: Hash("jr50") = 117, so the element for john red is at position 117 of the array.

  4. Associative array
     [Figure: the universe U (all keys) with the subset K (used keys); each key indexes its own slot of table T (slots 0 to 9), which holds the key and its value.]

     Dictionary implemented with an associative array
     T: associative array; key: key; x: element (value).
     - Search(T, key): return T[key]
     - Insert(T, x): T[key[x]] ← x
     - Delete(T, x): T[key[x]] ← NIL
     Time complexity is O(1) per operation; memory is O(|U|), where |U| is the number of different values of the key.
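
The three operations can be sketched in Python as follows. The class name and the restriction to integer keys in the range 0 to |U| - 1 are assumptions made for this illustration.

```python
class DirectAddressTable:
    """Array with one slot per possible key (integer keys 0 .. universe_size - 1)."""

    def __init__(self, universe_size):
        # Memory is O(|U|) regardless of how many keys are actually used.
        self.slots = [None] * universe_size

    def search(self, key):
        return self.slots[key]        # O(1): the key is the index

    def insert(self, key, value):
        self.slots[key] = value       # O(1)

    def delete(self, key):
        self.slots[key] = None        # O(1); None plays the role of NIL


table = DirectAddressTable(universe_size=10)
table.insert(3, "three")
table.insert(8, "eight")
table.delete(8)
print(table.search(3), table.search(8))
```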

  5. Assumptions
     Two assumptions are needed:
     - No two elements have the same key (keys are unique).
     - The size of T equals the number of possible values of the key, |U|.
     The second assumption is critical: if |U| is large, the array is infeasible.
     - Example: key = SSN, 10 characters over an alphabet of 24 symbols, so |U| = 24^10 ≈ 6 × 10^13.
     - The citizens of a country, however, number on the order of 10^7 to 10^9.
     It is essential that the size of the array be O(|K|) and not O(|U|).

     Hash tables
     - A kind of associative array with size O(|K|) instead of O(|U|).
     - Insert/search/delete are O(1) on average.
     - The index must be computed from the key in a different way: through a hash function.
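
The size argument can be checked with a few lines of arithmetic. The population figure used here is the upper end of the range given on the slide.

```python
ALPHABET_SIZE = 24      # symbols allowed in a key, as assumed on the slide
KEY_LENGTH = 10         # characters per key

universe_size = ALPHABET_SIZE ** KEY_LENGTH   # |U|: every possible key
population = 10 ** 9                          # |K|: upper end of the slide's range

# Slots a direct-address array would reserve for each key actually used.
slots_per_citizen = universe_size // population

print(f"|U| = {universe_size:.2e}, slots per used key = {slots_per_citizen}")
```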

  6. Hash function
     - A hash table is an array of size m (m << |U|).
     - The hash function h maps a key to a position (index) in the array: h: U → {0, 1, ..., m-1}.
     - Element x is stored in T[h(key[x])].

     Hash function
     [Figure: keys k_1 to k_5 from U mapped by h to slots 0 to m-1 of T; k_2 and k_5 land in the same slot, h(k_2) = h(k_5).]

  7. Collision
     A collision occurs when h(k_i) = h(k_j) and k_i ≠ k_j.
     It is essential to:
     - minimize the number of collisions (this depends on the hash function);
     - manage the collisions that do occur.

     Example
     The key is a string of characters. Hash function: h(k) = (Σ c_i) mod m, where
     - c_i is the ASCII code of the i-th character of string k;
     - m is the number of elements (size) of array T.

  8. Ex (II)
     m = 15.
     - h("pippo") = (112+105+112+112+111) mod 15 = 552 mod 15 = 12
     - h("pluto") = (112+108+117+116+111) mod 15 = 564 mod 15 = 9
     - h("paperino") = (112+97+112+101+114+105+110+111) mod 15 = 862 mod 15 = 7
     - h("topolino") = (116+111+112+111+108+105+110+111) mod 15 = 884 mod 15 = 14
     - h("paperoga") = (112+97+112+101+114+111+103+97) mod 15 = 847 mod 15 = 7
     Collision between the strings "paperino" and "paperoga".

     Ex (II)
     m = 15.
     - h("Mickey") = (77+105+99+107+101+121) mod 15 = 610 mod 15 = 10
     - h("Minnie") = (77+105+110+110+105+101) mod 15 = 608 mod 15 = 8
     - h("Donald") = (68+111+110+97+108+100) mod 15 = 594 mod 15 = 9
     - h("Daisy") = (68+97+105+115+121) mod 15 = 506 mod 15 = 11
     - h("foo") = (102+111+111) mod 15 = 324 mod 15 = 9
     - h("bar") = (98+97+114) mod 15 = 309 mod 15 = 9
     Collision between the strings "foo" and "bar" (and "Donald", which also hashes to 9).
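
The worked examples can be reproduced with a short sketch of the sum-of-ASCII-codes hash. The function names `ascii_sum_hash` and `find_collisions` are chosen here for illustration.

```python
from collections import defaultdict


def ascii_sum_hash(key: str, m: int) -> int:
    """h(k) = (sum of the character codes of k) mod m."""
    return sum(ord(ch) for ch in key) % m


def find_collisions(keys, m):
    """Group keys by slot and keep only the slots that hold more than one key."""
    buckets = defaultdict(list)
    for key in keys:
        buckets[ascii_sum_hash(key, m)].append(key)
    return {slot: names for slot, names in buckets.items() if len(names) > 1}


italian = ["pippo", "pluto", "paperino", "topolino", "paperoga"]
english = ["Mickey", "Minnie", "Donald", "Daisy", "foo", "bar"]
print(find_collisions(italian, 15))
print(find_collisions(english, 15))
```

Because the hash only adds up character codes, any two strings with the same character sum modulo m collide, and anagrams always do.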

  9. Collision mitigation
     The best hash functions distribute the |K| elements as uniformly (randomly) as possible among the m available positions.
     Typical strategies:
     - pick m as a prime number;
     - manipulate the bits of k.

     Collision management
     - Chaining
     - Open addressing

  10. Chaining (I)
      Position i can contain more than one element. This can be implemented with a linked list.

      Chaining (II)
      [Figure: keys k_1 to k_6 from U hashed into slots 0 to m-1 of T; keys that share a slot are linked in the same list (k_1 with k_6, k_2 with k_5).]

  11. Chaining (III)
      T[i] is a pointer to a list, initially NIL.
      - CHAINED-HASH-INSERT(T, x): insert x at the head of list T[h(key[x])]
      - CHAINED-HASH-SEARCH(T, k): search for an element with key k in list T[h(k)]
      - CHAINED-HASH-DELETE(T, x): remove x from list T[h(key[x])]

      Chaining - Complexity
      Assumption: unordered, singly linked lists.
      - Insert: O(1)
      - Search: O(length of the list)
      - Delete: O(length of the list), because it requires a search
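
A minimal sketch of chaining with singly linked lists is shown below. It uses Python's built-in `hash` as the hash function and assumes that `insert` is never called with a key already in the table; both choices are assumptions of this example, not part of the slides.

```python
class Node:
    def __init__(self, key, value, next_node=None):
        self.key, self.value, self.next = key, value, next_node


class ChainedHashTable:
    def __init__(self, m=15):
        self.m = m
        self.heads = [None] * m          # T[i] points to a list, initially NIL

    def _index(self, key):
        return hash(key) % self.m

    def insert(self, key, value):
        # Insert at the head of the list: O(1), no search needed.
        i = self._index(key)
        self.heads[i] = Node(key, value, self.heads[i])

    def search(self, key):
        # Walk the list for this slot: O(length of the list).
        node = self.heads[self._index(key)]
        while node is not None and node.key != key:
            node = node.next
        return node.value if node is not None else None

    def delete(self, key):
        # A singly linked list needs the predecessor, so deletion includes a search.
        i = self._index(key)
        prev, node = None, self.heads[i]
        while node is not None and node.key != key:
            prev, node = node, node.next
        if node is None:
            return False
        if prev is None:
            self.heads[i] = node.next
        else:
            prev.next = node.next
        return True
```

Constructing the table with `m=1` forces every key into the same list, which is a quick way to exercise the collision path.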

  12. Search (hash + chaining) - complexity
      We have:
      - n: number of elements in hash table T
      - m: size of hash table T
      - α = n/m: load factor of hash table T
      Normally α > 1. What happens as m, n → ∞ (with the same α)?

      Search (hash + chaining) - complexity (II)
      - Worst case: all elements end up in a single unordered linked list. The cost is the time to compute h(k) plus the time to traverse the list, Θ(n).
      - Average case: depends on how uniformly h(k) distributes the elements. Assume h(k) performs simple uniform hashing (it distributes the keys perfectly uniformly). This requires the table to grow with the number of elements, so that α remains constant.

  13. Search (hash + chaining) - complexity (II, continued)
      - Time to compute h(k): O(1).
      - Time to traverse the list: depends on the length of list T[h(k)] and on whether the element is found or not.
      In both cases the complexity is Θ(1 + α). Summing up: O(1) + Θ(1 + α) = O(1) when α is constant.

      Open addressing
      - T[i] can contain only one element.
      - In case of a collision, another free cell is searched for: the next one, the one after that, and so on.
      - Requires α < 1.
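
The Θ(1 + α) bound for an unsuccessful search can be sketched as follows, under the simple uniform hashing assumption. The symbol n_j, for the length of list T[j], is introduced here and does not appear in the slides.

```latex
% n_j = length of list T[j]; under simple uniform hashing each of the
% n stored keys lands in slot j with probability 1/m.
\begin{align*}
E[n_j] &= \sum_{i=1}^{n} \Pr\{h(k_i) = j\} = n \cdot \frac{1}{m} = \alpha, \\
T_{\text{unsuccessful}} &= \underbrace{O(1)}_{\text{compute } h(k)}
  + \underbrace{\Theta\bigl(E[n_{h(k)}]\bigr)}_{\text{scan the whole list}}
  = \Theta(1 + \alpha).
\end{align*}
```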

  14. Hash-Insert
      HASH-INSERT(T, k)
        i ← 0
        repeat
          j ← h(k, i)
          if T[j] = NIL
            then T[j] ← k
                 return
            else i ← i + 1
        until i = m
        error "hash table overflow"

      Hash-Search
      HASH-SEARCH(T, k)
        i ← 0
        repeat
          j ← h(k, i)
          if T[j] = k
            then return j
          i ← i + 1
        until T[j] = NIL or i = m
        return NIL
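
The two procedures translate almost line by line into Python. The pseudocode leaves h(k, i) abstract; this sketch assumes linear probing on top of Python's built-in `hash`, and it returns the slot index from `insert` for convenience.

```python
class OpenAddressingTable:
    def __init__(self, m):
        self.m = m
        self.slots = [None] * m          # None plays the role of NIL

    def _probe(self, key, i):
        # Assumed probe function: linear probing, h(k, i) = (h'(k) + i) mod m.
        return (hash(key) + i) % self.m

    def insert(self, key):
        for i in range(self.m):
            j = self._probe(key, i)
            if self.slots[j] is None:
                self.slots[j] = key
                return j
        raise OverflowError("hash table overflow")

    def search(self, key):
        for i in range(self.m):
            j = self._probe(key, i)
            if self.slots[j] is None:    # an empty slot ends the probe sequence
                return None
            if self.slots[j] == key:
                return j
        return None                      # all m slots probed without a match
```

Small non-negative integers hash to themselves in CPython, so integer keys make the probe sequence easy to follow by hand.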

  15. Re-hash functions
      - Linear probing: h(k, i) = (h'(k) + i) mod m
      - Quadratic probing: h(k, i) = (h'(k) + c_1 i + c_2 i^2) mod m
      - Double hashing: h(k, i) = (h_1(k) + i h_2(k)) mod m

      Ex - insert
      - m = 10 (slots are numbered 1 to 10)
      - Open addressing with linear probing
      - Sequence of hash values: h(A)=5, h(B)=4, h(C)=9, h(D)=4, h(E)=8, h(F)=8, h(G)=10
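
The three probe formulas can be compared by listing the slots they visit. In this sketch the base hash values are passed in directly as integers, and the constants in the usage lines are arbitrary choices for illustration.

```python
def linear_sequence(start, m):
    """Slots visited by h(k, i) = (h'(k) + i) mod m, with start = h'(k)."""
    return [(start + i) % m for i in range(m)]


def quadratic_sequence(start, m, c1, c2):
    """Slots visited by h(k, i) = (h'(k) + c1*i + c2*i^2) mod m."""
    return [(start + c1 * i + c2 * i * i) % m for i in range(m)]


def double_hash_sequence(h1, h2, m):
    """Slots visited by h(k, i) = (h1(k) + i*h2(k)) mod m."""
    return [(h1 + i * h2) % m for i in range(m)]


print(linear_sequence(5, 8))
print(quadratic_sequence(0, 7, 1, 1))
print(double_hash_sequence(3, 4, 7))
```

With double hashing the sequence reaches every slot only when h_2(k) and m share no common factor, which is one reason a prime m is convenient.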

  16. Ex - insert (II)
      Table contents after each insertion (slots 1 to 10; "." marks an empty slot):

      Insert  h    1  2  3  4  5  6  7  8  9  10
      A       5    .  .  .  .  A  .  .  .  .  .
      B       4    .  .  .  B  A  .  .  .  .  .
      C       9    .  .  .  B  A  .  .  .  C  .
      D       4    .  .  .  B  A  D  .  .  C  .
      E       8    .  .  .  B  A  D  .  E  C  .
      F       8    .  .  .  B  A  D  .  E  C  F
      G       10   G  .  .  B  A  D  .  E  C  F

      Ex - search (III)
      - D (h(D)=4): read 4, read 5, read 6 ⇒ found
      - G (h(G)=10): read 10, read 1 ⇒ found
      - M (h(M)=4): read 4, read 5, read 6, read 7 ⇒ not found
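
The example can be replayed in a few lines. This sketch keeps the example's 1-based slot numbering (slot 10 wraps around to slot 1) and takes the hash values as given rather than computing them.

```python
M = 10  # slots are numbered 1..M, as in the example


def probe(start, i):
    # Linear probing on 1-based slots: shift to 0-based for the modulo, then back.
    return (start - 1 + i) % M + 1


def insert(table, key, start):
    for i in range(M):
        j = probe(start, i)
        if j not in table:               # empty slot found
            table[j] = key
            return j
    raise OverflowError("hash table overflow")


def search(table, key, start):
    probes = []
    for i in range(M):
        j = probe(start, i)
        probes.append(j)
        if j not in table:               # empty slot: the key is absent
            return None, probes
        if table[j] == key:
            return j, probes
    return None, probes


hashes = {"A": 5, "B": 4, "C": 9, "D": 4, "E": 8, "F": 8, "G": 10}
table = {}                               # slot number -> key
for key, start in hashes.items():
    insert(table, key, start)

print(sorted(table.items()))
print(search(table, "D", 4), search(table, "G", 10), search(table, "M", 4))
```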

  17. Delete
      Very complex, because it changes the rehash/collision sequence. In practice, open addressing is used only when deletions are not needed.

      Complexity
      With uniform hashing and linear probing:
      - The expected number of probes in an unsuccessful search is at most 1/(1 - α); insert has the same complexity.
      - The expected number of probes in a successful search is at most
        $$\frac{1}{\alpha} \ln \frac{1}{1-\alpha} + \frac{1}{\alpha}$$
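
To get a feel for how quickly these bounds grow with the load factor, they can be evaluated numerically. The function names are chosen for this sketch, and the second formula is the one stated on the slide, (1/α) ln(1/(1 - α)) + 1/α.

```python
import math


def probes_insert_bound(alpha):
    """Bound 1 / (1 - alpha) on the number of probes for an insert."""
    if not 0 <= alpha < 1:
        raise ValueError("alpha must satisfy 0 <= alpha < 1")
    return 1 / (1 - alpha)


def probes_search_bound(alpha):
    """Bound (1/alpha) * ln(1 / (1 - alpha)) + 1/alpha from the slide."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must satisfy 0 < alpha < 1")
    return (1 / alpha) * math.log(1 / (1 - alpha)) + 1 / alpha


for alpha in (0.25, 0.5, 0.75, 0.9):
    print(alpha, round(probes_insert_bound(alpha), 2), round(probes_search_bound(alpha), 2))
```

Both expressions blow up as α approaches 1, which is why open addressing needs the table to stay comfortably less than full.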

  18. Hash functions

      Uniform hashing
      The best hash functions perform uniform hashing: if the keys are equally probable, the values of h(k) should be equally probable too:
      $$\sum_{k:\, h(k) = j} P(k) = \frac{1}{m}, \qquad j = 0, 1, \ldots, m-1$$

  19. Keys are not uniform
      However, keys are often not equally distributed (e.g., words in a language, names and surnames). The hash function should therefore:
      - use all the characters of the key;
      - amplify the differences between keys.

      Keys as numbers
      Usually keys are strings of characters. The easiest approach is to treat them as integers.
      - Example: "abc" becomes 'a' × 256^2 + 'b' × 256 + 'c'
      With very long strings this is impractical, and variants have to be used. In the following, the key is an integer.
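
The base-256 conversion is a direct application of Horner's rule. The second function shows one possible variant for long strings, reducing modulo m at every step so the intermediate value stays small; the slides do not say which variant they have in mind, so treat it as an assumption.

```python
def string_to_int(s: str) -> int:
    """Read the string as a base-256 number: 'abc' -> 'a'*256^2 + 'b'*256 + 'c'."""
    value = 0
    for ch in s:
        value = value * 256 + ord(ch)
    return value


def string_to_int_mod(s: str, m: int) -> int:
    """Same conversion, reduced mod m at each step (assumed variant for long keys)."""
    value = 0
    for ch in s:
        value = (value * 256 + ord(ch)) % m
    return value


print(string_to_int("abc"))
print(string_to_int_mod("paperino", 15))
```

Unlike a plain sum of character codes, this conversion is sensitive to character order, so "abc" and "cba" map to different integers.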

  20. Hash function = mod m
      - k is an integer: h(k) = k mod m
      - Requires m ≥ n/α for a target load factor α, where m is the table size and n the number of elements.

      Choice of m
      - Avoid:
        - powers of 2: dividing by m = 2^p loses the high-order bits of k, so the hash depends only on the p low-order bits;
        - powers of 10: same problem, if k is a decimal number.
      - Use:
        - a prime number far from powers of 2.
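
The effect of a power-of-2 table size is easy to see on keys that differ only in their high-order bits. The keys and the two table sizes below are arbitrary values picked for this sketch.

```python
def division_hash(k: int, m: int) -> int:
    """Division method: h(k) = k mod m."""
    return k % m


# Four keys whose lowest 4 bits are all zero; they differ only in higher bits.
keys = [0x1A0, 0x2B0, 0x3C0, 0x4D0]

pow2_slots = [division_hash(k, 16) for k in keys]    # m = 2^4
prime_slots = [division_hash(k, 13) for k in keys]   # m prime

print(pow2_slots)    # every key falls in the same slot
print(prime_slots)   # the keys are spread over different slots
```

With m = 16 the result equals `k & 15`, so only the last four bits of the key take part in the hash.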
