
Advanced Programming: Dictionaries, Hash Tables

Dictionaries (Maps)

Hash tables


ADT Dictionary or Map

Supports the following operations:

  • INSERT: inserts a new element, associated with a unique value of a field (the key)
  • SEARCH: searches for the element with a given key value; if it exists, returns it
  • DELETE: removes the element with the given key, if it exists


Uses of dictionaries

n Symbol table in a compiler

n Key: nameof identifier n Values: types, context

n Citizens in a country

n Key: social security number n Values: name, surname, age, address


Associative array

A dictionary could easily be implemented with an associative array (a value is indexed by its key instead of by its position). Ex:

  • Citizens = { {"jr50", "john", "red"}, {"bg40", "bill", "green"} }
  • Citizens["jr50"] = {"jr50", "john", "red"}


Goal

Complexity of insert/search/delete:

  • O(1) average case
  • Θ(n) worst case


Hash tables

Implementation of associative arrays: an array containing the elements. The address (index) of an element is computed from its key by a hash function, in time O(1). Ex:

  • Hash("jr50") = 117: element john red is in position 117 of the vector


Associative array

[Figure: universe U of all keys (1 to 9) and the subset K of used keys (2, 3, 5, 8); the array T has one slot per key in U, and only the slots of the used keys hold a key/value pair.]


Dictionary implemented with associative array

  • T: associative array, key: key, x: element
  • Search(T, key)
      • Return T[key]
  • Insert(T, x)
      • T[key[x]] ← x
  • Delete(T, x)
      • T[key[x]] ← NIL
  • Complexity O(1), memory O(|U|)
      • |U|: number of different values of the key
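The three operations can be sketched as a direct-address table. This is a minimal sketch, assuming integer keys in the range 0..|U|-1; the class and method names are illustrative and not part of the slides.

```python
class DirectAddressTable:
    """Direct-address table: one slot per possible key, so memory is O(|U|)."""

    def __init__(self, universe_size):
        # Keys are assumed to be integers in range(universe_size).
        self.slots = [None] * universe_size

    def search(self, key):
        # Return T[key]; None plays the role of NIL.
        return self.slots[key]

    def insert(self, key, value):
        # T[key[x]] <- x
        self.slots[key] = value

    def delete(self, key):
        # T[key[x]] <- NIL
        self.slots[key] = None


table = DirectAddressTable(10)
table.insert(3, "three")
table.insert(8, "eight")
table.delete(8)
```

Every operation is a single array access, which is where the O(1) time comes from; the price is the array of |U| slots allocated up front.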


Assumptions

Two assumptions are needed:

  • No two elements have the same key (keys are unique)
  • The size of T equals the number of possible values of the key, |U|
      • This is critical: if |U| is large, the array is infeasible
      • Ex: key = SSN, 10 chars, |U| = 24^10 ≈ 6.3 × 10^13
          • Assuming an alphabet of 24 values
      • But the citizens of a country are on the order of 10^7 to 10^9
  • It is essential that the size of the array be O(|K|) and not O(|U|)
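The size gap can be checked with a few lines of arithmetic. The 24-symbol alphabet and 10-character key come from the example; taking 10^9 as the population is just the upper end of the stated range.

```python
alphabet_size = 24   # assumed number of symbols per character
key_length = 10      # characters in the key

universe_size = alphabet_size ** key_length   # |U|, number of possible keys
used_keys = 10 ** 9                           # |K|, upper end of the population range

# Fraction of a |U|-sized array that would actually be occupied.
occupancy = used_keys / universe_size
```

Even with a billion citizens, far less than one slot in ten thousand would be in use.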


Hash tables

n A kind of associative array with size O(|K|)

and not O(|U|)

n Insert/search/delete are O(1) on average n However, the way of computing index

given key must be different: hash function


Hash function

n Hash table is array with size m (m<<|U|) n Hash function h, from key to position in

array (index)

n h: U → { 0, 1, ..., m-1 }

n Element x is stored in

n T[h(key[x])]


Hash function

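[Figure: hash function h mapping the keys k1, ..., k5 of universe U to the slots h(k1), h(k4), h(k2) = h(k5), h(k3) of table T (slots up to m-1); k2 and k5 land in the same slot.]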


Collision

n Collision

n when h(ki)=h(kj) and ki ≠ kj,

n Essential to:

n Minimize number of collisions

n Depend on hash function

n Manage collisions


Example

The key is a string of characters. Hash function: h(k) = (Σ ci) mod m, with

  • ci: ASCII code of the i-th char of string k
  • m: number of elements (size) of array T


Ex (II)

m = 15.

n h(“pippo”) = (112+105+112+112+111)mod 15= 552 mod

15 = 12

n h(“pluto”) = (112+108+117+116+111)mod 15= 564 mod

15 = 9

n h(“paperino”) =

(112+97+112+101+114+105+110+111)mod 15= 862 mod 15 = 7

n h(“topolino”) =

(116+111+112+111+108+105+110+111)mod 15= 884 mod 15 = 14

n h(“paperoga”) =

(112+97+112+101+114+111+103+97)mod 15= 847 mod 15 = 7

Collision with strings “paperino” and “paperoga”

Ex (II)

m = 15.

n h("Mickey”) = (77 + 105 + 99 + 107 + 101 + 121) mod 15 = 10 n h("Minnie") = (77 + 105 + 110 + 110 + 105 + 101) mod 15 = 8 n h("Donald") = (68 + 111 + 110 + 97 + 108 + 100) mod 15 = 9 n h("Daisy") = (68 + 97 + 105 + 115 + 121) mod 15 = 11 n h("foo") = (102 + 111 + 111) mod 15 = 9 n h("bar") = (98 + 97 + 114) mod 15 = 9

16 Collision with strings “foo” and “bar”
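The collisions in this example can be found mechanically by grouping the keys by slot. A small sketch, assuming the sum-of-ASCII-codes hash with m = 15 used above; the helper name `ascii_sum_hash` is illustrative.

```python
from collections import defaultdict


def ascii_sum_hash(key, m):
    # h(k) = (sum of the character codes of k) mod m
    return sum(ord(c) for c in key) % m


words = ["Mickey", "Minnie", "Donald", "Daisy", "foo", "bar"]
slots = defaultdict(list)
for word in words:
    slots[ascii_sum_hash(word, 15)].append(word)

# Slots that received more than one key are collisions.
collisions = {slot: keys for slot, keys in slots.items() if len(keys) > 1}
```

Here `collisions` holds the single slot 9, shared by "Donald", "foo" and "bar".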


Collisions mitigation

The best hash functions distribute the |K| elements as uniformly (randomly) as possible among the m available positions.

Typical strategies:

  • pick m as a prime number
  • manipulate the bits of k


Collision management

n Chaining n Open Addressing


Chaining (I)

Position i can contain more than one element.
This can be implemented through a linked list.


Chaining (II)

[Figure: chaining. The keys k1, ..., k6 of universe U are hashed into table T (slots up to m-1); each slot points to a linked list holding all the keys that hash to it.]


Chaining (III)

n T[i] is a pointer to a list, initially NIL. n CHAINED-HASH-INSERT(T,x)

n insert x at head of list T[h(key[x])]

n CHAINED-HASH-SEARCH(T,k)

n Search element with key k in list T[h(k)]

n CHAINED-HASH-DELETE(T,x)

n Cancel x from list T[h(key[x])]


Chaining - Complexity

n Assumption: unorderd list, single chaining n Insert: O(1) n Search: O(length of lists) n Cancel: O(length of lists)

n Requires a search


Search (hash + chaining) - complexity

n We have

n n : number of elements in hash table T n m : size of hash table T n α=n/m: load factor for hash table T

n Normally α>1 n What if m,n→∞ (with same α) ?


Search (hash + chaining) – complexity (II)

n Search

n Worst case: a linked list, not ordered

n Time to compute h(k) + n Time to transverse the list, Θ(n)

n Best case: depends on how uniformly h(k)

distributes the elements

n Let’s assume h(k) is capable of simple uniform

hashing (distributes in perfect uniform way) (this requires that the table grows with the elements, so that α remains constant)


Search (hash + chaining) – complexity (III)

Search:

  • Time to compute h(k): O(1)
  • Time to traverse the list: depends on the length of list T[h(k)] and on whether the element is found or not
  • In both cases the complexity is Θ(1+α)
  • Summing up: O(1) + Θ(1+α) = Θ(1+α), which is O(1) when α is bounded by a constant
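The Θ(1+α) behaviour can be observed with a small simulation. The sketch below is only an experiment under stated assumptions: simple uniform hashing is imitated by picking a uniformly random slot for each key, and n, m and the seed are arbitrary choices. It measures the average number of elements examined in a successful search, which should stay close to 1 + α/2.

```python
import random


def average_successful_search_cost(n, m, seed=0):
    """Average number of elements examined when searching each stored key once."""
    rng = random.Random(seed)
    chain_lengths = [0] * m
    for _ in range(n):
        # Simple uniform hashing is simulated by a uniformly random slot.
        chain_lengths[rng.randrange(m)] += 1
    # In a chain of length L the search costs are 1, 2, ..., L.
    total = sum(length * (length + 1) // 2 for length in chain_lengths)
    return total / n


n, m = 20000, 10000
alpha = n / m
cost = average_successful_search_cost(n, m)
```

Doubling both n and m leaves α, and therefore the measured cost, essentially unchanged, which is the sense in which search is O(1) at constant load factor.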


Open Addressing

T[i] can contain only one element.
In case of collision, another free cell is searched for: the next one, the one after that, etc.
Must be α < 1.


Hash-Insert

HASH-INSERT(T, k)
    i ← 0
    repeat
        j ← h(k, i)
        if T[j] = NIL
            then T[j] ← k
                 return
            else i ← i + 1
    until i = m
    error "hash table overflow"


Hash-Search

HASH-SEARCH(T, k)
    i ← 0
    repeat
        j ← h(k, i)
        if T[j] = k
            then return j
        i ← i + 1
    until T[j] = NIL or i = m
    return NIL
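HASH-INSERT and HASH-SEARCH translate almost line by line into runnable code. This is a sketch under assumptions: keys are integers, the probe function defaults to linear probing with h'(k) = k mod m, and insert returns the chosen slot for convenience. The function names are illustrative.

```python
def linear_probe(k, i, m):
    # h(k, i) = (h'(k) + i) mod m, with h'(k) = k mod m assumed for integer keys.
    return (k % m + i) % m


def hash_insert(table, k, h=linear_probe):
    m = len(table)
    for i in range(m):                 # at most m probes
        j = h(k, i, m)
        if table[j] is None:           # free cell found
            table[j] = k
            return j
    raise OverflowError("hash table overflow")


def hash_search(table, k, h=linear_probe):
    m = len(table)
    for i in range(m):
        j = h(k, i, m)
        if table[j] == k:
            return j
        if table[j] is None:           # an empty cell ends the probe sequence
            break
    return None


table = [None] * 7
positions = [hash_insert(table, k) for k in (10, 17, 24)]   # all three have h'(k) = 3
```

The three keys collide on slot 3 and end up in consecutive cells, so searching for an absent key with the same hash value has to read all of them before reaching an empty cell.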


Re-hash functions

n Linear probing

n h(k, i) = (h’(k)+i) mod m

n Quadratic probing

n h(k, i) = (h’(k)+ c1i + c2i2) mod m

n Double hashing

n h(k, i) = (h1(k)+ i h2(k) ) mod m


Ex - insert

n m = 10 n open addressing with linear probing.

Hash values sequence:

n h(A)=5, h(B)=4, h(C)=9, h(D)=4, h(E)=8,

h(F)=8, h(G)=10


Ex - insert (II)

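Table contents after each insert (cells 1 to 10, "." = empty):

Insert  h    1  2  3  4  5  6  7  8  9  10
A       5    .  .  .  .  A  .  .  .  .  .
B       4    .  .  .  B  A  .  .  .  .  .
C       9    .  .  .  B  A  .  .  .  C  .
D       4    .  .  .  B  A  D  .  .  C  .
E       8    .  .  .  B  A  D  .  E  C  .
F       8    .  .  .  B  A  D  .  E  C  F
G       10   G  .  .  B  A  D  .  E  C  F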


Ex - search (III)

Search:

  • D: (h(D)=4)
      • Read 4
      • Read 5
      • Read 6 ⇒ found
  • G: (h(G)=10)
      • Read 10
      • Read 1 ⇒ found
  • M: (h(M)=4)
      • Read 4
      • Read 5
      • Read 6
      • Read 7 ⇒ not found


Delete

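Very complex, because it changes the rehash/collision sequence: simply emptying a cell would cut the probe sequence of every key stored beyond it.
In practice open addressing is used only if there are no deletes.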


Complexity

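With uniform hashing (an idealization that linear probing only approximates) and α < 1:

  • The expected number of probes in an unsuccessful search is at most $\frac{1}{1-\alpha}$, and the complexity of insert is the same
  • The expected number of probes in a successful search is at most

$$\frac{1}{\alpha} \ln \frac{1}{1-\alpha} + \frac{1}{\alpha}$$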
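To get a feeling for these bounds, the two expressions 1/(1-α) and (1/α) ln(1/(1-α)) + 1/α can be evaluated for a few load factors. The function names and the chosen values of α are illustrative.

```python
import math


def probes_unsuccessful(alpha):
    # Bound 1/(1 - alpha); the same bound applies to an insert.
    return 1 / (1 - alpha)


def probes_successful(alpha):
    # Bound (1/alpha) * ln(1/(1 - alpha)) + 1/alpha
    return (1 / alpha) * math.log(1 / (1 - alpha)) + 1 / alpha


for alpha in (0.25, 0.5, 0.75, 0.9):
    print(f"alpha={alpha}: miss <= {probes_unsuccessful(alpha):.2f}, "
          f"hit <= {probes_successful(alpha):.2f}")
```

Both bounds grow without limit as α approaches 1, which is why open addressing tables are kept well below full.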


Hash functions


Uniform hashing

The best hash functions achieve uniform hashing: if keys are drawn with probability P(k), each of the m values of h(k) should be equally probable

$$\sum_{k \,:\, h(k) = j} P(k) = \frac{1}{m}, \qquad j = 0, 1, \ldots, m-1$$


Keys are not uniform

However, keys are often not equally distributed (ex: words in a language, names and surnames):

  • use all characters
  • amplify the differences


Keys as numbers

Usually keys are strings of characters. The easiest thing is to treat them as integers.

  • Ex: "abc" becomes 'a'·256² + 'b'·256 + 'c'

However, with very long strings this is impractical and variants have to be used. In the following, the key is an integer.
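The string-to-integer conversion reads the characters as digits of a base-256 number. The sketch below assumes every character code is below 256; `string_to_int` and `string_hash` are illustrative names, and reducing mod m at every step is shown only as one possible variant for long strings (the table size 701 is an arbitrary choice).

```python
def string_to_int(s: str) -> int:
    """Read the characters of s as digits of a base-256 number."""
    value = 0
    for c in s:
        value = value * 256 + ord(c)   # assumes every code is below 256
    return value


def string_hash(s: str, m: int) -> int:
    """Same value mod m, reduced at every step so intermediate numbers stay below 256*m."""
    value = 0
    for c in s:
        value = (value * 256 + ord(c)) % m
    return value


n = string_to_int("abc")          # 97*256**2 + 98*256 + 99
slot = string_hash("abc", 701)
```

Reducing at each step gives the same result as converting the whole string first and taking the remainder at the end, without ever building the huge integer.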


Hash function = mod m

n k is an integer :

n h(k) = k mod m

n Requires m≥n/α.

n m size, n number of elements


Choice of m

n Avoid

n Powers of 2

n Division by m looses high bits of k

n Powers of 10

n Same as above, if k is decimal number

n Use

n A prime number n Far from powers of 2


Ex

n n = 2000 n On average 3 comparisons in searches n m = 701 is a prime, close to 2000/3 but far

from powers of 2

n h(k) = k mod 701


Hash function = multiply

n K integer:

n A constant 0<A<1 n Frac(x) = x - ⎣x⎦ n h(k) = ⎣ m ⋅ frac(k ⋅ A) ⎦

n k⋅A “shuffles” bits of k, n Multiplying by m expands [0,1] in [0,m]


Choice of m and A

n M is not critical. Using a power of 2

simplifies the multiplication

n Best A depends on how keys are

statistically distributed

n A = (√5 – 1) / 2 = 0.6180339887... Is a

good choice
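The multiplication method can be sketched with floating-point arithmetic. This is a minimal sketch, assuming keys small enough that k·A is represented accurately as a double; the key 123456 and the table size 2^14 are arbitrary illustration values, and `multiplication_hash` is an illustrative name.

```python
import math

A = (math.sqrt(5) - 1) / 2   # 0.6180339887...


def multiplication_hash(k: int, m: int, a: float = A) -> int:
    frac = (k * a) % 1.0          # frac(k*A) = k*A - floor(k*A)
    return math.floor(m * frac)   # expand [0, 1) into [0, m)


h = multiplication_hash(123456, 2 ** 14)
```

For very large keys a fixed-point integer version of the same computation would be preferable, since the fractional part of k·A loses precision as k grows.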