Hash Tables Direct-Address Tables Hash Functions Universal Hashing - - PowerPoint PPT Presentation



SLIDE 1

CS 5633 Analysis of Algorithms Chapter 11: Slide – 1

Hash Tables

Direct-Address Tables Hash Functions Universal Hashing Chaining Open Addressing

SLIDE 2

Direct-Address Tables


Let U = {0, ..., m − 1} be the set of possible keys.
Use an array T[0 ... m − 1] as a direct-address table. This implies a one-to-one correspondence between keys and slots.

Direct-Address-Search(T, k)
    return T[k]

Direct-Address-Insert(T, x)
    T[x.key] ← x

Direct-Address-Delete(T, x)
    T[x.key] ← nil

Advantage: every operation is Θ(1).
Disadvantage: Θ(|U|) space is required, no matter how few keys are stored.
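The three operations above can be sketched in Python as follows. This is a minimal illustration, not part of the original slides; the names Element, value, and DirectAddressTable are hypothetical, and nil is represented by None.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Element:
    key: int    # must lie in the universe {0, ..., m - 1}
    value: str  # hypothetical satellite data


class DirectAddressTable:
    def __init__(self, m: int) -> None:
        # One slot per possible key, so space is Theta(|U|).
        self.slots = [None] * m

    def search(self, k: int) -> Optional[Element]:
        return self.slots[k]

    def insert(self, x: Element) -> None:
        self.slots[x.key] = x

    def delete(self, x: Element) -> None:
        self.slots[x.key] = None


table = DirectAddressTable(m=8)
table.insert(Element(3, "three"))
found = table.search(3)            # the element stored under key 3
table.delete(Element(3, "three"))
missing = table.search(3)          # None once the slot is cleared
```

Each method is a single array access, which is where the Θ(1) bound comes from.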

SLIDE 3

Hash Tables


Let K be the set of keys to be stored.
Goal: use Θ(|K|) space and Θ(1) time per operation.
Idea: use an array T[0 ... m − 1] as a hash table, together with a Θ(1) hash function h: U → {0, ..., m − 1} that maps keys to slots.
A collision occurs when two distinct keys map to the same slot.

SLIDE 4

Good Hash Functions


Division method: h(k) = k mod m
    m is prime and not close to any power of two 2^i.

Division variation: h(k) = (k mod M) mod m
    M is prime, much smaller than |U|, and not close to any 2^i.
    m is much smaller than M.

Multiplication method: h(k) = ⌊m ((kA) mod 1)⌋
    (kA) mod 1 is the fractional part of kA.
    m is a power of 2.
    A = (√5 − 1)/2 ≈ 0.618
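As a rough sketch, the three methods translate to Python as shown below. The sample key 123456 and the table sizes 701, 7919, and 2^14 are illustrative assumptions, not values from the slides, and the multiplication method here uses floating-point arithmetic, which is only adequate for moderately sized keys.

```python
import math

# The constant A from the slide, about 0.618.
A = (math.sqrt(5) - 1) / 2


def hash_division(k: int, m: int) -> int:
    """Division method: m should be a prime not close to a power of 2."""
    return k % m


def hash_division_variation(k: int, big_m: int, m: int) -> int:
    """Division variation: reduce mod a large prime M first, then mod m."""
    return (k % big_m) % m


def hash_multiplication(k: int, m: int) -> int:
    """Multiplication method: m should be a power of 2."""
    frac = (k * A) % 1.0  # fractional part of k * A
    return math.floor(m * frac)


d = hash_division(123456, 701)
v = hash_division_variation(123456, 7919, 701)
p = hash_multiplication(123456, 2 ** 14)
```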

SLIDE 5

Horner’s Method for Division Hash Function


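If the key is a sequence of digits k = k[1], ..., k[l] in radix r, with 0 ≤ k[i] < r, compute the hash function by:

    h ← k[1] mod m
    for i ← 2 to l do
        h ← (r·h + k[i]) mod m

The result equals the value of k in radix r, taken mod m, without ever forming the large number.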
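A short Python sketch of the loop above, applied to a byte string treated as a radix-256 number. The function name horner_hash, the sample key, and the table size 701 are assumptions chosen for illustration.

```python
def horner_hash(digits, r: int, m: int) -> int:
    """Hash a non-empty digit sequence (each digit in 0..r-1) as a radix-r number mod m."""
    h = digits[0] % m
    for d in digits[1:]:
        # Intermediate values stay below r * m, however long the key is.
        h = (r * h + d) % m
    return h


key = b"hash tables"
h_value = horner_hash(list(key), r=256, m=701)

# Same result as building the full integer first and reducing once.
direct = int.from_bytes(key, "big") % 701
```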

SLIDE 6

Universal Hashing


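Let H be a set of hash functions. H is universal if, for any two distinct keys k ≠ k′ and h chosen at random from H, h(k) = h(k′) with probability at most 1/m.

Construction:
    m is a prime number.
    Write each key as k = k[1], ..., k[l], where 0 ≤ k[i] < m.
    Assign a[i] ← Random(0, m − 1) for i = 1, ..., l.

    $h(k) = \left( \sum_{i=1}^{l} a[i] \cdot k[i] \right) \bmod m$

The set of possible functions h is universal: for k ≠ k′, h(k) = h(k′) with probability 1/m.
Reason: if k[i] ≠ k′[i], then (a[i] · (k[i] − k′[i])) mod m is equally likely to be any value in {0, ..., m − 1}, because m is prime.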
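The construction can be sketched in Python as below. The helper names make_universal_hash and collision_fraction are hypothetical; the second one enumerates every coefficient vector for a small prime m to confirm the 1/m collision probability exactly, which is only feasible for tiny m and l.

```python
import itertools
import random
from fractions import Fraction


def make_universal_hash(l: int, m: int, rng: random.Random):
    """Pick a[1..l] at random and return h(k) = (sum a[i] * k[i]) mod m."""
    a = [rng.randrange(m) for _ in range(l)]

    def h(k):
        return sum(ai * ki for ai, ki in zip(a, k)) % m

    return h


def collision_fraction(k1, k2, m: int) -> Fraction:
    """Fraction of all coefficient vectors a for which k1 and k2 collide."""
    l = len(k1)
    collisions = 0
    for a in itertools.product(range(m), repeat=l):
        h1 = sum(ai * ki for ai, ki in zip(a, k1)) % m
        h2 = sum(ai * ki for ai, ki in zip(a, k2)) % m
        collisions += h1 == h2
    return Fraction(collisions, m ** l)


h = make_universal_hash(l=3, m=7, rng=random.Random(0))
slot = h((1, 2, 3))                           # some slot in 0..6
frac = collision_fraction((1, 2), (1, 3), 5)  # exactly 1/5
```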

SLIDE 7

Chaining


In chaining, each slot holds a linked list of the elements that hash to that slot, so colliding elements share a list.
Consider m slots and n elements, with load factor α = n/m.

Worst case: Θ(n), if all elements hash to the same slot.
Best case: Θ(1 + α), when each slot holds ⌊α⌋ or ⌈α⌉ elements.
Average case: assume each key is equally likely to hash to any slot.
    Unsuccessful search: Θ(1 + α), because the average list length is α.
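A minimal chaining sketch in Python, assuming integer keys and the division method h(k) = k mod m. Python lists stand in for the linked lists here, so deletion inside a chain is not O(1) as it would be with a doubly linked list; the class and method names are hypothetical.

```python
class ChainedHashTable:
    def __init__(self, m: int) -> None:
        self.m = m                            # number of slots
        self.n = 0                            # number of stored elements
        self.slots = [[] for _ in range(m)]   # one chain per slot

    def _h(self, key: int) -> int:
        return key % self.m

    def insert(self, key: int) -> None:
        self.slots[self._h(key)].append(key)
        self.n += 1

    def search(self, key: int) -> bool:
        # Scans only the chain for this key's slot.
        return key in self.slots[self._h(key)]

    def delete(self, key: int) -> None:
        chain = self.slots[self._h(key)]
        if key in chain:
            chain.remove(key)
            self.n -= 1

    def load_factor(self) -> float:
        return self.n / self.m


table = ChainedHashTable(m=4)
for key in (1, 5, 9, 2):
    table.insert(key)   # 1, 5, and 9 all collide in slot 1
```

With m = 4 and these four keys the load factor is 1, yet one chain has length 3, which illustrates the gap between the average and worst cases.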

SLIDE 8

Chaining, Part 2


Successful search: Θ(1 + α)
Before the ith element is inserted, the average list length is (i − 1)/m.
Expected position of the ith element in its list = 1 + (i − 1)/m.
There are n elements to search for, so the probability of searching for the ith element is 1/n.
The expected search length is the summation:

    $\sum_{i=1}^{n} \frac{1}{n} \left( 1 + \frac{i-1}{m} \right) = 1 + \frac{\alpha}{2} - \frac{1}{2m}$
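The closed form can be checked with exact rational arithmetic. The sketch below evaluates the summation term by term and compares it with 1 + α/2 − 1/(2m); the function names and the sample sizes are assumptions for illustration.

```python
from fractions import Fraction


def expected_search_length(n: int, m: int) -> Fraction:
    """Evaluate sum over i of (1/n) * (1 + (i - 1)/m) exactly."""
    return sum(
        Fraction(1, n) * (1 + Fraction(i - 1, m)) for i in range(1, n + 1)
    )


def closed_form(n: int, m: int) -> Fraction:
    """1 + alpha/2 - 1/(2m), with alpha = n/m."""
    alpha = Fraction(n, m)
    return 1 + alpha / 2 - Fraction(1, 2 * m)


agree = all(
    expected_search_length(n, m) == closed_form(n, m)
    for n, m in [(1, 1), (5, 10), (20, 7), (100, 64)]
)
```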

SLIDE 9

Open-Address Hashing


In open addressing, when a collision occurs, probe for an empty slot and insert the new element there.
The hash function takes the probe number as a second argument:

    h : U × {0, ..., m − 1} → {0, ..., m − 1}

The probe sequence h(k, 0), ..., h(k, m − 1) should include every slot, i.e., it should be a permutation of {0, ..., m − 1}.

SLIDE 10

Open-Address Hashing, Part 2


Hash-Insert(T, x)
    for i ← 0 to m − 1 do
        j ← h(x.key, i)
        if T[j] = nil then
            T[j] ← x
            return j
    error "hash table overflow"

Hash-Delete marks the slot as deleted.
Hash-Search must continue past deleted slots.
Hash-Insert can put new elements in deleted slots.
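The interaction between insert, search, and delete can be sketched as follows. This is an illustrative Python version, not the slides' code: it stores bare integer keys, assumes keys are distinct, and defaults to the probe function (k + i) mod m purely as a placeholder. DELETED is a hypothetical sentinel for the "deleted" mark.

```python
DELETED = object()  # sentinel for a slot whose element was removed


class OpenAddressTable:
    def __init__(self, m: int, probe=None) -> None:
        self.m = m
        self.slots = [None] * m
        # Placeholder probe function h(k, i); any full-coverage probe works.
        self.probe = probe or (lambda k, i: (k + i) % m)

    def insert(self, key: int) -> int:
        for i in range(self.m):
            j = self.probe(key, i)
            # Both empty and deleted slots may receive a new element.
            if self.slots[j] is None or self.slots[j] is DELETED:
                self.slots[j] = key
                return j
        raise OverflowError("hash table overflow")

    def search(self, key: int):
        for i in range(self.m):
            j = self.probe(key, i)
            if self.slots[j] is None:
                return None  # a truly empty slot ends the search
            if self.slots[j] is not DELETED and self.slots[j] == key:
                return j
            # Otherwise keep probing, including past deleted slots.
        return None

    def delete(self, key: int):
        j = self.search(key)
        if j is not None:
            self.slots[j] = DELETED
        return j


t = OpenAddressTable(5)
positions = [t.insert(0), t.insert(5), t.insert(10)]  # slots 0, 1, 2
t.delete(5)                  # slot 1 becomes DELETED
still_found = t.search(10)   # probing continues past slot 1
reused = t.insert(15)        # lands in the deleted slot 1
```

If delete simply reset the slot to None, the search for 10 would stop at slot 1 and wrongly report the key as absent.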

SLIDE 11

Uniform Hashing Analysis


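Uniform hashing assumes each open-address probe sequence is equally likely.

Unsuccessful search: $\Theta\left(\frac{1}{1-\alpha}\right)$

Let p_i = probability that exactly i probes find full slots.
Let q_i = probability that the first i probes find full slots.

    $p_i = q_i - q_{i+1}$
    $q_1 = \frac{n}{m} = \alpha$ and $q_2 = \frac{n}{m} \cdot \frac{n-1}{m-1} < \alpha^2$
    $q_i = \prod_{k=0}^{i-1} \frac{n-k}{m-k} \le \left( \frac{n}{m} \right)^i = \alpha^i$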
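These probabilities are easy to evaluate exactly for a small table. The sketch below, with hypothetical function names and the sample values n = 5, m = 10, computes q_i from the product formula, checks the bound q_i ≤ α^i, and confirms that the p_i sum to 1.

```python
from fractions import Fraction


def q(i: int, n: int, m: int) -> Fraction:
    """Probability that the first i probes all find full slots."""
    result = Fraction(1)
    for k in range(i):
        result *= Fraction(n - k, m - k)
    return result


def p(i: int, n: int, m: int) -> Fraction:
    """Probability that exactly i probes find full slots."""
    return q(i, n, m) - q(i + 1, n, m)


n, m = 5, 10
alpha = Fraction(n, m)
bound_holds = all(q(i, n, m) <= alpha ** i for i in range(1, n + 1))
total = sum(p(i, n, m) for i in range(0, n + 1))  # telescopes to q_0 - q_{n+1}
```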

SLIDE 12

Uniform Hashing Analysis, Part 2


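Average number of probes is:

    $1 + \sum_{i=1}^{n} i \, p_i = 1 + \sum_{i=1}^{n} q_i \le \sum_{i=0}^{\infty} \alpha^i = \frac{1}{1-\alpha}$

Successful search: $\Theta\left( \frac{1}{\alpha} \ln \frac{1}{1-\alpha} \right)$

Inserting the ith element costs the same as an unsuccessful search in a table holding i − 1 elements.
Average number of probes is:

    $\sum_{i=1}^{n} \frac{1}{n} \cdot \frac{1}{1 - (i-1)/m} \le \frac{1}{\alpha} \ln \frac{1}{1-\alpha}$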
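Both bounds can be checked numerically for a small table. The following sketch evaluates the two sums exactly and compares them with 1/(1 − α) and (1/α) ln(1/(1 − α)); the function names and the sample values n = 5, m = 10 are assumptions for illustration.

```python
import math
from fractions import Fraction


def q(i: int, n: int, m: int) -> Fraction:
    """Probability that the first i probes all find full slots."""
    result = Fraction(1)
    for k in range(i):
        result *= Fraction(n - k, m - k)
    return result


def unsuccessful_probes(n: int, m: int) -> Fraction:
    """1 + sum of q_i for i = 1..n."""
    return 1 + sum(q(i, n, m) for i in range(1, n + 1))


def successful_probes(n: int, m: int) -> Fraction:
    """Average over i of 1 / (1 - (i - 1)/m)."""
    return sum(
        Fraction(1, n) / (1 - Fraction(i - 1, m)) for i in range(1, n + 1)
    )


n, m = 5, 10
alpha = n / m
u = unsuccessful_probes(n, m)   # 11/6, below 1/(1 - alpha) = 2
s = successful_probes(n, m)     # about 1.29
s_bound = (1 / alpha) * math.log(1 / (1 - alpha))  # about 1.39
```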

SLIDE 13

Performance of Practical Methods


Linear probing: h(k, i) = (h′(k) + i) mod m

Successful search: $\Theta\left( \frac{1}{1-\alpha} \right)$
Unsuccessful search: $\Theta\left( \frac{1}{(1-\alpha)^2} \right)$

Linear probing suffers from primary clustering, that is, from long runs of occupied slots. An empty slot preceded by i full slots gets filled next with probability (i + 1)/m.
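The (i + 1)/m claim can be verified on a small occupancy pattern. This sketch assumes the initial probe h′(k) is uniform over the m slots and computes, for each empty slot, the probability that the next insertion lands there; the function name and the 8-slot pattern are hypothetical.

```python
from fractions import Fraction


def next_fill_probabilities(occupied):
    """Map each empty slot to the probability that it is filled next."""
    m = len(occupied)
    if all(occupied):
        raise ValueError("table is full")
    probs = {}
    for start in range(m):  # start plays the role of h'(k)
        j = start
        while occupied[j]:  # linear probing: step to the next slot
            j = (j + 1) % m
        probs[j] = probs.get(j, 0) + Fraction(1, m)
    return probs


# Slots 0-2 form a run of three full slots; slot 5 is an isolated full slot.
pattern = [True, True, True, False, False, True, False, False]
probs = next_fill_probabilities(pattern)
```

Slot 3 sits behind a run of three full slots and receives probability 4/8, while slot 4 receives only 1/8, so the existing run is the most likely one to grow.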

SLIDE 14

Performance of Practical Methods, Part 2


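Quadratic probing, assuming m is a power of 2:

    $h(k, i) = \left( h'(k) + \frac{i}{2} + \frac{i^2}{2} \right) \bmod m$

Successful search: $\Theta\left( \frac{1}{\alpha} \ln \frac{1}{1-\alpha} \right)$
Unsuccessful search: $\Theta\left( \frac{1}{1-\alpha} \right)$

Double hashing, where m is prime and 1 ≤ h_2(k) ≤ m − 1:

    $h(k, i) = (h_1(k) + i \, h_2(k)) \bmod m$

Successful search: $\Theta\left( \frac{1}{\alpha} \ln \frac{1}{1-\alpha} \right)$
Unsuccessful search: $\Theta\left( \frac{1}{1-\alpha} \right)$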
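The conditions on m matter because they make each probe sequence visit every slot. The sketch below checks this for small tables; the function names and the sizes 16, 13, and 12 are illustrative assumptions, and the initial hash values are passed in directly rather than computed from a key.

```python
def quadratic_probe(h: int, i: int, m: int) -> int:
    """(h + i/2 + i^2/2) mod m; i + i*i is always even, so // is exact."""
    return (h + (i + i * i) // 2) % m


def double_hash_probe(h1: int, h2: int, i: int, m: int) -> int:
    """(h1 + i * h2) mod m."""
    return (h1 + i * h2) % m


def covers_all_slots(seq, m: int) -> bool:
    return sorted(seq) == list(range(m))


# Quadratic probing with m a power of 2: every start value covers the table.
quad_ok = all(
    covers_all_slots([quadratic_probe(h, i, 16) for i in range(16)], 16)
    for h in range(16)
)

# Double hashing with m prime: every step size 1..m-1 covers the table.
double_ok = all(
    covers_all_slots([double_hash_probe(0, h2, i, 13) for i in range(13)], 13)
    for h2 in range(1, 13)
)

# With a composite m, a step sharing a factor with m misses slots.
composite_ok = covers_all_slots(
    [double_hash_probe(0, 4, i, 12) for i in range(12)], 12
)
```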