SLIDE 1
Advanced Algorithms – COMS31900
Hashing part two: Static Perfect Hashing
Raphaël Clifford
Slides by Benjamin Sach
SLIDE 5 Dictionaries and Hashing recap
A dynamic dictionary stores (key, value)-pairs and supports:
add(key, value), lookup(key) (which returns value) and delete(key)
Universe U of u keys. Hash table T of size m ≥ n.
n arbitrary operations arrive online, one at a time.
A hash function maps a key x to position h(x)
- i.e. T[h(x)] = (key, value).
Collisions were fixed by chaining (building linked lists)
A set H of hash functions is weakly universal if for any two keys x, y ∈ U (with x ≠ y),
$\Pr(h(x) = h(y)) \le \frac{1}{m}$
(h is picked uniformly at random from H)
Using weakly universal hashing:
For any n operations, the expected run-time is O(1) per operation.
But this doesn't tell us much about the worst-case behaviour
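The slides do not fix a particular weakly universal family. As a minimal sketch, assuming integer keys smaller than a fixed prime P, the standard family h(x) = ((a·x + b) mod P) mod m can be sampled as below. The names `P`, `random_hash` and `collision_rate` are illustrative and do not come from the lecture.

```python
import random

# Assumption: every key is an integer in range(P), where P is prime.
P = 2_147_483_647  # the Mersenne prime 2**31 - 1


def random_hash(m, rng):
    """Pick h uniformly from the family x -> ((a*x + b) mod P) mod m."""
    a = rng.randrange(1, P)  # a is never 0
    b = rng.randrange(0, P)
    return lambda x: ((a * x + b) % P) % m


def collision_rate(x, y, m, trials=20000, seed=0):
    """Estimate Pr(h(x) == h(y)) over random choices of h."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        h = random_hash(m, rng)
        hits += h(x) == h(y)
    return hits / trials
```

For two distinct keys, the estimate returned by `collision_rate` should sit close to, and not noticeably above, 1/m.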
SLIDE 10 Static Dictionaries and Perfect hashing
A static dictionary stores (key, value)-pairs and supports:
lookup(key) (which returns value) - no inserts or deletes are allowed
we are given n different (key, value)-pairs and want to pick a good h
Universe U of u keys. Hash table T of size m ≥ n.
A hash function maps a key x to position h(x)
- i.e. T[h(x)] = (key, value).
Collisions were fixed by chaining (building linked lists)
THEOREM
The FKS hashing scheme:
- Has no collisions
- Every lookup takes O(1) worst-case time,
- Uses O(n) space,
- Can be built in O(n) expected time.
The rest of this lecture is devoted to the FKS scheme
The construction is based on weak universal hashing (with an O(1) time hash function)
SLIDE 21 Perfect hashing - a first attempt
Linearity of Expectation
Let Y_1, Y_2, . . . , Y_k be k random variables. Then
$$E\Big(\sum_{i=1}^{k} Y_i\Big) = \sum_{i=1}^{k} E(Y_i)$$
SLIDE 25 Perfect hashing - a first attempt
By the definition of expectation. . .
$$E(I_{x,y}) = 1 \cdot \Pr(I_{x,y} = 1) + 0 \cdot \Pr(I_{x,y} = 0) \le \frac{1}{m}$$
where indicator random variable I_{x,y} = 1 iff h(x) = h(y).
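SLIDE 31 Perfect hashing - a first attempt
A set H of hash functions is weakly universal if for any two keys x, y ∈ U (x ≠ y),
$\Pr(h(x) = h(y)) \le \frac{1}{m}$
where h is picked uniformly at random from H
Step 1: Insert everything into a hash table of size m = n using a weakly universal hash function
Step 2: Check for collisions
Step 3: Repeat if necessary
How many collisions do we get on average?
Let C be the number of collisions. Then
$$E(C) = E\Big(\sum_{x,y \in T,\ x<y} I_{x,y}\Big) = \sum_{x,y \in T,\ x<y} E(I_{x,y}) \le \sum_{x,y \in T,\ x<y} \frac{1}{m} = \binom{n}{2} \frac{1}{m} \le \frac{n^2}{2m} = \frac{n}{2}.$$
where indicator random variable I_{x,y} = 1 iff h(x) = h(y).
The second equality is linearity of expectation, the first inequality is the definition of expectation (with weak universality), and the last inequality uses $\binom{n}{2} \le n^2/2$.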
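The bound E(C) ≤ n/2 for m = n can be checked empirically. The sketch below is illustrative only: it assumes integer keys and reuses the ((a·x + b) mod P) mod m family, which the slides do not specify, and the names `count_collisions` and `average_collisions` are made up for this example.

```python
import random
from collections import Counter

P = 2_147_483_647  # assumed prime larger than every key


def random_hash(m, rng):
    """Sample h(x) = ((a*x + b) mod P) mod m (an assumed weakly universal family)."""
    a = rng.randrange(1, P)
    b = rng.randrange(0, P)
    return lambda x: ((a * x + b) % P) % m


def count_collisions(keys, h):
    """C = number of pairs x < y with h(x) == h(y)."""
    bucket_sizes = Counter(h(x) for x in keys)
    # A bucket holding c keys contributes c choose 2 colliding pairs.
    return sum(c * (c - 1) // 2 for c in bucket_sizes.values())


def average_collisions(keys, m, trials=300, seed=1):
    """Average C over independently sampled hash functions."""
    rng = random.Random(seed)
    total = sum(count_collisions(keys, random_hash(m, rng)) for _ in range(trials))
    return total / trials
```

With n keys and m = n, the average should come out at roughly n/2 or below; with m = n² it should drop to about 1/2 or below.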
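SLIDE 37 Perfect hashing - a second attempt
A set H of hash functions is weakly universal if for any two keys x, y ∈ U (x ≠ y),
$\Pr(h(x) = h(y)) \le \frac{1}{m}$
where h is picked uniformly at random from H
Step 1: Insert everything into a hash table of size m = n^2 using a weakly universal hash function
Step 2: Check for collisions
Step 3: Repeat if necessary
How many collisions do we get on average?
Let C be the number of collisions. Then
$$E(C) = E\Big(\sum_{x,y \in T,\ x<y} I_{x,y}\Big) = \sum_{x,y \in T,\ x<y} E(I_{x,y}) \le \sum_{x,y \in T,\ x<y} \frac{1}{m} = \binom{n}{2} \frac{1}{m} \le \frac{n^2}{2m} = \frac{1}{2}$$
where indicator random variable I_{x,y} = 1 iff h(x) = h(y).
The second equality is linearity of expectation, the first inequality is the definition of expectation (with weak universality), and the last inequality uses $\binom{n}{2} \le n^2/2$.
much better! (except we cheated)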
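The three steps of the second attempt can be sketched as follows. This is one possible reading, not the lecture's own code: it assumes integer keys below a prime P and an illustrative hash family, and `build_quadratic` and `lookup` are hypothetical names.

```python
import random

P = 2_147_483_647  # assumed prime larger than every key


def random_hash(m, rng):
    """Sample h(x) = ((a*x + b) mod P) mod m (an assumed weakly universal family)."""
    a = rng.randrange(1, P)
    b = rng.randrange(0, P)
    return lambda x: ((a * x + b) % P) % m


def build_quadratic(pairs, rng):
    """Retry random hash functions until n distinct keys land in distinct slots of n*n."""
    n = len(pairs)
    m = max(1, n * n)
    while True:
        h = random_hash(m, rng)          # Step 1: fresh h, table of size n^2
        table = [None] * m
        collided = False
        for key, value in pairs:
            slot = h(key)
            if table[slot] is not None:  # Step 2: a collision was found
                collided = True
                break
            table[slot] = (key, value)
        if not collided:                 # Step 3: otherwise repeat
            return h, table


def lookup(h, table, key):
    """One hash evaluation and one array access."""
    entry = table[h(key)]
    return entry[1] if entry is not None and entry[0] == key else None
```

The table has n² slots for n items, which is the price paid for the small collision bound.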
SLIDE 41
Expected construction time
Markov's inequality
If X is a non-negative r.v., then for all a > 0,
$$\Pr(X \ge a) \le \frac{E(X)}{a}.$$
SLIDE 48
Expected construction time
Step 1: Insert everything into a hash table of size m = n^2 using a weakly universal hash function
Step 2: Check for collisions
Step 3: Repeat if there was a collision
How many times do we repeat on average?
The expected number of collisions: $E(C) \le \frac{1}{2}$
The probability of at least one collision: $\Pr(C \ge 1) \le \frac{1}{2}$ (by Markov's inequality)
The probability of zero collisions is at least $\frac{1}{2}$
i.e. at least as good as tossing a heads on a fair coin
E(runs) ≤ E(coin tosses to get a heads) = 2
E(construction time) = O(m)·E(runs) = O(m) = O(n^2)
. . . and then the look-up time is always O(1) (because any h(x) can be computed in O(1) time)
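The value 2 for the coin-toss expectation can be spelled out. The following is a short worked derivation, assuming each run draws a fresh hash function independently and succeeds with probability p ≥ 1/2; the symbols R and p are introduced here for illustration.

```latex
% R = number of runs until the first success, p = Pr(a run has no collision) >= 1/2
\[
  E(R) \;=\; \sum_{k \ge 1} k \,(1-p)^{k-1}\, p \;=\; \frac{1}{p} \;\le\; 2 .
\]
```

For a fair coin p = 1/2 exactly, which gives the 2 tosses quoted on the slide.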
SLIDE 52
Expected construction time
Markov's inequality (where a = n)
If X is a non-negative r.v., then for all a > 0,
$$\Pr(X \ge a) \le \frac{E(X)}{a}.$$
SLIDE 55
Expected construction time
Step 1: Insert everything into a hash table of size m = n using a weakly universal hash function
Step 2: Check for collisions
Step 3: Repeat if there are more than n collisions
This looks rubbish but it will be useful in a bit!
How many times do we repeat on average?
The expected number of collisions: $E(C) \le \frac{n}{2}$
The probability of at least n collisions: $\Pr(C \ge n) \le \frac{1}{2}$
The probability of at most n collisions is at least $\frac{1}{2}$
i.e. at least as good as tossing a heads on a fair coin
E(runs) ≤ E(coin tosses to get a heads) = 2
E(construction time) = O(m)·E(runs) = O(m) = O(n)
. . . but the look-up time could be rubbish (lots of collisions)
SLIDE 58
Perfect hashing - attempt three
Step 1: Insert everything into a hash table, T, of size n using a weakly universal hash function, h
Let n_i be the number of items in T[i]
[Figure: the table T of size n; in the example shown, n_1 = 2, n_5 = 2 and n_8 = 3]
SLIDE 62
Perfect hashing - attempt three
(Step 3) Immediately repeat a step if either
a) T has more than n collisions
b) some T_i has a collision
i.e. check (and if necessary rebuild) each table immediately after building it
SLIDE 65 Perfect hashing - attempt three
Step 1: Insert everything into a hash table, T, of size n using a weakly universal hash function, h . . . but don't use chaining
Let n_i be the number of items in T[i]
Step 2: The n_i items in T[i] are inserted into another hash table T_i of size n_i^2 using another weakly universal hash function denoted h_i (there is one for each i)
(Step 3) Immediately repeat a step if either
a) T has more than n collisions
b) some T_i has a collision
The look-up time is always O(1)
- 1. Compute i = h(x) (x is the key)
- 2. Compute j = h_i(x)
- 3. The item is in T_i[j]
Two questions remain:
What is the expected construction time?
What is the space usage?
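The two-level construction and the three-step lookup can be sketched as below. This is a hedged illustration rather than the lecture's implementation: it assumes integer keys below a prime P, uses the ((a·x + b) mod P) mod m family as a stand-in for "a weakly universal hash function", and the names `build_fks` and `lookup` are hypothetical.

```python
import random

P = 2_147_483_647  # assumed prime larger than every key


def random_hash(m, rng):
    """Sample h(x) = ((a*x + b) mod P) mod m (an assumed weakly universal family)."""
    a = rng.randrange(1, P)
    b = rng.randrange(0, P)
    return lambda x: ((a * x + b) % P) % m


def build_fks(pairs, seed=None):
    rng = random.Random(seed)
    n = len(pairs)
    # Step 1: top-level table of size n; rebuild while it has more than n collisions.
    while True:
        h = random_hash(n, rng)
        buckets = [[] for _ in range(n)]
        for key, value in pairs:
            buckets[h(key)].append((key, value))
        collisions = sum(len(b) * (len(b) - 1) // 2 for b in buckets)
        if collisions <= n:
            break
    # Step 2: bucket i gets its own table T_i of size n_i^2 and its own h_i;
    # each T_i is rebuilt immediately if it contains a collision.
    second = []
    for bucket in buckets:
        size = len(bucket) ** 2
        while True:
            h_i = random_hash(size, rng) if size else None
            table = [None] * size
            collided = False
            for key, value in bucket:
                j = h_i(key)
                if table[j] is not None:
                    collided = True
                    break
                table[j] = (key, value)
            if not collided:
                break
        second.append((h_i, table))
    return h, second


def lookup(fks, key):
    h, second = fks
    if not second:                 # empty dictionary
        return None
    h_i, table = second[h(key)]    # 1. compute i = h(x)
    if not table:                  # bucket i is empty
        return None
    entry = table[h_i(key)]        # 2. compute j = h_i(x); 3. look in T_i[j]
    return entry[1] if entry is not None and entry[0] == key else None
```

Every lookup evaluates at most two hash functions and reads one slot, regardless of how the keys were distributed during construction.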
SLIDE 73 Perfect Hashing - Space usage n
n2
i
Step 1: Insert everything into a hash table, T , of size n using a weakly universal (w.u.) hash function, h Step 2: The ni items in T[i] are inserted into another hash table Ti
i using w.u hash function hi
How much space does this use? (Step 3) Immediately repeat if either a) T has more than n collisions b) some Ti has a collision
T Ti
The size of T is O(n) The size of Ti is O(ni2) So the total space is. . . Storing hi uses O(1) space
how big is this?
O(n)+
O(n2
i ) = O(n)+O
i
n2
i
SLIDE 74 Perfect Hashing - Space usage n
n2
i
Step 1: Insert everything into a hash table, T , of size n using a weakly universal (w.u.) hash function, h Step 2: The ni items in T[i] are inserted into another hash table Ti
i using w.u hash function hi
How much space does this use? (Step 3) Immediately repeat if either a) T has more than n collisions b) some Ti has a collision
T Ti
The size of T is O(n) The size of Ti is O(ni2) So the total space is. . . Storing hi uses O(1) space
how big is this?
O(n)+
O(n2
i ) = O(n)+O
i
n2
i
How big is
i n2 i ?
SLIDE 75 Perfect Hashing - Space usage n
n2
i
Step 1: Insert everything into a hash table, T , of size n using a weakly universal (w.u.) hash function, h Step 2: The ni items in T[i] are inserted into another hash table Ti
i using w.u hash function hi
How much space does this use? (Step 3) Immediately repeat if either a) T has more than n collisions b) some Ti has a collision
T Ti
The size of T is O(n) The size of Ti is O(ni2) So the total space is. . . Storing hi uses O(1) space
how big is this?
O(n)+
O(n2
i ) = O(n)+O
i
n2
i
How big is
i n2 i ?
There are
ni
2
SLIDE 76 Perfect Hashing - Space usage n
n2
i
Step 1: Insert everything into a hash table, T , of size n using a weakly universal (w.u.) hash function, h Step 2: The ni items in T[i] are inserted into another hash table Ti
i using w.u hash function hi
How much space does this use? (Step 3) Immediately repeat if either a) T has more than n collisions b) some Ti has a collision
T Ti
The size of T is O(n) The size of Ti is O(ni2) So the total space is. . . Storing hi uses O(1) space
how big is this?
O(n)+
O(n2
i ) = O(n)+O
i
n2
i
How big is
i n2 i ?
There are
ni
2
so there are
i
ni
2
SLIDE 77 Perfect Hashing - Space usage n
n2
i
Step 1: Insert everything into a hash table, T , of size n using a weakly universal (w.u.) hash function, h Step 2: The ni items in T[i] are inserted into another hash table Ti
i using w.u hash function hi
How much space does this use? (Step 3) Immediately repeat if either a) T has more than n collisions b) some Ti has a collision
T Ti
The size of T is O(n) The size of Ti is O(ni2) So the total space is. . . Storing hi uses O(1) space
how big is this?
O(n)+
O(n2
i ) = O(n)+O
i
n2
i
How big is
i n2 i ?
There are
ni
2
so there are
i
ni
2
but we know that there are at most n collisions in T . . .
SLIDE 78 Perfect Hashing - Space usage n
n2
i
Step 1: Insert everything into a hash table, T , of size n using a weakly universal (w.u.) hash function, h Step 2: The ni items in T[i] are inserted into another hash table Ti
i using w.u hash function hi
How much space does this use? (Step 3) Immediately repeat if either a) T has more than n collisions b) some Ti has a collision
T Ti
The size of T is O(n) The size of Ti is O(ni2) So the total space is. . . Storing hi uses O(1) space
how big is this?
O(n)+
O(n2
i ) = O(n)+O
i
n2
i
How big is
i n2 i ?
There are
ni
2
so there are
i
ni
2
but we know that there are at most n collisions in T . . .
n2
i
4
ni 2
SLIDE 79 Perfect Hashing - Space usage n
n2
i
Step 1: Insert everything into a hash table, T , of size n using a weakly universal (w.u.) hash function, h Step 2: The ni items in T[i] are inserted into another hash table Ti
i using w.u hash function hi
How much space does this use? (Step 3) Immediately repeat if either a) T has more than n collisions b) some Ti has a collision
T Ti
The size of T is O(n) The size of Ti is O(ni2) So the total space is. . . Storing hi uses O(1) space
how big is this?
O(n)+
O(n2
i ) = O(n)+O
i
n2
i
How big is
i n2 i ?
There are
ni
2
so there are
i
ni
2
but we know that there are at most n collisions in T . . .
n2
i
4
ni 2
SLIDE 80 Perfect Hashing - Space usage n
n2
i
Step 1: Insert everything into a hash table, T , of size n using a weakly universal (w.u.) hash function, h Step 2: The ni items in T[i] are inserted into another hash table Ti
i using w.u hash function hi
How much space does this use? (Step 3) Immediately repeat if either a) T has more than n collisions b) some Ti has a collision
T Ti
The size of T is O(n) The size of Ti is O(ni2) So the total space is. . . Storing hi uses O(1) space
how big is this?
O(n)+
O(n2
i ) = O(n)+O
i
n2
i
How big is
i n2 i ?
There are
ni
2
so there are
i
ni
2
but we know that there are at most n collisions in T . . .
ni
2
2
n2
i
4
n2
i
4
ni 2
SLIDE 81 Perfect Hashing - Space usage n
n2
i
Step 1: Insert everything into a hash table, T , of size n using a weakly universal (w.u.) hash function, h Step 2: The ni items in T[i] are inserted into another hash table Ti
i using w.u hash function hi
How much space does this use? (Step 3) Immediately repeat if either a) T has more than n collisions b) some Ti has a collision
T Ti
The size of T is O(n) The size of Ti is O(ni2) So the total space is. . . Storing hi uses O(1) space
how big is this?
O(n)+
O(n2
i ) = O(n)+O
i
n2
i
How big is
i n2 i ?
There are
ni
2
so there are
i
ni
2
but we know that there are at most n collisions in T . . .
n2
i
4
ni 2
SLIDE 82 Perfect Hashing - Space usage n
n2
i
Step 1: Insert everything into a hash table, T , of size n using a weakly universal (w.u.) hash function, h Step 2: The ni items in T[i] are inserted into another hash table Ti
i using w.u hash function hi
How much space does this use? (Step 3) Immediately repeat if either a) T has more than n collisions b) some Ti has a collision
T Ti
The size of T is O(n) The size of Ti is O(ni2) So the total space is. . . Storing hi uses O(1) space
how big is this?
O(n)+
O(n2
i ) = O(n)+O
i
n2
i
How big is
i n2 i ?
There are
ni
2
so there are
i
ni
2
but we know that there are at most n collisions in T . . .
n2
i
4
ni 2
n2
i 4n
SLIDE 83 Perfect Hashing - Space usage
The size of T is O(n).
The size of $T_i$ is $O(n_i^2)$.
Storing $h_i$ uses O(1) space.
So the total space is
$$O(n) + \sum_i O(n_i^2) = O(n) + O\Big(\sum_i n_i^2\Big)$$
How big is $\sum_i n_i^2$?
There are $\binom{n_i}{2}$ collisions in T[i], so there are $\sum_i \binom{n_i}{2}$ collisions in T, but we know that there are at most n collisions in T (Step 3a), so $\sum_i \binom{n_i}{2} \le n$.
For $n_i \ge 2$ we have $\frac{n_i^2}{4} \le \binom{n_i}{2}$. In general $n_i^2 = 2\binom{n_i}{2} + n_i$ and $\sum_i n_i = n$, so
$$\sum_i n_i^2 = 2\sum_i \binom{n_i}{2} + n \le 3n \le 4n$$
Therefore
$$O(n) + \sum_i O(n_i^2) = O(n) + O\Big(\sum_i n_i^2\Big) = O(n)$$
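A small worked example may make the counting concrete. The sketch below uses made-up bucket sizes (an assumption, not data from the slides) and a hypothetical helper `second_level_space`. It checks the identity n_i^2 = 2 * C(n_i, 2) + n_i summed over all buckets, which ties the total second-level space to the number of collisions in T.

```python
from math import comb


def second_level_space(bucket_sizes):
    """Return (n, collisions in T, total second-level slots) for the given bucket sizes."""
    n = sum(bucket_sizes)
    collisions = sum(comb(k, 2) for k in bucket_sizes)
    squares = sum(k * k for k in bucket_sizes)
    # Summing n_i^2 = 2*C(n_i, 2) + n_i over all buckets:
    assert squares == 2 * collisions + n
    return n, collisions, squares


# Hypothetical bucket sizes for n = 8 keys hashed into 8 buckets.
n, collisions, squares = second_level_space([3, 0, 1, 2, 0, 2, 0, 0])
print(n, collisions, squares)  # prints "8 5 18"
```

Here T has 5 collisions, which is at most n = 8, so the build is accepted, and the second-level tables use 18 slots in total, which is within the 4n = 32 bound.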
SLIDE 90 Perfect Hashing - Expected construction time
The expected construction time for T is O(n) (we considered this on a previous slide).
The expected construction time for each $T_i$ is $O(n_i^2)$ (we also considered this on a previous slide):
- we insert $n_i$ items into a table of size $m = n_i^2$
- then repeat if there was a collision
The overall expected construction time is therefore:
$$E(\text{construction time}) = E\Big(\text{construction time of } T + \sum_i \text{construction time of } T_i\Big)$$
SLIDE 95 Perfect Hashing - Expected construction time
The expected construction time for T is O(n).
The expected construction time for each $T_i$ is $O(n_i^2)$.
The overall expected construction time is therefore (by linearity of expectation):
$$\begin{aligned}
E(\text{construction time}) &= E\Big(\text{construction time of } T + \sum_i \text{construction time of } T_i\Big) \\
&= E(\text{construction time of } T) + \sum_i E(\text{construction time of } T_i) \\
&= O(n) + \sum_i O(n_i^2) = O(n) + O\Big(\sum_i n_i^2\Big) \\
&= O(n) \quad \text{because } \sum_i n_i^2 \le 4n
\end{aligned}$$
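The per-bucket construction (insert n_i items into a table of size n_i^2, repeat on a collision) might look like the sketch below. It assumes integer keys below a prime `P` and the standard ((a*x + b) mod P) mod m family as the weakly universal family. `random_hash` and `build_bucket_table` are hypothetical names, and `None` is used as the empty-slot marker.

```python
import random

P = 2_147_483_647  # the prime 2^31 - 1; assumption: keys are integers in [0, P)


def random_hash(m, rng):
    """Draw h(x) = ((a*x + b) mod P) mod m with random a != 0 and b (assumed family)."""
    a = rng.randrange(1, P)
    b = rng.randrange(0, P)
    return lambda x: ((a * x + b) % P) % m


def build_bucket_table(items, rng):
    """Place the n_i items into a table of size n_i^2, redrawing h_i on any collision."""
    m = len(items) ** 2
    if m == 0:
        return None, [], 0
    attempts = 0
    while True:
        attempts += 1
        h_i = random_hash(m, rng)
        table = [None] * m
        for x in items:
            j = h_i(x)
            if table[j] is not None:
                break  # collision: discard this h_i and try again
            table[j] = x
        else:
            return h_i, table, attempts  # every item landed in its own slot


rng = random.Random(1)
items = rng.sample(range(P), 20)
h_i, table, attempts = build_bucket_table(items, rng)
print(len(table), attempts)
```

Each attempt costs O(n_i^2) time to allocate and fill the table. With m = n_i^2 the expected number of collisions is below 1/2, so an attempt fails with probability below 1/2 and the expected number of attempts is at most 2.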
SLIDE 97 Perfect Hashing - Summary
THEOREM
The FKS hashing scheme:
- Has no collisions,
- Every lookup takes O(1) worst-case time,
- Uses O(n) space,
- Can be built in O(n) expected time.
The look-up time is always O(1):
1. Compute i = h(x) (x is the key)
2. Compute $j = h_i(x)$
3. The item is in $T_i[j]$
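Putting the pieces together, a two-level static dictionary along these lines could be sketched as follows. This is an illustration, not the lecture's reference implementation. It assumes integer keys below a prime `P` and the ((a*x + b) mod P) mod m family. The class name `StaticPerfectHash` and its methods are hypothetical. On a second-level collision it rebuilds only the affected $T_i$, and the lookup also compares the stored key so that keys outside the set return `None`.

```python
import random
from math import comb

P = 2_147_483_647  # the prime 2^31 - 1; assumption: keys are integers in [0, P)


def random_hash(m, rng):
    """Draw h(x) = ((a*x + b) mod P) mod m with random a != 0 and b (assumed family)."""
    a = rng.randrange(1, P)
    b = rng.randrange(0, P)
    return lambda x: ((a * x + b) % P) % m


class StaticPerfectHash:
    def __init__(self, pairs, seed=0):
        rng = random.Random(seed)
        items = dict(pairs)
        n = len(items)
        self.h = None
        self.second = []  # one (h_i, T_i) pair per slot of T
        if n == 0:
            return
        # Step 1 and Step 3a: redraw h until T has at most n collisions.
        while True:
            h = random_hash(n, rng)
            buckets = [[] for _ in range(n)]
            for key in items:
                buckets[h(key)].append(key)
            if sum(comb(len(bucket), 2) for bucket in buckets) <= n:
                break
        self.h = h
        # Step 2 and Step 3b: build a collision-free T_i of size n_i^2 per bucket.
        for bucket in buckets:
            self.second.append(self._build_bucket(bucket, items, rng))

    @staticmethod
    def _build_bucket(bucket, items, rng):
        m = len(bucket) ** 2
        if m == 0:
            return None, []
        while True:
            h_i = random_hash(m, rng)
            table = [None] * m
            for key in bucket:
                j = h_i(key)
                if table[j] is not None:
                    break  # collision in T_i: redraw h_i
                table[j] = (key, items[key])
            else:
                return h_i, table

    def lookup(self, key):
        """Two hash evaluations and one comparison: O(1) worst-case time."""
        if self.h is None:
            return None
        h_i, table = self.second[self.h(key)]  # 1. i = h(x)
        if not table:
            return None
        slot = table[h_i(key)]                 # 2. j = h_i(x); 3. look in T_i[j]
        return slot[1] if slot is not None and slot[0] == key else None


numbers = StaticPerfectHash([(10, "ten"), (22, "twenty-two"), (37, "thirty-seven")])
print(numbers.lookup(22))  # prints "twenty-two"
print(numbers.lookup(5))   # prints "None"
```

Because the structure is static, all the randomness and retries are confined to construction, and the lookup path contains no loops.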