Hashing Anil Maheshwari Setting Balls & Bins Connections 2-Universal Hash Function Perfect Hashing Proofs
Hashing
Anil Maheshwari
anil@scs.carleton.ca
School of Computer Science, Carleton University, Canada
Outline
1. Setting
2. Balls & Bins Connections
3. 2-Universal Hash Function
4. Perfect Hashing
5. Proofs
Setting
Input
U = universe of size u
S = a subset of U consisting of m elements
Objective
Construct a hash map (a data structure) h : U → [n], where n = O(|S|) = O(m), such that for every S ⊆ U of size m, a lookup requires O(1) memory accesses per element.
Possible Approaches
1. Use a binary search tree to store the elements of S.
2. Maintain a membership bit for each element of U to indicate whether it belongs to S.
3. . . .
Collisions
The number of hash functions of type h : U → [n] is $n^{|U|} = n^u$.
Possible Strategy:
1. Pick a random function h among the $n^u$ such functions.
2. Initialize an array A (the hash table) of size n. Each entry of A also stores a linked list.
3. Insert(x): Set A[h(x)] = 1 and append x to the linked list stored at A[h(x)].
   Locate(x): If A[h(x)] = 0, report x ∉ S; otherwise check whether x is stored in the linked list at A[h(x)].
Space = $O(n + u \log n)$
Time = $O\left(\frac{\log n}{\log \log n}\right)$ per element (w.h.p.)
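The strategy above can be sketched in Python as follows. This is a minimal illustration, not the slides' own code: the class name `ChainedHashTable`, the `seed` parameter, and the choice to model the universe as `range(universe_size)` are assumptions, and an empty Python list stands in for the flag A[i] = 0.

```python
import random


class ChainedHashTable:
    """Hash table with chaining, using a uniformly random function h: U -> [n]."""

    def __init__(self, universe_size, n, seed=None):
        rng = random.Random(seed)
        self.n = n
        # A truly random function needs one stored index per universe element,
        # i.e. about u log n bits. This is the cost 2-universal families avoid.
        self.h = [rng.randrange(n) for _ in range(universe_size)]
        # An empty chain plays the role of A[i] = 0.
        self.table = [[] for _ in range(n)]

    def insert(self, x):
        chain = self.table[self.h[x]]
        if x not in chain:
            chain.append(x)

    def locate(self, x):
        # Scans only the chain at index h(x).
        return x in self.table[self.h[x]]

    def max_chain_length(self):
        return max(len(chain) for chain in self.table)
```

The lookup cost is the length of the chain at h(x), which is why the maximum load of the balls-and-bins process governs the time bound.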
2-Universal Family of Hash Functions
A random hash function h : U → [n] requires $u \log n$ bits.
Required Property: for all x, y ∈ U with x ≠ y and all i, j ∈ [n],
$\Pr(h(x) = i \wedge h(y) = j) = \Pr(h(x) = i)\Pr(h(y) = j) = \frac{1}{n^2}$
Any family of hash functions that satisfies this property is called a 2-Universal Family. Can we construct a 2-Universal Family that requires less space?
2-Universal Families
1. Let $X_1, X_2$ be uniform r.v. on $\{0, 1, \ldots, p-1\}$, where p is prime. Define $Y_i = X_1 + iX_2 \pmod p$.
   Claim: $\{Y_0, Y_1, \ldots, Y_{p-1}\}$ are pairwise independent, i.e. for i ≠ j, $\Pr(Y_i = a \wedge Y_j = b) = \Pr(Y_i = a)\Pr(Y_j = b) = \frac{1}{p^2}$
   Space Used: $O(\log p)$
2. Let $X = \{x_1, \ldots, x_k\}$ be a set of k random bits. Consider the $2^k - 1$ non-empty subsets of X. For each subset S ⊆ X, generate a bit $Y_S = \sum_{x \in S} x \pmod 2$.
   Claim: The Y bits are pairwise independent.
   Space Used: $O(k)$
2-Universal Families Contd.
3. Let $U = \{0,1\}^{\log u}$ and index set $I = \{0,1\}^{\log n}$. The hash function family is the set of random Boolean matrices H of dimension $\log n \times \log u$.
   For example, for $U = \{0,1\}^6$ and $I = \{0,1\}^4$ (i.e., $n = 2^4$), a $4 \times 6$ Boolean matrix H with $H \cdot (1,0,1,1,0,0)^T = (1,1,1,0)^T \pmod 2$ maps $101100 \in U$ to index $(1110)_2 = 14$.
   Claim: $\Pr(Hx = Hy) = \frac{1}{n}$ for any $x \neq y \in U$.
   Space Used: $O(\log u \times \log n)$
2-level Hash Table
Input
U = universe of size u
S = a subset of U consisting of m elements
1st Level: Apply a random hash function from a 2-Universal Hash Family to map the elements of S to a hash table of size n = O(m).
2nd Level: If $s_i$ elements are mapped to index i of the hash table, create a secondary hash table of size $s_i^2$ for these elements, using another random hash function.
2-level Hash Table Contd.
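$E[\#\text{ of collisions in 1st level}] = \frac{1}{n}\binom{m}{2} = O(m)$
$E[\#\text{ of collisions when } s_i \text{ elements are mapped to a table of size } s_i^2] = \frac{1}{s_i^2}\binom{s_i}{2} = \frac{s_i^2 - s_i}{2s_i^2} < \frac{1}{2}$
Claim: $E\left[\sum_{i=1}^{n} s_i^2\right] = O(m)$
$E\left[\sum_{i=1}^{n} s_i^2\right] = E\left[\sum_{i=1}^{n} \left(s_i + 2\binom{s_i}{2}\right)\right] = m + 2E\left[\sum_{i=1}^{n} \binom{s_i}{2}\right] = m + 2E[\#\text{ of collisions in 1st level}] = O(m)$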
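The identity behind the claim, $\sum_i s_i^2 = m + 2\sum_i \binom{s_i}{2}$, holds for every outcome and not only in expectation. The sketch below checks it on one random first-level assignment. The key set, the prime 10007, the seed, and the use of the hash function ((ax + b) mod p) mod n (which the slides introduce later) are assumptions made for illustration.

```python
import random
from math import comb


def bucket_sizes(keys, h, n):
    """Return s_0, ..., s_{n-1}: how many keys h sends to each index."""
    sizes = [0] * n
    for x in keys:
        sizes[h(x)] += 1
    return sizes


def first_level_collisions(sizes):
    """Number of colliding pairs: sum over buckets of C(s_i, 2)."""
    return sum(comb(s, 2) for s in sizes)


rng = random.Random(0)
m = 50
n = m                      # table size n = O(m); here simply n = m
p = 10007                  # a prime larger than every key (assumed)
keys = rng.sample(range(10000), m)
a, b = rng.randrange(1, p), rng.randrange(p)

sizes = bucket_sizes(keys, lambda x: ((a * x + b) % p) % n, n)
sum_sq = sum(s * s for s in sizes)
collisions = first_level_collisions(sizes)
print(sum_sq, m + 2 * collisions)  # the two numbers always agree
```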
2-level Hash Table Contd.
Expected Lookup Time: E[Time for 1st Level + Time for 2nd Level] = 1 + O(1) = O(1)
Expected Space Used: E[Hash functions + 1st Level + 2nd Level] = $(n + 1) + m + E\left[\sum_{i=1}^{n} s_i^2\right] = O(m)$
Suppose E[Space Used] ≤ 6m. By Markov's inequality, $\Pr(\text{Actual Space Used} > 12m) \le \frac{6m}{12m} = \frac{1}{2}$
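A two-level table along these lines can be sketched as follows. This is an illustrative sketch under stated assumptions, not the slides' construction verbatim: keys are distinct non-negative integers below the prime `P = 10007`, both levels use the hash function ((ax + b) mod p) mod n from the later slides, and the helper names and the `space_factor` parameter are hypothetical. The first level is redrawn until the secondary tables total at most `space_factor * m` slots, mirroring the Markov argument, and each secondary function is redrawn until it is collision-free.

```python
import random

P = 10007  # a prime larger than every key (an assumption of this sketch)


def random_hash(n, rng):
    """Draw h(x) = ((a*x + b) mod P) mod n with random a in [1, P-1], b in [0, P-1]."""
    a, b = rng.randrange(1, P), rng.randrange(P)
    return lambda x: ((a * x + b) % P) % n


def build_two_level(keys, rng, space_factor=12):
    """Build a two-level table for a list of distinct keys."""
    m = len(keys)
    n = m
    # 1st level: retry until the sum of s_i^2 is at most space_factor * m.
    while True:
        h = random_hash(n, rng)
        buckets = [[] for _ in range(n)]
        for x in keys:
            buckets[h(x)].append(x)
        if sum(len(bucket) ** 2 for bucket in buckets) <= space_factor * m:
            break
    # 2nd level: a table of size s_i^2 per bucket, retried until collision-free.
    second = []
    for bucket in buckets:
        size = len(bucket) ** 2
        while True:
            g = random_hash(size, rng) if size else None
            slots = [None] * size
            collision = False
            for x in bucket:
                j = g(x)
                if slots[j] is not None:
                    collision = True
                    break
                slots[j] = x
            if not collision:
                break
        second.append((g, slots))
    return h, second


def lookup(table, x):
    """Two probes: one at the first level, one in the secondary table."""
    h, second = table
    g, slots = second[h(x)]
    return bool(slots) and slots[g(x)] == x
```

Since each secondary table has fewer than 1/2 expected collisions, each retry succeeds with probability at least 1/2, so the expected number of redraws per bucket is at most 2.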
Missing Details
Example I: 2-Universal Family
Let $X_1, X_2$ be uniform r.v. on $\{0, 1, \ldots, p-1\}$, where p is prime. Define $Y_i = X_1 + iX_2 \pmod p$.
Claim: $\{Y_0, Y_1, \ldots, Y_{p-1}\}$ are pairwise independent r.v.
To Show: for i ≠ j, $\Pr(Y_i = a \wedge Y_j = b) = \Pr(Y_i = a)\Pr(Y_j = b) = \frac{1}{p^2}$.
1. $\Pr(Y_i = a) = \frac{1}{p}$: For a fixed $X_2$, $Y_i$ is equally likely to take any of the values $\{0, \ldots, p-1\}$ as $X_1$ ranges over $\{0, \ldots, p-1\}$.
2. Suppose $Y_i = a = X_1 + iX_2$ and $Y_j = b = X_1 + jX_2 \pmod p$. Then $X_2 = (a - b)(i - j)^{-1}$ and $X_1 = a - i(a - b)(i - j)^{-1}$. The inverse always exists because p is prime and i ≠ j (mod p). The pair $(X_1, X_2)$ can take $p^2$ possible values, but exactly one of them gives $Y_i = a$ and $Y_j = b$. Thus, $\Pr(Y_i = a \wedge Y_j = b) = \Pr(Y_i = a)\Pr(Y_j = b) = \frac{1}{p^2}$.
Storage Requirement: Need to store $p, X_1, X_2$.
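For a small modulus the claim can be checked exhaustively. The sketch below (function names are hypothetical) enumerates all p² choices of (X1, X2) and tests whether every pair (a, b) occurs exactly once as (Y_i, Y_j), which is equivalent to the joint probability being 1/p². Running it with a composite modulus shows why primality matters.

```python
from collections import Counter
from itertools import product


def joint_counts(p, i, j):
    """Count occurrences of each (Y_i, Y_j) over all p*p choices of (X1, X2)."""
    counts = Counter()
    for x1, x2 in product(range(p), repeat=2):
        counts[((x1 + i * x2) % p, (x1 + j * x2) % p)] += 1
    return counts


def pairwise_independent(p):
    """True if (Y_i, Y_j) is uniform on all p*p pairs for every i < j in [0, p-1]."""
    for i in range(p):
        for j in range(i + 1, p):
            counts = joint_counts(p, i, j)
            # Uniform joint distribution <=> each of the p*p pairs occurs exactly once.
            if len(counts) != p * p or set(counts.values()) != {1}:
                return False
    return True


print(pairwise_independent(7))  # prime modulus
print(pairwise_independent(6))  # composite: i - j = 2 has no inverse mod 6
```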
Example II: 2-Universal Family
Let $X = \{x_1, \ldots, x_k\}$ be a set of k random bits. Consider the $2^k - 1$ subsets of X (excluding the empty set). For each subset s ⊆ X, generate a bit $y_s = \sum_{x \in s} x \pmod 2$, i.e. the sum of the bits in s modulo 2.
Claim: All the y-bits corresponding to the $2^k - 1$ subsets of X are pairwise independent.
Consider any two bits $y_s$ and $y_{s'}$, where s ≠ s′.
$\Pr(y_s = 0) = \Pr(y_s = 1) = \frac{1}{2}$, since even if we fix all but one of the random bits of set s, the value of $y_s$ still depends on that remaining bit.
Since s ≠ s′: either s ∩ s′ = ∅ or s ∩ s′ ≠ ∅.
If s ∩ s′ = ∅, then $y_s$ and $y_{s'}$ depend on disjoint sets of independent bits and are therefore independent.
Example II: 2-Universal Family contd.
Consider s ∩ s′ ≠ ∅ and w.l.o.g. assume ∃ $x_i$ ∈ s − s′. Since bit $x_i$ is random and does not affect $y_{s'}$, $\Pr(y_s = \alpha \mid y_{s'} = \beta) = \frac{1}{2}$ for any α, β ∈ {0, 1}.
$\Pr(y_s = \alpha \wedge y_{s'} = \beta) = \Pr(y_s = \alpha \mid y_{s'} = \beta)\Pr(y_{s'} = \beta) = \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}$
⟹ $y_s$ and $y_{s'}$ are independent.
Storage Requirements: A set X of k bits generates $2^k - 1$ pairwise independent random bits.
Question: Is it a 3-Universal family? Consider k = 3. There are 7 non-empty subsets of three random bits $\{x_1, x_2, x_3\}$. Bits $y_{\{x_1\}}$ and $y_{\{x_2\}}$ completely determine the bit $y_{\{x_1, x_2\}}$, so these three bits are not mutually independent.
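Both observations can be confirmed by brute force for small k. In the sketch below, subsets are represented as tuples of bit positions and all function names are hypothetical. It checks that every pair of subset parities takes each of the four value combinations on exactly a quarter of the seeds, and that a chosen triple can fail the analogous 3-wise test.

```python
from itertools import combinations, product


def parity_bits(x):
    """Map k seed bits to the 2^k - 1 subset parities, keyed by tuples of positions."""
    k = len(x)
    subsets = [s for r in range(1, k + 1) for s in combinations(range(k), r)]
    return {s: sum(x[i] for i in s) % 2 for s in subsets}


def is_pairwise_independent(k):
    """Check Pr(y_s = alpha and y_t = beta) = 1/4 for all distinct subsets s, t."""
    seeds = list(product((0, 1), repeat=k))
    tables = [parity_bits(x) for x in seeds]
    for s, t in combinations(list(tables[0]), 2):
        for alpha, beta in product((0, 1), repeat=2):
            hits = sum(1 for y in tables if y[s] == alpha and y[t] == beta)
            if 4 * hits != len(seeds):
                return False
    return True


def triple_is_independent(k, s, t, w):
    """Check whether (y_s, y_t, y_w) is uniform on all 8 value combinations."""
    seeds = list(product((0, 1), repeat=k))
    tables = [parity_bits(x) for x in seeds]
    for vals in product((0, 1), repeat=3):
        hits = sum(1 for y in tables if (y[s], y[t], y[w]) == vals)
        if 8 * hits != len(seeds):
            return False
    return True


print(is_pairwise_independent(3))
print(triple_is_independent(3, (0,), (1,), (0, 1)))  # y_{x1}, y_{x2}, y_{x1,x2}
```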
Example III: 2-Universal Family
$U = \{0,1\}^{\log u}$ and index set $I = \{0,1\}^{\log n}$.
The hash function family is the set of random Boolean matrices of dimension $\log n \times \log u$.
For example, for $U = \{0,1\}^6$ and $n = 2^4$, we may have a $4 \times 6$ Boolean matrix H with $H \cdot (1,0,1,1,0,0)^T = (1,1,1,0)^T \pmod 2$. The matrix maps $101100 \in U$ to the index $(1110)_2 = 14$.
Property: $\Pr(Hx = Hy) = \frac{1}{n}$ for any $x \neq y \in U$.
Space = $|H| = O(\log u \log n)$
Proof of Example III
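To Show: The hash family of random Boolean matrices of dimension $\log n \times \log u$ is 2-universal.
Example: $H \cdot (1,0,1,1,0,0)^T = (1,1,1,0)^T \pmod 2$.
Claim 1: For any pair $x \neq y \in U$ and any row j, $\Pr((Hx)_j = (Hy)_j) = \frac{1}{2}$.
Proof: x ≠ y implies ∃i : $x_i \neq y_i$. Fix every entry of row j except $H_{ji}$. Changing $H_{ji}$ flips exactly one of $(Hx)_j$ and $(Hy)_j$, so exactly one of the two values of $H_{ji}$ gives $(Hx)_j = (Hy)_j$.
Claim 2: $\Pr(Hx = Hy) = \left(\frac{1}{2}\right)^{\log_2 n} = \frac{1}{n}$, since the $\log_2 n$ rows of H are chosen independently.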
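For tiny dimensions the collision probability can be computed exactly by enumerating every Boolean matrix. In this sketch each matrix row is stored as an integer bit mask, which is an implementation choice rather than anything the slides prescribe, and the function names are hypothetical.

```python
from itertools import product


def matrix_hash(rows, x):
    """Compute Hx over GF(2); rows and x are integers used as bit vectors."""
    index = 0
    for row in rows:
        bit = bin(row & x).count("1") % 2  # inner product of the row with x, mod 2
        index = (index << 1) | bit
    return index


def collision_probability(x, y, in_bits, out_bits):
    """Exact Pr(Hx = Hy) over all out_bits x in_bits Boolean matrices."""
    total = 0
    collisions = 0
    for rows in product(range(2 ** in_bits), repeat=out_bits):
        total += 1
        if matrix_hash(rows, x) == matrix_hash(rows, y):
            collisions += 1
    return collisions / total


# With 2 output bits, n = 4, so the probability for x != y should be 1/4.
print(collision_probability(0b101, 0b011, in_bits=3, out_bits=2))
```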
Revisiting Example I
Input
U = universe of size u
S = a subset of U consisting of m elements
[n] = indices of the hash table
Let p ∈ [u, 2u] be a prime. Define a collection of p(p − 1) hash functions $H_{ab}(x) = ((ax + b) \bmod p) \bmod n$, for 1 ≤ a ≤ p − 1 and 0 ≤ b ≤ p − 1.
Hash Family: $\mathcal{H} = \bigcup_{a,b} \{H_{ab}\}$
Claim 1: $\mathcal{H}$ is 2-Universal, i.e. for x ≠ y, $\Pr(H_{ab}(x) = H_{ab}(y)) \le \frac{1}{n}$ when a and b are chosen uniformly at random.
Hab is 2-Universal
Proof: Consider any two numbers x ≠ y ∈ U. Let r = ax + b (mod p) and s = ay + b (mod p).
1. r ≠ s, since r − s ≡ a(x − y) (mod p) and neither a nor x − y is divisible by p.
2. Hence x and y do not collide with respect to mod p.
3. $a = (r - s)\left((x - y)^{-1} \bmod p\right) \bmod p$ and $b = (r - ax) \bmod p$.
4. Different choices of (a, b) yield different pairs (r, s).
5. There is a 1-1 correspondence between pairs (r, s) with r ≠ s and pairs (a, b).
Hab is 2-Universal (contd.)
6. Pr(x, y collide) = Pr(r ≡ s (mod n)), where (r, s) is a uniformly random pair of distinct values mod p.
7. For fixed r, s ≡ r (mod n) holds for at most $\frac{p-1}{n}$ values of s ≠ r.
8. $\Pr(r \equiv s \pmod n) \le \frac{(p-1)/n}{p-1} = \frac{1}{n}$
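The bound can be checked exactly for small parameters by enumerating all p(p − 1) pairs (a, b). The sketch below does this for an assumed example with u = 8, p = 13 (a prime in [u, 2u]) and n = 4; the function names are hypothetical.

```python
def hab_collision_probability(x, y, p, n):
    """Exact Pr over (a, b) that ((a*x + b) mod p) mod n maps x and y to one index."""
    collisions = 0
    for a in range(1, p):
        for b in range(p):
            if ((a * x + b) % p) % n == ((a * y + b) % p) % n:
                collisions += 1
    return collisions / (p * (p - 1))


def worst_pair_probability(p, n, u):
    """Largest collision probability over all pairs x < y in a universe of size u."""
    return max(
        hab_collision_probability(x, y, p, n)
        for x in range(u)
        for y in range(x + 1, u)
    )


# Example parameters (assumed): u = 8, p = 13, n = 4.
print(worst_pair_probability(p=13, n=4, u=8) <= 1 / 4)
```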