Hashing Connections 2-Universal Hash Function Perfect Hashing - - PowerPoint PPT Presentation



SLIDE 1

Hashing Anil Maheshwari Setting Balls & Bins Connections 2-Universal Hash Function Perfect Hashing Proofs

Hashing

Anil Maheshwari

anil@scs.carleton.ca School of Computer Science Carleton University Canada

SLIDE 2

Outline

1. Setting

2. Balls & Bins Connections

3. 2-Universal Hash Function

4. Perfect Hashing

5. Proofs

SLIDE 3

Setting

Input

U = Universe of size u
S = A subset of U consisting of m elements

Objective

Construct a hash map (a data structure) $h : U \to [n]$, where $n = O(|S|) = O(m)$, such that $\forall S \subseteq U$ of size $m$, the number of memory accesses required for a lookup is $O(1)$ per element.

SLIDE 4

Possible Approaches

1. Use a binary search tree to store elements of S.

2. Maintain a membership bit for each element in U to indicate its membership in S.

3. . . .

SLIDE 5

Collisions

The number of hash functions of type $h : U \to [n]$ is $n^{|U|} = n^u$.

Possible Strategy:

1. Pick a random function $h$ among the $n^u$ such functions.

2. Initialize an array $A$ (Hash Table) of size $n$. Each element of $A$ also stores a linked list.

3. Insert(x): Set $A[h(x)] = 1$ and append $x$ to the linked list stored at $A[h(x)]$.
   Locate(x): If $A[h(x)] = 0$, report $x \notin S$; else check whether $x$ is stored in the linked list at $A[h(x)]$.

Space = $O(n + u \log n)$
Time = $O\left(\frac{\log n}{\log \log n}\right)$ per element (w.h.p.)
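The strategy above can be sketched in Python roughly as follows. This is a minimal illustration, not the slides' own code: the class name `ChainedHashTable`, its method names, and the `seed` parameter are hypothetical, the universe is assumed to be the integers 0..u-1, and an empty chain stands in for the flag A[h(x)] = 0.

```python
import random


class ChainedHashTable:
    """Chained hash table using a fully random function h : U -> [n].

    Assumption: the universe U is the integers 0..u-1.
    """

    def __init__(self, u, n, seed=None):
        rng = random.Random(seed)
        self.n = n
        # A truly random function needs one table index per universe element,
        # which is where the u * log2(n) bits of space come from.
        self.h = [rng.randrange(n) for _ in range(u)]
        self.table = [[] for _ in range(n)]  # table[i] is the chain at index i

    def insert(self, x):
        chain = self.table[self.h[x]]
        if x not in chain:
            chain.append(x)

    def locate(self, x):
        # An empty chain plays the role of A[h(x)] = 0: x is certainly absent.
        return x in self.table[self.h[x]]

    def max_chain_length(self):
        return max(len(chain) for chain in self.table)


if __name__ == "__main__":
    t = ChainedHashTable(u=10_000, n=100, seed=1)
    keys = random.Random(2).sample(range(10_000), 100)
    for key in keys:
        t.insert(key)
    print(all(t.locate(key) for key in keys), t.max_chain_length())
```

The longest chain is what drives the per-element lookup time, so `max_chain_length` is the quantity to compare against the high-probability bound.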

SLIDE 6

2-Universal Family of Hash Functions

A random hash function $h : U \to [n]$ requires $u \log n$ bits.

Required Property: $\forall x, y \in U$ ($x \ne y$) and $i, j \in [n]$,
$\Pr(h(x) = i \wedge h(y) = j) = \Pr(h(x) = i)\Pr(h(y) = j) = \frac{1}{n^2}$

Any family of hash functions that satisfies this property is called a 2-Universal Family.

Can we construct a 2-Universal Family that requires less space?

SLIDE 7

2-Universal Families

1. Let $X_1, X_2$ be independent uniform r.v. on $\{0, 1, \ldots, p-1\}$, where $p$ is prime. Define $Y_i = X_1 + iX_2 \pmod p$.
   Claim: $\{Y_0, Y_1, \ldots, Y_{p-1}\}$ are pairwise independent, i.e. for $i \ne j$, $\Pr(Y_i = a \wedge Y_j = b) = \Pr(Y_i = a)\Pr(Y_j = b) = \frac{1}{p^2}$
   Space Used: $O(\log p)$

2. Let $X = \{x_1, \ldots, x_k\}$ be a set of $k$ random bits. Consider the $2^k - 1$ (non-empty) subsets of $X$. For each subset $S \subseteq X$, generate a bit $Y_S = \sum_{x \in S} x \pmod 2$.
   Claim: The $Y$ bits are pairwise independent.
   Space Used: $O(k)$
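For small moduli, the first claim can be checked by brute force. The sketch below is an illustration under assumptions, with a hypothetical function name: it enumerates all p^2 seeds (X1, X2) and tests whether every value pair (a, b) is produced exactly once for each i < j, which is equivalent to Pr(Yi = a and Yj = b) = 1/p^2.

```python
from collections import Counter
from itertools import product


def pairwise_independent_mod_p(p):
    """Check by enumeration that Y_i = X1 + i*X2 (mod p) are pairwise independent."""
    for i in range(p):
        for j in range(i + 1, p):
            counts = Counter(
                ((x1 + i * x2) % p, (x1 + j * x2) % p)
                for x1, x2 in product(range(p), repeat=2)
            )
            # Each of the p^2 value pairs (a, b) must come from exactly one seed.
            if len(counts) != p * p or any(c != 1 for c in counts.values()):
                return False
    return True


if __name__ == "__main__":
    for modulus in (5, 6, 7):
        print(modulus, pairwise_independent_mod_p(modulus))
```

Running it on a composite modulus such as 6 shows why primality matters: i - j may then have no inverse, and some value pairs are hit more than once while others are never hit.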

SLIDE 8

2-Universal Families Contd.

3. Let $U = \{0,1\}^{\log u}$ and index set $I = \{0,1\}^{\log n}$. The hash function family is the set of random Boolean matrices $H$ of dimension $\log n \times \log u$.

For example, for $U = \{0,1\}^6$ and $I = \{0,1\}^4$ (i.e., $n = 2^4$), a $4 \times 6$ Boolean matrix $H$ may give $H\,(1,0,1,1,0,0)^T = (1,1,1,0)^T \pmod 2$. Such a matrix maps $101100 \in U$ to index $(1110)_2 = 14$.

Claim: $\Pr(Hx = Hy) = \frac{1}{n}$ for any $x \ne y \in U$.

Space Used: $O(\log u \times \log n)$
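A small Python sketch of the matrix family follows. The matrix `H_EXAMPLE` is a hypothetical one chosen here for illustration so that 101100 maps to 1110; it is not claimed to be the matrix on the slide, and the helper names are assumptions as well.

```python
import random


def random_boolean_matrix(rows, cols, rng):
    """Draw a hash function: a rows x cols matrix of independent fair bits."""
    return [[rng.randrange(2) for _ in range(cols)] for _ in range(rows)]


def matrix_hash(H, x_bits):
    """Return Hx (mod 2) as a list of bits, one per row of H."""
    return [sum(h * x for h, x in zip(row, x_bits)) % 2 for row in H]


def bits_to_index(bits):
    """Read a bit list as a binary number, most significant bit first."""
    index = 0
    for bit in bits:
        index = 2 * index + bit
    return index


# Hypothetical 4 x 6 matrix (log n = 4, log u = 6), chosen for illustration.
H_EXAMPLE = [
    [1, 0, 0, 0, 1, 1],
    [0, 1, 1, 0, 0, 1],
    [1, 1, 1, 1, 0, 0],
    [1, 0, 1, 0, 1, 0],
]

if __name__ == "__main__":
    x = [1, 0, 1, 1, 0, 0]
    y = matrix_hash(H_EXAMPLE, x)
    print(y, bits_to_index(y))
    H = random_boolean_matrix(4, 6, random.Random(0))
    print(bits_to_index(matrix_hash(H, x)))
```

Storing the function means storing the matrix, so the space is the number of entries, log n times log u bits.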

SLIDE 9

2-level Hash Table

Input

U = Universe of size u
S = A subset of U consisting of m elements

1st Level: Apply a random hash function from a 2-Universal Hash Family to map the elements of S to a Hash Table of size $n = O(m)$.

2nd Level: If $s_i$ elements are mapped to an index $i$ in the Hash Table, create a secondary Hash Table of size $s_i^2$ for these elements, using another random hash function.

SLIDE 10

2-level Hash Table Contd.

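$E[\#\text{ of collisions in 1st Level}] = \frac{1}{n}\binom{m}{2} = O(m)$

$E[\#\text{ of collisions when } s_i \text{ elements are mapped to a table of size } s_i^2] = \frac{1}{s_i^2}\binom{s_i}{2} = \frac{s_i^2 - s_i}{2 s_i^2} < \frac{1}{2}$

Claim: $E\left[\sum_{i=1}^{n} s_i^2\right] = O(m)$

$E\left[\sum_{i=1}^{n} s_i^2\right] = E\left[\sum_{i=1}^{n} \left(s_i + 2\binom{s_i}{2}\right)\right] = m + 2E\left[\sum_{i=1}^{n} \binom{s_i}{2}\right] = m + 2E[\#\text{ of collisions in 1st Level}] = O(m)$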
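As a sanity check on the claim, one can plug in a concrete table size. The worked instance below assumes $n = m$ and a first-level collision probability of exactly $\frac{1}{n}$ per pair; both are assumptions made here for concreteness, not values fixed by the slides.

```latex
\begin{align*}
E[\#\text{ of collisions in 1st Level}]
  &= \frac{1}{m}\binom{m}{2} = \frac{m-1}{2}, \\
E\left[\sum_{i=1}^{n} s_i^2\right]
  &= m + 2 \cdot \frac{m-1}{2} = 2m - 1 .
\end{align*}
```

Under these assumptions the second-level tables together take fewer than $2m$ cells in expectation.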

SLIDE 11

2-level Hash Table Contd.

Expected Lookup Time: $E[\text{Time for 1st Level} + \text{Time for 2nd Level}] = 1 + O(1) = O(1)$

Expected Space Used: $E[\text{Hash functions} + \text{1st Level} + \text{2nd Level}] = (n + 1) + m + E\left[\sum_{i=1}^{n} s_i^2\right] = O(m)$

Suppose $E[\text{Space Used}] \le 6m$. By Markov's inequality, $\Pr(\text{Actual Space Used} > 12m) \le \frac{6m}{12m} = \frac{1}{2}$
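The two-level construction and the Markov-style retry can be sketched as below. This is one possible realization under stated assumptions, not the slides' implementation: it uses the family ((ax + b) mod p) mod n with the fixed prime p = 2^31 - 1, assumes integer keys in [0, p), takes n = m, and redraws the first-level function until the second-level space is at most 4m (a threshold chosen here; with n = m the expected value is below 2m, so each draw succeeds with probability above 1/2). All names are hypothetical.

```python
import random

P = 2_147_483_647  # the Mersenne prime 2^31 - 1; keys are assumed to lie in [0, P)


def random_hash(n, rng):
    """Draw h(x) = ((a*x + b) mod P) mod n with 1 <= a < P and 0 <= b < P."""
    a = rng.randrange(1, P)
    b = rng.randrange(P)
    return lambda x: ((a * x + b) % P) % n


class TwoLevelTable:
    """Static two-level hash table with collision-free second-level tables."""

    def __init__(self, keys, seed=None):
        rng = random.Random(seed)
        keys = list(set(keys))
        m = len(keys)
        self.n = max(1, m)
        # 1st level: redraw until sum of s_i^2 is at most 4m.
        while True:
            self.h = random_hash(self.n, rng)
            buckets = [[] for _ in range(self.n)]
            for x in keys:
                buckets[self.h(x)].append(x)
            if sum(len(b) ** 2 for b in buckets) <= 4 * m:
                break
        # 2nd level: a table of size s_i^2 per bucket, redrawn until collision-free.
        self.second = []
        for bucket in buckets:
            size = len(bucket) ** 2
            while True:
                g = random_hash(size, rng) if size else None
                slots = [None] * size
                collision = False
                for x in bucket:
                    i = g(x)
                    if slots[i] is not None:
                        collision = True
                        break
                    slots[i] = x
                if not collision:
                    break
            self.second.append((g, slots))

    def __contains__(self, x):
        g, slots = self.second[self.h(x)]
        return bool(slots) and slots[g(x)] == x

    def second_level_cells(self):
        return sum(len(slots) for _, slots in self.second)


if __name__ == "__main__":
    keys = random.Random(0).sample(range(10**6), 500)
    table = TwoLevelTable(keys, seed=1)
    print(all(k in table for k in keys), table.second_level_cells())
```

A lookup evaluates two hash functions and reads one cell, which is the O(1) worst-case access pattern the expected-time line refers to.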

SLIDE 12

References

1. Probability and Computing (Chapter 13), Mitzenmacher and Upfal, Cambridge Univ. Press, 2005.

2. Introduction to Algorithms (Chapter 11), Cormen, Leiserson, Rivest and Stein, MIT Press, 2009.

SLIDE 13

Missing Details

SLIDE 14

Example I: 2-Universal Family

Let $X_1, X_2$ be independent uniform r.v. on $\{0, 1, \ldots, p-1\}$, where $p$ is prime. Define: $Y_i = X_1 + iX_2 \pmod p$.

Claim: $\{Y_0, Y_1, \ldots, Y_{p-1}\}$ are pairwise independent r.v.

To Show: $\Pr(Y_i = a \wedge Y_j = b) = \Pr(Y_i = a)\Pr(Y_j = b) = \frac{1}{p^2}$ for $i \ne j$.

1. $\Pr(Y_i = a) = \frac{1}{p}$: For a fixed $X_2$, $Y_i \pmod p$ is equally likely to take any of the values $\{0, \ldots, p-1\}$ as $X_1$ varies over $\{0, \ldots, p-1\}$.

2. Given $Y_i = a = X_1 + iX_2$ and $Y_j = b = X_1 + jX_2$ $\Longrightarrow$ $X_2 = (a-b)(i-j)^{-1}$, $X_1 = a - i(a-b)(i-j)^{-1}$. The inverse always exists in this setting, since $p$ is prime and $i \not\equiv j \pmod p$. The pair $(X_1, X_2)$ can take $p^2$ possible values, but for $Y_i = a$ and $Y_j = b$ there is a fixed choice. Thus, $\Pr(Y_i = a \wedge Y_j = b) = \Pr(Y_i = a)\Pr(Y_j = b) = \frac{1}{p^2}$

Storage Requirement: Need to store $p$, $X_1$, $X_2$.
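Step 2 of the argument, solving for the unique seed, can be made concrete. The function below is a sketch with a hypothetical name; it relies on Python's built-in three-argument `pow` with exponent -1 (available from Python 3.8) to compute the modular inverse, and assumes p is prime and i, j are distinct modulo p.

```python
def recover_seed(p, i, j, a, b):
    """Solve Y_i = a and Y_j = b for (X1, X2) modulo a prime p.

    Assumes i and j are distinct modulo p, so that (i - j) is invertible.
    """
    inverse = pow((i - j) % p, -1, p)  # modular inverse of (i - j)
    x2 = (a - b) * inverse % p
    x1 = (a - i * x2) % p
    return x1, x2


if __name__ == "__main__":
    p, x1, x2, i, j = 13, 4, 9, 2, 7
    a = (x1 + i * x2) % p
    b = (x1 + j * x2) % p
    print(recover_seed(p, i, j, a, b))
```

Because exactly one seed out of p^2 matches any observed pair of values, the joint probability is 1/p^2, as the proof states.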

SLIDE 15

Example II: 2-Universal Family

Let $X = \{x_1, \ldots, x_k\}$ be a set of $k$ random bits. Consider the $2^k - 1$ subsets of $X$ (excluding the empty set). For each subset $s \subseteq X$, generate a bit $y_s = \sum_{x \in s} x \pmod 2$, i.e. the sum of the bits in $s$ modulo 2.

Claim: All the $y$-bits corresponding to the $2^k - 1$ subsets of $X$ are pairwise independent.

Consider any two bits $y_s$ and $y_{s'}$, where $s \ne s'$.

$\Pr(y_s = 0) = \Pr(y_s = 1) = \frac{1}{2}$, as even if we fix all but one of the random bits of set $s$, the value of $y_s$ depends on that bit.

Since $s \ne s'$: either $s \cap s' = \emptyset$ or $s \cap s' \ne \emptyset$.

If $s \cap s' = \emptyset$, $y_s$ and $y_{s'}$ are independent, because they depend on disjoint sets of random bits.

SLIDE 16

Example II: 2-Universal Family contd.

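Consider $s \cap s' \ne \emptyset$ and w.l.o.g. assume $\exists x_i \in s - s'$. Since bit $x_i$ is random, $\Pr(Y_s = \alpha \mid Y_{s'} = \beta) = \frac{1}{2}$ for any $\alpha, \beta \in \{0, 1\}$.

$\Pr(Y_s = \alpha \wedge Y_{s'} = \beta) = \Pr(Y_s = \alpha \mid Y_{s'} = \beta)\Pr(Y_{s'} = \beta) = \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}$

$\Longrightarrow$ $y_s$ and $y_{s'}$ are independent.

Storage Requirements: A set $X$ of $k$ bits generates $2^k - 1$ random pairwise independent bits.

Question: Is it a 3-Universal family? Consider $k = 3$. There are 7 non-empty subsets of three random bits $\{x_1, x_2, x_3\}$. Bits $y_{\{x_1\}}$ and $y_{\{x_2\}}$ completely determine the bit $y_{\{x_1, x_2\}}$, so these three bits are not mutually independent.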
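Both observations, pairwise independence and the failure of 3-wise independence, can be checked exhaustively for small k. The sketch below uses hypothetical helper names and encodes each non-empty subset as a bitmask, an encoding chosen here for convenience.

```python
from itertools import combinations, product


def parity_bits(x_bits):
    """Map k bits to the 2^k - 1 parities y_s, keyed by non-empty bitmask s."""
    k = len(x_bits)
    return {
        mask: sum(x_bits[i] for i in range(k) if (mask >> i) & 1) % 2
        for mask in range(1, 2 ** k)
    }


def is_pairwise_independent(k):
    """Check that every pair (y_s, y_t), s != t, is uniform on {0,1}^2."""
    outcomes = [parity_bits(x) for x in product((0, 1), repeat=k)]
    total = len(outcomes)  # 2^k equally likely assignments of the k bits
    for s, t in combinations(range(1, 2 ** k), 2):
        for a, b in product((0, 1), repeat=2):
            hits = sum(1 for y in outcomes if y[s] == a and y[t] == b)
            if 4 * hits != total:
                return False
    return True


def third_bit_is_determined():
    """For k = 3: y_{x1} XOR y_{x2} always equals y_{x1,x2}."""
    return all(
        parity_bits(x)[0b001] ^ parity_bits(x)[0b010] == parity_bits(x)[0b011]
        for x in product((0, 1), repeat=3)
    )


if __name__ == "__main__":
    print(is_pairwise_independent(3), third_bit_is_determined())
```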

SLIDE 17

Example III: 2-Universal Family

$U = \{0,1\}^{\log u}$ and index set $I = \{0,1\}^{\log n}$.

The hash function family is the set of random Boolean matrices of dimension $\log n \times \log u$.

For example, for $U = \{0,1\}^6$ and $n = 2^4$, we may have a $4 \times 6$ Boolean matrix $H$ with $H\,(1,0,1,1,0,0)^T = (1,1,1,0)^T \pmod 2$. Such a matrix maps $101100 \in U$ to the index $(1110)_2 = 14$.

Property: $\Pr(Hx = Hy) = \frac{1}{n}$ for any $x \ne y \in U$.

Space $= |H| = O(\log u \log n)$

SLIDE 18

Proof of Example III

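To Show: The hash family of random Boolean matrices of dimension $\log n \times \log u$ is 2-universal.

Example: a $4 \times 6$ Boolean matrix $H$ with $H\,(1,0,1,1,0,0)^T = (1,1,1,0)^T \pmod 2$.

Claim 1: For any pair $x \ne y \in U$ and any row $j$, $\Pr((Hx)_j = (Hy)_j) = \frac{1}{2}$

Proof: $x \ne y \Rightarrow \exists i : x_i \ne y_i$. Fix all entries of row $j$ other than $H_{ji}$. Flipping $H_{ji}$ flips whether $(Hx)_j = (Hy)_j$, so exactly one of the two equally likely values of $H_{ji}$ gives $(Hx)_j = (Hy)_j$ and the other gives $(Hx)_j \ne (Hy)_j$.

Claim 2: Since the $\log_2 n$ rows are chosen independently, $\Pr(Hx = Hy) = \left(\frac{1}{2}\right)^{\log_2 n} = \frac{1}{n}$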
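For tiny dimensions the two claims can be confirmed by enumerating every Boolean matrix. The sketch below uses hypothetical function names and returns the exact collision probability as a fraction; it is only practical when the number of matrix entries is small, since it loops over all 2^(rows x cols) matrices.

```python
from fractions import Fraction
from itertools import product


def matrix_hash(H, x_bits):
    """Return Hx (mod 2) as a tuple of bits, one per row of H."""
    return tuple(sum(h * x for h, x in zip(row, x_bits)) % 2 for row in H)


def collision_probability(x_bits, y_bits, rows):
    """Exact Pr(Hx = Hy) over all Boolean matrices with the given row count."""
    cols = len(x_bits)
    hits = 0
    total = 0
    for entries in product((0, 1), repeat=rows * cols):
        H = [entries[r * cols:(r + 1) * cols] for r in range(rows)]
        total += 1
        if matrix_hash(H, x_bits) == matrix_hash(H, y_bits):
            hits += 1
    return Fraction(hits, total)


if __name__ == "__main__":
    # rows = 2 means n = 2^2 = 4, so the claim predicts probability 1/4.
    print(collision_probability((1, 0, 1), (0, 0, 1), rows=2))
```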

SLIDE 19

Revisiting Example I

Input

U = Universe of size u
S = A subset of U consisting of m elements
[n] = Indices of Hash Table

Let $p \in [u, 2u]$ be a prime. Define a collection of $p(p-1)$ hash functions
$H_{ab}(x) = ((ax + b) \bmod p) \bmod n$, for $1 \le a \le p-1$ and $0 \le b \le p-1$.

Hash Family: $\mathcal{H} = \bigcup_{a,b} \{H_{ab}\}$

Claim 1: $H_{ab}$ is 2-Universal (i.e. $\Pr(H_{ab}(x) = H_{ab}(y)) \le \frac{1}{n}$ for $x \ne y$, over a uniformly random choice of $(a, b)$)

SLIDE 20

Hab is 2-Universal

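Proof: Consider any two numbers $x \ne y \in U$. Let $r = ax + b \pmod p$ and $s = ay + b \pmod p$.

1. $r \ne s$, because $r - s \equiv a(x - y) \pmod p$ with $a \not\equiv 0$ and $x \not\equiv y \pmod p$.

2. Hence there is no collision of $x$ and $y$ with respect to mod $p$.

3. $a = (r - s)\left((x - y)^{-1} \bmod p\right) \bmod p$ and $b = r - ax \pmod p$.

4. Different choices of $(a, b)$ yield different pairs $(r, s)$.

5. There is a 1-1 correspondence between pairs $(r, s)$ with $r \ne s$ and pairs $(a, b)$.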

SLIDE 21

Hab is 2-Universal (contd.)

6. $\Pr(x, y \text{ collide}) = \Pr(r \equiv s \pmod n)$, where $r, s$ are randomly chosen distinct values (mod $p$).

7. For fixed $r$, $s \equiv r \pmod n$ holds for at most $\frac{p-1}{n}$ values of $s \ne r$.

8. $\Pr(r \equiv s \pmod n) \le \frac{(p-1)/n}{p-1} = \frac{1}{n}$

9. $\Pr(H_{ab}(x) = H_{ab}(y)) \le \frac{1}{n}$
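The bound can be confirmed by brute force for a small prime. The sketch below, with a hypothetical function name, enumerates all p(p-1) pairs (a, b) for every pair of distinct keys and returns the worst exact collision probability; the universe is assumed to be the integers 0..u-1 with u <= p.

```python
from fractions import Fraction


def max_collision_probability(p, n, u):
    """Largest exact Pr(H_ab(x) = H_ab(y)) over all keys x != y in [0, u)."""
    worst = Fraction(0)
    for x in range(u):
        for y in range(x + 1, u):
            hits = sum(
                1
                for a in range(1, p)
                for b in range(p)
                if ((a * x + b) % p) % n == ((a * y + b) % p) % n
            )
            worst = max(worst, Fraction(hits, p * (p - 1)))
    return worst


if __name__ == "__main__":
    # u = 7, prime p = 11 in [u, 2u], table size n = 4.
    worst = max_collision_probability(p=11, n=4, u=7)
    print(worst, worst <= Fraction(1, 4))
```

For p = 11 and n = 4, the residue classes modulo 4 inside {0, ..., 10} have sizes 3, 3, 3, 2, giving 20 colliding ordered pairs (r, s) out of 110, which is strictly below 1/4.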