
Hashing: Connections, 2-Universal Hash Function, Perfect Hashing



  1. Hashing. Anil Maheshwari (anil@scs.carleton.ca), School of Computer Science, Carleton University, Canada

  2. Outline
     1. Setting
     2. Balls & Bins Connections
     3. 2-Universal Hash Function
     4. Perfect Hashing
     5. Proofs

  3. Setting
     Input: $U$ = universe of size $u$; $S$ = a subset of $U$ consisting of $m$ elements.
     Objective: Construct a hash map (a data structure) $h : U \to [n]$, where $n = O(|S|) = O(m)$, such that for every $S \subseteq U$ of size $m$, the number of memory accesses required for a lookup is $O(1)$ per element.

  4. Possible Approaches
     1. Use a binary search tree to store the elements of $S$.
     2. Maintain a membership bit for each element of $U$ to indicate its membership in $S$.
     3. . . .

  5. Collisions
     The number of hash functions of type $h : U \to [n]$ is $n^{|U|} = n^u$.
     Possible strategy:
     1. Pick a random function $h$ among the $n^u$ such functions.
     2. Initialize an array $A$ (the hash table) of size $n$. Each element of $A$ also stores a linked list.
     3. Insert($x$): Set $A[h(x)] = 1$ and append $x$ to the linked list stored at $A[h(x)]$.
        Locate($x$): If $A[h(x)] = 0$, report $x \notin S$; otherwise check whether $x$ is stored in the linked list at $A[h(x)]$.
     Space = $O(n + u \log n)$
     Time = $O\left(\frac{\log n}{\log \log n}\right)$ per element (w.h.p.)
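
The strategy on this slide can be sketched in Python as follows. This is a minimal illustration, not the author's implementation: the class name `ChainedHashTable`, the `seed` parameter, and the use of Python lists in place of linked lists are assumptions made for the example. The random function is stored as an explicit table of $u$ values, which is where the $u \log n$ bits of space go.

```python
import random


class ChainedHashTable:
    """Hash table with chaining, using a fully random function h: U -> [n]."""

    def __init__(self, u, n, seed=None):
        rng = random.Random(seed)
        self.n = n
        # A truly random function is stored as a table of u values in [0, n);
        # this table is the u * log(n)-bit cost mentioned on the slide.
        self.h = [rng.randrange(n) for _ in range(u)]
        # One chain per slot (a Python list stands in for a linked list).
        self.A = [[] for _ in range(n)]

    def insert(self, x):
        chain = self.A[self.h[x]]
        if x not in chain:
            chain.append(x)

    def locate(self, x):
        # An empty chain plays the role of A[h(x)] = 0 on the slide.
        return x in self.A[self.h[x]]

    def max_chain_length(self):
        return max(len(chain) for chain in self.A)


table = ChainedHashTable(u=1000, n=16, seed=1)
for key in (3, 141, 592, 653):
    table.insert(key)
```

The lookup cost is the length of the chain that `locate` scans, so `max_chain_length` is the quantity the w.h.p. time bound refers to.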

  6. 2-Universal Family of Hash Functions
     A random hash function $h : U \to [n]$ requires $u \log n$ bits.
     Required property: for all $x, y \in U$ with $x \neq y$ and all $i, j \in [n]$,
     $\Pr(h(x) = i \wedge h(y) = j) = \Pr(h(x) = i)\,\Pr(h(y) = j) = \frac{1}{n^2}$.
     Any family of hash functions that satisfies this property is called a 2-Universal Family.
     Can we construct a 2-Universal Family that requires less space?

  7. 2-Universal Families
     1. Let $X_1, X_2$ be independent uniform r.v. on $\{0, 1, \dots, p-1\}$, where $p$ is prime. Define $Y_i = X_1 + i X_2 \pmod{p}$.
        Claim: $\{Y_0, Y_1, \dots, Y_{p-1}\}$ are pairwise independent, i.e. for $i \neq j$, $\Pr(Y_i = a \wedge Y_j = b) = \Pr(Y_i = a)\,\Pr(Y_j = b) = \frac{1}{p^2}$.
        Space used: $O(\log p)$
     2. Let $X = \{x_1, \dots, x_k\}$ be a set of $k$ random bits. Consider the $2^k - 1$ non-empty subsets of $X$. For each subset $S \subseteq X$, generate a bit $Y_S = \sum_{x \in S} x \pmod{2}$.
        Claim: The $Y$ bits are pairwise independent.
        Space used: $O(k)$
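
For small $p$, the first claim can be checked exhaustively. The sketch below enumerates all $p^2$ seeds $(X_1, X_2)$ and confirms that every pair $(Y_i, Y_j)$ with $i \neq j$ is uniform on its $p^2$ possible values. The function names are illustrative only and do not come from the slides.

```python
from fractions import Fraction
from itertools import product


def joint_distribution(p, i, j):
    """Exact distribution of (Y_i, Y_j) over all p*p choices of (X1, X2)."""
    counts = {}
    for x1, x2 in product(range(p), repeat=2):
        pair = ((x1 + i * x2) % p, (x1 + j * x2) % p)
        counts[pair] = counts.get(pair, 0) + 1
    return {pair: Fraction(c, p * p) for pair, c in counts.items()}


def is_pairwise_independent(p):
    """True if every pair (Y_i, Y_j), i != j, takes each value w.p. 1/p^2."""
    target = Fraction(1, p * p)
    for i in range(p):
        for j in range(i + 1, p):
            dist = joint_distribution(p, i, j)
            if len(dist) != p * p or any(pr != target for pr in dist.values()):
                return False
    return True
```

Calling `is_pairwise_independent(7)` succeeds, while a composite modulus such as 4 fails the check, which shows why the slide requires $p$ to be prime.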

  8. 2-Universal Families Contd.
     3. Let $U = \{0, 1\}^{\log u}$ and index set $I = \{0, 1\}^{\log n}$. The hash function family is the set of random Boolean matrices $H$ of dimension $\log n \times \log u$. For example, for $U = \{0, 1\}^6$ and $I = \{0, 1\}^4$ (i.e., $n = 2^4$):
        $$\begin{pmatrix} 1 & 0 & 1 & 1 & 0 & 1 \\ 0 & 1 & 1 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 1 & 1 & 0 \end{pmatrix} \begin{pmatrix} 1 \\ 0 \\ 1 \\ 1 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 1 \\ 0 \end{pmatrix} \pmod{2}$$
        The matrix maps $101100 \in U$ to index $(1110)_2 = 14$.
        Claim: $\Pr(Hx = Hy) = \frac{1}{n}$ for any $x \neq y \in U$.
        Space used: $O(\log u \times \log n)$
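
The matrix example can be reproduced with a few lines of Python. This is a sketch with illustrative helper names (`matrix_hash`, `bits_to_int`): each output bit is the inner product of one row of $H$ with the key bits, taken modulo 2. Evaluating it on the key 101100 gives the bits 1110, which is 14 in decimal.

```python
def matrix_hash(H, x_bits):
    """Return H*x over GF(2) as a list of bits, one per row of H."""
    return [sum(h * x for h, x in zip(row, x_bits)) % 2 for row in H]


def bits_to_int(bits):
    """Interpret a list of bits, most significant first, as an integer."""
    value = 0
    for b in bits:
        value = 2 * value + b
    return value


# The 4 x 6 matrix and the key 101100 from the slide.
H = [
    [1, 0, 1, 1, 0, 1],
    [0, 1, 1, 0, 1, 1],
    [0, 1, 0, 1, 0, 0],
    [1, 0, 0, 1, 1, 0],
]
x = [1, 0, 1, 1, 0, 0]

index_bits = matrix_hash(H, x)
index = bits_to_int(index_bits)
```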

  9. 2-level Hash Table
     Input: $U$ = universe of size $u$; $S$ = a subset of $U$ consisting of $m$ elements.
     1st Level: Apply a random hash function from a 2-Universal Hash Family to map the elements of $S$ to a hash table of size $n = O(m)$.
     2nd Level: If $s_i$ elements are mapped to index $i$ of the hash table, create a secondary hash table of size $s_i^2$ for these elements using another random hash function.

  10. 2-level Hash Table Contd.
     $E[\text{\# of collisions in 1st level}] = \binom{m}{2} \frac{1}{n} = O(m)$
     $E[\text{\# of collisions when } s_i \text{ elements are mapped to a table of size } s_i^2] = \binom{s_i}{2} \frac{1}{s_i^2} = \frac{s_i^2 - s_i}{2 s_i^2} < \frac{1}{2}$
     Claim: $E\left[\sum_{i=1}^{n} s_i^2\right] = O(m)$
     $$E\left[\sum_{i=1}^{n} s_i^2\right] = E\left[\sum_{i=1}^{n} \left(s_i + 2\binom{s_i}{2}\right)\right] = m + 2\,E\left[\sum_{i=1}^{n} \binom{s_i}{2}\right] = m + 2\,E[\text{\# of collisions in 1st level}] = O(m)$$
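
The identity $E[\sum_i s_i^2] = m + 2\binom{m}{2}\frac{1}{n}$ can be checked exactly on a tiny instance. The sketch below assumes, for illustration, that each of the $m$ keys picks a slot uniformly and independently (a fully random function, which is in particular pairwise independent) and enumerates all $n^m$ assignments. The function name is hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import comb


def expected_sum_of_squares(m, n):
    """Exact E[sum_i s_i^2] when each of m keys picks one of n slots uniformly."""
    total = 0
    for assignment in product(range(n), repeat=m):
        loads = [0] * n
        for slot in assignment:
            loads[slot] += 1
        total += sum(s * s for s in loads)
    return Fraction(total, n ** m)


m, n = 4, 4
exact = expected_sum_of_squares(m, n)
# m + 2 * E[# of first-level collisions], with E[# collisions] = C(m, 2) / n.
predicted = m + 2 * Fraction(comb(m, 2), n)
```

For $m = n = 4$ both quantities equal 7, comfortably below the $2m$ that the formula allows when $n = m$.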

  11. 2-level Hash Table Contd.
     Expected lookup time:
     $E[\text{time for 1st level} + \text{time for 2nd level}] = 1 + O(1) = O(1)$
     Expected space used:
     $E[\text{hash functions} + \text{1st level} + \text{2nd level}] = (n + 1) + m + E\left[\sum_{i=1}^{n} s_i^2\right] = O(m)$
     Suppose $E[\text{space used}] \le 6m$. By Markov's inequality,
     $\Pr(\text{actual space used} > 12m) \le \frac{6m}{12m} = \frac{1}{2}$
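
The two-level scheme can be sketched as below. Several choices here are assumptions made for the example rather than details from the slides: the hash functions are drawn from the Carter-Wegman family $h(x) = ((ax + b) \bmod P) \bmod \text{size}$ instead of the matrix family, keys are assumed to be distinct integers in $[0, P)$, $n$ is set to $m$, and the first level is redrawn until $\sum_i s_i^2 \le 4m$. The redraw loops are justified by the Markov-style argument above: each attempt succeeds with probability at least $1/2$.

```python
import random

P = 2_147_483_647  # the prime 2^31 - 1; keys are assumed to lie in [0, P)


def random_hash(size, rng):
    """Draw h(x) = ((a*x + b) mod P) mod size from the Carter-Wegman family."""
    a, b = rng.randrange(1, P), rng.randrange(P)
    return lambda x: ((a * x + b) % P) % size


def build_two_level(keys, seed=None):
    """Build a two-level table for a list of distinct integer keys."""
    rng = random.Random(seed)
    m = len(keys)
    n = max(1, m)
    # 1st level: redraw until the secondary space sum(s_i^2) is at most 4m.
    while True:
        h1 = random_hash(n, rng)
        buckets = [[] for _ in range(n)]
        for x in keys:
            buckets[h1(x)].append(x)
        if sum(len(b) ** 2 for b in buckets) <= 4 * m:
            break
    # 2nd level: bucket i gets a table of size s_i^2; redraw until collision-free.
    second = []
    for bucket in buckets:
        size = len(bucket) ** 2
        while True:
            h2 = random_hash(size, rng) if size else None
            slots = [None] * size
            ok = True
            for x in bucket:
                j = h2(x)
                if slots[j] is not None:
                    ok = False
                    break
                slots[j] = x
            if ok:
                break
        second.append((h2, slots))
    return h1, second


def lookup(table, x):
    """Membership test using one first-level and one second-level probe."""
    h1, second = table
    h2, slots = second[h1(x)]
    return bool(slots) and slots[h2(x)] == x
```

A lookup evaluates two hash functions and inspects a single slot, which is the $1 + O(1)$ cost on the slide.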

  12. References
     1. Mitzenmacher and Upfal, Probability and Computing (Chapter 13), Cambridge Univ. Press, 2005.
     2. Cormen, Leiserson, Rivest and Stein, Introduction to Algorithms (Chapter 11), MIT Press, 2009.

  13. Missing Details

  14. Example I: 2-Universal Family
     Let $X_1, X_2$ be independent uniform r.v. on $\{0, 1, \dots, p-1\}$, where $p$ is prime.
     Define: $Y_i = X_1 + i X_2 \pmod{p}$.
     Claim: $\{Y_0, Y_1, \dots, Y_{p-1}\}$ are pairwise independent r.v.
     To show: for $i \neq j$, $\Pr(Y_i = a \wedge Y_j = b) = \Pr(Y_i = a)\,\Pr(Y_j = b) = \frac{1}{p^2}$.
     1. $\Pr(Y_i = a) = \frac{1}{p}$: For a fixed $X_2$, $Y_i$ is equally likely to take any of the values $\{0, \dots, p-1\}$ as $X_1$ ranges over $\{0, \dots, p-1\}$.
     2. Given $Y_i = a = X_1 + i X_2$ and $Y_j = b = X_1 + j X_2$ (all arithmetic mod $p$),
        $\implies X_2 = (a - b)(i - j)^{-1}$, $X_1 = a - i(a - b)(i - j)^{-1}$.
        The inverse always exists in this setting, since $p$ is prime and $i \neq j$.
        The pair $(X_1, X_2)$ can take $p^2$ possible values, but for $Y_i = a$ and $Y_j = b$ there is exactly one choice.
        Thus, $\Pr(Y_i = a \wedge Y_j = b) = \Pr(Y_i = a)\,\Pr(Y_j = b) = \frac{1}{p^2}$.
     Storage requirement: Need to store $p, X_1, X_2$.
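
The key step of this proof is constructive: given $a$, $b$ and $i \neq j$, the seed $(X_1, X_2)$ is recovered uniquely. A small sketch, assuming Python 3.8 or later for the modular inverse `pow(x, -1, p)`; the function name `recover_seed` is illustrative.

```python
def recover_seed(p, i, j, a, b):
    """Return the unique (X1, X2) with Y_i = a and Y_j = b, for prime p, i != j."""
    # (i - j) is invertible mod p because p is prime and i != j (mod p).
    inv = pow((i - j) % p, -1, p)
    x2 = ((a - b) * inv) % p
    x1 = (a - i * x2) % p
    return x1, x2


# Worked instance: p = 7, i = 2, j = 5, a = 3, b = 6.
x1, x2 = recover_seed(7, 2, 5, 3, 6)
```

Here $(i - j)^{-1} = 4^{-1} = 2 \pmod 7$, so $X_2 = (-3) \cdot 2 = 1$ and $X_1 = 3 - 2 \cdot 1 = 1$, and indeed $Y_2 = 1 + 2 = 3$ and $Y_5 = 1 + 5 = 6$.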

  15. Example II: 2-Universal Family
     Let $X = \{x_1, \dots, x_k\}$ be a set of $k$ random bits.
     Consider the $2^k - 1$ subsets of $X$ (excluding the empty set).
     For each subset $s \subseteq X$, generate a bit $y_s = \sum_{x \in s} x \pmod{2}$, i.e. the sum of the bits in $s$ modulo 2.
     Claim: All the $y$-bits corresponding to the $2^k - 1$ subsets of $X$ are pairwise independent.
     Consider any two bits $y_s$ and $y_{s'}$, where $s \neq s'$.
     $\Pr(y_s = 0) = \Pr(y_s = 1) = \frac{1}{2}$, since even if we fix all but one of the random bits of $s$, the value of $y_s$ still depends on the remaining bit, which is uniform.
     Since $s \neq s'$: either $s \cap s' = \emptyset$ or $s \cap s' \neq \emptyset$.
     If $s \cap s' = \emptyset$, $y_s$ and $y_{s'}$ depend on disjoint sets of random bits and are therefore independent.

  16. Example II: 2-Universal Family contd.
     Consider $s \cap s' \neq \emptyset$ and w.l.o.g. assume $\exists\, x_i \in s - s'$.
     Since bit $x_i$ is random, $\Pr(y_s = \alpha \mid y_{s'} = \beta) = \frac{1}{2}$ for any $\alpha, \beta \in \{0, 1\}$.
     $\Pr(y_s = \alpha \wedge y_{s'} = \beta) = \Pr(y_s = \alpha \mid y_{s'} = \beta)\,\Pr(y_{s'} = \beta) = \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}$
     $\implies y_s$ and $y_{s'}$ are independent.
     Storage requirements: The set $X$ of $k$ bits suffices to generate $2^k - 1$ pairwise independent random bits.
     Question: Is it a 3-Universal family?
     Consider $k = 3$. There are 7 non-empty subsets of the three random bits $\{x_1, x_2, x_3\}$. Bits $y_{\{x_1\}}$ and $y_{\{x_2\}}$ completely determine the bit $y_{\{x_1, x_2\}}$, so the bits are not 3-wise independent.
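
Both the pairwise-independence claim and the answer to the 3-universality question can be verified by brute force for small $k$. In this sketch subsets are represented as tuples of bit positions, so `(0, 1)` stands for $\{x_1, x_2\}$. All function names are illustrative.

```python
from itertools import combinations, product


def parity_bits(x):
    """Map k seed bits to the 2^k - 1 parity bits y_s, one per non-empty subset."""
    k = len(x)
    subsets = [s for r in range(1, k + 1) for s in combinations(range(k), r)]
    return {s: sum(x[i] for i in s) % 2 for s in subsets}


def pairwise_independent(k):
    """Check Pr(y_s = alpha and y_t = beta) = 1/4 for all s != t and all alpha, beta."""
    seeds = list(product((0, 1), repeat=k))
    table = [parity_bits(x) for x in seeds]
    subsets = list(table[0])
    for s, t in combinations(subsets, 2):
        for alpha, beta in product((0, 1), repeat=2):
            hits = sum(1 for y in table if y[s] == alpha and y[t] == beta)
            if 4 * hits != len(seeds):
                return False
    return True


def third_bit_is_determined(k=3):
    """Check that y_{x1,x2} always equals y_{x1} + y_{x2} (mod 2)."""
    for x in product((0, 1), repeat=k):
        y = parity_bits(x)
        if y[(0, 1)] != (y[(0,)] + y[(1,)]) % 2:
            return False
    return True
```

For $k = 3$ the first check passes on all pairs of the 7 bits, while the second confirms that one bit is a deterministic function of two others.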

  17. Example III: 2-Universal Family
     $U = \{0, 1\}^{\log u}$ and index set $I = \{0, 1\}^{\log n}$.
     The hash function family is the set of random Boolean matrices of dimension $\log n \times \log u$.
     For example, for $U = \{0, 1\}^6$ and $n = 2^4$, we may have
     $$\begin{pmatrix} 1 & 0 & 1 & 1 & 0 & 1 \\ 0 & 1 & 1 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 1 & 1 & 0 \end{pmatrix} \begin{pmatrix} 1 \\ 0 \\ 1 \\ 1 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 1 \\ 0 \end{pmatrix} \pmod{2}$$
     The matrix maps $101100 \in U$ to the index $(1110)_2 = 14$.
     Property: $\Pr(Hx = Hy) = \frac{1}{n}$ for any $x \neq y \in U$.
     Space $= |H| = O(\log u \log n)$
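
The collision property can be confirmed exactly for small dimensions by enumerating every Boolean matrix. The sketch below uses $\log n = 2$ and $\log u = 3$ (so 64 matrices and $n = 4$) as an illustrative choice, and the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def collision_probability(rows, cols, x, y):
    """Exact Pr(Hx = Hy) over all rows x cols Boolean matrices H, arithmetic mod 2."""
    collisions = total = 0
    for entries in product((0, 1), repeat=rows * cols):
        H = [entries[r * cols:(r + 1) * cols] for r in range(rows)]
        hx = [sum(h * v for h, v in zip(row, x)) % 2 for row in H]
        hy = [sum(h * v for h, v in zip(row, y)) % 2 for row in H]
        collisions += hx == hy
        total += 1
    return Fraction(collisions, total)


# Two distinct 3-bit keys hashed to 2-bit indices, so n = 4.
prob = collision_probability(2, 3, (1, 0, 1), (0, 1, 1))
```

The result is $1/4 = 1/n$: each row of $H$ independently satisfies $H_r (x \oplus y) = 0$ with probability $1/2$ whenever $x \neq y$.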
