14. Hashing


  1. 14. Hashing
     Hash Tables, Pre-Hashing, Hashing, Resolving Collisions using Chaining, Simple Uniform Hashing, Popular Hash Functions, Table-Doubling, Open Addressing: Probing, Uniform Hashing, Universal Hashing, Perfect Hashing
     [Ottmann/Widmayer, Ch. 4.1-4.3.2, 4.3.4; Cormen et al., Ch. 11-11.4]

     Motivating Example
     Goal: efficient management of a table of all n ETH students.
     Possible requirement: fast access (insertion, removal, find) to a record by name.

     Dictionary
     Abstract Data Type (ADT) D to manage items [20] i with keys k ∈ K, with operations:
     - D.insert(i): Insert or replace i in the dictionary D.
     - D.delete(i): Delete i from the dictionary D. If i does not exist ⇒ error message.
     - D.search(k): Returns the item with key k if it exists.
     [20] Items are key-value pairs (k, v); in the following we mainly consider the keys.

     Dictionary in C++
     Associative container std::unordered_map<>

         #include <iostream>
         #include <string>
         #include <unordered_map>

         int main() {
             // Create an unordered_map of strings that map to strings
             std::unordered_map<std::string, std::string> u = {
                 {"RED", "#FF0000"}, {"GREEN", "#00FF00"}
             };
             u["BLUE"] = "#0000FF"; // Add
             std::cout << "The HEX of color RED is: " << u["RED"] << "\n";
             for (const auto& n : u) // iterate over key-value pairs
                 std::cout << n.first << ":" << n.second << "\n";
         }
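To connect the dictionary ADT with the C++ container, the following minimal sketch maps insert, delete and search onto std::unordered_map. The struct name Dictionary and its method names are hypothetical and chosen only for illustration. The sketch also assumes that a missing key is reported through a return value rather than an error message. Note that operator[] default-inserts a missing key, so the sketch uses find for lookups.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical wrapper (illustrative names): the ADT operations
// insert / delete / search expressed with std::unordered_map.
struct Dictionary {
    std::unordered_map<std::string, std::string> items;

    // D.insert(i): insert the pair, or replace the value if the key exists.
    void insert(const std::string& key, const std::string& value) {
        items[key] = value;
    }

    // D.delete(i): erase the key. Assumption: a missing key is reported
    // by returning false instead of printing an error message.
    bool remove(const std::string& key) {
        return items.erase(key) == 1;
    }

    // D.search(k): pointer to the stored value, or nullptr if k is absent.
    // find() does not modify the map, unlike operator[].
    const std::string* search(const std::string& key) const {
        auto it = items.find(key);
        return it == items.end() ? nullptr : &it->second;
    }
};
```

A caller would check the pointer returned by search before dereferencing it.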

  2. Motivation / Use
     Perhaps the most popular data structure. Supported in many programming languages (C++, Java, Python, Ruby, Javascript, C# ...).
     Obvious uses:
     - Databases, spreadsheets
     - Symbol tables in compilers and interpreters
     Less obvious uses:
     - Substring search (Google, grep)
     - String commonalities (document distance, DNA)
     - File synchronisation
     - Cryptography: file transfer and identification

     1. Idea: Direct Access Table (Array)

         Index   Item
         0       -
         1       -
         2       -
         3       [3, value(3)]
         4       -
         5       -
         ...     ...
         k       [k, value(k)]
         ...     ...

     Problems:
     1. Keys must be non-negative integers.
     2. Large key range ⇒ large array.

     Solution to the first problem: Pre-hashing
     Prehashing: map keys to positive integers using a function ph : K → ℕ.
     Theoretically always possible because each key is stored as a bit sequence in the computer.
     Theoretically also: x = y ⇔ ph(x) = ph(y).
     Practically: APIs offer functions for pre-hashing (Java: object.hashCode(), C++: std::hash<>, Python: hash(object)).
     APIs map the key from the key set to an integer with a restricted size. [21]
     [21] Therefore the implication ph(x) = ph(y) ⇒ x = y no longer holds for all x, y.

     Prehashing Example: String
     Mapping a name $s = s_1 s_2 \ldots s_{l_s}$ to the key
     $$ph(s) = \left( \sum_{i=1}^{l_s} s_{l_s - i + 1} \cdot b^{\,i-1} \right) \bmod 2^w$$
     - b: chosen so that different names map to different keys as far as possible.
     - w: word size of the system (e.g. 32 or 64).
     Example (Java) with b = 31, w = 32 and ASCII values $s_i$:
     - Anna ↦ 2045632
     - Jacqueline ↦ 2042089953442505 mod 2^32 = 507919049
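The string pre-hash with b = 31 and w = 32 can be sketched in a few lines. This is a minimal illustration, not a library function: the name prehash is hypothetical, and it assumes the last character carries weight b^0, which is the weighting that reproduces the two example values (Anna, Jacqueline) given above.

```cpp
#include <cstdint>
#include <string>

// Pre-hash of a string with base b = 31 and word size w = 32.
// Horner's rule evaluates the polynomial so that the first character gets
// the highest power of b and the last character gets weight b^0.
// Unsigned 32-bit arithmetic wraps around, which performs the "mod 2^w" step.
std::uint32_t prehash(const std::string& s) {
    const std::uint32_t b = 31;
    std::uint32_t h = 0;
    for (unsigned char c : s) {
        h = h * b + c;
    }
    return h;
}
```

Because the result is confined to 32 bits, distinct strings can share a pre-hash, which is the restriction mentioned in the footnote.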

  3. Solution to the second problem: Hashing
     Reduce the universe: map (hash function) h : K → {0, ..., m − 1}, with m ≈ n = number of entries of the table.
     Collision: $h(k_i) = h(k_j)$.

     Nomenclature
     Hash function h: mapping from the set of keys K to the index set {0, 1, ..., m − 1} of an array (hash table).
     h : K → {0, 1, ..., m − 1}.
     Normally |K| ≫ m. There are $k_1, k_2 \in K$ with $h(k_1) = h(k_2)$ (collision).
     A hash function should map the set of keys as uniformly as possible to the hash table.

     Resolving Collisions: Chaining
     Example: m = 7, K = {0, ..., 500}, h(k) = k mod m. Keys 12, 55, 5, 15, 2, 19, 43.
     Direct chaining of the colliding entries:

         Slot   Entry in hash table   Colliding entries
         0      -
         1      15                    43
         2      2
         3      -
         4      -
         5      12                    5, 19
         6      55

     Algorithm for Hashing with Chaining
     - insert(i): Check if key k of item i is in the list at position h(k). If not, append i to the end of the list. Otherwise replace the element by i.
     - find(k): Check if key k is in the list at position h(k). If yes, return the data associated with key k, otherwise return the empty element null.
     - delete(k): Search the list at position h(k) for k. If successful, remove the list element.
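The three chaining operations described above can be sketched as follows. This is one possible layout under stated assumptions: the class name ChainedTable, the int key and value types, and the helper chain_length are all illustrative, and keys are assumed to be non-negative as in K = {0, ..., 500}.

```cpp
#include <cstddef>
#include <list>
#include <utility>
#include <vector>

// Hash table with chaining for int keys and int values (illustrative names).
// Uses h(k) = k mod m as in the example; m must be greater than 0.
class ChainedTable {
public:
    explicit ChainedTable(std::size_t m) : slots_(m) {}

    // insert(i): replace the value if the key is already in its chain,
    // otherwise append the pair to the end of the chain.
    void insert(int key, int value) {
        auto& chain = slots_[index(key)];
        for (auto& entry : chain) {
            if (entry.first == key) { entry.second = value; return; }
        }
        chain.emplace_back(key, value);
    }

    // find(k): pointer to the stored value, or nullptr if k is absent.
    const int* find(int key) const {
        for (const auto& entry : slots_[index(key)]) {
            if (entry.first == key) return &entry.second;
        }
        return nullptr;
    }

    // delete(k): remove the chain element with key k; true on success.
    bool remove(int key) {
        auto& chain = slots_[index(key)];
        for (auto it = chain.begin(); it != chain.end(); ++it) {
            if (it->first == key) { chain.erase(it); return true; }
        }
        return false;
    }

    // Number of entries currently stored at a slot.
    std::size_t chain_length(std::size_t slot) const { return slots_[slot].size(); }

private:
    // Assumes non-negative keys.
    std::size_t index(int key) const {
        return static_cast<std::size_t>(key) % slots_.size();
    }
    std::vector<std::list<std::pair<int, int>>> slots_;
};
```

With m = 7 and the keys 12, 55, 5, 15, 2, 19, 43 from the example, slot 5 ends up holding three entries and slot 1 two.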

  4. Worst-case Analysis
     Worst case: all keys are mapped to the same index.
     ⇒ Θ(n) per operation in the worst case.

     Simple Uniform Hashing
     Strong assumptions: each key is mapped to one of the m available slots
     - with equal probability (uniformity)
     - and independently of where other keys are hashed (independence).

     Simple Uniform Hashing
     Under the assumption of simple uniform hashing, the expected length of a chain when n elements are inserted into a hash table with m slots is
     $$\mathbb{E}(\text{length of chain } j) = \mathbb{E}\left( \sum_{i=0}^{n-1} \mathbb{1}(h(k_i) = j) \right) = \sum_{i=0}^{n-1} \mathbb{P}(h(k_i) = j) = \sum_{i=0}^{n-1} \frac{1}{m} = \frac{n}{m}$$
     α = n/m is called the load factor of the hash table.

     Simple Uniform Hashing
     Theorem: Let a hash table with chaining be filled with load factor α = n/m < 1. Under the assumption of simple uniform hashing, the next operation has expected cost ≤ 1 + α.
     Consequence: if the number of slots m of the hash table is always at least proportional to the number of elements n in the hash table, n ∈ O(m) ⇒ the expected running time of insertion, search and deletion is O(1).
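To make the contrast between the worst case and an evenly spread key set concrete, the following small sketch counts chain lengths for a given list of keys. The function names chain_lengths and load_factor are hypothetical, and h(k) = k mod m is assumed as the hash function. It does not simulate random keys; it only shows the two extremes.

```cpp
#include <cstddef>
#include <vector>

// Chain length per slot when the given keys are hashed with h(k) = k mod m.
// Assumes non-negative keys and m > 0.
std::vector<std::size_t> chain_lengths(const std::vector<int>& keys, std::size_t m) {
    std::vector<std::size_t> lengths(m, 0);
    for (int k : keys) {
        ++lengths[static_cast<std::size_t>(k) % m];
    }
    return lengths;
}

// Load factor alpha = n / m.
double load_factor(std::size_t n, std::size_t m) {
    return static_cast<double>(n) / static_cast<double>(m);
}
```

For m = 7, the keys 0, 7, 14, 21, 28 all land in slot 0 (one chain of length n), whereas the keys 0 to 6 give every slot exactly one entry, matching the load factor α = 1.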

  5. Further Analysis (directly chained list)
     1. Unsuccessful search. The average list length is α = n/m. The list has to be traversed completely.
        ⇒ Average number of entries considered: $C'_n = \alpha$.
     2. Successful search. Consider the insertion history: key j sees an average list length of (j − 1)/m.
        ⇒ Average number of entries considered:
        $$C_n = \frac{1}{n} \sum_{j=1}^{n} \left( 1 + \frac{j-1}{m} \right) = 1 + \frac{1}{n} \cdot \frac{n(n-1)}{2m} \approx 1 + \frac{\alpha}{2}$$

     Advantages and Disadvantages of Chaining
     Advantages:
     - Possible to overcommit: α > 1 allowed
     - Easy to remove keys
     Disadvantages:
     - Memory consumption of the chains

     [Variant: Indirect Chaining]
     Example: m = 7, K = {0, ..., 500}, h(k) = k mod m. Keys 12, 55, 5, 15, 2, 19, 43.
     Indirect chaining of the collisions.
     [Figure: hash table with slots 0 to 6 and the chains 15 → 43 (slot 1), 2 (slot 2), 12 → 5 → 19 (slot 5), 55 (slot 6)]

     Examples of popular Hash Functions
     h(k) = k mod m
     Ideal: m prime, not too close to powers of 2 or 10.
     But often: $m = 2^k - 1$ (k ∈ ℕ).
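The average cost of a successful search can be checked numerically. The sketch below evaluates the sum over the insertion history directly and compares it with the closed form 1 + (n − 1)/(2m); both function names are hypothetical, and n ≥ 1 and m ≥ 1 are assumed.

```cpp
#include <cstddef>

// Average number of entries inspected by a successful search, computed
// directly from the sum (1/n) * sum_{j=1..n} (1 + (j - 1) / m).
// Assumes n >= 1 and m >= 1.
double successful_search_cost(std::size_t n, std::size_t m) {
    double total = 0.0;
    for (std::size_t j = 1; j <= n; ++j) {
        total += 1.0 + static_cast<double>(j - 1) / static_cast<double>(m);
    }
    return total / static_cast<double>(n);
}

// Closed form of the same sum: 1 + (n - 1) / (2m).
double successful_search_closed_form(std::size_t n, std::size_t m) {
    return 1.0 + static_cast<double>(n - 1) / (2.0 * static_cast<double>(m));
}
```

For n = m = 1000 the closed form gives 1.4995, close to the approximation 1 + α/2 = 1.5.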

  6. Examples of popular Hash Functions
     Multiplication method:
     $$h(k) = \left\lfloor (a \cdot k \bmod 2^w) / 2^{w-r} \right\rfloor \bmod m$$
     $m = 2^r$, w = size of the machine word in bits.
     The multiplication adds k along all bits of a; integer division by $2^{w-r}$ and mod m extract the upper r bits.
     Written as code: a * k >> (w - r)
     A good value of a: $\left\lfloor \frac{\sqrt{5}-1}{2} \cdot 2^w \right\rfloor$, the integer that represents the first w bits of the fractional part of the irrational number.

     Illustration
     [Figure: the w-bit product a × k formed as a sum of shifted copies of k; the shift >> (w − r) keeps the upper r bits of the result]

     Table size increase
     We do not know beforehand how large n will be.
     Require m = Θ(n) at all times.
     The table size needs to be adapted. The hash function changes ⇒ rehashing:
     - Allocate an array A′ with size m′ > m.
     - Insert each entry of A into A′ (re-hashing the keys).
     - Set A ← A′.
     Costs: O(n + m + m′).
     How to choose m′?

     Table size increase
     1. Idea: n = m ⇒ m′ ← m + 1.
        Increase on each insertion: costs $\Theta(1 + 2 + 3 + \cdots + n) = \Theta(n^2)$.
     2. Idea: n = m ⇒ m′ ← 2m.
        Increase only if $m = 2^i$: $\Theta(1 + 2 + 4 + 8 + \cdots + n) = \Theta(n)$.
        A few insertions cost linear time, but on average we have Θ(1) (⇒ amortized analysis).
     Each operation of hashing with chaining has expected amortized cost Θ(1).
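The multiplication method translates almost directly into code. The sketch below fixes w = 32, so the constant a is the floor of (√5 − 1)/2 · 2^32, which is 2654435769 (0x9E3779B9). The function name mult_hash is hypothetical, and the restriction 1 ≤ r ≤ 32 is an assumption added to keep the shift well defined.

```cpp
#include <cassert>
#include <cstdint>

// Multiplication method for w = 32 and table size m = 2^r, 1 <= r <= 32.
// a = floor((sqrt(5) - 1) / 2 * 2^32) = 2654435769 (0x9E3779B9).
std::uint32_t mult_hash(std::uint32_t k, unsigned r) {
    assert(r >= 1 && r <= 32);  // r = 0 would shift by the full word width
    const std::uint32_t a = 2654435769u;
    // Unsigned multiplication wraps around, which is the "mod 2^w" step;
    // the right shift keeps the upper r bits of the w-bit product.
    return (a * k) >> (32 - r);
}
```

Because the result has only r bits, the trailing "mod m" of the formula is already satisfied for m = 2^r.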

  7. Open Addressing [22]
     Store the colliding entries directly in the hash table using a probing function s : K × {0, 1, ..., m − 1} → {0, 1, ..., m − 1}.
     The table position of a key is searched along a probing sequence
     $$S(k) := (s(k, 0), s(k, 1), \ldots, s(k, m-1)) \bmod m$$
     For each k ∈ K the probing sequence must be a permutation of {0, 1, ..., m − 1}.
     [22] Notational clarification: this method uses open addressing (meaning that the positions in the hash table are not fixed), but it is a closed hashing procedure (because the entries stay in the hash table).

     Algorithms for open addressing
     - insert(i): Search for key k of i in the table according to S(k). If k is not present, insert k at the first free position in the probing sequence. Otherwise error message.
     - find(k): Traverse the table entries according to S(k). If k is found, return the data associated with k. Otherwise return the empty element null.
     - delete(k): Search for k in the table according to S(k). If k is found, replace it with a special key removed.

     Linear Probing
     s(k, j) = h(k) + j ⇒ S(k) = (h(k), h(k) + 1, ..., h(k) + m − 1) mod m
     Example: m = 7, K = {0, ..., 500}, h(k) = k mod m. Keys 12, 55, 5, 15, 2, 19.

         Slot   0    1    2    3    4    5    6
         Key    5    15   2    19   -    12   55

     [Analysis of linear probing (without proof)]
     1. Unsuccessful search. Average number of entries considered:
        $$C'_n \approx \frac{1}{2} \left( 1 + \frac{1}{(1-\alpha)^2} \right)$$
     2. Successful search. Average number of entries considered:
        $$C_n \approx \frac{1}{2} \left( 1 + \frac{1}{1-\alpha} \right)$$
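The open addressing operations with linear probing can be sketched as follows. The class name LinearProbingTable and the three-state slot marker are illustrative choices, keys are assumed to be non-negative ints, and h(k) = k mod m is assumed as in the example. One further assumption not spelled out above: insert reuses slots that carry the removed marker, and it returns false instead of emitting an error message.

```cpp
#include <cstddef>
#include <vector>

// Open addressing with linear probing for non-negative int keys
// (illustrative names). Probing function: s(k, j) = (h(k) + j) mod m.
class LinearProbingTable {
public:
    enum class State { Free, Used, Removed };

    explicit LinearProbingTable(std::size_t m)
        : keys_(m, 0), states_(m, State::Free) {}

    // insert: false if the key is already present or no slot is available.
    // Assumption: slots marked as removed may be reused.
    bool insert(int key) {
        if (find(key) >= 0) return false;
        for (std::size_t j = 0; j < keys_.size(); ++j) {
            std::size_t pos = probe(key, j);
            if (states_[pos] != State::Used) {
                keys_[pos] = key;
                states_[pos] = State::Used;
                return true;
            }
        }
        return false;
    }

    // find: slot index of the key, or -1 if absent. Only a free slot ends
    // the search early; a removed marker is skipped.
    int find(int key) const {
        for (std::size_t j = 0; j < keys_.size(); ++j) {
            std::size_t pos = probe(key, j);
            if (states_[pos] == State::Free) return -1;
            if (states_[pos] == State::Used && keys_[pos] == key) {
                return static_cast<int>(pos);
            }
        }
        return -1;
    }

    // delete: replace the key by the special marker "removed".
    bool remove(int key) {
        int pos = find(key);
        if (pos < 0) return false;
        states_[static_cast<std::size_t>(pos)] = State::Removed;
        return true;
    }

private:
    // (h(k) + j) mod m with h(k) = k mod m simplifies to (k + j) mod m.
    std::size_t probe(int key, std::size_t j) const {
        return (static_cast<std::size_t>(key) + j) % keys_.size();
    }
    std::vector<int> keys_;
    std::vector<State> states_;
};
```

The removed marker matters for correctness: after deleting key 5 from the example table, key 19 can still be found because the search continues past the marked slot.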
