
Hash Tables - PowerPoint PPT Presentation



  1. EECS 268 Programming II

     Slide 1: Hash Tables
       - Outline
         - Definition
         - Hash functions
         - Open hashing
         - Closed hashing
           - collision resolution techniques
         - Efficiency

     Slide 2: Overview
       - The hash table is an implementation style for the Table ADT that is good in a wide range of situations
         - efficient Insert, Delete, and Search operations
         - difficult Sorted Traversal
         - efficient unsorted traversal
       - Good approach as long as sorted output is comparatively rare in the total set of hash table operations

     Slide 3: Definition
       - Hash table is defined by:
         - set of records R = { r_1, r_2, ..., r_n } stored by the table
         - set of input keys K = { k_1, k_2, ..., k_n }, n >= 0, that can be associated with records (k_x, r_y)
         - Array of buckets B[0 ... m-1]: each array element is capable of holding one or more (k_x, r_y) pairs
         - Hash Function H: K -> {0, 1, ..., m-1}
           - for any given (k_x, r_y), B[H(k_x)] is the designated storage location for (k_x, r_y)
         - Collision resolution scheme
           - when (k_x, r_y) and (k_a, r_b) map to the same bucket under H, this scheme determines where the second record is stored

     Slide 4: Definitions
       - An Array of buckets B[0 ... m-1] holds all data managed by the hash table
       - Open or External Hashing
         - bucket locations store pointers (references) to record pairs (k_x, r_y)
         - colliding records stored in a linked list
       - Closed or Internal Hashing
         - buckets store actual objects
         - colliding records stored in other bucket locations
       - Note that the associated keys may be implicit rather than explicitly stored
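The definition on slides 3 and 4 can be sketched in a few lines of Python. This is a minimal illustration of open (external) hashing, not the course's implementation: the class name `ChainedHashTable`, its method names, and the use of Python's built-in `hash` as the function H are all assumptions made for the example.

```python
class ChainedHashTable:
    """Open (external) hashing: each bucket holds a chain of (key, record) pairs."""

    def __init__(self, m=7):
        self.m = m
        self.buckets = [[] for _ in range(m)]   # B[0 ... m-1]

    def _h(self, key):
        # H: K -> {0, ..., m-1}. Python's built-in hash() is a stand-in (assumption).
        return hash(key) % self.m

    def insert(self, key, record):
        chain = self.buckets[self._h(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:                        # key already present: replace its record
                chain[i] = (key, record)
                return
        chain.append((key, record))             # collision resolution: extend the chain

    def search(self, key):
        for k, r in self.buckets[self._h(key)]:
            if k == key:
                return r
        return None

    def delete(self, key):
        chain = self.buckets[self._h(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                return True
        return False


table = ChainedHashTable(m=7)
table.insert(64, "record for 64")
table.insert(8, "record for 8")    # 64 and 8 both map to bucket 1 and share a chain
```

Because colliding pairs simply extend the chain in B[H(k)], insertion never fails, which matches the "insertion is always possible" advantage listed for chaining.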

  2. Slide 5: Hash Functions
       - H(i) = i
         - reduces the hash table to an array
       - Selecting digits
         - choose some subset of digits in a large number
         - specific slice or positions
       - Folding
         - take digits or slices of a number and add them together with roll-over
       - H(i) = i modulo m
         - where m is Hash Table size
       - several other options possible

     Slide 6: Hash Function - 2
       - Strings are a common search key in many cases
         - convert string to an integer
       - Approaches
         - add characters or slices of characters together as n-bit unsigned numbers with the sum rolling over within x bits
         - bit shifting to form numbers possible
         - x bits chosen for table size, or x modulo m

     Slide 7: Open Hashing
       - Example: take a hash table size of 7 (prime) and a hash function h(x) = x mod 7
         - insert 64, 26, 56, 72, 8, 36, 42
       - If data set is large compared to hash table size, or the hash function clusters data, then length of the list holding the bucket contents can be significant
         - sorted list will reduce the average failure time
           - can identify failure before the end of the list
         - use binary search tree instead of list
           - why not a BST for the whole data set?
         - use second Hash table

     Slide 8: Open Hashing - 2
       - Advantages of Open Hashing with chaining
         - simple in concept and implementation
         - insertion is always possible
       - Disadvantages of hashing with chaining
         - unbalanced distribution decreases efficiency
           - O(n) for a linked list, O(log n) for a BST
         - greater memory overhead
         - higher execution overhead of stepping through pointers
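The hash function families on slides 5 and 6, and the chaining example on slide 7, can be tried out with a short Python sketch. The function names, the decimal-digit interpretation of "selecting digits" and "folding", and the default widths are assumptions chosen for illustration; the slides do not fix these details.

```python
def select_digits(n, positions):
    """Digit selection: keep the decimal digits at the given positions (0 = leftmost)."""
    digits = str(n)
    return int("".join(digits[p] for p in positions))


def fold(n, width=2, m=100):
    """Folding: add slices of `width` decimal digits, rolling over modulo m."""
    digits = str(n)
    total = 0
    for i in range(0, len(digits), width):
        total = (total + int(digits[i:i + width])) % m
    return total


def hash_string(s, bits=16):
    """Add character codes as unsigned numbers, rolling over within `bits` bits."""
    mask = (1 << bits) - 1
    total = 0
    for ch in s:
        total = (total + ord(ch)) & mask
    return total


def h(x, m=7):
    """The slide 7 hash function: h(x) = x mod 7."""
    return x % m


# Slide 7 example: which chain does each key land in under open hashing?
keys = [64, 26, 56, 72, 8, 36, 42]
chains = {b: [k for k in keys if h(k) == b] for b in range(7)}
# chains[0] is [56, 42], chains[1] is [64, 8, 36], chains[2] is [72], chains[5] is [26]
```

Three of the seven keys share bucket 1, which shows how chain length grows when the hash function clusters the data.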

  3. Slide 9: Closed Hashing
       - Closed hashing with Open addressing
         - storing all data items within single hash table, but [...]
       - Hash table of size m can hold at most m items
         - [...] items to m different table elements
         - collisions will generally occur before table is full
       - Collision resolution is thus crucial to efficient use of closed hash tables

     Slide 10: Closed Hashing - Collision Resolution
       - Create a sequence of collision resolution functions
         - h_0(x) is base hash function
         - h_1(x) used to find first alternate storage location after a collision
         - h_2(x) used to find the next alternate if first alternate is occupied
       - Each h_i(x) must be guaranteed to choose different table locations
       - Hash function series should ideally check all table locations

     Slide 11: Collision Resolution - Linear Probing
       - Search hash table sequentially starting from the original location specified by the hash function
         - h_i(x) = (h_0(x) + i) mod m
       - Insert 64, 26, 56, 72, 8, 36, 42 in an empty table of size 7
       - causes primary clusters by occupying adjacent table locations
         - similar to long chains in open hashing

     Slide 12: Collision Resolution - Quadratic Probing
       - Spread probed locations across the table
         - h_i(x) = (h_0(x) + i^2) mod m
       - Example: Insert 64, 26, 56, 72, 8, 36, 42
       - Series of probed locations is not guaranteed to cover the whole table without duplication
       - Closed hashing schemes can fail even though the table is not full
         - Fragile
         - and secondary clusters may form
         - if the probing scheme will not visit all table locations
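The linear and quadratic probing examples on slides 11 and 12 can be traced with the sketch below. It assumes the usual textbook forms h_i(x) = (h_0(x) + i) mod m and h_i(x) = (h_0(x) + i^2) mod m with h_0(x) = x mod m; the helper names `insert_all`, `linear`, and `quadratic` are hypothetical.

```python
def insert_all(keys, m, probe):
    """Insert keys into a closed table of size m.

    probe(h0, i, m) returns the i-th probed location. Returns (table, failed),
    where failed lists the keys that found no free slot within m probes.
    """
    table = [None] * m
    failed = []
    for x in keys:
        h0 = x % m                       # base hash function h_0(x)
        for i in range(m):
            slot = probe(h0, i, m)
            if table[slot] is None:
                table[slot] = x
                break
        else:                            # every probed slot was occupied
            failed.append(x)
    return table, failed


def linear(h0, i, m):
    return (h0 + i) % m


def quadratic(h0, i, m):
    return (h0 + i * i) % m


keys = [64, 26, 56, 72, 8, 36, 42]
linear_table, linear_failed = insert_all(keys, 7, linear)
quad_table, quad_failed = insert_all(keys, 7, quadratic)
# linear_table is [56, 64, 72, 8, 36, 26, 42] and nothing fails
# quad_table is [56, 64, 72, 8, 42, 26, None] and quad_failed is [36]
```

Under these assumptions the quadratic sequence for 36 only ever visits slots 1, 2, 5, and 3, all of which are occupied, so the insert fails while two slots are still free. That is the "can fail even though the table is not full" behavior noted on slide 12.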

  4. Slide 13: Collision Resolution - Linear Probing with Fixed Increment
       - h_i(x) = (h_0(x) + i * FI) mod m
       - FI is relatively prime to m
         - linear probing will visit all table locations without repeats
       - X is relatively prime to Y iff GCD(X,Y) = 1

     Slide 14: Collision Resolution - Double Hashing
       - Use a second hash function (h'(x)) to generate the probe sequence used after a collision
         - h_i(x) = (h_0(x) + i * h'(x)) mod m
         - h'(x) = R - (x mod R), where R < m is prime
       - Example: m=7, R=5, insert 64,26,56,72,8,36,42

     Slide 15: Closed Hashing -- Deletions
       - Example: Insert 64, 56, 72, 8 using linear probing
         - delete 64; delete 8
       - creates a problem because the empty cell could be there for two reasons
         - no further elements exist along this probing sequence
         - deletion of an item along the sequence took place
       - Two types of empty buckets
         - bucket has always been empty (AE) (flag 0)
         - bucket emptied by deletion (ED) (flag 1)

     Slide 16: Closed Hashing -- Deletions
       - During a probing sequence,
         - if an AE bucket is found, searching can stop
         - if an ED bucket is found, searching must continue
       - Closed Hashing is thus subject to a form of [...]
         - as cells are deleted, probing sequences generally lengthen as the probability of encountering ED cells increases
         - failed searches get more expensive because they cannot terminate until
           - an AE cell is found
           - all cells of the table have been visited
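The AE/ED flags and the double hashing example from this part of the deck can be combined in one sketch. It assumes probes of the form (h_0(x) + i * step(x)) mod m, where step(x) = 1 gives linear probing and step(x) = R - (x mod R) gives double hashing. The class name, the extra `FULL` flag for occupied buckets, and the assumption that inserted keys are not already in the table are all choices made for this example.

```python
AE, ED, FULL = 0, 1, 2   # always empty, emptied by deletion, occupied (FULL is an added label)


class ClosedHashTable:
    def __init__(self, m, step=lambda x: 1):
        self.m = m
        self.step = step                 # probe increment for key x (1 = linear probing)
        self.keys = [None] * m
        self.flags = [AE] * m

    def _probe(self, x):
        h0, inc = x % self.m, self.step(x)
        for i in range(self.m):
            yield (h0 + i * inc) % self.m

    def insert(self, x):
        # Assumes x is not already stored; AE and ED buckets are both reusable.
        for slot in self._probe(x):
            if self.flags[slot] != FULL:
                self.keys[slot], self.flags[slot] = x, FULL
                return slot
        return None                      # no free bucket along the probe sequence

    def search(self, x):
        for slot in self._probe(x):
            if self.flags[slot] == AE:   # always-empty bucket: x cannot be further along
                return None
            if self.flags[slot] == FULL and self.keys[slot] == x:
                return slot
            # ED bucket or a different key: keep probing
        return None

    def delete(self, x):
        slot = self.search(x)
        if slot is None:
            return False
        self.keys[slot], self.flags[slot] = None, ED
        return True


# Slide 15 example with linear probing: 8 is still reachable after 64 is deleted.
t = ClosedHashTable(7)
for x in [64, 56, 72, 8]:
    t.insert(x)
t.delete(64)
slot_of_8 = t.search(8)                  # probes bucket 1 (ED), bucket 2, then finds 8 in bucket 3

# Slide 14 example with double hashing: m = 7, R = 5.
d = ClosedHashTable(7, step=lambda x: 5 - x % 5)
for x in [64, 26, 56, 72, 8, 36, 42]:
    d.insert(x)
```

If the deleted bucket were reset to AE instead of ED, the search for 8 would stop at bucket 1 and wrongly report the key as missing.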

  5. Slide 17: Closed Hashing
       - Advantages of Closed Hashing with Open Addressing
         - lower execution overhead as addresses are calculated rather than read from pointers in memory
         - lower memory overhead as pointers are not stored
       - Disadvantages
         - more complex than chaining
         - can degenerate into linear search due to primary or secondary clustering
         - Delete and Find operations are more complex
         - Insert is not always possible even though the table is not full
         - Delete can increase probe sequence length by making search termination conditions ambiguous

     Slide 18: The Efficiency of Hashing
       - An analysis of the average-case efficiency
       - Load factor α
         - ratio of the current number of items in the table to the maximum size of the array table
         - measures how full a hash table is
         - should not exceed 2/3
       - Hashing efficiency for a particular search also depends on whether the search is successful
         - unsuccessful searches generally require more time than successful searches

     Slide 19: The Efficiency of Hashing

     Slide 20: Summary
       - Hash Tables are useful and efficient data structures in a wide range of applications
       - Open hashing with chaining is simple, easy to implement, and usually efficient
         - length of the chains is key to performance
       - Closed hashing with various approaches to generating a probe sequence can also be efficient
         - lower space and computation overhead
         - more complex implementation
         - performance is sensitive to probe sequence
       - Monitoring load factor and other hash-table behavior parameters is important in maintaining performance
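The load factor and the gap between successful and unsuccessful searches can be measured on a small table. The sketch below assumes linear probing with h_0(x) = x mod m and reuses the first five keys of the earlier examples; the helper names and the choice to count every examined bucket as one probe are assumptions for illustration.

```python
def load_factor(n_items, m):
    """Ratio of the current number of items to the size of the array."""
    return n_items / m


def build_linear(keys, m):
    """Fill a closed table with linear probing (assumes fewer than m keys)."""
    table = [None] * m
    for x in keys:
        slot = x % m
        while table[slot] is not None:
            slot = (slot + 1) % m
        table[slot] = x
    return table


def probes(table, x):
    """Count the buckets examined when searching for x with linear probing."""
    m = len(table)
    slot = x % m
    for count in range(1, m + 1):
        if table[slot] is None or table[slot] == x:
            return count
        slot = (slot + 1) % m
    return m


keys = [64, 26, 56, 72, 8]
table = build_linear(keys, 7)                 # [56, 64, 72, 8, None, 26, None]
alpha = load_factor(len(keys), len(table))    # 5/7, already above the 2/3 guideline

# Successful searches: look up every stored key.
avg_hit = sum(probes(table, k) for k in keys) / len(keys)
# Unsuccessful searches: the absent keys 0..6 start one search at each home bucket.
avg_miss = sum(probes(table, home) for home in range(7)) / 7
```

For this table the successful searches average 1.4 probes and the unsuccessful ones average 18/7 (about 2.6), which is consistent with the slide's remark that unsuccessful searches generally take longer.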
