1 / 91
Hash Tables
2 / 91
Hash Tables
Administrivia
▶ Assignment 2 has been released. ▶ We will be having a couple of guest lectures later in the semester.
3 / 91
Recap
4 / 91
Recap
Access Methods
▶ Access methods are alternative …
5 / 91
Recap
▶ How to organize meta-data on disk or in memory to support efficient access to specific tuples?
▶ How to allow multiple threads to access the derived data structure at the same time without causing problems?
7 / 91
Hash Tables
8 / 91
Hash Tables
▶ mymap.insert('a', 50); ▶ mymap['b']=100; ▶ mymap.find('a') ▶ mymap['a']
9 / 91
Hash Tables
▶ Average: O(1) ▶ Worst: O(n)
12 / 91
Hash Tables
▶ Perfect hash function: if key1 != key2, then hash(key1) != hash(key2)
13 / 91
Hash Tables
▶ How to map a large key space into a smaller domain of array offsets. ▶ Trade-off between being fast vs. collision rate.
▶ How to handle key collisions after hashing. ▶ Trade-off between allocating a large hash table vs. additional steps to find/insert keys.
14 / 91
Hash Functions
16 / 91
Hash Functions
▶ Used in networking for error detection.
▶ Designed to be a fast, general-purpose hash function.
▶ Designed to be faster for short keys (<64 bytes). ▶ New assembly instructions have been added recently to accelerate hashing.
▶ From the creator of zstd compression.
▶ Newer version of CityHash with better collision rates.
19 / 91
Static Hashing Schemes
20 / 91
Static Hashing Schemes
▶ Approach 1: Linear Probe Hashing ▶ Approach 2: Robin Hood Hashing ▶ Approach 3: Cuckoo Hashing
21 / 91
Static Hashing Schemes
▶ To determine whether an element is present, hash to a location in the index and scan for it. ▶ Have to store the key in the index to know when to stop scanning. ▶ Insertions and deletions are generalizations of lookups.
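The lookup rule above can be sketched in C++ as follows. This is a minimal illustration under stated assumptions, not the lecture's implementation: the class name `LinearProbeTable`, the fixed capacity, the `int` key and value types, and the use of `std::hash` are all hypothetical choices, and deletion is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <vector>

// Fixed-size linear probe table mapping int keys to int values (sketch only).
class LinearProbeTable {
 public:
  explicit LinearProbeTable(std::size_t capacity) : slots_(capacity) {
    assert(capacity > 0);
  }

  // Returns false if the table is full and the key is not already present.
  bool insert(int key, int value) {
    const std::size_t start = home(key);
    for (std::size_t i = 0; i < slots_.size(); ++i) {
      Slot& s = slots_[(start + i) % slots_.size()];
      if (!s.used || s.key == key) {  // free slot, or overwrite the same key
        s = Slot{true, key, value};
        return true;
      }
    }
    return false;
  }

  // Scan from the home slot. The stored key tells us when we have a match,
  // and an empty slot tells us the key is absent.
  std::optional<int> find(int key) const {
    const std::size_t start = home(key);
    for (std::size_t i = 0; i < slots_.size(); ++i) {
      const Slot& s = slots_[(start + i) % slots_.size()];
      if (!s.used) return std::nullopt;
      if (s.key == key) return s.value;
    }
    return std::nullopt;
  }

 private:
  struct Slot {
    bool used = false;
    int key = 0;
    int value = 0;
  };
  std::size_t home(int key) const {
    return std::hash<int>{}(key) % slots_.size();
  }
  std::vector<Slot> slots_;
};
```

Note that `insert` is the same scan as `find` with a different action at the stopping point, which is the sense in which insertions generalize lookups.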
30 / 91
Static Hashing Schemes
▶ Approach 1: Tombstone ▶ Approach 2: Movement
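The tombstone approach can be sketched as below. This is an assumption-laden illustration: the class name `TombstoneTable`, the three-state slot, and the `int` key and value types are hypothetical, and the movement approach (shifting later entries back into the hole) is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <vector>

// Linear probe table whose deletes leave tombstones (illustrative sketch).
class TombstoneTable {
 public:
  explicit TombstoneTable(std::size_t capacity) : slots_(capacity) {
    assert(capacity > 0);
  }

  // Returns false only when no empty or tombstone slot is left for a new key.
  bool insert(int key, int value) {
    std::optional<std::size_t> target;  // first reusable slot on the probe path
    for (std::size_t i = 0; i < slots_.size(); ++i) {
      const std::size_t pos = probe(key, i);
      Slot& s = slots_[pos];
      if (s.state == State::Full && s.key == key) {
        s.value = value;  // key already present: overwrite
        return true;
      }
      if (s.state != State::Full && !target) target = pos;
      if (s.state == State::Empty) break;  // key cannot live past an empty slot
    }
    if (!target) return false;
    slots_[*target] = Slot{State::Full, key, value};
    return true;
  }

  std::optional<int> find(int key) const {
    for (std::size_t i = 0; i < slots_.size(); ++i) {
      const Slot& s = slots_[probe(key, i)];
      if (s.state == State::Empty) return std::nullopt;  // end of the probe path
      if (s.state == State::Full && s.key == key) return s.value;
      // Tombstone or a different key: keep scanning.
    }
    return std::nullopt;
  }

  // Marks the slot as a tombstone instead of emptying it, so later scans
  // do not stop early at the hole.
  bool erase(int key) {
    for (std::size_t i = 0; i < slots_.size(); ++i) {
      Slot& s = slots_[probe(key, i)];
      if (s.state == State::Empty) return false;
      if (s.state == State::Full && s.key == key) {
        s.state = State::Tombstone;
        return true;
      }
    }
    return false;
  }

 private:
  enum class State { Empty, Full, Tombstone };
  struct Slot {
    State state = State::Empty;
    int key = 0;
    int value = 0;
  };
  std::size_t probe(int key, std::size_t i) const {
    return (std::hash<int>{}(key) % slots_.size() + i) % slots_.size();
  }
  std::vector<Slot> slots_;
};
```

The cost of this choice is that tombstones accumulate and lengthen scans until they are reused by a later insert.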
37 / 91
Static Hashing Schemes
▶ Store values in a separate storage area for each key.
▶ Store duplicate key entries together in the hash table.
38 / 91
Static Hashing Schemes
▶ Each key tracks the number of positions it is from its optimal position in the table. ▶ On insert, a key takes the slot of another key if the first key is farther away from its optimal position than the second key.
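A rough C++ sketch of Robin Hood insertion and lookup follows. The class name `RobinHoodTable`, the per-slot `dist` field, and the `int` key and value types are hypothetical, and the sketch assumes keys are unique and never deleted.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <utility>
#include <vector>

// Robin Hood variant of linear probing (sketch; unique keys, no deletes).
class RobinHoodTable {
 public:
  explicit RobinHoodTable(std::size_t capacity) : slots_(capacity) {
    assert(capacity > 0);
  }

  // Returns false when the table is full.
  bool insert(int key, int value) {
    if (size_ == slots_.size()) return false;
    Slot incoming{true, key, value, 0};
    std::size_t pos = home(key);
    while (true) {
      Slot& resident = slots_[pos];
      if (!resident.used) {
        resident = incoming;
        ++size_;
        return true;
      }
      // The key that is farther from its optimal position takes the slot;
      // the displaced key continues probing from here.
      if (incoming.dist > resident.dist) std::swap(incoming, resident);
      pos = (pos + 1) % slots_.size();
      ++incoming.dist;
    }
  }

  std::optional<int> find(int key) const {
    std::size_t pos = home(key);
    for (std::size_t dist = 0; dist < slots_.size(); ++dist) {
      const Slot& s = slots_[pos];
      // A resident closer to its home than we are to ours means the key
      // would have displaced it on insert, so the key is absent.
      if (!s.used || s.dist < dist) return std::nullopt;
      if (s.key == key) return s.value;
      pos = (pos + 1) % slots_.size();
    }
    return std::nullopt;
  }

 private:
  struct Slot {
    bool used = false;
    int key = 0;
    int value = 0;
    std::size_t dist = 0;  // positions away from the key's optimal slot
  };
  std::size_t home(int key) const {
    return std::hash<int>{}(key) % slots_.size();
  }
  std::vector<Slot> slots_;
  std::size_t size_ = 0;
};
```

Storing the distance in each slot is what allows `find` to stop before reaching an empty slot.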
47 / 91
Static Hashing Schemes
▶ On insert, check every table and pick any one that has a free slot. ▶ If no table has a free slot, evict the element from one of them and then re-hash it to find a new location.
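The insert rule can be sketched with two tables as below. Everything named here is an assumption for illustration: the class `CuckooTable`, the bound `kMaxKicks`, and the two multiplicative hash constants are arbitrary choices. Instead of rebuilding the table when the eviction chain runs too long, this sketch simply hands the unplaced entry back to the caller.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Two-table cuckoo hashing (sketch only).
class CuckooTable {
 public:
  using Entry = std::pair<std::uint64_t, int>;

  explicit CuckooTable(std::size_t slots_per_table) {
    assert(slots_per_table > 0);
    for (auto& table : tables_) table.resize(slots_per_table);
  }

  // Returns std::nullopt on success. If the eviction chain runs too long,
  // returns the entry left without a slot; a real table would rebuild with
  // new hash functions and re-insert it.
  std::optional<Entry> insert(std::uint64_t key, int value) {
    for (std::size_t t = 0; t < 2; ++t) {  // update in place if present
      Slot& s = slot(t, key);
      if (s.used && s.key == key) { s.value = value; return std::nullopt; }
    }
    Slot incoming{true, key, value};
    for (std::size_t t = 0; t < 2; ++t) {  // check every table for a free slot
      Slot& s = slot(t, key);
      if (!s.used) { s = incoming; return std::nullopt; }
    }
    std::size_t t = 0;
    for (int kicks = 0; kicks < kMaxKicks; ++kicks) {
      Slot& s = slot(t, incoming.key);
      if (!s.used) { s = incoming; return std::nullopt; }
      std::swap(incoming, s);  // evict the resident
      t = 1 - t;               // the victim is re-hashed into the other table
    }
    return Entry{incoming.key, incoming.value};
  }

  // A lookup inspects exactly one slot per table.
  std::optional<int> find(std::uint64_t key) const {
    for (std::size_t t = 0; t < 2; ++t) {
      const Slot& s = tables_[t][index(t, key)];
      if (s.used && s.key == key) return s.value;
    }
    return std::nullopt;
  }

 private:
  struct Slot {
    bool used = false;
    std::uint64_t key = 0;
    int value = 0;
  };
  static constexpr int kMaxKicks = 32;  // arbitrary bound on the eviction chain

  std::size_t index(std::size_t t, std::uint64_t key) const {
    // Two illustrative multiplicative hashes; the odd constants are arbitrary.
    static constexpr std::uint64_t kMul[2] = {0x9E3779B97F4A7C15ULL,
                                              0xC2B2AE3D27D4EB4FULL};
    return static_cast<std::size_t>((key * kMul[t]) >> 32) % tables_[t].size();
  }
  Slot& slot(std::size_t t, std::uint64_t key) {
    return tables_[t][index(t, key)];
  }
  std::array<std::vector<Slot>, 2> tables_;
};
```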
54 / 91
Static Hashing Schemes
▶ Static hashing schemes need to know the number of elements in advance. Otherwise, the table has to be rebuilt whenever it needs to grow or shrink. Why? ▶ You would have to take a latch on the entire hash table to prevent threads from adding new entries.
▶ Approach 1: Chained Hashing ▶ Approach 2: Extendible Hashing ▶ Approach 3: Linear Hashing
55 / 91
Dynamic Hashing Schemes
56 / 91
Dynamic Hashing Schemes
▶ To determine whether an element is present, hash to its bucket and scan for it. ▶ Insertions and deletions are generalizations of lookups.
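A minimal chained hash table might look like this in C++. The class name `ChainedTable`, the `std::list` chains, and the `std::string` to `int` mapping are hypothetical choices made for the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Chained hashing: each bucket owns a linked list of entries (sketch only).
class ChainedTable {
 public:
  explicit ChainedTable(std::size_t num_buckets) : buckets_(num_buckets) {
    assert(num_buckets > 0);
  }

  // Hash to the bucket, scan for the key, then overwrite or append.
  void insert(const std::string& key, int value) {
    Chain& chain = buckets_[index(key)];
    for (auto& [k, v] : chain) {
      if (k == key) { v = value; return; }
    }
    chain.emplace_back(key, value);
  }

  // Hash to the bucket and scan its chain.
  std::optional<int> find(const std::string& key) const {
    for (const auto& [k, v] : buckets_[index(key)]) {
      if (k == key) return v;
    }
    return std::nullopt;
  }

  // Same scan as find, but unlink the entry when it is found.
  bool erase(const std::string& key) {
    Chain& chain = buckets_[index(key)];
    for (auto it = chain.begin(); it != chain.end(); ++it) {
      if (it->first == key) { chain.erase(it); return true; }
    }
    return false;
  }

 private:
  using Chain = std::list<std::pair<std::string, int>>;
  std::size_t index(const std::string& key) const {
    return std::hash<std::string>{}(key) % buckets_.size();
  }
  std::vector<Chain> buckets_;
};
```

All three operations share the same hash-then-scan step, and the chains can grow without bound, which is why no fixed capacity is needed.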
62 / 91
Dynamic Hashing Schemes
▶ Data movement is localized to just the split chain.
63 / 91
Dynamic Hashing Schemes
▶ Global counter keeps track of the number of bits that the hash table uses. ▶ Local counter in each bucket tracks the number of hash bits used by that bucket.
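One way to picture the two counters is the sketch below, which stores a set of 32-bit keys. It rests on several assumptions that are not from the slides: the class name `ExtendibleSet`, the use of the key's low bits as its hash bits (an identity hash, chosen so the splits are easy to trace), and buckets shared between directory entries through `std::shared_ptr`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Extendible hashing over a set of keys (sketch; small distinct keys assumed).
class ExtendibleSet {
 public:
  explicit ExtendibleSet(std::size_t bucket_capacity)
      : capacity_(bucket_capacity), dir_{std::make_shared<Bucket>()} {
    assert(bucket_capacity > 0);
  }

  bool contains(std::uint32_t key) const {
    for (std::uint32_t k : dir_[slot(key)]->keys) {
      if (k == key) return true;
    }
    return false;
  }

  void insert(std::uint32_t key) {
    if (contains(key)) return;
    // Split the target bucket until it has room for the new key.
    while (dir_[slot(key)]->keys.size() >= capacity_) split(slot(key));
    dir_[slot(key)]->keys.push_back(key);
  }

  int global_depth() const { return global_depth_; }

 private:
  struct Bucket {
    int local_depth = 0;  // number of hash bits this bucket distinguishes
    std::vector<std::uint32_t> keys;
  };

  // Directory index: the low global_depth_ bits of the key.
  std::size_t slot(std::uint32_t key) const {
    return key & ((1u << global_depth_) - 1);
  }

  void split(std::size_t idx) {
    std::shared_ptr<Bucket> old = dir_[idx];
    if (old->local_depth == global_depth_) {
      // Double the directory; the new half mirrors the old half.
      const std::size_t n = dir_.size();
      dir_.reserve(2 * n);
      for (std::size_t i = 0; i < n; ++i) dir_.push_back(dir_[i]);
      ++global_depth_;
    }
    const std::uint32_t bit = 1u << old->local_depth;  // newly examined bit
    auto fresh = std::make_shared<Bucket>();
    fresh->local_depth = ++old->local_depth;
    // Redirect the directory entries whose new bit is set.
    for (std::size_t i = 0; i < dir_.size(); ++i) {
      if (dir_[i] == old && (i & bit)) dir_[i] = fresh;
    }
    // Only the keys of the split bucket move.
    std::vector<std::uint32_t> moved;
    moved.swap(old->keys);
    for (std::uint32_t k : moved) ((k & bit) ? fresh : old)->keys.push_back(k);
  }

  std::size_t capacity_;
  int global_depth_ = 0;
  std::vector<std::shared_ptr<Bucket>> dir_;
};
```

The directory doubles only when the overflowing bucket's local counter already equals the global counter; otherwise the split just repoints existing directory entries.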
72 / 91
Dynamic Hashing Schemes
▶ When any bucket overflows, split the bucket at the pointer location.
▶ Space Utilization ▶ Average Length of Overflow Chains
82 / 91
Dynamic Hashing Schemes
▶ When the pointer reaches the last slot, delete the first hash function and move the pointer back to the beginning.
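The split pointer and the pair of hash functions can be sketched as follows. The names are hypothetical (`LinearHashSet`, `round_size_`, `split_ptr_`), the two hash functions are assumed to be `key % n` and `key % 2n`, and overflow entries are kept in the same vector as a stand-in for an overflow chain.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Linear hashing over a set of keys (sketch only).
class LinearHashSet {
 public:
  LinearHashSet(std::size_t initial_buckets, std::size_t bucket_capacity)
      : round_size_(initial_buckets),
        capacity_(bucket_capacity),
        buckets_(initial_buckets) {
    assert(initial_buckets > 0 && bucket_capacity > 0);
  }

  bool contains(std::uint64_t key) const {
    for (std::uint64_t k : buckets_[address(key)]) {
      if (k == key) return true;
    }
    return false;
  }

  void insert(std::uint64_t key) {
    if (contains(key)) return;
    auto& bucket = buckets_[address(key)];
    bucket.push_back(key);
    // Any overflow triggers a split at the pointer, which is not
    // necessarily the bucket that overflowed.
    if (bucket.size() > capacity_) split();
  }

  std::size_t bucket_count() const { return buckets_.size(); }

 private:
  std::size_t address(std::uint64_t key) const {
    std::size_t idx = key % round_size_;  // first hash function
    // Buckets before the pointer were already split this round,
    // so they use the second hash function.
    if (idx < split_ptr_) idx = key % (2 * round_size_);
    return idx;
  }

  void split() {
    buckets_.emplace_back();  // new bucket at index split_ptr_ + round_size_
    std::vector<std::uint64_t> old;
    old.swap(buckets_[split_ptr_]);
    for (std::uint64_t k : old) buckets_[k % (2 * round_size_)].push_back(k);
    if (++split_ptr_ == round_size_) {
      // Round finished: drop the first hash function and reset the pointer.
      round_size_ *= 2;
      split_ptr_ = 0;
    }
  }

  std::size_t round_size_;  // bucket count at the start of the current round
  std::size_t capacity_;
  std::size_t split_ptr_ = 0;
  std::vector<std::vector<std::uint64_t>> buckets_;
};
```

Because the table grows one bucket at a time, an overflowing bucket may stay over capacity until the pointer reaches it.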
89 / 91
Dynamic Hashing Schemes
▶ Directory is gradually doubled over the course of a round ▶ A directory can be avoided by a clever choice of the buckets to split ▶ More flexibility: need not always split the appropriate dense bucket
90 / 91
Dynamic Hashing Schemes
▶ Examples: Page Table (Buffer Manager), Lock Table (Lock Manager)
91 / 91
Dynamic Hashing Schemes
▶ Lack of ordering in widely-used hashing schemes ▶ Lack of locality of reference → more disk seeks ▶ Persistent data structures are much more complex (logging and recovery)
▶ a.k.a., "The Greatest Data Structure of All Time!"