SLIDE 1
Something very different - - PowerPoint PPT Presentation
Something very different
https://nextstrain.org/narratives/ncov/sit-rep/2020-03-04, http://data-science-sequencing.github.io/Win2018/lectures/lecture7/, http://virological.org/t/
SLIDE 2
SLIDE 3
Hashing with chaining
Store multiple keys in each array slot.
How?
- We will consider linked lists
- Any dictionary ADT could be used provided ...
Result (using linked list):
- We can hash more than m things into an array of size m
- Worst-case runtime depends on the length of the longest chain
- Memory is allocated on each insert
[Figure: chaining example with keys AT, GA, CT, AA, TA; slot labels 1 2 3 4 5 6]
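A minimal sketch of chaining, to make the bullets above concrete. The class name ChainedHashTable and its methods are hypothetical, Python's built-in hash() stands in for the hash function, and a Python list stands in for each linked list.

```python
class ChainedHashTable:
    """Hash table with separate chaining (illustrative sketch only)."""

    def __init__(self, m=7):
        self.m = m                            # number of array slots
        self.slots = [[] for _ in range(m)]   # one chain per slot (list used in place of a linked list)
        self.n = 0                            # number of stored keys

    def _index(self, key):
        # Assumption: built-in hash() reduced modulo the table size.
        return hash(key) % self.m

    def insert(self, key, value):
        chain = self.slots[self._index(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:                      # key already present: overwrite its value
                chain[i] = (key, value)
                return
        chain.append((key, value))            # memory grows on each new insert
        self.n += 1

    def find(self, key):
        # Cost is proportional to the length of the chain in this slot.
        for k, v in self.slots[self._index(key)]:
            if k == key:
                return v
        return None

    def longest_chain(self):
        return max(len(chain) for chain in self.slots)


# Five keys fit in an array of only three slots.
table = ChainedHashTable(m=3)
for position, kmer in enumerate(["AT", "GA", "CT", "AA", "TA"]):
    table.insert(kmer, position)
print(table.n, table.longest_chain())
```

With 5 keys in 3 slots at least one chain holds two or more keys, which is why the worst case depends on the longest chain.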
SLIDE 4
Access time for chaining
Load factor: α = (# items hashed) / (size of array) = n / m
Assuming a uniform hash function, i.e. the probability of hashing to any slot is equal.
Search cost (expected):
- Unsuccessful search examines α items
- Successful search examines 1 + (n − 1)/(2m) = 1 + α/2 − α/(2n) items
For good performance we want a small load factor
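The search costs for chaining can be evaluated numerically. The function name chaining_search_cost and the sample values of n and m are illustrative assumptions, and the formulas are the expected costs under the uniform hashing assumption.

```python
def chaining_search_cost(n, m):
    """Return (alpha, expected items examined on a miss, expected items examined on a hit)."""
    alpha = n / m                        # load factor
    unsuccessful = alpha                 # a miss scans one whole chain of expected length alpha
    successful = 1 + (n - 1) / (2 * m)   # a hit scans the item plus about half of the rest of its chain
    return alpha, unsuccessful, successful


# Hypothetical sizes: the same 1000 items in tables of growing size.
for m in (100, 1000, 10000):
    alpha, miss, hit = chaining_search_cost(n=1000, m=m)
    print(f"m={m}: alpha={alpha:.2f}, miss={miss:.2f}, hit={hit:.3f}")
```

As m grows with n fixed, α shrinks and both costs approach a constant, which is the reason for wanting a small load factor.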
SLIDE 5
Open addressing
Each array element contains one item. The hash function specifies a sequence of elements to try.
Insert: If the first slot is occupied, check the next location in the hash function sequence.
Find: If the slot does not match, keep trying the next slot in the sequence until either the item is found or an empty slot is visited (item not found).
Remove: Find the item and replace it with a tombstone.
Result:
- Cannot hash more than m items, by the pigeonhole principle
- Hash table memory is allocated once
- Performance will depend on how many times we check slots
[Figure: open addressing example with keys AT, GA, CT, AA, TA; slot labels 1 2 3 4 5 6]
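The insert, find, and remove rules above can be sketched as follows. The class name OpenAddressingTable and the sentinels EMPTY and TOMBSTONE are hypothetical, Python's built-in hash() stands in for the hash function, and the probe sequence is assumed to step to the next slot each time.

```python
EMPTY = object()      # marks a slot that has never held an item
TOMBSTONE = object()  # marks a slot whose item was removed


class OpenAddressingTable:
    """Fixed-size open addressing table (illustrative sketch only)."""

    def __init__(self, m=7):
        self.m = m
        self.slots = [EMPTY] * m   # memory is allocated once

    def _probe(self, key):
        # Assumed probe sequence: start at hash(key) mod m, then step by one.
        start = hash(key) % self.m
        for i in range(self.m):
            yield (start + i) % self.m

    def find(self, key):
        for j in self._probe(key):
            if self.slots[j] is EMPTY:
                return False       # empty slot visited: item not found
            if self.slots[j] is not TOMBSTONE and self.slots[j] == key:
                return True
        return False               # every slot checked

    def insert(self, key):
        if self.find(key):
            return True            # already stored
        for j in self._probe(key):
            if self.slots[j] is EMPTY or self.slots[j] is TOMBSTONE:
                self.slots[j] = key
                return True
        return False               # table is full (pigeonhole)

    def remove(self, key):
        for j in self._probe(key):
            if self.slots[j] is EMPTY:
                return False
            if self.slots[j] is not TOMBSTONE and self.slots[j] == key:
                self.slots[j] = TOMBSTONE   # keep later items in the sequence reachable
                return True
        return False
```

Writing EMPTY instead of a tombstone on removal would cut the probe sequence short, so items inserted after the removed one could no longer be found.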
SLIDE 6
Linear probing
Try (h(k) + i) mod m for i = 0, 1, 2, ..., m − 1
[Figure: linear probing example table; extracted labels: 2 1 3 4 5 6]
For this example h(k) = k mod 7 and m = 7
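A short worked sketch of linear probing with h(k) = k mod 7 and m = 7. The function names and the keys 10, 17, 3 are hypothetical and chosen only because all three hash to slot 3; they are not the keys from the slide's figure.

```python
def linear_probe_sequence(k, m=7):
    """Slots tried for key k: (h(k) + i) mod m for i = 0 .. m-1, with h(k) = k mod m."""
    h = k % m
    return [(h + i) % m for i in range(m)]


def insert_linear(table, k):
    """Place k in the first free slot of its probe sequence and return that slot."""
    for j in linear_probe_sequence(k, len(table)):
        if table[j] is None:
            table[j] = k
            return j
    raise RuntimeError("table is full")


print(linear_probe_sequence(10))   # [3, 4, 5, 6, 0, 1, 2]

table = [None] * 7
for key in (10, 17, 3):            # hypothetical keys, all with h(k) = 3
    insert_linear(table, key)
print(table)                       # [None, None, None, 10, 17, 3, None]
```

The colliding keys end up in consecutive slots, so each further collision in that region has to walk past the whole run.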
SLIDE 7
Double hashing
Try (h(k) + i · h2(k)) mod m for i = 0, 1, 2, ..., m − 1
[Figure: double hashing example table; extracted labels: 2 1 3 4 5 6]
For this example h(k) = k mod 7, h2(k) = 5 − (k mod 5) and m = 7
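A small sketch of the double hashing probe sequence for this example. It assumes the second hash is read as h2(k) = 5 − (k mod 5), which always lies between 1 and 5; the function name and the keys 10 and 17 are hypothetical.

```python
def double_hash_sequence(k, m=7):
    """Slots tried for key k: (h(k) + i * h2(k)) mod m for i = 0 .. m-1."""
    h1 = k % m            # h(k) = k mod 7 in the example
    h2 = 5 - (k % 5)      # assumed reading of h2(k); never 0, so the probe always moves
    return [(h1 + i * h2) % m for i in range(m)]


# Both keys start at slot 3 but then follow different sequences.
print(double_hash_sequence(10))   # [3, 1, 6, 4, 2, 0, 5]
print(double_hash_sequence(17))   # [3, 6, 2, 5, 1, 4, 0]
```

Because m = 7 is prime and the step is between 1 and 5, every sequence visits all 7 slots exactly once.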
SLIDE 8
Rehashing
Sometimes we need to resize the hash table
- For open addressing this will have to happen when we fill the table
- For separate chaining we want to do this when the load factor gets big
To resize we:
- Resize the hash table
- Θ(1) amortized time if doubling
- Get a new hash function
Result:
- Spread the keys out
- Remove tombstones (open addressing)
- Allows arbitrarily large tables
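A minimal sketch of resizing by doubling for a chained table. The class name ResizingChainedSet, the starting size of 4, and the load factor threshold of 1.0 are assumptions made for illustration; taking hash(key) modulo the new size plays the role of the new hash function.

```python
class ResizingChainedSet:
    """Chained hash set that doubles its array when the load factor gets big (sketch only)."""

    def __init__(self, m=4, max_load=1.0):
        self.m = m
        self.n = 0
        self.max_load = max_load              # assumed threshold for "big"
        self.slots = [[] for _ in range(m)]

    def _resize(self, new_m):
        old_slots = self.slots
        self.m = new_m
        self.slots = [[] for _ in range(new_m)]
        for chain in old_slots:               # rehash every key: Θ(n) work for this one call
            for key in chain:
                self.slots[hash(key) % self.m].append(key)

    def insert(self, key):
        chain = self.slots[hash(key) % self.m]
        if key in chain:
            return
        chain.append(key)
        self.n += 1
        if self.n / self.m > self.max_load:
            self._resize(2 * self.m)          # doubling keeps inserts Θ(1) amortized

    def __contains__(self, key):
        return key in self.slots[hash(key) % self.m]


s = ResizingChainedSet()
for value in range(100):
    s.insert(value)
print(s.n, s.m)   # 100 128
```

Each resize costs time proportional to the number of stored keys, but with doubling the next resize is at least that many inserts away, which is where the amortized constant cost comes from.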
SLIDE 9
Hashing summary
What collision resolution strategy is best?
What is the best implementation of a dictionary ADT?
Why did we talk about trees?
More in-depth info: http://jeffe.cs.illinois.edu/teaching/algorithms/notes/05-hashing.pdf
SLIDE 10