something very different
Something very different - PowerPoint PPT Presentation


  1. Something very different
     https://nextstrain.org/narratives/ncov/sit-rep/2020-03-04 , http://data-science-sequencing.github.io/Win2018/lectures/lecture7/ , http://virological.org/t/response-to-on-the-origin-and-continuing-evolution-of-sars-cov-2/418

  2. Back to hashing
     $((ax + b) \bmod p) \bmod m$
     Warmup: find the largest set of keys that collide.
     • $\operatorname{hash}(x) = (3x + 2) \bmod 9$
     • $\operatorname{hash}(x) = (3x + 2) \bmod 11$
     Which is the better hash function?
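One way to explore the warmup empirically is to bucket a range of keys under each function and compare the results. The sketch below is only an illustration: the helper name `collision_groups` and the key range 0..98 are assumptions, chosen so that both moduli divide the number of keys evenly.

```python
from collections import defaultdict


def collision_groups(a, b, m, keys):
    """Group keys by their hash value under hash(x) = (a*x + b) mod m."""
    groups = defaultdict(list)
    for x in keys:
        groups[(a * x + b) % m].append(x)
    return dict(groups)


keys = range(99)  # assumed key range; 99 = 9 * 11
mod9 = collision_groups(3, 2, 9, keys)
mod11 = collision_groups(3, 2, 11, keys)

# Which slots are ever used, and how large is the biggest colliding set?
print(sorted(mod9), max(len(g) for g in mod9.values()))
print(sorted(mod11), max(len(g) for g in mod11.values()))
```

Because 3 and 9 share the factor 3, the mod 9 version only ever reaches slots 2, 5 and 8, while the mod 11 version reaches all 11 slots.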

  3. Hashing with chaining
     Store multiple keys in each array slot.
     How?
     • We will consider linked lists
     • Any dictionary ADT could be used provided ...
     Result (using linked lists):
     • We can hash more than m things into an array of size m
     • Worst-case runtime depends on the length of the longest chain
     • Memory is allocated on each insert
     [Figure: array with slots 0-6; slot 1 holds the chain AT, GA; slot 4 holds CT; slot 6 holds the chain AA, TA; the remaining slots are empty]
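A minimal sketch of chaining, under a few assumptions: the class name `ChainedHashTable` is made up for illustration, a Python list stands in for each linked list, and Python's built-in `hash` plays the role of the hash function.

```python
class ChainedHashTable:
    """Separate chaining: every slot holds a chain of (key, value) pairs."""

    def __init__(self, m=7):
        self.slots = [[] for _ in range(m)]  # one (initially empty) chain per slot
        self.n = 0                           # number of items stored

    def _chain(self, key):
        return self.slots[hash(key) % len(self.slots)]

    def insert(self, key, value=None):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:                     # key already present: overwrite
                chain[i] = (key, value)
                return
        chain.append((key, value))           # memory is allocated on each insert
        self.n += 1

    def find(self, key):
        for k, v in self._chain(key):        # cost depends on the chain length
            if k == key:
                return v
        raise KeyError(key)

    def load_factor(self):
        return self.n / len(self.slots)


table = ChainedHashTable(m=7)
for position, kmer in enumerate(["AT", "GA", "CT", "AA", "TA", "GG", "CC", "TT", "AC"]):
    table.insert(kmer, position)
```

Nine keys fit into seven slots here, which is the "more than m things" point from the slide.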

  4. Access time for chaining
     Load factor: $\alpha = \frac{\text{number of items hashed}}{\text{size of array}} = \frac{n}{m}$
     Assuming a uniform hash function, i.e. the probability of hashing to any slot is equal.
     Search cost:
     • Unsuccessful search examines $\alpha$ items
     • Successful search examines $1 + \frac{n - 1}{2m} = 1 + \frac{\alpha}{2} - \frac{\alpha}{2n}$ items
     For good performance we want a small load factor.
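The search-cost expressions can be checked numerically. This is a small sketch with assumed function names; it uses the standard expected value of α items for an unsuccessful search under uniform hashing.

```python
import math


def load_factor(n, m):
    """alpha = n / m for n items in a table with m slots."""
    return n / m


def expected_unsuccessful(n, m):
    """Expected items examined by an unsuccessful search: alpha."""
    return load_factor(n, m)


def expected_successful(n, m):
    """Expected items examined by a successful search: 1 + (n - 1) / (2m)."""
    return 1 + (n - 1) / (2 * m)


n, m = 5, 7  # e.g. five keys in a seven-slot table
alpha = load_factor(n, m)
# Both forms of the successful-search cost should agree.
same = math.isclose(expected_successful(n, m), 1 + alpha / 2 - alpha / (2 * n))
print(alpha, expected_unsuccessful(n, m), expected_successful(n, m), same)
```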

  5. Open addressing
     Each array element contains one item. The hash function specifies a sequence of elements to try.
     Insert: If the first slot is occupied, check the next location in the hash function sequence.
     Find: If the slot does not match, keep trying the next slot in the sequence until either the item is found or an empty slot is visited (item not found).
     Remove: Find the item and replace it with a tombstone.
     Result:
     • Cannot hash more than m items, by the pigeonhole principle
     • Hash table memory is allocated once
     • Performance will depend on how many times we check slots
     [Figure: array with slots 0-6 holding TA, AT, GA, (empty), CT, (empty), AA]
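The insert, find and remove rules above can be sketched as follows. The class name, the sentinel objects, and the choice of a step-by-one probe sequence are all assumptions made for illustration; any probe sequence could be substituted in `_probe`.

```python
EMPTY = object()      # slot that has never held an item
TOMBSTONE = object()  # slot whose item was removed


class OpenAddressingTable:
    def __init__(self, m=7):
        self.m = m
        self.slots = [EMPTY] * m  # memory is allocated once

    def _probe(self, key):
        """Slots to try, in order (assumed here: step by one, wrapping around)."""
        start = hash(key) % self.m
        for i in range(self.m):
            yield (start + i) % self.m

    def insert(self, key):
        first_free = None
        for j in self._probe(key):
            slot = self.slots[j]
            if slot is EMPTY:
                if first_free is None:
                    first_free = j
                break                      # key cannot be further along
            if slot is TOMBSTONE:
                if first_free is None:
                    first_free = j         # reusable, but keep looking for key
            elif slot == key:
                return                     # already present
        if first_free is None:
            raise OverflowError("table is full")
        self.slots[first_free] = key

    def find(self, key):
        for j in self._probe(key):
            slot = self.slots[j]
            if slot is EMPTY:
                return False               # empty slot reached: not found
            if slot is not TOMBSTONE and slot == key:
                return True
        return False

    def remove(self, key):
        for j in self._probe(key):
            slot = self.slots[j]
            if slot is EMPTY:
                return False
            if slot is not TOMBSTONE and slot == key:
                self.slots[j] = TOMBSTONE  # keeps later keys reachable
                return True
        return False
```

The tombstone matters because a search stops at the first empty slot; clearing the slot outright would hide any key that was placed after it in the same probe sequence.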

  6. Linear probing
     Try $(h(k) + i) \bmod m$ for $i = 0, 1, 2, \ldots, m - 1$.
     For this example $h(k) = k \bmod 7$ and $m = 7$.
     [Figure: table with slots 0-6]
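A short sketch of the probe sequence with $h(k) = k \bmod 7$ and $m = 7$. The function names and the keys 10, 17, 24 and 5 are made up for illustration; they are not taken from the slide's figure.

```python
def linear_probe_sequence(k, m=7):
    """Slots tried for key k: (h(k) + i) mod m with h(k) = k mod m."""
    h = k % m
    return [(h + i) % m for i in range(m)]


def insert_linear(table, k):
    """Place k in the first free slot of its probe sequence; return that slot."""
    for j in linear_probe_sequence(k, len(table)):
        if table[j] is None:
            table[j] = k
            return j
    raise OverflowError("table is full")


table = [None] * 7
placed = [insert_linear(table, k) for k in (10, 17, 24, 5)]
print(linear_probe_sequence(10))  # [3, 4, 5, 6, 0, 1, 2]
print(placed)                     # [3, 4, 5, 6]
```

Keys 10, 17 and 24 all hash to slot 3, so they pile up in neighbouring slots, and key 5 then finds its own slot already taken.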

  7. Double hashing
     Try $(h(k) + i \cdot h_2(k)) \bmod m$ for $i = 0, 1, 2, \ldots, m - 1$.
     For this example $h(k) = k \bmod 7$, $h_2(k) = 5 - (k \bmod 5)$ and $m = 7$.
     [Figure: table with slots 0-6]
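The same kind of sketch for double hashing, reading the second hash as $5 - (k \bmod 5)$, which is an assumption about the intended grouping. The function names and the keys 10, 17 and 24 are again hypothetical.

```python
def double_hash_sequence(k, m=7):
    """Slots tried for key k: (h(k) + i * h2(k)) mod m."""
    h1 = k % m
    h2 = 5 - (k % 5)  # always in 1..5, so the step is never zero
    return [(h1 + i * h2) % m for i in range(m)]


def insert_double(table, k):
    """Place k in the first free slot of its probe sequence; return that slot."""
    for j in double_hash_sequence(k, len(table)):
        if table[j] is None:
            table[j] = k
            return j
    raise OverflowError("table is full")


table = [None] * 7
placed = [insert_double(table, k) for k in (10, 17, 24)]
print(double_hash_sequence(10))  # [3, 1, 6, 4, 2, 0, 5]
print(double_hash_sequence(17))  # [3, 6, 2, 5, 1, 4, 0]
print(placed)                    # [3, 6, 4]
```

All three keys start at slot 3, but each uses a different step size, so they scatter instead of forming one contiguous run. Since 7 is prime and the step is between 1 and 5, every sequence visits all seven slots.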

  8. Rehashing
     Sometimes we need to resize the hash table:
     • For open addressing this has to happen when we fill the table
     • For separate chaining we want to do this when the load factor gets big
     To resize we:
     • Resize the hash table
       • $\Theta(1)$ amortized time if doubling
     • Get a new hash function
     Result:
     • Spread the keys out
     • Remove tombstones (open addressing)
     • Allows arbitrarily large tables
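A minimal sketch of rehashing for a chained table, assuming a doubling policy and a load-factor threshold of 1.0. The names `rehash`, `ResizingSet` and `MAX_LOAD` are hypothetical, and the "new hash function" is simply the built-in `hash` reduced modulo the new table size.

```python
def rehash(slots, new_m):
    """Move every key into a fresh table of new_m chains."""
    new_slots = [[] for _ in range(new_m)]
    for chain in slots:
        for key in chain:
            new_slots[hash(key) % new_m].append(key)  # hash under the new size
    return new_slots


class ResizingSet:
    MAX_LOAD = 1.0  # assumed threshold for "the load factor gets big"

    def __init__(self, m=4):
        self.slots = [[] for _ in range(m)]
        self.n = 0

    def insert(self, key):
        if key in self.slots[hash(key) % len(self.slots)]:
            return
        if (self.n + 1) / len(self.slots) > self.MAX_LOAD:
            # Doubling keeps the total rehashing work proportional to n.
            self.slots = rehash(self.slots, 2 * len(self.slots))
        self.slots[hash(key) % len(self.slots)].append(key)
        self.n += 1

    def __contains__(self, key):
        return key in self.slots[hash(key) % len(self.slots)]


s = ResizingSet(m=4)
for k in range(100):
    s.insert(k)
print(len(s.slots), s.n)  # 128 100
```

For an open-addressing table the same pass would also skip tombstones, so they disappear from the new table.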

  9. Hashing summary
     • What collision resolution strategy is best?
     • What is the best implementation of a dictionary ADT?
     • Why did we talk about trees?
     More in-depth info: http://jeffe.cs.illinois.edu/teaching/algorithms/notes/05-hashing.pdf

  10. Something new
      What is interesting about this tree?
      [Figure: tree with node labels 2, 5, 6, 9, 8, 7, 14, 29, 21, 42, 15, 33]
