Something very different - - PowerPoint PPT Presentation

SLIDE 1

Something very different

https://nextstrain.org/narratives/ncov/sit-rep/2020-03-04, http://data-science-sequencing.github.io/Win2018/lectures/lecture7/, http://virological.org/t/response-to-on-the-origin-and-continuing-evolution-of-sars-cov-2/418

SLIDE 2

Back to hashing

$((ax + b) \bmod p) \bmod m$

Warmup: find the largest set of keys that collide under each of

  • $\text{hash}(x) = (3x + 2) \bmod 9$
  • $\text{hash}(x) = (3x + 2) \bmod 11$

Which is a better hash function?
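One way to explore the warmup is to count collisions over a finite range of keys. Since $3(x - y) \equiv 0 \pmod 9$ exactly when $x \equiv y \pmod 3$, the mod-9 function can only reach slots 2, 5 and 8; with 11 prime, 3 is invertible mod 11, so every slot is reachable. The sketch below checks this on a hypothetical key set, the integers 0 to 98; the function name `slot_counts` is illustrative only.

```python
from collections import Counter


def slot_counts(m: int, keys) -> Counter:
    """Return how many keys land in each slot under hash(x) = (3x + 2) mod m."""
    return Counter((3 * x + 2) % m for x in keys)


keys = range(99)  # hypothetical key set: the integers 0..98
for m in (9, 11):
    counts = slot_counts(m, keys)
    print(f"m = {m}: {len(counts)} slots used, "
          f"largest colliding set has {max(counts.values())} keys")
```

This prints `m = 9: 3 slots used, largest colliding set has 33 keys` and `m = 11: 11 slots used, largest colliding set has 9 keys`: the multiplier 3 shares a factor with 9 but not with 11.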

SLIDE 3

Hashing with chaining

Store multiple keys in each array slot. How?

  • We will consider linked lists
  • Any dictionary ADT could be used provided ...

Result (using linked lists):

  • We can hash more than m things into an array of size m
  • Worst-case runtime depends on the length of the largest chain
  • Memory is allocated on each insert

[Figure: keys AT, GA, CT, AA, TA hashed into an array with slots 1-6]
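The chaining scheme above can be sketched in Python as follows. This is a minimal sketch, not the course's implementation: it assumes Python's built-in `hash()` as the hash function, uses hand-rolled singly linked `Node` chains to match the slide's choice of linked lists, and the names `ChainedHashTable`, `insert`, `find` and `chain_lengths` are hypothetical. The keys are the ones from the figure.

```python
from typing import Any, Optional


class Node:
    """One link in a chain: a key, its value, and the next node (or None)."""

    def __init__(self, key: Any, value: Any, next_node: Optional["Node"] = None):
        self.key = key
        self.value = value
        self.next = next_node


class ChainedHashTable:
    """Separate chaining: slot i holds the head of a singly linked list."""

    def __init__(self, m: int):
        self.m = m
        self.slots: list = [None] * m
        self.n = 0  # number of keys stored

    def _index(self, key: Any) -> int:
        # Assumption: Python's built-in hash() stands in for the lecture's hash function.
        return hash(key) % self.m

    def insert(self, key: Any, value: Any) -> None:
        i = self._index(key)
        node = self.slots[i]
        while node is not None:  # walk the chain; update in place if the key exists
            if node.key == key:
                node.value = value
                return
            node = node.next
        # New key: allocate a node (memory on every insert) and push it on the front.
        self.slots[i] = Node(key, value, self.slots[i])
        self.n += 1

    def find(self, key: Any) -> Optional[Any]:
        node = self.slots[self._index(key)]
        while node is not None:
            if node.key == key:
                return node.value
            node = node.next
        return None  # reached the end of the chain: key not present

    def chain_lengths(self) -> list:
        lengths = []
        for head in self.slots:
            count, node = 0, head
            while node is not None:
                count, node = count + 1, node.next
            lengths.append(count)
        return lengths


table = ChainedHashTable(m=3)  # 5 keys into 3 slots: more items than slots
for key in ["AT", "GA", "CT", "AA", "TA"]:
    table.insert(key, len(key))
print(table.find("CT"), table.n, max(table.chain_lengths()))
```

Because five keys go into three slots, the pigeonhole principle guarantees at least one chain of length 2 or more; which chain that is varies between runs, since Python randomizes string hashes per process.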

SLIDE 4

Access time for chaining

Load factor: $\alpha = \dfrac{\text{number of items hashed}}{\text{size of array}} = \dfrac{n}{m}$

Assuming a uniform hash function, i.e., the probability of hashing to any slot is equal, the search cost is:

  • Unsuccessful search examines $\alpha$ items on average
  • Successful search examines $1 + \dfrac{n-1}{2m} = 1 + \dfrac{\alpha}{2} - \dfrac{\alpha}{2n}$ items on average

For good performance we want a small load factor.
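The successful-search formula can be sanity-checked with a small Monte Carlo simulation. This sketch models the uniform-hash assumption by letting `random.Random` pick each key's slot independently; the function name and the parameters n = 100, m = 50 are hypothetical choices. A key at position j of its chain costs j comparisons, so a chain of length L contributes $1 + 2 + \cdots + L = L(L+1)/2$ comparisons in total.

```python
import random


def simulate_successful_search(n: int, m: int, trials: int, seed: int = 0) -> float:
    """Estimate the average number of items a successful search examines
    under separate chaining, assuming each key picks a slot uniformly at random."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        chain_len = [0] * m
        for _key in range(n):
            chain_len[rng.randrange(m)] += 1  # uniform hash: every slot equally likely
        # Searching for each of the L keys in a chain costs 1, 2, ..., L comparisons.
        examined = sum(L * (L + 1) // 2 for L in chain_len)
        total += examined / n  # average over all n stored keys
    return total / trials


n, m = 100, 50
alpha = n / m
predicted = 1 + alpha / 2 - alpha / (2 * n)
estimate = simulate_successful_search(n, m, trials=2000)
print(f"predicted {predicted:.3f}, simulated {estimate:.3f}")
```

With $\alpha = 2$ the formula predicts 1.99 items; the simulated average should land close to that. The unsuccessful case needs no simulation: a miss scans a whole chain, and the average chain length is exactly $n/m = \alpha$.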

SLIDE 5

Open addressing

Each array element holds at most one item. The hash function specifies a sequence of slots to try.

  • Insert: if the first slot is occupied, check the next location in the hash function's sequence.
  • Find: if the slot does not match, keep trying the next slot in the sequence until either the item is found or an empty slot is visited (item not found).
  • Remove: find the item and replace it with a tombstone.

Result:

  • Cannot hash more than m items (pigeonhole principle)
  • Hash table memory is allocated once
  • Performance depends on how many slots we check

[Figure: keys AT, GA, CT, AA, TA placed one per slot in an array with slots 1-6]

SLIDE 6

Linear probing

Try $(h(k) + i) \bmod m$ for $i = 0, 1, 2, \ldots, m - 1$

[Figure: linear probing example]

For this example $h(k) = k \bmod 7$ and $m = 7$.
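A short sketch of the linear probe sequence with the slide's $h(k) = k \bmod 7$ and $m = 7$. The keys 3, 10, 17, 24 are a hypothetical choice that all hash to slot 3, which makes the clustering easy to see; the function names are illustrative only.

```python
M = 7  # table size from the slide's example


def h(k: int) -> int:
    return k % 7  # h(k) = k mod 7, as on the slide


def linear_probe_sequence(k: int) -> list:
    """Slots tried for key k: (h(k) + i) mod m for i = 0, 1, ..., m - 1."""
    return [(h(k) + i) % M for i in range(M)]


def insert_all(keys):
    """Insert keys in order; return the table and how many slots each key probed."""
    table = [None] * M
    probes = {}
    for k in keys:
        for count, slot in enumerate(linear_probe_sequence(k), start=1):
            if table[slot] is None:
                table[slot] = k
                probes[k] = count
                break
        else:  # every slot in the sequence was occupied
            raise OverflowError("table is full")
    return table, probes


table, probes = insert_all([3, 10, 17, 24])  # hypothetical keys, all with h(k) = 3
print(linear_probe_sequence(10))  # [3, 4, 5, 6, 0, 1, 2]
print(table)   # [None, None, None, 3, 10, 17, 24]
print(probes)  # {3: 1, 10: 2, 17: 3, 24: 4}
```

Each colliding key must walk past every key already in the run, so the probe counts grow 1, 2, 3, 4.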

SLIDE 7

Double hashing

Try $(h(k) + i \cdot h_2(k)) \bmod m$ for $i = 0, 1, 2, \ldots, m - 1$

[Figure: double hashing example]

For this example $h(k) = k \bmod 7$, $h_2(k) = 5 - (k \bmod 5)$, and $m = 7$.
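The same hypothetical keys under double hashing, using the slide's $h$, $h_2$ and $m = 7$. Because $h_2(k) \in \{1, \ldots, 5\}$ is never 0 and 7 is prime, every step size is coprime to $m$, so each probe sequence visits all 7 slots. The function names are illustrative only.

```python
M = 7  # table size from the slide's example


def h(k: int) -> int:
    return k % 7  # primary hash from the slide


def h2(k: int) -> int:
    return 5 - (k % 5)  # step size from the slide; always in 1..5, never 0


def double_hash_sequence(k: int) -> list:
    """Slots tried for key k: (h(k) + i * h2(k)) mod m for i = 0, 1, ..., m - 1."""
    return [(h(k) + i * h2(k)) % M for i in range(M)]


def insert_all(keys):
    """Insert keys in order; return the table and how many slots each key probed."""
    table = [None] * M
    probes = {}
    for k in keys:
        for count, slot in enumerate(double_hash_sequence(k), start=1):
            if table[slot] is None:
                table[slot] = k
                probes[k] = count
                break
        else:  # every slot in the sequence was occupied
            raise OverflowError("table is full")
    return table, probes


table, probes = insert_all([3, 10, 17, 24])  # hypothetical keys, all with h(k) = 3
print(double_hash_sequence(10))  # [3, 1, 6, 4, 2, 0, 5]
print(table)   # [None, 10, None, 3, 24, None, 17]
print(probes)  # {3: 1, 10: 2, 17: 2, 24: 2}
```

Keys that share $h(k)$ usually get different step sizes, so they scatter instead of forming one contiguous run; here that saves 3 probes over linear probing on the same keys.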

SLIDE 8

Rehashing

Sometimes we need to resize the hash table

  • For open addressing this will have to happen when we fill the table
  • For separate chaining we want to do this when the load factor gets large

To resize we:

  • Resize the hash table (Θ(1) amortized time per insert if we double the size)
  • Get a new hash function and reinsert every key

Result:

  • Spread the keys out
  • Remove tombstones (open addressing)
  • Allows arbitrarily large tables
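Rehashing for separate chaining can be sketched as follows. Several choices here are assumptions, not the lecture's: integer keys, Python lists standing in for the linked-list chains, a hypothetical threshold `MAX_LOAD = 1.0` for "the load factor gets large", and a new function drawn from the $((ax + b) \bmod p) \bmod m$ family shown earlier, with $p = 2^{31} - 1$.

```python
import random

P = 2_147_483_647  # assumption: a prime (2**31 - 1) larger than every key we store


class RehashingTable:
    """Separate chaining over integer keys that doubles m and draws a fresh
    h(x) = ((a*x + b) mod P) mod m whenever n / m exceeds MAX_LOAD."""

    MAX_LOAD = 1.0  # hypothetical threshold

    def __init__(self, m: int = 4, seed: int = 0):
        self.rng = random.Random(seed)
        self.n = 0
        self._reset(m)

    def _reset(self, m: int) -> None:
        """Allocate m empty chains and draw a new hash function from the family."""
        self.m = m
        self.chains = [[] for _ in range(m)]
        self.a = self.rng.randrange(1, P)
        self.b = self.rng.randrange(0, P)

    def _index(self, x: int) -> int:
        return ((self.a * x + self.b) % P) % self.m

    def contains(self, x: int) -> bool:
        return x in self.chains[self._index(x)]

    def insert(self, x: int) -> None:
        if self.contains(x):
            return
        self.chains[self._index(x)].append(x)
        self.n += 1
        if self.n / self.m > self.MAX_LOAD:
            self._rehash(2 * self.m)  # doubling gives Θ(1) amortized cost per insert

    def _rehash(self, new_m: int) -> None:
        old_keys = [x for chain in self.chains for x in chain]
        self._reset(new_m)  # bigger array and a new hash function
        for x in old_keys:  # every key is reinserted under the new hash
            self.chains[self._index(x)].append(x)


t = RehashingTable()
for x in range(100):
    t.insert(x)
print(t.m, t.n)  # 128 100
```

Starting from m = 4, the table doubles each time n passes m (at n = 5, 9, 17, 33, 65), so after 100 inserts m = 128 and the load factor is about 0.78.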

SLIDE 9

Hashing summary

What collision resolution strategy is best? What is the best implementation of a dictionary ADT? Why did we talk about trees?

More in-depth info: http://jeffe.cs.illinois.edu/teaching/algorithms/notes/05-hashing.pdf

SLIDE 10

Something new

What is interesting about this tree?

[Figure: a tree containing the keys 2, 5, 6, 7, 8, 9, 14, 15, 21, 29, 33, 42]
