hashing and birthdays

  1. Hashing and Birthdays
     Today’s announcements:
     ◮ PA2 out, due Nov 1, 23:59
     ◮ MT2 Nov 7, 19:00-21:00, WOOD 2
     Today’s Plan
     ◮ Hashing
     ◮ Birthdays and probability
     Warm up: Thinking about AVL trees
     ◮ AVL trees are binary search trees that allow only slight imbalance
     ◮ Worst-case $O(\log n)$ time for find, insert, and remove
     ◮ Elements (even siblings) may be scattered in memory
     Could we preserve optimal balance always?
     (Figure: an example tree with root 5; its children are 3 and 7; 3 has children 2 and 4; 7 has left child 6.)
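To make "only slight imbalance" concrete: in an AVL tree, the heights of every node's two subtrees differ by at most 1. The C++ sketch below checks that condition on a plain binary search tree. The `Node` struct and the `bstInsert`, `avlHeight`, and `isAVLBalanced` helpers are hypothetical names, not from the slides, and the sketch only checks balance; it does not perform AVL rotations.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <memory>

// Hypothetical node type for this sketch.
struct Node {
    int key;
    std::unique_ptr<Node> left, right;
    explicit Node(int k) : key(k) {}
};

// Plain BST insertion with no rebalancing (smaller keys go left).
void bstInsert(std::unique_ptr<Node>& node, int key) {
    if (!node) {
        node = std::make_unique<Node>(key);
    } else if (key < node->key) {
        bstInsert(node->left, key);
    } else {
        bstInsert(node->right, key);
    }
}

// Height of the subtree (an empty tree has height -1), or -2 if some node's
// two subtrees differ in height by more than 1 (the AVL condition fails).
int avlHeight(const Node* node) {
    if (!node) return -1;
    int hl = avlHeight(node->left.get());
    int hr = avlHeight(node->right.get());
    if (hl == -2 || hr == -2 || std::abs(hl - hr) > 1) return -2;
    return 1 + std::max(hl, hr);
}

bool isAVLBalanced(const Node* root) { return avlHeight(root) != -2; }
```

Inserting 5, 3, 7, 2, 4, 6 in that order builds the tree in the figure, which passes the check. Inserting 1, 2, 3 in sorted order builds a chain, which fails it.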

  2. Dictionary ADT
     key        value (data)
     Multics    MULTiplexed Information and Computing Service
     Unix       Uniplexed Multics
     BSD        Berkeley Software Distribution
     GNU        GNU’s Not Unix
     Operations
     ◮ insert
     ◮ remove
     ◮ find
     ◮ insert(Linux, Linus Torvalds’ Unix)
     ◮ find(Unix) returns “Uniplexed Multics”
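As an illustration of the ADT, here is a minimal C++17 sketch that backs the three operations with `std::unordered_map`, the standard library's hash table. The `Dictionary` class and its method names are hypothetical; they simply mirror the slide's insert, remove, and find. `find` returns an empty `std::optional` when the key is absent.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical dictionary class: key -> value, backed by a hash table.
class Dictionary {
public:
    // Adds the pair, or replaces the value if the key is already present.
    void insert(const std::string& key, const std::string& value) {
        table_.insert_or_assign(key, value);
    }

    // Removes the key if present; does nothing otherwise.
    void remove(const std::string& key) { table_.erase(key); }

    // Returns the value stored under key, or std::nullopt if there is none.
    std::optional<std::string> find(const std::string& key) const {
        auto it = table_.find(key);
        if (it == table_.end()) return std::nullopt;
        return it->second;
    }

private:
    std::unordered_map<std::string, std::string> table_;
};
```

Because `std::unordered_map` is itself a hash table, the remaining slides are effectively about what happens inside it: how a string key becomes an array index.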

  3. Hash Table Goal
     We can do:      a[2] = “GNU’s Not Unix”
     We want to do:  a[“GNU”] = “GNU’s Not Unix”
     (Figure: an array with slots 0 to m − 1; slot 2 holds “GNU’s Not Unix”, which we want to reach through the key GNU.)

  4. Hash table approach
     Choose a hash function to map keys to indices.
     Example: hash(“GNU”) = 2
     (Figure: keys, including GNU, are mapped by the hash function to slots 0 to m − 1 of the hash table; slot 2 holds “GNU’s Not Unix”.)
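A toy instance of the approach, assuming a deliberately simple hash function (the sum of the character codes, mod m) that is not one the slides use. With m = 8 it happens to give hash("GNU") = (71 + 78 + 85) mod 8 = 234 mod 8 = 2, matching the slide. The `toyHash` function and the `DirectSlotTable` struct are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <optional>
#include <string>

// Hypothetical toy hash: the sum of the character codes, reduced mod m.
int toyHash(const std::string& key, int m) {
    assert(m > 0);
    unsigned sum = 0;
    for (unsigned char c : key) sum += c;
    return static_cast<int>(sum % static_cast<unsigned>(m));
}

// Hypothetical fixed-size table: a["GNU"] is stored as slots[toyHash("GNU", kSlots)].
// Collisions are ignored here: a second key that hashes to the same slot
// silently overwrites the first, and get() cannot tell the two keys apart.
struct DirectSlotTable {
    static constexpr int kSlots = 8;
    std::array<std::optional<std::string>, kSlots> slots;

    void put(const std::string& key, const std::string& value) {
        slots[toyHash(key, kSlots)] = value;
    }
    std::optional<std::string> get(const std::string& key) const {
        return slots[toyHash(key, kSlots)];
    }
};
```

The lookup costs one hash computation plus one array access, which is the whole appeal; the weakness flagged in the comments is exactly the collision problem on the next slide.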

  5. Collisions
     A collision occurs when two different keys x and y map to the same index (i.e., slot in the table): hash(x) = hash(y).
     (Figure: the same hash table, with an additional key, Mac OS X, passing through the hash function.)
     Can we prevent collisions?
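One way to see why collisions cannot be avoided in general: there are far more possible string keys than slots, so by the pigeonhole principle any m + 1 distinct keys hashed into m slots must put two keys in the same slot. The sketch below reuses the hypothetical `toyHash` (sum of character codes, mod m) and reports the first collision in a list of keys; `findCollision` is also a hypothetical name. With m = 8, "Multics" (sum 737) and "BSD" (sum 217) both land in slot 1.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical toy hash: the sum of the character codes, reduced mod m.
int toyHash(const std::string& key, int m) {
    assert(m > 0);
    unsigned sum = 0;
    for (unsigned char c : key) sum += c;
    return static_cast<int>(sum % static_cast<unsigned>(m));
}

// Returns the first pair of keys that land in the same slot, or std::nullopt.
// The keys are assumed to be distinct.
std::optional<std::pair<std::string, std::string>>
findCollision(const std::vector<std::string>& keys, int m) {
    std::unordered_map<int, std::string> firstInSlot;  // slot -> first key seen there
    for (const auto& key : keys) {
        auto [it, inserted] = firstInSlot.emplace(toyHash(key, m), key);
        if (!inserted) return std::make_pair(it->second, key);
    }
    return std::nullopt;
}
```

Any nine distinct keys into eight slots, for example "a" through "i", are guaranteed to produce a collision, whatever the hash function.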

  6. Birthdays and Probability
     Probability that someone in this room has a birthday today? What if this were a birthday party?
     Probability that two people in this room have the same birthday? What if the room contained 366 people? 183?
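Under the usual simplifying assumptions (365 equally likely birthdays, February 29 ignored, people independent of each other), both questions have closed forms. Someone among k people has a birthday today with probability 1 − (364/365)^k, and some two of k people share a birthday with probability 1 − (365/365)(364/365)···((365 − k + 1)/365). A small C++ sketch of both formulas (function names hypothetical):

```cpp
#include <cassert>
#include <cmath>

// Assumes 365 equally likely birthdays (Feb 29 ignored) and independent people.

// Probability that at least one of k people has a birthday today.
double pSomeoneToday(int k) {
    return 1.0 - std::pow(364.0 / 365.0, k);
}

// Probability that at least two of k people share a birthday:
// 1 - (365/365)(364/365)...((365 - k + 1)/365).
double pShared(int k) {
    double allDistinct = 1.0;
    for (int i = 0; i < k; ++i) allDistinct *= (365.0 - i) / 365.0;
    return 1.0 - allDistinct;
}
```

Under these assumptions the shared-birthday probability first exceeds 1/2 at k = 23, while about 253 people are needed before someone is more likely than not to have a birthday today. With 366 people the product contains a zero factor, so a shared birthday is certain, and 183 people already make it overwhelmingly likely.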

  7. Expected Value
     Definition: The expected value of a number X that depends on random events (X is called a random variable) is
        $E[X] = \sum_{x} \mathrm{Prob}[X = x] \cdot x$.
     X is the sum of two six-sided dice. What is $E[X]$?
         + |  1  2  3  4  5  6
        ---+-------------------
         1 |  2  3  4  5  6  7
         2 |  3  4  5  6  7  8
         3 |  4  5  6  7  8  9
         4 |  5  6  7  8  9 10
         5 |  6  7  8  9 10 11
         6 |  7  8  9 10 11 12
     Linearity of Expectation: For any two random variables X and Y, $E[X + Y] = E[X] + E[Y]$.
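The definition and linearity of expectation can both be checked by brute force over the 36 equally likely outcomes in the table. This C++ sketch (function names hypothetical) computes E[X] once directly from the definition and once as E[first die] + E[second die]; both come out to 7, up to floating-point rounding.

```cpp
#include <cassert>
#include <map>

// Distribution of X = sum of two fair six-sided dice:
// value -> number of the 36 outcomes that produce it.
std::map<int, int> twoDiceCounts() {
    std::map<int, int> counts;
    for (int a = 1; a <= 6; ++a)
        for (int b = 1; b <= 6; ++b)
            ++counts[a + b];
    return counts;
}

// E[X] = sum over x of Prob[X = x] * x, straight from the definition.
double expectedSumFromDefinition() {
    double e = 0.0;
    for (const auto& [x, n] : twoDiceCounts()) e += (n / 36.0) * x;
    return e;
}

// E[one die] = (1 + 2 + ... + 6) / 6 = 3.5; by linearity E[X] = 3.5 + 3.5.
double expectedSumByLinearity() {
    double oneDie = 0.0;
    for (int a = 1; a <= 6; ++a) oneDie += a / 6.0;
    return oneDie + oneDie;
}
```

The linearity route never needs the distribution of X at all, which is what makes it useful on the next slide, where the distribution is messy but each indicator variable is simple.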

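  8. More Birthdays
     What is the expected number of pairs of people who share a birthday in this room?
     Let $X_{ij} = \begin{cases} 1 & \text{if person } i \text{ and person } j \text{ have the same birthday} \\ 0 & \text{otherwise} \end{cases}$
     $X = \sum_{i<j} X_{ij}$ is the number of pairs who share a birthday.
     With k people in the room and each birthday equally likely, $E[X_{ij}] = \mathrm{Prob}[X_{ij} = 1] = \frac{1}{365}$, so
        $E[X] = E\left[\sum_{i<j} X_{ij}\right] = \sum_{i<j} E[X_{ij}] = \sum_{i<j} \frac{1}{365} = \binom{k}{2} \cdot \frac{1}{365}$.
     Generalized birthdays: If we randomly put k people into m bins, we expect $\binom{k}{2} \cdot \frac{1}{m} = \frac{k(k-1)}{2m}$ pairs to share a bin, which is greater than 1 for $k = \sqrt{2m} + 1$.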
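A quick Monte Carlo check of the generalized formula, as a sketch assuming each of the k people lands in one of m bins independently and uniformly at random. The function names and the fixed seed are illustrative. Counting pairs incrementally avoids a double loop: each new person forms one pair with every person already in the same bin.

```cpp
#include <algorithm>
#include <cassert>
#include <random>
#include <vector>

// Expected number of pairs sharing a bin: C(k, 2) / m = k(k - 1) / (2m).
double expectedSharedPairs(int k, int m) {
    return static_cast<double>(k) * (k - 1) / (2.0 * m);
}

// Monte Carlo estimate of the same quantity (seeded for repeatability).
double simulateSharedPairs(int k, int m, int trials, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> bin(0, m - 1);
    std::vector<int> count(m);
    long long totalPairs = 0;
    for (int t = 0; t < trials; ++t) {
        std::fill(count.begin(), count.end(), 0);
        for (int p = 0; p < k; ++p) {
            int b = bin(rng);
            totalPairs += count[b];  // new person pairs with everyone already in bin b
            ++count[b];
        }
    }
    return static_cast<double>(totalPairs) / trials;
}
```

For k = 23 and m = 365 the formula gives 23 · 22 / 730 ≈ 0.69 expected shared pairs, and the simulation should land close to that. For m = 50, √(2m) + 1 = 11, and indeed 11 · 10 / 100 = 1.1 > 1 while 10 · 9 / 100 = 0.9 < 1.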

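  9. Hashing string keys with mod and Horner’s Rule
         #include <string>

         // m is the table size; assumes 256 * m fits in an int.
         int hash(const std::string& s, int m) {
             int h = 0;
             for (int i = static_cast<int>(s.length()) - 1; i >= 0; i--) {
                 // read the character as unsigned so bytes >= 128 are not negative
                 h = (256 * h + static_cast<unsigned char>(s[i])) % m;
             }
             return h;
         }
     Compare that to the hash function from yacc:
         #define TABLE_SIZE 1024  /* must be a power of 2 */

         int hash(const char *s) {
             if (*s == '\0')      /* empty key: do not read past the terminator */
                 return 0;
             int h = *s++;
             while (*s)
                 h = (31 * h + *s++) & (TABLE_SIZE - 1);
             return h;
         }
     What’s different?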
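Horner's rule evaluates the polynomial s[0] + 256·s[1] + ··· + 256^(k−1)·s[k−1] (mod m) without computing any powers of 256, and reducing mod m at every step keeps the intermediate values small. The C++ sketch below checks that the Horner loop agrees with direct evaluation; `hornerHash` is a renamed copy of the corrected function above, and `directHash` is a hypothetical helper written only for comparison.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Horner's rule, as above: fold characters from last to first,
// reducing mod m at every step so intermediate values stay small.
int hornerHash(const std::string& s, int m) {
    int h = 0;
    for (int i = static_cast<int>(s.length()) - 1; i >= 0; i--)
        h = (256 * h + static_cast<unsigned char>(s[i])) % m;
    return h;
}

// Direct evaluation of (sum over i of s[i] * 256^i) mod m, for comparison.
int directHash(const std::string& s, int m) {
    std::uint64_t h = 0;
    std::uint64_t power = 1;  // 256^i mod m
    for (unsigned char c : s) {
        h = (h + c * power) % m;
        power = (power * 256) % m;
    }
    return static_cast<int>(h);
}
```

The two agree because (a · b + c) mod m = ((a mod m) · b + c) mod m, so taking the remainder early never changes the final result. For example, hornerHash("GNU", 1000) = (71 + 78 · 256 + 85 · 256²) mod 1000 = 5590599 mod 1000 = 599.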

  10. Fixed hash functions are dangerous!
      Good hash table performance depends on few collisions. If a user knows your hash function, she can cause many elements to hash to the same slot. Why would she want to do that?
      Yacc: $h(s) = \left(31^{k-1} s[0] + 31^{k-2} s[1] + \cdots + 31^{0} s[k-1]\right) \bmod 1024$, where $k$ is the length of $s$ (masking with TABLE_SIZE − 1 = 1023 keeps the remainder mod 1024).
      $h(\mathrm{XY}) = h(\mathrm{xy})$. Can you find many strings that hash to the same slot?
      Protection
      ◮ Use a cryptographically secure hash function (e.g., SHA-512).
      ◮ Choose a new hash function at random for every hash table.
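The sketch below makes both halves of this slide concrete in C++. `yaccHash` restates the yacc function for `std::string` keys, reading characters as unsigned. `collidingKeys` builds the strings hinted at by h(XY) = h(xy): since 31·'X' + 'Y' = 2817 and 31·'x' + 'y' = 3841 differ by exactly 1024, every string made of n two-character blocks, each either "XY" or "xy", lands in the same slot, giving 2^n colliding keys. `RandomPolyHash` is a hypothetical illustration of the second protection bullet: a polynomial hash modulo the prime 2^31 − 1 whose base is drawn at random when the table is created. All three names are assumptions, and this is a sketch, not a vetted defence or a cryptographic hash like SHA-512.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <set>
#include <string>
#include <vector>

// The yacc-style hash for a 1024-slot table: polynomial in 31, mod 1024.
int yaccHash(const std::string& s) {
    if (s.empty()) return 0;
    int h = static_cast<unsigned char>(s[0]);
    for (std::size_t i = 1; i < s.size(); ++i)
        h = (31 * h + static_cast<unsigned char>(s[i])) & (1024 - 1);
    return h;
}

// Attack: swapping any two-character block "XY" <-> "xy" changes the
// polynomial by a multiple of 1024, so the slot stays the same.
// Returns all 2^n strings built from n such blocks.
std::vector<std::string> collidingKeys(int n) {
    std::vector<std::string> keys;
    for (int mask = 0; mask < (1 << n); ++mask) {
        std::string s;
        for (int j = 0; j < n; ++j) s += ((mask >> j) & 1) ? "xy" : "XY";
        keys.push_back(s);
    }
    return keys;
}

// Defence sketch (hypothetical): a polynomial hash whose base is chosen at
// random per table, so an attacker cannot precompute collisions offline.
class RandomPolyHash {
public:
    RandomPolyHash(int m, std::uint64_t seed) : m_(m) {
        std::mt19937_64 rng(seed);
        base_ = std::uniform_int_distribution<std::uint64_t>(1, kPrime - 1)(rng);
    }
    int operator()(const std::string& s) const {
        std::uint64_t h = 0;
        for (unsigned char c : s)
            h = (h * base_ + c + 1) % kPrime;  // +1 so leading '\0' bytes still count
        return static_cast<int>(h % m_);
    }

private:
    static constexpr std::uint64_t kPrime = 2147483647;  // 2^31 - 1, a prime
    int m_;
    std::uint64_t base_;
};
```

With n = 10, all 1024 generated keys share one yacc slot, so a table using that hash degenerates into a single long chain. Under a randomly based polynomial hash the same keys are expected to spread across many slots. The products stay below 2^62, which is why the sketch can use 64-bit arithmetic with a 31-bit prime.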
