
Searching Consider the problem of searching an array for a given - PDF document



  1. Hashing

     Searching
     • Consider the problem of searching an array for a given value
       – If the array is not sorted, the search requires O(n) time
         • If the value isn’t there, we need to search all n elements
         • If the value is there, we search n/2 elements on average
       – If the array is sorted, we can do a binary search
         • A binary search requires O(log n) time
         • About equally fast whether the element is found or not
       – It doesn’t seem like we could do much better
         • How about an O(1), that is, constant time search?
         • We can do it if the array is organized in a particular way

     Hashing
     • Suppose we were to come up with a “magic function” that, given a value to search for, would tell us exactly where in the array to look
       – If it’s in that location, it’s in the array
       – If it’s not in that location, it’s not in the array
     • This function would have no other purpose
     • If we look at the function’s inputs and outputs, they probably won’t “make sense”
     • This function is called a hash function because it “makes hash” of its inputs

     Example (ideal) hash function
     • Suppose our hash function gave us the following values:
         hashCode("apple") = 5
         hashCode("watermelon") = 3
         hashCode("grapes") = 8
         hashCode("cantaloupe") = 7
         hashCode("kiwi") = 0
         hashCode("strawberry") = 9
         hashCode("mango") = 6
         hashCode("banana") = 2
     • Resulting array: [0] kiwi, [1] (empty), [2] banana, [3] watermelon, [4] (empty), [5] apple, [6] mango, [7] cantaloupe, [8] grapes, [9] strawberry
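The ideal hash function example can be sketched in Java as follows. This is only an illustration: `IdealHashDemo` and `magicHash` are hypothetical names, and the hash values are hard-coded from the slide rather than computed, since no real function is given. The point it shows is that a lookup inspects exactly one array location.

```java
class IdealHashDemo {
    // Array contents as shown on the slide; null marks an empty location.
    static final String[] TABLE = {
        "kiwi", null, "banana", "watermelon", null,
        "apple", "mango", "cantaloupe", "grapes", "strawberry"
    };

    // Hypothetical stand-in for the "magic function": values copied from the slide.
    // Returns -1 for any input the slide does not list.
    static int magicHash(String s) {
        switch (s) {
            case "apple":      return 5;
            case "watermelon": return 3;
            case "grapes":     return 8;
            case "cantaloupe": return 7;
            case "kiwi":       return 0;
            case "strawberry": return 9;
            case "mango":      return 6;
            case "banana":     return 2;
            default:           return -1;
        }
    }

    // Constant-time search: look in one location only.
    static boolean contains(String s) {
        int i = magicHash(s);
        return i >= 0 && s.equals(TABLE[i]);
    }

    public static void main(String[] args) {
        System.out.println(contains("apple"));    // true
        System.out.println(contains("honeydew")); // false
    }
}
```

No loop appears in `contains`: if the value is not in the one location the function names, it is not in the array.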

  2. Why hash tables?
     • We don’t (usually) use hash tables just to see if something is there or not; instead, we put key/value pairs into the table
       – We use a key to find a place in the table
       – The value holds the information we are actually interested in
     • Table (key, value): . . ., [141] (empty), [142] robin, robin info, [143] sparrow, sparrow info, [144] hawk, hawk info, [145] seagull, seagull info, [146] (empty), [147] bluejay, bluejay info, [148] owl, owl info, . . .

     Finding the hash function
     • How can we come up with this magic function?
     • In general, we cannot; there is no such magic function
       – In a few specific cases, where all the possible values are known in advance, it has been possible to compute a perfect hash function
     • What is the next best thing?
       – A perfect hash function would tell us exactly where to look
       – In general, the best we can do is a function that tells us where to start looking!

     Example imperfect hash function
     • Suppose our hash function gave us the following values:
       – hash("apple") = 5
         hash("watermelon") = 3
         hash("grapes") = 8
         hash("cantaloupe") = 7
         hash("kiwi") = 0
         hash("strawberry") = 9
         hash("mango") = 6
         hash("banana") = 2
         hash("honeydew") = 6
     • Array so far: [0] kiwi, [1] (empty), [2] banana, [3] watermelon, [4] (empty), [5] apple, [6] mango, [7] cantaloupe, [8] grapes, [9] strawberry
     • Now what?

     Collisions
     • When two values hash to the same array location, this is called a collision
     • Collisions are normally treated as “first come, first served”: the first value that hashes to the location gets it
     • We have to find something to do with the second and subsequent values that hash to this same location
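To make the collision in the imperfect example concrete, here is a minimal Java sketch. The names `CollisionDemo`, `SLIDE_HASH`, and `place` are hypothetical; the hash values are copied from the slide instead of being computed. It applies the "first come, first served" rule and reports who already holds the slot.

```java
import java.util.HashMap;
import java.util.Map;

class CollisionDemo {
    // Hash values copied from the "imperfect hash function" slide (not computed).
    static final Map<String, Integer> SLIDE_HASH = new HashMap<>();
    static {
        SLIDE_HASH.put("apple", 5);
        SLIDE_HASH.put("watermelon", 3);
        SLIDE_HASH.put("grapes", 8);
        SLIDE_HASH.put("cantaloupe", 7);
        SLIDE_HASH.put("kiwi", 0);
        SLIDE_HASH.put("strawberry", 9);
        SLIDE_HASH.put("mango", 6);
        SLIDE_HASH.put("banana", 2);
        SLIDE_HASH.put("honeydew", 6);
    }

    // First come, first served: stores key if its slot is free and returns null;
    // otherwise leaves the table unchanged and returns the current occupant.
    // Assumes each key is offered only once.
    static String place(String[] table, String key) {
        int slot = SLIDE_HASH.get(key);
        if (table[slot] == null) {
            table[slot] = key;
            return null;
        }
        return table[slot];
    }

    public static void main(String[] args) {
        String[] table = new String[10];
        String[] fruits = {"apple", "watermelon", "grapes", "cantaloupe", "kiwi",
                           "strawberry", "mango", "banana", "honeydew"};
        for (String f : fruits) {
            String occupant = place(table, f);
            if (occupant != null) {
                // prints: honeydew collides with mango at slot 6
                System.out.println(f + " collides with " + occupant
                        + " at slot " + SLIDE_HASH.get(f));
            }
        }
    }
}
```

The sketch only detects the collision; deciding where honeydew should go instead is the job of a collision-handling strategy.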

  3. Handling collisions
     • What can we do when two different values attempt to occupy the same place in an array?
       – Solution #1: Search from there for an empty location
         • Can stop searching when we find the value or an empty location
         • Search must be end-around
       – Solution #2: Use a second hash function
         • ...and a third, and a fourth, and a fifth, ...
       – Solution #3: Use the array location as the header of a linked list of values that hash to this location
     • All these solutions work, provided:
       – We use the same technique to add things to the array as we use to search for things in the array

     Insertion, I
     • Suppose you want to add seagull to this hash table
     • Also suppose:
       – hashCode(seagull) = 143
       – table[143] is not empty
       – table[143] != seagull
       – table[144] is not empty
       – table[144] != seagull
       – table[145] is empty
     • Therefore, put seagull at location 145
     • Table after insertion: . . ., [141] (empty), [142] robin, [143] sparrow, [144] hawk, [145] seagull, [146] (empty), [147] bluejay, [148] owl, . . .

     Searching, I
     • Suppose you want to look up seagull in this hash table
     • Also suppose:
       – hashCode(seagull) = 143
       – table[143] is not empty
       – table[143] != seagull
       – table[144] is not empty
       – table[144] != seagull
       – table[145] is not empty
       – table[145] == seagull !
     • We found seagull at location 145
     • Table: . . ., [141] (empty), [142] robin, [143] sparrow, [144] hawk, [145] seagull, [146] (empty), [147] bluejay, [148] owl, . . .

     Searching, II
     • Suppose you want to look up cow in this hash table
     • Also suppose:
       – hashCode(cow) = 144
       – table[144] is not empty
       – table[144] != cow
       – table[145] is not empty
       – table[145] != cow
       – table[146] is empty
     • If cow were in the table, we should have found it by now
     • Therefore, it isn’t here
     • Table: . . ., [141] (empty), [142] robin, [143] sparrow, [144] hawk, [145] seagull, [146] (empty), [147] bluejay, [148] owl, . . .
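Solution #1 and the insertion and search walkthroughs above can be sketched in Java roughly as follows. Several things here are assumptions for illustration: the class name `LinearProbeTable`, the table size of 149 (so that the last location is 148), the caller-supplied hash function, and the hash values for robin, sparrow, bluejay, and owl, which the slides do not state. Insertion and search deliberately use the same probing sequence.

```java
import java.util.function.ToIntFunction;

class LinearProbeTable {
    private final String[] slots;
    private final ToIntFunction<String> hash; // supplied by the caller

    LinearProbeTable(int capacity, ToIntFunction<String> hash) {
        this.slots = new String[capacity];
        this.hash = hash;
    }

    // Probes from the hashed slot, wrapping end-around, until it finds the key
    // (already present: do nothing) or an empty slot (store it there).
    // Returns the index where the key now lives.
    int insert(String key) {
        int start = Math.floorMod(hash.applyAsInt(key), slots.length);
        for (int n = 0; n < slots.length; n++) {
            int i = (start + n) % slots.length;
            if (slots[i] == null) {
                slots[i] = key;
                return i;
            }
            if (slots[i].equals(key)) {
                return i;
            }
        }
        throw new IllegalStateException("table is full");
    }

    // Same probe sequence as insert; an empty slot means the key is absent.
    // Returns the key's index, or -1 if it is not in the table.
    int find(String key) {
        int start = Math.floorMod(hash.applyAsInt(key), slots.length);
        for (int n = 0; n < slots.length; n++) {
            int i = (start + n) % slots.length;
            if (slots[i] == null) {
                return -1;
            }
            if (slots[i].equals(key)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // seagull, hawk, cow: values from the slides; the rest are assumed.
        java.util.Map<String, Integer> h = java.util.Map.of(
                "robin", 142, "sparrow", 143, "hawk", 143, "bluejay", 147,
                "owl", 148, "seagull", 143, "cow", 144);
        LinearProbeTable t = new LinearProbeTable(149, key -> h.get(key));
        for (String bird : new String[] {"robin", "sparrow", "hawk", "bluejay", "owl"}) {
            t.insert(bird);
        }
        System.out.println(t.insert("seagull")); // 145
        System.out.println(t.find("seagull"));   // 145
        System.out.println(t.find("cow"));       // -1
    }
}
```

Stopping a search at the first empty slot is only correct because insertion would have placed the key in that slot had it been added.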

  4. Insertion, II
     • Suppose you want to add hawk to this hash table
     • Also suppose
       – hashCode(hawk) = 143
       – table[143] is not empty
       – table[143] != hawk
       – table[144] is not empty
       – table[144] == hawk
     • hawk is already in the table, so do nothing
     • Table: . . ., [141] (empty), [142] robin, [143] sparrow, [144] hawk, [145] seagull, [146] (empty), [147] bluejay, [148] owl, . . .

     Insertion, III
     • Suppose:
       – You want to add cardinal to this hash table
       – hashCode(cardinal) = 147
       – The last location is 148
       – 147 and 148 are occupied
     • Solution:
       – Treat the table as circular; after 148 comes 0
       – Hence, cardinal goes in location 0 (or 1, or 2, or ...)
     • Table: . . ., [141] (empty), [142] robin, [143] sparrow, [144] hawk, [145] seagull, [146] (empty), [147] bluejay, [148] owl

     Clustering
     • One problem with the above technique is the tendency to form “clusters”
     • A cluster is a group of items not containing any open slots
     • The bigger a cluster gets, the more likely it is that new values will hash into the cluster, and make it ever bigger
     • Clusters cause efficiency to degrade
     • Here is a non-solution: instead of stepping one ahead, step n locations ahead
       – The clusters are still there, they’re just harder to see
       – Unless n and the table size are mutually prime, some table locations are never checked

     Efficiency
     • Hash tables are actually surprisingly efficient
     • Until the table is about 70% full, the number of probes (places looked at in the table) is typically only 2 or 3
     • Sophisticated mathematical analysis is required to prove that the expected cost of inserting into a hash table, or looking something up in the hash table, is O(1)
     • Even if the table is nearly full (leading to long searches), efficiency is usually still quite high
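The remark that a step of n misses locations unless n and the table size are mutually prime can be checked with a small Java sketch. `ProbeStepDemo` and `slotsReached` are hypothetical names, and the table size of 10 is chosen only for illustration.

```java
class ProbeStepDemo {
    // Counts the distinct slots visited when probing start, start + step,
    // start + 2 * step, ... (all mod tableSize) until the sequence repeats.
    // Assumes tableSize > 0 and step > 0.
    static int slotsReached(int tableSize, int step, int start) {
        boolean[] seen = new boolean[tableSize];
        int count = 0;
        int i = Math.floorMod(start, tableSize);
        while (!seen[i]) {
            seen[i] = true;
            count++;
            i = (i + step) % tableSize; // circular: wraps past the last slot
        }
        return count;
    }

    public static void main(String[] args) {
        int size = 10;
        for (int step : new int[] {1, 3, 4, 5}) {
            System.out.println("step " + step + " reaches "
                    + slotsReached(size, step, 0) + " of " + size + " slots");
        }
        // step 1 reaches 10, step 3 reaches 10, step 4 reaches 5, step 5 reaches 2
    }
}
```

Steps 1 and 3 share no factor with 10 and reach every slot; steps 4 and 5 do share a factor and cycle through only part of the table, so an insertion could fail even though empty slots remain.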

  5. Solution #2: Rehashing
     • In the event of a collision, another approach is to rehash: compute another hash function
       – Since we may need to rehash many times, we need an easily computable sequence of functions
     • Simple example: in the case of hashing Strings, we might take the previous hash code and add the length of the String to it
       – Probably better if the length of the string was not a component in computing the original hash function
     • Possibly better yet: add the length of the String plus the number of probes made so far
       – Problem: are we sure we will look at every location in the array?
     • Rehashing is a fairly uncommon approach, and we won’t pursue it any further here

     Solution #3: Bucket hashing
     • The previous solutions used open addressing: all entries went into a “flat” (unstructured) array
     • Another solution is to make each array location the header of a linked list of values that hash to that location
     • Table: . . ., [141] (empty), [142] robin, [143] sparrow -> seagull, [144] hawk, [145] (empty), [146] (empty), [147] bluejay, [148] owl, . . .

     Writing your own hashCode method
     • A hashCode method must:
       – Return a value that is (or can be converted to) a legal array index
       – Always return the same value for the same input
         • It can’t use random numbers, or the time of day
       – Return the same value for equal inputs
         • Must be consistent with your equals method
     • It does not need to return different values for different inputs
     • A good hashCode method should:
       – Be efficient to compute
       – Give a uniform distribution of array indices
       – Not assign similar numbers to similar input values

     The hashCode function
     • public int hashCode() is defined in Object
     • Like equals, the default implementation of hashCode just uses the address of the object, which is probably not what you want for your own objects
     • You can override hashCode for your own objects
     • As you might expect, String overrides hashCode with a version appropriate for strings
     • Note that the supplied hashCode method does not know the size of your array; you have to adjust the returned int value yourself
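Bucket hashing and the hashCode rules above can be combined in one minimal Java sketch. The `Bird` key class, its fields, and `BucketTable` are hypothetical names invented for illustration. `Math.floorMod` is used to adjust the returned int to a legal index because hashCode may be negative.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Objects;

class BucketHashDemo {
    // Hypothetical key type: hashCode is computed from the same fields as equals,
    // so equal objects always return the same hash value.
    static final class Bird {
        final String species;
        final int bandNumber;

        Bird(String species, int bandNumber) {
            this.species = species;
            this.bandNumber = bandNumber;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Bird)) return false;
            Bird other = (Bird) o;
            return bandNumber == other.bandNumber && species.equals(other.species);
        }

        @Override
        public int hashCode() {
            return Objects.hash(species, bandNumber);
        }
    }

    // Each array location heads a linked list of the keys that hash to it.
    static final class BucketTable<K> {
        private final List<LinkedList<K>> buckets = new ArrayList<>();

        BucketTable(int size) {
            for (int i = 0; i < size; i++) {
                buckets.add(new LinkedList<>());
            }
        }

        // hashCode knows nothing about the table size and may be negative,
        // so reduce it to the range 0 .. size - 1.
        int indexFor(K key) {
            return Math.floorMod(key.hashCode(), buckets.size());
        }

        // Returns false if an equal key is already in the bucket.
        boolean add(K key) {
            LinkedList<K> bucket = buckets.get(indexFor(key));
            if (bucket.contains(key)) return false;
            bucket.add(key);
            return true;
        }

        boolean contains(K key) {
            return buckets.get(indexFor(key)).contains(key);
        }
    }

    public static void main(String[] args) {
        BucketTable<Bird> table = new BucketTable<>(8);
        table.add(new Bird("sparrow", 17));
        System.out.println(table.contains(new Bird("sparrow", 17))); // true
        System.out.println(table.contains(new Bird("seagull", 17))); // false
    }
}
```

The lookup with a freshly constructed `Bird` succeeds only because equals and hashCode agree; with the default hashCode from Object, the equal key would most likely land in a different bucket.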
