Introduction to Algorithms


  1. Introduction to Algorithms
     Hash Tables
     CSE 680
     Prof. Roger Crawfis

     Motivation
     - Arrays provide an indirect way to access a set.
     - Many times we need an association between two sets, or a set of keys and associated data.
     - Ideally we would like to access this data directly with the keys.
     - We would like a data structure that supports fast search, insertion, and deletion.
       - Do not usually care about sorting.
     - The abstract data type is usually called a Dictionary or Partial Map.
       - float googleStockPrice = stocks["Goog"].CurrentPrice;

     Dictionaries
     - What is the best way to implement this?
       - Linked Lists?
       - Double Linked Lists?
       - Queues?
       - Stacks?
       - Multiple indexed arrays (e.g., data[key[i]])?
     - To answer this, ask what the complexity of each operation is:
       - Insertion
       - Deletion
       - Search

     Direct Addressing
     - Let's look at an easy case. Suppose:
       - The range of keys is 0..m-1
       - Keys are distinct
     - Possible solution
       - Set up an array T[0..m-1] in which
         - T[i] = x if x ∈ T and key[x] = i
         - T[i] = NULL otherwise
       - This is called a direct-address table
         - Operations take O(1) time!
         - So what's the problem?
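The direct-address table described above can be sketched in a few lines of Python. This is only an illustration under the slide's assumptions (distinct integer keys in 0..m-1); the class name `DirectAddressTable` and its method names are hypothetical, and `None` stands in for NULL.

```python
class DirectAddressTable:
    """Direct-address table for distinct integer keys in 0..m-1."""

    def __init__(self, m):
        # One slot per possible key; None plays the role of NULL.
        self.slots = [None] * m

    def insert(self, key, value):
        self.slots[key] = value   # O(1): the key is the array index

    def search(self, key):
        return self.slots[key]    # O(1): None means "not present"

    def delete(self, key):
        self.slots[key] = None    # O(1)


table = DirectAddressTable(m=8)
table.insert(3, "three")
table.insert(5, "five")
table.delete(5)
print(table.search(3), table.search(5))  # three None
```

Every operation is a single array access, which is why each takes O(1) time. The cost is the `[None] * m` allocation, which grows with the key range rather than with the number of stored items.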

  2. Direct Addressing
     - Direct addressing works well when the range m of keys is relatively small.
     - But what if the keys are 32-bit integers?
       - Problem 1: the direct-address table will have 2^32 entries, more than 4 billion.
       - Problem 2: even if memory is not an issue, the time to initialize the elements to NULL may be prohibitive.
     - Solution: map keys to a smaller range 0..p-1
       - Desire p = O(m).

     Hash Table
     - Hash tables provide O(1) expected-time support for all of these operations!
     - The key idea: rather than indexing an array directly, index it through some function h(x), called a hash function.
       - myArray[ h(index) ]
     - Key questions:
       - What is the set that x comes from?
       - What is h() and what is its range?

     Hash Table
     [Figure: a hash function h maps the actual keys K = {k1, ..., k5}, a subset of the universe of keys U, to slots 0..m-1 of the table; h(k2) = h(k5).]

     Hash Functions
     - In general a difficult problem. Try something simpler.
     - Consider this problem:
       - If I know a priori the m keys from some finite set U, is it possible to develop a function h(x) that will uniquely map the m keys onto the set of numbers 0..m-1?
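To make the 32-bit problem concrete, the following sketch estimates the size of a direct-address table and then indexes a much smaller array through a function h, as in myArray[ h(index) ]. The 8-byte pointer size, the table size p = 16, and the choice of h(key) = key mod p are assumptions made for illustration only.

```python
KEY_BITS = 32
POINTER_BYTES = 8  # assumption: one 64-bit pointer per slot

entries = 2 ** KEY_BITS
table_gib = entries * POINTER_BYTES // 2 ** 30
print(entries)    # 4294967296
print(table_gib)  # 32


def h(key, p):
    """Map an arbitrary non-negative integer key into the range 0..p-1."""
    return key % p


p = 16                      # assumed small table size
my_array = [None] * p
key = 4_000_000_123         # a key far outside 0..p-1
my_array[h(key, p)] = "record"
print(h(key, p))  # 11
```

Under these assumptions the direct-address table would need about 32 GiB before a single item is stored, while the hashed array needs only p slots.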

  3. Hash Functions
     - A collision occurs when h(x) maps two keys to the same location.
     [Figure: the actual keys K = {k1, ..., k5}, a subset of the universe of keys U, mapped to slots 0..p-1; h(k2) = h(k5) is marked as a collision.]

     Hash Functions
     - A hash function, h, maps keys of a given type to integers in a fixed interval [0, N − 1].
     - Example: h(x) = x mod N is a hash function for integer keys.
     - The integer h(x) is called the hash value of x.
     - A hash table for a given key type consists of
       - Hash function h
       - Array (called table) of size N
     - The goal is to store item (k, o) at index i = h(k).

     Example
     - We design a hash table storing employee records using their social security number (SSN) as the key.
       - An SSN is a nine-digit positive integer.
     - Our hash table uses an array of size N = 10,000 and the hash function h(x) = last four digits of x.

       Slot   Stored key (SSN)
       0      ∅
       1      025-612-0001
       2      981-101-0002
       3      ∅
       4      451-229-0004
       ...
       9997   ∅
       9998   200-751-9998
       9999   ∅

     Example
     - Our hash table uses an array of size N = 100.
     - We have n = 49 employees.
     - Need a method to handle collisions.
     - As long as the chance for collision is low, we can achieve this goal.
     - Setting N = 10,000 and looking at the last four digits will reduce the chance of collision.
     [Table: as in the previous example, but with 176-354-9998, whose last four digits are also 9998, shown next to slot 9999.]
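The SSN example can be reproduced with a short sketch. It assumes SSNs are given as dash-separated strings and takes h(x) = x mod 10,000, which is the same as "last four digits of x"; the variable names are illustrative only.

```python
N = 10_000  # table size from the example


def h(ssn):
    """Hash an SSN string to its last four digits (x mod N)."""
    x = int(ssn.replace("-", ""))
    return x % N


ssns = [
    "025-612-0001",
    "981-101-0002",
    "451-229-0004",
    "200-751-9998",
    "176-354-9998",
]

# Group keys by hash value to see which slots receive more than one key.
slots = {}
for ssn in ssns:
    slots.setdefault(h(ssn), []).append(ssn)

collisions = {i: keys for i, keys in slots.items() if len(keys) > 1}
print(collisions)  # {9998: ['200-751-9998', '176-354-9998']}
```

The two keys ending in 9998 hash to the same slot, so even a table with far more slots than keys needs some way to resolve collisions.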

  4. Collisions
     - Can collisions be avoided?
       - In general, no. See perfect hashing for the case where the set of keys is static (not covered).
     - Two primary techniques for resolving collisions:
       - Chaining: keep a collection at each key slot.
       - Open addressing: if the current slot is full, use the next open one.

     Chaining
     - Chaining puts elements that hash to the same slot in a linked list:
     [Figure: the actual keys K = {k1, ..., k8}, a subset of the universe of keys U, stored in a table of linked lists; the chains are k1 → k4, k5 → k2 → k7, k3, and k8 → k6, and all other slots are empty.]

     Chaining
     - How do we insert an element?
     [Figure: the same chained table as above.]

     Chaining
     - How do we delete an element?
       - Do we need a doubly-linked list for efficient delete?
     [Figure: the same chained table as above.]
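One possible answer to the insert and delete questions is sketched below, assuming integer keys, h(k) = k mod N, and singly linked chains. The names `Node` and `ChainedHashTable` are hypothetical. Deleting by key has to walk the chain anyway, so a singly linked list suffices here; a doubly-linked list pays off when delete is handed a reference to the node itself.

```python
class Node:
    def __init__(self, key, value, next=None):
        self.key, self.value, self.next = key, value, next


class ChainedHashTable:
    def __init__(self, size=8):
        self.size = size
        self.slots = [None] * size  # each slot holds the head of a chain

    def _index(self, key):
        return key % self.size      # assumption: integer keys, h(k) = k mod N

    def search(self, key):
        node = self.slots[self._index(key)]
        while node is not None:     # walk the chain for this slot
            if node.key == key:
                return node.value
            node = node.next
        return None

    def insert(self, key, value):
        i = self._index(key)
        node = self.slots[i]
        while node is not None:     # overwrite if the key already exists
            if node.key == key:
                node.value = value
                return
            node = node.next
        self.slots[i] = Node(key, value, self.slots[i])  # insert at head

    def delete(self, key):
        i = self._index(key)
        prev, node = None, self.slots[i]
        while node is not None:
            if node.key == key:
                if prev is None:
                    self.slots[i] = node.next  # unlink the head
                else:
                    prev.next = node.next      # unlink an interior node
                return True
            prev, node = node, node.next
        return False


t = ChainedHashTable(size=8)
for k, v in [(1, "a"), (9, "b"), (17, "c")]:  # 1, 9, 17 all hash to slot 1
    t.insert(k, v)
t.delete(9)
print(t.search(1), t.search(9), t.search(17))  # a None c
```

If duplicate keys are known not to occur, the existence check in `insert` can be dropped, which makes insertion O(1) regardless of chain length.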

  5. Chaining
     - How do we search for an element with a given key?
     [Figure: the same chained table as on the previous page.]

     Open Addressing
     - Basic idea:
       - To insert: if the slot is full, try another slot, ..., until an open slot is found (probing).
       - To search, follow the same sequence of probes as would be used when inserting the element.
         - If we reach an element with the correct key, return it.
         - If we reach a NULL pointer, the element is not in the table.
     - Good for fixed sets (adding but no deletion)
       - Example: spell checking

     Open Addressing
     - The colliding item is placed in a different cell of the table.
       - No dynamic memory.
       - Fixed table size.
     - Load factor: n/N, where n is the number of items to store and N the size of the hash table.
       - Clearly, n ≤ N, or n/N ≤ 1.
     - To get reasonable performance, n/N < 0.5.

     Probing
     - The key question is: what should the next cell to try be?
     - Random would be great, but we need to be able to repeat it.
     - Three common techniques:
       - Linear Probing (useful for discussion only)
       - Quadratic Probing
       - Double Hashing
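The insert and search rules for open addressing can be sketched with linear probing, the simplest of the three techniques listed. The sketch assumes integer keys, h(k) = k mod N, and no deletion (matching the "fixed sets" use case); the class name `LinearProbingTable` is hypothetical.

```python
class LinearProbingTable:
    def __init__(self, N):
        self.N = N
        self.keys = [None] * N    # None marks an open slot
        self.values = [None] * N
        self.n = 0                # number of stored items

    def _probe(self, key):
        """Yield slots h(k), h(k)+1, ... (mod N), visiting each slot once."""
        start = key % self.N
        for step in range(self.N):
            yield (start + step) % self.N

    def insert(self, key, value):
        for i in self._probe(key):
            if self.keys[i] is None:      # open slot found
                self.keys[i], self.values[i] = key, value
                self.n += 1
                return i
            if self.keys[i] == key:       # key already present: overwrite
                self.values[i] = value
                return i
        raise OverflowError("table is full")

    def search(self, key):
        for i in self._probe(key):        # same probe sequence as insert
            if self.keys[i] is None:      # reached an empty slot: not in table
                return None
            if self.keys[i] == key:
                return self.values[i]
        return None

    def load_factor(self):
        return self.n / self.N


t = LinearProbingTable(N=10)
print(t.insert(12, "a"), t.insert(22, "b"), t.insert(32, "c"))  # 2 3 4
print(t.search(22), t.search(42))  # b None
print(t.load_factor())             # 0.3
```

Keys 12, 22, and 32 all hash to slot 2, so the second and third are pushed to the next open slots. Search stops at the first empty slot it meets, which is why simply clearing a slot on deletion would break later lookups.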
