Biostatistics 615/815 Lecture 8: Hash Tables and Dynamic Programming



  1. Biostatistics 615/815 Lecture 8: Hash Tables and Dynamic Programming
     Hyun Min Kang
     February 1st, 2011
     Sections: Introduction, Hash Tables, ChainedHash, OpenHash, Fibonacci, Summary

  2. Announcements
     Homework #2
     • For problem 3, assume that all the input values are unique
     • Include the class definitions in myTree.h and myTreeNode.h (do not make a .cpp file)
     • The homework .tex file containing the source code is uploaded to the class web page
     815 projects
     • The instructor sent out e-mails individually this morning


  4. Recap: Elementary data structures

                     Search      Insert      Remove
     Array           Θ(n)        Θ(1)        Θ(n)
     SortedArray     Θ(log n)    Θ(n)        Θ(n)
     List            Θ(n)        Θ(1)        Θ(n)
     Tree            Θ(log n)    Θ(log n)    Θ(log n)
     Hash            Θ(1)        Θ(1)        Θ(1)

     • Array or list is simple and fast enough for small-sized data
     • Tree is easier to scale up to moderate to large-sized data
     • Hash is the most robust for very large datasets

  5. Recap: Example of a linked list
     • Example of a doubly-linked list
     • Singly-linked list if the prev field does not exist
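
The node layout behind this slide can be sketched in C++ as follows. This is only a minimal illustration: the names `ListNode`, `pushFront`, `search`, and `removeNode` are hypothetical and are not taken from the lecture code.

```cpp
#include <cassert>

// Hypothetical node of a doubly-linked list of ints.
// Without the `prev` field this would be a singly-linked list node.
struct ListNode {
    int value;
    ListNode* prev;
    ListNode* next;
};

// Insert at the front in Theta(1) and return the new head.
ListNode* pushFront(ListNode* head, int value) {
    ListNode* node = new ListNode{value, nullptr, head};
    if (head != nullptr) head->prev = node;
    return node;
}

// Linear search in Theta(n); returns nullptr if the value is absent.
ListNode* search(ListNode* head, int value) {
    for (ListNode* p = head; p != nullptr; p = p->next)
        if (p->value == value) return p;
    return nullptr;
}

// Unlink and free a node that is already in the list; returns the new head.
// The unlinking itself is Theta(1) because `prev` is available.
ListNode* removeNode(ListNode* head, ListNode* node) {
    assert(node != nullptr);
    if (node->prev != nullptr) node->prev->next = node->next;
    else head = node->next;  // node was the head
    if (node->next != nullptr) node->next->prev = node->prev;
    delete node;
    return head;
}
```

Removing a value costs Θ(n) overall, matching the recap table, because the node has to be found with `search` before it can be unlinked.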

  6. Recap: An example binary search tree
     • Pointers to left and right children (Nil if absent)
     • Pointers to its parent can be omitted
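
A small C++ sketch of such a node, with left and right child pointers and no parent pointer, might look like this. The names `TreeNode`, `insert`, and `search` are illustrative assumptions, and keys are assumed to be unique.

```cpp
#include <cassert>

// Hypothetical binary search tree node; no parent pointer is stored.
struct TreeNode {
    int key;
    TreeNode* left;   // nullptr plays the role of Nil
    TreeNode* right;  // nullptr plays the role of Nil
};

// Insert a key (assumed unique) and return the possibly new subtree root.
TreeNode* insert(TreeNode* root, int key) {
    if (root == nullptr) return new TreeNode{key, nullptr, nullptr};
    if (key < root->key) root->left = insert(root->left, key);
    else if (key > root->key) root->right = insert(root->right, key);
    // Local BST invariant: left child smaller, right child larger.
    assert(root->left == nullptr || root->left->key < root->key);
    assert(root->right == nullptr || root->right->key > root->key);
    return root;
}

// Walk down from the root, going left or right by comparison.
// Returns nullptr if the key is absent.
const TreeNode* search(const TreeNode* root, int key) {
    while (root != nullptr && root->key != key)
        root = (key < root->key) ? root->left : root->right;
    return root;
}
```

Neither function needs a parent pointer, since both only ever move downward from the root.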

  7. Correction: Building your program (lecture 6)
     Individually compile and link - Does NOT work with templates
     • Include the content of your .cpp files in the .h files
     • For example, Main.cpp includes myArray.h

         user@host:~/> g++ -o myArrayTest Main.cpp

     Or create a Makefile and just type 'make'

         all: myArrayTest                  # binary name is myArrayTest
         myArrayTest: Main.cpp myArray.h   # rebuild when the source or the header changes
             g++ -o myArrayTest Main.cpp   # must start with a tab
         clean:
             rm -f *.o myArrayTest         # must start with a tab
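
The reason for this correction is that the compiler needs the full definition of a template wherever it is instantiated, so the member functions have to live in the header. The sketch below shows what a header-only `myArray.h` could look like. Its contents are an assumption for illustration; only the file names come from the slide.

```cpp
// ---- myArray.h (hypothetical contents) ----
#include <cassert>
#include <cstddef>

template <class T>
class myArray {
public:
    myArray() : data_(nullptr), size_(0), capacity_(0) {}
    ~myArray() { delete[] data_; }
    myArray(const myArray&) = delete;             // copying omitted for brevity
    myArray& operator=(const myArray&) = delete;

    // Member functions are defined here in the header, not in a .cpp file,
    // so every file that includes myArray.h can instantiate them.
    void insert(const T& x) {
        if (size_ == capacity_) grow();
        data_[size_++] = x;
    }
    std::size_t size() const { return size_; }
    const T& operator[](std::size_t i) const {
        assert(i < size_);
        return data_[i];
    }

private:
    // Double the capacity and copy the existing elements over.
    void grow() {
        std::size_t newCap = (capacity_ == 0) ? 4 : 2 * capacity_;
        T* bigger = new T[newCap];
        for (std::size_t i = 0; i < size_; ++i) bigger[i] = data_[i];
        delete[] data_;
        data_ = bigger;
        capacity_ = newCap;
    }
    T* data_;
    std::size_t size_;
    std::size_t capacity_;
};
```

A `Main.cpp` that contains `#include "myArray.h"` and uses, say, `myArray<int>` can then be built with the single `g++ -o myArrayTest Main.cpp` command from the slide.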

  8. Today
     Data structure
     • Hash table
     Dynamic programming
     • Divide and conquer vs dynamic programming

  9. Two types of containers
     Containers for single-valued objects - last lectures
     • Insert(T, x) - Insert x into the container.
     • Search(T, x) - Returns the location/index/existence of x.
     • Remove(T, x) - Delete x from the container if it exists.
     • STL examples include std::vector, std::list, std::deque, std::set, and std::multiset.
     Containers for (key,value) pairs - this lecture
     • Insert(T, x) - Insert (x.key, x.value) into the container.
     • Search(T, k) - Returns the value associated with key k.
     • Remove(T, x) - Delete element x from the container if it exists.
     • Examples include std::map, std::multimap, and __gnu_cxx::hash_map.
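
The difference between the two kinds of Search can be sketched with `std::set` and `std::map`. The helper names `containsValue` and `lookup` are hypothetical wrappers introduced only for this illustration.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Single-valued container: Search answers "is x present?".
bool containsValue(const std::set<int>& T, int x) {
    return T.find(x) != T.end();
}

// (key, value) container: Search returns the value stored under key k.
// The key is assumed to be present.
int lookup(const std::map<std::string, int>& T, const std::string& k) {
    std::map<std::string, int>::const_iterator it = T.find(k);
    assert(it != T.end());
    return it->second;
}
```

Insert and Remove map onto the containers' own `insert` and `erase` members: `erase(x)` on a set removes the value x, and `erase(k)` on a map removes the pair stored under key k.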

  10. Direct address tables
     An example (key,value) container
     • U = {0, 1, ..., N−1} is the set of possible key values (N is not huge)
     • No two elements have the same key
     Direct address table: a constant-time container
     Let T[0, ..., N−1] be an array space that can contain N objects
     • Insert(T, x): T[x.key] = x
     • Search(T, k): return T[k]
     • Remove(T, x): T[x.key] = Nil
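
The three operations translate almost line by line into C++. The sketch below assumes a hypothetical element type `Item` and stores non-owning pointers, with `nullptr` standing in for Nil. The class and member names are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical element type: an integer key in {0, ..., N-1} plus a payload.
struct Item {
    std::size_t key;
    double value;
};

class DirectAddressTable {
public:
    // Pre-allocate one slot per possible key; all slots start as Nil.
    explicit DirectAddressTable(std::size_t N) : slots_(N, nullptr) {}

    // Insert(T, x): T[x.key] = x
    void insert(const Item* x) {
        assert(x != nullptr && x->key < slots_.size());
        slots_[x->key] = x;
    }
    // Search(T, k): return T[k] (nullptr stands in for Nil)
    const Item* search(std::size_t k) const {
        assert(k < slots_.size());
        return slots_[k];
    }
    // Remove(T, x): T[x.key] = Nil
    void remove(const Item* x) {
        assert(x != nullptr && x->key < slots_.size());
        slots_[x->key] = nullptr;
    }

private:
    std::vector<const Item*> slots_;  // the table does not own the items
};
```

Each operation is a single array access, which is where the constant-time behavior comes from. The cost is the constructor, which allocates a slot for every possible key whether or not it is ever used.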

  11. Analysis of direct address tables
     Time complexity
     • Requires a single memory access for each operation
     • O(1) - constant time complexity
     Memory requirement
     • Requires pre-allocating memory space for every possible key value
     • 2^32 × (size of data) = 4 GB × (size of data) for a 4-byte (32-bit) key
     • 2^64 × (size of data) = 18 EB (1.8 × 10^7 TB) × (size of data) for an 8-byte (64-bit) key
     • An infinite amount of memory is needed to store a set of arbitrary-length strings (or an amount exponential in the length of the string)
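
The memory figures above follow from multiplying the number of possible keys by the size of one slot. A tiny helper makes the arithmetic explicit. The function name `directTableBytes` is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Bytes needed by a direct-address table with one slot per possible key.
// keyBits: width of the key in bits; bytesPerSlot: size of each stored element.
// Returned as double because 2^64 does not fit in a 64-bit integer.
double directTableBytes(int keyBits, double bytesPerSlot) {
    assert(keyBits >= 0 && bytesPerSlot > 0.0);
    return std::ldexp(bytesPerSlot, keyBits);  // bytesPerSlot * 2^keyBits
}
```

For example, `directTableBytes(32, 1.0)` is 2^32 bytes, and `directTableBytes(64, 1.0) / 1e12` is roughly 1.8 × 10^7, the number of terabytes quoted on the slide.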


  13. Hash Tables
     Key features
     • O(1) complexity for Insert, Search, and Remove
     • Requires a larger memory space than the actual content to maintain good performance
     • But uses much less memory than direct-address tables
     Key components
     • Hash function
       • h(x.key) maps a key onto a smaller 'addressable' space H
       • Total required memory is the number of possible hash values
       • A good hash function minimizes the possibility of key collisions
     • Collision-resolution strategy, for when h(k1) = h(k2) with k1 ≠ k2
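
To make the idea of a collision concrete, here is a sketch using the simple division method, h(k) = k mod m. The choice of this particular hash function is an assumption for illustration. The slide itself does not specify h.

```cpp
#include <cassert>
#include <cstddef>

// Illustrative hash function (division method): maps any key into the
// smaller address space H = {0, ..., m-1}. This choice of h is an assumption.
std::size_t h(unsigned long key, std::size_t m) {
    assert(m > 0);
    return static_cast<std::size_t>(key % m);
}

// Two distinct keys collide when they hash to the same slot.
bool collide(unsigned long k1, unsigned long k2, std::size_t m) {
    return k1 != k2 && h(k1, m) == h(k2, m);
}
```

With m = 1000, the keys 1567 and 2567 both map to slot 567, so a table of 1000 slots needs some strategy for storing both of them.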

  14. Chained hash: A simple example
     A good hash function
     • Assume that we have a good hash function h(x.key) that 'fairly uniformly' distributes key values over H
     • What makes a good hash function will be discussed later today.
     A ChainedHash
     • Each possible hash key contains a linked list
     • Each linked list is originally empty
     • An input (key,value) pair is appended to the linked list when inserted
     • O(1) time complexity is guaranteed when no collision occurs
     • When a collision occurs, the time complexity is proportional to the size of the linked list associated with h(x.key)
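
The description above can be sketched as a small C++ class. This is not the lecture's own ChainedHash code: the int-to-int interface, the member names, and the hash function (key mod number of slots) are all assumptions made to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <utility>
#include <vector>

// Sketch of a chained hash table from int keys to int values.
// The hash function (key mod number of slots) is an assumption.
class ChainedHash {
private:
    typedef std::list<std::pair<int, int> > Chain;

public:
    // One linked list per possible hash value; every list starts out empty.
    explicit ChainedHash(std::size_t nSlots) : slots_(nSlots) {
        assert(nSlots > 0);
    }

    // Append (key, value) to the list of slot h(key); keys assumed unique.
    void insert(int key, int value) {
        slots_[h(key)].push_back(std::make_pair(key, value));
    }

    // Scan only the list of slot h(key); returns nullptr if absent.
    const int* search(int key) const {
        const Chain& chain = slots_[h(key)];
        for (Chain::const_iterator it = chain.begin(); it != chain.end(); ++it)
            if (it->first == key) return &it->second;
        return nullptr;
    }

    // Remove the pair with this key, if present; returns true if removed.
    bool remove(int key) {
        Chain& chain = slots_[h(key)];
        for (Chain::iterator it = chain.begin(); it != chain.end(); ++it)
            if (it->first == key) { chain.erase(it); return true; }
        return false;
    }

private:
    std::size_t h(int key) const {
        return static_cast<std::size_t>(static_cast<unsigned>(key)) % slots_.size();
    }
    std::vector<Chain> slots_;
};
```

With 4 slots, keys 1 and 5 land in the same list, so searching for either one walks a chain of length two. That is the "proportional to the size of the linked list" cost from the last bullet.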

  15. Illustration of ChainedHash (figure not reproduced)
