
Hashing



  1. Hashing
     Today’s announcements
     ◮ HW3 out, due Nov 15, 23:59
     ◮ MT2 Nov 7, 19:00-21:00, WOOD 2
     ◮ PA2 due Nov 1, 23:59
     Today’s Plan
     ◮ B-Tree intro
     Warm up: What collision resolution strategy is best? What’s the best dictionary? Why consider balanced BSTs?
     More info: http://jeffe.cs.illinois.edu/teaching/algorithms/notes/05-hashing.pdf

  2. Memory Hierarchy
     Why worry about the number of disk I/Os?

     Level                   Size                              Access time
     CPU registers           hundreds of bytes                 < 1 cycle
     Cache memory (L1-L3)    tens of kilobytes to megabytes    a few cycles to tens of cycles
     Main memory             gigabytes                         hundreds of cycles
     Disk                    terabytes                         millions of cycles

  3. Time Cost: Processor to Disk
     Processor
     ◮ Operates at a few GHz (gigahertz = billion cycles per second).
     ◮ Several instructions per cycle.
     ◮ Average time per instruction < 1 ns (nanosecond = 10^-9 seconds).
     Disk (7200 RPM)
     ◮ Seek time ≈ 10 ms (ms = millisecond = 10^-3 seconds).
     ◮ (Solid State Drives have “seek time” ≈ 0.1 ms.)
     Result: 10 million instructions for each disk read!
     Hold on... How long does it take to read a 1 TB (terabyte = 10^12 bytes) disk?
     10^12 bytes × 10 ms per byte = 10 billion seconds > 300 years?
     What’s wrong? Each disk read/write moves more than a byte, and sequential disk access is faster than a seek.
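
The arithmetic on this slide can be checked with a short sketch. The constants are the rough figures quoted above; the 4 KiB block size is a hypothetical value chosen only to illustrate "each read moves more than a byte", and transfer time is ignored.

```python
# Back-of-the-envelope numbers from the slide; all values are rough.
INSTRUCTION_TIME_S = 1e-9            # about 1 ns per instruction
SEEK_TIME_S = 10e-3                  # about 10 ms per disk seek
DISK_BYTES = 10**12                  # 1 TB
SECONDS_PER_YEAR = 365 * 24 * 3600

# Instructions the processor could execute while one seek completes.
instructions_per_seek = round(SEEK_TIME_S / INSTRUCTION_TIME_S)

# Naive estimate: pay one full seek for every single byte.
naive_seconds = DISK_BYTES * SEEK_TIME_S
naive_years = naive_seconds / SECONDS_PER_YEAR

# Assumption (not from the slide): one seek per 4 KiB block instead.
BLOCK_BYTES = 4096
block_seconds = (DISK_BYTES / BLOCK_BYTES) * SEEK_TIME_S
block_days = block_seconds / 86400

print(instructions_per_seek, round(naive_years), round(block_days))
```

Under these assumptions the per-byte estimate comes to roughly 317 years, while one seek per block brings it down to about four weeks. Sequential access, which avoids most seeks, would be faster still.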

  4. Memory Blocks
     Each memory access to a slower level of the hierarchy fetches a block of data.

     Transfer                  Block size         Block name
     CPU <- Cache              a few bytes        word
     Cache <- Main memory      10s of bytes       cache line
     Main memory <- Disk       a few kilobytes    page

     A block is the contents of consecutive memory locations.
     So random access between levels of the hierarchy is very slow.

  5. Chopping Trees into Blocks
     Idea: Store data for many adjacent nodes in consecutive memory locations.
     Result: One memory block access provides keys to determine many (more than two) search directions.

  6. m-ary Search Tree
     Example node: keys 3, 7, 12, 21, with five subtrees for k < 3, 3 < k < 7, 7 < k < 12, 12 < k < 21, and 21 < k.
     m-ary tree property
     ◮ Each node has ≤ m children.
     Result: A complete m-ary tree with n nodes has height Θ(log_m n).
     Search tree property
     ◮ Each node has ≤ m − 1 search keys: k_1 < k_2 < k_3 < ...
     ◮ All keys k in the i-th subtree obey k_i < k < k_{i+1} for i = 0, 1, ... (with no lower bound for the first subtree and no upper bound for the last).
     Disk I/Os (runtime) for find:
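
A minimal sketch of find on an m-ary search tree, counting node visits as a stand-in for disk I/Os. The `Node` class, the `find` function, and the keys placed in the child nodes are hypothetical illustrations; only the root keys 3, 7, 12, 21 come from the slide.

```python
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list                                     # sorted, at most m - 1 keys
    children: list = field(default_factory=list)   # empty for a leaf, else len(keys) + 1 subtrees


def find(root, k):
    """Return (found, nodes_visited); each node visit models one disk I/O."""
    node, visited = root, 0
    while node is not None:
        visited += 1
        i = bisect_left(node.keys, k)              # number of keys strictly less than k
        if i < len(node.keys) and node.keys[i] == k:
            return True, visited
        # Descend into the i-th subtree, or stop if this node is a leaf.
        node = node.children[i] if node.children else None
    return False, visited


# Root from the slide; the leaf contents are made up for the example.
root = Node(
    [3, 7, 12, 21],
    [Node([1, 2]), Node([4, 5]), Node([8, 10]), Node([15]), Node([30, 40])],
)

print(find(root, 12))   # key stored in the root
print(find(root, 10))   # key stored in the subtree for 7 < k < 12
print(find(root, 22))   # absent key
```

The search inside a node is a binary search over in-memory keys, so the cost that matters here is the number of nodes visited, which is at most the height of the tree plus one.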

  7. B-Trees
     B-Trees of order m are specialized m-ary search trees:
     ◮ ALL leaves are at the same depth!
     ◮ Internal nodes have between ⌈m/2⌉ and m children, except the root, which has between 2 and m children.
     ◮ Leaves hold at most m − 1 keys.
     Result
     ◮ Height is Θ(log_m n).
     ◮ Insert, delete, find visit Θ(log_m n) nodes.
     ◮ m is chosen so that each (full) node fills one page of memory.
     Each node visit (disk I/O) retrieves between m/2 and m keys.
     Example (figure): root [17]; internal nodes [3 8] and [28 48]; leaves [1 2], [6 7], [12 14 16], [25 26], [29 45], [52 53 55 68].
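
To get a feel for the Θ(log_m n) height bound, the sketch below compares how many levels are needed to distinguish n keys with different branching factors. The values n = 10^9 and m = 256 are hypothetical, and `ceil_log` is a helper introduced here for illustration.

```python
def ceil_log(n, m):
    """Smallest h with m**h >= n, computed with integers only (no float rounding)."""
    h, reach = 0, 1
    while reach < n:
        reach *= m
        h += 1
    return h


n = 10**9                       # hypothetical number of keys

binary_levels = ceil_log(n, 2)        # branching factor of a balanced BST
full_levels = ceil_log(n, 256)        # every node has m = 256 children
half_full_levels = ceil_log(n, 128)   # every node has only ceil(m/2) = 128 children

print(binary_levels, full_levels, half_full_levels)
```

With these numbers a binary tree needs about 30 levels, while a B-Tree with m = 256 needs 4 or 5 depending on how full its nodes are. If each level costs one disk I/O, that difference is the main motivation for wide nodes.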

  8. B-Tree Nodes
     Internal node with i search keys (figure): key slots 1, 2, ..., i, ..., m − 1 hold k_1, k_2, ..., k_i, and the remaining slots are empty (∅); the node links to its left sibling and right sibling.
     ◮ i + 1 subtree pointers
     ◮ parent and left & right sibling pointers
     Each node may hold a different number of items.
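
One possible in-memory layout for the node described above, as a sketch only. The class name `BTreeNode`, its field names, and the `is_well_formed` check are assumptions made for illustration; an on-disk node would use fixed-size arrays sized to one page.

```python
class BTreeNode:
    """Hypothetical B-Tree node of order m with parent and sibling links."""

    def __init__(self, m, keys=None):
        self.m = m                  # order: at most m children, m - 1 keys
        self.keys = keys or []      # k_1 < k_2 < ... < k_i, with i <= m - 1
        self.children = []          # i + 1 subtree pointers (empty for a leaf)
        self.parent = None
        self.left = None            # left sibling
        self.right = None           # right sibling

    def is_leaf(self):
        return not self.children

    def is_well_formed(self):
        """Check the local shape only: sorted keys, capacity, pointer count."""
        sorted_keys = all(a < b for a, b in zip(self.keys, self.keys[1:]))
        fits = len(self.keys) <= self.m - 1
        pointers_ok = self.is_leaf() or len(self.children) == len(self.keys) + 1
        return sorted_keys and fits and pointers_ok


# Build one internal node with two keys and three leaf children (m = 5 is assumed).
node = BTreeNode(5, [3, 8])
leaves = [BTreeNode(5, [1, 2]), BTreeNode(5, [6, 7]), BTreeNode(5, [12, 14, 16])]
node.children = leaves
for i, leaf in enumerate(leaves):
    leaf.parent = node
    leaf.left = leaves[i - 1] if i > 0 else None
    leaf.right = leaves[i + 1] if i + 1 < len(leaves) else None

print(node.is_well_formed(), [leaf.keys for leaf in node.children])
```

The check is deliberately local: it does not verify the minimum-occupancy rule or that all leaves share the same depth, which need a traversal of the whole tree.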

  9. Making a B-Tree
     M = 3
     Start with the empty B-Tree.
     insert(3): the tree is the single node [3].
     insert(14): the tree is the single node [3 14].
     The root is a leaf.
     What happens when we now insert(1)?

  10. Splitting the Root
      insert(1) into the leaf [3 14]: too many keys for one leaf!
      So, make a new leaf and create a parent (the new root) for both.
      Which key goes to the parent?
      ◮ Split the leaf
      ◮ Make a new root
      ◮ Move key 3 up
      Result (figure): root [3] with leaves [1] and [14].
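
The three inserts from the last two slides can be replayed with a small sketch. It handles only the case shown, a root that is still a leaf, and it assumes the middle key is the one promoted, which matches "Move key 3 up" here. The function name and the list-based representation are hypothetical.

```python
from bisect import insort

M = 3  # order from the slide: a leaf holds at most M - 1 = 2 keys


def insert_into_root_leaf(keys, k):
    """Insert k into a root that is a leaf, given as a sorted list of keys.

    Returns (root_keys, children). children is [] when no split was needed,
    otherwise it holds the two leaves produced by the split.
    """
    keys = list(keys)           # work on a copy
    insort(keys, k)             # keep the keys sorted
    if len(keys) <= M - 1:
        return keys, []         # still fits in one leaf
    mid = len(keys) // 2        # assumed choice: promote the middle key
    return [keys[mid]], [keys[:mid], keys[mid + 1:]]


root, children = insert_into_root_leaf([], 3)
root, children = insert_into_root_leaf(root, 14)
before_split = (root, children)
root, children = insert_into_root_leaf(root, 1)

print(before_split)       # the root is still a single leaf
print(root, children)     # the new root and its two leaves
```

A full B-Tree insert would also have to split non-root leaves and propagate splits upward through internal nodes; none of that is covered by this sketch.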
