
Hashing Todays announcements HW3 out, due Nov 15, 23:59 MT2 Nov 7, - PowerPoint PPT Presentation



  1. Hashing
     Today’s announcements
     ◮ HW3 out, due Nov 15, 23:59
     ◮ MT2 Nov 7, 19:00-21:00 WOOD 2
     ◮ PA2 due tonight, 23:59
     Today’s Plan
     ◮ B-Trees
     Warm up: What’s special about the following m-ary search tree?
     [Figure: m-ary search tree. Root: (17). Internal nodes: (3 8), (28 48). Leaves: (1 2), (6 7), (12 14 16), (25 26), (29 45), (52 53 55 68).]

  2. B-Trees
     B-Trees of order m are specialized m-ary search trees:
     ◮ ALL leaves are at the same depth!
     ◮ Internal nodes have between ⌈m/2⌉ and m children, except the Root has between 2 and m children
     ◮ Leaves hold between ⌈m/2⌉ − 1 and m − 1 keys (except Root)
     Result
     ◮ Height is Θ(log_m n)
     ◮ Insert, remove, find visit Θ(log_m n) nodes
     ◮ m is chosen so that each (full) node fills one page of memory. Each node visit (disk I/O) retrieves between m/2 and m keys.
     [Figure: B-Tree. Root: (17). Internal nodes: (3 8), (28 48). Leaves: (1 2), (6 7), (12 14 16), (25 26), (29 45), (52 53 55 68).]
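
The occupancy rules above can be turned into a small calculation. The following is a minimal sketch, not part of the original slides; the function name `btree_bounds` and the choice of m = 5 for the pictured tree are assumptions made for illustration.

```python
def btree_bounds(m):
    """Occupancy limits for a B-Tree of order m, following the slide's rules."""
    half = -(-m // 2)  # integer form of ceil(m / 2)
    return {
        "internal_children": (half, m),   # non-root internal nodes
        "root_children": (2, m),          # root, when it is not a leaf
        "leaf_keys": (half - 1, m - 1),   # non-root leaves
    }

# Leaves of the tree pictured on the slide, read left to right.
leaves = [[1, 2], [6, 7], [12, 14, 16], [25, 26], [29, 45], [52, 53, 55, 68]]

# Assumption: the pictured tree has order m = 5 (leaves hold 2 to 4 keys).
low, high = btree_bounds(5)["leaf_keys"]
all_leaves_ok = all(low <= len(leaf) <= high for leaf in leaves)
print(btree_bounds(5), all_leaves_ok)
```

Under that assumption every leaf of the pictured tree is within bounds, and each of its two internal nodes has 3 children, which is the minimum ⌈5/2⌉.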

  3. B-Tree Nodes
     Internal node with i search keys
     [Figure: a node with key slots 1, 2, ..., i, ..., m − 1. Slots 1 to i hold the keys k_1, k_2, ..., k_i; the unused slots are empty (∅). The node links to its left sibling and right sibling.]
     ◮ i + 1 subtree pointers
     ◮ parent and left & right sibling pointers
     Each node may hold a different number of items.
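
One possible way to represent the node described above is sketched here. The class name `BTreeNode` and its field names are hypothetical; the slides do not prescribe a concrete layout, and a disk-based implementation would use fixed-size arrays of m − 1 key slots rather than Python lists.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass(eq=False)  # eq=False: compare nodes by identity, avoiding recursion through parent links
class BTreeNode:
    keys: List[int] = field(default_factory=list)               # k_1 .. k_i, kept sorted
    children: List["BTreeNode"] = field(default_factory=list)   # i + 1 subtrees; empty for a leaf
    parent: Optional["BTreeNode"] = None
    left: Optional["BTreeNode"] = None    # left sibling
    right: Optional["BTreeNode"] = None   # right sibling

    def is_leaf(self):
        return not self.children

# A root with one key and two leaf children, linked as siblings.
a, b = BTreeNode(keys=[1]), BTreeNode(keys=[14])
root = BTreeNode(keys=[3], children=[a, b])
a.parent = b.parent = root
a.right, b.left = b, a
```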

  4. Making a B-Tree
     m = 3
     [Figure: the empty B-Tree → insert(3) → (3) → insert(14) → (3 14).]
     The root is a leaf. What happens when we now insert(1)?

  5. Splitting the Root
     [Figure: leaf (3 14) → insert(1): split the leaf, make a new root, move key 3 up → root (3) with leaves (1) and (14).]
     Too many keys for one leaf! So, make a new leaf and create a parent (the new root) for both. Which key goes to the parent?

  6. Splitting a Leaf
     [Figure: root (3), leaves (1) (14) → insert(59) → root (3), leaves (1) (14 59) → insert(26) → root (3 26), leaves (1) (14) (59).]
     insert(26) causes too many keys for the (14 59) leaf. So, make a new leaf and move the middle key up to the common parent.

  7. Propagating Splits
     [Figure: root (3 26), leaves (1) (14) (59) → insert(5) → root (3 26), leaves (1) (5 14) (59) → insert(16): add a new leaf, move key 14 to parent, but there’s no room! → node (3 14 26), leaves (1) (5) (16) (59) → split the internal node, add a new parent, move key 14 up → root (14), internal nodes (3) (26), leaves (1) (5) (16) (59).]
     insert(16) causes too many keys for the (5 14) leaf. Moving up key 14 causes too many keys for the (3 26) node. So, make a new internal node and move up the middle key.

  8. Insertion Algorithm
     1. Insert key¹ in its leaf node X.
     2. While X has m keys: // overflow
        ◮ Split X into two nodes:
          ◮ Original holds the ⌊m/2⌋ smallest keys.
          ◮ New holds the ⌈m/2⌉ − 1 largest keys.
        ◮ Move the middle key up to parent and attach new child.
        ◮ If X is root, create new root and attach both children. Stop.
        ◮ Set X to be the parent.
     See: https://people.ksp.sk/~kuko/gnarley-trees/ for animation.
     ¹ key means (key,value) pair for rest of algorithm
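
The insertion steps above can be sketched in Python as follows. This is an illustrative in-memory version under simplifying assumptions, not the course's reference implementation: the class names `Node` and `BTree` and the helper `levels` are hypothetical, keys are bare integers rather than (key,value) pairs, and sibling pointers are omitted.

```python
import bisect

class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []  # an empty list means this node is a leaf
        self.parent = None
        for child in self.children:
            child.parent = self

class BTree:
    def __init__(self, m):
        self.m = m
        self.root = Node()

    def insert(self, key):
        # Step 1: walk down to the leaf X and insert the key there.
        x = self.root
        while x.children:
            x = x.children[bisect.bisect_right(x.keys, key)]
        bisect.insort(x.keys, key)
        # Step 2: while X has m keys (overflow), split and move up.
        while len(x.keys) == self.m:
            mid = self.m // 2
            middle = x.keys[mid]
            # New node takes the ceil(m/2) - 1 largest keys and their subtrees.
            new = Node(x.keys[mid + 1:], x.children[mid + 1:])
            # Original keeps the floor(m/2) smallest keys.
            x.keys = x.keys[:mid]
            x.children = x.children[:mid + 1]
            if x is self.root:
                self.root = Node([middle], [x, new])
                break
            parent = x.parent
            i = parent.children.index(x)
            parent.keys.insert(i, middle)        # middle key moves up
            parent.children.insert(i + 1, new)   # attach the new child
            new.parent = parent
            x = parent

def levels(tree):
    """Return the keys of every node, grouped level by level."""
    out, level = [], [tree.root]
    while level:
        out.append([node.keys for node in level])
        level = [child for node in level for child in node.children]
    return out

tree = BTree(3)
for key in [3, 14, 1, 59, 26, 5, 16]:
    tree.insert(key)
print(levels(tree))  # [[[14]], [[3], [26]], [[1], [5], [16], [59]]]
```

Inserting the keys in the order used on the earlier slides (3, 14, 1, 59, 26, 5, 16) with m = 3 reproduces the final tree from the Propagating Splits slide: root (14), internal nodes (3) and (26), and leaves (1), (5), (16), (59).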

  9. Remove Algorithm
     1. Find node containing key to remove.
     2. If node is internal, swap key¹ with predecessor (or successor)
     3. Remove key which is now at leaf called X.
     4. While node X has ⌈m/2⌉ − 2 keys, // underflow
        ◮ If a sibling has a spare key then move it up (smallest from right sibling or largest from left sibling) & take down parent’s separator key & Stop.
        ◮ Merge with a sibling & take down parent’s separator key
        ◮ If parent is root
          ◮ If root now has 0 keys, remove root, make child the new root
          ◮ Stop.
        ◮ Otherwise, set X to be the parent.
     Note: Merge never creates a node with too many keys. Why?
     ¹ key means (key,value) pair for rest of algorithm
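
One way to approach the closing question is to count keys. A merge only happens when the sibling has no spare key, so the sibling holds exactly the minimum ⌈m/2⌉ − 1 keys. The merged node therefore holds (⌈m/2⌉ − 2) + (⌈m/2⌉ − 1) + 1 = 2⌈m/2⌉ − 2 keys, and since 2⌈m/2⌉ ≤ m + 1 this is at most m − 1. The sketch below checks that arithmetic for a range of orders; the function name `merged_size` is hypothetical.

```python
def merged_size(m):
    """Number of keys in the node produced by a merge in a B-Tree of order m."""
    half = -(-m // 2)          # integer form of ceil(m / 2)
    underflowing = half - 2    # node X, one key below the minimum
    sibling = half - 1         # sibling with no spare key holds exactly the minimum
    separator = 1              # key taken down from the parent
    return underflowing + sibling + separator

# The merged node never exceeds the maximum of m - 1 keys.
merge_fits = all(merged_size(m) <= m - 1 for m in range(3, 1000))
print(merge_fits)  # True
```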

  10. Thinking about B-Trees
     ◮ Delete is fast if leaf doesn’t underflow or we can take from a sibling. Merging and propagation take more time.
     ◮ Insert is fast if leaf doesn’t overflow. (Could we give to a sibling?) Splitting and propagation take more time.
     ◮ Propagation is rare if m is large (Why?)
     ◮ Repeated insertions and deletions can cause thrashing
     ◮ If m = 128, then a B-Tree of height 4 will store at least 30,000,000 items
     ◮ Range queries (i.e., findBetween(key1, key2)) are fast because of sibling pointers.
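
The 30,000,000 figure can be checked by counting the fewest keys a B-Tree can hold at a given height. The sketch below assumes height is measured in edges from the root to a leaf, that the root has the minimum 2 children, and that every other node is minimally full; the function name `min_keys` is hypothetical.

```python
def min_keys(m, height):
    """Fewest keys in a B-Tree of order m whose leaves are at depth `height`."""
    half = -(-m // 2)   # ceil(m / 2): minimum children of a non-root internal node
    total = 1           # a root with 2 children holds 1 key
    nodes = 2           # number of nodes at depth 1
    for _ in range(height):
        total += nodes * (half - 1)  # each node at this depth holds at least half - 1 keys
        nodes *= half                # each has at least `half` children
    return total

print(min_keys(128, 4))  # 33554431
```

With m = 128 and height 4 this gives 33,554,431 keys, which is consistent with the slide's "at least 30,000,000 items".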

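  11. B-Trees in practice
     Multiple B-Trees can index the same data records.
     [Figure: two B-Trees over the same records on disk. One is keyed on Employee Name (keys: bat, emu, kea, ant, auk, bee, boa, elk, ewe, fox, gnu, goa, kit, owl, yak); the other is keyed on Employee ID (keys: 15, 20, 30, 10, 11, 12, 16, 17, 22, 25, 26, 31, 32, 33, 36). Example records on disk: (Name: auk, ID: 16, Food: fish) and (Name: yak, ID: 22, Food: grass).]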
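
The idea of several indexes over one set of records can be illustrated with a small stand-in. The sketch below does not use B-Trees: each index is a sorted list searched with `bisect`, chosen only to keep the example short. The two records come from the slide; the names `records`, `by_name`, `by_id`, and `lookup` are hypothetical.

```python
import bisect

# The records are stored once; the indexes hold positions, not copies.
records = [
    {"name": "auk", "id": 16, "food": "fish"},
    {"name": "yak", "id": 22, "food": "grass"},
]

# Two independent indexes over the same records (sorted lists standing in for B-Trees).
by_name = sorted((r["name"], pos) for pos, r in enumerate(records))
by_id = sorted((r["id"], pos) for pos, r in enumerate(records))

def lookup(index, key):
    """Return the record whose indexed field equals key, or None."""
    i = bisect.bisect_left(index, (key,))
    if i < len(index) and index[i][0] == key:
        return records[index[i][1]]
    return None

print(lookup(by_name, "yak")["food"])  # grass
print(lookup(by_id, 16)["name"])       # auk
```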

  12. A Tree by Any Other Name...
     ◮ B-Trees with m = 3 are called 2-3 trees
     ◮ B-Trees with m = 4 are called 2-3-4 trees
     Why would we ever use these?
