Hashing Todays announcements HW3 out, due Nov 15, 23:59 MT2 Nov 7, - - PowerPoint PPT Presentation



SLIDE 1

Hashing

Today’s announcements

◮ HW3 out, due Nov 15, 23:59
◮ MT2 Nov 7, 19:00-21:00 WOOD 2
◮ PA2 due tonight, 23:59

Today’s Plan

◮ B-Trees

Warm up: What’s special about the following m-ary search tree?

[Figure: root 17; internal nodes 3 8 and 28 48; leaves 1 2 | 6 7 | 12 14 16 | 25 26 | 29 45 | 52 53 55 68]

SLIDE 2

B-Trees

B-Trees of order m are specialized m-ary search trees:

◮ ALL leaves are at the same depth!
◮ Internal nodes have between ⌈m/2⌉ and m children, except the root, which has between 2 and m children.
◮ Leaves hold between ⌈m/2⌉ − 1 and m − 1 keys (except the root).

Result

◮ Height is Θ(log_m n).
◮ Insert, remove, and find visit Θ(log_m n) nodes.
◮ m is chosen so that each (full) node fills one page of memory.

Each node visit (disk I/O) retrieves roughly m/2 to m keys (between ⌈m/2⌉ − 1 and m − 1).

[Figure: the warm-up tree again: root 17; internal nodes 3 8 and 28 48; leaves 1 2 | 6 7 | 12 14 16 | 25 26 | 29 45 | 52 53 55 68]
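As a sketch (not from the slides), the rules above can be written as a checker. It assumes a hypothetical `Node` class holding a sorted key list and a child list (empty for a leaf), and it also verifies the search-tree ordering that the slide takes for granted.

```python
import math

class Node:
    def __init__(self, keys, children=()):
        self.keys = list(keys)            # sorted search keys
        self.children = list(children)    # empty list means leaf

def is_btree(root, m):
    """Return True if the tree rooted at root obeys the B-Tree rules for order m."""
    min_children = math.ceil(m / 2)
    leaf_depths = set()

    def ok(node, depth, is_root, lo, hi):
        keys = node.keys
        # Search-tree order: keys sorted and strictly between the ancestor bounds.
        bounded = [lo] + keys + [hi]
        if any(a is not None and b is not None and a >= b
               for a, b in zip(bounded, bounded[1:])):
            return False
        if not node.children:                        # leaf
            leaf_depths.add(depth)
            low = 0 if is_root else min_children - 1
            return low <= len(keys) <= m - 1
        n = len(node.children)
        if n != len(keys) + 1:                       # i keys need i + 1 subtrees
            return False
        if not ((2 if is_root else min_children) <= n <= m):
            return False
        return all(ok(child, depth + 1, False, a, b)
                   for child, a, b in zip(node.children, [lo] + keys, keys + [hi]))

    # ALL leaves must be at the same depth.
    return ok(root, 0, True, None, None) and len(leaf_depths) == 1
```

On the warm-up tree, `is_btree(root, 5)` returns True, while `is_btree(root, 4)` returns False because the leaf 52 53 55 68 holds more than m − 1 = 3 keys.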

SLIDE 3

B-Tree Nodes

Internal node with i search keys

[Figure: key slots 1 to m − 1; slots 1..i hold k1, k2, …, ki and the remaining slots are empty (∅); arrows lead to the left sibling and right sibling.]

◮ i + 1 subtree pointers
◮ parent and left & right sibling pointers

Each node may hold a different number of items.
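A minimal sketch of this layout in Python, assuming a hypothetical `Node` class (sibling pointers are left out for brevity) and a hypothetical `find` that counts node visits, since each visit stands for one page read:

```python
from bisect import bisect_left

class Node:
    """Hypothetical B-Tree node: sorted keys k1..ki and i + 1 subtree pointers."""
    def __init__(self, keys, children=()):
        self.keys = list(keys)            # k1 < k2 < ... < ki
        self.children = list(children)    # [] for a leaf, else i + 1 subtrees
        self.parent = None                # sibling pointers omitted in this sketch
        for child in self.children:
            child.parent = self

def find(root, key):
    """Return (node holding key or None, number of nodes visited)."""
    node, visits = root, 0
    while node is not None:
        visits += 1                       # one visit ~ one page read from disk
        i = bisect_left(node.keys, key)   # first position whose key is >= key
        if i < len(node.keys) and node.keys[i] == key:
            return node, visits
        # Subtree i holds the keys between keys[i-1] and keys[i].
        node = node.children[i] if node.children else None
    return None, visits
```

On the warm-up tree, `find(root, 16)` visits three nodes: 17, then 3 8, then 12 14 16.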

SLIDE 4

Making a B-Tree

[Figure (m = 3): the empty B-Tree; after insert(3) the root leaf holds 3; after insert(14) it holds 3 14.]

The root is a leaf. What happens when we now insert(1)?

SLIDE 5

Splitting the Root

[Figure: insert(1) puts keys 1 3 14 in one leaf; split the leaf, make a new root, and move key 3 up, leaving root 3 with leaves 1 and 14.]

Too many keys for one leaf! So, make a new leaf and create a parent (the new root) for both. Which key goes to the parent?

SLIDE 6

Splitting a Leaf

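[Figure: starting from root 3 with leaves 1 and 14, insert(59) gives leaf 14 59; insert(26) then splits that leaf, leaving root 3 26 with leaves 1 | 14 | 59.]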

insert(26) causes too many keys for the 14 59 leaf. So, make a new leaf and move the middle key up to the common parent.
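The split step can be sketched as a hypothetical helper on a sorted key list. The sizes follow the rule given in the insertion algorithm (⌊m/2⌋ keys stay, the middle key moves up, ⌈m/2⌉ − 1 keys go to the new node):

```python
def split(keys, m):
    """Split an overflowing node's m sorted keys into (original, middle, new)."""
    assert len(keys) == m, "only an overflowing node (m keys) is split"
    half = m // 2                # floor(m/2)
    original = keys[:half]       # floor(m/2) smallest keys stay in the node
    middle = keys[half]          # the middle key moves up to the parent
    new = keys[half + 1:]        # ceil(m/2) - 1 largest keys go to the new node
    return original, middle, new
```

With m = 3, `split([14, 26, 59], 3)` gives `([14], 26, [59])`, matching this slide, and `split([1, 3, 14], 3)` gives `([1], 3, [14])`, matching the root split on the previous slide.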

SLIDE 7

Propagating Splits

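[Figure: starting from root 3 26 with leaves 1 | 14 | 59, insert(5) gives leaf 5 14. insert(16) overflows that leaf, so add a new leaf and move key 14 to the parent. There's no room in the parent (3 14 26), so split the internal node, add a new parent, and move key 14 up. Result: root 14; internal nodes 3 and 26; leaves 1 | 5 | 16 | 59.]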

insert(16) causes too many keys for the 5 14 leaf.

Moving key 14 up causes too many keys for the 3 26 node.

So, make a new internal node and move up the middle key.

SLIDE 8

Insertion Algorithm

  1. Insert key¹ in its leaf node X.
  2. While X has m keys: // overflow

     ◮ Split X into two nodes:
        ◮ Original holds the ⌊m/2⌋ smallest keys.
        ◮ New holds the ⌈m/2⌉ − 1 largest keys.
     ◮ If X is the root, create a new root holding the middle key and attach both children. Stop.
     ◮ Otherwise, move the middle key up to the parent and attach the new child.
     ◮ Set X to be the parent.

See: https://people.ksp.sk/~kuko/gnarley-trees/ for animation.

¹ "key" means a (key, value) pair for the rest of the algorithm.
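Below is one possible Python rendering of this algorithm, a sketch rather than the course's reference code. It assumes a hypothetical `Node` class with keys, children, and a parent pointer, stores bare keys instead of (key, value) pairs, and assumes keys are distinct:

```python
from bisect import bisect_left, insort

class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []    # empty list means leaf
        self.parent = None

def insert(root, key, m):
    """Insert key into a B-Tree of order m; return the (possibly new) root."""
    # 1. Find the leaf X where key belongs and insert it there.
    x = root
    while x.children:
        x = x.children[bisect_left(x.keys, key)]
    insort(x.keys, key)
    # 2. While X overflows (has m keys), split it.
    while len(x.keys) == m:
        half = m // 2
        middle = x.keys[half]
        # New node gets the ceil(m/2) - 1 largest keys and their subtrees.
        new = Node(x.keys[half + 1:], x.children[half + 1:])
        # Original keeps the floor(m/2) smallest keys and their subtrees.
        x.keys, x.children = x.keys[:half], x.children[:half + 1]
        for child in new.children:
            child.parent = new
        if x.parent is None:
            # X is the root: a new root holds the middle key. Stop.
            root = Node([middle], [x, new])
            x.parent = new.parent = root
            return root
        parent = x.parent
        i = parent.children.index(x)
        parent.keys.insert(i, middle)       # move the middle key up
        parent.children.insert(i + 1, new)  # attach the new child right of X
        new.parent = parent
        x = parent                          # set X to be the parent
    return root
```

Starting from `root = Node()` and inserting 3, 14, 1, 59, 26, 5, 16 with m = 3 replays the earlier example slides and ends with root 14, internal nodes 3 and 26, and leaves 1, 5, 16, 59.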

SLIDE 9

Remove Algorithm

  1. Find the node containing the key to remove.
  2. If the node is internal, swap key¹ with its predecessor (or successor).
  3. Remove the key, which is now in a leaf called X.
  4. While node X has ⌈m/2⌉ − 2 keys: // underflow

     ◮ If a sibling has a spare key, then move it up (the smallest from the right sibling or the largest from the left sibling), take down the parent's separator key, and Stop.
     ◮ Otherwise, merge with a sibling and take down the parent's separator key.
     ◮ If the parent is the root:
        ◮ If the root now has 0 keys, remove the root and make its child the new root.
        ◮ Stop.
     ◮ Set X to be the parent.

Note: Merge never creates a node with too many keys. Why?

¹ "key" means a (key, value) pair for the rest of the algorithm.
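A sketch of the removal loop under similar assumptions: a hypothetical `Node` class with keys and children, bare keys instead of (key, value) pairs, and no parent or sibling pointers (the path from the root is kept on a stack instead). Step 2 uses the predecessor:

```python
import math
from bisect import bisect_left

class Node:
    def __init__(self, keys, children=()):
        self.keys = list(keys)
        self.children = list(children)    # empty list means leaf

def remove(root, key, m):
    """Remove key from a B-Tree of order m; return the (possibly new) root."""
    min_keys = math.ceil(m / 2) - 1
    path = []                             # (parent, child index) pairs from the root down
    # 1. Find the node containing key.
    node = root
    while key not in node.keys:
        if not node.children:
            raise KeyError(key)
        i = bisect_left(node.keys, key)
        path.append((node, i))
        node = node.children[i]
    j = node.keys.index(key)
    if node.children:
        # 2. Internal node: the predecessor is the rightmost key of the left subtree.
        path.append((node, j))
        leaf = node.children[j]
        while leaf.children:
            path.append((leaf, len(leaf.children) - 1))
            leaf = leaf.children[-1]
        node.keys[j] = leaf.keys.pop()    # 3. swap in the predecessor, drop key
        x = leaf
    else:
        node.keys.pop(j)                  # 3. remove key from leaf X
        x = node
    # 4. Fix underflow, moving up toward the root.
    while len(x.keys) < min_keys and path:
        parent, i = path.pop()
        left = parent.children[i - 1] if i > 0 else None
        right = parent.children[i + 1] if i + 1 < len(parent.children) else None
        if right is not None and len(right.keys) > min_keys:
            x.keys.append(parent.keys[i])          # take down the separator
            parent.keys[i] = right.keys.pop(0)     # right sibling's smallest goes up
            if right.children:
                x.children.append(right.children.pop(0))
            return root
        if left is not None and len(left.keys) > min_keys:
            x.keys.insert(0, parent.keys[i - 1])   # take down the separator
            parent.keys[i - 1] = left.keys.pop()   # left sibling's largest goes up
            if left.children:
                x.children.insert(0, left.children.pop())
            return root
        # No spare key: merge with a sibling, taking down the separator.
        if right is None:
            x, right, i = left, x, i - 1           # merge X into its left sibling
        x.keys += [parent.keys.pop(i)] + right.keys
        x.children += right.children
        parent.children.pop(i + 1)
        x = parent                                 # set X to be the parent
    # If the root now has 0 keys, its only child becomes the new root.
    if not root.keys and root.children:
        return root.children[0]
    return root
```

For example, removing 1 from the final m = 3 tree of the insertion example (root 14; internal nodes 3 and 26; leaves 1, 5, 16, 59) merges twice and leaves root 14 26 with leaves 3 5, 16, 59.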

SLIDE 10

Thinking about B-Trees

◮ Delete is fast if the leaf doesn't underflow or we can take a key from a sibling. Merging and propagation take more time.

◮ Insert is fast if the leaf doesn't overflow. (Could we give a key to a sibling?) Splitting and propagation take more time.

◮ Propagation is rare if m is large. (Why?)
◮ Repeated insertions and deletions can cause thrashing.
◮ If m = 128, then a B-Tree of height 4 will store at least 30,000,000 items.

◮ Range queries (e.g., findBetween(key1, key2)) are fast because of sibling pointers.
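A quick check of the m = 128 claim, assuming "height 4" means every root-to-leaf path has 4 edges (5 levels of nodes). The hypothetical `min_keys` counts the fewest keys each level can hold; the result matches the closed form 2⌈m/2⌉^h − 1:

```python
import math

def min_keys(m, h):
    """Fewest keys a B-Tree of order m and height h (edges root-to-leaf) can hold."""
    t = math.ceil(m / 2)          # minimum children of a non-root internal node
    total = 1                     # the root holds at least one key
    nodes = 2                     # the root has at least 2 children
    for _ in range(h):            # levels at depth 1..h
        total += nodes * (t - 1)  # every non-root node holds at least t - 1 keys
        nodes *= t                # each internal node has at least t children
    return total
```

`min_keys(128, 4)` is 33,554,431, comfortably above 30,000,000. Read as 4 levels of nodes (h = 3), the bound drops to 524,287.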

SLIDE 11

B-Trees in practice

Multiple B-Trees can index the same data records.

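[Figure: two B-Trees index the same employee records on disk. One is keyed by Employee ID (keys 10, 11, 12, 15, 16, 17, 20, 22, 25, 26, 30, 31, 32, 33, 36); the other is keyed by Employee Name (ant, auk, bat, bee, boa, elk, emu, ewe, fox, gnu, goa, kea, kit, owl, yak). Entries in both trees point to the same records, e.g. Name: auk, ID: 16, Food: fish and Name: yak, ID: 22, Food: grass.]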
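To illustrate the idea, here is a small sketch in which sorted Python lists stand in for the two B-Trees (a deliberate simplification; the names `by_id`, `by_name`, and `lookup` are hypothetical). Each index stores (key, record position) pairs, so both lookups reach the same record object rather than a copy. Only the two records visible in the figure are used:

```python
from bisect import bisect_left

# Records as they might sit on disk (the two examples from the figure).
records = [
    {"Name": "auk", "ID": 16, "Food": "fish"},
    {"Name": "yak", "ID": 22, "Food": "grass"},
]

# Each index maps a key to a record position, not to a copy of the record.
by_id = sorted((r["ID"], pos) for pos, r in enumerate(records))
by_name = sorted((r["Name"], pos) for pos, r in enumerate(records))

def lookup(index, key):
    """Return the record for key via the given index, or None if absent."""
    i = bisect_left(index, (key,))    # (key,) sorts just before (key, pos)
    if i < len(index) and index[i][0] == key:
        return records[index[i][1]]
    return None
```

Here `lookup(by_id, 16)` and `lookup(by_name, "auk")` return the very same record, which is why updating a record's non-key fields does not require touching either index.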

SLIDE 12

A Tree by Any Other Name...

◮ B-Trees with m = 3 are called 2-3 trees
◮ B-Trees with m = 4 are called 2-3-4 trees

Why would we ever use these?
