Massive Data Algorithmics Lecture 3: External Search Trees

SLIDE 1

BST B-trees Summary

Massive Data Algorithmics

Lecture 3: External Search Trees

SLIDE 2

Binary search tree

  • Standard method for search among N elements
  • We assume the elements are stored in the leaves
  • A search traces at least one root-leaf path
  • If nodes are stored arbitrarily on disk:
      ⇒ Search in $O(\log_2 N)$ I/Os
      ⇒ Range search in $O(\log_2 N + T)$ I/Os (T = number of elements reported)

SLIDE 3

BFS Blocking

  • Block height: $O(\log_2 N) / O(\log_2 B) = O(\log_B N)$
  • Output elements blocked ⇒ Range search in $O(\log_B N + T/B)$ I/Os
  • Optimal: $O(N/B)$ space and $O(\log_B N + T/B)$ query
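
A rough numeric sketch of the gap between the unblocked and the blocked bound. It assumes a perfectly balanced binary tree over N leaves and that each disk block holds a complete subtree of ⌊log2 B⌋ levels; the function names are illustrative and not part of the lecture.

```python
def bst_height(n: int) -> int:
    """Levels on a root-leaf path of a balanced binary tree with n leaves: ceil(log2 n)."""
    return (n - 1).bit_length()


def blocked_height(n: int, block_size: int) -> int:
    """Blocks touched on a root-leaf path under BFS blocking (assumed layout)."""
    levels_per_block = block_size.bit_length() - 1  # floor(log2 B)
    return -(-bst_height(n) // levels_per_block)    # ceiling division


print(bst_height(10**9))            # 30 I/Os if nodes are placed arbitrarily
print(blocked_height(10**9, 1024))  # 3 I/Os with BFS blocking
```

With N = 10^9 and B = 1024, the search cost drops from about 30 block reads to about 3, which is the log_2 N versus log_B N difference stated above.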

SLIDE 4

Updating

Maintaining BFS blocking during updates?

  • Balance is normally maintained in search trees using rotations
  • It seems very difficult to maintain BFS blocking during a rotation
  • Also need to make sure the output (leaves) stays blocked!

SLIDE 5

B-trees

BFS blocking naturally corresponds to a tree with fan-out Θ(B)

B-trees are balanced by allowing the node degree to vary

  • Re-balancing is performed by splitting and merging nodes

SLIDE 6

(a,b)-Trees

T is an (a,b)-tree (a ≥ 2 and b ≥ 2a−1) if:

  • All leaves are on the same level and contain between a and b elements
  • Except for the root, all nodes have degree between a and b
  • The root has degree between 2 and b

An (a,b)-tree uses linear space and has height $O(\log_a N)$

  • Choosing a, b = Θ(B), each node/leaf is stored in one disk block
  • $O(N/B)$ space and $O(\log_B N + T/B)$ query
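
The three structural conditions can be checked mechanically. The sketch below is one possible in-memory encoding, not the lecture's: the `Node` class and `is_ab_tree` are hypothetical names, routing keys are omitted, and a node counts as a leaf when it has no children.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    elements: list = field(default_factory=list)  # used by leaves only
    children: list = field(default_factory=list)  # used by internal nodes only


def is_ab_tree(root: Node, a: int, b: int) -> bool:
    """Check the (a,b)-tree conditions from the definition above."""
    if a < 2 or b < 2 * a - 1:
        return False
    leaf_depths = set()

    def visit(node: Node, depth: int, is_root: bool) -> bool:
        if not node.children:                       # leaf: a..b elements
            leaf_depths.add(depth)
            return a <= len(node.elements) <= b
        low = 2 if is_root else a                   # root may have degree 2
        if not low <= len(node.children) <= b:
            return False
        return all(visit(c, depth + 1, False) for c in node.children)

    # All leaves must sit on the same level.
    return visit(root, 0, True) and len(leaf_depths) == 1


tree = Node(children=[Node(elements=[1, 2]), Node(elements=[3, 4, 5])])
print(is_ab_tree(tree, 2, 4))  # True
```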

SLIDE 7

(a,b)-Trees Insert

Search for the leaf v and insert the element in v

WHILE v has b+1 elements/children DO

  • Split v into nodes v′ and v″ with ⌊(b+1)/2⌋ and ⌈(b+1)/2⌉ elements
  • Insert element (ref) in parent(v) (make a new root if necessary)
  • v = parent(v)

Insert touches $O(\log_a N)$ nodes
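
The insertion loop can be sketched in memory as follows. This is a minimal illustration under assumptions that are not in the slides: distinct keys, separator keys stored in internal nodes (child i holds the elements no larger than `keys[i]`), and the hypothetical names `Node`, `ABTree` and `leaves`. Each node would occupy one disk block in the external-memory setting.

```python
from bisect import bisect_left, insort


class Node:
    def __init__(self, leaf, items=None, keys=None):
        self.leaf = leaf
        self.items = items or []  # elements in a leaf, children otherwise
        self.keys = keys or []    # separators (internal nodes only)


class ABTree:
    def __init__(self, a, b):
        assert a >= 2 and b >= 2 * a - 1
        self.a, self.b = a, b
        self.root = Node(leaf=True)

    def insert(self, x):
        # Search: remember the root-leaf path as (node, child index) pairs.
        path, v = [], self.root
        while not v.leaf:
            i = bisect_left(v.keys, x)
            path.append((v, i))
            v = v.items[i]
        insort(v.items, x)
        # Rebalance: split while v has b+1 elements/children.
        while len(v.items) == self.b + 1:
            mid = (self.b + 1) // 2  # floor((b+1)/2) stay in v, the rest move
            if v.leaf:
                sep = v.items[mid - 1]
                right = Node(True, v.items[mid:])
            else:
                sep = v.keys[mid - 1]
                right = Node(False, v.items[mid:], v.keys[mid:])
                v.keys = v.keys[:mid - 1]
            v.items = v.items[:mid]
            if path:                         # insert the reference in parent(v)
                parent, i = path.pop()
                parent.keys.insert(i, sep)
                parent.items.insert(i + 1, right)
            else:                            # v was the root: make a new root
                parent = Node(False, [v, right], [sep])
                self.root = parent
            v = parent


def leaves(node, depth=0):
    """Yield (depth, elements) for every leaf, left to right."""
    if node.leaf:
        yield depth, node.items
    else:
        for child in node.items:
            yield from leaves(child, depth + 1)


tree = ABTree(2, 4)
for x in range(1, 21):
    tree.insert(x)
print([items for _, items in leaves(tree.root)])
```

The path stack plays the role of parent(v): the loop only ever walks back up the search path, so an insert touches one node per level.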

SLIDES 8-11

Example: (2,4)-Tree Insert

[Figure sequence: insertion into a (2,4)-tree; images not recoverable from the source]

SLIDE 12

(a,b)-Trees Deletion

Search for the element and delete it from leaf v

WHILE v has a−1 elements/children DO

  • Fuse v with sibling v″:
      • move children of v″ to v
      • delete element (ref) from parent(v) (delete root if necessary)
  • If v has > b (and ≤ a+b−1 < 2b) children, split v
  • v = parent(v)

Delete touches $O(\log_a N)$ nodes
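
A sketch of the fuse-then-split step in isolation, assuming nodes are plain Python lists of elements (or children) and that the sibling is the right neighbour of v. The function name is hypothetical, and updating the references in parent(v) is left out.

```python
def fuse_and_maybe_split(v, sibling, b):
    """Fuse an underfull node v with its right sibling.

    Returns the list of nodes that replace v and its sibling under the parent:
    one node if the fused node fits, two if it exceeds b and must be split.
    """
    fused = v + sibling
    if len(fused) <= b:
        return [fused]                    # parent loses one child
    mid = len(fused) // 2
    return [fused[:mid], fused[mid:]]     # parent keeps the same degree


# (2,4)-tree: v has dropped to a-1 = 1 element.
print(fuse_and_maybe_split([5], [7, 8], b=4))         # [[5, 7, 8]]
print(fuse_and_maybe_split([5], [7, 8, 9, 10], b=4))  # [[5, 7], [8, 9, 10]]
```

When the result is a single node, the parent has lost a child and may itself become underfull, so the loop continues upward. When the fused node is split again, the parent's degree is unchanged and rebalancing stops.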

SLIDES 13-17

Example: (2,4)-Tree Delete

[Figure sequence: deletion from a (2,4)-tree; images not recoverable from the source]

SLIDE 18

(a,b)-Trees Properties

  • If b = 2a−1, every update can cause many re-balancing operations
  • If b ≥ 2a, an update causes only $O(1)$ re-balancing operations amortized
  • If b > 2a, only $O(1/(b/2 - a)) = O(1/a)$ re-balancing operations amortized
      * Both are somewhat hard to show
  • If b = 4a, it is easy to show that an update causes $O(\frac{1}{a} \log_a N)$ re-balance operations amortized
      * After a split during insert, a leaf contains ≈ 4a/2 = 2a elements
      * After a fuse during delete, a leaf contains between ≈ 2a and ≈ 5a elements (split if more than 3a ⇒ between (3/2)a and (5/2)a)
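
The b = 4a argument rests on the fact that a freshly rebalanced leaf is far from both thresholds. The sketch below checks that arithmetic for one value of a. It assumes, as a reading of the slide, that a leaf holds between (3/2)a and 3a elements right after a rebalance; the function name is illustrative.

```python
def updates_until_rebalance(size: int, a: int) -> int:
    """Fewest updates on one leaf before it is underfull (a-1) or overfull (b+1), with b = 4a."""
    b = 4 * a
    deletes_needed = size - (a - 1)
    inserts_needed = (b + 1) - size
    return min(deletes_needed, inserts_needed)


a = 8
after_rebalance = range(3 * a // 2, 3 * a + 1)  # sizes from (3/2)a to 3a
worst = min(updates_until_rebalance(s, a) for s in after_rebalance)
print(worst)  # 5, i.e. a/2 + 1
```

At least a/2 + 1 updates must hit a leaf between two of its rebalancing operations, which gives O(1/a) leaf rebalances per update; a similar argument applied level by level is what leads to the $O(\frac{1}{a} \log_a N)$ total.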

SLIDE 19

Summary and Conclusion: B-trees

B-trees: (a,b)-trees with a, b = Θ(B)

  • $O(N/B)$ space
  • $O(\log_B N + T/B)$ query
  • $O(\log_B N)$ update

B-trees with elements in the leaves are sometimes called B+-trees

Construction in $O(\frac{N}{B} \log_{M/B} \frac{N}{B})$ I/Os

  • Sort elements and construct leaves
  • Build tree level-by-level bottom-up
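
The two construction steps can be sketched in memory as below. This is only an illustration: nodes are nested Python lists, routing keys are omitted, every node is filled to b, and the last node on a level may end up underfull (a full implementation would rebalance it with its neighbour). The names `build_bottom_up` and `flatten` are hypothetical. Externally, the sort dominates the I/O cost, while the level-by-level pass only scans each level once.

```python
def build_bottom_up(elements, b):
    """Return (root, number_of_levels) for a tree built over the sorted elements."""
    data = sorted(elements)  # stands in for the external sort
    if not data:
        return [], 0
    # Construct the leaves: consecutive runs of up to b elements.
    level = [data[i:i + b] for i in range(0, len(data), b)]
    levels = 1
    # Build the tree level by level: group up to b nodes under one parent.
    while len(level) > 1:
        level = [level[i:i + b] for i in range(0, len(level), b)]
        levels += 1
    return level[0], levels


def flatten(node):
    """Elements of a subtree in left-to-right order."""
    if node and isinstance(node[0], list):
        return [x for child in node for x in flatten(child)]
    return list(node)


root, levels = build_bottom_up(range(100), b=4)
print(levels)  # 4: 25 leaves, then 7, 2 and 1 nodes
```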

SLIDE 20

Summary and Conclusion: B-trees

B-tree with branching parameter b and leaf parameter k (b, k ≥ 8)

  • All leaves are on the same level and contain between k/4 and k elements
  • Except for the root, all nodes have degree between b/4 and b
  • The root has degree between 2 and b

B-tree with leaf parameter k = Ω(B)

  • $O(N/B)$ space
  • Height $O(\log_b \frac{N}{B})$
  • $O(1/k)$ amortized leaf rebalance operations
  • $O(\frac{1}{b \cdot k} \log_b \frac{N}{B})$ amortized internal node rebalance operations

B-tree with branching parameter $B^c$, 0 < c ≤ 1, and leaf parameter B

  • Space $O(N/B)$, updates $O(\log_B N)$, queries $O(\log_B N + T/B)$
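
A short worked step showing why a branching parameter of $B^c$ still gives the same asymptotic height, assuming c is treated as a constant:

```latex
\log_{B^c} N \;=\; \frac{\log_B N}{\log_B B^c} \;=\; \frac{1}{c}\,\log_B N \;=\; O(\log_B N)
\qquad \text{for constant } 0 < c \le 1 .
```

The height grows only by the constant factor 1/c, so the update and query bounds keep the $\log_B N$ term.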

SLIDE 21

References

External Memory Geometric Data Structures. Lecture notes by Lars Arge.

  • Sections 1-3
