B–Trees [Bayer & McCreight, 1972], EMADS Fall 2003


  1. B–Trees [Bayer & McCreight, 1972], EMADS Fall 2003

  2. An Application of B–Trees
     • Core indexing data structure in many database management systems
     • TELSTRA, an Australian telecommunications company, maintains a customer database with 51,000,000,000 rows and 4.2 terabytes of data

  3. (a, b)–Trees and B–trees [Bayer & McCreight, 1972]

     [Figure: example tree with root keys 7, 14; internal nodes with keys (3, 5), (9, 12, 13), (15, 17); leaves 2, 4, 5, 8, 10, 12, 13, 14, 16, 17]

     Definition: A tree is an $(a, b)$-tree if $a \ge 2$, $b \ge 2a - 1$, and
     • All leaves have the same depth.
     • All internal nodes have degree at most $b$.
     • All internal nodes except the root have degree at least $a$.
     • The root has degree at least two.

     $(a, 2a - 1)$-trees are also denoted B–trees.
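The four conditions of the definition can be checked mechanically. The following is a minimal Python sketch, assuming a hypothetical encoding in which an internal node is a list of its children and any other value is a leaf. The name `is_ab_tree` is illustrative, and search keys are left out because only the shape of the tree matters for the definition.

```python
def is_ab_tree(root, a, b):
    """Check the (a, b)-tree conditions on a tree given as nested lists.

    Assumed encoding (not from the slides): an internal node is a list of
    its children, anything else is a leaf.
    """
    if a < 2 or b < 2 * a - 1:
        return False
    leaf_depths = set()

    def visit(node, depth, is_root):
        if not isinstance(node, list):       # leaf: record its depth
            leaf_depths.add(depth)
            return True
        degree = len(node)
        if degree > b:                       # degree at most b
            return False
        if is_root and degree < 2:           # root has degree at least two
            return False
        if not is_root and degree < a:       # other nodes: degree at least a
            return False
        return all(visit(child, depth + 1, False) for child in node)

    # All leaves must share one depth.
    return visit(root, 0, True) and len(leaf_depths) == 1


# Shape of the example tree on this slide: internal degrees 3, 4, 3.
example = [[2, 4, 5], [8, 10, 12, 13], [14, 16, 17]]
print(is_ab_tree(example, 2, 4))   # True
print(is_ab_tree(example, 2, 3))   # False: one node has degree 4 > 3
```

The same shape also passes as a (3, 5)-tree, which illustrates that a given tree can satisfy the definition for several parameter pairs.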

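  4. Properties of (a, b)–Trees

     [Figure: the example tree with root keys 7, 14; internal nodes with keys (3, 5), (9, 12, 13), (15, 17); leaves 2, 4, 5, 8, 10, 12, 13, 14, 16, 17]

     Lemma: $N$ leaves implies $\left\lceil \frac{\log N}{\log b} \right\rceil \le \text{height} \le \left\lfloor \frac{(\log N) - 1}{\log a} \right\rfloor + 1$

     Lemma: Searches require $O(\log_a N)$ I/Os if $b = O(B)$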
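The height bounds can be evaluated for concrete parameters. The sketch below assumes the logarithms in the lemma are base 2, which makes the bounds equivalent to $b^h \ge N$ (every node has degree at most $b$) and $2a^{h-1} \le N$ (the root has degree at least two, every other node at least $a$). It uses integer arithmetic to avoid floating-point rounding; the function name `height_bounds` and the parameters in the second call are illustrative assumptions, not values from the slides.

```python
def height_bounds(n_leaves, a, b):
    """Return (lower, upper) bounds on the height of an (a, b)-tree.

    Assumes n_leaves >= 2 and a >= 2.
    """
    # Smallest h with b**h >= n_leaves, i.e. ceil(log N / log b).
    lower = 0
    while b ** lower < n_leaves:
        lower += 1
    # Largest h with 2 * a**(h - 1) <= n_leaves,
    # i.e. floor((log2 N - 1) / log2 a) + 1.
    upper = 1
    while 2 * a ** upper <= n_leaves:
        upper += 1
    return lower, upper


# The 10-leaf example tree as a (2, 4)-tree: its actual height is 2.
print(height_bounds(10, 2, 4))                # (2, 3)

# Hypothetical node sizes (128, 256), treating each of the
# 51,000,000,000 rows of the TELSTRA example as one leaf.
print(height_bounds(51_000_000_000, 128, 256))   # (5, 5)
```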

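  5. Updates in (a, b)–Trees
     • Search for the location to insert or delete a leaf
     • Create/delete the leaf and its search key at the parent node
     • Rebalance using the following transformations

     Split: a node of degree $b + 1$ becomes two nodes of degrees $\lceil (b+1)/2 \rceil$ and $\lfloor (b+1)/2 \rfloor$
     Share: a node of degree $a - 1$ with a sibling of degree $> a$ becomes two nodes of degrees $a$ and $\ge a$
     Fusion: a node of degree $a - 1$ and a sibling of degree $a$ become one node of degree $2a - 1$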
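The three rebalancing transformations can be summarized by their effect on node degrees alone. The following is a small Python sketch under that simplification (keys and children are ignored); the function names are illustrative.

```python
def split(degree, b):
    """An overfull node of degree b + 1 becomes two nodes."""
    assert degree == b + 1
    return (degree + 1) // 2, degree // 2    # ceil and floor of (b + 1) / 2


def share(degree, sibling, a):
    """An underfull node takes one child from a sibling of degree > a."""
    assert degree == a - 1 and sibling > a
    return degree + 1, sibling - 1           # degrees a and >= a


def fuse(degree, sibling, a):
    """An underfull node is merged with a sibling of degree exactly a."""
    assert degree == a - 1 and sibling == a
    return degree + sibling                  # degree 2a - 1


# In a (2, 4)-tree:
print(split(5, 4))     # (3, 2)
print(share(1, 3, 2))  # (2, 2)
print(fuse(1, 2, 2))   # 3
```

The condition $b \ge 2a - 1$ from the definition is what guarantees that both halves of a split have degree at least $a$ and that a fused node has degree at most $b$.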

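  6. Example: Insert into a (2,4)–Tree

     [Figure, before: root keys 7, 14; internal nodes with keys (3, 5), (9, 12, 13), (15, 17); leaves 2, 4, 5, 8, 10, 12, 13, 14, 16, 17]

     ⇓ Insert(11)

     [Figure, after: root keys 7, 12, 14; internal nodes with keys (3, 5), (9, 11), (13), (15, 17); leaves 2, 4, 5, 8, 10, 11, 12, 13, 14, 16, 17]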
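The Insert(11) example can be replayed in code. The following is a minimal Python sketch of insertion with splits only, assuming a leaf-oriented layout in which child `i` of a node holds the values `v` with `keys[i-1] <= v < keys[i]`, and assuming distinct keys. The `Node` class, the function names, and the choice of separator key for a new leaf are illustrative assumptions, not taken from the slides.

```python
import bisect


class Node:
    def __init__(self, keys, children):
        self.keys = keys            # len(children) - 1 search keys
        self.children = children    # Node objects, or ints at the leaf level


def insert(node, x, b):
    """Insert leaf x below node. Return (key, right_node) if node split."""
    i = bisect.bisect_right(node.keys, x)
    child = node.children[i]
    if isinstance(child, Node):
        promoted = insert(child, x, b)
        if promoted is None:
            return None
        key, right = promoted
        node.keys.insert(i, key)               # child split: adopt new sibling
        node.children.insert(i + 1, right)
    elif x < child:
        node.children.insert(i, x)             # new leaf goes left of child
        node.keys.insert(i, child)
    else:
        node.children.insert(i + 1, x)         # new leaf goes right of child
        node.keys.insert(i, x)
    if len(node.children) <= b:
        return None
    m = (len(node.children) + 1) // 2          # left keeps ceil((b + 1) / 2)
    right = Node(node.keys[m:], node.children[m:])
    key = node.keys[m - 1]                     # separator moves to the parent
    node.keys, node.children = node.keys[:m - 1], node.children[:m]
    return key, right


def insert_root(root, x, b):
    promoted = insert(root, x, b)
    if promoted is None:
        return root
    key, right = promoted
    return Node([key], [root, right])          # tree grows by one level


root = Node([7, 14], [
    Node([3, 5], [2, 4, 5]),
    Node([9, 12, 13], [8, 10, 12, 13]),
    Node([15, 17], [14, 16, 17]),
])
root = insert_root(root, 11, b=4)
print(root.keys)                          # [7, 12, 14]
print([c.keys for c in root.children])    # [[3, 5], [9, 11], [13], [15, 17]]
```

Inserting 11 raises the middle node to degree 5, so it splits into nodes of degree 3 and 2 and the separator 12 moves up to the root, matching the tree after the insertion.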

  7. Analysis of (a, b)–Trees: Insertions Only

     Theorem: $n$ insertions imply at most $n / \lfloor (b+1)/2 \rfloor^h$ splits at height $h$, i.e. $O(n/b)$ splits in total.

     Proof
     • Nodes are created due to splits
     • All nodes except the root have degree at least $\lfloor (b+1)/2 \rfloor$, so a node at height $h$ has at least $\lfloor (b+1)/2 \rfloor^h$ leaves below it
     • The number of nodes in the lowest level dominates all other levels ✷
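The bound can be checked empirically for one particular insertion order. The sketch below assumes keys are inserted in increasing order into an initially empty tree, so only the degrees of the nodes on the rightmost path need to be tracked; heights are counted with leaves at height 0. This is an illustration for a single insertion sequence, not a proof, and the function name is hypothetical.

```python
from collections import Counter


def count_splits_sorted_inserts(n, b):
    """Simulate n insertions in increasing key order; return {height: splits}."""
    degrees = [0]       # degrees[h - 1] = degree of the rightmost node at height h
    splits = Counter()
    for _ in range(n):
        degrees[0] += 1                    # new leaf under the rightmost node
        h = 0
        while degrees[h] > b:              # a node of degree b + 1 must split
            splits[h + 1] += 1
            degrees[h] = (b + 1) // 2      # rightmost half keeps floor((b+1)/2)
            if h + 1 == len(degrees):
                degrees.append(2)          # new root above the two halves
            else:
                degrees[h + 1] += 1        # parent gains one child
            h += 1
    return splits


n, b = 1000, 4
splits = count_splits_sorted_inserts(n, b)
for height in sorted(splits):
    bound = n / ((b + 1) // 2) ** height
    print(height, splits[height], "<=", bound)
```

In this simulation the number of splits drops by a constant factor from one height to the next, which is why the lowest level dominates the total.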

  8. Analysis of (a, b)–Trees

     Theorem: If $b \ge 2a$, then $i$ insertions and $d$ deletions perform at most $O(\delta^h (i + d))$ splits and fusions at height $h$, where $\delta < 1$ depends on $a$ and $b$.

     Proof (sketch): Amortization argument; each node has a potential $\phi$ (= measure of unbalancedness). ✷

     [Figure: potential $\phi$ of a node as a function of its degree]

     Theorem: If $b \ge 2a$, then the total number of splits and fusions is $O(i + d)$. If $b \ge (2 + \varepsilon) a$ for some $\varepsilon > 0$, the number of node splits and node fusions is $O(\frac{1}{a}(i + d))$.

  9. Analysis of (a, b)–Trees

     Theorem: $(B/3, B)$-trees perform $\Theta(1/B)$ rebalancing per update.

     Theorem: $(\lfloor B/2 \rfloor, B)$-trees perform $\Theta(1)$ rebalancing per update.

     Theorem: $(\lceil B/2 \rceil, B)$-trees perform $\Theta(\log_B N)$ rebalancing per update if $B$ is odd.

  10. Lower Bound for Searching

      Theorem: Searching for an element among $N$ elements in external memory requires $\Omega(\log_{B+1} N)$ I/Os.

      Proof (sketch)
      • Adversary argument
      • The algorithm knows the total order of the stored elements
      • Initially all elements are candidates for being the query element
      • If prior to an I/O there are $C$ candidate elements left, then there exist answers leaving $\lceil (C - B)/(B + 1) \rceil$ candidates after reading $B$ elements ✷

      Note: The lower bound holds even if an I/O can read $B$ arbitrary elements from memory.
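The recurrence in the proof sketch can be iterated to see where the $\log_{B+1} N$ comes from. The following Python sketch assumes the adversary always answers so that $\lceil (C - B)/(B + 1) \rceil$ candidates remain, and counts I/Os until at most one candidate is left; the function name is illustrative.

```python
def adversary_io_rounds(n, block_size):
    """Count I/Os until at most one candidate remains under the adversary."""
    candidates = n
    ios = 0
    while candidates > 1:
        # ceil((C - B) / (B + 1)) in integer arithmetic; this is 0 once C <= B.
        candidates = -(-(candidates - block_size) // (block_size + 1))
        ios += 1
    return ios


print(adversary_io_rounds(1000, 9))   # 3, since 1000 = (9 + 1) ** 3
print(adversary_io_rounds(8, 1))      # 3, since 8 = (1 + 1) ** 3
```

When $N = (B + 1)^k$, each I/O reduces the candidate count from $(B+1)^j$ to exactly $(B+1)^{j-1}$, so the loop runs $k = \log_{B+1} N$ times.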
