B–Trees [Bayer & McCreight, 1972], EMADS Fall 2003 (presentation slides)

SLIDE 1

B–Trees

[Bayer & McCreight, 1972]

EMADS Fall 2003: B–Trees 1

SLIDE 2

An Application of B–Trees

  • Core indexing data structure in many database management systems
  • TELSTRA, an Australian telecommunications company, maintains a customer database with 51,000,000,000 rows and 4.2 terabytes of data

SLIDE 3

(a, b)–Trees and B–trees

[Bayer & McCreight, 1972]

[Figure: example (a, b)–tree]

Definition: A tree is an (a, b)–tree if a ≥ 2, b ≥ 2a − 1, and

  • All leaves have the same depth.
  • All internal nodes have degree at most b.
  • All internal nodes except the root have degree at least a.
  • The root has degree at least two.

(a, 2a − 1)–trees are also called B–trees.
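The four conditions can be checked mechanically. The Python sketch below is illustrative only: the `Node` class and the `is_ab_tree` function are hypothetical names, a leaf is modelled as a node without children, and "degree" means the number of children.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    # Hypothetical minimal node: a leaf is a node without children.
    children: List["Node"] = field(default_factory=list)


def is_ab_tree(root: Node, a: int, b: int) -> bool:
    """Check the (a, b)-tree conditions of the definition."""
    if a < 2 or b < 2 * a - 1:
        return False                        # the parameters themselves are invalid
    leaf_depths = set()

    def visit(node: Node, depth: int, is_root: bool) -> bool:
        degree = len(node.children)
        if degree == 0:
            if is_root:
                return False                # the root must have degree >= 2
            leaf_depths.add(depth)
            return True
        if degree > b:                      # internal nodes: degree at most b
            return False
        if degree < (2 if is_root else a):  # root >= 2, other internal nodes >= a
            return False
        return all(visit(c, depth + 1, False) for c in node.children)

    # all leaves must have the same depth
    return visit(root, 0, True) and len(leaf_depths) == 1


good = Node([Node([Node(), Node()]), Node([Node(), Node(), Node()])])
print(is_ab_tree(good, 2, 4))
```

A tree consisting of a single leaf is rejected here, because the definition requires the root to have degree at least two.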

SLIDE 4

Properties of (a, b)–Trees

[Figure: example (a, b)–tree]

Lemma: N leaves imply

$$\frac{\log N}{\log b} \;\le\; \text{height} \;\le\; \frac{\log N - 1}{\log a} + 1$$

where log denotes the base-2 logarithm.
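Both bounds follow from counting leaves. A short derivation, with log taken base 2 (which is what makes the −1 term correct), h the height and N the number of leaves:

```latex
% Upper bound on leaves: every internal node has at most b children.
N \le b^{h} \quad\Longrightarrow\quad h \ge \frac{\log N}{\log b}
% Lower bound on leaves: the root has >= 2 children, every other internal node >= a.
N \ge 2\,a^{h-1} \quad\Longrightarrow\quad h \le \log_a \frac{N}{2} + 1 = \frac{\log N - 1}{\log a} + 1
% Example: a = 2, b = 4, N = 2^{20} gives 10 \le h \le 20.
```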

Lemma: Searches require $O(\log_a N)$ I/Os if b = O(B).
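A search reads one node per level, so by the height bound it reads $O(\log_a N)$ nodes, and with b = O(B) each node occupies O(1) blocks. The Python sketch below counts node reads as a stand-in for I/Os. It assumes a hypothetical leaf-oriented layout in which every internal node stores, for each child, the largest element below it (`keys[i]` for `children[i]`); `Node` and `search` are illustrative names.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    # Hypothetical leaf-oriented layout: keys[i] is the largest element
    # stored below children[i]; leaves have no children and carry a value.
    keys: List[int] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)
    value: Optional[int] = None


def search(root: Node, x: int) -> Tuple[bool, int]:
    """Return (found, nodes_read). Each node read stands in for one I/O,
    assuming a node of degree at most b = O(B) fits in O(1) blocks."""
    node, reads = root, 0
    while node.children:
        reads += 1
        # leftmost child whose largest element is >= x (last child if none is)
        i = min(bisect_left(node.keys, x), len(node.children) - 1)
        node = node.children[i]
    reads += 1  # read the leaf itself
    return node.value == x, reads


# Example: a height-2 tree over the elements 2, 5, 8, 12
root = Node([5, 12], [Node([2, 5], [Node(value=2), Node(value=5)]),
                      Node([8, 12], [Node(value=8), Node(value=12)])])
print(search(root, 8))
print(search(root, 7))
```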

SLIDE 5

Updates in (a, b)–Trees

  • Search for location to insert or delete a leaf
  • Create/delete leaf and search key at the parent node
  • Rebalance using the following transformations

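Rebalancing transformations (degree = number of children):

  • Split: a node of degree b + 1 becomes two nodes of degrees ⌊(b + 1)/2⌋ and ⌈(b + 1)/2⌉, both ≥ a because b ≥ 2a − 1
  • Fusion: a node of degree a − 1 and a sibling of degree a become one node of degree 2a − 1 ≤ b
  • Share: a node of degree a − 1 receives one child from a sibling of degree > a, giving degrees a and ≥ a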
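The three transformations only move children between adjacent siblings. The Python sketch below makes simplifying assumptions: a node is represented only by its ordered list of children (any Python objects), the helper names `split`, `fuse`, `share` and `fix_underflow` are hypothetical, and updating the routing keys in the parent is omitted.

```python
from typing import List, Tuple, TypeVar

T = TypeVar("T")


def split(children: List[T], b: int) -> Tuple[List[T], List[T]]:
    """Split a node of degree b + 1 into degrees floor((b+1)/2) and ceil((b+1)/2)."""
    if len(children) != b + 1:
        raise ValueError("split expects a node of degree b + 1")
    mid = (b + 1) // 2
    return children[:mid], children[mid:]


def fuse(left: List[T], right: List[T], a: int) -> List[T]:
    """Fuse a node of degree a - 1 with an adjacent sibling of degree a (result: 2a - 1)."""
    if sorted((len(left), len(right))) != [a - 1, a]:
        raise ValueError("fusion expects degrees a - 1 and a")
    return left + right


def share(left: List[T], right: List[T], a: int) -> Tuple[List[T], List[T]]:
    """Move one child from a sibling of degree > a to an adjacent node of degree a - 1."""
    if len(left) == a - 1 and len(right) > a:
        return left + right[:1], right[1:]     # leftmost child of right moves left
    if len(right) == a - 1 and len(left) > a:
        return left[:-1], left[-1:] + right    # rightmost child of left moves right
    raise ValueError("share expects degrees a - 1 and > a")


def fix_underflow(left: List[T], right: List[T], a: int) -> List[List[T]]:
    """Fuse if the sibling has exactly a children, otherwise share."""
    if sorted((len(left), len(right))) == [a - 1, a]:
        return [fuse(left, right, a)]
    return list(share(left, right, a))


print(split([0, 1, 2, 3, 4], 4))         # (2, 4)-tree: 5 children -> 2 + 3
print(fix_underflow([0], [1, 2, 3], 2))  # sibling has > a children: share
```

Only a fusion removes a child from the parent, so only fusions (like splits, which add one) can propagate upwards; a share always stops the rebalancing.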

SLIDE 6

Example: Insert into a (2, 4)–Tree

[Figure: example (2, 4)–tree]

⇓ Insert(11)

[Figure: the same tree after inserting 11]
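The insertion procedure (search, create a leaf at the bottom level, split upwards while a node has b + 1 children) can be sketched in Python for the (2, 4) case. This is a hypothetical illustration, not the slide's tree: it assumes every internal node stores the largest element below each child, ignores duplicate keys, and builds its own tree from a sequence of insertions ending with Insert(11), so the resulting shape need not match the figure. `TwoFourTree`, `insert_below` and `max_key` are illustrative names.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import List, Optional

B_MAX = 4  # b of the (2, 4)-tree; a split of 5 children gives 2 and 3 >= a = 2


@dataclass
class Node:
    # Assumption: keys[i] is the largest element below children[i].
    keys: List[int] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)
    value: Optional[int] = None  # element, set only in leaves


def max_key(node: Node) -> int:
    return node.value if not node.children else node.keys[-1]


def insert_below(node: Node, x: int) -> Optional[Node]:
    """Insert x under internal node `node`; return the new right half if `node` split."""
    i = min(bisect_left(node.keys, x), len(node.children) - 1)
    child = node.children[i]
    if not child.children:                    # bottom level: create a new leaf
        if child.value == x:
            return None                       # x already stored
        node.children.insert(i if x < child.value else i + 1, Node(value=x))
    else:
        right = insert_below(child, x)
        if right is not None:                 # child split: adopt its right half
            node.children.insert(i + 1, right)
    node.keys = [max_key(c) for c in node.children]
    if len(node.children) <= B_MAX:
        return None
    mid = (B_MAX + 1) // 2                    # b + 1 children -> floor/ceil halves
    right = Node(children=node.children[mid:])
    node.children = node.children[:mid]
    node.keys = [max_key(c) for c in node.children]
    right.keys = [max_key(c) for c in right.children]
    return right


class TwoFourTree:
    def __init__(self) -> None:
        self.root: Optional[Node] = None

    def insert(self, x: int) -> None:
        if self.root is None:
            self.root = Node(value=x)         # transient: a single leaf
        elif not self.root.children:          # root is a leaf: give it a parent
            if x != self.root.value:
                lo, hi = sorted((self.root.value, x))
                self.root = Node([lo, hi], [Node(value=lo), Node(value=hi)])
        else:
            right = insert_below(self.root, x)
            if right is not None:             # root split: the tree grows one level
                left = self.root
                self.root = Node([max_key(left), max_key(right)], [left, right])


def leaves(node: Node) -> List[int]:
    """Elements in left-to-right leaf order."""
    if not node.children:
        return [node.value]
    return [v for c in node.children for v in leaves(c)]


t = TwoFourTree()
for x in [2, 3, 4, 5, 7, 8, 9, 10, 12, 13, 14, 15, 16, 17, 11]:
    t.insert(x)
print(leaves(t.root))
```

A root split creates a new root with two children, which is the only way the height grows, so all leaves stay at the same depth.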

SLIDE 7

Analysis of (a, b)–Trees – Insertions Only

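Theorem: n insertions imply at most $n/\lfloor (b + 1)/2 \rfloor^h$ splits at height h, i.e., $O(n/b)$ splits in total.

Proof:

  • Nodes are created due to splits; each split at height h creates one new node at height h
  • All nodes except the root have degree at least ⌊(b + 1)/2⌋, so a non-root node at height h has at least $\lfloor (b + 1)/2 \rfloor^h$ leaves below it
  • The number of nodes on the lowest level dominates the total over all other levels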
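Summing the per-height bound gives the total. A worked version, assuming the tree starts empty, leaves are at height 0, and writing k = ⌊(b + 1)/2⌋:

```latex
% b \ge 2a - 1 \ge 3, hence k \ge 2 and k \ge b/2
\#\text{splits} \;\le\; \sum_{h \ge 1} \frac{n}{k^{h}}
  \;=\; \frac{n}{k-1}
  \;\le\; \frac{2n}{k}
  \;\le\; \frac{4n}{b}
  \;=\; O(n/b)
```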

SLIDE 8

Analysis of (a, b)–Trees

Theorem: If b ≥ 2a, then i insertions and d deletions perform at most $O(\delta^h (i + d))$ splits and fusions at height h, where δ < 1 depends on a and b.

Proof (sketch): An amortization argument in which each node has a potential φ measuring how unbalanced it is.

[Figure: the potential φ of a node as a function of its degree, from a − 1 to b + 1]

✷

Theorem: If b ≥ 2a, then the total number of splits and fusions is O(i + d). If b ≥ (2 + ε)a for some ε > 0, the number of node splits and node fusions is $O\left(\frac{i + d}{a}\right)$.
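The O(i + d) total follows by summing the per-height bound $O(\delta^h (i + d))$ over all heights, assuming the constant hidden in that bound (c below) does not depend on h:

```latex
\sum_{h \ge 1} c\,\delta^{h}(i + d) \;\le\; \frac{c\,(i + d)}{1 - \delta} \;=\; O(i + d)
  \qquad (0 \le \delta < 1 \text{ fixed by } a \text{ and } b)
```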

SLIDE 9

Analysis of (a, b)–Trees

Theorem: (B/3, B)–trees perform Θ(1/B) rebalancing per update.

Theorem: (⌊B/2⌋, B)–trees perform Θ(1) rebalancing per update.

Theorem: (⌈B/2⌉, B)–trees perform $\Theta(\log_B N)$ rebalancing per update if B is odd.
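One way to see why odd B is the bad case for a = ⌈B/2⌉: then b = 2a − 1, so a split leaves two nodes of degree exactly a and a fusion produces a node of degree exactly b. An alternating sequence of insertions and deletions can therefore trigger a split or a fusion on every level of a search path. The Python sketch below only checks this arithmetic for the three parameter choices; `transitions` and `no_slack` are hypothetical helpers, and the check is intuition, not a proof of the amortized bounds.

```python
def transitions(a: int, b: int):
    """Degrees right after a split and right after a fusion in an (a, b)-tree."""
    split = ((b + 1) // 2, (b + 2) // 2)   # floor((b+1)/2), ceil((b+1)/2)
    fused = (a - 1) + a                    # underfull node + sibling of degree a
    return split, fused


def no_slack(a: int, b: int) -> bool:
    """True if a split leaves both halves at the minimum degree a and a
    fusion yields a node at the maximum degree b (equivalently b = 2a - 1)."""
    split, fused = transitions(a, b)
    return split == (a, a) and fused == b


for B in (7, 8):
    for label, a in (("ceil(B/2)", (B + 1) // 2), ("floor(B/2)", B // 2), ("B/3", B // 3)):
        split, fused = transitions(a, B)
        print(f"B={B} a={a} ({label}): split->{split}, fusion->{fused}, "
              f"no slack: {no_slack(a, B)}")
```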

SLIDE 10

Lower Bound for Searching

Theorem: Searching for an element among N elements in external memory requires $\Omega(\log_{B+1} N)$ I/Os.

Proof (sketch):

  • Adversary argument
  • The algorithm knows the total order of the stored elements
  • Initially all elements are candidates for being the query element
  • If prior to an I/O there are C candidate elements left, the adversary can answer so that at least (C − B)/(B + 1) candidates remain after the B elements are read: the unread candidates (at least C − B of them) fall into at most B + 1 gaps between the elements read, and the adversary places the query element in the largest gap

✷

Note: The lower bound holds even if an I/O can read B arbitrary elements from memory.
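Unrolling the adversary's guarantee makes the final step explicit. A derivation, where C_t denotes the number of candidates left after t I/Os (so C_0 = N), assuming the search cannot stop while a candidate remains:

```latex
C_{t+1} \;\ge\; \frac{C_t - B}{B + 1}
  \quad\Longleftrightarrow\quad
C_{t+1} + 1 \;\ge\; \frac{C_t + 1}{B + 1}
% unrolling t times
C_t + 1 \;\ge\; \frac{N + 1}{(B + 1)^{t}}
% the search needs C_t = 0:
(B + 1)^{t} \;\ge\; N + 1
  \quad\Longrightarrow\quad
t \;\ge\; \log_{B+1}(N + 1) \;=\; \Omega(\log_{B+1} N)
```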
