Massive Data Algorithmics Lecture 4: External Search Trees


SLIDE 1

Introduction Weight-balanced B-tree Persistent trees

Massive Data Algorithmics

Lecture 4: External Search Trees

SLIDE 2

Database queries

A database query may ask for all employees with age between a1 and a2, and salary between s1 and s2

[Figure: employees plotted as points by date of birth and salary; the date-of-birth range 19,500,000 to 19,559,999 is marked, with example point G. Ometer, born Aug 16, 1954, salary $3,500]

SLIDE 3

Balanced binary search trees

A balanced binary search tree with the points in the leaves

[Figure: balanced binary search tree over the keys 3, 10, 19, 23, 30, 37, 49, 59, 62, 70, 80, 89, 93, 97]

SLIDE 4

Balanced binary search trees

The search path for 25

[Figure: the same tree over the keys 3, 10, 19, 23, 30, 37, 49, 59, 62, 70, 80, 89, 93, 97]

SLIDE 5

Balanced binary search trees

The search paths for 25 and for 90

[Figure: the same tree over the keys 3, 10, 19, 23, 30, 37, 49, 59, 62, 70, 80, 89, 93, 97]

SLIDE 6

Example 1D range query

A 1-dimensional range query with [25, 90]

[Figure: the same tree over the keys 3, 10, 19, 23, 30, 37, 49, 59, 62, 70, 80, 89, 93, 97]

SLIDE 7

Example 1D range query

A 1-dimensional range query with [61, 90]

[Figure: the same tree over the keys 3, 10, 19, 23, 30, 37, 49, 59, 62, 70, 80, 89, 93, 97, with the split node marked]

SLIDE 8

Node types for a query

Three types of nodes for a given query:

  • White nodes: never visited by the query
  • Grey nodes: visited by the query, unclear if they lead to output
  • Black nodes: visited by the query, whole subtree is output

SLIDE 9

Examining 1D range queries

For any 1D range query, we can identify O(log n) nodes whose subtrees together contain exactly the answers to the query
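The O(log n) nodes can be collected with a short recursion. The following is a minimal sketch, assuming an in-memory tree with the keys in the leaves; the names `Node`, `build` and `canonical` are illustrative and do not come from the lecture.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    lo: int                          # smallest key in this subtree
    hi: int                          # largest key in this subtree
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(keys: List[int]) -> Node:
    """Build a balanced tree over sorted keys; the keys sit in the leaves."""
    if len(keys) == 1:
        return Node(keys[0], keys[0])
    mid = len(keys) // 2
    left, right = build(keys[:mid]), build(keys[mid:])
    return Node(left.lo, right.hi, left, right)


def canonical(v: Node, a: int, b: int, out: List[Node]) -> None:
    """Collect the maximal subtrees whose keys all lie in [a, b]."""
    if v.hi < a or v.lo > b:         # disjoint from the query: stop here
        return
    if a <= v.lo and v.hi <= b:      # "black" node: whole subtree is output
        out.append(v)
        return
    canonical(v.left, a, b, out)     # "grey" node: recurse into both children
    canonical(v.right, a, b, out)
```

Reporting the answers then amounts to walking the subtrees of the collected nodes, which is where the output-sensitive term in the query bound comes from.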

SLIDE 10

Toward 2D range queries

For any 2D range query, we can identify O(log n) nodes that together represent all points whose first coordinate lies in the query range

SLIDE 11

Toward 2D range queries

[Figure: the example point set (3, 8), (1, 5), (4, 2), (5, 9), (6, 7), (8, 1), (7, 3), (9, 4)]

SLIDE 13

Toward 2D range queries

[Figure: the point set (3, 8), (1, 5), (4, 2), (5, 9), (6, 7), (8, 1), (7, 3), (9, 4), with a data structure for searching on y-coordinate]

SLIDE 14

Toward 2D range queries

[Figure: the point set (3, 8), (1, 5), (4, 2), (5, 9), (6, 7), (8, 1), (7, 3), (9, 4), shown twice]

SLIDE 15

2D range trees

Every internal node stores a whole tree in an associated structure, on y-coordinate.

Question: How much storage does this take?
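A small in-memory sketch may help with the storage question. It assumes a sorted list of y-values stands in for the associated tree at each node; `RangeNode`, `count` and `storage` are illustrative names, not the lecture's. Each point appears in the associated structure of every node on its root-to-leaf path, so the total storage is O(n log n).

```python
from bisect import bisect_left, bisect_right
from typing import List, Optional, Tuple

Point = Tuple[int, int]


class RangeNode:
    def __init__(self, pts: List[Point]):
        # pts must be sorted by x-coordinate
        self.xlo, self.xhi = pts[0][0], pts[-1][0]
        # associated structure: here a y-sorted list instead of a tree
        self.ys = sorted(p[1] for p in pts)
        self.left: Optional["RangeNode"] = None
        self.right: Optional["RangeNode"] = None
        if len(pts) > 1:
            mid = len(pts) // 2
            self.left = RangeNode(pts[:mid])
            self.right = RangeNode(pts[mid:])


def count(v: Optional[RangeNode], x1: int, x2: int, y1: int, y2: int) -> int:
    """Count points with x in [x1, x2] and y in [y1, y2]."""
    if v is None or v.xhi < x1 or v.xlo > x2:
        return 0
    if x1 <= v.xlo and v.xhi <= x2:
        # x-range fully covered: answer from the associated structure
        return bisect_right(v.ys, y2) - bisect_left(v.ys, y1)
    return count(v.left, x1, x2, y1, y2) + count(v.right, x1, x2, y1, y2)


def storage(v: Optional[RangeNode]) -> int:
    """Total number of entries over all associated structures."""
    return 0 if v is None else len(v.ys) + storage(v.left) + storage(v.right)
```

For the eight example points the tree has four levels, and every level stores all eight y-values once.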

SLIDE 16

2D range queries

[Figure: 2D range query in the tree, with nodes labelled ν, µ, µ′ and points p]

SLIDE 17

2D range queries

[Figure: 2D range query in the tree, with nodes labelled ν, µ, µ′]

SLIDE 18

Secondary Structures

When secondary structures are used, a rebalancing operation on a node v often requires O(w(v)) I/Os, where w(v) is the weight of v (the number of elements stored below v).

  • If Ω(w(v)) inserts have to be made below v between consecutive rebalancing operations on v
    ⇒ O(1) amortized split bound
    ⇒ O(log_B N) amortized insert bound

Nodes in a standard B-tree do not have this property.
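The two implications can be written out as a charging argument. This is a sketch under the assumption that a rebalance of v costs at most c_1 w(v) I/Os and is preceded by at least c_2 w(v) inserts below v; the constants c_1 and c_2 are introduced here for illustration only.

```latex
\[
\text{amortized split cost charged to one insert at } v
  \;\le\; \frac{c_1\, w(v)}{c_2\, w(v)} \;=\; O(1),
\qquad
\text{amortized insert cost}
  \;=\; \underbrace{O(\log_B N)}_{\text{nodes on the search path}} \cdot\, O(1)
  \;=\; O(\log_B N).
\]
```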

SLIDE 19

BB[α]-tree

In internal memory, BB[α]-trees have the desired property. They are defined using a weight constraint:

  • The ratio between the weight of the left child and the weight of the right child of a node v is between α and 1−α (α < 1)
    ⇒ Height: O(log N)

If 2/11 < α < 1 − (1/2)√2, rebalancing can be performed using rotations.

It seems hard to implement BB[α]-trees I/O-efficiently.

SLIDE 20

Weight-balanced B-tree

Idea: Combination of B-tree and BB[α]-tree

  • Weight constraint on nodes instead of degree constraint
  • Rebalancing performed using split/fuse as in B-tree

Weight-balanced B-tree with parameters b and k (b > 8, k ≥ 8):

  • All leaves are on the same level and contain between k/4 and k elements
  • An internal node v at level l has w(v) < b^l k
  • Except for the root, an internal node v at level l has w(v) > (1/4) b^l k
  • The root has more than one child
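The weight constraints above can be checked with a few comparisons. This is a minimal sketch assuming leaves are at level 0 and that the weight of each node is already known; the function name `satisfies_weight_constraint` is hypothetical.

```python
from fractions import Fraction


def satisfies_weight_constraint(weight: int, level: int, b: int, k: int,
                                is_root: bool = False) -> bool:
    """Check the weight constraint for one node (assumption: leaves are level 0)."""
    if level == 0:
        # leaf: between k/4 and k elements
        return Fraction(k, 4) <= weight <= k
    upper = b ** level * k            # internal node: w(v) < b^l k
    if weight >= upper:
        return False
    # the lower bound w(v) > (1/4) b^l k does not apply to the root
    return is_root or weight > Fraction(upper, 4)
```

Exact fractions avoid rounding issues when k or b^l k is not divisible by 4.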

SLIDE 21

Weight-balanced B-tree

Every internal node has degree between (1/4) b^l k / (b^(l−1) k) = b/4 and b^l k / ((1/4) b^(l−1) k) = 4b
⇒ Height: O(log_b (N/k))

External memory:

  • Choose 4b = B (or even B^c for 0 < c ≤ 1)
  • k = B

⇒ O(N/B) space, O(log_B (N/B) + T/B) query
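To get a feeling for the numbers, the parameter choice 4b = B, k = B can be evaluated directly. The helper `wbb_parameters` below is a hypothetical illustration; it reports log_b(N/k) as the quantity inside the O(·) height bound, not an exact height.

```python
import math
from typing import Dict


def wbb_parameters(B: int, N: int) -> Dict[str, float]:
    """Derive b, k and the resulting degree range for block size B."""
    b = B // 4                        # choose 4b = B
    k = B                             # leaf parameter k = B
    return {
        "b": b,
        "k": k,
        "min_degree": b / 4,          # (1/4) b^l k / (b^(l-1) k)
        "max_degree": 4 * b,          # b^l k / ((1/4) b^(l-1) k)
        "log_b_N_over_k": math.log(N / k, b),
    }
```

With B = 64 and N = 2^20 this gives b = 16, degrees between 4 and 64, and log_b(N/k) = 3.5.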

SLIDE 22

Weight-balanced B-tree Insert

Search for the relevant leaf u and insert the new element. Traverse the path from u to the root:

  • If a level-l node v now has w(v) = b^l k + 1, split it into nodes v′ and v′′ with
    w(v′) ≥ ⌊(1/2)(b^l k + 1)⌋ − b^(l−1) k and w(v′′) ≤ ⌈(1/2)(b^l k + 1)⌉ + b^(l−1) k

The algorithm is correct since b^(l−1) k ≤ (1/8) b^l k, so that w(v′) ≥ (3/8) b^l k and w(v′′) ≤ (5/8) b^l k

  • It touches O(log_b (N/k)) nodes

Weight-balance property: Ω(b^l k) updates below v′ and v′′ before the next rebalance operation
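The split has to happen at a child boundary, which is why the two halves can deviate from the exact middle by up to one child weight (less than b^(l−1) k). A minimal sketch of one way to pick the boundary follows, assuming the child weights of v are given as a list; `split_children` is an illustrative name and the greedy rule is an assumption, not necessarily the lecture's procedure.

```python
from typing import List, Tuple


def split_children(child_weights: List[int]) -> Tuple[List[int], List[int]]:
    """Split at a child boundary: take the longest prefix whose total
    weight does not exceed half of the node weight (rounded up)."""
    total = sum(child_weights)
    half = (total + 1) // 2
    prefix, i = 0, 0
    while i < len(child_weights) and prefix + child_weights[i] <= half:
        prefix += child_weights[i]
        i += 1
    return child_weights[:i], child_weights[i:]
```

Because the prefix stops only when the next child would overshoot, both parts stay within one child weight of half the total, matching the bounds stated above.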

SLIDE 23

Weight-balanced B-tree Delete

Search for the relevant leaf u and delete the element. Traverse the path from u to the root:

  • If a level-l node v now has w(v) = (1/4) b^l k − 1, fuse it with a sibling into a node v′ with (2/4) b^l k − 1 ≤ w(v′) ≤ (5/4) b^l k − 1
  • If now w(v′) ≥ (7/8) b^l k, split v′ into nodes with weight ≥ (7/16) b^l k − 1 − b^(l−1) k ≥ (5/16) b^l k − 1 and ≤ (5/8) b^l k + b^(l−1) k ≤ (6/8) b^l k

The algorithm is correct and touches O(log_b (N/k)) nodes.

Weight-balance property: Ω(b^l k) updates below v′ and v′′ before the next rebalance operation
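The fuse-then-split rule can be traced on weights alone. The sketch below assumes an idealized even split (a real split happens at a child boundary and may deviate by up to b^(l−1) k); `fuse_then_maybe_split` is a hypothetical helper.

```python
from typing import List


def fuse_then_maybe_split(w_v: int, w_sibling: int, level: int,
                          b: int, k: int) -> List[int]:
    """Return the weights of the node(s) that replace v and its sibling."""
    unit = b ** level * k             # b^l k
    fused = w_v + w_sibling           # weight of the fused node v'
    if 8 * fused >= 7 * unit:         # w(v') >= (7/8) b^l k: split again
        # idealized even split; an assumption made for illustration
        return [fused // 2, fused - fused // 2]
    return [fused]
```

For b = 16, k = 16 and level 1, an underfull node of weight 63 fused with a sibling of weight 255 is split into two nodes of weight 159, well inside the range allowed by the definition.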

SLIDE 24

Summary/Conclusion: Weight-balanced B-tree

Weight-balanced B-tree with branching parameter b and leaf parameter k = Ω(B)

  • O(N/B) space
  • Height O(log_b (N/k))
  • O(log_B N) rebalancing operations after update
  • Ω(w(v)) updates below v between consecutive operations on v

Weight-balanced B-tree with branching parameter B^c and leaf parameter B

  • Updates in O(log_B N) and queries in O(log_B N + T/B) I/Os

Construction bottom-up in O((N/B) log_{M/B} (N/B)) I/Os

SLIDE 25

Persistent B-tree

In some applications we are interested in being able to access previous versions of a data structure

  • Databases
  • Geometric data structures (later)

Partial persistence:

  • Update current version (getting new version)
  • Query all versions

We would like to have a partially persistent B-tree with

  • O(N/B) space, where N is the number of updates performed
  • O(log_B N) update
  • O(log_B N + T/B) query in any version

SLIDE 26

Persistent B-tree

Easy way to make a B-tree partially persistent:

  • Copy the structure at each operation
  • Maintain a version-access structure (B-tree)

Good: O(log_B N) query in any version, but

  • O(N/B) I/Os per update
  • O(N^2/B) space
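The copying approach is easy to sketch in memory, and the sketch makes the quadratic space visible. It assumes a sorted Python list stands in for the B-tree and a plain list of versions stands in for the version-access structure; `CopyPersistentSet` is an illustrative name.

```python
import bisect
from typing import List


class CopyPersistentSet:
    """Naive partial persistence: every update copies the whole structure."""

    def __init__(self) -> None:
        self.versions: List[List[int]] = [[]]    # version 0 is empty

    def insert(self, x: int) -> None:
        new = list(self.versions[-1])            # full copy per update
        bisect.insort(new, x)
        self.versions.append(new)

    def contains(self, version: int, x: int) -> bool:
        v = self.versions[version]
        i = bisect.bisect_left(v, x)
        return i < len(v) and v[i] == x

    def total_stored(self) -> int:
        # sum of all version sizes: grows quadratically with the updates
        return sum(len(v) for v in self.versions)
```

After N inserts the versions hold 1 + 2 + ... + N elements in total, which is the in-memory analogue of the O(N^2/B) space bound.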

SLIDE 27

Persistent B-tree

Idea: elements are augmented with an existence interval and stored in one structure.

Persistent B-tree with parameter b (> 16):

  • Directed graph
      * Nodes contain elements augmented with existence intervals
      * At any time t, the nodes with elements alive at time t form a B-tree with leaf and branching parameter b
  • B-tree with leaf and branching parameter b on the indegree-0 nodes (roots)

If b = B: query at any time t in O(log_B N + T/B) I/Os
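The existence-interval idea can be illustrated independently of the tree. The sketch below assumes half-open intervals [born, died) over integer version numbers and scans a flat list instead of navigating a persistent B-tree; `Element` and `range_query` are illustrative names.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Element:
    key: int
    born: int                    # version in which the element was inserted
    died: float = math.inf       # version in which it was deleted (inf: alive)

    def alive_at(self, t: int) -> bool:
        return self.born <= t < self.died


def range_query(elements: List[Element], lo: int, hi: int, t: int) -> List[int]:
    """Keys in [lo, hi] that were alive in version t (linear scan)."""
    return sorted(e.key for e in elements if e.alive_at(t) and lo <= e.key <= hi)
```

A deletion only closes an interval, so old versions remain queryable without copying anything.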

SLIDE 28

Persistent B-tree: Updates

Updates are performed as in a B-tree.

Alive block: a block containing at least one element alive in the current version. Each alive block must contain at least (1/4)B alive elements.

To obtain linear space we maintain the new-node invariant:

  • A new node contains between (3/8)B and (7/8)B alive elements and no dead elements

SLIDE 29

Persistent B-tree Insert

Search for the relevant leaf u and insert the new element. If u contains B+1 elements: block overflow

  • Version split: mark u dead and create a new node u′ with the x alive elements of u
  • If x > (7/8)B: strong overflow
  • If x < (3/8)B: strong underflow
  • If (3/8)B ≤ x ≤ (7/8)B: recursively persistently update parent(u): delete the reference to u (dead reference) and insert a reference to u′ (alive reference)
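The three cases after a version split depend only on x, the number of alive elements copied into u′. A minimal sketch, assuming a block is given as a list of (key, is_alive) pairs; the function name `version_split` and the case labels are chosen here for illustration.

```python
from typing import List, Tuple


def version_split(entries: List[Tuple[int, bool]], B: int) -> Tuple[List[int], str]:
    """Copy the alive keys of a block into a new node and classify it."""
    alive = [key for key, is_alive in entries if is_alive]
    x = len(alive)
    if 8 * x > 7 * B:                 # x > (7/8)B
        case = "strong overflow"
    elif 8 * x < 3 * B:               # x < (3/8)B
        case = "strong underflow"
    else:                             # new-node invariant already holds
        case = "ok"
    return alive, case
```

Comparing 8x against 7B and 3B keeps the thresholds exact in integer arithmetic.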

SLIDE 34

Persistent B-tree Insert

Strong overflow (x > (7/8)B):

  • Split u′ into u′ and u′′ with x/2 elements each ((3/8)B ≤ x/2 ≤ (1/2)B)
  • Recursively update parent(u): delete the reference to u and insert references to u′ and u′′

Strong underflow (x < (3/8)B):

  • Merge the x elements with the y alive elements obtained by a version split on a sibling ((1/2)B ≤ x+y ≤ (11/8)B)
  • If x+y > (7/8)B (strong overflow), split into nodes with (x+y)/2 elements each ((7/16)B ≤ (x+y)/2 ≤ (11/16)B)
  • Recursively update parent(u): delete two references and insert one or two references
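Putting the overflow and underflow cases together, the sizes of the newly created nodes can be computed from x, B and the sibling's alive count. This is a sketch on counts only, with the hypothetical helper `new_node_sizes`; it does not move any elements or update parents.

```python
from typing import List


def new_node_sizes(x: int, B: int, sibling_alive: int = 0) -> List[int]:
    """Sizes of the nodes created after a version split left x alive elements."""
    if 8 * x > 7 * B:                         # strong overflow: split in two
        return [x // 2, x - x // 2]
    if 8 * x < 3 * B:                         # strong underflow: merge with the
        total = x + sibling_alive             # sibling's alive elements
        if 8 * total > 7 * B:                 # merged node overflows: split
            return [total // 2, total - total // 2]
        return [total]
    return [x]                                # already within the invariant
```

In every case the resulting sizes land between (3/8)B and (7/8)B, which is the new-node invariant, provided the sibling itself holds at least (1/4)B alive elements.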

SLIDE 35

Persistent B-tree Delete

Search for the relevant leaf u and mark the element dead. If u contains x < (1/4)B alive elements: block underflow

  • Version split: mark u dead and create a new node u′ with the x alive elements
  • Strong underflow (x < (3/8)B): merge (version split on sibling) and possibly split (strong overflow)
  • Recursively update parent(u): delete two references and insert one or two references

SLIDE 36

Persistent B-tree Updates

SLIDE 37

Persistent B-tree Analysis

Update: O(log_B N)

  • Search and rebalance on one root-leaf path

Space: O(N/B)

  • At least (1/8)B updates in a leaf before rebalancing
  • At most 2 · N/(B/8) leaves created after N updates
  • The number of nodes created l levels up is bounded by 2^(l+1) · N/(B/8)^(l+1)
  • ⇒ ∑_l 2^(l+1) · N/(B/8)^(l+1) = O(N/B) blocks
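The final sum is a geometric series. A short worked version, assuming B > 16 so that the ratio 16/B is below 1 (this matches the requirement b > 16 when b = B):

```latex
\[
\sum_{l \ge 0} \frac{2^{\,l+1} N}{(B/8)^{\,l+1}}
  \;=\; N \sum_{l \ge 0} \left(\frac{16}{B}\right)^{l+1}
  \;=\; N \cdot \frac{16/B}{1 - 16/B}
  \;=\; \frac{16N}{B - 16}
  \;=\; O\!\left(\frac{N}{B}\right).
\]
```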

SLIDE 38

Summary/Conclusion: Persistent B-tree

Persistent B-tree

  • Update current version
  • Query all versions

Efficient implementation obtained using existence intervals

  • Standard technique

During N operations

  • O(N/B) space
  • O(log_B N) updates
  • O(log_B N + T/B) query

SLIDE 39

References

External Memory Geometric Data Structures Lecture notes by Lars Arge.

  • Section 3-4
