Massive Data Algorithmics Lecture 7: Range Searching

SLIDE 1

  • Three-Sided Range Queries
  • Internal Priority Search Tree
  • Externalizing Priority Search Tree

Massive Data Algorithmics

Lecture 7: Range Searching

SLIDE 2

Three-Sided Range Queries

Interval management: 1.5-dimensional search

More general 2D problem: dynamic 3-sided range searching

  • Maintain a set of points in the plane such that, given a query (q1,q2,q3), all points (x,y) with q1 ≤ x ≤ q2 and y ≥ q3 can be found efficiently
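
To make the query definition concrete, here is a brute-force reference sketch that scans every point. The function name `three_sided_query` and the sample points are assumptions for illustration and do not come from the lecture; the structures on the following slides aim to beat this linear scan.

```python
from typing import List, Tuple

Point = Tuple[float, float]

def three_sided_query(points: List[Point], q1: float, q2: float, q3: float) -> List[Point]:
    """Return all points (x, y) with q1 <= x <= q2 and y >= q3 by a linear scan."""
    return [(x, y) for (x, y) in points if q1 <= x <= q2 and y >= q3]

# Hypothetical sample data.
points = [(1, 5), (2, 1), (4, 7), (6, 3), (8, 9)]
result = three_sided_query(points, 2, 6, 3)
print(result)  # [(4, 7), (6, 3)]
```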

SLIDE 3

Three-Sided Range Queries: Static Solution

Static solution:

  • Sweep top-down, inserting x into a persistent B-tree at (x,y)
  • Answer a query by performing a range query with [q1,q2] in the B-tree version at q3

Optimal:

  • O(N/B) space
  • O(log_B N + T/B) query
  • O((N/B) log_{M/B} (N/B)) construction

Dynamic? In internal memory: priority search tree
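
The sweep idea can be sketched in internal memory as follows. This is only an illustration under a loud assumption: every version is stored as a full sorted copy (quadratic space), standing in for a persistent B-tree, which would share structure between versions. The class name `StaticThreeSided` is hypothetical.

```python
from bisect import bisect_left, bisect_right, insort

class StaticThreeSided:
    """Top-down sweep: version k holds the k highest points, sorted by x.

    Assumption: versions are full copies (quadratic space), used here in
    place of a persistent B-tree.
    """

    def __init__(self, points):
        by_y = sorted(points, key=lambda p: -p[1])    # sweep from highest y down
        self.neg_ys = [-y for (_, y) in by_y]         # ascending, usable with bisect
        self.versions = [[]]                          # version 0 is empty
        current = []
        for p in by_y:
            insort(current, p)                        # insert x (with its y) in sorted order
            self.versions.append(list(current))       # snapshot after this insertion

    def query(self, q1, q2, q3):
        k = bisect_right(self.neg_ys, -q3)            # number of points with y >= q3
        version = self.versions[k]                    # the version "at q3"
        lo = bisect_left(version, (q1, float("-inf")))
        hi = bisect_right(version, (q2, float("inf")))
        return version[lo:hi]                         # range query with [q1, q2]

s = StaticThreeSided([(1, 5), (2, 1), (4, 7), (6, 3), (8, 9)])
print(s.query(2, 6, 3))  # [(4, 7), (6, 3)]
```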

SLIDE 4

Internal Priority Search Tree

Base tree on x-coordinates with nodes augmented with points

Heap on y-coordinates:

  • Decreasing y values on root-leaf path
  • (x,y) on path from root to leaf holding x
  • If v holds point then parent(v) holds point

SLIDE 5

Internal Priority Search Tree: Insert

Linear space

Insert of (x,y) (assuming fixed x-coordinate set):

  • Compare y with y-coordinate in root
  • Smaller: Recursively insert (x,y) in subtree on path to x
  • Bigger: Insert in root and recursively insert old point in the subtree on the path to its own x-coordinate

⇒ O(log N) update
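
The insertion steps above can be sketched as follows. This is a minimal illustration, assuming a balanced base tree built once over a fixed sorted list of x-coordinates and at most one point per x-coordinate; the names `Node` and `insert` and the sample data are hypothetical.

```python
class Node:
    """Base-tree node over xs[lo:hi]; holds at most one point."""

    def __init__(self, xs, lo, hi):
        self.point = None
        self.left = self.right = None
        if hi - lo > 1:                      # internal node: split the x-range
            mid = (lo + hi) // 2
            self.split = xs[mid - 1]         # largest x in the left subtree
            self.left = Node(xs, lo, mid)
            self.right = Node(xs, mid, hi)

def insert(node, p):
    """Insert p = (x, y). Assumption: each x of the fixed set is used at most once."""
    while True:
        if node.point is None:               # free slot on the path to x
            node.point = p
            return
        if p[1] > node.point[1]:             # bigger: take the slot, carry the old point down
            node.point, p = p, node.point
        # continue in the child on the path to the x of the carried point
        node = node.left if p[0] <= node.split else node.right

xs = [1, 2, 4, 6, 8]                         # fixed x-coordinate set (hypothetical)
root = Node(xs, 0, len(xs))
for p in [(4, 7), (1, 5), (8, 9), (2, 1), (6, 3)]:
    insert(root, p)
print(root.point)  # (8, 9)
```

Each insertion walks a single root-to-leaf path, which is where the O(log N) update bound comes from.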

SLIDE 10

Internal Priority Search Tree: Query

Query with (q1,q2,q3) starting at root v:

  • Report point in v if it satisfies the query
  • Visit both children of v if point reported
  • Always visit the child(ren) of v on the path(s) to q1 and q2

⇒ O(log N + T) query
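
A minimal query sketch follows. It uses the usual pruning formulation rather than the slide's wording: stop at a node whose point lies below q3 (heap order guarantees everything underneath is lower still), otherwise descend into every child whose x-range meets [q1,q2]. Assumptions: the tree is built statically from points sorted by x with distinct x-coordinates, nodes are plain tuples, and the names `build` and `query` are hypothetical.

```python
def build(pts):
    """Build from points sorted by x. Node = (point, split, left, right)."""
    if not pts:
        return None
    top = max(range(len(pts)), key=lambda i: pts[i][1])   # highest point goes in this node
    rest = pts[:top] + pts[top + 1:]
    mid = (len(rest) + 1) // 2
    split = rest[mid - 1][0] if rest else pts[top][0]     # left subtree holds x <= split
    return (pts[top], split, build(rest[:mid]), build(rest[mid:]))

def query(node, q1, q2, q3, out):
    if node is None:
        return
    (x, y), split, left, right = node
    if y < q3:                     # heap order: no point below can satisfy y >= q3
        return
    if q1 <= x <= q2:
        out.append((x, y))         # report the point stored in this node
    if q1 <= split:                # left subtree may contain x in [q1, q2]
        query(left, q1, q2, q3, out)
    if q2 > split:                 # right subtree may contain x in [q1, q2]
        query(right, q1, q2, q3, out)

tree = build([(1, 5), (2, 1), (4, 7), (6, 3), (8, 9)])
found = []
query(tree, 2, 6, 3, found)
print(sorted(found))  # [(4, 7), (6, 3)]
```

Visited nodes are either on one of the two search paths or have a parent whose point was reported, which matches the O(log N + T) accounting.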

SLIDE 16

Externalizing Priority Search Tree

Natural idea: Block tree

Problem:

  • O(log_B N) I/Os to follow paths to q1 and q2
  • But O(T) I/Os may be used to visit other nodes ("overshooting")

⇒ O(log_B N + T) query

SLIDE 17

Externalizing Priority Search Tree

Solution idea:

  • Store B points in each node:
      * O(B^2) points stored in each supernode
      * B output points can pay for overshooting
  • Bootstrapping:
      * Store O(B^2) points in each supernode in static structure

SLIDE 18

Base Tree

Base tree: Weight-balanced B-tree with branching parameter B/4 and leaf parameter B on x-coordinates

Points in heap order:

  • Root stores B top points for each of the Θ(B) child slabs
  • Remaining points stored recursively

Points in each node stored in B^2-structure

  • Persistent B-tree structure for static problem

⇒ Linear space

SLIDE 19

Answering Queries

Query with (q1,q2,q3) starting at root v:

  • Query B^2-structure and report points satisfying query
  • Visit child u of v if
      * u is on the path to q1 or q2, or
      * All B points stored in v for u satisfy the query

SLIDE 22

Query Analysis

Analysis:

  • O(log_B B^2 + T_v/B) = O(1 + T_v/B) I/Os used to visit node v
  • O(log_B N) nodes on path to q1 or q2
  • For each visited node v not on path to q1 or q2, B points reported in parent(v)

⇒ O(log_B N + T/B) query
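
As a rough worked example of what the bound buys, the sketch below plugs hypothetical sizes into the two query bounds, ignoring constant factors. The values of N, B and T and the helper `log_ceil` are assumptions chosen for illustration, not figures from the lecture.

```python
def log_ceil(n: int, b: int) -> int:
    """Smallest k with b**k >= n, computed with integers only."""
    k, v = 0, 1
    while v < n:
        v *= b
        k += 1
    return k

# Hypothetical sizes: N points, block size B, output size T.
N, B, T = 10**9, 10**3, 10**5

path = log_ceil(N, B)            # log_B N term for the two search paths
blocked_tree = path + T          # O(log_B N + T): up to one I/O per reported point
external_pst = path + T // B     # O(log_B N + T/B): B reported points pay for each extra node

print(path, blocked_tree, external_pst)  # 3 100003 103
```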

SLIDE 23

Updates

Insert (x,y) (ignoring insert in base tree - rebalancing):

  • Find relevant node v, starting at the root:
      * Query B^2-structure of the current node to find the B points corresponding to the child u on the path to x
      * If y is smaller than the y-coordinates of all B points, then recursively search in u
  • Insert (x,y) in B^2-structure of v
  • If B^2-structure contains > B points for child u, remove lowest point and insert recursively in u

Delete: Similarly
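
The insertion procedure can be modelled in internal memory as follows. This is a sketch under several assumptions that are not part of the lecture: plain Python lists stand in for the B^2-structure, a fixed tree with fan-out F over integer x-coordinates in [0, F^h) replaces the weight-balanced B-tree, and B and F are tiny so the behaviour is easy to trace. The class and attribute names are hypothetical.

```python
B, F = 2, 4      # hypothetical block size and fan-out, kept tiny on purpose

class Node:
    """Covers x in [lo, lo + width); keeps the B highest points per child slab."""

    def __init__(self, lo, width):
        self.lo, self.width = lo, width
        self.slabs = [[] for _ in range(F)]   # list i stands in for child i's part of the B^2-structure
        self.children = [None] * F
        self.bottom = []                      # used only below the last level of slabs

    def insert(self, p):
        if self.width < F:                    # bottom level: no child slabs left
            self.bottom.append(p)
            return
        w = self.width // F
        i = (p[0] - self.lo) // w             # child slab on the path to x
        slab = self.slabs[i]
        if len(slab) == B and p[1] < min(q[1] for q in slab):
            down = p                          # lower than all B points: search continues in child
        else:
            slab.append(p)                    # insert into this node's structure
            if len(slab) <= B:
                return
            down = min(slab, key=lambda q: q[1])
            slab.remove(down)                 # more than B points: push the lowest one down
        if self.children[i] is None:
            self.children[i] = Node(self.lo + i * w, w)
        self.children[i].insert(down)

root = Node(0, F ** 2)                        # x-coordinates 0..15
for p in [(1, 5), (2, 1), (3, 7), (0, 9), (9, 4)]:
    root.insert(p)
print(sorted(root.slabs[0]))  # [(0, 9), (3, 7)]
```

In this model every node touched performs one lookup, one insertion and at most one removal on its slab list, mirroring the per-node work described above.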

SLIDE 28

Update Analysis

Analysis:

  • Update visits O(log_B N) nodes
  • B^2-structure queried/updated in each node
      * One query
      * One insert and one delete

B^2-structure analysis:

  • Query: O(log_B B^2 + B/B) = O(1) I/Os
  • Update: O(1) I/Os amortized using global rebuilding
      * Store updates in update block
      * Rebuild after B updates using O((B^2/B) log_{M/B} (B^2/B)) = O(B) I/Os

⇒ O(log_B N) update
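
Global rebuilding with an update block can be sketched as below. It is an in-memory illustration only: a sorted list stands in for the static structure, queries scan it instead of using a persistent B-tree, and the class name `B2Structure` and its `rebuilds` counter are hypothetical.

```python
class B2Structure:
    """Static point set plus an update block that is merged in after B updates."""

    def __init__(self, points, B):
        self.B = B
        self.static = sorted(points)     # stand-in for the static structure
        self.updates = []                # update block: ("ins" | "del", point)
        self.rebuilds = 0                # counts global rebuilds, for illustration

    def current(self):
        """Static points with the pending updates applied in order."""
        pts = set(self.static)
        for op, p in self.updates:
            if op == "ins":
                pts.add(p)
            else:
                pts.discard(p)
        return pts

    def _log(self, update):
        self.updates.append(update)
        if len(self.updates) == self.B:  # block full: rebuild the static part
            self.static = sorted(self.current())
            self.updates = []
            self.rebuilds += 1

    def insert(self, p):
        self._log(("ins", p))

    def delete(self, p):
        self._log(("del", p))

    def query(self, q1, q2, q3):
        return sorted(p for p in self.current() if q1 <= p[0] <= q2 and p[1] >= q3)

s = B2Structure([(1, 5), (4, 7)], B=3)
s.insert((6, 3))
s.delete((1, 5))
print(s.query(0, 10, 0), s.rebuilds)  # [(4, 7), (6, 3)] 0
s.insert((8, 9))                      # third update triggers the rebuild
print(s.static, s.rebuilds)           # [(4, 7), (6, 3), (8, 9)] 1
```

One rebuild is charged to the B updates that filled the block, which is how an O(B) rebuild turns into O(1) amortized per update.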

SLIDE 29

Dynamic Base Tree

Deletion:

  • Delete point as previously
  • Delete x-coordinate from base tree using global rebuilding

⇒ O(log_B N) I/Os

Insertion:

  • Insert x-coordinate in base tree and rebalance (using splits)
  • Insert point as previously

Split: Boundary in v becomes boundary in parent(v)

SLIDE 30

Dynamic Base Tree

Split: When v splits into v′ and v′′, B new points are needed in parent(v)

One point obtained from v′ (v′′) using bubble-up operation:

  • Find top point p in v′
  • Insert p in B^2-structure of parent(v)
  • Remove p from B^2-structure of v′
  • Recursively bubble-up point to v′

⇒ O(log_B N) I/Os

Bubble-up in O(log_B w(v)) I/Os

  • Follow one path from v to leaf
  • Uses O(1) I/O in each node

⇒ Split in O(B log_B w(v)) = O(w(v)) I/Os

SLIDE 31

Dynamic Base Tree

O(1) amortized split cost:

  • Cost: O(w(v))
  • Weight balanced base tree: Ω(w(v)) inserts below v between splits

⇓

External Priority Search Tree

  • Space: O(N/B)
  • Query: O(log_B N + T/B)
  • Update: O(log_B N) I/Os amortized

SLIDE 32

Summary/Conclusion: Range Search

We have now discussed structures for special cases of two-dimensional range searching

  • Space: O(N/B)
  • Query: O(log_B N + T/B)
  • Updates: O(log_B N)

Cannot be obtained for general (4-sided) 2D range searching:

  • O(log_B^c N) query requires Ω((N/B) · log_B N / log_B log_B N) space
  • O(N/B) space requires Ω(√(N/B)) query

SLIDE 33

References

External Memory Geometric Data Structures. Lecture notes by Lars Arge.

  • Section 7
