1/180
Geometric Algorithms: Range & windowing queries (2 lectures)
2/180
Database queries
A database query may ask for all employees with age between a1 and a2, and salary between s1 and s2
[Figure: employees plotted as points, with axes date of birth and salary; date-of-birth query range 19,500,000 to 19,559,999. Example record: G. Ometer, born Aug 16, 1954, salary $3,500]
3/180
Data structures
Idea of data structures
◮ Representation of structure, for convenience (like DCEL)
◮ Preprocessing of data, to be able to solve future questions really fast (sub-linear time)
A (search) data structure has a storage requirement, a query time, and a construction time (and an update time)
4/180
1D range query problem
1D range query problem: Preprocess a set of n points on the real line such that the ones inside a 1D query range (interval) can be reported fast
The points p1,..., pn are known beforehand, the query [x,x′] only later
A solution to a query problem is a data structure, a query algorithm, and a construction algorithm
Question: What are the most important factors for the efficiency of a solution?
5/180
Balanced binary search trees
A balanced binary search tree with the points in the leaves
[Figure: balanced binary search tree with leaves 3, 10, 19, 23, 30, 37, 49, 59, 62, 70, 80, 89, 93, 97]
6/180
Balanced binary search trees
The search path for 25
[Figure: balanced binary search tree with leaves 3, 10, 19, 23, 30, 37, 49, 59, 62, 70, 80, 89, 93, 97]
7/180
Balanced binary search trees
The search paths for 25 and for 90
[Figure: balanced binary search tree with leaves 3, 10, 19, 23, 30, 37, 49, 59, 62, 70, 80, 89, 93, 97]
8/180
Example 1D range query
A 1-dimensional range query with [25, 90]
[Figure: balanced binary search tree with leaves 3, 10, 19, 23, 30, 37, 49, 59, 62, 70, 80, 89, 93, 97]
9/180
Node types for a query
Three types of nodes for a given query:
◮ White nodes: never visited by the query
◮ Grey nodes: visited by the query, unclear if they lead to output
◮ Black nodes: visited by the query, whole subtree is output
Question: What query time do we hope for?
10/180
Node types for a query
The query algorithm comes down to what we do at each type of node
Grey nodes: use query range to decide how to proceed: to not visit a subtree (pruning), to report a complete subtree, or just continue
Black nodes: traverse and enumerate all points in the leaves
11/180
Example 1D range query
A 1-dimensional range query with [61, 90]
[Figure: balanced binary search tree with leaves 3, 10, 19, 23, 30, 37, 49, 59, 62, 70, 80, 89, 93, 97]
12/180
Example 1D range query
A 1-dimensional range query with [61, 90]
[Figure: balanced binary search tree with leaves 3, 10, 19, 23, 30, 37, 49, 59, 62, 70, 80, 89, 93, 97; the split node is marked]
13/180
1D range query algorithm
Algorithm 1DRANGEQUERY(T,[x : x′])
νsplit ← FINDSPLITNODE(T,x,x′)
if νsplit is a leaf
  then Check if the point in νsplit must be reported.
  else ν ← lc(νsplit)
       while ν is not a leaf
         do if x ≤ xν
              then REPORTSUBTREE(rc(ν))
                   ν ← lc(ν)
              else ν ← rc(ν)
       Check if the point stored in ν must be reported.
       Similarly, follow the path to x′, and ...
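The algorithm above can be sketched in Python as follows. This is a minimal sketch, not the lecture's implementation: it assumes distinct points, stores in each internal node the largest value of its left subtree, and fills in the elided walk toward x′ symmetrically. The names `Node`, `build`, `find_split_node`, `report_subtree`, and `range_query_1d` are chosen here for illustration, and `build` uses list slicing for brevity rather than a linear-time construction.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    key: float                        # leaf: the point; internal: max of left subtree
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    @property
    def is_leaf(self) -> bool:
        return self.left is None


def build(sorted_pts: List[float]) -> Node:
    """Balanced tree with the points in the leaves (input sorted, non-empty)."""
    if len(sorted_pts) == 1:
        return Node(sorted_pts[0])
    mid = (len(sorted_pts) - 1) // 2
    return Node(sorted_pts[mid], build(sorted_pts[:mid + 1]), build(sorted_pts[mid + 1:]))


def report_subtree(v: Node, out: List[float]) -> None:
    """Black nodes: enumerate every leaf below v."""
    if v.is_leaf:
        out.append(v.key)
    else:
        report_subtree(v.left, out)
        report_subtree(v.right, out)


def find_split_node(root: Node, x: float, x2: float) -> Node:
    """Node where the search paths to x and x2 diverge."""
    v = root
    while not v.is_leaf and (x2 <= v.key or x > v.key):
        v = v.left if x2 <= v.key else v.right
    return v


def range_query_1d(root: Node, x: float, x2: float) -> List[float]:
    """Report all points in [x, x2]; the output order is not sorted."""
    out: List[float] = []
    split = find_split_node(root, x, x2)
    if split.is_leaf:
        if x <= split.key <= x2:
            out.append(split.key)
        return out
    v = split.left                    # follow the path to x
    while not v.is_leaf:
        if x <= v.key:
            report_subtree(v.right, out)
            v = v.left
        else:
            v = v.right
    if x <= v.key <= x2:
        out.append(v.key)
    v = split.right                   # symmetric: follow the path to x2
    while not v.is_leaf:
        if x2 > v.key:
            report_subtree(v.left, out)
            v = v.right
        else:
            v = v.left
    if x <= v.key <= x2:
        out.append(v.key)
    return out
```

With the leaf values of the example tree, `range_query_1d(build(pts), 61, 90)` visits only the two search paths below the split node plus the reported subtrees.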
14/180
Query time analysis
The efficiency analysis is based on counting the numbers of nodes visited for each type
◮ White nodes: never visited by the query; no time spent
◮ Grey nodes: visited by the query, unclear if they lead to output; time determines dependency on n
◮ Black nodes: visited by the query, whole subtree is output; time determines dependency on k, the output size
15/180
Query time analysis
Grey nodes: they occur on only two paths in the tree, and since the tree is balanced, its depth is O(log n)
Black nodes: a (sub)tree with m leaves has m−1 internal nodes; traversal visits O(m) nodes and finds m points for the output
The time spent at each node is O(1) ⇒ O(log n + k) query time
16/180
Storage requirement and preprocessing
A (balanced) binary search tree storing n points uses O(n) storage A balanced binary search tree storing n points can be built in O(n) time after sorting
17/180
Result
Theorem: A set of n points on the real line can be preprocessed in O(n log n) time into a data structure of O(n) size so that any 1D range query can be answered in O(log n + k) time, where k is the number of answers reported
18/180
Range queries in 2D
◮ quadtrees: good in practice, but not so good worst-case query time
◮ Kd-trees: queries take O(√n + k) (Chapter 5.2)
◮ range trees: today
19/180
Back to 1D range queries
A 1-dimensional range query with [61, 90]
[Figure: balanced binary search tree with leaves 3, 10, 19, 23, 30, 37, 49, 59, 62, 70, 80, 89, 93, 97; the split node is marked]
20/180
Examining 1D range queries
Observation: Ignoring the search path leaves, all answers are jointly represented by the highest nodes strictly between the two search paths Question: How many highest nodes between the search paths can there be?
21/180
Examining 1D range queries
For any 1D range query, we can identify O(logn) nodes that together represent all answers to a 1D range query
22/180
Toward 2D range queries
For any 2d range query, we can identify O(logn) nodes that together represent all points that have a correct first coordinate
23/180
Toward 2D range queries
(3, 8) (1, 5) (4, 2) (5, 9) (6, 7) (8, 1) (7, 3) (9, 4)
24/180
Toward 2D range queries
(3, 8) (1, 5) (4, 2) (5, 9) (6, 7) (8, 1) (7, 3) (9, 4)
25/180
Toward 2D range queries
(3, 8) (1, 5) (4, 2) (5, 9) (6, 7) (8, 1) (7, 3) (9, 4) data structure for searching on y-coordinate
26/180
Toward 2D range queries
(3, 8) (1, 5) (4, 2) (5, 9) (6, 7) (8, 1) (7, 3) (9, 4) (3, 8) (1, 5) (4, 2) (5, 9) (6, 7) (8, 1) (7, 3) (9, 4)
27/180
2D range trees
Every internal node stores a whole tree in an associated structure, on y-coordinate Question: How much storage does this take?
28/180
Storage of 2D range trees
To analyze storage, two arguments can be used:
◮ By level: On each level, any point is stored exactly once. So all associated trees on one level together have O(n) size
◮ By point: For any point, it is stored in the associated structures of its search path. So it is stored in O(log n) of them
29/180
Construction algorithm
Algorithm BUILD2DRANGETREE(P)
Construct the associated structure: Build a binary search tree Tassoc on the set Py of y-coordinates in P
if P contains only one point
  then Create a leaf ν storing this point, and make Tassoc the associated structure of ν.
  else Split P into Pleft and Pright, the subsets ≤ and > the median x-coordinate xmid
       νleft ← BUILD2DRANGETREE(Pleft)
       νright ← BUILD2DRANGETREE(Pright)
       Create a node ν storing xmid, make νleft the left child of ν, make νright the right child of ν, and make Tassoc the associated structure of ν
return ν
30/180
Efficiency of construction
The construction algorithm takes O(n log^2 n) time
T(1) = O(1)
T(n) = 2·T(n/2) + O(n log n)
which solves to O(n log^2 n) time
31/180
Efficiency of construction
Suppose we pre-sort P on y-coordinate, and whenever we split P into Pleft and Pright, we keep the y-order in both subsets
For a sorted set, the associated structure can be built in linear time
32/180
Efficiency of construction
The adapted construction algorithm takes O(n log n) time
T(1) = O(1)
T(n) = 2·T(n/2) + O(n)
which solves to O(n log n) time
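Both construction recurrences can be unrolled level by level. The derivation below is a sketch that assumes n is a power of 2, logarithms base 2, and a constant c hidden in the O-notation; these are assumptions made here for illustration.

```latex
% Without pre-sorting: T(n) = 2T(n/2) + c n log n
\begin{align*}
T(n) &= \sum_{i=0}^{\log n - 1} 2^i \cdot c\,\frac{n}{2^i}\,\log\frac{n}{2^i} + n\,T(1)
      = c\,n \sum_{i=0}^{\log n - 1} (\log n - i) + O(n) \\
     &= c\,n \cdot \frac{\log n\,(\log n + 1)}{2} + O(n) = O(n \log^2 n).
\end{align*}
% With pre-sorting on y: T(n) = 2T(n/2) + c n
\begin{align*}
T(n) &= \sum_{i=0}^{\log n - 1} 2^i \cdot c\,\frac{n}{2^i} + n\,T(1)
      = c\,n \log n + O(n) = O(n \log n).
\end{align*}
```

The one-time pre-sort costs O(n log n) as well, so it does not change the second bound.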
33/180
2D range queries
How are queries performed and why are they correct?
◮ Are we sure that each answer is found?
◮ Are we sure that the same point is found only once?
34/180
2D range queries
ν µ µ′ p p p p
35/180
Query algorithm
Algorithm 2DRANGEQUERY(T,[x : x′]×[y : y′])
νsplit ← FINDSPLITNODE(T,x,x′)
if νsplit is a leaf
  then report the point stored at νsplit, if an answer
  else ν ← lc(νsplit)
       while ν is not a leaf
         do if x ≤ xν
              then 1DRANGEQ(Tassoc(rc(ν)),[y : y′])
                   ν ← lc(ν)
              else ν ← rc(ν)
       Check if the point stored at ν must be reported.
       Similarly, follow the path from rc(νsplit) to x′ ...
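A 2D range tree and its query can be sketched in Python as below. This is a sketch under stated assumptions, not the lecture's code: x-coordinates are distinct, the associated structure is a y-sorted list searched with `bisect` (a swapped-in stand-in for the binary search tree Tassoc), and every node re-sorts its points, so construction follows the O(n log^2 n) recurrence rather than the pre-sorted variant. The names `RangeTreeNode`, `report_y`, and `range_query_2d` are hypothetical.

```python
import bisect
from typing import List, Optional, Tuple

Point = Tuple[float, float]


class RangeTreeNode:
    """Main-tree node; assoc is a y-sorted list standing in for T_assoc."""

    def __init__(self, pts_by_x: List[Point]) -> None:
        # pts_by_x: non-empty, sorted by x, distinct x-coordinates assumed.
        self.assoc = sorted(pts_by_x, key=lambda p: p[1])
        self.ys = [p[1] for p in self.assoc]
        self.left: Optional["RangeTreeNode"] = None
        self.right: Optional["RangeTreeNode"] = None
        mid = (len(pts_by_x) - 1) // 2
        self.x_mid = pts_by_x[mid][0]
        if len(pts_by_x) > 1:
            self.left = RangeTreeNode(pts_by_x[:mid + 1])    # x <= x_mid
            self.right = RangeTreeNode(pts_by_x[mid + 1:])   # x > x_mid

    def is_leaf(self) -> bool:
        return self.left is None

    def report_y(self, y1: float, y2: float, out: List[Point]) -> None:
        """1D range query [y1, y2] on the associated structure."""
        lo = bisect.bisect_left(self.ys, y1)
        hi = bisect.bisect_right(self.ys, y2)
        out.extend(self.assoc[lo:hi])


def range_query_2d(root: RangeTreeNode, x1: float, x2: float,
                   y1: float, y2: float) -> List[Point]:
    out: List[Point] = []

    def check_leaf(v: RangeTreeNode) -> None:
        px, py = v.assoc[0]
        if x1 <= px <= x2 and y1 <= py <= y2:
            out.append(v.assoc[0])

    v = root                                   # find the split node on x
    while not v.is_leaf() and (x2 <= v.x_mid or x1 > v.x_mid):
        v = v.left if x2 <= v.x_mid else v.right
    if v.is_leaf():
        check_leaf(v)
        return out
    split = v
    v = split.left                             # path to x1
    while not v.is_leaf():
        if x1 <= v.x_mid:
            v.right.report_y(y1, y2, out)
            v = v.left
        else:
            v = v.right
    check_leaf(v)
    v = split.right                            # path to x2 (symmetric)
    while not v.is_leaf():
        if x2 > v.x_mid:
            v.left.report_y(y1, y2, out)
            v = v.right
        else:
            v = v.left
    check_leaf(v)
    return out
```

Each point is reported at most once because the subtrees handed to `report_y` are disjoint and the two path leaves are checked separately.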
36/180
2D range query time
Question: How much time does a 2D range query take? Subquestions: In how many associated structures do we search? How much time does each such search take?
37/180
2D range queries
ν µ µ′
38/180
2D range query efficiency
We search in O(log n) associated structures to perform a 1D range query; at most two per level of the main tree
The query time is O(log n) × O(log m + k′), or
∑_ν O(log n_ν + k_ν), where ∑_ν k_ν = k is the number of points reported
39/180
2D range query efficiency
Use the concept of grey and black nodes again:
40/180
2D range query efficiency
The number of grey nodes is O(log^2 n)
The number of black nodes is O(k) if k points are reported
The query time is O(log^2 n + k), where k is the size of the output
41/180
Result
Theorem: A set of n points in the plane can be preprocessed in O(n log n) time into a data structure of O(n log n) size so that any 2D range query can be answered in O(log^2 n + k) time, where k is the number of answers reported
In contrast, a kd-tree has O(n) size and answers queries in O(√n + k) time (Chapter 5.2).
42/180
Higher dimensional range trees
A d-dimensional range tree has a main tree which is a one-dimensional balanced binary search tree on the first coordinate, where every node has a pointer to an associated structure that is a (d−1)-dimensional range tree on the other coordinates
43/180
Storage
The size S_d(n) of a d-dimensional range tree satisfies:
S_1(n) = O(n) for all n
S_d(1) = O(1) for all d
S_d(n) ≤ 2·S_d(n/2) + S_{d−1}(n) for d ≥ 2
This solves to S_d(n) = O(n log^{d−1} n)
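The storage recurrence can be solved by induction on d. The sketch below assumes n is a power of 2, logarithms base 2, and a constant c with S_{d-1}(n) ≤ c n log^{d-2} n as the induction hypothesis (for d = 2 this is S_1(n) ≤ c n); it yields the O(n log^{d-1} n) storage bound of d-dimensional range trees.

```latex
\begin{align*}
S_d(n) &\le \sum_{i=0}^{\log n - 1} 2^i \, S_{d-1}\!\left(\frac{n}{2^i}\right) + n\,S_d(1) \\
       &\le \sum_{i=0}^{\log n - 1} 2^i \cdot c\,\frac{n}{2^i}\,\log^{d-2}\frac{n}{2^i} + O(n) \\
       &\le c\,n\,\log^{d-2} n \cdot \log n + O(n) = O(n \log^{d-1} n).
\end{align*}
```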
44/180
Query time
The number of grey nodes G_d(n) satisfies:
G_1(n) = O(log n) for all n
G_d(1) = O(1) for all d
G_d(n) ≤ 2·log n + 2·log n·G_{d−1}(n) for d ≥ 2
This solves to G_d(n) = O(log^d n)
45/180
Result
Theorem: A set of n points in d-dimensional space can be preprocessed in O(n log^{d−1} n) time into a data structure of O(n log^{d−1} n) size so that any d-dimensional range query can be answered in O(log^d n + k) time, where k is the number of answers reported
46/180
Improving the query time
We can improve the query time of a 2D range tree from O(log^2 n) to O(log n) by a technique called fractional cascading
This automatically lowers the query time in d dimensions to O(log^{d−1} n) time
47/180
Improving the query time
The idea is illustrated best by a different query problem:
Suppose that we have a collection of sets S_1,...,S_m, where |S_1| = n and where S_{i+1} ⊆ S_i
We want a data structure that can report for a query number x, the smallest value ≥ x in all sets S_1,...,S_m
48/180
Improving the query time
1 2 3 5 8 13 21 34 55 1 3 5 8 13 21 34 55 1 3 13 34 55 3 34 55 21
S1 S2 S3 S4
51/180
Improving the query time
Suppose that we have a collection of sets S_1,...,S_m, where |S_1| = n and where S_{i+1} ⊆ S_i
We want a data structure that can report for a query number x, the smallest value ≥ x in all sets S_1,...,S_m
This query problem can be solved in O(log n + m) time instead of O(m·log n) time
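The O(log n + m) bound comes from doing one binary search in the largest set and then following one stored pointer per remaining set. The Python sketch below illustrates this under stated assumptions: each set is a sorted list, each set is a subset of the previous one, and the example values in the usage note are an assumption loosely modelled on the slide figure. The names `build_pointers` and `successor_in_all` are hypothetical.

```python
import bisect
from typing import List, Optional


def build_pointers(sets: List[List[int]]) -> List[List[int]]:
    """ptr[i][j] = index in sets[i+1] of the smallest value >= sets[i][j].

    The extra entry ptr[i][len(sets[i])] covers "no value >= x" (past the end).
    Assumes every list is sorted and sets[i+1] is a subset of sets[i].
    """
    ptr: List[List[int]] = []
    for cur, nxt in zip(sets, sets[1:]):
        row: List[int] = []
        k = 0
        for v in cur:                      # merge-style linear scan
            while k < len(nxt) and nxt[k] < v:
                k += 1
            row.append(k)
        row.append(len(nxt))
        ptr.append(row)
    return ptr


def successor_in_all(sets: List[List[int]], ptr: List[List[int]],
                     x: int) -> List[Optional[int]]:
    """Smallest value >= x in every set (None if there is none)."""
    j = bisect.bisect_left(sets[0], x)     # the only binary search: O(log n)
    answers: List[Optional[int]] = []
    for i, s in enumerate(sets):
        answers.append(s[j] if j < len(s) else None)
        if i < len(ptr):
            j = ptr[i][j]                  # O(1) hop into the next set
    return answers
```

The hop is valid because every element of the next set that is at least x also lies in the current set, so it is at least the current answer. For example, with `sets = [[1, 2, 3, 5, 8, 13, 21, 34, 55], [1, 3, 5, 8, 13, 21, 34, 55], [1, 3, 13, 21, 34, 55], [3, 34, 55]]`, a query with x = 6 is answered per set by following three pointers after a single binary search.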
52/180
Improving the query time
Can we do something similar for m 1-dimensional range queries on m sets S1,...,Sm? We hope to get a query time of O(logn+m+k) with k the total number of points reported
53/180
Improving the query time
1 2 3 5 8 13 21 34 55 1 3 5 8 13 21 34 55 1 3 13 34 55 3 34 55 21
S1 S2 S3 S4
55/180
Improving the query time
1 2 3 5 8 13 21 34 55 1 3 5 8 13 21 34 55 1 3 13 34 55 3 34 55 21 [6,35]
S1 S2 S3 S4
56/180
Fractional cascading
Now we do “the same” on the associated structures of a 2-dimensional range tree
Note that in every associated structure, we search with the same values y and y′
◮ Replace all associated structures except for the one of the root by a linked list
◮ For every list element (and leaf of the associated structure of the root), store two pointers to the appropriate list elements in the lists of the left child and of the right child
59/180
Fractional cascading
(2, 19) (5, 80) (7, 10) (8, 37) (12, 3) (15, 99) (17, 62) (21, 49) (33, 30) (41, 95) (52, 23) (58, 59) (67, 89) (93, 70) 2 5 7 8 12 15 17 21 33 41 52 58 67 17 8 15 5 7 12 21 41 33 52 58 67 93 2
60/180
Fractional cascading
3 99 10 19 37 80 30 49 62 3 10 19 23 30 37 49 59 62 70 80 89 95 99 89 70 59 23 95 30 49 80 3 99 62 37 19 10 3 99 62 89 70 59 23 95 30 49 89 70 23 95 10 37 19 80 70 89 23 95 37 10 19 59 49 80 3 30 99
61/180
Fractional cascading
(2, 19) (5, 80) (7, 10) (8, 37) (12, 3) (15, 99) (17, 62) (21, 49) (33, 30) (41, 95) (52, 23) (58, 59) (67, 89) (93, 70) 2 5 7 8 12 15 17 21 33 41 52 58 67 17 8 15 5 7 12 21 41 33 52 58 67 93 2
[4, 58] × [19, 65]
62/180
Fractional cascading
3 99 10 19 37 80 30 49 62 3 10 19 23 30 37 49 59 62 70 80 89 95 99 89 70 59 23 95 30 49 80 3 99 62 37 19 10 3 99 62 89 70 59 23 95 30 49 89 70 23 95 10 37 19 80 70 89 23 95 37 10 19 59 49 80 3 30 99
64/180
Fractional cascading
Instead of doing a 1D range query on the associated structure of some node ν, we find the leaf where the search to y would end in O(1) time via the direct pointer in the associated structure in the parent of ν
The number of grey nodes reduces to O(log n)
65/180
Result
Theorem: A set of n points in d-dimensional space with d ≥ 2 can be preprocessed in O(n log^{d−1} n) time into a data structure of O(n log^{d−1} n) size so that any d-dimensional range query can be answered in O(log^{d−1} n + k) time, where k is the number of answers reported.
Multiple points with the same x- or y-coordinate need to be handled with care.
66/180
Windowing
Zoom in; re-center and zoom in; select by outlining
67/180
Windowing
68/180
Windowing
Given a set of n axis-parallel line segments, preprocess them into a data structure so that the ones that intersect a query rectangle can be reported efficiently
69/180
Windowing
How can a rectangle and an axis-parallel line segment intersect?
70/180
Windowing
Essentially two types:
◮ Segments with an endpoint inside the rectangle (or both endpoints)
◮ Segments with both endpoints outside the rectangle
Segments of the latter type always intersect the boundary of the rectangle (even the left and/or bottom side)
71/180
Windowing
Instead of storing axis-parallel segments and searching with a rectangle, we will:
◮ store the segment endpoints and query with the rectangle
◮ store the segments and query with the left side and the bottom side of the rectangle
Note that the query problem is at least as hard as rectangular range searching in point sets
72/180
Windowing
Instead of storing axis-parallel segments and searching with a rectangle, we will:
◮ store the segment endpoints and query with the rectangle
◮ store the segments and query with the left side and the bottom side of the rectangle
Question: How often might we report the same segment?
73/180
Windowing
Instead of storing axis-parallel segments and searching with a rectangle, we will:
◮ store the segment endpoints and query with the rectangle: use range tree
◮ store the segments and query with the left side and the bottom side of the rectangle: need to develop data structure
74/180
Windowing
Current problem of our interest: Given a set of horizontal (vertical) line segments, preprocess them into a data structure so that the ones intersecting a vertical (horizontal) query segment can be reported efficiently
75/180
Windowing
Simpler query problem: What if the vertical query segment is a full line? Then the problem is essentially 1-dimensional
76/180
Interval querying
Given a set I of n intervals on the real line, preprocess them into a data structure so that the ones containing a query point (value) can be reported efficiently
77/180
Splitting a set of intervals
The median x of the 2n endpoints partitions the intervals into three subsets:
◮ Intervals Ileft fully left of x
◮ Intervals Imid that contain (intersect) x
◮ Intervals Iright fully right of x
x
78/180
Interval tree: recursive definition
The interval tree for I has a root node ν that contains x and
◮ the intervals Ileft are stored in the left subtree of ν
◮ the intervals Imid are stored with ν
◮ the intervals Iright are stored in the right subtree of ν
The left and right subtrees are proper interval trees for Ileft and Iright
How many intervals can be in Imid? How should we store Imid?
79/180
Interval tree: left and right lists
How is Imid stored?
x
Observe: If the query point is left of x, then only the left endpoint determines if an interval is an answer Symmetrically: If the query point is right of x, then only the right endpoint determines if an interval is an answer
80/180
Interval tree: left and right lists
x
Make a list Lleft using the left-to-right order of the left endpoints of Imid Make a list Lright using the right-to-left order of the right endpoints of Imid Store both lists as associated structures with ν
81/180
Interval tree: example
s1 s2 s3 s4 s5 s7 s8 s9 s10 s11 s12 s6 s7, s5, s6 s5, s6, s7 s8 s8 s9, s10 s9, s10 s11, s12 s12, s11 s4, s3, s2 s4, s3, s2 s1 s1
Lleft Lright
82/180
Interval tree: storage
The main tree has O(n) nodes The total length of all lists is 2n because each interval is stored exactly twice: in Lleft and Lright and only at one node Consequently, the interval tree uses O(n) storage
83/180
Interval querying
Algorithm QUERYINTERVALTREE(ν,qx)
if ν is not a leaf
  then if qx < xmid(ν)
         then Traverse list Lleft(ν), starting at the interval with the leftmost endpoint, reporting all the intervals that contain qx. Stop as soon as an interval does not contain qx.
              QUERYINTERVALTREE(lc(ν),qx)
         else Traverse list Lright(ν), starting at the interval with the rightmost endpoint, reporting all the intervals that contain qx. Stop as soon as an interval does not contain qx.
              QUERYINTERVALTREE(rc(ν),qx)
84/180
Interval tree: query example
s1 s2 s3 s4 s5 s7 s8 s9 s10 s11 s12 s6 s7, s5, s6 s5, s6, s7 s8 s8 s9, s10 s9, s10 s12, s11 s11, s12 s4, s3, s2 s4, s3, s2 s1 s1
Lleft Lright
95/180
Interval tree: query time
The query follows only one path in the tree, and that path has length O(log n)
The query traverses O(log n) lists. Traversing a list with k′ answers takes O(1 + k′) time
The total time for list traversal is therefore O(log n + k), with k the total number of answers reported (no answer is found more than once)
The query time is O(log n) + O(log n + k) = O(log n + k)
96/180
Interval tree: construction
Algorithm CONSTRUCTINTERVALTREE(I)
Input. A set I of intervals on the real line
Output. The root of an interval tree for I
if I = ∅
  then return an empty leaf
  else Create a node ν. Compute xmid, the median of the set of interval endpoints, and store xmid with ν
       Compute Imid and construct two sorted lists for Imid: a list Lleft(ν) sorted on left endpoint and a list Lright(ν) sorted on right endpoint. Store these two lists at ν
       lc(ν) ← CONSTRUCTINTERVALTREE(Ileft)
       rc(ν) ← CONSTRUCTINTERVALTREE(Iright)
       return ν
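Construction and stabbing query of an interval tree can be sketched together in Python. This is a minimal sketch under assumptions made here: intervals are closed `(left, right)` tuples, empty subtrees are represented by `None` instead of an empty leaf, and the endpoints are re-sorted at every node for brevity, so the sketch does not attain the O(n log n) construction time that pre-sorting gives. The names `IntervalTreeNode` and `query_interval_tree` are hypothetical.

```python
from typing import List, Optional, Tuple

Interval = Tuple[float, float]


class IntervalTreeNode:
    def __init__(self, intervals: List[Interval]) -> None:
        # intervals must be non-empty; each is a closed interval (left, right).
        endpoints = sorted(e for iv in intervals for e in iv)
        self.x_mid = endpoints[(len(endpoints) - 1) // 2]     # median endpoint
        i_left = [iv for iv in intervals if iv[1] < self.x_mid]
        i_right = [iv for iv in intervals if iv[0] > self.x_mid]
        i_mid = [iv for iv in intervals if iv[0] <= self.x_mid <= iv[1]]
        # L_left: left endpoints from left to right.
        self.l_left = sorted(i_mid, key=lambda iv: iv[0])
        # L_right: right endpoints from right to left.
        self.l_right = sorted(i_mid, key=lambda iv: iv[1], reverse=True)
        self.lc: Optional["IntervalTreeNode"] = (
            IntervalTreeNode(i_left) if i_left else None)
        self.rc: Optional["IntervalTreeNode"] = (
            IntervalTreeNode(i_right) if i_right else None)


def query_interval_tree(node: Optional[IntervalTreeNode], qx: float,
                        out: Optional[List[Interval]] = None) -> List[Interval]:
    """Report all stored intervals that contain qx."""
    if out is None:
        out = []
    if node is None:
        return out
    if qx < node.x_mid:
        for iv in node.l_left:             # stop at the first miss
            if iv[0] > qx:
                break
            out.append(iv)
        query_interval_tree(node.lc, qx, out)
    else:
        for iv in node.l_right:            # stop at the first miss
            if iv[1] < qx:
                break
            out.append(iv)
        query_interval_tree(node.rc, qx, out)
    return out
```

The recursion terminates because x_mid is an endpoint of some interval, so Imid is never empty and both recursive calls receive strictly fewer intervals.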
97/180
Interval tree: result
Theorem: An interval tree for a set I of n intervals uses O(n) storage and can be built in O(nlogn) time. All intervals that contain a query point can be reported in O(logn+k) time, where k is the number of reported intervals.
98/180
Back to the plane
99/180
Back to the plane
Suppose we use an interval tree on the x-intervals of the horizontal line segments? Then the lists Lleft and Lright are not suitable anymore to solve the query problem for the segments corresponding to Imid
100/180
Back to the plane
s1 s2 s3 s4 s5 s7 s8 s9 s10 s11 s12 s6 s7, s5, s6 s5, s6, s7 s8 s8 s9, s10 s9, s10 s12, s11 s11, s12 s4, s3, s2 s4, s3, s2 s1 s1
101/180
Back to the plane
s1 s2 s3 s4 s5 s7 s8 s9 s10 s11 s12 s6 s7, s5, s6 s5, s6, s7 s8 s8 s9, s10 s9, s10 s12, s11 s11, s12 s4, s3, s2 s4, s3, s2 s1 s1
q
102/180
Back to the plane
s5 s7 s6 s7, s5, s6 s5, s6, s7
q
103/180
Back to the plane
s7 s6 s7, s5, s6 s5, s6, s7
q
s5
104/180
Back to the plane
s7 s6 { s2, s5, s6, s7, s9, s22 }
q
s5 s9 s2 s22 { s2, s5, s6, s7, s9, s22 }
106/180
Back to the plane
s7 s6 { s2, s5, s6, s7, s9, s22 }
q
s5 s9 s2 s22 { s2, s5, s6, s7, s9, s22 }
q
107/180
Segment intersection queries
We can use a range tree as the associated structure; we only need one that stores all of the endpoints, to replace Lleft and Lright Instead of traversing Lleft or Lright, we perform a query with the region left or right, respectively, of q
108/180
Segment intersection queries
s7 s6
q
s5 s9 s2 s22
q
{ s2, s5, s6, s7, s9, s22 } all endpoints of
109/180
Segment intersection queries
In total, there are O(n) range trees that together store 2n points, so the total storage needed by all associated structures is O(n log n)
A query with a vertical segment leads to O(log n) range queries
If fractional cascading is used in the associated structures, the overall query time is O(log^2 n + k)
Question: How about the construction time?
110/180
Result
Theorem: A set of n horizontal line segments can be stored in a data structure of size O(n log n) such that intersection queries with a vertical line segment can be performed in O(log^2 n + k) time, where k is the number of segments reported. The data structure can be built in O(n log n) time.
Theorem: A set of n axis-parallel line segments can be stored in a data structure of size O(n log n) such that windowing queries can be performed in O(log^2 n + k) time, where k is the number of segments reported. The data structure can be built in O(n log n) time.
111/180
Outlook: 3- and 4-sided ranges
Considering the associated structure, we only need 3-sided range queries, whereas the range tree provides 4-sided range queries Can the 3-sided range query problem be solved more efficiently than the 4-sided (rectangular) range query problem?
112/180
Outlook: arbitrary line segments
Given a set of n arbitrary, non-crossing line segments, can we preprocess them into a data structure so that the ones that intersect a query rectangle can be reported efficiently?
113/180
Scheme of structure
s7 s6
q
s5 s9 s2 s22
q
{ s2, s5, s6, s7, s9, s22 } all right endpoints of all left endpoints of { s2, s5, s6, s7, s9, s22 }
114/180
Heap and search tree
A priority search tree is like a heap on x-coordinate and binary search tree on y-coordinate at the same time Recall the heap:
6 1 2 3 7 4 8 11 5 13 10 14 12 9
115/180
Heap and search tree
A priority search tree is like a heap on x-coordinate and binary search tree on y-coordinate at the same time Recall the heap:
6 1 2 3 7 4 8 11 5 13 10 14 12 9
Report all values ≤ 4
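Reporting all values up to a threshold from a heap needs no sorting: a subtree is entered only if its root qualifies, so the traversal touches O(1 + k) nodes for k reported values. A small Python sketch, assuming an array-based binary min-heap as produced by the standard `heapq` module; the function name `report_at_most` is chosen here for illustration.

```python
import heapq
from typing import List, Optional


def report_at_most(heap: List[int], q: int, i: int = 0,
                   out: Optional[List[int]] = None) -> List[int]:
    """Report all values <= q in an array-based min-heap.

    A node is expanded only if its own value is <= q, so at most
    two failing children are inspected per reported value.
    """
    if out is None:
        out = []
    if i < len(heap) and heap[i] <= q:
        out.append(heap[i])
        report_at_most(heap, q, 2 * i + 1, out)   # left child
        report_at_most(heap, q, 2 * i + 2, out)   # right child
    return out


values = [6, 1, 2, 3, 7, 4, 8, 11, 5, 13, 10, 14, 12, 9]
heapq.heapify(values)                  # rearranges the list into a min-heap
small = report_at_most(values, 4)      # the values 1, 2, 3, 4 in heap order
```

The priority search tree uses the same pruning on x-coordinates inside subtrees that lie entirely within the y-range of the query.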
116/180
Priority search tree
If P = ∅, then a priority search tree is an empty leaf
Otherwise, let pmin be the leftmost point in P, and let ymid be the median y-coordinate of P\{pmin}
The priority search tree has a node ν that stores pmin and ymid, and a left subtree and right subtree for the points in P\{pmin} with y-coordinate ≤ ymid and > ymid
117/180
Priority search tree
p1 p2 p3 p4 p5 p6 p7 p8 p9 p10 p11 p12 p13 p14 p1 p8 p12 p14 p13 p10 p11 p9 p2 p3 p4 p7 p5 p6
128/180
Query algorithm
Algorithm QUERYPRIOSEARCHTREE(T,(−∞ : qx]×[qy : q′y])
Search with qy and q′y in T
Let νsplit be the node where the two search paths split
for each node ν on the search path of qy or q′y
  do if p(ν) ∈ (−∞ : qx]×[qy : q′y] then report p(ν)
for each node ν on the path of qy in the left subtree of νsplit
  do if the search path goes left at ν
       then REPORTINSUBTREE(rc(ν),qx)
for each node ν on the path of q′y in the right subtree of νsplit
  do if the search path goes right at ν
       then REPORTINSUBTREE(lc(ν),qx)
129/180
Structure of the query
130/180
Structure of the query
131/180
Query algorithm
REPORTINSUBTREE(ν,qx)
Input. The root ν of a subtree of a priority search tree and a value qx
Output. All points in the subtree with x-coordinate at most qx
if ν is not a leaf and (p(ν))x ≤ qx
  then Report p(ν)
       REPORTINSUBTREE(lc(ν),qx)
       REPORTINSUBTREE(rc(ν),qx)
This subroutine takes O(1+k) time, for k reported answers
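A priority search tree and a 3-sided query can be sketched in Python as follows. This is a compact recursive variant, not the path-based formulation of the slides: the heap-style pruning test on the x-coordinate plays the role of REPORTINSUBTREE, and the y-comparisons keep the recursion on or between the two search paths. It assumes distinct coordinates, points given as `(x, y)` tuples sorted by y, and `None` for empty leaves; the names `PSTNode`, `build_pst`, and `query_pst` are hypothetical.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]


class PSTNode:
    def __init__(self, p: Point) -> None:
        self.p = p                           # p_min: leftmost point of this subtree
        self.y_mid: Optional[float] = None   # split value for the remaining points
        self.lc: Optional["PSTNode"] = None
        self.rc: Optional["PSTNode"] = None


def build_pst(pts_by_y: List[Point]) -> Optional[PSTNode]:
    """Build from points sorted by y-coordinate (distinct coordinates assumed)."""
    if not pts_by_y:
        return None
    i = min(range(len(pts_by_y)), key=lambda j: pts_by_y[j][0])
    node = PSTNode(pts_by_y[i])
    rest = pts_by_y[:i] + pts_by_y[i + 1:]   # still sorted by y
    if rest:
        m = (len(rest) - 1) // 2
        node.y_mid = rest[m][1]
        node.lc = build_pst(rest[:m + 1])    # y <= y_mid
        node.rc = build_pst(rest[m + 1:])    # y >  y_mid
    return node


def query_pst(node: Optional[PSTNode], qx: float, qy: float, qy2: float,
              out: Optional[List[Point]] = None) -> List[Point]:
    """Report all points in (-inf, qx] x [qy, qy2]."""
    if out is None:
        out = []
    # Heap property on x: if p_min is right of qx, so is the whole subtree.
    if node is None or node.p[0] > qx:
        return out
    if qy <= node.p[1] <= qy2:
        out.append(node.p)
    if node.y_mid is not None:
        if qy <= node.y_mid:
            query_pst(node.lc, qx, qy, qy2, out)
        if qy2 > node.y_mid:
            query_pst(node.rc, qx, qy, qy2, out)
    return out
```

A typical call is `query_pst(build_pst(sorted(points, key=lambda p: p[1])), qx, qy, qy2)`.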
132/180
Query algorithm
The search paths to y and y′ have O(log n) nodes. At each node O(1) time is spent
No nodes outside the search paths are ever visited
Subtrees of nodes between the search paths are queried like a heap, and we spend O(1 + k′) time on each one
The total query time is O(log n + k), if k points are reported
133/180
Priority search tree: result
Theorem: A priority search tree for a set P of n points uses O(n) storage and can be built in O(nlogn) time. All points that lie in a 3-sided query range can be reported in O(logn+k) time, where k is the number of reported points
134/180
Scheme of structure
s7 s6
q
s5 s9 s2 s22
q
{ s2, s5, s6, s7, s9, s22 } all right endpoints of all left endpoints of { s2, s5, s6, s7, s9, s22 }
135/180
Storage of the structure
Question: What are the storage requirements of the structure for querying with a vertical segment in a set of horizontal segments?
136/180
Query time of the structure
Question: What is the query time of the structure for querying with a vertical segment in a set of horizontal segments?
137/180
Result
Theorem: A set of n horizontal line segments can be stored in a data structure with size O(n) such that intersection queries with a vertical line segment can be performed in O(log^2 n + k) time, where k is the number of segments reported
138/180
Result
Recall that the windowing problem is solved with a combination of a range tree and the structure just described
Theorem: A set of n axis-parallel line segments can be stored in a data structure with size O(n log n) such that windowing queries can be performed in O(log^2 n + k) time, where k is the number of segments reported
139/180
Windowing
Given a set of n arbitrary, non-crossing line segments, preprocess them into a data structure so that the ones that intersect a query rectangle can be reported efficiently
140/180
Windowing
Two cases of intersection:
◮ An endpoint lies inside the query window; solve with range trees
◮ The segment intersects a side of the query window; solve how?
141/180
Using a bounding box?
If the query window intersects the line segment, then it also intersects the bounding box of the line segment (whose sides are axis-parallel segments) So we could search in the 4n bounding box sides
142/180
Using a bounding box?
But: the fact that the query window intersects bounding box sides does not imply that it intersects the corresponding segments
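A concrete counterexample makes this visible. The Python sketch below uses a standard slab (parametric clipping) test to decide whether a segment meets a closed axis-parallel rectangle; the coordinates, the window, and the function names `segment_hits_rect` and `bbox_sides` are assumptions chosen here for illustration.

```python
from typing import List, Tuple

Pt = Tuple[float, float]


def segment_hits_rect(p: Pt, q: Pt, xmin: float, xmax: float,
                      ymin: float, ymax: float) -> bool:
    """Slab test: does segment pq meet the closed rectangle?"""
    t0, t1 = 0.0, 1.0                      # parameter range still inside all slabs
    for start, end, lo, hi in ((p[0], q[0], xmin, xmax), (p[1], q[1], ymin, ymax)):
        d = end - start
        if d == 0:
            if start < lo or start > hi:   # parallel to the slab and outside it
                return False
        else:
            ta, tb = (lo - start) / d, (hi - start) / d
            if ta > tb:
                ta, tb = tb, ta
            t0, t1 = max(t0, ta), min(t1, tb)
            if t0 > t1:
                return False
    return True


def bbox_sides(p: Pt, q: Pt) -> List[Tuple[Pt, Pt]]:
    """The four axis-parallel sides of the bounding box of segment pq."""
    xa, xb = sorted((p[0], q[0]))
    ya, yb = sorted((p[1], q[1]))
    return [((xa, ya), (xb, ya)), ((xb, ya), (xb, yb)),
            ((xb, yb), (xa, yb)), ((xa, yb), (xa, ya))]


# Diagonal segment; the window cuts the right side of its bounding box
# but lies entirely below the segment.
seg = ((0.0, 0.0), (10.0, 10.0))
window = (8.0, 12.0, 1.0, 3.0)             # xmin, xmax, ymin, ymax
box_hit = any(segment_hits_rect(a, b, *window) for a, b in bbox_sides(*seg))
seg_hit = segment_hits_rect(seg[0], seg[1], *window)
```

Here `box_hit` is true while `seg_hit` is false, so searching the bounding box sides can return segments that are not answers.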
143/180
Windowing
Current problem of our interest: Given a set of arbitrarily oriented, non-crossing line segments, preprocess them into a data structure so that the ones intersecting a vertical (horizontal) query segment can be reported efficiently
144/180
Using an interval tree?
q q
145/180
Interval querying
Given a set I of n intervals on the real line, preprocess them into a data structure so that the ones containing a query point (value) can be reported efficiently We have the interval tree, but we will develop an alternative solution
146/180
Interval querying
Given a set S = {s1,s2,...,sn } of n segments on the real line, preprocess them into a data structure so that the ones containing a query point (value) can be reported efficiently
s1 s2 s3 s4 s5 s6 s7 s8
The new structure is called the segment tree
147/180
Locus approach
The locus approach is the idea to partition the solution space into parts with equal answer sets
s1 s2 s3 s4 s5 s6 s7 s8
For the set S of segments, we get different answer sets before and after every endpoint
148/180
Locus approach
Let p1, p2,..., pm be the sorted set of unique endpoints of the intervals; m ≤ 2n
p1 p2 p3 p4 p5 p6 p7 p8 s1 s2 s3 s4 s5 s6 s7 s8
The real line is partitioned into (−∞, p1), [p1, p1],(p1, p2), [p2, p2], (p2, p3),..., (pm,+∞), these are called the elementary intervals
149/180
Locus approach
We could make a binary search tree that has a leaf for every elementary interval (−∞, p1), [p1, p1], (p1, p2), [p2, p2], (p2, p3), ..., (pm, +∞).
Each segment from the set S can be stored with all leaves whose elementary interval it contains: [pi, pj] is stored with [pi, pi], (pi, pi+1), ..., [pj, pj].
A stabbing query with point q is then solved by finding the unique leaf that contains q, and reporting all segments that it stores.
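The locus approach can be sketched in a few lines. This is a minimal illustration, not the lecture's implementation: a sorted array with binary search stands in for the binary search tree, segments are assumed to be closed intervals given as (lo, hi) pairs, and the function names and the leaf numbering (leaf 2i+1 is [p_i, p_i], leaf 2i is the open interval just left of p_i, counting endpoints from 0) are assumptions made here.

```python
from bisect import bisect_left

def build_locus_table(segments):
    """Store every segment with ALL leaves (elementary intervals) it contains."""
    pts = sorted({x for seg in segments for x in seg})   # unique endpoints
    index = {p: i for i, p in enumerate(pts)}
    leaves = [[] for _ in range(2 * len(pts) + 1)]       # 2m + 1 elementary intervals
    for name, (lo, hi) in enumerate(segments):
        # [p_a, p_b] covers the leaves [p_a, p_a], (p_a, p_a+1), ..., [p_b, p_b]
        for leaf in range(2 * index[lo] + 1, 2 * index[hi] + 2):
            leaves[leaf].append(name)
    return pts, leaves

def leaf_of(pts, q):
    """Index of the unique elementary interval that contains q."""
    i = bisect_left(pts, q)
    return 2 * i + 1 if i < len(pts) and pts[i] == q else 2 * i

def stab(pts, leaves, q):
    """Report the indices of all segments containing q."""
    return leaves[leaf_of(pts, q)]

pts, leaves = build_locus_table([(1, 4), (2, 6), (5, 8)])
```

A query only looks up one leaf, but a long segment is copied into every leaf it spans, which is what drives the storage up.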
150/180
Locus approach
[Figure: binary search tree whose leaves are the elementary intervals (−∞, p1), [p1, p1], (p1, p2), [p2, p2], ..., (p7, p8), [p8, p8], (p8, +∞), drawn above the segments s1, ..., s8]
151/180
Locus approach
[Figure: the segments s1, ..., s8 and the endpoints p1, ..., p8]
152/180
Locus approach
Question: What are the storage requirements and what is the query time of this solution?
153/180
Towards segment trees
In the tree, the leaves store elementary intervals.
But each internal node corresponds to an interval too: the interval that is the union of the elementary intervals of all leaves below it.
[Figure: an internal node with interval (pi, pi+2], the union of the elementary intervals (pi, pi+1), [pi+1, pi+1], (pi+1, pi+2), [pi+2, pi+2] of the leaves below it]
154/180
Towards segment trees
[Figure: the tree over s1, ..., s8 with endpoints p1, ..., p8; example node intervals (p1, p2], (p2, p4], and (p6, +∞)]
155/180
Towards segment trees
Let Int(ν) denote the interval of node ν.
To avoid quadratic storage, we store any segment sj as high as possible in the tree whose leaves correspond to elementary intervals.
More precisely: sj is stored with ν if and only if Int(ν) ⊆ sj but Int(parent(ν)) ⊈ sj
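The "as high as possible" rule can be sketched as a recursive descent. The code below is an illustration under assumptions made here: the leaves (elementary intervals) are numbered 0, 1, 2, ... from left to right, a node is identified by the range (lo, hi) of leaf numbers below it, the tree is split at the midpoint, and the function name is hypothetical.

```python
def canonical_nodes(lo, hi, a, b):
    """Nodes (as leaf ranges) at which a segment covering leaves a..b is stored.

    (lo, hi) is the leaf range of the current node; the first call passes the
    root's range and assumes lo <= a <= b <= hi.
    """
    if a <= lo and hi <= b:
        # Int(node) is contained in the segment; we only get here when the
        # parent's interval was NOT contained, so the segment is stored here.
        return [(lo, hi)]
    mid = (lo + hi) // 2
    found = []
    if a <= mid:                       # segment reaches into the left child
        found += canonical_nodes(lo, mid, a, b)
    if b > mid:                        # segment reaches into the right child
        found += canonical_nodes(mid + 1, hi, a, b)
    return found

example = canonical_nodes(0, 7, 1, 6)
```

For a tree on eight leaves, a segment covering leaves 1 to 6 ends up at four nodes instead of six leaves; the saving grows with the length of the segment.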
156/180
Towards segment trees
[Figure: a segment sj stored at node ν, with Int(ν) = (pi, pi+2] and Int(parent(ν)) = (pi−2, pi+2]]
157/180
Segment trees
A segment tree on a set S of segments is a balanced binary search tree on the elementary intervals defined by S, in which each node stores its interval and its canonical subset of S in an (unsorted) list.
The canonical subset (of S) of a node ν is the subset of segments sj for which Int(ν) ⊆ sj but Int(parent(ν)) ⊈ sj
158/180
Segment trees
[Figure: segment tree on s1, ..., s8 with endpoints p1, ..., p8; each node is labelled with its canonical subset]
159/180
Segment trees
[Figure: segment tree on s1, ..., s8 with endpoints p1, ..., p8; each node is labelled with its canonical subset]
160/180
Segment trees
Question: Why are no segments stored with nodes on the leftmost and rightmost paths of the segment tree?
161/180
Query algorithm
The query algorithm is trivial: For a query point q, follow the path down the tree to the elementary interval that contains q, and report all segments stored in the lists with the nodes on that path
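A compact sketch of a segment tree with this query follows. It is illustrative only and rests on assumptions made here: segments are closed intervals (lo, hi); elementary intervals are numbered left to right, so that with endpoints counted from 0 the interval [p_i, p_i] gets number 2i+1 and the open interval just left of p_i gets number 2i; the leaf number for q is found by binary search on the sorted endpoints before walking down; all names are hypothetical.

```python
from bisect import bisect_left

class Node:
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi        # range of elementary-interval numbers
        self.left = self.right = None
        self.items = []                  # canonical subset (unsorted list)

def build(lo, hi):
    """Balanced tree over the elementary intervals lo..hi."""
    node = Node(lo, hi)
    if lo < hi:
        mid = (lo + hi) // 2
        node.left, node.right = build(lo, mid), build(mid + 1, hi)
    return node

def insert(node, a, b, name):
    """Store a segment covering leaves a..b as high as possible."""
    if a <= node.lo and node.hi <= b:
        node.items.append(name)
        return
    mid = (node.lo + node.hi) // 2
    if a <= mid:
        insert(node.left, a, b, name)
    if b > mid:
        insert(node.right, a, b, name)

def make_segment_tree(segments):
    pts = sorted({x for seg in segments for x in seg})
    index = {p: i for i, p in enumerate(pts)}
    root = build(0, 2 * len(pts))
    for name, (lo, hi) in enumerate(segments):
        insert(root, 2 * index[lo] + 1, 2 * index[hi] + 1, name)
    return pts, root

def stab(pts, root, q):
    """Report all segments on the path to the elementary interval of q."""
    i = bisect_left(pts, q)
    leaf = 2 * i + 1 if i < len(pts) and pts[i] == q else 2 * i
    found, node = [], root
    while node is not None:
        found.extend(node.items)         # every list on the path is reported
        if node.left is None:            # reached the leaf
            break
        mid = (node.lo + node.hi) // 2
        node = node.left if leaf <= mid else node.right
    return found

pts, root = make_segment_tree([(1, 4), (2, 6), (5, 8)])
```

Each segment is reported once, because the nodes that store it have disjoint intervals and the search path meets at most one of them.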
162/180
Example query
[Figure: example query in the segment tree on s1, ..., s8 with endpoints p1, ..., p8; each node is labelled with its canonical subset]
163/180
Example query
[Figure: example query in the segment tree on s1, ..., s8 with endpoints p1, ..., p8; each node is labelled with its canonical subset]
164/180
Query time
The query time is O(log n + k), where k is the number of segments reported
165/180
Segments stored at many nodes
A segment can be stored in several lists of nodes. How bad can the storage requirements get?
166/180
Segments stored at many nodes
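Lemma: Any segment is stored at at most two nodes of the same depth.
Proof: Suppose a segment si is stored at three nodes ν1, ν2, and ν3 at the same depth from the root, ordered from left to right. Since si is an interval that contains Int(ν1) and Int(ν3), it also contains everything in between, including Int(ν2) and the interval of the sibling of ν2. Hence Int(parent(ν2)) ⊆ si, so si would not be stored at ν2, a contradiction.
[Figure: nodes ν1, ν2, ν3 at the same depth, each storing si, and parent(ν2), whose interval si also contains]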
167/180
Segments stored at many nodes
If a segment tree has depth O(log n), then any segment is stored in at most O(log n) lists ⇒ the total size of all lists is O(n log n).
The main tree uses O(n) storage.
The storage requirement of a segment tree on n segments is O(n log n).
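The counting behind this bound can be written out as a short derivation. It is a sketch that assumes the lemma (at most two nodes per depth) and a tree of depth d = O(log n); the symbols d and L(ν), the list stored at node ν, are notation introduced here.

```latex
\sum_{\nu} |L(\nu)|
  \;=\; \sum_{j=1}^{n} \#\{\nu : s_j \in L(\nu)\}
  \;\le\; \sum_{j=1}^{n} 2\,(d+1)
  \;=\; 2n\,(d+1)
  \;=\; O(n \log n)
```

The first step counts list entries per segment instead of per node; the second uses that the tree has d + 1 levels and each level contributes at most two nodes per segment.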
168/180
Result
Theorem: A segment tree storing n segments (= intervals) on the real line uses O(n log n) storage, can be built in O(n log n) time, and stabbing queries can be answered in O(log n + k) time, where k is the number of segments reported.
Property: For any query, all segments containing the query point are stored in the lists of O(log n) nodes.
169/180
Back to windowing
Problem arising from windowing: Given a set of arbitrarily oriented, non-crossing line segments, preprocess them into a data structure so that the ones intersecting a vertical (horizontal) query segment can be reported efficiently
170/180
Idea for solution
The main idea is to build a segment tree on the x-projections of the 2D segments, and replace the associated lists with a more suitable data structure
171/180
[Figure: segment tree on s1, ..., s8 with endpoints p1, ..., p8; each node is labelled with its canonical subset]
172/180
[Figure: segment tree on s1, ..., s8 with endpoints p1, ..., p8; each node is labelled with its canonical subset]
173/180
Observe that nodes now correspond to vertical slabs of the plane (with or without left and right bounding lines), and:
◮ if a segment si is stored with a node ν, then it crosses
the slab of ν completely, but not the slab of the parent of ν
◮ the segments crossing a slab have a well-defined
top-to-bottom order
[Figure: the slab of Int(ν) and a segment sj; sj is stored at one or more nodes below ν]
174/180
[Figure: detail around p3 and p4 with the segments s1, s3, s4, s5 and the node lists "s1, s3, s4" and "s5"]
175/180
[Figure: detail around p3 and p4 with the segments s1, s3, s4, s5 and the node lists "s1, s3, s4" and "s5"]
176/180
Querying
Recall that a query is done with a vertical line segment q.
Only segments of S stored with nodes on the path down the tree using the x-coordinate of q can be answers.
At any such node, the query problem is: which of the segments (that cross the slab completely) intersect the vertical query segment q?
[Figure: vertical query segment q]
177/180
Querying
We store the canonical subset of a node ν in a balanced binary search tree that follows the bottom-to-top order in its leaves
[Figure: segments s1, ..., s7 crossing a slab, the balanced binary search tree storing them in bottom-to-top order, and the vertical query segment q]
178/180
Data structure
A query with q follows one path down the main tree, using the x-coordinate of q.
At each node, the associated tree is queried using the endpoints of q, as if it were a 1-dimensional range query.
The query visits O(log n) nodes and spends O(log n) time at each, plus the time to report, so the query time is O(log² n + k).
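The whole structure can be sketched as follows. This is an illustration under loud assumptions, not the lecture's implementation: input segments are pairwise disjoint and non-vertical, given as ((x1, y1), (x2, y2)) with x1 < x2; a sorted Python list with a hand-written binary search stands in for the balanced binary search tree of each node; elementary intervals are numbered as 2i+1 for [p_i, p_i] and 2i for the open interval left of p_i (endpoints counted from 0); all names are hypothetical.

```python
from bisect import bisect_left

def y_at(seg, x):
    """Height of a non-vertical segment at abscissa x."""
    (x1, y1), (x2, y2) = seg
    return y1 + (y2 - y1) * (x - x1) / (x2 - x1)

class Node:
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi        # range of elementary-interval numbers
        self.left = self.right = None
        self.items = []                  # canonical subset, sorted bottom-to-top

def build(lo, hi):
    node = Node(lo, hi)
    if lo < hi:
        mid = (lo + hi) // 2
        node.left, node.right = build(lo, mid), build(mid + 1, hi)
    return node

def insert(node, a, b, seg):
    if a <= node.lo and node.hi <= b:
        node.items.append(seg)
        return
    mid = (node.lo + node.hi) // 2
    if a <= mid:
        insert(node.left, a, b, seg)
    if b > mid:
        insert(node.right, a, b, seg)

def sort_lists(pts, node):
    """Order every non-empty list by height at the middle of the node's slab."""
    if node is None:
        return
    if node.items:                       # such a slab always has finite bounds
        x = (pts[(node.lo - 1) // 2] + pts[node.hi // 2]) / 2
        node.items.sort(key=lambda s: y_at(s, x))
    sort_lists(pts, node.left)
    sort_lists(pts, node.right)

def make_tree(segments):
    pts = sorted({p[0] for seg in segments for p in seg})
    index = {x: i for i, x in enumerate(pts)}
    root = build(0, 2 * len(pts))
    for seg in segments:
        insert(root, 2 * index[seg[0][0]] + 1, 2 * index[seg[1][0]] + 1, seg)
    sort_lists(pts, root)
    return pts, root

def first_at_or_above(items, x, y):
    """Binary search: first position whose segment is at height >= y at x."""
    lo, hi = 0, len(items)
    while lo < hi:
        mid = (lo + hi) // 2
        if y_at(items[mid], x) < y:
            lo = mid + 1
        else:
            hi = mid
    return lo

def query(pts, root, qx, y_lo, y_hi):
    """Segments hit by the vertical query segment {qx} x [y_lo, y_hi]."""
    i = bisect_left(pts, qx)
    leaf = 2 * i + 1 if i < len(pts) and pts[i] == qx else 2 * i
    found, node = [], root
    while node is not None:
        k = first_at_or_above(node.items, qx, y_lo)      # 1D range query
        while k < len(node.items) and y_at(node.items[k], qx) <= y_hi:
            found.append(node.items[k])
            k += 1
        if node.left is None:
            break
        mid = (node.lo + node.hi) // 2
        node = node.left if leaf <= mid else node.right
    return found

A, B = ((0, 0), (10, 0)), ((2, 5), (8, 7))
C, D = ((4, 2), (6, 3)), ((1, 10), (9, 12))
pts, root = make_tree([A, B, C, D])
```

Sorting by height at the slab middle is valid at any x inside the slab, because the stored segments cross the slab completely and do not cross each other.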
179/180
Data structure
The data structure for intersection queries with a vertical query segment in a set of non-crossing line segments is a segment tree where the associated structures are binary search trees on the bottom-to-top order of the segments in the corresponding slab.
Since it is a segment tree with lists replaced by trees, the storage remains O(n log n).
180/180