Geometric Algorithms: Range & windowing queries (2 lectures)



slide-1
SLIDE 1

1/180

Range & windowing queries

Geometric Algorithms

Range & windowing queries (2 lectures)

slide-2
SLIDE 2

2/180

Database queries

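A database query may ask for all employees with age between a1 and a2, and salary between s1 and s2. If each employee is viewed as a point (date of birth, salary) in the plane, this becomes a query with an axis-parallel rectangle.

[Figure: employees as points, with date of birth on the horizontal axis (e.g. the range 19,500,000 to 19,559,999) and salary on the vertical axis]

Example point: G. Ometer, born Aug 16, 1954, salary $3,500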
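Without any preprocessing, such a query can be answered by checking every record, which costs O(n) time per query; the data structures in these lectures aim to do better. A minimal Python sketch, assuming dates of birth are encoded as yyyymmdd integers (which fits the 19,500,000 to 19,559,999 range in the figure); the `Employee` type, the second record and the salary bounds are hypothetical.

```python
from datetime import date
from typing import List, NamedTuple


class Employee(NamedTuple):
    name: str
    born: int    # date of birth encoded as yyyymmdd (assumed encoding)
    salary: int


def encode(d: date) -> int:
    """Encode a date as the integer yyyymmdd, so date order equals number order."""
    return d.year * 10000 + d.month * 100 + d.day


def naive_range_query(emps: List[Employee], b_lo: int, b_hi: int,
                      s_lo: int, s_hi: int) -> List[Employee]:
    """Report employees whose (born, salary) point lies in the query rectangle: O(n)."""
    return [e for e in emps
            if b_lo <= e.born <= b_hi and s_lo <= e.salary <= s_hi]


staff = [Employee("G. Ometer", encode(date(1954, 8, 16)), 3500),
         Employee("A. Nother", encode(date(1961, 2, 3)), 3800)]   # hypothetical record
print(naive_range_query(staff, 19_500_000, 19_559_999, 3_000, 4_000))
```

Only G. Ometer is reported: the second record's date of birth, 19,610,203, lies outside the query interval.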

slide-3
SLIDE 3

3/180

Data structures

Idea of data structures

◮ Representation of structure, for convenience (like the DCEL)
◮ Preprocessing of data, to be able to answer future questions really fast (sub-linear time)

A (search) data structure has a storage requirement, a query time, and a construction time (and an update time)

slide-4
SLIDE 4

4/180

1D range query problem

1D range query problem: Preprocess a set of n points on the real line such that the points inside a 1D query range (an interval) can be reported fast. The points p1, ..., pn are known beforehand, the query [x, x′] only later

A solution to a query problem is a data structure, a query algorithm, and a construction algorithm. Question: What are the most important factors for the efficiency of a solution?
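For a static point set, one simple solution already reaches O(log n + k) query time: keep the points in a sorted array and locate both ends of the answer by binary search. A Python sketch using the standard `bisect` module (the function name is hypothetical; the values are the leaves of the example tree on the next slides). The tree-based solution that follows matters because it is the one that extends to range trees in higher dimensions.

```python
from bisect import bisect_left, bisect_right
from typing import List


def range_query_sorted(points: List[int], x: int, x2: int) -> List[int]:
    """Report all values of the sorted list `points` in [x, x2].

    Two binary searches cost O(log n); the slice costs O(k) for k answers.
    """
    lo = bisect_left(points, x)     # first index with points[lo] >= x
    hi = bisect_right(points, x2)   # first index with points[hi] > x2
    return points[lo:hi]


pts = [3, 10, 19, 23, 30, 37, 49, 59, 62, 70, 80, 89, 93, 97]
print(range_query_sorted(pts, 25, 90))   # [30, 37, 49, 59, 62, 70, 80, 89]
```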

slide-5
SLIDE 5

5/180

Balanced binary search trees

A balanced binary search tree with the points in the leaves

[Figure: balanced binary search tree whose leaves store 3, 10, 19, 23, 30, 37, 49, 59, 62, 70, 80, 89, 93, 97; internal nodes store splitting values]

slide-6
SLIDE 6

6/180

Balanced binary search trees

The search path for 25

[Figure: the same tree, leaves 3, 10, 19, 23, 30, 37, 49, 59, 62, 70, 80, 89, 93, 97, with the search path for 25 highlighted]

slide-7
SLIDE 7

7/180

Balanced binary search trees

The search paths for 25 and for 90

[Figure: the same tree, leaves 3, 10, 19, 23, 30, 37, 49, 59, 62, 70, 80, 89, 93, 97, with the search paths for 25 and 90 highlighted]

slide-8
SLIDE 8

8/180

Example 1D range query

A 1-dimensional range query with [25, 90]

[Figure: the same tree, leaves 3, 10, 19, 23, 30, 37, 49, 59, 62, 70, 80, 89, 93, 97, with the query range [25, 90]]

slide-9
SLIDE 9

9/180

Node types for a query

Three types of nodes for a given query:

◮ White nodes: never visited by the query
◮ Grey nodes: visited by the query, unclear if they lead to output
◮ Black nodes: visited by the query, whole subtree is output

Question: What query time do we hope for?

slide-10
SLIDE 10

10/180

Node types for a query

The query algorithm comes down to what we do at each type of node

Grey nodes: use the query range to decide how to proceed: not visit a subtree (pruning), report a complete subtree, or just continue
Black nodes: traverse and enumerate all points in the leaves

slide-11
SLIDE 11

11/180

Example 1D range query

A 1-dimensional range query with [61, 90]

[Figure: the same tree, leaves 3, 10, 19, 23, 30, 37, 49, 59, 62, 70, 80, 89, 93, 97, with the query range [61, 90]]

slide-12
SLIDE 12

12/180

Example 1D range query

A 1-dimensional range query with [61, 90]

[Figure: the same tree with the query range [61, 90]; the split node, where the two search paths diverge, is marked]

slide-13
SLIDE 13

13/180

1D range query algorithm

Algorithm 1DRANGEQUERY(T, [x : x′])
1. νsplit ← FINDSPLITNODE(T, x, x′)
2. if νsplit is a leaf
3.   then check if the point stored in νsplit must be reported.
4.   else ν ← lc(νsplit)
5.     while ν is not a leaf
6.       do if x ≤ xν
7.         then REPORTSUBTREE(rc(ν))
8.           ν ← lc(ν)
9.         else ν ← rc(ν)
10.    Check if the point stored in ν must be reported.
11.    Similarly, follow the path to x′, and ...
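The algorithm above can be sketched in Python as follows. Assumptions not stated on the slides: values are distinct, each internal node stores the largest value of its left subtree (so "x ≤ xν" means "the left subtree may still contain answers"), and the tree is built from a sorted list in linear time. `Node`, `build`, `report_subtree`, `find_split_node` and `range_query_1d` are hypothetical names. Answers come out grouped by subtree, not in sorted order.

```python
from typing import List, Optional


class Node:
    """Leaves store the points; an internal node stores the largest value of its left subtree."""

    def __init__(self, key: int, left: Optional["Node"] = None, right: Optional["Node"] = None):
        self.key, self.left, self.right = key, left, right

    def is_leaf(self) -> bool:
        return self.left is None


def build(vals: List[int], lo: int, hi: int) -> Node:
    """Balanced tree on the sorted, distinct values vals[lo:hi] (hi > lo), in O(hi - lo) time."""
    if hi - lo == 1:
        return Node(vals[lo])
    mid = (lo + hi - 1) // 2
    return Node(vals[mid], build(vals, lo, mid + 1), build(vals, mid + 1, hi))


def report_subtree(v: Node, out: List[int]) -> None:
    """REPORTSUBTREE: append every leaf below v; O(m) for m leaves."""
    if v.is_leaf():
        out.append(v.key)
    else:
        report_subtree(v.left, out)
        report_subtree(v.right, out)


def find_split_node(root: Node, x: int, x2: int) -> Node:
    """Walk down until the paths to x and x2 diverge (or a leaf is reached)."""
    v = root
    while not v.is_leaf() and (x2 <= v.key or x > v.key):
        v = v.left if x2 <= v.key else v.right
    return v


def range_query_1d(root: Node, x: int, x2: int) -> List[int]:
    out: List[int] = []
    split = find_split_node(root, x, x2)
    if split.is_leaf():
        if x <= split.key <= x2:
            out.append(split.key)
        return out
    v = split.left                      # path to x
    while not v.is_leaf():
        if x <= v.key:                  # the right subtree lies entirely in [x : x2]
            report_subtree(v.right, out)
            v = v.left
        else:
            v = v.right
    if x <= v.key <= x2:
        out.append(v.key)
    v = split.right                     # symmetric path to x2
    while not v.is_leaf():
        if x2 >= v.key:                 # the left subtree lies entirely in [x : x2]
            report_subtree(v.left, out)
            v = v.right
        else:
            v = v.left
    if x <= v.key <= x2:
        out.append(v.key)
    return out


vals = [3, 10, 19, 23, 30, 37, 49, 59, 62, 70, 80, 89, 93, 97]
root = build(vals, 0, len(vals))
print(sorted(range_query_1d(root, 61, 90)))   # [62, 70, 80, 89]
```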

slide-14
SLIDE 14

14/180

Query time analysis

The efficiency analysis is based on counting the numbers of nodes visited for each type

◮ White nodes: never visited by the query; no time spent
◮ Grey nodes: visited by the query, unclear if they lead to output; their number determines the dependency on n
◮ Black nodes: visited by the query, whole subtree is output; their number determines the dependency on k, the output size

slide-15
SLIDE 15

15/180

Query time analysis

Grey nodes: they occur on only two paths in the tree, and since the tree is balanced, its depth is O(log n); so there are O(log n) grey nodes
Black nodes: a (sub)tree with m leaves has m − 1 internal nodes; traversing it visits O(m) nodes and finds m points for the output

The time spent at each node is O(1) ⇒ O(log n + k) query time

slide-16
SLIDE 16

16/180

Storage requirement and preprocessing

A (balanced) binary search tree storing n points uses O(n) storage. A balanced binary search tree storing n points can be built in O(n) time after sorting

slide-17
SLIDE 17

17/180

Result

Theorem: A set of n points on the real line can be preprocessed in O(n log n) time into a data structure of O(n) size so that any 1D range query can be answered in O(log n + k) time, where k is the number of answers reported

slide-19
SLIDE 19

18/180

Range queries in 2D

◮ Quadtrees: good in practice, but not so good worst-case query time
◮ Kd-trees: queries take O(√n + k) time (Chapter 5.2)
◮ Range trees: today

slide-20
SLIDE 20

19/180

Back to 1D range queries

A 1-dimensional range query with [61, 90]

[Figure: the tree with leaves 3, 10, 19, 23, 30, 37, 49, 59, 62, 70, 80, 89, 93, 97 and the query range [61, 90]; the split node is marked]

slide-21
SLIDE 21

20/180

Examining 1D range queries

Observation: Ignoring the leaves at the ends of the search paths, all answers are jointly represented by the highest nodes strictly between the two search paths. Question: How many highest nodes between the search paths can there be?

slide-22
SLIDE 22

21/180

Examining 1D range queries

For any 1D range query, we can identify O(log n) nodes that together represent all answers to the query: each of them is a child of a node on one of the two search paths, so there are at most two per level
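These canonical nodes can be computed directly. A Python sketch on an implicit balanced tree over a sorted array, where a node is identified by the index range [lo, hi) of its leaves (an assumption made for brevity; `canonical_nodes` and `decompose` are hypothetical names). Fully covered nodes are returned whole and partially covered ones are split; unlike the slide, an answer leaf at the end of a search path shows up as a one-element node.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple


def canonical_nodes(lo: int, hi: int, qlo: int, qhi: int) -> List[Tuple[int, int]]:
    """Highest nodes of the implicit tree over indices [lo, hi) whose leaves all lie
    in the answer index range [qlo, qhi), listed from left to right."""
    if qhi <= lo or hi <= qlo:            # disjoint from the answer: prune
        return []
    if qlo <= lo and hi <= qhi:           # every leaf below is an answer: keep whole
        return [(lo, hi)]
    mid = (lo + hi) // 2                  # partially covered: recurse into the children
    return canonical_nodes(lo, mid, qlo, qhi) + canonical_nodes(mid, hi, qlo, qhi)


def decompose(vals: List[int], x: int, x2: int) -> List[List[int]]:
    """Canonical subsets for the query [x : x2] on the sorted list vals."""
    i, j = bisect_left(vals, x), bisect_right(vals, x2)
    return [vals[a:b] for a, b in canonical_nodes(0, len(vals), i, j)]


vals = [3, 10, 19, 23, 30, 37, 49, 59, 62, 70, 80, 89, 93, 97]
print(decompose(vals, 25, 90))   # [[30], [37, 49], [59, 62, 70], [80, 89]]
```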

slide-23
SLIDE 23

22/180

Toward 2D range queries

For any 2D range query, we can identify O(log n) nodes that together represent all points that have a correct first coordinate

slide-24
SLIDE 24

23/180

Toward 2D range queries

[Figure: example point set (3, 8), (1, 5), (4, 2), (5, 9), (6, 7), (8, 1), (7, 3), (9, 4)]

slide-26
SLIDE 26

25/180

Toward 2D range queries

[Figure: the point set (3, 8), (1, 5), (4, 2), (5, 9), (6, 7), (8, 1), (7, 3), (9, 4), with a data structure for searching on y-coordinate]

slide-27
SLIDE 27

26/180

Toward 2D range queries

[Figure: the example point set (3, 8), (1, 5), (4, 2), (5, 9), (6, 7), (8, 1), (7, 3), (9, 4) in the main tree and in an associated structure]

slide-28
SLIDE 28

27/180

2D range trees

Every internal node stores a whole tree, on y-coordinate, in an associated structure. Question: How much storage does this take?

slide-29
SLIDE 29

28/180

Storage of 2D range trees

To analyze storage, two arguments can be used:

◮ By level: on each level, any point is stored exactly once. So all associated trees on one level together have O(n) size, and there are O(log n) levels
◮ By point: any point is stored in the associated structures of the nodes on its search path. So it is stored in O(log n) of them

Either way, the total storage is O(n log n)

slide-30
SLIDE 30

29/180

Construction algorithm

Algorithm BUILD2DRANGETREE(P)
1. Construct the associated structure: build a binary search tree Tassoc on the set Py of y-coordinates in P
2. if P contains only one point
3.   then create a leaf ν storing this point, and make Tassoc the associated structure of ν.
4.   else split P into Pleft and Pright, the subsets ≤ and > the median x-coordinate xmid
5.     νleft ← BUILD2DRANGETREE(Pleft)
6.     νright ← BUILD2DRANGETREE(Pright)
7.     create a node ν storing xmid, make νleft the left child of ν, make νright the right child of ν, and make Tassoc the associated structure of ν
8. return ν

slide-31
SLIDE 31

30/180

Efficiency of construction

The construction algorithm takes O(n log^2 n) time:
T(1) = O(1)
T(n) = 2 · T(n/2) + O(n log n)
which solves to O(n log^2 n) time
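Unrolling the recurrence shows where the second logarithm comes from. A short derivation, assuming n is a power of two, logarithms to base 2, and a constant c hidden in the O(n log n) term; level i of the recursion has 2^i subproblems of size n/2^i:

```latex
\begin{aligned}
T(n) &= 2\,T(n/2) + c\,n\log n \\
     &= \sum_{i=0}^{\log n - 1} 2^i \cdot c\,\frac{n}{2^i}\,\log\frac{n}{2^i} \;+\; n\,T(1) \\
     &= c\,n \sum_{i=0}^{\log n - 1} (\log n - i) \;+\; O(n)
      \;=\; c\,n\,\frac{\log n\,(\log n + 1)}{2} + O(n) \;=\; O(n \log^2 n)
\end{aligned}
```

If the cost per level drops to c·n instead, the same sum has log n equal terms and gives O(n log n).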

slide-32
SLIDE 32

31/180

Efficiency of construction

Suppose we pre-sort P on y-coordinate, and whenever we split P into Pleft and Pright, we keep the y-order in both subsets

For a sorted set, the associated structure can be built in linear time

slide-33
SLIDE 33

32/180

Efficiency of construction

The adapted construction algorithm takes O(n log n) time:
T(1) = O(1)
T(n) = 2 · T(n/2) + O(n)
which solves to O(n log n) time

slide-34
SLIDE 34

33/180

2D range queries

How are queries performed and why are they correct?

◮ Are we sure that each answer is found?
◮ Are we sure that the same point is found only once?

slide-35
SLIDE 35

34/180

2D range queries

[Figure: the main tree with nodes ν, µ, µ′ and a point p]

slide-36
SLIDE 36

35/180

Query algorithm

Algorithm 2DRANGEQUERY(T, [x : x′] × [y : y′])
1. νsplit ← FINDSPLITNODE(T, x, x′)
2. if νsplit is a leaf
3.   then report the point stored at νsplit, if it is an answer
4.   else ν ← lc(νsplit)
5.     while ν is not a leaf
6.       do if x ≤ xν
7.         then 1DRANGEQUERY(Tassoc(rc(ν)), [y : y′])
8.           ν ← lc(ν)
9.         else ν ← rc(ν)
10.    Check if the point stored at ν must be reported.
11.    Similarly, follow the path from rc(νsplit) to x′ ...
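A compact Python sketch of a 2D range tree with this query. Assumptions: distinct x-coordinates; each associated structure is a y-sorted array searched with `bisect` instead of a separate balanced tree (same O(log m + k′) bound per search); and P is pre-sorted on y so that splitting keeps the y-order, as in the adapted construction. All names are hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import List, Optional, Tuple

Point = Tuple[int, int]


class Node:
    """Main-tree node on x; `assoc` is this subtree's point set sorted on y."""

    def __init__(self, by_x: List[Point], by_y: List[Point]):
        self.assoc = by_y                          # associated structure Tassoc
        self.ykeys = [p[1] for p in by_y]          # y-values for binary search
        self.left = self.right = self.point = None
        if len(by_x) == 1:                         # leaf: a single point
            self.point, self.key = by_x[0], by_x[0][0]
            return
        mid = (len(by_x) - 1) // 2
        self.key = by_x[mid][0]                    # xmid: left subtree holds x <= xmid
        # Splitting the y-sorted list keeps its order, so nothing is re-sorted.
        self.left = Node(by_x[:mid + 1], [p for p in by_y if p[0] <= self.key])
        self.right = Node(by_x[mid + 1:], [p for p in by_y if p[0] > self.key])


def build_2d(points: List[Point]) -> Optional[Node]:
    if not points:
        return None
    return Node(sorted(points), sorted(points, key=lambda p: p[1]))   # pre-sort once


def query_1d(v: Node, y1: int, y2: int, out: List[Point]) -> None:
    """1D range query on Tassoc(v): O(log m + k') by binary search."""
    out.extend(v.assoc[bisect_left(v.ykeys, y1):bisect_right(v.ykeys, y2)])


def query_2d(root: Optional[Node], x1: int, x2: int, y1: int, y2: int) -> List[Point]:
    out: List[Point] = []
    if root is None:
        return out

    def inside(p: Point) -> bool:
        return x1 <= p[0] <= x2 and y1 <= p[1] <= y2

    v = root
    while v.point is None and (x2 <= v.key or x1 > v.key):   # FINDSPLITNODE
        v = v.left if x2 <= v.key else v.right
    if v.point is not None:
        return [v.point] if inside(v.point) else out
    u = v.left                                     # path to x1
    while u.point is None:
        if x1 <= u.key:
            query_1d(u.right, y1, y2, out)         # subtree fully inside [x1 : x2]
            u = u.left
        else:
            u = u.right
    if inside(u.point):
        out.append(u.point)
    u = v.right                                    # symmetric path to x2
    while u.point is None:
        if x2 >= u.key:
            query_1d(u.left, y1, y2, out)
            u = u.right
        else:
            u = u.left
    if inside(u.point):
        out.append(u.point)
    return out


pts = [(3, 8), (1, 5), (4, 2), (5, 9), (6, 7), (8, 1), (7, 3), (9, 4)]
print(sorted(query_2d(build_2d(pts), 2, 7, 3, 8)))   # [(3, 8), (6, 7), (7, 3)]
```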

slide-37
SLIDE 37

36/180

2D range query time

Question: How much time does a 2D range query take? Subquestions: In how many associated structures do we search? How much time does each such search take?

slide-38
SLIDE 38

37/180

2D range queries

[Figure: the main tree with nodes ν, µ, µ′]

slide-39
SLIDE 39

38/180

2D range query efficiency

We search in O(log n) associated structures to perform a 1D range query; at most two per level of the main tree. The query time is O(log n) × O(log m + k′), or more precisely

Σ_ν O(log n_ν + k_ν), where Σ_ν k_ν = k, the number of points reported

slide-40
SLIDE 40

39/180

2D range query efficiency

Use the concept of grey and black nodes again:

slide-41
SLIDE 41

40/180

2D range query efficiency

The number of grey nodes is O(log^2 n). The number of black nodes is O(k) if k points are reported. The query time is O(log^2 n + k), where k is the size of the output
slide-42
SLIDE 42

41/180

Result

Theorem: A set of n points in the plane can be preprocessed in O(n log n) time into a data structure of O(n log n) size so that any 2D range query can be answered in O(log^2 n + k) time, where k is the number of answers reported. In contrast, a kd-tree has O(n) size and answers queries in O(√n + k) time (Chapter 5.2).

slide-43
SLIDE 43

42/180

Higher dimensional range trees

A d-dimensional range tree has a main tree which is a one-dimensional balanced binary search tree on the first coordinate, where every node has a pointer to an associated structure that is a (d − 1)-dimensional range tree on the other coordinates
slide-44
SLIDE 44

43/180

Storage

The size Sd(n) of a d-dimensional range tree satisfies:
S1(n) = O(n) for all n
Sd(1) = O(1) for all d
Sd(n) ≤ 2 · Sd(n/2) + Sd−1(n) for d ≥ 2
This solves to Sd(n) = O(n log^(d−1) n)
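The bound follows by induction on d. A sketch, assuming n is a power of two and the induction hypothesis S_{d−1}(n) ≤ c n log^{d−2} n (for d = 2 this is S_1(n) = O(n)); each of the log n levels of the main tree contributes at most c n log^{d−2} n:

```latex
\begin{aligned}
S_d(n) &\le 2\,S_d(n/2) + c\,n\log^{d-2} n \\
       &\le \sum_{i=0}^{\log n - 1} 2^i \cdot c\,\frac{n}{2^i}\log^{d-2}\frac{n}{2^i} \;+\; n\,S_d(1) \\
       &\le c\,n\log^{d-2} n \cdot \log n + O(n) \;=\; O(n\log^{d-1} n)
\end{aligned}
```

For d = 2 this gives O(n log n), the storage of the 2D range tree.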

slide-45
SLIDE 45

44/180

Query time

The number of grey nodes Gd(n) satisfies:
G1(n) = O(log n) for all n
Gd(1) = O(1) for all d
Gd(n) ≤ 2 · log n + 2 · log n · Gd−1(n) for d ≥ 2
This solves to Gd(n) = O(log^d n)
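The same induction works here. A sketch, assuming G_{d−1}(n) ≤ c_{d−1} log^{d−1} n (true for d = 2 since G_1(n) = O(log n)) and that G_{d−1} is nondecreasing in n, so that the searches in associated structures with fewer than n points are covered by G_{d−1}(n):

```latex
G_d(n) \;\le\; 2\log n + 2\log n \cdot c_{d-1}\log^{d-1} n
       \;=\; 2\log n + 2c_{d-1}\log^{d} n \;=\; O(\log^{d} n)
```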

slide-46
SLIDE 46

45/180

Result

Theorem: A set of n points in d-dimensional space can be preprocessed in O(n log^(d−1) n) time into a data structure of O(n log^(d−1) n) size so that any d-dimensional range query can be answered in O(log^d n + k) time, where k is the number of answers reported

slide-47
SLIDE 47

46/180

Improving the query time

We can improve the query time of a 2D range tree from O(log^2 n) to O(log n) by a technique called fractional cascading. This automatically lowers the query time in d dimensions to O(log^(d−1) n) time

slide-48
SLIDE 48

47/180

Improving the query time

The idea is best illustrated by a different query problem: suppose that we have a collection of sets S1, ..., Sm, where |S1| = n and where Si+1 ⊆ Si. We want a data structure that can report, for a query number x, the smallest value ≥ x in all sets S1, ..., Sm

slide-49
SLIDE 49

48/180

Improving the query time

[Figure: the nested sorted lists S1 ⊇ S2 ⊇ S3 ⊇ S4, with S1 = 1, 2, 3, 5, 8, 13, 21, 34, 55 and S2 = 1, 3, 5, 8, 13, 21, 34, 55]

slide-52
SLIDE 52

51/180

Improving the query time

Suppose that we have a collection of sets S1, ..., Sm, where |S1| = n and where Si+1 ⊆ Si. We want a data structure that can report, for a query number x, the smallest value ≥ x in all sets S1, ..., Sm. This query problem can be solved in O(log n + m) time instead of O(m · log n) time
slide-53
SLIDE 53

52/180

Improving the query time

Can we do something similar for m 1-dimensional range queries on m sets S1, ..., Sm? We hope to get a query time of O(log n + m + k), with k the total number of points reported
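A Python sketch of the pointer idea for nested sets: every element of Si points to the smallest element of Si+1 that is at least as large. One binary search in S1 followed by one pointer hop per set answers the successor query in O(log n + m), and scanning forward from each successor answers the range query in O(log n + m + k). Correctness relies on Si+1 ⊆ Si. `NestedLists` is a hypothetical name; in the usage lines, S1 and S2 are the lists from the figure and S3 and S4 are an assumed example.

```python
from bisect import bisect_left
from typing import List, Optional


class NestedLists:
    """Sorted lists S1 ⊇ S2 ⊇ ... ⊇ Sm with a pointer from every element of Si
    to the position of the smallest element >= it in Si+1."""

    def __init__(self, sets: List[List[int]]):
        self.lists = [sorted(s) for s in sets]
        self.ptr: List[List[int]] = []
        for cur, nxt in zip(self.lists, self.lists[1:]):
            row, k = [], 0
            for v in cur:                   # merge-like walk: linear time per pair
                while k < len(nxt) and nxt[k] < v:
                    k += 1
                row.append(k)
            self.ptr.append(row)

    def successors(self, x: int) -> List[Optional[int]]:
        """Smallest value >= x in every set: one binary search, then O(1) per set."""
        out: List[Optional[int]] = []
        j = bisect_left(self.lists[0], x)
        for i, lst in enumerate(self.lists):
            if j >= len(lst):               # no successor here, hence none in any subset
                return out + [None] * (len(self.lists) - i)
            out.append(lst[j])
            if i < len(self.ptr):
                j = self.ptr[i][j]
        return out

    def range_query(self, x: int, x2: int) -> List[List[int]]:
        """Values in [x, x2] for every set, in O(log n + m + k) time."""
        res: List[List[int]] = []
        j = bisect_left(self.lists[0], x)
        for i, lst in enumerate(self.lists):
            k, found = j, []
            while k < len(lst) and lst[k] <= x2:
                found.append(lst[k])
                k += 1
            res.append(found)
            if i < len(self.ptr):           # follow the pointer (or stay past the end)
                j = self.ptr[i][j] if j < len(lst) else len(self.lists[i + 1])
        return res


S = [[1, 2, 3, 5, 8, 13, 21, 34, 55], [1, 3, 5, 8, 13, 21, 34, 55],
     [1, 3, 13, 34, 55], [3, 34, 55]]
N = NestedLists(S)
print(N.successors(6))        # [8, 8, 13, 34]
print(N.range_query(6, 35))   # [[8, 13, 21, 34], [8, 13, 21, 34], [13, 34], [34]]
```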

slide-54
SLIDE 54

53/180

Improving the query time

[Figure: the nested sorted lists S1 ⊇ S2 ⊇ S3 ⊇ S4, with S1 = 1, 2, 3, 5, 8, 13, 21, 34, 55 and S2 = 1, 3, 5, 8, 13, 21, 34, 55]

slide-56
SLIDE 56

55/180

Improving the query time

[Figure: the nested sorted lists S1 ⊇ S2 ⊇ S3 ⊇ S4 and the query range [6, 35]]

slide-57
SLIDE 57

56/180

Fractional cascading

Now we do “the same” on the associated structures of a 2-dimensional range tree. Note that in every associated structure, we search with the same values y and y′

◮ Replace all associated structures except the one at the root by a linked list
◮ For every list element (and every leaf of the associated structure of the root), store two pointers to the appropriate list elements in the lists of the left child and of the right child: the element with the smallest y-coordinate that is at least as large
slide-58
SLIDE 58

57/180

Fractional cascading

slide-59
SLIDE 59

58/180

Fractional cascading

slide-60
SLIDE 60

59/180

Fractional cascading

[Figure: a 2D range tree on the points (2, 19), (5, 80), (7, 10), (8, 37), (12, 3), (15, 99), (17, 62), (21, 49), (33, 30), (41, 95), (52, 23), (58, 59), (67, 89), (93, 70); the main tree is built on x-coordinate]

slide-61
SLIDE 61

60/180

Fractional cascading

[Figure: the associated structures of this range tree as sorted lists of y-coordinates, connected by the cascading pointers]

slide-62
SLIDE 62

61/180

Fractional cascading

[Figure: the same range tree with its associated lists]

Query range: [4, 58] × [19, 65]

slide-65
SLIDE 65

64/180

Fractional cascading

Instead of doing a 1D range query on the associated structure of some node ν, we find the leaf where the search to y would end in O(1) time, via the direct pointer in the associated structure of the parent of ν. The number of grey nodes reduces to O(log n)

slide-66
SLIDE 66

65/180

Result

Theorem: A set of n points in d-dimensional space with d ≥ 2 can be preprocessed in O(n log^(d−1) n) time into a data structure of O(n log^(d−1) n) size so that any d-dimensional range query can be answered in O(log^(d−1) n + k) time, where k is the number of answers reported. Multiple points with the same x- or y-coordinate need to be handled with care.

slide-67
SLIDE 67

66/180

Windowing

Zoom in; re-center and zoom in; select by outlining

slide-68
SLIDE 68

67/180

Windowing

slide-69
SLIDE 69

68/180

Windowing

Given a set of n axis-parallel line segments, preprocess them into a data structure so that the ones that intersect a query rectangle can be reported efficiently

slide-70
SLIDE 70

69/180

Windowing

How can a rectangle and an axis-parallel line segment intersect?

slide-71
SLIDE 71

70/180

Windowing

Essentially two types:

◮ Segments with at least one endpoint inside the rectangle
◮ Segments with both endpoints outside the rectangle

Segments of the latter type always cross the boundary of the rectangle; in particular, a horizontal one crosses the left side and a vertical one crosses the bottom side

slide-72
SLIDE 72

71/180

Windowing

Instead of storing axis-parallel segments and searching with a rectangle, we will:

◮ store the segment endpoints and query with the rectangle
◮ store the segments and query with the left side and the bottom side of the rectangle

Note that the query problem is at least as hard as rectangular range searching in point sets

slide-73
SLIDE 73

72/180

Windowing

Instead of storing axis-parallel segments and searching with a rectangle, we will:

◮ store the segment endpoints and query with the rectangle
◮ store the segments and query with the left side and the bottom side of the rectangle

Question: How often might we report the same segment?

slide-74
SLIDE 74

73/180

Windowing

Instead of storing axis-parallel segments and searching with a rectangle, we will:

◮ store the segment endpoints and query with the rectangle (use a range tree)
◮ store the segments and query with the left side and the bottom side of the rectangle (we need to develop a data structure)

slide-75
SLIDE 75

74/180

Windowing

Current problem of our interest: Given a set of horizontal (vertical) line segments, preprocess them into a data structure so that the ones intersecting a vertical (horizontal) query segment can be reported efficiently

slide-76
SLIDE 76

75/180

Windowing

Simpler query problem: What if the vertical query segment is a full line? Then the problem is essentially 1-dimensional

slide-77
SLIDE 77

76/180

Interval querying

Given a set I of n intervals on the real line, preprocess them into a data structure so that the ones containing a query point (value) can be reported efficiently

slide-78
SLIDE 78

77/180

Splitting a set of intervals

The median x of the 2n endpoints partitions the intervals into three subsets:

◮ Intervals Ileft fully left of x
◮ Intervals Imid that contain (intersect) x
◮ Intervals Iright fully right of x

slide-79
SLIDE 79

78/180

Interval tree: recursive definition

The interval tree for I has a root node ν that stores x, and

◮ the intervals Ileft are stored in the left subtree of ν
◮ the intervals Imid are stored with ν
◮ the intervals Iright are stored in the right subtree of ν

The left and right subtrees are proper interval trees for Ileft and Iright. How many intervals can be in Imid? How should we store Imid?

slide-80
SLIDE 80

79/180

Interval tree: left and right lists

How is Imid stored?


Observe: if the query point is left of x, then only the left endpoint determines whether an interval is an answer. Symmetrically: if the query point is right of x, then only the right endpoint determines whether an interval is an answer

slide-81
SLIDE 81

80/180

Interval tree: left and right lists


Make a list Lleft using the left-to-right order of the left endpoints of Imid
Make a list Lright using the right-to-left order of the right endpoints of Imid
Store both lists as associated structures with ν

slide-82
SLIDE 82

81/180

Interval tree: example

[Figure: interval tree for the intervals s1, ..., s12; each node shows its lists Lleft and Lright]

slide-83
SLIDE 83

82/180

Interval tree: storage

The main tree has O(n) nodes. The total length of all lists is 2n, because each interval is stored exactly twice, in Lleft and in Lright, and only at one node. Consequently, the interval tree uses O(n) storage

slide-84
SLIDE 84

83/180

Interval querying

Algorithm QUERYINTERVALTREE(ν, qx)
1. if ν is not a leaf
2.   then if qx < xmid(ν)
3.     then traverse list Lleft(ν), starting at the interval with the leftmost endpoint, reporting all the intervals that contain qx. Stop as soon as an interval does not contain qx.
4.       QUERYINTERVALTREE(lc(ν), qx)
5.     else traverse list Lright(ν), starting at the interval with the rightmost endpoint, reporting all the intervals that contain qx. Stop as soon as an interval does not contain qx.
6.       QUERYINTERVALTREE(rc(ν), qx)
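A Python sketch of the interval tree, with construction and the query above (the recursion written as a loop along one root-to-leaf path). Assumptions: closed intervals given as pairs (a, b) with a ≤ b, Python lists for Lleft and Lright, and hypothetical names throughout. This sketch re-sorts the endpoints at every node, so it builds in O(n log^2 n) time rather than the O(n log n) of the theorem, which requires presorting.

```python
from typing import List, Optional, Tuple

Interval = Tuple[float, float]   # closed interval [a, b] with a <= b


class INode:
    def __init__(self, xmid: float, mid: List[Interval],
                 left: Optional["INode"], right: Optional["INode"]):
        self.xmid = xmid
        self.l_left = sorted(mid)                           # Lleft: increasing left endpoint
        self.l_right = sorted(mid, key=lambda iv: -iv[1])   # Lright: decreasing right endpoint
        self.left, self.right = left, right


def build(intervals: List[Interval]) -> Optional[INode]:
    if not intervals:
        return None                                         # empty leaf
    ends = sorted(e for iv in intervals for e in iv)
    xmid = ends[(len(ends) - 1) // 2]                       # median of the 2n endpoints
    i_left = [iv for iv in intervals if iv[1] < xmid]
    i_right = [iv for iv in intervals if iv[0] > xmid]
    i_mid = [iv for iv in intervals if iv[0] <= xmid <= iv[1]]   # never empty
    return INode(xmid, i_mid, build(i_left), build(i_right))


def query(v: Optional[INode], qx: float) -> List[Interval]:
    out: List[Interval] = []
    while v is not None:                                    # one root-to-leaf path
        if qx < v.xmid:
            for iv in v.l_left:                             # right end >= xmid > qx holds
                if iv[0] > qx:                              # first non-answer: stop
                    break
                out.append(iv)
            v = v.left
        else:
            for iv in v.l_right:                            # left end <= xmid <= qx holds
                if iv[1] < qx:
                    break
                out.append(iv)
            v = v.right
    return out


T = build([(1, 5), (2, 3), (6, 9)])
print(sorted(query(T, 2.5)))   # [(1, 5), (2, 3)]
```

The set Imid is never empty because xmid is itself an endpoint of some interval, so every recursive call works on strictly fewer intervals.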

slide-85
SLIDE 85

84/180

Interval tree: query example

[Figure: interval tree for the intervals s1, ..., s12 with the lists Lleft and Lright at each node, and a query point]

slide-96
SLIDE 96

95/180

Interval tree: query time

The query follows only one path in the tree, and that path has length O(log n). The query traverses O(log n) lists. Traversing a list with k′ answers takes O(1 + k′) time. The total time for list traversal is therefore O(log n + k), with k the total number of answers reported (no answer is found more than once). The query time is O(log n) + O(log n + k) = O(log n + k)

slide-97
SLIDE 97

96/180

Interval tree: construction

Algorithm CONSTRUCTINTERVALTREE(I)

  • Input. A set I of intervals on the real line
  • Output. The root of an interval tree for I

1. if I = ∅
2.   then return an empty leaf
3.   else create a node ν. Compute xmid, the median of the set of interval endpoints, and store xmid with ν
4.     Compute Imid and construct two sorted lists for Imid: a list Lleft(ν) sorted on left endpoint and a list Lright(ν) sorted on right endpoint. Store these two lists at ν
5.     lc(ν) ← CONSTRUCTINTERVALTREE(Ileft)
6.     rc(ν) ← CONSTRUCTINTERVALTREE(Iright)
7.     return ν

slide-98
SLIDE 98

97/180

Interval tree: result

Theorem: An interval tree for a set I of n intervals uses O(n) storage and can be built in O(n log n) time. All intervals that contain a query point can be reported in O(log n + k) time, where k is the number of reported intervals.

slide-99
SLIDE 99

98/180

Back to the plane

slide-100
SLIDE 100

99/180

Back to the plane

Suppose we use an interval tree on the x-intervals of the horizontal line segments. Then the lists Lleft and Lright are no longer suitable for solving the query problem for the segments corresponding to Imid

slide-101
SLIDE 101

100/180

Back to the plane

[Figure: interval tree on the x-intervals of the horizontal segments s1, ..., s12, with the lists Lleft and Lright at each node]

slide-102
SLIDE 102

101/180

Back to the plane

[Figure: the same interval tree on the segments s1, ..., s12, and a vertical query segment q]

slide-103
SLIDE 103

102/180

Back to the plane

[Figure: the segments s5, s6, s7 stored at one node (lists s7, s5, s6 and s5, s6, s7), and the vertical query segment q]

slide-104
SLIDE 104

103/180

Back to the plane

[Figure: the segments s5, s6, s7 stored at one node (lists s7, s5, s6 and s5, s6, s7), and the vertical query segment q]

slide-105
SLIDE 105

104/180

Back to the plane

[Figure: a node whose Imid is { s2, s5, s6, s7, s9, s22 }, and the vertical query segment q]

slide-107
SLIDE 107

106/180

Back to the plane

[Figure: a node whose Imid is { s2, s5, s6, s7, s9, s22 }, and the vertical query segment q]

slide-108
SLIDE 108

107/180

Segment intersection queries

We can use a range tree as the associated structure; we only need one, which stores all of the endpoints, to replace Lleft and Lright. Instead of traversing Lleft or Lright, we perform a query with the region left or right, respectively, of q

slide-109
SLIDE 109

108/180

Segment intersection queries

[Figure: a node whose Imid is { s2, s5, s6, s7, s9, s22 }, with a range tree on all endpoints of these segments, and the vertical query segment q]

slide-110
SLIDE 110

109/180

Segment intersection queries

In total, there are O(n) range trees that together store 2n points, so the total storage needed by all associated structures is O(n log n). A query with a vertical segment leads to O(log n) range queries. If fractional cascading is used in the associated structures, the overall query time is O(log^2 n + k). Question: How about the construction time?

slide-111
SLIDE 111

110/180

Result

Theorem: A set of n horizontal line segments can be stored in a data structure of size O(n log n) such that intersection queries with a vertical line segment can be performed in O(log^2 n + k) time, where k is the number of segments reported. The data structure can be built in O(n log n) time.
slide-112
SLIDE 112

110/180

Result

Theorem: A set of n horizontal line segments can be stored in a data structure of size O(n log n) such that intersection queries with a vertical line segment can be performed in O(log^2 n + k) time, where k is the number of segments reported. The data structure can be built in O(n log n) time.

Theorem: A set of n axis-parallel line segments can be stored in a data structure of size O(n log n) such that windowing queries can be performed in O(log^2 n + k) time, where k is the number of segments reported. The data structure can be built in O(n log n) time.

slide-114
SLIDE 114

111/180

Outlook: 3- and 4-sided ranges

Considering the associated structure, we only need 3-sided range queries, whereas the range tree provides 4-sided range queries. Can the 3-sided range query problem be solved more efficiently than the 4-sided (rectangular) range query problem?

slide-115
SLIDE 115

112/180

Outlook: arbitrary line segments

Given a set of n arbitrary, non-crossing line segments, can we preprocess them into a data structure so that the ones that intersect a query rectangle can be reported efficiently?

slide-116
SLIDE 116

113/180

Scheme of structure

[Figure: a node whose Imid is { s2, s5, s6, s7, s9, s22 }, with one structure on all left endpoints and one on all right endpoints of these segments, and the vertical query segment q]

slide-117
SLIDE 117

114/180

Heap and search tree

A priority search tree is like a heap on x-coordinate and a binary search tree on y-coordinate at the same time. Recall the heap:

[Figure: a min-heap storing the values 1 to 14]

slide-118
SLIDE 118

115/180

Heap and search tree

A priority search tree is like a heap on x-coordinate and a binary search tree on y-coordinate at the same time. Recall the heap:

[Figure: a min-heap storing the values 1 to 14]

Report all values ≤ 4
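Reporting all values ≤ 4 only explores nodes whose parent was reported, so it takes O(1 + k) time for k answers. A sketch on Python's array-based `heapq` min-heap (the children of index i are at 2i + 1 and 2i + 2); `heapq.heapify` may arrange the values differently from the figure, and `report_at_most` is a hypothetical name.

```python
import heapq
from typing import List, Optional


def report_at_most(heap: List[int], q: int, i: int = 0,
                   out: Optional[List[int]] = None) -> List[int]:
    """Report all values <= q in an array-based min-heap in O(1 + k) time.

    A node is visited only if its parent was reported, so at most 2k + 1 nodes are visited."""
    if out is None:
        out = []
    if i < len(heap) and heap[i] <= q:
        out.append(heap[i])
        report_at_most(heap, q, 2 * i + 1, out)   # left child
        report_at_most(heap, q, 2 * i + 2, out)   # right child
    return out


values = [6, 1, 2, 3, 7, 4, 8, 11, 5, 13, 10, 14, 12, 9]
heapq.heapify(values)                      # min-heap order: each value <= its children
print(sorted(report_at_most(values, 4)))   # [1, 2, 3, 4]
```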

slide-119
SLIDE 119

116/180

Priority search tree

If P = ∅, then a priority search tree is an empty leaf. Otherwise, let pmin be the leftmost point in P, and let ymid be the median y-coordinate of P \ {pmin}. The priority search tree has a node ν that stores pmin and ymid, and a left subtree and right subtree for the points in P \ {pmin} with y-coordinate ≤ ymid and > ymid, respectively

slide-120
SLIDE 120

117/180

Priority search tree

[Figure: a set of points p1, ..., p14 and its priority search tree]

slide-131
SLIDE 131

128/180

Query algorithm

Algorithm QUERYPRIOSEARCHTREE(T, (−∞ : qx] × [qy : q′y])
1. Search with qy and q′y in T
2. Let νsplit be the node where the two search paths split
3. for each node ν on the search path of qy or q′y
4.   do if p(ν) ∈ (−∞ : qx] × [qy : q′y] then report p(ν)
5. for each node ν on the path of qy in the left subtree of νsplit
6.   do if the search path goes left at ν
7.     then REPORTINSUBTREE(rc(ν), qx)
8. for each node ν on the path of q′y in the right subtree of νsplit
9.   do if the search path goes right at ν
10.    then REPORTINSUBTREE(lc(ν), qx)

slide-132
SLIDE 132

129/180

Structure of the query

slide-134
SLIDE 134

131/180

Query algorithm

REPORTINSUBTREE(ν, qx)

  • Input. The root ν of a subtree of a priority search tree and a value qx
  • Output. All points in the subtree with x-coordinate at most qx

1. if ν is not a leaf and (p(ν))x ≤ qx
2.   then report p(ν)
3.     REPORTINSUBTREE(lc(ν), qx)
4.     REPORTINSUBTREE(rc(ν), qx)

This subroutine takes O(1 + k) time, for k reported answers
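A Python sketch of the priority search tree. The build follows the definition above; the query is a recursive variation on QUERYPRIOSEARCHTREE rather than its exact control flow: it tracks the y-interval (lo, hi] covered by every subtree, treats subtrees whose interval lies inside [qy : q′y] as the subtrees between the search paths and hands them to REPORTINSUBTREE, and prunes any subtree whose root point has x > qx. Assumptions: distinct x- and y-coordinates, and hypothetical names throughout.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[int, int]


class PNode:
    def __init__(self, p: Point, ymid: int,
                 left: Optional["PNode"], right: Optional["PNode"]):
        self.p, self.ymid, self.left, self.right = p, ymid, left, right


def build_pst(by_y: List[Point]) -> Optional[PNode]:
    """Build from points sorted on y; x- and y-coordinates assumed distinct."""
    if not by_y:
        return None                              # empty leaf
    pmin = min(by_y)                             # leftmost point (smallest x)
    rest = [p for p in by_y if p != pmin]        # P \ {pmin}, still sorted on y
    if not rest:
        return PNode(pmin, pmin[1], None, None)
    mid = (len(rest) - 1) // 2
    ymid = rest[mid][1]                          # median y-coordinate of P \ {pmin}
    return PNode(pmin, ymid, build_pst(rest[:mid + 1]), build_pst(rest[mid + 1:]))


def report_in_subtree(v: Optional[PNode], qx: int, out: List[Point]) -> None:
    """REPORTINSUBTREE: heap-style search on x, O(1 + k) for k reported points."""
    if v is not None and v.p[0] <= qx:
        out.append(v.p)
        report_in_subtree(v.left, qx, out)
        report_in_subtree(v.right, qx, out)


def query_pst(v: Optional[PNode], qx: int, qy: int, qy2: int, out: List[Point],
              lo: float = -math.inf, hi: float = math.inf) -> None:
    """Report points in (-inf : qx] x [qy : qy2]; all points below v have y in (lo, hi]."""
    if v is None or v.p[0] > qx:                 # heap order on x: prune the whole subtree
        return
    if qy <= lo and hi <= qy2:                   # subtree lies between the search paths
        report_in_subtree(v, qx, out)
        return
    if qy <= v.p[1] <= qy2:                      # node on a search path: test its point
        out.append(v.p)
    if qy <= v.ymid:                             # left subtree holds y in (lo, ymid]
        query_pst(v.left, qx, qy, qy2, out, lo, v.ymid)
    if qy2 > v.ymid:                             # right subtree holds y in (ymid, hi]
        query_pst(v.right, qx, qy, qy2, out, v.ymid, hi)


P = [(i, (37 * i) % 101) for i in range(101)]   # distinct x and y
T = build_pst(sorted(P, key=lambda p: p[1]))
res: List[Point] = []
query_pst(T, 30, 10, 60, res)
print(sorted(res))
```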

slide-135
SLIDE 135

132/180

Query algorithm

The search paths to qy and q′y have O(log n) nodes, and at each node O(1) time is spent. No nodes outside the search paths and the subtrees between them are ever visited. Subtrees of nodes between the search paths are queried like a heap, and we spend O(1 + k′) time on each one. The total query time is O(log n + k), if k points are reported

slide-136
SLIDE 136

133/180

Priority search tree: result

Theorem: A priority search tree for a set P of n points uses O(n) storage and can be built in O(n log n) time. All points that lie in a 3-sided query range can be reported in O(log n + k) time, where k is the number of reported points

slide-137
SLIDE 137

134/180

Scheme of structure

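[Figure: vertical query segment q and the horizontal segments s2, s5, s6, s7, s9, s22 stored at one node; the node has one associated structure on all left endpoints and one on all right endpoints of { s2, s5, s6, s7, s9, s22 }]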

slide-138
SLIDE 138

135/180

Storage of the structure

Question: What are the storage requirements of the structure for querying with a vertical segment in a set of horizontal segments?

slide-139
SLIDE 139

136/180

Query time of the structure

Question: What is the query time of the structure for querying with a vertical segment in a set of horizontal segments?

slide-140
SLIDE 140

137/180

Result

Theorem: A set of n horizontal line segments can be stored in a data structure with size O(n) such that intersection queries with a vertical line segment can be performed in O(log² n + k) time, where k is the number of segments reported

slide-141
SLIDE 141

138/180

Result

Recall that the windowing problem is solved with a combination of a range tree and the structure just described.

Theorem: A set of n axis-parallel line segments can be stored in a data structure with size O(n log n) such that windowing queries can be performed in O(log² n + k) time, where k is the number of segments reported

slide-142
SLIDE 142

139/180

Windowing

Given a set of n arbitrary, non-crossing line segments, preprocess them into a data structure so that the ones that intersect a query rectangle can be reported efficiently

slide-143
SLIDE 143

140/180

Windowing

Two cases of intersection:

◮ An endpoint lies inside the query window; solve with range trees

◮ The segment intersects a side of the query window; solve how?

slide-144
SLIDE 144

141/180

Using a bounding box?

If the query window intersects the line segment, then it also intersects the bounding box of the line segment (whose sides are axis-parallel segments). So we could search in the 4n bounding box sides

slide-145
SLIDE 145

142/180

Using a bounding box?

But: the query window intersecting the sides of a bounding box does not imply that it intersects the corresponding segment
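A quick numerical check of this caveat, as a hedged Python sketch: the segment test is the Liang-Barsky clipping method, swapped in here only to decide segment/rectangle intersection, and the helper names are illustrative. Boxes are (xmin, xmax, ymin, ymax) and are treated as closed.

```python
def segment_meets_box(p, q, xmin, xmax, ymin, ymax):
    """Does the segment pq meet the closed box [xmin, xmax] x [ymin, ymax]?
    Liang-Barsky clipping: keep the parameter range [t0, t1] of p + t(q - p)
    that satisfies every side constraint a * t <= b."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    t0, t1 = 0.0, 1.0
    for a, b in ((-dx, p[0] - xmin), (dx, xmax - p[0]),
                 (-dy, p[1] - ymin), (dy, ymax - p[1])):
        if a == 0:
            if b < 0:
                return False          # parallel to this side and outside it
        elif a < 0:
            t0 = max(t0, b / a)       # lower bound on t
        else:
            t1 = min(t1, b / a)       # upper bound on t
        if t0 > t1:
            return False
    return True


def bounding_box(p, q):
    return (min(p[0], q[0]), max(p[0], q[0]), min(p[1], q[1]), max(p[1], q[1]))


def boxes_meet(b1, b2):
    """Closed boxes given as (xmin, xmax, ymin, ymax)."""
    return b1[0] <= b2[1] and b2[0] <= b1[1] and b1[2] <= b2[3] and b2[2] <= b1[3]


seg = ((0, 0), (10, 10))
window = (6, 9, 0, 3)                              # xmin, xmax, ymin, ymax
print(boxes_meet(bounding_box(*seg), window))      # True
print(segment_meets_box(*seg, *window))            # False
```

The diagonal segment from (0, 0) to (10, 10) has the bounding box [0, 10] × [0, 10], which meets the window [6, 9] × [0, 3] (the window even crosses the bottom side of the box), while the segment itself passes above the window.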

slide-146
SLIDE 146

143/180

Windowing

Current problem of interest: Given a set of arbitrarily oriented, non-crossing line segments, preprocess them into a data structure so that the ones intersecting a vertical (horizontal) query segment can be reported efficiently

slide-147
SLIDE 147

144/180

Using an interval tree?


slide-148
SLIDE 148

145/180

Interval querying

Given a set I of n intervals on the real line, preprocess them into a data structure so that the ones containing a query point (value) can be reported efficiently. We have the interval tree, but we will develop an alternative solution

slide-149
SLIDE 149

146/180

Interval querying

Given a set S = {s1,s2,...,sn } of n segments on the real line, preprocess them into a data structure so that the ones containing a query point (value) can be reported efficiently

[Figure: segments s1, ..., s8 on the real line]

The new structure is called the segment tree

slide-150
SLIDE 150

147/180

Locus approach

The locus approach is the idea to partition the solution space into parts with equal answer sets

[Figure: segments s1, ..., s8 on the real line]

For the set S of segments, we get different answer sets before and after every endpoint

slide-151
SLIDE 151

148/180

Locus approach

Let p1, p2,..., pm be the sorted set of unique endpoints of the intervals; m ≤ 2n

[Figure: segments s1, ..., s8 with their sorted endpoints labeled p1, ..., p8]

The real line is partitioned into (−∞, p1), [p1, p1],(p1, p2), [p2, p2], (p2, p3),..., (pm,+∞), these are called the elementary intervals
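A short Python sketch that lists the elementary intervals, assuming segments are given as (a, b) pairs of numbers with a ≤ b; the function name and the ('open' / 'closed', left, right) encoding are illustrative choices, not the lecture's notation.

```python
def elementary_intervals(segments):
    """Elementary intervals of closed segments [a, b], left to right, as
    ('open', left, right) for (left, right) and ('closed', p, p) for [p, p]."""
    inf = float("inf")
    ps = sorted({e for s in segments for e in s})   # unique endpoints p1 < ... < pm
    if not ps:
        return [("open", -inf, inf)]
    out = [("open", -inf, ps[0])]
    for i, p in enumerate(ps):
        out.append(("closed", p, p))
        nxt = ps[i + 1] if i + 1 < len(ps) else inf
        out.append(("open", p, nxt))
    return out


ivs = elementary_intervals([(1, 3), (2, 5)])
print(len(ivs))   # 9, i.e. 2m + 1 with m = 4 endpoints
print(ivs[:3])    # [('open', -inf, 1), ('closed', 1, 1), ('open', 1, 2)]
```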

slide-152
SLIDE 152

149/180

Locus approach

We could make a binary search tree that has a leaf for every elementary interval (−∞, p1), [p1, p1], (p1, p2), [p2, p2], (p2, p3), ..., (pm, +∞). Each segment from the set S can be stored with all leaves whose elementary interval it contains: [pi, pj] is stored with [pi, pi], (pi, pi+1), ..., [pj, pj]. A stabbing query with point q is then solved by finding the unique leaf that contains q and reporting all segments that it stores
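The locus approach can be sketched in a few lines of Python. As assumptions for illustration: a sorted endpoint array searched with bisect_left replaces the balanced binary search tree, leaves are numbered 0 to 2m with endpoints indexed from 0, and LocusStructure is a hypothetical name.

```python
from bisect import bisect_left


class LocusStructure:
    """Naive locus approach: one bucket per elementary interval; every segment
    [a, b] is copied into each bucket whose elementary interval it contains."""

    def __init__(self, segments):
        self.ps = sorted({e for s in segments for e in s})   # p_0 < ... < p_{m-1}
        m = len(self.ps)
        # bucket 2i is the open interval just left of p_i, bucket 2i + 1 is [p_i, p_i],
        # and bucket 2m is (p_{m-1}, +inf)
        self.buckets = [[] for _ in range(2 * m + 1)]
        for a, b in segments:
            i, j = bisect_left(self.ps, a), bisect_left(self.ps, b)
            for leaf in range(2 * i + 1, 2 * j + 2):         # [p_i, p_i] .. [p_j, p_j]
                self.buckets[leaf].append((a, b))

    def leaf_of(self, q):
        """Index of the unique elementary interval containing q."""
        i = bisect_left(self.ps, q)
        return 2 * i + 1 if i < len(self.ps) and self.ps[i] == q else 2 * i

    def stab(self, q):
        return list(self.buckets[self.leaf_of(q)])

    def storage(self):
        return sum(len(b) for b in self.buckets)


nested = [(1, 6), (2, 5), (3, 4)]
loc = LocusStructure(nested)
print(loc.stab(3.5))   # [(1, 6), (2, 5), (3, 4)]
print(loc.storage())   # 21
```

For n nested segments [i, 2n + 1 − i] (i = 1, ..., n), segment i is copied into 4n + 3 − 4i leaves, for a total of 2n² + n copies; with n = 3 this gives the 21 printed above.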

slide-153
SLIDE 153

150/180

Locus approach

[Figure: balanced binary search tree whose leaves are the elementary intervals (−∞, p1), [p1, p1], (p1, p2), ..., [p8, p8], (p8, +∞), shown with the segments s1, ..., s8]

slide-154
SLIDE 154

151/180

Locus approach

[Figure: segments s1, ..., s8 with endpoints p1, ..., p8]

slide-155
SLIDE 155

152/180

Locus approach

Question: What are the storage requirements and what is the query time of this solution?

slide-156
SLIDE 156

153/180

Towards segment trees

In the tree, the leaves store elementary intervals. But each internal node corresponds to an interval too: the interval that is the union of the elementary intervals of all leaves below it

[Figure: leaves (pi, pi+1), [pi+1, pi+1], (pi+1, pi+2), [pi+2, pi+2] below a node whose interval is their union (pi, pi+2]]

slide-157
SLIDE 157

154/180

Towards segment trees

[Figure: the tree over the elementary intervals of s1, ..., s8 (endpoints p1, ..., p8), with example internal-node intervals (p1, p2], (p2, p4], and (p6, +∞)]

slide-158
SLIDE 158

155/180

Towards segment trees

Let Int(ν) denote the interval of node ν. To avoid quadratic storage, we store any segment sj as high as possible in the tree whose leaves correspond to elementary intervals. More precisely: sj is stored with ν if and only if Int(ν) ⊆ sj but Int(parent(ν)) ⊈ sj
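The rule "store sj with ν if Int(ν) ⊆ sj but Int(parent(ν)) ⊈ sj" is exactly what a top-down descent produces if it stops at the first node whose interval is contained in the segment. A minimal Python sketch, assuming the leaves are numbered and a node is identified by (depth, first leaf, last leaf); the function name is illustrative.

```python
def canonical_nodes(lo, hi, node_lo, node_hi, depth=0, out=None):
    """Nodes storing a segment that covers leaves lo..hi, in a balanced tree whose
    node v covers leaves node_lo..node_hi. A node is reported as
    (depth, first leaf, last leaf); we only descend below nodes whose interval
    is not contained in the segment, so Int(parent(v)) is never contained."""
    if out is None:
        out = []
    if hi < node_lo or node_hi < lo:        # Int(v) and the segment are disjoint
        return out
    if lo <= node_lo and node_hi <= hi:     # Int(v) is contained: store here, stop
        out.append((depth, node_lo, node_hi))
        return out
    mid = (node_lo + node_hi) // 2
    canonical_nodes(lo, hi, node_lo, mid, depth + 1, out)
    canonical_nodes(lo, hi, mid + 1, node_hi, depth + 1, out)
    return out


print(canonical_nodes(1, 6, 0, 7))  # [(3, 1, 1), (2, 2, 3), (2, 4, 5), (3, 6, 6)]
```

With leaves numbered 0 to 7, the four nodes partition leaves 1 to 6, and no depth receives more than two of them.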

slide-159
SLIDE 159

156/180

Towards segment trees

[Figure: segment sj and node ν with Int(ν) = (pi, pi+2] and Int(parent(ν)) = (pi−2, pi+2]]

slide-160
SLIDE 160

157/180

Segment trees

A segment tree on a set S of segments is a balanced binary search tree on the elementary intervals defined by S, where each node stores its interval and its canonical subset of S in an (unsorted) list. The canonical subset (of S) of a node ν is the subset of segments sj for which Int(ν) ⊆ sj but Int(parent(ν)) ⊈ sj

slide-161
SLIDE 161

158/180

Segment trees

[Figure: segment tree on s1, ..., s8 (endpoints p1, ..., p8), with each node labeled by its canonical subset, e.g. {s1, s3, s4} or {s7, s8}]

slide-163
SLIDE 163

160/180

Segment trees

Question: Why are no segments stored with nodes on the leftmost and rightmost paths of the segment tree?

slide-164
SLIDE 164

161/180

Query algorithm

The query algorithm is trivial: For a query point q, follow the path down the tree to the elementary interval that contains q, and report all segments stored in the lists with the nodes on that path
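Putting the pieces together, here is a compact Python sketch of a segment tree for closed intervals and its stabbing query. Assumptions for illustration: nodes are identified by their range of leaf indices rather than by explicit node objects, the tree shape comes from splitting leaf ranges at the middle, and SegmentTree is a hypothetical name.

```python
from bisect import bisect_left


class SegmentTree:
    """Segment tree for closed intervals [a, b]: the leaves are the 2m + 1
    elementary intervals, and a node, identified here by its range of leaf
    indices, keeps its canonical subset in an unsorted list."""

    def __init__(self, segments):
        self.ps = sorted({e for s in segments for e in s})
        self.n_leaves = 2 * len(self.ps) + 1
        self.lists = {}                      # (first leaf, last leaf) -> canonical subset
        for s in segments:
            lo = 2 * bisect_left(self.ps, s[0]) + 1     # leaf [a, a]
            hi = 2 * bisect_left(self.ps, s[1]) + 1     # leaf [b, b]
            self._insert(s, lo, hi, 0, self.n_leaves - 1)

    def _insert(self, s, lo, hi, a, b):
        if hi < a or b < lo:                 # Int(v) disjoint from s
            return
        if lo <= a and b <= hi:              # Int(v) contained in s; the parent's was not
            self.lists.setdefault((a, b), []).append(s)
            return
        m = (a + b) // 2
        self._insert(s, lo, hi, a, m)
        self._insert(s, lo, hi, m + 1, b)

    def _leaf(self, q):
        i = bisect_left(self.ps, q)
        return 2 * i + 1 if i < len(self.ps) and self.ps[i] == q else 2 * i

    def stab(self, q):
        """Walk from the root to the leaf containing q, reporting every list met."""
        leaf, out = self._leaf(q), []
        a, b = 0, self.n_leaves - 1
        while True:
            out.extend(self.lists.get((a, b), []))
            if a == b:
                return out
            m = (a + b) // 2
            a, b = (a, m) if leaf <= m else (m + 1, b)


t = SegmentTree([(1, 6), (2, 5), (3, 4), (5, 8)])
print(sorted(t.stab(5)))   # [(1, 6), (2, 5), (5, 8)]
```

Because the canonical nodes of a segment cover disjoint sets of elementary intervals, a segment containing q is stored at exactly one node on the search path, so stab never reports duplicates.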

slide-165
SLIDE 165

162/180

Example query

[Figure: example stabbing query in the segment tree on s1, ..., s8]

slide-167
SLIDE 167

164/180

Query time

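The query time is O(log n + k), where k is the number of segments reported: the search path has O(log n) nodes, and every segment containing q is stored at exactly one node on it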

slide-168
SLIDE 168

165/180

Segments stored at many nodes

A segment can be stored in several lists of nodes. How bad can the storage requirements get?

slide-169
SLIDE 169

166/180

Segments stored at many nodes

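Lemma: Any segment is stored at no more than two nodes of the same depth

Proof: Suppose a segment si is stored at three nodes ν1, ν2, and ν3 at the same depth from the root, from left to right

[Figure: nodes ν1, ν2, ν3 at the same depth, each storing si, and the node parent(ν2)]

Since si is an interval that contains Int(ν1) and Int(ν3), it contains everything in between. The sibling of ν2 is a neighbor of ν2 at the same depth, so it is ν1, ν3, or a node between them; hence Int(parent(ν2)) ⊆ si. But then si is not stored at ν2, a contradiction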

slide-170
SLIDE 170

167/180

Segments stored at many nodes

If a segment tree has depth O(log n), then any segment is stored in at most O(log n) lists (at most two per depth) ⇒ the total size of all lists is O(n log n). The main tree uses O(n) storage. The storage requirement of a segment tree on n segments is therefore O(n log n)

slide-171
SLIDE 171

168/180

Result

Theorem: A segment tree storing n segments (=intervals) on the real line uses O(n log n) storage, can be built in O(n log n) time, and stabbing queries can be answered in O(log n + k) time, where k is the number of segments reported.

Property: For any query, all segments containing the query point are stored in the lists of O(log n) nodes

slide-172
SLIDE 172

169/180

Back to windowing

Problem arising from windowing: Given a set of arbitrarily oriented, non-crossing line segments, preprocess them into a data structure so that the ones intersecting a vertical (horizontal) query segment can be reported efficiently

slide-173
SLIDE 173

170/180

Idea for solution

The main idea is to build a segment tree on the x-projections of the 2D segments, and replace the associated lists with a more suitable data structure

slide-174
SLIDE 174

171/180

[Figure: segment tree on the x-projections of the 2D segments s1, ..., s8 (endpoints p1, ..., p8), with the canonical subsets stored at its nodes]

slide-176
SLIDE 176

173/180

Observe that nodes now correspond to vertical slabs of the plane (with or without left and right bounding lines), and:

◮ if a segment si is stored with a node ν, then it crosses the slab of ν completely, but not the slab of the parent of ν

◮ the segments crossing a slab have a well-defined top-to-bottom order

[Figure: slab Int(ν) and a segment sj, annotated "sj is stored at one or more nodes below ν"]

slide-177
SLIDE 177

174/180

[Figure: detail near p3 and p4 showing segments s1, s3, s4, s5 and the canonical subsets {s1, s3, s4} and {s5}]

slide-179
SLIDE 179

176/180

Querying

Recall that a query is done with a vertical line segment q. Only segments of S stored with nodes on the path down the tree using the x-coordinate of q can be answers. At any such node, the query problem is: which of the segments (that cross the slab completely) intersect the vertical query segment q?


slide-180
SLIDE 180

177/180

Querying

We store the canonical subset of a node ν in a balanced binary search tree that follows the bottom-to-top order in its leaves

[Figure: segments s1, ..., s7 crossing a slab, the vertical query segment q, and a balanced binary search tree on their bottom-to-top order]
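A Python sketch of one associated structure, under explicit assumptions: the segments stored at the node all cross the slab [xl, xr] completely, are pairwise disjoint inside it, and are not vertical. A sorted Python list with a hand-written binary search stands in for the balanced binary search tree, which is enough for a static set; all names are illustrative.

```python
def y_at(seg, x):
    """Height of the (non-vertical) segment seg at abscissa x."""
    (x1, y1), (x2, y2) = seg
    return y1 + (y2 - y1) * (x - x1) / (x2 - x1)


def build_slab(segs, xl, xr):
    """Sort segments that cross the slab [xl, xr] completely, bottom to top.
    Disjoint segments keep the same order everywhere in the slab, so any x
    inside it (here the midpoint) can be used for the comparison."""
    xm = (xl + xr) / 2
    return sorted(segs, key=lambda s: y_at(s, xm))


def query_slab(order, qx, qy1, qy2):
    """Segments in order that meet the vertical segment x = qx, qy1 <= y <= qy2,
    for qx inside the slab: a 1D range query on the bottom-to-top order."""
    lo, hi = 0, len(order)
    while lo < hi:                        # first segment with height >= qy1 at qx
        mid = (lo + hi) // 2
        if y_at(order[mid], qx) < qy1:
            lo = mid + 1
        else:
            hi = mid
    out = []
    while lo < len(order) and y_at(order[lo], qx) <= qy2:
        out.append(order[lo])
        lo += 1
    return out


A, B = ((0, 0), (10, 2)), ((-1, 3), (11, 3))
C, D = ((0, 6), (10, 4)), ((0, 9), (10, 9))
order = build_slab([D, B, C, A], 0, 10)
print(order == [A, B, C, D])                   # True
print(query_slab(order, 8, 2, 5) == [B, C])    # True
```

The binary search costs O(log n) and each reported segment O(1), which is the O(log n + k) per-node cost used in the query analysis.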

slide-181
SLIDE 181

178/180

Data structure

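A query with q follows one path down the main tree, using the x-coordinate of q. At each of the O(log n) nodes on this path, the associated tree is queried using the endpoints of q, as if it were a 1-dimensional range query; this takes O(log n + kν) time, where kν is the number of segments reported at ν. The query time is therefore O(log² n + k)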

slide-182
SLIDE 182

179/180

Data structure

The data structure for intersection queries with a vertical query segment in a set of non-crossing line segments is a segment tree where the associated structures are binary search trees on the bottom-to-top order of the segments in the corresponding slab. Since it is a segment tree with lists replaced by trees of linear size, the storage remains O(n log n)
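Combining the segment tree on x-projections with sorted associated lists gives the following end-to-end Python sketch. It assumes pairwise disjoint, non-vertical segments given as pairs of (x, y) points, uses sorted lists instead of balanced trees for the associated structures, and VerticalQueryTree is a hypothetical name; each node's list is sorted by height at an x inside its slab.

```python
from bisect import bisect_left


def y_at(seg, x):
    (x1, y1), (x2, y2) = seg
    return y1 + (y2 - y1) * (x - x1) / (x2 - x1)


class VerticalQueryTree:
    """Segment tree on the x-projections of pairwise disjoint, non-vertical 2D
    segments; each node's canonical subset is kept sorted bottom-to-top."""

    def __init__(self, segs):
        segs = [tuple(sorted(s)) for s in segs]          # left endpoint first
        self.ps = sorted({p[0] for s in segs for p in s})
        self.n = 2 * len(self.ps) + 1                    # number of elementary intervals
        self.lists = {}                                  # (first leaf, last leaf) -> list
        for s in segs:
            lo = 2 * bisect_left(self.ps, s[0][0]) + 1
            hi = 2 * bisect_left(self.ps, s[1][0]) + 1
            self._insert(s, lo, hi, 0, self.n - 1)
        for (lo, hi), lst in self.lists.items():
            xm = (self._left(lo) + self._right(hi)) / 2  # an x inside the slab
            lst.sort(key=lambda s: y_at(s, xm))

    def _left(self, k):   # left boundary of elementary interval k
        return self.ps[(k - 1) // 2] if k % 2 else self.ps[k // 2 - 1]

    def _right(self, k):  # right boundary of elementary interval k
        return self.ps[(k - 1) // 2] if k % 2 else self.ps[k // 2]

    def _insert(self, s, lo, hi, a, b):
        if hi < a or b < lo:
            return
        if lo <= a and b <= hi:                          # slab crossed completely
            self.lists.setdefault((a, b), []).append(s)
            return
        m = (a + b) // 2
        self._insert(s, lo, hi, a, m)
        self._insert(s, lo, hi, m + 1, b)

    def query(self, qx, qy1, qy2):
        """Segments meeting the vertical segment x = qx, qy1 <= y <= qy2."""
        i = bisect_left(self.ps, qx)
        leaf = 2 * i + 1 if i < len(self.ps) and self.ps[i] == qx else 2 * i
        a, b, out = 0, self.n - 1, []
        while True:
            lst = self.lists.get((a, b), [])
            lo, hi = 0, len(lst)        # 1D range query on the bottom-to-top order
            while lo < hi:
                m = (lo + hi) // 2
                if y_at(lst[m], qx) < qy1:
                    lo = m + 1
                else:
                    hi = m
            while lo < len(lst) and y_at(lst[lo], qx) <= qy2:
                out.append(lst[lo])
                lo += 1
            if a == b:
                return out
            m = (a + b) // 2
            a, b = (a, m) if leaf <= m else (m + 1, b)


segs = [((0, 0), (10, 2)), ((2, 5), (8, 5)), ((1, 9), (9, 7))]
tree = VerticalQueryTree(segs)
print(sorted(tree.query(5, 0, 6)))   # [((0, 0), (10, 2)), ((2, 5), (8, 5))]
```

A query walks one root-to-leaf path of O(log n) nodes and does an O(log n + kν) search at each, which matches the O(log² n + k) bound.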

slide-183
SLIDE 183

180/180

Result

Theorem: A set of n non-crossing line segments can be stored in a data structure of size O(n log n) so that intersection queries with a vertical query segment can be answered in O(log² n + k) time, where k is the number of answers reported

Theorem: A set of n non-crossing line segments can be stored in a data structure of size O(n log n) so that windowing queries can be answered in O(log² n + k) time, where k is the number of answers reported