SLIDE 1 DATABASES II
Spatial Access Methods (SAMs) I
V. Megalooikonomou, D. Christodoulakis
(presentation based in part on notes by Silberschatz, Korth and Sudarshan, by C. Faloutsos, and by V. S. Subrahmanian)
SLIDE 2 General Overview
Multimedia Indexing
Spatial Access Methods (SAMs)
k-d trees, point quadtrees, MX-quadtrees, z-ordering, R-trees
SLIDE 3 SAMs - Detailed outline
spatial access methods
problem definition, k-d trees, point quadtrees, MX-quadtrees, z-ordering, R-trees
SLIDE 4 Spatial Access Methods - problem
Given a collection of geometric objects
(points, lines, polygons, ...)
organize them on disk, to answer
spatial queries (like??)
SLIDE 5 Spatial Access Methods - problem
Given a collection of geometric objects
(points, lines, polygons, ...)
organize them on disk, to answer
point queries, range queries, k-nn queries, spatial joins (‘all pairs’ queries)
SLIDE 9 Spatial Access Methods - problem
Given a collection of geometric objects
(points, lines, polygons, ...)
organize them on disk, to answer
point queries, range queries, k-nn queries, spatial joins (‘all pairs’ within ε)
SLIDE 10 SAMs - motivation
Q: applications?
SLIDE 11
SAMs - motivation
[Figure: "traditional DB" (axes: salary, age) and "GIS"]
SLIDE 13
SAMs - motivation
CAD/CAM: find elements too close to each other
SLIDE 14
SAMs - motivation
CAD/CAM
SLIDE 15
SAMs - motivation
[Figure: time sequences S1, ..., Sn (day 1 to day 365), mapped to feature points F(S1), ..., F(Sn); feature axes: e.g., avg and e.g., std]
SLIDE 16 SAMs: solutions
k-d trees, point quadtrees, MX-quadtrees, z-ordering, R-trees, (grid files)
Q: how would you organize, e.g., n-dim points, on disk? (C points per disk page)
SLIDE 17 SAMs - Detailed outline
spatial access methods
problem definition, k-d trees, point quadtrees, MX-quadtrees, z-ordering, R-trees
SLIDE 18 k-d trees
Used to store k-dimensional point data
It is not used to store region data
A 2-d tree (i.e., for k=2) stores 2-dimensional point data, while a 3-d tree stores 3-dimensional point data, etc.
SLIDE 19 2-d trees – node structure
Binary trees
Info: information field
Xval, Yval: coordinates of the point associated with the node
Llink, Rlink: pointers to children
Properties (N: node):
If the level of N is even:
for all nodes M in the subtree rooted at N.Llink: M.Xval < N.Xval
for all nodes P in the subtree rooted at N.Rlink: P.Xval >= N.Xval
If the level of N is odd:
similarly, use Yvals
SLIDE 20
2-d trees – Example
SLIDE 21 2-d trees: Insertion/Search
To insert a node N into the tree pointed to by T:
If N and T agree on Xval and Yval, then overwrite T
Else (T at an even level), branch left if N.Xval < T.Xval, right otherwise
Similarly for odd levels (branching on Yvals)
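The insertion and search rules above can be sketched in a few lines of Python. This is a minimal illustration, not the course's reference code: the class name `Node` and the functions `insert` and `search` are assumptions made for the example, and the field names simply follow the node structure of the previous slide.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    info: str                        # Info field
    xval: float
    yval: float
    llink: Optional["Node"] = None
    rlink: Optional["Node"] = None

def insert(t, n, level=0):
    """Insert node n into the subtree rooted at t; return the subtree root."""
    if t is None:
        return n
    if (n.xval, n.yval) == (t.xval, t.yval):
        t.info = n.info              # same point: overwrite
        return t
    # Even levels branch on Xval, odd levels on Yval.
    go_left = n.xval < t.xval if level % 2 == 0 else n.yval < t.yval
    if go_left:
        t.llink = insert(t.llink, n, level + 1)
    else:
        t.rlink = insert(t.rlink, n, level + 1)
    return t

def search(t, x, y, level=0):
    """Return the node holding (x, y), or None."""
    while t is not None:
        if (t.xval, t.yval) == (x, y):
            return t
        go_left = x < t.xval if level % 2 == 0 else y < t.yval
        t = t.llink if go_left else t.rlink
        level += 1
    return None

root = None
for name, x, y in [("Banja Luka", 19, 45), ("Derventa", 40, 50),
                   ("Toslic", 38, 38), ("Tuzla", 54, 35), ("Sinj", 4, 4)]:
    root = insert(root, Node(name, x, y))
```

With the five cities of the insertion example, Sinj ends up as the left child of Banja Luka, and Tuzla ends up as the right child of Toslic, two levels below Derventa.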
SLIDE 22 2-d trees – Example of Insertion
City (Xval, Yval)
Banja Luka (19, 45)
Derventa (40, 50)
Toslic (38, 38)
Tuzla (54, 35)
Sinj (4, 4)
[Figures: splitting of region by Banja Luka; by Derventa; by Toslic; by Sinj]
SLIDE 23 2-d trees: Deletion
Deletion of point (x,y) from T
Let N be the node holding (x,y)
If N is a leaf node: easy
Otherwise, either Tl (left subtree) or Tr (right subtree) is non-empty
Find a “candidate replacement” node R in Tl or Tr
Replace all of N’s non-link fields by those of R
Recursively delete R from Ti (i = l or r, the subtree where R was found)
Recursion guaranteed to terminate - Why?
SLIDE 24 2-d trees: Deletion
Finding candidate replacement nodes for deletion
Replacement node R must bear the same spatial relation to all nodes in Tl and Tr as node N
SLIDE 25 2-d trees: Range Queries
Q: Given a point (xc, yc) and a distance r, find all points in the 2-d tree that lie within the circle
A: Each node N in a 2-d tree implicitly represents a region RN. If the circle (specified by the query) has no intersection with RN, then there is no point in searching the subtree rooted at node N
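A possible sketch of this pruning rule follows. The names `Node`, `circle_hits_rect` and `range_query` are illustrative assumptions. The region RN is tracked as a closed rectangle (xlo, xhi, ylo, yhi), which is slightly conservative at the boundaries, and the small tree is hand-built from the city data used in the insertion example.

```python
import math
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    info: str
    xval: float
    yval: float
    llink: Optional["Node"] = None
    rlink: Optional["Node"] = None

def circle_hits_rect(xc, yc, r, xlo, xhi, ylo, yhi):
    # Distance from the circle centre to the closest point of the rectangle.
    dx = max(xlo - xc, 0, xc - xhi)
    dy = max(ylo - yc, 0, yc - yhi)
    return dx * dx + dy * dy <= r * r

def range_query(n, xc, yc, r, level=0,
                region=(-math.inf, math.inf, -math.inf, math.inf)):
    """Names of all points within distance r of (xc, yc)."""
    if n is None or not circle_hits_rect(xc, yc, r, *region):
        return []                    # prune: the circle misses this region
    xlo, xhi, ylo, yhi = region
    hits = [n.info] if (n.xval - xc) ** 2 + (n.yval - yc) ** 2 <= r * r else []
    if level % 2 == 0:               # node splits its region on Xval
        left, right = (xlo, n.xval, ylo, yhi), (n.xval, xhi, ylo, yhi)
    else:                            # node splits its region on Yval
        left, right = (xlo, xhi, ylo, n.yval), (xlo, xhi, n.yval, yhi)
    return (hits + range_query(n.llink, xc, yc, r, level + 1, left)
                 + range_query(n.rlink, xc, yc, r, level + 1, right))

tree = Node("Banja Luka", 19, 45,
            llink=Node("Sinj", 4, 4),
            rlink=Node("Derventa", 40, 50,
                       llink=Node("Toslic", 38, 38,
                                  rlink=Node("Tuzla", 54, 35))))
result = range_query(tree, 40, 45, 8)
```

For the query circle centred at (40, 45) with r = 8, the whole left subtree of Banja Luka is skipped, because its region (x < 19) is more than 8 units away from the centre.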
SLIDE 26 SAMs - Detailed outline
spatial access methods
problem definition, k-d trees, point quadtrees, z-ordering, R-trees
SLIDE 27 Point Quadtrees
Represent point data
Always split regions into 4 parts
2-d tree: a node N splits a region into two by drawing one line through the point (N.xval, N.yval)
Point quadtree: a node N splits a region by drawing a horizontal and a vertical line through the point (N.xval, N.yval)
Four parts: NW, SW, NE, and SE quadrants
Q: Quadtree nodes have 4 children?
SLIDE 28 Point Quadtrees
Nodes in point quadtrees represent regions
SLIDE 29 Point quadtrees - Insertion
City (Xval, Yval)
Banja Luka (19, 45)
Derventa (40, 50)
Toslic (38, 38)
Tuzla (54, 35)
Sinj (4, 4)
[Figures: splitting of region by Banja Luka; by Derventa; by Toslic; by Sinj; by Tuzla]
SLIDE 30
Point Quadtrees - Insertion
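Point-quadtree insertion can be sketched as follows. The class `QNode`, the helper `quadrant` and the tie-breaking rule (points that fall exactly on a dividing line go to the E or N side) are assumptions made for this example; the slides do not specify them.

```python
class QNode:
    def __init__(self, info, x, y):
        self.info, self.x, self.y = info, x, y
        self.child = {"NW": None, "SW": None, "NE": None, "SE": None}

def quadrant(node, x, y):
    # Assumption: points on a dividing line go to the E / N side.
    ew = "W" if x < node.x else "E"
    ns = "S" if y < node.y else "N"
    return ns + ew

def insert(root, info, x, y):
    """Insert a point; return the root of the tree."""
    if root is None:
        return QNode(info, x, y)
    node = root
    while True:
        if (node.x, node.y) == (x, y):
            node.info = info         # same point: overwrite
            return root
        q = quadrant(node, x, y)
        if node.child[q] is None:
            node.child[q] = QNode(info, x, y)
            return root
        node = node.child[q]

root = None
for name, x, y in [("Banja Luka", 19, 45), ("Derventa", 40, 50),
                   ("Toslic", 38, 38), ("Tuzla", 54, 35), ("Sinj", 4, 4)]:
    root = insert(root, name, x, y)
```

With the five cities inserted in the listed order, Derventa, Toslic and Sinj become the NE, SE and SW children of Banja Luka, and Tuzla lands in the SE quadrant of Toslic.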
SLIDE 32 Point quadtrees: Deletion
Deletion of point (x,y) from T
If N is a leaf node: easy
Otherwise, a subtree (N.NW, N.SW, N.NE, N.SE) is non-empty
Find a “candidate replacement” node R in one of the subtrees such that:
Every other node R1 in N.NW is to the NW of R
Every other node R2 in N.SW is to the SW of R
etc.
Replace all of N’s non-link fields by those of R
Recursively delete R from Ti (the subtree where R was found)
In general, it may not always be possible to find such a replacement node
Q: What happens in the worst case? May require all nodes to be reinserted
SLIDE 33 Point quadtrees: Range Searches
Each node in a point quadtree represents a region
Do not search regions that do not intersect the circle defined by the query
SLIDE 34 SAMs - Detailed outline
spatial access methods
problem definition, k-d trees, point quadtrees, MX-quadtrees, z-ordering, R-trees
SLIDE 35 MX-Quadtrees
Drawbacks of 2-d trees, point quadtrees:
shape of tree depends upon the order in which objects are inserted into the tree
each node represents a region and splits the region into two or four
splits may be uneven depending upon where the point (N.xval, N.yval) is located inside the region (represented by N)
MX-quadtrees: shape (and height) of tree independent of number of nodes and order of insertion
SLIDE 36 MX-Quadtrees
Assumption: the map is represented as a grid of size (2^k x 2^k) for some k
They are like point quadtrees, but when a region gets “split” it is split down the middle
SLIDE 37
MX-Quadtrees - Insertion
After insertion of A, B, C, and D respectively
SLIDE 39 MX-Quadtrees - Deletion
Fairly easy – why?
All points are represented at the leaf level
Total time for deletion: O(k)
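A minimal MX-quadtree sketch that shows why both insertion and deletion take O(k) steps on a 2^k x 2^k grid. The class `MXQuadtree`, the use of nested dicts as nodes, and the sample coordinates are all assumptions for illustration, and k >= 1 is assumed.

```python
class MXQuadtree:
    """Points are integer cells (x, y) with 0 <= x, y < 2**k, stored at level k."""

    def __init__(self, k):
        self.k = k                   # assumption: k >= 1
        self.root = {}               # internal node: dict quadrant -> child

    def _path(self, x, y):
        # One quadrant per level; each split is down the middle, so the
        # quadrant is given by the next bit of x and of y.
        for level in range(self.k - 1, -1, -1):
            yield ((x >> level) & 1, (y >> level) & 1)

    def insert(self, x, y, info):
        node = self.root
        *inner, last = self._path(x, y)
        for q in inner:
            node = node.setdefault(q, {})
        node[last] = info            # leaf at level k holds the point

    def delete(self, x, y):
        trail, node = [], self.root
        for q in self._path(x, y):
            if q not in node:
                return False         # point is not in the tree
            trail.append((node, q))
            node = node[q]
        for parent, q in reversed(trail):
            del parent[q]
            if parent:               # parent still has children: stop pruning
                break
        return True

t = MXQuadtree(2)                    # 4 x 4 grid
t.insert(0, 3, "A")
t.insert(2, 2, "B")
```

The path to a cell depends only on its coordinates, so the shape of the tree does not depend on the insertion order. Deletion walks one path down and prunes empty nodes on the way back up.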
SLIDE 40 MX-Quadtrees –Range Queries
Same as in point quadtrees
One difference: checking to see if a point is in the circle defined by the range query needs to be performed at the leaf level (points are stored at the leaf level)
SLIDE 41 SAMs - Detailed outline
spatial access methods
problem definition, k-d trees, point quadtrees, MX-quadtrees, z-ordering, R-trees
SLIDE 43
z-ordering
Q: how would you organize, e.g., n-dim points, on disk? (C points per disk page)
Hint: reduce the problem to 1-d points (!!)
Q1: why? A: B-trees!
Q2: how?
SLIDE 44
z-ordering
Q2: how? A: assume finite granularity; z-ordering = bit-shuffling = N-trees = Morton keys = geo-coding = ...
SLIDE 45
z-ordering
Q2: how?
A: assume finite granularity (e.g., 2^32 x 2^32; 4x4 here)
Q2.1: how to map n-d cells to 1-d cells?
SLIDE 46
z-ordering
Q2.1: how to map n-d cells to 1-d cells?
SLIDE 47
z-ordering
Q2.1: how to map n-d cells to 1-d cells? A: row-wise Q: is it good?
SLIDE 48
z-ordering
Q: is it good? A: great for ‘x’ axis; bad for ‘y’ axis
SLIDE 49
z-ordering
Q: How about the ‘snake’ curve?
SLIDE 50
z-ordering
Q: How about the ‘snake’ curve? A: still problems:
[Figure: grid of size 2^32 x 2^32]
SLIDE 51
z-ordering
Q: Why are those curves ‘bad’? A: no distance preservation (~ clustering) Q: solution?
[Figure: grid of size 2^32 x 2^32]
SLIDE 52
z-ordering
Q: solution? (w/ good clustering, and easy to compute, for 2-d and n-d?)
SLIDE 53 z-ordering
Q: solution? (w/ good clustering, and easy to compute, for 2-d and n-d?)
A: z-ordering/bit-shuffling/linear-quadtrees
‘looks’ better:
- few long jumps;
- scoops out the whole quadrant before leaving it
- a.k.a. space filling curves
SLIDE 54
z-ordering
z-ordering/bit-shuffling/linear-quadtrees Q: How to generate this curve (z = f(x,y) )? A: 3 (equivalent) answers!
SLIDE 55 z-ordering
z-ordering/bit-shuffling/linear-quadtrees Q: How to generate this curve (z = f(x,y))? A1: ‘z’ (or ‘N’) shapes, RECURSIVELY
...
SLIDE 56 z-ordering
Notice:
self similar (we’ll see about fractals, soon)
method is hard to use: z =? f(x,y)
...
SLIDE 57
z-ordering
z-ordering/bit-shuffling/linear-quadtrees Q: How to generate this curve (z = f(x,y) )? A: 3 (equivalent) answers!
Method #2?
SLIDE 59
z-ordering
bit-shuffling
[Figure: 4x4 grid; both axes labeled 00, 01, 10, 11]
x = 0 0
y = 1 1
z = (0 1 0 1)_2 = 5 (bits of x and y interleaved, x first)
How about the reverse: (x,y) = g(z)?
SLIDE 60
z-ordering
bit-shuffling
[Figure: 4x4 grid; both axes labeled 00, 01, 10, 11]
x = 0 0
y = 1 1
z = (0 1 0 1)_2 = 5 (bits of x and y interleaved, x first)
How about n-d spaces?
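Bit-shuffling, its inverse, and the n-d generalization fit in one short sketch. The function names `z_encode` and `z_decode` are illustrative. The bit order (first coordinate first, most significant bit first) is chosen to match the example on the slide, where x = 00 and y = 11 give z = 0101.

```python
def z_encode(coords, bits):
    """Interleave the bits of the coordinates, first coordinate first."""
    z = 0
    for b in range(bits - 1, -1, -1):        # most significant bit first
        for c in coords:
            z = (z << 1) | ((c >> b) & 1)
    return z

def z_decode(z, ndim, bits):
    """Inverse of z_encode: deal the bits of z back out to ndim coordinates."""
    coords = [0] * ndim
    total = ndim * bits
    for i in range(total):
        bit = (z >> (total - 1 - i)) & 1     # i-th bit from the left
        coords[i % ndim] = (coords[i % ndim] << 1) | bit
    return tuple(coords)

z = z_encode((0b00, 0b11), bits=2)           # the slide's example: z == 5
```

Both directions touch each bit once, so the cost is linear in the total number of bits. For n-d spaces the same loop simply cycles through n coordinates instead of two.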
SLIDE 61
z-ordering
z-ordering/bit-shuffling/linear-quadtrees Q: How to generate this curve (z = f(x,y) )? A: 3 (equivalent) answers!
Method #3?
SLIDE 62
z-ordering
linear-quadtrees: assign N->1, S->0, etc.
[Figure: grid split into W/E and N/S halves; quadrants labeled 00..., 01..., 10..., 11...]
SLIDE 63
z-ordering
... and repeat recursively. E.g.: z_gray-cell = WN;WN = (0101)_2 = 5
[Figure: grid split into W/E and N/S halves; quadrants labeled 00..., 01..., 10..., 11...; sub-quadrant labels 00, 11]
SLIDE 66
z-ordering
Drill: z-value of grey cell, with the three methods?
[Figure: 4x4 grid split into W/E and N/S halves; grey cell]
method#1: 14
method#2: shuffle(11;10) = (1110)_2 = 14
method#3: EN;ES = ... = 14
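Method #3 can be checked mechanically. The sketch below assumes the encoding implied by the slides' examples (W->0, E->1, S->0, N->1, with the W/E bit written before the N/S bit at each level); the function name `z_from_quadrants` is made up for the illustration.

```python
BIT = {"W": "0", "E": "1", "S": "0", "N": "1"}

def z_from_quadrants(path):
    """path like "WN;WN": one W/E + N/S pair per level, coarsest level first."""
    bits = "".join(BIT[ew] + BIT[ns] for ew, ns in path.split(";"))
    return int(bits, 2)

grey_cell = z_from_quadrants("EN;ES")        # the drill: (1110)_2
```

Under this encoding the quadrant path "WN;WN" gives 5 and "EN;ES" gives 14, in agreement with the bit-shuffling results.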
SLIDE 67 z-ordering - Detailed outline
spatial access methods
z-ordering
main idea - 3 methods
use w/ B-trees; algorithms (range, knn queries ...)
non-point (e.g., region) data
analysis; variations
R-trees
SLIDE 68
z-ordering - usage & algo’s
Q1: How to store on disk? A: Q2: How to answer range queries etc
SLIDE 69
z-ordering - usage & algo’s
Q1: How to store on disk?
A: treat z-value as primary key; feed to B-tree
[Figure: grid with cities SF and PGH]
z    cname   etc
5    SF
12   PGH
SLIDE 70 z-ordering - usage & algo’s
MAJOR ADVANTAGES w/ B-tree:
already inside commercial systems (no coding/debugging!)
concurrency & recovery is ready
[Figure: grid with cities SF and PGH]
z    cname   etc
5    SF
12   PGH
SLIDE 72
z-ordering - usage & algo’s
Q2: queries? (e.g.: find city at (0,3))?
A: find z-value; search B-tree
[Figure: grid with cities SF and PGH]
z    cname   etc
5    SF
12   PGH
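The "find z-value; search B-tree" recipe can be imitated with a sorted list and binary search standing in for the B-tree. This is only a sketch: `z_encode`, `index` and `point_query` are illustrative names, and the city coordinates (0,3) and (2,2) are obtained by decoding the z-values 5 and 12 from the table.

```python
import bisect

def z_encode(x, y, bits=2):
    """Interleave the bits of x and y (x bit first)."""
    z = 0
    for b in range(bits - 1, -1, -1):
        z = (z << 2) | (((x >> b) & 1) << 1) | ((y >> b) & 1)
    return z

# A sorted list of (z, cname) stands in for the B-tree keyed on the z-value.
index = sorted([(z_encode(0, 3), "SF"), (z_encode(2, 2), "PGH")])
keys = [z for z, _ in index]

def point_query(x, y):
    """Return the city stored at cell (x, y), or None."""
    z = z_encode(x, y)
    i = bisect.bisect_left(keys, z)          # plays the role of the B-tree search
    return index[i][1] if i < len(keys) and keys[i] == z else None

city = point_query(0, 3)
```

In a real system the sorted list would be the B-tree index that the DBMS already provides, which is the point of the "MAJOR ADVANTAGES" slide.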
SLIDE 74
z-ordering - usage & algo’s
Q2: range queries?
A: compute ranges of z-values; use B-tree
[Figure: grid with cities SF and PGH, and a range query]
z    cname   etc
5    SF
12   PGH
z-value ranges of the query: 9, 11-15
SLIDE 76
z-ordering - usage & algo’s
Q2’: range queries - how to reduce # of qualifying ranges?
A: Augment the query!
[Figure: grid with cities SF and PGH, and a range query]
z    cname   etc
5    SF
12   PGH
9, 11-15 -> 8-15
SLIDE 79
z-ordering - usage & algo’s
Q2’’: range queries - how to break a query into ranges?
A: recursively, quadtree-style; decompose only non-full quadrants
[Figure: query 9, 11-15 decomposed into 12-15 (a full quadrant) and 9, 11]
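The recursive, quadtree-style decomposition might look like the sketch below. The function name `z_ranges` and the convention that the query is given by inclusive cell bounds are assumptions. The query rectangle x in [2,3], y in [1,3] is the one whose cells decode to the z-values 9 and 11-15 quoted on the slides.

```python
def z_ranges(xlo, xhi, ylo, yhi, bits):
    """Merged [zlo, zhi] ranges covering the query rectangle (inclusive bounds)."""
    out = []

    def visit(x0, y0, size, zbase):
        # This quadrant covers cells [x0, x0+size) x [y0, y0+size)
        # and the z-values [zbase, zbase + size*size).
        if x0 > xhi or x0 + size - 1 < xlo or y0 > yhi or y0 + size - 1 < ylo:
            return                           # disjoint from the query
        if xlo <= x0 and x0 + size - 1 <= xhi and ylo <= y0 and y0 + size - 1 <= yhi:
            out.append([zbase, zbase + size * size - 1])   # full quadrant
            return
        half = size // 2
        step = half * half
        # Children in curve order: (x bit, y bit) = 00, 01, 10, 11.
        for i, (dx, dy) in enumerate([(0, 0), (0, 1), (1, 0), (1, 1)]):
            visit(x0 + dx * half, y0 + dy * half, half, zbase + i * step)

    visit(0, 0, 1 << bits, 0)
    merged = []
    for lo, hi in out:                       # already in increasing z order
        if merged and merged[-1][1] + 1 == lo:
            merged[-1][1] = hi               # glue adjacent ranges together
        else:
            merged.append([lo, hi])
    return merged

ranges = z_ranges(2, 3, 1, 3, bits=2)        # [[9, 9], [11, 15]]
```

Augmenting the query to y in [0,3] turns the two ranges into the single range 8-15, at the price of fetching a few extra cells that must be filtered out afterwards.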
SLIDE 80 z-ordering - Detailed outline
spatial access methods
z-ordering
main idea - 3 methods
use w/ B-trees; algorithms (range, knn queries ...)
non-point (e.g., region) data
analysis; variations
R-trees
SLIDE 82
z-ordering - usage & algo’s
Q3: k-nn queries? (say, 1-nn)?
A: traverse B-tree; find nn wrt z-values and ...
[Figure: grid with cities SF and PGH]
z    cname   etc
5    SF
12   PGH
SLIDE 83
z-ordering - usage & algo’s
... ask a range query.
[Figure: grid with SF (z = 5), PGH (z = 12) and the query point (z = 3); label: nn wrt z-value]
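The two-step 1-nn idea (nearest neighbor in z-order first, then a range query with that distance as radius) is sketched below. Everything here is illustrative: the names `cities` and `nearest` are made up, the query point (1,1) is the cell that decodes from z = 3, and the range query is done by a linear scan rather than through z-ranges on a B-tree.

```python
import bisect
import math

def z_encode(x, y, bits=2):
    z = 0
    for b in range(bits - 1, -1, -1):
        z = (z << 2) | (((x >> b) & 1) << 1) | ((y >> b) & 1)
    return z

# Sorted (z, name, (x, y)) triples stand in for the B-tree.
cities = sorted((z_encode(x, y), name, (x, y))
                for name, (x, y) in {"SF": (0, 3), "PGH": (2, 2)}.items())
keys = [c[0] for c in cities]

def nearest(qx, qy):
    zq = z_encode(qx, qy)
    i = bisect.bisect_left(keys, zq)
    # Step 1: the neighbours of zq in z-order give candidates and a radius.
    cands = cities[max(i - 1, 0): i + 1]
    r = min(math.dist((qx, qy), c[2]) for c in cands)
    # Step 2: range query with radius r (linear scan in this sketch).
    inside = [c for c in cities if math.dist((qx, qy), c[2]) <= r]
    return min(inside, key=lambda c: math.dist((qx, qy), c[2]))[1]

answer = nearest(1, 1)
```

For the query at z = 3 the neighbor in z-order is SF (z = 5), but the range query with radius sqrt(5) also reaches PGH, which is the true nearest neighbor. This is why step 2 is needed.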
SLIDE 85
z-ordering - usage & algo’s
Q4: all-pairs queries? (all pairs of cities within 10 miles from each other?)
[Figure: grid with cities SF and PGH]
(we’ll see ‘spatial joins’ later: find all PA counties that intersect a lake)
SLIDE 86 z-ordering - Detailed outline
spatial access methods
z-ordering
main idea - 3 methods
use w/ B-trees; algorithms (range, knn queries ...)
non-point (e.g., region) data
analysis; variations
R-trees ...
SLIDE 88
z-ordering - regions
Q: z-value for a region?
A: 1 or more z-values; by quadtree decomposition
[Figure: regions A, B, C; zB = ??, zC = ??]
SLIDE 90
z-ordering - regions
Q: z-value for a region?
[Figure: grid split into W/E and N/S halves, quadrants labeled 00..., 01..., 10..., 11...; regions A, B, C]
zB = 11** (‘*’ = “don’t care”)
zC = {0010; 1000}
SLIDE 92 z-ordering - regions
Q: How to store in B-tree?
A: sort (* < 0 < 1)
Q: How to search (range etc. queries)?
[Figure: regions A, B, C]
z      etc
0010   C
0101   A
1000   C
11**   B
SLIDE 94 z-ordering - regions
Q: How to search (range etc. queries) - e.g., ‘red’ range query
A: break query in z-values; check B-tree
[Figure: regions A, B, C]
z      etc
0010   C
0101   A
1000   C
11**   B
SLIDE 95 z-ordering - regions
Almost identical to range queries for point data, except for the “don’t cares” - i.e.,
1100 ?? 11**
[Figure: regions A, B, C]
z      etc
0010   C
0101   A
1000   C
11**   B
SLIDE 96
z-ordering - regions
Almost identical to range queries for point data, except for the “don’t cares” - i.e.,
z1 = 1100 ?? 11** = z2
Specifically: does z1 contain/avoid/intersect z2?
Q: what is the criterion to decide?
SLIDE 97
z-ordering - regions
z1 = 1100 ?? 11** = z2
Specifically: does z1 contain/avoid/intersect z2?
Q: what is the criterion to decide?
A: Prefix property: let r1, r2 be the corresponding regions, and let r1 be the smaller one (=> z1 has the fewest ‘*’s). Then:
SLIDE 98 z-ordering - regions
r2 will either contain r1 completely, or avoid it completely.
It will contain r1 if z2 (without its ‘*’s) is a prefix of z1.
[Figure: regions A, B, C; 1100 ?? 11**: the region of z1 is completely contained in the region of z2]
SLIDE 99 z-ordering - regions
Drill (True/False). Given:
z1 = 011001**
z2 = 01******
z3 = 0100****
T/F r2 contains r1
T/F r3 contains r1
T/F r3 contains r2
SLIDE 102 z-ordering - regions
Drill (True/False). Given:
z1 = 011001**
z2 = 01******
z3 = 0100****
[Figure: regions of z2 and z3]
T/F r2 contains r1 - TRUE (prefix property)
T/F r3 contains r1 - FALSE (disjoint)
T/F r3 contains r2 - FALSE (r2 contains r3)
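The prefix test behind the drill is a one-liner once the trailing ‘*’s are stripped. The function name `relation` and the returned strings are assumptions for this sketch. It relies on the fact stated earlier: two quadtree blocks either nest or are disjoint.

```python
def relation(z1, z2):
    """Relation between the regions r1, r2 of two z-values with trailing '*'s."""
    p1, p2 = z1.rstrip("*"), z2.rstrip("*")
    if p1.startswith(p2):
        return "r2 contains r1"      # also covers p1 == p2 (same region)
    if p2.startswith(p1):
        return "r1 contains r2"
    return "disjoint"

z1, z2, z3 = "011001**", "01******", "0100****"
answers = (relation(z1, z2), relation(z1, z3), relation(z2, z3))
```

For the drill this gives "r2 contains r1", "disjoint", and "r1 contains r2" for the pair (z2, z3), i.e. r2 contains r3, which matches the TRUE/FALSE/FALSE answers.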
SLIDE 104
z-ordering - regions
Spatial joins: find (quickly) all counties intersecting lakes
Naive algorithm: O(N * M)
Something faster?
SLIDE 105 z-ordering - regions
Spatial joins: find (quickly) all counties intersecting lakes
(lakes)
z      etc
0011   Erie
0101   Erie
…
10**   Ont.
(counties)
z      etc
0010   ALG
…      …
1000   WAS
11**   ALG
SLIDE 106
z-ordering - regions
Spatial joins: find (quickly) all counties intersecting lakes
Solution: merge the lists of (sorted) z-values, looking for the prefix property
footnote #1: ‘*’ needs careful treatment
footnote #2: need dup. elimination
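One way to realize the merge is a single sweep over both sorted lists with a stack of "open" regions. This is a sketch under assumptions, not necessarily the algorithm the lecture has in mind: `spatial_join` is a made-up name, the ‘*’s are handled by stripping them (so a block sorts just before the blocks it contains), duplicates are removed with a set, and only the rows actually listed in the tables are used (the elided rows are left out).

```python
def spatial_join(lakes, counties):
    """lakes, counties: lists of (z, name); z may end in '*' don't-cares.
    Returns the set of (lake, county) pairs whose blocks intersect."""
    items = sorted([(z.rstrip("*"), 0, name) for z, name in lakes] +
                   [(z.rstrip("*"), 1, name) for z, name in counties])
    stack, result = [], set()        # set: duplicate elimination
    for prefix, side, name in items:
        # Close every open block that does not contain the current one.
        while stack and not prefix.startswith(stack[-1][0]):
            stack.pop()
        for _, other_side, other in stack:   # all remaining blocks contain it
            if other_side != side:
                result.add((name, other) if side == 0 else (other, name))
        stack.append((prefix, side, name))
    return result

lakes = [("0011", "Erie"), ("0101", "Erie"), ("10**", "Ont.")]
counties = [("0010", "ALG"), ("1000", "WAS"), ("11**", "ALG")]
pairs = spatial_join(lakes, counties)
```

After sorting, the sweep does one pass over the N + M entries; the stack never holds more entries than the length of the longest z-value, since it only contains nested prefixes.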
SLIDE 107 z-ordering - Detailed outline
spatial access methods
z-ordering
main idea - 3 methods
use w/ B-trees; algorithms (range, knn queries ...)
non-point (e.g., region) data
analysis; variations
R-trees
SLIDE 108
z-ordering - variations
Q: is z-ordering the best we can do?
SLIDE 109
z-ordering - variations
Q: is z-ordering the best we can do? A: probably not - occasional long ‘jumps’ Q: then?
SLIDE 110
z-ordering - variations
Q: is z-ordering the best we can do? A: probably not - occasional long ‘jumps’ Q: then? A1: Gray codes
SLIDE 111
z-ordering - variations
A2: Hilbert curve! (a.k.a. Hilbert-Peano curve)
SLIDE 112
z-ordering - variations
‘Looks’ better (never long jumps). How to derive it?
SLIDE 113 z-ordering - variations
‘Looks’ better (never long jumps). How to derive it?
[Figure: Hilbert curves of increasing order, ..., order (n+1)]
SLIDE 114
z-ordering - variations
Q: function for the Hilbert curve (h = f(x,y))?
A: bit-shuffling, followed by post-processing, to account for rotations. Linear in the number of bits. See textbook for pointers to code/algorithms (e.g., [Jagadish, 90])
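For concreteness, here is a widely used iterative formulation of h = f(x,y). It is not necessarily the algorithm of [Jagadish, 90], and the orientation of the curve (which corner it starts from) is one common convention among several. The function name `hilbert_index` is an assumption.

```python
def hilbert_index(n, x, y):
    """Position of cell (x, y) along a Hilbert curve on an n x n grid
    (n a power of 2). One loop iteration per bit of the coordinates."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)         # which quadrant, in curve order
        if ry == 0:                          # rotate/flip for the sub-curve
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

# Cells of a 4 x 4 grid listed in Hilbert order.
order4 = sorted(((x, y) for x in range(4) for y in range(4)),
                key=lambda c: hilbert_index(4, *c))
```

Listing the cells in this order shows the property the slides point out: every step goes to an adjacent cell, so there are never long jumps.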
SLIDE 115 z-ordering - variations
Q: how about Hilbert curve in 3-d? n-d?
A: Exists (and is not unique!). E.g., 3-d, order-1 Hilbert curves (Hamiltonian paths on cube)
[Figure: two such paths, #1 and #2]
SLIDE 116 z-ordering - Detailed outline
spatial access methods
z-ordering
main idea - 3 methods
use w/ B-trees; algorithms (range, knn queries ...)
non-point (e.g., region) data
analysis; variations
R-trees ...
SLIDE 117
z-ordering - analysis
Q: How many pieces (‘quad-tree blocks’) per region? A: proportional to perimeter (surface etc)
SLIDE 118
z-ordering - analysis
(How long is the coastline, say, of England? Paradox: The answer changes with the yardstick -> fractals ...)
SLIDE 119
z-ordering - analysis
Q: Should we decompose a region to full detail (and store in B-tree)?
SLIDE 120
z-ordering - analysis
Q: Should we decompose a region to full detail (and store in B-tree)?
A: NO! approximation with 1-3 pieces/z-values is best [Orenstein90]
SLIDE 121
z-ordering - analysis
Q: how to measure the ‘goodness’ of a curve?
SLIDE 122
z-ordering - analysis
Q: how to measure the ‘goodness’ of a curve? A: e.g., avg. # of runs, for range queries
[Figure: range queries covering 4 runs and 3 runs of a curve]
(#runs ~ #disk accesses on B-tree)
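Counting runs for a given query is easy to do by brute force. The sketch below compares z-ordering with a row-wise numbering on one query; the function names are illustrative, and the query rectangle is the same hypothetical one (x in [2,3], y in [1,3] on a 4x4 grid) that yields the z-values 9 and 11-15.

```python
def count_runs(keys):
    """Number of maximal runs of consecutive integers among curve positions."""
    ks = sorted(set(keys))
    return sum(1 for i, k in enumerate(ks) if i == 0 or k != ks[i - 1] + 1)

def z_encode(x, y, bits):
    z = 0
    for b in range(bits - 1, -1, -1):
        z = (z << 2) | (((x >> b) & 1) << 1) | ((y >> b) & 1)
    return z

def row_wise(x, y, bits):
    return y * (1 << bits) + x               # cells numbered row by row

def runs_for_query(curve, xlo, xhi, ylo, yhi, bits):
    return count_runs(curve(x, y, bits)
                      for x in range(xlo, xhi + 1)
                      for y in range(ylo, yhi + 1))

z_runs = runs_for_query(z_encode, 2, 3, 1, 3, bits=2)
row_runs = runs_for_query(row_wise, 2, 3, 1, 3, bits=2)
```

On this particular query z-ordering needs 2 runs and the row-wise numbering needs 3. Averaging such counts over many queries gives the 'goodness' measure mentioned above.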
SLIDE 123
z-ordering - analysis
Q: So, is Hilbert really better?
A: 27% fewer runs, for 2-d (similar for 3-d)
Q: are there formulas for #runs, # of quadtree blocks, etc.?
A: Yes ([Jagadish; Moon+ etc.], see textbook)
SLIDE 124 z-ordering - fun observations
Hilbert and z-ordering curves: “space filling curves”: eventually, they visit every point in n-d space - therefore:
[Figure: Hilbert curves of increasing order, ..., order (n+1)]
SLIDE 125 z-ordering - fun observations
... they show that the plane has as many points as a line (-> headaches for 1900’s mathematics/topology). (fractals, again!)
[Figure: Hilbert curves of increasing order, ..., order (n+1)]
SLIDE 126
z-ordering - fun observations
Observation #2: Hilbert (like) curve for video encoding [Y. Matias+, CRYPTO ‘87]: given a frame, visit its pixels in randomized Hilbert order; compress; and transmit
SLIDE 127
z-ordering - fun observations
In general, Hilbert curve is great for preserving distances, clustering, vector quantization etc
SLIDE 128 Conclusions
z-ordering is a great idea (n-d points -> 1-d points; feed to B-trees)
used by TIGER system and (most probably) by other GIS products
works great with low-dim points