CMPUT 391 Database Management Systems Spatial Data Management
University of Alberta
- Dr. Jörg Sander, 2006
1
CMPUT 391 – Database Management Systems
CMPUT 391 Database Management Systems Spatial Data Management 1 - - PowerPoint PPT Presentation
CMPUT 391 Database Management Systems Spatial Data Management 1 Dr. Jrg Sander, 2006 University of Alberta CMPUT 391 Database Management Systems Spatial Data Management Shortcomings of Relational Databases and ORDBMS Modeling
University of Alberta
1
CMPUT 391 – Database Management Systems
University of Alberta
2
CMPUT 391 – Database Management Systems
University of Alberta
3
CMPUT 391 – Database Management Systems
University of Alberta
4
CMPUT 391 – Database Management Systems
University of Alberta
5
CMPUT 391 – Database Management Systems
University of Alberta
6
CMPUT 391 – Database Management Systems
University of Alberta
7
CMPUT 391 – Database Management Systems
University of Alberta
8
CMPUT 391 – Database Management Systems
University of Alberta
9
CMPUT 391 – Database Management Systems
University of Alberta
10
CMPUT 391 – Database Management Systems
University of Alberta
11
CMPUT 391 – Database Management Systems
in normalized relations
Parcels
FNr BNr F1 F1 F1 F1 F4 F4 F4 F4 F4 F4 F7 F7 F7 F7 B1 B2 B3 B4 B2 B5 B6 B7 B8 B9 B7 B10 B11 B12
… …
Borders
BNr PNr1 PNr2 B1 B2 B3 B4 B5 B6 B7 B8 B9 B10 B11 B12 P1 P2 P3 P4 P2 P5 P6 P7 P8 P6 P9 P10 P2 P3 P4 P1 P5 P6 P7 P8 P3 P9 P10 P7
Points
PNr
P 1 P 2 P 3 P 4 P 5 P 6 P 7 P 8 P 9 P 10 X P1 X P2 X P3 X P4 X P5 X P6 X P7 X P8 X P9 X P10 Y P1 Y P2 Y P3 Y P4 Y P5 Y P6 Y P7 Y P8 Y P9 Y P10
X-Coord Y-Coord
F7 F4 F5 F2 F6 F3 F1
Redundancy free representation requires distribution of the information
University of Alberta
12
CMPUT 391 – Database Management Systems
– E.g.: if we want to determine if a given point P is inside parcel F2, we have to find all corner-points of parcel F2 first
SELECT Points.PNr, X-Coord, Y-Coord FROM Parcels, Border, Points WHERE FNr = ‘F2’ AND Parcel.BNr = Borders.BNr AND (Borders.PNr1 = Points.PNr OR Borders.PNr2 = Points.PNr)
University of Alberta
13
CMPUT 391 – Database Management Systems
– Data types such as Point, Line, Polygon – Operations such as ObjectIntersect, RangeQuery, etc.
– Natural extension of the relational model and query languages – Facilitates design and querying of spatial databases – Spatial data types and operations can be supported by spatial index structures and efficient algorithms, implemented in the core of a DBMS
University of Alberta
14
CMPUT 391 – Database Management Systems
Relation: ForestZones(Zone: Polygon, ForestOfficial: String, Area: Cardinal)
by a polygon S. Find all forest officials affected by this decision.
SELECT ForestOfficial FROM ForestZones WHERE ObjectIntersects (S, Zone)
R2 R4 R6 R3 R1 R5 ForestZones Zone ForestOfficial Area (m2) R1 R2 R3 R4 R5 R6 Stevens Behrens Lee Goebel Jones Kent 3900 4250 6700 5400 1900 4600
University of Alberta
15
CMPUT 391 – Database Management Systems
– Spatial Extent
system, which is at least 2-dimensional.
– Other Non-Spatial Attributes
University of Alberta
16
CMPUT 391 – Database Management Systems
2-dim. polygons Crop Forest Water 2-dim. lines 2-dim. points X Y
University of Alberta
17
CMPUT 391 – Database Management Systems
– Operations to retrieve certain subsets of spatial object from the database
– Operations that perform basic geometric computations and tests
University of Alberta
18
CMPUT 391 – Database Management Systems
W
Window Query
P
Point Query
Containment Query R Point Query P
contain R. If R is a Point: Point Query
(polygon or circle), find all spatial
rectangle: Window Query
region R, find all objects that are completely contained in R
closest to P (typically for points)
University of Alberta
19
CMPUT 391 – Database Management Systems
Region Query R Window Query R Enclosure Query R 2-nn Query P
– S1 = {R1, R2, …, Rm} and S2 = {R’1, R’2, …, R’n}
– R ∈ S1, R’ ∈ S2, – and R intersects R’ (R ∩ R’ ≠ ∅) – Spatial predicates other than intersection are also possible, e.g. all pairs of
B1 A2 A3 A4 A5 A6 A1 B2 B3
Answer Set (A5, B1) (A4, B1) (A1, B2) (A6, B2) (A2, B3) Spatial-Join
{A1, …, A6} {B1, …, B3}
University of Alberta
20
CMPUT 391 – Database Management Systems
– Group objects only along one dimension – Do not preserve spatial proximity
Nearest neighbor of Q is typically not the nearest neighbor in any single dimension
X Y
Q NN(Q) A B C D A and B closer in the X dimension; C and D closer in the Y dimension.
University of Alberta
21
CMPUT 391 – Database Management Systems
– Group objects that are close to each other on the same data page – Problem: the number of bytes to store extended spatial objects (lines, polygons) varies – Solution:
typically axis-parallel minimum bounding rectangles (MBR)
ER
MBR
Spatial Index (MBR, , ...) (ER) (MBR, , ...)... ... University of Alberta
22
CMPUT 391 – Database Management Systems
– Use the index to find all approximations that satisfy the query – Some objects already satisfy the query based on the approximation,
– Load the exact object representations for candidates left after the filter step and test whether they satisfies the query
query-window
b a c d
e
f g
e a und b sind sicherAntworten f, d und g sind sicher keine Antworten c und e sindKandidaten c ist einFehltreffer(false hit, d. h. ein Kandidat, der keine Antwort ist)
Filter candidates Refinement (exact evaluation) final results false hits Not an answer Query Query Window
not answers
Why?
University of Alberta
23
CMPUT 391 – Database Management Systems
University of Alberta
24
CMPUT 391 – Database Management Systems
– The data space is partitioned into rectangular cells. – Use a space filling curve to assign cell numbers to the cells (define a linear order on the cells)
good as possible
– Objects are approximated by cells. – Store the cell numbers for objects in a conventional index structure with respect to the linear order
43 63 62 59 58 47 46 42 1 21 20 17 16 5 4 3 23 22 19 18 7 6 2 9 29 28 25 24 13 12 8 11 31 30 27 26 15 14 10 33 53 52 49 48 37 36 32 35 55 54 51 50 39 38 34 41 61 60 57 56 45 44 40
University of Alberta
25
CMPUT 391 – Database Management Systems
Lexicographic Order
1 7 6 5 4 3 2 9 15 14 13 12 11 10 8 17 23 22 21 20 19 18 16 25 31 30 29 28 27 26 24 33 39 38 37 36 35 34 32 41 47 46 45 44 43 42 40 49 55 54 53 52 51 50 48 57 63 62 61 60 59 58 56
Hilbert-Curve
University of Alberta
26
CMPUT 391 – Database Management Systems
Z-Order
2 42 40 34 32 10 8 3 43 41 35 33 11 9 1 6 46 44 38 36 14 12 4 7 47 45 39 37 15 13 5 18 58 56 50 48 26 24 16 19 59 57 51 49 27 25 17 22 62 60 54 52 30 28 20 23 63 61 55 53 31 29 21 1 21 20 19 16 15 14 2 22 23 18 17 12 13 3 7 25 24 29 30 11 8 4 6 26 27 28 31 10 9 5 57 37 36 35 32 53 54 58 56 38 39 34 33 52 55 59 61 41 40 45 46 51 50 60 62 42 43 44 47 48 49 63
– Partition the data space recursively into two halves – Alternate X and Y dimension – Left/bottom 0 – Right/top 1
c = decimal value of the bit string l = level (number of bits) if all cells are on the same level, then l can be omitted
1 1
1
1 1
1
10 2 010000 16 0111 7 X Y
University of Alberta
27
CMPUT 391 – Database Management Systems
– Use a fixed a resolution of the space in both dimensions, i.e., each cell has the same size – Each point is then approximated by one cell
– minimum enclosing cell
the first partitions already
– improvement: use several cells
Query returns the same answer several times Query Window
R R
Coding of R by one cell
C C1 C2 C3 C4
2 42 40 34 32 10 8 3 43 41 35 33 11 9 1 6 46 44 38 36 14 12 4 7 47 45 39 37 15 13 5 18 58 56 50 48 26 24 16 19 59 57 51 49 27 25 17 22 62 60 54 52 30 28 20 23 63 61 55 53 31 29 21
Coding of R by several cells
University of Alberta
28
CMPUT 391 – Database Management Systems
(l1- l) (l2- l)
University of Alberta
29
CMPUT 391 – Database Management Systems
(0,2) (2,3) (7,4) (6,3) (7,3) (7,4) (4,3) (21,5) (11,4) (6,3) (6,3) (20,5) (6,4) (0,2) ≤ (7,4) ≤ (7,4) ≤ (6,3) (2,3) ≤ (7,4) ≤ (4,3) ≤ (6,3)
Exact representations stored in a different location
(6,4) ≤ (7,4) ≤ (20,5) ≤ (6,3)
University of Alberta
30
CMPUT 391 – Database Management Systems
Window: Min = (0,6), Max = (10,6) (0,2) (2,3) (7,4) (6,3) (7,4) (4,3) (21,5) (11,4) (6,3) (6,3) (20,5) (6,4) (7,3) Result: (0,2) (10,6) ≤ (2,3)
University of Alberta
31
CMPUT 391 – Database Management Systems
University of Alberta
32
CMPUT 391 – Database Management Systems
– MBR is a minimum bounding rectangle of a spatial object, which PointerToExactRepr is pointing to
– MBR is the minimum bounding rectangle of all entries in the subtree, which PointerToSubtree is pointing to.
Directory Data Level 1 Directory Level 2 Pages
. . .
Exact Representations
m
University of Alberta
33
CMPUT 391 – Database Management Systems
University of Alberta
34
CMPUT 391 – Database Management Systems
A5 A1 A4 A3 A6 A2 R S T Point Query X Y
A2 A3 A4 A5 A6 A1
R S T Answer Set: Paths that the query has to follow []
A5 A1 A4 A3 A6 A2 R S T Window Query X Y
A2 A3 A4 A5 A6 A1
R S T Answer Set: [A2, A3]
PointQuery (Page, Point); FOR ALL Entry ∈ Page DO IF Point IN Entry.MBR THEN IF Page = DataPage THEN PointInPolygonTest (load(Entry.ExactRepr), Point) ELSE PointQuery (Entry.Subtree, Point); Window Query (Page, Window); FOR ALL Entry ∈ Page DO IF Window INTERSECTS Entry.MBR THEN IF Page = DataPage THEN Intersection (load(Entry.ExactRepr), Window) ELSE WindowQuery (Entry.Subtree, Window);
First call: Page = Root of the R-tree
University of Alberta
35
CMPUT 391 – Database Management Systems
Start: empty data page (= root)
X Y
A3 A4 A5 A1
M = 3, m = 2 Insert: A5, A1, A3, A4 ⇒
A5, A1, A3, A4 * (overflow)
University of Alberta
36
CMPUT 391 – Database Management Systems
A5 A1 A4 A3 R S
? Split into 2 pages
X Y
A3 A4 A5 A1
R S
How to divide a set of rectangles into 2 sets?
University of Alberta
37
CMPUT 391 – Database Management Systems
A5 A1 A4 A3 R S
? Insert A2
X Y
A3 A4 A5 A1
R S
A2
?
A2
Where to insert a new rectangle?
follow Entry.Subtree
follow Ei.Subtree for entry ei with the smallest area of ei.MBR.
check the increase in area of the MBR for each entry when enlarging the MBR to enclose R. Choose Entry with the minimum increase in area (if this entry is not unique, choose the one with the smallest area); enlarge Entry.MBR and follow Entry.Subtree
University of Alberta
38
CMPUT 391 – Database Management Systems
– The parent entry for that page is deleted. – The page is split into 2 new pages - according to a split strategy – 2 new entries pointing to the newly created pages are inserted into the parent page. – A now possible overflow in the parent page is handled recursively in a similar way; if the root has to be split, a new root is created to contain the entries pointing to the newly created pages.
X Y
A3 A4 A5 A1 A2 A1 A4 A3 R S
R S
A2 A5 A6 A6 *
M = 3, m = 2
X Y
A3 A4 A5 A1 A5 A1 A4 A3 U S
U S
A2 A2 A6 A6
V
V Overlow split node
University of Alberta
39
CMPUT 391 – Database Management Systems
– Searching for the “best” split in the set of all possible splits is too expensive (O(2M) possibilities!)
– Choose the pair of rectangles R1 and R2 that have the largest value d(R1, R2) for empty space in an MBR, which covers both R1 und R2.
d (R1, R2) := Area(MBR(R1∪R2)) – (Area(R1) + Area(R2))
– Set K1 := {R1} and K2 := {R2} – Repeat until STOP
increase in area of the MBR by the assignment. If not unique: choose the Ki that covers the smaller area (if still not unique: the one with less entries).
University of Alberta
40
CMPUT 391 – Database Management Systems
– Same as the quadratic algorithm, except for the choice of the initial pair: Choose the pair with the largest normalized distance.
and the rectangle with the smallest maximal value (the difference is the maximal distance/separation).
the extensions of the rectangles in this dimension
Set K1 := {R1} and K2 := {R2}.
Smallest maximal value in X dimension Largest minimal value in X dimension X Y Smallest maximal value in Y dimension Largest minimal value in Y dimension
University of Alberta
41
CMPUT 391 – Database Management Systems
– e.g., the R*-tree, X-tree for higher dimensional point data, … – For further information see http://www.cs.umd.edu/~hjs/rtrees/index.html (includes an interactive demo)
– Multi-dimensional points, where a distance function between the points is defined play an important role for similarity search in so-called “feature” or “multi-media” databases.
P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12 P13
University of Alberta
42
CMPUT 391 – Database Management Systems
University of Alberta
43
CMPUT 391 – Database Management Systems
are similar to the query object up to a certain degree ε
query range ε query object 3-nn distance
University of Alberta
44
CMPUT 391 – Database Management Systems