SLIDE 1

MMDB-5 J. Teuhola 2012 109

5. Spatial databases

Characteristics:

  • The scene of spatial data is a geometric space with some dimensionality (usually 2 or 3).
  • Shape and location are essential components of the information.
  • Dimension values are most often numeric, with some resolution, plus lower and upper bounds.

Application areas: geographic information systems (GIS), computer-aided design (CAD), graphical user interfaces (GUI), virtual environments, computer games, robotics, animation, etc.

Literature:

  • R. H. Güting: "An Introduction to Spatial Database Systems", VLDB Journal, Vol. 3, 1994, pp. 357-399.
  • P. Rigaux, M. Scholl, A. Voisard: Spatial Databases, with Application to GIS, Morgan Kaufmann, 2002.

SLIDE 2

Abstract space modeling: Entity-based models

Components of spatial objects:

  • identity
  • description
  • spatial extent

Classification based on dimensionality:

  • Choice depends on the application viewpoint.

(a) Zero-dimensional objects = points

  • The object does not have a shape, or its shape is not considered useful.
  • Its area is quite small with respect to the embedding space, e.g. cities, buildings, road crossings on a map.
  • The choice depends e.g. on the scale of the map.

SLIDE 3

Entity-based models (cont.)

(b) One-dimensional = line objects

  • E.g. roads on a map.
  • Main geometric type: polyline, consisting of a finite set of line segments (edges), so that each segment endpoint (vertex) is shared by exactly two segments, except the two endpoints of the polyline (if any).
  • Simple polyline: no intersections.
  • Closed polyline: endpoints meet.
  • Any curve can be approximated arbitrarily closely with a polyline.

SLIDE 4

Entity-based models (cont.)

(c) Two-dimensional = surfacic objects

  • Represent entities with a non-zero area.
  • Main geometric type: polygon = region bounded by a closed polyline.
  • Convex polygon P: for any points A, B ∈ P, the line segment AB is fully included in P.

(d) Three-dimensional = volumetric objects (polyhedra)

(e) Four-dimensional = spatio-temporal objects
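The convexity definition in (c) cannot be tested literally for all pairs of points, but for a simple polygon given as a vertex list it is equivalent to all turns along the boundary having the same orientation. The following is a minimal sketch under that assumption; the function name and the vertex-list representation are illustrative and not from the slides.

```python
def is_convex(polygon):
    """Return True if a simple polygon, given as a list of (x, y)
    vertices in boundary order, is convex.

    Assumption: the polygon is simple (no self-intersections).
    """
    n = len(polygon)
    if n < 3:
        return False
    sign = 0  # orientation of the first non-degenerate turn
    for i in range(n):
        ax, ay = polygon[i]
        bx, by = polygon[(i + 1) % n]
        cx, cy = polygon[(i + 2) % n]
        # z-component of the cross product of edges AB and BC
        cross = (bx - ax) * (cy - by) - (by - ay) * (cx - bx)
        if cross != 0:
            turn = 1 if cross > 0 else -1
            if sign == 0:
                sign = turn
            elif turn != sign:
                return False  # a left turn and a right turn both occur
    return sign != 0  # all vertices collinear: no area, not a polygon
```

For example, a square passes the test, while an L-shaped polygon fails because one of its vertices turns in the opposite direction.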

SLIDE 5

Abstract space modeling: Field-based models

  • Also called space-based models.
  • The spatial information is considered a continuous (though approximated) field, i.e. a function of coordinates (e.g. x and y).
  • Each point of space is associated with one or more attributes.
  • Examples:
      • temperature, air pressure, height above sea level etc. at different points on maps, see e.g. http://ilmatieteenlaitos.fi/
      • grey level in a grey-scale digital image
      • red, green and blue components in a true-color (photographic) digital image

SLIDE 6

Representation modes of spatial objects

Tessellation mode

  • Cellular decomposition (grid, mesh, tiling, etc.)
  • Fixed tessellation: regular grid (rastering)
  • Variable tessellation: different sizes of decomposition units
  • Regular/irregular tessellation
  • Default: N x M rectangular (usually square) cells, called pixels
  • Natural (discrete) representation for field-based data
  • Entity-based data: one pixel for points, a set of pixels for polylines and polygons.
  • A more precise representation requires more storage space, and its processing takes more time.

SLIDE 7

Representation modes of spatial objects (cont.)

Vector mode

  • Natural for the entity model.
  • Representation primitives: points and edges.
  • Polygons and polylines are both represented as lists of points.
  • A polygon with n vertices has 2n representations (choice of starting vertex, clockwise/counterclockwise order).
  • A region is a set of polygons.
  • The representation may be complemented by restrictions (e.g. to simple polygons).

Representing field-based data in vector mode:

  • Digital Elevation Models (DEM): field values are stored only for a subset of points; the rest of the values are interpolated.
  • Example: Triangulated Irregular Networks (TIN).

SLIDE 8

Representation modes of spatial objects (cont.)

Half-plane representation

  • Only a single primitive: half-plane (generally, half-space).
  • Sound mathematical basis.
  • Half-space definition in d-dimensional space: the inequality

    $a_1 x_1 + a_2 x_2 + \dots + a_d x_d + a_{d+1} \le 0$

  • Convex polygon = intersection of a finite number of half-planes.
  • Polygon = union of a finite number of convex polygons.
  • Line segment = convex polygon of dimension 1 (intersection of two half-lines or rays).
  • Polyline = union of some line segments.
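As a small illustration of the half-plane representation, the sketch below tests membership of a point in a half-space, in a convex polygon (intersection of half-planes) and in a general polygon (union of convex parts). The coefficient-tuple layout and all names are assumptions made for this example only.

```python
def in_half_space(coeffs, point):
    """coeffs = (a_1, ..., a_d, a_{d+1}); tests
    a_1*x_1 + ... + a_d*x_d + a_{d+1} <= 0 for point = (x_1, ..., x_d)."""
    *a, constant = coeffs
    return sum(ai * xi for ai, xi in zip(a, point)) + constant <= 0


def in_convex_polygon(half_planes, point):
    """A convex polygon is the intersection of its half-planes."""
    return all(in_half_space(h, point) for h in half_planes)


def in_polygon(convex_parts, point):
    """A polygon is a union of convex polygons."""
    return any(in_convex_polygon(part, point) for part in convex_parts)


# Unit square as four half-planes: -x <= 0, x - 1 <= 0, -y <= 0, y - 1 <= 0
UNIT_SQUARE = [(-1, 0, 0), (1, 0, -1), (0, -1, 0), (0, 1, -1)]
# The same square shifted one unit to the right
SHIFTED_SQUARE = [(-1, 0, 1), (1, 0, -2), (0, -1, 0), (0, 1, -1)]
```

With these definitions, `in_polygon([UNIT_SQUARE, SHIFTED_SQUARE], p)` describes the 2 x 1 rectangle formed by the two squares.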

SLIDE 9

Compatibility of models and representations

Representation    Space model: Entity-based    Space model: Field-based
Tessellation      Possible                     Natural
Vectors           Natural                      Possible
Half-planes       Possible                     Unlikely

SLIDE 10

Computational geometry: typical problems

  • Is a point inside a polygon?
  • Intersection of line segments
  • Intersections of polylines
  • Intersection of polygons
  • Windowing and clipping with a rectangle
  • Polygon triangulation
  • Polygon trapezoidalization
  • Partitioning of a polygon into convex sub-polygons
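The first problem in the list is commonly solved with the ray-casting (even-odd) rule: count how many polygon edges a horizontal ray from the point crosses. The slides do not prescribe an algorithm, so the sketch below is just one standard choice; points lying exactly on the boundary are not treated specially.

```python
def point_in_polygon(point, polygon):
    """Even-odd test: polygon is a list of (x, y) vertices in boundary order.

    A ray is cast from the point towards +x; an odd number of edge
    crossings means the point is inside.
    """
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside
```

The division is safe because an edge that straddles the line cannot be horizontal.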

SLIDE 11

Computational geometry: algorithmic techniques for big problems

(a) Incremental algorithms

  • Solve the problem for a small subset of the input and add the rest one by one, maintaining the solution at each step.

(b) Divide-and-conquer strategy

  • Divide step: recursively split the task into subproblems, until those can be solved easily.
  • Conquer step: merge the subproblem solutions bottom-up into a global solution.

(c) Sweep-line method

  • Decompose the input into vertical strips, so that the information related to the problem is located on the lines separating the strips.

SLIDE 12

Storage and retrieval of spatial objects

Preliminary issues:

  • Arbitrary shapes are difficult to handle ⇒ restriction to axis-oriented Minimum Bounding Rectangles (MBR), also called Bounding Boxes (BB).
  • Dimensions are often transformed to the real interval [0, 1); the whole space is then a hypercube, denoted $E^k$.

Performance factors:

  • Selected data structure
  • Dimensionality of the space
  • Distribution of objects in space:
      • density at point P = number of rectangles containing P
      • global density = maximum of local densities
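The notions of MBR and density can be made concrete with a few lines of code. This is a brute-force sketch for the 2-dimensional case, assuming closed rectangles stored as (lx, ly, ux, uy) tuples; the names are illustrative. The global density uses the fact that if several closed axis-parallel rectangles share a point, they also share the point formed by their largest lower x and largest lower y.

```python
from itertools import product


def mbr(points):
    """Axis-oriented minimum bounding rectangle of a list of (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def contains_point(rect, p):
    lx, ly, ux, uy = rect
    return lx <= p[0] <= ux and ly <= p[1] <= uy


def density(rects, p):
    """Local density: number of rectangles containing point p."""
    return sum(contains_point(r, p) for r in rects)


def global_density(rects):
    """Maximum local density, checked at all (lower x, lower y) candidates.

    Brute force, O(n^3); adequate for an illustration only.
    """
    xs = [r[0] for r in rects]
    ys = [r[1] for r in rects]
    return max((density(rects, (x, y)) for x, y in product(xs, ys)), default=0)
```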

SLIDE 13

Illustration of MBRs

SLIDE 14

Query types for spatial objects

(1) Exact-match query: Not very common for spatial objects, except in the context of insert.

(2) Point query: For a point P ∈ $E^k$, find all rectangles R in the database such that P ∈ R.

(3) Rectangle intersection: For a given rectangle S ⊆ $E^k$, find all rectangles R with S ∩ R ≠ ∅.

(4) Rectangle enclosure: For a given rectangle S ⊆ $E^k$, find all rectangles R with S ⊆ R.

(5) Rectangle containment: For a given rectangle S ⊆ $E^k$, find all rectangles R with R ⊆ S.

(6) Volume query: Given v1, v2 ∈ (0, 1) with v1 ≤ v2, find all rectangles with volume within [v1, v2].

(7) Spatial join: For two sets of k-dimensional rectangles, find all related pairs satisfying a given join condition, such as intersection, enclosure, or containment.
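Query types (2) to (6) reduce to simple predicates on k-dimensional rectangles. The sketch below evaluates them by a linear scan over a small in-memory "database"; the (lower, upper) tuple representation, the closed-rectangle convention and the helper names are assumptions for illustration, not part of the slides.

```python
from math import prod


def contains_point(r, p):
    """P in R, for a closed rectangle r = (lower, upper)."""
    lo, hi = r
    return all(l <= x <= h for l, x, h in zip(lo, p, hi))


def intersects(r, s):
    """S and R share at least one point."""
    (rlo, rhi), (slo, shi) = r, s
    return all(rl <= sh and sl <= rh
               for rl, rh, sl, sh in zip(rlo, rhi, slo, shi))


def encloses(r, s):
    """S is a subset of R."""
    (rlo, rhi), (slo, shi) = r, s
    return all(rl <= sl and sh <= rh
               for rl, rh, sl, sh in zip(rlo, rhi, slo, shi))


def volume(r):
    lo, hi = r
    return prod(h - l for l, h in zip(lo, hi))


def query(db, predicate):
    """Names of all rectangles in db (a dict name -> rectangle) that match."""
    return [name for name, r in db.items() if predicate(r)]


DB = {
    "A": ((0.1, 0.1), (0.5, 0.5)),
    "B": ((0.4, 0.4), (0.9, 0.9)),
    "C": ((0.2, 0.2), (0.3, 0.3)),
}
S = ((0.15, 0.15), (0.35, 0.35))
```

For instance, `query(DB, lambda r: encloses(r, S))` is the enclosure query (4), and `query(DB, lambda r: encloses(S, r))` is the containment query (5), since containment is enclosure with the roles swapped.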

SLIDE 15

Illustration of spatial join

Intersection-join of { R1, R2, R3 } and { S1, S2, S3, S4 } is { (R1, S2), (R2, S2), (R3, S3) }.

[Figure: rectangles R1, R2, R3 and S1, S2, S3, S4]
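The simplest way to evaluate an intersection join is a nested loop over both sets. The coordinates below are hypothetical (the figure's actual coordinates are not recoverable); they are chosen only so that the result matches the pairs listed on the slide.

```python
def intersects(r, s):
    """Closed 2-dim. rectangles (lx, ly, ux, uy) share at least one point."""
    return (r[0] <= s[2] and s[0] <= r[2] and
            r[1] <= s[3] and s[1] <= r[3])


def intersection_join(rs, ss):
    """Nested-loop spatial join: all (r_name, s_name) pairs that intersect."""
    return [(rn, sn)
            for rn, r in rs.items()
            for sn, s in ss.items()
            if intersects(r, s)]


# Hypothetical coordinates, chosen to reproduce the slide's result
R = {"R1": (0, 3, 2, 5), "R2": (3, 3, 5, 5), "R3": (6, 0, 8, 2)}
S = {"S1": (0, 0, 1, 1), "S2": (1, 4, 4, 6),
     "S3": (7, 1, 9, 3), "S4": (10, 10, 11, 11)}
```

The nested loop costs one comparison per pair; spatial indexes exist precisely to avoid examining all pairs.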

SLIDE 16

Transformation approach for organizing sets of spatial objects

  • A k-dimensional rectangle can be represented as a 2k-dimensional point.
  • Alternatives, e.g. in 2-dimensional space:
      (a) (cx, cy, ex, ey), where (cx, cy) is the center point and ex and ey are the distances of the center from the sides.
      (b) (lx, ly, ux, uy), where (lx, ly) is the lower left and (ux, uy) is the upper right corner of the rectangle.
  • Advantage of alternative (a): the location coordinates cx and cy are distinct from the extension coordinates ex and ey.

Special case: 1-dimensional space [0, 1)

  • Rectangle = line segment ⊆ [0, 1)
  • Alternative 2-dimensional representations:
      (a) (c, e) = (center, half of length)
      (b) (l, u) = (lower endpoint, upper endpoint)
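The two 2-dimensional alternatives carry the same information, so converting between them is a matter of simple arithmetic. A minimal sketch follows; the function names are illustrative.

```python
def corners_to_center(lx, ly, ux, uy):
    """Alternative (b) -> alternative (a): (lx, ly, ux, uy) -> (cx, cy, ex, ey)."""
    return ((lx + ux) / 2, (ly + uy) / 2, (ux - lx) / 2, (uy - ly) / 2)


def center_to_corners(cx, cy, ex, ey):
    """Alternative (a) -> alternative (b): (cx, cy, ex, ey) -> (lx, ly, ux, uy)."""
    return (cx - ex, cy - ey, cx + ex, cy + ey)
```

In both cases a 2-dimensional rectangle becomes a single point of a 4-dimensional space, which can then be stored in any index structure designed for points.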

SLIDE 17

Example of the transformation approach

Notes:

  • When point access methods (PAMs) are applied to the transformed representation, they suffer from the empty triangles (so-called dead regions).
  • The (center, extension) approach can be improved if we know an upper bound for the rectangle side; the 'live space' will then look like a trapezoid, and the dead triangles are relatively small.

[Figure: segments L1-L4 on the interval [0, 1); points P1-P4 plotted on axes (c, e) and points S1-S4 plotted on axes (l, u)]

SLIDE 18

Answering queries in the transformation approach

  • The areas of the transformed space that satisfy each query type (the 'successful areas') can be derived by simple geometric calculations.
  • Example: 1-dim. rectangles (= line segments) represented as 2-dim. points using the (center, extension) approach; query rectangle S = (c, e):

[Figure: the (c, e) plane, with c up to 1 and e up to 0.5, partitioned into regions labeled R ⊇ S, R ⊆ S, R ∩ S ≠ ∅ and R ∩ S = ∅]

  • Drawback: rectangles that are close to each other but differ in volume may be located quite far apart in the 2k-dimensional space.
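For the 1-dimensional example, the regions follow from comparing the distance between the centers with the sum or difference of the extensions. The sketch below assumes closed segments R = [c_R − e_R, c_R + e_R] and S = [c_S − e_S, c_S + e_S]; the function names and the labels returned by `classify` are illustrative.

```python
def intersects(r, s):
    """R and S overlap iff |c_R - c_S| <= e_R + e_S."""
    (cr, er), (cs, es) = r, s
    return abs(cr - cs) <= er + es


def encloses(r, s):
    """R is a superset of S iff |c_R - c_S| <= e_R - e_S."""
    (cr, er), (cs, es) = r, s
    return abs(cr - cs) <= er - es


def contained_in(r, s):
    """R is a subset of S iff |c_R - c_S| <= e_S - e_R."""
    (cr, er), (cs, es) = r, s
    return abs(cr - cs) <= es - er


def classify(r, s):
    """Region of the (c, e) plane in which point r lies, relative to query s."""
    if encloses(r, s):
        return "encloses"
    if contained_in(r, s):
        return "contained"
    if intersects(r, s):
        return "intersects"
    return "disjoint"
```

Each condition is bounded by two lines of slope ±1 through the query point (c, e), which is why the regions in the (c, e) plane are wedge-shaped.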

SLIDE 19

Clipping approach for organizing sets of spatial objects

  • Assumption: space is partitioned into disjoint rectangular regions (as with most PAMs).
  • A new rectangle R may be located in two main positions:
      • R is inside one region: simple to handle (as in a PAM).
      • R intersects at least two regions.
  • In clipping, each intersection piece is inserted as a separate rectangle, but all pieces point to the same actual object (stored elsewhere).

[Figure: rectangles R1-R6 over the partition; R3 is clipped into pieces R31-R34, R4 into R41-R42 and R5 into R51-R52]
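A sketch of the clipping step, assuming a fixed 2 x 2 grid partition of [0, 1) x [0, 1) and rectangles stored as (lx, ly, ux, uy). The region names, the grid and the record layout (region, piece, object id) are hypothetical; a real PAM would determine the regions itself.

```python
def clip(rect, region):
    """Intersection of rect with region, or None if it has no area."""
    lx, ly = max(rect[0], region[0]), max(rect[1], region[1])
    ux, uy = min(rect[2], region[2]), min(rect[3], region[3])
    if lx < ux and ly < uy:
        return (lx, ly, ux, uy)
    return None


def clip_into_regions(obj_id, rect, regions):
    """One (region_id, piece, obj_id) record per region that rect intersects.

    Every piece carries the same obj_id, i.e. a pointer to the single
    stored object.
    """
    records = []
    for region_id, region in regions.items():
        piece = clip(rect, region)
        if piece is not None:
            records.append((region_id, piece, obj_id))
    return records


# Hypothetical partition of the unit square into four regions
REGIONS = {
    "SW": (0.0, 0.0, 0.5, 0.5), "SE": (0.5, 0.0, 1.0, 0.5),
    "NW": (0.0, 0.5, 0.5, 1.0), "NE": (0.5, 0.5, 1.0, 1.0),
}
```

A rectangle lying inside one region yields a single record, while a rectangle straddling the grid center yields four records that all refer to the same object.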

SLIDE 20

Clipping approach: viewpoints

Advantages:

  • Clipping can be implemented almost directly with any PAM.
  • Points and rectangles can be stored in the same file.

Disadvantages:

  • Increased space demand (multiple pointers to the same object).
  • Increased insert and delete costs.
  • Overflow pages are needed if the global density is high.

Query performance:

  • Exact-match, point and enclosure queries need only one page access if there are no overflows.
  • Intersection and containment queries may require all pieces of the clipped query rectangle to be inspected.
  • The number of false drops may be high.

SLIDE 21

Overlapping regions for organizing sets of spatial objects

  • Each rectangle is represented only once in the database.
  • Rectangles are grouped into disk pages.
  • A group region is represented by its Minimum Bounding Rectangle.
  • Regions may overlap.

[Figure: example with rectangles R1-R10 grouped into overlapping regions]

SLIDE 22

Overlapping regions: viewpoints

Possible drawbacks:

  • High overlap deteriorates performance.
  • The overlap of MBRs may be much higher than the overlap of the base rectangles.
  • Exact-match query, insert and delete may require accessing more than one data page.
  • Intersection and containment queries may require accesses to the same pages, though the latter usually has a much smaller result size (every contained rectangle also intersects).

Generalization:

  • Regions (MBRs of groups) may be grouped further into higher-level rectangles.
  • A tree structure is thus formed.

SLIDE 23

Index utilizing overlapping regions: R-tree

  • R-tree = Rectangle tree (Guttman 1984).
  • Balanced, dynamic external tree structure, where node = page.
  • Used e.g. by the Oracle spatial extension.

Node types:

  • A leaf contains (R, ptr) pairs, where R is the MBR of the actual spatial object and ptr points to its precise representation.
  • An internal node contains (R, ptr) pairs, where R is the MBR of the rectangles in a child and ptr points to that child.

SLIDE 24

Example R-tree

[Figure: example R-tree over rectangles R1-R16, shown both as a spatial layout and as the corresponding tree]

SLIDE 25

Properties of the R-tree

  • Bounding rectangles on the path from root to leaf are nested.
  • Otherwise there are no restrictions on overlaps; however, they should be minimized.
  • For page capacity M (entries), a lower bound m ≤ M/2 is defined for the number of entries per page.
  • For N entries, height ≤ $\lceil \log_m N \rceil - 1$, and number of nodes ≤ $N/(m - 1)$.
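To get a feeling for these bounds, they can be evaluated for concrete values of N and m. The sketch below uses integer arithmetic for the ceiling of the logarithm to avoid floating-point rounding; the function names are illustrative, and m ≥ 2 is assumed.

```python
def ceil_log(n, base):
    """Smallest integer h with base**h >= n (integer arithmetic only)."""
    h, power = 0, 1
    while power < n:
        power *= base
        h += 1
    return h


def rtree_bounds(n_entries, m):
    """Upper bounds from the slide for N entries and minimum fill m >= 2:
    height <= ceil(log_m N) - 1 and number of nodes <= N / (m - 1)."""
    height_bound = ceil_log(n_entries, m) - 1
    node_bound = n_entries / (m - 1)
    return height_bound, node_bound
```

For example, with N = 1,000,000 entries and m = 50, the height bound is 3 and the node bound is about 20,408.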

SLIDE 26

R-tree: Basic queries

(a) Point query: Find objects that contain a given point.

  • From the root node, search (recursively) all subtrees whose MBR contains the point.
  • From the leaf level we get pointers to candidate objects, which are finally checked.

(b) Intersection query: Find objects intersecting the query rectangle.

  • Processing is similar to (a), but now the condition is overlap, not containment.

Other query types are generalized in the same way.

Performance:

  • No guarantee, because multiple paths may have to be followed.
  • The amount of overlap in index regions (corresponding to internal nodes) determines the performance.
  • The insert operation plays the most important role in minimizing the overlap.
  • The splitting of pages at overflow should minimize the overlap of the halves.
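The search procedure for queries (a) and (b) can be sketched on a small hand-built tree. The `Node` class, the (lx, ly, ux, uy) rectangle layout and the example tree are assumptions for illustration; insertion, splitting and the final check against the precise geometry are omitted.

```python
from dataclasses import dataclass


@dataclass
class Node:
    is_leaf: bool
    entries: list  # (mbr, ptr) pairs; ptr is an object id or a child Node


def contains_point(r, p):
    return r[0] <= p[0] <= r[2] and r[1] <= p[1] <= r[3]


def intersects(r, s):
    return (r[0] <= s[2] and s[0] <= r[2] and
            r[1] <= s[3] and s[1] <= r[3])


def search(node, predicate):
    """Follow every entry whose MBR satisfies the predicate.

    Using the same predicate on all levels is valid for point and
    intersection queries, because a child's MBR lies inside its
    parent's MBR. The result is a list of candidate object ids.
    """
    result = []
    for rect, ptr in node.entries:
        if predicate(rect):
            if node.is_leaf:
                result.append(ptr)
            else:
                result.extend(search(ptr, predicate))
    return result


def point_query(root, p):
    return search(root, lambda r: contains_point(r, p))


def intersection_query(root, s):
    return search(root, lambda r: intersects(r, s))


# Hypothetical two-level tree whose leaf regions overlap
leaf1 = Node(True, [((0, 0, 2, 2), "a"), ((1, 1, 3, 3), "b")])
leaf2 = Node(True, [((5, 5, 7, 7), "c"), ((2.5, 2.5, 6, 6), "d")])
root = Node(False, [((0, 0, 3, 3), leaf1), ((2.5, 2.5, 7, 7), leaf2)])
```

A point in the overlap of the two leaf regions, such as (2.75, 2.75), forces the search to visit both leaves, which illustrates why overlap determines the performance.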

SLIDE 27

Other spatial index structures

R*-tree: improved version of the R-tree (Beckmann et al., 1990)

  • Defers splitting of pages by using forced reinsert of the rectangles that are the most remote from the center of the page MBR.
  • Applies a more sophisticated O(M log M)-time splitting heuristic.
  • Outperforms the R-tree.
  • Good also as a PAM (in low-dimensional spaces).
  • A popular 'reference structure' for other spatial data structures.

X-tree (Berchtold et al., 1996)

  • Outperforms the R*-tree in high-dimensional spaces.
  • Adapts to the number of dimensions.
  • General conjecture: when the dimensionality of the space grows, a sequential index becomes more and more advantageous. The X-tree addresses this by using variable-size nodes.
SLIDE 28

Geographic databases: vocabulary

Geographic object:

  • Two components:
      • Descriptive component with alphanumeric attributes, e.g. city: name, population.
      • Spatial component (also called spatial object) describes the geometry (location, shape), e.g. city: polygon in 2-dim. space.
  • Atomic/complex geographic objects: a complex object consists of other atomic/complex objects.

Theme:

  • Class (type) of geographic objects.
  • Corresponds to a relation; it has a schema and instances.
  • Example themes: rivers, cities, countries, roads.

SLIDE 29

Geospatial operations

Theme projection to a subset of descriptive attributes:

  • Corresponds to relational projection.
  • Visual effect: some of the map attributes are dropped.

Theme selection on the basis of descriptive attributes:

  • Corresponds to relational selection.
  • Keeps only the geographic objects satisfying a selection condition.
  • Visual effect: some of the objects are dropped.

Geometric selection:

  • Windowing selects objects intersecting a given rectangle.
  • Point query selects objects whose geometry contains a given point.
  • Clipping differs from windowing in that only the intersections, not whole geometric objects, are taken to the result.

SLIDE 30

Geospatial operations (cont.)

Theme union:

  • Corresponds to relational union.
  • Combines two themes having the same schema.

Theme overlay:

  • A common operation in GIS applications.
  • Spatial join: compute intersections.
  • New geographic objects are created from the intersections, with the descriptive attributes of both components; the spatial component is the geometric intersection.

Metric operations, e.g.:

  • Distance between Turku and Helsinki.

Topological operations, e.g.:

  • List countries adjacent to Finland (Sweden, Norway, Russia, Estonia).
  • List cities reachable by train from Turku without stops (Salo, Loimaa).

SLIDE 31

Geospatial software products

ArcGIS:

  • Group of tools for geographic information systems.
  • The geodatabase is a central component: an object-relational implementation of spatial data.

For internet applications:

  • ArcIMS (Esri)
  • MapServer (open-source)
  • GeoServer (open-source)
SLIDE 32

Geospatial extensions to relational database systems

Oracle Spatial:

  • SQL extended with operators on the spatial data type.
  • Spatial indexing:
      • R-tree
      • Quadtree based on z-order numbering
  • Query optimization, e.g. for spatial joins.

PostgreSQL:

  • 'Object-relational' DBMS; open-source, popular.
  • Extended features:
      • Geometric types: point, line, line segment, rectangle, open and closed polyline, polygon, circle.
      • Operations on geometric types: translation, scaling, various tests.
      • Supports a generalized GiST index, with R-tree as a special case.
  • Further extension package: PostGIS.