
Distributed Databases Stefan Kufer and Andreas Henrich - PowerPoint PPT Presentation



  1. Quadtree-based Resource Description Techniques for Spatial Data in Distributed Databases Stefan Kufer and Andreas Henrich stefan.kufer@uni-bamberg.de University of Bamberg Media Informatics Group Stuttgart, 09.03.2017

  2. Motivation
     - age of social media: creation and distribution of media items → maintained in (personal) media archives
     - large, heterogeneous distributed database of various resources (= nodes in the network) …
     → adequate indexing techniques are needed
     [Figure: heterogeneous resources in the distributed database]
     Quadtree-based Resource Description Techniques for Spatial Data in Distributed Databases (p. 2) Stefan Kufer and Andreas Henrich − BTW 2017 in Stuttgart, March 09, 2017

  3. Problem Description
     - search criteria to be addressed:
       - text
       - timestamps
       - content features
       - geographic information
     - retrieval tasks in a distributed environment:
       - resource description problem
       - resource selection problem
       - (result merging)

  4. Search Scenario
     - general preliminaries:
       - set of resources
       - each resource maintains a set of geotagged media items
     - plate-carrée projection:
       - lat/lon coordinates = y/x coordinates in a 2-dimensional plane
       - more general spatial data scenario
     - summaries of the spatial content of a resource (resource description)
     - query routing based on summaries
     [Figure: resource A with two example data points, [lat/y=48.22, lon/x=11.62] and [lat/y=-33.86, lon/x=151.22], summarized into a resource description]

  5. Search Scenario
     - similarity query criterion: d(q,o), where d = Euclidean distance, q = query object, o = database object
     [Figure: resources A, B, and C are each summarized into a resource description; resource selection ranks them for the query object as 1. C, 2. A, 3. B. Legend: query object; resource data point (database object)]

  6. Resource Descriptions
     - objective: encoding sets of two-dimensional data points
       - effectiveness → accurate delineation (selectivity)
       - efficiency → compact storage (space efficiency)
     - categories of resource description techniques (previous work): [KBH12], [KBH13], [KH14]
       - Geometric Approaches
       - Space Partitioning Approaches
       - Hybrid Approaches

  7. Geometric Approaches
     - approaches that organize the data
     - one or several bounding volumes (bv) to delimit the set of data points → extents of bv described in summaries
     - evaluated approaches:
       - MBR (as a comparative baseline)
       - RecMAR_k, k = maximum number of Minimum Area Rectangles
     [Figure: MBR compared with RecMAR_2]

  8. Geometric Approaches (continued)
     [Figure: MBR compared with RecMAR_3]

  9. Geometric Approaches (continued)
     [Figure: MBR compared with RecMAR_6]
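
As a small illustration of the MBR baseline named on the Geometric Approaches slides, the following Python sketch computes the minimum bounding rectangle of a resource's data points. It is only a sketch under assumptions: the function name `mbr` and the tuple layouts are hypothetical, and the sample points are the two coordinates shown on the Search Scenario slide, with x = lon and y = lat as in the plate-carrée projection.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]                # (x, y) = (lon, lat)
Rect = Tuple[float, float, float, float]   # (min_x, min_y, max_x, max_y)


def mbr(points: Iterable[Point]) -> Rect:
    """Return the smallest axis-aligned rectangle enclosing all points."""
    pts = list(points)
    if not pts:
        raise ValueError("cannot build an MBR for an empty point set")
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    return (min(xs), min(ys), max(xs), max(ys))


# Hypothetical resource holding the two example points from the slides.
resource_a = [(11.62, 48.22), (151.22, -33.86)]
print(mbr(resource_a))
```

With only two far-apart points the rectangle covers a large, mostly empty area, which is the selectivity weakness that using several rectangles (RecMAR_k) is meant to reduce.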

  10. Space Partitioning Approaches
      - approaches that organize the embedding space
      - decompose the space into disjoint subspaces, identify regions (not) containing data points → information about cell occupancy in summaries (0 = non-occupied, 1 = occupied)
      - evaluated approach:
        - UFS_n, n = number of sites/subspaces
      - other examples (not evaluated!): uniform grid, kd space partitioning
      [Figure: uniform grid, kd space partitioning, UFS_32]

  11. Space Partitioning Approaches
      - global space partitioning → the same for all resources! (summaries only need to contain information about cell occupancy)
      - space partitioning must be adapted to the data distribution of the whole data collection!
      - additional tasks:
        - collect information about the data distribution in the network
        - partition space, distribute information in the network
        - (update information as data collection changes)
      [Figure: resources A, B, C, D]
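
The cell-occupancy idea behind the space partitioning summaries can be sketched with the simplest partitioning mentioned on the slides, a uniform grid (listed there as an example that was not evaluated; the evaluated UFS_n partitioning is not reproduced here). The function name `grid_occupancy`, the row-major bit layout, and the choice of twice as many columns as rows over the lon/lat plane are assumptions for illustration.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float]  # (lon, lat)


def grid_occupancy(points: Iterable[Point], rows: int) -> List[int]:
    """Return one bit per grid cell: 1 = occupied, 0 = non-occupied.

    Assumes a global grid over lon in [-180, 180] and lat in [-90, 90]
    with 2 * rows columns, stored row by row.
    """
    cols = 2 * rows
    bits = [0] * (rows * cols)
    for lon, lat in points:
        # Clamp so that points on the upper border fall into the last cell.
        col = min(int((lon + 180.0) / 360.0 * cols), cols - 1)
        row = min(int((lat + 90.0) / 180.0 * rows), rows - 1)
        bits[row * cols + col] = 1
    return bits


# The two example points from the Search Scenario slide.
summary = grid_occupancy([(11.62, 48.22), (151.22, -33.86)], rows=2)
print(summary)
```

Because the partitioning is global, every resource uses the same cell layout, so a summary needs nothing more than this bit vector.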

  12. Hybrid Approaches
      - combine properties of two arbitrary resource description techniques
      - method A: builds foundation, method B: refines foundation
      - evaluated approach:
        - KDMBR^b_n, n = number of subspaces, b = number of bits per bound (4*b for an MBR)
        → summary: binary information about cell occupancy (foundation), quantized MBR information for occupied cells (refinement)
      [Figure: KDMBR^3_32]
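
How an MBR might be stored with b bits per bound (4*b bits in total) can be sketched as follows. The slides do not spell out the quantization scheme, so everything here is an assumption: bounds are quantized relative to the enclosing cell, lower bounds are rounded down and upper bounds rounded up so that the quantized rectangle still encloses the exact one, and the names `quantize_mbr` and `dequantize_mbr` are hypothetical.

```python
import math
from typing import Tuple

Rect = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)
Codes = Tuple[int, int, int, int]


def quantize_mbr(mbr: Rect, cell: Rect, b: int) -> Codes:
    """Encode each MBR bound as a b-bit integer relative to the cell."""
    levels = (1 << b) - 1  # largest code representable with b bits

    def code(v: float, lo: float, hi: float, rounding) -> int:
        t = (v - lo) * levels / (hi - lo)
        return max(0, min(levels, int(rounding(t))))

    return (
        code(mbr[0], cell[0], cell[2], math.floor),  # lower bounds: round down
        code(mbr[1], cell[1], cell[3], math.floor),
        code(mbr[2], cell[0], cell[2], math.ceil),   # upper bounds: round up
        code(mbr[3], cell[1], cell[3], math.ceil),
    )


def dequantize_mbr(codes: Codes, cell: Rect, b: int) -> Rect:
    """Map the b-bit codes back to coordinates inside the cell."""
    levels = (1 << b) - 1
    w, h = cell[2] - cell[0], cell[3] - cell[1]
    return (
        cell[0] + codes[0] / levels * w,
        cell[1] + codes[1] / levels * h,
        cell[0] + codes[2] / levels * w,
        cell[1] + codes[3] / levels * h,
    )


cell = (0.0, 0.0, 8.0, 8.0)
exact = (1.5, 2.0, 3.2, 6.9)
codes = quantize_mbr(exact, cell, b=3)
approx = dequantize_mbr(codes, cell, b=3)
print(codes, approx)
```

A larger b tightens the quantized rectangle around the data at the cost of more bits per occupied cell.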

  13. Novel Quadtree-based Resource Description Techniques
      - quadtree: recursive division of space into four quadrants
      - regular decomposition (equal sized cells) → linear storage of quadtrees possible (memory efficient representation) [MRJ02]
      - linear quadtree encoding types (cf. paper!):
        - only black nodes encoding
        - whole quadtree structure (all internal nodes + leaves)

  14. Novel Quadtree-based Resource Description Techniques
      - linear quadtrees: allow for local space partitioning
        - adapted to the data distribution of the single resource
      - area-driven decomposition of the space, parameters:
        - c → maximum number of subspaces of the quadtree structure (storage space oriented stopping criterion)
        - a → threshold area, if undercut by all black cells: end of construction (selectivity oriented stopping criterion)
      [Figure: resources A, B, C, D]
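
The two stopping criteria c and a can be illustrated with a rough Python sketch of an area-driven quadtree decomposition. The slides leave the details to the paper, so this is not the authors' construction. It assumes that the largest occupied (black) cell is split first, that "subspaces" counts the leaf cells, and that all points lie inside the half-open root cell; the name `build_quadtree_cells` and its helpers are hypothetical.

```python
import heapq
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Cell = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def area(cell: Cell) -> float:
    return (cell[2] - cell[0]) * (cell[3] - cell[1])


def contains(cell: Cell, p: Point) -> bool:
    # Half-open test so each point falls into exactly one quadrant.
    return cell[0] <= p[0] < cell[2] and cell[1] <= p[1] < cell[3]


def quadrants(cell: Cell) -> List[Cell]:
    x0, y0, x1, y1 = cell
    mx, my = (x0 + x1) / 2, (y0 + y1) / 2
    return [(x0, y0, mx, my), (mx, y0, x1, my),
            (x0, my, mx, y1), (mx, my, x1, y1)]


def build_quadtree_cells(points: Sequence[Point], root: Cell,
                         c: int, a: float) -> Tuple[List[Cell], List[Cell]]:
    """Return (black_cells, white_cells) of the leaf level."""
    if not points:
        return [], [root]
    black = [(-area(root), root, list(points))]  # max-heap on cell area
    white: List[Cell] = []
    while black:
        neg_area, cell, pts = black[0]
        if -neg_area < a:
            break  # largest black cell undercuts a, hence all of them do
        if len(black) + len(white) + 3 > c:
            break  # a split replaces one leaf by four: would exceed c
        heapq.heappop(black)
        for q in quadrants(cell):
            inside = [p for p in pts if contains(q, p)]
            if inside:
                heapq.heappush(black, (-area(q), q, inside))
            else:
                white.append(q)
    return [cell for _, cell, _ in black], white


pts = [(1.0, 1.0), (7.0, 7.0)]
black_cells, white_cells = build_quadtree_cells(pts, (0.0, 0.0, 8.0, 8.0), c=7, a=1.0)
print(sorted(black_cells), len(white_cells))
```

In this example construction ends because of c; with a generous c and a larger a, for instance `c=100, a=20.0`, it ends as soon as every black cell is smaller than the threshold area.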

  15. Novel Quadtree-based Resource Description Techniques
      - QT_{c,a}
        - space partitioning (sp) technique
        - resource-individual sp (local sp)
        - [Figure: QT_{32,0.1}]
      - GridQT^{c,a}_r, r = number of rows (columns = 2*r)
        - hybrid technique: uniform grid (global sp) + qt-structure (local sp)
        - [Figure: GridQT^{32,0.1}_4]
      - KDQT^{c,a}_n
        - hybrid technique: kd-structure (global sp) + qt-structure (local sp)
        - [Figure: KDQT^{32,0.1}_32]

  16. Novel Quadtree-based Resource Description Techniques
      - QTMBR^b_{c,a}
        - hybrid technique: qt-structure (local sp) + quantized MBRs (bv)
        - [Figure: QTMBR^3_{32,0.1}]
      - MBRQT^{c,a}
        - hybrid technique: external MBR (bv) + qt-structure (local sp)
        - [Figure: MBRQT^{32,0.1}]

  17. Resource Selection - Ranking
      - all techniques describe areas containing data points → ranking is based on minimum distance between the areas of a resource and the query point q (cf. paper for details!)
      - example: mindist of the areas described by the summary of resource B < mindist of the areas described by the summary of resource A ⇒ B ranked higher than A
      [Figure: resources A and B with resource data points and query point q]
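
The mindist-based ranking can be sketched in Python for the case where each summary describes its areas as axis-aligned rectangles. The details are in the paper, so this is an assumed minimal version: `mindist` is the usual Euclidean point-to-rectangle distance, each resource is scored by its closest rectangle, and the names `rank_resources` and `summaries` as well as the sample rectangles are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def mindist(q: Point, rect: Rect) -> float:
    """Euclidean distance from q to the nearest point of rect (0 if inside)."""
    dx = max(rect[0] - q[0], 0.0, q[0] - rect[2])
    dy = max(rect[1] - q[1], 0.0, q[1] - rect[3])
    return math.hypot(dx, dy)


def rank_resources(q: Point, summaries: Dict[str, List[Rect]]) -> List[str]:
    """Order resources by the mindist of their closest described area.

    Assumes every summary contains at least one rectangle.
    """
    scores = {name: min(mindist(q, r) for r in rects)
              for name, rects in summaries.items()}
    return sorted(scores, key=scores.get)


summaries = {
    "A": [(0.0, 0.0, 1.0, 1.0)],
    "B": [(4.0, 4.0, 5.0, 5.0), (2.0, 2.0, 3.0, 3.0)],
}
ranking = rank_resources((3.5, 3.5), summaries)
print(ranking)
```

Here the areas of resource B are closer to q than the area of resource A, so B is ranked higher than A, as in the example on the slide.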

  18. Evaluation – Data Collection
      - 406,450 geo-referenced images from Flickr
      - 5,951 different users → 5,951 resources
      - long-tail distribution of data to resources
      - data space: densely populated and unpopulated areas vary
      [Figure: data distribution, log-scaled! n=4.0 → 10^4 – 1 = 9,999]
