WTF (What Type of Feature)? Classifying Features for Advanced Data - - PowerPoint PPT Presentation

WTF (What Type of Feature)? Classifying Features for Advanced Data Linking, Searching and Analysis Capabilities International Cartographic Conference, Dresden Laura Kostanski | Pier Giorgio Zaccheddu| Linda Merrin | Rob Atkinson August 2013


slide-1
SLIDE 1

International Cartographic Conference, Dresden

WTF (What Type of Feature)?‐ Classifying Features for Advanced Data Linking, Searching and Analysis Capabilities

GOVERNMENT AND COMMERCIAL SERVICES THEME

Laura Kostanski| Pier‐Giorgio Zaccheddu| Linda Merrin | Rob Atkinson August 2013

slide-2
SLIDE 2

Today’s Presentation

1. Known Issues in this Domain
2. Possible Ways of Resolving Problems
3. Spatial Identifier Reference Framework (SIRF)
4. Next Steps

slide-3
SLIDE 3

What Happened?

Mildura, Victoria, Australia

In December 2012, Victoria Police warned motorists not to rely on maps produced by iOS 6.

“the official gazetteer – which is the authoritative reference for the names and locations of 384,104 places, objects and towns in the continent – contains an entry at the precise place to which Apple was directing hapless drivers until making a hurried correction on Monday” (Arthur, 2012)

The Gazetteer of Australia contains over 380,000 place names, including 33 for Mildura.

slide-4
SLIDE 4

Which Data?

A search of the Gazetteer of Australia (www.ga.gov.au/place‐names) identifies 33 records which include the name ‘Mildura’. Among these are:

‐ Mildura Rural City (with the same coordinates as where the tourists were unfortunately directed)
‐ Mildura (the actual township)

The former is identified as POPL, while the latter is identified as LOCB.

‐ POPL is described by the Committee for Geographical Names in Australasia (CGNA) as a ‘mapped populated place’
‐ LOCB is described as a ‘Locality (bounded), Town, Village, Populated place, Local government town, Town site’

slide-5
SLIDE 5

Confused Feature Type Referencing

From the original dataset, the Victorian State Gazetteer:

‐ Rural City of Mildura is an LGA (Local Government Area), a government administrative region covering 22,214 square kilometres
‐ Mildura is a township, which the misdirected tourists were intent on visiting

A centroid coordinate of the expansive LGA boundary is positioned in the exact location to which hapless tourists were directed by iOS 6.

slide-6
SLIDE 6

Composition of a Gazetteer

Gazetteer entries consist of three minimum components:

1. An identifier (name) [N]
2. Feature type [T]
3. Footprint (point, line or polygon) [F]

An identifier (N) can often be utilised multiple times for different location instances, and it is more often than not the feature type (T) which is used to disambiguate one reference from another. However, current practices can hamper the discoverability and reliability of data for end users.
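
As a minimal sketch of the three-component entry and of disambiguation by feature type, the Python below models the two Mildura records. The class name, the `find` helper and the coordinates are assumptions made for illustration; only the names and the POPL and LOCB codes come from the slides.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class GazetteerEntry:
    name: str                       # identifier [N]
    feature_type: str               # feature type [T]
    footprint: Tuple[float, float]  # footprint [F], reduced here to one point


# Placeholder coordinates, invented for illustration (not the real locations).
ENTRIES = [
    GazetteerEntry("Mildura Rural City", "POPL", (0.0, 0.0)),
    GazetteerEntry("Mildura", "LOCB", (1.0, 1.0)),
]


def find(entries: List[GazetteerEntry], name_part: str,
         feature_type: Optional[str] = None) -> List[GazetteerEntry]:
    """Return entries whose name contains name_part, optionally narrowed by type."""
    hits = [e for e in entries if name_part.lower() in e.name.lower()]
    if feature_type is not None:
        hits = [e for e in hits if e.feature_type == feature_type]
    return hits
```

Calling `find(ENTRIES, "Mildura")` returns both records, whereas `find(ENTRIES, "Mildura", "LOCB")` returns only the township. The name alone is ambiguous; the feature type is what separates the two.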

slide-7
SLIDE 7

Why Feature Typing?

In the world of gazetteers, a feature can be described as a real world entity and a feature type is ascribed for categorization purposes (usually from a pre‐determined typing scheme or ontology) (Janowicz, 2008). Finney and Watts (2011) have identified three use‐case scenarios common to the discovery of feature‐level information:

‐ by data providers, when developing their exchange schema, to check for conformity with community‐agreed standards on feature type definitions;
‐ dynamically by a machine client, attempting to resolve and interpret an online reference to some repository content, where the reference was located in an exchanged dataset; and,
‐ by a machine client requesting, or posting, repository content to fulfill some component of its own internal functionality.
slide-8
SLIDE 8

What is the problem?

‐ It is in the typing and categorization of gazetteer data that information can be correlated and linked to make it more readily accessible and understood by both machines (for processing queries) and humans (for application and use).
‐ Machines are incapable of interpreting and understanding geospatial data in a way similar to human capabilities (Wiegand & Garcia, 2007).
‐ One of the main metadata requirements therefore is the adequate description of geospatial data feature types, so that similar details can be discovered and linked across multiple heterogeneous systems.

slide-9
SLIDE 9

Issues facing Interoperable Gazetteer Services

Based on the available literature, it can be assumed that the general issues affecting the development of truly interoperable gazetteer services (both thesauri and ontologies) include:

‐ Lack of clear definition of types (Zhao, Zhang, Wei, & Peng, 2008)
‐ Ambiguous referencing of types to categories (Jung, Sun, & Yuan, 2013)
‐ Missing metadata from original sources (Athanasis, Kalabokidis, Vaitis, & Soulakellis, 2009)
‐ Problems asserting ‘same‐as’ or ‘similar‐to’ definitions across gazetteers (Wiegand & Garcia, 2007)

  • CSIRO. UNSDI Gazetteer for Social Protection in Indonesia
slide-10
SLIDE 10

Ontology‐Defined Feature Catalogues

  • Various theorists prefer to explore the options for ontology‐defined feature catalogues.
  • An ontology is generally referred to as an ‘explicit specification of a conceptualization used to achieve a shared and common understanding of a particular domain of interest’ (Gruber, 1993).
  • An ontology includes specific reference to the characteristics of a feature type so that it might readily be distinguished from all others, while also being recognized with various relationship definitions for other types (such as the is‐a hierarchical relationship).
  • It also defines all relations so that the scope can be validated for query and reasoning services required or applied by users (Janowicz, 1132).
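
To make the is‐a relationship concrete, here is a small sketch that stores parent links between feature types and answers "is X a kind of Y?" by walking up the hierarchy. The type names and links are hypothetical examples, not taken from any published ontology, and the cycle check is only one simple way of validating the relations.

```python
# Hypothetical is-a links: child feature type -> parent feature type.
IS_A = {
    "agricultural school": "school",
    "military school": "school",
    "school": "educational facility",
    "university": "educational facility",
    "educational facility": "institutional site",
}


def ancestors(feature_type, is_a=IS_A):
    """Return every broader type reachable through is-a links, nearest first."""
    chain = []
    seen = {feature_type}
    current = feature_type
    while current in is_a:
        current = is_a[current]
        if current in seen:
            # A loop would make the hierarchy invalid for reasoning.
            raise ValueError(f"cycle in is-a hierarchy at {current!r}")
        seen.add(current)
        chain.append(current)
    return chain


def is_a_kind_of(feature_type, candidate, is_a=IS_A):
    """True if feature_type equals candidate or inherits from it."""
    return feature_type == candidate or candidate in ancestors(feature_type, is_a)
```

With these links, `is_a_kind_of("military school", "educational facility")` holds while `is_a_kind_of("university", "school")` does not. This is the kind of inference a flat code list cannot support.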

slide-11
SLIDE 11

What is being done in this space?

1. Multiple international gazetteers and Feature Type Catalogues have emerged.
2. Two major catalogues referenced and accessed internationally

  • Alexandria Digital Library
  • Geonames.org

3. National Gazetteers and Regional Gazetteers Developed

  • INSPIRE
  • Australian Gazetteer

Presentation title | Presenter name | Page 11

slide-12
SLIDE 12

Alexandria Digital Library (ADL)

1. The ADL Feature Type Thesaurus (FTT) has been segmented into six categories which contain amongst them 200 feature types:

  • administrative areas
  • hydrographic features
  • land parcels
  • manmade features
  • physiographic features
  • regions

2. 209 ‘preferred terms’ and 978 ‘lead‐in terms’

slide-13
SLIDE 13

ADL‐ Example of ‘School’

/concept/#id /concept/BT /concept/descriptor /concept/non‐descriptor /concept/NT /concept/RT /concept/UF

9 administrative areas school districts
18 agricultural schools
350 institutional sites educational facilities academies
350 institutional sites educational facilities agricultural schools
350 institutional sites educational facilities campuses
350 institutional sites educational facilities colleges
350 institutional sites educational facilities military schools
350 institutional sites educational facilities schools
350 institutional sites educational facilities seminaries
350 institutional sites educational facilities training centers
350 institutional sites educational facilities universities
350 institutional sites educational facilities library buildings
350 institutional sites educational facilities research facilities
679 military schools
1005 administrative areas school districts
1006 Schools
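
A thesaurus of this kind is typically used by resolving a lead‐in term to its preferred term before searching. The sketch below shows one way of doing that. It assumes that the rows with id 350 list lead‐in (UF) terms for the preferred term ‘educational facilities’; that reading of the flattened table is an assumption, and the function name is hypothetical.

```python
# Preferred term -> lead-in terms (assumed reading of the id-350 rows).
USED_FOR = {
    "educational facilities": [
        "academies", "agricultural schools", "campuses", "colleges",
        "military schools", "schools", "seminaries", "training centers",
        "universities",
    ],
}

# Inverted index: lead-in term -> preferred term.
LEAD_IN = {
    lead.lower(): preferred
    for preferred, leads in USED_FOR.items()
    for lead in leads
}


def preferred_term(term):
    """Map a user-supplied term to its preferred term, or None if unknown."""
    key = term.strip().lower()
    if key in USED_FOR:
        return key  # already a preferred term
    return LEAD_IN.get(key)
```

Under that assumption, `preferred_term("Schools")` and `preferred_term("universities")` both resolve to ‘educational facilities’, so a search for either lead‐in term lands on the same feature type.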

slide-14
SLIDE 14

Geonames.org

1. 660 fine‐level feature codes mapped to 9 categories

Category CODE   Description
A               country, state, region,...
H               stream, lake, ...
L               parks, area, ...
P               city, village, ...
R               road, railroad
S               spot, building, farm
T               mountain, hill, rock,...
U               Undersea
V               forest, heath, ...

slide-15
SLIDE 15

Geonames.org‐ Example of ‘School’

Category CODE | Feature CODE | Feature Type | Description

S | CTRM | medical center | a complex of health care buildings including two or more of the following: hospital, medical school, clinic, pharmacy, doctor's offices, etc.
S | CTRR | religious center | a facility where more than one religious activity is carried out, e.g., retreat, school, monastery, worship
S | MSSN | mission | a place characterized by dwellings, school, church, hospital and other facilities operated by a religious group for the purpose of providing charitable services and to propagate religion
S | MSSNQ | abandoned mission |
S | NOV | novitiate | a religious house or school where novices are trained
S | SCH | school | building(s) where instruction in one or more branches of knowledge takes place
S | SCHA | agricultural school | a school with a curriculum focused on agriculture
S | SCHC | college | the grounds and buildings of an institution of higher learning
S | SCHL | language school | Language Schools & Institutions
S | SCHM | military school | a school at which military science forms the core of the curriculum
S | SCHN | maritime school | a school at which maritime sciences form the core of the curriculum
S | SCHT | technical school | post‐secondary school with a specifically technical or vocational curriculum
S | UNIP | university prep school | University Preparation Schools & Institutions
S | UNIV | university | An institution for higher learning with teaching and research facilities constituting a graduate school and professional schools that award master's degrees and doctorates and an undergraduate division that awards bachelor's degrees.
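
The following sketch illustrates why ‘school’ is hard to pin down in a flat code list. It takes a subset of the Geonames.org rows on this slide and compares a keyword search over the type labels with one that also reads the descriptions. The dictionary layout and the function name are assumptions for illustration, not part of any Geonames.org interface.

```python
# (category code, feature code) -> (feature type, description).
# A subset of the rows shown on this slide.
FEATURE_CODES = {
    ("S", "CTRM"): ("medical center",
                    "a complex of health care buildings including two or more "
                    "of the following: hospital, medical school, clinic, "
                    "pharmacy, doctor's offices, etc."),
    ("S", "CTRR"): ("religious center",
                    "a facility where more than one religious activity is "
                    "carried out, e.g., retreat, school, monastery, worship"),
    ("S", "NOV"): ("novitiate",
                   "a religious house or school where novices are trained"),
    ("S", "SCH"): ("school",
                   "building(s) where instruction in one or more branches of "
                   "knowledge takes place"),
    ("S", "SCHA"): ("agricultural school",
                    "a school with a curriculum focused on agriculture"),
    ("S", "SCHM"): ("military school",
                    "a school at which military science forms the core of "
                    "the curriculum"),
    ("S", "UNIV"): ("university",
                    "An institution for higher learning with teaching and "
                    "research facilities constituting a graduate school and "
                    "professional schools"),
}


def codes_mentioning(keyword, in_description=True):
    """Return feature codes whose label (and optionally description) contains keyword."""
    keyword = keyword.lower()
    hits = []
    for (_category, code), (label, description) in FEATURE_CODES.items():
        text = label + (" " + description if in_description else "")
        if keyword in text.lower():
            hits.append(code)
    return sorted(hits)
```

On this subset, a label‐only search for "school" returns SCH, SCHA and SCHM, while including descriptions also pulls in CTRM, CTRR, NOV and UNIV. Free‐text matching on descriptions therefore widens a search rather than narrowing it.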

slide-16
SLIDE 16

Australian Gazetteer‐ Categories

24 Feature Categories for 120 fine‐level feature types

Category Type(s) Administrative AF Airfields BATH CHAN LDGE OCEN SEA Bathymetric BAY BGHT COVE GULF BaysandGulfs BORE RH SOAK SPRG TANK BoresTanksandWaterpoints BCST BLDG COMM CP FARM HMSD RSTA RUIN SCHL SITE YD BuiltStructures CAPE ISTH PEN PT SPIT Caves CAVE CoastalFeatures BANK BCH BRKW ENTR ESTY LH NAVB SHOL SND STR WRCK DamsandLocks DAM LOCK ForestsandAgriculture FRNG MONU TOWR TREE HillsandMountains FRST GRDN PLAN IslandsandReefs HILL MT PASS PEAK RDGE RNGE SLP Landmarks ARCH BRK IS REEF MineandFuelsites GASF MINE QUAR OtherLandforms CLAY CLIF DSRT DUNE PL PLN ROCK SPAN ParksandReserves CEM RESV PointsCapesandPeninsulas ANCH DOCK HBR PIER PORT PortsandDocks LOCB LOCU SUB URBN RoadsandTrails BRDG FORD GATE HWY ROAD RTRK STOK TRK TUNN TrigStations TRIG Unknown CRTR DEPR GORG VAL ValleysandDepressions INTL LAGN LAKE RES SWP WTRH WaterBodies BEND CNAL DRN GLCR RCH STRM WRFL WaterCourses CNTY CONT DI PRSH STAT

slide-17
SLIDE 17

Eurogeonames & INSPIRE

EuroGeoNames

  • From 2009 EuroGeoNames (EGN) has been combining geographic names from 17 National Mapping and Cadastral Agencies across Europe to create a unique European‐centric gazetteer service and data set [insert PGZ ref]. The updated EGN service provides features as (http://www.eurogeographics.org/eurogeonames)
  • The EGN feature classification was originally developed distinctly for the purposes of this European gazetteer service, as the existing pre‐defined European catalogues, or nationalised thesauri, had been considered to be unsatisfactory for the purposes of a harmonised data model across the jurisdictions (Zaccheddu & Afflerbach, 2009; Zaccheddu & Sievers, 2005).

slide-18
SLIDE 18

Eurogeonames & INSPIRE

  • EuroGeoNames consists of 8 categories and 27 feature types.
  • The feature types have been translated into about 30 languages, comprising the “official” and the minority languages of each participating country, e.g. Germany translated the feature types into German, (Upper) Sorbian and (Western) Frisian.
  • The designations of the feature type classifications were translated, rather than the data content itself, i.e. the mapping of the content was done mostly on a feature type basis.
  • The EGN feature type designations were derived in English (and mostly displayed as such on the portal) and translated from English into the defined EGN languages, e.g. in Germany into German, (Western) Frisian and (Upper) Sorbian.

slide-19
SLIDE 19

Identifier Architecture

[Figure: SIRF identifier architecture. Example identifier: http://id.unsdis/id/catchment/567, with basic properties, provenance and representations (URLs for spatial data access, observation data archive access, live data access, and a virtual data product). Labelled components: Sensor Web (image OGC 2006); Spatial Data Infrastructure (WFS, WCS, spatial databases, services); Linked Data Web; Observation Archive (Data Warehouse), with transactions, data marts and services; Application Reports; Computational Models; SIRF.]

slide-20
SLIDE 20

Feature Types and SIRF

  • There are demonstrable use‐case scenarios associated with developing feature type classifications to which finer‐level feature types are mapped (Beard, 2011).
  • A common feature type list should be approached through the development of one or more high‐level classification schemes that can be implemented reliably in a multilingual, multi‐script format.
  • These schemes act as the target for classifying detailed feature types in use in different source gazetteers.
  • Additional levels of detail could be added as processes, interests and capabilities emerge to allow standardisation of a finer level of detail.
  • SIRF publishes all source and target vocabularies as online resources.
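
The idea of mapping fine‐level source types onto a high‐level target scheme can be sketched as a simple crosswalk table. The example below is not SIRF's actual data model. The target category names, the vocabulary labels `geonames` and `ga`, and the individual mappings (including reading the Australian code SCHL as a school) are assumptions chosen for illustration; only the source codes themselves appear in the slides.

```python
# (source vocabulary, source feature code) -> target high-level category.
# Target category names and all mappings are hypothetical.
CROSSWALK = {
    ("geonames", "SCH"): "educational facility",
    ("geonames", "SCHA"): "educational facility",
    ("geonames", "UNIV"): "educational facility",
    ("ga", "SCHL"): "educational facility",   # assumed to mean "school"
    ("ga", "LOCB"): "populated place",
    ("ga", "POPL"): "populated place",
}


def to_target(source, code, crosswalk=CROSSWALK, default="unclassified"):
    """Classify a source feature type against the target scheme."""
    return crosswalk.get((source, code), default)


def sources_for(target, crosswalk=CROSSWALK):
    """List every (vocabulary, code) pair mapped to a target category."""
    return sorted(key for key, value in crosswalk.items() if value == target)
```

Keying on the pair (vocabulary, code) rather than the code alone matters because the same code can mean different things in different gazetteers: SCHL is ‘language school’ in Geonames.org, which is why it is left unmapped here under `geonames` and falls through to the default.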

slide-21
SLIDE 21

slide-22
SLIDE 22

Feature Types and SIRF

  • The requirement within SIRF will be threefold:

1. to support search narrowing
2. to be fine‐grained enough to allow for disambiguation of similar feature names
3. to provide users with access to the original data sources to allow for validation and direct querying of the original feature type categories and descriptions

slide-23
SLIDE 23

Particular work to be focused on in the immediate future includes:

  • Work with UNGEGN on definition of feature categories
  • Methods for registering and subscribing existing gazetteers and ontologies to SIRF
  • Definitions of similarities between sets and instances (i.e. same‐as or similar‐to)
  • Scope for crowd‐sourcing to determine similarities across heterogeneous systems and the strength of these similarities (i.e. what is the minimum number of similar connections required?)

slide-24
SLIDE 24

Thank you

For more information Rob.atkinson@csiro.au
