Yannis Tzitzikas et al., MTSR 2013, Thessaloniki

Integrating Heterogeneous and Distributed Information about Marine Species through a Top Level Ontology

Y. Tzitzikas 1,2, C. Alloca 1, C. Bekiari 1, Y. Marketakis 1, P. Fafalios 1,2, M. Doerr 1, N. Minadakis 1, T. Patkos 1, L. Candela 3

1 Institute of Computer Science, FORTH-ICS
2 Computer Science Department, University of Crete, GREECE
3 Consiglio Nazionale delle Ricerche, CNR-ISTI, Pisa, Italy

7th Metadata and Semantics Research Conference (MTSR), Thessaloniki, Nov 19-22, 2013


Outline

  • Context, Problem, Objectives
  • Main Approaches for Integration
  • The Followed Approach
    – The Ontology MarineTLO
      • Objectives, Benefits, Architecture
    – The MarineTLO-based Warehouse
  • Exploitation Scenarios
  • Concluding Remarks


Context: iMarine

Id: It is an FP7 Research Infrastructure Project (2011-2014)

Final goal: launch an initiative aimed at establishing and operating an e-infrastructure supporting the principles of the Ecosystem Approach to fisheries management and conservation of marine living resources.

Partners:

Problem and objectives

The Problem

  • There are several sources of information about the marine domain, but each of them stores complementary information structured according to its own needs.

Our objective

  • Harmonize and integrate (link, connect) information of the marine domain
    – Specific motivating scenarios and use cases will be given at the end

Marine Information: in several sources

  • WoRMS: World Register of Marine Species. Registers more than 200K species.
  • ECOSCOPE: A Knowledge Base About Marine Ecosystems (IRD, France)
  • FLOD (Fisheries Linked Data) of the Food and Agriculture Organization (FAO) of the United Nations
  • FishBase: Probably the largest and most extensively accessed online database of fish species.
  • DBpedia

Marine Information: in several sources, storing complementary information

  • WoRMS: taxonomic information
  • ECOSCOPE: ecosystem information (e.g. which fish eats which fish)
  • FLOD: commercial codes
  • FishBase: general information, occurrence data, including information from other sources
  • DBpedia: general information, figures

Marine Information: in several sources, using and accessed through different technologies

  • WoRMS: Web services (SOAP/WSDL)
  • ECOSCOPE: RDF + OWL files
  • FLOD: SPARQL Endpoint
  • FishBase: Relational Database
  • DBpedia: SPARQL Endpoint

Main approaches for Integration

In general there are two main approaches for integration.

Warehouse approach (materialized integration)

  • Design Phase: The underlying sources (and their parts) have to be selected
  • Creation Phase: Process for getting the data and creating the warehouse
  • Maintenance Phase: Ability to create the warehouse from scratch, and/or ability to update parts of it
  • Mappings are exploited to extract information from data sources, to transform it to the target model and then to store it at the central repository

Mediator approach (virtual integration)

  • The mediator receives a query formulated in terms of the unified model/schema. The mappings are used to enable query translation. The derived sub-queries are sent to the wrappers of the individual sources, which transform them into queries over the underlying sources. The results of these sub-queries are sent back to the mediator where they are assembled to form the final answer.
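The difference between the two approaches can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the two toy sources, their field names, the property names `hasCode` and `preysOn`, and the placeholder values are hypothetical and do not come from the actual sources or from MarineTLO.

```python
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]
Record = Dict[str, str]

# Two toy sources, each with its own local structure (hypothetical data).
SOURCE_CODES: List[Record] = [{"sci_name": "species-1", "code": "C1"}]
SOURCE_ECO: List[Record] = [{"species": "species-1", "eats": "species-2"}]


def map_codes(rec: Record) -> List[Triple]:
    """Mapping from the first source to the unified model (assumed names)."""
    return [(rec["sci_name"], "hasCode", rec["code"])]


def map_eco(rec: Record) -> List[Triple]:
    """Mapping from the second source to the unified model (assumed names)."""
    return [(rec["species"], "preysOn", rec["eats"])]


MAPPINGS: List[Tuple[List[Record], Callable[[Record], List[Triple]]]] = [
    (SOURCE_CODES, map_codes),
    (SOURCE_ECO, map_eco),
]


def build_warehouse() -> List[Triple]:
    """Materialized integration: extract, transform and store once."""
    store: List[Triple] = []
    for source, mapping in MAPPINGS:
        for rec in source:
            store.extend(mapping(rec))
    return store


def mediator_query(subject: str) -> List[Triple]:
    """Virtual integration: the sources are consulted at query time."""
    answer: List[Triple] = []
    for source, mapping in MAPPINGS:
        for rec in source:  # stands in for a sub-query sent to a wrapper
            answer.extend(t for t in mapping(rec) if t[0] == subject)
    return answer


warehouse = build_warehouse()
# A source changes after the warehouse has been created.
SOURCE_CODES.append({"sci_name": "species-1", "code": "C1-alt"})

from_warehouse = [t for t in warehouse if t[0] == "species-1"]
from_mediator = mediator_query("species-1")
print(len(from_warehouse), len(from_mediator))
```

The warehouse answers from its stored copy and misses the new record until it is rebuilt, while the mediator reflects the update immediately at the price of contacting every source for every query.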


Main approaches for integration (cont.)

Warehouse

  • Benefit: Flexibility in transformation logic (including ability to curate and fix problems)
  • Benefit: Decoupling of the release management of the integrated resource from the management cycles of the underlying sources
  • Benefit: Decoupling of access load from the underlying sources.
  • Benefit: Faster responses (in query answering but also in other tasks, e.g. if one wants to use it for applying an entity matching technique).
  • Shortcomings: You have to pay the cost for hosting the warehouse. You have to refresh the warehouse periodically.

Mediator

  • Benefit: One advantage (but in some cases disadvantage) of virtual integration is the real-time reflection of source updates in integrated access
  • Comment: The higher complexity of the system (and the quality of service demands on the sources) is only justified if immediate access to updates is indeed required.

Main approaches for integration (cont.)

In both cases we need a unified model/schema


The ontology MarineTLO

(Marine Top Level Ontology)


MarineTLO: Objectives

  • MarineTLO aims at being a global core model that
    – provides a common, agreed-upon understanding of the concepts and relationships holding in the marine domain to enable knowledge sharing, information exchange and integration between heterogeneous sources,
    – covers with suitable abstractions the marine domain to enable the most fundamental queries,
    – can be extended to any level of detail on demand, and
    – allows data originating from distinct sources to be adequately mapped and integrated
  • MarineTLO is not supposed to be the single ontology covering the entirety of what exists

MarineTLO: Benefits from a Top-Level Ontology

  • The adoption of a global core model has various benefits:
    – reduced effort for improving and evolving
      • the focus is given on one model, rather than many (the results are beneficial for the entire community)
    – reduced effort for constructing mappings
      • this approach avoids the inevitable combinatorial explosion and complexities that result from pair-wise mappings between individual metadata formats and/or ontologies
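As a small worked example of the mapping argument, the following Python sketch counts the mappings needed for the five sources named earlier in the talk. It only illustrates the arithmetic: with n sources, pair-wise integration needs n(n-1)/2 mappings (or n(n-1) if each direction needs its own mapping), while a core model needs n.

```python
from itertools import combinations, permutations

sources = ["WoRMS", "ECOSCOPE", "FLOD", "FishBase", "DBpedia"]

# One mapping per unordered pair of sources.
pairwise = list(combinations(sources, 2))
# One mapping per ordered pair, if mappings are not reusable in both directions.
directed = list(permutations(sources, 2))
# One mapping per source when everything is mapped to a single core model.
via_core = [(source, "MarineTLO") for source in sources]

print(len(pairwise), len(directed), len(via_core))  # 10 20 5
```

Adding a sixth source would require five new pair-wise mappings but only one new mapping to the core model.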


MarineTLO: Key Design Principles

  • Formulation
    – It is an object-oriented semantic model, expressed in a form comprehensible to both documentation experts and information scientists, while it can readily be converted to machine-readable formats such as RDF Schema, OWL, etc.
  • Metaclasses
    – certain types of inference about classes are supported, in a way analogous to how classes support certain types of inference about instances
  • Monotonicity
    – It aims to be monotonic in the sense of Domain Theory: the existing constructs and the deductions made from them should remain valid and well-formed, even as new constructs are added to MarineTLO

MarineTLO: Query capabilities

It allows formulating complex queries, e.g.:

1. Given the scientific name of a species, find its predators with the related taxon-rank classification and with the different codes that the organizations use to refer to them.
2. Given the scientific name of a species, find the ecosystems, water areas and countries that this species is native to, and the common names that are used for this species in each of the countries.

The notion of competence queries as driver

For a scientific name of a species (e.g. Thunnus Albacares or Poromitra Crassiceps), find/give me:

Q1  the biological environments (e.g. ecosystems) in which the species has been introduced and more general descriptive information of it (such as the country)
Q2  its common names and their complementary info (e.g. languages and countries where they are used)
Q3  the water areas and their FAO codes in which the species is native
Q4  the countries in which the species lives
Q5  the water areas and the FAO portioning code associated with a country
Q6  the presentation w.r.t. Country, Ecosystem, Water Area and Exclusive Economic Zone (of the water area)
Q7  the projection w.r.t. Ecosystem and Competitor, providing for each competitor the identification information (e.g. several codes provided by different organizations)
Q8  a map w.r.t. Country and Predator, providing for each predator both the identification information and the biological classification
Q9  who discovered it, in which year, the biological classification, the identification information, the common names, providing for each common name the language and the countries where it is used

MarineTLO as Product

  • The “full” version of MarineTLO (Version 3.0.0)
    – aims at covering any part of the marine domain
    – contains 70 classes and 41 properties
  • The “operational” version, for the needs of iMarine (Version 3.0.0)
    – used for building the MarineTLO Warehouse (Version 3.0.0)
    – contains 92 classes and 41 properties
    – applied for integrating data mainly from FLOD, ECOSCOPE, part of WoRMS and FishBase sources
  • URL: www.ics.forth.gr/isl/MarineTLO

Class Level (excerpt, Version 3.0.0)

[Figure: MarineTLO class hierarchy. Recoverable class labels: TLO Entity, Temporal Phenomenon, Persistent Item, Actor, Physical Man Made Thing, Man Made Thing, Conceptual Object, Physical Thing, Event, Exclusive Economic Zone, Codification System, Identifier, EEZCode, FAOGearTypeIdentifier, FAOVesselTypeIdentifier, Man Made Object, Vessel, Water Area, Area, Sub Area, Division, Sub Division, Ecosystem, Human Activity, Attribute Assignment, Country Code Assignment, Ecosystem Code Assignment, Scientific Name Assignment, Common Name Assignment, Water Area Code Assignment, Country.]

Meta Class Level (excerpt, Version 3.0.0)

[Figure: MarineTLO metaclass hierarchy. Recoverable metaclass labels: TLO Entity Type, Temporal Phenomenon Type, Persistent Item Type, Actor Type, Digital Object Type, Conceptual Object Type, Identifier Type, Physical Thing Type, Event Type, Equipment Type, Gear Type, Vessel Type, Ecosystem Type, Human Activity Type, Marine Ecosystem Type, Attribute Assignment Type, Biotic Element Type, Marine Animal Type, FishBase Marine Animal Type, DBpedia Marine Animal Type, WoRMS Marine Animal Type, FLOD Marine Animal Type, ECOSCOPE Marine Animal Type.]

Example 1: ThunnusAlbacares

Example 2: Scientific name assignment

[Figure: instance diagram. Recoverable labels: classes MarineSpecies, Actor, Attribute Assignment, PersistentItem, Scientific Name Assignment, Event; nodes Thunnus_albacares, blank_node_Thunnus_albacares, blank_node_Bonnaterre; properties relatedIdentifierAssigment, relatedAuthorshipAssigment, assignedName, assignedIdentifier, assignedDate, name, reference; literals “Thunnus Albacares”, “Bonnaterre”, “1788”.]
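The idea of modelling a scientific name assignment as a node of its own can be sketched in Python with plain triples. The property names, node labels and literals are taken from the slide, but the way they are connected here (which node points to which) is an assumption, since the diagram's edges are not available in this text. The helper functions are hypothetical and are not part of MarineTLO or of any library.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Assumed wiring of the labels that appear on the slide.
TRIPLES: List[Triple] = [
    ("Thunnus_albacares", "rdf:type", "MarineSpecies"),
    ("Thunnus_albacares", "relatedIdentifierAssigment", "blank_node_Thunnus_albacares"),
    ("blank_node_Thunnus_albacares", "rdf:type", "ScientificNameAssignment"),
    ("blank_node_Thunnus_albacares", "assignedName", "Thunnus Albacares"),
    ("blank_node_Thunnus_albacares", "assignedDate", "1788"),
    ("blank_node_Thunnus_albacares", "relatedAuthorshipAssigment", "blank_node_Bonnaterre"),
    ("blank_node_Bonnaterre", "rdf:type", "Actor"),
    ("blank_node_Bonnaterre", "name", "Bonnaterre"),
]


def objects(subject: str, predicate: str) -> List[str]:
    """Return all objects of the triples matching (subject, predicate, ?)."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def naming_info(species: str) -> List[Tuple[str, str, List[str]]]:
    """Follow species -> assignment -> (assigned name, date, author names)."""
    results = []
    for assignment in objects(species, "relatedIdentifierAssigment"):
        authors = [
            author_name
            for actor in objects(assignment, "relatedAuthorshipAssigment")
            for author_name in objects(actor, "name")
        ]
        for assigned_name in objects(assignment, "assignedName"):
            for date in objects(assignment, "assignedDate"):
                results.append((assigned_name, date, authors))
    return results


print(naming_info("Thunnus_albacares"))
```

Because the assignment is a separate node, the date and the author are attached to the act of naming rather than to the species itself, which is what a question such as "who discovered it, in which year" (Q9) needs.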

Example 3: Species Establishment

[Figure: instance diagram. Recoverable labels: classes Ecosystem, Country, Water Area, Marine Species; instances Antarctic, Elephant I, Atlantic Antarctic, Poromitra crassiceps; properties isAssociatedWith and usuallyIsBioticElementOf, the latter annotated with native / Introduced / Endemic.]

Exploiting MarineTLO


Ways to use/exploit MarineTLO

1. For constructing semantic warehouses which:
    – can answer queries which cannot be answered by the underlying sources individually
    – can aid the construction of mappings between instances
    – can be exploited for various other tasks
      • We shall see how they are exploited in the context of semantic post-processing of search results
2. Various other uses
    – For publishing Linked Data
    – For mashing up facts

[Figure: uses of MarineTLO: constructing warehouses offering complex query answering; semantic post-processing of search results; publishing Linked Data, mashups.]

The MarineTLO-based Warehouse

MarineTLO Warehouse

Warehouse construction and evolution process

1. Define requirements in terms of competence queries
2. Fetch the data from the selected sources (SPARQL endpoints, services, etc.)
3. Transform and ingest to the Warehouse
4. Inspect the connectivity of the Warehouse
5. Formulate rules creating sameAs relationships
6. Apply the rules to the Warehouse
7. Ingest the sameAs relationships to the Warehouse
8. Test and evaluate the Warehouse (using competence queries)

[Figure: process diagram. Artifacts shown: Queries, Triples, Rules for Instance Matching, sameAs triples, Warehouse. The label MaTWare appears three times, attached to "uses" arrows.]
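The steps about sameAs relationships can be sketched in Python as follows. The rule used here (two entities are the same if their scientific names are equal after lower-casing and whitespace normalization) is an assumption chosen for illustration; the slides do not state which rules were actually used. The URIs and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def normalize(name: str) -> str:
    """Lower-case and collapse whitespace so trivially different spellings match."""
    return " ".join(name.lower().split())


def same_as_triples(entities: List[Tuple[str, str]]) -> List[Triple]:
    """Apply the (assumed) rule: equal normalized scientific name => owl:sameAs."""
    by_name: Dict[str, List[str]] = defaultdict(list)
    for uri, scientific_name in entities:
        by_name[normalize(scientific_name)].append(uri)

    triples: List[Triple] = []
    for uris in by_name.values():
        ordered = sorted(uris)
        for i in range(len(ordered)):
            for j in range(i + 1, len(ordered)):
                triples.append((ordered[i], "owl:sameAs", ordered[j]))
    return triples


# Hypothetical entities fetched from three different sources.
entities = [
    ("http://example.org/flod/thunnus_albacares", "Thunnus albacares"),
    ("http://example.org/dbpedia/Yellowfin_tuna", "Thunnus  Albacares"),
    ("http://example.org/worms/poromitra_crassiceps", "Poromitra crassiceps"),
]

links = same_as_triples(entities)
print(links)
```

The resulting triples are what would be ingested into the warehouse in the following step, after which the competence queries can be run again to check whether connectivity improved.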


The MarineTLO-based warehouse’s contents: used sources

[Figure: an RDF triple store holding MarineTLO and replicated data from FLOD, ECOSCOPE, WoRMS (part of), DBpedia (part of) and FishBase (part of), each connected through its own mapping: FLOD-to-TLO, ECOSCOPE-to-TLO, WoRMS-to-TLO, DBpedia-to-TLO and FishBase-to-TLO.]

The MarineTLO-based warehouse’s contents: in numbers

Source      Species Number
DBpedia     14,291
FLOD        10,849
WoRMS        1,124
Ecoscope       277
FishBase    31,277

Common Species (size of intersections)

            FLOD     WoRMS    Ecoscope   Fishbase
DBpedia     3,046    731      56         9,833
FLOD                 768      73         6,141
WoRMS                         53         1,288
Ecoscope                                 53

  • Now contains information about 37,000 distinct marine species (including Fishbase). Number of triples: 2,970,058
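How such intersection sizes and the number of distinct species can be derived is shown in the following Python sketch. It uses three tiny placeholder sets, not the real data, and it assumes that species from different sources have already been matched to a common identifier.

```python
from itertools import combinations

# Placeholder identifiers per source (hypothetical, already matched across sources).
species_by_source = {
    "SourceA": {"sp1", "sp2", "sp3"},
    "SourceB": {"sp2", "sp3", "sp4"},
    "SourceC": {"sp3", "sp5"},
}

# Size of the intersection for every unordered pair of sources.
common = {
    (a, b): len(species_by_source[a] & species_by_source[b])
    for a, b in combinations(species_by_source, 2)
}

# Number of distinct species over all sources.
distinct = len(set().union(*species_by_source.values()))

print(common)
print(distinct)
```

Only the upper triangle of the table is needed because the intersection is symmetric.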


The MarineTLO-based warehouse’s contents: concepts

[Table: concepts covered by each source. Columns: Ecoscope, FLOD, WoRMS, DBpedia, Fishbase. Rows: Species, Scientific Names, Authorships, Common Names, Predators, Ecosystems, Countries, Water Areas, Vessels, Gears, EEZ.]

Exploiting the MarineTLO-based Warehouse for Semantic Post-Processing of Search Results


For Semantic Post-Processing: The process

[Figure: processing pipeline. Inputs: query terms leading to the top-L results (+ metadata), or contents from web browsing. Stages: Entity Mining (entities / contents), Semantic Analysis over the semantic data of the MarineTLO Warehouse (grouping, ranking, retrieving more properties), and Visualization/Interaction (faceted search, entity exploration, annotation, top-k graphs, etc.).]
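The pipeline above can be approximated by a toy Python sketch. It is an assumption-laden simplification: entity mining is reduced to a dictionary lookup over result snippets, the dictionary of known entities is hypothetical (in the real setting it would be backed by the warehouse), and ranking is simply by number of occurrences.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical entity dictionary: lower-cased name -> entity type.
KNOWN_ENTITIES: Dict[str, str] = {
    "thunnus albacares": "Species",
    "poromitra crassiceps": "Species",
    "greece": "Country",
}


def mine_entities(snippet: str) -> List[Tuple[str, str]]:
    """Entity mining: find known entity names inside one result snippet."""
    text = snippet.lower()
    return [(name, etype) for name, etype in KNOWN_ENTITIES.items() if name in text]


def post_process(snippets: List[str]) -> Dict[str, List[Tuple[str, int]]]:
    """Group the mined entities by type and rank them by frequency."""
    counts: Counter = Counter()
    for snippet in snippets:
        counts.update(mine_entities(snippet))

    grouped: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for (name, etype), n in counts.most_common():
        grouped[etype].append((name, n))
    return dict(grouped)


snippets = [
    "Thunnus albacares is caught near Greece",
    "Thunnus albacares (yellowfin tuna)",
    "Poromitra crassiceps lives in deep water",
]
facets = post_process(snippets)
print(facets)
```

Each group corresponds to one facet of the result set; retrieving more properties for a selected entity would then be a lookup in the warehouse.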

XSearch-Portlet Screenshot

[Screenshot: search results, the result of entity mining and the result of textual clustering; callouts mark where the Warehouse is used.]

Example of an EntityCard of XSearch (if the entity’s type is Species)

[Screenshot: an entity card combining information from FLOD, DBpedia, Ecoscope and WoRMS; a callout marks where the Warehouse is used.]

XSearch as a bookmarklet

[Screenshot: annotating entities over the original page, and entity exploration; a callout marks where the Warehouse is used.]

Concluding Remarks

  • To tackle the need for having integrated sets of facts about marine species, and thus to assist research about species and biodiversity, we have described a top level ontology for that domain.
    – It provides a unified and coherent core model for schema mapping which enables formulating and answering queries which cannot be answered by any individual source.
  • We detailed the process of constructing MarineTLO-based warehouses. The current warehouse contains information about more than 37K marine species.
  • We have identified and described particular use cases and applications that exploit this ontology and its warehouse.

Future Work and Research

  • Next steps

– Finalize and make accessible the next release of the warehouse (in 2013)

  • Current and Future Research

– Focus on quality/connectivity issues


Links

  • MarineTLO
    – http://www.ics.forth.gr/isl/MarineTLO/
  • TripleStores
    – MarineTLO-Warehouse: http://virtuoso.i-marine.d4science.org:8890/sparql
    – also browsable through http://virtuoso.i-marine.d4science.org:8890/fct
  • Systems
    – X-Search and gCube Search
      • Portlet: https://i-marine.d4science.org/ (in various VREs, e.g. FCPPS, iSearch)
      • Web Applications:
        – http://62.217.127.118/x-search/ (over Bing and MarineTLO-Warehouse)
        – http://62.217.127.118/x-search-fao/ (over ECOSCOPE and MarineTLO-Warehouse)
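For readers who want to try the SPARQL endpoint listed above, the following Python sketch builds a request URL according to the SPARQL Protocol, which passes the query text in the `query` parameter of an HTTP GET. The sketch only constructs the URL and does not send it, since the endpoint may no longer be reachable. The generic query is a placeholder and does not use MarineTLO vocabulary; the function name is hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "http://virtuoso.i-marine.d4science.org:8890/sparql"

# Placeholder query: any ten triples, no assumptions about the vocabulary.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"


def sparql_get_url(endpoint: str, query: str) -> str:
    """Build a SPARQL Protocol GET URL with the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = sparql_get_url(ENDPOINT, QUERY)
print(url)

# Round-trip check: decoding the URL gives back the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
```

The URL could then be fetched with any HTTP client; the result format is usually selected through the `Accept` header, for example `application/sparql-results+json`.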


Thank you for your attention


Visit and send us feedback:

www.ics.forth.gr/isl/MarineTLO