
Adding Biodiversity Datasets from Argentinian Patagonia to the Web of Data



  1. Adding Biodiversity Datasets from Argentinian Patagonia to the Web of Data
    S4BioDiv 2017 – 2nd International Workshop on Semantics for Biodiversity
    Marcos Zárate, CENPAT - CONICET
    Germán Braun, GILIA - UNCOMA
    Pablo Fillottrani, DCIC - UNS

  2. Motivation
    • A steadily growing wealth of biodiversity data from a wide range of disciplines is currently available from online information systems around the world.
    • The biodiversity community has standardized shared vocabularies such as Darwin Core (DwC), together with publishing platforms such as the Integrated Publishing Toolkit (IPT).

  3. Motivation
    • Since 2011, CENPAT has publicly shared its biodiversity data under an Open Data license.
    • These data are available as Darwin Core Archives (DwC-A) through IPT (http://ipt.cenpat-conicet.gob.ar:8081/).
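A DwC-A is a zip file. Its meta.xml descriptor names the core data file, the file's delimiter, and the Darwin Core term for each column. The Python sketch below reads the core rows of such an archive into dictionaries, which is roughly the tabular view a tool like OpenRefine starts from. It uses only the standard library and follows the Darwin Core text guidelines for meta.xml. It ignores extension files and constant-valued (default) fields. The function name read_dwca_core is hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET
import zipfile

# Namespace used by meta.xml in the Darwin Core text guidelines.
NS = "{http://rs.tdwg.org/dwc/text/}"


def read_dwca_core(archive: zipfile.ZipFile) -> list[dict[str, str]]:
    """Return the core rows of a DwC-A as dicts keyed by short term name (sketch)."""
    core = ET.fromstring(archive.read("meta.xml")).find(f"{NS}core")
    location = core.find(f"{NS}files/{NS}location").text.strip()
    # meta.xml escapes the delimiter: a tab is written as the two characters "\t".
    delimiter = core.get("fieldsTerminatedBy", ",").replace("\\t", "\t")
    enclosed = core.get("fieldsEnclosedBy", '"')
    skip = int(core.get("ignoreHeaderLines", "0"))
    # Column index -> term name, e.g. ".../dwc/terms/scientificName" -> "scientificName".
    # Fields without an index (constant defaults) are skipped in this sketch.
    columns = {
        int(f.get("index")): f.get("term").rsplit("/", 1)[-1]
        for f in core.findall(f"{NS}field")
        if f.get("index") is not None
    }
    text = archive.read(location).decode(core.get("encoding", "UTF-8"))
    if enclosed:
        reader = csv.reader(io.StringIO(text), delimiter=delimiter, quotechar=enclosed)
    else:  # an empty fieldsEnclosedBy means fields are never quoted
        reader = csv.reader(io.StringIO(text), delimiter=delimiter, quoting=csv.QUOTE_NONE)
    rows = []
    for record_no, cells in enumerate(reader):
        if record_no < skip or not cells:
            continue  # skip header lines and blank lines
        rows.append({name: cells[i] for i, name in columns.items() if i < len(cells)})
    return rows
```

For example, `with zipfile.ZipFile("dwca-resource.zip") as zf: rows = read_dwca_core(zf)` would load an archive downloaded from an IPT resource page. The file name is illustrative.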

  4. The Problem
    • The IPT platform focuses on publishing content in unstructured or semi-structured formats. This limits the possibilities to interoperate with other datasets and to make the data accessible to machines.
    • Although DwC is defined in an RDF document, the integration of biodiversity data into the Semantic Web (SW) is still in its early stages.

  5. Proposed Solution
    • We present a transformation process that publishes biodiversity data as RDF datasets.
    • This process uses OpenRefine and RDF Refine to generate RDF triples and define URIs.
    • We use GraphDB to store, browse, and access the data, and to link it with external RDF datasets.
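The RDF-generation step can be pictured as mapping each column of a DwC record to a property of that record's resource. The sketch below is a plain-Python illustration of this idea, not the RDF Refine configuration the authors used. It assumes every column name is a Darwin Core term (for example, scientificName becomes dwc:scientificName) and that values are untyped string literals. It emits N-Triples lines. The helpers nt_literal and row_to_ntriples are hypothetical names.

```python
# Darwin Core terms namespace.
DWC = "http://rs.tdwg.org/dwc/terms/"


def nt_literal(value: str) -> str:
    """Serialize a string as an N-Triples plain literal, escaping special characters."""
    escaped = (value.replace("\\", "\\\\")
                    .replace('"', '\\"')
                    .replace("\n", "\\n")
                    .replace("\r", "\\r"))
    return f'"{escaped}"'


def row_to_ntriples(subject_uri: str, row: dict[str, str]) -> list[str]:
    """Emit one <subject> dwc:<column> "value" triple per non-empty column (sketch)."""
    return [
        f"<{subject_uri}> <{DWC}{term}> {nt_literal(value)} ."
        for term, value in row.items()
        if value.strip()  # blank cells produce no triple
    ]
```

Calling row_to_ntriples with a subject URI and a row such as {"scientificName": "Mirounga leonina", "year": ""} yields a single dwc:scientificName triple, because empty cells are skipped. A fuller mapping would likely type coordinates and dates (for example, as xsd:decimal). It would also link occurrences to taxon resources rather than repeating names as literals.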

  6. Proposed Architecture

  7. URI Definition
    • To generate a URI for each resource, we use GREL (General Refine Expression Language), which is also provided by OpenRefine.
    • The general structure of the URIs is:
      ▫ http://[base uri]/[DwC class]/[value]
    • The resulting RDF triple for an occurrence is:
      ▫ SUBJECT <base_uri/occurrence/f6bbf85d-85ea-4605-87fad81aca73a1cd>
      ▫ PREDICATE rdf:type
      ▫ OBJECT dwc:Occurrence
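The URI pattern can be expressed as a small function. The Python sketch below mirrors http://[base uri]/[DwC class]/[value]; it is not the authors' GREL expression. Three points are assumptions: the base URI http://example.org/cenpat is a placeholder, the lower-cased class segment follows the slide's "occurrence" example, and percent-encoding of the value is an addition of this sketch. The names make_uri and type_triple are hypothetical.

```python
from urllib.parse import quote

# Hypothetical base URI: the slide only shows the placeholder [base uri].
BASE_URI = "http://example.org/cenpat"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DWC = "http://rs.tdwg.org/dwc/terms/"


def make_uri(base: str, dwc_class: str, value: str) -> str:
    """Build http://[base uri]/[DwC class]/[value].

    The class segment is lower-cased as in the slide's example, and the
    value is percent-encoded so spaces and slashes cannot break the URI.
    """
    return f"{base.rstrip('/')}/{dwc_class.lower()}/{quote(value.strip(), safe='')}"


def type_triple(base: str, dwc_class: str, value: str) -> str:
    """N-Triples line stating that the generated resource is a dwc:<class>."""
    return f"<{make_uri(base, dwc_class, value)}> <{RDF_TYPE}> <{DWC}{dwc_class}> ."
```

With these definitions, type_triple(BASE_URI, "Occurrence", "f6bbf85d-85ea-4605-87fad81aca73a1cd") returns the slide's rdf:type triple under the placeholder base URI. Percent-encoding keeps the URI valid when a value contains spaces, as scientific names do: "Mirounga leonina" becomes Mirounga%20leonina.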

  8. Exploitation: Conservation Status of Species
    • Information about the conservation status of species is missing from the CENPAT datasets.

  9. Exploitation: Occurrences by Year
    • A SPARQL query shows how the occurrences are distributed over time. Its results are visualized with R and the ggplot2 package.
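The query itself is not reproduced in this transcript. The Python below is a sketch under stated assumptions. It counts occurrences per dwc:year value, assuming each occurrence carries that property. It then builds a standard SPARQL-protocol GET request for the endpoint listed on the "Links of interest" slide. Finally, it turns the JSON results into a {year: count} mapping that a plotting step (ggplot2 in the original) could consume. YEAR_QUERY, sparql_request, and counts_by_year are hypothetical names.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# SPARQL endpoint listed on the "Links of interest" slide (GraphDB repository).
ENDPOINT = "http://crowd.fi.uncoma.edu.ar:3333/repositories/BIO_CNP_GILIA"

# Assumption: every occurrence has a dwc:year literal. Not the authors' original query.
YEAR_QUERY = """PREFIX dwc: <http://rs.tdwg.org/dwc/terms/>
SELECT ?year (COUNT(?occ) AS ?n)
WHERE { ?occ a dwc:Occurrence ; dwc:year ?year . }
GROUP BY ?year
ORDER BY ?year"""


def sparql_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) a SPARQL-protocol GET request asking for JSON results."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def counts_by_year(results_json: str) -> dict[str, int]:
    """Convert SPARQL JSON results with ?year and ?n bindings into {year: count}."""
    bindings = json.loads(results_json)["results"]["bindings"]
    return {b["year"]["value"]: int(b["n"]["value"]) for b in bindings}
```

The request is only constructed here. Sending it would look like `urllib.request.urlopen(sparql_request(ENDPOINT, YEAR_QUERY))`, followed by `counts_by_year(response.read().decode("utf-8"))` on the response.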

  10. Exploitation: Locations of Marine Mammals
    • A SPARQL query retrieves the locations (latitude and longitude) of the species Mirounga leonina. Its results are visualized with R and the ggmap package.
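A comparable sketch for the coordinates query follows. It assumes that occurrences store dwc:scientificName as an exact string literal and carry dwc:decimalLatitude and dwc:decimalLongitude directly. The published data may instead reach the name through a linked taxon resource. The parser keeps only pairs within valid latitude and longitude ranges, ready for a mapping step such as ggmap. The names locations_query and parse_locations are hypothetical.

```python
import json


def locations_query(species: str) -> str:
    """Build a SPARQL query for the coordinates of one species' occurrences.

    Assumption: the name is matched as an exact literal on each occurrence.
    """
    # Escape characters that would break a SPARQL string literal.
    name = species.replace("\\", "\\\\").replace('"', '\\"')
    return f"""PREFIX dwc: <http://rs.tdwg.org/dwc/terms/>
SELECT ?lat ?lon
WHERE {{
  ?occ a dwc:Occurrence ;
       dwc:scientificName "{name}" ;
       dwc:decimalLatitude ?lat ;
       dwc:decimalLongitude ?lon .
}}"""


def parse_locations(results_json: str) -> list[tuple[float, float]]:
    """Extract (latitude, longitude) pairs, dropping values outside valid ranges."""
    points = []
    for b in json.loads(results_json)["results"]["bindings"]:
        lat, lon = float(b["lat"]["value"]), float(b["lon"]["value"])
        if -90 <= lat <= 90 and -180 <= lon <= 180:
            points.append((lat, lon))
    return points
```

locations_query("Mirounga leonina") produces the query text, which can be sent with any SPARQL-protocol client. parse_locations then expects the standard SPARQL JSON results format.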

  11. Results
    • In this initial stage, only a few datasets were converted to RDF. Our platform stored 502.00 RDF triples.
    • So that users can exploit the dataset, we also defined several SPARQL queries and their corresponding visualizations using the statistical software R.

  12. Future work
    • We plan to automate some tasks of the process and to interlink with more datasets.
    • We aim to provide easier SPARQL access for non-expert users.
    • We are analyzing other ontologies such as ENVO, NCBI, and OWL-Time. We are also working on a suite of complementary ontologies for semantically describing every aspect of biodiversity.

  13. Links of interest
    • GitHub project
      ▫ https://github.com/cenpat-gilia/CENPAT-GILIA-LOD
    • SPARQL Endpoint
      ▫ http://crowd.fi.uncoma.edu.ar:3333/repositories/BIO_CNP_GILIA
    • R scripts
      ▫ https://github.com/cenpat-gilia/CENPAT-GILIA-LOD/tree/master/r-scripts

  14. Thank you for your attention
