Mauro Michielon ICT Development NTTS conference, Brussels, 12 th - - PowerPoint PPT Presentation



SLIDE 1

Web tools for accessing and disseminating data of different formats

Mauro Michielon

ICT Development

NTTS conference, Brussels, 12th March 2015

SLIDE 2

EEA’s context

The EEA has to deal with a vast number of very heterogeneous datasets coming from Member States, EU institutions, universities, research centres and the private sector.

This is a challenging task, and for a long time there has been a need for procedures that streamline incoming data in order to overcome:

  • inconsistent dataset layouts, which have led to data management difficulties
  • poor interoperability due to data format fragmentation
  • the constant need to re-adapt processing chains…
  • …or the need for constant manual intervention to normalize datasets
  • …and the resulting instability of layouts in data visualizations

Technological advancements based on Linked Open Data and web-based interactive visualization libraries are helping to improve the situation and to remove such obstacles for our stakeholders.

SLIDE 3

Data storage: Open Data, OpenLink Virtuoso

Linked Open Data approach, with a triple store based on: https://github.com/openlink/virtuoso-opensource

Data transformation to RDF and ingestion from reliable sources are automated.

Datasets are exposed and can be queried via the EEA’s public endpoint:

http://semantic.eea.europa.eu/sparql

Data coming from different sources can be linked through the use of common dictionaries.

  • Interoperability: easier machine-to-machine exchange of data with partner institutions

The endpoint processes requests, which must be written in the SPARQL language.

Output results can be returned in:

  • human-readable formats: HTML, CSV, TSV
  • machine-to-machine formats: JSON, XML, XML+schema
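
As a rough illustration of how such a request could be sent, the sketch below builds a SPARQL Protocol GET request and selects the output format through the standard Accept header. The helper names (build_request, run_query) and the media-type table are assumptions made for this example, and it has not been verified which of these formats the EEA endpoint actually honours.

```python
import json
import urllib.parse
import urllib.request

# Public endpoint named on the slide.
ENDPOINT = "http://semantic.eea.europa.eu/sparql"

# Common media types for SPARQL results. Whether the endpoint supports
# each of them is an assumption of this sketch.
RESULT_TYPES = {
    "json": "application/sparql-results+json",
    "xml": "application/sparql-results+xml",
    "csv": "text/csv",
    "tsv": "text/tab-separated-values",
}


def build_request(query, fmt="json", endpoint=ENDPOINT):
    """Build a GET request carrying the query and the desired result format."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": RESULT_TYPES[fmt]})


def run_query(query, fmt="json", timeout=30):
    """Send the query; return parsed JSON for 'json', raw text otherwise."""
    with urllib.request.urlopen(build_request(query, fmt), timeout=timeout) as resp:
        body = resp.read().decode("utf-8")
    return json.loads(body) if fmt == "json" else body
```

Calling run_query("SELECT * WHERE { ?s ?p ?o } LIMIT 5", fmt="csv") would then return the raw CSV text, provided the endpoint is reachable and accepts that media type.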

SLIDE 4

Data visualization: Daviz

Daviz: https://github.com/eea/eea.daviz (based on the Google Charts libraries)

A data visualization tool developed and used by the EEA to create interactive data visualizations.

Daviz can consume the output of a SPARQL query and visualize the results:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX sdmx-measure: <http://purl.org/linked-data/sdmx/2009/measure#>
PREFIX sdmx-dimension: <http://purl.org/linked-data/sdmx/2009/dimension#>
PREFIX property: <http://rdfdata.eionet.europa.eu/eurostat/property#>
PREFIX geo: <http://dd.eionet.europa.eu/vocabulary/eurostat/geo/>
PREFIX unit: <http://dd.eionet.europa.eu/vocabulary/eurostat/unit/>
PREFIX product: <http://dd.eionet.europa.eu/vocabulary/eurostat/product/>
PREFIX indic_nrg: <http://dd.eionet.europa.eu/vocabulary/eurostat/indic_nrg/>
PREFIX sdmx-attribute: <http://purl.org/linked-data/sdmx/2009/attribute#>

SELECT year(?date) as ?date ?product_label ( sum(?B_100900) - COALESCE(sum(?B_101600),0) ) as ?value
WHERE {
  {
    GRAPH <http://rdfdata.eionet.europa.eu/eurostat/data/nrg_100a.rdf.gz> {
      _:nrg_100a sdmx-dimension:refArea ?geo .
      FILTER (?geo = geo:EU28) .
      _:nrg_100a sdmx-attribute:unitMeasure unit:1000TOE .
      _:nrg_100a sdmx-dimension:timePeriod ?date .
      _:nrg_100a property:product ?product .
      FILTER (?product in (product:2000, product:3000, product:4000, product:5100, product:5500))
      {
        _:nrg_100a property:indic_nrg indic_nrg:B_100900 .
        _:nrg_100a sdmx-measure:obsValue ?B_100900 .
      } UNION {
        _:nrg_100a property:indic_nrg indic_nrg:B_101600 .
        _:nrg_100a sdmx-measure:obsValue ?B_101600 .
      }
    }
    ?product rdfs:label ?product_label .
  }
}
GROUP BY ?date ?product_label ?product
ORDER BY ?date ?product_label
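
To give an idea of what consuming the output of such a query involves, the sketch below flattens a response in the W3C SPARQL JSON results format into a header row plus data rows, the kind of table a charting library can plot. This is not Daviz's actual code: the function name to_rows is an assumption, and the sample values are placeholders rather than real EEA or Eurostat figures.

```python
import json

# Hypothetical response in the W3C SPARQL JSON results format.
# The values are placeholders, not real EEA or Eurostat figures.
SAMPLE = json.loads("""
{
  "head": {"vars": ["date", "product_label", "value"]},
  "results": {"bindings": [
    {"date": {"type": "literal", "value": "2001"},
     "product_label": {"type": "literal", "value": "Product A"},
     "value": {"type": "literal", "value": "10.5"}},
    {"date": {"type": "literal", "value": "2001"},
     "product_label": {"type": "literal", "value": "Product B"}}
  ]}
}
""")


def to_rows(result):
    """Flatten SPARQL JSON results into a header row plus one list per solution."""
    columns = result["head"]["vars"]
    rows = [columns]
    for binding in result["results"]["bindings"]:
        # An unbound variable is absent from the binding, so default to None.
        rows.append([binding.get(col, {}).get("value") for col in columns])
    return rows


for row in to_rows(SAMPLE):
    print(row)
```

Values arrive as strings in this format, so a real consumer would still need to convert the ?value column to numbers before plotting.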

The query is published: it serves as the document that exposes the methodology to the general public.

SLIDE 5

Key facts

By combining Linked Open Data techniques with the Daviz application in our web products (indicators, SOER 2015, etc.), we try to ensure:

  • Consistency: through reusable programmatic procedures for data processing
  • Transparency: methodologies and algorithms are made public, so the general public can perform QC/QA
  • Traceability: data harvesting processes and SPARQL query executions are time-stamped
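
The traceability point can be pictured with a small sketch that records when a query was executed, together with a fingerprint of its text. This is only one possible way to time-stamp executions, not a description of the EEA's implementation: the function name execution_record and the record fields are assumptions.

```python
import hashlib
from datetime import datetime, timezone


def execution_record(query, endpoint, executed_at=None):
    """Return a small audit record for one query execution (hypothetical layout)."""
    # Default to the current UTC time when no timestamp is supplied.
    executed_at = executed_at or datetime.now(timezone.utc)
    return {
        "endpoint": endpoint,
        # The hash identifies the exact query text that produced a result.
        "query_sha256": hashlib.sha256(query.encode("utf-8")).hexdigest(),
        "executed_at": executed_at.isoformat(),
    }


record = execution_record(
    "SELECT * WHERE { ?s ?p ?o } LIMIT 1",
    "http://semantic.eea.europa.eu/sparql",
)
print(record["executed_at"], record["query_sha256"][:12])
```

Storing the hash next to the timestamp lets a published figure be traced back to the exact query version that produced it.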

consistency + transparency + traceability = trust