Homer: a case study of federation among open data portals


slide-1
SLIDE 1

Homer: a case study of federation among open data portals

Nives Alciato - CSI Piemonte nives.alciato@csi.it

slide-2
SLIDE 2
slide-3
SLIDE 3
slide-4
SLIDE 4
The initiative of Piedmont Region

  • Regional law on Open Data
  • Guidelines for reuse
  • Adoption of a standard licence model
  • Creation of a working group
  • Diffusion to other Public Administrations
  • Reuse at national level
  • European Projects
  • Metadata catalogues
  • Data uploading platform
  • A portal as an access point for data and information

slide-5
SLIDE 5

Legal framework

Regional Law n. 24 dated 23/12/2011

  • First regional law in Italy on Open Data

Basic principle:

  • Data belong to people

Cornerstones of reusability of data:

  • Diffusion without restriction and in open and standard digital formats
  • Use of standard legal tools: Creative Commons Licences
  • Re-use and re-distribution of data is free of charge

slide-6
SLIDE 6

Organizational framework

Regional level

  • An initiative with ANCI Piemonte (association of municipalities): dati.piemonte.it is the infrastructure for the whole regional territory (120 Municipalities and other bodies such as ARPA Piemonte and Unioncamere)

National level

  • Re-use of the platform and joint project with Emilia-Romagna Region and Milano Municipality

European level

  • HOMER project to transfer methodological / technical standards and increase circulation and re-use of public data
  • OPENDAI project to improve a new architectural model to increase digital services and business opportunities
slide-7
SLIDE 7

Technological framework: a permanent beta

[Draft note: can these boxes be turned into a nicer graphic?]

Diagram labels: Portal, Search, Data, Operational databases of PAs

New platform: from Open Data to Data Services and a Federated Search Engine

  • Harmonize policies and licenses for the re-use of data
  • Federation of Open Data Portals
  • Open data silos PA
  • Cloud architecture
  • Open data Services
slide-8
SLIDE 8
slide-9
SLIDE 9

HOMER is the acronym of Harmonising Open data in the MEditerranean through better access and Reuse of public sector information

www.homerproject.eu

  • It is a project within the MED Programme financed by the EU Commission

  • Implementation Starting date 01/04/2012
  • Implementation End date 31/03/2015
slide-10
SLIDE 10

Who are HOMER’s partners

13 partners act as territorial governments and 6 partners provide technological support. Partners are listed by country, with their mission in brackets.

Spain

  • AGAPA - Agencia de Gestión Agraria y Pesquera de Andalucía (Territorial Gov.)
  • SARGA - Sociedad Aragonesa de Gestión Agroambiental (Territorial Gov.)
  • FUNDITEC – Foundation for Development, Innovation and Technology (Technical Support)

France

  • Région Provence-Alpes-Côte d'Azur (Territorial Gov.)
  • Région Corse (Territorial Gov.)
  • AVITEM – Agency for sustainable Mediterranean cities and territories (Technical Support)
  • FING – Fondation Internet Nouvelle Génération (Technical Support)

Italy

  • Piedmont Region (Project Leader)
  • Sardinia, Emilia-Romagna and Veneto Regions (Territorial Gov.)
  • CSI Piemonte (Technical Support)

Slovenia

  • Geodetic Institute (Territorial Gov.)

Montenegro

  • Mediterranean University of Montenegro (Territorial Gov.)

Greece

  • GFOSS – The Greek Free Open Source Software Society (Technical Support)
  • Decentralized Administration of Crete (Territorial Gov.)
  • University of Crete (Technical Support)

Cyprus

  • Sewerage Board of Limassol – Amathus (Territorial Gov.)

Malta

  • Local Council Ass. of Malta Gozo (Territorial Gov.)

slide-11
SLIDE 11

HOMER’s objectives

To build a federation of Open Data portals among partners, sharing common datasets related to the MED strategic domains (agriculture, culture, energy, environment, tourism), ensuring long-term sustainability, exploiting a large number of harmonized and federated datasets, and enhancing the e-participation and digital market opportunities of MED citizens.

CSI Piemonte’s responsibilities in HOMER

CSI Piemonte is the developer of a Federation of Open Data Portals among partners, providing ICT and legal support, and it is the promoter of the reuse of the technological solutions underlying the portal, developed in the context of the project.

slide-12
SLIDE 12

What do we mean by a federation of open data portals?

“Federation” means the virtual system, built around software able to collect and retrieve the metadata of published data in the 5 categories (agriculture, culture, energy, environment, tourism), as exposed and searched by the partners' Open Data Portals.

Look at this symbol: it represents the metadata catalogue

slide-13
SLIDE 13
Design, methodology, and approach

  • Memorandum of Understanding
  • Definition of a metadata common structure for federation
  • Use of EuroVoc
  • The cross lingual search
  • The federated search multi-language engine
  • The indexing scenario
  • The searching scenario

slide-14
SLIDE 14

Legal framework - Memorandum of Understanding

Partners joined by signing a Memorandum of Understanding, which defines the technological, organizational and legal boundaries as a common understanding for all partners and refers to Directive 2013/37/EU.

It states that all technological components of the solution for the Federation (Index, Semantic Search Engine, Translator) are provided and managed under the conditions and coordination of CSI Piemonte, which releases them on the basis of an open source philosophy.

slide-15
SLIDE 15

Data framework – the metadata structure

Each Open Data Portal shares a set of common metadata fields; this structure builds the Federated Index.

Common fields: title, description, url, metadata source, package_id, topics, language, tags, geographic bounding box, refresh date, creation date, spatial scale resolution, license id, owner

Schemas considered: Inspire, DCAT, CKAN, Dublin Core

The minimum common set of fields for the federated metadata structure was identified by intersecting the protocols and directives in these schemas. It is independent of the type of dataset, geographical or alphanumerical.
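The common structure can be pictured as a single record type. The following is only a minimal sketch: the field names come from the slide, but the Python identifiers, the types, the bounding-box order and the sample values are assumptions, not the project's actual schema.

```python
from dataclasses import dataclass, field, asdict
from typing import Optional


@dataclass
class FederatedRecord:
    """One entry of the federated index (types are assumed, not from the slides)."""
    title: str
    description: str
    url: str
    metadata_source: str
    package_id: str
    language: str                                  # ISO 639-1 code, e.g. "it"
    topics: list = field(default_factory=list)
    tags: list = field(default_factory=list)
    bounding_box: Optional[tuple] = None           # assumed order: west, south, east, north
    refresh_date: Optional[str] = None
    creation_date: Optional[str] = None
    spatial_scale_resolution: Optional[str] = None
    license_id: Optional[str] = None
    owner: Optional[str] = None


# Illustrative values only; none of them are taken from a real portal.
record = FederatedRecord(
    title="Example dataset",
    description="Example description",
    url="http://portal.example/dataset/1",
    metadata_source="portal.example",
    package_id="1",
    language="it",
    tags=["acqua"],
)
print(asdict(record)["language"])
```

Geographical and alphanumerical datasets share the same record; fields that only apply to one kind (such as the bounding box) are simply left empty.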

slide-16
SLIDE 16

Data framework – the use of EuroVoc (1)

HOMER now speaks 7 languages (Spanish, French, Italian, Slovenian, Serbian-Montenegrin, Greek and English) with 4 different alphabets, so the partners must share a dictionary to communicate.

In the metadata structure, the language field holds an ISO 639-1 code that identifies the language.

slide-17
SLIDE 17

Data framework – the use of EuroVoc (2)

EuroVoc is a multilingual, multidisciplinary thesaurus of the EU, conformant to W3C recommendations. In it, a specific concept from the 5 categories involved has the same classification and meaning across domains and languages.

  • The language field holds an ISO 639-1 code that identifies the language
  • HOMER’s categories = EuroVoc domains
  • Example of one concept in several languages: WATER, νερό, VODA, вода, AGUA, EAU, ACQUA
  • Each ODP inserts tags in the metadata cards in its own language, without the burden of translation
  • The same concept is identified in all languages
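The idea that one concept carries a label in every language can be sketched with a small lookup table. This is an illustration only: the concept identifier is hypothetical (the slides give no real EuroVoc identifiers), and the pairing of each label with an ISO 639-1 code is an assumption.

```python
# Hypothetical concept identifier; not a real EuroVoc ID.
WATER = "concept:water"

# Labels for the "water" concept as shown on the slide, keyed by ISO 639-1 code.
# Which code goes with which label is an assumption.
LABELS = {
    WATER: {
        "en": "water", "el": "νερό", "sl": "voda", "sr": "вода",
        "es": "agua", "fr": "eau", "it": "acqua",
    },
}


def concept_for(tag, lang):
    """Return the concept whose label in `lang` matches `tag`, or None."""
    wanted = tag.casefold()
    for concept, labels in LABELS.items():
        if labels.get(lang, "").casefold() == wanted:
            return concept
    return None


def translate_tag(tag, lang):
    """Return the tag in every known language, or just the original if unknown."""
    concept = concept_for(tag, lang)
    return dict(LABELS[concept]) if concept else {lang: tag}


print(translate_tag("Acqua", "it")["fr"])
```

A portal only supplies the tag in its own language; the translation step fills in the rest, and an unknown tag is passed through unchanged.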

slide-18
SLIDE 18

Data framework – The cross lingual search

The multi-language semantic search engine needs a specific common structure to index and retrieve the metadata from all the metadata catalogues of the HOMER partners’ Open Data Portals.

The search engine is like a librarian who finds books only if the request form is filled out in a specific way.

slide-19
SLIDE 19

Technological framework: the federated search multi language engine

The technological solution for indexing and searching among all the federated open data portals has 4 components:

  • 1. Fed-Index Homer: the federated index file component containing the complete list of metadata
  • 2. Fed-Translator: the component that translates every tag of the datasets via EuroVoc
  • 3. Fed-Searcher: the centralized semantic search engine component
  • 4. Fed-Loader API: the loader that calls the APIs or web services exposed by each Open Data Portal to create the federated Index

Based on the open source project Apache Solr. Released as open source on SourceForge.

slide-20
SLIDE 20

Technological framework: the indexing scenario (1)

The indexing process requires that each federated portal exposes the metadata cards of its data through 2 types of URL:

  • url1 (Package List): returns the list of the data ids
  • url2 (Package Dataset): returns the attributes of a single dataset

Indexing is a stand-alone scheduled process, which could run nightly.
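The two-URL pattern amounts to a simple loop: ask each portal for its list of ids, then fetch each dataset's attributes. Below is a minimal sketch of that loop, not the actual Fed-Loader. The function names and the record keys are hypothetical, and the two fetchers are passed in as parameters so that the example runs against in-memory data rather than real portals.

```python
def build_federated_index(portals, list_ids, get_record):
    """Collect one metadata record per dataset from every portal.

    list_ids(portal)         -> plays the role of url1 (Package List)
    get_record(portal, pid)  -> plays the role of url2 (Package Dataset)
    """
    index = []
    for portal in portals:
        for package_id in list_ids(portal):
            record = dict(get_record(portal, package_id))
            record["package_id"] = package_id
            record["metadata_source"] = portal   # assumed meaning of "metadata source"
            index.append(record)
    return index


# In-memory stand-ins for two portals (illustrative data only).
FAKE_PORTALS = {
    "portal-a": {"1083": {"title": "Dataset A"}},
    "portal-b": {"human-resources-datasets": {"title": "Dataset B"}},
}

index = build_federated_index(
    portals=FAKE_PORTALS.keys(),
    list_ids=lambda portal: list(FAKE_PORTALS[portal]),
    get_record=lambda portal, package_id: FAKE_PORTALS[portal][package_id],
)
print(len(index))
```

In a scheduled run, the two callables would wrap HTTP requests to each portal; keeping them as parameters lets one loop serve portals with different APIs.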

slide-21
SLIDE 21

Technological framework: the indexing scenario (2)

[Diagram: scheduled loading from the Open Data Portals into the Search Engine; example tags: Eau, Voda, Agua, Water]

slide-22
SLIDE 22

Technological framework: the indexing scenario (3)

3 ways are supported to expose the metadata:

API CKAN compliant:

  • Package List > url1 that returns a json file1 with the list of the data ids
  • Package Dataset > url2 that returns a json file2 with the attributes of a single dataset

Web services dati.piemonte.it compliant:

  • Package List > url1 that returns an xml file1 with the list of the data ids
    http://www.dati.piemonte.it/index.php?option=com_rd&view=pceli_list2&format=xml&layout=xml
  • Package Dataset > url2 that returns an xml file2 with the attributes of a single dataset
    http://www.dati.piemonte.it/index.php?option=com_rd&view=pceli_item2&format=xml&layout=xml&itemid=1083

API Catalogue Service for the Web compliant:

  • Package List > url1 that returns a csw file1 with the list of the data ids
  • Package Dataset > url2 that returns a csw file2 with the attributes of a single dataset

slide-23
SLIDE 23

Technological framework: the indexing scenario (4)

3 ways are supported to expose the metadata:

  • API CKAN compliant
  • Web services dati.piemonte.it compliant
  • API CSW compliant

slide-24
SLIDE 24

Technological framework: the searching scenario

Actors: User, Open Data Portal (ODP), Search Engine (SE)

  • 1. The User searches in the language of the portal
  • 2. The ODP calls the SE, adding the language
  • 3. Using EuroVoc, the search is run against the index in all languages
  • 4. The SE returns a list of results
  • 5. The User chooses a dataset and goes to the corresponding portal
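The search steps can be sketched against a tiny in-memory index. This is a toy illustration, not the Solr-based Fed-Searcher: the concept identifiers, the index entries and the URLs are all made up, and it assumes that tags were already mapped to concepts at indexing time.

```python
# Hypothetical (language, label) -> concept lookup standing in for EuroVoc.
LABEL_TO_CONCEPT = {
    ("it", "acqua"): "concept:water",
    ("fr", "eau"): "concept:water",
    ("es", "agua"): "concept:water",
}

# Toy federated index; every value here is illustrative.
INDEX = [
    {"title": "Qualité de l'eau", "language": "fr",
     "concepts": {"concept:water"}, "url": "http://portal-fr.example/dataset/1"},
    {"title": "Calidad del agua", "language": "es",
     "concepts": {"concept:water"}, "url": "http://portal-es.example/dataset/7"},
    {"title": "Strutture ricettive", "language": "it",
     "concepts": {"concept:tourism"}, "url": "http://portal-it.example/dataset/3"},
]


def search(term, lang):
    """Steps 2-4: resolve the term to a concept, then match records in any language."""
    concept = LABEL_TO_CONCEPT.get((lang, term.casefold()))
    if concept is None:
        return []
    return [record for record in INDEX if concept in record["concepts"]]


# Step 1: an Italian user searches for "Acqua"; step 5: they follow a result's url.
for hit in search("Acqua", "it"):
    print(hit["language"], hit["url"])
```

The Italian query returns the French and Spanish datasets because all three labels resolve to the same concept.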

slide-25
SLIDE 25

Results and ongoing activities

The Federation in terms of:

  • shared knowledge, experiences and relationships among the Partners
  • open hundreds of public datasets, enhancing digital heritage transparency and promoting open data culture across the Mediterranean
  • looking for new stakeholders, as it is possible to configure new categories and new languages

slide-26
SLIDE 26

Nives Alciato – CSI Piemonte nives.alciato@csi.it

www.dati.piemonte.it www.homerproject.eu

Thank you !

slide-27
SLIDE 27

Step 4 - technical requirements: API Web Services like ‘www.dati.piemonte.it’

An open data portal like dati.piemonte.it exposes 2 URLs.

url1 (Package List) returns an xml file1 with the list of the data ids:

http://www.dati.piemonte.it/index.php?option=com_rd&view=pceli_list2&format=xml&layout=xml

    <urlOggetti totale="434" baseUrl="http://www.dati.piemonte.it/index.php?option=com_rd&view=pceli_item2&format=xml&layout=xml&itemid=" data="">
      <urlOggetto>1083</urlOggetto>

url2 (Package Dataset) returns an xml file2 with the attributes of a single dataset:

http://www.dati.piemonte.it/index.php?option=com_rd&view=pceli_item2&format=xml&layout=xml&itemid=1083

    <package>
      <package_id>1083</package_id>
      <url>http://www.dati.piem..</url>
      <title>DWUMA DW Utenti ..</title>
      <description> Base dati decisionale ... </description>
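A loader for this style of web service could read the Package List, append each id to the `baseUrl` attribute, and then read the fields of each `<package>` element. The sketch below uses shortened sample documents modelled on the excerpts above, with placeholder values. The samples escape `&` as `&amp;` so that they are well-formed XML, which is an assumption about what the service actually returns.

```python
import xml.etree.ElementTree as ET

# Shortened sample modelled on the Package List excerpt (one item only).
SAMPLE_PACKAGE_LIST = """\
<urlOggetti totale="434" baseUrl="http://www.dati.piemonte.it/index.php?option=com_rd&amp;view=pceli_item2&amp;format=xml&amp;layout=xml&amp;itemid=" data="">
  <urlOggetto>1083</urlOggetto>
</urlOggetti>
"""

# Placeholder title; the real one is truncated on the slide.
SAMPLE_PACKAGE = """\
<package>
  <package_id>1083</package_id>
  <title>Example title</title>
</package>
"""


def dataset_urls(package_list_xml):
    """Build one url2 per dataset by appending each id to the baseUrl attribute."""
    root = ET.fromstring(package_list_xml)
    base_url = root.get("baseUrl", "")
    return [base_url + item.text.strip()
            for item in root.iter("urlOggetto") if item.text]


def parse_package(package_xml):
    """Return the child elements of <package> as a flat dict of strings."""
    root = ET.fromstring(package_xml)
    return {child.tag: (child.text or "").strip() for child in root}


print(dataset_urls(SAMPLE_PACKAGE_LIST)[0])
print(parse_package(SAMPLE_PACKAGE)["package_id"])
```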

slide-28
SLIDE 28

Step 4 - technical requirements: API set interface like CKAN

A CKAN compliant API exposes 2 URLs.

url1 (Package List) returns a json file1 with the list of the data ids:

http://data.gov.uk/api/rest/package

    [ "human-resources-datasets", "veterinary-residues-data", ... ]

url2 (Package Dataset) returns a json file2 with the attributes of a single dataset:

http://data.gov.uk/api/rest/package/human-resources-datasets

    {
      license_title: "",
      maintainer: null,
      maintainer_email: null,
      id: "00029d8d-1be7-4435-9ef8",
      metadata_created: "2013-08-30",
      relationships: [ ],
      ...
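For a CKAN-style portal, the same two steps reduce to decoding JSON. The sketch below works on sample strings modelled on the excerpts above, so it makes no network calls. The mapping from CKAN keys to the federated field names is an assumption made for illustration, not the project's documented mapping.

```python
import json

BASE_URL = "http://data.gov.uk/api/rest/package"  # endpoint shown on the slide

SAMPLE_PACKAGE_LIST = '["human-resources-datasets", "veterinary-residues-data"]'

# Keys taken from the slide excerpt, quoted so that the text is valid JSON.
SAMPLE_PACKAGE = """\
{"license_title": "", "maintainer": null, "maintainer_email": null,
 "id": "00029d8d-1be7-4435-9ef8", "metadata_created": "2013-08-30",
 "relationships": []}
"""


def package_urls(package_list_json, base_url=BASE_URL):
    """Build one url2 per dataset name returned by url1."""
    names = json.loads(package_list_json)
    return [f"{base_url}/{name}" for name in names]


def to_common_fields(package_json):
    """Map a few CKAN keys onto federated field names (assumed mapping)."""
    package = json.loads(package_json)
    return {
        "package_id": package.get("id"),
        "creation_date": package.get("metadata_created"),
        "owner": package.get("maintainer"),
    }


print(package_urls(SAMPLE_PACKAGE_LIST)[0])
print(to_common_fields(SAMPLE_PACKAGE)["creation_date"])
```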

slide-29
SLIDE 29

Step 4 - technical requirements: Catalogue Services for the Web (CSW)

A geoportal exposes its metadata through 2 methods of the CSW protocol.

url1 (Package List) returns a csw file1 with the list of the data ids:

http://webgis.arpa.piemonte.it/geoportalserver_arpa/csw?REQUEST=GetRecords

    <csw:GetRecordsResponse>
      <csw:SearchStatus timestamp="201
      <csw:SearchResults ...
        <gmd:MD_Metadata>
          <gmd:fileIdentifier>
            <gco:CharacterString> ARLPA_TO_16.08.01-D_2011-11-03-9:58

url2 (Package Dataset) returns a csw file2 with the attributes of a single dataset:

http://webgis.arpa.piemonte.it/geoportalserver_arpa/csw?request=GetRecordById&service=CSW&version=2.0.2&id=ARLPA_TO_16.08.01-D_2011-11-03-9:58

    <csw:GetRecordByIdResponse>
      <gmd:MD_Metadata xsi:schemaLocat
        <gmd:fileIdentifier>
          <gco:CharacterString> ARLPA_TO_16.08.01-D_2011-11-03-9:58 </gco:CharacterString>
        </gmd:fileIdentifier>
        <gmd:language> ...
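For the CSW case, a loader could extract every `gmd:fileIdentifier` from a GetRecords response and then build a GetRecordById URL for each one. The sketch below uses a shortened, well-formed sample rather than a live response. The namespace URIs are the standard CSW 2.0.2 and ISO 19139 ones, and the sample assumes the real response nests its elements the same way.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode, urlsplit, parse_qs

CSW_ENDPOINT = "http://webgis.arpa.piemonte.it/geoportalserver_arpa/csw"

NS = {
    "csw": "http://www.opengis.net/cat/csw/2.0.2",
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# Shortened sample modelled on the GetRecords excerpt above.
SAMPLE_RESPONSE = """\
<csw:GetRecordsResponse xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
    xmlns:gmd="http://www.isotc211.org/2005/gmd"
    xmlns:gco="http://www.isotc211.org/2005/gco">
  <csw:SearchResults>
    <gmd:MD_Metadata>
      <gmd:fileIdentifier>
        <gco:CharacterString> ARLPA_TO_16.08.01-D_2011-11-03-9:58 </gco:CharacterString>
      </gmd:fileIdentifier>
    </gmd:MD_Metadata>
  </csw:SearchResults>
</csw:GetRecordsResponse>
"""


def record_ids(response_xml):
    """Return the file identifier of every metadata record in the response."""
    root = ET.fromstring(response_xml)
    path = ".//gmd:MD_Metadata/gmd:fileIdentifier/gco:CharacterString"
    return [el.text.strip() for el in root.iterfind(path, NS) if el.text]


def record_url(record_id, endpoint=CSW_ENDPOINT):
    """Build the GetRecordById URL (url2) for one identifier."""
    query = urlencode({
        "request": "GetRecordById",
        "service": "CSW",
        "version": "2.0.2",
        "id": record_id,
    })
    return f"{endpoint}?{query}"


for rid in record_ids(SAMPLE_RESPONSE):
    print(record_url(rid))
```

`urlencode` percent-encodes the colon in the identifier, which differs from the raw URL on the slide but decodes to the same id.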