SLIDE 1 Homer: a case study of federation among
Nives Alciato - CSI Piemonte nives.alciato@csi.it
SLIDE 2
SLIDE 3
SLIDE 4 The initiative of Piedmont Region
- Regional law on Open Data
- Guidelines for reuse
- Adoption of a standard licence model
- Creation of a working group
- Diffusion to other Public Administrations
- Reuse at national level
- European Projects
- Metadata catalogues
- Data uploading platform
- A portal as an access point for data and information
SLIDE 5 Legal framework
Regional Law n. 24 dated 23/12/2011
- First regional law in Italy on Open Data
Basic principle:
Cornerstones of reusability of data:
- Diffusion without restriction, in open and standard digital formats
- Use of standard legal tools: Creative Commons licences
- Re-use and re-distribution of data is free of charge
SLIDE 6 Organizational framework
Regional level: an initiative with ANCI Piemonte (the association of municipalities); dati.piemonte.it is the infrastructure for the whole regional territory (120 municipalities and other bodies such as ARPA Piemonte and Unioncamere)
National level: re-use of the platform and a joint project with Emilia-Romagna Region and Milano Municipality
European level:
- HOMER project, to transfer methodological and technical standards and increase the circulation and re-use of public data
- OPENDAI project, to develop a new architectural model that increases digital services and business
SLIDE 7 Technological framework: a permanent beta
Portal, Search, DATA, Operational databases
New platform: from Open data to Data Services and a Federated Search Engine
licenses for the re-use of data
Portal
- Open data silos PA
- Cloud architecture
- Open data Services
SLIDE 8
SLIDE 9 HOMER stands for Harmonising Open data in the MEditerranean through better access and Reuse of public sector information
www.homerproject.eu
- It is a project within the MED Programme financed by the European Commission
- Implementation Starting date 01/04/2012
- Implementation End date 31/03/2015
SLIDE 10 Who are HOMER’s partners
13 partners as territorial government and 6 partners as technological support
Country, partner and mission:
Spain
- SARGA – Sociedad Aragonesa de Gestión Agroambiental: Territorial Gov.
- AGAPA – Agencia de Gestión Agraria y Pesquera de Andalucía: Territorial Gov.
- FUNDITEC – Foundation for Development, Innovation and Technology: Technical Support
France
- Région Provence-Alpes-Côte d'Azur: Territorial Gov.
- Région Corse: Territorial Gov.
- AVITEM – Agency for sustainable Mediterranean cities and territories: Technical Support
- FING – Fondation Internet Nouvelle Génération: Technical Support
Italy
- Piedmont Region: Project Leader
- Sardinia, Emilia-Romagna and Veneto Regions: Territorial Gov.
- CSI Piemonte: Technical Support
Slovenia
- Geodetic Institute: Territorial Gov.
Montenegro
- Mediterranean University of Montenegro: Territorial Gov.
Greece
- GFOSS – The Greek Free Open Source Software Society: Technical Support
Greece (Crete)
- Decentralized Administration of Crete: Territorial Gov.
- University of Crete: Technical Support
Cyprus
- Sewerage Board of Limassol – Amathus: Territorial Gov.
Malta
- Local Council Ass. of Malta Gozo: Territorial Gov.
SLIDE 11 HOMER’s objectives
A federation of Open Data portals among the partners, sharing common datasets related to MED strategic domains (agriculture, culture, energy, environment, tourism), ensuring long-term sustainability and exploiting a large number of harmonized and federated datasets, to enhance the e-participation and digital market opportunities of MED citizens
CSI Piemonte’s responsibilities in HOMER
It develops the Federation of Open Data Portals among the partners, provides ICT and legal support, and promotes the reuse of the technological solutions underlying the portal, developed within the project
SLIDE 12 What do we mean by a federation of open data portals? “Federation” means the virtual system made up of software able to collect and retrieve the metadata of datasets published in the 5 categories (agriculture, culture, energy, environment, tourism), which are exposed by, and searched from, the partners’ Open Data Portals.
Look at this symbol: it represents the metadata catalogue
SLIDE 13 Design, methodology, and approach
- Memorandum of Understanding
- Definition of a metadata common structure for federation
- Use of EuroVoc
- The cross lingual search
- The federated search multi-language engine
- The indexing scenario
- The searching scenario
SLIDE 14 Legal framework - Memorandum of Understanding
Partners joined by signing a Memorandum of Understanding, which defines the technological, organizational and legal boundaries as a common understanding for everybody, with reference to Directive 2013/37/EU. It states that all technological components of the Federation solution (Index, Semantic Search Engine, Translator) are provided and managed under the conditions and coordination of CSI Piemonte, which releases them following an open source philosophy.
SLIDE 15 Data framework – the metadata structure
All the Open Data Portals share a common set of metadata fields; this structure builds the Federated Index:
title, description, url, metadata source, package_id, topics, language, tags, geographic bounding box, refresh date, creation date, spatial scale resolution, license id
Reference standards: INSPIRE, DCAT, CKAN, Dublin Core
By intersecting these protocols and directives, the minimum common set of fields was identified to define the metadata structure for the federation, independent of whether the dataset is geographical or alphanumerical.
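A minimal sketch of one Federated Index record as a Python dataclass, to make the common structure concrete. The field names follow the list above; the types (ISO dates, a 4-tuple bounding box, a two-letter language code) and the helper `to_index_doc` are assumptions for illustration, not the project's actual schema.

```python
from dataclasses import dataclass, field, asdict
from datetime import date
from typing import Optional

HOMER_TOPICS = {"agriculture", "culture", "energy", "environment", "tourism"}


@dataclass
class FederatedRecord:
    """One metadata card in the Federated Index (field types are assumptions)."""
    title: str
    description: str
    url: str
    metadata_source: str            # portal the card was harvested from
    package_id: str
    topics: list[str]               # one or more of the 5 HOMER categories
    language: str                   # ISO 639-1 code, e.g. "it"
    tags: list[str] = field(default_factory=list)
    # (min_lon, min_lat, max_lon, max_lat); None for alphanumerical data
    geographic_bounding_box: Optional[tuple[float, float, float, float]] = None
    refresh_date: Optional[date] = None
    creation_date: Optional[date] = None
    spatial_scale_resolution: Optional[str] = None
    license_id: Optional[str] = None


def to_index_doc(rec: FederatedRecord) -> dict:
    """Validate a record and flatten it into a plain dict ready for indexing."""
    unknown = set(rec.topics) - HOMER_TOPICS
    if unknown:
        raise ValueError(f"topics outside the HOMER categories: {sorted(unknown)}")
    if len(rec.language) != 2:
        raise ValueError("language must be a two-letter ISO 639-1 code")
    doc = asdict(rec)
    # Dates become ISO strings so the dict can be serialized to JSON.
    for key in ("refresh_date", "creation_date"):
        if doc[key] is not None:
            doc[key] = doc[key].isoformat()
    # Drop empty optional fields instead of indexing nulls.
    return {k: v for k, v in doc.items() if v not in (None, [])}
```

Keeping geographical fields optional is one way to honour the requirement that the structure work for both geographical and alphanumerical datasets.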
SLIDE 16 Data framework – the use of EuroVoc (1)
HOMER now speaks 7 languages (Spanish, French, Italian, Slovenian, Serbian-Montenegrin, Greek and English) with 4 different alphabets, so we must share a dictionary to communicate
ISO 639-1 code to identify the language
SLIDE 17 Data framework – the use of EuroVoc (2)
EuroVoc is a multilingual, multidisciplinary thesaurus of the EU, conformant to W3C recommendations; in it, each concept of the 5 categories involved has the same classification and meaning across domains and languages
ISO 639-1 code to identify the language
HOMER’s categories = EuroVoc domains
Example: WATER, νερό, VODA, вода, AGUA, EAU, ACQUA
- Each ODP inserts tags in the metadata cards in its own language, without the burden of translation
- The same concept is identified in all languages
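The idea that each portal tags in its own language while the federation sees one concept can be sketched as a reverse lookup from language-specific labels to a shared concept key. The label table reuses the WATER example above; the concept key `water` and the function names are hypothetical placeholders, not real EuroVoc identifiers, and a real deployment would load the labels from the EuroVoc thesaurus itself.

```python
import unicodedata
from typing import Optional

# Hypothetical excerpt of a EuroVoc-style thesaurus: concept -> label per language.
# "sr" (Serbian) is used here for the Serbian-Montenegrin label; an assumption.
THESAURUS = {
    "water": {
        "en": "water", "el": "νερό", "sl": "voda", "sr": "вода",
        "es": "agua", "fr": "eau", "it": "acqua",
    },
}


def _norm(label: str) -> str:
    """Case-fold and strip accents so 'Νερό' and 'νερο' compare equal."""
    decomposed = unicodedata.normalize("NFD", label.casefold())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


# Reverse index: (language, normalized label) -> concept key.
LABEL_INDEX = {
    (lang, _norm(lbl)): concept
    for concept, labels in THESAURUS.items()
    for lang, lbl in labels.items()
}


def concept_for(tag: str, lang: str) -> Optional[str]:
    """Map a portal tag, written in the portal's language, to the shared concept."""
    return LABEL_INDEX.get((lang, _norm(tag)))


def labels_for(concept: str) -> dict[str, str]:
    """All language labels of a concept, e.g. to expand a search."""
    return THESAURUS.get(concept, {})
```

Keying the lookup on the language as well as the label avoids false matches between words that are spelled alike in different languages.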
SLIDE 18 Data framework – The cross lingual search
The multi-language semantic search engine needs a specific common structure to index and retrieve the metadata from all the metadata catalogues of the HOMER partners’ Open Data Portals.
The search engine is like a librarian who finds books only if the request form is filled out in a specific way.
SLIDE 19 Technological framework: the federated search multi language engine
The technological solution for indexing and searching across all the federated open data portals has 4 components:
- 1. Fed-Index Homer: the federated index file component containing the complete list of metadata
- 2. Fed-Translator: the component that translates the tags of every dataset via EuroVoc
- 3. Fed-Searcher: the centralized semantic search engine component
- 4. Fed-Loader API: the loader that calls the APIs or web services exposed by each Open Data Portal to create the federated index
Based on the open source project Apache Solr; released as open source on SourceForge
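Since the federated index is based on Apache Solr, the step where records are written into it could look like the request below, which targets Solr's JSON update handler (`/update` accepts a JSON array of documents). The Solr base URL and the core name `homer` are hypothetical, and the function only builds the request without sending it; how the actual Fed-Loader talks to Solr is an assumption.

```python
import json
import urllib.request


def build_solr_update(base_url: str, core: str, docs: list[dict],
                      commit: bool = True) -> urllib.request.Request:
    """Build (but do not send) a POST to Solr's JSON update handler."""
    url = f"{base_url.rstrip('/')}/{core}/update"
    if commit:
        url += "?commit=true"  # make the new documents searchable immediately
    body = json.dumps(docs, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/json"},
    )


# Usage (hypothetical core and id scheme): urllib.request.urlopen(req) would send it.
req = build_solr_update("http://localhost:8983/solr", "homer",
                        [{"id": "dati.piemonte.it:1083", "language": "it"}])
```

Prefixing each document id with its source portal is one way to keep package ids from different portals from colliding in a single index.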
SLIDE 20
Technological framework: the indexing scenario (1)
The indexing process requires each federated portal to expose the metadata cards of its data through 2 types of URL:
- 1. url1, which returns the list of data ids: Package List
- 2. url2, which returns the attributes of a single dataset: Package Dataset
It is a standalone scheduled process, which could run nightly.
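The two-URL contract can be sketched as a protocol-independent harvest loop. Fetching, parsing and URL building are passed in as functions so the same loop can serve every portal; all names are hypothetical, and the error policy (skip and log one bad metadata card rather than abort the run) is an assumption.

```python
from typing import Callable, Iterator

# A fetcher takes a URL and returns the parsed response body.
Fetcher = Callable[[str], object]


def harvest(list_url: str,
            item_url: Callable[[str], str],
            parse_ids: Callable[[object], list[str]],
            parse_item: Callable[[object], dict],
            fetch: Fetcher) -> Iterator[dict]:
    """Two-step harvest: url1 gives the ids (Package List),
    url2 gives one dataset's attributes (Package Dataset)."""
    for dataset_id in parse_ids(fetch(list_url)):
        try:
            yield parse_item(fetch(item_url(dataset_id)))
        except (KeyError, ValueError) as exc:
            # One broken metadata card should not stop the nightly run.
            print(f"skipping {dataset_id}: {exc}")
```

For a quick test, a dictionary lookup can stand in for the network, e.g. `fetch=responses.__getitem__`; a scheduler such as cron would then invoke the loop once per portal.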
SLIDE 21
Technological framework: the indexing scenario (2)
Diagram: scheduled indexing from the Opendata Portals to the Search Engine (example tags: Eau, Voda, Agua, Water)
SLIDE 22
Technological framework: the indexing scenario (3)
3 ways are supported to expose the metadata:
- API CKAN compliant:
Package List > url1 that returns a JSON file1 with the list of the data ids
Package Dataset > url2 that returns a JSON file2 with the attributes of the single dataset
- Web services dati.piemonte.it compliant:
Package List > url1 that returns an XML file1 with the list of the data ids
http://www.dati.piemonte.it/index.php?option=com_rd&view=pceli_list2&format=xml&layout=xml
Package Dataset > url2 that returns an XML file2 with the attributes of the single dataset
http://www.dati.piemonte.it/index.php?option=com_rd&view=pceli_item2&format=xml&layout=xml&itemid=1083
- API Catalogue Service for the Web (CSW) compliant:
Package List > url1 that returns a CSW response (file1) with the list of the data ids
Package Dataset > url2 that returns a CSW response (file2) with the attributes of the single dataset
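The three supported ways differ mainly in how url2 is built from an id. A minimal sketch of one url2 builder per protocol, following the example URLs in these slides; the function names and the dispatch table are hypothetical, and the CKAN path mirrors the older `/api/rest` API of the data.gov.uk example rather than CKAN's current Action API (`/api/3/action/package_show`).

```python
from urllib.parse import urlencode


def ckan_item_url(api_base: str, package_id: str) -> str:
    # CKAN REST API, as in the data.gov.uk example: /api/rest/package/<id>
    return f"{api_base.rstrip('/')}/api/rest/package/{package_id}"


def piemonte_item_url(base_url: str, item_id: str) -> str:
    # dati.piemonte.it: the list response carries a baseUrl ending in "itemid="
    return base_url + item_id


def csw_item_url(endpoint: str, record_id: str) -> str:
    # OGC CSW 2.0.2 GetRecordById, key-value-pair encoding (":" is percent-encoded)
    query = urlencode({"request": "GetRecordById", "service": "CSW",
                       "version": "2.0.2", "id": record_id})
    return f"{endpoint}?{query}"


# Hypothetical dispatch table: protocol name -> url2 builder.
ITEM_URL_BUILDERS = {
    "ckan": ckan_item_url,
    "piemonte": piemonte_item_url,
    "csw": csw_item_url,
}
```

A per-portal configuration entry (protocol name plus base URL) would then be enough to add a new partner portal without changing the loader.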
SLIDE 23
Technological framework: the indexing scenario (4)
3 ways are supported to expose the metadata:
- API CKAN compliant
- Web services dati.piemonte.it compliant
- API CSW compliant
SLIDE 24
Technological framework: the searching scenario
Actors: the User, the Open Data Portal (ODP) and the Search Engine (SE)
- 1. The User searches in the language of the portal
- 2. The ODP calls the SE, adding the language
- 3. EuroVoc is used to search the index in all languages
- 4. The SE returns a list of results
- 5. The User chooses a dataset and goes to the corresponding portal
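The five steps can be sketched end to end with an in-memory list standing in for the Solr-based federated index. This is a simplification under stated assumptions: matching is exact tag comparison after expansion, the label table is a hypothetical excerpt, and the real Fed-Searcher performs semantic search rather than set intersection.

```python
from dataclasses import dataclass

# Hypothetical multilingual labels for one EuroVoc-style concept.
LABELS = {"water": {"it": "acqua", "fr": "eau", "es": "agua", "en": "water"}}


@dataclass
class Hit:
    title: str
    portal_url: str   # step 5: the user follows this link to the source portal


def expand(term: str, lang: str) -> set[str]:
    """Step 3: translate the user's term into every language via the thesaurus."""
    for labels in LABELS.values():
        if labels.get(lang) == term.lower():
            return set(labels.values())
    return {term.lower()}  # unknown term: search it as typed


def search(index: list[dict], term: str, lang: str) -> list[Hit]:
    """Steps 2-4: the portal passes term + lang; matching records come back."""
    wanted = expand(term, lang)
    return [Hit(r["title"], r["url"]) for r in index
            if wanted & {t.lower() for t in r["tags"]}]
```

With this design a user on an Italian portal typing "acqua" also finds French datasets tagged "eau", which is the behaviour the scenario describes.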
SLIDE 25 Results and ongoing activities
The Federation in terms of:
- shared knowledge, experiences and relationships among the Partners
- hundreds of open public datasets, enhancing digital heritage and transparency and promoting open data culture across the Mediterranean
- the search for new stakeholders, since new categories and new languages can be configured
SLIDE 26
Nives Alciato – CSI Piemonte nives.alciato@csi.it
www.dati.piemonte.it www.homerproject.eu
Thank you!
SLIDE 27 Step 4 - technical requirements: API web services like ‘www.dati.piemonte.it’
An open data portal like dati.piemonte.it exposes 2 URLs:
- url1, which returns an XML file1 with the list of the data ids: Package List
http://www.dati.piemonte.it/index.php?option=com_rd&view=pceli_list2&f
<urlOggetti totale="434" baseUrl="http://www.dati.piemonte.it/index.php?option=com_rd&view=pceli_item2&format=xml&layout=xml&itemid=" data=""> <urlOggetto>1083</urlOggetto>
- url2, which returns an XML file2 with the attributes of the single dataset: Package Dataset
http://www.dati.piemonte.it/index.php?option=com_rd&view=pceli_item2&format=xml&layout=xml&itemid=1083
<package> <package_id>1083</package_id> <url>http://www.dati.piem..</url> <title>DWUMA DW Utenti ..</title> <description> Base dati decisionale ... </description>
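Reading these two responses needs only the standard library XML parser. A minimal sketch, assuming the element names shown in the excerpts above (`urlOggetti`, `urlOggetto`, `package`) and a well-formed file, in which the `&` characters inside `baseUrl` are escaped as `&amp;` (the parser returns them unescaped). The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def piemonte_item_urls(list_xml: str) -> list[str]:
    """Turn the Package List response into one Package Dataset URL per id."""
    root = ET.fromstring(list_xml)           # <urlOggetti ...>
    base = root.attrib["baseUrl"]            # ends with "itemid="
    ids = [e.text.strip() for e in root.iter("urlOggetto") if e.text]
    return [base + i for i in ids]


def piemonte_package(item_xml: str) -> dict:
    """Read the first-level children of <package> into a flat dict."""
    root = ET.fromstring(item_xml)
    return {child.tag: (child.text or "").strip() for child in root}
```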
SLIDE 28 Step 4 - technical requirements: API set interface like CKAN
A CKAN-compliant API expects 2 URLs:
- url1, which returns a JSON file1 with the list of the data ids: Package List
http://data.gov.uk/api/rest/package
[ "human-resources-datasets", "veterinary-residues-data", ... ]
- url2, which returns a JSON file2 with the attributes of the single dataset: Package Dataset
http://data.gov.uk/api/rest/package/human-resources-datasets
{ license_title: "", maintainer: null, maintainer_email: null, id: "00029d8d-1be7-4435-9ef8", metadata_created: "2013-08-30", relationships: [ ], ...
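A minimal sketch of reading the two CKAN responses, assuming they are standard JSON (the excerpt above is shown with unquoted keys). The mapping from CKAN fields (`id`, `title`, `notes`, `license_id`, `metadata_created`, `tags`) onto the common structure is an assumption for illustration, not the project's documented mapping, and the function names are hypothetical.

```python
import json


def ckan_ids(list_json: str) -> list[str]:
    """Package List: a JSON array of package names or ids."""
    ids = json.loads(list_json)
    if not isinstance(ids, list):
        raise ValueError("expected a JSON array of package ids")
    return [str(i) for i in ids]


def ckan_package(item_json: str) -> dict:
    """Package Dataset: keep a few fields of the common structure.
    The CKAN-to-HOMER field mapping here is an assumption."""
    pkg = json.loads(item_json)
    return {
        "package_id": pkg["id"],
        "title": pkg.get("title", ""),
        "description": pkg.get("notes") or "",   # CKAN stores the description in "notes"
        "license_id": pkg.get("license_id"),
        "creation_date": pkg.get("metadata_created"),
        "tags": pkg.get("tags", []),
    }
```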
SLIDE 29 Step 4 - technical requirements: Catalogue Services for the Web (CSW)
A geoportal exposes metadata through 2 CSW operations:
- url1, which returns a CSW file1 with the list of the data ids: Package List
http://webgis.arpa.piemonte.it/geoportalserver_arpa/csw?REQUEST=GetRecords
<csw:GetRecordsResponse> <csw:SearchStatus timestamp="201 <csw:SearchResults ... <gmd:MD_Metadata> <gmd:fileIdentifier> <gco:CharacterString> ARLPA_TO_16.08.01-D_2011-11-03-9:58
- url2, which returns a CSW file2 with the attributes of the single dataset: Package Dataset
http://webgis.arpa.piemonte.it/geoportalserver_arpa/csw?request=GetRecordById&service=CSW&version=2.0.2&id=ARLPA_TO_16.08.01-D_2011-11-03-9:58
<csw:GetRecordByIdResponse> <gmd:MD_Metadata xsi:schemaLocat <gmd:fileIdentifier> <gco:CharacterString> ARLPA_TO_16.08.01-D_2011-11-03-9:58 </gco:CharacterString> </gmd:fileIdentifier> <gmd:language> ...
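Both CSW responses carry ISO 19139 records, so the dataset ids can be read from `gmd:fileIdentifier` with namespace-aware paths. A minimal sketch using the standard CSW 2.0.2, GMD and GCO namespace URIs; the function name is hypothetical, and extracting the other fields of the common structure would need further paths not shown here.

```python
import xml.etree.ElementTree as ET

NS = {
    "csw": "http://www.opengis.net/cat/csw/2.0.2",
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}


def csw_file_identifiers(response_xml: str) -> list[str]:
    """Collect gmd:fileIdentifier values from a GetRecords or GetRecordById
    response; each id can then be passed to GetRecordById as &id=."""
    root = ET.fromstring(response_xml)
    path = ".//gmd:MD_Metadata/gmd:fileIdentifier/gco:CharacterString"
    return [e.text.strip() for e in root.findall(path, NS) if e.text]
```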