The Europeana Use Case: Multilingual & Semantic Interoperability in Cultural Heritage Information Systems
Vivien Petras, Berlin School of Library and Information Science
12 March 2013, W3C Multilingual Web Workshop
Contents
- Europeana: Multilingual Collections & Users
- Multilingual Interoperability
- Semantic Enrichment
- Preview: New Enrichment Plans
- Playing with Europeana Data
Image: http://www.europeana.eu/portal/record/08535/D53FE7B7621E65A5E01E16E3D72785C68F2E2059.html
Europeana
- 15.2 million images
- 10 million texts
- 450,000 sound files
- 170,000 video files
More than 2,200 institutions from more than 30 countries
Europeana Multilingual Collections
- German 18%
- Multilingual 12%
- French 11%
- Dutch 10%
- Swedish 9%
- Spanish 8%
- English 7%
- Norwegian 6%
- Polish 6%
- Italian 6%
- Finnish 3%
- Danish 2%
- Hungarian 1%
- Slovenian 1%
→ Most Europeana objects are language-independent (e.g. images), but the metadata is multilingual.
Multilingual Europeana Users
- Native language browser: 69%
- Native language Google (entry point): 91%
- Native language objects: 43% (SV 77%, DE 71%)
→ Native language use increases as soon as native language content increases.
Gäde, Maria (forthcoming). “User Behavior through the Language Glass” – Language-specific Behavior in Multilingual Digital Libraries.
Image: http://www.europeana.eu/resolve/record/9200105/AF5C65B3CC6A71CC0E4FF6FE5AAEB4CDAA1873C9
Multilingual Interface in 31 Languages
- Users seem to assume that changing the interface language also affects search
Query Result Filtering by Language
- language of record vs. language of content
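
The difference between the two filters can be sketched as follows. This is a minimal illustration, not Europeana's implementation: the `Record` class, the field names `metadata_lang` and `content_lang`, and the sample records are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    title: str
    metadata_lang: str           # language the description is written in
    content_lang: Optional[str]  # language of the object itself; None for images etc.


def filter_by_language(records, lang, field="metadata_lang"):
    """Keep the records whose chosen language field equals `lang`."""
    return [r for r in records if getattr(r, field) == lang]


# Hypothetical sample data: a German-described photo, a French letter
# described in German, and a Dutch article described in Dutch.
records = [
    Record("Stadtansicht (photograph)", "de", None),
    Record("Lettre manuscrite", "de", "fr"),
    Record("Krantenartikel", "nl", "nl"),
]

by_record = filter_by_language(records, "de")
by_content = filter_by_language(records, "fr", field="content_lang")
print([r.title for r in by_record])
print([r.title for r in by_content])
```

In this toy data, filtering by record language "de" returns the French letter as well, because its description is in German; filtering by content language "fr" returns only that letter and can never return the language-independent photograph.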
Document Translation
- general-purpose machine translation (MT), not domain-specific
Query Translation – Planned for 2013
- How many languages?
- How much user interaction?
- concept (GEMET Thesaurus), agent (DBpedia), period (Semium time ontology), place (GeoNames)
Semantic Enrichment
Poisonous India…
Enrichment Challenges
- Metadata quality & sparsity
- Vocabulary ambiguity
– domain (GEMET): (German) Druck = print | pressure
– language: (German) Strom = electrical power | (Czech) strom = tree
– context: Córdoba = Spain | Argentina
Olensky, M., Stiller, J., Dröge, E. (2012). Poisonous India or the Importance of a Semantic and Multilingual Enrichment Strategy. In: Proc. of MTSR 2012: Metadata and Semantics Research Conference, Nov. 2012, Cádiz, Spain.
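
The language ambiguity above can be illustrated with a toy label matcher. This is a sketch under stated assumptions: the `VOCAB` table and both function names are hypothetical and hold only the examples from this slide; it is not the enrichment pipeline Europeana uses.

```python
# Toy vocabulary: (label, language) -> concept. Illustrative entries only.
VOCAB = {
    ("strom", "de"): "electrical power",
    ("strom", "cs"): "tree",
    ("druck", "de"): "pressure",
}


def enrich_naive(term):
    """Match on the label alone, ignoring language: returns every candidate."""
    key = term.lower()
    return sorted(c for (label, _lang), c in VOCAB.items() if label == key)


def enrich_with_language(term, lang):
    """Match on label plus metadata language: at most one candidate."""
    return VOCAB.get((term.lower(), lang))


print(enrich_naive("Strom"))
print(enrich_with_language("strom", "cs"))
print(enrich_with_language("Druck", "de"))
```

Knowing the metadata language settles the Strom/strom case, but not the domain case: in this toy table a German record about a print ("Druck") is still linked to "pressure", because the environmental vocabulary holds only that sense.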
Image: http://www.europeana.eu/portal/record/03919/FCD38BDE7A03579F24BEDA5D157943B75BB36F11.html
Preview: New Enrichment Plans
→ Transition to the linked data-based Europeana Data Model (EDM)
- links to contextual vocabularies from providers
- enrich during ingestion
Playing with Europeana Data
- CHiC: Cultural Heritage in CLEF
→ Europeana data (XML) & queries in 13 languages
→ ad-hoc retrieval / semantic enrichment tasks
→ Submission deadline: 14 April 2013
→ http://www.culturalheritageevaluation.org
- Europeana Linked Open Data
→ RDF file dumps in EDM (Europeana Data Model)
→ SPARQL endpoint
→ CC0 open license
→ http://data.europeana.eu/
- Contact: vivien.petras@ibi.hu-berlin.de
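
As a starting point for experimenting with the SPARQL endpoint, a query for titles in one language might be built as sketched below. Several things here are assumptions: the `ENDPOINT` value is a placeholder and not the real address (look it up at http://data.europeana.eu/), the function names are hypothetical, and the query assumes that titles are stored as `dc:title` literals with language tags, which may not hold for every record.

```python
from urllib.parse import urlencode

# Placeholder only, NOT the real endpoint address (assumption).
ENDPOINT = "http://example.org/sparql"


def titles_in_language_query(lang, limit=10):
    """Build a SPARQL query for titles that carry the given language tag."""
    return f"""
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?item ?title WHERE {{
  ?item dc:title ?title .
  FILTER (lang(?title) = "{lang}")
}}
LIMIT {limit}
""".strip()


def request_url(query):
    """Encode the query as a GET request, as in the SPARQL protocol."""
    return ENDPOINT + "?" + urlencode({"query": query})


query = titles_in_language_query("de", limit=5)
print(query)
print(request_url(query))
```

The sketch only builds the query string and request URL; sending the request and parsing the result format are left out because they depend on how the actual endpoint is configured.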
Image: http://www.europeana.eu/resolve/record/03486/DF559A7721E55BAE5BF5095FB9AA55406C0269C4