The Europeana Use Case: Multilingual & Semantic Interoperability




SLIDE 1

Multilingual & Semantic Interoperability in Cultural Heritage Information Systems

Vivien Petras
Berlin School of Library and Information Science
12 March 2013, W3C Multilingual Web Workshop

The Europeana Use Case

SLIDE 2

Contents

  • Europeana: Multilingual Collections & Users
  • Multilingual Interoperability
  • Semantic Enrichment
  • Preview: New Enrichment Plans
  • Playing with Europeana Data


Image: http://www.europeana.eu/portal/record/08535/D53FE7B7621E65A5E01E16E3D72785C68F2E2059.html

SLIDE 3

Europeana


  • 15.2 million images
  • 10 million texts
  • 450,000 sound files
  • 170,000 video files

More than 2,200 institutions in more than 30 countries

SLIDE 4

Europeana Multilingual Collections

  • German: 18%
  • Multilingual: 12%
  • French: 11%
  • Dutch: 10%
  • Swedish: 9%
  • Spanish: 8%
  • English: 7%
  • Norwegian: 6%
  • Polish: 6%
  • Italian: 6%
  • Finnish: 3%
  • Danish: 2%
  • Hungarian: 1%
  • Slovenian: 1%

à Most Europeana

  • bjects are language-

independent (e.g. images), but the meta- data is multilingual.

SLIDE 5

Multilingual Europeana Users

  • Native language browser: 69%
  • Native language Google (entry point): 91%
  • Native language objects: 43% (SV 77%, DE 71%)

à Native language use increases as soon as native language content increases.

Gäde, Maria (forthcoming). “User Behavior through the Language Glass” – Language-specific Behavior in Multilingual Digital Libraries.


Image: http://www.europeana.eu/resolve/record/9200105/AF5C65B3CC6A71CC0E4FF6FE5AAEB4CDAA1873C9

SLIDE 6

Multilingual Interface in 31 Languages

  • users seem to assume that the choice of interface language also affects search

SLIDE 7

Query Result Filtering by Language

  • language of record vs. language of content
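The difference between the language of a record and the language of its content can be made concrete with a small sketch. The field names `record_lang` and `content_lang` and the sample records are hypothetical and are not taken from the Europeana schema; the only point is that filtering on one field can return a different result set than filtering on the other.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    """Hypothetical record; field names are illustrative, not Europeana's."""
    title: str
    record_lang: str             # language the metadata is written in
    content_lang: Optional[str]  # language of the object itself; None for images etc.


records = [
    Record("Brief an einen Freund", record_lang="de", content_lang="de"),
    Record("Lettre de Paris", record_lang="de", content_lang="fr"),
    Record("Stadtansicht (Fotografie)", record_lang="de", content_lang=None),
]


def filter_by_record_lang(items, lang):
    """Keep records whose metadata is written in `lang`."""
    return [r for r in items if r.record_lang == lang]


def filter_by_content_lang(items, lang):
    """Keep records whose object content is in `lang`."""
    return [r for r in items if r.content_lang == lang]


by_record = filter_by_record_lang(records, "de")
by_content = filter_by_content_lang(records, "de")
print(len(by_record), len(by_content))
```

In this made-up sample, a German filter on the record language keeps all three records, while the same filter on the content language keeps only the German letter.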

SLIDE 8

Document Translation

  • general MT – not domain-specific

SLIDE 9

Query Translation – Planned for 2013

  • How many languages?
  • How much user interaction?

SLIDE 10
Semantic Enrichment

  • concept (GEMET Thesaurus), agent (DBpedia), period (Semium time ontology), place (Geonames)

SLIDE 11

Poisonous India…

SLIDE 12

Enrichment Challenges

  • Metadata quality & sparsity
  • Vocabulary ambiguity

      – domain: GEMET print → (German) Druck → pressure
      – language: electrical power → (German) Strom → (Czech) strom → tree
      – context: Córdoba = Spain | Argentina
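The ambiguity cases listed above can be illustrated with a toy lookup. The small vocabulary below is made up for illustration (it is not GEMET, DBpedia, or Geonames data) and the function name `candidates` is hypothetical; the sketch only shows why matching a label without a language tag or further context can return more than one candidate.

```python
# Toy vocabulary entries: (label, language tag or None, meaning).
# Illustrative only; not real GEMET or Geonames data.
ENTRIES = [
    ("Druck", "de", "print"),
    ("Druck", "de", "pressure"),
    ("Strom", "de", "electrical power"),
    ("strom", "cs", "tree"),
    ("Córdoba", None, "place in Spain"),
    ("Córdoba", None, "place in Argentina"),
]


def candidates(label, lang=None):
    """Return all meanings whose label matches case-insensitively.

    If `lang` is given, entries tagged with a different language are
    dropped; entries without a language tag are always kept.
    """
    key = label.casefold()
    return [
        meaning
        for entry_label, entry_lang, meaning in ENTRIES
        if entry_label.casefold() == key
        and (lang is None or entry_lang is None or entry_lang == lang)
    ]


print(candidates("strom"))        # language unknown: German and Czech senses
print(candidates("strom", "cs"))  # language known: one sense left
print(candidates("Druck", "de"))  # language known, domain still ambiguous
print(candidates("Córdoba"))      # needs context beyond the label
```

Knowing the metadata language resolves the Strom/strom case in this toy data, but not the domain case (Druck) or the context case (Córdoba), which need additional signals.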

Olensky, M., Stiller, J., Dröge, E. (2012). Poisonous India or the Importance of a Semantic and Multilingual Enrichment Strategy. In: Proc. of MTSR 2012: Metadata and Semantics Research Conference, Nov. 2012, Cádiz, Spain.


Image: http://www.europeana.eu/portal/record/03919/FCD38BDE7A03579F24BEDA5D157943B75BB36F11.html

SLIDE 13

Preview: New Enrichment Plans


à transition to linked data-based Europeana Data Model (EDM)

  • links to contextual vocabularies from providers
  • enrich during ingestion
SLIDE 14

Playing with Europeana Data

  • CHiC: Cultural Heritage in CLEF

à Europeana data (XML) & queries / 13 languages à ad-hoc retrieval / semantic enrichment tasks à Submission deadline: 14 April 2013 à http://www.culturalheritageevaluation.org

  • Europeana Linked Open Data

à RDF file dumps in EDM (Europeana Data Model) à SPARQL endpoint à CC0 open license à http://data.europeana.eu/

  • Contact: vivien.petras@ibi.hu-berlin.de
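As a starting point for the SPARQL endpoint listed above, the sketch below only builds a query URL in the style of the SPARQL 1.1 Protocol (a `query` parameter on an HTTP GET) and does not send it. The endpoint address is a placeholder assumption, since the slide names only the site http://data.europeana.eu/; the `edm:ProvidedCHO` class is from the EDM namespace, and whether a given endpoint exposes the data in exactly this shape is also an assumption.

```python
from urllib.parse import urlencode

# Placeholder, NOT a documented Europeana address; substitute the real endpoint.
ENDPOINT = "https://example.org/sparql"

# Ask for a handful of items typed as EDM provided cultural heritage objects.
QUERY = """\
PREFIX edm: <http://www.europeana.eu/schemas/edm/>
SELECT ?item WHERE { ?item a edm:ProvidedCHO } LIMIT 5
"""


def build_query_url(endpoint, query):
    """Return a GET URL carrying the SPARQL query in the `query` parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = build_query_url(ENDPOINT, QUERY)
print(url)
```

The resulting URL could be fetched with any HTTP client, typically with an `Accept` header requesting a SPARQL results format such as `application/sparql-results+json`.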


Image: http://www.europeana.eu/resolve/record/03486/DF559A7721E55BAE5BF5095FB9AA55406C0269C4