Semantic Web Techniques for Multiple Views on Heterogeneous Collections: A Case Study

SLIDE 1

Semantic Web Techniques for Multiple Views on Heterogeneous Collections

A Case Study

Marjolein van Gendt, Antoine Isaac, Lourens van der Meij, Stefan Schlobach ECDL 2006

SLIDE 2

ECDL 2006

Outline

  • Motivations and project
  • Experiment
  • Collection formalization
  • Collection integration
  • Integrated collection access
  • Conclusion
SLIDE 3

Motivation

  • Current cultural heritage (CH) trend: portals that build on heterogeneous collections
    • Different databases
    • Documents described/accessed according to different points of view (controlled vocabularies, metadata (MD) schemes)
SLIDE 4

[Figure: Document Collection X and Document Collection Y, each with its own description base (Description Base X, Description Base Y), metadata scheme (MDS 1, MDS 2, with nested fields such as Field 1, Field 1.1, Field 2) and thesaurus (Thesaurus x, Thesaurus y).]

SLIDE 5

CH Interoperability Problems

  • Current CH trend: portals that build on heterogeneous collections
    • Different databases/vocabularies/MD schemes
  • Syntactic interoperability problem is being solved
    • Access can be granted, cf. deployed portals
  • Semantic interoperability still to be addressed
    • Links with original vocabularies/MD structures are lost

SLIDE 6

[Figure: DB X (MDS 1) and DB Y (MDS 2) merged into a Unified (Virtual) Description Base that uses a single Unified MD Scheme.]

No semantic information for description vocabulary

SLIDE 7

STITCH General Goals

[SemanTic Interoperability To access Cultural Heritage]

  • Allow heterogeneous CH collections to be accessed
    • In a seamless way
    • Still benefiting from specific collection commitments
      • Keeping original metadata schemes and vocabularies

SLIDE 8

[Figure: DB X (MDS 1) and DB Y (MDS 2) keep their original metadata schemes and are brought together in a common knowledge base.]

SLIDE 9

STITCH General Goals (2)

  • Allow heterogeneous CH collections to be accessed
    • In a seamless way
    • Still benefiting from specific collection commitments
      • Keeping original metadata schemes and vocabularies
  • Using Semantic Web means for
    • Representation of the different points of view in one system
    • Creation and use of the alignment knowledge
  • 2 methodological concerns
    • Generalize as much as possible
    • Automate as much as possible
SLIDE 10

Experiment

  • On a reduced scale
    • 2 collections and associated vocabularies
  • Desired output: insights on
    • Use of off-the-shelf Semantic Web (SW) techniques with CH-specific resources
    • Impact of turning to standard proposals (SW-linked tools and methods)
      • In a context of natural semantics (thesauri)
    • Added value of this effort
    • Quantitative and qualitative evaluation
    • Simple prototype for accessing documents
SLIDE 11

1st Collection: KB (Koninklijke Bibliotheek) Illustrated Manuscripts

SLIDE 12

1st Collection: KB Illustrated Manuscripts

SLIDE 13

2nd Collection: Rijksmuseum ARIA collection

SLIDE 14

2nd Collection: Rijksmuseum ARIA collection

SLIDE 15

Outline

  • Motivations and project
  • Experiment
  • Collection formalization
  • Collection integration
  • Integrated collection access
  • Conclusion
SLIDE 16

Experiment Steps

SLIDE 17

Steps

SLIDE 18

Steps

  • Gathering vocabulary and collection data
  • Analyzing it
  • Transforming it using SW standards

All record/vocabulary information in one repository
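A toy sketch of the "one repository" idea: record statements and vocabulary statements end up in the same set of triples and can be queried together. The `TripleStore` class, the `ex:` identifiers and the sample statements are hypothetical stand-ins for illustration; this is not the Sesame API.

```python
from typing import Optional

Triple = tuple[str, str, str]


class TripleStore:
    """Toy in-memory repository: a set of (subject, predicate, object) triples."""

    def __init__(self) -> None:
        self.triples: set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        self.triples.add((s, p, o))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> list[Triple]:
        """Return all triples matching the pattern; None acts as a wildcard."""
        return sorted(
            t for t in self.triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )


store = TripleStore()
# Vocabulary statement (hypothetical identifiers)
store.add("ex:aria/flowers-plants", "skos:prefLabel", "Flowers, plants")
# Record statement pointing at the same concept (hypothetical property ex:subject)
store.add("ex:record/1", "ex:subject", "ex:aria/flowers-plants")

# Which records are indexed with this concept?
print([s for s, _, _ in store.match(p="ex:subject", o="ex:aria/flowers-plants")])
```

Because both kinds of data share one store, the same pattern query can traverse from a record to its concept and on to the concept's labels.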

SLIDE 19

Collection Formalization Choices

  • Representation of vocabularies
    • Standard RDFS/OWL encoding scheme: SKOS
  • Representation of records
    • Ad hoc ontologies for collection MD schemes
    • Linking to SKOS concepts
  • RDF Schema repository: Sesame
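The record side of these choices can be sketched as follows: a legacy record becomes a resource typed with a class from a small collection-specific ontology, and its subject codes become links to concept URIs instead of plain strings. The namespaces, the `Manuscript` class, the `title`/`subject` properties and the record fields are hypothetical; only the `rdf:type` URI is standard. The IconClass codes are borrowed from the mapping examples in this talk.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Hypothetical namespaces for the ad hoc ontology, the vocabulary and the records
COLL = "http://example.org/kb-ontology#"
VOCAB = "http://example.org/iconclass/"
DATA = "http://example.org/kb-records/"


def record_to_ntriples(record: dict) -> list[str]:
    """Turn one flat record into N-Triples lines linked to concept URIs.

    Assumes the title contains no double quotes or backslashes (no escaping).
    """
    subject = f"<{DATA}{record['id']}>"
    lines = [
        f"{subject} <{RDF_TYPE}> <{COLL}Manuscript> .",
        f'{subject} <{COLL}title> "{record["title"]}" .',
    ]
    # Each subject code becomes a link to a concept, not a literal string
    for code in record["subjects"]:
        lines.append(f"{subject} <{COLL}subject> <{VOCAB}{code}> .")
    return lines


# Hypothetical record
example = {"id": "ms-001", "title": "Example manuscript", "subjects": ["25G1", "29A"]}
for line in record_to_ntriples(example):
    print(line)
```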
SLIDE 20

Vocabulary Formalization: ARIA in SKOS
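A minimal sketch of what encoding an ARIA fragment in SKOS could look like, emitted as N-Triples with the standard SKOS namespace. The base URI, the concept identifiers and the `skos_ntriples` helper are hypothetical; the two labels are the ARIA labels that appear in the mapping examples of this talk, and no hierarchy between them is assumed.

```python
from typing import Optional

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
ARIA = "http://example.org/aria/"  # hypothetical base URI for ARIA concepts


def skos_ntriples(concepts: dict[str, tuple[str, Optional[str]]]) -> list[str]:
    """N-Triples for concepts given as {id: (preferred label, broader id or None)}.

    Labels are written as plain literals without escaping, so they must not
    contain double quotes or backslashes.
    """
    scheme = f"<{ARIA}scheme>"
    lines = [f"{scheme} <{RDF_TYPE}> <{SKOS}ConceptScheme> ."]
    for cid, (label, parent) in concepts.items():
        c = f"<{ARIA}{cid}>"
        lines.append(f"{c} <{RDF_TYPE}> <{SKOS}Concept> .")
        lines.append(f'{c} <{SKOS}prefLabel> "{label}" .')
        lines.append(f"{c} <{SKOS}inScheme> {scheme} .")
        if parent is not None:
            lines.append(f"{c} <{SKOS}broader> <{ARIA}{parent}> .")
    return lines


fragment = {
    "flowers-plants": ("Flowers, plants", None),
    "marine-animals": ("Marine and other animals", None),
}
print("\n".join(skos_ntriples(fragment)))
```

Passing a parent identifier as the second tuple element adds a `skos:broader` link, which is how a hierarchical vocabulary would be encoded with the same helper.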

SLIDE 21

Steps

SLIDE 22

Steps

  • Provide the mapping tools with vocabulary data
  • Evaluate and select their results
  • Put the alignment in the repository
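The evaluation/selection and storage steps can be pictured as a filter over the mapper output followed by serialization to triples. In this sketch each candidate carries a tool confidence score and an optional human verdict. The `Candidate` class, the 0.5 threshold, the example URIs and the use of `skos:exactMatch` (from the later W3C SKOS Reference) are all assumptions, not necessarily what the project used.

```python
from dataclasses import dataclass
from typing import Optional

SKOS = "http://www.w3.org/2004/02/skos/core#"


@dataclass
class Candidate:
    source: str                      # concept URI in the first vocabulary
    target: str                      # concept URI in the second vocabulary
    confidence: float                # score reported by the mapping tool
    verdict: Optional[bool] = None   # human judgement, if the pair was reviewed


def select(candidates: list[Candidate], threshold: float = 0.5) -> list[Candidate]:
    """Keep reviewed pairs judged correct, and unreviewed pairs scoring >= threshold."""
    kept = []
    for c in candidates:
        if c.verdict is not None:
            if c.verdict:
                kept.append(c)
        elif c.confidence >= threshold:
            kept.append(c)
    return kept


def to_ntriples(selected: list[Candidate]) -> list[str]:
    """Serialize the selected alignment for loading into the repository."""
    return [f"<{c.source}> <{SKOS}exactMatch> <{c.target}> ." for c in selected]


# Hypothetical mapper output, partly reviewed
cands = [
    Candidate("http://example.org/aria/a", "http://example.org/ic/25G1", 0.9),
    Candidate("http://example.org/aria/b", "http://example.org/ic/29D", 0.8, verdict=False),
    Candidate("http://example.org/aria/c", "http://example.org/ic/23L", 0.2, verdict=True),
    Candidate("http://example.org/aria/d", "http://example.org/ic/23H", 0.3),
]
for line in to_ntriples(select(cands)):
    print(line)
```

A human verdict always overrides the score, so a low-scoring pair can be kept and a high-scoring one rejected.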
SLIDE 23

[Figure: DB X (MDS 1) and DB Y (MDS 2) keep their original metadata schemes and are brought together in a common knowledge base.]

SLIDE 24

Collection Integration: Ontology Mapping Tools

Tests with 2 mapping tools

  • S-Match, Trento
    • Tree-like structures mapper
  • Falcon-AO, Nanjing
    • Standard OWL ontology mapper
  • Using
    • Lexical comparisons
    • Structural comparisons
    • Third resource (WordNet as ‘oracle’)
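To give a feel for the lexical part only, here is a naive label matcher based on token overlap (Jaccard similarity). It is not how S-Match or Falcon-AO work internally; the tokenization, the stop-word list, the 0.3 threshold and the identifiers are assumptions. The labels come from the ARIA/IconClass mapping examples in this talk.

```python
import re

STOP_WORDS = {"and", "or", "of", "the", "in", "as"}  # assumed, not from either tool


def tokens(label: str) -> set[str]:
    """Lower-case word tokens of a label, without stop words."""
    return {w for w in re.findall(r"[a-z]+", label.lower()) if w not in STOP_WORDS}


def jaccard(a: str, b: str) -> float:
    """Shared tokens divided by all distinct tokens of the two labels."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def lexical_candidates(source: dict[str, str], target: dict[str, str],
                       threshold: float = 0.3) -> list[tuple[str, str, float]]:
    """(source id, target id, score) for label pairs that overlap enough."""
    pairs = []
    for sid, slabel in source.items():
        for tid, tlabel in target.items():
            score = jaccard(slabel, tlabel)
            if score >= threshold:
                pairs.append((sid, tid, round(score, 2)))
    return pairs


aria = {"flowers-plants": "Flowers, plants"}
ic = {"25G1": "plants (in general)", "25G7": "language of flowers",
      "25H151": "deciduous forest"}
print(lexical_candidates(aria, ic))
```

Exact token matching misses morphological variants ("flower" and "flowers" score 0 here), which is what lemmatization in a real matcher addresses.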
SLIDE 25

Collection Integration: Mappings

ARIA label                 IC label                                      IC code
Flowers, plants            plants behaving as human beings or animals    29B
Flowers, plants            plants (in general)                           25G1
Flowers, plants            language of flowers                           25G7
Flowers, plants            fabulous trees                                25GG3
Flowers, plants            fabulous lower plants                         25GG5
Flowers, plants            deciduous forest                              25H151
Flowers, plants            forest of coniferous trees                    25H152
Flowers, plants            bush, shrubs ~ forest                         25H153
Marine and other animals   animals acting as human beings                29A
Marine and other animals   plants behaving as human beings or animals    29B

42G family, relationship, descent Brothel scenes

SLIDE 26

Partial evaluation

  • Conceptual level
    • Evaluating links, not results of document searches
  • S-Match: 46% precision (subset of IC: 1500 concepts)
  • Falcon-AO: 16% precision (subset of IC)
  • Not much sense?
    • Difficult to carry out a complete evaluation
    • Qualitative analysis reveals that improvement is possible
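Precision here is the share of proposed links that evaluators judged correct. Computing recall would additionally need the full set of correct links between the two vocabularies, which is what makes a complete evaluation costly. A minimal sketch of the precision computation over a sample of judged mappings follows; the verdicts are hypothetical and do not reproduce the figures above.

```python
def precision(judgements: list[bool]) -> float:
    """Share of proposed mappings that evaluators judged correct."""
    if not judgements:
        raise ValueError("no evaluated mappings")
    return sum(judgements) / len(judgements)


# Hypothetical verdicts for five proposed links (True = judged correct)
sample = [True, False, False, True, False]
print(f"{precision(sample):.0%}")  # 40%
```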

SLIDE 27

Nice results (S-Match)

  • Lexical matching: 23L
  • Lemmatization: 25A271
  • Background knowledge: 23U1
SLIDE 28

Errors

  • Not enough NLP – 23H
  • Wrong WordNet disambiguation – 29D
SLIDE 29

Steps

SLIDE 30

Steps

  • Adapted faceted browsing paradigm (Flamenco)
    • Search by navigating through several dimensions
  • Adaptation of the paradigm: from facets corresponding to orthogonal dimensions of object description (‘material’, ‘location’) to facets corresponding to different conceptual schemes (ARIA, IconClass)
  • 3 views (sets of facet definitions) on integrated collections
    • Single view
    • Combined view
    • Merged view
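The basic faceted-browsing mechanism can be sketched as follows: each facet is a concept hierarchy, selecting a concept keeps the objects indexed with that concept or any of its descendants, and selections in several facets are intersected. This is a generic illustration, not Flamenco's implementation; the function names and the sample facets, concepts and objects are hypothetical.

```python
def descendants(narrower: dict[str, list[str]], root: str) -> set[str]:
    """The concept itself plus every concept below it in the facet hierarchy."""
    seen: set[str] = set()
    stack = [root]
    while stack:
        concept = stack.pop()
        if concept not in seen:
            seen.add(concept)
            stack.extend(narrower.get(concept, []))
    return seen


def facet_search(index: dict[str, set[str]], narrower: dict[str, list[str]],
                 selections: list[str]) -> set[str]:
    """Objects that match every selected concept (typically one per facet).

    An object matches a selection if it is indexed with the selected concept
    or with any of its descendants. No selection returns an empty set here.
    """
    result = None
    for selected in selections:
        hits: set[str] = set()
        for concept in descendants(narrower, selected):
            hits |= index.get(concept, set())
        result = hits if result is None else result & hits
    return result if result is not None else set()


# Hypothetical facets 'material' and 'location' and a tiny object index
narrower = {"material": ["wood", "metal"], "location": ["amsterdam", "leiden"]}
index = {"wood": {"o1", "o2"}, "metal": {"o3"}, "amsterdam": {"o1", "o3"}, "leiden": {"o2"}}
print(facet_search(index, narrower, ["wood", "amsterdam"]))   # {'o1'}
print(facet_search(index, narrower, ["material", "leiden"]))  # {'o2'}
```

The STITCH adaptation replaces such orthogonal facets with facets drawn from the different concept schemes.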
SLIDE 31

Collections Access: Single View

  • Facets based on 1 concept scheme
  • Access to objects indexed against concepts from other schemes, if there is a mapping between their index and the selected concepts

A single point of view on the integrated data set
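Single-view access can be sketched as query expansion through the alignment: a concept selected in the chosen scheme also retrieves objects indexed with concepts from the other scheme that are mapped to it. The function, the identifiers and the object index are hypothetical; the ARIA-to-IconClass pairs are taken from the mapping examples in this talk.

```python
def single_view(concept: str, index: dict[str, set[str]],
                alignment: dict[str, set[str]]) -> set[str]:
    """Objects indexed with `concept`, plus objects indexed with concepts mapped to it."""
    hits = set(index.get(concept, set()))
    for mapped in alignment.get(concept, set()):
        hits |= index.get(mapped, set())
    return hits


# Hypothetical index: ARIA concepts describe Rijksmuseum objects,
# IconClass codes describe KB manuscripts.
index = {
    "aria:flowers-plants": {"rm-1"},
    "ic:25G1": {"kb-7"},
    "ic:25G7": {"kb-9"},
}
# Alignment links from ARIA concepts to IconClass concepts (one direction only)
alignment = {"aria:flowers-plants": {"ic:25G1", "ic:25G7"}}

print(sorted(single_view("aria:flowers-plants", index, alignment)))  # ['kb-7', 'kb-9', 'rm-1']
```

Because the alignment is stored in one direction only, selecting `ic:25G7` returns only the KB object; offering an IconClass-based single view would require inverting the mapping.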

SLIDE 32

Collections Access: Combined View

  • Search based on 2 concept schemes

Facets attached to the different vocabularies are presented

Simultaneous access from different points of view on the same data
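One way to picture the combined view: one facet per vocabulary, each showing, for the current result set, how many objects every concept would keep, whether those objects are indexed with that vocabulary directly or reached through the alignment. The functions, identifiers and data are hypothetical, and the alignment is stored in both directions by assumption; the ARIA-to-IconClass pairs come from the mapping examples in this talk.

```python
def reachable(concept: str, index: dict[str, set[str]],
              alignment: dict[str, set[str]]) -> set[str]:
    """Objects indexed with the concept directly or through an alignment link."""
    hits = set(index.get(concept, set()))
    for mapped in alignment.get(concept, set()):
        hits |= index.get(mapped, set())
    return hits


def combined_view(facets: dict[str, list[str]], index: dict[str, set[str]],
                  alignment: dict[str, set[str]],
                  current: set[str]) -> dict[str, dict[str, int]]:
    """Per-facet counts: how many objects of the current result each concept would keep."""
    return {
        scheme: {c: len(reachable(c, index, alignment) & current) for c in concepts}
        for scheme, concepts in facets.items()
    }


# Hypothetical data: one facet per vocabulary, alignment stored in both directions
facets = {"ARIA": ["aria:flowers-plants", "aria:marine-animals"],
          "IconClass": ["ic:25G1", "ic:29A"]}
index = {"aria:flowers-plants": {"rm-1"}, "aria:marine-animals": {"rm-2"},
         "ic:25G1": {"kb-7"}, "ic:29A": {"kb-8"}}
alignment = {"aria:flowers-plants": {"ic:25G1"}, "ic:25G1": {"aria:flowers-plants"},
             "aria:marine-animals": {"ic:29A"}, "ic:29A": {"aria:marine-animals"}}

print(combined_view(facets, index, alignment, current={"rm-1", "kb-7", "kb-8"}))
```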

SLIDE 33

Collections Access: Merged View

  • Facets using a merged concept scheme, with hierarchical links coming from the schemes and from the alignment

Making the links between vocabularies more visible during search

A way to ‘enrich’ weakly structured vocabularies
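The merged view can be sketched as a single hierarchy built from the union of (broader, narrower) edges from each scheme and from alignment links read as hierarchical. In this illustration we assume the ARIA fragment has no internal hierarchy, the IconClass edges are inferred from the codes' prefixes, and the alignment link from "Flowers, plants" to `ic:25G` is hypothetical. The point is that a flat concept gains a sub-hierarchy.

```python
def merged_hierarchy(*edge_lists: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Union of (broader, narrower) edges from each scheme and from the alignment."""
    narrower: dict[str, set[str]] = {}
    for edges in edge_lists:
        for broader, child in edges:
            narrower.setdefault(broader, set()).add(child)
    return narrower


def below(narrower: dict[str, set[str]], concept: str) -> set[str]:
    """All concepts reachable downwards from `concept` in the merged hierarchy."""
    found: set[str] = set()
    stack = [concept]
    while stack:
        for child in narrower.get(stack.pop(), set()):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


aria_edges: list[tuple[str, str]] = []  # assumed weakly structured: no hierarchy
ic_edges = [("ic:25G", "ic:25G1"), ("ic:25G", "ic:25G7")]  # assumed from code prefixes
alignment_edges = [("aria:flowers-plants", "ic:25G")]      # hypothetical link

merged = merged_hierarchy(aria_edges, ic_edges, alignment_edges)
print(sorted(below(merged, "aria:flowers-plants")))  # ['ic:25G', 'ic:25G1', 'ic:25G7']
```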

SLIDE 34

Outline

  • Motivations and project
  • Experiment
  • Collection formalization
  • Collection integration
  • Integrated collection access
  • Conclusion
SLIDE 35

Steps

SLIDE 36

Lessons learned: Collection Formalization

Representing different vocabulary types using formal standards is feasible, but not trivial

  • Influence of the use of vocabularies on interpretation
  • Expressivity level is variable (weakly structured models vs. complex ones)
  • Implies some loss of data

Part of the formalization is application- and system-specific

  • E.g. depending on standard RDF Schema reasoning services for SKOS axioms
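A concrete case of relying on RDF Schema reasoning: SKOS declares `skos:prefLabel` a sub-property of `rdfs:label`, so a store applying the RDFS sub-property rule (rdfs7) answers `rdfs:label` queries with SKOS labels, while axioms beyond RDFS are not covered. The sketch below applies only rdfs7 until nothing new is derived; it is not Sesame's inferencer, and the concept URI is hypothetical.

```python
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
SKOS = "http://www.w3.org/2004/02/skos/core#"

Triple = tuple[str, str, str]


def rdfs7_closure(triples: set[Triple]) -> set[Triple]:
    """Apply rdfs7 (p subPropertyOf q, s p o  =>  s q o) until a fixpoint is reached.

    Literals and URIs are both plain strings here; only rdfs7 is implemented.
    """
    closed = set(triples)
    while True:
        subprops = [(p, q) for p, rel, q in closed if rel == RDFS + "subPropertyOf"]
        derived = {(s, q, o) for p, q in subprops
                   for s, pred, o in closed if pred == p}
        new = derived - closed
        if not new:
            return closed
        closed |= new


concept = "http://example.org/aria/flowers-plants"  # hypothetical URI
data = {
    (SKOS + "prefLabel", RDFS + "subPropertyOf", RDFS + "label"),  # SKOS schema axiom
    (concept, SKOS + "prefLabel", "Flowers, plants"),
}
labels = [o for s, p, o in rdfs7_closure(data) if s == concept and p == RDFS + "label"]
print(labels)  # ['Flowers, plants']
```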

SLIDE 37

Steps

SLIDE 38

Lessons learned: Collection Integration

We have ontology mappers, not thesaurus mappers

  • Input: pre-processing to pure RDFS/OWL ontologies
  • Mapping process
    • Using resources that may be absent from CH vocabularies
      • Rich formal/structural information
    • Not (properly) using all information found in CH vocabularies
      • E.g. rich lexical information
  • Output: needs re-interpretation of mapping relations
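Re-interpreting mapper output for thesauri might look like the translation table below, which turns set-theoretic relations of the kind S-Match reports (equivalence, less general, more general) into SKOS mapping properties from the W3C SKOS Reference. The input symbols, the example URIs and the choice of target properties are assumptions, not the tools' actual output format.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Assumed input symbols: "=" equivalent, "<" source less general, ">" source more general
RELATION_TO_SKOS = {
    "=": SKOS + "exactMatch",
    "<": SKOS + "broadMatch",   # the target concept is broader than the source
    ">": SKOS + "narrowMatch",  # the target concept is narrower than the source
}


def reinterpret(mappings: list[tuple[str, str, str]]
                ) -> tuple[list[str], list[tuple[str, str, str]]]:
    """Turn (source, relation, target) output into SKOS N-Triples.

    Relations without a SKOS counterpart in the table (e.g. disjointness)
    are returned separately for manual review.
    """
    triples, unresolved = [], []
    for source, relation, target in mappings:
        prop = RELATION_TO_SKOS.get(relation)
        if prop is None:
            unresolved.append((source, relation, target))
        else:
            triples.append(f"<{source}> <{prop}> <{target}> .")
    return triples, unresolved


# Hypothetical mapper output
output = [
    ("http://example.org/aria/a", "=", "http://example.org/ic/b"),
    ("http://example.org/aria/a", "<", "http://example.org/ic/c"),
    ("http://example.org/aria/a", "!", "http://example.org/ic/d"),  # e.g. disjoint
]
triples, unresolved = reinterpret(output)
print("\n".join(triples))
```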
SLIDE 39

Steps

SLIDE 40

Lessons learned: Collection Access

  • Prototype is a thin layer on top of SW/RDF technology
  • Easily adaptable for experimentation with different views (without programming)
    • View data is also in RDF

http://stitch.cs.vu.nl/demo

SLIDE 41

Conclusion: Food for Thought

  • Generally positive
    • Existing guidelines give a good starting point
    • Alignment results were weak and still demand human effort
    • Implementation is feasible
  • Questions
    • Is it technically scalable?
    • Can it be easily reproduced?
SLIDE 42

Conclusion: Food for Thought

  • Standard methods and tools are important
    • They help with representation and data flow
      • SKOS and linked methodologies
      • Alignment tool standard input and output (representation of alignment)
    • They help with implementation
      • RDF/S and related tools
  • Not everything can/should be standardized
    • Dependence on application-specific tools and tasks
    • Alignment requires dealing with (variable) CH peculiarities

Methodological guidance has to take that into account: approaches can be general if they allow for some tuning

SLIDE 43

Thanks

  • KB
  • Rijksmuseum
  • Tool developers
  • CATCH