Semantic Web Techniques for Multiple Views on Heterogeneous Collections: A Case Study
Marjolein van Gendt, Antoine Isaac, Lourens van der Meij, Stefan Schlobach
ECDL 2006
Outline
- Motivations and project
- Experiment
- Collection formalization
- Collection integration
- Integrated collection access
- Conclusion
Motivation
- Current cultural heritage (CH) trend: portals that build on heterogeneous collections
- Different databases
- Documents described/accessed according to different points of view (controlled vocabularies/MD schemes)
[Figure: Document Collection X and Document Collection Y, each with its own description base (Description Base X, Description Base Y), metadata scheme (MDS 1, MDS 2; hierarchies of fields) and thesaurus (Thesaurus x, Thesaurus y)]
CH Interoperability Problems
- Current CH trend: portals that build on heterogeneous collections
Different databases/vocabularies/MD schemes
- Syntactic interoperability problem is being solved
Access can be granted, cf. deployed portals
- Semantic interoperability still to be addressed
Links with original vocabularies/MD structures are lost
[Figure: DB X (MDS 1) and DB Y (MDS 2) behind a Unified (Virtual) Description Base with a single Unified MD Scheme]
No semantic information for description vocabulary
STITCH General Goals
[SemanTic Interoperability To access Cultural Heritage]
Allow heterogeneous CH collections to be accessed
- In a seamless way
- Still benefiting from specific collection commitments
Keeping original metadata schemes and vocabularies
[Figure: knowledge base with DB X and DB Y and their metadata schemes MDS 1 and MDS 2]
STITCH General Goals (2)
Allow heterogeneous CH collections to be accessed
- In a seamless way
- Still benefiting from specific collection commitments
Keeping original metadata schemes and vocabularies
Using Semantic Web means for
- Representation of the different points of view in one system
- Creation and use of the alignment knowledge
2 methodological concerns
- Generalize as much as possible
- Automate as much as possible
Experiment
On a reduced scale
- 2 collections and associated vocabularies
Desired output: insights on
- Use of off-the-shelf Semantic Web (SW) techniques with CH-specific resources
- Impact of turning to standard proposals (SW-linked tools and methods)
- In a context of natural semantics (thesauri)
- Added value of this effort
- Quantitative and qualitative evaluation
- Simple prototype for accessing documents
1st Collection: KB Illustrated Manuscripts
2nd Collection: Rijksmuseum ARIA collection
Experiment Steps
Steps
- Gathering vocabulary and collection data
- Analyzing it
- Transforming it using SW standards
All record/vocabulary information in one repository
Collection Formalization Choices
- Representation of vocabularies
- Standard RDFS/OWL encoding scheme: SKOS
- Representation of records
- Ad hoc ontologies for collection MD schemes
- Linking to SKOS concepts
- RDF Schema repository: Sesame
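The formalization choices above can be illustrated with a minimal sketch in plain Python, using tuples as RDF triples instead of a real repository such as Sesame. The SKOS and RDF namespaces and the names `skos:Concept`, `skos:ConceptScheme`, `skos:prefLabel` and `skos:inScheme` are the standard ones; the `ARIA` and `REC` namespaces, the identifiers `c1` and `object42`, and the `subject` property are assumptions that only stand in for an ad hoc collection ontology.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ARIA = "http://example.org/aria/"      # hypothetical vocabulary namespace
REC = "http://example.org/records/"    # hypothetical ad hoc record ontology


def uri(namespace, local_name):
    """Format a URI reference in N-Triples syntax."""
    return f"<{namespace}{local_name}>"


def literal(text, lang="en"):
    """Format a language-tagged literal in N-Triples syntax."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


triples = [
    # Vocabulary side: one concept scheme and one SKOS concept.
    (uri(ARIA, "scheme"), uri(RDF, "type"), uri(SKOS, "ConceptScheme")),
    (uri(ARIA, "c1"), uri(RDF, "type"), uri(SKOS, "Concept")),
    (uri(ARIA, "c1"), uri(SKOS, "prefLabel"), literal("Flowers, plants")),
    (uri(ARIA, "c1"), uri(SKOS, "inScheme"), uri(ARIA, "scheme")),
    # Record side: a hypothetical record linked to the SKOS concept.
    (uri(REC, "object42"), uri(REC, "subject"), uri(ARIA, "c1")),
]


def to_ntriples(statements):
    """Serialize the triples, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in statements)


def records_about(concept, statements):
    """Return the records indexed against a given concept."""
    subject_property = uri(REC, "subject")
    return [s for s, p, o in statements if p == subject_property and o == concept]


print(to_ntriples(triples))
print(records_about(uri(ARIA, "c1"), triples))
```

Keeping records and vocabulary in the same triple list mirrors the "one repository" idea: a record query only has to follow the link to the concept.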
Vocabulary Formalisation: ARIA in SKOS
Steps
- Provide mappers with vocabulary data
- Proceed to evaluation/selection of their results
- Put the alignment in the repository
[Figure: knowledge base with DB X and DB Y and their metadata schemes MDS 1 and MDS 2]
Collection Integration: Ontology Mapping Tools
Tests with 2 mapping tools
- S-Match, Trento
- Tree-like structures mapper
- Falcon-AO, Nanjing
- Standard OWL ontology mapper
- Using
- Lexical comparisons
- Structural comparisons
- Third resource (WordNet as ‘oracle’)
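To make the idea of a lexical comparison concrete, here is a minimal sketch in Python. It is not the algorithm of S-Match or Falcon-AO: it simply proposes a link when two labels share a word after lowercasing, stop-word removal and a crude plural stripping that stands in for lemmatization. The labels and IconClass (IC) codes are taken from the mapping examples in this deck; the function names and the stop-word list are assumptions.

```python
import re

# Assumed stop-word list, just large enough for the example labels.
STOPWORDS = {"and", "or", "of", "in", "as", "other"}


def tokens(label):
    """Lowercase a label, drop stop words, strip a trailing plural 's'."""
    words = re.findall(r"[a-z]+", label.lower())
    return {
        word[:-1] if len(word) > 3 and word.endswith("s") else word
        for word in words
        if word not in STOPWORDS
    }


def lexical_matches(source_labels, target_labels):
    """Propose (source label, target code) pairs that share a token."""
    pairs = []
    for source in source_labels:
        for code, target in target_labels.items():
            if tokens(source) & tokens(target):
                pairs.append((source, code))
    return pairs


ARIA_LABELS = ["Flowers, plants", "Marine and other animals"]
IC_LABELS = {
    "25G1": "plants (in general)",
    "25G7": "language of flowers",
    "29A": "animals acting as human beings",
}

for pair in lexical_matches(ARIA_LABELS, IC_LABELS):
    print(pair)
```

A matcher this naive also explains why candidate links need evaluation: sharing the word "flower" is enough to link "Flowers, plants" to "language of flowers".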
Collection Integration: Mappings
ARIA label | IC label | IC code
Marine and other animals | plants behaving as human beings or animals | 29B
Marine and other animals | animals acting as human beings | 29A
Flowers, plants | bush, shrubs ~ forest | 25H153
Flowers, plants | forest of coniferous trees | 25H152
Flowers, plants | deciduous forest | 25H151
Flowers, plants | fabulous lower plants | 25GG5
Flowers, plants | fabulous trees | 25GG3
Flowers, plants | language of flowers | 25G7
Flowers, plants | plants (in general) | 25G1
Flowers, plants | plants behaving as human beings or animals | 29B
42G (family, relationship, descent): Brothel scenes
Partial evaluation
- Conceptual level
- Evaluating links, not results of document searches
- S-Match: 46% precision (subset of IC: 1500 concepts)
- Falcon-AO: 16% precision (subset of IC)
Not much sense?
- Difficult to carry out a complete evaluation
- Qualitative analysis reveals that improvement is possible
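Precision at the conceptual level is the share of proposed links that an evaluator judges correct. A minimal sketch of that computation follows; the link identifiers and the judgements are made-up toy data, not the evaluation data behind the figures above.

```python
def precision(judgements):
    """Share of evaluated links judged correct (True) by a human evaluator."""
    if not judgements:
        raise ValueError("no evaluated links")
    return sum(judgements.values()) / len(judgements)


# Hypothetical judgements: link identifier -> judged correct?
toy_judgements = {
    "link-1": True,
    "link-2": False,
    "link-3": True,
    "link-4": False,
}

print(f"precision = {precision(toy_judgements):.0%}")  # 2 of 4 links correct
```

Because only proposed links are judged, this measure says nothing about links the tools missed, which is one reason the evaluation is called partial.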
Nice results (S-Match)
- Lexical matching: 23L
- Lemmatization: 25A271
- Background knowledge: 23U1
Errors
- Not enough NLP – 23H
- Wrong WordNet disambiguation – 29D
Steps
- Adapted faceted browsing paradigm (Flamenco)
- Search by navigating through several dimensions
- Adaptation of the paradigm:
From facets corresponding to orthogonal dimensions of object description (‘material’, ‘location’) to facets corresponding to different conceptual schemes (ARIA, IconClass)
- 3 views (sets of facet definitions) on integrated collections
- Single view
- Combined view
- Merged view
Collections Access: Single View
- Facets based on 1 concept scheme
- Access to objects indexed against concepts from other schemes, if there is a mapping between their index and the selected concepts
A single point of view on the integrated data set
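The single view can be sketched as a lookup that follows the alignment: selecting a concept in one scheme returns the objects indexed with that concept plus the objects indexed with concepts mapped to it. This is a minimal sketch, assuming the alignment can be read as undirected pairs and ignoring hierarchy; the object identifiers, the `aria:`/`ic:` prefixes and the function name are hypothetical.

```python
def objects_for(concept, index, alignment):
    """Objects indexed with `concept` or with a concept aligned to it.

    index: object id -> set of concepts used to index that object
    alignment: set of (concept_a, concept_b) pairs, treated as undirected
    """
    targets = {concept}
    for left, right in alignment:
        if left == concept:
            targets.add(right)
        elif right == concept:
            targets.add(left)
    return sorted(obj for obj, concepts in index.items() if concepts & targets)


# Toy data: one ARIA object and two manuscripts indexed with IconClass codes.
index = {
    "aria:obj1": {"aria:Flowers, plants"},
    "kb:ms1": {"ic:25G1"},
    "kb:ms2": {"ic:29A"},
}
alignment = {("aria:Flowers, plants", "ic:25G1")}

print(objects_for("aria:Flowers, plants", index, alignment))
```

Here the ARIA facet reaches the manuscript `kb:ms1` only because of the mapping, while `kb:ms2` stays out of the result.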
Collections Access: Combined View
- Search based on 2 concept schemes
Facets attached to the different vocabularies are presented
Simultaneous access from different points of view on the same data
Collections Access: Merged View
- Facets using a merged concept scheme with hierarchical links coming from schemes and alignment
Making the links between vocabularies more visible during search
A way to ‘enrich’ weakly structured vocabularies
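A merged concept scheme can be sketched as one hierarchy built from two sources of links: the broader/narrower links inside each scheme, and the alignment links read as hierarchical links between schemes. The sketch below assumes each alignment pair places the second concept under the first; the narrower IconClass code `25G11` is hypothetical and only serves to show the effect.

```python
from collections import defaultdict


def merge_schemes(broader_links, alignment):
    """Build parent -> children from scheme hierarchy and alignment links."""
    narrower = defaultdict(set)
    for child, parent in broader_links:
        narrower[parent].add(child)
    for upper, lower in alignment:
        # Assumption: the aligned concept is attached below its counterpart.
        narrower[upper].add(lower)
    return narrower


def descendants(concept, narrower):
    """All concepts reachable below `concept` in the merged hierarchy."""
    seen = set()
    stack = [concept]
    while stack:
        for child in narrower.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


ic_broader = [("ic:25G11", "ic:25G1")]             # hypothetical child code
alignment = [("aria:Flowers, plants", "ic:25G1")]  # pair from the mappings

merged = merge_schemes(ic_broader, alignment)
print(sorted(descendants("aria:Flowers, plants", merged)))
```

The flat ARIA concept gains the IconClass subtree below it, which is the sense in which a weakly structured vocabulary gets enriched.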
Lessons learned: Collection Formalization
Representing different vocabulary types using formal standards is feasible, but not trivial
- Influence of the use of vocabularies on interpretation
- Expressivity level is variable (weakly structured models vs. complex ones)
- Implies some loss of data
Part of the formalization is application- and system-specific
- E.g. depending on standard RDF Schema reasoning services for SKOS axioms
Lessons learned: Collection Integration
We have ontology mappers, not thesaurus mappers
- Input: pre-processing to pure RDFS/OWL ontologies
- Mapping process
- Using resources that may be absent from CH vocabularies
- Rich formal/structural information
- Not (properly) using all information found in CH vocabularies
- E.g. rich lexical information
- Output: needs re-interpretation of mapping relations
Lessons learned: Collection Access
- Prototype is a thin layer on top of SW/RDF technology
- Easily adaptable for experimentation with different views (without programming)
View data is also in RDF
http://stitch.cs.vu.nl/demo
Conclusion: Food for Thought
- Generally positive
- Existing guidelines give a good starting point
- Alignment results were weak, still demand human effort
- Implementation is feasible
- Questions
- Is it technically scalable?
- Can it be easily reproduced?
Conclusion: Food for Thought (2)
- Standard methods and tools are important
- They help for representation and dataflow
- SKOS and linked methodologies
- Alignment tool standard input and output (representation of alignment)
- They help for implementation
- RDF/S and related tools
- Not everything can/should be standardized
- Dependence on application tools and tasks
- Alignment requires dealing with (variable) CH peculiarities
Methodological guidance has to take that into account: approaches can be general if they allow for some tuning
Thanks
- KB
- Rijksmuseum
- Tool developers
- CATCH