Semantic integration of bibliographic records (Linked Open Data)
Author: Malakhov D. A.
Introduction
There are many different sources of library data. Each organization can use only its own information, which is not
connected with other sources.
Integration through the Linked Open Data (LOD) space is a universal
solution to this problem.
LOD was created to integrate as much information as possible
within each subject area.
Publishing data in this space makes it possible to enrich the
information and to provide access to it.
The purpose is to integrate the bibliographic records of the NLR (National Library of Russia)
with the records of the BNB (British National Bibliography).
The NLR dataset contains millions of records (test set: 17 thousand). The BNB data
set consists of 3.5 million units and is already published as LOD.
To achieve this purpose, two tasks must be solved:
– Publishing the NLR data according to the principles of LOD;
– Integrating the NLR data with the BNB data.
Actions required to publish the data:
– Describing the subject area (creating an ontology).
– Converting the NLR data (RUSMARC/bin) to RDF.
– Configuring the semantic RDF data repository for the NLR data.
– Providing access to the NLR data (via HTTP and SPARQL).
Three vocabularies can be used to represent bibliographic records in RDF:
– MODS – the data model of the Library of Congress (USA).
– Dublin Core – a set of terms for describing web resources.
– FOAF – a set of terms for describing a person.
BNB publishes its data using Dublin Core and FOAF. These
standards were therefore used for data representation in this work.
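As an illustration of how Dublin Core and FOAF terms can be combined, here is a minimal Python sketch that serializes one record as N-Triples. The base URI `http://example.org/nlr/`, the identifiers, the sample title and name, and the choice of properties (`dcterms:title`, `dcterms:creator`, `foaf:name`) are assumptions made for the example; the actual NLR ontology is not shown on the slides.

```python
# Illustrative only: the base URI and the property choice are assumptions.
DCTERMS = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/nlr/"  # hypothetical namespace


def literal(text):
    """Return an N-Triples string literal with basic escaping."""
    escaped = (
        text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    )
    return f'"{escaped}"'


def record_to_ntriples(record_id, title, author_id, author_name):
    """Describe a record with Dublin Core and its author with FOAF."""
    book = f"<{BASE}record/{record_id}>"
    person = f"<{BASE}person/{author_id}>"
    return [
        f"{book} <{DCTERMS}title> {literal(title)} .",
        f"{book} <{DCTERMS}creator> {person} .",
        f"{person} <{RDF_TYPE}> <{FOAF}Person> .",
        f"{person} <{FOAF}name> {literal(author_name)} .",
    ]


# Made-up sample values, not taken from the NLR or BNB data.
triples = record_to_ntriples("r1", "War and Peace", "p1", "Leo Tolstoy")
print("\n".join(triples))
```

Describing the author as a separate resource, rather than as a plain string, is what lets records from different libraries later be linked to the same person.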
– Preparing the XSLT transformation (RUSMARC/xml to RDF).
– Converting RUSMARC/bin to RDF.
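The slides use XSLT for this step. The Python sketch below is not that transformation; it only illustrates the kind of field extraction such a transformation performs, using the standard `xml.etree` module instead of XSLT. The XML layout (`record`, `datafield`, `subfield` elements, an `id` attribute) is an assumed MARCXML-like shape, and the mapping of field 200 $a to the title and field 700 $a/$b to the author follows the UNIMARC convention that RUSMARC is based on; the sample record is made up.

```python
import xml.etree.ElementTree as ET

# Made-up record in an assumed MARCXML-like layout.
SAMPLE = """<record id="r1">
  <datafield tag="200"><subfield code="a">Sample title</subfield></datafield>
  <datafield tag="700">
    <subfield code="a">Ivanov</subfield>
    <subfield code="b">I. I.</subfield>
  </datafield>
</record>"""


def subfield(record, tag, code):
    """Return the text of the first matching subfield, or None."""
    path = f"./datafield[@tag='{tag}']/subfield[@code='{code}']"
    node = record.find(path)
    return node.text.strip() if node is not None and node.text else None


def extract(record_xml):
    """Pull out the fields that would be mapped to RDF properties."""
    record = ET.fromstring(record_xml)
    name_parts = (subfield(record, "700", "a"), subfield(record, "700", "b"))
    author = ", ".join(p for p in name_parts if p) or None
    return {
        "id": record.get("id"),
        "title": subfield(record, "200", "a"),
        "author": author,
    }


fields = extract(SAMPLE)
print(fields)
```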
There are several ways to store semantic data, and three APIs for working with semantic storage.
We selected the TDB storage format and the Jena API.
The Jetty server was chosen for processing HTTP requests. The server returns information about a record, an author, or the
links; it obtains the full information about the object from the semantic storage via SPARQL.
The Fuseki access point, configured with the Pellet OWL reasoner
for inference, was selected for processing SPARQL queries to the storage.
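To make the HTTP-to-SPARQL step concrete, here is a small Python sketch that builds a SPARQL query for one record and encodes it as a GET request URL, as the SPARQL protocol allows. The endpoint `http://localhost:3030/nlr/query`, the record URI, and the queried properties are assumptions for illustration; the real dataset name and ontology are not given in the slides. No network request is made.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical Fuseki query endpoint; "nlr" is an assumed dataset name.
ENDPOINT = "http://localhost:3030/nlr/query"


def describe_record_query(record_uri):
    """Build a SELECT query for the title and creator of one record."""
    return (
        "PREFIX dcterms: <http://purl.org/dc/terms/>\n"
        "SELECT ?title ?creator WHERE {\n"
        f"  <{record_uri}> dcterms:title ?title .\n"
        f"  OPTIONAL {{ <{record_uri}> dcterms:creator ?creator . }}\n"
        "}"
    )


def build_request_url(endpoint, query):
    """Encode the query as the 'query' parameter of a GET request."""
    return endpoint + "?" + urlencode({"query": query})


query = describe_record_query("http://example.org/nlr/record/r1")
url = build_request_url(ENDPOINT, query)

# Round trip: decoding the URL parameter gives back the original query.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(url)
```

A front-end server would send such a request to the endpoint and render the returned bindings as the page for the record.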
A clustering algorithm was developed to create the links.
Documents were linked through clusters.
The clustering algorithm:
1) Clusters are created from a base data set (in several passes over this set).
2) The remaining elements are distributed among the clusters (in one pass over these elements).
First, clusters were created from the NLR data. Then the BNB data were distributed among these clusters. The links between documents and clusters were represented in RDF.
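The two phases can be sketched in Python as follows. The slides do not specify the similarity measure, the cluster representation, or what each pass does, so everything concrete here is an assumption: titles are compared by Jaccard similarity of their word sets, each cluster is represented by its first member, later passes only reassign items to the closest cluster, and the threshold of 0.5 and the sample titles are made up.

```python
def tokens(title):
    """Represent a title as a set of lowercase words."""
    return frozenset(title.lower().split())


def jaccard(a, b):
    """Jaccard similarity of two sets (two empty sets count as equal)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def best_cluster(item, centers):
    """Return (index, score) of the most similar cluster center."""
    if not centers:
        return None, 0.0
    scores = [jaccard(item, c) for c in centers]
    best = max(range(len(centers)), key=scores.__getitem__)
    return best, scores[best]


def build_clusters(items, threshold, passes=2):
    """Phase 1: create clusters in several passes over the base set."""
    centers, assignment = [], []
    for _ in range(passes):
        assignment = []
        for item in items:
            idx, score = best_cluster(item, centers)
            if idx is None or score < threshold:
                centers.append(item)  # the item starts a new cluster
                idx = len(centers) - 1
            assignment.append(idx)
    return centers, assignment


def assign_remaining(items, centers, threshold):
    """Phase 2: one pass; items below the threshold stay unlinked."""
    result = []
    for item in items:
        idx, score = best_cluster(item, centers)
        result.append(idx if idx is not None and score >= threshold else None)
    return result


# Made-up titles standing in for NLR (base set) and BNB (remaining) data.
nlr = ["war and peace", "war and peace volume 1",
       "anna karenina", "crime and punishment"]
bnb = ["war and peace", "the idiot"]

centers, nlr_assignment = build_clusters([tokens(t) for t in nlr], 0.5)
bnb_assignment = assign_remaining([tokens(t) for t in bnb], centers, 0.5)
print(nlr_assignment, bnb_assignment)
```

Each (document, cluster index) pair could then be written out as an RDF link, so that an NLR record and a BNB record in the same cluster become connected through it.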