Semantic integration of bibliographic records (Linked Open Data)



SLIDE 1

Semantic integration of bibliographic records (Linked Open Data)

Author: Malakhov D. A.

SLIDE 2

Introduction

  • There are many different sources of library data.
  • Each organization can use only its own data, which is not connected with other sources.
  • Integration through the LOD (Linked Open Data) space is a universal solution to this problem.
  • LOD was created to integrate as much information as possible in each subject area.
  • Publishing data in this space makes it possible to enrich the data and to provide access to it.

SLIDE 3

Problem statement

  • The goal is to integrate the bibliographic records of the NLR (National Library of Russia) with the records of the BNB (British National Bibliography).
  • The NLR dataset contains millions of records (the test set has 17 thousand). The BNB dataset consists of 3.5 million records and is published as LOD.
  • To reach this goal, two problems have to be solved:
      – publishing the NLR data according to the principles of LOD;
      – integrating the NLR data with the BNB data.

SLIDE 4

Publishing data according to the principles of LOD

Steps required to publish the data:

  • Describe the subject area (create an ontology).
  • Convert the NLR data (RUSMARC/bin) to RDF.
  • Configure a semantic RDF repository for the NLR data.
  • Provide access to the NLR data (via HTTP and SPARQL).

SLIDE 5

Ontology

  • There are three ways of representing bibliographic records in RDF:
      – MODS: the data model of the Library of Congress (USA).
      – Dublin Core: a set of terms for describing network resources.
      – FOAF: a set of terms for describing people.
  • The BNB publishes its data using Dublin Core and FOAF. The same standards were used to represent the NLR data.
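
To make the choice of Dublin Core and FOAF concrete, here is a minimal sketch of how one record and its author might be written as N-Triples. The base URI `http://example.org/nlr/`, the identifiers `r1` and `a1`, the sample title and name, and the choice of properties (`dcterms:title`, `dcterms:creator`, `foaf:name`) are assumptions made for illustration; the presentation does not show the actual NLR ontology.

```python
# Published vocabulary namespaces.
DCTERMS = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical base URI for NLR resources (not taken from the slides).
NLR = "http://example.org/nlr/"


def uri(value):
    """Wrap an absolute URI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text):
    """Format a plain N-Triples literal, escaping backslash, quote, newline."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def describe_record(record_id, title, author_id, author_name):
    """Return N-Triples lines describing one record and its author."""
    record = uri(f"{NLR}record/{record_id}")
    author = uri(f"{NLR}author/{author_id}")
    triples = [
        (record, uri(DCTERMS + "title"), literal(title)),
        (record, uri(DCTERMS + "creator"), author),
        (author, uri(RDF_TYPE), uri(FOAF + "Person")),
        (author, uri(FOAF + "name"), literal(author_name)),
    ]
    return [f"{s} {p} {o} ." for s, p, o in triples]


lines = describe_record("r1", "War and Peace", "a1", "Leo Tolstoy")
print("\n".join(lines))
```

Modelling the author as a separate resource, rather than as a string on the record, is what later allows links between datasets to point at people as well as documents.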

SLIDE 6

Ontology


SLIDE 7

Preparation of RDF

  • Preparing the XSLT transformation (RUSMARC/xml to RDF)
  • Converting RUSMARC/bin to RDF
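
The slide names XSLT as the conversion tool. As a rough illustration of the same field-to-property mapping idea, the sketch below uses Python's `xml.etree.ElementTree` instead of XSLT. The XML layout (`record`, `datafield`, `subfield`) is an assumed MARCXML-like shape, not the real RUSMARC/xml schema; the base URI, the sample values (including the placeholder ISBN), and the mapping table are likewise hypothetical. The tags follow UNIMARC conventions (010 $a ISBN, 200 $a title proper, 210 $d publication date).

```python
import xml.etree.ElementTree as ET

DCTERMS = "http://purl.org/dc/terms/"
NLR = "http://example.org/nlr/"  # hypothetical base URI

# Assumed MARCXML-like input; the real RUSMARC/xml layout may differ.
SAMPLE = """
<record id="r1">
  <datafield tag="010"><subfield code="a">0-00-000000-0</subfield></datafield>
  <datafield tag="200">
    <subfield code="a">War and Peace</subfield>
    <subfield code="f">L. Tolstoy</subfield>
  </datafield>
  <datafield tag="210"><subfield code="d">1869</subfield></datafield>
</record>
"""

# Illustrative mapping from (field tag, subfield code) to an RDF property.
FIELD_MAP = {
    ("010", "a"): DCTERMS + "identifier",
    ("200", "a"): DCTERMS + "title",
    ("210", "d"): DCTERMS + "issued",
}


def record_to_triples(xml_text):
    """Convert one record to (subject, predicate, literal value) tuples."""
    root = ET.fromstring(xml_text.strip())
    subject = NLR + "record/" + root.get("id")
    triples = []
    for field in root.iter("datafield"):
        for sub in field.iter("subfield"):
            predicate = FIELD_MAP.get((field.get("tag"), sub.get("code")))
            # Subfields without a mapping (here 200 $f) are skipped.
            if predicate and sub.text:
                triples.append((subject, predicate, sub.text.strip()))
    return triples


triples = record_to_triples(SAMPLE)
for triple in triples:
    print(triple)
```

Keeping the mapping in a table, as an XSLT stylesheet would keep it in templates, means new fields can be covered without touching the traversal code.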

SLIDE 8

Storage creation

  • There are several ways to store semantic data:
      – storage in a relational database;
      – the TDB format.
  • There are three APIs for semantic storage:
      – Jena;
      – Sesame;
      – Virtuoso.
  • We selected the TDB format and Jena.

SLIDE 9

Providing access to the NLR data

  • The Jetty server was chosen for processing HTTP requests.
  • The server returns information about a record, an author, or links; it retrieves the full information about the object from the semantic storage via SPARQL.
  • The Fuseki endpoint, configured with the Pellet OWL reasoner, was selected for processing SPARQL queries to the storage.
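
As an illustration of the HTTP-to-SPARQL step, the following sketch builds a SPARQL Protocol GET request that asks an endpoint for all properties of one resource. The endpoint URL `http://localhost:3030/nlr/query`, the dataset name `nlr`, and the record URI are assumptions; the slides do not give the actual addresses. `fetch_bindings` is defined but not called, since it needs a running endpoint.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical Fuseki query endpoint; the dataset name "nlr" is assumed.
ENDPOINT = "http://localhost:3030/nlr/query"


def record_query(resource_uri):
    """SPARQL query returning every property and value of one resource."""
    return f"SELECT ?p ?o WHERE {{ <{resource_uri}> ?p ?o }}"


def build_request(endpoint, query):
    """Build a SPARQL Protocol GET request asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    headers = {"Accept": "application/sparql-results+json"}
    return urllib.request.Request(url, headers=headers)


def fetch_bindings(request):
    """Send the request and return the result rows (needs a live endpoint)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]["bindings"]


request = build_request(ENDPOINT, record_query("http://example.org/nlr/record/r1"))
print(request.full_url)
```

A front-end server in the role the slide gives to Jetty could call something like `fetch_bindings(request)` and render the returned rows as a record page.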

SLIDE 10

Creating links

  • A clustering algorithm was developed to create the links. Documents are linked through clusters.
  • The clustering algorithm:
      1) Clusters are created from a set of data (in several passes over this set).
      2) The remaining elements are distributed among the clusters (in one pass over these elements).
  • First, the clusters were created from the NLR data.
  • Then the BNB data were distributed among the clusters.
  • The links between documents and clusters were represented in RDF.
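
The two-step scheme can be sketched as follows. The slides do not specify the similarity measure or how clusters are represented, so this example assumes Jaccard similarity over title words, a threshold of 0.5, and cluster centers made of the words shared by at least half of a cluster's members. All function names and sample titles are hypothetical.

```python
from collections import Counter


def tokens(title):
    """Represent a record by the set of lower-cased words in its title."""
    return frozenset(title.lower().split())


def jaccard(a, b):
    """Jaccard similarity of two token sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def nearest(centers, item, threshold):
    """Index of the most similar center, or None if none reaches threshold."""
    best_index, best_sim = None, 0.0
    for index, center in enumerate(centers):
        sim = jaccard(center, item)
        if sim > best_sim:
            best_index, best_sim = index, sim
    return best_index if best_sim >= threshold else None


def build_clusters(seed, threshold=0.5, passes=3):
    """Step 1: create cluster centers in several passes over the seed set."""
    centers = []
    for _ in range(passes):
        members = [[] for _ in centers]
        for item in seed:
            index = nearest(centers, item, threshold)
            if index is None:  # nothing close enough: start a new cluster
                centers.append(item)
                members.append([])
                index = len(centers) - 1
            members[index].append(item)
        # Recompute each center as the tokens found in at least half of
        # its members; clusters that lost all members are dropped.
        centers = []
        for group in members:
            if group:
                counts = Counter(t for member in group for t in member)
                centers.append(frozenset(
                    t for t, n in counts.items() if 2 * n >= len(group)))
    return centers


def assign(centers, items, threshold=0.5):
    """Step 2: place each remaining item in one pass; None means no cluster."""
    return [nearest(centers, item, threshold) for item in items]


nlr_titles = ["War and Peace", "War and Peace novel", "Crime and Punishment"]
bnb_titles = ["War and Peace", "Punishment and Crime", "Moby Dick"]

centers = build_clusters([tokens(t) for t in nlr_titles])
links = assign(centers, [tokens(t) for t in bnb_titles])
print(links)  # cluster index per BNB title, None where nothing matched
```

Here the NLR titles play the role of the seed set and the BNB titles are the remaining elements. Two documents that land in the same cluster are candidates for a link, which could then be written out as RDF.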

SLIDE 11

The scheme of the system


SLIDE 12

Conclusion

Further work can be carried out in the following areas:

  • full-text search in titles and descriptions;
  • a distributed semantic repository;
  • search by the UDC and BDC classifiers;
  • search by ISSN and ISBN.

SLIDE 13

Thank you for your attention!
