Linking library data: contributions and role of subject data. Nuno Freire, PowerPoint presentation

SLIDE 1
SLIDE 2

Linking library data: contributions and role of subject data

Nuno Freire, The European Library

SLIDE 3

Outline

  • Introduction to The European Library
  • Motivation for Linked Library Data
  • The European Library Open Dataset
  • Linking Subject Data
    • Linking person names
    • Linking place names
    • Linking other entities/concepts
  • Concluding remarks
SLIDE 4

About The European Library

  • Project started in 1996; fully operational service since 2005
  • Membership: national and research libraries of 47 Council of Europe states
  • European hub for library bibliographic resources and full-text collections

SLIDE 5

Library data aggregator

  • Library domain aggregator for Europeana
    • Digital collections of cultural heritage resources
      • Aggregation of metadata
      • Aggregation of the full text of historical newspapers
  • General library domain aggregator
    • Aggregation of other bibliographic resources
    • Currently developing its capabilities for aggregating content (metadata + digital resource)

SLIDE 6

Library open data distribution and supporting its reuse

http://www.theeuropeanlibrary.org/tel4/access

  • Opening access to library data
  • Distributing data into strategic channels
    • European research infrastructures, portals, etc.
  • Facilitating re-use
    • Supporting interoperability: APIs, data formats, etc.
    • ...and linking data
SLIDE 7

The European Library Open Dataset

www.theeuropeanlibrary.org

SLIDE 8

Library LOD: Motivation

Linked Open Data (LOD) provides a set of procedures and technical standards that allow data to be reused across communities. LOD allows for:

  • Opening access to the data

… in order to allow others to obtain, process and re-use the data.

  • Linking the data to other datasets

… in order to allow others to find the data more easily, better understand its meaning, match it with other data...

SLIDE 9

Library LOD: Motivation

Linking data makes it more precise and informative. Data links allow computers to better understand the data, enabling more use cases.

SLIDE 10

Library LOD: Motivation

Linked data is not new to libraries, and its value is clearly recognized

  • Libraries have perceived the value of linked data for decades:
    • Authority files, union catalogues, ...
  • Library data already contributes LOD datasets that are being re-used across all communities

Nowadays, the LOD framework addresses the same benefits:

  • ...but beyond libraries, at a global level, across all communities.

SLIDE 11

The Data Model

SLIDE 12

The Data Model

  • RDA Element Vocabularies
    • The most extensively used vocabularies
    • Used extensively in the properties of the Bibliographic Resources
  • FRBRer model
    • Used for context
    • Not used for Item, Manifestation, Expression, Work
    • The LOD data is derived from non-FRBR MARC data
  • Europeana Data Model
    • Used for Web Resources
  • OWL 2 Web Ontology Language
    • Used for linking to external datasets
    • Used for linking duplicate Bibliographic Resources within libraries
  • Dublin Core Terms
    • Used where more general semantics could/should be applied
  • WGS84 Geo Positioning
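To make the vocabulary list above more concrete, the following is a minimal sketch (Python standard library only) that serializes one hypothetical record as N-Triples. It uses Dublin Core Terms, owl:sameAs, the EDM WebResource class and WGS84 coordinates. RDA element properties are left out because their exact URIs depend on the registry version in use. All example.org URIs and literal values are placeholders, not The European Library's actual identifiers.

```python
# Sketch: one hypothetical record described with vocabularies from the
# data model slide, serialized as N-Triples. All example.org URIs and
# literal values are placeholders.

DCTERMS = "http://purl.org/dc/terms/"
OWL = "http://www.w3.org/2002/07/owl#"
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"
EDM = "http://www.europeana.eu/schemas/edm/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def uri(value: str) -> str:
    """Wrap an absolute URI in N-Triples angle brackets."""
    return f"<{value}>"


def literal(value: str) -> str:
    """Quote a plain literal, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def triple(s: str, p: str, o: str) -> str:
    return f"{s} {p} {o} ."


def describe_record() -> list[str]:
    record = uri("http://example.org/record/1")
    duplicate = uri("http://example.org/record/2")  # same work, another library
    web_resource = uri("http://example.org/record/1/fulltext")
    place = uri("http://example.org/place/lisbon")
    return [
        # Dublin Core Terms where general semantics suffice.
        triple(record, uri(DCTERMS + "title"), literal("An example title")),
        triple(record, uri(DCTERMS + "spatial"), place),
        # owl:sameAs links duplicate bibliographic resources.
        triple(record, uri(OWL + "sameAs"), duplicate),
        # EDM class for the web resource of a digital object.
        triple(web_resource, uri(RDF + "type"), uri(EDM + "WebResource")),
        # WGS84 latitude/longitude of the place (approximate values).
        triple(place, uri(GEO + "lat"), literal("38.72")),
        triple(place, uri(GEO + "long"), literal("-9.14")),
    ]


if __name__ == "__main__":
    for line in describe_record():
        print(line)
```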
SLIDE 13

Example statistics from the Research Libraries UK collection

Resulting usage of classes (from MARC data)

SLIDE 14

Example statistics from the Research Libraries UK collection

Resulting usage of properties (from MARC data)

SLIDE 15

External LOD Datasets Linked To

  • Links to external datasets are available for the following:
    • VIAF Virtual Union Authority File
    • Geonames
    • Library of Congress Subject Headings
    • Library of Congress Children’s Subject Headings
    • Library of Congress Classification
    • data.bnf.fr
    • Gemeinsame Normdatei
    • Dewey Decimal Classification
    • Universal Decimal Classification
    • ISO639-2 Languages
    • MARC Countries
SLIDE 16

The European Library Open Dataset

Current Status

SLIDE 17

Linking the main entities present in Subject Data

  • Person names
  • Place names
  • Other entities/concepts


SLIDE 18

Linked Data at The European Library

Linking person names

SLIDE 19

The matching process

  • VIAF data is used for matching, disambiguation, and estimating the match probability

SLIDE 20

Matching Person names with VIAF

  • Names are matched by similarity
  • Confirmation of the correctness of a name match is taken from other matching data:
    • The dates of birth and death are compared
    • The title of the work is compared against the list of titles available in VIAF
    • All the contributors of the work are matched against the list of known co-authors in VIAF
    • The publisher(s) of the work are matched against the list of known publishers in VIAF
  • A match is only chosen if enough supporting evidence is found
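The confirmation step above can be sketched as an evidence count over candidate VIAF clusters. This is a hypothetical illustration, not The European Library's implementation: the `ViafCluster` and `Heading` structures, the use of `difflib` for name and title similarity, the equal weighting of evidence, and the thresholds `name_threshold` and `min_evidence` are all assumptions.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import Optional


@dataclass
class ViafCluster:
    """Hypothetical, simplified view of the data held for one VIAF cluster."""
    cluster_id: str
    names: list[str]
    birth: Optional[int] = None
    death: Optional[int] = None
    titles: set[str] = field(default_factory=set)
    coauthors: set[str] = field(default_factory=set)
    publishers: set[str] = field(default_factory=set)


@dataclass
class Heading:
    """A person name taken from a bibliographic record, with its context."""
    name: str
    birth: Optional[int]
    death: Optional[int]
    title: str
    contributors: set[str]
    publishers: set[str]


def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def folded(values: set[str]) -> set[str]:
    return {v.casefold() for v in values}


def evidence(h: Heading, c: ViafCluster) -> int:
    """Count supporting pieces of evidence (hypothetical equal weighting)."""
    score = 0
    if h.birth is not None and h.birth == c.birth:
        score += 1
    if h.death is not None and h.death == c.death:
        score += 1
    if any(similarity(h.title, t) >= 0.9 for t in c.titles):
        score += 1
    if folded(h.contributors) & folded(c.coauthors):
        score += 1
    if folded(h.publishers) & folded(c.publishers):
        score += 1
    return score


def match(h: Heading, clusters: list[ViafCluster],
          name_threshold: float = 0.85, min_evidence: int = 2) -> Optional[str]:
    # Step 1: candidate clusters whose names are similar enough.
    candidates = [c for c in clusters
                  if max((similarity(h.name, n) for n in c.names), default=0.0)
                  >= name_threshold]
    # Step 2: rank candidates by the amount of supporting evidence.
    scored = sorted(((evidence(h, c), c.cluster_id) for c in candidates),
                    reverse=True)
    if not scored or scored[0][0] < min_evidence:
        return None  # not enough supporting evidence
    if len(scored) > 1 and scored[1][0] == scored[0][0]:
        return None  # ambiguous: two clusters are equally supported
    return scored[0][1]
```

Rejecting ties is a design choice in this sketch: when two clusters are equally supported, it prefers leaving a name unlinked over guessing, which favours precision over recall.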

SLIDE 21

Linked Data at The European Library

Linking place names

SLIDE 22

The approach for place name linking

  • The alignment is performed with Geonames
    • Using the RDF dump of Geonames
  • It aims to find a single entity in Geonames to link to the place name
  • The first step of this task is to find all possible candidates for the resolution in Geonames
  • Uses a heuristic-based predictive model:
    • Assigns each resolution candidate a probability of being a match
    • A link is established if a minimum probability threshold for a match is achieved
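The two steps above (collect all candidates, then keep one only if it is probable enough) can be sketched as follows. This is a hypothetical outline: the `GeoRecord` structure, the exact lookup on case-folded main and alternate names, the 0.5 default threshold, and the `probability` callback (a stand-in for the predictive model) are assumptions, not details from the slides.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class GeoRecord:
    """Hypothetical, simplified Geonames entry extracted from the RDF dump."""
    geonames_id: int
    name: str
    alternate_names: list[str]


def build_index(records: list[GeoRecord]) -> dict[str, list[GeoRecord]]:
    """Map every case-folded main or alternate name to the records bearing it."""
    index: dict[str, list[GeoRecord]] = {}
    for rec in records:
        for n in {x.casefold() for x in [rec.name, *rec.alternate_names]}:
            index.setdefault(n, []).append(rec)
    return index


def resolve(place_name: str,
            index: dict[str, list[GeoRecord]],
            probability: Callable[[str, GeoRecord], float],
            threshold: float = 0.5) -> Optional[int]:
    """Link to a single Geonames entity, or return None.

    `probability` stands in for the predictive model: it scores how likely
    a candidate is to be the correct resolution of `place_name`.
    """
    # Step 1: all possible candidates (exact lookup on normalized names).
    candidates = index.get(place_name.casefold(), [])
    if not candidates:
        return None
    # Step 2: keep the most probable candidate if it clears the threshold.
    scored = [(probability(place_name, c), c.geonames_id) for c in candidates]
    best_p, best_id = max(scored)
    return best_id if best_p >= threshold else None
```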

SLIDE 23

Which information supports place name linking

  • Number of words: the number of words in the place name.
  • Name match: whether the recognized place name matched the main name of the place, an alternate name, etc.
  • Exact name match: whether the recognized place name matched the place name exactly.
  • Relative population: the population of the candidate relative to the other candidates.
  • Geographic feature type: the type of geographic feature (continent, country, city, etc.).
  • Related places found: the number of other place names found in the administrative hierarchy.
  • Relative related places: the relative number of administrative divisions found in the bibliographic record.
  • In source country: whether the candidate is located in one of the source countries of the bibliographic record.
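One way to turn the features listed above into a match probability is a logistic combination. The slides do not give the form of the heuristic model, so logistic scoring is swapped in here as a stand-in. The `Candidate` structure, all weights, and the reading of "relative related places" as the share of the candidate's administrative ancestors that also appear in the record are assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical, simplified Geonames candidate."""
    name: str
    alternate_names: list[str]
    population: int
    feature_type: str       # e.g. "country", "city"
    ancestors: list[str]    # names in its administrative hierarchy
    country_code: str


def features(place_name: str, cand: Candidate, all_cands: list[Candidate],
             record_places: list[str], source_countries: set[str]) -> dict:
    folded = place_name.casefold()
    total_pop = sum(c.population for c in all_cands) or 1
    related = sum(1 for p in record_places if p in cand.ancestors)
    return {
        "num_words": len(place_name.split()),
        "main_name_match": folded == cand.name.casefold(),
        "alternate_name_match": folded in {a.casefold() for a in cand.alternate_names},
        "exact_name_match": place_name in [cand.name, *cand.alternate_names],
        "relative_population": cand.population / total_pop,
        "feature_type": cand.feature_type,
        "related_places_found": related,
        "relative_related_places": related / len(cand.ancestors) if cand.ancestors else 0.0,
        "in_source_country": cand.country_code in source_countries,
    }


# Hypothetical weights; a real model would be tuned on evaluation data.
WEIGHTS = {
    "num_words": 0.2,
    "main_name_match": 1.0,
    "alternate_name_match": 0.5,
    "exact_name_match": 0.5,
    "relative_population": 2.0,
    "related_places_found": 1.0,
    "relative_related_places": 1.0,
    "in_source_country": 1.5,
}
TYPE_WEIGHTS = {"country": 1.0, "city": 0.5}
BIAS = -3.0


def match_probability(f: dict) -> float:
    """Combine the features with a logistic function into a value in (0, 1)."""
    z = BIAS + TYPE_WEIGHTS.get(f["feature_type"], 0.0)
    z += sum(w * float(f[name]) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))
```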

SLIDE 24

The approach for place name linking

  • This approach was recently evaluated by the EuropeanaTech task force on Evaluation and Enrichments.
  • The final report of this task force will be published very soon: http://pro.europeana.eu/taskforce/evaluation-and-enrichments

SLIDE 25

Linked Data at The European Library

Linking Subject Data

SLIDE 26

Linking Subject Data

  • The challenges:
    • Diversity of languages
    • Diversity of knowledge organization systems in use across European libraries
    • Heterogeneous levels of detail in subject information

SLIDE 27

Linking Subject Data

  • Current status at The European Library:
    • Use of alignments between ontologies:
      • Alignments were created manually or semi-automatically
      • Alignments in use include CERIF and MACS (LCSH, RAMEAU, SWD)
    • Linking is also performed for classifications:
      • For UDC and DDC
      • …but only shallow linking is done, for the most general classes
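Shallow linking of classifications, as described above, can be sketched by truncating a notation to its main class before linking. This is a hypothetical illustration: it assumes "most general" means the ten DDC main classes (hundreds) and the UDC main classes (first digit), and the example.org base URIs are placeholders rather than the real DDC or UDC LOD URIs.

```python
import re
from typing import Optional

# Hypothetical base URIs; real DDC/UDC LOD URIs should be taken from the
# publishers' documentation.
DDC_BASE = "http://example.org/ddc/class/"
UDC_BASE = "http://example.org/udc/class/"


def ddc_main_class(notation: str) -> Optional[str]:
    """'025.04' -> '000': keep only the DDC main class (the hundreds)."""
    m = re.match(r"\s*(\d)\d{2}", notation)
    return m.group(1) + "00" if m else None


def udc_main_class(notation: str) -> Optional[str]:
    """'94(4)' -> '9'; notations that do not start with a digit are skipped."""
    m = re.match(r"\s*(\d)", notation)
    return m.group(1) if m else None


def shallow_link(scheme: str, notation: str) -> Optional[str]:
    """Return a link target for the most general class, or None."""
    if scheme == "ddc":
        main = ddc_main_class(notation)
        return DDC_BASE + main if main else None
    if scheme == "udc":
        main = udc_main_class(notation)
        return UDC_BASE + main if main else None
    return None  # other schemes are not linked in this sketch
```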

SLIDE 28

Linking Subject Data

  • …but much more subject data is known to exist at The European Library
  • A data mining study conducted on the aggregated digital collections revealed several knowledge organization systems in use:
    • 5 systems that are available as LOD
    • 34 systems not known to us at this time
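The slide does not say how the study was carried out. Purely as an illustration, assuming each aggregated record exposes its subjects as (value, scheme code) pairs (for instance a MARC 21 subfield `$2` source code), a first survey of the schemes in use could tally those codes and split them against hypothetical lists of known schemes.

```python
from collections import Counter
from typing import Iterable, Optional


def survey_kos(records: Iterable[list[tuple[str, Optional[str]]]],
               known_lod: set[str], known_other: set[str]):
    """Tally subject scheme codes and split them into three groups.

    Each record is a list of (subject value, scheme code) pairs; a scheme
    code of None means the record did not state a scheme.
    Returns (available as LOD, known but not LOD, unknown), each a dict
    mapping scheme code to the number of subject occurrences.
    """
    usage = Counter(scheme for rec in records for _, scheme in rec if scheme)
    lod = {s: n for s, n in usage.items() if s in known_lod}
    other = {s: n for s, n in usage.items()
             if s in known_other and s not in known_lod}
    unknown = {s: n for s, n in usage.items()
               if s not in known_lod and s not in known_other}
    return lod, other, unknown
```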
SLIDE 29

Concluding remarks (1/3)

  • Some of the best LOD datasets for linking bibliographic subject data originate from the library domain
    • High-quality and mature datasets exist for subject heading systems
    • Classification systems are not as well developed as LOD
      • In the case of UDC, very good LOD data is available; it is compliant with standards and best practices, but lacks links to other LOD datasets
      • Linking to DBpedia would potentially promote wider usage of UDC

SLIDE 30

Concluding remarks (2/3)

  • Linking classifications to LOD datasets is not as straightforward, due to combined classifications
    • In Semantic Web terms, the combination of concepts represented in this kind of classification requires multiple RDF statements
    • Most LOD linking tools are not prepared for these cases and require adaptation
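The point about multiple RDF statements can be illustrated with a UDC notation such as `94(4)`, which combines general history (94) with the common auxiliary of place for Europe ((4)). The sketch below is a simplified, hypothetical decomposition: it handles only `+`, `:` and parenthesized auxiliaries (full UDC syntax has many more rules), emits one dcterms:subject statement per component, and uses a placeholder example.org base URI.

```python
import re
from urllib.parse import quote

# Hypothetical base URI; real UDC LOD URIs should be taken from the publisher.
UDC_BASE = "http://example.org/udc/"
SUBJECT = "http://purl.org/dc/terms/subject"

# Parenthesized common auxiliaries, such as "(4)" (place: Europe).
AUXILIARY = re.compile(r"\([^)]*\)")


def decompose(notation: str) -> list[str]:
    """Split a combined UDC notation into its component concepts (simplified).

    Handles only '+' (coordination), ':' (relation) and parenthesized
    auxiliaries.
    """
    parts: list[str] = []
    for segment in re.split(r"[+:]", notation):
        segment = segment.strip()
        auxiliaries = AUXILIARY.findall(segment)
        main = AUXILIARY.sub("", segment).strip()
        if main:
            parts.append(main)
        parts.extend(auxiliaries)
    return parts


def to_triples(record_uri: str, notation: str) -> list[str]:
    """One N-Triples dcterms:subject statement per component concept."""
    return [f"<{record_uri}> <{SUBJECT}> <{UDC_BASE}{quote(part, safe='')}> ."
            for part in decompose(notation)]
```

Emitting separate statements loses the fact that the concepts were combined in the original notation; a fuller model might also keep the original notation as a literal next to the decomposed links.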

SLIDE 31

Concluding remarks (3/3)

  • Use cases for supporting linked subject data are plentiful
  • …but a great diversity of knowledge organization systems is in use, which makes scale a challenge

SLIDE 32

Thank you

Nuno Freire nuno.freire@theeuropeanlibrary.org