What’s New in Semantic Enrichment: 4 Million Content Items, 120 Disciplines, and 1 Metadata Repository

SLIDE 1

What’s New in Semantic Enrichment

4 Million Content Items, 120 Disciplines, and 1 Metadata Repository

Jess Lawson

Head of Content Architecture, GAB-IT

SLIDE 2

It’s all in the Title…

  • Why semantic enrichment: 4 million content items (and counting)…
  • What are the challenges: 4 million content items and 120 subject disciplines…
  • How are we facing them: 1 metadata repository

SLIDE 3

Semantic Technologies Seminar 13th November 2013

The case for semantic enrichment in GAB

Intelligent and sustainable content

Describing what your content is about enables…

  • More accurate data integration (e.g. mashups, integrating internal silos)
  • Reuse and repurposing (e.g. microsites or other custom websites)
  • Link generation based on an understanding of what each content unit (chapter, article, dictionary definition) is actually about
  • Semantic search (e.g. Google Hummingbird & Knowledge Graph)
    – focuses on the meaning behind the query and content

SLIDE 4

The challenges we face…

From this:

SLIDE 5

The challenges we face…

to this:

SLIDE 6

The challenges we face…

with limited amounts of this:

SLIDE 7

The challenges we face…

or this:

SLIDE 8

How GAB are facing the challenge

From structured content to intelligent content

  • Unstructured content
    – Mark-up: text
    – Processes: manually controlled
    – User: human
    – Low value, specific content
  • Structured content
    – Mark-up: logical structure
    – Processes: partially automated
    – User: human, computer (organisation, presentation)
    – Medium value, reusable content
  • Intelligent content
    – Mark-up: semantic meaning
    – Processes: highly automated
    – User: human, computer (organisation, presentation, interpretation)
    – High value, multifunctional content

SLIDE 9

How GAB are facing the challenge

Documents versus data

  • Currently GAB publishes documents created from XML
    – HTML
    – eBook
    – print
  • We structure our content as documents, as separate files, with a sequential order of information, in display order
  • We are moving towards data
    – Data that can be understood by anyone
    – Data that can be used in software applications, but not necessarily directly published as text
    – Discoverability of our data
  • The RDF data model captures meaning and relationships independently of what is displayed
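
To make the last point concrete, here is a minimal sketch of data held as subject-predicate-object statements, with two different displays derived from the same statements. It uses plain Python tuples rather than a real RDF library, and every identifier and value (`ex:chapter-1`, `ex:discusses`, and so on) is invented for illustration; none of it comes from GAB's actual data.

```python
# Toy triple store: each fact is a (subject, predicate, object) tuple.
# Every identifier and value below is invented for illustration.
TRIPLES = [
    ("ex:chapter-1", "ex:title", "An Example Chapter"),
    ("ex:chapter-1", "ex:discusses", "ex:person-1"),
    ("ex:person-1", "ex:name", "Example Person"),
    ("ex:person-1", "ex:occupation", "composer"),
]


def objects(subject, predicate, triples=TRIPLES):
    """Return every object stated for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def as_heading(chapter):
    """One possible display: a plain chapter heading."""
    return objects(chapter, "ex:title")[0]


def as_subject_line(chapter):
    """Another display built from the same data: who the chapter is about."""
    people = objects(chapter, "ex:discusses")
    names = [objects(person, "ex:name")[0] for person in people]
    return "About: " + ", ".join(names)


print(as_heading("ex:chapter-1"))
print(as_subject_line("ex:chapter-1"))
```

Neither display is stored in the data itself; both are computed from the statements, which is the sense in which meaning is kept separate from presentation.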

SLIDE 10

How GAB are facing the challenge

Adding meaning to our data

  • Implicit structures (headings, text order, cross-references)
  • Book indexes
  • Keywords and subject taxonomy categorisation
  • Biographical metadata (life dates, occupations, family groups)
  • Oxford Index Authorities (bespoke multi-domain ontology)
  • Dictionary entries and their metadata

Increasing intelligence

Move towards explicit meaning that can be easily understood.

Using what we’ve already got!

SLIDE 11

How GAB are facing the challenge

Metadata Repository

  • Aim: To have an overview of all GAB’s content
    – Uses metadata, since content is in multiple silos
    – Metadata: data about data for each chapter/article
    – One common XML schema => OxMetaML
    – Architecture uses Solr-indexed XML file store (cf. PIM/title by title) plus triple store
  • Using metadata as documents (XML)
    – Published on the Oxford Index for discoverability
  • Using metadata as data (RDF)
    – Understanding of its meaning allows link generation
    – E.g. this OSO chapter discusses the person who has this ODNB biography
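
One way to picture the move from "metadata as documents" to "metadata as data" is a small converter that reads an XML record and emits triples. The sketch below is only an assumption about the general shape of such a step: the element and attribute names (`record`, `discusses`, `ref`, and the rest) are hypothetical and are not taken from the OxMetaML schema, which the slides do not show.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata record. The element and attribute names are invented
# for this sketch and are NOT taken from the real OxMetaML schema.
RECORD = """
<record id="ex:oso-chapter-7">
  <title>An Example Chapter</title>
  <discusses ref="ex:person-1"/>
  <keyword>music history</keyword>
</record>
"""


def record_to_triples(xml_text):
    """Turn one XML metadata record into (subject, predicate, object) triples."""
    root = ET.fromstring(xml_text.strip())
    subject = root.get("id")
    triples = []
    for child in root:
        # A ref attribute points at another resource; otherwise use the text.
        value = child.get("ref") or (child.text or "").strip()
        triples.append((subject, child.tag, value))
    return triples


for triple in record_to_triples(RECORD):
    print(triple)
```

Once records from different products are expressed this way, a statement such as "this chapter discusses this person" can be matched against statements made elsewhere about the same person.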

SLIDE 12

[Architecture diagram of the Metadata Repository. Labels recoverable from the slide:]

  • Systems shown: Isis (MarkLogic CMS), DNB CMS, PubMan CMS, PubFactory repository, High Wire, Star (UK), Safari PubFactory platform, Oxford Index, product websites, library services, aggregators
  • Metadata Repository components: pre-ingestion layer, XML file store, Solr index, triple store, link generation and semantic enrichment, REST API
  • Data labels: Onix data; full text; metadata; content + product metadata; metadata for all OUP content; metadata for products included on Oxford Index; metadata for products requested by Library Service

SLIDE 13

How do we add meaning to our content?

Content enrichment - “Semantic tagging”

  • Uses text mining:
    – Split into words/phrases
    – Tag different parts of speech
    – Coreference (identify terms that refer to the same object)
    – Named entity recognition (find people, organisations, place names etc.)
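
The steps above can be sketched in a few lines, with heavy simplification. The toy pipeline below covers tokenisation, named entity recognition by dictionary lookup, and a very crude form of coreference (a bare surname is linked to a person already found). Part-of-speech tagging is left out because it needs a trained model. The gazetteer, the `ex:` identifiers and the sample sentence are all assumptions made for illustration; the slides do not say which text-mining tools GAB uses.

```python
import re

# Hypothetical gazetteer: surface form -> (entity type, canonical identifier).
GAZETTEER = {
    "Henry Purcell": ("PERSON", "ex:henry-purcell"),
    "Westminster Abbey": ("PLACE", "ex:westminster-abbey"),
    "Chapel Royal": ("ORGANISATION", "ex:chapel-royal"),
}

TEXT = (
    "Henry Purcell was organist at Westminster Abbey. "
    "Purcell also sang in the Chapel Royal."
)


def tokenize(text):
    """Split text into word and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def find_entities(text):
    """Named entity recognition by exact gazetteer lookup."""
    found = []
    for surface, (etype, ident) in GAZETTEER.items():
        for match in re.finditer(re.escape(surface), text):
            found.append((match.start(), surface, etype, ident))
    return sorted(found)


def resolve_surnames(text, entities):
    """Crude coreference: link a bare surname to a person already found."""
    links = []
    for _, surface, etype, ident in entities:
        if etype != "PERSON":
            continue
        surname = surface.split()[-1]
        for match in re.finditer(r"\b" + re.escape(surname) + r"\b", text):
            # Skip occurrences that are part of the full name itself.
            start = match.start() - (len(surface) - len(surname))
            if start >= 0 and text[start:match.end()] == surface:
                continue
            links.append((match.start(), surname, ident))
    return sorted(links)


entities = find_entities(TEXT)
print(tokenize(TEXT)[:4])
print(entities)
print(resolve_surnames(TEXT, entities))
```

A production system would replace the lookup table with statistical or rule-based models, but the output has the same shape: spans of text tied to identifiers that other records can refer to.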

SLIDE 14

Metadata Repository: Cross-product linking

[Diagram: entries in the Dictionary of National Biography, Oxford Music Online and Oxford Reference Online, connected by the relations “is primary topic of” and “is same entity as”]
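
As a rough sketch of how “is same entity as” links could be used, the function below groups identifiers from different products into clusters that refer to one entity. It assumes the relation is symmetric and transitive, which the slide does not state, and the `odnb:`, `omo:` and `oro:` identifiers are hypothetical.

```python
def cluster_same_entities(same_as_pairs):
    """Group identifiers linked, directly or indirectly, by 'is same entity as'."""
    parent = {}

    def find(x):
        # Follow parent pointers to the cluster representative.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps chains short
            x = parent[x]
        return x

    for a, b in same_as_pairs:
        parent[find(a)] = find(b)

    clusters = {}
    for item in list(parent):
        clusters.setdefault(find(item), set()).add(item)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical identifiers for entries in three products.
pairs = [
    ("odnb:purcell", "omo:purcell"),
    ("omo:purcell", "oro:purcell"),
    ("odnb:byrd", "omo:byrd"),
]
print(cluster_same_entities(pairs))
```

With clusters in hand, any entry in one product can link to its counterparts in the others without each pair having been asserted individually.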

SLIDE 15

Metadata Repository: Cross-product linking

[Diagram relations: “is author of”, “has same author as”, “is author of”]

Link generation rule: If A is the author of B and A is the author of C, then B has same author as C.
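
The rule can be sketched as a small function over “is author of” statements. This is an illustration in plain Python, not the repository's actual rule engine, and the author and work identifiers are made up. Each link is emitted once per pair; since the relation is symmetric, the reverse direction is implied.

```python
from itertools import combinations


def same_author_links(author_of):
    """Apply the rule: A wrote B and A wrote C => B has same author as C."""
    works_by_author = {}
    for author, work in author_of:
        works_by_author.setdefault(author, set()).add(work)

    links = set()
    for works in works_by_author.values():
        # Every pair of distinct works by one author yields a link.
        for b, c in combinations(sorted(works), 2):
            links.add((b, "has same author as", c))
    return sorted(links)


# Hypothetical "is author of" statements as (author, work) pairs.
statements = [
    ("ex:author-1", "ex:book-a"),
    ("ex:author-1", "ex:article-b"),
    ("ex:author-2", "ex:book-c"),
]
print(same_author_links(statements))
```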

SLIDE 16

And finally…

SEO using RDFa (RDF in attributes)

  • Embedding RDF metadata in HTML web pages
  • Improves click-through rate (30% reported by Best Buy) as search results are more eye-catching
  • BBC reported 20% increase in search rankings
  • Adding RDFa to the Safari platform and Oxford Index
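
For readers unfamiliar with RDFa, the sketch below generates a small HTML fragment in which the metadata travels in attributes (`vocab`, `typeof`, `property`) alongside the visible text. The choice of the schema.org vocabulary is an assumption made for this example; the slides do not say which vocabulary is used on the Safari platform or the Oxford Index, and the title and author are placeholders.

```python
from html import escape


def book_rdfa(title, author):
    """Render a book title and author as HTML with RDFa Lite attributes."""
    return (
        '<div vocab="http://schema.org/" typeof="Book">\n'
        f'  <h1 property="name">{escape(title)}</h1>\n'
        '  <p>by <span property="author" typeof="Person">'
        f'<span property="name">{escape(author)}</span></span></p>\n'
        '</div>'
    )


print(book_rdfa("An Example Title", "A. N. Author"))
```

A browser renders this as an ordinary heading and byline, while a crawler that understands RDFa can read the same markup as statements about a book and its author.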
