SKOS COMP60421 Sean Bechhofer sean.bechhofer@manchester.ac.uk


SLIDE 1

SKOS

COMP60421 Sean Bechhofer sean.bechhofer@manchester.ac.uk

SLIDE 2

Ontologies

  • Metadata

– Resources marked-up with descriptions of their content. No good unless everyone speaks the same language;

  • Terminologies

– Provide shared and common vocabularies of a domain, so search engines, agents, authors and users can communicate. No good unless everyone means the same thing;

  • Ontologies

– Provide a shared and common understanding of a domain that can be communicated across people and applications, and will play a major role in supporting information exchange and discovery.

SLIDE 3

Ontology

  • A representation of the shared background knowledge for a community
  • Providing the intended meaning of a formal vocabulary used to describe a certain conceptualisation of objects in a domain of interest
  • A vocabulary of terms plus explicit characterisations of the assumptions made in interpreting those terms
  • Nearly always includes some notion of hierarchical classification (is-a)
  • Richer languages allow the definition of classes through description of their characteristics

SLIDE 4

[Figure: a spectrum of representation, from least to most formal: Catalogue, Terms/glossary, Thesauri, Informal is-a, Formal is-a, Frames, Value Restrictions, Expressive Logics]

A Spectrum of Representation

  • Formal representations are not always the most appropriate for applications

SLIDE 5

COHSE

  • Conceptually driven navigation around documents
  • Simple text processing + vocabulary + open hypermedia architecture

– Separating link and document
– Explicit navigation around a domain vocabulary

  • DLS agent adds links to documents based on the occurrence of concepts in those documents.

SLIDE 6

Demo

SLIDE 7

COHSE’s Architecture

[Figure: COHSE architecture. An HTML document goes in and a linked HTML document comes out. Components: DLS Agent, Knowledge Service, Resource Service. Sources: Ontology, SKOS, Search Engine, Annotation DB.]

SLIDE 8

Generic Links

  • Generic Links in Open Hypermedia are based on words.

[Figure: a Document passes through the Link Service, which consults a Linkbase, producing a Linked Document]

SLIDE 9

Generic Links + Thesaurus

  • A thesaurus can bridge gaps between terms.

[Figure: Document, Link Service, Thesaurus, Linkbase, Linked Document; the Link Service uses a Thesaurus alongside the Linkbase]

SLIDE 10

Generic Links + Ontology

  • An ontology can bridge gaps between concepts.

[Figure: Document, Link Service, Ontology, Linkbase, Linked Document; the Link Service uses an Ontology alongside the Linkbase]

SLIDE 11

Reflection

  • Our original approach involved the use of OWL ontologies to support the conceptual models.
  • Over time, we came to see this as a “mistake”: looser vocabularies were perhaps more appropriate.
  • The timely appearance of SKOS…
  • S. Bechhofer, Y. Yesilada, R. Stevens, S. Jupp, and B. Horan. Using Ontologies and Vocabularies for Dynamic Linking. IEEE Internet Computing 12(3), pp. 32-39, 2008. http://dx.doi.org/10.1109/MIC.2008.68

http://www.flickr.com/photos/buildscharacter/443708336/

SLIDE 12

SKOS

  • SKOS: Simple Knowledge Organisation System
  • Used to represent term lists, controlled vocabularies and thesauri
  • Lexical labelling
  • Simple broader/narrower hierarchies (with no formal semantics)
  • W3C Recommendation
SLIDE 13

Primary Use Cases/Scenarios

  • A. Single controlled vocabulary used to index and then retrieve objects

– Query/retrieval may make use of some structure in the vocabulary

  • B. Different controlled vocabularies used to index and retrieve objects

– Mappings required between the vocabularies

  • Also other possible uses (e.g. navigation)
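The following is a minimal Python sketch of use cases A and B. It holds a toy vocabulary in dictionaries. Documents are indexed with concepts, and a query can optionally be expanded to narrower concepts (use case A). A hypothetical mapping table translates a term from another vocabulary before retrieval (use case B). The concept names come from the Accident example later in this deck. The document ids and the foreign term "Mishaps" are made up for illustration.

```python
# Toy vocabulary for use case A: concept -> narrower concepts (illustrative only).
NARROWER = {
    "Accident": ["Accidents in the Home", "Radiation Accidents"],
    "Accidents in the Home": [],
    "Radiation Accidents": [],
    "Accident Prevention": [],
}

# Hypothetical index: document id -> concepts used to index it.
INDEX = {
    "doc1": {"Radiation Accidents"},
    "doc2": {"Accidents in the Home"},
    "doc3": {"Accident Prevention"},
}

# Use case B: hypothetical mapping from another vocabulary's terms to this one.
MAPPING = {"Mishaps": "Accident"}


def expand(concept):
    """Return the concept plus everything reachable via narrower links."""
    seen, stack = set(), [concept]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(NARROWER.get(current, []))
    return seen


def retrieve(concept, expand_query=True):
    """Documents indexed with the concept (and, optionally, its narrower concepts)."""
    wanted = expand(concept) if expand_query else {concept}
    return sorted(doc for doc, concepts in INDEX.items() if concepts & wanted)


def retrieve_mapped(foreign_term):
    """Use case B: translate via the mapping, then retrieve as in use case A."""
    return retrieve(MAPPING[foreign_term])


print(retrieve("Accident", expand_query=False))  # []
print(retrieve("Accident"))                      # ['doc1', 'doc2']
print(retrieve_mapped("Mishaps"))                # ['doc1', 'doc2']
```

doc3 is not returned because it is indexed with the related concept Accident Prevention. Expanding only along narrower links is a design choice of this sketch, not a SKOS rule.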
SLIDE 14

SKOS Goals

  • to provide a simple, machine-understandable representation framework for Knowledge Organisation Systems (KOS)…
  • that has the flexibility and extensibility to cope with the variation found in KOS idioms…
  • that is fully capable of supporting the publication and use of KOS within a decentralised, distributed information environment such as the World Wide (Semantic) Web.

SLIDE 15

SKOS

  • A model for expressing the basic structure of “concept schemes”
  • Thesauri, classification schemes, taxonomies and other controlled vocabularies

– Many of these already exist and are in use in cultural heritage, library sciences, medicine etc.
– A wide range of knowledge sources that can potentially provide value for Semantic Web applications

  • SKOS aims to provide an RDF vocabulary for the representation of such schemes.

– A migration path bringing such resources “into the Semantic Web”.

SLIDE 16

Concept Schemes

  • A concept scheme is a set of concepts, potentially including statements about relationships between those concepts

– Semantic Relationships
    • Broader/Narrower Terms
    • Related Terms
– Lexical Labels
    • Preferred, alternative and hidden labels
– Additional documentation
    • Notes, comments, descriptions
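A concept scheme can be pictured as a small data structure. The Python sketch below is illustrative only. The `Concept` class, the `ex:` identifiers and the sample labels are hypothetical and are not part of SKOS itself. The sketch uses preferred, alternative and hidden labels for matching, but only ever displays the preferred label.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """Hypothetical in-memory stand-in for a skos:Concept (not an official API)."""
    uri: str
    pref_label: dict  # language tag -> the single preferred label
    alt_labels: list = field(default_factory=list)
    hidden_labels: list = field(default_factory=list)  # matched, never displayed
    broader: list = field(default_factory=list)  # URIs of broader concepts
    related: list = field(default_factory=list)  # URIs of related concepts
    note: str = ""


SCHEME = {
    c.uri: c
    for c in [
        Concept("ex:Accident", {"en": "Accident"}, alt_labels=["Mishap"]),
        Concept("ex:RadiationAccident", {"en": "Radiation Accidents"},
                hidden_labels=["Radiation Acidents"],  # common misspelling
                broader=["ex:Accident"]),
        Concept("ex:AccidentPrevention", {"en": "Accident Prevention"},
                related=["ex:Accident"],
                note="Measures taken to avoid accidents."),
    ]
}


def lookup(text):
    """Find a concept by any of its labels, ignoring case."""
    key = text.casefold()
    for concept in SCHEME.values():
        labels = list(concept.pref_label.values()) + concept.alt_labels + concept.hidden_labels
        if key in {label.casefold() for label in labels}:
            return concept.uri
    return None


def display(uri, lang="en"):
    """Only the preferred label is shown to users."""
    return SCHEME[uri].pref_label[lang]


print(display(lookup("mishap")))              # Accident
print(display(lookup("radiation acidents")))  # Radiation Accidents
```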
SLIDE 17

Knowledge Organisation

[Figure: types of controlled vocabulary in order of increasing structure, each adding a feature: Controlled Vocabulary (collection of terms), Synonym Ring (equivalent terms), Authority File (preferred terms), Taxonomy (hierarchy), Thesaurus (related terms)]

  • Controlled vocabularies: designed for use in classifying or indexing documents and for searching them.
  • Thesaurus: controlled vocabulary in which concepts are represented by preferred terms, formally organised so that paradigmatic relationships between the concepts are made explicit, and the preferred terms are accompanied by lead-in entries for synonyms or quasi-synonyms.

SLIDE 18

SKOS Example

SLIDE 19

SKOS and OWL

  • SKOS and OWL are intended for different (but related) purposes
  • SKOS concept schemes are not formal ontologies in the way that, e.g., OWL ontologies are formal ontologies.
  • There is no formal semantics given for the conceptual hierarchies (broader/narrower) represented in SKOS.
  • Contrast with OWL subclass hierarchies, which have a formal interpretation (in terms of sets of instances).
  • A weaker ontological commitment.
SLIDE 20

SKOS and OWL

  • SKOS Concepts are not intended for instantiation in the same way that OWL Classes are instantiated

– Leo is an instance of Lion
– Born Free is a book about Lions

  • Concept Schemes allow us to capture general statements about things that aren’t necessarily strictly true of everything

– It’s useful to be able to navigate from Cell to Nucleus, even though it’s not the case that all Cells have a Nucleus
– Relationships between Polio and Polio virus, Polio vaccine, Polio disease…
– Relationships between Accident and Accident Prevention, Accidents in the Home, Radiation Accidents…

  • But we can’t necessarily draw the same kinds of inferences about SKOS hierarchies.

– The broader hierarchy is not transitive.

  • However, mechanisms are available that allow us to query the transitive closure of the hierarchy.
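In the SKOS data model, the transitive closure is exposed through skos:broaderTransitive, a transitive super-property of skos:broader. Data asserts skos:broader, and the closure is obtained by querying the transitive property. The Python sketch below computes that closure over asserted broader links. It borrows "Nucleus" and "Cell" from the example above and adds "Tissue" as a hypothetical parent.

```python
from collections import deque

# Asserted skos:broader links only (concept -> broader concepts).
# "Nucleus" and "Cell" follow the slide's example; "Tissue" is hypothetical.
BROADER = {
    "Nucleus": {"Cell"},
    "Cell": {"Tissue"},
    "Tissue": set(),
}


def broader_transitive(concept):
    """Concepts reachable by following skos:broader one or more times,
    i.e. roughly what a query over skos:broaderTransitive returns."""
    result, queue = set(), deque(BROADER.get(concept, ()))
    while queue:
        current = queue.popleft()
        if current not in result:  # guards against cycles
            result.add(current)
            queue.extend(BROADER.get(current, ()))
    return result


print(BROADER["Nucleus"])                     # only the asserted link
print(sorted(broader_transitive("Nucleus")))  # ['Cell', 'Tissue']
```

Because skos:broader itself stays non-transitive, asserting "Nucleus broader Cell" does not commit anyone to "Nucleus broader Tissue". The closure is there only when explicitly asked for.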

SLIDE 21

SKOS and OWL

  • SKOS itself is defined as an OWL ontology.
  • A particular SKOS vocabulary is an instantiation of that ontology/schema

– E.g. SKOS Concept is a Class; particular concepts are instances of that class

  • Allows us to use some of the mechanisms of OWL to define properties of SKOS (e.g. the querying of the transitive closure of broader).
  • Allows us to use generic tooling to construct/maintain our vocabularies

SLIDE 22

Annotation in OWL

  • OWL data and object properties allow us to define the characteristics of classes

– Necessary/sufficient conditions etc.
– Model theory/semantics provides interpretations of the assertions involving the properties

  • Ontology engineering (and use) also requires annotation

– Decoration of concepts/properties/individuals with information which is useful, but does not impact on the formal semantics or logical interpretations

  • Separation of the concept from its concrete label is usually seen as a Good Thing.

SLIDE 23

Annotation

General

  • Labels

– Human readable

  • Textual Definitions

– Scope notes

  • DC style metadata

– authorship

  • Change History
  • Provenance information

Application Specific

  • Entry points for forms
  • Driving User interaction
  • Syntax round-tripping
  • Hiding engineering aspects of the model
  • Methodological support, e.g. OntoClean

SLIDE 24

SKOS as Annotation

  • SKOS labelling and documentation properties are defined as OWL Annotation Properties

– Preferred/Alternate/Hidden Labels
– Documentation/Notes

  • SKOS then provides a standardised vocabulary for annotating OWL ontologies
  • Leverage existing tooling.

– OWL API
– Protégé

SLIDE 25

SKOS and OWL

  • SKOS and OWL are intended for different purposes.
  • OWL allows the explicit modelling/description of a domain
  • SKOS provides vocabulary and navigational structure
  • Interaction between representations is ongoing work.

– Presenting OWL ontologies as SKOS vocabularies

  • Principled “dumbing down”

– Enriching SKOS vocabularies as OWL ontologies.

  • How to handle “related”

– Use of SKOS as annotation vocabulary

SLIDE 26

Mapping Concept Schemes

  • SKOS also provides a collection of mapping properties that express relationships between concepts in different schemes

– broadMatch/narrowMatch
– closeMatch
– exactMatch

  • Support alignment of different concept schemes

– Indiscriminate use of properties such as owl:sameAs can lead to undesirable consequences.
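The Python sketch below shows one concrete undesirable consequence. It makes some simplifying assumptions: triples are plain tuples, labels are (text, language) pairs, and the two schemes "a:" and "b:" are hypothetical. Treating owl:sameAs as rewriting one URI into the other gives the merged concept two English preferred labels. That breaks the SKOS integrity condition that a resource has at most one skos:prefLabel per language tag. Linking the concepts with skos:exactMatch instead keeps each concept's labels separate.

```python
# Hypothetical triples: (subject, predicate, object); labels are (text, language) pairs.
scheme_a = {("a:Polio", "prefLabel", ("Polio", "en"))}
scheme_b = {("b:Poliomyelitis", "prefLabel", ("Poliomyelitis", "en"))}


def merge_with_same_as(triples, keep, alias):
    """Naive owl:sameAs handling: rewrite every occurrence of alias as keep."""
    return {tuple(keep if term == alias else term for term in t) for t in triples}


def pref_label_clashes(triples):
    """Resources with more than one prefLabel in the same language."""
    labels = {}
    for s, p, o in triples:
        if p == "prefLabel":
            text, lang = o
            labels.setdefault((s, lang), set()).add(text)
    return {key: sorted(texts) for key, texts in labels.items() if len(texts) > 1}


union = scheme_a | scheme_b
same_as = merge_with_same_as(union, "a:Polio", "b:Poliomyelitis")
exact = union | {("a:Polio", "exactMatch", "b:Poliomyelitis")}

print(pref_label_clashes(same_as))  # {('a:Polio', 'en'): ['Polio', 'Poliomyelitis']}
print(pref_label_clashes(exact))    # {}
```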

SLIDE 27

SKOS and Linked Data

  • Linked Data: standardised “guidelines” for publishing data

– URIs for identification
– Provide useful information when dereferenced
– Link to other URIs

  • SKOS as lightweight semantics for LD
  • Facilitating publication of existing KOS/data.
  • Mapping relationships

[Figure: SKOS and LD, linked by Indexing/Retrieval, Discovery, Semantic Relations, Navigation, Mapping, and Linking and Integration beyond URI matching]

SLIDE 28

Tooling: SKOSEd

  • Editor supporting construction of SKOS vocabularies
  • “Native” SKOS implementation

– Protégé 4 plugin exploiting OWL definition of SKOS vocabulary
– Reasoning support for classification

  • Lexical labelling

– Alternate language support

  • Extension points for domain relationships

SLIDE 29

Examples

  • Astronomy thesauri:

– http://www.ivoa.net/Documents/latest/Vocabularies.html

  • AGROVOC (FAO)
  • IPSV
  • UMBEL

– http://www.umbel.org

  • E-Culture

– http://e-culture.multimedian.nl/
– Europeana
– ONKI (Finland)

  • LCSH

– http://id.loc.gov

SLIDE 30

Resources

  • SKOS:

– http://www.w3.org/TR/skos-reference/

  • SKOSEd:

– http://code.google.com/p/skoseditor/

  • COHSE:

– http://cohse.cs.manchester.ac.uk

SLIDE 31

Linked Data and RDF

Sean Bechhofer sean.bechhofer@manchester.ac.uk

SLIDE 32

Building a Semantic Web

  • Annotation

– Associating metadata with resources

  • Integration

– Integrating information sources

  • Inference

– Reasoning over the information we have.
– Could be light-weight (taxonomy)
– Could be heavy-weight (logic-style)

  • Interoperation and Sharing are key goals
SLIDE 33

Linked Data*

  • Linked Data or the Data Web is about using the Web to connect related data that wasn’t previously linked.
  • The intention is that we move from a web of documents to a web of data

– The Web as database

  • The Linked Data approach builds heavily on RDF.

*Linked data slides based on material from Ian Davis and Tom Heath: http://www.slideshare.net/iandavis/30-minute-guide-to-rdf-and-linked-data

SLIDE 34

Database tables

isbn        title              author        publisherId  pages
0743267478  Q&A                Vikas Swarup  1435         336
014029466X  The Rotters’ Club  Jonathan Coe  1546         416
…           …                  …             …            …

SLIDE 35

Rows represent “things”

[Table: the books table from Slide 34]

SLIDE 36

Columns represent “properties”

[Table: the books table from Slide 34]

SLIDE 37

Intersections represent properties of things

[Table: the books table from Slide 34]

SLIDE 38

Graphical Representation

[Figure: book --title--> The Rotters’ Club; more generally: subject --property--> value]

SLIDE 39

[Table: the books table from Slide 34]

Selecting multiple properties

SLIDE 40

Multiple properties graphically

[Figure: book --isbn--> 014029466X; book --author--> Jonathan Coe; book --title--> The Rotters’ Club]

SLIDE 41

Relations between “things”

[Figure: book --isbn--> 014029466X; book --author--> Jonathan Coe; book --title--> The Rotters’ Club; book --publisher--> publisher; publisher --name--> Penguin Books]
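The move from table to graph can be sketched in a few lines of Python. The sketch makes three assumptions. Each row becomes a subject named after its isbn. Each column becomes a property. The publisherId foreign key becomes a link to a separate publisher node, as in the diagram, rather than a plain value. The `book/` and `publisher/` identifiers are hypothetical local names, not real URIs.

```python
# The two complete rows of the books table.
ROWS = [
    {"isbn": "0743267478", "title": "Q&A", "author": "Vikas Swarup",
     "publisherId": "1435", "pages": "336"},
    {"isbn": "014029466X", "title": "The Rotters’ Club", "author": "Jonathan Coe",
     "publisherId": "1546", "pages": "416"},
]


def row_to_triples(row, key="isbn"):
    """Turn one row into (subject, property, value) triples."""
    subject = f"book/{row[key]}"  # hypothetical identifier for the row's "thing"
    triples = []
    for column, value in row.items():
        if column == "publisherId":
            # A foreign key points at another thing, so it becomes a link.
            triples.append((subject, "publisher", f"publisher/{value}"))
        else:
            triples.append((subject, column, value))
    return triples


for triple in row_to_triples(ROWS[1]):
    print(triple)
```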

SLIDE 42

Identification

  • We need to be able to identify things globally and uniquely.
  • URIs provide this capability
  • Key to Linked Data is the use of URIs, specifically http:// URIs.
SLIDE 43

URIs in graphs

[Figure: http://example.com/person/176 --http://example.com/name--> Jonathan Coe]

URIs as names for nodes; URIs as names for relations

SLIDE 44

URIs and naming

  • URIs identify the things we are describing.
  • If two people create data using the same URI, the assumption is that they are describing the same thing.
  • Merging/integrating data then becomes easy

– Although this introduces issues of URI control.

SLIDE 45

Graph Merging

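[Figure: two separate graphs. First: http://example.com/book/014029466X --http://example.com/author--> http://example.com/person/176; http://example.com/person/176 --http://example.com/name--> Jonathan Coe. Second: http://example.com/person/176 --http://example.com/birthplace--> http://example.com/place/xyz765; http://example.com/place/xyz765 --http://example.com/name--> Birmingham]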

SLIDE 46

Graph Merging

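[Figure: the merged graph, in which the two graphs share the single node http://example.com/person/176: http://example.com/book/014029466X --http://example.com/author--> http://example.com/person/176; http://example.com/person/176 --http://example.com/name--> Jonathan Coe; http://example.com/person/176 --http://example.com/birthplace--> http://example.com/place/xyz765; http://example.com/place/xyz765 --http://example.com/name--> Birmingham]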
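Triples that share a URI automatically connect, so merging graphs is just set union. Below is a minimal Python sketch using the example.com URIs from the slides. As a simplification, it treats literals as plain strings.

```python
EX = "http://example.com/"

# Graph 1: a book and its author. Graph 2: the author's birthplace.
graph1 = {
    (EX + "book/014029466X", EX + "author", EX + "person/176"),
    (EX + "person/176", EX + "name", "Jonathan Coe"),
}
graph2 = {
    (EX + "person/176", EX + "birthplace", EX + "place/xyz765"),
    (EX + "place/xyz765", EX + "name", "Birmingham"),
}

# Same URI => same node, so merging two graphs is set union.
merged = graph1 | graph2


def objects(graph, subject, predicate):
    """All objects of triples with the given subject and predicate."""
    return {o for s, p, o in graph if s == subject and p == predicate}


# Follow book -> author -> birthplace -> name across the merged graph.
(author,) = objects(merged, EX + "book/014029466X", EX + "author")
(place,) = objects(merged, author, EX + "birthplace")
print(objects(merged, place, EX + "name"))  # {'Birmingham'}
```

Neither input graph alone can answer "where was the author of this book born?". The question only becomes answerable once the shared URI joins them.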

SLIDE 47

URIs are active

  • URIs can be more than just names: they can be dereferenced, and information can be retrieved.
  • In particular, we can look up the URIs in a graph and potentially retrieve more information about them.
  • “Follow your nose” navigation
  • Information should be returned in appropriate, machine-readable formats (e.g. another graph)

SLIDE 48

Linked Data Principles

  • 1. Use URIs as names for things
  • 2. Use http URIs so that those names can be dereferenced.
  • 3. When a URI is looked up, provide useful information
  • 4. Include statements that link to other URIs so that more information can be discovered.
  • Common infrastructure facilitates construction of applications.

– Largely browsers up to now….

  • Other guidelines relate to connecting documents with the data that describes them.

– Use of content negotiation to supply “appropriate” representations
– Use of microformats/RDFa to publish data
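Principles 2 and 3 meet content negotiation in the HTTP request itself. The Python sketch below only builds a request, using the standard library's `urllib.request.Request`. It asks for Turtle through the Accept header, which is an assumption about what a client might want. A given server may not honour that media type. The LCSH URI is the one cited on a later slide.

```python
import urllib.request


def rdf_request(uri, media_type="text/turtle"):
    """Build (but do not send) a GET request that uses content negotiation
    to ask for an RDF serialisation instead of an HTML page."""
    return urllib.request.Request(uri, headers={"Accept": media_type})


req = rdf_request("http://id.loc.gov/authorities/subjects/sh85052522")
print(req.get_method(), req.full_url, req.get_header("Accept"))
# GET http://id.loc.gov/authorities/subjects/sh85052522 text/turtle

# Actually dereferencing needs network access, and the server decides
# which representation (if any) matches the Accept header:
# with urllib.request.urlopen(req) as response:
#     print(response.headers.get("Content-Type"))
```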

SLIDE 49

RDF

  • RDF stands for Resource Description Framework
  • It is a W3C Recommendation

– http://www.w3.org/RDF

  • RDF is a graphical formalism (+ concrete syntax)

– for representing metadata
– for describing the semantics of information in a machine-accessible way

  • Provides a simple data model based on triples.
  • Allows us to represent relationships between things.
SLIDE 50

The RDF Data Model

  • Statements are <subject, predicate, object> triples:

– <Sean,hasColleague,Uli>

  • Can be represented as a graph:
  • Statements describe properties of resources
  • A resource is any object that can be pointed to by a URI
  • Properties themselves are also resources (URIs)

[Figure: Sean --hasColleague--> Uli]

SLIDE 51

Linking Statements

  • The subject of one statement can be the object of another
  • Such collections of statements form a directed, labeled graph
  • Note that the object of a triple can also be a “literal” (a string)

[Figure: Sean --hasColleague--> Uli; Carole --hasColleague--> Uli; Uli --hasHomePage--> http://www.cs.man.ac.uk/~sattler; Sean --hasName--> “Sean K. Bechhofer”]
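Below is a direct Python transcription of this graph, offered as a sketch. The short identifiers such as "person/uli" are hypothetical stand-ins for full URIs. `URI` and `Literal` are simple string subclasses introduced here only to keep the two kinds of object apart. Following the diagram, the home page is treated as a resource rather than a literal.

```python
from typing import NamedTuple, Union


class URI(str):
    """A resource identifier."""


class Literal(str):
    """A plain string value; only allowed in the object position."""


class Triple(NamedTuple):
    subject: URI
    predicate: URI
    object: Union[URI, Literal]


sean, uli, carole = URI("person/sean"), URI("person/uli"), URI("person/carole")
graph = [
    Triple(sean, URI("hasColleague"), uli),
    Triple(carole, URI("hasColleague"), uli),
    Triple(uli, URI("hasHomePage"), URI("http://www.cs.man.ac.uk/~sattler")),
    Triple(sean, URI("hasName"), Literal("Sean K. Bechhofer")),
]


def linking_nodes(triples):
    """Nodes that are the object of one statement and the subject of another."""
    subjects = {t.subject for t in triples}
    return {t.object for t in triples if t.object in subjects}


print(linking_nodes(graph))                                        # {'person/uli'}
print([t.object for t in graph if isinstance(t.object, Literal)])  # ['Sean K. Bechhofer']
```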

SLIDE 52

RDF Syntax

  • RDF has an XML syntax that has a specific meaning:
  • Every Description element describes a resource
  • Every attribute or nested element inside a Description is a property of that resource

  • We can refer to resources by URIs

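    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:o="http://some.uri/ontology#">
      <!-- "some.uri" and the o: namespace are placeholder URIs -->
      <rdf:Description rdf:about="http://some.uri/person/sean_bechhofer">
        <o:hasColleague rdf:resource="http://some.uri/person/uli_sattler"/>
        <o:hasName rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Sean K. Bechhofer</o:hasName>
      </rdf:Description>
      <rdf:Description rdf:about="http://some.uri/person/uli_sattler">
        <o:hasHomePage>http://www.cs.man.ac.uk/~sattler</o:hasHomePage>
      </rdf:Description>
      <rdf:Description rdf:about="http://some.uri/person/carole_goble">
        <o:hasColleague rdf:resource="http://some.uri/person/uli_sattler"/>
      </rdf:Description>
    </rdf:RDF>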
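RDF/XML needs the rdf: namespace and a namespace for every property. The sketch below embeds a namespaced version of part of the example; the o: namespace URI is a placeholder. It reads triples out of rdf:Description elements with the standard library's ElementTree. It is deliberately not a full RDF/XML parser: it handles only flat Descriptions whose properties carry either rdf:resource or text.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Part of the example, with namespaces declared; the o: namespace is a placeholder.
DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:o="http://some.uri/ontology#">
  <rdf:Description rdf:about="http://some.uri/person/sean_bechhofer">
    <o:hasColleague rdf:resource="http://some.uri/person/uli_sattler"/>
    <o:hasName>Sean K. Bechhofer</o:hasName>
  </rdf:Description>
  <rdf:Description rdf:about="http://some.uri/person/carole_goble">
    <o:hasColleague rdf:resource="http://some.uri/person/uli_sattler"/>
  </rdf:Description>
</rdf:RDF>"""


def description_triples(xml_text):
    """Extract triples from flat rdf:Description elements.

    Each child element is a property whose URI is namespace + local name;
    rdf:resource gives a URI object, otherwise the element text is a literal.
    Not a complete RDF/XML parser (no nesting, typed nodes or abbreviations).
    """
    root = ET.fromstring(xml_text)
    triples = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree writes names as "{namespace}local"; RDF concatenates them.
            predicate = prop.tag[1:].replace("}", "", 1)
            resource = prop.get(f"{{{RDF_NS}}}resource")
            triples.append((subject, predicate, resource if resource is not None else prop.text))
    return triples


for triple in description_triples(DOC):
    print(triple)
```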

SLIDE 53

What does RDF give us?

  • A mechanism for annotating data and resources.
  • Single (simple) data model.
  • Syntactic consistency between names (URIs).
  • Low level integration of data.
  • The Linked Data/Web of Data approach.
SLIDE 54

RDF(S): RDF Schema

  • RDF gives a formalism for metadata annotation, and a way to write it down in XML, but it doesn’t give any special meaning to vocabulary such as subClassOf or type

– Interpretation is an arbitrary binary relation

  • RDF Schema extends RDF with a schema vocabulary that allows you to define basic vocabulary terms and the relations between those terms

– Class, Property
– type, subClassOf
– range, domain

SLIDE 55

RDF(S)

  • These terms are the RDF Schema building blocks (constructors) used to create vocabularies:

– <Person,type,Class>
– <hasColleague,type,Property>
– <Professor,subClassOf,Person>
– <Carole,type,Professor>
– <hasColleague,range,Person>
– <hasColleague,domain,Person>

  • Semantics gives “extra meaning” to particular RDF predicates and resources

– specifies how terms should be interpreted
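One consequence of these semantics is that domain and range statements license new type assertions rather than acting as checks. Below is a minimal Python sketch of the two RDFS entailment rules involved: rdfs2 for domain and rdfs3 for range. It uses the unprefixed names from this slide and the data triple <Sean,hasColleague,Uli> from the RDF data model slide.

```python
TYPE, DOMAIN, RANGE = "type", "domain", "range"

schema = {
    ("hasColleague", DOMAIN, "Person"),
    ("hasColleague", RANGE, "Person"),
}
data = {("Sean", "hasColleague", "Uli")}


def domain_range_inferences(triples):
    """Apply RDFS rules rdfs2 (domain) and rdfs3 (range) once.

    rdfs2: <p,domain,C> and <s,p,o>  =>  <s,type,C>
    rdfs3: <p,range,C>  and <s,p,o>  =>  <o,type,C>
    """
    domains = {(p, c) for p, rel, c in triples if rel == DOMAIN}
    ranges = {(p, c) for p, rel, c in triples if rel == RANGE}
    inferred = set()
    for s, p, o in triples:
        inferred |= {(s, TYPE, c) for q, c in domains if q == p}
        inferred |= {(o, TYPE, c) for q, c in ranges if q == p}
    return inferred - set(triples)


print(sorted(domain_range_inferences(schema | data)))
# [('Sean', 'type', 'Person'), ('Uli', 'type', 'Person')]
```

Nothing here ever rejects data. Using hasColleague is enough to make both ends Persons.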

SLIDE 56

RDF(S) Inference

[Figure: Lecturer rdfs:subClassOf Academic; Academic rdfs:subClassOf Person; hence Lecturer rdfs:subClassOf Person. Lecturer, Academic and Person are each rdf:type rdfs:Class.]

SLIDE 57

RDF(S) Inference

[Figure: Sean rdf:type Lecturer; Lecturer rdfs:subClassOf Academic; hence Sean rdf:type Academic. Lecturer and Academic are each rdf:type rdfs:Class.]
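The inferences in these two diagrams follow from two RDFS entailment rules. Rule rdfs11 says rdfs:subClassOf is transitive. Rule rdfs9 says an instance of a class is an instance of its superclasses. Below is a naive forward-chaining sketch in Python that applies both rules until nothing new is derived. It leaves out the rules that type each class as rdfs:Class.

```python
TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"


def rdfs_closure(triples):
    """Apply rdfs11 (subClassOf is transitive) and rdfs9 (instances inherit
    superclasses) repeatedly until no new triples appear."""
    graph = set(triples)
    while True:
        sub = {(s, o) for s, p, o in graph if p == SUBCLASS}
        # rdfs11: <A,subClassOf,B> and <B,subClassOf,C>  =>  <A,subClassOf,C>
        new = {(a, SUBCLASS, c) for a, b in sub for b2, c in sub if b == b2}
        # rdfs9: <x,type,B> and <B,subClassOf,C>  =>  <x,type,C>
        new |= {(x, TYPE, c) for x, p, b in graph if p == TYPE
                for b2, c in sub if b == b2}
        if new <= graph:
            return graph
        graph |= new


asserted = {
    ("Sean", TYPE, "Lecturer"),
    ("Lecturer", SUBCLASS, "Academic"),
    ("Academic", SUBCLASS, "Person"),
}
for triple in sorted(rdfs_closure(asserted) - asserted):
    print(triple)
# ('Lecturer', 'rdfs:subClassOf', 'Person')
# ('Sean', 'rdf:type', 'Academic')
# ('Sean', 'rdf:type', 'Person')
```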

SLIDE 58

RDF/RDF(S) “Liberality”

  • No distinction between classes and instances (individuals)

<Species,type,Class> <Lion,type,Species> <Leo,type,Lion>

  • No distinction between language constructors and ontology vocabulary, so constructors can be applied to themselves/each other

<type,range,Class> <Property,type,Class> <type,subPropertyOf,subClassOf>

  • In order to cope with this, RDF(S) has a particular non-standard model theory.

SLIDE 59

What does RDF(S) give us?

  • Ability to use simple schema/vocabularies when describing our resources.

  • Consistent vocabulary use and sharing.
  • Basic inference
SLIDE 60

Problems with RDF(S)

  • RDF(S) is too weak to describe resources in sufficient detail

– No localised range and domain constraints
    • Can’t say that the range of hasChild is Person when applied to Persons and Elephant when applied to Elephants
– No existence/cardinality constraints
    • Can’t say that all instances of Person have a mother that is also a Person, or that Persons have exactly 2 parents
– No transitive, inverse or symmetrical properties
    • Can’t say that isPartOf is a transitive property, that hasPart is the inverse of isPartOf or that touches is symmetrical

  • Difficult to provide reasoning support

– No “native” reasoners for non-standard semantics
– May be possible to reason via first-order (FO) axiomatisation
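For contrast, here is what declaring those three property characteristics would let a reasoner conclude. The Python sketch below is a naive forward-chaining illustration, not how OWL reasoners work internally. The property names come from the slide. The individuals (nucleus, cell, tissue, cellA, cellB) are hypothetical.

```python
def saturate(triples, transitive=(), inverses=(), symmetric=()):
    """Close a set of triples under property characteristics:
    transitive properties, (p, q) inverse pairs and symmetric properties."""
    graph = set(triples)
    while True:
        new = set()
        for s, p, o in graph:
            if p in symmetric:
                new.add((o, p, s))
            for p1, p2 in inverses:
                if p == p1:
                    new.add((o, p2, s))
                elif p == p2:
                    new.add((o, p1, s))
            if p in transitive:
                new |= {(s, p, o2) for s2, q, o2 in graph if q == p and s2 == o}
        if new <= graph:
            return graph
        graph |= new


facts = {
    ("nucleus", "isPartOf", "cell"),
    ("cell", "isPartOf", "tissue"),
    ("cellA", "touches", "cellB"),
}
closed = saturate(
    facts,
    transitive={"isPartOf"},
    inverses={("hasPart", "isPartOf")},
    symmetric={"touches"},
)
for triple in sorted(closed - facts):
    print(triple)
```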

SLIDE 61

OWL

  • OWL: Web Ontology Language
  • Extends existing Web standards

– Such as XML, RDF, RDFS

  • Is (hopefully) easy to understand and use

– Based on familiar KR idioms

  • Of “adequate” expressive power
  • Formally specified

– Possible to provide automated reasoning support

  • But you already know all this…
  • What about SKOS???
SLIDE 62

SKOS and Linked Data

(This slide repeats Slide 27.)

SLIDE 63

Microformats, RDFa

  • Microformats: open data formats for embedding data in pages.
  • Fixed schemas for calendars, address books, opinions, licenses etc.
  • RDFa: general mechanism for embedding RDF metadata within pages.
  • No fixed schema
  • Used with Linked Data resources

– E.g. LCSH: http://id.loc.gov/authorities/subjects/sh85052522


SLIDE 64

Linked Data Benefits

  • Separation of data from formatting and presentational aspects
  • Self-describing data. Applications encountering unfamiliar vocabularies can dereference them and access their definitions
  • Simplified data access via HTTP and RDF.

– As opposed to the heterogeneity of Web APIs

  • Open

– Applications are not implemented against a fixed set of data sources.


SLIDE 65

Linked Data Examples

  • Universities

– http://data.open.ac.uk

    • http://lucero-project.info/

– http://data.southampton.ac.uk

  • DBpedia

– RDFised version of Wikipedia
– Scraping structured information from infoboxes
– Quality?

  • BBC

– Programme catalogues published as Linked Data
– Crosslinking to resources like DBpedia and MusicBrainz

  • GeoNames

– Geographical data
– Lat/long, postal codes etc.

  • LCSH

– SKOS

SLIDE 66

dbpedia


SLIDE 67

Ordnance Survey


SLIDE 68

Open University


SLIDE 69

BBC


SLIDE 70

Library of Congress


SLIDE 71

fishdelish


SLIDE 72

SLIDE 73

SLIDE 74

data.gov.uk

  • Open
  • Public Data
  • Empowering citizen developers

– But pushback against formats

  • URI Design

– Hash vs slash

  • Statistical information
  • Tensions

– Centralised vs Distributed
– Rules vs Guidance
– Trust in the Web
    • “Darwinian Evolution”

SLIDE 75

LD in Use

  • Five Stars of Open Linked Data

– http://inkdroid.org/journal/2010/06/04/the-5-stars-of-open-linked-data/
    • Make data available
    • Make it available as structured data
    • Use non-proprietary formats
    • Use URLs to identify things
    • Link your data

  • Costs and Benefits

– http://lab.linkeddata.deri.ie/2010/star-scheme-by-example/

SLIDE 76

Issues with Linked Data

  • Identity and co-reference

– How do we handle the fact that different URIs may be used to refer to the same things?
– Use of owl:sameAs may be too strong (it can result in all information, including annotations, metadata etc., being merged).

  • Versioning

– Version information in URLs?
– Versioning at architectural level (Memento)
– How does versioning play with a “follow your nose” paradigm?

  • Querying

– Distributed query across data sets
– LD applications tend to use an “extract, transform, load” approach.

SLIDE 77

Issues with Linked Data

  • Still primarily about data

– Vocabularies used to facilitate integration
– Little deep semantics.
– “Big O vs little o”
    • Role of SKOS and RDF(S)

  • Scalability
  • Focus on mechanisms for data publication rather than consumption

– Lots of work on “recipes”, mangling relational sources into RDF etc.
– What do you actually do with the stuff?
– Applications (so far) mainly browsing/CV generation/introspective apps
– End user applications?
– Build it and they will come….???