The Semantic Web: Web of (integrated) Data (Frank van Harmelen)



SLIDE 1

The Semantic Web: Web of (integrated) Data

Frank van Harmelen Vrije Universiteit Amsterdam

Take home message

Semantic Web = Web of Data

(no longer only web of text, web of pictures)

Set of open, stable W3C standards

Rapidly emerging tools & vendors

Use cases:

  • data integration
  • web services
  • knowledge management
  • search (intranets)

SLIDE 2

Outline

The vision

What is required

Machine representation

  • XML, RDF, OWL

Where are we now?

Examples

Things we would like to do on the Web

SLIDE 3

“Intelligent” things we can’t do today

Search engines

  • concepts, not keywords
  • semantic narrowing/widening of queries

Shopbots

  • semantic interchange, not screenscraping

E-commerce

  • negotiation, catalogue mapping, personalisation

Web Services

  • need semantic characterisations to find them, to combine them

Navigation

  • by semantic proximity, not hardwired links

.....

Why can’t Google do this…

SLIDE 4

Other use cases are

  • personalisation
  • semantic linking
  • data integration
  • web services
  • ...

Sounds good, so.. how is this tackled?

SLIDE 5

Outline

The vision

What is required

Machine representation

  • XML, RDF, OWL

Where are we now?

Examples

machine accessible meaning

(What it’s like to be a machine)

disease name symptoms drug administration

Meta-data !

SLIDE 6

What is meta-data?

  • it's just data
  • it's data describing other data
  • it's meant for machine consumption

disease name symptoms drug administration

meta-data + ontologies

<name> <symptoms> <drug> <drug administration> <disease>

<treatment>

IS-A reduces

SLIDE 7

What’s inside an ontology?

  • terms + specialisation hierarchy
  • classes + class-hierarchy
  • instances
  • slots/values
  • inheritance (multiple? defaults?)
  • restrictions on slots (type, cardinality)
  • properties of slots (symm., trans., …)
  • relations between classes (disjoint, covers)
  • reasoning tasks: classification, subsumption

Increasing semantic “weight”

In short

(for the duration of this tutorial)

Ontologies are not

definitive descriptions of what exists in the world (= philosophy)

Ontologies are

shared models of the world constructed to facilitate communication

Yes, ontologies exist

(because we build them)

SLIDE 8

Real life examples

handcrafted (often by communities)

  • music: CDnow (2410/5), MusicMoz (1073/7)
  • biomedical: SNOMED (200k), GO (15k), Emtree (45k + 190k)

ranging from lightweight (Yahoo, UNSPSC) to heavyweight (Cyc)

ranging from small (METAR) to large (UNSPSC)

All right, but how do we represent all this in a computer?

SLIDE 9

Outline

The vision

What is required

Machine representation

  • XML, RDF, OWL

Where are we now?

Examples

Semantic Web “architecture”

SLIDE 10

What was XML again?

country
  name: “Netherlands”
  capital
    name: “Amsterdam”
    areacode: “020”

<country name="Netherlands">
  <capital name="Amsterdam">
    <areacode>020</areacode>
  </capital>
</country>

So why not just use XML?

No agreement on:

structure

  • is country an object? a class? an attribute? a relation? something else?
  • what does nesting mean?

vocabulary

  • is country the same as nation?

<country name="Netherlands">
  <capital name="Amsterdam">
    <areacode>020</areacode>
  </capital>
</country>

<nation>
  <name>Netherlands</name>
  <capital>Amsterdam</capital>
  <capital_areacode>020</capital_areacode>
</nation>

  • Are the above XML documents the same?
  • Do they convey the same information?
  • Is the answer machine-derivable?
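
The three questions above can be tried out directly. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` parser; the helpers `shape`, `facts_from_country` and `facts_from_nation` are hypothetical names introduced only for this illustration. A generic parser sees two different trees; the documents only turn out to say the same thing after a human writes one mapping per layout.

```python
import xml.etree.ElementTree as ET

COUNTRY_DOC = """
<country name="Netherlands">
  <capital name="Amsterdam">
    <areacode>020</areacode>
  </capital>
</country>
"""

NATION_DOC = """
<nation>
  <name>Netherlands</name>
  <capital>Amsterdam</capital>
  <capital_areacode>020</capital_areacode>
</nation>
"""


def shape(element):
    """Nested tuple of (tag, attribute names, child shapes): all a generic parser knows."""
    return (
        element.tag,
        tuple(sorted(element.attrib)),
        tuple(shape(child) for child in element),
    )


def facts_from_country(root):
    """Hand-written mapping for the <country> layout (knowledge that is not in the XML)."""
    capital = root.find("capital")
    return {
        "country": root.get("name"),
        "capital": capital.get("name"),
        "areacode": capital.findtext("areacode").strip(),
    }


def facts_from_nation(root):
    """Hand-written mapping for the <nation> layout."""
    return {
        "country": root.findtext("name").strip(),
        "capital": root.findtext("capital").strip(),
        "areacode": root.findtext("capital_areacode").strip(),
    }


country_root = ET.fromstring(COUNTRY_DOC.strip())
nation_root = ET.fromstring(NATION_DOC.strip())

# Same documents? Not as far as the parser can tell.
same_structure = shape(country_root) == shape(nation_root)
# Same information? Only after applying the two human-written mappings.
same_facts = facts_from_country(country_root) == facts_from_nation(nation_root)

print(same_structure, same_facts)  # prints "False True"
```

The answer is machine-derivable only because the mappings were supplied by hand, which is the gap the rest of the stack is meant to close.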
SLIDE 11

So: XML ≠ machine accessible meaning

<CV> <name> <education> <work> <private>

< > < > < > < > < >

<Χς> <ναμε> <εδυχατιον> <ωορκ> <πριϖατε>

The semantic pyramid again

SLIDE 12

W3C Stack

XML: surface syntax, no semantics

XML Schema: describes structure of XML documents

RDF: datamodel for “relations” between “things”

RDF Schema: RDF Vocabulary Definition Language

OWL: a more expressive Vocabulary Definition Language

RDF & RDF Schema

RDF =

  • relations between things
  • all objects are URLs (both things and relations)

RDF Schema =

  • hierarchical organisation of an RDF vocabulary
  • all things are URLs (classes of things, subclass relations)

For more details: see slides later today
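
As a rough illustration of the two ideas above, here is a minimal sketch in plain Python (no RDF library). The `rdf:type` and `rdfs:subClassOf` identifiers are the standard W3C ones; the `http://example.org/` namespace, the example triples and the helpers `superclasses` and `types_of` are assumptions made up for this sketch.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
EX = "http://example.org/"  # hypothetical namespace, for illustration only

# RDF: relations between things, where things and relations are all identifiers.
triples = {
    (EX + "Netherlands", EX + "hasCapital", EX + "Amsterdam"),
    (EX + "Amsterdam", RDF_TYPE, EX + "Capital"),
    # RDF Schema: a class hierarchy over the vocabulary.
    (EX + "Capital", RDFS_SUBCLASS_OF, EX + "City"),
    (EX + "City", RDFS_SUBCLASS_OF, EX + "Place"),
}


def superclasses(cls, graph):
    """All classes reachable from cls by following rdfs:subClassOf."""
    found, todo = set(), [cls]
    while todo:
        current = todo.pop()
        for s, p, o in graph:
            if s == current and p == RDFS_SUBCLASS_OF and o not in found:
                found.add(o)
                todo.append(o)
    return found


def types_of(resource, graph):
    """Asserted types of a resource plus the types implied by the hierarchy."""
    direct = {o for s, p, o in graph if s == resource and p == RDF_TYPE}
    inferred = set()
    for cls in direct:
        inferred |= superclasses(cls, graph)
    return direct | inferred


for cls in sorted(types_of(EX + "Amsterdam", triples)):
    print(cls)
```

Only "Amsterdam is a Capital" is stated; that it is also a City and a Place follows from the subclass relations.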

SLIDE 13

The semantic pyramid again

OWL: things RDF Schema can’t do

  • equality
  • enumeration
  • number restrictions (single-valued/multi-valued, optional/required values)
  • inverse, symmetric, transitive
  • boolean algebra (union, complement)

Again:

For more details: see slides later today
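
To give a feel for what the inverse, symmetric and transitive declarations buy, here is a minimal sketch in plain Python. It is not an OWL reasoner; the function `close`, the property names and the facts are all hypothetical and chosen only for illustration.

```python
def close(facts, symmetric=(), transitive=(), inverse_of=None):
    """Return facts plus everything derivable from the declared property characteristics."""
    inverse_of = dict(inverse_of or {})
    # An inverse declaration works in both directions.
    inverse_of.update({q: p for p, q in list(inverse_of.items())})
    derived = set(facts)
    changed = True
    while changed:
        new = set()
        for s, p, o in derived:
            if p in symmetric:
                new.add((o, p, s))
            if p in inverse_of:
                new.add((o, inverse_of[p], s))
            if p in transitive:
                for s2, p2, o2 in derived:
                    if p2 == p and s2 == o:
                        new.add((s, p, o2))
        # Stop once a full pass adds nothing new.
        changed = not new <= derived
        derived |= new
    return derived


facts = {
    ("Amsterdam", "locatedIn", "Netherlands"),
    ("Netherlands", "locatedIn", "Europe"),
    ("Netherlands", "borders", "Belgium"),
}

derived = close(
    facts,
    symmetric={"borders"},
    transitive={"locatedIn"},
    inverse_of={"locatedIn": "contains"},
)

for fact in sorted(derived - facts):
    print(fact)
```

Three stated facts yield five more, for example that Europe contains Amsterdam, without anyone writing that down.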

SLIDE 14

Sounds good in theory. How far are you with this in practice?

Where are we now: tools

Languages are stable (W3C)

Tooling is rapidly emerging

  • HP, IBM, Oracle, Adobe, …
  • Parsers, editors, visualisers, large-scale storage and querying
  • Portal generation
  • Aduna, Intellidimension

SLIDE 15

Three example use-cases

Closed-world data integration:

DOPE browser @ Elsevier

Open-world data integration:

streaming media @ Philips

Semantic Web services

Conclusions

This section joint with Aduna and Anita de Waard @ Elsevier

Closed-world data integration: DOPE Browser @ Elsevier

SLIDE 16

Background

Vertical Information Provision

  • Buy a topic instead of a journal!
  • Web provides new opportunities

Business driver: drug development

  • Rich, information-hungry market
  • Good thesaurus (EMTREE)

The Data

Document repositories:

  • ScienceDirect: approx. 500,000 full-text articles
  • MEDLINE: approx. 10,000,000 abstracts

Extracted Metadata

  • The Collexis Metadata Server: concept extraction ("semantic fingerprinting")

Thesauri and Ontologies

  • EMTREE: 60,000 preferred terms, 200,000 synonyms

SLIDE 17

Architecture:

[Diagram labels: Query interface; RDF Schema; EMTREE; RDF Datasource 1 … RDF Datasource n]


SLIDE 21

This section material from Zharko Aleksovski @ VU & Philips

Web-based data integration scenario:

  • heterogeneous
  • open
SLIDE 22

Motivating scenario

consumer.philips.com

User devices

Semantic Web

Providers: iTunes, Wal*Mart, Buy.com, Napster, eMusic, Musicmatch, Rhapsody, MusicNet, MusicNow, LaunchCast

Example

Evergreens and Golden hits are related: Golden hits is mostly a subclass of Evergreens

Music Ontology

Mediator

“Hits” from the “60s”, “Evergreens”

SLIDE 23

Domain characteristics

Many music providers

Wide variety of music offered

Constantly increasing in size and evolving

Cumbersome to browse and retrieve music

There is no agreement

  • Different terms are used
  • The same terms contain different sets of artists

data-sources: CDNow (Amazon.com), All Music Guide, MusicMoz, ArtistGigs, Artist Direct Network, CD baby, Yahoo

  • Size: 96 classes, Depth: 2 levels
  • Size: 2410 classes, Depth: 5 levels
  • Size: 382 classes, Depth: 4 levels
  • Size: 222 classes, Depth: 2 levels
  • Size: 1073 classes, Depth: 7 levels
  • Size: 465 classes, Depth: 2 levels
  • Size: 403 classes, Depth: 3 levels

SLIDE 24

Why approximate matching

Genre is not precisely defined

Pop and Rock have no common definition on the big portals AllMusic.com, Amazon.com and MP3.com

Exact reasoning will not be useful
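
One way to picture approximate matching is to measure how much of one genre's artist set falls inside another's. The sketch below is an assumption made for illustration, not the matching method used in this work: the instance-overlap measure, the 0.8 threshold, the function names and the artist identifiers are all hypothetical.

```python
def subclass_degree(a, b):
    """Fraction of the artists filed under a that are also filed under b."""
    if not a:
        raise ValueError("class a has no instances")
    return len(a & b) / len(a)


def classify(a, b, threshold=0.8):
    """Label the relation between two genres by instance overlap (threshold is an assumption)."""
    a_in_b = subclass_degree(a, b)
    b_in_a = subclass_degree(b, a)
    if a_in_b >= threshold and b_in_a >= threshold:
        return "a and b are approximately equivalent"
    if a_in_b >= threshold:
        return "a is approximately a subclass of b"
    if b_in_a >= threshold:
        return "b is approximately a subclass of a"
    return "no approximate subclass relation"


# Hypothetical artist identifiers: 10 under "Golden hits", 20 under "Evergreens".
golden_hits = {f"artist{n:02d}" for n in range(1, 11)}
evergreens = {f"artist{n:02d}" for n in range(2, 22)}

print(subclass_degree(golden_hits, evergreens))
print(classify(golden_hits, evergreens))
```

With these made-up sets, nine of the ten "Golden hits" artists are also "Evergreens". An exact subclass test would fail on the one exception, while the approximate test still reports the relation.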

[Figure: A and X, 99% / 1%]

Results

A - AllMusicGuide, B - ArtistDirectNetwork

[Figure: plot with series "B subClass of A", "A subClass of B" and "equivalences"; axis ticks from 0 to 1 and from 100000 to 600000]

SLIDE 25

This section material from Marta Sabou @ VU

Semantic Web Services

What are web services?

  • a software system designed to support interoperable machine-to-machine interaction over a network.
  • has an interface described in a machine-processable format (specifically WSDL).
  • Other systems interact with a web service in a manner specified by its description using SOAP messages.

SLIDE 26

Web Service Tasks

Web Service Discovery & Selection

  • Find an airline that can fly me to Marina del Rey

Web Service Invocation

  • Book flight tickets from NWA to arrive 12th Oct.

Web Service Composition & Interoperation

  • Arrange taxis, flights and hotel for travel from Southampton to Portland, OR, via Marina del Rey, CA.

Web Service Execution Monitoring

  • Has the taxi to Gatwick Airport been reserved yet?

Limitations of WS Technology

  • Manual Discovery
  • Manual Invocation
  • Manual (ad hoc) Mediation
  • Manual (ad hoc) Composition

SLIDE 27

Use of Semantics: Example

<do:HotelBooking rdf:ID="WS1">
  <owls:hasInput rdf:resource="do:Hotel"/>
</do:HotelBooking>

<do:HostelBooking rdf:ID="WS2">
  <owls:hasInput rdf:resource="do:Hostel"/>
</do:HostelBooking>

R:(BookingService,Hotel) =>

  • exact match with WS1
  • plug-in match for WS2

Degrees of WS Matching

Match Advertisement with Request:

  • Exact: Adv equals Req
  • Plug-In: Adv is more general than Req
  • Subsume: Adv is less general than Req
  • Intersection: Adv and Req overlap (a bit)
  • Disjoint: Adv and Req don’t overlap

Matchmaking algorithms (primarily) employ subsumption reasoning over the knowledge provided by the domain ontologies.
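
The five degrees listed above can be sketched with plain set comparisons. This is a simplification under stated assumptions: each concept is represented by the set of things it covers, so "more general" becomes "proper superset". A real matchmaker would reason over an ontology instead. The `Match` enum, `degree_of_match` and the concept sets are hypothetical names for this sketch.

```python
from enum import Enum


class Match(Enum):
    EXACT = "exact"
    PLUG_IN = "plug-in"
    SUBSUME = "subsume"
    INTERSECTION = "intersection"
    DISJOINT = "disjoint"


def degree_of_match(advertised, requested):
    """Compare an advertised concept with a requested one, both given as sets."""
    if advertised == requested:
        return Match.EXACT
    if advertised > requested:   # proper superset: Adv is more general than Req
        return Match.PLUG_IN
    if advertised < requested:   # proper subset: Adv is less general than Req
        return Match.SUBSUME
    if advertised & requested:   # some overlap, neither contains the other
        return Match.INTERSECTION
    return Match.DISJOINT


# Hypothetical concepts, each listed by the things it covers.
hotel = frozenset({"grand_hotel", "city_hotel", "budget_hotel"})
accommodation = hotel | {"campsite"}
luxury_hotel = frozenset({"grand_hotel"})
budget_stay = frozenset({"budget_hotel", "campsite"})
flight = frozenset({"flight_ams_pdx"})

request = hotel
for name, advertised in [
    ("hotel", hotel),
    ("accommodation", accommodation),
    ("luxury_hotel", luxury_hotel),
    ("budget_stay", budget_stay),
    ("flight", flight),
]:
    print(name, degree_of_match(advertised, request).value)
```

A matchmaker would typically rank advertisements by these degrees, but the ranking policy is not specified here.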
SLIDE 28

Take home message again:

Take home message

Semantic Web = Web of Data

(no longer only web of text, web of pictures)

Set of open, stable W3C standards

Rapidly emerging tools & vendors

Use cases:

  • data integration
  • web services
  • knowledge management
  • search (intranets)