The ICS-FORTH RDFSuite: Managing Voluminous RDF Description Bases



SLIDE 1

1 ICS-FORTH & Univ. of Crete

Dimitris Plexousakis May 2001

The ICS-FORTH RDFSuite: Managing Voluminous RDF Description Bases

(http://www.ics.forth.gr/proj/isst/RDF)

Sofia Alexaki, Vassilis Christophides, Gregory Karvounarakis, Dimitris Plexousakis
Computer Science Department, University of Crete, and Institute for Computer Science - FORTH, Heraklion, Crete, Greece

Karsten Tolle
Johann Wolfgang Goethe University, Frankfurt, Germany

SLIDE 2

Motivation

The WWW is rapidly evolving into a conceptual structure assimilating:

  • vast information resources of diverse nature (sites, documents, data, images, etc.)
  • user communities (corporate, e-marketplaces, etc.)
  • description and brokering services

Large volumes of various types of metadata need to be managed to ensure fast deployment and easy maintenance of large-scale applications for the Semantic Web.

SLIDE 3

Motivation

The Semantic Web and RDF:

  • a standard representation language for resource descriptions
  • with a human-readable / machine-understandable syntax
  • enabling content syndication via superimposed resource descriptions
  • interpreted within or across communities using extensible schemata

Fact: several content providers and web portals already adopt RDF, thus giving rise to voluminous RDF description bases.

Our thesis: take advantage of three decades of research in DB technology to support declarative access and logical / physical independence for RDF description bases.

SLIDE 4

Outline

  • The Open Directory Portal: a case study
  • A Formal Data Model for RDF/S
  • The RDF Query Language (RQL)
  • Architecture
  • Core Middleware: RDF Store, Parser/Loader, Query Interpreter
  • Testbed: the ODP RDF dump (representative queries, performance)
  • Summary and Outlook

SLIDE 5

ODP Knowledge Catalog

[Figure: a fragment of the ODP catalog drawn as an RDF graph. Legend: typeOf (instance), subClassOf (isA), property.
Schema labels: Class, Regional, Recreation, Travel, Lodging, Vacation-Rentals, Hotel, Hotel Directories, Ile-de-France, Paris, and Ext.Resource with the properties title, description, related, file_size and last_modified (value types shown: string, integer, date).
Resources: &r1, &r2, &r3, &r4, with titles and descriptions including "SunScale", "Pulitzer Opera", "Bedford", "Disneyland", "Site officiel de Disneyland Paris" and "Official site of Disneyland Paris".]

Namespaces:

  • ns1: http://www.dmoz.org/topic.rdf
  • ns2: www.oclc.org/dublincore.rdfs
  • rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
  • rdfs: http://www.w3.org/2000/01/rdf-schema#

SLIDE 6

ODP Statistics

ODP Version: 16-01-2001

  • 170 Mbytes of class hierarchies
  • 700 Mbytes of resource descriptions
  • 337,085 topics
  • 16 hierarchies with max depth 13 (6.86 on average) and max # subclasses 314 (4.02 on average)
  • 2,342,978 URIs

SLIDE 7

Resource Description Framework (RDF/S)

RDF: Resource Descriptions

  • Data Model: Directed Labeled Graphs
      • Nodes: Resources (URIs) or Literals
      • Edges: Properties (Attributes or Relationships)
      • Labels: Nodes (Class names) and Edges (Property names)
      • Statement: assertion of the form (resource, property, value)
      • Description: collection of statements concerning a resource
  • XML syntax

RDF Schema (RDFS): Schema Vocabularies

  • Specialization of both classes & properties (simple & multiple)
  • Multiple classification under several classes
  • Unordered, optional, and multi-valued properties
  • Domain and range polymorphism of properties
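
As a rough illustration of this data model, the Python sketch below holds statements as (resource, property, value) triples and groups them into descriptions. It is not RDFSuite code: the `Statement` type, the helper names and the sample data (including the `Hotel` / `Lodging` link and the resource titles) are assumptions chosen for the example.

```python
from collections import defaultdict
from typing import NamedTuple

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"


class Statement(NamedTuple):
    """One edge of the directed labeled graph: resource --property--> value."""
    resource: str
    prop: str
    value: str


# Hypothetical sample data: one schema statement plus two described resources.
statements = [
    Statement("Hotel", RDFS + "subClassOf", "Lodging"),
    Statement("r1", RDF + "type", "Hotel"),
    Statement("r1", "title", "SunScale"),
    Statement("r2", RDF + "type", "Hotel"),
    Statement("r2", RDF + "type", "Lodging"),  # multiple classification
    Statement("r2", "title", "Pulitzer Opera"),
]


def descriptions(stmts):
    """A description is the collection of statements about one resource."""
    grouped = defaultdict(list)
    for s in stmts:
        grouped[s.resource].append(s)
    return dict(grouped)


def values_of(stmts, resource, prop):
    """Properties are optional and multi-valued, so 0..n values may come back."""
    return [s.value for s in stmts if s.resource == resource and s.prop == prop]
```

Calling `values_of(statements, "r1", "description")` returns an empty list, which is how an optional property that was never asserted shows up in this representation.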

SLIDE 8

A Formal Data Model for RDF/S

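[Figure: the formal data model. The diagram relates resources (URIs, U), literals (L) and containers (bags { }, sequences [ ]) to the set of values V; the names N of the hierarchy H with its ordering <, covering class names C and property names P under Class and Property; the types T; the schema S; and the interpretation function [[ . ]].]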

SLIDE 9

The RDF Query Language RQL

Declarative query language for RDF description bases:

  • relies on a typed data model (literal & container types + union types)
  • follows a functional approach (basic queries and filters)
  • adapts the functionality of semistructured or XML query languages to RDF, but also:
      • treats properties as self-existent individuals
      • exploits taxonomies of node and edge labels
      • allows querying of schemas as semistructured data

Relational interpretation of schemas & resource descriptions:

  • Classes (unary relations)
  • Properties (binary relations)
  • Containers (n-ary relations)

SLIDE 10

Portal Navigation with RQL

Browsing large description bases is cumbersome! RQL provides powerful path expressions permitting filtering and navigation on both portal schemas and resource descriptions.

E.g., to find (under the Regional ODP hierarchy) URIs of hotels in Paris whose title matches "Opera":

    select Z
    from (select $X from Regional {:$X}
          where $X like "*Hotel*" and $X < Paris){Y}.{Z}title{T}
    where T like "*Opera*"
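
To make the evaluation order of this query easier to follow, here is a small Python sketch that mimics it over in-memory dictionaries. It is neither RQL nor part of RDFSuite: the class names, subclass links, resource URIs and helper names are invented for illustration, `fnmatchcase` stands in for RQL's `like`, and resources are matched on their direct classes only.

```python
from fnmatch import fnmatchcase

# Hypothetical schema fragment: class -> direct superclasses.
SUPERCLASSES = {
    "Paris": ["Ile-de-France"],
    "Ile-de-France": ["Regional"],
    "Paris_Hotel": ["Paris"],
    "Paris_Hotel_Directories": ["Paris_Hotel"],
    "Paris_Museums": ["Paris"],
}

# Hypothetical descriptions: resource URI -> (direct classes, title).
RESOURCES = {
    "r1": (["Paris_Hotel"], "SunScale"),
    "r2": (["Paris_Hotel"], "Pulitzer Opera"),
    "r3": (["Paris_Museums"], "Opera Garnier"),
}


def is_subclass(cls, ancestor):
    """True if `cls` is a strict (transitive) subclass of `ancestor`."""
    stack = list(SUPERCLASSES.get(cls, []))
    seen = set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(SUPERCLASSES.get(current, []))
    return False


def hotels_in_paris_matching(pattern):
    # Inner query: classes under Regional named *Hotel* that lie below Paris.
    classes = {
        c for c in SUPERCLASSES
        if is_subclass(c, "Regional")
        and fnmatchcase(c, "*Hotel*")
        and is_subclass(c, "Paris")
    }
    # Outer query: resources of those classes whose title matches the pattern.
    return sorted(
        uri for uri, (types, title) in RESOURCES.items()
        if classes.intersection(types) and fnmatchcase(title, pattern)
    )
```

The schema filter runs first and yields a set of classes; the data filter then walks only the resources classified under them, which mirrors the nesting of the two `select` clauses.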

SLIDE 11

The ICS-FORTH RDFSuite

  • The Validating RDF Parser (VRP): Karsten Tolle, Diploma Thesis. The first RDF parser supporting semantic validation of both resource descriptions and schemas.
  • The RDF Schema Specific DataBase (RSSDB): Sofia Alexaki, M.Sc. Thesis. The first RDF store using schema knowledge to automatically generate an Object-Relational (SQL3) representation of RDF metadata and load resource descriptions.
  • The RDF Query Language (RQL): Greg Karvounarakis, M.Sc. Thesis. The first declarative language for uniformly querying RDF schemas and resource descriptions.

SLIDE 12

The RDFSuite Architecture

[Figure: the RDFSuite architecture.

  • ICS-VRP: Parser, Validator, VRP Internal RDF Model
  • RDF Loader: Loading RDF Java APIs, connected to the database through JDBC
  • ORDBMS: schema tables Class (c_name), Property (p_name, domain, range), SubClass (subcl, supcl) and SubProperty (subpr, suppr), plus instance tables with columns URI or (source, target); sample labels shown: Hotel, Hotel Dir, title, paints, creates, Resource, Literal
  • DBMS RDF query API: SQL3 + SPI functions, LIB C++
  • ICS-RQL Interpreter: Parser, Graph Constructor, Typing, Evaluation]

SLIDE 13

Generic Representation

Resources

    id: int   uri: text
    1         http://www.dmoz.org/topics.rdfs#Hotel
    2         http://www.dmoz.org/topics.rdfs#Hotel Directories
    3         http://www.oclc.org/dublincore.rdfs#title
    4         http://www.dmoz.org/schema.rdf#Ext.Resource
    5         http://www.w3.org/1999/02/22-rdf-syntax-ns#type
    6         http://www.w3.org/2000/01/rdf-schema#subClassOf
    7         http://www.w3.org/1999/02/22-rdf-syntax-ns#Property
    8         http://www.w3.org/2000/01/rdf-schema#Class
    9         r1

Triples

    predid: int   subid: int   objid: int   objvalue: text
    6             2            1
    5             3            7
    5             1            8
    5             9            2
    3             9                         SunScale
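
The generic representation can be tried out with a few lines of Python and SQLite. This is a minimal sketch, not the RSSDB schema itself: the table and column names follow the slide, the rows are a subset of the slide's sample rows as best they can be read, SQLite stands in for the object-relational DBMS, and the function name is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Resources (id INTEGER PRIMARY KEY, uri TEXT NOT NULL);
    CREATE TABLE Triples (
        predid   INTEGER NOT NULL REFERENCES Resources(id),
        subid    INTEGER NOT NULL REFERENCES Resources(id),
        objid    INTEGER REFERENCES Resources(id),  -- object is a resource
        objvalue TEXT                               -- object is a literal
    );
""")

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
conn.executemany("INSERT INTO Resources VALUES (?, ?)", [
    (1, "http://www.dmoz.org/topics.rdfs#Hotel"),
    (2, "http://www.dmoz.org/topics.rdfs#Hotel Directories"),
    (3, "http://www.oclc.org/dublincore.rdfs#title"),
    (5, RDF + "type"),
    (6, RDFS + "subClassOf"),
    (9, "r1"),
])
conn.executemany("INSERT INTO Triples VALUES (?, ?, ?, ?)", [
    (6, 2, 1, None),           # Hotel Directories subClassOf Hotel
    (5, 9, 2, None),           # r1 type Hotel Directories
    (3, 9, None, "SunScale"),  # r1 title "SunScale"
])


def resources_with_value(prop_uri, value):
    """Template Q8 on the generic layout: Triples is joined with Resources twice."""
    rows = conn.execute(
        """
        SELECT s.uri
        FROM Triples t
        JOIN Resources p ON p.id = t.predid
        JOIN Resources s ON s.id = t.subid
        WHERE p.uri = ? AND t.objvalue = ?
        """,
        (prop_uri, value),
    )
    return [uri for (uri,) in rows]
```

Every statement of every kind lands in the same two tables, so even a simple value lookup has to resolve the property and the subject through joins on `Resources`.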

SLIDE 14

Specific Representation

[Figure: tables of the specific (schema-aware) representation.]

Schema tables:

  • Namespace (id: int, uri: text)
  • Type (id: int, nsid: int, lpart: text), with rows such as Resource, Bag, Seq, String
  • Class (id: int, nsid: int, lpart: text), with rows such as Hotel, Hotel Directories, Ext.Resource
  • Property (id: int, nsid: int, lpart: text, domainid: int, rangeid: int), with rows such as title, description
  • SubClass (subid: int, superid: int)
  • SubProperty (subid: int, superid: int)

Instance tables:

  • Instances (uri: text, classid: int)
  • t1, t11, t12, t13 (URI: text), related as subtables
  • t14, t15, t16 (source: text, target: text), with rows such as (r1, SunScale) and (r2, Pulitzer Opera)
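
The idea behind the specific representation can be sketched the same way. The Python and SQLite code below is an assumption-laden illustration, not RSSDB: the ids 11 and 14, the pairing of those ids with `Hotel` and `title`, and the helper names are hypothetical, and plain SQLite tables replace the SQL3 subtables of the real system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Class    (id INTEGER PRIMARY KEY, nsid INTEGER, lpart TEXT);
    CREATE TABLE Property (id INTEGER PRIMARY KEY, nsid INTEGER, lpart TEXT,
                           domainid INTEGER, rangeid INTEGER);
""")

# Hypothetical ids and rows, chosen only for this example.
conn.execute("INSERT INTO Class VALUES (11, 4, 'Hotel')")
conn.execute("INSERT INTO Property VALUES (14, 3, 'title', 11, NULL)")


def instance_table(schema_id):
    """Name of the table holding the instances of one class or property."""
    return f"t{int(schema_id)}"


# One table per class (URI) and one per property (source, target).
conn.execute(f"CREATE TABLE {instance_table(11)} (URI TEXT)")
conn.execute(f"CREATE TABLE {instance_table(14)} (source TEXT, target TEXT)")
conn.executemany(f"INSERT INTO {instance_table(11)} VALUES (?)", [("r1",), ("r2",)])
conn.executemany(
    f"INSERT INTO {instance_table(14)} VALUES (?, ?)",
    [("r1", "SunScale"), ("r2", "Pulitzer Opera")],
)


def resources_with_value(prop_lpart, value):
    """Template Q8 on the specific layout: find the property id, then scan one table."""
    row = conn.execute(
        "SELECT id FROM Property WHERE lpart = ?", (prop_lpart,)
    ).fetchone()
    if row is None:
        return []
    table = instance_table(row[0])  # built from an integer id, safe to interpolate
    rows = conn.execute(f"SELECT source FROM {table} WHERE target = ?", (value,))
    return [source for (source,) in rows]
```

Because each property has its own table, the value lookup touches only the rows of that property and needs no join against a global triple table.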

SLIDE 15

DBMS Size vs. Schema Triples

DBMS size scales linearly with the number of schema triples.

                                               SpecRepr                  GenRepr
    Aver. triple size (with indexes)           0.086 KB (0.1734 KB)      0.1582 KB (0.3062 KB)
    Aver. triple storage time (with indexes)   0.0021 sec (0.0025 sec)   0.0025 sec (0.0032 sec)

SLIDE 16

Graph 2: DBMS Size vs. Data Triples

DBMS size scales linearly with the number of data triples.

                                               SpecRepr                  GenRepr
    Aver. triple size (with indexes)           0.123 KB (0.2566 KB)      0.123 KB (0.2706 KB)
    Aver. triple storage time (with indexes)   0.0033 sec (0.0043 sec)   0.0039 sec (0.00457 sec)

SLIDE 17

Query Templates for RDF description bases

Pure schema queries:

  • Q1: Find the range (or domain) of a property
  • Q2: Find the direct subclasses of a class
  • Q3: Find the transitive subclasses of a class
  • Q4: Check if a class is a subclass of another class

Queries on resource descriptions using available schema knowledge:

  • Q5: Find the direct extent of a class (or property)
  • Q6: Find the transitive extent of a class (or property)
  • Q7: Find if a resource is an instance of a class
  • Q8: Find the resources having a property with a specific (or range of) value(s)
  • Q9: Find the instances of a class having a given property

Schema queries for specific resource descriptions:

  • Q10: Find the properties of a resource and their values
  • Q11: Find the classes under which a resource is classified
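
A few of these templates can be written down directly over a toy hierarchy. The Python sketch below covers Q2, Q3, Q4 and Q6 only; the class links, the instances and the function names are hypothetical, and a real store would answer these queries in SQL rather than in memory.

```python
# Hypothetical toy schema: class -> direct superclasses.
SUBCLASS_OF = {
    "Lodging": [],
    "Hotel": ["Lodging"],
    "Vacation-Rentals": ["Lodging"],
    "Hotel Directories": ["Hotel"],
}
# Hypothetical direct extents: class -> resources classified directly under it.
DIRECT_EXTENT = {
    "Lodging": {"r4"},
    "Hotel": {"r1", "r2"},
    "Hotel Directories": {"r3"},
    "Vacation-Rentals": set(),
}


def direct_subclasses(cls):
    """Q2: classes that name `cls` as a direct superclass."""
    return sorted(c for c, supers in SUBCLASS_OF.items() if cls in supers)


def transitive_subclasses(cls):
    """Q3: all classes below `cls`, found by repeatedly expanding Q2."""
    found, frontier = set(), [cls]
    while frontier:
        for sub in direct_subclasses(frontier.pop()):
            if sub not in found:
                found.add(sub)
                frontier.append(sub)
    return sorted(found)


def is_subclass(sub, sup):
    """Q4: check whether `sub` lies strictly below `sup`."""
    return sub in transitive_subclasses(sup)


def transitive_extent(cls):
    """Q6: direct instances of `cls` plus those of all its subclasses."""
    extent = set(DIRECT_EXTENT.get(cls, set()))
    for sub in transitive_subclasses(cls):
        extent |= DIRECT_EXTENT.get(sub, set())
    return sorted(extent)
```

Q3 and Q6 both depend on a traversal of the whole subtree, so their cost depends on the depth and fan-out of the hierarchy below the starting class.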

SLIDE 18

Execution Time of RDF Benchmark Queries

             Generic                         Specific
    Query    Case 1    Case 2    Case 3      Case 1    Case 2    Case 3
    Q1       0.0015                          0.0012
    Q2       0.0017    0.0028    0.02        0.0012    0.0022    0.0124
    Q3       0.0460    0.082     344.91      0.0463    0.0612    341.98
    Q4       0.033     0.0415    0.0662      0.0333    0.0415    0.0662
    Q5       0.0043    0.008     0.04        0.0015    0.0028    0.027
    Q6       0.0573    0.315     627.43      0.0508    0.1118    482.45
    Q7       0.0034    0.0034    0.0034      0.0016    0.0016    0.0017
    Q8       124.20    365.73    675.42      0.0013    0.0069    0.0466
    Q9       110.58    117.68    185.7       0.031     0.0338    0.1059
    Q10      0.0072    0.0072    0.0072      0.0071    0.0071    0.0076
    Q11      0.0035    0.0043    0.0056      0.0013    0.0015    0.0015

SLIDE 19

Comparison

  • Specific Representation permits the customization of the physical representation of RDF metadata
  • Specific Representation outperforms the Generic Representation for all types of queries:
      • Q1, Q2, Q5, Q7, Q10, Q11: by a factor up to 3.73
      • Q3, Q4, Q6: by a factor up to 2.8
      • Q8, Q9: by a factor up to 95,538
  • Generic Representation pays a severe penalty for maintaining large tables (Triples, Resources), e.g., queries Q8, Q9 require (self-) joins of Triples, Resources
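
The quoted factors can be re-derived from the benchmark timings by dividing each Generic time by the matching Specific time. The Python sketch below does that arithmetic; the timings are copied from the benchmark slide, while the dictionary layout and function names are assumptions made for the example.

```python
# Timings as listed on the benchmark slide: query -> (generic, specific).
# Each side holds the Case 1..3 values; Q1 lists one value per representation.
TIMES = {
    "Q1": ((0.0015,), (0.0012,)),
    "Q2": ((0.0017, 0.0028, 0.02), (0.0012, 0.0022, 0.0124)),
    "Q3": ((0.0460, 0.082, 344.91), (0.0463, 0.0612, 341.98)),
    "Q4": ((0.033, 0.0415, 0.0662), (0.0333, 0.0415, 0.0662)),
    "Q5": ((0.0043, 0.008, 0.04), (0.0015, 0.0028, 0.027)),
    "Q6": ((0.0573, 0.315, 627.43), (0.0508, 0.1118, 482.45)),
    "Q7": ((0.0034, 0.0034, 0.0034), (0.0016, 0.0016, 0.0017)),
    "Q8": ((124.20, 365.73, 675.42), (0.0013, 0.0069, 0.0466)),
    "Q9": ((110.58, 117.68, 185.7), (0.031, 0.0338, 0.1059)),
    "Q10": ((0.0072, 0.0072, 0.0072), (0.0071, 0.0071, 0.0076)),
    "Q11": ((0.0035, 0.0043, 0.0056), (0.0013, 0.0015, 0.0015)),
}

# The three query groups named in the comparison.
GROUPS = {
    "simple lookups": ["Q1", "Q2", "Q5", "Q7", "Q10", "Q11"],
    "hierarchy traversals": ["Q3", "Q4", "Q6"],
    "value filters": ["Q8", "Q9"],
}


def speedups(query):
    """Generic time divided by Specific time, case by case."""
    generic, specific = TIMES[query]
    return [g / s for g, s in zip(generic, specific)]


def max_speedup(queries):
    """Largest generic/specific ratio over a group of queries."""
    return max(factor for q in queries for factor in speedups(q))
```

For instance, the largest ratio in the Q8/Q9 group comes from Q8 in Case 1, 124.20 / 0.0013, which is where the figure of roughly 95,538 originates.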

SLIDE 20

Summary and Outlook

RDFSuite addresses the needs of effective RDF metadata management by providing tools for validation, storage and querying:

  • validation follows a formal data model and constraints enforcing consistency of RDF schemas
  • incremental loading of voluminous description bases in a persistent store
  • declarative query language for schema and data querying

Ongoing efforts:

  • RQL query optimization
  • transactional aspects
  • alternative encoding and representation schemes for access optimization

SLIDE 21

What’s next for the Semantic Web?

W3C Semantic Web activity:

  • RDF Interest Group / Advanced Technology Development
      • “Perform advanced development to design and develop supporting XML and RDF technologies”
      • “Current threads that may lead to WG proposals include query languages and services, alternate XML syntaxes for RDF graphs, and interoperability testbeds”
      • “provide testbeds for early results on implementing technologies in Working Drafts”
      • “stimulate, where necessary, the development and open source availability of key Semantic Web infrastructure components such as parsers and APIs”
  • Formation of WGs on RDF infrastructure components (notably query languages)

SLIDE 22

Acknowledgements

Funding was generously provided by the projects:

  • C-WEB (IST-1999-13479): “A Generic Platform Supporting Community Webs”
  • MESMUSES (IST-2000-26074): “Metaphor for Science Museums”
  • CYCLADES (IST-2000-25456): “An Open Collaborative Virtual Archive Environment”

SLIDE 23

A Formal Data Model for RDF

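$H = (N, <)$ is a well-defined hierarchy of classes/properties iff:

  • $\forall c \in C \Rightarrow c < \mathit{Class}$
  • $\forall p \in P \Rightarrow p < \mathit{Property}$
  • $\forall p_1, p_2 \in P$ and $p_1 < p_2 \Rightarrow \mathit{domain}(p_1) \leq \mathit{domain}(p_2)$ and $\mathit{range}(p_1) \leq \mathit{range}(p_2)$

Type System:

$\tau = \tau_L \mid \tau_U \mid \{\tau\} \mid [\tau] \mid (1{:}\tau_1 + 2{:}\tau_2 + \dots + n{:}\tau_n)$

Interpretation Function:

  • Literal types: $[[\tau_L]] = \mathit{dom}(\tau_L)$
  • Bag types: $[[\{\tau\}]] = \{v_1, v_2, \dots, v_n\}$, where $v_1, v_2, \dots, v_n \in V$ are values of type $\tau$
  • Seq types: $[[\,[\tau]\,]] = \{v_1, v_2, \dots, v_n\}$, where $v_1, v_2, \dots, v_n \in V$ are values of type $\tau$
  • Alt types: $[[(1{:}\tau_1 + 2{:}\tau_2 + \dots + n{:}\tau_n)]] = v_i$, where $v_i \in V$, $1 \leq i \leq n$, is a value of type $\tau_i$
  • $\forall c \in C$: $[[c]] = \{v \mid v \in \pi(c)\} \cup \{\pi(c') \mid c' < c\}$
  • $\forall p \in P$: $[[p]] = \{[v_1, v_2] \mid v_1 \in [[\mathit{domain}(p)]],\ v_2 \in [[\mathit{range}(p)]]\} \cup \{\pi(p') \mid p' < p\}$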
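
The third hierarchy condition lends itself to a mechanical check. The Python sketch below tests it on a toy schema, under the assumption that the relation between the domains (and ranges) of a property and its superproperty is "subclass of or equal to", since the symbol itself is missing from this transcript. The class names, the `creates` / `paints` domains and ranges and the function names are hypothetical, and the hierarchy is assumed to be acyclic.

```python
# Hypothetical schema: each name mapped to its direct parents in the hierarchy.
PARENTS = {
    "Artist": ["Class"],
    "Painter": ["Artist"],
    "Artifact": ["Class"],
    "Painting": ["Artifact"],
    "creates": ["Property"],
    "paints": ["creates"],
}
DOMAIN = {"creates": "Artist", "paints": "Painter"}
RANGE = {"creates": "Artifact", "paints": "Painting"}


def below_or_equal(parents, a, b):
    """True if a equals b or a is a (transitive) descendant of b."""
    if a == b:
        return True
    return any(below_or_equal(parents, up, b) for up in parents.get(a, []))


def specialization_violations(parents, domain, range_):
    """List (p1, p2, side) where p1 < p2 but p1 does not refine p2's domain or range."""
    problems = []
    for p1 in domain:
        for p2 in domain:
            if p1 == p2 or not below_or_equal(parents, p1, p2):
                continue
            if not below_or_equal(parents, domain[p1], domain[p2]):
                problems.append((p1, p2, "domain"))
            if not below_or_equal(parents, range_[p1], range_[p2]):
                problems.append((p1, p2, "range"))
    return problems
```

With the sample schema the check reports no violations; changing the range of `paints` to `Artist` makes it report `("paints", "creates", "range")`, because `Artist` is not below `Artifact`.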

SLIDE 24

A Formal Data Model for RDF

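An RDF schema is a 5-tuple $RS = (V_S, E_S, H, \psi, \lambda)$:

  • $V_S$: a set of nodes
  • $E_S$: a set of edges
  • $H = (N, <)$: a well-formed hierarchy of names
  • $\psi$: an incidence function, $E_S \to V_S \times V_S$
  • $\lambda$: a labeling function, $V_S \cup E_S \to N \cup T$

An RDF description base, instance of a schema $RS$, is a 5-tuple $RD = (V_D, E_D, \psi, \nu, \lambda)$:

  • $V_D$: a set of nodes
  • $E_D$: a set of edges
  • $\psi$: an incidence function, $E_D \to V_D \times V_D$
  • $\nu$: a valuation function, $V_D \to V$
  • $\lambda$: a labeling function, $V_D \cup E_D \to 2^{N \cup T}$, such that: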

u VD, n CT: (u) [[n]] e ED [u,u’], p

SLIDE 25

The RDF to DBMS Loader

[Figure: the loader maps objects of the VRP internal model to DBMS tables through store() calls. Components shown: Extended VRP Validator, Additional Constraints, RDF Loading APIs, RDF Querying APIs, Persistent Namespace (DBMS), DBMS RDF Model. The sample graph has classes C1 and C2, a property P1, and resources r1 and r2.]

VRP model classes and their attributes:

  • Resource: URI
  • RDF_Resource: rdf:type, ...
  • RDF_Class: rdfs:subClassOf
  • RDF_Property: rdfs:domain, rdfs:range, rdfs:subPropertyOf, link_list
  • RDF_Statement: rdf:predicate, rdf:subject, rdf:object

Example store() calls:

  • RDF_Class@2344 (URI ns#C1, rdf:type rdfs#Class): adds ns#C1 to the Class table (c_name)
  • RDF_Resource@7844 (URI r1, rdf:type ns#C1): adds r1 to the ns#C1 table (URI)
  • RDF_Property@5678 (URI ns#P1, rdf:type rdf#Property, rdfs:domain ns#C1, rdfs:range ns#C2, link_list (r1,r2)): adds (ns#P1, ns#C1, ns#C2) to the Property table (p_name, domain, range) and (r1, r2) to the ns#P1 table (source, target)
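
The store() pattern on this slide can be mimicked in a few lines. The Python sketch below is only a stand-in for the loader (the slide's architecture mentions Java APIs and JDBC): the `Database` class, the field names and the `load` helper are assumptions, and Python lists replace real DBMS tables.

```python
from dataclasses import dataclass, field


@dataclass
class Database:
    """Stand-in for the DBMS: two schema tables plus one instance table per name."""
    classes: list = field(default_factory=list)     # rows of (c_name,)
    properties: list = field(default_factory=list)  # rows of (p_name, domain, range)
    instances: dict = field(default_factory=dict)   # table name -> rows


@dataclass
class RDFClass:
    uri: str

    def store(self, db):
        # Register the class and create its (initially empty) instance table.
        db.classes.append((self.uri,))
        db.instances.setdefault(self.uri, [])


@dataclass
class RDFProperty:
    uri: str
    domain: str
    range_: str
    link_list: list = field(default_factory=list)  # (source, target) pairs

    def store(self, db):
        # Register the property, then load its links into its own table.
        db.properties.append((self.uri, self.domain, self.range_))
        db.instances.setdefault(self.uri, []).extend(self.link_list)


@dataclass
class RDFResource:
    uri: str
    types: list

    def store(self, db):
        # Add the resource to the instance table of each class it belongs to.
        for cls in self.types:
            db.instances.setdefault(cls, []).append((self.uri,))


def load(objects, db):
    """Call store() on every model object, in the order given."""
    for obj in objects:
        obj.store(db)
    return db
```

Loading `RDFClass("ns#C1")`, `RDFClass("ns#C2")`, `RDFProperty("ns#P1", "ns#C1", "ns#C2", [("r1", "r2")])` and `RDFResource("r1", ["ns#C1"])` in that order reproduces the rows sketched on the slide.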

SLIDE 26

Query Templates for RDF description bases