1 ICS-FORTH & Univ. of Crete
Dimitris Plexousakis May 2001
The ICS-FORTH RDFSuite: Managing Voluminous RDF Description Bases
(http://www.ics.forth.gr/proj/isst/RDF)
Sofia Alexaki, Vassilis Christophides, Gregory Karvounarakis, Dimitris
The WWW is rapidly evolving into a conceptual structure assimilating
vast information resources of diverse nature (sites, documents,
user communities (corporate, e-marketplaces, etc.) description and brokering services
Large volumes of various types of metadata need to be managed to
fast deployment and easy maintenance
The Semantic Web and RDF
a standard representation language for resource descriptions with a human-readable / machine-understandable syntax enabling content syndication via superimposed resource
interpreted within or across communities using extensible
Fact: several content providers and web portals already adopt RDF,
Our thesis: take advantage of three decades of research in DB
declarative access and logical / physical independence for RDF
The Open Directory Portal: a case study
A Formal Data Model for RDF/S
The RDF Query Language (RQL)
Architecture
Core Middleware:
Testbed: the ODP RDF dump
Representative queries Performance
Summary and Outlook
[Figure legend: typeOf (instance), subClassOf (isA), property]
Namespaces:
ns1: http://www.dmoz.org/topic.rdf
ns2: www.oclc.org/dublincore.rdfs
rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
rdfs: http://www.w3.org/2000/01/rdf-schema#
Node labels: SunScale; Pulitzer Opera; Bedford; Site officiel de Disneyland Paris; Disneyland; Official site of Disneyland Paris
ODP Version: 16-01-2001
170 Mbytes of class hierarchies
700 Mbytes of resource descriptions
337,085 topics
16 hierarchies with
2,342,978 URIs
RDF: Resource Descriptions
Data Model: Directed Labeled Graphs
XML syntax
RDF Schema (RDFS): Schema Vocabularies
Specialization of both classes & properties (simple & multiple)
Multiple classification under several classes
Unordered, optional, and multi-valued properties
Domain and range polymorphism of properties
Declarative query language for RDF description bases
relies on a typed data model (literal & container types + union
follows a functional approach (basic queries and filters)
adapts the functionality of semistructured or XML query languages
Relational interpretation of schemas & resource descriptions
Classes (unary relations)
Properties (binary relations)
Containers (n-ary relations)
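The relational interpretation above can be illustrated with a minimal Python sketch. It is not RQL and not the RDFSuite implementation: the resource names, the assignment of `r2` to `Hotel`, and the helper functions are all hypothetical, chosen only to show classes as unary relations and properties as binary relations.

```python
# Toy description base; every name below is illustrative, not taken from RDFSuite.
subclass_of = {"HotelDirectories": {"Hotel"}}  # direct subClassOf edges

# Classes as unary relations: class name -> directly classified resources
class_ext = {"Hotel": {"r2"}, "HotelDirectories": {"r1"}}

# Properties as binary relations: property name -> set of (source, target) pairs
prop_ext = {"title": {("r1", "SunScale"), ("r2", "Pulitzer Opera")}}


def subclasses(cls):
    """Return cls together with all of its transitive subclasses."""
    result = {cls}
    changed = True
    while changed:
        changed = False
        for sub, supers in subclass_of.items():
            if sub not in result and supers & result:
                result.add(sub)
                changed = True
    return result


def instances(cls):
    """Unary relation of cls, including the instances of its subclasses."""
    return set().union(*(class_ext.get(c, set()) for c in subclasses(cls)))


def titles_of(cls):
    """Join the unary relation of cls with the binary relation 'title'."""
    members = instances(cls)
    return {(s, t) for (s, t) in prop_ext["title"] if s in members}
```

Under these assumptions, `instances("Hotel")` contains both `r1` and `r2`, because the extension of a class includes the extensions of its subclasses.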
Browsing large description bases is cumbersome!
RQL provides powerful path expressions permitting filtering and
E.g., to find (under the Regional ODP hierarchy) URIs of hotels in
The Validating RDF Parser (VRP): Karsten Tolle Diploma Thesis
The first RDF Parser supporting semantic validation of both
The RDF Schema Specific DataBase (RSSDB): Sofia Alexaki
The first RDF Store using schema knowledge to automatically
The RDF Query Language (RQL): Greg Karvounarakis
The first Declarative Language for uniformly querying RDF
VRP Internal RDF Model
[Figure labels: p_name, domain, range; c_name; URI; subcl, supcl; subpr, suppr; source, target]
[Sample values: Resource, title, Literal, Hotel, Hotel Dir, creates, paints]
Resources
id: int   uri: text
1         http://www.dmoz.org/topics.rdfs#Hotel
2         http://www.dmoz.org/topics.rdfs#Hotel Directories
3         http://www.oclc.org/dublincore.rdfs#title
4         http://www.dmoz.org/schema.rdf#Ext.Resource
5         http://www.w3.org/1999/02/22-rdf-syntax-ns#type
6         http://www.w3.org/2000/01/rdf-schema#subClassOf
7         http://www.w3.org/1999/02/22-rdf-syntax-ns#Property
8         http://www.w3.org/2000/01/rdf-schema#Class
9         r1

Triples
predid: int   subid: int   object
6             2            1
5             3            7
5             1            8
5             9            2
3             9            SunScale
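To make the cost of this layout concrete, here is a minimal sketch that loads the rows shown on this slide into an in-memory SQLite database and retrieves the titles of resources classified under a given class. It is an illustration, not RSSDB code: the column names `objid` and `objvalue` (for resource-valued and literal-valued objects) and the helper names are assumptions, and SQLite stands in for whatever DBMS is actually used.

```python
import sqlite3

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DC_TITLE = "http://www.oclc.org/dublincore.rdfs#title"
HOTEL_DIRS = "http://www.dmoz.org/topics.rdfs#Hotel Directories"


def build_generic_store():
    """Create the two generic tables and load the rows from the slide."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Resources (id INTEGER PRIMARY KEY, uri TEXT);
        -- objid / objvalue are assumed names for resource and literal objects
        CREATE TABLE Triples (predid INTEGER, subid INTEGER,
                              objid INTEGER, objvalue TEXT);
    """)
    conn.executemany("INSERT INTO Resources VALUES (?, ?)", [
        (1, "http://www.dmoz.org/topics.rdfs#Hotel"),
        (2, HOTEL_DIRS),
        (3, DC_TITLE),
        (4, "http://www.dmoz.org/schema.rdf#Ext.Resource"),
        (5, RDF_TYPE),
        (6, "http://www.w3.org/2000/01/rdf-schema#subClassOf"),
        (7, "http://www.w3.org/1999/02/22-rdf-syntax-ns#Property"),
        (8, "http://www.w3.org/2000/01/rdf-schema#Class"),
        (9, "r1"),
    ])
    conn.executemany("INSERT INTO Triples VALUES (?, ?, ?, ?)", [
        (6, 2, 1, None),
        (5, 3, 7, None),
        (5, 1, 8, None),
        (5, 9, 2, None),
        (3, 9, None, "SunScale"),
    ])
    return conn


def titles_of_class(conn, class_uri):
    """Titles of resources typed as class_uri: a self-join of Triples
    plus four joins with Resources to resolve identifiers."""
    sql = """
        SELECT r.uri, t2.objvalue
        FROM Triples t1
        JOIN Resources p1 ON p1.id = t1.predid
        JOIN Resources c  ON c.id  = t1.objid
        JOIN Resources r  ON r.id  = t1.subid
        JOIN Triples t2   ON t2.subid = t1.subid
        JOIN Resources p2 ON p2.id = t2.predid
        WHERE p1.uri = ? AND c.uri = ? AND p2.uri = ?
    """
    return conn.execute(sql, (RDF_TYPE, class_uri, DC_TITLE)).fetchall()
```

Even this simple lookup needs a self-join of `Triples` and several joins with `Resources`, which is the pattern that becomes expensive as the tables grow.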
[Figure: schema-specific tables. Recoverable column sets: (subid: int, superid: int) for subclass and subproperty links; (id: int, uri: text) for namespaces; (id: int, nsid: int, lpart: text) for types and classes; (id: int, nsid: int, lpart: text, domainid: int, rangeid: int) for properties; (URI: text) for class instance tables; (source: text, target: text) for property instance tables; (uri: text, classid: int); subtable]
Namespaces: 1 http://www.w3.org/2000/01/rdf-schema#; 2 http://www.w3.org/1999/02/22-rdf-syntax-ns#; 3 http://www.oclc.org/dublincore.rdfs#; 4 http://www.dmoz.org/topics.rdfs#
Sample values: Resource, Bag, Seq, String, Ext.Resource, title, description, Hotel, Hotel Directories, r1, r2, SunScale, Pulitzer Opera
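The idea of a schema-specific layout, with one instance table per class and one per property, can be sketched as follows. This is only an illustration under stated assumptions: the table names `t_hotel`, `t_hotel_directories` and `p_title` are hypothetical, the assignment of `r2` to `Hotel` is invented for the example, and SQLite is used purely for convenience.

```python
import sqlite3


def build_specific_store():
    """One table per class (URI) and one per property (source, target)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE t_hotel (uri TEXT);              -- instances of Hotel
        CREATE TABLE t_hotel_directories (uri TEXT);  -- instances of Hotel Directories
        CREATE TABLE p_title (source TEXT, target TEXT);
    """)
    conn.execute("INSERT INTO t_hotel VALUES ('r2')")
    conn.execute("INSERT INTO t_hotel_directories VALUES ('r1')")
    conn.executemany("INSERT INTO p_title VALUES (?, ?)", [
        ("r1", "SunScale"),
        ("r2", "Pulitzer Opera"),
    ])
    return conn


def hotel_titles(conn):
    """Titles of all hotels, including instances of the subclass table."""
    sql = """
        SELECT p.source, p.target
        FROM p_title p
        JOIN (SELECT uri FROM t_hotel
              UNION
              SELECT uri FROM t_hotel_directories) h ON h.uri = p.source
        ORDER BY p.source
    """
    return conn.execute(sql).fetchall()
```

Here the query touches only the small tables that belong to the class and property involved, instead of joining one large table of all statements with itself.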
DBMS size scales
[Plot labels: size (with indexes); storage time (with indexes)]
DBMS size scales
[Plot labels: (with indexes); storage time (with indexes)]
Query   Generic representation          Specific representation
Q1      0.0015                          0.0012
Q2      0.0017    0.0028    0.02        0.0012    0.0022    0.0124
Q3      0.0460    0.082     344.91      0.0463    0.0612    341.98
Q4      0.033     0.0415    0.0662      0.0333    0.0415    0.0662
Q5      0.0043    0.008     0.04        0.0015    0.0028    0.027
Q6      0.0573    0.315     627.43      0.0508    0.1118    482.45
Q7      0.0034    0.0034    0.0034      0.0016    0.0016    0.0017
Q8      124.20    365.73    675.42      0.0013    0.0069    0.0466
Q9      110.58    117.68    185.7       0.031     0.0338    0.1059
Q10     0.0072    0.0072    0.0072      0.0071    0.0071    0.0076
Q11     0.0035    0.0043    0.0056      0.0013    0.0015    0.0015
Specific Representation permits the customization of the physical
Specific Representation outperforms the Generic Representation for
Q1, Q2, Q5, Q7, Q10, Q11: by a factor up to 3.73
Q3, Q4, Q6: by a factor up to 2.8
Q8, Q9: by a factor up to 95,538
Generic representation pays severe penalty for maintaining large
e.g., queries Q8, Q9 require (self-) joins of Triples, Resources
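The factors quoted above can be reproduced from the timing table. The sketch below assumes that, in each row, the first three values belong to the Generic Representation and the last three to the Specific Representation, and that values in the same position are directly comparable. Only three rows are copied here, and the function name is hypothetical.

```python
# (generic timings, specific timings) per query, copied from the timing table.
# Assumption: values in the same position were measured under the same conditions.
timings = {
    "Q6": ([0.0573, 0.315, 627.43], [0.0508, 0.1118, 482.45]),
    "Q8": ([124.20, 365.73, 675.42], [0.0013, 0.0069, 0.0466]),
    "Q11": ([0.0035, 0.0043, 0.0056], [0.0013, 0.0015, 0.0015]),
}


def max_speedup(generic, specific):
    """Largest generic/specific ratio over the paired measurements."""
    return max(g / s for g, s in zip(generic, specific))


for query, (generic, specific) in timings.items():
    print(query, round(max_speedup(generic, specific), 2))
```

Under these assumptions the largest ratios come out at about 2.8 for Q6, 95,538 for Q8 and 3.73 for Q11, matching the three upper bounds listed on the slide.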
RDFSuite addresses the needs of effective RDF metadata
validation follows a formal data model and constraints enforcing
incremental loading of voluminous description bases in a
declarative query language for schema and data querying
Ongoing efforts:
RQL query optimization
transactional aspects
alternative encoding and representation schemes for access
W3C Semantic Web activity:
RDF Interest Group / Advanced Technology Development
Formation of WGs on RDF infrastructure components (notably
Funding was generously provided by the projects:
C-WEB (IST-1999-13479): “A Generic Platform Supporting
MESMUSES (IST-2000-26074): “Metaphor for Science Museums”
CYCLADES (IST-2000-25456): “An Open Collaborative Virtual
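$H = (N, <)$ is a well-defined hierarchy of classes/properties iff:
$\forall c \in C \Rightarrow c < \mathit{Class}$
$\forall p \in P \Rightarrow p < \mathit{Property}$
$\forall p_1, p_2 \in P$ and $p_1 < p_2 \Rightarrow domain(p_1) \leq domain(p_2)$ and $range(p_1) \leq range(p_2)$
Type System:
Interpretation Function:
Literal types, $[\![ \tau_L ]\!] = dom(\tau_L)$
Bag types, $[\![ \{\tau\} ]\!] = \{v_1, v_2, \dots, v_n\}$, where $v_1, v_2, \dots, v_n \in V$ are values of type $\tau$
Seq types, $[\![ [\tau] ]\!] = \{v_1, v_2, \dots, v_n\}$, where $v_1, v_2, \dots, v_n \in V$ are values of type $\tau$
Alt types, $[\![ (1{:}\tau_1 + 2{:}\tau_2 + \dots + n{:}\tau_n) ]\!] = v_i$, where $v_i \in V$, $1 \leq i \leq n$, is a value of type $\tau_i$
$\forall c \in C,\ [\![ c ]\!] = \{ v \mid v \in \pi(c) \} \cup \{ \pi(c') \mid c' < c \}$
$\forall p \in P,\ [\![ p ]\!] = \{ [v_1, v_2] \mid v_1 \in [\![ domain(p) ]\!],\ v_2 \in [\![ range(p) ]\!] \} \cup \{ \pi(p') \mid p' < p \}$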
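The constraint on subproperties, that a specialized property must have a domain and a range subsumed by those of its superproperty, lends itself to a simple check. The sketch below is not the VRP implementation. The function names are hypothetical, and so are the class names `Artist`, `Painter`, `Artifact` and `Painting`, which are paired here with the `creates` / `paints` properties purely for illustration.

```python
def is_sub_or_equal(sub, sup, parents):
    """True if sub equals sup or reaches it through direct subclass edges."""
    seen, stack = set(), [sub]
    while stack:
        node = stack.pop()
        if node == sup:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    return False


def check_subproperties(sub_property_of, domain, range_, class_parents):
    """Return the (p1, p2) pairs with p1 < p2 that violate the constraint.

    Only direct subproperty edges are checked here.
    """
    violations = []
    for p1, supers in sub_property_of.items():
        for p2 in supers:
            ok = (is_sub_or_equal(domain[p1], domain[p2], class_parents)
                  and is_sub_or_equal(range_[p1], range_[p2], class_parents))
            if not ok:
                violations.append((p1, p2))
    return violations


# Hypothetical schema: paints < creates, Painter < Artist, Painting < Artifact.
class_parents = {"Painter": ["Artist"], "Painting": ["Artifact"]}
sub_property_of = {"paints": ["creates"]}
domain = {"creates": "Artist", "paints": "Painter"}
range_ = {"creates": "Artifact", "paints": "Painting"}
```

With this schema `check_subproperties` returns an empty list. Changing the domain of `paints` to an unrelated class such as `Artifact` makes the pair `("paints", "creates")` show up as a violation.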
An RDF schema is a 5-tuple: $RS = (V_S, E_S, H, \psi, \lambda)$
$V_S$ a set of nodes
$E_S$ a set of edges
$H = (N, <)$ a well-formed hierarchy of names
$\psi$ an incidence function: $E_S \rightarrow V_S \times V_S$
$\lambda$ a labeling function: $V_S \cup E_S \rightarrow N \cup T$
An RDF description base, instance of a schema RS, is a 5-tuple:
$V_D$ a set of nodes
$E_D$ a set of edges
$\psi$ an incidence function: $E_D \rightarrow V_D \times V_D$
$\nu$ a valuation function: $V_D \rightarrow V$
$\lambda$ a labeling function: $V_D \cup E_D \rightarrow 2^{N \cup T}$
[Figure: components: Extended VRP Validator, Additional Constraints, RDF Loading APIs, RDF Querying APIs, Persistent Namespace (DBMS)]
VRP model classes: RDF_Resource, RDF_Class, RDF_Property, RDF_Statement
Example graph: classes C1, C2; property P1; resources r1, r2
store() RDF_Class@2344: URI ns#C1, rdf:type rdfs#Class
store() RDF_Resource@7844: URI r1, rdf:type ns#C1
store() RDF_Property@5678: URI ns#P1, rdf:type rdf#Property, rdfs:domain ns#C1, rdfs:range ns#C2, link_list (r1,r2)
DBMS RDF Model: Class (c_name): ns#C1; Property (p_name, domain, range): ns#P1, ns#C1, ns#C2; ns#C1 (URI): r1; ns#P1 (source, target): r1, r2