

SLIDE 1

Building RDF- and Schema-Based Peer-to-Peer Systems

Wolfgang Nejdl
L3S / University of Hannover, Germany

SLIDE 2

14/10/03 Wolfgang Nejdl

Overview

Relevant L3S Project Background

  • Motivation
  • Project Background - PADLR, Edutella, et al.

Schema-Based Peer-to-Peer Networks

  • Characteristics and Building Blocks
  • Resource Description Framework (RDF) and RDF Schema
  • Edutella Query Service / RDF Query Exchange Language RDF-QEL
  • Semantic Web Inferencing
  • Subscriptions
  • Efficient Routing / HyperCuP & Super-Peers
  • Integration of new Peers / Clustering
  • Distributed Query Processing in P2P Networks
  • Mediation

Summary and Conclusions

SLIDE 3

Motivation

Distributed Peer-to-Peer Infrastructures for the Semantic Web

Semantic Web Metadata Standards for describing (learning) resources and users

How can we use distributed (learning) resources in a personalized way?

Personalized Environments in the Adaptive Web

SLIDE 4

PADLR and Edutella

Personalized Access to Distributed Learning Repositories
(www.learninglab.de/english/projects/padlr.html)

Most important (CS) modules

  • Peer-to-Peer Infrastructure (incl. Clients and Providers)
  • Courseware Watchdog and Metadata Extraction
  • Personalization and Personalized Queries based on Metadata
  • Web-(Learning-) Services

PADLR Participants (Stanford, Hannover, Karlsruhe, Stockholm, Uppsala) + Edutella Participants (Vienna, Berlin, Darmstadt, etc.)

SLIDE 5

Edutella: Goal and Approach

  • Specify and implement an RDF-based metadata infrastructure for P2P networks
  • Developed as part of the open source peer-to-peer project JXTA (edutella.jxta.org)
  • > 60 contributors from various institutions

SLIDE 6

EU/FP6 NoE KnowledgeWeb

[Figure: Knowledge Web. Labels: Semantic Web Services, Languages, Heterogeneity, Dynamics, Scalability]

SLIDE 7

EU/FP6 NoE PROLEARN

Working towards

  • innovative elearning resources
  • interoperable elearning resources and systems
  • sustainable elearning infrastructures and processes for SMEs

SLIDE 8

EU/FP6 NoE REWERSE

Reasoning on the Web with Rules and Semantics

  • Develop reasoning languages for advanced Web systems
  • Test these languages on adaptive Web systems and Web-based decision support systems
  • Bring these languages to the level of pre-standards

Selected applications for proof-of-concept purposes

  • Personalized Web systems
  • Web-based decision support
  • Towards a Bioinformatics Semantic Web


SLIDE 10

Schema-Based Peer-to-Peer Networks

  • User-definable schemas
  • Structured schemas
  • Query language

  • Decentralized control
  • Node autonomy
  • Transient peers
  • Self organization

[Figure: Database Systems, P2P Systems, and Schema-based P2P Systems. Systems shown (system list not complete): CAN, CHORD, DIRECTCONNECT, GNUTELLA, KAZAA, P-GRID, NAPSTER, AMOSII, OBJECTGLOBE, TSIMMIS, TUKWILA, CHATTY WEB, EDUTELLA, PIAZZA, ANY RDBMS, CONCEPTBASE, ONTOBROKER. Axis labels: fixed schema / keywords / key; local / distributed]

SLIDE 11

Building Blocks

  • Flexible Schema Language: to describe complex and heterogeneous resources in the P2P network
  • Expressive Query Language: to retrieve data from heterogeneous data stores
  • Efficient Network Topology: to allow appropriate routing algorithms
  • Mediation Facilities: to integrate and combine (possibly heterogeneous) information

SLIDE 12

RDF / RDF Schema for Describing Distributed Resources

Basic Formalisms for the Semantic Web

  • URIs to identify resources
  • Combine resources and annotate resources with attributes, using <Subject, Property, Value> tuples
  • Graph as basic model, easy to translate to logic facts
  • RDFS allows us to define the RDF vocabulary used (classes and attributes), and thus to represent simple semantic models
  • Possible extensions towards more expressive semantic descriptions, e.g. description logic (DAML+OIL / OWL)

Using RDF / RDFS in the P2P context

  • Distributed annotations for distributed resources
  • Flexible schema definitions, which can be uniquely identified and combined, as well as extended by additional properties
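The <Subject, Property, Value> model and its translation to logic facts can be illustrated with a minimal Python sketch. The example URI, the sample annotations, and the helper name `triple_to_fact` are hypothetical and chosen only for illustration; they are not taken from Edutella.

```python
# Hypothetical annotations: each RDF statement is a (subject, property, value) tuple.
triples = [
    ("http://example.org/course/db1", "dc:title", "Databases I"),
    ("http://example.org/course/db1", "dc:language", "en"),
    ("http://example.org/course/db1", "lom:type", "exercise"),
]


def triple_to_fact(triple):
    """Render one triple as a Datalog-style fact string."""
    subject, prop, value = triple
    return f'triple("{subject}", "{prop}", "{value}").'


# One logic fact per edge of the RDF graph.
facts = [triple_to_fact(t) for t in triples]
for fact in facts:
    print(fact)
```

Because every statement has the same three-part shape, annotations created on different peers can simply be merged into one set of facts.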

SLIDE 13

Characterization of Peers using RDFS

  • Schema level: supporting specific schemas: dc, lom, dcq
  • Property level: supporting specific properties: dc:subject, lom:type, dc:format
  • Property value range: supported ranges for specific properties, e.g. ccs:dbms for dc:subject
  • Property values: specific attribute values, e.g. "exercise" for lom:type, "en" for dc:language
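A rough sketch of how the four levels could be used to decide whether a peer is worth asking. The class `PeerProfile`, its fields, and the precedence (most specific level first) are assumptions made for this illustration, not the Edutella data structures.

```python
from dataclasses import dataclass, field


@dataclass
class PeerProfile:
    """Hypothetical description of what a peer can answer, at four levels."""
    schemas: set = field(default_factory=set)          # schema level
    properties: set = field(default_factory=set)       # property level
    value_ranges: dict = field(default_factory=dict)   # property -> supported ranges
    values: dict = field(default_factory=dict)         # property -> concrete values

    def may_answer(self, prop, value=None):
        """Return True if the peer may hold data for the constraint prop = value."""
        # Assumption: the most specific level that mentions the property decides.
        if prop in self.values:
            return value is None or value in self.values[prop]
        if prop in self.value_ranges:
            return value is None or value in self.value_ranges[prop]
        if prop in self.properties:
            return True
        # Fall back to the schema prefix, e.g. "dc" for "dc:format".
        return prop.split(":", 1)[0] in self.schemas


peer = PeerProfile(
    schemas={"dcq"},
    properties={"dc:format"},
    value_ranges={"dc:subject": {"ccs:dbms"}},
    values={"lom:type": {"exercise"}, "dc:language": {"en"}},
)
print(peer.may_answer("lom:type", "exercise"))
print(peer.may_answer("lom:type", "lecture"))
```

The more specific the level at which a peer is described, the more queries can be filtered out before they are forwarded to it.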

SLIDE 14

RDF-QEL: RDF Query (Exchange) Language

Datalog-based Query Exchange Language (RDF-QEL)

  • RDF-QEL1: conjunctive query
  • up to RDF-QEL5: RDF-QEL4 (SQL3) + general recursion
  • see Nejdl et al.: "EDUTELLA: A P2P Networking Infrastructure Based on RDF", WWW 2002

[Figure: Edutella query data flow between Edutella consumer and Edutella provider. Labels: RDF query, Datalog-based ECDM, RDF-QEL 1-5, local query, repository, result]

  • Datalog is used as the internal data model (ECDM: Edutella Common Data Model) and provided as a set of Java classes
  • RDF is used to represent the queries transmitted between the peers
  • Wrappers for other RDF query languages (RQL, TRIPLE, etc.) and XML query languages (like XPath)
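To make the simplest level concrete, the following sketch evaluates a conjunctive query over a small set of triples. It is written in Python for brevity and does not use the ECDM Java classes; the `?name` variable syntax and the functions `match` and `evaluate` are assumptions of this sketch.

```python
def is_var(term):
    """Variables are written with a leading '?' (an assumption of this sketch)."""
    return isinstance(term, str) and term.startswith("?")


def match(pattern, triple, binding):
    """Extend binding so that pattern equals triple, or return None."""
    result = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            # Bind the variable, or check it against an earlier binding.
            if result.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return result


def evaluate(query, triples):
    """Evaluate a conjunctive query (a list of triple patterns)."""
    bindings = [{}]
    for pattern in query:
        bindings = [
            extended
            for binding in bindings
            for triple in triples
            if (extended := match(pattern, triple, binding)) is not None
        ]
    return bindings


triples = [
    ("r1", "dc:language", "en"),
    ("r1", "lom:type", "exercise"),
    ("r2", "dc:language", "en"),
    ("r2", "lom:type", "lecture"),
]
# "Find English-language exercises": both patterns must hold for the same ?r.
query = [("?r", "dc:language", "en"), ("?r", "lom:type", "exercise")]
result = evaluate(query, triples)
print(result)
```

Each pattern corresponds to one atom in the body of a Datalog rule; the shared variable `?r` expresses the join.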

SLIDE 15

From Querying to Reasoning

World Wide Web: Data as distributed (Web) content

+ Semantic Web: Metadata, i.e. distributed and interoperable (RDF) metadata descriptions about:

  • Content
  • Relationships between the content
  • Learner

+ Semantic Web Inferencing: (Logic) Programs and Rules to:

  • Adapt the content and relationships (links)
  • Infer new metadata

= Declarative and Composable Web Services (see also REWERSE NoE)

[Figure: P2P network connecting Content, Relationships, Content Metadata, Logic Programs, Learner Model]
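As an illustration of inferring new metadata with rules, the sketch below forward-chains a single rule until no new facts appear. The property `ex:prerequisite`, the rule, and the course identifiers are hypothetical examples, not part of the projects described here.

```python
def infer_prerequisites(triples):
    """Apply one hypothetical rule until a fixed point is reached:
    prerequisite(X, Z) <- prerequisite(X, Y), prerequisite(Y, Z)."""
    facts = set(triples)
    while True:
        new = {
            (x, "ex:prerequisite", z)
            for (x, p1, y1) in facts if p1 == "ex:prerequisite"
            for (y2, p2, z) in facts if p2 == "ex:prerequisite" and y1 == y2
        } - facts
        if not new:
            return facts
        facts |= new


metadata = [
    ("course3", "ex:prerequisite", "course2"),
    ("course2", "ex:prerequisite", "course1"),
]
inferred = infer_prerequisites(metadata)
print(sorted(inferred))
```

The derived statement that course3 indirectly requires course1 was never annotated by anyone; it follows from the rule and the existing metadata.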

SLIDE 16

P2P and Semantic Web Inferencing: Edutella as basic infrastructure for ELENA (EU/FP5)

SLIDE 17

Another Possibility: Don't query, subscribe

  • Subscriptions are a good idea, too (get the NYTimes each morning, get new teaching material on P2P topologies …)
  • Example: Selective Information Dissemination in P2P-DIET
  • Instead of Queries and Answers we need
      • Profile forwarding
      • Notification forwarding / Filtering
      • Advertisement forwarding
      • Dynamicity of P2P network: storing notifications / rendezvous
  • See e.g. Koubarakis et al.: Selective Information Dissemination in P2P Networks: Problems and Solutions, SIGMOD Record, Special P2P Issue, September 2003
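The subscription idea can be sketched on a single node as follows. This is a deliberately simplified illustration: the class `SubscriptionPeer` and its methods are hypothetical, profiles are plain attribute-value constraints, and no forwarding between peers is modelled, so it is not the P2P-DIET protocol.

```python
class SubscriptionPeer:
    """Hypothetical peer that stores profiles and filters notifications."""

    def __init__(self):
        self.profiles = {}   # subscriber -> required property values
        self.stored = {}     # subscriber -> notifications kept while offline
        self.online = set()

    def subscribe(self, subscriber, profile):
        self.profiles[subscriber] = profile
        self.online.add(subscriber)

    def disconnect(self, subscriber):
        self.online.discard(subscriber)

    def publish(self, notification):
        """Deliver to matching online subscribers; store for offline ones."""
        delivered = []
        for subscriber, profile in self.profiles.items():
            if all(notification.get(k) == v for k, v in profile.items()):
                if subscriber in self.online:
                    delivered.append(subscriber)
                else:
                    self.stored.setdefault(subscriber, []).append(notification)
        return delivered

    def reconnect(self, subscriber):
        """Rendezvous: hand over everything stored while the subscriber was away."""
        self.online.add(subscriber)
        return self.stored.pop(subscriber, [])


peer = SubscriptionPeer()
peer.subscribe("alice", {"dc:subject": "p2p-topologies", "lom:type": "lecture"})
peer.subscribe("bob", {"dc:language": "en"})
peer.disconnect("bob")

note = {"dc:subject": "p2p-topologies", "lom:type": "lecture", "dc:language": "en"}
delivered_now = peer.publish(note)
delivered_later = peer.reconnect("bob")
print(delivered_now, delivered_later)
```

Storing notifications for disconnected subscribers is one way to cope with transient peers: the subscriber receives what it missed when it rejoins.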

SLIDE 18

P2P and Efficient Routing

How do peer-to-peer networks scale? Requirements:

  • Symmetric topology (every node is a root)
  • Low network diameter (small worlds property, should be O(log n))
  • Limited node degrees (number of peer connections from a node, should be O(log n))
  • Load balancing of traffic
  • Efficient broadcast (receive broadcast messages only once)
  • Adaptable to dynamic number of peers

SLIDE 19

HyperCuP Peer-to-Peer Topology

Details: see e.g. Schlosser, Sintek, Decker, Nejdl: "HyperCuP – Shaping Up Peer-to-Peer Networks", 2nd Intl. WS on Agents and P2P Computing, 2002

SLIDE 20

Hypercube Topology

Broadcast Algorithm

  • Annotate messages with the "dimension" of the peer-to-peer connection, and only forward them along "higher" dimensions

Properties

  • Network diameter, characteristic path length and number of neighbor nodes are O(log_b N)
  • Fault tolerant, vertex-symmetric

[Figure: broadcast over a hypercube of eight nodes in three steps (Step 1, Step 2, Step 3)]
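The broadcast rule can be sketched for a complete binary hypercube (base b = 2). The node numbering, where a node's neighbor along dimension i is obtained by flipping bit i of its number, and the function name `hypercube_broadcast` are assumptions of this sketch; the HyperCuP paper should be consulted for the actual protocol, including how incomplete hypercubes are handled.

```python
from collections import deque


def hypercube_broadcast(dimensions, origin=0):
    """Simulate a broadcast in a complete binary hypercube with 2**dimensions nodes.

    Returns (hops per node, messages sent, duplicate deliveries).
    """
    hops = {origin: 0}
    messages = 0
    duplicates = 0
    # Each entry is (node, dimension on which the message arrived).
    # The origin has not received anything yet, so it forwards on every dimension.
    queue = deque([(origin, -1)])
    while queue:
        node, arrived_on = queue.popleft()
        # Forward only along dimensions higher than the one the message came in on.
        for dim in range(arrived_on + 1, dimensions):
            neighbor = node ^ (1 << dim)  # flip bit `dim` to cross that dimension
            messages += 1
            if neighbor in hops:
                duplicates += 1
                continue
            hops[neighbor] = hops[node] + 1
            queue.append((neighbor, dim))
    return hops, messages, duplicates


hops, messages, duplicates = hypercube_broadcast(3)
print(len(hops), messages, duplicates, max(hops.values()))
```

For eight nodes the simulation reaches every node with N - 1 = 7 messages, no node receives the message twice, and the longest path takes log2(8) = 3 steps, which matches the three steps in the slide's figure.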

SLIDE 21

Super-Peer Networks

  • Observation: Peers vary significantly in availability, bandwidth, processing power, etc.
  • Create network backbone from highly available and powerful peers to distribute load better.
  • See also Yang, Garcia-Molina: Improving Search in P2P Systems, Intl. Conf. on Distributed Computing Systems, Vienna, 2002, or file sharing networks like KaZaa

SLIDE 22

Super-Peers and Routing Indices

Nejdl et al. Super-Peer-Based Routing and Clustering Strategies for RDF-Based Peer-To-Peer Networks. WWW 2003

SLIDE 23

Extension to Distributed Query Processing

Interleave P2P techniques and query processing

  • Push abstract query plans through the super peer network
  • Super peers pick and expand those parts of the query plan that can be executed locally
  • On the fly distribution and expansion of query plans

See Brunkhorst, Dhraief, Kemper, Nejdl, Wiesner: Distributed Queries and Query Optimization in Schema-Based P2P-Systems, VLDB-P2P-Workshop
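A toy sketch of the "pick what can be executed locally, push the rest onward" idea. Here a plan is reduced to a flat list of property constraints and each super-peer is described only by the properties it can evaluate; both simplifications, and the names `expand_plan` and `route`, are assumptions of this sketch rather than the query plans of the cited paper.

```python
def expand_plan(plan, local_properties):
    """Split a plan into the steps this super-peer can run and the remainder."""
    local = [step for step in plan if step[0] in local_properties]
    remaining = [step for step in plan if step[0] not in local_properties]
    return local, remaining


def route(plan, super_peers):
    """Push the plan through super-peers in order; each takes what it can run."""
    assignment = {}
    for name, local_properties in super_peers:
        local, plan = expand_plan(plan, local_properties)
        if local:
            assignment[name] = local
        if not plan:
            break
    return assignment, plan


plan = [("dc:subject", "ccs:dbms"), ("lom:type", "exercise"), ("dc:language", "en")]
super_peers = [("SP1", {"dc:subject"}), ("SP2", {"lom:type", "dc:language"})]
assignment, leftover = route(plan, super_peers)
print(assignment, leftover)
```

Steps left over after the last super-peer are returned unassigned, so the caller can see which part of the plan no super-peer was able to execute.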

SLIDE 24

Clustering for Better Routing

Have to use clustering to make routing indices efficient

  • Query-based clustering: cluster on query and data characteristics, using frequency counting algorithms to identify the most relevant item sets to be included in the indices and to be used for clustering
  • Rule-based clustering: cluster based on user-specified rules (cf. DirectConnect and E-Donkey file sharing networks), which explicitly state the clustering criteria (see Löser, Nejdl, Wolpers, Siberski: Information Integration in Schema-Based P2P Networks, CAiSE 2003)
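To illustrate what counting item sets over a query log might look like, the sketch below counts how often sets of properties occur together in queries and keeps the frequent ones. It uses plain exact counting; the frequency counting algorithms referred to on the slide may differ, and the function name, the query log, and the threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_item_sets(queries, size, min_count):
    """Count co-occurring property sets across queries and keep the frequent ones."""
    counts = Counter()
    for properties in queries:
        # Sort so that the same set always yields the same tuple key.
        for item_set in combinations(sorted(set(properties)), size):
            counts[item_set] += 1
    return {items: n for items, n in counts.items() if n >= min_count}


query_log = [
    ["dc:subject", "dc:language"],
    ["dc:subject", "dc:language", "lom:type"],
    ["dc:subject", "lom:type"],
    ["dc:language", "dc:subject"],
]
frequent = frequent_item_sets(query_log, size=2, min_count=2)
print(frequent)
```

Property combinations that are queried together often are natural candidates for routing-index entries and for grouping peers into clusters.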

SLIDE 25

Mediation: Some P2P-Specific Issues

Which basic assumptions should we take?

  • Peer databases: relational databases or deductive databases based on Datalog (and definitely with minimal model property)
      • Motivation: Moving from keyword-based P2P systems to schema-based systems is a good idea for more general P2P information systems, but these schema-based systems should not be too complicated
  • No global schema, but mapping rules between two peers: range-restricted rules with conjunctive queries in body and head
      • Motivation: These or simpler mapping rules are probably not too difficult to create (a P2P system might need many of them), and they take care of the dynamicity of the P2P environment
  • "Global program" can again be seen as a Datalog program; see Franconi et al., VLDB-P2P-Workshop
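A mapping rule of this kind can be sketched as a pair of conjunctive patterns: the body is evaluated over one peer's facts and the head produces facts in the other peer's vocabulary. The vocabularies (`a:topic`, `a:lang`), the triple-shaped atoms, the `?name` variable syntax, and all function names are assumptions made for this illustration.

```python
def variables(atoms):
    """Collect all variables (terms starting with '?') in a list of atoms."""
    return {t for atom in atoms for t in atom if t.startswith("?")}


def is_range_restricted(body, head):
    """Every variable in the head must also occur in the body."""
    return variables(head) <= variables(body)


def solve(body, facts, binding=None):
    """Yield all variable bindings that satisfy every body atom."""
    binding = binding or {}
    if not body:
        yield binding
        return
    first, rest = body[0], body[1:]
    for fact in facts:
        candidate = dict(binding)
        # Variables bind (or must agree with an earlier binding); constants must match.
        if all(candidate.setdefault(p, f) == f if p.startswith("?") else p == f
               for p, f in zip(first, fact)):
            yield from solve(rest, facts, candidate)


def apply_mapping(body, head, facts):
    """Translate facts of one peer into the vocabulary of another peer."""
    if not is_range_restricted(body, head):
        raise ValueError("head uses a variable that the body never binds")
    return {tuple(b.get(t, t) for t in atom)
            for b in solve(body, facts) for atom in head}


peer_a_facts = [("r1", "a:topic", "databases"), ("r1", "a:lang", "en")]
body = [("?x", "a:topic", "?t"), ("?x", "a:lang", "?l")]
head = [("?x", "dc:subject", "?t"), ("?x", "dc:language", "?l")]
mapped = apply_mapping(body, head, peer_a_facts)
print(sorted(mapped))
```

Range restriction guarantees that every fact produced by the head is fully determined by data that actually exists at the source peer, so no unbound values leak into the target vocabulary.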

SLIDE 26

Mediation: Some P2P-Specific Issues

Further observations

  • Acyclic mapping rules might actually be sufficient (see also Piazza: Halevy et al., WWW 2003 (Peer data management systems: Infrastructure for the Semantic Web), ICDE 2003)
  • Cycles in the mapping rules might not be meant as recursions, but could be used for checking the quality / completeness of mapping rules (see also Aberer et al.: The Chatty Web, WWW 2003)
  • Mapping / matching vocabularies might be sufficient, too (see He, Chang: Statistical Schema Matching across Web Query Interfaces, SIGMOD 2003)


SLIDE 28

Summary and Conclusions

  • Schema-based P2P networks and P2P-based data management infrastructures build upon traditional P2P networks and distributed / heterogeneous database research, while posing new challenges and offering additional functionalities
  • Building blocks are flexible / extendable schema languages (such as RDFS), expressive query and reasoning languages, efficient network topologies as well as routing and clustering algorithms, and finally advanced, but not too complicated, data integration and mediation functionalities
  • See also SIGMOD Record September 2003, Special P2P Issue: Nejdl, Siberski, Sintek: "Design Issues and Challenges for RDF- and Schema-Based Peer-to-Peer Systems"