Federated Semantic Data Management, 25-30 June 2017, Dagstuhl



slide-1
SLIDE 1

Federated Semantic Data Management

Hala Skaf-Molli, Pascal Molli
Université de Nantes
25-30 June 2017, Dagstuhl, Germany

slide-2
SLIDE 2

NANTES

slide-3
SLIDE 3

GDD: Distributed Data Management Group

Foundations of Distributed Systems

  • Distributed algorithms
  • Distributed data structures
  • Consistency criteria & protocols
  • Fog Computing

Distributed Data Management

  • Federated Query Processing:
    ○ source selection, decomposition, optimization, operators
  • Data Integration
  • Replication, synchronization & consistency
  • Queries in the Fog
slide-4
SLIDE 4

Replication, synchronization, Consistency

How can we improve linked data together? I see a mistake: how do I fix it, especially if I cannot edit the data?

  • Idea: Replicate and synchronize… git for RDF data...
  • Live Linked Data: synchronising semantic stores with commutative replicated data types. JMSO13
  • Towards Writable and Scalable Linked Open Data. ISWC14
slide-5
SLIDE 5

Replication, synchronization, Consistency

LinkedCT

Lct:intervention1 [ Lct:type Drug . Lct:condition Lct:T-Cell-Lymphoma rdfs:label ‘Alemtuzumab’ . rdfs:seeAlso wiwiss-berlin:DB00087 ]

wifo5-mannheim.de:DB00087 rdf:type Drug

Wiwiss-berlin.de:DB00087 rdf:type Drug

I’m ready to fix the problem. How can I update?

slide-6
SLIDE 6

MyOrg (My Endpoint)

wifo5-mannheim.de:DB00087 DB:Half-Life 288h

Lct:intervention1 [ Lct:type Drug . Lct:condition Lct:T-Cell-Lymphoma rdfs:label ‘Alemtuzumab’ . rdfs:seeAlso wiwiss-berlin:DB00087 ]
wifo5-mannheim.de:DB00087 DB:Half-Life 288h

Lct:intervention1 rdfs:seeAlso wifo5-mannheim:DB00087

Lct:intervention1 [ Lct:type Drug . Lct:condition Lct:T-Cell-Lymphoma rdfs:label ‘Alemtuzumab’ . rdfs:seeAlso wifo5-mannheim.de:DB00087 ]

Update Feed, Update Feed

CONSTRUCT WHERE { ?x rdfs:seeAlso ?y }
CONSTRUCT WHERE { ?x rdfs:seeAlso ?y }
CONSTRUCT WHERE { ?x rdf:type drugbank:drug }
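The replicate-and-synchronize idea on this slide can be illustrated with a small sketch. It assumes a replica is defined by a single triple pattern (mirroring `CONSTRUCT WHERE { ?x rdfs:seeAlso ?y }`) and that an update feed is a list of insert/delete operations. The function names and the feed format are hypothetical, and this is not the CRDT-based protocol of the cited papers.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
# None plays the role of a SPARQL variable in the pattern.
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]

def matches(pattern: Pattern, triple: Triple) -> bool:
    """True if the triple fits the pattern (None matches anything)."""
    return all(p is None or p == t for p, t in zip(pattern, triple))

def materialize(pattern: Pattern, source: Set[Triple]) -> Set[Triple]:
    """Copy the fragment of the source selected by the pattern."""
    return {t for t in source if matches(pattern, t)}

def apply_feed(replica: Set[Triple], pattern: Pattern,
               feed: List[Tuple[str, Triple]]) -> Set[Triple]:
    """Apply only the feed operations that concern the replicated fragment."""
    for op, triple in feed:
        if not matches(pattern, triple):
            continue  # outside the fragment: ignore
        if op == "insert":
            replica.add(triple)
        elif op == "delete":
            replica.discard(triple)
    return replica

source = {
    ("Lct:intervention1", "rdfs:seeAlso", "wiwiss-berlin:DB00087"),
    ("Lct:intervention1", "Lct:type", "Drug"),
}
see_also: Pattern = (None, "rdfs:seeAlso", None)
replica = materialize(see_also, source)

# Hypothetical feed that carries the link correction from the slide.
feed = [
    ("delete", ("Lct:intervention1", "rdfs:seeAlso", "wiwiss-berlin:DB00087")),
    ("insert", ("Lct:intervention1", "rdfs:seeAlso", "wifo5-mannheim:DB00087")),
    ("insert", ("wifo5-mannheim.de:DB00087", "DB:Half-Life", "288h")),
]
replica = apply_feed(replica, see_also, feed)
```

The `DB:Half-Life` triple is skipped because it lies outside the `rdfs:seeAlso` fragment; a replica that also wanted it would subscribe to a second pattern.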

slide-7
SLIDE 7

Data Integration

How to query the deep web and linked data with SPARQL?

  • Idea: Local-as-view mediator with smart materialization of views
  • SemLAV: Local-as-view mediation for SPARQL queries. TLDKS14
  • SemLAV: Querying deep web and linked open data with SPARQL. ESWC14 (demo)
  • Gun: An efficient execution strategy for querying the web of data. DEXA 2013

slide-8
SLIDE 8

SELECT DISTINCT * WHERE {
  ?P foaf:member ?C .
  ?C rdfs:label "Semantic Web" .
  ?P foaf:knows ?WKP .
  ?WKP foaf:name "Barack Obama"
}

[Figure: Client; SemLAV with Global Schema and Query Executor; views labelled with foaf:name, rdfs:label and foaf:member]

slide-9
SLIDE 9

Query:

Q(P,C,WKP,N) :- member(P,C), label(C,"Semantic Web"), knows(P,WKP), name(WKP,"Barack Obama")

LAV mappings:

v1(P,A,I,C,L) :- made(P,A), affiliation(P,I), member(P,C), label(C,L)
v2(A,T,P,N,C) :- title(A,T), made(P,A), name(P,N), member(P,C)
v3(P,N,R,M) :- name(P,N), name(R,M), knows(P,R)
v4(P,N,G,R,C) :- name(P,N), gender(P,G), knows(P,R), member(P,C)
v5(P,N,R,C,L) :- name(P,N), knows(P,R), member(P,C), label(C,L)

member(P,C)      label(C,L)       knows(P,WKP)     name(WKP,N)
v5(P,N,R,C,L)    v5(P,N,R,C,L)    v5(P,N,R,C,L)    v5(P,N,R,C,L)    4
v4(P,N,G,R,C)    v1(P,A,I,C,L)    v4(P,N,G,R,C)    v4(P,N,G,R,C)    3
v1(P,A,I,C,L)                     v3(P,N,R,M)      v2(A,T,P,N,C)    2
v2(A,T,P,N,C)                                      v3(P,N,R,M)      2
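As a rough illustration of how the query subgoals relate to the LAV mappings above, the following sketch groups views into one bucket per subgoal and counts how many subgoals each view covers. It matches on predicate names only and ignores arguments and constants, which is a simplifying assumption; the helper names are hypothetical and this is not presented as SemLAV's actual algorithm.

```python
from collections import Counter

# Query subgoals and view bodies, reduced to predicate names.
query = ["member", "label", "knows", "name"]
views = {
    "v1": ["made", "affiliation", "member", "label"],
    "v2": ["title", "made", "name", "member"],
    "v3": ["name", "name", "knows"],
    "v4": ["name", "gender", "knows", "member"],
    "v5": ["name", "knows", "member", "label"],
}

def build_buckets(query, views):
    """One bucket per subgoal: the views whose body mentions its predicate."""
    return {goal: [v for v, body in views.items() if goal in body]
            for goal in query}

def coverage(buckets):
    """Number of query subgoals each view appears for."""
    return Counter(v for bucket in buckets.values() for v in bucket)

buckets = build_buckets(query, views)
covered = coverage(buckets)
# Views covering more subgoals come first; ties are broken by name.
ranked = sorted(views, key=lambda v: (-covered[v], v))
```

Under these assumptions `v5` covers all four subgoals and `v4` covers three, so a mediator that loads views in `ranked` order would start with the views most likely to contribute answers.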

slide-10
SLIDE 10

Federated Queries & Replication

  • How to improve data availability of the linked data?
    ○ Idea: Partial data replication to create new data locality and smart source selection
      ■ Federated SPARQL Queries Processing with Replicated Fragments. ISWC15
    ○ Idea: Partial data replication and query decomposition
      ■ Decomposing federated queries in presence of replicated fragments. JWS17

slide-11
SLIDE 11

Data replication & query decomposition

  • Consider a BGP with three triple patterns tp1, tp2, and tp3.
    ○ Endpoint C1 is relevant for tp1 and tp3,
    ○ Endpoint C2 is relevant for tp1 and tp2.
    ○ tp1@C1 = tp1@C2
  • Existing source selection strategies prevent assigning tp1.tp3 to C1 and tp1.tp2 to C2, even if these sub-queries generate fewer intermediate results...
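The example above can be sketched in a few lines. The sketch contrasts a strategy that sends each triple pattern to a single endpoint with one that lets a replicated pattern appear in several sub-queries. Both functions are hypothetical illustrations, and the second is only valid under the slide's premise that tp1 returns the same data at C1 and C2.

```python
# Relevant endpoints per triple pattern, as stated on the slide.
relevant = {"tp1": {"C1", "C2"}, "tp2": {"C2"}, "tp3": {"C1"}}

def single_source(relevant):
    """Each triple pattern goes to exactly one endpoint (smallest name wins)."""
    groups = {}
    for tp, endpoints in relevant.items():
        groups.setdefault(min(endpoints), []).append(tp)
    return {e: sorted(tps) for e, tps in sorted(groups.items())}

def replication_aware(relevant):
    """A replicated pattern may be repeated in every relevant sub-query.

    Assumes replicas of a pattern hold identical data, so repeating the
    pattern filters intermediate results without changing the answers.
    """
    groups = {}
    for tp, endpoints in relevant.items():
        for e in endpoints:
            groups.setdefault(e, []).append(tp)
    return {e: sorted(tps) for e, tps in sorted(groups.items())}

classic = single_source(relevant)
decomposed = replication_aware(relevant)
```

With the single-source assignment tp2 is evaluated alone at C2 and joined at the query engine; with the replication-aware one both endpoints evaluate a two-pattern join locally, which is the decomposition the slide argues for.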

slide-12
SLIDE 12

Federated queries & Replication

  • How to improve query performance on the linked data?
  • Idea: Partial replication and intra-query parallelization
  • PeNeLoop: Parallelizing Federated SPARQL Queries in Presence of Replicated Fragments. QUWEDA@ESWC17

slide-13
SLIDE 13

PeNeLoop Query Processing

SELECT ?movie ?name WHERE {
  ?movie dbo:director ?director .   (tp1)
  ?movie lmdb:genre ?genre .        (tp2)
  ?genre lmdb:genre_name ?name .    (tp3)
}

{ tp1. tp2. }
Join ⋈ performed locally at E2
Both E1 & E3 are used to process the join

{ ?movie = dbo:Seven_Samurai, ?name = "Samurai movie" }

[Figure: endpoints E1, E2, E3; Start; operators π and ⋈; mappings M1, M2, M3, M4, M6, M7]
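One way to picture "both E1 & E3 are used to process the join" is a nested-loop join whose probes are spread over the endpoints that replicate the same fragment. The sketch below simulates this in memory. The placement of the tp3 fragment on E1 and E3, the sample data, and the function names are assumptions made for illustration, not details taken from PeNeLoop.

```python
from concurrent.futures import ThreadPoolExecutor

GENRE_NAME = "lmdb:genre_name"

# Hypothetical replicated fragment for tp3, held by both E1 and E3.
fragment = [
    ("lmdb:genre/1", GENRE_NAME, "Samurai movie"),
    ("lmdb:genre/2", GENRE_NAME, "Other genre"),
]
endpoints = {"E1": list(fragment), "E3": list(fragment)}

# Mappings assumed to come from evaluating { tp1 . tp2 } at E2.
left = [
    {"?movie": "dbo:Seven_Samurai", "?genre": "lmdb:genre/1"},
    {"?movie": "ex:movie2", "?genre": "lmdb:genre/2"},
]

def probe(endpoint, mapping):
    """Evaluate tp3 at one endpoint with ?genre bound by the mapping."""
    return [{**mapping, "?name": o}
            for s, p, o in endpoints[endpoint]
            if s == mapping["?genre"] and p == GENRE_NAME]

def parallel_join(left, replicas):
    """Distribute the probes round-robin over the replicas, in parallel."""
    jobs = [(replicas[i % len(replicas)], m) for i, m in enumerate(left)]
    with ThreadPoolExecutor(max_workers=len(replicas)) as pool:
        parts = list(pool.map(lambda job: probe(*job), jobs))
    return [row for part in parts for row in part]

results = parallel_join(left, ["E1", "E3"])
# Projection π on ?movie and ?name.
projected = [{k: r[k] for k in ("?movie", "?name")} for r in results]
```

Because the replicas hold identical data, it does not matter which endpoint answers a given probe; the round-robin split only balances the load.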

slide-14
SLIDE 14

Queries in the Fog

How to get both data availability *and* performance?

  • Idea: P2P resource sharing, but on the client side… in the fog of browsers
    ○ CyCLaDEs: A Decentralized Cache for Linked Data Fragments. ESWC 2016
    ○ SPARQL Queries in the Fog of Browsers. Demo@ESWC 2017

slide-15
SLIDE 15

[Figure: clients c1-c9, an HTTP Cache, and an LDF Server (DrugBank, DBpedia)]

slide-16
SLIDE 16

SPARQL Queries in the Fog of Browsers

Fog of Browsers: a P2P network of browsers with browser-to-browser connections (WebRTC)

WebRTC: https://webrtc.org/

slide-17
SLIDE 17

FoB with Triple Pattern Fragments

Servers run TPF servers (TPFs)
Browsers run TPF clients (TPFc): C1, C2...

TPF: Verborgh, Ruben, et al. "Triple Pattern Fragments: A low-cost knowledge graph interface for the Web." Web Semantics: Science, Services and Agents on the World Wide Web 37 (2016): 184-206.

slide-18
SLIDE 18

Clients receive SPARQL queries...

Any client can receive SPARQL queries at any time...

W1: Q1, Q2, Q3, Q4
W2: Q5, Q6

[Figure: TPF clients (TPFc) C1 and C2, and the TPF servers (TPFs)]


slide-19
SLIDE 19

Clients receive SPARQL queries...

Any client can receive SPARQL queries at any time. It can execute them itself or delegate some to its neighbours: client-side inter-query parallelism.

  • Q4@C4, Q3@C3...

W1: Q1, Q2, Q3, Q4
W2: Q5, Q6

[Figure: clients C1 to C5 and the TPF servers (TPFs); Q4 is delegated to a neighbour]

slide-20
SLIDE 20

Clients receive SPARQL queries...

W1: Q1, Q2, Q3, Q4
W2: Q5, Q6

Can we reduce the global Execution Time (ET) of W1 and W2 by delegating queries to neighbours? Is ET(W1@C1 // W2@C2) > ET({W1 ∪ W2}@{C1-C5})?

[Figure: clients C1 and C2 and the TPF servers (TPFs)]
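The question on this slide can be made concrete with a toy model. The sketch assumes each query has a fixed cost, each client runs its queries one after another, and ET is the finishing time of the busiest client. The costs are made-up numbers, and delegation overhead and TPF server contention are ignored, so the result only illustrates the comparison and does not answer it.

```python
import heapq

# Hypothetical per-query costs in arbitrary time units.
cost = {"Q1": 4, "Q2": 3, "Q3": 2, "Q4": 5, "Q5": 1, "Q6": 6}
W1 = ["Q1", "Q2", "Q3", "Q4"]
W2 = ["Q5", "Q6"]

def execution_time(assignment):
    """ET of a setup: the load of the busiest client."""
    return max(sum(cost[q] for q in queries)
               for queries in assignment.values())

def delegate(queries, clients):
    """Greedy delegation: give the next most costly query to the idlest client."""
    heap = [(0, c) for c in clients]
    heapq.heapify(heap)
    assignment = {c: [] for c in clients}
    for q in sorted(queries, key=lambda q: -cost[q]):
        load, client = heapq.heappop(heap)
        assignment[client].append(q)
        heapq.heappush(heap, (load + cost[q], client))
    return assignment

# ET(W1@C1 // W2@C2): each client keeps its own workload.
et_local = execution_time({"C1": W1, "C2": W2})
# ET({W1 ∪ W2}@{C1-C5}): queries are spread over five clients.
et_delegated = execution_time(delegate(W1 + W2, ["C1", "C2", "C3", "C4", "C5"]))
```

In this toy setting delegation lowers ET from 14 to 6 time units; whether the gain survives real network and server costs is the open question the slide raises.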

slide-21
SLIDE 21

ladda-demo.herokuapp.com

slide-22
SLIDE 22
slide-23
SLIDE 23

GDD Research Group Distributed Data Management

  • P. Molli, H. Skaf

MCF, Univ Nantes
25-30 June 2017, Dagstuhl, Germany