Federated Semantic Data Management, 25-30 June 2017, Dagstuhl, Germany
Hala Skaf-Molli, Pascal Molli
Université de Nantes
GDD: Distributed Data Management Group, Nantes
Foundations of Distributed Systems
- Distributed algorithms
- Distributed Data Structures
- Consistency criteria & protocols
- Fog Computing
Distributed Data Management
- Federated Query Processing:
○ source selection, decomposition, optimization, operators
- Data Integration
- Replication, synchronization & consistency
- Queries in the Fog
Replication, synchronization, Consistency
How can we improve linked data together? If I see a mistake, how can I fix it, especially if I cannot edit the data?
- Idea: replicate and synchronize, like git for RDF data
- Live Linked Data: synchronising semantic stores with commutative replicated data types. JMSO13
- Towards Writable and Scalable Linked Open Data. ISWC14
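The CRDT idea behind the first paper can be illustrated with a generic observed-remove set (OR-Set) of triples. This is a minimal sketch, not the authors' SU-Set implementation: it assumes every operation is broadcast to all replicas and delivered in causal order, and the class and method names are hypothetical. It shows why a concurrent insert and delete of the same triple converge whatever the delivery order.

```python
import uuid
from dataclasses import dataclass, field

Triple = tuple[str, str, str]


@dataclass
class TripleORSet:
    """Hypothetical OR-Set of RDF triples: a generic CRDT sketch, not SU-Set itself."""

    # Each visible triple maps to the unique tags of the inserts that added it.
    entries: dict[Triple, set[str]] = field(default_factory=dict)

    def insert(self, triple: Triple) -> tuple:
        """Local insert: tag it with a fresh unique id and return the op to broadcast."""
        op = ("ins", triple, uuid.uuid4().hex)
        self.apply(op)
        return op

    def delete(self, triple: Triple) -> tuple:
        """Local delete: remove only the tags this replica has observed."""
        op = ("del", triple, frozenset(self.entries.get(triple, set())))
        self.apply(op)
        return op

    def apply(self, op: tuple) -> None:
        """Apply a local or remote op (assumes causal delivery between replicas)."""
        kind, triple, payload = op
        if kind == "ins":
            self.entries.setdefault(triple, set()).add(payload)
        else:
            remaining = self.entries.get(triple, set()) - payload
            if remaining:
                self.entries[triple] = remaining
            else:
                self.entries.pop(triple, None)

    def triples(self) -> set[Triple]:
        return set(self.entries)


t = ("Lct:intervention1", "rdfs:seeAlso", "wiwiss-berlin:DB00087")
a, b = TripleORSet(), TripleORSet()
b.apply(a.insert(t))      # both replicas now hold t
op_del = a.delete(t)      # a deletes t ...
op_ins = b.insert(t)      # ... while b concurrently re-inserts it
a.apply(op_ins)
b.apply(op_del)
assert a.triples() == b.triples() == {t}  # both converge; the concurrent insert survives
```

Tagging each insert means a delete only removes what the deleting replica has already seen, which is what lets the two concurrent operations commute.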
LinkedCT: Lct:intervention1 [ Lct:type Drug . Lct:condition Lct:T-Cell-Lymphoma rdfs:label ‘Alemtuzumab’ . rdfs:seeAlso wiwiss-berlin:DB00087 ]
wifo5-mannheim.de:DB00087 rdf:type Drug
wiwiss-berlin.de:DB00087 rdf:type Drug
I’m ready to fix the problem. How can I update?
Replication, synchronization, Consistency
MyOrg (My Endpoint), with update feeds whose fragments are defined by CONSTRUCT WHERE { ?x rdfs:seeAlso ?y } (two feeds) and CONSTRUCT WHERE { ?x rdf:type drugbank:drug }:
- Lct:intervention1 [ Lct:type Drug . Lct:condition Lct:T-Cell-Lymphoma rdfs:label ‘Alemtuzumab’ . rdfs:seeAlso wiwiss-berlin:DB00087 ]
- wifo5-mannheim.de:DB00087 DB:Half-Life 288h
- Local update: Lct:intervention1 rdfs:seeAlso wifo5-mannheim:DB00087
- Updated description: Lct:intervention1 [ Lct:type Drug . Lct:condition Lct:T-Cell-Lymphoma rdfs:label ‘Alemtuzumab’ . rdfs:seeAlso wifo5-mannheim.de:DB00087 ]
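Here each replicated fragment is defined by a CONSTRUCT query whose WHERE clause is a single triple pattern. A minimal sketch, assuming an update feed is a list of ("insert" or "delete", triple) operations (a hypothetical format, not the feed format of the cited work), of how a replica could keep only the operations that fall inside its fragment:

```python
Triple = tuple[str, str, str]
Pattern = tuple[str, str, str]  # terms starting with "?" are variables


def matches(pattern: Pattern, triple: Triple) -> bool:
    """True if the triple is an instance of the single triple pattern."""
    return all(p.startswith("?") or p == t for p, t in zip(pattern, triple))


def apply_feed(fragment: Pattern, replica: set[Triple],
               feed: list[tuple[str, Triple]]) -> set[Triple]:
    """Apply a feed of ("insert" | "delete", triple) ops, keeping only fragment triples."""
    result = set(replica)
    for op, triple in feed:
        if not matches(fragment, triple):
            continue  # outside the replicated fragment: ignored by this replica
        if op == "insert":
            result.add(triple)
        elif op == "delete":
            result.discard(triple)
    return result


seealso = ("?x", "rdfs:seeAlso", "?y")  # CONSTRUCT WHERE { ?x rdfs:seeAlso ?y }
feed = [
    ("insert", ("Lct:intervention1", "rdfs:seeAlso", "wiwiss-berlin:DB00087")),
    ("insert", ("wifo5-mannheim.de:DB00087", "DB:Half-Life", "288h")),
    ("delete", ("Lct:intervention1", "rdfs:seeAlso", "wiwiss-berlin:DB00087")),
    ("insert", ("Lct:intervention1", "rdfs:seeAlso", "wifo5-mannheim.de:DB00087")),
]
replica = apply_feed(seealso, set(), feed)
```

Patterns that repeat a variable (e.g. ?x ?p ?x) are not handled by this simplified matcher.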
Data Integration
How can we query the deep web and linked data with SPARQL?
- Idea: a Local-as-View mediator with smart materialization of views
- SemLAV: Local-as-view mediation for SPARQL queries. TLDKS14
- SemLAV: Querying deep web and linked open data with SPARQL. ESWC14 (demo)
- GUN: An efficient execution strategy for querying the web of data. DEXA2013
SELECT DISTINCT * WHERE { ?P foaf:member ?C . ?C rdfs:label "Semantic Web" . ?P foaf:knows ?WKP . ?WKP foaf:name "Barack Obama" }
[Figure: SemLAV architecture: a client sends its query over the global schema to the SemLAV query executor, which accesses sources described with foaf:name, rdfs:label and foaf:member.]
Query:
Q(P,C,WKP) :- member(P,C), label(C,"Semantic Web"), knows(P,WKP), name(WKP,"Barack Obama")
LAV mappings:
v1(P,A,I,C,L) :- made(P,A), affiliation(P,I), member(P,C), label(C,L)
v2(A,T,P,N,C) :- title(A,T), made(P,A), name(P,N), member(P,C)
v3(P,N,R,M) :- name(P,N), name(R,M), knows(P,R)
v4(P,N,G,R,C) :- name(P,N), gender(P,G), knows(P,R), member(P,C)
v5(P,N,R,C,L) :- name(P,N), knows(P,R), member(P,C), label(C,L)
Buckets (views relevant to each query subgoal):
- member(P,C): v5, v4, v1, v2
- label(C,L): v5, v1
- knows(P,WKP): v5, v4, v3
- name(WKP,N): v5, v4, v2, v3
Subgoals covered per view: v5: 4, v4: 3, v1: 2, v2: 2, v3: 2
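The buckets above can be computed mechanically. A minimal sketch, assuming subgoals are matched to view bodies by predicate name only (a real LAV rewriter must also check that distinguished variables and joins are preserved), with views ranked by how many query subgoals they cover, which is our reading of the counts on the slide. Function names are hypothetical.

```python
from collections import Counter

# Query subgoals and LAV view bodies, predicates only (from the mappings above).
QUERY = ["member", "label", "knows", "name"]
VIEWS = {
    "v1": ["made", "affiliation", "member", "label"],
    "v2": ["title", "made", "name", "member"],
    "v3": ["name", "name", "knows"],
    "v4": ["name", "gender", "knows", "member"],
    "v5": ["name", "knows", "member", "label"],
}


def buckets(query: list[str], views: dict[str, list[str]]) -> dict[str, list[str]]:
    """For each subgoal, the views whose body uses the same predicate."""
    return {g: [v for v, body in views.items() if g in body] for g in query}


def rank_views(query: list[str], views: dict[str, list[str]]) -> list[tuple[str, int]]:
    """Views ordered by number of distinct query subgoals covered, highest first."""
    cover = Counter()
    for _, relevant in buckets(query, views).items():
        for v in relevant:
            cover[v] += 1
    return sorted(cover.items(), key=lambda kv: (-kv[1], kv[0]))


print(rank_views(QUERY, VIEWS))  # [('v5', 4), ('v4', 3), ('v1', 2), ('v2', 2), ('v3', 2)]
```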
Federated Queries & Replication
- How can we improve the availability of linked data?
○ Idea: partial data replication to create new data locality, plus smart source selection
■ Federated SPARQL Queries Processing with Replicated Fragments. ISWC15
○ Idea: partial data replication and query decomposition
■ Decomposing federated queries in presence of replicated fragments. JWS17
Data replication & query decomposition
- Consider a BGP with three triple patterns tp1, tp2, and tp3.
○ Endpoint C1 is relevant for tp1 and tp3.
○ Endpoint C2 is relevant for tp1 and tp2.
○ tp1@C1 = tp1@C2 (tp1 returns the same answers at C1 and C2, i.e., its fragment is replicated)
- Existing source selection strategies prevent assigning tp1.tp3 to C1 and tp1.tp2 to C2, even when these sub-queries generate fewer intermediate results.
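The decomposition issue can be made concrete on the tp1/tp2/tp3 example. In the hypothetical sketch below, the first function is a stand-in for a strategy that sends each triple pattern to a single endpoint (which endpoint is chosen is arbitrary here), and the second lets every endpoint take all the patterns it can answer, so the replicated tp1 lands in both sub-queries. Neither function is the algorithm of the cited papers; a real decomposer would also check that each sub-query is join-connected and would rely on cost estimates.

```python
Relevance = dict[str, set[str]]  # triple pattern -> endpoints able to answer it


def one_source_per_pattern(relevance: Relevance) -> dict[str, list[str]]:
    """Stand-in baseline: each pattern goes to exactly one relevant endpoint
    (here, arbitrarily, the alphabetically first one)."""
    plan: dict[str, list[str]] = {}
    for tp, endpoints in sorted(relevance.items()):
        plan.setdefault(min(endpoints), []).append(tp)
    return plan


def replica_aware(relevance: Relevance) -> dict[str, list[str]]:
    """Every endpoint takes all the patterns it can answer, so a replicated
    pattern (tp1 here) may appear in several sub-queries."""
    plan: dict[str, list[str]] = {}
    for tp, endpoints in sorted(relevance.items()):
        for endpoint in sorted(endpoints):
            plan.setdefault(endpoint, []).append(tp)
    return plan


relevance = {"tp1": {"C1", "C2"}, "tp2": {"C2"}, "tp3": {"C1"}}
print(one_source_per_pattern(relevance))  # {'C1': ['tp1', 'tp3'], 'C2': ['tp2']}
print(replica_aware(relevance))           # {'C1': ['tp1', 'tp3'], 'C2': ['tp1', 'tp2']}
```

With the baseline, tp2 is shipped alone to C2 and its full answer set must be joined by the federated engine; the replica-aware plan lets C2 join tp1 and tp2 locally.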
Federated queries & Replication
- How can we improve query performance on linked data?
- Idea: partial replication and intra-query parallelization
- PeNeLoop: Parallelizing Federated SPARQL Queries in Presence of Replicated Fragments. QUWEDA@ESWC17
PeNeLoop Query Processing
[Figure: PeNeLoop plan over endpoints E1, E2 and E3 (operators π and ⋈): the sub-query { tp1 . tp2 } is joined locally at E2, and both E1 and E3 are used to process the join; example result { ?movie = dbo:Seven_Samurai, ?name = "Samurai movie" }.]
SELECT ?movie ?name WHERE {
  ?movie dbo:director ?director .   # tp1
  ?movie lmdb:genre ?genre .        # tp2
  ?genre lmdb:genre_name ?name .    # tp3
}
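PeNeLoop exploits replicas for intra-query parallelism. A minimal sketch of a parallel bind join, assuming (our reading of the figure) that the fragment for tp3 is replicated at both E1 and E3, and modelling endpoints as in-memory sets of triples. The genre IRIs and function names are hypothetical and this is not the PeNeLoop implementation: the mappings produced by { tp1 . tp2 } at E2 are split between the two replicas, which are probed concurrently.

```python
from concurrent.futures import ThreadPoolExecutor

Triple = tuple[str, str, str]
Mapping = dict[str, str]


def probe_tp3(replica: set[Triple], mappings: list[Mapping]) -> list[Mapping]:
    """Evaluate tp3 (?genre lmdb:genre_name ?name) for each left mapping on one replica."""
    out = []
    for mu in mappings:
        for s, p, o in replica:
            if p == "lmdb:genre_name" and s == mu["?genre"]:
                out.append({**mu, "?name": o})
    return out


def parallel_bind_join(left: list[Mapping], replicas: list[set[Triple]]) -> list[Mapping]:
    """Split left mappings round-robin over replicas of the same fragment and
    probe them concurrently (requires at least one replica)."""
    chunks = [left[i::len(replicas)] for i in range(len(replicas))]
    with ThreadPoolExecutor(max_workers=len(replicas)) as pool:
        parts = list(pool.map(probe_tp3, replicas, chunks))
    return [mu for part in parts for mu in part]


fragment_tp3 = {  # hypothetical genre IRIs
    ("lmdb:genre1", "lmdb:genre_name", "Samurai movie"),
    ("lmdb:genre2", "lmdb:genre_name", "Drama"),
}
e1, e3 = set(fragment_tp3), set(fragment_tp3)  # same fragment at E1 and E3
left = [  # solution mappings of { tp1 . tp2 } computed locally at E2
    {"?movie": "dbo:Seven_Samurai", "?genre": "lmdb:genre1"},
    {"?movie": "dbo:Other_Movie", "?genre": "lmdb:genre2"},
]
for mu in parallel_bind_join(left, [e1, e3]):
    print(mu["?movie"], mu["?name"])  # the final projection keeps ?movie and ?name
```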
Queries in the Fog
How can we get both data availability *and* performance?
- Idea: P2P resource sharing, but on the client side, in the fog of browsers
○ CyCLaDEs: A Decentralized Cache for Linked Data Fragments. ESWC 2016
○ SPARQL Queries in the Fog of Browsers. Demo@ESWC 2017
[Figure: an LDF server hosting DrugBank and DBpedia behind an HTTP cache, queried by clients c1 to c9.]
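The decentralized cache can be sketched as a lookup order: local cache first, then the neighbours' caches, then the LDF server. This is a minimal sketch under stated assumptions: clients are in-process objects rather than browsers, the fragment key and class name are hypothetical, and how CyCLaDEs chooses each client's neighbours is not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class Client:
    """Hypothetical fog client: a local fragment cache plus a list of neighbours."""

    name: str
    cache: dict[str, list] = field(default_factory=dict)
    neighbours: list["Client"] = field(default_factory=list)

    def fetch(self, key: str, server: dict[str, list], log: list[str]) -> list:
        # 1. Local cache.
        if key in self.cache:
            log.append(f"{self.name}: local hit")
            return self.cache[key]
        # 2. Neighbours' caches, before contacting the server.
        for n in self.neighbours:
            if key in n.cache:
                log.append(f"{self.name}: hit at {n.name}")
                self.cache[key] = n.cache[key]
                return self.cache[key]
        # 3. Fall back to the LDF server (behind its HTTP cache).
        log.append(f"{self.name}: server")
        self.cache[key] = server[key]
        return self.cache[key]


server = {"dbpedia/?p=rdf:type": ["triple-1", "triple-2"]}  # hypothetical fragment
c1 = Client("c1")
c2 = Client("c2", neighbours=[c1])
log: list[str] = []
for client in (c1, c2, c2):
    client.fetch("dbpedia/?p=rdf:type", server, log)
print(log)  # ['c1: server', 'c2: hit at c1', 'c2: local hit']
```

Only the first request reaches the server; the second is served by a neighbour, which is the load reduction the shared cache aims for.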
SPARQL Queries in the Fog of Browsers
Fog of Browsers: a P2P network of browsers with browser-to-browser connections (WebRTC)
WebRTC: https://webrtc.org/
FoB with Triple Pattern Fragments
Servers run TPF servers (TPFs); browsers run TPF clients (TPFc): C1, C2...
[Figure: browsers C1 and C2, each running a TPF client, connected to TPF servers.]
TPF: Verborgh, Ruben, et al. "Triple Pattern Fragments: A low-cost knowledge graph interface for the Web." Web Semantics: Science, Services and Agents on the World Wide Web 37 (2016): 184-206.
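Because a TPF server only answers single triple patterns, the TPF client evaluates the rest of the query itself. A simplified sketch in the spirit of the greedy client algorithm described by Verborgh et al., with strong simplifications (no paging, exact counts, an in-memory set standing in for the server; data IRIs and function names are hypothetical): fetch the pattern with the smallest count first, bind each of its triples into the remaining patterns, and recurse.

```python
from typing import Optional

Triple = tuple[str, str, str]
Mapping = dict[str, str]


def is_var(term: str) -> bool:
    return term.startswith("?")


def fragment(data: set[Triple], pattern: Triple) -> tuple[list[Triple], int]:
    """Stand-in TPF server: triples matching one pattern plus their count.
    (Real TPF servers page results and may only estimate the count.)"""
    hits = [t for t in data if all(is_var(p) or p == x for p, x in zip(pattern, t))]
    return hits, len(hits)


def evaluate_bgp(data: set[Triple], bgp: list[Triple],
                 mu: Optional[Mapping] = None) -> list[Mapping]:
    """Greedy client: fetch the smallest fragment, bind its triples, recurse."""
    mu = dict(mu or {})
    if not bgp:
        return [mu]
    bound = [tuple(mu.get(term, term) for term in tp) for tp in bgp]
    counts = [fragment(data, tp)[1] for tp in bound]
    i = counts.index(min(counts))
    rest = bound[:i] + bound[i + 1:]
    results: list[Mapping] = []
    for triple in fragment(data, bound[i])[0]:
        new = dict(mu)
        # Reject triples that would bind one variable to two different values.
        if all(new.setdefault(p, x) == x for p, x in zip(bound[i], triple) if is_var(p)):
            results.extend(evaluate_bgp(data, rest, new))
    return results


data = {  # hypothetical data
    ("ex:alice", "foaf:member", "ex:sw"),
    ("ex:bob", "foaf:member", "ex:sw"),
    ("ex:sw", "rdfs:label", "Semantic Web"),
    ("ex:alice", "foaf:knows", "ex:bo"),
    ("ex:bo", "foaf:name", "Barack Obama"),
}
bgp = [
    ("?P", "foaf:member", "?C"),
    ("?C", "rdfs:label", "Semantic Web"),
    ("?P", "foaf:knows", "?WKP"),
    ("?WKP", "foaf:name", "Barack Obama"),
]
print(evaluate_bgp(data, bgp))
```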
Clients receive SPARQL queries...
Any client can receive SPARQL queries at any time.
[Figure: workload W1 = {Q1, Q2, Q3, Q4} arrives at C1 and workload W2 = {Q5, Q6} at C2; both clients run TPF clients against TPF servers.]
A client can process its queries itself or delegate some of them to its neighbours: client-side inter-query parallelism.
- Q4@C4, Q3@C3...
[Figure: C1 (W1 = {Q1, Q2, Q3, Q4}) and C2 (W2 = {Q5, Q6}) with neighbouring clients C3, C4 and C5; Q4 is delegated to C4.]
Can we reduce the global execution time (ET) of W1 and W2 by delegating queries to neighbours, i.e., does $ET(W_1@C_1 \parallel W_2@C_2) > ET((W_1 \cup W_2)@\{C_1, \dots, C_5\})$ hold?
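The question can be framed as a scheduling problem. A minimal sketch under strong assumptions: per-query costs are known in advance and do not depend on which client runs them, each client runs one query at a time, and delegation, network and server contention costs are ignored. The costs are hypothetical, and the longest-processing-time-first (LPT) greedy heuristic is swapped in for illustration; it is not the strategy used in the demo.

```python
import heapq

# Hypothetical per-query costs (seconds); not measurements.
COSTS = {"Q1": 4, "Q2": 3, "Q3": 2, "Q4": 1, "Q5": 2, "Q6": 2}
W1 = ["Q1", "Q2", "Q3", "Q4"]  # arrives at C1
W2 = ["Q5", "Q6"]              # arrives at C2


def sequential_et(costs: dict[str, float], workload: list[str]) -> float:
    """Time for one client to run its whole workload, one query after another."""
    return sum(costs[q] for q in workload)


def delegated_et(costs: dict[str, float], queries: list[str], n_clients: int) -> float:
    """LPT greedy: give the longest remaining query to the least-loaded client.
    Returns the makespan, i.e. when the last client finishes."""
    loads = [0.0] * n_clients  # a list of equal values is already a valid heap
    for q in sorted(queries, key=lambda q: costs[q], reverse=True):
        least = heapq.heappop(loads)
        heapq.heappush(loads, least + costs[q])
    return max(loads)


# W1@C1 and W2@C2 run in parallel, so the global ET is the slower of the two.
no_delegation = max(sequential_et(COSTS, W1), sequential_et(COSTS, W2))
with_delegation = delegated_et(COSTS, W1 + W2, n_clients=5)
print(no_delegation, with_delegation)  # 10 4.0
```

With these made-up costs, the global ET without delegation is set by C1 running its four queries, while spreading the six queries over C1 to C5 brings it down to the cost of the longest single query.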
ladda-demo.herokuapp.com
GDD Research Group: Distributed Data Management
- P. Molli
- H. Skaf