Semantic federation of distributed neurodata



SLIDE 1
A. Gaignard, J. Montagnat, C. Faron Zucker, O. Corby - DCICTIA-MICCAI Workshop 2012

Semantic federation of distributed neurodata

Alban Gaignard, Johan Montagnat, Catherine Faron Zucker, Olivier Corby
alban.gaignard@i3s.unice.fr
CNRS / UNS, lab. I3S, Sophia Antipolis, Modalis team
INRIA Sophia Antipolis, Wimmics team


SLIDE 2

Neuroscience data repositories


  • Raw neuroimaging data
    • Several natures: modalities
    • Several structures: formats, multi-dimensional datasets
  • Associated metadata
    • Relational databases

➡ Constraints

  • Distribution
  • Sensitive data that can hardly be relocated
  • Need for collaboration (multi-centric studies)
  • Site autonomy
  • Need to deal with legacy (relational) databases

Federation

SLIDE 3

NeuroLOG platform: federated data integration


[Figure: NeuroLOG deployment over four sites (Paris, Grenoble, Rennes, Sophia), each with a NeuroLOG server; Data Federator instances give the NeuroLOG services a federated metadata view over the Shanoir, GIN-DMS and CAC relational databases.]

Semantic federation?

SLIDE 4

Objectives & Method

Objectives:

  • Uniform semantic querying, despite:
    • data distribution
    • data heterogeneity

Method:

  • Assess the feasibility of the approach (technology and tooling)
  • Study performance issues

SLIDE 5

Semantic Web standards

  • Ontologies (OWL / RDF Schema) to capture domain knowledge:
    • Model the nature of data (classes)
    • Model data relationships (properties)
  • Graph-based data representation (RDF):
    • RDF triples (edges): <subject> <property> <object>
    • RDF graphs
  • SPARQL querying as graph pattern matching:
    • Sequences of edge requests, e.g. ?x rdf:type modality:MRI
  • High expressivity and reasoning:
    • Deduce the nature of data
    • Inference rules, etc.
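To illustrate SPARQL evaluation as graph pattern matching, the Python sketch below evaluates a basic graph pattern as a sequence of edge requests over a tiny in-memory graph. It is not KGRAM code: the triples, the `ex:` resources and the helpers `match_edge` and `evaluate_bgp` are hypothetical, and prefixed names are kept as plain strings rather than expanded to IRIs.

```python
# Minimal sketch: a tiny in-memory RDF-like graph and basic graph pattern
# matching. Prefixed names ("modality:MRI") are kept as plain strings.
Triple = tuple[str, str, str]
Binding = dict[str, str]

# Hypothetical example data.
GRAPH: list[Triple] = [
    ("ex:dataset1", "rdf:type", "modality:MRI"),
    ("ex:dataset2", "rdf:type", "modality:CT"),
    ("ex:dataset3", "rdf:type", "modality:MRI"),
    ("ex:dataset1", "ex:hasSubject", "ex:subject42"),
]


def is_var(term: str) -> bool:
    return term.startswith("?")


def match_edge(pattern: Triple, graph: list[Triple], binding: Binding) -> list[Binding]:
    """Evaluate one edge request, extending an existing binding."""
    results = []
    for triple in graph:
        extended = dict(binding)
        for p_term, g_term in zip(pattern, triple):
            if is_var(p_term):
                if extended.setdefault(p_term, g_term) != g_term:
                    break  # variable already bound to another value
            elif p_term != g_term:
                break  # constant does not match
        else:
            results.append(extended)  # every position matched
    return results


def evaluate_bgp(patterns: list[Triple], graph: list[Triple]) -> list[Binding]:
    """Evaluate a basic graph pattern as a sequence of edge requests."""
    bindings: list[Binding] = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for b2 in match_edge(pattern, graph, b)]
    return bindings


if __name__ == "__main__":
    # SPARQL equivalent: SELECT ?x WHERE { ?x rdf:type modality:MRI }
    for row in evaluate_bgp([("?x", "rdf:type", "modality:MRI")], GRAPH):
        print(row["?x"])
```

Each edge request extends the bindings produced by the previous ones, so the order of edge requests largely determines how much work (and, once distributed, how much network traffic) a query costs.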

SLIDE 6

KGRAM: a Knowledge GRaph Abstract Machine

  • Representing, querying and reasoning on Knowledge Graphs (INRIA - Wimmics team)
  • Generic engine:
    • SPARQL 1.1 interpreter;
    • several data sources;
    • several models (RDF, XML, SQL).
  • KGRAM Producers:
    • navigate abstract graphs and enumerate edges and nodes;
    • each Producer is specific to a data structure ("graph mediator");
    • MetaProducers glue several producers together.

➡ Mashup applications over distributed Linked Data
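The Producer / MetaProducer split can be sketched as follows. This is a simplified Python analogue written for illustration, not KGRAM's actual API; the classes `Producer`, `ListProducer` and `MetaProducer` and the `edges` method are hypothetical names.

```python
from abc import ABC, abstractmethod
from typing import Iterator, Optional

Triple = tuple[str, str, str]


class Producer(ABC):
    """Hypothetical interface: enumerate the edges of one data source that
    match an edge request; None stands for an unbound position."""

    @abstractmethod
    def edges(self, s: Optional[str], p: Optional[str], o: Optional[str]) -> Iterator[Triple]:
        ...


class ListProducer(Producer):
    """A producer specific to one data structure (here, a list of triples)."""

    def __init__(self, triples: list[Triple]) -> None:
        self.triples = triples

    def edges(self, s, p, o):
        for t in self.triples:
            if all(q is None or q == v for q, v in zip((s, p, o), t)):
                yield t


class MetaProducer(Producer):
    """Glues several producers so that the engine sees one virtual graph."""

    def __init__(self, producers: list[Producer]) -> None:
        self.producers = producers

    def edges(self, s, p, o):
        for producer in self.producers:
            yield from producer.edges(s, p, o)


if __name__ == "__main__":
    site_a = ListProducer([("ex:d1", "rdf:type", "modality:MRI")])
    site_b = ListProducer([("ex:d2", "rdf:type", "modality:MRI"),
                           ("ex:d3", "rdf:type", "modality:CT")])
    federation = MetaProducer([site_a, site_b])
    print(list(federation.edges(None, "rdf:type", "modality:MRI")))
```

Because a MetaProducer is itself a Producer, federations can be nested, and the query engine only ever talks to one interface.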

SLIDE 7

Dealing with data source heterogeneity

  • Uniform querying over:
    • one SQL producer, and
    • multiple RDF producers.
  • Ad-hoc "on-the-fly" SQL producer:
    • predefined mappings (RDF predicate → SQL sub-query);
    • SQL embedded into a generated SPARQL query;
    • SQL tuples translated back into RDF triples.
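A minimal sketch of the on-the-fly SQL producer idea, assuming a hypothetical `dataset` table and hypothetical predicate-to-SQL mappings: an edge request whose predicate is mapped is answered by running the corresponding SQL sub-query (here with Python's built-in `sqlite3`), and each returned tuple is translated back into a triple. The step in which KGRAM embeds the SQL into a generated SPARQL query is not reproduced.

```python
import sqlite3

# Hypothetical predefined mappings: RDF predicate -> SQL sub-query returning
# (subject, object) pairs. The table and column names are invented.
MAPPINGS = {
    "ex:hasModality": "SELECT 'ex:dataset' || id, 'modality:' || modality FROM dataset ORDER BY id",
    "ex:hasSubject": "SELECT 'ex:dataset' || id, 'ex:subject' || subject_id FROM dataset ORDER BY id",
}


def make_demo_db() -> sqlite3.Connection:
    """Stand-in for a legacy relational source."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dataset (id INTEGER, modality TEXT, subject_id INTEGER)")
    conn.executemany("INSERT INTO dataset VALUES (?, ?, ?)", [(1, "MRI", 42), (2, "CT", 7)])
    return conn


def sql_edges(conn: sqlite3.Connection, predicate: str) -> list[tuple[str, str, str]]:
    """Answer the edge request (?s predicate ?o): run the mapped SQL sub-query
    and translate each returned tuple back into a triple."""
    sql = MAPPINGS.get(predicate)
    if sql is None:
        return []  # predicate not exposed by this relational source
    return [(s, predicate, o) for s, o in conn.execute(sql)]


if __name__ == "__main__":
    for triple in sql_edges(make_demo_db(), "ex:hasModality"):
        print(triple)
```

Keeping the mapping as a plain table of sub-queries means the relational source needs no changes; the price is that only the predicates listed in the mapping are visible to semantic queries.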

SLIDE 8

KGRAM - Distributed Query Processor (DQP)


  • Distributed query processing performance:
    • service parallelism;
    • static and dynamic optimizations.
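Service parallelism can be illustrated by sending the same edge request to all sources concurrently instead of one after the other. In this sketch, remote endpoints are simulated by local functions with an artificial delay (an assumption standing in for network latency); `make_source` and `parallel_edge_request` are hypothetical names, not part of KGRAM-DQP.

```python
import time
from concurrent.futures import ThreadPoolExecutor

Triple = tuple[str, str, str]


def make_source(triples: list[Triple], latency: float):
    """Hypothetical remote source, simulated by a function that sleeps
    `latency` seconds before answering an edge request (None = unbound)."""
    def answer(pattern):
        time.sleep(latency)  # stands in for network round-trip time
        return [t for t in triples
                if all(q is None or q == v for q, v in zip(pattern, t))]
    return answer


def parallel_edge_request(sources, pattern) -> list[Triple]:
    """Send one edge request to every source concurrently and merge the answers."""
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        partial_results = list(pool.map(lambda source: source(pattern), sources))
    return [t for part in partial_results for t in part]


if __name__ == "__main__":
    sources = [make_source([(f"ex:d{i}", "rdf:type", "modality:MRI")], latency=0.25)
               for i in range(4)]
    start = time.perf_counter()
    results = parallel_edge_request(sources, (None, "rdf:type", "modality:MRI"))
    print(len(results), f"{time.perf_counter() - start:.2f}s")
```

With four sources of equal simulated latency, the concurrent version should take roughly one latency instead of the sum of all four; `pool.map` also keeps the answers in source order.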
SLIDE 9

DQP Optimizations (1/2): pushing applicable FILTERs

  • Idea: filter out irrelevant results as early as possible (to avoid unnecessary network communication);

➡ Attach each applicable FILTER to the single edge request it constrains.

[Figure: global SPARQL query; rewritten sub-query; optimized sub-query]
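The FILTER-pushing rule can be sketched as: a FILTER is attached to an edge request when every variable it mentions is bound by that edge alone. The sketch works on plain strings, with hypothetical edge requests and a hypothetical `?age > 40` filter; it illustrates the idea and is not the DQP implementation.

```python
import re

Triple = tuple[str, str, str]


def variables(text: str) -> set[str]:
    """Collect SPARQL variable names such as ?x or ?age."""
    return set(re.findall(r"\?\w+", text))


def push_filters(edges: list[Triple], filters: list[str]) -> list[str]:
    """Build one sub-query per edge request, carrying every FILTER whose
    variables are all bound by that single edge."""
    subqueries = []
    for s, p, o in edges:
        edge_vars = variables(f"{s} {p} {o}")
        applicable = [flt for flt in filters if variables(flt) <= edge_vars]
        body = f"{s} {p} {o} ." + "".join(f" FILTER ({flt})" for flt in applicable)
        subqueries.append(f"SELECT * WHERE {{ {body} }}")
    return subqueries


if __name__ == "__main__":
    edges = [("?x", "rdf:type", "modality:MRI"), ("?x", "ex:hasAge", "?age")]
    for q in push_filters(edges, ["?age > 40"]):
        print(q)
```

A filter relating variables of different edges (e.g. `?a < ?c`) is attached to none of them here and must still be evaluated on the joined results.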

SLIDE 10

DQP Optimizations (2/2): pushing bindings

  • Idea: exploit intermediate results to avoid re-evaluating, and re-transmitting, already known values;

➡ Replace variables by their known values in each single edge request.

[Figure: global SPARQL query; intermediate result ?x = http://dbpedia.org/resource/Bobby_Abel; rewritten sub-query; optimized sub-query]
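Pushing bindings can be sketched as a rewriting step: once an intermediate result binds `?x`, the next edge request is sent with `?x` replaced by its value, and identical rewritten requests are sent only once. The predicate `ex:birthPlace` and the helper names are hypothetical; only the `Bobby_Abel` IRI comes from the slide.

```python
Triple = tuple[str, str, str]
Binding = dict[str, str]


def bind_edge(edge: Triple, binding: Binding) -> Triple:
    """Replace the variables of one edge request by their known values."""
    s, p, o = (binding.get(term, term) for term in edge)
    return (s, p, o)


def to_subquery(edge: Triple) -> str:
    s, p, o = edge
    return f"SELECT * WHERE {{ {s} {p} {o} . }}"


def push_bindings(edge: Triple, intermediate_results: list[Binding]) -> list[str]:
    """One rewritten sub-query per distinct intermediate result; duplicates are
    dropped so that identical requests are neither re-evaluated nor re-sent."""
    rewritten = [to_subquery(bind_edge(edge, b)) for b in intermediate_results]
    return list(dict.fromkeys(rewritten))  # order-preserving de-duplication


if __name__ == "__main__":
    known = [{"?x": "<http://dbpedia.org/resource/Bobby_Abel>"}] * 2
    for q in push_bindings(("?x", "ex:birthPlace", "?place"), known):
        print(q)
```

The rewritten request is more selective than the original `?x ex:birthPlace ?place`, so each remote source returns only the edges that can actually join with the known result.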

SLIDE 11

Performance-oriented experiments

  • Three federation setups, two queries:
    • DataFederator (SAP): reference high-performance relational engine;
    • Q1: costly evaluation (336 remote invocations);
    • Q2: selective query (only 5 resulting T2-weighted datasets).

➡ DataFederator performs slightly better on the costly query (Q1), but KGRAM performs similarly;
➡ Comparable results for the very selective query (Q2).

SLIDE 12

Conclusion & perspectives

  • KGRAM:
    • optimized distributed semantic querying;
    • heterogeneous data sources.
  • Still ongoing:
    • source selection through dynamic index creation,
    • enabling coarse-grained parallelism (grouped sub-queries, query planning).
  • Benchmarking (FedBench) to compare KGRAM with state-of-the-art approaches: SPLENDID, DARQ, FedX.
  • Other approaches to address data source heterogeneity:
    • R2RML, D2RQ, etc.
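One possible reading of source selection through an index, sketched under explicit assumptions: the index maps each predicate to the sources that hold it, and edge requests are routed only to those sources. For simplicity the index is built by scanning local triple lists; a federation over remote endpoints would instead have to probe each endpoint (for instance with SPARQL ASK queries). All names are hypothetical, and this is not the ongoing KGRAM work itself.

```python
Triple = tuple[str, str, str]


def build_index(sources: dict[str, list[Triple]]) -> dict[str, set[str]]:
    """Map each predicate to the names of the sources that contain it.
    Assumption: sources can be scanned locally; remote endpoints would have
    to be probed instead (e.g. with SPARQL ASK queries)."""
    index: dict[str, set[str]] = {}
    for name, triples in sources.items():
        for _, predicate, _ in triples:
            index.setdefault(predicate, set()).add(name)
    return index


def select_sources(index: dict[str, set[str]], edge: Triple, all_sources: set[str]) -> set[str]:
    """Route an edge request only to the sources that may answer it."""
    predicate = edge[1]
    if predicate.startswith("?"):
        return set(all_sources)  # unbound predicate: any source may match
    return set(index.get(predicate, set()))  # copy, so callers cannot alter the index


if __name__ == "__main__":
    sources = {
        "source_a": [("ex:d1", "ex:hasModality", "modality:MRI")],
        "source_b": [("ex:d2", "ex:hasSubject", "ex:s7")],
    }
    index = build_index(sources)
    print(select_sources(index, ("?d", "ex:hasModality", "?m"), set(sources)))
```

Edge requests that share the same selected sources could then be grouped into a single sub-query per source, which is the kind of coarse-grained parallelism mentioned above.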
