Scalable End-user Access to Big Data

SLIDE 1

Scalable End-user Access to Big Data

HELLENIC REPUBLIC

National and Kapodistrian University of Athens

1 / 12

SLIDE 2

Ontology-based Data Access

• Capture end-user vocabulary in an “Ontology”
  • ≈ Domain model
  • Classes and relations known to end-users
  • Some minimal domain knowledge
• Mappings that relate the Ontology to the data sources
  • ‘Column “Type” is “T” in row x of table “Sensors” if sensor Nr. x is a Temperature Sensor’
• Automatically translate queries in end-user language into queries over the data sources

In: ‘List all temperature sensors.’
Out: ‘Print “Sensor Nr. x” for all rows x in “Sensors” table where “Type” column is “T.”’
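The In/Out translation above can be illustrated with a minimal sketch. It assumes a single hypothetical mapping from an ontology class named TemperatureSensor to the “Sensors” table from the slide; the column name nr, the sample rows, and the function translate are illustrative assumptions, not part of the Optique platform.

```python
import sqlite3

# Hypothetical mapping: ontology class -> SQL over the data source.
# Table "Sensors" and column "Type" come from the slide; column "nr" is assumed.
MAPPINGS = {
    "TemperatureSensor": "SELECT nr FROM Sensors WHERE Type = 'T'",
}

def translate(ontology_class):
    """Translate 'list all instances of <class>' into SQL via the mapping."""
    return MAPPINGS[ontology_class]

# Made-up sample data standing in for the real data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sensors (nr INTEGER, Type TEXT)")
conn.executemany("INSERT INTO Sensors VALUES (?, ?)", [(1, "T"), (2, "P"), (3, "T")])

# In: 'List all temperature sensors.'
sql = translate("TemperatureSensor")
answers = [f"Sensor Nr. {nr}" for (nr,) in conn.execute(sql + " ORDER BY nr")]
print(answers)  # ['Sensor Nr. 1', 'Sensor Nr. 3']
```

The end-user only names the class; the table layout and the encoding “T” stay hidden in the mapping.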

2 / 12

SLIDE 3

OBDA: Example

Engineer: Generators with a turbine fault?

Data:
  Generator(g1)
  hasFault(g1, f1)
  CondenserFault(f1)

Based on slides by Ian Horrocks

3 / 12

SLIDE 4

OBDA: Example

Engineer: Generators with a turbine fault?

Data:
  g1 is a Generator
  g1 has fault f1
  f1 is a CondenserFault

3 / 12

SLIDE 6

OBDA: Example

Engineer: Generators with a turbine fault?

Data:
  g1 is a Generator
  g1 has fault f1
  f1 is a CondenserFault

Answer: ∅

3 / 12

SLIDE 7

OBDA: Example

Engineer: Generators with a turbine fault?

Data:
  g1 is a Generator
  g1 has fault f1
  f1 is a CondenserFault

Ontology:
  Condenser ⊑ CoolingDevice ⊓ ∃isPartOf.Turbine
  CondenserFault ≡ Fault ⊓ ∃affects.Condenser
  TurbineFault ≡ Fault ⊓ ∃affects.(∃isPartOf.Turbine)

3 / 12

SLIDE 8

OBDA: Example

Engineer: Generators with a turbine fault?

Data:
  g1 is a Generator
  g1 has fault f1
  f1 is a CondenserFault

Ontology:
  Condenser is a CoolingDevice that is part of a Turbine
  Condenser Fault is a Fault that affects a Condenser
  Turbine Fault is a Fault that affects part of a Turbine

3 / 12

SLIDE 10

OBDA: Example

Engineer: Generators with a turbine fault?

Data:
  g1 is a Generator
  g1 has fault f1
  f1 is a CondenserFault

Ontology:
  Condenser is a CoolingDevice that is part of a Turbine
  Condenser Fault is a Fault that affects a Condenser
  Turbine Fault is a Fault that affects part of a Turbine

Answer: g1

3 / 12
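The example can be replayed as a small sketch that contrasts the empty answer over the raw data with the answer g1 once the ontology is taken into account. One assumption is made loudly here: the subsumption CondenserFault ⊑ TurbineFault, which follows from the three axioms (a condenser fault affects a condenser, and every condenser is part of a turbine), is hard-coded in the table SUBCLASSES rather than derived by a reasoner. All function and variable names are illustrative.

```python
# Data from the slides.
class_facts = {("Generator", "g1"), ("CondenserFault", "f1")}
has_fault = {("g1", "f1")}

# Assumption: entailed subclasses, precomputed by hand from the three axioms.
# A real system would obtain this from an ontology reasoner or query rewriter.
SUBCLASSES = {"TurbineFault": {"TurbineFault", "CondenserFault"}}

def instances(cls, use_ontology):
    """Individuals asserted to be in cls, or in an entailed subclass of cls."""
    names = SUBCLASSES.get(cls, {cls}) if use_ontology else {cls}
    return {ind for (c, ind) in class_facts if c in names}

def generators_with_turbine_fault(use_ontology):
    faults = instances("TurbineFault", use_ontology)
    generators = instances("Generator", use_ontology)
    return sorted(g for (g, f) in has_fault if g in generators and f in faults)

print(generators_with_turbine_fault(use_ontology=False))  # []
print(generators_with_turbine_fault(use_ontology=True))   # ['g1']
```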

SLIDE 11

Unique Combination of Techniques

4 / 12

SLIDE 12

Optique Architecture

[Architecture diagram. Recoverable labels: End-user; IT-expert; Application; Query Formulation; Ontology & Mapping Management (Ontology, Mappings; inputs: data models, std. ontologies, …); Query Transformation; Query Planning; Stream Adapter; Query Execution (several instances); streaming data; query; results; cross-component optimization.]

5 / 12

SLIDE 13

Integrated Platform

[Platform architecture diagram. Labels recovered from the figure:]

• Tiers: Client Tier; Application Tier; Data Tier and Cloud
• Rich interfaces: Query Formulation; Ontology and Mapping Management; Answer visualisation
• Query Formulation processing components: Query by Navigation; Context Sens. Ed.; Direct Ed.; Faceted Search; QDriven ont. construction; Export funct.; Feedback funct. (1-time Q SPARQL, Stream Q)
• Processing components of Ontology and Mapping Manager: Bootstrapper; Analyser; Evolution Engine; Transformator; Approximator; Ontology and Mapping Revision control & Editing (ontology, mapping)
• Query Answering Component:
  • Query transformation: Query Rewriting; Semantic QOpt; Syntactic QOpt; Sem indexing (1-time Q SPARQL, Stream Q)
  • Distributed Query Execution: Q Planner; Optimization; Data Federation (1-time Q SQL, Stream Q)
  • Answ Manager; Query Execution; Data Federation (1-time Q SPARQL, Stream Q)
• Shared triple store: ontology, mappings, configuration, queries, answers, history, etc.
• Ontology reasoner 1, Ontology reasoner 2, ...
• Shared database
• Visualisation engines; Stream analytics; mining, log analyses, etc.
• Data sources: data streams; RDBs, triple stores, temporal DBs, etc.
• Cloud (virtual resource pool)

6 / 12

SLIDE 14

The Query Formulation Interface

• Let users formulate ad-hoc queries
  • filtering on attributes
  • connecting objects
  • selecting what information to extract
  • choosing types (Facility → FixedFacility | MovableFacility)
• Until end of year:
  • specify time ranges
  • choose entities (licenses, fields, etc.) from map
• Later:
  • aggregation: sums, averages, etc.
  • negation (“all turbines without a fault”)
• Intentionally restricted expressivity
  • As powerful as SQL → as hard to learn
• Demo
  • Data from NPD FactPages (http://factpages.npd.no/)
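To make the idea of intentionally restricted expressivity concrete, here is a minimal sketch of a query builder that offers only the four operations listed on this slide (choosing a type, connecting objects, filtering on attributes, selecting outputs) and emits SPARQL text. The class QueryBuilder, the property names name and startupYear, and the namespace http://example.org/ are hypothetical; this is not the Optique interface.

```python
from dataclasses import dataclass, field

@dataclass
class QueryBuilder:
    root_type: str
    patterns: list = field(default_factory=list)
    filters: list = field(default_factory=list)
    outputs: list = field(default_factory=list)

    def __post_init__(self):
        # Choosing a type, e.g. Facility or one of its subtypes.
        self.patterns.append(f"?x a :{self.root_type} .")

    def connect(self, subject, relation, obj):
        # Connecting objects (or an object to an attribute value).
        self.patterns.append(f"?{subject} :{relation} ?{obj} .")
        return self

    def where(self, var, op, number):
        # Filtering on attributes; only numeric comparisons in this sketch.
        self.filters.append(f"FILTER(?{var} {op} {number})")
        return self

    def select(self, *variables):
        # Selecting what information to extract.
        self.outputs.extend(variables)
        return self

    def to_sparql(self):
        head = "SELECT " + " ".join(f"?{v}" for v in self.outputs)
        body = "\n".join("  " + line for line in self.patterns + self.filters)
        return f"PREFIX : <http://example.org/>\n{head} WHERE {{\n{body}\n}}"

query = (
    QueryBuilder("FixedFacility")
    .connect("x", "name", "n")
    .connect("x", "startupYear", "y")
    .where("y", ">", 2000)
    .select("n", "y")
    .to_sparql()
)
print(query)
```

Because the builder has no method for aggregation or negation, such queries cannot be expressed at all, which mirrors the trade-off between expressive power and ease of learning stated on the slide.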

7 / 12

SLIDE 15

Ontology & Mapping Management

• OBDA relies on Ontology and Mappings
• Tool support to create and maintain O&M
• Results so far: Bootstrapping components
  [Diagram labels: Database; Direct Mapping; DM Ontology; DM Mappings; HQ Ontology; Ontology Alignment]
• Coming up: tool support for O&M QC and evolution
  • when ontology changes
  • when data sources change
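A rough sketch of what bootstrapping by direct mapping can look like: one class per table and one property per column, each paired with the SQL that populates it. This is a simplification written for illustration and assumes simple table and column names; the function bootstrap, the Table#column naming scheme, and the use of SQLite's rowid as the row identifier are assumptions, not the Optique bootstrapper.

```python
import sqlite3

def bootstrap(conn):
    """Derive a naive vocabulary and mappings from a relational schema."""
    classes, properties, mappings = [], [], {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # One class per table, populated by the table's rows.
        classes.append(table)
        mappings[table] = f"SELECT rowid FROM {table}"
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        for _cid, column, *_rest in conn.execute(f"PRAGMA table_info({table})"):
            # One property per column, relating a row to its column value.
            prop = f"{table}#{column}"
            properties.append(prop)
            mappings[prop] = f"SELECT rowid, {column} FROM {table}"
    return classes, properties, mappings

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sensors (nr INTEGER, Type TEXT)")
classes, properties, mappings = bootstrap(conn)
print(classes)     # ['Sensors']
print(properties)  # ['Sensors#nr', 'Sensors#Type']
```

The resulting vocabulary mirrors the database schema, which is why the slide pairs it with alignment to a separate, higher-quality ontology.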

8 / 12

SLIDE 16

Time & Streams

• Query processing extended for stream queries (STARQL)
  • combined queries on real-time and historical data
  • rewrite queries over temporal data
  • execution with streaming answers in ADP (→ slide 11)
• Coming up: integration with platform architecture
  • register/unregister queries
  • stream answers
  • (also useful for one-shot queries)
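The register/unregister pattern with streaming answers, evaluated over a window that starts from historical data, can be sketched as follows. The class StreamQueryRegistry and its methods are hypothetical and written in plain Python; the sketch uses neither STARQL syntax nor ADP.

```python
from collections import deque

class StreamQueryRegistry:
    def __init__(self, history):
        self.history = list(history)  # historical readings
        self.queries = {}

    def register(self, query_id, window_size, condition, on_answer):
        # The window is seeded from historical data, then fed by the stream.
        window = deque(self.history[-window_size:], maxlen=window_size)
        self.queries[query_id] = (window, condition, on_answer)

    def unregister(self, query_id):
        self.queries.pop(query_id, None)

    def push(self, reading):
        # A new real-time reading arrives; re-evaluate every registered query.
        self.history.append(reading)
        for query_id, (window, condition, on_answer) in list(self.queries.items()):
            window.append(reading)
            if condition(list(window)):
                on_answer(query_id, reading)  # stream the answer to the client

answers = []
registry = StreamQueryRegistry(history=[70, 72])
registry.register(
    "rising",
    window_size=3,
    condition=lambda w: len(w) == 3 and w[0] < w[1] < w[2],
    on_answer=lambda query_id, reading: answers.append((query_id, reading)),
)
registry.push(75)   # window [70, 72, 75] is rising: one answer
registry.push(74)   # window [72, 75, 74] is not rising
registry.unregister("rising")
registry.push(80)   # no registered queries, so no answer
print(answers)      # [('rising', 75)]
```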

9 / 12

SLIDE 17

Query Transformation

• Based on open source -ontop- system
  • Query rewriting for OWL 2 QL ontologies
  • Covers almost all of the standard SPARQL query language
• Now testing on real queries from Statoil on EPDS
  • Efficiency problems with some rewritten queries
  • Targeted optimisation based on use-case requirements
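A toy illustration of query rewriting, restricted to atomic subclass axioms: the query for all instances of a class is expanded into a UNION over every entailed subclass. The sensor hierarchy in SUBCLASS_OF and both function names are hypothetical, and this is not the algorithm implemented in -ontop-; it only shows why a rewritten query can become much larger than the original.

```python
# Hypothetical subclass axioms, written as (subclass, superclass) pairs.
SUBCLASS_OF = [
    ("TemperatureSensor", "Sensor"),
    ("PressureSensor", "Sensor"),
    ("Thermocouple", "TemperatureSensor"),
]

def subclasses(cls):
    """Return cls together with every class entailed to be a subclass of it."""
    found, frontier = {cls}, [cls]
    while frontier:
        current = frontier.pop()
        for sub, sup in SUBCLASS_OF:
            if sup == current and sub not in found:
                found.add(sub)
                frontier.append(sub)
    return found

def rewrite(cls):
    """Rewrite 'all instances of cls' into a SPARQL UNION over its subclasses."""
    branches = [f"{{ ?x a :{c} }}" for c in sorted(subclasses(cls))]
    return "SELECT ?x WHERE { " + " UNION ".join(branches) + " }"

print(rewrite("Sensor"))
```

A single class atom already yields four branches here; with several atoms per query the branches multiply, which is one way rewritten queries can run into the efficiency problems mentioned on the slide.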

10 / 12

SLIDE 18

Distributed Query Execution

• Query Execution (“backend”)
  • Based on ADP – Athena Distributed Processing
  • Cutting edge parallelised database engine
  • Optimisation w.r.t. many dimensions
  • “Hadoop for Databases”
• For Optique:
  • stream processing
  • federation (one query, many sources)
  • parallelisation (elastic clouds)
• Cross-component optimisation of query processing
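Federation (one query, many sources) combined with parallel execution can be sketched with the Python standard library. The two in-memory SQLite sources, their contents, and the function names are made up for illustration; ADP itself is not used or modelled here.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

# Two hypothetical sources that share a schema but hold different rows.
SOURCES = {
    "site_a": [(1, "T"), (2, "P")],
    "site_b": [(3, "T"), (4, "T")],
}

def run_on_source(name, sql):
    # Each worker opens its own connection, so none is shared across threads.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Sensors (nr INTEGER, Type TEXT)")
    conn.executemany("INSERT INTO Sensors VALUES (?, ?)", SOURCES[name])
    rows = [(name, nr) for (nr,) in conn.execute(sql)]
    conn.close()
    return rows

def federated_query(sql):
    # Send the same query to every source in parallel, then merge the answers.
    with ThreadPoolExecutor(max_workers=len(SOURCES)) as pool:
        parts = pool.map(lambda name: run_on_source(name, sql), SOURCES)
    return sorted(row for part in parts for row in part)

result = federated_query("SELECT nr FROM Sensors WHERE Type = 'T'")
print(result)  # [('site_a', 1), ('site_b', 3), ('site_b', 4)]
```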

11 / 12

SLIDE 19

www.optique-project.eu