WP6: Query Transformation

SLIDE 1

WP6: Query Transformation

WP6 Team Presented by: Diego Calvanese

Free University of Bozen-Bolzano, Italy

Optique Y3 Review 09/12/2015 – Munich, Germany

SLIDE 2

T6.1 Transformation System Configuration T6.2 Runtime Query Rewriting T6.3 Transformation System Tuning T6.4 Implementation and Evaluation

WP6 in the Optique Architecture

Diego Calvanese (FUB) WP6: Query Transformation Optique Y3 Review – 09/12/2015 (1/35)

SLIDE 4

Objectives of WP6

1. Development of techniques for (semi-)automatic configuration and optimization of the query transformation system.

2. Development of techniques for the efficient rewriting of end-user queries into efficiently executable data source queries, by exploiting data source metadata.

3. Techniques for updating and tuning the query transformation system, taking into account the ontology, the mappings, the data source metadata, and feedback from the execution layer.

4. Implementation of the above techniques in the query transformation subsystem.

SLIDE 5

Tasks of WP6

Task 6.1 Transformation System Configuration [M2-M24]: led by FUB, with UniRoma1 and UoA.
Task 6.2 Runtime Query Rewriting [M2-M36]: led by FUB, with UniRoma1 and UOXF.
Task 6.3 Transformation System Tuning [M24-M43]: led by FUB, with UoA.
Task 6.4 Transformation Sub-System Implementation and Evaluation [M7-M46]: led by FUB, in heavy interaction with WP2.

SLIDE 6

WP6 Year 3 Objectives and Progress

Main objectives:

Evaluate realistic queries from the Optique use cases. In WP6, we have concentrated mostly on Statoil queries (Siemens queries are dealt with in WP5).

Analyze and optimize for problematic queries

We have developed tuning techniques, based on exact mappings and redundant join elimination, that improve the performance of some complex queries by orders of magnitude.

Semi-automatic configuration and optimization

We have developed tuning techniques to automatically detect exact mappings and forms of implicit keys, by analyzing the mappings, the ontology, and the database instance.

Improve coverage of the query catalogue

With the introduction of the tuning techniques, we have significantly extended the coverage of the Statoil catalogue to 98% (i.e., +28%).

SLIDE 7

Ontop in the Optique Platform

SLIDE 8

Outline

1. Task 6.1: Transformation System Configuration
2. Task 6.2: Runtime Query Rewriting
3. Task 6.3: Transformation System Tuning
4. Task 6.4: Query Transformation Sub-System Implementation and Evaluation

SLIDE 9

Task 6.1: Transformation System Configuration

The workplan does not foresee additional work on Task 6.1 during Y3. However, we have further refined the results obtained in Y1 and Y2 towards the achievement of the objectives:

New datatypes, required by the Statoil use case, are now supported in the mappings and the ontology: xsd:int, xsd:long, . . .
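Supporting a new datatype involves, among other things, producing literals with the right XSD type and rejecting values outside that type's range. The following is a minimal sketch under stated assumptions: the SQL-to-XSD table and the helper `xsd_literal` are hypothetical and are not taken from Ontop; only the integer value ranges are fixed by XML Schema.

```python
# Hypothetical table from SQL column types to XSD datatypes (illustration only).
SQL_TO_XSD = {
    "SMALLINT": "xsd:short",
    "INTEGER": "xsd:int",     # 32-bit signed integers
    "BIGINT": "xsd:long",     # 64-bit signed integers
    "DECIMAL": "xsd:decimal",
    "DOUBLE": "xsd:double",
    "VARCHAR": "xsd:string",
    "DATE": "xsd:date",
}

# Value ranges fixed by XML Schema for the bounded integer datatypes.
XSD_INT_RANGES = {
    "xsd:short": (-2**15, 2**15 - 1),
    "xsd:int": (-2**31, 2**31 - 1),
    "xsd:long": (-2**63, 2**63 - 1),
}

def xsd_literal(value, sql_type):
    """Render a SQL value as a typed literal, rejecting out-of-range integers."""
    datatype = SQL_TO_XSD.get(sql_type.upper(), "xsd:string")
    bounds = XSD_INT_RANGES.get(datatype)
    if bounds is not None and not bounds[0] <= int(value) <= bounds[1]:
        raise ValueError(f"{value} is out of range for {datatype}")
    return f'"{value}"^^{datatype}'

print(xsd_literal(431170, "integer"))  # "431170"^^xsd:int
print(xsd_literal(2**40, "BIGINT"))    # "1099511627776"^^xsd:long
```

Calling `xsd_literal(2**40, "INTEGER")` raises a `ValueError`, since the value does not fit into `xsd:int`; this is why a distinct `xsd:long` type is needed for such columns.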

SLIDE 10

Outline

1. Task 6.1: Transformation System Configuration
2. Task 6.2: Runtime Query Rewriting
3. Task 6.3: Transformation System Tuning
4. Task 6.4: Query Transformation Sub-System Implementation and Evaluation

SLIDE 11

Task 6.2: Runtime Query Rewriting

Integrating cross-linked datasets, in collaboration with Oslo [Calvanese, Giese, et al., 2015, 14th Int. Semantic Web Conference (ISWC)].
Integration of rules in OBDA [Xiao et al., 2014, 8th Int. Conference on Web Reasoning and Rule Systems (RR)].
Answering and rewriting for nested regular path queries [Bienvenu et al., 2014, 14th Int. Conference on the Principles of Knowledge Representation and Reasoning (KR)].

SLIDE 13

Data integration

Data integration deals with the problem of providing integrated access to multiple data sources. Key problem: information about a real-world entity can be distributed over several data sources.

1. Execute queries over multiple data sources, performing distributed joins and collecting the results. In Optique, this functionality is provided by Exareme, as long as it is supplied with the information about which objects to join.

2. Identify equal entities, i.e., determine which data records actually represent the same real-world entity (entity resolution). Example: Wellbore-431170 in EPDS ≈ NPDWellbore-1/6-5 in NPD. We assume that entity resolution has already been performed; the issue is how to represent this information so that it can be processed efficiently.

SLIDE 18

Ontology-based data integration (OBDI)

We need to take into account that we are performing integration in a setting where the data sources are accessed through an ontology: this is ontology-based data integration (OBDI).

3. Transfer the information about merged data items to the ontology level. Example: EPDS:Wellbore-431170 sameAs NPD:NPDWellbore-1/6-5

4. Query answering should be done transparently. Example: the user should be able to write

     ?i EPDS:hasInterval ?wb . ?wb NPD:isInWell ?w .

   and not

     ?i EPDS:hasInterval ?wb1 . ?wb1 sameAs ?wb2 . ?wb2 NPD:isInWell ?w .

We have developed techniques to address these issues.
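To make the transparency requirement concrete, here is a toy evaluation of the user's query in which the join on `?wb` is performed modulo sameAs. It is a sketch, not the technique developed in WP6: the triples, the identifiers `EPDS:Interval-1` and `NPD:Well-1/6`, and the helpers `same` and `answer` are hypothetical, and only reflexivity and symmetry of sameAs are modelled (transitivity is left out).

```python
# Hypothetical triples: only the two wellbore identifiers come from the slides.
triples = [
    ("EPDS:Interval-1", "EPDS:hasInterval", "EPDS:Wellbore-431170"),
    ("NPD:NPDWellbore-1/6-5", "NPD:isInWell", "NPD:Well-1/6"),
]
links = [("EPDS:Wellbore-431170", "NPD:NPDWellbore-1/6-5")]

def same(a, b, links):
    """sameAs with reflexivity and symmetry only (no transitive closure)."""
    return a == b or (a, b) in links or (b, a) in links

def answer(triples, links):
    """Answer ?i EPDS:hasInterval ?wb . ?wb NPD:isInWell ?w, joining ?wb modulo sameAs."""
    results = []
    for i, p1, wb1 in triples:
        if p1 != "EPDS:hasInterval":
            continue
        for wb2, p2, w in triples:
            if p2 == "NPD:isInWell" and same(wb1, wb2, links):
                results.append((i, w))
    return results

print(answer(triples, links))  # [('EPDS:Interval-1', 'NPD:Well-1/6')]
print(answer(triples, []))     # []
```

With no links, the plain join on `?wb` returns nothing, which is the situation the user would face without sameAs support.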

SLIDE 22

The traditional approach to OBDI

Based on merging data. See, e.g., the demo scenario, where 5 data sources were integrated (EPDS, OpenWorks, CoreDB, Recall, and GeoChemDB). There are two variants:

Physical merge (as done in ETL).
  • Requires full control over the data sources.
  • Requires moving the data.
  • Issues with freshness, privacy, and legal aspects.

Virtual merge using mappings, by consistently generating a unique URI for each real-world entity.
  • Requires a central authority for defining URI schemas.
  • Scaling issues in a dynamic environment.
  • For efficiency, URIs should be generated from the primary keys of the data sources, which in general differ.

Merging data is not always possible!
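The last point about primary keys can be illustrated with a small sketch: if each source fills its own URI template with its own primary key, the same wellbore ends up with two different URIs, so a virtual merge only works if a common scheme is agreed upon centrally. The templates and the `example.org` URIs are hypothetical; the two keys are the wellbore identifiers used earlier in this presentation.

```python
from urllib.parse import quote

# Hypothetical URI templates: each source mints URIs from its own primary key.
TEMPLATES = {
    "EPDS": "http://example.org/epds/wellbore/{}",
    "NPD": "http://example.org/npd/wellbore/{}",
}

def mint_uri(source, primary_key):
    """Fill the source's template with its (percent-encoded) local primary key."""
    return TEMPLATES[source].format(quote(primary_key, safe=""))

u1 = mint_uri("EPDS", "431170")
u2 = mint_uri("NPD", "1/6-5")
print(u1)        # http://example.org/epds/wellbore/431170
print(u2)        # http://example.org/npd/wellbore/1%2F6-5
print(u1 == u2)  # False: two URIs for the same real-world wellbore
```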

SLIDE 25

Entity linking

Alternative approach: entity linking.

Entities keep their identity in the various data sources. We explicitly maintain the information about which pairs of entities in different sources represent the same real-world entity. The entity links are used to join the data coming from different sources, by embedding linking atoms in the queries.

Problems to be addressed:

1. We need an effective mechanism to represent the linking information at the level of the data sources.

2. At the ontology level, links between object ids should be represented via sameAs statements (a standard OWL construct).

3. Due to the transitivity of sameAs, we lose rewritability of queries into SQL. Can we recover rewritability by restricting the linking mechanism?

4. Performance, to guarantee scalability over large enterprise data.

SLIDE 27

Linking tables

Our proposal for Optique: linking tables.

We consider pairwise disjoint categories $C_1, \ldots, C_m$ of entities. Each data source may contain data records for different categories. The equality between data records of category $C$ that represent the same entity in data sources $D_i$ and $D_j$ is represented in a linking table $L^{C}_{ij}(\mathit{id}_i, \mathit{id}_j)$.

The linking tables are treated as dedicated data sources.

[Figure: the data sources EPDS, NPD, and OpenWorks (labelled D1, D2, D3 in the figure), connected by linking tables L12, L13, and L32 for the categories C1 = Wellbores and C2 = Company Names. Example links: epds:1 = npd:2; epds:Statoil = Prio:Statoil SA.]
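As an illustration of the data structure only, the sketch below keeps linking tables in a dictionary keyed by category and pair of sources, and uses one of them to join records from two sources. All names and records are hypothetical; the single link reuses the example epds:1 = npd:2 from the figure, assuming it is a wellbore link.

```python
# Hypothetical linking tables L^C_ij, keyed by (category, D_i, D_j); each row is (id_i, id_j).
LINKING_TABLES = {
    ("Wellbore", "EPDS", "NPD"): {("1", "2")},
}

# Hypothetical records of category Wellbore in two sources: local id -> attributes.
epds_wellbores = {"1": {"depth_m": 3120}, "7": {"depth_m": 1500}}
npd_wellbores = {"2": {"field": "Example field"}}

def linked_join(left, right, table):
    """Join records from two sources via a linking table, keeping both local ids."""
    return [
        (id_i, id_j, {**left[id_i], **right[id_j]})
        for id_i, id_j in sorted(table)
        if id_i in left and id_j in right
    ]

rows = linked_join(epds_wellbores, npd_wellbores,
                   LINKING_TABLES[("Wellbore", "EPDS", "NPD")])
print(rows)  # [('1', '2', {'depth_m': 3120, 'field': 'Example field'})]
```

Record "7" of EPDS has no link and therefore does not appear in the result; each source keeps its own identifiers, as required by the entity linking approach.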

SLIDE 28

Assumptions on the linking tables

To allow for efficient query processing, we assume that:

1. Linking tables are complete, i.e., all the information about which objects of category $C$ are linked in data sources $D_i$ and $D_j$ is contained in $L^{C}_{ij}$.

2. Linking tables relate only objects in different data sources, i.e., one cannot infer equality between elements of the same data source.

These assumptions are in line with what occurs in practice.
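Assumption 2 can be checked mechanically: after closing the links under symmetry and transitivity, no equality class may contain two distinct records of the same source. A minimal sketch using union-find, assuming each link is given as a pair of (source, id) tuples; the function names and the data are hypothetical.

```python
def find(parent, x):
    """Return the representative of x's class (union-find with path halving)."""
    while parent.setdefault(x, x) != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def same_source_conflicts(links):
    """links: pairs ((source, id), (source, id)) from all linking tables of one
    category. Return the equality classes (after symmetric and transitive
    closure) that contain two distinct records of the same source."""
    parent = {}
    for a, b in links:
        parent[find(parent, a)] = find(parent, b)
    classes = {}
    for node in list(parent):
        classes.setdefault(find(parent, node), set()).add(node)
    conflicts = []
    for members in classes.values():
        sources = [source for source, _ in members]
        if len(sources) != len(set(sources)):
            conflicts.append(sorted(members))
    return conflicts

links = [(("D1", "a"), ("D2", "b")), (("D2", "b"), ("D3", "c"))]
print(same_source_conflicts(links))  # []
links.append((("D1", "a2"), ("D3", "c")))
print(same_source_conflicts(links))
# [[('D1', 'a'), ('D1', 'a2'), ('D2', 'b'), ('D3', 'c')]]
```

The third link looks harmless on its own, but through transitivity it would make a and a2 of source D1 equal, which Assumption 2 rules out.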

SLIDE 29

From linking tables to sameAs statements

We propose to use a different URI template $\mathit{uri}_{C,D}$ for each pair consisting of a category $C$ and a data source $D$.

Mapping linking tables to sameAs:

1. We map the linking tables to sameAs statements:

     $\mathit{uri}_{C,D_i}(\{\mathit{id}_i\})\ \mathtt{sameAs}\ \mathit{uri}_{C,D_j}(\{\mathit{id}_j\}) \;\leftarrow\; \texttt{SELECT}\ \mathit{id}_i, \mathit{id}_j\ \texttt{FROM}\ L^{C}_{ij}$

2. We embed the symmetry and transitivity axioms of sameAs into the mappings.

3. We rewrite the user query by embedding sameAs statements, and explicitly take reflexivity into account.
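Step 1, together with the symmetry part of step 2, can be sketched as a generator of mapping assertions, one (target, source) pair per direction over the same SQL query. The URI templates, the `id_<source>` column naming, the table name `L_wellbore_12`, and the `{column}` placeholder syntax are assumptions made for illustration; transitivity and the reflexivity handling of step 3 are not covered by this sketch.

```python
# Hypothetical URI templates uri_{C,D}, one per (category, data source) pair;
# "%s" is replaced by the SQL column that fills the {...} placeholder.
URI = {
    ("Wellbore", "D1"): "http://example.org/d1/wellbore/{%s}",
    ("Wellbore", "D2"): "http://example.org/d2/wellbore/{%s}",
}

def same_as_mappings(category, d_i, d_j, table):
    """Return (target, source) pairs mapping linking table `table`, assumed to
    have columns id_<d_i> and id_<d_j>, to sameAs statements. The second pair
    encodes symmetry by swapping subject and object over the same SQL source."""
    col_i, col_j = f"id_{d_i}", f"id_{d_j}"
    source = f"SELECT {col_i}, {col_j} FROM {table}"

    def target(d_a, col_a, d_b, col_b):
        subject = URI[(category, d_a)] % col_a
        obj = URI[(category, d_b)] % col_b
        return f"<{subject}> owl:sameAs <{obj}> ."

    return [
        (target(d_i, col_i, d_j, col_j), source),
        (target(d_j, col_j, d_i, col_i), source),
    ]

for tgt, src in same_as_mappings("Wellbore", "D1", "D2", "L_wellbore_12"):
    print(tgt, "<-", src)
```

Reusing one SQL source for both directions keeps the linking table itself unchanged; only the target side of the mapping is swapped.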

SLIDE 30

Results on Statoil data

We integrated EPDS and the NPD fact pages at Statoil: we extended the existing ontology and the set of mappings, and created the linking tables. We ran 22 queries covering the information needs of end users. The execution times range from 3.2 seconds to 12.8 minutes, with a mean of 53 seconds and a median of 8.6 seconds.

[Figure: histogram of the number of queries per execution-time range (0–10 s, 10–30 s, 30–60 s, 1–5 min, 5–20 min).]

SLIDE 31

Experiment on synthetic data

We set up an artificial OBDI environment based on the Wisconsin Benchmark. We created a single database with 10 tables exposing ∼2B triples:
  • 4 Wisconsin tables representing different data sources, and
  • 6 linking tables.

Each Wisconsin table contains 100M rows. The tables occupied ∼100GB of disk space.

We experimented with two settings: 2 and 3 linked data sources, with ∼120M equal objects (60%). In total, we ran 1332 queries.

SLIDE 32

Results on synthetic data

In the scenario with 2 linked data sources:
  • Most of the queries run in less than 5 min.
  • The worst-performing query (4 joins, 2 data properties, 2 object properties, 105 selectivity) returns 480,000 results and takes ∼13 min.

In the scenario with 3 linked data sources:
  • Most of the queries take less than 15 min.
  • The worst query takes around 1.5 hours and returns 1,620,000 results.

Worst execution times, including fetching time (left: 2 data sources, right: 3 data sources):

[Figure: two bar charts of the worst execution time in seconds per query group (G1 to G9), for 60%, 30%, and 10% equality.]

SLIDE 33

Outline

1. Task 6.1: Transformation System Configuration
2. Task 6.2: Runtime Query Rewriting
3. Task 6.3: Transformation System Tuning
4. Task 6.4: Query Transformation Sub-System Implementation and Evaluation

SLIDE 34

Task 6.3: Transformation System Tuning

30% of the queries in the Statoil query catalog were timing out. This is due to:
  • too many redundant unions, resulting from deep hierarchies and mappings;
  • too many redundant joins, coming from missing DB metadata (such as keys).

To address this, we have developed optimizations based on two novel OBDA constraints:
  • exact mappings, for removing redundant unions;
  • virtual functional dependencies, for removing redundant joins.

To apply these techniques, we exploit domain constraints and storage properties that are not explicitly defined as database integrity constraints. This work is currently under submission to an international conference.
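Exact mappings can be suggested by looking at the data: if, on the current instance, every other branch of a union returns only ids already produced by one branch, the union collapses to that branch. The sketch below checks this with SQLite's `EXCEPT`; the schema, the queries, and the helper `union_collapses` are hypothetical and only loosely mirror the wellbore example, and a check on one instance is evidence for an exact mapping, not a proof that it holds for every future instance.

```python
import sqlite3

def union_collapses(conn, main, others):
    """Return True if, on the current instance, every row of each query in
    `others` also appears in `main`, so that main UNION others == main here."""
    for query in others:
        missing = conn.execute(
            f"SELECT COUNT(*) FROM ({query} EXCEPT {main})"
        ).fetchone()[0]
        if missing:
            return False
    return True

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE wellbore (id TEXT PRIMARY KEY, name TEXT, well TEXT);
CREATE TABLE prod_wellbore (id TEXT);
CREATE TABLE has_interval (interval_id TEXT, wellbore_id TEXT);
INSERT INTO wellbore VALUES ('w1', 'A', 'well1'), ('w2', 'B', 'well1');
INSERT INTO prod_wellbore VALUES ('w2');
INSERT INTO has_interval VALUES ('i1', 'w1'), ('i2', 'w2');
""")

main = "SELECT id FROM wellbore"
others = ["SELECT id FROM prod_wellbore", "SELECT wellbore_id FROM has_interval"]
print(union_collapses(conn, main, others))  # True
```

Inserting a production wellbore that is missing from `wellbore` makes the check fail, and the union can then no longer be dropped.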

SLIDE 35

Example: Data storage at Statoil

For the view wellbore, we know that:

1. It must contain all the wellbores in the database.

2. Each wellbore entry must contain the information about name, date, and well (no nulls).

3. For each wellbore, there is a single date/well that is tagged as 'actual'.

The SPARQL query

     SELECT * WHERE { ?wlb rdf:type :Wellbore . }

gets unfolded and optimized into an SQL query containing the union

     sql1 = sql:Wellbore ∪ sql:ProdWellbore ∪ π#sql:hasInterval

Exploiting Item 1, one can show that this union is equivalent to sql:Wellbore.

The SPARQL query

     SELECT * WHERE { ?wlb rdf:type :Wellbore ; :isInWell ?w . }

gets unfolded and optimized into an SQL query containing the join

     sql1 ⋈ sql:isInWell

Since sql1 gets simplified to sql:Wellbore, and exploiting Items 1 and 2, one can show that this join is equivalent to sql:Wellbore.
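The join elimination in this example relies on the view behaving as if `id` functionally determined `well`, with no NULLs. A minimal sketch that tests this virtual functional dependency on an SQLite instance and compares the self-join produced by unfolding with the plain projection; the tables, the `tag` column used to model Item 3, the helper `virtual_fd_holds`, and the assumption that `:isInWell` is mapped from the same view are all hypothetical.

```python
import sqlite3

def virtual_fd_holds(conn, relation, lhs, rhs):
    """Check, on the current instance only, that lhs -> rhs holds in relation
    and that rhs contains no NULLs."""
    violations = conn.execute(
        f"SELECT COUNT(*) FROM (SELECT {lhs} FROM {relation} "
        f"GROUP BY {lhs} HAVING COUNT(DISTINCT {rhs}) > 1)"
    ).fetchone()[0]
    nulls = conn.execute(
        f"SELECT COUNT(*) FROM {relation} WHERE {rhs} IS NULL"
    ).fetchone()[0]
    return violations == 0 and nulls == 0

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE wellbore_hist (id TEXT, name TEXT, well TEXT, tag TEXT);
INSERT INTO wellbore_hist VALUES
  ('w1', 'A', 'well1', 'actual'), ('w1', 'A', 'well0', 'old'),
  ('w2', 'B', 'well1', 'actual');
-- Item 3: keeping only the rows tagged 'actual' gives one well per wellbore.
CREATE VIEW wellbore AS
  SELECT id, name, well FROM wellbore_hist WHERE tag = 'actual';
""")

# isInWell is assumed to be mapped from the same view, so unfolding joins it with itself.
joined = ("SELECT w.id, i.well FROM wellbore w "
          "JOIN (SELECT id, well FROM wellbore) i ON w.id = i.id")
single = "SELECT id, well FROM wellbore"

print(virtual_fd_holds(conn, "wellbore", "id", "well"))       # True
print(virtual_fd_holds(conn, "wellbore_hist", "id", "well"))  # False
print(sorted(conn.execute(joined)) == sorted(conn.execute(single)))  # True
```

On the history table the dependency fails (wellbore w1 has two wells), and there the self-join would duplicate rows, so the elimination is only safe on the 'actual' view.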

SLIDE 36

Results over Statoil query catalog

| | No tuning | w/ FD | w/ exact mappings | Both tunings |
|---|---|---|---|---|
| Number of queries timing out | 17 | 10 | 11 | 4 (2 w/o UDF) |
| Number of fully answered queries | 43 | 50 | 49 | 56 (58 w/o UDF) |
| Avg. SQL query length (characters) | 51521 | 28112 | 32364 | 8954 |
| Avg. unfolding time | 3.929 s | 3.917 s | 1.142 s | 0.026 s |
| Avg. total query exec. time (with timeouts) | 376.540 s | 243.935 s | 267.863 s | 147.248 s |
| Median total query exec. time (with timeouts) | 35.241 s | 11.135 s | 21.602 s | 14.936 s |
| Avg. successful query exec. time (without timeouts) | 36.540 s | 43.935 s | 51.217 s | 67.248 s |
| Median successful query exec. time (without timeouts) | 12.551 s | 8.277 s | 12.437 s | 12.955 s |
| Avg. number of unions in generated SQL | 6.3 | 3.4 | 5.1 | 2.2 |
| Avg. number of tables joined per union in generated SQL | 21.0 | 18.2 | 20.0 | 14.2 |
| Avg. total number of tables in generated SQL | 132.7 | 62.0 | 102.2 | 31.4 |

[Figure: comparison of query execution time with and without tuning (logarithmic scale, 0.1 s to 20 min); series: "No tuning" and "With tuning".]
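As a sanity check on the results table for the Statoil catalog, the figures below are copied from it and only recombined: each configuration accounts for the same 60 queries (timing out plus fully answered), and the reductions obtained with both tunings can be computed directly.

```python
# Values copied from the table; column order:
# No tuning, w/ FD, w/ exact mappings, Both tunings.
timing_out = [17, 10, 11, 4]
fully_answered = [43, 50, 49, 56]
sql_length = [51521, 28112, 32364, 8954]
unfolding_s = [3.929, 3.917, 1.142, 0.026]
tables_total = [132.7, 62.0, 102.2, 31.4]

totals = [t + f for t, f in zip(timing_out, fully_answered)]
print(totals)  # [60, 60, 60, 60]

def reduction(values):
    """Ratio between the untuned value and the value with both tunings."""
    return values[0] / values[-1]

print(f"SQL length: {reduction(sql_length):.1f}x shorter")      # 5.8x
print(f"Unfolding time: {reduction(unfolding_s):.0f}x faster")  # 151x
print(f"Tables in SQL: {reduction(tables_total):.1f}x fewer")   # 4.2x
```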

SLIDE 37

Outline

1. Task 6.1: Transformation System Configuration
2. Task 6.2: Runtime Query Rewriting
3. Task 6.3: Transformation System Tuning
4. Task 6.4: Query Transformation Sub-System Implementation and Evaluation

SLIDE 38

Task 6.4: Query Transformation Sub-System Implementation and Evaluation

We have created the NPD benchmark and used it to evaluate Ontop [Lanti et al., 2015, EDBT].
We have implemented our preliminary solution for supporting owl:sameAs [Calvanese, Giese, et al., 2015, ISWC].
We have implemented the approximation techniques for expressive ontologies, developed with UniRoma1, in the prototype system OntoProx [Botoeva et al., 2016, AAAI, to appear].
We have improved the performance of the ontology classification and query containment checking implementations.
We have implemented most of the SPARQL 1.1 functions.
We have developed optimizations exploiting foreign key constraints over multiple columns.

SLIDE 39

Task 6.4: Query Transformation Sub-System Implementation and Evaluation (cont.)

Support for richer mappings: CONCAT and REPLACE in the source SQL query.
Support for literal templates in the target query.
Stronger validation for the Sesame workbench.
Upgrade of the Ontop plugin to Protégé 5.
New command line interface for Ontop.
Three stable versions of Ontop released in Y3: v1.14, v1.15, and v1.16.
Since v1.15 (May 2015), the Ontop bundles are hosted on SourceForge instead of an internal FUB server.
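Literal templates in the target part of a mapping can be pictured as string templates filled with column values from each SQL result row. A minimal sketch, assuming `{column}` placeholders and, following the R2RML convention for templates, producing no literal when a referenced column is NULL; the helper `fill_literal_template` and the row are hypothetical, and Ontop's exact behaviour is not claimed here.

```python
import re
from typing import Optional

# {column} placeholders, in the style of the "{...}" templates used in OBDA mappings.
PLACEHOLDER = re.compile(r"\{(\w+)\}")

def fill_literal_template(template: str, row: dict) -> Optional[str]:
    """Instantiate a literal template from one SQL result row.
    Returns None when a referenced column is NULL (no literal is produced)."""
    if any(row.get(col) is None for col in PLACEHOLDER.findall(template)):
        return None
    return PLACEHOLDER.sub(lambda m: str(row[m.group(1)]), template)

row = {"name": "1/6-5", "field": "Example field", "well": None}
print(fill_literal_template("Wellbore {name} ({field})", row))  # Wellbore 1/6-5 (Example field)
print(fill_literal_template("Wellbore {name} in {well}", row))  # None
```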

SLIDE 40

Download statistics by month

More than 1,600 downloads in the last 7 months (statistics provided by SourceForge).

SLIDE 41

Download statistics by country

SLIDE 42

Google Analytics of the Ontop website

http://ontop.inf.unibz.it received more than 11K hits in Y3.

slide-43
SLIDE 43

Discussions on Google Group

Established in July 2013; 183 topics.

slide-44
SLIDE 44

Members of Google Group

Established in July 2013; currently 162 members.

slide-45
SLIDE 45

Posts and topics on Google Group in Y3

slide-46
SLIDE 46

Development on GitHub

slide-47
SLIDE 47

Publications in Y3 I

Bienvenu, Meghyn, Diego Calvanese, Magdalena Ortiz, and Mantas Simkus (2014). “Nested Regular Path Queries in Description Logics”. In: Proc. of the 14th Int. Conf. on the Principles of Knowledge Representation and Reasoning (KR 2014). AAAI Press.

Xiao, Guohui, Martin Rezk, Mariano Rodriguez-Muro, and Diego Calvanese (2014). “Rules and Ontology Based Data Access”. In: Proc. of the 8th Int. Conference on Web Reasoning and Rule Systems (RR 2014). Ed. by Marie-Laure Mugnier and Roman Kontchakov. Lecture Notes in Computer Science. Springer.

Ahmeti, Albin, Diego Calvanese, Axel Polleres, and Vadim Savenkov (2015). “Dealing with Inconsistencies due to Class Disjointness in SPARQL Update”. In: Proc. of the 28th Int. Workshop on Description Logics (DL). Vol. 1350. CEUR Electronic Workshop Proceedings, http://ceur-ws.org/.

Calvanese, Diego, Benjamin Cogrel, Elem Guzel Kalayci, Sarah Komla Ebri, Roman Kontchakov, Davide Lanti, Martin Rezk, Mariano Rodriguez-Muro, and Guohui Xiao (2015). “OBDA with the Ontop Framework”. In: Proc. of the 23rd Italian Symposium on Advanced Database Systems (SEBD).

Calvanese, Diego, Giuseppe De Giacomo, Domenico Lembo, Maurizio Lenzerini, and Riccardo Rosati (2015). “Data Complexity of Query Answering in Description Logics (Extended Abstract)”. In: Proc. of the 24th Int. Joint Conf. on Artificial Intelligence (IJCAI). AAAI Press, pp. 4163–4167.

Calvanese, Diego, Martin Giese, Dag Hovland, and Martin Rezk (2015). “Ontology-based Integration of Cross-linked Datasets”. In: Proc. of the 14th Int. Semantic Web Conference (ISWC). Lecture Notes in Computer Science. Springer.

Calvanese, Diego and Boris Konev, eds. (2015). Proc. of the 28th Int. Workshop on Description Logics (DL). Vol. 1350. CEUR Electronic Workshop Proceedings, http://ceur-ws.org/.

slide-48
SLIDE 48

Publications in Y3 II

Calvanese, Diego, Manolis Koubarakis, and David Toman (2015). “Preface to the Special Issue on Ontology-Based Data Access”. In: Journal of Web Semantics 33, pp. 1–2.

Calvanese, Diego, Marco Montali, Fabio Patrizi, and Andrey Rivkin (2015). “Implementing Data-Centric Dynamic Systems over a Relational DBMS”. In: Proc. of the 9th Alberto Mendelzon Int. Workshop on Foundations of Data Management (AMW). Vol. 1378. CEUR Electronic Workshop Proceedings, http://ceur-ws.org/.

Calvanese, Diego, Marco Montali, Alifah Syamsiyah, and Wil M. P. van der Aalst (2015). “Ontology-Driven Extraction of Event Logs from Relational Databases”. In: Proc. of the 11th Int. Workshop on Business Process Intelligence (BPI).

Calvanese, Diego, Alessandro Mosca, Jose Remesal, Martin Rezk, and Guillem Rull (2015). “A ‘Historical Case’ of Ontology-Based Data Access”. In: Proc. of Digital Heritage 2015 (DH). IEEE Computer Society Press.

Lanti, Davide, Martin Rezk, Guohui Xiao, and Diego Calvanese (2015). “The NPD Benchmark: Reality Check for OBDA Systems”. In: Proc. of the 18th Int. Conference on Extending Database Technology (EDBT). OpenProceedings.org, pp. 617–628.

Milo, Tova and Diego Calvanese, eds. (2015). Proc. of the 34th ACM SIGACT SIGMOD SIGAI Symposium on Principles of Database Systems (PODS). ACM.

Botoeva, Elena, Diego Calvanese, Valerio Santarelli, Domenico F. Savo, Alessandro Solimando, and Guohui Xiao (2016). “Beyond OWL 2 QL in OBDA: Rewritings and Approximations”. In: Proc. of the 30th AAAI Conf. on Artificial Intelligence (AAAI). To appear.
