Query Answering in Data Integration

Piotr Wieczorek
Institute of Computer Science, University of Wrocław
Dagstuhl, November 2010

Outline

1. Quick reminder
2. Computing certain answers under OWA/CWA
3. Inverse rules algorithm
4. MiniCon algorithm
5. Coping with integrity constraints and access patterns
6. Rewriting using views in the presence of access patterns, integrity constraints, disjunction and negation

Bibliography

1. S. Abiteboul and O.M. Duschka: Complexity of Answering Queries Using Materialized Views. In Proc. PODS'98 (Symposium on Principles of Database Systems), pp. 254-263, 1998.
2. O.M. Duschka, M.R. Genesereth, and A.Y. Levy: Recursive Query Plans for Data Integration. J. Log. Program. 43(1), pp. 49-73, 2000.
3. R. Pottinger and A.Y. Halevy: MiniCon: A scalable algorithm for answering queries using views. VLDB J. 10(2-3), pp. 182-198, 2001.
4. A. Deutsch, B. Ludäscher, and A. Nash: Rewriting queries using views with access patterns under integrity constraints. Theor. Comput. Sci. 371(3), pp. 200-226, 2007.

Quick reminder

Data integration
◮ global relations (mediated schema): used in queries
◮ source relations: store the actual data; view instance I
◮ mapping: LAV, where each source relation is described as the result of a query over the global relations; view definitions V = (V_1, ..., V_n)

Certain answers
The certain answers for Q are the tuples that belong to Q(D) for every database D consistent with the given instance of the source relations. A tuple t is a certain answer
◮ under OWA (views are sound) if t ∈ Q(D) for each database D such that I ⊆ V(D)
◮ under CWA (views are exact) if t ∈ Q(D) for each database D such that I = V(D)

Quick reminder

Query rewriting
A query rewriting using views mentions only the source relations. It can be equivalent to the query or maximally contained in it (possibly relative to a set of constraints).

Query Answering vs. Incomplete Databases

Idea
Views (= source data) represent many possible (global) databases, so techniques from incomplete databases can be used.

Example
View definitions:
  v(0, Y) :- p(0, Y)
  v(X, Y) :- p(X, Z), p(Z, Y)
View instance:
  { v(0, 1), v(1, 1) }

Conditional table (OWA):
  p:  0  1  |  w = 1
      0  x  |  w ≠ 1
      x  1  |  w ≠ 1
      1  u  |  true
      u  1  |  true

Conditional table (CWA):
  p:  0  1  |  true
      1  1  |  true
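The example can be checked by brute force. The sketch below enumerates every database for p over a tiny finite domain and intersects the query results over the databases consistent with the view instance. The domain {0, 1, 2}, the query q(X, Y) :- p(X, Y), and all function names are assumptions made for illustration only. Restricting to a finite domain is a simplification, not part of the definition of certain answers.

```python
from itertools import combinations, product

DOMAIN = (0, 1, 2)                      # assumption: tiny finite domain
TUPLES = list(product(DOMAIN, repeat=2))
I = {(0, 1), (1, 1)}                    # view instance from the example


def view(db):
    """Evaluate v(0,Y) :- p(0,Y) and v(X,Y) :- p(X,Z), p(Z,Y) on db."""
    out = {(x, y) for (x, y) in db if x == 0}
    out |= {(x, y) for (x, z) in db for (z2, y) in db if z == z2}
    return out


def all_databases():
    """Yield every subset of DOMAIN x DOMAIN as a candidate relation p."""
    for size in range(len(TUPLES) + 1):
        for subset in combinations(TUPLES, size):
            yield frozenset(subset)


def certain_answers(query, consistent):
    """Intersect query(db) over all databases whose view image is consistent."""
    answers = None
    for db in all_databases():
        if consistent(view(db)):
            result = query(db)
            answers = result if answers is None else answers & result
    return answers if answers is not None else set()


def q(db):
    """Illustrative query q(X,Y) :- p(X,Y)."""
    return set(db)


owa = certain_answers(q, lambda vd: I <= vd)   # sound views: I is a subset of V(D)
cwa = certain_answers(q, lambda vd: I == vd)   # exact views: I equals V(D)
print("OWA:", owa)
print("CWA:", cwa)
```

Under these assumptions the OWA result is empty, because (0, 1) can be derived either from p(0, 1) or from a two-step path, while the CWA result contains both rows of the CWA table.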

Query Answering under OWA vs. Query Containment

Simple reductions between the two problems exist in both directions (for views and queries in CQ, CQ≠, PQ, datalog).

Reduction to query containment
Input: V = (v_1, ..., v_k), Q, I and a tuple t.
Let Q' be the query consisting of all the definitions in V together with the rule
  q'(t) :- v_1(t_{1,1}), ..., v_1(t_{1,n_1}), ..., v_k(t_{k,1}), ..., v_k(t_{k,n_k})
where I(v_i) = {t_{i,1}, ..., t_{i,n_i}}.
Then t is a certain answer iff Q' ⊆ Q.

Query Answering under OWA vs. Query Containment

Reduction to computing certain answers
Input: Q_1 and Q_2.
Let the view definition be the rules of Q_1 together with
  v(c) :- q_1(X), p(X)
Let the instance be I = { v(c) } and let Q consist of all the rules of Q_2 together with
  q(c) :- q_2(X), p(X)
Then Q_1 ⊆ Q_2 iff (c) is a certain answer.

Query Answering under OWA vs. Query Containment

Consequences
◮ Decidability and undecidability results carry over in both directions.
◮ If the problems are decidable, then the combined complexity of computing certain answers is the same as the query complexity of query containment.

Data complexity of computing certain answers under OWA

views \ query   CQ       CQ≠      PQ       datalog  FO
CQ              PTIME    coNP     PTIME    PTIME    undec.
CQ≠             PTIME    coNP     PTIME    PTIME    undec.
PQ              coNP     coNP     coNP     coNP     undec.
datalog         coNP     undec.   coNP     undec.   undec.
FO              undec.   undec.   undec.   undec.   undec.

Data complexity of computing certain answers under CWA

views \ query   CQ       CQ≠      PQ       datalog  FO
CQ              coNP     coNP     coNP     coNP     undec.
CQ≠             coNP     coNP     coNP     coNP     undec.
PQ              coNP     coNP     coNP     coNP     undec.
datalog         undec.   undec.   undec.   undec.   undec.
FO              undec.   undec.   undec.   undec.   undec.

Maximally contained rewriting vs. certain answers

A datalog query P is a query plan if all EDB predicates in P are view literals.
The expansion P^exp of a query plan P is P with all view literals replaced by their definitions.
A query plan P is maximally contained in a datalog query Q w.r.t. view definitions V if
◮ P^exp ⊆ Q, and
◮ for each query plan P' with (P')^exp ⊆ Q we have (P')^exp ⊆ P^exp.
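The expansion step can be illustrated with a short sketch. It assumes that every view is defined by a single conjunctive rule whose head and body contain only distinct variables, and it renames existential variables apart with a numeric suffix. The data layout, the function name `expand`, and the view `s` are hypothetical choices for this illustration.

```python
from itertools import count


def expand(plan_body, views):
    """Replace each view literal in plan_body by the body of its definition.

    plan_body: list of (predicate, args) atoms.
    views: maps a view name to (head_vars, body); assumed to hold one
    conjunctive rule per view, with variables only (no constants).
    """
    fresh = count()
    expanded = []
    for pred, args in plan_body:
        if pred not in views:
            expanded.append((pred, args))     # not a view literal, keep as is
            continue
        head_vars, body = views[pred]
        n = next(fresh)                       # one suffix per view literal
        subst = dict(zip(head_vars, args))    # bind head variables to the call
        for body_pred, body_args in body:
            # Existential variables get a fresh name such as Z_0, Z_1, ...
            new_args = tuple(subst.get(a, f"{a}_{n}") for a in body_args)
            expanded.append((body_pred, new_args))
    return expanded


# Hypothetical view: s(X, Y) :- edge(X, Z), edge(Z, Y)
VIEWS = {"s": (("X", "Y"), [("edge", ("X", "Z")), ("edge", ("Z", "Y"))])}
# Plan body: s(A, B), s(B, C)
expansion = expand([("s", ("A", "B")), ("s", ("B", "C"))], VIEWS)
print(expansion)
```

Each occurrence of a view literal receives its own copies of the existential variables, which is what keeps the two uses of `s` from sharing the intermediate node.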

Maximally contained rewriting vs. certain answers

Theorem
Let V ⊆ CQ and Q ∈ datalog, and let P be maximally contained in Q w.r.t. V. Then for each view instance I the query plan P computes exactly the certain answers of Q under OWA.

Proof.
Suppose that I is a view instance for which P fails to compute some certain answer t. Let P' be the query plan P with two additional rules:
  r_1: q'(X) :- q(X)
  r_2: q'(t) :- v_1(t_{1,1}), ..., v_1(t_{1,n_1}), ..., v_k(t_{k,1}), ..., v_k(t_{k,n_k})
where I(v_i) = {t_{i,1}, ..., t_{i,n_i}} and q is the answer predicate of P.
Then (P')^exp is contained in Q but not in P^exp, which contradicts the maximal containment of P in Q.

Inverse rules

Example
Data sources:
  s_1(X, Y) :- edge(X, Z), edge(Z, W), edge(W, Y)
  s_2(X) :- edge(X, Z)
Inverse rules:
  edge(X, f_1(X, Y)) :- s_1(X, Y)
  edge(f_1(X, Y), f_2(X, Y)) :- s_1(X, Y)
  edge(f_2(X, Y), Y) :- s_1(X, Y)
  edge(X, f_3(X)) :- s_2(X)
A fresh function symbol f_{r,i} is introduced for each rule r and each existential variable X_i in r.
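The construction of inverse rules is mechanical, as the following sketch suggests. It assumes that each view is a single conjunctive rule given as a head variable tuple plus a list of body atoms. The function name `inverse_rules` and the running numbering f1, f2, f3 are illustrative choices that mirror the example, not a fixed convention.

```python
def inverse_rules(view_name, head_vars, body, start=1):
    """Build the inverse rules of one view definition as strings.

    Every existential variable (a body variable that is absent from the head)
    is replaced by a fresh function term over the head variables.
    Returns the rules and the next unused function index.
    """
    head = f"{view_name}({', '.join(head_vars)})"
    skolem = {}
    index = start
    for _, args in body:
        for var in args:
            if var not in head_vars and var not in skolem:
                skolem[var] = f"f{index}({', '.join(head_vars)})"
                index += 1
    rules = []
    for pred, args in body:
        terms = ", ".join(skolem.get(var, var) for var in args)
        rules.append(f"{pred}({terms}) :- {head}")
    return rules, index


# s1(X, Y) :- edge(X, Z), edge(Z, W), edge(W, Y)
rules_s1, next_index = inverse_rules(
    "s1", ("X", "Y"),
    [("edge", ("X", "Z")), ("edge", ("Z", "W")), ("edge", ("W", "Y"))],
)
# s2(X) :- edge(X, Z)
rules_s2, _ = inverse_rules("s2", ("X",), [("edge", ("X", "Z"))], start=next_index)

for rule in rules_s1 + rules_s2:
    print(rule)
```

One inverse rule is produced per body atom, so the three-atom definition of s1 yields three rules and s2 yields one.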

Inverse rules algorithm (1)

Example
Query Q:
  q(X, Y) :- edge(X, Y)
  q(X, Y) :- edge(X, Z), edge(Z, Y)
Data source:
  s(X, Y) :- edge(X, Z), edge(Z, Y)
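A minimal sketch of how this example could be evaluated: the inverse rules of s rebuild edge facts that contain function terms, the query runs over those facts, and answers that still contain a function term are discarded. The source instance {s(a, b), s(b, c)} and all Python names are hypothetical and chosen only for illustration. Since Q is non-recursive, a single evaluation pass is enough here.

```python
def apply_inverse_rules(s_facts):
    """Apply edge(X, f(X,Y)) :- s(X,Y) and edge(f(X,Y), Y) :- s(X,Y)."""
    edges = set()
    for x, y in s_facts:
        z = ("f", x, y)          # function term f(X, Y), encoded as a tuple
        edges.add((x, z))
        edges.add((z, y))
    return edges


def has_function_term(answer):
    """True if any component of the answer tuple is a function term."""
    return any(isinstance(term, tuple) for term in answer)


def answer_query(edges):
    """Evaluate q(X,Y) :- edge(X,Y) and q(X,Y) :- edge(X,Z), edge(Z,Y)."""
    q = set(edges)
    q |= {(x, y) for (x, z) in edges for (z2, y) in edges if z == z2}
    # Keep only answers built from constants of the source instance.
    return {t for t in q if not has_function_term(t)}


S_INSTANCE = {("a", "b"), ("b", "c")}   # hypothetical source instance
answers = answer_query(apply_inverse_rules(S_INSTANCE))
print(sorted(answers))
```

With this instance the first rule of Q contributes nothing, because every reconstructed edge has a function term at one end. The pair (a, c) is not returned either: the source only guarantees paths of length two from a to b and from b to c.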
