Processing Regular Path Queries Using Views
or
What Do We Need for Integrating Semistructured Data?

Diego Calvanese
University of Rome “La Sapienza”
joint work with G. De Giacomo, M. Lenzerini, M.Y. Vardi

Logic-based Methods for Information Integration
Vienna – August 23, 2003

Data integration

Deals with the problem of providing uniform access to a collection of data stored in multiple, autonomous, and heterogeneous data sources.

Basic problem in:
• management of distributed information systems
• data warehousing
• data re-engineering
• enterprise knowledge management
• querying multiple sources on the web
• e-commerce, e-business, e-government, e-...
• integration of data from distributed scientific experiments
• ...

D. Calvanese Processing Regular Path Queries Using Views 1

Framework for data integration

[Figure: a query is posed over the global schema; a mapping relates the global schema to Source Schema 1 and Source Schema 2, which describe Source 1 and Source 2.]

Quality in query answering

Among the various tasks in data integration, we deal with how to answer queries expressed on the global schema:
⇝ View-based query processing

The data integration system should be designed in such a way that suitable quality criteria are met. Here, we concentrate on:
• Soundness: the answer to a query includes only what is known to be true
• Completeness: the answer to a query includes all that is known to be true

We aim at getting exactly what is known. But what is known depends on how the data integration system is modeled.

Formal framework

A data integration system I is a triple ⟨G, S, M⟩, where
• G is the global schema
• S is the source schema
• M is the mapping between G and S

Semantics of I: which are the (global) databases that satisfy I?
• We start from a source database D, representing the data at the sources
• The (global) databases B that satisfy I wrt D are those that:
  – are legal wrt the global schema G, and
  – satisfy the mapping M wrt D

Semistructured data

Semistructured data are an abstraction for data on the web, structured documents, XML:
• A semistructured database is an edge-labeled graph
• Queries need to provide the ability to navigate the graph: regular path queries (RPQs) and 2-way regular path queries (2RPQs)

Q_1(x, y) ← x ((article + book) · ref* · title) y
Q_2(x, y) ← x (article · (ref + ref⁻)* · title) y

[Figure: example bibliography graph rooted at bib (o1), with article and book edges to publication nodes (o15, o37, o42, o58), reference edges, and author and title edges leading to values such as "Victor Vianu", "Regular ...", "Tova Milo", "Type Inference ..."; one author node has firstname "Dan" and lastname "Suciu".]
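To make the two queries concrete, here is a minimal Python sketch that evaluates a 2RPQ bottom-up as a binary relation over a small edge-labeled graph. The graph, the node names (t15, t37, t42), and the helper names are illustrative assumptions, not taken from the slides; it is also only one simple way to evaluate such queries, not the automata-based technique discussed later in the talk.

```python
# Edge-labeled graph as (source, label, target) triples.
# Assumption: a toy graph loosely inspired by the slide's figure.
EDGES = {
    ("bib", "article", "o15"),
    ("bib", "book", "o42"),
    ("o37", "ref", "o15"),   # o37 is not directly under bib
    ("o15", "ref", "o42"),
    ("o15", "title", "t15"),
    ("o37", "title", "t37"),
    ("o42", "title", "t42"),
}
NODES = {n for (u, _, v) in EDGES for n in (u, v)}

# Hypothetical constructors for a tiny regular-expression AST.
def label(a): return ("label", a)        # forward edge a
def inv(a): return ("inv", a)            # inverse edge a⁻ (2RPQs only)
def concat(*es): return ("concat", es)
def union(*es): return ("union", es)
def star(e): return ("star", e)

def compose(r1, r2):
    """Relational composition: (x, z) such that (x, y) in r1 and (y, z) in r2."""
    by_src = {}
    for (y, z) in r2:
        by_src.setdefault(y, set()).add(z)
    return {(x, z) for (x, y) in r1 for z in by_src.get(y, ())}

def evaluate(expr):
    """Return the set of node pairs connected by a path matching expr."""
    kind, arg = expr
    if kind == "label":
        return {(u, v) for (u, a, v) in EDGES if a == arg}
    if kind == "inv":
        return {(v, u) for (u, a, v) in EDGES if a == arg}
    if kind == "union":
        return set().union(*(evaluate(e) for e in arg))
    if kind == "concat":
        result = {(n, n) for n in NODES}
        for e in arg:
            result = compose(result, evaluate(e))
        return result
    if kind == "star":
        step = evaluate(arg)
        result = {(n, n) for n in NODES}      # zero repetitions
        while True:                           # iterate to a fixpoint
            bigger = result | compose(result, step)
            if bigger == result:
                return result
            result = bigger
    raise ValueError(f"unknown expression kind: {kind}")

Q1 = concat(union(label("article"), label("book")), star(label("ref")), label("title"))
Q2 = concat(label("article"), star(union(label("ref"), inv("ref"))), label("title"))

print(sorted(evaluate(Q1)))
print(sorted(evaluate(Q2)))
```

On this toy graph, Q_2 also reaches the title of o37, because the inverse step ref⁻ lets the path walk a reference edge backwards; Q_1 cannot do that.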

Integrating semistructured data

We consider data integration systems I = ⟨G, S, M⟩ where:

• The global schema G simply fixes the set of labels of the database
  Example: G = {article, ref, title, author, ...}

• The sources in S are binary relations
  Example: S = {s_1, s_2, s_3}, where
  – s_1 stores for each bibliography its articles
  – s_2 stores for each publication the ones it references directly or indirectly
  – s_3 stores for each publication its title

• The mapping M is of type local-as-view (LAV): to each source s it associates a 2RPQ view V_s over G
  Example:
  V_{s_1}(b, a) ← b (article) a
  V_{s_2}(p_1, p_2) ← p_1 (ref*) p_2
  V_{s_3}(p, t) ← p (title) t

Assumptions on the sources

Let D be a source database and B a global database that satisfies I wrt D:

sound source: s(D) ⊆ V_s(B)
  all tuples in the source satisfy V_s, but there may be other tuples satisfying V_s that are not in the source

complete source: s(D) ⊇ V_s(B)
  all tuples that satisfy V_s are in the source, but there may also be tuples in the source not satisfying V_s

exact source: s(D) = V_s(B)
  the tuples in the source are exactly those that satisfy V_s (i.e., both sound and complete)

We will assume that sources are sound (unless we explicitly say otherwise).
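The three assumptions are plain set comparisons between the source extension s(D) and the view extension V_s(B). A minimal Python sketch, assuming both are given as finite sets of pairs (the function name and the sample tuples are hypothetical):

```python
def classify_source(source_ext, view_ext):
    """Compare a source extension s(D) with a view extension V_s(B).

    Both arguments are sets of (object, object) pairs.
    """
    sound = source_ext <= view_ext       # s(D) ⊆ V_s(B)
    complete = source_ext >= view_ext    # s(D) ⊇ V_s(B)
    return {"sound": sound, "complete": complete, "exact": sound and complete}

# Assumed example: the view holds three pairs in B, the source stores one of them.
view_ext = {("p1", "p2"), ("p1", "p3"), ("p2", "p3")}
source_ext = {("p1", "p2")}

print(classify_source(source_ext, view_ext))
```

Here the source is sound but not complete: everything it stores satisfies the view, but it misses two pairs that also do.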

View-based query processing tasks

View-based query answering: compute the set of certain answers to a query over the global schema
⇝ is the basic query processing task

View-based query rewriting: reformulate a query over the global schema in terms of the sources
⇝ provides an indirect means for view-based query answering

Query containment and view-based query containment: check whether the answers to one query are contained in the answers to another query, possibly taking into account the views in the mapping
⇝ allow for establishing quality criteria of the answering process

View-based query answering

Given:
• a semistructured data integration system I = ⟨G, S, M⟩
• a source database D
• a 2RPQ Q over G
• a pair of objects (c, d)
check whether (c, d) is a certain answer to Q wrt I and D.

A certain answer is a tuple that is in the answer to Q for every database B that satisfies I wrt D.

View-based query answering is the basic query processing task in data integration [Levy+al ’95, Rajaraman+al ’95, Abiteboul+Duschka ’98, — ICDE’00, — PODS’00, — LICS’00, ...]
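Written as a formula, the definition of certain answers reads as follows. This is a sketch that assumes sound sources; the names sem and cert are notation introduced here for illustration and do not appear on the slides.

```latex
% Databases satisfying I wrt D, under the sound-source assumption
\mathit{sem}_{\mathcal{D}}(\mathcal{I}) =
  \{\, \mathcal{B} \mid \mathcal{B} \text{ is legal wrt } \mathcal{G}
     \text{ and } s(\mathcal{D}) \subseteq V_s(\mathcal{B})
     \text{ for every source } s \in \mathcal{S} \,\}

% Certain answers: pairs returned by Q on every such database
\mathit{cert}(Q, \mathcal{I}, \mathcal{D}) =
  \bigcap_{\mathcal{B} \,\in\, \mathit{sem}_{\mathcal{D}}(\mathcal{I})} Q(\mathcal{B})
```

The decision problem on the slide then asks whether (c, d) ∈ cert(Q, I, D). Equivalently, (c, d) is not a certain answer exactly when some database in sem_D(I) is a counterexample, that is, one where (c, d) is not in the answer to Q.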

View-based query answering for 2RPQs

Technique based on search for a counterexample database:
1. it is sufficient to restrict the attention to counterexamples of a special form (canonical databases)
2. represent canonical databases by means of words
3. construct a two-way finite automaton that accepts words encoding canonical counterexample databases
4. check for emptiness of the automaton

The non-emptiness of the automaton can be rephrased in terms of constraint satisfaction (CSP)
⇝ tight relationship between view-based query answering and CSP

Constraint satisfaction problems

Let A and B be relational structures over the same alphabet.

A homomorphism h is a mapping from A to B such that for every relation R, if (c_1, ..., c_n) ∈ R(A), then (h(c_1), ..., h(c_n)) ∈ R(B).

Non-uniform constraint satisfaction problem CSP(B): the set of relational structures A such that there is a homomorphism from A to B.

Complexity:
• CSP(B) is in NP
• there are structures B for which CSP(B) is NP-hard
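The homomorphism definition can be checked directly by brute force on small structures. The Python sketch below is only an illustration of the definition, with exponential running time; the function name and the encoding of structures as dictionaries from relation names to tuple sets are assumptions. The example uses the well-known fact that graph 3-colorability is CSP(B) for B the triangle K3.

```python
from itertools import product

def has_homomorphism(a_domain, a_rels, b_domain, b_rels):
    """Return True if some mapping h from a_domain to b_domain preserves every relation.

    a_rels and b_rels map relation names to sets of tuples over the respective domain.
    """
    a_domain = list(a_domain)
    b_domain = list(b_domain)
    for images in product(b_domain, repeat=len(a_domain)):
        h = dict(zip(a_domain, images))
        # h is a homomorphism if every tuple of every relation of A maps into B
        if all(
            tuple(h[c] for c in t) in b_rels.get(name, set())
            for name, tuples in a_rels.items()
            for t in tuples
        ):
            return True
    return False

# Template B: the triangle K3, with E holding all ordered pairs of distinct colors.
colors = ["r", "g", "b"]
k3 = {"E": {(x, y) for x in colors for y in colors if x != y}}

# Instance A1: a 4-cycle (one direction per edge suffices since E in K3 is symmetric).
cycle4 = {"E": {(1, 2), (2, 3), (3, 4), (4, 1)}}
# Instance A2: the complete graph on 4 nodes.
k4 = {"E": {(i, j) for i in range(1, 5) for j in range(1, 5) if i < j}}

print(has_homomorphism([1, 2, 3, 4], cycle4, colors, k3))
print(has_homomorphism([1, 2, 3, 4], k4, colors, k3))
```

The 4-cycle belongs to CSP(K3) and the complete graph on four nodes does not, matching the fact that the former is 3-colorable and the latter is not.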

CSP and view-based query answering for 2RPQs

Consider I = ⟨G, S, M⟩ and a 2RPQ query Q over G:
• We can define a relational structure CT_{Q,M}, called the constraint template of Q wrt M
• Given a source database D and two objects c, d, we can define another relational structure CI_D^{c,d} over the same alphabet, called the constraint instance

CT_{Q,M} can be computed in exponential time in Q and polynomial time in M.
