2/16/16 Outline 0) Course Info CS520 1) Introduction Data - PDF document

2/16/16 Outline 0) Course Info CS520 1) Introduction Data Integration, Warehousing, and 2) Data Preparation and Cleaning Provenance 3) Schema matching and mapping 4) Virtual Data Integration 4. Virtual Data Integration 5) Data Exchange 6) Data Warehousing IIT DBGroup 7) Big Data Analytics Boris Glavic http://www.cs.iit.edu/~glavic/ 8) Data Provenance http://www.cs.iit.edu/~cs520/ http://www.cs.iit.edu/~dbgroup/ 1 CS520 - 3) Matching and Mapping 4. Virtual Data Integration 4. Virtual Data Integration • Virtual Data Integration Problems: • How to create mappings? – Discussed in previous part of the course • How to compute query Q Query – This is the main focus of this part Global Schema Mappings Local Local Local Schema Schema Schema 1 2 n 2 3 CS520 - 3) Matching and Mapping CS520 - 3) Matching and Mapping 4. Query Answering with Views 4.1 Query Answering with Views • How to compute query Q over global Example: Solu-ons schema based on source schemas only? Local Schema Global Schema Person Person – What language is used to express mappings? Name Name Address Address – What language due we allow for Q? Office-phone Address Office-address – What language(s) can we use to query local Id Home-phone City sources? Office-contact – What language can we use to compute Q from ∀ x, y, z, a : Person ( x, y ) ∧ Address ( y, z, a ) → ∃ b, c : Person ( x, z, a, b, c ) query results returned by local sources? Query: Q(Name) :- Person(Name, A, OP, OA, HP). – How to deal with incompleteness? Name Address Id City Office-contact Peter 1 1 Chicago (312) 123 4343 Alice 2 2 Chicago (312) 555 7777 Bob 3 3 New York (465) 123 1234 4 5 CS520 - 5) Data Exchange CS520 - 5) Data Exchange 1

2/16/16 4.1 Query Answering with Views 4.1 Query Answering with Views Example: Solu-ons Example: Solu-ons Local Schema Local Schema Global Schema Global Schema Person Person Person Person Name Address Id City Office-contact Name Name Name Name Peter 1 Address 1 Chicago Address (312) 123 4343 Address Address Office-phone Office-phone Alice 2 2 Chicago (312) 555 7777 Address Office-address Address Office-address Bob 3 3 New York (465) 123 1234 Id Home-phone Id Home-phone Values of home-phone are not City City Office-contact available in the source Office-contact ∀ x, y, z, a : Person ( x, y ) ∧ Address ( y, z, a ) → ∃ b, c : Person ( x, z, a, b, c ) ∀ x, y, z, a : Person ( x, y ) ∧ Address ( y, z, a ) → ∃ b, c : Person ( x, z, a, b, c ) Query: Q(Name) :- Person(Name, A, OP, OA, HP). Query: Q(Home-ph) :- Person(N, A, OP, OA, Home-ph). Name RewriKen query over the source: Name Address Id City Office-contact Peter Q(Name) :- Person(Name, AI), Peter 1 1 Chicago (312) 123 4343 Alice Address(AI,A,OP). Alice 2 2 Chicago (312) 555 7777 Bob Bob 3 3 New York (465) 123 1234 6 7 CS520 - 5) Data Exchange CS520 - 5) Data Exchange 4. Query Answering with Views Motivating Example (Part 1) • Problems Movie(ID,Ttle,year,genre) – How to determine whether query can be answered Director(ID,director) Actor(ID, actor) at all? – Given a rewriting of the query using views, how Q ( T , Y , D ) : − Movie ( I , T , Y , G ), Y ≥ 1950, G = " comedy " do we know it is correct? Director ( I , D ), Actor ( I , D ) – What to do if views can only return some of the query results? V 1 ( T , Y , D ) : − Movie ( I , T , Y , G ), Y ≥ 1940, G = " comedy " Director ( I , D ), Actor ( I , D ) V 1 ⊇ Q Q '( T , Y , D ) : − V 1 ( T , Y , D ), Y ≥ 1950 ⇒ Containment is enough to show that V 1 can be used to answer Q. 8 CS520 - 5) Data Exchange Motivating Example (Part 2) Problem Definition Q ( T , Y , D ) : − Movie ( I , T , Y , G ), Y ≥ 1950, G = " comedy " Input: Query Q Director ( I , D ), Actor ( I , D ) View definiTons: V 1 ,… ,V n V 2 ( I , T , Y ) : − Movie ( I , T , Y , G ), Y ≥ 1950, G = " comedy " A rewriTng: a query Q ’ that refers only V 3 ( I , D ) : − Director ( I , D ), Actor ( ID , D ) to the views and interpreted predicates Containment does not hold, but intuiTvely, V 2 and V 3 are (comparisons) useful for answering Q . Q ''( T , Y , D ) : − V 2 ( I , T , Y ), V 3 ( I , D ) An equivalent rewriTng of Q using V 1 ,… ,V n : How do we express that intuiTon? a rewriTng Q ’ , such that Q ’ ⇔ Q Answering queries using views! 2

2/16/16 Naïve approach Motivating Example (Part 3) • Given Q and views Movie(ID,Ttle,year,genre) – Randomly combine views into a query Q’ Director(ID,director) Actor(ID, actor) – Check equivalence of Q’ and Q – If Q’ is equivalent we are done Q ( T , Y , D ) : − Movie ( I , T , Y , G ), Y ≥ 1950, G = " comedy " – Else repeat Director ( I , D ), Actor ( I , D ) • Why is this not good? V 4 ( I , T , Y ) : − Movie ( I , T , Y , G ), Y ≥ 1960, G = " comedy " – There are infinitely many ways of combining V 3 ( I , D ) : − Director ( I , D ), Actor ( ID , D ) views • E.g., V, V x V, V x V x V, … Q '''( T , Y , D ) : − V 4 ( I , T , Y ), V 3 ( I , D ) – We are not using any information in the query maximally-contained rewri-ng Maximally-Contained Rewritings Why again? Input: Query Q Rewriting query language L View definitions: V 1 , … ,V n Q ’ is a maximally-contained rewriting of Query Q given V 1 , … ,V n and L if: Global LAV/GLAV! 1. Q ’ ∈ L, Schema 2. Q ’ ⊆ Q, and Mappings 3. there is no Q ’’ in L such Local Local Local Schema Schema Schema that 1 2 n Q ’’ ⊆ Q and Q ’ ⊂ Q ’’ Exercise: which of these views Other use-cases can be used to answer Q ? • Query opTmizaTon with materialized views Q ( T , Y , D ) : − Movie ( I , T , Y , G ), Y ≥ 1950, G = " comedy " – Need equivalent rewriTngs Director ( I , D ), Actor ( I , D ) – Implemented in many commercial DBMS – Here interest is cost: how to speed-up query V 2 ( I , T , Y ) : − Movie ( I , T , Y , G ), Y ≥ 1950, G = " comedy " processing by using materialized views V 3 ( I , D ) : − Director ( I , D ), Actor ( I , D ) V 6 ( T , Y ) : − Movie ( I , T , Y , G ), Y ≥ 1950, G = " comedy " V 7 ( I , T , Y ) : − Movie ( I , T , Y , G ), Y ≥ 1950, G = " comedy ", Award ( I , W ) V 8 ( I , T ) : − Movie ( I , T , Y , G ), Y ≥ 1940, G = " comedy " 3

2/16/16 Algorithms for answering queries Bounding the Rewriting Length using views • Step 1 : we ’ ll bound the space of possible Theorem : if there is an equivalent rewriTng, query rewriTngs we need to consider (no there is one with at most n subgoals. comparisons) Query: Q ( X ) : − p 1 ( X 1 ),..., p n ( X n ) • Step 2 : we ’ ll find efficient methods for searching the space of rewriTngs Rewriting: Q '( X ) : − V 1 ( X 1 ),..., V m ( X m ) – Bucket Algorithm, MiniCon Algorithm ϕ • Step 2b : we consider “ logical approaches ” to 1 ,... g k 1 m ,..., g j m Expansion: Q ''( X ) : − g 1 ,..., g 1 the problem:         – The Inverse-Rules Algorithm Proof: Only n subgoals in Q can contribute to the image of the containment mapping ϕ Complexity Result The Bucket Algorithm [LMSS, 1995] • Applies to queries with no interpreted Key idea : predicates. – Create a bucket for each subgoal g in the query. • Finding an equivalent rewriTng of a query – The bucket contains views that contribute to g . using views is NP-complete – Create rewriTngs from the Cartesian product of the buckets (select one view for each goal) – Need only consider rewriTngs of query length or less. • Maximally-contained rewriTng: • Step 1 : assign views with renamed vars to buckets – Union of all conjuncTve rewriTngs of length n or less. • Step 2 : create rewriTngs, refine them, unTl equivalent/all contained rewriTng(s) are found The Bucket Algorithm The Bucket Algorithm Step 1 : Step 1 Intui-on – We want to construct buckets with views that – A view can only be used to provide informaTon have parTally mapped variables about a goal R(X) if it has a goal R(Y) – For each goal g = R in query • Q(X) :- R(X,Y) • V(X) :- S(X,Y) – For each view V – If the query goal contains variables that are in the – For each goal v = R in V head of the query, then the view is only useful if it • If the goal has head variables in the same places as g gives access to these values (they are in the head) then • Q(X) :- R(X,Y) – rename the view head variables to match the query goal vars – choose a new unique name for each other var • V(X) :- S(X,Y), R(Y,Z) – add the resulTng view atom to the bucket 4

2/16/16 Outline 0) Course Info CS520 1) Introduction Data - PDF document

2/16/16 Outline 0) Course Info CS520 1) Introduction Data Integration, Warehousing, and 2) Data Preparation and Cleaning Provenance 3) Schema matching and mapping 4) Virtual Data Integration 4. Virtual Data Integration 5) Data Exchange 6) Data

Classical Planning CE417: Introduction to Artificial Intelligence Sharif University of Technology

Conditional Acceptance Functions Tjitze Rienstra April 3, 2012 Introduction Argumentation

Planning for Web Services Evren Sirin & Bijan Parsia MINDSWAP Research Group University of

Rationalizing Medical Relation Prediction from Corpus-level Statistics Zhen Wang The Ohio State

Closed Loop Fund Increasing Recycling in the U.S. August 12, 2014 Conversation in partnership

1 The induced current in the closed loop is largest in which one of these diagrams? A B D C

Open loop synthesis for closed loop control Kazufumi Ito, North Carolina State University June

Closed-loop Matters: Dual Regression Networks for Single Image Super-Resolution Yong Guo, Jian

This document must be cited according to its fjnal version which is published in a conference as:

Content Introduction 1. Modeling, incl. reality, target, goal and vision 2. Modeling sequences

r s Prs r

Up-To Techniques for Weighted Systems Barbara K onig Universit at Duisburg-Essen Joint

Simulations of Weighted Tree Automata Zoltn sik 1 and Andreas Maletti 2 1 University of Szeged,

Pebble weighted automata and transitive closure logics B. Bollig, P. Gastin, B. Monmege, M.

Quantifiers on languages and codensity monads Luca Reggio Joint work with Mai Gehrke and Daniela

Marginal Consistency: Unifying Constraint Propagation on Commutative Semirings Tom a s

Weighted Finite State Transducer (WFST) Efficient algorithms for various operations. Weights

Lean in new research Neil Strickland January 7, 2020 Using Lean as a tool for new research I

Distributional Learning of Context-Free Grammars. Alexander Clark Department of Philosophy

mathlib : Leans mathematical library Johannes Hlzl VU Amsterdam Lean Together 2019

Coinduction in concurrent timed systems Jan Komenda Institute of Mathematics, Czech Academy of

FLABloM: Functional linear algebra with block matrices Adam Sandberg Eriksson Patrik Jansson

Formulas for various domination numbers of products of paths and cycles Janez c 1 Zerovnik 1 ,

Organizing Numerical Theories using Axiomatic Type Classes Lawrence C Paulson Computer

2/16/16 Outline 0) Course Info CS520 1) Introduction Data - PDF document

2/16/16 Outline 0) Course Info CS520 1) Introduction Data Integration, Warehousing, and 2) Data Preparation and Cleaning Provenance 3) Schema matching and mapping 4) Virtual Data Integration 4. Virtual Data Integration 5) Data Exchange 6) Data

Classical Planning CE417: Introduction to Artificial Intelligence Sharif University of Technology

Conditional Acceptance Functions Tjitze Rienstra April 3, 2012 Introduction Argumentation

Planning for Web Services Evren Sirin &amp; Bijan Parsia MINDSWAP Research Group University of

Rationalizing Medical Relation Prediction from Corpus-level Statistics Zhen Wang The Ohio State

Closed Loop Fund Increasing Recycling in the U.S. August 12, 2014 Conversation in partnership

1 The induced current in the closed loop is largest in which one of these diagrams? A B D C

Open loop synthesis for closed loop control Kazufumi Ito, North Carolina State University June

Closed-loop Matters: Dual Regression Networks for Single Image Super-Resolution Yong Guo, Jian

This document must be cited according to its fjnal version which is published in a conference as:

Content Introduction 1. Modeling, incl. reality, target, goal and vision 2. Modeling sequences

r s Prs r

Up-To Techniques for Weighted Systems Barbara K onig Universit at Duisburg-Essen Joint

Simulations of Weighted Tree Automata Zoltn sik 1 and Andreas Maletti 2 1 University of Szeged,

Pebble weighted automata and transitive closure logics B. Bollig, P. Gastin, B. Monmege, M.

Quantifiers on languages and codensity monads Luca Reggio Joint work with Mai Gehrke and Daniela

Marginal Consistency: Unifying Constraint Propagation on Commutative Semirings Tom a s

Weighted Finite State Transducer (WFST) Efficient algorithms for various operations. Weights

Lean in new research Neil Strickland January 7, 2020 Using Lean as a tool for new research I

Distributional Learning of Context-Free Grammars. Alexander Clark Department of Philosophy

mathlib : Leans mathematical library Johannes Hlzl VU Amsterdam Lean Together 2019

Coinduction in concurrent timed systems Jan Komenda Institute of Mathematics, Czech Academy of

FLABloM: Functional linear algebra with block matrices Adam Sandberg Eriksson Patrik Jansson

Formulas for various domination numbers of products of paths and cycles Janez c 1 Zerovnik 1 ,

Organizing Numerical Theories using Axiomatic Type Classes Lawrence C Paulson Computer

Planning for Web Services Evren Sirin & Bijan Parsia MINDSWAP Research Group University of