Counting Problems over Incomplete Databases Marcelo Arenas, Pablo - PowerPoint PPT Presentation

Counting Problems over Incomplete Databases
Marcelo Arenas, Pablo Barceló, Mikaël Monet
June 15th, 2020

Incomplete databases
• Probabilistic databases: one way of dealing with uncertain data
→ But most of the time, this is not what is used in practice...

ProductId  ProductName  Price  Color  Location
439        Printer      $100   NULL   Paris center
782        Mouse        $10    red    NULL
398        Mouse        $30    red    Santiago center
...        ...          ...    ...    ...

CustomerId  Name  Phone number  Gender  Address
6           Bob   NULL          male    36 main street
76          Mary  551780726     NULL    NULL
...         ...   ...           ...     ...

→ Incomplete databases: relational databases with missing values

How do we query incomplete databases?
• The default approach of database theorists for querying incomplete data: certain answers
• For a valuation ν mapping the nulls of D to constants, let ν(D) denote the corresponding complete database
→ A tuple ā is a certain answer of q(x̄) over the incomplete database D if ā ∈ q(ν(D)) for every valuation ν of the nulls of D
Problem: what if there are no certain answers?
→ Recently, Libkin [PODS'18] proposed the notion of better answers:
• a tuple ā is a better answer than another tuple b̄ if {ν ∣ b̄ ∈ q(ν(D))} ⊆ {ν ∣ ā ∈ q(ν(D))}
→ We can compare (some) tuples
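Both notions can be made concrete by brute force: enumerate every valuation, evaluate the query on each ν(D), and record, for each tuple, the set of valuations under which it is an answer. This is a minimal sketch under our own assumptions: nulls are encoded by a hypothetical `Null` dataclass, facts are `(relation, tuple)` pairs, a query is a Python function from a complete database to its set of answer tuples, and the example database is made up. It is exponential in the number of nulls and only illustrates the definitions.

```python
from dataclasses import dataclass
from itertools import product
from math import prod


@dataclass(frozen=True)
class Null:
    """A marked null such as ⊥1 (hypothetical encoding; constants are plain str)."""
    name: str


def apply(nu, db):
    """ν(D): replace every null by its value; a set drops duplicate facts."""
    return frozenset(
        (rel, tuple(nu[t] if isinstance(t, Null) else t for t in args))
        for rel, args in db
    )


def supports(query, db, domains):
    """Map each answer tuple a to {ν | a ∈ q(ν(D))}; each ν is a tuple of values."""
    nulls = sorted(domains, key=lambda n: n.name)
    sup = {}
    for values in product(*(domains[n] for n in nulls)):
        for answer in query(apply(dict(zip(nulls, values)), db)):
            sup.setdefault(answer, set()).add(values)
    return sup


def certain_answers(query, db, domains):
    """Tuples that are answers under every valuation."""
    total = prod(len(d) for d in domains.values())
    return {a for a, s in supports(query, db, domains).items() if len(s) == total}


def better_than(query, db, domains, a, b):
    """Libkin-style test: a is a better answer than b iff supp(b) ⊆ supp(a)."""
    sup = supports(query, db, domains)
    return sup.get(b, set()) <= sup.get(a, set())


# Hypothetical example: q(x) = R(x, c) over D = {R(a, ⊥1), R(b, ⊥2), R(b, ⊥1)}.
N1, N2 = Null("⊥1"), Null("⊥2")
D = [("R", ("a", N1)), ("R", ("b", N2)), ("R", ("b", N1))]
doms = {N1: ["c", "d"], N2: ["c", "d"]}


def q(db):
    return {(x,) for rel, (x, y) in db if rel == "R" and y == "c"}


print(certain_answers(q, D, doms))              # set()
print(better_than(q, D, doms, ("b",), ("a",)))  # True
print(better_than(q, D, doms, ("a",), ("b",)))  # False
```

Here there are no certain answers (the valuation ⊥1 = ⊥2 = d yields no answer at all), yet b is a better answer than a, because every valuation that makes a an answer also makes b one.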

Another approach: counting
• Recall: a tuple ā is a better answer than another tuple b̄ if {ν ∣ b̄ ∈ q(ν(D))} ⊆ {ν ∣ ā ∈ q(ν(D))}
→ We can compare (some) tuples, but two tuples can be incomparable under ⊆
To compare all tuples, why not study the associated counting problems?
→ "How many valuations ν are such that ā ∈ q(ν(D))?"
→ "How many distinct databases of the form ν(D) are such that ā ∈ q(ν(D))?"
→ We can compare all tuples
→ This is what we do!
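To see how counting ranks tuples that better answers cannot compare, here is a brute-force sketch on a hypothetical database D = {R(a, ⊥1), R(a, ⊥2), R(b, ⊥3)}, all domains {c, d}, and q(x) = R(x, c). The tuples a and b are incomparable under ⊆ (⊥1 = c, ⊥3 = d supports only a; ⊥1 = ⊥2 = d, ⊥3 = c supports only b), yet both counts place a above b. The encoding and the name `count_answers` are our own assumptions.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Null:
    """A marked null (hypothetical encoding; constants are plain str)."""
    name: str


def apply(nu, db):
    """ν(D) with duplicate facts removed."""
    return frozenset(
        (rel, tuple(nu[t] if isinstance(t, Null) else t for t in args))
        for rel, args in db
    )


def count_answers(query, db, domains):
    """For every tuple a, count valuations and distinct completions with a ∈ q(ν(D))."""
    nulls = sorted(domains, key=lambda n: n.name)
    per_valuation = Counter()
    answers_of = {}  # distinct completion -> its answer set
    for values in product(*(domains[n] for n in nulls)):
        comp = apply(dict(zip(nulls, values)), db)
        if comp not in answers_of:
            answers_of[comp] = query(comp)
        per_valuation.update(answers_of[comp])
    per_completion = Counter()
    for answers in answers_of.values():
        per_completion.update(answers)
    return per_valuation, per_completion


# Hypothetical example: q(x) = R(x, c), D = {R(a, ⊥1), R(a, ⊥2), R(b, ⊥3)}.
N1, N2, N3 = Null("⊥1"), Null("⊥2"), Null("⊥3")
D = [("R", ("a", N1)), ("R", ("a", N2)), ("R", ("b", N3))]
doms = {N1: ["c", "d"], N2: ["c", "d"], N3: ["c", "d"]}


def q(db):
    return {(x,) for rel, (x, y) in db if rel == "R" and y == "c"}


vals, comps = count_answers(q, D, doms)
print(vals[("a",)], vals[("b",)])    # 6 4
print(comps[("a",)], comps[("b",)])  # 4 3
```

The two counts differ because the 8 valuations produce only 6 distinct completions: swapping the values of ⊥1 and ⊥2 yields the same set of facts.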

Setting
• Incomplete databases with named (marked) nulls
• Each null ⊥ comes with its own finite domain dom(⊥); every valuation ν satisfies ν(⊥) ∈ dom(⊥)
• ν(D): the (complete) database obtained from D by substituting every null ⊥ by ν(⊥), and then removing duplicate tuples. We call such a database a completion of D
Example: D consists of the relation R with tuples (⊥1, ⊥1) and (a, ⊥2), where dom(⊥1) = {a, b} and dom(⊥2) = {a, c}
• ν = {⊥1 ↦ b, ⊥2 ↦ c} → ν(D) = {R(b, b), R(a, c)}
• ν = {⊥1 ↦ a, ⊥2 ↦ a} → ν(D) = {R(a, a)} (both tuples become R(a, a), and the duplicate is removed)
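The substitute-then-deduplicate definition of ν(D) maps directly onto set semantics. The sketch below is our own encoding (a hypothetical `Null` dataclass, facts as `(relation, tuple)` pairs) and reproduces the two valuations of the example on D = {R(⊥1, ⊥1), R(a, ⊥2)}; it does not check domain membership.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Null:
    """A marked null such as ⊥1 (hypothetical encoding; constants are plain str)."""
    name: str


def completion(db, nu):
    """ν(D): substitute each null t by ν(t), then drop duplicate facts (set semantics)."""
    return {(rel, tuple(nu[t] if isinstance(t, Null) else t for t in args))
            for rel, args in db}


B1, B2 = Null("⊥1"), Null("⊥2")
D = [("R", (B1, B1)), ("R", ("a", B2))]  # D = {R(⊥1, ⊥1), R(a, ⊥2)}

print(sorted(completion(D, {B1: "b", B2: "c"})))  # [('R', ('a', 'c')), ('R', ('b', 'b'))]
print(sorted(completion(D, {B1: "a", B2: "a"})))  # [('R', ('a', 'a'))]
```

Because the result is a Python set, the two copies of R(a, a) produced by the second valuation collapse into one fact, which is exactly why several valuations can share a completion.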

Problems studied
• Fix a Boolean query q
Definition: problem #Val(q)
Input: an incomplete database D, together with a finite domain dom(⊥) for each null ⊥ of D
Output: the number of valuations ν such that ν(D) ⊧ q
Definition: problem #Comp(q)
Input: an incomplete database D, together with a finite domain dom(⊥) for each null ⊥ of D
Output: the number of completions ν(D) such that ν(D) ⊧ q
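The two definitions can be restated as set cardinalities. In the sketch below, V(D) (the set of valuations ν with ν(⊥) ∈ dom(⊥) for every null ⊥ of D) and Nulls(D) are our own shorthand, not notation from the slides. The inequality on the last line holds because ν ↦ ν(D) maps the satisfying valuations onto the satisfying completions, and distinct valuations can produce the same completion.

```latex
\begin{aligned}
\#\mathrm{Val}(q)(D)  &= \bigl|\{\, \nu \in V(D) \mid \nu(D) \models q \,\}\bigr| \\
\#\mathrm{Comp}(q)(D) &= \bigl|\{\, \nu(D) \mid \nu \in V(D),\ \nu(D) \models q \,\}\bigr| \\
\#\mathrm{Comp}(q)(D) &\le \#\mathrm{Val}(q)(D) \le |V(D)| = \prod_{\bot \in \mathrm{Nulls}(D)} |\mathrm{dom}(\bot)|
\end{aligned}
```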

Example
• q = ∃x S(x, x), D = {S(a, b), S(⊥1, a), S(a, ⊥2)}, dom(⊥1) = {a, b, c}, dom(⊥2) = {a, b}

(ν(⊥1), ν(⊥2))   ν(D)                           ν(D) ⊧ q?
(a, a)           {S(a, b), S(a, a)}             Yes
(a, b)           {S(a, b), S(a, a)}             Yes
(b, a)           {S(a, b), S(b, a), S(a, a)}    Yes
(b, b)           {S(a, b), S(b, a)}             No
(c, a)           {S(a, b), S(c, a), S(a, a)}    Yes
(c, b)           {S(a, b), S(c, a)}             No

4 satisfying valuations, 3 satisfying completions (the valuations (a, a) and (a, b) yield the same completion)
→ Study the complexity of these problems depending on q (data complexity). Can we obtain dichotomies? Can we efficiently approximate the number of solutions? Etc.
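A brute-force check of this example, enumerating all 6 valuations. It is exponential in the number of nulls, so it only serves to make #Val and #Comp concrete; the encoding (a hypothetical `Null` dataclass, facts as `(relation, tuple)` pairs, the query as a Python predicate) and the names `count_val` and `count_comp` are our own assumptions.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Null:
    """A marked null (hypothetical encoding; constants are plain str)."""
    name: str


def completions_of(db, domains):
    """Yield ν(D) for every valuation ν respecting the domains (one item per ν)."""
    nulls = list(domains)
    for values in product(*(domains[n] for n in nulls)):
        nu = dict(zip(nulls, values))
        yield frozenset((rel, tuple(nu[t] if isinstance(t, Null) else t for t in args))
                        for rel, args in db)


def count_val(query, db, domains):
    """#Val(q): number of valuations ν with ν(D) ⊧ q."""
    return sum(1 for c in completions_of(db, domains) if query(c))


def count_comp(query, db, domains):
    """#Comp(q): number of distinct completions ν(D) with ν(D) ⊧ q."""
    return len({c for c in completions_of(db, domains) if query(c)})


def q(db):
    """q = ∃x S(x, x)."""
    return any(rel == "S" and x == y for rel, (x, y) in db)


B1, B2 = Null("⊥1"), Null("⊥2")
D = [("S", ("a", "b")), ("S", (B1, "a")), ("S", ("a", B2))]
doms = {B1: ["a", "b", "c"], B2: ["a", "b"]}
print(count_val(q, D, doms), count_comp(q, D, doms))  # 4 3
```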

Problem variants and query language
• We also study the setting where all labeled nulls are distinct, i.e., each null occurs at most once (Codd tables, as opposed to naïve tables)
• We also study the setting where all nulls share the same domain (uniform setting)
→ In total we consider 8 different problems (#Val or #Comp, naïve or Codd tables, non-uniform or uniform domains: 2 × 2 × 2)
• We focus on self-join-free Boolean conjunctive queries (sjfBCQs)
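The two restrictions are easy to test on an input. The sketch below uses our own hypothetical encoding (a `Null` dataclass, facts as `(relation, tuple)` pairs, domains as a dict from nulls to lists) and the names `is_codd`, `is_uniform` and `VARIANTS` are illustrative only. It checks the databases of the two earlier examples and enumerates the 2 × 2 × 2 problem variants.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Null:
    """A marked null (hypothetical encoding; constants are plain str)."""
    name: str


def is_codd(db):
    """Codd table: every null occurs at most once in the whole database."""
    seen = set()
    for _, args in db:
        for t in args:
            if isinstance(t, Null):
                if t in seen:
                    return False
                seen.add(t)
    return True


def is_uniform(domains):
    """Uniform setting: all nulls share the same domain."""
    return len({frozenset(d) for d in domains.values()}) <= 1


# {#Val, #Comp} x {naive, Codd} x {non-uniform, uniform}: the 8 problems.
VARIANTS = list(product(["#Val", "#Comp"], ["naive", "Codd"], ["non-uniform", "uniform"]))

B1, B2 = Null("⊥1"), Null("⊥2")
print(len(VARIANTS))                                                     # 8
print(is_codd([("S", ("a", "b")), ("S", (B1, "a")), ("S", ("a", B2))]))  # True
print(is_codd([("R", (B1, B1)), ("R", ("a", B2))]))                      # False
print(is_uniform({B1: ["a", "b", "c"], B2: ["a", "b"]}))                 # False
```

The database {R(⊥1, ⊥1), R(a, ⊥2)} is naïve but not Codd because ⊥1 is repeated; the S-database with dom(⊥1) = {a, b, c} and dom(⊥2) = {a, b} is Codd but not uniform.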

Results (very simplified)
1. For 7 of the 8 variants of our problems, we show a dichotomy for sjfBCQs: each problem is either #P-hard or in PTIME
2. We show that counting valuations for unions of Boolean conjunctive queries always admits a fully polynomial-time randomized approximation scheme (FPRAS)
3. We show that counting completions does not admit an FPRAS in general (under standard complexity-theoretic assumptions)
4. We show that counting completions can be SpanP-complete, whereas counting valuations is always in #P (and #P-complete in its hard cases)
• (SpanP = the class of functions that count the number of distinct outputs of a nondeterministic polynomial-time Turing machine with an output tape)
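Result 2 concerns approximation. A standard tool for estimating the size of a union of sets whose individual sizes are easy to compute is the Karp–Luby (Karp–Luby–Madras) estimator; the sketch below applies it to #Val(q) for a single Boolean CQ. Our own assumption, not a claim about the paper's construction: the satisfying valuations are exactly the union of the sets S_p = {ν ⊇ p}, where p ranges over partial valuations that fix the nulls used by one match of the query atoms into D. For a fixed query there are polynomially many such p; for a union of CQs one would pool the witnesses of all disjuncts. The names `witnesses` and `estimate_val` are hypothetical, and the number of samples needed for a given (ε, δ) guarantee is not derived here.

```python
import random
from dataclasses import dataclass
from itertools import product
from math import prod


@dataclass(frozen=True)
class Null:
    """A marked null (hypothetical encoding; constants are plain str)."""
    name: str


def ground(t, p):
    """Constant for term t under the partial valuation p."""
    return p[t] if isinstance(t, Null) else t


def witnesses(atoms, db, domains):
    """Partial valuations p such that every valuation extending p satisfies q.

    atoms: the Boolean CQ as a list of (relation, tuple of variable names).
    Each p assigns exactly the nulls occurring in one atom-to-fact match.
    """
    found = set()
    cands = [[f for f in db if f[0] == rel and len(f[1]) == len(vs)] for rel, vs in atoms]
    for match in product(*cands):
        nulls = sorted({t for _, args in match for t in args if isinstance(t, Null)},
                       key=lambda n: n.name)
        for values in product(*(domains[n] for n in nulls)):
            p = dict(zip(nulls, values))
            binding = {}  # variable -> constant; must be consistent across atoms
            if all(binding.setdefault(v, ground(t, p)) == ground(t, p)
                   for (_, vs), (_, args) in zip(atoms, match)
                   for v, t in zip(vs, args)):
                found.add(frozenset(p.items()))
    # Sort so that sampling with a seeded RNG is reproducible across runs.
    return sorted(found, key=lambda w: sorted((n.name, c) for n, c in w))


def estimate_val(atoms, db, domains, samples, rng):
    """Karp–Luby-style unbiased estimate of #Val(q) = |union of the sets S_p|."""
    ws = witnesses(atoms, db, domains)
    if not ws:
        return 0.0
    nulls = list(domains)
    # |S_p| = product of the domain sizes of the nulls that p leaves free.
    sizes = [prod(len(domains[n]) for n in nulls if n not in dict(w)) for w in ws]
    total = sum(sizes)
    acc = 0.0
    for _ in range(samples):
        p = dict(rng.choices(ws, weights=sizes)[0])  # pick S_p with prob. |S_p| / total
        nu = {n: p[n] if n in p else rng.choice(domains[n]) for n in nulls}  # uniform in S_p
        cover = sum(all(nu[n] == c for n, c in w) for w in ws)  # number of S_p containing ν
        acc += 1.0 / cover
    return total * acc / samples


B1, B2 = Null("⊥1"), Null("⊥2")
D = [("S", ("a", "b")), ("S", (B1, "a")), ("S", ("a", B2))]
doms = {B1: ["a", "b", "c"], B2: ["a", "b"]}
q = [("S", ("x", "x"))]  # q = ∃x S(x, x)
print(witnesses(q, D, doms))  # two witnesses: {⊥1 ↦ a} and {⊥2 ↦ a}
print(estimate_val(q, D, doms, 20000, random.Random(0)))  # close to the exact value 4
```

On the earlier S-example the two witness sets have sizes 2 and 3 and overlap in the single valuation ⊥1 = ⊥2 = a, so the estimator averages total / cover over samples and converges to 4.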
