Decidable Classes of Datalog Programs with Arithmetic (Mark Kaminski)

SLIDE 1

Decidable Classes of Datalog Programs with Arithmetic

Mark Kaminski

joint work with
 Bernardo Cuenca Grau, Egor Kostylev, Boris Motik, and Ian Horrocks

Department of Computer Science, University of Oxford

Metafinite 2017

SLIDE 2

Data Analytics

  • identifying patterns or trends in raw data:
      market predictions, spotting production bottlenecks, …
  • gaining importance in research and business
  • major challenge: heterogeneous data
      • collected from different sources
      • no uniform data format

SLIDES 3–4

State of the Art

  • custom-made imperative data processing code
      • labour-intensive
      • requires deep technical understanding
      • error-prone

SLIDES 5–6

Declarative Analytics

  • describe what to compute rather than how
  • delegate low-level details to the query engine
  • improve the speed and cost of code development

Alvaro et al. 2010, Markl 2014, Seo et al. 2015, Shkapsky et al. 2016

  • query language: recursive rules + arithmetic

Loo et al. 2009, Alvaro et al. 2010, Eisner & Filardo 2011, Chin et al. 2015,
Seo et al. 2015, Wang et al. 2015, Shkapsky et al. 2016

SLIDE 7

Challenges

  • datalog + arithmetic is undecidable (see Dantsin et al. 2011)
  • no universally agreed-on semantics for aggregation
  • proposals in the literature suffer from
      • high complexity / undecidability
        (Van Gelder 1993, Ross & Sagiv 1997, Greco 1999, Mazuran et al. 2013)
      • limited expressivity
        (Mumick et al. 1990, Consens & Mendelzon 1993, Greco 1999, Faber et al. 2011)
      • unnatural syntactic restrictions (Ross & Sagiv 1997)
SLIDE 8

Our Goal

unifying formal foundation for declarative analytics

  • generalise existing approaches
  • natural syntax and semantics
  • sufficient expressive power
  • theoretically understood computational properties
  • amenable to efficient implementation
SLIDE 9

Overview

  • datalogℤ
  • decidability
  • non-monotonic extension
  • metafinite model theory?
SLIDES 10–18

Datalogℤ

  • positive datalog extended with integer arithmetic
  • example rule

A(x) ∧ B(x,y,m) ∧ C(y,z,n) ∧ (m + 1 ≤ 2·n) → D(y,z,m + n)

  • rules combine ordinary datalog atoms, numeric atoms, and comparison atoms such as (m + 1 ≤ 2·n)
  • one numeric argument per numeric atom (here m, n, and m + n)
  • P ⊧ A(a) if, for every I, I ⊧ P implies I ⊧ A(a), where I ranges over two-sorted first-order interpretations whose numeric sort is the integers
  • P ⊧ A(a) iff A(a) ∈ T_P^∞(∅), where T_P is the immediate consequence operator
  • fact entailment is undecidable even when + is the only arithmetic operator
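
A minimal Python sketch of bottom-up evaluation by iterating the immediate consequence operator T_P, applied to the example rule. The dataset (constants a, b, c, d and the numbers) is a hypothetical toy input, input facts are treated as body-less rules, and the round limit `max_rounds` is an assumption of the sketch: because entailment is undecidable, iteration need not reach a fixpoint in general.

```python
from itertools import product

# Hypothetical toy dataset: (predicate, argument tuple); numeric argument last.
FACTS = {
    ("A", ("a",)),
    ("B", ("a", "b", 3)),
    ("C", ("b", "c", 2)),
    ("C", ("b", "d", 1)),
}


def example_rule(db):
    """A(x) ∧ B(x,y,m) ∧ C(y,z,n) ∧ (m + 1 ≤ 2·n) → D(y,z,m + n)."""
    def rel(p):
        return [args for q, args in db if q == p]

    derived = set()
    for (x,), (x2, y, m), (y2, z, n) in product(rel("A"), rel("B"), rel("C")):
        # join on the shared variables x and y, then test the comparison atom
        if x == x2 and y == y2 and m + 1 <= 2 * n:
            derived.add(("D", (y, z, m + n)))
    return derived


def least_fixpoint(facts, rules, max_rounds=100):
    """Apply all rules repeatedly until no new fact is derived."""
    db = set(facts)
    for _ in range(max_rounds):
        new = set().union(*(rule(db) for rule in rules)) - db
        if not new:
            return db
        db |= new
    raise RuntimeError("no fixpoint reached within max_rounds")


model = least_fixpoint(FACTS, [example_rule])
print(sorted(f for f in model if f[0] == "D"))  # [('D', ('b', 'c', 5))]
```

Only C(b,c,2) passes the comparison (3 + 1 ≤ 2·2); C(b,d,1) fails it (4 > 2).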

SLIDES 19–20

Limit Predicates

  • keep only the minimal/maximal numeric value
  • restrict interpretations to satisfy

A(x,m) ∧ (m ≤ n) → A(x,n)   for A a min predicate
B(x,m) ∧ (n ≤ m) → B(x,n)   for B a max predicate

  • limit datalogℤ: all numeric predicates in rule heads are limit predicates

SLIDES 21–23

Example

cheapest route from London to Reykjavík?

flight(x,y,c) → route(x,y,c)
route(x,z,c1) ∧ flight(z,y,c2) → route(x,y,c1 + c2)

route is a min predicate

flight
  London    Hamburg     100
  Hamburg   Reykjavík   150
  London    Reykjavík   300

route
  London    Reykjavík   250
  London    Reykjavík   300
  …         …           …
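
A minimal Python sketch of evaluating the two route rules while storing, for each pair (x, y), only the minimal cost, as the min-predicate semantics permits. This is a naive relaxation loop in the style of Bellman-Ford, chosen here for illustration; it assumes no negative-cost cycles, since otherwise the minimum is unbounded and the loop would not terminate.

```python
import math

# flight facts from the slide: (from, to, cost)
FLIGHTS = [
    ("London", "Hamburg", 100),
    ("Hamburg", "Reykjavík", 150),
    ("London", "Reykjavík", 300),
]


def cheapest_routes(flights):
    """Store the minimal c with route(x, y, c) for each pair (x, y); since route
    is a min predicate, route(x, y, k) then holds for every k >= that value."""
    route = {}
    # flight(x,y,c) → route(x,y,c)
    for x, y, c in flights:
        route[(x, y)] = min(c, route.get((x, y), math.inf))
    changed = True
    while changed:
        changed = False
        # route(x,z,c1) ∧ flight(z,y,c2) → route(x,y,c1 + c2)
        for (x, z), c1 in list(route.items()):
            for z2, y, c2 in flights:
                if z2 == z and c1 + c2 < route.get((x, y), math.inf):
                    route[(x, y)] = c1 + c2
                    changed = True
    return route


routes = cheapest_routes(FLIGHTS)
print(routes[("London", "Reykjavík")])  # 250
```

The stored value 250 stands for the whole infinite set route(London, Reykjavík, k) with k ≥ 250, which includes the direct-flight fact with cost 300.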

SLIDES 24–25

Pseudo-Interpretations

  • Herbrand interpretations J
  • for each min/max predicate A and tuple of constants a, store only the minimal/maximal k ∈ ℤ such that J ⊧ A(a,k)
  • each limit datalogℤ program P has a pseudo-model J with |J| ≤ |P|
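
A minimal Python sketch of a pseudo-interpretation as a finite dictionary from (predicate, constants) to the extreme numeric value, with entailment of A(a, k) read off by comparison. The class `PseudoInterpretation` and its methods are hypothetical names for illustration, and the sketch does not represent predicates whose value is unbounded.

```python
class PseudoInterpretation:
    """Finite store for a Herbrand interpretation in which every limit
    predicate keeps only its extreme numeric value (hypothetical helper)."""

    def __init__(self, kinds):
        self.kinds = kinds   # predicate name -> "min" or "max"
        self.value = {}      # (predicate, constants) -> minimal/maximal k

    def add(self, pred, consts, k):
        """Record pred(consts, k), keeping only the extreme value."""
        key = (pred, consts)
        if key not in self.value:
            self.value[key] = k
        elif self.kinds[pred] == "min":
            self.value[key] = min(self.value[key], k)
        else:
            self.value[key] = max(self.value[key], k)

    def entails(self, pred, consts, k):
        """J ⊧ pred(consts, k): k at or above the stored minimum (min predicate)
        or at or below the stored maximum (max predicate)."""
        key = (pred, consts)
        if key not in self.value:
            return False
        if self.kinds[pred] == "min":
            return k >= self.value[key]
        return k <= self.value[key]


J = PseudoInterpretation({"route": "min"})
J.add("route", ("London", "Reykjavík"), 300)
J.add("route", ("London", "Reykjavík"), 250)
print(len(J.value), J.entails("route", ("London", "Reykjavík"), 400))  # 1 True
```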

SLIDES 26–29

Limit Linearity

  • limit datalogℤ is undecidable: consider the program P

→ A(0)
A(x1) ∧ … ∧ A(xn) ∧ (p(x1,…,xn) = 0) → B

    with A a min predicate; then P ⊧ B iff p(x1,…,xn) = 0 has a non-negative integer solution

  • limit linearity: disallow multiplication between limit variables

A(x) ∧ B(y) → C(x·y)   not limit linear if A and B are both limit predicates
A(x) ∧ B(y) → C(x·y)   limit linear if at most one of A and B is a limit predicate
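
A simplified Python sketch of the syntactic check "no multiplication between limit variables". It assumes, for illustration only, that limit variables are the numeric variables of limit atoms in the rule body and that each numeric term is given as a polynomial, a list of (coefficient, [variables]) monomials; this representation and the function name `is_limit_linear` are hypothetical, and the precise definition may impose further conditions.

```python
def is_limit_linear(body, terms, limit_preds):
    """Reject any monomial that multiplies two limit variables.

    body:        (predicate, numeric variable or None) pairs of the rule body
    terms:       numeric terms of the head and comparison atoms, each a
                 list of (coefficient, [variables]) monomials
    limit_preds: names of the min/max predicates
    """
    limit_vars = {v for pred, v in body if pred in limit_preds and v is not None}
    for term in terms:
        for _coeff, variables in term:
            # x·x counts as two limit factors, just like x·y
            if sum(1 for v in variables if v in limit_vars) > 1:
                return False
    return True


# A(x) ∧ B(y) → C(x·y)
body = [("A", "x"), ("B", "y")]
head_term = [[(1, ["x", "y"])]]
print(is_limit_linear(body, head_term, {"A", "B"}))  # False
print(is_limit_linear(body, head_term, {"A"}))       # True
```

The same check rejects the undecidability construction whenever p multiplies two of the xi, e.g. p(x1,x2) = x1·x2 − 6.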

SLIDE 30

Limit-Linear Datalogℤ

  • fact entailment is coNEXPTIME-complete, and coNP-complete in data complexity
  • upper bounds (data complexity)
      • fact entailment reducible to Presburger validity

A(x) → B(x + 1)   ↝   ∀x. def_A ∧ (x ≤ val_A) → def_B ∧ (x + 1 ≤ val_B)

      • magnitude of integers in countermodels exponentially bounded using Chistikov & Haase 2016
      • NP guess-and-check procedure for non-entailment
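
To illustrate why such encodings stay within Presburger arithmetic, the quantifier over x can be eliminated for the slide's example. This assumes our reading of the notation: A and B are max predicates, def_A is a Boolean flag stating that A holds for some value, and val_A is the integer holding its maximal value.

```latex
\begin{align*}
  &\forall x.\; \big(\mathit{def}_A \wedge x \le \mathit{val}_A\big)
     \rightarrow \big(\mathit{def}_B \wedge x + 1 \le \mathit{val}_B\big) \\
  \iff\; &\mathit{def}_A \rightarrow
     \big(\mathit{def}_B \wedge \mathit{val}_A + 1 \le \mathit{val}_B\big)
\end{align*}
```

If def_A is false, both sides hold trivially. Otherwise, instantiating x = val_A yields the right-hand side, and conversely x ≤ val_A gives x + 1 ≤ val_A + 1 ≤ val_B.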

SLIDES 31–36

Square Tiling
  input: a finite set T of tiles, a horizontal compatibility relation H ⊆ T × T,
         a vertical compatibility relation V ⊆ T × T, and a number N
  problem: is there a function N × N → T satisfying H and V (a tiling)?

Limit-Linear Datalogℤ

  • lower bounds: reduction from square tiling
  • interpret each N²·⌈log₂|T|⌉-bit number n as a candidate tiling; initialise n with 0
  • if n is not a tiling, increase n
  • if n > 2^(N²·⌈log₂|T|⌉) − 1, return ‘noSolution’
  • P ⊧ noSolution iff no tiling exists
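
A minimal Python sketch of the exhaustive search that the reduction encodes with arithmetic: enumerate every N²·⌈log₂|T|⌉-bit number n, decode it into an N × N grid, and stop at the first tiling. The row-major bit layout, the convention that H constrains (left, right) pairs and V constrains (upper, lower) pairs, and the guard that uses at least one bit per cell are assumptions of this sketch; the loop is exponential by design.

```python
def decode(n, N, b):
    """Split n into N*N cells of b bits; cell (i, j) sits at bit offset (i*N + j)*b."""
    mask = (1 << b) - 1
    return [[(n >> ((i * N + j) * b)) & mask for j in range(N)] for i in range(N)]


def is_tiling(grid, num_tiles, H, V):
    """Check that every cell names a tile and all neighbours are compatible."""
    N = len(grid)
    for i in range(N):
        for j in range(N):
            t = grid[i][j]
            if t >= num_tiles:                      # bit pattern names no tile
                return False
            if j + 1 < N and (t, grid[i][j + 1]) not in H:
                return False
            if i + 1 < N and (t, grid[i + 1][j]) not in V:
                return False
    return True


def find_tiling(num_tiles, H, V, N):
    b = max(1, (num_tiles - 1).bit_length())        # ⌈log₂|T|⌉, at least 1 bit
    n = 0                                           # initialise n with 0
    while n <= 2 ** (N * N * b) - 1:
        grid = decode(n, N, b)
        if is_tiling(grid, num_tiles, H, V):
            return grid
        n += 1                                      # n is not a tiling: increase n
    return None                                     # corresponds to noSolution


# two tiles that must alternate horizontally and vertically (a checkerboard)
print(find_tiling(2, {(0, 1), (1, 0)}, {(0, 1), (1, 0)}, 2))  # [[0, 1], [1, 0]]
```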

SLIDES 37–41

Accessing Limit Values

  • want to check whether the max. value of A < the max. value of B
  • A(x) ∧ B(y) ∧ (x < y) → A_lessthan_B
  • does the rule apply to I = { A(1), B(0) }? yes!

    as A and B are max predicates, I also contains A(0), A(−1), … and B(−1), B(−2), …,
    so, e.g., A(−1) and B(0) satisfy x < y

  • want to restrict rule application to min./max. values
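
A minimal Python sketch showing why the plain rule fires: it truncates the infinite, downward-closed interpretation to the `depth` largest values of each max predicate (the window size is an assumption of the sketch) and lists the groundings that satisfy the rule body.

```python
def facts_in_window(max_value, depth):
    """Values k with pred(k) in the downward-closed interpretation of each
    max predicate, truncated to the `depth` largest ones."""
    return {pred: range(top - depth + 1, top + 1) for pred, top in max_value.items()}


def plain_rule_witnesses(max_value, depth=3):
    """Pairs (x, y) with A(x), B(y) and x < y, i.e. groundings on which
    A(x) ∧ B(y) ∧ (x < y) → A_lessthan_B fires."""
    window = facts_in_window(max_value, depth)
    return [(x, y) for x in window["A"] for y in window["B"] if x < y]


I = {"A": 1, "B": 0}   # I = { A(1), B(0) } with A and B max predicates
print(plain_rule_witnesses(I))  # [(-1, 0)]
```

Although the maximal value of A (1) is not below that of B (0), the lower facts A(−1) and B(0) still trigger the rule.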

SLIDES 42–44

LUB Operator

  • ⌈A(a,k)⌉ is satisfied in I iff k ∈ ℤ is maximal such that A(a,k) ∈ I
  • ⌈A(x)⌉ ∧ ⌈B(y)⌉ ∧ (x < y) → A_lessthan_B does not apply to
    I = { A(1), B(0) } (which also contains A(0), A(−1), … and B(−1), B(−2), …),
    since the maximal values satisfy 1 ≥ 0
  • can simulate negation as failure: with B a max predicate, the rules

→ B(0)
A → B(1)
⌈B(0)⌉ → C

    simulate  not A → C
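
A minimal Python sketch of both uses of ⌈⌉ on this slide, working directly on the maximal values of max predicates. The function names are hypothetical; evaluating ⌈B(0)⌉ only after B's maximum is final mirrors the stratified evaluation that the next slides require.

```python
def a_lessthan_b_lub(max_value):
    """⌈A(x)⌉ ∧ ⌈B(y)⌉ ∧ (x < y) → A_lessthan_B: only the maximal
    values can be bound to x and y."""
    if "A" not in max_value or "B" not in max_value:
        return False
    return max_value["A"] < max_value["B"]


def naf_program(a_holds):
    """→ B(0);  A → B(1);  ⌈B(0)⌉ → C   with B a max predicate.
    Returns whether C is derived."""
    max_b = 0                      # → B(0)
    if a_holds:                    # A → B(1) raises the maximum of B to 1
        max_b = max(max_b, 1)
    return max_b == 0              # ⌈B(0)⌉ holds iff 0 is the maximum of B


print(a_lessthan_b_lub({"A": 1, "B": 0}))      # False
print(naf_program(False), naf_program(True))   # True False
```

C is derived exactly when A is not, which is the behaviour of not A → C.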

SLIDES 45–47

Stratified-Linear Datalogℤ

  • stratify the application of ⌈⌉
  • recall: A(x) ∧ B(y) → C(x·y) is not limit linear if A and B are both max predicates
  • A(x) ∧ ⌈B(y)⌉ → C(x·y) is stratified-linear if B <ₛ C (B lies in a strictly lower stratum than C)
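
A minimal Python sketch of computing strata, treating body atoms wrapped in ⌈⌉ the way stratified negation treats negated atoms: they force the head into a strictly higher stratum. The edge representation is a hypothetical encoding for illustration. Once B sits in a lower stratum, its maximal value is fixed before C is computed, so ⌈B(y)⌉ binds y to a constant and x·y is effectively linear in x.

```python
def stratify(preds, edges):
    """Assign strata so that stratum[head] >= stratum[body] for every edge,
    with strict inequality when the body atom is wrapped in ⌈⌉.

    edges: (body_pred, head_pred, strict) triples, strict=True for ⌈⌉ atoms.
    Returns a dict of strata, or None if some cycle passes through a ⌈⌉ edge.
    """
    stratum = {p: 0 for p in preds}
    # Without a strict cycle, values settle after at most len(preds) - 1
    # rounds, so one extra round with no change confirms the result.
    for _ in range(len(preds) + 1):
        changed = False
        for body, head, strict in edges:
            need = stratum[body] + (1 if strict else 0)
            if stratum[head] < need:
                stratum[head] = need
                changed = True
        if not changed:
            return stratum
    return None


# A(x) ∧ ⌈B(y)⌉ → C(x·y)
s = stratify({"A", "B", "C"}, [("A", "C", False), ("B", "C", True)])
print(s["B"] < s["C"])  # True: B <ₛ C
```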

SLIDES 48–49

Complexity

fact entailment for stratified-linear datalogℤ is Δ₂^EXP-complete, and Δ₂^P-complete in data complexity

  • upper bounds: stratified-linear programs are deterministic compositions of limit-linear programs
  • lower bounds:
      • show that stratified-linear datalogℤ captures Δ₂^P by simulating a DTM with an NP oracle
      • Gottlob et al. 1999: if a logic L captures a complexity class C in the polynomial hierarchy, then L is Exp(C)-hard in combined complexity

SLIDES 50–52

Metafinite Model Theory

  • stratified-linear datalogℤ captures Δ₂^P on ordered datasets containing no numeric facts
  • what about datasets with numeric facts, i.e., metafinite structures?
      • does stratified-linear datalogℤ capture Δ₂^P?
      • does limit-linear datalogℤ capture coNP?
      • what about datalogℝ?

SLIDE 53

Alternative Semantics

  • interpret limit atoms by their maximal value rather than a half-open interval, following Ross & Sagiv 1997
    (≈ put ⌈⌉ around every limit atom)
  • decidability without limit-linearity
  • data complexity of monotone datalogℤ in P^PosSLP = BP(P_ℝ), as defined in Allender et al. 2009

SLIDE 54

Thank you!