Challenges for Efficient Query Evaluation on Structured Probabilistic Data



slide-1
SLIDE 1

Challenges for Efficient Query Evaluation on Structured Probabilistic Data

SUM2016 SEPTEMBER 23, 2016, NICE

Antoine Amarilli, Silviu Maniu, Mikaël Monet

slide-4
SLIDE 4

A probabilistic database

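Relations R, S and Q, first without and then with a probability attached to each fact:

R          S          Q
a  d       d  e       c  e  f
f  e       f  c
d  a       a  e
b  e       c  e

R               S                Q
a  d  0.2       d  e  0.005      c  e  f  0.66
f  e  0.7       f  c  0.9
d  a  0.13      a  e  0.7
b  e  0.81      c  e  0.23

TID (tuple-independent database) model: each fact is present independently of the others, with the probability shown.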

slide-7
SLIDE 7

Probability of a possible world

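Probability Pr(I) of this possible world = 0.7 * 0.13 * 0.23 * 0.66 * (1-0.2) * (1-0.81) * (1-0.005) * (1-0.9) * (1-0.7).

The first four factors are the facts kept in I; the remaining five are the facts left out of I.

Probabilistic database:

R               S                Q
a  d  0.2       d  e  0.005      c  e  f  0.66
f  e  0.7       f  c  0.9
d  a  0.13      a  e  0.7
b  e  0.81      c  e  0.23

A possible world I:

R          S          Q
f  e       c  e       c  e  f
d  a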
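The product above can be checked mechanically. The following is a minimal sketch, assuming facts are stored as tuples in a Python dictionary (the layout and the names `tid`, `world_probability` and `world_I` are illustrative choices, not something taken from the slides); the probabilities themselves are copied from the tables.

```python
from math import prod

# Fact probabilities copied from the slide; the dictionary layout is an assumption.
tid = {
    ("R", "a", "d"): 0.2,
    ("R", "f", "e"): 0.7,
    ("R", "d", "a"): 0.13,
    ("R", "b", "e"): 0.81,
    ("S", "d", "e"): 0.005,
    ("S", "f", "c"): 0.9,
    ("S", "a", "e"): 0.7,
    ("S", "c", "e"): 0.23,
    ("Q", "c", "e", "f"): 0.66,
}

def world_probability(tid, world):
    """Probability of `world` (a set of facts) under the TID model.

    Each fact contributes p if it is kept and 1 - p if it is dropped.
    """
    return prod(p if fact in world else 1.0 - p for fact, p in tid.items())

# The possible world I shown on the slide.
world_I = {("R", "f", "e"), ("R", "d", "a"), ("S", "c", "e"), ("Q", "c", "e", "f")}
pr_I = world_probability(tid, world_I)
```

With these numbers `pr_I` comes out at roughly 6.27e-5, which illustrates how small the probability of any single world is, even on a nine-fact instance.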

slide-11
SLIDE 11

Probabilistic query evaluation (PQE)

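• Focus on Boolean queries (yes/no)
• Probability of a query Q on probabilistic instance 𝖀:

\[ P(Q) = \sum_{J \subseteq \mathfrak{U},\; J \models Q} \Pr(J) \]

that is, the total probability of the possible worlds J of 𝖀 that satisfy Q.

• Problem: computing P(Q) is #P-hard in general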
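The definition of P(Q) translates directly into a brute-force procedure. The sketch below assumes a query is given as a Python predicate on a set of facts; the names `pqe_brute_force`, `toy`, `both` and `either` are hypothetical, and the two-fact instance is made up for illustration. It enumerates all 2^n possible worlds, which is exactly why it does not scale and why the hardness result matters.

```python
from itertools import product

def pqe_brute_force(tid, query):
    """Sum Pr(J) over all possible worlds J of `tid` that satisfy `query`.

    `tid` maps each fact to its probability; `query` is a Boolean function
    on a set of facts. Enumerates 2^n worlds, so only usable on tiny inputs.
    """
    facts = list(tid)
    total = 0.0
    for keep in product([False, True], repeat=len(facts)):
        world = {f for f, k in zip(facts, keep) if k}
        pr = 1.0
        for f, k in zip(facts, keep):
            pr *= tid[f] if k else 1.0 - tid[f]
        if query(world):
            total += pr
    return total

# Hypothetical two-fact instance and queries, for illustration only.
toy = {("R", "a"): 0.5, ("S", "a"): 0.5}
both = lambda w: ("R", "a") in w and ("S", "a") in w
either = lambda w: ("R", "a") in w or ("S", "a") in w

p_both = pqe_brute_force(toy, both)      # both facts present
p_either = pqe_brute_force(toy, either)  # at least one fact present
```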

slide-15
SLIDE 15

1) Approximate probability computation

• Monte-Carlo sampling
• Drawback: the running time is quadratic in the desired precision (on the order of 1/ε² samples for an additive error ε).

⇒ Not adequate for low probabilities.
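A minimal sketch of the sampling approach, assuming the same fact-to-probability dictionary and predicate-style query as a representation (both are illustrative choices, and `pqe_monte_carlo` is a hypothetical name): each sample draws one possible world by flipping an independent coin per fact, and the estimate is the fraction of sampled worlds that satisfy the query.

```python
import random

def pqe_monte_carlo(tid, query, samples, seed=0):
    """Estimate P(query) by sampling possible worlds of a TID instance."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    hits = 0
    for _ in range(samples):
        # Keep each fact independently with its own probability.
        world = {f for f, p in tid.items() if rng.random() < p}
        if query(world):
            hits += 1
    return hits / samples

# Hypothetical two-fact instance; the exact answer for `either` is 0.75.
toy = {("R", "a"): 0.5, ("S", "a"): 0.5}
either = lambda w: ("R", "a") in w or ("S", "a") in w
estimate = pqe_monte_carlo(toy, either, samples=20000)
```

The standard deviation of the estimate shrinks like 1/sqrt(samples), so halving the error costs four times as many samples. When the true probability is tiny, almost every sample misses the query, which is the problem the slide points at.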

slide-20
SLIDE 20

2) Restricting the class of queries

[Dalvi and Suciu 2012] show the following dichotomy for any UCQ Q :

• Either PQE for Q is #P-hard over the class of all instances,
• or PQE for Q is in PTIME on all instances.
• Even the simple conjunctive query ∃x,y R(x), T(x,y), S(y) is already #P-hard!
• The criterion is too crisp.

slide-21
SLIDE 21

3) Restricting the shape of the instances

• Bound the treewidth of instances by a constant.
• Treewidth: a measure of how far a graph is from being a tree.
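To make the notion concrete: the treewidth of a graph is the smallest width of any tree decomposition of it, where the width of a decomposition is its largest bag size minus one. The sketch below only checks a given decomposition and reports its width; it does not search for an optimal one. The function names and the input encoding (bags as a dictionary, tree edges as pairs) are assumptions made for illustration, and the code assumes `tree_edges` really forms a tree.

```python
def is_tree_decomposition(edges, bags, tree_edges):
    """Check the three tree-decomposition conditions.

    edges:      graph edges as pairs of vertices
    bags:       dict mapping a tree node to its set of graph vertices
    tree_edges: edges between tree nodes (assumed to form a tree)
    """
    vertices = {v for e in edges for v in e}
    # 1. Every vertex appears in some bag.
    if not all(any(v in bag for bag in bags.values()) for v in vertices):
        return False
    # 2. Every edge is contained in some bag.
    if not all(any(u in bag and v in bag for bag in bags.values())
               for u, v in edges):
        return False
    # 3. For each vertex, the tree nodes whose bags contain it are connected.
    adj = {n: set() for n in bags}
    for a, b in tree_edges:
        adj[a].add(b)
        adj[b].add(a)
    for v in vertices:
        nodes = {n for n, bag in bags.items() if v in bag}
        start = next(iter(nodes))
        seen, stack = {start}, [start]
        while stack:
            n = stack.pop()
            for m in adj[n] & nodes:
                if m not in seen:
                    seen.add(m)
                    stack.append(m)
        if seen != nodes:
            return False
    return True

def width(bags):
    """Width of a decomposition: largest bag size minus one."""
    return max(len(bag) for bag in bags.values()) - 1

# A path a-b-c has a decomposition of width 1 (it is a tree).
path_edges = [("a", "b"), ("b", "c")]
path_bags = {0: {"a", "b"}, 1: {"b", "c"}}
# A 4-cycle a-b-c-d-a has a decomposition of width 2.
cycle_edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
cycle_bags = {0: {"a", "b", "c"}, 1: {"a", "c", "d"}}
```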

slide-29
SLIDE 29

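[Figure: evaluation pipeline for treelike instances]

• Query q (MSO), int k → Automaton A, in O(f(q,k))
• Instance I of treewidth k → Tree decomposition T, in O(EXP(k).|I|)
• Automaton A + Tree decomposition T → Provenance circuit C (Bool[X]), in O(|A|.|T|)
• Provenance circuit C → Probability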

slide-30
SLIDE 30

Provenance circuits

• Some query q, fixed
• Some instance I with facts {f1, f2, f3, f4, f5, f6}
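The circuit drawn on this slide did not survive extraction, so the following is only a sketch of the idea with a made-up circuit: a provenance circuit is a Boolean circuit whose inputs are the facts of I and whose output is true exactly on the sub-instances that satisfy q. The gate encoding, the circuit itself and the 0.5 probabilities are all hypothetical.

```python
from itertools import product

# Hypothetical circuit, NOT the one on the slide. Gates are nested tuples:
# ("var", name), ("and", g1, g2, ...), ("or", g1, g2, ...), ("not", g).
circuit = ("or",
           ("and", ("var", "f1"), ("var", "f2")),
           ("and", ("var", "f3"), ("var", "f4")))

def evaluate(gate, world):
    """Evaluate the circuit when exactly the facts in `world` are present."""
    kind = gate[0]
    if kind == "var":
        return gate[1] in world
    if kind == "and":
        return all(evaluate(g, world) for g in gate[1:])
    if kind == "or":
        return any(evaluate(g, world) for g in gate[1:])
    if kind == "not":
        return not evaluate(gate[1], world)
    raise ValueError(f"unknown gate kind: {kind}")

def circuit_probability(gate, probs):
    """Probability that the circuit is true, by enumerating all valuations."""
    names = list(probs)
    total = 0.0
    for bits in product([False, True], repeat=len(names)):
        world = {n for n, b in zip(names, bits) if b}
        if evaluate(gate, world):
            pr = 1.0
            for n, b in zip(names, bits):
                pr *= probs[n] if b else 1.0 - probs[n]
            total += pr
    return total

# Six facts as on the slide, each with an assumed probability of 0.5.
probs = {f"f{i}": 0.5 for i in range(1, 7)}
p = circuit_probability(circuit, probs)
```

The enumeration here is exponential in the number of facts and is used only to keep the sketch short; it is not the probability computation step of the pipeline.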

slide-37
SLIDE 37

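[Figure: the same evaluation pipeline]

• Query q (MSO), int k → Automaton A, in O(f(q,k))
• Instance I of treewidth k → Tree decomposition T, in O(EXP(k).|I|)
• Automaton A + Tree decomposition T → Provenance circuit C, in O(|A|.|T|)
• Provenance circuit C → Probability

Problems:

• Non-elementary complexity in general
• Are real datasets treelike?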

slide-41
SLIDE 41

Current work

• From bottom-up tree automata to alternating two-way automata
• Introduce Intensionally-Clique-Guarded Datalog (ICG-Datalog), parameterized by body size
• Provenance as a cyclic circuit (cycluit)!

slide-46
SLIDE 46

Provenance cycluits

• Some ICG-Datalog program
• Some instance I with facts {f1, f2, f3, f4}
• + negations (stratified)
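The slides do not spell out how a cyclic circuit is evaluated. One natural reading, assumed here, is least-fixpoint semantics for monotone gates: every gate starts at false and is switched to true only when its inputs force it. The sketch below follows that assumption, uses a made-up four-gate cycluit, and does not handle the stratified negation mentioned above; all names are hypothetical.

```python
def evaluate_cycluit(gates, true_facts):
    """Least-fixpoint evaluation of a monotone cyclic circuit.

    gates: dict mapping a gate name to ("var", fact), ("and", [names])
           or ("or", [names]); gates may refer to each other cyclically.
    true_facts: the set of facts that are present.
    """
    value = {g: False for g in gates}
    changed = True
    while changed:
        changed = False
        for name, gate in gates.items():
            if value[name]:
                continue  # monotone: a true gate never goes back to false
            kind = gate[0]
            if kind == "var":
                new = gate[1] in true_facts
            elif kind == "and":
                new = all(value[g] for g in gate[1])
            elif kind == "or":
                new = any(value[g] for g in gate[1])
            else:
                raise ValueError(f"unknown gate kind: {kind}")
            if new:
                value[name] = True
                changed = True
    return value

# Hypothetical cycluit: gates "a" and "b" depend on each other.
gates = {
    "x1": ("var", "f1"),
    "x2": ("var", "f2"),
    "a": ("or", ["x1", "b"]),
    "b": ("and", ["a", "x2"]),
}
with_f1_f2 = evaluate_cycluit(gates, {"f1", "f2"})
only_f2 = evaluate_cycluit(gates, {"f2"})
```

With only f2 present, the cycle between a and b cannot justify itself, so both stay false under the least fixpoint; this is what distinguishes the assumed semantics from simply looking for any consistent assignment.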

slide-49
SLIDE 49

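[Figure: evaluation pipeline with ICG-Datalog and cycluits]

• ICG program P of body size c, int k → 2-way Automaton A, in O(f(c,k)|P|)
• Instance I of treewidth k → Tree decomposition T, in O(EXP(k).|I|)
• 2-way Automaton A + Tree decomposition T → Provenance cycluit C, in O(|A|.|T|)
• Provenance cycluit C → Probability (2EXPTIME upper bound)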

slide-50
SLIDE 50

Bad news…

• We proved that PQE for path queries on tree instances (treewidth = 1) is already #P-hard (reduction from #MONOTONE-2-SAT).
• Still, we obtain a 2EXPTIME combined complexity upper bound.

slide-54
SLIDE 54

Treelike datasets

• Transportation networks
• Partial decompositions
• Query-specific decompositions

slide-55
SLIDE 55

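[Figure: the ICG-Datalog pipeline, up to the provenance cycluit]

• ICG program P of body size c, int k → 2-way Automaton A, in O(f(c,k)|P|)
• Instance I of treewidth k → Tree decomposition T, in O(EXP(k).|I|)
• 2-way Automaton A + Tree decomposition T → Provenance cycluit C, in O(|A|.|T|)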