Probabilistic Query Evaluation on Bounded-Treewidth Instances



slide-1
SLIDE 1

Probabilistic Query Evaluation on Bounded-Treewidth Instances

SIGMOD/PODS PH.D. SYMPOSIUM JUNE 26, 2016, SAN FRANCISCO

Mikaël Monet Supervised by Pierre Senellart

slide-4
SLIDE 4

Context

• Boolean queries (yes/no) on relational instances
• We want the answer to contain more information than just "yes/no":
    • Add uncertainty
    • Obtain provenance information
• We need restrictions for all of this to be tractable

slide-7
SLIDE 7

A probabilistic database

TID model (tuple-independent database): each fact carries its own probability.

R           Pr
a  d        0.2
f  e        0.7
d  a        0.13
b  e        0.81

S           Pr
d  e        0.005
f  c        0.9
a  e        0.7
c  e        0.23

Q           Pr
c  e  f     0.66

slide-10
SLIDE 10

Probability of a possible world

A possible world I of the probabilistic database shown earlier:

R: (f, e), (d, a)
S: (c, e)
Q: (c, e, f)

Probability Pr(I) of this possible world
= 0.7 * 0.13 * 0.23 * 0.66 * (1-0.2) * (1-0.81) * (1-0.005) * (1-0.9) * (1-0.7)
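The product above can be checked mechanically. The following is a minimal sketch, assuming facts are encoded as Python tuples; the names `tid`, `world_probability` and `world_I` are illustrative and do not come from the slides.

```python
# TID instance from the slides: each fact maps to its marginal probability.
tid = {
    ("R", "a", "d"): 0.2,
    ("R", "f", "e"): 0.7,
    ("R", "d", "a"): 0.13,
    ("R", "b", "e"): 0.81,
    ("S", "d", "e"): 0.005,
    ("S", "f", "c"): 0.9,
    ("S", "a", "e"): 0.7,
    ("S", "c", "e"): 0.23,
    ("Q", "c", "e", "f"): 0.66,
}


def world_probability(tid, world):
    """Probability of one possible world under tuple independence."""
    prob = 1.0
    for fact, p in tid.items():
        # Facts kept in the world contribute p, dropped facts contribute 1 - p.
        prob *= p if fact in world else (1.0 - p)
    return prob


# The possible world I from the slide.
world_I = {("R", "f", "e"), ("R", "d", "a"), ("S", "c", "e"), ("Q", "c", "e", "f")}
pr_I = world_probability(tid, world_I)
print(pr_I)
```

With these numbers the result is roughly 6.3e-5, which illustrates how small the probability of a single world already is on a nine-fact instance.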

slide-13
SLIDE 13

Probabilistic query evaluation (PQE)

• Focus on Boolean queries (yes/no)
• Probability of a query Q on probabilistic instance 𝖀:

  P(Q) = Σ_{J ⊆ 𝖀, J ⊨ Q} Pr(J)

• Problem: in general #P-hard
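To make the sum over possible worlds concrete, here is a brute-force sketch that enumerates every subset of facts. It is exponential in the number of facts, which is exactly why it is only usable on toy inputs. The instance `toy`, the query `rst_query` and the function name are hypothetical examples, not taken from the talk.

```python
from itertools import product


def pqe_brute_force(tid, query):
    """Sum Pr(J) over all possible worlds J that satisfy the Boolean query."""
    facts = list(tid)
    total = 0.0
    for kept in product([False, True], repeat=len(facts)):
        world = {f for f, keep in zip(facts, kept) if keep}
        if query(world):
            pr = 1.0
            for f, keep in zip(facts, kept):
                pr *= tid[f] if keep else 1.0 - tid[f]
            total += pr
    return total


# Hypothetical toy instance with three independent facts.
toy = {("R", "a"): 0.5, ("S", "a", "b"): 0.5, ("T", "b"): 0.5}


def rst_query(world):
    """Boolean query: exists x, y such that R(x), S(x, y) and T(y) all hold."""
    xs = {f[1] for f in world if f[0] == "R"}
    ys = {f[1] for f in world if f[0] == "T"}
    return any(f[0] == "S" and f[1] in xs and f[2] in ys for f in world)


p = pqe_brute_force(toy, rst_query)
print(p)
```

On this toy instance the query holds only when all three facts are present, so the sum has a single term.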

slide-14
SLIDE 14

3 possible directions

• Approximate
• Restrict queries
• Restrict instances

slide-18
SLIDE 18

1) Approximate probability computation

• Monte-Carlo sampling
• Drawback: running time quadratic in the desired precision

⇒ Not adequate for low probabilities.
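A minimal sketch of the sampling approach, assuming a TID instance given as a dictionary from facts to probabilities. The function name, the toy instance and the fixed seed are illustrative choices. The quadratic cost shows up in `n_samples`: by the usual concentration bounds, an additive error of ε needs on the order of 1/ε² samples.

```python
import random


def monte_carlo_pqe(tid, query, n_samples, seed=0):
    """Estimate the query probability by sampling possible worlds."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        # Keep each fact independently with its own probability.
        world = {fact for fact, p in tid.items() if rng.random() < p}
        if query(world):
            hits += 1
    return hits / n_samples


# Hypothetical toy instance: two independent R-facts.
toy = {("R", "a"): 0.5, ("R", "b"): 0.5}


def some_r_fact(world):
    """Boolean query: at least one R-fact is present (exact probability 0.75)."""
    return any(fact[0] == "R" for fact in world)


estimate = monte_carlo_pqe(toy, some_r_fact, n_samples=20000)
print(estimate)
```

When the true probability is very small, most samples never satisfy the query, so far more samples are needed before the estimate says anything useful.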

slide-23
SLIDE 23

2) Restricting the class of queries

[Dalvi and Suciu 2012] shows the following dichotomy for any UCQ Q:

• Either PQE for Q is in PTIME on all instances
• Or PQE for Q is #P-hard (over arbitrary instances)
• Even the simple conjunctive query ∃x,y R(x), S(x,y), T(y) is already #P-hard!
• The criterion is too crisp

slide-24
SLIDE 24

3) Restricting the shape of the instances

• Bound the treewidth of instances by a constant
• Treewidth: measure used to tell how far a graph is from being a tree

slide-30
SLIDE 30

Treewidth

[Figure: the example instance with relations R, S and Q]

Divide and conquer!
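As an illustration of what a tree decomposition has to satisfy, the sketch below checks a candidate decomposition of the example instance and reports its width. The bags and tree edges are a hand-made assumption, not the decomposition drawn on the slide, and the function names are hypothetical.

```python
def _connected(nodes, adj):
    """True if `nodes` induce a connected subgraph of the tree given by `adj`."""
    nodes = set(nodes)
    if not nodes:
        return True
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v in nodes and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == nodes


def is_tree_decomposition(facts, bags, tree_edges):
    """Check the two conditions: fact coverage and connectedness per element."""
    adj = {b: set() for b in bags}
    for u, v in tree_edges:
        adj[u].add(v)
        adj[v].add(u)
    # The bag graph must itself be a tree.
    if len(tree_edges) != len(bags) - 1 or not _connected(bags, adj):
        return False
    # Every fact must have all its elements together in some bag.
    for fact in facts:
        if not any(set(fact[1:]) <= bag for bag in bags.values()):
            return False
    # The bags containing a given element must form a connected subtree.
    for x in set().union(*bags.values()):
        if not _connected([b for b, bag in bags.items() if x in bag], adj):
            return False
    return True


facts = [
    ("R", "a", "d"), ("R", "f", "e"), ("R", "d", "a"), ("R", "b", "e"),
    ("S", "d", "e"), ("S", "f", "c"), ("S", "a", "e"), ("S", "c", "e"),
    ("Q", "c", "e", "f"),
]
# Assumed decomposition: three bags arranged on a path 1 - 2 - 3.
bags = {1: {"a", "d", "e"}, 2: {"c", "e", "f"}, 3: {"b", "e"}}
tree_edges = [(1, 2), (2, 3)]

valid = is_tree_decomposition(facts, bags, tree_edges)
width = max(len(bag) for bag in bags.values()) - 1
print(valid, width)
```

The width of a decomposition is the size of its largest bag minus one, so this particular decomposition has width 2. Algorithms on bounded-treewidth instances process one small bag at a time, which is the "divide and conquer" idea.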

slide-40
SLIDE 40

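[Diagram: from query and instance to probability]

Query q, int k  --O(f(q,k))-->  Automaton A
Instance I of treewidth k  --O(EXP(k).|I|)-->  Tree decomposition T
Automaton A + tree decomposition T  --O(|A|.|T|)-->  Provenance circuit C
Provenance circuit C  -->  Probability

Annotations on the diagram: MSO; ? ?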

slide-44
SLIDE 44

Provenance circuit C of query Q on instance I

• Boolean circuit (AND, OR, NOT gates)
• Inputs = the facts of I
• For every ν : I → {true, false}: ν(I) ⊨ Q iff ν(C) = 1
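The defining property can be tested exhaustively on a tiny case. The sketch below assumes a hypothetical three-fact instance and a hand-written circuit encoded as nested tuples (so it is really a formula without gate sharing); none of these names come from the talk.

```python
from itertools import product

# Hypothetical instance with three facts.
facts = [("R", "a", "b"), ("S", "b", "c"), ("S", "b", "d")]


def query(world):
    """Boolean query: exists x, y, z such that R(x, y) and S(y, z)."""
    return any(
        r[0] == "R" and s[0] == "S" and r[2] == s[1]
        for r in world
        for s in world
    )


# Candidate provenance circuit: R(a,b) AND (S(b,c) OR S(b,d)).
circuit = ("AND", ("VAR", facts[0]), ("OR", ("VAR", facts[1]), ("VAR", facts[2])))


def evaluate(gate, valuation):
    """Evaluate a gate under a valuation of the input facts."""
    kind = gate[0]
    if kind == "VAR":
        return valuation[gate[1]]
    if kind == "NOT":
        return not evaluate(gate[1], valuation)
    children = [evaluate(child, valuation) for child in gate[1:]]
    return all(children) if kind == "AND" else any(children)


def is_provenance_circuit(circuit, facts, query):
    """Check that the circuit agrees with the query on every valuation."""
    for bits in product([False, True], repeat=len(facts)):
        valuation = dict(zip(facts, bits))
        world = {f for f in facts if valuation[f]}
        if query(world) != evaluate(circuit, valuation):
            return False
    return True


print(is_provenance_circuit(circuit, facts, query))
```

The check itself is exponential and only serves to illustrate the definition; the point of the talk's construction is to build such a circuit without enumerating valuations.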

slide-45
SLIDE 45

Tree automata

• A bottom-up deterministic tree automaton on {a, b}-trees is a tuple A = (Q, F, 𝛋, 𝛆) where:
    • Q: finite set of states
    • F ⊆ Q: accepting states
    • 𝛋 : {a, b} → Q, determining the state for the leaves
    • 𝛆 : {a, b} × Q² → Q, determining the state for internal nodes
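A small sketch of this definition in code. The automaton below is a made-up example that accepts binary {a, b}-trees containing at least one node labelled a. The dictionaries `leaf_init` and `transition` play the role of the two functions in the tuple, and trees are encoded as `(label,)` for leaves and `(label, left, right)` for internal nodes; all of these encodings are assumptions made for illustration.

```python
def run(tree, leaf_init, transition):
    """Compute the state reached at the root, bottom-up."""
    label = tree[0]
    if len(tree) == 1:
        # Leaf: the state depends only on the label.
        return leaf_init[label]
    _, left, right = tree
    q1 = run(left, leaf_init, transition)
    q2 = run(right, leaf_init, transition)
    # Internal node: the state depends on the label and both child states.
    return transition[(label, q1, q2)]


def accepts(tree, leaf_init, transition, accepting):
    """A tree is accepted when the root state is an accepting state."""
    return run(tree, leaf_init, transition) in accepting


# Hypothetical automaton with two states.
states = {"no_a", "has_a"}
accepting = {"has_a"}
leaf_init = {"a": "has_a", "b": "no_a"}
transition = {
    (lab, q1, q2): ("has_a" if lab == "a" or "has_a" in (q1, q2) else "no_a")
    for lab in "ab"
    for q1 in states
    for q2 in states
}

tree_with_a = ("b", ("b",), ("b", ("a",), ("b",)))
tree_without_a = ("b", ("b",), ("b",))

print(accepts(tree_with_a, leaf_init, transition, accepting))
print(accepts(tree_without_a, leaf_init, transition, accepting))
```

Because the automaton is deterministic, each node gets exactly one state, so a run takes time linear in the size of the tree.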

slide-46
SLIDE 46

Run of an automaton on a tree

• Q: three states (drawn as symbols on the slide)
• F: one accepting state
• 𝛋: assigns a state to each leaf label a and b
• 𝛆: transition table with columns lab, q1, q2, out

[Figure: step-by-step run of A on an example tree]

• Initialization of the leaves
• Internal nodes
• And so on…

This tree is in the language of A

slide-59
SLIDE 59

Major drawbacks

• In general, computing the automaton has non-elementary complexity in the query
• Exponential dependence on the instance treewidth
• Natural question: can we restrict queries to obtain tractable combined complexity of PQE on bounded-treewidth instances?

slide-61
SLIDE 61

Bad news…

• We proved that: PQE for path queries on tree instances (treewidth = 1) is already #P-hard (reduction from #MONOTONE-2-SAT).
• What to do now?

slide-63
SLIDE 63

Lower ambitions

• Restrict queries to obtain tractable combined complexity of probabilistic query evaluation on bounded-treewidth instances?
• We now aim at a tractable combined complexity for deterministic query evaluation: which queries, which automata, which provenance representation?

slide-68
SLIDE 68

Two-way Alternating Tree Automata

• Can navigate the tree in every direction, can launch simultaneous runs
• Intuition: fewer things to remember, more parallelizable

𝛆(a,q) = (q,l) ∨ [ (q,p) ∧ (q',r) ]

slide-74
SLIDE 74

Cycluits

• Boolean circuits with cycles
• Least fixed-point semantics
• Beware of negations!
• Linear time evaluation
• Can be acyclified in quadratic time
• Are they more concise?
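A sketch of least fixed-point evaluation, restricted to monotone cycluits (AND and OR gates only), which sidesteps the negation issue flagged above. It uses a worklist in the style of unit propagation, so each wire is handled once; whether this matches the evaluation algorithm behind the talk is an assumption. The gate encoding and all names are illustrative.

```python
from collections import deque


def eval_monotone_cycluit(gates, true_inputs):
    """Least fixed-point values of the gates of a cyclic AND/OR circuit.

    gates: dict gate name -> (kind, list of input names), kind is "AND" or "OR".
    true_inputs: names of input variables set to true; other inputs are false.
    """
    value = {g: False for g in gates}
    # For AND gates, count how many inputs are still not known to be true.
    remaining = {g: len(ins) for g, (kind, ins) in gates.items() if kind == "AND"}
    users = {}
    for g, (_, ins) in gates.items():
        for i in ins:
            users.setdefault(i, []).append(g)

    queue = deque(set(true_inputs))
    # An AND gate with no inputs is true from the start.
    for g, (kind, ins) in gates.items():
        if kind == "AND" and not ins:
            value[g] = True
            queue.append(g)

    while queue:
        x = queue.popleft()
        for g in users.get(x, []):
            if value[g]:
                continue
            if gates[g][0] == "OR":
                value[g] = True
                queue.append(g)
            else:
                remaining[g] -= 1
                if remaining[g] == 0:
                    value[g] = True
                    queue.append(g)
    return value


# Hypothetical cycluit with a cycle: g1 = x OR g2, g2 = g1 AND y.
gates = {"g1": ("OR", ["x", "g2"]), "g2": ("AND", ["g1", "y"])}

print(eval_monotone_cycluit(gates, {"y"}))
print(eval_monotone_cycluit(gates, {"x", "y"}))
```

With only y true, the cycle between g1 and g2 cannot justify itself, so both gates stay false under least fixed-point semantics; once x is true, both become true.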

slide-76
SLIDE 76

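[Diagram: from query and instance to probability, with a 2-way automaton and a cycluit]

Query q, int k  --O(EXP(k).P(|q|))-->  2-way Automaton A
Instance I of treewidth k  --O(EXP(k).|I|)-->  Tree decomposition T
2-way Automaton A + tree decomposition T  --O(|A|.|T|)-->  Provenance CYCLUIT C
Provenance CYCLUIT C  -->  Probability

Thanks for your attention!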