Dissociation-based Optimization in Probabilistic Databases
Maarten Van den Heuvel¹, Floris Geerts¹, Martin Theobald²
¹ Universiteit Antwerpen, Belgium
² Ulm University, Germany
Contents
Introduction
Issues with safety
safe queries
Which director is most likely to have directed a movie starring an award-winning actor?
PlayedIn
Movie      | Actor              | P
Star Wars  | Ewan McGregor      | 0.9
Star Wars  | Samuel L. Jackson  | 0.7
Star Trek  | Samuel L. Jackson  | 0.2

WonBy
Actor              | Prize   | P
Ewan McGregor      | Oscar   | 0.9
Samuel L. Jackson  | Grammy  | 0.8

DirectedBy
Director      | Movie      | P
George Lucas  | Star Wars  | 0.9
J.J. Abrams   | Star Trek  | 0.8
George Lucas  | Star Trek  | 0.1
Top-1 query
Q(X):- DirectedBy(X, Y), PlayedIn(Y, Z), WonBy(Z, U)
Answers
Director      | P
George Lucas  | 0.827
J.J. Abrams   | 0.128
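The answer probabilities above can be checked by brute force. The following is a minimal sketch, assuming the three tables are tuple-independent (each tuple is present independently with its listed probability); it enumerates all 2^8 possible worlds and sums the weight of the worlds in which a director is an answer. The function names are illustrative and not taken from the slides, and this enumeration is exponential in the number of tuples, so it only serves as a sanity check on the toy data.

```python
from itertools import product

# Tuple-independent tables from the slides: tuple -> marginal probability.
directed_by = {("George Lucas", "Star Wars"): 0.9,
               ("J.J. Abrams", "Star Trek"): 0.8,
               ("George Lucas", "Star Trek"): 0.1}
played_in = {("Star Wars", "Ewan McGregor"): 0.9,
             ("Star Wars", "Samuel L. Jackson"): 0.7,
             ("Star Trek", "Samuel L. Jackson"): 0.2}
won_by = {("Ewan McGregor", "Oscar"): 0.9,
          ("Samuel L. Jackson", "Grammy"): 0.8}


def answers_in_world(d, p, w):
    """Directors returned by Q(X) in one deterministic world."""
    return {x for (x, y) in d for (y2, z) in p for (z2, _u) in w
            if y == y2 and z == z2}


def exact_answer_probabilities():
    """Sum the weights of all possible worlds in which each director is an answer."""
    tables = [directed_by, played_in, won_by]
    tuples = [(i, t, pr) for i, tab in enumerate(tables) for t, pr in tab.items()]
    result = {}
    for bits in product([False, True], repeat=len(tuples)):
        weight = 1.0
        world = [set(), set(), set()]
        for present, (i, t, pr) in zip(bits, tuples):
            weight *= pr if present else 1.0 - pr
            if present:
                world[i].add(t)
        for x in answers_in_world(*world):
            result[x] = result.get(x, 0.0) + weight
    return result


probs = exact_answer_probabilities()
for director, p in sorted(probs.items()):
    print(f"{director}: {p:.3f}")
```

Rounded to three decimals, the result agrees with the Answers table (0.827 and 0.128).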
Top-1 query: we are not interested in the exact value of P.
Some queries always have a query plan using probabilistic operators: these are the safe queries.
Query plan: π_x( PlayedIn ⋈_y π_y(WonBy) )
Q(X):- PlayedIn(X, Y), WonBy(Y, Z)
Projection is always performed with duplicate elimination.
Computing P(X) is in PTIME in the data size.
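The plan for this safe query can be sketched with two probabilistic operators. The code below is an illustration, assuming tuple-independent tables: an independent project merges duplicates with 1 - ∏(1 - p_i), and an independent join multiplies probabilities. The operator names and signatures are hypothetical, chosen for this sketch only.

```python
from collections import defaultdict

played_in = {("Star Wars", "Ewan McGregor"): 0.9,
             ("Star Wars", "Samuel L. Jackson"): 0.7,
             ("Star Trek", "Samuel L. Jackson"): 0.2}
won_by = {("Ewan McGregor", "Oscar"): 0.9,
          ("Samuel L. Jackson", "Grammy"): 0.8}


def independent_project(rel, keep):
    """Project onto positions `keep` with duplicate elimination.

    Duplicates are assumed independent: p = 1 - prod(1 - p_i).
    """
    miss = defaultdict(lambda: 1.0)
    for tup, p in rel.items():
        key = tuple(tup[i] for i in keep)
        miss[key] *= 1.0 - p
    return {key: 1.0 - m for key, m in miss.items()}


def independent_join(left, right, left_pos, right_pos):
    """Equi-join on left[left_pos] == right[right_pos]; probabilities multiply."""
    out = {}
    for lt, lp in left.items():
        for rt, rp in right.items():
            if lt[left_pos] == rt[right_pos]:
                out[lt + rt] = lp * rp
    return out


actors = independent_project(won_by, keep=[0])        # pi_y(WonBy)
joined = independent_join(played_in, actors, 1, 0)    # PlayedIn join_y pi_y(WonBy)
movies = independent_project(joined, keep=[0])        # pi_x(...)

for (movie,), p in sorted(movies.items()):
    print(f"{movie}: {p:.4f}")
```

Each operator touches every input tuple a bounded number of times (the nested-loop join is quadratic here for brevity), which is what keeps the evaluation polynomial in the data size.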
Q(X):- DirectedBy(X, Y), PlayedIn(Y, Z), WonBy(Z, U)
This query has no query plan using probabilistic operators, because these operators assume independence: the query is unsafe.
Computing P(X) is #P-hard in the data size.
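One way to see why this query is unsafe is the hierarchy test known from the literature on self-join-free conjunctive queries: such a query is safe exactly when, for any two existential variables, the sets of atoms they occur in are either nested or disjoint. The slides do not spell out this criterion; the sketch below applies it under that assumption, with an illustrative `is_hierarchical` helper that is not part of any library.

```python
from itertools import combinations


def is_hierarchical(atoms, head_vars):
    """Check the hierarchy condition on the existential variables.

    atoms: dict mapping an atom name to its list of variables.
    head_vars: set of variables that appear in the query head.
    """
    occurs = {}
    for name, variables in atoms.items():
        for v in variables:
            if v not in head_vars:
                occurs.setdefault(v, set()).add(name)
    for a, b in combinations(occurs.values(), 2):
        # Overlapping atom sets must be nested, otherwise the query is unsafe.
        if a & b and not (a <= b or b <= a):
            return False
    return True


two_atom_query = {"PlayedIn": ["X", "Y"], "WonBy": ["Y", "Z"]}
three_atom_query = {"DirectedBy": ["X", "Y"],
                    "PlayedIn": ["Y", "Z"],
                    "WonBy": ["Z", "U"]}

print(is_hierarchical(two_atom_query, {"X"}))    # safe query
print(is_hierarchical(three_atom_query, {"X"}))  # unsafe query
```

In the three-atom query, Y occurs in DirectedBy and PlayedIn while Z occurs in PlayedIn and WonBy; the two sets overlap on PlayedIn without being nested, so the test fails.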
Answers
Director      | P
George Lucas  | 0.827
J.J. Abrams   | 0.128
Plow(X) ≤ P(X) ≤ Pup(X)
Use Qlow for the lower bound.
Use Qup for the upper bound.
What if we pretend independence?
Q(X):- DirectedBy(X, Y), PlayedIn(Y, Z), WonBy(Z, U)
PlayedIn
Movie      | Actor   | P
Star Wars  | Ewan    | 0.9
Star Wars  | Samuel  | 0.7
Star Trek  | Samuel  | 0.2

WonBy
Movie*     | Actor   | Prize   | P
Star Wars  | Ewan    | Oscar   | 0.9
Star Trek  | Ewan    | Oscar   | 0.9
Star Wars  | Samuel  | Grammy  | 0.8
Star Trek  | Samuel  | Grammy  | 0.8
Q’(X):- DirectedBy(X, Y), PlayedIn(Y, Z), WonBy(Y*, Z, U)
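The dissociated query can be evaluated with independence-based operators. The sketch below is an illustration under two assumptions: the tables are tuple-independent, and the copies of a WonBy tuple created for different movies are treated as independent tuples. It materialises the Movie* column and then evaluates a plan that groups first by movie, then by actor. The helper names `ior` and `upper_bound` are hypothetical.

```python
directed_by = {("George Lucas", "Star Wars"): 0.9,
               ("J.J. Abrams", "Star Trek"): 0.8,
               ("George Lucas", "Star Trek"): 0.1}
played_in = {("Star Wars", "Ewan McGregor"): 0.9,
             ("Star Wars", "Samuel L. Jackson"): 0.7,
             ("Star Trek", "Samuel L. Jackson"): 0.2}
won_by = {("Ewan McGregor", "Oscar"): 0.9,
          ("Samuel L. Jackson", "Grammy"): 0.8}

# Dissociate WonBy on the movie variable: one copy of each tuple per movie.
movies = {m for (m, _a) in played_in}
won_by_star = {(m, a, prize): p
               for m in movies for (a, prize), p in won_by.items()}


def ior(probs):
    """Independent-or: 1 - prod(1 - p)."""
    miss = 1.0
    for p in probs:
        miss *= 1.0 - p
    return 1.0 - miss


def upper_bound(director):
    """Evaluate Q'(director), treating all dissociated copies as independent."""
    per_movie = []
    for (d, m), p_d in directed_by.items():
        if d != director:
            continue
        per_actor = []
        for (m2, a), p_play in played_in.items():
            if m2 != m:
                continue
            p_award = ior(p for (m3, a2, _prize), p in won_by_star.items()
                          if m3 == m and a2 == a)
            per_actor.append(p_play * p_award)
        per_movie.append(p_d * ior(per_actor))
    return ior(per_movie)


for director in ("George Lucas", "J.J. Abrams"):
    print(f"{director}: {upper_bound(director):.4f}")
```

On this toy data the value for George Lucas is slightly above the exact 0.827, while the value for J.J. Abrams coincides with the exact 0.128, because only George Lucas' answer uses the same WonBy tuple through two different movies.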
Q(X):- DirectedBy(X, Y, Z*), PlayedIn(Y, Z), WonBy(Z, U)
Downside to dissociation:
dissociations in query size
different accuracy
differentiation
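To illustrate that different dissociations differ in accuracy, the sketch below evaluates the two dissociations shown above (WonBy extended with Y*, and DirectedBy extended with Z*) directly on the original tables. It assumes, as in the cited work on dissociation, that each dissociation yields an upper bound on P(X); under that assumption the minimum of the two values is again an upper bound. The function names are illustrative only.

```python
directed_by = {("George Lucas", "Star Wars"): 0.9,
               ("J.J. Abrams", "Star Trek"): 0.8,
               ("George Lucas", "Star Trek"): 0.1}
played_in = {("Star Wars", "Ewan McGregor"): 0.9,
             ("Star Wars", "Samuel L. Jackson"): 0.7,
             ("Star Trek", "Samuel L. Jackson"): 0.2}
won_by = {("Ewan McGregor", "Oscar"): 0.9,
          ("Samuel L. Jackson", "Grammy"): 0.8}


def ior(probs):
    """Independent-or: 1 - prod(1 - p)."""
    miss = 1.0
    for p in probs:
        miss *= 1.0 - p
    return 1.0 - miss


def award_prob(actor):
    """Probability that the actor won at least one prize."""
    return ior(p for (a, _prize), p in won_by.items() if a == actor)


def bound_dissociating_won_by(director):
    """WonBy(Y*, Z, U): group by movie first, then by actor."""
    per_movie = []
    for (d, m), p_d in directed_by.items():
        if d == director:
            per_actor = [p_play * award_prob(a)
                         for (m2, a), p_play in played_in.items() if m2 == m]
            per_movie.append(p_d * ior(per_actor))
    return ior(per_movie)


def bound_dissociating_directed_by(director):
    """DirectedBy(X, Y, Z*): group by actor first, then by movie."""
    per_actor = []
    for actor in {a for (_m, a) in played_in}:
        per_movie = [p_d * p_play
                     for (d, m), p_d in directed_by.items() if d == director
                     for (m2, a2), p_play in played_in.items()
                     if m2 == m and a2 == actor]
        per_actor.append(award_prob(actor) * ior(per_movie))
    return ior(per_actor)


director = "George Lucas"
b1 = bound_dissociating_won_by(director)
b2 = bound_dissociating_directed_by(director)
best = min(b1, b2)
print(f"WonBy dissociation: {b1:.4f}, DirectedBy dissociation: {b2:.4f}, best: {best:.4f}")
```

For George Lucas the first dissociation is much closer to the exact 0.827 than the second, which shows why the choice of dissociation matters even on a tiny instance.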
Safe queries alone are not efficient enough.
Why not approximate these bounds with more bounds?
PlayedIn(X,Y)
Movie      | Actor              | P
Star Wars  | Ewan McGregor      | 0.9
Star Wars  | Samuel L. Jackson  | 0.7
Star Trek  | Samuel L. Jackson  | 0.2

WonBy(Y,Z)
Actor              | Prize   | P
Ewan McGregor      | Oscar   | 0.9
Samuel L. Jackson  | Grammy  | 0.8
…                  | …       | …
Q(X):- PlayedIn(X, Y), WonBy(Y, Z)
WonBy(Y,Z)
Actor              | Prize   | P
Ewan McGregor      | Oscar   | 0.9
Samuel L. Jackson  | Grammy  | 0.8
…                  | …       | Pmax

Answers(πy)
Actor              | Pup
Ewan McGregor      | 0.965
Samuel L. Jackson  | 0.853
Depends on all n tuples WonBy(Ewan, …)
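Bounding a projection without reading all n tuples can be sketched as follows. This is a minimal illustration, assuming that the number of unread tuples and a cap Pmax on their probabilities are known (for example from a list sorted by descending probability); the slides do not state how these are obtained. The values `remaining=1` and `p_max=0.65` are hypothetical, picked only because they reproduce the 0.965 shown for Ewan McGregor, and `project_bounds` is an illustrative name.

```python
def project_bounds(seen, remaining, p_max):
    """Bound an independent project after reading only part of its input.

    seen: probabilities of the tuples read so far.
    remaining: number of unread tuples (assumed known).
    p_max: assumed cap on the probability of every unread tuple.
    """
    miss = 1.0
    for p in seen:
        miss *= 1.0 - p
    lower = 1.0 - miss                                # unread tuples all absent
    upper = 1.0 - miss * (1.0 - p_max) ** remaining   # unread tuples all at p_max
    return lower, upper


# Hypothetical inputs: one tuple read (Oscar, 0.9), one unread tuple capped at 0.65.
low, up = project_bounds([0.9], remaining=1, p_max=0.65)
print(f"Plow = {low:.3f}, Pup = {up:.3f}")
```

Reading more tuples moves the lower bound up and the upper bound down, so the interval can be tightened only as far as the query needs.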
Summary:
to the query answers
Answers(πy)
Actor              | Pup    | Plow
Ewan McGregor      | 0.965  | 0.9
Samuel L. Jackson  | 0.853  | 0.8
…

Answers(πx)
Movie      | Pup   | Plow
Star Wars  | 0.82  | 0.30
Star Trek  | 0.64  | 0.12
Stop if enough differentiation:
Answers(πy)
Actor              | Pup    | Plow
Ewan McGregor      | 0.92   | 0.9
Samuel L. Jackson  | 0.832  | 0.8
…

Answers(πx)
Movie      | Pup   | Plow
Star Wars  | 0.82  | 0.62
Star Trek  | 0.58  | 0.37
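The stopping condition for a top-1 query can be sketched as an interval test: the answer is decided once the lower bound of the best candidate is at least the upper bound of every other candidate. The sketch below applies this to the two sets of movie bounds from the slides; `top1_if_decided` is an illustrative name, and the exact criterion used by the authors may differ.

```python
def top1_if_decided(bounds):
    """Return the top-1 answer if the bounds already separate it, else None.

    bounds: dict mapping an answer to its (p_low, p_up) interval.
    """
    best = max(bounds, key=lambda answer: bounds[answer][0])
    best_low = bounds[best][0]
    for answer, (_low, up) in bounds.items():
        if answer != best and up > best_low:
            return None   # intervals still overlap: refine further
    return best


first_bounds = {"Star Wars": (0.30, 0.82), "Star Trek": (0.12, 0.64)}
refined_bounds = {"Star Wars": (0.62, 0.82), "Star Trek": (0.37, 0.58)}

print(top1_if_decided(first_bounds))    # None
print(top1_if_decided(refined_bounds))  # Star Wars
```

With the first bounds, Star Trek's upper bound 0.64 exceeds Star Wars' lower bound 0.30, so no decision is possible; with the refined bounds, 0.62 is above 0.58 and refinement can stop without knowing the exact probabilities.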
Choosing a good dissociation is costly but:
independence assumptions
size, detail,…?

Thank you for your attention!
(1) Gatterbauer, W., & Suciu, D. (2014). Oblivious bounds on the probability of boolean functions. ACM Transactions on Database Systems (TODS), 39(1), 5.
(2) Gatterbauer, W., & Suciu, D. (2015). Approximate lifted inference with probabilistic databases. Proceedings of the VLDB Endowment, 8(5), 629-640.
(3) Dylla, M., Miliaraki, I., & Theobald, M. (2013, April). Top-k query processing in probabilistic databases with non-materialized … Conference on (pp. 122-133). IEEE.