open world probabilistic databases
play

Open-World Probabilistic Databases Guy Van den Broeck FLAIRS May - PowerPoint PPT Presentation

Open-World Probabilistic Databases Guy Van den Broeck FLAIRS May 23, 2017 Overview 1. Why probabilistic databases? 2. How probabilistic query evaluation? 3. Why open world? 4. How open-world query evaluation? 5. What is the broader picture? Why


  1. Open-World Probabilistic Databases Guy Van den Broeck FLAIRS May 23, 2017

  2. Overview 1. Why probabilistic databases? 2. How probabilistic query evaluation? 3. Why open world? 4. How open-world query evaluation? 5. What is the broader picture?

  3. Why probabilistic databases?

  4. What we’d like to do…

  5. Google Knowledge Graph > 570 million entities > 18 billion tuples

  6. Probabilistic Databases • Tuple-independent probabilistic database Scientist Coauthor x y P x P Erdos Renyi Erdos 0.9 0.6 Einstein Einstein Pauli 0.7 0.8 Pauli Obama Erdos 0.6 0.1 • Learned from the web, large text corpora, ontologies, etc., using statistical machine learning. [Suciu’11]

  7. Information Extraction is Noisy! Coauthor x y P Luc Laura 0.7 Luc Hendrik 0.6 Luc Kathleen 0.3 Luc Paol 0.3 Luc Paolo 0.1

  8. What we’d like to do… ∃ x Coauthor(Einstein,x) ∧ Coauthor(Erdos,x)

  9. Einstein is in the Knowledge Graph

  10. Erdős is in the Knowledge Graph

  11. This guy is in the Knowledge Graph … and he published with both Einstein and Erdos!

  12. Desired Query Answer 1. Fuse uncertain information from web Ernst Straus ⇒ Embrace probability! Barack Obama, … 2. Cannot come from labeled data ⇒ Embrace query eval! Justin Bieber , …

  13. [ Chen’16] (NYTimes)

  14. How probabilistic query evaluation?

  15. Tuple-Independent Probabilistic DB Coauthor x y P Probabilistic database D: A B p 1 A C p 2 B C p 3 Possible worlds semantics: x y (1-p 1 )(1-p 2 )(1-p 3 ) x y x y A B x y A C A B x y A C A B x y p 1 p 2 p 3 B C A C A B x y B C B C x y A C (1-p 1 )p 2 p 3 B C

  16. Probabilistic Query Evaluation Q = ∃ x ∃ y Scientist(x) ∧ Coauthor(x,y) P( Q ) = 1- {1- } * p 1 *[ ] 1-(1-q 1 )*(1-q 2 ) {1- } p 2 *[ ] 1-(1-q 3 )*(1-q 4 )*(1-q 5 ) Coauthor x y P A D q 1 Y 1 Scientist x P A E q 2 Y 2 A p 1 X 1 B F q 3 Y 3 B p 2 X 2 B G q 4 Y 4 C p 3 X 3 B H q 5 Y 5

  17. Lifted Inference Rules Preprocess Q (omitted), Then apply rules (some have preconditions) Negation P(¬Q) = 1 – P(Q) P(Q1 ∧ Q2) = P(Q1) P(Q2) Decomposable ∧ , ∨ P(Q1 ∨ Q2) =1 – (1 – P(Q1)) (1 – P(Q2)) P( ∃ z Q) = 1 – Π A ∈ Domain (1 – P(Q[A/z])) Decomposable ∃ , ∀ P( ∀ z Q) = Π A ∈ Domain P(Q[A/z]) P(Q1 ∧ Q2) = P(Q1) + P(Q2) - P(Q1 ∨ Q2) Inclusion/ P(Q1 ∨ Q2) = P(Q1) + P(Q2) - P(Q1 ∧ Q2) exclusion [Suciu’11]

  18. Closed-World Lifted Query Eval Q = ∃ x ∃ y Scientist(x) ∧ Coauthor(x,y) Decomposable ∀ -Rule P(Q) = 1 - Π A ∈ Domain (1 - P(Scientist(A) ∧ ∃ y Coauthor(A,y)) Check independence: Scientist(A) ∧ ∃ y Coauthor(A,y) Scientist(B) ∧ ∃ y Coauthor(B,y) = 1 - (1 - P(Scientist(A) ∧ ∃ y Coauthor(A,y)) x (1 - P(Scientist(B) ∧ ∃ y Coauthor(B,y)) x (1 - P(Scientist(C) ∧ ∃ y Coauthor(C,y)) x (1 - P(Scientist(D) ∧ ∃ y Coauthor(D,y)) x (1 - P(Scientist(E) ∧ ∃ y Coauthor(E,y)) x (1 - P(Scientist(F) ∧ ∃ y Coauthor(F,y)) … Complexity PTIME

  19. Limitations H 0 = ∀ x ∀ y Smoker(x) ∨ Friend(x,y) ∨ Jogger(y) P( ∀ z Q) = Π A ∈ Domain P(Q[A/z]) The decomposable ∀ -rule: … does not apply: Dependent H 0 [Alice/x] and H 0 [Bob/x] are dependent: ∀ y (Smoker(Alice) ∨ Friend(Alice,y) ∨ Jogger(y)) ∀ y (Smoker(Bob) ∨ Friend(Bob,y) ∨ Jogger(y)) Lifted inference sometimes fails. Computing P(H 0 ) is #P-hard in size database [Suciu’11]

  20. Are the Lifted Rules Complete? You already know: • Inference rules: PTIME data complexity • Some queries: #P-hard data complexity Dichotomy Theorem for UCQ / Mon. CNF • If lifted rules succeed, then PTIME query • If lifted rules fail, then query is #P-hard Lifted rules are complete for UCQ! [Dalvi and Suciu;JACM’11]

  21. Why open world?

  22. Knowledge Base Completion Coauthor Given: x y P Einstein Straus 0.7 Erdos Straus 0.6 Einstein Pauli 0.9 … … … 0.8::Coauthor(x,y) :- Coauthor(z,x) ∧ Coauthor(z,y). Learn: Complete: x y P Straus Pauli 0.504 … … …

  23. Bayesian Learning Loop Bayesian view on learning: 1. Prior belief: P( Coauthor(Straus,Pauli) ) = 0.01 2. Observe page P( Coauthor(Straus,Pauli| ) = 0.2 3. Observe page P( Coauthor(Straus,Pauli)| , ) = 0.3 Principled and sound reasoning!

  24. Problem: Broken Learning Loop Bayesian view on learning: 1. Prior belief: P( Coauthor(Straus,Pauli) ) = 0 2. Observe page P( Coauthor(Straus,Pauli| ) = 0.2 3. Observe page P( Coauthor(Straus,Pauli)| , ) = 0.3 This is mathematical nonsense! [Ceylan , Darwiche, Van den Broeck; KR’16]

  25. What we’d like to do… ∃ x Coauthor(Einstein,x) ∧ Coauthor(Erdos,x) Ernst Straus Kristian Kersting , … Justin Bieber , …

  26. Coauthor Open World DB X Y P Einstein Straus 0.7 Erdos Straus 0.6 • What if fact missing? Einstein Pauli 0.9 Erdos Renyi 0.7 Kersting Natarajan 0.8 Luc Paol 0.1 • Probability 0 for: … … … Q1 = ∃ x Coauthor(Einstein, x ) ∧ Coauthor(Erdos, x ) Q2 = ∃ x Coauthor(Bieber, x ) ∧ Coauthor(Erdos, x ) Q3 = Coauthor(Einstein, Straus ) ∧ Coauthor(Erdos, Straus ) Q4 = Coauthor(Einstein, Bieber ) ∧ Coauthor(Erdos, Bieber ) Q5 = Coauthor(Einstein, Bieber ) ∧ ¬ Coauthor( Einstein , Bieber )

  27. X Y P Einstein Straus 0.7 Intuition Erdos Straus 0.6 Einstein Pauli 0.9 Erdos Renyi 0.7 Kersting Natarajan 0.8 Luc Paol 0.1 Q1 = ∃ x Coauthor(Einstein, x ) ∧ Coauthor(Erdos, x ) … … … Q2 = ∃ x Coauthor(Bieber, x ) ∧ Coauthor(Erdos, x ) Q3 = Coauthor(Einstein, Straus ) ∧ Coauthor(Erdos, Straus ) Q4 = Coauthor(Einstein, Bieber ) ∧ Coauthor(Erdos, Bieber ) Q5 = Coauthor(Einstein, Bieber ) ∧ ¬ Coauthor( Einstein , Bieber ) We know for sure that P(Q1 ) ≥ P(Q3), P(Q1 ) ≥ P(Q4) and P(Q3) ≥ P(Q5), P(Q4) ≥ P(Q5) because P(Q5) = 0. We have strong evidence that P(Q1) ≥ P(Q2). [Ceylan , Darwiche, Van den Broeck; KR’16]

  28. Problem: Curse of Superlinearity • Reality is worse! • Tuples are intentionally missing!

  29. Problem: Curse of Superlinearity Sibling x y P … … … Facebook scale ⇒ 200 Exabytes of data” All Google storage is 2 exabytes … [Ceylan , Darwiche, Van den Broeck; KR’16]

  30. Problem: Model Evaluation Coauthor Given: x y P Einstein Straus 0.7 Erdos Straus 0.6 Einstein Pauli 0.9 … … … 0.8::Coauthor(x,y) :- Coauthor(z,x) ∧ Coauthor(z,y). Learn: OR 0.6::Coauthor(x,y) :- Affiliation(x,z) ∧ Affiliation(y,z). What is the likelihood, precision, accuracy, …? [De Raedt et al; IJCAI’15]

  31. Open-World Prob. Databases Intuition: tuples can be added with P < λ Q2 = Coauthor(Einstein, Straus ) ∧ Coauthor(Erdos, Straus ) 0.7 * λ ≥ P(Q2) ≥ 0 Coauthor Coauthor X Y P X Y P Einstein Straus 0.7 Einstein Straus 0.7 Einstein Pauli 0.9 Einstein Pauli 0.9 Erdos Renyi 0.7 Erdos Renyi 0.7 Kersting Natarajan 0.8 Kersting Natarajan 0.8 Luc Paol 0.1 Luc Paol 0.1 … … … … … … λ Erdos Straus

  32. How open-world query evaluation?

  33. UCQ / Monotone CNF • Lower bound = closed-world probability • Upper bound = probability after adding all tuples with probability λ • Polynomial time ☺ • Quadratic blow-up  • 200 exabytes … again 

  34. Closed-World Lifted Query Eval Q = ∃ x ∃ y Scientist(x) ∧ Coauthor(x,y) Decomposable ∀ -Rule P(Q) = 1 - Π A ∈ Domain (1 - P(Scientist(A) ∧ ∃ y Coauthor(A,y)) Check independence: Scientist(A) ∧ ∃ y Coauthor(A,y) Scientist(B) ∧ ∃ y Coauthor(B,y) = 1 - (1 - P(Scientist(A) ∧ ∃ y Coauthor(A,y)) x (1 - P(Scientist(B) ∧ ∃ y Coauthor(B,y)) x (1 - P(Scientist(C) ∧ ∃ y Coauthor(C,y)) x (1 - P(Scientist(D) ∧ ∃ y Coauthor(D,y)) x (1 - P(Scientist(E) ∧ ∃ y Coauthor(E,y)) x (1 - P(Scientist(F) ∧ ∃ y Coauthor(F,y)) … Complexity PTIME

  35. Closed-World Lifted Query Eval Q = ∃ x ∃ y Scientist(x) ∧ Coauthor(x,y) P(Q) = 1 - Π A ∈ Domain (1 - P(Scientist(A) ∧ ∃ y Coauthor(A,y)) = 1 - (1 - P(Scientist(A) ∧ ∃ y Coauthor(A,y)) x (1 - P(Scientist(B) ∧ ∃ y Coauthor(B,y)) x (1 - P(Scientist(C) ∧ ∃ y Coauthor(C,y)) x (1 - P(Scientist(D) ∧ ∃ y Coauthor(D,y)) No supporting facts x (1 - P(Scientist(E) ∧ ∃ y Coauthor(E,y)) in database! x (1 - P(Scientist(F) ∧ ∃ y Coauthor(F,y)) … Probability 0 in closed world Ignore these queries! Complexity linear time!

  36. Open-World Lifted Query Eval Q = ∃ x ∃ y Scientist(x) ∧ Coauthor(x,y) P(Q) = 1 - Π A ∈ Domain (1 - P(Scientist(A) ∧ ∃ y Coauthor(A,y)) = 1 - (1 - P(Scientist(A) ∧ ∃ y Coauthor(A,y)) x (1 - P(Scientist(B) ∧ ∃ y Coauthor(B,y)) x (1 - P(Scientist(C) ∧ ∃ y Coauthor(C,y)) x (1 - P(Scientist(D) ∧ ∃ y Coauthor(D,y)) No supporting facts x (1 - P(Scientist(E) ∧ ∃ y Coauthor(E,y)) in database! x (1 - P(Scientist(F) ∧ ∃ y Coauthor(F,y)) … Probability p in closed world Complexity PTIME!

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend