Querying Advanced Probabilistic Models: From Relational Embeddings to Probabilistic Programs
Guy Van den Broeck, StarAI Workshop @ AAAI, Feb 7, 2020


  1. Computer Science Querying Advanced Probabilistic Models: From Relational Embeddings to Probabilistic Programs Guy Van den Broeck StarAI Workshop @ AAAI, Feb 7, 2020

  2. The AI Dilemma Pure Learning Pure Logic

  3. The AI Dilemma Pure Learning Pure Logic • Slow thinking: deliberative, cognitive, model-based, extrapolation • Amazing achievements to this day • “Pure logic is brittle”: noise, uncertainty, incomplete knowledge, …

  4. The AI Dilemma Pure Learning Pure Logic • Fast thinking: instinctive, perceptive, model-free, interpolation • Amazing achievements recently • “Pure learning is brittle”: bias, algorithmic fairness, interpretability, explainability, adversarial attacks, unknown unknowns, calibration, verification, missing features, missing labels, data efficiency, shift in distribution, general robustness and safety; it fails to incorporate a sensible model of the world

  5. The FALSE AI Dilemma So all hope is lost? Probabilistic World Models • Joint distribution P(X) • Wealth of representations: can be causal, relational, etc. • Knowledge + data • Reasoning + learning

  6. Probabilistic World Models Pure Learning Pure Logic A New Synthesis of Learning and Reasoning Tutorial on Probabilistic Circuits This afternoon: 2pm-6pm Sutton Center, 2nd floor

  7. Probabilistic World Models Pure Learning Pure Logic High-Level Probabilistic Representations 1 Probabilistic Databases Meets Relational Embeddings: Symbolic Querying of Vector Spaces 2 Modular Exact Inference for Discrete Probabilistic Programs

  8. What we’d like to do…

  9. What we’d like to do… ∃ x Coauthor(Einstein,x) ∧ Coauthor(Erdos,x)

  10. Einstein is in the Knowledge Graph

  11. Erdős is in the Knowledge Graph

  12. This guy is in the Knowledge Graph … and he published with both Einstein and Erdos!

  13. Desired Query Answer: Ernst Straus (not Barack Obama, …, nor Justin Bieber, …). 1. Fuse uncertain information from the web ⇒ Embrace probability! 2. Cannot come from labeled data ⇒ Embrace query evaluation!

  14. Cartoon Motivation: ∃x Coauthor(Einstein,x) ∧ Coauthor(Erdos,x)? [Figure: a pipeline from curating a Knowledge Graph, to Relational Embedding Vectors, to querying in a DBMS.] Many exceptions in the StarAI and PDB communities, but we need to embed…

  15. Probabilistic Databases • Probabilistic database:
      Scientist(x): Erdos 0.9, Einstein 0.7, Pauli 0.6
      Coauthor(x,y): (Erdos, Renyi) 0.6, (Einstein, Pauli) 0.8, (Obama, Erdos) 0.1
      • Learned from the web, large text corpora, ontologies, etc., using statistical machine learning. [VdB&Suciu’17]

  16. Probabilistic Databases Semantics • All possible databases: Ω = {D1, …, Dn} [Figure: the possible worlds, i.e., all subsets of the Coauthor tuples (A,B), (A,C), (B,C).] • A probabilistic database P assigns a probability to each: P: Ω → [0,1] • Probabilities sum to 1: Σ_{D ∈ Ω} P(D) = 1 [VdB&Suciu’17]

  17. Commercial Break • Survey book http://www.nowpublishers.com/article/Details/DBS-052 • IJCAI 2016 tutorial http://web.cs.ucla.edu/~guyvdb/talks/IJCAI16-tutorial/

  18. How to specify all these numbers? • Only specify marginals: P(Coauthor(Alice, Bob)) = 0.23 • Assume tuple-independence:
      Coauthor(x,y): (A,B) p1, (A,C) p2, (B,C) p3
      Then each possible world has a product probability, e.g. P({(A,B),(A,C),(B,C)}) = p1 p2 p3, P({(A,C),(B,C)}) = (1-p1) p2 p3, …, P({}) = (1-p1)(1-p2)(1-p3). [VdB&Suciu’17]
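
To make the tuple-independence semantics concrete, here is a minimal Python sketch (not from the talk; the numeric values for p1, p2, p3 are illustrative) that enumerates the possible worlds of a tuple-independent Coauthor table and checks that their probabilities sum to 1:

```python
from itertools import product

# Tuple-independent Coauthor relation; p1, p2, p3 are illustrative numbers.
tuples = [("A", "B"), ("A", "C"), ("B", "C")]
probs = [0.9, 0.1, 0.5]  # p1, p2, p3

# Enumerate all 2^3 possible worlds and their probabilities.
worlds = {}
for bits in product([False, True], repeat=len(tuples)):
    world = frozenset(t for t, present in zip(tuples, bits) if present)
    p = 1.0
    for pi, present in zip(probs, bits):
        p *= pi if present else (1.0 - pi)
    worlds[world] = p

assert abs(sum(worlds.values()) - 1.0) < 1e-9  # probabilities sum to 1
print(worlds[frozenset()])  # the empty world: (1-p1)(1-p2)(1-p3)
```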

  19. Probabilistic Query Evaluation Q = ∃x ∃y Scientist(x) ∧ Coauthor(x,y)
      Scientist(x): A p1, B p2, C p3
      Coauthor(x,y): (A,D) q1, (A,E) q2, (B,F) q3, (B,G) q4, (B,H) q5
      P(Q) = 1 - {1 - p1 · [1 - (1-q1)(1-q2)]} · {1 - p2 · [1 - (1-q3)(1-q4)(1-q5)]}
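
A minimal sketch (not from the slides; the numeric values are illustrative) of evaluating this closed-form expression in Python, grouping the Coauthor probabilities by scientist:

```python
import math

# Lifted evaluation of Q = ∃x ∃y Scientist(x) ∧ Coauthor(x,y)
# over the tuple-independent tables above; numbers are illustrative.
scientist = {"A": 0.7, "B": 0.8, "C": 0.6}  # p1, p2, p3
coauthor = {                                # q's grouped by scientist x
    "A": [0.6, 0.2],        # Coauthor(A,D), Coauthor(A,E)
    "B": [0.5, 0.4, 0.3],   # Coauthor(B,F), Coauthor(B,G), Coauthor(B,H)
    "C": [],                # C has no Coauthor tuples
}

p_not_q = 1.0
for x, px in scientist.items():
    p_exists_y = 1.0 - math.prod(1.0 - q for q in coauthor[x])  # P(∃y Coauthor(x,y))
    p_not_q *= 1.0 - px * p_exists_y  # 1 - P(Scientist(x) ∧ ∃y Coauthor(x,y))

print("P(Q) =", 1.0 - p_not_q)
```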

  20. Lifted Inference Rules. Preprocess Q (omitted), then apply rules (some have preconditions):
      Negation:            P(¬Q) = 1 - P(Q)
      Decomposable ∧, ∨:   P(Q1 ∧ Q2) = P(Q1) · P(Q2);  P(Q1 ∨ Q2) = 1 - (1 - P(Q1)) (1 - P(Q2))
      Decomposable ∀, ∃:   P(∀z Q) = Π_{A ∈ Domain} P(Q[A/z]);  P(∃z Q) = 1 - Π_{A ∈ Domain} (1 - P(Q[A/z]))
      Inclusion/exclusion: P(Q1 ∧ Q2) = P(Q1) + P(Q2) - P(Q1 ∨ Q2);  P(Q1 ∨ Q2) = P(Q1) + P(Q2) - P(Q1 ∧ Q2)
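
Written as probability combinators, the rules are one-liners. A minimal sketch (not from the talk); each function assumes its precondition, i.e., independence of the subqueries, already holds:

```python
import math

def p_not(p):          return 1 - p                                # negation
def p_and(p1, p2):     return p1 * p2                              # decomposable ∧
def p_or(p1, p2):      return 1 - (1 - p1) * (1 - p2)              # decomposable ∨
def p_forall(ps):      return math.prod(ps)                        # decomposable ∀
def p_exists(ps):      return 1 - math.prod(1 - p for p in ps)     # decomposable ∃
def p_or_incl_excl(p1, p2, p_and_q1q2):                            # inclusion/exclusion
    return p1 + p2 - p_and_q1q2
```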

  21. Example Query Evaluation Q = ∃x ∃y Scientist(x) ∧ Coauthor(x,y)
      Decomposable ∃-rule: P(Q) = 1 - Π_{A ∈ Domain} (1 - P(Scientist(A) ∧ ∃y Coauthor(A,y)))
      Check independence: Scientist(A) ∧ ∃y Coauthor(A,y) is independent of Scientist(B) ∧ ∃y Coauthor(B,y), etc.
      = 1 - (1 - P(Scientist(A) ∧ ∃y Coauthor(A,y)))
          × (1 - P(Scientist(B) ∧ ∃y Coauthor(B,y)))
          × (1 - P(Scientist(C) ∧ ∃y Coauthor(C,y)))
          × (1 - P(Scientist(D) ∧ ∃y Coauthor(D,y)))
          × (1 - P(Scientist(E) ∧ ∃y Coauthor(E,y)))
          × (1 - P(Scientist(F) ∧ ∃y Coauthor(F,y))) × …
      Complexity: PTIME
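
As a sanity check (not on the slides), the same probability can be computed directly from the possible-worlds semantics by brute force; the tables and numbers below are the illustrative ones used above:

```python
from itertools import product

scientist = {"A": 0.7, "B": 0.8, "C": 0.6}
coauthor = {("A", "D"): 0.6, ("A", "E"): 0.2,
            ("B", "F"): 0.5, ("B", "G"): 0.4, ("B", "H"): 0.3}

# One entry per tuple: (relation, key, marginal probability).
all_tuples = [("Scientist", x, p) for x, p in scientist.items()] + \
             [("Coauthor", xy, p) for xy, p in coauthor.items()]

p_q = 0.0
for bits in product([False, True], repeat=len(all_tuples)):
    world = {(rel, key) for (rel, key, _), b in zip(all_tuples, bits) if b}
    weight = 1.0
    for (_, _, p), b in zip(all_tuples, bits):
        weight *= p if b else 1.0 - p
    # Does Q = ∃x ∃y Scientist(x) ∧ Coauthor(x,y) hold in this world?
    if any(("Scientist", x) in world and ("Coauthor", (x, y)) in world
           for (x, y) in coauthor):
        p_q += weight

print("P(Q) =", p_q)  # matches the lifted computation above
```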

  22. Limitations H0 = ∀x ∀y Smoker(x) ∨ Friend(x,y) ∨ Jogger(y)
      The decomposable ∀-rule P(∀z Q) = Π_{A ∈ Domain} P(Q[A/z]) does not apply:
      H0[Alice/x] and H0[Bob/x] are dependent:
        ∀y (Smoker(Alice) ∨ Friend(Alice,y) ∨ Jogger(y))
        ∀y (Smoker(Bob) ∨ Friend(Bob,y) ∨ Jogger(y))
      Lifted inference sometimes fails.

  23. Are the Lifted Rules Complete? Dichotomy Theorem for Unions of Conjunctive Queries (UCQ) / Monotone CNF: • If the lifted rules succeed, the query is in PTIME • If the lifted rules fail, the query is #P-hard ⇒ The lifted rules are complete for UCQ! [Dalvi and Suciu; JACM’11]

  24. The Good, the Bad, the Ugly • We understand querying very well, and it is often efficient (a rare property!), but often also highly intractable • Tuple-independence is limiting unless we reduce from a more expressive model; we can reduce from MLNs, but then inference is intractable… • Where do probabilities come from? An unspecified “statistical model”

  25. Throwing Relational Embedding Models Over the Wall • Associate a vector with each relation R and each entity A, B, … • Score S(head, relation, tail) (based on Euclidean distance, cosine, …)
      Coauthor(x,y) scores S: (A,B) 0.6, (A,C) -0.1, (B,C) 0.4
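
To make the scoring step concrete, here is a minimal sketch of one popular scoring function, DistMult (the choice of DistMult and the 4-dimensional vectors are assumptions for illustration; the slide only requires some score S(head, relation, tail)):

```python
import numpy as np

# Illustrative embeddings; real models learn these vectors from data.
entity = {"A": np.array([0.1, 0.7, -0.2, 0.4]),
          "B": np.array([0.3, 0.5, 0.1, 0.2]),
          "C": np.array([-0.4, 0.2, 0.6, 0.1])}
relation = {"Coauthor": np.array([0.8, 0.1, 0.3, 0.5])}

def score(head, rel, tail):
    """DistMult-style score: sum of elementwise products of the three vectors."""
    return float(np.sum(entity[head] * relation[rel] * entity[tail]))

print(score("A", "Coauthor", "B"))
```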

  26. Throwing Relational Embedding Models Over the Wall. Interpret scores as probabilities: high score ~ prob 1; low score ~ prob 0.
      Coauthor scores S:        (A,B) 0.6, (A,C) -0.1, (B,C) 0.4
      Coauthor probabilities P: (A,B) 0.9, (A,C) 0.1,  (B,C) 0.5
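
One common way to turn scores into probabilities is a calibrated sigmoid; a minimal sketch (the sigmoid, its temperature, and its bias are assumptions for illustration and will not reproduce the slide's numbers exactly):

```python
import math

def score_to_prob(score, temperature=4.0, bias=0.0):
    """Squash a real-valued embedding score into (0, 1); high score -> prob near 1."""
    return 1.0 / (1.0 + math.exp(-(score - bias) * temperature))

for pair, s in {("A", "B"): 0.6, ("A", "C"): -0.1, ("B", "C"): 0.4}.items():
    print(pair, round(score_to_prob(s), 2))
```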

  27. The Good, the Bad, the Ugly • Where do probabilities come from? We finally know the “statistical model”! Both capture marginals: a good match • We still understand querying very well, but it is often highly intractable • Tuple-independence is limiting: relational embedding models do not attempt to capture dependencies in link prediction

  28. A Second Attempt • Let’s simplify drastically! • Assume each relation has the form R(x,y) ⇔ T_R ∧ E(x) ∧ E(y) • That is, there are latent relations – T_R to decide which relations can be true – E to decide which entities participate
      Example: E(x): A 0.2, B 0.5, C 0.3 and T_Coauthor: 0.2, approximating Coauthor(x,y): (A,B) 0.9, (A,C) 0.1, (B,C) 0.5

  29. Can this do link prediction? • Predict Coauthor(Alice,Bob)
      E(x): A 0.2, B 0.5, C 0.3;  T_Coauthor: 0.3;  Coauthor(A,B) = ?
      • Rewrite the query using R(x,y) ⇔ T_R ∧ E(x) ∧ E(y) • Apply the standard lifted inference rules • P(Coauthor(Alice,Bob)) = 0.3 ⋅ 0.2 ⋅ 0.5 = 0.03
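
A minimal sketch of this computation for a single ground atom (the dictionary layout is an assumption; the numbers are the slide's, with A standing for Alice and B for Bob):

```python
# Latent tuple-independent relations of the simplified model.
E = {"A": 0.2, "B": 0.5, "C": 0.3}  # which entities participate (A = Alice, B = Bob)
T = {"Coauthor": 0.3}               # which relations can be true

def prob_atom(rel, x, y):
    """P(rel(x,y)) under rel(x,y) <=> T_rel ∧ E(x) ∧ E(y), all tuples independent."""
    return T[rel] * E[x] * E[y]

print(prob_atom("Coauthor", "A", "B"))  # 0.3 * 0.2 * 0.5 = 0.03
```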

  30. The Good, the Bad, the Ugly • Where do probabilities come from? We finally know the “statistical model”! • We still understand querying very well: by rewriting R into E and T_R, every UCQ query becomes tractable! • Tuples sharing entities or relation symbols depend on each other • The model is not very expressive

  31. A Third Attempt • Mixture models of the second attempt: R(x,y) ⇔ T_R ∧ E(x) ∧ E(y), now with latent relations T_R and E for each mixture component • The Good: – Still a clear statistical model – Every UCQ query is still tractable – Still captures tuple dependencies – A mixture can approximate any distribution

  32. Can this do link prediction? • Predict Coauthor(Alice,Bob) in each mixture component: – P1(Coauthor(Alice,Bob)) = 0.3 ⋅ 0.2 ⋅ 0.5 – P2(Coauthor(Alice,Bob)) = 0.9 ⋅ 0.1 ⋅ 0.6 – Etc. • Probability in a mixture of d components: P(Coauthor(Alice,Bob)) = (1/d)(0.3 ⋅ 0.2 ⋅ 0.5) + (1/d)(0.9 ⋅ 0.1 ⋅ 0.6) + …
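
A minimal sketch of the mixture computation (the first two components use the slide's numbers; the uniform 1/d weights follow the slide's formula, and any further components would be assumptions):

```python
# One (T, E) pair per mixture component; values for the two components are from the slide.
components = [
    {"T": 0.3, "E": {"Alice": 0.2, "Bob": 0.5}},
    {"T": 0.9, "E": {"Alice": 0.1, "Bob": 0.6}},
]

def mixture_prob(x, y, comps):
    """Average the per-component probabilities T * E(x) * E(y) with uniform weights 1/d."""
    d = len(comps)
    return sum(c["T"] * c["E"][x] * c["E"][y] for c in comps) / d

print(mixture_prob("Alice", "Bob", components))
# (0.3*0.2*0.5 + 0.9*0.1*0.6) / 2 = (0.03 + 0.054) / 2 = 0.042
```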

  33. How good is this? Does it look familiar? P(Coauthor(Alice,Bob)) = (1/d)(0.3 ⋅ 0.2 ⋅ 0.5) + (1/d)(0.9 ⋅ 0.1 ⋅ 0.6) + …

  34. How good is this? • At link prediction: the same as DistMult • At queries on the bio dataset [Hamilton]: competitive, while having a consistent underlying distribution. Ask Tal at his poster!

  35. How expressive is this? The GQE baseline translates graph queries into linear algebra [Hamilton et al., 2018].

  36. First Conclusions • We can give probabilistic database semantics to relational embedding models – Gives more meaningful query results • By doing so, we solve some annoyances of the theoretical PDB framework – Tuple dependence – A clear connection to learning – While everything stays tractable – And the intractable becomes tractable • Enables much more (training on queries, consistency)
