Towards Querying Probabilistic Knowledge Bases
Guy Van den Broeck
First Conference on Automated Knowledge Base Construction
May 20, 2019

Cartoon Motivation

Cartoon Motivation: Relational Embedding Models ∃ x Coauthor(Einstein,x) ∧ Coauthor(Erdos,x)

Cartoon Motivation 2: Relational Embedding Models ?

Goal 1: Probabilistic Query Evaluation ∃ x Coauthor(Einstein,x) ∧ Coauthor(Erdos,x) ?

Goal 2: Querying Relational Embedding Models ∃ x Coauthor(Einstein,x) ∧ Coauthor(Erdos,x) ?

Probabilistic Query Evaluation

What we’d like to do…

What we’d like to do… ∃ x Coauthor(Einstein,x) ∧ Coauthor(Erdos,x)

Einstein is in the Knowledge Graph

Erdős is in the Knowledge Graph

This guy is in the Knowledge Graph … and he published with both Einstein and Erdos!

Desired Query Answer
Answers: Ernst Straus; Barack Obama, …; Justin Bieber, …
1. Fuse uncertain information from web ⇒ Embrace probability!
2. Cannot come from labeled data ⇒ Embrace query eval!

Probabilistic Databases
• Tuple-independent probabilistic database

  Scientist          Coauthor
  x         P        x         y      P
  Erdos     0.9      Erdos     Renyi  0.6
  Einstein  0.8      Einstein  Pauli  0.7
  Pauli     0.6      Obama     Erdos  0.1

• Learned from the web, large text corpora, ontologies, etc., using statistical machine learning.
[VdB&Suciu’17]

Tuple-Independent Probabilistic DB
Probabilistic database D:

  Coauthor
  x  y  P
  A  B  p1
  A  C  p2
  B  C  p3

Possible worlds semantics: each of the 2^3 = 8 subsets of the tuples is a possible world. Its probability is the product of p_i for every tuple present and (1 - p_i) for every tuple absent. For example:

  {}                      (1-p1)(1-p2)(1-p3)
  {(A,C), (B,C)}          (1-p1) p2 p3
  {(A,B), (A,C), (B,C)}   p1 p2 p3

[VdB&Suciu’17]
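The possible-worlds semantics can be spelled out in a few lines of Python. This is only a minimal sketch: the function name `possible_worlds` and the numeric values standing in for p1, p2, p3 are assumptions made for illustration, not part of the talk.

```python
from itertools import product


def possible_worlds(tuples):
    """Yield (world, probability) for a tuple-independent database.

    `tuples` maps each fact to its marginal probability.
    """
    facts = list(tuples)
    for mask in product([False, True], repeat=len(facts)):
        world = frozenset(f for f, keep in zip(facts, mask) if keep)
        prob = 1.0
        for f, keep in zip(facts, mask):
            # Present tuples contribute p, absent tuples contribute 1 - p.
            prob *= tuples[f] if keep else 1.0 - tuples[f]
        yield world, prob


# Hypothetical values for p1, p2, p3.
coauthor = {("A", "B"): 0.5, ("A", "C"): 0.4, ("B", "C"): 0.2}
worlds = dict(possible_worlds(coauthor))
```

The eight world probabilities sum to 1, and the empty world receives (1-p1)(1-p2)(1-p3). Enumeration is exponential in the number of tuples, which is why it only serves as a reference point for small examples.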

Probabilistic Query Evaluation
Q = ∃x ∃y Scientist(x) ∧ Coauthor(x,y)

P(Q) = 1 - {1 - p1 * [1 - (1-q1)*(1-q2)]} * {1 - p2 * [1 - (1-q3)*(1-q4)*(1-q5)]}

  Scientist            Coauthor
  x  P   Var           x  y  P   Var
  A  p1  X1            A  D  q1  Y1
  B  p2  X2            A  E  q2  Y2
  C  p3  X3            B  F  q3  Y3
                       B  G  q4  Y4
                       B  H  q5  Y5
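As a sanity check, the closed-form expression for P(Q) can be compared against brute-force enumeration of all possible worlds. The sketch below assumes hypothetical numeric values for p1..p3 and q1..q5; the function names `lifted` and `brute_force` are illustrative only.

```python
from itertools import product

# Hypothetical probabilities standing in for p1..p3 and q1..q5.
scientist = {"A": 0.9, "B": 0.5, "C": 0.3}
coauthor = {("A", "D"): 0.6, ("A", "E"): 0.2,
            ("B", "F"): 0.7, ("B", "G"): 0.1, ("B", "H"): 0.4}


def lifted(scientist, coauthor):
    """P(exists x, y: Scientist(x) and Coauthor(x, y)) via the closed form."""
    prob_false = 1.0
    for x, p in scientist.items():
        no_coauthor = 1.0
        for (a, _), q in coauthor.items():
            if a == x:
                no_coauthor *= 1.0 - q
        # Sub-queries for different x touch disjoint tuples, so they multiply.
        prob_false *= 1.0 - p * (1.0 - no_coauthor)
    return 1.0 - prob_false


def brute_force(scientist, coauthor):
    """Same probability by summing the weights of all satisfying worlds."""
    facts = [("S", k, p) for k, p in scientist.items()]
    facts += [("C", k, q) for k, q in coauthor.items()]
    total = 0.0
    for mask in product([False, True], repeat=len(facts)):
        weight, sci, co = 1.0, set(), set()
        for (rel, key, p), keep in zip(facts, mask):
            weight *= p if keep else 1.0 - p
            if keep:
                (sci if rel == "S" else co).add(key)
        if any(x in sci for (x, _) in co):
            total += weight
    return total
```

Both functions agree up to floating-point error on this database; the closed form touches each tuple once, while the enumeration visits 2^8 worlds.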

Lifted Inference Rules
Preprocess Q (omitted), then apply rules (some have preconditions):

  Negation:             P(¬Q) = 1 - P(Q)
  Decomposable ∧, ∨:    P(Q1 ∧ Q2) = P(Q1) P(Q2)
                        P(Q1 ∨ Q2) = 1 - (1 - P(Q1)) (1 - P(Q2))
  Decomposable ∃, ∀:    P(∀z Q) = Π_{A ∈ Domain} P(Q[A/z])
                        P(∃z Q) = 1 - Π_{A ∈ Domain} (1 - P(Q[A/z]))
  Inclusion/exclusion:  P(Q1 ∧ Q2) = P(Q1) + P(Q2) - P(Q1 ∨ Q2)
                        P(Q1 ∨ Q2) = P(Q1) + P(Q2) - P(Q1 ∧ Q2)

Example Query Evaluation
Q = ∃x ∃y Scientist(x) ∧ Coauthor(x,y)

Decomposable ∃-rule:
P(Q) = 1 - Π_{A ∈ Domain} (1 - P(Scientist(A) ∧ ∃y Coauthor(A,y)))
     = 1 - (1 - P(Scientist(A) ∧ ∃y Coauthor(A,y)))
         × (1 - P(Scientist(B) ∧ ∃y Coauthor(B,y)))
         × (1 - P(Scientist(C) ∧ ∃y Coauthor(C,y)))
         × (1 - P(Scientist(D) ∧ ∃y Coauthor(D,y)))
         × (1 - P(Scientist(E) ∧ ∃y Coauthor(E,y)))
         × (1 - P(Scientist(F) ∧ ∃y Coauthor(F,y)))
         …

Check independence of:
  Scientist(A) ∧ ∃y Coauthor(A,y)
  Scientist(B) ∧ ∃y Coauthor(B,y)

Complexity: PTIME

Limitations
H0 = ∀x ∀y Smoker(x) ∨ Friend(x,y) ∨ Jogger(y)

The decomposable ∀-rule, P(∀z Q) = Π_{A ∈ Domain} P(Q[A/z]), does not apply: H0[Alice/x] and H0[Bob/x] are dependent, because both mention the same Jogger(y) tuples:
  ∀y (Smoker(Alice) ∨ Friend(Alice,y) ∨ Jogger(y))
  ∀y (Smoker(Bob) ∨ Friend(Bob,y) ∨ Jogger(y))

Lifted inference sometimes fails.

Are the Lifted Rules Complete?
Dichotomy Theorem for Unions of Conjunctive Queries (UCQ) / Monotone CNF:
• If lifted rules succeed, then PTIME query
• If lifted rules fail, then query is #P-hard
Lifted rules are complete for UCQ!
[Dalvi and Suciu; JACM’11]

Commercial Break • Survey book http://www.nowpublishers.com/article/Details/DBS-052 • IJCAI 2016 tutorial http://web.cs.ucla.edu/~guyvdb/talks/IJCAI16-tutorial/

Throwing Relational Embedding Models Over the Wall
• Notion of distance d(h, r, t) in vector space (Euclidean, 1 - cosine, …)
• Probabilistic semantics:
  – Distance d(h, r, t) = 0 is certainty
  – Distance d(h, r, t) > 0 is uncertainty
  P(r(h,t)) ≈ e^(-β⋅d(h, r, t))
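The mapping from embedding distance to tuple probability can be illustrated as follows. This is a hedged sketch: the slide does not fix a particular distance, so the translation-style distance ‖h + r − t‖ used here, the function names, and the default scale `beta=1.0` are all assumptions chosen for illustration.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triple_probability(h, r, t, beta=1.0):
    """Approximate P(r(h, t)) as exp(-beta * distance).

    Assumption: the distance is ||h + r - t||, a translation-style choice;
    the talk only requires some distance over (h, r, t).
    """
    translated = [a + b for a, b in zip(h, r)]
    return math.exp(-beta * euclidean(translated, t))


# Distance 0 gives certainty; larger distances give lower probability.
certain = triple_probability([0.0, 0.0], [1.0, 0.0], [1.0, 0.0])
uncertain = triple_probability([0.0, 0.0], [1.0, 0.0], [1.0, 1.0])
```

Each triple is scored on its own, so the resulting table of probabilities can be read as a tuple-independent probabilistic database.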

What About Tuple-Independence?
• Deterministic databases = tuple-independent
• Relational embedding models = tuple-independent
  At no point do we model joint uncertainty between tuples
• We can capture correlations, but query evaluation becomes much harder!
  – See probabilistic database literature
  – See statistical relational learning literature

So everything is solved?

What we’d like to do… ∃ x Coauthor(Einstein,x) ∧ Coauthor(Erdos,x) Ernst Straus Kristian Kersting , … Justin Bieber , …

Open World DB

  Coauthor
  X         Y          P
  Einstein  Straus     0.7
  Erdos     Straus     0.6
  Einstein  Pauli      0.9
  Erdos     Renyi      0.7
  Kersting  Natarajan  0.8
  Luc       Paol       0.1
  …         …          …

• What if fact missing?
• Probability 0 for:
  Q1 = ∃x Coauthor(Einstein,x) ∧ Coauthor(Erdos,x)
  Q2 = ∃x Coauthor(Bieber,x) ∧ Coauthor(Erdos,x)
  Q3 = Coauthor(Einstein,Straus) ∧ Coauthor(Erdos,Straus)
  Q4 = Coauthor(Einstein,Bieber) ∧ Coauthor(Erdos,Bieber)
  Q5 = Coauthor(Einstein,Bieber) ∧ ¬Coauthor(Einstein,Bieber)

Intuition

  Coauthor
  X         Y          P
  Einstein  Straus     0.7
  Erdos     Straus     0.6
  Einstein  Pauli      0.9
  Erdos     Renyi      0.7
  Kersting  Natarajan  0.8
  Luc       Paol       0.1
  …         …          …

  Q1 = ∃x Coauthor(Einstein,x) ∧ Coauthor(Erdos,x)
  Q2 = ∃x Coauthor(Bieber,x) ∧ Coauthor(Erdos,x)
  Q3 = Coauthor(Einstein,Straus) ∧ Coauthor(Erdos,Straus)
  Q4 = Coauthor(Einstein,Bieber) ∧ Coauthor(Erdos,Bieber)
  Q5 = Coauthor(Einstein,Bieber) ∧ ¬Coauthor(Einstein,Bieber)

We know for sure that P(Q1) ≥ P(Q3), P(Q1) ≥ P(Q4) and P(Q3) ≥ P(Q5), P(Q4) ≥ P(Q5) because P(Q5) = 0.
We have strong evidence that P(Q1) ≥ P(Q2).
[Ceylan, Darwiche, Van den Broeck; KR’16]

Problem: Curse of Superlinearity
Reality is worse: tuples intentionally missing!
Sibling table (x, y, P) at Facebook scale ⇒ 200 exabytes of data
All Google storage is 2 exabytes …
[Ceylan, Darwiche, Van den Broeck; KR’16]

Bayesian Learning Loop
Bayesian view on learning:
1. Prior belief: P(Coauthor(Straus,Pauli)) = 0.01
2. Observe page: P(Coauthor(Straus,Pauli) | [page 1]) = 0.2
3. Observe page: P(Coauthor(Straus,Pauli) | [page 1], [page 2]) = 0.3
Principled and sound reasoning!

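Problem: Broken Learning Loop
Bayesian view on learning:
1. Prior belief: P(Coauthor(Straus,Pauli)) = 0
2. Observe page: P(Coauthor(Straus,Pauli) | [page 1]) = 0.2
3. Observe page: P(Coauthor(Straus,Pauli) | [page 1], [page 2]) = 0.3
This is mathematical nonsense! With a prior of exactly 0, Bayes' rule keeps the posterior at 0 no matter what is observed, so the updates to 0.2 and 0.3 cannot follow.
[Ceylan, Darwiche, Van den Broeck; KR’16]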
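A short numeric illustration of why a closed-world prior of 0 breaks the learning loop. The function `posterior` is a plain application of Bayes' rule; the likelihood values 0.9 and 0.1 are hypothetical and only serve to show the effect.

```python
def posterior(prior, likelihood_if_true, likelihood_if_false):
    """Bayes' rule for a single binary hypothesis and one observation."""
    evidence = prior * likelihood_if_true + (1.0 - prior) * likelihood_if_false
    if evidence == 0.0:
        raise ValueError("the observation has probability zero")
    return prior * likelihood_if_true / evidence


# Hypothetical likelihoods: the page is far more likely if the fact is true.
open_world = posterior(0.01, 0.9, 0.1)   # small positive prior: belief grows
closed_world = posterior(0.0, 0.9, 0.1)  # zero prior: belief is stuck at 0
```

However strongly the observation favors the fact, a zero prior yields a zero posterior, whereas even a small positive prior can be revised upward.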

Problem: Model Evaluation
Given:

  Coauthor
  x         y       P
  Einstein  Straus  0.7
  Erdos     Straus  0.6
  Einstein  Pauli   0.9
  …         …       …

Learn:
  0.8::Coauthor(x,y) :- Coauthor(z,x) ∧ Coauthor(z,y).
OR
  0.6::Coauthor(x,y) :- Affiliation(x,z) ∧ Affiliation(y,z).

What is the likelihood, precision, accuracy, …?
[De Raedt et al; IJCAI’15]

Open-World Prob. Databases
Intuition: tuples can be added with P < λ

Q2 = Coauthor(Einstein,Straus) ∧ Coauthor(Erdos,Straus)
0.7 * λ ≥ P(Q2) ≥ 0

  Coauthor (as given)               Coauthor (with added tuple)
  X         Y          P            X         Y          P
  Einstein  Straus     0.7          Einstein  Straus     0.7
  Einstein  Pauli      0.9          Einstein  Pauli      0.9
  Erdos     Renyi      0.7          Erdos     Renyi      0.7
  Kersting  Natarajan  0.8          Kersting  Natarajan  0.8
  Luc       Paol       0.1          Luc       Paol       0.1
  …         …          …            …         …          …
                                    Erdos     Straus     λ

Open-world query evaluation

UCQ / Monotone CNF
• Lower bound = closed-world probability
• Upper bound = probability after adding all tuples with probability λ
• Polynomial time ☺
• Quadratic blow-up ☹
• 200 exabytes … again ☹
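The two bounds can be computed directly for a small example. The sketch below evaluates ∃x Coauthor(Einstein,x) ∧ Coauthor(Erdos,x) twice: once with missing tuples at probability 0 (closed world) and once with missing tuples at λ. The three-constant domain, the value λ = 0.3, and the function names are assumptions for illustration.

```python
def prob_common_coauthor(coauthor, domain, default=0.0):
    """P(exists x: Coauthor(Einstein, x) and Coauthor(Erdos, x)).

    Tuples absent from `coauthor` get probability `default`.
    """
    none = 1.0
    for x in domain:
        p = coauthor.get(("Einstein", x), default)
        q = coauthor.get(("Erdos", x), default)
        # Different x use different tuples, so the factors are independent.
        none *= 1.0 - p * q
    return 1.0 - none


def open_world_bounds(coauthor, domain, lam):
    """Return (lower, upper): closed-world value and value with lam added."""
    lower = prob_common_coauthor(coauthor, domain, 0.0)
    upper = prob_common_coauthor(coauthor, domain, lam)
    return lower, upper


coauthor = {("Einstein", "Straus"): 0.7,
            ("Einstein", "Pauli"): 0.9,
            ("Erdos", "Renyi"): 0.7}
domain = ["Straus", "Pauli", "Renyi"]  # hypothetical, deliberately tiny
lower, upper = open_world_bounds(coauthor, domain, lam=0.3)
```

Filling in every missing tuple explicitly is what causes the blow-up: for a binary relation the completed table has one row per pair of domain constants.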

Closed-World Lifted Query Eval
Q = ∃x ∃y Scientist(x) ∧ Coauthor(x,y)

Decomposable ∃-rule:
P(Q) = 1 - Π_{A ∈ Domain} (1 - P(Scientist(A) ∧ ∃y Coauthor(A,y)))
     = 1 - (1 - P(Scientist(A) ∧ ∃y Coauthor(A,y)))
         × (1 - P(Scientist(B) ∧ ∃y Coauthor(B,y)))
         × (1 - P(Scientist(C) ∧ ∃y Coauthor(C,y)))
         × (1 - P(Scientist(D) ∧ ∃y Coauthor(D,y)))
         × (1 - P(Scientist(E) ∧ ∃y Coauthor(E,y)))
         × (1 - P(Scientist(F) ∧ ∃y Coauthor(F,y)))
         …

Check independence of:
  Scientist(A) ∧ ∃y Coauthor(A,y)
  Scientist(B) ∧ ∃y Coauthor(B,y)

Complexity: PTIME

Closed-World Lifted Query Eval
Q = ∃x ∃y Scientist(x) ∧ Coauthor(x,y)

P(Q) = 1 - Π_{A ∈ Domain} (1 - P(Scientist(A) ∧ ∃y Coauthor(A,y)))
     = 1 - (1 - P(Scientist(A) ∧ ∃y Coauthor(A,y)))
         × (1 - P(Scientist(B) ∧ ∃y Coauthor(B,y)))
         × (1 - P(Scientist(C) ∧ ∃y Coauthor(C,y)))
         × (1 - P(Scientist(D) ∧ ∃y Coauthor(D,y)))
         × (1 - P(Scientist(E) ∧ ∃y Coauthor(E,y)))
         × (1 - P(Scientist(F) ∧ ∃y Coauthor(F,y)))
         …

No supporting facts in database! Probability 0 in closed world: ignore these sub-queries!
Complexity: linear time!

Open-World Lifted Query Eval
Q = ∃x ∃y Scientist(x) ∧ Coauthor(x,y)

P(Q) = 1 - Π_{A ∈ Domain} (1 - P(Scientist(A) ∧ ∃y Coauthor(A,y)))
     = 1 - (1 - P(Scientist(A) ∧ ∃y Coauthor(A,y)))
         × (1 - P(Scientist(B) ∧ ∃y Coauthor(B,y)))
         × (1 - P(Scientist(C) ∧ ∃y Coauthor(C,y)))
         × (1 - P(Scientist(D) ∧ ∃y Coauthor(D,y)))
         × (1 - P(Scientist(E) ∧ ∃y Coauthor(E,y)))
         × (1 - P(Scientist(F) ∧ ∃y Coauthor(F,y)))
         …

No supporting facts in database! Probability λ in open world.
Complexity: PTIME!
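One way to read this slide: constants without any supporting facts all contribute the same factor to the product, so the upper bound can be computed without materializing the missing tuples. The sketch below is an interpretation under that assumption, not the authors' published algorithm; the function names, the eight-constant domain, and the probabilities are hypothetical. A naive version that fills in every missing tuple with λ is included for comparison.

```python
def upper_bound(scientist, coauthor, n, lam):
    """Upper bound on P(exists x, y: Scientist(x) and Coauthor(x, y)).

    `n` is the domain size; every missing tuple is assumed to have
    probability lam. Constants without any facts share one factor.
    """
    known = set(scientist) | {x for (x, _) in coauthor}
    prob_false = 1.0
    for x in known:
        p = scientist.get(x, lam)
        listed = [q for (a, _), q in coauthor.items() if a == x]
        # Unlisted Coauthor(x, y) tuples each hold with probability lam.
        no_coauthor = (1.0 - lam) ** (n - len(listed))
        for q in listed:
            no_coauthor *= 1.0 - q
        prob_false *= 1.0 - p * (1.0 - no_coauthor)
    shared = 1.0 - lam * (1.0 - (1.0 - lam) ** n)
    prob_false *= shared ** (n - len(known))
    return 1.0 - prob_false


def upper_bound_naive(scientist, coauthor, domain, lam):
    """Same bound by looping over every pair of domain constants."""
    prob_false = 1.0
    for x in domain:
        no_coauthor = 1.0
        for y in domain:
            no_coauthor *= 1.0 - coauthor.get((x, y), lam)
        prob_false *= 1.0 - scientist.get(x, lam) * (1.0 - no_coauthor)
    return 1.0 - prob_false


scientist = {"A": 0.9, "B": 0.5, "C": 0.3}
coauthor = {("A", "D"): 0.6, ("A", "E"): 0.2,
            ("B", "F"): 0.7, ("B", "G"): 0.1, ("B", "H"): 0.4}
domain = list("ABCDEFGH")
```

The first function loops only over constants that appear in the database and handles the rest with a single exponentiation, while the naive version does work quadratic in the domain size.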
