

slide-1
SLIDE 1

Open-World Probabilistic Databases

Guy Van den Broeck

Scalable Uncertainty Management (SUM) Sep 21, 2016

slide-2
SLIDE 2

Overview

  • 1. Why probabilistic databases?
  • 2. How probabilistic query evaluation?
  • 3. Why open world?
  • 4. How open-world query evaluation?
  • 5. What is the broader picture?
slide-3
SLIDE 3

Why probabilistic databases?

slide-4
SLIDE 4

What we’d like to do…

slide-5
SLIDE 5

What we’d like to do…

slide-6
SLIDE 6

Google Knowledge Graph

> 570 million entities > 18 billion tuples

slide-7
SLIDE 7
  • Tuple-independent probabilistic database
  • Learned from the web, large text corpora, ontologies, etc., using statistical machine learning.

Probabilistic Databases

Coauthor:
  x         y      P
  Erdos     Renyi  0.6
  Einstein  Pauli  0.7
  Obama     Erdos  0.1

Scientist:
  x         P
  Erdos     0.9
  Einstein  0.8
  Pauli     0.6

[Suciu’11]

slide-8
SLIDE 8

Information Extraction

Coauthor:
  x    y         P
  Luc  Laura     0.7
  Luc  Hendrik   0.6
  Luc  Kathleen  0.3
  Luc  Paol      0.3
  Luc  Paolo     0.1

slide-9
SLIDE 9

Information Extraction

Coauthor:
  x    y         P
  Luc  Laura     0.7
  Luc  Hendrik   0.6
  Luc  Kathleen  0.3
  Luc  Paol      0.3
  Luc  Paolo     0.1

slide-10
SLIDE 10

Noisy!

slide-11
SLIDE 11

Noisy!

slide-12
SLIDE 12
  • Relational data is increasingly probabilistic
    – NELL machine reading (>50M tuples)
    – Google Knowledge Vault (>2BN tuples)
    – DeepDive (>7M tuples)

  • Next step: Probabilistic Query Evaluation

First-order logic:
  Q(x) = ∃y Scientist(x) ∧ Coauthor(x,y)

SQL:
  SELECT Scientist.X FROM Scientist, Coauthor WHERE Scientist.X = Coauthor.X

[Carlson’10, Dong’14, Niu’12]

Probabilistic Databases

slide-13
SLIDE 13

What we’d like to do…

∃x Coauthor(Einstein,x) ∧ Coauthor(Erdos,x)

slide-14
SLIDE 14

Einstein is in the Knowledge Graph

slide-15
SLIDE 15

Erdős is in the Knowledge Graph

slide-16
SLIDE 16

This guy is in the Knowledge Graph

slide-17
SLIDE 17

This guy is in the Knowledge Graph

… and he published with both Einstein and Erdos!

slide-18
SLIDE 18

Desired Query Answer

Ernst Straus Barack Obama, … Justin Bieber, …

slide-19
SLIDE 19

Desired Query Answer

Ernst Straus Barack Obama, … Justin Bieber, …

  • 1. Fuse uncertain information from the web ⇒ Embrace probability!
  • 2. Cannot come from labeled data ⇒ Embrace query eval!

slide-20
SLIDE 20

[Chen’16] (NYTimes)

slide-21
SLIDE 21

How probabilistic query evaluation?

slide-22
SLIDE 22

Tuple-Independent Probabilistic DB

Probabilistic database D (Coauthor):
  x  y  P
  A  B  p1
  A  C  p2
  B  C  p3

slide-23
SLIDE 23

Tuple-Independent Probabilistic DB

Probabilistic database D (Coauthor):
  x  y  P
  A  B  p1
  A  C  p2
  B  C  p3

Possible worlds semantics: the world {(A,B), (A,C), (B,C)} has probability p1p2p3.

slide-24
SLIDE 24

Tuple-Independent Probabilistic DB

Probabilistic database D (Coauthor):
  x  y  P
  A  B  p1
  A  C  p2
  B  C  p3

Possible worlds semantics: {(A,B), (A,C), (B,C)} has probability p1p2p3; {(A,C), (B,C)} has probability (1-p1)p2p3.

slide-25
SLIDE 25

Tuple-Independent Probabilistic DB

Probabilistic database D (Coauthor):
  x  y  P
  A  B  p1
  A  C  p2
  B  C  p3

Possible worlds semantics: every subset of the tuples is a world, e.g.
  {(A,B), (A,C), (B,C)}:  p1p2p3
  {(A,C), (B,C)}:         (1-p1)p2p3
  …
  {}:                     (1-p1)(1-p2)(1-p3)
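The eight worlds and their probabilities can be enumerated directly; a sketch with arbitrary numeric values standing in for p1, p2, p3:

```python
from itertools import product

# Coauthor table; the numbers are arbitrary stand-ins for p1, p2, p3
tuples = [("A", "B", 0.5), ("A", "C", 0.3), ("B", "C", 0.8)]

# Each of the 2^3 subsets of tuples is a possible world; its probability is
# the product of p for included tuples and (1 - p) for excluded ones.
worlds = []
for bits in product([True, False], repeat=len(tuples)):
    prob = 1.0
    world = []
    for (x, y, p), present in zip(tuples, bits):
        prob *= p if present else 1 - p
        if present:
            world.append((x, y))
    worlds.append((world, prob))

assert len(worlds) == 8                             # 2^3 possible worlds
assert abs(sum(p for _, p in worlds) - 1.0) < 1e-9  # they form a distribution
```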

slide-26
SLIDE 26

Probabilistic Query Evaluation

Q = ∃x∃y Scientist(x) ∧ Coauthor(x,y)

Scientist:        Coauthor:
  x  P              x  y  P
  A  p1 (X1)        A  D  q1 (Y1)
  B  p2 (X2)        A  E  q2 (Y2)
  C  p3 (X3)        B  F  q3 (Y3)
                    B  G  q4 (Y4)
                    B  H  q5 (Y5)

P(Q) = ?

slide-27
SLIDE 27

Probabilistic Query Evaluation

Q = ∃x∃y Scientist(x) ∧ Coauthor(x,y)

Scientist:        Coauthor:
  x  P              x  y  P
  A  p1 (X1)        A  D  q1 (Y1)
  B  p2 (X2)        A  E  q2 (Y2)
  C  p3 (X3)        B  F  q3 (Y3)
                    B  G  q4 (Y4)
                    B  H  q5 (Y5)

For x = A:  P(∃y Coauthor(A,y)) = 1 - (1-q1)(1-q2)

slide-28
SLIDE 28

Probabilistic Query Evaluation

Q = ∃x∃y Scientist(x) ∧ Coauthor(x,y)

Scientist:        Coauthor:
  x  P              x  y  P
  A  p1 (X1)        A  D  q1 (Y1)
  B  p2 (X2)        A  E  q2 (Y2)
  C  p3 (X3)        B  F  q3 (Y3)
                    B  G  q4 (Y4)
                    B  H  q5 (Y5)

For x = A:  P(∃y Coauthor(A,y)) = 1 - (1-q1)(1-q2)
            P(Scientist(A) ∧ ∃y Coauthor(A,y)) = p1 * [1 - (1-q1)(1-q2)]

slide-29
SLIDE 29

Probabilistic Query Evaluation

Q = ∃x∃y Scientist(x) ∧ Coauthor(x,y)

Scientist:        Coauthor:
  x  P              x  y  P
  A  p1 (X1)        A  D  q1 (Y1)
  B  p2 (X2)        A  E  q2 (Y2)
  C  p3 (X3)        B  F  q3 (Y3)
                    B  G  q4 (Y4)
                    B  H  q5 (Y5)

For x = A:  P(∃y Coauthor(A,y)) = 1 - (1-q1)(1-q2)
            P(Scientist(A) ∧ ∃y Coauthor(A,y)) = p1 * [1 - (1-q1)(1-q2)]
For x = B:  P(∃y Coauthor(B,y)) = 1 - (1-q3)(1-q4)(1-q5)

slide-30
SLIDE 30

Probabilistic Query Evaluation

Q = ∃x∃y Scientist(x) ∧ Coauthor(x,y)

Scientist:        Coauthor:
  x  P              x  y  P
  A  p1 (X1)        A  D  q1 (Y1)
  B  p2 (X2)        A  E  q2 (Y2)
  C  p3 (X3)        B  F  q3 (Y3)
                    B  G  q4 (Y4)
                    B  H  q5 (Y5)

For x = A:  P(∃y Coauthor(A,y)) = 1 - (1-q1)(1-q2)
            P(Scientist(A) ∧ ∃y Coauthor(A,y)) = p1 * [1 - (1-q1)(1-q2)]
For x = B:  P(∃y Coauthor(B,y)) = 1 - (1-q3)(1-q4)(1-q5)
            P(Scientist(B) ∧ ∃y Coauthor(B,y)) = p2 * [1 - (1-q3)(1-q4)(1-q5)]

slide-31
SLIDE 31

Probabilistic Query Evaluation

Q = ∃x∃y Scientist(x) ∧ Coauthor(x,y)

Scientist:        Coauthor:
  x  P              x  y  P
  A  p1 (X1)        A  D  q1 (Y1)
  B  p2 (X2)        A  E  q2 (Y2)
  C  p3 (X3)        B  F  q3 (Y3)
                    B  G  q4 (Y4)
                    B  H  q5 (Y5)

For x = A:  P(Scientist(A) ∧ ∃y Coauthor(A,y)) = p1 * [1 - (1-q1)(1-q2)]
For x = B:  P(Scientist(B) ∧ ∃y Coauthor(B,y)) = p2 * [1 - (1-q3)(1-q4)(1-q5)]

P(Q) = 1 - {1 - p1*[1 - (1-q1)(1-q2)]} * {1 - p2*[1 - (1-q3)(1-q4)(1-q5)]}
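Assembled, the computation is P(Q) = 1 - {1 - p1·[1-(1-q1)(1-q2)]} · {1 - p2·[1-(1-q3)(1-q4)(1-q5)]}, since C has no coauthor tuples. A sketch that checks this lifted evaluation against brute-force enumeration of all possible worlds, with arbitrary numbers standing in for the symbolic probabilities:

```python
from itertools import product

scientist = {"A": 0.2, "B": 0.5, "C": 0.9}                 # x -> p
coauthor = {("A", "D"): 0.3, ("A", "E"): 0.4,
            ("B", "F"): 0.1, ("B", "G"): 0.6, ("B", "H"): 0.7}  # (x, y) -> q

def lifted():
    # P(Q) = 1 - prod_x (1 - p_x * (1 - prod_y (1 - q_xy)))
    total = 1.0
    for x, p in scientist.items():
        inner = 1.0
        for (x2, _), q in coauthor.items():
            if x2 == x:
                inner *= 1 - q
        total *= 1 - p * (1 - inner)
    return 1 - total

def brute():
    # Sum the probability of every possible world in which Q holds.
    atoms = list(scientist) + list(coauthor)
    probs = [scientist[a] for a in scientist] + [coauthor[c] for c in coauthor]
    total = 0.0
    for bits in product([True, False], repeat=len(atoms)):
        truth = dict(zip(atoms, bits))
        weight = 1.0
        for p, b in zip(probs, bits):
            weight *= p if b else 1 - p
        if any(truth[x] and truth[(x, y)] for (x, y) in coauthor):
            total += weight
    return total

assert abs(lifted() - brute()) < 1e-9
```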

slide-32
SLIDE 32

Lifted Inference Rules

Preprocess Q (omitted), then apply rules (some have preconditions)

[Suciu’11]

slide-33
SLIDE 33

Lifted Inference Rules

Preprocess Q (omitted), then apply rules (some have preconditions):

Negation:  P(¬Q) = 1 – P(Q)

[Suciu’11]

slide-34
SLIDE 34

Lifted Inference Rules

Preprocess Q (omitted), then apply rules (some have preconditions):

Negation:           P(¬Q) = 1 – P(Q)
Decomposable ∧,∨:   P(Q1 ∧ Q2) = P(Q1) P(Q2)
                    P(Q1 ∨ Q2) = 1 – (1 – P(Q1)) (1 – P(Q2))

[Suciu’11]

slide-35
SLIDE 35

Lifted Inference Rules

Preprocess Q (omitted), then apply rules (some have preconditions):

Negation:           P(¬Q) = 1 – P(Q)
Decomposable ∧,∨:   P(Q1 ∧ Q2) = P(Q1) P(Q2)
                    P(Q1 ∨ Q2) = 1 – (1 – P(Q1)) (1 – P(Q2))
Decomposable ∃,∀:   P(∃z Q) = 1 – Π_{A ∈ Domain} (1 – P(Q[A/z]))
                    P(∀z Q) = Π_{A ∈ Domain} P(Q[A/z])

[Suciu’11]

slide-36
SLIDE 36

Lifted Inference Rules

Preprocess Q (omitted), then apply rules (some have preconditions):

Negation:             P(¬Q) = 1 – P(Q)
Decomposable ∧,∨:     P(Q1 ∧ Q2) = P(Q1) P(Q2)
                      P(Q1 ∨ Q2) = 1 – (1 – P(Q1)) (1 – P(Q2))
Decomposable ∃,∀:     P(∃z Q) = 1 – Π_{A ∈ Domain} (1 – P(Q[A/z]))
                      P(∀z Q) = Π_{A ∈ Domain} P(Q[A/z])
Inclusion/exclusion:  P(Q1 ∧ Q2) = P(Q1) + P(Q2) - P(Q1 ∨ Q2)
                      P(Q1 ∨ Q2) = P(Q1) + P(Q2) - P(Q1 ∧ Q2)

[Suciu’11]

slide-37
SLIDE 37

Limitations

H0 = ∀x∀y Smoker(x) ∨ Friend(x,y) ∨ Jogger(y)

The decomposable ∀-rule: P(∀z Q) = Π_{A ∈ Domain} P(Q[A/z])

[Suciu’11]

slide-38
SLIDE 38

Limitations

H0 = ∀x∀y Smoker(x) ∨ Friend(x,y) ∨ Jogger(y)

The decomposable ∀-rule P(∀z Q) = Π_{A ∈ Domain} P(Q[A/z]) … does not apply:
H0[Alice/x] and H0[Bob/x] are dependent:
  ∀y (Smoker(Alice) ∨ Friend(Alice,y) ∨ Jogger(y))
  ∀y (Smoker(Bob) ∨ Friend(Bob,y) ∨ Jogger(y))

[Suciu’11]

slide-39
SLIDE 39

Limitations

H0 = ∀x∀y Smoker(x) ∨ Friend(x,y) ∨ Jogger(y)

The decomposable ∀-rule P(∀z Q) = Π_{A ∈ Domain} P(Q[A/z]) … does not apply:
H0[Alice/x] and H0[Bob/x] are dependent:
  ∀y (Smoker(Alice) ∨ Friend(Alice,y) ∨ Jogger(y))
  ∀y (Smoker(Bob) ∨ Friend(Bob,y) ∨ Jogger(y))

Lifted inference sometimes fails: computing P(H0) is #P-hard in the size of the database.

[Suciu’11]

slide-40
SLIDE 40

Are the Lifted Rules Complete?

You already know:

  • Inference rules: PTIME data complexity
  • Some queries: #P-hard data complexity

[Dalvi and Suciu;JACM’11]

slide-41
SLIDE 41

Are the Lifted Rules Complete?

You already know:

  • Inference rules: PTIME data complexity
  • Some queries: #P-hard data complexity

Dichotomy Theorem for UCQ / Mon. CNF

  • If lifted rules succeed, then PTIME query
  • If lifted rules fail, then query is #P-hard

[Dalvi and Suciu;JACM’11]

slide-42
SLIDE 42

Are the Lifted Rules Complete?

You already know:

  • Inference rules: PTIME data complexity
  • Some queries: #P-hard data complexity

Dichotomy Theorem for UCQ / Mon. CNF

  • If lifted rules succeed, then PTIME query
  • If lifted rules fail, then query is #P-hard

Lifted rules are complete for UCQ!

[Dalvi and Suciu;JACM’11]

slide-43
SLIDE 43

Why open world?

slide-44
SLIDE 44

Knowledge Base Completion

Given (Coauthor):
  x         y       P
  Einstein  Straus  0.7
  Erdos     Straus  0.6
  Einstein  Pauli   0.9
  …         …       …

Learn:
  0.8::Coauthor(x,y) :- Coauthor(x,z) ∧ Coauthor(z,y).

Complete (Coauthor):
  x       y      P
  Straus  Pauli  0.504
  …       …      …
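The completed value 0.504 is just the rule weight times the two supporting probabilities (via z = Einstein, reading Coauthor as symmetric so that Coauthor(Straus, Einstein) carries the Einstein–Straus probability 0.7):

```python
rule_weight = 0.8
p_straus_einstein = 0.7   # given: Coauthor(Einstein, Straus)
p_einstein_pauli = 0.9    # given: Coauthor(Einstein, Pauli)

# 0.8::Coauthor(x,y) :- Coauthor(x,z) ∧ Coauthor(z,y)  with z = Einstein
p_straus_pauli = rule_weight * p_straus_einstein * p_einstein_pauli
assert abs(p_straus_pauli - 0.504) < 1e-9
```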

slide-45
SLIDE 45

Bayesian Learning Loop

Bayesian view on learning:

  • 1. Prior belief:

P(Coauthor(Straus,Pauli)) = 0.01

  • 2. Observe page

P(Coauthor(Straus,Pauli) | page 1) = 0.2

  • 3. Observe page

P(Coauthor(Straus,Pauli) | page 1, page 2) = 0.3

Principled and sound reasoning!

slide-46
SLIDE 46

Problem: Broken Learning Loop

Bayesian view on learning:

  • 1. Prior belief:

P(Coauthor(Straus,Pauli)) = 0

  • 2. Observe page

P(Coauthor(Straus,Pauli) | page 1) = 0.2

  • 3. Observe page

P(Coauthor(Straus,Pauli) | page 1, page 2) = 0.3

[Ceylan, Darwiche, Van den Broeck; KR’16]

slide-47
SLIDE 47

Problem: Broken Learning Loop

Bayesian view on learning:

  • 1. Prior belief:

P(Coauthor(Straus,Pauli)) = 0

  • 2. Observe page

P(Coauthor(Straus,Pauli) | page 1) = 0.2

  • 3. Observe page

P(Coauthor(Straus,Pauli) | page 1, page 2) = 0.3

[Ceylan, Darwiche, Van den Broeck; KR’16]

slide-48
SLIDE 48

Problem: Broken Learning Loop

Bayesian view on learning:

  • 1. Prior belief:

P(Coauthor(Straus,Pauli)) = 0

  • 2. Observe page

P(Coauthor(Straus,Pauli) | page 1) = 0.2

  • 3. Observe page

P(Coauthor(Straus,Pauli) | page 1, page 2) = 0.3

[Ceylan, Darwiche, Van den Broeck; KR’16]

This is mathematical nonsense!

slide-49
SLIDE 49

What we’d like to do…

∃x Coauthor(Einstein,x) ∧ Coauthor(Erdos,x)

Ernst Straus Kristian Kersting, … Justin Bieber, …

slide-50
SLIDE 50

Open World DB

  • What if fact missing?
  • Probability 0 for:

Coauthor:
  X         Y          P
  Einstein  Straus     0.7
  Erdos     Straus     0.6
  Einstein  Pauli      0.9
  Erdos     Renyi      0.7
  Kersting  Natarajan  0.8
  Luc       Paol       0.1
  …         …          …

Q1 = ∃x Coauthor(Einstein,x) ∧ Coauthor(Erdos,x)

slide-51
SLIDE 51

Open World DB

  • What if fact missing?
  • Probability 0 for:

X Y P Einstein Straus 0.7 Erdos Straus 0.6 Einstein Pauli 0.9 Erdos Renyi 0.7 Kersting Natarajan 0.8 Luc Paol 0.1 … … …

Coauthor Q1 = ∃x Coauthor(Einstein,x) ∧ Coauthor(Erdos,x) Q2 = ∃x Coauthor(Bieber,x) ∧ Coauthor(Erdos,x)

slide-52
SLIDE 52

Open World DB

  • What if fact missing?
  • Probability 0 for:

X Y P Einstein Straus 0.7 Erdos Straus 0.6 Einstein Pauli 0.9 Erdos Renyi 0.7 Kersting Natarajan 0.8 Luc Paol 0.1 … … …

Coauthor Q1 = ∃x Coauthor(Einstein,x) ∧ Coauthor(Erdos,x) Q2 = ∃x Coauthor(Bieber,x) ∧ Coauthor(Erdos,x) Q3 = Coauthor(Einstein,Straus) ∧ Coauthor(Erdos,Straus)

slide-53
SLIDE 53

Open World DB

  • What if fact missing?
  • Probability 0 for:

X Y P Einstein Straus 0.7 Erdos Straus 0.6 Einstein Pauli 0.9 Erdos Renyi 0.7 Kersting Natarajan 0.8 Luc Paol 0.1 … … …

Coauthor Q1 = ∃x Coauthor(Einstein,x) ∧ Coauthor(Erdos,x) Q2 = ∃x Coauthor(Bieber,x) ∧ Coauthor(Erdos,x) Q3 = Coauthor(Einstein,Straus) ∧ Coauthor(Erdos,Straus) Q4 = Coauthor(Einstein,Bieber) ∧ Coauthor(Erdos,Bieber)

slide-54
SLIDE 54

Open World DB

  • What if fact missing?
  • Probability 0 for:

Coauthor:
  X         Y          P
  Einstein  Straus     0.7
  Erdos     Straus     0.6
  Einstein  Pauli      0.9
  Erdos     Renyi      0.7
  Kersting  Natarajan  0.8
  Luc       Paol       0.1
  …         …          …

Q1 = ∃x Coauthor(Einstein,x) ∧ Coauthor(Erdos,x)
Q2 = ∃x Coauthor(Bieber,x) ∧ Coauthor(Erdos,x)
Q3 = Coauthor(Einstein,Straus) ∧ Coauthor(Erdos,Straus)
Q4 = Coauthor(Einstein,Bieber) ∧ Coauthor(Erdos,Bieber)
Q5 = Coauthor(Einstein,Bieber) ∧ ¬Coauthor(Einstein,Bieber)

slide-55
SLIDE 55

Intuition

X Y P Einstein Straus 0.7 Erdos Straus 0.6 Einstein Pauli 0.9 Erdos Renyi 0.7 Kersting Natarajan 0.8 Luc Paol 0.1 … … …

Q1 = ∃x Coauthor(Einstein,x) ∧ Coauthor(Erdos,x) Q3 = Coauthor(Einstein,Straus) ∧ Coauthor(Erdos,Straus) Q4 = Coauthor(Einstein,Bieber) ∧ Coauthor(Erdos,Bieber)

[Ceylan, Darwiche, Van den Broeck; KR’16]

slide-56
SLIDE 56

Intuition

X Y P Einstein Straus 0.7 Erdos Straus 0.6 Einstein Pauli 0.9 Erdos Renyi 0.7 Kersting Natarajan 0.8 Luc Paol 0.1 … … …

We know for sure that P(Q1) ≥ P(Q3), P(Q1) ≥ P(Q4) Q1 = ∃x Coauthor(Einstein,x) ∧ Coauthor(Erdos,x) Q3 = Coauthor(Einstein,Straus) ∧ Coauthor(Erdos,Straus) Q4 = Coauthor(Einstein,Bieber) ∧ Coauthor(Erdos,Bieber)

[Ceylan, Darwiche, Van den Broeck; KR’16]

slide-57
SLIDE 57

Intuition

X Y P Einstein Straus 0.7 Erdos Straus 0.6 Einstein Pauli 0.9 Erdos Renyi 0.7 Kersting Natarajan 0.8 Luc Paol 0.1 … … …

We know for sure that P(Q1) ≥ P(Q3), P(Q1) ≥ P(Q4) and P(Q3) ≥ P(Q5), P(Q4) ≥ P(Q5) Q1 = ∃x Coauthor(Einstein,x) ∧ Coauthor(Erdos,x) Q3 = Coauthor(Einstein,Straus) ∧ Coauthor(Erdos,Straus) Q4 = Coauthor(Einstein,Bieber) ∧ Coauthor(Erdos,Bieber) Q5 = Coauthor(Einstein,Bieber) ∧ ¬Coauthor(Einstein,Bieber)

[Ceylan, Darwiche, Van den Broeck; KR’16]

slide-58
SLIDE 58

Intuition

X Y P Einstein Straus 0.7 Erdos Straus 0.6 Einstein Pauli 0.9 Erdos Renyi 0.7 Kersting Natarajan 0.8 Luc Paol 0.1 … … …

We know for sure that P(Q1) ≥ P(Q3), P(Q1) ≥ P(Q4) and P(Q3) ≥ P(Q5), P(Q4) ≥ P(Q5) because P(Q5) = 0. Q1 = ∃x Coauthor(Einstein,x) ∧ Coauthor(Erdos,x) Q3 = Coauthor(Einstein,Straus) ∧ Coauthor(Erdos,Straus) Q4 = Coauthor(Einstein,Bieber) ∧ Coauthor(Erdos,Bieber) Q5 = Coauthor(Einstein,Bieber) ∧ ¬Coauthor(Einstein,Bieber)

[Ceylan, Darwiche, Van den Broeck; KR’16]

slide-59
SLIDE 59

Intuition

Coauthor:
  X         Y          P
  Einstein  Straus     0.7
  Erdos     Straus     0.6
  Einstein  Pauli      0.9
  Erdos     Renyi      0.7
  Kersting  Natarajan  0.8
  Luc       Paol       0.1
  …         …          …

We know for sure that P(Q1) ≥ P(Q3) and P(Q1) ≥ P(Q4), and that P(Q3) ≥ P(Q5) and P(Q4) ≥ P(Q5), because P(Q5) = 0. We have strong evidence that P(Q1) ≥ P(Q2).

Q1 = ∃x Coauthor(Einstein,x) ∧ Coauthor(Erdos,x)
Q2 = ∃x Coauthor(Bieber,x) ∧ Coauthor(Erdos,x)
Q3 = Coauthor(Einstein,Straus) ∧ Coauthor(Erdos,Straus)
Q4 = Coauthor(Einstein,Bieber) ∧ Coauthor(Erdos,Bieber)
Q5 = Coauthor(Einstein,Bieber) ∧ ¬Coauthor(Einstein,Bieber)

[Ceylan, Darwiche, Van den Broeck; KR’16]

slide-60
SLIDE 60

Problem: Curse of Superlinearity

  • Reality is worse!
  • Tuples are intentionally missing!
  • Every tuple has 99% probability

slide-61
SLIDE 61

Problem: Curse of Superlinearity

“This is all true, Guy, but it’s just a temporary issue.”

“No it’s not!
  • A single table: Sibling(x, y)
  • Facebook scale (billions of people)
  • Real (non-zero) Bayesian beliefs
⇒ 200 exabytes of data”

[Ceylan, Darwiche, Van den Broeck; KR’16]
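The 200-exabyte figure is back-of-the-envelope arithmetic; a sketch in which the population and the per-tuple byte count are assumptions chosen to reproduce the slide's number:

```python
people = 2_000_000_000       # "billions of people" (assumed: 2 billion)
bytes_per_tuple = 50         # assumed bytes to store one probabilistic fact

sibling_tuples = people ** 2                     # a probability for every ordered pair
exabytes = sibling_tuples * bytes_per_tuple / 1e18
assert abs(exabytes - 200.0) < 1e-6
```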

slide-62
SLIDE 62

Problem: Curse of Superlinearity

All Google storage is a couple exabytes…

slide-63
SLIDE 63

Problem: Curse of Superlinearity

We should be here!

slide-64
SLIDE 64

Problem: Evaluation

Given: Learn:

0.8::Coauthor(x,y) :- Coauthor(x,z) ∧ Coauthor(z,y).

x y P Einstein Straus 0.7 Erdos Straus 0.6 Einstein Pauli 0.9 … … …

Coauthor

0.6::Coauthor(x,y) :- Affiliation(x,z) ∧ Affiliation(y,z).

OR

[De Raedt et al; IJCAI’15]

slide-65
SLIDE 65

Problem: Evaluation

Given: Learn:

0.8::Coauthor(x,y) :- Coauthor(x,z) ∧ Coauthor(z,y).

x y P Einstein Straus 0.7 Erdos Straus 0.6 Einstein Pauli 0.9 … … …

Coauthor

0.6::Coauthor(x,y) :- Affiliation(x,z) ∧ Affiliation(y,z).

OR

What is the likelihood, precision, accuracy, …?

[De Raedt et al; IJCAI’15]

slide-66
SLIDE 66

Open-World Prob. Databases

Intuition: tuples can be added with P < λ

Q2 = Coauthor(Einstein,Straus) ∧ Coauthor(Erdos,Straus)

X Y P Einstein Straus 0.7 Einstein Pauli 0.9 Erdos Renyi 0.7 Kersting Natarajan 0.8 Luc Paol 0.1 … … …

Coauthor

P(Q2) ≥ 0

slide-67
SLIDE 67

Open-World Prob. Databases

Intuition: tuples can be added with P < λ

Q2 = Coauthor(Einstein,Straus) ∧ Coauthor(Erdos,Straus)

X Y P Einstein Straus 0.7 Einstein Pauli 0.9 Erdos Renyi 0.7 Kersting Natarajan 0.8 Luc Paol 0.1 … … …

Coauthor

X Y P Einstein Straus 0.7 Einstein Pauli 0.9 Erdos Renyi 0.7 Kersting Natarajan 0.8 Luc Paol 0.1 … … … Erdos Straus λ

Coauthor

P(Q2) ≥ 0

slide-68
SLIDE 68

Open-World Prob. Databases

Intuition: tuples can be added with P < λ

Q2 = Coauthor(Einstein,Straus) ∧ Coauthor(Erdos,Straus)

Coauthor (as given):
  X         Y          P
  Einstein  Straus     0.7
  Einstein  Pauli      0.9
  Erdos     Renyi      0.7
  Kersting  Natarajan  0.8
  Luc       Paol       0.1
  …         …          …

Coauthor (with the missing tuple added):
  X         Y          P
  Einstein  Straus     0.7
  Einstein  Pauli      0.9
  Erdos     Renyi      0.7
  Kersting  Natarajan  0.8
  Luc       Paol       0.1
  …         …          …
  Erdos     Straus     λ

0.7 * λ ≥ P(Q2) ≥ 0
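The bounds on the slide follow directly: the closed world gives the lower bound (the missing Erdos–Straus tuple has probability 0), and adding it at the threshold gives the upper bound. A sketch with an arbitrary λ:

```python
lam = 0.3                    # open-world threshold λ (arbitrary choice)
p_einstein_straus = 0.7      # in the database
p_erdos_straus_lo = 0.0      # missing tuple, closed world
p_erdos_straus_hi = lam      # missing tuple, added at the threshold

lower = p_einstein_straus * p_erdos_straus_lo    # = 0
upper = p_einstein_straus * p_erdos_straus_hi    # = 0.7 * λ
assert lower == 0.0
assert abs(upper - 0.21) < 1e-9
```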

slide-69
SLIDE 69

Closed-World Prob. Databases

slide-70
SLIDE 70

Open-World Prob. Databases

[Ceylan, Darwiche, Van den Broeck; KR’16]

slide-71
SLIDE 71

How open-world query evaluation?

slide-72
SLIDE 72

UCQ / Monotone CNF

  • Lower bound = closed-world probability
  • Upper bound = probability after adding all tuples with probability λ

slide-73
SLIDE 73

UCQ / Monotone CNF

  • Lower bound = closed-world probability
  • Upper bound = probability after adding all tuples with probability λ

  • Polynomial time☺
slide-74
SLIDE 74

UCQ / Monotone CNF

  • Lower bound = closed-world probability
  • Upper bound = probability after adding all tuples with probability λ

  • Polynomial time☺
  • Quadratic blow-up
  • 200 exabytes … again
slide-75
SLIDE 75

Closed-World Lifted Query Eval

Q = ∃x ∃y Scientist(x) ∧ Coauthor(x,y)

P(Q) = 1 - Π_{A ∈ Domain} (1 - P(Scientist(A) ∧ ∃y Coauthor(A,y)))

slide-76
SLIDE 76

Closed-World Lifted Query Eval

Q = ∃x ∃y Scientist(x) ∧ Coauthor(x,y) P(Q) = 1 - ΠA ∈ Domain (1 - P(Scientist(A) ∧ ∃y Coauthor(A,y))

Decomposable ∀-Rule

slide-77
SLIDE 77

Closed-World Lifted Query Eval

Q = ∃x ∃y Scientist(x) ∧ Coauthor(x,y) P(Q) = 1 - ΠA ∈ Domain (1 - P(Scientist(A) ∧ ∃y Coauthor(A,y))

Decomposable ∀-Rule

Check independence: Scientist(A) ∧ ∃y Coauthor(A,y) Scientist(B) ∧ ∃y Coauthor(B,y)

slide-78
SLIDE 78

Closed-World Lifted Query Eval

Q = ∃x ∃y Scientist(x) ∧ Coauthor(x,y) P(Q) = 1 - ΠA ∈ Domain (1 - P(Scientist(A) ∧ ∃y Coauthor(A,y))

Decomposable ∀-Rule

Check independence: Scientist(A) ∧ ∃y Coauthor(A,y) Scientist(B) ∧ ∃y Coauthor(B,y)

= 1 - (1 - P(Scientist(A) ∧ ∃y Coauthor(A,y)) x (1 - P(Scientist(B) ∧ ∃y Coauthor(B,y)) x (1 - P(Scientist(C) ∧ ∃y Coauthor(C,y)) x (1 - P(Scientist(D) ∧ ∃y Coauthor(D,y)) x (1 - P(Scientist(E) ∧ ∃y Coauthor(E,y)) x (1 - P(Scientist(F) ∧ ∃y Coauthor(F,y)) …

slide-79
SLIDE 79

Closed-World Lifted Query Eval

Q = ∃x ∃y Scientist(x) ∧ Coauthor(x,y) P(Q) = 1 - ΠA ∈ Domain (1 - P(Scientist(A) ∧ ∃y Coauthor(A,y))

Decomposable ∀-Rule

Check independence: Scientist(A) ∧ ∃y Coauthor(A,y) Scientist(B) ∧ ∃y Coauthor(B,y)

= 1 - (1 - P(Scientist(A) ∧ ∃y Coauthor(A,y)) x (1 - P(Scientist(B) ∧ ∃y Coauthor(B,y)) x (1 - P(Scientist(C) ∧ ∃y Coauthor(C,y)) x (1 - P(Scientist(D) ∧ ∃y Coauthor(D,y)) x (1 - P(Scientist(E) ∧ ∃y Coauthor(E,y)) x (1 - P(Scientist(F) ∧ ∃y Coauthor(F,y)) …

Complexity PTIME

slide-80
SLIDE 80

Closed-World Lifted Query Eval

Q = ∃x ∃y Scientist(x) ∧ Coauthor(x,y) P(Q) = 1 - ΠA ∈ Domain (1 - P(Scientist(A) ∧ ∃y Coauthor(A,y))

= 1 - (1 - P(Scientist(A) ∧ ∃y Coauthor(A,y)) x (1 - P(Scientist(B) ∧ ∃y Coauthor(B,y)) x (1 - P(Scientist(C) ∧ ∃y Coauthor(C,y)) x (1 - P(Scientist(D) ∧ ∃y Coauthor(D,y)) x (1 - P(Scientist(E) ∧ ∃y Coauthor(E,y)) x (1 - P(Scientist(F) ∧ ∃y Coauthor(F,y)) …

slide-81
SLIDE 81

Closed-World Lifted Query Eval

No supporting facts in database!

Q = ∃x ∃y Scientist(x) ∧ Coauthor(x,y) P(Q) = 1 - ΠA ∈ Domain (1 - P(Scientist(A) ∧ ∃y Coauthor(A,y))

= 1 - (1 - P(Scientist(A) ∧ ∃y Coauthor(A,y)) x (1 - P(Scientist(B) ∧ ∃y Coauthor(B,y)) x (1 - P(Scientist(C) ∧ ∃y Coauthor(C,y)) x (1 - P(Scientist(D) ∧ ∃y Coauthor(D,y)) x (1 - P(Scientist(E) ∧ ∃y Coauthor(E,y)) x (1 - P(Scientist(F) ∧ ∃y Coauthor(F,y)) …

slide-82
SLIDE 82

Closed-World Lifted Query Eval

No supporting facts in database! Probability 0 in closed world

Q = ∃x ∃y Scientist(x) ∧ Coauthor(x,y) P(Q) = 1 - ΠA ∈ Domain (1 - P(Scientist(A) ∧ ∃y Coauthor(A,y))

= 1 - (1 - P(Scientist(A) ∧ ∃y Coauthor(A,y)) x (1 - P(Scientist(B) ∧ ∃y Coauthor(B,y)) x (1 - P(Scientist(C) ∧ ∃y Coauthor(C,y)) x (1 - P(Scientist(D) ∧ ∃y Coauthor(D,y)) x (1 - P(Scientist(E) ∧ ∃y Coauthor(E,y)) x (1 - P(Scientist(F) ∧ ∃y Coauthor(F,y)) …

slide-83
SLIDE 83

Closed-World Lifted Query Eval

No supporting facts in database! Probability 0 in closed world Ignore these queries!

Q = ∃x ∃y Scientist(x) ∧ Coauthor(x,y) P(Q) = 1 - ΠA ∈ Domain (1 - P(Scientist(A) ∧ ∃y Coauthor(A,y))

= 1 - (1 - P(Scientist(A) ∧ ∃y Coauthor(A,y)) x (1 - P(Scientist(B) ∧ ∃y Coauthor(B,y)) x (1 - P(Scientist(C) ∧ ∃y Coauthor(C,y)) x (1 - P(Scientist(D) ∧ ∃y Coauthor(D,y)) x (1 - P(Scientist(E) ∧ ∃y Coauthor(E,y)) x (1 - P(Scientist(F) ∧ ∃y Coauthor(F,y)) …

slide-84
SLIDE 84

Closed-World Lifted Query Eval

No supporting facts in database!

Complexity linear time!

Probability 0 in closed world Ignore these queries!

Q = ∃x ∃y Scientist(x) ∧ Coauthor(x,y) P(Q) = 1 - ΠA ∈ Domain (1 - P(Scientist(A) ∧ ∃y Coauthor(A,y))

= 1 - (1 - P(Scientist(A) ∧ ∃y Coauthor(A,y)) x (1 - P(Scientist(B) ∧ ∃y Coauthor(B,y)) x (1 - P(Scientist(C) ∧ ∃y Coauthor(C,y)) x (1 - P(Scientist(D) ∧ ∃y Coauthor(D,y)) x (1 - P(Scientist(E) ∧ ∃y Coauthor(E,y)) x (1 - P(Scientist(F) ∧ ∃y Coauthor(F,y)) …

slide-85
SLIDE 85

Open-World Lifted Query Eval

No supporting facts in database!

Q = ∃x ∃y Scientist(x) ∧ Coauthor(x,y) P(Q) = 1 - ΠA ∈ Domain (1 - P(Scientist(A) ∧ ∃y Coauthor(A,y))

= 1 - (1 - P(Scientist(A) ∧ ∃y Coauthor(A,y)) x (1 - P(Scientist(B) ∧ ∃y Coauthor(B,y)) x (1 - P(Scientist(C) ∧ ∃y Coauthor(C,y)) x (1 - P(Scientist(D) ∧ ∃y Coauthor(D,y)) x (1 - P(Scientist(E) ∧ ∃y Coauthor(E,y)) x (1 - P(Scientist(F) ∧ ∃y Coauthor(F,y)) …

slide-86
SLIDE 86

Open-World Lifted Query Eval

No supporting facts in database! Probability p in closed world

Q = ∃x ∃y Scientist(x) ∧ Coauthor(x,y) P(Q) = 1 - ΠA ∈ Domain (1 - P(Scientist(A) ∧ ∃y Coauthor(A,y))

= 1 - (1 - P(Scientist(A) ∧ ∃y Coauthor(A,y)) x (1 - P(Scientist(B) ∧ ∃y Coauthor(B,y)) x (1 - P(Scientist(C) ∧ ∃y Coauthor(C,y)) x (1 - P(Scientist(D) ∧ ∃y Coauthor(D,y)) x (1 - P(Scientist(E) ∧ ∃y Coauthor(E,y)) x (1 - P(Scientist(F) ∧ ∃y Coauthor(F,y)) …

slide-87
SLIDE 87

Open-World Lifted Query Eval

No supporting facts in database!

Complexity PTIME!

Probability p in closed world

Q = ∃x ∃y Scientist(x) ∧ Coauthor(x,y) P(Q) = 1 - ΠA ∈ Domain (1 - P(Scientist(A) ∧ ∃y Coauthor(A,y))

= 1 - (1 - P(Scientist(A) ∧ ∃y Coauthor(A,y)) x (1 - P(Scientist(B) ∧ ∃y Coauthor(B,y)) x (1 - P(Scientist(C) ∧ ∃y Coauthor(C,y)) x (1 - P(Scientist(D) ∧ ∃y Coauthor(D,y)) x (1 - P(Scientist(E) ∧ ∃y Coauthor(E,y)) x (1 - P(Scientist(F) ∧ ∃y Coauthor(F,y)) …

slide-88
SLIDE 88

Open-World Lifted Query Eval

No supporting facts in database! Probability p in closed world

Q = ∃x ∃y Scientist(x) ∧ Coauthor(x,y) P(Q) = 1 - ΠA ∈ Domain (1 - P(Scientist(A) ∧ ∃y Coauthor(A,y))

= 1 - (1 - P(Scientist(A) ∧ ∃y Coauthor(A,y)) x (1 - P(Scientist(B) ∧ ∃y Coauthor(B,y)) x (1 - P(Scientist(C) ∧ ∃y Coauthor(C,y)) x (1 - P(Scientist(D) ∧ ∃y Coauthor(D,y)) x (1 - P(Scientist(E) ∧ ∃y Coauthor(E,y)) x (1 - P(Scientist(F) ∧ ∃y Coauthor(F,y)) …

slide-89
SLIDE 89

Open-World Lifted Query Eval

No supporting facts in database! Probability p in closed world. All together, probability (1-p)^k. Do symmetric lifted inference.

Q = ∃x ∃y Scientist(x) ∧ Coauthor(x,y) P(Q) = 1 - ΠA ∈ Domain (1 - P(Scientist(A) ∧ ∃y Coauthor(A,y))

= 1 - (1 - P(Scientist(A) ∧ ∃y Coauthor(A,y)) x (1 - P(Scientist(B) ∧ ∃y Coauthor(B,y)) x (1 - P(Scientist(C) ∧ ∃y Coauthor(C,y)) x (1 - P(Scientist(D) ∧ ∃y Coauthor(D,y)) x (1 - P(Scientist(E) ∧ ∃y Coauthor(E,y)) x (1 - P(Scientist(F) ∧ ∃y Coauthor(F,y)) …

slide-90
SLIDE 90

Open-World Lifted Query Eval

No supporting facts in database! Probability p in closed world. All together, probability (1-p)^k. Do symmetric lifted inference.

Complexity linear time! Q = ∃x ∃y Scientist(x) ∧ Coauthor(x,y) P(Q) = 1 - ΠA ∈ Domain (1 - P(Scientist(A) ∧ ∃y Coauthor(A,y))

= 1 - (1 - P(Scientist(A) ∧ ∃y Coauthor(A,y)) x (1 - P(Scientist(B) ∧ ∃y Coauthor(B,y)) x (1 - P(Scientist(C) ∧ ∃y Coauthor(C,y)) x (1 - P(Scientist(D) ∧ ∃y Coauthor(D,y)) x (1 - P(Scientist(E) ∧ ∃y Coauthor(E,y)) x (1 - P(Scientist(F) ∧ ∃y Coauthor(F,y)) …
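The collapse to (1-p)^k can be demonstrated numerically: the k constants with no supporting facts all contribute the same factor, so the product over them is a single power. A sketch with made-up numbers for the known factors and the shared open-world probability p:

```python
# Constants with facts get individual factors; the k "open" constants
# all share the same value p, so lifted inference uses (1 - p) ** k.
known = [0.4, 0.9]       # P(Scientist(A) ∧ ∃y Coauthor(A,y)) where facts exist (arbitrary)
k, p = 10_000, 1e-4      # open-world constants and their shared probability

naive = 1.0
for f in known + [p] * k:        # linear-time loop over the whole domain
    naive *= 1 - f

lifted = (1 - p) ** k            # the symmetric part, in one exponentiation
for f in known:
    lifted *= 1 - f

prob_q_naive = 1 - naive
prob_q_lifted = 1 - lifted
assert abs(prob_q_naive - prob_q_lifted) < 1e-9
```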

slide-91
SLIDE 91

[Ceylan’16]

Complexity Results

slide-92
SLIDE 92

What is the broader picture?

slide-93
SLIDE 93

...

A Simple Reasoning Problem

?

Probability that Card1 is Hearts?

[Van den Broeck; AAAI-KRR’15]

slide-94
SLIDE 94

...

A Simple Reasoning Problem

?

Probability that Card1 is Hearts? 1/4

[Van den Broeck; AAAI-KRR’15]

slide-95
SLIDE 95

A Simple Reasoning Problem

... ?

Probability that Card52 is Spades given that Card1 is QH?

[Van den Broeck; AAAI-KRR’15]

slide-96
SLIDE 96

A Simple Reasoning Problem

... ?

Probability that Card52 is Spades given that Card1 is QH? 13/51

[Van den Broeck; AAAI-KRR’15]
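The 13/51 answer is pure counting: conditioning on Card1 = Q♥ leaves 51 equally likely candidates for Card52, 13 of them spades:

```python
from fractions import Fraction

# After removing the Queen of Hearts, 51 cards remain; by symmetry each is
# equally likely to be Card52.
remaining = {"spades": 13, "hearts": 12, "diamonds": 13, "clubs": 13}
p = Fraction(remaining["spades"], sum(remaining.values()))
assert p == Fraction(13, 51)
```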

slide-97
SLIDE 97

Let us automate this:

  • 1. Probabilistic graphical model (e.g., factor graph)
  • 2. Probabilistic inference algorithm

(e.g., variable elimination or junction tree)

Automated Reasoning

[Van den Broeck; AAAI-KRR’15]

slide-98
SLIDE 98

Classical Reasoning

[Figure: three models over variables A–F — a tree, a sparse graph, a dense graph]

  • Higher treewidth
  • Fewer conditional independencies
  • Slower inference
slide-99
SLIDE 99

Let us automate this:

  • 1. Probabilistic graphical model (e.g., factor graph)

is fully connected!

  • 2. Probabilistic inference algorithm
    (e.g., variable elimination or junction tree) builds a table with 52^52 rows

Automated Reasoning

(artist's impression)

[Van den Broeck; AAAI-KRR’15]

slide-100
SLIDE 100

Lifted Inference in SRL

  • Statistical relational model (e.g., MLN):

    3.14  FacultyPage(x) ∧ Linked(x,y) ⇒ CoursePage(y)

  • As a probabilistic graphical model:
    – 26 pages: 728 variables, 676 factors
    – 1000 pages: 1,002,000 variables, 1,000,000 factors
  • Highly intractable?
    – Lifted inference in milliseconds!
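Those grounding sizes follow from the formula's shape: n pages give n FacultyPage atoms, n CoursePage atoms, and n² Linked atoms, with one ground factor per (x, y) pair. A quick check:

```python
def grounding_size(n_pages):
    # FacultyPage(x): n atoms, CoursePage(y): n atoms, Linked(x,y): n^2 atoms;
    # one ground factor per (x, y) pair of the formula
    # FacultyPage(x) ∧ Linked(x,y) ⇒ CoursePage(y).
    variables = 2 * n_pages + n_pages ** 2
    factors = n_pages ** 2
    return variables, factors

assert grounding_size(26) == (728, 676)
assert grounding_size(1000) == (1_002_000, 1_000_000)
```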

slide-101
SLIDE 101

...

Tractable Reasoning

What's going on here? Which property makes reasoning tractable?

[Niepert and Van den Broeck, AAAI’14], [Van den Broeck, AAAI-KRR’15]

slide-102
SLIDE 102

...

Tractable Reasoning

What's going on here? Which property makes reasoning tractable?

⇒ Lifted Inference

  • High-level (first-order) reasoning
  • Symmetry
  • Exchangeability

[Niepert and Van den Broeck, AAAI’14], [Van den Broeck, AAAI-KRR’15]

slide-103
SLIDE 103

Model Counting

  • Model = solution to a propositional logic formula Δ
  • Model counting = #SAT

  Rain  Cloudy  Model?
  T     T       Yes
  T     F       No
  F     T       Yes
  F     F       Yes

#SAT = 3

Δ = (Rain ⇒ Cloudy)
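Model counting for the propositional example can be brute-forced in a few lines:

```python
from itertools import product

def count_models(variables, formula):
    """Brute-force #SAT: count assignments that satisfy `formula`."""
    return sum(
        formula(dict(zip(variables, bits)))
        for bits in product([True, False], repeat=len(variables))
    )

# Δ = Rain ⇒ Cloudy
n = count_models(["Rain", "Cloudy"], lambda v: (not v["Rain"]) or v["Cloudy"])
assert n == 3
```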

slide-104
SLIDE 104

First-Order Model Counting

Model = solution to first-order logic formula Δ

Δ = ∀d (Rain(d) ⇒ Cloudy(d)) Days = {Monday}

slide-105
SLIDE 105

First-Order Model Counting

Model = solution to first-order logic formula Δ

Δ = ∀d (Rain(d) ⇒ Cloudy(d)) Days = {Monday}

  Rain(M)  Cloudy(M)  Model?
  T        T          Yes
  T        F          No
  F        T          Yes
  F        F          Yes

FOMC = 3

slide-106
SLIDE 106

First-Order Model Counting

Model = solution to first-order logic formula Δ

Δ = ∀d (Rain(d) ⇒ Cloudy(d))    Days = {Monday, Tuesday}

slide-107
SLIDE 107

First-Order Model Counting

Model = solution to first-order logic formula Δ

  Rain(M)  Cloudy(M)  Rain(T)  Cloudy(T)  Model?
  T        T          T        T          Yes
  T        F          T        T          No
  F        T          T        T          Yes
  F        F          T        T          Yes
  T        T          T        F          No
  T        F          T        F          No
  F        T          T        F          No
  F        F          T        F          No
  T        T          F        T          Yes
  T        F          F        T          No
  F        T          F        T          Yes
  F        F          F        T          Yes
  T        T          F        F          Yes
  T        F          F        F          No
  F        T          F        F          Yes
  F        F          F        F          Yes

#SAT = 9

Δ = ∀d (Rain(d) ⇒ Cloudy(d))    Days = {Monday, Tuesday}
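Per day the implication admits 3 of the 4 truth assignments, and days are independent, so the count is 3^|Days|. A brute-force check:

```python
from itertools import product

def fomc(days):
    """Count models of ∀d (Rain(d) ⇒ Cloudy(d)) by brute force."""
    atoms = [(p, d) for d in days for p in ("Rain", "Cloudy")]
    count = 0
    for bits in product([True, False], repeat=len(atoms)):
        v = dict(zip(atoms, bits))
        if all((not v[("Rain", d)]) or v[("Cloudy", d)] for d in days):
            count += 1
    return count

assert fomc(["Monday"]) == 3
assert fomc(["Monday", "Tuesday"]) == 9   # = 3 ** 2: days are independent
```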

slide-108
SLIDE 108

FOMC Inference

Δ = ∀x,y, (Smokes(x) ∧ Friends(x,y) ⇒ Smokes(y)) Domain = {n people}

slide-109
SLIDE 109

FOMC Inference

 If we know precisely who smokes, and there are k smokers?

k n-k k n-k Database: Smokes(Alice) = 1 Smokes(Bob) = 0 Smokes(Charlie) = 0 Smokes(Dave) = 1 Smokes(Eve) = 0 ...

Smokes Smokes Friends

Δ = ∀x,y, (Smokes(x) ∧ Friends(x,y) ⇒ Smokes(y)) Domain = {n people}

slide-110
SLIDE 110

FOMC Inference

 If we know precisely who smokes, and there are k smokers?

k n-k k n-k Database: Smokes(Alice) = 1 Smokes(Bob) = 0 Smokes(Charlie) = 0 Smokes(Dave) = 1 Smokes(Eve) = 0 ...

Smokes Smokes Friends

Δ = ∀x,y, (Smokes(x) ∧ Friends(x,y) ⇒ Smokes(y)) Domain = {n people}

slide-111
SLIDE 111

FOMC Inference

 If we know precisely who smokes, and there are k smokers?

k n-k k n-k Database: Smokes(Alice) = 1 Smokes(Bob) = 0 Smokes(Charlie) = 0 Smokes(Dave) = 1 Smokes(Eve) = 0 ...

Smokes Smokes Friends

Δ = ∀x,y, (Smokes(x) ∧ Friends(x,y) ⇒ Smokes(y)) Domain = {n people}

slide-112
SLIDE 112

FOMC Inference

 If we know precisely who smokes, and there are k smokers?

k n-k k n-k Database: Smokes(Alice) = 1 Smokes(Bob) = 0 Smokes(Charlie) = 0 Smokes(Dave) = 1 Smokes(Eve) = 0 ...

Smokes Smokes Friends

Δ = ∀x,y, (Smokes(x) ∧ Friends(x,y) ⇒ Smokes(y)) Domain = {n people}

slide-113
SLIDE 113

FOMC Inference

 If we know precisely who smokes, and there are k smokers?

k n-k k n-k Database: Smokes(Alice) = 1 Smokes(Bob) = 0 Smokes(Charlie) = 0 Smokes(Dave) = 1 Smokes(Eve) = 0 ...

Smokes Smokes Friends

Δ = ∀x,y, (Smokes(x) ∧ Friends(x,y) ⇒ Smokes(y)) Domain = {n people}

slide-114
SLIDE 114

FOMC Inference

 If we know precisely who smokes, and there are k smokers?

k n-k k n-k Database: Smokes(Alice) = 1 Smokes(Bob) = 0 Smokes(Charlie) = 0 Smokes(Dave) = 1 Smokes(Eve) = 0 ...

Smokes Smokes Friends

Δ = ∀x,y, (Smokes(x) ∧ Friends(x,y) ⇒ Smokes(y)) Domain = {n people}

slide-115
SLIDE 115

FOMC Inference

 If we know precisely who smokes, and there are k smokers?

k n-k k n-k Database: Smokes(Alice) = 1 Smokes(Bob) = 0 Smokes(Charlie) = 0 Smokes(Dave) = 1 Smokes(Eve) = 0 ...

Smokes Smokes Friends

Δ = ∀x,y, (Smokes(x) ∧ Friends(x,y) ⇒ Smokes(y)) Domain = {n people}

slide-116
SLIDE 116

FOMC Inference

 If we know precisely who smokes, and there are k smokers?

k n-k k n-k Database: Smokes(Alice) = 1 Smokes(Bob) = 0 Smokes(Charlie) = 0 Smokes(Dave) = 1 Smokes(Eve) = 0 ...

Smokes Smokes Friends

Δ = ∀x,y, (Smokes(x) ∧ Friends(x,y) ⇒ Smokes(y)) Domain = {n people}

slide-117
SLIDE 117

FOMC Inference

 If we know precisely who smokes, and there are k smokers?

k n-k k n-k Database: Smokes(Alice) = 1 Smokes(Bob) = 0 Smokes(Charlie) = 0 Smokes(Dave) = 1 Smokes(Eve) = 0 ...

Smokes Smokes Friends

Δ = ∀x,y, (Smokes(x) ∧ Friends(x,y) ⇒ Smokes(y)) Domain = {n people}

slide-118
SLIDE 118

FOMC Inference

 If we know precisely who smokes, and there are k smokers?

k n-k k n-k

→ models

Database: Smokes(Alice) = 1 Smokes(Bob) = 0 Smokes(Charlie) = 0 Smokes(Dave) = 1 Smokes(Eve) = 0 ...

Smokes Smokes Friends

Δ = ∀x,y, (Smokes(x) ∧ Friends(x,y) ⇒ Smokes(y)) Domain = {n people}

slide-119
SLIDE 119

FOMC Inference

 If we know precisely who smokes, and there are k smokers?

k n-k k n-k

 If we know that there are k smokers?

→ models

Database: Smokes(Alice) = 1 Smokes(Bob) = 0 Smokes(Charlie) = 0 Smokes(Dave) = 1 Smokes(Eve) = 0 ...

Smokes Smokes Friends

Δ = ∀x,y, (Smokes(x) ∧ Friends(x,y) ⇒ Smokes(y)) Domain = {n people}

slide-120
SLIDE 120

FOMC Inference

 If we know precisely who smokes, and there are k smokers?

k n-k k n-k

 If we know that there are k smokers?

→ models

Database: Smokes(Alice) = 1 Smokes(Bob) = 0 Smokes(Charlie) = 0 Smokes(Dave) = 1 Smokes(Eve) = 0 ...

→ models

Smokes Smokes Friends

Δ = ∀x,y, (Smokes(x) ∧ Friends(x,y) ⇒ Smokes(y)) Domain = {n people}

slide-121
SLIDE 121

FOMC Inference

 If we know precisely who smokes, and there are k smokers?

k n-k k n-k

 If we know that there are k smokers?  In total…

→ models

Database: Smokes(Alice) = 1 Smokes(Bob) = 0 Smokes(Charlie) = 0 Smokes(Dave) = 1 Smokes(Eve) = 0 ...

→ models

Smokes Smokes Friends

Δ = ∀x,y, (Smokes(x) ∧ Friends(x,y) ⇒ Smokes(y)) Domain = {n people}

slide-122
SLIDE 122

FOMC Inference

 If we know precisely who smokes, and there are k smokers?

k n-k k n-k

 If we know that there are k smokers?  In total…

→ models

Database: Smokes(Alice) = 1 Smokes(Bob) = 0 Smokes(Charlie) = 0 Smokes(Dave) = 1 Smokes(Eve) = 0 ...

→ models → models

Smokes Smokes Friends

Δ = ∀x,y, (Smokes(x) ∧ Friends(x,y) ⇒ Smokes(y)) Domain = {n people}
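The counts elided on the slide can presumably be reconstructed as follows: with k smokers, exactly k·(n−k) Friends atoms (from a smoker to a non-smoker) are forced false, giving 2^(n²−k(n−k)) models, and summing over the C(n,k) choices of smokers gives the total. A sketch that checks this closed form against brute-force enumeration for tiny domains:

```python
from itertools import product
from math import comb

def fomc_brute(n):
    """Count models of ∀x,y (Smokes(x) ∧ Friends(x,y) ⇒ Smokes(y)) by brute force."""
    people = range(n)
    pairs = [(x, y) for x in people for y in people]
    count = 0
    for smokes in product([True, False], repeat=n):
        for friends in product([True, False], repeat=len(pairs)):
            fr = dict(zip(pairs, friends))
            if all(not (smokes[x] and fr[(x, y)]) or smokes[y]
                   for x in people for y in people):
                count += 1
    return count

def fomc_lifted(n):
    # With k smokers, the k*(n-k) Friends atoms from a smoker to a non-smoker
    # are forced false; the remaining n^2 - k*(n-k) are free.
    return sum(comb(n, k) * 2 ** (n * n - k * (n - k)) for k in range(n + 1))

assert all(fomc_brute(n) == fomc_lifted(n) for n in range(4))
```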

slide-123
SLIDE 123

Let us automate this:

 Relational model  Lifted probabilistic inference algorithm

∀p, ∃c, Card(p,c) ∀c, ∃p, Card(p,c) ∀p, ∀c, ∀c’, Card(p,c) ∧ Card(p,c’) ⇒ c = c’

...

[Van den Broeck; AAAI-KRR’15]

slide-126
SLIDE 126

Playing Cards Revisited

∀p, ∃c, Card(p,c)
∀c, ∃p, Card(p,c)
∀p, ∀c, ∀c’, Card(p,c) ∧ Card(p,c’) ⇒ c = c’

Computed in time polynomial in n

[Van den Broeck; AAAI-KRR’15]

slide-128
SLIDE 128

Open-World Lifted Query Eval

All together, probability (1 − p)^k

Q = ∃x ∃y Scientist(x) ∧ Coauthor(x,y)

P(Q) = 1 − Π_{A ∈ Domain} (1 − P(Scientist(A) ∧ ∃y Coauthor(A,y)))
     = 1 − (1 − P(Scientist(A) ∧ ∃y Coauthor(A,y)))
         × (1 − P(Scientist(B) ∧ ∃y Coauthor(B,y)))
         × (1 − P(Scientist(C) ∧ ∃y Coauthor(C,y)))
         × (1 − P(Scientist(D) ∧ ∃y Coauthor(D,y)))
         × …

Open-world query evaluation on an empty database = symmetric lifted inference
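This decomposition is easy to run on a small tuple-independent database. The sketch below uses the made-up Scientist and Coauthor probabilities from the earlier slides, and assumes (closed world) probability 0 for unlisted tuples:

```python
# Hypothetical tuple-independent tables: P(Scientist(x)) and
# P(Coauthor(x,y)); any tuple not listed has probability 0.
scientist = {"Erdos": 0.9, "Einstein": 0.8, "Pauli": 0.6}
coauthor = {("Erdos", "Renyi"): 0.6, ("Einstein", "Pauli"): 0.7,
            ("Obama", "Erdos"): 0.1}

def prob_query(scientist, coauthor):
    """P(∃x ∃y Scientist(x) ∧ Coauthor(x,y)) via independence:
    P(Q) = 1 - Π_A (1 - P(Scientist(A)) * P(∃y Coauthor(A,y)))."""
    p_false = 1.0
    for a, p_sci in scientist.items():
        # P(∃y Coauthor(a,y)) = 1 - Π_y (1 - P(Coauthor(a,y)))
        p_no_co = 1.0
        for (x, y), p in coauthor.items():
            if x == a:
                p_no_co *= 1 - p
        p_false *= 1 - p_sci * (1 - p_no_co)
    return 1 - p_false

print(prob_query(scientist, coauthor))
```

Only constants with a Scientist tuple are iterated, since every other domain element contributes a factor of 1; the per-element factors are independent because the query is safe (the Coauthor atoms are partitioned by their first argument).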

slide-129
SLIDE 129

Even on #P-hard queries!

Δ = ∀x,y, (Smokes(x) ∧ Friends(x,y) ⇒ Smokes(y))   Domain = {n people}

Database: Smokes(Alice) = 1, Smokes(Bob) = 0, Smokes(Charlie) = 0, Smokes(Dave) = 1, Smokes(Eve) = 0, ...

• If we know precisely who smokes, and there are k smokers? → 2^(n² − k(n−k)) models
• If we know that there are k smokers? → C(n,k) · 2^(n² − k(n−k)) models
• In total… → Σ_{k=0}^{n} C(n,k) · 2^(n² − k(n−k)) models

slide-132
SLIDE 132

Tractable Classes

FO² CNF · FO² · safe monotone CNF · safe type-1 CNF · FO³ · CQs

Open question (?): Δ = ∀x,y,z, Friends(x,y) ∧ Friends(y,z) ⇒ Friends(x,z)

[VdB; NIPS’11], [VdB et al.; KR’14], [Gribkoff, VdB, Suciu; UAI’15], [Beame, VdB, Gribkoff, Suciu; PODS’15], etc.

slide-135
SLIDE 135

FO² is liftable!

Properties of x and of y: Smokes, Gender, Young, Tall
Relations between x and y: Friends(x,y), Colleagues(x,y), Family(x,y), Classmates(x,y)

Expressible in FO²:
• “Smokers are more likely to be friends with other smokers.”
• “Colleagues of the same age are more likely to be friends.”
• “People are either family or friends, but never both.”
• “If X is family of Y, then Y is also family of X.”
• “If X is a parent of Y, then Y cannot be a parent of X.”

slide-136
SLIDE 136

Uncertainty in AI

Probability Distribution = Qualitative + Quantitative

slide-137
SLIDE 137

Probabilistic Graphical Models

Probability Distribution = Graph Structure + Parameterization

slide-140
SLIDE 140

Weighted Model Counting

Probability Distribution = SAT Formula + Weights

Rain ⇒ Cloudy
Sun ∧ Rain ⇒ Rainbow

w( Rain) = 1   w(¬Rain) = 2
w( Cloudy) = 3   w(¬Cloudy) = 5
…

[Chavira et al. 2008, Sang et al. 2005]
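For intuition, the WMC of this example can be computed by brute-force enumeration. The Rain and Cloudy weights are from the slide; weights for the Sun and Rainbow literals are not given there, so setting both polarities to 1 is an assumption here:

```python
from itertools import product

# (positive-literal weight, negative-literal weight) per atom;
# Sun and Rainbow weights of 1 are an assumption, not from the slide.
weights = {"Rain": (1, 2), "Cloudy": (3, 5), "Sun": (1, 1), "Rainbow": (1, 1)}
atoms = list(weights)

def formula(v):
    # Rain ⇒ Cloudy  and  Sun ∧ Rain ⇒ Rainbow
    return ((not v["Rain"] or v["Cloudy"]) and
            (not (v["Sun"] and v["Rain"]) or v["Rainbow"]))

def wmc(condition=lambda v: True):
    """Sum the weight of every model of the formula (optionally
    restricted by an extra condition, e.g. conditioning on a literal)."""
    total = 0
    for bits in product([True, False], repeat=len(atoms)):
        v = dict(zip(atoms, bits))
        if formula(v) and condition(v):
            w = 1
            for a in atoms:
                w *= weights[a][0] if v[a] else weights[a][1]
            total += w
    return total

# P(Rain) = WMC(Δ ∧ Rain) / WMC(Δ)
print(wmc(lambda v: v["Rain"]) / wmc())
```

Real WMC solvers avoid this 2^n enumeration via knowledge compilation or component caching; the semantics is the same.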

slide-142
SLIDE 142

Weighted First-Order Model Counting

Probability Distribution = First-Order Logic + Weights

Smokes(x) ∧ Friends(x,y) ⇒ Smokes(y)

w( Smokes(a)) = 1   w(¬Smokes(a)) = 2
w( Smokes(b)) = 1   w(¬Smokes(b)) = 2
w( Friends(a,b)) = 3   w(¬Friends(a,b)) = 5
…

[Van den Broeck 2011, 2013, Gogate 2011]
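A naive way to see what WFOMC computes: ground the sentence over a tiny domain and do propositional WMC, using the literal weights from the slide (Smokes literals 1/2, Friends literals 3/5). A genuinely lifted algorithm obtains the same number in time polynomial in the domain size:

```python
from itertools import product

domain = ["a", "b"]
# Ground atoms of Smokes/1 and Friends/2 over the domain.
atoms = ([("Smokes", (x,)) for x in domain] +
         [("Friends", (x, y)) for x in domain for y in domain])

def w(atom, value):
    # Literal weights from the slide: Smokes 1/2, Friends 3/5.
    pos, neg = (1, 2) if atom[0] == "Smokes" else (3, 5)
    return pos if value else neg

def wfomc():
    """Ground ∀x,y. Smokes(x) ∧ Friends(x,y) ⇒ Smokes(y), then WMC."""
    total = 0
    for bits in product([True, False], repeat=len(atoms)):
        v = dict(zip(atoms, bits))
        ok = all(not (v[("Smokes", (x,))] and v[("Friends", (x, y))])
                 or v[("Smokes", (y,))]
                 for x in domain for y in domain)
        if ok:
            weight = 1
            for a in atoms:
                weight *= w(a, v[a])
            total += weight
    return total

print(wfomc())
```

The lifted counterpart is the weighted version of the earlier closed form: Σ_k C(n,k) · 1^k · 2^(n−k) · 5^(k(n−k)) · 8^(n² − k(n−k)), which for n = 2 agrees with the grounded count.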

slide-143
SLIDE 143

Generalized Model Counting

Probability Distribution

=

Logic

+

Weights

slide-144
SLIDE 144

Generalized Model Counting

Probability Distribution = Logic + Weights

Logical syntax · model-theoretic semantics · weight function w(.)

slide-146
SLIDE 146

Weighted Model Integration

Probability Distribution = SMT(LRA) + Weights

0 ≤ height ≤ 200
0 ≤ weight ≤ 200
0 ≤ age ≤ 100
age < 1 ⇒ height + weight ≤ 90

w( height) = height − 10   w(¬height) = 3·height²
w(¬weight) = 5
…

[Belle et al. IJCAI’15, UAI’15]
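A minimal one-dimensional illustration of what WMI computes (a toy example of my own, not the slide's): the theory is 0 ≤ x ≤ 2 with an atom A ≡ (x ≥ 1) and per-case density weights; the partition function is an integral over the satisfying region rather than a sum over models:

```python
def wmi(lo, hi, weight, n=100000):
    """Numerically integrate weight(x) over [lo, hi] (midpoint rule,
    which is exact for the linear/constant weights used below)."""
    h = (hi - lo) / n
    return sum(weight(lo + (i + 0.5) * h) for i in range(n)) * h

# SMT(LRA) theory: 0 <= x <= 2, atom A defined as (x >= 1).
# Per-case weights (an assumption for illustration):
# w = x where A holds, w = 1 where it does not.
z_a     = wmi(1, 2, lambda x: x)   # volume of worlds where A holds
z_not_a = wmi(0, 1, lambda x: 1)   # volume of worlds where ¬A holds
print(z_a / (z_a + z_not_a))       # P(A) = 1.5 / 2.5
```

Exact WMI solvers integrate the piecewise-polynomial weights symbolically over the polytopes carved out by the LRA atoms instead of sampling a grid.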

slide-148
SLIDE 148

Probabilistic Programming

Probability Distribution = Logic Programs + Weights

path(X,Y) :- edge(X,Y).
path(X,Y) :- edge(X,Z), path(Z,Y).

[Fierens et al., TPLP’15]
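The semantics of such a ProbLog-style program can be sketched by explicit enumeration of possible worlds, feasible only for a handful of edges (the edge probabilities below are made up):

```python
from itertools import product

# Hypothetical probabilistic facts, ProbLog-style "p :: edge(X,Y)".
edges = {("a", "b"): 0.6, ("b", "c"): 0.7, ("a", "c"): 0.2}

def has_path(present, src, dst):
    """Least-model check of:  path(X,Y) :- edge(X,Y).
                              path(X,Y) :- edge(X,Z), path(Z,Y)."""
    reached, frontier = {src}, [src]
    while frontier:
        x = frontier.pop()
        for (u, v) in present:
            if u == x and v not in reached:
                reached.add(v)
                frontier.append(v)
    return dst in reached

def prob_path(src, dst):
    """Sum the probability of every possible world where a path exists."""
    es = list(edges)
    total = 0.0
    for bits in product([True, False], repeat=len(es)):
        world = [e for e, b in zip(es, bits) if b]
        p = 1.0
        for e, b in zip(es, bits):
            p *= edges[e] if b else 1 - edges[e]
        if has_path(world, src, dst):
            total += p
    return total

print(prob_path("a", "c"))
```

ProbLog itself avoids the exponential enumeration by compiling the query into a weighted Boolean formula and running WMC on it, which is exactly the reduction of [Fierens et al., TPLP’15].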

slide-149
SLIDE 149

Conclusions

• Relational probabilistic reasoning is the frontier and integration of AI, KR, ML, DB, theory, etc.
• We need
  – relational models and logic
  – probabilistic models and statistical learning
  – algorithms that scale
• Open-world data model
  – semantics make sense
  – FREE for UCQs
  – expensive otherwise

slide-151
SLIDE 151

Long-Term Outlook

Probabilistic inference and learning exploit:

~1988: conditional independence
~2000: contextual independence (local structure)
~201?: symmetry & exchangeability & first-order

slide-152
SLIDE 152

References

• Ceylan, Ismail Ilkan, Adnan Darwiche, and Guy Van den Broeck. "Open-world probabilistic databases." Proceedings of KR (2016).
• Suciu, Dan, Dan Olteanu, Christopher Ré, and Christoph Koch. "Probabilistic databases." Synthesis Lectures on Data Management 3, no. 2 (2011): 1-180.
• Dong, Xin, Evgeniy Gabrilovich, Geremy Heitz, Wilko Horn, Ni Lao, Kevin Murphy, Thomas Strohmann, Shaohua Sun, and Wei Zhang. "Knowledge vault: A web-scale approach to probabilistic knowledge fusion." In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 601-610. ACM, 2014.
• Carlson, Andrew, Justin Betteridge, Bryan Kisiel, Burr Settles, Estevam R. Hruschka Jr, and Tom M. Mitchell. "Toward an Architecture for Never-Ending Language Learning." In AAAI, 2010.
• Niu, Feng, Ce Zhang, Christopher Ré, and Jude W. Shavlik. "DeepDive: Web-scale Knowledge-base Construction using Statistical Learning and Inference." VLDS 12 (2012): 25-28.

slide-153
SLIDE 153

References

• Chen, Brian X. "Siri, Alexa and Other Virtual Assistants Put to the Test." The New York Times (2016).
• Dalvi, Nilesh, and Dan Suciu. "The dichotomy of probabilistic inference for unions of conjunctive queries." Journal of the ACM (JACM) 59, no. 6 (2012): 30.
• De Raedt, Luc, Anton Dries, Ingo Thon, Guy Van den Broeck, and Mathias Verbeke. "Inducing probabilistic relational rules from probabilistic examples." In Proceedings of the 24th International Conference on Artificial Intelligence, pp. 1835-1843. AAAI Press, 2015.
• Van den Broeck, Guy. "Towards high-level probabilistic reasoning with lifted inference." AAAI Spring Symposium on KRR (2015).
• Niepert, Mathias, and Guy Van den Broeck. "Tractability through exchangeability: A new perspective on efficient probabilistic inference." AAAI (2014).
• Van den Broeck, Guy. "On the completeness of first-order knowledge compilation for lifted probabilistic inference." In Advances in Neural Information Processing Systems, pp. 1386-1394. 2011.
slide-154
SLIDE 154

References

• Van den Broeck, Guy, Wannes Meert, and Adnan Darwiche. "Skolemization for weighted first-order model counting." In Proceedings of the 14th International Conference on Principles of Knowledge Representation and Reasoning (KR). 2014.
• Gribkoff, Eric, Guy Van den Broeck, and Dan Suciu. "Understanding the complexity of lifted inference and asymmetric weighted model counting." UAI, 2014.
• Beame, Paul, Guy Van den Broeck, Eric Gribkoff, and Dan Suciu. "Symmetric weighted first-order model counting." In Proceedings of the 34th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, pp. 313-328. ACM, 2015.
• Chavira, Mark, and Adnan Darwiche. "On probabilistic inference by weighted model counting." Artificial Intelligence 172.6 (2008): 772-799.
• Sang, Tian, Paul Beame, and Henry A. Kautz. "Performing Bayesian inference by weighted model counting." AAAI, 2005.

slide-155
SLIDE 155

References

• Van den Broeck, Guy, Nima Taghipour, Wannes Meert, Jesse Davis, and Luc De Raedt. "Lifted probabilistic inference by first-order knowledge compilation." In Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, pp. 2178-2185. AAAI Press, 2011.
• Van den Broeck, Guy. Lifted Inference and Learning in Statistical Relational Models. Ph.D. Dissertation, KU Leuven, 2013.
• Gogate, Vibhav, and Pedro Domingos. "Probabilistic theorem proving." UAI (2011).
slide-156
SLIDE 156

References

• Belle, Vaishak, Andrea Passerini, and Guy Van den Broeck. "Probabilistic inference in hybrid domains by weighted model integration." In Proceedings of the 24th International Joint Conference on Artificial Intelligence (IJCAI). 2015.
• Belle, Vaishak, Guy Van den Broeck, and Andrea Passerini. "Hashing-based approximate probabilistic inference in hybrid domains." In Proceedings of the 31st Conference on Uncertainty in Artificial Intelligence (UAI). 2015.
• Fierens, Daan, Guy Van den Broeck, Joris Renkens, Dimitar Shterionov, Bernd Gutmann, Ingo Thon, Gerda Janssens, and Luc De Raedt. "Inference and learning in probabilistic logic programs using weighted boolean formulas." Theory and Practice of Logic Programming 15, no. 3 (2015): 358-401.