Probabilistic Logic Programming & Knowledge Compilation


SLIDE 1

Probabilistic Logic Programming & Knowledge Compilation

Wannes Meert

DTAI, Dept. Computer Science, KU Leuven Dagstuhl, 18 September 2017 In collaboration with Jonas Vlasselaer, Guy Van den Broeck, Anton Dries, Angelika Kimmig, Hendrik Blockeel, Jesse Davis and Luc De Raedt

SLIDE 2

StarAI


Dealing with uncertainty:

  • Probability theory
  • Graphical models

Learning

  • Parameters
  • Structure

Reasoning with relational data

  • Logic
  • Database
  • Programming


Statistical relational AI, statistical relational learning, probabilistic logic learning, probabilistic programming, ...

SLIDE 3

ProbLog


Uncertainty + Learning + Relational data


stress(ann).
influences(ann,bob).
influences(bob,carl).

smokes(X) :- stress(X).
smokes(X) :- influences(Y,X), smokes(Y).

0.8::stress(ann).
0.6::influences(ann,bob).
0.2::influences(bob,carl).

→ One World → Multiple possible worlds

t(0.8)::stress(ann).
t(_)::influences(ann,bob).
t(_)::influences(bob,carl).
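A minimal sketch of running the probabilistic model above with the problog Python package (the query on carl is our addition, not on the slide):

    from problog.program import PrologString
    from problog import get_evaluatable

    model = """
    0.8::stress(ann).
    0.6::influences(ann,bob).
    0.2::influences(bob,carl).
    smokes(X) :- stress(X).
    smokes(X) :- influences(Y,X), smokes(Y).
    query(smokes(carl)).
    """
    # smokes(carl) requires all three probabilistic facts to hold:
    # P = 0.8 * 0.6 * 0.2 = 0.096
    print(get_evaluatable().create_from(PrologString(model)).evaluate())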

SLIDE 4

Introduction to ProbLog


SLIDE 5

Example


  • toss (biased) coin & draw ball from each urn
  • win if (heads and a red ball) or (two balls of same color)

0.4 :: heads.

0.3 :: col(1,red); 0.7 :: col(1,blue).
0.2 :: col(2,red); 0.3 :: col(2,green); 0.5 :: col(2,blue).

win :- heads, col(_,red).
win :- col(1,C), col(2,C).

evidence(heads).
query(win).

Annotations, top to bottom: probabilistic fact; annotated disjunctions; logical rules / background knowledge; evidence; query.
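Running this program with the problog package (a sketch; the printed value follows from the possible worlds enumerated on the next slides):

    from problog.program import PrologString
    from problog import get_evaluatable

    model = """
    0.4::heads.
    0.3::col(1,red); 0.7::col(1,blue).
    0.2::col(2,red); 0.3::col(2,green); 0.5::col(2,blue).
    win :- heads, col(_,red).
    win :- col(1,C), col(2,C).
    evidence(heads).
    query(win).
    """
    # With the evidence, P(win | heads) = 0.316 / 0.4 = 0.79
    print(get_evaluatable().create_from(PrologString(model)).evaluate())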

SLIDE 6

Example


0.4 :: heads.

0.3 :: col(1,red); 0.7 :: col(1,blue).
0.2 :: col(2,red); 0.3 :: col(2,green); 0.5 :: col(2,blue).

win :- heads, col(_,red).
win :- col(1,C), col(2,C).

[Figure: three sample worlds with their probabilities, e.g. 0.4×0.3×0.3, (1−0.4)×0.3×0.2 and (1−0.4)×0.3×0.3; 'W' marks worlds in which win holds.]

SLIDE 7

All possible worlds


All 12 possible worlds (H = heads; urn 1 color, urn 2 color; W = win):

  H    urn 1  urn 2   win?  probability
  yes  red    red     W     0.4×0.3×0.2 = 0.024
  yes  red    green   W     0.4×0.3×0.3 = 0.036
  yes  red    blue    W     0.4×0.3×0.5 = 0.060
  yes  blue   red     W     0.4×0.7×0.2 = 0.056
  yes  blue   green         0.4×0.7×0.3 = 0.084
  yes  blue   blue    W     0.4×0.7×0.5 = 0.140
  no   red    red     W     0.6×0.3×0.2 = 0.036
  no   red    green         0.6×0.3×0.3 = 0.054
  no   red    blue          0.6×0.3×0.5 = 0.090
  no   blue   red           0.6×0.7×0.2 = 0.084
  no   blue   green         0.6×0.7×0.3 = 0.126
  no   blue   blue    W     0.6×0.7×0.5 = 0.210

SLIDE 8

P(win) = sum of the probabilities of the worlds where win holds
       = 0.024 + 0.036 + 0.060 + 0.056 + 0.140 + 0.036 + 0.210
       = 0.562
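With evidence(heads) in the program, the query is conditioned on heads; the resulting value (not spelled out on the slide) is

$$P(\mathit{win} \mid \mathit{heads}) = \frac{0.024 + 0.036 + 0.060 + 0.056 + 0.140}{0.4} = \frac{0.316}{0.4} = 0.79$$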

SLIDE 9

Alternative view: CP-logic

9

throws(john).
0.5::throws(mary).
0.8 :: break :- throws(mary).
0.6 :: break :- throws(john).

probabilistic causal laws

[Figure: probability tree of the causal process. John throws (1.0); the window breaks (0.6) or doesn't (0.4); Mary throws (0.5) or doesn't (0.5); if she throws, the window breaks (0.8) or doesn't (0.2).]

P(break) = 0.6×0.5×0.8 + 0.6×0.5×0.2 + 0.6×0.5 + 0.4×0.5×0.8 = 0.76

[Vennekens et al. 2003, Meert and Vennekens 2014]

SLIDE 10

Sato’s distribution semantics


$$P(Q) \;=\; \sum_{F \cup R \,\models\, Q} \;\prod_{f \in F} p(f) \prod_{f \notin F} \bigl(1 - p(f)\bigr)$$

[Sato ICLP 95]

  • Q: the query
  • F: a subset of the probabilistic facts (a possible world)
  • R: the Prolog rules
  • the sum ranges over the possible worlds in which Q is true
  • each product term is the probability of one possible world
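As a worked instance (our addition, reusing the smokers program from slide 3): smokes(carl) is entailed exactly by the worlds containing all three probabilistic facts, so

$$P(\mathit{smokes}(\mathit{carl})) = 0.8 \cdot 0.6 \cdot 0.2 = 0.096$$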

SLIDE 11

Examples from Tutorial


Try yourself: https://dtai.cs.kuleuven.be/problog

$ pip install problog

SLIDE 12

Tutorial: Bayes net

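The slide's image is not preserved in this export. As a stand-in, a minimal Bayes-net sketch in the spirit of the ProbLog tutorial (the model and its numbers are our illustration, not the slide's):

    from problog.program import PrologString
    from problog import get_evaluatable

    # Hypothetical two-cause alarm network; the rules encode the CPT rows.
    model = r"""
    0.7::burglary.
    0.2::earthquake.
    0.9::alarm :- burglary, earthquake.
    0.8::alarm :- burglary, \+earthquake.
    0.1::alarm :- \+burglary, earthquake.
    evidence(alarm, true).
    query(burglary).
    """
    print(get_evaluatable().create_from(PrologString(model)).evaluate())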

SLIDE 13

Tutorial: Higher-order functions


SLIDE 14

Tutorial: As a Python Library


from problog.program import SimpleProgram
from problog.logic import Constant, Var, Term, AnnotatedDisjunction
# imports below assume the ProbLog 2.1 module layout
from problog.formula import LogicFormula
from problog.cnf_formula import CNF
from problog.ddnnf_formula import DDNNF

coin, heads, tails, win, query = \
    Term('coin'), Term('heads'), Term('tails'), Term('win'), Term('query')
C = Var('C')

p = SimpleProgram()
p += coin(Constant('c1'))
p += coin(Constant('c2'))
p += AnnotatedDisjunction([heads(C, p=0.4), tails(C, p=0.6)], coin(C))
p += (win << heads(C))
p += query(win)

lf = LogicFormula.create_from(p)   # ground the program
cnf = CNF.create_from(lf)          # convert to CNF
ddnnf = DDNNF.create_from(cnf)     # compile CNF to d-DNNF
print(ddnnf.evaluate())            # {win: probability}

SLIDE 15

Weighted Model Counting


$$\mathrm{WMC}(\varphi) \;=\; \sum_{I \,\models\, \varphi} \;\prod_{l \in I} w(l)$$

  • φ: propositional formula in conjunctive normal form (CNF), given by the ProbLog program and query
  • I: interpretations (truth-value assignments) of the propositional variables, i.e. the possible worlds
  • w(l): weight of literal l; for a probabilistic fact p::f, w(f) = p and w(¬f) = 1 − p

This matches the distribution semantics:

$$P(Q) \;=\; \sum_{F \cup R \,\models\, Q} \;\prod_{f \in F} p(f) \prod_{f \notin F} \bigl(1 - p(f)\bigr)$$
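A tiny worked example (our addition): for $\varphi = a \lor b$ with $w(a) = 0.4$ and $w(b) = 0.3$,

$$\mathrm{WMC}(a \lor b) = 0.4 \cdot 0.3 + 0.4 \cdot 0.7 + 0.6 \cdot 0.3 = 0.58 = 1 - 0.6 \cdot 0.7$$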

SLIDE 16

Encodings/Compilers for WMC


Pipeline: ProbLog program → grounding → cycle breaking → propositional formula → CNF → compilation to a circuit (BDD, SDD, d-DNNF), or Tp-compilation directly on the formula. Also links to MaxSAT (decisions), Bayes net inference, ...

usage: problog [--knowledge {sdd,bdd,nnf,ddnnf,kbest,fsdd,fbdd}] ... (+various tools)
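The same choice can be made programmatically; a sketch compiling to an SDD instead of a d-DNNF (SDD compilation requires the PySDD package):

    from problog.program import PrologString
    from problog.formula import LogicFormula
    from problog.sdd_formula import SDD

    model = PrologString("0.4::heads. win :- heads. query(win).")
    lf = LogicFormula.create_from(model)  # ground the program
    sdd = SDD.create_from(lf)             # compile to an SDD
    print(sdd.evaluate())                 # {win: 0.4}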

SLIDE 17

Impact of encoding

Noisy-OR: parents Y1, ..., Yn with child X [Van den Broeck 2014, Meert 2016].

Encoding:

y(1) ⇔ p(1,1)
y(2) ⇔ p(1,2)
…
y(n) ⇔ p(1,n)
x ⇔ (y(1) ∧ p(2,1)) ∨ … ∨ (y(n) ∧ p(2,n))

This encoding compiles into a chain-shaped circuit (with smoothing subcircuits smooth(Y1, ..., Yn)) that computes

$$\mathrm{WMC}(x) = w(y_0) + w(\neg y_0)\,w(y_1) + \ldots = \sum_i w(y_i) \prod_{j<i} \bigl(1 - w(y_j)\bigr)$$

$$\mathrm{WMC}(\neg x) = \prod_i w(\neg y_i) = \prod_i \bigl(1 - w(y_i)\bigr)$$

Since w(y_i) + w(¬y_i) = 1, the smoothing terms smooth(·) evaluate to 1.

SLIDE 18

Impact of encoding

An alternative circuit (with an auxiliary variable r) computes the same function in a flatter form:

$$\mathrm{WMC}(x) = 1 - \prod_i w(\neg y_i) = 1 - \prod_i \bigl(1 - w(y_i)\bigr)$$

$$\mathrm{WMC}(\neg x) = \prod_i w(\neg y_i) = \prod_i \bigl(1 - w(y_i)\bigr)$$

Again, since w(y_i) + w(¬y_i) = 1, smooth(·) = 1. [Van den Broeck 2014, Meert 2016]

Encoding (as before), for the Noisy-OR with parents Y1, ..., Yn and child X:

y(1) ⇔ p(1,1)
y(2) ⇔ p(1,2)
…
y(n) ⇔ p(1,n)
x ⇔ (y(1) ∧ p(2,1)) ∨ … ∨ (y(n) ∧ p(2,n))

SLIDE 19


Is KC just a toolbox for us? No: to tackle some types of problems we need to interact with the compiler while compiling or while performing inference. Yes: it separates concerns and lets us conveniently use what is available (and timings improve simply by waiting for better compilers).

SLIDE 20

Tp-compilation

Forward inference
 Incremental compilation


SLIDE 21

Why Tp-compilation


In domains with many cycles or long temporal chains we encountered two problems:

  • 1. It is not always feasible to compile the CNF.
  • 2. It is not always feasible to even create the CNF.

Examples: social networks, gene networks, web pages, sensor networks.

SLIDE 22

Before


Pipeline so far: grounding → loop breaking → CNF conversion, followed by one of:

  • Sampling on the CNF, e.g. MC-SAT [Poon and Domingos, AAAI '06]
  • "Approximate" compilation, e.g. via Weighted Partial MaxSAT [Renkens et al., AAAI '14]
  • 'Exact' knowledge compilation, e.g. OBDD, d-DNNF, SDD [Fierens et al., TPLP '15]
  • Horn approximation [Selman and Kautz, AAAI '91]

SLIDE 23

Tp-compilation

  • Generalizes the Tp operator from logic programming to the probabilistic setting.
  • Tp operator (forward reasoning):
    • Start with what is known.
    • Derive new knowledge by applying the rules.
    • Continue until fixpoint (interpretation unchanged).
  • Tp-compilation (see the sketch below):
    • Start with an empty formula for each probabilistic fact.
    • Construct new formulas by applying the rules.
    • Continue until fixpoint (formulas remain equivalent).


Implemented with an Apply-operator and equivalence checking on SDDs; bounds are available at every iteration.
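A minimal, non-probabilistic sketch of the underlying fixpoint loop (names are illustrative; Tp-compilation runs the same loop but attaches an SDD-compiled formula to each atom instead of a truth value):

    def tp_fixpoint(facts, rules):
        """facts: set of atoms; rules: list of (head, [body atoms]) pairs."""
        interpretation = set(facts)  # start with what is known
        changed = True
        while changed:               # iterate until fixpoint
            changed = False
            for head, body in rules:
                # derive new knowledge by applying the rules
                if head not in interpretation and all(b in interpretation for b in body):
                    interpretation.add(head)
                    changed = True
        return interpretation        # unchanged interpretation = fixpoint

    # Deterministic core of the smokers example from slide 3:
    facts = {"stress(ann)", "influences(ann,bob)", "influences(bob,carl)"}
    rules = [
        ("smokes(ann)",  ["stress(ann)"]),
        ("smokes(bob)",  ["influences(ann,bob)", "smokes(ann)"]),
        ("smokes(carl)", ["influences(bob,carl)", "smokes(bob)"]),
    ]
    print(tp_fixpoint(facts, rules))  # smokes(carl) is derived at the fixpoint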

SLIDE 24

Really a problem?

  • Fully connected graph with 10 nodes (90 edges)
  • The CNF contains over 25k variables and over 100k clauses
  • Tp-compilation only requires 90 variables
  • Alzheimer network


SLIDE 25

Continuous observations

Sensor measurements; circuits that interact with other representations


SLIDE 26

Continuous sensor measurements


Quote (translated from Dutch, truncated in this export): "the test setup allows more complex scenarios to ..." / "... the work packages (WPs) are worked out in more detail"

normal(0.2,0.1)::vibration(X) :- op1(X).
normal(0.6,0.2)::vibration(X) :- op2(X).
normal(3.1,1.1)::vibration(X) :- fault(X).
0.2::fault(X) :- connected(X,Y), fault(Y).

Restricted setting:

  • Sensor measurements are always available
  • Only used in head

Ongoing work

SLIDE 27

Continuous values


t(0.5)::c(ID).
t(normal(1,10))::f(ID) :- c(ID).
t(normal(10,10))::f(ID) :- \+c(ID).

evidence(f(1), 10).  evidence(f(2), 12).  evidence(f(3), 8).
evidence(f(4), 11).  evidence(f(5), 7).   evidence(f(6), 13).
evidence(f(7), 20).  evidence(f(8), 21).  evidence(f(9), 22).
evidence(f(10), 18). evidence(f(11), 19). evidence(f(12), 19).
evidence(f(13), 19). evidence(f(14), 23). evidence(f(15), 21).

Learned program:

0.40::c(ID).
normal(10.16,2.11)::f(ID) :- c(ID).
normal(20.22,1.54)::f(ID) :- \+c(ID).

Gaussian Mixture Model

[Figure: compiled circuit (SDD) over the parameter literals θ₁, ..., θ₆ and the evidence]

+ weights are functions

Ongoing work

[Figure: fitted two-component mixture density over the observations]

SLIDE 28

Resource Aware Circuits

Memory and energy: circuits that are ‘hardware-friendly’


SLIDE 29

Why resource aware?

  • State-of-the-art sensor fusion allows combining sensory information streams to improve sensing.

Integrate AI and hardware to achieve dynamic attention-scalability → adapt the hardware dynamically and intelligently. This allows extracting the maximum of relevant information under a limited computational bandwidth.

  • Resource-aware inference and fusion algorithms
  • Resource-scalable inference processors


Previous work: decision trees. Ongoing work.

SLIDE 30

Current challenges

  • We need self-reflection:
    • to understand sensitivities, to (de)activate sensors
    • Same-Decision Probability, ...

  • We need to steer the compilation to ‘hardware friendly’ structures
  • Group memory fetches, fixed-point operations, ...


Ongoing work

SLIDE 31

Other challenges

  • First-order reasoning, also for probabilities (see Guy’s talk)
    • First-order circuits, WFOMC
  • Constraints
    • cProbLog, MLNs
  • Hybrid domains (continuous variables)
    • Knowledge compilation + other semirings / SMT solvers
  • Structure learning
    • Incrementally updating circuits, PSDD
  • Link to NLP
  • ...


Mike has a bag of marbles with 4 white, 8 blue, and 6 red marbles. He pulls out one marble from the bag and it is red. What is the probability that the second marble he pulls out of the bag is white?

The answer is 0.235294 (= 4/17).
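For reference (our arithmetic): after the red marble is removed, 17 marbles remain, 4 of them white:

$$P(\text{second white} \mid \text{first red}) = \frac{4}{18 - 1} = \frac{4}{17} \approx 0.2353$$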

SLIDE 32

Thank you!
