Probabilistic Inference in BN2T Models by Weighted Model Counting



SLIDE 1

Probabilistic Inference in BN2T Models by Weighted Model Counting

Jirka Vomlel

Institute of Information Theory and Automation, Academy of Sciences of the Czech Republic, http://www.utia.cz/vomlel

Aalborg, Denmark, November 21, 2013

SLIDE 2–6

A medical example: Carcinoid Heart Disease (CHD), van Gerven (2003)

  • Eleven CHD risk factors X1, . . . , X11 measured at patient admission to the clinic: diarrhea, hepatic metastases, etc.
  • Dependent variable Y has two values: 0 if CHD does not develop, or 1 if it does.
  • The conditional probability P(Y | X1, . . . , X11) is modeled by a noisy threshold model with ℓ = 6.
  • The threshold model (without noise) implies that CHD develops if at least 6 risk factors are positive.
  • The noise on the inputs allows a non-zero probability of no CHD even when at least 6 risk factors are positive.
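The noisy-threshold CPD described above can be sketched in a few lines of Python (an illustrative sketch, not code from the talk): P(Y = 1 | x) is obtained by enumerating the possible noise outcomes on the positive inputs.

```python
from itertools import product

def noisy_threshold_prob(x, p_inh, ell):
    """P(Y = 1 | X = x) under a noisy threshold model.

    x     -- tuple of 0/1 risk-factor values
    p_inh -- inhibition probability p_i for each input
    ell   -- threshold: Y = 1 iff at least ell inputs survive the noise
    """
    active = [i for i, xi in enumerate(x) if xi == 1]  # only positive inputs can fire
    total = 0.0
    # A positive input survives (X'_i = 1) with probability 1 - p_i
    # and is inhibited (X'_i = 0) with probability p_i.
    for survive in product([0, 1], repeat=len(active)):
        w = 1.0
        for i, s in zip(active, survive):
            w *= (1 - p_inh[i]) if s else p_inh[i]
        if sum(survive) >= ell:
            total += w
    return total

# With no noise this reduces to the plain threshold rule:
print(noisy_threshold_prob((1, 1, 0, 1), [0.0] * 4, ell=2))  # -> 1.0
```

With all p_i = 0 the function returns exactly 0 or 1, matching the deterministic threshold; nonzero p_i give the noisy CPD.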

SLIDE 7–11

BN2T - Bayesian Network with 2 Layers Consisting of Noisy Threshold Models

(Figure: a two-layer network with parents X1, X2, X3, X4 and children Y1, Y2; Y1 has parents X1, X2, X3.)

Yj takes value 1 iff at least ℓ out of its k parents Xi take value 1. Assume ℓ = 2.

For the deterministic threshold, the value of p = P(Y1 = 1 | X1, X2, X3) is given below; for the noisy threshold, the value is p′ = P(Y1 = 1 | X1, X2, X3):

  X1 X2 X3 | p | p′
   1  1  1 | 1 | (1 − p1)(1 − p2)(1 − p3) + p1(1 − p2)(1 − p3) + (1 − p1)p2(1 − p3) + (1 − p1)(1 − p2)p3
   1  1  0 | 1 | (1 − p1)(1 − p2)
   1  0  1 | 1 | (1 − p1)(1 − p3)
   0  1  1 | 1 | (1 − p2)(1 − p3)
   1  0  0 | 0 | 0
   0  1  0 | 0 | 0
   0  0  1 | 0 | 0
   0  0  0 | 0 | 0

where p1, p2, p3 are the inhibitory probabilities.

The joint probability of the Bayesian network factorizes as

  P(X1, . . . , Xn, Y1, . . . , Ym) = ∏_{i=1}^{n} P(Xi) · ∏_{j=1}^{m} P(Yj | pa(Yj)).

SLIDE 12–14

Noisy-threshold with explicit deterministic and noisy parts

(Figure: each input Xi, i = 1, . . . , k, has a noisy copy X′i; Y depends deterministically on X′1, . . . , X′k.)

  P(X′i | Xi) = ( 1     0
                  pi    1 − pi )

with rows indexed by Xi ∈ {0, 1} and columns by X′i ∈ {0, 1}.

  P(Y = 1 | X′1, X′2, X′3) for the deterministic threshold (ℓ = 2), visualized as a tensor using nested matrices (outer index X′1; rows X′2, columns X′3):

  X′1 = 0: ( 0  0 ;  0  1 )      X′1 = 1: ( 0  1 ;  1  1 )
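The split into a noisy part P(X′i | Xi) and a deterministic threshold on the X′i can be checked numerically: contracting the noise matrices with the deterministic threshold tensor must reproduce the noisy-threshold CPT from the earlier table. A minimal sketch with illustrative inhibition probabilities (numpy assumed):

```python
import numpy as np

p = np.array([0.1, 0.2, 0.3])   # illustrative inhibition probabilities p_i
# Noise matrices P(X'_i | X_i): rows X_i, columns X'_i.
noise = [np.array([[1.0, 0.0], [pi, 1.0 - pi]]) for pi in p]

# Deterministic threshold tensor for ell = 2: T[a, b, c] = 1 iff a + b + c >= 2.
T = np.zeros((2, 2, 2))
for idx in np.ndindex(2, 2, 2):
    T[idx] = 1.0 if sum(idx) >= 2 else 0.0

# Marginalize out the noisy copies:
# P(Y = 1 | X) = sum_{x'} P(x'_1|x_1) P(x'_2|x_2) P(x'_3|x_3) T[x'].
cpt = np.einsum('ab,cd,ef,bdf->ace', noise[0], noise[1], noise[2], T)

print(round(float(cpt[1, 1, 0]), 3))  # -> 0.72, i.e. (1 - p1)(1 - p2)
```

Each entry of `cpt` agrees with the p′ column of the table on the earlier slide.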

SLIDE 15–17

Tensor of the threshold (for ℓ = 1) as 4D cube and its decomposition

(Figure: the 2×2×2×2 tensor over Y, X′1, X′2, X′3 with its unit entries marked, written as a sum of two rank-one tensors, one of which carries a −1 entry.)
SLIDE 18–21

CP tensor decomposition of the threshold CPT (Vomlel, Tichavský, 2012)

(Figure: inputs X′1, X′2, . . . , X′k connected through a hidden variable Y′.)

  P(Y = 1 | X′1, X′2, X′3) = Σ_{Y′} ∏_{i=1}^{3} ψ_{X′i,Y′} ,

where for i = 1, 2, 3

  ψ_{X′i,Y′} = (  ∛(1/6)     ∛(1/3)    −∛(1/2)
                 2·∛(1/6)   −∛(1/3)      0     )

with rows indexed by X′i ∈ {0, 1} and columns by the three states of Y′.

Instead of an array with 2^k entries we get k arrays with 2k entries!
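The matrix above can be verified numerically: reconstructing the tensor from ψ must reproduce the deterministic ℓ = 2 threshold over three inputs. A minimal check with numpy:

```python
import numpy as np

cbrt = lambda v: v ** (1.0 / 3.0)   # cube root of a positive number
# Rows: X'_i in {0, 1}; columns: the three states of Y'.
psi = np.array([[    cbrt(1/6),  cbrt(1/3), -cbrt(1/2)],
                [2 * cbrt(1/6), -cbrt(1/3),  0.0      ]])

# P(Y = 1 | X'_1, X'_2, X'_3) = sum_{Y'} prod_i psi[X'_i, Y'].
T = np.einsum('ar,br,cr->abc', psi, psi, psi)

# Deterministic threshold for ell = 2: 1 iff at least two inputs are 1.
thr = np.zeros((2, 2, 2))
for idx in np.ndindex(2, 2, 2):
    thr[idx] = 1.0 if sum(idx) >= 2 else 0.0

print(np.allclose(T, thr))  # -> True
```

For example the all-zero cell gives 1/6 + 1/3 − 1/2 = 0 and the all-one cell gives 8/6 − 1/3 + 0 = 1, as required.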

SLIDE 22–27

Probabilistic inference by weighted model counting (WMC)

  • The basic idea of WMC is to encode a Bayesian network as a formula in conjunctive normal form (CNF),
  • associate weights with literals according to the CPTs of the Bayesian network, and
  • compute the probability of evidence as the sum of the weights of all logical models consistent with that evidence.
  • The weight of a logical model is the product of the weights of all its literals.
  • Efficient WMC solvers exploiting several advanced techniques, such as clause learning and component caching, can be used – e.g. Cachet.
  • If the Bayesian network exhibits a lot of determinism, this is much more efficient than standard techniques.
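The recipe in the bullets above can be sketched as a brute-force weighted model counter (illustrative only; real solvers such as Cachet avoid enumerating all assignments):

```python
from itertools import product

def weighted_model_count(n_vars, clauses, weight):
    """Sum, over all satisfying assignments, of the product of literal weights.

    clauses -- list of tuples of non-zero ints: v means "var v is true",
               -v means "var v is false"
    weight  -- dict mapping a literal to its weight; missing literals weigh 1
    """
    total = 0.0
    for bits in product([False, True], repeat=n_vars):
        assign = {v + 1: bits[v] for v in range(n_vars)}
        if not all(any(assign[abs(l)] == (l > 0) for l in cl) for cl in clauses):
            continue  # assignment is not a logical model of the CNF
        w = 1.0
        for v, val in assign.items():
            w *= weight.get(v if val else -v, 1.0)
        total += w
    return total

# Toy CNF saying "exactly one of variables 1 and 2 is true", with weights
# playing the role of P(X = 0) = 0.25 and P(X = 1) = 0.75.
cnf = [(1, 2), (-1, -2)]
w = {1: 0.25, 2: 0.75}
print(weighted_model_count(2, cnf, w))           # -> 1.0  (sum over all states)
print(weighted_model_count(2, cnf + [(2,)], w))  # -> 0.75 (evidence: var 2 true)
```

Conditioning on evidence is just adding a unit clause, exactly as in the bullets above.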

SLIDE 28–33

Encoding the transformed BN2T as a CNF using the Chavira and Darwiche (2008) encoding

Clauses for indicator (λ) and parameter (θ) logical variables:

  • "states of Xi are mutually exclusive":  ⊕_{x ∈ Xi} λ^x_{Xi}
  • "states of Y′j are mutually exclusive":  ⊕_{y ∈ Y′j} λ^y_{Y′j}
  • "when the parameter of P(Xi) applies":  θ^x_{Xi} ⇔ λ^x_{Xi}
  • "when the parameter of ψ(Xi, Y′j) applies":  θ^{x,y}_{Xi,Y′j} ⇔ λ^x_{Xi} ∧ λ^y_{Y′j}

The weights of the positive literals:

  w(λ^x_{Xi}) = 1 and w(λ^y_{Y′j}) = 1
  w(θ^x_{Xi}) = P_{Xi}(x) and w(θ^{x,y}_{Xi,Y′j}) = ψ_{Xi,Y′j}(x, y)

The weights of all negative literals ¬A are one.
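For a single binary variable Xi the clauses and weights above look as follows. This minimal self-contained sketch (the variable numbering and the value P(Xi = 1) = 0.75 are illustrative) checks that the weighted count equals 1 over all states and equals P(Xi = 1) under the evidence Xi = 1:

```python
from itertools import product

# Boolean variables of the encoding for one binary Xi:
#   1 = lam_x0, 2 = lam_x1   (indicator variables)
#   3 = th_x0,  4 = th_x1    (parameter variables)
cnf = [
    (1, 2), (-1, -2),   # states of Xi are mutually exclusive (exactly one lam)
    (-3, 1), (3, -1),   # th_x0 <=> lam_x0, as two implications
    (-4, 2), (4, -2),   # th_x1 <=> lam_x1
]
# Positive-literal weights: indicators weigh 1, parameters carry P(Xi);
# all other literals (including negatives) weigh 1.
weight = {3: 0.25, 4: 0.75}

def wmc(n_vars, clauses, w):
    total = 0.0
    for bits in product([False, True], repeat=n_vars):
        assign = {v + 1: bits[v] for v in range(n_vars)}
        if all(any(assign[abs(l)] == (l > 0) for l in cl) for cl in clauses):
            prod = 1.0
            for v, val in assign.items():
                prod *= w.get(v if val else -v, 1.0)
            total += prod
    return total

print(wmc(4, cnf, weight))            # -> 1.0  (no evidence: P sums to one)
print(wmc(4, cnf + [(2,)], weight))   # -> 0.75 (evidence Xi = 1)
```

The equivalence clauses force exactly the parameter literal of the selected state to be true, so each logical model's weight is the corresponding CPT entry.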

SLIDE 34–37

A comparison with the standard approach based on WMC

  • Clauses consist of at most three literals.
  • In the standard approach, the number of literals in some clauses equals the number of parents of the corresponding variable.
  • The number of clauses is polynomial in the maximum number of parents of a variable.
  • In the standard approach, the number of clauses is exponential in the number of parents of the corresponding variable.

SLIDE 38–41

Conclusions

  • Noisy threshold models can be used in applications where more than one factor needs to be positive for a symptom to be present.
  • The noisy threshold is a generalization of noisy-or and noisy-max.
  • The CP tensor decomposition of the noisy threshold should be applied before constructing the CNF used for weighted model counting.
  • The CP tensor decomposition as a preprocessing step for WMC can also be utilized for other types of local structure in conditional probability tables.