SLIDE 1
Probabilistic Inference in BN2T Models by Weighted Model Counting
Jirka Vomlel
Institute of Information Theory and Automation, Academy of Sciences of the Czech Republic, http://www.utia.cz/vomlel
Aalborg, Denmark, November 21, 2013
SLIDE 2–6 A medical example: Carcinoid Heart Disease (CHD), van Gerven (2003)
- Eleven CHD risk factors X1, . . . , X11 are measured at patient admission to the clinic: diarrhea, hepatic metastases, etc.
- The dependent variable Y has two values: 0 if CHD does not develop, 1 if it does.
- The conditional probability P(Y | X1, . . . , X11) is modeled by a noisy threshold model with ℓ = 6.
- The threshold model (without noise) implies that CHD develops iff at least 6 risk factors are positive.
- The noise on the inputs allows a non-zero probability of no CHD even when at least 6 risk factors are positive.
SLIDE 9
BN2T - Bayesian Network with 2 Layers Consisting of Noisy Threshold Models
[Figure: a two-layer network with parents X1, X2, X3, X4 and children Y1, Y2.]
Yj takes value 1 iff at least ℓ out of k parents Xi take value 1. Assume ℓ = 2. For the deterministic threshold the value of p = P(Y1 = 1 | X1, X2, X3) is

X1 X2 X3 | p
 0  0  0 | 0
 1  0  0 | 0
 0  1  0 | 0
 0  0  1 | 0
 0  1  1 | 1
 1  0  1 | 1
 1  1  0 | 1
 1  1  1 | 1
SLIDE 10
For the noisy threshold the value of p′ = P(Y1 = 1 | X1, X2, X3) is

X1 X2 X3 | p | p′
 0  1  1 | 1 | (1 − p2)(1 − p3)
 1  0  1 | 1 | (1 − p1)(1 − p3)
 1  1  0 | 1 | (1 − p1)(1 − p2)
 1  1  1 | 1 | (1 − p1)(1 − p2)(1 − p3) + p1(1 − p2)(1 − p3) + (1 − p1)p2(1 − p3) + (1 − p1)(1 − p2)p3

(rows with fewer than two positive parents have p = p′ = 0), where p1, p2, p3 are the inhibitory probabilities.
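The p′ column can be reproduced by brute force. A minimal sketch (the inhibition probabilities below are made-up values, not from the slides): enumerate which positive inputs survive inhibition and sum the probabilities of the patterns where at least ℓ inputs survive.

```python
from itertools import product

def noisy_threshold_prob(x, inhib, ell):
    """P(Y = 1 | X = x) for a noisy threshold: each positive input X_i
    is independently inhibited with probability inhib[i]; Y = 1 iff at
    least `ell` non-inhibited inputs remain."""
    total = 0.0
    # Enumerate which inputs survive (an inactive input never fires).
    for survive in product([0, 1], repeat=len(x)):
        prob = 1.0
        for xi, si, pi in zip(x, survive, inhib):
            if xi == 0:
                prob = 0.0 if si == 1 else prob
            else:
                prob *= (1 - pi) if si == 1 else pi
        if sum(survive) >= ell:
            total += prob
    return total

# Check the (1, 1, 1) row of the table against its four-term formula
# (p1, p2, p3 are made-up inhibition probabilities):
p1, p2, p3 = 0.1, 0.2, 0.3
expected = ((1 - p1) * (1 - p2) * (1 - p3) + p1 * (1 - p2) * (1 - p3)
            + (1 - p1) * p2 * (1 - p3) + (1 - p1) * (1 - p2) * p3)
print(abs(noisy_threshold_prob((1, 1, 1), [p1, p2, p3], 2) - expected) < 1e-12)
```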
SLIDE 11
The joint probability of the Bayesian network is

P(X1, . . . , Xn, Y1, . . . , Ym) = ∏_{i=1}^{n} P(Xi) · ∏_{j=1}^{m} P(Yj | pa(Yj)) .
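The factorization can be checked by brute-force enumeration. A minimal sketch with made-up priors, n = 3, m = 1, and a deterministic (noise-free) threshold with ℓ = 2:

```python
from itertools import product

# A minimal BN2T sketch (hypothetical numbers): three inputs, one
# output Y with a deterministic threshold ell = 2 (noise omitted).
prior = [0.3, 0.5, 0.7]            # P(X_i = 1)
ell = 2

def joint(x, y):
    """P(X1, X2, X3, Y) as the product of priors and the threshold CPT."""
    p = 1.0
    for xi, pi in zip(x, prior):
        p *= pi if xi == 1 else 1 - pi
    p_y1_given_x = 1.0 if sum(x) >= ell else 0.0
    return p * (p_y1_given_x if y == 1 else 1 - p_y1_given_x)

# Marginal P(Y = 1): sum the joint over all parent configurations.
p_y1 = sum(joint(x, 1) for x in product([0, 1], repeat=3))
print(round(p_y1, 6))   # → 0.5
```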
SLIDE 12 Noisy-threshold with explicit deterministic and noisy parts
[Figure: each input Xi has a noisy child X′i, and Y is a deterministic threshold of X′1, X′2, . . . , X′k.]
SLIDE 13
P_{X′i | Xi} = ( 1    0
                 pi   1 − pi )

with rows indexed by Xi ∈ {0, 1} and columns by X′i ∈ {0, 1}: a positive input is inhibited with probability pi.
SLIDE 14
For ℓ = 2 the threshold CPT is

P_{Y=1 | X′1, X′2, X′3} = ( ( 0 0 )  ( 0 1 )
                            ( 0 1 )  ( 1 1 ) )

where we visualize the CPT as a tensor using nested matrices: the outer index is X′1, and each inner matrix is indexed by X′2 (rows) and X′3 (columns).
SLIDE 15–17 Tensor of the threshold (for ℓ = 1) as 4D cube and its decomposition
[Figure: the 2 × 2 × 2 × 2 CPT tensor over Y, X′1, X′2, X′3 drawn as a 4D cube of 0/1 entries, decomposed step by step into a sum of rank-1 terms, one of which carries a −1 entry.]
For ℓ = 1 the threshold is a logical OR, so the slice P_{Y=1 | X′1, X′2, X′3} equals the all-ones rank-1 tensor minus the rank-1 indicator of (X′1, X′2, X′3) = (0, 0, 0).
SLIDE 18–21 CP tensor decomposition of the threshold CPT (Vomlel, Tichavský, 2012)
[Figure: the hidden variable Y′ is connected to X′1, X′2, . . . , X′k.]

P_{Y=1 | X′1, X′2, X′3} = Σ_{Y′} ∏_{i=1}^{3} ψ_{X′i, Y′} ,

where each ψ_{X′i, Y′}, i = 1, 2, 3, is a fixed 2 × 2 matrix of real constants (with entries involving √3); see Vomlel and Tichavský (2012) for the exact values. Instead of one array with 2^k entries we get k arrays with 2k entries!
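The sum-over-Y′ form can be illustrated with the ℓ = 1 case, whose factors follow from the rank-2 decomposition shown earlier. The ψ matrices below are illustrative values for OR, not the √3-based matrices of the paper; the −1 weight of the second rank-1 term is folded into the first factor.

```python
from itertools import product

# Rank-2 factors realizing the ell = 1 (OR) threshold in the form
# P(Y = 1 | x) = sum over y' of prod_i psi_i[x_i][y'].
# Column y' = 0 is the all-ones term; y' = 1 is the indicator of
# (0, 0, 0) with a folded-in -1 weight.  (Illustrative values only.)
psi = [
    [[1.0, -1.0], [1.0, 0.0]],   # psi_1[x][y']
    [[1.0,  1.0], [1.0, 0.0]],   # psi_2[x][y']
    [[1.0,  1.0], [1.0, 0.0]],   # psi_3[x][y']
]

def p_y1(x):
    """Evaluate the CP form by summing over the hidden variable y'."""
    return sum(psi[0][x[0]][y] * psi[1][x[1]][y] * psi[2][x[2]][y]
               for y in (0, 1))

ok = all(p_y1(x) == (1.0 if sum(x) >= 1 else 0.0)
         for x in product([0, 1], repeat=3))
print(ok)   # → True
```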
SLIDE 22–27 Probabilistic inference by weighted model counting (WMC)
- The basic idea of WMC is to encode a Bayesian network as a conjunctive normal form (CNF) formula,
- associate weights with literals according to the CPTs of the Bayesian network, and
- compute the probability of evidence as the sum of the weights of all logical models consistent with that evidence.
- The weight of a logical model is the product of the weights of all its literals.
- Efficient WMC solvers exploiting several advanced techniques, such as clause learning and component caching, can be used – e.g., Cachet.
- If the Bayesian network exhibits a lot of determinism, this is much more efficient than standard techniques.
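The steps above can be sketched by brute-force enumeration (real solvers such as Cachet avoid enumerating all assignments; the clause set and weights below are made-up toy values):

```python
from itertools import product

def wmc(n_vars, clauses, w_pos, w_neg):
    """Sum, over all assignments satisfying every clause, of the product
    of literal weights.  A literal is +v or -v for variable v (1-based)."""
    total = 0.0
    for a in product([False, True], repeat=n_vars):
        if all(any((lit > 0) == a[abs(lit) - 1] for lit in clause)
               for clause in clauses):
            w = 1.0
            for v in range(1, n_vars + 1):
                w *= w_pos[v] if a[v - 1] else w_neg[v]
            total += w
    return total

# Toy example (made-up numbers): two independent events A, B with
# P(A) = 0.6, P(B) = 0.4.  The clause {A or B} keeps exactly the models
# consistent with that evidence, so the weighted count is
# P(A or B) = 1 - 0.4 * 0.6 = 0.76.
w_pos = {1: 0.6, 2: 0.4}
w_neg = {1: 0.4, 2: 0.6}
print(round(wmc(2, [{1, 2}], w_pos, w_neg), 10))   # → 0.76
```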
SLIDE 28–33
Encoding the transformed BN2T as a CNF using the Chavira and Darwiche (2008) encoding
Clauses for indicator (λ) and parameter (θ) logical variables:
"states of Xi are mutually exclusive": ⊕_{x ∈ Xi} λ^x_{Xi}
"states of Y′j are mutually exclusive": ⊕_{y ∈ Y′j} λ^y_{Y′j}
"when a parameter of P(Xi) applies": θ^x_{Xi} ⇔ λ^x_{Xi}
"when a parameter of ψ(Xi, Y′j) applies": θ^{x,y}_{Xi,Y′j} ⇔ λ^x_{Xi} ∧ λ^y_{Y′j}
The weights of all positive literals:
w(λ^x_{Xi}) = 1 and w(λ^y_{Y′j}) = 1,
w(θ^x_{Xi}) = P_{Xi}(x) and w(θ^{x,y}_{Xi,Y′j}) = ψ_{Xi,Y′j}(x, y).
The weights of all negative literals ¬A are all one.
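The encoding can be exercised on the smallest possible case. A minimal sketch for a single binary variable X with a made-up CPT: the indicator clauses force exactly one λ, the parameter clauses tie each θ to its λ, and the weighted count recovers both the partition sum and an evidence probability.

```python
from itertools import product

# Chavira-Darwiche-style encoding for one binary variable X with a
# made-up CPT P(X=0) = 0.7, P(X=1) = 0.3.  Propositional variables:
# indicators lam0, lam1 and parameters th0, th1.
P = {0: 0.7, 1: 0.3}

def wmc(evidence=None):
    total = 0.0
    for lam0, lam1, th0, th1 in product([False, True], repeat=4):
        lam, th = {0: lam0, 1: lam1}, {0: th0, 1: th1}
        if lam0 == lam1:                            # states mutually exclusive
            continue
        if any(th[x] != lam[x] for x in (0, 1)):    # th_x <=> lam_x
            continue
        if evidence is not None and not lam[evidence]:
            continue
        # Model weight: product of positive-literal weights; indicator
        # literals and all negative literals weigh 1.
        w = 1.0
        for x in (0, 1):
            if th[x]:
                w *= P[x]
        total += w
    return total

print(round(wmc(), 10))            # → 1.0  (partition sum)
print(round(wmc(evidence=1), 10))  # → 0.3  (P(X = 1))
```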
SLIDE 34–37 A comparison with the standard approach based on WMC
- Clauses consist of at most three literals.
- In the standard approach, some clauses contain as many literals as the corresponding variable has parents.
- The number of clauses is polynomial in the maximal number of parents of a variable.
- In the standard approach, the number of clauses is exponential in the number of parents of the corresponding variable.
SLIDE 38–41 Conclusions
- Noisy threshold models can be used in applications where more than one factor needs to be positive for a symptom to be present.
- The noisy threshold is a generalization of noisy-or and noisy-max.
- The CP tensor decomposition of the noisy threshold should be applied before the construction of the CNF used for weighted model counting.
- The CP tensor decomposition as a preprocessing step for WMC can also be utilized for other types of local structure in conditional probability tables.