Probabilistic Inference in BN2T Models by Weighted Model Counting

  1. Probabilistic Inference in BN2T Models by Weighted Model Counting. Jirka Vomlel, Institute of Information Theory and Automation, Academy of Sciences of the Czech Republic, http://www.utia.cz/vomlel. Aalborg, Denmark, November 21, 2013.

  2. A medical example: Carcinoid Heart Disease (CHD), van Gerven (2003)
  • Eleven CHD risk factors X1, ..., X11 measured at patient admission to the clinic: diarrhea, hepatic metastases, etc.
  • The dependent variable Y has two values: 0 if CHD does not develop or 1 if it does.
  • The conditional probability P(Y | X1, ..., X11) is modeled by a noisy threshold model with ℓ = 6.
  • The threshold model (without noise) implies that CHD develops if at least 6 risk factors are positive.
  • The noise on the inputs allows a non-zero probability of no CHD even if at least 6 risk factors are positive.

  3. BN2T: a Bayesian Network with 2 Layers Consisting of Noisy Threshold Models
  [figure: a two-layer network with parents X1, X2, X3, X4 in the top layer and children Y1, Y2 in the bottom layer]
  Yj takes value 1 iff at least ℓ out of its k parents Xi take value 1. Assume ℓ = 2.

  4. BN2T (continued). For the deterministic threshold (ℓ = 2), the value of p = P(Y1 = 1 | X1, X2, X3) is

      X1  X2  X3 | p
       0   0   0 | 0
       0   0   1 | 0
       0   1   0 | 0
       0   1   1 | 1
       1   0   0 | 0
       1   0   1 | 1
       1   1   0 | 1
       1   1   1 | 1

  5. BN2T (continued). For the noisy threshold, the value of p′ = P(Y1 = 1 | X1, X2, X3) is

      X1  X2  X3 | p | p′
       0   0   0 | 0 | 0
       0   0   1 | 0 | 0
       0   1   0 | 0 | 0
       0   1   1 | 1 | (1 − p2)(1 − p3)
       1   0   0 | 0 | 0
       1   0   1 | 1 | (1 − p1)(1 − p3)
       1   1   0 | 1 | (1 − p1)(1 − p2)
       1   1   1 | 1 | (1 − p1)(1 − p2)(1 − p3) + p1(1 − p2)(1 − p3) + (1 − p1)p2(1 − p3) + (1 − p1)(1 − p2)p3

  where p1, p2, p3 are inhibitory probabilities (the sketch below reproduces this column by brute force).
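Each p′ entry sums, over the configurations of the noisy inputs X′i, the probability of the noise pattern times the deterministic threshold on the surviving inputs. A minimal brute-force sketch; the inhibition values 0.2, 0.3, 0.4 are hypothetical:

```python
from itertools import product

def noisy_threshold(x, p, ell):
    """P(Y = 1 | X = x) for a noisy threshold model.

    x   : tuple of 0/1 parent values X_1, ..., X_k
    p   : inhibition probabilities; an active input X_i = 1 is
          flipped to X'_i = 0 with probability p[i]
    ell : Y = 1 iff at least ell of the noisy inputs X'_i are 1
    """
    total = 0.0
    for xp in product([0, 1], repeat=len(x)):   # all noisy-input patterns
        w = 1.0
        for xi, xpi, pi in zip(x, xp, p):
            if xi == 0:
                w *= 1.0 if xpi == 0 else 0.0   # inactive inputs stay 0
            else:
                w *= pi if xpi == 0 else 1.0 - pi
        if sum(xp) >= ell:
            total += w
    return total

# The last row of the p' column above, with hypothetical p = (0.2, 0.3, 0.4):
print(noisy_threshold((1, 1, 1), (0.2, 0.3, 0.4), ell=2))   # 0.788
```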

  6. BN2T (continued). The joint probability distribution of the Bayesian network is

      P(X1, ..., Xn, Y1, ..., Ym) = ∏_{i=1}^{n} P(Xi) · ∏_{j=1}^{m} P(Yj | pa(Yj)).
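So the probability of a full configuration is just n prior factors times m noisy-threshold factors. A tiny sketch of this factorization, runnable together with the noisy_threshold function above; the priors and the one-child network shape are hypothetical:

```python
# Reuses noisy_threshold from the sketch above.
prior = [0.1, 0.5, 0.3]    # hypothetical priors P(X_i = 1)
p_inh = [0.2, 0.3, 0.4]    # hypothetical inhibition probabilities

def joint(x, y, ell=2):
    """P(X = x, Y = y) for a tiny BN2T with a single child Y."""
    pr = 1.0
    for xi, qi in zip(x, prior):
        pr *= qi if xi == 1 else 1.0 - qi           # prior factors P(X_i)
    py1 = noisy_threshold(x, p_inh, ell)            # CPT factor P(Y = 1 | x)
    return pr * (py1 if y == 1 else 1.0 - py1)

print(joint((1, 1, 0), y=1))   # 0.1 * 0.5 * 0.7 * (0.8 * 0.7) = 0.0196
```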

  7. Noisy threshold with explicit deterministic and noisy parts
  [figure: each input Xi has a noisy copy X′i, and the noisy copies X′1, ..., X′k are the parents of the deterministic threshold node Y]

      P(X′i | Xi) = ( 1    pi   )
                    ( 0  1 − pi )

  with rows indexed by X′i and columns by Xi, and P(Y | X′1, X′2, X′3), where we visualize the CPT as a tensor using nested matrices: the Y = 0 slice is

      ( (1 1)  (1 0) )
      ( (1 0)  (0 0) )

  and the Y = 1 slice is

      ( (0 0)  (0 1) )
      ( (0 1)  (1 1) ),

  i.e. the deterministic threshold with ℓ = 2 applied to the noisy inputs.
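Contracting the noise CPTs against the deterministic threshold tensor must give back the noisy threshold CPT of slide 5. A NumPy sketch of that check, again with hypothetical inhibition values:

```python
import numpy as np
from itertools import product

k, ell = 3, 2
p = [0.2, 0.3, 0.4]    # hypothetical inhibition probabilities

# Noise CPTs P(X'_i | X_i), here with rows X_i and columns X'_i
# (the transpose of the matrix on the slide).
noise = [np.array([[1.0, 0.0],
                   [pi, 1.0 - pi]]) for pi in p]

# Deterministic threshold tensor T[x'_1, x'_2, x'_3, y].
T = np.zeros((2,) * k + (2,))
for xp in product([0, 1], repeat=k):
    T[xp + (int(sum(xp) >= ell),)] = 1.0

# Summing out the noisy inputs gives cpt[x1, x2, x3, y] = P(Y = y | X = x).
cpt = np.einsum('ia,jb,kc,abcy->ijky', noise[0], noise[1], noise[2], T)
print(cpt[1, 1, 1, 1])   # 0.788, the last row of the p' column on slide 5
```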

  8. Tensor of the threshold (for ℓ = 1) as a 4D cube and its decomposition
  [figure: the 2 × 2 × 2 × 2 threshold tensor over Y, X′1, X′2, X′3, drawn as a cube, decomposes into a sum of rank-one tensors, one term entering with coefficient −1]
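For ℓ = 1 the Y = 1 slice of the tensor is the logical OR of the inputs, and the figure's decomposition reads as the all-ones rank-one tensor minus the indicator of the all-zeros input. A quick NumPy check of this reading:

```python
import numpy as np

ones = np.array([1.0, 1.0])
e0 = np.array([1.0, 0.0])    # indicator of state 0

# Y = 1 slice of the ell = 1 threshold tensor (the logical OR):
# the all-ones rank-one tensor minus the indicator of the all-zeros input.
T1 = (np.einsum('a,b,c->abc', ones, ones, ones)
      - np.einsum('a,b,c->abc', e0, e0, e0))

for idx in np.ndindex(2, 2, 2):
    assert T1[idx] == float(any(idx))   # equals OR(x'_1, x'_2, x'_3)
print("two rank-one terms reproduce the OR tensor")
```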

  9. CP tensor decomposition of the threshold CPT (Vomlel, Tichavský, 2012)
  [figure: the noisy inputs X′1, ..., X′k become parents of a single new hidden variable Y′]

      P(Y = 1 | X′1, X′2, X′3) = Σ_{Y′} ∏_{i=1}^{3} ψ(X′i, Y′),

  where ψ(X′i, Y′) is one fixed 2 × 3 matrix of constants, the same for i = 1, 2, 3. Instead of one array with 2^k entries we get k arrays with 2k entries!
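The payoff is evaluation cost: with factor matrices ψ, P(Y = 1 | x′) is a sum over the r values of Y′ of products of k table lookups, O(kr) work per configuration. A sketch using rank-2 factors for the ℓ = 1 case read off the decomposition above (these factors are an illustration; for the actual threshold factors see Vomlel and Tichavský, 2012):

```python
import numpy as np

# Rank-2 factor matrices for the ell = 1 threshold (logical OR), read off
# the decomposition above; the -1 coefficient is folded into psi[0].
# psi[i][x', y'] has rows x' in {0, 1} and columns for the values of Y'.
k = 3
psi = [np.array([[1.0, -1.0],
                 [1.0,  0.0]])]
psi += [np.array([[1.0, 1.0],
                  [1.0, 0.0]]) for _ in range(k - 1)]

def p_y1(x):
    """P(Y = 1 | X' = x) as sum over Y' of prod_i psi[i][x_i, Y']."""
    prod = np.ones(psi[0].shape[1])
    for xi, f in zip(x, psi):
        prod *= f[xi]           # O(k * r) work, no 2^k table anywhere
    return prod.sum()

print(p_y1((0, 0, 0)), p_y1((0, 1, 0)))   # 0.0 1.0, i.e. the OR function
```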

  10. Probabilistic inference by weighted model counting (WMC)
  • The basic idea of WMC is to encode a Bayesian network as a conjunctive normal form (CNF) formula,
  • associate weights with literals according to the CPTs of the Bayesian network, and
  • compute the probability of evidence as the sum of the weights of all logical models consistent with that evidence.
  • The weight of a logical model is the product of the weights of all its literals.
  • Efficient WMC solvers exploiting advanced techniques such as clause learning and component caching can be used, e.g. Cachet.
  • If the Bayesian network exhibits a lot of determinism, this is much more efficient than standard techniques. (A brute-force sketch of the weighted count follows below.)
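A minimal brute-force sketch of the weighted count; a real solver like Cachet explores the same space with clause learning and component caching instead of enumeration, and the tiny two-variable CNF here is hypothetical:

```python
from itertools import product

def wmc(clauses, weights, n_vars, evidence=()):
    """Brute-force weighted model counting.

    clauses  : CNF as a list of clauses, each a list of literals
               (+v for variable v true, -v for v false, v = 1..n_vars)
    weights  : weights[v] = (weight of literal -v, weight of literal +v)
    evidence : literals that must hold in every counted model
    """
    total = 0.0
    for assignment in product([False, True], repeat=n_vars):
        def truth(lit):
            return assignment[abs(lit) - 1] == (lit > 0)
        if all(any(truth(l) for l in cl) for cl in clauses) and \
                all(truth(l) for l in evidence):
            w = 1.0   # weight of a model = product of its literal weights
            for v in range(1, n_vars + 1):
                w *= weights[v][int(assignment[v - 1])]
            total += w
    return total

# Hypothetical toy CNF: variable 2 is forced equal to variable 1 by the
# clauses (-1 v 2) and (1 v -2); P(+1) = 0.3, variable 2 is unweighted.
clauses = [[-1, 2], [1, -2]]
weights = {1: (0.7, 0.3), 2: (1.0, 1.0)}
print(wmc(clauses, weights, n_vars=2, evidence=(2,)))   # 0.3
```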

  11. Encoding the transformed BN2T as a CNF using the Chavira and Darwiche (2008) encoding
  Clauses for indicator (λ) and parameter (θ) logical variables:

      ⊕_{x ∈ X_i} λ_x       for each network variable X_i    ("the states of X_i are mutually exclusive")
      ⊕_{y ∈ Y′_j} λ_y      for each hidden variable Y′_j    ("the states of Y′_j are mutually exclusive")
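In CNF, one such exactly-one constraint over a variable's indicator literals expands into an "at least one" clause plus pairwise "at most one" clauses. A sketch of that expansion (the literal numbering is hypothetical, not the paper's exact variable layout):

```python
from itertools import combinations

def exactly_one(literals):
    """CNF clauses forcing exactly one of the given literals to be true."""
    clauses = [list(literals)]                                   # at least one
    clauses += [[-a, -b] for a, b in combinations(literals, 2)]  # at most one
    return clauses

# Indicator literals for the two states of a binary variable X_i
# (the numbering 1, 2 is hypothetical):
print(exactly_one([1, 2]))   # [[1, 2], [-1, -2]]
```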
