The Bayesian Network Framework


Chapter 4: The Bayesian Network Framework. The network formalism, informal: a Bayesian network combines two types of domain knowledge to represent a joint probability distribution — qualitative knowledge, a (minimal) directed I-map, and quantitative knowledge, the assessment functions associated with the nodes of that I-map.


  1. Pearl's computational architecture
In Pearl's algorithm the graph of a Bayesian network is used as a computational architecture:
• each node in the graph is an autonomous object;
• each object has a local memory that stores the assessment functions of the associated node;
• each object has available a local processor that can do (simple) probabilistic computations;
• each arc in the graph is a (bi-directional) communication channel, through which connected objects can send each other messages.

  2.–6. A computational architecture
(Figure-only slides: successive frames of an animation in which the nodes of a small graph exchange 'count' messages along their arcs.)

  7. Understanding Pearl: single arc (1)
Consider Bayesian network B with the following graph: V1 → V2, where V1 has assessment function γ(V1) and V2 has γ(V2 | V1). Let Pr be the joint distribution defined by B. We consider the situation without evidence.
• Can node V1 compute the probabilities Pr(V1)? If so, how?
• Can node V2 compute the probabilities Pr(V2)? If so, how?

  8. Understanding Pearl: single arc (2)
Consider Bayesian network B with graph V1 → V2, where V1 holds γ(v1), γ(¬v1) and V2 holds γ(v2 | v1), γ(¬v2 | v1), γ(v2 | ¬v1), γ(¬v2 | ¬v1). Let Pr be the joint distribution defined by B. We consider the situation without evidence.
• Node V1 can determine the probabilities for its own values: Pr(v1) = γ(v1), Pr(¬v1) = γ(¬v1).
• Node V2 cannot determine Pr(V2) on its own, but it does know all four conditional probabilities: Pr(V2 | V1) = γ(V2 | V1). Given information from V1, node V2 can compute its probabilities:
Pr(v2) = Pr(v2 | v1) · Pr(v1) + Pr(v2 | ¬v1) · Pr(¬v1)
Pr(¬v2) = Pr(¬v2 | v1) · Pr(v1) + Pr(¬v2 | ¬v1) · Pr(¬v1)
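This single-arc computation is easy to mirror in code. Below is a minimal Python sketch; since the slide keeps γ symbolic, the numbers used here are made up for illustration only.

```python
# Single arc V1 -> V2: node V2 marginalises out V1 using the message from V1.
# The gamma values below are illustrative placeholders, not from the slides.
gamma_v1 = {True: 0.6, False: 0.4}                      # gamma(v1), gamma(~v1)
gamma_v2_given_v1 = {True: {True: 0.3, False: 0.7},     # gamma(v2|v1), gamma(~v2|v1)
                     False: {True: 0.8, False: 0.2}}    # gamma(v2|~v1), gamma(~v2|~v1)

# Node V1: Pr(V1) = gamma(V1)
pr_v1 = dict(gamma_v1)

# Node V2: Pr(v2) = sum over values c of V1 of gamma(v2 | c) * Pr(c)
pr_v2 = {val: sum(gamma_v2_given_v1[c][val] * pr_v1[c] for c in (True, False))
         for val in (True, False)}
print(pr_v1, pr_v2)
```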

  9. Understanding Pearl: directed path (1)
Consider Bayesian network B with the following graph: V1 → V2 → V3, with assessment functions γ(V1), γ(V2 | V1), and γ(V3 | V2). We consider the situation without evidence.
• Can node V1 compute the probabilities Pr(V1)?
• Can node V2 compute the probabilities Pr(V2)?
• Can node V3 compute the probabilities Pr(V3)? If so, how?

  10. Understanding Pearl: directed path (2)
Consider Bayesian network B with the chain V1 → V2 → V3, where V1 holds γ(v1), γ(¬v1); V2 holds γ(v2 | v1), γ(¬v2 | v1), γ(v2 | ¬v1), γ(¬v2 | ¬v1); and V3 holds γ(v3 | v2), γ(¬v3 | v2), γ(v3 | ¬v2), γ(¬v3 | ¬v2). We consider the situation without evidence.
Given information from V1, node V2 can compute Pr(v2) and Pr(¬v2). Node V2 now sends node V3 the required information; node V3 computes:
Pr(v3) = Pr(v3 | v2) · Pr(v2) + Pr(v3 | ¬v2) · Pr(¬v2) = γ(v3 | v2) · Pr(v2) + γ(v3 | ¬v2) · Pr(¬v2)
Pr(¬v3) = γ(¬v3 | v2) · Pr(v2) + γ(¬v3 | ¬v2) · Pr(¬v2)

  11. Introduction to causal parameters
Reconsider Bayesian network B (graph V1 → V2) without observations. Node V1 sends a message π^{V1}_{V2} enabling V2 to compute the probabilities for its values.
This message is a function π^{V1}_{V2}: {v1, ¬v1} → [0, 1] that attaches a number to each value of V1, such that
Σ_{c_{V1}} π^{V1}_{V2}(c_{V1}) = 1
The function π^{V1}_{V2} is called the causal parameter from V1 to V2.

  12. Causal parameters: an example
Consider the following Bayesian network without observations (chain V1 → V2 → V3):
γ(v1) = 0.7, γ(¬v1) = 0.3
γ(v2 | v1) = 0.2, γ(¬v2 | v1) = 0.8, γ(v2 | ¬v1) = 0.5, γ(¬v2 | ¬v1) = 0.5
γ(v3 | v2) = 0.6, γ(¬v3 | v2) = 0.4, γ(v3 | ¬v2) = 0.1, γ(¬v3 | ¬v2) = 0.9
Node V1:
• receives no messages;
• computes and sends to V2 the causal parameter π^{V1}_{V2}, with π^{V1}_{V2}(v1) = γ(v1) = 0.7 and π^{V1}_{V2}(¬v1) = 0.3.
Node V1 computes Pr(V1): Pr(v1) = π^{V1}_{V2}(v1) = 0.7; Pr(¬v1) = 0.3.

  13. Causal parameters: an example (cntd)
(Network as on the previous slide.) Node V2:
• receives the causal parameter π^{V1}_{V2} from V1;
• computes and sends to V3 the causal parameter π^{V2}_{V3}, with
π^{V2}_{V3}(v2) = Pr(v2 | v1) · Pr(v1) + Pr(v2 | ¬v1) · Pr(¬v1) = γ(v2 | v1) · π^{V1}_{V2}(v1) + γ(v2 | ¬v1) · π^{V1}_{V2}(¬v1) = 0.2 · 0.7 + 0.5 · 0.3 = 0.29
π^{V2}_{V3}(¬v2) = 0.8 · 0.7 + 0.5 · 0.3 = 0.71
Node V2 computes Pr(V2): Pr(v2) = π^{V2}_{V3}(v2) = 0.29; Pr(¬v2) = 0.71.

  14. Causal parameters: an example (cntd)
(Network as before.) Node V3:
• receives the causal parameter π^{V2}_{V3} from V2;
• sends no messages.
Node V3 computes Pr(V3):
Pr(v3) = γ(v3 | v2) · π^{V2}_{V3}(v2) + γ(v3 | ¬v2) · π^{V2}_{V3}(¬v2) = 0.6 · 0.29 + 0.1 · 0.71 = 0.245
Pr(¬v3) = 0.4 · 0.29 + 0.9 · 0.71 = 0.755 □
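The forward propagation of slides 12–14 can be checked with a few lines of Python. The sketch below uses exactly the γ values of the example and should reproduce 0.7, 0.29 and 0.245; the helper name send_pi and the dictionary layout are ad-hoc choices.

```python
# Causal-parameter propagation on the chain V1 -> V2 -> V3 (slides 12-14).
gamma_v1 = {True: 0.7, False: 0.3}
gamma_v2 = {True: {True: 0.2, False: 0.8},   # parent v1:  gamma(v2|v1), gamma(~v2|v1)
            False: {True: 0.5, False: 0.5}}  # parent ~v1: gamma(v2|~v1), gamma(~v2|~v1)
gamma_v3 = {True: {True: 0.6, False: 0.4},
            False: {True: 0.1, False: 0.9}}

def send_pi(pi_parent, gamma_child):
    """Causal parameter for the child: marginalise the parent out of gamma."""
    return {val: sum(gamma_child[c][val] * pi_parent[c] for c in (True, False))
            for val in (True, False)}

pi_12 = dict(gamma_v1)            # message from V1 to V2: 0.7 / 0.3
pi_23 = send_pi(pi_12, gamma_v2)  # message from V2 to V3: 0.29 / 0.71
pr_v3 = send_pi(pi_23, gamma_v3)  # Pr(V3): 0.245 / 0.755 (V3 has no child to send to)
print(pi_12, pi_23, pr_v3)
```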

  15. Understanding Pearl: simple chains
Consider Bayesian networks B with the following graphs (three nodes, with differently oriented arcs), for example:
• V1 ← V2 → V3, with assessment functions γ(v2), γ(¬v2); γ(v1 | v2), γ(¬v1 | v2), γ(v1 | ¬v2), γ(¬v1 | ¬v2); γ(v3 | v2), γ(¬v3 | v2), γ(v3 | ¬v2), γ(¬v3 | ¬v2);
• V1 → V2 ← V3, with assessment functions γ(v1), γ(¬v1); γ(v3), γ(¬v3); γ(v2 | v1 ∧ v3), γ(v2 | v1 ∧ ¬v3), γ(v2 | ¬v1 ∧ v3), γ(v2 | ¬v1 ∧ ¬v3); …
We consider the situation without observations. In each of the above networks, can nodes V1, V2, and V3 compute the probabilities Pr(V1), Pr(V2), and Pr(V3), respectively? And if so, how?

  16. Understanding Pearl with evidence (1)
Consider Bayesian network B (graph V1 → V2) with evidence V1 = true (v1). Node V1 updates its probabilities and its causal parameter:
π^{V1}_{V2}(v1) = Pr_{v1}(v1) = Pr(v1 | v1) = 1
π^{V1}_{V2}(¬v1) = Pr_{v1}(¬v1) = 0
Given the updated information from V1, node V2 updates the probabilities for its own values:
Pr_{v1}(v2) = γ(v2 | v1) · π^{V1}_{V2}(v1) + γ(v2 | ¬v1) · π^{V1}_{V2}(¬v1) = γ(v2 | v1)
Pr_{v1}(¬v2) = γ(¬v2 | v1) · π^{V1}_{V2}(v1) + γ(¬v2 | ¬v1) · π^{V1}_{V2}(¬v1) = γ(¬v2 | v1)
Note that the assessment function γ_{V1} itself remains unchanged!

  17. Understanding Pearl with evidence (2a)
Consider Bayesian network B with graph V1 → V2, where V1 holds γ(v1), γ(¬v1) and V2 holds γ(v2 | v1), γ(¬v2 | v1), γ(v2 | ¬v1), γ(¬v2 | ¬v1). Suppose we have evidence V2 = true for node V2.
• Can node V1 compute the probabilities Pr_{v2}(V1)? If so, how?
• Can node V2 compute the probabilities Pr_{v2}(V2)? If so, how?

  18. Understanding Pearl with evidence (2b)
Consider Bayesian network B (graph V1 → V2) with evidence V2 = true. Node V1 cannot update its probabilities using its own knowledge; it requires information from V2! What information does V1 require? Consider the following properties:
Pr_{v2}(v1) = Pr(v2 | v1) · Pr(v1) / Pr(v2) ∝ Pr(v2 | v1) · Pr(v1)
Pr_{v2}(¬v1) = Pr(v2 | ¬v1) · Pr(¬v1) / Pr(v2) ∝ Pr(v2 | ¬v1) · Pr(¬v1)

  19. Introduction to diagnostic parameters
Reconsider Bayesian network B (graph V1 → V2): node V2 sends a message λ^{V1}_{V2} enabling V1 to update the probabilities for its values.
This message is a function λ^{V1}_{V2}: {v1, ¬v1} → [0, 1] that attaches a number to each value of V1. The message basically tells V1 what node V2 knows about V1; in general:
Σ_{c_{V1}} λ^{V1}_{V2}(c_{V1}) ≠ 1
The function λ^{V1}_{V2} is called the diagnostic parameter from V2 to V1.

  20. Diagnostic parameters: an example
Consider the following Bayesian network B (graph V1 → V2) with evidence V2 = true:
γ(v1) = 0.8, γ(¬v1) = 0.2
γ(v2 | v1) = 0.4, γ(¬v2 | v1) = 0.6, γ(v2 | ¬v1) = 0.9, γ(¬v2 | ¬v1) = 0.1
Node V2 computes and sends to V1 the diagnostic parameter λ^{V1}_{V2}, with
λ^{V1}_{V2}(v1) = Pr(v2 | v1) = γ(v2 | v1) = 0.4
λ^{V1}_{V2}(¬v1) = γ(v2 | ¬v1) = 0.9
Note that Σ_{c_{V1}} λ^{V1}_{V2}(c_{V1}) = 1.3 > 1!

  21. Diagnostic parameters: an example (cntd)
(Network as on the previous slide.) Node V1 receives from V2 the diagnostic parameter λ^{V1}_{V2} and computes:
Pr_{v2}(v1) = α · Pr(v2 | v1) · Pr(v1) = α · λ^{V1}_{V2}(v1) · γ(v1) = α · 0.4 · 0.8 = α · 0.32
Pr_{v2}(¬v1) = α · λ^{V1}_{V2}(¬v1) · γ(¬v1) = α · 0.9 · 0.2 = α · 0.18
Node V1 now normalises its probabilities using Pr_{v2}(v1) + Pr_{v2}(¬v1) = 1:
α · 0.32 + α · 0.18 = 1 ⇒ α = 2
resulting in Pr_{v2}(v1) = 0.64 and Pr_{v2}(¬v1) = 0.36. □
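A small Python sketch of this diagnostic update, using the numbers of slides 20–21; the names lam_21, unnorm and alpha are ad-hoc choices for the illustration.

```python
# Diagnostic-parameter update on V1 -> V2 with evidence V2 = true (slides 20-21).
gamma_v1 = {True: 0.8, False: 0.2}
gamma_v2 = {True: {True: 0.4, False: 0.6},
            False: {True: 0.9, False: 0.1}}

# lambda message from V2 to V1: for each value of V1, the probability of the evidence.
lam_21 = {c: gamma_v2[c][True] for c in (True, False)}        # 0.4 / 0.9

# Node V1 fuses its prior with the lambda message and normalises.
unnorm = {c: lam_21[c] * gamma_v1[c] for c in (True, False)}  # 0.32 / 0.18
alpha = 1.0 / sum(unnorm.values())                            # alpha = 2
posterior_v1 = {c: alpha * p for c, p in unnorm.items()}      # 0.64 / 0.36
print(posterior_v1)
```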

  22. Understanding Pearl: directed path with evidence
Consider Bayesian network B with the chain V1 → V2 → V3 and assessment functions γ(v1), γ(¬v1); γ(v2 | v1), γ(¬v2 | v1), γ(v2 | ¬v1), γ(¬v2 | ¬v1); γ(v3 | v2), γ(¬v3 | v2), γ(v3 | ¬v2), γ(¬v3 | ¬v2). Suppose we have evidence V3 = true for node V3.
• Can node V1 compute the probabilities Pr_{v3}(V1)?
• Can node V2 compute the probabilities Pr_{v3}(V2)? If so, how?
• Can node V3 compute the probabilities Pr_{v3}(V3)?
What if node V1, node V2, or both have evidence instead?

  23. Pearl on directed paths – an example (1)
Consider Bayesian network B (chain V1 → V2 → V3) with evidence V3 = true. Node V1:
• receives the diagnostic parameter λ^{V1}_{V2}(V1);
• computes and sends to V2 the causal parameter π^{V1}_{V2}(V1) = γ(V1).
Node V1 computes:
Pr_{v3}(v1) = α · Pr(v3 | v1) · Pr(v1) = α · λ^{V1}_{V2}(v1) · γ(v1)
Pr_{v3}(¬v1) = α · Pr(v3 | ¬v1) · Pr(¬v1) = α · λ^{V1}_{V2}(¬v1) · γ(¬v1)

  24. Pearl on directed paths – an example (2)
Node V2:
• receives the causal parameter π^{V1}_{V2}(V1);
• receives the diagnostic parameter λ^{V2}_{V3}(V2);
• computes and sends to V3 the causal parameter π^{V2}_{V3}(V2).
Node V2 also computes and sends to V1 the diagnostic parameter λ^{V1}_{V2}(V1), with
λ^{V1}_{V2}(v1) = Pr(v3 | v1) = Pr(v3 | v2) · Pr(v2 | v1) + Pr(v3 | ¬v2) · Pr(¬v2 | v1) = λ^{V2}_{V3}(v2) · γ(v2 | v1) + λ^{V2}_{V3}(¬v2) · γ(¬v2 | v1)
λ^{V1}_{V2}(¬v1) = Pr(v3 | ¬v1) = …
The node then computes Pr_{v3}(V2) … How?

  25. Pearl on directed paths – an example (3)
Node V3:
• receives the causal parameter π^{V2}_{V3}(V2);
• computes and sends to V2 the diagnostic parameter λ^{V2}_{V3}(V2), with
λ^{V2}_{V3}(v2) = Pr(v3 | v2) = γ(v3 | v2)
λ^{V2}_{V3}(¬v2) = Pr(v3 | ¬v2) = γ(v3 | ¬v2)
• computes Pr_{v3}(V3). □
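A sketch of the full π/λ exchange on the chain with evidence V3 = true. Slides 23–25 keep the numbers symbolic, so the code below reuses the assessment functions of slides 12–14; the resulting posteriors (about 0.571 for v1 and 0.710 for v2) are not stated on the slides but follow from those numbers.

```python
# Pearl's message passing on the chain V1 -> V2 -> V3 with evidence V3 = true.
# Numbers reused from slides 12-14 for illustration.
gamma_v1 = {True: 0.7, False: 0.3}
gamma_v2 = {True: {True: 0.2, False: 0.8}, False: {True: 0.5, False: 0.5}}
gamma_v3 = {True: {True: 0.6, False: 0.4}, False: {True: 0.1, False: 0.9}}

def normalise(f):
    s = sum(f.values())
    return {k: v / s for k, v in f.items()}

# Downward (causal) messages, unaffected by the evidence at V3:
pi_12 = dict(gamma_v1)
pi_23 = {v: sum(gamma_v2[c][v] * pi_12[c] for c in (True, False)) for v in (True, False)}

# Upward (diagnostic) messages triggered by the evidence V3 = true:
lam_23 = {v2: gamma_v3[v2][True] for v2 in (True, False)}                  # 0.6 / 0.1
lam_12 = {v1: sum(lam_23[v2] * gamma_v2[v1][v2] for v2 in (True, False))  # Pr(v3 | V1)
          for v1 in (True, False)}

# Data fusion at each node: Pr_{v3}(Vi) = alpha * pi(Vi) * lambda(Vi)
post_v1 = normalise({v: lam_12[v] * pi_12[v] for v in (True, False)})  # ~0.571 / 0.429
post_v2 = normalise({v: lam_23[v] * pi_23[v] for v in (True, False)})  # ~0.710 / 0.290
print(post_v1, post_v2)
```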

  26. Understanding Pearl: simple chain with evidence
Consider again the Bayesian networks B of slide 15 (for example V1 ← V2 → V3 with γ(V2), γ(V1 | V2), γ(V3 | V2), and V1 → V2 ← V3 with γ(V1), γ(V3), γ(V2 | V1 ∧ V3), …). Suppose we have evidence V3 = true for V3. Answer the following questions for each network: can nodes V1, V2, and V3 compute the probabilities Pr_{v3}(V1), Pr_{v3}(V2), and Pr_{v3}(V3), respectively? And if so, how?

  27. The parameters as messages
Consider the graph of a Bayesian network as a computational architecture. The separate causal and diagnostic parameters can be considered messages that are passed between objects through the communication channels: along an arc (Vj, Vi), node Vj sends π^{Vj}_{Vi} downwards to Vi and node Vi sends λ^{Vj}_{Vi} upwards to Vj, and likewise along an arc (Vi, Vk) for π^{Vi}_{Vk} and λ^{Vi}_{Vk}.

  28. Pearl's algorithm (high-level)
Let B = (G, Γ) be a Bayesian network with G = (V_G, A_G); let Pr be the joint distribution defined by B. For each Vi ∈ V_G do:
• await messages from parents (if any) and compute π(Vi);
• await messages from children (if any) and compute λ(Vi);
• compute and send messages π^{Vi}_{Vij}(Vi) to all children Vij;
• compute and send messages λ^{Vjk}_{Vi}(Vjk) to all parents Vjk;
• compute Pr(Vi | c_E) for evidence c_E (if any).
In the prior network, message passing starts at 'root' nodes; upon processing evidence, message passing is initiated at the observed nodes.

  29. Notation: partial configurations
Definition: A random variable Vj ∈ V is called instantiated if evidence Vj = true or Vj = false is obtained; otherwise Vj is called uninstantiated. Let E ⊆ V be the subset of instantiated variables. The obtained configuration c_E is called a partial configuration of V, written c̃_V.
Example: Consider V = {V1, V2, V3}. If no evidence is obtained (E = ∅), then c̃_V = T(rue). If evidence V2 = false is obtained, then c̃_V = ¬v2.
Note: with c̃_V we can refer to evidence without specifying E.

  30. Singly connected graphs (SCGs)
Definition: A directed graph G is called singly connected if the underlying (undirected) graph of G is acyclic.
Example: (figure of a singly connected graph)
Lemma: Let G be a singly connected graph. Each graph that is obtained from G by removing an arc is not connected.
Definition: A (directed) tree is a singly connected graph in which each node has at most one incoming arc.

  31. Notation: lowergraphs and uppergraphs
Definition: Let G = (V_G, A_G) be a singly connected graph and let G_{(Vi,Vj)} be the subgraph of G after removing the arc (Vi, Vj) ∈ A_G:
G_{(Vi,Vj)} = (V_G, A_G \ {(Vi, Vj)})
Now consider a node Vi ∈ V_G:
• For each node Vj ∈ ρ(Vi), let G^+_{(Vj,Vi)} be the component of G_{(Vj,Vi)} that contains Vj; G^+_{(Vj,Vi)} is called an uppergraph of Vi.
• For each node Vk ∈ σ(Vi), let G^−_{(Vi,Vk)} be the component of G_{(Vi,Vk)} that contains Vk; G^−_{(Vi,Vk)} is called a lowergraph of Vi.

  32. An example
Consider the singly connected graph with arcs (V1, V0), (V2, V0), (V0, V3), and (V0, V4). Node V0 has:
– two uppergraphs, G^+_{(V1,V0)} and G^+_{(V2,V0)};
– two lowergraphs, G^−_{(V0,V3)} and G^−_{(V0,V4)}.
For this graph we have, for example, that
I(V_{G^+_{(V1,V0)}}, {V0}, V_{G^−_{(V0,V3)}}),
I(V_{G^−_{(V0,V3)}}, {V0}, V_{G^−_{(V0,V4)}}), and
I(V_{G^+_{(V1,V0)}}, ∅, V_{G^+_{(V2,V0)}}).

  33. Computing probabilities in singly connected graphs
Lemma: Let B = (G, Γ) be a Bayesian network with singly connected graph G = (V_G, A_G), where V_G = V = {V1, …, Vn}, n ≥ 1; let Pr be the joint distribution defined by B. For Vi ∈ V, let
Vi^+ = ∪_{Vj ∈ ρ(Vi)} V_{G^+_{(Vj,Vi)}}  and  Vi^− = V \ Vi^+.
Then
Pr(Vi | c̃_V) = α · Pr(c̃_{Vi^−} | Vi) · Pr(Vi | c̃_{Vi^+})
where c̃_V = c̃_{Vi^−} ∧ c̃_{Vi^+} and α is a normalisation constant.

  34. Computing probabilities in singly connected graphs
Proof:
Pr(Vi | c̃_V) = Pr(Vi | c̃_{Vi^−} ∧ c̃_{Vi^+})
= Pr(c̃_{Vi^−} | Vi) · Pr(c̃_{Vi^+} | Vi) · Pr(Vi) / Pr(c̃_{Vi^−} ∧ c̃_{Vi^+})   (using the independence of Vi^− and Vi^+ given Vi)
= Pr(c̃_{Vi^−} | Vi) · Pr(Vi | c̃_{Vi^+}) · Pr(c̃_{Vi^+}) / Pr(c̃_{Vi^−} ∧ c̃_{Vi^+})
= α · Pr(c̃_{Vi^−} | Vi) · Pr(Vi | c̃_{Vi^+})
where α = 1 / Pr(c̃_{Vi^−} | c̃_{Vi^+}). □

  35. Compound parameters: definition
Definition: Let B = (G, Γ) be a Bayesian network with singly connected graph G = (V_G, A_G); let Pr be the joint distribution defined by B. For Vi ∈ V_G, let Vi^+ and Vi^− be as before;
• the function π: {vi, ¬vi} → [0, 1] for node Vi defined by π(Vi) = Pr(Vi | c̃_{Vi^+}) is called the compound causal parameter for Vi;
• the function λ: {vi, ¬vi} → [0, 1] for node Vi defined by λ(Vi) = Pr(c̃_{Vi^−} | Vi) is called the compound diagnostic parameter for Vi.

  36. Computing probabilities in singly connected graphs
Lemma ('Data Fusion'): Let B = (G, Γ) be a Bayesian network with singly connected graph G = (V_G, A_G); let Pr be the joint distribution defined by B. Then for each Vi ∈ V_G:
Pr(Vi | c̃_{V_G}) = α · π(Vi) · λ(Vi)
with compound causal parameter π, compound diagnostic parameter λ, and normalisation constant α.
Proof: Follows directly from the previous lemma and the definitions of the compound parameters. □
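In code, data fusion at a node is a one-line combination followed by normalisation. A minimal sketch: the values fed in below are the compound parameters of the single-arc example of slides 20–21 (π(V1) = γ(V1) and λ(V1) = λ^{V1}_{V2}), so the call should reproduce 0.64 / 0.36.

```python
# Data fusion at a single node: Pr(Vi | evidence) = alpha * pi(Vi) * lambda(Vi).
def fuse(pi, lam):
    """Combine compound causal and diagnostic parameters and normalise."""
    unnorm = {v: pi[v] * lam[v] for v in pi}
    alpha = 1.0 / sum(unnorm.values())
    return {v: alpha * p for v, p in unnorm.items()}

print(fuse(pi={True: 0.8, False: 0.2},      # pi(V1) = gamma(V1), slide 20
           lam={True: 0.4, False: 0.9}))    # lambda(V1) from slide 20  -> 0.64 / 0.36
```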

  37. The separate parameters defined
Definition: Let B = (G, Γ) be a Bayesian network with singly connected graph G = (V_G, A_G); let Pr be the joint distribution defined by B. Let Vi ∈ V_G be a node with child Vk ∈ σ(Vi) and parent Vj ∈ ρ(Vi);
• the function π^{Vi}_{Vk}: {vi, ¬vi} → [0, 1] defined by π^{Vi}_{Vk}(Vi) = Pr(Vi | c̃_{V_{G^+_{(Vi,Vk)}}}) is called the causal parameter from Vi to Vk;
• the function λ^{Vj}_{Vi}: {vj, ¬vj} → [0, 1] defined by λ^{Vj}_{Vi}(Vj) = Pr(c̃_{V_{G^−_{(Vj,Vi)}}} | Vj) is called the diagnostic parameter from Vi to Vj.

  38. (Figure: the node sets V_{G^+_{(Vi,Vk)}} and V_{G^−_{(Vj,Vi)}} from the definitions of the separate parameters, shown around the arcs (Vi, Vk) and (Vj, Vi).)

  39. Separate parameters in directed trees
(Figure: in a directed tree, V_{G^+_{(Vi,Vk)}} coincides with Vk^+ and V_{G^−_{(Vj,Vi)}} coincides with Vi^−.)

  40. Computing compound causal parameters in singly connected graphs
Lemma: Let B = (G, Γ) be as before. Consider a node Vi ∈ V_G and its parents ρ(Vi) = {Vi1, …, Vim}, m ≥ 1. Then
π(Vi) = Σ_{c_{ρ(Vi)}} γ(Vi | c_{ρ(Vi)}) · Π_{j=1,…,m} π^{Vij}_{Vi}(c_{Vij})
where c_{ρ(Vi)} = ∧_{j=1,…,m} c_{Vij}.
Note that each c_{Vij} used in the product should be consistent with the c_{ρ(Vi)} from the summand!
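A sketch of this lemma for a node with several parents. The helper compound_pi and its dictionary layout are ad-hoc choices; the numbers are borrowed from the singly connected example of slide 63, so the result should come out as 0.71 / 0.29.

```python
# Compound causal parameter for a node with several parents:
# pi(Vi) = sum over parent configurations of gamma(Vi | config) times the product
# of the incoming causal parameters, evaluated consistently with that config.
from itertools import product

def compound_pi(gamma, parent_pis):
    """gamma maps a parent configuration (tuple of bools) to {value: prob};
    parent_pis is a list of {value: prob} messages, one per parent."""
    pi = {True: 0.0, False: 0.0}
    for config in product((True, False), repeat=len(parent_pis)):
        weight = 1.0
        for value, msg in zip(config, parent_pis):
            weight *= msg[value]
        for v in (True, False):
            pi[v] += gamma[config][v] * weight
    return pi

# Numbers from the singly connected example later in the deck (slide 63):
gamma_v1 = {(True, True): {True: 0.8, False: 0.2},
            (False, True): {True: 0.9, False: 0.1},
            (True, False): {True: 0.5, False: 0.5},
            (False, False): {True: 0.6, False: 0.4}}
print(compound_pi(gamma_v1, [{True: 0.1, False: 0.9},    # pi from V2
                             {True: 0.4, False: 0.6}]))  # pi from V3  -> 0.71 / 0.29
```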

  41. (Figure: node Vi with parents Vi1, …, Vim; the uppergraphs V_{G^+_{(Vi1,Vi)}}, …, V_{G^+_{(Vim,Vi)}} together make up Vi^+.)

  42. Computing compound causal parameters in singly connected graphs
Proof: Let Pr be the joint distribution defined by B. Then
π(Vi) =(def) Pr(Vi | c̃_{Vi^+}) = Pr(Vi | c̃_{V_{G^+_{(Vi1,Vi)}}} ∧ … ∧ c̃_{V_{G^+_{(Vim,Vi)}}})
= Σ_{c_{ρ(Vi)}} Pr(Vi | c_{ρ(Vi)} ∧ c̃_{V_{G^+_{(Vi1,Vi)}}} ∧ … ∧ c̃_{V_{G^+_{(Vim,Vi)}}}) · Pr(c_{ρ(Vi)} | c̃_{V_{G^+_{(Vi1,Vi)}}} ∧ … ∧ c̃_{V_{G^+_{(Vim,Vi)}}})
= Σ_{c_{ρ(Vi)}} Pr(Vi | c_{ρ(Vi)}) · Π_{j=1,…,m} Pr(c_{Vij} | c̃_{V_{G^+_{(Vij,Vi)}}})
= Σ_{c_{ρ(Vi)}} γ(Vi | c_{ρ(Vi)}) · Π_{j=1,…,m} π^{Vij}_{Vi}(c_{Vij})
where c_{ρ(Vi)} = ∧_{j=1,…,m} c_{Vij}. □

  43. Computing π in directed trees
Lemma: Let B = (G, Γ) be a Bayesian network with directed tree G. Consider a node Vi ∈ V_G and its parent ρ(Vi) = {Vj}. Then
π(Vi) = Σ_{c_{Vj}} γ(Vi | c_{Vj}) · π^{Vj}_{Vi}(c_{Vj})
Proof: See the proof for the general case where G is a singly connected graph, taking into account that Vi now has only a single parent Vj. □

  44. Computing causal parameters in singly connected graphs
Lemma: Let B = (G, Γ) be a Bayesian network with singly connected graph G = (V_G, A_G). Consider an uninstantiated node Vi ∈ V_G with m ≥ 1 children σ(Vi) = {Vi1, …, Vim}. Then
π^{Vi}_{Vij}(Vi) = α · π(Vi) · Π_{k=1,…,m, k≠j} λ^{Vi}_{Vik}(Vi)
where α is a normalisation constant.

  45. (figure-only slide)

  46. Computing causal parameters in singly connected graphs
Proof: Let Pr be the joint distribution defined by B. Then
π^{Vi}_{Vij}(Vi) =(def) Pr(Vi | c̃_{V_{G^+_{(Vi,Vij)}}})
= α′ · Pr(c̃_{V_{G^+_{(Vi,Vij)}}} | Vi) · Pr(Vi)
= α′ · Pr(c̃_{Vi^+} ∧ (∧_{k≠j} c̃_{V_{G^−_{(Vi,Vik)}}}) | Vi) · Pr(Vi)
= α′ · Pr(c̃_{Vi^+} | Vi) · Π_{k≠j} Pr(c̃_{V_{G^−_{(Vi,Vik)}}} | Vi) · Pr(Vi)
= α · Pr(Vi | c̃_{Vi^+}) · Π_{k≠j} Pr(c̃_{V_{G^−_{(Vi,Vik)}}} | Vi)
= α · π(Vi) · Π_{k≠j} λ^{Vi}_{Vik}(Vi) □

  47. Computing compound diagnostic parameters in singly connected graphs
Lemma: Let B = (G, Γ) be as before. Consider an uninstantiated node Vi ∈ V_G with m ≥ 1 children σ(Vi) = {Vi1, …, Vim}. Then
λ(Vi) = Π_{j=1,…,m} λ^{Vi}_{Vij}(Vi)

  48. (figure-only slide)

  49. Computing compound diagnostic parameters in singly connected graphs
Proof: Let Pr be the joint distribution defined by B. Then
λ(Vi) =(def) Pr(c̃_{Vi^−} | Vi) = Pr(c̃_{V_{G^−_{(Vi,Vi1)}}} ∧ … ∧ c̃_{V_{G^−_{(Vi,Vim)}}} | Vi)
= Pr(c̃_{V_{G^−_{(Vi,Vi1)}}} | Vi) · … · Pr(c̃_{V_{G^−_{(Vi,Vim)}}} | Vi)
= λ^{Vi}_{Vi1}(Vi) · … · λ^{Vi}_{Vim}(Vi) = Π_{j=1,…,m} λ^{Vi}_{Vij}(Vi) □

  50. Computing diagnostic parameters in singly connected graphs
Lemma: Let B = (G, Γ) be as before. Consider a node Vi ∈ V_G with n ≥ 1 parents ρ(Vi) = {Vj1, …, Vjn}. Then
λ^{Vjk}_{Vi}(Vjk) = α · Σ_{c_{Vi}} [ λ(c_{Vi}) · Σ_{x = c_{ρ(Vi) \ {Vjk}}} ( γ(c_{Vi} | x ∧ Vjk) · Π_{l=1,…,n, l≠k} π^{Vjl}_{Vi}(c_{Vjl}) ) ]
where α is a normalisation constant. Note that each c_{Vjl} used in the product should be consistent with the x from the summand!
Proof: see syllabus. □

  51. Computing separate λ's in directed trees
Lemma: Let B = (G, Γ) be a Bayesian network with directed tree G. Consider a node Vi ∈ V_G and its parent ρ(Vi) = {Vj}. Then
λ^{Vj}_{Vi}(Vj) = Σ_{c_{Vi}} λ(c_{Vi}) · γ(c_{Vi} | Vj)

  52. Computing separate λ's in directed trees
Proof: Let Pr be the joint distribution defined by B. Then
λ^{Vj}_{Vi}(Vj) =(def) Pr(c̃_{Vi^−} | Vj)
= Pr(c̃_{Vi^−} | vi ∧ Vj) · Pr(vi | Vj) + Pr(c̃_{Vi^−} | ¬vi ∧ Vj) · Pr(¬vi | Vj)
= Pr(c̃_{Vi^−} | vi) · Pr(vi | Vj) + Pr(c̃_{Vi^−} | ¬vi) · Pr(¬vi | Vj)
= λ(vi) · γ(vi | Vj) + λ(¬vi) · γ(¬vi | Vj) = Σ_{c_{Vi}} λ(c_{Vi}) · γ(c_{Vi} | Vj) □

  53. Pearl's algorithm: detailed computation rules for inference
For Vi ∈ V_G with ρ(Vi) = {Vj1, …, Vjn} and σ(Vi) = {Vi1, …, Vim}:
Pr(Vi | c̃_V) = α · π(Vi) · λ(Vi)
π(Vi) = Σ_{c_{ρ(Vi)}} γ(Vi | c_{ρ(Vi)}) · Π_{k=1,…,n} π^{Vjk}_{Vi}(c_{Vjk})
λ(Vi) = Π_{j=1,…,m} λ^{Vi}_{Vij}(Vi)
π^{Vi}_{Vij}(Vi) = α′ · π(Vi) · Π_{k=1,…,m, k≠j} λ^{Vi}_{Vik}(Vi)
λ^{Vjk}_{Vi}(Vjk) = α″ · Σ_{c_{Vi}} [ λ(c_{Vi}) · Σ_{x = c_{ρ(Vi) \ {Vjk}}} ( γ(c_{Vi} | x ∧ Vjk) · Π_{l=1,…,n, l≠k} π^{Vjl}_{Vi}(c_{Vjl}) ) ]
with normalisation constants α, α′, and α″.

  54. Special cases: roots
Let B = (G, Γ) be a Bayesian network with singly connected graph G; let Pr be the joint distribution defined by B.
• Consider a node W ∈ V_G with ρ(W) = ∅. The compound causal parameter π: {w, ¬w} → [0, 1] for W satisfies
π(W) = Pr(W | c̃_{W^+})   (definition)
= Pr(W | T)   (since W^+ = ∅)
= Pr(W) = γ(W)

  55. Special cases: leafs
Let B = (G, Γ) and Pr be as before.
• Consider a node V with σ(V) = ∅. The compound diagnostic parameter λ: {v, ¬v} → [0, 1] for V is defined as follows.
– If node V is uninstantiated, then
λ(V) = Pr(c̃_{V^−} | V)   (definition)
= Pr(T | V) = 1   (since V^− = {V} and V is uninstantiated)
– If node V is instantiated, then
λ(V) = Pr(c̃_{V^−} | V)   (definition)
= Pr(c̃_V | V)   (since σ(V) = ∅)
= 1 for c_V = c̃_V, and 0 for c_V ≠ c̃_V

  56. Special cases: uninstantiated (sub)graphs ('a useful property')
• Consider a node V ∈ V_G and assume that c̃_{V_G} = T(rue). The compound diagnostic parameter λ: {v, ¬v} → [0, 1] for V satisfies
λ(V) = Pr(c̃_{V^−} | V)   (definition)
= Pr(T | V)   (since c̃_{V_G} = T)
= 1
From the above it is clear that this property also holds for any node V for which c̃_{V^−} = T.

  57. Pearl's algorithm: a tree example
Consider Bayesian network B = (G, Γ) with the directed tree V1 → V2, V1 → V5, V2 → V3, V2 → V4 and assessment functions:
γ(v1) = 0.7
γ(v2 | v1) = 0.5, γ(v2 | ¬v1) = 0.4
γ(v5 | v1) = 0.1, γ(v5 | ¬v1) = 0.8
γ(v3 | v2) = 0.2, γ(v3 | ¬v2) = 0.3
γ(v4 | v2) = 0.8, γ(v4 | ¬v2) = 0
Let Pr be the joint distribution defined by B.
Assignment: compute Pr(Vi), i = 1, …, 5.
Start: Pr(Vi) = α · π(Vi) · λ(Vi), i = 1, …, 5. We have λ(Vi) = 1 for all Vi. Why? As a result, no normalisation is required and Pr(Vi) = π(Vi).

  58. An example (2)
(Network as on the previous slide.) π(V1) = γ(V1). Why?
Node V1 computes:
Pr(v1) = π(v1) = γ(v1) = 0.7
Pr(¬v1) = π(¬v1) = γ(¬v1) = 0.3
Node V1 computes for node V2: π^{V1}_{V2}(V1) = π(V1) (why?)

  59. An example (3)
Node V2 computes:
Pr(v2) = π(v2) = γ(v2 | v1) · π^{V1}_{V2}(v1) + γ(v2 | ¬v1) · π^{V1}_{V2}(¬v1) = γ(v2 | v1) · π(v1) + γ(v2 | ¬v1) · π(¬v1) = 0.5 · 0.7 + 0.4 · 0.3 = 0.47
Pr(¬v2) = π(¬v2) = 0.5 · 0.7 + 0.6 · 0.3 = 0.53

  60. An example (4)
Node V2 computes for node V3: π^{V2}_{V3}(V2) = π(V2)
Are all causal parameters sent by a node equal to its compound causal parameter?

  61. An example (5)
Node V3 computes:
Pr(v3) = π(v3) = γ(v3 | v2) · π^{V2}_{V3}(v2) + γ(v3 | ¬v2) · π^{V2}_{V3}(¬v2) = γ(v3 | v2) · π(v2) + γ(v3 | ¬v2) · π(¬v2) = 0.2 · 0.47 + 0.3 · 0.53 = 0.253
Pr(¬v3) = π(¬v3) = 0.8 · 0.47 + 0.7 · 0.53 = 0.747

  62. An example (6)
In a similar way, we find that
Pr(v4) = 0.376, Pr(¬v4) = 0.624
Pr(v5) = 0.310, Pr(¬v5) = 0.690 □
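The marginals of slides 57–62 can also be verified independently of the message-passing scheme by brute-force enumeration of the joint distribution; a short sketch (the helper g is an ad-hoc convenience):

```python
# Brute-force check of the prior marginals in the tree example (slides 57-62):
# enumerate Pr(V1,...,V5) = gamma(V1)*gamma(V2|V1)*gamma(V5|V1)*gamma(V3|V2)*gamma(V4|V2)
# and sum out each variable.  Expected: 0.7, 0.47, 0.253, 0.376, 0.31 for v1..v5.
from itertools import product

def g(p_true, value):            # gamma for one value, given the probability of "true"
    return p_true if value else 1.0 - p_true

marginals = [0.0] * 5
for v1, v2, v3, v4, v5 in product((True, False), repeat=5):
    joint = (g(0.7, v1)
             * g(0.5 if v1 else 0.4, v2)
             * g(0.1 if v1 else 0.8, v5)
             * g(0.2 if v2 else 0.3, v3)
             * g(0.8 if v2 else 0.0, v4))
    for idx, val in enumerate((v1, v2, v3, v4, v5)):
        if val:
            marginals[idx] += joint
print(marginals)   # approx [0.7, 0.47, 0.253, 0.376, 0.31]
```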

  63. Pearl's algorithm: a singly connected example
Consider Bayesian network B = (G, Γ) with arcs V2 → V1 and V3 → V1:
γ(v2) = 0.1, γ(¬v2) = 0.9; γ(v3) = 0.4, γ(¬v3) = 0.6
γ(v1 | v2 ∧ v3) = 0.8, γ(¬v1 | v2 ∧ v3) = 0.2
γ(v1 | ¬v2 ∧ v3) = 0.9, γ(¬v1 | ¬v2 ∧ v3) = 0.1
γ(v1 | v2 ∧ ¬v3) = 0.5, γ(¬v1 | v2 ∧ ¬v3) = 0.5
γ(v1 | ¬v2 ∧ ¬v3) = 0.6, γ(¬v1 | ¬v2 ∧ ¬v3) = 0.4
Let Pr be the joint distribution defined by B.
Assignment: compute Pr(V1) = α · π(V1) · λ(V1). We have λ(V1) = 1, so no normalisation is required.

  64. An example (2)
(Network as on the previous slide.) Node V1 computes:
Pr(v1) = π(v1) = γ(v1 | v2 ∧ v3) · π^{V2}_{V1}(v2) · π^{V3}_{V1}(v3)
  + γ(v1 | ¬v2 ∧ v3) · π^{V2}_{V1}(¬v2) · π^{V3}_{V1}(v3)
  + γ(v1 | v2 ∧ ¬v3) · π^{V2}_{V1}(v2) · π^{V3}_{V1}(¬v3)
  + γ(v1 | ¬v2 ∧ ¬v3) · π^{V2}_{V1}(¬v2) · π^{V3}_{V1}(¬v3)
= 0.8 · 0.1 · 0.4 + 0.9 · 0.9 · 0.4 + 0.5 · 0.1 · 0.6 + 0.6 · 0.9 · 0.6 = 0.71
Pr(¬v1) = 0.29 □

  65. Instantiated nodes
Let B = (G, Γ) be a Bayesian network with singly connected graph G; let Pr be as before.
• Consider an instantiated node V ∈ V_G for which evidence V = true is obtained. For the compound diagnostic parameter λ: {v, ¬v} → [0, 1] for V we have that
λ(v) = Pr(c̃_{V^−} | v)   (definition)
= Pr(c̃_{V^− \ {V}} ∧ v | v) = ??   (unless σ(V) = ∅, in which case λ(v) = 1)
λ(¬v) = Pr(c̃_{V^−} | ¬v)   (definition)
= Pr(c̃_{V^− \ {V}} ∧ v | ¬v) = 0
The case with evidence V = false is similar.

  66. Entering evidence
Consider a fragment of the graph G of a Bayesian network and suppose evidence is obtained for node V. Entering evidence is modelled by extending G with a 'dummy' child D for V. The dummy node sends the diagnostic parameter λ^{V}_{D} to V, with
λ^{V}_{D}(v) = 1, λ^{V}_{D}(¬v) = 0 for evidence V = true
λ^{V}_{D}(v) = 0, λ^{V}_{D}(¬v) = 1 for evidence V = false
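A minimal sketch of evidence entry through such a dummy child; evidence_lambda is an ad-hoc helper, and the final lines reproduce the V1 update of slide 68 (posterior 0 / 1 for evidence V1 = false).

```python
# Entering evidence for a node V via a 'dummy' child D (slide 66): the dummy
# sends a lambda message that is 1 on the observed value and 0 elsewhere.
def evidence_lambda(observed_true):
    return {True: 1.0, False: 0.0} if observed_true else {True: 0.0, False: 1.0}

# V multiplies this message into its compound diagnostic parameter; e.g. for
# the tree example's V1 with evidence V1 = false (slides 67-68):
pi_v1 = {True: 0.7, False: 0.3}
lam_dummy = evidence_lambda(observed_true=False)
unnorm = {v: pi_v1[v] * lam_dummy[v] for v in (True, False)}
alpha = 1.0 / sum(unnorm.values())
print({v: round(alpha * p, 6) for v, p in unnorm.items()})   # {True: 0.0, False: 1.0}
```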

  67. Entering evidence: a tree example
Let Pr and B be as before (the tree example of slide 57). Evidence V1 = false is entered.
Assignment: compute Pr_{¬v1}(Vi).
Start: Pr_{¬v1}(Vi) = α · π(Vi) · λ(Vi), i = 1, …, 5. For i = 2, …, 5 we have λ(Vi) = 1. Why? For those nodes we thus have Pr_{¬v1}(Vi) = π(Vi).

  68. An example with evidence V1 = false (2)
Node V1 now computes:
Pr_{¬v1}(v1) = α · π(v1) · λ(v1) = 0
Pr_{¬v1}(¬v1) = α · π(¬v1) · λ(¬v1) = α · 0.3
Normalisation gives: Pr_{¬v1}(v1) = 0, Pr_{¬v1}(¬v1) = 1.
Node V1 computes for node V2: π^{V1}_{V2}(V1) = α · π(V1) · λ^{V1}_{V5}(V1) · λ^{V1}_{D}(V1) = ?

  69. An example with evidence V1 = false (3)
Node V2 computes:
Pr_{¬v1}(v2) = π(v2) = γ(v2 | v1) · π^{V1}_{V2}(v1) + γ(v2 | ¬v1) · π^{V1}_{V2}(¬v1) = 0.5 · 0 + 0.4 · 1 = 0.4
Pr_{¬v1}(¬v2) = π(¬v2) = 0.5 · 0 + 0.6 · 1 = 0.6
Node V2 computes for node V3: π^{V2}_{V3}(V2) = π(V2). Why?

  70. An example with evidence V1 = false (4)
Node V3 computes:
Pr_{¬v1}(v3) = π(v3) = γ(v3 | v2) · π^{V2}_{V3}(v2) + γ(v3 | ¬v2) · π^{V2}_{V3}(¬v2) = γ(v3 | v2) · π(v2) + γ(v3 | ¬v2) · π(¬v2) = 0.2 · 0.4 + 0.3 · 0.6 = 0.26
Pr_{¬v1}(¬v3) = 0.8 · 0.4 + 0.7 · 0.6 = 0.74

  71. An example with evidence V1 = false (5)
In a similar way, we find that
Pr_{¬v1}(v4) = 0.32, Pr_{¬v1}(¬v4) = 0.68
Pr_{¬v1}(v5) = 0.80, Pr_{¬v1}(¬v5) = 0.20 □
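The posteriors of slides 67–71 can again be checked by brute force, conditioning the enumerated joint on ¬v1:

```python
# Brute-force check of Pr_{~v1}(Vi) in the tree example with evidence V1 = false
# (slides 67-71): condition the enumerated joint on ~v1.
from itertools import product

def g(p_true, value):
    return p_true if value else 1.0 - p_true

post, evidence_mass = [0.0] * 5, 0.0
for v1, v2, v3, v4, v5 in product((True, False), repeat=5):
    if v1:                      # evidence: V1 = false
        continue
    joint = (g(0.7, v1) * g(0.5 if v1 else 0.4, v2) * g(0.1 if v1 else 0.8, v5)
             * g(0.2 if v2 else 0.3, v3) * g(0.8 if v2 else 0.0, v4))
    evidence_mass += joint
    for idx, val in enumerate((v1, v2, v3, v4, v5)):
        if val:
            post[idx] += joint
print([p / evidence_mass for p in post])   # approx [0.0, 0.4, 0.26, 0.32, 0.8]
```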

  72. Another piece of evidence: tree example
Let Pr and B be as before. The additional evidence V3 = true is entered.
Assignment: compute Pr_{¬v1,v3}(Vi).
Start: Pr_{¬v1,v3}(Vi) = α · π(Vi) · λ(Vi), i = 1, …, 5.
Which parameters can be re-used and which should be updated?

  73. Another example (2)
For i = 4, 5 we have λ(Vi) = 1; for those two nodes we thus have Pr_{¬v1,v3}(Vi) = π(Vi).
The probabilities for V1 remain unchanged: Pr_{¬v1,v3}(v1) = 0, Pr_{¬v1,v3}(¬v1) = 1.
The probabilities for node V5 also remain unchanged. Why? Therefore
Pr_{¬v1,v3}(v5) = Pr_{¬v1}(v5) = 0.8, Pr_{¬v1,v3}(¬v5) = 0.2

  74. Another example (3)
Node V3 computes:
Pr_{¬v1,v3}(v3) = α · π(v3) · λ(v3) = α · π(v3) = α · 0.26. Why?
Pr_{¬v1,v3}(¬v3) = α · π(¬v3) · λ(¬v3) = 0
After normalisation: Pr_{¬v1,v3}(v3) = 1, Pr_{¬v1,v3}(¬v3) = 0.
Node V3 computes for node V2: λ^{V2}_{V3}(V2) = Σ_{c_{V3}} λ(c_{V3}) · γ(c_{V3} | V2)

  75. Another example (4)
Node V2 computes:
Pr_{¬v1,v3}(v2) = α · π(v2) · λ(v2) = α · π(v2) · λ^{V2}_{V3}(v2) · λ^{V2}_{V4}(v2) = α · π(v2) · γ(v3 | v2) = α · 0.4 · 0.2 = α · 0.08
Pr_{¬v1,v3}(¬v2) = α · π(¬v2) · λ(¬v2) = α · π(¬v2) · λ^{V2}_{V3}(¬v2) · λ^{V2}_{V4}(¬v2) = α · π(¬v2) · γ(v3 | ¬v2) = α · 0.6 · 0.3 = α · 0.18
Normalisation results in: Pr_{¬v1,v3}(v2) = 0.31, Pr_{¬v1,v3}(¬v2) = 0.69

  76. Another example (5)
Node V2 computes for node V4:
π^{V2}_{V4}(V2) = α · π(V2) · λ^{V2}_{V3}(V2) ⇒ 0.31 and 0.69
Node V4 computes:
Pr_{¬v1,v3}(v4) = π(v4) = γ(v4 | v2) · π^{V2}_{V4}(v2) + γ(v4 | ¬v2) · π^{V2}_{V4}(¬v2) = γ(v4 | v2) · π^{V2}_{V4}(v2) + 0 = 0.8 · 0.31 = 0.248
Pr_{¬v1,v3}(¬v4) = 0.2 · 0.31 + 1.0 · 0.69 = 0.752 □
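The same brute-force check for the combined evidence ¬v1 ∧ v3 of slides 72–76; note that the slides round Pr(v2) to 0.31, which is why they report 0.248 rather than the exact 0.246 for v4.

```python
# Brute-force check of Pr_{~v1,v3}(Vi) (slides 72-76): condition the joint of the
# tree example on V1 = false and V3 = true.
from itertools import product

def g(p_true, value):
    return p_true if value else 1.0 - p_true

post, mass = [0.0] * 5, 0.0
for v1, v2, v3, v4, v5 in product((True, False), repeat=5):
    if v1 or not v3:            # evidence: V1 = false, V3 = true
        continue
    joint = (g(0.7, v1) * g(0.5 if v1 else 0.4, v2) * g(0.1 if v1 else 0.8, v5)
             * g(0.2 if v2 else 0.3, v3) * g(0.8 if v2 else 0.0, v4))
    mass += joint
    for idx, val in enumerate((v1, v2, v3, v4, v5)):
        if val:
            post[idx] += joint
print([round(p / mass, 3) for p in post])   # approx [0.0, 0.308, 1.0, 0.246, 0.8]
```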

  77. Entering evidence: a singly connected example
Let Pr and B be as before (the singly connected example of slide 63). Evidence V1 = true is entered.
Assignment: compute Pr_{v1}(V2) = α · π(V2) · λ(V2).

  78. An example with evidence V1 = true (2)
Node V1 computes for node V2 (using λ(v1) = 1 and λ(¬v1) = 0 from the evidence):
λ^{V2}_{V1}(v2) = λ(v1) · [γ(v1 | v2 ∧ v3) · π^{V3}_{V1}(v3) + γ(v1 | v2 ∧ ¬v3) · π^{V3}_{V1}(¬v3)] + λ(¬v1) · [γ(¬v1 | v2 ∧ v3) · π^{V3}_{V1}(v3) + γ(¬v1 | v2 ∧ ¬v3) · π^{V3}_{V1}(¬v3)]
= 0.8 · 0.4 + 0.5 · 0.6 = 0.62
λ^{V2}_{V1}(¬v2) = 0.9 · 0.4 + 0.6 · 0.6 = 0.72

  79. An example with evidence V1 = true (3)
Node V2 computes:
Pr_{v1}(v2) = α · π(v2) · λ(v2) = α · γ(v2) · λ^{V2}_{V1}(v2) = α · 0.1 · 0.62 = 0.062 α
Pr_{v1}(¬v2) = α · 0.9 · 0.72 = 0.648 α
Normalisation gives: Pr_{v1}(v2) ≈ 0.087, Pr_{v1}(¬v2) ≈ 0.913 □
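A sketch of the diagnostic message λ^{V2}_{V1} and the subsequent update at V2 for slides 77–79; the nested-dictionary layout for γ(V1 | V2, V3) is an ad-hoc choice.

```python
# Diagnostic message from V1 to its parent V2 in the singly connected example
# (slides 77-79), with evidence V1 = true, followed by the update at V2.
gamma_v2, gamma_v3 = {True: 0.1, False: 0.9}, {True: 0.4, False: 0.6}
gamma_v1 = {(True, True): 0.8, (False, True): 0.9,      # gamma(v1 | V2, V3)
            (True, False): 0.5, (False, False): 0.6}

lam_v1 = {True: 1.0, False: 0.0}                         # evidence V1 = true
pi_31 = dict(gamma_v3)                                   # causal message from V3

# lambda from V1 to V2: sum over V1's values and over the other parent V3.
lam_12 = {v2: sum(lam_v1[v1]
                  * sum((gamma_v1[(v2, v3)] if v1 else 1.0 - gamma_v1[(v2, v3)])
                        * pi_31[v3] for v3 in (True, False))
                  for v1 in (True, False))
          for v2 in (True, False)}                       # 0.62 / 0.72

unnorm = {v2: gamma_v2[v2] * lam_12[v2] for v2 in (True, False)}   # 0.062 / 0.648
alpha = 1.0 / sum(unnorm.values())
print(lam_12, {v2: round(alpha * p, 3) for v2, p in unnorm.items()})  # ~0.087 / 0.913
```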

  80. The message passing
Initially, the Bayesian network is in a stable situation. Once evidence is entered into the network, this stability is disturbed. (Figure: the node receiving evidence emits new λ and π messages.)

  81. The message passing, continued
Evidence initiates message passing throughout the entire network. When each node in the network has been visited by the message passing algorithm, the network returns to a new stable situation.

  82. Pearl: some complexity issues
Consider a Bayesian network B with singly connected digraph G with n ≥ 1 nodes. Suppose that node V has O(n) parents ρ(V) = {W1, …, Wp} and O(n) children σ(V) = {Z1, …, Zs}.
• Computing the compound causal parameter requires at most O(2^n) time:
π(V) = Σ_{c_{ρ(V)}} γ(V | c_{ρ(V)}) · Π_{i=1,…,p} π^{Wi}_{V}(c_{Wi})

  83. Complexity issues (2)
• Computing the compound diagnostic parameter requires at most O(n) time:
λ(V) = Π_{j=1,…,s} λ^{V}_{Zj}(V)
A node can therefore compute the probabilities for its values in at most O(2^n) time.
