SLIDE 7

\[
P(a, b) = P(b \mid a)\, P(a)
\]

By the chain rule, for any probability distribution, we have:

\[
\begin{aligned}
P(x_1, x_2, x_3, x_4, x_5)
&= P(x_1)\, P(x_2, x_3, x_4, x_5 \mid x_1) \\
&= P(x_1)\, P(x_2 \mid x_1)\, P(x_3, x_4, x_5 \mid x_1, x_2) \\
&= P(x_1)\, P(x_2 \mid x_1)\, P(x_3 \mid x_1, x_2)\, P(x_4, x_5 \mid x_1, x_2, x_3) \\
&= P(x_1)\, P(x_2 \mid x_1)\, P(x_3 \mid x_1, x_2)\, P(x_4 \mid x_1, x_2, x_3)\, P(x_5 \mid x_1, x_2, x_3, x_4)
\end{aligned}
\]

But if we exploit the assumed modularity of the probability distribution over the 5 variables (in this case, the assumed Markov chain structure), then that expression simplifies:

\[
P(x_1, x_2, x_3, x_4, x_5) = P(x_1)\, P(x_2 \mid x_1)\, P(x_3 \mid x_2)\, P(x_4 \mid x_3)\, P(x_5 \mid x_4)
\]
[Figure: Markov chain over the nodes x_1, x_2, x_3, x_4, x_5]
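As a rough numerical check of the simplification, the sketch below builds a joint distribution from a Markov chain factorization and confirms that the general chain-rule factor P(x3 | x1, x2), computed from the joint, matches the table entry P(x3 | x2). The binary state space, the table values, and the names `p_x1`, `trans`, `joint`, and `marginal` are all assumptions made for illustration; none of them come from the slides.

```python
from itertools import product

# Hypothetical binary chain x1 -> x2 -> x3 -> x4 -> x5 (values are illustrative only).
p_x1 = [0.6, 0.4]                       # P(x1)
# trans[k][a][b] = P(next variable = b | current variable = a), one table per link
trans = [
    [[0.7, 0.3], [0.2, 0.8]],           # P(x2 | x1)
    [[0.9, 0.1], [0.4, 0.6]],           # P(x3 | x2)
    [[0.5, 0.5], [0.1, 0.9]],           # P(x4 | x3)
    [[0.3, 0.7], [0.6, 0.4]],           # P(x5 | x4)
]

def joint(x):
    """P(x1,...,x5) under the Markov chain factorization."""
    p = p_x1[x[0]]
    for k in range(4):
        p *= trans[k][x[k]][x[k + 1]]
    return p

def marginal(fixed):
    """Sum the joint over every variable not listed in `fixed` (dict: index -> value)."""
    return sum(joint(x) for x in product(range(2), repeat=5)
               if all(x[i] == v for i, v in fixed.items()))

def cond_x3_given_x1_x2(x3, x1, x2):
    """General chain-rule factor P(x3 | x1, x2), computed from the joint."""
    return marginal({0: x1, 1: x2, 2: x3}) / marginal({0: x1, 1: x2})

# The joint should be normalized.
total = sum(joint(x) for x in product(range(2), repeat=5))

# Largest gap between P(x3 | x1, x2) and the table entry P(x3 | x2).
max_gap = max(abs(cond_x3_given_x1_x2(x3, x1, x2) - trans[1][x2][x3])
              for x1, x2, x3 in product(range(2), repeat=3))

print(total, max_gap)
```

With binary variables, a full joint table over 5 variables has 2^5 - 1 = 31 free numbers, while this chain needs only 1 + 4 × 2 = 9. That saving is what the assumed modularity buys.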
Now our marginalization summations distribute through those terms:
\[
\sum_{x_2, x_3, x_4, x_5} P(x_1, x_2, x_3, x_4, x_5)
= P(x_1) \sum_{x_2} P(x_2 \mid x_1) \sum_{x_3} P(x_3 \mid x_2) \sum_{x_4} P(x_4 \mid x_3) \sum_{x_5} P(x_5 \mid x_4)
\]
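The following sketch compares the two sides of the marginalization above on a small example: a brute-force sum over all joint configurations of (x2, x3, x4, x5), and the nested form in which each sum is pushed inward and evaluated from the innermost sum out. The 3-state variables, the table values, the reuse of one transition table `T` for every link, and the function names are assumptions for illustration only.

```python
from itertools import product

# Hypothetical tables for a 3-state chain; the numbers are illustrative only.
K = 3
p_x1 = [0.5, 0.3, 0.2]                      # P(x1)
# Assumption: the same table T[a][b] = P(next = b | current = a) is reused for every link.
T = [[0.6, 0.3, 0.1],
     [0.2, 0.5, 0.3],
     [0.1, 0.2, 0.7]]

def brute_force(x1):
    """Left-hand side: sum the full joint over (x2, x3, x4, x5), K**4 terms."""
    total = 0.0
    for x2, x3, x4, x5 in product(range(K), repeat=4):
        total += p_x1[x1] * T[x1][x2] * T[x2][x3] * T[x3][x4] * T[x4][x5]
    return total

def nested(x1):
    """Right-hand side: evaluate the sums over x5, x4, x3, x2 from the inside out."""
    m = [1.0] * K                           # nothing to the right of x5 yet
    for _ in range(4):                      # one pass per summed variable
        # m[a] becomes sum_b P(b | a) * (partial sum already computed for b)
        m = [sum(T[a][b] * m[b] for b in range(K)) for a in range(K)]
    return p_x1[x1] * m[x1]

for x1 in range(K):
    print(x1, brute_force(x1), nested(x1))
```

In this sketch the brute-force version touches K^4 = 81 terms for each value of x1, while the nested version does 4 passes of K^2 = 9 multiply-adds. Because every conditional table here is normalized, each inner sum evaluates to 1, so both versions return P(x1) up to rounding.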