An Introduction to Bayesian Network Inference using Variable Elimination
Jhonatan Oliveira Department of Computer Science University of Regina
Outline: Introduction; Background; Bayesian networks; Variable Elimination
Introduction
Bayesian networks are probabilistic graphical models used when reasoning under uncertainty.
(Figure: the family-out example — "family out" and "bowel problem" are each causes of "dog out")
TrueSkill™
Turbo Codes
Mars Exploration Rover
Background
Probability theory: the joint probability distribution, the chain rule, and conditional independence
A joint probability distribution assigns one probability to each configuration (combination of values) of the variables.
(Table: the joint distribution P(L,F,D,B,H), one probability for each configuration of Lights On, Family Out, Dog Out, Bowel Problem, and Hear Bark; any query can be answered by summing the appropriate rows.)

The size issue: with 5 binary variables, the joint table requires 2^5 = 32 probabilities.
Conditional Probability Tables
By the chain rule: P(L,F,D,B,H) = P(L) P(F|L) P(D|L,F) P(B|L,F,D) P(H|L,F,D,B)
The size issue: 2 + 4 + 8 + 16 + 32 = 62 probabilities, even more than the joint table, since no independencies have been exploited yet.
Given "dog out", the variables "family out" and "hear bark" become independent: once we know whether the dog is out, hearing a bark tells us nothing more about the family being out. This independence is written I(family out, dog out, hear bark), or I(F,D,H) in variable notation. Similarly, I(L,F,D) holds: given "family out", the variables "lights on" and "dog out" are independent.
Chain rule:
  P(L,F,D,B,H) = P(L) P(F|L) P(D|L,F) P(B|L,F,D) P(H|L,F,D,B)
Using I(D,F,L):
  P(L,F,D,B,H) = P(L) P(F|L) P(D|F) P(B|L,F,D) P(H|L,F,D,B)
Using I(B,{L,D},F):
  P(L,F,D,B,H) = P(L) P(F|L) P(D|F) P(B|L,D) P(H|L,F,D,B)
Which other independencies hold, and how can they be identified systematically?
Bayesian network
A graphical interpretation of probability theory
(Figure: DAG with nodes Lights on (L), Family out (F), Dog out (D), Bowel problem (B), Hear bark (H) and edges F → L, F → D, B → D, D → H)
A set of variables X is d-separated from a set of variables Y given a set Z in the DAG if every path from X to Y is blocked by Z.
Is F d-separated from H given D? Yes; hence I(F,D,H) holds in P(L,F,D,B,H): family out is independent of hear bark given dog out.
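This d-separation claim can be checked numerically by brute force: build the joint distribution from the network's CPTs and verify P(F,H|D) = P(F|D) P(H|D) for every value combination. The CPT numbers below are illustrative assumptions (the slides give none); the independence follows from the graph structure, not from the particular values.

```python
from itertools import product

# Illustrative CPTs for the family-out network (values are assumptions)
pF = {1: 0.15, 0: 0.85}                                    # P(F)
pB = {1: 0.01, 0: 0.99}                                    # P(B)
pL = {(1, 1): 0.6, (0, 1): 0.4, (1, 0): 0.05, (0, 0): 0.95}  # P(L|F), keyed (L,F)
pD = {(1, 1, 1): 0.99, (0, 1, 1): 0.01, (1, 0, 1): 0.90, (0, 0, 1): 0.10,
      (1, 1, 0): 0.97, (0, 1, 0): 0.03, (1, 0, 0): 0.30, (0, 0, 0): 0.70}  # P(D|B,F), keyed (D,B,F)
pH = {(1, 1): 0.7, (0, 1): 0.3, (1, 0): 0.01, (0, 0): 0.99}  # P(H|D), keyed (H,D)

def joint(l, f, d, b, h):
    """Joint probability via the factorization P(F)P(B)P(L|F)P(D|B,F)P(H|D)."""
    return pF[f] * pB[b] * pL[(l, f)] * pD[(d, b, f)] * pH[(h, d)]

def marg(fixed):
    """Sum the joint over all configurations consistent with `fixed`."""
    return sum(joint(l, f, d, b, h)
               for l, f, d, b, h in product([0, 1], repeat=5)
               if all({'L': l, 'F': f, 'D': d, 'B': b, 'H': h}[k] == v
                      for k, v in fixed.items()))

# Check I(F,D,H): P(F,H|D) must equal P(F|D) * P(H|D) for all values
for d, f, h in product([0, 1], repeat=3):
    pd = marg({'D': d})
    lhs = marg({'F': f, 'H': h, 'D': d}) / pd
    rhs = (marg({'F': f, 'D': d}) / pd) * (marg({'H': h, 'D': d}) / pd)
    assert abs(lhs - rhs) < 1e-12
print("I(F,D,H) holds for every configuration")
```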
P(L,F,D,B,H) = P(F) P(B) P(D|B,F) P(L|F) P(H|D). The size issue = 18 probabilities
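The three size counts can be verified with a little arithmetic (a sketch; note the slides' 18 appears to count one free parameter for each root marginal P(F) and P(B), i.e. 1+1+8+4+4, whereas full tables give 20):

```python
# Full-table counts of the probabilities each representation stores,
# for five binary variables. (The slides report 18 for the network,
# apparently counting the root marginals P(F) and P(B) with one number each.)
joint_table = 2 ** 5                            # P(L,F,D,B,H)
chain_rule = sum(2 ** k for k in range(1, 6))   # P(L) P(F|L) ... P(H|L,F,D,B)
network = 2 + 2 + 2 ** 3 + 2 ** 2 + 2 ** 2      # P(F) P(B) P(D|B,F) P(L|F) P(H|D)
print(joint_table, chain_rule, network)         # → 32 62 20
```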
A Bayesian network is a directed acyclic graph B together with a conditional probability table P(v | Pa(v)) for every variable v in B, where Pa(v) are the parents of v; together they define P(U) as the product of P(v | Pa(v)) over all variables v.
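This definition can be encoded directly: for each variable, store its parent list and its CPT. The numbers below are illustrative assumptions, not values from the slides.

```python
from itertools import product

# The example network as {variable: (parents, CPT)}, with CPT entries
# keyed by (v_value, *parent_values). Numbers are illustrative only.
network = {
    'F': ([], {(1,): 0.15, (0,): 0.85}),
    'B': ([], {(1,): 0.01, (0,): 0.99}),
    'L': (['F'], {(1, 1): 0.6, (0, 1): 0.4, (1, 0): 0.05, (0, 0): 0.95}),
    'D': (['B', 'F'], {(1, 1, 1): 0.99, (0, 1, 1): 0.01, (1, 0, 1): 0.90,
                       (0, 0, 1): 0.10, (1, 1, 0): 0.97, (0, 1, 0): 0.03,
                       (1, 0, 0): 0.30, (0, 0, 0): 0.70}),
    'H': (['D'], {(1, 1): 0.7, (0, 1): 0.3, (1, 0): 0.01, (0, 0): 0.99}),
}

# Sanity check: for every parent configuration, P(v=0|pa) + P(v=1|pa) = 1
for v, (parents, cpt) in network.items():
    for pa in product([0, 1], repeat=len(parents)):
        assert abs(cpt[(0,) + pa] + cpt[(1,) + pa] - 1.0) < 1e-12
print("all CPTs normalized")
```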
P(L,F,D,B,H) = P(H|D) P(F) P(B) P(D|B,F) P(L|F)
To answer a query from the factorization P(H|D) P(F) P(B) P(D|B,F) P(L|F), two operations on factors suffice. For example, part of the computation of P(L) combines P(L|F) and P(F) by multiplication (×) into P(L,F), and then marginalization (+) sums F out of P(L,F).
Multiplication

P(L|F) × P(F) = P(L,F):

L F | P(L|F)
0 0 | 0.8
0 1 | 0.3
1 0 | 0.2
1 1 | 0.7

F | P(F)
0 | 0.8
1 | 0.3

L F | P(L,F)
0 0 | 0.64
0 1 | 0.09
1 0 | 0.16
1 1 | 0.21
Marginalization

Summing F out of P(L,F) (+) gives P(L):

L F | P(L,F)
0 0 | 0.2
0 1 | 0.3
1 0 | 0.4
1 1 | 0.1

L | P(L)
0 | 0.5
1 | 0.5
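The two tables above can be reproduced with a minimal dict-based factor representation (a sketch; the variable names and key layout are my own conventions, not from the slides):

```python
from itertools import product

def multiply(f1, vars1, f2, vars2):
    """Pointwise product of two factors, e.g. P(L,F) = P(L|F) x P(F)."""
    out_vars = vars1 + [v for v in vars2 if v not in vars1]
    table = {}
    for assign in product([0, 1], repeat=len(out_vars)):
        a = dict(zip(out_vars, assign))
        table[assign] = (f1[tuple(a[v] for v in vars1)]
                         * f2[tuple(a[v] for v in vars2)])
    return table, out_vars

def marginalize(f, vars_, v):
    """Sum variable v out of factor f."""
    i = vars_.index(v)
    table = {}
    for key, p in f.items():
        k = key[:i] + key[i + 1:]
        table[k] = table.get(k, 0.0) + p
    return table, [u for u in vars_ if u != v]

# The slides' multiplication example: P(L|F) x P(F) = P(L,F)
P_L_given_F = {(0, 0): 0.8, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.7}  # keyed (L,F)
P_F = {(0,): 0.8, (1,): 0.3}
P_LF, _ = multiply(P_L_given_F, ['L', 'F'], P_F, ['F'])
print({k: round(p, 2) for k, p in P_LF.items()})
# → {(0, 0): 0.64, (0, 1): 0.09, (1, 0): 0.16, (1, 1): 0.21}

# The slides' marginalization example: summing F out of P(L,F) gives P(L)
P_LF2 = {(0, 0): 0.2, (0, 1): 0.3, (1, 0): 0.4, (1, 1): 0.1}
P_L, _ = marginalize(P_LF2, ['L', 'F'], 'F')
print(P_L)  # → {(0,): 0.5, (1,): 0.5}
```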
Several algorithms answer queries such as P(L) from the factorization P(H|D) P(F) P(B) P(D|B,F) P(L|F): Shafer-Shenoy, Lauritzen-Spiegelhalter, Hugin, Lazy Propagation, and Variable Elimination.
Variable Elimination
Variable Elimination eliminates, one at a time, every variable that is not in the query or evidence.
Input: factorization F, elimination ordering L, query X, evidence Y
Output: P(X|Y)

For each variable v in L:
    multiply all CPTs in F involving v, yielding CPT P1
    marginalize v out of P1
    remove all CPTs involving v from F
    append P1 to F
Multiply all remaining CPTs in F, yielding P(X,Y)
Return P(X|Y) = P(X,Y) / P(Y)
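The pseudocode above can be sketched as a small Python program for binary variables (the CPT numbers are illustrative assumptions, since the slides give none):

```python
from itertools import product

# A factor is (vars, table): a list of variable names plus a dict from
# value tuples (in vars order) to numbers.

def multiply(f1, f2):
    v1, t1 = f1
    v2, t2 = f2
    out = v1 + [v for v in v2 if v not in v1]
    table = {}
    for assign in product([0, 1], repeat=len(out)):
        a = dict(zip(out, assign))
        table[assign] = (t1[tuple(a[x] for x in v1)]
                         * t2[tuple(a[x] for x in v2)])
    return out, table

def marginalize(f, v):
    vars_, t = f
    i = vars_.index(v)
    out = {}
    for key, p in t.items():
        k = key[:i] + key[i + 1:]
        out[k] = out.get(k, 0.0) + p
    return [u for u in vars_ if u != v], out

def variable_elimination(factors, ordering, query, evidence):
    factors = list(factors)
    for v in ordering:                        # eliminate each variable in turn
        involved = [f for f in factors if v in f[0]]
        factors = [f for f in factors if v not in f[0]]
        prod = involved[0]
        for f in involved[1:]:
            prod = multiply(prod, f)          # multiply all factors involving v
        factors.append(marginalize(prod, v))  # marginalize v out, keep result
    joint = factors[0]
    for f in factors[1:]:
        joint = multiply(joint, f)            # multiply the remaining factors
    vars_, t = joint                          # a factor over query + evidence
    num = {q: t[tuple({**evidence, query: q}[x] for x in vars_)] for q in (0, 1)}
    z = sum(num.values())                     # P(evidence)
    return {q: p / z for q, p in num.items()}

# The network's CPTs with illustrative numbers, keyed (child, *parents):
cpts = [
    (['F'], {(1,): 0.15, (0,): 0.85}),
    (['B'], {(1,): 0.01, (0,): 0.99}),
    (['L', 'F'], {(1, 1): 0.6, (0, 1): 0.4, (1, 0): 0.05, (0, 0): 0.95}),
    (['D', 'B', 'F'], {(1, 1, 1): 0.99, (0, 1, 1): 0.01, (1, 0, 1): 0.90,
                       (0, 0, 1): 0.10, (1, 1, 0): 0.97, (0, 1, 0): 0.03,
                       (1, 0, 0): 0.30, (0, 0, 0): 0.70}),
    (['H', 'D'], {(1, 1): 0.7, (0, 1): 0.3, (1, 0): 0.01, (0, 0): 0.99}),
]

# The slides' first query: P(H | L=1) with elimination ordering B, F, D
answer = variable_elimination(cpts, ['B', 'F', 'D'], 'H', {'L': 1})
print(answer)  # a distribution over H that sums to 1
```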
P(L,F,D,B,H) = P(H|D) P(F) P(B) P(D|B,F) P(L|F). Query: P(H | L)?

Input
  Factorization: P(H|D) P(F) P(B) P(D|B,F) P(L|F)
  Query variable: H
  Evidence variable: L = 1
  Elimination ordering: B, F, D

Eliminating B
  P(B,D|F) = P(B) P(D|B,F)
  P(D|F) = marginalize B from P(B,D|F)
  Factorization: P(H|D) P(F) P(L|F) P(D|F)

Eliminating F
  P(D,F,L) = P(L|F) P(F) P(D|F)
  P(D,L) = marginalize F from P(D,F,L)
  Factorization: P(H|D) P(D,L)

Eliminating D
  P(D,H,L) = P(H|D) P(D,L)
  P(H,L) = marginalize D from P(D,H,L)
  Factorization: P(H,L)

Output
  P(L) = marginalize H from P(H,L)
  P(H|L) = P(H,L) / P(L)
Repeated Computation
Variable Elimination can perform repeated computation
P(L,F,D,B,H) = P(H|D) P(F) P(B) P(D|B,F) P(L|F). Query: P(H | F)?

Input
  Factorization: P(H|D) P(F) P(B) P(D|B,F) P(L|F)
  Query variable: H
  Evidence variable: F = 1
  Elimination ordering: L, B, D

Eliminating L
  1(F) = marginalize L from P(L|F)
  Factorization: P(H|D) P(F) P(B) P(D|B,F)

Eliminating B
  P(B,D|F) = P(B) P(D|B,F)
  P(D|F) = marginalize B from P(B,D|F)
  Factorization: P(H|D) P(F) P(D|F)

Eliminating D
  P(D,H|F) = P(H|D) P(D|F)
  P(H|F) = marginalize D from P(D,H|F)
  Factorization: P(F) P(H|F)

Output
  Multiply all: P(F,H) = P(F) P(H|F)
  P(F) = marginalize H from P(F,H)
  P(H|F) = P(F,H) / P(F)
Both queries perform the identical step:

Eliminating B
  P(B,D|F) = P(B) P(D|B,F)
  P(D|F) = marginalize B from P(B,D|F)

once while answering P(H|L) and again while answering P(H|F).
(Figure: the computations answering P(H|L) — the cluster {D,B,F} holding P(B) P(D|B,F) produces P(D|F); the cluster {D,F,L} holding P(L|F) P(F) produces P(D,L); the cluster {D,H,L} holding P(H|D) produces P(H,L) for {H,L}.)

(Figure: the computations answering P(H|F) — the same structure, including the identical {D,B,F} step producing P(D|F).)
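One way to exploit this overlap, sketched below with a hypothetical caching scheme (not from the slides): key a cache on the eliminated variable and the names of the factors being combined, so the second query reuses the factor produced by the first.

```python
from itertools import product

def multiply(f1, f2):
    """Pointwise product of two (vars, table) factors over binary variables."""
    v1, t1 = f1
    v2, t2 = f2
    out = v1 + [v for v in v2 if v not in v1]
    table = {}
    for assign in product([0, 1], repeat=len(out)):
        a = dict(zip(out, assign))
        table[assign] = (t1[tuple(a[x] for x in v1)]
                         * t2[tuple(a[x] for x in v2)])
    return out, table

def marginalize(f, v):
    """Sum variable v out of a (vars, table) factor."""
    vars_, t = f
    i = vars_.index(v)
    out = {}
    for key, p in t.items():
        k = key[:i] + key[i + 1:]
        out[k] = out.get(k, 0.0) + p
    return [u for u in vars_ if u != v], out

cache, computed = {}, [0]

def eliminate(var, named):
    """Eliminate `var` from the factors in `named` (name -> factor),
    caching the result so identical steps are computed only once."""
    key = (var, tuple(sorted(named)))
    if key not in cache:
        computed[0] += 1
        fs = list(named.values())
        prod = fs[0]
        for f in fs[1:]:
            prod = multiply(prod, f)
        cache[key] = marginalize(prod, var)
    return cache[key]

# Illustrative numbers (the slides give none)
P_B = (['B'], {(1,): 0.01, (0,): 0.99})
P_D_BF = (['D', 'B', 'F'], {(1, 1, 1): 0.99, (0, 1, 1): 0.01, (1, 0, 1): 0.90,
                            (0, 0, 1): 0.10, (1, 1, 0): 0.97, (0, 1, 0): 0.03,
                            (1, 0, 0): 0.30, (0, 0, 0): 0.70})

step = {'P(B)': P_B, 'P(D|B,F)': P_D_BF}
eliminate('B', step)   # computed while answering P(H|L)
eliminate('B', step)   # reused while answering P(H|F)
print(computed[0])     # → 1
```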
Conclusion
Bayesian networks are probabilistic graphical models for reasoning under uncertainty.
Inference in Bayesian networks can be performed by Variable Elimination.
Future work: how to avoid repeated computation during Variable Elimination.