SLIDE 1 ECE 4524 Artificial Intelligence and Engineering Applications
Lecture 20: Approximate Inference
Reading: AIAMA 14.5; see also MacKay Chapters 27, 29, and 33
Today’s Schedule:
◮ Inference by simulation
◮ sampling random variables
◮ direct sampling of BN, rejection and weighting
◮ Gibbs sampling
◮ Inference by optimization (if time)
◮ KL Divergence
SLIDE 2 Generating Random Variates
An essential part of approximate inference by simulation is the sampling of random variates from arbitrary probability
distributions. Some essential questions:
◮ How is it possible for a deterministic computer to generate
random numbers?
◮ What distributions are generally available in programming languages?
The most common approach is to use pseudo-random number generators (PRNGs).
SLIDE 3
PRNGs for uniform variates
◮ PRNGs are iterated functions (recurrence relations)
$x_{k+1} = f(x_k)$
that display chaotic behavior (sensitive dependence on initial conditions). The initial condition $x_0$ is called the seed.
◮ A simple example is the logistic map
$x_{k+1} = r x_k (1 - x_k)$ for $x \in [0, 1]$ and $r > 0$,
which displays chaotic behavior for $r \gtrsim 3.57$. Its output is far from uniform, however, and uniformity is a goal of a good PRNG.
◮ The most common PRNG is the Mersenne twister.
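As a toy illustration in Python (a sketch only; the seed value is arbitrary, and the logistic map is not a usable generator in practice):

```python
# A toy PRNG built by iterating the logistic map with r = 4 (fully chaotic
# regime). Illustration only: the stream is deterministic given the seed and,
# unlike a good PRNG such as the Mersenne twister, is NOT uniform on (0, 1).
def logistic_prng(seed=0.2, r=4.0):
    """Yield a stream of values in (0, 1) from x <- r * x * (1 - x)."""
    x = seed
    while True:
        x = r * x * (1.0 - x)
        yield x

gen = logistic_prng(seed=0.2)
print([next(gen) for _ in range(5)])  # same seed -> same "random" stream
```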
SLIDE 4
Generating arbitrary variates
Using a PRNG we can generate random variates from a uniform distribution, U(0, N). How do we generate variates from other, arbitrary distributions?
◮ Transformation Approach
◮ Rejection Sampling
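Both fit in a few lines of Python. A minimal sketch, with illustrative targets that are not from the slides (an exponential for the transformation approach, the semicircle density for rejection):

```python
import math
import random

# Transformation (inverse-CDF) approach: if U ~ U(0,1) and F is a CDF, then
# X = F^{-1}(U) has CDF F. For an Exponential(lam), F^{-1}(u) = -ln(1-u)/lam.
def sample_exponential(lam):
    u = random.random()
    return -math.log(1.0 - u) / lam

# Rejection sampling: pick a proposal g and constant M with f(x) <= M g(x);
# draw x ~ g and accept it with probability f(x) / (M g(x)).
# Target f: semicircle density on [-1, 1]. Proposal g: U(-1, 1), so g(x) = 1/2.
def sample_semicircle():
    f = lambda x: (2.0 / math.pi) * math.sqrt(1.0 - x * x)  # target density
    M = 4.0 / math.pi                                       # f(x) <= M * (1/2)
    while True:
        x = random.uniform(-1.0, 1.0)           # propose x ~ g
        if random.random() < f(x) / (M * 0.5):  # accept w.p. f / (M g)
            return x
```

The transformation approach needs an invertible CDF; rejection sampling only needs the density up to a bound, at the cost of discarded proposals.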
SLIDE 5 Sampling a Bayesian Network
Recall the BN defines a factorization of the joint density
$$P(X_1, X_2, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid parents(X_i))$$
◮ To sample from the joint density we sample from the conditional probabilities in order, from causes to effects.
◮ We then build a histogram and normalize it to a density.
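A minimal sketch of direct (prior) sampling in Python, using the classic cloudy/sprinkler/rain/wet-grass network as the illustrative BN (the slides do not fix a particular network; the CPT values below are the standard textbook ones):

```python
import random
from collections import Counter

def prior_sample():
    """Sample every variable in topological order, causes before effects."""
    cloudy = random.random() < 0.5
    sprinkler = random.random() < (0.1 if cloudy else 0.5)
    rain = random.random() < (0.8 if cloudy else 0.2)
    p_wet = {(True, True): 0.99, (True, False): 0.90,
             (False, True): 0.90, (False, False): 0.00}[(sprinkler, rain)]
    wet = random.random() < p_wet
    return cloudy, sprinkler, rain, wet

# Histogram of one variable, normalized to a (discrete) density.
N = 100_000
counts = Counter(prior_sample()[2] for _ in range(N))   # index 2 = Rain
print({value: n / N for value, n in counts.items()})    # ~ P(Rain) = 0.5
```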
SLIDE 6 BN with evidence nodes
When sampling from a BN with evidence nodes we can simply disregard (reject) samples that conflict with the known values of the evidence variables. This is known as rejection sampling.
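Continuing the sketch above (same illustrative network, reusing prior_sample()), rejection sampling for a query such as P(Rain | Sprinkler = true):

```python
# Indices into a sample tuple: 0 = Cloudy, 1 = Sprinkler, 2 = Rain, 3 = WetGrass
def rejection_sample(query_index, evidence, n=100_000):
    kept = []
    for _ in range(n):
        s = prior_sample()
        if all(s[i] == v for i, v in evidence.items()):  # consistent with evidence?
            kept.append(s[query_index])
    if not kept:
        raise ValueError("all samples rejected; evidence too unlikely")
    return sum(kept) / len(kept)

print(rejection_sample(query_index=2, evidence={1: True}))  # ~ P(Rain | Sprinkler)
```

Note how the effective sample size shrinks as the evidence becomes less likely: only the fraction of samples consistent with the evidence is kept.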
SLIDE 7
Likelihood Weighting
Rejecting samples is wasteful. Instead, we can keep every sample: fix the evidence variables to their observed values, and weight each sample by the likelihood of that evidence given the sampled values of its parents.
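A sketch of likelihood weighting on the same illustrative network, for P(Rain | Sprinkler = true, WetGrass = true): evidence variables are clamped rather than sampled, and each sample's weight is the product of the evidence probabilities given its sampled parents.

```python
import random

def weighted_sample(sprinkler=True, wet=True):   # evidence is clamped
    w = 1.0
    cloudy = random.random() < 0.5               # non-evidence: sampled as usual
    w *= 0.1 if cloudy else 0.5                  # P(Sprinkler = true | Cloudy)
    rain = random.random() < (0.8 if cloudy else 0.2)
    p_wet = {(True, True): 0.99, (True, False): 0.90,
             (False, True): 0.90, (False, False): 0.00}[(sprinkler, rain)]
    w *= p_wet                                   # P(WetGrass = true | S, R)
    return rain, w

num = den = 0.0
for _ in range(100_000):
    rain, w = weighted_sample()
    num += w * rain
    den += w
print(num / den)   # weighted estimate of P(Rain | evidence)
```

Every sample contributes, but samples that explain the evidence poorly contribute little.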
SLIDE 8
Gibbs Sampling
A different approach is to randomly perturb nodes.
◮ Given an initial variate of the BN, $X_1^0, X_2^0, \ldots, X_n^0$
◮ randomly choose (or cycle in order) a node $X_i$ and generate a new variate conditioned on the existing variates in its Markov blanket:
$$X_i^{k+1} \sim P(X_i \mid mb(X_i^k))$$
◮ We accumulate a histogram as before and normalize.
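A sketch of the loop for the same illustrative network, with evidence Sprinkler = true and WetGrass = true, so only Cloudy and Rain are resampled (each draw uses the Markov-blanket formula worked out two slides below):

```python
import random

def gibbs_rain(n=100_000):
    cloudy, rain = True, True          # arbitrary initial state
    hist = {True: 0, False: 0}
    for _ in range(n):
        # P(Cloudy | mb) ∝ P(Cloudy) P(Sprinkler=true | Cloudy) P(Rain | Cloudy)
        pc_t = 0.5 * 0.1 * (0.8 if rain else 0.2)
        pc_f = 0.5 * 0.5 * (0.2 if rain else 0.8)
        cloudy = random.random() < pc_t / (pc_t + pc_f)
        # P(Rain | mb) ∝ P(Rain | Cloudy) P(WetGrass=true | Sprinkler=true, Rain)
        pr_t = (0.8 if cloudy else 0.2) * 0.99
        pr_f = (0.2 if cloudy else 0.8) * 0.90
        rain = random.random() < pr_t / (pr_t + pr_f)
        hist[rain] += 1
    return hist[True] / n

print(gibbs_rain())   # estimate of P(Rain = true | evidence)
```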
SLIDE 9
Gibbs Sampling
SLIDE 10 Computing $P(Z_i \mid mb(Z_i))$
To compute the conditional of a non-evidence variable given its Markov blanket, we use the values in the blanket (the parents, children, and children's parents) at the previous iteration:
$$P(Z_i \mid mb(Z_i)) \propto P(Z_i \mid parents(Z_i)) \prod_{Y_j \in children(Z_i)} P(Y_j \mid parents(Y_j))$$
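For instance, in the illustrative sprinkler network used in the sketches above, with evidence Sprinkler = true and WetGrass = true, resampling Rain involves only its parent Cloudy and its child WetGrass:
$$P(Rain \mid mb(Rain)) \propto P(Rain \mid Cloudy)\, P(WetGrass = \text{true} \mid Sprinkler = \text{true}, Rain)$$
which is exactly the unnormalized pair (pr_t, pr_f) computed in the Gibbs sketch.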
SLIDE 11 Variational Bayes
Sampling can be computationally costly, especially for continuous R.V.s. An alternative to sampling is to approximate the posterior $f(x \mid e)$ by a parameterized function $q(x; \lambda)$.
◮ The quality of the approximation is measured using the Kullback–Leibler (KL) divergence
$$D_{KL}[q \| f] = \int q(x; \lambda) \log \frac{q(x; \lambda)}{f(x \mid e)} \, dx$$
which is zero when $q = f$.
◮ The goal is then to find the parameters $\lambda$ that minimize the divergence, converting the inference problem to an optimization problem.
◮ A common choice for $q$ is a Gaussian.
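A minimal 1-D sketch of the idea in Python: fit a Gaussian $q(x; \mu, \sigma)$ to a target density by numerically minimizing $D_{KL}[q \| f]$. The target (a two-component Gaussian mixture) and the grid search over $\lambda = (\mu, \sigma)$ are illustrative stand-ins for a real posterior and a real optimizer:

```python
import math

def gauss(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def f(x):   # target "posterior", assumed normalized
    return 0.5 * gauss(x, -1.0, 0.5) + 0.5 * gauss(x, 1.0, 0.5)

def kl_q_f(mu, sigma, lo=-6.0, hi=6.0, n=2000):
    """Riemann-sum estimate of the integral of q(x) log(q(x)/f(x)) dx."""
    dx = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * dx
        q = gauss(x, mu, sigma)
        if q > 1e-300:                  # q log(q/f) -> 0 as q -> 0
            total += q * math.log(q / f(x)) * dx
    return total

# Grid search over lambda = (mu, sigma); real implementations use gradients.
best = min((kl_q_f(m / 10.0, s), m / 10.0, s)
           for m in range(-20, 21) for s in (0.3, 0.5, 0.8, 1.0, 1.5))
print("KL = %.4f at mu = %.1f, sigma = %.1f" % best)
```

Because $D_{KL}[q \| f]$ heavily penalizes putting q's mass where f is small, the fitted Gaussian tends to lock onto one mode of the mixture rather than spreading across both.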
SLIDE 12
Next Actions
◮ Reading on Decision Theory and Utility (AIAMA 16.1-16.3)
◮ Complete (really simple) warmup before noon on 4/3.
Note: You now have all you need to complete PS 3. It is due on 4/5.