SLIDE 1
Lecture 17: Bayesian Inference Reading: AIAMA 13.5 and MacKay book Chapter 28 Today’s Schedule:
◮ Bayes’ Rule and its implications
◮ Causal versus Diagnostic Reasoning
◮ Combining Evidence
◮ Conditional Independence
◮ Examples
SLIDE 2
Bayes’ Theorem
Consider a joint probability P(A, B) over events A and B. We can factor it using conditionals in one of two ways:
P(A, B) = P(A|B)P(B) = P(B|A)P(A)
Rearranging gives Bayes’ rule for probabilities:
P(A|B) = P(B|A)P(A) / P(B)
P(B|A) = P(A|B)P(B) / P(A)
The same relation holds for PMFs and PDFs.
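A quick numeric check of the rule (the 2x2 joint table below is made up for illustration):

# Verify Bayes' rule on a made-up joint distribution over binary events A and B.
# P_joint[(a, b)] = P(A=a, B=b).
P_joint = {(1, 1): 0.12, (1, 0): 0.28, (0, 1): 0.18, (0, 0): 0.42}

P_A1 = sum(p for (a, b), p in P_joint.items() if a == 1)   # marginal P(A=1)
P_B1 = sum(p for (a, b), p in P_joint.items() if b == 1)   # marginal P(B=1)

P_A1_given_B1 = P_joint[(1, 1)] / P_B1                     # P(A=1|B=1) from the joint
P_B1_given_A1 = P_joint[(1, 1)] / P_A1                     # P(B=1|A=1) from the joint

# Bayes' rule: P(A|B) = P(B|A)P(A) / P(B); both computations agree.
print(P_A1_given_B1, P_B1_given_A1 * P_A1 / P_B1)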
SLIDE 3
Bayes’ Theorem (Discrete Case)
Given a discrete r.v. X and some data D:
p(x|D) = p(D|x)p(x) / p(D)
Posterior = Likelihood × Prior / Evidence
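As a concrete instance of the formula above (all numbers made up for illustration): three possible values of x, a prior, and the likelihood of one observed data set D under each value.

# Discrete Bayes update: posterior proportional to likelihood times prior,
# normalized by the evidence p(D). Numbers are illustrative only.
prior = {"x1": 0.5, "x2": 0.3, "x3": 0.2}          # p(x)
likelihood = {"x1": 0.10, "x2": 0.40, "x3": 0.25}  # p(D|x) for the observed D

evidence = sum(likelihood[x] * prior[x] for x in prior)              # p(D)
posterior = {x: likelihood[x] * prior[x] / evidence for x in prior}  # p(x|D)
print(posterior)   # sums to 1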
SLIDE 4
Bayes’ Theorem (Continuous Case)
Given a continuous random variable X and some data D:
f(x|D) = f(D|x)f(x) / f(D)
Posterior = Likelihood × Prior / Evidence
SLIDE 5
Models
To specify the likelihood, we need a way to generate the probability of the data, D, given x. This is a forward, or generative, model that depends on x.
SLIDE 6
Causal versus Diagnostic Reasoning
Two ways to view Bayes’ rule:
P(cause|effect) = P(effect|cause)P(cause) / P(effect)
This supports diagnostic reasoning: inferring the cause from an observed effect, using a causal (forward) model P(effect|cause).
P(effect|cause) = P(cause|effect)P(effect) / P(cause)
This supports causal (predictive) reasoning: inferring the effect from a known cause, using diagnostic knowledge P(cause|effect).
SLIDE 7
Warmup #1
There is a test for a deadly disease you could have. A test outcome of T=0 implies you do not have the disease and T=1 that you do. The test is 95% reliable (meaning it is correct 95% of the time). Given your age and family history, you have a 1% prior probability of having the disease. The test comes back positive (T=1). How worried are you, and why?
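A minimal sketch of the calculation, assuming "95% reliable" means both P(T=1|disease) = 0.95 and P(T=0|no disease) = 0.95:

# Posterior probability of disease after a positive test, via Bayes' rule.
p_disease = 0.01                # prior from age and family history
p_pos_given_disease = 0.95      # sensitivity (one reading of "95% reliable")
p_pos_given_healthy = 0.05      # false-positive rate (same assumption)

# Evidence: total probability of observing T=1
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)

p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(p_disease_given_pos)      # about 0.16, so far from certain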
SLIDE 8
Exercise
Suppose a robot has an acoustic sensor that measures the distance to an obstacle every T seconds. The sensor has an associated error, represented as a bias and a variance relative to the true distance. Establish a probability model for this problem, making appropriate suggestions for the form of any probability densities.
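One reasonable sketch (not the only valid model): treat each reading z as the true distance d plus a constant bias and zero-mean Gaussian noise, so f(z|d) is a Gaussian centered at d + bias. The bias and sigma values below are placeholders, not calibrated numbers.

import numpy as np

BIAS = 0.05      # assumed constant sensor bias (meters)
SIGMA = 0.10     # assumed noise standard deviation (meters)

def likelihood(z, d, bias=BIAS, sigma=SIGMA):
    # f(z|d): Gaussian density of a reading z given true distance d
    return np.exp(-0.5 * ((z - d - bias) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Grid-approximate posterior over the true distance for one reading,
# starting from a uniform prior over 0-5 meters.
distances = np.linspace(0.0, 5.0, 501)
prior = np.ones_like(distances) / distances.size
z_reading = 2.3                                   # hypothetical measurement
posterior = likelihood(z_reading, distances) * prior
posterior /= posterior.sum()                      # normalize by the evidence
print("MAP distance estimate:", distances[np.argmax(posterior)])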
SLIDE 9
Another Classic Example (Pearl 1988, MacKay 2003)
◮ Fred lives in Los Angeles and commutes 60 miles to work. Whilst at work, he receives a phone call from his neighbor saying that Fred’s burglar alarm is ringing. What is the probability that there was a burglar in his house today?
◮ While driving home to investigate, Fred hears on the radio that there was a small earthquake that day near his home. ‘Oh’, he says, feeling relieved, ‘it was probably the earthquake that set off the alarm’. What is the probability that there was a burglar in his house? (See the sketch below.)
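A numeric sketch of the "explaining away" effect, with assumed numbers (not the values used by Pearl or MacKay): rare burglaries, slightly-less-rare earthquakes, and an alarm that is far more likely to ring if either event occurs.

import itertools

P_b = {1: 0.001, 0: 0.999}            # assumed prior P(burglar)
P_e = {1: 0.01,  0: 0.99}             # assumed prior P(earthquake)
P_a1 = {(0, 0): 0.001, (0, 1): 0.20,  # assumed P(alarm=1 | burglar, earthquake)
        (1, 0): 0.95,  (1, 1): 0.98}

def joint(b, e, a):
    p_a = P_a1[(b, e)] if a == 1 else 1.0 - P_a1[(b, e)]
    return P_b[b] * P_e[e] * p_a

# After the neighbor's call: condition on alarm=1 only.
num = sum(joint(1, e, 1) for e in (0, 1))
den = sum(joint(b, e, 1) for b, e in itertools.product((0, 1), repeat=2))
print("P(burglar | alarm) =", num / den)                  # roughly 0.24 with these numbers

# After the radio report: condition on alarm=1 and earthquake=1.
num2 = joint(1, 1, 1)
den2 = sum(joint(b, 1, 1) for b in (0, 1))
print("P(burglar | alarm, earthquake) =", num2 / den2)    # drops to well under 0.01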
SLIDE 10
Combining Evidence
◮ Conditional Independence
◮ Factoring the Joint Probability
Recall that the joint probability distribution tells us all we need to know to make inferences. However,
◮ The complexity of Bayesian inference is dominated by the dimensionality of the joint density.
◮ For every additional evidence feature introduced, the data required to estimate the parameters goes up by a factor of at least 10, even for simple N-D Gaussians.
◮ For 100s of features, most samples from a 100-D Gaussian distribution are not even inside the variance ellipsoid! (See the sketch below.)
◮ This is even worse for more complex joint distributions.
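A quick Monte Carlo check of the 100-D claim, assuming the "variance ellipsoid" means the one-standard-deviation ellipsoid of a standard Gaussian (squared distance from the mean at most 1):

import numpy as np

rng = np.random.default_rng(0)
dim, n_samples = 100, 100_000

x = rng.standard_normal((n_samples, dim))   # samples from a standard 100-D Gaussian
sq_dist = np.sum(x * x, axis=1)             # squared (Mahalanobis) distance from the mean

print("fraction inside the 1-sigma ellipsoid:", np.mean(sq_dist <= 1.0))  # essentially 0
print("typical squared distance:", np.median(sq_dist))                    # concentrates near 100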
SLIDE 11
Warmup #2
You are given the prior probability of an event A, P(A) = 0.7. There is another event, B, that we know the outcome of. Describe briefly what effect there is on our knowledge of event A in three cases: if P(A|B) < P(A), if P(A|B) > P(A), and if P(A|B) = P(A).
SLIDE 12
The naive Bayes model
Let C be a condition (class) and Ei the evidence features for that condition. The naive Bayes model assumes a factorization of the joint probability:
P(C, E1, E2, · · · , EN) = P(C) ∏_{i=1}^{N} P(Ei|C)
i.e., the evidence features are independent given the condition.
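A small sketch of naive Bayes inference for a binary class C with three binary evidence features; the prior and per-feature likelihoods are made up for illustration.

# Unnormalized posterior: P(C) * prod_i P(Ei|C), then normalize by the evidence.
prior = {"C=1": 0.2, "C=0": 0.8}
p_feature_given_class = {                   # P(Ei=1 | C), one entry per feature (assumed)
    "C=1": [0.9, 0.7, 0.6],
    "C=0": [0.2, 0.3, 0.1],
}
evidence_obs = [1, 0, 1]                    # observed values of E1, E2, E3

scores = {}
for c, p_c in prior.items():
    score = p_c
    for e, p1 in zip(evidence_obs, p_feature_given_class[c]):
        score *= p1 if e == 1 else 1.0 - p1
    scores[c] = score

total = sum(scores.values())                # evidence term P(E1, E2, E3)
posterior = {c: s / total for c, s in scores.items()}
print(posterior)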
SLIDE 13
Example from Wumpus World
SLIDE 14
Next Actions
◮ Reading on Bayesian Networks AIAMA 14.1-14.3
◮ There is no warmup.
Quiz II is Thursday 3/22. Covers lectures 9-15 (PL and FOL).