Bayesian Networks
Intro to AI: Lecture 8
Volker Sorge

Outline: Introduction · A Bayesian Network · Inference in Bayesian Networks
Specifying Probability Distributions
◮ Specifying a probability for every atomic event is impractical.
◮ We have already seen that it can be easier to specify probability distributions by using (conditional) independence.
◮ Bayesian (Belief) Networks allow us
  ◮ to specify any distribution,
  ◮ to specify such distributions concisely, in a natural way, if there is (conditional) independence.
Idea of a Bayesian Network
◮ Fix a set of random variables {X1, . . . , Xn}.
◮ If every variable takes k values, we have to specify k^n probabilities to get the complete joint distribution.
◮ A Bayesian Network tries to avoid this by representing direct influences between random variables and restricting the probability distributions that need to be computed to those direct influences.
Setting Up a Bayesian Network
◮ Every random variable in {X1, . . . , Xn} is a node in the network.
◮ Influences are given by directed edges between nodes.
◮ Each node holds the conditional probability distribution of the node given its parent nodes.
◮ If we do this naively, we can still end up computing close to k^n probabilities.
◮ If we exploit conditional independence, we can reduce the complexity to the order of k · n.

Every node of a Bayesian Network is conditionally independent of its non-descendants given its parents.
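The saving can be made concrete by counting table entries; a back-of-the-envelope sketch in Python (the bound of at most two parents per node is an assumption for illustration, not from the slides):

```python
# Number of probabilities needed for n variables with k values each.
n, k = 5, 2                 # e.g. five Boolean random variables

# Full joint distribution: one entry per atomic event.
full_joint = k ** n         # 2^5 = 32 entries

# Bayesian network in which every node has at most p parents:
# each node only needs a table with k^p rows, one per parent
# assignment, so the total grows linearly in n.
p = 2                       # assumed bound on the number of parents
network = n * k ** p        # at most 5 * 2^2 = 20 entries
```
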
Example: Bayesian Net
Edges: A → C, B → C, B → D, C → E, D → E.

Oversleeps (A)
  P(A) = .6

Pershore closed (B)
  P(B) = .2

Volker late (C)
  A B | P(C)
  T T |  .9
  T F |  .7
  F T |  .8
  F F |  .2

Mark late (D)
  B | P(D)
  T |  .3
  F |  .4

Committee cancelled (E)
  C D | P(E)
  T T |  .9
  T F |  .4
  F T |  .5
  F F |  .3
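The tables above translate directly into code; a minimal Python sketch of the network's joint distribution via its factorisation (variable and function names are mine, not from the lecture):

```python
# Conditional probability tables of the example network.
# Each value is P(node = True | parent assignment).
P_A = 0.6                      # P(Oversleeps)
P_B = 0.2                      # P(Pershore closed)
P_C = {(True, True): 0.9, (True, False): 0.7,    # P(Volker late | A, B)
       (False, True): 0.8, (False, False): 0.2}
P_D = {True: 0.3, False: 0.4}  # P(Mark late | B)
P_E = {(True, True): 0.9, (True, False): 0.4,    # P(Committee cancelled | C, D)
       (False, True): 0.5, (False, False): 0.3}

def p(prob, value):
    """Return prob if value is True, else 1 - prob."""
    return prob if value else 1.0 - prob

def joint(a, b, c, d, e):
    """Joint probability of one atomic event via the network
    factorisation P(a)P(b)P(c|a,b)P(d|b)P(e|c,d)."""
    return (p(P_A, a) * p(P_B, b) * p(P_C[(a, b)], c)
            * p(P_D[b], d) * p(P_E[(c, d)], e))
```

Summing `joint` over all 32 atomic events yields 1, a quick sanity check on the tables.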
Probabilistic Inference: Goal
◮ Compute the probability distribution for some event
given some evidence.
◮ More formally:
  ◮ let Q be a set of query variables,
  ◮ let E be a set of evidence variables,
  ◮ compute 𝐏(Q|E).
◮ Here evidence means that we know the exact event for
the variables in E.
◮ E.g., given that we know Volker has overslept, how likely is it that the committee will be cancelled?
Types of Inference
Diagnostic Inferences: From effects to causes. How likely is a cause for some observed event?
Causal Inferences: From causes to effects. How likely will some observed events cause some other event?
Intercausal Inferences: Between causes of a common effect. How likely is it that, if we know one cause for an event, some other cause is also happening? This is also sometimes called “explaining away”.
Mixed Inferences: Combining two or more of the above.
Inferences in Bayesian Nets
A schematic overview for queries Q and observed evidence E:

[Diagram omitted: the relative positions of Q and E in the network for the diagnostic, causal, intercausal, and mixed cases.]
Inference Examples
Diagnostic:  𝐏(B|E) = ⟨.21, .79⟩
Causal:      𝐏(E|A) = ⟨.533, .467⟩
Intercausal: 𝐏(C|D) = ⟨.557, .443⟩
Mixed:       𝐏(C|B, E) = ⟨.904, .096⟩

Computed with http://aispace.org/bayes/.
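These four values can be reproduced by brute-force enumeration over all atomic events; a Python sketch with the CPTs transcribed from the example network (identifiers are my own):

```python
from itertools import product

# CPTs of the example network: P(node = True | parent assignment).
CPT = {
    'A': {(): 0.6},                                 # Oversleeps
    'B': {(): 0.2},                                 # Pershore closed
    'C': {(True, True): 0.9, (True, False): 0.7,    # Volker late | A, B
          (False, True): 0.8, (False, False): 0.2},
    'D': {(True,): 0.3, (False,): 0.4},             # Mark late | B
    'E': {(True, True): 0.9, (True, False): 0.4,    # Cancelled | C, D
          (False, True): 0.5, (False, False): 0.3},
}
PARENTS = {'A': (), 'B': (), 'C': ('A', 'B'), 'D': ('B',), 'E': ('C', 'D')}
VARS = ['A', 'B', 'C', 'D', 'E']

def joint(event):
    """P(full assignment) = product over nodes of P(x | parents(x))."""
    prob = 1.0
    for x in VARS:
        px = CPT[x][tuple(event[u] for u in PARENTS[x])]
        prob *= px if event[x] else 1.0 - px
    return prob

def query(q, evidence):
    """P(q = True | evidence): sum the joint over all assignments
    consistent with the evidence, then normalise."""
    num = den = 0.0
    for vals in product([True, False], repeat=len(VARS)):
        event = dict(zip(VARS, vals))
        if any(event[k] != v for k, v in evidence.items()):
            continue
        pr = joint(event)
        den += pr
        if event[q]:
            num += pr
    return num / den
```

For instance, `query('B', {'E': True})` gives ≈ .211, matching the diagnostic result above.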
Notation
◮ P stands for a simple probability.
◮ 𝐏 stands for a probability distribution (i.e., a set of probabilities).
◮ P(A|B) denotes the probability of A under the condition B.
◮ 𝐏(A|B) denotes the probability distribution for A under the condition B.
◮ 𝐏(A, B) is the not yet normalised distribution for A under the condition B. That is, α𝐏(A, B) = 𝐏(A|B).
◮ Finally, small letters stand for variables that have to be summed out (sometimes called nuisance variables).
Example Computation
◮ 𝐏(B|E = T) = α𝐏(B, E = T).
◮ We compute 𝐏(B, E) by summing out the remaining variables A, C, D.
◮ We will write a, c, d for the respective events.
◮ This means we have to compute
    Σ_a Σ_c Σ_d 𝐏(B, E, a, c, d),
  which is the (not normalised) distribution of B under the assumption that E = T, while summing out a, c, d.
◮ A simple example of how to sum out is:
    Σ_a 𝐏(B, a) = 𝐏(B, A) + 𝐏(B, ¬A).
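As an illustration of the last point, a tiny Python sketch summing A out of the joint 𝐏(B, a); since A and B are both root nodes of the example network, their joint factorises as P(B) · P(a) (names are mine):

```python
# Summing out A from the joint P(B, a), with the example network's
# numbers: P(A) = 0.6, P(B) = 0.2. A and B are root nodes here,
# so they are independent and the joint is just the product.
P_A, P_B = 0.6, 0.2

joint = {(b, a): (P_B if b else 1 - P_B) * (P_A if a else 1 - P_A)
         for b in (True, False) for a in (True, False)}

# Sum out a:  sum_a P(B, a) = P(B, A) + P(B, ¬A), which recovers P(B).
marginal_B = {b: sum(joint[(b, a)] for a in (True, False))
              for b in (True, False)}
```
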
The great advantage of a Bayesian network is that we can effectively use all the conditional probabilities given in the network to express the term 𝐏(B, E, a, c, d) as follows:

  𝐏(B, E) = Σ_a Σ_c Σ_d 𝐏(B, E, a, c, d)
          = Σ_a Σ_c Σ_d P(B)P(a)P(d|B)P(c|B, a)P(E|c, d)

We observe that all the probability distributions on the right-hand side are indeed fully given in the network. The summing out works as follows:

  𝐏(B, E) = Σ_a Σ_c Σ_d 𝐏(B, E, a, c, d)
          = Σ_a Σ_c Σ_d P(B)P(a)P(d|B)P(c|B, a)P(E|c, d)
          = Σ_c Σ_d P(B)P(A)P(d|B)P(c|B, A)P(E|c, d)
            + Σ_c Σ_d P(B)P(¬A)P(d|B)P(c|B, ¬A)P(E|c, d)
          = Σ_d P(B)P(A)P(d|B)P(C|B, A)P(E|C, d)
            + Σ_d P(B)P(A)P(d|B)P(¬C|B, A)P(E|¬C, d)
            + Σ_d P(B)P(¬A)P(d|B)P(C|B, ¬A)P(E|C, d)
            + Σ_d P(B)P(¬A)P(d|B)P(¬C|B, ¬A)P(E|¬C, d)
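The derivation above can be carried out numerically; a Python sketch computing the unnormalised 𝐏(B, E = T) from the factorisation and then normalising with α (CPT values from the example network; names are mine):

```python
# CPTs of the example network: P(node = True | parent assignment).
P_A = 0.6
P_B = 0.2
P_C = {(True, True): 0.9, (True, False): 0.7,
       (False, True): 0.8, (False, False): 0.2}   # keyed by (a, b)
P_D = {True: 0.3, False: 0.4}                      # keyed by b
P_E = {(True, True): 0.9, (True, False): 0.4,
       (False, True): 0.5, (False, False): 0.3}   # keyed by (c, d)

def p(prob, val):
    """Return prob if val is True, else 1 - prob."""
    return prob if val else 1.0 - prob

TF = (True, False)

# Unnormalised P(B = b, E = True): sum out a, c, d using the
# factorisation P(B)P(a)P(d|B)P(c|B,a)P(E|c,d).
unnorm = {b: sum(p(P_A, a) * p(P_B, b) * p(P_D[b], d)
                 * p(P_C[(a, b)], c) * p(P_E[(c, d)], True)
                 for a in TF for c in TF for d in TF)
          for b in TF}

alpha = 1.0 / (unnorm[True] + unnorm[False])       # normalisation constant
dist_B_given_E = {b: alpha * unnorm[b] for b in TF}
```

The result, ≈ ⟨.211, .789⟩, matches the diagnostic query 𝐏(B|E) quoted earlier.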
Questions
◮ In the above Bayesian Network, give examples for the following concepts:
  ◮ independent events,
  ◮ conditionally independent events, and
  ◮ dependent events.