Bayes Nets (cont'd)

CS 486/686, University of Waterloo
May 31, 2005

CS486/686 Lecture Slides (c) 2005 C. Boutilier, P. Poupart & K. Larson

Outline

  • Wrap up d-separation
  • Inference in Bayes Nets
  • Variable Elimination


D-Separation: Intuitions

  • Subway and Therm are dependent; but are independent given Flu (since Flu blocks the only path).
  • Aches and Fever are dependent; but are independent given Flu (since Flu blocks the only path). Similarly for Aches and Therm (dependent, but independent given Flu).
  • Flu and Mal are independent (given no evidence): Fever blocks the path, since it is not in evidence, nor is its descendant Therm.
  • Flu, Mal are dependent given Fever (or given Therm): nothing blocks the path now.
  • Subway, ExoticTrip are independent; they are dependent given Therm; they are independent given Therm and Malaria. This is for exactly the same reasons as for Flu/Mal above.


Inference in Bayes Nets

  • The independence sanctioned by d-separation (and other methods) allows us to compute prior and posterior probabilities quite effectively.
  • We'll look at a couple of simple examples to illustrate. We'll focus on networks without loops. (A loop is a cycle in the underlying undirected graph. Recall the directed graph has no cycles.)


Simple Forward Inference (Chain)

  • Computing a prior requires simple forward "propagation" of probabilities.
  • Note: all (final) terms are CPTs in the BN. Note: only ancestors of J are considered.

P(J) = Σ_{M,ET} P(J|M,ET) P(M,ET)        (marginalization)
P(J) = Σ_{M,ET} P(J|M) P(M|ET) P(ET)     (chain rule and independence)
P(J) = Σ_M P(J|M) Σ_ET P(M|ET) P(ET)     (distribution of sum)
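The derivation above can be sketched for the chain ET → M → J. All CPT numbers below are invented for illustration; only the structure of the computation (pushing the sum over ET inward) comes from the slide:

```python
# Hypothetical CPTs for the chain ET -> M -> J (numbers invented for illustration).
P_ET = {True: 0.3, False: 0.7}                       # P(ET)
P_M_given_ET = {True: {True: 0.8, False: 0.2},       # P(M|ET): outer key is ET
               False: {True: 0.1, False: 0.9}}
P_J_given_M = {True: {True: 0.9, False: 0.1},        # P(J|M): outer key is M
              False: {True: 0.2, False: 0.8}}

# P(J) = sum_M P(J|M) * [ sum_ET P(M|ET) P(ET) ]  -- the sum over ET pushed inward
P_M = {m: sum(P_M_given_ET[et][m] * P_ET[et] for et in (True, False))
       for m in (True, False)}
P_J = {j: sum(P_J_given_M[m][j] * P_M[m] for m in (True, False))
       for j in (True, False)}
```

Note that only the ancestors of J (here M and ET) ever enter the computation.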


Simple Forward Inference (Chain)

  • Same idea applies when we have "upstream" evidence:

P(J|ET) = Σ_M P(J|M,ET) P(M|ET)
        = Σ_M P(J|M) P(M|ET)        (J is conditionally independent of ET given M)


Simple Forward Inference (Pooling)

  • Same idea applies with multiple parents:

P(Fev) = Σ_{Flu,M} P(Fev|Flu,M) P(Flu,M)                                     (1)
       = Σ_{Flu,M} P(Fev|Flu,M) P(Flu) P(M)                                  (2)
       = Σ_{Flu,M} P(Fev|Flu,M) [Σ_TS P(Flu|TS) P(TS)] [Σ_ET P(M|ET) P(ET)]  (3)

  • (1) follows by the summing out rule; (2) by independence of Flu and M; (3) by summing out.
    – Note: all terms are CPTs in the Bayes net.


Simple Forward Inference (Pooling)

  • Same idea applies with evidence:

P(Fev|ts,~M) = Σ_Flu P(Fev|Flu,ts,~M) P(Flu|ts,~M)
             = Σ_Flu P(Fev|Flu,~M) P(Flu|ts)


Simple Backward Inference

  • When evidence is downstream of the query variable, we must reason "backwards." This requires the use of Bayes rule:

P(ET|j) = α P(j|ET) P(ET)
        = α Σ_M P(j|M,ET) P(M|ET) P(ET)
        = α Σ_M P(j|M) P(M|ET) P(ET)

  • The first step is just Bayes rule.
    – The normalizing constant α is 1/P(j); but we needn't compute it explicitly if we compute P(ET|j) for each value of ET: we just add up the terms P(j|ET) P(ET) for all values of ET (they sum to P(j)).
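Continuing the hypothetical chain numbers from the forward-inference sketch (again, the CPT values are invented for illustration), backward inference with the implicit normalization looks like:

```python
# Backward inference sketch for the chain ET -> M -> J, observing J = true.
# Same hypothetical CPTs as in the forward-inference sketch.
P_ET = {True: 0.3, False: 0.7}
P_M_given_ET = {True: {True: 0.8, False: 0.2}, False: {True: 0.1, False: 0.9}}
P_J_given_M = {True: {True: 0.9, False: 0.1}, False: {True: 0.2, False: 0.8}}

# Unnormalized terms P(j|ET) P(ET) = [sum_M P(j|M) P(M|ET)] P(ET)
unnorm = {et: P_ET[et] * sum(P_J_given_M[m][True] * P_M_given_ET[et][m]
                             for m in (True, False))
          for et in (True, False)}
alpha = 1.0 / sum(unnorm.values())   # the terms sum to P(j), so alpha = 1/P(j)
P_ET_given_j = {et: alpha * v for et, v in unnorm.items()}
```

With these numbers the unnormalized terms sum to 0.417, the same P(j) the forward computation produced, as the slide's remark about α predicts.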


Backward Inference (Pooling)

  • Same ideas apply when several pieces of evidence lie "downstream":

P(ET|j,fev) = α P(j,fev|ET) P(ET)
            = α Σ_M P(j,fev|M,ET) P(M|ET) P(ET)
            = α Σ_M P(j,fev|M) P(M|ET) P(ET)
            = α Σ_M P(j|M) P(fev|M) P(M|ET) P(ET)

    – Same steps as before; but now we compute the probability of both pieces of evidence given hypothesis ET and combine them. Note: they are independent given M, but not given ET.
    – We still must simplify P(fev|M) down to CPTs (as usual).


Variable Elimination

  • The intuitions in the above examples give us a simple inference algorithm for networks without loops: the polytree algorithm.
  • Instead, we'll look at a more general algorithm that works for general BNs; the polytree algorithm will be a special case.
  • The algorithm, variable elimination, simply applies the summing out rule repeatedly.
    – To keep computation simple, it exploits the independence in the network and the ability to distribute sums inward.


Factors

  • A function f(X1, X2, …, Xk) is also called a factor. We can view this as a table of numbers, one for each instantiation of the variables X1, X2, …, Xk.
    – A tabular representation of a factor is exponential in k.
  • Each CPT in a Bayes net is a factor:
    – e.g., Pr(C|A,B) is a function of three variables, A, B, C.
  • Notation: f(X,Y) denotes a factor over the variables X ∪ Y. (Here X, Y are sets of variables.)
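A minimal sketch of the tabular representation, assuming boolean variables; the `(variables, table)` pair and `make_factor` helper are illustrative choices, not from the slides:

```python
from itertools import product

# A factor over boolean variables, stored as (variables, table): the table has
# one entry per instantiation of the variables, so 2**k rows for k variables.
def make_factor(variables, value_fn):
    """variables: tuple of names; value_fn maps an assignment dict to a number."""
    table = {vals: value_fn(dict(zip(variables, vals)))
             for vals in product((True, False), repeat=len(variables))}
    return (tuple(variables), table)

# e.g. a CPT Pr(C|A,B) is a factor of the three variables A, B, C
# (these particular numbers are invented for illustration)
f = make_factor(('A', 'B', 'C'), lambda asg: 0.9 if asg['C'] else 0.1)
```

The exponential size of the table (2**k entries) is exactly the cost the slide warns about.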


The Product of Two Factors

  • Let f(X,Y) and g(Y,Z) be two factors with variables Y in common.
  • The product of f and g, denoted h = f × g (or sometimes just h = fg), is defined:

h(X,Y,Z) = f(X,Y) × g(Y,Z)

f(A,B):        g(B,C):        h(A,B,C) = f × g:
  ab   0.9       bc   0.7       abc    0.63    ~abc    0.28
  a~b  0.1       b~c  0.3       ab~c   0.27    ~ab~c   0.12
  ~ab  0.4       ~bc  0.8       a~bc   0.08    ~a~bc   0.48
  ~a~b 0.6       ~b~c 0.2       a~b~c  0.02    ~a~b~c  0.12
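A sketch of the product operation on the tables above, using the `(variables, table)` representation assumed earlier:

```python
from itertools import product

# f(A,B) and g(B,C) from the table above, as (variables, table) pairs.
f = (('A', 'B'), {(True, True): 0.9, (True, False): 0.1,
                  (False, True): 0.4, (False, False): 0.6})
g = (('B', 'C'), {(True, True): 0.7, (True, False): 0.3,
                  (False, True): 0.8, (False, False): 0.2})

def factor_product(f, g):
    fv, ft = f
    gv, gt = g
    hv = fv + tuple(v for v in gv if v not in fv)   # union of the variables
    ht = {}
    for vals in product((True, False), repeat=len(hv)):
        asg = dict(zip(hv, vals))
        ht[vals] = ft[tuple(asg[v] for v in fv)] * gt[tuple(asg[v] for v in gv)]
    return (hv, ht)

h = factor_product(f, g)   # h(A,B,C); e.g. h(a,b,c) = f(a,b) * g(b,c) = 0.9 * 0.7
```

Each entry of h is the product of the matching f and g entries, reproducing the table on the slide (e.g. h(a,b,c) = 0.63).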


Summing a Variable Out of a Factor

  • Let f(X,Y) be a factor with variable X (Y is a set of variables).
  • We sum out variable X from f to produce a new factor h = Σ_X f, which is defined:

h(Y) = Σ_{x ∈ Dom(X)} f(x,Y)

f(A,B):        h(B) = Σ_A f:
  ab   0.9       b   1.3
  a~b  0.1       ~b  0.7
  ~ab  0.4
  ~a~b 0.6
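A sketch of summing out, again on the `(variables, table)` representation assumed earlier:

```python
# f(A,B) from the table above, as a (variables, table) pair.
f = (('A', 'B'), {(True, True): 0.9, (True, False): 0.1,
                  (False, True): 0.4, (False, False): 0.6})

def sum_out(f, var):
    fv, ft = f
    i = fv.index(var)
    ht = {}
    for vals, p in ft.items():
        key = vals[:i] + vals[i + 1:]       # drop var's position from the key
        ht[key] = ht.get(key, 0.0) + p      # accumulate over var's values
    return (fv[:i] + fv[i + 1:], ht)

h = sum_out(f, 'A')   # h(b) = 0.9 + 0.4, h(~b) = 0.1 + 0.6, as on the slide
```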


Restricting a Factor

  • Let f(X,Y) be a factor with variable X (Y is a set of variables).
  • We restrict factor f to X=x by setting X to the value x and "deleting" X. Define h = f_{X=x} as:

h(Y) = f(x,Y)

f(A,B):        h(B) = f_{A=a}:
  ab   0.9       b   0.9
  a~b  0.1       ~b  0.1
  ~ab  0.4
  ~a~b 0.6
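The restriction operation in the same sketch representation:

```python
# f(A,B) from the table above, as a (variables, table) pair.
f = (('A', 'B'), {(True, True): 0.9, (True, False): 0.1,
                  (False, True): 0.4, (False, False): 0.6})

def restrict(f, var, value):
    fv, ft = f
    i = fv.index(var)
    # keep only the rows where var == value, and delete var from the keys
    ht = {vals[:i] + vals[i + 1:]: p
          for vals, p in ft.items() if vals[i] == value}
    return (fv[:i] + fv[i + 1:], ht)

h = restrict(f, 'A', True)   # h(b) = f(a,b), h(~b) = f(a,~b), as on the slide
```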


Variable Elimination: No Evidence

  • Computing the prior probability of query variable C can be seen as applying these operations on factors. Network: the chain A → B → C, with factors f1(A) = P(A), f2(A,B) = P(B|A), f3(B,C) = P(C|B).

P(C) = Σ_{A,B} P(C|B) P(B|A) P(A)
     = Σ_B P(C|B) Σ_A P(B|A) P(A)
     = Σ_B f3(B,C) Σ_A f2(A,B) f1(A)
     = Σ_B f3(B,C) f4(B)
     = f5(C)

Define new factors: f4(B) = Σ_A f2(A,B) f1(A) and f5(C) = Σ_B f3(B,C) f4(B).


Variable Elimination: No Evidence

  • Here's the example with some numbers (chain A → B → C):

f1(A):   a 0.9,   ~a 0.1
f2(A,B): ab 0.9,  a~b 0.1,  ~ab 0.4,  ~a~b 0.6
f3(B,C): bc 0.7,  b~c 0.3,  ~bc 0.2,  ~b~c 0.8

f4(B):   b 0.85,  ~b 0.15
f5(C):   c 0.625, ~c 0.375
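The numeric example can be checked directly; the dict-based encoding below is an illustrative assumption, while the numbers are the slide's:

```python
# The chain A -> B -> C with the numbers from the slide.
f1 = {True: 0.9, False: 0.1}                     # f1(A) = P(A)
f2 = {(True, True): 0.9, (True, False): 0.1,     # f2(A,B) = P(B|A)
      (False, True): 0.4, (False, False): 0.6}
f3 = {(True, True): 0.7, (True, False): 0.3,     # f3(B,C) = P(C|B)
      (False, True): 0.2, (False, False): 0.8}

# f4(B) = sum_A f2(A,B) f1(A)
f4 = {b: sum(f2[(a, b)] * f1[a] for a in (True, False)) for b in (True, False)}
# f5(C) = sum_B f3(B,C) f4(B)
f5 = {c: sum(f3[(b, c)] * f4[b] for b in (True, False)) for c in (True, False)}
```

This reproduces the slide's values: f4(b) = 0.85, f4(~b) = 0.15, f5(c) = 0.625, f5(~c) = 0.375.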


VE: No Evidence (Example 2)

Network: A and B are parents of C, and C is the parent of D; factors f1(A), f2(B), f3(A,B,C), f4(C,D).

P(D) = Σ_{A,B,C} P(D|C) P(C|B,A) P(B) P(A)
     = Σ_C P(D|C) Σ_B P(B) Σ_A P(C|B,A) P(A)
     = Σ_C f4(C,D) Σ_B f2(B) Σ_A f3(A,B,C) f1(A)
     = Σ_C f4(C,D) Σ_B f2(B) f5(B,C)
     = Σ_C f4(C,D) f6(C)
     = f7(D)

Define new factors f5(B,C), f6(C), f7(D) in the obvious way.


Variable Elimination: One View

  • One way to think of variable elimination:
    – write out the desired computation using the chain rule, exploiting the independence relations in the network
    – arrange the terms in a convenient fashion
    – distribute each sum (over each variable) in as far as it will go
      • i.e., the sum over variable X can be "pushed in" as far as the "first" factor mentioning X
    – apply operations "inside out", repeatedly eliminating and creating new factors (note that each step/removal of a sum eliminates one variable)


Variable Elimination Algorithm

Given query variable Q and remaining variables Z, let F be the set of factors corresponding to the CPTs for {Q} ∪ Z.

  • 1. Choose an elimination ordering Z1, …, Zn of the variables in Z.
  • 2. For each Zj (in the order given), eliminate Zj ∈ Z as follows:
    (a) Compute the new factor gj = Σ_{Zj} f1 × f2 × … × fk, where the fi are the factors in F that include Zj.
    (b) Remove the factors fi (that mention Zj) from F and add the new factor gj to F.
  • 3. The remaining factors refer only to the query variable Q. Take their product and normalize to produce P(Q).
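The three steps above can be sketched end-to-end with the factor operations from the earlier slides. This is an illustrative implementation under the assumed `(variables, table)` representation, not the course's reference code:

```python
from itertools import product

# Factors are (variables, table) pairs over boolean variables.
def factor_product(f, g):
    fv, ft = f
    gv, gt = g
    hv = fv + tuple(v for v in gv if v not in fv)      # union of the variables
    ht = {}
    for vals in product((True, False), repeat=len(hv)):
        asg = dict(zip(hv, vals))
        ht[vals] = ft[tuple(asg[v] for v in fv)] * gt[tuple(asg[v] for v in gv)]
    return (hv, ht)

def sum_out(f, var):
    fv, ft = f
    i = fv.index(var)
    ht = {}
    for vals, p in ft.items():
        key = vals[:i] + vals[i + 1:]
        ht[key] = ht.get(key, 0.0) + p
    return (fv[:i] + fv[i + 1:], ht)

def variable_elimination(factors, order):
    for z in order:                                    # step 2: eliminate each Zj
        mentioning = [f for f in factors if z in f[0]]
        factors = [f for f in factors if z not in f[0]]
        g = mentioning[0]
        for f in mentioning[1:]:
            g = factor_product(g, f)                   # (a) product of the fi
        factors.append(sum_out(g, z))                  # (b) add gj, drop the fi
    result = factors[0]                                # step 3: product of the
    for f in factors[1:]:                              # remaining factors,
        result = factor_product(result, f)             # then normalize
    total = sum(result[1].values())
    return {vals: p / total for vals, p in result[1].items()}

# The chain A -> B -> C with the numbers from the earlier numeric slide:
f1 = (('A',), {(True,): 0.9, (False,): 0.1})
f2 = (('A', 'B'), {(True, True): 0.9, (True, False): 0.1,
                   (False, True): 0.4, (False, False): 0.6})
f3 = (('B', 'C'), {(True, True): 0.7, (True, False): 0.3,
                   (False, True): 0.2, (False, False): 0.8})
P_C = variable_elimination([f1, f2, f3], order=['A', 'B'])
```

On the chain example this recovers the slide's answer, P(c) = 0.625.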


VE: Example 2 again

Factors: f1(A), f2(B), f3(A,B,C), f4(C,D). Query: P(D)?

  • Elimination order: A, B, C

Step 1: Add f5(B,C) = Σ_A f3(A,B,C) f1(A). Remove f1(A), f3(A,B,C).
Step 2: Add f6(C) = Σ_B f2(B) f5(B,C). Remove f2(B), f5(B,C).
Step 3: Add f7(D) = Σ_C f4(C,D) f6(C). Remove f4(C,D), f6(C).

The last factor f7(D) is the (possibly unnormalized) probability P(D).


Variable Elimination: Evidence

  • Computing the posterior of a query variable given evidence is similar; in the chain A → B → C, suppose we observe C=c:

P(A|c) = α P(A) P(c|A)
       = α P(A) Σ_B P(c|B) P(B|A)
       = α f1(A) Σ_B f3(B,c) f2(A,B)
       = α f1(A) Σ_B f4(B) f2(A,B)
       = α f1(A) f5(A)
       = α f6(A)

New factors: f4(B) = f3(B,c); f5(A) = Σ_B f2(A,B) f4(B); f6(A) = f1(A) f5(A).
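With the numbers from the earlier numeric slide, the evidence derivation can be sketched as follows (the dict encoding is an illustrative assumption; the CPT values are the slide's):

```python
# Chain A -> B -> C with the earlier numeric slide's values; observe C = c.
f1 = {True: 0.9, False: 0.1}                     # P(A)
f2 = {(True, True): 0.9, (True, False): 0.1,     # P(B|A)
      (False, True): 0.4, (False, False): 0.6}
f3 = {(True, True): 0.7, (True, False): 0.3,     # P(C|B)
      (False, True): 0.2, (False, False): 0.8}

f4 = {b: f3[(b, True)] for b in (True, False)}            # f4(B) = f3(B,c)
f5 = {a: sum(f2[(a, b)] * f4[b] for b in (True, False))   # f5(A)
      for a in (True, False)}
f6 = {a: f1[a] * f5[a] for a in (True, False)}            # unnormalized posterior
alpha = 1.0 / sum(f6.values())                            # alpha = 1/P(c)
P_A_given_c = {a: alpha * p for a, p in f6.items()}
```

Here the unnormalized values sum to 0.625, matching the P(c) computed on the earlier slide.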


Variable Elimination with Evidence

Given query variable Q, evidence variables E (observed to be e), and remaining variables Z, let F be the set of factors corresponding to the CPTs for {Q} ∪ Z.

  • 1. Replace each factor f ∈ F that mentions a variable in E with its restriction f_{E=e} (somewhat abusing notation).
  • 2. Choose an elimination ordering Z1, …, Zn of the variables in Z.
  • 3. Run variable elimination as above.
  • 4. The remaining factors refer only to the query variable Q. Take their product and normalize to produce P(Q).


VE: Example 2 again with Evidence

Factors: f1(A), f2(B), f3(A,B,C), f4(C,D). Query: P(A)? Evidence: D=d.

  • Elimination order: C, B

Restriction: replace f4(C,D) with f5(C) = f4(C,d).
Step 1: Add f6(A,B) = Σ_C f5(C) f3(A,B,C). Remove f3(A,B,C), f5(C).
Step 2: Add f7(A) = Σ_B f6(A,B) f2(B). Remove f6(A,B), f2(B).

Last factors: f7(A), f1(A). The product f1(A) × f7(A) is the (possibly unnormalized) posterior. So P(A|d) = α f1(A) × f7(A).


Some Notes on the VE Algorithm

  • After iteration j (elimination of Zj), the factors remaining in F refer only to the variables Z_{j+1}, …, Zn and Q. No factor mentions an evidence variable E after the initial restriction.
  • Number of iterations: linear in the number of variables.
  • Complexity is linear in the number of variables and exponential in the size of the largest factor.
    – Recall each factor has size exponential in its number of variables.
    – We can't do any better than the size of the BN (since its original factors are part of the factor set).
    – When we create new factors, we might make a set of variables larger.


Some Notes on the VE Algorithm

  • The size of the resulting factors is determined by the elimination ordering! (We'll see this in detail.)
  • For polytrees, it is easy to find a good ordering (e.g., work outside in).
  • For general BNs, sometimes good orderings exist, sometimes they don't (then inference is exponential in the number of variables).
    – Simply finding the optimal elimination ordering for general BNs is NP-hard.
    – Inference in general BNs is NP-hard.


Elimination Ordering: Polytrees

  • Inference is linear in the size of the network:
    – ordering: eliminate only "singly-connected" nodes
    – e.g., in this network, eliminate D, A, C, X1, …; or eliminate X1, …, Xk, D, A, C; or mix them up
    – result: no factor is ever larger than the original CPTs
    – eliminating B before these gives factors that include all of A, C, X1, …, Xk!


Effect of Different Orderings

  • Suppose the query variable is D. Consider different orderings for this network:
    – A,F,H,G,B,C,E:
      • good: why?
    – E,C,A,B,G,H,F:
      • bad: why?
  • Which ordering creates the smallest factors?
    – either max size or total
    – which creates the largest?


Relevance

  • Certain variables have no impact on the query.
    – In the chain network A → B → C, computing Pr(A) with no evidence requires elimination of B and C.
      • But when you sum out these variables, you compute a trivial factor (whose values are all ones); for example:
      • eliminating C: f4(B) = Σ_C f3(B,C) = Σ_C Pr(C|B) = 1 for any value of B (e.g., Pr(c|b) + Pr(~c|b) = 1)
  • No need to think about B or C for this query.
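The all-ones observation is easy to check with the Pr(C|B) numbers from the earlier chain example (the dict encoding is an illustrative assumption):

```python
# Pr(C|B) from the earlier numeric chain example, keyed by (B, C).
P_C_given_B = {(True, True): 0.7, (True, False): 0.3,
               (False, True): 0.2, (False, False): 0.8}

# Summing out C: f4(B) = sum_C Pr(C|B)
f4 = {}
for (b, c), p in P_C_given_B.items():
    f4[b] = f4.get(b, 0.0) + p
# f4 is 1 for every value of B, because each CPT row sums to one; so C
# contributes nothing to a query on A with no evidence.
```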


Relevance: A Sound Approximation

  • We can restrict attention to the relevant variables. Given query Q and evidence E:
    – Q is relevant
    – if any node Z is relevant, its parents are relevant
    – if E ∈ E is a descendant of a relevant node, then E is relevant
  • We can restrict our attention to the subnetwork comprising only the relevant variables when evaluating a query Q.


Next Class

  • Probabilistic reasoning over time
    – Dynamic Bayesian Networks
    – Hidden Markov Models
  • Russell & Norvig: Chapter 15
  • Lecture on the board (no slides)