slide-1
SLIDE 1 Inference in belief networks
Chapter 15.3–4 + new
AIMA Slides © Stuart Russell and Peter Norvig, 1998
slide-2
SLIDE 2 Outline
  • Exact inference by enumeration
  • Exact inference by variable elimination
  • Approximate inference by stochastic simulation
  • Approximate inference by Markov chain Monte Carlo
slide-3
SLIDE 3 Inference tasks
  • Simple queries: compute posterior marginal P(X_i | E = e),
    e.g., P(NoGas | Gauge = empty, Lights = on, Starts = false)
  • Conjunctive queries: P(X_i, X_j | E = e) = P(X_i | E = e) P(X_j | X_i, E = e)
  • Optimal decisions: decision networks include utility information;
    probabilistic inference required for P(outcome | action, evidence)
  • Value of information: which evidence to seek next?
  • Sensitivity analysis: which probability values are most critical?
  • Explanation: why do I need a new starter motor?
slide-4
SLIDE 4 Inference by enumeration
A slightly intelligent way to sum out variables from the joint without
actually constructing its explicit representation.

Simple query on the burglary network:
  P(B | J = true, M = true)
    = P(B, J = true, M = true) / P(J = true, M = true)
    = α P(B, J = true, M = true)
    = α Σ_e Σ_a P(B, e, a, J = true, M = true)

Rewrite full joint entries using products of CPT entries:
  P(B = true | J = true, M = true)
    = α Σ_e Σ_a P(B = true) P(e) P(a | B = true, e) P(J = true | a) P(M = true | a)
    = α P(B = true) Σ_e P(e) Σ_a P(a | B = true, e) P(J = true | a) P(M = true | a)
slide-5
SLIDE 5 Enumeration algorithm
Exhaustive depth-first enumeration: O(n) space, O(d^n) time

function EnumerationAsk(X, e, bn) returns a distribution over X
  inputs: X, the query variable
          e, evidence specified as an event
          bn, a belief network specifying joint distribution P(X_1, ..., X_n)
  Q(X) ← a distribution over X
  for each value x_i of X do
      extend e with value x_i for X
      Q(x_i) ← EnumerateAll(Vars[bn], e)
  return Normalize(Q(X))

function EnumerateAll(vars, e) returns a real number
  if Empty?(vars) then return 1.0
  else do
      Y ← First(vars)
      if Y has value y in e
          then return P(y | Pa(Y)) × EnumerateAll(Rest(vars), e)
          else return Σ_y P(y | Pa(Y)) × EnumerateAll(Rest(vars), e_y)
               where e_y is e extended with Y = y
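The enumeration pseudocode above can be sketched in a few lines of Python on the burglary network. The CPT numbers below are the standard AIMA burglary-network values; they are an assumption here, since the slide does not list them.

```python
# Burglary network: each variable's parents and the probability it is true
# given each combination of parent values (standard AIMA numbers, assumed).
T, F = True, False

PARENTS = {"B": [], "E": [], "A": ["B", "E"], "J": ["A"], "M": ["A"]}
CPT = {
    "B": {(): 0.001},
    "E": {(): 0.002},
    "A": {(T, T): 0.95, (T, F): 0.94, (F, T): 0.29, (F, F): 0.001},
    "J": {(T,): 0.90, (F,): 0.05},
    "M": {(T,): 0.70, (F,): 0.01},
}
ORDER = ["B", "E", "A", "J", "M"]  # topological order, as EnumerateAll requires

def p(var, value, e):
    """P(var = value | parents(var)), with the parents' values taken from e."""
    p_true = CPT[var][tuple(e[pa] for pa in PARENTS[var])]
    return p_true if value else 1.0 - p_true

def enumerate_all(vars_, e):
    if not vars_:
        return 1.0
    Y, rest = vars_[0], vars_[1:]
    if Y in e:  # Y has a value in e
        return p(Y, e[Y], e) * enumerate_all(rest, e)
    return sum(p(Y, y, e) * enumerate_all(rest, {**e, Y: y}) for y in (T, F))

def enumeration_ask(X, e):
    Q = {x: enumerate_all(ORDER, {**e, X: x}) for x in (T, F)}
    z = sum(Q.values())
    return {x: q / z for x, q in Q.items()}  # Normalize

dist = enumeration_ask("B", {"J": T, "M": T})  # ≈ {True: 0.284, False: 0.716}
```

With these CPTs the query on Slide 4 gives P(B = true | J = true, M = true) ≈ 0.284, the familiar burglary-network answer.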
slide-6
SLIDE 6 Inference by variable elimination
Enumeration is inefficient: repeated computation,
e.g., it computes P(J = true | a) P(M = true | a) for each value of e.

Variable elimination: carry out summations right-to-left,
storing intermediate results (factors) to avoid recomputation.

P(B | J = true, M = true)
  = α P(B) Σ_e P(e) Σ_a P(a | B, e) P(J = true | a) P(M = true | a)
      [B]       [E]      [A]             [J]               [M]
  = α P(B) Σ_e P(e) Σ_a P(a | B, e) P(J = true | a) f_M(a)
  = α P(B) Σ_e P(e) Σ_a P(a | B, e) f_J(a) f_M(a)
  = α P(B) Σ_e P(e) Σ_a f_A(a, b, e) f_J(a) f_M(a)
  = α P(B) Σ_e P(e) f_ĀJM(b, e)        (sum out A)
  = α P(B) f_ĒĀJM(b)                   (sum out E)
  = α f_B(b) × f_ĒĀJM(b)
slide-7
SLIDE 7 Variable elimination: Basic operations
Pointwise product of factors f_1 and f_2:
  f_1(x_1, ..., x_j, y_1, ..., y_k) × f_2(y_1, ..., y_k, z_1, ..., z_l)
    = f(x_1, ..., x_j, y_1, ..., y_k, z_1, ..., z_l)
E.g., f_1(a, b) × f_2(b, c) = f(a, b, c)

Summing out a variable from a product of factors:
move any constant factors outside the summation:
  Σ_x f_1 × ... × f_k = f_1 × ... × f_i Σ_x f_{i+1} × ... × f_k
                      = f_1 × ... × f_i × f_X̄
assuming f_1, ..., f_i do not depend on X
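The two basic operations can be sketched concretely. The factor representation below (a variable list plus a table mapping value tuples to numbers) is an illustrative choice, and the entries in f_1 and f_2 are made-up numbers for the demonstration.

```python
from itertools import product

def pointwise_product(f1, f2):
    """f1(X, Y) * f2(Y, Z) -> f(X, Y, Z), matching entries on shared variables."""
    vars1, t1 = f1
    vars2, t2 = f2
    out_vars = vars1 + [v for v in vars2 if v not in vars1]
    table = {}
    for vals in product([True, False], repeat=len(out_vars)):
        asg = dict(zip(out_vars, vals))
        table[vals] = (t1[tuple(asg[v] for v in vars1)]
                       * t2[tuple(asg[v] for v in vars2)])
    return out_vars, table

def sum_out(var, f):
    """Sum a variable out of a factor: collapse entries that differ only in var."""
    vars_, t = f
    i = vars_.index(var)
    out_vars = vars_[:i] + vars_[i + 1:]
    table = {}
    for vals, p in t.items():
        key = vals[:i] + vals[i + 1:]
        table[key] = table.get(key, 0.0) + p
    return out_vars, table

# f1(a, b) * f2(b, c) = f(a, b, c), as on the slide; then sum out B:
f1 = (["A", "B"], {(True, True): 0.3, (True, False): 0.7,
                   (False, True): 0.9, (False, False): 0.1})
f2 = (["B", "C"], {(True, True): 0.2, (True, False): 0.8,
                   (False, True): 0.6, (False, False): 0.4})
f = pointwise_product(f1, f2)   # a factor over (A, B, C)
g = sum_out("B", f)             # Σ_b f(a, b, c), a factor over (A, C)
```

For instance f(a, b, c) at (T, T, T) is 0.3 × 0.2 = 0.06, and g(a, c) at (T, T) is 0.3 × 0.2 + 0.7 × 0.6 = 0.48.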
slide-8
SLIDE 8 Variable elimination algorithm
function EliminationAsk(X, e, bn) returns a distribution over X
  inputs: X, the query variable
          e, evidence specified as an event
          bn, a belief network specifying joint distribution P(X_1, ..., X_n)
  if X ∈ e then return the observed point distribution for X
  factors ← [ ]; vars ← Reverse(Vars[bn])
  for each var in vars do
      factors ← [MakeFactor(var, e) | factors]
      if var is a hidden variable then factors ← SumOut(var, factors)
  return Normalize(PointwiseProduct(factors))
slide-9
SLIDE 9 Complexity of exact inference
Singly connected networks (or polytrees):
  – any two nodes are connected by at most one (undirected) path
  – time and space cost of variable elimination are O(d^k n)
Multiply connected networks:
  – can reduce 3SAT to exact inference ⇒ NP-hard
  – equivalent to counting 3SAT models ⇒ #P-complete

(Figure: a network encoding a 3-CNF formula, with variable nodes A, B, C, D,
each with prior 0.5, clause nodes 1–3, and an AND node:
  1. A ∨ B ∨ C
  2. C ∨ D ∨ ¬A
  3. B ∨ C ∨ ¬D)
slide-10
SLIDE 10 Inference by stochastic simulation
Basic idea:
  1) Draw N samples from a sampling distribution S
  2) Compute an approximate posterior probability P̂
  3) Show this converges to the true probability P
Outline:
  – Sampling from an empty network
  – Rejection sampling: reject samples disagreeing with evidence
  – Likelihood weighting: use evidence to weight samples
  – MCMC: sample from a stochastic process whose stationary distribution
    is the true posterior
slide-11
SLIDE 11 Sampling from an empty network
function PriorSample(bn) returns an event sampled from P(X_1, ..., X_n)
    as specified by bn
  x ← an event with n elements
  for i = 1 to n do
      x_i ← a random sample from P(X_i | Parents(X_i))
  return x

Example run on the sprinkler network:
  P(Cloudy) = ⟨0.5, 0.5⟩                       → sample true
  P(Sprinkler | Cloudy) = ⟨0.1, 0.9⟩           → sample false
  P(Rain | Cloudy) = ⟨0.8, 0.2⟩                → sample true
  P(WetGrass | ¬Sprinkler, Rain) = ⟨0.9, 0.1⟩  → sample true

(Network: Cloudy → Sprinkler, Cloudy → Rain, Sprinkler/Rain → WetGrass.
 CPTs: P(C) = 0.5; P(S|C) = 0.10, P(S|¬C) = 0.50; P(R|C) = 0.80, P(R|¬C) = 0.20;
 P(W|S,R) = 0.99, P(W|S,¬R) = 0.90, P(W|¬S,R) = 0.90, P(W|¬S,¬R) = 0.00)
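PriorSample can be sketched directly on the sprinkler network. The CPT entries below follow the standard AIMA numbers; the flattened table in the extracted slide is jumbled, so the row order is an assumption (it is consistent with the sampled values quoted above).

```python
import random

T, F = True, False
PARENTS = {"Cloudy": [], "Sprinkler": ["Cloudy"], "Rain": ["Cloudy"],
           "WetGrass": ["Sprinkler", "Rain"]}
CPT = {"Cloudy": {(): 0.5},
       "Sprinkler": {(T,): 0.10, (F,): 0.50},
       "Rain": {(T,): 0.80, (F,): 0.20},
       "WetGrass": {(T, T): 0.99, (T, F): 0.90, (F, T): 0.90, (F, F): 0.00}}
ORDER = ["Cloudy", "Sprinkler", "Rain", "WetGrass"]  # topological order

def prior_sample():
    x = {}
    for var in ORDER:  # parents are always sampled before their children
        p_true = CPT[var][tuple(x[pa] for pa in PARENTS[var])]
        x[var] = random.random() < p_true
    return x

# Sanity check: the empirical marginal of Rain should approach
# P(Rain) = 0.5 * 0.8 + 0.5 * 0.2 = 0.5 as N grows.
random.seed(0)
N = 100_000
rain_frac = sum(prior_sample()["Rain"] for _ in range(N)) / N
```

Because each variable is drawn from its CPT given already-sampled parents, the sample frequency of any event converges to its prior probability, which is exactly the consistency claim on the next slide.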
slide-12
SLIDE 12 Sampling from an empty network contd.
Probability that PriorSample generates a particular event:
  S_PS(x_1 ... x_n) = Π_{i=1}^{n} P(x_i | Parents(X_i)) = P(x_1 ... x_n)
i.e., the true prior probability

Let N_PS(Y = y) be the number of samples generated for which Y = y,
for any set of variables Y. Then P̂(Y = y) = N_PS(Y = y)/N and
  lim_{N→∞} P̂(Y = y) = Σ_h S_PS(Y = y, H = h)
                      = Σ_h P(Y = y, H = h)
                      = P(Y = y)
That is, estimates derived from PriorSample are consistent
slide-13
SLIDE 13 Rejection sampling
P̂(X | e) estimated from samples agreeing with e

function RejectionSampling(X, e, bn, N) returns an approximation to P(X | e)
  N[X] ← a vector of counts over X, initially zero
  for j = 1 to N do
      x ← PriorSample(bn)
      if x is consistent with e then
          N[x] ← N[x] + 1 where x is the value of X in x
  return Normalize(N[X])

E.g., estimate P(Rain | Sprinkler = true) using 100 samples:
  27 samples have Sprinkler = true;
  of these, 8 have Rain = true and 19 have Rain = false.
  P̂(Rain | Sprinkler = true) = Normalize(⟨8, 19⟩) = ⟨0.296, 0.704⟩

Similar to a basic real-world empirical estimation procedure
slide-14
SLIDE 14 Analysis of rejection sampling
  P̂(X | e) = α N_PS(X, e)           (algorithm defn.)
            = N_PS(X, e)/N_PS(e)     (normalized by N_PS(e))
            ≈ P(X, e)/P(e)           (property of PriorSample)
            = P(X | e)               (defn. of conditional probability)
Hence rejection sampling returns consistent posterior estimates

Problem: hopelessly expensive if P(e) is small
slide-15
SLIDE 15 Likelihood weighting
Idea: fix evidence variables, sample only nonevidence variables,
and weight each sample by the likelihood it accords the evidence

function WeightedSample(bn, e) returns an event and a weight
  x ← an event with n elements; w ← 1
  for i = 1 to n do
      if X_i has a value x_i in e
          then w ← w × P(X_i = x_i | Parents(X_i))
          else x_i ← a random sample from P(X_i | Parents(X_i))
  return x, w

function LikelihoodWeighting(X, e, bn, N) returns an approximation to P(X | e)
  W[X] ← a vector of weighted counts over X, initially zero
  for j = 1 to N do
      x, w ← WeightedSample(bn, e)
      W[x] ← W[x] + w where x is the value of X in x
  return Normalize(W[X])
slide-16
SLIDE 16 Likelihood weighting example
Estimate P(Rain | Sprinkler = true, WetGrass = true)

(Sprinkler network as on Slide 11, with Sprinkler and WetGrass
 observed to be true)
slide-17
SLIDE 17 LW example contd.
Sample generation process:
  1. w ← 1.0
  2. Sample P(Cloudy) = ⟨0.5, 0.5⟩; say true
  3. Sprinkler has value true, so
     w ← w × P(Sprinkler = true | Cloudy = true) = 0.1
  4. Sample P(Rain | Cloudy = true) = ⟨0.8, 0.2⟩; say true
  5. WetGrass has value true, so
     w ← w × P(WetGrass = true | Sprinkler = true, Rain = true) = 0.099
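The full likelihood-weighting loop for this query can be sketched in Python (CPT values as assumed on Slide 11):

```python
import random

T, F = True, False
PARENTS = {"Cloudy": [], "Sprinkler": ["Cloudy"], "Rain": ["Cloudy"],
           "WetGrass": ["Sprinkler", "Rain"]}
CPT = {"Cloudy": {(): 0.5},
       "Sprinkler": {(T,): 0.10, (F,): 0.50},
       "Rain": {(T,): 0.80, (F,): 0.20},
       "WetGrass": {(T, T): 0.99, (T, F): 0.90, (F, T): 0.90, (F, F): 0.00}}
ORDER = ["Cloudy", "Sprinkler", "Rain", "WetGrass"]  # topological order

def weighted_sample(e):
    x, w = dict(e), 1.0
    for var in ORDER:
        p_true = CPT[var][tuple(x[pa] for pa in PARENTS[var])]
        if var in e:                        # evidence: multiply in its likelihood
            w *= p_true if e[var] else 1.0 - p_true
        else:                               # nonevidence: sample it
            x[var] = random.random() < p_true
    return x, w

def likelihood_weighting(X, e, N):
    W = {T: 0.0, F: 0.0}
    for _ in range(N):
        x, w = weighted_sample(e)
        W[x[X]] += w
    z = W[T] + W[F]
    return {v: w / z for v, w in W.items()}

random.seed(2)
dist = likelihood_weighting("Rain", {"Sprinkler": T, "WetGrass": T}, 100_000)
# Exact posterior for comparison: P(Rain | s, w) = 0.0891/0.2781, about 0.320
```

Every sample is kept, unlike rejection sampling; the price is that samples with small weights (such as the w = 0.099 run above) contribute little to the estimate.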
slide-18
SLIDE 18 Likelihood weighting analysis
Sampling probability for WeightedSample is
  S_WS(y, e) = Π_{i=1}^{l} P(y_i | Parents(Y_i))
Note: pays attention to evidence in ancestors only
  ⇒ somewhere "in between" prior and posterior distribution

Weight for a given sample y, e is
  w(y, e) = Π_{i=1}^{m} P(e_i | Parents(E_i))

Weighted sampling probability is
  S_WS(y, e) w(y, e)
    = Π_{i=1}^{l} P(y_i | Parents(Y_i)) Π_{i=1}^{m} P(e_i | Parents(E_i))
    = P(y, e)    (by standard global semantics of network)

Hence likelihood weighting returns consistent estimates
but performance still degrades with many evidence variables
slide-19
SLIDE 19 Approximate inference using MCMC
"State" of network = current assignment to all variables
Generate next state by sampling one variable given its Markov blanket
Sample each variable in turn, keeping evidence fixed

function MCMC-Ask(X, e, bn, N) returns an approximation to P(X | e)
  local variables: N[X], a vector of counts over X, initially zero
                   Y, the nonevidence variables in bn
                   x, the current state of the network, initially copied from e
  initialize x with random values for the variables in Y
  for j = 1 to N do
      N[x] ← N[x] + 1 where x is the value of X in x
      for each Y_i in Y do
          sample the value of Y_i in x from P(Y_i | MB(Y_i))
              given the values of MB(Y_i) in x
  return Normalize(N[X])

Approaches stationary distribution: long-run fraction of time spent
in each state is exactly proportional to its posterior probability
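MCMC-Ask with Markov-blanket (Gibbs) sampling can be sketched for the query P(Rain | Sprinkler = true, WetGrass = true) on the sprinkler network (CPT values as assumed on Slide 11; the unnormalized blanket probability uses the formula from Slide 26):

```python
import random

T, F = True, False
PARENTS = {"Cloudy": [], "Sprinkler": ["Cloudy"], "Rain": ["Cloudy"],
           "WetGrass": ["Sprinkler", "Rain"]}
CHILDREN = {"Cloudy": ["Sprinkler", "Rain"], "Sprinkler": ["WetGrass"],
            "Rain": ["WetGrass"], "WetGrass": []}
CPT = {"Cloudy": {(): 0.5},
       "Sprinkler": {(T,): 0.10, (F,): 0.50},
       "Rain": {(T,): 0.80, (F,): 0.20},
       "WetGrass": {(T, T): 0.99, (T, F): 0.90, (F, T): 0.90, (F, F): 0.00}}

def p(var, value, x):
    p_true = CPT[var][tuple(x[pa] for pa in PARENTS[var])]
    return p_true if value else 1.0 - p_true

def mb_weight(var, value, x):
    """Unnormalized P(var = value | MB(var)): own CPT entry times children's."""
    x2 = {**x, var: value}
    w = p(var, value, x2)
    for child in CHILDREN[var]:
        w *= p(child, x2[child], x2)
    return w

def mcmc_ask(X, e, N):
    nonevidence = [v for v in CPT if v not in e]
    x = {**e, **{v: random.random() < 0.5 for v in nonevidence}}
    counts = {T: 0, F: 0}
    for _ in range(N):
        counts[x[X]] += 1
        for var in nonevidence:  # resample each variable given its blanket
            wt, wf = mb_weight(var, T, x), mb_weight(var, F, x)
            x[var] = random.random() < wt / (wt + wf)
    return {v: c / N for v, c in counts.items()}

random.seed(3)
dist = mcmc_ask("Rain", {"Sprinkler": T, "WetGrass": T}, 100_000)
# Exact posterior for comparison: P(Rain | s, w) is about 0.320
```

The counts here include the random initial burn-in states, as in the slide's pseudocode; a production implementation would typically discard an initial burn-in segment.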
slide-20
SLIDE 20 MCMC Example
Estimate P(Rain | Sprinkler = true, WetGrass = true)

Sample Cloudy, then Rain; repeat.
Count number of times Rain is true and false in the samples.

Markov blanket of Cloudy is Sprinkler and Rain
Markov blanket of Rain is Cloudy, Sprinkler, and WetGrass

(Sprinkler network as on Slide 11, with Sprinkler and WetGrass
 observed to be true)
slide-21
SLIDE 21 MCMC example contd.
Random initial state: Cloudy = true and Rain = false
  1. P(Cloudy | MB(Cloudy)) = P(Cloudy | Sprinkler, ¬Rain); sample → false
  2. P(Rain | MB(Rain)) = P(Rain | ¬Cloudy, Sprinkler, WetGrass); sample → true

Visit 100 states: 31 have Rain = true, 69 have Rain = false
  P̂(Rain | Sprinkler = true, WetGrass = true)
    = Normalize(⟨31, 69⟩) = ⟨0.31, 0.69⟩
slide-22
SLIDE 22 MCMC analysis: Outline
  • Transition probability q(y → y′)
  • Occupancy probability π_t(y) at time t
  • Equilibrium condition on π_t defines stationary distribution π(y)
    Note: stationary distribution depends on choice of q(y → y′)
  • Pairwise detailed balance on states guarantees equilibrium
  • Gibbs sampling transition probability:
    sample each variable given current values of all others
    ⇒ detailed balance with the true posterior
  • For Bayesian networks, Gibbs sampling reduces to
    sampling conditioned on each variable's Markov blanket
slide-23
SLIDE 23 Stationary distribution
  π_t(y)      = probability in state y at time t
  π_{t+1}(y′) = probability in state y′ at time t + 1
π_{t+1} in terms of π_t and q(y → y′):
  π_{t+1}(y′) = Σ_y π_t(y) q(y → y′)
Stationary distribution: π_t = π_{t+1} = π
  π(y′) = Σ_y π(y) q(y → y′)   for all y′
If π exists, it is unique (specific to q(y → y′))
In equilibrium, expected "outflow" = expected "inflow"
slide-24
SLIDE 24 Detailed balance
"Outflow" = "inflow" for each pair of states:
  π(y) q(y → y′) = π(y′) q(y′ → y)   for all y, y′
Detailed balance ⇒ stationarity:
  Σ_y π(y) q(y → y′) = Σ_y π(y′) q(y′ → y)
                     = π(y′) Σ_y q(y′ → y)
                     = π(y′)
MCMC algorithms typically constructed by designing a transition
probability q that is in detailed balance with desired π
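The detailed-balance argument can be checked numerically on a toy chain. The two-state transition matrix and target distribution below are made-up illustrative numbers, not from the slides:

```python
# Candidate stationary distribution and transition probabilities for a
# two-state chain; q[i][j] = probability of moving from state i to state j.
pi = [0.25, 0.75]
q = [[0.4, 0.6],   # from state 0
     [0.2, 0.8]]   # from state 1

# Detailed balance for the pair (0, 1): pi(y) q(y -> y') = pi(y') q(y' -> y),
# here 0.25 * 0.6 = 0.75 * 0.2 = 0.15.
balanced = abs(pi[0] * q[0][1] - pi[1] * q[1][0]) < 1e-12

# Stationarity follows: one transition step maps pi back to itself.
pi_next = [sum(pi[i] * q[i][j] for i in range(2)) for j in range(2)]
```

Summing the balance equation over y reproduces the stationarity condition from Slide 23, which is exactly what `pi_next == pi` verifies.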
slide-25
SLIDE 25 Gibbs sampling
Sample each variable in turn, given all other variables

Sampling Y_i, let Ȳ_i be all other nonevidence variables;
current values are y_i and ȳ_i; e is fixed.
Transition probability is given by
  q(y → y′) = q(y_i, ȳ_i → y′_i, ȳ_i) = P(y′_i | ȳ_i, e)
This gives detailed balance with the true posterior P(y | e):
  π(y) q(y → y′) = P(y | e) P(y′_i | ȳ_i, e)
                 = P(y_i, ȳ_i | e) P(y′_i | ȳ_i, e)
                 = P(y_i | ȳ_i, e) P(ȳ_i | e) P(y′_i | ȳ_i, e)   (chain rule)
                 = P(y_i | ȳ_i, e) P(y′_i, ȳ_i | e)              (chain rule backwards)
                 = q(y′ → y) π(y′) = π(y′) q(y′ → y)
slide-26
SLIDE 26 Markov blanket sampling
A variable is independent of all others given its Markov blanket:
  P(y_i | ȳ_i, e) = P(y_i | MB(Y_i))
Probability given the Markov blanket is calculated as follows:
  P(y_i | MB(Y_i))
    = α P(y_i | Parents(Y_i)) Π_{Z_j ∈ Children(Y_i)} P(z_j | Parents(Z_j))
Hence computing the sampling distribution over Y_i for each flip requires
just cd multiplications if Y_i has c children and d values;
can cache it if c is not too large.

Main computational problems:
  1) Difficult to tell whether convergence has been achieved
  2) Can be wasteful if the Markov blanket is large:
     P(Y_i | MB(Y_i)) won't change much (law of large numbers)
slide-27
SLIDE 27 Performance of approximation algorithms
Absolute approximation: |P(X | e) − P̂(X | e)| ≤ ε
Relative approximation: |P(X | e) − P̂(X | e)| / P(X | e) ≤ ε
Relative ⇒ absolute since 0 ≤ P ≤ 1 (but P may be O(2^−n))
Randomized algorithms may fail with probability at most δ
Polytime approximation: poly(n, ε^−1, log δ^−1)
Theorem (Dagum and Luby, 1993): both absolute and relative approximation
for either deterministic or randomized algorithms are NP-hard
for any ε, δ < 0.5
(Absolute approximation polytime with no evidence, via Chernoff bounds)