Belief networks: Chapter 15.1–2 (AIMA Slides, Stuart Russell and Peter Norvig)



slide-1
SLIDE 1 Belief networks
Chapter 15.1–2
AIMA Slides © Stuart Russell and Peter Norvig, 1998
slide-2
SLIDE 2 Outline
  • Conditional independence
  • Bayesian networks: syntax and semantics
  • Exact inference
  • Approximate inference
slide-3
SLIDE 3 Independence
Two random variables A and B are (absolutely) independent iff
$P(A \mid B) = P(A)$ or $P(A, B) = P(A \mid B)\,P(B) = P(A)\,P(B)$
e.g., A and B are two coin tosses
If n Boolean variables are independent, the full joint is
$\mathbf{P}(X_1, \ldots, X_n) = \prod_i \mathbf{P}(X_i)$
hence can be specified by just n numbers
Absolute independence is a very strong requirement, seldom met
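As a minimal sketch of the product formula above: the full joint over n independent Boolean variables can be rebuilt from just n numbers. The function name `independent_joint` and the three probabilities are hypothetical, chosen only for illustration.

```python
from itertools import product

def independent_joint(p_true):
    """Full joint over independent Boolean variables, one number P(X_i = true) each."""
    joint = {}
    for assignment in product([True, False], repeat=len(p_true)):
        prob = 1.0
        for value, p in zip(assignment, p_true):
            prob *= p if value else 1.0 - p
        joint[assignment] = prob
    return joint

# Hypothetical values: two fair coin tosses and one biased coin.
p_true = [0.5, 0.5, 0.3]
joint = independent_joint(p_true)

print(len(joint))                      # 2**3 entries, generated from 3 numbers
print(round(sum(joint.values()), 10))  # the entries sum to 1
```

The table has 2^n entries, but only the n inputs are free parameters.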
slide-4
SLIDE 4 Conditional independence
Consider the dentist problem with three random variables:
Toothache, Cavity, Catch (steel probe catches in my tooth)
The full joint distribution has $2^3 - 1 = 7$ independent entries
If I have a cavity, the probability that the probe catches in it doesn't depend on whether I have a toothache:
(1) $P(Catch \mid Toothache, Cavity) = P(Catch \mid Cavity)$
i.e., Catch is conditionally independent of Toothache given Cavity
The same independence holds if I haven't got a cavity:
(2) $P(Catch \mid Toothache, \neg Cavity) = P(Catch \mid \neg Cavity)$
slide-5
SLIDE 5 Conditional independence contd.
Equivalent statements to (1):
(1a) $P(Toothache \mid Catch, Cavity) = P(Toothache \mid Cavity)$ Why??
(1b) $P(Toothache, Catch \mid Cavity) = P(Toothache \mid Cavity)\,P(Catch \mid Cavity)$ Why??
Full joint distribution can now be written as
$\mathbf{P}(Toothache, Catch, Cavity) = \mathbf{P}(Toothache, Catch \mid Cavity)\,\mathbf{P}(Cavity)$
$= \mathbf{P}(Toothache \mid Cavity)\,\mathbf{P}(Catch \mid Cavity)\,\mathbf{P}(Cavity)$
i.e., 2 + 2 + 1 = 5 independent numbers (equations 1 and 2 remove 2)
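The factorization above can be checked numerically. The sketch below builds the eight-entry joint from five numbers and then recovers P(Catch | Toothache, Cavity) from the joint alone. The five probabilities are hypothetical (the slides give no numbers for the dentist problem), and the helper names are illustrative.

```python
from itertools import product

# Five hypothetical numbers, one per independent parameter.
p_cavity = 0.2
p_toothache_given = {True: 0.6, False: 0.1}  # P(Toothache=true | Cavity)
p_catch_given = {True: 0.9, False: 0.2}      # P(Catch=true | Cavity)

def bernoulli(p_true, value):
    """P(variable = value) for a Boolean variable with P(true) = p_true."""
    return p_true if value else 1.0 - p_true

# joint[(toothache, catch, cavity)] = P(toothache | cavity) P(catch | cavity) P(cavity)
joint = {
    (t, c, v): bernoulli(p_toothache_given[v], t)
               * bernoulli(p_catch_given[v], c)
               * bernoulli(p_cavity, v)
    for t, c, v in product([True, False], repeat=3)
}

def p_catch_given_toothache_cavity(t, v):
    """P(Catch=true | Toothache=t, Cavity=v), computed from the joint only."""
    return joint[(t, True, v)] / (joint[(t, True, v)] + joint[(t, False, v)])

for v in (True, False):
    for t in (True, False):
        # The value depends on v only, as equations (1) and (2) assert.
        print(v, t, round(p_catch_given_toothache_cavity(t, v), 6))
```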
slide-6
SLIDE 6 Conditional independence contd.
Equivalent statements to (1):
(1a) $P(Toothache \mid Catch, Cavity) = P(Toothache \mid Cavity)$ Why??
$P(Toothache \mid Catch, Cavity)$
$= P(Catch \mid Toothache, Cavity)\,P(Toothache \mid Cavity) / P(Catch \mid Cavity)$
$= P(Catch \mid Cavity)\,P(Toothache \mid Cavity) / P(Catch \mid Cavity)$ (from 1)
$= P(Toothache \mid Cavity)$
(1b) $P(Toothache, Catch \mid Cavity) = P(Toothache \mid Cavity)\,P(Catch \mid Cavity)$ Why??
$P(Toothache, Catch \mid Cavity)$
$= P(Toothache \mid Catch, Cavity)\,P(Catch \mid Cavity)$ (product rule)
$= P(Toothache \mid Cavity)\,P(Catch \mid Cavity)$ (from 1a)
slide-7
SLIDE 7 Belief networks
A simple, graphical notation for conditional independence assertions and hence for compact specification of full joint distributions
Syntax:
  • a set of nodes, one per variable
  • a directed, acyclic graph (link ≈ "directly influences")
  • a conditional distribution for each node given its parents: $\mathbf{P}(X_i \mid Parents(X_i))$
In the simplest case, conditional distribution represented as a conditional probability table (CPT)
slide-8
SLIDE 8 Example
I'm at work, neighbor John calls to say my alarm is ringing, but neighbor Mary doesn't call. Sometimes it's set off by minor earthquakes. Is there a burglar?
Variables: Burglar, Earthquake, Alarm, JohnCalls, MaryCalls
Network topology reflects "causal" knowledge:

[Figure: Burglary → Alarm ← Earthquake; Alarm → JohnCalls; Alarm → MaryCalls]

P(B)    P(E)
.001    .002

B  E  P(A)
T  T  .95
T  F  .94
F  T  .29
F  F  .001

A  P(J)        A  P(M)
T  .90         T  .70
F  .05         F  .01

Note: ≤ k parents ⇒ $O(d^k n)$ numbers vs. $O(d^n)$
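To make the note on parameter counts concrete, here is a small sketch that counts the independent numbers needed by the burglary network and by an unstructured joint over the same variables. The `parents` dictionary encodes the topology described on this slide; the function names are illustrative.

```python
# Parents of each node in the burglary network.
parents = {
    "Burglary": [],
    "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "JohnCalls": ["Alarm"],
    "MaryCalls": ["Alarm"],
}

def network_parameters(parents, d=2):
    """Each node needs (d - 1) free values per assignment to its parents."""
    return sum((d - 1) * d ** len(ps) for ps in parents.values())

def full_joint_parameters(n, d=2):
    """Independent entries in an unstructured joint over n variables with d values each."""
    return d ** n - 1

net_count = network_parameters(parents)
joint_count = full_joint_parameters(len(parents))
print(net_count)    # 1 + 1 + 4 + 2 + 2
print(joint_count)  # 2**5 - 1
```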
slide-9
SLIDE 9 Semantics
"Global" semantics defines the full joint distribution as the product of the local conditional distributions:
$P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid Parents(X_i))$
e.g., $P(J \wedge M \wedge A \wedge \neg B \wedge \neg E)$ is given by?? =
slide-10
SLIDE 10 Semantics
"Global" semantics defines the full joint distribution as the product of the local conditional distributions:
$P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid Parents(X_i))$
e.g., $P(J \wedge M \wedge A \wedge \neg B \wedge \neg E)$ is given by??
$= P(\neg B)\,P(\neg E)\,P(A \mid \neg B \wedge \neg E)\,P(J \mid A)\,P(M \mid A)$
"Local" semantics: each node is conditionally independent of its nondescendants given its parents
Theorem: Local semantics ⇔ global semantics
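A sketch of the global semantics in code, evaluating the example query as a product of CPT entries. The CPT values are the ones from the burglary example on slide 8 (the standard AIMA numbers); the helper names `pr` and `joint` are hypothetical.

```python
# CPT values as read from the burglary example (slide 8).
P_B = 0.001                      # P(Burglary = true)
P_E = 0.002                      # P(Earthquake = true)
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}  # P(Alarm = true | B, E)
P_J = {True: 0.90, False: 0.05}  # P(JohnCalls = true | Alarm)
P_M = {True: 0.70, False: 0.01}  # P(MaryCalls = true | Alarm)

def pr(p_true, value):
    """P(variable = value) for a Boolean variable with P(true) = p_true."""
    return p_true if value else 1.0 - p_true

def joint(b, e, a, j, m):
    """Product of the local conditional probabilities for one full assignment."""
    return (pr(P_B, b) * pr(P_E, e) * pr(P_A[(b, e)], a)
            * pr(P_J[a], j) * pr(P_M[a], m))

# P(J and M and A and not B and not E) = 0.999 * 0.998 * 0.001 * 0.90 * 0.70
answer = joint(b=False, e=False, a=True, j=True, m=True)
print(answer)  # about 0.000628
```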
slide-11
SLIDE 11 Markov blanket
Each node is conditionally independent of all others given its Markov blanket: parents + children + children's parents
[Figure: node X with parents U1 ... Um, children Y1 ... Yn, and children's other parents Z1j ... Znj]
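The definition above (parents + children + children's parents) translates directly into a few lines of code. This is a sketch on the burglary network from the earlier example; the `markov_blanket` function and the dictionary layout are illustrative choices, not part of the slides.

```python
# Parents of each node in the burglary network.
parents = {
    "Burglary": [],
    "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "JohnCalls": ["Alarm"],
    "MaryCalls": ["Alarm"],
}

def markov_blanket(node, parents):
    """Parents, children, and children's other parents of `node`."""
    children = [n for n, ps in parents.items() if node in ps]
    blanket = set(parents[node]) | set(children)
    for child in children:
        blanket |= set(parents[child])
    blanket.discard(node)  # a node is not part of its own blanket
    return blanket

print(sorted(markov_blanket("Burglary", parents)))
print(sorted(markov_blanket("Alarm", parents)))
```

Note that Earthquake enters the blanket of Burglary only because the two share the child Alarm.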
slide-12
SLIDE 12 Constructing belief networks
Need a method such that a series of locally testable assertions of conditional independence guarantees the required global semantics
1. Choose an ordering of variables $X_1, \ldots, X_n$
2. For i = 1 to n
     add $X_i$ to the network
     select parents from $X_1, \ldots, X_{i-1}$ such that
     $\mathbf{P}(X_i \mid Parents(X_i)) = \mathbf{P}(X_i \mid X_1, \ldots, X_{i-1})$
This choice of parents guarantees the global semantics:
$\mathbf{P}(X_1, \ldots, X_n) = \prod_{i=1}^{n} \mathbf{P}(X_i \mid X_1, \ldots, X_{i-1})$ (chain rule)
$= \prod_{i=1}^{n} \mathbf{P}(X_i \mid Parents(X_i))$ (by construction)
slide-13
SLIDE 13 Example
Suppose we choose the ordering M, J, A, B, E
[Figure: network so far, with nodes MaryCalls and JohnCalls]
$P(J \mid M) = P(J)$?
slide-14
SLIDE 14 Example contd.
[Figure: Alarm added to the network]
$P(J \mid M) = P(J)$? No
$P(A \mid J, M) = P(A \mid J)$? $P(A \mid J, M) = P(A)$?
slide-15
SLIDE 15 Example contd.
[Figure: Burglary added to the network]
$P(A \mid J, M) = P(A \mid J)$? $P(A \mid J, M) = P(A)$? No
$P(B \mid A, J, M) = P(B \mid A)$? $P(B \mid A, J, M) = P(B)$?
slide-16
SLIDE 16 Example contd.
[Figure: Earthquake added to the network]
$P(B \mid A, J, M) = P(B \mid A)$? Yes
$P(B \mid A, J, M) = P(B)$? No
$P(E \mid B, A, J, M) = P(E \mid A)$? $P(E \mid B, A, J, M) = P(E \mid A, B)$?
slide-17
SLIDE 17 Example contd.
$P(E \mid B, A, J, M) = P(E \mid A)$? No
$P(E \mid B, A, J, M) = P(E \mid A, B)$? Yes
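The construction procedure from slide 12 can be sketched as a brute-force search: for each variable in the chosen ordering, pick the smallest set of predecessors that reproduces the conditional distribution given all predecessors. The sketch assumes the full joint is available, which is only practical for tiny examples; here it is built from the burglary CPTs of slide 8. All function names are hypothetical, and equality of probabilities is tested up to a small tolerance.

```python
from itertools import combinations, product

# Burglary CPTs (slide 8), used only to build a full joint to test against.
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}

def pr(p_true, value):
    return p_true if value else 1.0 - p_true

NAMES = ("B", "E", "A", "J", "M")
JOINT = {}
for b, e, a, j, m in product([True, False], repeat=5):
    JOINT[(b, e, a, j, m)] = (pr(P_B, b) * pr(P_E, e) * pr(P_A[(b, e)], a)
                              * pr(P_J[a], j) * pr(P_M[a], m))

def prob(event):
    """Probability that the variables in `event` (name -> value) take those values."""
    return sum(p for world, p in JOINT.items()
               if all(world[NAMES.index(n)] == v for n, v in event.items()))

def cond(x, given):
    """P(x = true | given)."""
    return prob({**given, x: True}) / prob(given)

def select_parents(x, predecessors, tol=1e-9):
    """Smallest subset S of predecessors with P(x | S) = P(x | all predecessors)."""
    for size in range(len(predecessors) + 1):
        for subset in combinations(predecessors, size):
            matches = True
            for values in product([True, False], repeat=len(predecessors)):
                full = dict(zip(predecessors, values))
                part = {n: full[n] for n in subset}
                if abs(cond(x, full) - cond(x, part)) > tol:
                    matches = False
                    break
            if matches:
                return list(subset)

def build_network(ordering):
    """Step 2 of the procedure: add each variable and select its parents."""
    return {x: select_parents(x, ordering[:i]) for i, x in enumerate(ordering)}

noncausal = build_network(["M", "J", "A", "B", "E"])
causal = build_network(["B", "E", "A", "J", "M"])
print(noncausal)
print(causal)
print(sum(2 ** len(ps) for ps in noncausal.values()),
      sum(2 ** len(ps) for ps in causal.values()))
```

Under these assumptions the ordering M, J, A, B, E reproduces the Yes/No answers worked through above and yields a denser network than the causal ordering B, E, A, J, M.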
slide-18
SLIDE 18 Example: Car diagnosis
Initial evidence: engine won't start
Testable variables (thin ovals), diagnosis variables (thick ovals)
Hidden variables (shaded) ensure sparse structure, reduce parameters
[Figure: network over the nodes battery age, alternator broken, fanbelt broken, battery dead, no charging, battery flat, no oil, no gas, fuel line blocked, starter broken, lights, oil light, gas gauge, engine won't start]
slide-19
SLIDE 19 Example: Car insurance
Predict claim costs (medical, liability, property) given data on application form (other unshaded nodes)
[Figure: network over the nodes SocioEcon, Age, GoodStudent, ExtraCar, Mileage, VehicleYear, RiskAversion, SeniorTrain, DrivingSkill, MakeModel, DrivingHist, DrivQuality, Antilock, Airbag, CarValue, HomeBase, AntiTheft, Theft, OwnDamage, PropertyCost, LiabilityCost, MedicalCost, Cushioning, Ruggedness, Accident, OtherCost, OwnCost]
slide-20
SLIDE 20 Compact conditional distributions
CPT grows exponentially with no. of parents
CPT becomes infinite with continuous-valued parent or child
Solution: canonical distributions that are defined compactly
Deterministic nodes are the simplest case:
$X = f(Parents(X))$ for some function f
E.g., Boolean functions
$NorthAmerican \Leftrightarrow Canadian \vee US \vee Mexican$
E.g., numerical relationships among continuous variables
$\frac{\partial Level}{\partial t} = \text{inflow} + \text{precipitation} - \text{outflow} - \text{evaporation}$
slide-21
SLIDE 21 Compact conditional distributions contd.
Noisy-OR distributions model multiple noninteracting causes
1) Parents $U_1 \ldots U_k$ include all causes (can add leak node)
2) Independent failure probability $q_i$ for each cause alone
⇒ $P(X \mid U_1 \ldots U_j, \neg U_{j+1} \ldots \neg U_k) = 1 - \prod_{i=1}^{j} q_i$

Cold  Flu  Malaria  P(Fever)  P(¬Fever)
F     F    F        0.0       1.0
F     F    T        0.9       0.1
F     T    F        0.8       0.2
F     T    T        0.98      0.02 = 0.2 × 0.1
T     F    F        0.4       0.6
T     F    T        0.94      0.06 = 0.6 × 0.1
T     T    F        0.88      0.12 = 0.6 × 0.2
T     T    T        0.988     0.012 = 0.6 × 0.2 × 0.1

Number of parameters linear in number of parents
slide-22
SLIDE 22 Hybrid (discrete+continuous) networks
Discrete (Subsidy? and Buys?); continuous (Harvest and Cost)
[Figure: network with nodes Subsidy?, Harvest, Cost, Buys?]
Option 1: discretization (possibly large errors, large CPTs)
Option 2: finitely parameterized canonical families
1) Continuous variable, discrete+continuous parents (e.g., Cost)
2) Discrete variable, continuous parents (e.g., Buys?)
slide-23
SLIDE 23 Continuous child variables
Need one conditional density function for child variable given continuous parents, for each possible assignment to discrete parents
Most common is the linear Gaussian model, e.g.:
$P(Cost = c \mid Harvest = h, Subsidy? = true) = N(a_t h + b_t, \sigma_t)(c) = \frac{1}{\sigma_t \sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{c - (a_t h + b_t)}{\sigma_t}\right)^2\right)$
Mean Cost varies linearly with Harvest, variance is fixed
Linear variation is unreasonable over the full range but works OK if the likely range of Harvest is narrow
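A minimal sketch of the linear Gaussian density above. The parameter values for a_t, b_t and sigma_t are hypothetical (the slides do not give any), and `linear_gaussian_pdf` is an illustrative name.

```python
import math

def linear_gaussian_pdf(c, h, a, b, sigma):
    """Density of Cost = c given Harvest = h: Gaussian with mean a*h + b, std. dev. sigma."""
    mean = a * h + b
    z = (c - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# Hypothetical parameters for the case Subsidy? = true.
a_t, b_t, sigma_t = -0.5, 10.0, 1.0

# At h = 5 the mean cost is -0.5 * 5 + 10 = 7.5, where the density peaks.
density_at_mean = linear_gaussian_pdf(c=7.5, h=5.0, a=a_t, b=b_t, sigma=sigma_t)
print(density_at_mean)

# Riemann-sum check that the density over c integrates to roughly 1 for fixed h.
step = 0.01
total = sum(linear_gaussian_pdf(k * step, 5.0, a_t, b_t, sigma_t) for k in range(1501)) * step
print(round(total, 6))
```

One such function is needed per assignment to the discrete parents, so Subsidy? = false would get its own (a_f, b_f, sigma_f).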
slide-24
SLIDE 24 Continuous child variables
[Figure: three surface plots over Cost and Harvest: P(Cost|Harvest,Subsidy?=true), P(Cost|Harvest,Subsidy?=false), and P(Cost|Harvest)]
All-continuous network with LG distributions ⇒ full joint is a multivariate Gaussian
Discrete+continuous LG network is a conditional Gaussian network, i.e., a multivariate Gaussian over all continuous variables for each combination of discrete variable values
slide-25
SLIDE 25 Discrete variable w/ continuous parents
Probability of Buys? given Cost should be a "soft" threshold:
[Figure: P(Buys?=false|Cost=c) plotted against Cost c]
Probit distribution uses integral of Gaussian:
$\Phi(x) = \int_{-\infty}^{x} N(0, 1)(x)\,dx$
$P(Buys? = true \mid Cost = c) = \Phi((-c + \mu)/\sigma)$
Can view as hard threshold whose location is subject to noise
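The probit model can be sketched with the standard library alone, using the identity Φ(x) = erfc(−x/√2)/2. The threshold location mu = 6 and noise scale sigma = 1 are hypothetical values, and the function names are illustrative.

```python
import math

def std_normal_cdf(x):
    """Phi(x): integral of the standard normal density from -infinity to x."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def probit_buys(c, mu, sigma):
    """P(Buys? = true | Cost = c) = Phi((-c + mu) / sigma)."""
    return std_normal_cdf((-c + mu) / sigma)

mu, sigma = 6.0, 1.0  # hypothetical threshold location and noise scale
for c in (4.0, 5.0, 6.0, 7.0, 8.0):
    # Probability of buying falls smoothly from near 1 to near 0 around c = mu.
    print(c, round(probit_buys(c, mu, sigma), 4))
```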
slide-26
SLIDE 26 Discrete variable contd.
Sigmoid (or logit) distribution also used in neural networks:
$P(Buys? = true \mid Cost = c) = \frac{1}{1 + \exp\left(-2\,\frac{-c + \mu}{\sigma}\right)}$
Sigmoid has similar shape to probit but much longer tails:
[Figure: P(Buys?=false|Cost=c) plotted against Cost c]
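To see the "longer tails" claim numerically, the sketch below evaluates the sigmoid and probit models side by side far from the threshold. The values mu = 6 and sigma = 1 are hypothetical, and the function names are illustrative.

```python
import math

def sigmoid_buys(c, mu, sigma):
    """P(Buys? = true | Cost = c) under the sigmoid (logit) model."""
    return 1.0 / (1.0 + math.exp(-2.0 * (-c + mu) / sigma))

def probit_buys(c, mu, sigma):
    """P(Buys? = true | Cost = c) under the probit model, via erfc."""
    return 0.5 * math.erfc(-(-c + mu) / (sigma * math.sqrt(2.0)))

mu, sigma = 6.0, 1.0  # hypothetical threshold location and scale

for c in (6.0, 10.0, 12.0):
    print(c, sigmoid_buys(c, mu, sigma), probit_buys(c, mu, sigma))

# Four scale units above the threshold, the sigmoid still assigns much more probability.
tail_ratio = sigmoid_buys(10.0, mu, sigma) / probit_buys(10.0, mu, sigma)
print(round(tail_ratio, 1))
```

Both models agree at the threshold (probability 0.5) and have a similar shape near it; the difference shows up only in the tails.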