
Announcements

  • Homework 6: Bayes' Nets I (lead TA: Eli)
    • Due Fri 1 Nov at 11:59pm
  • Homework 7: Bayes' Nets II (lead TA: Eli)
    • Due Mon 4 Nov at 11:59pm
  • Office Hours
    • Iris: Mon 10.00am-noon, RI 237
    • Jan-Willem: Tue 1.40pm-2.40pm, DG 111
    • Zhaoqing: Thu 9.00am-11.00am, HS 202
    • Eli: Fri 10.00am-noon, RY 207
  • Post Midterm Feedback Form (< 5 mins)
    https://forms.gle/TFw1D1SbGRfxw2TB8

CS 4100: Artificial Intelligence

Bayes' Nets: Inference

Jan-Willem van de Meent, Northeastern University

[These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://ai.berkeley.edu.]

Bayes’ Net Representation

  • A directed, acyclic graph, one node per random variable
  • A conditional probability table (CPT) for each node
    • A collection of distributions over X, one for each possible assignment to parent variables
  • Bayes' nets implicitly encode joint distributions
    • As a product of local conditional distributions
    • To see what probability a BN gives to a full assignment, multiply all the relevant conditionals together:

P(x1, x2, …, xn) = ∏i P(xi | parents(Xi))

Example: Alarm Network

[Network diagram: B (Burglary) and E (Earthquake) are parents of A (Alarm); A is the parent of J (John calls) and M (Mary calls).]

B     P(B)
+b    0.001
-b    0.999

E     P(E)
+e    0.002
-e    0.998

B     E     A     P(A|B,E)
+b    +e    +a    0.95
+b    +e    -a    0.05
+b    -e    +a    0.94
+b    -e    -a    0.06
-b    +e    +a    0.29
-b    +e    -a    0.71
-b    -e    +a    0.001
-b    -e    -a    0.999

A     J     P(J|A)
+a    +j    0.9
+a    -j    0.1
-a    +j    0.05
-a    -j    0.95

A     M     P(M|A)
+a    +m    0.7
+a    -m    0.3
-a    +m    0.01
-a    -m    0.99
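To make the "product of local conditionals" idea concrete, here is a minimal Python sketch of the alarm network above. The dictionary-based CPT representation and all variable names are illustrative choices of mine, not part of the slides:

```python
# Each CPT maps an assignment (node value given parent values) to a probability.
P_B = {'+b': 0.001, '-b': 0.999}
P_E = {'+e': 0.002, '-e': 0.998}
P_A = {  # P(A | B, E), keyed by (b, e, a)
    ('+b', '+e', '+a'): 0.95,  ('+b', '+e', '-a'): 0.05,
    ('+b', '-e', '+a'): 0.94,  ('+b', '-e', '-a'): 0.06,
    ('-b', '+e', '+a'): 0.29,  ('-b', '+e', '-a'): 0.71,
    ('-b', '-e', '+a'): 0.001, ('-b', '-e', '-a'): 0.999,
}
P_J = {('+a', '+j'): 0.9, ('+a', '-j'): 0.1, ('-a', '+j'): 0.05, ('-a', '-j'): 0.95}
P_M = {('+a', '+m'): 0.7, ('+a', '-m'): 0.3, ('-a', '+m'): 0.01, ('-a', '-m'): 0.99}

def joint(b, e, a, j, m):
    """P(b, e, a, j, m): multiply one entry from each local CPT."""
    return P_B[b] * P_E[e] * P_A[(b, e, a)] * P_J[(a, j)] * P_M[(a, m)]

# One full assignment: 0.001 * 0.998 * 0.94 * 0.9 * 0.7
print(joint('+b', '-e', '+a', '+j', '+m'))
```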


Bayes’ Nets

  • Representation
  • Conditional Independences
  • Probabilistic Inference
    • Enumeration (exact, exponential complexity)
    • Variable elimination (exact, worst-case exponential complexity, often better)
    • Inference is NP-complete
    • Sampling (approximate)
  • Learning Bayes' Nets from Data
Inference

  • Inference: calculating some useful quantity from a joint probability distribution
  • Examples:
    • Posterior probability: P(Q | E1 = e1, …, Ek = ek)
    • Most likely explanation: argmax_q P(Q = q | e1, …)

Inference by Enumeration in Bayes’ Net

  • Given unlimited time, inference in BNs is easy
  • Naïve strategy: inference by enumeration

(Capital B means: compute for all values b.)

P(B | +j, +m) = P(B, +j, +m) / P(+j, +m) ∝ P(B, +j, +m)
  = Σ_{e,a} P(B, e, a, +j, +m)
  = Σ_{e,a} P(B) P(e) P(a | B, e) P(+j | a) P(+m | a)
  = P(B) P(+e) P(+a | B, +e) P(+j | +a) P(+m | +a)
    + P(B) P(+e) P(-a | B, +e) P(+j | -a) P(+m | -a)
    + P(B) P(-e) P(+a | B, -e) P(+j | +a) P(+m | +a)
    + P(B) P(-e) P(-a | B, -e) P(+j | -a) P(+m | -a)
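A hedged sketch of exactly this enumeration, reusing the CPT dictionaries and `joint()` from the alarm-network sketch above (the function name and structure are my own):

```python
from itertools import product

def enumerate_query(j, m):
    """P(B | j, m): sum the full joint over the hidden variables e, a."""
    unnormalized = {
        b: sum(joint(b, e, a, j, m)
               for e, a in product(('+e', '-e'), ('+a', '-a')))
        for b in ('+b', '-b')
    }
    z = sum(unnormalized.values())            # this is P(j, m)
    return {b: p / z for b, p in unnormalized.items()}

print(enumerate_query('+j', '+m'))  # posterior over B given both calls
```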

Inference by Enumeration

  • General case:
    • Evidence variables: E1 … Ek = e1 … ek
    • Query* variable: Q
    • Hidden variables: H1 … Hr
    (together, these are all the variables X1, …, Xn)
  • We want: P(Q | e1 … ek)
  • Step 1: Select the entries consistent with the evidence
  • Step 2: Sum out H to get the joint of query and evidence:
    P(Q, e1 … ek) = Σ_{h1 … hr} P(Q, h1 … hr, e1 … ek)
  • Step 3: Normalize: Z = Σ_q P(q, e1 … ek), and P(Q | e1 … ek) = (1/Z) × P(Q, e1 … ek)

* Works fine with multiple query variables, too

Inference by Enumeration vs. Variable Elimination

  • Why is inference by enumeration so slow?
    • You join up the whole joint distribution before you sum out the hidden variables
  • Idea: interleave joining and marginalizing!
    • Called "Variable Elimination"
    • Still NP-hard, but usually much faster than inference by enumeration
  • First we'll need some new notation: factors

Factor Zoo

  • Goal: Let's make a taxonomy of conditional probability tables (we have seen most of these before)

Factor Zoo I

  • Joint distribution: P(X,Y)
    • Entries P(x,y) for all x, y
    • Sums to 1
  • Selected joint: P(x,Y)
    • A slice of the joint distribution
    • Entries P(x,y) for fixed x, all y
    • Sums to P(x) (usually not 1)
  • Number of capitals = number of dimensions in table

P(T, W):
T     W     P
hot   sun   0.4
hot   rain  0.1
cold  sun   0.2
cold  rain  0.3

P(cold, W):
T     W     P
cold  sun   0.2
cold  rain  0.3

Factor Zoo II

  • Single conditional: P(Y | x)
    • Entries P(y | x) for fixed x, all y
    • Sums to 1
  • Family of conditionals: P(Y | X)
    • Multiple conditionals
    • Entries P(y | x) for all x, y
    • Sums to |X| (size of domain)

P(W | T):
T     W     P
hot   sun   0.8
hot   rain  0.2
cold  sun   0.4
cold  rain  0.6

P(W | cold):
T     W     P
cold  sun   0.4
cold  rain  0.6


Factor Zoo III

  • Specified family: P(y | X)
    • Entries P(y | x) for fixed y, but for all x
    • Sums to … who knows!

P(rain | T):
T     W     P
hot   rain  0.2
cold  rain  0.6

Factor Zoo Summary

  • In general, when we write P(Y1 … YN | X1 … XM)
    • This is a factor, a multi-dimensional array containing numbers ≥ 0
    • Its values are P(y1 … yN | x1 … xM)
    • Any assigned (= lower-case) X or Y is a dimension missing (selected) from the array
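To make the "multi-dimensional array" view concrete, here is a hedged Python sketch representing a factor as a tuple of variable names plus a table from assignments to numbers. The representation and the `select` helper are my own choices, not from the slides:

```python
# A factor: variable names plus a table mapping each assignment
# (one value per variable, in order) to a non-negative number.
# Example: the family of conditionals P(W | T) from the Factor Zoo.
factor_W_given_T = {
    'vars': ('T', 'W'),
    'table': {
        ('hot', 'sun'): 0.8, ('hot', 'rain'): 0.2,
        ('cold', 'sun'): 0.4, ('cold', 'rain'): 0.6,
    },
}

def select(factor, var, value):
    """Fix var=value: the resulting factor drops that dimension."""
    i = factor['vars'].index(var)
    remaining = factor['vars'][:i] + factor['vars'][i + 1:]
    table = {k[:i] + k[i + 1:]: p
             for k, p in factor['table'].items() if k[i] == value}
    return {'vars': remaining, 'table': table}

# Selecting T=cold gives the single conditional P(W | cold):
print(select(factor_W_given_T, 'T', 'cold')['table'])
# {('sun',): 0.4, ('rain',): 0.6}
```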

Example: Traffic Domain

  • Random Variables
    • R: Raining
    • T: Traffic
    • L: Late for class!

[Network: R → T → L]

P(R):
+r    0.1
-r    0.9

P(T | R):
+r    +t    0.8
+r    -t    0.2
-r    +t    0.1
-r    -t    0.9

P(L | T):
+t    +l    0.3
+t    -l    0.7
-t    +l    0.1
-t    -l    0.9

P(L) = ? = Σ_{r,t} P(r, t, L) = Σ_{r,t} P(r) P(t|r) P(L|t)
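A hedged sketch of this computation by brute-force enumeration over r and t (the dict-based CPTs and names are illustrative):

```python
from itertools import product

# Traffic-domain CPTs as dicts.
P_R = {'+r': 0.1, '-r': 0.9}
P_T = {('+r', '+t'): 0.8, ('+r', '-t'): 0.2, ('-r', '+t'): 0.1, ('-r', '-t'): 0.9}
P_L = {('+t', '+l'): 0.3, ('+t', '-l'): 0.7, ('-t', '+l'): 0.1, ('-t', '-l'): 0.9}

# P(L) = sum over r, t of P(r) P(t|r) P(L|t)
P_L_marginal = {
    l: sum(P_R[r] * P_T[(r, t)] * P_L[(t, l)]
           for r, t in product(('+r', '-r'), ('+t', '-t')))
    for l in ('+l', '-l')
}
print(P_L_marginal)  # {'+l': 0.134, '-l': 0.866}
```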

Inference by Enumeration: Procedural Outline

  • Track all objects (factors)
  • Initial factors are local CPTs (one per node)
  • Any known values are selected
    • E.g. if we know L = +l, the initial factors are P(R), P(T|R), P(+l|T)
  • Procedure: Join all factors, eliminate all hidden variables, normalize

Initial factors (no evidence): P(R), P(T|R), P(L|T):

+r    0.1
-r    0.9

+r    +t    0.8
+r    -t    0.2
-r    +t    0.1
-r    -t    0.9

+t    +l    0.3
+t    -l    0.7
-t    +l    0.1
-t    -l    0.9

With evidence L = +l, the last factor is selected to P(+l|T):

+t    +l    0.3
-t    +l    0.1

Operation 1: Join Factors

  • First basic operation: joining factors
  • Combining factors:
    • Just like a database join
    • Get all factors over the joining variable
    • Build a new factor over the union of the variables involved
  • Example: Join on R, i.e. P(R) × P(T|R) → P(R,T)
  • Computation for each entry: pointwise products, e.g. P(+r, +t) = P(+r) · P(+t | +r) = 0.1 × 0.8 = 0.08

P(R):
+r    0.1
-r    0.9

P(T | R):
+r    +t    0.8
+r    -t    0.2
-r    +t    0.1
-r    -t    0.9

P(R, T):
+r    +t    0.08
+r    -t    0.02
-r    +t    0.09
-r    -t    0.81
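A hedged sketch of a generic join over the dict-based factor representation from the Factor Zoo sketch above (the helper names and `domains` argument are my own):

```python
from itertools import product as cartesian

def join(f1, f2, domains):
    """Pointwise product over the union of the two factors' variables."""
    out_vars = f1['vars'] + tuple(v for v in f2['vars'] if v not in f1['vars'])
    table = {}
    for assignment in cartesian(*(domains[v] for v in out_vars)):
        env = dict(zip(out_vars, assignment))
        k1 = tuple(env[v] for v in f1['vars'])
        k2 = tuple(env[v] for v in f2['vars'])
        table[assignment] = f1['table'][k1] * f2['table'][k2]
    return {'vars': out_vars, 'table': table}

domains = {'R': ('+r', '-r'), 'T': ('+t', '-t')}
f_R = {'vars': ('R',), 'table': {('+r',): 0.1, ('-r',): 0.9}}
f_T_given_R = {'vars': ('R', 'T'), 'table': {
    ('+r', '+t'): 0.8, ('+r', '-t'): 0.2, ('-r', '+t'): 0.1, ('-r', '-t'): 0.9}}

print(join(f_R, f_T_given_R, domains)['table'])
# {('+r', '+t'): 0.08, ('+r', '-t'): 0.02, ('-r', '+t'): 0.09, ('-r', '-t'): 0.81}
```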

Example: Multiple Joins

Start with P(R), P(T|R), P(L|T):

P(R):
+r    0.1
-r    0.9

P(T | R):
+r    +t    0.8
+r    -t    0.2
-r    +t    0.1
-r    -t    0.9

P(L | T):
+t    +l    0.3
+t    -l    0.7
-t    +l    0.1
-t    -l    0.9

Join R → P(R, T), P(L | T):

P(R, T):
+r    +t    0.08
+r    -t    0.02
-r    +t    0.09
-r    -t    0.81

Join T → P(R, T, L):

+r    +t    +l    0.024
+r    +t    -l    0.056
+r    -t    +l    0.002
+r    -t    -l    0.018
-r    +t    +l    0.027
-r    +t    -l    0.063
-r    -t    +l    0.081
-r    -t    -l    0.729

The join on R builds a factor of size O(2^2); the join on T builds a factor of size O(2^3).

Operation 2: Eliminate

  • Second basic operation: marginalization
  • Take a factor and sum out a variable
    • Shrinks a factor to a smaller one
    • A projection operation
  • Example: sum out R from P(R, T) to get P(T)

P(R, T):
+r    +t    0.08
+r    -t    0.02
-r    +t    0.09
-r    -t    0.81

P(T):
+t    0.17
-t    0.83
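A hedged companion to the join sketch above: summing a variable out of a dict-based factor (again, the helper name is mine):

```python
def sum_out(factor, var):
    """Marginalize var out of a factor: add up entries that agree elsewhere."""
    i = factor['vars'].index(var)
    out_vars = factor['vars'][:i] + factor['vars'][i + 1:]
    table = {}
    for k, p in factor['table'].items():
        reduced = k[:i] + k[i + 1:]
        table[reduced] = table.get(reduced, 0.0) + p
    return {'vars': out_vars, 'table': table}

f_RT = {'vars': ('R', 'T'), 'table': {
    ('+r', '+t'): 0.08, ('+r', '-t'): 0.02, ('-r', '+t'): 0.09, ('-r', '-t'): 0.81}}
print(sum_out(f_RT, 'R')['table'])  # {('+t',): 0.17, ('-t',): 0.83}
```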


Multiple Elimination

Sum out R, then sum out T: P(R, T, L) → P(T, L) → P(L)

P(R, T, L):
+r    +t    +l    0.024
+r    +t    -l    0.056
+r    -t    +l    0.002
+r    -t    -l    0.018
-r    +t    +l    0.027
-r    +t    -l    0.063
-r    -t    +l    0.081
-r    -t    -l    0.729

P(T, L):
+t    +l    0.051
+t    -l    0.119
-t    +l    0.083
-t    -l    0.747

P(L):
+l    0.134
-l    0.866

Thus Far: Multiple Join, Multiple Eliminate (= Inference by Enumeration)

Marginalizing Early (= Variable Elimination)

Traffic Domain [Network: R → T → L], query: P(L) = ?

Inference by Enumeration:
  P(L) = Σ_t Σ_r P(L|t) P(r) P(t|r)
  (Join on r, Join on t, Eliminate r, Eliminate t)

Variable Elimination:
  P(L) = Σ_t P(L|t) Σ_r P(r) P(t|r)
  (Join on r, Eliminate r, Join on t, Eliminate t)

Marginalizing Early! (aka VE)

Start with the initial factors P(R), P(T|R), P(L|T):

+r    0.1
-r    0.9

+r    +t    0.8
+r    -t    0.2
-r    +t    0.1
-r    -t    0.9

+t    +l    0.3
+t    -l    0.7
-t    +l    0.1
-t    -l    0.9

Join R → P(R, T), P(L | T):

P(R, T):
+r    +t    0.08
+r    -t    0.02
-r    +t    0.09
-r    -t    0.81

Sum out R → P(T), P(L | T):

P(T):
+t    0.17
-t    0.83

Join T → P(T, L):

+t    +l    0.051
+t    -l    0.119
-t    +l    0.083
-t    -l    0.747

Sum out T → P(L):

+l    0.134
-l    0.866

Every intermediate factor here has size O(2^2), in contrast to the O(2^3) factor built by inference by enumeration.

Evidence

  • If evidence, start with factors that select that evidence
  • No evidence uses these initial factors: P(R), P(T|R), P(L|T)

+r    0.1
-r    0.9

+r    +t    0.8
+r    -t    0.2
-r    +t    0.1
-r    -t    0.9

+t    +l    0.3
+t    -l    0.7
-t    +l    0.1
-t    -l    0.9

  • Computing P(L | +r), the initial factors become P(+r), P(T | +r), P(L | T):

+r    0.1

+r    +t    0.8
+r    -t    0.2

+t    +l    0.3
+t    -l    0.7
-t    +l    0.1
-t    -l    0.9

  • We eliminate all vars other than query + evidence
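A hedged sketch of evidence selection on the same dict-based factors. The `restrict` helper is my own; it keeps the evidence variable's dimension, matching the selected tables above:

```python
def restrict(factor, var, value):
    """Keep only the rows consistent with var=value (dimension kept)."""
    i = factor['vars'].index(var)
    table = {k: p for k, p in factor['table'].items() if k[i] == value}
    return {'vars': factor['vars'], 'table': table}

f_R = {'vars': ('R',), 'table': {('+r',): 0.1, ('-r',): 0.9}}
f_T_given_R = {'vars': ('R', 'T'), 'table': {
    ('+r', '+t'): 0.8, ('+r', '-t'): 0.2, ('-r', '+t'): 0.1, ('-r', '-t'): 0.9}}

# Evidence R = +r selects the consistent rows of every factor mentioning R:
print(restrict(f_R, 'R', '+r')['table'])          # {('+r',): 0.1}
print(restrict(f_T_given_R, 'R', '+r')['table'])  # {('+r', '+t'): 0.8, ('+r', '-t'): 0.2}
```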

Evidence II

  • Result will be a selected joint of query and evidence
    • E.g. for P(L | +r), we would end up with:

P(+r, L):
+r    +l    0.026
+r    -l    0.074

Normalize →

P(L | +r):
+l    0.26
-l    0.74

  • To get our answer, just normalize this!
  • That's it!

General Variable Elimination

  • Query: P(Q | E1 = e1, …, Ek = ek)
  • Start with initial factors:
    • Local CPTs (but instantiated by evidence)
  • While there are still hidden variables (not Q or evidence):
    • Pick a hidden variable H
    • Join all factors mentioning H
    • Eliminate (sum out) H
  • Join all remaining factors and normalize (see the sketch below)
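Putting the pieces together, a hedged end-to-end sketch of this loop, built on the `join`, `sum_out`, and `restrict` helpers (and the `f_R`, `f_T_given_R` factors) defined in the sketches above; the driver function and its signature are my own:

```python
def variable_elimination(factors, query, evidence, domains):
    """P(query | evidence) for dict-based factors; evidence maps var -> value."""
    domains = dict(domains)
    # Instantiate evidence: select consistent rows, shrink the domain.
    for var, value in evidence.items():
        domains[var] = (value,)
        factors = [restrict(f, var, value) if var in f['vars'] else f
                   for f in factors]
    # While there are still hidden variables: pick one, join, sum out.
    # (This picks in arbitrary set order; a real implementation would
    # choose the elimination ordering heuristically.)
    hidden = {v for f in factors for v in f['vars']} - {query} - set(evidence)
    for h in hidden:
        mentioning = [f for f in factors if h in f['vars']]
        factors = [f for f in factors if h not in f['vars']]
        joined = mentioning[0]
        for f in mentioning[1:]:
            joined = join(joined, f, domains)
        factors.append(sum_out(joined, h))
    # Join all remaining factors and normalize.
    result = factors[0]
    for f in factors[1:]:
        result = join(result, f, domains)
    z = sum(result['table'].values())
    return {k: p / z for k, p in result['table'].items()}

domains = {'R': ('+r', '-r'), 'T': ('+t', '-t'), 'L': ('+l', '-l')}
f_L_given_T = {'vars': ('T', 'L'), 'table': {
    ('+t', '+l'): 0.3, ('+t', '-l'): 0.7, ('-t', '+l'): 0.1, ('-t', '-l'): 0.9}}
print(variable_elimination([f_R, f_T_given_R, f_L_given_T], 'L', {'R': '+r'}, domains))
# {('+r', '+l'): 0.26, ('+r', '-l'): 0.74}   i.e. P(L | +r)
```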


Example

[Worked variable elimination on the alarm network: choose A (join on A, sum out A), then choose E (join on E, sum out E), finish with B, then normalize.]

Same Example in Equations

P(B | +j, +m) ∝ P(B, +j, +m)
  = Σ_{e,a} P(B, e, a, +j, +m)                        (define marginal in terms of sum)
  = Σ_{e,a} P(B) P(e) P(a|B,e) P(+j|a) P(+m|a)        (decompose joint probability for Bayes' net)
  = Σ_e P(B) P(e) Σ_a P(a|B,e) P(+j|a) P(+m|a)        (use x(y+z) = xy + xz)
  = Σ_e P(B) P(e) f1(B, e, +j, +m)                    (joining on a, and then summing out, gives f1)
  = P(B) Σ_e P(e) f1(B, e, +j, +m)                    (use x(y+z) = xy + xz)
  = P(B) f2(B, +j, +m)                                (joining on e, and then summing out, gives f2)

All we are doing is exploiting uwy + uwz + uxy + uxz + vwy + vwz + vxy + vxz = (u+v)(w+x)(y+z) to improve computational efficiency!
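A trivial numeric check of that identity, with arbitrarily chosen values (the whole trick of variable elimination is factoring a sum of eight products into a product of three small sums):

```python
# Distributivity: summing out early is algebraically the same computation.
u, v, w, x, y, z = 0.1, 0.9, 0.8, 0.2, 0.3, 0.7
lhs = u*w*y + u*w*z + u*x*y + u*x*z + v*w*y + v*w*z + v*x*y + v*x*z
rhs = (u + v) * (w + x) * (y + z)
print(abs(lhs - rhs) < 1e-12)  # True
```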

Exercise: Variable Elimination Ordering

  • Suppose we have the query: P(Xn | y1, …, yn)
  • Compare two possible orderings:
    1. Z, X1, …, Xn-1
    2. X1, …, Xn-1, Z
  • Answer: O(2^(n+1)) versus O(n · 2^2) (assuming binary variables)
  • In general: the ordering can greatly affect efficiency.

VE: Computational and Space Complexity

  • The computational and space complexity of variable elimination are determined by the largest factor
  • The elimination ordering can greatly affect the size of the largest factor.
    • E.g., previous slide's example: 2^n vs. 2
  • Does there always exist an ordering that only results in small factors?
    • No! (everything in AI is NP-hard)

Polytrees

  • A polytree is a directed graph with no undirected cycles
  • For poly-trees you can always find an ordering that is efficient
    • Try it!!
  • Cut-set conditioning for Bayes' net inference
    • Choose a set of variables such that if removed only a polytree remains
    • Exercise: Think about how the specifics would work out!

Bayes’ Nets

  • Representation
  • Conditional Independences
  • Probabilistic Inference
    • Enumeration (exact, exponential complexity)
    • Variable elimination (exact, worst-case exponential complexity, often better)
    • Inference is NP-complete
    • Sampling (approximate)
  • Learning Bayes' Nets from Data