
Introduction to Artificial Intelligence

V22.0472-001 Fall 2009

Lecture 14: Bayes’ Nets 2

Rob Fergus – Dept of Computer Science, Courant Institute, NYU

Slides from Karen Livescu, Jeff Bilmes, Dan Klein, Stuart Russell or Andrew Moore

Announcements

  • How was mid-term?
  • Will grade mid-term / assignment 2 this weekend

  • Assignment 3 due this time next week
  • Office hours today after class

Example Bayes’ Net


Bayes’ Nets

  • A Bayes’ net is an efficient encoding of a probabilistic model of a domain
  • Questions we can ask:
  • Inference: given a fixed BN, what is P(X | e)?
  • Representation: given a fixed BN, what kinds of distributions can it encode?
  • Modeling: what BN is most appropriate for a given domain?

Example: Traffic

  • Variables
  • T: Traffic
  • R: It rains
  • L: Low pressure
  • D: Roof drips
  • B: Ballgame

[Figure: network over nodes R, B, L (top row) and T, D (bottom row)]

Bayes’ Net Semantics

  • A Bayes’ net:
  • A set of nodes, one per variable X
  • A directed, acyclic graph
  • A conditional distribution of each variable conditioned on its parents (the parameters θ)
  • Semantics:
  • A BN defines a joint probability distribution over its variables:

$$P(x_1, \dots, x_n) = \prod_{i=1}^{n} P(x_i \mid \mathrm{parents}(X_i))$$

[Figure: node X with parents A1, …, An]
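As a minimal sketch of these semantics, the snippet below multiplies local conditionals to obtain a single entry of the joint. The two-node net R → T, the CPT numbers, and the helper name `joint_entry` are all hypothetical, chosen only for illustration.

```python
# Hypothetical two-node net R -> T; the numbers are illustrative, not from the lecture.
parents = {"R": [], "T": ["R"]}

# cpt[var][tuple of parent values][value] = P(var = value | parents)
cpt = {
    "R": {(): {True: 0.25, False: 0.75}},
    "T": {
        (True,): {True: 0.75, False: 0.25},
        (False,): {True: 0.5, False: 0.5},
    },
}


def joint_entry(assignment):
    """Return P(assignment) as the product of P(x_i | parents(X_i))."""
    prob = 1.0
    for var, var_parents in parents.items():
        parent_values = tuple(assignment[p] for p in var_parents)
        prob *= cpt[var][parent_values][assignment[var]]
    return prob


# P(R = true, T = true) = P(R = true) * P(T = true | R = true)
p_rain_and_traffic = joint_entry({"R": True, "T": True})
```

Only the entry that is asked for gets computed; the full table is never materialized.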


Building the (Entire) Joint

  • We can take a Bayes’ net and build any entry from the full joint distribution it encodes
  • Typically, there’s no reason to build ALL of it
  • We build what we need on the fly
  • To emphasize: every BN over a domain implicitly defines a joint distribution over that domain, specified by local probabilities and graph structure

Example: Alarm Network


Size of a Bayes’ Net

  • How big is a joint distribution over N Boolean variables?

$2^N$

  • How big is an N-node net if nodes have up to k parents?

$O(N \cdot 2^{k+1})$

  • Both give you the power to calculate $P(X_1, \dots, X_N)$
  • BNs: Huge space savings!
  • Also easier to elicit local CPTs
  • Also turns out to be faster to answer queries (coming)
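To get a feel for the savings, here is a small sketch that evaluates both size formulas for Boolean variables. The function names and the example values N = 30, k = 3 are assumptions made for illustration.

```python
def joint_table_size(n):
    """Number of entries in a full joint table over n Boolean variables."""
    return 2 ** n


def bayes_net_size(n, k):
    """Upper bound on CPT entries: each of n nodes has at most 2^k parent
    configurations, each with 2 values, giving n * 2^(k+1)."""
    return n * 2 ** (k + 1)


full_joint = joint_table_size(30)   # 1073741824 entries
compact_net = bayes_net_size(30, 3)  # 480 entries
```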

Bayes’ Nets

  • So far:
  • What is a Bayes’ net?
  • What joint distribution does it encode?
  • Next: how to answer queries about that distribution
  • Key idea: conditional independence
  • Last class: assembled BNs using an intuitive notion of conditional independence as causality
  • Today: formalize these ideas
  • Main goal: answer queries about conditional independence and influence
  • After that: how to answer numerical queries (inference)

Conditional Independence

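  • Reminder: independence
  • X and Y are independent if $\forall x, y:\ P(x, y) = P(x)\,P(y)$, written $X \perp Y$
  • X and Y are conditionally independent given Z if $\forall x, y, z:\ P(x, y \mid z) = P(x \mid z)\,P(y \mid z)$, written $X \perp Y \mid Z$
  • (Conditional) independence is a property of a distribution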

Example: Independence

  • For this graph, you can fiddle with θ (the CPTs) all you want, but you won’t be able to represent any distribution in which the flips are dependent!

X1 X2 (no arc)
P(X1): h 0.5, t 0.5
P(X2): h 0.5, t 0.5

[Figure: the distributions this graph can represent, shown within the set of all distributions]


Topology Limits Distributions

  • Given some graph topology G, only certain joint distributions can be encoded
  • The graph structure guarantees certain (conditional) independences
  • (There might be more independence)
  • Adding arcs increases the set of distributions, but has several costs

[Figure: graphs over nodes X, Y, Z]

Independence in a BN

  • Important question about a BN:
  • Are two nodes independent given certain evidence?
  • If yes, can prove using algebra (really tedious)
  • If no, can prove with a counterexample
  • Example:

X → Y → Z

  • Question: are X and Z independent?
  • Answer: not necessarily, we’ve seen examples otherwise: low pressure causes rain which causes traffic.
  • X can influence Z, Z can influence X (via Y)
  • Addendum: they could be independent: how?

1. Causal Chains

  • This configuration is a “causal chain”

X → Y → Z

X: Low pressure, Y: Rain, Z: Traffic

  • Is X independent of Z given Y? Yes!
  • Evidence along the chain “blocks” the influence
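One way to check the "Yes!" is a short derivation, assuming the chain X → Y → Z so that the joint factors as P(x, y, z) = P(x) P(y | x) P(z | y). Summing that joint over z gives P(x, y) = P(x) P(y | x), and then:

```latex
\begin{align*}
P(z \mid x, y) &= \frac{P(x, y, z)}{P(x, y)} \\
               &= \frac{P(x)\,P(y \mid x)\,P(z \mid y)}{P(x)\,P(y \mid x)} \\
               &= P(z \mid y)
\end{align*}
```

Once y is known, x drops out of the expression, which is what it means for X and Z to be conditionally independent given Y.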

2. Common Cause

  • Another basic configuration: two effects of the same cause

X ← Y → Z

Y: Midterm exam, X: Email list busy, Z: Library full

  • Are X and Z independent?
  • Are X and Z independent given Y? Yes!
  • Observing the cause blocks influence between effects.
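The same kind of algebra works for the common cause, assuming the structure X ← Y → Z so that P(x, y, z) = P(y) P(x | y) P(z | y). Summing over z gives P(x, y) = P(y) P(x | y), hence:

```latex
\begin{align*}
P(z \mid x, y) &= \frac{P(x, y, z)}{P(x, y)} \\
               &= \frac{P(y)\,P(x \mid y)\,P(z \mid y)}{P(y)\,P(x \mid y)} \\
               &= P(z \mid y)
\end{align*}
```

Given the cause y, learning one effect x tells us nothing further about the other effect z.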

Common Cause Example: Is height independent of hair length?

[Figure: scatter plot of height against hair length]

Slide credit: Karen Livescu

Is height independent of hair length? (2)

[Figure: scatter plot with axes H (height: 5’, 6’, 7’) and L (hair length: short, mid, long)]

Slide credit: Karen Livescu


Is height independent of hair length? (3)

  • Generally, no: $H \not\perp L$, i.e. $p(h \mid l) \neq p(h)$
  • If gender known, yes: $H \perp L \mid G$, i.e. $p(h \mid l, g) = p(h \mid g)$
  • This is the “common cause” scenario

L ← G → H (G: gender, L: hair length, H: height)

Slide credit: Karen Livescu

3. Common Effect

  • Last configuration: two causes of one effect (v-structures)

X → Y ← Z

X: Raining, Z: Ballgame, Y: Traffic

  • Are X and Z independent?
  • Yes: remember the ballgame and the rain causing traffic, no correlation?
  • Still need to prove they must be (try it!)
  • Are X and Z independent given Y?
  • No: remember that seeing traffic put the rain and the ballgame in competition?
  • This is backwards from the other cases
  • Observing the effect enables influence between causes.

Common Effect Example

  • Let X, Z be two i.i.d. fair coin tosses taking values in {0,1}
  • Let Y = X + Z
  • If we observe Y then X and Z become coupled
  • P(X=1 | Z=1) = 0.5, but P(X=1 | Z=1, Y=2) = 1

X Z Y
0 0 0
0 1 1
1 0 1
1 1 2
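The coupling can be checked by brute-force enumeration. The sketch below assumes the two tosses are fair and independent (each of the four (X, Z) outcomes has probability 1/4); the helper `prob` is a hypothetical name introduced here.

```python
from fractions import Fraction
from itertools import product

# Joint over (X, Z, Y) for two fair, independent tosses with Y = X + Z.
joint = {(x, z, x + z): Fraction(1, 4) for x, z in product((0, 1), repeat=2)}


def prob(event, given=lambda x, z, y: True):
    """P(event | given), computed by summing matching rows of the joint."""
    numerator = sum(p for (x, z, y), p in joint.items()
                    if event(x, z, y) and given(x, z, y))
    denominator = sum(p for (x, z, y), p in joint.items() if given(x, z, y))
    return numerator / denominator


x_is_1 = lambda x, z, y: x == 1

# Without observing Y, learning Z does not change our belief about X.
p_x = prob(x_is_1)                                          # 1/2
p_x_given_z = prob(x_is_1, lambda x, z, y: z == 1)          # 1/2

# Once Y is observed, Z becomes informative about X.
p_x_given_y1 = prob(x_is_1, lambda x, z, y: y == 1)                  # 1/2
p_x_given_z_y1 = prob(x_is_1, lambda x, z, y: z == 1 and y == 1)     # 0
p_x_given_z_y2 = prob(x_is_1, lambda x, z, y: z == 1 and y == 2)     # 1
```

Comparing `p_x_given_y1` with `p_x_given_z_y1` shows the dependence most directly: with Y = 1 fixed, learning Z = 1 moves the probability of X = 1 from 1/2 to 0.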

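More explaining away...

[Figure: causes C1 to C5 (pipes, faucet, caulking, drain, upstairs), each a parent of L (leak)]

$$C_i \perp C_j \quad \forall\, i \neq j, \qquad p(c_i \mid c_j) = p(c_i)$$

$$C_i \not\perp C_j \mid L \quad \forall\, i \neq j, \qquad p(c_i \mid c_j, l) \neq p(c_i \mid l)$$

Slide credit: Karen Livescu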

Examples of the three cases

  • Causal chain: SUVs → Greenhouse Gases → Global Warming
  • Common cause: Lung Cancer ← Smoking → Bad Breath
  • Common effect: Genetics → Cancer ← Smoking

Slide: J. Bilmes

The General Case

  • Any complex example can be analyzed using these three canonical cases
  • General question: in a given BN, are two variables independent (given evidence)?
  • Solution: analyze the graph


Reachability

  • Recipe: shade evidence nodes
  • Attempt 1: if two nodes are connected by an undirected path not blocked by a shaded node, they are conditionally dependent
  • Almost works, but not quite
  • Where does it break?
  • Answer: the v-structure at T doesn’t count as a link in a path unless “active”

[Figure: network over nodes R, B, L, T, D, T’]

Reachability (the Bayes’ Ball)

  • Correct algorithm:
  • Shade in evidence
  • Start at source node
  • Try to reach target by search
  • States: pair of (node X, previous state S)
  • Successor function:
  • X unobserved:
  • To any child
  • To any parent if coming from a child
  • X observed:
  • From parent to parent
  • If you can’t reach a node, it’s conditionally independent of the start node given evidence

[Figure: allowed moves through X from the previous state S]
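The search described above can be sketched as a breadth-first search over (node, direction of arrival) states. This is a minimal sketch under assumptions, not the lecture's own code: the function names are hypothetical, the source is treated as if entered from a child, and the traffic network's edges (L → R, R → T, B → T, R → D, T → T2, with T2 standing in for T’) are assumed because the figure did not survive.

```python
from collections import deque


def reachable(parents, source, evidence):
    """Nodes reachable from `source` along active paths, given `evidence`.

    `parents` maps every node to the list of its parents.
    """
    children = {node: [] for node in parents}
    for child, node_parents in parents.items():
        for parent in node_parents:
            children[parent].append(child)

    # A state is (node, how we arrived there): "from_child" or "from_parent".
    # The source counts as entered from a child, so the search may leave it
    # toward both its parents and its children.
    start = (source, "from_child")
    visited = {start}
    queue = deque([start])
    reached = set()
    while queue:
        node, arrived = queue.popleft()
        successors = []
        if node not in evidence:
            reached.add(node)
            # Unobserved: to any child; to any parent if coming from a child.
            successors += [(c, "from_parent") for c in children[node]]
            if arrived == "from_child":
                successors += [(p, "from_child") for p in parents[node]]
        elif arrived == "from_parent":
            # Observed: only from parent to parent.
            successors += [(p, "from_child") for p in parents[node]]
        for state in successors:
            if state not in visited:
                visited.add(state)
                queue.append(state)
    return reached


def independent(parents, x, y, evidence=()):
    """True if y cannot be reached from x given the evidence."""
    return y not in reachable(parents, x, set(evidence))


# Assumed edges for the traffic example; "T2" stands in for T'.
traffic_net = {"L": [], "B": [], "R": ["L"], "T": ["R", "B"],
               "D": ["R"], "T2": ["T"]}

l_b_no_evidence = independent(traffic_net, "L", "B")            # True
l_b_given_t = independent(traffic_net, "L", "B", {"T"})         # False
l_b_given_t2 = independent(traffic_net, "L", "B", {"T2"})       # False
l_t_given_r = independent(traffic_net, "L", "T", {"R"})         # True
```

Tracking the direction of arrival is what lets an observed descendant (here T2) activate the v-structure at T: the search bounces off T2 back up into T as if coming from a child, and from there may continue to T's other parent.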

Reachability (D-Separation)

  • Question: Are X and Y conditionally independent given evidence variables {Z}?
  • Look for “active paths” from X to Y
  • No active paths = independence!
  • A path is active if each triple is either a:
  • Causal chain A → B → C where B is unobserved (either direction)
  • Common cause A ← B → C where B is unobserved
  • Common effect (aka v-structure) A → B ← C where B or one of its descendants is observed

[Figure: active triples and inactive triples]

Also known as Bayes Ball
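The triple rules can also be applied directly to one given path. The following is a sketch under assumptions: the function names are hypothetical, and the example network reuses assumed edges for the traffic example (L → R, R → T, B → T, R → D, T → T2, with T2 standing in for T’).

```python
def descendants(node, parents):
    """All nodes reachable from `node` by following parent-to-child edges."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for child, node_parents in parents.items():
            if current in node_parents and child not in found:
                found.add(child)
                stack.append(child)
    return found


def triple_is_active(a, b, c, parents, evidence):
    """Apply the three canonical cases to consecutive path nodes a, b, c."""
    if a in parents[b] and c in parents[b]:
        # Common effect a -> b <- c: active only if b or a descendant is observed.
        return b in evidence or bool(descendants(b, parents) & evidence)
    # Causal chain (either direction) or common cause: active only if b is unobserved.
    return b not in evidence


def path_is_active(path, parents, evidence=()):
    """A path is active if every consecutive triple on it is active."""
    evidence = set(evidence)
    return all(triple_is_active(a, b, c, parents, evidence)
               for a, b, c in zip(path, path[1:], path[2:]))


# Assumed edges for the traffic example; "T2" stands in for T'.
traffic_net = {"L": [], "B": [], "R": ["L"], "T": ["R", "B"],
               "D": ["R"], "T2": ["T"]}
path = ["L", "R", "T", "B"]

blocked_by_v_structure = path_is_active(path, traffic_net)            # False
opened_by_descendant = path_is_active(path, traffic_net, {"T2"})      # True
blocked_by_chain = path_is_active(path, traffic_net, {"R", "T"})      # False
```

Two variables are d-separated only if every undirected path between them is inactive, so a full check would enumerate paths or use a reachability search instead.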

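Example

[Figure: d-separation query on a network; answer: Yes]

Example

[Figure: d-separation queries on the network over R, B, L, T, D, T’; answers: Yes, Yes, Yes, Yes]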

Example

  • Variables:
  • R: Raining
  • T: Traffic
  • D: Roof drips
  • S: I’m sad
  • Questions:

[Figure: network over nodes R, T, D, S with independence questions; one answer shown: Yes]


Causality?

  • When Bayes’ nets reflect the true causal patterns:
  • Often simpler (nodes have fewer parents)
  • Often easier to think about
  • Often easier to elicit from experts
  • BNs need not actually be causal

  • Sometimes no causal net exists over the domain
  • E.g. consider the variables Traffic and Drips
  • End up with arrows that reflect correlation, not causation
  • What do the arrows really mean?
  • Topology may happen to encode causal structure
  • Topology only guaranteed to encode conditional independence


Example: Coins

  • Extra arcs don’t prevent representing independence, just allow non-independence

X1 X2 (no arc)
P(X1): h 0.5, t 0.5
P(X2): h 0.5, t 0.5

X1 → X2
P(X1): h 0.5, t 0.5
P(X2 | X1): h | h 0.5, t | h 0.5, h | t 0.5, t | t 0.5
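A quick numerical check of this point: the sketch below builds the joint for a net with the arc X1 → X2 and tests whether the flips are independent. The function names and the "sticky" CPT are hypothetical; only the all-0.5 tables come from the slide.

```python
from itertools import product


def joint_from_arc(p_x1, p_x2_given_x1):
    """Joint P(x1, x2) = P(x1) * P(x2 | x1) for the net X1 -> X2."""
    return {(a, b): p_x1[a] * p_x2_given_x1[a][b]
            for a, b in product("ht", repeat=2)}


def is_independent(joint, tol=1e-12):
    """Check P(x1, x2) == P(x1) * P(x2) for every entry of the joint."""
    p1 = {a: sum(joint[a, b] for b in "ht") for a in "ht"}
    p2 = {b: sum(joint[a, b] for a in "ht") for b in "ht"}
    return all(abs(joint[a, b] - p1[a] * p2[b]) <= tol for a, b in joint)


fair = {"h": 0.5, "t": 0.5}

# CPT from the slide: every row of P(X2 | X1) is the same, so the arc is unused.
same_rows = {"h": fair, "t": fair}

# Hypothetical CPT in which the second flip tends to repeat the first.
sticky = {"h": {"h": 0.9, "t": 0.1}, "t": {"h": 0.1, "t": 0.9}}

independent_flips = is_independent(joint_from_arc(fair, same_rows))  # True
dependent_flips = is_independent(joint_from_arc(fair, sticky))       # False
```

The same graph structure therefore covers both cases; which one holds depends on the CPT values, not on the arc.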

Changing Bayes’ Net Structure

  • The same joint distribution can be encoded in many different Bayes’ nets
  • Causal structure tends to be the simplest
  • Analysis question: given some edges, what other edges do you need to add?
  • One answer: fully connect the graph
  • Better answer: don’t make any false conditional independence assumptions

Example: Alternate Alarm

[Figure: original network over Burglary, Earthquake, Alarm, John calls, Mary calls]

If we reverse the edges, we make different conditional independence assumptions

[Figure: reversed network over John calls, Mary calls, Alarm, Burglary, Earthquake]

To capture the same joint distribution, we have to add more edges to the graph

Summary

  • Bayes’ nets compactly encode joint distributions
  • Guaranteed independencies of distributions can be deduced from BN graph structure
  • The Bayes’ ball algorithm (aka d-separation)
  • A Bayes’ net may have other independencies that are not detectable until you inspect its specific distribution