

SLIDE 1

ECE 6504: Advanced Topics in Machine Learning

Probabilistic Graphical Models and Large-Scale Learning

Dhruv Batra, Virginia Tech

Topics:

– Bayes Nets: Representation/Semantics
– d-separation, Local Markov Assumption
– Markov Blanket
– I-equivalence, (Minimal) I-Maps, P-Maps

Readings: KF 3.2, 3.4

SLIDE 2

Recap of Last Time


SLIDE 3

A general Bayes net

  • Set of random variables
  • Directed acyclic graph

– Encodes independence assumptions

  • CPTs

– Conditional Probability Tables

  • Joint distribution: P(X1, …, Xn) = ∏i P(Xi | PaXi)

[Figure: example Bayes net over Flu, Allergy, Sinus, Headache, Nose]
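To make the factorization concrete, here is a minimal Python sketch (not from the slides): it stores the example net as CPTs and evaluates the joint as a product of local conditionals. The edge structure (Flu → Sinus ← Allergy, with Sinus → Headache and Sinus → Nose) and all probability values are illustrative assumptions.

```python
# Minimal sketch: the example net as CPTs, with the joint evaluated via
# P(F,A,S,H,N) = P(F) P(A) P(S|F,A) P(H|S) P(N|S).
# Edge structure and all probability values are made up for illustration.

parents = {"Flu": [], "Allergy": [], "Sinus": ["Flu", "Allergy"],
           "Headache": ["Sinus"], "Nose": ["Sinus"]}

# CPTs: parent assignment tuple -> P(var = 1 | parents).
cpt = {
    "Flu":      {(): 0.1},
    "Allergy":  {(): 0.2},
    "Sinus":    {(0, 0): 0.05, (0, 1): 0.5, (1, 0): 0.6, (1, 1): 0.9},
    "Headache": {(0,): 0.1, (1,): 0.7},
    "Nose":     {(0,): 0.2, (1,): 0.8},
}

def joint(assignment):
    """Product of local conditionals, one factor per variable."""
    p = 1.0
    for var, pa in parents.items():
        p1 = cpt[var][tuple(assignment[u] for u in pa)]
        p *= p1 if assignment[var] == 1 else 1.0 - p1
    return p

print(joint({"Flu": 1, "Allergy": 0, "Sinus": 1, "Headache": 1, "Nose": 0}))
```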

SLIDE 4

Independencies in a Problem

  • BN: graph G encodes local independence assumptions
  • World, data, reality: the true distribution P contains independence assertions

Slide Credit: Carlos Guestrin

SLIDE 5

Bayes Nets

  • BNs encode (conditional) independence assumptions.

– I(G) = {(X ⊥ Y | Z)}: the set of independence assertions encoded by G

  • Which ones?
  • And how can we easily read them off the graph?


SLIDE 6

Local Structures

  • What’s the smallest Bayes Net?


SLIDE 7

Local Structures

(C) Dhruv Batra 7

– Indirect causal effect: X → Z → Y
– Indirect evidential effect: X ← Z ← Y
– Common cause: X ← Z → Y
– Common effect (v-structure): X → Z ← Y

The first three structures are blocked when Z is observed; the v-structure behaves the opposite way, becoming active when Z (or one of its descendants) is observed.

SLIDE 8

Bayes Ball Rules

  • Flow of information

– on board


SLIDE 9

Plan for today

  • Bayesian Networks: Semantics

– d-separation
– General (conditional) independence assumptions in a BN
– Markov Blanket
– (Minimal) I-map, P-map


SLIDE 10

Active trails formalized

  • Let variables O ⊆ {X1,…,Xn} be observed
  • A path X1 – X2 – · · · – Xk is an active trail if, for each consecutive triplet:

– Xi-1 → Xi → Xi+1, and Xi is not observed (Xi ∉ O)
– Xi-1 ← Xi ← Xi+1, and Xi is not observed (Xi ∉ O)
– Xi-1 ← Xi → Xi+1, and Xi is not observed (Xi ∉ O)
– Xi-1 → Xi ← Xi+1, and Xi is observed (Xi ∈ O), or one of its descendants is observed

Slide Credit: Carlos Guestrin
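The four cases translate directly into code. Below is a minimal Python sketch (illustrative, not from the slides) that checks whether a given path is an active trail in a DAG stored as a {node: parent-list} dict; it reuses the example net from the factorization sketch above.

```python
# Minimal sketch of the active-trail test (illustrative assumptions).
# The DAG is stored as {node: list of parents}.

def descendants(graph, x):
    """All nodes reachable from x along parent -> child edges."""
    children = {v: [c for c, ps in graph.items() if v in ps] for v in graph}
    stack, seen = [x], set()
    while stack:
        for c in children[stack.pop()]:
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen

def is_active_trail(graph, path, observed):
    """Check the four triplet rules along a path (a list of node names)."""
    for a, b, c in zip(path, path[1:], path[2:]):
        if a in graph[b] and c in graph[b]:       # v-structure a -> b <- c
            if b not in observed and not (descendants(graph, b) & observed):
                return False                      # blocked: nothing observed
        elif b in observed:                       # chain or common cause
            return False                          # blocked: middle observed
    return True

g = {"Flu": [], "Allergy": [], "Sinus": ["Flu", "Allergy"],
     "Headache": ["Sinus"], "Nose": ["Sinus"]}
print(is_active_trail(g, ["Flu", "Sinus", "Allergy"], set()))         # False
print(is_active_trail(g, ["Flu", "Sinus", "Allergy"], {"Headache"}))  # True
```

Observing Headache, a descendant of Sinus, activates the v-structure Flu → Sinus ← Allergy, matching the fourth rule.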

SLIDE 11

An active trail: Example

[Figure: example graph with nodes A, B, C, D, E, F, F′, F″, G, H]

When are A and H independent?

SLIDE 12

d-Separation

  • Definition: Variables X and Y are d-separated given Z if

– there is no active trail between X and Y when the variables Z ⊆ {X1, …, Xn} are observed

[Figure: example graph with nodes A, B, C, D, E, F, G, H, I, J, K]

Slide Credit: Carlos Guestrin
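With the active-trail test above, d-separation can be checked by brute force over all simple undirected paths. A minimal sketch (illustrative only; exponential in graph size):

```python
# Brute-force d-separation (illustrative): X and Y are d-separated
# given Z iff no simple undirected path between them is active.

def d_separated(graph, x, y, z):
    children = {v: [c for c, ps in graph.items() if v in ps] for v in graph}
    def paths(path):
        if path[-1] == y:
            yield path
            return
        for nbr in graph[path[-1]] + children[path[-1]]:
            if nbr not in path:                   # keep the path simple
                yield from paths(path + [nbr])
    return not any(is_active_trail(graph, p, set(z)) for p in paths([x]))

print(d_separated(g, "Headache", "Nose", ["Sinus"]))  # True
print(d_separated(g, "Flu", "Allergy", ["Sinus"]))    # False (v-structure)
```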

SLIDE 13

d-Separation

  • So what if X and Y are d-separated given Z?


SLIDE 14

Factorization + d-sep ⇒ Independence

  • Theorem:

– If

  • P factorizes over G
  • d-sepG(X, Y | Z)

– Then

  • P ⊨ (X ⊥ Y | Z)

– Corollary:

  • I(G) ⊆ I(P)
  • All independence assertions read from G are correct!


SLIDE 15

More generally: Completeness of d-separation

  • Theorem: Completeness of d-separation

– For “almost all” distributions P that factorize over G
– we have that I(G) = I(P)

  • “almost all” distributions: except for a set of CPTs of measure zero
  • Means that if X and Y are not d-separated given Z, then P ⊭ (X ⊥ Y | Z)

Slide Credit: Carlos Guestrin

SLIDE 16

Local Markov Assumption

A variable X is independent of its non-descendants given its parents, and only its parents:

(Xi ⊥ NonDescendantsXi | PaXi)

For example, in the Flu net above: Headache ⊥ {Flu, Allergy, Nose} | Sinus.

[Figure: example Bayes net over Flu, Allergy, Sinus, Headache, Nose]
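The local Markov assumption is exactly what turns the chain rule into the factorization on slide 3. A standard one-step derivation (not spelled out on the slide), ordering the variables topologically:

```latex
% Chain rule over a topological ordering; the local Markov assumption
% then drops every non-parent from each conditioning set:
P(X_1,\dots,X_n)
  = \prod_{i=1}^{n} P(X_i \mid X_1,\dots,X_{i-1})
  = \prod_{i=1}^{n} P(X_i \mid \mathrm{Pa}_{X_i})
```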

SLIDE 17

Markov Blanket

  • The Markov blanket of a variable = its parents, its children, and the parents of its children (co-parents)

[Figure: graph highlighting the Markov blanket of variable x8]

Slide Credit: Simon J.D. Prince
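Reading the blanket off the {node: parent-list} representation used in the earlier sketches (again, illustrative):

```python
# Minimal sketch: Markov blanket = parents + children + co-parents.

def markov_blanket(graph, x):
    children = [c for c, ps in graph.items() if x in ps]
    co_parents = {p for c in children for p in graph[c]}
    return (set(graph[x]) | set(children) | co_parents) - {x}

print(markov_blanket(g, "Sinus"))  # {'Flu', 'Allergy', 'Headache', 'Nose'}
```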

SLIDE 18

Example

A variable is conditionally independent of all other variables, given its Markov blanket.

Slide Credit: Simon J.D. Prince

SLIDE 19

I-map

  • Independency map
  • Definition:

– If I(G) ⊆ I(P), then G is an I-map of P


SLIDE 20

Factorization + d-sep ⇒ Independence

  • Theorem:

– If

  • P factorizes over G
  • d-sepG(X, Y | Z)

– Then

  • P ⊨ (X ⊥ Y | Z)

– Corollary:

  • I(G) ⊆ I(P)
  • G is an I-map of P
  • All independence assertions read from G are correct!


SLIDE 21

The BN Representation Theorem

  • If G is an I-map of P, then P factorizes according to G.

– Important because: every P has at least one BN structure G

  • If P factorizes according to G, then G is an I-map of P.

– Important because: we can read independencies of P directly from the BN structure G

Homework 1!!!! ☺☺

Slide Credit: Carlos Guestrin
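Stated symbolically, the two directions combine into an equivalence. A LaTeX rendering of the statement above:

```latex
% The BN representation theorem: I-map <=> factorization.
G \text{ is an I-map of } P
  \;\Longleftrightarrow\;
P(X_1,\dots,X_n) = \prod_{i=1}^{n} P\!\left(X_i \mid \mathrm{Pa}_{X_i}\right)
```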

SLIDE 22

I-Equivalence

  • Two graphs G1 and G2 are I-equivalent if

– I(G1) = I(G2)

  • Equivalence classes of BN structures

– I-equivalence induces a mutually exclusive and exhaustive partition of the set of graphs
– e.g., X → Y → Z, X ← Y ← Z, and X ← Y → Z are I-equivalent (each encodes exactly X ⊥ Z | Y); the v-structure X → Y ← Z is not

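I-equivalence can be checked by brute force with the d_separated sketch from the d-separation slide, enumerating every (X, Y | Z) triple; feasible only for tiny graphs, and illustrative only:

```python
# Brute-force enumeration of I(G) over all (X, Y | Z) triples (illustrative).
from itertools import combinations

def independencies(graph):
    vs = sorted(graph)
    out = set()
    for x, y in combinations(vs, 2):
        rest = [v for v in vs if v not in (x, y)]
        for r in range(len(rest) + 1):
            for z in combinations(rest, r):
                if d_separated(graph, x, y, z):
                    out.add((x, y, frozenset(z)))
    return out

chain = {"X": [], "Y": ["X"], "Z": ["Y"]}      # X -> Y -> Z
cause = {"Y": [], "X": ["Y"], "Z": ["Y"]}      # X <- Y -> Z
vstruct = {"X": [], "Z": [], "Y": ["X", "Z"]}  # X -> Y <- Z
print(independencies(chain) == independencies(cause))    # True
print(independencies(chain) == independencies(vstruct))  # False
```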

SLIDE 23

Minimal I-maps & P-maps

  • Many possible I-maps
  • Is there a “simplest” I-map?
  • Yes, in two senses:

– Minimal I-maps
– P-maps


SLIDE 24

Minimal I-map

  • G is a minimal I-map for P if

– deleting any single edge from G makes it no longer an I-map of P


SLIDE 25

P-map

  • Perfect map
  • G is a P-map for P if

– I(P) = I(G)

  • Question: Does every distribution P have a P-map?


SLIDE 26

BN: Representation: What you need to know

  • Bayesian networks

– A compact representation for large probability distributions
– Not an algorithm

  • Representation

– BNs represent (conditional) independence assumptions
– BN structure = a family of distributions
– BN structure + CPTs = a single distribution
– Concepts:

  • Active Trails (flow of information); d-separation;
  • Local Markov Assumptions, Markov Blanket
  • I-map, P-map
  • BN Representation Theorem (I-map ⇔ Factorization)


SLIDE 27

Main Issues in PGMs

  • Representation

– How do we store P(X1, X2, …, Xn)?
– What does my model mean/imply/assume? (Semantics)

  • Learning

– How do we learn the parameters and structure of P(X1, X2, …, Xn) from data?
– Which model is right for my data?

  • Inference

– How do I answer questions/queries with my model? For example:
– Marginal estimation: P(X5 | X1, X4)
– Most Probable Explanation (MPE): argmax P(X1, X2, …, Xn)

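To make the inference queries concrete, here is a brute-force conditional query over the slide-3 sketch (illustrative only; enumeration is exponential in the number of variables, which is why dedicated inference algorithms exist):

```python
# Brute-force P(var = 1 | evidence) by summing the joint from the
# earlier sketch over all binary assignments (illustrative only).
from itertools import product

def query(var, evidence):
    names = list(parents)
    num = den = 0.0
    for vals in product([0, 1], repeat=len(names)):
        a = dict(zip(names, vals))
        if all(a[k] == v for k, v in evidence.items()):
            p = joint(a)
            den += p
            num += p * a[var]
    return num / den

print(query("Flu", {"Headache": 1, "Nose": 1}))
```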

SLIDE 28

Learning Bayes nets

[Figure: data x(1), …, x(m) is used to learn the structure and the parameters (the CPTs P(Xi | PaXi))]

– Known structure, fully observable data: very easy
– Known structure, missing data: somewhat easy (EM)
– Unknown structure, fully observable data: hard
– Unknown structure, missing data: very, very hard

Slide Credit: Carlos Guestrin
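A minimal sketch of the "very easy" cell: with known structure and fully observed binary data, maximum-likelihood CPT estimation is just counting. The toy data below is made up for illustration.

```python
# MLE for CPTs with known structure and fully observed binary data:
# P(Xi = 1 | pa) = count(Xi = 1, pa) / count(pa). Toy data, illustrative.
from collections import Counter

def fit_cpts(parents, data):
    cpts = {}
    for var, pa in parents.items():
        num, den = Counter(), Counter()
        for row in data:                       # row: {name: 0 or 1}
            key = tuple(row[p] for p in pa)
            den[key] += 1
            num[key] += row[var]
        cpts[var] = {k: num[k] / den[k] for k in den}
    return cpts

data = [{"Flu": 1, "Sinus": 1}, {"Flu": 0, "Sinus": 0},
        {"Flu": 1, "Sinus": 0}, {"Flu": 0, "Sinus": 0}]
print(fit_cpts({"Flu": [], "Sinus": ["Flu"]}, data))
# {'Flu': {(): 0.5}, 'Sinus': {(1,): 0.5, (0,): 0.0}}
```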