Probabilistic Graphical Models


SLIDE 1

Introduction to Machine Learning

Probabilistic Graphical Models

Yifeng Tao
School of Computer Science, Carnegie Mellon University

Slides adapted from Eric Xing, Matt Gormley

SLIDE 2

Recap of Basic Probability Concepts

  • Representation: how do we represent the joint probability distribution on multiple binary variables?
  • State configurations in total: 2^8 = 256
  • Do they all need to be represented?
  • Do we get any scientific/medical insight?
  • Learning: where do we get all these probabilities?
  • Maximum-likelihood estimation?
  • Inference: if not all variables are observable, how do we compute the conditional distribution of latent variables given evidence?
  • Computing p(H|A) would require summing over all 2^6 = 64 configurations of the unobserved variables


[Slide from Eric Xing.]

SLIDE 3

Graphical Model: Structure Simplifies Representation

  • Dependencies among variables


[Slide from Eric Xing.]

SLIDE 4

Probabilistic Graphical Models

  • If the Xi’s are conditionally independent (as described by a PGM), the joint can be factored into a product of simpler terms, e.g.,

P(X1, ..., X8) = P(X1) P(X2) P(X3|X1) P(X4|X2) P(X5|X2) P(X6|X3,X4) P(X7|X6) P(X8|X5,X6)

  • Why might we favor a PGM?
  • Incorporation of domain knowledge and causal (logical) structures
  • 2 + 2 + 4 + 4 + 4 + 8 + 4 + 8 = 36, about a 7-fold reduction from 2^8 = 256 in representation cost! (See the sketch below.)


[Slide from Eric Xing.]
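
As a quick check of the counting above, here is a minimal Python sketch (the factor list mirrors the eight-variable factorization on this slide; the variable names are illustrative) comparing the factored representation cost against the full joint table:

```python
# Table sizes for the factored joint of 8 binary variables:
# P(X1) P(X2) P(X3|X1) P(X4|X2) P(X5|X2) P(X6|X3,X4) P(X7|X6) P(X8|X5,X6)
# A factor over a node with k parents needs a table of 2**(1 + k) entries.

factors = [("X1", 0), ("X2", 0), ("X3", 1), ("X4", 1),
           ("X5", 1), ("X6", 2), ("X7", 1), ("X8", 2)]  # (node, #parents)

factored_cost = sum(2 ** (1 + n_parents) for _, n_parents in factors)
full_joint_cost = 2 ** 8

print(factored_cost)                    # 36
print(full_joint_cost)                  # 256
print(full_joint_cost / factored_cost)  # ~7.1-fold reduction
```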

SLIDE 5

Two types of GMs

  • Directed edges give causality relationships (Bayesian Network or Directed Graphical Model)
  • Undirected edges simply give correlations between variables (Markov Random Field or Undirected Graphical Model)


[Slide from Eric Xing.]

SLIDE 6

Bayesian Network

  • Definition:
  • It consists of a graph G and the conditional probabilities P
  • These two parts fully specify the distribution:
  • Qualitative Specification: G
  • Quantitative Specification: P


[Slide from Eric Xing.]

SLIDE 7

Where does the qualitative specification come from?

  • Prior knowledge of causal relationships
  • Learning from data (i.e. structure learning)
  • We simply prefer a certain architecture (e.g. a layered graph)


[Slide from Matt Gormley.]

SLIDE 8

Quantitative Specification

  • Example: Conditional probability tables (CPTs) for discrete random variables


[Slide from Eric Xing.]

SLIDE 9

Quantitative Specification

  • Example: Conditional probability density functions (CPDs) for continuous random variables


[Slide from Eric Xing.]

SLIDE 10

Observed Variables

  • In a graphical model, shaded nodes are “observed”, i.e. their values are given


[Slide from Matt Gormley.]

SLIDE 11

GMs are your old friends

  • Density estimation
  • Parametric and nonparametric methods
  • Regression
  • Linear, conditional mixture, nonparametric
  • Classification
  • Generative and discriminative approach
  • Clustering


[Slide from Eric Xing.]

SLIDE 12

What Independencies does a Bayes Net Model?

  • Independence of X and Z given Y?

P(X|Y) P(Z|Y) = P(X, Z|Y)

  • Three cases of interest...
  • Proof?


[Slide from Matt Gormley.]
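
To see the equality P(X|Y) P(Z|Y) = P(X, Z|Y) concretely, here is a minimal sketch that checks it by brute-force enumeration on a chain X → Y → Z (one of the three cases of interest); the CPT numbers below are made-up assumptions:

```python
import itertools

# Hypothetical CPTs for a binary chain X -> Y -> Z (numbers are made up).
p_x = {0: 0.6, 1: 0.4}
p_y_given_x = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}  # p_y_given_x[x][y]
p_z_given_y = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.4, 1: 0.6}}  # p_z_given_y[y][z]

def joint(x, y, z):
    # Joint factorizes along the chain: P(x) P(y|x) P(z|y).
    return p_x[x] * p_y_given_x[x][y] * p_z_given_y[y][z]

for y in (0, 1):
    p_y = sum(joint(x, y, z) for x, z in itertools.product((0, 1), repeat=2))
    for x, z in itertools.product((0, 1), repeat=2):
        p_xz_given_y = joint(x, y, z) / p_y
        p_x_given_y = sum(joint(x, y, zz) for zz in (0, 1)) / p_y
        p_z_given_y_val = sum(joint(xx, y, z) for xx in (0, 1)) / p_y
        assert abs(p_xz_given_y - p_x_given_y * p_z_given_y_val) < 1e-12

print("P(X,Z|Y) = P(X|Y) P(Z|Y) holds for the chain")
```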

SLIDE 13

The “Burglar Alarm” example

  • Your house has a twitchy burglar alarm that is also sometimes triggered by earthquakes.
  • Earth arguably doesn’t care whether your house is currently being burgled.
  • While you are on vacation, one of your neighbors calls and tells you your home’s burglar alarm is ringing.


[Slide from Matt Gormley.]
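
To make the story concrete, here is a small sketch that computes the posterior probability of a burglary given that the alarm is ringing, by summing the unobserved earthquake variable out of the joint. All CPT numbers are illustrative assumptions, not from the lecture:

```python
# Hypothetical CPTs for the burglar-alarm network B -> A <- E.
p_b, p_e = 0.001, 0.002              # P(Burglary = 1), P(Earthquake = 1)
p_a = {(0, 0): 0.001, (0, 1): 0.29,
       (1, 0): 0.94,  (1, 1): 0.95}  # P(Alarm = 1 | B, E)

def pr_b(b): return p_b if b else 1 - p_b
def pr_e(e): return p_e if e else 1 - p_e

# P(B = 1 | A = 1): Bayes' rule, marginalizing the unobserved E.
num = sum(pr_b(1) * pr_e(e) * p_a[(1, e)] for e in (0, 1))
den = sum(pr_b(b) * pr_e(e) * p_a[(b, e)] for b in (0, 1) for e in (0, 1))
print(num / den)  # ~0.374 with these made-up numbers
```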

SLIDE 14

Markov Blanket

  • Def: the co-parents of a node are the parents of its children
  • Def: the Markov blanket of a node is the set containing the node’s parents, children, and co-parents.
  • Thm: a node is conditionally independent of every other node in the graph given its Markov blanket
  • Example: the Markov blanket of X6 is {X3, X4, X5, X8, X9, X10}


[Slide from Matt Gormley.]
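
Here is a minimal sketch of the definition; the edge list is a hypothetical DAG chosen only so that the blanket of X6 matches the example above:

```python
# Markov blanket = parents ∪ children ∪ co-parents (other parents of children).
def markov_blanket(node, edges):
    parents = {u for (u, v) in edges if v == node}
    children = {v for (u, v) in edges if u == node}
    coparents = {u for (u, v) in edges if v in children and u != node}
    return parents | children | coparents

# Hypothetical DAG consistent with the slide's example (edges are assumptions).
edges = [("X3", "X6"), ("X4", "X6"),   # parents of X6
         ("X6", "X8"), ("X6", "X9"),   # children of X6
         ("X5", "X8"), ("X10", "X9"),  # co-parents via X8 and X9
         ("X1", "X3"), ("X2", "X4")]   # unrelated structure

print(sorted(markov_blanket("X6", edges)))
# ['X10', 'X3', 'X4', 'X5', 'X8', 'X9']
```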

SLIDE 15

Markov Blanket

  • Example: the Markov blanket of X6 is {X3, X4, X5, X8, X9, X10}


[Slide from Matt Gormley.]

SLIDE 16

D-Separation

  • Thm: if variables X and Z are d-separated given a set of variables E, then X and Z are conditionally independent given the set E
  • Definition: variables X and Z are d-separated given a set of evidence variables E iff every path from X to Z is “blocked”.


[Slide from Matt Gormley.]

SLIDE 17

D-Separation

  • Variables X and Z are d-separated given a set of evidence variables E iff every path from X to Z is “blocked”.


[Slide from Eric Xing.]

SLIDE 18

Machine Learning


[Slide from Matt Gormley.]

SLIDE 19

Recipe for Closed-form MLE


[Slide from Matt Gormley.]

SLIDE 20

Learning Fully Observed BNs

  • How do we learn these conditional and marginal distributions for a Bayes net?


[Slide from Matt Gormley.]

SLIDE 21

Learning Fully Observed BNs

  • Learning this fully observed Bayesian network is equivalent to learning five (small/simple) independent networks from the same data


[Slide from Matt Gormley.]
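
For a fully observed network, the closed-form MLE of each CPT is just normalized counts, estimated independently per node. A minimal sketch (the toy samples and the X1 → X3 fragment are assumptions for illustration):

```python
from collections import Counter

# MLE for one CPT P(child | parents) from fully observed samples.
# Each sample is a dict mapping variable name -> observed value.
def mle_cpt(samples, child, parents):
    joint_counts = Counter((tuple(s[p] for p in parents), s[child]) for s in samples)
    parent_counts = Counter(tuple(s[p] for p in parents) for s in samples)
    return {(pa, c): n / parent_counts[pa] for (pa, c), n in joint_counts.items()}

# Toy fully observed data for a network fragment X1 -> X3 (values made up).
samples = [{"X1": 0, "X3": 0}, {"X1": 0, "X3": 1},
           {"X1": 1, "X3": 1}, {"X1": 1, "X3": 1}]
print(mle_cpt(samples, "X3", ["X1"]))
# {((0,), 0): 0.5, ((0,), 1): 0.5, ((1,), 1): 1.0}
```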

SLIDE 22

Learning Fully Observed BNs


[Slide from Matt Gormley.]

SLIDE 23

Learning Partially Observed BNs

  • Partially Observed Bayesian Network:
  • Maximum-likelihood estimation → incomplete log-likelihood
  • The log-likelihood contains unobserved latent variables
  • Solve with EM algorithm
  • Example: Gaussian Mixture Models (GMMs)


[Slide from Eric Xing.]
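
A bare-bones EM sketch for a one-dimensional, two-component GMM, showing where the E-step (responsibilities) and M-step (weighted MLE updates) live; the data and initializations are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D data from two Gaussian clusters (generated here for illustration).
x = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(3.0, 1.0, 200)])

# Initial parameters: mixing weights, means, variances (assumed starting values).
weights = np.array([0.5, 0.5])
mu = np.array([-1.0, 1.0])
var = np.array([1.0, 1.0])

for _ in range(50):
    # E-step: responsibilities = posterior P(component | x_i) under current params.
    dens = weights * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    resp = dens / dens.sum(axis=1, keepdims=True)
    # M-step: weighted maximum-likelihood updates given the responsibilities.
    nk = resp.sum(axis=0)
    weights = nk / len(x)
    mu = (resp * x[:, None]).sum(axis=0) / nk
    var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk

print(weights, mu, var)  # should land near 0.5/0.5, with means near -2 and 3
```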

SLIDE 24

Inference of BNs

  • Suppose we already have the parameters of a Bayesian Network...


[Slide from Matt Gormley.]

SLIDE 25

Approaches to inference

  • Exact inference algorithms
  • The elimination algorithm → message passing
  • Belief propagation
  • The junction tree algorithms
  • Approximate inference techniques
  • Variational algorithms
  • Stochastic simulation / sampling methods
  • Markov chain Monte Carlo methods


[Slide from Eric Xing.]

SLIDE 26

Marginalization and Elimination


[Slide from Eric Xing.]

SLIDE 27

Marginalization and Elimination


[Slide from Eric Xing.]

SLIDE 28


[Slide from Eric Xing.]

SLIDE 29
  • Step 8: Wrap-up


[Slide from Eric Xing.]

SLIDE 30

Elimination algorithm

  • Elimination on trees is equivalent to message passing on branches
  • Message-passing is consistent in trees
  • Application: HMM


[Slide from Eric Xing.]
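
To ground the equivalence, here is a minimal sketch of elimination on a binary chain X1 → X2 → X3 → X4: summing out each variable in order is exactly a message (vector) multiplied into the next CPT (matrix). The transition tables are made-up assumptions, and a brute-force enumeration confirms the result:

```python
import itertools
import numpy as np

# Chain X1 -> X2 -> X3 -> X4 over binary variables.
# p0 is P(X1); each T[i] is the CPT P(X_{i+1} | X_i) as a row-stochastic matrix.
p0 = np.array([0.7, 0.3])
T = [np.array([[0.9, 0.1], [0.4, 0.6]]),
     np.array([[0.8, 0.2], [0.3, 0.7]]),
     np.array([[0.6, 0.4], [0.5, 0.5]])]

# Eliminate X1, then X2, then X3: each step sums out one variable,
# i.e., a message (vector) times a CPT (matrix).
message = p0
for cpt in T:
    message = message @ cpt  # sum_x message[x] * P(next | x)
print(message)  # marginal P(X4)

# Brute-force check: sum the full joint over all upstream configurations.
p4 = np.zeros(2)
for x1, x2, x3, x4 in itertools.product((0, 1), repeat=4):
    p4[x4] += p0[x1] * T[0][x1, x2] * T[1][x2, x3] * T[2][x3, x4]
print(p4)       # matches the elimination result
```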

SLIDE 31

Gibbs Sampling


[Slide from Matt Gormley.]

SLIDE 32

Gibbs Sampling


[Slide from Matt Gormley.]

SLIDE 33

Gibbs Sampling


[Slide from Matt Gormley.]

SLIDE 34

Gibbs Sampling

  • Full conditionals only need to condition on the Markov blanket
  • Must be “easy” to sample from the conditionals
  • Many conditionals are log-concave and are amenable to adaptive rejection sampling


[Slide from Matt Gormley.]
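
A minimal Gibbs sampler for the burglar-alarm network sketched earlier (same hypothetical CPTs): with the alarm clamped on, each full conditional involves only that variable's Markov blanket, and the sample average estimates the posterior probability of burglary:

```python
import random

random.seed(0)

# Same hypothetical CPTs as in the burglar-alarm sketch above.
p_b, p_e = 0.001, 0.002
p_a = {(0, 0): 0.001, (0, 1): 0.29, (1, 0): 0.94, (1, 1): 0.95}

def bernoulli(p):
    return 1 if random.random() < p else 0

# Estimate P(B = 1 | A = 1) by Gibbs sampling over B and E with A clamped to 1.
b, e = 0, 0
count_b, n_iters, burn_in = 0, 50_000, 1_000
for t in range(n_iters):
    # Full conditional of B given its Markov blanket {A, E}:
    # P(B = 1 | A = 1, e) ∝ P(B = 1) * P(A = 1 | B = 1, e)
    w1 = p_b * p_a[(1, e)]
    w0 = (1 - p_b) * p_a[(0, e)]
    b = bernoulli(w1 / (w1 + w0))
    # Full conditional of E given its Markov blanket {A, B}:
    w1 = p_e * p_a[(b, 1)]
    w0 = (1 - p_e) * p_a[(b, 0)]
    e = bernoulli(w1 / (w1 + w0))
    if t >= burn_in:
        count_b += b

print(count_b / (n_iters - burn_in))  # ~ the exact posterior from the enumeration sketch
```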

SLIDE 35

Take home message

  • Graphical models portray the sparse dependencies among variables
  • Two types of graphical models: Bayesian networks and Markov random fields

  • Conditional independence, Markov blanket, and d-separation
  • Learning fully observed and partially observed Bayesian networks
  • Exact inference and approximate inference of Bayesian networks


SLIDE 36

References

  • Eric Xing, Ziv Bar-Joseph. 10701 Introduction to Machine Learning: http://www.cs.cmu.edu/~epxing/Class/10701/
  • Matt Gormley. 10601 Introduction to Machine Learning: http://www.cs.cmu.edu/~mgormley/courses/10601/index.html
