Markov Random Fields: Inference and Estimation


SLIDE 1

Markov Random Fields: Inference and Estimation

SPiNCOM reading group, April 24th, 2017

Dimitris Berberidis

Ack: Juan-Andres Bazerque

SLIDE 2

Probabilistic graphical models

• Set of random variables
  • Graph represents the joint pdf
  • Nodes correspond to random variables
  • Edges imply relations between rv's

• Key idea: the graph models conditional independencies

• Two main tasks: inference and estimation
  • Inference: given observed variables, obtain the (marginal) conditionals of the unobserved ones
  • Estimation: given samples, estimate the model parameters (and thus the graph)

• Some applications
  • Speech recognition, computer vision
  • Decoding
  • Gene regulatory networks, disease diagnosis

SLIDE 3

Roadmap

• Markov Random Fields

• Continuous-valued MRFs
  • Inference using the Harmonic solution
  • Structure estimation through $\ell_1$-penalized MLE

• Binary-valued MRFs (Ising model)
  • Inference
    • Gaussian approximation – random walk interpretation
    • MCMC
  • Structure estimation
    • Pseudo-MLE
    • Logistic regression

• Conclusions

• Bayesian networks basics

SLIDE 4

Directed Acyclic GMs (Bayesian networks)

• Joint pdf modeled as a product of conditionals: $p(x_1,\dots,x_N) = \prod_i p(x_i \mid \mathrm{pa}(x_i))$

• Ordered Markov property: a node is conditionally independent of its predecessors given its parents

• Markov "blanket" of a node: parents + children + co-parents

• Examples (factorizations below): complete independence, arbitrary dependence, naive Bayes, Markov chain, 2nd-order Markov chain, hidden Markov model
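
As a concrete reference, the factorizations of the named examples (standard forms, not recovered from the slide images; $z_t$ denotes the hidden HMM states):

```latex
% Naive Bayes: class y generates features independently
p(y, x_1, \dots, x_N) = p(y) \prod_{i=1}^{N} p(x_i \mid y)

% Markov chain (1st order) and 2nd-order Markov chain
p(x_{1:N}) = p(x_1) \prod_{t=2}^{N} p(x_t \mid x_{t-1}),
\qquad
p(x_{1:N}) = p(x_1)\, p(x_2 \mid x_1) \prod_{t=3}^{N} p(x_t \mid x_{t-1}, x_{t-2})

% Hidden Markov model: hidden chain z, observations x
p(x_{1:N}, z_{1:N}) = p(z_1) \prod_{t=2}^{N} p(z_t \mid z_{t-1}) \prod_{t=1}^{N} p(x_t \mid z_t)
```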

SLIDE 5

Basic building blocks of Bayesian nets

• The chain structure: $x \to y \to z$, where $x \perp z \mid y$

• The tent structure: $x \leftarrow y \to z$, where $x \perp z \mid y$

• The V structure: $x \to y \leftarrow z$, where $x \perp z$ marginally but NOT given $y$: Berkson's paradox ("explaining away", numeric example below)

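A minimal numeric sketch of explaining away (all numbers assumed for illustration): two independent binary causes with a deterministic-OR effect; observing one cause lowers the posterior of the other.

```python
import itertools

# Assumed priors for two independent binary causes a, b; effect y = OR(a, b).
p_a, p_b = 0.3, 0.3

# Enumerate the joint p(a, b, y).
joint = {}
for a, b in itertools.product([0, 1], repeat=2):
    y = a | b
    joint[(a, b, y)] = (p_a if a else 1 - p_a) * (p_b if b else 1 - p_b)

def p_a_given(b=None, y=None):
    """P(a=1 | evidence) by summing the enumerated joint."""
    num = den = 0.0
    for (a_, b_, y_), p in joint.items():
        if (b is not None and b_ != b) or (y is not None and y_ != y):
            continue
        den += p
        num += p * (a_ == 1)
    return num / den

print(p_a_given(y=1))       # ~0.588: observing y=1 raises belief in a
print(p_a_given(y=1, b=1))  # 0.3: b=1 "explains away" y, belief in a drops back
```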
SLIDE 6

Undirected GMs (Markov random fields)

• More natural in some domains (e.g. spatial statistics, relational data)

• Joint pdf parametrized and modeled as a product of factors (not conditionals)
  • Each factor or potential corresponds to a maximal clique
  • Simple rule: nodes not connected with an edge are conditionally independent given the rest

• Hammersley-Clifford theorem
  • A strictly positive $p(\mathbf{x})$ satisfies the CI properties of an undirected graph iff it factorizes as $p(\mathbf{x}) = \frac{1}{Z} \prod_{c \in \mathcal{C}} \psi_c(\mathbf{x}_c)$, where $\mathcal{C}$ is the set of maximal cliques

• Partition function $Z = \sum_{\mathbf{x}} \prod_{c \in \mathcal{C}} \psi_c(\mathbf{x}_c)$: generally NP-hard to compute (brute-force sketch below)
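
Why $Z$ is the bottleneck, in a minimal sketch (toy potentials on a made-up 4-cycle): brute force sums over all $2^N$ configurations, which is exponential in the number of nodes.

```python
import itertools
import numpy as np

# Toy pairwise MRF on a 4-cycle with binary nodes {0,1}.
# psi[(i,j)][xi, xj] is an assumed edge potential (random, for illustration).
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
rng = np.random.default_rng(0)
psi = {e: rng.uniform(0.5, 2.0, size=(2, 2)) for e in edges}

def unnormalized(x):
    """Product of edge potentials for one configuration x."""
    p = 1.0
    for i, j in edges:
        p *= psi[(i, j)][x[i], x[j]]
    return p

# Partition function: sum over ALL 2^N configurations.
N = 4
Z = sum(unnormalized(x) for x in itertools.product([0, 1], repeat=N))
print(f"Z = {Z:.4f} after enumerating {2 ** N} configurations")
```
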
SLIDE 7

Equivalence of DGMs and UGMs

• Moralization: transition from a directed to an undirected GM
  • Drop directionality and connect "unmarried" parents
  • Information may be lost during the transition (see example): an independence is lost due to the added moralization edge

[Figure: example graphs, one that cannot be represented by DAGs and one that cannot be represented by UGMs]

SLIDE 8

MRFs with energy functions

• Clique potentials usually represented using an "energy" function: $\psi_c(\mathbf{x}_c) = \exp(-E_c(\mathbf{x}_c))$
  • High-probability states correspond to low-energy configurations

• Any MRF can be decomposed into pairwise potentials (and energy functions)

• Joint (Gibbs distribution): $p(\mathbf{x}) = \frac{1}{Z} \exp\big( -\sum_{(i,j)} E_{ij}(x_i, x_j) \big)$

• The MRF is associative if $E_{ij}$ measures the difference between $x_i$ and $x_j$, so that agreeing neighbors have low energy
  • Gaussian MRF: $E_{ij}(x_i, x_j) = \frac{1}{2} w_{ij} (x_i - x_j)^2$
  • Ising (binary $\pm 1$) model: $E_{ij}(x_i, x_j) = -w_{ij} x_i x_j$
SLIDE 9

Gaussian MRFs

• Joint Gaussian fully parametrized by covariance and mean: $\mathbf{x} \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$

• GMRF structure given by the precision matrix (inverse covariance) $\mathbf{Q} = \boldsymbol{\Sigma}^{-1}$: $Q_{ij} = 0$ iff $x_i$ and $x_j$ are conditionally independent given the rest
  • $\mathbf{Q}$ can also be viewed as the Laplacian of the graph

• Inference: given known $\mathbf{Q}$ and the observed subset $\mathbf{x}_o$, find the conditional of the unobserved $\mathbf{x}_u$

• Assume for simplicity (and wlog) that $\boldsymbol{\mu} = \mathbf{0}$

SLIDE 10

Inference via Harmonic solution

• Negative log-likelihood of the joint (up to constants): $\frac{1}{2} \mathbf{x}^\top \mathbf{Q} \mathbf{x}$, with $\mathbf{x}$ and $\mathbf{Q}$ partitioned into unobserved/observed blocks $\mathbf{Q}_{uu}, \mathbf{Q}_{uo}, \mathbf{Q}_{ou}, \mathbf{Q}_{oo}$

• Minimizing over $\mathbf{x}_u$ finally gives the "Harmonic" solution: $\hat{\mathbf{x}}_u = \mathbb{E}[\mathbf{x}_u \mid \mathbf{x}_o] = -\mathbf{Q}_{uu}^{-1} \mathbf{Q}_{uo}\, \mathbf{x}_o$ (sketch below)

• The conditional mean of $\mathbf{x}_u$ contains all the information from the observed $\mathbf{x}_o$; it is "harmonic" because each unobserved entry equals a weighted average of its neighbors

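A minimal NumPy sketch of the harmonic solution on an assumed toy chain (weights and observed values made up for illustration):

```python
import numpy as np

# Weighted adjacency of a 5-node chain; the Laplacian plays the role of Q.
W = np.zeros((5, 5))
for i in range(4):
    W[i, i + 1] = W[i + 1, i] = 1.0
Q = np.diag(W.sum(axis=1)) - W  # graph Laplacian

obs = np.array([0, 4])         # observed node indices
unk = np.array([1, 2, 3])      # unobserved node indices
x_o = np.array([-1.0, 1.0])    # observed values

# Harmonic solution: x_u = -Q_uu^{-1} Q_uo x_o
Q_uu = Q[np.ix_(unk, unk)]
Q_uo = Q[np.ix_(unk, obs)]
x_u = -np.linalg.solve(Q_uu, Q_uo @ x_o)
print(x_u)  # [-0.5, 0.0, 0.5]: each entry is the average of its neighbors
```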
SLIDE 11

GMRF structure estimation via maximum likelihood

• Given i.i.d. samples, the goal is to estimate $\mathbf{Q}$ (and thus the graph)

• Log-likelihood (up to constants): $\mathcal{L}(\mathbf{Q}) = \log\det\mathbf{Q} - \mathrm{tr}(\hat{\mathbf{S}}\mathbf{Q})$, where $\hat{\mathbf{S}}$ is the sample covariance

SLIDE 12

$\ell_1$-penalized MLE of $\mathbf{Q}$

• Closed-form ML solution: $\hat{\mathbf{Q}} = \hat{\mathbf{S}}^{-1}$, which generally is a full matrix

• Idea: add a constraint on $\mathbf{Q}$ to enforce (sparse) graph structure: $\hat{\mathbf{Q}} = \arg\max_{\mathbf{Q} \succ 0}\ \log\det\mathbf{Q} - \mathrm{tr}(\hat{\mathbf{S}}\mathbf{Q}) - \lambda \|\mathbf{Q}\|_1$

• The problem is convex and for $\lambda > 0$ is solvable via the Graphical Lasso (sketch below)

  • O. Banerjee, L. El Ghaoui, and A. d'Aspremont, "Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data," J. Machine Learning Research, vol. 9, pp. 485-516, June 2008.

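A sketch using scikit-learn's GraphicalLasso on synthetic data (the ground-truth chain GMRF and the value of alpha are assumed for illustration):

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

# Ground-truth sparse precision: a 4-node chain GMRF.
Q_true = np.array([[ 2., -1.,  0.,  0.],
                   [-1.,  2., -1.,  0.],
                   [ 0., -1.,  2., -1.],
                   [ 0.,  0., -1.,  2.]])
rng = np.random.default_rng(0)
X = rng.multivariate_normal(np.zeros(4), np.linalg.inv(Q_true), size=2000)

# alpha plays the role of lambda; the value here is ad hoc.
model = GraphicalLasso(alpha=0.05).fit(X)
print(np.round(model.precision_, 2))  # off-chain entries shrink toward zero
```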
SLIDE 13

Binary random variables

• Ising model for $x_i \in \{-1, +1\}$ (or $\{0, 1\}$): $p(\mathbf{x}) = \exp\big( \sum_{(i,j)} \theta_{ij} x_i x_j + \sum_i \theta_i x_i - A(\boldsymbol{\theta}) \big)$

• Log partition function: $A(\boldsymbol{\theta}) = \log \sum_{\mathbf{x} \in \{-1,+1\}^N} \exp\big( \sum_{(i,j)} \theta_{ij} x_i x_j + \sum_i \theta_i x_i \big)$

• Problem: $A(\boldsymbol{\theta})$ is combinatorially complex to compute

• Similar problem for inference: the conditionals can only be approximated

• Estimation: $\ell_1$-penalized maximum likelihood for $\boldsymbol{\theta}$
  • Two alternatives: $A(\boldsymbol{\theta})$ is upper-bounded or avoided

SLIDE 14

The role of the edge parameters $\theta_{ij}$ in the Ising model

• Claim: $\theta_{ij}$ controls how strongly neighboring $x_i$ and $x_j$ tend to agree

• Proof: consider the configurations $x_i = x_j$ and $x_i \neq x_j$, use the Ising model, and plug in the expression above

SLIDE 15

Example: Image segmentation

• Use a 2-D HMM (Ising model as the hidden layer) to infer the "meaning" of image pixels

[Figure: observed image and hidden layer of pixel classes (water, sky, etc.)]

SLIDE 16

Inference via Gaussian field approximation

• Exact inference is NP-hard

• Use a surrogate continuous-valued Gaussian random field
  • Compute the exact Harmonic solution: $\hat{\mathbf{x}}_u = -\mathbf{Q}_{uu}^{-1} \mathbf{Q}_{uo}\, \mathbf{x}_o$
  • Approximation of the marginal posteriors
  • Predictor of the unknown labels via the GMRF mean

• Random walk interpretation (sketch below)
  • Imagine a particle performing a random walk on the (unobserved part of the) graph
  • Let the normalized Laplacian be the transition probability matrix
  • Observed variables act as sink nodes where the walk ends
  • Starting from node $i$, the probability that the walk ends at a $+1$ node recovers the harmonic solution at node $i$
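
A small numeric check of the random-walk interpretation (toy chain, labels mapped to 0/1 for readability): the absorption probabilities of the walk coincide with the harmonic solution.

```python
import numpy as np

# Same 5-node chain as before; nodes 0 and 4 observed with labels 0 and 1.
W = np.zeros((5, 5))
for i in range(4):
    W[i, i + 1] = W[i + 1, i] = 1.0
P = W / W.sum(axis=1, keepdims=True)  # random-walk transition matrix

obs, unk = np.array([0, 4]), np.array([1, 2, 3])
labels = np.array([0.0, 1.0])         # observed nodes are absorbing sinks

# Probability of being absorbed at the label-1 node:
# h_u = (I - P_uu)^{-1} P_uo labels
P_uu = P[np.ix_(unk, unk)]
P_uo = P[np.ix_(unk, obs)]
h = np.linalg.solve(np.eye(len(unk)) - P_uu, P_uo @ labels)
print(h)  # [0.25, 0.5, 0.75], identical to the harmonic solution with 0/1 labels
```
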
SLIDE 17

Inference via MCMC

• Gibbs sampler: one variable (node) sampled at every round $t$ (the rest are fixed); see the sketch below
  • Exploits the (sparse) conditional dependence structure of the MRF
  • Observed nodes used as (fixed) boundary conditions

• Collect samples from the Markov chain that has $p(\mathbf{x})$ as its stationary distribution

• More sophisticated MCMC methods achieve faster mixing (e.g. Wolff's algorithm)

• Experiments indicate Gibbs sampling offers better inference than the Gaussian approximation in rectangular Ising models

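A minimal Gibbs sampler for an Ising model on a small grid (the coupling strength and the clamped boundary are assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, theta = 10, 0.4                    # grid side, uniform coupling
x = rng.choice([-1, 1], size=(n, n))  # random initial configuration
x[0, :], x[-1, :] = +1, -1            # observed rows: fixed boundary conditions

def neighbor_sum(x, i, j):
    s = 0
    for di, dj in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
        ii, jj = i + di, j + dj
        if 0 <= ii < n and 0 <= jj < n:
            s += x[ii, jj]
    return s

samples = []
for t in range(2000):
    for i in range(1, n - 1):         # skip the clamped rows
        for j in range(n):
            # p(x_ij = +1 | rest) is logistic in the neighbor sum
            p_plus = 1.0 / (1.0 + np.exp(-2.0 * theta * neighbor_sum(x, i, j)))
            x[i, j] = 1 if rng.random() < p_plus else -1
    if t >= 1000:                     # discard burn-in, then collect
        samples.append(x.copy())

marginals = np.mean(samples, axis=0)  # approximate posterior means E[x_ij | boundary]
print(np.round(marginals[:, 0], 2))   # first column interpolates from +1 to -1
```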
SLIDE 18

Towards estimation: Bounding the partition function

• Goal: find an upper bound on $A(\boldsymbol{\theta})$ computable with polynomial complexity

  • L. El Ghaoui and A. Gueye, "A Convex Upper Bound on the Log-Partition Function for Binary Graphical Models," Journal of Machine Learning Research, vol. 9, pp. 485-516, Mar. 2008.

• Consider a partition of the variables; computing the resulting bound exactly is still hard

SLIDE 19

Relaxation of the bound

• Upper-bound the log partition function

• Relax the problem
  • Add redundant constraints
  • Relax again

• Claim: a guarantee on the quality of the bound

SLIDE 20

Pseudo Maximum Likelihood

• Want to solve the penalized ML problem while avoiding $A(\boldsymbol{\theta})$ (standard objective below)

• Form the dual

• Substitute the dual above

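The slide's equations did not survive extraction; for reference, the standard pseudo-(log-)likelihood objective that sidesteps the intractable partition function (standard definition, not recovered from the slide):

```latex
% Replace the joint log-likelihood, whose normalizer A(theta) is intractable,
% by a sum of node-conditional log-likelihoods, each cheap to normalize:
\mathrm{PL}(\boldsymbol{\theta})
  = \sum_{n=1}^{T} \sum_{i=1}^{N}
    \log p\!\left( x_i^{(n)} \,\middle|\, \mathbf{x}_{-i}^{(n)} ; \boldsymbol{\theta} \right)
```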
SLIDE 21

Logistic regression for Ising model estimation

• Goal: estimate $\boldsymbol{\theta}$ while avoiding computation of $A(\boldsymbol{\theta})$

• Idea: consider a node and its connections
  • Separate node $i$ from the rest
  • Use $\mathbf{x}_{-i}$ as input and $x_i$ as output
  • Logistic regression gives a parametric estimate of $p(x_i \mid \mathbf{x}_{-i})$
  • The neighborhood of node $i$ is estimated as a byproduct

• Problem statement: re-write the problem below for the Ising model (sketch after this slide)

  • P. Ravikumar, M. J. Wainwright, and J. Lafferty, "High-dimensional Ising model selection using $\ell_1$-regularized logistic regression," Annals of Statistics, 2010. Available at http://www.eecs.berkeley.edu

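A sketch of per-node neighborhood selection with $\ell_1$-regularized logistic regression (synthetic data; the chain graph, coupling value, and regularization level are all assumed):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
N, T, theta = 6, 5000, 0.5
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]  # true chain graph
Theta = np.zeros((N, N))
for i, j in edges:
    Theta[i, j] = Theta[j, i] = theta

# Draw (approximate) Ising samples with simple Gibbs sweeps.
x = rng.choice([-1.0, 1.0], size=N)
X = np.empty((T, N))
for t in range(T):
    for i in range(N):
        p = 1.0 / (1.0 + np.exp(-2.0 * Theta[i] @ x))  # p(x_i = +1 | rest)
        x[i] = 1.0 if rng.random() < p else -1.0
    X[t] = x

# Regress x_i on x_{-i} with an l1 penalty; nonzero weights flag neighbors.
i = 2
mask = np.arange(N) != i
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
clf.fit(X[:, mask], X[:, i])
print(np.round(clf.coef_.ravel(), 2))  # large weights only at true neighbors 1 and 3
```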
SLIDE 22

Estimation of $\boldsymbol{\theta}$ via logistic regression

• We have: $p(x_i \mid \mathbf{x}_{-i}) = \dfrac{\exp\big( x_i (\theta_i + \sum_{j \neq i} \theta_{ij} x_j) \big)}{\exp\big( \theta_i + \sum_{j \neq i} \theta_{ij} x_j \big) + \exp\big( -\theta_i - \sum_{j \neq i} \theta_{ij} x_j \big)}$

• Taking the logarithm: $\log p(x_i \mid \mathbf{x}_{-i}) = -\log\big( 1 + \exp( -2 x_i (\theta_i + \sum_{j \neq i} \theta_{ij} x_j) ) \big)$, i.e. a logistic log-likelihood

• Substituting the log-likelihood yields an $\ell_1$-regularized logistic regression

• Convex problem

SLIDE 23

Conclusions

• Graphical models
  • Modeling pdfs using conditional dependencies
  • Undirected models (MRFs) naturally modeled by graphs
  • Inference in closed form for Gaussian MRFs
  • Estimation of GMRFs as a Laplacian-fitting problem
  • Inference and estimation approximations for binary MRFs (Ising model)

• Possible research directions
  • Active sampling on binary MRFs using MCMC
  • Active sampling for MRF structure estimation