Markov random fields Help from David Bolin, Johan Lindstrm and Finn - PDF document

Markov random fields Help from David Bolin, Johan Lindström and Finn Lindgren

Julian Besag 1945-2010 Håvard Rue 1965- Finn Lindgren 1973-

The big N problem Log likelihood: ℓ ( σ , θ ) = − n 2 log(2 πσ ) − 1 2 logdet Σ ( θ ) + 1 2 σ (Z − µ ( θ ) T Σ ( θ ) − 1 (Z − µ ( θ )) Prediction: − 1 (Z − µ Z ) E(Z(s 0 ) Z 1 ,...,Z N , ˆ θ ) = µ 0 + Σ 0,Z Σ Z,Z Covariance has O(N 2 ) unique elements Inverse and determinant take O(N 3 ) operations

The Markov property Discrete time: (X k X k − 1 ,X k − 2 ,...) = (X k X k − 1 ) A time symmetric version: X − k ) = (X k X k − 1 ,X k + 1 ) (X k ! A more general version: Let A be a set of indices >k, B a set of indices <k. Then X A ⊥ X B X k These are all equivalent.

On a spatial grid Let δ i be the neighbors of the location i. The Markov assumption is Z − i = ! z − i ) = P(Z i = z i Z δ i = z δ i ) P(Z i = z i ! = p i (z i z δ i ) i ∉δ j Z − i,j Z i ⊥ Z j ! Equivalently for The p i are called local characteristics . They are stationary if p i = p. A potential assigns a number V A (z) to every subconfiguration z A of a configuration z. (There are lots of them!)

Graphical models Neighbors are nodes connected with edges. 2 4 1 5 3 Given 2, 1 and 4 are independent.

Gibbs measure The energy U corresponding to a ∑ U(z) = V A (z) potential V is . A The corresponding Gibbs measure is P(z) = exp( − U(z)) C ∑ C = exp( − U(z)) where z is called the partition function .

Nearest neighbor potentials A set of points is a clique if all its members are neighbours. A potential is a nearest neighbor potential if V A (z)=0 whenever A is not a clique.

Markov random field Any nearest neighbor potential induces a Markov random field: ∑ exp( − V C ( ! z)) P( ! z) z − i ) = = C clique p i (z i ! ∑ ∑ ∑ exp( − P( ! z') V C ( ! z')) z' z' C clique where z’ agrees with z except possibly at i, so V C (z)=V C (z ’ ) for any C not including i.

The Hammersley-Clifford theorem Assume P(z)>0 for all z. Then P is a MRF on a (finite) graph with respect to a neighbourhood system Δ iff P is a Gibbs measure corresponding to a nearest neighbour potential. Does a given nn potential correspond to a unique P? Peter Clifford 1944- John Hammersley 1920-2004

The Ising model Model for ferromagnetic spin (values +1 or -1). Stationary nn pair potential V(i,j)=V(j,i); V(i,i)=V(0,0)=v 0 ; V(0,e N )=V(0,e E )=v 1 . Z − i = ! logit P(Z i = 1 ! z − i ) = − (v 0 + v 1 (z i + e N + z i − e N + z i + e E + z i − e E )) L(v) = exp(t 0 v 0 + t 1 v 1 ) so where C(v) ∑ ∑ ∑ t 0 = t 1 = z i ; z i z j i j~i

Interpretation v 0 is related to the external magnetic field (if it is strong the field will tend to have the same sign as the external field) v 1 corresponds to inverse temperature (in Kelvins), so will be large for low temperatures. Rudolf Peierls 1907-1995 Ernst Ising 1900-1998

Phase transition At very low temperature there is a tendency for spontaneous magnetization. For the Ising model, the boundary conditions can affect the distribution of x 0 . In fact, there is a critical temperature (or value of v 1 ) such that for temperatures below this value, the boundary conditions are felt. Thus there can be different probabilities at the origin depending on the values on an arbitrary distant boundary!

Simulated Ising fields

The auto-models Let Q(x)=log(P(x)/P(0)). Besag ’ s auto- models are defined by n n ∑ ∑ ∑ z) = + β ij z i z j Q( ! z i G i (z i ) i = 1 i = 1 j~i z i ∈ {0,1 } When and G i (z i )= α i we get the autologistic model ( ) G i (z i ) = α i z i − log z i ! When and β ij ≤ 0 we get the auto-Poisson model

Pseudolikelihood Another approximate approach is to write down a function of the data which p i (x δ i ) is the product of the , I.e., acting as if the neighborhoods of each point were independent. This as an estimating equation, but not an optimal one. In fact, in cases of high dependence it tends to be biased.

Recall the Gaussian formula ⎛ ⎞ ⎛ ⎞ ⎛ ⎞ ⎛ ⎞ ⎟ ~ N µ X ⎟ , Σ XX Σ XY X ⎜ ⎟ ⎜ ⎜ ⎜ ⎟ If µ Y Σ YX Σ YY ⎝ ⎠ ⎝ ⎠ ⎝ ⎠ ⎝ ⎠ Y − 1 ( X − µ X ), ( Y | X ) ~ N ( µ Y + Σ YX Σ XX then − 1 Σ XY ) Σ YY − Σ YX Σ XX Q = Σ − 1 Let be the precision matrix . Then the conditional precision matrix is − 1 = Q YY ( ) − 1 Σ XY Σ YY − Σ YX Σ XX

Gaussian MRFs Z − i,j Z i ⊥ Z j ! We want a setup in which whenever i and j are not neighbors. Using the Gaussian formula we see that the condition is met iff Q ij = 0. Typically the precision matrix of a GMRF is sparse where the covariance is not. This allows fast computation of likelihoods, simulation etc.

An AR(1) process X t X t − 1 = φ X t − 1 + ε t Let . The lag k autocorrelation is φ |k| . The precision matrix has Q ij = φ if |i-j|=1, Q 11 =Q nn =1 and Q ii =1+ φ 2 elsewhere, all other 0. Thus Σ has N 2 non-zero elements, while Q has N+2(N-1)=3N-2 non-zero elements. Using the Gaussian formula we see that φ Q i − i = 1 + φ 2 µ i − i = 1 + φ 2 (x i − 1 + x i + 1 )

Conditional autoregression Suppose that ∑ Z − i ~ N( µ i + − 1 ) β ij (x j − µ j ) , κ i Z i ! i ≠ j This is called a Gaussian conditional autoregressive (CAR) model. WLOG κ i β ij = κ j β ji µ i =0. If also these conditional distributions correspond to a multivariate joint Gaussian distribution, mean 0 and precision Q with Q ii = κ i and Q ij = - κ i β ij , provided Q is positive definite. If the β ij are nonzero only when i~j we have a GMRF.

Likelihood calculation   The Cholesky decomposition of a pd square matrix A is a lower triangular matrix L such that A=LL T . To solve Ay = b first solve Lv = b (forward substitution), then L T y = v (backward substitution). If a precision matrix Q = LL T , ( ) ∑ log L i,i log det(Q) = 2 . The quadratic form in the likelihood is w T u where u=Qw and w=(z- µ ). Note that ∑ u i = Q i,i w i + Q i,j w j j:j~i

Simulation Let x ~ N(0,I), solve L T v = x and set z = µ + v. Then E(z) = µ and Var(z) = (L T ) -1 IL -1 = (LL T ) -1 = Q -1 .

Spatial covariance Whittle (1963) noted that the solution to the stochastic differential equation ( κ + 1)/2 ⎛ ⎞ Δ − 1 Z(s) = W(s) ⎜ ⎟ φ 2 ⎝ ⎠ has covariance function κ ⎛ ⎞ ⎛ ⎞ h h K κ Cov(Z(s),Z(s + h)) ∝ ⎜ ⎟ ⎜ ⎟ φ φ ⎝ ⎠ ⎝ ⎠ Rue and Tjelmeland (2003) show that one can approximate a Gaussian random field on a grid by a GMRF.

Solution ∑ Z(s) = ψ k (s)w k Write k where ψ k (s) are piecewise linear on a (possibly nonregular) grid and w k are appropriately chosen normal random variables. A i i = ( ψ 1 (s i ),..., ψ N (s i )) Let and A=(A i• ). η If Y is Z observed with spatial error Y w ∼ N(Aw,Q η − 1 ), w ~ N( µ ,Q − 1 )

Basis functions x ( u ) = P k ψ k ( u ) x k x ( u ) = cos( u 1 ) + sin( u 2 )

Unequal spacing Lindgren and Rue show how one can use finite element methods to approximate the solution to the sde (even on a manifold like a sphere) on a triangulization on a set of possibly unequally spaced points.

Covariance approximation

Hierarchic model p(y z; θ ) Data model: p(z θ ) Latent model: w ~ N(0,Q − 1 ) Z = Aw + β B p( θ ) If Bayesian, parameter model: For INLA, need ∏ p(y z, θ ) = p(y i z i , θ ) i

INLA Laplace’s approximation: x*=argmax(f(x)). Taylor expansion around x*: f(x) ≈ f(x*)+(x-x*) 2 f”(x*)/2 f (x*) (x − x*) 2 /2 e Nf(x) ≈ e Nf(x*) e − N ʹʹ 2 π ( ) f (x*) e Nf(x*) φ N ʹʹ f (x*) (x − x*) = N ʹʹ Interested in predictive distribution p(z|y) and posterior density p( θ |y)

f(x)=sin(x)/x

Posterior manipulation p(y x, θ )p(x θ ) = p(y,x θ ) ≡ p(x y, θ )p(y θ ) Thus p( θ y) ∝ p(y θ )p( θ ) = p(y x, θ )p(x θ ) p( θ ) p(x y, θ ) Using the Laplace approximation on p(x y, θ ) f(x)=log( /N) we get a Gaussian approximation x | y, θ ≈ N( µ *,Q*) µ *,Q* where depend on Q, β and D 2 f.

What INLA computes Joint posterior of parameters (Laplace approximation) Marginal posterior of latent variable (integral Laplace approximation or numerical integration) Not computing the joint predictive distribution

Computational comparison n=20 × 2500 obs, m=20 × 15000 kriging locs Estimation O(n 3 ), storage O(n 2 ) ≈ 20GB Kriging O(mn+n 3 ), storage O(mn+n 2 ) ≈ 130GB INLA Estimation + kriging O(m 3/2 ), storage O(m+n) ≈ 50MB

Global temperature analysis obs = climate + anomaly + elevation + error Climate covariance parameters: Approximate range Standard deviations 3.0 3000 2000 2.0 km C 1000 1.0 0.0 0 − 90 − 60 − 30 0 30 60 90 − 90 − 60 − 30 0 30 60 90 Latitude Latitude

Empirical Climate 1970 − 1989 (C) − 30 − 20 − 10 0 10 20 30 90 60 45 30 15 Latitude 0 − 15 − 30 − 45 − 60 − 90 − 180 − 135 − 90 − 45 0 45 90 135 180

Std dev for Climate 1970 − 1989 (C) 0.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 90 60 45 30 15 Latitude 0 − 15 − 30 − 45 − 60 − 90 − 180 − 135 − 90 − 45 0 45 90 135 180

Markov random fields Help from David Bolin, Johan Lindstrm and Finn - PDF document

Markov random fields Help from David Bolin, Johan Lindstrm and Finn Lindgren Julian Besag 1945-2010 Hvard Rue 1965- Finn Lindgren 1973- The big N problem Log likelihood: ( , ) = n 2 log(2 ) 1 2 logdet ( ) +

Graphical Models - Part II Oliver Schulte - CMPT 726 Bishop PRML Ch. 8 Markov Random Fields

Markov Chains Markov Processes Discrete-time Markov Chains Continuous-time Markov Chains Dr

Hidden Markov Models Discrete Markov Processes 1 Hidden Markov Models Hidden Markov Models 2

Markov Random Fields Umamahesh Srinivas iPAL Group Meeting February 25, 2011 Outline Basic

Markov Random Fields and its Applications Huiwen Chang Introduction Markov Random

Visualization Visualization Height Fields and Contours Height Fields and Contours Scalar Fields

Outline Markov networks (a.k.a. Markov random fields) Markov Networks Reading: Michael

Markov chains and Hidden Markov Models 9000 Markov chains and HMMs We will discuss: Markov

CSCE 471/871 Lecture 3: Markov Chains Markov Chains and and Hidden Markov Models Hidden

Stochastic Processes Markov Processes Hamid R. Rabiee 1 Overview o Markov Property o Markov

Evidence estimation for Markov random fields: a triply intractable problem Richard Everitt

Planar Markov fields Marie-Colette van Lieshout colette@cwi.nl CWI P .O. Box 94079, NL-1090 GB

Conditional Random Fields Dietrich Klakow Overview Sequence Labeling Bayesian Networks

Random Numbers RANDOM VS PSEUDO RANDOM Truly Random numbers From Wolfram: A random number

CS70: Lecture 36. Markov Chains 1. Markov Process: Motivation, Definition 2. Examples 3.

Markov Networks [KF] Chapter 4 CS 786 University of Waterloo Lecture 7: May 24, 2012 Outline

Creative Teaching in Actuarial Science Diana Skrzydlo, ASA Outline Background Clickers

Harmonic signal parameters estimation Supervisor: I.G. Prokopenko Background Postgraduate

THE UNIVERSITY OF HONG KONG FACULTY OF SOCIAL SCIENCES 17th Annual Research Postgraduate

Quantum logarithmic Sobolev inequalities and rapid mixing Michael Kastoryano and Kristan Temme 1

Deep Learning Russ Salakhutdinov Associate Professor Machine Learning

Lithology/fluid prediction subsurface Torstein Fjeldstad Department of Mathematical sciences

PatchCut: Data-Driven Obje ject Segmentation via Local Shape Transfer Jimei Yang, Brian Price,

Online Slip Prediction for Mobile Robots 16831 Project Proposal Neal Seegmiller, Chris