
School of Computer Science

State Space Models

Probabilistic Graphical Models (10-708)

Lecture 13, part II Nov 5th, 2007

Eric Xing

[Figure: signaling pathway network with nodes Receptor A/B, Kinase C/D/E, TF F, Gene G/H (variables X1-X8)]

Reading: Jordan, Chapter 15; K&F, Chapters 19.1-19.3


A road map to more complex dynamic models

[Figure: a road map of models organized by hidden/observed variable type (discrete or continuous) and by structure. Single hidden variable A with observation X: mixture model (e.g., mixture of multinomials), mixture model (e.g., mixture of Gaussians), factor analysis. Chains x1, ..., xN with observations y1, ..., yN: HMM (for discrete sequential data, e.g., text), HMM (for continuous sequential data, e.g., speech signal), state space model. Chains with multiple hidden chains or a discrete switching chain S1, ..., SN: factorial HMM, switching SSM.]


State space models (SSM):

An SSM is a sequential factor analysis model, or a continuous-state HMM.

[Figure: chain of hidden states X1, X2, ..., XN with observations Y1, Y2, ..., YN]

  $x_t = A x_{t-1} + G w_t, \quad w_t \sim \mathcal{N}(0, Q)$
  $y_t = C x_t + v_t, \quad v_t \sim \mathcal{N}(0, R)$
  $x_1 \sim \mathcal{N}(0, \Sigma)$

This is a linear dynamic system.

In general,

  $x_t = f(x_{t-1}) + G w_t$
  $y_t = g(x_t) + v_t$

where f is an (arbitrary) dynamic model and g is an (arbitrary) observation model.


LDS for 2D tracking

Dynamics: new position = old position + Δ × velocity + noise (constant velocity model, Gaussian noise):

  $\begin{pmatrix} x^1_t \\ x^2_t \\ \dot{x}^1_t \\ \dot{x}^2_t \end{pmatrix} =
   \begin{pmatrix} 1 & 0 & \Delta & 0 \\ 0 & 1 & 0 & \Delta \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
   \begin{pmatrix} x^1_{t-1} \\ x^2_{t-1} \\ \dot{x}^1_{t-1} \\ \dot{x}^2_{t-1} \end{pmatrix} + \text{noise}$

Observation: project out the first two components (we observe the Cartesian position of the object, so the observation model is linear):

  $\begin{pmatrix} y^1_t \\ y^2_t \end{pmatrix} =
   \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}
   \begin{pmatrix} x^1_t \\ x^2_t \\ \dot{x}^1_t \\ \dot{x}^2_t \end{pmatrix} + \text{noise}$
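As a concrete illustration, here is a minimal numpy sketch of this constant-velocity model; the time step dt and the noise scales q, r are made-up values, not taken from the slide.

```python
import numpy as np

dt = 1.0          # time step Delta (assumed value)
q, r = 0.1, 1.0   # process / observation noise scales (assumed values)

# State is (x1, x2, x1_dot, x2_dot): position and velocity in the plane.
A = np.array([[1, 0, dt, 0],
              [0, 1, 0, dt],
              [0, 0, 1,  0],
              [0, 0, 0,  1]], dtype=float)   # constant-velocity dynamics
C = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)    # observe Cartesian position only
Q = q * np.eye(4)                            # dynamics noise covariance
R = r * np.eye(2)                            # observation noise covariance

# Simulate a short trajectory from the model.
rng = np.random.default_rng(0)
x = np.array([0.0, 0.0, 1.0, 0.5])           # initial position and velocity
for t in range(5):
    x = A @ x + rng.multivariate_normal(np.zeros(4), Q)
    y = C @ x + rng.multivariate_normal(np.zeros(2), R)
    print(t, y)
```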


The inference problem 1

Filtering: given y1, ..., yt, estimate xt, i.e., compute $P(X_t \mid y_{1:t})$.

  • The Kalman filter is a way to perform exact online inference (sequential Bayesian updating) in an LDS.
  • It is the Gaussian analog of the forward algorithm for HMMs:

  $\alpha_t^i \equiv p(X_t = i \mid y_{1:t}) \propto p(y_t \mid X_t = i) \sum_j p(X_t = i \mid X_{t-1} = j)\, \alpha_{t-1}^j$

[Figure: chain X1, X2, ..., Xt with observations Y1, Y2, ..., Yt and forward messages α1, α2, ..., αt]
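To make the HMM analogy concrete, here is a minimal sketch of the discrete forward recursion above; the transition matrix, emission matrix, and observation sequence are toy values invented for the example.

```python
import numpy as np

# Toy HMM: 2 hidden states, 3 observation symbols (assumed values).
T = np.array([[0.7, 0.3],
              [0.2, 0.8]])        # T[i, j] = p(X_t = j | X_{t-1} = i)
E = np.array([[0.5, 0.4, 0.1],
              [0.1, 0.3, 0.6]])   # E[i, k] = p(y_t = k | X_t = i)
pi0 = np.array([0.5, 0.5])        # initial state distribution
obs = [0, 2, 1, 2]                # observed symbol sequence

# Forward recursion: alpha_t(i) = p(X_t = i | y_{1:t}), normalized at each step.
alpha = pi0 * E[:, obs[0]]
alpha /= alpha.sum()
for y in obs[1:]:
    alpha = E[:, y] * (T.T @ alpha)   # propagate through transitions, weight by emission
    alpha /= alpha.sum()              # normalize to get the filtered posterior
print(alpha)
```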


The inference problem 2

Smoothing: given y1, ..., yT, estimate xt (t < T).

  • The Rauch-Tung-Striebel (RTS) smoother is a way to perform exact offline inference in an LDS. It is the Gaussian analog of the forwards-backwards (alpha-gamma) algorithm:

  $\gamma_t^i \equiv p(X_t = i \mid y_{1:T}) \propto \alpha_t^i \sum_j \gamma_{t+1}^j\, P(X_{t+1} = j \mid X_t = i)$

[Figure: chain X1, ..., Xt with observations Y1, ..., Yt, forward messages α1, ..., αt, and backward messages γ1, ..., γt]


2D tracking

[Figure: observed vs. filtered 2D trajectories plotted in the (X1, X2) plane]


Kalman filtering in the brain?


Kalman filtering derivation

Since all CPDs are linear Gaussian, the system defines a large multivariate Gaussian.

  • Hence all marginals are Gaussian.
  • Hence we can represent the belief state $p(X_t \mid y_{1:t})$ as a Gaussian with mean $\hat{x}_{t|t} \equiv E[X_t \mid y_1, \ldots, y_t]$ and covariance $P_{t|t} \equiv E[(X_t - \hat{x}_{t|t})(X_t - \hat{x}_{t|t})^\top \mid y_1, \ldots, y_t]$.
  • Hence, instead of marginalization for message passing, we will directly estimate the means and covariances of the required marginals.
  • It is common to work with the inverse covariance (precision) matrix; this is called information form.


Kalman filtering derivation

Kalman filtering is a recursive procedure to update the belief state:

  • Predict step: compute $p(X_{t+1} \mid y_{1:t})$ from the prior belief $p(X_t \mid y_{1:t})$ and the dynamical model $p(X_{t+1} \mid X_t)$ (the time update).
  • Update step: compute the new belief $p(X_{t+1} \mid y_{1:t+1})$ from the prediction $p(X_{t+1} \mid y_{1:t})$, the observation $y_{t+1}$, and the observation model $p(y_{t+1} \mid X_{t+1})$ (the measurement update).

[Figure: the chain X1, ..., Xt, Xt+1 with its observations, before and after observing Yt+1]


Predict step

Dynamical model: $x_{t+1} = A x_t + G w_t$, $w_t \sim \mathcal{N}(0, Q)$

  • One-step-ahead prediction of the state:
    $\hat{x}_{t+1|t} = A \hat{x}_{t|t}, \qquad P_{t+1|t} = A P_{t|t} A^\top + G Q G^\top$

Observation model: $y_t = C x_t + v_t$, $v_t \sim \mathcal{N}(0, R)$

  • One-step-ahead prediction of the observation:
    $E[y_{t+1} \mid y_{1:t}] = C \hat{x}_{t+1|t}, \qquad \mathrm{Cov}[y_{t+1} \mid y_{1:t}] = C P_{t+1|t} C^\top + R$

[Figure: the chain X1, ..., Xt, Xt+1 with its observations, before and after observing Yt+1]


Update step

Summarizing the results from the previous slide, we have $p(X_{t+1}, Y_{t+1} \mid y_{1:t}) \sim \mathcal{N}(m_{t+1}, V_{t+1})$, where

  $m_{t+1} = \begin{pmatrix} \hat{x}_{t+1|t} \\ C \hat{x}_{t+1|t} \end{pmatrix}, \qquad
   V_{t+1} = \begin{pmatrix} P_{t+1|t} & P_{t+1|t} C^\top \\ C P_{t+1|t} & C P_{t+1|t} C^\top + R \end{pmatrix}$

Remember the formulas for conditional Gaussian distributions: if

  $p\!\left( \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \right) =
   \mathcal{N}\!\left( \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix},
   \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} \right)$

then the marginal and conditional are

  $p(x_2) = \mathcal{N}(x_2 \mid m_2^m, V_2^m): \quad m_2^m = \mu_2, \quad V_2^m = \Sigma_{22}$
  $p(x_1 \mid x_2) = \mathcal{N}(x_1 \mid m_{1|2}, V_{1|2}): \quad m_{1|2} = \mu_1 + \Sigma_{12} \Sigma_{22}^{-1}(x_2 - \mu_2), \quad V_{1|2} = \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}$


Kalman Filter

Measurement updates:

  $\hat{x}_{t+1|t+1} = \hat{x}_{t+1|t} + K_{t+1}\,(y_{t+1} - C \hat{x}_{t+1|t})$
  $P_{t+1|t+1} = P_{t+1|t} - K_{t+1} C P_{t+1|t}$

  • where $K_{t+1}$ is the Kalman gain matrix:

  $K_{t+1} = P_{t+1|t} C^\top (C P_{t+1|t} C^\top + R)^{-1}$
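A minimal numpy sketch of one full Kalman filter recursion (time update followed by the measurement update above); the function and variable names are my own, not from the slides.

```python
import numpy as np

def kf_step(x, P, y, A, C, G, Q, R):
    """One KF recursion: predict p(x_{t+1} | y_{1:t}), then condition on y_{t+1}."""
    # Predict step (time update).
    x_pred = A @ x                          # x_hat_{t+1|t} = A x_hat_{t|t}
    P_pred = A @ P @ A.T + G @ Q @ G.T      # P_{t+1|t}
    # Update step (measurement update).
    S = C @ P_pred @ C.T + R                # innovation covariance
    K = P_pred @ C.T @ np.linalg.inv(S)     # Kalman gain K_{t+1}
    x_new = x_pred + K @ (y - C @ x_pred)   # correct the prediction with the innovation
    P_new = P_pred - K @ C @ P_pred         # P_{t+1|t+1}
    return x_new, P_new
```

With the constant-velocity A, C, Q, R from the 2D tracking example earlier, calling kf_step once per observation would produce the filtered trajectory.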


Example of KF in 1D

Consider noisy observations of a 1D particle doing a random walk:

  $x_t = x_{t-1} + w, \quad w \sim \mathcal{N}(0, \sigma_x), \qquad z_t = x_t + v, \quad v \sim \mathcal{N}(0, \sigma_z)$

KF equations (writing $\sigma_t$ for the posterior variance $P_{t|t}$):

  Predict: $\hat{x}_{t+1|t} = A \hat{x}_{t|t} = \hat{x}_{t|t}, \qquad P_{t+1|t} = A P_{t|t} A^\top + G Q G^\top = \sigma_t + \sigma_x$

  Update: $\hat{x}_{t+1|t+1} = \hat{x}_{t+1|t} + K_{t+1}(z_{t+1} - C \hat{x}_{t+1|t}) = \dfrac{(\sigma_t + \sigma_x)\, z_{t+1} + \sigma_z\, \hat{x}_{t|t}}{\sigma_t + \sigma_x + \sigma_z}$

          $P_{t+1|t+1} = P_{t+1|t} - K_{t+1} C P_{t+1|t} = \dfrac{(\sigma_t + \sigma_x)\, \sigma_z}{\sigma_t + \sigma_x + \sigma_z}$

          $K_{t+1} = P_{t+1|t} C^\top (C P_{t+1|t} C^\top + R)^{-1} = \dfrac{\sigma_t + \sigma_x}{\sigma_t + \sigma_x + \sigma_z}$
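A scalar sketch of this 1D random-walk filter; it makes it easy to watch the variance recursion settle to a fixed point. The noise variances and initial belief are made-up values.

```python
import numpy as np

sigma_x, sigma_z = 0.5, 2.0      # process and observation noise variances (assumed)
x_hat, sigma_t = 0.0, 10.0       # initial belief mean and variance (assumed)

rng = np.random.default_rng(1)
x_true = 0.0
for t in range(20):
    x_true += rng.normal(0, np.sqrt(sigma_x))        # random walk
    z = x_true + rng.normal(0, np.sqrt(sigma_z))     # noisy observation
    # Predict: mean unchanged, variance grows by sigma_x.
    p_pred = sigma_t + sigma_x
    # Update: convex combination of prediction and observation.
    K = p_pred / (p_pred + sigma_z)                  # scalar Kalman gain
    x_hat = x_hat + K * (z - x_hat)
    sigma_t = p_pred * sigma_z / (p_pred + sigma_z)  # = p_pred - K * p_pred
print(x_hat, sigma_t)
```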


KF intuition

The KF update of the mean is

  $\hat{x}_{t+1|t+1} = \hat{x}_{t+1|t} + K_{t+1}(z_{t+1} - C \hat{x}_{t+1|t}) = \dfrac{(\sigma_t + \sigma_x)\, z_{t+1} + \sigma_z\, \hat{x}_{t|t}}{\sigma_t + \sigma_x + \sigma_z}$

  • The term $(z_{t+1} - C \hat{x}_{t+1|t})$ is called the innovation.

The new belief is a convex combination of updates from the prior and the observation, weighted by the Kalman gain matrix:

  $K_{t+1} = P_{t+1|t} C^\top (C P_{t+1|t} C^\top + R)^{-1} = \dfrac{\sigma_t + \sigma_x}{\sigma_t + \sigma_x + \sigma_z}$

  • If the observation is unreliable, $\sigma_z$ (i.e., R) is large, so $K_{t+1}$ is small and we pay more attention to the prediction.
  • If the old prior is unreliable (large $\sigma_t$) or the process is very unpredictable (large $\sigma_x$), we pay more attention to the observation.


KF, RLS and LMS

The KF update of the mean is

  $\hat{x}_{t+1|t+1} = A \hat{x}_{t|t} + K_{t+1}(y_{t+1} - C \hat{x}_{t+1|t})$

Consider the special case where the hidden state is a constant, $x_t = \theta$, but the "observation matrix" C is a time-varying vector, $C_t = x_t^\top$.

  • Hence the observation model at each time slice, $y_t = x_t^\top \theta + v_t$, is a linear regression.

We can estimate $\theta$ recursively using the Kalman filter:

  $\hat{\theta}_{t+1} = \hat{\theta}_t + P_{t+1} R^{-1}\, x_{t+1}\,(y_{t+1} - x_{t+1}^\top \hat{\theta}_t)$

This is called the recursive least squares (RLS) algorithm.

We can approximate the gain $P_{t+1} R^{-1}$ by a scalar constant $\eta_{t+1}$:

  $\hat{\theta}_{t+1} = \hat{\theta}_t + \eta_{t+1}\, x_{t+1}\,(y_{t+1} - x_{t+1}^\top \hat{\theta}_t)$

This is called the least mean squares (LMS) algorithm.

We can adapt $\eta_t$ online using stochastic approximation theory.
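A small sketch of RLS and LMS run on synthetic linear-regression data; the true weights, noise level, prior covariance, and step size eta are invented for the demo.

```python
import numpy as np

rng = np.random.default_rng(2)
d, T = 3, 200
theta_true = np.array([1.0, -2.0, 0.5])   # unknown regression weights (assumed)
R = 0.1                                   # observation noise variance (assumed)

# RLS: a Kalman filter whose hidden state theta is constant (A = I, Q = 0).
theta_rls = np.zeros(d)
P = 10.0 * np.eye(d)                      # prior covariance over theta (assumed)
# LMS: replace the matrix gain by a scalar step size.
theta_lms = np.zeros(d)
eta = 0.05                                # assumed constant step size

for t in range(T):
    x = rng.normal(size=d)                # "observation matrix" C_t = x_t^T
    y = x @ theta_true + rng.normal(0, np.sqrt(R))
    # RLS update.
    K = P @ x / (x @ P @ x + R)           # Kalman gain (a vector here)
    theta_rls = theta_rls + K * (y - x @ theta_rls)
    P = P - np.outer(K, x) @ P
    # LMS update.
    theta_lms = theta_lms + eta * x * (y - x @ theta_lms)

print(theta_rls, theta_lms)
```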


Complexity of one KF step

Let $X_t \in \mathbb{R}^{N_x}$ and $Y_t \in \mathbb{R}^{N_y}$.

Computing $P_{t+1|t} = A P_{t|t} A^\top + G Q G^\top$ takes $O(N_x^3)$ time, assuming dense P and dense A.

Computing $K_{t+1} = P_{t+1|t} C^\top (C P_{t+1|t} C^\top + R)^{-1}$ takes $O(N_y^3)$ time.

So the overall time is, in general, $O(\max\{N_x^3, N_y^3\})$.


The inference problem 2

Smoothing: given y1, ..., yT, estimate xt (t < T).

  • Recall that the Rauch-Tung-Striebel (RTS) smoother performs exact offline inference in an LDS, and is the Gaussian analog of the forwards-backwards (alpha-gamma) algorithm:

  $\gamma_t^i \equiv p(X_t = i \mid y_{1:T}) \propto \alpha_t^i \sum_j \gamma_{t+1}^j\, P(X_{t+1} = j \mid X_t = i)$

[Figure: chain X1, ..., Xt with observations, forward messages α1, ..., αt, and backward messages γ1, ..., γt]


RTS smoother derivation

Smoothing: given y1, ..., yT, estimate $P(x_t \mid y_{1:T})$ (t < T).

Step 1: form the joint distribution of $x_t$ and $x_{t+1}$ conditioned on $y_{1:t}$. Use

  $x_{t+1} = A x_t + G w_t, \qquad w_t \sim \mathcal{N}(0, Q)$

[Figure: fragment of the chain showing Xt, Xt+1 and the observations Yt, Yt+1]


RTS smoother derivation

Following the results from the previous slide, we need to derive $p(X_{t+1}, X_t \mid y_{1:t}) \sim \mathcal{N}(m, V)$, where

  $m = \begin{pmatrix} \hat{x}_{t|t} \\ \hat{x}_{t+1|t} \end{pmatrix}, \qquad
   V = \begin{pmatrix} P_{t|t} & P_{t|t} A^\top \\ A P_{t|t} & P_{t+1|t} \end{pmatrix}$

  • All the quantities here are available after a forward KF pass.

Remember the formulas for conditional Gaussian distributions:

  $p\!\left( \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \right) =
   \mathcal{N}\!\left( \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix},
   \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} \right)$

  $p(x_2) = \mathcal{N}(x_2 \mid m_2^m, V_2^m): \quad m_2^m = \mu_2, \quad V_2^m = \Sigma_{22}$
  $p(x_1 \mid x_2) = \mathcal{N}(x_1 \mid m_{1|2}, V_{1|2}): \quad m_{1|2} = \mu_1 + \Sigma_{12} \Sigma_{22}^{-1}(x_2 - \mu_2), \quad V_{1|2} = \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}$

Conditioning $x_t$ on $x_{t+1}$ gives the smoother gain $L_t = P_{t|t} A^\top P_{t+1|t}^{-1}$.


RTS smoother derivation

  • Step 2: compute $\hat{x}_{t|T} = E[x_t \mid y_{1:T}]$ using the results above.
  • Use the conditional distribution $p(x_t \mid x_{t+1}, y_{1:t})$ from Step 1.
  • Use the tower property: $E[X \mid Z] = E[\,E[X \mid Y, Z] \mid Z\,]$.

[Figure: fragment of the chain showing Xt, Xt+1 and the observations Yt, Yt+1]


RTS derivation

Repeat the same process for the variance.

  • Refer to Jordan, Chapter 15.

The RTS smoother results:

  $\hat{x}_{t|T} = \hat{x}_{t|t} + L_t\,(\hat{x}_{t+1|T} - \hat{x}_{t+1|t})$
  $P_{t|T} = P_{t|t} + L_t\,(P_{t+1|T} - P_{t+1|t})\,L_t^\top$


Learning SSMs

Complete log likelihood:

  $\ell_c(\theta; D) = \sum_n \log p(x_n, y_n) = \sum_n \log p(x_{n,1}) + \sum_n \sum_t \log p(x_{n,t} \mid x_{n,t-1}) + \sum_n \sum_t \log p(y_{n,t} \mid x_{n,t})$
  $\qquad\; = f_1(X_1; \Sigma) + f_2(\{X_t X_t^\top,\, X_t X_{t-1}^\top : \forall t\};\, A, G, Q) + f_3(\{X_t X_t^\top,\, X_t : \forall t\};\, C, R)$

EM:

  • E-step: compute the expected sufficient statistics $\langle X_t \rangle$, $\langle X_t X_t^\top \rangle$, $\langle X_t X_{t-1}^\top \rangle$. These quantities can be inferred via the KF and RTS smoother, e.g.,

    $\langle X_t X_t^\top \mid y_{1:T} \rangle = \mathrm{var}(X_t) + E(X_t)\,E(X_t)^\top = P_{t|T} + \hat{x}_{t|T}\, \hat{x}_{t|T}^\top$

  • M-step: MLE of $\theta = (A, G, Q, C, R, \Sigma)$ using $\langle \ell_c(\theta; D) \rangle$; c.f. the M-step in factor analysis.
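For instance, here is a small sketch of how the smoothed moments feed an M-step update, shown only for the observation matrix C (the analog of the factor-analysis loading update); the function and variable names are my own.

```python
import numpy as np

def m_step_C(ys, x_smooth, P_smooth):
    """MLE of the observation matrix C from expected sufficient statistics.

    ys[t]:        observed y_t
    x_smooth[t]:  x_hat_{t|T} from the RTS smoother
    P_smooth[t]:  P_{t|T}     from the RTS smoother
    """
    # Sum over t of <y_t x_t^T> and <x_t x_t^T> = P_{t|T} + x_hat x_hat^T.
    S_yx = sum(np.outer(y, x) for y, x in zip(ys, x_smooth))
    S_xx = sum(P + np.outer(x, x) for P, x in zip(P_smooth, x_smooth))
    return S_yx @ np.linalg.inv(S_xx)
```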


Nonlinear systems

  • In robotics and other problems, the motion model and the observation model are often nonlinear:

    $x_t = f(x_{t-1}) + w_t, \qquad y_t = g(x_t) + v_t$

  • An optimal closed-form solution to the filtering problem is no longer possible.
  • The nonlinear functions f and g are sometimes represented by neural networks (multi-layer perceptrons or radial basis function networks).
  • The parameters of f and g may be learned offline using EM, where we do gradient descent (back-propagation) in the M-step, c.f. learning an MRF/CRF with hidden nodes.
  • Or we may learn the parameters online by adding them to the state space: $x_t' = (x_t, \theta)$. This makes the problem even more nonlinear.


Extended Kalman Filter (EKF)

The basic idea of the EKF is to linearize f and g using a first-order Taylor expansion about the current state estimate, and then apply the standard KF.

  • i.e., we approximate a stationary nonlinear system with a non-stationary linear system:

    $x_t = f(\hat{x}_{t-1|t-1}) + A_{\hat{x}_{t-1|t-1}}\,(x_{t-1} - \hat{x}_{t-1|t-1}) + w_t$
    $y_t = g(\hat{x}_{t|t-1}) + C_{\hat{x}_{t|t-1}}\,(x_t - \hat{x}_{t|t-1}) + v_t$

    where $\hat{x}_{t|t-1} = f(\hat{x}_{t-1|t-1})$ and the Jacobians are

    $A_{\hat{x}} \stackrel{\text{def}}{=} \left.\dfrac{\partial f}{\partial x}\right|_{\hat{x}}, \qquad
     C_{\hat{x}} \stackrel{\text{def}}{=} \left.\dfrac{\partial g}{\partial x}\right|_{\hat{x}}$

The noise covariances (Q and R) are not changed, i.e., the additional error due to linearization is not modeled.
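A minimal sketch of one EKF step under these equations, assuming the user supplies the nonlinear functions f, g and their Jacobians; the names are my own.

```python
import numpy as np

def ekf_step(x, P, y, f, g, jac_f, jac_g, Q, R):
    """One EKF recursion: linearize f and g at the current estimate, then apply the KF."""
    # Predict: propagate the mean through f, the covariance through the Jacobian A.
    A = jac_f(x)                      # A = df/dx evaluated at x_hat_{t-1|t-1}
    x_pred = f(x)                     # x_hat_{t|t-1} = f(x_hat_{t-1|t-1})
    P_pred = A @ P @ A.T + Q
    # Update: linearize the observation model at the predicted state.
    C = jac_g(x_pred)                 # C = dg/dx evaluated at x_hat_{t|t-1}
    S = C @ P_pred @ C.T + R
    K = P_pred @ C.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (y - g(x_pred))
    P_new = P_pred - K @ C @ P_pred
    return x_new, P_new
```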


School of Computer Science

Complex Graphical Models

Probabilistic Graphical Models (10-708)

Lecture 14, Nov 5th, 2007

Eric Xing

[Figure: signaling pathway network with nodes Receptor A/B, Kinase C/D/E, TF F, Gene G/H (variables X1-X8)]

Reading: K&F, Chapters 20.1-20.3


A road map to more complex dynamic models

[Figure: the same road map of dynamic models shown earlier, from mixture models, factor analysis, and HMMs to the state space model, factorial HMM, and switching SSM]


NLP and Data Mining

We want:

  • Semantic-based search
  • Infer topics and categorize documents
  • Multimedia inference
  • Automatic translation
  • Predict how topics evolve

[Figure: research topics evolving over time, 1900-2000]


The Vector Space Model

  • Represent each document by a high-dimensional vector in the space of words.

[Figure: a document mapped to its vector of word counts]


Latent Semantic Indexing

Decompose the term-document matrix via a (truncated) SVD:

  $X_{(m \times n)} = T_{(m \times k)}\, \Lambda_{(k \times k)}\, D^\top_{(k \times n)} = \sum_{k=1}^{K} \lambda_k\, \vec{w}_k\, \vec{d}_k^{\,\top}$

where rows index terms and columns index documents.

  • LSA does not define a properly normalized probability distribution over observed and latent entities.
  • It does not support probabilistic reasoning under uncertainty and data fusion.
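A minimal numpy sketch of this truncated SVD on a toy term-document count matrix; the matrix and the rank k are made up for illustration.

```python
import numpy as np

# Toy term-document count matrix X (m terms x n documents), invented for the example.
X = np.array([[2, 0, 1, 0],
              [1, 0, 2, 0],
              [0, 3, 0, 1],
              [0, 1, 0, 2]], dtype=float)

k = 2                                        # number of latent "concepts" (assumed)
T, lam, Dt = np.linalg.svd(X, full_matrices=False)
X_k = T[:, :k] @ np.diag(lam[:k]) @ Dt[:k]   # rank-k LSI approximation T Lambda D^T
print(np.round(X_k, 2))
```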

Latent Semantic Structure

[Figure: latent structure ℓ generating the observed words w]

Distribution over words:

  $P(\mathbf{w}) = \sum_{\ell} P(\mathbf{w}, \ell)$

Inferring latent structure:

  $P(\ell \mid \mathbf{w}) = \dfrac{P(\mathbf{w} \mid \ell)\, P(\ell)}{P(\mathbf{w})}$

Prediction:

  $P(w_{n+1} \mid \mathbf{w}) = \ldots$


Admixture Models

Objects are bags of elements. Mixtures are distributions over elements. Each object has a mixing vector θ:

  • θ represents each mixture's contribution to the object.

An object is generated as follows:

  • Pick a mixture component from θ.
  • Pick an element from that component.

[Figure: example documents as bags of topic-tagged words (money, bank, loan, river, stream, ...), each with its own mixing vector, e.g., (0.1, 0.1, 0.5, ...)]

Topic Models = Admixture Models

Generating a document:

  • Draw θ_n from the prior.
  • For each word:
      • Draw z_n from multinomial(θ_n).
      • Draw w_n | z_n, {β_{1:K}} from multinomial(β_{z_n}).

[Figure: plate diagram with prior over θ, topic indicators z, words w, and topics β; plates of size Nd (words), N (documents), and K (topics)]

Which prior to use?
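A minimal sketch of this generative process with a Dirichlet prior on θ (i.e., LDA); the vocabulary, topics β, and hyperparameter α are toy values, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(3)
vocab = ["money", "bank", "loan", "river", "stream"]
beta = np.array([[0.4, 0.3, 0.3, 0.0, 0.0],    # topic 1: finance (assumed)
                 [0.0, 0.3, 0.0, 0.4, 0.3]])   # topic 2: rivers  (assumed)
alpha = np.array([1.0, 1.0])                   # Dirichlet hyperparameter (assumed)

def generate_document(n_words=10):
    theta = rng.dirichlet(alpha)               # draw topic proportions from the prior
    words = []
    for _ in range(n_words):
        z = rng.choice(len(beta), p=theta)     # draw z_n ~ Multinomial(theta)
        w = rng.choice(len(vocab), p=beta[z])  # draw w_n ~ Multinomial(beta_{z_n})
        words.append(vocab[w])
    return words

print(generate_document())
```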


Choice of Prior

Dirichlet (LDA) (Blei et al. 2003)

  • Conjugate prior means efficient inference
  • Can only capture variations in each topic's intensity independently

Logistic Normal (CTM = LoNTAM) (Blei & Lafferty 2005, Ahmed & Xing 2006)

  • Captures the intuition that some topics are highly correlated and can rise up in intensity together
  • Not a conjugate prior, which implies hard inference

Logistic Normal Vs. Dirichlet

[Figure: example densities on the simplex under a Dirichlet prior]


Logistic Normal Vs. Dirichlet

[Figure: example densities on the simplex under a logistic normal prior]


Mixed Membership Model (M3)

Mixture versus admixture:

[Figure: a Bayesian mixture model (left) versus a Bayesian admixture model, i.e., a mixed membership model (right)]


Latent Dirichlet Allocation: M3 in text mining

A document is a bag of words, each generated from a randomly selected topic.


Population admixture: M3 in genetics

The genetic material of each modern individual is inherited from multiple ancestral populations; each DNA locus may have a different genetic origin.

Ancestral labels may have (e.g., Markovian) dependencies.


Inference in Mixed Membership Models

Mixture versus admixture:

  • Inference is very hard in M3: all hidden variables are coupled and not factorizable!

Marginal likelihood:

  $p(D) = \prod_n \int\!\!\int \left( \prod_m \sum_{z_{n,m}} p(x_{n,m} \mid \phi_{z_{n,m}})\, p(z_{n,m} \mid \pi_n) \right) p(\pi_n \mid \alpha)\, p(\phi \mid G)\; d\pi_n\, d\phi$

Posterior over mixing vectors:

  $p(\pi_n \mid D) \propto \int \left( \prod_m \sum_{z_{n,m}} p(x_{n,m} \mid \phi_{z_{n,m}})\, p(z_{n,m} \mid \pi_n) \right) p(\pi_n \mid \alpha)\, p(\phi \mid G)\; d\phi$

Approaches to inference

Exact inference algorithms

  • The elimination algorithm
  • The junction tree algorithms

Approximate inference techniques

  • Monte Carlo algorithms:
  • Stochastic simulation / sampling methods
  • Markov chain Monte Carlo methods
  • Variational algorithms:
  • Belief propagation
  • Variational inference