
School of Computer Science

Approximate Inference: Monte Carlo Inference

Probabilistic Graphical Models (10-708)

Lecture 18, Nov 19, 2007

Eric Xing

[Figure: a signaling-pathway Bayesian network over X1–X8 (Receptors A/B, Kinases C/D/E, TF F, Genes G/H).]

Reading: J-Chap. 1, KF-Chap. 11


Monte Carlo methods

Draw random samples from the desired distribution to yield a stochastic representation of a complex distribution.

  • Marginals and other expectations can be approximated using sample-based averages.

Asymptotically exact and easy to apply to arbitrary models.

Challenges:

  • How to draw samples from a given dist. (not all distributions can be trivially sampled)?
  • How to make better use of the samples (not all samples are useful, or equally useful; see an example later)?
  • How to know we've sampled enough?

E[f(x)] ≈ (1/N) Σ_{t=1}^{N} f(x^(t))
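The sample-average estimator above can be checked with a minimal sketch; the target N(0,1) and the function f(x) = x² are illustrative choices, not from the slides:

```python
import random

def mc_expectation(f, sampler, n=100_000):
    # E[f(x)] ≈ (1/N) Σ_t f(x^(t)), with x^(t) drawn from the target
    return sum(f(sampler()) for _ in range(n)) / n

random.seed(0)
# E[X^2] = 1 for X ~ N(0, 1), so the estimate should be close to 1
est = mc_expectation(lambda x: x * x, lambda: random.gauss(0.0, 1.0))
```

With N = 100,000 samples the standard error of this estimate is about 0.005, so it lands very near the true value of 1.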


Example: naive sampling

Construct samples according to the probabilities given in a BN. Alarm example (choose the right sampling sequence):

1) Sampling: P(B) = <0.001, 0.999>; suppose it comes up false, i.e. B0. Same for E0. P(A|B0, E0) = <0.001, 0.999>; suppose it is false...

2) Frequency counting: in the samples at right, P(J|A0) = P(J, A0)/P(A0) = <1/9, 8/9>.

[Table of 10 samples: (J1 M1 A1 B0 E1), eight copies of (J0 M0 A0 B0 E0), and (J1 M0 A0 B0 E0).]
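The two steps above (ancestral sampling, then frequency counting) can be sketched as follows; the CPT numbers are the standard textbook alarm-network values, used here as illustrative stand-ins since the slide only states P(B):

```python
import random

random.seed(0)

# Illustrative alarm-network CPTs (standard textbook numbers, not all from the slide).
def sample_alarm():
    b = random.random() < 0.001                      # P(B)
    e = random.random() < 0.002                      # P(E)
    p_a = {(True, True): 0.95, (True, False): 0.94,  # P(A | B, E)
           (False, True): 0.29, (False, False): 0.001}[(b, e)]
    a = random.random() < p_a
    j = random.random() < (0.90 if a else 0.05)      # P(J | A)
    m = random.random() < (0.70 if a else 0.01)      # P(M | A)
    return b, e, a, j, m

# Frequency counting: P(J=1 | A=0) = #(J=1, A=0) / #(A=0)
samples = [sample_alarm() for _ in range(100_000)]
a0 = [s for s in samples if not s[2]]
p_j_given_a0 = sum(s[3] for s in a0) / len(a0)
```

Sampling each node after its parents (the "right sampling sequence") guarantees each tuple is drawn from the joint; here the estimate converges to P(J=1|A=0) = 0.05.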


Example: naive sampling

Construct samples according to the probabilities given in a BN. Alarm example (choose the right sampling sequence):

3) What if we want to compute P(J|A1)? We have only one such sample... P(J|A1) = P(J, A1)/P(A1) = <0, 1>.

4) What if we want to compute P(J|B1)? No such sample is available! P(J|B1) = P(J, B1)/P(B1) cannot be defined.

For a model with hundreds or more variables, it will be very hard to garner enough samples of rare events, even after a long time of sampling...

[Same table of 10 samples as on the previous slide.]


Monte Carlo methods (cont.)

Direct Sampling

  • We have seen it.
  • Very difficult to populate a high-dimensional state space

Rejection Sampling

  • Create samples as in direct sampling, but count only the samples that are consistent with the given evidence.

Likelihood weighting, ...

  • Sample variables and calculate an evidence weight; create only samples that support the evidence.

Markov chain Monte Carlo (MCMC)

  • Metropolis-Hasting
  • Gibbs


Rejection sampling

Suppose we wish to sample from dist. Π(X)=Π'(X)/Z.

  • Π(X) is difficult to sample, but Π'(X) is easy to evaluate
  • Sample from a simpler dist Q(X)
  • Rejection sampling
  • Correctness:
  • Pitfall …

x* ~ Q(X), accept x* w.p. Π'(x*) / (k Q(x*))

Correctness:

p(x) = Q(x) [Π'(x) / (k Q(x))] / ∫ Q(x) [Π'(x) / (k Q(x))] dx = Π'(x) / ∫ Π'(x) dx = Π(x)
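The accept/reject step can be sketched directly; here the unnormalized target Π'(x) = exp(−x²/2) (a standard normal without its normalizer) and the proposal Q = N(0, 2) are illustrative choices, with k picked so that Π'(x) ≤ k Q(x) everywhere:

```python
import math, random

random.seed(0)

def target_unnorm(x):      # Π'(x): unnormalized N(0,1) density
    return math.exp(-0.5 * x * x)

SD_Q = 2.0
def q_pdf(x):              # proposal Q = N(0, 2)
    return math.exp(-x * x / (2 * SD_Q**2)) / (SD_Q * math.sqrt(2 * math.pi))

K = 5.02                   # sup_x Π'(x)/Q(x) = 2*sqrt(2π) ≈ 5.013, so Π' <= K·Q

def rejection_sample():
    while True:
        x = random.gauss(0.0, SD_Q)                   # x* ~ Q
        if random.random() < target_unnorm(x) / (K * q_pdf(x)):
            return x                                  # accept w.p. Π'(x*)/(K Q(x*))

xs = [rejection_sample() for _ in range(20_000)]
mean = sum(xs) / len(xs)
var = sum(x * x for x in xs) / len(xs) - mean**2
```

Accepted samples are exact draws from Π = N(0, 1), so their mean and variance approach 0 and 1; note the acceptance rate is only about 1/K ≈ 20%, previewing the pitfall on the next slide.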


Rejection sampling

  • Pitfall:
  • Using Q = N(µ, σ_q I) to sample P = N(µ, σ_p I)
  • If σ_q exceeds σ_p by 1%, and the dimension d = 1000,
  • the optimal k = (σ_q/σ_p)^d, so the acceptance rate ≈ 1/k ≈ 1/20,000
  • Big waste of samples!

Adaptive rejection sampling

  • Using envelope functions to define Q


Unnormalized importance sampling

Suppose sampling from P(·) is hard. Suppose we can sample from a "simpler" proposal distribution

Q(·) instead.

If Q dominates P (i.e., Q(x) > 0 whenever P(x) > 0), we can

sample from Q and reweight:

E_P[f(X)] = ∫ f(x) P(x) dx = ∫ f(x) [P(x)/Q(x)] Q(x) dx ≈ (1/M) Σ_m f(x^m) w^m,

where x^m ~ Q and w^m = P(x^m)/Q(x^m).
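The reweighting identity above can be sketched with illustrative densities (P = N(0,1), Q = N(0,2), f(x) = x², all chosen for this example, not from the slides); note Q dominates P as required:

```python
import math, random

random.seed(0)

def p_pdf(x):  # target P = N(0, 1)
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def q_pdf(x):  # proposal Q = N(0, 2); Q(x) > 0 wherever P(x) > 0
    return math.exp(-x * x / 8.0) / (2.0 * math.sqrt(2 * math.pi))

M = 100_000
xs = [random.gauss(0.0, 2.0) for _ in range(M)]      # x^m ~ Q
# E_P[f(X)] ≈ (1/M) Σ_m f(x^m) w^m, with w^m = P(x^m)/Q(x^m)
est = sum((x * x) * p_pdf(x) / q_pdf(x) for x in xs) / M
```

Since E_P[X²] = 1, the weighted average converges to 1 even though every sample came from Q.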


Normalized importance sampling

Suppose we can only evaluate P'(x) = αP(x) (e.g. for an MRF).

We can get around the nasty normalization constant α as follows:

  • Let r(X) = P'(X)/Q(X). Then

E_Q[r(X)] = ∫ [P'(x)/Q(x)] Q(x) dx = ∫ P'(x) dx = α

  • Now

E_P[f(X)] = ∫ f(x) P(x) dx = (1/α) ∫ f(x) r(x) Q(x) dx = ∫ f(x) r(x) Q(x) dx / ∫ r(x) Q(x) dx
          ≈ Σ_m f(x^m) r^m / Σ_m r^m = Σ_m f(x^m) w^m,

where x^m ~ Q and w^m = r^m / Σ_l r^l.
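The self-normalized estimator can be sketched as follows; the unnormalized target P'(x) = exp(−x²/2) (so α = √(2π)) and Q = N(0,2) are illustrative choices, and the point is that α never needs to be computed:

```python
import math, random

random.seed(0)

def p_unnorm(x):   # P'(x) = αP(x): N(0,1) density without its normalizer
    return math.exp(-0.5 * x * x)

def q_pdf(x):      # proposal Q = N(0, 2)
    return math.exp(-x * x / 8.0) / (2.0 * math.sqrt(2 * math.pi))

M = 100_000
xs = [random.gauss(0.0, 2.0) for _ in range(M)]      # x^m ~ Q
rs = [p_unnorm(x) / q_pdf(x) for x in xs]            # r^m = P'(x^m)/Q(x^m)
# self-normalized estimate Σ_m f(x^m) r^m / Σ_m r^m — the constant α cancels
est = sum((x * x) * r for x, r in zip(xs, rs)) / sum(rs)
```

Dividing by Σ_m r^m plays the role of dividing by α, so the estimate again converges to E_P[X²] = 1.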


Normalized vs unnormalized importance sampling

  • Unnormalized importance sampling is unbiased:

E_Q[f(X) w(X)] = E_P[f(X)]

  • Normalized importance sampling is biased, e.g. for M = 1:

E_Q[f(x¹) w(x¹) / w(x¹)] = E_Q[f(x¹)] ≠ E_P[f(X)]

  • However, the variance of the normalized importance sampler is usually lower in practice.
  • Also, it is common that we can evaluate P'(x) but not P(x), e.g. P(x|e) = P'(x, e)/P(e) for a Bayes net, or P(x) = P'(x)/Z for an MRF.


Likelihood weighting

  • We now apply normalized importance sampling to a Bayes net.
  • The proposal Q is obtained from the mutilated BN where we clamp the evidence nodes and cut their incoming arcs. Call this PM.
  • The unnormalized posterior is P'(x) = P(x, e).
  • So for f(Xi) = δ(Xi = xi), we get

P̂(Xi = xi | e) = Σ_m w^m δ(x_i^m = x_i) / Σ_m w^m,  where w^m = P(x^m, e) / P_M(x^m).
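A minimal likelihood-weighting sketch for the alarm network, estimating P(B=1 | J=1, M=1): sampling from the mutilated network means the clamped evidence contributes exactly the weight P(e | parents). The CPT numbers are the standard textbook values, used as illustrative stand-ins:

```python
import random

random.seed(1)

# Illustrative alarm-network CPTs (standard textbook numbers, not from the slide).
P_B, P_E = 0.001, 0.002
P_A = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}
P_J = {1: 0.90, 0: 0.05}   # P(J=1 | A)
P_M = {1: 0.70, 0: 0.01}   # P(M=1 | A)

def lw_sample():
    """Sample non-evidence nodes from the mutilated BN; weight by the evidence."""
    b = int(random.random() < P_B)
    e = int(random.random() < P_E)
    a = int(random.random() < P_A[(b, e)])
    w = P_J[a] * P_M[a]    # evidence J=1, M=1 is clamped; it only contributes weight
    return b, w

num = den = 0.0
for _ in range(500_000):
    b, w = lw_sample()
    num += w * b           # Σ_m w^m δ(B^m = 1)
    den += w               # Σ_m w^m
p_b_given_jm = num / den   # ≈ P(B=1 | J=1, M=1)
```

With these CPTs the exact posterior is about 0.284; note the estimator is noisy because B=1 samples are rare, which is exactly the efficiency issue discussed two slides below.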


Likelihood weighting algorithm

[Slide content: pseudocode figure.]


Efficiency of likelihood weighting

The efficiency of importance sampling depends on how close the proposal Q is to the target P.

Suppose all the evidence is at the roots. Then Q = P(X|e), and all samples have weight 1.

Suppose all the evidence is at the leaves. Then Q is the prior, so many samples may get small weight if the evidence is unlikely.

We can use arc reversal to make some of the evidence nodes be roots instead of leaves, but the resulting network can be much more densely connected.


Weighted resampling

Problem of importance sampling: it depends on how well Q matches P.

  • If P(x)f(x) is strongly varying and has a significant proportion of its mass concentrated in a small region, r^m will be dominated by a few samples.
  • Note that if the high-probability mass region of Q falls into the low-probability mass region of P, the empirical variance can be small even though the samples come from a low-probability region of P and are potentially erroneous.

Solution

  • Use a heavy-tailed Q.
  • Weighted resampling

r^m = P(x^m) / Q(x^m)

w^m = r^m / Σ_l r^l = [P(x^m)/Q(x^m)] / Σ_l [P(x^l)/Q(x^l)]


Weighted resampling

  • Sampling importance resampling (SIR):

1. Draw N samples from Q: X1 … XN
2. Construct weights w1 … wN:

w^m = r^m / Σ_l r^l = [P(x^m)/Q(x^m)] / Σ_l [P(x^l)/Q(x^l)]

3. Sub-sample x from {X1 … XN} w.p. (w1 … wN)

  • Particle Filtering
  • A special weighted resampler
  • Yields samples from the posterior p(Xt | Y1:t)

[Figure: a state-space model X1 → … → Xt → Xt+1 with observations Y1, …, Yt, Yt+1.]
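The three SIR steps can be sketched directly; the unnormalized target exp(−x²/2) and proposal N(0, 2) are illustrative choices carried over from the earlier examples:

```python
import math, random

random.seed(0)

def p_unnorm(x):   # target up to a constant: N(0, 1)
    return math.exp(-0.5 * x * x)

def q_pdf(x):      # proposal Q = N(0, 2)
    return math.exp(-x * x / 8.0) / (2.0 * math.sqrt(2 * math.pi))

N = 50_000
xs = [random.gauss(0.0, 2.0) for _ in range(N)]      # 1) draw N samples from Q
ws = [p_unnorm(x) / q_pdf(x) for x in xs]            # 2) construct weights
resampled = random.choices(xs, weights=ws, k=N)      # 3) sub-sample w.p. (w1 … wN)

mean = sum(resampled) / N
var = sum(x * x for x in resampled) / N - mean ** 2
```

After resampling, the particles carry equal weight and are approximately distributed as the target, so their variance approaches 1.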


Sketch of Particle Filters

The starting point

p(X_t | Y_{1:t}) = p(Y_t | X_t) p(X_t | Y_{1:t-1}) / ∫ p(Y_t | X_t) p(X_t | Y_{1:t-1}) dX_t

  • Thus p(X_t | Y_{1:t}) is represented by the weighted samples

{ X_t^m ~ p(X_t | Y_{1:t-1}),  w_t^m = p(Y_t | X_t^m) / Σ_{m=1}^{M} p(Y_t | X_t^m) }

A sequential weighted resampler

  • Time update (sample from a mixture model):

p(X_{t+1} | Y_{1:t}) = ∫ p(X_{t+1} | X_t) p(X_t | Y_{1:t}) dX_t ≈ Σ_m w_t^m p(X_{t+1} | X_t^m)

  • Measurement update (reweight):

{ X_{t+1}^m ~ p(X_{t+1} | Y_{1:t}),  w_{t+1}^m = p(Y_{t+1} | X_{t+1}^m) / Σ_{m=1}^{M} p(Y_{t+1} | X_{t+1}^m) }
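A bootstrap particle filter puts the time update, measurement update, and resampling steps together. This sketch uses a made-up 1D linear-Gaussian model (coefficients and noise levels are illustrative assumptions, not from the slides):

```python
import math, random

random.seed(0)

# Toy linear-Gaussian SSM (illustrative):
#   X_{t+1} = 0.9 X_t + N(0, 1),   Y_t = X_t + N(0, 0.1)
A_COEF, Q_SD, R_SD, N = 0.9, 1.0, 0.1, 5_000

def lik(y, x):  # p(Y_t = y | X_t = x), up to a constant
    return math.exp(-0.5 * ((y - x) / R_SD) ** 2)

# simulate observations from the model
x, ys = 0.0, []
for _ in range(20):
    x = A_COEF * x + random.gauss(0.0, Q_SD)
    ys.append(x + random.gauss(0.0, R_SD))

# bootstrap particle filter: propagate through p(X_{t+1}|X_t) (time update),
# reweight by p(Y|X) (measurement update), then resample
particles = [0.0] * N
for y in ys:
    particles = [A_COEF * p + random.gauss(0.0, Q_SD) for p in particles]  # time update
    weights = [lik(y, p) for p in particles]                               # reweight
    particles = random.choices(particles, weights=weights, k=N)            # resample

pf_mean = sum(particles) / N   # filtered mean E[X_T | Y_{1:T}]
```

With observation noise much smaller than process noise, the filtered mean tracks the latest observation closely, which gives a simple sanity check.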


PF for switching SSM

Recall that the belief state has O(2^t) Gaussian modes


PF for switching SSM

  • Key idea: if you knew the discrete states, you could apply the right Kalman filter at each time step.
  • So for each old particle m, sample

S_t^m ~ P(S_t | S_{t-1}^m)

from the prior, and apply the KF (using the parameters for S_t^m) to the old belief state (x̂^m_{t-1|t-1}, P^m_{t-1|t-1}) to get an approximation to P(X_t | y_{1:t}, s^m_{1:t}).

  • Useful for online tracking, fault diagnosis, etc.


Rao-Blackwellised sampling

  • Sampling in high-dimensional spaces causes high variance in the estimate.
  • RB idea: sample some variables X_p, and conditional on those, compute the expected value of the rest, X_d, analytically:

E_{p(X|e)}[f(X)] = ∫∫ p(x_p, x_d | e) f(x_p, x_d) dx_p dx_d
                 = ∫ p(x_p | e) ( ∫ p(x_d | x_p, e) f(x_p, x_d) dx_d ) dx_p
                 ≈ (1/M) Σ_m E_{p(X_d | x_p^m, e)}[f(x_p^m, X_d)],  x_p^m ~ p(x_p | e)

  • This has lower variance, because of the identity:

var[τ(X_p, X_d)] = var[E[τ(X_p, X_d) | X_p]] + E[var[τ(X_p, X_d) | X_p]]

  • Hence var[E[τ(X_p, X_d) | X_p]] ≤ var[τ(X_p, X_d)], so

τ(X_p) = E[f(X_p, X_d) | X_p]

is a lower-variance estimator.


Markov chain Monte Carlo (MCMC)

Importance sampling does not scale well to high dimensions, and Rao-Blackwellisation is not always possible. MCMC is an alternative.

Construct a Markov chain whose stationary distribution is the target density P(X|e).

Run for T samples (burn-in time) until the chain converges / mixes / reaches the stationary distribution.

Then collect M (correlated) samples x^m. Key issues:

  • Designing proposals so that the chain mixes rapidly.
  • Diagnosing convergence.


Markov Chains

Definition:

  • Given an n-dimensional state space
  • Random vector X = (x1,…,xn)
  • x(t) = x at time-step t
  • x(t) transitions to x(t+1) with prob

P(x(t+1) | x(t),…,x(1)) = T(x(t+1) | x(t)) = T(x(t) x(t+1))

Homogenous: chain determined by state x(0), fixed transition

kernel T (rows sum to 1)

Equilibrium: π(x) is a stationary (equilibrium) distribution if

π(x') = Σ_x π(x) T(x → x'),

i.e., π is a left eigenvector of the transition matrix: π^T T = π^T.

[Figure: a 3-state chain over X1, X2, X3 with transition probabilities 0.25, 0.7, 0.5, 0.5, 0.75, 0.3, and its transition matrix.]
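The equilibrium condition π(x') = Σ_x π(x) T(x → x') can be checked numerically by repeatedly applying a kernel to any starting distribution; the 3-state matrix below is a made-up example, not the one from the slide's figure:

```python
# A small 3-state chain (illustrative numbers); rows sum to 1.
T = [[0.25, 0.50, 0.25],
     [0.40, 0.30, 0.30],
     [0.10, 0.60, 0.30]]

pi = [1.0, 0.0, 0.0]        # any starting distribution works for this ergodic chain
for _ in range(200):        # repeated application converges to the equilibrium
    pi = [sum(pi[i] * T[i][j] for i in range(3)) for j in range(3)]

# at equilibrium pi is a left eigenvector: applying T once more changes nothing
pi_next = [sum(pi[i] * T[i][j] for i in range(3)) for j in range(3)]
```

Because this chain is irreducible and aperiodic, the iterates converge to the unique stationary π regardless of the start.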


Markov Chains

An MC is irreducible if its transition graph is connected. An MC is aperiodic if it is not trapped in cycles. An MC is ergodic (regular) if you can get from state x to x' in a finite number of steps.

Detailed balance:

p(x^(t)) T(x^(t-1) | x^(t)) = p(x^(t-1)) T(x^(t) | x^(t-1))

Summing over x^(t-1):

p(x^(t)) = Σ_{x^(t-1)} p(x^(t-1)) T(x^(t) | x^(t-1))

Detailed balance ⇒ a stationary distribution exists.


Metropolis-Hastings

Treat the target distribution as the stationary distribution. Sample from an easier proposal distribution, followed by an acceptance test.

This induces a transition matrix that satisfies detailed balance.

  • MH proposes moves according to Q(x'|x) and accepts samples with probability A(x'|x).
  • The induced transition matrix is

T(x → x') = Q(x'|x) A(x'|x)

  • Detailed balance means

π(x) Q(x'|x) A(x'|x) = π(x') Q(x|x') A(x|x')

  • Hence the acceptance ratio is

A(x'|x) = min( 1, π(x') Q(x|x') / (π(x) Q(x'|x)) )


Metropolis-Hastings

1. Initialize x^(0)

2. While not mixing  // burn-in

  • x = x^(t)
  • t += 1
  • sample u ~ Unif(0, 1)
  • sample x* ~ Q(x*|x)
  • if u < A(x*|x) = min( 1, π(x*) Q(x|x*) / (π(x) Q(x*|x)) )
  • x^(t) = x*   // transition
  • else
  • x^(t) = x    // stay in current state

3. Reset t = 0; for t = 1:N, x^(t+1) ← Draw-sample(x^(t))
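The loop above can be sketched as a working sampler. The target here is an illustrative unnormalized N(0,1) (MH only ever needs π up to a constant), and the proposal is a symmetric Gaussian random walk, so the Q terms cancel in the acceptance ratio:

```python
import math, random

random.seed(0)

def p_unnorm(x):   # target π up to a constant: N(0, 1)
    return math.exp(-0.5 * x * x)

def metropolis_hastings(n, step=1.0, burn_in=1_000):
    x, out = 0.0, []
    for t in range(n + burn_in):
        x_star = random.gauss(x, step)                # x* ~ Q(x*|x), symmetric
        # symmetric Q => acceptance ratio reduces to min(1, π(x*)/π(x))
        if random.random() < min(1.0, p_unnorm(x_star) / p_unnorm(x)):
            x = x_star                                # transition
        # else: stay in current state (the repeated value still counts)
        if t >= burn_in:
            out.append(x)
    return out

samples = metropolis_hastings(100_000)
mean = sum(samples) / len(samples)
var = sum(x * x for x in samples) / len(samples) - mean ** 2
```

Rejected proposals must still record the current state as a sample; dropping them would bias the chain away from the stationary distribution.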


Mixing time

The ε mixing time T_ε is the minimal number of steps (from any starting distribution) until D_var(p^(T), π) ≤ ε, where D_var is the variational distance between two distributions:

D_var(µ1, µ2) = sup_{A ⊆ S} |µ1(A) − µ2(A)|

Chains with low-bandwidth (conductance) regions of space take a long time to mix.

This arises for GMs with deterministic or highly skewed potentials.

[Figure: a small graphical model over X1–X7 illustrating such a structure.]


MCMC example

q(x*|x) ~ N(x^(i), 100)
p(x) ∝ 0.3 exp(−0.2 x²) + 0.7 exp(−0.2 (x − 10)²)

[Figure: trace and histogram of the resulting MH chain on this bimodal target.]


Summary of MH

Random walk through state space. Can simulate multiple chains in parallel. Much hinges on the proposal distribution Q:

  • Want to visit the regions of state space where p(X) puts mass
  • Want A(x*|x) high in the modes of p(X)
  • Chain mixes well

Convergence diagnosis

  • How can we tell when burn-in is over?
  • Run multiple chains from different starting conditions, and wait until they start "behaving similarly".
  • Various heuristics have been proposed.

Gibbs sampling

Gibbs sampling is an MCMC algorithm that is especially appropriate for inference in graphical models.

The procedure

  • We have a variable set X = {x1, x2, x3, ..., xN} for a GM.
  • At each step one of the variables Xi is selected (at random or according to some fixed sequence); denote the remaining variables as X−i and their current values as x−i^(t−1).
  • Using the "alarm network" as an example: say at time t we choose XE, and we denote the current value assignments of the remaining variables X−E, obtained from previous samples, as

x−E^(t−1) = { xB^(t−1), xA^(t−1), xJ^(t−1), xM^(t−1) }

  • The conditional distribution p(Xi | x−i^(t−1)) is computed.
  • A value xi^(t) is sampled from this distribution.
  • The sample xi^(t) replaces the previously sampled value of Xi in X, i.e.

x^(t) = x−E^(t−1) ∪ { xE^(t) }

Markov Blanket

Markov Blanket in a BN

A variable is independent of the others, given its parents, children, and children's parents (d-separation).

MB in an MRF

A variable is independent of all its non-neighbors, given all its direct neighbors: ⇒ p(Xi | X−i) = p(Xi | MB(Xi))

Gibbs sampling

At every step, choose one variable and sample it by P(X | MB(X)) based on the previous sample.


Gibbs sampling of the alarm network

  • To calculate P(J|B1, M1):
  • Choose (B1, E0, A1, M1, J1) as a start.
  • The evidence is B1, M1; the variables are A, E, J.
  • Choose the next variable as A. Sample A by P(A|MB(A)) = P(A|B1, E0, M1, J1); suppose it comes out false.
  • (B1, E0, A0, M1, J1)
  • Choose the next random variable as E; sample E ~ P(E|B1, A0)
  • ...

MB(A) = {B, E, J, M},  MB(E) = {A, B}
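The cycle of resampling one variable from its full conditional can be sketched on a stand-in model where the conditionals are easy to write down: a bivariate normal with correlation RHO (an illustrative example, not the alarm network — there each conditional would instead be computed from the CPTs over MB(Xi)):

```python
import math, random

random.seed(0)

# Stand-in model: standard bivariate normal with correlation RHO, where each
# full conditional p(x_i | x_-i) is itself normal: x | y ~ N(RHO*y, 1 - RHO^2).
RHO = 0.8
COND_SD = math.sqrt(1.0 - RHO * RHO)

x, y = 0.0, 0.0
xs, ys = [], []
for t in range(50_000):
    x = random.gauss(RHO * y, COND_SD)   # resample x ~ p(x | y)
    y = random.gauss(RHO * x, COND_SD)   # resample y ~ p(y | x)
    if t >= 1_000:                       # discard burn-in
        xs.append(x); ys.append(y)

n = len(xs)
corr = (sum(a * b for a, b in zip(xs, ys)) / n
        - (sum(xs) / n) * (sum(ys) / n))
```

The chain's samples recover the joint: the empirical covariance approaches RHO = 0.8, even though each step only ever touched one coordinate at a time.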


Gibbs sampling

Gibbs sampling is a special case of MH. The transition matrix updates each node one at a time using the following proposal:

Q((x'_i, x_{−i}) | (x_i, x_{−i})) = p(x'_i | x_{−i})

This is efficient for two reasons:

  • It leads to samples that are always accepted:

A((x'_i, x_{−i}) | (x_i, x_{−i}))
  = min( 1, [π(x'_i, x_{−i}) Q((x_i, x_{−i}) | (x'_i, x_{−i}))] / [π(x_i, x_{−i}) Q((x'_i, x_{−i}) | (x_i, x_{−i}))] )
  = min( 1, [p(x'_i | x_{−i}) p(x_{−i}) p(x_i | x_{−i})] / [p(x_i | x_{−i}) p(x_{−i}) p(x'_i | x_{−i})] ) = 1

Thus

T((x'_i, x_{−i}) | (x_i, x_{−i})) = p(x'_i | x_{−i})

  • It is efficient since p(x'_i | x_{−i}) depends only on the values in Xi's Markov blanket.


Gibbs sampling

Scheduling and ordering:

  • Sequential sweeping: in each "epoch" t, touch every r.v. in some order and yield a new sample x^(t) after every r.v. is resampled.
  • Randomly pick an r.v. at each time step.

Blocking:

  • Large state space: the state vector X is comprised of many components (high dimension).
  • Some components can be correlated, and we can sample components (i.e., subsets of r.v.s) one at a time.

Gibbs sampling can fail if there are deterministic constraints.

[Figure: X → Z ← Y, where Z is the XOR of X and Y.]

  • Suppose we observe Z = 1. The posterior has 2 modes: P(X = 1, Y = 0 | Z = 1) and P(X = 0, Y = 1 | Z = 1). If we start in mode 1, P(X | Y = 0, Z = 1) leaves X = 1, so we can't move to mode 2 (a reducible Markov chain).
  • If all states have non-zero probability, the MC is guaranteed to be regular.
  • Sampling blocks of variables at a time can help improve mixing.

Chains

[Figures: example Markov chain trajectories, shown over two slides.]

The art of simulation

  • Run several chains
  • Start at over-dispersed points
  • Monitor the log-likelihood
  • Monitor the serial correlations
  • Monitor acceptance ratios
  • Re-parameterize (to get approx. independence)
  • Re-block (Gibbs)
  • Collapse (integrate over other parameters)
  • Run with troubled parameters fixed at reasonable values


Collapsed Gibbs sampling of the M3 model (Tom Griffiths & Mark Steyvers)

Collapsed Gibbs sampling

  • Integrate out π

For the variables z = z1, z2, …, zn, draw z_i^(t+1) from P(z_i | z_{−i}, w), where

z_{−i} = z_1^(t+1), z_2^(t+1), …, z_{i−1}^(t+1), z_{i+1}^(t), …, z_n^(t)


Gibbs sampling

We need the full conditional distributions for the variables. Since we only sample z, we need:

  • the number of times word w is assigned to topic j
  • the number of times topic j is used in document d


Gibbs sampling

[Table, shown incrementally over several slides: 50 word tokens with word w_i, document d_i, and topic assignment z_i, resampled one token at a time across iterations:

 i   w_i           d_i   z_i (iter. 1)   z_i (iter. 2)   …   z_i (iter. 1000)
 1   MATHEMATICS    1    2               2               …   2
 2   KNOWLEDGE      1    2               1               …   2
 3   RESEARCH       1    1               1               …   2
 4   WORK           1    2               2               …   1
 5   MATHEMATICS    1    1               2               …   2
 6   RESEARCH       1    2               2               …   2
 7   WORK           1    2               2               …   2
 8   SCIENTIFIC     1    1               1               …   1
 9   MATHEMATICS    1    2               2               …   2
10   WORK           1    1               2               …   2
11   SCIENTIFIC     2    1               1               …   2
12   KNOWLEDGE      2    1               2               …   2
 …   …              …    …               …               …   …
50   JOY            5    2               1               …   1

Each intermediate slide steps through the tokens i = 1, 2, … in turn, drawing a new z_i from P(z_i | z_{−i}, w) at each step.]


Document tagging