[PPT] - Population Markov Chain Monte Carlo and Genetic Networks Fujun Ye PowerPoint Presentation

SLIDE 1

Population Markov Chain Monte Carlo and Genetic Networks

Fujun Ye

MSc in Artificial Intelligence
Supervised by Dirk Husmeier

SLIDE 2

Outline

Introduction
MCMCMC
MCMCMC for missing values
Result Evaluation (complete data)
Result Evaluation (missing values)
Summary

SLIDE 3

Introduction

Genetic Network
Clustering and Differential equation
Bayesian Network
MCMC

SLIDE 4

Genetic Network

[Figure: example genetic network diagram; node and edge labels not recoverable]

SLIDE 5

Clustering

SLIDE 6

Differential Equation

Advantage

Provides a detailed understanding of the biological systems

Shortcoming

Data are scarce
Data are noisy

SLIDE 7

Inferring Bayesian Network From Expression Data

Bayesian Network

$$P(X_1, X_2, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid Pa_G(X_i))$$

[Figure: example network over nodes A, B, C, D, E]

$$P(a, b, c, d, e) = P(a)\, P(b \mid a)\, P(c \mid a)\, P(d \mid b, c)\, P(e \mid d)$$

SLIDE 8

Problems

The number of different network structures grows super-exponentially with the number of nodes.
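To give a feel for this growth, the number of labelled directed acyclic graphs can be counted with Robinson's recurrence. This is a minimal sketch added for illustration; it is not part of the slides, and the function name `count_dags` is hypothetical.

```python
from math import comb


def count_dags(n: int) -> int:
    """Number of labelled DAGs on n nodes, via Robinson's recurrence:
    a(m) = sum_{k=1..m} (-1)^(k+1) * C(m, k) * 2^(k*(m-k)) * a(m-k), a(0) = 1.
    """
    counts = [1]  # a(0)
    for m in range(1, n + 1):
        total = 0
        for k in range(1, m + 1):
            sign = 1 if k % 2 == 1 else -1
            total += sign * comb(m, k) * 2 ** (k * (m - k)) * counts[m - k]
        counts.append(total)
    return counts[n]


# Print the count for a few network sizes to show how quickly it grows.
for n in (2, 3, 4, 5, 10):
    print(n, count_dags(n))
```

Already at five nodes there are 29281 structures, so exhaustive enumeration is only feasible for very small networks.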

SLIDE 9

[Figure: two plots of the posterior P(M|D) against the model M, with the optimal structure M' marked]

Where the data set is large, the optimal structure M' is well defined.
Where the data set is small, there are many networks that can explain the data fairly well.

SLIDE 10

MCMC

MCMC samples networks from their posterior distribution.

Calculate the posterior probability of a feature:

$$P(f \mid D) = \frac{\sum_i P(f \mid M_i)\, P(M_i \mid D)}{\sum_i P(M_i \mid D)}$$

$$P(M_k \mid D) = \frac{P(D \mid M_k)\, P(M_k)}{\sum_i P(D \mid M_i)\, P(M_i)}$$
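The two formulas on this slide can be sketched for a small, explicitly enumerated set of models. This is an illustrative sketch only: the three models and their scores are made up, the feature is assumed to be an edge that each model either has or lacks (so P(f|M_i) is 0 or 1), and the function names are hypothetical.

```python
import math


def model_posteriors(log_scores):
    """Turn log P(D|M_i) + log P(M_i) into P(M_i|D), using log-sum-exp for stability."""
    top = max(log_scores)
    weights = [math.exp(s - top) for s in log_scores]
    total = sum(weights)
    return [w / total for w in weights]


def feature_posterior(log_scores, has_feature):
    """P(f|D) = sum_i P(f|M_i) P(M_i|D), with P(f|M_i) equal to 0 or 1."""
    posteriors = model_posteriors(log_scores)
    return sum(p for p, f in zip(posteriors, has_feature) if f)


# Three hypothetical models with unnormalised posteriors 1, 3 and 4.
log_scores = [math.log(1.0), math.log(3.0), math.log(4.0)]
has_edge = [True, False, True]
p_edge = feature_posterior(log_scores, has_edge)
print(p_edge)
```

With MCMC the sum over all models is replaced by an average over the sampled networks, which is why the sampler only needs the unnormalised score.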

SLIDE 11

[Figure: three example networks over nodes 1 to 6]

Coincidence dependence

SLIDE 12

[Figure: posterior P(M|D) plotted against the model M]

Escape from local optima using traditional MCMC

SLIDE 13

Small step size versus big step size

[Figure: posterior P(M|D) plotted against the model M]

SLIDE 14

Problems

Huge search space and coincidence dependence: prescreening is important!
Local optima: the traversal operator is important!
Fixed step size: a varied step size is more reasonable.

SLIDE 15

MCMCMC

Metropolis-coupled Markov Chain Monte Carlo (MCMCMC)

Pre-processing method
Traversal operators
Algorithm
MCMCMC for missing values

SLIDE 16

MCMCMC

[Figure: coupled chains at temperatures $T_1 = 1$, $T_2 > T_1$, $T_3 > T_2$]

SLIDE 17

For each chain, move a step based on

$$A(M', M) = \min\left(1,\ \left(\frac{P(D \mid M')\, P(M')}{P(D \mid M)\, P(M)}\right)^{1/T} \frac{Q(M \mid M')}{Q(M' \mid M)}\right)$$

Chain swap

$$S_a = \begin{pmatrix} M_1 & M_2 & \cdots & M_i & \cdots & M_k & \cdots & M_M \\ T_1 & T_2 & \cdots & T_i & \cdots & T_k & \cdots & T_M \end{pmatrix}$$

$$S_b = \begin{pmatrix} M_1 & M_2 & \cdots & M_k & \cdots & M_i & \cdots & M_M \\ T_1 & T_2 & \cdots & T_i & \cdots & T_k & \cdots & T_M \end{pmatrix}$$

Acceptance Probability

$$P_a = \min\left\{1,\ \frac{[P(D \mid M_k)\, P(M_k)]^{1/T_i}\, [P(D \mid M_i)\, P(M_i)]^{1/T_k}}{[P(D \mid M_i)\, P(M_i)]^{1/T_i}\, [P(D \mid M_k)\, P(M_k)]^{1/T_k}}\right\}$$
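The swap acceptance probability is easiest to evaluate in log space, where the four tempered terms collapse into a single product. The sketch below assumes each chain exposes its current value of log P(D|M) + log P(M); the function name and arguments are hypothetical, not taken from the slides.

```python
import math


def swap_acceptance(log_post_i, log_post_k, temp_i, temp_k):
    """Probability of accepting an exchange of the states of chains i and k.

    log_post_i and log_post_k are log P(D|M) + log P(M) for the current states
    of chain i (temperature temp_i) and chain k (temperature temp_k).
    """
    # log of the ratio inside min{1, ...}:
    # (1/T_i - 1/T_k) * (log_post_k - log_post_i)
    log_ratio = (log_post_k - log_post_i) * (1.0 / temp_i - 1.0 / temp_k)
    if log_ratio >= 0.0:
        return 1.0  # avoids overflow in exp for large positive values
    return math.exp(log_ratio)


# A hot chain (T = 3) holding a better state than the cold chain (T = 1)
# always hands it over; the reverse move is accepted only sometimes.
print(swap_acceptance(-10.0, -5.0, 1.0, 3.0))
print(swap_acceptance(-5.0, -10.0, 1.0, 3.0))
```

Two chains at the same temperature always swap, because the exponent difference is zero.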

SLIDE 18

Pre-processing method

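$$P(D \mid M) = \int P(D \mid \theta, M)\, P(\theta \mid M)\, d\theta$$

Penalize complex model

$$\alpha_{n v \pi_n} = \frac{\alpha}{|v_n| \cdot |\pi_n|}, \qquad \alpha_{n \pi_n} = \sum_{v} \alpha_{n v \pi_n}$$

$$p(D \mid M) = \prod_{n} \prod_{\pi_n} \frac{\Gamma(\alpha_{n \pi_n})}{\Gamma(\alpha_{n \pi_n} + N_{n \pi_n})} \prod_{v} \frac{\Gamma(\alpha_{n v \pi_n} + N_{n v \pi_n})}{\Gamma(\alpha_{n v \pi_n})}$$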

SLIDE 19

The log likelihood is

$$\log p(D \mid M) = \sum_{n} \mathrm{score}(n, \pi_n, D)$$

where

$$\mathrm{score}(n, \pi_n, D) = \sum_{\pi_n} \left[\log \Gamma(\alpha_{n \pi_n}) - \log \Gamma(\alpha_{n \pi_n} + N_{n \pi_n})\right] + \sum_{\pi_n} \sum_{v} \left[\log \Gamma(\alpha_{n v \pi_n} + N_{n v \pi_n}) - \log \Gamma(\alpha_{n v \pi_n})\right]$$
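A per-node score of this form can be computed directly from a table of counts with the log-gamma function. The sketch below is an assumption-laden illustration: it takes the prior counts to be uniform, with each one equal to the total prior strength divided by the number of node values times the number of parent configurations, and the function name and argument layout are hypothetical.

```python
from math import lgamma, log


def local_score(counts, alpha=1.0):
    """Log marginal likelihood contribution of one node.

    counts[p][v] is the number of cases in which the node took value v while
    its parents were in configuration p. alpha is the total prior strength
    (assumed to be spread uniformly over all cells).
    """
    n_parent_configs = len(counts)
    n_values = len(counts[0])
    a_cell = alpha / (n_values * n_parent_configs)  # prior count per (value, config)
    a_config = a_cell * n_values                    # prior count per parent config
    score = 0.0
    for row in counts:
        score += lgamma(a_config) - lgamma(a_config + sum(row))
        for n_cell in row:
            score += lgamma(a_cell + n_cell) - lgamma(a_cell)
    return score


# A binary node without parents, observed once in each state.
example = local_score([[1, 1]], alpha=1.0)
print(example, log(1 / 8))
```

Because the score is a sum over nodes, changing the parents of one node only requires recomputing that node's term.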

SLIDE 20

Use a maximum fan-in.
Find all possible parent configurations for each node and delete the low-scoring ones.
Keep C parent configurations for each node and cardinality.

Threshold is set as:

$$\theta = \lambda \cdot (\mathit{scoresh} - \mathit{scoresl}) / m + \mathit{scoresl} / m$$

SLIDE 21

[Figure: score(n,π,D) plotted against π, with π' marked, when the data are quite sparse and noisy]
[Figure: score(n,π,D) plotted against π, with π' marked, using the pre-screening method]

SLIDE 22

Traversal operators

Importance sampling: sample a parent configuration for a node

$$p(\pi_j \mid \mathrm{node}_i) = \frac{\mathrm{score}(i, \pi_j, D) + C}{\sum_{k=1,\, k \neq k_{old}}^{n} \mathrm{score}(i, \pi_k, D) + (n-1)\, C}$$

$$\frac{Q(M_{old} \mid M_{new})}{Q(M_{new} \mid M_{old})} = \frac{\left(\mathrm{score}(i, \pi_{k_{old}}, D) + C\right) \left(\sum_{k=1,\, k \neq k_{old}}^{n} \mathrm{score}(i, \pi_k, D) + (n-1)\, C\right)}{\left(\mathrm{score}(i, \pi_{k_{new}}, D) + C\right) \left(\sum_{k=1,\, k \neq k_{new}}^{n} \mathrm{score}(i, \pi_k, D) + (n-1)\, C\right)}$$

$$\frac{P(D \mid M_{new})}{P(D \mid M_{old})} = \exp\left(\mathrm{score}(i, \pi_{k_{new}}, D) - \mathrm{score}(i, \pi_{k_{old}}, D)\right)$$

$$P(\mathrm{node}_i) = \frac{K_i}{\sum_{j=1}^{n} K_j}$$
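The importance-sampling move for a single node can be sketched as follows. This is a hedged illustration, not the author's code: it assumes the pre-screened scores of one node are held in a list, that the offset C is large enough to make every shifted score positive, and that the current configuration is excluded from the draw. All names are hypothetical.

```python
import random


def proposal_probs(scores, current, offset):
    """p(pi_j | node) for every configuration j other than the current one.

    Each weight is score + offset, so the normaliser equals the sum of the
    other scores plus (n - 1) * offset.
    """
    weights = {j: s + offset for j, s in enumerate(scores) if j != current}
    if min(weights.values()) <= 0:
        raise ValueError("offset must make every shifted score positive")
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}


def propose(scores, current, offset, rng):
    """Draw a new configuration and return it with Q(old|new) / Q(new|old)."""
    forward = proposal_probs(scores, current, offset)
    candidates = list(forward)
    new = rng.choices(candidates, weights=[forward[j] for j in candidates])[0]
    backward = proposal_probs(scores, new, offset)
    return new, backward[current] / forward[new]


scores = [-3.0, -1.0, -2.0]  # made-up log scores of three parent configurations
new, hastings_ratio = propose(scores, current=0, offset=4.0, rng=random.Random(0))
print(new, hastings_ratio)
```

The returned ratio is the proposal correction that multiplies the likelihood ratio in the acceptance probability.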

SLIDE 23

DIN sampling: if the new network is loopy

[Figure: the old model over nodes 1, 2, 3; Step 1; Step 2; the new model]

SLIDE 24

$$A(M_{new} \mid M_{old}) = \min\left(1,\ \frac{P(D \mid M_{new})\, P(M_{new})\, Q(M_{old} \mid M_{new})}{P(D \mid M_{old})\, P(M_{old})\, Q(M_{new} \mid M_{old})}\right)$$

$$\frac{P(D \mid M_{new})}{P(D \mid M_{old})} = \frac{\exp\left(\mathrm{score}(n_i, \pi^{new}(n_i), D) + \sum_{j=1}^{n} \mathrm{score}(n_j, \pi^{new}(n_j), D)\right)}{\exp\left(\mathrm{score}(n_i, \pi^{old}(n_i), D) + \sum_{j=1}^{n} \mathrm{score}(n_j, \pi^{old}(n_j), D)\right)}$$

For the proposal ratio I simply use an approximation, since calculating the proposal probability exactly is quite time-consuming:

$$\frac{Q(M_{old} \mid M_{new})}{Q(M_{new} \mid M_{old})} \approx \frac{\left(\mathrm{score}(i, \pi_{k_{old}}, D) + C\right) \left(\sum_{k=1,\, k \neq k_{old}}^{n} \mathrm{score}(i, \pi_k, D) + (n-1)\, C\right)}{\left(\mathrm{score}(i, \pi_{k_{new}}, D) + C\right) \left(\sum_{k=1,\, k \neq k_{new}}^{n} \mathrm{score}(i, \pi_k, D) + (n-1)\, C\right)}$$

SLIDE 25

[Figure panels: DIN proposal; Traditional MCMC]

SLIDE 26

Algorithm

Initialization
Each iteration:
  Move a step for every chain
  Chain swap
Keep the first chain
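The loop on this slide can be sketched on a toy problem. This is not the network sampler from the slides: the states are plain integers standing in for network structures, the move is a symmetric step to a neighbouring state (so the proposal ratio cancels), and one randomly chosen pair of chains is offered a swap per iteration. The function name, the toy target and all parameters are assumptions made for illustration.

```python
import math
import random


def mcmcmc(log_post, temps, n_iter, rng):
    """Metropolis-coupled MCMC over states 0..len(log_post)-1.

    log_post[s] plays the role of log P(D|M) + log P(M) for state s.
    Returns the states visited by the first chain (temperature temps[0]).
    """
    n_states = len(log_post)
    chains = [rng.randrange(n_states) for _ in temps]  # initialization
    kept = []
    for _ in range(n_iter):
        # Move a step for every chain, tempering the ratio by 1/T.
        for c, temp in enumerate(temps):
            candidate = (chains[c] + rng.choice((-1, 1))) % n_states
            log_a = (log_post[candidate] - log_post[chains[c]]) / temp
            if log_a >= 0 or rng.random() < math.exp(log_a):
                chains[c] = candidate
        # Chain swap between one random pair of chains.
        i, k = rng.sample(range(len(temps)), 2)
        log_a = (log_post[chains[k]] - log_post[chains[i]]) * (1 / temps[i] - 1 / temps[k])
        if log_a >= 0 or rng.random() < math.exp(log_a):
            chains[i], chains[k] = chains[k], chains[i]
        # Keep the first chain (T = 1).
        kept.append(chains[0])
    return kept


# Toy target with two separated modes, at states 4 and 15.
toy_log_post = [-0.5 * min((s - 4) ** 2, (s - 15) ** 2) for s in range(20)]
samples = mcmcmc(toy_log_post, [1, 1, 3, 9, 30], 2000, random.Random(0))
print(len(samples), sorted(set(samples)))
```

The hot chains see a flattened target and cross between the modes easily; swaps pass those crossings down to the chain that is kept.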

SLIDE 27

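[Flowchart: for the chains M1, M2, …, Mm (T >= 1), importance sampling proposes Mi'; an illegal proposal goes to DIN sampling, a legal one is accepted with A(Mi', Mi); a chain swap from S (m chains) to S' is accepted with Pa(S', S); the chain with T = 1 is kept]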

SLIDE 28

SLIDE 29

MCMCMC for missing values

I4 I3 I10 I6
 2  2   1  1

Index  I1 I2 I3 I4 I5 I6 I7 I8 I9 I10 I11 I12
Value   1  2  2  2  1  1  2  1  2   2   1   2

X1 X2 X3 X4 X5
 1  ?  2  1  1
 2  2  2  1  2
 ?  1  1  ?  1
 1  2  ?  2  ?
 ?  1  1  1  2
 1  1  2  ?  1
 1  ?  2  1  1
 2  1  1  2  ?
 2  2  ?  1  2
 1  2  2  ?  ?

Index     I1 I2 I3 I4 I5 I6 …
Variable   1  1  2  2  3  3 …
Row        3  5  1  7  4  9 …

SLIDE 30

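[Flowchart: each chain holds a model and a completed data set, (M1, D1), (M2, D2), …, (Mm, Dm), with T >= 1; importance sampling proposes Mi'; an illegal proposal goes to DIN sampling, a legal one is accepted with A(Mi', Mi | Di); the missing values Dmi are re-proposed as Dmi', which together with the observed data give Di', accepted with A(Di', Di | Gi); a chain swap from S (m chains) to S' is accepted with Pa(S', S); the chain with T = 1 is kept]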

SLIDE 31

Proposal method before burn-in

$$Q(v_n \mid M, n, m) = \frac{N_{v_n \pi_{v_m}} + 1}{\sum_{v_n} \left(N_{v_n \pi_{v_m}} + 1\right)}$$

$$Q(v_n, v_{\pi_{mis}} \mid M, n, m) = \frac{N_{v_n \pi_{v_{nm}}, v_{\pi_{mis}}} + 1}{\sum_{v_n, v_{\pi_{mis}}} \left(N_{v_n \pi_{v_{nm}}, v_{\pi_{mis}}} + 1\right)}$$

$$Q(v_n \mid M, n) = \frac{N_{v_n} + 1}{\sum_{v_n} \left(N_{v_n} + 1\right)}$$
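All three proposals share one pattern: a candidate value is drawn with probability proportional to its count plus one. The sketch below shows that pattern for a single missing entry. How the counts are gathered (with or without conditioning on the parents) is left to the caller, and the function names are hypothetical.

```python
import random


def value_proposal(counts):
    """Q(v) = (N_v + 1) / sum_v (N_v + 1) for each candidate value v.

    counts maps each possible value of the node to the number of matching
    cases in the current completed data set.
    """
    total = sum(c + 1 for c in counts.values())
    return {v: (c + 1) / total for v, c in counts.items()}


def propose_missing(counts, rng):
    """Draw a value for one missing entry and return it with its probability."""
    probs = value_proposal(counts)
    values = list(probs)
    value = rng.choices(values, weights=[probs[v] for v in values])[0]
    return value, probs[value]


# Made-up counts: value 1 seen three times, value 2 never seen.
q = value_proposal({1: 3, 2: 0})
value, prob = propose_missing({1: 3, 2: 0}, random.Random(0))
print(q, value, prob)
```

Adding one to every count keeps unseen values reachable, so the chain can still propose them.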

SLIDE 32

X1 X2 X3 X4 X5
 1  ?  2  1  1
 2  2  2  1  2
 ?  1  1  2  1
 1  2  ?  ?  ?
 ?  1  1  1  2
 1  1  2  ?  1
 1  ?  2  1  1
 2  1  1  2  ?
 2  2  ?  1  2
 1  2  2  ?  ?

[Figure: network over nodes 1 to 5]

SLIDE 33

Acceptance probability

$$\mathrm{Accept}(MissVal', MissVal) = \min\left(1,\ \frac{Q(MissVal \mid MissVal')\, P(D' \mid M)}{Q(MissVal' \mid MissVal)\, P(D \mid M)}\right)$$

SLIDE 34

After burn-in

Acceptance probability

$$\mathrm{Accept}(MissVal', MissVal) = \min\left(1,\ \frac{Q(MissVal \mid MissVal')\, P(D' \mid M)}{Q(MissVal' \mid MissVal)\, P(D \mid M)}\right)$$

$$Q(MissVal' \mid MissVal) = \prod_{i \in \Omega(cmis)} \frac{N_{i\_new} + 1}{\sum_{j} \left(N_{ij} + 1\right)}$$

$$Q(MissVal \mid MissVal') = \prod_{i \in \Omega(cmis)} \frac{N_{i\_old} + 1}{\sum_{j} \left(N'_{ij} + 1\right)}$$

SLIDE 35

Result Evaluation (complete data)

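$$\mathrm{sensitivity} = \frac{tp}{tp + fn}$$

$$\mathrm{specificity} = \frac{tn}{tn + fp}$$

$$\mathrm{complementary\ specificity} = 1 - \frac{tn}{tn + fp} = \frac{fp}{tn + fp}$$

[Figure: two networks over nodes 1 to 4, with edges labelled tp, fn, fp and tn]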

ROC curve

tp is the number of true positive edges.
fn is the number of false negative edges.
fp is the number of false positive edges.
tn is the number of true negative edges.
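One point of an ROC curve can be computed from edge posterior probabilities by thresholding them and counting the four edge categories. In this sketch, edges are assumed to be ordered pairs of node indices (directed, no self-loops), and the true network is assumed to contain at least one edge and at least one non-edge. The function names and example numbers are hypothetical.

```python
def confusion(true_edges, predicted_edges, n_nodes):
    """Count tp, fn, fp, tn over all ordered node pairs (i, j) with i != j."""
    tp = fn = fp = tn = 0
    for i in range(n_nodes):
        for j in range(n_nodes):
            if i == j:
                continue
            actual = (i, j) in true_edges
            predicted = (i, j) in predicted_edges
            if actual and predicted:
                tp += 1
            elif actual:
                fn += 1
            elif predicted:
                fp += 1
            else:
                tn += 1
    return tp, fn, fp, tn


def roc_point(edge_posterior, true_edges, n_nodes, threshold):
    """Return (complementary specificity, sensitivity) at one threshold."""
    predicted = {e for e, p in edge_posterior.items() if p >= threshold}
    tp, fn, fp, tn = confusion(true_edges, predicted, n_nodes)
    return fp / (tn + fp), tp / (tp + fn)


true_edges = {(0, 1), (1, 2), (2, 3)}
edge_posterior = {(0, 1): 0.9, (1, 2): 0.4, (0, 3): 0.7}  # made-up posteriors
print(roc_point(edge_posterior, true_edges, n_nodes=4, threshold=0.5))
```

Sweeping the threshold from 1 down to 0 traces the full curve from (0, 0) to (1, 1).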

SLIDE 36

Model Genetic Network

SLIDE 37

MCMCMC against order MCMC
MCMCMC against structure MCMC
MCMCMC against Population MCMC

SLIDE 38

Temperatures = [1, 1, 3, 9, 30]
Keep at most 10 parent configurations for each node and cardinality.
Run 60000 iterations: 30000 for burn-in, then keep the last 30000 samples.

SLIDE 39

Alarm Network

SLIDE 40

SLIDE 41

Arabidopsis data

SLIDE 42

Result Evaluation (missing values)

Model Genetic Network

Before burn-in: 30000 burn-in, 30000 iterations
After burn-in: 40000 iterations
Temp = [1, 1, 3, 9, 12]

SLIDE 43

SLIDE 44

SLIDE 45

ROC curves for noise = 0.2 and data = 200 with different missing rates
Temp = [1, 1, 3, 9, 12]
Use 30000 burn-in and 30000 iterations.

Keep one sample every 10 steps (before-burn-in algorithm).

SLIDE 46

B cell Lymphoma data

SLIDE 47

Summary

MCMCMC
Order MCMC
Structure MCMC
Population MCMC

SLIDE 48

Problems with MCMCMC

[Figure: logP(D|M) + logP(M) plotted against iterations for chains at T=1 and T=50]

SLIDE 49