Population Markov Chain Monte Carlo and Genetic Networks
Fujun Ye
MSc in Artificial Intelligence
Supervised by Dirk Husmeier
Outline
- Introduction
- MCMCMC
- MCMCMC for missing values
- Result Evaluation (complete data)
- Result Evaluation (missing values)
- Summary
Introduction
- Genetic Network
- Clustering and Differential Equation
- Bayesian Network
- MCMC
Genetic Network
[Figure: example genetic network with activating (+) and inhibiting (-) interactions]
Clustering
Differential Equation
Advantage
- Provides a detailed understanding of the biological system
Shortcoming
- Too little data
- Noisy data
Inferring Bayesian Network From Expression Data
$$P(X_1, X_2, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid Pa_G(X_i))$$
Bayesian Network
[Figure: network over nodes A, B, C, D, E with edges A → B, A → C, B → D, C → D, D → E]
$$P(a, b, c, d, e) = P(a)\, P(b \mid a)\, P(c \mid a)\, P(d \mid b, c)\, P(e \mid d)$$
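The factorisation for the five-node example can be checked numerically. The sketch below assumes binary variables and uses made-up conditional probability tables (the numbers in `p_true` are illustrative only and do not come from the slides); it multiplies one local conditional per node and confirms that the joint sums to one.

```python
from itertools import product

# Parent sets of the example network: A -> B, A -> C, (B, C) -> D, D -> E.
parents = {"a": (), "b": ("a",), "c": ("a",), "d": ("b", "c"), "e": ("d",)}

# Hypothetical tables: p_true[node][parent values] = P(node = 1 | parents).
p_true = {
    "a": {(): 0.3},
    "b": {(0,): 0.2, (1,): 0.7},
    "c": {(0,): 0.5, (1,): 0.1},
    "d": {(0, 0): 0.1, (0, 1): 0.6, (1, 0): 0.4, (1, 1): 0.9},
    "e": {(0,): 0.25, (1,): 0.8},
}


def joint(assignment):
    """P(a, b, c, d, e) as the product of one local conditional per node."""
    prob = 1.0
    for node, pa in parents.items():
        p1 = p_true[node][tuple(assignment[p] for p in pa)]
        prob *= p1 if assignment[node] == 1 else 1.0 - p1
    return prob


# Summing the joint over all 2^5 assignments should give 1.
total = sum(joint(dict(zip(parents, values)))
            for values in product((0, 1), repeat=5))
```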
Problems
The number of different network structures grows super-exponentially with the number of nodes.
[Figure: two sketches of the posterior P(M|D) over models M, each marking the optimal structure M']
- Where the data set is large, the optimal structure M' is well defined.
- Where the data set is small, there are many networks that explain the data fairly well.
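To give a feel for the super-exponential growth, the sketch below counts labelled directed acyclic graphs with Robinson's recurrence. The recurrence is a standard combinatorial result and is not taken from the slides; the function name `count_dags` is illustrative.

```python
from math import comb


def count_dags(n):
    """Number of labelled DAGs on n nodes, via Robinson's recurrence."""
    a = [1]  # a[0] = 1: the empty graph
    for m in range(1, n + 1):
        a.append(sum((-1) ** (k + 1) * comb(m, k) * 2 ** (k * (m - k)) * a[m - k]
                     for k in range(1, m + 1)))
    return a[n]


# Print the count for a few network sizes.
for n in range(1, 11):
    print(n, count_dags(n))
```

Already at five nodes there are 29281 structures, which is why exhaustive enumeration is not an option for realistic gene networks.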
MCMC
MCMC samples networks from their posterior distribution.
Calculate the posterior probability of a feature f:
$$P(f \mid D) = \frac{\sum_i P(f \mid M_i)\, P(M_i \mid D)}{\sum_i P(M_i \mid D)}$$
where
$$P(M_k \mid D) = \frac{P(D \mid M_k)\, P(M_k)}{\sum_i P(D \mid M_i)\, P(M_i)}$$
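When the networks M_i are drawn from P(M|D) by MCMC, the sum over models is usually approximated by a sample average, with P(f|M_i) equal to 1 if the feature is present in M_i and 0 otherwise. A minimal sketch for edge features, assuming each sampled network is stored as a set of directed edges (the representation and the name `edge_posteriors` are assumptions for illustration):

```python
def edge_posteriors(sampled_networks):
    """Fraction of sampled networks that contain each directed edge."""
    counts = {}
    for edges in sampled_networks:
        for edge in edges:
            counts[edge] = counts.get(edge, 0) + 1
    total = len(sampled_networks)
    return {edge: c / total for edge, c in counts.items()}


# Four hypothetical samples from the T = 1 chain.
samples = [
    {("A", "B"), ("A", "C")},
    {("A", "B")},
    {("A", "B"), ("C", "D")},
    {("B", "A")},
]
post = edge_posteriors(samples)
```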
[Figure: three networks over nodes 1 to 6]
Coincidence dependence
[Figure: posterior P(M|D) plotted over models M]
Escape from local optima using traditional MCMC
Small step size versus big step size
[Figure: posterior P(M|D) plotted over models M]
Problems
- Huge search space and coincidence dependence: prescreening is important!
- Local optima: the traversal operator is important!
- Fixed step size: a varied step size is more reasonable.
MCMCMC
Metropolis-coupled Markov Chain Monte Carlo (MCMCMC)
- Pre-processing method
- Traversal operators
- Algorithm
- MCMCMC for missing values
MCMCMC
Chain temperatures: $T_1 = 1$, $T_2 > T_1$, $T_3 > T_2$
For each chain, move a step based on
$$A(M', M) = \min\left(1,\ \left(\frac{P(D \mid M')\, P(M')}{P(D \mid M)\, P(M)}\right)^{1/T} \frac{Q(M \mid M')}{Q(M' \mid M)}\right)$$
Chain swap
$$S_a = \begin{pmatrix} M_1 & M_2 & \ldots & M_i & \ldots & M_k & \ldots & M_M \\ T_1 & T_2 & \ldots & T_i & \ldots & T_k & \ldots & T_M \end{pmatrix}$$
$$S_b = \begin{pmatrix} M_1 & M_2 & \ldots & M_k & \ldots & M_i & \ldots & M_M \\ T_1 & T_2 & \ldots & T_i & \ldots & T_k & \ldots & T_M \end{pmatrix}$$
Acceptance Probability
$$P_a = \min\left\{1,\ \frac{[P(D \mid M_i)\, P(M_i)]^{1/T_k}\, [P(D \mid M_k)\, P(M_k)]^{1/T_i}}{[P(D \mid M_i)\, P(M_i)]^{1/T_i}\, [P(D \mid M_k)\, P(M_k)]^{1/T_k}}\right\}$$
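Both acceptance probabilities are easier to evaluate in log space, where the posterior ratios become differences. The sketch below is one way to write them, assuming the unnormalised log posterior log P(D|M) + log P(M) is available for each model; the function and argument names are illustrative, and the 1/T exponent is applied to the posterior ratio only, as is usual for Metropolis-coupled MCMC.

```python
import math


def move_acceptance(log_post_new, log_post_old,
                    log_q_reverse, log_q_forward, temperature):
    """A(M', M) for a within-chain move at the given temperature."""
    log_ratio = ((log_post_new - log_post_old) / temperature
                 + (log_q_reverse - log_q_forward))
    return math.exp(min(0.0, log_ratio))


def swap_acceptance(log_post_i, log_post_k, temp_i, temp_k):
    """P_a for exchanging the models held by chains i and k."""
    # log of [p_i^(1/T_k) p_k^(1/T_i)] / [p_i^(1/T_i) p_k^(1/T_k)]
    log_ratio = (log_post_k - log_post_i) * (1.0 / temp_i - 1.0 / temp_k)
    return math.exp(min(0.0, log_ratio))
```

A swap that moves the better-scoring model to the colder chain is always accepted; the reverse swap is accepted with probability below one.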
Pre-processing method
$$P(D \mid M) = \int P(D \mid \theta, M)\, P(\theta \mid M)\, d\theta$$
Penalize complex models
$$\alpha_{n v \pi_n} = \frac{\alpha}{|v_n| \cdot |\pi_n|}, \qquad \alpha_{n \pi_n} = \sum_{v} \alpha_{n v \pi_n}$$
$$p(D \mid M) = \prod_{n} \prod_{\pi_n} \frac{\Gamma(\alpha_{n \pi_n})}{\Gamma(\alpha_{n \pi_n} + N_{n \pi_n})} \prod_{v} \frac{\Gamma(\alpha_{n v \pi_n} + N_{n v \pi_n})}{\Gamma(\alpha_{n v \pi_n})}$$
The log marginal likelihood is
$$\log p(D \mid M) = \sum_{n} score(n, \pi_n, D)$$
where
$$score(n, \pi_n, D) = \sum_{\pi_n} \left[\log \Gamma(\alpha_{n \pi_n}) - \log \Gamma(\alpha_{n \pi_n} + N_{n \pi_n})\right] + \sum_{\pi_n} \sum_{v} \left[\log \Gamma(\alpha_{n v \pi_n} + N_{n v \pi_n}) - \log \Gamma(\alpha_{n v \pi_n})\right]$$
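The local score can be computed from counts with the log-gamma function. The sketch below is an illustration under stated assumptions: data are discrete, each row can be indexed by node, the prior mass alpha is split uniformly over values and parent configurations, and the default `alpha=1.0` is an arbitrary choice, not a value from the slides. All names are hypothetical.

```python
from collections import Counter
from math import lgamma


def local_score(data, node, parent_set, arity, alpha=1.0):
    """score(n, pi_n, D) for one node and one candidate parent set."""
    n_values = arity[node]
    n_configs = 1
    for p in parent_set:
        n_configs *= arity[p]
    a_nvp = alpha / (n_values * n_configs)  # alpha_{n v pi_n}
    a_np = a_nvp * n_values                 # alpha_{n pi_n}, summed over v

    joint_counts = Counter()   # N_{n v pi_n}
    config_counts = Counter()  # N_{n pi_n}
    for row in data:
        config = tuple(row[p] for p in parent_set)
        joint_counts[(config, row[node])] += 1
        config_counts[config] += 1

    # Terms with a zero count cancel exactly, so only observed
    # configurations and values need to be summed.
    score = 0.0
    for count in config_counts.values():
        score += lgamma(a_np) - lgamma(a_np + count)
    for count in joint_counts.values():
        score += lgamma(a_nvp + count) - lgamma(a_nvp)
    return score
```

For a single binary node without parents and one observation, the score is log(1/2), as expected under a symmetric prior.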
- Use a maximum fan-in.
- Find all possible parents-configurations for each node and delete the low-scoring ones.
- Keep C parents-configurations for each node and cardinality.
The threshold is set as:
$$\theta = \lambda \cdot (\mathrm{scoresh} - \mathrm{scoresl}) / m + \mathrm{scoresl} / m$$
[Figure: score(n, π, D) over parents-configurations π, marking π', when the data are quite sparse and noisy, and the same plot after using the pre-screening method]
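A minimal sketch of the pre-screening step: enumerate the candidate parent sets of a node up to a maximum fan-in, score them, and keep the best few of each cardinality. The threshold rule is left out here, and `score_fn`, `max_fan_in` and `keep` are hypothetical names; the toy score in the usage lines is made up for illustration.

```python
from itertools import combinations


def prescreen(node, candidates, score_fn, max_fan_in=3, keep=10):
    """Keep the `keep` best-scoring parent sets of each cardinality."""
    kept = {}
    for size in range(max_fan_in + 1):
        scored = [(score_fn(node, ps), ps)
                  for ps in combinations(candidates, size)]
        # Highest score first; ties keep their enumeration order.
        scored.sort(key=lambda item: item[0], reverse=True)
        kept[size] = [ps for _, ps in scored[:keep]]
    return kept


def toy_score(node, parent_set):
    """Made-up score that prefers parent sets whose labels sum to 5."""
    return -abs(sum(parent_set) - 5)


kept = prescreen(0, [1, 2, 3, 4], toy_score, max_fan_in=2, keep=2)
```

Restricting every node to a short list of parent sets shrinks the search space before any MCMC step is taken.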
Traversal operators
Importance sampling: sample a parents-configuration for a node
$$p(\pi_j \mid node_i) = \frac{score(i, \pi_j, D) + C}{\sum_{k=1,\, k \neq k\_old}^{n} score(i, \pi_k, D) + (n - 1)\, C}$$
$$\frac{Q(M_{old} \mid M_{new})}{Q(M_{new} \mid M_{old})} = \frac{score(i, \pi_{k\_old}, D) + C}{\sum_{k=1,\, k \neq k\_new}^{n} score(i, \pi_k, D) + (n - 1)\, C} \cdot \frac{\sum_{k=1,\, k \neq k\_old}^{n} score(i, \pi_k, D) + (n - 1)\, C}{score(i, \pi_{k\_new}, D) + C}$$
$$\frac{P(D \mid M_{new})}{P(D \mid M_{old})} = \exp\left(score(i, \pi_{k\_new}, D) - score(i, \pi_{k\_old}, D)\right)$$
$$P(node_i) = \frac{K_i}{\sum_{j=1}^{n} K_j}$$
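A sketch of the importance-sampling move, assuming the proposal weight of a parents-configuration is its score plus a constant offset C, normalised over all configurations of the node except the current one. The offset must be large enough to make every weight positive; the function names and the example numbers are hypothetical.

```python
import random


def proposal_probs(scores, old_index, offset):
    """Proposal probability of every configuration other than the current one."""
    weights = {k: s + offset for k, s in enumerate(scores) if k != old_index}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


def sample_configuration(scores, old_index, offset, rng=random):
    """Draw the index of a new parents-configuration for the node."""
    probs = proposal_probs(scores, old_index, offset)
    indices = list(probs)
    return rng.choices(indices, weights=[probs[k] for k in indices], k=1)[0]


# Three configurations with log scores; configuration 0 is the current one.
probs = proposal_probs([-3.0, -1.0, -2.0], old_index=0, offset=4.0)
```

Because the set of excluded configurations differs between the forward and reverse move, the two normalising sums differ, which is why the proposal ratio does not cancel.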
DIN sampling: used if the new network is loopy
[Figure: DIN sampling on a three-node network, showing the old model, Step 1, Step 2 and the new model]
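Deciding whether a proposed network is loopy amounts to detecting a directed cycle. The sketch below uses a depth-first search and assumes the network is stored as a dictionary from each node to the set of its parents, with every parent also present as a key; the representation and the name `has_cycle` are illustrative, not the author's implementation.

```python
def has_cycle(parents):
    """True if the graph given as node -> set of parents contains a loop."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {node: WHITE for node in parents}

    def visit(node):
        colour[node] = GREY
        for p in parents[node]:
            # Reaching a node that is still on the current path closes a loop.
            if colour[p] == GREY or (colour[p] == WHITE and visit(p)):
                return True
        colour[node] = BLACK
        return False

    return any(colour[node] == WHITE and visit(node) for node in parents)


legal = {1: {2}, 2: {3}, 3: set()}
loopy = {1: {2}, 2: {3}, 3: {1}}
```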
$$A(M_{new} \mid M_{old}) = \min\left(1,\ \frac{P(D \mid M_{new}) \cdot P(M_{new}) \cdot Q(M_{old} \mid M_{new})}{P(D \mid M_{old}) \cdot P(M_{old}) \cdot Q(M_{new} \mid M_{old})}\right)$$
$$\frac{P(D \mid M_{new})}{P(D \mid M_{old})} = \frac{\exp\left(score(n_i, \pi_{new}(n_i), D) + \sum_{j=1}^{n} score(n_j, \pi_{new}(n_j), D)\right)}{\exp\left(score(n_i, \pi_{old}(n_i), D) + \sum_{j=1}^{n} score(n_j, \pi_{old}(n_j), D)\right)}$$
Calculating the exact proposal probability is quite time consuming, so an approximation is used:
$$\frac{Q(M_{old} \mid M_{new})}{Q(M_{new} \mid M_{old})} \approx \frac{score(i, \pi_{k\_old}, D) + C}{\sum_{k=1,\, k \neq k\_new}^{n} score(i, \pi_k, D) + (n - 1)\, C} \cdot \frac{\sum_{k=1,\, k \neq k\_old}^{n} score(i, \pi_k, D) + (n - 1)\, C}{score(i, \pi_{k\_new}, D) + C}$$
[Figure: DIN proposal compared with traditional MCMC]
Algorithm
- Initialization
- Each iteration:
  - Move a step for every chain
  - Chain swap
- Keep the samples of the first chain (T = 1)
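[Flowchart: m chains S = (M1, M2, …, Mm) with T >= 1. Each Mi proposes Mi' by importance sampling; an illegal proposal goes through DIN sampling, a legal one is accepted with A(Mi', Mi). A chain swap from S to S' is accepted with Pa(S', S). Samples are kept from the chain with T = 1.]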
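The overall loop can be sketched on a toy problem. The code below is not the network sampler from the slides: it replaces network structures with integer states, uses a symmetric step of plus or minus one instead of importance and DIN sampling, and samples a made-up bimodal target. Only the control flow (move every chain, attempt a swap, keep the T = 1 chain) mirrors the algorithm; all names are hypothetical.

```python
import math
import random


def mc3(log_post, n_states, temperatures, n_iter, seed=0):
    """Toy MCMCMC over integer states 0 .. n_states - 1."""
    rng = random.Random(seed)
    states = [rng.randrange(n_states) for _ in temperatures]
    kept = []
    for _ in range(n_iter):
        # Move a step for every chain on its tempered posterior.
        for c, temp in enumerate(temperatures):
            new = (states[c] + rng.choice((-1, 1))) % n_states
            log_a = (log_post(new) - log_post(states[c])) / temp
            if rng.random() < math.exp(min(0.0, log_a)):
                states[c] = new
        # Chain swap between two randomly chosen chains.
        i, k = rng.sample(range(len(temperatures)), 2)
        log_s = ((log_post(states[k]) - log_post(states[i]))
                 * (1.0 / temperatures[i] - 1.0 / temperatures[k]))
        if rng.random() < math.exp(min(0.0, log_s)):
            states[i], states[k] = states[k], states[i]
        # Keep the first chain, which runs at T = 1.
        kept.append(states[0])
    return kept


def toy_log_post(x):
    """Made-up bimodal target with modes at 5 and 25."""
    return math.log(math.exp(-0.5 * (x - 5) ** 2) + math.exp(-0.5 * (x - 25) ** 2))


samples = mc3(toy_log_post, 30, [1, 1, 3, 9, 30], n_iter=2000)
```

The hot chains move almost freely between the two modes, and swaps pass those states down to the cold chain, which is the mechanism meant to help escape local optima.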
MCMCMC for missing values
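I4 I3 I10 I6
2  2  1   1

I1 I2 I3 I4 I5 I6 I7 I8 I9 I10 I11 I12
1  2  2  2  1  1  2  1  2  2   1   2

Data with missing values (? marks a missing entry):
X1 X2 X3 X4 X5
1  ?  2  1  1
2  2  2  1  2
?  1  1  ?  1
1  2  ?  2  ?
?  1  1  1  2
1  1  2  ?  1
1  ?  2  1  1
2  1  1  2  ?
2  2  ?  1  2
1  2  2  ?  ?

Index of the missing entries as (variable, row):
I1 = (1, 3), I2 = (1, 5), I3 = (2, 1), I4 = (2, 7), I5 = (3, 4), I6 = (3, 9), …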
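[Flowchart: m chains S = ((M1, D1), (M2, D2), …, (Mm, Dm)) with T >= 1. Each Mi proposes Mi' by importance sampling; an illegal proposal goes through DIN sampling, a legal one is accepted with A(Mi', Mi | Di). The missing part Dmi of the data is updated to Dmi', which together with the observed data gives Di', accepted with A(Di', Di | Gi). A chain swap from S to S' is accepted with Pa(S', S). Samples are kept from the chain with T = 1.]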
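Proposal method before burn in
$$Q(v_n \mid M, n, m) = \frac{N_{v_n v_{\pi}^{m}} + 1}{\sum_{v_n} \left(N_{v_n v_{\pi}^{m}} + 1\right)}$$
$$Q(v_n, v_{\pi^{mis}} \mid M, n, m) = \frac{N_{v_n v_{\pi^{mis}} v_{\pi^{nm}}} + 1}{\sum_{v_n, v_{\pi^{mis}}} \left(N_{v_n v_{\pi^{mis}} v_{\pi^{nm}}} + 1\right)}$$
$$Q(v_n \mid M, n) = \frac{N_{v_n} + 1}{\sum_{v_n} \left(N_{v_n} + 1\right)}$$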
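The simplest of these proposals, for a node whose value is drawn from add-one smoothed counts of its own observed values, can be sketched as follows. The function name and the input layout are assumptions; the example values are the observed entries of one column of a small data table with missing entries.

```python
from collections import Counter


def smoothed_proposal(observed_values, domain):
    """Q(v_n | M, n): (N_v + 1) normalised over all values v in the domain."""
    counts = Counter(observed_values)
    total = sum(counts[v] + 1 for v in domain)
    return {v: (counts[v] + 1) / total for v in domain}


# Eight observed entries of a binary variable with values 1 and 2.
q = smoothed_proposal([1, 2, 1, 1, 1, 2, 2, 1], domain=(1, 2))
```

The added one keeps every value proposable even when it has not been observed yet.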
X1 X2 X3 X4 X5
1  ?  2  1  1
2  2  2  1  2
?  1  1  2  1
1  2  ?  ?  ?
?  1  1  1  2
1  1  2  ?  1
1  ?  2  1  1
2  1  1  2  ?
2  2  ?  1  2
1  2  2  ?  ?

[Figure: network over nodes 1 to 5]
Acceptance probability
$$Accept(MissVal', MissVal) = \min\left(1,\ \frac{Q(MissVal \mid MissVal')\, P(D' \mid M)}{Q(MissVal' \mid MissVal)\, P(D \mid M)}\right)$$
After burn in
Acceptance probability
$$Accept(MissVal', MissVal) = \min\left(1,\ \frac{Q(MissVal \mid MissVal')\, P(D' \mid M)}{Q(MissVal' \mid MissVal)\, P(D \mid M)}\right)$$
$$Q(MissVal' \mid MissVal) = \prod_{i \in \Omega(cmis)} \frac{N_{i\_new} + 1}{\sum_{j} \left(N_{ij} + 1\right)}$$
$$Q(MissVal \mid MissVal') = \prod_{i \in \Omega(cmis)} \frac{N'_{i\_old} + 1}{\sum_{j} \left(N'_{ij} + 1\right)}$$
Result Evaluation (complete data)
$$sensitivity = \frac{tp}{tp + fn}$$
$$specificity = \frac{tn}{tn + fp}$$
$$complementary\ specificity = 1 - \frac{tn}{tn + fp} = \frac{fp}{tn + fp}$$
[Figure: two networks over nodes 1 to 4, with edges marked tp, fn, fp and tn]
ROC curve
- tp is the number of true positive edges.
- fn is the number of false negative edges.
- fp is the number of false positive edges.
- tn is the number of true negative edges.
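These quantities can be computed by comparing a predicted edge set with the true one, and an ROC curve follows by thresholding the posterior edge probabilities. The sketch below assumes directed edges stored as tuples and that the true network has at least one edge and at least one non-edge; the names and the small example are illustrative.

```python
def edge_rates(true_edges, predicted_edges, all_edges):
    """Return (sensitivity, complementary specificity) for one prediction."""
    tp = len(true_edges & predicted_edges)
    fp = len(predicted_edges - true_edges)
    fn = len(true_edges - predicted_edges)
    tn = len(all_edges - true_edges - predicted_edges)
    return tp / (tp + fn), fp / (tn + fp)


def roc_points(true_edges, edge_posterior, all_edges):
    """One (complementary specificity, sensitivity) point per threshold."""
    points = []
    for threshold in sorted(set(edge_posterior.values()), reverse=True):
        predicted = {e for e, p in edge_posterior.items() if p >= threshold}
        sens, comp = edge_rates(true_edges, predicted, all_edges)
        points.append((comp, sens))
    return points


nodes = (1, 2, 3)
all_edges = {(a, b) for a in nodes for b in nodes if a != b}
true_edges = {(1, 2), (2, 3)}
posterior = {(1, 2): 0.9, (2, 3): 0.4, (1, 3): 0.6}
points = roc_points(true_edges, posterior, all_edges)
```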
Model Genetic Network
- MCMCMC against order MCMC
- MCMCMC against structure MCMC
- MCMCMC against Population MCMC
- Temperatures= [1, 1, 3, 9, 30]
- Keep at most 10 parents-configurations for each node and cardinality.
- With 60000 iterations: 30000 burn in and keep the last 30000 samples.
Alarm Network
Arabidopsis data
Result Evaluation (missing values)
Model Genetic Network
- Before-burn-in algorithm: 30000 burn in, 30000 iterations
- After-burn-in algorithm: 40000 iterations
- Temp = [1, 1, 3, 9, 12]
- ROC curves for noise = 0.2 and data = 200 with different missing rates
- Temp = [1, 1, 3, 9, 12]
- Use 30000 burn in and 30000 iterations, keeping one sample every 10 steps (before-burn-in algorithm).
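Discarding the burn-in and thinning the remainder is a one-line slice once the visited states are stored in order. A minimal sketch, with the chain replaced by a list of iteration numbers for illustration and a hypothetical helper name:

```python
def keep_samples(chain, burn_in, thin=1):
    """Drop the first burn_in states, then keep every thin-th state."""
    return chain[burn_in::thin]


# 30000 burn-in steps followed by 30000 sampling steps, one sample every 10.
chain = list(range(60000))
kept = keep_samples(chain, burn_in=30000, thin=10)
```

With these settings 3000 samples remain for estimating the edge posteriors.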
B cell Lymphoma data
Summary
- MCMCMC
- Order MCMC
- Structure MCMC
- Population MCMC
Problems with MCMCMC
[Figure: logP(D|M) + logP(M) against iterations for chains with T=1 and T=50]