

slide-1
SLIDE 1

Using selective pressure to improve protein tridimensional structure prediction Aude GRELAUD Context Markov random fields Parameter posterior distribution Model choice Simulations Conclusion

Using selective pressure to improve protein tridimensional structure prediction

Aude GRELAUD (1,2), Jean-Michel MARIN (3), Christian P. ROBERT (1), François RODOLPHE (2)

(1) Cérémade, Université Paris Dauphine, and Laboratoire de statistique, CREST-INSEE
(2) Unité Mathématique, Informatique et Génome, INRA
(3) INRIA Saclay

MIEP, Hameau de l'Etoile, June 2008

slide-2
SLIDE 2


1. Context
2. Markov random fields
3. Parameter posterior distribution
4. Model choice
5. Simulations
6. Conclusion

slide-3
SLIDE 3


Aim

Predict the three-dimensional (3D) structure of a protein, knowing its amino acid sequence:

Met Thr Gln Cys ···

slide-4
SLIDE 4


Existing methods

  • Experimental methods:
      • X-ray crystallography
      • Nuclear magnetic resonance spectroscopy
      • Cryomicroscopy

    Expensive and slow, but provide the exact 3D structure

  • Computational methods:
      • Based on homologies with proteins of known structure: methods based on sequence similarity, protein threading
      • De novo prediction

    Give several possible 3D structures, with no criterion for choosing between them

slide-5
SLIDE 5


Purpose: build a ranking method based on a phylogenetic stability criterion

  • In a 3D structure, amino acids in contact frequently have similar modification tolerances
  • Criterion: selective pressure sequence
slide-6
SLIDE 6


Data

  • First step: estimate the selective pressure sequence ω1, ..., ωn on a multiple alignment of homologs

    Seq : caa agg tgc tta
    H1  : cat agg tgc gta
    H2  : cat tgg tgc cta
    H3  : aat tgg tgc ctg
                 ↓
          ω1  ω2  ω3  ω4

  • m folding candidates
  • r
slide-7
SLIDE 7


Statistical tools

  • Markov random fields
  • ABC (Approximate Bayesian Computation)
  • Bayesian model choice
slide-8
SLIDE 8


slide-9
SLIDE 9


Definition

  • Markov chain:
  • Markov random field: generalisation of a Markov chain
slide-10
SLIDE 10


Definition (2)

  • The state at a point i only depends on the states of its neighbours n(i):

    $\pi(x_i = j \mid x_{-i}) = \pi(x_i = j \mid x_{n(i)})$

  • Hammersley-Clifford theorem:

    $P(X = x) = \frac{1}{Z} \exp(-U(x))$

    with

      • U(x): potential

        $U(x) = \sum_{c \in C} V_c(x)$

        $U(x) = -\theta \sum_{(i,j):\, i \overset{s}{\sim} j} \mathbf{1}\{x_i = x_j\}$

      • Z: normalizing constant

        $Z = \sum_{x} \exp(-U(x))$
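
The definitions above can be made concrete on a toy example. The sketch below is only an illustration: the 4-site cycle, the two states per site, and the function names are assumptions, not part of the talk. It computes U(x), Z and P(X = x) by enumerating every configuration, which is only feasible for very small systems.

```python
import itertools
import math

def potential(x, edges, theta):
    """U(x) = -theta * (number of neighbouring pairs in the same state)."""
    return -theta * sum(1 for i, j in edges if x[i] == x[j])

def normalizing_constant(n_sites, n_states, edges, theta):
    """Z = sum over all configurations of exp(-U(x)).

    The sum has n_states ** n_sites terms, hence the cost for large systems.
    """
    return sum(
        math.exp(-potential(x, edges, theta))
        for x in itertools.product(range(n_states), repeat=n_sites)
    )

def probability(x, n_states, edges, theta):
    """P(X = x) = exp(-U(x)) / Z."""
    z = normalizing_constant(len(x), n_states, edges, theta)
    return math.exp(-potential(x, edges, theta)) / z

# Hypothetical toy neighbourhood: 4 sites on a cycle, 2 states per site.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
theta = 0.5
p_same = probability((0, 0, 0, 0), 2, edges, theta)  # all neighbours agree
p_alt = probability((0, 1, 0, 1), 2, edges, theta)   # no neighbours agree
print(p_same, p_alt)
```

With θ > 0, configurations in which neighbours agree receive more mass, which is the behaviour the potential is meant to encode.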

slide-11
SLIDE 11


slide-12
SLIDE 12


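Bayesian modelling

  • Prior distribution:

    $\theta \sim \pi(\theta)$

  • Likelihood:

    $(X \mid \theta) \sim \mathrm{MRF}(\theta)$

    $f(x \mid \theta) = \frac{1}{Z_\theta} \exp\Big(\theta \sum_{(i,j):\, i \sim j} \mathbf{1}\{x_i = x_j\}\Big)$

Target: posterior distribution of θ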

slide-13
SLIDE 13


Parameter posterior distribution

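MCMC methods: Hastings-Metropolis algorithm:

  • Proposal: $\theta' \sim p(\theta' \mid \theta^{(t)})$
  • $\theta^{(t+1)} = \theta'$ with probability

    $\min\left\{1,\ \dfrac{\frac{1}{Z_{\theta'}}\, q_{\theta'}(X)}{\frac{1}{Z_{\theta^{(t)}}}\, q_{\theta^{(t)}}(X)} \cdot \dfrac{p(\theta^{(t)} \mid \theta')}{p(\theta' \mid \theta^{(t)})} \cdot \dfrac{\pi(\theta')}{\pi(\theta^{(t)})}\right\}$

The ratio involves the intractable normalizing constants $Z_{\theta'}$ and $Z_{\theta^{(t)}}$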
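
To see why the normalizing constants are the bottleneck, here is a minimal sketch of the Hastings-Metropolis step on a system small enough that Z can be enumerated. Everything specific is an assumption made for illustration: a binary chain of 8 sites, a flat prior on [0, 2], a symmetric Gaussian random-walk proposal, and the function names. Each evaluation of Z costs 2^n terms, so this approach does not scale to realistic n.

```python
import itertools
import math
import random

def count_equal_pairs(x, edges):
    """S(x): number of neighbouring pairs in the same state."""
    return sum(1 for i, j in edges if x[i] == x[j])

def log_z(theta, n_sites, edges):
    """log Z_theta by enumerating all 2 ** n_sites binary configurations."""
    return math.log(sum(
        math.exp(theta * count_equal_pairs(x, edges))
        for x in itertools.product((0, 1), repeat=n_sites)
    ))

def log_accept_ratio(theta_new, theta_old, s_obs, n_sites, edges):
    """Log ratio for a symmetric proposal and a flat prior (both assumptions).

    q_theta(X) = exp(theta * S(X)), so the data enter only through s_obs.
    """
    log_f_new = theta_new * s_obs - log_z(theta_new, n_sites, edges)
    log_f_old = theta_old * s_obs - log_z(theta_old, n_sites, edges)
    return log_f_new - log_f_old

def metropolis(s_obs, n_sites, edges, n_iter=200, step=0.3, seed=0):
    rng = random.Random(seed)
    theta, chain = 0.0, []
    for _ in range(n_iter):
        proposal = theta + rng.gauss(0.0, step)  # symmetric random walk
        if 0.0 <= proposal <= 2.0:               # support of the assumed flat prior
            log_r = log_accept_ratio(proposal, theta, s_obs, n_sites, edges)
            if rng.random() < math.exp(min(0.0, log_r)):
                theta = proposal
        chain.append(theta)
    return chain

edges = [(i, i + 1) for i in range(7)]  # chain neighbourhood on 8 sites
chain = metropolis(s_obs=5, n_sites=8, edges=edges)
print(sum(chain) / len(chain))
```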

slide-14
SLIDE 14


ABC : Approximate Bayesian Computation

  • Bayesian inference without using the likelihood
  • Idea: data sets that are sufficiently close provide similar parameter posterior distributions
  • What we need:
      • Simulate data given parameter values
      • Summary statistics (sufficient)
      • Calculate closeness between our data ($X^0$) and simulated data ($X^{i*}$): distance between summary statistics

slide-15
SLIDE 15


ABC Algorithm

  • Sufficient statistic: $S(X) = \sum_{(i,j):\, i \sim j} \mathbf{1}\{x_i = x_j\}$
  • Distance: $d(S(X^0), S(X^{i*})) = (S(X^0) - S(X^{i*}))^2$
  • Algorithm:
      • Generate $\theta^{i*} \sim \pi(\theta)$
      • Generate $(X^{i*} \mid \theta^{i*}) \sim \mathrm{MRF}(\theta^{i*})$
      • Calculate $d_i = d(S(X^0), S(X^{i*}))$
      • Accept $\theta^{i*}$ if $d_i < \varepsilon$
  • Result: sample of independent draws from $f(\theta \mid d < \varepsilon)$, a good approximation of $f(\theta \mid X^0)$
  • In practice, ε is the 1% quantile of d
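
The algorithm above can be sketched for the simplest neighbourhood, a binary chain where each site is a neighbour of the next one. This is an illustrative sketch under stated assumptions, not the authors' implementation: the uniform prior on [0, 2], the chain length, the number of simulations and all function names are hypothetical. On a chain the MRF can be simulated exactly as a Markov chain; a general neighbourhood would need another sampler. Because S is discrete and ties are common, the code keeps the closest 1% of simulations rather than applying a strict threshold.

```python
import math
import random

def simulate_chain(theta, n, rng):
    """Exact draw from f(x | theta) proportional to exp(theta * S(x)) on a binary chain.

    Uniform first state, then each site copies its predecessor with
    probability exp(theta) / (1 + exp(theta)).
    """
    p_same = math.exp(theta) / (1.0 + math.exp(theta))
    x = [rng.randint(0, 1)]
    for _ in range(n - 1):
        x.append(x[-1] if rng.random() < p_same else 1 - x[-1])
    return x

def summary(x):
    """S(x): number of neighbouring pairs (i, i+1) sharing the same state."""
    return sum(1 for a, b in zip(x, x[1:]) if a == b)

def abc_rejection(x_obs, n_sim=5000, quantile=0.01, seed=1):
    rng = random.Random(seed)
    s_obs, n = summary(x_obs), len(x_obs)
    draws = []
    for _ in range(n_sim):
        theta = rng.uniform(0.0, 2.0)        # prior draw (assumed uniform prior)
        x_sim = simulate_chain(theta, n, rng)
        draws.append(((s_obs - summary(x_sim)) ** 2, theta))
    draws.sort(key=lambda pair: pair[0])     # smallest distances first
    n_keep = max(1, round(quantile * n_sim)) # keep the closest 1% of simulations
    return [theta for _, theta in draws[:n_keep]]

rng = random.Random(0)
x_obs = simulate_chain(1.0, 100, rng)        # synthetic "observed" data
posterior_sample = abc_rejection(x_obs)
print(sum(posterior_sample) / len(posterior_sample))
```

The retained values of θ form the approximate posterior sample; their mean or quantiles can then be used as point or interval estimates.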
slide-16
SLIDE 16


slide-17
SLIDE 17


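Bayesian hierarchical modelling

1 model ↔ 1 neighborhood / 3D structure

  • Prior distributions:
      • $s \sim \pi(s)$
      • $(\theta_s \mid s) \sim \pi_s(\theta_s)$
  • Likelihood:

    $(X \mid \theta_s, s) \sim \mathrm{MRF}(\theta_s, s)$

    $f_s(x \mid \theta_s) = \frac{1}{Z_{\theta_s, s}} \exp\Big(\theta_s \sum_{(i,j):\, i \overset{s}{\sim} j} \mathbf{1}\{x_i = x_j\}\Big)$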

slide-18
SLIDE 18


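Bayes factor definition

  • $BF_{0/1} = \dfrac{P(s=0 \mid X)\,/\,P(s=1 \mid X)}{P(s=0)\,/\,P(s=1)} = \dfrac{\int f_0(X \mid \theta_0)\,\pi_0(\theta_0)\,d\theta_0}{\int f_1(X \mid \theta_1)\,\pi_1(\theta_1)\,d\theta_1}$
  • Interpretation:
      • BF > 1: Model 0
      • BF < 1: Model 1
  • Jeffreys scale:

    Evidence     | BF in favour of Model 1  | BF in favour of Model 0
    decisive     | < 10^{-2}                | > 10^{2}
    very strong  | [10^{-2}, 10^{-3/2}]     | [10^{3/2}, 10^{2}]
    strong       | [10^{-3/2}, 10^{-1}]     | [10^{1}, 10^{3/2}]
    substantial  | [10^{-1}, 10^{-1/2}]     | [10^{1/2}, 10^{1}]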
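
As an illustration of how the scale is read, the following sketch maps a Bayes factor to the favoured model and the strength of evidence. The function name is hypothetical, the labels use the usual English wording ("very strong", "strong") for the two middle categories, and the treatment of the interval boundaries (which overlap on the slide) is an arbitrary choice.

```python
import math

def jeffreys_category(bf):
    """Return (favoured model, strength of evidence) for a Bayes factor BF_{0/1} > 0."""
    log10_bf = math.log10(bf)
    if log10_bf > 0:
        favoured = "Model 0"
    elif log10_bf < 0:
        favoured = "Model 1"
    else:
        favoured = "neither"
    strength = abs(log10_bf)
    if strength > 2.0:
        label = "decisive"
    elif strength > 1.5:
        label = "very strong"
    elif strength > 1.0:
        label = "strong"
    elif strength > 0.5:
        label = "substantial"
    else:
        label = "below the scale"  # |log10 BF| <= 1/2 is not listed on the slide
    return favoured, label

for bf in (150.0, 5.0, 0.05):
    print(bf, jeffreys_category(bf))
```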

slide-19
SLIDE 19


Another way to write the Bayes factor

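$P(S_i(X) = s \mid \theta_i) = \sum_{X:\, S_i(X) = s} f_i(X \mid \theta_i) = \frac{1}{Z_{\theta_i, i}} \exp(\theta_i s)\, \mathrm{card}\{X : S_i(X) = s\}$

$BF_{0/1} = \dfrac{\mathrm{card}\{X : S_1(X) = s_1\}}{\mathrm{card}\{X : S_0(X) = s_0\}} \cdot \dfrac{\int P(S_0(X) = s_0 \mid \theta_0)\,\pi_0(\theta_0)\,d\theta_0}{\int P(S_1(X) = s_1 \mid \theta_1)\,\pi_1(\theta_1)\,d\theta_1}$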
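
The first identity can be checked numerically on a small example. The sketch below assumes a binary chain neighbourhood of length n and enumerates all 2^n configurations; the function names and the values of θ, n and s are illustrative choices only. It relies on the fact that every configuration with the same value of S has the same density.

```python
import itertools
import math

def s_chain(x):
    """S(x) for a chain neighbourhood: number of equal adjacent pairs."""
    return sum(1 for a, b in zip(x, x[1:]) if a == b)

def check_identity(theta, n, s):
    configs = list(itertools.product((0, 1), repeat=n))
    z = sum(math.exp(theta * s_chain(x)) for x in configs)
    # Left-hand side: sum of f(X | theta) over configurations with S(X) = s.
    lhs = sum(math.exp(theta * s_chain(x)) / z for x in configs if s_chain(x) == s)
    # Right-hand side: exp(theta * s) * card{X : S(X) = s} / Z.
    card = sum(1 for x in configs if s_chain(x) == s)
    rhs = math.exp(theta * s) * card / z
    return lhs, rhs, card

lhs, rhs, card = check_identity(theta=0.8, n=6, s=3)
print(lhs, rhs, card)
```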

slide-20
SLIDE 20


ABC algorithm

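  • Vector of summary statistics: $S(X) = (S_1(X), \dots, S_m(X))$ with $S_s(X) = \sum_{(i,j):\, i \overset{s}{\sim} j} \mathbf{1}\{x_i = x_j\}$
  • Distance: $d(S(X^0), S(X^{i*})) = \sum_s (S_s(X^0) - S_s(X^{i*}))^2$
  • Algorithm:
      • Generate $s^{i*} \sim \pi(s)$
      • Generate $(\theta^*_{s^{i*}} \mid s^{i*}) \sim \pi_{s^{i*}}(\theta^*_{s^{i*}})$
      • Generate $(X^{i*} \mid \theta^*_{s^{i*}}, s^{i*}) \sim \mathrm{MRF}(\theta^*_{s^{i*}}, s^{i*})$
      • Calculate $d_i = d(S(X^0), S(X^{i*}))$
      • Accept $(s^{i*}, \theta^*_{s^{i*}})$ if $d_i < \varepsilon$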

slide-21
SLIDE 21


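  • Result: the accepted sample $\big((s^{i*}, \theta^*_{s^{i*}})\big)_i$, for which

    $\dfrac{\mathrm{card}(s^{i*} = 0)}{\mathrm{card}(s^{i*} = 1)}$ is an estimate of $\dfrac{\int P(S_0(X) = s_0 \mid \theta_0)\,\pi_0(\theta_0)\,d\theta_0}{\int P(S_1(X) = s_1 \mid \theta_1)\,\pi_1(\theta_1)\,d\theta_1}$

  • Calculate $\dfrac{\mathrm{card}\{X : S_1(X) = s_1\}}{\mathrm{card}\{X : S_0(X) = s_0\}}$ to obtain BF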

slide-22
SLIDE 22


slide-23
SLIDE 23


Models

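  • M0: iid case, Bernoulli(p)
      • $f_0(X \mid \theta, m = 0) = \frac{1}{Z_{\theta,0}} \exp\Big(\theta \sum_i \mathbf{1}\{x_i = 1\}\Big)$
      • $p = \dfrac{\exp(\theta)}{1 + \exp(\theta)}$
  • M1: Markov chain with transition matrix P
      • $f(X \mid \theta, 1) = \dfrac{\exp\Big(2\theta \sum_{i=1}^{n-1} \mathbf{1}\{x_i = x_{i+1}\}\Big)}{2\,(1 + \exp(2\theta))^{n-1}}$
      • $P = \begin{pmatrix} \frac{\exp(2\theta)}{1+\exp(2\theta)} & \frac{1}{1+\exp(2\theta)} \\ \frac{1}{1+\exp(2\theta)} & \frac{\exp(2\theta)}{1+\exp(2\theta)} \end{pmatrix}$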
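
The link between the two exponential-family densities and their probabilistic descriptions can be checked by enumeration. The sketch below is an illustration under one assumption not stated on the slide: the first state of the Markov chain is drawn uniformly, which is what the factor 2 in the denominator of the M1 density suggests. Function names and the values of θ and n are hypothetical.

```python
import itertools
import math

def f0(x, theta):
    """M0: iid Bernoulli(p) with p = exp(theta) / (1 + exp(theta))."""
    p = math.exp(theta) / (1.0 + math.exp(theta))
    return math.prod(p if xi == 1 else 1.0 - p for xi in x)

def f1(x, theta):
    """M1: uniform first state (assumption), then transition matrix P with
    P(stay) = exp(2 theta) / (1 + exp(2 theta))."""
    stay = math.exp(2 * theta) / (1.0 + math.exp(2 * theta))
    prob = 0.5
    for a, b in zip(x, x[1:]):
        prob *= stay if a == b else 1.0 - stay
    return prob

def f0_closed_form(x, theta):
    """exp(theta * S0(x)) / (1 + exp(theta)) ** n, with S0(x) = sum_i 1{x_i = 1}."""
    return math.exp(theta * sum(x)) / (1.0 + math.exp(theta)) ** len(x)

def f1_closed_form(x, theta):
    """exp(2 theta * S1(x)) / (2 * (1 + exp(2 theta)) ** (n - 1))."""
    s1 = sum(1 for a, b in zip(x, x[1:]) if a == b)
    return math.exp(2 * theta * s1) / (2.0 * (1.0 + math.exp(2 * theta)) ** (len(x) - 1))

theta, n = 0.4, 5
max_gap = max(
    max(abs(f0(x, theta) - f0_closed_form(x, theta)),
        abs(f1(x, theta) - f1_closed_form(x, theta)))
    for x in itertools.product((0, 1), repeat=n)
)
print(max_gap)
```

The comparison also shows that $Z_{\theta,0} = (1 + \exp(\theta))^n$ in the M0 density.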

slide-24
SLIDE 24


Parameter estimation

  • Comparison of ML and ABC estimates: $|\theta_{\mathrm{ABC}} - \theta_{\mathrm{ML}}|$

    1st Qu.  | Median   | Mean     | 3rd Qu.   | Var
    1.68e-03 | 3.28e-03 | 3.27e-03 | 5.001e-03 | 3.89e-06

slide-25
SLIDE 25


Model choice

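  • $BF = \dfrac{C_{n-1}^{s_1}}{C_{n}^{s_0}} \cdot \dfrac{\mathrm{card}(s^{i*} = 0)}{\mathrm{card}(s^{i*} = 1)}$

  • $BF = \dfrac{\int \frac{\exp(\theta_0 S_0(X))}{(1+\exp(\theta_0))^{n}}\,\pi_0(\theta_0)\,d\theta_0}{\int \frac{\exp(2\theta_1 S_1(X))}{(1+\exp(2\theta_1))^{n-1}}\,\pi_1(\theta_1)\,d\theta_1}$

  • Comparison on 10,000 simulated data sets (decisions given by the two versions of BF above, one in rows and the other in columns):

           |   M0 |    ? |   M1
      M0   | 4684 |    2 |   30
      ?    |  158 |  339 |   53
      M1   |    5 |  261 | 4464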

slide-26
SLIDE 26


slide-27
SLIDE 27


Conclusion

  • Conclusion:
      • Parameter estimation in MRF is not too expensive using ABC
      • BF is a good way to choose between some neighborhoods
  • Perspectives:
      • Estimate / calculate the number of configurations given a value of S
      • Apply to biological data
slide-28
SLIDE 28
