Quantifying the Equilibrium and Irreversibility Properties of the Nucleotide Substitution Process

SLIDE 1

Quantifying the Equilibrium and Irreversibility Properties of the Nucleotide Substitution Process

Federico Squartini and Peter F. Arndt, Max Planck Institute for Molecular Genetics

Quantifying the Equilibrium and Irreversibility Properties of the Nucleotide Substitution Process – p.1

SLIDE 2

We will talk about disequilibrium and irreversibility...

SLIDE 3

Markovian Sequence Evolution

Nucleotide substitution models: i.i.d. Markov models of evolution, i.e. a master equation:

$$\frac{\partial}{\partial t}\rho_\beta(t) = \sum_{\alpha} Q_{\beta\alpha}\,\rho_\alpha(t), \qquad \alpha, \beta \in \{A, G, C, T\}$$

[Figure: substitution diagram connecting the four nucleotides G, T, C, A]

SLIDE 4

Markovian Sequence Evolution

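The rate matrix collects the substitution rates between the four nucleotides; its diagonal entries (·) are fixed by probability conservation, so that every column sums to zero:

$$Q = \begin{array}{c|cccc}
 & A & C & G & T \\ \hline
A & \cdot & Q_{AC} & Q_{AG} & Q_{AT} \\
C & Q_{CA} & \cdot & Q_{CG} & Q_{CT} \\
G & Q_{GA} & Q_{GC} & \cdot & Q_{GT} \\
T & Q_{TA} & Q_{TC} & Q_{TG} & \cdot
\end{array}$$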

SLIDE 5

Markovian Sequence Evolution


The solution of the master equation with initial condition ρ0 is:

$$\rho_\beta(t) = \left(e^{Qt}\rho^0\right)_\beta, \qquad P(t) = e^{Qt}.$$
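A minimal numerical sketch of this solution, assuming NumPy and reading Q[β, α] as the rate α → β (which is what the master equation implies, so the columns of Q sum to zero). The helper names are hypothetical, and the hand-rolled matrix exponential is a plain scaling-and-squaring Taylor series chosen only to keep the example dependency-light, not a production routine.

```python
import numpy as np

def fill_diagonal(rates):
    """Return a copy of `rates` whose diagonal makes every column sum to zero.

    Q[b, a] is read as the rate a -> b, so probability conservation in the
    master equation requires each column of Q to sum to zero.
    """
    Q = np.array(rates, dtype=float)
    np.fill_diagonal(Q, 0.0)
    np.fill_diagonal(Q, -Q.sum(axis=0))
    return Q

def expm(A, squarings=10, terms=20):
    """Matrix exponential via scaling and squaring of a truncated Taylor series.

    Adequate for small, moderately scaled matrices such as 4x4 rate matrices.
    """
    A = np.asarray(A, dtype=float) / 2.0 ** squarings
    result = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms + 1):
        term = term @ A / k          # A^k / k!
        result = result + term
    for _ in range(squarings):
        result = result @ result     # undo the scaling: exp(A) = exp(A / 2^s)^(2^s)
    return result

def propagate(Q, rho0, t):
    """rho(t) = P(t) rho0 with P(t) = exp(Q t)."""
    return expm(Q * t) @ np.asarray(rho0, dtype=float)

# Jukes-Cantor-like example: every off-diagonal rate equals mu = 1.
Q = fill_diagonal(np.ones((4, 4)))
rho0 = np.array([1.0, 0.0, 0.0, 0.0])   # all probability on the first state
rho_t = propagate(Q, rho0, t=0.5)
print(rho_t, rho_t.sum())
```

For this matrix the exact result is 1/4 + 3/4 e^{-4µt} for the initial state and 1/4 - 1/4 e^{-4µt} for each of the others, which the propagation should reproduce; in practice a library routine such as `scipy.linalg.expm` would replace the hand-rolled exponential.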

Such a model is not complete. . .

SLIDE 6

Choosing Parameters

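Specifying an evolutionary model ⇒ postulating a form for the rate matrix:

$$Q = \begin{array}{c|cccc}
 & A & C & G & T \\ \hline
A & \cdot & Q_{AC} & Q_{AG} & Q_{AT} \\
C & Q_{CA} & \cdot & Q_{CG} & Q_{CT} \\
G & Q_{GA} & Q_{GC} & \cdot & Q_{GT} \\
T & Q_{TA} & Q_{TC} & Q_{TG} & \cdot
\end{array}$$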

SLIDE 7

Choosing Parameters


Widely used models:

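Jukes-Cantor:

$$Q = \begin{array}{c|cccc}
 & T & C & A & G \\ \hline
T & \cdot & \mu & \mu & \mu \\
C & \mu & \cdot & \mu & \mu \\
A & \mu & \mu & \cdot & \mu \\
G & \mu & \mu & \mu & \cdot
\end{array}$$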

SLIDE 8

Choosing Parameters


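Kimura:

$$Q = \begin{array}{c|cccc}
 & T & C & A & G \\ \hline
T & \cdot & \alpha & \beta & \beta \\
C & \alpha & \cdot & \beta & \beta \\
A & \beta & \beta & \cdot & \alpha \\
G & \beta & \beta & \alpha & \cdot
\end{array}$$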

SLIDE 9

Choosing Parameters


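Felsenstein:

$$Q = \begin{array}{c|cccc}
 & T & C & A & G \\ \hline
T & \cdot & \pi_T & \pi_T & \pi_T \\
C & \pi_C & \cdot & \pi_C & \pi_C \\
A & \pi_A & \pi_A & \cdot & \pi_A \\
G & \pi_G & \pi_G & \pi_G & \cdot
\end{array}$$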

SLIDE 10

Choosing Parameters


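Hasegawa-Kishino-Yano:

$$Q = \begin{array}{c|cccc}
 & T & C & A & G \\ \hline
T & \cdot & k\pi_T & \pi_T & \pi_T \\
C & k\pi_C & \cdot & \pi_C & \pi_C \\
A & \pi_A & \pi_A & \cdot & k\pi_A \\
G & \pi_G & \pi_G & k\pi_G & \cdot
\end{array}$$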

SLIDE 11

Choosing Parameters


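Tamura-Nei:

$$Q = \begin{array}{c|cccc}
 & T & C & A & G \\ \hline
T & \cdot & k_1\pi_T & \pi_T & \pi_T \\
C & k_1\pi_C & \cdot & \pi_C & \pi_C \\
A & \pi_A & \pi_A & \cdot & k_2\pi_A \\
G & \pi_G & \pi_G & k_2\pi_G & \cdot
\end{array}$$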
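As a sketch of how these models nest into one another, the hypothetical helpers below build each matrix in the T, C, A, G order of the slides, assuming NumPy and the convention Q[β, α] = rate α → β with the diagonal fixed so that columns sum to zero. Tamura-Nei is written out once and the other models are obtained by tying its parameters together; Kimura and Jukes-Cantor use uniform frequencies, rescaled to reproduce the α, β and µ entries.

```python
import numpy as np

STATES = "TCAG"  # row/column order of the matrices on these slides

def tamura_nei(pi, k1, k2):
    """Rate matrix with Q[b, a] = rate a -> b, rows/columns ordered T, C, A, G.

    pi maps each base to its frequency; k1 scales the pyrimidine
    transitions T<->C and k2 the purine transitions A<->G.
    """
    Q = np.zeros((4, 4))
    for bi, b in enumerate(STATES):
        for ai, a in enumerate(STATES):
            if a == b:
                continue
            if {a, b} == {"T", "C"}:
                scale = k1
            elif {a, b} == {"A", "G"}:
                scale = k2
            else:
                scale = 1.0  # transversion
            Q[bi, ai] = scale * pi[b]
    # Diagonal fixed by probability conservation: columns sum to zero.
    np.fill_diagonal(Q, -Q.sum(axis=0))
    return Q

UNIFORM = dict.fromkeys(STATES, 0.25)

def hky(pi, k):
    return tamura_nei(pi, k, k)

def felsenstein(pi):
    return tamura_nei(pi, 1.0, 1.0)

def kimura(alpha, beta):
    # Uniform frequencies, transitions at rate alpha, transversions at rate beta.
    return 4.0 * beta * tamura_nei(UNIFORM, alpha / beta, alpha / beta)

def jukes_cantor(mu):
    # Uniform frequencies, every substitution at rate mu.
    return 4.0 * mu * tamura_nei(UNIFORM, 1.0, 1.0)

pi = {"T": 0.3, "C": 0.2, "A": 0.3, "G": 0.2}
Q = tamura_nei(pi, k1=3.0, k2=2.0)
pi_vec = np.array([pi[s] for s in STATES])
print(np.allclose(Q @ pi_vec, 0.0))  # True: pi is the stationary distribution
```

Because every off-diagonal entry has the form (symmetric factor) × π of the target base, the frequencies π are stationary for all five models, which the final line checks for Tamura-Nei.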

SLIDE 12

Two Evolutionary Models

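All the preceding models are nested within the following one:

$$Q_{GTR} = \begin{array}{c|cccc}
 & A & G & T & C \\ \hline
A & \cdot & a\pi_A & b\pi_A & c\pi_A \\
G & a\pi_G & \cdot & d\pi_G & e\pi_G \\
T & b\pi_T & d\pi_T & \cdot & f\pi_T \\
C & c\pi_C & e\pi_C & f\pi_C & \cdot
\end{array}$$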

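A possible alternative, the reverse complement symmetric (RCS) model:

$$Q_{RCS} = \begin{array}{c|cccc}
 & A & C & G & T \\ \hline
A & \cdot & r_{AC} & r_{AG} & r_{AT} \\
C & r_{GT} & \cdot & r_{CG} & r_{CT} \\
G & r_{CT} & r_{CG} & \cdot & r_{GT} \\
T & r_{AT} & r_{AG} & r_{AC} & \cdot
\end{array}$$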
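The defining property of Q_RCS is that a substitution and its reverse complement (for example C → T and G → A) occur at the same rate. A small check of that property, assuming NumPy, the A, C, G, T layout above and the convention Q[β, α] = rate α → β; `rcs_matrix` and `is_reverse_complement_symmetric` are illustrative names and the rate values are arbitrary.

```python
import numpy as np

BASES = "ACGT"                       # order used for Q_RCS on the slide
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def rcs_matrix(r):
    """Reverse complement symmetric rate matrix, Q[b, a] = rate a -> b.

    r maps the six parameter names 'AC', 'AG', 'AT', 'CG', 'CT', 'GT'
    to positive rates, laid out exactly as in the Q_RCS matrix above.
    """
    Q = np.array([
        [0.0,     r["AC"], r["AG"], r["AT"]],
        [r["GT"], 0.0,     r["CG"], r["CT"]],
        [r["CT"], r["CG"], 0.0,     r["GT"]],
        [r["AT"], r["AG"], r["AC"], 0.0],
    ])
    np.fill_diagonal(Q, -Q.sum(axis=0))  # columns sum to zero
    return Q

def is_reverse_complement_symmetric(Q):
    """Check that rate(a -> b) equals rate(comp(a) -> comp(b)) for all a, b."""
    idx = {s: i for i, s in enumerate(BASES)}
    return all(
        np.isclose(Q[idx[b], idx[a]], Q[idx[COMPLEMENT[b]], idx[COMPLEMENT[a]]])
        for a in BASES for b in BASES
    )

r = {"AC": 0.1, "AG": 0.4, "AT": 0.2, "CG": 0.15, "CT": 0.5, "GT": 0.12}
Q = rcs_matrix(r)
print(is_reverse_complement_symmetric(Q))  # True
```

Six parameters suffice because the twelve off-diagonal rates pair up under complementation.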

SLIDE 13

Two Evolutionary Models - 2

[Figure: tree with sister1 (1) and sister2 (2) joined at internal node 4, plus an outgroup (3); annotation: "Assumption No"]

SLIDE 14

Two Evolutionary Models - 2

[Figure: tree with sister1 (1) and sister2 (2) joined at internal node 4, plus an outgroup (3); annotation: "Equilibrium + Time Rev."]

SLIDE 15

Two Evolutionary Models - 2

[Figure: tree with sister1 (1) and sister2 (2) joined at internal node 4, plus an outgroup (3); annotation: "No selection"]

SLIDE 16

Estimating Parameters

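For a given triple alignment of nucleotide sequences from 3 species, where $\alpha_i^k$ denotes the nucleotide of species $i$ at alignment column $k$, the likelihood of the alignment is:

$$L = \prod_{k=1}^{N} \sum_{\alpha_0, \alpha_4 \in \{A,C,G,T\}} \rho^0_{\alpha_0}\,[P^{30}]_{\alpha_3^k \alpha_0}\,[P^{40}]_{\alpha_4 \alpha_0}\,[P^{24}]_{\alpha_2^k \alpha_4}\,[P^{14}]_{\alpha_1^k \alpha_4}$$

Here $P^{ij}$ is the transition matrix along the branch from node $j$ to node $i$, and the sum runs over the unobserved nucleotides at the root (node 0) and at the internal node 4.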

The vector ρ0 represents the ancestral nucleotide distribution at the root node.

[Figure: tree with sister1 (1) and sister2 (2) joined at internal node 4, plus an outgroup (3); caption: "One Q for each branch?"]
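The likelihood above factorizes over alignment columns, and in each column the two hidden nucleotides can be summed out one node at a time. A minimal sketch assuming NumPy, gap-free sequences, and closed-form Jukes-Cantor transition matrices on every branch purely as a stand-in (the slide leaves open whether each branch gets its own Q); the branch lengths and function names are hypothetical.

```python
import numpy as np

INDEX = {b: i for i, b in enumerate("ACGT")}

def jc_transition(t, mu=1.0):
    """Closed-form Jukes-Cantor P(t) = exp(Q t), used here only as a stand-in."""
    ones = np.full((4, 4), 0.25)
    return ones + (np.eye(4) - ones) * np.exp(-4.0 * mu * t)

def log_likelihood(seq1, seq2, seq3, rho0, P):
    """Log-likelihood of a gap-free triple alignment on the tree
    root 0 -> (outgroup 3, internal node 4), node 4 -> (sister1 1, sister2 2).

    P[(i, j)] is the transition matrix of the branch from node j to node i,
    with P[(i, j)][b, a] = Prob(a at node j -> b at node i).
    """
    total = 0.0
    for a1, a2, a3 in zip(seq1, seq2, seq3):
        i1, i2, i3 = INDEX[a1], INDEX[a2], INDEX[a3]
        # Sum over the hidden nucleotide alpha_4 at node 4, for each alpha_0.
        below4 = P[(1, 4)][i1, :] * P[(2, 4)][i2, :]   # indexed by alpha_4
        via4 = below4 @ P[(4, 0)]                       # indexed by alpha_0
        # Sum over the hidden root nucleotide alpha_0.
        column = np.sum(rho0 * P[(3, 0)][i3, :] * via4)
        total += np.log(column)
    return total

rho0 = np.full(4, 0.25)
branch_lengths = {(3, 0): 0.2, (4, 0): 0.05, (2, 4): 0.1, (1, 4): 0.1}  # hypothetical
P = {edge: jc_transition(t) for edge, t in branch_lengths.items()}
print(log_likelihood("ACGTTA", "ACGTTG", "ACGATA", rho0, P))
```

Working in log space avoids underflow of the product over N columns; a rate estimation would maximize this quantity over the parameters of Q (and ρ0) on each branch.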

SLIDE 17

Equilibrium

SLIDE 18

The stationarity index

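The equilibrium distribution π of a Markov process is defined by:

$$Q\pi = 0$$

We take the difference between the present and the stationary distribution:

$$\Delta_\alpha = \rho_\alpha - \pi_\alpha$$

Since both distributions are normalized, $\sum_\alpha \Delta_\alpha = 0$, so three combinations suffice. Rearranging the terms:

$$\mathrm{STI}_1 = \Delta_C + \Delta_G = \rho_{GC} - \pi_{GC}, \qquad \mathrm{STI}_2 = \Delta_A - \Delta_T, \qquad \mathrm{STI}_3 = \Delta_C - \Delta_G,$$

where $\rho_{GC}$ and $\pi_{GC}$ are the present and the equilibrium GC content.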
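A sketch of the three indices for a single branch, assuming NumPy, frequencies ordered A, C, G, T and Q[β, α] = rate α → β. The stationary distribution is obtained by solving Qπ = 0 together with the normalization Σπ = 1; the helper names and the example matrix are illustrative.

```python
import numpy as np

def stationary_distribution(Q):
    """Solve Q pi = 0 with sum(pi) = 1 (Q[b, a] = rate a -> b, columns sum to 0)."""
    n = Q.shape[0]
    A = np.array(Q, dtype=float)
    A[-1, :] = 1.0   # the rows of Q are linearly dependent; swap one for the normalization
    b = np.zeros(n)
    b[-1] = 1.0
    return np.linalg.solve(A, b)

def stationarity_indices(rho, Q):
    """STI1, STI2, STI3 for nucleotide frequencies ordered A, C, G, T."""
    dA, dC, dG, dT = np.asarray(rho, dtype=float) - stationary_distribution(Q)
    return dC + dG, dA - dT, dC - dG

# Felsenstein-type example: Q[b, a] = pi[b] for b != a, so pi is stationary.
pi = np.array([0.3, 0.2, 0.2, 0.3])
Q = np.tile(pi[:, None], (1, 4))
np.fill_diagonal(Q, 0.0)
np.fill_diagonal(Q, -Q.sum(axis=0))
print(stationarity_indices([0.25, 0.25, 0.25, 0.25], Q))  # approximately (0.1, 0.0, 0.0)
```

Because the three indices are linearly independent combinations of a zero-sum vector, they determine Δ completely, e.g. Δ_C = (STI1 + STI3)/2 and Δ_G = (STI1 − STI3)/2.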

SLIDE 19

The STI - Reverse complement symmetry

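Substituting the equilibrium distribution of the RCS model, ordered (A, C, G, T):

$$\pi = \tfrac{1}{2}\left(1 - \pi_{GC},\ \pi_{GC},\ \pi_{GC},\ 1 - \pi_{GC}\right)$$

where:

$$\pi_{GC} = \frac{r_{GT} + r_{CT}}{r_{AC} + r_{AG} + r_{GT} + r_{CT}}$$

Since $\pi_A = \pi_T$ and $\pi_C = \pi_G$, for the reverse complement symmetric model the STI has a simple form:

$$\mathrm{STI}_1 = \rho_{GC} - \pi_{GC}, \qquad \mathrm{STI}_2 = \rho_A - \rho_T, \qquad \mathrm{STI}_3 = \rho_C - \rho_G.$$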
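The closed-form GC content can be checked against a direct numerical solution of Qπ = 0 for Q_RCS. A sketch assuming NumPy and the A, C, G, T layout of Q_RCS given earlier; the function names and rates are illustrative.

```python
import numpy as np

def rcs_matrix(r):
    """Q_RCS in A, C, G, T order with Q[b, a] = rate a -> b (layout as on the slide)."""
    Q = np.array([
        [0.0,     r["AC"], r["AG"], r["AT"]],
        [r["GT"], 0.0,     r["CG"], r["CT"]],
        [r["CT"], r["CG"], 0.0,     r["GT"]],
        [r["AT"], r["AG"], r["AC"], 0.0],
    ])
    np.fill_diagonal(Q, -Q.sum(axis=0))  # columns sum to zero
    return Q

def stationary_distribution(Q):
    A = np.array(Q, dtype=float)
    A[-1, :] = 1.0                  # replace a redundant row by the normalization
    b = np.zeros(len(A))
    b[-1] = 1.0
    return np.linalg.solve(A, b)

def gc_equilibrium(r):
    """Closed-form equilibrium GC content of the RCS model."""
    return (r["GT"] + r["CT"]) / (r["AC"] + r["AG"] + r["GT"] + r["CT"])

r = {"AC": 0.1, "AG": 0.4, "AT": 0.2, "CG": 0.15, "CT": 0.5, "GT": 0.12}
pi = stationary_distribution(rcs_matrix(r))
print(pi)                                   # pi_A == pi_T and pi_C == pi_G (up to rounding)
print(pi[1] + pi[2], gc_equilibrium(r))     # both give the equilibrium GC content
```

Note that r_AT and r_CG do not enter π_GC: they only exchange A with T or C with G, which leaves the GC content unchanged.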

SLIDE 20

Analysis of the Fly Genome

Results about the time-reversal properties of the evolution of the fly genome:

  • Alignment of three Drosophila species: D. sechellia, D. simulans and D. melanogaster
  • Annotated coding regions removed
  • Rates estimated using a maximum likelihood algorithm
  • Sliding-window analysis, 50 kbp windows
  • For each window we calculated the stationarity index in the simulans lineage

[Figure: phylogeny of D. sechellia, D. simulans and D. melanogaster]

SLIDE 21

Analysis of the Fly Genome - Stationarity

[Figure: histograms of STI1, STI2 and STI3 over the windows; x axis: STI, y axis: Freq]

SLIDE 22

Reversibility

SLIDE 23

Time Reversibility: the Detailed Balance

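Time reversibility is usually defined in terms of the detailed balance conditions:

$$Q_{ji}\,\pi_i = Q_{ij}\,\pi_j$$

From these, one can derive the General Time Reversible (GTR) parameterization:

$$Q_{GTR} = \begin{array}{c|cccc}
 & A & G & T & C \\ \hline
A & \cdot & a\pi_A & b\pi_A & c\pi_A \\
G & a\pi_G & \cdot & d\pi_G & e\pi_G \\
T & b\pi_T & d\pi_T & \cdot & f\pi_T \\
C & c\pi_C & e\pi_C & f\pi_C & \cdot
\end{array}$$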
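Detailed balance says that the probability flux Q_ji π_i from i to j equals the flux back from j to i, so it can be checked by testing whether the flux matrix is symmetric. A sketch assuming NumPy, the A, G, T, C order of Q_GTR above and arbitrary parameter values; the function names are hypothetical.

```python
import numpy as np

def gtr_matrix(pi, a, b, c, d, e, f):
    """Q_GTR as on the slide (order A, G, T, C): Q[j, i] = s_ij * pi_j, s symmetric."""
    s = np.array([
        [0, a, b, c],   # A-G, A-T, A-C
        [a, 0, d, e],   # G-T, G-C
        [b, d, 0, f],   # T-C
        [c, e, f, 0],
    ], dtype=float)
    Q = s * np.asarray(pi, dtype=float)[:, None]   # row j scaled by pi_j
    np.fill_diagonal(Q, -Q.sum(axis=0))             # columns sum to zero
    return Q

def satisfies_detailed_balance(Q, pi):
    """True if Q[j, i] pi[i] == Q[i, j] pi[j] for all pairs (i, j)."""
    flux = Q * np.asarray(pi, dtype=float)[None, :]  # flux[j, i]: probability flow i -> j
    return np.allclose(flux, flux.T)

pi = [0.3, 0.2, 0.3, 0.2]                    # pi_A, pi_G, pi_T, pi_C
Q = gtr_matrix(pi, a=2.0, b=0.5, c=0.7, d=0.4, e=0.3, f=1.8)
print(satisfies_detailed_balance(Q, pi))     # True
print(np.allclose(Q @ np.array(pi), 0.0))    # detailed balance implies Q pi = 0: True
```

The second check illustrates why detailed balance is stronger than equilibrium: once every pairwise flux cancels, the net flux into each state vanishes as well, but the converse does not hold.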

SLIDE 24

Time reversibility: Kolmogorov Cycle Conditions

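A lesser-known formulation of time reversibility:

  • Definition. A Markov process is said to satisfy the Kolmogorov cycle conditions if the following equality on generators holds:

$$Q_{i_1 i_n} Q_{i_n i_{n-1}} \cdots Q_{i_2 i_1} = Q_{i_1 i_2} \cdots Q_{i_{n-1} i_n} Q_{i_n i_1} \qquad \forall\, i_1, \ldots, i_n \in C$$

i.e. the product of the rates around any cycle equals the product around the same cycle traversed backwards.

[Figure: a cycle through the states i_1, ..., i_6 with a rate Q on each edge]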
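A brute-force check of the cycle conditions, assuming NumPy and Q[β, α] = rate α → β, so that the left-hand side above is the product of the rates along i_1 → i_2 → ... → i_n → i_1. The helper names and example matrices are illustrative; enumerating all cycles is only practical for a handful of states such as the four nucleotides.

```python
import numpy as np
from itertools import permutations

def cycle_products(Q, cycle):
    """Rate products around a cycle i1 -> i2 -> ... -> in -> i1 and around its reverse.

    With Q[b, a] = rate a -> b, the forward product is Q[i2, i1] ... Q[i1, in]
    and the backward product is Q[i1, i2] ... Q[in, i1].
    """
    n = len(cycle)
    forward = np.prod([Q[cycle[(k + 1) % n], cycle[k]] for k in range(n)])
    backward = np.prod([Q[cycle[k], cycle[(k + 1) % n]] for k in range(n)])
    return forward, backward

def kolmogorov_holds(Q, length):
    """Check the cycle condition on every cycle of the given length."""
    return all(np.isclose(*cycle_products(Q, c))
               for c in permutations(range(Q.shape[0]), length))

# A reversible example: Q[b, a] = s[a, b] * pi[b] with s symmetric.
s = np.array([[0, 1.0, 0.5, 0.7],
              [1.0, 0, 0.4, 0.3],
              [0.5, 0.4, 0, 1.8],
              [0.7, 0.3, 1.8, 0]])
pi = np.array([0.3, 0.2, 0.3, 0.2])
Q_rev = s * pi[:, None]
np.fill_diagonal(Q_rev, -Q_rev.sum(axis=0))

# An irreversible example: the rate 0 -> 1 is doubled, the rate 1 -> 0 is not.
Q_irr = Q_rev.copy()
Q_irr[1, 0] *= 2.0
np.fill_diagonal(Q_irr, 0.0)
np.fill_diagonal(Q_irr, -Q_irr.sum(axis=0))

print(kolmogorov_holds(Q_rev, 3), kolmogorov_holds(Q_irr, 3))  # True False
```

Unlike detailed balance, this test needs no knowledge of π, which is what makes it attractive when only estimated rates are available.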

SLIDE 25

Time reversibility: Kolmogorov Cycle Conditions - 2

Moreover, the following propositions (relevant when analyzing biological sequences) hold:

  • Proposition. If the coefficients of the rate matrix are strictly positive and the Kolmogorov conditions hold for three-cycles, then they hold for cycles of arbitrary length.

  • Proposition. Given a four-state Markov process with strictly positive rate matrix coefficients, if the conditions

$$Q_{\alpha\delta} Q_{\delta\gamma} Q_{\gamma\beta} Q_{\beta\alpha} = Q_{\alpha\beta} Q_{\beta\gamma} Q_{\gamma\delta} Q_{\delta\alpha}$$

hold for (α, β, γ, δ) equal to (A, G, C, T), (A, G, T, C) and (A, C, G, T), then the Kolmogorov conditions hold for three-cycles.

And lastly:

  • Proposition. If the coefficients of the rate matrix are strictly positive and the Kolmogorov conditions hold for four-cycles, then they hold for cycles of arbitrary length.

SLIDE 26

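IRI - The general i.i.d. case

To check reversibility for nucleotide sequences we need to check the conditions on the following three four-cycles:

[Figure: the three four-cycles through the nucleotides G, A, T, C]

$$\mathrm{IRI}_1 := \frac{Q_{AG}Q_{GC}Q_{CT}Q_{TA} - Q_{AT}Q_{TC}Q_{CG}Q_{GA}}{Q_{AG}Q_{GC}Q_{CT}Q_{TA} + Q_{AT}Q_{TC}Q_{CG}Q_{GA}}$$

$$\mathrm{IRI}_2 := \frac{Q_{AC}Q_{CT}Q_{TG}Q_{GA} - Q_{AG}Q_{GT}Q_{TC}Q_{CA}}{Q_{AC}Q_{CT}Q_{TG}Q_{GA} + Q_{AG}Q_{GT}Q_{TC}Q_{CA}}$$

$$\mathrm{IRI}_3 := \frac{Q_{AC}Q_{CG}Q_{GT}Q_{TA} - Q_{AT}Q_{TG}Q_{GC}Q_{CA}}{Q_{AC}Q_{CG}Q_{GT}Q_{TA} + Q_{AT}Q_{TG}Q_{GC}Q_{CA}}$$

For strictly positive rates each index lies in [−1, 1] and vanishes exactly when the corresponding cycle condition holds.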
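The three indices translate directly into code. A sketch assuming NumPy and strictly positive off-diagonal rates, with each two-letter name 'XY' standing for Q_XY (the rate Y → X); the helper names are hypothetical and the example matrices are arbitrary.

```python
import numpy as np

IDX = {b: i for i, b in enumerate("ACGT")}

def q(Q, name):
    """Entry Q_XY for a two-letter name 'XY', i.e. the rate Y -> X."""
    return Q[IDX[name[0]], IDX[name[1]]]

def cycle_index(Q, forward, backward):
    """(x - y) / (x + y) for the rate products of a cycle and of its reverse."""
    x = np.prod([q(Q, n) for n in forward])
    y = np.prod([q(Q, n) for n in backward])
    return (x - y) / (x + y)

def iri_general(Q):
    """IRI1, IRI2, IRI3 as defined above (only off-diagonal entries are used)."""
    return (
        cycle_index(Q, ["AG", "GC", "CT", "TA"], ["AT", "TC", "CG", "GA"]),
        cycle_index(Q, ["AC", "CT", "TG", "GA"], ["AG", "GT", "TC", "CA"]),
        cycle_index(Q, ["AC", "CG", "GT", "TA"], ["AT", "TG", "GC", "CA"]),
    )

# Felsenstein-type rates (Q[b, a] = pi[b]) are reversible: all indices vanish.
# The diagonal is left unset because the indices never read it.
pi = np.array([0.3, 0.2, 0.2, 0.3])
Q_rev = np.tile(pi[:, None], (1, 4))
print(iri_general(Q_rev))

# Arbitrary positive rates are generally irreversible; each index stays in [-1, 1].
rng = np.random.default_rng(0)
Q_irr = rng.uniform(0.1, 1.0, size=(4, 4))
print(iri_general(Q_irr))
```

Normalizing the difference by the sum makes the indices dimensionless, so they can be compared across windows with different overall substitution rates.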

SLIDE 27

IRI for the Reverse Complement Symmetric Model

From the previous indices we get a specialized version of the IRI:

$$\mathrm{IRI}_1 = \frac{r_{AG}^2 r_{GT}^2 - r_{AC}^2 r_{CT}^2}{r_{AG}^2 r_{GT}^2 + r_{AC}^2 r_{CT}^2}, \qquad \mathrm{IRI}_2 = 0, \qquad \mathrm{IRI}_3 = 0$$

IRI1 thus lies in the interval [−1, 1], and if the system under study evolves time-reversibly, IRI1 = 0.
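Evaluating the general indices on Q_RCS shows where the specialized form comes from: for two of the three four-cycles the forward and backward rate products coincide identically, and the surviving general index equals the closed form up to an overall sign, which only reflects the orientation in which that cycle is traversed. A sketch assuming NumPy and the Q_RCS layout given earlier; the names and rate values are illustrative.

```python
import numpy as np

IDX = {b: i for i, b in enumerate("ACGT")}

def rcs_matrix(r):
    """Off-diagonal part of Q_RCS in A, C, G, T order (the diagonal is unused here)."""
    return np.array([
        [0.0,     r["AC"], r["AG"], r["AT"]],
        [r["GT"], 0.0,     r["CG"], r["CT"]],
        [r["CT"], r["CG"], 0.0,     r["GT"]],
        [r["AT"], r["AG"], r["AC"], 0.0],
    ])

def cycle_index(Q, forward, backward):
    """(x - y) / (x + y) for the products of entries Q_XY named 'XY'."""
    x = np.prod([Q[IDX[n[0]], IDX[n[1]]] for n in forward])
    y = np.prod([Q[IDX[n[0]], IDX[n[1]]] for n in backward])
    return (x - y) / (x + y)

def iri_rcs(r):
    """Closed-form IRI1 of the RCS model."""
    x = r["AG"] ** 2 * r["GT"] ** 2
    y = r["AC"] ** 2 * r["CT"] ** 2
    return (x - y) / (x + y)

r = {"AC": 0.1, "AG": 0.4, "AT": 0.2, "CG": 0.15, "CT": 0.5, "GT": 0.12}
Q = rcs_matrix(r)
general = (
    cycle_index(Q, ["AG", "GC", "CT", "TA"], ["AT", "TC", "CG", "GA"]),
    cycle_index(Q, ["AC", "CT", "TG", "GA"], ["AG", "GT", "TC", "CA"]),
    cycle_index(Q, ["AC", "CG", "GT", "TA"], ["AT", "TG", "GC", "CA"]),
)
print(general)      # first and third vanish (up to rounding); second equals -iri_rcs(r)
print(iri_rcs(r))
```

In the numbering of the general case it is the second four-cycle that survives; r_AT and r_CG cancel out entirely, so only four of the six RCS rates enter the index.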

SLIDE 28

Irreversibility in the Fly Genome

Plots of the IRI for the Drosophila simulans genome and for the null model:

[Figure: histograms of IRI1 (D. simulans) and IRInull (null model); x axis: IRI, y axis: Freq]

SLIDE 29

If water is around. . .

Cytosine can easily decay into uracil:

[Figure: deamination of cytosine, cytosine + H2O → uracil + NH3]

SLIDE 30

If water is around. . .


On the other hand, CpG pairs often occur in a methylated form:

[Figure: cytosine is methylated (+ CH3) to methylcytosine, which deaminates (+ H2O) to thymine, releasing NH3]

The net effect is the decay of CpG pairs into TpG and CpA pairs.
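The net effect can be made concrete by listing the single-decay products of every CpG in a sequence. A sketch in plain Python; `cpg_decay_products` is a hypothetical helper. The CpA case arises because a CpG on one strand is also a CpG on the complementary strand, so deamination of that strand's cytosine shows up as G → A on the strand being read.

```python
def cpg_decay_products(seq):
    """All sequences reachable by one deamination of a methylated CpG in `seq`.

    Deamination turns the methylated C into T. Forward-strand C -> T turns
    CpG into TpG; reverse-strand C -> T appears on the forward strand as
    G -> A and turns CpG into CpA.
    """
    products = []
    for i in range(len(seq) - 1):
        if seq[i:i + 2] == "CG":
            products.append(seq[:i] + "TG" + seq[i + 2:])  # C -> T on this strand
            products.append(seq[:i] + "CA" + seq[i + 2:])  # G -> A (C -> T on the other strand)
    return products

print(cpg_decay_products("ACGTTCGA"))
# ['ATGTTCGA', 'ACATTCGA', 'ACGTTTGA', 'ACGTTCAA']
```

Because the rate of these moves depends on the neighboring nucleotide, they cannot be captured by an i.i.d. single-site model, which motivates the extended configuration space that follows.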

SLIDE 31

A Nucleotide Substitution Model with CpG Decay

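We need to extend the configuration space:

$$C = s_1 \times \cdots \times s_N, \qquad s_i \in \{A, C, G, T\}.$$

We assume the following form for the generator:

$$Q = \sum_{i=1}^{N} Q_i + \sum_{i=1}^{N-1} Q^{CpG}_{i,i+1},$$

where

$$Q_i = \underbrace{I \otimes \cdots \otimes I}_{i-1} \otimes\, Q \otimes \underbrace{I \otimes \cdots \otimes I}_{N-i}$$

and

$$Q^{CpG}_{i,i+1} = \underbrace{I \otimes \cdots \otimes I}_{i-1} \otimes\, Q^{CpG} \otimes \underbrace{I \otimes \cdots \otimes I}_{N-i-1}.$$

Here $Q^{CpG}$ acts on the pair of neighboring sites $(i, i+1)$.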
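For small N the tensor-product generator can be assembled explicitly with Kronecker products, which makes the bookkeeping of the identity factors concrete. A sketch assuming NumPy; the slides do not spell out Q^CpG, so the 16 × 16 dinucleotide generator here (CpG → TpG and CpG → CpA at rate r_cpg, the reverse moves at rate r_rev) is a hypothetical stand-in, as are all function names. The configuration space has 4^N states, so this is only usable for a few sites.

```python
import numpy as np
from functools import reduce

BASES = "ACGT"

def embed(op, position, n_sites, width):
    """I x ... x I x op x I x ... x I, with op acting on `width` sites from `position` (0-based)."""
    factors = [np.eye(4)] * position + [op] + [np.eye(4)] * (n_sites - position - width)
    return reduce(np.kron, factors)

def full_generator(Q, Q_cpg, n_sites):
    """Sum of single-site terms Q_i plus neighboring-pair terms Q^CpG_{i,i+1}."""
    total = sum(embed(Q, i, n_sites, 1) for i in range(n_sites))
    return total + sum(embed(Q_cpg, i, n_sites, 2) for i in range(n_sites - 1))

def cpg_dinucleotide_generator(r_cpg, r_rev):
    """HYPOTHETICAL 16x16 generator on dinucleotides (index 4*first + second):
    CpG -> TpG and CpG -> CpA at rate r_cpg, the reverse moves at rate r_rev."""
    idx = lambda d: 4 * BASES.index(d[0]) + BASES.index(d[1])
    G = np.zeros((16, 16))
    for product in ("TG", "CA"):
        G[idx(product), idx("CG")] += r_cpg   # rate CG -> product
        G[idx("CG"), idx(product)] += r_rev   # rate product -> CG
    np.fill_diagonal(G, -G.sum(axis=0))       # columns sum to zero
    return G

def config_index(s):
    """Index of a configuration string in the Kronecker-product basis."""
    return reduce(lambda acc, b: 4 * acc + BASES.index(b), s, 0)

Q1 = np.full((4, 4), 0.1)          # Jukes-Cantor single-site rates, mu = 0.1
np.fill_diagonal(Q1, -0.3)
Q_full = full_generator(Q1, cpg_dinucleotide_generator(1.0, 0.2), n_sites=3)
print(Q_full.shape)                                          # (64, 64)
print(Q_full[config_index("ATG"), config_index("ACG")])      # mu + r_cpg
```

The last line picks up both a single-site contribution (C → T at the middle site) and the CpG term for the pair formed by the last two sites, which is exactly the sum structure of the generator above.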

SLIDE 32

The IRI of a Process with CpG Decay

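We get two IRIs in this case:

$$\mathrm{IRI}_1 := \frac{r_{AG}^2 r_{GT}^2 - r_{AC}^2 r_{CT}^2}{r_{AG}^2 r_{GT}^2 + r_{AC}^2 r_{CT}^2}$$

$$\mathrm{IRI}_{CpG} := \frac{r_{GT}^2 (r_{AG} + r_{CpG})^2 - (r_{CT} + r^{rev}_{CpG})^2 r_{AC}^2}{r_{GT}^2 (r_{AG} + r_{CpG})^2 + (r_{CT} + r^{rev}_{CpG})^2 r_{AC}^2}$$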
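The two indices are simple functions of the rates. A sketch in plain Python; the argument names are hypothetical, with `r` holding the four RCS rates that enter the formulas and `r_cpg`, `r_cpg_rev` standing for r_CpG and r^rev_CpG.

```python
def four_cycle_index(x, y):
    """(x - y) / (x + y): lies in [-1, 1] for positive x, y and is 0 iff x == y."""
    return (x - y) / (x + y)

def iri_with_cpg(r, r_cpg, r_cpg_rev):
    """Return (IRI1, IRI_CpG) for the RCS model with CpG decay."""
    iri1 = four_cycle_index(r["AG"] ** 2 * r["GT"] ** 2,
                            r["AC"] ** 2 * r["CT"] ** 2)
    iri_cpg = four_cycle_index(r["GT"] ** 2 * (r["AG"] + r_cpg) ** 2,
                               (r["CT"] + r_cpg_rev) ** 2 * r["AC"] ** 2)
    return iri1, iri_cpg

r = {"AC": 1.0, "AG": 1.0, "CT": 1.0, "GT": 1.0}
print(iri_with_cpg(r, r_cpg=0.0, r_cpg_rev=0.0))  # (0.0, 0.0)
print(iri_with_cpg(r, r_cpg=2.0, r_cpg_rev=0.0))  # (0.0, 0.8)
```

Setting both CpG rates to zero makes IRI_CpG coincide with IRI1, a quick sanity check; the second call shows that CpG decay alone, without any reverse process, already pushes the process away from reversibility even when IRI1 = 0.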

SLIDE 33

Analysis of the Human Genome

  • Alignment of the human, chimpanzee and rhesus macaque genomes
  • Rates estimated using a maximum likelihood algorithm
  • Sliding-window analysis, 1 Mbp windows
  • For each window we calculated the STIs, IRI_RC and IRI_CpG in the human lineage

[Figure: phylogeny of human, chimp and macaque]

SLIDE 34

STI Human

[Figure: histograms of STI1, STI2 and STI3 in the human lineage; x axis: STI, y axis: Freq]

SLIDE 35

IRI Human

[Figure: histograms of IRIrc, IRIcpg and IRInull in the human lineage; x axis: IRI, y axis: Freq]

SLIDE 36

Summary

  • Commonly used evolutionary models assume equilibrium and reversibility.
  • We have introduced indices to test for equilibrium (STI) and reversibility (IRI) on each single branch of a given phylogeny.
  • Analyses of the Drosophila and human genomes show clear violations of equilibrium and reversibility.
  • Further work is needed to assess how these violations affect specific bioinformatic algorithms.

SLIDE 37

It’s Evolution Baby. . .

Thank you!
