Algorithms for RNA structures (Algorithmique des structures d'ARN)
Hélène Touzet
COMATEGE working group: combinatorics on words, text algorithmics and the genome
RNA - RiboNucleic Acid
[Figure: DNA, messenger RNA, protein, noncoding RNA]
RNA structure
[Figure: secondary structure drawing of an RNA sequence, from 5′ to 3′, with helices labelled P1, P2, P3, P5 and P7 to P14]
RNA structure
- base pairs: a with u, c with g
- secondary structure
[Figure: secondary structure drawing of a short RNA sequence]
Overview of the talk
- RNA folding
- Comparison of RNA structures
- Dynamic programming 2.0
RNA folding problem
- an RNA sequence + a folding model
- find a secondary structure with the maximum number of base pairs
[Figure: a sequence with positions numbered 1 to 14]
[Figure: arc diagrams over positions 1 to 14 showing the pairs (1,6), (2,4), (3,5), (5,10), (7,8), (9,14), (10,11), (12,13)]
RNA folding problem
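S(i, j): number of base pairs for the substring i..j

$$S(i, j) = \max \begin{cases} S(i+1, j) \\ S(i+1, k-1) + S(k+1, j) + 1, & i < k \le j \end{cases}$$

The first case leaves position i unpaired. The second pairs i with some position k, which splits i..j into i+1..k-1 and k+1..j; it applies only when i and k can form a base pair.
Implementation by dynamic programming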
- R. Nussinov, A. Jacobson, PNAS 1980
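The recurrence above can be sketched as a memoized function. This is a minimal illustration, not the authors' implementation: the pairing rule (a with u, c with g) is taken from the base-pair slide, indices are 0-based, and the names `PAIRS` and `max_base_pairs` are chosen here for the example.

```python
from functools import lru_cache

# Assumption: only a-u and c-g pairs are allowed, as on the base-pair slide.
PAIRS = {("a", "u"), ("u", "a"), ("c", "g"), ("g", "c")}


def max_base_pairs(seq: str) -> int:
    """Maximum number of base pairs in a nested structure on seq."""
    seq = seq.lower()

    @lru_cache(maxsize=None)
    def S(i: int, j: int) -> int:
        # S(i, j) for the substring i..j (0-based, inclusive bounds).
        if i >= j:                          # empty or single-base substring
            return 0
        best = S(i + 1, j)                  # case 1: position i is unpaired
        for k in range(i + 1, j + 1):       # case 2: i pairs with k, i < k <= j
            if (seq[i], seq[k]) in PAIRS:
                best = max(best, S(i + 1, k - 1) + S(k + 1, j) + 1)
        return best

    return S(0, len(seq) - 1)


if __name__ == "__main__":
    print(max_base_pairs("gggaaaccc"))
```

The table has O(n^2) entries and each one scans up to n split points, so the sketch runs in O(n^3) time. Very long sequences would need an iterative table instead of recursion.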
RNA folding – Locally optimal secondary structures
no base pairs can be added without violating the definition of secondary structure
[Figure: several locally optimal secondary structures over positions 1 to 14]
Construction of all locally optimal secondary structures
- maximal horizontal structure: a set of juxtaposed base pairs, such that there is no pairing between any pair of visible positions
- locally optimal secondary structures: combinations of maximal horizontal structures
- implementation: dynamic programming ×2
[Figure: a maximal horizontal structure over positions 1 to 14 and the locally optimal structures obtained by combination]
Saffarian, Giraud, de Monte, Touzet 2012
Comparison of RNA structures
[Figure: three classes of structures: unlimited, crossing, nested]
Comparison of RNA structures
Simple operations: single, del, ins, pair, pair ins, pair del
Full operations: pairL del, pairR del, pairLR del, pairL ins, pairR ins, pairLR ins
Comparison of RNA structures
         Nested                      Crossing                              Unlimited
Simple   Tree alignment, O(n^4) [1]  Tree edit distance, O(n^3 log n) [2]  Tree edit distance, O(n^3 log n) [2]
Full     O(n^4) [3]                  NP-complete [3]                       General edit distance, NP-complete [4]

[1] Jiang, Wang, Zhang, 1995
[2] Klein, 1998
[3] Blin, Touzet, 2006
[4] Blin, Fertin, Rusu, Sinoquet, 2007
From RNAs to ICOREs
- myriad of problems over sequences, trees and graphs
- extensive use of dynamic programming
- ICORE: a universal specification framework for dynamic programming problems
Dynamic Programming 2.0
[Figure: two workflow diagrams. Labels in the first: optimization problem, dynamic programming equations, algorithm, code, debugging, optimizations, new choices. Labels in the second: optimization problem, specification, dynamic programming equations, algorithm, code, optimizations.]
Levenshtein distance
- 2 strings: sand and aunt
- 3 operations: replacement, deletion, insertion
- edit script
  s a - n d
    |   | :
  - a u n t
  del(s, rep(a, a, ins(u, rep(n, n, rep(d, t, mty)))))
- rewrite rules
  a ∼ X ← rep(a, c, X) → c ∼ X
  a ∼ X ← del(a, X) → X
  X ← ins(c, X) → c ∼ X
  ε ← mty → ε
                del(s, rep(a, a, ins(u, rep(n, n, rep(d, t, mty)))))
                                   ↓
                del(s, a ∼ ins(u, rep(n, n, rep(d, t, mty))))
                                   ↓
                del(s, a ∼ ins(u, n ∼ rep(d, t, mty)))
                          ↙                 ↘
del(s, a ∼ ins(u, n ∼ d ∼ mty))        del(s, a ∼ ins(u, n ∼ t ∼ mty))
              ↓                                      ↓
s ∼ a ∼ ins(u, n ∼ d ∼ mty)            a ∼ ins(u, n ∼ t ∼ mty)
              ↓                                      ↓
s ∼ a ∼ n ∼ d ∼ mty                    a ∼ u ∼ n ∼ t ∼ mty
              ↓                                      ↓
s ∼ a ∼ n ∼ d                          a ∼ u ∼ n ∼ t
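To make the two-sided rewriting concrete, here is a small sketch that applies the four rewrite rules to a core term and returns both strings at once. The encoding of terms as nested tuples and the function name `rewrite` are assumptions made for this illustration only.

```python
def rewrite(term):
    """Return the pair (left, right) of strings that a core term rewrites to."""
    op = term[0]
    if op == "mty":                       # ε ← mty → ε
        return "", ""
    if op == "rep":                       # a ∼ X ← rep(a, c, X) → c ∼ X
        _, a, c, x = term
        left, right = rewrite(x)
        return a + left, c + right
    if op == "del":                       # a ∼ X ← del(a, X) → X
        _, a, x = term
        left, right = rewrite(x)
        return a + left, right
    if op == "ins":                       # X ← ins(c, X) → c ∼ X
        _, c, x = term
        left, right = rewrite(x)
        return left, c + right
    raise ValueError(f"unknown operator: {op!r}")


# The edit script from the slides, written as nested tuples (hypothetical encoding).
script = ("del", "s",
          ("rep", "a", "a",
           ("ins", "u",
            ("rep", "n", "n",
             ("rep", "d", "t", ("mty",))))))

if __name__ == "__main__":
    print(rewrite(script))  # ('sand', 'aunt')
```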
- evaluation algebra
  rep(a, c, x) = if a == c then x else x + 1
  del(a, x) = x + 1
  ins(c, x) = x + 1
  mty = 0
  φ = min
  Example: del(s, rep(a, a, ins(u, rep(n, n, rep(d, t, mty))))) evaluates to 3
- solving the Levenshtein distance: find a term over rep, del, ins, mty that rewrites to sand and aunt, and that is optimal for the evaluation algebra
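The search for an optimal term can be sketched with ordinary memoization: for each pair of suffixes, try every rule whose two sides match, and keep the cheapest candidate. This is an illustration under assumptions, not generated ICORE code: terms are nested tuples, mty is taken to evaluate to 0, and the names `evaluate` and `best_term` are made up for the example.

```python
from functools import lru_cache


def evaluate(term) -> int:
    """Evaluation algebra: unit cost for del, ins and a mismatching rep."""
    op = term[0]
    if op == "mty":
        return 0                                  # assumption: mty = 0
    if op == "rep":
        _, a, c, x = term
        return evaluate(x) + (0 if a == c else 1)
    if op in ("del", "ins"):
        return evaluate(term[2]) + 1
    raise ValueError(f"unknown operator: {op!r}")


def best_term(u: str, v: str):
    """Return (cost, term) for a cheapest term rewriting to u and v."""

    @lru_cache(maxsize=None)
    def solve(i: int, j: int):
        # Cheapest (cost, term) rewriting to the suffixes u[i:] and v[j:].
        if i == len(u) and j == len(v):
            return 0, ("mty",)
        candidates = []
        if i < len(u) and j < len(v):             # rep consumes one letter on each side
            cost, x = solve(i + 1, j + 1)
            candidates.append((cost + (u[i] != v[j]), ("rep", u[i], v[j], x)))
        if i < len(u):                            # del consumes a letter of u
            cost, x = solve(i + 1, j)
            candidates.append((cost + 1, ("del", u[i], x)))
        if j < len(v):                            # ins consumes a letter of v
            cost, x = solve(i, j + 1)
            candidates.append((cost + 1, ("ins", v[j], x)))
        return min(candidates, key=lambda c: c[0])   # objective function: min

    return solve(0, 0)


if __name__ == "__main__":
    cost, term = best_term("sand", "aunt")
    print(cost, term)
```

Several terms can share the optimal cost; the sketch returns one of them, and `evaluate` applied to the returned term gives back the same cost.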
ICOREs – definition
Inverted Coupled Rewrite Systems
Let k be a positive natural number. An ICORE of dimension k consists of:
- a set V of variables
- a core signature ζ and k satellite signatures Σ1, ..., Σk
- k term rewrite systems, which all have the same left-hand sides in T(ζ, V)
- optionally, a tree grammar G over the core signature ζ
- an evaluation algebra A for the core signature ζ, including an objective function φ
Back to RNA problems
[Figure: secondary structure drawing of an RNA sequence, from 5′ to 3′, with helices labelled P1, P2, P3, P5 and P7 to P14]
RNA folding problem
Input: an RNA sequence
g u u u c g g u c a a g a c a a c c c g g g a a a c c c a g g u u c g
Output: its optimal secondary structure
[Figure: secondary structure drawing of the input sequence]
ICORE
single(a, X) → a ∼ X
split(X, Y) → X ∼ Y
pair(a, X, b) → a ∼ X ∼ b
mty → ε
[Figure: an RNA sequence and a core term over mty, single, split and pair that rewrites to it]
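As an illustration of the folding ICORE, the sketch below rewrites a core term to its sequence and evaluates it by counting pair operators. The tuple encoding of terms, the scoring by number of base pairs (the objective stated earlier for the folding problem) and the names `sequence` and `base_pairs` are assumptions made for this example.

```python
def sequence(term) -> str:
    """Rewrite a core term to the RNA sequence it denotes."""
    op = term[0]
    if op == "mty":                       # mty → ε
        return ""
    if op == "single":                    # single(a, X) → a ∼ X
        return term[1] + sequence(term[2])
    if op == "split":                     # split(X, Y) → X ∼ Y
        return sequence(term[1]) + sequence(term[2])
    if op == "pair":                      # pair(a, X, b) → a ∼ X ∼ b
        return term[1] + sequence(term[2]) + term[3]
    raise ValueError(f"unknown operator: {op!r}")


def base_pairs(term) -> int:
    """Evaluation algebra (assumed): one point per pair operator."""
    op = term[0]
    if op == "mty":
        return 0
    if op == "single":
        return base_pairs(term[2])
    if op == "split":
        return base_pairs(term[1]) + base_pairs(term[2])
    if op == "pair":
        return base_pairs(term[2]) + 1
    raise ValueError(f"unknown operator: {op!r}")


# A small hairpin: two stacked pairs closing a loop of three unpaired bases.
loop = ("single", "a", ("single", "a", ("single", "a", ("mty",))))
hairpin = ("pair", "g", ("pair", "c", loop, "g"), "c")

if __name__ == "__main__":
    print(sequence(hairpin), base_pairs(hairpin))  # gcaaagc 2
```

Folding a sequence then amounts to searching, among all terms that rewrite to it, for one that maximizes this score.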
RNA folding
mty → ε
split(X, Y) → X ∼ Y
single(a, X) → a ∼ X
pair(a, X, b) → a ∼ X ∼ b

Levenshtein distance
ε ← mty → ε
a ∼ X ← rep(a, c, X) → c ∼ X
a ∼ X ← del(a, X) → X
X ← ins(c, X) → c ∼ X

RNA folding × 2 + Levenshtein distance
ε ← mty → ε
X ∼ Y ← split(X, Y) → X ∼ Y
a ∼ X ← single(a, c, X) → c ∼ X
a ∼ X ∼ b ← pair(a, c, X, b, d) → c ∼ X ∼ d
a ∼ X ← del(a, X) → X
X ← ins(c, X) → c ∼ X
Simultaneous alignment and folding (Sankoff 1985)
Sankoff
The same six rules, with base-pairing annotations on the rewritten sequences (• appears on bases rewritten by single, del and ins; < and > on the two bases rewritten by pair):
ε ← mty → ε
X ∼ Y ← split(X, Y) → X ∼ Y
•a ∼ X ← single(a, c, X) → •c ∼ X
<a ∼ X ∼ >b ← pair(a, c, X, b, d) → <c ∼ X ∼ >d
•a ∼ X ← del(a, X) → X
X ← ins(c, X) → •c ∼ X
Alignment - nested - simple operations
ε ← mty → ε
•a ∼ X ← single(a, c, X) → •c ∼ X
X ∼ Y ← split(X, Y) → X ∼ Y
<a ∼ X ∼ >b ← pair(a, c, X, b, d) → <c ∼ X ∼ >d
•a ∼ X ← del(a, X) → X
X ← ins(c, X) → •c ∼ X
<a ∼ X ∼ >b ← pair del(a, X, b) → X
X ← pair ins(c, X, d) → <c ∼ X ∼ >d

Alignment - nested - full operations
ε ← mty → ε
•a ∼ X ← single(a, c, X) → •c ∼ X
X ∼ Y ← split(X, Y) → X ∼ Y
<a ∼ X ∼ >b ← pair(a, c, X, b, d) → <c ∼ X ∼ >d
•a ∼ X ← del(a, X) → X
X ← ins(c, X) → •c ∼ X
<a ∼ X ∼ >b ← pair del(a, X, b) → X
X ← pair ins(c, X, d) → <c ∼ X ∼ >d
<a ∼ X ∼ >b ← pairLR del(a, c, X, b, d) → •c ∼ X ∼ •d
•a ∼ X ∼ •b ← pairLR ins(a, c, X, b, d) → <c ∼ X ∼ >d
<a ∼ X ∼ >b ← pairL del(a, c, X, b) → •c ∼ X
•a ∼ X ← pairL ins(a, c, X, d) → <c ∼ X ∼ >d
<a ∼ X ∼ >b ← pairR del(a, X, b, d) → X ∼ •d
X ∼ •b ← pairR ins(c, X, b, d) → <c ∼ X ∼ >d
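The fourteen rules above can be read as a table mapping each operator to a left pattern and a right pattern. The sketch below encodes that table and projects a core term onto either side, returning the bases together with a dot-bracket string. Everything about the encoding is an assumption made for illustration: operator names use underscores (`pair_del`, `pairLR_ins`, ...), `.` stands for •, and `(` and `)` stand for < and >.

```python
# operator -> (left pattern, right pattern).
# A pattern item is (marker, argument index); marker "X" means
# "recurse into the subterm stored at that argument index".
RULES = {
    "mty":        ([], []),
    "single":     ([(".", 0), ("X", 2)], [(".", 1), ("X", 2)]),
    "split":      ([("X", 0), ("X", 1)], [("X", 0), ("X", 1)]),
    "pair":       ([("(", 0), ("X", 2), (")", 3)], [("(", 1), ("X", 2), (")", 4)]),
    "del":        ([(".", 0), ("X", 1)], [("X", 1)]),
    "ins":        ([("X", 1)], [(".", 0), ("X", 1)]),
    "pair_del":   ([("(", 0), ("X", 1), (")", 2)], [("X", 1)]),
    "pair_ins":   ([("X", 1)], [("(", 0), ("X", 1), (")", 2)]),
    "pairLR_del": ([("(", 0), ("X", 2), (")", 3)], [(".", 1), ("X", 2), (".", 4)]),
    "pairLR_ins": ([(".", 0), ("X", 2), (".", 3)], [("(", 1), ("X", 2), (")", 4)]),
    "pairL_del":  ([("(", 0), ("X", 2), (")", 3)], [(".", 1), ("X", 2)]),
    "pairL_ins":  ([(".", 0), ("X", 2)], [("(", 1), ("X", 2), (")", 3)]),
    "pairR_del":  ([("(", 0), ("X", 1), (")", 2)], [("X", 1), (".", 3)]),
    "pairR_ins":  ([("X", 1), (".", 2)], [("(", 0), ("X", 1), (")", 3)]),
}


def project(term, side: int):
    """Return (bases, dot_bracket) for side 0 (left) or side 1 (right)."""
    op, args = term[0], term[1:]
    bases, marks = "", ""
    for marker, idx in RULES[op][side]:
        if marker == "X":                 # a variable: rewrite the subterm
            sub_bases, sub_marks = project(args[idx], side)
            bases += sub_bases
            marks += sub_marks
        else:                             # a base with its annotation
            bases += args[idx]
            marks += marker
    return bases, marks


# pairL del: the pair g-c on the left keeps only its left base (as a) on the right.
term = ("pairL_del", "g", "a",
        ("single", "a", "a", ("del", "u", ("mty",))),
        "c")

if __name__ == "__main__":
    print(project(term, 0))  # ('gauc', '(..)')
    print(project(term, 1))  # ('aa', '..')
```

Keeping the rules in a table rather than in code mirrors the idea of a specification: restricting to simple operations only means dropping the six pairL, pairR and pairLR entries.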
Many more things with ICOREs
Conclusion
- RNA is a hot topic in molecular biology
- nice combinatorial models and algorithms
- ICORE framework: automatic generation of code, intrinsic properties of dynamic programming
PhD position available (discrete algorithms, graph theory, programming)
Tree edit distance and alignment of trees
- Tree edit distance
  a(X) ← rep(a, b, X) → b(X)
  a(X) ∼ Y ← del(a, X ∼ Y) → X ∼ Y
  X ∼ Y ← ins(a, X ∼ Y) → a(X) ∼ Y
  ε ← mty → ε
- Alignment of trees
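For the tree edit distance rules above, a direct (and deliberately naive) memoized recursion over ordered forests might look as follows. The slides give no evaluation algebra for trees, so unit costs are an assumption, as are the encoding of a tree as `(label, children)` and the name `forest_distance`. The rep case is also extended here to carry the right siblings along, which the slide's rule leaves implicit. This is not Klein's O(n^3 log n) algorithm cited earlier.

```python
from functools import lru_cache

# A tree is (label, children), where children is a tuple of trees.
# A forest is a tuple of trees.


@lru_cache(maxsize=None)
def forest_distance(f, g) -> int:
    """Edit distance between two ordered forests, with assumed unit costs."""
    if not f and not g:                   # ε ← mty → ε
        return 0
    options = []
    if f:
        (a, xs), rest_f = f[0], f[1:]
        # del: a(X) ∼ Y on the left, X ∼ Y on the right
        options.append(forest_distance(xs + rest_f, g) + 1)
    if g:
        (b, ys), rest_g = g[0], g[1:]
        # ins: X ∼ Y on the left, a(X) ∼ Y on the right
        options.append(forest_distance(f, ys + rest_g) + 1)
    if f and g:
        # rep: match the two leftmost roots, then children and siblings
        options.append(forest_distance(xs, ys)
                       + forest_distance(rest_f, rest_g)
                       + (0 if a == b else 1))
    return min(options)


leaf_b, leaf_c = ("b", ()), ("c", ())
t1 = ("a", (leaf_b, leaf_c))
t2 = ("a", (leaf_c,))

if __name__ == "__main__":
    print(forest_distance((t1,), (t2,)))  # 1: delete the leaf b
```

The memo table is indexed by whole forests, so this is only practical for small trees.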