Algorithmics of RNA structures (Algorithmique des structures d’ARN), Hélène Touzet



slide-1
SLIDE 1

Algorithmics of RNA structures

Hélène Touzet, COMATEGE working group: combinatorics on words, text algorithmics and genome algorithmics

slide-2
SLIDE 2

RNA - RiboNucleic Acid

[Diagram relating DNA, messenger RNA, noncoding RNA and protein]

slide-3
SLIDE 3

RNA structure

[Figure: secondary structure drawing of an RNA molecule, from the 5´ end to the 3´ end, with regions labeled P1, P2, P3, P5, P7, P8, P9, P10, P11, P12, P13, P14]

slide-4
SLIDE 4

RNA structure

base pairs: a···u, c···g

secondary structure

[Figure: an RNA sequence drawn with its base pairs]

slide-5
SLIDE 5

Overview of the talk

  • RNA folding
  • Comparison of RNA structures
  • Dynamic programming 2.0
slide-6
SLIDE 6

RNA folding problem

  • an RNA sequence + a folding model

[Figure: sequence positions 1 to 14]

  • find a secondary structure with the maximum number of base pairs

[Figure: sequence positions 1 to 14]

slide-7
SLIDE 7

RNA folding problem

[Figure: sequence positions 1 to 14, with the pairings 3,5 7,8 5,10 12,13 10,11 9,14 2,4 1,6]

slide-8
SLIDE 8

RNA folding problem

[Figure: sequence positions 1 to 14, with the pairings 2,4 1,6 3,5 7,8 5,10 12,13 10,11 9,14]

slide-9
SLIDE 9

RNA folding problem

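[Figure: either position i is unpaired, or i is paired with some position k, which splits i..j into i+1..k−1 and k+1..j]

S(i, j): maximum number of base pairs for the substring i..j

\[
S(i, j) = \max
\begin{cases}
S(i+1, j) \\
S(i+1, k-1) + S(k+1, j) + 1, & i < k \le j
\end{cases}
\]

(the second case ranges over the positions k that can pair with i)

Implementation by dynamic programming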

  • R. Nussinov, A. Jacobson, PNAS 1980
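
The recurrence for S(i, j) can be sketched in Python as follows. This is a minimal illustration, not the original implementation: the function names `can_pair` and `max_base_pairs` are hypothetical, the pairing rule (Watson-Crick pairs plus the G-U wobble pair, lowercase letters) is an assumption, and, as in the recurrence on the slide, no minimum loop length is enforced.

```python
def can_pair(x, y):
    """Assumed pairing rule: Watson-Crick pairs plus the G-U wobble pair."""
    return x + y in {"au", "ua", "cg", "gc", "gu", "ug"}


def max_base_pairs(seq):
    """Maximum number of base pairs of a secondary structure on seq."""
    n = len(seq)
    # S[i][j] = maximum number of base pairs in seq[i:j] (0-based, half-open),
    # so S[i][i] = 0 stands for the empty substring.
    S = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(i + 1, n + 1):
            best = S[i + 1][j]             # position i stays unpaired
            for k in range(i + 1, j):      # position i pairs with position k
                if can_pair(seq[i], seq[k]):
                    best = max(best, S[i + 1][k] + S[k + 1][j] + 1)
            S[i][j] = best
    return S[0][n]
```

The table has O(n²) cells and each cell scans O(n) split positions k, which gives the usual cubic running time.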
slide-10
SLIDE 10

RNA folding – Locally optimal secondary structures

no base pairs can be added without violating the definition of secondary structure

[Figures: secondary structures drawn on sequence positions 1 to 14]
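
The definition of local optimality can be checked directly, as in the sketch below. The names `can_pair`, `is_secondary_structure` and `is_locally_optimal` are hypothetical, and two assumptions are made for the example: a secondary structure is a set of pairs (i, j) with i < j in which every position occurs at most once and no two pairs cross, and the allowed pairings are the Watson-Crick pairs plus G-U on lowercase letters.

```python
def can_pair(x, y):
    """Assumed pairing rule: Watson-Crick pairs plus the G-U wobble pair."""
    return x + y in {"au", "ua", "cg", "gc", "gu", "ug"}


def is_secondary_structure(pairs):
    """True if every position is in at most one pair and no two pairs cross."""
    used = set()
    for i, j in pairs:
        if i >= j or i in used or j in used:
            return False
        used.update((i, j))
    # Two pairs (i, j) and (k, l) cross when i < k < j < l.
    return not any(i < k < j < l for i, j in pairs for k, l in pairs)


def is_locally_optimal(seq, pairs):
    """True if no base pair can be added to the structure without breaking it."""
    pairs = set(pairs)
    if not is_secondary_structure(pairs):
        return False
    if not all(can_pair(seq[i], seq[j]) for i, j in pairs):
        return False
    paired = {p for pair in pairs for p in pair}
    free = [p for p in range(len(seq)) if p not in paired]
    # Try every pair of unpaired positions as a candidate extra base pair.
    for a, i in enumerate(free):
        for j in free[a + 1:]:
            if can_pair(seq[i], seq[j]) and is_secondary_structure(pairs | {(i, j)}):
                return False
    return True
```

For the sequence `cgagc`, the single pair (0, 3) is locally optimal although a structure with two pairs exists: the only other possible pairing, (1, 4), would cross it.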

slide-11
SLIDE 11

Construction of all locally optimal secondary structures

  • maximal horizontal structure: a set of juxtaposed base pairs such that there is no pairing between any pair of visible positions

[Figure: sequence positions 1 to 14]

  • locally optimal secondary structures: combinations of maximal horizontal structures
  • implementation: dynamic programming ×2
slide-12
SLIDE 12

[Figure: construction example on sequence positions 1 to 14]

Saffarian, Giraud, de Monte, Touzet 2012

slide-13
SLIDE 13

Comparison of RNA structures

slide-14
SLIDE 14

Comparison of RNA structures

Unlimited

slide-15
SLIDE 15

Comparison of RNA structures

Unlimited Crossing

slide-16
SLIDE 16

Comparison of RNA structures

Unlimited Crossing Nested

slide-17
SLIDE 17

Comparison of RNA structures

Simple operations

single, del, ins, pair, pair ins, pair del

slide-19
SLIDE 19

Comparison of RNA structures

Simple operations

single, del, ins, pair, pair ins, pair del

Full operations

pairL del, pairR del, pairLR del, pairL ins, pairR ins, pairLR ins

slide-20
SLIDE 20

Comparison of RNA structures

Structure classes: Nested, Crossing, Unlimited

  Simple operations: tree alignment, O(n^4) [1]; tree edit distance, O(n^3 log n) [2]; tree edit distance, O(n^3 log n) [2]
  Full operations: O(n^4) [3]; NP-complete [3]
  General edit distance: NP-complete [4]

[1] Jiang, Wang, Zhang, 1995
[2] Klein, 1998
[3] Blin, Touzet, 2006
[4] Blin, Fertin, Rusu, Sinoquet, 2007

slide-21
SLIDE 21

From RNAs to ICOREs

  • myriad of problems over sequences, trees and graphs
  • extensive use of dynamic programming
  • ICORE: a universal specification framework for dynamic programming problems

slide-22
SLIDE 22

Dynamic Programming 2.0

[Diagram with the boxes: optimization problem, dynamic programming equations, algorithm, code]

slide-23
SLIDE 23

Dynamic Programming 2.0

[Diagram with the boxes: optimization problem, dynamic programming equations, algorithm, code; and the labels: optimizations, debugging, new choices]

slide-24
SLIDE 24

Dynamic Programming 2.0

[Diagram with the boxes: optimization problem, specification, dynamic programming equations, algorithm, code; and the label: optimizations]

[Diagram with the boxes: optimization problem, dynamic programming equations, algorithm, code; and the labels: debugging, new choices]

slide-25
SLIDE 25

Levenshtein distance

  • 2 strings : sand and aunt
  • 3 operations : replacement, deletion, insertion
  • edit script

    s a - n d
      |   | :
    - a u n t

del(s,rep(a,a,ins(u,rep(n,n,rep(d,t,mty)))))

slide-27
SLIDE 27
  • rewrite rules

    a ∼ X ← rep(a, c, X) → c ∼ X
    a ∼ X ← del(a, X) → X
    X ← ins(c, X) → c ∼ X
    ε ← mty → ε

slide-28
SLIDE 28
  • rewrite rules

    a ∼ X ← rep(a, c, X) → c ∼ X
    a ∼ X ← del(a, X) → X
    X ← ins(c, X) → c ∼ X
    ε ← mty → ε

Rewriting the edit script with both systems:

    del(s, rep(a, a, ins(u, rep(n, n, rep(d, t, mty)))))
    ↓
    del(s, a ∼ ins(u, rep(n, n, rep(d, t, mty))))
    ↓
    del(s, a ∼ ins(u, n ∼ rep(d, t, mty)))

    Left system ↙                           Right system ↘
    del(s, a ∼ ins(u, n ∼ d ∼ mty))         del(s, a ∼ ins(u, n ∼ t ∼ mty))
    ↓                                       ↓
    s ∼ a ∼ ins(u, n ∼ d ∼ mty)             a ∼ ins(u, n ∼ t ∼ mty)
    ↓                                       ↓
    s ∼ a ∼ n ∼ d ∼ mty                     a ∼ u ∼ n ∼ t ∼ mty
    ↓                                       ↓
    s ∼ a ∼ n ∼ d                           a ∼ u ∼ n ∼ t
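
The two rewrite systems can be mimicked by a short recursive function. In this sketch the encoding is an assumption made for illustration: a term is a nested Python tuple whose first element is the operation name, the constant `mty` is the string `"mty"`, and the hypothetical function `rewrite` returns both rewritten strings at once.

```python
def rewrite(term):
    """Return (left, right): the two strings that an edit-script term rewrites to."""
    if term == "mty":                  # ε ← mty → ε
        return "", ""
    op = term[0]
    if op == "rep":                    # a ∼ X ← rep(a, c, X) → c ∼ X
        _, a, c, rest = term
        left, right = rewrite(rest)
        return a + left, c + right
    if op == "del":                    # a ∼ X ← del(a, X) → X
        _, a, rest = term
        left, right = rewrite(rest)
        return a + left, right
    if op == "ins":                    # X ← ins(c, X) → c ∼ X
        _, c, rest = term
        left, right = rewrite(rest)
        return left, c + right
    raise ValueError(f"unknown operation: {op!r}")


# del(s, rep(a, a, ins(u, rep(n, n, rep(d, t, mty)))))
script = ("del", "s",
          ("rep", "a", "a",
           ("ins", "u",
            ("rep", "n", "n",
             ("rep", "d", "t", "mty")))))
```

Calling `rewrite(script)` yields the pair of strings `sand` and `aunt`, matching the two branches of the derivation.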

slide-29
SLIDE 29
  • evaluation algebra

    rep(a, c, x) = if a == c then x else x + 1
    del(a, x) = x + 1
    ins(c, x) = x + 1
    mty = 0
    φ = min

del(s,rep(a,a,ins(u,rep(n,n,rep(d,t,mty)))))

slide-30
SLIDE 30
  • evaluation algebra

    rep(a, c, x) = if a == c then x else x + 1
    del(a, x) = x + 1
    ins(c, x) = x + 1
    mty = 0
    φ = min

del(s,rep(a,a,ins(u,rep(n,n,rep(d,t,mty)))))

  • solving the Levenshtein distance: finding a term over rep, del, ins, mty that rewrites to sand and aunt, and that is optimal for the evaluation algebra
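
As a sketch of how the evaluation algebra and the search for an optimal term fit together, the code below evaluates a given term and also computes the optimum over all terms by dynamic programming. The tuple encoding of terms and the names `evaluate` and `levenshtein` are assumptions for illustration, and the cost of `mty` is taken to be 0.

```python
def evaluate(term):
    """Cost of an edit-script term under the evaluation algebra."""
    if term == "mty":
        return 0                                   # mty = 0 (assumed)
    op = term[0]
    if op == "rep":                                # rep(a, c, x)
        _, a, c, rest = term
        return evaluate(rest) + (0 if a == c else 1)
    if op in ("del", "ins"):                       # del(a, x), ins(c, x)
        return evaluate(term[2]) + 1
    raise ValueError(f"unknown operation: {op!r}")


def levenshtein(u, v):
    """Minimum cost (objective φ = min) over all terms rewriting to (u, v)."""
    m, n = len(u), len(v)
    # D[i][j] = minimum cost of a term that rewrites to (u[i:], v[j:]).
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m, -1, -1):
        for j in range(n, -1, -1):
            if i == m and j == n:
                continue                           # only mty applies, cost 0
            candidates = []
            if i < m and j < n:                    # rep(u[i], v[j], X)
                candidates.append(D[i + 1][j + 1] + (0 if u[i] == v[j] else 1))
            if i < m:                              # del(u[i], X)
                candidates.append(D[i + 1][j] + 1)
            if j < n:                              # ins(v[j], X)
                candidates.append(D[i][j + 1] + 1)
            D[i][j] = min(candidates)
    return D[0][0]


script = ("del", "s", ("rep", "a", "a", ("ins", "u",
          ("rep", "n", "n", ("rep", "d", "t", "mty")))))
```

Each candidate in the inner loop corresponds to one rewrite rule, so the table is the recurrence obtained by reading the rules backwards from the two target strings.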

slide-31
SLIDE 31

ICOREs – definition

Inverted Coupled Rewrite Systems

Let k be a positive natural number. An ICORE of dimension k consists of:

  • a set V of variables
  • a core signature ζ, and k satellite signatures Σ1, . . . , Σk
  • k term rewrite systems, which all have the same left-hand sides in T(ζ, V)
  • optionally a tree grammar G over the core signature ζ
  • an evaluation algebra A for the core signature ζ, including an objective function φ
slide-32
SLIDE 32

Back to RNA problems

[Figure: secondary structure drawing of an RNA molecule, from the 5´ end to the 3´ end, with regions labeled P1, P2, P3, P5, P7, P8, P9, P10, P11, P12, P13, P14]

slide-35
SLIDE 35

RNA folding problem

Input: an RNA sequence

    g u u u c g g u c a a g a c a a c c c g g g a a a c c c a g g u u c g

Output: its optimal secondary structure

[Figure: secondary structure drawing of the sequence]

ICORE

    single(a, X) → a ∼ X
    split(X, Y) → X ∼ Y
    pair(a, X, b) → a ∼ X ∼ b
    mty → ε

[Figure: a secondary structure and the corresponding term built from pair, single, split and mty]
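
The correspondence between a term and a folded sequence can be illustrated with a small sketch. The tuple encoding, the function names `rewrite` and `count_pairs`, and the dot-bracket string returned alongside the sequence are assumptions added for illustration; the ICORE itself only rewrites a term to the sequence, and counting `pair` symbols is one possible evaluation algebra (the one matching the maximum-base-pairs objective).

```python
def rewrite(term):
    """Return (sequence, dot_bracket) for a term over single, split, pair, mty."""
    if term == "mty":                      # mty → ε
        return "", ""
    op = term[0]
    if op == "single":                     # single(a, X) → a ∼ X
        _, a, x = term
        seq, db = rewrite(x)
        return a + seq, "." + db
    if op == "split":                      # split(X, Y) → X ∼ Y
        _, x, y = term
        seq_x, db_x = rewrite(x)
        seq_y, db_y = rewrite(y)
        return seq_x + seq_y, db_x + db_y
    if op == "pair":                       # pair(a, X, b) → a ∼ X ∼ b
        _, a, x, b = term
        seq, db = rewrite(x)
        return a + seq + b, "(" + db + ")"
    raise ValueError(f"unknown operation: {op!r}")


def count_pairs(term):
    """Evaluation algebra sketch: number of pair symbols in the term."""
    if term == "mty":
        return 0
    op = term[0]
    if op == "single":
        return count_pairs(term[2])
    if op == "split":
        return count_pairs(term[1]) + count_pairs(term[2])
    if op == "pair":
        return 1 + count_pairs(term[2])
    raise ValueError(f"unknown operation: {op!r}")


# A hairpin: two stacked pairs closing a loop of three unpaired bases.
hairpin = ("pair", "g",
           ("pair", "c",
            ("single", "a", ("single", "a", ("single", "a", "mty"))),
            "g"),
           "c")
```

Here `rewrite(hairpin)` gives the sequence `gcaaagc` with the structure `((...))`, and `count_pairs(hairpin)` is 2.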

slide-39
SLIDE 39

RNA folding

    mty → ε
    split(X, Y) → X ∼ Y
    single(a, X) → a ∼ X
    pair(a, X, b) → a ∼ X ∼ b

Levenshtein distance

    ε ← mty → ε
    a ∼ X ← rep(a, c, X) → c ∼ X
    a ∼ X ← del(a, X) → X
    X ← ins(c, X) → c ∼ X

RNA folding × 2 + Levenshtein distance

    ε ← mty → ε
    X ∼ Y ← split(X, Y) → X ∼ Y
    a ∼ X ← single(a, c, X) → c ∼ X
    a ∼ X ∼ b ← pair(a, c, X, b, d) → c ∼ X ∼ d
    a ∼ X ← del(a, X) → X
    X ← ins(c, X) → c ∼ X

Simultaneous alignment and folding – Sankoff 1985

slide-40
SLIDE 40

Sankoff

    ε ← mty → ε
    X ∼ Y ← split(X, Y) → X ∼ Y
    a ∼ X ← single(a, c, X) → c ∼ X
    a ∼ X ∼ b ← pair(a, c, X, b, d) → c ∼ X ∼ d
    a ∼ X ← del(a, X) → X
    X ← ins(c, X) → c ∼ X

slide-42
SLIDE 42

Sankoff

    ε ← mty → ε
    X ∼ Y ← split(X, Y) → X ∼ Y
    a ∼ X ← single(a, c, X) → c ∼ X
    a ∼ X ∼ b ← pair(a, c, X, b, d) → c ∼ X ∼ d
    a ∼ X ← del(a, X) → X
    X ← ins(c, X) → c ∼ X

Alignment - nested - simple operations

    ε ← mty → ε
    X ∼ Y ← split(X, Y) → X ∼ Y
    •a ∼ X ← single(a, c, X) → •c ∼ X
    <a ∼ X ∼ >b ← pair(a, c, X, b, d) → <c ∼ X ∼ >d
    •a ∼ X ← del(a, X) → X
    X ← ins(c, X) → •c ∼ X

slide-43
SLIDE 43

Alignment - nested - simple operations

    •a ∼ X ∼ •b ← pairLR ins(a, c, X, b, d) → <c ∼ X ∼ >d

    ε ← mty → ε
    •a ∼ X ← single(a, c, X) → •c ∼ X
    X ∼ Y ← split(X, Y) → X ∼ Y
    <a ∼ X ∼ >b ← pair(a, c, X, b, d) → <c ∼ X ∼ >d
    •a ∼ X ← del(a, X) → X
    X ← ins(c, X) → •c ∼ X

Operations: single, del, ins, pair, pair ins, pair del, pairL del, pairR del, pairLR del, pairL ins, pairR ins, pairLR ins

slide-44
SLIDE 44

Alignment - nested - simple operations

    ε ← mty → ε
    •a ∼ X ← single(a, c, X) → •c ∼ X
    X ∼ Y ← split(X, Y) → X ∼ Y
    <a ∼ X ∼ >b ← pair(a, c, X, b, d) → <c ∼ X ∼ >d
    •a ∼ X ← del(a, X) → X
    X ← ins(c, X) → •c ∼ X

Alignment - nested - full operations (the rules above, plus)

    <a ∼ X ∼ >b ← pair del(a, X, b) → X
    X ← pair ins(c, X, d) → <c ∼ X ∼ >d
    <a ∼ X ∼ >b ← pairLR del(a, c, X, b, d) → •c ∼ X ∼ •d
    •a ∼ X ∼ •b ← pairLR ins(a, c, X, b, d) → <c ∼ X ∼ >d
    <a ∼ X ∼ >b ← pairL del(a, c, X, b) → •c ∼ X
    •a ∼ X ← pairL ins(a, c, X, d) → <c ∼ X ∼ >d
    <a ∼ X ∼ >b ← pairR del(a, X, b, d) → X ∼ •d
    X ∼ •b ← pairR ins(c, X, b, d) → <c ∼ X ∼ >d

slide-49
SLIDE 49

Many more things with ICOREs

slide-50
SLIDE 50

Conclusion

  • RNA is a hot topic in molecular biology
  • nice combinatorial models and algorithms
  • ICORE framework: automatic generation of code, intrinsic properties of dynamic programming

PhD position available (discrete algorithms, graph theory, programming)

slide-51
SLIDE 51
slide-52
SLIDE 52

Tree edit distance and alignment of trees

  • Tree edit distance

    a(X) ← rep(a, b, X) → b(X)
    a(X) ∼ Y ← del(a, X ∼ Y) → X ∼ Y
    X ∼ Y ← ins(a, X ∼ Y) → a(X) ∼ Y
    ε ← mty → ε

  • Alignment of trees

    a(X) ← rep(a, b, X) → b(X)
    a(X) ← del(a, X) → X
    X ← ins(a, X) → a(X)
    ε ← mty → ε