RNA Secondary Structures Our Results Open Problems

Combinatorial RNA Design: Designability and Structure-Approximating Algorithm

Jozef Haleš (1), Ján Maňuch (1, 3), Yann Ponty (1, 2), Ladislav Stacho (1)

(1) Simon Fraser University, Canada; (2) Pacific Institute for Mathematical Sciences, Canada; (3) University of British Columbia, Canada

CPM 2015

CPM 2015, Ján Maňuch, Combinatorial RNA Design: Designability and Structure-Approximating

RNA Structures

Composed of four bases: adenine (A), guanine (G), cytosine (C) and uracil (U)

Source: http://www.mpi-inf.mpg.de/departments/d1/projects/CompBio/align.html

Representations of Secondary Structures

A secondary structure is a pair (n, P), where n is the number of bases and P is a set of pairs (i, j) with 1 ≤ i < j ≤ n, each representing a base pair between the i-th base and the j-th base. Every base takes part in at most one pair.

[Figure: a 68-nt example RNA and its secondary structure, shown as an arc diagram and as a tree representation. In the tree, a Root node encloses the whole sequence, every base pair (i, j) is a node [i, j] (e.g. [4,66], [10,20], [27,43]), and every unpaired position i is a leaf [i, i].]

Pseudoknot-Free Secondary Structures

[Figure: a pseudoknotted structure next to a pseudoknot-free structure.]

A structure is pseudoknot-free if no two of its base pairs (i, j) and (k, l) cross, that is, if there are no pairs with i < k < j < l. Let $\mathcal{S}_n$ denote all pseudoknot-free structures with n bases.

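The following is a minimal Python sketch of this representation. It assumes 1-based positions as on the slides. The code checks that a set of pairs is a valid structure, tests the pseudoknot-free condition, and converts to and from dot-bracket notation. The function names are illustrative and do not come from the paper. EXAMPLE_PAIRS lists the base-pair nodes [i, j] read off the tree representation of the 68-nt example.

```python
from typing import Set, Tuple

Pair = Tuple[int, int]  # 1-based (i, j) with i < j


def is_valid_structure(n: int, pairs: Set[Pair]) -> bool:
    """Check 1 <= i < j <= n and that every base takes part in at most one pair."""
    used = set()
    for i, j in pairs:
        if not 1 <= i < j <= n or i in used or j in used:
            return False
        used.update((i, j))
    return True


def is_pseudoknot_free(pairs: Set[Pair]) -> bool:
    """True if no two pairs (i, j), (k, l) satisfy i < k < j < l."""
    ordered = sorted(pairs)
    for a, (i, j) in enumerate(ordered):
        for k, l in ordered[a + 1:]:
            if i < k < j < l:
                return False
    return True


def to_dot_bracket(n: int, pairs: Set[Pair]) -> str:
    chars = ["."] * n
    for i, j in pairs:
        chars[i - 1], chars[j - 1] = "(", ")"
    return "".join(chars)


def from_dot_bracket(db: str) -> Tuple[int, Set[Pair]]:
    stack, pairs = [], set()
    for pos, ch in enumerate(db, start=1):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            pairs.add((stack.pop(), pos))
    return len(db), pairs


# Base pairs read off the tree of the 68-nt example (nodes [i, j] with i < j).
EXAMPLE_PAIRS = {(4, 66), (5, 65), (7, 64), (8, 63), (10, 20), (11, 19), (12, 18), (13, 17),
                 (22, 61), (23, 60), (24, 59), (25, 58), (27, 43), (28, 42), (31, 39), (32, 38),
                 (45, 56), (46, 55), (47, 54), (48, 53)}

if __name__ == "__main__":
    print(to_dot_bracket(68, EXAMPLE_PAIRS))
    print(is_valid_structure(68, EXAMPLE_PAIRS), is_pseudoknot_free(EXAMPLE_PAIRS))
```

Sorting the pairs by their left end guarantees that each crossing is detected when the pair with the smaller left end is visited. This keeps the check to a simple quadratic double loop, which is enough for illustration.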
RNA Folding

Let M be an energy model. The RNA folding problem looks for the minimum free energy (MFE) structure(s).

Problem RNA-FOLD$_M$
Input: RNA sequence w
Output: the set of pseudoknot-free structures $\arg\min_{S \in \mathcal{S}_{|w|}} E_M(w, S)$.

Assuming an additive energy model, which adds up local contributions, one structure in RNA-FOLD$_M(w)$ can be found in time $O(n^3/\log n)$ using dynamic programming [Nussinov, Jacobson (1980), Frid et al. (2010), etc.].

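The recursion for the Watson-Crick model W can be sketched briefly. The version below is the plain O(n^3) base-pair maximization recursion in the spirit of Nussinov and Jacobson, not the O(n^3/log n) speed-up. It returns the maximum number of pairs and one optimal structure.

Two assumptions apply:
- theta, the minimum number of unpaired bases in a hairpin, is set to 0, because the tree example later in the talk contains the empty-hairpin pairs [4, 5] and [8, 9].
- fold_watson_crick is a hypothetical name.

```python
from typing import List, Tuple

WATSON_CRICK = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


def fold_watson_crick(w: str, theta: int = 0) -> Tuple[int, str]:
    """Return (max number of Watson-Crick pairs, one optimal structure in dot-bracket).

    theta is the minimum number of unpaired bases inside a hairpin (assumed 0 here).
    """
    n = len(w)
    # best[i][j] = max number of pairs on the substring w[i:j] (0-based, half-open)
    best: List[List[int]] = [[0] * (n + 1) for _ in range(n + 1)]
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            value = best[i][j - 1]  # case 1: w[j-1] is unpaired
            for k in range(i, j - 1 - theta):  # case 2: w[k] pairs with w[j-1]
                if (w[k], w[j - 1]) in WATSON_CRICK:
                    value = max(value, best[i][k] + 1 + best[k + 1][j - 1])
            best[i][j] = value

    # Traceback: follow one optimal choice per interval to recover a structure.
    chars = ["."] * n
    stack = [(0, n)]
    while stack:
        i, j = stack.pop()
        if j - i < 2 or best[i][j] == 0:
            continue
        if best[i][j] == best[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - 1 - theta):
            if (w[k], w[j - 1]) in WATSON_CRICK and best[i][j] == best[i][k] + 1 + best[k + 1][j - 1]:
                chars[k], chars[j - 1] = "(", ")"
                stack.extend([(i, k), (k + 1, j - 1)])
                break
    return best[0][n], "".join(chars)


if __name__ == "__main__":
    print(fold_watson_crick("ACAGGUUCU"))
```

The recursion conditions on the last base of each interval: it is either unpaired or paired with some k. This makes every optimal structure reachable by the traceback.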
Energy Models

Turner model: free energy is the sum of loop energies

Source: [Lorenz, Clote (2011)]

Simplified models:

  • Base-pair maximization (Watson-Crick model) W: count the number of Watson-Crick base pairs (C · G and A · U).
  • Base-pair sum: sum of the energy contributions of base pairs, $\delta_B(x, x')$; usually includes the weak base pair G · U.
  • Stacked base pairs: sum of the energy contributions of consecutively nested pairs, $\delta_S(x, x', y, y')$.
  • Nearest neighbor

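Here is a sketch of two of the simplified models, under stated assumptions:
- The energy of W is minus the number of Watson-Crick pairs, so minimizing energy maximizes pairs.
- A pair type that a model does not allow gets infinite energy.
- The δB values are made-up placeholders, not measured parameters. They are chosen only so that −δB(G, U) is smaller than −δB(C, G) and −δB(A, U), the condition quoted with the open problems.

```python
import math
from typing import Dict, Set, Tuple

Pair = Tuple[int, int]  # 1-based positions

WATSON_CRICK = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}

# Hypothetical delta_B values for illustration only (not measured parameters).
# Negative means stabilizing; G.U is the weakest allowed pair.
DELTA_B: Dict[Tuple[str, str], float] = {
    ("C", "G"): -3.0, ("G", "C"): -3.0,
    ("A", "U"): -2.0, ("U", "A"): -2.0,
    ("G", "U"): -1.0, ("U", "G"): -1.0,
}


def energy_watson_crick(w: str, pairs: Set[Pair]) -> float:
    """Assumed E_W(w, S): minus the number of pairs; non-Watson-Crick pairs are forbidden."""
    if any((w[i - 1], w[j - 1]) not in WATSON_CRICK for i, j in pairs):
        return math.inf
    return -float(len(pairs))


def energy_base_pair_sum(w: str, pairs: Set[Pair],
                         delta: Dict[Tuple[str, str], float] = DELTA_B) -> float:
    """Base-pair sum model: sum of delta_B(w_i, w_j) over all pairs (i, j)."""
    total = 0.0
    for i, j in pairs:
        d = delta.get((w[i - 1], w[j - 1]))
        if d is None:  # pair type not allowed in this model
            return math.inf
        total += d
    return total


if __name__ == "__main__":
    s = {(1, 9), (2, 4), (5, 8)}
    print(energy_watson_crick("ACAGGUUCU", s), energy_base_pair_sum("ACAGGUUCU", s))
```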
RNA Design Problem

Let M be an energy model.

Problem RNA-DESIGN$_{M,\Sigma,\Delta}$
Input: secondary structure S and energy distance Δ > 0
Output: an RNA sequence $w \in \Sigma^\star$, called a design for S, such that $\forall S' \in \mathcal{S}_{|w|} \setminus \{S\} : E_M(w, S') \geq E_M(w, S) + \Delta$,
or ∅ if no such sequence exists.

RNA Design Problem (simplified)

Simplified formulation for the Watson-Crick model W and Δ = 1:

Problem RNA-DESIGN$_\Sigma$
Input: secondary structure S
Output: an RNA sequence $w \in \Sigma^\star$, called a design for S, such that RNA-FOLD$_W(w) = \{S\}$,
or ∅ if no such sequence exists.

Example

  • a. Target secondary structure S: ((.)(..))
  • b. Invalid sequence for S: GGACAGGUC
  • c. Design for S: ACAGGUUCU

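For very short sequences, the design condition can be checked literally by comparing the target against every structure in $\mathcal{S}_n$. The sketch below does this for the example above, under the same assumptions as before: E_W is minus the number of Watson-Crick pairs, and empty hairpins are allowed. All names are illustrative. The enumeration grows like the Motzkin numbers, so this is a checker for toy inputs only, not a design algorithm.

```python
import math
from functools import lru_cache
from typing import Callable, FrozenSet, List, Tuple

Pair = Tuple[int, int]  # 1-based positions
Structure = FrozenSet[Pair]

WATSON_CRICK = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


def energy_watson_crick(w: str, s: Structure) -> float:
    """Assumed E_W: minus the number of pairs; non-Watson-Crick pairs are forbidden."""
    if any((w[i - 1], w[j - 1]) not in WATSON_CRICK for i, j in s):
        return math.inf
    return -float(len(s))


def all_structures(n: int, theta: int = 0) -> List[Structure]:
    """Enumerate every pseudoknot-free structure on positions 1..n (small n only)."""

    @lru_cache(maxsize=None)
    def rec(i: int, j: int) -> Tuple[Tuple[Pair, ...], ...]:
        if i > j:
            return ((),)
        result = list(rec(i + 1, j))  # position i unpaired
        for k in range(i + theta + 1, j + 1):  # position i paired with position k
            for inner in rec(i + 1, k - 1):
                for outer in rec(k + 1, j):
                    result.append(((i, k),) + inner + outer)
        return tuple(result)

    return [frozenset(s) for s in rec(1, n)]


def is_design(w: str, target: Structure,
              energy: Callable[[str, Structure], float],
              delta: float, theta: int = 0) -> bool:
    """Design condition: E(w, S') >= E(w, S) + delta for every other structure S'."""
    e_target = energy(w, target)
    if math.isinf(e_target):
        return False
    return all(energy(w, s) >= e_target + delta
               for s in all_structures(len(w), theta) if s != target)


if __name__ == "__main__":
    target = frozenset({(1, 9), (2, 4), (5, 8)})  # ((.)(..))
    for w in ("GGACAGGUC", "ACAGGUUCU"):
        print(w, is_design(w, target, energy_watson_crick, 1))
```

With Δ = 1 this is exactly the simplified condition RNA-FOLD_W(w) = {S}. GGACAGGUC fails because another structure, {(1, 9), (3, 8), (4, 6)}, also has three pairs.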
Let Designable(Σ) be the set of all structures for which there exists a design over Σ.

Our Results: Definitions and notations

Given a secondary structure S, let $\mathrm{Unpaired}_S$ be the set of all unpaired positions of S. Example: $\mathrm{Unpaired}_S = \{4, 8\}$.

S is saturated if $\mathrm{Unpaired}_S = \emptyset$. Let Saturated be the set of all saturated structures. [Figure: a structure that is not saturated next to one that is saturated.]

Let D(S) be the maximal paired degree of nodes in the tree representation of S. The paired degree of a node is the number of its neighbors in the tree (children and parent) that represent base pairs, with the root counting as a base pair.

Example: the tree with root [0, 12] and nodes [1, 1], [2, 11], [3, 6], [4, 5], [7, 7], [8, 9], [10, 10]. The paired nodes [0, 12], [2, 11], [3, 6], [4, 5] and [8, 9] have paired degrees 1, 3, 2, 1 and 1, so D(S) = 3.

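This sketch builds the tree representation from dot-bracket notation and computes Unpaired_S, saturation and D(S).

Some points to note:
- We read the paired degree as the number of paired neighbours: children plus parent, with the root [0, n+1] counting as paired. This is our interpretation of the definition. It is the reading that reproduces the degrees 1, 3, 2, 1, 1 of the example.
- In dot-bracket notation, the example structure is .((()).().).
- Names are illustrative, and the recursion is meant for small inputs.

```python
from typing import Dict, List, Set, Tuple

Node = Tuple[int, int]  # [i, j]; unpaired leaves are [i, i]; the root is [0, n + 1]


def partners(db: str) -> Dict[int, int]:
    """Map each paired position to its partner (1-based)."""
    stack, mate = [], {}
    for pos, ch in enumerate(db, start=1):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            i = stack.pop()
            mate[i], mate[pos] = pos, i
    return mate


def tree_representation(db: str) -> Tuple[Node, Dict[Node, List[Node]]]:
    """Return the root and the children lists of the tree representation."""
    n, mate = len(db), partners(db)
    children: Dict[Node, List[Node]] = {}

    def build(i: int, j: int) -> None:  # node [i, j] encloses positions i+1 .. j-1
        kids, p = [], i + 1
        while p < j:
            q = mate.get(p, p)  # q == p means position p is unpaired
            kids.append((p, q))
            if q == p:
                children[(p, p)] = []
            else:
                build(p, q)
            p = q + 1
        children[(i, j)] = kids

    root = (0, n + 1)
    build(*root)
    return root, children


def paired_degrees(db: str) -> Dict[Node, int]:
    """Paired degree of every paired node (root included): paired children + paired parent."""
    root, children = tree_representation(db)
    degrees: Dict[Node, int] = {}
    for node, kids in children.items():
        if node[0] == node[1]:  # unpaired leaf
            continue
        paired_kids = sum(1 for c in kids if c[0] < c[1])
        degrees[node] = paired_kids + (0 if node == root else 1)
    return degrees


def max_paired_degree(db: str) -> int:
    """D(S)."""
    return max(paired_degrees(db).values())


def unpaired(db: str) -> Set[int]:
    return {pos for pos, ch in enumerate(db, start=1) if ch == "."}


def is_saturated(db: str) -> bool:
    return not unpaired(db)


if __name__ == "__main__":
    example = ".((()).().)"
    print(paired_degrees(example), max_paired_degree(example), unpaired(example))
```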
Our Results: Designability over Restricted Alphabets

Let $\Sigma_{c,u}$ be an alphabet with c pairs of complementary bases and u bases without a complementary base.

R1 For every $u \in \mathbb{N}^+$, $\mathrm{Designable}(\Sigma_{0,u}) = \{(n, \emptyset) \mid n \in \mathbb{N}\}$;
R2 $\mathrm{Designable}(\Sigma_{1,0}) = (\mathrm{Saturated} \cap \{S \mid D(S) \leq 2\}) \cup \{(n, \emptyset) \mid n \in \mathbb{N}\}$;
R3 $\mathrm{Designable}(\Sigma_{1,1}) = \{S \mid D(S) \leq 2\}$.

Question: Why not degree 3?

Proof. Only C · G or G · C base pairs can be used.
In the root: with three paired children, ? ... ? ? ... ? ? ... ?, one of the two pair types has to repeat, e.g. C ... G G ... C C ... G, and then there is an alternative fold.
In an internal node: ... ? ? ... ? ? ... ? ? ... Either a pair type repeats among the children (... ? C ... G C ... G ? ...), or the parent has the reversed base pair of a child (... C C ... G G ... C G ...). In both cases an alternative fold exists.

This can be easily generalized to:
Lemma. For any structure S in $\mathrm{Designable}(\Sigma_{c,u})$, $D(S) \leq 2c$.

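R2 and R3 can be spot-checked on tiny structures by exhaustive search.

This sketch makes the following assumptions:
- $\Sigma_{1,0} = \{C, G\}$ and $\Sigma_{1,1} = \{C, G, A\}$, with C · G and G · C as the only pairs.
- The letter A merely stands in for the base without a complement; this is an arbitrary choice.
- theta = 0, as before.

The code counts optimal structures with an unambiguous base-pair maximization recursion. A sequence w is therefore a design exactly when the target reaches the maximum and is the only structure that does. Function names are illustrative. Checking small cases is a sanity check, not a proof.

```python
from itertools import product
from typing import List, Set, Tuple

CG_ONLY = {("C", "G"), ("G", "C")}  # the single complementary pair of Sigma_{1,u}


def mfe_count(w: str, allowed: Set[Tuple[str, str]], theta: int = 0) -> Tuple[int, int]:
    """(max number of pairs, number of structures reaching it) in base-pair maximization."""
    n = len(w)
    # best[i][j] = (max pairs, count) on w[i:j]; empty and single-base intervals give (0, 1)
    best = [[(0, 1)] * (n + 1) for _ in range(n + 1)]
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            m, c = best[i][j - 1]  # w[j-1] unpaired
            for k in range(i, j - 1 - theta):  # w[k] pairs with w[j-1]
                if (w[k], w[j - 1]) in allowed:
                    (lm, lc), (im, ic) = best[i][k], best[k + 1][j - 1]
                    if lm + im + 1 > m:
                        m, c = lm + im + 1, lc * ic
                    elif lm + im + 1 == m:
                        c += lc * ic
            best[i][j] = (m, c)
    return best[0][n]


def is_design(w: str, db: str, allowed: Set[Tuple[str, str]], theta: int = 0) -> bool:
    """True if the dot-bracket target db is the unique optimal fold of w."""
    stack, pairs = [], []
    for pos, ch in enumerate(db):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            pairs.append((stack.pop(), pos))
    if any((w[i], w[j]) not in allowed or j - i - 1 < theta for i, j in pairs):
        return False
    m, c = mfe_count(w, allowed, theta)
    return m == len(pairs) and c == 1


def designs(db: str, alphabet: str, allowed: Set[Tuple[str, str]]) -> List[str]:
    """Exhaustively list all designs of db over the alphabet (tiny inputs only)."""
    candidates = ("".join(t) for t in product(alphabet, repeat=len(db)))
    return [w for w in candidates if is_design(w, db, allowed)]


if __name__ == "__main__":
    for db, alphabet in [("()()", "CG"), ("()()()", "CGA"), ("(()())", "CG"), ("(.)", "CGA")]:
        print(db, alphabet, designs(db, alphabet, CG_ONLY))
```

The results line up with R2 and R3:
- ()() has D(S) = 2 and exactly the two designs CGGC and GCCG.
- ()() () has a root of degree 3 and no design.
- (()()) has an internal node of degree 3 and no design.
- The unpaired hairpin (.) needs the extra letter: it has no design over {C, G} but two designs over {C, G, A}.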
Our Results: Designability over the Complete Alphabet

Let $\Sigma_{2,0} = \{A, U, C, G\}$.

R4 $\mathrm{Designable}(\Sigma_{2,0}) \cap \mathrm{Saturated} = \{S \mid D(S) \leq 4\} \cap \mathrm{Saturated}$.

Idea. Lemma: [Figure: the lemma illustrated on the sequence C A U A U G C C G A U A U G.] Use this lemma to prove, by a bottom-up induction on the tree, that the target is the unique MFE structure.

When unpaired positions are allowed in the target structure, our characterization is only partial:

R5 (Necessary) If $S \in \mathrm{Designable}(\Sigma_{2,0})$, then S contains neither "a node having degree more than four" (motif $m_5$) nor "a node having one or more unpaired children, and degree greater than two" (motif $m_3^\circ$).

R6 (Sufficient) Let Separated be the set of structures for which there exists a separated (proper) coloring of the tree representation. Then $\mathrm{Separated} \subset \mathrm{Designable}(\Sigma_{2,0})$.

Our Results: Separated Coloring

Consider the tree representation $T_S$ of structure S. Color every paired node of $T_S$ other than the root black (G · C), white (C · G) or grey (A · U or U · A). This coloring is called proper if:

1 every node has at most one black, at most one white and at most two grey children;
2 a grey node has at most one grey child;
3 a black node does not have a white child; and
4 a white node does not have a black child.

Given a proper coloring of $T_S$, let the level of each node be the number of black nodes minus the number of white nodes on the path from this node to the root. A proper coloring is called separated if the two sets of levels, associated with grey and unpaired nodes respectively, do not intersect.

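The four conditions and the separation test translate directly into code.

In this sketch:
- The tree is a dictionary of children lists with hypothetical node names.
- The root and the unpaired leaves are left uncolored.
- The node itself is included on its path to the root. This makes no difference for grey and unpaired nodes, which contribute 0.

The toy tree is our own example, not the one from the slides.

```python
from typing import Dict, List, Set

BLACK, WHITE, GREY = "black", "white", "grey"  # G.C, C.G, A.U or U.A


def is_proper(children: Dict[str, List[str]], color: Dict[str, str]) -> bool:
    """Check the four conditions of a proper coloring."""
    for node, kids in children.items():
        cs = [color.get(k) for k in kids]
        if cs.count(BLACK) > 1 or cs.count(WHITE) > 1 or cs.count(GREY) > 2:
            return False  # condition 1
        own = color.get(node)
        if own == GREY and cs.count(GREY) > 1:
            return False  # condition 2
        if own == BLACK and WHITE in cs:
            return False  # condition 3
        if own == WHITE and BLACK in cs:
            return False  # condition 4
    return True


def levels(children: Dict[str, List[str]], color: Dict[str, str], root: str) -> Dict[str, int]:
    """Level = #black - #white on the path from the node to the root."""
    step = {BLACK: 1, WHITE: -1}
    level = {root: 0}
    stack = [root]
    while stack:
        node = stack.pop()
        for kid in children.get(node, []):
            level[kid] = level[node] + step.get(color.get(kid), 0)
            stack.append(kid)
    return level


def is_separated(children: Dict[str, List[str]], color: Dict[str, str],
                 root: str, unpaired: Set[str]) -> bool:
    """Proper, and grey levels and unpaired-leaf levels are disjoint."""
    if not is_proper(children, color):
        return False
    level = levels(children, color, root)
    grey_levels = {level[v] for v, c in color.items() if c == GREY}
    leaf_levels = {level[v] for v in unpaired}
    return grey_levels.isdisjoint(leaf_levels)


# Toy tree for the structure (.)((.)): pair a encloses leaf x; pair c encloses pair d with leaf y.
TOY = {"root": ["a", "c"], "a": ["x"], "c": ["d"], "d": ["y"], "x": [], "y": []}

if __name__ == "__main__":
    print(is_separated(TOY, {"a": BLACK, "c": GREY, "d": WHITE}, "root", {"x", "y"}))
```

With a black, c grey and d white, the grey levels {0} and the leaf levels {1, −1} are disjoint, so the coloring is separated. Recoloring d grey puts leaf y on level 0 next to c, which breaks separation even though the coloring stays proper.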
Our Results: Separated Coloring (example)

[Figure: a proper coloring of an example tree representation, drawn from the Root.]

Levels of grey nodes: 0, 1. Levels of leaves: 2, 4. This is a separated coloring.

Design (black → GC, white → CG, grey → AU or UA, unpaired → U):

GAAAAGUUGGUUUUUCCUUCUCAGGUUUUCCUGUUUC

Our Results: Separated Coloring (sketch of the proof)

Let w be the sequence obtained from the separated coloring, and let S′ be an MFE fold for w.

In S, every C, G and A is paired. Hence, in S′, every C, G and A must be paired.

Lemma. Any A · U base pair must be between positions on the same level.
Proof. Otherwise, the portion enclosed by this base pair has an imbalance in the number of C's and G's. Hence not all of them are base-paired, a contradiction.

All U's unpaired in S must also be unpaired in S′. The claim then follows from result R4 (for saturated structures).

Our Results: Designability over the complete alphabet

R7 If $S \in \mathrm{Designable}(\Sigma_{2,0})$, then its k-stutter $S^{[k]}$ (every position repeated k times, so that each base pair becomes k stacked pairs) is also in $\mathrm{Designable}(\Sigma_{2,0})$.

Our Results: k-Stutter (example)

Designable structure: ((.)(..)), with design ACAGGUUCU.

Then the 2-stutter is designable as well: ((((..))((....)))), with design AACCAAGGGGUUUUCCUU.

Proof idea: use König's theorem (in a bipartite graph, the size of a maximum matching equals the size of a minimum vertex cover) to show that an MFE structure of the stutter sequence cannot connect a region to two different regions.

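In the example, the k-stutter repeats every position k times, so each base pair (i, j) becomes the k nested pairs (k(i−1)+t, kj−t+1) for t = 1..k. Below is a minimal sketch assuming that reading of $S^{[k]}$; the function names are hypothetical. It reproduces the 2-stutter structure and sequence shown above. It only builds the stutter. The designability claim of R7 is the theorem, not something this code checks.

```python
from typing import Set, Tuple

Pair = Tuple[int, int]  # 1-based positions


def stutter(s: str, k: int) -> str:
    """Repeat every character k times; applies to dot-bracket strings and sequences alike."""
    return "".join(ch * k for ch in s)


def stutter_pairs(pairs: Set[Pair], k: int) -> Set[Pair]:
    """Each pair (i, j) becomes k nested pairs (k(i-1)+t, kj-t+1) for t = 1..k."""
    return {(k * (i - 1) + t, k * j - t + 1) for i, j in pairs for t in range(1, k + 1)}


def pairs_of(db: str) -> Set[Pair]:
    stack, pairs = [], set()
    for pos, ch in enumerate(db, start=1):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            pairs.add((stack.pop(), pos))
    return pairs


if __name__ == "__main__":
    print(stutter("((.)(..))", 2), stutter("ACAGGUUCU", 2))
```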
Our Results: Structure-Approximating Algorithm

R8 Any structure S without $m_5$ and $m_3^\circ$ can be transformed in Θ(n) time into a $\Sigma_{2,0}$-designable structure S′ by inflating a subset of its base pairs (at most one per band), so that the greedy coloring of the resulting structure is separated.

[Figure: panels Root, Greedy Coloring, Band Inflation and Design, on a small example built from hairpins ( . . ) and an inflated pair ( ( . . ) ); legend: black → GC, white → CG, grey → AU|UA, unpaired → U.]

The main idea: use inflation to separate grey vertices and leaves onto levels of different parity (odd/even).

Remark: arcs could be added to remove motifs $m_5$ and $m_3^\circ$, after which the algorithm could be applied.

Remark: Breaking motifs

Open Problems and Future Work

1 What's the complexity of the RNA-DESIGN problem? Could it be polynomial?

2 What's the complexity of the RNA-DESIGN problem restricted to designs that use only one base for all unpaired positions? Example: ( . ) ( . ) ( . ) and the sequence G U C C A G A G U.

3 What's the complexity of determining whether a structure has a separated coloring?

4 Extend the results to more complex energy models. Our results hold for the base-pair sum model, as long as $-\delta_B(G, U)$ is smaller than both $-\delta_B(C, G)$ and $-\delta_B(A, U)$.

5 Find a better bound on the number of arcs that need to be inflated in our approximation algorithm.