On the Combinatorics of RNA Secondary Structures in a Polymer-Zeta Model
Markus E. Nebel, based on joint work with Emma Yu Jin
CanaDAM 2013, Newfoundland, Canada
Markus E. Nebel RNA in Polymer-Zeta Model 2013/18/5 1 / 17

Plan of Talk
1. RNA Secondary Structure
   - basic definitions
   - enumeration
   - polymer-zeta model (motivation and definition)
2. Enumeration in the Polymer-Zeta Model
   - fundamentals
   - average number of hairpins
3. Overview of Results and Discussion

RNA Secondary Structure
From an abstract point of view, RNA molecules of size $n$ consist of
1. a linear chain of $n$ nodes (≡ nucleotides) labeled $\{a, c, g, u\}$, giving a string $s \in \{a, c, g, u\}^n$ called the RNA sequence,
2. which may be part of at most one edge connecting nodes of distance (in the chain) at least 2 (counted by hops).
Secondary structure: Edges (arcs) are not allowed to cross.
Minimal distance: The edge connecting the orange nodes in the figure is allowed.
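The three conditions above (at most one edge per node, distance at least 2 hops, no crossing arcs) can be checked mechanically. The following is a minimal sketch, not code from the talk; the function name `is_secondary_structure`, the 1-based positions, and the `min_hops` parameter are assumptions made for illustration.

```python
def is_secondary_structure(n, arcs, min_hops=2):
    """Check whether `arcs` (pairs of 1-based positions) form a secondary
    structure on a chain of n nodes. Illustrative helper, not from the talk."""
    used = set()
    normalized = []
    for i, j in arcs:
        i, j = min(i, j), max(i, j)
        if i < 1 or j > n:
            return False          # position outside the chain
        if j - i < min_hops:
            return False          # paired nodes are too close (counted by hops)
        if i in used or j in used:
            return False          # a node may be part of at most one edge
        used.update((i, j))
        normalized.append((i, j))
    # Two arcs (i, j) and (k, l) cross if i < k < j < l (or symmetrically).
    for a, (i, j) in enumerate(normalized):
        for k, l in normalized[a + 1:]:
            if i < k < j < l or k < i < l < j:
                return False
    return True
```

For instance, the nested arcs `[(1, 5), (2, 4)]` on five nodes pass the check, while `[(1, 3), (2, 4)]` is rejected because the arcs cross.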

Enumeration
Enumerating secondary structures is easy; their number is given by the following recurrence relation:
$$r(n+1) = r(n) + \sum_{0 \le k \le n-2} r(k)\, r(n-k-1).$$
If we want to take sequence information into account, we can work with
$$r(n+1) = r(n) + \sum_{0 \le k \le n-2} r(k)\, r(n-k-1)\, \eta(k+1, n+1) \qquad (1)$$
where $\eta(i,j)$ is the indicator which is 1 iff $s_i$ and $s_j$ are complementary.
Random sequence: Taking the expectation of eq. (1) replaces $\eta(i,j)$ by the so-called stickiness $p$ (the expectation of $\eta$), corresponding to the probability for two random nucleotides to be complementary:
$$e(n+1) = e(n) + \sum_{0 \le k \le n-2} e(k)\, e(n-k-1) \times p.$$
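The recurrences for $r(n)$ and $e(n)$ can be evaluated directly by filling a table. The sketch below assumes the initial values $r(0) = r(1) = r(2) = 1$, which the slide does not state (they are the usual choice when paired nodes must be at least 2 hops apart); the function name `count_structures` is hypothetical.

```python
def count_structures(n_max, p=1):
    """Return [r(0), ..., r(n_max)] for the recurrence
    r(n+1) = r(n) + p * sum_{0 <= k <= n-2} r(k) * r(n-k-1).
    With p = 1 this counts structures; with 0 < p < 1 it gives the
    expected count e(n) for stickiness p.
    Assumed initial values: r(0) = r(1) = r(2) = 1."""
    r = [1, 1, 1]
    for n in range(2, n_max):
        total = sum(r[k] * r[n - k - 1] for k in range(0, n - 1))
        r.append(r[n] + p * total)
    return r[: n_max + 1]


if __name__ == "__main__":
    print(count_structures(8))          # plain counts
    print(count_structures(8, p=0.25))  # expected counts for stickiness 1/4
```

Under these assumed initial values, the first counts are 1, 1, 1, 2, 4, 8, 17, 37, 82.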

Algorithmic challenge
Input: RNA sequence (cheap with today's lab techniques).
Output: (Predicted) RNA secondary structure (considered a good approximation of 3D conformation).
Prominent approach: Dynamic programming, i.e. a table-filling algorithm:
1. Processing input sequence $s_1 s_2 \cdots s_n$,
2. $V(i,j)$ represents the minimal energy possible for a folding of subsequence $s_i \cdots s_j$ subject to the $i$-th and $j$-th nucleotide being paired to each other; $W(i,j)$ gives the corresponding minimum without that restriction.
3. ⇒ $n^3$ runtime algorithms (a quadratic number of entries, each giving rise to linear time).
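To make the table-filling idea concrete, here is a small sketch of a cubic-time dynamic program. It is deliberately not the energy model from the slide: as a stand-in it maximizes the number of Watson-Crick base pairs (a Nussinov-style simplification), so only a single table `W` is needed. The names `max_pairs`, `PAIRS`, and `min_hops` are assumptions for illustration.

```python
# Watson-Crick pairs only; an assumed, simplified pairing rule.
PAIRS = {("a", "u"), ("u", "a"), ("c", "g"), ("g", "c")}


def max_pairs(seq, min_hops=2):
    """Maximum number of non-crossing base pairs in `seq`, where paired
    positions must be at least `min_hops` apart. Simplified stand-in for
    the minimum-energy tables V and W; runs in O(n^3) time."""
    n = len(seq)
    if n == 0:
        return 0
    # W[i][j] = best value for the subsequence seq[i..j] (0-based, inclusive).
    W = [[0] * n for _ in range(n)]
    for span in range(min_hops, n):            # span = j - i
        for i in range(n - span):
            j = i + span
            best = W[i][j - 1]                 # case: position j is unpaired
            for k in range(i, j - min_hops + 1):   # case: j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = W[i][k - 1] if k > i else 0
                    inner = W[k + 1][j - 1]
                    best = max(best, left + 1 + inner)
            W[i][j] = best
    return W[0][n - 1]
```

The table has a quadratic number of entries and each entry scans a linear number of split points `k`, which is where the cubic runtime comes from.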

Motivation for Polymer-Zeta Model
Observation: While computing the optimal folding for subsequence $s_i \cdots s_j$, a pairing of $s_i$ and $s_k$ only needs to be considered if the pairing of $s_i$ and $s_k$ already implied a minimum while considering $s_i \cdots s_{j'}$, $j' < j$.
Speedup: Bookkeeping (a candidate list) of the $s_k$ observed in minimal pairings for smaller subsequences may reduce the number of combinations to be considered for each entry.
Polymer-zeta property: the probability for the $i$-th and $j$-th nucleotides at distance $d = j - i + 1$ to form a pair is given by $p_d = b / d^c$ (for some constants $b > 0$, $c > 0$).
⇒ candidate list of (expected) constant length and thus an algorithm with expected quadratic run time.
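One rough way to see why the exponent $c$ matters is to sum the pairing probabilities $p_d = b / d^c$ over all distances: for $c > 1$ the sum stays bounded no matter how long the sequence is, while for $c = 1$ it keeps growing. Reading this sum as a proxy for the expected candidate-list length is an assumption made here for illustration, not the argument from the talk; the function name `pairing_weight_sum` and the starting distance `d_min = 3` are likewise assumed.

```python
def pairing_weight_sum(b, c, d_max, d_min=3):
    """Sum of p_d = b / d**c for d_min <= d <= d_max.
    Heuristic proxy for the expected number of pairing partners of one
    nucleotide; d_min = 3 is an assumed smallest admissible distance."""
    return sum(b / d**c for d in range(d_min, d_max + 1))


if __name__ == "__main__":
    for c in (1.0, 1.5, 2.0):
        sums = [pairing_weight_sum(1.0, c, 10**e) for e in (2, 3, 4, 5)]
        print(c, [round(s, 4) for s in sums])
```

For $c = 1.5$ the partial sums level off quickly, whereas for $c = 1$ each tenfold increase in `d_max` adds roughly $\ln 10$ to the sum.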

Question addressed here
For certain classes of RNA (especially mRNA) it is justified to assume the polymer-zeta property.
Question: Is it appropriate in general?
Approach: We compute the average shape of secondary structures (considered as combinatorial objects, thus no nucleotides, just size) assuming the polymer-zeta property, using methods from enumerative combinatorics, and compare it to statistics derived from native foldings (databases).

Enumeration in the Polymer-Zeta Model
Model: Study
$$r(n+1) = r(n) + \sum_{0 \le k \le n-2} r(k)\, r(n-k-1) \times p_{n-k}$$
whose solution $r(n)$ is, in analogy to the Bernoulli model, the expected number of structures of size $n$, denoted $E_{c,b}\#(S_n)$.
If we additionally compute the expected number of structures with parameter value $k$ (e.g. the number of so-called hairpins), $E_{c,b}\#(S_{n,k})$, then
$$X_n^{c,b} = \sum_{k \ge 1} k \cdot \frac{E_{c,b}\#(S_{n,k})}{E_{c,b}\#(S_n)}$$
is the averaged behavior of the parameter in consideration.
We considered $p_d = b / d^c$ for $(c,b) \in \{1,2\}^2$ (theoretical considerations imply $b = 1$, $c = 1.5$; fitting to mRNA data yields $c = 1.47$).
Reason: Our approach only allows integer values for $c$, since $p_d$ is introduced into our equations by the following trick on generating functions: Consider the operator $\Theta = \Theta(z) = z \frac{\partial}{\partial z}$. Then
for $c = 1$: $\Theta\, \frac{b}{(n+1)^c} z^n = b z^n$;
for $c = 2$: $\Theta^2\, \frac{b}{(n+1)^c} z^n = b z^n$.
This way, we can derive appropriate differential equations for generating functions.
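The polymer-zeta recurrence can also be evaluated numerically for any real $c$, since no generating functions are involved. The sketch below follows the recurrence as written on the slide, weighting the new arc by $p_{n-k} = b / (n-k)^c$; the initial values $r(0) = r(1) = r(2) = 1$ and the name `expected_structures` are assumptions.

```python
def expected_structures(n_max, b=1.0, c=1.0):
    """Return [r(0), ..., r(n_max)] for
    r(n+1) = r(n) + sum_{0 <= k <= n-2} r(k) * r(n-k-1) * p_{n-k},
    with p_d = b / d**c. Assumed initial values: r(0) = r(1) = r(2) = 1."""
    def p(d):
        return b / d**c

    r = [1.0, 1.0, 1.0]
    for n in range(2, n_max):
        total = sum(r[k] * r[n - k - 1] * p(n - k) for k in range(0, n - 1))
        r.append(r[n] + total)
    return r[: n_max + 1]


if __name__ == "__main__":
    # b = 1, c = 1.5 are the values the slide attributes to theory.
    print(expected_structures(10, b=1.0, c=1.5))
```

Setting `c=0` and `b=1` makes every weight equal to 1, which recovers the plain counting recurrence and gives a simple consistency check.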

Average Number of Hairpins
Theorem. Under the assumption of the $(c,b)$-polymer-zeta model, $c \in \{1,2\}$, the average number of hairpins in a secondary structure of size $n$ is asymptotically given by
$$X_n^{1,b} = x_{1,b}\, n \left(1 + O(n^{-1/2})\right)$$
$$X_n^{2,b} = x_{2,b}\, n \left(1 + O((\log n)^{-1})\right)$$
where $x_{c,b} > 0$ is a constant and for $b \in \{1,2\}$ we have
$x_{1,1} \approx 0.1326$, $x_{1,2} \approx 0.1476$, $x_{2,1} \approx 0.1238$, $x_{2,2} \approx 0.1489$.
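For finite $n$, the average number of hairpins can be approximated numerically by tracking, next to the weighted count $r(n)$, the weighted total number of hairpins $h(n)$. The sketch below is an assumption-laden illustration, not the derivation from the talk: it takes a hairpin to be an arc with no further arcs nested inside it, weights the closing arc by $p_{n-k} = b/(n-k)^c$, and uses $r(0) = r(1) = r(2) = 1$; the name `average_hairpins` is hypothetical.

```python
def average_hairpins(n_max, b=1.0, c=1.0):
    """Return [X_0, ..., X_n_max], where X_n = h(n) / r(n) is the weighted
    average number of hairpins in structures of size n.
    Assumption: a hairpin is an arc enclosing no other arc."""
    def p(d):
        return b / d**c

    r = [1.0, 1.0, 1.0]   # weighted number of structures
    h = [0.0, 0.0, 0.0]   # weighted total number of hairpins
    for n in range(2, n_max):
        r_next, h_next = r[n], h[n]      # case: node n+1 is unpaired
        for k in range(0, n - 1):        # case: node n+1 pairs with node k+1
            m = n - k - 1                # number of nodes under the new arc
            w = p(n - k)
            r_next += w * r[k] * r[m]
            # Hairpins of the left part, plus hairpins of the inner part,
            # plus one new hairpin when the inner part has no arcs
            # (the arc-free inner structure has weight 1).
            h_next += w * (h[k] * r[m] + r[k] * (h[m] + 1.0))
        r.append(r_next)
        h.append(h_next)
    return [h[n] / r[n] for n in range(n_max + 1)]


if __name__ == "__main__":
    x = average_hairpins(400, b=1.0, c=1.0)
    print(x[400] / 400)   # can be compared with x_{1,1}; convergence may be slow
```

The ratio $X_n / n$ for moderate $n$ gives only a rough comparison with the constants $x_{c,b}$, since the error terms in the theorem decay slowly, particularly for $c = 2$.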
